Optics f2f
from Fourier to Fresnel

Charles S. Adams and Ifan G. Hughes
Physics Department, Durham University
Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Charles S. Adams and Ifan G. Hughes 2019
The moral rights of the authors have been asserted
First Edition published in 2019
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2018953423
ISBN 978–0–19–878678–8 (hbk.)
ISBN 978–0–19–878679–5 (pbk.)
DOI: 10.1093/oso/9780198786788.001.0001
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.
To K for love and cake, and E for the hedgehog. (CSA)
I Dad ac Aled, cyd-deithwyr ffyddlon ar drywydd dirgelwch
goleuni. (IGH)
Preface
Beginning in 1814, Augustin-Jean Fresnel performed a series of experiments
using a honey droplet as a lens and formulated the mathematics of
summing curved waves that we still use today. A few years later, Joseph
Fourier developed a mathematical framework to sum planar waves. In
this book we apply the wave theories of both Fresnel and Fourier to the
propagation of light. Our title, Optics f2f, is a tribute to their genius.
This book has arisen from lecture notes used by the authors when
teaching compulsory core modules on Optics to second and third year
undergraduates in the Physics Department at Durham University for
two decades. When preparing teaching material for these courses, we
searched for a suitable text. Although there is no shortage of books on
optics, no single text treated the subject in the way we think about light.
So starting from the pre-requisites of a basic knowledge of geometric
optics, and the mathematics of complex numbers and vector operators,
we wrote our own.
After reviewing Maxwell’s wave theory, we begin by treating one
wave—with planar or curved wave fronts—then two waves, and finally
many waves. By summing waves, we show how it is possible to
understand all of the key concepts of optics, including interference,
polarization, diffraction, and coherence, as well as focusing and image
formation. A recurring theme is how from the simple building blocks of
plane and spherical waves we can construct any light field. We emphasize
that although plane waves are exactly transverse, and it is sometimes a
good approximation to treat the field as a plane wave, real light fields
are always a superposition of plane waves, and the electric field is not
necessarily perpendicular to the direction of propagation. In fact, now
it is commonplace, not only to study, but also to enhance and exploit,
the non-transverse nature of light.
We build on insights gained from research on optical and atomic
physics to give students grounding in the fundamentals of optics, and to
illustrate the key concepts with contemporary examples from modern
research. Fourier methods and the angular-spectrum approach are
used extensively throughout, especially to provide a unified approach
to Fraunhofer and Fresnel diffraction. Particular attention is paid
to analysing topics in contemporary optics—propagation, dispersion,
laser beams and wave guides, apodization, tightly focused vector fields,
unconventional polarization states, and light–matter interactions. We
exploit (i) the development of numerically efficient fast Fourier transform
routines to simulate light propagation, and (ii) the fact that a modern
PC, or laptop (or even your phone!) can be used to perform diffraction
integrals efficiently. The teaching of optics cannot be reduced to a
sequence of computer codes and algorithms; however, the insight gained
by utilizing modern computer techniques to produce visualizations in
conjunction with more conventional analytic approaches is, in our
experience, a far better way to gain a deeper understanding of light
propagation. For this reason, we make available a library of supporting
Python codes that were used to generate many of the figures in this
book; see:
http://www.dur.ac.uk/physics/opticsf2f
Preliminary versions of parts of the material have been used by thousands
of physics and natural sciences students at Durham University;
we are grateful to all of those who helped identify and eradicate
inconsistencies, errors, and other sources of confusion. We also thank the
undergraduates and summer students who let us use some of the images
and data recorded in the undergraduate laboratory. In many ways, we
have learnt more from our students than anyone else. Many colleagues
kindly donated their time to proofread various chapters, and we are
indebted to them for this service: Tom Lancaster gave us sage advice
about writing a physics book, and made suggestions for improvements
on most of the manuscript; we benefited enormously from the years of
experience as an optics teacher of Steve Hopkins, who read carefully the
first half; Robert Bettles helped root out mathematical inconsistencies
and made suggestions for clearer wording, especially in Chapter 13;
Lukas Novotny proofread Chapter 12; Aled and Rhiannon Hughes gave
advice on the photographic elements; and Eileen Lovell discovered our
problem with plurals. The authors have enjoyed discussions on various
topics in optics with mentors and colleagues over the years, including
Geoffrey Brooker, Antoine Browaeys, Allister Ferguson, Matthew Jones,
Klaus Mølmer, Tilman Pfau, Erling Riis, and Nicholas Spong. Mr John
Harris inspired one of the authors (IGH) to study optics at Ysgol Gyfun
Ystalyfera.
We would like to thank our families for support and encouragement,
our copy editor Graham Bliss, production editor Saranya Jayakumar,
and Sönke Adlung and Harriet Konishi at OUP for their enthusiasm
and patience.
CSA and IGH, Durham, on the 311th birthday of Leonhard Euler, who
taught us that $e^{i\pi} = -1$, 15 April, 2018.
Contents
1 Light as a wave
1.1 Wave optics
1.2 A brief history
1.3 Maxwell’s equations
1.4 Maxwell’s wave equation
1.5 Principle of superposition
1.6 The harmonic wave solution
1.7 E or B?
1.8 Phasors
1.9 Spatial frequency
1.10 Intensity/Poynting vector
1.11 Complex representation
1.12 Scalar approximation
1.13 General solution
1.14 Propagation
1.15 Waves and quanta
Exercises
2 One wave: plane or curved
2.1 Introduction
2.2 Wave fronts
2.3 Plane waves
2.4 Transverse property
2.5 Scalar plane wave
2.6 Plane wave in a medium
2.7 Law of refraction
2.8 Dispersion
2.9 Fresnel coefficients
2.10 Brewster’s angle
2.11 Reflectivity
2.12 Curved wave fronts
2.13 Paraxial optics
2.14 Paraxial curvature
2.15 Lenses: a brief history
2.16 Geometry of a lens
2.17 Collimation
2.18 Imaging property
Exercises
3 Two waves: interference
3.1 Introduction
3.3 Two plane waves
3.4 Standing waves
3.5 Two spherical waves
3.6 Young’s interferometer
3.7 Plane plus spherical
3.8 Three waves
3.9 Diffraction grating
3.10 Interferometry
3.11 Fabry–Perot etalon
3.12 Michelson interferometer
Exercises
4 Polarization
4.1 Introduction
4.2 Linear basis (|)
4.3 Linear polarization (|)
4.4 Circular polarization (|)
4.5 Elliptical polarization (|)
4.6 Circular basis (◦)
4.7 Poincaré sphere (◦)
4.8 Photon spin (◦)
4.9 Polarized light in a medium
4.10 Polarizers
4.11 Malus’ Law
4.12 Linear birefringence (|)
4.13 Wave plates (|)
4.14 Circular birefringence (|)
4.15 Natural optical activity (|)
4.16 The Faraday effect (|)
4.17 Interference
Exercises
5 Many waves I: Fresnel and Fraunhofer
5.1 Introduction
5.3 Fresnel diffraction integral
5.4 Fresnel zones
5.5 Circular aperture
5.6 Cartesian separability
5.7 Fraunhofer diffraction
5.7.1 Case I: Focal plane of a lens
5.7.2 Case II: Far field
5.8 One, two, many slits
5.9 2D Fraunhofer
5.10 Fresnel integrals
5.11 Talbot effect
Exercises
6 Many waves II: Fourier
6.1 Introduction
6.2 Fourier
6.3 Angular spectrum
6.4 Propagation
6.5 Fourier to Fresnel
6.6 Fresnel to Fourier
6.7 Regular arrays
6.8 Babinet’s principle
Exercises
7 Optical phenomena in the time domain
7.1 Introduction
7.2 Frequency spectrum
7.3 An optical pulse
7.4 Two pulses
7.5 Multiple pulses
7.6 Two frequencies
7.7 Many waves: propagation
7.8 Group propagation
7.9 Group velocity dispersion
7.10 Dispersive resonance
7.11 Slow light
7.12 Fast light
7.13 Information propagation
Exercises
8 Coherence
8.2 Statistical light
8.3 Temporal coherence
8.4 White light
8.5 Wiener–Khinchin–Einstein theorem
8.6 Power spectral density
8.7 Intensity correlations
8.8 Spatial coherence
8.9 van Cittert–Zernike
8.10 Propagation of coherence
8.11 Stellar interferometry
Exercises
9 Optical imaging
9.2 History: Zeiss and Abbe
9.3 Point-spread function
9.4 Angular resolution
9.5 f to f
9.6 Two-lens system
9.7 Magnification
9.8 Complementarity I
Exercises
10 Spatial filtering
10.2 Apodization
10.2.1 Apodization of 1D apertures
10.2.2 Apodization of 2D apertures
10.2.3 Inverse apodization and super-resolution
10.3 Spatial filtering
10.4 1D periodic
10.6 2D arbitrary objects
10.7 Convolution
10.8 Phase-contrast imaging
10.9 Complementarity II
Exercises
11 Light propagation: beams and guides
11.2 Laser beam propagation
11.3 Focusing of laser beams
11.4 Optical cavities
11.5 Waveguides
11.6 Modes within a slit
11.7 A cylindrical light guide
11.8 Step-index fibre
11.9 Fibre modes
Exercises
12 Vector light fields
12.1 EM fields are not purely transverse
12.2 Beyond paraxial
12.2.1 Optical beams—‘non-existence’ theorems
12.3 Vector angular spectrum
12.3.1 Far-field analysis of the angular spectrum
12.4 Radial/azimuthal modes
12.4.1 Radial polarization
12.4.2 Azimuthal polarization
12.5 High-NA focusing
12.5.1 Geometry and integral representation of the focused field
12.5.2 Linearly polarized illumination
12.5.3 Radially polarized illumination
12.5.4 Azimuthally polarized input
Exercises
13 Light and matter
13.1 Induced dipoles
13.2 Refractive index
13.3 Ewald–Oseen extinction
13.4 Clausius–Mossotti
13.5 Kramers–Kronig
13.6 Point-like scatterers
13.6.1 Scattering cross section
13.7 The extinction paradox
13.8 Metals
13.9 Non-linear optics
Exercises
A Electromagnetic scalar and vector potentials
A.1 The potentials φ and A
A.2 Gauge transformations
A.3 Application: Electric field of a dipole
B Fourier transform toolkit
B.1 Executive summary
B.2 δ-function
B.3 Properties
B.4 Convolution
B.5 rect sinc
B.6 gauss gauss
B.7 δ-function constant
B.8 Phasor δ-function
B.9 comb comb
B.10 2D Fourier transforms
B.11 Cartesian separability
B.12 2D rect
B.13 circ jinc
B.14 Fourier on a computer
Exercises
C Induced dipoles
C.1 Induced dipole moment
C.2 Optical Bloch equations
C.3 Complex polarizability
C.4 Lorentz model
References
Index
1 Light as a wave
We’re all equal before a wave.
Laird John Hamilton (San Francisco 1964–)
1.1 Wave optics
This book is about wave optics, which is the foundation stone of
the wider edifice of optical phenomena illustrated in Fig. 1.1. The
optics map includes: electromagnetic optics, where we care about
the electromagnetic character of light; quantum optics, where we
care more about effects associated with counting individual photons;
and non-linear optics, where the field is sufficiently strong that the
interaction with a medium is non-linear, see Chapter 13. Wave optics
gets us surprisingly far, and only in a few special topics do we need to
invoke additional phenomena associated with the full electromagnetic
theory or quantum theory. A cornerstone of wave optics is the principle
of superposition, see Section 1.5. This says that we can add any
solutions of the wave equation to form new solutions. So starting with
one wave (this chapter and Chapter 2), we can add another wave
and explain two-wave phenomena such as interference (Chapter 3)
and polarization (Chapter 4), and then we add more (Chapters 5–7).
A sum of many waves allows us to explain the full range of wave
complexity found in Nature.
Fig. 1.1 Wave optics is the foundation stone for a broader range
of sub-fields including: ray optics—important in the design of optical
instruments; electromagnetic optics including plasmonics (light interactions
with metals); quantum optics—which is about photon correlations; and
non-linear optics—widely used to convert the frequency of light.
1.2 A brief history

Historically, it was not obvious that light is a wave and that radio
waves, light, and X-rays are all related. As the wavelength of light
is much smaller than the dimensions of everyday objects, observation
suggests that light propagates in straight lines like the rays in Fig. 1.2.
Although a wave theory was discussed, in particular by Christiaan
Huygens (born The Hague 1629–died 1695), it was not widely accepted
until the experiments of Thomas Young (Milverton 1773–London 1829)
and Augustin-Jean Fresnel (Broglie 1788–Ville-d’Avray 1827). The
physical origin of this wave remained unknown until Michael Faraday
(London 1791–1867) suggested in 1846 that radiation could be a
transverse vibration of electric and magnetic lines of force. The
mathematical proof that light was electromagnetic in origin was provided
by James Clerk Maxwell (Edinburgh 1831–Cambridge 1879) in 1865. By
comparing data on the speed of light [1] with independent measurements
of the electrical permittivity and magnetic permeability of free space,
$\epsilon_0$ and $\mu_0$, respectively, [2] Maxwell realized that light is an
electromagnetic wave that travels at a speed $c = 1/\sqrt{\mu_0 \epsilon_0}$. As most
information we acquire about the Universe is delivered by electromagnetic
waves this was hugely significant. In making the connection, Maxwell
unified the previously unrelated disciplines of optics and electromagnetism,
and introduced the concept of the electromagnetic field—a function of space
and time that characterizes the forces on particles. The field concept
provided a template for modern physics and was subsequently applied
to matter as well, see e.g. Lancaster and Blundell (2014).
[1] Armand Hippolyte Louis Fizeau (Paris 1819–Venteuil 1896) and Jean
Bernard Léon Foucault (Paris 1819–1868) made accurate measurements of
the speed of light in 1848 and 1850, respectively.
[2] Wilhelm Eduard Weber (Wittenberg 1804–Göttingen 1891) and Rudolf
Hermann Arndt Kohlrausch (Göttingen 1809–Erlangen 1858) found in 1856
that the ratio of electrostatic to electromagnetic units could be combined
to produce a speed close to that of light.
Fig. 1.2 Light rays propagating through the trees.
1.3 Maxwell’s equations

The electromagnetic field—and hence light—is described by a set of
equations, known as Maxwell’s equations, which are an appropriate
starting point for a book on optics. For now, we shall focus on the
vacuum form of Maxwell’s equations, which may be written as [3]
$$\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} , \tag{1.1}$$
$$\nabla \times \mathbf{B} = \mu_0 \epsilon_0 \frac{\partial \mathbf{E}}{\partial t} , \tag{1.2}$$
$$\nabla \cdot \mathbf{E} = 0 , \tag{1.3}$$
$$\nabla \cdot \mathbf{B} = 0 , \tag{1.4}$$
where $\nabla$ is the gradient operator vector, $\mathbf{E}$ and $\mathbf{B}$ are the electric and
magnetic field vectors, and $\epsilon_0$ and $\mu_0$ are the electrical permittivity
and magnetic permeability of free space. The electromagnetic field is
manifest via the Lorentz force on a charge $q$ moving with velocity $\mathbf{v}$:
$$\mathbf{F} = q(\mathbf{E} + \mathbf{v} \times \mathbf{B}) . \tag{1.5}$$
In Section 1.7, we shall show that for $v$ less than the speed of light the
electric field term in eqn (1.5) is larger.
[3] The original twenty equations of Maxwell were reduced to four by
Oliver Heaviside (Camden 1850–Torquay 1925), who discovered
Maxwell’s Treatise on Electricity and Magnetism in Newcastle’s Literary
and Philosophical Society Library and went on to develop vector algebra.
The potential form of Maxwell’s equations (outlined in Appendix A)
is more elegant as there are only two equations: one for the
three-component vector potential $\mathbf{A}$ and one for the scalar potential $\phi$.
These four numbers completely specify the field in four-dimensional
space-time.
1.4 Maxwell’s wave equation

To derive the wave equation we take the curl of eqn (1.1)
$$\nabla \times (\nabla \times \mathbf{E}) = -\frac{\partial}{\partial t} (\nabla \times \mathbf{B}) , \tag{1.6}$$
and then use the vector identity [4]
$\nabla \times (\nabla \times \mathbf{E}) = \nabla(\nabla \cdot \mathbf{E}) - \nabla^2 \mathbf{E}$
together with eqn (1.3) on the left-hand side of eqn (1.6), plus eqn (1.2)
on the right-hand side, to find that
$$\nabla^2 \mathbf{E} - \frac{1}{c^2} \frac{\partial^2 \mathbf{E}}{\partial t^2} = 0 , \tag{1.7}$$
where $c = 1/\sqrt{\mu_0 \epsilon_0}$ is the speed of light. Maxwell’s wave equation,
eqn (1.7), is linear—there are no terms above first order in $\mathbf{E}$. Note
also that both the electric and magnetic fields are governed by the same
wave equation. [5] The challenge of wave optics is that even for the simplest
of scenarios, the solution of the vector wave eqn (1.7) is complicated.
However, as we shall see there are a number of simplifications that often
apply.
[4] Note that some care is needed in using this relation with
spherical-polar coordinates; see Robinson (1973) Chapter 10.
[5] Taking the curl of eqn (1.2) and substituting eqn (1.1) and eqn (1.4)
allows us to derive the wave equation for the magnetic field:
$$\nabla^2 \mathbf{B} - \frac{1}{c^2} \frac{\partial^2 \mathbf{B}}{\partial t^2} = 0 . \tag{1.8}$$
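The relation $c = 1/\sqrt{\mu_0 \epsilon_0}$ that accompanies eqn (1.7) is easy to check numerically. The following is a minimal Python sketch. The numerical values of the constants are entered by hand from standard tables (they are not quoted in the text), and the function name `wave_speed` is purely illustrative.

```python
import math

# Values entered by hand from standard tables (an assumption of this sketch,
# not taken from the text).
MU_0 = 4.0e-7 * math.pi        # magnetic permeability of free space, N A^-2
EPSILON_0 = 8.8541878128e-12   # electrical permittivity of free space, F m^-1


def wave_speed(mu_0: float, epsilon_0: float) -> float:
    """Return 1/sqrt(mu_0 * epsilon_0), the speed appearing in eqn (1.7)."""
    return 1.0 / math.sqrt(mu_0 * epsilon_0)


c_vacuum = wave_speed(MU_0, EPSILON_0)
print(f"c = {c_vacuum:.6e} m/s")
```

The result should agree with the defined speed of light, 299 792 458 m/s, to within the precision of the constants used.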
1.5 Principle of superposition
A key feature of the wave equation, eqn (1.7), is that it is linear in the
field. As a consequence the principle of linear superposition holds.
Principle of superposition
If $\mathbf{E}_1$ is a solution of the wave equation
and $\mathbf{E}_2$ is a solution of the wave equation,
then $\mathbf{E}_1 + \mathbf{E}_2$ is also a solution of the wave equation.
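The principle can be illustrated numerically for a scalar field in one spatial dimension. The sketch below is only an illustration under stated assumptions: it works in scaled units with $c = 1$, uses central finite differences to estimate the left-hand side of eqn (1.7), and all function names (`harmonic`, `wave_residual`, `e_sum`) are hypothetical. Two harmonic waves with $\omega = ck$ and their sum give a residual close to zero, whereas a wave with $\omega \neq ck$ does not.

```python
import math

C = 1.0  # wave speed in scaled units (assumption: c = 1 keeps the numbers simple)


def harmonic(amplitude, k, omega, phase=0.0):
    """Return E(z, t) = amplitude * cos(k z - omega t + phase)."""
    def field(z, t):
        return amplitude * math.cos(k * z - omega * t + phase)
    return field


def wave_residual(field, z, t, h=1e-3):
    """Central-difference estimate of d2E/dz2 - (1/C^2) d2E/dt2 at (z, t)."""
    d2_dz2 = (field(z + h, t) - 2.0 * field(z, t) + field(z - h, t)) / h**2
    d2_dt2 = (field(z, t + h) - 2.0 * field(z, t) + field(z, t - h)) / h**2
    return d2_dz2 - d2_dt2 / C**2


# Two solutions: each has omega = C * k, as the wave equation requires.
e1 = harmonic(1.0, k=2.0, omega=2.0 * C)
e2 = harmonic(0.5, k=3.0, omega=3.0 * C, phase=0.7)


def e_sum(z, t):
    """Superposition E1 + E2."""
    return e1(z, t) + e2(z, t)


# Counter-example: omega != C * k, so this field is not a solution.
not_a_solution = harmonic(1.0, k=2.0, omega=3.0)

z0, t0 = 0.3, 0.2  # arbitrary sample point
residuals = {
    "E1": wave_residual(e1, z0, t0),
    "E2": wave_residual(e2, z0, t0),
    "E1 + E2": wave_residual(e_sum, z0, t0),
    "omega != c k": wave_residual(not_a_solution, z0, t0),
}
for name, value in residuals.items():
    print(f"{name:>12}: residual = {value:+.3e}")
```

The first three residuals vanish to within rounding error; the last is of order one, because that field does not satisfy the wave equation.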
This is a powerful result. It allows us to build new solutions from known
solutions. [6]
[6] A recurring theme throughout the book is how (potentially complicated)
solutions of interest can always be constructed as a sum of simpler
solutions—like plane or spherical waves.
1.6 The harmonic wave solution
A particularly useful solution of the wave equation is a wave with
a particular wavelength $\lambda$. This is known as the harmonic wave
solution and corresponds to the case of monochromatic light. The
harmonic wave solution is written as
$$\mathbf{F} = \mathbf{F}_0 \cos(\mathbf{k} \cdot \mathbf{r} - \omega t + \phi_0) , \tag{1.9}$$
where $\mathbf{k} = (k_x, k_y, k_z)$ is known as the wave vector, $\omega$ is the angular
frequency, $\phi_0$ is a phase offset, and $\mathbf{F}_0$ is the amplitude and direction
of the field vector. The vector $\mathbf{F}$ may represent either $\mathbf{E}$ or $\mathbf{B}$. The two
main types of waves that we consider in Chapter 2—plane and spherical
waves—are both special cases of the harmonic wave solution, where the
difference resides in how $\mathbf{k}$ and $\mathbf{F}_0$ depend on the spatial coordinates. [7]
[7] The field amplitude $\mathbf{F}_0$ may also depend on time, in which case the
wave is no longer purely monochromatic. This is discussed in Chapter 7.
The magnitude of the wave vector, $k$, is related to the wavelength,
$\lambda$, via the relation
$$|\mathbf{k}| = k = \frac{2\pi}{\lambda} . \tag{1.10}$$
The period of the wave, $T$, is related to the angular frequency via the
relation $T = 2\pi/\omega$. For a single wave we can set $\phi_0 = 0$, but when we
consider more than one wave, we may need to include a phase offset.
The wave travels along the direction of the wave vector $\mathbf{k}$. By
considering the argument of the cosine, it is evident that the phase
velocity is given by $v_p = \omega/k$. On substituting the periodic solution
eqn (1.9) into the wave equation eqn (1.7) we find, as expected, that the
phase velocity in free space is simply the speed of light, i.e.
$$v_p = \frac{\omega}{k} = c . \tag{1.11}$$
Plots of the temporal and spatial evolution of the electric field for a
harmonic wave (with $\mathbf{k}$ along the $z$ axis) are depicted in Fig. 1.3. For
visible light, the field changes sign every femtosecond, which makes it
difficult to measure the field amplitude directly. In principle, the electric
field amplitude may be read out via the motion of electric charge, [8] but
this only works up to microwave frequencies (10–100 GHz). There are no
electronic components that respond fast enough to detect optical fields
with frequencies of hundreds of terahertz. Consequently, for infra-red
and visible light, we measure the time-averaged energy flux or intensity
instead, see Section 1.10.
[8] A positive or negative electric field amplitude corresponds to a positive
(upwards) or negative (downwards) force on a charge.
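The numbers quoted for a 600 nm wave follow directly from eqns (1.10) and (1.11). The short Python sketch below works them out; the function name `harmonic_wave_parameters` and the dictionary keys are illustrative choices, and the value of $c$ is entered by hand.

```python
import math

C = 2.99792458e8  # speed of light in vacuum, m/s (defined value)


def harmonic_wave_parameters(wavelength):
    """Return wave parameters for a vacuum wavelength given in metres."""
    k = 2.0 * math.pi / wavelength   # eqn (1.10), rad/m
    omega = C * k                    # from v_p = omega / k = c, eqn (1.11)
    return {
        "k": k,
        "omega": omega,
        "frequency": omega / (2.0 * math.pi),   # Hz
        "period": 2.0 * math.pi / omega,        # s, T = 2 pi / omega
        "spatial_frequency": 1.0 / wavelength,  # waves per metre
    }


params = harmonic_wave_parameters(600e-9)
print(f"frequency         = {params['frequency'] / 1e12:.1f} THz")
print(f"period            = {params['period'] / 1e-15:.2f} fs")
print(f"sign change every = {params['period'] / 2 / 1e-15:.2f} fs")
print(f"spatial frequency = {params['spatial_frequency']:.3e} per metre")
```

The period is close to 2 fs, so the field changes sign roughly once per femtosecond, consistent with the statement above.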
Fig. 1.3 A harmonic wave with wavelength $\lambda = 600$ nm travelling along
the $z$ axis: eqn (1.9) with $\mathbf{k} \cdot \mathbf{r} = kz$ and phase offset $\phi_0 = 0$.
(a) The magnitude of the electric field as a function of time at $z = 0$.
We observe a wave crest every 2 fs, i.e. 500,000 billion waves per
second, corresponding to a frequency of 500 THz. (b) The magnitude of the
electric field as a function of position at $t = 0$. We observe a wave crest
every 600 nm, corresponding to 1.67 million waves per metre, i.e. a spatial
frequency of $1.67 \times 10^6$ m$^{-1}$.
1.7 E or B?
For an electromagnetic wave in vacuum or a non-magnetic medium, the
electric and magnetic fields, $\mathbf{E}$ and $\mathbf{B}$, are linearly related, and it is
sufficient to consider either $\mathbf{E}$ or $\mathbf{B}$ only. To show this, take a harmonic
wave solution with $\mathbf{k}$ along the $z$ axis such that $\mathbf{E} = \mathbf{E}_0 \cos(kz - \omega t)$ and
$\mathbf{B} = \mathbf{B}_0 \cos(kz - \omega t)$. Using the Maxwell equation,
$\nabla \times \mathbf{E} = -\partial \mathbf{B}/\partial t$, we find that
$\mathbf{k} \times \mathbf{E}_0 = \omega \mathbf{B}_0$, and hence that
$$|\mathbf{E}_0 / \mathbf{B}_0| = c .$$
In general, we shall choose to work with $\mathbf{E}$ because it has the stronger
interaction with charges inside a medium. For example, the ratio of the
electric to the magnetic force from eqn (1.5) is
$$\left| \frac{\mathbf{F}_e}{\mathbf{F}_m} \right| = \left| \frac{q\mathbf{E}}{q\mathbf{v} \times \mathbf{B}} \right| = \frac{c}{v} , \tag{1.12}$$
where we have used the fact that for a harmonic wave in free space
$|\mathbf{E}/\mathbf{B}| = c$. In an insulator or dielectric the speed of a charge can be
estimated using the Bohr model of an atom. If the mean radius of a
bound electron is of the order of the Bohr radius $a_0$ then, recalling that
in the Bohr model, the angular momentum of excited states is quantized,
$mvr = \hbar$, we arrive at an electron speed of $v = \hbar/ma_0$. Consequently,
the ratio of the electric to the magnetic force is
$$\left| \frac{\mathbf{F}_e}{\mathbf{F}_m} \right| = \frac{c}{v} = \frac{c}{\hbar/ma_0} = \frac{1}{\alpha} , \tag{1.13}$$
where $\alpha \approx 1/137$ is the fine-structure constant. As $\mathbf{F}_e$ is over one
hundred times larger, it is often sufficient to consider only the electric
field component of the electromagnetic wave.
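The estimate in eqn (1.13) can be reproduced with a few lines of Python. This is a sketch only: the constants are entered by hand from standard tables (they are not quoted in the text), and the function name `bohr_speed` is hypothetical.

```python
# Constants entered by hand from standard tables (an assumption of this sketch).
HBAR = 1.054571817e-34   # reduced Planck constant, J s
M_E = 9.1093837015e-31   # electron mass, kg
A_0 = 5.29177210903e-11  # Bohr radius, m
C = 2.99792458e8         # speed of light in vacuum, m/s


def bohr_speed(hbar=HBAR, mass=M_E, radius=A_0):
    """Electron speed from m v r = hbar with r equal to the Bohr radius."""
    return hbar / (mass * radius)


v = bohr_speed()
force_ratio = C / v  # |F_e / F_m| = c / v, eqns (1.12) and (1.13)

print(f"v     = {v:.4e} m/s")
print(f"c / v = {force_ratio:.2f}")
```

The ratio comes out close to 137, the inverse of the fine-structure constant.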
1.8 Phasors
A convenient way to represent the phase of any wave at a particular
position and time is using a phasor—a unit vector that rotates
anti-clockwise in a fictional plane with an angle $\phi$ relative to the positive
horizontal axis. [9] For the harmonic wave solution, eqn (1.9), with $\mathbf{F}$
replaced by $\mathbf{E}$:
$$\mathbf{E} = \mathbf{E}_0 \cos\phi , \tag{1.14}$$
where the phase $\phi = \mathbf{k} \cdot \mathbf{r} - \omega t + \phi_0$. The phasor evolution for a wave
propagating along the $z$ axis with $\phi_0 = 0$ is shown in Fig. 1.4.
[9] In complex notation, see Section 1.11, this fictional plane corresponds
to the complex plane.
Fig. 1.4 (a) The magnitude of the electric field as a function of position for
a harmonic wave propagating along the $z$ axis at $t = 0$. (b) Phasor diagrams
corresponding to particular positions. For increasing $z$, the phasor rotates
anti-clockwise with a phasor angle, $\phi = kz$. In contrast, at $z = 0$, the phasor
rotates clockwise with time, $\phi = -\omega t$.
Warning:
A phasor is a unit vector in a fictional plane representing the phase
of a wave, φ. The axes are directions in virtual space, not real space.
A phasor vector is only a graphical representation of phase and has
nothing to do with the electric field (or polarization) vector.
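The two rotation senses described in the caption of Fig. 1.4 can be checked with a small Python sketch. It assumes scaled units in which the wavelength and the period are both one, and the helper names `phasor` and `rotation_sense` are illustrative only. The sign of the two-dimensional cross product between successive phasors gives the sense of rotation.

```python
import math

K = 2.0 * math.pi      # wave number for wavelength = 1 (scaled units, assumed)
OMEGA = 2.0 * math.pi  # angular frequency for period = 1 (scaled units, assumed)


def phasor(z, t, phase_offset=0.0):
    """Unit vector (cos phi, sin phi) with phi = K z - OMEGA t + phase_offset."""
    phi = K * z - OMEGA * t + phase_offset
    return (math.cos(phi), math.sin(phi))


def rotation_sense(p, q):
    """z-component of p x q: positive if q lies anti-clockwise from p."""
    return p[0] * q[1] - p[1] * q[0]


step = 0.1  # one tenth of a wavelength, or one tenth of a period

# Move along z at fixed time, then advance time at fixed position.
along_z = rotation_sense(phasor(0.0, 0.0), phasor(step, 0.0))
in_time = rotation_sense(phasor(0.0, 0.0), phasor(0.0, step))

print("increasing z:", "anti-clockwise" if along_z > 0 else "clockwise")
print("increasing t:", "anti-clockwise" if in_time > 0 else "clockwise")
```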
1.9 Spatial frequency

As well as frequency—the number of waves passing a point in unit time—
we can define a spatial frequency as the number of waves per unit
length—or wave ‘density’. Spatial frequency has the units of inverse
length, and is the real-space analogue of frequency. Any periodic wave
form, or structure, has a dominant, or fundamental, spatial frequency.
For example, the spatial frequency of rows of vines in a vineyard is about
0.5 per metre, i.e. one row every 2 m. Figure 1.5 shows a visualization of
periodic patterns where the lower image has double the spatial frequency
and hence half the wavelength. In these examples the spatial frequency
is zero in the vertical direction. As another example, a brick wall has a
higher spatial frequency in the vertical than the horizontal direction.
For monochromatic light, the number of waves per unit length in the
propagation direction is equal to the inverse of the wavelength and is
often referred to as the wave number:
ν̃ = 1/λ . (1.15)
The magnitude of the wave vector is the phase change per unit length
and equal to 2π times the wave number:
k = 2πν̃ = 2π/λ . (1.16)

Fig. 1.5 Wave structures with different horizontal spatial frequency. The
lower figure has double the spatial frequency of the upper figure. Neither
wave varies along the vertical direction, consequently they have zero
vertical spatial frequency.
In three dimensions we can count waves in any direction and there are
different spatial frequencies associated with each dimension—however
only two are independent. If, for example, we consider a wave
propagating at an angle $\theta$ relative to the $z$ axis in the $xz$ plane, as
in Fig. 1.6, then the number of waves per unit length along the $x$ axis is
$$u = \frac{1}{\lambda} \sin\theta . \tag{1.17}$$
This quantity is known as the spatial frequency along $x$, and also
has the units of inverse length. The spatial frequency along the $z$ axis
is $(1/\lambda) \cos\theta$. By definition, the spatial frequency associated with any
direction must be less than or equal to the wave number.
For the harmonic wave solution, eqn (1.9), the spatial frequency along
a particular direction is related to the component of the wave vector in
that direction. For the $x$ direction,
$$k_x = k \sin\theta = \frac{2\pi}{\lambda} \sin\theta = 2\pi u . \tag{1.18}$$
Consequently, the magnitude of each component of the wave vector is
equal to the phase change (in radians) per unit length in that direction.
The magnitude of the wave vector is simply $2\pi$ radians times the spatial
frequency, and has units of rad m$^{-1}$. This is the spatial analogue of the
phase change per unit time or angular frequency, $\omega$.
Fig. 1.6 A harmonic wave propagating at an angle $\theta$ relative to the $z$ axis
in the $xz$ plane. The spatial frequency along $x$ is $u = \sin\theta/\lambda$, and along
$z$ is $\cos\theta/\lambda$. In the small-angle approximation, these expressions become
$\theta/\lambda$ and
$$\frac{1}{\lambda}\left(1 - \frac{u^2 \lambda^2}{2}\right) ,$$
respectively.
In summary, spatial frequency is the number of waves per unit length
in a particular direction. For a harmonic wave—monochromatic light—
the components of the wave vector (kx , ky , and kz ) are the phase change
per unit length in a particular direction (x, y, and z, respectively).
Although we could call the components of the wave vector angular spatial
frequencies, to avoid confusion between the angle in angular frequency
and the angle of propagation, we avoid this nomenclature.
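As a worked example of eqns (1.17) and (1.18), the Python sketch below evaluates the spatial frequencies of a 600 nm wave travelling at 0.1 rad to the $z$ axis and compares them with the small-angle expressions. The wavelength and angle are example values chosen here, and the symbol `w` for the spatial frequency along $z$ is a label introduced only for this sketch.

```python
import math


def spatial_frequencies(wavelength, theta):
    """Return (u, w): waves per unit length along x and along z."""
    u = math.sin(theta) / wavelength  # eqn (1.17)
    w = math.cos(theta) / wavelength  # spatial frequency along z
    return u, w


wavelength = 600e-9  # metres (example value)
theta = 0.1          # propagation angle in radians (example value)

u, w = spatial_frequencies(wavelength, theta)
k = 2.0 * math.pi / wavelength  # magnitude of the wave vector, eqn (1.16)
k_x = 2.0 * math.pi * u         # eqn (1.18)

# Small-angle expressions: theta / lambda and (1 - u^2 lambda^2 / 2) / lambda.
u_small_angle = theta / wavelength
w_small_angle = (1.0 - (u * wavelength) ** 2 / 2.0) / wavelength

print(f"u   = {u:.4e} per metre (small angle: {u_small_angle:.4e})")
print(f"w   = {w:.4e} per metre (small angle: {w_small_angle:.4e})")
print(f"k_x = {k_x:.4e} rad/m, k sin(theta) = {k * math.sin(theta):.4e} rad/m")
```

Because $u^2 + w^2 = 1/\lambda^2$, fixing the wavelength and one spatial frequency determines the other up to a sign, and neither can exceed the wave number $1/\lambda$.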
1.10 Intensity/Poynting vector

In optics, rather than measuring the electromagnetic field directly, often
we measure the time-averaged energy flux or intensity. To find a
mathematical expression for the energy flux we start from the energy
density of an electromagnetic wave, which can be found by re-writing
Maxwell’s equations in the form of a continuity equation:
$$\frac{\partial u}{\partial t} + \nabla \cdot \mathbf{S} = 0 , \tag{1.19}$$
where $u$ is the energy density inside a volume and $\mathbf{S}$ is the energy flux out
of the volume. The continuity equation expresses the conservation of
energy. [10] From Maxwell’s equations we find that [11] the energy density is
$$u = \frac{1}{2}\left(\epsilon_0 E^2 + \frac{B^2}{\mu_0}\right) , \tag{1.20}$$
and the flux—corresponding to the energy flow per unit area per unit
time—is given by
$$\mathbf{S} = \frac{1}{\mu_0} \mathbf{E} \times \mathbf{B} . \tag{1.21}$$
The quantity $\mathbf{S}$ is known as the Poynting vector. [12]
[10] We can check this by integrating the flux over a closed surface, $\sigma$;
then, using the divergence theorem and the continuity equation, we find
$$\oint_\sigma \mathbf{S} \cdot \hat{\mathbf{n}} \, d\sigma = \iiint_V \nabla \cdot \mathbf{S} \, dV = -\iiint_V \frac{\partial u}{\partial t} \, dV = -\frac{\partial U}{\partial t} ,$$
where $U$ is the energy inside the volume $V$ enclosed by $\sigma$. Exercise (1.15)
considers the energy density inside a capacitor.
[11] Taking the dot product of the first and second Maxwell equation,
eqn (1.1) and eqn (1.2), with $\mathbf{B}/\mu_0$ and $\mathbf{E}$, respectively, we have
$$\frac{\mathbf{B}}{\mu_0} \cdot (\nabla \times \mathbf{E}) = -\frac{\mathbf{B}}{\mu_0} \cdot \frac{\partial \mathbf{B}}{\partial t} ,$$
$$\mathbf{E} \cdot \left(\nabla \times \frac{\mathbf{B}}{\mu_0}\right) = \epsilon_0 \mathbf{E} \cdot \frac{\partial \mathbf{E}}{\partial t} .$$
Subtracting the first equation from the second and using a vector identity
we find
$$-\nabla \cdot \left(\frac{1}{\mu_0} \mathbf{E} \times \mathbf{B}\right) = \frac{\partial}{\partial t}\left[\frac{1}{2}\left(\epsilon_0 E^2 + \frac{1}{\mu_0} B^2\right)\right] .$$
[12] Named after John Henry Poynting (Eccles 1852–Birmingham 1914),
Professor at the University of Birmingham.
Over timescales longer than the optical period, we are only interested
in the time average of the energy flux, which is given by the time average
of the Poynting vector’s magnitude. In Chapter 2 we shall show that
for a harmonic wave $\mathbf{F} = \mathbf{F}_0 \cos(\mathbf{k} \cdot \mathbf{r} - \omega t)$ with $\mathbf{E}_0$ and $\mathbf{B}_0$ constant in
space (a plane wave), $\mathbf{E} \times \mathbf{B} = \hat{\mathbf{n}} |\mathbf{E}|^2 / c$, where $\hat{\mathbf{n}}$ is a unit vector in the
direction of propagation ($\mathbf{k} = k \hat{\mathbf{n}}$). Using this result we can write the
Poynting vector as
$$\mathbf{S} = \epsilon_0 c |\mathbf{E}_0|^2 \cos^2(\mathbf{k} \cdot \mathbf{r} - \omega t) \, \hat{\mathbf{n}} , \tag{1.22}$$
where we have used $c^2 = 1/\mu_0 \epsilon_0$. The intensity, $I$, of the electromagnetic
radiation at position $\mathbf{r}$ is given by the time average of the
magnitude of the Poynting vector:
$$I = \langle S \rangle = \frac{1}{T} \int_0^T S \, dt = \frac{1}{2} \epsilon_0 c E_0^2 , \tag{1.23}$$
where the $\frac{1}{2}$ comes from the average of $\cos^2 \omega t$ and $E_0$ is the magnitude
of $\mathbf{E}_0$. The key aspect is that the intensity of a light wave—the quantity
measured by optical detectors—is proportional to the square of the
amplitude of the electric field. If we count photons, the intensity
tells us how many photons we expect per unit area per unit time. In
Section 1.11, we shall derive the equivalent result using complex notation
for the field.
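Both ingredients of eqn (1.23) can be checked numerically: the factor of one half from averaging $\cos^2 \omega t$ over a period, and the resulting intensity for a given field amplitude. The Python sketch below is illustrative only. The constants are entered by hand, the field amplitude of 1000 V/m is an arbitrary example, and the function names are hypothetical.

```python
import math

# Constants entered by hand from standard tables (an assumption of this sketch).
EPSILON_0 = 8.8541878128e-12  # electrical permittivity of free space, F/m
C = 2.99792458e8              # speed of light in vacuum, m/s


def time_average_cos2(samples=1000):
    """Midpoint-rule average of cos^2(omega t) over one optical period."""
    total = 0.0
    for j in range(samples):
        phase = 2.0 * math.pi * (j + 0.5) / samples  # omega * t across one period
        total += math.cos(phase) ** 2
    return total / samples


def intensity(e0):
    """Time-averaged intensity in W/m^2 for field amplitude e0 in V/m, eqn (1.23)."""
    return 0.5 * EPSILON_0 * C * e0**2


average = time_average_cos2()
example_intensity = intensity(1.0e3)  # arbitrary example amplitude of 1000 V/m

print(f"<cos^2> over one period = {average:.6f}")
print(f"I for E0 = 1000 V/m     = {example_intensity:.1f} W/m^2")
```

Doubling the field amplitude quadruples the intensity, which is the quadratic dependence emphasized in the text.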
1.11 Complex representation

The amplitude and phase of a wave can be conveniently represented by a complex number,13 which also has two parameters. In the complex representation, we expand the cosine in the harmonic wave solution, eqn (1.14), using Euler’s formula:
$$\mathbf{E} = \mathbf{E}_+ + \mathbf{E}_- , \tag{1.24}$$
where
$$\mathbf{E}_\pm = \tfrac{1}{2}\mathbf{E}_0 e^{\pm i\phi} , \tag{1.25}$$
and $\phi = \mathbf{k}\cdot\mathbf{r} - \omega t + \phi_0$. The ‘shorthand’ is to choose to work with only the so-called ‘positive’ frequency component, $\mathbf{E}_+$. We absorb the factor of $\tfrac{1}{2}$ into the amplitude and omit the subscript. The complex form of the harmonic wave solution, eqn (1.9), is written as
$$\mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} . \tag{1.26}$$
Sometimes, we may omit the explicit time dependence and write
$$\mathbf{E} = \mathbf{E}_0 e^{i\mathbf{k}\cdot\mathbf{r}} . \tag{1.27}$$

13 Complex is an unfortunate label for something that we introduce to make life simpler. One of the pioneers of complex numbers, il giocatore di Milano, Gerolamo Cardano (Pavia 1501–Rome 1576), in his 1545 book Ars Magna called them quantitas sophistica. Cardano became interested in the solution of cubic equations following on from the work of Scipione del Ferro (1465–1526) and Niccolo Fontana Tartaglia (known as the stammerer) (Brescia 1499–Venice 1557). He describes the equation,
$$(5 + \sqrt{-15})(5 - \sqrt{-15}) = 40 ,$$
as so subtle that it is useless. The utility of complex numbers was further developed by Rafael Bombelli (Bologna 1526–Rome 1572) who introduced the symbol i in his book L’Algebra. Maxwell, in line with the common-sense school of nineteenth century philosophy, wrote that ‘Imaginary quantities, . . . , have no place in physical science’, but following the work of William Rowan Hamilton (Dublin 1805–65), the complex shorthand became widespread and any early scruples were forgotten.

The convenience of complex notation is illustrated by an operation such as a phase shift. In complex notation, to shift the phase of a wave by an amount $\phi$ relative to a reference wave, we simply multiply by a phase factor as follows:
$$\mathbf{E}' = \mathbf{E}e^{i\phi} , \tag{1.28}$$
whereas for a real field it would be necessary to change the argument of the cosine. We shall encounter such phase shifts in many scenarios, such as in the context of polarization, see Chapter 4.
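
As a small illustration of the phase-shift rule, eqn (1.28), the sketch below multiplies a complex scalar wave by a phase factor and compares its real part with a real cosine whose argument has been shifted by the same amount. The wavelength, position, time, and function names are assumptions chosen for the example.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def real_field(e0, k, omega, z, t, phase=0.0):
    """Real harmonic wave E0 cos(kz - wt + phase)."""
    return e0 * math.cos(k * z - omega * t + phase)


def complex_field(e0, k, omega, z, t):
    """Complex shorthand E0 exp[i(kz - wt)], a one-dimensional form of eqn (1.26)."""
    return e0 * cmath.exp(1j * (k * z - omega * t))


e0 = 1.0
k = 2 * math.pi / 500e-9  # assumed wavelength of 500 nm
omega = C * k
z, t = 0.3e-6, 0.4e-15    # arbitrary sample point
phi = math.pi / 3         # arbitrary phase shift

# Complex notation: multiply by a phase factor, then take the real part.
shifted = complex_field(e0, k, omega, z, t) * cmath.exp(1j * phi)
# Real notation: change the argument of the cosine.
expected = real_field(e0, k, omega, z, t, phase=phi)
# A factor of i is a phase shift of pi/2 (an in-quadrature component).
quadrature = (1j * complex_field(e0, k, omega, z, t)).real

print(shifted.real, expected)
print(quadrature, real_field(e0, k, omega, z, t, phase=math.pi / 2))
```

Each printed pair should agree to within rounding error; only the real part of the complex field is compared with the real wave.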
Although complex numbers provide a convenient shorthand, all
measurable quantities including E are real, and complex notation may
disguise important physics, as a seemingly innocuous factor of i signifies a
field component that oscillates with a different phase. A factor i = eiπ/2
corresponds to a field component that oscillates in-quadrature with
a phase difference of π/2 relative to the original wave. Such phase
differences have profound effects in light propagation and scattering.
An important consequence of the complex shorthand is a change in
how we calculate intensity. Previously, we found the time average of the
Poynting vector, eqn (1.23), $I = \langle S\rangle = \tfrac{1}{2}\epsilon_0 c E_0^2$, where the factor of $\tfrac{1}{2}$ arises from
the average of the cosine squared. For a complex field where the time
dependence has been neglected, we take the modulus squared, and
add the time-average factor explicitly:
$$I = \tfrac{1}{2}\epsilon_0 c\,|\mathbf{E}\cdot\mathbf{E}^*| . \tag{1.29}$$
Substituting eqn (1.27) gives
$$I = \tfrac{1}{2}\epsilon_0 c E_0^2 , \tag{1.30}$$
as before. As the intensity is a time-averaged quantity it does not depend

on the overall phase; however, later, when we look at more than one wave
we shall find that relative phases do matter.
Finally, another oddity of complex notation is that in a sum of many
waves—see Chapter 6—we shall need to include negative frequencies,
whereas for real fields a sum over real positive frequencies is sufficient.
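
The intensity rule of eqn (1.29) can be sketched for a scalar complex field as follows. The example also illustrates the two remarks above: an overall phase factor leaves the intensity unchanged, whereas the relative phase between two added waves does not. The amplitude, wavelength, and function name are assumptions for illustration.

```python
import cmath
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m
C = 299_792_458.0        # speed of light in m/s


def intensity(field):
    """Eqn (1.29) for a scalar complex field: I = (1/2) eps0 c |E|^2."""
    return 0.5 * EPS0 * C * abs(field) ** 2


e0 = 100.0                # assumed amplitude in V/m
k = 2 * math.pi / 500e-9  # assumed wavelength of 500 nm
z = 1.23e-6               # arbitrary position

field = e0 * cmath.exp(1j * k * z)  # scalar form of eqn (1.27)
single = intensity(field)
single_shifted = intensity(field * cmath.exp(1j * 1.0))  # overall phase only

# Two equal waves: here the relative phase decides the result.
in_phase = intensity(field + field)
out_of_phase = intensity(field + field * cmath.exp(1j * math.pi))

print(single, single_shifted)
print(in_phase / single, out_of_phase / single)
```

The ratio for the in-phase pair should be close to 4 and the ratio for the out-of-phase pair close to 0.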
1.12 Scalar approximation

In free space and most optical media, the vector wave equation eqn (1.7)
is separable into scalar equations for each component of E or B, and we
can write a scalar wave equation for each component. If we are not
interested in polarization effects, it may be sufficient to consider only a
single component of E, and we can replace the vector field E by a scalar
amplitude E. This is known as the scalar approximation. The scalar
wave equation for one component of E is
$$\nabla^2 E - \frac{1}{c^2}\frac{\partial^2 E}{\partial t^2} = 0 . \tag{1.31}$$
For the special case where the field varies in only one spatial dimension, say z, the wave equation reduces to
$$\frac{\partial^2 E}{\partial z^2} - \frac{1}{c^2}\frac{\partial^2 E}{\partial t^2} = 0 . \tag{1.32}$$
As we shall see next, the one-dimensional wave equation has a very general solution corresponding to a propagating wave form.

The scalar approximation breaks down if we tilt the field vector too far relative to the propagation direction, as in the case of strong focusing illustrated in Fig. 1.7. We shall discuss the full vector theory of focusing in Chapter 12. A lens changes both the distribution of wave vectors, $\mathbf{k}$, and the distribution of electric field vectors, $\mathbf{E}$. The scalar approximation says we can neglect this change in $\mathbf{E}$, which is only approximately true as long as the range of propagation angles relative to the optical axis remains sufficiently small. The regime where the spread in propagation angles is not too large is known as paraxial optics, see Section 2.13. It is apparent from Fig. 1.7 that the scalar approximation is only applicable in this paraxial regime. The scalar approximation is known to break down when a light beam is tightly focused, and the longitudinal components become significant, as is depicted in Fig. 1.7.

Fig. 1.7 Intensity distribution in the xz plane corresponding to the x and z field components for focused light (high intensity is white, zero intensity is black). In the scalar approximation, we only retain the dominant component of the field; in this example, $E_x$, and the associated intensity, $I_x$. The effect of a lens is to tilt the electric field vector, converting a part of $E_x$ into $E_z$. For small tilt angles, the scalar approximation remains valid. However, for larger tilt angles, a significant fraction of π/2 (the strong focusing limit), other electric field components appear (lower image) and a full vector treatment is needed, see Chapter 12.
1.13 General solution

Solutions to the one-dimensional scalar wave equation eqn (1.32) will be functions of the variables z and t. Here we show that any function, E, of the form $E_0 f(z - ct)$ is a solution for E.14 Let $f'$ and $f''$ represent the first and second derivatives, respectively, of the function f with respect to the argument $(z - ct)$, then
$$\frac{\partial f}{\partial z} = f'(z - ct) , \tag{1.33}$$
and
$$\frac{\partial^2 f}{\partial z^2} = f''(z - ct) . \tag{1.34}$$
In addition
$$\frac{\partial f}{\partial t} = -c f'(z - ct) , \tag{1.35}$$
moreover
$$\frac{\partial^2 f}{\partial t^2} = c^2 f''(z - ct) . \tag{1.36}$$
Thus by substituting eqn (1.34) and eqn (1.36) into eqn (1.32) we conclude that $E = E_0 f(z - ct)$ is a solution to the one-dimensional wave equation. Furthermore, a similar analysis shows that $E = E_0 g(z + ct)$ is also a solution for any function, $g(z, t)$, of the form $g(z + ct)$. Using the principle of superposition we see that the most general solution to the one-dimensional wave equation is $E = E_0[f(z - ct) + g(z + ct)]$. We refer to $f(z - ct)$ and $g(z + ct)$ as travelling-wave solutions, for reasons that are explained in Fig. 1.8. The solution $E = E_0 f(z - ct)$ evidently represents an electric-field wave travelling to the right, with speed c; whereas the solution $E = E_0 g(z + ct)$ represents an electric-field wave travelling to the left, also with speed c.

14 We use the physics definition of ‘any’, there are some constraints on the function such as it being twice differentiable.

Fig. 1.8 Generic solutions to the one-dimensional wave equation corresponding to a wave propagating from left to right (a) $f(z - ct)$, and from right to left (b) $g(z + ct)$. The plots show the wave at successive times $t_C > t_B > t_A$.
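
The argument above can be checked numerically. The following is a minimal sketch, assuming a Gaussian wave form (any twice-differentiable function would do), that estimates both second derivatives in eqn (1.32) by central differences for a right-going and a left-going pulse. The pulse width, sample point, step size, and function names are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def pulse(u, width=1e-6):
    """Assumed wave form: a Gaussian in the single argument u = z -/+ ct."""
    return math.exp(-((u / width) ** 2))


def field(z, t, direction=+1):
    """direction=+1 gives f(z - ct); direction=-1 gives g(z + ct)."""
    return pulse(z - direction * C * t)


def second_derivatives(z, t, direction=+1, dz=1e-8):
    """Central-difference estimates of d2E/dz2 and d2E/dt2."""
    dt = dz / C
    centre = field(z, t, direction)
    d2z = (field(z + dz, t, direction) - 2 * centre
           + field(z - dz, t, direction)) / dz**2
    d2t = (field(z, t + dt, direction) - 2 * centre
           + field(z, t - dt, direction)) / dt**2
    return d2z, d2t


for direction in (+1, -1):
    d2z, d2t = second_derivatives(0.5e-6, 1e-15, direction)
    residual = d2z - d2t / C**2  # left-hand side of eqn (1.32)
    print(direction, d2z, residual)
```

For both directions the residual should be many orders of magnitude smaller than the individual second-derivative terms, as expected if the pulse satisfies the wave equation.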
1.14 Propagation
One of the most important questions in optics is how does light
propagate from A to B? Often, we know the field in a particular input
plane, and want to know the field everywhere else, or at least in a
particular observation plane. A large part of this book is devoted
to finding an expression for the field ‘downstream’ of the input plane.
We shall find—in Chapters 5 and 6, respectively—that the light field
anywhere can be written as a sum of waves with either planar or curved
wave fronts each with a particular phase. So the short answer to the
question, how does light propagate from A to B? is that it is all about the
phase. As momentum is conserved, a more accurate statement is that it
is all about phase and momentum. Using these simple concepts—adding
waves, paying attention to their relative phase, and the conservation of

momentum—we can predict most optical phenomena.
A light propagation scenario is illustrated in Fig. 1.9. We know the input field in the z = 0 plane which we write as
$$E^{(0)} = E_0 f(x', y') , \tag{1.37}$$
where $E_0$ is the field amplitude and $f(x', y')$ is a dimensionless function that contains all the information about the initial transverse spatial field distribution. The superscript on $E^{(0)}$ is used to denote the z = 0 plane. We use the dashed variables, $x'$ and $y'$, for the transverse coordinates in the input plane so that we can distinguish them from displacements, $x$ and $y$, in the observation plane. From the field we can find the intensity, $I^{(0)} = I_0|f(x', y')|^2$.

We assume that light propagates predominantly along the z axis, known as the optical axis. The field in the observation plane, a distance z downstream, is written as
$$E^{(z)} = E(x, y) , \tag{1.38}$$
where now the superscript denotes the plane at a propagation distance z and the function $E(x, y)$ is to be determined. We shall find expressions for $E^{(z)}$ in subsequent chapters. Once we have $E^{(z)}$ we convert this to an intensity $I^{(z)}$, which is what we measure. An example of light propagation through a two-lens system is shown in Fig. 1.10. We shall discuss how this image is calculated in Chapter 6.
Fig. 1.9 The intensity distribution in the input plane at z = 0 is $I^{(0)} = I_0|f(x', y')|^2$. The intensity distribution a distance z downstream is $I^{(z)}$. High intensity is light grey, low intensity is darker.

1.15 Waves and quanta

In the twentieth century we learned that the electromagnetic field is quantized and began to call the quanta of energy photons. For a monochromatic light field with wave vector $\mathbf{k}$, a photon has an energy $\hbar\omega$ and a momentum
$$\mathbf{p} = \hbar\mathbf{k} . \tag{1.39}$$
This expression relates the propagation direction specified by $\mathbf{k}$ to the momentum, and hence a distribution of $\mathbf{k}$ corresponds to a distribution of momentum. As the magnitude of the wave vector is $k = 2\pi/\lambda$, a photon with wavelength $\lambda$ has a momentum $p = h/\lambda$, which is known as the de Broglie relation, named after Louis Victor Pierre Raymond (Dieppe 1892–Louveciennes 1987), 7th duc de Broglie.15

15 de Broglie postulated that all matter has wave-like properties and received the Nobel Prize for Physics in 1929 for his discovery of the wave nature of electrons, but it is his mathematical relationship between momentum and wavelength that is of most interest in optics.

A consequence of the wave–particle duality of light and matter is that the wave concept, learned in optics, is also applicable in quantum mechanics. In fact, light and non-relativistic massive particles can be described by an identical wave equation! Substituting a harmonic wave solution, $E = E_0\cos(kz - \omega t)$, into the scalar wave equation eqn (1.31), we obtain
$$\nabla^2 E + k^2 E = 0 , \tag{1.40}$$
where $k = \omega/c$.
This time-independent form of the wave equation is known as the Helmholtz equation. The time-independent Schrödinger equation for the wave function, $\psi$, of a particle of mass $m$ and energy $E$ in a potential $V$,
$$-\frac{\hbar^2}{2m}\nabla^2\psi + V\psi = E\psi , \tag{1.41}$$
can also be written in the form of a Helmholtz equation,
$$\nabla^2\psi + k^2\psi = 0 , \tag{1.42}$$
where $k = \sqrt{2m(E - V)/\hbar^2}$. Consequently, the same wave theory works in both cases. The analogy works both ways. One can either think of optical photons as being confined in potentials created by matter, as in an optical fibre or waveguide; or particles confined by potentials which can be created by light, as in optical tweezers (Adams et al. 1994). We shall make use of this analogy in later chapters, particularly when we consider what the field looks like inside an aperture and when we discuss the light distribution inside an optical fibre, see Chapter 11.

Fig. 1.10 Intensity map for light propagating through a two-lens system. Black and white correspond to zero and peak intensity, respectively, but where is the photon? How to calculate this image is discussed in Chapter 6.

We should keep in mind that the light field is inherently lumpy. We shall often write the field amplitude, $E_0$, as if it were a constant, when in fact it contains fluctuations, see Chapter 8; $E_0$ is the average of a
fluctuating field. A final caveat on the use of the photon concept is
that although we often talk about them, the photonic character of the
field is only really important if we are sensitive to individual photon
correlations in a way that goes beyond classical wave theory. The
story of correlated photons lies in the realm of quantum optics, see
e.g. Loudon (2000). Verifying that light is a quantum phenomenon
requires both interference and counting, i.e. both the wave and particle
character of the field. The reality of photons is referenced to correlations
in the detected signal. To emphasize this point, Roy Jay Glauber
(New York City 1925–)—co-recipient of the 2005 Nobel Prize in Physics
for his contribution to the quantum theory of optical coherence—says
(Roychoudhuri 2008): A photon is what a photon detector detects, and
A photon is where a photon detector detects it.
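
To attach numbers to the photon energy $\hbar\omega$ and momentum $p = \hbar k$ introduced in this section, the following sketch evaluates both for a single wavelength and converts an intensity into a photon flux (photons per unit area per unit time). The wavelength of 500 nm, the intensity of 1 W m⁻², and the function names are assumptions chosen for illustration.

```python
import math

H = 6.62607015e-34        # Planck constant in J s (exact SI value)
HBAR = H / (2 * math.pi)  # reduced Planck constant in J s
C = 299_792_458.0         # speed of light in m/s


def photon_energy_and_momentum(wavelength):
    """Return (hbar*omega, hbar*k) for light of the given vacuum wavelength."""
    k = 2 * math.pi / wavelength  # magnitude of the wave vector
    omega = C * k                 # k = omega/c, as below eqn (1.40)
    return HBAR * omega, HBAR * k


def photon_flux(intensity, wavelength):
    """Photons per unit area per unit time for an intensity in W/m^2."""
    energy, _ = photon_energy_and_momentum(wavelength)
    return intensity / energy


wavelength = 500e-9  # assumed wavelength in m
energy, momentum = photon_energy_and_momentum(wavelength)
flux = photon_flux(1.0, wavelength)
print(f"E = {energy:.4e} J, p = {momentum:.4e} kg m/s, flux = {flux:.4e} m^-2 s^-1")
```

The momentum returned here equals $h/\lambda$, the de Broglie relation quoted above.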
Chapter summary
• Maxwell’s equations show that light is described by an
electromagnetic wave.
• Light–matter interactions are dominated by the interaction
between the electric field and charges in a medium.
• As the wave equation is linear we can use the principle of
superposition to construct new solutions E 1 +E 2 from known
solutions E 1 and E 2 .
• A harmonic wave has a single frequency, ω/2π, and in optics
corresponds to monochromatic light. Plane and spherical waves
are special cases of harmonic wave solutions.
• The density of wave fronts (the number of wave crests per unit
length) is known as the spatial frequency.
• Although monochromatic light has a single frequency it still has
different spatial frequencies in different directions. The spatial
frequency in a particular direction is equal to the component of
the wave vector in that direction divided by 2π, e.g. along x,
u = kx /2π.
• As the frequency of optical fields is a few hundred terahertz, we
cannot measure the electric field directly and instead measure the
time-averaged energy flux or intensity, which is proportional
to the square of the field.
• Often in optics it is possible to consider only one component of the
field. This is known as the scalar approximation.
• The general solution to the one-dimensional wave equation is of
the form f(z − ct) + g(z + ct), where f(z − ct) and g(z + ct) represent
waves travelling along the +z and −z directions, respectively.
• The link between the wave-like (wavelength or wave vector) and
particle-like (momentum) properties of light is made via the de
Broglie relation, $\mathbf{p} = \hbar\mathbf{k}$.
Exercises

(1.1) Speed of light
The speed of light has been defined as $c = 299\,792\,458$ m s$^{-1}$ (exact). Likewise, the value for the vacuum permeability is exact, $\mu_0 = 4\pi\times10^{-7}$ N A$^{-2}$ (exact). The value for the permittivity of free space is thus defined by the relation $c^2 = 1/(\mu_0\epsilon_0)$. Evaluate $\epsilon_0$ to four significant figures.

(1.2) Photons in a beam
A laser pointer emits light of wavelength $\lambda = 650$ nm in a beam of power 2 mW. How many photons per second are emitted?

(1.3) Electric and magnetic fields
From the fundamental definitions of electric and magnetic fields, show that their ratio has dimensions of speed.

(1.4) Period, wavelength, and frequency
A laser emits light of wavelength $\lambda = 632.8$ nm which propagates in vacuum and to an excellent approximation takes the form of the harmonic wave, eqn (1.9). What are (i) the linear frequency, and (ii) the angular frequency of the light? What is the temporal delay at a given point in space between sequential occurrences of (iii) maximum electric field, (iv) zero electric field, and (v) maximum intensity?

(1.5) Spatial frequency and angular spatial frequency
Define spatial frequency. How does the spatial frequency of light in the propagation direction relate to the magnitude of the wave vector?

(1.6) Spatial frequencies everywhere
Order the following in terms of increasing spatial frequency: (i) Bricks in a wall in a horizontal direction. (ii) Bricks in a wall in a vertical direction. (iii) The horizontal lines on a human forehead (furrows on a furrowed brow). (iv) Row of vines in a vineyard. (v) The teeth of a comb.

(1.7) Angular spatial frequency: numerical value
What is the magnitude of the wave vector for light with a wavelength of 500 nm? Write your answer in the form $k = 2\pi u$, where $u$ has the units of m$^{-1}$.

(1.8) Frequency to spatial frequency
What is the spatial frequency of buses if there are two per hour and their average speed is 20 km h$^{-1}$?

(1.9) Harmonic wave (1)
Sketch the form of the harmonic wave $E = E_0\cos(kz - \omega t)$, with a wavelength $\lambda = 0.5$ μm, at $t = 0$, for z in the range $-1.5$ μm $\le z \le 1.5$ μm.

(1.10) Harmonic wave (2)
Sketch the form of the harmonic wave $E = E_0\cos(kz - \omega t)$, with a wavelength $\lambda = 0.5$ μm, at $z = 0$, for t in the range $-5$ fs $\le t \le 5$ fs.

(1.11) Wave propagation (1)
Verify that $E = E_0[f(z - ct) + g(z + ct)]$ is indeed a solution of the one-dimensional wave equation.

(1.12) Wave propagation (2)
Consider the wave $E = E_0/[1 + 4(z - ct)^2/a^2]$. Sketch the wave form at $t = 0$ as a function of the dimensionless variable $z/a$ for the range $-5 \le z/a \le 5$. What is the interpretation of the parameter $a$? Add to your sketch two other wave forms evaluated at $t = a/c$ and $t = 2a/c$. Comment on the temporal evolution.

(1.13) Wave propagation (3)
Consider the wave $E = E_0\,\mathrm{sech}[(z + ct)/b]$, where $\mathrm{sech}(x) = 2/(e^x + e^{-x})$. Sketch the intensity wave form at $t = 0$ as a function of the dimensionless variable $z/b$ for the range $-5 \le z/b \le 5$. What is the interpretation of the parameter $b$? Add to your sketch intensity wave forms evaluated at $t = b/c$ and $t = 2b/c$. Comment on the temporal evolution of the wave.

(1.14) Poynting vector
From the fundamental definitions of electric and magnetic fields and vacuum permeability, show that the Poynting vector has dimensions of energy per area per time.

(1.15) Energy density
Consider the work done in charging a capacitor up to a voltage $V$:
$$U = \int_0^\infty IV\,\mathrm{d}t = \int_0^\infty (\mathrm{d}Q/\mathrm{d}t)\,V\,\mathrm{d}t .$$
Rewrite $U$ in terms of the voltage across the capacitor $V$ and the capacitance $C$. For a capacitor with area $A$ and spacing $d$ the capacitance is $C = A\epsilon_0/d$ and the field is $E = V/d$. Substituting for $C$, find an expression for energy density, $u$. How would this expression change for a time-varying field?

2 One wave: plane or curved

Every great decision creates ripples – like a huge boulder dropped in a lake.
Benjamin Disraeli (London 1804–1881)

2.1 Introduction
2.2 Wave fronts
2.3 Plane waves
2.4 Transverse property
2.5 Scalar plane wave
2.6 Plane wave in a medium
2.7 Law of refraction
2.8 Dispersion
2.9 Fresnel coefficients
2.10 Brewster’s angle
2.11 Reflectivity
2.12 Curved wave fronts
2.13 Paraxial optics
2.14 Paraxial curvature
2.15 Lenses: a brief history
2.16 Geometry of a lens
2.17 Collimation
2.18 Imaging property
Chapter summary
Exercises

2.1 Introduction

In this chapter we consider one wave with either planar or curved wave fronts. In particular, we consider two special cases of the harmonic-wave solution for monochromatic light, eqn (1.9), the plane wave and the spherical wave (or cylindrical wave, curved in one direction and planar in the other). These two types of waves are sufficient to describe most wave phenomena.1 The two descriptions, planar or curved, are complementary. A curved wave front can be constructed from a superposition of planar waves propagating at different angles. Alternatively, a plane wave can be constructed from a superposition of spherical waves. Consequently, we may choose whichever basis is more convenient. As plane waves have a unique momentum they form a convenient basis to describe the momentum distribution of light, whereas spherical waves are more convenient to describe converging or diverging waves, see Section 2.13. In Chapters 5 and 6 we shall discuss how the curved and plane wave bases are associated with the Huygens–Fresnel principle and Fourier optics, respectively.

1 Note that although both plane and spherical (or cylindrical) waves are unphysical (a plane wave has infinite spatial extent and therefore must have a vanishingly small amplitude in order to have finite energy, whereas a spherical (or cylindrical) wave is not a solution to the vector form of Maxwell’s wave equation), they still provide useful mathematical building blocks to construct real light fields, as long as we are many wavelengths from a source or obstacle.

All light fields can be described in terms of a superposition of waves with either planar or curved wave fronts.

In this chapter, we consider the properties of plane waves, Sections 2.3–2.11, first in free space and then inside a medium or at an interface; and then curved waves, Section 2.12. Finally, we consider how a lens converts between them, Sections 2.15–2.18.

2.2 Wave fronts

Before considering the properties of plane and spherical waves, first we
should discuss what is meant by a wave front. A wave front is a surface
of constant phase. The main distinction between plane and spherical (or
cylindrical) waves is whether the wave fronts are planar or curved in the
region of interest. Figure 2.1 shows a visualization of the phase of two waves (in a particular plane), one with planar wave fronts and one with curved wave fronts.2 The two key properties of such wave patterns are spatial frequency (the number of waves per unit length) and wave-front curvature. These properties are generic to all types of wave. The distinguishing feature of planar waves is that there is no curvature, i.e., the phase in any plane perpendicular to the propagation direction is uniform and the wave fronts are flat as in Fig. 2.1(left). In contrast, for circular waves the wave fronts are curved with a radius of curvature equal to the distance from the source, Fig. 2.1(right).

2 In optics, it is difficult to observe wave fronts directly as the speed of light is too high. Consequently, it is convenient to use the example of water waves instead. In a ripple tank experiment, the wave motion can be frozen using stroboscopic illumination.

Fig. 2.1 Left/right: Visualization of planar/circular wave fronts. For a planar wave the phase is uniform in a plane orthogonal to the propagation direction. For a circular wave, the phase varies quadratically with transverse displacement from the propagation axis (black line at the front of the wave). The phase variation in the propagation direction (grey line) is the same for both.

Note that as we move away from a localized source, as in Fig. 2.2, the wave fronts in a region close to the propagation direction become more and more planar, i.e. the wave-front curvature (the reciprocal of the radius of curvature) decreases; however, the curvature only vanishes completely at infinity.
2.3 Plane waves

Fig. 2.2 Visualization of circular wave fronts. The wave propagates from left to right and the wave fronts become gradually more planar as we move further from the source.

The complex form of the harmonic wave solution of Maxwell’s wave equation, eqn (1.26), omitting the phase offset is
$$\mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} . \tag{2.1}$$
The special case where $\mathbf{E}_0$ is a constant, independent of space and time, is known as a plane wave. A key property of this solution is that the wave fronts are planar, meaning that in any plane orthogonal to the propagation direction the phase is uniform, as shown in Fig. 2.1(left).

In general, the wave vector $\mathbf{k}$ and the polar vector $\mathbf{r}$ are not in the same direction, and we have
$$\mathbf{k}\cdot\mathbf{r} = k_x x + k_y y + k_z z . \tag{2.2}$$
For the special case of a plane wave propagating at an angle $\theta$ relative to the z axis in the xz plane, as shown in Fig. 2.3, this reduces to
$$\mathbf{k}\cdot\mathbf{r} = k\sin\theta\,x + k\cos\theta\,z , \tag{2.3}$$
and the spatial frequencies along x and z are $u = \sin\theta/\lambda$ and $\cos\theta/\lambda$, respectively. We shall come back to a plane wave propagating at an angle relative to z in Section 2.5, but first we need to introduce another important property of plane waves.
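
As a quick numerical illustration of eqn (2.3), the sketch below computes the wave-vector components and the corresponding spatial frequencies for a plane wave tilted in the xz plane. The wavelength of 500 nm, the angle of 30°, and the function names are assumptions for the example.

```python
import math


def wave_vector_components(wavelength, theta):
    """Return (kx, kz) for a plane wave at angle theta to the z axis, eqn (2.3)."""
    k = 2 * math.pi / wavelength
    return k * math.sin(theta), k * math.cos(theta)


def spatial_frequencies(wavelength, theta):
    """Spatial frequencies along x and z: the wave-vector components over 2*pi."""
    kx, kz = wave_vector_components(wavelength, theta)
    return kx / (2 * math.pi), kz / (2 * math.pi)


wavelength = 500e-9       # assumed wavelength in m
theta = math.radians(30)  # assumed propagation angle

kx, kz = wave_vector_components(wavelength, theta)
u, w = spatial_frequencies(wavelength, theta)
print(f"kx = {kx:.4e} rad/m, kz = {kz:.4e} rad/m")
print(f"u = {u:.4e} m^-1 along x, {w:.4e} m^-1 along z")
```

The symbol `w` for the spatial frequency along z is a label introduced here, not one used in the text. The two components always satisfy $k_x^2 + k_z^2 = k^2$, so increasing the spatial frequency along x reduces it along z.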
2.4 Transverse property

Fig. 2.3 The coordinate system for a plane wave propagating at an angle $\theta$ relative to the z axis in the xz plane. The wave fronts are shown in grey. P is an arbitrary observation point at position (x, z). For non-zero $\theta$, the wave vector $\mathbf{k}$ and the polar vector $\mathbf{r}$ are not parallel.

One important property of a plane wave, which arises directly from Maxwell’s equations, is that the field vector is perpendicular to the propagation direction. For a plane wave with wave vector $\mathbf{k} = (k_x, k_y, k_z)$ we can write $\mathbf{F}(\mathbf{r}, t) = \mathbf{F}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}$, where $\mathbf{F}$ represents either $\mathbf{E}$ or $\mathbf{B}$. From this definition, the following relations can be derived (see end-of-chapter exercises):
$$\nabla\cdot\mathbf{F} = i\mathbf{k}\cdot\mathbf{F} , \tag{2.4}$$
$$\nabla\times\mathbf{F} = i\mathbf{k}\times\mathbf{F} , \tag{2.5}$$
$$\frac{\partial\mathbf{F}}{\partial t} = -i\omega\mathbf{F} , \tag{2.6}$$
$$\nabla^2\mathbf{F} = -k^2\mathbf{F} . \tag{2.7}$$
Substituting the plane-wave solution into Maxwell’s equations (1.1) and (1.3), we arrive at some useful vector identities for plane waves:
$$i\mathbf{k}\cdot\mathbf{E} = 0 , \tag{2.8}$$
$$i\mathbf{k}\times\mathbf{E} = i\omega\mathbf{B} . \tag{2.9}$$
Equation (2.8) shows that $\mathbf{E}$ is perpendicular to $\mathbf{k}$, therefore for a plane wave, the electric field is transverse relative to the propagation direction. From eqn (2.9) we deduce that $\mathbf{B}$ is perpendicular to both $\mathbf{k}$ and $\mathbf{E}$, as in Fig. 2.4; therefore for a plane wave the magnetic field is also transverse. Consequently, it is often said that light is a transverse wave; however, this statement is only approximately true in some scenarios. The correct statement is that the plane-wave solution of Maxwell’s equations is a transverse wave. The ‘pure’ transverse character only arises because a plane wave has infinite spatial extent perpendicular to the propagation direction.

Fig. 2.4 The electric, magnetic, and wave vectors for a plane wave. The electric field vector $\mathbf{E}$ and magnetic field vector $\mathbf{B}$ lie within the shaded plane. The wave vector $\mathbf{k}$ is perpendicular to this plane.

In eqn (2.9) the ‘i’ factors cancel, and as $k$ and $\omega$ are real we learn that the electric field $\mathbf{E}$ and magnetic field $\mathbf{B}$ are in phase for a plane wave. By taking the magnitude of the quantities in eqn (2.9) we see that $|E/B| = \omega/k$. Substituting the results of eqn (2.6) and eqn (2.7) into the wave equation, we obtain $c^2k^2 = \omega^2$, which confirms that the speed of propagation of these plane waves is c; we also find that the ratio of the electric to magnetic fields is $|\mathbf{E}| = c|\mathbf{B}|$, which was the result used in the previous chapter to justify why the electric field is more important than the magnetic field when light interacts with matter.
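
The relations in eqns (2.8) and (2.9) can be checked for a specific plane wave. The sketch below, with assumed values for the wavelength, propagation angle, and field amplitude, builds $\mathbf{B}_0 = \mathbf{k}\times\mathbf{E}_0/\omega$ and confirms that $\mathbf{E}$, $\mathbf{B}$, and $\mathbf{k}$ are mutually orthogonal with $|\mathbf{E}| = c|\mathbf{B}|$. The small vector helpers are written out by hand so that only the standard library is needed.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def magnetic_amplitude(k_vec, e_vec):
    """B0 = (k x E0)/omega from eqn (2.9), with omega = c|k| in vacuum."""
    omega = C * norm(k_vec)
    return tuple(component / omega for component in cross(k_vec, e_vec))


k = 2 * math.pi / 500e-9  # assumed wavelength of 500 nm
theta = math.radians(20)  # assumed propagation angle in the xz plane
k_vec = (k * math.sin(theta), 0.0, k * math.cos(theta))
e_vec = (0.0, 100.0, 0.0)  # assumed field along y, in V/m, so that k.E = 0
b_vec = magnetic_amplitude(k_vec, e_vec)

print("k.E =", dot(k_vec, e_vec))
print("k.B =", dot(k_vec, b_vec))
print("E.B =", dot(e_vec, b_vec))
print("|E| / (c|B|) =", norm(e_vec) / (C * norm(b_vec)))
```

As the warning that follows emphasizes, these properties hold for a single plane wave and need not survive in a superposition.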
Warning
Certain properties of plane waves, such as
• E, B, and k being mutually orthogonal;

• E and B being in phase;
are not necessarily shared by ALL light waves. Plane waves form
a convenient basis, but the properties of a superposition of plane waves
are not the same as the properties of the individual basis functions.
There are numerous examples of situations where the statement light
is a transverse wave is not valid.
2.5 Scalar plane wave

In the scalar approximation, we replace the vector field $\mathbf{E}$ by a scalar $E$ and can write a scalar plane wave solution at t = 0 as
$$E = E_0 e^{i(k_x x + k_y y + k_z z)} . \tag{2.10}$$
The transverse property of plane waves restricts either the electric field direction or the range of propagation angles over which the scalar approximation can be used, see Chapter 12. For example, for a plane wave propagating in the xz plane, see Fig. 2.3, the scalar approximation is valid if the electric field vector is along y.3 In this case, we can replace the vector field $\mathbf{E} = (0, E, 0)$ with the scalar field $E$. Using eqn (2.3) at t = 0, the electric field as a function of position in the xz plane is
$$E = E_0 e^{i(k_x x + k_z z)} = E_0 e^{i(k\sin\theta\,x + k\cos\theta\,z)} . \tag{2.11}$$
A surface plot of the electric field is shown in Fig. 2.5.

3 We shall consider other cases in Chapters 4 and 12.

Fig. 2.5 Plane wave propagating at an angle $\theta$ relative to the z axis in the xz plane. The rate of phase variation along the x and z axes is given by $k_x$ and $k_z$, respectively. For $\theta < 45^\circ$, the rate of phase variation along x, $k_x$, and the spatial frequency in the x-direction, $u = k_x/2\pi$, are smaller than along z.

The scalar plane wave has four parameters: (i) wavelength $\lambda = 2\pi/k = 2\pi c/\omega$, (ii) amplitude $E_0$, (iii) propagation direction $\theta$, and (iv) direction of the electric field vector (in this case along y), but remember that the transverse property means that the propagation direction and the direction of electric field vector are related. The rate of phase variation along the x axis is
$$k_x = k\sin\theta , \tag{2.12}$$
and the photon momentum along the x axis is
$$p_x = \hbar k\sin\theta . \tag{2.13}$$
As discussed in Section 1.8, it is convenient to represent the phase of any wave using a phasor. For the complex form of a scalar plane wave eqn (2.10) the phasor angle is
$$\phi = k_x x + k_y y + k_z z , \tag{2.14}$$
and eqn (2.10) can be written as
$$E = E_0 e^{i\phi} . \tag{2.15}$$
In the propagation direction, the phasor rotates anti-clockwise with an angle proportional to the distance travelled, as illustrated in Fig. 2.6.

Fig. 2.6 Phasor evolution for a plane wave in the xz plane. In the propagation direction, the phasor completes one revolution (a 2π rotation) between successive wave crests separated by a distance $\lambda$.

In this phasor representation, the field only has two parameters, an amplitude $E_0$, and a phase, $\phi$, but the phase is a function of space and time. To describe the propagation of light, it is convenient to define an optical axis, which we shall take as the z axis. The xy plane at z = 0 is defined as the input plane, and the field in the input plane is written as $E^{(0)} = E_0 f(x', y')$. Using eqn (2.10) the field a distance z downstream is
$$E^{(z)} = e^{ik_z z}E^{(0)} , \tag{2.16}$$
where $k_z = k\cos\theta$ and $e^{ik_z z}$ is known as the propagator. In Chapter 6 we shall employ this plane wave propagation equation to describe the propagation of arbitrary fields by writing them as a superposition of plane waves propagating at different angles.
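
The action of the propagator in eqn (2.16) on a single plane-wave component can be sketched as follows. The example assumes a unit input field and a wavelength of 500 nm, and the function name is hypothetical. It checks that the phasor returns to its starting value after one wavelength for on-axis propagation, and after a distance $\lambda/\cos\theta$ along z for a wave tilted by $\theta$, which follows from $k_z = k\cos\theta$.

```python
import cmath
import math


def propagate(field0, wavelength, theta, z):
    """Apply the propagator exp(i kz z) of eqn (2.16), with kz = k cos(theta)."""
    kz = 2 * math.pi / wavelength * math.cos(theta)
    return cmath.exp(1j * kz * z) * field0


wavelength = 500e-9  # assumed wavelength in m
field0 = 1.0 + 0.0j  # assumed input-plane phasor

# On axis: one full revolution after a distance of one wavelength.
on_axis_full = propagate(field0, wavelength, 0.0, wavelength)
# A quarter wavelength corresponds to a phase factor of i.
on_axis_quarter = propagate(field0, wavelength, 0.0, wavelength / 4)
# Tilted wave: the phase advances more slowly along z.
theta = math.radians(30)
tilted = propagate(field0, wavelength, theta, wavelength / math.cos(theta))

print(on_axis_full, on_axis_quarter, tilted)
```

This handles one plane wave only; treating an arbitrary input field as a sum of such components is the subject of Chapter 6.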
2.6 Plane wave in a medium

For a plane wave propagating inside a refractive medium4 with real refractive index, $n$, the magnitude of the wave vector $k$ is replaced by $nk$ and we can write:
$$k \Rightarrow nk . \tag{2.17}$$
A plane wave inside the medium with refractive index $n$ is
$$E = E_0 e^{in\mathbf{k}\cdot\mathbf{r}} . \tag{2.18}$$

4 Unlike other topics in this chapter, light propagation inside a medium is not a one-wave phenomenon, however, we may treat it as such. The field inside a medium involves a superposition of the incident field plus induced (or scattered) fields. The resulting interference produces a field that appears to propagate at a speed, $v = c/n$, where $n$ is the refractive index. In complex notation, the refractive index may also be complex, however for transparent optical materials, $n$ is predominantly real. For metals, $n$ is predominantly imaginary with the result that the light is absorbed. For now, we shall only consider the case where $n$ is real.

Next, we consider what happens when a plane wave crosses an interface between one medium and another. In general, if the wave is not incident normally on the interface, then not only the magnitude of the wave vector, but also the propagation direction, changes.
2.7 Law of refraction

Consider a plane wave incident on an interface in the z = 0 plane at an angle θi as shown in Fig. 2.7. In the xz plane the equation of the plane wave is

$$E = E_i\, e^{i(k_x x + k_z z)} = E_i\, e^{ik(\sin\theta_i\, x + \cos\theta_i\, z)} . \tag{2.19}$$

Inside the medium the transmitted plane wave is

$$E = E_t\, e^{i(k_x x + k_z z)} = E_t\, e^{ink(\sin\theta_t\, x + \cos\theta_t\, z)} , \tag{2.20}$$

where we have allowed for the possibility that the plane wave has a different amplitude and propagates in a different direction. At the boundary z = 0, the variation of the phase along the x axis must be the same, see Fig. 2.7, which requires that

$$\sin\theta_i = n \sin\theta_t . \tag{2.21}$$

Fig. 2.7 Phase continuity at an interface between vacuum (left) and a medium with refractive index n (shaded region on the right).

This is known as the law of refraction. For the more general case where the interface is between a medium with refractive index ni and a second medium with index nt, the law becomes

$$n_i \sin\theta_i = n_t \sin\theta_t . \tag{2.22}$$

Although the law of refraction is often referred to as Snell’s law, after Willebrord Snellius (Leiden 1580–1626), it first appeared in the work of Ibn Sahl (Baghdad 940–1000) in 984, see Rashid (1990).5

5 The first mathematical derivation was by René Descartes (La Haye en Touraine 1596–Stockholm 1650) in Dioptrique 1637.
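As a numerical illustration of the law of refraction, eqn (2.22), the following minimal sketch solves for the transmitted angle. The function name `refraction_angle` and the default indices (vacuum to n = 1.5) are our own choices for illustration, not part of the text.

```python
import math

def refraction_angle(theta_i, n_i=1.0, n_t=1.5):
    """Transmitted angle (radians) from n_i sin(theta_i) = n_t sin(theta_t).

    Returns None when the equation has no real solution for theta_t.
    """
    s = n_i * math.sin(theta_i) / n_t
    if abs(s) > 1.0:
        return None  # no real transmitted angle exists
    return math.asin(s)

# Example: 30 degree incidence from vacuum onto a medium with n = 1.5
theta_t = refraction_angle(math.radians(30.0))
print(f"transmitted angle = {math.degrees(theta_t):.2f} degrees")
```

The transmitted angle is smaller than the incident angle whenever nt > ni, i.e., the wave bends towards the normal.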
2.8 Dispersion
As the refractive index, n, depends on the wavelength, see Chapter 13,
different colours refract by different angles as they enter and leave a
medium, which gives rise to optical phenomena such as rainbows. The
change in refractive index with wavelength is known as dispersion. We
briefly review dispersion here and postpone the details to Chapter 13.
Media that transmit visible light typically have electronic resonances at frequencies in the ultra-violet region and the refractive index spectrum looks something like the curve shown in Fig. 2.8, see Chapter 13. The refractive index is larger for higher frequencies and consequently shorter wavelengths (blue rather than red) refract more. It is worth noting that this is the opposite to diffraction, where longer wavelengths (red rather than blue) diffract more, see Chapter 5.

Fig. 2.8 The angular frequency dependence of the refractive index, n, for a simple medium with a single resonance at angular frequency, ω0. If ω0 is in the ultra-violet region, then the angular frequency of both red and blue light, ωr and ωb, respectively, is less than ω0. The refractive index for blue light is larger as it is closer to resonance.
2.9 Fresnel coefficients

The continuity of field and flux (intensity) at an interface between optical media with different refractive indices gives rise to a reflected wave. Fresnel developed a theory of light reflection at an interface in 1823. Although his theory was based on the false assumption of elastic deformation of the medium, it did give the correct answers and we still use it today. In this section, we use the superposition principle and the conservation of energy to derive the reflection and transmission coefficients at an interface.6

6 An alternative approach is to employ boundary conditions derived from Maxwell’s equations for both E and B; however, as noted previously, the optical response is typically dominated by E, and B is negligible except for magnetic materials.

If the plane wave is incident at an angle θi to the z axis in the xz plane, see Fig. 2.9, then conservation of energy flow in the z direction gives7

$$(E_i^2 - E_r^2)\cos\theta_i = n E_t^2 \cos\theta_t . \tag{2.23}$$

7 The energy flux is given by the intensity or time-averaged Poynting vector (Section 1.10) S = E0²/(μ0 c). Inside the medium c is replaced by c/n, which gives rise to the factor of n on the right-hand side.

For the field amplitude, there are two cases depending on whether the field is polarized perpendicular or parallel to the plane of incidence, the xz plane in our case. These two cases are often referred to as s and p polarization from the German senkrecht and parallel for perpendicular and parallel, respectively. We use the labels E⊥ and E∥ to distinguish between them.
If the electric field vector is aligned along y, i.e., perpendicular to the plane of incidence, the superposition principle tells us that at the boundary

$$E_{i\perp} + E_{r\perp} = E_{t\perp} . \tag{2.24}$$

Dividing eqn (2.23) by eqn (2.24), we find

$$(E_{i\perp} - E_{r\perp})\cos\theta_i = n E_{t\perp} \cos\theta_t . \tag{2.25}$$

Eliminating Et⊥ we find that the amplitude of the reflected field is

$$\frac{E_{r\perp}}{E_{i\perp}} = \frac{\cos\theta_i - n\cos\theta_t}{\cos\theta_i + n\cos\theta_t} . \tag{2.26}$$

Fig. 2.9 The geometry for in-plane reflection from an interface.

Subsequently, we use the law of refraction, n = sin θi / sin θt, either to eliminate n or θi. Eliminating n we find

$$\frac{E_{r\perp}}{E_{i\perp}} = -\frac{\sin(\theta_i - \theta_t)}{\sin(\theta_i + \theta_t)} . \tag{2.27}$$
The other case where the field is polarized within or parallel to the xz plane is slightly more complex as now we have two components of the field, see Fig. 2.9, and we need to go beyond a scalar wave theory. Now only the component of the field perpendicular to the interface (the z component in Fig. 2.9) satisfies the superposition principle, consistent with Maxwell’s equation, ∇ · E = 0, with no surface charge. This gives

$$(E_{i\parallel} + E_{r\parallel})\cos\theta_i = E_{t\parallel}\cos\theta_t . \tag{2.28}$$

Dividing eqn (2.23) by eqn (2.28) we get

$$E_{i\parallel} - E_{r\parallel} = n E_{t\parallel} , \tag{2.29}$$

and eliminating Et∥ we find that the reflected wave is given by

$$\frac{E_{r\parallel}}{E_{i\parallel}} = \frac{-n\cos\theta_i + \cos\theta_t}{n\cos\theta_i + \cos\theta_t} . \tag{2.30}$$

Using the law of refraction to eliminate n,

$$\frac{E_{r\parallel}}{E_{i\parallel}} = -\frac{\tan(\theta_i - \theta_t)}{\tan(\theta_i + \theta_t)} . \tag{2.31}$$

A plot of Er⊥/Ei⊥ and Er∥/Ei∥ and their modulus squared, the reflectivity R, as a function of incidence angle is shown in Fig. 2.10.
Fig. 2.10 The intensity reflection coefficients for in-plane (black) and perpendicular polarization (grey) as a function of the angle of incidence, θi, at an interface with refractive index n (in this case n = 1.5). The amplitude reflection coefficients are shown inset.

A situation that arises frequently in optics is that there are reflections from more than one surface, for example, when light passes through a pane of glass or reflects from an oil film on water. These multiple reflections can interfere, giving rise to dramatic changes in the reflection and transmission, which depend on the wavelength. As this is a two-wave phenomenon we postpone the discussion until Chapter 3.
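The amplitude reflection coefficients of eqns (2.26) and (2.30) are straightforward to evaluate numerically. The sketch below is one possible way to do so; the function name `fresnel_amplitudes` and the sample angles are our own choices, and the incident medium is assumed to be vacuum as in Fig. 2.7.

```python
import math

def fresnel_amplitudes(theta_i, n=1.5):
    """Return (r_perp, r_par), the amplitude ratios Er/Ei of eqns (2.26) and (2.30)."""
    theta_t = math.asin(math.sin(theta_i) / n)   # law of refraction, eqn (2.21)
    ci, ct = math.cos(theta_i), math.cos(theta_t)
    r_perp = (ci - n * ct) / (ci + n * ct)       # eqn (2.26)
    r_par = (ct - n * ci) / (ct + n * ci)        # eqn (2.30)
    return r_perp, r_par

n = 1.5
theta_sum = math.atan(n)  # incidence angle at which theta_i + theta_t = pi/2
for deg in (0.0, 30.0, math.degrees(theta_sum), 80.0):
    r_perp, r_par = fresnel_amplitudes(math.radians(deg), n)
    print(f"{deg:6.2f} deg   R_perp = {r_perp**2:.4f}   R_par = {r_par**2:.4f}")
```

Squaring the amplitude ratios gives the reflectivities plotted in Fig. 2.10; the in-plane reflectivity passes through zero at the angle where θi + θt = π/2.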
2.10 Brewster’s angle

The interesting result hidden in eqn (2.31) is that the reflection amplitude is zero if θi + θt = π/2. We call this particular case of the incident angle Brewster’s angle, and give it the label θb. From θb + θt = π/2 we have sin θt = sin(π/2 − θb) = cos θb and using Snell’s law, sin θb = n sin θt, we find that

$$\tan\theta_b = n .$$

At Brewster’s angle there is no reflected light for the in-plane polarization and all the light is transmitted, which is particularly useful in applications where low loss is important, such as inside laser cavities, see Chapter 11. For typical glasses with n = 1.5, Brewster’s angle is about 56°, as shown in Fig. 2.10. The disappearance of the reflected wave at Brewster’s angle arises due to the transverse nature of plane waves; namely, the impossibility of solutions with coexisting orthogonal transverse fields. It follows that for a real light field, which cannot have the infinite spatial extent of a plane wave, the Brewster-angle condition can only be met partially.8

8 Note that there is nothing in the theory relating to the microscopic properties of the medium. Consequently, Brewster’s angle is not related to the angular dependence of fields produced by microscopic dipoles inside the medium.

Fig. 2.11 The intensity reflection coefficient, R, at normal incidence, eqn (2.32), as a function of the refractive index, n. The reflection coefficient rises from R = 0.04 for n = 1.5 (glass) to R = 0.17 for n = 2.4 (diamond).

2.11 Reflectivity

For normal incidence, θi = 0, the intensity reflection coefficient for either polarization reduces to
$$R = \left|\frac{E_r}{E_0}\right|^2 = \left(\frac{1-n}{1+n}\right)^2 . \tag{2.32}$$

We plot R versus refractive index in Fig. 2.11. As the reflectivity of optical media like glass and transparent crystals is low, interference effects between multiple layers are often used to produce high-reflectivity mirrors. For two interfaces, the reflection coefficient can be as high as 0.15 for glass instead of 0.04, see Chapter 3.
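The two values quoted in the caption of Fig. 2.11 can be checked directly from the normal-incidence formula. This is a minimal sketch; the function name `normal_incidence_reflectivity` is our own.

```python
def normal_incidence_reflectivity(n):
    """Intensity reflection coefficient at normal incidence, R = ((1 - n)/(1 + n))**2."""
    return ((1.0 - n) / (1.0 + n)) ** 2

# Refractive indices quoted in the caption of Fig. 2.11
for name, n in (("glass", 1.5), ("diamond", 2.4)):
    print(f"{name:8s} n = {n:.1f}   R = {normal_incidence_reflectivity(n):.3f}")
```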
2.12 Curved wave fronts

In addition to the planar waves considered so far, we shall also encounter curved wave fronts such as in the case of spherical waves (or cylindrical waves, which are curved in one transverse direction and planar in the other). Both spherical and cylindrical waves have circular wave fronts in a particular plane, as shown schematically in Fig. 2.12. The radius of curvature at a point P on the wave front is represented by a vector from the source to P. Spherical waves provide good approximate solutions to Maxwell’s wave equation in a localized region far from a source or aperture and close to the direction of propagation. This is known as the paraxial region and will be discussed next, see Section 2.13.

Fig. 2.12 Circular wave fronts in the xz plane for either a spherical (or cylindrical) wave centred on (x′, 0). The radius of curvature at a point, P, is represented by the vector, r′, where r′ is the distance from the centre of the wave to P. Note that for a wave that is not centred on the origin (e.g. x′ ≠ 0 or y′ ≠ 0) the wave vector k and the polar vector r are non-aligned.

With the caveat9 that it is only valid for a propagation distance r′ ≫ λ, we can write a scalar spherical wave as

$$E = \frac{E_s}{ikr'}\, e^{i(\mathbf{k}\cdot\mathbf{r}' - \omega t)} = \frac{E_s}{ikr'}\, e^{i(kr' - \omega t)} , \tag{2.33}$$

where Es is the amplitude, r′ is the distance from a source (or effective source) coordinate (x′, y′, z′) to an observation coordinate, (x, y, z), as shown in Fig. 2.12. For the simplest case with the source at the origin, r′ = r and k·r′ = kr. The 1/r′ factor follows from energy conservation: the surface area of the wave front expands as r′², so the intensity must decrease as 1/r′², and amplitude is proportional to the square root of intensity. Aside from the 1/r′ factor, we are free to choose how to define the amplitude and phase at t = 0. We include a factor of 1/k in the amplitude such that both E and Es have the same units. We also include a factor10 of (1/i).

9 A scalar spherical wave is not a solution of Maxwell’s equations due to the vector nature of light. A transverse field vector cannot be spherically symmetric, just as you cannot brush all the hairs on a seamless tennis ball in the same direction, or have all the ocean currents flow smoothly without a discontinuity somewhere. We shall revisit this point in Chapter 12.

10 Later we shall see that this (1/i)-factor arises often, and is known as a Gouy phase, named after Louis Georges Gouy (Vals-les-Bains 1854–1926). The mathematical origin of the Gouy phase is discussed in Section 6.5.

2.13 Paraxial optics
The scalar spherical wave solutions given above, eqn (2.33), are only valid for a propagation distance r′ ≫ λ. In the region close to a specific propagation direction, we can make an additional simplification. This region is referred to as the paraxial regime, where paraxial literally means by the axis. The relevant axis is known as the optical axis and unless otherwise stated will be chosen along z, i.e., the paraxial region corresponds to x < z and y < z, as illustrated in Fig. 2.13. Consequently, the propagation angle, θ = tan⁻¹(x/z) in the xz plane, is also small. In paraxial optics, there is a clear distinction between the propagation direction, z, and the transverse dimensions, x and y, and the small angle of propagation, θ, means that we can make the following approximations:
(1) First, being close to the optical axis means that we can make the small-angle approximation:

$$\sin\theta \approx \theta , \quad \text{and} \quad \cos\theta \approx 1 - \theta^2/2 .$$

In practice, this is a surprisingly good approximation because the relative error in setting sin θ = θ is approximately (θ³/3!)/θ, which even for an angle of 40° is still only about 8%. The small-angle approximation means that the spatial frequency along the x axis becomes

$$u = \theta/\lambda ,$$

i.e., the spatial frequency is linearly proportional to the angle of propagation.

(2) Second, a corollary of the small-angle approximation is that if the angle of propagation relative to the z axis is small, then the transverse components of the wave vector, kx and ky, must be small compared to the axial component, kz:

$$k_x \ \text{and} \ k_y \ll k_z , \tag{2.34}$$

and we can rewrite the axial component as

$$k_z = (k^2 - k_x^2 - k_y^2)^{1/2} \approx k - \frac{k_x^2 + k_y^2}{2k} . \tag{2.35}$$

This approximation is useful in the description of light propagation as we shall see.

(3) Third, if the transverse coordinates, (x′, y′) in the input plane and (x, y) in the observation plane, are small compared to the propagation distance z,

x′, y′, x and y < z,

then we can approximate the distance from an input point (x′, y′, 0) to an observation point (x, y, z),

$$r' = [z^2 + (x - x')^2 + (y - y')^2]^{1/2} ,$$

by the paraxial distance

$$r_p = z + \frac{(x - x')^2 + (y - y')^2}{2z} . \tag{2.36}$$

This is known as the Fresnel approximation.

Fig. 2.13 Illustration of the paraxial regime. The angle of propagation, θ, relative to the optical axis along z is small, and the transverse coordinates, x and x′, are smaller than the propagation distance z.
To summarize, in paraxial optics (paths close to the optical axis) we can make two related approximations: small angle and Fresnel.11 These approximations are no longer appropriate in the limit of strong focusing, such as in microscopy, because the transverse components of the wave vector can be as large as the axial component. Next, we look at how these approximations modify the expressions for planar and curved waves.

11 Some texts use the term paraxial approximation which could refer to both or either.
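To get a feel for the size of the errors involved, the following sketch compares the small-angle and Fresnel approximations with their exact counterparts. The function names and the sample geometry (lengths in metres) are assumptions made purely for illustration.

```python
import math

def small_angle_error(theta):
    """Relative error made by replacing sin(theta) with theta (theta in radians)."""
    return (theta - math.sin(theta)) / math.sin(theta)

def exact_distance(x, y, z, xs=0.0, ys=0.0):
    """Distance from an input point (xs, ys, 0) to an observation point (x, y, z)."""
    return math.sqrt(z**2 + (x - xs)**2 + (y - ys)**2)

def paraxial_distance(x, y, z, xs=0.0, ys=0.0):
    """Fresnel approximation to the same distance, eqn (2.36)."""
    return z + ((x - xs)**2 + (y - ys)**2) / (2.0 * z)

for deg in (5, 10, 20, 30, 40):
    print(f"{deg:2d} deg: relative error in sin(theta) = {small_angle_error(math.radians(deg)):.4f}")

# Hypothetical geometry: 1 mm off axis after 1 m of propagation
x, z = 1e-3, 1.0
print("distance error =", paraxial_distance(x, 0.0, z) - exact_distance(x, 0.0, z), "m")
```

Note that for the Fresnel approximation to be useful in a phase factor, the distance error should be compared with the wavelength rather than with z.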
Example 2.1
Paraxial plane wave: In the paraxial regime, we can derive a paraxial form of the plane wave solution, eqn (2.10). Rewriting the paraxial expression for the axial component of the wave vector, eqn (2.35), in terms of spatial frequencies using kx = 2πu and ky = 2πv, we obtain

$$k_z = k - \pi(u^2 + v^2)\lambda , \tag{2.37}$$

where v is the spatial frequency in the y direction. Substituting for the components of k in eqn (2.10) we obtain

$$E = E_0\, e^{i[k - \pi(u^2 + v^2)\lambda]z}\, e^{i2\pi(ux + vy)} . \tag{2.38}$$

This equation is known as the paraxial plane wave solution. The first exponential factor is the phase evolution for a plane wave along z (less than kz for a wave propagating at an angle), and the second is the phase variation along x and y. It follows from eqn (2.38) that the field in an xy plane at z can be written as

$$E^{(z)} = e^{i[k - \pi(u^2 + v^2)\lambda]z}\, E^{(0)} , \tag{2.39}$$

where $E^{(0)}$ is the field in the xy plane at z = 0. This equation, the paraxial form of eqn (2.16), is the basis of paraxial Fourier optics, as we shall see in Chapter 6.
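The quality of the paraxial expression for kz, eqn (2.37), can be checked against the exact axial component. In the sketch below the wavelength (633 nm) and the propagation angle (1°) are assumed values chosen only for illustration, and the function names are our own.

```python
import math

def kz_exact(u, v, wavelength):
    """Exact axial component, kz = (k^2 - kx^2 - ky^2)^(1/2), with kx = 2 pi u and ky = 2 pi v."""
    k = 2.0 * math.pi / wavelength
    kx, ky = 2.0 * math.pi * u, 2.0 * math.pi * v
    return math.sqrt(k**2 - kx**2 - ky**2)

def kz_paraxial(u, v, wavelength):
    """Paraxial axial component, eqn (2.37)."""
    k = 2.0 * math.pi / wavelength
    return k - math.pi * (u**2 + v**2) * wavelength

wavelength = 633e-9                # assumed wavelength in metres
theta = math.radians(1.0)          # assumed propagation angle in the xz plane
u = math.sin(theta) / wavelength   # spatial frequency along x, from kx = k sin(theta)
exact, approx = kz_exact(u, 0.0, wavelength), kz_paraxial(u, 0.0, wavelength)
print("relative difference in kz =", abs(exact - approx) / exact)
```

Because the phase accumulated over a distance z is kz z, even a small relative difference in kz eventually matters for sufficiently long propagation distances.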
2.14 Paraxial curvature

In the paraxial regime, the scalar spherical solution, eqn (2.33), is rewritten as a paraxial spherical wave. First, we replace the propagation distance, r′, by the paraxial distance, rp = z + [(x − x′)² + (y − y′)²]/(2z). Second, we replace the 1/r′ factor in the amplitude by 1/z. This is an excellent approximation as long as z ≫ λ, because in this case the amplitude is slowly varying compared to the phase.12 Using these two approximations, the equation for a spherical wave, eqn (2.33), at t = 0 becomes

$$E = \frac{E_s}{ikz}\, e^{ikr_p} . \tag{2.40}$$

12 For a paraxial spherical wave with source at (0, 0, 0), the relative error at (x, 0, z) in neglecting x²/(2z) in 1/r′ is Δr′/r′ = x²/(2zr′), whereas the relative error in the phase, kr′, is Δφ/(2π) = kx²/(4πz) = x²/(2λz). For z ≫ λ, the former is negligible, i.e., the wave amplitude E0/(kr′) is slowly varying compared to the phase, $e^{ikr'}$.

Unlike a spherical wave, the paraxial spherical wave has a particular propagation direction, which we have taken as the positive z direction. The paraxial spherical wave is particularly useful in the description of diffraction, as we shall see in later chapters. A paraxial spherical wave solution with source at (x′, 0, 0) is illustrated in Fig. 2.14. If the source is at the origin, x′ = y′ = 0, then in the y = 0 plane, the paraxial spherical wave is

$$E = \frac{E_s}{ikz}\, e^{ikz}\, e^{ik\rho^2/2z} , \tag{2.41}$$

where ρ = (x² + y²)^{1/2}. The first exponential term, $e^{ikz}$, is the same as a plane wave propagating along z. The second exponential term, $e^{ik\rho^2/2z}$, expresses the wave-front curvature with radius of curvature equal to the propagation distance, z. As we saw in Figs. 2.2 and 2.14, as we propagate further from the source, the radius of curvature increases, and the wave fronts become more and more planar. In Fig. 2.1(right), we showed a visualization of the phase of a spherical wave in a particular xz plane and indicated the phase variation along the x axis at a distance z from the source, which is given by the real part of $e^{ikx^2/2z}$. Next, we consider how an ideal lens changes the curvature of a paraxial spherical wave.13

Fig. 2.14 Circular wave fronts in the xz plane. The paraxial regime, close to the propagation direction (in this case z), is indicated in dark grey. For a wave with source at the origin the radius of curvature vector, r′, has a magnitude equal to the propagation distance z and the wave-front curvature appears as an exponential term of the form, $e^{ik\rho^2/2z}$. Unlike a spherical wave, which propagates in all directions, a paraxial spherical wave has a particular propagation direction.

13 Curved wave fronts are common in optics and we shall encounter many examples where quadratic phase factors, similar to the $e^{ik\rho^2/2z}$ term in eqn (2.40), arise in later chapters.

2.15 Lenses: a brief history
The lens, named after the lens culinaris or lentil, is an essential component in most optical instruments.14 Although the light-bending properties of water and glass were well known to the Greeks and Romans, the first scientific understanding of lenses is attributed to Ibn Sahl (Baghdad 940–1000) who wrote his treatise On Burning Mirrors and Lenses in 984, which contained the first exposition of the law of refraction. Ibn Sahl worked out that to focus light with minimal aberration requires a combination of spherical and parabolic surfaces, which we now refer to as an anaclastic lens, see Fig. 2.15. In the second millennium, Ibn al-Haytham’s (Basra 965–Cairo 1040) Book of Optics became the main source text for European scholars.15 In the fourteenth century, lenses began to appear in European art, particularly to magnify religious texts, as described in the Marian Prayer Magnificat anima mea Dominum, or as spectacles in the paintings of Hieronymus Bosch and Jan van Eyck. By the sixteenth century lens making for spectacles was well established, and Dutch opticians, Hans and Zacharia Janssen (father and son) (The Hague 1580–Amsterdam 1638), and Hans Lippershey (Wesel 1570–Middelburg 1619) are credited with making the first microscope and telescope, respectively.16 Throughout this period, the standard understanding of the operation of a lens was the law of refraction, where a light ray is refracted, or bent, by an angle that increases linearly with distance from the optical axis. Not until the nineteenth century, and the experiments of Fresnel, did we begin to understand the lens from a wave perspective, see Chapters 5 and 9.

14 So useful that Nature invented it! The origin of man-made lenses is not clear. The so-called Nimrod lens, found at the Assyrian temple of Nimrod and now in the British Museum, is a lens-shaped glass over 2700 years old, but may not have been used as a lens.

Fig. 2.15 Principle of Ibn Sahl’s anaclastic lens. Planar wave fronts incident on (i) a glass sphere and (ii) an ellipsoid. For a spherical interface (i), the wave fronts inside the medium become elliptical. For an elliptical interface (ii), the wave fronts become spherical. By making the first surface elliptical, white dashed line in (ii), and the second spherical, black dashed line in (ii), Ibn Sahl invented a perfect lens.

15 Two hundred years later, Robert Grosseteste (Stradbroke 1175–Buckden 1253) and Roger Bacon (Ilchester 1214–Oxford 1292) studied the focusing effect of water in a curved glass.

16 In 1610 Galileo Galilei (Pisa 1564–Arcetri 1642) built a telescope with a magnification of 30×, and was able to resolve the moons of Jupiter and the rings of Saturn. The great pioneers of microscopy were Robert Hooke (Freshwater 1635–London 1703) who in 1665 wrote Micrographia, introducing the word cell in the description of the structure of cork, and Antonie van Leeuwenhoek (Delft 1632–1723) who first observed bacteria in 1676.
2.16 Geometry of a lens

The function of a lens follows from its geometry. A plano-convex lens consists of a cylindrically symmetrical piece of glass, flat on one side and spherical on the other with radius of curvature, RL. We assume that the lens is placed in the xy plane at z = 0, as shown in Fig. 2.16. The lens is made of an optical medium with refractive index n and has an on-axis thickness, t0, where t0 ≪ RL is known as the thin-lens limit. Consider a plane wave propagating along the z axis, incident on the lens in the z = 0 plane. At a transverse displacement, ρ = (x² + y²)^{1/2}, the lens has a thickness

$$t = \sqrt{R_L^2 - \rho^2} - (R_L - t_0) . \tag{2.42}$$

Fig. 2.16 The geometry of a plano-convex lens. In the thin-lens approximation, the radius of curvature of the lens, RL, is one-half of the focal length, f, for a refractive index, n = 1.5. However, for a real lens, the finite thickness may change the curvature needed to achieve a particular focal length.
If the plane-wave field incident on the lens in the z = 0 plane is $E^{(0)}$, then the field immediately after the lens is given by

$$E^{(L)} = E^{(0)}\, e^{ik(t_0 - t)}\, e^{inkt} , \tag{2.43}$$

where the two exponential factors correspond to propagating a distance t0 − t in air, where the phase evolves as k(t0 − t), and a distance t inside the medium, where the phase evolves as nkt. In the paraxial regime, ρ < RL, we can rewrite t as

$$t = \sqrt{R_L^2 - \rho^2} - (R_L - t_0) \approx t_0 - \frac{\rho^2}{2R_L} . \tag{2.44}$$

Substituting into eqn (2.43) we obtain

$$E^{(L)} = E^{(0)}\, e^{ikt_0}\, e^{i(n-1)kt} = E^{(0)}\, e^{inkt_0}\, e^{-i(n-1)k\rho^2/2R_L} . \tag{2.45}$$
Finally, we define the focal length as

$$f = R_L/(n - 1) , \tag{2.46}$$

and use the thin-lens approximation to neglect the $e^{inkt_0}$ term in eqn (2.45),17 which gives

$$E^{(L)} = E^{(0)}\, e^{-ik\rho^2/2f} . \tag{2.47}$$

17 Neglecting the lens thickness leads to the misleading impression that the off-axis parts of the wave, ρ ≠ 0, are phase advanced relative to the on-axis component, whereas in fact all parts of the wave front are delayed, only the off-axis parts are delayed by less.

The result suggests that an ideal thin lens acts as a plane-wave to paraxial-spherical-wave converter, as illustrated in Fig. 2.16. The effect of the lens is to multiply the input field by a factor $e^{-ik\rho^2/2f}$, i.e., the lens imprints a phase that depends quadratically on the transverse displacement, ρ. This produces a spherical wave that converges towards a focus at z = f.
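The steps from the lens geometry to the quadratic phase can be followed numerically. The sketch below compares the exact and paraxial lens thickness, eqns (2.42) and (2.44), and evaluates the focal length of eqn (2.46). The lens parameters (RL = 50 mm, t0 = 3 mm, n = 1.5) are assumed values for illustration, as are the function names.

```python
import math

def thickness_exact(rho, R_L, t0):
    """Lens thickness at transverse displacement rho, eqn (2.42)."""
    return math.sqrt(R_L**2 - rho**2) - (R_L - t0)

def thickness_paraxial(rho, R_L, t0):
    """Paraxial lens thickness, eqn (2.44)."""
    return t0 - rho**2 / (2.0 * R_L)

def focal_length(R_L, n):
    """Focal length of a thin plano-convex lens, eqn (2.46)."""
    return R_L / (n - 1.0)

def lens_phase(rho, f, wavelength):
    """Phase (radians) imprinted by an ideal thin lens, the exponent in eqn (2.47)."""
    k = 2.0 * math.pi / wavelength
    return -k * rho**2 / (2.0 * f)

R_L, t0, n = 50e-3, 3e-3, 1.5   # assumed lens parameters (metres)
f = focal_length(R_L, n)
rho = 5e-3
print("focal length =", f, "m")
print("thickness error =", thickness_paraxial(rho, R_L, t0) - thickness_exact(rho, R_L, t0), "m")
print("imprinted phase at rho =", lens_phase(rho, f, 633e-9), "rad")
```

For these values the focal length is twice the radius of curvature, as stated in the caption of Fig. 2.16 for n = 1.5.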
2.17 Collimation
A lens can also be used to collimate light. A diverging spherical wave
centred at z = −f incident on a lens in the z = 0 plane is converted to
a plane wave, as shown in Fig. 2.17(i). Note the positive and negative
sign of the curvature for diverging and converging waves, Fig. 2.17(i)
and (ii), respectively.
Fig. 2.17 (i) A diverging spherical wave propagating along z. If the radius of curvature, |r′| = f, in the plane of the lens, then the lens cancels the curvature producing planar wave fronts, i.e., the light is collimated. (ii) A plane wave incident on a lens results in a converging spherical wave with negative curvature. The phase term, immediately after the lens in the lens plane, is $e^{-ik\rho^2/2f}$.

In summary, a lens imprints a spatially varying phase such that the input field $E^{(0)}$ becomes

$$E^{(L)} = E^{(0)}\, e^{-ik\rho^2/2f} . \tag{2.48}$$

This converts a plane wave into a converging spherical wave.
Real lenses are not as perfect as the ideal thin lens we have considered above. First, the focal length depends on the refractive index n which is a function of wavelength, as discussed in Section 2.8. This leads to chromatic aberration, where different wavelengths are focused in different planes. Chromatic aberrations are reduced by using a second lens made of a different glass (an achromatic doublet) to cancel the dispersion of the first. In addition, beyond the paraxial regime, other sources of aberration arise due to spherical surfaces and the finite thickness of the lens.18 Optical engineering is focused on reducing aberration using multiple lenses; see e.g. Fischer et al. (2008).

18 An ideal lens would have a parabolic or aspherical surface; however, standard polishing techniques produce spherical surfaces. The difference gives rise to spherical aberrations. In the paraxial regime, a spherical and parabolic surface are the same.
2.18 Imaging property

As previously described, an ideal lens converts plane waves to spherical waves, as illustrated in Fig. 2.16; but what happens if the input is a spherical wave? The answer is that a lens produces an image of the source. In this section, we illustrate the image-forming property of lenses, and derive an expression for the distance from the source to the image. Consider an input field described by a paraxial spherical wave with an origin at z = −s1, as in Fig. 2.18.19 A lens in the z = 0 plane forms an image of the source a distance s2 downstream of the lens. We can find the relationship between s1 and s2 using the phase imprinting property. The input field in the plane of the lens at z = 0 is

$$E^{(0)} = \frac{E_{s1}}{iks_1}\, e^{iks_1}\, e^{ik\rho^2/2s_1} ,$$

where ρ = (x² + y²)^{1/2} is the transverse displacement from the optical axis (along z) and the amplitude, Es1/(iks1), is determined by the distance travelled from the source. For a wave propagating in the positive z direction, the radius of curvature, s1, is positive for a diverging wave (but would be negative for a converging wave, see Fig. 2.17).

19 In this case eqn (2.41) becomes

$$E = \frac{E_{s1}}{ik(z + s_1)}\, e^{ik(z + s_1)}\, e^{ik\rho^2/2(z + s_1)} .$$

Fig. 2.18 A diverging spherical wave (with positive radius of curvature, s1, at the lens) is converted into a converging spherical wave with negative radius of curvature, s2, by the lens. By imprinting a phase, the lens maps all source points in a plane at position z = −s1 to image points in the plane at z = s2.
The field immediately after the lens, assuming the thin-lens limit and ignoring any finite-size effects, is

$$E^{(L)} = E^{(0)}\, e^{-ik\rho^2/2f} = \frac{E_{s1}}{iks_1}\, e^{iks_1}\, e^{ik\rho^2/2s_1}\, e^{-ik\rho^2/2f} . \tag{2.49}$$

We can write this as a new spherical wave converging on the point z = s2, as shown in Fig. 2.18.20 The radius of curvature, |r′| = −s2, is negative in the z = 0 plane because the wave is converging, see Fig. 2.17(ii).

20 In this case, eqn (2.41) becomes

$$E = \frac{E_{s2}}{ik(z - s_2)}\, e^{ik(z - s_2)}\, e^{ik\rho^2/2(z - s_2)} .$$

In the z = 0 plane, the new paraxial spherical wave is

$$E^{(L)} = \frac{E_{s2}}{iks_2}\, e^{-iks_2}\, e^{-ik\rho^2/2s_2} , \tag{2.50}$$
where Es2 is the modified amplitude. Equating the two expressions for $E^{(L)}$, eqns (2.49) and (2.50), we find that

$$\frac{E_{s1}}{iks_1}\, e^{iks_1}\, e^{ik\rho^2/2s_1}\, e^{-ik\rho^2/2f} = \frac{E_{s2}}{iks_2}\, e^{-iks_2}\, e^{-ik\rho^2/2s_2} , \tag{2.51}$$

such that

$$\frac{E_{s2}}{iks_2}\, e^{-iks_2} = \frac{E_{s1}}{iks_1}\, e^{iks_1} ,$$

and from the ρ dependence,

$$\frac{1}{s_1} - \frac{1}{f} = -\frac{1}{s_2} ,$$

$$\frac{1}{s_1} + \frac{1}{s_2} = \frac{1}{f} . \tag{2.52}$$

This equation is known as the thin-lens equation and is easily extended from spherical waves to images. For images, each point in an input plane a distance s1 upstream of the lens is mapped onto an image point at a distance s2 downstream of the lens. The positions z = −s1 and z = s2 are known as conjugate planes.
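As a simple numerical illustration of the thin-lens equation, eqn (2.52), the sketch below solves for the image distance s2 given the source distance s1 and the focal length f. The function name `image_distance` and the sample values are our own.

```python
def image_distance(s1, f):
    """Solve the thin-lens equation 1/s1 + 1/s2 = 1/f for s2."""
    if s1 == f:
        # Source in the focal plane: the output is collimated, see Section 2.17
        raise ValueError("source at the focal plane: no finite image distance")
    return 1.0 / (1.0 / f - 1.0 / s1)

f = 0.1  # assumed focal length in metres
for s1 in (0.15, 0.2, 0.3):
    print(f"s1 = {s1:.2f} m  ->  s2 = {image_distance(s1, f):.3f} m")
```

A source at s1 = 2f is imaged to s2 = 2f, and moving the source further away brings the image closer to the focal plane.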
Chapter summary
• A distinguishing feature of monochromatic waves is wave-front

curvature. Wave fronts may be either planar or curved—as in
plane or spherical waves, respectively.
• As the wave equation is linear, we can use simple solutions like
plane or spherical waves as building blocks to form more complex
solutions.
• Either plane waves or spherical waves may be used as the basis
functions (or building blocks) to construct any light field.
• Plane-wave and spherical-wave solutions are mathematical ideal-
izations, but can provide a good approximation to a real-world
light field in a particular region of interest.
• A scalar plane wave is monochromatic with a wavelength, λ.
It has infinite spatial extent, is transverse, and has a unique
amplitude and phase. The electric field is perpendicular to the
propagation direction.
• Photons described by a plane wave have a unique momentum, p = ħk.
• Inside an optical medium with refractive index, n, the magnitude
of the wave vector becomes nk.
• Phase and energy continuity at an interface leads to the law of
refraction, and the reflection and transmission coefficients.
• Often we are interested in waves that travel predominantly in a
particular direction, known as the optical axis (usually assumed to
be along the z axis). Ignoring parts of the field which are far from
the optical axis is known as paraxial optics.
• In paraxial optics, wave-front curvature appears as a phase that
varies quadratically with transverse displacement.
• An ideal thin lens imprints a spatially varying phase that
converts plane waves to paraxial spherical waves.
Exercises
(2.1) Plane-wave properties (1)
Verify the results of equations (2.4)–(2.7).
(2.2) Plane-wave properties (2)
List all the independent parameters required to specify the properties of a linearly polarized plane wave (polarized along y) propagating along the z axis in vacuum at a particular instant in time, e.g., t = 0.
(2.3) Small-angle approximation
Explain, briefly, what is meant by the small-angle approximation. When is it useful and when can it not be used in optics? Estimate the fractional error (as a percentage) in using the small-angle approximation for the case of light propagating at an angle θ = 30° relative to the z axis.
(2.4) Paraxial distance
Write an expression for the distance r between an input point (x′, 0) and an observation point (x, z). Use |x − x′| < z to expand r in terms of z, x, x′, x², and x′². What is this approximation called? Explain, briefly, when it might be possible to neglect the x² term while retaining the other terms.
(2.5) Paraxial plane waves
Write an equation to describe the electric field variation along the x axis for a paraxial plane wave with amplitude E0 propagating in the xz plane at an angle θ relative to the z axis.
Write an equation in complex notation for a plane wave propagating at angles θx and θy relative to the z axis in the xz and yz planes, respectively. Express your answer in terms of kx = k sin θx, ky = k sin θy, and k only.
Write an inequality for kx, ky, and z in the paraxial regime. Use this inequality to write an expression for a paraxial plane wave.
(2.6) Paraxial spherical wave
Write an equation for a spherical wave with origin at z0. Rewrite this equation for the case of the paraxial regime. Comment on whether a lens with focal length f in the z = 0 plane would cancel or double the transverse phase dependence. What would the field look like upstream of the lens?
(2.7) Diverging and converging paraxial spherical waves
Write expressions for both diverging and converging paraxial spherical waves propagating along the z axis, if the two waves originate from z = −f and z = f, respectively.
(2.8) Collimation of a point source
Write an equation for the electric field of a spherical wave centred on the origin. Rewrite this equation in a plane a distance z = f downstream in the paraxial regime. Comment on the approximations used.
A plano-convex lens with focal length f is placed in the z = f plane. What is the form of the wave fronts downstream of the lens?
(2.9) Scalar and paraxial breakdown
Give an example of an optical instrument where the scalar approximation breaks down. Explain why these approximations break down in this case.
(2.10) Intensity
If the electric field at position (x, y, z) is given by $E(x, y, z) = E_0\, e^{i(k_x x + k_y y + k_z z)}$, write an expression for the intensity distribution I(x, y, z) and comment on the spatial dependence.
(2.11) Dispersion
Both dispersion and diffraction may induce a change in the propagation direction. Explain why dispersion tends to deflect blue light more than red, whereas for diffraction it is the other way around.
Two waves: interference 3
Two roads diverged in a wood, and I —
3.1 Introduction 33
I took the one less traveled by,
And that has made all the difference.
3.4 Standing waves 36
Robert Frost (San Francisco 1874–Boston 1963),
Mountain Interval, 1916.
3.1 Introduction
In Chapter 2 we considered one wave (with either planar or curved wave
fronts). In this chapter we consider the sum of two waves. The ability
to add any two wave solutions follows directly from the linearity of the
wave equation and the principle of superposition, Section 1.5 in Chapter
1. The two waves could be either completely independent, one wave
that induces another wave inside a medium, or two waves obtained from
one by dividing the wave front or amplitude into two and subsequently
recombining them. The two waves may have different frequencies, or
the same frequency but different propagation directions, or even the
same frequency and same direction but arrive via different paths. The
sum of two waves gives rise to the phenomenon of interference, see
Fig. 3.1. For two waves to interfere they must have a well-defined
relative phase, a property known as coherence that we shall consider
in Chapter 8. Depending on their relative phase, two waves may
interfere either constructively or destructively. We begin with a
brief history of interference phenomena.
Fig. 3.1 Left: Photograph of interfering water waves (Courtesy of Sarah
Bunton). Right: Simulation indicating the circular wave fronts (white lines).

3.2 A brief history
In 1801, Thomas Young (Milverton 1773–London 1829) observed the
interference between light fields originating from two separate apertures.
In 1814, Augustin-Jean Fresnel (Broglie 1788–Ville-d’Avray 1827) observed
interference in the shadow of a wire. Only after these interference
experiments of Young and Fresnel did the wave theory of light become
firmly established; however, wave interference was already well known.
In his 1632 Dialogue on the Two Chief World Systems, Galileo Galilei
wrote about constructive and destructive interference in the tides:
sometimes it will happen that the primary and secondary causes

agree...the tides are very large; other times, the primary impulse
being somehow opposite to that of the secondary cause,...the sea
will reduce to a very calm and almost motionless state
A discussion on tidal interference also appears in Isaac Newton’s
(Woolsthorpe 1642–London 1726) Principia Book III (1687), Proposi-
tion XXIV,
it may happen that the tide may be propagated from the ocean
through different channels towards the same port,...in which case
the same tide, divided into two or more succeeding one another,
may compound new motions of different kinds.
Newton cites the atypical tide observed by the
seamen in the port of Batsham, in the kingdom of Tunquin
that puzzled scientists because it was observed to rise and fall only once a
day. As tides depend on contributions from the Moon and the Sun, it was
realized that summing waves was the key to accurate predictions. An
enormous effort was made to build mechanical computers that could sum
waves with different amplitude and phase: William Thomson (Belfast
1824–Largs 1907)1 built a ten-component tide-predicting machine in
1874 that is sometimes cited as one of the first functional computers.
These machines use rotating wheels, i.e., mechanical phasors, to move
pulleys up and down to create a read-out, as in Fig. 3.2.
1 Known as Lord Kelvin from 1892.
Fig. 3.2 Schematic of a mechanical tide-predicting machine that adds two
waves with different amplitude and period. Two rotating wheels act as
phasors. Their sum is read out by the marker on the left, whose vertical
position is determined via the angle of the phasor wheels. In contrast
to light, where only relative phases matter, for water the absolute phase of
every component matters. The largest tide-predicting machine ever built (now
in the Deutsches Museum in Munich) summed sixty-four components.
The interference of light waves was first seen by Thomas Young.2 In
1801 he wrote (quoted in Darrigol 2012):
I am at present employed in some further optical investigations,
which I imagine, will be considered as more important than any
of my former attempts, as I think they will establish almost
incontrovertibly the undulatory system of light, ...
Those optical investigations were Young’s famous two-hole experiment.3
Young placed an aperture upstream of the two holes to collimate the
sunlight and achieve coherent illumination, see Chapter 8. Rather
than observing a geometrical shadow of the aperture, Young observed
many spots and suggested that a way to understand the patterns of
light and dark was to consider the interference of two spherical waves
emerging from each hole in a similar way to interfering waves on water.
The sketch he drew, Fig. 3.3, is almost identical to the case of two
circular water waves shown in Fig. 3.1. However, experiments on the
polarization of light remained puzzling, and in 1810 Young wrote that
he was ‘undecided’ as to its true nature. In 1815, Augustin-Jean Fresnel
lost his job (for political reasons), returned to the family home, and
began a series of experiments using a honey drop as a lens. He measured
the interference pattern in the shadow of a wire (see Darrigol 2012) and
was able to reproduce his observations using a sum of circular waves,
thereby validating Young’s wave hypothesis.
2 Young trained as a medic and became interested in understanding both
speech and vision. First he studied sound waves and then showed that light
shares some similar properties.
3 In Chapter 5 we shall consider the modern variant of Young’s experiment
based on two slits.
Fig. 3.3 Young’s sketch of his two-hole experiment. Two holes on the
left (labelled A and B) emit circular waves propagating from left to right.
Positions of constructive interference are labelled C to F on the right-hand
side. Note the similarity to interfering water waves in Fig. 3.1. The
time-averaged intensity pattern for light, see Fig. 3.6, looks very different.
3.3 Two plane waves

As a first example of adding two waves, consider two plane waves
with the same amplitude, E0 , and angular frequency ω, propagating
in different directions specified by wave vectors k1 and k2 , respectively.
The total (scalar) electric field is
E = E_0 e^{i(\mathbf{k}_1\cdot\mathbf{r} - \omega t)} + E_0 e^{i(\mathbf{k}_2\cdot\mathbf{r} - \omega t)} . (3.1)
We have included the time dependence explicitly in order to illustrate
the effect of time averaging. We can simplify this expression by taking
out a common phase factor to obtain
E = E_0 e^{i(\bar{\mathbf{k}}\cdot\mathbf{r} - \omega t)} \left[ e^{i\Delta\mathbf{k}\cdot\mathbf{r}/2} + e^{-i\Delta\mathbf{k}\cdot\mathbf{r}/2} \right]
  = 2E_0 e^{i(\bar{\mathbf{k}}\cdot\mathbf{r} - \omega t)} \cos(\Delta\mathbf{k}\cdot\mathbf{r}/2) , (3.2)
where k̄ = (k1 + k2 )/2 is the bisector of k1 and k2 , and Δk = k1 − k2 is
perpendicular to the bisector, see Fig. 3.4. As the time dependence of
the electric field is fast (an optical field changes sign every femtosecond),
we are interested in the intensity, i.e., the magnitude of the time-averaged
Poynting vector, eqn (1.23) in Chapter 1. The intensity, which is proportional
to the squared modulus of the complex scalar field, is
I = 4I_0 \cos^2(\Delta\mathbf{k}\cdot\mathbf{r}/2) , (3.3)
where I_0 = \frac{1}{2}\epsilon_0 c E_0^2 is the intensity observed for a single plane wave.
From this expression we find that the interference pattern consists of
cosine-squared fringes in the direction of Δk, i.e., perpendicular to the
bisector of k1 and k2 , see Fig. 3.4. For two plane waves with equal
amplitude the intensity minima are zero. In regions of space where the
two waves are exactly out of phase, destructive interference is complete.
At the maxima, the two waves are in phase; the electric field is twice
as large and the intensity is four times larger than for a single wave.
The average intensity is 2I0 , the same as the sum of the intensities of
the two individual waves. This analysis highlights a generic feature of
interference: the location of the energy in the electromagnetic field is
redistributed, but the total energy content is conserved.
Fig. 3.4 The interference pattern (greyscale) for two plane waves with
wave vectors k1 and k2 that intersect at an angle θ0 (in the plane of k1 and
k2 ). The propagation directions and wave fronts are indicated by the white
arrows and lines, respectively. The interference fringes are perpendicular
to the bisector of wave vectors, k̄, with a spacing between the maxima of
Λ = λ/[2 sin(θ0 /2)].
Example 3.1
Two plane waves at angles ±θ0 /2 to the z axis: Consider two plane
waves propagating in the xz plane at angles θ = ±θ0 /2 relative to the z axis,
such that the angle between them is θ0 , i.e., x and z, horizontal and vertical,
respectively in Fig. 3.4. The wave vectors are k1 = (k sin θ0 /2, 0, k cos θ0 /2) and
k2 = (−k sin θ0 /2, 0, k cos θ0 /2). In this case, eqn (3.1) becomes
E = E_0 e^{i[k\cos(\theta_0/2) z - \omega t]} \left[ e^{ik\sin(\theta_0/2) x} + e^{-ik\sin(\theta_0/2) x} \right]
  = 2E_0 e^{i[k\cos(\theta_0/2) z - \omega t]} \cos[k\sin(\theta_0/2) x] . (3.4)
This equation says that we have a travelling wave along z and a standing wave along
x. The standing wave is formed because we have two components that propagate in
opposite directions along x. In Section 3.4, we shall consider the special case where
k1 and k2 are anti-parallel. By calculating the time average we find an intensity
I = 4I_0 \cos^2[k\sin(\theta_0/2) x] = 2I_0 \{1 + \cos[2k\sin(\theta_0/2) x]\} , (3.5)
where we have rewritten the result using 2 cos²(πux) = 1 + cos(2πux) to highlight that
the spatial frequency of cosine-squared is twice that of cosine.
There is no intensity variation along the bisector of k1 and k2 , i.e., along z, as
both plane waves have the same value of kz . However, there are periodic interference
fringes in the direction of Δk, which in this example is along x, as there are two
values of kx . This type of interference pattern may be observed, for example, when
a plane wave is reflected by a wedge-shaped piece of glass producing two reflected
waves propagating at different angles, see Exercise 3.2. By considering the intensity
variation along x, we find that the distance between maxima in the interference
pattern, Λ, is
\Lambda = \frac{\lambda}{2\sin(\theta_0/2)} , (3.6)
where θ0 is the angle between the wave vectors. In the small-angle approximation, the
spacing between the maxima is Λ ≈ λ/θ0 and the spatial frequency of the intensity
pattern is θ0 /λ.
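The results of Example 3.1 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names, the wavelength (633 nm), and the crossing angle (1 degree) are illustrative assumptions. It sums the two complex fields of eqn (3.1), evaluates the fringe spacing of eqn (3.6), and averages the intensity over one fringe period.

```python
import cmath
import math


def two_plane_wave_intensity(x, z, wavelength, theta0, i0=1.0):
    """Intensity of two equal-amplitude plane waves at +/- theta0/2 to the z axis."""
    k = 2 * math.pi / wavelength
    kx = k * math.sin(theta0 / 2)
    kz = k * math.cos(theta0 / 2)
    # Sum of the two complex fields in units of E0; the common factor
    # exp(-i*omega*t) has unit modulus and is omitted.
    field = cmath.exp(1j * (kx * x + kz * z)) + cmath.exp(1j * (-kx * x + kz * z))
    return i0 * abs(field) ** 2


def fringe_spacing(wavelength, theta0):
    """Distance between intensity maxima, eqn (3.6)."""
    return wavelength / (2 * math.sin(theta0 / 2))


wavelength = 633e-9         # assumed wavelength in metres
theta0 = math.radians(1.0)  # assumed angle between the two wave vectors
spacing = fringe_spacing(wavelength, theta0)

# Average the intensity over one fringe period using uniform sampling.
n_samples = 1000
mean_intensity = sum(
    two_plane_wave_intensity(i * spacing / n_samples, 0.0, wavelength, theta0)
    for i in range(n_samples)
) / n_samples

print(f"fringe spacing = {spacing * 1e6:.2f} micrometres")
print(f"mean intensity / I0 = {mean_intensity:.3f}")
```

The maxima reach 4I0, the minima fall to zero, and the average over a period is 2I0, which illustrates the redistribution of energy described in Section 3.3.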
3.4 Standing waves

A special case of two interfering plane waves is obtained when they
are counter propagating, i.e., k2 = −k1 = k. For equal amplitudes,
eqn (3.1) becomes
E = 2E_0 e^{-i\omega t} \cos(\mathbf{k}\cdot\mathbf{r}) , (3.7)
(the factor e^{−iωt} is a function of time and the factor cos(k · r) is a function of space)
and we get a static or standing wave whose spatial dependence is fixed
but with an amplitude that oscillates in time. As above, we detect the
intensity (or time-averaged Poynting vector magnitude), which is
I = 4I_0 \cos^2(\mathbf{k}\cdot\mathbf{r}) = 2I_0 [1 + \cos(2\mathbf{k}\cdot\mathbf{r})] . (3.8)
Note that whereas the intensity of each individual plane wave is uniform,
their sum has an intensity distribution that varies sinusoidally along
the propagation direction, k. The spatial frequency of the interference
fringes is 2u = 2k/(2π), and as predicted by eqn (3.6), the spatial period
of the interference fringes is λ/2. A time sequence of two counter-
propagating waves and their sum is shown in Fig. 3.5.
Fig. 3.5 The formation of a standing wave. Left column: Two counter-
propagating waves moving apart at times t = 0 (top) to (11/40)T (bottom)
in intervals of T /40. Right column: The corresponding sum. At t = T /4
(row 11), the amplitude is zero. For t > T /4 (row 12), the amplitude starts
to increase again. The shaded region indicates the maximum amplitude of
the resulting standing wave.
3.5 Two spherical waves

Our next example of adding two waves is the case of two spherical waves.
We consider two waves with the same amplitude, Es , and wavelength,
λ = 2π/k, but with different origins. We neglect the explicit time
dependence, as it is the same for both terms and disappears when
we calculate the intensity. We are interested in finding the field at an
observation point, P, with co-ordinates, r, as shown in Fig. 3.6. At P,
the two waves propagate in different directions. Consequently, although
the wave vectors k1 and k2 have the same magnitude, they point in
different directions, directed normal to the wave front and parallel to a
line back to the source point in both cases, see Fig. 3.6. We can write the
total (scalar) electric field as the sum of the fields from each spherical
wave:
E = E_s \frac{e^{i[\mathbf{k}_1\cdot(\mathbf{r}-\mathbf{d}_1)]}}{ik|\mathbf{r}-\mathbf{d}_1|} + E_s \frac{e^{i[\mathbf{k}_2\cdot(\mathbf{r}-\mathbf{d}_2)]}}{ik|\mathbf{r}-\mathbf{d}_2|} , (3.9)
where d1 and d2 are vectors that specify the origin of each spherical
wave. We can make a useful simplification when the propagation
distance z is much larger than both the wavelength λ and the separation
of the sources, |d1 −d2 |. In this case we can put r−d1,2 ≈ r̄, where r̄ is a
vector from the mid-point between the source points to the observation
point, see Fig. 3.6. In this case
E = E_s \frac{e^{i(\mathbf{k}_1\cdot\bar{\mathbf{r}})}}{ik\bar{r}} \left\{ 1 + e^{i[(\mathbf{k}_2-\mathbf{k}_1)\cdot\bar{\mathbf{r}}]} \right\} , (3.10)
and the intensity is
I = 2\bar{I}_s \{ 1 + \cos[(\mathbf{k}_2-\mathbf{k}_1)\cdot\bar{\mathbf{r}}] \} , (3.11)
where Īs = Is /(kr̄)² is the intensity of a single spherical wave at a
distance r̄. Apart from this pre-factor (the peak intensity drops as 1/r̄²
as required by energy conservation) the result looks the same as obtained
for plane waves, eqn (3.3). However, the interference pattern, Fig. 3.6
compared to Fig. 3.4, looks very different, because of the way that k
changes direction for a spherical wave. The intensity pattern, Fig. 3.6,
also looks very different to the instantaneous amplitude patterns for
the interference of circular water waves, Fig. 3.1. Whereas for water,
we can observe the amplitude directly and the absolute phase of every
component matters, for light we observe the time-averaged energy flux
(or intensity) and only the relative phase between the waves matters.
Next, we apply the above results to Young’s interference experiment.
Fig. 3.6 The interference pattern for two spherical waves with centres at
positions d1 and d2 relative to the origin (marked by the black dot).
The time-averaged intensity pattern is shown using the greyscale. The wave
vectors at the observation position P (white dot) are k1 and k2 (indicated
by the white arrows). The polar vector and a vector from the mid-point
between the sources to the point P are r and r̄, respectively. The wave crests
are indicated by the white semicircles. In contrast to plane waves, Fig. 3.4, the
interference pattern spreads out as it propagates. Note the difference to an
amplitude interference pattern, such as circular water waves, shown in Fig. 3.1.
3.6 Young’s interferometer

As Young’s experiment remains one of the simplest examples of two-wave
interference, it has become the paradigm to explore interferometry in
general, including in quantum physics.4 If the hole size is small compared
to their separation and the distance to the screen, then we can derive
an approximate expression for the interference pattern by summing two
spherical waves.
4 Interference experiments have also been performed using neutrons, atoms,
electrons, and molecules, see Adams et al. (1994).
Fig. 3.7 Geometry of Young’s interferometer. Two small apertures
separated by d are illuminated normally by uniform monochromatic light. In
the paraxial regime, x < z, the difference in the optical path from each
aperture to the observation plane is d sin θ ≈ d(x/z).
Consider an opaque screen in the z = 0 plane containing two small
holes with positions (x′, y′) = (±d/2, 0) and diameters D ≪ d, as in
Fig. 3.7. If the screen is illuminated by a monochromatic plane wave
propagating along z, then the field at z ≫ d is given by the sum of two
spherical waves:
E = E_s \frac{e^{ikr_1}}{ikr_1} + E_s \frac{e^{ikr_2}}{ikr_2} , (3.12)
where Es is the effective amplitude of the spherical waves,5 and r1 and r2
are the distances from the centre of each hole to the observation point,
as defined in Fig. 3.7. Again we neglect the explicit time dependence as
it cancels when we calculate intensity. In the far-field, z ≫ d, we can
neglect the slight variation in the relative amplitude of the two waves
and replace 1/r by 1/r̄, where r̄ = z + (x² + y²)/2z is the paraxial
distance from the mid-point between the two apertures (0, 0, 0) to the
observation point (x, y, z), see Fig. 3.7. In this case
E = \bar{E}_s \left( e^{ikr_1} + e^{ikr_2} \right) , (3.13)
where Ēs = Es /ikr̄ is the amplitude of a single spherical wave at a
distance r̄.6 For the region close to the optical axis, x < z, we can
use paraxial spherical waves, eqn (2.40), and replace 1/r by 1/z. In
both cases, the bracketed term in eqn (3.13) has the form of a two-
phasor sum, as shown in Fig. 3.8. The relative phasor angles, which
depend on the aperture-to-screen path difference, determine whether
we see constructive or destructive interference.
5 Only if we specify a hole diameter can we relate Es to the amplitude of the
incident field, E0 , see Chapter 5.
6 For slits rather than holes, we would replace the amplitude by
Ēc = Ec /√(ikr̄), where Ec is the amplitude of the cylindrical wave, see
Chapter 5 for more details.
To find the path difference, we rewrite the distances r1,2 using the
Fresnel approximation. Recall that the distance between a general
input coordinate (x′, 0) and an observation point (x, z) is
r = \left[ (x - x')^2 + z^2 \right]^{1/2} , (3.14)
and for a propagation distance greater than the transverse region of
interest, z > |x − x′|, we can expand the square root as follows:
r = \left[ (x - x')^2 + z^2 \right]^{1/2} \approx z + \frac{(x - x')^2}{2z}
  = z + \frac{x^2 - 2x'x + x'^2}{2z} . (3.15)
Substituting x′ = ±d/2 and using r̄ = z + x²/2z, we obtain
r_1 = \bar{r} - \frac{xd}{2z} + \frac{d^2}{8z} ; \quad r_2 = \bar{r} + \frac{xd}{2z} + \frac{d^2}{8z} . (3.16)
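The quality of the Fresnel approximation can be checked by comparing the exact path difference, from eqn (3.14), with the approximate value r2 − r1 = xd/z that follows from eqn (3.16). The sketch below is illustrative only: the function names and the numerical values (wavelength, slit separation, screen distance, and observation position) are assumptions chosen to represent a typical bench-top geometry.

```python
import math


def exact_distance(x, x_src, z):
    """Exact distance from (x_src, 0) to (x, z), eqn (3.14)."""
    return math.hypot(x - x_src, z)


def fresnel_distance(x, x_src, z):
    """Fresnel (paraxial) approximation to the same distance, eqn (3.15)."""
    return z + (x - x_src) ** 2 / (2 * z)


wavelength = 633e-9  # assumed wavelength in metres
d = 0.2e-3           # assumed aperture separation in metres
z = 1.0              # assumed distance to the observation plane in metres
x = 5e-3             # assumed transverse observation position in metres

# Path difference r2 - r1, with sources at x' = -d/2 and x' = +d/2.
exact_diff = exact_distance(x, -d / 2, z) - exact_distance(x, d / 2, z)
fresnel_diff = fresnel_distance(x, -d / 2, z) - fresnel_distance(x, d / 2, z)

# Error of the approximation, expressed as a fraction of a wavelength.
error_in_wavelengths = abs(exact_diff - fresnel_diff) / wavelength

print(f"exact path difference   = {exact_diff:.6e} m")
print(f"Fresnel path difference = {fresnel_diff:.6e} m")
print(f"error / wavelength      = {error_in_wavelengths:.2e}")
```

For these assumed values the error is a very small fraction of a wavelength, so the phasor angles are essentially unaffected by the approximation.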
In the far-field, where z ≫ x′, it is also convenient to neglect the x′²/2z =
d²/8z term. This is known as the Fraunhofer approximation,
see Chapter 5.7 For Young’s two apertures, we can retain the x′² terms,
as they cancel in the final result. Substituting for r1 and r2 , eqn (3.16)
in eqn (3.13), the sum of the two waves becomes
E = \bar{E}_s e^{ik(\bar{r} + d^2/8z)} \left[ e^{ikdx/2z} + e^{-ikdx/2z} \right]
  = 2\bar{E}_s e^{ik(\bar{r} + d^2/8z)} \cos\left( \frac{kdx}{2z} \right) . (3.17)
This expression tells us how the path difference, ±dx/2z, and hence
the phasor angles, ±kdx/2z, depend on the transverse position x in the
observation plane. The prefactor contains information about the mean
distance between the apertures and the observation point, and gives rise
to a global phase that disappears when we calculate the time-averaged
intensity. The intensity is proportional to the modulus-squared of the
field:
I = 4\bar{I}_s \cos^2\left( \frac{kdx}{2z} \right) = 2\bar{I}_s \left[ 1 + \cos\left( \frac{2\pi x}{(\lambda/d)z} \right) \right] , (3.18)
where Īs = Is /(kr̄)² is the on-axis intensity observed a distance z
downstream for a single hole. Note that in the expression for the
intensity, the global phase factor, e^{ik(r̄+d²/8z)} , has disappeared and only
the relative phase between the two paths matters.
7 Note that although the x′²/2z term becomes smaller with increasing z, the
x′x/z term does not, as the range of x increases linearly with z as the light
field spreads out. Typically, z is at least a factor of two larger than x and
more than an order of magnitude larger than x′.
Fig. 3.8 Intensity distribution, I, as a function of transverse displacement x
in the far-field of Young’s interferometer. The relative phasor angles of each
component for two intensity maxima and a position of zero intensity are
shown on the right.
The intensity far downstream of Young’s two apertures, eqn (3.18)
plotted in Fig. 3.8, varies sinusoidally with position along the x axis
with a spatial frequency u0 = d/(λz) or effective ‘wave length’, 1/u0 =
(λ/d)z, that increases with propagation distance z. The fringes of
light and dark corresponding to regions of constructive and destructive
interference8 spread out as we move further downstream, as in the right-
hand side of Fig. 3.6. Note the difference to the sum of two plane waves,
Fig. 3.4, where there is no spreading out of the interference pattern.
A convenient way to interpret the light and dark fringes is in terms
of phasor diagrams, Fig. 3.8. For intensity maxima at positions x =
m(λ/d)z, where m is any integer, the two phasors are parallel and add
constructively, whereas for intensity minima at positions x = (m +
1/2)(λ/d)z, the phasors are anti-parallel and cancel.
Note that only in the far-field can we neglect the finite size of the
aperture, and calculate the field as a discrete sum of two phasors. For
real apertures with a finite size, we have to sum the field from all
contributing points in the input plane, and the discrete sum is replaced
by an integral, see Chapter 5.
8 The interference pattern can still be observed in the limit where there is on
average only one photon at a time, see e.g. Taylor (1909).
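The fringe positions quoted above can be verified with a short calculation. This is a minimal sketch under stated assumptions: the function names and the numerical values of the wavelength, aperture separation, and screen distance are illustrative and not taken from the text. It evaluates eqn (3.18) and the fringe period (λ/d)z.

```python
import math


def young_intensity(x, z, d, wavelength, i_single=1.0):
    """Far-field two-aperture intensity, eqn (3.18), in units of the single-hole value."""
    k = 2 * math.pi / wavelength
    return 4 * i_single * math.cos(k * d * x / (2 * z)) ** 2


def fringe_period(z, d, wavelength):
    """Spacing between adjacent maxima, (lambda/d) z."""
    return wavelength * z / d


wavelength = 633e-9  # assumed wavelength in metres
d = 0.2e-3           # assumed aperture separation in metres
z = 1.0              # assumed distance to the screen in metres

period = fringe_period(z, d, wavelength)
for m in range(3):
    bright = young_intensity(m * period, z, d, wavelength)
    dark = young_intensity((m + 0.5) * period, z, d, wavelength)
    print(f"m = {m}: I(max) = {bright:.3f}, I(min) = {dark:.3e}")

# Doubling the propagation distance doubles the fringe period.
print(f"period at z = {period * 1e3:.3f} mm, at 2z = {fringe_period(2 * z, d, wavelength) * 1e3:.3f} mm")
```

The linear growth of the period with z is the spreading out of the pattern that distinguishes two spherical waves from two plane waves.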
Example 3.2
Young’s double slit using eqn (3.11): Consider two slits aligned in the vertical,
or y direction, centred at (±d/2, 0), as in Fig. 3.9. The mid-point between the slits
is at the origin. If the field is uniform along y, we need only consider the field
dependence along x. From eqn (3.11) the intensity of the sum of the two waves is
I = 2\bar{I}_s \{ 1 + \cos[(\mathbf{k}_2-\mathbf{k}_1)\cdot\bar{\mathbf{r}}] \} , (3.19)
where for slits the two waves are cylindrical, and Īs = Is /kr̄ with Is equal to the
on-axis intensity in the observation plane for a single slit. If θ1 and θ2 are the angles
from the centre of each slit to the observation point relative to the z axis, then
(\mathbf{k}_2-\mathbf{k}_1)\cdot\bar{\mathbf{r}} = (k\sin\theta_2 - k\sin\theta_1)x + (k\cos\theta_2 - k\cos\theta_1)z .
In the far-field, z ≫ x, we can make use of the small-angle approximation,
\sin\theta_1 \approx \theta_1 = \frac{x - d/2}{z} ; \quad \cos\theta_1 \approx 1 ,
\sin\theta_2 \approx \theta_2 = \frac{x + d/2}{z} ; \quad \cos\theta_2 \approx 1 .
Substituting into the intensity formula we find
I = 2\bar{I}_s \left[ 1 + \cos\left( \frac{kdx}{z} \right) \right] . (3.20)
This result is identical to the two-hole example, eqn (3.18), except for the different
form of Īs .
Fig. 3.9 Geometry of the double slit used in Example 3.2. In practice, z and
d, more like in Fig. 3.8.
3.7 Plane plus spherical

In addition to the sum of two plane or spherical waves, another case of
interest is a plane wave plus a spherical wave. For convenience, we shall
assume that the amplitude and phase of the spherical wave at the point
(0, 0, z), Es /ikz, is equal to the amplitude of the plane wave E0 . The
sum of a spherical wave and a plane wave as a function of transverse
displacement ρ = (x² + y²)^{1/2} from the z axis is
E = E_0 e^{ikz} \left[ 1 + e^{ik\rho^2/2z} \right] , (3.21)
where we have used the paraxial form of the spherical wave, eqn (2.40).
The corresponding intensity distribution is
I = 2I_s \left[ 1 + \cos\left( \frac{k\rho^2}{2z} \right) \right] = 4I_s \cos^2\left( \frac{k\rho^2}{4z} \right) . (3.22)
This intensity pattern in the xy plane is shown in Fig. 3.10. As we
have seen before, interference converts phase information (in this case
the phase across the wave front of a spherical wave, as in Fig. 2.1)
into intensity information. The plane wave acts as a reference that
generates an intensity read-out of the phase of the other wave.9 The
first zero moving away from the centre occurs at a radius ρ1 , where
kρ1²/4z = π/2, which gives
\rho_1 = \sqrt{\lambda z} . (3.23)
The subsequent dark fringes occur where the phase difference kρ²/2z is an
odd multiple of π,
\rho_m = \sqrt{(2m - 1)\lambda z} , (3.24)
where m is a positive integer.
9 We shall encounter similar patterns when we look at phase differences
between on-axis and off-axis paths in Chapter 5.
Fig. 3.10 The interference pattern between a plane wave and a spherical
wave a distance z downstream of the origin of the spherical wave.
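The dark-ring radii can be confirmed by evaluating the intensity of eqn (3.22) at the radii given by eqn (3.24). The following sketch is illustrative: the function names and the values of the wavelength and distance are assumptions, not taken from the text.

```python
import math


def plane_plus_spherical_intensity(rho, z, wavelength, i_s=1.0):
    """Intensity of a plane wave plus a paraxial spherical wave, eqn (3.22)."""
    k = 2 * math.pi / wavelength
    return 2 * i_s * (1 + math.cos(k * rho ** 2 / (2 * z)))


def dark_ring_radius(m, z, wavelength):
    """Radius of the m-th dark ring (m = 1, 2, ...), eqn (3.24)."""
    return math.sqrt((2 * m - 1) * wavelength * z)


wavelength = 633e-9  # assumed wavelength in metres
z = 1.0              # assumed distance from the point source in metres

for m in (1, 2, 3):
    rho_m = dark_ring_radius(m, z, wavelength)
    intensity = plane_plus_spherical_intensity(rho_m, z, wavelength)
    print(f"m = {m}: rho = {rho_m * 1e3:.4f} mm, I/Is = {intensity:.2e}")
```

Because the radii scale as the square root of (2m − 1), the rings become more closely spaced with increasing radius.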
In Fig. 3.11 we show the interference pattern in the xz plane. A
pattern of this type would be produced if a point-like scatterer (at the
origin) reflects a part of the incident plane wave. Alternatively, by
placing a mask at −z that reproduces the amplitude of the interference
pattern we can recreate the field produced by the scatterer when
it is not there. This is the principle of holography, invented by
Dennis Gabor (Budapest 1900–London 1978) while working at British
Thomson-Houston in 1947. The mask needed to create the required
phase and amplitude is known as a hologram. As any object is
simply an array of scattering points, holography can recreate images
of three-dimensional objects. Figure 3.11 also provides an example of
how interference can produce light patterns that appear not to travel in
straight lines!
Fig. 3.11 Intensity pattern in the xz plane for a superposition of a plane
wave and a spherical wave. A ‘mask’ placed at −z that creates the field
shown in Fig. 3.10 reproduces an image of a point-like object at the origin. A
complex mask or hologram can be used to recreate any scattered field.
Example 3.3
Newton’s rings: A pattern similar to Fig. 3.10 was first observed by Newton
in 1717 when he looked at the reflection from a spherical glass surface placed
on top of a reflecting planar surface. A similar pattern is also observed due to
the interference between the reflections from planar and curved surfaces of a
plano-convex lens, as illustrated in Fig. 3.12. The sum of planar and curved
waves from the front and back surface of a lens with radius of curvature RL is
E = R^{1/2} E_0 e^{ikz} - (1 - R^{1/2}) R^{1/2} E_0 e^{ikz} e^{2inkt}
  = R^{1/2} E_0 e^{ikz} - (1 - R^{1/2}) R^{1/2} E_0 e^{ikz} e^{2inkt_0} e^{-ink\rho^2/R_L} ,
where R^{1/2} is the amplitude reflection coefficient as defined in eqn (2.32),
t0 is the thickness of the lens, and the second line uses the local thickness
t = t0 − ρ²/(2RL ). If nkt0 = mπ, then we observe bright fringes at
positions ρm = √[(2m − 1)λRL /2] and the pattern is similar to Fig. 3.10 with
the bright and dark fringes interchanged.
3.8 Three waves

Many interference phenomena require a sum of more than two waves.
For any multi-path interference phenomenon involving more than two
paths, it is possible to extend the two-phasor sum of eqn (3.13) to three
or more. As a first example, we consider a three-phasor sum. Consider
an aperture containing three small holes with positions x′ = −d, 0, and
+d as illustrated in Fig. 3.13. The aperture is illuminated normally by
uniform monochromatic light as before. The field at a position (x, z)
for z ≫ d is given by the equivalent of eqn (3.13), except now there are
three terms:
E = \bar{E}_s e^{ik\bar{r}} \left[ e^{-ikdx/z} + 1 + e^{ikdx/z} \right] . (3.25)
The intensity distribution along the x axis has the form
I = \bar{I}_s \left[ 1 + 2\cos\left( \frac{kdx}{z} \right) \right]^2 . (3.26)
We plot both the normalized field E/Ēs and the normalized intensity
I/Īs as a function of position in the observation plane in Fig. 3.14.
The new feature compared to two apertures is that there are now two
types of peaks: big peaks with relative intensity 9, and smaller peaks
with relative intensity 1. These two types of peak are called principal
maxima and subsidiary maxima, respectively.
The principal and subsidiary maxima are easily interpreted in terms
of phasor diagrams. For a principal maximum, the phasors interfere
constructively, giving a resultant field vector that is three times larger
and an intensity (proportional to the modulus-squared of the field) that
is nine times larger, see Fig. 3.14.10 As we move away from a principal
maximum along the x axis, one phasor remains fixed while the other two
rotate clockwise and anti-clockwise at the same rate. We can think of
this as a clock face with three hands. At a position x the relative angles
of the phasors are φ = −kdx/z, 0, and +kdx/z. When kdx/z = 2π/3,
i.e. x = (1/3)(λ/d)z, the three phasors interfere destructively to give zero.
We refer to the zero on either side of the central maximum as the first
zero. When kdx/z = π, the two rotating phasors are in the opposite
direction to the fixed phasor, giving a resultant of minus one, and hence
the intensity is one. This case corresponds to the subsidiary maximum
and is midway between principal maxima. As we move further along x
we eventually come back to the position at kdx/z = 2π where all the
phasors line up again, giving another principal maximum.
10 Note that the spatial average of the intensity along the x axis is three times
that of a single phasor. This highlights the key feature of interference: energy
is conserved but spatially redistributed.
Fig. 3.12 The geometry used to observe Newton’s rings in the reflections
from a plano-convex lens. The planar and curved surfaces of the lens produce
a plane wave and a spherical wave that interfere.
Fig. 3.13 The geometry of three small apertures with spacing d. The electric
field at (x, z) is proportional to the sum of three phasors. As only the
relative phases matter, the central path provides a reference with a fixed phasor
(darker grey) aligned along the real axis. As we move along the x axis,
the other two phasors (lighter grey) rotate anti-clockwise and clockwise
with angles φ = ±kdx/z.
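The three-phasor picture is easy to reproduce numerically. The sketch below is illustrative only (the function names are assumptions): it sums three unit phasors with relative angles −φ, 0, and +φ, where φ = kdx/z, and evaluates the intensity at the phase angles discussed above.

```python
import cmath
import math


def three_phasor_field(phi):
    """Sum of three unit phasors with relative angles -phi, 0, +phi (bracket in eqn 3.25)."""
    return cmath.exp(-1j * phi) + 1 + cmath.exp(1j * phi)


def three_phasor_intensity(phi):
    """Normalized intensity I / I_s, equivalent to eqn (3.26)."""
    return abs(three_phasor_field(phi)) ** 2


for label, phi in [("principal maximum", 0.0),
                   ("first zero", 2 * math.pi / 3),
                   ("subsidiary maximum", math.pi),
                   ("next principal maximum", 2 * math.pi)]:
    print(f"{label:>24}: I / Is = {three_phasor_intensity(phi):.3f}")

# Spatial average over one period, sampled uniformly in phi.
n_samples = 360
mean_intensity = sum(
    three_phasor_intensity(2 * math.pi * i / n_samples) for i in range(n_samples)
) / n_samples
print(f"mean I / Is over one period = {mean_intensity:.3f}")
```

The average over one period is three times the single-phasor intensity, in agreement with the remark that interference redistributes energy without changing the total.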
Example 3.4
Three plane waves: Consider the sum of three plane waves propagating at angles
−θ0 , 0, and +θ0 , relative to the z axis in the xz plane. The total field is
E = E_0 \left[ e^{i(k\sin\theta_0 x + k\cos\theta_0 z)} + e^{ikz} + e^{i(-k\sin\theta_0 x + k\cos\theta_0 z)} \right] , (3.27)
and the intensity distribution along the x axis in the far-field has the form
I = I_0 \left\{ 1 + 4\cos[k(\cos\theta_0 - 1)z] \cos(k\sin\theta_0 x) + 4\cos^2(k\sin\theta_0 x) \right\} . (3.28)
The intensity pattern is plotted in Fig. 3.15. As the third plane wave has a different
spatial frequency along z, the intensity remains a function of z, repeating over a
distance Λ = λ/(1 − cos θ0 ).
Fig. 3.14 Normalized field (a) and intensity (b) corresponding to the sum
of three phasors used to describe the far-field diffraction pattern produced
by three small apertures. Phasor diagrams for positions where I/Is = 0,
1, and 9 are shown.
3.9 Diffraction grating

In this section, we extend the above analysis to the general case of N
apertures, which takes the form of an N -phasor sum. For large N ,
the N -phasor sum provides a useful description of the properties of a
diffraction grating, first made by Joseph von Fraunhofer (Straubing
1787–Munich 1826) to calibrate wavelength in measurements of
dispersion.11 Fraunhofer’s grating consisted of lines ruled on a metal
surface and worked in reflection rather than transmission, but the theory
is similar for both cases. First, we consider the case of a transmission
grating. For N apertures with spacing d, the equivalent of eqns (3.13)
and (3.25) is a sum of N terms which can be written in the form of a
geometric progression, see Appendix B, Section B.9:
E = \bar{E}_s e^{ik\bar{r}} e^{-i(N-1)kdx/2z} \sum_{n=0}^{N-1} e^{inkdx/z}
  = \bar{E}_s e^{ik\bar{r}} \frac{\sin(Nkdx/2z)}{\sin(kdx/2z)} ,
and the intensity is
I = \bar{I}_s \frac{\sin^2(Nkdx/2z)}{\sin^2(kdx/2z)} . (3.29)
This function is plotted in Fig. 3.16. The key features are that the
principal maxima are located at integer multiples of (λ/d)z and have an
intensity proportional to N². The first zeros on either side of the central
principal maximum occur at x = ±[λ/(N d)]z, when the N phasors are
equally distributed around the circle, i.e., the angular spacing between
adjacent phasors is kdx/z = 2π/N , which gives x/(λz) = 1/(N d).
11 Fraunhofer used glasses with different dispersion to make achromatic
lenses and build achromatic telescopes. He died at only thirty-nine from
tuberculosis and possibly lead oxide poisoning, having become one of the
greatest experimentalists in the history of optics.
Fig. 3.15 The interference of three plane waves. The spacing between the
principal maxima in the x direction is λ/(2 sin θ0 ). The phase mismatch
between the on-axis (2) and off-axis (1 and 3) waves leads to an intensity
variation along z, with the pattern repeating over a length scale
λ/(1 − cos θ0 ).
Fig. 3.16 Plot of eqn (3.29) for N = 2 to 6. Note that the peak intensity
is proportional to N², the number of subsidiary maxima is N − 2, and the
position of the first zero is [λ/(N d)]z.
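The N -phasor sum and its closed form can be compared directly. The following is a minimal sketch, with assumed function names, written in terms of the phase step φ = kdx/z between adjacent phasors. It checks that the direct sum reproduces the ratio of sines in eqn (3.29), that the peak intensity is N², and that the first zero occurs at φ = 2π/N.

```python
import cmath
import math


def n_phasor_intensity(phi, n_slits):
    """Normalized intensity from a direct sum of n_slits unit phasors with phase step phi."""
    field = sum(cmath.exp(1j * n * phi) for n in range(n_slits))
    return abs(field) ** 2


def closed_form_intensity(phi, n_slits):
    """Ratio-of-sines form of eqn (3.29), with the limit N^2 taken at principal maxima."""
    denominator = math.sin(phi / 2)
    if abs(denominator) < 1e-12:
        return float(n_slits ** 2)
    return (math.sin(n_slits * phi / 2) / denominator) ** 2


for n_slits in range(2, 7):
    peak = n_phasor_intensity(0.0, n_slits)
    first_zero = n_phasor_intensity(2 * math.pi / n_slits, n_slits)
    print(f"N = {n_slits}: peak = {peak:.1f}, intensity at phi = 2*pi/N: {first_zero:.2e}")
```

Increasing N therefore makes the principal maxima both brighter (as N²) and narrower (as 1/N), which is the property exploited in a diffraction grating.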
An important example where we can use the N -phasor sum is the
reflection grating shown in Fig. 3.17. The grating consists of N
ruled lines which produce N reflections, each having traversed a different
optical path. The phase difference between the centre of the grating
and a reflection from the mth line is
\phi_m = mkd(\sin\theta_i + \sin\theta_r) , (3.30)
where θi and θr are the angle of incidence and angle of reflection relative
to the grating normal n̂, respectively. The total reflected field is
E_r = E_i \sum_{m=-N/2}^{N/2} e^{i\phi_m} , (3.31)
which has the form of an N -phasor sum. The first-order intensity
maximum occurs when the path difference between adjacent lines is one
wavelength, i.e.,
d(\sin\theta_i + \sin\theta_r) = \lambda . (3.32)
The angle or position dependence around the principal maxima is the
same as in Fig. 3.16.12 A reflection grating is sometimes used as a
frequency-selective reflector in a laser cavity, see Chapter 11.
12 If the first-order maximum is at θr then the additional path difference for
an angle θr + δ is λ + δ cos θr , which gives a phase difference linearly
proportional to δ, and hence transverse displacement in the observation plane,
x = δz.
Fig. 3.17 A reflecting diffraction grating. For an angle of incidence θi , the
light reflected from successive ridges interferes constructively for certain
angles of reflection θr .
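Equation (3.32) can be solved for the angle of reflection once the wavelength, line spacing, and angle of incidence are specified. The sketch below is illustrative: the function name, the grating with 600 lines per millimetre, and the wavelength are assumptions. The `order` argument generalizes the path difference from one wavelength to an integer number of wavelengths; this is the standard extension and is not stated explicitly in the text above.

```python
import math


def diffraction_angle(wavelength, d, theta_i, order=1):
    """Angle of reflection from d (sin(theta_i) + sin(theta_r)) = order * wavelength.

    Returns None when no propagating diffracted order exists.
    """
    sin_theta_r = order * wavelength / d - math.sin(theta_i)
    if abs(sin_theta_r) > 1:
        return None
    return math.asin(sin_theta_r)


wavelength = 633e-9  # assumed wavelength in metres
d = 1e-3 / 600       # assumed line spacing for 600 lines per millimetre

theta_r = diffraction_angle(wavelength, d, theta_i=0.0)
print(f"first order at normal incidence: {math.degrees(theta_r):.2f} degrees")

# Special case theta_i = theta_r, where the first order returns along the
# incident direction (the geometry relevant to feedback in a laser cavity).
theta_retro = math.asin(wavelength / (2 * d))
print(f"retro-reflection angle: {math.degrees(theta_retro):.2f} degrees")
```

Since the angle depends on wavelength, rotating the grating selects which wavelength is sent back along the incident beam.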
Example 3.5
Wavelength resolution: A grating is typically used to separate different
wavelengths. If we change the wavelength, then the principal maxima move (except for
the central, or zero-order, maximum). If we consider two wavelengths, λ1 and λ2,
then the first-order diffraction peaks are at relative positions (λ1/d)z and (λ2/d)z,
respectively. We can say that the two wavelengths are ‘resolved’ if the first
principal maximum at one colour (λ1) overlaps with the zero adjacent to the principal
maximum of the other colour (λ2), as shown in Fig. 3.18.13 As the distance to the
first zero is (λ/Nd)z, this condition gives
$$\frac{\lambda_2}{d}\,z = \frac{\lambda_1}{d}\,z + \frac{\lambda_1}{Nd}\,z. \tag{3.33}$$
Rearranging, we find that
$$\frac{\lambda_1}{\Delta\lambda} = \frac{\lambda_1}{\lambda_2 - \lambda_1} = N. \tag{3.34}$$
This quantity is often called the resolving power of the grating. The equation
tells us that the smallest wavelength difference we can hope to resolve is inversely
proportional to the number of slits (or lines) on the grating. Note that the resolving
power is a factor of two higher for the second-order diffraction peak.

Fig. 3.18 Detail of the zero and first order for a diffraction grating
illuminated by two colours, λ1 (grey) and λ2 (black). In this example, the
wavelength difference is chosen such that first order at λ2 sits at the position
of a zero for λ1.

13 See also Chapter 9 for further discussion of this point.
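To put a number on eqn (3.34), the sketch below estimates how many grating lines must be illuminated to resolve two nearby wavelengths. The function name `lines_needed` is hypothetical, and the example wavelengths (the sodium D-lines at 589.0 and 589.6 nm) are chosen for illustration. The `order` argument assumes that the resolving power scales with the diffraction order, as stated above for the second order.

```python
import math

def lines_needed(lambda_1, lambda_2, order=1):
    """Smallest N for which λ1/Δλ <= order × N (hypothetical helper)."""
    resolving_power = lambda_1 / abs(lambda_2 - lambda_1)
    return math.ceil(resolving_power / order)

# Illustrative values: the sodium D-lines
n_first = lines_needed(589.0e-9, 589.6e-9)
n_second = lines_needed(589.0e-9, 589.6e-9, order=2)
print(n_first, n_second)  # 982 491
```

Roughly a thousand lines are sufficient in first order, and about half as many in second order.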
3.10 Interferometry
The application of interference to measurement is known as interferometry.
Interferometry allows us to measure changes in the phase of a
wave, and is used to measure length, as in gravitational wave detection
(Abbott et al. 2016) or the spectrum of light, see Chapter 8. Young’s
two-hole experiment, N-slits, and the reflection grating in Fig. 3.17
are examples of wave front division interferometry, where the
wave front is divided spatially, and the component parts subsequently
recombine and interfere. The colour of a butterfly wing, or the chirped
echo from the steps of the Chichen Itza pyramid14 are other examples. It
is also possible to produce interference by amplitude division, where a
part of the light field is redirected by a partially reflecting surface, then
the two parts are recombined at the same, or another, interface. For
measurement applications, amplitude division has the advantage that
no light is thrown away. Below we focus on two examples of amplitude
division interferometry: the Fabry–Perot interferometer15 and the
Michelson interferometer.16 Other examples include the Mach–Zehnder,
Sagnac, and Jamin. Aside from differences in optical layout
and applications, the underlying physics of all interferometers is the
same, namely, the addition of two (or more) waves.

14 The reflections of successive steps produce a rising pitch similar to the call
of the Quetzal bird.

15 Named after Maurice Paul Auguste Charles Fabry (Marseille 1867–Paris
1945) and Jean-Baptiste Alfred Pérot (Metz 1863–Paris 1925). Fabry, along
with Henri Buisson, also discovered the ozone layer, see Mulligan (1998).

16 Named after Albert Abraham Michelson (Strzelno 1852–Pasadena
1931) who used it to try to measure the motion of the Earth through the
aether.

3.11 Fabry–Perot etalon
In Chapter 2, we looked at the reflection and transmission of light at an
interface. In this section we consider two interfaces where light reflects
back and forth giving rise to multiple-path interference. In optics, a
system that reflects light back and forth is variously known as a Fabry–Perot
interferometer, an etalon,17 or a cavity. An example is
illustrated in Fig. 3.19. The physics of the Fabry–Perot interferometer,
described using an N-phasor sum, provides a convenient starting point
to understand a diverse range of interference phenomena, including the
colour of soap and oil films, anti-reflection coatings on optics and laser
cavities. All that matters in each case is the reflection coefficient of each
interface, $R^{1/2}$, their separation, ℓ, and the wavelength of the light, λ.

17 Translated from the French as standard.
The transmitted field is a sum of light transmitted directly and light
that is reflected back and forth and then transmitted, as illustrated in
Fig. 3.19. If we assume that the incident light is a plane wave and the
amplitude reflection and transmission coefficients at each interface are
$R^{1/2}$ and $T^{1/2}$, respectively,18 then we can write the total transmission
as a geometric progression:
$$E_t = E_0 T\left(1 + R e^{i\phi} + R^2 e^{2i\phi} + \ldots\right) = E_0 T\,\frac{1}{1 - R e^{i\phi}}, \tag{3.35}$$
where $\phi = 2nk\ell\cos\theta$ is the phase accumulated during one round trip,
n is the refractive index, θ is the angle of propagation (relative to z)
inside the Fabry–Perot, and ℓ is the length. The transmitted intensity
is given by the modulus-squared:19
$$\frac{I_t}{I_0} = \frac{1}{1 + 4(\mathcal{F}^2/\pi^2)\sin^2(\phi/2)}, \tag{3.36}$$
where we have defined
$$\mathcal{F} = \frac{\pi\sqrt{R}}{1 - R}, \tag{3.37}$$
which is known as the finesse. The finesse determines the sensitivity
of the transmission (or reflection) to small changes in the wavelength
or spacing between the two interfaces.20 For example, if we plot the
transmitted intensity, eqn (3.36), as a function of the length, ℓ, for two
values of the reflectivity, R = 0.10 (F = 1.1, grey) and R = 0.80
(F = 14, black), in Fig. 3.20, we see a dramatic change in the width of
the transmission maxima. For low reflectivity and hence low finesse, the
transmission oscillates sinusoidally as a function of the etalon or cavity
length, ℓ, with peak transmission whenever the round-trip phase is an
integer multiple of 2π. As the phase depends on the wavelength, λ, for
white light illumination, the interference fringes are coloured. This effect
gives rise to the colour observed in oil or soap films.

Fig. 3.19 A Fabry–Perot interferometer where two interfaces at z = −ℓ and
z = 0 reflect light back and forth.

18 The reflection and transmission coefficients are, in general, complex as there
is a phase shift on reflection, however, including these phase shifts does not
change the main result.

19 The intermediate steps are
$$\begin{aligned}
\frac{I_t}{I_0} &= \frac{T}{1 - Re^{i\phi}}\,\frac{T}{1 - Re^{-i\phi}} \\
&= \frac{T^2}{1 + R^2 - 2R\cos\phi} \\
&= \frac{T^2}{1 + R^2 - 2R\left(1 - 2\sin^2\phi/2\right)} \\
&= \frac{T^2}{(1 - R)^2 + 4R\sin^2\phi/2}.
\end{aligned}$$
Using $T^2 = (1 - R)^2$ we get
$$\frac{I_t}{I_0} = \frac{1}{1 + \left[4R/(1 - R)^2\right]\sin^2\phi/2}.$$

20 In Example 7.12, we show that the finesse also relates to the average
number of times that light is reflected back and forth before escaping.
For high reflectivity interfaces, the transmission consists of narrow
resonances, the black curve in Fig. 3.20. Again, the resonances occur
when the phase difference is an integer multiple of 2π, i.e., the mth
resonance is given by $\phi_m = 2nk_m\ell\cos\theta = 2\pi m$. If we write
$\phi = \phi_m + \epsilon$ then for high finesse, the transmission function,
eqn (3.36), is only appreciable in regions where ε is small, and we use the
small-angle approximation to write $\sin\phi/2 \approx \pm\epsilon/2$. Using
$\phi_m = 2nk_m\ell\cos\theta$ and $k_m = 2\pi\nu_m/c$, for n = 1 and θ = 0,
$\epsilon/2 = \pi(\nu - \nu_m)(2\ell/c)$ and eqn (3.36) becomes
$$\frac{I_t}{I_0} = \frac{(\Delta\nu/2)^2}{(\nu - \nu_m)^2 + (\Delta\nu/2)^2}, \tag{3.38}$$
where Δν is the full-width at half-maximum (FWHM) of the resonances,
$$\Delta\nu = \frac{\Delta\nu_{\rm fsr}}{\mathcal{F}}, \tag{3.39}$$
and $\Delta\nu_{\rm fsr} = c/(2\ell)$ is known as the free spectral range. Equation
(3.38) says that for high finesse, the transmission is a sum of Lorentzian
resonances with spacing $\Delta\nu_{\rm fsr}$, and width $\Delta\nu_{\rm fsr}/\mathcal{F}$. As a high
finesse cavity or etalon has narrow transmission peaks, it is extremely
sensitive to small changes in either the cavity length or the wavelength
(equivalently frequency) and can be used as a frequency reference.
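The relation between the exact transmission, eqn (3.36), and the Lorentzian approximation, eqn (3.38), can be explored with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and it assumes n = 1 and θ = 0 so that the round-trip phase is φ = 2πν/Δνfsr.

```python
import math

def finesse(R):
    """Finesse from eqn (3.37)."""
    return math.pi * math.sqrt(R) / (1 - R)

def transmission(phi, R):
    """Exact transmitted intensity ratio, eqn (3.36)."""
    F = finesse(R)
    return 1 / (1 + 4 * (F / math.pi) ** 2 * math.sin(phi / 2) ** 2)

def lorentzian(detuning, fsr, R):
    """Approximate line shape, eqn (3.38); detuning = ν − ν_m in units of fsr."""
    half_width = fsr / finesse(R) / 2
    return half_width ** 2 / (detuning ** 2 + half_width ** 2)

# The two reflectivities used in Fig. 3.20
print(round(finesse(0.10), 1), round(finesse(0.80)))  # 1.1 14

# Compare exact and Lorentzian forms at a detuning of half the FWHM (R = 0.8)
fsr = 1.0
detuning = fsr / finesse(0.80) / 2
exact = transmission(2 * math.pi * detuning / fsr, 0.80)
approx = lorentzian(detuning, fsr, 0.80)
print(exact, approx)
```

Even for the moderate finesse of 14, the two expressions agree to better than one per cent at the half-maximum point. The agreement improves as the reflectivity increases.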
Fig. 3.20 The intensity transmitted through a Fabry–Perot etalon or cavity
as a function of the cavity-round-trip phase, φ, which is proportional to
the cavity length, ℓ, and inversely proportional to the wavelength, λ. The
two curves correspond to relatively low and high reflectivities at the interfaces,
R = 0.1 (grey) and R = 0.8 (black), respectively. For high reflectivity, the
ratio of the spacing to the width of the transmission resonances is given by
the finesse, F. A typical Fabry–Perot might have a finesse of 100. A finesse
of over a billion has been achieved, see Kuhr (2007).
In summary, for multiple reflections either in a thin film, an etalon,

or a cavity, there are two key parameters: (i) how many times the light
reflects back and forth—characterized by finesse—and, (ii) the round-
trip phase shift, which determines the free spectral range.
3.12 Michelson interferometer

A second important class of amplitude-division interferometers uses
a glass interface or beamsplitter to divide the wave amplitude in
two.21 The two components propagate in orthogonal directions and are
subsequently recombined using mirrors to direct them back to the same
or another beamsplitter. In the Michelson interferometer, shown in
Fig. 3.21, the mirrors reflect the light back onto the same beamsplitter,
producing an output that is sensitive to the optical path difference,
2Δℓ, between the two arms.
For a monochromatic plane wave input with wavelength λ, the output
signal is a sum of two plane waves with a relative phase which depends
on the path difference, Δℓ. If for Δℓ = 0 the path length of both arms
is an integer number of wavelengths, then the output field is equal to
the input. If the mirror in one arm is moved backward by Δℓ then the
path difference increases to 2Δℓ and the field at the output becomes
$$E = \tfrac{1}{2} E_0 \left(1 + e^{i4\pi\Delta\ell/\lambda}\right), \tag{3.40}$$
and the intensity is given by
$$I = \tfrac{1}{2} I_0 \left[1 + \cos(4\pi\Delta\ell/\lambda)\right], \tag{3.41}$$
which has maxima at Δℓ = mλ/2, where m is an integer. A Michelson
interferometer has recently been used to detect gravitational waves, see
Abbott et al. (2016). If the gravitational wave increases the length of
one arm by Δℓ and decreases the length of the other arm by Δℓ, then
the output field is
$$E = \tfrac{1}{2} E_0 \left(e^{-i4\pi\Delta\ell/\lambda} + e^{i4\pi\Delta\ell/\lambda}\right) = E_0 \cos(4\pi\Delta\ell/\lambda). \tag{3.42}$$
We consider the application of a Michelson interferometer to the
measurement of gravitational waves in Exercise 3.11. The application of
the Michelson interferometer to measure coherence and the spectrum of
a source is discussed in Chapter 8.

21 An optical component that splits a light field into two equal components is
known as a 50:50 beamsplitter.
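The step from the output field, eqn (3.40), to the intensity, eqn (3.41), can be verified numerically. In the sketch below the function names are hypothetical, and the input intensity is taken as I0 = |E0|² = 1 (an assumption of this example). The field is summed as two phasors and its modulus-squared is compared with the closed form.

```python
import cmath
import math

def michelson_intensity(delta_l, wavelength, I0=1.0):
    """Modulus-squared of the two-phasor output field, eqn (3.40)."""
    field = 0.5 * (1 + cmath.exp(1j * 4 * math.pi * delta_l / wavelength))
    return I0 * abs(field) ** 2

def michelson_closed_form(delta_l, wavelength, I0=1.0):
    """Closed-form intensity, eqn (3.41)."""
    return 0.5 * I0 * (1 + math.cos(4 * math.pi * delta_l / wavelength))

lam = 633e-9  # illustrative wavelength (assumption)
for fraction in (0.0, 1 / 8, 1 / 4, 1 / 2):
    dl = fraction * lam
    print(fraction, michelson_intensity(dl, lam), michelson_closed_form(dl, lam))
```

Moving the mirror by λ/4 takes the output from a maximum to zero, and a displacement of λ/2 returns it to a maximum, consistent with maxima at Δℓ = mλ/2.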
Fig. 3.21 Optical layout of a Michelson interferometer. The input (from
the left) is divided into two by a beamsplitter. The two components
are retro-reflected by two mirrors and recombine on the same beamsplitter.
The direction of the recombined light depends on the relative phase of the two
components. Consequently, the output intensity, eqn (3.41), is a sensitive
function of the path difference, 2Δℓ.
Chapter summary
• The sum of two waves (with either planar or curved wave fronts)
produces interference.
• Constructive or destructive interference depends on the relative
phase of the two waves.
• Young’s two-hole experiment is an example of a wave front
division interferometer. For small holes the interference pattern is
given by the sum of two spherical waves.
• The interference of a plane wave and a spherical wave produces a
Newton’s ring interference pattern.
• Thin films, diffraction gratings, and laser cavities are all examples
of multiple-path interference.
• Young’s two-hole experiment and the diffraction grating are
examples of wave front division. A Fabry–Perot and Michelson
are examples of amplitude division interferometry, where light
interferes via multiple reflections between two reflective interfaces.
Exercises
(3.1) Interference with two inclined plane waves of different amplitudes
Rework the analysis of Section 3.3 for two plane
waves with the same frequency, propagating in
different directions, with different amplitudes.
Show that the interference pattern is still periodic
in space. What is the spatial period? What are
the maximum and minimum intensities?

(3.2) Wedge fringes
A plane wave with wavelength 633 nm is incident
on a pane of glass whose front and back surface
normals are inclined at angles of ±0.050° relative
to the propagation direction. Calculate the spatial
period of the fringes observed in reflection.

(3.3) Double slit with a green laser pointer
A spherical wave is written as $E = E_s e^{ikr}/(ikr)$.
Explain why there is a factor of k in the
denominator. In the Fraunhofer approximation,
the distance r between a point (x′, 0) in the input
plane and a point (x, z) in the observation plane
is given by r = r̄ − x′x/z, where r̄ is the distance
between (0, 0) and (x, z). Rewrite the spherical
wave in the Fraunhofer approximation and show
that it can be written in the form
$$E = E_s\,\frac{e^{ik\bar{r}}}{ik\bar{r}}\,(1 + \epsilon)\,e^{i\phi}.$$
Give expressions for ε and φ. In a Young’s double-slit
experiment using a green laser pointer, the slit
positions are at x′ = ±0.5 mm and the distance
to the screen is z = 1 m. Estimate the size
of the phase term φ and the correction to the
amplitude ε for a laser wavelength λ = 500 nm. As
r̄ ≈ z + x²/(2z), we can write that 1/r̄ = 1/z to first
order in x/z. Use your answers to justify a further
approximation in order to re-write the spherical wave
in terms of x′, x, and z only.

(3.4) Adding N phasors
The phase of a wave evolves as $e^{ikr}$, where r is the
distance traversed. Write an expression for r for
the case where the start and finish coordinates in
the xz plane are (x′, 0) and (x, z), respectively.
Rewrite r for the case where z ≫ x′ and x. Give
expressions for r that are used in the Fresnel and
Fraunhofer approximations, respectively.
Write an expression for the sum of 4 phasors with
source points x′ = −(3/2)d, −(1/2)d, (1/2)d, and (3/2)d.
The intensity of light is proportional to the
modulus-squared of the field amplitude. Write an
expression for the modulus-squared of the phasor sum.
Express your answer in terms of cosines. What is
the maximum value?
Draw phasor diagrams corresponding to the
observer positions, (i) x = λz/2d and (ii) λz/d,
and specify the intensity in both cases.

(3.5) Summing plane waves
In an optics experiment, the light field can
be approximated by three plane waves with
amplitude E0 propagating at angles −θ, 0, and +θ
relative to the z axis. Write an expression for the
field along the x axis.

(3.6) Summing real waves
In the xz plane, the general plane solution to
Maxwell’s wave equation is $E = E_0\cos(k_x x + k_z z - \omega t)$.
Consider two plane waves propagating
at angles ±θ relative to the z axis. Write an
expression for the total field along the x axis.
Re-write the sum in the form of a standing wave and
discuss what happens as a function of time. What
is the field at ωt = π? Explain, briefly, what
this means for the energy of the field and energy
conservation.

(3.7) Light and water
In a phasor model of the tides, see Fig. 3.2,
two phasors are sufficient to explain a wave form
with both principal and subsidiary maxima, i.e.
alternating larger and smaller peaks. In contrast,
for light, three phasors are needed to account
for an equivalent intensity pattern, see Fig. 3.15.
Explain, briefly, the difference between the two
cases.

(3.8) Young’s two holes
Young made two holes in an opaque screen with a
spacing of 1 mm. He observed the interference
pattern on a screen a distance 2 m downstream.
What was the spacing between the interference
fringes assuming that the centre wavelength of
light is 550 nm?

(3.9) More than two holes
A screen contains four narrow slits uniformly
spaced with separation d. Give their positions
along the x axis assuming that they are
symmetrically distributed about the z axis.
Write a phasor sum in the far-field.
Sketch phasor diagrams for (i) a position with zero
intensity on either side of the principal maxima,
and (ii) a position with zero intensity midway
between principal maxima.

(3.10) Fabry–Perot etalon
Show that the free spectral range of a Fabry–Perot
etalon is Δλ = λ²/(2nℓ), where ℓ is the length and
n is the refractive index inside the etalon. Find
the optimal thickness of a thin film of titanium
dioxide intended to partially separate the D-lines
of sodium with wavelengths of 589.0 and 589.6 nm.

(3.11) Sensitivity of a gravitational wave detector
A Michelson interferometer consists of a beamsplitter
that divides an input with amplitude E0
into two equal amplitude ‘arms’ with lengths ℓ1
and ℓ2. A mirror retro-reflects each arm such that
the two paths interfere at the beamsplitter.
(a) Write an expression for the output field after
the two paths recombine at the beamsplitter.
State any assumptions you make.
(b) The path difference, ℓ2 − ℓ1, is chosen such
that the intensity at the output is one-half
of the maximum value. A gravitational
wave arriving at a Michelson interferometer
increases the length of one arm by Δℓ, and
decreases the length of the other arm by Δℓ.
(c) Write an expression for ℓ2 − ℓ1 in terms
of the wavelength λ in the absence of a
gravitational wave, i.e. when Δℓ = 0.
(d) Next, write an expression for the output
intensity as a function of Δℓ, assuming that
Δℓ is small.
(e) If the power circulating in each arm is
0.8 MW and the minimal detectable signal
is 1.0 μW, the wavelength is 0.5 μm and the
length of each arm is 4 km, estimate the
minimum strain, Δℓ/ℓ, that can be detected
in principle.
(f) Give two reasons why Young’s double-slit
interferometer is less well suited to measure
gravitational waves than a Michelson
interferometer.
[Hints: cos(A + B) = cos A cos B − sin A sin B.
For small B, sin B ≈ B, cos B ≈ 1
and cos(A + B) ≈ cos A − B sin A.]

(3.12) Energy conservation in the Michelson interferometer
A Michelson interferometer is adjusted such that
the output as expressed by eqn (3.41) is zero.
Where has the energy gone?
4 Polarization

‘I remember in 37 when . . . you could go up a spiral staircase
and sit up on top. Those were great, great days.’
Tiny Tim (New York 1932–Hennepin County 1996).

4.1 Introduction
4.2 Linear basis (|)
4.3 Linear polarization (|)
4.4 Circular polarization (|)
4.5 Elliptical polarization (|)
4.6 Circular basis (◦)
4.7 Poincaré sphere (◦)
4.8 Photon spin (◦)
4.10 Polarizers
4.11 Malus’ Law
4.12 Linear birefringence (|)
4.13 Wave plates (|)
4.14 Circular birefringence (|)
4.15 Natural optical activity (|)
4.16 The Faraday effect (|)
4.17 Interference
Chapter summary
Exercises

4.1 Introduction
Polarization is a fundamental property of any wave motion that can
sustain oscillations in more than one direction for a given direction of
propagation.1 Light can exist in two polarization states: photons have
an angular momentum or spin, either parallel or anti-parallel to the
propagation direction.2 Consequently, we can think of polarization as a
two-wave phenomenon. Most light sources such as lamps or stars tend
to produce light with a mixture of polarization states, known as unpolarized
light. Unpolarized light can be converted into polarized light using
optical components. Lasers tend to produce polarized light.
The polarization properties of light are responsible for many everyday
optical phenomena such as the reduction of glare, or scatter, when
looking through polarizing sun glasses; the anti-glare devices on display
screens and monitors; optical devices such as DVD players; and the
glasses used in 3D cinema. Polarization analysis is used in the eyes of
the mantis shrimp and other animals. Understanding the polarization
properties of light is of vital importance in optical science, and finds
utility in other fields.3 The scattered blue light from the sky is
polarized, with the extent and orientation of polarization depending
on the viewing angle with respect to the Sun. There is evidence that
bees can detect the direction of the electric-field vector in the celestial
polarization pattern (Evangelista et al. 2014), and it is thought that the
Vikings used sky-polarimetric techniques for maritime navigation
(Horváth et al. 2011).
In this chapter, we shall investigate the polarization properties of
optical waves of infinite transverse spatial extent,4 formed from the
superposition of two co-propagating monochromatic waves of the same
frequency,5 but with different electric field directions. The vector that
specifies the direction of the electric field is called the polarization
vector. The harmonic waves being superposed may have different
amplitudes and phase.
First, we consider polarized light propagating in free space. In a
plane perpendicular to the propagation direction, two orthogonal basis
vectors are needed to characterize the polarization state. A complete
description of polarization is specified by the magnitudes of two basis
vectors, and their relative phase.6 In optics, it is mostly convenient to use
linear basis vectors that coincide with orthogonal spatial dimensions,
such as horizontal and vertical. However, in the framework of
light–matter interactions, see Chapter 13, it is more convenient to
use a circular basis based on orthogonal circularly polarized unit
vectors. Clearly, the final results of a calculation or analysis are
independent of the basis, and it is a matter of choice as to which is
used.7 A goal of this chapter is to provide the reader with a feeling of
which choice is more appropriate.

1 The recently observed gravitational waves (Abbott et al. 2016) also display
polarization phenomena.

2 As photons are massless and cannot be brought to rest, they can only have
two angular momentum states.

3 In 1848, the study of polarized light propagating through solutions of
tartaric acid led to the discovery of chiral chemistry by Louis Pasteur
(Dole 1822–Marnes-la-Coquette 1895), see Section 4.15.

4 Using Fourier techniques from Chapter 6 it is possible to generalize the
treatment to spatially localized waves by summing over many wave vector
orientations, see Chapter 12.

5 Again Fourier techniques can be employed to extend the analysis to
non-monochromatic waves.

6 As we saw in Chapter 3 only the relative phase is important, as the global
phase can be easily incorporated into the origin of the temporal evolution.

7 We use the symbols (|) and (◦) to denote which basis is used in each
section.
In summary, the investigation in this chapter is encapsulated in the
question: how does the vector electric field associated with an optical
field of infinite extent evolve in space and time? We shall use both
equations and figures to illustrate the answer.
4.2 Linear basis (|)

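In the linear basis, two plane waves, both with the same wave vector
k and angular frequency ω, are used to construct polarized light. The
electric fields associated with the two plane waves point along directions
specified by the unit vectors $\hat{\boldsymbol{\epsilon}}_1$ and $\hat{\boldsymbol{\epsilon}}_2$. As plane waves are transverse,
we have $\mathbf{k}\cdot\hat{\boldsymbol{\epsilon}}_1 = \mathbf{k}\cdot\hat{\boldsymbol{\epsilon}}_2 = 0$, and we choose for our basis vectors,
$\hat{\boldsymbol{\epsilon}}_1\cdot\hat{\boldsymbol{\epsilon}}_2 = 0$; i.e. $\hat{\boldsymbol{\epsilon}}_1$ and $\hat{\boldsymbol{\epsilon}}_2$ are always orthogonal, see Fig. 4.1. We also
choose $\hat{\boldsymbol{\epsilon}}_1$, $\hat{\boldsymbol{\epsilon}}_2$, and k̂ to form a right-handed coordinate system, such that
$\hat{\boldsymbol{\epsilon}}_1\times\hat{\boldsymbol{\epsilon}}_2 = \hat{\mathbf{k}}$. In a laboratory, it is good practice to confine optical waves
to a plane parallel to the optical bench; in this case for convenience one
would typically choose the unit vectors $\hat{\boldsymbol{\epsilon}}_1 = \hat{\boldsymbol{\epsilon}}_x$ and $\hat{\boldsymbol{\epsilon}}_2 = \hat{\boldsymbol{\epsilon}}_y$, i.e. the
horizontal and vertical directions, respectively.
Using the complex form of a harmonic wave, eqn (1.26), we can write
any arbitrarily polarized plane wave in the linear basis as
$$\mathbf{E} = \mathbf{E}_1 + \mathbf{E}_2 = \left(E_1\hat{\boldsymbol{\epsilon}}_1 + E_2\hat{\boldsymbol{\epsilon}}_2\right)e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \tag{4.1}$$
where the amplitudes E1 and E2 are constants but may in general be
complex, to account for a phase difference between the two components.
The magnetic field, B, can be deduced by summing the vectors for each
plane wave component in eqn (2.9): $\mathbf{k}\times\mathbf{E}_{1(2)} = \omega\mathbf{B}_{1(2)}$.
For the special case where the two polarization components have equal
amplitudes, e.g. $|E_1| = |E_2| = E_0/\sqrt{2}$, and a phase difference ϕ, eqn (4.1)
becomes
$$\mathbf{E} = \tfrac{1}{\sqrt{2}}E_0\left(\hat{\boldsymbol{\epsilon}}_1 + e^{i\varphi}\hat{\boldsymbol{\epsilon}}_2\right)e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}. \tag{4.2}$$

Fig. 4.1 For a plane wave with wave vector k, the polarization basis vectors
are $\hat{\boldsymbol{\epsilon}}_1$ and $\hat{\boldsymbol{\epsilon}}_2$, where $\hat{\boldsymbol{\epsilon}}_1\times\hat{\boldsymbol{\epsilon}}_2 = \hat{\mathbf{k}}$, and $\hat{\boldsymbol{\epsilon}}_1\cdot\hat{\boldsymbol{\epsilon}}_2 = 0$. The two polarization
vectors lie within the plane shaded in grey.

Fig. 4.2 Axes in a linear basis with the angle α defined as the angle of the
electric-field vector relative to $\hat{\boldsymbol{\epsilon}}_1$. For a plane wave propagating along z and $\hat{\boldsymbol{\epsilon}}_1$
along x, α is the angle of the electric-field vector relative to the x axis.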
As we shall see, the relative phase ϕ determines whether the light is
linearly or circularly polarized. If ϕ changes as the light propagates
then the polarization state changes.
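The role of the relative phase ϕ in eqn (4.2) can be illustrated by following the real part of the field at the origin over one period. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the two components are stored as a pair of complex numbers, and E0 = 1.

```python
import cmath
import math

def jones_vector(phi, E0=1.0):
    """Complex amplitudes (E1, E2) of eqn (4.2) at r = 0, t = 0."""
    a = E0 / math.sqrt(2)
    return (a, a * cmath.exp(1j * phi))

def real_field(jones, omega_t):
    """Real parts of the two components at the origin at phase ωt."""
    phase = cmath.exp(-1j * omega_t)
    return ((jones[0] * phase).real, (jones[1] * phase).real)

def magnitude_range(jones, samples=360):
    """Minimum and maximum magnitude of the real field over one period."""
    mags = [
        math.hypot(*real_field(jones, 2 * math.pi * i / samples))
        for i in range(samples)
    ]
    return min(mags), max(mags)

lin_lo, lin_hi = magnitude_range(jones_vector(0.0))            # in phase
circ_lo, circ_hi = magnitude_range(jones_vector(math.pi / 2))  # quarter-period shift
print(lin_lo, lin_hi)
print(circ_lo, circ_hi)
```

For ϕ = 0 the field magnitude passes through zero twice per period, as expected for a linearly polarized wave. For ϕ = π/2 the magnitude of the real field stays constant while its direction rotates.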
4.3 Linear polarization (|)

In the linear basis, eqn (4.1), a linearly-polarized wave is achieved
by choosing E1 and E2 to be in phase. The plane of polarization is
defined as that containing both the direction of propagation k and the
electric field E; and is at an angle $\alpha = \tan^{-1}(E_2/E_1)$ with respect to
$\hat{\boldsymbol{\epsilon}}_1$, see Fig. 4.2. It is important not to confuse the polarization angle
α with the propagation angle θ.8 The magnitude of the electric field
is $\sqrt{|E_1|^2 + |E_2|^2}$. For the simple case of equal amplitudes,
$E_1 = E_2 = E_0/\sqrt{2}$ in eqn (4.1), we have
$$\mathbf{E} = \tfrac{1}{\sqrt{2}}E_0\left(\hat{\boldsymbol{\epsilon}}_1 + \hat{\boldsymbol{\epsilon}}_2\right)e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \tag{4.3}$$
which is a linearly polarized wave with polarization vector in the α =
+45° plane, as shown in Fig. 4.3.
To follow the time evolution, (i)–(iv) in Fig. 4.3, imagine an observer
at r = 0 measuring the electric field as a function of time as the wave
propagates in the direction specified by the wave vector.

Fig. 4.3 A linearly polarized plane wave propagating in the direction,
n̂ = k/k. The electric-field vector is confined to one plane. (i)–(iv): The
evolution of the field at the origin r = 0 at times (i) t/T = 0, (ii) 1/8, (iii) 1/4,
and (iv) 3/8, where T is the period of the wave.

8 For most of this chapter we shall choose propagation along z, such that
θ does not appear.
4.4 Circular polarization (|)

Circularly polarized waves are obtained by having E1 and E2 with
equal amplitudes and a phase difference of ±π/2. We refer to the
two circular polarizations as L- and R-circularly polarized light,
corresponding to ϕ = +π/2 and ϕ = −π/2, respectively. Note that
there are other labelling conventions, but we adopt the optics standard
defined in the Optical Society of America’s Handbook of Optics:9

L- and R-circularly polarized light
If an observer facing into the approaching wave sees the electric field
rotating clockwise (anti-clockwise) in time, the polarization is said to
be R (L).

A circularly polarized light wave is depicted in Fig. 4.4. In contrast
to linearly polarized light, there is no time when the magnitude of the
field is zero; the electric field vector sweeps out a spiral or helix as it
propagates in space. The sign of the phase difference of π/2 dictates the
sense of rotation of the electric field.

9 Warning: in different branches of physics, such as high-energy physics, a
different convention for the handedness of a wave is commonly adopted.
Fig. 4.4 A left-circularly polarized

wave, eqn (4.4), propagating along the
z axis. If the thumb of your left
hand points along k then your fingers
point in the direction of rotation of
the electric-field vector in space. Inset:
The time evolution of the field at the
origin z = 0. (i) t/T = 0, (ii) 1/8,
(iii) 1/4, and (iv) 3/8, where T is the
period of the wave. For left-circular,
the field rotates anti-clockwise in time
when looking into the wave.
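Substituting $E_1 = E_0/\sqrt{2}$ and $E_2 = E_1 e^{\pm i\pi/2}$ in eqn (4.1) we obtain the
following expressions for L- and R-circularly polarized light:
$$\mathbf{E}_L = \tfrac{1}{\sqrt{2}}E_0\left(\hat{\boldsymbol{\epsilon}}_1 + i\hat{\boldsymbol{\epsilon}}_2\right)e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \tag{4.4}$$
$$\mathbf{E}_R = \tfrac{1}{\sqrt{2}}E_0\left(\hat{\boldsymbol{\epsilon}}_1 - i\hat{\boldsymbol{\epsilon}}_2\right)e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}. \tag{4.5}$$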
The factors of ±i correspond to a phase difference, ϕ = ±π/2, between

the orthogonal linear components. At all points in space and time the
magnitude of the electric field for these circularly polarized waves is |E0 |.
Figure 4.4 shows the spatial evolution at a given time, and the temporal
evolution in a given plane, (i) − (iv), for a circularly polarized field. The
forms of the waves in eqns (4.4) and (4.5) are valid for any direction of
propagation, and with the two components of the field along any two
orthogonal directions (which form a right-handed set of co-ordinates with
k). It is worth reiterating that the complex notation was introduced for
mathematical convenience in Chapter 2 as the easiest way to manipulate
the sinusoidal form and phases of plane waves.
It is possible to write circularly polarized optical waves entirely with
summations of real functions, suitably phase delayed. Readers who want
to compare the techniques are encouraged to attempt the relevant end-
of-chapter exercises. Here we give explicit forms of circularly polarized
light with real fields, with the wave vector along the z direction, and use
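horizontal, $\hat{\boldsymbol{\epsilon}}_1 = \hat{\boldsymbol{\epsilon}}_x$, and vertical, $\hat{\boldsymbol{\epsilon}}_2 = \hat{\boldsymbol{\epsilon}}_y$, as the polarization basis:
$$\mathbf{E}_L = E_0\cos(kz - \omega t)\,\hat{\boldsymbol{\epsilon}}_x - E_0\sin(kz - \omega t)\,\hat{\boldsymbol{\epsilon}}_y, \tag{4.6}$$
$$\mathbf{E}_R = E_0\cos(kz - \omega t)\,\hat{\boldsymbol{\epsilon}}_x + E_0\sin(kz - \omega t)\,\hat{\boldsymbol{\epsilon}}_y. \tag{4.7}$$
Both of these waves have electric fields that point along the x direction at
t = 0 in the z = 0 plane. For left circular, eqn (4.6), the y component lags
behind the x component by π/2, as seen in Fig. 4.5. How to incorporate
a field pointing along another direction at t = 0 in the z = 0 plane is
worked through in the end-of-chapter exercises.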
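The sense of rotation implied by the real fields, eqns (4.6) and (4.7), can be checked directly. The sketch below is illustrative: the function names are hypothetical, the fields are returned as (x, y) component pairs, and k = ω = E0 = 1 are assumed for convenience. The rotation sense is the sign of the z component of E(t) × E(t + dt) in the z = 0 plane, which is positive when the field turns from x towards y.

```python
import math

def e_left(z, t, E0=1.0, k=1.0, omega=1.0):
    """Real field components (Ex, Ey) of eqn (4.6)."""
    p = k * z - omega * t
    return (E0 * math.cos(p), -E0 * math.sin(p))

def e_right(z, t, E0=1.0, k=1.0, omega=1.0):
    """Real field components (Ex, Ey) of eqn (4.7)."""
    p = k * z - omega * t
    return (E0 * math.cos(p), E0 * math.sin(p))

def rotation_sense(field, dt=1e-3):
    """+1 if the field at z = 0 turns from x towards y, otherwise -1."""
    ex0, ey0 = field(0.0, 0.0)
    ex1, ey1 = field(0.0, dt)
    cross_z = ex0 * ey1 - ey0 * ex1
    return 1 if cross_z > 0 else -1

print(rotation_sense(e_left), rotation_sense(e_right))  # 1 -1
print(math.hypot(*e_left(0.3, 0.7)))                    # constant magnitude E0
```

For a wave travelling along +z, an observer facing into the wave sees a rotation from x towards y as anti-clockwise, so the positive sense for eqn (4.6) matches the L convention adopted above.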
4.5 Elliptical polarization (|)

When the two orthogonal linear components in the polarized wave of
eqn (4.1) have different magnitudes and a phase difference, the most
general form of polarized light, elliptical light, is obtained. In any
given plane the electric-field vector traces out an ellipse as a function of
time; the same convention of L and R handedness that was introduced
for circular polarization is used.
Let the phase difference between the two components be δ. It can
be shown that the angle, α, of the semi-major axis of the ellipse with
respect to $\hat{\boldsymbol{\epsilon}}_1$ is given by
$$\tan 2\alpha = \frac{2E_1E_2\cos\delta}{E_1^2 - E_2^2}, \tag{4.8}$$
where E1 and E2 are the amplitudes of the real fields. The situation is
depicted in Fig. 4.6. The details of the calculation are outlined in an
end-of-chapter exercise.

Fig. 4.5 Left-circular light propagating along z with polarization along x in the
z = 0 plane, eqn (4.4). The projections on x and y, Ex and Ey, are shown below
and behind, respectively.
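Equation (4.8) can be tested numerically by tracing the ellipse and locating the point where the field magnitude is largest. The sketch below makes some assumptions for illustration: the helper names are hypothetical, the real components are taken as E1 cos(ωt) and E2 cos(ωt − δ), and a two-argument arctangent is used to select the branch of eqn (4.8) that corresponds to the major rather than the minor axis.

```python
import math

def ellipse_angle_formula(E1, E2, delta):
    """Orientation α of the major axis from eqn (4.8), in radians."""
    return 0.5 * math.atan2(2 * E1 * E2 * math.cos(delta), E1 ** 2 - E2 ** 2)

def ellipse_angle_numeric(E1, E2, delta, samples=20000):
    """Direction of the field at the instant of maximum magnitude."""
    best_r2, best_angle = -1.0, 0.0
    for i in range(samples):
        wt = 2 * math.pi * i / samples
        ex = E1 * math.cos(wt)
        ey = E2 * math.cos(wt - delta)
        r2 = ex * ex + ey * ey
        if r2 > best_r2:
            best_r2, best_angle = r2, math.atan2(ey, ex)
    return best_angle

def axis_difference(a, b):
    """Difference between two axis directions, which are defined modulo π."""
    d = (a - b) % math.pi
    return min(d, math.pi - d)

alpha = ellipse_angle_formula(2.0, 1.0, math.pi / 3)
numeric = ellipse_angle_numeric(2.0, 1.0, math.pi / 3)
print(alpha, numeric, axis_difference(alpha, numeric))
```

The two estimates agree to within the angular resolution of the sampling. An axis direction is only defined modulo π, which is why the comparison uses `axis_difference`.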
4.6 Circular basis (◦)

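This basis is useful for light–matter interactions. We use two circularly
polarized light waves with opposite handedness, both with the same
wave vector k and angular frequency ω. The unit vectors are
$$\hat{\boldsymbol{\epsilon}}_+ = \tfrac{1}{\sqrt{2}}\left(\hat{\boldsymbol{\epsilon}}_1 + i\hat{\boldsymbol{\epsilon}}_2\right), \tag{4.9}$$
$$\hat{\boldsymbol{\epsilon}}_- = \tfrac{1}{\sqrt{2}}\left(\hat{\boldsymbol{\epsilon}}_1 - i\hat{\boldsymbol{\epsilon}}_2\right). \tag{4.10}$$
The basis vectors are complex; they are orthonormal in the sense that
$\hat{\boldsymbol{\epsilon}}_+^*\cdot\hat{\boldsymbol{\epsilon}}_+ = \hat{\boldsymbol{\epsilon}}_-^*\cdot\hat{\boldsymbol{\epsilon}}_- = 1$, and $\hat{\boldsymbol{\epsilon}}_+^*\cdot\hat{\boldsymbol{\epsilon}}_- = 0$, where the star denotes a complex
conjugate.10 The most general polarized plane wave in the circular basis
is
$$\mathbf{E} = \mathbf{E}_+ + \mathbf{E}_- = \left(E_+\hat{\boldsymbol{\epsilon}}_+ + E_-\hat{\boldsymbol{\epsilon}}_-\right)e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}. \tag{4.11}$$
Any polarization state may be represented using either a circular basis,
eqn (4.11), or a linear basis, eqn (4.1).

Fig. 4.6 Elliptically polarized light obtained when the two orthogonal linear
components have different magnitudes and a phase difference.

10 This notation is widely used, for example, to represent quantum states
as vectors in a Hilbert space. A clear and readable account can be found
in Isham’s “Lectures on Quantum Theory” (1995).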
Linear polarization in a circular basis (◦): A superposition
of two co-propagating waves with opposite handedness with the same
amplitude, |E−| = |E+|, results in a linearly polarized wave. For example,
if E− and E+ are both in phase, then the $\hat{\boldsymbol{\epsilon}}_1$ components in eqn (4.11) add,
whereas the $\hat{\boldsymbol{\epsilon}}_2$ components exactly cancel, leaving a plane wave linearly
polarized along $\hat{\boldsymbol{\epsilon}}_1$. When E− and E+ have the same magnitude but
different phases the resultant wave is linearly polarized, but at another
angle in the transverse plane; the relevant calculation is outlined in the
end-of-chapter exercises. An example of this decomposition is shown
in Fig. 4.7. The concept that a phase difference between two circularly
polarized waves of opposite polarization leads to a rotation of the plane
of polarization will be useful in the context of optical activity, and
forms the basis for the explanation of Faraday rotation, as we shall
discuss in Section 4.14.
Circular polarization (◦): Circularly polarized light is easily

obtained in this basis: we simply set E− = 0 for L-circularly polarized
light, or E+ = 0 for R-circularly polarized light.
Elliptical polarization (◦): If we choose E+ and E− to both be

finite and not equal in magnitude, then an elliptically polarized wave is
produced.
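The three cases above can be reproduced by converting circular-basis amplitudes into linear-basis components using eqns (4.9) and (4.10). The sketch below is a minimal illustration with hypothetical helper names. A state is represented by the pair (E1, E2), and a linearly polarized state is recognized by its two components sharing a common phase.

```python
import cmath
import math

S = 1 / math.sqrt(2)

def from_circular(e_plus, e_minus):
    """Linear-basis components (E1, E2) of E+ ε̂+ + E− ε̂−."""
    e1 = S * (e_plus + e_minus)
    e2 = 1j * S * (e_plus - e_minus)
    return e1, e2

def linear_angle(e1, e2):
    """Polarization angle α of a linearly polarized state.

    Raises ValueError if the two components do not share a common
    phase, i.e. if the state is not linearly polarized.
    """
    ref = e1 if abs(e1) >= abs(e2) else e2
    rot = cmath.exp(-1j * cmath.phase(ref))  # remove the global phase
    a, b = e1 * rot, e2 * rot
    if abs(a.imag) > 1e-9 or abs(b.imag) > 1e-9:
        raise ValueError("state is not linearly polarized")
    return math.atan2(b.real, a.real)

beta = math.pi / 2  # illustrative phase difference between E+ and E−
in_phase = linear_angle(*from_circular(S, S))
shifted = linear_angle(*from_circular(S * cmath.exp(1j * beta), S))
print(in_phase, shifted)
```

With equal amplitudes in phase the result is polarized along the first basis vector. With these definitions, a phase difference β between E+ and E− rotates the plane of polarization through an angle of magnitude β/2. Unequal magnitudes give an elliptical state, for which `linear_angle` raises an error.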
Fig. 4.7 Decomposition of linearly polarized light in terms of circular basis states. Left and right circularly polarized components, $\mathbf{E}_+$ and $\mathbf{E}_-$ (or $\mathbf{E}_L$ and $\mathbf{E}_R$), and their sum at four different times.

4.7 Poincaré sphere (◦)

If the amplitude $|\mathbf{E}|$ is fixed then eqn (4.11) has only two free parameters:
the ratio of the magnitudes, $|E_+|/|E_-|$, and the relative phase. Sometimes it is convenient
to represent both of these parameters as angles, in which case the
polarization state can be represented as a vector in an abstract space
known as the Poincaré sphere. First we rewrite eqn (4.11) in the form
$$\mathbf{E} = E_0\left[e^{i\psi}\cos\left(\frac{\pi}{4}-\chi\right)\hat{\boldsymbol{\epsilon}}_+ + e^{-i\psi}\sin\left(\frac{\pi}{4}-\chi\right)\hat{\boldsymbol{\epsilon}}_-\right]e^{i(\mathbf{k}\cdot\mathbf{r}-\omega t)}, \qquad (4.12)$$
where χ and ψ determine the relative amplitude and phase of the left-
and right-circular basis vectors, respectively. If 2χ and 2ψ are the angles
of latitude and longitude, as in Fig. 4.8, then circular polarization lies
at the poles, and linear polarization around the equator.
The convenience of the Poincaré sphere is the similarity to the Bloch
sphere used to represent a spin-1/2. This allows the use of spin-rotation
matrices to describe arbitrary polarization transformations, which is
particularly useful if we want to think about photon polarization as
a qubit, see e.g. Nielsen and Chuang (2011).
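As a quick check of the parameterization in eqn (4.12), the sketch below evaluates the two circular amplitudes at the poles and on the equator of the sphere. It is only an illustration: the function name and the sample value of ψ are assumptions, and the plane-wave factor is omitted.

```python
import cmath
import math


def circular_amplitudes(chi, psi, e0=1.0):
    """Return (E+, E-) for the state of eqn (4.12), without the plane-wave factor."""
    e_plus = e0 * cmath.exp(1j * psi) * math.cos(math.pi / 4 - chi)
    e_minus = e0 * cmath.exp(-1j * psi) * math.sin(math.pi / 4 - chi)
    return e_plus, e_minus


# 2*chi = +pi/2 and -pi/2 are the poles; 2*chi = 0 is the equator.
for label, chi in [("north pole", math.pi / 4), ("equator", 0.0), ("south pole", -math.pi / 4)]:
    e_plus, e_minus = circular_amplitudes(chi, psi=0.3)
    print(label, round(abs(e_plus), 6), round(abs(e_minus), 6))
```

At the poles only one circular component survives, while on the equator the two magnitudes are equal, so the state is linear with an orientation set by ψ.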
Fig. 4.8 Poincaré sphere representation of polarization: 2χ = ±π/2 correspond to left- and right-circularly polarized light, respectively; 2χ = 0 corresponds to linearly polarized light with an orientation determined by 2ψ.

4.8 Photon spin (◦)

A strong motivation for using the circular basis is the ease with which
angular momentum can be incorporated. A photon has an angular
momentum (or spin) of ±ℏ, which is either parallel or anti-parallel to
the direction of propagation. Left-hand circularly-polarized light has a
projection of +ℏ in the direction of propagation, whereas right-hand has
a projection of −ℏ, as shown in Fig. 4.9.
Photon momentum and circularly polarized light

L- (R-)circularly polarized light has an angular momentum projection
on to the k̂ direction of +ℏ (−ℏ).
In quantum mechanics, the projection of the spin along the direction

of propagation is called the helicity of a particle.11
Historical note: An insight into the kinds of mechanical analogues
of light discussed by the pioneers is provided by Poynting’s 1909 paper
The wave motion of a revolving shaft, and a suggestion as to the angular
momentum in a wave of circularly polarised light. He suggested that
a circularly polarized wave of wavelength λ had angular momentum
equal to the wave’s linear momentum multiplied by λ/2π. This result is
easily confirmed in a photon picture, as the angular momentum in the
circularly polarized wave is ℏ per photon, and the magnitude of the linear
momentum per photon is ℏk. Recalling that k = 2π/λ, it is evident
that Poynting’s conjecture is correct. The mathematical derivation
is significantly more involved; however, it is possible to calculate the
angular momentum associated with circularly polarized waves entirely
classically, without recourse to the photon concept, see e.g. Zangwill
(2013). As angular momentum is conserved when atoms absorb or
emit light, the polarization of radiation can be used to derive selection
rules12 for atomic transitions. If on absorption of a photon the atom
gains a unit of z component of angular momentum, the transition is
labelled as σ+; in a σ− transition the atom loses a unit of z component
of angular momentum, whereas if there is no change in the atom’s z
component of angular momentum the transition is called π.13
Fig. 4.9 The angular momentum of left- (top, ms = +ℏ) and right- (bottom, ms = −ℏ) circularly polarized photons.
11 Warning: Helicity is another concept where there is more than one convention. In particle physics, the opposite convention between handedness and helicity is adopted, see e.g. Aitchison and Hey (1989).
12 See Foot, Atomic Physics (2004).
13 A clear discussion on the relationship between polarization and atomic transitions is given by Corney (1977).
4.9 Polarized light in a medium

So far, we have only considered the propagation of polarized waves
in vacuum. In the following sections we consider how polarized light
propagates inside a medium. If the electronic response of the medium
is anisotropic or chiral then the phase of the propagating wave is
dependent on the orientation of the electric-field vector. In this case,
propagation inside the medium is characterized by two refractive indices,
one for each of the two polarization basis states (either orthogonal linear
or orthogonal circular), as in Fig. 4.10. A medium with two refractive
indices is known as birefringent.
At a fundamental level, as encapsulated in the Kramers–Kronig relations
(see Chapter 13), a refractive medium also attenuates light;
consequently a medium that exhibits birefringence must also exhibit
dichroism, i.e., different attenuation of one polarization state relative
to the other. Below we consider how these properties may be exploited
to realize optical devices.
Fig. 4.10 An optical medium may have refractive indices, n1 and n2, associated with each polarization state. In a linearly birefringent medium, the resonances, ω1 and ω2, are associated with electronic oscillations along orthogonal crystal axes. In the Faraday effect, the splitting, ω1 and ω2, is induced by an external magnetic field.
4.10 Polarizers
A polarizer or polarizing filter modifies both the polarization state
and the amount of light transmitted. A polarizer resolves the input
electric field into orthogonal components, and only transmits one of
them. A polarizer can be used to turn an unpolarized wave into a
polarized wave, at the expense of a reduction in the intensity of the
transmitted wave.
Polarizers may be based on either dichroism or Fresnel reflection.
Light incident at an interface at Brewster’s angle experiences a reflectivity
of 0% for the in-plane p-polarization, and 15% for the out-of-plane
s-polarization, see Fig. 2.10. Consequently, the reflected light
is s-polarized and the transmitted light has a higher percentage of
p-polarization. By using two media with high and low indices it is possible
to adjust Brewster’s angle to be 45°. Using a stack of alternating high
and low index layers, the reflection coefficient for the s-polarization can
be increased to close to 100%. This device, illustrated schematically in
Fig. 4.11, is known as a polarizing beam-splitter cube.
A polarizing filter attenuates one polarization and transmits the other,
as in Fig. 4.12. An ideal filter would have no attenuation for one
component, and infinite extinction for the orthogonal component. The
most common polarizing filter is Polaroid, where the vastly different
extinction along different axes is a manifestation of the alignment
of herapathite (iodoquinine sulfate) crystals embedded in a plastic
sheet.14
Fig. 4.11 A polarizing beam-splitter cube. A stack of layers with high and low refractive indices inserted between two prisms reflects the out-of-plane s-polarization and transmits the in-plane p-polarization.
14 The polarizing effect of iodoquinine sulfate was discovered by Doctor William Bird Herapath (Bristol 1820–1868). The iodoquinine sulfate crystals were found when iodine was added to the urine of a dog that had been fed quinine (Kahr et al. 2009).

4.11 Malus’ Law

To calculate the reduction in the intensity of plane polarized light
incident on a polarizing filter, we simply resolve the incident light’s
electric field into components parallel and perpendicular to the axis
of the polarizer along which light is transmitted. Figure 4.12 shows
the geometry. If the angle between the direction of polarization of
the incident light and the transmission axis of the polarizer is α, then
only the component E0 cos α is transmitted. Recalling the result from
Chapter 1 that optical detectors detect intensity, proportional to the
electric field squared, we can predict the intensity of a plane wave of
incident intensity I0 that is transmitted by a polarizer at an angle α
with respect to the polarization vector of the input:
$$I(\alpha) = I_0 \cos^2\alpha. \qquad (4.13)$$
This result is called Malus’ Law, after Étienne-Louis Malus (Paris
1775–1812), who made a number of fundamental discoveries, especially
regarding polarized light. Unfortunately, Malus died within three years
of discovering the phenomenon of polarization by reflection, leaving
François Arago (Estagel 1786–Paris 1853) and Jean-Baptiste Biot (Paris
1774–1862) to explain his observations (Levitt, 2009).
Fig. 4.12 A polarizer (grey disc) only transmits light parallel to a particular axis. In this example, the axis of the polarizer is vertical and the input light is linearly polarized at +45°. Only the vertical component is transmitted. The amplitude is reduced by a factor of √2, and the intensity by a factor of 2. In contrast to earlier figures, now we indicate the polarization state by the black line that follows the time evolution of the electric field vector at a particular position.
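Malus’ Law, eqn (4.13), is simple enough to tabulate directly. The sketch below is illustrative only; the function name and the chosen angles are assumptions, and an ideal polarizer with no loss along its transmission axis is assumed.

```python
import math


def malus_intensity(i0, alpha):
    """Transmitted intensity from eqn (4.13); alpha is in radians."""
    return i0 * math.cos(alpha) ** 2


# Fraction of the incident intensity transmitted at a few polarizer angles.
for degrees in (0, 30, 45, 60, 90):
    alpha = math.radians(degrees)
    print(degrees, round(malus_intensity(1.0, alpha), 4))
```

The 45° entry reproduces the factor-of-two reduction in intensity quoted in the caption of Fig. 4.12.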
4.12 Linear birefringence (|)

In linearly birefringent media, including crystals with certain symmetries,
plastics, and optical fibres,15 the refractive index depends on the
direction of the electric-field vector, and we have to define a refractive index
associated with each polarization component. The two orthogonal axes
are typically called the fast and slow axes, with refractive indices $n_f$
and $n_s$, respectively, where $n_f < n_s$.16 As light propagates through a
birefringent medium the relative phase between different polarization
components changes, leading to a change in the polarization state.
For a plane wave propagating along the z axis, if the input polarization
in the z = 0 plane is linear, with a direction at +45° to the
slow axis, we can write
$$\mathbf{E}(0) = \frac{1}{\sqrt{2}} E_0 \left(\hat{\boldsymbol{\epsilon}}_f + \hat{\boldsymbol{\epsilon}}_s\right) e^{-i\omega t}.$$
The field after propagating a distance z is
$$\mathbf{E}(z) = \frac{1}{\sqrt{2}} E_0 \left(e^{i n_f k z}\,\hat{\boldsymbol{\epsilon}}_f + e^{i n_s k z}\,\hat{\boldsymbol{\epsilon}}_s\right) e^{-i\omega t} = \frac{1}{\sqrt{2}} E_0 \left(\hat{\boldsymbol{\epsilon}}_f + e^{i\varphi}\,\hat{\boldsymbol{\epsilon}}_s\right) e^{i(n_f k z - \omega t)},$$
where $\varphi = 2\pi(n_s - n_f)z/\lambda$ and λ is the vacuum wavelength. As z
increases, the phase difference increases, and the polarization changes
from linear to left circular, to orthogonal linear, to right circular, and
back to linear again. As the refractive index is wavelength dependent,
the phase difference will also be a function of wavelength.
15 Linear birefringence can also be induced. For example, photo-elastic materials become birefringent under the application of stress (stress-induced birefringence). In electro-optic materials an external electric field induces birefringence, giving rise to the Kerr and Pockels effects.
16 For certain crystals, these are also called the extraordinary and ordinary axes.
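The sequence of polarization states described above can be traced by evaluating the phase difference φ at a few propagation distances. The following is a rough sketch: the refractive indices and wavelength are hypothetical illustrative values, not taken from the text, and the common phase factor is dropped.

```python
import cmath
import math


def phase_difference(n_slow, n_fast, z, wavelength):
    """phi = 2*pi*(n_s - n_f)*z/lambda, with lambda the vacuum wavelength."""
    return 2.0 * math.pi * (n_slow - n_fast) * z / wavelength


def jones_after(phi):
    """(fast, slow) components for +45 degree linear input, up to a common phase."""
    return 1.0 / math.sqrt(2.0), cmath.exp(1j * phi) / math.sqrt(2.0)


# Hypothetical illustrative values, not taken from the text.
n_fast, n_slow, wavelength = 1.544, 1.553, 633e-9
quarter_wave_length = wavelength / (4.0 * (n_slow - n_fast))

# Step through multiples of the quarter-wave length: phi = 0, pi/2, pi, 3pi/2, 2pi.
for m in range(5):
    z = m * quarter_wave_length
    phi = phase_difference(n_slow, n_fast, z, wavelength)
    fast, slow = jones_after(phi)
    print(m, round(phi / math.pi, 3), complex(round(slow.real, 3), round(slow.imag, 3)))
```

The slow component cycles through relative factors of 1, i, −1, −i and back to 1, which correspond to linear, left circular, orthogonal linear, right circular and linear again.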
4.13 Wave plates (|)

A wave plate is an optical component made from a birefringent
material with a length, ℓ, chosen such that the phase difference
$\varphi = 2\pi(n_s - n_f)\ell/\lambda$ is equal to an integer or half-integer multiple of π. As
we saw previously, altering the phase difference between the two electric-field
components allows for the modification of the polarization state of
light; this property is exploited in half- and quarter-wave plates.
Half-wave plate: For a given wavelength and refractive index
difference, a half-wave plate is realized when the thickness of the medium
is adjusted such that the slow component is retarded by half a wave with
respect to the fast component, i.e. the field components travelling along
the different axes pick up a phase difference, φ = π, on propagation
through the medium. A half-wave plate can be used to rotate the
direction of polarization of linearly polarized light, and change the
handedness of circularly polarized light.
Fig. 4.13 Half-wave plate: The input light is linearly polarized at −45° to the fast axis. Downstream of the half-wave plate the polarization is rotated to +45°. As in Fig. 4.12, we indicate the polarization state as the pattern traced out by the tip of the polarization vector over time.
Linearly polarized light: Figure 4.13 shows a linearly polarized
wave incident normally on a half-wave plate. There are two orientations
where it is trivial to predict the output: if the direction of polarization
coincides with either the fast or the slow axis, then the wave propagates
through the crystal, picks up the relevant retardation, but the polarization
state is invariant.17 For the case of the direction of polarization
inclined at an angle α with respect to the fast axis, the slow component
picks up a π phase shift and the action of the half-wave plate can be
written as
$$\begin{aligned}
\mathbf{E}_{\rm in} &= E_0\left(\cos\alpha\,\hat{\boldsymbol{\epsilon}}_1 + \sin\alpha\,\hat{\boldsymbol{\epsilon}}_2\right) e^{i(\mathbf{k}\cdot\mathbf{r}-\omega t)}, \\
\mathbf{E}_{\rm out} &= E_0\left(\cos\alpha\,\hat{\boldsymbol{\epsilon}}_1 - \sin\alpha\,\hat{\boldsymbol{\epsilon}}_2\right) e^{i(\mathbf{k}\cdot\mathbf{r}+n_f k\ell-\omega t)}, \qquad (4.14)
\end{aligned}$$
where ℓ is the length of the wave plate. In words, this result says that
the electric field is reflected with respect to the fast axis.18 Linearly
polarized light incident at an angle α with respect to the fast axis exits
the wave plate linearly polarized, but with the direction of polarization
rotated to be at an angle of −α.
17 By analogy with quantum mechanics a polarization along either the fast or slow axis is considered an eigenfunction of the wave plate.
18 We are assuming that the fast and slow axes correspond to the directions of $\hat{\boldsymbol{\epsilon}}_1$ and $\hat{\boldsymbol{\epsilon}}_2$, respectively.
Circularly polarized light: Recalling from Section 4.6 that the unit
vectors for L- and R-circularly polarized light are $\hat{\boldsymbol{\epsilon}}_+ = \frac{1}{\sqrt{2}}(\hat{\boldsymbol{\epsilon}}_1 + i\hat{\boldsymbol{\epsilon}}_2)$
and $\hat{\boldsymbol{\epsilon}}_- = \frac{1}{\sqrt{2}}(\hat{\boldsymbol{\epsilon}}_1 - i\hat{\boldsymbol{\epsilon}}_2)$, and that a half-wave plate retards the
slow polarization component by half a wavelength relative to the fast
polarization component, it is evident that circularly polarized light
changes its handedness on passing through a half-wave plate.
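Both half-wave plate results can be seen by flipping the sign of the slow component, as in eqn (4.14). The sketch below assumes, as in the text, that the fast and slow axes lie along ε̂1 and ε̂2; the helper names are illustrative and the common phase factor is omitted.

```python
import math


def half_wave_plate(e1, e2):
    """Apply the pi phase shift of eqn (4.14) to the slow (e2) component.

    The common phase factor is dropped because it does not affect the
    polarization state.
    """
    return e1, -e2


def linear_input(alpha, e0=1.0):
    """Components of linear light at angle alpha (radians) to the fast axis."""
    return e0 * math.cos(alpha), e0 * math.sin(alpha)


# Linear input at +30 degrees to the fast axis leaves at -30 degrees.
alpha = math.radians(30)
out1, out2 = half_wave_plate(*linear_input(alpha))
print(math.degrees(math.atan2(out2, out1)))

# L-circular input (1, i)/sqrt(2) becomes (1, -i)/sqrt(2), i.e. R-circular.
s = 1.0 / math.sqrt(2.0)
print(half_wave_plate(s, 1j * s))
```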
Quarter-wave plate: For a given wavelength and refractive index
difference, a quarter-wave plate is realized when the thickness of the
medium is adjusted such that the slow component is retarded by a
quarter of a wave with respect to the fast component, i.e. the field
components travelling along the different axes pick up a phase difference
φ = π/2 on propagation through the medium. The evolution of the
electric-field components along the fast and slow axes after traversing a
quarter-wave plate is
$$\begin{aligned}
\mathbf{E}_{\rm in} &= \left(E_1\,\hat{\boldsymbol{\epsilon}}_1 + E_2\,\hat{\boldsymbol{\epsilon}}_2\right) e^{i(\mathbf{k}\cdot\mathbf{r}-\omega t)}, \\
\mathbf{E}_{\rm out} &= \left(E_1\,\hat{\boldsymbol{\epsilon}}_1 + iE_2\,\hat{\boldsymbol{\epsilon}}_2\right) e^{i(\mathbf{k}\cdot\mathbf{r}+n_f k\ell-\omega t)}. \qquad (4.15)
\end{aligned}$$
These relations allow quarter-wave plates to convert between linear and
circularly polarized light.
Fig. 4.14 Quarter-wave plate: The input light is linearly polarized at +45° to the fast axis (vertical). Beyond the wave plate it becomes left-circularly polarized (indicated by the black circles that trace out the electric-field vector at each position over a complete cycle).
Linearly polarized light: When a linearly polarized wave is incident
normally on a quarter-wave plate, just as for a half-wave plate, there are
two orientations where it is trivial to predict the output: if the direction
of polarization coincides with either the fast or the slow axis, then the
wave propagates through the crystal, picks up the relevant retardation,
but the polarization state is unchanged. However, if the input light
is linearly polarized at an angle α = +π/4 with respect to the
fast axis, $E_1 = E_2 = E_0/\sqrt{2}$ in eqn (4.15), then the output is L-circularly
polarized, as shown in Fig. 4.14; the orthogonal linear polarization,
α = −π/4 ($E_1 = -E_2 = E_0/\sqrt{2}$), generates R-circular polarization. For other
orientations of the input linear polarization, elliptically polarized light is generated
on traversing the quarter-wave plate.
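The dependence of the output state on the input angle can be illustrated by applying eqn (4.15) to linear inputs at several angles. This is a minimal sketch with hypothetical helper names; it uses the text's convention that (ε̂1 + iε̂2)/√2 is L-circular and ignores the common phase.

```python
import math


def quarter_wave_plate(e1, e2):
    """Apply eqn (4.15) up to a common phase: the slow (e2) component gains a factor i."""
    return complex(e1), 1j * complex(e2)


def classify(e1, e2, tol=1e-9):
    """Label a Jones vector, taking (1, i)/sqrt(2) to be L-circular as in the text."""
    cross = e1.conjugate() * e2
    if abs(cross.imag) < tol:
        return "linear"
    is_circular = abs(abs(e1) - abs(e2)) < tol and abs(cross.real) < tol
    sense = "L" if cross.imag > 0 else "R"
    return sense + ("-circular" if is_circular else "-elliptical")


# Linear input at angle alpha to the fast axis: (cos alpha, sin alpha).
for degrees in (0, 45, -45, 30):
    alpha = math.radians(degrees)
    out = quarter_wave_plate(math.cos(alpha), math.sin(alpha))
    print(degrees, classify(*out))
```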
Circularly polarized light: Using eqns (4.9) and (4.15) it is evident
that a quarter-wave plate converts L-hand circularly polarized light to
light that is linearly polarized at an angle θ = π/4 (+45°) between
the fast and slow axes, i.e. along the direction $\frac{1}{\sqrt{2}}(\hat{\boldsymbol{\epsilon}}_1 + \hat{\boldsymbol{\epsilon}}_2)$; whereas
R-hand circular polarization also becomes linearly polarized, but along the
direction $\frac{1}{\sqrt{2}}(\hat{\boldsymbol{\epsilon}}_1 - \hat{\boldsymbol{\epsilon}}_2)$, θ = −π/4 (−45°). These results are as expected
on account of the time-reversed situations previously described.
4.14 Circular birefringence (|)

A second kind of birefringence, circular birefringence, arises when a
medium responds differently to the two circular polarization states. We
can distinguish between intrinsic circular birefringence, which is known
as optical activity, and induced optical activity, such as the Faraday
effect, where the addition of a magnetic field induces a different response
to left- and right-circularly polarized light. Circularly birefringent media
include some liquids (sugar solutions in particular), and media
subject to external electromagnetic fields.
To explain circular birefringence we decompose linearly polarized
light into a superposition of L- and R-circularly polarized waves of
equal amplitude. The relative phase, φ in eqn (4.2), between the two
components dictates the orientation of the polarization vector. Consider
linearly polarized light with polarization along direction $\hat{\boldsymbol{\epsilon}}_1$ at the origin.
Using eqn (4.4) and eqn (4.5) we can write this as:
$$\mathbf{E} = E_0\, e^{i(\mathbf{k}\cdot\mathbf{r}-\omega t)}\,\hat{\boldsymbol{\epsilon}}_1 = \frac{1}{\sqrt{2}}\left(\mathbf{E}_L + \mathbf{E}_R\right). \qquad (4.16)$$
If the left and right components experience refractive indices $n_L$ and $n_R$,
respectively, and assuming that there is no difference in the attenuation
coefficients of the different handedness of light, then after traversing a
distance, r, inside the medium the field is
$$\begin{aligned}
\mathbf{E} &= \tfrac{1}{2} E_0 \left[\left(\hat{\boldsymbol{\epsilon}}_1 + i\hat{\boldsymbol{\epsilon}}_2\right) e^{i n_L \mathbf{k}\cdot\mathbf{r}} + \left(\hat{\boldsymbol{\epsilon}}_1 - i\hat{\boldsymbol{\epsilon}}_2\right) e^{i n_R \mathbf{k}\cdot\mathbf{r}}\right] e^{-i\omega t}, \\
&= \tfrac{1}{2} E_0 \left[\left(e^{i n_L \mathbf{k}\cdot\mathbf{r}} + e^{i n_R \mathbf{k}\cdot\mathbf{r}}\right)\hat{\boldsymbol{\epsilon}}_1 + i\left(e^{i n_L \mathbf{k}\cdot\mathbf{r}} - e^{i n_R \mathbf{k}\cdot\mathbf{r}}\right)\hat{\boldsymbol{\epsilon}}_2\right] e^{-i\omega t},
\end{aligned}$$
where we have substituted for $\mathbf{E}_L$ and $\mathbf{E}_R$ using eqns (4.9) and (4.10),
respectively. By defining $\bar{n} = (n_L + n_R)/2$ and $\Delta n = n_L - n_R$, we
can rewrite this as
$$\mathbf{E} = E_0\left(\cos\beta\,\hat{\boldsymbol{\epsilon}}_1 - \sin\beta\,\hat{\boldsymbol{\epsilon}}_2\right) e^{i(\bar{n}\,\mathbf{k}\cdot\mathbf{r}-\omega t)},$$
where $\beta = \Delta n\,kr/2 = \pi(n_L - n_R)r/\lambda$. This result shows that light
remains linearly polarized, but the direction of polarization rotates by
an angle β, given by
$$\beta = \frac{\pi r}{\lambda}\left(n_L - n_R\right), \qquad (4.17)$$
where r is the propagation distance. The plane of polarization rotates
by π over a distance $\Lambda = \lambda/(n_L - n_R)$.
Fig. 4.15 For linearly polarized light in a circularly birefringent medium, the plane of polarization rotates as the field propagates. For optical activity, the rotation is reciprocal, meaning that if the light is retro-reflected the rotation is undone. However, this is not the case for the Faraday effect, see Fig. 4.18.
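The rotation angle of eqn (4.17) can be compared against a direct evaluation of the two-exponential expression for the field. The sketch below is illustrative only: the indices, wavelength and path length are hypothetical values chosen so that the rotation is less than π/2, and the time factor is omitted.

```python
import cmath
import math


def rotation_angle(n_left, n_right, r, wavelength):
    """Rotation angle beta from eqn (4.17)."""
    return math.pi * (n_left - n_right) * r / wavelength


def field_components(n_left, n_right, r, wavelength, e0=1.0):
    """Components along e1 and e2 after a distance r, omitting exp(-i omega t)."""
    k = 2.0 * math.pi / wavelength
    left = cmath.exp(1j * n_left * k * r)
    right = cmath.exp(1j * n_right * k * r)
    e1 = 0.5 * e0 * (left + right)
    e2 = 0.5 * e0 * 1j * (left - right)
    return e1, e2


# Hypothetical illustrative values, not taken from the text.
n_left, n_right, wavelength, r = 1.500001, 1.500000, 633e-9, 0.1
beta = rotation_angle(n_left, n_right, r, wavelength)
e1, e2 = field_components(n_left, n_right, r, wavelength)
ratio = e2 / e1  # equals -tan(beta) and is real if the light is still linear

print(beta, -math.atan(ratio.real), abs(ratio.imag))
```

The first two printed numbers should agree to within rounding, and the third should be close to zero, confirming that the light stays linearly polarized while its direction rotates.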
4.15 Natural optical activity (|)

Whereas the physical origin of birefringence in crystals can be traced
to the fact that electrons in the medium oscillate differently along
different crystal axes, the origin of intrinsic optical activity is more
subtle. We might not expect to observe birefringence in liquids,
as the molecules are randomly orientated; however, many liquids do
exhibit circular birefringence,19 as in Fig. 4.15. The key concept
for explaining this natural optical activity is chirality: the idea that
certain molecules are handed, and not superimposable on their mirror
images, see Fig. 4.16.20 Many molecules of interest in biochemistry
are chiral, including numerous sugars and amino acids. A solution of
molecules with a particular chirality rotates the plane of the polarization
as in Fig. 4.15, even after averaging over the random molecular
orientations.21 The mirror image of a chiral molecule is called an
enantiomer. Enantiomeric solutions rotate the direction of polarization
by the same amount, but in opposite directions. As discussed in
Section 4.14, for a medium of length ℓ, the plane of polarization of
linear light is rotated by an angle $\beta = \Delta n\,k\ell/2 = \pi(n_L - n_R)\ell/\lambda$. As the
difference in refractive indices is proportional to the concentration of molecules, this
leads to the concept of specific rotation: the rotation per unit length
per unit concentration. Optical rotation can be used to determine (i)
the identity and (ii) the enantiomeric purity of the substance, or (iii)
the concentration of a known substance in a solution.
Fig. 4.16 A left (L) and right (R) handed structure.
19 A fascinating historical account of the importance of optical activity in the formulation of optics is given in A History of Optics from Greek Antiquity to the Nineteenth Century (Darrigol, 2012). Optical rotation was used as a diagnostic of the purity of sugar imports. By the end of the nineteenth century, Biot’s polarimeter was a key instrument in the manufacture and pricing of sugar (Levitt, 2009).
20 The term chirality was introduced by William Thomson, and comes from the Greek word for hand. As we saw in Fig. 4.9, photons are also chiral. For massless particles, the helicity and chirality are the same.
21 See e.g. The optical activity of oriented copper helices, by Tinoco Jr. and Freeman (1957).
Optical rotation is said to be dextrorotatory if the direction of
polarization rotates clockwise when looking towards the source, and
laevorotatory if the rotation is anti-clockwise. Optical rotation
is a reciprocal optical process, meaning that if a wave picks up a rotation
β on traversing the medium, the rotation is undone if the wave is
retro-reflected back through the same medium, see Fig. 4.15. Figure 4.17
shows how the plane of polarization of a linearly polarized wave rotates
as it propagates through a sugar solution.
Fig. 4.17 Photographs of two lasers with different wavelengths propagating through corn syrup (λ = 633 nm and 532 nm in the upper and lower images, respectively). We only see scattered light when the polarization is orthogonal to the observation plane. The distance between the intensity maxima is Λ = λ/(nL − nR). For the green laser (lower image) Λg = 14 cm. For the red laser (upper image) Λr > Λg because red is further from resonance and the index difference is smaller, see Fig. 4.10. The attenuation is larger for green light (lower image). Images courtesy of Miranda Nixon, Durham University, 2015.

4.16 The Faraday effect (|)

In this section, we consider the Faraday effect, where an applied
magnetic field induces circular birefringence. In 1845, in a sequence
of experimental investigations, Michael Faraday revealed for the first
time the link between electromagnetism and light. These experiments
had far-reaching consequences that shaped the modern world, such as
the invention of electric motors and the ability to transform heat into
electricity. Faraday showed that a magnetic field in the same direction
as the wave vector k can induce a change in the plane of polarization,
an effect which became known as Faraday rotation. We discussed in
Section 4.6 how the natural basis for describing atom–light interactions
is the circular basis. Atomic transitions that are degenerate in the
absence of the magnetic field occur at different frequencies when the
field is applied. As a consequence, the absorption coefficient for L- and
R-circularly polarized light is different; the medium is said to exhibit
circular dichroism. There will also be a concomitant difference in
the refractive indices for the different handednesses of light, i.e. circular
birefringence. We can therefore use the same analysis as in Section 4.15
to predict a Faraday rotation angle for a medium of length ℓ of
$\beta = \pi(n_L - n_R)\ell/\lambda$. As the index difference is proportional to the
external magnetic field B this is often written as
$$\beta = V B \ell, \qquad (4.18)$$
where $V = \pi(n_L - n_R)/(\lambda B)$ (units rad T⁻¹ m⁻¹) is called the Verdet
coefficient, which is a property of the medium.22 Media with large
Verdet coefficients are either crystals that contain paramagnetic ions,
such as terbium, e.g. terbium gallium garnet (TGG); or atomic vapours,
where Verdet coefficients that are orders of magnitude larger than TGG
can be achieved, but only over a restricted wavelength range (Weller et
al. 2012).
22 It is often called the Verdet constant, somewhat of a misnomer as it is wavelength dependent.
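Equation (4.18) makes it straightforward to estimate the length of medium needed for a given rotation. The sketch below is illustrative: the Verdet coefficient and field strength are hypothetical round numbers, not values quoted in the text, and the function names are assumptions.

```python
import math


def faraday_rotation(verdet, b_field, length):
    """beta = V * B * l from eqn (4.18), in radians."""
    return verdet * b_field * length


def length_for_rotation(beta, verdet, b_field):
    """Medium length needed for a rotation beta at a given field."""
    return beta / (verdet * b_field)


# Hypothetical illustrative values, not taken from the text.
verdet = 100.0  # rad T^-1 m^-1
b_field = 0.5   # T

# Length that gives a rotation of pi/4 (45 degrees).
length = length_for_rotation(math.pi / 4, verdet, b_field)
print(length, math.degrees(faraday_rotation(verdet, b_field, length)))
```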
Fig. 4.18 Faraday effect. Top:

a vertically polarized field enters a
Faraday medium and is rotated by
+45◦ . Bottom: the same wave
retro-reflected back through the same
medium is rotated in the same direction
by an additional +45◦ , such that the
retro-reflected light is orthogonal to
the input light. This non-reciprocal
property of the Faraday effect is used
for optical isolation.
One application of the Faraday effect is to create a non-reciprocal

optical device known as an optical diode, or optical isolator.
Whereas optical activity—like most optical phenomena—is reciprocal,
meaning that if the light is retro-reflected back through the same medium
the effect of the medium is reversed, this is no longer the case for the
Faraday effect. If the Faraday rotation is π/4, when the light is retro-
reflected then there is an additional π/4 rotation, such that the total is
π/2 and the output is orthogonally polarized relative to the input, as
shown in Fig. 4.18. By combining the Faraday rotator with a polarizing
filter it is thus possible to make a device where light is transmitted in
the forward direction (accompanied by a π/4 polarization rotation), but
no light can be transmitted in the reverse direction—this is an optical
diode. Such a device is vital for many contemporary optics experiments
using lasers, where it is important that the laser action is not perturbed
by weak back reflections feeding back into the laser cavity. The different
sign of the rotation picked up by the reflected wave in contrast to the
example of natural optical activity is discussed further in an end-of-chapter
exercise.23
23 A clear review of the principle of reciprocity in optics is provided by Potton (2004), who also provides technological applications of magneto-optic non-reciprocal media.

4.17 Interference
We finish this chapter by studying the interference of polarized light.
This builds on some of the results from Chapter 3 and also gives some
concrete examples of a phenomenon we first encountered in Chapter 2,
namely that a superposition of plane waves does not necessarily share
all of the properties of the individual plane waves.
Intensities of orthogonally polarized waves add: Consider two
plane waves A and B with the same frequency, with polarization
vectors $\hat{\boldsymbol{\epsilon}}_A$ and $\hat{\boldsymbol{\epsilon}}_B$ and wave vectors $\mathbf{k}_A$ and $\mathbf{k}_B$, respectively.
The total electric field at (r, t) is
$$\mathbf{E} = E_A\,\hat{\boldsymbol{\epsilon}}_A\, e^{i(\mathbf{k}_A\cdot\mathbf{r}-\omega t)} + E_B\,\hat{\boldsymbol{\epsilon}}_B\, e^{i(\mathbf{k}_B\cdot\mathbf{r}-\omega t)}. \qquad (4.19)$$
We know from Chapter 3 that the intensity is proportional to the square
modulus of the electric field, which is given by:
$$|\mathbf{E}|^2 = |E_A|^2 + |E_B|^2 + 2\,\mathrm{Re}\left[E_A^* E_B\, e^{i(\mathbf{k}_B-\mathbf{k}_A)\cdot\mathbf{r}}\;\hat{\boldsymbol{\epsilon}}_A^*\cdot\hat{\boldsymbol{\epsilon}}_B\right]. \qquad (4.20)$$
For the case of parallel polarizations,
$\hat{\boldsymbol{\epsilon}}_A^*\cdot\hat{\boldsymbol{\epsilon}}_B = 1$, and thus we regain the form of eqn (3.3) derived
for scalar waves (and equal amplitudes). For the case of orthogonal
polarization, we have $\hat{\boldsymbol{\epsilon}}_A^*\cdot\hat{\boldsymbol{\epsilon}}_B = 0$, and the third (interference)
term is identically zero. Therefore we conclude that we simply add the
intensities of the individual components when two orthogonally polarized
waves are added. We might be tempted to state that orthogonally
polarized waves do not interfere, but there is more to it than that.
The intensity at every point in space is the sum of the individual
intensities when two orthogonally polarized waves sum; however, the
electric-field vector points in a different direction to either of the
components, and it can vary in space. We now go on to consider
three different cases of standing waves formed by superposing
counter-propagating plane waves with the same frequency and electric field
amplitude, but different polarization states. We shall show that each
configuration manifests a distinct polarization structure, with two of
the three having a rapid change of the polarization state over distance,
i.e. a polarization gradient.24 For these calculations we choose to
use fields propagating along the z and −z directions, and set the phases
of the component travelling waves to zero; finite values of these phases
simply correspond to a translation of the standing wave structure along
the propagation axis, consequently a suitable choice of space and time
origins eliminates these offsets (see end-of-chapter exercises).
24 At the end of the last century, the analysis of polarization gradients in standing waves, and their interactions with atoms, was crucial in the field of sub-Doppler laser cooling (Dalibard and Cohen-Tannoudji 1989). The parameters for polarization gradients in three-dimensional electromagnetic standing waves were studied by Hopkins and Durrant (1997).
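The role of the polarization overlap in eqn (4.20) can be made concrete with a short numerical sketch. The function below is an illustration, not part of the text: polarization vectors are represented as hypothetical two-component tuples, and `phase` stands for (k_B − k_A)·r.

```python
import cmath
import math


def intensity(e_a, e_b, pol_a, pol_b, phase):
    """|E|^2 from eqn (4.20).

    pol_a and pol_b are unit polarization vectors given as (component 1, component 2);
    phase stands for (k_B - k_A) . r.
    """
    overlap = sum(complex(a).conjugate() * b for a, b in zip(pol_a, pol_b))
    cross = 2.0 * (complex(e_a).conjugate() * e_b * cmath.exp(1j * phase) * overlap).real
    return abs(e_a) ** 2 + abs(e_b) ** 2 + cross


s = 1.0 / math.sqrt(2.0)
x_hat, y_hat = (1.0, 0.0), (0.0, 1.0)
left, right = (s, 1j * s), (s, -1j * s)

print(intensity(1.0, 1.0, x_hat, x_hat, 0.0))      # parallel, in phase
print(intensity(1.0, 1.0, x_hat, x_hat, math.pi))  # parallel, out of phase
print(intensity(1.0, 1.0, x_hat, y_hat, 1.234))    # orthogonal linear
print(intensity(1.0, 1.0, left, right, 1.234))     # orthogonal circular
```

For orthogonal pairs, whether linear or circular, the result is the sum of the two intensities regardless of the phase.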
Counter-propagating parallel linear: Let the electric field be
$$\mathbf{E} = E_0\cos(kz-\omega t)\,\hat{\boldsymbol{\epsilon}}_x + E_0\cos(kz+\omega t)\,\hat{\boldsymbol{\epsilon}}_x. \qquad (4.21)$$
Using a standard trigonometric identity,25 we can rewrite this as
$$\mathbf{E} = 2E_0\cos kz\,\cos\omega t\;\hat{\boldsymbol{\epsilon}}_x. \qquad (4.22)$$
Unsurprisingly, this is of the same form as we found when analysing
standing waves formed from scalar waves in Chapter 3, eqn (3.7). The
spatial dependence for this configuration is shown at the top in Fig. 4.19.
25 $\cos A + \cos B = 2\cos\frac{A+B}{2}\cos\frac{A-B}{2}$.
Fig. 4.19 Interference of polarized light. The top image is for parallel linear polarizations (lin∥lin), which is the case we considered in Chapter 3. The middle image is for orthogonal linear polarizations (lin⊥lin). In this case the polarization changes between linear and circular. Finally, in the lower image we show the case of opposite circular polarizations where the field is everywhere linear but the direction rotates, forming a twisted mode.
From Maxwell’s equations, the magnetic fields associated with the
travelling waves of eqn (4.21) point along the directions $\hat{\boldsymbol{\epsilon}}_y$ and $-\hat{\boldsymbol{\epsilon}}_y$,
respectively. The field of the standing wave is
$$\mathbf{B} = 2B_0\sin kz\,\sin\omega t\;\hat{\boldsymbol{\epsilon}}_y, \qquad (4.23)$$
where $E_0 = cB_0$. The co-sinusoidal spatial and temporal dependence
of the electric field has been replaced by a sinusoidal dependence
for the magnetic field; thus, unlike individual plane waves, for this
superposition of two counter-propagating plane waves, the electric and
magnetic fields are not in phase. However, the electric and magnetic
fields are still orthogonal.
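Equations (4.22) and (4.23) can be checked against the sums of travelling waves at arbitrary points. The following sketch assumes units in which k, ω, E0 and B0 are all one; the function names are illustrative.

```python
import math


def e_travelling_sum(z, t, e0=1.0, k=1.0, omega=1.0):
    """x component of eqn (4.21): two counter-propagating waves."""
    return e0 * math.cos(k * z - omega * t) + e0 * math.cos(k * z + omega * t)


def e_standing(z, t, e0=1.0, k=1.0, omega=1.0):
    """x component of eqn (4.22)."""
    return 2.0 * e0 * math.cos(k * z) * math.cos(omega * t)


def b_travelling_sum(z, t, b0=1.0, k=1.0, omega=1.0):
    """y component of B: +y for the +z wave and -y for the -z wave."""
    return b0 * math.cos(k * z - omega * t) - b0 * math.cos(k * z + omega * t)


def b_standing(z, t, b0=1.0, k=1.0, omega=1.0):
    """y component of eqn (4.23)."""
    return 2.0 * b0 * math.sin(k * z) * math.sin(omega * t)


# The differences should vanish to within rounding at any (z, t).
for z, t in [(0.0, 0.0), (0.4, 1.3), (2.0, 0.7)]:
    print(e_travelling_sum(z, t) - e_standing(z, t), b_travelling_sum(z, t) - b_standing(z, t))
```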
Counter-propagating lin–perp–lin: We now consider counter-

propagating linearly polarized waves, but this time the polarization
vectors of the travelling waves are orthogonal. The electric field is
E = E0 cos (kz − ωt) ˆy + E0 cos (kz + ωt) ˆx . (4.24)
Once again, standard trigonometric identities26 allow this expression to 26

cos (A ± B) =
be rewritten in a more useful form:
cos A cos B ∓ sin A sin B.
E = E0 [cos kz cos ωt (ˆx + ˆy ) − sin kz sin ωt (ˆx − ˆy )] . (4.25)
This is a rather complicated spatial dependence for the polarization,

with a large polarization gradient. The spatial dependence for this
66 Polarization
configuration is depicted in the middle of Fig. 4.19. The ellipticity

varies in space, and the whole pattern repeats every λ/2 along the z
axis. To illustrate the complexity of this standing wave, let us examine
the polarization state at four different positions:
• z = 0. Here the (ˆx − ˆy ) component is zero, thus the polarization
is linear, along the direction √12 (ˆx + ˆy ).
• z = λ/8. Here the (ˆx + ˆy ) and (ˆx − ˆy ) components are equal
in magnitude and π/2 out of phase temporally, thus we obtain
circular polarization.
27
Note that there is not much point • z = λ/4. Here the (ˆx + ˆy ) component is zero, therefore
using the designations ‘L’ and ‘R’ here, the polarization is linear, along the direction √12 (ˆx − ˆy ), i.e.
as they are defined with respect to the orthogonal to the direction of polarization at z = 0.
direction of motion, but with counter-
propagating plane waves a standing • z = 3λ/8. Here the (ˆx + ˆy ) and (ˆx − ˆy ) components are equal
wave is formed. Nevertheless, an atom in magnitude and −π/2 out of phase temporally, thus we obtain
placed at z = λ/8 would absorb circular polarization. The sense of circulation27 is the opposite
photons with spin projection + with
respect to the z axis, and photons with in this plane to the one obtained in the plane z = λ/8.
spin projection − at z = 3λ/8.
The spatial variation of the magnetic field associated with this
configuration is also rather complicated, and can be written as
B = −B0 [cos kz cos ωt (ˆx + ˆy ) + sin kz sin ωt (ˆx − ˆy )] , (4.26)
where E0 = cB0 . At z = 0 we find that the magnetic field, like the
electric field, is plane polarized, but unlike the individual plane-wave
components, the electric and magnetic fields are parallel.
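The four cases listed above, and the observation that E and B are parallel at z = 0, can be checked numerically. The following is a minimal Python sketch, not part of the original text, that evaluates eqns (4.25) and (4.26) in units of E0 and B0. The function names `fields` and `polarization_state`, and the choice to classify the state by the ratio of minimum to maximum field magnitude over one cycle, are illustrative assumptions.

```python
import numpy as np

def fields(z_over_lambda, t_over_T):
    """Return (E, B) from eqns (4.25) and (4.26) as [x, y] components,
    in units of E0 and B0."""
    kz = 2 * np.pi * z_over_lambda
    wt = 2 * np.pi * t_over_T
    a = np.cos(kz) * np.cos(wt)   # coefficient multiplying (x + y)
    b = np.sin(kz) * np.sin(wt)   # coefficient multiplying (x - y)
    E = np.array([a - b, a + b])
    B = -np.array([a + b, a - b])
    return E, B

def polarization_state(z_over_lambda, n=400):
    """Return (ratio, sense) for the electric field at a fixed plane.

    ratio = min|E| / max|E| over one cycle (0: linear, 1: circular).
    sense = +1 or -1 for the direction of rotation about +z, 0 if linear.
    """
    t = np.arange(n) / n
    E, _ = fields(z_over_lambda, t)            # shape (2, n)
    mag = np.hypot(E[0], E[1])
    ratio = mag.min() / mag.max()
    # z component of E(t) x E(t + dt): its sign gives the rotation sense
    cross = E[0] * np.roll(E[1], -1) - E[1] * np.roll(E[0], -1)
    sense = float(np.sign(cross.mean())) if ratio > 1e-6 else 0.0
    return ratio, sense

if __name__ == "__main__":
    for z in (0.0, 0.125, 0.25, 0.375):
        print(z, polarization_state(z))
```

The sign returned for the two circular cases depends on the viewing convention (here, rotation about +z), so only the fact that the two signs are opposite should be read as physically meaningful.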
Counter-propagating L-circular: Finally, we consider
counter-propagating L-circularly polarized waves. The electric field is
E = E0 [cos (kz − ωt) ˆx − sin (kz − ωt) ˆy ]
+E0 [cos (kz + ωt) ˆx − sin (kz + ωt) ˆy ] . (4.27)
After using more trigonometric identities,²⁸ we can rewrite this
expression as
E = 2E0 cos ωt [cos kzˆx − sin kzˆy ] . (4.28)
²⁸ sin (A ± B) = sin A cos B ± cos A sin B.
We notice that the ˆx and ˆy components have the same temporal
dependence—therefore in every transverse plane the light is linearly
polarized. The phase difference between the ˆx and ˆy components
evolves linearly along z, therefore this represents a corkscrew polar-
ization gradient. The bottom sketch in Fig. 4.19 illustrates the spatial
dependence for this configuration.
The associated magnetic field is
B = −2B0 sin ωt [cos kzˆx − sin kzˆy ] , (4.29)
where E0 = cB0 . We note that the electric and magnetic fields are
temporally out of phase, but that they are parallel in space—i.e. the
magnetic field is linearly polarized with the direction of the magnetic
field vector following a corkscrew pattern along the z axis.
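As a quick consistency check, the sum of the two waves in eqn (4.27) can be compared with the closed form of eqn (4.28), and the rotation of the polarization direction with z can be read off. This is a minimal Python sketch in units of E0; the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def two_wave_sum(z_over_lambda, t_over_T):
    """Sum of the two counter-propagating L-circular waves, eqn (4.27)."""
    kz = 2 * np.pi * z_over_lambda
    wt = 2 * np.pi * t_over_T
    Ex = np.cos(kz - wt) + np.cos(kz + wt)
    Ey = -np.sin(kz - wt) - np.sin(kz + wt)
    return np.array([Ex, Ey])

def corkscrew(z_over_lambda, t_over_T):
    """Closed form of the standing wave, eqn (4.28)."""
    kz = 2 * np.pi * z_over_lambda
    wt = 2 * np.pi * t_over_T
    return 2 * np.cos(wt) * np.array([np.cos(kz), -np.sin(kz)])

def polarization_angle(z_over_lambda):
    """Angle (radians) of the linear polarization relative to x, at t = 0."""
    Ex, Ey = corkscrew(z_over_lambda, 0.0)
    return np.arctan2(Ey, Ex)

if __name__ == "__main__":
    for z in np.linspace(0.0, 0.5, 5):
        print(z, np.degrees(polarization_angle(z)))
```

The angle evaluates to −kz (up to the branch of `arctan2`), which is the linear rotation along z described in the text as a corkscrew.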
Chapter summary
• Polarization is a fundamental property of waves in optics.

• A monochromatic wave linearly polarized along ˆ1 and propagating
in direction k is E = E1 ˆ1 ei(k·r−ωt) .
• Plane-, circular-, and elliptically polarized waves have a
constant phase difference between the two orthogonal transverse
components, equal to zero, ±π/2, and an arbitrary value,
respectively.
• Elliptically polarized waves—if an observer facing into the
approaching wave sees the electric field rotating clockwise (anti-
clockwise) in time, the polarization is said to be R (L).
• An L-circularly polarized wave propagating in direction k is
E_L = (1/√2) E0 (ˆ1 + iˆ2 ) e^{i(k·r−ωt)} . Replacing the factor of i with −i
produces a R-circularly polarized wave. The unit vectors form a
R-handed coordinate system, ˆ1 × ˆ2 = k̂.
• L- (R-)circularly polarized light has an angular momentum
projection on to the k̂ direction of +ℏ (−ℏ) per photon.
• A quarter-wave plate introduces a π/2 phase retardation
between the components of the electric field along the slow and fast
axes, and is used to convert between linear and circularly polarized
light.
• A half-wave plate introduces a π phase retardation between the
components of the electric field along the slow and fast axes, and
is used to rotate the direction of polarized light.
• Malus’ Law states that the intensity of a plane wave of incident
intensity I0 that is transmitted by a polarizer at an angle α with
respect to the polarization vector of the input is I(α) = I0 cos² α.
• Optical activity—some media have the ability to rotate the
polarization of incident linearly polarized light.
• The non-reciprocal nature of the Faraday effect can be exploited
to create an optical diode.
• The interference of two counter-propagating plane waves with
orthogonal polarizations gives rise to an optical field with strong
polarization gradients.
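Several of the summary points (the action of quarter- and half-wave plates, and Malus' Law) can be illustrated with 2 × 2 matrices acting on the two transverse field components. The following is a minimal Python sketch under stated assumptions: the components are taken along the wave-plate axes, the retarded component is assumed to be the second one, overall phase factors are dropped, and the names `QWP`, `HWP`, `polarizer`, and `intensity` are introduced here for illustration only.

```python
import numpy as np

# Wave plates written in the basis of their own axes (assumption: the
# second component acquires the extra phase; global phases are dropped).
QWP = np.diag([1, 1j])    # quarter-wave plate: pi/2 relative phase
HWP = np.diag([1, -1])    # half-wave plate: pi relative phase

def polarizer(alpha):
    """Projector onto a transmission axis at angle alpha to axis 1."""
    p = np.array([np.cos(alpha), np.sin(alpha)])
    return np.outer(p, p)

def intensity(E):
    """Relative intensity, |E|^2."""
    return float(np.vdot(E, E).real)

# Linear polarization at pi/4 to the plate axes, unit intensity
linear_45 = np.array([1, 1]) / np.sqrt(2)

circ = QWP @ linear_45    # (1, i)/sqrt(2): circular
flipped = HWP @ circ      # (1, -i)/sqrt(2): opposite handedness

if __name__ == "__main__":
    print(circ, flipped)
    print(intensity(polarizer(np.pi / 3) @ np.array([1.0, 0.0])))
```

With the summary's convention that (ˆ1 + iˆ2) denotes L-circular light, `circ` is L and `flipped` is R; the intensity is unchanged by either plate, while the polarizer reproduces the cos² α dependence.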
Exercises
(4.1) Plot of linearly polarized light
Use eqn (4.1) to plot the electric field in the ˆ1–ˆ2 plane for times
t/T = 0, 1/8, 1/4, 3/8, and 1/2. Assume that E1 and E2 are real, equal in
magnitude, and in phase.
(4.2) Plot of circularly polarized light
Using eqn (4.6) plot the electric field in the plane z = 0 for times
t/T = 0, 1/8, 1/4, 3/8 and 1/2.
(4.3) Circularly-polarized light with real fields
Rework eqn (4.4) with real fields and an appropriate choice of axes to
confirm the form of eqn (4.6) for a light wave propagating along z.
(4.4) Different states of polarized light: complex notation
Using complex notation, write down the electric field for the following
polarization states:
(i) an L-circularly polarized wave propagating along the x axis;
(ii) a R-circularly polarized wave propagating along the y axis;
(iii) a linearly polarized wave at π/4 with respect to both x and y axes,
propagating along the −z axis.
(4.5) Phase difference between orthogonal linear polarized waves
(i) Write an equation for the sum of two plane waves, both with
amplitude, E0/√2, propagating along the z axis with orthogonal linear
polarizations (along x and y), where the y component lags behind the x
component with a phase difference, ϕ.
(ii) What value of ϕ corresponds to (a) linear, (b) left-circular, and
(c) right-circular polarization?
(iii) Comment on the orientation of the electric-field vector in the case
of linearly polarized light.
(4.6) Different states of polarized light: real fields
Repeat the analysis of the previous question, using only real fields.
(4.7) Mirror reflection of circularly polarized light
An L-hand circularly polarized wave is normally incident on a mirror.
What hand does the reflected wave have? Explain your answer. (Hint: recall
the boundary conditions for a perfect conductor such as a mirror: the sum
of the incident and reflected electric fields has to be zero).
(4.8) Orientation of elliptically polarized light
Start by writing the real electric field as
E = E1 cos (kz − ωt) ˆ1 + E2 cos (kz − ωt + δ) ˆ2 , where E1 and E2 are
the amplitudes of the fields along directions ˆ1 and ˆ2 , respectively,
and δ the relative phase between the components. Expand the cosine in the
ˆ2 component, and eliminate the (kz − ωt) terms. You should obtain the
equation of an ellipse, rotated with respect to the ˆ1 direction. Let Ea
and Eb be the components of the field along the semi-major and semi-minor
axes of the ellipse. Write down equations that relate Ea , Eb , E1 , E2
and α, the angle of the semi-major axis of the ellipse with respect to
ˆ1 . Combine your results, and verify the result given in the text as
eqn (4.8). What value do you obtain for the special case of δ = 0?
Comment on your result.
(4.9) Magnetic field for elliptically polarized light
By thinking about the relative orientation of the magnetic and electric
fields of the plane waves summed to give elliptically polarized light,
describe the form of the magnetic field.
(4.10) Poincaré sphere
Sketch the evolution on the Poincaré sphere as light propagates through a
medium that exhibits: (i) Linear birefringence; (ii) Circular
birefringence.
(4.11) Linear birefringence and wave plates
A linearly polarized plane wave propagating along z with polarization
vector at an angle α = π/4 to the x axis enters a birefringent medium at
z = 0. The refraction indices in the x and y directions are nx and ny ,
respectively.
(i) Write an expression for the field after it has propagated a distance
inside the medium.
(ii) Explain, briefly, why only the relative phase between the two terms
matters.
(iii) For quartz at a wavelength of 589 nm, the index difference is
ny − nx = 9.13 × 10⁻³. What thickness of quartz is required to convert
the linear input to a circular output?
(iv) Is the output left- or right-circularly polarized? How could you
change this?
(v) Comment on the practicality of making a wave plate with this
thickness, and what alternatives there might be.
(4.12) Thickness of a calcite wave plate
For calcite at λ = 589 nm, the fast refractive index is 1.4864, the slow
index is 1.6584. What is the minimum thickness required to construct a
half-wave plate?
(4.13) Rotation of a half-wave plate
Vertically polarized light is normally incident on a half-wave plate
orientated with its fast axis vertical. Describe the changes in the
output polarization state as the wave plate is slowly rotated about the
wave vector of the light by π.
(4.14) Circularly polarized light and a half-wave plate
Verify mathematically the assertion in the text that a circularly
polarized light wave changes its handedness on passing through a
half-wave plate.
The projection of the angular momentum of the photons in the wave onto
the axis of propagation must therefore be reversed after traversing the
half-wave plate. Is this consistent with the conservation of angular
momentum?
(4.15) Intensity before and after wave plates
Using eqns (4.14) and (4.15) verify that for both half- and quarter-wave
plates, although the electric field is modified on transmission, the
intensity is invariant.
(4.16) Cascading polarization components (1)
Consider a linearly polarized wave incident normally on a sequence of
wave plates. The direction of polarization is at π/4 with respect to the
initial quarter-wave plate axes; there then follows a half-wave plate
with axes at an arbitrary orientation, with the final element being a
quarter-wave plate with the same orientation as the first. Describe the
state of polarization after each element.
(4.17) Cascading polarization components (2)
Consider unpolarized light incident normally on a polarizing filter; the
transmitted light is then incident on a quarter-wave plate with axes
oriented at π/4 with respect to the axis of the polarizer. The light then
reflects from a mirror and passes through the quarter-wave plate before
being incident on the polarizer for a second time. By analysing the
polarization state after each component, explain why no light is
transmitted through the polarizer on the second traversal. What is a
practical use of this device? (Hint: the mirror can be replaced by a
computer monitor). Does this device work for every colour? Does the
device work for light waves that are not normally incident?
(4.18) Cascading polarization components (3)
Consider a vertically polarized plane wave normally incident on a
polarizer whose axis is parallel to the plane of the electric field of
the light. Downstream the light traverses a second polarizer, whose axis
is inclined at π/4 with respect to the first, and a final polarizer whose
axis is orthogonal to the first. Write down (vector) expressions for the
electric field before and after each polarizer. What fraction of the
initial light intensity is transmitted by this sequence of polarizers?
Repeat the analysis when the middle polarizer is removed.
(4.19) Eliminating phase shifts by suitable choice of space and time
origins
For a pair of counter-propagating parallel linearly polarized waves let
the electric field be
E = E0 [cos (kz − ωt + δ− ) + cos (kz + ωt + δ+ )] ˆx .
Show that a shift of the origin of the coordinate system along the z axis
using the expression z′ = z + (δ− + δ+ ) /2k allows the field to be
rewritten as
E = E0 {cos [kz′ − (ωt + δ′ )] + cos [kz′ + (ωt + δ′ )]} ˆx ,
where δ′ = (δ+ − δ− ) /2. Show that δ′ can also be eliminated with an
appropriate choice of temporal origin.
(4.20) Standing waves with complex waves
Use complex notation for the plane waves to derive the form of the
electric field, eqns (4.22), (4.25), and (4.28), for the three different
standing waves analysed in the text.
(4.21) Faraday effect and optical diode
The electric field of left- and right-circularly polarized plane waves
propagating along the z axis may be written as
E_L = (1/√2) E0 (ˆx + iˆy ) e^{i(kz−ωt)} and
E_R = (1/√2) E0 (ˆx − iˆy ) e^{i(kz−ωt)} , where ˆx and ˆy are unit
vectors along x and y.
(i) Write an equation for a plane wave propagating along z and linearly
polarized along x in terms of E_L and E_R .
(ii) The plane wave enters a Faraday medium at z = 0. Inside the medium
left- and right-circularly polarized light have refractive indices, nL
and nR , respectively. Write an equation for the field after propagating
a distance z inside the medium.
(iii) By writing nL = n + Δn/2 and nR = n − Δn/2, where n = (nL + nR )/2
and Δn = nL − nR , show that E = E0 (cos ϕ ˆx − sin ϕ ˆy ) e^{i(nkz−ωt)} ,
where ϕ = Δnkz/2 = π(nL − nR )z/λ. (Note that this result also applies to
an optically active medium.)
(iv) For rubidium gas, in a magnetic field of 0.600 T using a laser at
780 nm, nL − nR = 9.75 × 10⁻⁵. If the gas cell has a length of 2.00 mm
what is the direction of polarization of light after traversing the cell?
What is the value of the Verdet constant?
(v) Explain, briefly, how this medium could be combined with two linear
polarizers to realize an optical diode (a device that transmits light in
one direction only).
(4.22) Magnetic fields of standing waves
Use either (i) complex notation of the magnetic field of the constituent
plane waves, or (ii) the vector potential given the form of the electric
field, to derive the magnetic fields for the three standing waves as
expressed by eqns (4.23), (4.26), and (4.29).
(4.23) Elliptical polarizations
The circles in Fig. 4.20 trace out the position of the tip of the
electric-field vector over time (darker later). Give values for E1 and E2
and ϕ for each case, assuming you are looking into the field.
(4.24) Michelson, tilt fringes, and complementarity
The output of a misaligned Michelson interferometer, as in Fig. 4.21, can
be approximately described by the sum of two plane waves with amplitudes
(1/√2) E0 , propagating at angles ±θ0 /2 relative to the z axis. Assuming
we can make the small-angle approximations, sin θ0 /2 ≃ θ0 /2 and
cos θ0 /2 ≃ 1, then the sum of the two fields (if polarized along y) is
E = (1/√2) E0 e^{i(kz−ωt)} (e^{ikθ0 x/2} ˆy + e^{−ikθ0 x/2} ˆy ) ,
and the intensity distribution is
I = ½ ε0 c E · E* = 4I0 cos² (kθ0 x/2) ,
where we have used I0 = ½ ε0 c E0². The cosine-squared intensity maxima
in this context are known as tilt-fringes.
A waveplate in one arm of the interferometer is rotated, see Fig. 4.21,
such that the electric field becomes
E = (1/√2) E0 e^{i(kz−ωt)} (e^{ikθ0 x/2} ˆx + e^{−ikθ0 x/2} ˆy ) ,
where cos θ0 /2 ≃ 1 allows us to neglect the electric field component in
the propagation direction z.
(i) Write an expression for the modified intensity distribution.
(ii) What type of wave plate is used, and by how much is it rotated?
(iii) The principle of complementarity, see also Sections 9.8 and 10.9,
states that we can observe either wave-like properties (such as
interference), or particle-like properties (such as the trajectory or
path), but not both at the same time. Use complementarity to explain the
change in the interference pattern when the wave plate is rotated.
Fig. 4.20 The tip of the polarization

vector over one cycle of the wave. The
grey scale indicates time (darker being
more recent). See Exercise 4.23.
Fig. 4.21 The layout and output of the

Michelson interferometer considered in
Exercise 4.24, for two orientations of
the wave plate.
5 Many waves I: Fresnel and Fraunhofer

5.1 Introduction
5.3 Fresnel diffraction integral
5.4 Fresnel zones
5.5 Circular aperture
5.6 Cartesian separability
5.8 One, two, many slits
5.10 Fresnel integrals
Chapter summary
Exercises

Light is fat and shadow is thin
Fang Yizhi (1611–71)

5.1 Introduction
In Chapters 3 and 4 we discussed adding two waves, either with the
same or different polarizations. We showed how constructive and
destructive interference leads to periodic spatial structures in the light
intensity or polarization. Now, we extend this idea to adding many
waves, indeed infinitely many. The many could be either many curved waves
(originating from different positions) or many plane waves (propagating
at different angles). These two descriptions, curved or plane wave,
correspond to the Huygens–Fresnel principle or Fourier optics,
respectively. In this chapter, we focus on the Huygens–Fresnel principle
and express the propagation of light as a sum of infinitely many curved
waves. We shall restrict the discussion to monochromatic light, a single
polarization, and use the scalar approximation.
5.2 A brief history

In his 1704 book Opticks, Newton observed light in the shadow of a long
wedge-shaped slit formed by two sharp butcher's knives driven into a
wooden table. His sketch is shown at the top of Fig. 5.1. The image
below shows a similar pattern observed in the shadow of the contact
point between two ball bearings placed side by side. Newton struggled
to explain these strange patterns using his corpuscular theory.

Fig. 5.1 Top: Newton’s sketch of light in the shadow behind two knives
forming a wedge-shaped slit. Below: Image of laser light in the shadow
behind two ball bearings placed side by side. A region similar to Newton’s
sketch is highlighted by the white rectangle. Note also the spot of Arago
marking the centre of the shadow of each ball.

Over one hundred years later, in the summer of 1815, Fresnel
performed a series of shadow experiments using a honey-drop as a
lens, see Darrigol (2012). To explain his observations he developed a
mathematical description of light propagation, building on the ideas
of Christiaan Huygens (The Hague 1629–95), that each point on
the wave front emits a secondary wave. In paraxial optics, these
secondary waves are paraxial spherical waves. The sum of infinitely
many secondary waves, encapsulated in Fresnel’s diffraction integral, is
phenomenologically equivalent to the sum of phasors that we introduced
in Chapter 3. The interpretation of light propagation as a sum of
phasors, illustrated schematically in Fig. 5.2, is commonly referred
to as the Huygens–Fresnel principle. In spite of some conceptual
difficulties,¹ Fresnel’s theory correctly predicts the propagation of light
fields in the paraxial limit and can be derived analytically from the scalar
wave equation, see Section 6.5.

¹ What is the source of secondary waves in vacuum? What determines their
amplitude and phase; and why don’t they radiate backwards? Attempts
by Gustav Robert Kirchhoff (Königsberg 1824–Berlin 1887) and others
to derive Fresnel’s theory directly from Maxwell’s equations also proved
problematic.

from Maxwell’s equation also proved Fresnel also recognized that—as we saw in Section 2.16—a lens
problematic. imprints a quadratic phase that cancels the quadratic phase terms
appearing in the paraxial spherical wave. As the imprinted phase is
periodic, as long as the phase shift is matched modulo 2π, then the lens
will do the same job. Consequently, one can ‘squash’ a standard lens
into segments as illustrated in Fig. 5.3. As this Fresnel lens needs
considerably less optical material than a conventional lens there is a big
saving in weight and cost, especially for larger lenses. The first Fresnel
lens was used in the Cordouan lighthouse in the Gironde estuary in
1826, only eight years after Fresnel had developed his theory—a shining
example of theoretical physics delivering practical applications!
Another surprising prediction of Fresnel’s theory is that there should
be a bright spot in the centre of the shadow of an opaque disk. The
mathematician Siméon Denis Poisson (Pithiviers 1781–Sceaux 1840)
suggested that this prediction must mean that Fresnel’s theory was
wrong; however in 1818, with the help of François Arago (Estagel
1786–Paris 1853), Fresnel demonstrated that the spot does exist, as
is apparent in Fig. 5.1. Historically, the so-called spot of Arago and
Young’s two-hole experiment are the two major results that helped
firmly establish the wave theory of light.

Fig. 5.2 Geometry of the Huygens–Fresnel principle in the xz plane. The
field at $(x, z)$ is given by a sum of spherical waves that originate from
points $(x'_j, 0)$ in the input plane. The sum over all points that
contribute gives rise to the Fresnel diffraction integral, eqn (5.2).
5.3 Fresnel diffraction integral

Consider the geometry shown in Fig. 5.4; we ask, given that the field
is $E^{(0)}$ in the plane $z = 0$, what is the field $E^{(z)}$ in an
observation plane at $z$? According to the Huygens–Fresnel principle we
can write $E^{(z)}$ as a superposition of paraxial spherical waves. Using
eqn (2.40) the field at $(x, y, z)$ for a paraxial spherical wave
originating from a point at $(x'_j, y'_m, 0)$ is given by

$$E = \frac{A f_{jm}}{ikz}\, e^{ikz}\, e^{ik[(x-x'_j)^2+(y-y'_m)^2]/2z} . \tag{5.1}$$

Fig. 5.3 A standard lens (a) is equivalent to (b) a Fresnel lens, a
segmented glass structure that imprints the same phase modulo 2π.

The total field is a sum of all secondary waves with amplitudes,
$A f_{jm}$, originating in the input plane, i.e.

$$E^{(z)} = A e^{ikz} \sum_{jm} f_{jm}\, e^{ik[(x-x'_j)^2+(y-y'_m)^2]/2z} ,$$

where $A$ is a normalization constant to be determined. For two waves
the sum is the same as in Chapter 3, as illustrated in Fig. 5.2. For
an infinite number of waves we need to replace the discrete sum by an
integral,

$$E^{(z)} = A e^{ikz} \iint_{-\infty}^{\infty} f(x', y')\, e^{ik[(x-x')^2+(y-y')^2]/2z}\, dx'\, dy' .$$

Fig. 5.4 Geometry of a diffraction experiment. Light propagating along
the z axis is diffracted by an obstacle in the $z = 0$ plane. The Fresnel
diffraction integral, eqn (5.2), expresses the field, $E^{(z)}$, in the
observation plane at $z$, in terms of the input field,
$E^{(0)} = E_0 f(x', y')$.

−∞
We refer to $f(x', y')$ as the aperture function, or transmission
function. It is a two-dimensional function that determines what
fraction of the input light is transmitted at the location $(x', y')$ in
the plane $z = 0$. By requiring that for a plane wave input,
$f(x', y') = 1$, we recover a plane wave at $(0, 0, z)$, we find that the
amplitude of the secondary waves is,² $A = E_0/(i\lambda z)$, and can write

$$E^{(z)} = \frac{E_0 e^{ikz}}{i\lambda z} \iint_{-\infty}^{\infty} f(x', y')\, e^{ik[(x-x')^2+(y-y')^2]/2z}\, dx'\, dy' . \tag{5.2}$$

² For a plane wave, $E^{(z)} = E_0 e^{ikz}$, the sum of secondary waves at
a point on the optical axis must still be a plane wave, i.e.
$$A e^{ikz} \iint_{-\infty}^{\infty} e^{ik(x'^2+y'^2)/2z}\, dx'\, dy' = E_0 e^{ikz} .$$
Using
$$\int_{-\infty}^{\infty} e^{-\pi x'^2/(i\lambda z)}\, dx' = \sqrt{i\lambda z} ,$$
and similarly for the $y'$ integral, we find that $A = E_0/(i\lambda z)$.

This equation is known as the Fresnel diffraction integral. In the
paraxial regime, the distance from the source point $(x', y', 0)$ to the
observation point $(x, y, z)$ is

$$r_p = z + \frac{(x-x')^2 + (y-y')^2}{2z} ,$$

and we can rewrite the Fresnel diffraction integral in the form

$$E^{(z)} = \frac{E_0}{i\lambda z} \iint_{-\infty}^{\infty} f(x', y')\, e^{ikr_p}\, dx'\, dy' . \tag{5.3}$$

This equation says that the field at a point P, with coordinates
$(x, y, z)$, is given by a sum of contributions from points P′, with
coordinates $(x', y', 0)$, with a phase that depends on the optical path
length between P′ and P. The Fresnel diffraction integral extends the
discrete phasor sum we encountered in Chapter 3 to infinitely many waves.
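The normalization used in eqn (5.2) follows in two lines from the Gaussian integral quoted in margin note 2. The steps below are a worked sketch of that argument; the only ingredient beyond the text is writing $k = 2\pi/\lambda$ explicitly, so that $ik/2z = -\pi/(i\lambda z)$.

```latex
\begin{align}
\iint_{-\infty}^{\infty} e^{ik(x'^2+y'^2)/2z}\,dx'\,dy'
  &= \left[\int_{-\infty}^{\infty} e^{-\pi x'^2/(i\lambda z)}\,dx'\right]^2
   = \left(\sqrt{i\lambda z}\right)^2 = i\lambda z , \\
A\,e^{ikz}\,(i\lambda z) &= E_0\,e^{ikz}
  \quad\Longrightarrow\quad A = \frac{E_0}{i\lambda z} .
\end{align}
```

The factor $1/(i\lambda z)$ therefore carries both the $1/z$ fall-off of each secondary wave and a fixed phase of $-\pi/2$ relative to the incident wave.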
Next we consider some special cases that can be solved analytically
and provide considerable insight. First the case of cylindrical symmetry.
5.4 Fresnel zones

If the input field in Fig. 5.4 is cylindrically symmetric and we want to
find the field $E^{(z)}$ at an observation point $(0, 0, z)$ on the optical
axis, it is convenient to rewrite the Fresnel diffraction integral,
eqn (5.3), in cylindrical coordinates:

$$E^{(z)} = \frac{E_0 e^{ik\bar{r}}}{i\lambda z} \iint_{-\infty}^{\infty} f(x', y')\, e^{-ik(xx'+yy')/z}\, e^{ik\rho'^2/2z}\, dx'\, dy' , \tag{5.4}$$

where $\bar{r} = z + \rho^2/2z$, and $\rho' = (x'^2+y'^2)^{1/2}$ and
$\rho = (x^2+y^2)^{1/2}$ are the radial displacements in the input plane
and the observation plane, respectively.
Fig. 5.5 The radius of the first Fresnel zone, $\rho_1$, occurs when the
path difference between, $(0, 0)$ to $(0, z)$ and $(\rho_1, 0)$ to
$(0, z)$, is $\lambda/2$. In the paraxial limit, this is equal to
$\rho_1^2/2z$.

To further simplify, we assume that the input field is a plane wave
propagating along z. To set an upper limit on the values of $x'$ and $y'$
that contribute (which is required in order to use the scalar
approximation, see Section 1.12) we can assume the input plane contains a
circular aperture with radius $R_a$ that we can vary. The scalar
approximation is valid as long as $R_a < z$, such that we are in the
paraxial regime. As the input field, $E_0 f(x', y')$, is cylindrically
symmetric, we can replace $f(x', y')$ by $f(\rho')$, where for a circular
aperture $f(\rho') = 1$ for $\rho' \le R_a$ and 0 otherwise. The field is
given by the sum of paths from input points $(\rho', 0)$ to the
observation point $(0, z)$. The phase of these contributions oscillates as
the source point $(\rho', 0)$ moves away from the z axis. In Fig. 5.5 we
have labelled two points, $\rho_1$ and $\rho_2$, where the phase changes
sign. This happens when the path difference between the on-axis path
$(0, 0)$ to $(0, z)$ and the off-axis path $(\rho_m, 0)$ to $(0, z)$ is
equal to an integer multiple of $\lambda/2$. Using the paraxial distance,
$r_p$, we can write that
$$\frac{\rho_m^2}{2z} = m\,\frac{\lambda}{2} , \tag{5.5}$$

which gives

$$\rho_m = \sqrt{m\lambda z} . \tag{5.6}$$

The region in the input plane between $\rho_{m-1}$ and $\rho_m$ is known as
the $m$th Fresnel zone. In Fig. 5.6 we show these Fresnel zones in the
input plane. Light passing through the white regions contributes to the
field with positive phase, and light passing through the grey regions with
negative phase.³ All zones have the same area,

$$\pi(\rho_{m+1}^2 - \rho_m^2) = \pi\left[(m+1)\lambda z - m\lambda z\right] = \pi\lambda z . \tag{5.7}$$

Fig. 5.6 Fresnel zones: The field component arriving at $(0, z)$ from
$(\rho', 0)$ has a phase, $\varphi = k\left(z + \rho'^2/2z\right)$. The
curve plotted along x shows $\cos\varphi$, which changes sign whenever
$\rho' = \sqrt{m\lambda z}$, where $m$ is an integer. The first Fresnel
zone (central white circle) with radius $\rho' \le \sqrt{\lambda z}$
contributes with positive phase. The second zone, between
$\sqrt{\lambda z} < \rho' \le \sqrt{2\lambda z}$ (shown in grey)
contributes with a negative phase.

³ Although the phase plotted in Fig. 5.6 looks similar to the phase of a
spherical wave, the phase across the aperture is uniform, and now we are
considering the phase of wave components originating at $(\rho', 0)$ when
they arrive at $(0, z)$.

Fresnel realized that as the field from successive Fresnel zones
interferes destructively, then if we block all the odd, or all the even,
zones we can arrange to have purely constructive interference on-axis in
a particular plane downstream. The mask is known as a Fresnel zone
plate and looks very similar to Fig. 5.6. Note that Fresnel zones are
abstract theoretical constructs, whereas a zone plate is a physical device.
The effective focal length of the zone plate is

$$f = \frac{\rho_1^2}{\lambda} , \tag{5.8}$$

where $\rho_1$ is the radius of the first zone.⁴ Whereas conventional
lenses rely on refraction for a gain in intensity, a zone plate focuses
light using diffraction. Zone plates are particularly useful for focusing
particle beams, or photons in regions of the electromagnetic spectrum
where there are few transparent optical materials (to make a lens), such
as X-rays. A micro-fabricated zone plate used to focus a beam of helium
atoms is shown in Fig. 5.7.⁵

⁴ The size of the focus is determined by the overall size of the zone
plate, see Exercise 5.9, similar to a conventional lens, see Chapter 9.
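The zone radii of eqn (5.6), the equal-area property of eqn (5.7), and the zone-plate focal length of eqn (5.8) are easy to evaluate for concrete numbers. The following is a minimal Python sketch; the wavelength of 633 nm and distance of 1 m are illustrative assumptions, not values from the text, and the function names are hypothetical.

```python
import math

def zone_radius(m, wavelength, z):
    """Outer radius of the m-th Fresnel zone, eqn (5.6)."""
    return math.sqrt(m * wavelength * z)

def zone_area(m, wavelength, z):
    """Area of the m-th zone, between rho_(m-1) and rho_m, cf. eqn (5.7)."""
    outer = zone_radius(m, wavelength, z)
    inner = zone_radius(m - 1, wavelength, z)
    return math.pi * (outer**2 - inner**2)

def zone_plate_focal_length(rho_1, wavelength):
    """Effective focal length of a zone plate, eqn (5.8)."""
    return rho_1**2 / wavelength

if __name__ == "__main__":
    wavelength = 633e-9   # metres (assumed example value)
    z = 1.0               # metres (assumed example value)
    for m in range(1, 5):
        print(m, zone_radius(m, wavelength, z), zone_area(m, wavelength, z))
    # A zone plate built from these zones focuses at the design distance z,
    # and at a different distance for a different wavelength.
    rho_1 = zone_radius(1, wavelength, z)
    print(zone_plate_focal_length(rho_1, wavelength))
    print(zone_plate_focal_length(rho_1, 532e-9))
```

Because the focal length scales as 1/λ, the second call (a shorter assumed wavelength through the same plate) returns a longer focal length, which is the chromatic aberration mentioned in the margin note.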
⁵ A disadvantage of using a Fresnel zone plate as a lens, evident in
eqn (5.8), is that the focal length is inversely proportional to the
wavelength, which gives rise to chromatic aberration.

5.5 Circular aperture
In this section we consider what happens when we vary the aperture
radius, $R_a$. In Fig. 5.8 we repeat Fig. 5.5 and this time show the
aperture. We have chosen the aperture radius to be equal to the radius
of the first Fresnel zone, $R_a = \rho_1 = \sqrt{\lambda z}$. In this case,
all paths to the on-axis observation point add in phase, and we expect to
observe an intensity maximum. If the aperture radius, $R_a$, is increased,
the second zone begins to contribute components that are out of phase and
the intensity will reduce. At $R_a = \rho_2 = \sqrt{2\lambda z}$ there are
equal and opposite contributions from the first and second zones that
cancel and we expect to observe zero intensity. We can check this
hypothesis using the Fresnel diffraction integral. For uniform
illumination of a circular aperture, the on-axis field is given by
substituting $\rho = 0$ and $f(\rho') = 1$ for $\rho' \le R_a$ (and 0
otherwise) in eqn (5.4):

$$E^{(z)} = \frac{E_0 e^{ikz}}{i\lambda z} \int_0^{R_a} e^{ik\rho'^2/2z}\, 2\pi\rho'\, d\rho' \tag{5.9}$$
$$\phantom{E^{(z)}} = -E_0 e^{ikz}\left(e^{ikR_a^2/2z} - 1\right) = -2iE_0\, e^{ikz}\, e^{ikR_a^2/4z} \sin\!\left(\frac{kR_a^2}{4z}\right) .$$

Fig. 5.7 A Fresnel zone plate used to focus a beam of helium atoms
(Adams et al., 1994).

Taking the modulus-squared, we obtain the on-axis intensity,

$$I^{(z)} = 4I_0 \sin^2\!\left(\frac{\pi R_a^2}{2\lambda z}\right) , \tag{5.10}$$

where $I_0$ is the incident intensity. In Fig. 5.9(i), we plot the on-axis
intensity at a particular distance $z = z_P$ and vary $R_a$. As expected
the intensity oscillates between 0 and $4I_0$. The intensity in the
$\rho z$ plane is shown in Fig. 5.9(ii). In Fig. 5.9(iii), we plot
eqn (5.10) for a fixed aperture radius as a function of the propagation
distance. The interpretation is more complicated in this case because as
we move along the z axis the Fresnel zones change size. At
$z = R_a^2/\lambda$ (the most distant maximum) the first Fresnel zone
fills the aperture.⁶ At $z = R_a^2/2\lambda$ the first two Fresnel zones
fill the aperture, and the intensity is zero. The number of Fresnel zones
that contribute to the on-axis intensity, $R_a^2/\lambda z$, is known
as the Fresnel number. For $z < R_a$, both the Fresnel approximation
and the scalar approximation break down, so we should not read too
much into the left-hand region of the graph where the intensity oscillates
rapidly.

Fig. 5.8 Geometry for Fresnel diffraction by a circular aperture. If the
first Fresnel zone fills the aperture, then the path difference between
$(0, 0)$ to $(0, z)$ and $(0, R_a)$ to $(0, z)$ is $\lambda/2$. In the
paraxial limit, this path difference is equal to $R_a^2/2z$.

⁶ For large $z > R_a^2/\lambda$, $\sin(\pi R_a^2/2\lambda z) \to 0$ and the
intensity tends to zero.
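The analytic result of eqn (5.10) can be compared with a direct numerical evaluation of the radial integral in eqn (5.9). The following is a minimal Python sketch using a hand-written trapezoid rule; the wavelength and distance are assumed example values, and the function names are hypothetical.

```python
import numpy as np

def on_axis_intensity_numeric(Ra, wavelength, z, n=20001):
    """I/I0 on axis from a trapezoid-rule evaluation of eqn (5.9)."""
    k = 2 * np.pi / wavelength
    rho = np.linspace(0.0, Ra, n)
    integrand = np.exp(1j * k * rho**2 / (2 * z)) * 2 * np.pi * rho
    integral = np.sum((integrand[1:] + integrand[:-1]) * np.diff(rho)) / 2
    # The prefactor exp(ikz) has unit modulus and is omitted.
    field = integral / (1j * wavelength * z)
    return abs(field) ** 2

def on_axis_intensity_analytic(Ra, wavelength, z):
    """I/I0 on axis from eqn (5.10)."""
    return 4 * np.sin(np.pi * Ra**2 / (2 * wavelength * z)) ** 2

if __name__ == "__main__":
    wavelength, z = 633e-9, 1.0          # assumed example values (metres)
    for fresnel_number in (0.5, 1.0, 1.5, 2.0):
        Ra = np.sqrt(fresnel_number * wavelength * z)
        print(fresnel_number,
              on_axis_intensity_numeric(Ra, wavelength, z),
              on_axis_intensity_analytic(Ra, wavelength, z))
```

Parametrizing the aperture radius by the Fresnel number makes the pattern in the text explicit: a maximum of 4I0 when one zone fills the aperture, and zero when two zones do.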
Fig. 5.9 (i) The on-axis intensity at

a fixed distance, zP , downstream of a
circular aperture as a function of the
aperture radius, Ra . (ii) The intensity
pattern in the xz plane for a fixed
aperture radius, Ra . (iii) The on-axis
intensity as a function of distance z for
a fixed aperture radius, Ra .
Figure 5.10 (top row) shows the intensity pattern in the xy plane at
increasing distance z downstream of a circular aperture. The left and
right image are at z = 0 (Fresnel number infinite) and z = Ra2 /λ (Fresnel
number unity), respectively. Although for high Fresnel number (higher
Fresnel zones) the scalar approximation breaks down, it is possible to
approximate the vector nature of the field using an obliquity factor,
where E0 is replaced by ½E0 (1 + cos θ), or, more accurately, by considering
each vector component of the field, as we shall see in Chapter 12. The
bottom row in Fig. 5.10 shows the case of a complementary screen—an
opaque disk rather than an aperture. Here the spot of Arago is seen as
a bright region in the centre of the shadow.
Fig. 5.10 Intensity in xy plane at

increasing distance z downstream of
a circular aperture (top row) or disk
(bottom row) with radius Ra . The
first and last image are at z = 0 and
z = Ra2 /λ, respectively. The analytic
result of eqn (5.10) predicts the on-axis
intensity for the top row, but eqn (5.4)
has to be solved for all other points.
5.6 Cartesian separability

In addition to the cylindrically symmetric examples considered above,
another class of examples that can be solved analytically arises when the
input field is Cartesian separable, i.e., we can write
$f(x', y') = g(x')h(y')$. Here we consider an example where the field is
uniform, or does not change, in the y direction, so we can write
$f(x', y') = g(x')$ or $f(x', y') = g(x')h(y)$. Substituting
$f(x', y') = g(x')$ in eqn (5.2) the Fresnel diffraction integral at
$y = 0$ gives

$$E^{(z)} = \frac{E_0}{i\lambda z}\, e^{ikz} \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x')\, e^{ik(x-x')^2/2z}\, e^{iky'^2/2z}\, dx'\, dy' . \tag{5.11}$$

We can separate the $x'$ and $y'$ integrals giving,

$$E^{(z)} = \frac{E_0}{i\lambda z}\, e^{ikz} \int_{-\infty}^{\infty} g(x')\, e^{ik(x-x')^2/2z}\, dx' \int_{-\infty}^{\infty} e^{iky'^2/2z}\, dy' , \tag{5.12}$$

and evaluating the integral over $y'$ gives⁷

$$E^{(z)} = \frac{E_0 e^{ikz}}{\sqrt{i\lambda z}} \int_{-\infty}^{\infty} g(x')\, e^{ik(x-x')^2/2z}\, dx' . \tag{5.13}$$

⁷ Writing $k = 2\pi/\lambda$,
$$\int_{-\infty}^{\infty} e^{-\pi y'^2/(i\lambda z)}\, dy' = \sqrt{i\lambda z} .$$

This is a useful form of the Fresnel diffraction integral that applies to
all cases where the field does not change in the y direction. The integral
has the form of a sum of cylindrical rather than spherical waves as we
might expect. The field amplitude decreases as $1/\sqrt{z}$, rather than
the $1/z$ that we find for spherical waves, eqns (2.40) and (5.3).
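Eqn (5.13) is straightforward to evaluate numerically for a simple one-dimensional aperture. The following is a minimal Python sketch for a slit, i.e. $g(x') = 1$ for $|x'| \le w$ and 0 otherwise, using a hand-written trapezoid rule. The slit model, the parameter values, and the name `slit_field` are assumptions made for illustration.

```python
import numpy as np

def slit_field(x, half_width, wavelength, z, n=20001):
    """E/E0 (omitting the unit-modulus factor exp(ikz)) at transverse
    position x behind a slit |x'| <= half_width, from eqn (5.13)."""
    k = 2 * np.pi / wavelength
    xp = np.linspace(-half_width, half_width, n)
    integrand = np.exp(1j * k * (x - xp) ** 2 / (2 * z))
    integral = np.sum((integrand[1:] + integrand[:-1]) * np.diff(xp)) / 2
    return integral / np.sqrt(1j * wavelength * z)

if __name__ == "__main__":
    wavelength, z = 633e-9, 1.0                  # assumed example values
    w = np.sqrt(wavelength * z / 2)              # a narrow slit
    for x in np.linspace(-3 * w, 3 * w, 7):
        print(x, abs(slit_field(x, w, wavelength, z)) ** 2)
    # A slit much wider than sqrt(wavelength * z) approaches free
    # propagation: the on-axis intensity tends towards I0.
    print(abs(slit_field(0.0, 20 * w, wavelength, z)) ** 2)
```

The wide-slit case approaches unity only slowly and with oscillations, which reflects the contribution of the two edges rather than numerical error.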
5.7 Fraunhofer diffraction

There are two special cases of Fresnel diffraction, where the quadratic
dependence on the input coordinates $(x', y')$ either (i) cancels, or
(ii) may be neglected. The first case, where the $x'^2$ and $y'^2$ terms
cancel, is in the focal plane of a lens. The second case, where the
$x'^2$ and $y'^2$ terms may be neglected, is far from the aperture plane,
where $z \gg x'$ and $y'$. Both cases are known as Fraunhofer
diffraction,⁸ and although they are distinct, the same approach may be
used for both.

Fig. 5.11 Schematic of diffraction using a lens. If we know the field and
intensity incident on the lens $I^{(0)}$, then the intensity distribution
in the focal plane, $I^{(f)}$, is given by the Fraunhofer diffraction
formula, eqn (5.16).

⁸ Fraunhofer used a simplified form of Fresnel’s wave theory to understand
how his diffraction gratings worked.

5.7.1 Case I: Focal plane of a lens
A special case of Fresnel diffraction occurs in the focal plane of a lens.9
A schematic of the optical setup is shown in Fig. 5.11. A field
$E^{(0)} = E_0 f(x', y')$ is incident on the lens in the z = 0 plane, and we ask what
is the field $E^{(f)} = E_0 f(x, y)$, and the corresponding intensity $I^{(f)}$, in the
focal plane at z = f. The effect of a lens is to imprint phase, Section 2.16,
such that
$$f(\rho') \Rightarrow f(\rho')\, e^{-ik\rho'^2/2f} . \tag{5.14}$$
9 We do not always think of the focusing effect of a lens as an example of
diffraction. The essential insight of Fresnel was that all light propagation
is a diffraction phenomenon.
Inserting this phase-imprinted field into the Fresnel diffraction integral,
eqn (5.4), we find that for the special case, z = f, the lens exactly cancels
the phase due to path differences appearing in the Fresnel integral,
Figs. 5.5 and 5.6, giving
$$E^{(f)} = \frac{E_0\, e^{ik\bar{r}}}{i\lambda f} \iint_{-\infty}^{\infty} f(x', y')\, e^{-ik(xx'+yy')/f}\, dx'\, dy' , \tag{5.15}$$
and the corresponding intensity distribution is
$$I^{(f)} = \frac{I_0}{\lambda^2 f^2} \left| \iint_{-\infty}^{\infty} f(x', y')\, e^{-ik(xx'+yy')/f}\, dx'\, dy' \right|^2 . \tag{5.16}$$
We refer to this equation as the Fraunhofer diffraction formula.

As we shall see in Chapter 6, the integral corresponds to a Fourier
transform of the input field distribution, with Fourier variables u = x/(λf)
and v = y/(λf). Next, we apply this formula to find the distribution of
light in the focal plane for uniform illumination of a lens.
Example 5.1
Finite lens size: A lens with finite size can only capture the light that falls within
the aperture of the lens. For a plane wave incident on a circular lens with diameter
D, the aperture function is given by a circ-function:
$$f(\rho') = \mathrm{circ}\!\left(\frac{\rho'}{D}\right) = \begin{cases} 0 & \rho' > D/2 \\ 1 & \rho' \le D/2 \end{cases} . \tag{5.17}$$
Substituting into eqn (5.16), and evaluating the integral in cylindrical coordinates,
see Section B.13 in Appendix B, we find that
$$I^{(f)} = I_0\, \frac{\pi^2 D^4}{16\lambda^2 f^2}\, \mathrm{jinc}^2\!\left(\frac{\pi D \rho}{\lambda f}\right) , \tag{5.18}$$
where jinc(α) = J1(α)/α and J1 is the first-order Bessel function of the first kind.
This intensity distribution is known as an Airy pattern and is shown in Fig. 5.12.
The first zero in the Airy pattern is given by the first zero of the Bessel function, and
occurs at a radius of ρ = 1.22fλ/D. The finite size of the intensity distribution at the
focus sets a limit to the smallest detail that can be resolved by a lens, see Chapter 9.
To resolve finer detail, the ratio f/D (called the f-number in photography) needs
to be small.
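The factor 1.22 quoted above can be checked numerically. The following is a minimal sketch, assuming only the Python standard library: J1 is evaluated from Bessel's integral representation, and the first positive root is located by bisection. The function names and the bracketing interval [3, 4.5] are illustrative choices, not taken from the text.

```python
import math

def bessel_j1(x, n=400):
    """J1(x) from Bessel's integral (1/pi) * int_0^pi cos(theta - x sin(theta)) dtheta."""
    h = math.pi / n
    total = 0.0
    for i in range(n + 1):
        theta = i * h
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid-rule end-point weights
        total += weight * math.cos(theta - x * math.sin(theta))
    return total * h / math.pi

def first_zero_of_j1(lo=3.0, hi=4.5, tol=1e-10):
    """Bisection for the first positive root of J1 (assumed bracket: J1(3) > 0 > J1(4.5))."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bessel_j1(lo) * bessel_j1(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

alpha_1 = first_zero_of_j1()     # first zero of J1, about 3.83
airy_factor = alpha_1 / math.pi  # rho = airy_factor * f * lambda / D
```

Setting πDρ/(λf) equal to `alpha_1` gives ρ = (`alpha_1`/π) fλ/D, so `airy_factor` is the dimensionless number that multiplies fλ/D.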
Fig. 5.12 (i) The intensity distribution along the x axis in the focal plane
of a lens with diameter D illuminated normally by uniform monochromatic
light with wavelength λ. (ii) The intensity pattern in the xy plane. This
is known as the Airy pattern. The first dark ring is located at a radius of
$\rho = \sqrt{x^2 + y^2} = 1.22 f\lambda/D$.
5.7.2 Case II: Far field
The second case where we can observe Fraunhofer diffraction is when the
propagation distance, z, is sufficiently large (which we call the far-field
regime) that we can neglect the x′² and y′² terms in the Fresnel diffraction
integral. Starting from eqn (5.4), we can write the intensity as
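$$I^{(z)} = \frac{I_0}{\lambda^2 z^2} \left| \iint_{-\infty}^{\infty} f(x', y')\, e^{-i2\pi(xx'+yy')/(\lambda z)}\, e^{ik\rho'^2/(2z)}\, dx'\, dy' \right|^2 ,$$
where $I^{(0)} = I_0 |f(x', y')|^2$ is the intensity in the input plane. The
Fraunhofer approximation says that for z ≫ ρ′ (for all input
coordinates that contribute) we can set $e^{ik\rho'^2/2z} \simeq 1$ and therefore
$$I^{(z)} = \frac{I_0}{\lambda^2 z^2} \left| \iint_{-\infty}^{\infty} f(x', y')\, e^{-i2\pi(xx'+yy')/(\lambda z)}\, dx'\, dy' \right|^2 . \tag{5.19}$$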
This is the same as eqn (5.16) if we put z = f; however,
whereas eqn (5.16) is exact, this far-field case is only approximate.
For the Fraunhofer approximation to be accurate we require that the
phase $k\rho_{\max}'^2/2z \ll \pi/4$, where $\rho_{\max}'$ is the maximum value of ρ′ that
contributes. For a rectangular aperture with width, a, we can write
$\rho_{\max}' = a/2$, and therefore
$$z \gg d_R = \frac{a^2}{\lambda} , \tag{5.20}$$
where $d_R$ is the Rayleigh distance or Rayleigh length, see also
Section 5.8. The condition $z \gg d_R$ defines the far field. Even in
this far-field region, the Fraunhofer diffraction formula is still only
approximate.10
10 At $z = 10 d_R$ or $100 d_R$ the phase error is π/40 ∼ 0.08 or π/400 ∼ 0.008,
therefore to achieve a 1% accuracy, $z \gg d_R$ means two orders of magnitude
larger.
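The far-field condition is easy to evaluate for a given aperture. A minimal sketch, assuming a slit width a = 0.10 mm and wavelength λ = 0.63 μm (the values quoted later for Fig. 5.23); the function names are illustrative:

```python
import math

def rayleigh_distance(a, wavelength):
    """Rayleigh distance d_R = a^2 / lambda, eqn (5.20)."""
    return a**2 / wavelength

def max_phase_error(a, wavelength, z):
    """Largest neglected phase k * rho_max^2 / (2 z), with rho_max = a / 2."""
    k = 2 * math.pi / wavelength
    rho_max = a / 2
    return k * rho_max**2 / (2 * z)

a = 0.10e-3           # slit width in metres (assumed example value)
wavelength = 0.63e-6  # wavelength in metres (assumed example value)

d_R = rayleigh_distance(a, wavelength)
# Phase error in radians at z = d_R, 10 d_R and 100 d_R
errors = {m: max_phase_error(a, wavelength, m * d_R) for m in (1, 10, 100)}
```

For these values $d_R$ is about 16 mm, and the entries of `errors` reproduce the π/4, π/40, and π/400 sequence discussed in the footnote.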
In the case of cartesian separability, Section 5.6, f(x′, y′) = g(x′)h(y′)
and we can separate the x′ and y′ integrals in eqn (5.19). In this case
the far-field Fraunhofer diffraction formula is written as
$$I^{(z)} = \frac{I_0}{\lambda^2 z^2} \left| \int_{-\infty}^{\infty} g(x')\, e^{-ikxx'/z}\, dx' \int_{-\infty}^{\infty} h(y')\, e^{-ikyy'/z}\, dy' \right|^2 . \tag{5.21}$$
A similar result holds in the focal plane of a lens with z = f. If the field
is uniform in the y direction, starting from eqn (5.13) and making the
Fraunhofer approximation, we obtain
$$I^{(z)} = \frac{I_0}{\lambda z} \left| \int_{-\infty}^{\infty} g(x')\, e^{-i2\pi xx'/(\lambda z)}\, dx' \right|^2 . \tag{5.22}$$
Next, we apply these formulae to a range of diffraction problems.
5.8 One, two, many slits

Fig. 5.13 The aperture function for (i) a single slit of width a, (ii) a displaced
slit, and (iii) a double slit with slit width a and separation d.
In this section, we consider Fraunhofer diffraction by one, two, and
many slits. The slits are located in the z = 0 plane and illuminated
at normal incidence with uniform monochromatic light with amplitude
E0 and wavelength λ. First, we shall consider the case of one slit, as
illustrated schematically in Fig. 5.14(i).
Example 5.2
Single-slit Fraunhofer diffraction: The slit is assumed to have a width a in the
x direction and infinite spatial extent in the y direction, such that we can write the
field in the z = 0 plane as $E^{(0)} = E_0\, g(x')$, where11
$$g(x') = \mathrm{rect}\!\left(\frac{x'}{a}\right) = \begin{cases} 0 & |x'| > a/2 \\ 1 & |x'| \le a/2 \end{cases} , \tag{5.23}$$
where we have introduced the label rect to denote the rectangular function shown in
Fig. 5.13(i).
Substituting eqn (5.23) into eqn (5.22) we obtain
$$I^{(z)} = \frac{I_0}{\lambda z} \left| \int_{-a/2}^{a/2} e^{-i2\pi xx'/(\lambda z)}\, dx' \right|^2 = \frac{I_0}{\lambda z} \left| \frac{e^{-i\pi ax/(\lambda z)} - e^{i\pi ax/(\lambda z)}}{-i2\pi x/(\lambda z)} \right|^2 , \tag{5.24}$$
$$\phantom{I^{(z)}} = \frac{I_0}{\lambda z} \left| \frac{\sin[\pi ax/(\lambda z)]}{\pi x/(\lambda z)} \right|^2 = I_0\, \frac{a^2}{\lambda z}\, \mathrm{sinc}^2\!\left(\frac{\pi a x}{\lambda z}\right) . \tag{5.25}$$
11 We shall discuss the limitations of this idealized aperture function in
Chapter 11. In practice, the screen will have finite thickness, and the edges of
the slit are unlikely to be smooth on the scale of a wavelength; however, if
a ≫ λ and z ≫ a we can neglect these imperfections.
Fig. 5.14 The two cases of Fraunhofer diffraction: The background greyscale
shows the intensity in the xz plane for single-slit diffraction: (i) without, and
(ii) with a lens (not on the same scale). The functional forms of the intensity
distribution (i) in the far field, $z > d_R$, and (ii) in the focal plane of a lens,
z = f, are the same (dark grey curves). However, the width of the distribution,
(λ/a)z versus (λ/a)f, and the way the field evolves as it propagates, are very
different.
The intensity pattern corresponds to a sinc-squared distribution with a dominant
central fringe and less intense fringes on either side, with periodically spaced zeros,
as shown on the far right of Fig. 5.14(i). Note that the peak intensity is proportional
to a². Increasing the width of the slit increases the input flux by a factor a and
reduces the width of the diffraction pattern by another factor of a. The case with
a lens is the same but with z replaced by f and the result only applies in the focal
plane, as shown in Fig. 5.14(ii).
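As a check on eqn (5.25), the integral in eqn (5.22) can be evaluated numerically and compared with the closed form. This is a minimal sketch under assumed parameter values (a = 0.10 mm, λ = 0.63 μm, z = 1.2 m); the midpoint-rule quadrature and the function names are illustrative choices. Here sinc(α) means sin(α)/α, as in the text.

```python
import cmath
import math

def sinc(alpha):
    """Unnormalised sinc, sin(alpha)/alpha, with the limit 1 at alpha = 0."""
    return 1.0 if alpha == 0 else math.sin(alpha) / alpha

def slit_intensity_closed(x, a, wavelength, z, I0=1.0):
    """Closed form of eqn (5.25)."""
    return I0 * a**2 / (wavelength * z) * sinc(math.pi * a * x / (wavelength * z))**2

def slit_intensity_numeric(x, a, wavelength, z, I0=1.0, n=4000):
    """Midpoint-rule evaluation of eqn (5.22) for g(x') = rect(x'/a)."""
    h = a / n
    total = 0j
    for i in range(n):
        xp = -a / 2 + (i + 0.5) * h
        total += cmath.exp(-2j * math.pi * x * xp / (wavelength * z))
    return I0 * abs(total * h)**2 / (wavelength * z)

a, wavelength, z = 0.10e-3, 0.63e-6, 1.2  # assumed example values, in metres
x_first_zero = wavelength * z / a          # first zero at x = (lambda / a) z
```

For these values the first zero lies at about 7.6 mm from the axis, and the two functions agree to within the quadrature error at any x.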
We can use the above result to give a geometric interpretation of the

Rayleigh distance, eqn (5.20), see Fig. 5.15. The width of the central
fringe is set by the position of the first zeros, which occur at transverse
displacements, x = ±(λ/a)z. We define the angular width as either the
angle from the centre to the first zero, or equivalently, half the angle
between both zeros, see Fig. 5.15:
$$\Delta\theta = \frac{\lambda}{a} . \tag{5.26}$$
It follows that the Rayleigh distance corresponds to the distance, z, at
which the contribution to the width due to diffraction, defined as Δθz,
is equal to the initial width, a, as illustrated in Fig. 5.15.
Fig. 5.15 Geometry of the Rayleigh distance, $d_R$, for diffraction by a single
slit with width a. The central fringe of the diffraction pattern has an angular
spread (half angle subtended by first zeros) Δθ = λ/a (solid white lines). At
$z = d_R$, the contribution to the width of the light distribution due to diffraction,
Δθz, is equal to the initial width, a (horizontal dashed white lines).
Example 5.3
Displaced slit: One odd prediction of the Fraunhofer diffraction formula is that the
diffraction pattern does not change when we translate the position of the field within
the input plane. We illustrate this using the example of a slit. If we translate the
slit by a distance d the aperture function becomes
$$f(x') = \begin{cases} 0 & -\infty < x' \le d - a/2 \\ 1 & d - a/2 < x' \le d + a/2 \\ 0 & d + a/2 < x' \le \infty \end{cases} , \tag{5.27}$$
and the Fraunhofer integral is
$$\int_{-\infty}^{\infty} f(x')\, e^{-i2\pi xx'/(\lambda z)}\, dx' = \int_{d-a/2}^{d+a/2} e^{-i2\pi xx'/(\lambda z)}\, dx' ,$$
giving
$$\int_{-\infty}^{\infty} f(x')\, e^{-i2\pi xx'/(\lambda z)}\, dx' = \frac{e^{-i2\pi dx/(\lambda z)}}{-i2\pi x/(\lambda z)} \left[ e^{-i\pi ax/(\lambda z)} - e^{i\pi ax/(\lambda z)} \right] = e^{-i2\pi dx/(\lambda z)}\, a\, \mathrm{sinc}\!\left(\frac{\pi a x}{\lambda z}\right) . \tag{5.28}$$
The effect of translation is only to multiply by a phasor factor, $e^{-i2\pi dx/(\lambda z)}$.
Inserting this result into eqn (5.22) we find that
$$I^{(z)} = I_0\, \frac{a^2}{\lambda z}\, \mathrm{sinc}^2\!\left(\frac{\pi a x}{\lambda z}\right) , \tag{5.29}$$
which is the same as before. This seems odd; how can we translate the slit without
changing the diffraction pattern? The answer is that the Fraunhofer approximation
assumes that x′ ≪ z for all x′ that contribute, therefore we can only move the slit a
small distance, d ≪ z, before the Fraunhofer approximation breaks down.
In contrast, in the focal plane of the lens where the Fraunhofer diffraction formula
is ‘exact’, the insensitivity of the diffraction pattern to translation in the input plane
is illustrated in Fig. 5.16. In summary, a small displacement in the input plane gives
rise to an exponential phase factor in the far-field amplitude, which on its own does
not change the intensity distribution. This topic is explored further in Exercise 5.14.
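The statement that translation only multiplies the amplitude by a phasor can be confirmed directly. A minimal sketch, assuming illustrative values for the slit width, displacement, wavelength, and distance; the function name is hypothetical:

```python
import cmath
import math

def slit_amplitude(x, a, d, wavelength, z, n=4000):
    """Midpoint-rule estimate of the 1D Fraunhofer integral for a slit centred at x' = d."""
    h = a / n
    total = 0j
    for i in range(n):
        xp = d - a / 2 + (i + 0.5) * h
        total += cmath.exp(-2j * math.pi * x * xp / (wavelength * z))
    return total * h

# Assumed example values, in metres
a, d, wavelength, z, x = 0.10e-3, 0.20e-3, 0.63e-6, 1.2, 2.0e-3

centred = slit_amplitude(x, a, 0.0, wavelength, z)
shifted = slit_amplitude(x, a, d, wavelength, z)
phasor = cmath.exp(-2j * math.pi * d * x / (wavelength * z))  # factor in eqn (5.28)
```

Here `shifted` equals `phasor * centred`, so the two amplitudes share the same modulus and the intensity of eqn (5.29) is unchanged.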
Fig. 5.16 Fraunhofer diffraction using a lens in the z = 0 plane. In
this example, a slit is placed in a plane at z = −f and the Fraunhofer
diffraction pattern is observed at z = f. Translating the slit does not change the
position of the diffraction pattern.
Fig. 5.17 The intensity pattern predicted for double-slit interference by
eqn (5.32). In this example, d = 20a.
Example 5.4
Double slit: Now consider two slits with width a and spacing d as shown in
Fig. 5.13(iii). In this case, the input function becomes
$$f(x') = \begin{cases} 0 & -\infty < x' \le -d/2 - a/2 \\ 1 & -d/2 - a/2 < x' \le -d/2 + a/2 \\ 0 & -d/2 + a/2 < x' \le d/2 - a/2 \\ 1 & d/2 - a/2 < x' \le d/2 + a/2 \\ 0 & d/2 + a/2 < x' \le \infty \end{cases} . \tag{5.30}$$
The integral in the Fraunhofer diffraction formula is now a sum of two displaced slits
at x′ = ±d/2. Using our previous result for a single displaced slit, eqn (5.28), the
integral for the two slits is the sum of two terms:12
$$\int_{-\infty}^{\infty} f(x')\, e^{-ikxx'/z}\, dx' = \left[ e^{-ikdx/(2z)} + e^{ikdx/(2z)} \right] \int_{-a/2}^{a/2} e^{-ikxx_d'/z}\, dx_d' = 2a \cos\!\left(\frac{kdx}{2z}\right) \mathrm{sinc}\!\left(\frac{kax}{2z}\right) . \tag{5.31}$$
12 We have chosen to write the integrals in terms of k this time.
Now the exponential phase factors due to the slit displacements give rise to an
interference term which does modify the intensity pattern. Substituting k = 2π/λ,
we find that the intensity distribution in the far field is
$$I^{(z)} = \frac{4 I_0 a^2}{\lambda z} \cos^2\!\left(\frac{\pi d x}{\lambda z}\right) \mathrm{sinc}^2\!\left(\frac{\pi a x}{\lambda z}\right) . \tag{5.32}$$
This function is plotted in Fig. 5.17. The cosine-squared term produces interference
fringes with a spacing (λ/d)z, eqn (3.18), as in Chapter 3. The sinc-squared term
limits the peak intensity of each fringe. As the sinc-squared function goes to zero at
x = ±(λ/a)z, if d/a is an integer the expected fringe at this position is suppressed.
This is referred to as a missing order. In Fig. 5.17, d/a = 20 and the 20th fringe
(counting the central fringe as zero) is suppressed.
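The missing order can be seen by evaluating eqn (5.32) at the fringe positions x = m(λ/d)z. A minimal sketch, assuming d = 20a as in Fig. 5.17; the remaining parameter values and the function names are illustrative:

```python
import math

def sinc(alpha):
    """Unnormalised sinc, sin(alpha)/alpha, with the limit 1 at alpha = 0."""
    return 1.0 if alpha == 0 else math.sin(alpha) / alpha

def double_slit_intensity(x, a, d, wavelength, z, I0=1.0):
    """Far-field double-slit intensity, eqn (5.32)."""
    beta = math.pi * d * x / (wavelength * z)   # interference phase
    alpha = math.pi * a * x / (wavelength * z)  # single-slit envelope argument
    return 4 * I0 * a**2 / (wavelength * z) * math.cos(beta)**2 * sinc(alpha)**2

a = 0.10e-3           # slit width in metres (assumed)
d = 20 * a            # slit separation, d = 20a as in Fig. 5.17
wavelength = 0.63e-6  # wavelength in metres (assumed)
z = 50.0              # observation distance in metres (assumed, illustrative)

fringe_spacing = wavelength * z / d
# Intensity at the centre of each bright fringe, m = 0, 1, ..., 20
fringe_peaks = [double_slit_intensity(m * fringe_spacing, a, d, wavelength, z)
                for m in range(21)]
```

The entries of `fringe_peaks` fall away under the sinc-squared envelope, and the last one (m = 20 = d/a) vanishes to numerical precision.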
Fig. 5.18 Far-field intensity pattern for a diffraction grating consisting of
twelve slits (or lines), eqn (5.34) with N = 12 and d = 8a. The principal
maxima are separated by (λ/d)z. The dashed line is the sinc-squared envelope
arising due to the finite slit width with first zero at (λ/a)z, with the result that
the mth principal maximum, where m = d/a, is suppressed.
Example 5.5
Many slits (the diffraction grating): The above treatment can be extended to
N slits, in which case the prefactor in eqn (5.31) becomes a sum of N terms, similar to
the N-slit interference discussed in Chapter 3. Extending the 2-slit sum in eqn (5.31)
to N slits, and writing u = x/(λz), the Fraunhofer integral becomes
$$\int_{-\infty}^{\infty} f(x')\, e^{-i2\pi ux'}\, dx' = e^{-i(N-1)\pi ud} \sum_{n=0}^{N-1} e^{in2\pi ud} \int_{-a/2}^{a/2} e^{-i2\pi ux'}\, dx' = a\, \mathrm{sinc}(\pi u a)\, \frac{\sin N\pi ud}{\sin \pi ud} , \tag{5.33}$$
and the intensity is
$$I^{(z)} = \frac{I_0 a^2}{\lambda z}\, \frac{\sin^2(N\pi dx/\lambda z)}{\sin^2(\pi dx/\lambda z)}\, \mathrm{sinc}^2\!\left(\frac{\pi a x}{\lambda z}\right) . \tag{5.34}$$
A plot of this function for N = 12 is shown in Fig. 5.18. As for the double slit, the
only difference with respect to the N-phasor sum considered in Section 3.9 is that
now there is a sinc-squared envelope and there is a missing order at x = (λ/a)z.
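Eqn (5.34) is straightforward to code, provided the 0/0 limit at the principal maxima is handled explicitly. A minimal sketch, assuming the parameters quoted in the caption of Fig. 5.23 (λ = 0.63 μm, a = 0.10 mm, d = 0.40 mm, z = 1.2 m, five slits); the function names and the tolerance used to detect a principal maximum are illustrative choices:

```python
import math

def sinc(alpha):
    """Unnormalised sinc, sin(alpha)/alpha, with the limit 1 at alpha = 0."""
    return 1.0 if alpha == 0 else math.sin(alpha) / alpha

def grating_intensity(x, a, d, n_slits, wavelength, z, I0=1.0):
    """Far-field N-slit intensity, eqn (5.34)."""
    beta = math.pi * d * x / (wavelength * z)
    alpha = math.pi * a * x / (wavelength * z)
    s = math.sin(beta)
    if abs(s) < 1e-12:
        # Principal maximum: sin^2(N beta) / sin^2(beta) tends to N^2
        array_factor = float(n_slits**2)
    else:
        array_factor = (math.sin(n_slits * beta) / s)**2
    return I0 * a**2 / (wavelength * z) * array_factor * sinc(alpha)**2

# Parameters quoted in the caption of Fig. 5.23, in metres
wavelength, a, d, z = 0.63e-6, 0.10e-3, 0.40e-3, 1.2

spacing = wavelength * z / d  # separation of principal maxima, (lambda / d) z
central_peak = grating_intensity(0.0, a, d, 5, wavelength, z)
```

For these values `spacing` is about 1.9 mm, and setting `n_slits = 2` recovers the double-slit result of eqn (5.32), since sin(2β)/sin(β) = 2 cos(β).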
5.9 2D Fraunhofer
The previous examples focused on diffraction in only one transverse
direction, x. We now consider some examples where there is diffraction
in both x and y. A schematic of the Fraunhofer diffraction is shown
in Fig. 5.19. The input, which can be approximated by a plane wave,
interacts with an object in the z = 0 plane. The intensity
distribution is observed at a distance $z \gg d_R$. This scenario arises,
for example, in X-ray crystallography: the image shown in the figure
is the X-ray diffraction image recorded for a thin wafer of cuprous oxide,
see also Section 6.8. We shall focus on examples that are cartesian
separable such that we can use eqn (5.21).
Fig. 5.19 A Fraunhofer diffraction experiment where an object (shown
here greatly magnified) produces an intensity pattern in the far field,
$z \gg d_R$. The image shows the X-ray diffraction pattern for a thin wafer of
cuprous oxide (the central maximum is blocked). Courtesy of Liam Gallagher
and Josh Rogers, Durham University, 2018.
Fig. 5.20 In the far field, where the laser beam radius, w, is much larger
than the initial size (or waist), w0, we can write w = Δθz, where Δθ = λ/(πw0).
Example 5.6
Laser beam: A useful example of cartesian separability is the case of a gaussian
laser beam,13 see Chapter 11 for more detail. For a cylindrically symmetrical laser
beam, the field in the z = 0 plane may be written as $E^{(0)} = E_0\, g(x')h(y')$, where
$g(x') = e^{-x'^2/w_0^2}$, $h(y') = e^{-y'^2/w_0^2}$, and w0 is the beam radius. Both the
x′-integral and y′-integral in eqn (5.21) are performed by completing the square,
see Appendix B.
13 Named after the function, see Sec. B.6, associated with Carl Friedrich
Gauss (Brunswick 1777–Göttingen 1855).
The x′-integral gives
$$\int_{-\infty}^{\infty} e^{-x'^2/w_0^2}\, e^{-i2\pi xx'/(\lambda z)}\, dx' = \sqrt{\pi}\, w_0\, e^{-\pi^2 w_0^2 x^2/\lambda^2 z^2} .$$
Similarly for y′, and we obtain the far-field intensity distribution,
$$I^{(z)} = I_0\, \frac{\pi^2 w_0^4}{\lambda^2 z^2}\, e^{-2\pi^2 w_0^2 \rho^2/\lambda^2 z^2} = I_0\, \frac{\pi^2 w_0^4}{\lambda^2 z^2}\, e^{-2\rho^2/w^2} , \tag{5.35}$$
where ρ² = x² + y² and w = [λ/(πw0)]z is the far-field beam radius. Hence
in the far field, the Fraunhofer intensity distribution is also a gaussian, but with a
significantly larger beam radius, that is inversely proportional to the initial width,
w0. The angular spread of the laser beam, see Fig. 5.20, is defined as Δθ = w/z,
thus
$$\Delta\theta = \frac{\lambda}{\pi w_0} . \tag{5.36}$$
As for the case of a single slit, the Fraunhofer approximation is only valid when the
width of the light distribution is much larger than the initial size, i.e., when spreading
due to diffraction Δθz ≫ w0. The cross-over between initial size dominating
and diffraction dominating occurs at the Rayleigh distance (more often called the
Rayleigh range, $z_R$, for laser beams), which is defined by $\Delta\theta\, z_R = w_0$, which gives
$$z_R = \frac{\pi w_0^2}{\lambda} . \tag{5.37}$$
The gaussian has the property that the product of the size and spread (momentum
distribution) is a minimum, consequently the Rayleigh range is larger than the
Rayleigh distance for other light distributions such as the rectangular aperture.
Although Fraunhofer diffraction gives the correct result for the far-field intensity
distribution of a laser it ignores wave front curvature, as we shall see in Chapter 11.
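The relations in eqns (5.35) to (5.37) can be collected into a few helper functions. A minimal sketch, assuming a beam waist w0 = 1.0 mm and λ = 0.63 μm purely for illustration; the function names are hypothetical:

```python
import math

def divergence_half_angle(w0, wavelength):
    """Angular spread, eqn (5.36)."""
    return wavelength / (math.pi * w0)

def rayleigh_range(w0, wavelength):
    """Rayleigh range, eqn (5.37)."""
    return math.pi * w0**2 / wavelength

def far_field_intensity(rho, w0, wavelength, z, I0=1.0):
    """Far-field gaussian intensity, eqn (5.35), with w = [lambda / (pi w0)] z."""
    w = divergence_half_angle(w0, wavelength) * z
    return I0 * (math.pi * w0**2 / (wavelength * z))**2 * math.exp(-2 * rho**2 / w**2)

w0 = 1.0e-3           # beam waist in metres (assumed)
wavelength = 0.63e-6  # wavelength in metres (assumed)

delta_theta = divergence_half_angle(w0, wavelength)
z_R = rayleigh_range(w0, wavelength)
```

With these values `z_R` is close to 5 m, and the product `delta_theta * z_R` returns the waist `w0`, which is the cross-over condition used to define the Rayleigh range.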
Fig. 5.21 (i) Uniform illumination of a rectangular aperture with dimensions
a and b in the horizontal and vertical directions, respectively. (ii) The far-
field intensity pattern. The first zeros are at ±λz/a and ±λz/b in
the horizontal and vertical directions, respectively.
Example 5.7
Rectangular aperture: Our second example of Fraunhofer diffraction in two
transverse dimensions is the case of a rectangular aperture with width a and height
b, see Fig. 5.21(i). We assume uniform illumination, for example using a laser
beam with beam radius, w0, much larger than the dimensions of the aperture,
w0 ≫ a > b. The field immediately downstream of the aperture plane can be
written as $E^{(0)} = E_0 f(x', y')$, with
$$f(x', y') = \mathrm{rect}\!\left(\frac{x'}{a}\right) \mathrm{rect}\!\left(\frac{y'}{b}\right) . \tag{5.38}$$
The corresponding intensity distribution is shown in Fig. 5.21(i). To find the intensity
distribution in the far field we use eqn (5.21) and the integral eqn (5.24) to obtain
$$I^{(z)} = I_0\, \frac{a^2 b^2}{\lambda^2 z^2}\, \mathrm{sinc}^2\!\left(\frac{\pi a x}{\lambda z}\right) \mathrm{sinc}^2\!\left(\frac{\pi b y}{\lambda z}\right) . \tag{5.39}$$
This intensity distribution is shown in Fig. 5.21(ii). Along the x and y axes the first
zeros occur at x = ±λz/a and y = ±λz/b, respectively. For a > b, the input field is
wide and short, while the diffraction pattern is tall and thin. In Chapter 6 we shall
see how this inverse scaling arises from the Fourier relationship between position and
momentum: a narrow real space distribution requires a large spread in momentum,
and vice versa. The peak intensity of the diffraction pattern is proportional to (ab)²,
i.e., the square of the area of the aperture. This scaling is explored further in the
end-of-chapter exercises.
Fig. 5.22 (i) Laser illumination of a vertical slit with width a and height b,
where a ≪ b. The laser beam radius is much larger than the width but smaller
than the height, a ≪ w0 < b. (ii) The far-field intensity pattern has a sinc-
squared pattern in the horizontal but is gaussian in the vertical direction. (i)
and (ii) are not to scale.
Example 5.8
Single or multiple slits and a laser: A likely scenario in a single-slit diffraction
experiment is that the slit is tall and thin (b ≫ a), and the laser beam is smaller than
the aperture in the vertical direction, w0 < b, as in Fig. 5.22(i). If the laser beam
size is relatively large (of the order of a millimetre) then we are likely to observe the
diffraction pattern at an intermediate distance z corresponding to the far field in x
but the near field in y. In terms of the Rayleigh length for single slit diffraction and
Rayleigh range of the laser, the observation distance is
$$d_R \ll z \ll z_R .$$
For b > w0 ≫ a, the intensity profile is approximately given by a cartesian separable
function of the form f(x′, y′) = g(x′)h(y′), where
$$g(x') = \mathrm{rect}\!\left(\frac{x'}{a}\right) \quad \text{and} \quad h(y') = \mathrm{gauss}\!\left(\frac{y'}{w_0}\right) , \tag{5.40}$$
with $\mathrm{gauss}(y'/w_0) = e^{-y'^2/w_0^2}$ describing the laser field profile. We can assume that
the laser beam remains unchanged in the vertical y direction, and use the Fraunhofer
diffraction integral for one transverse dimension multiplied by a fixed y-dependence:
$$I^{(z)} = I_0\, \frac{a^2}{\lambda z}\, \mathrm{sinc}^2\!\left(\frac{\pi a x}{\lambda z}\right) \mathrm{gauss}^2\!\left(\frac{y}{w_0}\right) . \tag{5.41}$$
For diffraction in only one direction the prefactor is 1/(λz), see eqn (5.22). The
calculated far-field intensity pattern for this case is shown in Fig. 5.22(ii). The
pattern consists of a sinc-squared pattern along x and a gaussian along y. Figure
5.23 shows an example with a laser beam and five vertical slits, where one sees the
five-slit diffraction pattern along x and the gaussian profile along y. If, in contrast,
we move the observation plane back into the far field of the laser beam, $z \gg z_R$, then
there is diffraction in both transverse directions, and the intensity in the observation
plane is
$$I^{(z)} = I_0\, \frac{\pi a^2 w_0^2}{\lambda^2 z^2}\, \mathrm{sinc}^2\!\left(\frac{\pi a x}{\lambda z}\right) \mathrm{gauss}^2\!\left(\frac{\pi w_0 y}{\lambda z}\right) . \tag{5.42}$$
Fig. 5.23 Top: Light distribution

downstream of an aperture consisting
of five vertical slits illuminated by a
He–Ne laser. The image is recorded
on a CCD sensor without a lens.
Bottom: measured intensity along the
x axis and prediction of eqn (5.21).
The parameters of the experiment are
λ = 0.63 μm, a = 0.10 mm, d =
0.40 mm, giving a spacing between
principal maxima at z = 1.2 m of
(λ/d)z = 1.9 mm. Data courtesy
of Sarah Bunton, Ogden Trust intern,
Durham University, 2016.
The pattern looks similar to Fig. 5.22 but now with a beam radius in the vertical
direction given by, w = Δθz, where Δθ = λ/(πw0 ).
5.10 Fresnel integrals

Fig. 5.24 The geometry of Fresnel diffraction. At any observation point
P with position (x, z) the field is given by the sum of phasors arriving from all
points x′ in the input plane at z = 0. The transverse displacement between
the observation point and the input point is ξ = x′ − x.
An alternative route to solving the Fresnel diffraction integral is to use
the Fresnel integrals, often included as in-built functions in many
software packages. The Fresnel integrals are defined as
$$C(\tilde{\xi}_1) = \int_0^{\tilde{\xi}_1} \cos\!\left(\frac{\pi}{2}\tilde{\xi}^2\right) d\tilde{\xi} ; \qquad S(\tilde{\xi}_1) = \int_0^{\tilde{\xi}_1} \sin\!\left(\frac{\pi}{2}\tilde{\xi}^2\right) d\tilde{\xi} . \tag{5.43}$$
Sometimes the π/2 factor is omitted. We now consider a few examples
where the diffraction integral can be written in terms of these functions.14
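Where a library routine is not to hand, the definitions in eqn (5.43) can be evaluated by direct quadrature. The following is a minimal sketch using a composite Simpson rule from the Python standard library; the function names and the fixed number of panels are illustrative choices, adequate for moderate arguments (roughly |ξ̃| up to 5). SciPy users may prefer `scipy.special.fresnel`, which uses the same π/2 convention and returns the pair (S, C).

```python
import math

def _simpson(func, upper, n=2000):
    """Composite Simpson rule on [0, upper] using an even number n of panels."""
    h = upper / n
    total = func(0.0) + func(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(i * h)
    return total * h / 3

def fresnel_c(xi):
    """C(xi) = integral from 0 to xi of cos(pi t^2 / 2) dt, eqn (5.43)."""
    return _simpson(lambda t: math.cos(0.5 * math.pi * t * t), xi)

def fresnel_s(xi):
    """S(xi) = integral from 0 to xi of sin(pi t^2 / 2) dt, eqn (5.43)."""
    return _simpson(lambda t: math.sin(0.5 * math.pi * t * t), xi)
```

Both functions are odd, so negative arguments are handled automatically by the signed step length.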
Example 5.9
Single slit: Consider a long narrow slit of width a in the z = 0 plane, orientated
vertically, along the y axis. The slit is illuminated by uniform monochromatic light
propagating along z. As the field is uniform along y, we can use eqn (5.13) with
f(x′) = 1 for |x′| ≤ a/2 and 0 otherwise, see Fig. 5.13(i), and the Fresnel diffraction
integral is
$$E^{(z)} = \frac{E_0\, e^{ikz}}{\sqrt{i\lambda z}} \int_{-a/2}^{a/2} e^{ik(x-x')^2/2z}\, dx' , \tag{5.44}$$
or in terms of ξ = x′ − x, as in Fig. 5.24,
$$E^{(z)} = \frac{E_0\, e^{ikz}}{\sqrt{i\lambda z}} \int_{-a/2-x}^{a/2-x} e^{ik\xi^2/2z}\, d\xi , \tag{5.45}$$
and the intensity is
$$I^{(z)} = \frac{I_0}{\lambda z} \left| \int_{-a/2-x}^{a/2-x} e^{ik\xi^2/2z}\, d\xi \right|^2 , \tag{5.46}$$
where I0 is the intensity in the absence of the slit. The intensity pattern predicted
by eqn (5.46) is shown in Fig. 5.25. In order to compute the integral, it is convenient
to rescale all distances in terms of $\sqrt{\lambda z/2}$, where $\sqrt{\lambda z}$ is known as the Fresnel
length. The rescaled position in the observation plane is $\tilde{x} = x/\sqrt{\lambda z/2}$. The
rescaled transverse displacement between the input point (x′, 0) and the observation
point (x, z) is $\tilde{\xi} = \xi/\sqrt{\lambda z/2}$. Re-writing the integral in terms of these scaled
variables we find that
$$\frac{I^{(\tilde{z})}}{I_0} = \frac{1}{2} \left| \int_{\tilde{\xi}_1}^{\tilde{\xi}_2} e^{i(\pi/2)\tilde{\xi}^2}\, d\tilde{\xi} \right|^2 , \tag{5.47}$$
where $\tilde{\xi}_1 = -\tilde{a}/2 - \tilde{x}$ and $\tilde{\xi}_2 = \tilde{a}/2 - \tilde{x}$, with $\tilde{a} = a/\sqrt{\lambda z/2}$ being the dimensionless
slit width. The reason for this rescaling is that we now can rewrite the integral in
terms of Fresnel integrals, giving
$$\frac{I^{(\tilde{z})}}{I_0} = \frac{1}{2} \left\{ \left[ C(\tilde{\xi}_2) - C(\tilde{\xi}_1) \right]^2 + \left[ S(\tilde{\xi}_2) - S(\tilde{\xi}_1) \right]^2 \right\} . \tag{5.48}$$
14 The Fresnel integrals are similar to the error function (Hughes 2010),
but with a complex argument. They are also used in the design of roads
and velodrome tracks to minimise the forces experienced on entering a
bend. The minimum-force trajectory is known as the transition curve.
Figure 5.25 is generated by evaluating eqn (5.43) on a grid for many values of (x, z).
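A map such as the one described can be built point by point from eqn (5.48). The sketch below is one possible implementation under stated assumptions: the Fresnel integrals are obtained by Simpson quadrature rather than a library call, and the helper names and the sampling density `panels_per_unit` are illustrative.

```python
import math

def fresnel_cs(xi, panels_per_unit=400):
    """Return (C(xi), S(xi)) of eqn (5.43) by composite Simpson quadrature."""
    n = 2 * max(1, math.ceil(abs(xi) * panels_per_unit))  # even number of panels
    h = xi / n
    c = s = 0.0
    for i in range(n + 1):
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        phase = 0.5 * math.pi * (i * h)**2
        c += weight * math.cos(phase)
        s += weight * math.sin(phase)
    return c * h / 3, s * h / 3

def slit_fresnel_intensity(x_tilde, a_tilde):
    """I / I0 downstream of a slit, eqn (5.48), in units of sqrt(lambda z / 2)."""
    c1, s1 = fresnel_cs(-a_tilde / 2 - x_tilde)
    c2, s2 = fresnel_cs(a_tilde / 2 - x_tilde)
    return 0.5 * ((c2 - c1)**2 + (s2 - s1)**2)

# Transverse profile across one observation plane for a scaled slit width of 2
profile = [slit_fresnel_intensity(0.1 * j, 2.0) for j in range(-30, 31)]
on_axis = slit_fresnel_intensity(0.0, 2.0)
```

For a scaled slit width of 2 the on-axis value is about 1.6, i.e., larger than the intensity obtained without the slit; repeating the calculation for a range of z (which changes the scaled variables) fills in the xz plane.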

downstream of a single slit with width In Fig. 5.26 we plot eqn (5.48) in a particular observation plane—fixed
a. Light propagates from left to right. z—as the slit is gradually closed—decreasing value of a. The third
On the right, the intensity on-axis is
higher than the input intensity.

at a fixed distance z downstream√of a
single slit as the slit width, ã = a/ λz,
is varied. The dimensionless parameter √
ã is the Fresnel number. For ã = 2 2,
the intensity on-axis is larger than the
input intensity, similar to the focusing
effect of a circular aperture, Fig. 5.9(ii).
For small Fresnel number, ã 1
(lower right), the intensity distribution
approaches the far-field (Fraunhofer)
limit.
√
frame with a/ λz = 2 λz/2 is equivalent to a cross section through
the intensity pattern on the far right-hand side of Fig. 5.25. For a large
slit width ã = 10 (top left) we observe something close to the geometrical
shadow, with some fringing at the edges. This corresponds to the top
of Newton’s sketch in Fig. 5.1. As the slit narrows the fringes begin to
interfere. Interestingly for ã = 2 (bottom left) we observe an on-axis
intensity that is significantly larger than that obtained without the slit.
We could say that the slit effectively focuses the light. This type of
constructive interference between components led Fresnel to the idea of
Fresnel lenses and zone plates. Finally, for a very narrow slit (lower
right) the light distribution bears little resemblance to the original, and
starts to look more like a sinc-squared distribution—as we expect in a
far-field Fraunhofer regime. This corresponds to the lower portion of
Newton’s sketch, Fig. 5.1.
Fig. 5.27 Intensity in the xz plane downstream of a double slit. In the
far field (far right) the intensity distribution has evolved into fringes which
spread out linearly with propagation distance, similar to the interference
between two cylindrical waves, Fig. 3.6.
Example 5.10
Double slit: We now extend the single-slit theory to more slits. For two slits of
width a separated by a distance d the aperture function looks as in Fig. 5.13(iii),
and the Fresnel diffraction integral is
$$I^{(z)} = \frac{I_0}{\lambda z} \left| \int_{-d/2-a/2-x}^{-d/2+a/2-x} e^{ik\xi^2/2z}\, d\xi + \int_{d/2-a/2-x}^{d/2+a/2-x} e^{ik\xi^2/2z}\, d\xi \right|^2 .$$
As previously, this integral can be rewritten as a sum of Fresnel integrals, now
with four terms rather than two. The four coefficients are $\tilde{\xi}_1 = -\tilde{d}/2 - \tilde{a}/2 - \tilde{x}$ to
$\tilde{\xi}_4 = \tilde{d}/2 + \tilde{a}/2 - \tilde{x}$. The intensity pattern in the xz plane for this case is shown in
Fig. 5.27. We see how the light from each slit first spreads out, and then overlaps
to form the two-slit interference pattern. In the far field, on the right-hand side of
the figure, we see the cosine-squared interference fringes characteristic of Young’s
double-slit experiment, Chapter 3. However, the intensity pattern is very different
to Young’s sketch, Fig. 3.3, as Young was drawing amplitude rather than intensity.
Fig. 5.28 Intensity pattern downstream of an edge. As the field
propagates, the pattern retains its functional form but spreads out with a
scaling that depends on the square root of the propagation distance z.
Example 5.11
Edge: A slit reduces to an edge if we move the other edge to infinity, i.e., set $\tilde{\xi}_1 = -\tilde{x}$
and $\tilde{\xi}_2 = \infty$. In this case, using C(∞) = 1/2 and S(∞) = 1/2 we obtain
$$\frac{I^{(\tilde{z})}}{I_0} = \frac{1}{2} \left\{ \left[ \frac{1}{2} + C(\tilde{x}) \right]^2 + \left[ \frac{1}{2} + S(\tilde{x}) \right]^2 \right\} . \tag{5.49}$$
This intensity pattern downstream of an edge is plotted in Fig. 5.28. Note that
the function in units of the scaled variable $\tilde{x}$ is always the same. If we propagate
in the z direction the pattern spreads out, corresponding to a simple rescaling of
the horizontal axis, but it always has the same functional form. Note also that
constructive interference leads to an intensity just outside the geometric shadow
that is higher than that of the incident wave, and that for large displacements from
the edge the intensity asymptotically becomes equal to the value obtained were the
light to propagate without obstruction. For all points downstream with the same
lateral displacement as the edge (x = 0), the intensity is exactly one quarter the
value of the incident beam. This is discussed further in an end-of-chapter exercise.
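The features of the edge pattern can be read off from eqn (5.49) numerically. A minimal sketch, with the Fresnel integrals again evaluated by Simpson quadrature; the helper names are illustrative, and in the sign convention of eqn (5.49) positive scaled positions lie on the illuminated side.

```python
import math

def fresnel_cs(xi, panels_per_unit=400):
    """Return (C(xi), S(xi)) of eqn (5.43) by composite Simpson quadrature."""
    n = 2 * max(1, math.ceil(abs(xi) * panels_per_unit))  # even number of panels
    h = xi / n
    c = s = 0.0
    for i in range(n + 1):
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        phase = 0.5 * math.pi * (i * h)**2
        c += weight * math.cos(phase)
        s += weight * math.sin(phase)
    return c * h / 3, s * h / 3

def edge_intensity(x_tilde):
    """I / I0 downstream of a straight edge, eqn (5.49)."""
    c, s = fresnel_cs(x_tilde)
    return 0.5 * ((0.5 + c)**2 + (0.5 + s)**2)

at_edge = edge_intensity(0.0)       # directly behind the edge
overshoot = edge_intensity(1.2)     # first bright fringe on the illuminated side
deep_shadow = edge_intensity(-5.0)  # well inside the geometric shadow
```

`at_edge` is exactly one quarter, `overshoot` exceeds unity, and `deep_shadow` is close to zero, matching the behaviour described in the example.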
5.11 Talbot effect

Fig. 5.29 Talbot carpet: Intensity pattern in the xz plane downstream of
a grating with d/a = 20, calculated using the Fresnel diffraction integral.
The dimensions of the image are 5d in the x direction (shown vertical) and
d²/λ (half the Talbot length) in the z direction (shown horizontal).
Finally, we consider the case of N slits. Figure 5.29 shows an example
calculated using the same method as Fig. 5.27 but now for the large
N (grating) limit. The remarkable feature of this pattern is that it
repeats: an image of the grating self-replicates even though there is no
lens. The pattern repeats over a distance, $z_T = 2d^2/\lambda$, known as the
Talbot length after Henry Fox Talbot (Melbury 1800–Lacock 1877). A
half-Talbot period is shown in Fig. 5.29. At fractional multiples of the
half-Talbot length $z_T/(2m)$, where m is an integer, the spatial frequency
of the intensity pattern is m times that of the input. The m = 2, 3, etc.
fractional revivals are 1/2, 1/3, etc. of the way across Fig. 5.29. In
Fig. 5.29 the slits are much narrower than their spacing, a/d = 1/20. In
contrast, if we choose a/d = 1/2 (a Ronchi grating) and only illuminate
the central N = 16 slits the pattern looks very different, see Fig. 5.30.
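To get a feel for the length scales, the Talbot length and the positions of the fractional revivals can be computed for a representative grating. A minimal sketch, assuming a period d = 0.40 mm and λ = 0.63 μm (illustrative values borrowed from the caption of Fig. 5.23); the function names are hypothetical:

```python
def talbot_length(d, wavelength):
    """Talbot length z_T = 2 d^2 / lambda."""
    return 2 * d**2 / wavelength

def fractional_revivals(d, wavelength, m_max):
    """Distances z_T / (2 m), m = 1..m_max, where the fringe frequency is m times the input."""
    z_T = talbot_length(d, wavelength)
    return [z_T / (2 * m) for m in range(1, m_max + 1)]

d = 0.40e-3           # grating period in metres (assumed)
wavelength = 0.63e-6  # wavelength in metres (assumed)

z_T = talbot_length(d, wavelength)
revivals = fractional_revivals(d, wavelength, 3)
```

For this grating the Talbot length is about 0.51 m, so the self-images are readily accessible on an optical bench.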
Fig. 5.30 Talbot carpet for a Ronchi grating (d/a = 2) where only the
central N = 16 slits are illuminated. Note how the pattern decays after
approximately three Talbot lengths.
Chapter summary
• The propagation of a scalar light field in the paraxial regime is
described by the Fresnel diffraction integral.
• The Fresnel diffraction integral can be interpreted as a sum of
spherical waves (or cylindrical waves for diffraction in only one
transverse dimension).
• One of the few analytical solutions to the Fresnel diffraction
integral is the on-axis intensity downstream of a circular
aperture.
• In this case, the input field is divided into theoretical constructs
known as Fresnel zones. Each zone contributes an equal
amplitude but contributions from adjacent zones are alternately
phased. The radius of the mth zone is $\rho_m = \sqrt{m\lambda z}$.
• Either by blocking all odd, or all even, zones we can arrange that
all contributions add constructively a distance z downstream. A
device for realizing this concept is known as a Fresnel zone plate.
• The removal of some Fresnel zones gives rise to the spot of
Arago, a bright region in the centre of the shadow of an opaque
disk.
• In two situations (the focal plane of a lens and the far field)
the Fresnel diffraction integral simplifies to the case of Fraunhofer
diffraction.
• The Fresnel integrals (included as in-built functions in many
software packages) can be used to calculate the Fresnel diffraction
patterns of slits and edges.
Exercises
(5.1) Fresnel diffraction integral k = 2π/λ. For a field that is uniform in the y
Write an expression for the Fresnel diffraction direction, we can write f(x , y ) = f(x ). Show that
integral in terms of a sum of phasors for (i) the the field at y = 0 is given by
most general case, and (ii) when we can neglect ˆ
E0 ikz ∞ 2
diffraction in the y direction. Explain the two E (z) = √ e f(x )eik(x−x ) /2z dx .
main differences. iλz −∞
´ ∞ −πy2 /(iλz) √
(5.2) Fresnel diffraction integral: from two to one Hint: −∞
e dy = iλz.
transverse dimensions
What is the field along the z axis if the field is also
The Fresnel diffraction integral is
ˆ ∞ ˆ ∞ uniform in the x direction? How does your answer
E0
E (z) = f(x , y )eikrp dx dy , compare to the incident field?
iλz −∞ −∞
(5.3) Fresnel diffraction integral—cylindrical symmetry
where rp = z + [(x − x )2 + (y − y )2 ]/(2z) and Write an expression for the Fresnel diffraction
Exercises 89
integral in terms of a sum of phasors for (i) the most general case, and (ii) when we can neglect diffraction in the y direction. Explain the two main differences.
(5.4) Fresnel diffraction from an edge
Figure 5.28 shows that the intensity at x = 0, the location of the edge, is one quarter of the value of the incident light. Why is this? [Hint: Consider what the intensity would be at x = 0 from the mirror-image edge, and then consider adding the fields from these two configurations.]
(5.5) Fresnel zones (1)
In Fig. 5.10 top row which images are closest to the case of 1, 2, and 4 Fresnel zones? Explain your reasoning.
(5.6) Fresnel zones (2)
The field on-axis at a distance z downstream of a cylindrically symmetrical aperture is given by
\[ E^{(z)} = \frac{E_0 e^{ikz}}{i\lambda z} \int_0^{\infty} f(\rho')\, e^{ik\rho'^2/2z}\, 2\pi\rho'\, d\rho' , \]
where f(ρ′) is the aperture function. Write an expression for the field on-axis at a distance z downstream for the case of a circular annulus with inner and outer radii ρ1 and ρ2, respectively.
[Hint: \( \int_{\xi_1}^{\xi_2} e^{i\xi^2}\, 2\xi\, d\xi = -i\left( e^{i\xi_2^2} - e^{i\xi_1^2} \right) \).]
Write expressions for the field from the second Fresnel zone, E2. Rewrite E2 in terms of the field from the first zone E1.
(5.7) Fresnel zones (3)
A plane wave with λ = 514 nm impinges normally on an opaque screen containing a circular hole. When viewed axially from a distance of 250 mm the hole uncovers the first Fresnel zone. What is the diameter of the hole?
(5.8) Fresnel zones (4)
A small probe for measuring intensity sits on the central axis 2.50 m behind an opaque screen containing a circular hole. With normally incident plane wave illumination at λ = 488 nm, show that a hole of radius 1.02 mm will generate an intensity maximum. The probe is then moved along the axis towards the screen. At which separation from the screen does the next intensity minimum occur, and at which separation the next maximum?
(5.9) Fresnel zone plate
Consider a Fresnel zone plate with m zones, where m ≫ 1. Write an expression for the width of the mth zone, δRm, in terms of m, the focal length f, and the wavelength, λ. Write an expression for the focal spot size, xf = fλ/D, in terms of focal length f, the wavelength λ, and the number of zones m. Hence show that the width of the outermost (or mth) zone is approximately equal to the spot size.
(5.10) Other forms of the Fresnel diffraction integral
Write the Fresnel diffraction integral for one transverse dimension x in the form of (i) a convolution integral, and (ii) a Fourier transform. Write the Fourier variable kx in terms of k, x and z, or u in terms of λ, x and z.
(5.11) An improved Fresnel zone plate?
A conventional Fresnel zone plate achieves a high intensity on-axis by blocking all of either the odd or the even Fresnel zones, thus eliminating the destructive cancellation of the fields from neighbouring zones. What would happen if it were possible to manufacture a mask that, rather than blocking the even zones, allowed the light to pass but retarded the phase by π?
(5.12) X-ray crystallography and Fraunhofer diffraction
A typical wavelength for X-ray crystallography is of the order of 1 × 10⁻¹⁰ m, and a typical separation of planes in a crystal is of the order of a few × 10⁻¹⁰ m. Show therefore that the relevant diffraction regime for X-ray crystallography is Fraunhofer.
(5.13) Single slit
A slit with width a and height b is illuminated normally by a laser beam with radius a ≪ w0 < b. Write down an expression for the Fraunhofer intensity pattern downstream; comment on any assumption you make. At what distance does the vertical size of the beam become equal to the horizontal width? Comment on whether this is smaller or larger than either the Rayleigh length associated with a slit of width b, or the Rayleigh range associated with the laser beam.
(5.14) Translation of the aperture
An opaque screen containing a rectangular aperture of width a and height b is illuminated normally by uniform monochromatic light with intensity I0 and wavelength λ.
(i) Using g(x′) = rect(x′/a), h(y′) = rect(y′/b) in eqn (5.21), write an expression for (i) the far-field intensity distribution and (ii) the intensity distribution in the focal plane of a lens, in terms of a, b, x, y, z, I0, and λ.
(ii) For the far-field case (no lens), if λ = 0.5 μm,
z = 5 m, and the first zero in the diffraction pattern is observed at x = 5 mm, what is the slit width, a?
(iii) What is the Rayleigh distance for diffraction in the x direction, dR, for these parameters?
(iv) What is the value of the ratio, z/dR? Is this consistent with the far-field condition, z ≫ dR?
(v) Show that according to the Fraunhofer approximation, displacing the slit by a distance, d, along the x axis does not change the intensity distribution.
[Hint: \( \int_{d-a/2}^{d+a/2} e^{-i2\pi x x'/(\lambda z)}\, dx' = e^{-i2\pi x d/(\lambda z)}\, a\, \mathrm{sinc}\!\left(\frac{\pi a x}{\lambda z}\right) \).]
(vi) How far, in practice, does the diffraction pattern move for d = 1 mm? Express your answer as a fraction of the distance to the first zero, and comment on the accuracy of the Fraunhofer approximation.
(vii) In contrast to the far-field case, the Fraunhofer diffraction formula is exact in the focal plane of a lens. If a lens with focal length f = 10 cm is placed in the aperture plane, what is the distance to the first zero along x in this case?
(viii) How far does the diffraction pattern move if the slit is displaced by a distance, d = 1 mm, along x in the lens plane?
(5.15) Young's double slit
An aperture is placed in the z = 0 plane and illuminated uniformly with monochromatic light with wavelength λ and intensity I0 at normal incidence. Write an equation for the intensity distribution in the far field a distance z downstream. If the aperture is a double slit with slit width a, height b, and slit separation 4a, write an expression for the aperture function and the intensity distribution (in terms of λ). Which intensity maxima are suppressed? List them all. Count the central maximum as zero.
(5.16) Three slits
Write an expression for the Fraunhofer intensity pattern for three slits. Plot the intensity distribution along the x axis for the case of d = 2a.
(5.17) Diffraction of a laser beam
What is the Rayleigh range of a laser of wavelength λ = 633 nm of waist 0.250 mm? What is the size of the beam after it has propagated 500 m?
(5.18) Estimate of laser spot size
Write an equation for the angular divergence Δθ of a laser beam with wavelength, λ, and beam waist, w0. Use this expression to estimate the beam radius of a red laser pointer with λ = 0.63 μm and w0 = 1.0 mm at a distance z = 10 m downstream of the waist. Comment on what is assumed in order to make this estimate. Beyond what distance does this assumption begin to become reasonable? Would you expect the actual beam radius to be larger or smaller than your estimate?
(5.19) Laser beam size at a satellite
A laser with wavelength 1.0 μm is used to send signals to a satellite in a geostationary orbit, 36 × 10⁶ m above the Earth's surface. Estimate the laser beam radius at the satellite if the initial beam radius w0 = 1.0 mm. What value of w0 should be chosen to optimize the power density at the satellite?
(5.20) Area scaling of far-field diffraction peak intensity
Consider the case of Fraunhofer diffraction with a cartesian-separable aperture function f(x′, y′) = g(x′)h(y′). Write down an expression for the far-field diffracted intensity in terms of Fourier transforms. The scaling x′ → x′/α and y′ → y′/β maintains the shape of the aperture, but scales the area by a factor of αβ. Show that the peak intensity increases as the area-squared. Explain this result. [Hint: Think in terms of how much more light is transmitted by a wider aperture, and what happens to the size of the diffraction pattern as the area of the aperture increases.]
6 Many waves II: Fourier
After this I looked, and there before me was a great multitude that no one could count [. . . ]
The Book of Revelation, The Bible, New International Version, Chapter 7 Verse 9.
6.1 Introduction
6.2 Fourier
6.4 Propagation
6.7 Regular arrays
Chapter summary
Exercises
6.1 Introduction
In Chapter 5, we analysed the propagation of light in terms of a sum of many waves using a curved-wave basis. In this chapter we use the complementary plane-wave basis. In Section 6.5 we shall show that these two descriptions are mathematically equivalent, and that it is possible to use whichever is more convenient. As a superposition of plane waves propagating at different angles has the form of a Fourier transform, the plane-wave basis is known as Fourier optics. We begin by reviewing the mathematics of Fourier transforms, then consider a superposition of plane waves propagating at different angles (the angular spectrum), and then use the angular-spectral method to describe light propagation.
6.2 Fourier
The complementary treatment of light propagation based on plane waves, Fourier optics, is named after Joseph Fourier (Auxerre 1768–Paris 1830),1 Fig. 6.1, who introduced the idea of summing waves with different spatial frequencies to model heat conduction, in 1822. The ideas of Fourier were so powerful that they spread to other fields,2 including quantum physics and electrical engineering. In optics, the Fourier description is based on the idea that it is possible to construct spatially varying light patterns by summing two plane waves that propagate at different angles, as we saw in Section 3.3. By adding infinitely many plane waves, with a particular distribution of propagation angles, it is possible to construct light fields with any spatial distribution. We shall show that the distribution of propagation angles, known as the angular spectrum, and the spatial distribution are related via a Fourier transform.
Fig. 6.1 Image of Joseph Fourier after high-pass spatial filtering, see Chapter 10. Image courtesy of Darcy van Eerten and Jack Stevens, Durham University, 2017.
1 Fourier's life coincided with turbulent times in France: after the revolution, he ended up in prison. After his release he accompanied Napoleon on his Egyptian campaign in 1800, and is reported to have made a rubbing of the Rosetta stone which he gave to his nephew Jean-Francois Champollion (Figeac 1790–Paris 1832), who completed the first decoding of the hieroglyphic script. One of the last contributions of Fourier was to highlight the importance of certain gases on the temperature of the Earth.
2 As discussed in Chapter 3, the first computers were built to predict the tides by summing a Fourier series.
Before discussing the angular spectrum concept, we shall briefly review the mathematics of Fourier: first, in the form of a discrete sum of waves (a Fourier series), and then a continuous sum of infinitely many waves (a Fourier transform). Whereas a discrete sum of waves produces a wave form that is periodic, extending the sum to infinitely many waves can produce a localized wave form that does not repeat.3
3 The reader familiar with these topics can jump to Section 6.3.
Fourier series: Our first task is to understand how a sum of many harmonic waves with different frequencies or spatial frequencies (Fourier synthesis) can produce a particular wave form. A schematic of Fourier synthesis is shown in Fig. 6.2, which depicts a discrete sum of many waves with different spatial frequencies. By summing a finite number of waves with different spatial frequencies we are able to build a particular dimensionless function, f(x), where x is a spatial coordinate. In optics, this dimensionless function may be converted to the electric field of light using E = E0 f(x), where E0 is a constant with units of electric field.
Fig. 6.2 Building a periodic wave form, f(x), from a discrete sum of harmonic waves. The amplitudes, aj, and frequencies, uj, of each component are given by the spectrum on the left. For a discrete sum, or Fourier series, the superposition produces a periodic train of wave packets.
The mathematical statement4 of Fig. 6.2 is that we can write the function, f(x), as a sum of harmonic waves of the form,
\[ f(x) = a_0 + \sum_{j=1}^{\infty} a_j \cos(2\pi u_j x) + \sum_{j=1}^{\infty} b_j \sin(2\pi u_j x) , \tag{6.1} \]
where a0 is a constant and aj and bj are the amplitudes of the cosine and sine waves with spatial frequencies uj, respectively. For a finite number of waves, as in Fig. 6.2, the function constructed, f(x), is periodic. If the period is d, i.e., f(x) = f(x + d) for all x, then the fundamental spatial frequency used to construct the wave form, u1, is simply the inverse of the spatial period: u1 = 1/d. In addition to u1, we only require spatial frequencies, uj = j u1 = j/d, that are integer multiples of the fundamental frequency.
4 Mathematicians are quite precise about which functions can be expanded as a Fourier series, and in which sense the right-hand side of eqn (6.1) converges to f(x). Here we restrict ourselves to physically relevant solutions that are encountered in optics which are continuous, bounded, and don't have an infinite number of maxima or minima. In this case, we can safely use eqn (6.1).
The process of finding the appropriate amplitudes, aj and bj, is known as Fourier analysis. The amplitude a0 is simply equal to the average value of f(x) over one spatial period; the amplitudes aj and bj are obtained by using the orthogonality of sine and cosine. Multiplying eqn (6.1) by cos(2πuj x) and sin(2πuj x), respectively, and integrating over one spatial period, yields5
\[ a_j = \frac{2}{d}\int_0^d f(x) \cos(2\pi u_j x)\, dx \equiv \frac{2}{d}\int_0^d f(x) \cos\!\left(\frac{j2\pi x}{d}\right) dx , \tag{6.2} \]
\[ b_j = \frac{2}{d}\int_0^d f(x) \sin(2\pi u_j x)\, dx \equiv \frac{2}{d}\int_0^d f(x) \sin\!\left(\frac{j2\pi x}{d}\right) dx . \tag{6.3} \]
5 Note that it is conventional to write the limits of the integrals as 0 and d, but they can be evaluated between any x and x + d; a judicious choice of x can make the algebra easier.
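As a numerical illustration of eqns (6.2) and (6.3), the following minimal sketch estimates a0, aj, and bj with a midpoint-rule integral over one period. The helper names (`fourier_coefficients`, `square_wave`), the period d = 1, and the choice of a square wave of width d/2 are assumptions made for this example only, not part of the text.

```python
import math

def fourier_coefficients(f, d, j_max, n_samples=2000):
    """Estimate a0, a_j, b_j of eqns (6.2)-(6.3) with a midpoint-rule sum."""
    dx = d / n_samples
    xs = [(n + 0.5) * dx for n in range(n_samples)]
    a0 = sum(f(x) for x in xs) * dx / d  # average of f over one period
    a, b = [], []
    for j in range(1, j_max + 1):
        u_j = j / d  # spatial frequency u_j = j/d
        a.append(2 / d * dx * sum(f(x) * math.cos(2 * math.pi * u_j * x) for x in xs))
        b.append(2 / d * dx * sum(f(x) * math.sin(2 * math.pi * u_j * x) for x in xs))
    return a0, a, b

def square_wave(x, d=1.0):
    """Hypothetical test function: 1 for |x| < d/4 (repeated with period d), else 0."""
    x = (x + d / 2) % d - d / 2  # fold x into the interval [-d/2, d/2)
    return 1.0 if abs(x) < d / 4 else 0.0

a0, a, b = fourier_coefficients(square_wave, d=1.0, j_max=5)
for j, (aj, bj) in enumerate(zip(a, b), start=1):
    print(f"j={j}: a_j={aj:+.4f}, b_j={bj:+.4f}")
```

For this even function the sine amplitudes vanish and the cosine amplitudes follow aj = (2/πj) sin(πj/2), so only odd j contribute, with alternating sign.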
It is slightly cumbersome to have cosine and sine waves of the same spatial frequency, and they can be combined into a cosine wave with a phase: aj cos(2πuj x) + bj sin(2πuj x) ≡ ãj cos(2πuj x + φj). More elegantly, as we saw in Section 1.11, we can use complex notation, and eqn (6.1) can be written as
\[ f(x) = \sum_{j=-\infty}^{\infty} c_j\, e^{i2\pi u_j x} \equiv \sum_{j=-\infty}^{\infty} c_j\, e^{ij2\pi x/d} , \tag{6.4} \]
where the amplitudes,
\[ c_j = \frac{1}{d}\int_0^d f(x)\, e^{-i2\pi u_j x}\, dx , \tag{6.5} \]
are, in general, also complex.6 In Fig. 6.3 we show an example where the amplitudes, cj, of each complex exponential are chosen in order to construct the desired wave form, in this case a square wave.
6 In optics, positive and negative spatial frequencies emerge naturally in complex notation, corresponding to plane waves inclined at positive and negative angles relative to the z-axis, i.e., positive or negative components of the wave vector, respectively. The origin of negative frequencies in the time domain is less obvious, but discussed in detail in Chapter 7. A complex coefficient has the interpretation that the real part of cj tells us how much cosine, and the imaginary part how much sine are needed to make the desired function.
Fig. 6.3 Frequency spectrum, F(u), (left) and wave form, f(x), (right). Discrete frequencies (vertical bars) are separated by u1 = 1/d creating a wave form with period, d. Top row: two waves propagating left and right with spatial frequency u = ±1/d form a standing wave with wavelength, d. The amplitude is offset by adding a zero-frequency component (a constant). Adding higher spatial frequencies (rows 2 to 5) with amplitudes and frequencies given in the left-hand column produces a square wave. Bottom row: Filling in the gaps within the discrete frequency spectrum cancels all maxima except a single central rectangular function of width a = d/2, with a corresponding frequency spectrum with first zero at u = 1/a = 2/d.
Discrete to continuous: Now we want to extend the discrete sum of many waves, eqn (6.4), to a continuous sum of infinitely many. First we rewrite the discrete sum, eqn (6.4), as an integral,
\[ f(x) = \int_{-\infty}^{\infty} F(u)\, e^{i2\pi ux}\, du . \tag{6.6} \]
Here F(u) is a continuous function of spatial frequency, u, with dimensions of inverse spatial frequency, i.e. length.7 This equation can be applied to any function, f(x), and is known as an inverse Fourier transform. It says that we can build a function, f(x), as a sum of infinitely many waves with spatial frequencies, u, and amplitudes, F(u).
7 For the case of the discrete sum, F(u) can be written as a sum of Dirac δ-functions:
\[ F(u) = \sum_{j=-\infty}^{\infty} c_j\, \delta(u - u_j) . \]
See Section B.2 for more details.
Fourier transform: As for a Fourier series, eqn (6.3), we can find the amplitudes, F(u), via Fourier analysis of the function, f(x), the continuous equivalent of eqn (6.5):
\[ F(u) = \int_{-\infty}^{\infty} f(x)\, e^{-i2\pi ux}\, dx . \tag{6.7} \]
This equation says that the spectrum of wave amplitudes, F(u), is given by the Fourier transform of the spatial function, f(x). An example of a Fourier transform is given in the bottom row of Fig. 6.3, with F(u) and f(x) in the left and right columns, respectively. Filling in the gaps in the discrete spectrum has the effect of suppressing all the repetitions of the periodic wave form, leaving just one. In summary, a Fourier series, or discrete sum of waves, produces a periodic wave form; whereas a continuous integral, or Fourier transform, produces any wave form.
Often, we shall use the following shorthand notation:
\[ F(u) = \mathcal{F}[f(x)](u) = \int_{-\infty}^{\infty} f(x)\, e^{-i2\pi ux}\, dx , \tag{6.8} \]
and
\[ f(x) = \mathcal{F}^{-1}[F(u)](x) = \int_{-\infty}^{\infty} F(u)\, e^{i2\pi ux}\, du , \tag{6.9} \]
for the forward and inverse transforms, respectively.8
8 There is no standard convention for the location of the minus sign in the exponential in eqns (6.6) and (6.7), nor where to include the factors of 2π. In the inverse transform, we use the sign convention that matches the physics convention for the positive frequency term in the complex form of the plane-wave solution, e^{i2πux}. Instead of using the spatial frequency u, we could write the Fourier transform in terms of the phase change per unit distance, which in optics corresponds to the component of the wave vector in the x direction, kx = 2πu. In this case, the Fourier transform takes the form:
\[ F(k_x) = \int_{-\infty}^{\infty} f(x)\, e^{-ik_x x}\, dx , \tag{6.10} \]
and the inverse transform is
\[ f(x) = \int_{-\infty}^{\infty} F(k_x)\, e^{ik_x x}\, \frac{dk_x}{2\pi} . \tag{6.11} \]
Both the u and kx forms of the Fourier transform have their advantages and disadvantages. Generally, we shall use the u form, and convert to kx as required.
In Chapter 8 we shall look beyond monochromatic light to consider waves with different wavelengths. In this case, the wave form is made up of components where the wave vector k not only has a range of directions (spread in kx and ky but k is fixed), but also a range of magnitudes, spread in k = 2π/λ. If the spectrum of angular frequencies, ω = ck, is given by a function, F(ω),9 then we use a Fourier transform in the time domain to relate the time dependence of the field f(t) to the spectrum:
\[ f(t) = \int_{-\infty}^{\infty} F(\omega)\, e^{-i\omega t}\, \frac{d\omega}{2\pi} , \tag{6.12} \]
which says that we can reconstruct a function of time as a sum of spectral components. The spectrum is given by the Fourier transform of the time dependence,
\[ F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{i\omega t}\, dt . \tag{6.13} \]
9 F(ω)dω is the amplitude of the field component with angular frequency between ω and ω + dω.
Note, in particular, the different sign convention for the time-frequency Fourier transform, in order to be consistent with a wave, e^{i(kz−ωt)}, traveling in the positive z direction.
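The spatial forward transform, eqn (6.7), can be checked numerically for the rectangular function in the bottom row of Fig. 6.3. The sketch below is only an illustration: the function names, the width a = 1, the integration window, and the midpoint-rule sum are all assumptions introduced here. It evaluates F(u) at three spatial frequencies and shows the first zero at u = 1/a.

```python
import cmath
import math

def fourier_transform(f, u, x_min, x_max, n_samples=4000):
    """Approximate F(u) = integral of f(x) exp(-i 2 pi u x) dx by a midpoint-rule sum."""
    dx = (x_max - x_min) / n_samples
    total = 0j
    for n in range(n_samples):
        x = x_min + (n + 0.5) * dx
        total += f(x) * cmath.exp(-2j * math.pi * u * x)
    return total * dx

def rect(x, a=1.0):
    """Hypothetical aperture function: 1 for |x| < a/2, else 0."""
    return 1.0 if abs(x) < a / 2 else 0.0

a = 1.0
F_dc = fourier_transform(rect, 0.0, -2.0, 2.0)        # equals the width a
F_half = fourier_transform(rect, 0.5 / a, -2.0, 2.0)  # a * sin(pi/2) / (pi/2)
F_zero = fourier_transform(rect, 1.0 / a, -2.0, 2.0)  # first zero of the spectrum
print(abs(F_dc), abs(F_half), abs(F_zero))
```

The values follow F(u) = a sin(πua)/(πua), the transform of a top-hat of width a.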
6.3 Angular spectrum
Whereas a single plane wave with a unique propagation direction has infinite spatial extent with a constant intensity, a light field with a spatially varying intensity must have a spread in propagation angles, and hence a spread in momentum. It is this spread that leads to diffraction. In this section, we apply the Fourier concept to rewrite a spatially varying field in a particular plane as a superposition of plane waves propagating at different angles. The distribution of these plane waves is known as the angular spectrum.10 Below, we show how the spread in angles characterized by the angular spectrum is related to the spatial distribution via a Fourier transform.
10 Note the variation in the use of the term 'spectrum' here. We are still talking about monochromatic light, there is only one frequency, and only one value of k. A 'spectrum' of angles is distinct from the more commonly encountered 'spectrum' of frequencies.
Let the field along the x′ axis, in the z = 0 plane, be
\[ E^{(0)} = E_0 f(x') , \tag{6.14} \]
where E0 is the electric field amplitude and f(x′) is a dimensionless function that describes the spatial distribution. From the definition of the inverse Fourier transform, eqn (6.6), we can write
\[ E^{(0)} = \int_{-\infty}^{\infty} A^{(0)}\, e^{i2\pi ux'}\, du = \mathcal{F}^{-1}[A^{(0)}](x') , \tag{6.15} \]
where A^{(0)} = E0 F(u). Equation (6.15) says that we can write the field E^{(0)} as a superposition of plane waves with amplitudes, A^{(0)}, and spatial frequencies, u, along the x′ axis. As spatial frequency is related to the angle of propagation via u = sin θ/λ, see Fig. 6.4, this is saying that any spatial field distribution along x′ can be written as a superposition of plane waves propagating at angles, θ, relative to the z axis.
To illustrate this angular-spectrum concept, in Fig. 6.5 we revisit the case of two plane waves, first considered in Section 3.3. In (a) and (b) we show the intensity patterns produced by two plane waves propagating at different angles in the xz plane (note that the z axis is vertical in this plot). Below in (c) and (d) we plot the distribution of propagation angles, the angular spectrum, A^{(0)} = E0 F(u), which in this case is represented by two Dirac δ-functions, see Section B.2, one for each wave. Using the relationship between spatial frequency and propagation angle, two plane waves propagating at angles θ = ±θ0/2 have spatial frequencies, u = ±θ0/(2λ), and as for eqn (6.7), we can write
\[ F(u) = F_0 \left[ \delta\!\left(u + \frac{\theta_0}{2\lambda}\right) + \delta\!\left(u - \frac{\theta_0}{2\lambda}\right) \right] , \tag{6.16} \]
where F0 is an amplitude factor. Figure 6.5 shows that if we increase the angle θ0 as in (b), then spatial frequencies increase and the interference fringes become more localized.11
Fig. 6.4 The sinusoidal curves show the phase variation along x′ (or x) for plane waves propagating at the angle, θ, relative to the z axis, for two values of θ. The larger the angle, the higher the spatial frequency in the x direction. In the small-angle approximation, the spatial frequency in the x direction is u = θ/λ.
11 Note that the case shown in Fig. 6.5 is special in that both waves have the same spatial frequency along the propagation direction, i.e. the same value of kz. Consequently, the field is uniform in the z direction. This is not usually the case. For example, when we add more than two waves, as we saw in Fig. 3.15, there is more than one value of kz and the field also varies in the z direction. This z dependence is the basis of propagation, as we shall see later.
In Fig. 6.5, F(u) only contains two plane waves, the spectrum is discrete, and the resulting wave form is periodic. By adding more waves (infinitely many) we can form any light distribution, as shown in the bottom row of Fig. 6.3. The relationship between the forward and inverse transforms, eqns (6.6) and (6.7), tells us how to find the angular spectrum, i.e. the amplitude and phase of each plane-wave component needed to form a particular light distribution, E^{(0)}. Using the definition of the Fourier transform, eqn (6.7), we can write
\[ A^{(0)} = \mathcal{F}[E^{(0)}](u) , \tag{6.17} \]
where A^{(0)} is a function of spatial frequency, u, and hence the angle of propagation, θ. Hence the spatial field distribution, E^{(0)}, and the angular spectrum of plane waves, A^{(0)}, are a Fourier transform pair.
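For the two-wave spectrum of eqn (6.16), the field in the z = 0 plane is proportional to e^{i2πux} + e^{−i2πux} = 2 cos(2πux), so the intensity fringes repeat with period 1/(2u) = λ/[2 sin(θ0/2)] ≈ λ/θ0. The following minimal sketch evaluates this; the wavelength (633 nm), the full angle θ0 = 0.01 rad, and the function name are illustrative assumptions, not values taken from the text.

```python
import cmath
import math

def two_wave_intensity(x, theta0, wavelength):
    """Intensity (arbitrary units) of two unit-amplitude plane waves at angles +/- theta0/2."""
    u = math.sin(theta0 / 2) / wavelength  # spatial frequency along x, u = sin(theta)/lambda
    field = cmath.exp(2j * math.pi * u * x) + cmath.exp(-2j * math.pi * u * x)
    return abs(field) ** 2

wavelength = 633e-9  # assumed wavelength in metres
theta0 = 0.01        # assumed full angle between the two waves in radians

fringe_period = wavelength / (2 * math.sin(theta0 / 2))  # close to wavelength / theta0
print(f"fringe period = {fringe_period * 1e6:.1f} micrometres")
print(two_wave_intensity(0.0, theta0, wavelength))                # bright fringe
print(two_wave_intensity(fringe_period / 2, theta0, wavelength))  # dark fringe
```

Doubling θ0 halves the fringe period, which is the behaviour described for panels (a) and (b) of Fig. 6.5.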
Fig. 6.5 Angular spectrum concept: Two plane waves propagating at angles, θ = ±θ0/2, relative to the z axis for (a) small θ0 and (b) large θ0. The resulting interference pattern is shown in the background. The corresponding angular spectra, F(u), are shown in (c) and (d) below. The plane waves have spatial frequencies u = ±θ0/2λ in the x direction.
In general, there are two distinct propagation angles, corresponding to the xz and yz planes, and hence two spatial frequencies (or Fourier variables), u and v (or kx and ky), corresponding to the x and y directions, respectively.12 For two transverse dimensions, x and y, the angular spectrum is given by a two-dimensional Fourier transform,
\[ A^{(0)} = \mathcal{F}[E^{(0)}](u, v) , \tag{6.18} \]
where E^{(0)} = E0 f(x, y) and
\[ \mathcal{F}[f(x,y)](u,v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\, e^{-i2\pi(ux+vy)}\, dx\, dy . \]
12 We can always write the axial component of the wave vector, kz, in terms of kx,y using
\[ k = (k_x^2 + k_y^2 + k_z^2)^{1/2} = \frac{2\pi}{\lambda} . \]
The field in the z = 0 plane is composed of a superposition of plane waves with amplitudes A^{(0)} = E0 F(u, v):
\[ E^{(0)} = \mathcal{F}^{-1}[A^{(0)}](x,y) = E_0 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(u,v)\, e^{i2\pi(ux+vy)}\, du\, dv . \tag{6.19} \]
Next, we shall look at a specific example, the angular spectrum of a laser beam.
Example 6.1
Angular spectrum of a laser beam: Consider a laser beam, see Chapter 11 for more details. The transverse field dependence in the z = 0 plane is given by
\[ E^{(0)} = E_0\, e^{-x^2/w^2} , \tag{6.20} \]
where w is the beam radius. The intensity distribution is plotted in Fig. 6.6 and has the form of a gaussian with a standard deviation, or a 1/√e-width, Δx = w/2. What is the angular spectrum associated with this light distribution?
The angular spectrum, eqn (6.17), is given by the Fourier transform of the field. In Fig. 6.7 we show the construction of a gaussian wave form from harmonic waves. As in Fig. 6.3 a discrete sum or Fourier series produces a periodic wave form, in this case a train of gaussian wave packets. To suppress all the repetitions of the wave packet, except one, we fill all the gaps in the discrete spectrum, i.e. replace the Fourier series by a Fourier transform. For the special case of a gaussian wave form, the spectrum is also a gaussian function. As the variables x and u are interchangeable, the Fourier inversion theorem (the relationship between the forward and inverse transforms, see Appendix B) is easily proved for this case. Mathematically, the Fourier transform is found by completing the square, see Section B.6, giving
\[ A^{(0)} = \mathcal{F}[E^{(0)}](u) = \sqrt{\pi}\, w E_0\, e^{-\pi^2 u^2 w^2} . \tag{6.21} \]
This expression tells us that the distribution of propagation angles, θ = sin⁻¹(uλ), and hence the distribution of transverse momentum, px = h sin θ/λ, needed to form a localized gaussian light distribution, is also gaussian.
Fig. 6.6 The transverse intensity profile of a laser beam. At a transverse displacement equal to the beam radius w the intensity falls to 1/e² times the on-axis value.
Fig. 6.7 Summing harmonic waves to form a localized gaussian wave form. In the left column we show the spatial frequency spectrum, in the right we show their sum. In the top row (i) we superpose three frequencies, u = 0 and ±1/(10w). The zero spatial frequency produces an offset and the + and − produce a cosine, so the sum appears as a single wave, offset from zero. In the second row (ii), we add another spatial frequency (both + and −), and so on. Row (v) is a sum of 11 frequencies. The bottom row shows the continuous sum, or Fourier transform.
Substituting u = sin θ/λ ≈ θ/λ, the distribution of angles is
\[ A^{(0)} = \sqrt{\pi}\, w E_0\, e^{-\pi^2 w^2 \theta^2/\lambda^2} . \tag{6.22} \]
If we define an angular width (or divergence), Δθ, as
\[ A^{(0)} = \sqrt{\pi}\, w E_0\, e^{-\theta^2/(\Delta\theta)^2} , \tag{6.23} \]
then from the angular spectrum eqn (6.22) we find that
\[ \Delta\theta = \frac{\lambda}{\pi w} . \tag{6.24} \]
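Equations (6.22) to (6.24) are easy to evaluate numerically. The sketch below is an illustration under assumed values (a 532 nm beam with radius w = 0.5 mm, chosen here and not taken from the text); the function names are likewise hypothetical. It confirms that the angular spectrum falls to 1/e of its peak at θ = Δθ and that halving w doubles the divergence.

```python
import math

def gaussian_divergence(wavelength, w):
    """Angular width of eqn (6.24): delta_theta = lambda / (pi * w)."""
    return wavelength / (math.pi * w)

def angular_spectrum(theta, wavelength, w, E0=1.0):
    """Small-angle gaussian angular spectrum of eqn (6.22)."""
    return math.sqrt(math.pi) * w * E0 * math.exp(-(math.pi * w * theta / wavelength) ** 2)

wavelength = 532e-9  # assumed wavelength in metres
w = 0.5e-3           # assumed beam radius in metres

dtheta = gaussian_divergence(wavelength, w)
ratio = angular_spectrum(dtheta, wavelength, w) / angular_spectrum(0.0, wavelength, w)
print(f"divergence = {dtheta * 1e3:.4f} mrad, A(dtheta)/A(0) = {ratio:.4f}")
print(gaussian_divergence(wavelength, w / 2) / dtheta)  # halving w doubles the divergence
```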
The angular divergence of a localized gaussian beam is illustrated in Fig. 6.8. This inverse relationship between the initial real space width, w0, and the angular spread, Δθ, is true for all distributions, not just gaussians. We can also express the angular divergence in terms of a transverse momentum distribution. The momentum distribution is proportional to the modulus-squared of the angular spectrum,
\[ |A^{(0)}|^2 = \pi w^2 E_0^2\, e^{-2\pi^2 u^2 w^2} = \pi w^2 E_0^2\, e^{-p_x^2 w^2/2\hbar^2} , \]
where we have used u = px/(2πħ). This gives an uncertainty (standard deviation or a 1/√e-width) in the x-component of photon momentum of
\[ \Delta p_x = \hbar/w . \tag{6.25} \]
Combining this with the uncertainty in position obtained from the intensity distribution, Δx = w/2, we find that
\[ \Delta x\, \Delta p_x = \hbar/2 , \tag{6.26} \]
which is the Heisenberg uncertainty relationship for photons at the waist of a laser beam. In optics, Heisenberg's uncertainty relationship, which arises from the Fourier relationship between space and momentum, tells us that if our light distribution has a small spatial extent, it must have a large spread in transverse momentum, and vice versa.13
Fig. 6.8 A gaussian laser beam with width (standard deviation) Δx has a momentum spread Δp = ħ/(2Δx) leading to an angle divergence Δθ = 2(Δp/p). The factor of 2 arises, as in Fig. 6.6, because Δθ is the angle to the 1/e²-intensity radius, whereas Δp is a 1/√e-width.
13 As we shall see in Chapter 11, eqn (6.26) is only true at the position where the laser beam radius w is a minimum. This position is called the beam waist and the minimum value of w is written as w0.
6.4 Propagation
A common scenario in optics is that we know the field in a particular plane, for example, E^{(0)} in the z = 0 plane, and want to find the field, E^{(z)}, after propagating a distance z. Using the angular spectrum method, propagation reduces to multiplying each plane-wave component by a propagation phase. Recalling eqn (2.10), E = E0 e^{i(kx x + ky y + kz z)}, we see that in moving from z = 0 to z, the plane wave acquires a phase e^{ikz z}, which is called the propagator.14 Consequently, if the angular spectrum in the input plane at z = 0 is A^{(0)}, then the angular spectrum a distance z downstream will be
\[ A^{(z)} = e^{ik_z z} A^{(0)} . \tag{6.27} \]
From A^{(z)} (the spectrum of plane waves at z) we can reconstruct the field at z using an inverse transform,
\[ E^{(z)} = \mathcal{F}^{-1}[A^{(z)}] . \tag{6.28} \]
Note that |A^{(z)}|² = |A^{(0)}|² is required by momentum conservation.15 Substituting for A^{(z)} using eqn (6.27) and using that A^{(0)} = F[E^{(0)}], we find that the field in a plane a distance z downstream is
\[ E^{(z)} = \mathcal{F}^{-1}\left[ e^{ik_z z}\, \mathcal{F}[E^{(0)}] \right] . \tag{6.29} \]
This is known as the hedgehog equation and provides a complete solution of Maxwell's scalar wave equation for monochromatic light.
14 For a plane wave propagating at a small angle, θ, relative to the z axis shown in Fig. 2.5, the phase variation along z is typically much faster than along x, kz ≫ kx for small θ.
15 |A^{(z)}|² is proportional to the probability density function of light described by a plane wave with wave vector k = (kx, ky, kz), and hence the fraction of photons with momentum p = ħk. For propagation in free space, momentum and hence |A^{(z)}|² is conserved.
Operationally, the hedgehog equation consists of three steps, illustrated in Fig. 6.9: (1) Decompose the input field, E^{(0)}, as a superposition of plane waves with amplitudes, A^{(0)} = F[E^{(0)}]. (2) Multiply each plane wave by a propagation phase, A^{(z)} = A^{(0)} e^{ikz z}. (3) Reconstruct the field, E^{(z)} = F⁻¹[A^{(z)}], as a superposition of phase-shifted plane waves.16
In Section 6.5 we show that the Fresnel diffraction integral, eqn (5.2), can be derived directly from eqn (6.29). The extensions to 'white' light and vector fields are discussed in Chapters 7 and 12, respectively. To propagate a monochromatic light field in the z direction, the Fourier transform and inverse transform in eqn (6.29) are two-dimensional. However, if we can separate the x and y directions and we are only interested in the field either in the x or y direction, then we can use one-dimensional Fourier transforms. The principle (Fourier transform, propagate, inverse transform) is illustrated in Fig. 6.9. Unless otherwise stated all simulations of light propagation in this book are calculated using eqn (6.29).17 Once we have the field we convert to intensity by finding the modulus-squared. Figure 6.10 shows an example of the application of eqn (6.29), the light distribution downstream of a uniformly illuminated rectangular aperture.
16 Hedgehog, pronounced hed-ge-hog, helps us remember the three steps: hog is a Fourier transform, ge represents the Green's function propagator, and hed is an inverse transform.
Fig. 6.9 Schematic illustration of the three steps of the hedgehog equation, eqn (6.29).
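The three steps of eqn (6.29) can be sketched in one transverse dimension using only the Python standard library. This is a minimal illustration, not the code used for the figures in the book: the direct O(n²) discrete Fourier transform, the grid spacing, the wavelength, the slit width, and all function names are assumptions chosen for this example. With only one transverse dimension, kz = 2π(1/λ² − u²)^{1/2}.

```python
import cmath
import math

def dft(values, sign):
    """Direct discrete Fourier transform; sign = -1 (forward) or +1 (inverse, unnormalized)."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * math.pi * j * m / n) for m, v in enumerate(values))
            for j in range(n)]

def propagate(field, dx, wavelength, z):
    """Propagate a sampled 1D field a distance z using the three steps of eqn (6.29)."""
    n = len(field)
    spectrum = dft(field, -1)  # step 1: A(0) = F[E(0)]
    shifted = []
    for j, amplitude in enumerate(spectrum):
        u = (j if j < n // 2 else j - n) / (n * dx)  # spatial frequency of bin j
        kz = 2 * math.pi * cmath.sqrt(1 / wavelength**2 - u**2)  # imaginary if evanescent
        shifted.append(amplitude * cmath.exp(1j * kz * z))  # step 2: A(z) = exp(i kz z) A(0)
    return [v / n for v in dft(shifted, +1)]  # step 3: E(z) = F^-1[A(z)]

# Assumed example: a uniformly illuminated slit, 16 samples wide, on a 64-point grid.
n, dx, wavelength = 64, 2e-6, 0.5e-6
aperture = [1.0 if abs(m - n // 2) < 8 else 0.0 for m in range(n)]
field_z = propagate(aperture, dx, wavelength, 100e-6)
intensity = [abs(v) ** 2 for v in field_z]
print(max(intensity))
```

Because the discrete transform is periodic, the computed field wraps around at the edges of the grid, so in practice the grid should be wide enough that the diffracted light does not reach them. For dx > λ/2 all sampled plane waves are propagating and the summed intensity is conserved, as expected from |A^{(z)}|² = |A^{(0)}|².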
17 Equation (6.29) can be written as a single line of computer code, e.g.
EZ=ifft2(ifftshift(P*fftshift(fft2(E0)))) ,
where P=exp(i*KZ*Z) is the propagator, KZ=2*pi*sqrt(1/(L*L)-U*U-V*V) is the axial component of the wave vector, L is the wavelength, U and V are grids of spatial frequencies centred on zero, and 'fft' represents an in-built fast Fourier transform, see Section B.14.
Example 6.2
Hedgehog solution of Helmholtz equation: Here we present an alternative derivation of eqn (6.27), and hence eqn (6.29), starting from the Helmholtz equation, eqn (1.40):
\[ \frac{\partial^2 E}{\partial x^2} + \frac{\partial^2 E}{\partial y^2} + \frac{\partial^2 E}{\partial z^2} + k^2 E = 0 . \]
On substituting
\[ E = \iint_{-\infty}^{\infty} A\, e^{i(k_x x + k_y y)}\, \frac{dk_x}{2\pi}\frac{dk_y}{2\pi} , \]
we find that
\[ \iint_{-\infty}^{\infty} \frac{dk_x}{2\pi}\frac{dk_y}{2\pi}\, e^{i(k_x x + k_y y)} \left[ \frac{d^2 A}{dz^2} + (k^2 - k_x^2 - k_y^2) A \right] = 0 , \]
which is satisfied if the integrand is zero for any kx or ky, i.e.
\[ \frac{d^2 A}{dz^2} + (k^2 - k_x^2 - k_y^2) A = 0 . \]
This equation has the solution
\[ A(k_x, k_y, z) = e^{i(k^2 - k_x^2 - k_y^2)^{1/2} z}\, A(k_x, k_y, 0) , \]
in agreement with eqn (6.27).
Fig. 6.10 Application of eqn (6.29): The light distribution in the input plane, E^{(0)}, is propagated a distance z to find E^{(z)} and I^{(z)}. The x dependence of each I^{(z)} 'slice' is plotted vertically at horizontal position z to produce a map of the intensity distribution in the xz plane.
6.5 Fourier to Fresnel
In this section we derive the Fresnel diffraction integral from the

Fourier propagation equation—the hedgehog equation, eqn (6.29). This
result demonstrates that the two complementary bases—curved or

planar waves (Huygens–Fresnel or Fourier)—are formally equivalent. As
the hedgehog result is a full solution of the Helmholtz equation, it follows
that the Fresnel diffraction integral is a formal mathematical solution of
the Helmholtz equation, eqn (1.40), in the paraxial regime.
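As a numerical illustration of the hedgehog equation, eqn (6.29), on which this section builds, the following is a minimal one-dimensional sketch in Python using only the standard library. It is not the authors' simulation code: the function names, the grid (64 samples spaced by 1 μm), the wavelength, and the slit width are all illustrative assumptions, and a naive discrete Fourier transform stands in for a fast Fourier transform.

```python
import cmath
import math


def dft(values, sign):
    """Naive O(N^2) discrete Fourier transform (unnormalised).

    sign = -1 gives the forward transform, sign = +1 the inverse kernel.
    """
    n = len(values)
    return [
        sum(values[j] * cmath.exp(sign * 2j * math.pi * j * m / n) for j in range(n))
        for m in range(n)
    ]


def propagate_1d(e0, dx, wavelength, z):
    """Fourier transform, multiply by exp(i kz z), inverse transform."""
    n = len(e0)
    spectrum = dft(e0, -1)
    propagated = []
    for m, amplitude in enumerate(spectrum):
        # Spatial frequency u of bin m in standard DFT ordering.
        u = (m if m < n // 2 else m - n) / (n * dx)
        # Axial wave-vector component; cmath.sqrt gives an imaginary kz
        # (a decaying wave) if u exceeds 1/wavelength.
        kz = 2 * math.pi * cmath.sqrt(1 / wavelength**2 - u**2)
        propagated.append(amplitude * cmath.exp(1j * kz * z))
    return [v / n for v in dft(propagated, +1)]


# Hypothetical example: a uniformly illuminated slit, 15 samples wide.
n, dx, wavelength = 64, 1.0e-6, 0.5e-6
e0 = [1.0 if abs(j - n // 2) < 8 else 0.0 for j in range(n)]
ez = propagate_1d(e0, dx, wavelength, z=1.0e-4)
intensity = [abs(v) ** 2 for v in ez]
```

With this sampling every plane-wave component is propagating (the largest spatial frequency is below 1/λ), so the propagator only changes phases and the summed intensity should be unchanged by propagation. The discrete transform implies a periodic grid, so light reaching the edge of the window wraps around.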
We re-write the hedgehog equation for plane-wave propagation,
eqn (6.29), in the form
$$E^{(z)} = \mathcal{F}^{-1}\left[ \mathcal{F}[h]\, \mathcal{F}[E^{(0)}] \right] , \tag{6.30}$$
where the function h is the inverse Fourier transform of the propagator:
$$h = \mathcal{F}^{-1}[H] = \mathcal{F}^{-1}\left[ e^{i k_z z} \right] .$$
Next, we use the convolution theorem, $\mathcal{F}[g * h] = \mathcal{F}[g]\,\mathcal{F}[h]$, which tells us
that the inverse Fourier transform of a product of two Fourier transforms
is simply a convolution, $\mathcal{F}^{-1}\{\mathcal{F}[g]\,\mathcal{F}[h]\} = g * h$. Hence we can rewrite
the field in the plane z as a convolution integral,
$$E^{(z)} = \mathcal{F}^{-1}\left[ \mathcal{F}[E^{(0)}]\, \mathcal{F}[h] \right] = E^{(0)} * h . \tag{6.31}$$
In the paraxial regime, we can calculate h analytically. Expanding $k_z$ in
terms of $k_x$ and $k_y$,
$$k_z = (k^2 - k_x^2 - k_y^2)^{1/2} \approx k - \frac{k_x^2}{2k} - \frac{k_y^2}{2k} = k - \pi\lambda(u^2 + v^2) , \tag{6.32}$$
and the 2D inverse Fourier transform of the propagator is
$$h = e^{ikz}\, \mathcal{F}^{-1}\left[ e^{-i\pi(u^2 + v^2)\lambda z} \right](u, v) = \frac{1}{i\lambda z}\, e^{ikz}\, e^{i\pi\rho^2/(\lambda z)} ,$$
where $\rho^2 = x^2 + y^2$, and we have used the Fourier toolkit result,
eqn (B.38).18 Inserting this result into the convolution integral,
eqn (6.31), we find
$$E^{(z)} = E^{(0)} * h = \iint_{-\infty}^{\infty} E^{(0)}(x', y')\, h(x - x', y - y')\, dx'\, dy' = \frac{e^{ikz}}{i\lambda z} \iint_{-\infty}^{\infty} E^{(0)}(x', y')\, e^{ik[(x - x')^2 + (y - y')^2]/2z}\, dx'\, dy' , \tag{6.33}$$
which is the Fresnel diffraction integral, eqn (5.2). By deriving the
Fresnel diffraction integral from the hedgehog equation, eqn (6.29), we
demonstrate the equivalence of the Huygens–Fresnel and Fourier optics
viewpoints. In addition, we obtain the amplitude and phase of the
constituent waves, $E^{(0)}/i\lambda z$, directly.19

18 The propagator is cartesian separable. For both x and y we define a width, w, using $w^2 = -\lambda z/(i\pi)$, such that for the x direction, we have
$$e^{-i\pi u^2 \lambda z} = e^{-\pi^2 u^2 w^2} ,$$
and the inverse transform becomes
$$\mathcal{F}^{-1}\left[ e^{-\pi^2 u^2 w^2} \right](u) = \frac{1}{\sqrt{\pi}\, w}\, e^{-x^2/w^2} = \frac{1}{\sqrt{i\lambda z}}\, e^{-\pi x^2/(i\lambda z)} ,$$
and similarly for the y direction.

19 Note that the $1/i = e^{-i\pi/2}$ factor appearing in the Fresnel diffraction integral corresponds to a phase advance of π/4 from each dimension. In the time domain, an input field $e^{-i\omega t}$ becomes $e^{-i(\omega t + \pi/2)}$. This phase advance is a problem for the secondary wave concept because it appears that secondary waves are ahead of the incident field, violating causality! The phase advance is known as the Gouy phase and we shall analyse it again for the case of laser beams in Chapter 11. The Gouy phase can be interpreted geometrically: as we cannot focus light to a point, a focused light beam travels less far than predicted by the geometrical path, see Boyd (1980).
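The paraxial expansion in eqn (6.32) can be checked numerically. The short sketch below compares the exact axial wave-vector component with its paraxial approximation for a few spatial frequencies; the function names and the choice of measuring lengths in units of the wavelength are assumptions made for illustration only.

```python
import math


def kz_exact(u, v, wavelength):
    """Exact axial component, 2*pi*sqrt(1/lambda^2 - u^2 - v^2)."""
    return 2 * math.pi * math.sqrt(1 / wavelength**2 - u**2 - v**2)


def kz_paraxial(u, v, wavelength):
    """Paraxial form of eqn (6.32): k - pi*lambda*(u^2 + v^2)."""
    k = 2 * math.pi / wavelength
    return k - math.pi * wavelength * (u**2 + v**2)


wavelength = 1.0  # lengths in units of the wavelength (illustrative choice)
for u in (0.01, 0.1, 0.5):
    error = kz_paraxial(u, 0.0, wavelength) - kz_exact(u, 0.0, wavelength)
    print(f"u = {u:4.2f}/wavelength: paraxial minus exact = {error:.3e}")
```

The difference grows rapidly with u, which is one way to see why the Fresnel integral is restricted to small propagation angles while the hedgehog equation is not.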
6.6 Fresnel to Fourier

In Section 6.5, we saw how to obtain Fresnel’s diffraction integral from
Fourier optics. In this section, we rewrite the Fresnel diffraction integral
in the form of a Fourier transform. This allows us to employ the full
trickery of the Fourier transform toolkit, Appendix B, to solve diffraction
problems. Using the definition of the Fourier transform, the Fresnel
diffraction integral, eqn (5.4), can be rewritten as:
$$E^{(z)} = \frac{E_0\, e^{ik\bar{r}}}{i\lambda z}\, \mathcal{F}\left[ f(x', y')\, e^{ik\rho'^2/2z} \right](u, v) , \tag{6.34}$$
where $\bar{r} = z + \rho^2/2z$, $\rho'^2 = x'^2 + y'^2$, and the Fourier variables are $u = x/(\lambda z)$ and
$v = y/(\lambda z)$. Equation (6.34) says that the Fresnel diffraction integral
can be written as a Fourier transform of the input field, times a factor
$e^{ik\rho'^2/2z}$ arising from the finite size of the aperture. In the focal plane of
a lens, $f(x', y') \Rightarrow f(x', y')\, e^{-ik\rho'^2/2f}$ and we end up with
$$E^{(f)} = \frac{E_0\, e^{ik\bar{r}}}{i\lambda f}\, \mathcal{F}\left[ f(x', y') \right](u, v) , \tag{6.35}$$
with $\bar{r} = f + \rho^2/2f$. Similarly in the far field, we can make the Fraunhofer
approximation, $e^{ik\rho'^2/2z} \approx 1$, and obtain the Fraunhofer diffraction
formula:
$$I^{(z)} = \frac{I_0}{\lambda^2 z^2} \left| \mathcal{F}\left[ f(x', y') \right] \right|^2 , \tag{6.36}$$
where the Fourier variables are $u = x/\lambda z$ and $v = y/\lambda z$. If the input
function is cartesian separable, $f(x', y') = g(x')h(y')$, then we can write
$$I^{(z)} = \frac{I_0}{\lambda^2 z^2}\, |G(u)|^2\, |H(v)|^2 , \tag{6.37}$$
with $G(u) = \mathcal{F}[g(x')](u)$ and $H(v) = \mathcal{F}[h(y')](v)$. Also for a field that
does not change in the y direction, we can perform the y integral as in
Chapter 5, and the field in the xz plane is
$$E^{(z)} = \frac{E_0\, e^{ik\bar{r}}}{\sqrt{i\lambda z}}\, \mathcal{F}\left[ f(x')\, e^{ikx'^2/2z} \right](u) ,$$
where $\bar{r} = z + x^2/2z$. Next, we apply these results to a few simple
examples to illustrate their utility.

Fig. 6.11 (i) Laser illumination of a vertical double slit with width a, separation d = 5a, and height b much larger than the other dimensions. Re-scaled for convenience. (ii) The far-field intensity pattern consists of cosine-squared fringes with a sinc-squared envelope in the horizontal direction. The gaussian intensity distribution of the laser beam is observed in the vertical direction. Note that (i) and (ii) are not to scale.
Example 6.3
Double slit revisited: Consider the case of two slits in the form of two rectangular
apertures each with width a, height b, separated by a distance, d, where b > d > a.
The horizontal component of the aperture function is
$$g(x') = \mathrm{rect}\left( \frac{x'}{a} \right) * X_d^{(2)}(x') , \tag{6.38}$$
where we have made use of the replicating comb function, $X_d^{(N)}(x)$, see Section B.9,
to make identical copies of the single-slit aperture function. The Fourier transform
of $X_d^{(2)}(x)$ is
$$\mathcal{F}\left[ X_d^{(2)}(x) \right](u) = e^{-i\pi ud} + e^{i\pi ud} = 2\cos \pi ud , \tag{6.39}$$
where $u = x/\lambda z$. For laser illumination with a beam size, $w_0 < b$, the light
distribution in the y direction is given by $h(y') = \mathrm{gauss}(y'/w_0)$, see Fig. 6.11(i).
For an observation distance, $d_R \ll z \ll z_R$, we are in the far field for diffraction in
the x direction, but there is no diffraction in y and the y integral returns a factor of
$\sqrt{\lambda z}$ times h(y). Using the convolution theorem for the x direction, Section B.4, we
can use eqn (6.37) to write the Fraunhofer intensity distribution as
$$I^{(z)} = \frac{4 I_0 a^2}{\lambda z} \cos^2\left( \frac{\pi d x}{\lambda z} \right) \mathrm{sinc}^2\left( \frac{\pi a x}{\lambda z} \right) \mathrm{gauss}^2\left( \frac{y}{w_0} \right) . \tag{6.40}$$
This intensity distribution is shown in Fig. 6.11(ii). The x dependence is the same
as in eqn (5.32). This example illustrates the convenience of using Fourier methods.
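The x dependence of eqn (6.40) is easy to evaluate directly. The sketch below does so for the geometry of Fig. 6.11, d = 5a, taking sinc(x) = sin(x)/x as implied by the arguments used in the text; the wavelength, slit width, and observation distance are hypothetical values chosen only for illustration.

```python
import math


def sinc(x):
    """sin(x)/x, with the limiting value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x


def double_slit_x(x, a, d, wavelength, z):
    """x dependence of eqn (6.40), normalised to 1 at x = 0."""
    fringes = math.cos(math.pi * d * x / (wavelength * z)) ** 2
    envelope = sinc(math.pi * a * x / (wavelength * z)) ** 2
    return fringes * envelope


# Hypothetical parameters (not taken from the text).
wavelength, z = 633e-9, 1.0
a = 20e-6
d = 5 * a
fringe_spacing = wavelength * z / d  # separation of adjacent bright fringes

for order in range(7):
    x = order * fringe_spacing
    print(order, round(double_slit_x(x, a, d, wavelength, z), 4))
```

Because d = 5a, the fifth-order fringe coincides with the first zero of the sinc-squared envelope and is missing from the pattern.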
6.7 Regular arrays

When calculating Fraunhofer diffraction patterns, there are numerous
types of apertures for which Fourier techniques are particularly suitable.
Here we discuss the case of a regular array of identical apertures—
some examples are shown in Figs. 6.12 and 6.13. Let $f_1(x', y')$ be a
two-dimensional aperture, with Fourier transform $F_1(u, v)$. We consider
N identical copies of the aperture, placed in a regular array along the
x axis, with centres separated by d. For convenience we place the array
symmetrically about the origin.20 We can write the aperture function
for the array as
$$f_N(x', y') = \sum_{m=0}^{(N-1)/2} f_1(x' \pm md, y') = f_1(x', y') * \sum_{m=0}^{(N-1)/2} \delta(x' \pm md, y') . \tag{6.41}$$

20 As we are interested in the intensity diffraction pattern we know that the square modulus of the Fourier transform of the aperture function is invariant under a translation, therefore we are free to choose a convenient origin.

Fig. 6.12 (i) Three identical circular apertures of diameter D centred at $(x', y')$ with values of (−d, 0), (0, 0), and (d, 0). (ii) and (iii) Intensity of the Fraunhofer diffraction pattern along the x and y axes, respectively. (iv) The complete pattern in the xy plane. Along y an Airy pattern is seen, with a peak intensity nine times that of a single aperture, and first zeros at angles of ±1.22λ/D. Along the x axis the Airy pattern provides an envelope, and a characteristic fringe pattern of three identical objects is seen. The principal maxima are at ±(λ/d)z, ±2(λ/d)z, etc.
We have chosen to write the function explicitly as a sum of displaced
functions, but could also use the comb function.21 To calculate the
Fraunhofer diffraction pattern, we need the Fourier transform, $F_N(u, v)$,
which is given by
$$\begin{aligned} F_N(u, v) &= \mathcal{F}\left[ f_N(x', y') \right] , \\ &= F_1(u, v) \sum_{m=0}^{(N-1)/2} \mathcal{F}\left[ \delta(x' \pm md, y') \right] , \\ &= F_1(u, v) \left[ e^{-i\pi u(N-1)d} + e^{-i\pi u(N-3)d} + \cdots + e^{i\pi u(N-1)d} \right] , \\ &= F_1(u, v)\, \frac{\sin(N\pi ud)}{\sin(\pi ud)} , \end{aligned} \tag{6.42}$$

21 We could write $f_N(x', y') = f_1(x', y') * X_d^{(N)}(x)$, see Section B.9.

where we have used the linearity property of Fourier transforms and
the convolution theorem between the first and second lines; the fact
that the Fourier transform of a δ-function is a phase factor between the
second and third lines; and recognized the sum in the penultimate line
as a geometric progression. From eqn (6.36), we predict a Fraunhofer
intensity pattern of
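$$I^{(z)} = \frac{I_0}{\lambda^2 z^2}\, \underbrace{\left| \mathcal{F}\left[ f_1(x', y') \right] \right|^2}_{\text{single aperture distribution}}\; \underbrace{\left[ \frac{\sin(N\pi ud)}{\sin(\pi ud)} \right]^2}_{\text{function of the array}} . \tag{6.43}$$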
The intensity diffraction pattern has two components: (i) the diffraction
pattern we would have obtained with only one aperture, and (ii) a
function of the array only, i.e. independent of the details of the aperture.
This result is known as the array theorem. Figures 6.12 and 6.13
show examples of regular arrays of three identical circular apertures
and nine triangular apertures, respectively. The extension to a regular
two-dimensional array, Fig. 6.13, is an end-of-chapter exercise. The
array theorem is particularly useful because Nature often gives us exact
copies of some functions in a regular array, e.g. crystals; eqn (6.43) is
often used in crystallography.

Fig. 6.13 The aperture distribution (above) and intensity diffraction pattern (below) for nine identical triangular apertures. Note the alternating principal and subsidiary maxima in both x and y, with peak intensities set by the diffraction pattern for a single triangular aperture.

Example 6.4
Many slits and gratings revisited: As an example of a regular array we consider
laser illumination of N vertical slits. The input field is cartesian separable and we
can write the aperture function as $f(x', y') = g(x')h(y')$, where along x the N slits
are formed by a convolution of a rect function and a comb function with N teeth,
see Section B.9:
$$g(x') = \mathrm{rect}\left( \frac{x'}{a} \right) * X_d^{(N)}(x') . \tag{6.44}$$
For N = 5, $g(x') = \mathrm{rect}(x'/a) * X_d^{(5)}(x')$,
$$\mathcal{F}\left[ X_d^{(5)}(x) \right](u) = e^{-i4\pi ud} + e^{-i2\pi ud} + 1 + e^{i2\pi ud} + e^{i4\pi ud} = 1 + 2\cos 2\pi ud + 2\cos 4\pi ud , \tag{6.45}$$
and we obtain the intensity pattern shown in Fig. 5.23. Experimental images
corresponding to the intensity patterns for one to six slits are shown in Fig. 6.14.
Note how (i) the intensity of the principal maxima increases as $N^2$; (ii) the number
of subsidiary maxima is N − 2; and (iii) the width of the principal maxima scales as
1/N.

Fig. 6.14 Images of the Fraunhofer diffraction patterns for between one (top) and six (bottom) vertical slits illuminated by a laser beam. There is no diffraction in the vertical direction and the intensity profile matches that of the incident laser beam. In the horizontal direction, as the number of slits increases, the peak intensity is higher, the principal maxima are narrower, and there are more subsidiary maxima, as expected, see also Fig. B.10. Data courtesy of Sarah Bunton, Ogden Trust intern, Durham University, 2016.

For larger N (the grating limit) we can write the aperture function as a
product of an infinite Dirac comb, see Section B.9, and a function that describes
how the grating is illuminated. For uniform illumination of a grating with length
L = Nd, the aperture function is
$$f(x') = \mathrm{rect}\left( \frac{x'}{a} \right) * X_d^{(N)}(x') = \mathrm{rect}\left( \frac{x'}{a} \right) * \left[ \mathrm{rect}\left( \frac{x'}{L} \right) \frac{1}{d}\, X\left( \frac{x'}{d} \right) \right] ,$$
and the diffracted field is proportional to
$$\begin{aligned} \mathcal{F}\left[ f(x') \right] &= N a\, \mathrm{sinc}(\pi u a)\, \mathrm{sinc}(N\pi ud) * d\, X(ud) , \\ &= N a\, \mathrm{sinc}(\pi u a) \sum_{m=-\infty}^{\infty} \mathrm{sinc}[N\pi(ud - m)] , \end{aligned}$$
which is a comb of sinc functions centred at ud = m; if u = x/(λz) this maps into

positions, x = m(λ/d)z. If instead the grating is illuminated with a laser beam with
beam radius, w, satisfying d < w < L, the input field is described by

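$$f(x') = \mathrm{rect}\left( \frac{x'}{a} \right) * \left[ \mathrm{gauss}\left( \frac{x'}{w} \right) \frac{1}{d}\, X\left( \frac{x'}{d} \right) \right] ,$$
and the diffracted field, proportional to
$$\mathcal{F}\left[ f(x') \right] = N \sqrt{\pi}\, w\, \mathrm{gauss}(\pi u w)\, \mathrm{sinc}(N\pi ud) * d\, X(ud) ,$$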
is a comb of gaussians. See also Chapter 7, where these ideas are explored in the
time domain.
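The array factor of eqn (6.42) and the five-slit result of eqn (6.45) can be verified numerically. The following sketch sums the phase factors explicitly and compares the sum with the closed form sin(Nπud)/sin(πud); the function names are illustrative assumptions.

```python
import cmath
import math


def array_sum(n_slits, ud):
    """Explicit sum of phase factors, third line of eqn (6.42)."""
    return sum(
        cmath.exp(1j * math.pi * ud * (2 * m - (n_slits - 1)))
        for m in range(n_slits)
    )


def array_closed_form(n_slits, ud):
    """sin(N pi u d)/sin(pi u d), using the limiting value at integer ud."""
    denominator = math.sin(math.pi * ud)
    if abs(denominator) < 1e-12:
        order = round(ud)
        return n_slits * (-1) ** (order * (n_slits - 1))
    return math.sin(n_slits * math.pi * ud) / denominator


for ud in (0.1, 0.37, 0.8):
    explicit = array_sum(5, ud)
    five_slit = 1 + 2 * math.cos(2 * math.pi * ud) + 2 * math.cos(4 * math.pi * ud)
    print(ud, explicit.real, array_closed_form(5, ud), five_slit)
```

At integer values of ud, i.e. at x = m(λ/d)z, all N phase factors add in phase and the intensity is N² times that of a single slit, in line with observation (i) of Example 6.4.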
6.8 Babinet’s principle

Babinet’s principle22 states that complementary apertures23 produce
identical diffracted intensity patterns in the far field (except at the
origin). This is illustrated in Fig. 6.15, where laser illumination of a small
square aperture (top) produces the same far-field pattern as the same-
sized square obstacle (bottom). This arises in X-ray diffraction, see e.g.
Fig. 5.19, where the diffraction pattern is the same whether we regard
the atoms as blocking the field or as scattering sources of the field. The
Fraunhofer limit of the Fresnel diffraction integral, eqn (6.36), for the
complementary function can be evaluated using the properties of Fourier
transforms, see Appendix B. For a two-dimensional complementary
function $f_c(x', y') = 1 - f(x', y')$, the transform is
$$\mathcal{F}\left[ f_c(x', y') \right] = \mathcal{F}[1] - \mathcal{F}\left[ f(x', y') \right] = \delta(u, v) - \mathcal{F}\left[ f(x', y') \right] . \tag{6.46}$$
Except at the origin, the modulus-squared of the Fourier transforms are
equal,24
$$\left| \mathcal{F}\left[ f_c(x', y') \right] \right|^2 = \left| \mathcal{F}\left[ f(x', y') \right] \right|^2 , \quad \forall\, x, y \neq 0 . \tag{6.47}$$
Inserting this result into eqn (6.36) proves that the Fraunhofer intensity
diffraction patterns of complementary apertures are the same except on
axis (x = y = 0).25

22 Jacques Babinet (Lusignan 1794–Paris 1872).

23 If $f(x')$ is the aperture function then the complementary aperture is the function $f_c(x') = 1 - f(x')$.

24 The condition $\forall\, x, y \neq 0$ means everywhere except x = y = 0.

25 One interesting consequence of Babinet’s principle is the extinction paradox, see Chapter 13.

Fig. 6.15 Babinet’s principle: The Fraunhofer diffraction patterns (shown on the right) produced by laser illumination of complementary aperture functions (shown on the left): a square aperture (top) and a square obstacle (bottom). The laser beam size is larger than the dimensions of the square.
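Equations (6.46) and (6.47) have a simple discrete analogue that can be checked in a few lines. In the sketch below a one-dimensional aperture and its complement are sampled on a 32-point grid and transformed with a naive discrete Fourier transform; the grid size and slit width are arbitrary illustrative choices, and the zero-frequency bin plays the role of the on-axis point.

```python
import cmath
import math


def dft(values):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * j * m / n) for j, v in enumerate(values))
        for m in range(n)
    ]


n = 32
aperture = [1.0 if 12 <= j < 20 else 0.0 for j in range(n)]  # open slit
obstacle = [1.0 - f for f in aperture]                       # complementary screen

i_aperture = [abs(c) ** 2 for c in dft(aperture)]
i_obstacle = [abs(c) ** 2 for c in dft(obstacle)]

# Largest difference between the two patterns away from the zero-frequency bin.
off_axis_difference = max(abs(p - q) for p, q in zip(i_aperture[1:], i_obstacle[1:]))
print("on axis:", i_aperture[0], i_obstacle[0])
print("largest off-axis difference:", off_axis_difference)
```

The two patterns agree everywhere except in the zero-frequency bin, where the transform of the constant term contributes, mirroring the delta function in eqn (6.46).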
Example 6.5
Diffraction by a wire: As an illustration of Babinet’s principle we consider the
scenario illustrated schematically in Fig. 6.16, where a wire is placed in the near field
of a laser pointer. The radius of the laser beam, $w_0$, is larger than the diameter of
the wire, a: $w_0 > a$. We observe the diffraction pattern at a distance less than the
Rayleigh range, $z_R = \pi w_0^2/\lambda$, but much larger than the Rayleigh length for the wire,
$d_R \ll z \ll z_R$, where $d_R = a^2/\lambda$. In this case, we are in the far-field Fraunhofer
limit for the obstacle, but remain in the near field of the laser beam, and can assume
that the gaussian beam profile of the laser is unchanged. The aperture function,
$f(x', y') = g(x')h(y')$, is
$$g(x') = \left[ 1 - \mathrm{rect}\left( \frac{x'}{a} \right) \right] \mathrm{gauss}\left( \frac{x'}{w_0} \right) \quad \text{and} \quad h(y') = \mathrm{gauss}\left( \frac{y'}{w_0} \right) .$$
In the x direction we have a product of two functions, so to find the Fourier transform
we use the inverse convolution theorem. Assuming that we can neglect the change
in the laser beam, we obtain
$$G(u) = \left[ \delta\left( \frac{x}{\lambda z} \right) - a\, \mathrm{sinc}\left( \frac{\pi x a}{\lambda z} \right) \right] * \mathrm{gauss}\left( \frac{x}{w_0} \right) , \tag{6.48}$$
where we have substituted $u = x/\lambda z$ on the right-hand side. Note that the gauss
function that describes the laser beam is unchanged because we are in the near field
of the laser beam, $z < z_R$. The convolution results in a gaussian at the origin
x = 0 and a slight smearing out of the sinc function, which is negligible for $w_0 \gg a$.
Consequently, the intensity distribution, which is proportional to the modulus-squared of
G(u), looks like the original laser beam superimposed on top of a much wider sinc-
squared pattern as shown in Fig. 6.16.

Fig. 6.16 Illustration of Babinet’s principle using a wire in a laser beam. The greyscale plot shows the intensity distribution in the xz plane downstream of the wire using a log scale to emphasize the fringe pattern. The input and output intensities along the x axis are shown on adjusted linear scales on the far left and right, respectively.

If we assume that the gaussian beam can be approximated as a δ-function in the
far field then for $x > w_0$ the intensity along the x axis in the observation plane is
$$I^{(z)} = I_0\, \frac{a^2}{\lambda z}\, \mathrm{sinc}^2\left( \frac{\pi a x}{\lambda z} \right) , \quad \forall\, x \neq 0 , \tag{6.49}$$
where $\forall\, x \neq 0$ means except close to x = 0. This is identical to the case of a single
slit, eqn (5.25), but the result is only valid outside the region of the undiffracted laser
beam, i.e., for $x > w_0$.
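To get a feel for the window of observation distances in Example 6.5, the two length scales defined there can be evaluated for representative numbers. The values below (a red laser pointer with a 1 mm beam radius and a 50 μm wire) are hypothetical and are not taken from the text.

```python
import math


def rayleigh_range(w0, wavelength):
    """z_R = pi * w0^2 / lambda for a beam of radius w0."""
    return math.pi * w0**2 / wavelength


def rayleigh_length(a, wavelength):
    """d_R = a^2 / lambda for an obstacle of width a."""
    return a**2 / wavelength


# Hypothetical parameters, in metres.
wavelength = 633e-9
w0 = 1.0e-3
a = 50e-6

z_r = rayleigh_range(w0, wavelength)
d_r = rayleigh_length(a, wavelength)
print(f"d_R = {d_r * 1e3:.2f} mm, z_R = {z_r:.2f} m")
```

For these assumed numbers the wire's Rayleigh length is a few millimetres while the beam's Rayleigh range is several metres, so an observation distance of order 0.1 m to 1 m would satisfy the condition used in the example.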
Chapter summary
• In Fourier optics, the light field is written as a superposition of

plane waves propagating at different angles. The distribution of
angles is known as the angular spectrum, and the propagation
of light is described by multiplying each plane-wave component by
a propagation phase as expressed by the hedgehog equation,

$$E^{(z)} = \mathcal{F}^{-1}\left[ e^{i k_z z}\, \mathcal{F}[E^{(0)}] \right] .$$
• In the paraxial regime, this propagation equation is equivalent to

the Fresnel diffraction integral.
• The Fraunhofer intensity diffraction pattern of aperture $f(x', y')$ is
$$I^{(z)} = \frac{I_0}{\lambda^2 z^2} \left| \mathcal{F}\left[ f(x', y') \right] \right|^2 .$$
• The Fourier relationship between real space and momentum gives
rise to an uncertainty relation for the position and momentum of
photons in a light field.
• The Fresnel diffraction integral can be re-written in the form of
a Fourier transform with Fourier variables u = x/(λz) and v =
y/(λz).
• The Fraunhofer intensity diffraction pattern of a regular array
of identical apertures is the product of the diffraction pattern that
would be obtained by one aperture, and a trigonometric function
of the array.
• Babinet’s principle states that the Fraunhofer intensity
diffraction patterns of complementary apertures are the same,
except on-axis.
Exercises
(6.1) Fourier series coefficients (1)
From the definition of the coefficients in eqn (6.1), and by following the steps outlined in the text, derive the explicit relations of eqns (6.2) and (6.3).

(6.2) Fourier series coefficients (2)
Show that the coefficients in eqn (6.1) can be combined into one amplitude and a phase for a cosine wave: $a_j \cos(2\pi u_j x) + b_j \sin(2\pi u_j x) \equiv \tilde{a}_j \cos(2\pi u_j x + \phi_j)$, and derive expressions for $\tilde{a}_j$ and $\phi_j$.

(6.3) Fourier series coefficients (3)
Derive expressions which relate the coefficients $c_j$ of eqn (6.4) to $\tilde{a}_j$ and $\phi_j$.

(6.4) Fourier series of a square wave
A square wave with spatial period d is defined within one period as
$$f(x) = \begin{cases} 1 & |x| \le d/4 \\ 0 & |x| > d/4 \end{cases} .$$
(i) Show that the Fourier series of this function has coefficients, $a_0 = 1/2$ and $a_j = -[2/(j\pi)] \sin j\pi/2$.
(ii) Sketch the function for the range $-d \le x \le d$.
(iii) Explain why all $b_j$ terms are zero.
(iv) Plot the Fourier series representation of the series using
(a) the DC term and the fundamental spatial frequency,
(b) the DC term, the fundamental, and the second harmonic, and
(c) the first ten non-zero terms.

(6.5) Fourier series of a rectified sine wave
(i) Calculate the Fourier coefficients for a sinusoidal wave with a spatial period d, $f(x) = \sin(2\pi x/d)$. The rectified wave is defined within one period d as
$$f(x) = \begin{cases} +\sin(2\pi x/d) & 0 \le x \le d/2 \\ -\sin(2\pi x/d) & d/2 \le x \le d \end{cases} .$$
(ii) Show that the Fourier series of this function has coefficients $b_j = 0$ and
$$a_j = \begin{cases} 2/\pi & j = 0 \\ 0 & j = 1, 3, 5 \ldots \\ -(4/\pi)[1/(j^2 - 1)] & j = 2, 4, 6 \ldots \end{cases} .$$
(iii) Sketch the function for the range $-d \le x \le d$.
(iv) Why are all $b_j$ terms zero?
(v) Why do we need only even numbered harmonics?
(vi) Plot the Fourier series representation of the series using
(a) only the DC term,
(b) the DC term and the second harmonic, and
(c) the first ten non-zero terms.

(6.6) Angular spectrum of a laser
(i) Write an expression for field amplitude along the x axis of a laser beam with beam waist $w_0$. Assume that the laser is propagating in the z direction and that the beam waist is in the z = 0 plane.
(ii) Write an expression for the x component of the angular spectrum of plane waves $A^{(0)}$ in the z = 0 plane.
(iii) Use your result to derive an expression for the angular divergence. What is assumed about the angular width of the intensity distribution?
(iv) Using the de Broglie relation, write an expression for the transverse momentum distribution, i.e. the probability of measuring a momentum, $p_x$, in the x direction. Comment on the normalization.
(v) Write an expression for the transverse momentum distribution a distance z downstream of the beam waist. Comment on your reasoning.

(6.7) Momentum distribution of a laser beam
Use the uncertainty principle for photons to derive the standard deviation in the photon momentum distribution in terms of the radius of the beam waist, $w_0$. Comment on why the momentum distribution is independent of the laser wavelength but the angular spread is not. [Hint: $\Delta x = w_0/2$.]

(6.8) The transverse velocity of photons
Write an equation for the angular divergence Δθ of a laser beam with wavelength λ and beam waist $w_0$. Use this expression to estimate an average transverse velocity of photons, $v_x$, for a red laser pointer with λ = 0.63 μm and $w_0$ = 1.0 mm. Estimate the difference between the longitudinal velocity, $v_z$, and the speed of light $c = 3.0 \times 10^{8}\ \mathrm{m\,s^{-1}}$. Assume that the beam is cylindrically symmetrical.

(6.9) Propagation
Explain, briefly, why if you know the light field in the z = 0 plane, $E^{(0)}$, you can then determine how the light will propagate downstream; whereas if you only know the intensity, $I^{(0)}$, you can’t. What information is missing in the intensity?

(6.10) Diffraction grating
A one-dimensional diffraction grating has a transmission profile,
$$T(x') = 0.5 + 0.4 \cos\left( 2\pi \frac{x'}{d} \right) + 0.1 \cos\left( 4\pi \frac{x'}{d} \right) ,$$
where d is the period of the grating. Show that the intensity Fraunhofer diffraction pattern consists of five spots. What is their angular location? Calculate the relative intensities of the five spots.

(6.11) Four identical apertures (1)
Four identical infinitesimally small holes are aligned along the x axis, with neighbours separated by b. The transmission function of the aperture is
$$T(x', y') = \delta(x' + 3b/2, y') + \delta(x' + b/2, y') + \delta(x' - b/2, y') + \delta(x' - 3b/2, y') .$$
Show that the Fraunhofer intensity diffraction pattern as a function of angles $\theta_x$, $\theta_y$ is given by
$$I(\theta_x, \theta_y) = 4 I_1 \left[ \cos\left( 2\pi \frac{3b\theta_x}{2\lambda} \right) + \cos\left( 2\pi \frac{b\theta_x}{2\lambda} \right) \right]^2 ,$$
where $I_1$ is the intensity which would be obtained from one hole. Sketch $I/I_1$ as a function of $\theta_x$.

(6.12) Four identical apertures (2)
Four identical circular apertures of radius a are aligned on a screen, with their centres at $(x', y')$ = (−3b/2, 0), (−b/2, 0), (b/2, 0), and (3b/2, 0) respectively (with b > a). Sketch two orthogonal cross-sections through the Fraunhofer intensity pattern, taking the directions parallel and perpendicular to a line joining the aperture centres.

(6.13) Two-dimensional regular array
Generalize the result for a one-dimensional regular array of identical apertures encapsulated in eqn (6.43) for a two-dimensional array. Assume that there are N copies spaced by $d_x$ along x, and M copies spaced by $d_y$ along y.

(6.14) Babinet’s principle
A hair with width a is placed in a collimated laser beam with beam radius $w_0$ ($a \ll w_0$). What is the effective Rayleigh length in this case? Is this larger or smaller than the Rayleigh length for the complementary aperture?

(6.15) Extinction paradox and exoplanets
Comment on whether the far-field condition can be met for the observation of exoplanets with a size equal to that of the Earth. Assume that the radius of the Earth and distance to the edge of the Milky Way are $6 \times 10^{6}$ m and $6 \times 10^{20}$ m, respectively, and that the observation is made at a wavelength of 600 nm.

(6.16) The letter H
Sketch the far-field diffraction pattern for an aperture with the shape of the letter H. If the width of the lines is a and the spacing of the vertical lines is d, what is the transverse displacement of the first minimum in horizontal and vertical directions?

(6.17) The letter E
Sketch the far-field diffraction pattern for an aperture with the shape of the letter E. If the width of the lines is a and the spacing between the horizontal lines is d, what is the transverse displacement of the first minimum in the horizontal and vertical directions?

(6.18) Two-dimensional Fraunhofer diffraction patterns
The images, (a)–(h) in Fig. 6.17 show the intensity patterns observed in the focal plane of a lens when different apertures are placed in the lens plane. Match the observed patterns to the apertures, (1)–(8), shown on the right.

(6.19) Fourier code
The images in Figs. 6.18 and 6.19 show sequences of Fraunhofer diffraction patterns observed when letter-shaped apertures (both upper and lower case) are placed in the input plane. Use the ideas of cartesian separability, interference, and symmetry to find each letter and decode the two words.
Fig. 6.17 (a)–(h) Examples of optical Fourier transforms. (1)–(8) input images. See Exercise 6.18.
Fig. 6.18 Fourier transform code. See Exercise 6.19.
Fig. 6.19 Another Fourier transform code. See Exercise 6.19.

7 Optical phenomena in the time domain

And time has told me, Not to ask for more, For some day our ocean, Will find its shore.
Nicholas Rodney Drake (Rangoon 1948–Tamworth-in-Arden 1974).

7.1 Introduction
7.2 Frequency spectrum
7.3 An optical pulse
7.4 Two pulses
7.5 Multiple pulses
7.6 Two frequencies
7.7 Many waves: propagation
7.8 Group propagation
7.9 Group velocity dispersion
7.10 Dispersive resonance
7.11 Slow light
7.12 Fast light
7.13 Information propagation
Chapter summary
Exercises

7.1 Introduction
The topic of this chapter is the study of various optical phenomena
where the time dependence of the electric field is not simply a sinusoidal
oscillation. A time-dependent amplitude corresponds to a superposition
of many waves with more than one frequency; therefore in this chapter
the restriction to monochromatic waves encountered in earlier chapters
is relaxed. We shall discuss optical pulses, i.e. excitations of
the electromagnetic field that are not continuous in time, and their
frequency spectra. We shall also consider how a superposition of waves
with different frequencies propagates, both in free space, where each
component propagates with the speed of light, and in dispersive
media, where different frequency components travel at different speeds.
We shall ignore polarization, and discuss solutions to the scalar wave
equation.
7.2 Frequency spectrum

In Chapter 6 we introduced the Fourier transform and applied this to
the case of monochromatic light with a spread in propagation angles—
the angular spectrum. In this chapter, we focus on a particular
propagation direction, and instead consider a range of wavelengths. The
difference between the angular spectrum of monochromatic light, and
the frequency spectrum of unidirectional white light can be illustrated
schematically by considering their respective wave vector distributions,
as in Fig. 7.1. Recall from Chapter 6 that the Fourier transform in
the time domain, eqn (6.13), tells us that the amplitude of the field
component with angular frequency between ω and ω + dω, $E_0 F(\omega)\,d\omega$, is
given by the Fourier transform of $E_0 f(t)$, where
$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{i\omega t}\, dt = \mathcal{F}\left[ f(t) \right](\omega) , \tag{7.1}$$
and f(t) describes the time dependence of the field. The quantity
$A(\omega) = E_0 F(\omega)$ represents the frequency spectrum which can also be
written as a function of k or λ. Note that the units of the spectrum
A(ω) are different to the units of the angular spectrum A.1 Next, we use
the Fourier transform to find the frequency spectrum of a square pulse.

Fig. 7.1 Schematic illustration of the distribution of wave vectors: (i) for the case of the angular spectrum of monochromatic light, considered in Chapter 6, and (ii) for the frequency spectrum of unidirectional plane waves with different frequencies, considered in this chapter.

1 As a consequence of the different variables in the Fourier transform, eqns (6.7) and (6.13).
7.3 An optical pulse

Nowadays it is possible to construct lasers whose frequency instability
is of the order of $10^{-18}$; i.e. the assumption that the output beam from
the laser is monochromatic is excellent. Such a beam is frequently called
a cw beam, where cw stands for continuous wave. We can think of the
time dependence as being a single cosine with centre angular frequency,
$\omega_c$, and a constant amplitude. The envelope function for this wave is
constant. In this section we ask a simple question: when a cw wave is
extinguished by a shutter, such that the light amplitude is non-zero for
a fixed amount of time, τ, as in Fig. 7.2, what happens to the spectrum
of the light? From the discussion of time-domain Fourier transforms
in Chapter 6 and Appendix B we anticipate that the answer is that a
range of frequencies is needed to make a temporal pulse, i.e., pulsed light
is no longer monochromatic. We shall use Fourier techniques to derive
the form of the spectral distribution of the light.

Fig. 7.2 A monochromatic wave that is turned on and off suddenly to form a pulse.

Consider passing a cw laser beam through a shutter that only lets the
light pass for a duration τ. If the angular frequency2 of the laser is $\omega_c$
and the electric field amplitude is $E_0$ then we can write the temporal
profile of a single pulse, $E_0 f(t)$, where
$$f(t) = \begin{cases} 0 & |t| > \tau/2 \\ \cos \omega_c t & |t| \le \tau/2 \end{cases} , \tag{7.2}$$
which is plotted in Fig. 7.2.

2 In electronic engineering, this frequency is referred to as the carrier frequency.

The Fourier integral can either be evaluated directly, or by using the
techniques from Appendix B. We recognize f(t) as the product of a cosine
function and an envelope function which is the rect function, therefore
$$\begin{aligned} F(\omega) = \mathcal{F}\left[ f(t) \right](\omega) &= \mathcal{F}\left[ \mathrm{rect}\left( \frac{t}{\tau} \right) \times \cos \omega_c t \right] , \\ &= \mathcal{F}\left[ \mathrm{rect}\left( \frac{t}{\tau} \right) \right] * \left( \mathcal{F}\left[ \cos \omega_c t \right] \right) , \\ &= \frac{\tau}{2}\, \mathrm{sinc}\left( \frac{\omega\tau}{2} \right) * \left[ \delta(\omega + \omega_c) + \delta(\omega - \omega_c) \right] , \\ &= \frac{\tau}{2} \left\{ \mathrm{sinc}\left[ \frac{(\omega + \omega_c)\tau}{2} \right] + \mathrm{sinc}\left[ \frac{(\omega - \omega_c)\tau}{2} \right] \right\} , \end{aligned} \tag{7.3}$$
where we have used the convolution theorem between steps two and
three, and eqns (B.9) and (B.14) between steps three and four. Thus
the frequency spectrum, $A_\omega = A(\omega) = E_0 F(\omega)$, is two displaced sinc
functions, as shown in Fig. 7.3. It is also noteworthy that the width of
the function F(ω) decreases as the temporal duration, τ, of the pulse
increases. This is an example of the bandwidth theorem that will
crop up on numerous occasions in this book. In order to produce short
optical pulses it is necessary to have a large bandwidth. It is possible to
produce optical pulses with durations of only a few optical cycles; these
are extremely broad band with a bandwidth that is a significant fraction
of the central frequency.

Fig. 7.3 The frequency spectrum, $A_\omega = E_0 F(\omega)$, of a ‘square’ pulse, eqn (7.3). In this example, the pulse duration is $\tau = 40/\omega_c$, where $\omega_c$ is the central angular frequency of the monochromatic wave.
A question that arises from eqn (7.3) and Fig. 7.3 is: what are negative
angular frequencies? This question raises an important difference
between the spatial and temporal Fourier transforms. In Section 6.3
we interpreted negative spatial frequencies as plane waves inclined at
negative angles relative to the optical axis. However, for temporal
frequencies no such interpretation is possible. Recall that in Section 1.11
we introduced complex notation for mathematical convenience; the
appearance of negative frequencies is a consequence of that choice.
Optical waves have real electric fields, and the information found in
the negative-frequency components is redundant, being merely a copy of the
information found in the positive-frequency components. Consequently
we shall restrict our attention to the positive-frequency domain, $0 \le \omega < \infty$, from now on.3

3 Further details of why the information in the negative-frequency components is redundant can be found in the end-of-chapter exercises. See also Brooker (2003) for a thorough discussion.
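The spectrum of the square pulse, eqn (7.3), can be checked against a direct numerical evaluation of the Fourier integral, eqn (7.1), applied to the pulse of eqn (7.2). The sketch below uses a midpoint rule and the pulse duration of Fig. 7.3, τ = 40/ω_c, with ω_c set to 1; the function names and the number of integration steps are illustrative assumptions.

```python
import cmath
import math


def sinc(x):
    """sin(x)/x, with the limiting value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x


def spectrum_analytic(omega, omega_c, tau):
    """Two displaced sinc functions, eqn (7.3)."""
    return 0.5 * tau * (
        sinc(0.5 * (omega + omega_c) * tau) + sinc(0.5 * (omega - omega_c) * tau)
    )


def spectrum_numeric(omega, omega_c, tau, steps=4000):
    """Midpoint-rule estimate of eqn (7.1) for the pulse of eqn (7.2)."""
    dt = tau / steps
    total = 0j
    for j in range(steps):
        t = -0.5 * tau + (j + 0.5) * dt
        total += math.cos(omega_c * t) * cmath.exp(1j * omega * t)
    return total * dt


omega_c, tau = 1.0, 40.0
for omega in (0.8, 1.0, 1.2):
    print(omega, spectrum_numeric(omega, omega_c, tau).real,
          spectrum_analytic(omega, omega_c, tau))
```

The imaginary part of the numerical result is negligible because the pulse is symmetric in time. Since the positive-frequency sinc has its first zeros at ω − ω_c = ±2π/τ, halving τ doubles the width of the spectrum, which is the bandwidth theorem in action.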
7.4 Two pulses

Next, we extend our discussion to more complicated functions of time,
(N )
first two pulses and then multiple pulses. The comb function, XT (t),
see Appendix B, allows us to replicate a single-pulse profile N times,
here in time rather than in space. Figure 7.4(ii) shows the envelope
function,

t (2)
f(t) = rect ∗ XT (t) cos ωc t , (7.4)
τ
of two pulses, each of duration τ and separated by a time T. The Fourier
transform (for the positive-frequency region) gives

$$F(\omega) = \tau\,\mathrm{sinc}\!\left[\frac{(\omega-\omega_c)\tau}{2}\right]\cos\!\left[\frac{(\omega-\omega_c)T}{2}\right] . \tag{7.5}$$

Fig. 7.4 (i) A single pulse of duration τ. (ii) Two pulses of duration τ separated by a time T. Note that if we replace t by x, τ by a, and T by d, then we obtain the aperture function for a double slit with slit width, a, and spacing, d, see Chapter 5.

In Fig. 7.5 we plot the modulus-squared of the Fourier transform,
$|F(\omega)|^2$, which is related to the power spectrum of the light field
(the power spectrum is discussed in detail in Section 8.5). Note the
similarity with a double-slit diffraction pattern, see Chapter 5. A
two-pulse (‘double-slit in time’) excitation scheme is used in atomic
clocks and quantum gates and is known as a Ramsey interferometer.
The interference fringes in Fig. 7.5 are referred to as Ramsey fringes,
after Norman Foster Ramsey (Washington 1915–Wayland 2011) who
was awarded the 1989 Nobel Prize for the invention of the separated
oscillatory fields method and its use in the hydrogen maser and other
atomic clocks.
Fig. 7.5 The spectrum of a pair of

‘square’ pulses demonstrating Ramsey
fringes. The envelope (dashed) has a
width (the angular frequency difference
between the zeros on either side of the
main peak) Δω = 2π/τ , where τ is
the duration of each pulse. Compare
to Fig. 5.17.
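The Ramsey-fringe structure of eqn (7.5) can be explored numerically. The following is a minimal sketch, not taken from the text: the function names and the values of τ and T are illustrative assumptions, and the sinc convention sin(x)/x is assumed to match eqn (7.5).

```python
import numpy as np

def sinc(x):
    """Unnormalized sinc, sin(x)/x (assumed convention of eqn (7.5))."""
    return np.sinc(x / np.pi)  # np.sinc(u) is sin(pi*u)/(pi*u)

def two_pulse_spectrum(delta, tau, T):
    """|F|^2 from eqn (7.5), with delta = omega - omega_c."""
    return (tau * sinc(delta * tau / 2) * np.cos(delta * T / 2)) ** 2

# Illustrative (assumed) values in arbitrary time units
tau, T = 1.0, 5.0
delta = np.linspace(-3 * np.pi / tau, 3 * np.pi / tau, 6001)
spectrum = two_pulse_spectrum(delta, tau, T)

# Features that follow directly from eqn (7.5):
first_fringe_zero = np.pi / T        # cos factor vanishes: first dark fringe
fringe_spacing = 2 * np.pi / T       # separation of adjacent bright fringes
first_envelope_zero = 2 * np.pi / tau  # sinc factor vanishes
```

Increasing the pulse separation T narrows the fringes without changing the envelope, which is the property exploited in a Ramsey interferometer.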
7.5 Multiple pulses

Now, we extend the sequence of two pulses to N pulses—the time
analogue of a diffraction grating, see Example 6.4. In this case, the
time dependence has the form

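$$f(t) = \left[\mathrm{rect}\!\left(\frac{t}{\tau}\right) * X_T^{(N)}(t)\right]\cos\omega_c t . \tag{7.6}$$

Writing $X_T^{(N)}(t)$ as a product of $X_T(t)$ and rect[t/(NT)], see Section B.9,
we obtain the Fourier transform

$$F(\omega) = \tau\,\mathrm{sinc}\!\left[\frac{(\omega-\omega_c)\tau}{2}\right]\left\{X_{1/T}(\omega) * \mathrm{sinc}\!\left[\frac{(\omega-\omega_c)NT}{2}\right]\right\} . \tag{7.7}$$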
The spectrum, $|F(\omega)|^2$, is shown in Fig. 7.6, and corresponds to a
sequence of equally spaced (narrow) sinc-squared functions with a
(broad) sinc-squared envelope. It is evident that as the number of pulses
gets very large, only certain frequencies appear in the spectrum. In this
limit, the Fourier transform can be approximated by a Fourier series,
because the time dependence is periodic. Stabilized lasers that emit
a series of discrete, equally spaced frequency ‘lines’ are called optical
frequency combs, and find great utility in metrology and precision
measurements.4

4 Half of the 2005 physics Nobel Prize was awarded to John Hall and Theodor Hänsch for their contributions to the development of laser-based precision spectroscopy, including the optical frequency comb technique.
Fig. 7.6 The spectrum of a train of

‘square’ pulses. The spacing and width
of the peaks are Δω = 2π/T and
Δω = 2π/(N T ), respectively, where N
is the number of pulses in the train.
The envelope (dashed) has a width (the
angular frequency difference between
the zeros on either side of the main
peak), Δω = 2π/τ , where τ is the
duration of each pulse. Compare to
Fig. 5.18.
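The peak spacing and peak width quoted for the pulse train can be checked by summing the contributions of the N pulses directly, rather than using the comb function. This is a sketch under stated assumptions: the pulses are taken to be centred at $t_n = [n-(N-1)/2]T$, the overall normalization is arbitrary, and the function name and parameter values are illustrative.

```python
import numpy as np

def train_amplitude(delta, tau, T, N):
    """Spectral amplitude of N rectangular pulses by direct summation.

    delta = omega - omega_c. Each pulse contributes the single-pulse
    spectrum tau*sin(x)/x (x = delta*tau/2) times a phase exp(i*delta*t_n).
    """
    delta = np.atleast_1d(np.asarray(delta, dtype=float))
    t_n = (np.arange(N) - (N - 1) / 2) * T          # assumed pulse centres
    array_factor = np.exp(1j * np.outer(delta, t_n)).sum(axis=1).real
    single_pulse = tau * np.sinc(delta * tau / (2 * np.pi))
    return single_pulse * array_factor

# Illustrative (assumed) values in arbitrary time units
tau, T, N = 1.0, 5.0, 8

peak_spacing = 2 * np.pi / T          # comb lines at delta = 2*pi*m/T
peak_half_width = 2 * np.pi / (N * T)  # first zero next to each line
```

The array factor is the geometric-series sum sin(NδT/2)/sin(δT/2), the same expression that describes an N-slit grating.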
We can also harness the power of Fourier techniques to answer the

inverse question: given a spectrum F(ω), can we calculate the expected
temporal profile f(t)? The answer is given by the inverse Fourier
transform, eqn (6.12), reproduced here for convenience:

$$f(t) = \int_{-\infty}^{\infty} F(\omega)\,e^{-i\omega t}\,\frac{d\omega}{2\pi} = \mathcal{F}^{-1}[F(\omega)](t) . \tag{7.8}$$

We illustrate this technique with two examples where the spectrum F(ω)
is known: first an idealized pulsed laser, and second a gas laser.5

5 See Hooker and Webb (2010) for a full discussion of laser physics.
Example 7.1
A mode-locked laser: In a laser cavity of length L, see Fig. 11.6 in Chapter 11,
the boundary conditions for the electric field at the mirrors lead to only discrete
angular frequencies; for the mth mode the angular frequency is $\omega_m = m\omega_1 = m\pi c/L$.
Therefore the spectrum is a sum of electric fields of the form $E_m \exp(-i\omega_m t + i\phi_m)$,
with a phase $\phi_m$ for each mode. By a process known as mode locking, it is possible
to arrange for all the modes to oscillate with the same phase, $\phi_m$, in which case
we can write the spectrum as $F(\omega) = \sum_m F_m \exp(-i\omega_m t)$. If we assume that all
N modes have the same amplitude, then we can write F(ω) as a product of the
infinite frequency-replicating function, $X_{\omega_1}(\omega)$, and a rectangle function that selects
N modes, $\mathrm{rect}[(\omega-\omega_c)/\Delta\omega]$: $F(\omega) = F_0\,X_{\omega_1}(\omega)\,\mathrm{rect}[(\omega-\omega_c)/\Delta\omega]$; here $F_0$ has
dimensions of time. The rect function is centred on $\omega_c$, the central angular frequency
of the laser spectrum, and $\Delta\omega = (N-1)\omega_1$ is the bandwidth occupied by the excited
modes. The mode number for the central frequency is given by the ratio of the cavity
length to half the wavelength, 2L/λ. For lasers with a cavity length of L ∼ 1 m, the
excited modes have mode numbers of order $10^6$. We use eqn (7.8) to calculate the
temporal profile:
$$\begin{aligned}
f(t) &= \mathcal{F}^{-1}[F(\omega)](t) = F_0\,\mathcal{F}^{-1}\!\left[X_{\omega_1}(\omega)\times\mathrm{rect}\!\left(\frac{\omega-\omega_c}{\Delta\omega}\right)\right] ,\\
&= F_0\,\mathcal{F}^{-1}[X_{\omega_1}(\omega)] * \mathcal{F}^{-1}\!\left[\mathrm{rect}\!\left(\frac{\omega-\omega_c}{\Delta\omega}\right)\right] ,\\
&= \frac{F_0\,\Delta\omega}{4\pi^2}\exp(-i\omega_c t)\left[X_T(t) * \mathrm{sinc}\!\left(\frac{\Delta\omega\,t}{2}\right)\right] ,
\end{aligned} \tag{7.9}$$

where T = 2π/ω1. Note that we have used the convolution theorem between steps
two and three, and eqns (B.5), (B.9), and (B.13) between steps three and four. This
function is a set of periodically displaced sinc functions in time, and is known as a
mode-locked pulse train.6 The field profile is shown in Fig. 7.7. We see that the
pulses are separated in time by T = 2π/ω1 = 2L/c: this is the round-trip time of
the cavity. It is also evident that the temporal width of the pulses decreases as the
bandwidth of the laser, Δω, increases.

Fig. 7.7 The time dependence of the output of a mode-locked laser. The pulses are separated by the cavity round-trip time, T = 2L/c, and have a width inversely proportional to the laser bandwidth, τ = 2π/Δω.

6 It can also be written explicitly as a sum without the comb function:
$f(t) = \frac{F_0\,\Delta\omega}{4\pi^2}\exp(-i\omega_c t)\sum_{n=-\infty}^{+\infty}\mathrm{sinc}\!\left[\Delta\omega\,(t-nT)/2\right]$.
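The pulse train of Example 7.1 can also be built by adding the locked modes directly. The sketch below makes several assumptions that are not in the text: an odd number of modes placed symmetrically about the central mode, unit amplitudes, the carrier factor exp(−iωc t) removed, and illustrative values for L and N.

```python
import numpy as np

c = 299_792_458.0   # speed of light in vacuum (m/s)
L = 1.0             # cavity length (m), illustrative
N = 21              # number of locked modes (odd), illustrative

omega_1 = np.pi * c / L      # mode spacing, omega_1 = pi*c/L
T = 2 * np.pi / omega_1      # pulse separation, equal to 2L/c

def envelope(t):
    """Sum of N equal-amplitude, equal-phase modes with the carrier removed."""
    j = np.arange(N) - (N - 1) // 2            # mode offsets -10 ... +10
    phases = omega_1 * np.outer(np.atleast_1d(t), j)
    return np.exp(-1j * phases).sum(axis=1)

t = np.linspace(-0.5 * T, 1.5 * T, 4001)
intensity = np.abs(envelope(t)) ** 2

# First zero beside each pulse; approaches 2*pi/delta_omega for large N,
# where delta_omega = (N - 1)*omega_1
first_zero = T / N
```

Doubling N at fixed mode spacing halves the pulse width and leaves the separation T unchanged.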
In an actual laser, the spectrum of modes does not have a rectangular

profile. In the next example, we consider an Ar+ ion laser, where the ions
move with a range of velocities, and from kinetic theory we know that
the distribution of the velocity component along any axis is a gaussian.
As a consequence of the Doppler effect, the distribution of the mode
amplitudes is also gaussian.
Example 7.2
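Ar+ ion laser: For the Ar+ ion laser we can write the Fourier spectrum as

$$F(\omega) = X_{\omega_1}(\omega)\,G_0\exp\!\left[-\frac{(\omega-\omega_c)^2}{\Delta\omega^2}\right] ,$$

where Δω is the bandwidth of the excited modes, and ωc the central angular
frequency of the laser. When all the modes are locked, we calculate the temporal
shape of the output:

$$\begin{aligned}
f(t) &= \mathcal{F}^{-1}[F(\omega)](t) ,\\
&= \mathcal{F}^{-1}\!\left[X_{\omega_1}(\omega)\times G_0\exp\!\left(-(\omega-\omega_c)^2/\Delta\omega^2\right)\right] ,\\
&= G_0\,\mathcal{F}^{-1}[X_{\omega_1}(\omega)] * \mathcal{F}^{-1}\!\left[\exp\!\left(-(\omega-\omega_c)^2/\Delta\omega^2\right)\right] ,\\
&= \frac{\Delta\omega\,G_0}{4\pi^{3/2}}\exp(-i\omega_c t)\left[X_T(t) * \mathrm{gauss}\!\left(\frac{\Delta\omega\,t}{2}\right)\right] .
\end{aligned} \tag{7.10}$$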
This mode-locked pulse train, shown schematically in Fig. 7.8, is a set of periodically
displaced gaussian functions in time. As expected, the pulses are separated in time
by 2L/c, and again it is evident that the temporal width of the pulses decreases as
the bandwidth of the laser increases.

Fig. 7.8 The time dependence of the output of a mode-locked laser with a gaussian gain profile. The gaussian pulses are separated by the cavity round-trip time, T = 2L/c, and have a (1/e-intensity) width that is inversely proportional to the laser bandwidth, τ = 1/Δω.

7.6 Two frequencies
As we have done in earlier chapters, we will start our investigation of
optical fields that are not monochromatic by considering the simplest
possible case: the addition of two waves of almost equal frequency. Let
E1 and E2 be two harmonic waves propagating along the z direction.
Their sum is
E = E0 cos(k1 z − ω1 t) + E0 cos(k2 z − ω2 t), (7.11)
where we have assumed for simplicity that both waves have the same
amplitude. Using a standard trigonometric identity7 we can rewrite this
as

$$E = 2E_0\cos\!\left(\bar{k}z - \bar{\omega}t\right)\cos\!\left(\Delta k\,z/2 - \Delta\omega\,t/2\right) , \tag{7.12}$$

where we have introduced the average quantities $\bar{k} = (k_1+k_2)/2$,
$\bar{\omega} = (\omega_1+\omega_2)/2$, and the differences $\Delta k = k_2 - k_1$, and $\Delta\omega = \omega_2 - \omega_1$.
This solution represents a wave that is modulated in space and time, as
is shown in Fig. 7.9. In addition to the (fast) oscillation at frequency $\bar{\omega}$,
the second term in eqn (7.12) represents a slowly varying envelope.
This is an example of the well-known phenomenon of beats, familiar
from other branches of wave physics.8 By considering the coefficients
of the space and time components in eqn (7.12), it is evident that
the envelope has a velocity of Δω/Δk. We saw in Section 1.13 that
in free space, any pulse shape $E = E_0 f(z-ct)$ propagates without
changing shape. The function in eqn (7.12) is a linear superposition
of two functions of this form. From a Fourier perspective it is easy
to explain the invariance of the shape: an arbitrary function f(z − ct)
can be decomposed into a superposition of plane waves; in free space
all these components propagate at the same speed, c, therefore the
constructive and destructive interference that leads to a particular shape
of the function f(z − ct) does not change. Specifically for this case,
Δω/Δk can be evaluated by using the phase velocities of the individual
components, $\omega_1/k_1 = c$ and $\omega_2/k_2 = c$, as

$$\frac{\Delta\omega}{\Delta k} = \frac{\omega_2-\omega_1}{k_2-k_1} = \frac{c\,(k_2-k_1)}{k_2-k_1} = c . \tag{7.13}$$

7 $\cos A + \cos B = 2\cos\!\left(\frac{A+B}{2}\right)\cos\!\left(\frac{A-B}{2}\right)$.

8 See, for example, Freegarde (2012).

Fig. 7.9 The interference of two waves with different angular frequencies, ω1 and ω2. If the two waves have the same phase velocity, $\omega_1/k_1 = \omega_2/k_2$, there is no dispersion and the wave form propagates without changing. In a dispersive medium $\omega_1/k_1 \neq \omega_2/k_2$ and the wave envelope propagates at a different speed to the wave.
We see that both the components, and their superposition—or group—
propagate at the same velocity, c. This type of medium is known as
dispersionless or non-dispersive. The same applies to a medium
where all components propagate at phase velocity vp = c/n, where the
refractive index n is independent of frequency.
In contrast, in media where not all of the Fourier components
propagate at the same speed—a dispersive medium—the locations
of the regions of constructive and destructive interference evolve as the
wave propagates. If the two waves in eqn (7.11) have different phase
velocities, i.e., $\omega_1/k_1 \neq \omega_2/k_2$, the superposition, eqn (7.12), will no
longer be invariant, and the profile of the group will evolve with time.
Next, we extend these ideas to many waves.
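The two-wave argument of this section can be illustrated with a short calculation of the envelope velocity Δω/Δk, with and without dispersion. The frequencies and refractive indices below are hypothetical values chosen only for illustration, and the function name is an assumption.

```python
import math

c = 299_792_458.0  # speed of light in vacuum (m/s)

def envelope_velocity(omega_1, omega_2, n_1, n_2):
    """Envelope (beat) velocity delta_omega/delta_k for two waves.

    Each wave has wave number k_i = n_i * omega_i / c, where n_i is the
    refractive index at omega_i (hypothetical inputs).
    """
    k_1 = n_1 * omega_1 / c
    k_2 = n_2 * omega_2 / c
    return (omega_2 - omega_1) / (k_2 - k_1)

omega_1 = 2 * math.pi * 400e12   # rad/s, illustrative
omega_2 = 2 * math.pi * 401e12   # rad/s, illustrative

# Dispersionless: same index at both frequencies, envelope moves at c/n
v_flat = envelope_velocity(omega_1, omega_2, 1.5000, 1.5000)

# Dispersive: index rises slightly with frequency, envelope lags both waves
v_disp = envelope_velocity(omega_1, omega_2, 1.5000, 1.5001)
```

For these assumed numbers an index change of only $10^{-4}$ lowers the envelope velocity from c/1.5 to about c/1.54, because the small index difference is multiplied by the large ratio of the frequency to the frequency difference.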
7.7 Many waves: propagation

In Chapter 6 we analysed the propagation of monochromatic light using
the angular spectrum method. This is an example of a general approach
to solving differential equations, known as spectral methods, and is
equally applicable to propagating light fields in time. In this section, we
derive the analogue of the hedgehog equation, eqn (6.29), in the time
domain, and plot the position of the pulse at different times, Fig. 7.10.

Let A(k) be the Fourier spectrum of the wave vectors, illustrated in
Fig. 7.1, defined by the relation

$$E(z,t) = \int_{-\infty}^{\infty} A(k)\,e^{i[kz-\omega(k)t]}\,dk , \tag{7.14}$$

where ω(k) is known as the dispersion relation.9 Note that A(k) is
different and has different units to the angular frequency spectrum,
A(ω). From eqn (7.14) we see that A(k) is defined as the Fourier
transform of the spatial profile of the electric field at time t = 0:

$$A(k) = E_0\,\mathcal{F}[f(z,0)](k) = E_0\int_{-\infty}^{\infty} f(z,0)\,e^{-ikz}\,dz . \tag{7.15}$$

After a time t each frequency component has acquired a phase, $e^{-i\omega(k)t}$,
and the field is given by a superposition of each phase-shifted component
that has the form of an inverse transform:

$$E(z,t) = \mathcal{F}^{-1}\!\left[e^{-i\omega(k)t}\,\mathcal{F}[E(z,0)]\right] , \tag{7.16}$$

Fig. 7.10 A gaussian pulse, $E(z,0) = E_0\,e^{ik_c z}e^{-z^2/z_0^2}$, propagating through a dispersionless medium. The normalized electric field, $E/E_0 = f(z,t)$ (grey), and intensity $I/I_0 = |f(z,t)|^2$ (black), at times t = 0 (bottom), t/2 (middle), and t (top). The centre of the pulse propagates at an effective speed c/n, where for a dispersionless medium n is independent of frequency.

9 Rather confusingly, a medium with a linear dispersion relation, where the refractive index is independent of frequency over the region of interest, is called dispersionless or non-dispersive.
where E(z, 0) is the field at time t = 0. This hedgehog-in-time

equation is the time equivalent of the angular spectrum propagation
equation, eqn (6.29). Next, we solve this equation for the simplest
case of propagation in a dispersionless medium, where ω(k) is linearly
proportional to k.
Example 7.3
Dispersionless propagation: In a dispersionless medium, we can write the
dispersion relation as ω(k) = (c/n)k, where the refractive index, n, is independent
of frequency. In this case, eqn (7.16) is exactly solvable using the same method
we used to derive the Fresnel diffraction integral, Section 6.5. Writing
$h = \mathcal{F}^{-1}[e^{-i(c/n)kt}](z) = \delta[z-(c/n)t]$ and using the inverse convolution theorem, we
find

$$E(z,t) = \delta[z-(c/n)t] * E(z,0) = E[z-(c/n)t,\,0] . \tag{7.17}$$
The result is that the wave translates a distance z = (c/n)t without changing shape,
as shown in Figs. 7.10 and 7.11. This is the same result as we found in Section 1.13;
however, the use of eqn (7.16) is particularly powerful, as it can now be applied for
any form of ω(k), as we will show.
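Equation (7.16) translates almost line for line into a discrete Fourier transform. The following is a minimal sketch, assuming a periodic grid, dimensionless units with c/n = 1, and illustrative pulse parameters; the function name `propagate` is hypothetical. NumPy's forward transform uses the kernel $e^{-ikz}$, as in eqn (7.15).

```python
import numpy as np

def propagate(E0, dz, t, omega_of_k):
    """Apply eqn (7.16): E(z,t) = F^-1[ exp(-i*omega(k)*t) * F[E(z,0)] ].

    E0 is sampled on a uniform grid of spacing dz; omega_of_k is any
    dispersion relation omega(k). The grid is implicitly periodic.
    """
    k = 2 * np.pi * np.fft.fftfreq(E0.size, d=dz)
    return np.fft.ifft(np.exp(-1j * omega_of_k(k) * t) * np.fft.fft(E0))

# Dimensionless, illustrative parameters (assumptions, not from the text)
v = 1.0               # phase velocity c/n
z0, k_c = 1.0, 10.0   # pulse width and carrier wave number
n_points, half_width = 4096, 40.0
dz = 2 * half_width / n_points
z = -half_width + dz * np.arange(n_points)

def gaussian_pulse(z):
    """Gaussian envelope times carrier, normalized to unit peak."""
    return np.exp(1j * k_c * z) * np.exp(-(z / z0) ** 2)

t = 10.0
E_t = propagate(gaussian_pulse(z), dz, t, lambda k: v * k)
peak_position = z[np.argmax(np.abs(E_t) ** 2)]
```

With the linear relation ω(k) = (c/n)k the result is a pure translation, as in eqn (7.17). Passing a different `omega_of_k` applies the same routine to a dispersive medium, provided the pulse stays well inside the grid.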
7.8 Group propagation

To study the propagation of optical pulses in media with any dispersion
relation, we expand ω(k) around the central angular frequency, ωc =
(c/nc )kc , using a Taylor series:10
$$\omega(k) = \omega_c + \left.\frac{d\omega}{dk}\right|_{k_c}(k-k_c) + \frac{1}{2}\left.\frac{d^2\omega}{dk^2}\right|_{k_c}(k-k_c)^2 + \cdots . \tag{7.18}$$

Fig. 7.11 The same as Fig. 7.10 except for a rectangular pulse.

10 Taylor’s theorem enables a function f(x) to be expanded in a power series in x in a given interval, and states that if f(x) is a continuous, single-valued function of x with continuous derivatives f′(x), f″(x), and so on, in a given interval, then
$f(x) = f(a) + \frac{(x-a)}{1!}f'(a) + \frac{(x-a)^2}{2!}f''(a) + \cdots$.

First, we revisit the dispersionless case by retaining only the first two
terms in the Taylor expansion. Substituting these into eqn (7.14), we
obtain

$$E(z,t) = e^{i(k_c z-\omega_c t)}\int_{-\infty}^{\infty} A(k)\exp\!\left\{-i\left[\left.\frac{d\omega}{dk}\right|_{k_c} t - z\right](k-k_c)\right\}dk . \tag{7.19}$$

We notice that the term outside the integral is a conventional plane
wave with angular frequency ωc. The amplitude of the plane wave is
modulated by an envelope function given by the integral. Note the
similarity to how the wave form was modulated in eqn (7.12). The
integral is a function of the composite variable

$$\xi = \left.\frac{d\omega}{dk}\right|_{k_c} t - z . \tag{7.20}$$

If we call the integral, or envelope function, F(ξ), then the field can be
written as

$$E(z,t) = E_0\,e^{i(k_c z-\omega_c t)}\,F\!\left(\left.\frac{d\omega}{dk}\right|_{k_c} t - z\right) . \tag{7.21}$$
The peak of the Fourier integral will occur when the phase factor is zero,
such that all the components add constructively. Therefore the group
will move to a location where

$$\left.\frac{d\omega}{dk}\right|_{k_c} t = z , \tag{7.22}$$

and we can define a group velocity as11

$$v_{gp} = \frac{d\omega}{dk} . \tag{7.23}$$

11 As the derivative is always evaluated at kc (equivalently carrier angular frequency, ωc), the subscript to denote this is typically omitted.
In a dispersionless medium, where only the first two terms in
eqn (7.18) are significant, we learn from eqn (7.21) that the pulse
propagates without distortion in shape, as the envelope function that
defines the group propagates at a constant velocity—the group velocity.
Figure 7.11 is an example of the evolution of an optical pulse at the
group velocity—the same as Fig. 7.10 but for a rectangular pulse. Also,
although we have restricted our attention to waves moving along one
direction (z), the full vector group velocity can be calculated from the
gradient of ω with respect to the wave vector, $\mathbf{v}_{gp} = \nabla_{\mathbf{k}}\,\omega$.
Recalling that the refractive index is defined by the relation
$ck = n(\omega)\,\omega$, we can also write the expression for the group velocity as

$$v_{gp} = \frac{c}{n + \omega\,dn/d\omega} . \tag{7.24}$$
This result helps us to classify two different regimes: (i) when dn/dω >
0, this case is referred to as normal dispersion, and vgp < vp ; and (ii)
when dn/dω < 0, this case is referred to as anomalous dispersion,
and $v_{gp} > v_p$.12 It must be emphasized that the group velocity is not
the average of the phase velocities. Equation (7.24) says that the group
velocity depends both on the refractive index and its derivative with
respect to frequency. Many curious optical phenomena arise when the
phase and group velocities are vastly different in magnitude; indeed,
the phase and group velocities can even have different signs. We shall
encounter some examples in the following sections. In a medium where
the phase velocity (or equivalently the refractive index) is constant over the
frequency range spanned by the pulse, the group and phase velocities do
not differ. As expected, in vacuum, $v_{gp} = v_p = c$.

12 There are other mathematically equivalent ways of writing the expression for group velocity, as the refractive index can also be given as a function of wavelength or wave vector magnitude; an end-of-chapter exercise investigates some of the alternatives.
The concept of a group index is also useful in the context of dispersive
media. The group index, ngp , is defined as
$$n_{gp} = \frac{c}{v_{gp}} = n + \omega\frac{dn}{d\omega} . \tag{7.25}$$

We will now look at what happens when the gradient term, ω dn/dω,
dominates.
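Equations (7.24) and (7.25) are straightforward to evaluate numerically for any index model n(ω). The sketch below uses a central finite difference for dn/dω; the linear index model, its coefficients, and the function names are hypothetical and serve only to exercise the formulas.

```python
import math

C = 299_792_458.0  # speed of light in vacuum (m/s)

def group_index(n_of_omega, omega, rel_step=1e-6):
    """Group index n + omega*dn/domega, eqn (7.25), by central difference."""
    h = rel_step * omega
    dn_domega = (n_of_omega(omega + h) - n_of_omega(omega - h)) / (2 * h)
    return n_of_omega(omega) + omega * dn_domega

def group_velocity(n_of_omega, omega):
    """Group velocity c/(n + omega*dn/domega), eqn (7.24)."""
    return C / group_index(n_of_omega, omega)

# Hypothetical index model with normal dispersion (dn/domega > 0)
n0, a = 1.5, 1.0e-17              # a in seconds, illustrative

def n_linear(omega):
    return n0 + a * omega

omega_c = 2 * math.pi * 382e12    # illustrative carrier, rad/s
n_gp = group_index(n_linear, omega_c)
v_gp = group_velocity(n_linear, omega_c)
v_p = C / n_linear(omega_c)
```

For this assumed model the group index is n0 + 2aω, larger than the phase index n0 + aω, so the group travels more slowly than the phase fronts, as expected for normal dispersion.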
7.9 Group velocity dispersion

The third term in the Taylor expansion of the dispersion relation,
eqn (7.18), depends quadratically on the magnitude of the wave vector,
i.e. it is proportional to $(k-k_c)^2$. This term leads to a group velocity,
$v_{gp} = d\omega/dk$, that is frequency dependent, and is associated with the
phenomenon of group velocity dispersion (GVD), which causes a
pulse to change shape as it propagates.
For the case of a gaussian pulse, illustrated in Fig. 7.12, the change
of shape is an increase in the pulse duration. As the pulse expands
its amplitude decreases—as required by energy conservation—and it
acquires a chirp: the higher-frequency components are at the front
of the pulse, and the lower-frequency components are at the rear. In
Example 7.4, we derive an expression for the pulse width and the chirp.
Example 7.4
Dispersion of a gaussian wave packet: We can use the hedgehog equation,
eqn (7.16), to calculate the change in width (or duration) of a gaussian pulse
due to GVD, the third term in eqn (7.18). The pulse has initial wave form
$E(z,0) = E_0\,g(z,0)\,h(z,0)$, where $g(z,0) = e^{-z^2/2(\Delta z_0)^2}$ describes the pulse envelope
and $h(z,0) = e^{ik_c z}$, the carrier wave. The Fourier transform of the envelope function
is $G(k) = \sqrt{2\pi}\,\Delta z_0\,e^{-(\Delta z_0)^2 k^2/2}$. Multiplying by the propagator in eqn (7.16),
$e^{-i\omega(k)t}$, the third term in eqn (7.18) produces a term of the form $e^{-\beta^2 k^2/2}$, where

$$\beta^2 = (\Delta z_0)^2 + i\frac{d^2\omega}{dk^2}\,t . \tag{7.26}$$

Taking the inverse transform, we obtain

$$g(z,t) = \frac{\Delta z_0}{\beta}\,e^{-z^2/2\beta^2} * \delta(z-vt) , \tag{7.27}$$

where $v = \omega_c/k_c$ and

$$\frac{1}{\beta^2} = \frac{1}{(\Delta z_0)^2 + i(d^2\omega/dk^2)\,t} . \tag{7.28}$$

This confirms that after propagation the wave form is still a gaussian, but with
a different width. The modified width, $\Delta z_t$, is found by separating the real and
imaginary parts, i.e. writing

$$\frac{1}{\beta^2} = \frac{1}{(\Delta z_t)^2} + i\alpha(t) , \tag{7.29}$$

which gives

$$(\Delta z_t)^2 = (\Delta z_0)^2 + \frac{1}{(\Delta z_0)^2}\left(\frac{d^2\omega}{dk^2}\right)^2 t^2 , \tag{7.30}$$

and

$$\alpha(t) = -\frac{(d^2\omega/dk^2)\,t}{(\Delta z_0)^4 + (d^2\omega/dk^2)^2\,t^2} , \tag{7.31}$$

is a linear chirp (the frequency, ω, depends linearly and the phase, ωt, quadratically,
on time).

Fig. 7.12 Propagation of a gaussian pulse through a medium with group velocity dispersion. The centre of the pulse propagates at a speed $v_{gp} = c/n_{gp}$, where $n_{gp}$ is the group refractive index at the central frequency ωc.
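The broadening predicted by eqn (7.30) can be checked by applying the propagator of eqn (7.16) numerically. This sketch makes several assumptions: only the quadratic term of eqn (7.18) is kept (the carrier and the uniform motion of the pulse are removed), the units are dimensionless, and the values of Δz0, d²ω/dk² and t are illustrative. The width is extracted from the variance of the intensity, which for an intensity profile exp(−z²/Δzt²) equals Δzt²/2.

```python
import numpy as np

dz0 = 1.0   # initial width, Delta z_0 (dimensionless, illustrative)
w2 = 1.0    # d^2(omega)/dk^2 (dimensionless, illustrative)
t = 3.0     # propagation time (dimensionless, illustrative)

n_points, half_width = 4096, 60.0
dz = 2 * half_width / n_points
z = -half_width + dz * np.arange(n_points)
k = 2 * np.pi * np.fft.fftfreq(n_points, d=dz)

# Initial envelope g(z,0) = exp(-z^2 / (2 dz0^2))
g0 = np.exp(-z**2 / (2 * dz0**2))

# Propagator with only the quadratic (GVD) term of the dispersion relation
g_t = np.fft.ifft(np.exp(-0.5j * w2 * k**2 * t) * np.fft.fft(g0))

intensity = np.abs(g_t) ** 2
variance = np.sum(z**2 * intensity) / np.sum(intensity)
width_numerical = np.sqrt(2 * variance)

# Prediction of eqn (7.30)
width_eq_7_30 = np.sqrt(dz0**2 + (w2 * t / dz0) ** 2)
```

The summed intensity is unchanged by the propagator, so the peak amplitude falls as the pulse spreads, in line with the energy-conservation remark in the text.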
We note two things about the spreading, eqn (7.30): (i) it is

proportional to the third term in the expansion of eqn (7.18), the group
velocity dispersion; and (ii) the spreading is inversely proportional to the
initial spatial width. The fact that a gaussian group, or wave packet,
maintains its gaussian nature but increases its width on propagation
with a rate inversely proportional to the initial width is similar to the

analysis of the transverse diffraction of a gaussian laser beam. In fact,
the mathematics of Example 7.4 is identical to the analysis of laser beam
propagation presented in Section 11.2.
GVD is often characterized in terms of the chromatic dispersion
coefficient, D, which is defined as the temporal broadening, Δt (in
nanoseconds), for a range of wavelengths, Δλ (in nanometres), over a
propagation distance, z, (in kilometres):
Δt = |D|zΔλ . (7.32)
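For the gaussian pulse with spatial width, $\Delta z_0$, from eqn (7.30) we
find that $v_{gp}^2\,\Delta t = (d^2\omega/dk^2)(z/\Delta z_0)$. Substituting for spectral width,
$\Delta k = -2\pi\Delta\lambda/\lambda^2 = 1/\Delta z_0$, and assuming $v_{gp} \approx c$ inside the fibre, we
obtain

$$\Delta t = \frac{2\pi}{c^2\lambda^2}\frac{d^2\omega}{dk^2}\,\Delta\lambda\,z , \tag{7.33}$$

and therefore,

$$|D| = \frac{2\pi}{c^2\lambda^2}\frac{d^2\omega}{dk^2} . \tag{7.34}$$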
It is possible to write D in numerous ways but they all involve a second-order
derivative of the refractive index; an end-of-chapter exercise
investigates some of these alternatives. In optical communications, GVD
causes the pulses to spread out and merge into one another, which limits
the maximum data rate or propagation distance. For optical fibres,
the maximal dispersion is specified by the International Telecommunication
Union as |D| < 3.5 ps nm$^{-1}$ km$^{-1}$ at λ = 1.55 μm. In Fig. 7.13 we show
a simulation of an optical communications signal to illustrate the effect
of dispersion. The signal is a rectangular wave with amplitude between
0 and 1 and pulse duration τ (100 ps for 10 Gbit s$^{-1}$). The plots show
both a 0 and 1 arriving within a time window centred around t = 0. This
is known as an eye diagram. On the right we show a histogram of the
signal level recorded within a time window τ/2. The histogram indicates
the distinguishability between 0 and 1. In the lower plot, 0 and 1 are
clearly distinguishable; for longer propagation distance (middle and top)
they become less so. The overlap of the 0 and 1 histograms determines
the bit-error rate (BER) of the communication channel.

Fig. 7.13 Eye diagrams in a simulated optical fibre communications link. The ‘0’ and ‘1’ signal wave forms at propagation distances of 0, 100, and 200 km are shown. If τ = 100 ps, then the bandwidth at λ = 1.55 μm (Δλ = (λ²/c)Δν for Δν = 20 GHz) is of order 0.15 nm. A dispersion coefficient of |D| = 2 ps nm$^{-1}$ km$^{-1}$ adds 30 ps of broadening over 100 km, which is enough to significantly increase the bit-error rate (upper plots).
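The numbers quoted in the caption of Fig. 7.13 follow from eqn (7.32) and the relation Δλ = (λ²/c)Δν. A short sketch of the arithmetic is given below; the function names are illustrative, and the inputs are the values stated in the caption.

```python
c = 299_792_458.0  # speed of light in vacuum (m/s)

def wavelength_bandwidth_nm(wavelength_m, delta_nu_hz):
    """Delta lambda = (lambda^2 / c) * Delta nu, returned in nanometres."""
    return wavelength_m**2 / c * delta_nu_hz * 1e9

def broadening_ps(D_ps_per_nm_km, z_km, delta_lambda_nm):
    """Temporal broadening from eqn (7.32), Delta t = |D| z Delta lambda."""
    return abs(D_ps_per_nm_km) * z_km * delta_lambda_nm

delta_lambda = wavelength_bandwidth_nm(1.55e-6, 20e9)   # about 0.16 nm
delta_t = broadening_ps(2.0, 100.0, delta_lambda)       # about 32 ps
```

A broadening of roughly 30 ps is a sizeable fraction of the 100 ps bit period, which is why the eye closes at the longer propagation distances.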
7.10 Dispersive resonance

Optical media exhibit electronic resonances, see Chapter 13. Whether
we can assume a linear or quadratic dispersion relation depends on the
centre frequency of the pulse relative to the resonance frequency, and
also on the spectral width (bandwidth) of the pulse. The dispersion
relationship (ignoring the imaginary component, see Chapter 13) for a
medium with a resonance at angular frequency ω0 = ck0 can be written
as

$$\omega(k) = \frac{ck}{n(k)} = ck\left[1 - \frac{N\,c\,(k-k_0)}{c^2(k-k_0)^2 + \Gamma^2/4}\right]^{-1} , \tag{7.35}$$

where N is a constant that depends on the properties of the medium
and Γ expresses the frequency width of the resonance. This function
along with n(k) are plotted in Fig. 7.14. A feature of this dispersion
relation is that it contains a region where the gradient of n versus k is
negative, shaded in Fig. 7.14, corresponding to anomalous dispersion.
This dispersion relation also has the feature that there is a region where
ω dn/dω may be larger than n, and hence $v_{gp} \ll c$ ($n_{gp} \gg 1$).

Fig. 7.14 (i) Refractive index and (ii) dispersion relation for a medium with a single resonance at angular frequency ω0 = ck0. The shaded region corresponds to anomalous dispersion.

To model the propagation of light in this dispersive medium, we solve
eqn (7.16) for any desired initial pulse shape, E(z, 0), using the dispersion
relation, eqn (7.35). Next, we consider examples in both the normal and
anomalous dispersive regions.
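The refractive index and dispersion relation of eqn (7.35) can be tabulated directly, and the group index obtained from a numerical derivative of ω(k). This is a sketch under stated assumptions: units with c = 1, hypothetical values of k0, Γ and N, illustrative function names, and absorption (the imaginary part of the index) ignored, as in eqn (7.35).

```python
import numpy as np

# Dimensionless, hypothetical parameters (c = 1)
c = 1.0
k0 = 1000.0     # resonance wave number, omega_0 = c*k0
Gamma = 1.0     # resonance width
N_res = 0.01    # medium-dependent constant N of eqn (7.35)

def refractive_index(k):
    """Real refractive index n(k) appearing in eqn (7.35)."""
    x = c * (k - k0)
    return 1.0 - N_res * x / (x**2 + Gamma**2 / 4)

def omega(k):
    """Dispersion relation omega(k) = c*k/n(k), eqn (7.35)."""
    return c * k / refractive_index(k)

def group_index(k, rel_step=1e-6):
    """n_gp = c / (d omega / dk), using a central finite difference."""
    h = rel_step * Gamma / c
    return c / ((omega(k + h) - omega(k - h)) / (2 * h))

k = np.linspace(k0 - 10 * Gamma / c, k0 + 10 * Gamma / c, 2001)
n = refractive_index(k)

n_gp_resonance = group_index(k0)                   # anomalous region
n_gp_red = group_index(k0 - 5 * Gamma / c)         # red-detuned, normal region
```

For these assumed parameters the group index exceeds one on the red-detuned side and falls well below one on resonance, which previews the slow-light and fast-light regimes discussed next. Passing `omega` to a numerical implementation of eqn (7.16) would propagate a pulse through this model medium.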
7.11 Slow light

First, we consider the case of normal dispersion, and choose a centre
light frequency below resonance, i.e. ωc < ω0 , see the inset in Fig. 7.15.
This case—light frequency less than the resonance frequency—is often
referred to as ‘red detuning’. We choose N, Γ, ω0 and ωc such that
$n_{gp} \gg 1$ as required for the slow-light regime. As before, we consider
a gaussian pulse such that the initial electric field can be written as
$E(z,0) = E_0\,e^{ik_c z}e^{-z^2/z_0^2}$, as illustrated in Fig. 7.15 (lower frame). The
effect of the large group index is to delay the pulse relative to the
dispersionless case, as is apparent in the top frame of Fig. 7.15. This is
the slow-light effect.
In practice, a large real part of the refractive index is associated with
a concomitant imaginary part (see Section 13.5) which leads to loss of
light via scattering. This loss is apparent as a reduction in the intensity
of the pulse in Fig. 7.15. To avoid loss, it is preferable to choose the
angular frequency of the light, ωc, as far away as possible from the
resonant frequency of the medium, ω0; however, this also results in lower
dispersion. A solution to this problem is to exploit a phenomenon known
as EIT (electromagnetically induced transparency), where an additional
laser is used to engineer the dispersion of the medium.13

Fig. 7.15 Propagation of a non-resonant gaussian pulse with central angular frequency, ωc < ω0. The delay relative to the dispersionless medium (grey curve in the top graph) is the slow light effect: ω dn/dω ≫ 1. The normalized spectrum (grey) and n(ω) − 1 are shown inset.

13 Using this idea, Schmidt and colleagues (Schmidt et al. 1996) achieved a group velocity of c/3000. Later, Hau and colleagues (Hau et al. 1999) slowed light to $v_{gp}$ = 17 m s$^{-1}$, slower than a cyclist! In EIT experiments temporal control of the relevant optical fields also allows one to store light; see Fleischhauer et al. (2005) for details.

7.12 Fast light

By exploiting the anomalous dispersion region, it is possible to realize
media where the second term in eqn (7.25) is both large and negative, in
which case the group index is negative. This phenomenon is known as
fast light. The controversial description of ‘superluminal propagation’
is occasionally used to describe this regime. To observe fast light, one
must satisfy the inequality dn/dω < −n/ω. One strategy to fulfil
this inequality is to locate the pulse frequency at the centre of an
absorption line, as illustrated in Fig. 7.16.14 However, this means that
the large anomalous dispersion is accompanied by strong absorption.
Consequently, research on fast light has focused on achieving a large
negative group index in media with gain.15

One might think that some of the topics mentioned in this section
are in conflict with Einstein’s Special Relativity, but this is not so. The
whole of our discussion is based in the framework of the Kramers–Kronig
relations, and causality is built in from the start; see Section 13.5 or Boyd
(2002). We go on in Section 7.13 to discuss briefly some ideas related to
the speed of transfer of information in a dispersive medium.

14 Keaveney et al. (2012) achieved a group index of $-1\times10^{5}$ in a hot gas and Jennewein et al. (2016) measured a group velocity of $v_{gp}$ = −300 m s$^{-1}$ in a gas of laser cooled atoms.

15 Wang et al. (2000) achieved a group index of $n_{gp}$ = −310 using the anomalous dispersion region between two narrow gain lines, and Gehring et al. (2006) observed backward pulse propagation, where the peak of the transmitted pulse leaves the medium before the peak of the incident pulse enters, in an erbium-doped fibre amplifier with a negative group velocity.
7.13 Information propagation
We have seen in Section 7.12 that fast light is associated with apparent
‘superluminal’ propagation, and from eqn (7.24) we can obtain a group
velocity that exceeds c in a region of anomalous dispersion. Some
treatments of these topics make the mistake of conflating the group
velocity with the velocity of a signal, or that of energy propagation.
Although a dramatic modification of the velocity of light was not
demonstrated until the end of the twentieth century, the theoretical
analysis of many of these phenomena was performed much earlier.
One of the most readable accounts can be found in Brillouin’s16 book
Wave Propagation and Group Velocity (1960), which reproduces some
of the earlier papers on the topic, including ‘About the propagation of
light in dispersive media’ by Sommerfeld17 . Brillouin considered the
propagation of a pulse centred on the medium’s resonant frequency,
where there is large anomalous dispersion and the group velocity can
exceed the speed of light, i.e. vgp > c. In this case the distortion of the
pulse as it propagates, which is inevitable for a dispersive medium,
is crucial to include in the analysis. It appears that one easy way to
resolve some of the issues would be to use a step-function discontinuity,
as then the front velocity would simply be defined as the velocity
of the discontinuity. Note, however, that this is too simplistic a view,
as we know from a Fourier context that a step-function discontinuity
would require Fourier components with an extremely broad bandwidth,
i.e., a wide range of frequencies. In a dispersive medium, higher-order
terms in eqn (7.18) become important, and as the pulse propagates, the
step will inevitably distort, as is apparent in Fig. 7.17, which shows a
near-resonant rectangular pulse propagating through a medium. The
appearance of a signal in advance of the front of the pulse is known as
a precursor, or forerunner.

Fig. 7.16 Fast light: Propagation of a resonant gaussian pulse with central angular frequency on resonance, ωc = ω0. The peak of the gaussian pulse (indicated by the black dashed line) propagates faster than in free space (grey dashed line), $v_{gp} > c$. However, this apparent ‘superluminal’ propagation is at the expense of attenuation.

16 Léon Nicolas Brillouin (Sèvres 1889–New York 1969).

17 Arnold Johannes Wilhelm Sommerfeld (Königsberg 1868–Munich 1951).
The other concept that is often neglected in this discussion of
information propagation with light is that of signal-to-noise, see e.g.
Fig. 7.13 for how a real fast pulse might appear. Whether one can
detect optical precursors or not depends on their strength relative to
the inevitable noise that is recorded by any optical detector. For
example, in the work of Keaveney et al. the total integrated signal

was analysed, both for a superluminal pulse and an off-resonance pulse
(which propagates as if it were in vacuum). The total integrated counts
for both pulses verified the preservation of causality—the probability of
detecting a photon was found to be always higher in the reference pulse.
To emphasize that the topic of light propagation in a dispersive
medium is non-trivial, we point out that Milonni in his book Fast Light,
Slow Light and Left-handed Light (2005) defines six different velocities!
These are (i) c, the speed of light in vacuum; (ii) phase velocity; (iii)
group velocity; (iv) front velocity; (v) signal velocity; and finally (vi)
energy-transport velocity. Note that this topic is still a subject of lively
discussion, with as yet no full consensus. In particular, the definition
of signal velocity has been modified since Sommerfeld and Brillouin’s
first discussion, especially in the context of weak (i.e. few photon)
pulses; see Kuzmich et al. (2001). In summary, experiments have been
performed in regimes where either vgp > c or vgp < 0, but it is possible
to explain the propagation by analysing the dispersion properties of the
medium. No one has proposed, or demonstrated, a violation of causality
by considering the propagation of light in a dispersive medium.
Are you ready for the end of time?

John Charles Ryle (Macclesfield 1816–Lowestoft 1900).
Fig. 7.17 The propagation of rectangular and gaussian pulses through a resonant medium. For the rectangular pulse (top), parts of the frequency spectrum lie within the anomalous dispersion region (see inset). This leads to a break up of the pulse with precursors at the leading edge. A gaussian pulse is shown below for comparison. The lower plot shows the rectangular pulse at t = 0.

Chapter summary

• A harmonic wave with time-dependent envelope f(t) has a
frequency spectrum given by the Fourier transform $F(\omega) = \mathcal{F}[f(t)](\omega)$.
• The propagation of an optical pulse is encapsulated by the
hedgehog-in-time equation:

$$E(z,t) = \mathcal{F}^{-1}\!\left[e^{-i\omega(k)t}\,\mathcal{F}[E(z,0)]\right] ,$$

where ω(k) is the dispersion relation.

• The dependence of the angular frequency of a light wave, ω, on
the modulus of the wave vector, k, in a medium is known as the
dispersion relation.
• In a dispersive medium with negligible group velocity
dispersion, an optical pulse propagates without distortion at the
group velocity, vgp = dω/dk, with a corresponding group index of
n + ω dn/dω.
• In a medium exhibiting group-velocity dispersion an optical
pulse broadens as it propagates.
• Slow and fast light refers to the regime where the group
refractive index is larger or smaller than one, respectively.
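The hedgehog-in-time equation in the summary above can be tried out numerically. The following is a minimal sketch, assuming a periodic, uniformly sampled grid and a naive discrete Fourier sum; the names `dft` and `propagate`, the grid size and the dispersion relation ω = ck are illustrative choices, not taken from the text.

```python
import cmath
import math


def dft(samples, sign):
    """Naive discrete Fourier sum; sign=-1 is used as forward, +1 as inverse."""
    n = len(samples)
    return [
        sum(samples[m] * cmath.exp(sign * 2j * math.pi * j * m / n) for m in range(n))
        for j in range(n)
    ]


def propagate(field_z0, dz, t, omega_of_k):
    """Return E(z, t) on the same grid, given E(z, 0) and a dispersion relation."""
    n = len(field_z0)
    spectrum = dft(field_z0, -1)  # wave-vector spectrum, F[E(z, 0)]
    evolved = []
    for j, amplitude in enumerate(spectrum):
        j_signed = j if j < n // 2 else j - n  # signed index: positive and negative k
        k = 2.0 * math.pi * j_signed / (n * dz)
        evolved.append(amplitude * cmath.exp(-1j * omega_of_k(k) * t))
    return [value / n for value in dft(evolved, +1)]  # inverse transform


# Hypothetical example: a gaussian envelope with omega = c k (no dispersion).
c = 1.0
pulse = [math.exp(-((m - 20) / 4.0) ** 2) for m in range(64)]
later = propagate(pulse, dz=1.0, t=5.0, omega_of_k=lambda k: c * k)
```

With a linear dispersion relation every component acquires a phase proportional to k, so the envelope is simply translated by ct. Passing a function with a term in k² instead should show the broadening described in the summary.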
Exercises

(7.1) Spectrum of a rectangular pulse
Explicitly evaluate the integral in eqn (7.1) for the temporal function
defined in eqn (7.2), and verify that the spectrum of a rectangular pulse
in time is indeed two displaced sinc functions.

(7.2) Spectrum of an isolated triangular pulse
A triangular pulse of duration τ has a profile that is zero for
t ≤ −τ/2; the electric field increases linearly for −τ/2 ≤ t ≤ 0,
decreases linearly to zero for 0 ≤ t ≤ τ/2, and is zero for t ≥ τ/2.
Calculate the frequency spectrum of this pulse. [Hint: a triangular
function can be obtained by convolving two rectangular pulses of half the
width.]

(7.3) Spectrum of a periodic train of triangular pulses
What is the frequency spectrum of a periodic train (period T) of
triangular pulses of duration τ?

(7.4) Appearance of negative frequencies in the Fourier transform of pulses
Start with $F(\omega) = \int_{-\infty}^{\infty} f(t)\,e^{i\omega t}\,dt$.
Note that F(ω) is in general complex, and we seem to need to sum over
negative frequencies. In this question we shall consider the physical
significance of these terms.
(i) First, show that for a real function f(t), the function F(ω) obeys
the relation F(ω)∗ = F(−ω).
(ii) Next, show that if f(t) is a real even function, then
$F(\omega) = 2\int_{0}^{\infty} f(t)\cos\omega t\,dt$, which is real, and
F(ω) = F(−ω). Therefore the real part of F(ω) tells us how much cos ωt
there is in f(t).
(iii) Show that if f(t) is a real odd function, then F(ω) is purely
imaginary, and F(ω) = −F(−ω). Therefore the imaginary part of F(ω) tells
us how much sin ωt there is in f(t).
(iv) Now we can get rid of the negative frequencies completely. Using the
results of the earlier parts of this question, show that we can always
write the transform of a real function as a sum over positive frequencies
with real amplitudes.

(7.5) Fourier transform of two pulses
Verify the result in the text, that the Fourier transform of a pair of
identical pulses of duration τ and separation T is given by eqn (7.5).

(7.6) Uniform amplitude mode-locked pulse train
In the text we have used Fourier methods to calculate the time dependence
of the uniform amplitude mode-locked pulse train. For this particular case
the spectrum can also be calculated as the sum of a geometric series. Show
explicitly that this technique generates an identical answer to that of
eqn (7.9).

(7.7) Width of mode-locked pulses
In an Ar+ ion laser of cavity length L = 2.00 m, the gain bandwidth has a
gaussian profile and a standard deviation of 2π × 1.0 GHz. Calculate the
temporal duration and separation of the pulses in the mode-locked train.

(7.8) Bandwidth of short-pulse lasers (1)
A mode-locked Ti:sapphire laser has a central operating wavelength of
800 nm, and produces pulses of duration 10 fs. (i) Sketch the form of the
optical field. (ii) What is the angular-frequency bandwidth of the pulses?
(iii) What is the wavelength bandwidth of the pulses?

(7.9) Bandwidth of short-pulse lasers (2)
Show that if an optical pulse only lasts for a duration of approximately
N cycles, the bandwidth of the spectrum is approximately 1/N of the
central frequency.

(7.10) Intensity of mode-locked pulses
(i) Show that the electric field strength of the mode-locked pulse train
of eqn (7.3) has a peak amplitude proportional to N, the number of excited
modes. (ii) Hence show that the peak intensity scales as N². (iii) Further,
show that the duration of the pulses is inversely proportional to N.
(iv) Hence show that the average intensity scales as N. This result
emphasizes that the mode-locked pulses arise as a consequence of
interference, a process that can redistribute energy but can neither
increase nor destroy the sum of the energy of the individual modes.

(7.11) Temporal form of mode-locked pulse train
Write the temporal form of the gaussian mode-locked pulse train of
eqn (7.10) explicitly as a sum without the comb function.

(7.12) Photon lifetime inside a Fabry–Perot cavity
An ultra-short gaussian light pulse with centre frequency, ωc, is emitted
inside a Fabry–Perot cavity, see Section 3.11. The field outside the
cavity for t > 0 is

$$E(t) = E_0\, e^{-(i\omega_c + \kappa/2)t} \sum_m e^{-(t - mT)^2/\tau^2},$$

where 1/κ is the cavity decay time, T = 2ℓ/c is the cavity round-trip
time, and m is an integer. The two interfaces have intensity
reflectivities of R and unity, respectively. As we lose a fraction 1 − R
on each round-trip, the change in the pulse intensity is
δIp/δt = −(1 − R)Ip/[(2ℓ)/c], which has the solution

$$I_p = I_0\, e^{-(1-R)t/[(2\ell)/c]},$$

which gives κ ≈ (1 − R)c/(2ℓ). To find the spectrum of light emitted by
the cavity, |F(ω)|², it is convenient to write the time dependence as
f(t) = g(t)h(t), where

$$g(t) = \begin{cases} 0 & t < 0 \\ e^{-(i\omega_c + \kappa/2)t} & t > 0 \end{cases},$$

and

$$h(t) = \mathrm{comb}(t/T) * \mathrm{gauss}(t/\tau).$$

Find an expression for the Fourier transforms, G(ω) = F[g(t)] and
H(ω) = F[h(t)], and then use the convolution theorem to find an expression
for the spectrum of light |F(ω)|² emitted by the cavity. What is the width
of the peaks in terms of κ? How does this width of the peaks compare to
the width of the transmission resonances of a Fabry–Perot interferometer,
eqn (3.38)?

(7.13) Slow and fast oscillations for two colours
A sodium lamp emits light with two wavelengths, 589.0 and 589.6 nm.
Evaluate the quantities k̄ = (k1 + k2)/2, ω̄ = (ω1 + ω2)/2, Δk = k2 − k1,
and Δω = ω2 − ω1. Comment on the magnitudes of your results.

(7.14) Alternative expressions for the group velocity
(i) Starting with the definition vgp = dω/dk, and recalling that the phase
velocity is defined as vp = ω/k, show that
vgp = vp + k dvp/dk = vp − λ dvp/dλ. What signs do dvp/dk and dvp/dλ take
in regions of normal and anomalous dispersion, respectively?
(ii) Show that vgp = c/(n − λ dn/dλ).
(iii) Show that

$$v_{\rm gp} = v_{\rm p}\left(1 - \frac{k}{n}\frac{dn}{dk}\right).$$

(7.15) Phase and group velocities for matter waves
(i) Use a trial solution, ψ = A e^{i(kz−ωt)}, in the Schrödinger equation
to derive the following dispersion relation for the matter wave associated
with a particle of mass m in free space: ħk²/2m = ω. (ii) Show that the
phase velocity is vp = ħk/2m. (iii) Show that the group velocity is
vgp = ħk/m. (iv) Interpret these results. (v) Do matter waves in free
space exhibit normal or anomalous dispersion?

(7.16) Cauchy’s formula for refractive index variation
Cauchy found an empirical formula for the variation of refractive index as
a power series in 1/λ². The simplest form of his relation is
n = A + B/λ². For BK7 borosilicate glass the coefficients are A = 1.5046
and B = 4200 (nm)². Find (a) the refractive index, and (b) the group index
at: (i) 400 nm, (ii) 500 nm, and (iii) 600 nm.

(7.17) GVD-induced pulse broadening (1)
For an optical pulse where group velocity dispersion is not negligible,
show that the pulse broadening is Δt = |D|zΔλ, where the dispersion
parameter is

$$|D| = \frac{\lambda}{c}\frac{d^2 n}{d\lambda^2}.$$

Show that this can also be written as

$$|D| = \frac{2\pi c}{\lambda^2}\frac{d^2 k}{d\omega^2}.$$

(7.18) GVD-induced pulse broadening (2)
(i) Write down the form of the electric field for a gaussian pulse with
initial spatial width (rms) Δz0. (ii) By taking the Fourier transform,
evaluate the wave vector spectrum using eqn (7.15). (iii) Keep the group
velocity dispersion term in the expansion of eqn (7.18), and evaluate the
field at a time t later from eqn (7.14). [Hint: for a gaussian profile the
integral is analytic, having completed the square.] (iv) Show that the
spatial width after propagation, Δzt, is exactly of the form of
eqn (7.30).

(7.19) Slow light
What is the group index for a pulse of light slowed to 17 m s⁻¹?

(7.20) Fast light
In a slow or fast light medium there is a compression of the physical
length of the pulse by a factor 1/|ngp|. What is the length of a 25 ns
long pulse (i) in free space, and (ii) in a fast light medium with
ngp = −1 × 10⁶? [Hint: these are the parameters from Jennewein et al.
(2016)].
8 Coherence

Io ritornai dalla santissima onda, rifatto.
I return to the sacred wave, refreshed.
Dante Alighieri (Florence 1265–Ravenna 1321), Divina Commedia (1308–20)

8.1 Introduction
8.2 Statistical light
8.3 Temporal coherence
8.4 White light
8.5 Wiener–Khinchin–Einstein theorem
8.6 Power spectral density
8.7 Intensity correlations
8.8 Spatial coherence
8.9 van Cittert–Zernike
8.10 Propagation of coherence
Chapter summary
Exercises

8.1 Introduction

In previous chapters, when discussing interference and diffraction, we
restricted our attention to monochromatic light with a single wavelength
λ and a well-defined phase, i.e. light fields that can be described in terms
of the harmonic wave solution, eqn (1.9) of Chapter 1. In this chapter, we
shall extend our discussion to include light with more than one frequency,
components whose relative phase varies with time, and extended sources
that emit neither plane nor spherical waves. The concept of coherence
relates to the extent that knowledge of the electric field at one point in
space and time provides information about the phase at other points,
i.e. to what extent there is a correlation between the fields at the
two locations (displaced in either space or time). The extent of these
correlations between fields at different locations gives us a quantitative
measure of the coherence, and there exists a continuous scale between
fully coherent and completely incoherent.
Incoherence describes the situation when we do not have complete
information about the field, and the best we can do is to express the field
as a statistical mixture of different components. The unknown phases
in incoherent light are treated as random. This is a significant break
with previous chapters where we have assumed that we had complete
knowledge of the field, and could write it as a superposition of fully
coherent plane or spherical wave solutions, eqns (2.10) and (2.33), recall
Fig. 2.1. Real light fields do not behave like these idealizations because
light sources are themselves statistical, emitting different frequencies
with a different time dependence, as in Fig. 8.1. For a statistical source
it is impossible to completely predict the relative phase at subsequent
times. In addition, the capabilities of measurement are limited. In
paraxial optics, where we choose to detect only light close to a particular
axis, we select parts of the field with particular phase relationships and
may find that coherence grows with propagation distance. Consequently,
it is important to emphasize that:

Coherence is not uniquely a property of the source, but rather
of the complete optical system.

Fig. 8.1 A statistical light source (in the plane on the left) emits light
with different frequencies and phases that evolve with time (the black and
grey waves indicate two times). Our knowledge about the source is limited
by what we choose to detect.
A useful measure of coherence is provided by the ‘quality’ of the fringes
observed in an interferometer.[1] The ‘quality’ of the interference fringes
is quantified in terms of the visibility,

$$V = \frac{I_{\max} - I_{\min}}{I_{\max} + I_{\min}}, \tag{8.1}$$

where Imax and Imin are the intensities of maxima and minima in a
particular region of the interference pattern, see Fig. 8.2. The visibility
is constrained to lie within the range 0 ≤ V ≤ 1.

Fig. 8.2 The intensity maxima and minima, Imax and Imin, used to define
visibility, eqn (8.1).

[1] There is a parallel in quantum physics where imperfect knowledge of the
wave function, for example, due to a coupling to environment, reduces the
visibility of interference fringes. In quantum mechanics this phenomenon
is known as decoherence. See, for example, Joos and Zeh (2003).

We will now discuss the statistical nature of light, and then consider
temporal coherence, which arises when light is made up of different
wavelengths, and finally spatial coherence, when the light source is
spatially extended. We shall use the examples of Michelson’s interferometer
and Young’s interferometer to measure visibility and characterize
temporal and spatial coherence, respectively.
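As a small illustration of eqn (8.1), the visibility can be evaluated from a list of sampled intensities. This is only a sketch: the cosine fringe model and the function names `visibility` and `cosine_fringes` are assumptions introduced here for the example.

```python
import math


def visibility(intensities):
    """Fringe visibility V = (Imax - Imin) / (Imax + Imin) of sampled intensities."""
    i_max = max(intensities)
    i_min = min(intensities)
    return (i_max - i_min) / (i_max + i_min)


def cosine_fringes(mean_intensity, contrast, n_points=361):
    """Assumed fringe model I(phi) = mean * (1 + contrast * cos(phi)), 0 <= phi <= 2 pi."""
    return [
        mean_intensity * (1.0 + contrast * math.cos(2.0 * math.pi * m / (n_points - 1)))
        for m in range(n_points)
    ]


high = visibility(cosine_fringes(2.0, 1.0))   # fully modulated fringes
low = visibility(cosine_fringes(2.0, 0.25))   # washed-out fringes
```

For this model the visibility returned is simply the contrast parameter, independent of the mean intensity, which is why V is a convenient normalized measure.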
8.2 Statistical light

In Chapter 7 we showed that turning a source on and off changes
the spectrum. Here, we show that even a source that is always on
cannot be monochromatic due to the quantum nature of light. Even the
most ideal source, such as an ultra-stable laser, exhibits phase noise
due to spontaneous emission. In fact, all light sources are statistical
at the microscopic level and exhibit a degree of randomness; either
the individual quantum emitters that make up a source decay via
spontaneous emission at random times, or the emission is interrupted
randomly by the environment, e.g. collisions.[2]
Consider an ‘ideal’ source, like a laser, that emits monochromatic
light with a frequency ωc/(2π). At a quantum level, the laser consists of
individual quantum emitters that undergo spontaneous emission giving
rise to phase fluctuations. In Fig. 8.3 we simulate a monochromatic
wave form that exhibits phase jumps at random times (indicated by
the dashed lines). The spectrum of light, given by a numerical Fourier
transform,[3] is shown below. The effect of the phase jumps is to broaden
the spectrum producing a Lorentzian ‘lineshape’. The spectral width is
inversely proportional to the average time between the phase jumps, τc,
which we will show is related to the coherence time.

[2] For non-laser sources, such as discharge lamps, stars etc. where
different atoms emit light independently of each other, the spectrum may be
dominated by the spread in atomic velocities (Doppler broadening) or
collisions (collisional or pressure broadening). See Loudon (2000),
Chapter 3 for a comprehensive discussion of the fluctuation properties of
chaotic light.

[3] See Exercise 8.11 for more details.
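The kind of simulation described for Fig. 8.3 can be sketched in a few lines. The version below is not the authors' code: it assumes a complex carrier, phase jumps drawn at exponentially distributed intervals with mean τc, and a direct Fourier sum; all names and parameter values are hypothetical.

```python
import cmath
import math
import random


def phase_jump_wave(n_samples, dt, omega_c, tau_c, rng):
    """Samples of exp(-i(omega_c t + phi)), with phi redrawn at random jump times.

    tau_c is the mean time between jumps; tau_c=None means no jumps at all.
    """
    wave = []
    phase = 0.0
    next_jump = math.inf if tau_c is None else rng.expovariate(1.0 / tau_c)
    for n in range(n_samples):
        t = n * dt
        while t >= next_jump:
            phase = rng.uniform(0.0, 2.0 * math.pi)
            next_jump += rng.expovariate(1.0 / tau_c)
        wave.append(cmath.exp(-1j * (omega_c * t + phase)))
    return wave


def power_at(wave, dt, omega):
    """|F(omega)|^2, with F(omega) approximated by the sum of f(t) exp(i omega t) dt."""
    total = sum(f * cmath.exp(1j * omega * n * dt) for n, f in enumerate(wave))
    return abs(total * dt) ** 2


rng = random.Random(1)  # fixed seed so the sketch is repeatable
n_samples, dt, omega_c = 2000, 0.05, 2.0
steady = phase_jump_wave(n_samples, dt, omega_c, None, rng)
jumpy = phase_jump_wave(n_samples, dt, omega_c, 5.0, rng)
peak_steady = power_at(steady, dt, omega_c)
peak_jumpy = power_at(jumpy, dt, omega_c)
```

Without jumps all of the power sits at ωc. With jumps the contributions from different segments no longer add in phase, so the peak is lowered and power appears at neighbouring frequencies. Evaluating `power_at` on a grid of frequencies, averaged over many realizations, should approach the Lorentzian profile mentioned in the text.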
8.3 Temporal coherence

In this section we shall look at how the degree of coherence is related to
the non-monochromicity that all light sources exhibit. First, we consider
the idealized case of the sum of two plane waves with amplitudes E1 and
E2, and angular frequencies ω1 = ck1 and ω2 = ck2. The general case
would be to consider two waves that have different phases φ1 and φ2
that evolve in time in a random way, but as we will find that the time-
averaged intensity does not depend on this relative phase, this does not
change the result. The total field at position z = 0 and time t is

$$E = E_1 \cos\omega_1 t + E_2 \cos\omega_2 t, \tag{8.2}$$

and the instantaneous intensity, before any time averaging, is

$$I = 2I_1\cos^2\omega_1 t + 2I_2\cos^2\omega_2 t + 4\sqrt{I_1 I_2}\,\cos\omega_1 t\,\cos\omega_2 t, \tag{8.3}$$

where we have used the standard relationship between intensity and field
amplitude, $I = \tfrac{1}{2}\epsilon_0 c E^2$. Now we perform the time
average. The first two terms have a constant average (of I1 and I2,
respectively) but the last term has a magnitude that varies[4] at the sum
frequency, ω1 + ω2, and the difference frequency, |ω1 − ω2|, and what we
see depends on whether our detector is fast enough to follow these
frequencies. As optical frequencies are in the hundreds of terahertz range,
and even fast detectors can only measure tens of gigahertz, it is typically
the case that we will only see the time averages of these terms, which are
zero. Consequently, the time-averaged intensity is

$$\langle I \rangle = I_1 + I_2. \tag{8.4}$$

This expression says that if the light field contains different frequency
components, and we measure the time-averaged intensity, then the total
intensity is given by the sum of the intensities of each component. We
can generalize this result in the form of a crude rule,[5] namely: for
coherent light we add amplitudes and then square to find the intensity;
whereas for incoherent light we simply add intensities:

Rule: coherent, add amplitudes; incoherent, add intensities.

[4] $\cos A \cos B = \tfrac{1}{2}\cos(A - B) + \tfrac{1}{2}\cos(A + B)$.

Fig. 8.3 Top: Time dependence, f(t), for a monochromatic wave with
central angular frequency, ωc, that is phase shifted at random times t1,
t2, t3, etc. The average time between these phase jumps corresponds to the
coherence time. Below: The spectrum of the wave, given by |F(ω)|², where
F(ω) = F[f(t)]. The spectrum (shown in grey) is described by a Lorentzian
function (black line) with a width inversely proportional to the coherence
time. The Fourier relationship between the spectrum width and correlation
(or coherence) time is an example of the Wiener–Khinchin–Einstein theorem,
see Section 8.5 and Exercise 8.11.
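The rule for adding amplitudes or intensities can be checked with a direct time average of the squared field of eqn (8.2). The sketch below assumes units in which ε0c = 1, so that each wave alone has intensity I = E²/2, and averages over a time that contains a whole number of periods of every frequency involved; the function name and the numbers are illustrative only.

```python
import math


def mean_intensity(e1, e2, omega1, omega2, t_avg, n_samples=1000):
    """Time average of E(t)^2 for E = e1 cos(omega1 t) + e2 cos(omega2 t).

    Units with epsilon_0 c = 1 are assumed, so a single wave of amplitude e
    has a time-averaged intensity of e**2 / 2.
    """
    total = 0.0
    for m in range(n_samples):
        t = t_avg * m / n_samples
        field = e1 * math.cos(omega1 * t) + e2 * math.cos(omega2 * t)
        total += field ** 2
    return total / n_samples


two_pi = 2.0 * math.pi
# Different frequencies: the cross term averages away and intensities add.
different = mean_intensity(1.0, 2.0, 5 * two_pi, 7 * two_pi, t_avg=1.0)
# Same frequency and phase: amplitudes add first, then we square.
same = mean_intensity(1.0, 2.0, 5 * two_pi, 5 * two_pi, t_avg=1.0)
```

Here I1 = 0.5 and I2 = 2, so the first case gives I1 + I2 = 2.5, whereas the second gives (E1 + E2)²/2 = 4.5, the larger value reflecting constructive interference.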
We now extend our discussion to the more realistic case of fields
composed of more than two discrete frequencies. The randomness associated
with incoherence arises from the addition of many independent waves
with different phases and different frequencies. In this case, inspired by
Fig. 8.3, we define coherence time, τc, as the inverse of the bandwidth[6]
of the light:

$$\tau_c = \frac{1}{\Delta\nu}, \tag{8.5}$$

and the distance light travels in the coherence time is known as the
coherence length, Lc, defined as

$$L_c = c\,\tau_c. \tag{8.6}$$

[5] Crude because all light is somewhere between coherent and incoherent.

[6] A more detailed discussion of how the width of various functions is
defined is found later in the chapter.

[7] Equation (8.5) and Table 8.1 clearly demonstrate the motivation for
using stable lasers in interferometers where large path differences are
encountered.

Table 8.1 lists the spectral widths and coherence lengths for five different
light sources.[7] The coherence length of sunlight is so short that path
lengths in an interferometer have to match to less than a micron in order
to achieve high-visibility fringes; for discharge lamps the maximum path
difference is ∼ 1 mm. By contrast a typical undergraduate laboratory
He–Ne laser has a coherence length of tens of cm, proving useful in, for
example, holography experiments. The most stable lasers in research
laboratories have now achieved frequency instabilities (i.e. Δν/νc) of
the order of 10⁻¹⁶, with coherence times that exceed a second, and
coherence lengths greater than a million kilometres.

Table 8.1 Central wavelength, λc, spectral width, Δν, coherence time, τc,
and coherence length, Lc, for five different light sources.

Source                      λc (nm)   Δν (Hz)       τc        Lc
Sunlight                    575       3 × 10¹⁴      3 fs      1 μm
Sodium discharge lamp       589       5 × 10¹¹      2 ps      0.6 mm
He–Ne laser (multimode)     633       1.5 × 10⁹     0.67 ns   0.20 m
Diode laser (single mode)   780       1 × 10⁵       10 μs     3 km
Ultra-stable laser          1500      40 × 10⁻³     25 s      7.5 × 10⁶ km
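Equations (8.5) and (8.6) are easily turned into a short calculation, which is a convenient way of checking entries such as those in Table 8.1. The following is a minimal sketch; the function names are our own and the bandwidths are the values quoted in the table.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def coherence_time(bandwidth_hz):
    """Coherence time tau_c = 1 / (delta nu), eqn (8.5)."""
    return 1.0 / bandwidth_hz


def coherence_length(bandwidth_hz):
    """Coherence length L_c = c tau_c, eqn (8.6)."""
    return SPEED_OF_LIGHT * coherence_time(bandwidth_hz)


# Bandwidths in Hz, as quoted in Table 8.1.
sources = {
    "Sunlight": 3e14,
    "Sodium discharge lamp": 5e11,
    "He-Ne laser (multimode)": 1.5e9,
    "Diode laser (single mode)": 1e5,
    "Ultra-stable laser": 40e-3,
}
for name, bandwidth in sources.items():
    print(f"{name}: tau_c = {coherence_time(bandwidth):.3g} s, "
          f"L_c = {coherence_length(bandwidth):.3g} m")
```

Because τc and Lc follow directly from Δν, any row of the table can be regenerated from its bandwidth alone.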
8.4 White light

Next we consider what happens to an interference pattern when the
light contains more than one frequency.[8] In Chapter 7 we looked briefly
at adding different frequencies with a well-defined phase relationship
in order to build wave packets, or pulses of light. The difference now
is that we consider much larger frequency differences that give rise to
a time dependence that is too fast to detect. In this case, the time-
averaged intensity is given by a sum of the intensity distributions for
each frequency component.

[8] We may refer to light with many frequencies generically as
polychromatic, broadband, or white light.

Fig. 8.4 A Michelson interferometer similar to Section 3.12, except that
now the input consists of more than one ‘colour’ or range of colours. The
example shown is for two colours.

To measure the interference pattern we shall use the Michelson
interferometer, discussed in Section 3.12, see Fig. 8.4. To recap: the
Michelson interferometer consists of a beam-splitter, which divides the
input into two beams that traverse separate paths and are recombined
with a variable path difference 2Δ. At the output we measure the
intensity as a function of Δ. If the input is partially coherent, then the
output intensity, I, oscillates as a function of the path difference. For
a single-frequency input with frequency νc = ωc/(2π) = c/λc (perfect
coherence), there is an intensity maximum whenever Δ = mλc/2, as
shown in the top rows in Fig. 8.5 for two different values of λc. If,
instead, the input consists of two wavelengths, λ1 and λ2, with the same
amplitude, then the output field is

$$E = \tfrac{1}{2}E_0\left[\left(1 + e^{i4\pi\Delta/\lambda_1}\right)e^{-i\omega_1 t} + \left(1 + e^{i4\pi\Delta/\lambda_2}\right)e^{-i\omega_2 t}\right], \tag{8.7}$$

where we have assumed that both path lengths for Δ = 0 are an integer
number of wavelengths. In contrast to Section 3.12 we have included the
time dependence explicitly. When we take the modulus-squared of the
field, the cross terms involving |ω1 − ω2| and ω1 + ω2 average to zero and
we measure an intensity,

$$I = I_0\left[2 + \cos\left(\frac{4\pi\Delta}{\lambda_1}\right) + \cos\left(\frac{4\pi\Delta}{\lambda_2}\right)\right], \tag{8.8}$$

i.e. the sum of the two interference patterns for each colour on its
own. The intensity as a function of Δ, known as an interferogram,
is plotted at the bottom of Fig. 8.4 and on the right in row (iii) of
Fig. 8.5. The pattern is similar to the phenomenon of beats in the time
domain, discussed in Chapter 7. As we shall show, the interferogram
provides a means of determining the spectrum of a source. As we add
more frequencies, row (iv) in Fig. 8.5, some of the maxima in the beat
pattern are suppressed. For a continuous spectrum, row (v), all but
the central fringe in the interferogram are suppressed. As we might
guess by comparing Fig. 8.5 to Fig. 6.7, the input spectrum and the
interferogram are related via a Fourier transform. We shall quantify
this Fourier relationship in Section 8.5. The use of a Michelson, or
other interferometer, to characterize an unknown spectrum is known as
Fourier-transform spectroscopy.

Fig. 8.5 Frequency spectrum (left column) and Michelson interferometer
output (interferogram, right column) as a function of the path difference
Δ. In rows (i) and (ii) the input is monochromatic with a frequency equal
to 0.84 and 1.16 times the central frequency, νc = c/λc, respectively. If
the central frequency corresponds to green light (λc = 532 nm) then rows
(i) and (ii) correspond to 633 nm (red) and 459 nm (blue), respectively. In
rows (iii)–(iv) we show combinations of two and four colours. Row (v)
depicts a continuous spectrum and the corresponding fringe pattern.
Figure 8.5 also illustrates the relationship between fringe visibility
and the input spectrum. The visibility of the fringes is equal to 1 for all
path-length differences for a monochromatic input, rows (i) and (ii); the
visibility oscillates periodically as a function of path-length difference for
discrete frequencies in the input, rows (iii)–(iv); the visibility rapidly
falls to zero as a function of path-length difference for white light input,
row (v). As we discussed in Section 8.3 the maximum path difference
at which interference can be observed is the coherence length. Row (v)
suggests how a Michelson might be used to measure a coherence length,
eqn (8.6).
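The behaviour of the interferogram as more colours are added can be explored by generalizing eqn (8.8) to a sum over any number of components. This is a sketch under the same assumption as the text, namely that cross terms between different frequencies average to zero; the function name and the 'broadband' source, with components equally spaced in 1/λ, are hypothetical.

```python
import math


def interferogram(delta, wavelengths, intensities):
    """Time-averaged Michelson output, summing I_j [1 + cos(4 pi delta / lambda_j)]."""
    return sum(
        i_j * (1.0 + math.cos(4.0 * math.pi * delta / lam))
        for lam, i_j in zip(wavelengths, intensities)
    )


# Two colours (wavelengths taken from the caption of Fig. 8.5), equal intensity.
two_colour = [interferogram(m * 10e-9, [633e-9, 459e-9], [1.0, 1.0]) for m in range(400)]

# Hypothetical broadband source: 200 components equally spaced in 1/lambda.
sigma0, d_sigma, n_comp = 1.0 / 700e-9, 5.0e3, 200
broadband = [1.0 / (sigma0 + j * d_sigma) for j in range(n_comp)]
white = [interferogram(m * 10e-9, broadband, [1.0] * n_comp) for m in range(400)]
```

Plotting `two_colour` against Δ should show a beat pattern, while `white` should show a bright central fringe at Δ = 0 with the modulation dying away within a fraction of a micron, in the spirit of rows (iii) and (v) of Fig. 8.5.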
8.5 Wiener–Khinchin–Einstein theorem

The Wiener–Khinchin–Einstein theorem[9] states that the power
spectrum of a function is proportional to the Fourier transform of its
autocorrelation function. It is particularly powerful, finding utility
beyond optics, because it can be applied to a class of random signals for
which a Fourier transform is undefined.[10] To understand the theorem,
first we need to define the autocorrelation and the power spectrum.

[9] Norbert Wiener (Missouri 1894–Stockholm 1964); Aleksandr Yakovlevich
Khinchin (Kondrovo 1894–Moscow 1959); Albert Einstein (Ulm 1879–Princeton
1955).

[10] Historically, Einstein used the concept without proof in 1914; Wiener
proved the theorem in his 1930 article ‘Generalized harmonic analysis’;
and Khinchin extended the result to stationary stochastic processes. The
proof of the theorem is very similar to that used for the convolution
theorem, and is included as an end-of-chapter exercise.

In Section 8.4 we saw the link between fringe visibility in a Michelson
interferometer and the (discrete) spectrum of light. The next step,
mathematically, is to extend the sum from a few to infinitely many
waves.[11] As the Michelson is an example of an amplitude-division
interferometer we can evaluate the interferogram by thinking of the
correlation of the two fields that interfere. At a point in the centre of the
observation plane, let E(t) be the electric field from one of the paths; the
field from the other path is E(t + τ), where τ is the time delay associated
with the arms of the interferometer not necessarily being of the same
length: τ = 2Δ/c. From the principle of superposition the total field
at that point will be the sum of the two components associated with the
different paths, and we can calculate the time-averaged intensity from
the expression

$$\langle I \rangle = \epsilon_0 c \langle |E|^2 \rangle = \epsilon_0 c \left\langle \{E(t) + E(t+\tau)\}^* \{E(t) + E(t+\tau)\} \right\rangle. \tag{8.9}$$

The time-averaged square modulus of the electric field can be written as

$$2\langle |E(t)|^2 \rangle + 2\,\mathrm{Re}\,\langle E^*(t)E(t+\tau) \rangle = 2\langle |E(t)|^2 \rangle + 2\,\mathrm{Re}\,\Gamma(\tau). \tag{8.10}$$

The first term is the sum of the intensities of the waves from the separate
arms, and the second term involves the average of the product of the
fields of the separate waves. Γ(τ) is known as the autocorrelation
function of the electric field,[12] formally defined as

$$\Gamma(\tau) = \langle E^*(t)E(t+\tau) \rangle = \frac{1}{T_a}\int_0^{T_a} E^*(t)E(t+\tau)\,dt, \tag{8.11}$$

where the averaging time Ta is long compared to the characteristic
timescale of the fluctuations.[13] It is also useful to write a normalized
form of the autocorrelation,

$$\gamma(\tau) = \frac{\Gamma(\tau)}{\Gamma(0)} = \frac{\langle E^*(t)E(t+\tau) \rangle}{\langle E^*(t)E(t) \rangle}. \tag{8.12}$$

[11] Whereas in the analysis of the interferogram produced by a Michelson
interferometer when illuminated by discrete frequencies, we have used
Fourier synthesis, we shall now take the complementary approach of Fourier
analysis to interpret the output.

[12] It is also known as the first-order correlation function. Later in the
chapter, when we meet intensity correlations, second-order coherence
functions will appear.

[13] To calculate the autocorrelation of a function we multiply each point
of the function by another point a time τ later, and then sum the products
over the integration duration Ta. Mathematically it is very similar to
the convolution operation, Section B.4; however only one function is
needed, and there is no need to take the mirror image in the
multiplication.

The normalized autocorrelation function is constrained to lie within the
range 0 ≤ |γ(τ)| ≤ 1. As it quantifies the temporal correlation between
the two fields in the Michelson interferometer it is a measure of the
coherence of the fields. Equation (8.12) quantifies the earlier discussion:
when |γ(τ)| = 1 the fields are fully coherent, ‘add amplitudes’; when
|γ(τ)| = 0 the fields are fully incoherent, ‘add intensities’; and when
|γ(τ)| ≠ 0 or 1 the fields are partially coherent. Evidently the temporal
behaviour of γ(τ) allows a quantification of the coherence time and length.
The autocorrelation is also related to the fringe visibility. We can
write the time-averaged intensity of eqn (8.9) as

$$\langle I \rangle = 2I_0\left\{1 + \mathrm{Re}\,\gamma(\tau)\right\}, \tag{8.13}$$

where I0 is the intensity from either wave alone. Therefore, for this case,
we have

$$V = \frac{I_{\max} - I_{\min}}{I_{\max} + I_{\min}} = \frac{(1 + |\gamma(\tau)|) - (1 - |\gamma(\tau)|)}{(1 + |\gamma(\tau)|) + (1 - |\gamma(\tau)|)} = |\gamma(\tau)|, \tag{8.14}$$

i.e. the fringe visibility is equal to the modulus of the normalized
autocorrelation function of the fields. This confirms mathematically our
earlier result that coherent light gives clear fringes; partially coherent
light gives less distinct interference fringes; and incoherent light does
not produce interference fringes.[14]

[14] For this particular case of the two waves having equal amplitude, the
visibility is exactly equal to the modulus of the autocorrelation; this
one-to-one correspondence is not obtained with waves of different
intensities, as is investigated in an end-of-chapter exercise.

Table 8.2 shows the functional form for the autocorrelation function,
γ(τ), for a monochromatic wave, a Lorentzian chaotic light source
(similar to the example shown in Fig. 8.3 and explored in Exercise 8.11)
and a Doppler-broadened light source.
Table 8.2 Example of the autocorrelation function γ(τ) for three different
types of light waves of central angular frequency ωc. Typical values of the
correlation time, τc, for different light sources can be found in Table 8.1.

Light source                                          γ(τ)
Monochromatic wave                                    e^{−iωcτ}
Lorentzian chaotic light (e.g. collision broadened)   e^{−iωcτ} e^{−|τ|/τc}
Gaussian chaotic light (e.g. Doppler broadened)       e^{−iωcτ} e^{−(π/2)(τ/τc)²}

For the examples of non-monochromatic waves in Table 8.2, |γ(τ)|
decreases monotonically with |τ|, but with different functional forms.[15]
A frequently encountered definition of the temporal ‘width’ of the
autocorrelation function, and hence a formal definition of the coherence
time, is the power-equivalent width, defined as[16]

$$\tau_c = \int_{-\infty}^{\infty} |\gamma(\tau)|^2\,d\tau. \tag{8.15}$$

[15] The result for a monochromatic wave is an end-of-chapter exercise; the
derivations of the other two results can be found in Chapter 3 of Loudon
(2000).

[16] See Saleh and Teich (1991), Chapter 10.

We shall present a definition of the bandwidth of the spectrum of the
light later in Section 8.6.
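The power-equivalent width of eqn (8.15) can be evaluated numerically for the two chaotic-light entries of Table 8.2, which confirms that the parameter τc appearing in those expressions is indeed the coherence time in this sense. The sketch below uses a simple midpoint rule; the function names, integration range and step count are arbitrary choices.

```python
import math


def gamma_lorentzian(tau, tau_c):
    """|gamma(tau)| for Lorentzian chaotic light, Table 8.2."""
    return math.exp(-abs(tau) / tau_c)


def gamma_gaussian(tau, tau_c):
    """|gamma(tau)| for Gaussian chaotic light, Table 8.2."""
    return math.exp(-(math.pi / 2.0) * (tau / tau_c) ** 2)


def power_equivalent_width(gamma_mod, tau_c, span=20.0, n_steps=20000):
    """Midpoint-rule estimate of the integral of |gamma|^2 over +/- span * tau_c."""
    d_tau = 2.0 * span * tau_c / n_steps
    total = 0.0
    for m in range(n_steps):
        tau = -span * tau_c + (m + 0.5) * d_tau
        total += gamma_mod(tau, tau_c) ** 2
    return total * d_tau


tau_c = 2.0e-12  # roughly the sodium-lamp value in Table 8.1
width_lorentzian = power_equivalent_width(gamma_lorentzian, tau_c)
width_gaussian = power_equivalent_width(gamma_gaussian, tau_c)
```

Both widths should come out equal to the input τc to within the accuracy of the numerical integration, even though the two functions have quite different shapes.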
8.6 Power spectral density

Now, we apply the Wiener–Khinchin–Einstein theorem to investigating
the link between the envelope of the interferogram and the frequency
spectrum of the light. As we saw in Section 8.2, the electric field from a
sum of many waves with different frequencies may be chaotic or random,
but the theorem also applies to any stationary random process and
its spectrum. A random, or stochastic, process is called stationary if
the statistical properties such as the mean and variance (calculated
from the probability densities governing the fluctuations; see Hughes
and Hase (2010)) are invariant under a translation of the origin of time.
Specifically, when calculating the average in eqn (8.11) the value of the
integral is independent of the initial time.[17]

[17] The wave form shown in Fig. 8.3 is an example of a stationary random
function, where the statistical properties do not change over time.

Even for a field with random fluctuations, we can still write the average
intensity as a sum of contributions from components with angular
frequency, ω, which is proportional to[18]

$$|A(\omega)|^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E^*(t)\,E(t')\,e^{-i\omega(t'-t)}\,dt\,dt' = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E^*(t)\,E(t+\tau)\,e^{-i\omega\tau}\,dt\,d\tau, \tag{8.16}$$

with τ = t′ − t and A(ω) = E0 F(ω). Substituting the definition of the
autocorrelation of the electric field, eqn (8.11), into the right-hand side
of eqn (8.16), we obtain

$$|A(\omega)|^2 = T_a \int_{-\infty}^{\infty} \langle E^*(t)E(t+\tau) \rangle\, e^{-i\omega\tau}\,d\tau = T_a\,\mathcal{F}[\Gamma(\tau)](\omega). \tag{8.17}$$

Recall that the averaging time Ta has to be longer than the timescale of
any fluctuations. Formally, we can take the limit Ta → ∞, and define
the power spectral density S(ω) as[19]

$$S(\omega) = \lim_{T_a \to \infty} \frac{1}{T_a}\,\epsilon_0 c\,|A(\omega)|^2. \tag{8.18}$$

S(ω) dω is the average power per unit area transmitted by angular
frequencies in the range ω to ω + dω.[20] As advertised, eqns (8.17) and
(8.18) are a statement of the Wiener–Khinchin–Einstein theorem: the
power spectral density and autocorrelation function are related via a
Fourier transform,

$$S(\omega) = \epsilon_0 c\,\mathcal{F}[\Gamma(\tau)](\omega). \tag{8.20}$$

[18] Note in particular the use of the dummy variables for time in the
Fourier integrals; these have to be different.

[19] Recall that the factor of ε0c is needed to relate the intensity to the
square of the electric field. See eqn (1.23).

[20] The total intensity is obtained by summing the contributions from all
angular frequencies,

$$\langle I \rangle = \int_{-\infty}^{\infty} S(\omega)\,d\omega. \tag{8.19}$$
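A discrete analogue of eqn (8.17) can be verified directly. In the sketch below the field is a list of N random complex samples, the time average of eqn (8.11) is replaced by a circular (periodic) average, and the Fourier integral by a naive discrete sum. These discretization choices, and all names, are assumptions made for the illustration, with N playing the role of Ta.

```python
import cmath
import math
import random


def dft(samples):
    """Discrete analogue of A(omega) = sum over n of E[n] exp(-i omega t_n)."""
    n = len(samples)
    return [
        sum(samples[m] * cmath.exp(-2j * math.pi * j * m / n) for m in range(n))
        for j in range(n)
    ]


def circular_autocorrelation(field):
    """Gamma[m] = (1/N) sum over k of conj(E[k]) E[(k + m) mod N], cf. eqn (8.11)."""
    n = len(field)
    return [
        sum(field[k].conjugate() * field[(k + m) % n] for k in range(n)) / n
        for m in range(n)
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
field = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(32)]
power = [abs(a) ** 2 for a in dft(field)]  # |A|^2
from_gamma = [len(field) * g for g in dft(circular_autocorrelation(field))]  # N * F[Gamma]
mismatch = max(abs(p - g) for p, g in zip(power, from_gamma))
```

The two lists agree to rounding error even though the field itself is random, which is the content of the theorem: the power spectrum is fixed by the autocorrelation alone.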
We can also introduce a normalized version of the power spectral density, $\tilde{S}(\omega)$,21 defined as
$$\tilde{S}(\omega) = \frac{|A(\omega)|^2}{\int_{-\infty}^{\infty} |A(\omega)|^2\,d\omega} = \frac{|F(\omega)|^2}{\int_{-\infty}^{\infty} |F(\omega)|^2\,d\omega} . \tag{8.21}$$
21 The normalized power spectral density and the normalized autocorrelation function are a Fourier transform pair, see end-of-chapter exercises.
The power spectral density also allows us to quantify the concept of the width of the frequency spectrum of the light, the bandwidth. If we define the angular bandwidth as
$$\Delta\omega = \frac{\left(\int_{-\infty}^{\infty} S(\omega)\,d\omega\right)^2}{\int_{-\infty}^{\infty} S^2(\omega)\,d\omega} , \tag{8.22}$$
then the linear bandwidth is simply $\Delta\nu = \Delta\omega/(2\pi)$, and we find that
$$\Delta\nu = \frac{1}{\tau_c} , \tag{8.23}$$
in agreement with eqn (8.5), irrespective of the shape of the spectrum.22
22 See Saleh and Teich (1991), Chapter 10, for more details.
The following example, see Fig. 8.6, further illustrates this inverse relationship between bandwidth and coherence length.
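As a brief aside, the step from eqn (8.22) to eqn (8.23) can be sketched as follows. The sketch assumes that the coherence time of eqn (8.15), which is not reproduced here, is the power-equivalent width of the normalized autocorrelation, $\tau_c = \int_{-\infty}^{\infty}|\gamma(\tau)|^2\,d\tau$; that definition is an assumption about eqn (8.15) rather than a quotation of it. Inverting eqn (8.17) and normalizing as in eqn (8.21) gives $\gamma(0) = 1$ and
$$\gamma(\tau) = \int_{-\infty}^{\infty} \tilde{S}(\omega)\,e^{i\omega\tau}\,d\omega , \qquad \int_{-\infty}^{\infty}\tilde{S}(\omega)\,d\omega = 1 .$$
Parseval's theorem then relates the two widths,
$$\tau_c = \int_{-\infty}^{\infty} |\gamma(\tau)|^2\,d\tau = 2\pi\int_{-\infty}^{\infty} \tilde{S}^2(\omega)\,d\omega = \frac{2\pi\int S^2(\omega)\,d\omega}{\left(\int S(\omega)\,d\omega\right)^2} = \frac{2\pi}{\Delta\omega} ,$$
so that $\Delta\nu = \Delta\omega/(2\pi) = 1/\tau_c$. No property of the spectrum beyond its normalization has been used, which is why the result does not depend on the spectral shape.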
Fig. 8.6 Demonstration of the Wiener–Khinchin–Einstein theorem using a Michelson interferometer. The input is a statistical light source with the normalized power spectrum, $\tilde{S}(\omega)$, shown in the left-hand column. The corresponding Michelson interferogram is shown in the right-hand column. The spectrum and autocorrelation—extracted from the fringe visibility, eqn (8.14)—are a Fourier transform pair. The number of fringes is approximately $2\lambda_c/\delta\lambda$, where $\delta\lambda$ is the width of the spectrum and $\lambda_c = c/[\omega_c/(2\pi)]$ is the central wavelength.
Example 8.1
Fourier transform spectroscopy: We can further highlight the role of the Fourier transform in Michelson interferometry with broadband statistical light by looking again at the intensity at the output, as expressed by eqn (8.9). Recalling the Fourier relationship between the electric field’s autocorrelation and the power spectral density, eqns (8.17) and (8.18), and the link between intensity and power spectral density, eqn (8.19), and taking advantage of the fact that $S(\omega)$ is real, we can rewrite this expression as23
$$I = 2\int_{0}^{\infty} S(\omega)\left[1 + \cos(\omega\tau)\right] d\omega . \tag{8.24}$$
23 Note the different integration limits to eqn (8.19).
The interpretation of this result is that the output of the Michelson interferometer as a function of time delay, or path-length difference—the interferogram—is a sum of contributions produced by each monochromatic component of the input light, weighted by the power spectral density. For a sum of discrete outputs we have seen that the visibility collapsed and then revived; here, with a continuous input spectrum, there is a decrease in the visibility as a function of the time delay. Measuring the decrease in visibility of the interference fringes as a function of path-length difference, as in Fig. 8.6, allows us to evaluate the spectrum of the input light (the power spectral density). This is the basis of a widely used technique known as Fourier transform spectroscopy.
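The decay of the fringe visibility described in the example can be reproduced with a short numerical sketch of eqn (8.24). The gaussian spectrum, the dimensionless units and all names below are illustrative assumptions, not values from the text; the integral is approximated by a simple Riemann sum.

```python
import numpy as np

# Hypothetical gaussian spectrum in dimensionless units (for illustration only).
omega_c, sigma = 20.0, 1.0
omega = np.linspace(0.0, 40.0, 2001)
d_omega = omega[1] - omega[0]
spectrum = np.exp(-((omega - omega_c) ** 2) / (2.0 * sigma ** 2))

def interferogram(tau):
    """Riemann-sum estimate of eqn (8.24) for an array of time delays tau."""
    integrand = spectrum * (1.0 + np.cos(np.outer(tau, omega)))
    return 2.0 * integrand.sum(axis=1) * d_omega

tau = np.linspace(0.0, 10.0, 501)
intensity = interferogram(tau)

# For a gaussian spectrum the integral can be done by hand:
# fringes at omega_c under an envelope exp(-sigma^2 tau^2 / 2).
area = spectrum.sum() * d_omega
envelope = np.exp(-((sigma * tau) ** 2) / 2.0)
analytic = 2.0 * area * (1.0 + envelope * np.cos(omega_c * tau))

print(np.max(np.abs(intensity - analytic)) / area)
```

The printed discrepancy should be at rounding-error level. The fringe envelope is the Fourier transform of the spectrum, so a broader spectrum (larger `sigma`) gives fewer visible fringes, as in Fig. 8.6.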
8.7 Intensity correlations

Thus far, we have always assumed that the incident intensity, $I_0$, is constant; however, all light sources exhibit some intensity fluctuations, and $I_0$ is itself time dependent. The observed time-averaged intensity depends on the nature of these fluctuations, which in turn depends on the nature of the source. When considering first-order coherence we have characterized the statistical properties of light by considering intensity measurements at one instant of time. Further insight into the properties of light can be gained by considering intensity correlations, i.e. pairs of intensity measurements with a delay $\tau$. The normalized degree of second-order coherence, $g^{(2)}(\tau)$, is defined as24
$$g^{(2)}(\tau) = \frac{\langle I^*(t)I(t+\tau)\rangle}{\langle I\rangle^2} = \frac{\langle E^*(t)E^*(t+\tau)E(t+\tau)E(t)\rangle}{\langle E^*(t)E(t)\rangle^2} . \tag{8.25}$$
24 Note that the order of the electric field factors obeys a convention that is used in the quantum treatment of the coherence properties of light.
The degree of second-order coherence can be used to characterize photon emission by different sources. For example, a single quantum emitter can only emit one photon at a time giving rise to anti-bunched light, $g^{(2)}(0) < 1$. In contrast, a thermal (or blackbody) source emits photons at random times, and interference of the time-averaged intensity produces bunched light, $g^{(2)}(0) > 1$. The discovery of bunched light by Hanbury Brown and Twiss25 in 1956—later known as the Hanbury Brown and Twiss effect—led to the birth of quantum optics.26
25 Robert Hanbury Brown (Aruvankadu 1916–Andover 2002), Richard Q. Twiss (Simla 1920–2005).
26 The interested reader is referred to Loudon’s The Quantum Theory of Light (2000).
8.8 Spatial coherence
In this section we shall assume that the light illuminating our inter-
ferometer is quasi-monochromatic, i.e. the coherence length is longer
than any path length differences in the aperture, such that there is
no degradation of the interference fringes as a consequence of lack of
temporal coherence. In studying spatial coherence we shall investigate
the consequence of the spatial extent of the source on the degree to which
knowledge of the electric field at one location allows prediction (or lack
of it) of the electric field at another location. We shall exemplify these
ideas by analysing Young’s double-slit experiment.27
27 Young observed interference fringes using sunlight even though it is neither monochromatic nor from a localized source. The Sun is 1.4 million kilometres across and light emitted from one side does not maintain a fixed phase relationship with light emitted from the other. Why was Young still able to observe interference fringes?
Consider the double slit shown in Fig. 8.7. The double slit is in the $z = 0$ plane and the field is assumed to be uniform in the $y$ direction. In Chapter 3, a key assumption was that the light arriving at the two slits could be described by a plane wave propagating along $z$, i.e. at normal incidence to the screen containing the slits. This meant that the field arriving at both slits was the same. This is a very good assumption if the source looks like a distant point source, but what happens if the source is spatially extended? To limit the spatial extent of the source, Young placed another aperture upstream of the double slit, and it is the dimension of this aperture that determines whether interference fringes are observed or not.
Consider an input aperture with width as , placed a distance zs
upstream of the double slit, as shown in Fig. 8.7. The aperture allows
plane waves with angles in the range −as /(2zs ) < θs < as /(2zs ) to enter
the interferometer.
In Young’s apparatus, for an on-axis input point the field at a displacement $x$ in the observation plane is
$$E = E_1 e^{ikdx/2z} + E_2 e^{-ikdx/2z} . \tag{8.26}$$
For an input point displaced by a distance $x_s$ the two terms pick up an additional phase such that
$$E = E_1 e^{ikdx/2z}\,e^{ikdx_s/2z_s} + E_2 e^{-ikdx/2z}\,e^{-ikdx_s/2z_s} , \tag{8.27}$$
and the modulus-squared is
$$EE^* = E_1^2 + E_2^2 + 2E_1E_2\cos\left(\frac{kdx}{z} + \frac{kdx_s}{z_s}\right) . \tag{8.28}$$
If $E_2 = E_1$, then we can write the intensity in the plane $z$ as
$$I_s = 2\bar{I}_s\left[1 + \cos\left(\frac{kdx}{z} + \frac{kdx_s}{z_s}\right)\right] = 4\bar{I}_s\cos^2\left[\frac{kd}{2}\left(\frac{x}{z} + \frac{x_s}{z_s}\right)\right] , \tag{8.29}$$
where $\bar{I}_s$ is the intensity in the observation plane if one slit is blocked. Hence, the result of the displaced input is simply to translate the interference pattern by a geometrical factor $(z/z_s)x_s$.
Fig. 8.7 Young’s interferometer, consisting of a source plane at $z = -z_s$, a double-slit plane at $z = 0$, and the observation plane at $z$. In this example, the incident light is assumed to be a plane wave propagating at an angle $\theta_s$ relative to the $z$ axis. In the paraxial limit, we can write that $\theta_s = x_s/z_s$, where $x_s$ is a transverse displacement in the source plane. The effect of the displaced source point is to translate the interference pattern by a distance $(x_s/z_s)z$ in the observation plane.
The next step is to sum over the source coordinate, xs . As each
component originates from a different point on a distant source, e.g. the
Sun, we can assume that we can sum each component incoherently, i.e.
add their intensity contributions rather than their amplitudes. Hence for
an extended input, the intensity distribution is given by the integral of
eqn (8.29) over the source coordinate xs with limits ±as /2. The result of
this integral for different values of $a_s$ is shown in Fig. 8.8 (right column).
As more displaced waves are added, the visibility of the interference
fringes decreases towards zero; then the fringes partially reappear, but
with a π phase difference. The reason for this sign reversal is that there
are now more waves in the sum with their central maximum displaced
by half a spatial period. Next, we derive an analytical expression for the
visibility of the fringe pattern seen in Fig. 8.8.
8.9 van Cittert–Zernike

Shortly we shall find that the visibility of the interference fringes is given by the modulus of the Fourier transform of the input intensity distribution. This result, known as the van Cittert–Zernike theorem,28 is particularly useful in astronomy, as it says that by analysing interference patterns one can determine the size of a distant source, e.g. a star, even when it is impossible to directly image its spatial extent.
28 Pieter Hendrik van Cittert (Gouda 1889–Utrecht 1959); Frits Zernike (Amsterdam 1888–Amersfoort 1966).
Following the analysis of eqn (8.29), we can write the intensity component from a particular source point $x_s$ at an observation point, $(x, z)$, as
$$I_s = 2\bar{I}_s + 2\bar{I}_s\,e^{i(kdx/z + kdx_s/z_s)} , \tag{8.30}$$
Fig. 8.8 Variation of the fringe visibility as the spatial extent of the ‘source’ is varied. The left-hand column shows the spatial width of the source ($a_s$ in Fig. 8.7). The right-hand column shows the corresponding double-slit interference pattern. Examples of the displaced fringes that make up the pattern are shown in grey in the top row. The visibility goes to zero when the input width $a_s = (\lambda/d)z_s$. If the width is increased further (bottom row) the fringes partially reappear, but with the opposite phase. The maximum visibility of fringes is obtained in the limit $a_s \to 0$, however, at the cost that the intensity goes to zero.
where we have reverted to the complex form, as this allows us to separate the two terms and easily perform the integral over the input coordinate, $x_s$. Now we perform the sum over all input coordinates.29 The first two terms sum to give the intensity emerging from slits 1 and 2. The final term contains all the information about the relative phase. Assuming that the contribution to the intensity from each slit is $\bar{I}$, then the integral of eqn (8.30) has the form
$$I = 2\bar{I}\left[1 + |\gamma_{12}|\cos\left(\frac{kdx}{z}\right)\right] , \tag{8.31}$$
where the interference term is
$$\gamma_{12} = \frac{\Gamma_{12}(k)}{\Gamma_{12}(0)} , \tag{8.32}$$
with
$$\Gamma_{12}(k) = \int_{-\infty}^{\infty} f(x_s)\,e^{-i(kd/z_s)x_s}\,dx_s . \tag{8.33}$$
29 Taking care to sum all the contributions of the intensity per unit length of the source.
We have used the aperture function $f(x_s)$ that characterizes the intensity distribution of the input. For a symmetric source $\gamma_{12}$ is real. Eqn (8.33) has the form of a Fourier transform. Following the analysis of eqn (8.14), we can explicitly evaluate the visibility of the fringes, and obtain $V = |\gamma_{12}|$. This result, that the visibility of the interference pattern is given by the Fourier transform of the intensity distribution of the source, is known as the van Cittert–Zernike theorem.30 For a uniform source of width $a_s$, $f(x_s) = \mathrm{rect}(x_s/a_s)$, and we obtain
$$\gamma_{12} = \mathrm{sinc}\left(\frac{kda_s}{2z_s}\right) . \tag{8.34}$$
30 The van Cittert–Zernike theorem plays a similar role for spatial coherence to that of the Wiener–Khinchin–Einstein theorem for temporal coherence; specifically that the visibility of the interference fringes is governed by the Fourier transform of a correlation function.
Fig. 8.9 The fringe visibility, $|\gamma_{12}|$ in eqn (8.34), for Young’s double-slit experiment as a function of the input slit width, $a_s$. The first zero in the visibility occurs when $a_s = (\lambda/d)z_s$, where $z_s$ is the distance between the input plane and the plane containing the slit, see Fig. 8.7.
The fringe visibility given by $|\gamma_{12}|$ is plotted in Fig. 8.9. The first zero of the sinc term is at $a_s = (\lambda/d)z_s$, exactly as we found previously.
There is a simple explanation for why the fringe visibility goes to zero
when as = (λ/d)zs . Recall that for a rectangular aperture of width as ,
the sinc diffraction pattern has first zeros at an angle of λ/as . Therefore
the transverse size of the spot on the screen with the double slits is
zs λ/as . If the slit is narrower than (λ/d)zs the diffraction pattern covers
both slits (of separation d). We can say that the transverse coherence
length is longer than the separation of the slits, hence we get clear
fringes. By contrast, for a wider slit there is a narrower diffraction
pattern, and light illuminating one of the slits is not coherent with the
light illuminating the other.31 Figure 8.10 highlights the importance of the width of the first slit in the formation of clear interference fringes in Young’s experiment. Note the trade-off between the amount of light incident on the screen containing the double slits—a wider initial aperture transmits more light—and the coherence of the illumination—a narrower initial aperture illuminates the double slits more coherently.
31 In a photon picture one can ask whether it is possible to work out which path a photon takes from the source to the screen. If one can, there is no interference. The condition on the width of the first slit is exactly to ensure that no ‘which path’ information is available, see Section 9.8.
Fig. 8.10 For a narrow slit (left column), waves that are normally incident (upper) and inclined (lower) are diffracted sufficiently to illuminate the double slits ‘coherently’; and the position of the interference does not shift significantly as the angle of incidence is changed. In contrast, for a wider entrance slit (right column), changing the angle of incidence shifts the fringes by an amount comparable to fringe spacing leading to a loss of fringe visibility.
There is another useful way of thinking about the transverse coherence

length of light in the double-slit plane. For small angles we see that the
transverse coherence length, which is given by λzs /as , can be written as
λ/θ, where θ is the angle subtended by the source slit at the centre of
the double-slit plane. Sources which present small angular widths at an
interferometer provide coherent illumination.
If, instead, we vary the aperture separation, d, (or tune the wavelength
by filtering) we can null the fringe visibility, as in Fig. 8.11. In this case,
the sinc function is zero, so kdas /2zs = π, which can be rearranged to
give the angle subtended by the source, as /zs = λ/d. Hence, if we can
assume that the source intensity is uniform then by nulling the fringe
visibility we measure the size of the source, see Section 8.11.
Fig. 8.11 Fringe pattern in a double-slit experiment as the slit separation $d$ is varied. The fringes disappear when the slit separation $d = \lambda/\Delta\theta_s$, where $\Delta\theta_s = a_s/z_s$ is the angle subtended by a source of width $a_s$ at a distance $z_s$, see Fig. 8.7.
The van Cittert–Zernike theorem finds great utility in optics, beyond giving an analytic expression for the fringe visibility in Young’s apparatus; see Goodman’s Statistical Optics (1985) or O’Neill’s Introduction to Statistical Optics (1963) for further details. The van Cittert–Zernike result explains how the degree of spatial coherence of an optical field downstream of a source of incoherent emitters is related to the geometry of the source. It allows for a calculation of the degree of coherence of an extended quasi-monochromatic source in terms of the intensity distribution over the source. Note that an incoherent set of emitters gives
rise, in general, to a partially coherent field. Specifically, propagation
and diffraction of light can improve the degree of coherence of the field.
That is why we emphasized earlier that coherence is a property of the
field, not the source: Young found a way to illuminate coherently two
holes using sunlight by placing a sufficiently small aperture in front of
them; thus the question ‘is the Sun a coherent source?’ is not useful.
8.10 Propagation of coherence

An alternative approach to that taken by van Cittert and Zernike is to use the Fresnel propagation formula, eqn (6.33), to calculate the electric field correlation between two points, $(0, z)$ and $(x, z)$, in the observation plane; abbreviating $E^{(z)}(x)$ as $E(x)$, we have32
$$\frac{\langle E^*(0)E(x)\rangle}{E_0^2} = \frac{e^{ikx^2/2z}}{\lambda z}\iint_{-\infty}^{\infty} \langle f^*(x_1)f(x_2)\rangle\, e^{ik(x_2^2 - x_1^2)/2z}\, e^{-ikxx_2/z}\,dx_1\,dx_2 ,$$
where the double integral is over all coordinates, $x_1$ and $x_2$, in the $z = 0$ plane.
32 See also Champeney (1973).
The equation shows that the spatial coherence characteristics in the two planes are related by a Fourier transform. The importance of the Fresnel factor, $e^{ik(x_2^2 - x_1^2)/2z}$, which characterizes the size of the Fresnel zones, as illustrated in Fig. 8.12, decreases as $z$ increases, because the Fresnel zones become larger and the zones associated with input points $x_1$ and $x_2$ have a larger overlap. If the field in the first plane has a very short transverse coherence length, it is possible to derive a simpler result: that the transverse correlation function in the observation plane varies with position in exactly the same way as the field amplitude in a coherent diffraction experiment, with the aperture function replaced by the intensity distribution of the source:
$$\langle E^*(0)E(x)\rangle = L_c\,\frac{2}{c\epsilon_0}\,\frac{e^{ikx^2/2z}}{\lambda z}\int_{-\infty}^{\infty} I(x_2)\,e^{-ikxx_2/z}\,dx_2 , \tag{8.35}$$
where we have written $I(x_2) = \tfrac{1}{2}c\epsilon_0 E_0^2|f(x_2)|^2$ and $L_c$ is the coherence length in the input plane. As a consequence, light from a source composed of incoherent emitters (such as a star) with characteristic dimension $D$ at a distance $z$ downstream will have a transverse coherence length of $\sim\lambda z/D$. For the case of a uniform circular source of diameter $D$ the spatial coherence function first goes to zero at a value of $1.22\lambda z/D$.
Fig. 8.12 The propagation of coherence: the correlation (spatial coherence) between the fields at $(0, z)$ and $(x, z)$ is given by the Fourier transform of the overlap between Fresnel zones centred around points $(x_1, z)$ and $(x_2, z)$ in the input plane. The spatial coherence grows as the field propagates, i.e. a field that is incoherent at $z = 0$ can become more coherent in the plane $z$, as in Young’s two-hole experiment with sunlight.
Extending the analysis of transverse coherence to two transverse dimensions we arrive at the concept of a coherence area. Our example above used Young’s apparatus, concentrating on the form of the interference fringes along one dimension. We found it useful to compare the transverse coherence length of the light arriving at the two slits relative to the slit separation. Of course, the light illuminating the slits has a two-dimensional distribution. Using the van Cittert–Zernike theorem, it is possible to show,33 for a monochromatic uniformly bright source of incoherent emitters with an area $A_s$, that the coherence area, $A_c$, at a distance $z$ downstream is
$$A_c = \frac{(\lambda z)^2}{A_s} = \frac{\lambda^2}{\Omega_s} , \tag{8.36}$$
where $\Omega_s$ is the solid angle subtended by the source at the centre of the observation region. This result encapsulates the earlier discussion that the coherence properties of a wave field improve with propagation, i.e. it is easier to observe interference fringes if the light has propagated far from the source before it illuminates an interferometer.
33 See Goodman (1985) for details.
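The second equality in eqn (8.36) can be made explicit with one line of geometry. As a sketch, assume a source that is small compared with its distance and viewed face-on, so that the solid angle it subtends is approximately its area divided by the distance squared:
$$\Omega_s \simeq \frac{A_s}{z^2} \quad\Longrightarrow\quad A_c = \frac{(\lambda z)^2}{A_s} = \frac{\lambda^2}{A_s/z^2} = \frac{\lambda^2}{\Omega_s} .$$
For a circular source of diameter $D$, $A_s = \pi D^2/4$, and the linear size of the coherence area is
$$\sqrt{A_c} = \frac{2}{\sqrt{\pi}}\,\frac{\lambda z}{D} \approx 1.13\,\frac{\lambda z}{D} ,$$
which is consistent with the transverse coherence length of order $\lambda z/D$ quoted in Section 8.10.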
8.11 Stellar interferometry

Fig. 8.13 Michelson’s stellar interferometer. By varying the distance, $d$, between the outermost mirrors until the interference pattern disappears, it is possible to estimate the spatial size of the source.
The Fourier transform relationship between fringe visibility and transverse coherence outlined in Section 8.10 may be used to measure the size of stars, a technique known as Michelson stellar interferometry. For a star the source is a circular aperture and we should replace the rect function in eqn (8.33) by $\mathrm{circ}(\rho/D_s)$; the sinc dependence of the visibility in eqn (8.34) is then replaced by a jinc function, and the fringes vanish for an angular width $D_s/z_s = 1.22\lambda/d$, where $D_s$ is the diameter of the source. This idea was used by Michelson and Pease (1921) to measure the diameter of Betelgeuse, the first measurement of an extra-solar stellar diameter. They mounted two mirrors on a rail such that $d$ could be varied, as in Fig. 8.13, and found that fringes disappeared at about $d = 3$ m using light at $\lambda = 575$ nm. This gives an angular width of $D_s/z_s = 2 \times 10^{-7}$ (Michelson quotes 47.0 ± 4.7 milli-arcseconds). The distance to Betelgeuse is $z_s = 7 \times 10^{18}$ m, giving a diameter $D_s = 1.4 \times 10^{12}$ m, about $10^3$ times larger than the diameter of the Sun (the solar radius is $6.96 \times 10^{8}$ m). It has proved difficult to make a more accurate measurement because Betelgeuse is a complex source with a non-uniform intensity distribution and a shape that evolves with time.
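The arithmetic behind these numbers can be laid out in a few lines. This is only a sketch using the values quoted in the text; the variable names are illustrative, and the unit conversion to milli-arcseconds is added for comparison with Michelson's figure.

```python
import math

wavelength = 575e-9          # m, from the text
d_null = 3.0                 # m, mirror separation at which the fringes vanish ("about 3 m")
z_s = 7e18                   # m, distance to Betelgeuse quoted in the text
solar_diameter = 2 * 6.96e8  # m, twice the solar radius quoted in the text

theta = 1.22 * wavelength / d_null              # angular diameter D_s / z_s in radians
theta_mas = math.degrees(theta) * 3600.0 * 1e3  # the same angle in milli-arcseconds
diameter = theta * z_s                          # D_s in metres

print(f"angular diameter: {theta:.2e} rad = {theta_mas:.1f} mas")
print(f"diameter: {diameter:.2e} m = {diameter / solar_diameter:.0f} solar diameters")
```

Keeping the unrounded angle gives a diameter somewhat above the $1.4 \times 10^{12}$ m obtained in the text from the rounded value $2 \times 10^{-7}$; both are of order $10^3$ solar diameters, and the angle of about 48 milli-arcseconds lies within the uncertainty Michelson quotes.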
Chapter summary
• Coherent light has a well-defined phase.

• The visibility of the fringe pattern in an interferometer is a
measure of the coherence of the illuminating light.
• Temporal coherence relates to the time variation of the phase
associated with different frequency components.
• The coherence time of chaotic light is the inverse of the bandwidth.
• A Michelson interferometer can be used as a Fourier transform
spectrometer to characterize the power spectral density of the
light.
• The Wiener–Khinchin–Einstein theorem states that the
autocorrelation function of a stationary random process and the
power spectrum of the process are a Fourier transform pair.
• Spatial coherence relates to the ability to predict the electric
field at one location from knowledge of the field at another location
when the light is emitted by a source of finite size.
• Young’s double-slit apparatus can be used to characterize spatial
coherence.
• The transverse coherence length (area) of light is inversely
proportional to the angle (solid angle) subtended by the extended
source.
• The van Cittert–Zernike theorem states that the visibility
of an interference pattern formed in a wavefront-division
interferometer illuminated by quasi-monochromatic light is given
by the Fourier transform of the intensity distribution of the source.
Exercises
(8.1) Temporal coherence
If the spectral width of a source is quoted as the wavelength range $\Delta\lambda$, show that the coherence time is $\Delta t = \lambda_c^2/(c\Delta\lambda)$, where $\lambda_c$ is the central wavelength.
(8.2) Coherence of sunlight
Assuming a filter is used to select a narrow band of colours around a mean wavelength of 550 nm, and given the diameter of the Sun, and the Earth–Sun separation in the text, calculate the transverse coherence length of sunlight at Earth.
(8.3) Single emitters
Discuss the coherence of light emitted by a single atom. How would you test the coherence properties experimentally?
(8.4) Visibility of fringes and $\gamma(\tau)$
Repeat the analysis of the form of the interference fringes in an amplitude-splitting interferometer, but with different amplitudes for the two waves. In this case, show that the visibility is given by $V = 2(I_1I_2)^{1/2}/(I_1 + I_2)\,|\gamma(\tau)|$, where $I_1$ and $I_2$ are, respectively, the intensities that waves 1 and 2 alone would produce. Show that this reduces to the value quoted in the text in the special case of equal intensities.
(8.5) Autocorrelation function for a monochromatic wave
For a monochromatic wave with angular frequency $\omega_0$, show that $\gamma(\tau) = e^{-i\omega_0\tau}$. Hence show that the magnitude of the first-order correlation is always equal to one; i.e. the light is perfectly coherent.
(8.6) Autocorrelation function for different light sources
Plot the three forms of the autocorrelation function given in Table 8.2 as a function of the variable $\tau/\tau_c$. Comment on similarities and differences among the curves.
(8.7) Properties of the autocorrelation function
Show that the autocorrelation function, $\Gamma(\tau)$, is:
(i) a maximum value at zero delay.
(ii) a Hermitian symmetric function of $\tau$; i.e. $\Gamma^*(\tau) = \Gamma(-\tau)$.
(iii) periodic if $E$ is periodic, with the same period.
(iv) in the limit of large $\tau$, $\Gamma(\tau \to \infty) = \langle E^*\rangle\langle E\rangle$.
Use the result of part (iv) to comment on the value of the autocorrelation function of the electric field in the limit of large $\tau$.
(8.8) Power-equivalent width of autocorrelation functions
Verify that the form of the autocorrelation functions in Table 8.2 for (i) Lorentzian and (ii) gaussian chaotic light are consistent with the power-equivalent width as defined in eqn (8.15).
(8.9) The Wiener–Khinchin–Einstein theorem
Use the results of the Fourier toolkit (especially the derivation of the convolution theorem) to fill in all of the details skipped in the text to prove the Wiener–Khinchin–Einstein theorem, i.e. that the autocorrelation function of a stationary random process and the power spectrum of the process form a Fourier transform pair.
(8.10) Normalized power spectral density
Using results from the Fourier toolkit, show that the normalized power spectral density and the normalized autocorrelation function form a Fourier transform pair.
(8.11) Lorentzian lineshape—tutorial
The wave form shown in Fig. 8.3 is an example of a stationary random function, where the statistical properties do not change over time. Although an analytic form for the Fourier transform of such functions does not exist, we can still calculate it numerically and compare the result to analytical expressions for the power spectrum. If we define the autocorrelation as
$$\gamma(t) = \langle f^*(t')f(t'+t)\rangle = \lim_{T\to\infty}\frac{1}{T}\int_0^T f^*(t')f(t'+t)\,dt' ,$$
and the power spectrum as
$$S(\omega) = \lim_{T\to\infty}\frac{1}{T}F^*(\omega)F(\omega) ,$$
then using Parseval’s theorem, see Appendix B, eqn (B.25), and the Wiener–Khinchin–Einstein theorem, show that
$$\int_0^{\infty} S(\omega)\,d\omega = 2\pi\langle|f(t)|^2\rangle .$$
If $P(\phi, t)$ is the probability of the phase jumping by an amount between $\phi$ and $\phi + d\phi$ in a time $t$, then
$$\gamma(t) = e^{-i\omega_c t}\int_0^{2\pi} e^{-i\phi}P(\phi, t)\,d\phi .$$
If the average time between phase jumps is $\tau$ then we can write that
$$P(\phi, t) = e^{-|t|/\tau}\delta(\phi) + \frac{1}{2\pi}\left(1 - e^{-|t|/\tau}\right) ,$$
where the first term is the probability that no jump has occurred and the second term is to normalize. Using these expressions, show that the autocorrelation function is
$$\gamma(t) = e^{-i\omega_c t}\,e^{-|t|/\tau} .$$
Now use the Wiener–Khinchin–Einstein theorem, to show that the normalized power spectrum is
$$\tilde{S}(\omega) = \frac{1/\tau^2}{1/\tau^2 + (\omega - \omega_c)^2} . \tag{8.37}$$
Write a simulation of a harmonic wave with random phase resets as in Fig. 8.3. Calculate the Fourier transform numerically and fit the power spectrum with the analytical Lorentzian lineshape, eqn (8.37).
Write an expression for the probability of detecting a photon with angular frequency between $\omega$ and $\omega + d\omega$ over a measurement time $T$.
(8.12) Fourier transform spectrometry—qualitative
Sketch the interferograms associated with a source that emits:
(i) two colours with equal amplitude, and
(ii) a continuous spectrum with a width equal to one-tenth of the central frequency.
(8.13) Fourier transform spectrometry—quantitative
Annotate sketches of interferograms for these specific cases:
(i) Illumination with a sodium lamp emitting one line at 589.0 nm which is twice as intense as another line at 589.6 nm. Both lines have a gaussian profile for their power spectral density with a full width at half maximum (FWHM) of 2 GHz.
(ii) Illumination with a helium–neon laser with a wavelength 632.8 nm with a gaussian profile for the power spectral density with a FWHM of 1.5 GHz.
(8.14) Young’s two-hole experiment
Thomas Young placed a single hole upstream of a screen containing two holes in order to limit the effective spatial extent of the source. The distance between the single hole and the two holes was equal to the distance between the two holes and the observation plane, $z_s = z = 1.0$ m, and the slit separation was $d = 1.0$ mm.
(i) Estimate the size of the hole needed to observe interference fringes for a central wavelength of 0.55 μm.
[Hint: take the upper limit on the hole size as the diameter where the fringes disappear, and remember to include a factor of 1.22 for circular apertures.]
(ii) What is the spacing between the fringes?
(iii) If the visible spectrum is between 445 and 625 nm, estimate how many fringes are visible.
(8.15) Spatial coherence and Young’s double slits (1)
In the text we derived the width of the slit for which the visibility of the fringes first goes to zero, $a_s = (\lambda/d)z_s$. What is the relationship between the fringes from waves originating at positions $x_s$ and $x_s + a_s/2$? Hence explain why the visibility is zero for this value of $a_s$.
(8.16) Spatial coherence and Young’s double slits (2)
Consider a Young’s interferometer where the first slit has a fixed width $a_s$, but the separation $d$ between the pair of holes in the second screen is variable. Discuss what happens to the visibility of the fringes as a function of $d$.
(8.17) Spatial coherence and Young’s double slits (3)
Figure 8.14 shows the intensity pattern in a Young’s double-slit experiment as the entrance slit width $a_s$ is varied. Sketch how the visibility varies as a function of $a_s$, indicating the position $a_s = (\lambda/d)z_s$.
(8.18) Temporal coherence and Young’s double slits (1)
Figure 8.15 shows the intensity pattern in a Young’s double-slit experiment where the temporal coherence rather than spatial coherence determines the fringe visibility. Show using a similar analysis to Section 8.9 that the intensity is given by
$$I = 2I_1\left[1 + |\gamma|\cos\left(\frac{k_c dx}{z}\right)\right] ,$$
where $k_c = 2\pi/\lambda_c$, $\nu_c = c/\lambda_c$ is the central frequency of the input light, and $\gamma = \Gamma(\nu)/\Gamma(0)$, where
$$\Gamma(\nu) = \int_{-\infty}^{\infty} F(\nu)\,e^{i2\pi\nu dx/(cz)}\,d\nu .$$
Derive an expression for the intensity pattern if instead of the gaussian spectrum of Fig. 8.15 an interference filter is placed at the input, and the frequency spectrum can be described by the function $F(\nu) = \mathrm{rect}[(\nu - \nu_c)/\Delta\nu]$. The eighth fringe at $x = 8(\lambda_c/d)z$ is found to be suppressed completely. What is the relationship between $\Delta\nu$ and $\nu_c = c/\lambda_c$?
(8.19) Temporal coherence and Young’s double slits (2)
Use the analysis of Exercise 8.18 to show that the approximate number of fringes observed, e.g. the number to first zero of the envelope function, is independent of the dimensions of the apparatus. If we can approximate visible sunlight as a rectangular function with $\Delta\nu/\nu_c = 1/3$, estimate how many fringes Young might have been able to observe.
(8.20) Temporal coherence and Young’s double slits (3)
Figure 8.16 shows the intensity pattern in the $xz$ plane for a Young’s double-slit experiment using white light. Sketch the far-field intensity distributions along the $x$ axis for
(i) red light only,
(ii) blue light only, and
(iii) all frequency components.
Comment on the near-field distributions for each case.
(8.21) Michelson’s stellar interferometer
Using the parameters given in the text, calculate:
(i) the transverse coherence length, and
(ii) the coherence area of the light from Betelgeuse at Earth.
Fig. 8.14 Fringe pattern in a double-slit experiment as the ‘source’-slit width $a_s$ (see Fig. 8.7) is varied. As $a_s$ is increased, more light enters the interferometer but the range of angles also increases which leads to a ‘washing out’ of the fringe pattern. The fringes disappear completely when the input width $a_s = (\lambda/d)z_s$. For even larger values of $a_s$, the interference pattern revives, but with reduced visibility.
Fig. 8.15 Young’s double-slit experiment using ‘white light’. The frequency spectrum is shown on the left and the corresponding fringe pattern on the right. The number of observed fringes is roughly $\nu_c/\Delta\nu$, where $\nu_c$ is the centre frequency and $\Delta\nu$ is the bandwidth.
Fig. 8.16 Young’s double-slit experiment using ‘white light’. The total intensity distribution in the $xz$ plane, given by the sum of the intensity distributions for each colour, is shown.
9 Optical imaging
Une simple image, si elle est nouvelle, ouvre un monde. [A simple image, if it is new, opens up a world.]
La poétique de l’espace (1958)
Gaston Bachelard (Bar-sur-Aube 1884–Paris 1962)
9.1 Introduction
In Chapter 6 we looked at the Fourier transforming property of a lens. In this chapter we examine how this property determines the parameters of an image. We introduce and analyse the point-spread function of an imaging system, and highlight the importance of the resolution limit in optics. Also we shall consider a two-lens system, which provides a useful prototype for common optical instruments such as the telescope or microscope. First, we briefly review the importance of understanding diffraction in the history of imaging, and then the Fourier transforming property of a lens.
9.2 History: Zeiss and Abbe

In 1868, the industrialist Carl Zeiss (Weimar 1816–Jena 1888) sought
the help of the local physics professor Ernst Abbe (Eisenach 1840–Jena
1905) on how to improve the lenses in his microscopes. Abbe began from
the perspective of geometric optics and reached the wrong conclusion
that a smaller aperture lens would be better because aberrations would
be reduced. When these smaller aperture microscopes were found to be
worse, Abbe went back to the laboratory and realized the importance of
diffraction in imaging systems. He concluded that the lens needs to be as
large as possible to capture the higher spatial frequencies that provide
the fine detail in an image. Abbe derived a formula—known as the
Abbe diffraction limit—that relates the diameter and focal length
of the lens to the spatial resolution. A convenient derivation of the Abbe diffraction limit follows from the sum of plane waves considered in Chapter 3 and illustrated in Fig. 9.1. The sum of two waves with origin on opposite sides of the lens produces an interference pattern with a period,
$$\Lambda = \frac{\lambda}{2\sin\alpha} ,$$
where tan α = D/(2f). Light diffracted by spatial frequencies larger than 1/Λ misses the edge of the lens, so Λ sets the resolution limit. The Abbe diffraction limit is often written in terms of the numerical aperture (NA) of the lens. We can write that the smallest resolvable detail has a spatial extent,
$$\Delta x = \frac{\lambda}{2\,\mathrm{NA}} ,$$
where NA = n sin α, if we include the possibility that the microscope may be operated with a liquid with refractive index n.¹ Abbe’s result, $\Delta x \simeq f\lambda/D$, agrees approximately with the Fraunhofer or Fourier result, $\Delta x = 1.22 f\lambda/D$, see Fig. 5.12. Abbe’s contribution also illustrates a fruitful synergy between industry and academia (and between theory and experiment).

¹ In the small-angle approximation, the numerical aperture becomes
$$\mathrm{NA} = n\sin\left[\tan^{-1}\left(\frac{D}{2f}\right)\right] \approx n\frac{D}{2f} .$$

Fig. 9.1 The Abbe diffraction limit is derived by considering the sum of two waves, with wave vectors $\mathbf{k}_1$ and $\mathbf{k}_2$, originating from opposite sides of a lens.
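As a rough numerical illustration of the Abbe limit, the following sketch evaluates Δx = λ/(2NA) with NA = n sin α and tan α = D/(2f), and compares it with the small-angle form fλ/D and the Airy value 1.22fλ/D. The lens parameters (550 nm light, a 5 mm diameter lens with 50 mm focal length) and the function names are illustrative assumptions, not values taken from the text.

```python
import math

def numerical_aperture(diameter, focal_length, n=1.0):
    """NA = n sin(alpha), with tan(alpha) = D / (2 f)."""
    alpha = math.atan(diameter / (2.0 * focal_length))
    return n * math.sin(alpha)

def abbe_limit(wavelength, diameter, focal_length, n=1.0):
    """Smallest resolvable detail, Delta x = lambda / (2 NA)."""
    return wavelength / (2.0 * numerical_aperture(diameter, focal_length, n))

# Illustrative (assumed) parameters, SI units.
wavelength = 550e-9
diameter = 5e-3
focal_length = 50e-3

dx_abbe = abbe_limit(wavelength, diameter, focal_length)
dx_small_angle = wavelength * focal_length / diameter    # f lambda / D
dx_airy = 1.22 * wavelength * focal_length / diameter    # 1.22 f lambda / D

print(f"Abbe limit:        {dx_abbe * 1e6:.3f} um")
print(f"f lambda / D:      {dx_small_angle * 1e6:.3f} um")
print(f"1.22 f lambda / D: {dx_airy * 1e6:.3f} um")
```

For a slow lens such as this one the exact and small-angle forms agree to well under one per cent; the difference grows as D/(2f) approaches unity.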
9.3 Point-spread function

Historically, Abbe’s theory was an important step towards our current perspective, where the diffraction limit of a lens, or lens system, is expressed in terms of the Fourier transform of the transmission through the lens. Recall that the Fresnel diffraction integral in the focal plane of a lens, eqn (6.35), is
$$E^{(f)} = \frac{E_0 e^{ik\bar{r}}}{i\lambda f}\,\mathcal{F}\left[f(x', y')\right](u, v) , \tag{9.1}$$
where $E_0 f(x', y')$ is the field incident on the lens. The Fourier relationship
between input field and focal spot tells us that if we want to image a
‘point’, i.e. a δ-function in position, we need an infinite range of spatial
frequencies, and hence a lens with infinite spatial extent. In practice, the
finite size of the lens, or lens aperture, sets a lower limit on the spatial
extent of the focal spot and hence on the detail in any image. Figure 9.2
shows the intensity distribution for a uniform light field incident on a
lens with finite size. Only light within the cross-sectional area of the lens
is captured, the rest either continues or is blocked. Looking carefully at
the focus, one can see that the light is not focused to a point but has a
finite transverse size.
The function $f(x', y')$ in eqn (9.1) may be split into two parts. Firstly, the field incident on the lens plane, which we write as $f_i(x', y')$, and secondly, a function that expresses the transmission through the lens, $t(x', y')$, which includes the effect of the finite size of the lens. It follows that
$$f(x', y') = f_i(x', y')\,t(x', y') ,$$
and therefore using the inverse convolution theorem, the field in the focal plane is
$$E^{(f)} = \frac{E_0 e^{ik\bar{r}}}{i\lambda f}\,\mathcal{F}\left[f_i(x', y')\right](u, v) * \mathrm{psf}(u, v) , \tag{9.2}$$
where
$$\mathrm{psf}(u, v) = \mathcal{F}\left[t(x', y')\right] \tag{9.3}$$
is known as the amplitude point-spread function. For the case of a single lens with diameter D, where $t(x', y') = \mathrm{circ}(\rho'/D)$, the amplitude point-spread function, see Section B.13, is
$$\mathrm{psf}(u, v) = \frac{\pi D^2}{4}\,\mathrm{jinc}\left(\frac{\pi D\rho}{\lambda f}\right) , \tag{9.4}$$
and the modulus-squared, the intensity point-spread function, is an Airy pattern. Equation (9.2) can be generalized to more complex imaging systems, and says that the image is given by the convolution of the object with the point-spread function of the instrument. The effect of convolving one smooth function with a function with fine detail is to ‘smear out’ the details in the latter, as illustrated in Fig. 9.3. Consequently, the width of the point-spread function dictates the minimum length scale of the features in the field in the image.

Fig. 9.2 A plane wave propagating from left to right incident on a lens with finite diameter, D. Only light that falls on the lens is captured and focused. The light distribution at the focus has the form of an Airy pattern, with spatial extent inversely proportional to the lens size.

Fig. 9.3 Effect of the instrument point-spread function on the image of a double slit calculated from eqn (9.2). The input function, $f_i(x)$, is shown dashed. The image function in the focal plane, f(x), is shown in black. The point-spread function is shown in light grey.

As an example, consider the Hubble Space Telescope, which has a 2.4 m primary mirror; therefore, the angular width of the point-spread function is expected to be ∼ 0.3 μrad (or about 0.06 arc seconds in the units preferred by astronomers), see end-of-chapter exercise. The earliest images obtained from the device unfortunately revealed a point-spread function that was at least an order of magnitude wider than the desired specification. Analysis revealed that spherical aberration had been introduced in the production of the mirror, necessitating the design and construction of the Corrective Optics Space Telescope Axial Replacement (COSTAR) system, which was subsequently added to the instrument to correct the aberrations and produce a substantially narrower point-spread function.
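The order-of-magnitude estimate for the Hubble Space Telescope can be checked with a few lines of arithmetic. This sketch assumes that the angular width is taken as the Rayleigh-type value 1.22λ/D, and uses the wavelength λ ∼ 0.55 μm suggested in the end-of-chapter exercise; the function name is hypothetical.

```python
import math

def diffraction_limited_width(wavelength, diameter):
    """Angular width 1.22 * lambda / D of a diffraction-limited psf, in radians."""
    return 1.22 * wavelength / diameter

# 2.4 m primary mirror, centre of the optical spectrum (0.55 um).
theta_rad = diffraction_limited_width(0.55e-6, 2.4)
theta_arcsec = math.degrees(theta_rad) * 3600.0

print(f"{theta_rad * 1e6:.2f} urad = {theta_arcsec:.3f} arcsec")
```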
Although there is no trivial ‘inverse convolution’ or ‘deconvolving’ operation, if we know the point-spread function of a system it is possible to (partially) deconvolve the image to remove the limitations of the instrument. The process involves modelling an image, convolving with the known point-spread function, and comparing the output with the measured image. Computationally intensive techniques can then be used iteratively to achieve a desired level of precision. If the point-spread function of the instrument is not known, but has to be deduced from the images, the process becomes more demanding. Nevertheless, there are examples where deconvolution has been applied successfully to high signal-to-noise images. Figure 9.4 demonstrates deconvolution for an image of Supernova 1987a recorded by the Hubble Space Telescope. In the upper image we see that the effect of the convolution is a blurring.

Fig. 9.4 Top: Raw image of Supernova 1987a. Bottom: After deconvolution with the point-spread function. From Krist et al. (2011).
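The iterative ‘model, convolve, compare’ loop described above can be sketched in one dimension. The text does not say which algorithm was used for the Hubble images; the example below uses the Richardson-Lucy iteration as one well-known scheme of this type, applied to a noiseless double-slit profile blurred by a gaussian point-spread function. The grid, slit dimensions, and psf width are illustrative assumptions.

```python
import numpy as np

def richardson_lucy_1d(measured, psf, n_iter=200):
    """Iteratively refine a non-negative model so that (model * psf) matches `measured`."""
    psf = psf / psf.sum()
    psf_flipped = psf[::-1]
    estimate = np.full_like(measured, measured.mean(), dtype=float)
    for _ in range(n_iter):
        model = np.convolve(estimate, psf, mode="same")       # convolve the current model
        ratio = measured / np.maximum(model, 1e-12)            # compare with the data
        estimate = estimate * np.convolve(ratio, psf_flipped, mode="same")  # update
    return estimate

# Hypothetical test object: a double slit on a 1D grid.
x = np.linspace(-1.0, 1.0, 401)
truth = np.where(np.abs(np.abs(x) - 0.15) < 0.05, 1.0, 0.0)

# Assumed gaussian psf sampled on the same grid spacing.
psf = np.exp(-(np.linspace(-0.3, 0.3, 121) / 0.08) ** 2)
psf /= psf.sum()

blurred = np.convolve(truth, psf, mode="same")
restored = richardson_lucy_1d(blurred, psf)

err_blurred = np.abs(blurred - truth).sum()
err_restored = np.abs(restored - truth).sum()
print(f"L1 error before: {err_blurred:.2f}, after: {err_restored:.2f}")
```

With noisy data the iteration count acts as a regularization parameter, which is one reason the text restricts successful examples to high signal-to-noise images.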
9.4 Angular resolution

As an example of the application of a convolution with the point-spread function, consider a plane wave incident at an angle Δθ with respect to the optical axis. Such a scenario arises, for example, in astronomy when imaging distant objects. The incident field in the z = 0 plane is given by $E_0 f_i(x', y')$, where
$$f_i(x', y') = e^{ik\Delta\theta x'} = e^{i2\pi(\Delta\theta/\lambda)x'} .$$
Using the translation property of Fourier transforms, we find that the Fourier transform of the plane wave is a displaced δ-function,
$$\mathcal{F}\left[e^{i2\pi(\Delta\theta/\lambda)x'}\right](u) = \delta\left(u - \frac{\Delta\theta}{\lambda}\right) ,$$
where u = x/(λf). For an optical instrument with aperture diameter, D, the transmission is given by
$$t(x', y') = \mathrm{circ}\left(\frac{\rho'}{D}\right) .$$
The field in the focal plane is proportional to the convolution of the displaced δ-function and the amplitude point-spread function, eqn (9.2),
$$E^{(f)} = \frac{E_0 e^{ik\bar{r}}}{i\lambda f}\,\delta\left(u - \frac{\Delta\theta}{\lambda}\right) * \mathrm{psf}(u, v) ,$$
where for a simple lens the amplitude point-spread function is a jinc function. The effect of the convolution with a δ-function is simply to change the angle to the centre of the jinc function in the xz plane, as illustrated in Fig. 9.5. The intensity pattern is
$$I^{(f)} = I_0 \frac{\pi^2 D^4}{16\lambda^2 f^2}\,\mathrm{jinc}^2\left\{\pi D\left[\left(u - \frac{\Delta\theta}{\lambda}\right)^2 + v^2\right]^{1/2}\right\} . \tag{9.5}$$
As u = x/(λf), this corresponds to an Airy pattern with centre displaced to x = Δθf, see Fig. 9.5. For incoherent sources with an angular displacement, see Chapter 8, the total intensity in the focal plane is simply the sum of two displaced Airy patterns, as illustrated in Fig. 9.5.

Fig. 9.5 Plane waves from two distant sources with an angular separation of Δθ fall on a lens. The light is focused forming two Airy patterns separated by fΔθ in the focal plane.
According to the Rayleigh criterion, two Airy patterns of equal intensity are said to be ‘just resolved’ when the maximum of one sits on the first minimum of the other. This gives fΔθ = 1.22fλ/D and hence an angular resolution limit,
$$\Delta\theta_{\mathrm{R}} = 1.22\frac{\lambda}{D} . \tag{9.6}$$

Fig. 9.6 Overlapping Airy patterns in the image plane measured using a photon counting camera; each dot corresponds to one count (7500 in total). The counts in the vicinity of the horizontal axis are plotted above in a histogram. By fitting the distribution with two displaced Airy patterns one finds that it is still possible to resolve objects with an angular separation significantly less than the Rayleigh limit, $\Delta\theta_{\mathrm{R}}$ (right-hand image), even when, as in this example, the signal-to-noise ratio is relatively low (about 15).
However, in practice, it is possible to resolve objects with a smaller
angular separation, and the limits are set by signal-to-noise considerations.
Figure 9.6 shows a simulation of imaging two point sources
and the corresponding Airy patterns using a photon-counting camera.
Even when the angular separation of the incoming plane waves is less
than ΔθR , it is still possible to measure Δθ by fitting the data, if there
is sufficient signal. Rayleigh’s angular resolution limit is not always
relevant when imaging objects of different intensities. In Chapter 10
we shall specifically look at Fourier techniques for modifying the point-
spread function of an optical system in order to resolve two objects of
vastly different intensities.
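The factor 1.22 in the Rayleigh criterion, and the depth of the dip between two ‘just resolved’ sources, can be checked numerically. The sketch below assumes the convention jinc(x) = 2J₁(x)/x (so that jinc(0) = 1), evaluates J₁ from its integral representation to stay within NumPy, and sums two equal incoherent Airy patterns separated by the first zero. The helper names are hypothetical.

```python
import numpy as np

def bessel_j1(x, n_tau=400):
    """J1(x) = (1/pi) * integral_0^pi cos(tau - x sin tau) dtau, midpoint rule."""
    tau = (np.arange(n_tau) + 0.5) * np.pi / n_tau
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.cos(tau[None, :] - x[:, None] * np.sin(tau)[None, :]).mean(axis=1)

def jinc(x):
    """Assumed convention: jinc(x) = 2 J1(x) / x, with jinc(0) = 1."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.ones_like(x)
    nonzero = np.abs(x) > 1e-9
    out[nonzero] = 2.0 * bessel_j1(x[nonzero]) / x[nonzero]
    return out

# Locate the first zero of jinc(beta), where beta = pi * D * rho / (lambda * f).
beta = np.linspace(0.001, 8.0, 8000)
values = jinc(beta)
i = np.argmax(values[:-1] * values[1:] < 0)          # first sign change
first_zero = beta[i] - values[i] * (beta[i + 1] - beta[i]) / (values[i + 1] - values[i])
rayleigh_coefficient = first_zero / np.pi            # rho = coefficient * lambda * f / D

# Two equal incoherent sources separated by the Rayleigh limit.
offsets = np.linspace(-2.0, first_zero + 2.0, 2001)
total = jinc(offsets) ** 2 + jinc(offsets - first_zero) ** 2
midpoint = 2.0 * jinc(first_zero / 2.0)[0] ** 2
dip_ratio = midpoint / total.max()

print(f"first zero at rho = {rayleigh_coefficient:.3f} lambda f / D")
print(f"midpoint intensity / peak intensity = {dip_ratio:.3f}")
```

The dip between the two peaks is only about a quarter of the peak height, which is why fitting, rather than visual inspection, can recover separations below the Rayleigh limit when the signal is sufficient.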
9.5 f to f
A lens performs a Fourier transform of the field incident on the lens; however, the transform is not exact due to the effect of wave-front curvature. The electric field at a position (x, y) in the focal plane is given by the Fresnel diffraction integral, eqn (6.35),
$$E^{(f)} = \frac{e^{ikf} e^{ik\rho^2/2f}}{i\lambda f}\,\mathcal{F}\left[E^{(0)}\right](u, v) , \tag{9.7}$$
where $E^{(0)}$ is the field distribution on the lens, $\rho^2 = x^2 + y^2$ where x and y are the transverse coordinates in the focal plane, and the Fourier variables are u = x/(λf) and v = y/(λf). The Fourier transform is not ‘exact’ because of the additional x and y dependence in the term $e^{ik\rho^2/2f}$, which tells us that the wave fronts in the focal plane are curved. Although this curvature is not apparent if we are only interested in intensity, it does have a dramatic effect on the propagation around the focus. This effect is illustrated in Fig. 9.7 (top image). The input field in the z = 0 plane is a rect function with uniform phase. The signature of the wave-front curvature term from eqn (9.7) is that the field upstream and downstream of the focal plane is not the same.
Now we consider the effect of moving the input plane back to z = −f, lower image in Fig. 9.7. We still observe an Airy pattern in the focal plane, but the intensity pattern upstream and downstream is very different from that in the upper image. Note how the intensity pattern upstream and downstream of the focal plane is symmetric for this case. Now we show mathematically that the effect of moving the input plane back to z = −f is to cancel the wave-front curvature in the focal plane. If the input plane is moved upstream a distance f, as in Fig. 9.7, then each plane-wave component arriving in the focal plane has travelled an extra distance f. This means that each plane-wave component characterized by the angular spectrum amplitude A is multiplied by an additional propagation phase, $e^{ik_z f}$. Using the paraxial expansion of $k_z$ in terms of $k_x$ and $k_y$, eqn (6.32) from Section 6.5, we obtain
$$e^{ik_z f} = e^{ikf} e^{-ik_\rho^2 f/2k} = e^{ikf} e^{-ik\rho^2/2f} , \tag{9.8}$$
where we have used $k_\rho = k\rho/f$. Hence in eqn (9.7) we replace $A^{(0)}$ by $A^{(-f)} e^{ik_z f}$ which gives
$$E^{(f)} = \frac{e^{ikf} e^{ik\rho^2/2f}}{i\lambda f}\,A^{(-f)} e^{ikf} e^{-ik\rho^2/2f} = \frac{e^{i2kf}}{i\lambda f}\,\mathcal{F}\left[E^{(-f)}\right] . \tag{9.9}$$
This is a remarkable result as it shows that by moving the input plane upstream we can cancel the wave-front curvature in the focal plane and (apart from a constant prefactor) the Fourier transform relationship between the input and output fields is exact! We refer to this case as an optical Fourier transform. There are numerous applications where we care about the phase of the field, and the cancellation of the wave-front curvature is important.

Fig. 9.7 The effect of moving the input plane from the lens plane z = 0 (upper image) to z = −f upstream (lower image). Moving the input plane back to z = −f cancels the wave-front curvature (quadratic phase factor) in the focal plane such that the field upstream and downstream of the focus is symmetric (lower image).
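The cancellation of the quadratic phase can be written out step by step. The following is a sketch of the intermediate algebra, assuming that eqn (6.32) is the usual paraxial expansion of $k_z$ and that the transverse wave vector maps onto the focal-plane coordinate as $k_\rho = 2\pi\rho/(\lambda f) = k\rho/f$.

```latex
\begin{aligned}
k_z &= \sqrt{k^2 - k_\rho^2} \approx k - \frac{k_\rho^2}{2k}
  &&\Rightarrow\quad e^{ik_z f} \approx e^{ikf}\, e^{-ik_\rho^2 f/2k} , \\
\frac{k_\rho^2 f}{2k} &= \frac{(k\rho/f)^2 f}{2k} = \frac{k\rho^2}{2f}
  &&\Rightarrow\quad e^{ik_z f} \approx e^{ikf}\, e^{-ik\rho^2/2f} , \\
e^{ik\rho^2/2f}\, e^{-ik\rho^2/2f} &= 1
  &&\Rightarrow\quad \text{the focal-plane curvature term is cancelled.}
\end{aligned}
```

The remaining factors $e^{ikf} e^{ikf} = e^{i2kf}$ form the constant prefactor corresponding to propagation over the total distance 2f.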
There is another significant change in moving the input plane
upstream. Light can now diffract between the input plane and the first
lens, which in practice has a finite diameter D. If the light distribution
in the input plane contains features that are strongly localized, with a
transverse size less than 1.22fλ/D, then some of the input light spreads
out sufficiently fast to miss the first lens. The finite size of the
lens, or correspondingly if the lens is apodized, the so-called entrance
pupil size, sets an upper limit on the spatial frequencies accepted by
the optical system. Next, we consider moving the input plane in the
example of Young’s double slit where the effect of changing the wave-
front curvature in the focal plane is dramatic.
Example 9.1
Young’s double slit: An example of changing the lens position in a Young’s double-slit experiment is shown in Fig. 9.8. Consider a ‘one-dimensional’ scenario where the field is uniform in the y direction. An opaque screen with two slits with width a and separation d is placed either in the z = 0 plane or at z = −f, and illuminated by uniform monochromatic light with wavelength λ. The light passes through a lens with focal length f in the z = 0 plane. A plot of the intensity pattern in the xz plane for both cases, calculated using the angular spectrum method, is shown in Fig. 9.8. Here we are interested in finding an analytical expression for the intensity distribution in the focal plane of the lens at z = f. For the one-dimensional Fourier transform, eqn (9.9) has the form
$$E^{(f)} = \frac{e^{i2kf}}{\sqrt{i\lambda f}}\,\mathcal{F}\left[E^{(-f)}\right] . \tag{9.10}$$
The input field along the x axis is $E^{(-f)} = E_0 f(x')$, where the aperture function is given by
$$f(x') = \text{Ш}_d^{(2)}(x') * \mathrm{rect}\left(\frac{x'}{a}\right) , \tag{9.11}$$
and the field in the Fourier plane is proportional to the one-dimensional Fourier transform,
$$F(u) = \mathcal{F}\left[f(x')\right] = a\,\mathrm{sinc}(\pi u a)\cos(\pi u d) . \tag{9.12}$$
Substituting into eqn (9.10) and using u = x/(λf), we find
$$E^{(f)} = E_0 \frac{e^{i2kf}}{\sqrt{i\lambda f}}\,a\,\mathrm{sinc}\left(\frac{\pi a x}{\lambda f}\right)\cos\left(\frac{\pi d x}{\lambda f}\right) . \tag{9.13}$$
The intensity distribution, proportional to the modulus-squared, is the same as if the double slit is in the lens plane, but the absence of the $e^{ik\rho^2/2f}$ term means that the wave fronts are planar rather than curved, as is apparent in Fig. 9.8.
As in previous chapters we should remember the distinction between Fraunhofer diffraction, where the Fourier transform relationship holds for any far-field propagation distance z, and the case of a lens, where the Fresnel quadratic phase terms are still important, and the Fourier relationship only holds for a particular plane (the focal plane in this example).

Fig. 9.8 The effect of moving the input plane from the lens plane, z = 0 (upper image), to z = −f upstream (lower image). Although the intensity distribution in the focal plane at z = f remains the same, the signature of wave-front curvature is clearly manifest by the changes in the fringe pattern upstream and downstream of the focal plane.
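The analytic focal-plane pattern of Example 9.1 can be compared with a direct numerical Fourier transform of the double-slit aperture. The sketch below is a minimal check under assumed parameters (633 nm light, f = 0.5 m, a = 50 μm, d = 200 μm); both patterns are normalized to their peak, so the overall normalization of the two-slit comb does not enter. Note that `numpy.sinc(t)` is defined as sin(πt)/(πt), so the book’s sinc(πua) corresponds to `np.sinc(u * a)`.

```python
import numpy as np

# Illustrative (assumed) parameters, SI units.
wavelength = 633e-9
f = 0.5
a = 50e-6      # slit width
d = 200e-6     # slit separation

# Aperture function: two slits of width a centred at x' = +/- d/2.
x_in = np.linspace(-0.5e-3, 0.5e-3, 4001)
dx = x_in[1] - x_in[0]
aperture = (np.abs(np.abs(x_in) - d / 2) <= a / 2).astype(float)

# Focal-plane coordinate x and Fourier variable u = x / (lambda f).
x_out = np.linspace(-20e-3, 20e-3, 401)
u = x_out / (wavelength * f)

# Direct Riemann-sum evaluation of the Fourier integral of the aperture.
kernel = np.exp(-2j * np.pi * u[:, None] * x_in[None, :])
F_num = (aperture[None, :] * kernel).sum(axis=1) * dx

I_num = np.abs(F_num) ** 2
I_num /= I_num.max()

# Analytic form from eqn (9.13), normalized to unity at x = 0.
I_ana = (np.sinc(u * a) * np.cos(np.pi * u * d)) ** 2

fringe_spacing = wavelength * f / d   # period of cos^2(pi d x / lambda f)
print(f"max deviation: {np.max(np.abs(I_num - I_ana)):.4f}")
print(f"fringe spacing: {fringe_spacing * 1e3:.2f} mm")
```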
9.6 Two-lens system

In the next few sections we shall discuss two-lens systems. Firstly,
for convenience, we consider two lenses with the same focal length f
separated by 2f , as illustrated in Fig. 9.9. As above we assume that
the input plane is a distance f upstream of the first lens and the output
plane is a distance f downstream of the second lens. At first this appears
not to be that useful as the output is the same as the input, apart from a
sign change; but the interesting part is what happens between the lenses
where we have direct access to the Fourier transform. In Chapter 10 we
shall focus on applications that exploit the ability to alter the field in
the Fourier plane.
Firstly, we write expressions for the field in (i) the input plane, (ii) the Fourier plane, and (iii) the output plane. If the field in the input plane is described by aperture function f(x, y), i.e. $E = E_0 f(x, y)$, then the field distribution in the Fourier plane is
$$g(x, y) = \frac{e^{i2kf}}{i\lambda f}\,\mathcal{F}\left[f(x, y)\right] = \frac{e^{i2kf}}{i\lambda f}\,F(u, v) , \tag{9.14}$$
where we use the mapping between the Fourier variables and the real space, u = x/(λf) and v = y/(λf), to obtain the real space distribution in the Fourier plane. The field in the output plane is given by a Fourier transform of the field in the Fourier plane,
$$h(x, y) = \frac{e^{i2kf}}{i\lambda f}\,\mathcal{F}\left[g(x, y)\right] = -\frac{e^{i4kf}}{(\lambda f)^2}\,\mathcal{F}\left[F(u, v)\right] . \tag{9.15}$$
To evaluate the Fourier transform of F(u, v) we try to make it look like an inverse transform by first replacing dx and dy in the integral by du and dv using u = x/(λf) and v = y/(λf), and then recognizing that the integral form is an inverse transform, if we replace x by −x and y by −y. Consequently, the second transform gives us back the input function with a parity change, i.e.
$$\begin{aligned}
\mathcal{F}\left[F(u, v)\right] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(u, v)\,e^{-i2\pi(ux+vy)}\,dx\,dy \\
&= (\lambda f)^2 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(u, v)\,e^{-i2\pi(ux+vy)}\,du\,dv \\
&= (\lambda f)^2 f(-x, -y) .
\end{aligned} \tag{9.16}$$
Substituting this result into eqn (9.15),
$$h(x, y) = -e^{i4kf} f(-x, -y) . \tag{9.17}$$
The first minus sign is two times the Gouy phase. If we measure intensity, the global phase factor of $-e^{i4kf}$ disappears, and the intensity distribution is
$$I^{(2f)} = I_0 |f(-x, -y)|^2 . \tag{9.18}$$

Fig. 9.9 A two-lens system consisting of an input plane, a first lens plane, a focal plane, a second lens, and an output plane. In this example, these five planes are at positions, z = −2f, −f, 0, f, and 2f, respectively.
Although the global phase disappears the parity change does not.
An example with an asymmetric input distribution which illustrates
the inversion of the image in the output plane is shown in Fig. 9.9.
Figure 9.10 illustrates the inverse scaling between the input/output
planes and the Fourier plane.
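The statement that two successive Fourier transforms return the input with a parity change has an exact discrete analogue, which is easy to verify. The sketch below uses the discrete Fourier transform on an arbitrary asymmetric sequence; it illustrates the mathematical identity only, not the optical prefactors, and the random test input is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
f_in = rng.random(16)          # arbitrary asymmetric input samples
N = f_in.size

# Apply the forward transform twice; the DFT then returns N * f[(-n) mod N].
double_transform = np.fft.fft(np.fft.fft(f_in)) / N

# Parity-flipped input: element n holds f[(-n) mod N].
flipped = np.roll(f_in[::-1], 1)

print(np.allclose(double_transform, flipped))
```

The factor N plays the role of the $(\lambda f)^2$ scaling that appears when the integration variables are changed from (x, y) to (u, v).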
9.7 Magnification
Now we consider what happens if the lenses have different focal lengths $f_1$ and $f_2$. In this case the field distribution in the input plane is magnified by a factor $f_2/f_1$, as follows from geometrical optics, see Fig. 9.11. This is the basic principle of a microscope. A large magnification is achieved by choosing a small $f_1$. The first lens is called the objective because it is close to the object, and the second lens the eyepiece. In a microscope, the object does not necessarily need to be placed in the focal plane of the objective, but for convenience we consider the case where it is.

Fig. 9.10 A symmetrical two-lens imaging system, with an input plane at z = −2f, first lens at z = −f, second lens at z = f, and output plane at z = 2f. The Fourier plane is located midway between the lenses at z = 0. The upper and lower images illustrate the inverse scaling between the input plane and Fourier plane.

To see how the magnification works in terms of Fourier transforms we repeat the analysis in Section 9.6, except with $f_1$ for the first lens and $f_2$ for the second. We shall find that the change of Fourier variables leads directly to a rescaling of the image. The field in the Fourier plane is
$$g(x, y) = \frac{e^{i2kf_1}}{i\lambda f_1}\,\mathcal{F}\left[f(x, y)\right](u_1, v_1) = \frac{e^{i2kf_1}}{i\lambda f_1}\,F(u_1, v_1) , \tag{9.19}$$
where $u_1 = x/(f_1\lambda)$ and $v_1 = y/(f_1\lambda)$. The field at the output is
$$\begin{aligned}
h(x, y) &= \frac{e^{i2kf_2}}{i\lambda f_2}\,\mathcal{F}\left[g(x, y)\right](u_2, v_2) , \\
&= -\frac{e^{i2k(f_1+f_2)}}{\lambda^2 f_1 f_2}\,\mathcal{F}\left[F(u_1, v_1)\right](u_2, v_2) ,
\end{aligned} \tag{9.20}$$
where $u_2 = x/(f_2\lambda)$ and $v_2 = y/(f_2\lambda)$. We do the Fourier transform by rewriting it as an inverse transform but in terms of $u_2$ and $v_2$, which introduces a scaling factor,
$$\begin{aligned}
\mathcal{F}\left[F(u_1, v_1)\right] &= \mathcal{F}\left[F(x/f_1\lambda, y/f_1\lambda)\right] , \\
&= \mathcal{F}\left\{F\left[(f_2/f_1)u_2, (f_2/f_1)v_2\right]\right\} , \\
&= (f_2\lambda)^2 \int\!\!\int_{-\infty}^{\infty} F\left(\frac{f_2}{f_1}u_2, \frac{f_2}{f_1}v_2\right) e^{-i2\pi(u_2 x + v_2 y)}\,du_2\,dv_2 , \\
&= (f_1\lambda)^2 f\left(-\frac{x}{f_2/f_1}, -\frac{y}{f_2/f_1}\right) .
\end{aligned} \tag{9.21}$$
So the output field is
$$h(x, y) = -\frac{1}{f_2/f_1}\,f\left(-\frac{x}{f_2/f_1}, -\frac{y}{f_2/f_1}\right) e^{i2k(f_1+f_2)} . \tag{9.22}$$
For $f_1 < f_2$ the image is stretched by a scaling factor $f_2/f_1$ and as the light is spread out over a larger area the intensity is reduced by a factor of $(f_2/f_1)^2$. In Fig. 9.12 we repeat the scenario of Fig. 9.11 except now showing the intensity pattern as it propagates through the system.

Fig. 9.11 A geometrical optics schematic of a two-lens system with a magnification of two.

Fig. 9.12 A two-lens system with a magnification of two. The example shown illustrates the magnification of some intensity fringes.
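The scaling in eqn (9.22) can be illustrated numerically: the image is inverted and stretched by $f_2/f_1$, the peak intensity falls by $(f_2/f_1)^2$, and the total power is unchanged. The sketch below drops the global phase factor and uses an off-centre gaussian as a hypothetical input so that the inversion is visible; the focal lengths and grid are assumed values.

```python
import numpy as np

def output_field(f_in, x, y, f1, f2):
    """Output of the two-lens system, eqn (9.22), omitting the global phase."""
    m = f2 / f1                               # magnification
    return -(1.0 / m) * f_in(-x / m, -y / m)

def f_in(x, y, w=1.0):
    """Hypothetical input: gaussian of radius w centred at x = 0.4, y = 0."""
    return np.exp(-((x - 0.4) ** 2 + y ** 2) / w ** 2)

f1, f2 = 0.1, 0.2                             # assumed focal lengths, magnification 2
grid = np.linspace(-8.0, 8.0, 401)
X, Y = np.meshgrid(grid, grid)
dA = (grid[1] - grid[0]) ** 2

I_in = np.abs(f_in(X, Y)) ** 2
I_out = np.abs(output_field(f_in, X, Y, f1, f2)) ** 2

power_in = I_in.sum() * dA
power_out = I_out.sum() * dA
peak_ratio = I_out.max() / I_in.max()
iy, ix = np.unravel_index(I_out.argmax(), I_out.shape)
peak_x = X[iy, ix]

print(f"power out / power in: {power_out / power_in:.4f}")
print(f"peak intensity ratio: {peak_ratio:.4f}")
print(f"output peak position: x = {peak_x:.2f}")
```

The input peak at x = 0.4 appears at x = −0.8 in the output, showing both the parity change and the magnification of two.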
9.8 Complementarity I
In this chapter we have explored light propagation through optical systems and seen that sometimes light is localized at particular positions, or along particular paths, while at other times it is delocalized and may even interfere with itself. In quantum physics, we tend to think of path as a particle-like property and interference as a wave-like property, so what does this tell us about wave–particle duality? One solution to the wave–particle duality paradox is the principle of complementarity,² which states that one can observe either the wave-like or particle-like properties but not both at the same time. For example, in Young’s double-slit experiment we can observe either the path (which slit the photons pass through) or the interference fringes, but not both.

² Formulated by Niels Bohr (Copenhagen 1885–Copenhagen 1962).

To illustrate why, we can use the example of Young’s double-slit experiment, but performed with atoms or electrons rather than photons, see Adams et al. (1994). To gain path information we need to look at which slit an atom has passed through, as illustrated in Fig. 9.13. Looking means scattering light off the atoms and in order to resolve the slits we need to scatter photons with a range of angles of order λ/d, where d is the slit separation. To capture all these photons we need a high numerical aperture lens, called a Heisenberg microscope after Werner Heisenberg (Würzburg 1901–Munich 1976) who devised this thought experiment. As a result of momentum conservation, the scattered photons change the momentum, and hence the wave vector, of the matter wave, see Fig. 9.13. As photons are emitted at random angles, the effect of many scattering events is to introduce a range of phase shifts, as in Fig. 8.8, which ‘washes out’ the interference pattern. If the recoil in the x direction is $\Delta k_x$ then the shifted fringes become $\cos[(k_x + \Delta k_x)d/2]$. Using the Fourier relationship between spatial resolving power, Δx, and photon momentum component, $\Delta p_x = \hbar\Delta k_x$, for Δx < d, $\Delta k_x > 2\pi/d$, and the shift is greater than π leading to a complete wash-out of the interference pattern. The effect of averaging over a large enough range of $\Delta k_x$ to resolve the path ‘washes out’ the interference fringes. In Chapter 10 we shall revisit this complementarity concept using only photons, see Section 10.9.

Fig. 9.13 Schematic of the Heisenberg microscope. A Young’s double-slit experiment is performed using atoms (or other particles such as electrons). The microscope, consisting of a high numerical aperture lens (grey), is designed to provide ‘which-path’ information by detecting scattered photons with wave vectors k. However, a scattered photon deflects the particle, causing the interference fringes to shift.
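The wash-out argument can be made quantitative with a short numerical average. The sketch below assumes that the recoil $\Delta k_x$ is uniformly distributed over a range W (a simplifying assumption, not stated in the text), averages the shifted fringe intensity $\cos^2[(k_x + \Delta k_x)d/2]$ over that range, and reports the resulting fringe visibility. The slit separation is set to unity for convenience.

```python
import numpy as np

def fringe_visibility(recoil_range, d, n_samples=2001):
    """Visibility of cos^2[(kx + dk) d / 2] averaged over dk uniform in a range W."""
    kx = np.linspace(-4 * np.pi / d, 4 * np.pi / d, 801)
    dk = np.linspace(-recoil_range / 2, recoil_range / 2, n_samples)
    pattern = np.cos((kx[:, None] + dk[None, :]) * d / 2) ** 2
    averaged = pattern.mean(axis=1)
    return (averaged.max() - averaged.min()) / (averaged.max() + averaged.min())

d = 1.0  # slit separation (arbitrary units)
for label, W in [("0", 0.0), ("pi/d", np.pi / d), ("2 pi/d", 2 * np.pi / d)]:
    print(f"recoil range {label:>7}: visibility = {fringe_visibility(W, d):.3f}")
```

For a uniform spread the visibility follows |sin(Wd/2)/(Wd/2)|, which first reaches zero at W = 2π/d, the recoil range needed to resolve the slits.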
Chapter summary
• The spatial detail in an image (with size Δx) is limited by the
range of spatial frequencies captured by a lens ($\Delta k_x$).
• The range of spatial frequencies (Δkx ) captured by a lens is limited
by the size of the lens. Consequently, the finite size of the lens
limits the minimum width of the focal spot.
• The field in the focal plane of a lens is given by the convolution of
the Fourier transform of the incident field and the point-spread
function of the lens.
• If the field in the plane of the lens has planar wave fronts then the
field in the focal plane will have curved wave fronts.
• Moving the input plane back by a distance f cancels the
wave-front curvature in the focal plane, and the field in the focal
plane is an exact Fourier transform of the input field.
• For a two-lens system, the field in the output plane is a double
Fourier transform of the field in the input plane.
• If the two lenses have different focal lengths then the scaling of the
Fourier variables leads to a magnification of the output.
• The principle of complementarity says that we cannot observe
both the wave- and particle-like properties of light (or matter) at
the same time. Consequently, we can observe path or interference,
but not both.
Exercises
(9.1) Imaging
Write equations for (i) a spherical wave with origin at z = f, and (ii) a paraxial spherical wave with origin at z = f in the z = 0 plane. Comment on whether a lens with focal length f in the z = 0 plane would cancel or double the transverse phase dependence, and what the field would look like upstream of the lens.
(9.2) Point-spread function
Write an expression for amplitude and intensity point-spread functions for a single lens with diameter D and focal length f.
(9.3) Point-spread function of Hubble Space Telescope
Given that the Hubble Space Telescope has a 2.4 m primary mirror, estimate the angular width of the point-spread function, assuming that it is limited by diffraction, for the centre of the optical spectrum (λ ∼ 0.55 μm).
(9.4) Resolving power
Give an expression for the angular resolution limit, $\Delta\theta_{\min}$, for light with wavelength λ, of an instrument with entrance aperture size, D. Describe, briefly, the practical limits to the resolution of the instrument.
(9.5) Focusing laser beams
A laser beam with beam waist $w_0$ is incident on a lens with focal length f and diameter D. The effect of the finite size of the lens is to clip the edges of the gaussian beam which can be described using an aperture function, f(ρ′) =
gauss(ρ′/w0)circ(ρ′/D). The field in the focal plane is proportional to the Fourier transform, $F(k_\rho) = \mathcal{F}[f(x', y')](u, v)$. Write an expression for the Fourier transform. Describe what happens in the two limit cases (i) $w_0 \gg D$ and (ii) $w_0 < D$. Write expressions for the size of the focal spot in each case. Assuming that the total power of the laser is fixed such that the input intensity is inversely proportional to $w_0$, comment on the optimal ratio of $w_0/D$ to maximize the on-axis intensity in the focal plane.
(9.6) Focal spot
Write an equation for the intensity distribution in the focal plane of a lens with diameter, D, assuming that the field on the lens is characterized by an aperture function f(x′, y′). Explain, briefly, why it is not possible to produce the intensity pattern
I0 π 2 D4 π(2D)ρ
I (f ) = 2 2 π(2D) gauss 2 .
λ f λf
What is the input field distribution if the intensity in the focal plane is
4 2 2
I0 π 2 D 1 β
I (f ) = 2 2 jinc (β) − jinc ,
λ f 2 2 2
where β = πDρ/(λf)? This relates to the topic of apodization, discussed in Chapter 10.
(9.7) Convolution and ‘blurring’ of details
Sketch the convolution of the one-dimensional function that represents a periodic array of rectangles of width a and separation d, with a gaussian of width w. Illustrate three cases: (i) $w \ll a$, (ii) w ∼ a, and (iii) w > d. Comment on your results.
(9.8) 2D Optical transforms
The field incident on a lens with focal length f in the z = 0 plane is $E^{(0)} = \tfrac{1}{4} E_0 e^{-\rho'^2/w_0^2}\left[3 + \cos(2\pi x'/d)\right]$, where $w_0$ is the beam radius and $d = w_0/4$ is the distance characterizing a fringe pattern on the beam. What is the spacing between the intensity maxima in the x direction in units of d? Write an expression for the field in the focal plane. What are the positions of the maxima in the focal plane? What is the 1/e width of the maxima in the focal plane? What is the ratio of the spacing between the maxima to their width? What is the intensity ratio between the brightest and faintest maxima?
(9.9) Photography: depth-of-field
In photography, the depth of field (depth of focus) relates to how far the object (image) plane can move without the image becoming blurred. This maps directly onto the concept of Rayleigh distance, or Rayleigh range, where we ask how far can a light field propagate before its distribution changes substantially? So for a focused gaussian light distribution with beam radius $w_f$, we can say that the depth of focus or tolerance in the position of the image plane is $z_R = \pi w_f^2/\lambda$. Show that the depth of focus is proportional to the f-number squared, where f-number is the ratio of the focal length f to the diameter of the entrance aperture, D. Comment on how the result is modified for depth of field.
(9.10) Two-lens system
Sketch a version of Fig. 9.14, labelling the key planes along the propagation axis, with lines to mark the spatial extent of the light field. Indicate the wave fronts before the first lens, after the second lens, and in the focal plane.
Fig. 9.14 Two examples of light propagating through a two-lens system, see Exercise 9.10.
10 Spatial filtering

Sometimes a strange light
shines, purer than the moon,
casting no shadow
R. S. Thomas (Caerdydd 1913–Pentrefelin 2000), Frequencies, 1978.

10.3 Spatial filtering
10.5 2D periodic
10.7 Convolution
10.8 Phase-contrast imaging
Chapter summary
Exercises

10.1 Introduction
In this chapter we examine apodization (modifying the aperture function in order to change the point-spread function) and spatial filtering (modifying the transmission in the Fourier plane in order to process an image or light distribution). Both of these concepts share the feature that understanding Fourier optics enables the design of optical devices with improved performance. Apodization exploits the Fourier link between the lens plane and focal plane to suppress secondary maxima in the point-spread function. We shall also analyse inverse apodization, and explain how super resolution is achieved. Spatial filtering exploits a two-lens system to perform Fourier analysis and synthesis. By modifying the amplitude or phase of the diffraction pattern in the Fourier plane we can re-engineer the image. By modifying the phase (phase-contrast imaging) it is possible to render transparent objects visible.
10.2 Apodization
Previously, in Chapter 9, our discussion of the imaging properties of
optical systems largely focused on only the central part of the diffraction
pattern. As encapsulated in the Rayleigh criterion, it is this central
part that determines the ability of the optical device to resolve two
equally bright point objects. The pattern of light surrounding the central
maximum is much fainter; but there are side lobes, or ‘wings’.¹

¹ Also called the ‘feet’ of the pattern.

If we are interested in looking for a faint object near a bright object,
then the wings of each diffraction pattern diminish significantly our
ability to resolve detail. Making a narrower point-spread function is
unlikely to be possible, as many optical instruments such as telescopes
are already as large as they feasibly can be; therefore another technique
is necessary to modify the point-spread function.
For an optical instrument with cylindrical symmetry, the Airy pattern
in the focal plane arises from having a uniform light distribution over
the whole area of the lens. If we modify the light distribution on the
lens, see Fig. 10.1, it follows that we will modify the field distribution
at the focus. We shall demonstrate in this chapter that an appropriate
reduction in amplitude of secondary maxima can be achieved by using
an aperture function whose transmission is smaller at the edges than in
the centre—this is the concept of apodization.²
² The concept is to suppress (or greatly reduce) the relative intensity of the side
lobes, or ‘feet’, of the pattern; hence the name of apodization, from the Greek α,
to subtract, and ποδός, foot (Jacquinot and Roizen-Dossier, 1964).
Much attention was devoted to this topic last century in astronomy,
when looking, for example, for faint companions in a binary star
system, and this century in the search for extrasolar planets (Kasdin
et al. 2003). As the relative intensity of an extrasolar planet can be
∼10⁹ times less than that of the star, a technique is needed to
re-engineer the point-spread function; specifically a means of reducing the
amplitude of the secondary maxima. For example, for a one-dimensional
rectangular aperture, the first side lobe has an intensity of approximately
5% of the central maximum, and the tenth side lobe an intensity of
approximately 0.1%. (An end-of-chapter exercise investigates the drop
off of subsidiary maxima intensity for the Airy pattern.) Therefore
faint objects whose angular separation from a bright object exceeds the
conventional Rayleigh-resolution limit will not necessarily be resolved.
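The side-lobe levels quoted above for a one-dimensional rectangular aperture can be checked numerically. The following is a minimal sketch, assuming NumPy's normalized `np.sinc` (zeros at integer arguments); the helper name `sidelobe_peak` is hypothetical and not part of the text. It should reproduce values close to 5% and 0.1%.

```python
import numpy as np

def sidelobe_peak(n, samples=20001):
    """Peak of sinc^2 between its n-th and (n+1)-th zeros, relative to the central maximum."""
    # np.sinc(x) = sin(pi x) / (pi x), so its zeros sit at the non-zero integers
    x = np.linspace(n, n + 1, samples)
    return float(np.max(np.sinc(x) ** 2))

first = sidelobe_peak(1)    # first side lobe of the intensity pattern
tenth = sidelobe_peak(10)   # tenth side lobe
print(f"first side lobe / peak: {first:.4f}")
print(f"tenth side lobe / peak: {tenth:.5f}")
```

The slow decay of these values with side-lobe index is what makes a faint companion hard to see next to a bright source.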
Fig. 10.1 Schematic of apodization: The lens is covered by a transmission filter (grey shading). In this example the transmission filter has a gaussian spatial profile, t(ρ′) = gauss(ρ′/a) with a < D/2, that has the effect of removing the fringes in the focal plane, see Example 10.2.
We shall investigate how the point-spread function of a device
can be modified by suitable changes to the aperture’s transmission
function, using Fourier techniques. To illustrate the principle, we
initially consider one-dimensional examples—where many analytical
results exist—then we move on to discuss cylindrically symmetrical
two-dimensional apertures.
Note that as Fourier analysis plays a central role, apodization also
finds great utility elsewhere. Two specific examples of (one-dimensional)
apodization are (i) in a Fourier transform spectrometer, and (ii) in
signal analysis in communication theory (using the frequency–time
Fourier relationship). A review of window functions in temporal signal
processing can be found in Harris (1978).
10.2.1 Apodization of 1D apertures

Apodization improves the performance of an optical device even though,
counterintuitively, an apodized aperture transmits less light.
The idea is to use a transmission function, apod(x),
which does not just provide uniform illumination over the pupil, i.e.
a rectangle. We know that the point-spread function is the Fourier
transform of the pupil function, and when we calculated the transform of
the rectangle function we pointed out that the discontinuity created high
spatial frequencies in the Fourier domain. These high spatial frequencies
in the aperture function appear in the wings of the diffraction pattern;
therefore an apodized aperture has the feature that the amplitude does
not vanish abruptly at the edge of the pupil.
Table 10.1 lists frequently encountered one-dimensional apodization
functions apod(x) over the range −a/2 ≤ x ≤ a/2, and their
corresponding Fourier transforms. Figure 10.2 plots the functions, and
their intensity point-spread functions (note the use of a logarithmic scale
for the ordinate). The explicit calculation of the point-spread function
for one of the functions using Fourier techniques is given in Example 10.1.
Table 10.1 One-dimensional apodization functions apod(x) over the range −a/2 ≤ x ≤ a/2, and the corresponding Fourier transforms—the amplitude point-spread function.
Name      Function apod(x)                                                                          Fourier transform—amplitude psf(u)
uniform   $1$                                                                                       $a\,\mathrm{sinc}(\pi a u)$
triangle  $1 - \frac{2|x|}{a}$                                                                      $\frac{a}{2}\,\mathrm{sinc}^2\!\left(\frac{\pi a u}{2}\right)$
cosine    $\cos\frac{\pi x}{a}$                                                                     $\frac{2a\cos(\pi a u)}{\pi\,(1 - 4a^2u^2)}$
Hann      $\cos^2\frac{\pi x}{a}$                                                                   $\frac{a}{2}\,\frac{\mathrm{sinc}(\pi a u)}{1 - a^2u^2}$
Blackman  $\frac{21}{50} + \frac{1}{2}\cos\frac{2\pi x}{a} + \frac{2}{25}\cos\frac{4\pi x}{a}$     $\frac{a}{2}\,\frac{\left(\frac{21}{25} - \frac{9}{100}a^2u^2\right)\mathrm{sinc}(\pi a u)}{(1 - a^2u^2)\,(1 - a^2u^2/4)}$
Example 10.1
Cosine apodization: As an example of apodization in one dimension we calculate
the point-spread function when the conventional uniform pupil function, rect(x/a), is
softened by the cosine function apod(x) = cos(πx/a). Therefore the modified pupil
transmission function is t(x) = apod(x)rect(x/a).
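\[
\begin{aligned}
\mathrm{psf}(u) &= \mathcal{F}\left[t(x)\right], \\
&= \mathcal{F}\left[\mathrm{rect}\!\left(\frac{x}{a}\right) \times \cos\!\left(\frac{\pi x}{a}\right)\right], \\
&= \mathcal{F}\left[\mathrm{rect}\!\left(\frac{x}{a}\right)\right] \ast \mathcal{F}\left[\cos\!\left(\frac{\pi x}{a}\right)\right], \\
&= a\,\mathrm{sinc}(\pi a u) \ast \frac{1}{2}\left[\delta\!\left(u - \frac{1}{2a}\right) + \delta\!\left(u + \frac{1}{2a}\right)\right], \\
&= \frac{a}{2}\left\{\mathrm{sinc}\!\left[\pi\!\left(au - \tfrac{1}{2}\right)\right] + \mathrm{sinc}\!\left[\pi\!\left(au + \tfrac{1}{2}\right)\right]\right\}, \\
&= \frac{2a\cos(\pi a u)}{\pi\left(1 - 4a^2u^2\right)},
\end{aligned}
\]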
which is the result quoted in Table 10.1. The two sinc functions are displaced such
that the amplitudes in the wings have opposite signs and partially cancel, resulting
in the desired effect of greatly suppressed secondary maxima.
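The closed form obtained in Example 10.1 can be compared with a direct numerical Fourier transform of the cosine-apodized aperture. This is a minimal sketch assuming the transform convention exp(−i2πux), an aperture width a = 1 in arbitrary units, and a hand-written trapezoid rule; the function names are hypothetical.

```python
import numpy as np

a = 1.0  # aperture width in arbitrary units (an assumption for this check)

def psf_numeric(u, n=20001):
    """Direct numerical Fourier transform of cos(pi x / a) over -a/2 <= x <= a/2."""
    x = np.linspace(-a / 2, a / 2, n)
    t = np.cos(np.pi * x / a)
    # The aperture function is even, so only the cosine part of exp(-i 2 pi u x) survives
    integrand = t * np.cos(2 * np.pi * u * x)
    dx = x[1] - x[0]
    return float(np.sum((integrand[:-1] + integrand[1:]) / 2) * dx)  # trapezoid rule

def psf_analytic(u):
    """Closed form from Table 10.1, evaluated away from the removable points u = +/- 1/(2a)."""
    return 2 * a * np.cos(np.pi * a * u) / (np.pi * (1 - 4 * a**2 * u**2))

for u in (0.0, 0.3, 1.0, 2.7):
    print(f"u = {u:3.1f}  numeric = {psf_numeric(u): .6f}  analytic = {psf_analytic(u): .6f}")
```

The two columns should agree to within the accuracy of the quadrature for any u that avoids the removable points.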
Fig. 10.2 Plots of the five apodization functions of Table 10.1 (uniform, triangle, cosine, Hann, and Blackman) over the range −a ≤ x ≤ a. Below, the corresponding normalized intensity point-spread functions (on a log scale from 10⁻⁸ to 1). The uniform case is repeated in grey for comparison. The apodized apertures have a broader central maximum, and suppressed secondary maxima. The suppression of the wings of the diffraction pattern is particularly noteworthy for the Blackman function (final column).
Let us compare the point-spread functions of the rectangular and
triangular transmission functions (first two columns of Fig. 10.2). The
point-spread function of the apodized function is significantly wider than
the first; and the peak intensity is reduced to one quarter of its uniform
value, as the on-axis amplitude in Table 10.1 falls from a to a/2 (for the
cosine function the corresponding factor is 4/π²). Both of these are
negatives as far as the properties of a point-spread function
in an imaging system are concerned. However, the huge positive is that
the intensities of the subsidiary maxima, or side lobes, relative to the
peak, are greatly suppressed.
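The trade-off between the functions of Table 10.1 can be made quantitative by sampling each one and locating the highest side lobe of its intensity point-spread function. The sketch below assumes a discretely sampled aperture, a zero-padded FFT as a stand-in for the continuous transform, and a hypothetical helper `highest_sidelobe_db`.

```python
import numpy as np

def highest_sidelobe_db(window, pad=64):
    """Highest side lobe of |FT|^2 relative to the central peak, in dB."""
    n = window.size
    spectrum = np.abs(np.fft.rfft(window, n * pad)) ** 2   # zero padding samples the psf finely
    spectrum /= spectrum[0]
    # Walk down the main lobe until the first local minimum is reached
    k = 1
    while k < spectrum.size - 1 and spectrum[k + 1] < spectrum[k]:
        k += 1
    return float(10 * np.log10(spectrum[k:].max()))

n = 1024
x = (np.arange(n) + 0.5) / n - 0.5      # sample points across the aperture, in units of a
windows = {
    "uniform": np.ones(n),
    "triangle": 1 - 2 * np.abs(x),
    "cosine": np.cos(np.pi * x),
    "Hann": np.cos(np.pi * x) ** 2,
    "Blackman": 21 / 50 + 0.5 * np.cos(2 * np.pi * x) + (2 / 25) * np.cos(4 * np.pi * x),
}
levels = {name: highest_sidelobe_db(w) for name, w in windows.items()}
for name, level in levels.items():
    print(f"{name:9s} highest side lobe: {level:7.1f} dB")
```

The numbers are for a sampled aperture, so they should be read as approximations to the continuous case plotted in Fig. 10.2.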
The other apodizing functions in Table 10.1 have different strengths
and weaknesses in terms of the properties of the modified point-
spread function, and the ease with which the function can be realized
experimentally. In optics the last point can become a major difficulty,
which is typically not the case for applications in signal processing. Note
that the Blackman function, which not only has soft edges but goes to
zero smoothly at x = ±a/2, has the most impressive suppression of the
wings of the point-spread function.³
³ In the context of atom–light interactions, see Chapter 13, the width
of the velocity class of atoms that a pulsed laser interacts with is linearly
proportional to the Fourier spectrum of the temporal profile, as a consequence
of the Doppler effect. Kasevich and Chu (1992) achieved far greater cooling
with a Raman transition by using a Blackman temporal profile in contrast
to a rectangular envelope, because the frequency spectrum has very little
power away from the central excitation frequency.
In spectroscopy, the instrument response function of a spectrometer is
proportional to the Fourier transform of the input slit. For a rectangular
slit with a discontinuity of the transmission at the edges the ringing of
the concomitant sinc function can be problematic. This issue can be
alleviated by suitable apodization. Similarly, in Section 8.6 we saw
that a Michelson interferometer can be used as a Fourier transform
spectrometer, with the interferogram given by eqn (8.24). The
instrument profile here is therefore sinc, not sinc², so the wings of the
point-spread function are much more prominent and apodization
is highly desirable. The interferogram is achieved by recording the
transmission as a function of path length. Apodization can be achieved
in this case by varying the gain of the detector smoothly as a function
of path length.
10.2.2 Apodization of 2D apertures

The principle of apodization in two dimensions is the same as in one
dimension—reducing the amount of light transmitted at the edge of a
pupil in order to reduce the prominence of the subsidiary maxima in
the point-spread function. For apertures and apodizing functions with
Cartesian separability, exactly the same analysis as in Section 10.2.1
can be used. Many optical instruments have cylindrical symmetry,
and the Fourier transform becomes an integral over the radial variable.
Example 10.2 demonstrates apodizing of the point-spread function of a
circular pupil when a gaussian apodizing filter is applied, as illustrated
in Fig. 10.1.
Example 10.2
Two-dimensional gaussian apodization of a circular aperture: We consider
a gaussian transmission filter that modifies the field in the plane of the lens such that
the field immediately downstream is described by the distribution function
\[
f(x', y') = \mathrm{gauss}\!\left(\frac{\rho'}{w}\right) \mathrm{circ}\!\left(\frac{\rho'}{D}\right), \tag{10.1}
\]
where w < D/2 for the filter to have any effect. Using the inverse convolution
theorem we find that the field in the focal plane is proportional to
\[
F(\rho) = \frac{\pi D^2}{4}\,\mathrm{jinc}\!\left(\frac{\pi D \rho}{\lambda f}\right) \ast \pi w^2\,\mathrm{gauss}\!\left(\frac{\pi w \rho}{\lambda f}\right). \tag{10.2}
\]
The convolution with a gaussian works as a smoothing function that slightly broadens
the central peak but also smoothes out the oscillations of the subsidiary maxima.
Detail of the field distribution in the focal plane is shown in Fig. 10.3. The field in
the xz plane is illustrated in Fig. 10.1. Note that in comparison with Fig. 9.2, the
focal spot is larger but the fringes around the central spot are strongly suppressed.
Fig. 10.3 Apodization: Intensity images of the focal ‘spot’ without (left) and with (right) apodization. Notice how the wings are suppressed, allowing faint objects to be imaged in the vicinity of the main spot.
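The behaviour described in Example 10.2 can be illustrated with a small numerical experiment. The sketch below is not the calculation used for Fig. 10.3: it assumes a pixelated pupil, a gaussian 1/e radius w = 0.2D, and uses the fact that the v = 0 line of a two-dimensional Fourier transform equals the one-dimensional transform of the pupil summed over y′. All names are hypothetical.

```python
import numpy as np

n = 512
coords = (np.arange(n) + 0.5) / n - 0.5          # lens-plane coordinates in units of D
xp, yp = np.meshgrid(coords, coords)
rho = np.hypot(xp, yp)
circ = (rho <= 0.5).astype(float)                # uniform circular pupil of diameter D
w = 0.2                                          # gaussian 1/e radius in units of D (assumed)
apodized = circ * np.exp(-(rho / w) ** 2)        # eqn (10.1) on the sampling grid

def focal_profile(pupil, pad=32):
    """Normalized focal-plane intensity along the x axis (the v = 0 line)."""
    projection = pupil.sum(axis=0)               # sum over y' gives the v = 0 slice after a 1D FT
    intensity = np.abs(np.fft.rfft(projection, n * pad)) ** 2
    return intensity / intensity[0]

def highest_sidelobe(profile):
    """Largest value of the profile beyond the first local minimum."""
    k = 1
    while k < profile.size - 1 and profile[k + 1] < profile[k]:
        k += 1
    return float(profile[k:].max())

def half_width(profile):
    """Index at which the profile first drops below half its peak."""
    return int(np.argmax(profile < 0.5))

uniform_profile = focal_profile(circ)
apodized_profile = focal_profile(apodized)
print("highest side lobe, uniform :", highest_sidelobe(uniform_profile))
print("highest side lobe, apodized:", highest_sidelobe(apodized_profile))
print("half-widths (bins):", half_width(uniform_profile), half_width(apodized_profile))
```

With these assumed parameters the apodized spot should come out wider than the Airy pattern while its side lobes are far weaker.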
10.2.3 Inverse apodization and super-resolution

There are many applications where it is desirable to decrease the width
of the central maximum of the diffraction pattern (without changing
the dimensions of the pupil). We shall see that this is accompanied by
an increase in the relative height of the wings, or feet. Therefore this
can be thought of as the opposite of apodization, and we refer to the
technique as inverse apodization. Rather than attenuating the high
spatial frequencies, we remove the low spatial frequencies by blocking
the centre of the lens. In this case the high spatial frequencies interfere
in the focal plane giving higher spatial resolution, but at the expense of
more fringes. This is equivalent to a high-pass filter.
As elucidated by Jacquinot and Roizen-Dossier (1964), a simple way
to achieve super-resolution is to take the complement of any function
apod(x) that is a suitable apodization function; i.e. use 1 − apod(x) for
a 1D aperture, or 1 − apod(ρ ) for a radially symmetric 2D function. It
is easy to show (see end-of-chapter exercises) for a lens of focal length f
and diameter D that the first zero of this modified function occurs at a
radial displacement of less than⁴ 1.22λf/D. This phenomenon is referred
to as super-resolution.
⁴ Which is to say that the central maximum of the point-spread function
of the inverse-apodized function is narrower than that of the unapodized pupil.
For cylindrically symmetrical pupils an easy way to achieve inverse
apodization is by using an annular filter, i.e. blocking light in a radially
symmetrical fashion from the centre out to some fraction of the radius.
For example, for a lens of focal length f and diameter D, if we block the
central region of diameter D/2 (that is, ρ′ < D/4), then
\[
f(\rho') = \mathrm{circ}\!\left(\frac{\rho'}{D}\right) - \mathrm{circ}\!\left(\frac{\rho'}{D/2}\right), \tag{10.3}
\]
and the field in the focal plane is proportional to
\[
F(\rho) = \frac{\pi D^2}{4}\,\mathrm{jinc}\!\left(\frac{\pi D \rho}{\lambda f}\right) - \frac{\pi (D/2)^2}{4}\,\mathrm{jinc}\!\left(\frac{\pi (D/2) \rho}{\lambda f}\right). \tag{10.4}
\]
The field in the xz plane for this example is shown in Fig. 10.4(ii).
Fig. 10.4 Intensity distribution in the xz plane for (i) conventional focusing and (ii) inverse apodization. By blocking the central region, the field from the rim interferes in the focal plane producing a narrower central fringe, but with the potential disadvantage that the subsidiary maxima (or fringes) of the point-spread function are more prominent.
Table 10.2 Comparison of the diffraction patterns obtained when the pupil of an image-forming instrument is modified.
Aperture           Peak intensity   Width of central maximum   Wing prominence
uniform            same             same                       same
apodized           reduced          wider                      suppressed
inverse apodized   reduced          narrower                   stronger
The interference fringes in the focal plane produce a narrower
central maximum.⁵ Table 10.2 compares and contrasts the point-spread
functions obtained from pupils that are unaltered, apodized, and
inverse-apodized. We have noted the possibility of making the central maximum
of the diffraction pattern narrower. This leads to the question: Is there
a minimum width? Jacquinot and Roizen-Dossier (1964) discuss this,
and show that for an absorbing annular aperture the reduction of the
diameter of the central maximum is the ratio of the first zeros of the
Bessel functions J1 and J0, which is 3.83/2.40 = 1.6. Note, however, that
if the annular filter’s central region does not block the light, but retards
its phase by π,⁶ then, in principle, there is no limit to the central width—
although there is a great reduction in the transmitted intensity. An
end-of-chapter exercise investigates the trade-off between the narrower
central maximum and the drop off in peak intensity.
⁵ The ability to produce narrower fringes in this way inspired Norman
Ramsey to develop the technique of Ramsey interferometry, see Chapter 7.
⁶ Such an aperture is referred to as a phase mask, and we shall encounter this
later in the chapter.
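The ratio 3.83/2.40 quoted for an absorbing annular aperture can be motivated by a short worked calculation. The sketch below assumes the limiting case of an infinitesimally thin transmitting ring of radius D/2, which is an idealization introduced here and not a statement about how Jacquinot and Roizen-Dossier derive the result.

```latex
% Assumption: an idealized, infinitesimally thin transmitting ring of radius D/2
\begin{align*}
F_{\text{ring}}(\rho) &\propto \int_0^{2\pi}
  \exp\!\left(-\mathrm{i}\,\frac{2\pi}{\lambda f}\,\frac{D}{2}\,\rho\cos\phi\right)\mathrm{d}\phi
  = 2\pi\,J_0\!\left(\frac{\pi D\rho}{\lambda f}\right), \\
\text{first zero of } J_0:\quad \frac{\pi D\rho}{\lambda f} &= 2.40
  \;\Rightarrow\; \rho \approx 0.77\,\frac{\lambda f}{D}, \\
\text{first zero of } J_1 \text{ (full pupil)}:\quad \frac{\pi D\rho}{\lambda f} &= 3.83
  \;\Rightarrow\; \rho \approx 1.22\,\frac{\lambda f}{D}, \\
\frac{3.83}{2.40} &\approx 1.6 .
\end{align*}
```

In this limit almost no light is transmitted, which is the trade-off between width and peak intensity summarized in Table 10.2.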
10.3 Spatial filtering

A powerful feature of the Fourier transforming property of a lens is the
ability to modify the field in Fourier space. This concept is known
as spatial filtering. We can base our spatial filtering set-up on our
symmetric imaging system, as in Fig. 9.9. As before we assume an input
plane at z = −2f a distance f upstream of the first lens at z = −f,
which performs a Fourier transform located in the z = 0 plane, followed
by a second lens at z = f that performs a second Fourier transform in the
output plane at z = 2f. As the total propagation distance is four times
the focal length f, this is known as a 4f-spatial filter. Figure 10.5
shows a schematic of the 4f set-up.
Fig. 10.5 Schematic of a 4f spatial filter. An object is illuminated in the input plane at z = −2f (the letter K in this case). The first lens, at z = −f, produces a diffraction pattern that is the Fourier transform of the object in the Fourier plane (also known as the filter plane) at z = 0. The second lens, at z = f, makes another Fourier transform in the image plane (or output plane) at z = 2f. If no masks are inserted into the filter plane, the image is the inverted object. However, this can be modified by applying suitable filters (in this case a high-pass filter, resulting in edge enhancement).
The 4f set-up can be viewed as two sequential Fourier transforma-
tions, as we discussed in Section 9.6. The first lens transforms the input
into Fourier space, and the second lens will transform it back to real
space in the output plane. The first half of the apparatus performs
a Fourier analysis and the second half a Fourier synthesis. The idea
behind optical image processing is that by placing suitable apertures
(or masks) in the Fourier plane we can modify the nature of the image
formed. These apertures can either physically block some of the light
in the Fourier plane, or phase-shift selected components. The process of
placing masks in the Fourier plane to modify the images is called spatial
filtering.
Equation (9.9) shows us that the central z = 0 plane in the 4f set-up
is a map of the Fourier transform of the object. Therefore, low-pass
filtering is achieved by centring a transmitting aperture in an opaque
mask on the axis; whereas high-pass filtering is achieved with the
complementary aperture, an opaque disc centred on the axis, see
Fig. 10.6. Next, we shall illustrate some of these concepts with examples,
starting with simple one-dimensional periodic objects, before moving
on to two-dimensional periodic objects, and finally two-dimensional
arbitrary objects.
Fig. 10.6 Spatial masks used to block either high (top) or low (bottom) spatial frequencies in the Fourier or filter plane (z = 0 in Fig. 10.5). In this example, the mask is circular such that wave vectors with radial component k_ρ either less than (low pass) or greater than (high pass) k_c = kρ_c/z are transmitted.
10.4 1D periodic
As a first example of spatial filtering, consider an input field that contains
a cosine-squared transverse modulation, E(−2f) = E0 f(x) exp(−x²/w0²),
where f(x) = cos²(2πu0x), 2u0 is the spatial frequency of the modulation,
and the beam size is larger than the modulation wavelength, w0 > 1/(2u0).
Recalling that cos²(2πu0x) = [1 + cos(4πu0x)]/2, we find
F[cos²(2πu0x)] = [δ(u) + ½δ(u − 2u0) + ½δ(u + 2u0)]/2; it follows that
the Fourier transform has three contributions, with spatial frequencies
u = 0, and u = ±2u0. For this object we expect to see three intense
spots in the Fourier plane, as evident in Fig. 10.7.
To perform spatial filtering, we can choose to block either the high
or the low spatial frequencies in the Fourier plane. Low-pass filtering
is seen in the upper panel of Fig. 10.7, where the mask in the Fourier
plane blocks the spatial frequencies u = ±2u0 . The image therefore only
contains the spatial frequency u = 0, and the second lens transforms
this into a uniform beam in the output plane, as expected. High-pass
filtering is achieved (lower panel in Fig. 10.7) by choosing a mask in the
Fourier plane that blocks the u = 0 component, but transmits the spatial
frequencies u = ±2u0 . The interesting result is that when we block the
zero frequency component the spatial frequency of the fringes observed
in the output plane is doubled. This follows because when we remove
the u = 0 component we see interference between the ±2u0 components,
which has a spatial frequency of 4u0. (It is evident that the periodicity
of the bright lines in the plane z = 2f is twice as high as it is in the plane
z = −2f.) This is an example of false detail. Understanding false detail
and the resolution possible by transmitting different diffraction orders
was important historically in Abbe’s development of a theory for image
formation in a microscope. We are familiar with optical systems where
the image is lower resolution than the object, but here the converse can
occur.
Fig. 10.7 Low- and high-pass spatial filtering (upper and lower image, respectively): In the upper image only the low spatial frequencies pass through the Fourier plane at z = 0, leading to an output field without fringes. In the lower image only the high spatial frequencies pass and only fringes remain—note that the spatial period of the fringes is halved. Note the similarity of the intensity pattern before the first lens to the Talbot carpets of Section 5.11.
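The low- and high-pass filtering of the cosine-squared modulation can be mimicked with a discrete Fourier transform. The sketch below is a one-dimensional toy model of the 4f set-up, not the simulation behind Fig. 10.7: the values of u0 and w0 are assumed, the second lens is modelled by an inverse FFT (which ignores the image inversion), and the helper names are hypothetical.

```python
import numpy as np

n = 4096
x = np.linspace(-8, 8, n, endpoint=False)      # transverse coordinate (arbitrary units)
dx = x[1] - x[0]
u0, w0 = 1.0, 3.0                              # assumed values satisfying w0 > 1/(2 u0)
field_in = np.cos(2 * np.pi * u0 * x) ** 2 * np.exp(-x**2 / w0**2)

u = np.fft.fftfreq(n, d=dx)                    # spatial frequencies in the Fourier plane
spectrum = np.fft.fft(field_in)                # first lens: Fourier analysis

def filtered_intensity(mask):
    """Apply a Fourier-plane mask, then the second lens (Fourier synthesis)."""
    return np.abs(np.fft.ifft(spectrum * mask)) ** 2

low_pass = np.abs(u) < u0                      # transmits u = 0, blocks u = +/- 2 u0
high_pass = np.abs(u) > u0                     # blocks u = 0, transmits u = +/- 2 u0

def dominant_frequency(intensity):
    """Strongest fringe frequency in an intensity pattern, ignoring the slow envelope."""
    s = np.abs(np.fft.rfft(intensity))
    f = np.fft.rfftfreq(n, d=dx)
    s[f < 0.5 * u0] = 0.0
    return float(f[np.argmax(s)])

print("input fringe frequency    :", dominant_frequency(np.abs(field_in) ** 2))
print("high-pass fringe frequency:", dominant_frequency(filtered_intensity(high_pass)))
```

In this model the input intensity is dominated by fringes at 2u0, whereas the high-pass output shows fringes at 4u0, which is the false detail discussed in Section 10.4.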
10.5 2D periodic
We now consider a two-dimensional object, comprised of an array of
square apertures with the same periodicity along the x and y axes.⁷
Figure 10.8 shows images of the Fourier plane (top row) and the
corresponding output below. Column (i) corresponds with no mask
and the output is the same as the input. In column (ii) the mask in the
Fourier plane only transmits one vertical row of the diffraction pattern.
As a consequence the horizontal information of the object is retained in
the image at z = 2f, but not the vertical. The converse occurs when a
horizontal mask is used, column (iii). When a diagonal mask at +45° is
used as in column (iv), the image is periodic and inclined at −45°. We
also see another example of false detail: as the dots transmitted by the
diagonal mask are a factor of √2 further apart in Fourier space, the image has a
periodicity that is reduced by a factor of √2, i.e. the dominant spatial
frequency in the output is higher than that of the input.
⁷ Experiments on spatial filtering of 2D periodic objects were first performed by
Porter through illumination of a fine wire mesh, see Porter (1906).
Fig. 10.8 Two-dimensional spatial filtering. The top and bottom rows show the intensity patterns in the Fourier and output planes, respectively (z = 0 and z = 2f in Fig. 10.5): (i) no filter—the output is the same as the input; (ii) vertical low-pass filter, only the horizontal information about the object is retained in the image; (iii) horizontal low-pass filter, only vertical information is retained—the converse of (ii); (iv) diagonal mask at +45°, leads to a periodic image inclined at −45°—as the intensity maxima are a factor of √2 further apart in Fourier space, the image has a periodicity that is a factor of √2 smaller.
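The factor of √2 for the diagonal mask follows from a few lines of algebra. The sketch below assumes a square array of period d (a symbol introduced here for illustration) and Fourier coefficients c_{mn} for the diffraction orders, with the mask transmitting only the orders with m = n.

```latex
% Assumptions: square array of period d; orders at (u, v) = (m/d, n/d); mask passes only m = n
\begin{align*}
E_{\text{out}}(x, y) &\propto \sum_{m} c_{mm}\,
  \exp\!\left[\mathrm{i}\,2\pi m\,\frac{x + y}{d}\right]
  = \sum_{m} c_{mm}\,\exp\!\left[\mathrm{i}\,2\pi m\,\frac{\sqrt{2}}{d}\,s\right],
  \qquad s = \frac{x + y}{\sqrt{2}}, \\
\text{spacing of transmitted orders} &= \frac{\sqrt{2}}{d}
  \quad\Rightarrow\quad \text{period along } s = \frac{d}{\sqrt{2}} .
\end{align*}
% Lines of constant intensity satisfy x + y = const, i.e. they are inclined at -45 degrees.
```

The output therefore varies only along the +45° direction, with fringes running at −45° and a period shorter than that of the object by √2.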
10.6 2D arbitrary objects

Next, we demonstrate the principles outlined previously for more
complex objects and masks. We begin by considering some simple but
more general examples of low- and high-pass filtering. One of the most
frequently encountered examples of using spatial filtering in a laboratory
is to clean the mode of a laser. This is considered in Example 10.3 and is
similar to Fig. 10.7(top) but now we include the y dependence, which for
a cartesian-separable input, if we only filter in x, is unchanged anyway.
Example 10.3
Cleaning up the mode of a laser: To illustrate low-pass filtering we consider a
practical example of the removal of intensity irregularities from a gaussian laser beam.
Passing a gaussian beam through numerous optical components, which might have
dust on them, leads to fringes, lumps, and bumps in the intensity profile. These
correspond with plane-wave components with higher spatial frequency, therefore
by using a 4f spatial filter and blocking high spatial frequencies in the Fourier
plane we can remove them. We show how this works for one particular spatial
frequency component. By extension, the same principle can be applied to other
spatial frequencies. Consider a laser beam with some cosine fringes in the x direction,
see Fig. 10.9(i). The field in the input plane at z = −2f is E(−2f) = E0 f(x′, y′), where
\[
f(x', y') = e^{-(x'^2 + y'^2)/w_0^2}\left[1 + \epsilon\cos\!\left(\frac{2\pi x'}{d}\right)\right], \tag{10.5}
\]
where ε and d are the amplitude and wavelength of the fringes. The spatial frequency
of the fringes is u0 = 1/d. The ratio w0/d tells us the number of fringes within a
distance equal to the beam radius. As an example we shall take w0/d = 5. The field
in the Fourier plane is proportional to the Fourier transform,
\[
\mathcal{F}\left[f(x', y')\right] = \left[G(u, v) \ast H(u)\right](u, v),
\]
where
\[
G(u, v) = \pi w_0^2\, e^{-\pi^2 (u^2 + v^2) w_0^2},
\qquad
H(u) = \delta(u) + \frac{\epsilon}{2}\,\delta\!\left(u + \frac{1}{d}\right) + \frac{\epsilon}{2}\,\delta\!\left(u - \frac{1}{d}\right),
\]
u = x/λf and v = y/λf. The convolution means that we get three copies of the
focused gaussian beam in the Fourier plane at positions x = −fλ/d, 0, fλ/d, see
Fig. 10.9(ii). The focused gaussian spot has a radius wf = fλ/πw0, so given that we
chose w0 = 5d the spacing between the spots is a factor 5π larger than their radius,
i.e. we can say that the different spatial-frequency components are ‘well resolved’.
Fig. 10.9 An example of low-pass spatial filtering: (i) A gaussian beam with periodic fringes in the x direction in the input plane. (ii) The normalized light intensity in the Fourier plane. (iii) By using a mask (dashed line) that only transmits low spatial frequencies, we obtain a smoothed output, (iv), where the high-frequency intensity irregularities are removed, without large losses.
We now filter the image by blocking the two spots at x = ±fλ/d, Fig. 10.9(iii).
After the filter the field distribution in the Fourier plane is a single gaussian spot
characterized by the function
\[
f(x, y) = \pi w_0^2\, e^{-\pi^2 w_0^2 (x^2 + y^2)/\lambda^2 f^2}, \tag{10.6}
\]
which becomes the input function for the second Fourier-transforming lens. The
output field is proportional to the Fourier transform of the field in the Fourier plane,
which is
\[
\mathcal{F}\left[f(x, y)\right] = e^{-(x'^2 + y'^2)/w_0^2}. \tag{10.7}
\]
Therefore the output is a smooth gaussian beam with the fringes removed.
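The steps of Example 10.3 can be followed numerically along the x direction. This is a minimal one-dimensional sketch, assuming a fringe amplitude of 0.3 (the text does not specify a value), frequencies measured in units of u = x/λf, and an inverse FFT in place of the second lens; the variable names are hypothetical.

```python
import numpy as np

n = 2048
w0, d, eps = 1.0, 0.2, 0.3                       # w0/d = 5 as in the example; eps is an assumed amplitude
x = np.linspace(-6 * w0, 6 * w0, n, endpoint=False)
dx = x[1] - x[0]
clean = np.exp(-x**2 / w0**2)                    # ideal gaussian profile
field_in = clean * (1 + eps * np.cos(2 * np.pi * x / d))   # eqn (10.5) along y' = 0

u = np.fft.fftfreq(n, d=dx)
mask = np.abs(u) < 0.5 / d                       # transmit the central spot, block u = +/- 1/d
field_out = np.fft.ifft(np.fft.fft(field_in) * mask).real

# Spot spacing (1/d) divided by spot radius (1/(pi w0)), both in units of u
ratio = (1 / d) / (1 / (np.pi * w0))
residual = float(np.max(np.abs(field_out - clean)))
print("spacing / radius:", ratio, "  (5 pi =", 5 * np.pi, ")")
print("largest deviation from a clean gaussian:", residual)
```

Because the three spots are well resolved, the mask edge can sit anywhere between them without noticeably clipping the central gaussian.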
Example 10.4
Edge detection using a high-pass filter: One interesting application of a high-
pass filter is the ability to pick out edges in an image. We first consider what
happens mathematically in one dimension (using cylindrical lenses) and then some
two-dimensional examples using letters. We shall use a gaussian filter function for
convenience. For a high-pass filter in the x direction, we define the transmission
function as
\[
\mathrm{high}(x) = 1 - \mathrm{gauss}\!\left(\frac{x}{x_c}\right), \tag{10.8}
\]
where xc defines an effective cut-off distance, such that light with position |x| < xc in the
Fourier plane is strongly attenuated, similar to the 2D mask in Fig. 10.6. If the input
image is a broad rect function of width a, then in the Fourier plane, immediately
after the filter, the field is proportional to
\[
g(x) = a\left[1 - \mathrm{gauss}\!\left(\frac{x}{x_c}\right)\right]\mathrm{sinc}\!\left(\frac{\pi a x}{\lambda f}\right). \tag{10.9}
\]
The field in the output plane is proportional to the Fourier transform of g(x), which
we can evaluate using the inverse convolution theorem. We find
\[
h(x) = \mathrm{rect}\!\left(\frac{x}{a}\right) - \mathrm{rect}\!\left(\frac{x}{a}\right) \ast \sqrt{\pi}\,x_c\,\mathrm{gauss}\!\left(\frac{\pi x_c x}{\lambda f}\right). \tag{10.10}
\]
Fig. 10.10 High-pass filtering of a rect function, g(x). Upper panel shows the original rect function (dark grey), and the modified one obtained by convolving with a gaussian (light grey). The output of the 4f set-up is proportional to the difference between these functions, shown in the middle panel. The intensity—shown in the lower panel—is only non-zero where there is a sudden change—an edge—in the input function.
The convolution of a gauss and a rect produces a smoothed rect function, as shown
in Fig. 10.10(top). The difference between a rect and a smoothed rect, shown in
Fig. 10.10(middle), is only non-zero near the edges. This function is large close to an
edge and zero everywhere else, i.e. a high-pass filter picks out the edges of an image,
because it is at the edges of the original function that the field varies most rapidly in
space, which generates high-frequency components. The high-pass filter only allows
these components to contribute to the image.
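The edge-picking behaviour of Example 10.4 can be reproduced with a few lines of NumPy. The sketch below assumes a rect of width a = 2, a cut-off expressed directly as a spatial frequency `uc` (playing the role of xc/λf), and an inverse FFT for the second lens; these choices and names are illustrative only.

```python
import numpy as np

n = 4096
x = np.linspace(-4, 4, n, endpoint=False)
dx = x[1] - x[0]
a = 2.0                                          # width of the rect input (assumed)
rect = (np.abs(x) <= a / 2).astype(float)

u = np.fft.fftfreq(n, d=dx)
uc = 2.0                                         # cut-off spatial frequency (assumed)
high = 1 - np.exp(-(u / uc) ** 2)                # eqn (10.8) written in terms of spatial frequency
out = np.fft.ifft(np.fft.fft(rect) * high).real  # rect minus its gaussian-smoothed copy
intensity = out**2

near_edge = np.abs(np.abs(x) - a / 2) < 0.5      # regions within 0.5 of an edge at x = +/- a/2
print("peak intensity near the edges :", intensity[near_edge].max())
print("peak intensity everywhere else:", intensity[~near_edge].max())
```

The output field changes sign across each edge, which is the origin of the double structure seen in the intensity.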
We can illustrate many of the concepts discussed here using examples

where the input function is a two-dimensional image of a letter. For
example, edge enhancement of a two-dimensional object by high-pass
filtering is evident in Fig. 10.5, which was based on the letter K. We
now consider some additional examples, both measured and simulated
on a computer. Figure 10.11 shows experimental images recorded in
the output plane when the input plane is an aperture with the shape of
the letter Z, and the Fourier plane contains a mask of the kind shown in
Fig. 10.6, that blocks either low or high spatial frequencies. As expected,
when we block high spatial frequency (low-pass filter) the image is
smoothed, Fig. 10.11(top). By contrast, when we block low spatial
frequencies (high-pass filter) we pick out the edges, Fig. 10.11(bottom).
Note the double structure in the edge, as explained in Fig. 10.10.
Figure 10.12 shows a simulation of what happens when we apply a
gaussian transmission filter, or its complement, eqn (10.8), to a
two-dimensional image using the example of the letter O. In the left-hand
column we show a low-pass filter which smoothes the image and on the
right we show a high-pass filter which picks out the edges, similar to the
experimental results shown in Fig. 10.11. We could also choose to filter
selected regions of the Fourier transform in order to modify the image
in a particular way.
Fig. 10.11 Low- (top) and high-pass (bottom) filtering of the letter Z. Low- and high-pass filtering lead to smoothing and edge enhancement, respectively. Experimental images courtesy of Ismael Lasanta, Durham University, 2016.
Fig. 10.12 Low- and high-pass filtering

of a light distribution corresponding to
the letter O, left and right column,
respectively. The top row shows the
output image. The bottom left im-
age shows the corresponding intensity
distribution in the Fourier plane after
filtering. Low-pass filtering (left-hand
column) leads to a smoothed output.
High-pass filtering (right-hand column)
leads to edge enhancement, as demon-
strated experimentally in Fig. 10.11.
In Fig. 10.13 we show an example where we either transmit, or block,
light close to the vertical axis. As expected, building on the concepts
illustrated in Fig. 10.8, when the low-pass filter is applied in the y

direction the reconstructed image only contains interesting information
along the x direction; the horizontal bar of the A survives. On applying
a high-pass filter in the y direction in the Fourier plane (bottom right
panel of Fig. 10.13), we expect all but the horizontal bar of the A to
appear, which it does (upper right panel). Note that the image in the
plane z = 2f has been inverted for convenience. (With no filtering, the
letter A as an object gives an image that is an upside-down, left–right
reversed letter A.)
Fig. 10.13 Spatial filtering of a light

distribution corresponding to the letter
A. The top row shows the output
image, firstly when only light close to
the vertical axis in the Fourier plane
is transmitted (low-pass in y, left-hand
column), and then when only light close
to the vertical axis is blocked (high-pass
in y, right-hand column). The bottom
row shows the corresponding intensity
distribution in the Fourier plane after
filtering.
10.7 Convolution
A spatial filter set-up can be used to perform a convolution. For example,
if we want to make copies of an image we need to convolve with a comb
function. A convolution in real space is equivalent to a multiplication
in Fourier space, so to convolve with a comb we can multiply by the
Fourier transform of a comb (also a comb) in the Fourier plane. This is
illustrated in Fig. 10.14, where a grating is placed in the Fourier plane.
Fig. 10.14 (i) A 4f spatial filter with

a grating placed in the Fourier plane.
The output is a convolution of the input
image (in this case the letter K) and the
Fourier transform of the grating which
can be described by a comb function.
The effect is to make copies of the
input image. Translating the grating
by a half period multiplies the output
by a phase such that successive images
either interfere (ii) destructively or (iii)
constructively. See Exercise 10.13. The
experimental images were obtained by
Ismael Lasanta at Durham University
in 2016.
The output is a convolution of the original image and a comb

function. Interestingly, the phases of the replicated images depend on
the transverse position of the grating.
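The replication of an image by a grating in the Fourier plane has an exact discrete analogue. The sketch below assumes an idealized grating that transmits every k-th frequency sample (a comb), a one-dimensional gaussian feature standing in for the input image, and periodic boundary conditions from the FFT; none of these details come from the experiment of Fig. 10.14.

```python
import numpy as np

n, k = 1024, 4                                   # grid size and grating period in frequency samples (assumed)
x = np.arange(n)
image = np.exp(-((x - 100) / 10.0) ** 2)         # a single narrow feature standing in for the input image
grating = np.zeros(n)
grating[::k] = 1.0                               # idealized grating: a comb in the Fourier plane
output = np.fft.ifft(np.fft.fft(image) * grating).real

# Convolution with the transform of the comb: k equally spaced copies, each scaled by 1/k
copies = sum(np.roll(image, j * n // k) for j in range(k)) / k
print("largest difference between output and shifted copies:", np.max(np.abs(output - copies)))
```

Multiplying the grating by a linear phase in this model would shift the copies, which is the discrete counterpart of the position-dependent phases noted above.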
10.8 Phase-contrast imaging

Often in microscopy we are interested in transparent or semi-transparent
objects. A transparent object imprints a phase on the field but this
does not show up if we measure intensity. In the 1930s, Frits Zernike
(Amsterdam 1888–Amersfoort 1966) realized that if we modify the phase
of different spatial frequencies then we can map a phase pattern into an
intensity pattern. This idea is known as phase-contrast imaging or
phase-contrast microscopy and earned Zernike the Nobel Prize in
1953. There are many possible variations on the phase-contrast-imaging
theme.
We shall illustrate the basic principle of imaging spatial phase
patterns by modifying the field in the Fourier plane. In the simulation
of Fig. 10.15, the input field has slowly varying gaussian intensity
distribution with width a, but a phase that varies sinusoidally, i.e. the
input field has the form
$$ f(x) = \mathrm{gauss}\!\left(\frac{x}{a}\right) e^{i 2\pi u_0 x} , $$
where u0 > 1/a is the spatial frequency of the phase variation. Although
the phase has a dramatic effect on how the light propagates, the output
image also has a smooth intensity and we do not detect the phase
information. The output image changes dramatically if we add a filter in
the Fourier plane that shifts the phase of low spatial frequencies by π/2
relative to the high spatial frequencies. The effect of this phase plate is
illustrated in the lower plot of Fig. 10.15. In this case, the initial phase
variation is mapped into an intensity variation in the image and can be
detected directly.

Fig. 10.15 Phase-contrast imaging. The input image has a smooth intensity profile but a spatially varying phase. (i) Without spatial filtering, the output is the same (only magnified). (ii) If we add a phase plate in the Fourier plane, then the output (on the far right) is changed dramatically, revealing the phase information as an intensity modulation.
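The principle can be sketched numerically in one dimension. The following is a minimal illustration, not the simulation behind Fig. 10.15: it assumes gauss(x) = exp(−x²), a weak sinusoidal phase ε cos(2πu0 x) as our reading of "varies sinusoidally", a hypothetical cut-off at u0/2 between low and high spatial frequencies, and a discrete Fourier transform pair standing in for the two lenses (so image inversion and magnification are ignored). The function names and parameter values are our own.

```python
import cmath
import math

def dft(values, sign):
    """Naive O(N^2) discrete Fourier transform (sign=-1 forward, +1 inverse, unnormalized)."""
    n = len(values)
    twiddle = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    return [sum(v * twiddle[(j * m) % n] for j, v in enumerate(values)) for m in range(n)]

def phase_contrast(n=128, window=8.0, a=1.0, u0=2.0, eps=0.2):
    """Return (x, intensity without filter, intensity with a pi/2 phase plate)."""
    dx = window / n
    xs = [(j - n // 2) * dx for j in range(n)]
    # Smooth gaussian amplitude carrying a weak sinusoidal phase (assumed form).
    field = [math.exp(-(x / a) ** 2) * cmath.exp(1j * eps * math.cos(2 * math.pi * u0 * x))
             for x in xs]
    spectrum = dft(field, -1)
    filtered = []
    for m, amp in enumerate(spectrum):
        u = (m if m < n // 2 else m - n) / window      # spatial frequency of bin m
        # Phase plate: shift the low spatial frequencies by pi/2 (factor i).
        filtered.append(amp * 1j if abs(u) < u0 / 2 else amp)
    out = [v / n for v in dft(filtered, +1)]
    plain = [abs(f) ** 2 for f in field]               # phase is invisible here
    contrast = [abs(o) ** 2 for o in out]              # phase now modulates the intensity
    return xs, plain, contrast

xs, plain, contrast = phase_contrast()
```

With these values the unfiltered intensity is just the smooth gaussian, whereas the filtered intensity is brighter where the phase is largest and dimmer half a period away, with a fractional modulation of roughly 2ε for small ε.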
Example 10.5
Phase encoding: In phase-contrast imaging we read out the phase information in
the field via interference. To see how important the phase information is in any
image we consider a simple example where in the Fourier plane of a 4f spatial filter
we swap the phase information. The effect is shown in Fig. 10.16. The left-hand
column shows the input intensity, the middle column shows the field in the Fourier
plane, and the right-hand column shows the output field. In the Fourier plane we
swap the phase patterns, which converts the cat into a duck, and vice versa; i.e. phase
is more important than intensity in determining the image. If we ignore the intensity
information completely in the Fourier plane and just imprint the phase pattern on a
laser beam we get a similar result.
Fig. 10.16 Input image (left-hand column) and the corresponding intensity patterns in the Fourier plane are shown (middle column). The phase information in the Fourier plane is swapped, producing the output intensity patterns shown in the right-hand column. Images courtesy of Emma Clennett, Durham University, 2015.
10.9 Complementarity II
In Chapter 9, we illustrated the principle of complementarity using
Heisenberg’s microscope, Section 9.8. Another way of looking at
complementarity, which has the advantage of only using photons, is
to exploit the real and Fourier space distributions in our 4f imaging
system. Whereas in the Heisenberg microscope we attempt to measure
which-path information and pay the price that the fringe visibility is
degraded, this time we attempt to observe the interference pattern and
find that the which-path information is degraded. The idea is to combine
Young’s double-slit experiment with the 4f spatial filter, as illustrated in
Fig. 10.17. If there is no mask, then photons that enter through input
slits A and B will exit through output slits A and B, respectively. If
the photon passes through both slits then there will be an interference
pattern in the Fourier plane.
Fig. 10.17 Imaging of Young’s interference fringes. If there is no mask in the Fourier plane (top image), then light entering at A exits at A, and similarly for B. However, if a grating is inserted in the Fourier plane (lower image), where the spatial period of the grating matches the fringe spacing, then light entering at A is diffracted by the grating and may exit at either A or B. Consequently, the ‘measurement’ of the wave interference using a grating erases the which-path information.
We can attempt to measure the interference pattern by inserting a

grating in the Fourier plane. If the grating period matches the period
of the interference fringes and it is aligned appropriately then most
of the light is still transmitted; whereas if there were no fringes, a
significant fraction of the intensity would be blocked. So can we argue
that this optical system both provides path information and is sensitive
to interference? Does this violate complementarity? The answer is no,
because inserting the grating scrambles the which-path information by
diffracting light from one entrance slit to the other exit slit, as illustrated
in the lower image in Fig. 10.17. The grating produces multiple copies
of each entrance slit as we saw in Fig. 10.14. An experiment verifying
this effect and reaffirming complementarity was performed using single
photons by Jacques et al. in 2008.
Interestingly, the better we match the grating transmission to the
interference pattern, the more efficient is the diffraction between paths
A and B. This is because the fringe visibility, V, and the path
distinguishability, D, are linked via the complementarity inequality,
$V^2 + D^2 \leq 1$, see Jacques et al. (2008). A convenient way to think about
this is to use time-reversal symmetry or the reciprocity theorem
to map back from the output plane to the Fourier plane; then it follows
that having light at both output ports would produce an interference
pattern that matches the grating, as happened for the two inputs.
Chapter summary
• The spatial distribution in the focal plane can be modified by

apodizing the light distribution in the plane of the lens.
• Inverse apodization lets more light through from the edges of
the lens relative to the centre and gives rise to super-resolution.
• A 4f spatial filter can be used for optical image processing.
• In a 4f set-up, low- (high)-pass filtering is achieved with a hole
(solid disk) centred on the axis in the Fourier plane.
• Low-pass filtering in the Fourier plane softens the edges in an
image.
• High-pass filtering in the Fourier plane sharpens the edges in an
image.
• Placing a diffraction grating in the Fourier plane of a 4f set-up
creates an image with numerous replicates of the input. The
output is given by a convolution of the input with the grating
transmission function.
• Phase-contrast imaging can be achieved by using a filter in the
Fourier plane that retards the phase of different spatial frequencies.
• The phase of the field in the Fourier plane is important as
demonstrated by the example of phase encoding.
Exercises
(10.1) Intensity of subsidiary maxima
(i) For a one-dimensional rectangular aperture tabulate the intensity of the 1st, 2nd, . . ., 10th subsidiary maximum relative to that of the central peak.
(ii) Repeat the analysis for a two-dimensional symmetric aperture where the point-spread function is the Airy pattern.

(10.2) Point-spread functions of apodizing functions
Use Fourier techniques, as demonstrated in Example 10.1, to reproduce the Fourier transforms of the apodizing functions listed in Table 10.1.

(10.3) Plotting apodized intensity point-spread functions
Plot the intensity point-spread functions of the apodizing functions in Table 10.1 to reproduce Fig. 10.2. Make two versions of each graph, one with a linear and the other a logarithmic scale for the ordinate.

(10.4) Reduced maximum intensity with apodized pupil
Show that the peak intensity of an apodized function has to be less than that of the unapodized aperture.
[Hint: use the central ordinate theorem.]

(10.5) Designing an apodizing function for a desired point-spread function
Discuss why it is not possible to design a point-spread function (amplitude or intensity), and then find the corresponding apodizing function.
[Hint: consider the extent of the Fourier transform of an arbitrary function.]

(10.6) Inverse apodization in one dimension
Consider the function
$$ \mathrm{apod}(x) = \begin{cases} 1 & -a/2 \leq x \leq -\alpha a/2 \\ 0 & -\alpha a/2 \leq x \leq \alpha a/2 \\ 1 & \alpha a/2 \leq x \leq a/2 \end{cases} $$
where 0 ≤ α ≤ 1. Calculate the amplitude point-spread function for this inverse-apodized function. Plot the width of the central maximum and the peak intensity as a function of α. Comment on the trade-off between the narrower central maximum and the drop off in peak intensity.

(10.7) Narrower point-spread function for inverse-apodized functions
For a circular pupil with an inverse-apodization filter show that the central peak of the point-spread function has to be narrower than that of the unaltered pupil. [Hint: At the centre of the diffraction pattern, from the central ordinate theorem we know that the field is the integral over (1 − T), which is positive. For a value of 1.22λ/D the field has two components, that from the ‘1’ goes to zero. Therefore show that the field here must be negative. (Recall that the central maximum of the point-spread function of T is wider than the Airy pattern.) Hence show that the Fourier transform of this modified pattern must cross zero between these two values, and thus is narrower.]

(10.8) Width of central maximum for an inverse-apodized circular aperture: absorbing filter
Calculate the intensity point-spread function for a circular pupil with diameter D in front of a lens of focal length f using an annular aperture, which only transmits light in the region
$$ \alpha D/2 \leq \rho \leq D/2 , $$
with 0 ≤ α ≤ 1. Plot the width of the central maximum and the peak intensity as a function of α. Comment on the trade-off between the narrower central maximum and the drop off in peak intensity.

(10.9) Width of central maximum for an inverse-apodized circular aperture: phase filter
Repeat the analysis of the previous question, but with an inverse-apodizing mask given by
$$ \mathrm{apod}(\rho) = \begin{cases} -1 & 0 \leq \rho \leq \alpha D/2 \\ 1 & \alpha D/2 \leq \rho \leq D/2 \end{cases} $$
Comment on the trade-off between the narrower central maximum and the drop off in peak intensity.

(10.10) Spatial filtering of a 1D grating
The object in a 4f set-up is a one-dimensional grating with transmission profile
$$ f(x) = 0.5 + 0.4 \cos(2\pi x/d) + 0.1 \cos(4\pi x/d) , $$
where d is the period of the grating.
(a) Show that the intensity diffraction pattern in the Fourier plane consists of five spots.
(b) Given the wavelength of the light used is λ, what is their location?
(c) Calculate the relative intensities of the five spots.
(d) What intensity pattern is observed in the output if
(i) no filter is inserted,
(ii) the outer two spots are blocked, and
(iii) only the outer two spots are transmitted?

(10.11) Fourier transform of the letter K
Figure 10.5 depicts the square modulus of the two-dimensional (2D) Fourier transform of the letter K. Explain the form of the diffraction pattern.

(10.12) Fourier transform of the letter E
(i) Sketch the form of the square modulus of the 2D Fourier transform of the letter E.
(ii) Is it possible to insert a mask in the Fourier plane to remove the vertical bar, and retain the horizontal bars?
(iii) If so, sketch the form of the filter.
(iv) Explain why it is impossible to insert a mask in the Fourier plane to remove one of the horizontal bars, such that the image looks like an F.

(10.13) Twisted comb
A grating with period d can be described using the function Xd (x). If the grating is translated by a half-period this becomes t(x) = Xd (x − d/2). Find the expression for the Fourier transform of t(x) with Fourier variable u = x/(λf), and sketch the function. Use your sketch to explain Fig. 10.14(ii) and (iii).

(10.14) Spatial filtering of a letter
Write a caption for Fig. 10.18. Estimate the ratio between the cut-off between low and high spatial frequencies, ρc, and the widths, a, of the lines used to create the letter.

(10.15) Convolutor
Sketch the optical layouts used to obtain the images shown in Fig. 10.19. Indicate where the detector should be placed in order to observe (i) and (ii).
Fig. 10.18 Two examples of spatial filtering. See Exercise 10.14.
Fig. 10.19 Images recorded in either the output plane or the Fourier plane using a 4f spatial filter. One component in the set-up is moved between the measurement of (i) and (ii). See Exercise 10.15. Experimental images courtesy of Ismael Lasanta, Durham University, 2016.
11 Light propagation: beams and guides

. . . it is a great thing to take any step that leads us onwards.
Anthony Trollope, Phineas Finn (1868)

11.1 Introduction
11.2 Laser beam propagation
11.3 Focusing of laser beams
11.4 Optical cavities
11.5 Waveguides
11.6 Modes within a slit
11.7 A cylindrical light guide
11.8 Step-index fibre
11.9 Fibre modes
Chapter summary
Exercises

11.1 Introduction

In previous chapters we have looked at light fields that spread out.
Now we shall focus on fields that mostly remain the same. In either
a laser cavity or a wave guide, such as an optical fibre, light can
propagate thousands of kilometres without changing significantly. Light
beams with a well-defined spatial profile are known as modes. The
most common examples are laser beams and light confined in a cavity
or waveguide. We now consider the propagation of modes including
gaussian laser beams and the modes inside optical fibres. We shall
discuss how these light modes determine the design of laser cavities and
optical fibres.
11.2 Laser beam propagation

Firstly, we shall focus on the simplest and most useful eigensolution of
the propagation equation, eqn (6.29), where the field distribution in a
particular plane (which we put at z = 0 for now) is given by
$$ E^{(0)} = E_0\, e^{-\rho^2/w_0^2} , \qquad (11.1) $$
where $\rho = \sqrt{x^2 + y^2}$ is the transverse displacement in cylindrical
coordinates, and w0 is known as the beam waist. This field profile
is sometimes called a TEM00 mode, where TEM stands for transverse
electric and magnetic,1 and ‘00’ indicates that there are no nodes in the
transverse field distribution. A schematic of a TEM00 gaussian beam
indicating the beam waist in the z = 0 plane is shown in Fig. 11.1. The
intensity $I = \tfrac{1}{2}\epsilon_0 c |E|^2$ is given by
$$ I^{(0)} = I_0\, e^{-2\rho^2/w_0^2} . \qquad (11.2) $$
The beam waist, w0, corresponds to the distance off-axis at which the
intensity falls to 1/e² of its maximum, i.e. a 1/e²-radius.2 To obtain an
equation for a gaussian beam at any point (x, y, z) we solve the paraxial
propagation equation. First we find the angular spectrum. Using
cartesian separability to write f(x, y) = g(x)h(y), and the Fourier
transform of a gaussian, eqn (6.21), we obtain
$$ A^{(0)} = \pi w_0^2 E_0\, e^{-\pi^2 (u^2 + v^2) w_0^2} = \pi w_0^2 E_0\, e^{-k_\rho^2 w_0^2/4} , \qquad (11.3) $$
where $k_\rho = (k_x^2 + k_y^2)^{1/2}$ is the component of the wave vector in the
radial direction, and we obtain a $\sqrt{\pi}\, w_0$ prefactor from both dimensions.

1 The name is unfortunate as the field is not purely transverse, as we shall see in Chapter 12.

2 It is regrettable that there are three different conventions for defining the width of a gaussian: (i) the standard deviation, (ii) the full width at half maximum, and (iii) the 1/e²-radius. As the latter is ubiquitous in laser physics we shall stick to it in this chapter.
For monochromatic light, there is only one value of k, and we can
re-write kz in terms of kx and ky or kρ: $k_z^2 = k^2 - k_\rho^2$. In the paraxial
limit,3 $k_\rho \ll k_z$, we have $k_z \approx k - k_\rho^2/(2k)$, and the paraxial form of the
propagator is
$$ e^{i k_z z} = e^{ikz}\, e^{-i k_\rho^2 z/2k} . $$
The paraxial form of the hedgehog equation, eqn (6.29), is
$$ E^{(z)} = e^{ikz}\, \mathcal{F}^{-1}\!\left[ e^{-i k_\rho^2 z/2k}\, \mathcal{F}[E^{(0)}] \right] . \qquad (11.4) $$

3 Recall that the scalar approximation is only valid in the paraxial limit.
Fig. 11.1 The geometry of a gaussian laser beam in the xz plane. The greyscale shows intensity (peak intensity is white). The grey dashed lines correspond to the beam radius, w, which is the transverse distance at which the intensity falls to 1/e² of its on-axis value. The peak intensity and minimum radius, w0, occurs at the position of the beam waist, which here is in the z = 0 plane. In the far-field, the angular divergence is Δθ = w/z = λ/(πw0), as we saw in Chapter 5.

For the case of a gaussian input field we can solve this equation exactly.
The angular spectrum in a plane at z is given by multiplying the angular
spectrum in the input plane, eqn (11.3), by the propagator:
$$ A^{(z)} = e^{ikz}\, e^{-i k_\rho^2 z/2k} A^{(0)} = \pi w_0^2 E_0\, e^{ikz}\, e^{-(w_0^2/4 + iz/2k) k_\rho^2} = \pi w_0^2 E_0\, e^{ikz}\, e^{-i k_\rho^2 q/2k} , \qquad (11.5) $$
where
$$ q = z - i z_R $$
is known as the complex beam parameter and $z_R = k w_0^2/2 = \pi w_0^2/\lambda$
is the Rayleigh range as defined in Chapter 5. Finally, we find the
inverse transform using $\mathcal{F}^{-1}[\mathrm{gauss}(k_\rho \xi/2)] = 1/(\pi\xi^2)\,\mathrm{gauss}(\rho/\xi)$ with
$\xi^2 = i2q/k$, which gives
$$ \mathcal{F}^{-1}\!\left[ e^{-i k_\rho^2 q/2k} \right] = \frac{1}{\pi\, i2q/k}\, e^{i k \rho^2/2q} . \qquad (11.6) $$
Multiplying by the prefactor $\pi w_0^2 E_0 e^{ikz}$ we get
$$ E^{(z)} = \frac{z_R}{iq}\, E_0\, e^{ikz}\, e^{i k \rho^2/2q} . \qquad (11.7) $$
This remarkably simple result, derived solely from knowledge of the field
in the z = 0 plane, tells us the amplitude and phase of the laser beam
everywhere! This gaussian-beam equation is a full eigensolution of the
paraxial wave equation.
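As a quick numerical illustration, eqn (11.7) can be evaluated directly with complex arithmetic. This is a minimal sketch under the paraxial assumptions of the text; the function name and the example values (a 1 mm waist at 633 nm) are our own choices.

```python
import cmath
import math

def gaussian_field_q(rho, z, w0, wavelength, e0=1.0):
    """Field of eqn (11.7), using the complex beam parameter q = z - i zR."""
    k = 2 * math.pi / wavelength
    z_r = math.pi * w0 ** 2 / wavelength          # Rayleigh range
    q = complex(z, -z_r)
    prefactor = z_r / (1j * q)                    # equals 1 at the waist
    return e0 * prefactor * cmath.exp(1j * k * z) * cmath.exp(1j * k * rho ** 2 / (2 * q))

w0, wavelength = 1.0e-3, 633e-9                   # illustrative values, in metres
z_r = math.pi * w0 ** 2 / wavelength
on_axis_at_waist = gaussian_field_q(0.0, 0.0, w0, wavelength)       # amplitude E0
edge_at_waist = abs(gaussian_field_q(w0, 0.0, w0, wavelength))      # E0 / e
on_axis_at_zr = abs(gaussian_field_q(0.0, z_r, w0, wavelength))     # E0 / sqrt(2)
```

At the waist the field falls to 1/e of its on-axis value at ρ = w0, and one Rayleigh range downstream the on-axis amplitude has dropped by a factor of √2.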
The role of the complex beam parameter q is quite subtle, and to
interpret eqn (11.7) we expand 1/q into real and imaginary parts:
$$ \frac{1}{q} = \frac{z}{z^2 + z_R^2} + i\,\frac{z_R}{z^2 + z_R^2} , \qquad (11.8) $$
and the phase term is
$$ i\,\frac{k\rho^2}{2q} = \frac{i k\rho^2}{2}\,\frac{z}{z^2 + z_R^2} - \frac{k\rho^2}{2}\,\frac{z_R}{z^2 + z_R^2} = i\,\frac{k\rho^2}{2R} - \frac{\rho^2}{w^2} , \qquad (11.9) $$
where we have defined two new parameters:
$$ w = w_0 \left( 1 + z^2/z_R^2 \right)^{1/2} , \qquad (11.10) $$
$$ R = z + z_R^2/z , \qquad (11.11) $$
known as the beam radius and the wave-front curvature, respectively.
Substituting these terms into the gaussian beam equation,
eqn (11.7), we find
$$ E^{(z)} = \frac{z_R}{iq}\, E_0\, e^{ikz}\, e^{i k\rho^2/2R}\, e^{-\rho^2/w^2} . \qquad (11.12) $$
Figures 11.1–11.3 illustrate the beam radius and wave-front curvature.
Next, we consider them in more detail.
Fig. 11.2 Illustration of the Rayleigh range, zR, and angular divergence, Δθ = λ/(πw0), of a laser beam in the xz plane. The initial beam radius (waist) in the z = 0 plane is w0. The beam radius at z = zR is √2 w0.

Beam radius: The factor $e^{-\rho^2/w^2}$ in eqn (11.12) indicates that the
transverse spatial profile remains gaussian, but with a modified beam
radius given by eqn (11.10). At z = zR the beam radius w = √2 w0,
see Fig. 11.2, and the beam area is double that at the waist. In
the far-field, z ≫ zR, we recover the simple result that the beam
radius increases linearly with distance, w = w0 z/zR, and the angular
divergence Δθ = w/z = w0/zR = λ/(πw0), as we saw in Chapter 5. The
Rayleigh range characterizes the cross-over between the near field and
far field, and also how far the beam can propagate before the transverse
distribution changes substantially. In this respect it is analogous to
the Rayleigh distance in diffraction, see Chapter 5, and emphasizes the
point that a length scale corresponding to transverse size-squared over
wavelength is characteristic of all diffraction phenomena.
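To get a feel for the numbers, the Rayleigh range, beam radius, and far-field divergence can be evaluated for a specific beam. The sketch below simply codes eqn (11.10) and the expressions for zR and Δθ; the 1 mm waist and 633 nm wavelength are illustrative assumptions, as are the function names.

```python
import math

def rayleigh_range(w0, wavelength):
    """zR = pi w0^2 / lambda."""
    return math.pi * w0 ** 2 / wavelength

def beam_radius(z, w0, wavelength):
    """Beam radius w(z), eqn (11.10)."""
    z_r = rayleigh_range(w0, wavelength)
    return w0 * math.sqrt(1 + (z / z_r) ** 2)

def divergence(w0, wavelength):
    """Far-field angular divergence, lambda / (pi w0)."""
    return wavelength / (math.pi * w0)

w0, wavelength = 1.0e-3, 633e-9               # illustrative: 1 mm waist, 633 nm light
z_r = rayleigh_range(w0, wavelength)          # about 5 m
w_at_zr = beam_radius(z_r, w0, wavelength)    # sqrt(2) times w0
far_z = 100 * z_r
far_angle = beam_radius(far_z, w0, wavelength) / far_z   # approaches divergence(w0, wavelength)
```

A millimetre-sized beam therefore stays roughly collimated over several metres, while halving the waist shortens the Rayleigh range by a factor of four and doubles the divergence.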
Wave-front curvature: The factor $e^{ik\rho^2/2R}$ in eqn (11.12), similar
to the quadratic phase term appearing in the paraxial spherical wave in
Chapter 2, Section 2.14, tells us that the wave fronts are curved.
If we consider the phase factor associated with the spherical wave $e^{ikr}$
with $r = (R^2 + \rho^2)^{1/2}$, then in the paraxial limit (ρ ≪ R) we have
$r \approx R + \rho^2/2R$, giving the same dependence on ρ as the laser beam. So
the wave fronts are approximately spherical with a radius of curvature
R. Note that R > z so the effective origin of the spherical wave front
is always further away than the waist, as shown in Fig. 11.1. Note that
the radius of curvature is infinite in the z = 0 plane, i.e. the wave fronts
are planar at the beam waist, as shown in Fig. 11.3.4 Finally, we note
that the wave front is most curved, i.e. R is a minimum, when z = zR.

4 We could have guessed this from the symmetry of the gaussian beam solution upstream and downstream of the waist, but still it may seem surprising that the wave fronts at the waist are planar yet the beam spreads out.
Gouy phase: The prefactor in eqn (11.12) is also complex and can
be rewritten in terms of an amplitude and a phase. First separating the
real and imaginary parts, we have
$$ \frac{z_R}{iq} = \frac{z_R}{iz + z_R} = \frac{z_R^2 - i z_R z}{z^2 + z_R^2} , $$
which can be written as
$$ \frac{z_R}{iq} = \frac{z_R}{(z^2 + z_R^2)^{1/2}}\, e^{-i\alpha} = \frac{w_0}{w}\, e^{-i\alpha} , $$
where
$$ \alpha = \tan^{-1}\!\left( \frac{z}{z_R} \right) \qquad (11.13) $$
is the Gouy phase (see also Chapter 5), and the factor w0 /w ensures
that energy is conserved. This phase was first discovered by Louis George
Gouy (Vals les Bains 1854–1926) in 1890, and arises due to the finite
size of a wave at a focus. The Gouy phase evolves by π from one side
of the focus to the other, and is crucial in explaining some features in
light–matter interactions (see Chapter 13).
Fig. 11.3 Visualization of the wave fronts of a gaussian beam. The wave fronts are planar at the position of the beam waist. The minimum radius of curvature is R = 2zR at a distance zR from the waist.

Intensity: Putting all the above factors together, we obtain
$$ E^{(z)} = E_0\, \frac{w_0}{w}\, e^{-i\alpha}\, e^{ikz}\, e^{i k\rho^2/2R}\, e^{-\rho^2/w^2} , \qquad (11.14) $$
$$ I^{(z)} = I_0 \left( \frac{w_0}{w} \right)^2 e^{-2\rho^2/w^2} . \qquad (11.15) $$
Note that while the transverse intensity is described by a gaussian, the
on-axis intensity is described by a Lorentzian.
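These statements are easy to check numerically. The sketch below codes eqns (11.10), (11.11), (11.13) and (11.15) and integrates the intensity over the transverse plane to confirm that the power does not depend on z. The function names, the midpoint-rule integration out to six beam radii, and the example values are our own assumptions.

```python
import math

def beam_parameters(z, w0, wavelength):
    """Return (w, R, alpha) from eqns (11.10), (11.11) and (11.13); R is infinite at the waist."""
    z_r = math.pi * w0 ** 2 / wavelength
    w = w0 * math.sqrt(1 + (z / z_r) ** 2)
    radius_of_curvature = math.inf if z == 0 else z + z_r ** 2 / z
    alpha = math.atan2(z, z_r)                 # Gouy phase, tan^-1(z / zR)
    return w, radius_of_curvature, alpha

def intensity(rho, z, w0, wavelength, i0=1.0):
    """Intensity of eqn (11.15)."""
    w, _, _ = beam_parameters(z, w0, wavelength)
    return i0 * (w0 / w) ** 2 * math.exp(-2 * rho ** 2 / w ** 2)

def power(z, w0, wavelength, steps=2000):
    """Midpoint-rule integral of I(rho) 2 pi rho d rho, truncated at six beam radii."""
    w, _, _ = beam_parameters(z, w0, wavelength)
    d_rho = 6 * w / steps
    total = 0.0
    for i in range(steps):
        rho = (i + 0.5) * d_rho
        total += intensity(rho, z, w0, wavelength) * 2 * math.pi * rho * d_rho
    return total

w0, wavelength = 1.0e-3, 633e-9                # illustrative values, in metres
z_r = math.pi * w0 ** 2 / wavelength
w, R, alpha = beam_parameters(z_r, w0, wavelength)    # sqrt(2) w0, 2 zR, pi/4
lorentzian = intensity(0.0, 2 * z_r, w0, wavelength)  # 1 / (1 + 2^2) = 0.2
p_waist, p_far = power(0.0, w0, wavelength), power(3 * z_r, w0, wavelength)
```

The integrated power equals I0 πw0²/2 in both planes, which is the sense in which the factor w0/w conserves energy.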
11.3 Focusing of laser beams

Fig. 11.4 A laser beam with waist w1 propagates a distance z1 to a lens. The lens produces a beam with waist w2 at a distance z2, given by equating the real and imaginary parts of eqn (11.16).

The focusing of gaussian laser beams, either by a lens or a curved mirror,
is a key concept in the design of lasers and coupling light into optical
fibres. In general, the beam waist is not in the plane of the lens, and the
effect of the lens is analysed using the complex beam parameter form of
the gaussian beam, eqn (11.7). Consider an input beam with Rayleigh
range, $z_{R1} = \pi w_1^2/\lambda$, incident on a lens at a distance z1 downstream of
the waist, as illustrated in Fig. 11.4. The field incident on the lens is
$$ E^{(0)} = \frac{q_0}{q_1}\, E_0\, e^{i k\rho^2/2q_1} , $$
where the complex beam parameter, $q_1 = z_1 - i z_{R1}$. The field
immediately after the lens is
$$ E^{(L)} = E^{(0)}\, e^{-i k\rho^2/2f} = \frac{q_0}{q_1}\, E_0\, e^{i k\rho^2/2q_2} , $$
where $q_2 = -z_2 - i z_{R2}$ is the modified complex beam parameter (the
minus sign appears because the lens is upstream of the new waist).
Substituting for $E^{(0)}$, we find that
$$ \frac{1}{q_2} = \frac{1}{q_1} - \frac{1}{f} , $$
analogous to the result we obtained for paraxial spherical waves in
Section 2.18. Rearranging, we obtain
$$ q_2 = \frac{q_1}{1 - q_1/f} , $$
$$ -z_2 - i z_{R2} = \frac{z_1 - i z_{R1}}{1 - z_1/f + i z_{R1}/f} . \qquad (11.16) $$
By equating the real and imaginary parts of eqn (11.16) we find the new
waist size in terms of zR2 and position z2 . This formula is particularly
useful in the design of laser cavities.
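Because eqn (11.16) is a relation between complex numbers, it is convenient to let the computer separate the real and imaginary parts. The following sketch applies 1/q2 = 1/q1 − 1/f and reads off the new waist; the function name and example values are our own, and the lens is assumed thin and aberration free.

```python
import math

def focus_through_lens(w1, z1, f, wavelength):
    """Return (w2, z2): the new waist radius and its distance downstream of the lens.

    Uses q1 = z1 - i zR1, q2 = q1 / (1 - q1/f) and q2 = -z2 - i zR2, eqn (11.16).
    """
    z_r1 = math.pi * w1 ** 2 / wavelength
    q1 = complex(z1, -z_r1)
    q2 = q1 / (1 - q1 / f)
    z2 = -q2.real
    z_r2 = -q2.imag
    w2 = math.sqrt(wavelength * z_r2 / math.pi)
    return w2, z2

# Illustrative case: 1 mm waist at 633 nm, placed one focal length before a 100 mm lens.
w2, z2 = focus_through_lens(1.0e-3, 0.1, 0.1, 633e-9)
```

In this particular case, a waist in the front focal plane, the new waist lies in the back focal plane (z2 = f), which follows directly from eqn (11.16) with z1 = f.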
Consider the case where the laser beam is sufficiently well collimated
that we can assume that the beam waist lies in the same plane as the
lens, i.e. z1 = 0 as in Fig. 11.5. Putting z1 = 0 in eqn (11.16) and
equating imaginary parts, we obtain
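$$ z_{R2} = \frac{z_{R1}}{1 + z_{R1}^2/f^2} . \qquad (11.17) $$
If $z_{R1} = \pi w_0^2/\lambda$ and $z_{R2} = \pi w_f^2/\lambda$, then for $z_{R1} \gg f$, $z_{R2} \approx f^2/z_{R1}$,
giving
$$ w_f = \frac{f\lambda}{\pi w_0} , \qquad (11.18) $$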
which is the same result as obtained using the Fraunhofer diffraction
formula, see Chapter 5. This is a useful result for estimating the size of
a focused laser beam; however, we have to remember that it assumes that
the lens does not introduce any aberrations—optimal focusing without
aberration is described as diffraction limited.
Note that the waist of the focused beam is not in the focal plane, see
Fig. 11.5, and the wave fronts in the focal plane are curved, not planar,
as expected from the Fresnel diffraction integral, eqn (6.34). By equating
the real parts in eqn (11.16) we find that the focal shift (the difference
between the focal plane and waist plane) is approximately $z_{R2}^2/f$, where
zR2 is the Rayleigh range of the focused beam. Alternatively, it can be
written as $f^3/z_{R1}^2$, where zR1 is the Rayleigh range of the incident beam,
see Exercise 11.11.

Fig. 11.5 A laser beam with beam waist w0 in the z = 0 plane is incident on a lens with focal length f positioned in the z = 0 plane. The lens creates a new waist with radius, wf, at position $z = f - z_{R2}^2/f$, where zR2 is the Rayleigh range of the focused beam.
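The size of these effects can be estimated with a few lines of code. The sketch below compares the diffraction-limited estimate of eqn (11.18) with the exact z1 = 0 result from eqns (11.16) and (11.17), and evaluates the focal shift. The function names and the example (1 mm waist, 633 nm, f = 100 mm) are illustrative assumptions.

```python
import math

def focused_waist_estimate(w0, f, wavelength):
    """Diffraction-limited estimate, eqn (11.18); valid when zR1 >> f."""
    return f * wavelength / (math.pi * w0)

def exact_focus(w0, f, wavelength):
    """Exact z1 = 0 result: returns (wf, waist position, focal shift)."""
    z_r1 = math.pi * w0 ** 2 / wavelength
    z_r2 = z_r1 / (1 + (z_r1 / f) ** 2)        # eqn (11.17)
    z2 = f / (1 + (f / z_r1) ** 2)             # real part of eqn (11.16) with z1 = 0
    wf = math.sqrt(wavelength * z_r2 / math.pi)
    return wf, z2, f - z2

w0, f, wavelength = 1.0e-3, 0.1, 633e-9
estimate = focused_waist_estimate(w0, f, wavelength)   # about 20 micrometres
wf, z2, shift = exact_focus(w0, f, wavelength)         # shift is roughly 40 micrometres
```

For this example the estimate and the exact waist agree to better than 0.1%, and the waist sits a few tens of micrometres before the focal plane, consistent with both approximate expressions for the focal shift.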
11.4 Optical cavities

In Chapter 3 we discussed the Fabry–Perot etalon, where light is
reflected backwards and forwards between two interfaces. The beam
diffracts as it travels back and forth. To counter this spreading due to
diffraction, it is possible to refocus the beam using a curved reflector,
as illustrated in Fig. 11.6. Using two or more mirrors to confine light in
this way is known as an optical cavity or optical resonator. The
reflecting surfaces impose a boundary condition on the electric field
which restricts the allowed frequencies—like the resonances of a Fabry–
Perot, see Section 3.11, or waves on a string. The boundary conditions
also constrain the spatial mode. The eigenmodes of a cavity must
be solutions of the propagation equation, eqn (6.29), such as gaussian
beams. The additional constraint of the mirror is that the wave-front
curvature at each mirror must match the mirror curvature.
Fig. 11.6 Schematic of a gaussian mode inside a plano-convex cavity. The wave-front curvature of the gaussian beam matches the mirror curvature. Consequently the beam waist is located at the plane mirror. For there to be a gaussian mode with a finite beam waist, the cavity must be shorter than the radius of curvature of the curved mirror, Rm.

For the plano-convex cavity shown in Fig. 11.6, if the plane and curved
mirrors are at z = 0 and z = L, respectively, then the wave-front
curvature is R = ∞ at z = 0 and R = Rm at z = L, where Rm is
the radius of curvature of the mirror. Using the expression for gaussian
beam curvature, eqn (11.11), we have
$$ R_m = L + \frac{z_R^2}{L} . \qquad (11.19) $$
Substituting $z_R = \pi w_0^2/\lambda$, and rearranging we find that the beam waist
is at z = 0 and has a radius,
$$ w_0 = \left( \frac{\lambda}{\pi} \right)^{1/2} L^{1/4}\, (R_m - L)^{1/4} . \qquad (11.20) $$
This result implies that w0 → 0 for L → 0 and L → Rm which sets
the so-called limits of stability5 of the cavity, 0 < L < Rm. If we
utilize the mid-point in the stability region, L = Rm/2, we obtain the
maximum value of w0 for a particular cavity length,
$$ w_{0,\mathrm{max}} = \left( \frac{\lambda L}{\pi} \right)^{1/2} . \qquad (11.21) $$

5 See Hooker and Webb (2010) for a clear treatment of the conditions for a low-loss/stable cavity.
Next, we consider a specific example where these considerations are
important.
Example 11.1
Laser pointer: An interesting example of optical engineering which exploits these
ideas is a green laser pointer. The heart of the laser consists of a plano-convex cavity
similar to Fig. 11.6. The laser light is generated inside the cavity using a crystal of
yttrium aluminium garnet doped with neodymium ions (Nd:YAG). The Nd ions are
optically excited using light from a semiconductor diode laser and are subsequently
stimulated to emit into the laser mode. The emitted laser light at λ = 1.06 μm is
frequency doubled to produce green light at 0.532 μm using a non-linear crystal. A
typical cavity length is L = 1.00 cm. Using eqn (11.21) we obtain a mode waist size
of w0 = 58.1 μm, which defines the physical size of the region where we need to excite
the laser crystal.
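The number quoted in this example can be reproduced in a few lines. The sketch below codes eqn (11.20) and evaluates it at the mid-point of the stability region, Rm = 2L, where it reduces to eqn (11.21); the function name and the stability check are our own additions.

```python
import math

def plano_cavity_waist(length, mirror_radius, wavelength):
    """Waist radius at the plane mirror, eqn (11.20); needs 0 < L < Rm."""
    if not 0 < length < mirror_radius:
        raise ValueError("no gaussian mode with finite waist: need 0 < L < Rm")
    return math.sqrt(wavelength / math.pi) * (length * (mirror_radius - length)) ** 0.25

wavelength = 1.06e-6    # m, Nd:YAG emission wavelength quoted in the example
length = 1.00e-2        # m, cavity length quoted in the example
w0 = plano_cavity_waist(length, 2 * length, wavelength)   # mid-point of stability, Rm = 2L
w0_micrometres = w0 * 1e6                                  # about 58.1
```

Changing the mirror radius away from Rm = 2L at fixed Rm-to-L ordering shows how quickly the waist shrinks as the cavity approaches either limit of stability.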
Example 11.2
Symmetric cavity: A symmetric version of a laser cavity, see Fig. 11.7, consists of
two curved mirrors at z = ±L/2. In this case,
$$ R_m = \frac{L}{2} + \frac{z_R^2}{L/2} . \qquad (11.22) $$
Using $z_R = \pi w_0^2/\lambda$ we can re-arrange to find the beam waist at the centre of the
cavity,
$$ w_0 = \left( \frac{\lambda}{\pi} \right)^{1/2} \left( \frac{L}{2} \right)^{1/4} \left( R_m - \frac{L}{2} \right)^{1/4} . \qquad (11.23) $$
We can see that w0 → 0 for L = 0 and L = 2Rm, which set the limits of stability of
the cavity. The optimal stability is at the mid-point, L = Rm, which also corresponds
with the maximum value of w0 for a particular cavity length.

Fig. 11.7 Schematic of a symmetric cavity. The wave front curvature of the gaussian beam matches the mirror curvature, at z = ±L/2. The beam waist is located in the z = 0 plane. For there to be a gaussian mode with a finite beam waist, the cavity must be shorter than 2Rm, where Rm is the radius of curvature of the mirror.

11.5 Waveguides
An optical resonator preserves the spatial profile of a propagating beam
by compensating diffraction by refocusing. If we use lenses rather than
curved mirrors, as in Fig. 11.8, the light propagates in one direction
rather than back and forth, yet the spatial mode remains the same. We
could call this sequence of lenses a single-mode waveguide. The most
common form of single-mode waveguide is an optical fibre, effectively an
infinitely long lens where the focusing effect is just enough to cancel
diffraction, creating a stable transverse field distribution or mode. In
a single-mode waveguide, the transverse profile is uniquely defined and
the light emerging from the fibre has the same spatial distribution as
the input.

Fig. 11.8 Unwrapping the cavity in Fig. 11.7, where a sequence of lenses rather than curved mirrors preserves the spatial mode.

Before we consider how this works, it is important to mention that
lensing is not the only way to guide light. If a reflecting interface
surrounds the propagation direction then light is guided like water in
a pipe. For example, light inside a glass rod may be confined by
total internal reflection.6 This type of light guide is called multimode
because it supports many propagation modes rather than just one. For
a multimode guide, light propagation cannot be described analytically
and we need to revert to the hedgehog equation, eqn (6.29), or vector
theory, see Chapter 12.
Here we shall focus on single-mode wave guides as these are the
most useful for applications such as optical communications.7 As both
light and matter are described by solutions of a wave equation, there is a
useful analogy between guided light modes and the bound states found
in quantum systems. This idea is very powerful. Consequently we shall
devote a significant fraction of the following sections to the cross-over
between quantum physics and guided light.

6 This type of light guide is used in the rod-lens endoscope invented by Harold Horace Hopkins (Leicester 1918–Reading 1994). Hopkins also made major contributions to Fourier optics and the theory of aberrations, still used in lens design.

7 Although the mathematics of cylindrical waveguides is not pretty, as optical fibres are so ubiquitous it is important for any student of optics to have at least some idea of how they work, and where to find the mathematics if required.
11.6 Modes within a slit

In our analysis of diffraction in Chapters 5 and 6, we assumed that
all apertures had zero thickness, which is reasonable if the transverse
dimensions are much larger than the depth. However, if the depth of
the aperture is significant compared to the width, as in Fig. 11.9, the
slit begins to behave like a guide and the equations for Fresnel and
Fraunhofer diffraction give poor results. In this section, we model the
wave nature of an electric field amplitude within a slit, and show how the
angular spectrum can be represented as a discrete sum of modes, each
with a distribution of wave vectors. In turn, the momentum distribution
and the far-field diffraction pattern depends on the allowed modes and
their respective wave vector distributions. This analysis of single-slit
diffraction in terms of modes provides a stepping stone to the treatment
of wave-guide modes.
Fig. 11.9 The change in the diffraction pattern arising from the depth of the slit. For small depth (top image), the intensity pattern is similar to the result obtained in Chapter 5, whereas for a deep slit (lower image) the central fringe is wider and there are fewer fringes. Both plots were calculated using eqn (6.29).

As the Helmholtz equation in optics is the same as the time-independent
Schrödinger equation, finding the light field with a particular
boundary condition is the same as solving the quantum
particle-in-a-box problem. The edges of the slit impose a boundary condition
on the field. Up to now we have assumed that the boundary condition is
E = 0 at the edges of the slit, |x| = a/2, but for most materials there is
some penetration of the field inside the medium. For example, inside a
metal, the field decays exponentially over a distance known as the skin
depth, δ, which is typically an order of magnitude smaller than the
wavelength, see Chapter 13. Similarly to the particle-in-a-box problem,
we can write the wave solution inside a metallic slit as a sum of modes
with angular spatial frequency kx, where
$$ E_k(x) = \begin{cases} A_k \cos k_x x & |x| < a/2 \\ B_k\, e^{-|x|/\delta} & |x| \geq a/2 \end{cases} . \qquad (11.24) $$
Ak and Bk are amplitudes to be determined, and δ is the skin depth.
According to Maxwell’s equations, the field and the field gradient must
be continuous at |x| = a/2, which gives
$$ A_k \cos(k_x a/2) = B_k\, e^{-a/2\delta} , $$
$$ k_x A_k \sin(k_x a/2) = \frac{1}{\delta}\, B_k\, e^{-a/2\delta} . $$
Taking the ratio we have
$$ \tan(k_x a/2) = \frac{1}{k_x \delta} . \qquad (11.25) $$
We plot these two functions in Fig. 11.10(i). Their intercept gives the
spatial frequencies u = kx /2π of the allowed modes. Next, we rewrite
the input distribution (a rect function) as a sum of these allowed modes.
The amplitude of each mode—equivalent to the angular spectrum—can
be evaluated from its overlap with the rect function,
$$ A_k = \int_{-a/2}^{a/2} \cos k_x x \, \mathrm{d}x = a\, \mathrm{sinc}(k_x a/2) . \qquad (11.26) $$
In Fig. 11.10(ii) we show the sum of allowed modes, which is similar to,
but not a perfect reproduction of, a rect function, partly because the
maximum angular spatial frequency is capped at kmax = 2π/λ. Using
the modified light distribution we can calculate the far-field intensity
pattern, as shown in Fig. 11.10(iii). The difference to the sinc-squared
distribution is small, but noticeable. If we reduce the slit width the
discrepancy becomes larger as fewer modes are allowed. Worth noting is
that although now we have a discrete spectrum of kx values, each one is
associated with a mode with finite spatial extent, so has a corresponding
spread in kx .
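The mode condition, eqn (11.25), can be solved numerically. The following is a minimal sketch, not the method used to produce Fig. 11.10: it assumes a = 6λ (as in the figure caption) and a skin depth δ = λ/10 (an illustrative value, chosen only because the text says δ is typically an order of magnitude smaller than the wavelength). The function name `allowed_kx` and the bisection approach are our own choices.

```python
import math

def allowed_kx(a, delta, wavelength, iterations=200):
    """Roots of tan(kx*a/2) = 1/(kx*delta) with 0 < kx <= 2*pi/wavelength."""
    k_max = 2 * math.pi / wavelength

    def g(kx):
        # Pole-free form of eqn (11.25): kx*delta*sin(kx*a/2) - cos(kx*a/2) = 0
        return kx * delta * math.sin(kx * a / 2) - math.cos(kx * a / 2)

    roots = []
    n = 0
    # On each branch kx*a/2 in (n*pi, n*pi + pi/2) the tangent rises from 0 to
    # infinity while 1/(kx*delta) falls, so there is exactly one crossing.
    while 2 * n * math.pi / a < k_max:
        lo = 2 * n * math.pi / a
        hi = (2 * n + 1) * math.pi / a
        sign = 1 if n % 2 == 0 else -1  # sign*g is negative at lo, positive at hi
        for _ in range(iterations):
            mid = 0.5 * (lo + hi)
            if sign * g(mid) < 0:
                lo = mid
            else:
                hi = mid
        root = 0.5 * (lo + hi)
        if root <= k_max:  # discard modes above the cap k_max = 2*pi/lambda
            roots.append(root)
        n += 1
    return roots

# Lengths in units of the wavelength (assumed values, see above)
modes = allowed_kx(a=6.0, delta=0.1, wavelength=1.0)
for kx in modes:
    print(f"u = kx/2pi = {kx / (2 * math.pi):.4f}")
```

In the hard-wall limit δ → 0 the roots tend to kx = (2n + 1)π/a, the familiar particle-in-a-box values; a finite skin depth shifts each root slightly lower.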
Fig. 11.10 (i) The spatial frequencies, u, of allowed modes inside a slit with width a = 6λ. (ii) The spatial field pattern produced by the allowed modes. (iii) The resultant angular spectrum (and hence momentum distribution) compared to a sinc-squared distribution (grey).
This idea of light modes within a region of space becomes particularly

important in the context of wave guides such as optical fibres. In this
case, the light is confined to a transverse size not much larger than
the wavelength and the number of allowed transverse modes is reduced
to of order one. Whereas previously we have treated a situation with
confinement in only one transverse dimension, next, we focus on the
case of cylindrical symmetry which is particularly relevant to the case
of optical fibres.
11.7 A cylindrical light guide

The chain of lenses illustrated schematically in Fig. 11.8 produces a
beam of light whose size remains approximately the same. We could imagine
bringing the lenses closer to form a continuous guide; however, as the
lenses rely on their thickness variation to achieve a focusing effect, this
would not work. Instead, imagine a lens where rather than the glass
thickness being a function of transverse displacement, the thickness is
kept constant and the refractive index is a function of the transverse
displacement. This is known as a graded-index (GRIN) lens.8
Starting with the sequence of graded-index lenses, shown in Fig. 11.11,
we can increase their thickness until they merge into a continuous
medium known as a graded-index optical fibre. Although most
optical fibres use a step-index, see Section 11.8, the principle is the
same, and we shall consider the graded-index case first.
Fig. 11.11 Discrete model of light guiding where diffraction in free space (white regions) is countered by refocusing in media with spatially varying index (grey regions). The spatial extent of the light mode is indicated by the black lines.
8 In a classical picture, the lens exerts a transverse force proportional to the displacement from the optical axis, similar to the case of harmonic motion, and a GRIN lens is analogous to a parabolic potential well for photons.
Consider what happens when a gaussian beam propagates a small
distance δz inside a medium with uniform refractive index n0 . From
eqn (11.14), the field a distance $\delta z \ll z_R$ downstream of the beam waist
is
$$E(\delta z) = E_0\, e^{i n_0 k \delta z}\, e^{i n_0 k \rho^2 / 2R}\, e^{-\rho^2 / w_0^2},$$
where we have neglected the variation of the beam radius, $w \approx w_0 [1 + (\delta z)^2 / 2 z_R^2]$, because it is second order in δz. For $\delta z \ll z_R$, the radius of
curvature
$$R = \delta z + z_R^2 / \delta z \approx z_R^2 / \delta z,$$
and
$$E(\delta z) = E_0\, e^{i n_0 k \delta z}\, e^{i n_0 k \rho^2 \delta z / 2 z_R^2}\, e^{-\rho^2 / w_0^2}. \qquad (11.27)$$
Fig. 11.12 Zoom-in on the discrete light guide of Fig. 11.11. In the limit, δz → 0, this corresponds to a graded-index fibre.
In order to preserve the spatial distribution, we need to cancel the
wave-front distortion contained in the $e^{i n_0 k \rho^2 \delta z / 2 z_R^2}$ term. This is
achieved by imprinting an appropriate parabolic phase as in the thin-lens
approximation. For a graded-index lens, the refractive index decreases
quadratically with displacement from the optical axis, i.e.
$$n = n_0 - \Delta n \frac{\rho^2}{w_0^2}, \qquad (11.28)$$
such that the refractive index is n0 on-axis and decreases by an amount
Δn over a transverse distance w0 . Substituting this index profile into
the gaussian beam propagation equation (neglecting terms above second
order in ρ), we find

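$$E(\delta z) = E_0\, e^{i n_0 k \delta z} \exp\left(-i \frac{\Delta n}{w_0^2} k \delta z \rho^2\right) \exp\left(i \frac{n_0}{2 z_R^2} k \delta z \rho^2\right) e^{-\rho^2 / w_0^2}.$$
It follows that we can cancel the wave-front curvature by choosing
$$\frac{n_0}{2 z_R^2} = \frac{\Delta n}{w_0^2}, \qquad (11.29)$$
as seen in Fig. 11.12. Substituting for the Rayleigh range inside a
medium with refractive index n0 , $z_R = n_0 k w_0^2 / 2$, we find that
$$\Delta n = \frac{2}{n_0 k^2 w_0^2}. \qquad (11.30)$$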
This equation allows us to design the parameters of the fibre in order

to guide a mode with a particular transverse size w0 . For w0 = 3λ with
n0 = 1.5, eqn (11.30) gives Δn ≈ 4 × 10−3 , i.e. an index variation of about four
parts in a thousand is sufficient to confine a gaussian beam. One thing
that we learn from this result is that relatively small index variations
are sufficient to cancel diffraction.
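As a quick numerical check of eqn (11.30), the sketch below evaluates the required index change for a few mode sizes. It assumes that k = 2π/λ is the vacuum wavenumber; the function name `grin_index_step` and the chosen sample values are ours.

```python
import math

def grin_index_step(w0_over_lambda, n0):
    """Eqn (11.30): dn = 2 / (n0 k^2 w0^2), assuming k = 2*pi/lambda in vacuum."""
    k_w0 = 2 * math.pi * w0_over_lambda  # dimensionless product k*w0
    return 2.0 / (n0 * k_w0 ** 2)

# Required index change for several mode sizes in glass with n0 = 1.5
for ratio in (2, 3, 5, 10):
    print(f"w0 = {ratio:>2} lambda: dn = {grin_index_step(ratio, 1.5):.2e}")
```

Because Δn scales as 1/w0², doubling the mode size reduces the required index variation by a factor of four.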
This analysis treats light guiding as a propagation effect where
diffraction is cancelled by imprinting a spatially varying phase. In
Example 11.3, we solve the wave equation directly, and make use of the
similarity between the Helmholtz equation for light inside a waveguide
and the Schrödinger equation for a particle confined by a potential.
Although the two approaches give the same answer, they provide quite
different insight: either light guiding as a propagation effect where phase
imprinting cancels diffraction; or light guiding as a bound eigenstate in
a potential well.
Example 11.3
GRIN fibre revisited: In this example, we find the mode inside a graded-index fibre
by solving the Helmholtz equation, eqn (1.40). Inside a medium with refractive index
n, ∇2 E + n2 k2 E = 0, and for a propagating mode of the form E(x, y, z) = E(x, y)eiβz ,
we obtain
$$\frac{\partial^2 E}{\partial x^2} + \frac{\partial^2 E}{\partial y^2} + (n^2 k^2 - \beta^2) E = 0, \qquad (11.31)$$
where β is known as the propagation constant. The parabolic index variation in a
graded-index fibre has the convenient property of cartesian separability allowing us
to separate the Helmholtz equations for each transverse dimension. For x, we have

$$\frac{\partial^2 E}{\partial x^2} + \left(n_0^2 k^2 - \beta^2 - 2 n_0 \Delta n \frac{k^2 x^2}{w_0^2}\right) E = 0, \qquad (11.32)$$
where we have used eqn (11.28) to put $n^2 \approx n_0^2 - 2 n_0 \Delta n\, x^2 / w_0^2$. The corresponding
particle-in-a-box system is the harmonic oscillator where the Schrödinger equation
for the ground state may be written as9
$$\frac{\partial^2 \psi}{\partial x^2} + \left(\frac{1}{a_0^2} - \frac{x^2}{a_0^4}\right) \psi = 0, \qquad (11.33)$$
where $a_0 = [\hbar / (m \omega_{\rm osc})]^{1/2}$ is the harmonic oscillator length. The solution to
this equation is a gaussian with a $1/\sqrt{e}$ width equal to a0 :
$$\psi = \frac{1}{(2 \pi a_0^2)^{1/4}} \exp\left(-\frac{x^2}{2 a_0^2}\right). \qquad (11.34)$$
Comparing the x2 terms in Helmholtz and Schrödinger equations, (11.32) and (11.33),
gives10
$$\Delta n = \frac{2}{n_0 k^2 w_0^2}, \qquad (11.35)$$
which is the same result we found by imprinting a parabolic phase.
9 This is derived from
$$-\frac{\hbar^2}{2m} \frac{\partial^2 \psi}{\partial x^2} + V \psi = E \psi,$$
where the confining potential is $V = \frac{1}{2} m \omega_{\rm osc}^2 x^2$, $\omega_{\rm osc}$ is the oscillation frequency, and the ground-state energy is $E = \frac{1}{2} \hbar \omega_{\rm osc}$.
10 Equating the x2 terms gives $2 n_0 \Delta n (k^2 / w_0^2) = 1 / a_0^4$, and using $w_0 = \sqrt{2} a_0$, we get $2 n_0 \Delta n (k^2 / w_0^2) = 4 / w_0^4$.
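The claim that the gaussian of eqn (11.34) solves eqn (11.33) can be checked numerically. This is a small sketch under our own choices (the function names, the finite-difference step and the sample points are not from the text): it evaluates the left-hand side of eqn (11.33) with a central second difference and confirms that the residual is negligible.

```python
import math

def psi(x, a0):
    """Ground-state gaussian of eqn (11.34)."""
    return math.exp(-x ** 2 / (2 * a0 ** 2)) / (2 * math.pi * a0 ** 2) ** 0.25

def residual(x, a0, h=1e-4):
    """Left-hand side of eqn (11.33), using a central second difference."""
    d2psi = (psi(x + h, a0) - 2 * psi(x, a0) + psi(x - h, a0)) / h ** 2
    return d2psi + (1 / a0 ** 2 - x ** 2 / a0 ** 4) * psi(x, a0)

a0 = 1.0  # oscillator length in arbitrary units
for x in (0.0, 0.5, 1.0, 2.0):
    print(f"x = {x:.1f}: residual = {residual(x, a0):.1e}")
```

The residual is limited only by the finite-difference step, which is consistent with the gaussian being an exact solution.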
11.8 Step-index fibre

Although we used the example of a GRIN fibre to illustrate light guiding,
most optical fibres use a step index, where there is a sudden change
in the refractive index at a radius a. The inner region is known as the
core and has an index n1 , and the outer region ρ > a is known as the
cladding and has an index n2 with n2 < n1 .11 In this case, we would
write the index profile as
$$n^2 = n_1^2 - (n_1^2 - n_2^2)\, \mathrm{circ}\left(\frac{\rho^2}{a^2}\right).$$
The guiding works exactly the same as the GRIN fibre, where the higher
index on-axis produces an effective refocusing to counteract diffraction.
However, mathematically the solution is slightly more cumbersome
because the circ function is not cartesian separable, and we need to
solve the Helmholtz equation in cylindrical coordinates. Assuming a
cylindrical mode of the form $E(\rho, \phi, z) = E(\rho)\, e^{i m \phi} e^{i \beta z}$, the Helmholtz
equation becomes
$$\frac{\partial^2 E}{\partial \rho^2} + \frac{1}{\rho} \frac{\partial E}{\partial \rho} + \left(n^2 k^2 - \beta^2 - \frac{m^2}{\rho^2}\right) E = 0, \qquad (11.36)$$
where m describes a phase winding. Solutions to this equation, like
other cylindrical symmetry problems such as the modes on a drum, are
Bessel functions. Re-writing the Helmholtz equation in the form
$$\frac{\partial^2 E}{\partial \rho^2} + \frac{1}{\rho} \frac{\partial E}{\partial \rho} + \kappa^2 E = 0, \qquad (11.37)$$
where for the fundamental m = 0 mode, κ has a constant value in the
core and cladding,
$$\kappa^2 = \begin{cases} n_1^2 k^2 - \beta^2, & \rho < a, \\ n_2^2 k^2 - \beta^2, & \rho \ge a. \end{cases} \qquad (11.38)$$
For convenience we introduce the dimensionless real propagation
constants
$$u^2 = a^2 (n_1^2 k^2 - \beta^2) \quad \text{and} \quad w^2 = a^2 (\beta^2 - n_2^2 k^2),$$
corresponding to the core and cladding, respectively. In terms of these
parameters the solution is
$$\frac{E(\rho)}{E_0} = \begin{cases} J_0(u \rho / a), & \rho < a, \\ \zeta K_0(w \rho / a), & \rho \ge a, \end{cases} \qquad (11.39)$$
where E0 is the maximum field on-axis and ζ = J0 (u)/K0 (w) is a
constant that matches the amplitudes at the interface between the core
and the cladding. The mode functions can be plotted using the built-in
special functions provided in scientific software such as scientific
python. In Fig. 11.13 we plot this fundamental, m = 0 mode, for
a step-index fibre together with the gaussian mode of a GRIN fibre.
Fig. 11.13 The fundamental mode (top) and index variation (bottom) in a step-index (black) and GRIN fibre (grey), respectively. The mode in a GRIN fibre is a gaussian.
11 Although light is guided inside a region with higher refractive index, it is possible to design a fibre where the core is hollow (and hence has a lower index than the surrounding region) by using periodic structures to make the propagation constant in the cladding imaginary, see Exercise 11.15 on photonic crystal fibres. The idea of using a periodic structure to filter or reflect light is known as a photonic bandgap, and exploits the same multiple-path interference effects that we explored in Chapter 3.

The Bessel function solution is close to a cosine inside the core and an
exponential decay in the cladding. The mode profile is similar to the
gaussian and a gaussian beam is typically a good approximation for
describing light emerging from a step-index fibre, as is apparent in the
top plot of Fig. 11.13.
11.9 Fibre modes

In this section, we consider whether an optical fibre can support one
or more modes. We expand on the idea of fitting waves into ‘boxes’
(and hence the analogy between light and matter waves) by borrowing
an idea from quantum physics, known as the WKB approximation.12
Figure 11.13 illustrates the fitting-waves-into-boxes concept. In both
optics and quantum theory the wave may penetrate the walls of the
box.13 To find the full wave solution we need to match the oscillatory
and damped solutions across the boundary. Below we use a WKB ansatz
to derive some approximate criteria in order to design a single-mode
step-index fibre that supports one, and only one, mode. Such fibres are
widely used in optical communications.
The WKB ansatz is that we can write a general wave solution in one
dimension in the form $E = E_0 e^{i \phi(\rho)}$, where φ(ρ) is a spatially dependent
phase. For the special case, φ(x) = kx x, we have plane waves. WKB goes
beyond plane waves by allowing any spatial form of φ(x). Substituting
the WKB ansatz into the Helmholtz equation with m = 0 and equating
real parts, we find
$$-\left(\frac{\partial \phi}{\partial x}\right)^2 + \kappa^2 = 0. \qquad (11.40)$$
12 Named after Gregor Wenzel (Oldenburg 1898–Ascona 1978), Hendrik Kramers (Rotterdam 1894–Oegstgeest 1952), and Leon Nicholas Brillouin (Sevres 1889–New York 1969). A full WKB analysis of fibre modes is given in Hartog et al. (1977).
13 In quantum physics we say that the particle can tunnel into the classically forbidden region, and walls or boundaries are not ‘hard’ in the sense that all motion stops there. A classical analogue is a rope with one end in water where the oscillations are strongly damped.
We can solve this equation for φ by integrating, but the solution depends
critically on the sign of κ2 which changes from positive to negative in
going from the core to the cladding. In the core ρ < a, where κ is real,
we have
$$\phi(\rho) = \int_0^{\rho} \kappa \, d\rho, \qquad (11.41)$$
whereas for ρ > a, κ is imaginary and instead we write
$$\phi(\rho) = i \int_a^{\rho} |\kappa| \, d\rho. \qquad (11.42)$$
Substituting these integrals into the WKB ansatz, we get14
$$E(\rho) = \frac{C_{\rm core}}{\sqrt{\kappa}} \exp\left(i \int_0^{\rho} \kappa \, d\rho\right), \qquad (11.43)$$
and
$$E(\rho) = \frac{C_{\rm clad}}{\sqrt{|\kappa|}} \exp\left(-\int_a^{\rho} |\kappa| \, d\rho\right), \qquad (11.44)$$
in the core and cladding, respectively, where $C_{\rm core, clad}$ are amplitudes to
be determined, and the factor of $1/\sqrt{\kappa}$ is required in order to normalize
the solution. The two solutions must match at the boundary: if we
continue the damped solution into the core by substituting |κ| = iκ, we
find that $C_{\rm core} = C_{\rm clad} / \sqrt{i} = C_{\rm clad}\, e^{-i \pi / 4}$, so defining $C_{\rm clad} = C$ we find
that in the core,
$$E(x) = \frac{C}{\sqrt{\kappa}} \exp\left[i \left(\int_x^a |\kappa| \, d\rho - \frac{\pi}{4}\right)\right]. \qquad (11.45)$$
14 The solution changes from oscillatory inside the core, where κ2 is positive, to an exponential decay in the cladding, where κ2 is negative. This is exactly the same as in the finite square well problem encountered in quantum mechanics.
This result is known as the connection formula in the WKB method.
One can interpret the factor of π/4 as a phase shift that arises due to
the penetration into the cladding.
For the lowest order, m = 0 mode, the integrals are relatively
straightforward. For the core we find
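$$\int_0^x \kappa \, d\rho = \int_0^x (n_1^2 k^2 - \beta^2)^{1/2} \, dx = \int_0^x \frac{u}{a} \, dx = \frac{u}{a} x, \qquad (11.46)$$
and in the cladding we get
$$\int_a^{\rho} \kappa \, d\rho = \int_a^{\rho} (\beta^2 - n_2^2 k^2)^{1/2} \, d\rho = \int_a^{\rho} \frac{w}{a} \, d\rho = \frac{w}{a} (\rho - a). \qquad (11.47)$$
So the complete solution is
$$E(\rho) = \begin{cases} A \cos(u \rho / a - \pi / 4), & \rho < a, \\ B \exp[-w (\rho - a) / a], & \rho > a. \end{cases} \qquad (11.48)$$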
The solution in the core is not particularly accurate because of the π/4
correction that arose from the connection formula; however, the clever
property of the WKB approximation is that it still predicts accurate
eigenenergies, and hence propagation parameters, even though the wave
functions are less precise.
To find the constants A and B we match the amplitude and gradient
at ρ = a, which gives
$$A \cos\left(u - \frac{\pi}{4}\right) = B \quad \text{and} \quad -\frac{u}{a} A \sin\left(u - \frac{\pi}{4}\right) = -\frac{w}{a} B,$$
respectively. Taking the ratio of these two equations, we get
$$\tan\left(u - \frac{\pi}{4}\right) = \frac{w}{u},$$
which we rewrite in the form
$$\cos\left(u - \frac{\pi}{4}\right) = \frac{u}{V}, \qquad (11.49)$$
where
$$V = (u^2 + w^2)^{1/2} = k a (n_1^2 - n_2^2)^{1/2}, \qquad (11.50)$$
is known as the V -number of the fibre. The V -number is proportional
to the core radius and index step and inversely proportional to the
optical wavelength. The solution to this equation is plotted as a b-V
plot, where $b^2 = 1 - u^2 / V^2$, in Fig. 11.14. These solutions allow us
to determine the propagation constant for a particular index step and
core radius. The WKB connection formula leads to the result that there
is no solution for V < π/4. This determines the long-wavelength cut-off
for single-mode operation of the fibre. In practice, the propagation of
longer wavelengths, smaller V -number, is limited by tunneling into the
cladding which introduces larger bend losses, as we will discuss further.
Also, in Fig. 11.14 we plot the normalized propagation constant derived
from the full Bessel function solutions.15 To find out whether the fibre
will be single mode we need to see if there are any higher-order solutions.
This is achieved by repeating the analysis for the higher-order modes.
Fig. 11.14 A b-V plot for the first two modes in a step-index fibre. The upper solid line is the solution to eqn (11.49) derived using the WKB approximation. The full Bessel function solution is indicated by the dots. As the wavelength is reduced, V increases, and it becomes possible to squeeze a higher order, m = 1 mode into the guide, as indicated by the lower solid line. The shaded region indicates the range of V , and hence the range of wavelengths for which the fibre is single mode.
15 The WKB approximation does a remarkably good job even though the mode function given by eqn (11.48) is not that accurate.
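Eqn (11.49) has no closed-form solution, but it is straightforward to solve by bisection. The sketch below is one possible approach rather than the procedure used for Fig. 11.14. It assumes that the physical root lies in π/4 ≤ u ≤ min(V, 3π/4), which is our reading of the requirement tan(u − π/4) = w/u ≥ 0 together with u ≤ V; the function names are ours.

```python
import math

def wkb_u(V, iterations=200):
    """Solve cos(u - pi/4) = u/V for the m = 0 mode; None if V < pi/4."""
    if V < math.pi / 4:
        return None  # no WKB solution below the cut-off quoted in the text

    def f(u):
        return math.cos(u - math.pi / 4) - u / V

    # f decreases monotonically on this bracket: f(lo) >= 0 and f(hi) <= 0
    lo, hi = math.pi / 4, min(V, 3 * math.pi / 4)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def normalized_b(V):
    """Normalized propagation constant b, with b^2 = 1 - u^2/V^2."""
    u = wkb_u(V)
    if u is None:
        return None
    return math.sqrt(max(0.0, 1.0 - (u / V) ** 2))

for V in (1.0, 1.5, 2.0, 2.4):
    print(f"V = {V:.1f}: u = {wkb_u(V):.4f}, b = {normalized_b(V):.4f}")
```

Once u is known for a given V, the cladding parameter follows from w² = V² − u², and the propagation constant from β² = n1²k² − u²/a².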
Example 11.4
Higher-order modes: For the m = 1 mode in eqn (11.36), the integral in the core
becomes
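$$\int_0^{\rho} \kappa \, d\rho = \int_0^{\rho} \left(n_1^2 k^2 - \beta^2 - \frac{1}{\rho^2}\right)^{1/2} d\rho = \left(\frac{u^2 \rho^2}{a^2} - 1\right)^{1/2} - \cos^{-1}\left(\frac{a}{u \rho}\right),$$
where use has been made of the standard integral
$$\int \frac{(\xi^2 - 1)^{1/2}}{\xi} \, d\xi = (\xi^2 - 1)^{1/2} - \cos^{-1}\left(\frac{1}{\xi}\right).$$
In the cladding we get
$$\int_a^{\rho} \kappa \, d\rho = \left(\frac{w^2 \rho^2}{a^2} + 1\right)^{1/2} - (w^2 + 1)^{1/2} - \ln\left[\frac{1 + (w^2 \rho^2 / a^2 + 1)^{1/2}}{1 + (w^2 + 1)^{1/2}} \frac{a}{\rho}\right],$$
which uses another standard integral
$$\int \frac{(\xi^2 + \alpha^2)^{1/2}}{\xi} \, d\xi = (\xi^2 + \alpha^2)^{1/2} - \alpha \ln\left[\frac{\alpha + (\xi^2 + \alpha^2)^{1/2}}{\xi}\right].$$
So the complete solution is
$$E(\rho) = A \cos\left[\left(\frac{u^2 \rho^2}{a^2} - 1\right)^{1/2} - \cos^{-1}\left(\frac{a}{u \rho}\right) - \frac{\pi}{4}\right],$$
$$E(\rho) = B\, \frac{1 + \sqrt{w^2 \rho^2 / a^2 + 1}}{1 + \sqrt{w^2 + 1}}\, \frac{a}{\rho} \exp\left[\sqrt{w^2 + 1} - \sqrt{\frac{w^2 \rho^2}{a^2} + 1}\right],$$
for ρ < a and ρ > a, respectively. Again matching the field and gradient across the
boundary we find
$$\cos\left[(u^2 - 1)^{1/2} - \cos^{-1}\left(\frac{1}{u}\right) - \frac{\pi}{4}\right] = \frac{(u^2 - 1)^{1/2}}{V}. \qquad (11.51)$$
Fig. 11.15 The fundamental mode for different values of the mode parameter V (black curve). The gaussian mode of a graded-index fibre is shown in grey for comparison. For small V the mode spreads out far into the cladding. For large V the mode is strongly localized in the core, but when V > 2.405 higher-order modes can also propagate in the fibre.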
We plot the solution to eqn (11.51) together with the solution for the
fundamental (m = 0) mode, eqn (11.49), in Fig. 11.14, and indicate the
range where the fibre is single mode. As V is inversely proportional to
the wavelength, the plot shows us that if the wavelength is too short
then V is large and the fibre becomes multimode. At the other extreme,
if the wavelength is too long, then it does not support any propagating
modes. Consequently any fibre with a particular core radius and index
step will have a finite range of wavelengths over which it is single mode.
For b close to zero, u ≈ V and w ≈ 0 and the mode decays only slowly
into the cladding. As V increases—which happens either if we increase
the core radius a or reduce the wavelength λ—both b and w increase
and the mode becomes more localized in the core. This is illustrated in
Fig. 11.15, where we plot the modes for three values of V . Eventually
as we increase V further, we can fit another mode into the core. Single-
mode fibres are typically designed to operate just below the single mode
limit, V = 2.405, where the mode is well localized within the core. This
minimizes losses due to variations in the core radius.
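The single-mode criterion can be turned into a wavelength window using eqn (11.50). The sketch below takes the limits quoted in the text (the WKB lower bound V = π/4 and the upper bound V = 2.405); the core radius and indices in the usage example are illustrative values of our own choosing, not data for any particular fibre.

```python
import math

def v_number(wavelength, a, n1, n2):
    """Eqn (11.50): V = k a (n1^2 - n2^2)^(1/2), with k = 2*pi/wavelength."""
    return 2 * math.pi / wavelength * a * math.sqrt(n1 ** 2 - n2 ** 2)

def single_mode_window(a, n1, n2, v_min=math.pi / 4, v_max=2.405):
    """Wavelength range (shortest, longest) for which v_min < V < v_max."""
    scale = 2 * math.pi * a * math.sqrt(n1 ** 2 - n2 ** 2)
    return scale / v_max, scale / v_min

# Illustrative (hypothetical) fibre: 4 micron core radius, small index step
a, n1, n2 = 4.0e-6, 1.450, 1.445
lam_short, lam_long = single_mode_window(a, n1, n2)
print(f"single mode for {lam_short * 1e6:.2f} um < lambda < {lam_long * 1e6:.2f} um")
print(f"V at 1.55 um: {v_number(1.55e-6, a, n1, n2):.2f}")
```

Because V depends only on the product of core radius and the square root of n1² − n2², a smaller index step allows a larger core for the same single-mode window.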
Chapter summary
• The lowest-order modes emitted by a laser are gaussian beams

that are solutions of the paraxial wave equation.
• As a laser beam propagates away from a focus, the beam radius
increases and the wave fronts become curved.
• The Rayleigh range of a gaussian beam of waist w0 and
wavelength λ is zR = πw02 /λ.
• The beam radius, w, evolves with distance from a focus as $w = w_0 (1 + z^2 / z_R^2)^{1/2}$.
• The wave-front curvature, R, evolves with distance from a focus
as $R = z + z_R^2 / z$.
• The far-field diffraction angle is Δθ = λ/πw0 .
• Laser cavities are designed by matching the wave-front curvature
to the mirror curvature.
• An optical fibre imprints a phase that cancels diffraction, leading
to a guided light mode.
• The modes of a fibre or waveguide are analogous to the wave
function for the quantum particle-in-a-box problem.
• The propagation modes in a standard step-index optical fibre
are determined by three parameters: the optical wavelength, λ, the
core radius, a, and the index step, n1 −n2 . These three parameters
are combined in a dimensionless quantity known as the V -number
of the fibre.
• For π/4 < V < 2.4 the fibre is single mode.
Exercises
(11.1) Laser beams and spherical waves
Write an equation for a gaussian beam in terms of the complex beam parameter, q = z − izR . Rewrite this equation in the far field, z ≫ zR , in terms of the cylindrical coordinates (ρ, z). How does this compare to a paraxial spherical wave? How do the amplitudes compare? What does this suggest about the effective size of a source of a spherical wave?
(11.2) Laser beam: on-axis intensity (1)
How does the on-axis intensity of a laser beam scale with the beam waist w0 ? Explain the physical origin of the power law.
(11.3) Laser beam: on-axis intensity (2)
Show that the probability of detecting a photon along the propagation axis of a laser beam is described by a Lorentzian (or Cauchy–Lorentz) distribution.
(11.4) Wave-front curvature
For a gaussian beam with waist in the z = 0 plane plot the wave-front curvature, R, as a function of distance. For the abscissa use the scaled variable z/zR . Verify mathematically that the wave-front curvature attains its minimum value for z = zR .
(11.5) Gouy phase
For a gaussian beam with waist in the z = 0 plane plot the Gouy phase as a function of distance. For the abscissa use the scaled variable z/zR . Verify mathematically that the Gouy phase changes by π in traversing the plane containing the beam’s waist.
(11.6) Wave-front curvature at the waist
Verify that the wave fronts of a gaussian beam are flat at the waist. Does this mean that the beam can be considered to be a plane wave at this location?
(11.7) Characterizing a gaussian beam
In addition to the wavelength, how many other independent parameters are required to characterize a gaussian beam?
(11.8) Rayleigh range
What is the Rayleigh range of a laser of wavelength λ = 633 nm of waist 0.250 mm? What is the size of the beam after it has propagated 500 m?
(11.9) Beam expansion (1)
A red laser beam is intended for use in a survey theodolite. If it has a waist size of 1 mm, show that the beam ‘remains parallel’ over a distance of approximately 5 m. Think of a suitable definition for a diffracting beam to remain parallel. Typically a beam-expanding telescope is used on the output of a survey laser. If the beam waist is expanded to be 25 mm, over what distance will the beam remain parallel? Which has the largest spot size on the moon: a He–Ne laser with waist 1 mm, or a beam expanded in a telescope to be 1 m? [Hint: Distance to moon = 3.8 × 10^5 km.]
(11.10) Beam expansion (2)
Calculate how many Rayleigh ranges a laser beam has to propagate before the central intensity becomes 50 times weaker than its initial value.
(11.11) Focal shift of a focused laser
A laser beam with a waist w0 = w1 is incident on a lens with focal length f in the z = 0 plane.
(a) What is the wave-front curvature of the incident beam in the z = 0 plane?
(b) What is the wave-front curvature, R, immediately after the lens?
(c) The new waist is formed in a plane at z = z2 . Use your equation for R to write an expression for z2 in terms of f and the Rayleigh range of the focused beam, zR2 . Hence find an expression for the focal shift, zR2 − f .
(d) Use the complex beam parameter, eqn (11.16), to show that $z_2 = f - f^3 / z_{R1}^2$, where zR1 is the Rayleigh range of the input beam.
(e) Using w2 = f λ/(πw1 ), show that the expressions for z2 obtained in (c) and (d) are the same.
(11.12) Laser cavities
Write an equation for the radius of curvature, R, of a laser beam with waist size w0 in terms of the propagation distance, z. Define any additional quantities used.
Find the distance at which the radius of curvature is a minimum, i.e. the wave front is most curved. What is the radius of curvature, R, at this distance?
A laser cavity of length L consists of two mirrors with radius of curvature Rm . By matching wave-front curvature to the mirror curvature, derive an expression for the beam waist inside the cavity and find the maximum value of L that will give a stable cavity.
(11.13) Minimizing beam size
Write an equation for the radius, w, of a laser beam with waist size, w0 , as a function of propagation distance, z. Define any quantities used.
In an optically pumped solid-state laser, it is found that the laser threshold is a minimum when the pump laser beam radius is minimized at both ends of the laser crystal. Find an expression for the optimal pump beam waist, w0 , that minimizes the threshold in terms of the length of the laser crystal and the wavelength of the pump laser. Compare the Rayleigh range of the optimal pump beam to the length of the crystal.
(11.14) Laser focusing
By equating real and imaginary parts in eqn (11.16) find expressions for the new waist position, z2 , and the new waist size, w2 , in terms of z1 and w1 .
(11.15) Project: photonic crystal fibre
Use eqn (6.29) to simulate the propagation of a gaussian mode through the hollow-core structure shown in Fig. 11.16, and see whether you can find solutions where the mode is confined within the lower index core. The cladding region is structured with regions of high and low index which imprint a spatially dependent phase that impedes transverse propagation.
Fig. 11.16 Simulation of light propagation through a hollow-core fibre using the hedgehog equation, eqn (6.29). Above we show the effective potential, Veff , and the propagation mode intensity I downstream. Note that in the particle-in-a-box analogy, the potential corresponds to an anti-trap.
Vector light fields 12
12.1 EM fields are not purely transverse
12.2 Beyond paraxial
12.3 Vector angular spectrum
Chapter summary
Exercises
In this chapter we focus on vector light fields. In Chapter 1 we
explained why it was an excellent approximation to use the scalar
wave equation to describe light fields in numerous circumstances. Here,
we shall quantify the conditions where the scalar approach fails, and
incorporate the vector nature of the fields into our formalism. We
emphasized in Chapter 2 that solutions such as plane waves and spherical
waves were excellent building blocks for construction of more complex
light fields, and in this chapter we take a further step by adding
many basis functions whose polarization vectors are not necessarily
parallel. As a consequence, we shall discover that the properties of
the superposition of waves do not have to be the same as the properties
of the individual components; specifically, the non-transverse nature of
light beams will feature prominently.
The structure of the chapter is as follows: we first distinguish
between realistic light beams and the basis functions studied in earlier
chapters; the vector-angular-spectrum method is introduced, and
applied to a gaussian beam to provide results beyond the paraxial
scalar approximation. A new set of polarization states—realizable with
physical light beams—is introduced that go beyond the linear and
circular states discussed in Chapter 4. Finally, the form of the full vector
light field in the vicinity of the focus of tightly focused high-numerical-
aperture beams is discussed.
12.1 EM fields are not purely transverse

We start by considering a beam of light propagating along the z
direction, with the dominant polarization component along x, as
illustrated in the upper image in Fig. 12.1. In contrast to a plane wave,
the defining feature of a beam is that the transverse extent is finite, and
as a consequence the total energy carried by the wave is also finite. Let
f(x, y, z) be the form of a beam-like solution to the scalar wave equation.
We can attempt to convert this into a vector solution (recalling that the
beam was assumed to be polarized along the x direction) by writing the
following:
$$\mathbf{E} = E_x\, e^{i(kz - \omega t)}\, \hat{\mathbf{x}} = f(x, y, z)\, e^{i(kz - \omega t)}\, \hat{\mathbf{x}}.$$
On substituting this form into the Maxwell equation, ∇ · E = 0, we
find the condition ∂f/∂x = 0. Therefore, our trial solution is evidently
not compatible with Maxwell’s equation, as the beam, by definition,
has a finite extent, thus the gradient of the function can not be zero
everywhere. The full solution of Maxwell’s wave equation for fields
with finite spatial extent contains a component polarized parallel to the
propagation direction, as illustrated in the lower image in Fig. 12.1.
Fig. 12.1 Light is not a transverse wave! A localized light field has a field component parallel to the propagation direction. This example shows intensities associated with the x and z components of a laser beam near the waist in the xz plane.
Before attempting to resolve this issue mathematically, let us consider
what is happening physically. We have already made extensive use of
the concept that any function, f(x, y, z), can be written as a sum of
plane waves; the difference now is that as these plane waves will be
inclined at different angles, there will be components of their (transverse)
electric fields which will lie along the z axis. This is evident in
Fig. 12.2, where we see how the superposition of plane waves can lead
to regions where light is purely longitudinal, i.e. polarized parallel to
the ‘propagation direction’. Consequently, we should always expect that
beam solutions in vector diffraction theory will have components of the
electric field along the propagation direction, i.e. that optical beams
are not purely transverse waves. To see how this works mathematically,
let us write the form of the vector electric field for a beam as
$\mathbf{E} = [f(x, y, z)\, \hat{\mathbf{x}} + g(x, y, z)\, \hat{\mathbf{z}}]\, e^{i(kz - \omega t)}$. On substituting this modified form
into the Maxwell equation, ∇ · E = 0, we obtain
$$\frac{\partial f}{\partial x} + i k g + \frac{\partial g}{\partial z} = 0. \qquad (12.1)$$
Fig. 12.2 Two plane waves with amplitudes E1 and $E_2 = \sqrt{2} E_1$, and wave vectors $\mathbf{k}_1 = (k, 0, 0)$ and $\mathbf{k}_2 = (k/\sqrt{2}, 0, k/\sqrt{2})$. At a point (x, z) where the two waves are π out of phase, the total field, E1 + E2 , is purely axial, i.e. along z.
By using an integrating factor,1 this linear first-order differential
equation can be solved, giving the expression
$$g = -e^{-ikz} \int \frac{\partial f}{\partial x}\, e^{ikz} \, dz. \qquad (12.2)$$
We learn three things from eqn (12.2): (i) for the special case of a
plane wave, f(x, y, z) is constant and therefore the derivative is zero;
this is consistent with our earlier findings that there is no component
of the electric field along the propagation direction: plane waves are
transverse. (ii) For beams of light, the gradients of the field distribution
along the transverse polarization direction generate a longitudinal (z)
component of field. (iii) Certain beam shapes, such as a gaussian,
have zero gradient for all points along the line (x = 0, y = 0), and
as a consequence the field will be purely transverse along the z axis.
From the second point we also learn that narrow beams, where the
field distribution falls off quickly off-axis, will have large gradients with
concomitant larger non-transverse field values.2 It is not surprising that
a narrow beam, which is very different to a plane wave, does not share
one of the properties of a plane wave. From a Fourier perspective, to
make a narrow beam a broad range of plane waves at large angles will
have to be summed.
1 Linear first-order differential equations that can be expressed in the form dg/dz + P (z)g = Q(z) can be solved by multiplying by an integrating factor, given by $\exp(\int^z P(z')\, dz')$. This makes the left-hand side of the equation a total derivative, facilitating integration of both sides and giving a solution for g(z).
2 This also applies to electromagnetic fields in waveguides. In particular, for microwave fields in waveguides, where transverse confinement occurs over a range comparable to the wavelength, significant axial electric field components exist; see Yariv and Yeh (2007).
12.2 Beyond paraxial
From Section 12.1 we saw that the gaussian scalar solution is not
compatible with Maxwell’s equations, and in Section 12.3 we develop a
formalism to rectify this. There is, however, a half-way house where we

can write an analytic form for the longitudinal field component, within
a certain approximation. Even with the scalar wave equation we have
seen that it was much easier to solve within the paraxial approximation.
In this section we shall develop a formula for the (vector) electric field
for a paraxial gaussian beam, correct to first order in an expansion of a
small parameter.
Lax et al. (1975) expanded the solution of Maxwell’s equations in
a power series of a small parameter, given by 1/kw. Here k is the
magnitude of the wave vector, and w the characteristic size of the beam.3
Keeping the longitudinal component, but restricting the treatment to the
paraxial regime, we can recast eqn (12.2). To first order in 1/kw the
term ∂f/∂x is independent of z, and we can write g ≈ (i/k)∂f/∂x. We
can generalize the notation slightly, and write this result in the form,
$$E_z = \frac{i}{k} \nabla_T \cdot \mathbf{E}_T, \qquad (12.3)$$
where $\nabla_T$ is the transverse gradient operator, and $\mathbf{E}_T$ the (dominant)
transverse component of the field. Note the presence of the ‘i’ term,
which we have seen previously, and indicates that the longitudinal and
transverse components are temporally π/2 out of phase, see Fig. 12.3.
3 Recall that for, say, a typical laboratory He–Ne laser, w ∼ 1 mm, k ∼ 1 × 10^7 m^−1 , the parameter 1/kw ∼ 10^−4 , which is indeed small.
Fig. 12.3 The longitudinal, Ez, and transverse, Ex, electric-field
components for a laser beam in the vicinity of the waist. White is positive,
black is negative, and grey is zero. Note the phase change for Ez on either
side of the z axis. The intensity pattern is similar to Fig. 12.1.
In Chapter 11 we studied, at length, the gaussian-beam solution to
the paraxial scalar propagation equation, eqn (11.14). In light of the
last paragraph we can now write this as the x component of a vector
solution:
\[ \mathbf{E} = E_x \hat{\mathbf{x}} = E_0 \frac{w_0}{w} e^{-i\alpha} e^{ikz} e^{ik\rho^2/2R} e^{-\rho^2/w^2} \hat{\mathbf{x}} . \tag{12.4} \]
Using eqn (12.3) we can calculate the longitudinal component to be
\[ E_z = \frac{i}{k} \nabla \cdot \mathbf{E}_T = -\frac{2x}{w} \frac{E_x}{kw} \left( i + \frac{z}{z_R} \right) , \tag{12.5} \]
where we have also used eqn (11.10) and eqn (11.11). This equation can
also be written in the convenient form
\[ E_z = -\frac{x}{z - i z_R} E_x = -\frac{x}{q} E_x , \tag{12.6} \]
with q the complex beam parameter introduced for gaussian
beams in Chapter 11. We can now write the vector solution, to first
order in 1/kw, for the gaussian beam as
\[ \mathbf{E} = E_x \hat{\mathbf{x}} + E_z \hat{\mathbf{z}} = E_0 \frac{w_0}{w} e^{-i\alpha} e^{ikz} e^{ik\rho^2/2R} e^{-\rho^2/w^2} \left( \hat{\mathbf{x}} - \frac{x}{q} \hat{\mathbf{z}} \right) . \tag{12.7} \]
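As a numerical sanity check on eqns (12.3) to (12.7), the following minimal sketch evaluates the longitudinal component in two ways: from the closed form E_z = −(x/q)E_x, and from a finite-difference estimate of (i/k)∂E_x/∂x. It also estimates the ratio of the peak longitudinal to peak transverse amplitude in the waist plane, in units of 1/kw0. The function names, the He–Ne wavelength, the 1 mm waist and the test point are illustrative assumptions, and the scalar field is written in its equivalent complex-beam-parameter form with q = z − iz_R.

```python
import cmath
import math

def gaussian_ex(x, z, k, w0, e0=1.0):
    """Transverse field E_x in the y = 0 plane, written with q = z - i*z_R."""
    z_r = 0.5 * k * w0**2                      # Rayleigh range
    q = z - 1j * z_r                           # complex beam parameter
    return e0 * (z_r / (1j * q)) * cmath.exp(1j * k * z) * cmath.exp(1j * k * x**2 / (2 * q))

def ez_closed_form(x, z, k, w0):
    """Longitudinal field from E_z = -(x/q) E_x."""
    q = z - 1j * 0.5 * k * w0**2
    return -(x / q) * gaussian_ex(x, z, k, w0)

def ez_from_gradient(x, z, k, w0, h=1e-8):
    """Longitudinal field from E_z = (i/k) dE_x/dx, using a central difference."""
    slope = (gaussian_ex(x + h, z, k, w0) - gaussian_ex(x - h, z, k, w0)) / (2 * h)
    return (1j / k) * slope

wavelength = 633e-9            # assumed He-Ne wavelength (m)
w0 = 1e-3                      # assumed waist (m)
k = 2 * math.pi / wavelength

# Compare the two routes to E_z at an arbitrary off-axis point.
x_test, z_test = 0.4e-3, 2.0
direct = ez_closed_form(x_test, z_test, k, w0)
numeric = ez_from_gradient(x_test, z_test, k, w0)

# Peak |E_z| over peak |E_x| in the waist plane, in units of 1/(k w0).
xs = [i * 1e-6 for i in range(3001)]           # 0 to 3 w0 in 1 micron steps
peak_ez = max(abs(ez_closed_form(x, 0.0, k, w0)) for x in xs)
peak_ex = abs(gaussian_ex(0.0, 0.0, k, w0))
scaled_ratio = peak_ez / peak_ex * k * w0

print(abs(direct - numeric) / abs(direct))
print(scaled_ratio)
```

The two estimates of E_z should agree to within the finite-difference error, and the scaled ratio should lie close to √2 e^{−1/2}.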
Figure 12.3 shows plots of both the transverse and longitudinal
components of the electric field of the gaussian beam in the xz plane. As
expected from the symmetry of the gaussian beam, the longitudinal
component is zero on-axis. It is not too difficult to show (see
end-of-chapter exercise) that the ratio of the maximum value of the
longitudinal component to the maximum value of the transverse component is
√2 e^{−1/2}/kw = 0.858/kw = 0.137 λ/w. In the paraxial regime
the parameter 1/kw is small, therefore, as expected, the transverse
component dominates. It is also evident that the smaller the beam for
a given wavelength, the more prominent the longitudinal component.4
4
Note that in waveguides, where the fields are confined in the transverse
directions by conductors, it is possible to have pure transverse electric (TE)
and transverse magnetic (TM) modes; see Yariv and Yeh (2007).
12.2.1 Optical beams: ‘non-existence’ theorems
see Yariv and Yeh (2007). Before going on to consider vector beams beyond the paraxial limit we
note that there are some ‘non-existence’ theorems about optical beams.
We emphasized in Chapter 2 that not all of the properties of plane waves
are shared by every optical beam, and in this section we have been able
to quantify the relative importance of the longitudinal component. This
result could also be interpreted as the impossibility of obtaining a pure
transverse beam. There are other such statements that can be made,
inspired by Lekner’s paper (Lekner, 2003):
• Pure transverse electric and magnetic optical beams do not exist.
• Beams of fixed linear polarization do not exist.
• Beams which are everywhere circularly polarized in a fixed plane
do not exist.
Further details of the mathematical analysis to back up these statements
are given in the end-of-chapter exercises. A related result is the
non-existence of isotropic light waves. Maxwell’s equations dictate that all
spherical electromagnetic waves are intrinsically anisotropic, i.e. the
electric and magnetic fields depend on at least one angular variable
(Zangwill, 2013). The impossibility of achieving a continuous, nowhere-zero
vector field that is everywhere tangent to a spherical surface is sometimes
called the ‘hairy billiard ball’ theorem, see Fig. 12.4; Milnor (1978)
provides a mathematical proof.
Fig. 12.4 The hairy ball theorem: the impossibility of a continuous transverse
vector field on a sphere. There must be at least two positions where there is a
discontinuity and hence the amplitude is zero. Image courtesy of Nicholas
Spong, Durham University, 2018.
12.3 Vector angular spectrum

Here we build on the results of Chapter 6 and extend the discussion
to include the vector nature of the light. We consider the following
diffraction problem: we assume that the electric-field profile in the plane
z = 0 is tangential, and known;5 how do we determine the (vector) field
everywhere in the half space where z > 0? Following the methodology
of Chapter 6 we shall utilize the plane-wave representation; i.e. we shall
decompose the field in the plane z = 0 into a sum of plane waves with
Fourier transforms. The propagation of the plane waves into the half
space where z > 0 is trivial, and at an arbitrary plane downstream the
inverse Fourier transform can be used to reconstruct the (vector) field.
We write the electric field in the plane z = 0 as
\[ \mathbf{E}(x, y, z = 0) = E_x(x, y, z = 0)\, \hat{\mathbf{x}} + E_y(x, y, z = 0)\, \hat{\mathbf{y}} = E_x^{(0)} \hat{\mathbf{x}} + E_y^{(0)} \hat{\mathbf{y}} . \tag{12.8} \]
5
As in earlier chapters, we won’t worry too much about how the field came to
have that particular profile on the plane z = 0; see Sherman (1969).
Generalizing eqn (2.10), we can write the vector plane wave as
\[ \mathbf{E} = (E_{0x} \hat{\mathbf{x}} + E_{0y} \hat{\mathbf{y}} + E_{0z} \hat{\mathbf{z}})\, e^{i(k_x x + k_y y + k_z z)} , \tag{12.9} \]
where the harmonic time dependence has been suppressed. The vector
angular spectrum method represents the field in the half space where
z > 0 as a superposition of the basis functions of eqn (12.9). As in
Chapter 6, only the plane-wave components with kx² + ky² < k² propagate
and contribute to the far field, whereas components with kx² + ky² > k²
produce an imaginary kz and represent evanescent plane waves that decay
exponentially with z. Once again we rewrite the transverse wave vector in
terms of spatial frequencies kx = 2πu and ky = 2πv and eliminate kz; with
k = 2π/λ the phase term of the plane wave becomes
\[ e^{i(k_x x + k_y y + k_z z)} = e^{i 2\pi [ u x + v y + (\lambda^{-2} - u^2 - v^2)^{1/2} z ]} . \tag{12.10} \]
We can also make use of the fact that plane waves are transverse to
eliminate the amplitude of the z component of the plane wave:
\[ \mathbf{k} \cdot \mathbf{E} = 0 \quad \Rightarrow \quad E_{0z} = -\frac{1}{k_z} \left( k_x E_{0x} + k_y E_{0y} \right) . \tag{12.11} \]
We are left with two field components, E0x and E0y. As we established
in Chapter 6 the angular spectrum of the plane waves can be calculated
from the Fourier transform of the field in the plane z = 0; we generalize
that idea here for vector waves. We introduce two functions,6 A_x^{(0)}
and A_y^{(0)}, which are, respectively, the angular spectrum of the x and
y components of the electric field in the plane z = 0:
\[ A_x^{(0)}(u, v) = \iint_{-\infty}^{\infty} E_x^{(0)}(x, y)\, e^{-i 2\pi (u x + v y)}\, dx\, dy = \mathcal{F}[E_x^{(0)}(x, y)](u, v) , \tag{12.12} \]
\[ A_y^{(0)}(u, v) = \iint_{-\infty}^{\infty} E_y^{(0)}(x, y)\, e^{-i 2\pi (u x + v y)}\, dx\, dy = \mathcal{F}[E_y^{(0)}(x, y)](u, v) . \tag{12.13} \]
6
There can only be two independent functions that allow full specification
of all fields downstream. We have chosen A_x^{(0)} and A_y^{(0)}; the spectrum of
Ez can be obtained from ∇ · E = 0, and the spectrum of the magnetic field
from ∇ × E = −∂B/∂t. See Clemmow (1966) for a full mathematical discussion.
As for the scalar case, the propagation of the plane waves from the plane
z = 0 downstream is trivial (this is one strong motivation for using this
method) and simply involves multiplying the amplitude of the angular
spectrum by the phase factor e^{i k_z z},
\[ A_x^{(z)}(u, v) = e^{i k_z z} A_x^{(0)}(u, v) , \qquad A_y^{(z)}(u, v) = e^{i k_z z} A_y^{(0)}(u, v) . \tag{12.14} \]
Therefore the full propagation equations for the vector field into the half
space where z > 0, the equivalent of the hedgehog equation (6.29),
are given by the inverse Fourier transform of the angular spectrum
components:
\[ E_x^{(z)} = \mathcal{F}^{-1}\left[ e^{i k_z z}\, \mathcal{F}[E_x^{(0)}] \right] , \tag{12.15} \]
\[ E_y^{(z)} = \mathcal{F}^{-1}\left[ e^{i k_z z}\, \mathcal{F}[E_y^{(0)}] \right] , \tag{12.16} \]
\[ E_z^{(z)} = -\mathcal{F}^{-1}\left[ e^{i k_z z} \left( \frac{k_x}{k_z} \mathcal{F}[E_x^{(0)}] + \frac{k_y}{k_z} \mathcal{F}[E_y^{(0)}] \right) \right] . \tag{12.17} \]
It is worth reiterating the point we made in Chapter 6: eqns (12.15)–
(12.17) are amazingly powerful. Once we have specified the tangential
electric field in one plane, these equations allow the full vector field
to be calculated downstream.7 There exist very efficient algorithms for
calculating the two-dimensional Fourier transforms at the heart of these
results on a computer (and other modern electronic devices such as
smart phones); consequently these equations are used extensively in
computational optics.
7
There is a very insightful discussion by Clemmow (1966) as to the power
of the formalism that allows the plane-wave representation of a field in a half
space solely in terms of an ‘aperture distribution’ of certain tangential field
components. An alternative physical interpretation in terms of current
densities as sources of electromagnetic fields is not as flexible.
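The propagation recipe of eqns (12.15) to (12.17) maps directly onto a fast Fourier transform. The sketch below is one possible NumPy realization, not a reference implementation: the function name `propagate_vector_field`, the square grid, the sample spacing and the gaussian test field are all assumptions made for illustration. The discrete transform also treats the field as periodic, so the grid must be wide enough that the beam does not wrap around.

```python
import numpy as np

def propagate_vector_field(ex0, ey0, dx, wavelength, z):
    """Return (Ex, Ey, Ez) at distance z from tangential samples ex0, ey0 at z = 0.

    Arrays are square, indexed [iy, ix], with sample spacing dx in x and y.
    """
    n = ex0.shape[0]
    k = 2 * np.pi / wavelength
    freq = np.fft.fftfreq(n, d=dx)                    # spatial frequencies u, v
    kx, ky = np.meshgrid(2 * np.pi * freq, 2 * np.pi * freq)
    # Complex square root: real kz for propagating waves, +i|kz| for evanescent ones.
    kz = np.sqrt((k**2 - kx**2 - ky**2).astype(complex))
    phase = np.exp(1j * kz * z)
    ax = phase * np.fft.fft2(ex0)                     # propagated spectrum of Ex
    ay = phase * np.fft.fft2(ey0)                     # propagated spectrum of Ey
    # Transversality of each plane wave fixes the z amplitude; skip kz = 0 exactly.
    az = np.zeros_like(ax)
    ok = np.abs(kz) > 1e-12 * k
    az[ok] = -(kx[ok] * ax[ok] + ky[ok] * ay[ok]) / kz[ok]
    return np.fft.ifft2(ax), np.fft.ifft2(ay), np.fft.ifft2(az)

# Illustrative input: an x-polarized gaussian with a deliberately small waist.
wavelength = 1.0                       # lengths in units of the wavelength
w0 = 2.0 * wavelength                  # assumed waist
n, dx = 256, wavelength / 8            # assumed grid
coords = (np.arange(n) - n // 2) * dx
x, y = np.meshgrid(coords, coords)
ex0 = np.exp(-(x**2 + y**2) / w0**2)
ey0 = np.zeros_like(ex0)

ex, ey, ez = propagate_vector_field(ex0, ey0, dx, wavelength, z=5 * wavelength)
k = 2 * np.pi / wavelength
scaled_ratio = np.abs(ez).max() / np.abs(ex).max() * k * w0
print(scaled_ratio)
```

For this input no y component is generated, E_z vanishes on the axis, and the peak ratio |E_z|/|E_x| comes out within a few per cent of the first-order estimate 0.858/kw0, the small difference reflecting the factor kx/kz in place of kx/k.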
We can also compare the full vector solutions of eqns (12.15)–(12.17)
with their scalar counterpart, eqn (6.29). Firstly, we notice that there is
no ‘cross-talk’ between the polarization components along the x and y
directions. As we often encounter the situation where only one of these
components is non-zero, we are reassured that the full vector solution
does not ‘generate’ another transverse component—this justifies one of
the main assumptions of the scalar approximation. Secondly, we see that
in agreement with Section 1.12 the longitudinal component is smaller
than the transverse component for each plane wave in the representation
by the geometrical factors kx/kz and ky/kz, for the x and y components,
respectively. Equation (12.17) allows us to quantify the discussion of
how large an angular width a certain beam may have for the beam to
be considered transverse to a certain degree of approximation.
Fig. 12.5 Axis system for far-field analysis of the angular spectrum.
12.3.1 Far-field analysis of the angular spectrum
In this section we shall perform a far-field analysis of the angular
spectrum of the optical field.8 This will serve to introduce a notation
that will be convenient in Section 12.5, where we discuss tight focusing of
light fields. It will also allow us to quantify earlier discussions about the
regime of validity of Fourier optics, and also provide a firm mathematical
link between diffraction theory and the geometrical construct of a ray.
Figure 12.5 shows the geometry. Let the point P have coordinates
(x, y, z), with r = (x² + y² + z²)^{1/2}, and θ and φ the usual angular
coordinates. This point can also be specified by the unit vector ŝ, with
components sx, sy, and sz, respectively, where x = sx r, y = sy r, and
z = sz r. Using the method of stationary phase9 the propagation
equations for the fields can be found in the asymptotic limit of kr → ∞.
At sufficiently large distances from an aperture it is possible to isolate
a dominant part of the field, called the radiation field or the far field,
denoted by E_R. The result is10
\[ \mathbf{E}_R = \frac{e^{ikr}}{r} \left[ A_\theta^{(0)}(\theta, \phi)\, \hat{\boldsymbol{\theta}} + A_\phi^{(0)}(\theta, \phi)\, \hat{\boldsymbol{\phi}} \right] , \tag{12.18} \]
where
\[ A_\theta^{(0)} = \frac{-ik}{2\pi} \left[ \cos\phi\, A_x^{(0)}(k s_x, k s_y) + \sin\phi\, A_y^{(0)}(k s_x, k s_y) \right] , \tag{12.19} \]
\[ A_\phi^{(0)} = \frac{-ik}{2\pi} \cos\theta \left[ -\sin\phi\, A_x^{(0)}(k s_x, k s_y) + \cos\phi\, A_y^{(0)}(k s_x, k s_y) \right] . \tag{12.20} \]
8
Inspired by the treatment of Smith (1997), and Novotny and Hecht (2012).
9
An excellent overview of this mathematical technique applied to asymptotic
values of Fourier-like integrals can be found in Stamnes (1986) and Mandel
and Wolf (1995).
10
See Smith (1997) for the mathematical details.
What we learn from these equations is that the amplitude of only one of
the plane-wave building blocks from the angular spectrum representation
contributes to the asymptotic radiation field in the direction ŝ; that with
wave vector parallel to this direction, i.e. with kx = sx k, ky = sy k, and
kz = sz k. The contributions from all other plane waves destructively
interfere along this direction. The asymptotic form of the radiated field
can be written as11
\[ \mathbf{E}_R(r \mathbf{s}) \sim E_0 \mathbf{F}(\mathbf{s}) \frac{e^{ikr}}{r} \quad \text{as } kr \to \infty . \tag{12.21} \]
11
The symbol ∼ is used to denote ‘asymptotic to’. See Rhodes (1964)
and Smith (1997) for further details of asymptotic methods in
electromagnetism.
The (angular dependent) function E0 F(s) is referred to as the radiation
pattern. Note in particular that the radial dependence takes the form
of a spherical wave. The angular-spectrum method uses plane waves as
building blocks, but the asymptotic value of the field in a given direction
does not take the form of a plane wave.
We saw in Chapter 2, Fig. 2.1, that for a plane wave the surfaces
of constant phase are planar surfaces perpendicular to the direction of
propagation. We can think of these as geometrical wave fronts. The
geometrical rays are the vectors that denote the direction of energy flow.
For plane waves in vacuum the rays are parallel to the wave vector.
For the radiation field of eqn (12.18) the wave fronts are spherical and
the rays are in the radial direction, once again orthogonal to the wave
fronts.12
Having spent some time on the far-field, asymptotic form of the light
downstream of an aperture, we move on to the issue of describing
the vector light fields that are achieved by focusing. But first, we
augment the discussion of polarization states of light from Chapter 4,
by introducing two new states of polarization.
12
It is fascinating to see some of the concepts of geometrical optics
emerge from analysis of the far-field radiation pattern. Geometrical optics
has been one of the longest studied topics in the physical sciences, certainly
many centuries before the appearance of electromagnetic fields and Maxwell’s
equations. The reader interested in more details of the link between
geometrical optics and the (vector) wave theory of light is directed to Born
and Wolf’s classic text, Born and Wolf (1999).
12.4 Radial/azimuthal modes
In Chapter 4 we used the linear and circular polarization bases
extensively. Here, we extend the discussion of types of polarization states
of optical beams. We shall demonstrate that radial and azimuthal
polarization states of light can be generated by taking suitable
combinations of linearly polarized beams. These so-called cylindrical
vector beams are solutions of the vector wave equation that obey
cylindrical symmetry in both amplitude and polarization (Zhan, 2009).
In Chapter 11 we concentrated on the properties of a gaussian beam,
the lowest-order mode of a laser cavity. Higher-order modes exist, and
when there is cartesian symmetry the Hermite–Gauss modes can be
excited.13 The first-order Hermite–Gauss modes (obtained from the
derivatives of the TEM00 mode, see end-of-chapter exercise) can be
written as
\[ E_{10}^{\rm HG} = E_0 \frac{\sqrt{2}\, x}{w} e^{-(x^2 + y^2)/w^2} , \tag{12.22} \]
\[ E_{01}^{\rm HG} = E_0 \frac{\sqrt{2}\, y}{w} e^{-(x^2 + y^2)/w^2} . \tag{12.23} \]
Figure 12.6 shows the field profile for these modes.
13
Hooker and Webb (2010) discuss higher-order modes of a low-loss laser
cavity.
12.4.1 Radial polarization
Consider the linear combination of (01) and (10) modes, with orthogonal
linear polarization:
\[ \mathbf{E}_R = E_{10}^{\rm HG} \hat{\mathbf{x}} + E_{01}^{\rm HG} \hat{\mathbf{y}} = \frac{\sqrt{2} E_0}{w} e^{-(x^2 + y^2)/w^2} (x \hat{\mathbf{x}} + y \hat{\mathbf{y}}) = \frac{\sqrt{2} E_0}{w} e^{-(x^2 + y^2)/w^2}\, r\, \hat{\mathbf{r}} . \tag{12.24} \]
Evidently, this is a radially polarized beam. In Fig. 12.7 we
show the polarization of the individual (01) and (10) modes, and
their combination. There are numerous techniques (Zhan, 2009) for
generating radial polarization states of light, including: interferometric
combination of beams; using segmented wave plates; or by exploiting the
polarization dependence of the Fresnel coefficients on internal reflection
from a conical reflector (Radwell et al. 2016).
Fig. 12.6 The 10 and 01 modes of a Gaussian beam. The + and − signs
indicate the relative phase.
Fig. 12.7 A radially polarized light mode, which can be formed from the
sum of orthogonal linearly polarized 01 and 10 modes. The black arrows
indicate the direction of the electric field vector.
12.4.2 Azimuthal polarization
To construct an azimuthal polarization we take a different linear
combination of the (01) and (10) modes, with orthogonal polarizations:
\[ \mathbf{E}_{\rm Az} = E_{01}^{\rm HG} \hat{\mathbf{x}} - E_{10}^{\rm HG} \hat{\mathbf{y}} = \frac{\sqrt{2} E_0}{w} e^{-(x^2 + y^2)/w^2} (y \hat{\mathbf{x}} - x \hat{\mathbf{y}}) = -\frac{\sqrt{2} E_0}{w} e^{-(x^2 + y^2)/w^2}\, r\, \hat{\boldsymbol{\phi}} . \tag{12.25} \]
Evidently, the polarization of this beam is azimuthal. In Fig. 12.8
we show the polarization of the individual (01) and (10) modes, and
their combination.
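The two superpositions can be checked point by point. The short sketch below builds the first-order Hermite–Gauss amplitudes of eqns (12.22) and (12.23) at the waist, forms the combinations of eqns (12.24) and (12.25), and projects them onto the local radial and azimuthal directions. The function names, the unit waist and the test point are assumptions chosen only for illustration.

```python
import math

def hg10(x, y, w, e0=1.0):
    """First-order Hermite-Gauss amplitude proportional to x."""
    return e0 * math.sqrt(2) * x / w * math.exp(-(x**2 + y**2) / w**2)

def hg01(x, y, w, e0=1.0):
    """First-order Hermite-Gauss amplitude proportional to y."""
    return e0 * math.sqrt(2) * y / w * math.exp(-(x**2 + y**2) / w**2)

def radial_beam(x, y, w):
    """(Ex, Ey) for the combination E10 x-hat + E01 y-hat."""
    return hg10(x, y, w), hg01(x, y, w)

def azimuthal_beam(x, y, w):
    """(Ex, Ey) for the combination E01 x-hat - E10 y-hat."""
    return hg01(x, y, w), -hg10(x, y, w)

def polar_components(ex, ey, x, y):
    """Project (Ex, Ey) onto the radial and azimuthal unit vectors at (x, y)."""
    phi = math.atan2(y, x)
    e_rho = ex * math.cos(phi) + ey * math.sin(phi)
    e_phi = ey * math.cos(phi) - ex * math.sin(phi)
    return e_rho, e_phi

w = 1.0                      # assumed beam size
x, y = 0.3, -0.5             # arbitrary test point
rad_rho, rad_phi = polar_components(*radial_beam(x, y, w), x, y)
az_rho, az_phi = polar_components(*azimuthal_beam(x, y, w), x, y)
amplitude = math.sqrt(2) * math.hypot(x, y) / w * math.exp(-(x**2 + y**2) / w**2)

print(rad_rho, rad_phi)      # radial beam: only the radial projection survives
print(az_rho, az_phi)        # azimuthal beam: only the azimuthal projection survives
print(radial_beam(0.0, 0.0, w), azimuthal_beam(0.0, 0.0, w))   # both vanish on axis
```

At any point the surviving projection has magnitude (√2 E0 r/w) e^{−r²/w²}, and both fields vanish at the origin.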
Fig. 12.8 An azimuthally polarized light mode, which can be formed from
the sum of orthogonal linearly polarized 01 and 10 modes.
Note that neither the radially nor the azimuthally polarized beams can
have a finite field at the origin,14 as there is a singularity in the direction
of the polarization vector, similar to the zero density at the centre of a
fluid vortex.15
14
Beams with intensity patterns similar to those depicted in Figs. 12.7
and 12.8 are frequently referred to as ‘doughnut beams’.
15
Note that the most general cylindrical vector beam has a fixed direction of
polarization with respect to the radial vector, and can be generated (Zhan,
2009) from a linear superposition of radial and azimuthal polarization.
12.5 High-NA focusing
In previous sections we have discussed the limitations of scalar diffraction
theory, and showed how a beyond-paraxial approximation can be used
to model vector fields. The fact that most of this book uses scalar
theory shows to a certain extent that the vector addition is not often
crucial. However, one particular field of modern optics where scalar
diffraction theory is not valid, and that necessitates the use of a vector
diffraction theory, is that of high numerical aperture (NA) optical
focusing and imaging. In addition to being of fundamental interest,
during this century there has been a burgeoning interest in the study of
this phenomenon, as tightly focused fields have found a wide range of
applications in, e.g. data storage, optical microscopy, optical tweezers16
and particle manipulation, drilling holes, to name but a few.
Most lenses have spherical surfaces, as these are by far the easiest to
machine. A compound lens formed of many spherical surfaces will have
many aberrations, especially if some of the rays are at large angles
with respect to the optical axis. A notable exception is an aplanatic
system, where ‘aberration-free’ imaging of the points located in the
vicinity of the optical axis can be achieved.17
The starting equation for our analysis was written down and solved
numerically for a plane wave illuminating a finite-diameter lens over
half a century ago, by Wolf (1959) and Richards and Wolf (1959).
However, in spite of the success of their vector angular spectrum
method, the numerically intense nature of the integrals meant that not
much attention was devoted to this topic. Early in this millennium,
Youngworth and Brown (2000) published a paper that applied Richards
and Wolf’s method to the focusing of high-numerical-aperture cylindrical
vector beams, and the topic has flourished since, largely as the power of
modern computers renders calculation of complicated three-dimensional
fields and intensities in the vicinity of a focus a tractable problem.18
16
Further details can be found in Jones et al. (2015).
17
We do not have space to discuss the details here, but there is a strong link
between the features of an aplanatic lens and a concept known as the
Abbe sine condition, named after Ernst Abbe. Systems are designed to fulfil
Abbe’s condition with the intention of reducing aberrations. An insightful
discussion can be found in Wave Theory of Aberrations, Hopkins (1950).
18
This allows the design of, for example, complex optical longitudinal
polarization structures, such as linked and knotted longitudinal vortex lines;
see Maucher et al. (2018).
12.5.1 Geometry and integral representation of the focused field
Figure 12.9 shows a meridional plane19 for a particular light path. The
optical axis is along z, and we restrict our attention to optical systems
with cylindrical symmetry, which simplifies the mathematical analysis
considerably. For an aplanatic lens (meaning no spherical aberrations)
the refraction of the rays is characterized by a spherical surface behind
the lens with radius of curvature equal to the focal length, f. The
maximum value of θ is called α, and is related to the numerical
aperture, NA, by the relation20 sin α = NA. If the input beam has
an axial planar wave front, then the resulting converging spherical wave
propagates to a diffraction-limited axial point image. The unit vector
ĝ0 is oriented perpendicular to the optical axis, and can be expressed in
cylindrical coordinates as
\[ \hat{\mathbf{g}}_0 = \cos\phi\, \hat{\mathbf{x}} + \sin\phi\, \hat{\mathbf{y}} , \tag{12.26} \]
where the azimuthal angle φ is with respect to the x axis. After the lens,
the vector s is the direction of propagation of the ray, and the vector ĝ1
is given by21
\[ \hat{\mathbf{g}}_1 = \cos\theta \left( \cos\phi\, \hat{\mathbf{x}} + \sin\phi\, \hat{\mathbf{y}} \right) + \sin\theta\, \hat{\mathbf{z}} , \tag{12.27} \]
where θ is the polar angle.
19
Rays in planes through the axis are known as meridian rays.
20
If the medium behind the lens has a refractive index of n, the relation
becomes n sin α = NA.
21
Note the appearance of a z-component of the radial field as a
consequence of the tilting of the ray. In contrast with the scalar approach, we
do not assume this angle to be small.
The radial component of the electric field is parallel to ĝ0, and the
azimuthal component points along the direction ĝ0 × k̂. Consequently
in region 0, in front of the lens, the electric field may be written as
\[ \mathbf{E} = f(\theta) \left[ E_r\, \hat{\mathbf{g}}_0 + E_\phi\, (\hat{\mathbf{g}}_0 \times \hat{\mathbf{k}}) \right] , \tag{12.28} \]
where f(θ) is the relative amplitude of the incident field. Richards and
Wolf showed that the electric field at point P in the vicinity of the
focus can be expressed as a diffraction integral of the angular-spectrum
representation of the field over the spherical surface of radius f, see
Richards and Wolf, eqn (2.2):
\[ \mathbf{E}^P = \frac{-ik}{2\pi} \iint_\Omega \frac{\mathbf{a}_1}{s_z}\, e^{ik(s_x x + s_y y + s_z z)}\, ds_x\, ds_y . \tag{12.29} \]
The integration is over the solid angle, Ω, subtended by the lens at the
focus. The particularly simple form of the phase factor in the exponential
is a consequence of modelling an aplanatic aberration-free lens. The
element of solid angle, dsx dsy /sz , can also be written as (Richards and
Wolf, eqn (2.25))
dsx dsy /sz = dΩ = sin θdθdφ. (12.30)
The amplitude function a1 behind the lens is derived from the amplitude
before the lens, with the radial unit vector ĝ0 replaced by its refracted
counterpart ĝ1, by the equation
\[ \mathbf{a}_1 = f \sqrt{\cos\theta}\, f(\theta) \left[ E_r\, \hat{\mathbf{g}}_1 + E_\phi\, (\hat{\mathbf{g}}_0 \times \hat{\mathbf{k}}) \right] . \tag{12.31} \]
The factor √cos θ arises from energy conservation considerations; see
Richards and Wolf eqn (2.13), and the end-of-chapter exercises. It is
convenient to use cylindrical coordinates (ρP, φP, zP) for point P, with
the origin at the paraxial focus.
From eqn (12.29) we derive expressions for the cartesian components
of the electric field in the vicinity of the focus, where
E P = ExP x̂ + EyP ŷ + EzP ẑ .
The first two are easily interchanged for the radial and azimuthal
components:
EφP = EyP cos φP − ExP sin φP , (12.32)

EρP = ExP cos φP + EyP sin φP . (12.33)
As we have assumed cylindrical symmetry, the double integral of
eqn (12.29) separates, and the φ integral can be evaluated analytically
by using the following integrals:
\[ \int_0^{2\pi} \cos(n\phi)\, e^{ik\rho_P \sin\theta \cos(\phi - \phi_P)}\, d\phi = 2\pi i^n J_n(k\rho_P \sin\theta) \cos(n\phi_P) , \]
\[ \int_0^{2\pi} \sin(n\phi)\, e^{ik\rho_P \sin\theta \cos(\phi - \phi_P)}\, d\phi = 2\pi i^n J_n(k\rho_P \sin\theta) \sin(n\phi_P) , \]
where Jn are Bessel functions of the first kind of order n. Having
developed this formalism, we now go on to apply it to three specific
cases: tight focusing of (i) a linearly polarized gaussian beam, (ii) a
radially polarized doughnut beam, and (iii) an azimuthally polarized
doughnut beam.
Fig. 12.9 Axis system used to describe the focusing of light. The diagram
shows the meridional plane, which intersects the xz plane at an angle φ.
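The two azimuthal integrals quoted above can be confirmed numerically. In the sketch below the left-hand sides are evaluated with an equally spaced sum over one period, and the right-hand sides use a power series for Jn. The function names, the number of samples and series terms, and the test values of the argument and of φP are illustrative assumptions.

```python
import math
import numpy as np

def bessel_series(n, x, terms=40):
    """J_n(x) for integer n >= 0 from its power series (fine for modest |x|)."""
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + n))
               * (x / 2) ** (2 * m + n) for m in range(terms))

def azimuthal_integral(n, x, phi_p, kind="cos", samples=1024):
    """Integrate cos(n phi) or sin(n phi) times exp(i x cos(phi - phi_p)) over 0..2 pi."""
    phi = np.arange(samples) * (2 * np.pi / samples)
    weight = np.cos(n * phi) if kind == "cos" else np.sin(n * phi)
    integrand = weight * np.exp(1j * x * np.cos(phi - phi_p))
    return integrand.sum() * (2 * np.pi / samples)

def closed_form(n, x, phi_p, kind="cos"):
    """Right-hand side: 2 pi i^n J_n(x) times cos(n phi_p) or sin(n phi_p)."""
    angular = math.cos(n * phi_p) if kind == "cos" else math.sin(n * phi_p)
    return 2 * math.pi * (1j ** n) * bessel_series(n, x) * angular

x_arg, phi_p = 3.7, 0.6     # arbitrary values; x_arg stands for k * rho_P * sin(theta)
errors = [abs(azimuthal_integral(n, x_arg, phi_p, kind) - closed_form(n, x_arg, phi_p, kind))
          for n in (0, 1, 2) for kind in ("cos", "sin")]
print(max(errors))
```

Only the orders n = 0, 1 and 2 are tested, since these are the ones that appear in the focal-field expressions for linear, radial and azimuthal input polarization.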
12.5.2 Linearly polarized illumination

In their classic paper, Richards and Wolf considered a plane wave
polarized along x̂. We generalize their treatment slightly, by considering
the input to have the form E = E0 f (θ) x̂. From eqn (12.29) we obtain
the following expressions for the electric field components at a point P
in the vicinity of the focus:
ExP = −iA (I0 + I2 cos 2φP ) , (12.34)

EyP = −iAI2 sin 2φP , (12.35)
EzP = −2AI1 cos φP , (12.36)
where the strength factor A is given by
\[ A = \frac{k f E_0}{2} = \frac{\pi f E_0}{\lambda} . \tag{12.37} \]
The integrals that appear in the above expressions are given by:
\[ I_0 = \int_0^\alpha f(\theta) \sqrt{\cos\theta}\, \sin\theta\, (1 + \cos\theta)\, J_0(k\rho_P \sin\theta)\, e^{ikz_P \cos\theta}\, d\theta , \tag{12.38} \]
\[ I_1 = \int_0^\alpha f(\theta) \sqrt{\cos\theta}\, \sin^2\theta\, J_1(k\rho_P \sin\theta)\, e^{ikz_P \cos\theta}\, d\theta , \tag{12.39} \]
\[ I_2 = \int_0^\alpha f(\theta) \sqrt{\cos\theta}\, \sin\theta\, (1 - \cos\theta)\, J_2(k\rho_P \sin\theta)\, e^{ikz_P \cos\theta}\, d\theta . \tag{12.40} \]
These integrals have to be evaluated numerically for every point around
the focus in order to see the shape of the tightly focused field. As
previously emphasized, the power of modern computers enables fast
evaluation of these calculations. Setting f(θ) equal to 1 regains Richards
and Wolf’s results for plane-wave illumination.
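A minimal sketch of how eqns (12.34) to (12.40) might be evaluated at a single point is given below. It uses a midpoint rule in θ and obtains Jn from Bessel's integral representation so that only NumPy is needed. The function names, the sample counts, the gaussian apodization with w = 40λ and f = 100λ, and in particular the aperture half-angle (NA = 0.9) are assumptions made for illustration; the text does not quote the NA used for its figures.

```python
import numpy as np

def bessel_j(n, x, samples=128):
    """J_n(x) from Bessel's integral, averaged over one period (moderate x only)."""
    tau = np.arange(samples) * (2 * np.pi / samples)
    x = np.asarray(x, dtype=float)
    return np.mean(np.cos(n * tau - x[..., None] * np.sin(tau)), axis=-1)

def focal_field_linear(rho_p, phi_p, z_p, wavelength, focal_length, alpha, apod,
                       e0=1.0, samples=2000):
    """(Ex, Ey, Ez) near focus for x-polarized input with apodization apod(theta)."""
    k = 2 * np.pi / wavelength
    dtheta = alpha / samples
    theta = (np.arange(samples) + 0.5) * dtheta          # midpoint rule on [0, alpha]
    common = apod(theta) * np.sqrt(np.cos(theta)) * np.exp(1j * k * z_p * np.cos(theta)) * dtheta
    arg = k * rho_p * np.sin(theta)
    i0 = np.sum(common * np.sin(theta) * (1 + np.cos(theta)) * bessel_j(0, arg))
    i1 = np.sum(common * np.sin(theta) ** 2 * bessel_j(1, arg))
    i2 = np.sum(common * np.sin(theta) * (1 - np.cos(theta)) * bessel_j(2, arg))
    a = 0.5 * k * focal_length * e0                      # strength factor A
    ex = -1j * a * (i0 + i2 * np.cos(2 * phi_p))
    ey = -1j * a * i2 * np.sin(2 * phi_p)
    ez = -2 * a * i1 * np.cos(phi_p)
    return ex, ey, ez

wavelength, focal_length, w = 1.0, 100.0, 40.0           # lengths in units of the wavelength
alpha = np.arcsin(0.9)                                   # hypothetical NA of 0.9
gaussian = lambda theta: np.exp(-(focal_length * np.sin(theta) / w) ** 2)

args = (wavelength, focal_length, alpha, gaussian)
ex_axis, ey_axis, ez_axis = focal_field_linear(0.0, 0.0, 0.0, *args)
_, _, ez_off = focal_field_linear(0.5, 0.0, 0.0, *args)        # on the x axis
_, ey_diag, _ = focal_field_linear(0.5, np.pi / 4, 0.0, *args)  # on the diagonal
paraxial_spot = focal_length * wavelength / (np.pi * w)

print(abs(ex_axis), abs(ez_off), abs(ey_diag), paraxial_spot)
```

On the axis only Ex survives; away from it a weaker Ez appears along x and a still weaker Ey along the diagonals. Mapping out a full focal-plane image amounts to repeating the call over a grid of (ρP, φP).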
Fig. 12.10 The electric field compo-

nents, Ex , Ey , and Ez , and the intensity,
I, in the focal plane of a lens for an
input linearly polarized along x. For
E, white is positive, black is negative,
and grey is zero, whereas for I black
and white correspond to zero and peak
intensity, respectively. In this example,
the input beam size is w = 40λ and
the focal length is f = 100λ. The focal
spot predicted by paraxial optics has
a radius wf = f λ/(πw) = 0.8λ. The
region shown has dimensions 5λ × 5λ.
As we have devoted much attention to the (weak) focusing of scalar
gaussian beams, it is instructive to investigate strong focusing of a vector
gaussian beam. The input function is chosen as
\[ f(\theta) = e^{-(x^2 + y^2)/w^2} = e^{-f^2 \sin^2\theta / w^2} , \tag{12.41} \]
where we have assumed that the waist of the (broad) gaussian beam is
on the front surface of the lens.
Figure 12.10 shows plots of Ex , Ey , and Ez , as well as the total intensity,
in the plane zP = 0. We have chosen the parameter w/f = 0.4. As
expected, the x-component dominates; the focal spot is compact but
not fully symmetric (it is elongated along the x axis). However, we also
see a weaker longitudinal Ez component, with a bimodal distribution.
From symmetry arguments it is easy to argue why the longitudinal
component has to be zero at the origin, but it is finite to either side.
There is also an even weaker Ey component, that arises from ‘cross talk’
between the directions when the rays are refracted by the lens. This has
a four-fold symmetry. The bimodal Ez component is in agreement with

our treatment of the gaussian beam beyond the paraxial approximation
that we discussed in Section 12.2, eqn (12.7). Evidently, the full vector
treatment presented in this section predicts features not found by adding
a correction to the paraxial treatment, such as the presence of an Ey field
component.
12.5.3 Radially polarized illumination

For the case where the input beam has radial polarization we set Eφ = 0
in eqn (12.28). From eqn (12.29) we obtain the following expressions for
the electric field components at point P in the vicinity of the focus:
\[ E_\rho^P = A \int_0^\alpha f(\theta) \sqrt{\cos\theta}\, \sin 2\theta\, J_1(k\rho_P \sin\theta)\, e^{ikz_P \cos\theta}\, d\theta , \tag{12.42} \]
\[ E_z^P = 2iA \int_0^\alpha f(\theta) \sqrt{\cos\theta}\, \sin^2\theta\, J_0(k\rho_P \sin\theta)\, e^{ikz_P \cos\theta}\, d\theta . \tag{12.43} \]
From symmetry, the azimuthal component is zero everywhere.
Figures 12.11 and 12.12 show plots of Eρ and Ez in the xy plane with zP = 0,
and Iρ and Iz in the xz plane for y = 0, respectively. As in Fig. 12.10, we
have chosen w/f = 0.4. Figure 12.7 illustrates why on-axis the field has
to be purely longitudinal. The spot size is compact; indeed, one of the
most prominent early investigations of radially polarized light showed that
a smaller focal spot size could be obtained in comparison with linearly
polarized light (Dorn et al. 2003). Figure 12.11 also hints that the
strongest contribution to the longitudinal component at the focus comes
from rays which are tipped most by the lens. This can be confirmed by
having an annular mask which apodizes the input such that only light
within a restricted range of θ is transmitted.
Fig. 12.11 Field distributions for a focused radially polarized beam, see
Fig. 12.7, in the focal plane. The radial and axial polarization states, Eρ and
Ez, are shown. The parameters are the same as in Fig. 12.10.
The peak intensities of the radial and longitudinal components occur
at different locations, and for our parameters their ratio is 5. Clearly by
using a radial beam and a high numerical aperture lens the form of the
field at focus is very different to the prediction of scalar wave theory—
this is not remotely surprising, as the conditions are those under which
the scalar approximation is expected to fail.
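The role of the steepest rays can be explored with a small numerical experiment based on eqns (12.42) and (12.43). The sketch below computes the on-axis longitudinal amplitude and the peak radial amplitude in the focal plane when only an annulus of polar angles is transmitted. Everything specific here is an assumption for illustration: the function names, the doughnut apodization (f sin θ/w) exp(−f² sin²θ/w²), the NA of 0.9, the two annuli, and the sampling. The Bessel functions are again taken from Bessel's integral so that only NumPy is required.

```python
import numpy as np

WAVELENGTH, FOCAL_LENGTH, W = 1.0, 100.0, 40.0   # lengths in units of the wavelength
ALPHA = np.arcsin(0.9)                           # hypothetical NA of 0.9

def bessel_j(n, x, samples=64):
    """J_n(x) from Bessel's integral, averaged over one period (moderate x only)."""
    tau = np.arange(samples) * (2 * np.pi / samples)
    x = np.asarray(x, dtype=float)
    return np.mean(np.cos(n * tau - x[..., None] * np.sin(tau)), axis=-1)

def doughnut(theta):
    """Assumed input amplitude, proportional to r exp(-r^2/w^2) with r = f sin(theta)."""
    r = FOCAL_LENGTH * np.sin(theta)
    return (r / W) * np.exp(-(r / W) ** 2)

def focal_field_radial(rho_p, z_p, apod, e0=1.0, samples=400):
    """(E_rho, E_z) near focus for radially polarized input with apodization apod."""
    k = 2 * np.pi / WAVELENGTH
    dtheta = ALPHA / samples
    theta = (np.arange(samples) + 0.5) * dtheta          # midpoint rule on [0, alpha]
    common = apod(theta) * np.sqrt(np.cos(theta)) * np.exp(1j * k * z_p * np.cos(theta)) * dtheta
    arg = k * rho_p * np.sin(theta)
    a = 0.5 * k * FOCAL_LENGTH * e0                      # strength factor A
    e_rho = a * np.sum(common * np.sin(2 * theta) * bessel_j(1, arg))
    e_z = 2j * a * np.sum(common * np.sin(theta) ** 2 * bessel_j(0, arg))
    return e_rho, e_z

def longitudinal_to_radial(lo, hi):
    """On-axis |E_z| over peak |E_rho| in the focal plane for lo <= theta < hi."""
    apod = lambda th: doughnut(th) * ((th >= lo) & (th < hi))
    _, e_z0 = focal_field_radial(0.0, 0.0, apod)
    peak_rho = max(abs(focal_field_radial(r, 0.0, apod)[0]) for r in np.linspace(0.0, 3.0, 61))
    return abs(e_z0) / peak_rho

e_rho_axis, e_z_axis = focal_field_radial(0.0, 0.0, doughnut)
ratio_low = longitudinal_to_radial(0.2, 0.3)     # annulus of shallow rays
ratio_high = longitudinal_to_radial(0.9, 1.0)    # annulus of steep rays
print(abs(e_rho_axis), abs(e_z_axis), ratio_low, ratio_high)
```

With the full aperture the field on the axis is purely longitudinal, and the steep-ray annulus gives a markedly larger longitudinal-to-radial ratio than the shallow-ray annulus.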
Fig. 12.12 Intensity distributions associated with the radial and axial
polarization states Iρ and Iz in the xz plane. The parameters are the same as
in Fig. 12.10.
12.5.4 Azimuthally polarized input
For the case of the input beam having azimuthal polarization we set Er =
0 in eqn (12.28). From eqn (12.29) we obtain the following expression
for the electric field component at point P in the vicinity of the focus:
\[ E_\phi^P = 2A \int_0^\alpha f(\theta) \sqrt{\cos\theta}\, \sin\theta\, J_1(k\rho_P \sin\theta)\, e^{ikz_P \cos\theta}\, d\theta . \tag{12.44} \]
From symmetry, the radial and longitudinal components are both zero.
Figure 12.13 shows plots of Eφ in the plane zP = 0 and the plane x = 0.
We have used the same parameters as previously, i.e. w = 40λ and the
focal length is f = 100λ . From symmetry it is evident that there can
be no field on-axis, and that there cannot be a longitudinal component.
To finish this section, we note that our treatment has assumed that
the waves behind the lens and in the focal region were in air. In many
high-resolution microscopes a transparent fluid with high refractive
index is used, and oil-immersion lenses are practically ubiquitous for
most applications. The numerical aperture is increased, allowing for
higher resolution. Our aim in this section was to give a flavour of
how vector-light formalism can be used to describe the tight focusing
of different light beams. This is a field of modern optics where there is
considerable activity, and we direct the reader to recent review articles—
Chen et al. (2012), Brown (2011)—and the research literature to see
what researchers are doing at the cutting edge of this vibrant field.
Fig. 12.13 Intensity distributions associated with a focused azimuthally
polarized beam, see Fig. 12.8, in the xy (top) and xz (bottom) planes.
The parameters are the same as in Fig. 12.10.
Chapter summary
• Optical beams are not purely transverse.

• The gradients of the field distribution along the transverse
polarization direction of a beam generate a longitudinal
component.
• To first order in the small parameter 1/kw, where k is the wave
vector and w the size of the beam, the longitudinal field is given
by Ez = (i/k)∇ · E T , where ∇ is the transverse gradient operator
and E T the transverse component of the field.
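• To first order in 1/kw, with q = z − izR, the cylindrically symmetric
vector gaussian beam is
\[ \mathbf{E} = E_x \hat{\mathbf{x}} + E_z \hat{\mathbf{z}} = E_0 \frac{z_R}{iq} e^{ikz} e^{ik\rho^2/2q} \left( \hat{\mathbf{x}} - \frac{x}{q} \hat{\mathbf{z}} \right) . \]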
• For a gaussian beam the ratio of the maximum value of the
longitudinal component to the maximum value of the transverse
component is 0.858/kw = 0.137 λ/w.
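• When the electric field in the plane z = 0 is E_x^{(0)} x̂ + E_y^{(0)} ŷ, the
vector field downstream for the half space z > 0 is
\[ E_x^{(z)} = \mathcal{F}^{-1}\left[ e^{i k_z z}\, \mathcal{F}[E_x^{(0)}] \right] , \]
\[ E_y^{(z)} = \mathcal{F}^{-1}\left[ e^{i k_z z}\, \mathcal{F}[E_y^{(0)}] \right] , \]
\[ E_z^{(z)} = -\mathcal{F}^{-1}\left[ e^{i k_z z} \left( \frac{k_x}{k_z} \mathcal{F}[E_x^{(0)}] + \frac{k_y}{k_z} \mathcal{F}[E_y^{(0)}] \right) \right] . \]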
• Cylindrical vector beams are solutions of the wave equation
with cylindrical symmetry in both amplitude and polarization.
• Cylindrical vector beams can display radial and azimuthal
polarization.
• The focal structure of a tightly focused light beam can be generated
(numerically) from the Richards–Wolf vector diffraction integral,
\[ \mathbf{E}^P = \frac{-ik}{2\pi} \iint_\Omega \frac{\mathbf{a}_1}{s_z}\, e^{ik(s_x x + s_y y + s_z z)}\, ds_x\, ds_y . \]
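• For radially polarized illumination the radial and longitudinal
electric field components around the focus are:
\[ E_\rho^P = A \int_0^\alpha f(\theta) \sqrt{\cos\theta}\, \sin 2\theta\, J_1(k\rho_P \sin\theta)\, e^{ikz_P \cos\theta}\, d\theta , \]
\[ E_z^P = 2iA \int_0^\alpha f(\theta) \sqrt{\cos\theta}\, \sin^2\theta\, J_0(k\rho_P \sin\theta)\, e^{ikz_P \cos\theta}\, d\theta . \]
• For azimuthally polarized illumination the azimuthal field
component around the focus is
\[ E_\phi^P = 2A \int_0^\alpha f(\theta) \sqrt{\cos\theta}\, \sin\theta\, J_1(k\rho_P \sin\theta)\, e^{ikz_P \cos\theta}\, d\theta . \]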
Exercises
(12.1) Ratio of longitudinal to transverse fields for a gaussian beam
Verify the statement in the text after eqn (12.7), that the ratio of longitudinal to transverse fields for a gaussian beam is $\sqrt{2}\,e^{-1/2}/kw = 0.858/kw = 0.137\,\lambda/w$.

(12.2) Importance of the non-transverse field for a beam
By interpreting w in the expansion parameter 1/kw generally as a measure of the width of a beam, comment on the importance of the non-transverse component of the field for the following: (i) a 1 m-wide beam at the entrance of a telescope; (ii) a He–Ne laser beam (λ = 0.633 μm) expanded to a width of 10 cm, and (iii) a diode laser (λ = 0.780 μm) focused to 25 μm.

(12.3) Magnetic field in the paraxial limit
For paraxial vector fields where the longitudinal component is given by eqn (12.3), show that the magnetic field is given by $i\omega\mathbf{B} = \nabla\times\mathbf{E}$. Verify that these solutions are consistent with the Maxwell equations $\nabla\cdot\mathbf{E} = 0$, and $\nabla\cdot\mathbf{B} = 0$.

(12.4) Non-existence theorem (1): Pure transverse electric and magnetic optical beams do not exist
This question is based on Lekner (2003), Section 2.1. A purely transverse electric and magnetic beam would have fields of the form
\[ \mathbf{E} = E_x\hat{\mathbf{x}} + E_y\hat{\mathbf{y}} , \]
and
\[ \mathbf{B} = B_x\hat{\mathbf{x}} + B_y\hat{\mathbf{y}} , \]
with a time dependence of $e^{-ickt}$. Substitute these solutions into Maxwell’s equations to show that the electric field components are governed by the equations
\[ \frac{\partial^2 E_x}{\partial z^2} + k^2 E_x = 0 , \]
and
\[ \frac{\partial^2 E_y}{\partial z^2} + k^2 E_y = 0 . \]
Show that propagating solutions to these two equations will be of the form $E_x/E_0 = e^{ikz}F(x, y)$ and $E_y/E_0 = e^{ikz}G(x, y)$, and that both F and G are harmonic solutions, i.e. subject to the equations
\[ \left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)F = 0 , \]
and
\[ \left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)G = 0 . \]
Therefore neither F nor G can localize the field about the axis and constitute a beam.

(12.5) Non-existence theorem (2): Beams of fixed linear polarization do not exist
(This question is based on Lekner (2003), Section 2.2.) If a beam of fixed linear polarization existed, we could write it in the form $\mathbf{E} = E_x e^{i(kz-\omega t)}\hat{\mathbf{x}}$. Using the two curl Maxwell equations show that these imply that
\[ \frac{\partial^2 E_x}{\partial y^2} + \frac{\partial^2 E_x}{\partial z^2} + k^2 E_x = 0 , \]
\[ \frac{\partial^2 E_x}{\partial x\,\partial y} = \frac{\partial^2 E_x}{\partial x\,\partial z} = 0 . \]
These equations imply that Ex must be a function of y and z but not x, and thus cannot be a localized beam solution along x.

(12.6) Non-existence theorem (3): Beams which are everywhere circularly polarized in a fixed plane do not exist
This question is based on Lekner (2003), Section 2.3. We can write a circularly polarized beam in the form $\mathbf{E} = (E_x\hat{\mathbf{x}} + E_y\hat{\mathbf{y}})\,e^{-ickt}$, where Ex and Ey have the same magnitude and are a quarter of a cycle out of phase. Using the same mathematical procedure as the last question show that the functional form of Ex (and hence Ey) has to be of the form Ex = Ex(x, z), and Ex = Ex(y, z). This implies that Ex can only be a function of z, and thus cannot be a localized beam solution in the x, y plane.

(12.7) Magnetic field in the vector angular spectrum formalism (1)
For the electric field solutions of eqns (12.15)–(12.17) find expressions for the (vector) magnetic field.

(12.8) Magnetic field in the vector angular spectrum formalism (2)
In the text we considered the vector angular spectrum method when we specified a tangential electric field in an aperture in the z = 0 plane. This is sometimes referred to as the TE case. We could also specify a tangential magnetic field in this plane, the so-called TM case. Find expressions for the (vector) electric and magnetic fields for this case in terms of the Fourier transforms of $B_x^{(0)}$ and $B_y^{(0)}$, where $\mathbf{B}(x, y, z = 0) = B_x^{(0)}\hat{\mathbf{x}} + B_y^{(0)}\hat{\mathbf{y}}$.

(12.9) Vector angular spectrum (1)
For the electric field solutions of eqns (12.15)–(12.17) evaluate the fields in the plane z = 0. Comment on your result.

(12.10) Vector angular spectrum (2)
For the z-component of the electric field solution of eqn (12.17) there are terms proportional to kx and ky. Recalling the result of Appendix B about the Fourier transform of a derivative, eqn (B.24), explain why, in the context of Section 12.2, there is an inevitability to this result.

(12.11) Vector angular spectrum (3)
The electric field solutions of eqns (12.15)–(12.17) are extremely compact, owing to the Fourier notation. To see why these equations did not find much utility until the advent of efficient Fourier algorithms on powerful contemporary computers, it is instructive to rework these equations, retaining the integrals. Rewrite each of the equations explicitly as two pairs of double integrals.

(12.12) Hermite–Gauss modes
Use the Fourier transform of a derivative, eqn (B.24), to show that if a TEM00 mode is a solution to the hedgehog equation, eqn (6.29), then the Hermite–Gauss mode, eqn (12.22), is also a solution.

(12.13) Magnetic component of radiated field
For the asymptotic radiated field of eqn (12.18), what is the form of the magnetic field?

(12.14) Fields before and after an aplanatic lens
By considering the energy transported along a ray parallel to the optical axis before the lens, show that the field strength after the lens must be modified by a factor $\sqrt{\cos\theta}$.

(12.15) Fields at the focus of a gaussian beam (1)
By considering the geometry of the refraction of the rays, explain why the focal field of an x polarized gaussian beam has a bimodal pattern for the z-component of electric field, and a quadrupolar symmetry for the field component along y.

(12.16) Fields at the focus of a gaussian beam (2)
Generate your own version of Fig. 12.10. Study what happens as the width of the input gaussian beam is varied. What happens when the initial width greatly exceeds the aperture of the lens? Explain your result.

(12.17) Fields at the focus of a radial beam (1)
Generate your own version of Fig. 12.11.

(12.18) Fields at the focus of a radial beam (2)
Consider the effect of an annular aperture. Restrict the range of integration of θ to the interval from βα to α, where 0 ≤ β ≤ 1. Plot a graph of the full width at half maximum of the spot size versus β. Comment on your result.

(12.19) Fields at the focus of an azimuthal beam (1)
Generate your own version of Fig. 12.11.

(12.20) Fields at the focus of an azimuthal beam (2)
By drawing a figure similar to Fig. 12.8 for the case of azimuthal polarization for the input, explain why the axial field is zero, and the focused field has to be transverse.

(12.21) Fields at the focus (1)
We have used the results of Richards and Wolf to calculate the vector field in the vicinity of the focus of a high numerical aperture lens. Quantify the concept of ‘in the vicinity’. What assumptions of the model break down in other spatial regions?

(12.22) Fields at the focus (2)
Use the results presented in this Chapter to produce a version of Fig. 12.14 for light linearly polarized along y. How does the size of the focal spot in the x and y directions compare to the prediction of paraxial theory?
Fig. 12.14 The electric field compo-

nents, Ex , Ey , and Ez , and the intensity,
I, in the focal plane of a lens for an
input linearly polarized along x. For
E, white is positive, black is negative,
and grey is zero, whereas for I black
and white correspond to zero and peak
intensity, respectively. In this example,
the input beam size is w = 100λ and
the focal length is f = 100λ. The
region shown has dimensions 5λ × 5λ.
13 Light and matter

The man who is not able to develop and use his mind is bound to be the slave of the other man, ...
Marcus Mosiah Garvey (St. Ann’s Bay 1887–London 1940), Nova Scotia, October 1937.

13.1 Induced dipoles
13.2 Refractive index
13.3 Ewald–Oseen extinction
13.4 Clausius–Mossotti
13.5 Kramers–Kronig
13.6 Point-like scatterers
13.7 The extinction paradox
13.8 Metals
13.9 Non-linear optics
Chapter summary
Exercises

Electromagnetic waves couple strongly to electrons in matter. In previous chapters, we said that the outcome of this interaction is that the magnitude of the wave vector becomes k′ = nk, where n is the refractive index. In this chapter, our main goal is to demonstrate the microscopic origin of refraction (the change in k) and dispersion (how this change varies with wavelength). At the microscopic level, the oscillatory electric field drives charge oscillations which generate additional fields, and the propagating field is a superposition of the incident and induced field, as in Fig. 13.1. In this sense we can think of light propagation inside a medium as a two-wave phenomenon, where the two waves are the incident field and the sum of the induced dipolar fields. The nature of the induced fields depends on whether the electrons are bound (as in an insulator or dielectric) or free (as in a metal). In insulators, the electromagnetic field induces oscillatory dipoles, whereas in metals it creates oscillatory currents, see Section 13.8. We begin by considering bound charges, specifically plane-wave illumination of a homogeneous thin slab of non-interacting dipoles, and the microscopic origins of refraction. We then consider the interaction between a single dipole and a light field, which leads to the concept of a scattering cross section. Next, we discuss metals and plasmas, where the electric field drives currents. Finally, in Section 13.9, we discuss the non-linear response of charges in matter.
Fig. 13.1 Inside a medium an incident field, Ei, drives localized charge oscillations which create additional fields, Ed, that lag behind the driving field and may propagate both forwards and backwards. The propagating field is a superposition of the incident field plus the induced field, Ei + Ed, summed over the whole medium.

13.1 Induced dipoles

For neutral matter, the effect of an electric field is to induce a relative displacement of the positive and negative charges. An electron displaced by a distance r results in a dipole moment d = −er. The magnitude of the dipole moment is related to the incident field via
\[ d = \alpha E , \tag{13.1} \]
where α is known as the polarizability. The case where α is independent of E (d linearly proportional to E) is known as linear optics. In an oscillatory electromagnetic field, using complex notation, both α and d are complex. The induced dipoles oscillate at the same frequency as the light, but not necessarily in phase. Like any driven oscillator, the relative phase depends on the difference between the drive frequency (ω/2π) and the oscillator frequency (ω0/2π). The dipole oscillates in phase below resonance, ω < ω0, with a π/2 phase lag on resonance, and π out of phase above resonance.1 This phase difference gives rise to both refraction and scattering. In the scalar approximation, the steady-state expectation value of the dipole moment for a field amplitude, E0, may be written as
\[ d = \alpha E_0 . \tag{13.2} \]
In Appendix C, we show that, if E0 is not too large,2 then the polarizability is given by
\[ \alpha = -\frac{D_0^2}{\hbar}\,\frac{1}{\Delta + i\Gamma/2} , \tag{13.3} \]
where Δ = ω − ω0 (the detuning) is the difference between the angular frequency of the light and the resonance, Γ = k³D0²/(3πε0ħ) is the photon scattering rate (equal to the inverse of the excited state lifetime, Γ = 1/τ), and $D_0 = -e\langle b|\mathbf{r}\cdot\hat{\boldsymbol{\epsilon}}|a\rangle$ is known as the dipole matrix element between the ground and excited state for light with polarization vector $\hat{\boldsymbol{\epsilon}}$.3 In the linear optics regime, described by eqn (C.12), d ≪ D0. The real and imaginary parts of α,
\[ \alpha' = -\frac{D_0^2}{\hbar}\,\frac{\Delta}{\Delta^2 + (\Gamma/2)^2} \quad\text{and}\quad \alpha'' = \frac{D_0^2}{\hbar}\,\frac{\Gamma/2}{\Delta^2 + (\Gamma/2)^2} , \]
are plotted as a function of the detuning in Fig. 13.2. The fact that the polarizability is complex tells us that the induced dipole is not in phase with the incident field. This phase difference gives rise to a resultant wave that appears to propagate slower inside a medium. As we shall show, the refractive index is approximately proportional to α′, which explains the similarity between the form of α′ and the refractive index curve we introduced in Chapter 2, Fig. 2.8. For this reason the form of α′ is often referred to as a dispersion lineshape.

1 As we shall see, the width of the cross-over between 0 and −π phase difference depends on the width of the resonance, as in Fig. 13.5.
2 In Section 13.9, we show that not too large means E0 ≪ Eat, where Eat is the binding field of the electron.
3 It is important to distinguish between the dipole matrix element, D0, which depends on the properties of the atom, and the induced dipole moment, d, which also depends on the amplitude of the light field. The induced dipole has a maximum value of D0/√2, see Appendix C.
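The behaviour of the phase either side of resonance follows directly from eqn (13.3). As a minimal sketch, the snippet below evaluates the polarizability in units of D0²/ħ with the detuning measured in units of Γ; the function names and the chosen detunings are illustrative and not part of the text.

```python
import cmath
import math

def polarizability(delta, gamma=1.0):
    """alpha in units of D0^2/hbar, from alpha = -1 / (delta + i gamma / 2)."""
    return -1.0 / complex(delta, gamma / 2.0)

def polarizability_parts(delta, gamma=1.0):
    """Real and imaginary parts written out separately, same units."""
    denom = delta**2 + (gamma / 2.0)**2
    return -delta / denom, (gamma / 2.0) / denom

for delta in (-10.0, 0.0, 10.0):             # below, on, and above resonance
    alpha = polarizability(delta)
    lag = cmath.phase(alpha) / math.pi       # phase of alpha in units of pi
    print(f"delta = {delta:+5.1f}  alpha' = {alpha.real:+.4f}  "
          f"alpha'' = {alpha.imag:+.4f}  phase = {lag:.3f} pi")
```

Far below resonance the phase is close to 0, on resonance it is π/2, and far above resonance it approaches π, while α″ stays positive throughout.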
Fig. 13.2 The real (i) and imaginary (ii) parts of the polarizability in units of 2D0²/(ħΓ), α̃ = α/(2D0²/(ħΓ)). The refractive index of a medium is related to the real part (i). Attenuation of light via scattering is proportional to the imaginary part (ii).

13.2 Refractive index

In this section we derive an expression for the refractive index of a medium of many dipoles. At distances, r > λ/(2π), a single dipole at the origin creates a field that can be approximated by a spherical wave with a particular angular dependence,
\[ E_d = \frac{d\,k^2}{4\pi\epsilon_0 r}\,f(\theta,\phi)\,e^{i(kr-\omega t)} , \tag{13.4} \]
where f(θ, φ) depends on the electric field polarization and orientation of the dipole, see Appendix A, Section A.3. Now consider many dipoles in a slab in the xy plane at z = 0 with N dipoles per unit volume. The slab has infinite spatial extent in the x and y directions, a thickness δz ≪ λ in the z direction, and is illuminated by a monochromatic plane wave $E_i^{(z)}$, see Fig. 13.3. If the dipoles do not interact with one another, then the dipolar field at a distance z is given by the sum of the individual fields of each dipole. The sum is found by integrating eqn (13.4) over the xy plane. For plane-wave illumination, the integral contains contributions with positive and negative phase similar to the Fresnel zones we encountered in Chapter 5, eqn (5.9) with Ra → ∞. The contributions from the outer zones cancel leaving only terms with f(θ, φ) ≃ 1, and we can write the total dipolar field as
\[ E_d^{(z)} = N\,\delta z\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\frac{k^2 d}{4\pi\epsilon_0 z}\,e^{ikr}\,dx\,dy . \tag{13.5} \]

Fig. 13.3 Inside a medium (grey), the incident field, Ei, induces dipolar fields, Ed, that lag behind. The resultant field is given by Ei + Ed.

For a continuous medium, where there is a ‘dipole’ everywhere there is a driving field, we can use eqn (13.2) to write the induced dipole moment as $d = \alpha E_i^{(0)}$, and the total dipolar field becomes
\[ E_d^{(z)} = \frac{N\alpha k^2\,\delta z}{4\pi\epsilon_0 z}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}E_i^{(0)}\,e^{ikr}\,dx\,dy , \tag{13.6} \]
where α is the polarizability.4 The dimensionless ratio,
\[ \chi = \frac{N\alpha}{\epsilon_0} , \tag{13.7} \]
is called the susceptibility, and provides a measure of the amplitude (and phase) of the induced dipolar field. The polarizability, α, and susceptibility, χ, characterize the microscopic (individual dipole) and the macroscopic (or bulk medium) response, respectively. A linear relationship between α and χ holds only if the induced dipole moment depends only on the incident field and not the field produced by other dipoles. We shall come back to this point in Section 13.3. Substituting for χ in eqn (13.6), and rearranging such that the integral has the form of the Fresnel diffraction integral, we obtain
\[ E_d^{(z)} = \frac{1}{2}ik\chi\,\delta z\,\frac{1}{i\lambda z}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}E_i^{(0)}\,e^{ikr}\,dx\,dy = \frac{1}{2}ik\chi\,\delta z\,E_i^{(z)} . \tag{13.8} \]
This result tells us that the induced dipolar field, $E_d^{(z)}$, has the same spatial distribution as the driving field, $E_i^{(z)}$. If the incident field is a plane wave then the induced dipolar field is also a plane wave, but with a different amplitude and phase. If either the incident field or the slab has a finite spatial extent, then the far field corresponds to the Fresnel diffraction pattern for that input, e.g. for a thin disc of dipoles, the far field dipolar radiation pattern is an Airy pattern, similar to Babinet’s principle, Section 6.8.

4 For a medium of discrete dipoles, we would write the polarizability of the jth dipole as αδ(r − rj) and the integral reverts to a sum of spherical waves originating at the positions rj only.

Fig. 13.4 Successive slabs in the medium produce delayed dipolar fields (top). In an extended medium (below) the accumulated delays produce a field that appears to have a wavelength λ/n (and phase change per unit length is nk), and propagates at an effective speed c/n, where n is the refractive index.
Next, we consider adding more slabs, as in Fig. 13.4(top), until we reach the limit of a continuous three-dimensional medium. Each successive slice produces a dipolar field that is further delayed relative to the driving field, such that the resulting wave is effectively compressed, Fig. 13.4(bottom). The total field at position z + δz is a sum of the incident field, $E^{(z)}$, plus the dipolar field, eqn (13.8), modified by a propagation phase, $e^{ik\delta z} \simeq 1 + ik\,\delta z$, as light traverses the slab. To first order in the slab thickness, δz, we obtain
\[ E^{(z+\delta z)} \simeq \left(1 + ik\,\delta z + \frac{1}{2}ik\chi\,\delta z\right)E^{(z)} , \tag{13.9} \]
which can be rewritten as
\[ \frac{E^{(z+\delta z)} - E^{(z)}}{\delta z} = i\left(1 + \frac{1}{2}\chi\right)kE^{(z)} . \tag{13.10} \]
Next, we sum the forward propagating fields from successive slabs by integrating along the z axis from 0 to z:
\[ E^{(z)} = E^{(0)}\exp\left[i\left(1 + \frac{1}{2}\chi\right)kz\right] . \tag{13.11} \]
If we write the susceptibility in terms of real and imaginary parts, χ = χ′ + iχ″, we have
\[ E^{(z)} = E^{(0)}\exp\left[i\left(1 + \frac{1}{2}\chi'\right)kz\right]\exp\left(-\frac{1}{2}\chi'' kz\right) . \tag{13.12} \]
This expression contains two important results. First, the phase of the propagating wave evolves with a phase change per unit length of k′ = nk, where
\[ n = 1 + \frac{1}{2}\chi' = 1 + \frac{N\alpha'}{2\epsilon_0} , \tag{13.13} \]
is the refractive index.5 The refractive index is related to the real part of the polarizability, α′, or equivalently, the real part of the susceptibility, χ′. Secondly, the amplitude of the wave is attenuated at a rate, kχ″/2. Hence, the real and imaginary parts of the polarizability (or susceptibility) determine the refractive and scattering properties of the medium, respectively. The distinguishing feature of the real and imaginary parts is whether the dipole oscillates in phase, quadrature, or anti-phase with the incident field. The real part oscillates predominantly either in phase or anti-phase (with a π phase lag) relative to the driving field. The imaginary part oscillates in quadrature (with a π/2 phase lag) relative to the driving field, see Fig. 13.5.

5 In Exercise 13.2 we consider whether it makes sense to define a refractive index over a distance scale less than a wavelength. For most transparent optical media, like glass or water, the light frequency is less than the resonance frequency, ω < ω0, α′ > 0, and the refractive index is greater than one.

Fig. 13.5 (i) The phase ‘lag’ of the dipole as a function of the detuning, Δ, between the light frequency, ω/(2π), and the resonance frequency, ω0/(2π). (ii) The modulus of the polarizability.

The above derivation gives an important insight into the apparent change in the speed of light, v = c/n, inside a medium. Light still propagates at c but interference between the incident field and the phase-shifted induced dipolar field, eqn (13.9), results in a wave with a modified wavelength and a different apparent propagation speed.
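The passage from the single-slab update, eqn (13.9), to the exponential solution, eqn (13.11), can be illustrated by stacking many thin slabs numerically. This is a sketch under assumed values: the susceptibility, propagation distance and number of slabs below are arbitrary illustrative numbers, and lengths are measured in units of the vacuum wavelength.

```python
import cmath
import math

def propagate_through_slabs(chi, k, z, n_slabs):
    """Apply E -> (1 + i k dz + i k chi dz / 2) E once per thin slab."""
    dz = z / n_slabs
    step = 1.0 + 1j * k * dz + 0.5j * k * chi * dz   # first-order update per slab
    field = 1.0 + 0.0j                               # E(0) = 1
    for _ in range(n_slabs):
        field *= step
    return field

k = 2.0 * math.pi            # vacuum wavenumber for unit wavelength
chi = 0.02 + 0.004j          # illustrative dilute susceptibility, |chi| << 1
z = 5.0                      # five vacuum wavelengths

numeric = propagate_through_slabs(chi, k, z, n_slabs=200_000)
analytic = cmath.exp(1j * (1.0 + 0.5 * chi) * k * z)

print(abs(numeric - analytic))                       # shrinks as n_slabs grows
print(1.0 + 0.5 * chi.real)                          # refractive index n
print(abs(analytic), math.exp(-0.5 * chi.imag * k * z))  # amplitude attenuation
```

Because each step is only first order in δz, the agreement improves in proportion to the number of slabs; the residual difference is a discretization effect rather than a property of the medium.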
Example 13.1
The Bouguer–Beer–Lambert law: The second important result in eqn (13.12) concerns the imaginary part of the polarizability or susceptibility. The imaginary part modifies the amplitude of the wave (via extinction in the forward direction). The intensity $I = \frac{1}{2}\epsilon_0|E|^2$ as a function of propagation distance is
\[ I(z) = I_0\,e^{-k\chi'' z} , \tag{13.14} \]
where kχ″ is known as the extinction or attenuation coefficient, and I0 is the incident intensity at z = 0. The law, which says that the light intensity falls exponentially as illustrated in Fig. 13.6, was first discovered by Pierre Bouguer (Croisic 1698–Paris 1758) in his 1729 Essai d’optique sur la gradation de la lumière, and later by Johann Heinrich Lambert (Mulhouse 1728–Berlin 1777) and August Beer (Trier 1825–Bonn 1863). In addition to attenuation due to scattering, many media also exhibit absorption, where light is removed and converted into some other form. In this case, the total attenuation coefficient is given by a sum of the scattering and absorption coefficients.
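As a small worked illustration of eqn (13.14), the sketch below computes the transmitted fraction I(z)/I0 and the distance over which the intensity falls by a factor e. The numerical values of χ″ and the wavelength are hypothetical, chosen only to give round numbers, and the function names are not from the text.

```python
import math

def transmitted_fraction(chi_imag, wavelength, z):
    """I(z)/I0 = exp(-k chi'' z) with k = 2 pi / wavelength."""
    k = 2.0 * math.pi / wavelength
    return math.exp(-k * chi_imag * z)

def attenuation_length(chi_imag, wavelength):
    """Distance over which the intensity falls to 1/e: 1 / (k chi'')."""
    k = 2.0 * math.pi / wavelength
    return 1.0 / (k * chi_imag)

chi_imag = 1.0e-6            # hypothetical imaginary part of the susceptibility
wavelength = 780e-9          # metres, illustrative value

length = attenuation_length(chi_imag, wavelength)
print(length)                                              # 1/e length in metres
print(transmitted_fraction(chi_imag, wavelength, length))  # exp(-1)
print(transmitted_fraction(chi_imag, wavelength, 2 * length))
```

Doubling the path length squares the transmitted fraction, which is the defining property of exponential attenuation.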
Fig. 13.6 Photograph of light attenuation via scattering. The light intensity decreases exponentially with distance, illustrating the Bouguer–Beer–Lambert law, eqn (13.14).

13.3 Ewald–Oseen extinction

Up to now we have neglected the backward-propagating field shown in Fig. 13.1, which is only a good approximation when the total dipolar field (summed over all dipoles) is much less than the incident field, |χ| ≪ 1. In this section, we include the backward-propagating field and derive a more accurate expression for the refractive index in terms of the microscopic properties.

The geometry of forward- and backward-propagating waves for plane wave illumination, $E_i^{(z)} = E_0 e^{ikz}$, of a homogeneous medium is illustrated in Fig. 13.7. From eqn (13.8), the contribution to the dipolar field in a plane at z due to a slab at z′ is
\[ \delta E_d^{(z,z')} = \frac{1}{2}i\chi k\,\delta z'\,E^{(z')}\,e^{ik(z-z')} , \]
where $E^{(z')}$ is the total field in the z′ plane (including the dipolar fields from other slabs) and $e^{ik(z-z')}$ is the propagation phase.6 The sum of the forward- and backward-propagating dipolar fields is
\[ E_d^{(z)} = i\frac{k\chi}{2}\int_0^z dz'\,E^{(z')}\,e^{ik(z-z')} + i\frac{k\chi}{2}\int_z^\infty dz'\,E^{(z')}\,e^{-ik(z-z')} , \tag{13.15} \]
and the total field in plane z is a sum of the incident field plus the induced dipole fields,
\[ E^{(z)} = E_i^{(z)} + E_d^{(z)} . \tag{13.16} \]

6 For an infinite slab the near-field angular terms in the dipolar field average to zero. A derivation including the near-field terms can be found in Fearn et al. (1996).

Fig. 13.7 The forward- and backward-propagating fields in an extended medium. The field experienced by a dipole in any plane is the sum of the incident field plus the forward- and backward-propagating dipolar fields.

The resulting propagating field can be written as a plane wave with amplitude, E0′, and phase per unit length, k′, i.e. $E^{(z)} = E_0' e^{ik'z}$. Substituting this into eqn (13.15) and integrating along z′ over the whole extent of the medium, we obtain an expression for the dipole component,
\[ E_d^{(z)} = i\frac{k\chi}{2}\,e^{ikz}\int_0^z dz'\,E_0'\,e^{i(k'-k)z'} + i\frac{k\chi}{2}\,e^{-ikz}\int_z^\infty dz'\,E_0'\,e^{i(k'+k)z'} \]
\[ \phantom{E_d^{(z)}} = \frac{k\chi E_0'}{2(k'-k)}\,e^{ikz}\left[e^{i(k'-k)z} - 1\right] - \frac{k\chi E_0'}{2(k'+k)}\,e^{-ikz}\,e^{i(k'+k)z} , \]
\[ \phantom{E_d^{(z)}} = \frac{k\chi E_0'}{2(k'-k)}\left(e^{ik'z} - e^{ikz}\right) - \frac{k\chi E_0'}{2(k'+k)}\,e^{ik'z} , \tag{13.17} \]
where we have assumed that there is no contribution from the limit z′ → ∞. Substituting this result into eqn (13.16) and adding the incident field we obtain a total field,
\[ E_0'\,e^{ik'z} = \left[E_0 - \frac{k\chi E_0'}{2(k'-k)}\right]e^{ikz} + \frac{k^2\chi E_0'}{(k'^2 - k^2)}\,e^{ik'z} . \tag{13.18} \]
Equating terms, we find that
\[ \frac{k^2\chi}{(k'^2 - k^2)} = 1 , \tag{13.19} \]
which gives the ratio of the new to old angular spatial frequency,
\[ \frac{k'}{k} = \sqrt{1+\chi} = n , \tag{13.20} \]
where n is the refractive index.7 This result agrees with our previous result, eqn (13.13), when the dipolar field is small, |χ| ≪ 1. From the $e^{ikz}$ term we get
\[ E_0' = \frac{2E_0(k'-k)}{k\chi} , \tag{13.21} \]
which by substituting for χ, we can rearrange as
\[ E_0' = \frac{2E_0(k'-k)}{(k'^2-k^2)/k} = \frac{2E_0}{k'/k+1} = \frac{2E_0}{n+1} . \tag{13.22} \]
The reflected field is
\[ E_r = E_0 - \frac{2E_0}{n+1} = \frac{n-1}{n+1}\,E_0 . \tag{13.23} \]
This is the same as the Fresnel reflection coefficient for normal incidence, see Chapter 2. Finally, substituting in eqn (13.18) we find that the induced dipolar field is
\[ E_d^{(z)} = -E_0\,e^{ikz} + \frac{2}{n+1}\,E_0\,e^{inkz} . \tag{13.24} \]

7 Note that in macroscopic theory the relationship between refractive index and susceptibility is obtained trivially by definition. The speed of light, c/n, is assumed to arise due to the polarization induced by the incident field, P = χε0E. Using D = εrε0E = ε0E + P and rearranging for P, we find P = (εr − 1)ε0E = (n² − 1)ε0E, where for a non-magnetic medium with μr = 1, n = √εr, we obtain χ = n² − 1, as in the microscopic theory.
8 Paul Peter Ewald (Berlin 1888–Ithaca 1985), Carl Wilhelm Oseen (Lund 1879–Uppsala 1944).

This remarkable result, known as the Ewald–Oseen extinction theorem,8 states that the dipole field consists of two terms:
(1) The first term exactly cancels the incident field; and
(2) The second term is an equivalent wave with amplitude 2E0/(n + 1) that propagates with angular wave vector k′ = nk.
13.4 Clausius–Mossotti

So far we have assumed that the atoms or molecules in the medium do not interact except indirectly via the propagating field. In this case, the microscopic response, characterized by the polarizability, and the macroscopic response, characterized by the susceptibility, are linearly related. However in a dense medium, the near-field part of the induced dipolar field, see Section A.3, may begin to influence the light–matter interaction.

Consider a small spherical void inside a homogeneous dielectric, as in Fig. 13.8. If the polarization P in the medium is uniform and aligned with the z axis then using Gauss’ law we can say that the charge density on the surface of the void, σ, is given by
\[ \int \mathbf{E}\cdot d\mathbf{S} = \frac{1}{\epsilon_0}\int \sigma\,dS . \]
For an annulus of charge at an angle θ we have
\[ \frac{P}{\epsilon_0}\cos\theta\,dS = -\frac{\sigma}{\epsilon_0}\,dS , \]
so σ = −P cos θ. The area of the annulus is r dθ 2πr sin θ. Using Coulomb’s law to sum the z-component of the field at the centre of the void due to surface charge, we find
\[ E_z = \frac{1}{4\pi\epsilon_0 r^2}\int_0^\pi d\theta\,\sigma\,2\pi r^2\sin\theta\cos\theta = \frac{P}{3\epsilon_0} . \]

Fig. 13.8 The back action of dipoles on each other is modelled as the mean field at the centre of a void due to the surrounding medium.

The total field at a particular location, called the ‘local’ field, is equal to the sum of the incident field plus the dipolar field produced by all the other dipoles,
\[ E_{\rm loc} = E + \frac{P}{3\epsilon_0} , \tag{13.25} \]
where P = N d is the polarization density. The susceptibility determines the bulk response, P = ε0χE, whereas the polarizability determines the local response, P = NαEloc. Substituting for E and P we find that
\[ \chi = \frac{N\alpha/\epsilon_0}{1 - \frac{1}{3}N\alpha/\epsilon_0} . \tag{13.26} \]
This equation relates the macroscopic variable χ to the microscopic (or single atom) parameter α and is known as the Lorentz–Lorenz law.9 If we rewrite χ in terms of the refractive index χ = n² − 1, eqn (13.20), we find how the refractive index varies with number density N,
\[ n^2 = 1 + \frac{N\alpha/\epsilon_0}{1 - \frac{1}{3}N\alpha/\epsilon_0} . \tag{13.27} \]

9 Derived by Ludwig Lorenz (Helsingør 1829–Copenhagen 1891) in 1869 and Hendrik Anton Lorentz (Arnhem 1853–Haarlem 1928) in 1880. Lorentz won the Nobel Prize in 1902 together with Pieter Zeeman (Zonnemaire 1865–Amsterdam 1943) for the discovery and explanation of the Zeeman effect, and is also known for the Lorentz transformation.

At low density the second term is small and we recover the dilute result, eqn (13.13), whereas at higher density the term in the denominator begins to play a role. We might worry that when ⅓Nα/ε0 = 1 we have a singularity, but this never happens because there is an upper limit on N imposed by the finite size of the constituents.
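To see how quickly the local-field correction matters, the dilute expression, eqn (13.13), can be compared with the Lorentz–Lorenz form, eqn (13.27), as the dimensionless density parameter Nα/ε0 grows. This is a minimal sketch for a real polarizability; the function names and the sample values of Nα/ε0 are illustrative.

```python
import math

def n_dilute(x):
    """Refractive index from eqn (13.13), with x = N alpha / eps0 (real)."""
    return 1.0 + 0.5 * x

def n_lorentz_lorenz(x):
    """Refractive index from eqn (13.27); valid for x < 3."""
    return math.sqrt(1.0 + x / (1.0 - x / 3.0))

for x in (1e-3, 1e-1, 1.0):
    print(f"N alpha/eps0 = {x:7.3f}  dilute n = {n_dilute(x):.6f}  "
          f"Lorentz-Lorenz n = {n_lorentz_lorenz(x):.6f}")
```

For small Nα/ε0 the two expressions agree closely, and they separate visibly once Nα/ε0 approaches unity.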
To see how the polarizability relates to molecular ‘size’, we rearrange the Lorentz–Lorenz law, eqn (13.27), in the form
\[ \frac{n^2-1}{n^2+2} = \frac{1}{3}\,\frac{N\alpha}{\epsilon_0} . \tag{13.28} \]
This is known as the Clausius–Mossotti relationship:10 For a uniform medium, the polarizability per unit volume, V, is
\[ \frac{\alpha}{V} = 3\epsilon_0\,\frac{n^2-1}{n^2+2} . \tag{13.29} \]
For a uniform sphere of radius, a, we simply integrate this value over the sphere to find that the polarizability is
\[ \alpha = (4\pi\epsilon_0)\,\frac{n^2-1}{n^2+2}\,a^3 . \tag{13.30} \]
So large n corresponds to $N = 1/(\frac{4}{3}\pi a^3)$, where the molecules effectively fill all the available volume.11

10 Developed independently by Ottaviano-Fabrizio Mossotti (Novara 1791–Pisa 1863) in 1850 and Rudolf Clausius (Koslin 1822–Bonn 1888) in 1879.
11 The case of resonance is more complex. Whereas the polarizability of dielectric spheres corresponds to an effective volume that is always smaller than their physical size, the resonant polarizability of atoms, α = (4πε0)(3/2)(λ/2π)³, indicates a volume larger than their physical size. For this reason strong dipole–dipole interactions are possible close to resonance and we can no longer treat the dipole–dipole interaction by simply adding a mean field, as we did in eqn (13.25), see e.g. Bettles et al. (2016).

13.5 Kramers–Kronig

Imposing causality (there should be no output before there is any input12) places strong constraints on the amplitude and phase of Fourier
components, and allows us to derive a relationship between the real and imaginary parts of the electric susceptibility, χ(ω), or the refractive index, $n(\omega) = \sqrt{1 + \chi(\omega)}$. Imagine a signal f(t) that turns on at time t = 0, i.e. f(t) = 0 for t < 0, see Fig. 13.9. This function is unchanged under multiplication by the Heaviside function,
\[ \mathrm{heavy}(t) = \begin{cases} 0 & t < 0 \\ 1 & t \geq 0 \end{cases} , \tag{13.31} \]
i.e. f(t) = f(t)heavy(t). We may employ the convolution theorem to find the equivalent of this invariance in the frequency domain. Using eqn (B.28) we can write
\[ F(\omega) = F(\omega) * \mathcal{F}[\mathrm{heavy}(t)] . \tag{13.32} \]

12 It is difficult to define causality precisely, but the meaning is unambiguous: effect cannot precede the cause.

Fig. 13.9 A signal f(t) that begins at t = 0 is unchanged when multiplied by the Heaviside function, heavy(t).

By writing the Heaviside function in terms of the signum function, heavy(t) = ½[1 + sgn(t)], it is easy to show that the Fourier transform is
\[ \mathcal{F}[\mathrm{heavy}(t)] = \pi\delta(\omega) + \frac{1}{i\omega} . \tag{13.33} \]
Using this result in eqn (13.32) and remembering the 1/(2π) factor in the convolution integral for angular frequencies, eqn (B.29), we obtain
\[ F(\omega) = \frac{1}{i\pi}\int_{-\infty}^{\infty}\frac{F(\omega')\,d\omega'}{\omega' - \omega} . \tag{13.34} \]
This integral is known as a Hilbert transform, named after David Hilbert (Königsberg 1862–Göttingen 1943). The invariance to multiplication by a Heaviside function in the time domain maps into an invariance to a Hilbert transform in the frequency domain. This result holds for all passive linear systems.

By separating the spectrum into real and imaginary parts using F(ω) = F′(ω) + iF″(ω) and equating real and imaginary parts, see Exercise 13.7, we obtain the Kramers–Kronig relations,13 which for the susceptibility χ(ω) = χ′(ω) + iχ″(ω), relate the real and imaginary parts as follows:
\[ \chi'(\omega) = \frac{2}{\pi}\int_0^\infty \frac{\omega'\chi''(\omega')}{\omega'^2 - \omega^2}\,d\omega' , \tag{13.35} \]
and the imaginary part of the susceptibility is obtained from the spectrum of the real part,
\[ \chi''(\omega) = -\frac{2\omega}{\pi}\int_0^\infty \frac{\chi'(\omega')}{\omega'^2 - \omega^2}\,d\omega' . \tag{13.36} \]
The Kramers–Kronig relations allow us to find the real part of the frequency-dependent susceptibility, χ′(ω), from knowledge of the spectrum of the imaginary part, χ″(ω′), and vice versa.

13 Hendrik Anthony Kramers (Rotterdam 1894–Oegstgeest 1952); Ralph Kronig (Dresden 1904–Zeist 1995).

Equations (13.35) and (13.36) can be difficult to work with and it is easier to work with the mathematically equivalent Hilbert transform, eqn (13.34).14 If we define the Hilbert transform of a spectrum F(ω) as F̂(ω) then for the susceptibility using eqn (13.34), we can write that
\[ \chi'(\omega) = -\hat{\chi}''(\omega) . \tag{13.37} \]

14 The Hilbert transform is a standard function in many signal-processing software packages and takes only a fraction of a millisecond to calculate, compared to tens of seconds for an equivalent Kramers–Kronig computation (Whittaker et al. 2015).

Fig. 13.10 (i) The Hilbert transform (black dots) calculated from the χ̃″ data shown in (ii). The Hilbert transform result is compared to theoretically expected susceptibility χ̃′ (grey line). The range of good agreement is set by the range of χ̃″ data. For convenience, we have plotted the normalized susceptibilities, χ̃ = χ/max(χ″).

From an experimental measurement of the transmission spectrum, we can calculate χ″(ω), and then use the Hilbert transform eqn (13.34) to calculate the concomitant frequency-dependent refractive index (or the dispersion relation), as illustrated in Fig. 13.10; see Whittaker et al. (2015) for further details.
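The link between absorption and dispersion can be tested on a case where both parts are known in closed form. The sketch below assumes a near-resonance Lorentzian response with the lineshape of eqn (13.3), written in terms of the detuning in units of Γ and normalized so that χ = −1/(Δ + iΓ/2). It reconstructs χ′ from χ″ using the standard two-sided principal-value relation, χ′(Δ) = (1/π) P∫ χ″(Δ′)/(Δ′ − Δ) dΔ′, evaluated by brute force on a grid staggered about the singular point. The function names, the integration window and the step size are illustrative choices, and this direct sum is not the fast Hilbert-transform routine mentioned in the text.

```python
import math

GAMMA = 1.0   # linewidth; detunings are measured in the same units

def chi_real(delta):
    """Closed-form real part of chi = -1 / (delta + i GAMMA / 2)."""
    return -delta / (delta**2 + (GAMMA / 2.0)**2)

def chi_imag(delta):
    """Closed-form imaginary part of the same Lorentzian response."""
    return (GAMMA / 2.0) / (delta**2 + (GAMMA / 2.0)**2)

def chi_real_from_imag(delta, half_width=200.0, step=0.01):
    """(1/pi) P-integral of chi''(d') / (d' - delta) over a symmetric window.

    Sample points sit at delta + (j + 1/2) * step, so they are symmetric about
    the pole and the divergent contributions cancel in pairs.
    """
    n = int(half_width / step)
    total = 0.0
    for j in range(-n, n):
        offset = (j + 0.5) * step
        total += chi_imag(delta + offset) / offset
    return total * step / math.pi

for delta in (-2.0, 0.5, 3.0):
    print(delta, chi_real(delta), chi_real_from_imag(delta))
```

The reconstructed values track the closed-form dispersion curve on both sides of resonance; the quality of the agreement is limited by the finite integration window, much as the text notes for measured χ″ data.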
13.6 Point-like scatterers

The opposite extreme of plane-wave illumination of an infinite homogeneous medium is a single dipole or ‘point-like’ scatterer (with size less than the wavelength). Here, we explore the effect of the spatial overlap (mode matching) between the incident field and the induced dipolar field, and how this leads to the concept of an extinction cross section. Consider a dipole driven by a laser beam propagating along z, polarized along y,15 and with a beam waist in the z = 0 plane. Using eqn (11.7) the incident electric field is

$$E_i^{(z)} = -i E_0 \frac{z_R}{q}\, e^{ikz} e^{ik\rho^2/2q} , \tag{13.38}$$

where $q = z - i z_R$ and $z_R = \pi w_0^2/\lambda$ is the Rayleigh range. In the far field, $z > z_R$, this becomes

$$E_i^{(z)} = -i E_0 \frac{z_R}{z}\, e^{ikz} e^{ik\rho^2/2z} e^{-\rho^2/w^2} . \tag{13.39}$$

Here the factor of $-i = e^{-i\pi/2}$ (the Gouy phase, see Chapter 11) means that the phase of the focused gaussian beam is ahead of a plane-wave solution by a quarter of a cycle. The effect of the laser is largest on resonance, i.e. when ω = ω0. In this case, the polarizability, $\alpha = i2D_0^2/(\hbar\Gamma)$, is imaginary, meaning that the dipole lags behind the driving field by π/2 radians. Using $\Gamma = k^3 D_0^2/(3\pi\epsilon_0\hbar)$, we obtain $\alpha = i6\pi\epsilon_0/k^3$, and then using $d = \alpha E_0$ we can write the dipolar field (for a dipole oscillating along y) as

$$E_d^{(z)} = \frac{d k^2}{4\pi\epsilon_0}\frac{e^{ikr}}{r}\sin^2\alpha = i\frac{3}{2}\frac{1}{kr} E_0 e^{ikr}\sin^2\alpha , \tag{13.40}$$

where α is the angle relative to the y axis. The superposition of the incident field and the dipolar field (close to the z axis) is

$$E^{(z)} = E_i^{(z)} + E_d^{(z)} = i E_0\left[\frac{3}{2}\frac{e^{ikr}}{kr} - \frac{z_R}{z}\, e^{ikz} e^{ik\rho^2/2z} e^{-\rho^2/w^2}\right] . \tag{13.41}$$

15 Note that the scalar gaussian beam is a paraxial solution to the scalar Helmholtz equation and, as we saw in Chapter 12, is not an exact solution of Maxwell’s equations in the strong focusing limit desirable for strong light–matter coupling. However, the paraxial model still illustrates the underlying physics.

Fig. 13.11 Intensity maps in the yz plane for: the radiation pattern of a single dipole polarized along y (top); a focused laser beam propagating along z (middle); and their sum (bottom). Note that the dipole also emits in the x direction, see Example A.2.
In the forward direction, the driving field and the dipolar field are exactly out of phase and interfere destructively. This extinction in the forward-scattering direction is illustrated in the xz plane in Fig. 13.11. In the reverse direction the fields interfere constructively, so we can say that the incident field is ‘reflected’ by the dipole. The ‘quality’ of the extinction in the forward direction (or single-dipole reflectivity) depends on the geometrical overlap (mode matching) between the incident field and the dipolar field.16

16 Because of the three-dimensional nature of the dipolar field, see Example A.2, it is difficult to achieve an extinction of more than 10% in free space, see Tey et al. (2008) and Hwang et al. (2009). Higher extinctions are possible either by using a cavity, see e.g. Durak et al. (2014) where an anaclastic lens, Section 2.15, is used to focus the incident light, or by adding more dipoles in particular geometries, see Bettles et al. (2016).

13.6.1 Scattering cross section

From Fig. 13.11, it is apparent that a single dipole behaves like a mirror for resonant light, and we can ask: what is the effective ‘area’ of this mirror? The ‘area’, known as the scattering cross section, σ, follows from how much flux is removed from the incident beam, and is defined as the power radiated by the dipole divided by the incident intensity,

$$\sigma = \frac{P}{I_0} , \tag{13.42}$$

where the power radiated is equal to the integral of the Poynting vector over a sphere centred on the dipole: $P = \int \boldsymbol{S}\cdot d\boldsymbol{A}$. The Poynting vector associated with $\boldsymbol{E}_d$ is $\boldsymbol{S} = c\epsilon_0 \boldsymbol{E}_d \times (\hat{\boldsymbol{r}} \times \boldsymbol{E}_d)$. Using the far-field form, $\boldsymbol{E}_d = d\,[k^2/(4\pi\epsilon_0 r)]\sin\theta\,\hat{\boldsymbol{\theta}}$, derived in Appendix A, and including the factor of 1/2 from the time average, we obtain17

$$P = \frac{1}{2} c\epsilon_0 \frac{|d|^2 k^4}{(4\pi\epsilon_0)^2}\int_0^{\pi} 2\pi\sin^3\theta\, d\theta = \frac{|d|^2 c k^4}{3\,(4\pi\epsilon_0)} . \tag{13.43}$$

17 The angular integral is

$$\int_0^{\pi} 2\pi\sin^3\theta\, d\theta = -2\pi\int_0^{\pi}(1 - \cos^2\theta)\, d(\cos\theta) = -2\pi\left[\cos\theta - \frac{1}{3}\cos^3\theta\right]_0^{\pi} = \frac{8\pi}{3} .$$

This derivation can also be found in texts on electromagnetism, e.g. Jackson (1999).
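The angular integral behind the radiated power can be checked numerically. The following is a minimal sketch using a midpoint rule; the function name `angular_integral` and the step count are choices made here for illustration.

```python
import math

def angular_integral(steps=2000):
    """Midpoint-rule estimate of the integral of 2*pi*sin^3(theta) from 0 to pi."""
    d_theta = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        total += 2 * math.pi * math.sin(theta) ** 3
    return total * d_theta

numeric = angular_integral()
exact = 8 * math.pi / 3          # closed-form value of the dipole angular integral
print(numeric, exact)
```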
In the language of particles, we would say that the incoming photons are scattered by the dipole; or as the dipole radiates, it removes energy from the beam via scattering. Substituting for P in eqn (13.42), using $I_0 = \frac{1}{2}c\epsilon_0 E_0^2$ and $|d| = |\alpha|E_0$, we find that the scattering cross section is

$$\sigma = \frac{8\pi}{3}\frac{|\alpha|^2}{(4\pi\epsilon_0)^2}\, k^4 . \tag{13.44}$$
This is a useful result in two regimes. Firstly, when the light frequency
is far off resonance, where the wavelength dependence explains why the
sky is blue, and secondly on resonance.
Example 13.2
Off-resonance scattering: the blue sky: Many atoms or molecules have resonances in the ultra-violet, so for visible light, the light frequency is much less than the resonance frequency, $\omega \ll \omega_0$. This is the case for light travelling through air or other transparent media such as water or glass. In this case the polarizability, eqn (C.12), becomes frequency independent, $\alpha \simeq 2D_0^2/(\hbar\omega_0)$, see eqn (C.16) in Appendix C, and the scattering cross section, eqn (13.44), is inversely proportional to the fourth power of the wavelength,

$$\sigma = \frac{8\pi}{3}\frac{|\alpha|^2}{(4\pi\epsilon_0)^2}\left(\frac{2\pi}{\lambda}\right)^4 = \frac{8\pi^3|\alpha|^2}{3\epsilon_0^2}\frac{1}{\lambda^4} . \tag{13.45}$$

This is known as Rayleigh scattering and is responsible for the blue sky. Shorter wavelength (e.g. blue light) is scattered more than longer wavelength (red light), as in Fig. 13.12. If there are N molecules per unit volume (and we can neglect intermolecular interactions) then the light intensity after propagating a distance z through the atmosphere is $I = I_0 e^{-N\sigma z}$. For white light, the directly transmitted component contains more red than blue, which is particularly apparent when we look at the Sun at sunrise or sunset (see end-of-chapter exercise).
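To get a feel for the size of the wavelength dependence in eqn (13.45), the sketch below compares blue and red light. The two wavelengths are illustrative values, and the function names are introduced here; the absolute cross section is not computed because it depends on the molecular polarizability.

```python
import math

def rayleigh_ratio(lambda_short, lambda_long):
    """Ratio sigma(short)/sigma(long) implied by the 1/lambda^4 law, eqn (13.45)."""
    return (lambda_long / lambda_short) ** 4

def transmitted_fraction(number_density, cross_section, distance):
    """Directly transmitted fraction I/I0 = exp(-N * sigma * z)."""
    return math.exp(-number_density * cross_section * distance)

blue, red = 0.45e-6, 0.65e-6   # illustrative wavelengths in metres
ratio = rayleigh_ratio(blue, red)
print(f"blue light is scattered about {ratio:.1f} times more strongly than red")
```

Given a cross section for one colour, `transmitted_fraction` can then be used to compare the directly transmitted red and blue components over a chosen path length.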
Example 13.3
Resonant scattering: The other interesting case is on resonance. Substituting the resonant polarizability, eqn (C.12) with Δ = 0, giving $\alpha = i6\pi\epsilon_0 k^{-3}$, in eqn (13.44), we find that the resonant cross section (the effective area removed from the incident beam due to destructive interference in the forward direction) is

$$\sigma = \frac{3\lambda^2}{2\pi} . \tag{13.46}$$

This is a surprisingly simple result, and says that the resonance cross section does not depend on the dipole moment or the scattering rate; it only depends on the wavelength. As we shall see in Example 13.13, the same result holds for any point-like scatterer with dimensions much less than the wavelength. However, if the scattered field is isotropic then there is less overlap between the incident field and the dipolar field in the forward direction, and we obtain a smaller cross section, $\sigma = \lambda^2/2\pi$.

Fig. 13.12 Sunlight propagating from left to right contains red (light grey) and blue (dark grey) photons. The blue sees a larger scattering cross-section, leaving a larger component of red in the directly transmitted light.
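The substitution leading to eqn (13.46) can be checked numerically. This is a minimal sketch; the wavelength is an arbitrary example value, and the function name `cross_section` is introduced here.

```python
import math

EPSILON_0 = 8.8541878128e-12   # vacuum permittivity in F/m

def cross_section(alpha_abs, k):
    """Scattering cross section of eqn (13.44) for polarizability magnitude |alpha|."""
    return (8 * math.pi / 3) * alpha_abs ** 2 / (4 * math.pi * EPSILON_0) ** 2 * k ** 4

wavelength = 780e-9                                 # assumed example wavelength in metres
k = 2 * math.pi / wavelength
alpha_resonant = 6 * math.pi * EPSILON_0 / k ** 3   # |alpha| on resonance
sigma = cross_section(alpha_resonant, k)
print(sigma, 3 * wavelength ** 2 / (2 * math.pi))
```

Changing `wavelength` shows that the two numbers track each other, as expected from a result that depends only on λ.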
Example 13.4
The optical theorem: A more general result for the scattering cross section that does not assume anything about dipoles, and also applies to quantum scattering of massive particles, is known as the optical theorem. Consider a point-like scatterer in a plane wave, as shown in Fig. 13.13. The incident wave is a scalar monochromatic plane wave, propagating along the z axis, $E_i = E_0 e^{ikz}$ (neglecting the explicit time dependence). The scatterer behaves like a point source that, in the far field, radiates a spherical wave of the form

$$E_d = E_0 f(\theta)\frac{e^{ikr}}{r} , \tag{13.47}$$

where f(θ) contains information about the relative phase and angular distribution of the dipolar field, θ is the angle relative to the z axis, and we are assuming that the scattering is cylindrically symmetric.18 The total field is

$$E = E_0 e^{ikz} + E_0 f(\theta)\frac{e^{ikr}}{r} . \tag{13.48}$$

If we are only interested in ‘forward scattering’, in the paraxial regime we can make the following approximations, ($\theta \ll 1$; $x, y \ll z$), and write

$$\frac{E}{E_0} = e^{ikz}\left[1 + f(0)\frac{e^{ik\rho^2/2z}}{z}\right] , \tag{13.49}$$

where $\rho^2 = x^2 + y^2$. The normalized intensity is

$$\frac{I}{I_0} = 1 + f(0)\frac{e^{ik\rho^2/2z}}{z} + f^*(0)\frac{e^{-ik\rho^2/2z}}{z} + \ldots . \tag{13.50}$$

To find out how much light the dipole has removed we integrate over a disk with radius R in the far field ($R \ll z$). The integral of the quadratic phase factor gives

$$\frac{1}{z}\int_0^R e^{ik\rho^2/2z}\, 2\pi\rho\, d\rho = \frac{2\pi}{z}\left[\frac{e^{ik\rho^2/2z}}{ik/z}\right]_0^R = i\frac{2\pi}{k} ,$$

where we have assumed that $kR^2/2z \gg 2\pi$ such that the exponential oscillates so fast around ρ = R that we can take the average, which is zero. The integral gives

$$\frac{P}{I_0} = \pi R^2 + i\frac{2\pi}{k}\left[f(0) - f^*(0) + \ldots\right] . \tag{13.51}$$

Using $i[f(0) - f^*(0)] = -2\,\mathrm{Im}[f(0)]$ we can write

$$\frac{P}{I_0} = \pi R^2 - \frac{4\pi}{k}\mathrm{Im}[f(0)] . \tag{13.52}$$

This is known as the optical theorem. If $\mathrm{Im}[f(0)] > 0$ the effect of the dipole is to reduce the effective area of the beam by an amount

$$\sigma = \frac{4\pi}{k}\mathrm{Im}[f(0)] . \tag{13.53}$$

This is known as the optical or extinction cross section. The subtle part of this derivation is the factor of 1/i that crept in when we performed the integral; it is the same 1/i factor that arises in the Fresnel diffraction integral and the Gouy phase.

Resonant extinction: As above, we now apply this result to the case of a dipole driven on resonance. To find f(0) we use the equation for the induced dipolar field, eqn (13.40), with $d = \alpha E_0$ and $\alpha = i6\pi\epsilon_0/k^3$ on resonance, i.e.

$$E_d = i\frac{3}{2}\frac{1}{kr} E_0 e^{ikr} . \tag{13.54}$$

Comparing to eqn (13.47) we find $\mathrm{Im}[f(0)] = 3/2k$, and substituting in eqn (13.53)

$$\sigma = \frac{4\pi}{k}\frac{3}{2k} = \frac{3\lambda^2}{2\pi} , \tag{13.55}$$

as in Example 13.3.

18 If the scatterer is an atomic dipole, cylindrical symmetry only applies to the case of excitation using circularly polarized light.

Fig. 13.13 A point-like scatterer illuminated by a plane wave, $E_0 e^{ikz}$, generates a scattered field, $E_0 f(\theta)e^{ikr}/r$. The total field is the sum of the incident field and the scattered field.
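The step where the rapidly oscillating upper limit is replaced by its average can be illustrated numerically. In the sketch below the integral is evaluated as a function of its upper limit and averaged over whole cycles of the phase; the units (wavelength of 1) and the propagation distance are assumptions made for illustration.

```python
import cmath
import math

k = 2 * math.pi   # wavenumber for unit wavelength (assumed units)
z = 1000.0        # assumed propagation distance, in wavelengths

# With u = rho^2, (1/z) * int exp(i k rho^2 / 2z) 2 pi rho d rho
# becomes (pi/z) * int exp(i k u / 2z) du.
samples_per_cycle = 2000
cycles = 20
du = (4 * math.pi * z / k) / samples_per_cycle   # one phase cycle spans u = 4 pi z / k

running = 0j
tail = []
previous = math.pi / z                            # integrand at u = 0
for step in range(1, cycles * samples_per_cycle + 1):
    current = (math.pi / z) * cmath.exp(1j * k * step * du / (2 * z))
    running += 0.5 * (previous + current) * du    # trapezoid rule
    previous = current
    if step > (cycles - 10) * samples_per_cycle:
        tail.append(running)                      # keep the last ten full cycles

average = sum(tail) / len(tail)
print(average, 2j * math.pi / k)
```

The running integral itself keeps circling in the complex plane; only its cycle average settles to i2π/k, which is the value used in the derivation.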
13.7 The extinction paradox

In this section we look at wave scattering by an obstacle with dimensions larger than the wavelength. A surprising result is that the scattering cross section is larger than the physical cross section of the object, typically exactly two times larger.19 Consider an opaque disc with radius a that is uniformly illuminated at normal incidence by light with wavelength λ, where a > λ. What is the effective area removed when we observe the flux in the far field? Using Babinet’s principle, see Section 6.8, we can say that the light diffracted by the obstacle (the scattered field in Fig. 13.14) is

$$E_{\rm scatter}^{(z)} = -\frac{1}{i\lambda z}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E_0 e^{ikr}\, dx\, dy = -\frac{1}{i\lambda z} E_0 e^{ikr}\pi a^2 , \tag{13.56}$$

where the minus sign comes from Babinet’s principle and we have assumed that the phase variation across the aperture is negligible because the propagation distance is large. Comparing this expression to a point scatterer, $f(0)e^{ikr}/r$, using $1/r \approx 1/z$, we find that

$$f(0) = -\frac{1}{i\lambda}\pi a^2 , \tag{13.57}$$

and so

$$\sigma = \frac{4\pi}{k}\mathrm{Im}[f(0)] = 2\pi a^2 , \tag{13.58}$$

i.e. twice the cross-sectional area and hence twice the cross section predicted by geometric optics! Going back to Babinet’s principle, we can say that the light diffracted from the forward direction contributes πa² (i.e. the area of the equivalent complementary aperture) and the other πa² comes from the geometric shadow, i.e. light that is reflected or absorbed by the disc, as illustrated in Fig. 13.14. The same result applies in quantum scattering.

19 See e.g. Berg et al. (2011).

Fig. 13.14 An object with dimensions larger than the wavelength removes an amount of light from the forward propagating direction corresponding to twice its physical cross section, due to both attenuation and scattering.
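The factor of two follows directly from inserting eqn (13.57) into the optical theorem. A minimal numerical sketch, with an assumed wavelength and disc radius, is:

```python
import math

def extinction_cross_section(f0, k):
    """Optical theorem, eqn (13.53): sigma = (4*pi/k) * Im[f(0)]."""
    return 4 * math.pi / k * f0.imag

wavelength = 0.5e-6   # assumed wavelength in metres
radius = 50e-6        # assumed disc radius in metres, much larger than the wavelength
k = 2 * math.pi / wavelength

f0 = -math.pi * radius ** 2 / (1j * wavelength)   # forward amplitude, eqn (13.57)
sigma = extinction_cross_section(f0, k)
print(sigma / (math.pi * radius ** 2))            # ratio to the geometric area
```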
13.8 Metals
Spieglein, Spieglein an der Wand,
Wer ist die Schönste im ganzen Land.
Schneewittchen, Jacob (1785–1863) and Wilhelm Grimm (1786–1859).

Why is a metal shiny?20 In this section we shall look at the optical response of metals as a function of the frequency of the electromagnetic field. In a metal, electron motion generates a current which creates a source term in Maxwell equations (1.2)–(1.3). For linearly polarized light, we can rewrite the scalar wave equation (1.31) as

$$\nabla^2 E - \frac{1}{c^2}\frac{\partial^2 E}{\partial t^2} = \mu_0\frac{\partial J}{\partial t} , \tag{13.59}$$

where J is a current density associated with the motion of the electron or electrons. Firstly we consider free electrons with no damping, and then see how the damping changes the response. Consider a metal with N electrons per unit volume illuminated by light linearly polarized along x. The Lorentz force is $F = -eE = m\ddot{x}$,21 and we find that the source term is

$$\mu_0\frac{\partial J}{\partial t} = \mu_0\frac{\partial}{\partial t}(-Nev) = \frac{\mu_0 N e^2}{m}E = \frac{\omega_p^2}{c^2}E ,$$

where

$$\omega_p = \left(\frac{Ne^2}{m\epsilon_0}\right)^{1/2} , \tag{13.60}$$

is known as the angular plasma frequency. The wave equation becomes

$$\nabla^2 E - \frac{1}{c^2}\frac{\partial^2 E}{\partial t^2} = \frac{\omega_p^2}{c^2}E . \tag{13.61}$$

Substituting $E = E_0 e^{-i\omega t}$ we obtain a Helmholtz equation,

$$\nabla^2 E_0 + \frac{1}{c^2}(\omega^2 - \omega_p^2)E_0 = 0 . \tag{13.62}$$

From this expression we see that ωp corresponds to the light frequency at which the source field due to electron motion becomes as large as the driving field. For silver and copper, the plasma frequencies are approximately 2.2 × 10^15 Hz and 2.6 × 10^15 Hz, respectively, which are deep into the ultra-violet region of the spectrum (close to 100 nm), about four times larger than the frequency of visible light (red light with wavelength 600 nm has a frequency of 5 × 10^14 Hz). Consequently for visible light, ω < ωp.22

For a plane-wave solution of the form $E = E_0 e^{ik'z}$ we obtain

$$k' = \frac{1}{c}(\omega^2 - \omega_p^2)^{1/2} = nk , \tag{13.63}$$

where we have defined an effective refractive index,

$$n = \left(1 - \frac{\omega_p^2}{\omega^2}\right)^{1/2} . \tag{13.64}$$

For ω < ωp the refractive index is imaginary, $n = in''$ where $n'' = \omega_p/\omega$, which means that light is strongly attenuated over a characteristic length scale

$$\delta = \frac{1}{n''k} = \frac{c}{\omega_p} . \tag{13.65}$$

Note that this is not the same as the low-frequency skin depth, as we will show.

20 The simple answer is that light is reflected because the electron motion inside the metal cancels the field at the surface. However, the field does not decay instantaneously to zero. The main difference between metals and insulators is that the electrons are free and the electromagnetic field drives oscillatory currents rather than dipoles. For small nanoscale metallic samples the charges are bounded to the sample and a dipolar resonance does occur.

21 We have neglected the magnetic force, −ev × B, because for the plane wave in vacuum we found that E/B = c and we assume that electron speed, v, is much less than the speed of light, c. This assumption breaks down for heavy (high Z) metals such as gold where the electron speed can reach one-third of the speed of light. The relativistic corrections account for why gold is gold and not silver!

22 In this model, ω > ωp, which implies that the field due to the electrons is larger than the driving field. For plane waves this would violate energy conservation. In practice, this does not happen when we include damping.
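The magnitudes quoted above can be reproduced from eqn (13.60) and eqn (13.65). The sketch below assumes a free-electron density for copper of about 8.5 × 10^28 m^-3 (a typical handbook value, not taken from this chapter); the function name is introduced here for illustration.

```python
import math

E_CHARGE = 1.602176634e-19     # elementary charge in C
E_MASS = 9.1093837015e-31      # electron mass in kg
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity in F/m
C = 299_792_458.0              # speed of light in m/s

def plasma_angular_frequency(electron_density):
    """Angular plasma frequency, eqn (13.60), for N electrons per cubic metre."""
    return math.sqrt(electron_density * E_CHARGE ** 2 / (E_MASS * EPSILON_0))

n_copper = 8.5e28                      # assumed free-electron density of copper
omega_p = plasma_angular_frequency(n_copper)
f_p = omega_p / (2 * math.pi)          # plasma frequency in Hz
delta = C / omega_p                    # attenuation length, eqn (13.65)
print(f"f_p = {f_p:.2e} Hz, delta = {delta * 1e9:.0f} nm")
```

With this assumed density the frequency comes out close to the value quoted for copper, and the attenuation length is a few tens of nanometres.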
Example 13.5
Plasma dispersion: A plasma has well-characterized dispersion properties, which are important in the study of radio waves propagating in the atmosphere, and light propagating at high frequencies through metals. The plasma dispersion relation is

$$\omega^2 = k^2c^2 + \omega_p^2 . \tag{13.66}$$

Waves with angular frequencies ω > ωp propagate, whereas waves with angular frequencies ω < ωp are evanescent and decay. Recalling the definition of refractive index, ωn = kc, and substituting into eqn (13.66), we find that it has the functional form

$$n^2 = 1 - \left(\frac{\omega_p}{\omega}\right)^2 . \tag{13.67}$$

The phase velocity is given by

$$v_p = \frac{c}{n} = \frac{c}{\sqrt{1 - \omega_p^2/\omega^2}} , \tag{13.68}$$

which always exceeds c for waves with angular frequencies ω > ωp. Differentiating eqn (13.66) with respect to k allows us to calculate the group velocity,

$$2\omega\frac{d\omega}{dk} = 2c^2k , \qquad \therefore\ \frac{d\omega}{dk}\frac{\omega}{k} = c^2 , \qquad \therefore\ v_{gp}v_p = c^2 . \tag{13.69}$$

Figure 13.15 shows the functional form of the frequency dependence of the refractive indices of electromagnetic waves in a plasma. The product of group and phase velocity is equal to the speed of light squared; therefore the group velocity is always less than c for waves with angular frequencies ω > ωp.23

Fig. 13.15 The real, $n'$, and imaginary, $n''$, parts of the refractive index of a metal or plasma as a function of the angular frequency of the light in the vicinity of the angular plasma frequency, ωp.

23 The explicit expression for the group velocity is $v_{gp} = c\sqrt{1 - \omega_p^2/\omega^2}$.
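The relation between the two velocities is easy to tabulate. This minimal sketch evaluates eqn (13.68) and the group-velocity expression for a few frequencies above the plasma frequency; the frequency ratios are arbitrary illustrative choices.

```python
import math

C = 299_792_458.0   # speed of light in m/s

def phase_velocity(omega, omega_p):
    """Phase velocity in a plasma, eqn (13.68); valid for omega > omega_p."""
    return C / math.sqrt(1 - (omega_p / omega) ** 2)

def group_velocity(omega, omega_p):
    """Group velocity in a plasma; valid for omega > omega_p."""
    return C * math.sqrt(1 - (omega_p / omega) ** 2)

omega_p = 1.0                     # work in units of the plasma frequency
results = []
for ratio in (1.01, 1.5, 3.0, 10.0):
    v_p = phase_velocity(ratio * omega_p, omega_p)
    v_g = group_velocity(ratio * omega_p, omega_p)
    results.append((ratio, v_p / C, v_g / C))
    print(f"omega/omega_p = {ratio:5.2f}: v_p/c = {v_p / C:.3f}, v_g/c = {v_g / C:.3f}")
```

Close to the plasma frequency the phase velocity grows without limit while the group velocity tends to zero; far above it both approach c.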
Example 13.6
Drude model and skin depth: Missing from the above description is damping due to scattering by the lattice. In the Drude–Lorentz model the lattice imposes a friction-like force characterized by a phenomenological resistive damping rate γ, and the equation of electron motion becomes

$$-eE = m(\ddot{x} - \gamma\dot{x}) ,$$

where the friction coefficient is equal to the inverse of the damping time constant, γ = 1/τ.24 The addition of a damping term modifies the refractive index to, see Appendix C,

$$n^2 = 1 - \frac{\omega_p^2}{\omega^2 + i\omega\gamma} . \tag{13.70}$$

The damping rate for copper is γ/2π = 6.5 THz. For visible light, ω > γ, the free-electron refractive index is a reasonable approximation. For lower frequencies, including microwave, ω < γ, we obtain $n^2 \simeq i\omega_p^2/(\omega\gamma)$. Using $\sqrt{i} = \sqrt{e^{i\pi/2}} = e^{i\pi/4} = \frac{1}{\sqrt{2}}(1 + i)$, we find that

$$n = (1 + i)\left(\frac{\omega_p^2}{2\omega\gamma}\right)^{1/2} , \tag{13.71}$$

i.e. the real and imaginary parts are equal. The attenuation coefficient is given by the imaginary part times the magnitude of the wave vector,

$$n''k = \left(\frac{\omega_p^2}{2\omega\gamma}\right)^{1/2} k = \left(\frac{\omega_p^2\omega}{2\gamma c^2}\right)^{1/2} = \left(\frac{\sigma\omega}{2\epsilon_0c^2}\right)^{1/2} , \tag{13.72}$$

where we have used $\omega_p^2 = \sigma\gamma/\epsilon_0$ to write the index in terms of the conductivity. This gives a low-frequency skin depth,

$$\delta = \frac{1}{\kappa} = \left(\frac{2\epsilon_0c^2}{\sigma\omega}\right)^{1/2} . \tag{13.73}$$

The skin depth in copper reduces from 2 mm at 1 kHz, to 70 microns at 1 MHz, and 2 microns at 1 GHz. For this reason waveguides rather than wires are used in the microwave domain for frequencies above around 20 GHz.

24 The damping rate, γ, is related to the conductivity, σ. Consider the response in the low-frequency limit ω → 0, where the electron speed tends towards a steady-state drift with $\ddot{x} = 0$. In this case, $-eE = -m\gamma\dot{x}$ and the current density becomes $J = Ne\dot{x} = (Ne^2/m\gamma)E = \sigma E$, which gives $\sigma = Ne^2/m\gamma$, i.e. lower damping gives a higher conductivity.
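The skin depths quoted above follow from eqn (13.73). The sketch below assumes a room-temperature dc conductivity for copper of about 5.96 × 10^7 S/m (a typical handbook value, not given in this chapter); the function name is introduced here for illustration.

```python
import math

EPSILON_0 = 8.8541878128e-12   # vacuum permittivity in F/m
C = 299_792_458.0              # speed of light in m/s

def skin_depth(conductivity, frequency_hz):
    """Low-frequency skin depth, eqn (13.73), for a frequency given in Hz."""
    omega = 2 * math.pi * frequency_hz
    return math.sqrt(2 * EPSILON_0 * C ** 2 / (conductivity * omega))

sigma_copper = 5.96e7          # assumed conductivity of copper in S/m
depths = {f: skin_depth(sigma_copper, f) for f in (1e3, 1e6, 1e9)}
for frequency, depth in depths.items():
    print(f"{frequency:.0e} Hz: {depth * 1e6:.1f} microns")
```

Because the depth scales as the inverse square root of the frequency, each factor of 1000 in frequency reduces it by a factor of about 32.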
13.9 Non-linear optics

Since the invention of the laser, it has become possible to produce
light intensities that are sufficiently high that the induced dipole d is
no longer a linear function of the incident field E0 . This may arise either
close to resonance, where the light–matter coupling is strong, or far off
resonance due to a shift in the resonant frequency (an ac Stark shift).
This non-linear optics regime is extremely useful for generating new
light frequencies (see Boyd, 1992) and making photons interact. In this
section, we consider the optical Kerr effect where the origin of the
non-linearity is an ac Stark shift.
A non-linearity arises when the applied electric field (either the optical field or any other electric field) induces a shift (Stark shift) in the resonant frequency of the dipole, as illustrated in Fig. 13.16. This frequency shift modifies the detuning, i.e.

$$\Delta \rightarrow \Delta - \frac{1}{2}\frac{\alpha E_0^2}{\hbar} , \tag{13.74}$$

where α is the polarizability. The susceptibility for a far-red-detuned field, $|\Delta| \gg \Gamma$, is

$$\chi = \frac{N\alpha}{\epsilon_0} = \frac{N|D_0|^2}{\epsilon_0\hbar|\Delta|} . \tag{13.75}$$

Substituting the modified detuning, and assuming the shift is smaller than the detuning, we obtain

$$\chi \simeq \frac{N|D_0|^2}{\epsilon_0\hbar|\Delta|}\left(1 + \frac{\alpha E_0^2}{2\hbar|\Delta|}\right) . \tag{13.76}$$

We can write this as

$$\chi = \chi^{(1)} + \chi^{(3)}E_0^2 , \tag{13.77}$$

Fig. 13.16 (i) Change in the refractive index due to the shift in the resonance from angular frequency ω0 to $\omega_1 = \omega_0 - \frac{1}{2}\alpha E_0^2/\hbar$. (ii) The corresponding imaginary part of the polarizability.
where the first term is known as the linear susceptibility and the second term as the optical Kerr non-linearity. As the non-linear term is proportional to the electric field squared, we can also write it as an intensity-dependent refractive index, $n = n_0 + n_2I_0$. By comparing eqn (13.76) and eqn (13.77) and setting $|\Delta| \sim \omega_0$ for the large-detuning limit, we find that

$$\chi^{(3)} = \chi^{(1)}\frac{|D_0|^2}{(\hbar\omega_0)^2} , \tag{13.78}$$

i.e. the non-linear term is proportional to the linear term times the dipole matrix element over an energy, all squared. A similar form occurs for most non-linearities; however, by working closer to a resonance it is possible to enhance the non-linearity by reducing the energy term in the denominator.

We can estimate the size of the non-linear term by defining the binding field of the electron as $E_{\rm at} = \hbar\omega/|D_0|$, then

$$\chi^{(3)} = \frac{\chi^{(1)}}{E_{\rm at}^2} . \tag{13.79}$$

Using the Bohr model, the energy $\hbar\omega \sim \frac{1}{2}e^2/4\pi\epsilon_0r$ and $|D_0| \sim ea_0$, giving $E_{\rm at} \sim \frac{1}{2}e/4\pi\epsilon_0a_0^2 \sim 5 \times 10^{11}$ V m⁻¹. As $\chi^{(1)} \le 1$, we obtain an upper limit of

$$\chi^{(3)} \lesssim 10^{-23}\ \mathrm{V^{-2}\,m^2} . \tag{13.80}$$

This rough estimate is not so far off typical values; for example, air is 1.7 × 10⁻²⁵ V⁻² m² and water is 2.5 × 10⁻²² V⁻² m² (Boyd, 1992). Note that this is for off-resonant fields, and much higher values are obtained by exploiting resonances. If the medium has an internal field due to the crystal structure, then it is also possible to have a linear Stark effect giving rise to a $\chi^{(2)}$ non-linearity. This $\chi^{(2)}$ term can be used to frequency-double laser light, as we shall see in Example 13.7, or to convert between frequencies using parametric down-conversion.
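The order-of-magnitude estimate can be reproduced with a few lines of arithmetic. This sketch evaluates the Bohr-model expression for the binding field, keeping the factor of 1/2, and the resulting bound from eqn (13.79) with the linear susceptibility set to 1; it is only meant to confirm the powers of ten.

```python
import math

E_CHARGE = 1.602176634e-19       # elementary charge in C
EPSILON_0 = 8.8541878128e-12     # vacuum permittivity in F/m
BOHR_RADIUS = 5.29177210903e-11  # Bohr radius in m

# Bohr-model estimate of the atomic binding field, in V/m.
e_atomic = 0.5 * E_CHARGE / (4 * math.pi * EPSILON_0 * BOHR_RADIUS ** 2)

# Upper limit on the third-order susceptibility from eqn (13.79), in m^2/V^2.
chi3_limit = 1.0 / e_atomic ** 2

print(f"E_at ~ {e_atomic:.1e} V/m, chi3 limit ~ {chi3_limit:.1e} m^2/V^2")
```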
Example 13.7
Classical theory: In the classical Lorentz model, see Section C.4, the electron is treated as a mass on a spring. The linear susceptibility $\chi^{(1)}$ corresponds with Hooke’s law, where the restoring force is linearly proportional to the displacement. In this case, the electron moves in a harmonic potential of the form $\frac{1}{2}m\omega_0^2x^2$ for a field polarized along x. The non-linear terms correspond to higher-order anharmonic terms in the potential. If we expand the susceptibility as a power series in E,

$$\chi = \frac{N d}{\epsilon_0 E} = \chi^{(1)} + \chi^{(2)}E + \chi^{(3)}E^2 + \cdots , \tag{13.81}$$

then successive terms correspond to harmonic, cubic, and quartic terms in the binding potential. In the absence of any symmetry-breaking fields,25 the first-order correction to the harmonic potential is $x^4$, which gives rise to the $\chi^{(3)}$ term. The effect of these non-linear terms on the electron motion and how this leads to the appearance of harmonics in the spectrum is illustrated in Fig. 13.17.

25 In non-linear crystals there is an internal electric field imposed by the crystal structure.

Fig. 13.17 The motion of a particle in an anharmonic potential. The top and bottom rows show the effect of a cubic and quartic term, respectively. The cubic term breaks the symmetry of the potential and in this example the particle moves less far in the negative x direction. This gives rise to a second harmonic in the frequency spectrum.
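The link between the symmetry of the potential and the harmonics in the motion can be explored with a short simulation. The sketch below is not the calculation behind Fig. 13.17; it assumes unit mass and unit natural frequency, an anharmonic coefficient of 0.5, and a starting displacement of 0.3, all chosen for illustration. It integrates one oscillation with the velocity-Verlet method and compares the second-harmonic content for a cubic and a quartic correction to the potential.

```python
import math

def one_period(force, x0=0.3, dt=1e-3):
    """Velocity-Verlet trajectory, starting at rest at x0, over one full oscillation."""
    xs = [x0]
    x, v, a = x0, 0.0, force(x0)
    turning_points = 0
    while turning_points < 2:
        x = x + v * dt + 0.5 * a * dt * dt
        a_new = force(x)
        v_new = v + 0.5 * (a + a_new) * dt
        if v * v_new < 0:            # velocity changed sign: a turning point
            turning_points += 1
        v, a = v_new, a_new
        xs.append(x)
    return xs

def harmonic_amplitude(xs, n):
    """Cosine Fourier coefficient of harmonic n over the stored oscillation."""
    count = len(xs)
    return 2 / count * sum(x * math.cos(2 * math.pi * n * i / count)
                           for i, x in enumerate(xs))

def cubic_force(x):
    return -x + 0.5 * x * x          # potential x^2/2 - x^3/6 (asymmetric)

def quartic_force(x):
    return -x - 0.5 * x ** 3         # potential x^2/2 + x^4/8 (symmetric)

xs_cubic = one_period(cubic_force)
xs_quartic = one_period(quartic_force)

ratio_cubic = abs(harmonic_amplitude(xs_cubic, 2) / harmonic_amplitude(xs_cubic, 1))
ratio_quartic = abs(harmonic_amplitude(xs_quartic, 2) / harmonic_amplitude(xs_quartic, 1))
print(f"turning points (cubic): {min(xs_cubic):.3f}, {max(xs_cubic):.3f}")
print(f"second harmonic / fundamental: cubic {ratio_cubic:.3f}, quartic {ratio_quartic:.1e}")
```

With the cubic term the two turning points are unequal and a second harmonic appears; with the quartic term the motion stays symmetric and the second harmonic is absent to within the numerical accuracy of the sketch.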
Chapter summary
• Inside a medium a light field induces an additional field and the

propagating field is the result of interference between the incident
and induced fields.
• For insulators, where the charges are bound, the induced fields
are produced by oscillatory dipoles.
• In metals the induced fields are produced by oscillatory
currents.
• In a plasma (or metal) with characteristic angular frequency ωp ,
waves with ω > ωp propagate with a phase velocity that exceeds
c, a group velocity that is less than c, and their product is c2 .
• The Kramers–Kronig relations allow us to find the real part of the frequency-dependent susceptibility from knowledge of the spectrum of the imaginary part,
$$\chi'(\omega) = \frac{2}{\pi}\int_0^{\infty}\frac{\omega'\chi''(\omega')}{\omega'^2 - \omega^2}\, d\omega' ,$$
or, the imaginary part of the susceptibility from the spectrum of the real part,
$$\chi''(\omega) = -\frac{2\omega}{\pi}\int_0^{\infty}\frac{\chi'(\omega')}{\omega'^2 - \omega^2}\, d\omega' .$$
• The relative phase between the incident field and the induced
fields determines the effective speed of the propagating wave which
is characterized by the refractive index.
• Destructive interference between the incident field and the
induced field in the forward propagating direction gives rise
to attenuation of the forward-propagating field via off-axis
scattering.
Exercises
(13.1) Dipole phase
Draw a phasor diagram at t = 0 for an electric field, $E/E_0 = e^{-i\omega t}$. Indicate the direction of rotation. Add a phasor corresponding to an induced dipole, d/|d|, with resonant angular frequency, ω0, for (i) ω = ω0, (ii) $\omega \ll \omega_0$, and (iii) $\omega \gg \omega_0$.

(13.2) Refractive index of a thin slab
In Fig. 13.18 we show the phase of a harmonic wave propagating through a thin slab with length (the interfaces of the medium are indicated by dashed lines) and refractive index n. In free space and inside the medium the phase change per unit distance is k and nk, respectively. Figure 13.18 also shows the Fourier transform for a short slab, and inset a slab that is 10 times longer.
(a) What is the length of the medium, in units of the wavelength, in both cases?
(b) What is the value of the refractive index?
(c) What two properties are neglected in the simulation? [Hint: Fresnel and Bouguer.]
(d) Comment on the uncertainty in the magnitude of the wave vector inside the medium for the short and long medium.
(e) To what extent does it make sense to define a refractive index for a medium of length less than or of order λ?

(13.3) Blue sky
Above the atmosphere the solar intensity of red (0.65 μm) and blue (0.45 μm) light are equal. If the vertical depth of the atmosphere is 10 km, and the average number density of molecules is N = 1.0 × 10^25 m⁻³, estimate the difference between the intensity of red and blue light at the Earth’s surface. At sunset the sunlight makes a tangent to the Earth’s surface. What is the effective depth of the atmosphere if the Earth’s radius is 6.4 × 10^6 m? Estimate the ratio of red to blue light at sunset.

(13.4) Wave propagation in a plasma
The dispersion relationship in a metal is $\omega^2 = k^2c^2 + \omega_p^2$, where the plasma frequency $\omega_p^2$ is a constant. Do electromagnetic waves propagating through a plasma at a frequency higher than the plasma frequency exhibit normal or anomalous dispersion?

(13.5) Impossibility of a function that only absorbs one frequency component
Revisit the discussion of causality in Section 13.5. Consider the case of an optical wave that is zero until t = 0 incident on a device. (i) What would the output be if the device filtered out only one frequency component? (ii) Show that the wave with this modified spectrum would have to have a finite value for t < 0. (iii) Reason why such a filter is not compatible with causality.

(13.6) Frequency filtering and temporal convolution
The spectrum of the output wave with input spectrum F(ω) from a filter with profile G(ω) is F(ω)G(ω). Write an explicit expression for the time dependence of the output in terms of the functions f(t) and g(t), the inverse Fourier transforms of F(ω) and G(ω), respectively.

(13.7) Kramers–Kronig from Hilbert
By substituting $F(\omega) = F'(\omega) + iF''(\omega)$ into eqn (13.34) and equating real and imaginary parts, find the expression for $F'(\omega)$ in terms of $F''(\omega)$ and vice versa.

(13.8) Non-linearities
What are the units of $\chi^{(3)}$ and $n_2$? What is the conversion factor between them?
Fig. 13.18 Top: The normalized field,

f(z), as a function of position as it
propagates through a thin slab with
boundaries indicated by the dashed
lines. Bottom: The modulus-squared
of the Fourier transform of f(z). See
Exercise 13.2.
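A Electromagnetic scalar and vector potentials

A.1 The potentials φ and A
A.2 Gauge transformations
A.3 Application: Electric field of a dipole

A.1 The potentials φ and A

The Maxwell equations are

$$\nabla\times\boldsymbol{E} = -\frac{\partial\boldsymbol{B}}{\partial t} , \tag{A.1}$$

$$\nabla\times\boldsymbol{B} = \mu_0\left(\epsilon_0\frac{\partial\boldsymbol{E}}{\partial t} + \boldsymbol{J}\right) , \tag{A.2}$$

$$\nabla\cdot\boldsymbol{E} = \frac{\rho}{\epsilon_0} , \tag{A.3}$$

$$\nabla\cdot\boldsymbol{B} = 0 , \tag{A.4}$$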
where E and B are the electric and magnetic fields; 0 and μ0 are the
electrical permittivity and magnetic permeability of free space; and J
and ρ are the electric current density (current per unit area) and charge
density (charge per unit volume). The ‘vacuum’ forms encountered in Chapter 1 are obtained by setting the current and charge densities to zero.
At each point in space there are six components of the electromag-
netic field, three each for both vector fields. However, as there are
eight separate scalar Maxwell equations only four components can be
specified independently. In this appendix we show the power of the
electromagnetic scalar and vector potentials in representing the electric
and magnetic fields.
In electrostatics the concept that the electric field can be generated
from the gradient of a scalar potential, φ, is familiar: E = −∇φ.
In electromagnetism a similar procedure is adopted for specifying the
magnetic field. The fourth Maxwell equation, eqn (A.4), shows that the magnetic field is always divergence free, therefore we can always write it as1

$$\boldsymbol{B} = \nabla\times\boldsymbol{A} . \tag{A.5}$$

The quantity A is known as the vector potential. From the scalar and vector potentials we can also generate the electric field,

$$\boldsymbol{E} = -\nabla\phi - \frac{\partial\boldsymbol{A}}{\partial t} , \tag{A.6}$$

where we recognize the electrostatic limit when the time derivative of A is zero. From eqn (A.5) and eqn (A.6) it is evident that both vector electromagnetic fields can be generated from the four components of A and φ.2

We can find the conditions for the electromagnetic fields E and B generated from the scalar and vector potentials to be consistent with Maxwell’s equations by substituting eqn (A.5) and eqn (A.6) into eqn (A.2) and eqn (A.3), to obtain3

$$\nabla(\nabla\cdot\boldsymbol{A}) - \nabla^2\boldsymbol{A} + \frac{1}{c^2}\frac{\partial}{\partial t}\nabla\phi + \frac{1}{c^2}\frac{\partial^2\boldsymbol{A}}{\partial t^2} = \mu_0\boldsymbol{J} \tag{A.7}$$

and

$$\nabla^2\phi + \nabla\cdot\frac{\partial\boldsymbol{A}}{\partial t} = -\frac{\rho}{\epsilon_0} . \tag{A.8}$$

These are known as the field equations as they allow the form of the electromagnetic fields to be calculated from knowledge of the distribution of charge and current.

1 Recall the vector identity $\nabla\cdot(\nabla\times\boldsymbol{A}) \equiv 0$.

2 Readers familiar with the concept of four vectors will recognize A and φ as the constituents of the four potential.

3 Using the vector identity $\nabla\times(\nabla\times\boldsymbol{A}) = \nabla(\nabla\cdot\boldsymbol{A}) - \nabla^2\boldsymbol{A}$.
A.2 Gauge transformations

There is an inherent arbitrariness in the definitions of the potentials φ and A. The arbitrariness is evident when we consider the swap $\phi \rightarrow \phi'$ and $\boldsymbol{A} \rightarrow \boldsymbol{A}'$, where

$$\phi' = \phi - \frac{\partial\chi}{\partial t} \tag{A.9}$$

and

$$\boldsymbol{A}' = \boldsymbol{A} + \nabla\chi . \tag{A.10}$$

In these equations χ is an arbitrary function of space and time. The electromagnetic fields E and B generated from $\phi'$ and $\boldsymbol{A}'$ are identical to those generated from φ and A. χ is called a gauge function, and the fact that the electromagnetic fields are identical is an example of the principle of gauge invariance.4 An appropriate choice of gauge function and gauge transformation allows us to simplify eqn (A.7) and eqn (A.8). We can therefore calculate the potentials φ and A from a knowledge of the current and charge distribution; and hence calculation of the electromagnetic fields E and B follows. This mathematical procedure is often substantially easier than trying to solve for the fields from Maxwell’s equations directly. We now go on to look at two gauges encountered frequently (Jackson 2002) in electromagnetism.

4 The concept was introduced in the early days of developing the theory of electromagnetism, and was elevated to a general principle by Hermann Weyl (Elmshorn 1885–Zurich 1955) when considering the transformation of the wave function in quantum mechanics. The historical roots of gauge invariance are discussed at length by Jackson and Okun (2001).
Coulomb gauge: The Coulomb gauge is defined by the condition5

$$\nabla\cdot\boldsymbol{A} = 0 , \tag{A.11}$$

in which case eqn (A.7) and eqn (A.8) simplify to

$$-\nabla^2\boldsymbol{A} + \frac{1}{c^2}\frac{\partial}{\partial t}\nabla\phi + \frac{1}{c^2}\frac{\partial^2\boldsymbol{A}}{\partial t^2} = \mu_0\boldsymbol{J} \tag{A.12}$$

and

$$\nabla^2\phi = -\frac{\rho}{\epsilon_0} . \tag{A.13}$$

The scalar potential satisfies the well-known Poisson’s equation from electrostatics. Although eqn (A.12) still contains a term involving φ, it is possible to remove this by splitting the current density, thus $\boldsymbol{J} = \boldsymbol{J}_T + \boldsymbol{J}_L$. $\boldsymbol{J}_T$ and $\boldsymbol{J}_L$ are known as the transverse (or solenoidal) and longitudinal (or irrotational) components, respectively; they are defined by the relations6 $\nabla\cdot\boldsymbol{J}_T = 0$ and $\nabla\times\boldsymbol{J}_L = 0$. We see that the vector potential A can be determined from the transverse component of the current density, by solving the inhomogeneous wave equation.

5 Note that it is the constraints on the functions φ and A that are most important; we don’t particularly care about the explicit form of the function χ.

6 A theorem due to Helmholtz states that any vector field can be resolved into the sum of a curl-free (irrotational) component and a divergence-free (solenoidal) component.
Lorenz gauge: The second frequently encountered gauge is known as the Lorenz gauge,7 specified by the condition

$$\nabla\cdot\boldsymbol{A} + \frac{1}{c^2}\frac{\partial\phi}{\partial t} = 0 . \tag{A.14}$$

Subject to this condition, the field equations, eqns (A.7) and (A.8), become

$$\nabla^2\boldsymbol{A} - \frac{1}{c^2}\frac{\partial^2\boldsymbol{A}}{\partial t^2} = -\mu_0\boldsymbol{J} \tag{A.15}$$

and

$$\nabla^2\phi - \frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2} = -\frac{\rho}{\epsilon_0} . \tag{A.16}$$

These are, respectively, vector and scalar inhomogeneous wave equations, which are decoupled; i.e. in the Lorenz gauge it is evident that the vector potential A can be determined from the current density only from eqn (A.15), whereas eqn (A.16) enables the scalar potential φ to be evaluated from knowledge of the charge distribution only. The Lorenz gauge is used extensively in the context of calculating the radiated fields from specified charge and current distributions.

7 Ludvig Valentin Lorenz (Helsingør 1829–Copenhagen 1891). Note that this gauge is frequently incorrectly assigned to Lorentz. A thorough historical analysis is provided in Jackson and Okun (2001).
A.3 Application: Electric field of a dipole

It is often mathematically convenient to use the scalar and vector potentials when calculating electromagnetic fields. For example, for monochromatic fields once the form of E is specified it is trivial to use eqn (A.6) to generate A, from which B can be calculated quickly from eqn (A.5). In contrast, in quantum mechanics the vector potential is essential. A is used extensively to incorporate the magnetic field into the Hamiltonian. There are some curious quantum phenomena, such as the Aharonov–Bohm effect,⁸ where the presence of the vector potential influences the wave function of a charged particle, despite the particle being confined to a region where the magnetic field is zero. The topological aspects of the Aharonov–Bohm effect are discussed by Lancaster and Blundell (2014).
⁸ Aharonov and Bohm (1959). Yakir Aharonov (Haifa 1932), David Joseph Bohm (Wilkes-Barre 1917–London 1992).
In this section we derive the spatial dependence of the field produced
by a single oscillatory dipole. We assume a classical dipole and we
should bear in mind that there are some important differences between
classical and quantum dipoles. For example, an atomic dipole consists
of a three-dimensional oscillating charge distribution, not just a point
charge oscillating along one axis. We shall come back to this point after
we have derived the field produced by a linear dipole.
Example A.1
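Far field of a dipole: A ‘quick’ derivation proceeds from the general solution of the inhomogeneous wave equation for the vector potential in the Lorenz gauge, eqn (A.15) (see e.g. Zangwill 2013):
$$\mathbf{A}(r,t) = \frac{\mu_0}{4\pi}\int\frac{\mathbf{J}(\mathbf{x}',t')}{r}\,d^3x' , \tag{A.17}$$
where $r = |\mathbf{x}-\mathbf{x}'|$ and $t' = t - |\mathbf{x}-\mathbf{x}'|/c$ is the retarded time, see Jackson (1999). For a localised dipole with charge q and position $z = z_0 e^{-i\omega t}$, we have
$$\int d^3x'\,J_z(\mathbf{x}',t') = q\dot{x} = -i\omega q z_0 e^{-i\omega t'} = -i\omega d\,e^{-i\omega t'} ,$$
with $d = qz_0$. Since $\omega t' = \omega t - kr$ with $k = \omega/c$, this gives
$$A_z(r,t) = -\frac{i\mu_0\omega d}{4\pi}\frac{e^{i(kr-\omega t)}}{r} .$$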
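The electric field is given by
$$\mathbf{E} = -\nabla\phi - \frac{\partial\mathbf{A}}{\partial t} .$$
If we are interested in the far field, r > λ/(2π), where the field is transverse to the propagation direction, we only need to find $E_\theta = \mathbf{E}\cdot\hat{\boldsymbol{\theta}}$. There is no contribution to $E_\theta$ from the gradient term because the second term in
$$\nabla\phi = \frac{\partial\phi}{\partial r}\hat{\mathbf{r}} + \frac{1}{r}\frac{\partial\phi}{\partial\theta}\hat{\boldsymbol{\theta}} ,$$
tends to zero for large r. So we find
$$E_\theta = -\frac{\partial A_\theta}{\partial t} = -\frac{d\,k^2}{4\pi\epsilon_0 r}\sin\theta\,e^{i(kr-\omega t)} ,$$
where we have used $A_\theta = -\sin\theta\,A_z$ and $\mu_0\omega^2 = k^2/\epsilon_0$. The z-component of the field is
$$E_d = \frac{d\,k^2}{4\pi\epsilon_0 r}\sin^2\theta\,e^{i(kr-\omega t)} , \tag{A.18}$$
where the additional minus sign arises because as θ increases this reduces the z-component. We shall use this expression in Chapter 13 to explain the microscopic origin of refractive index.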
Example A.2
Near and far field: In the above we ignored the near field and although we do not use the result in the main text we include it here for completeness. We avoid using eqn (A.17) in order to demonstrate the near- and far-field matching directly.⁹ Again, we consider a ‘point’ dipole oscillating at frequency ω along the z axis. As J is parallel to the z axis the only non-zero component of A is $A_z$. For a ‘point’ dipole,
$$\nabla^2 A_z - \frac{1}{c^2}\frac{\partial^2 A_z}{\partial t^2} = 0 ,$$
everywhere except at the origin. The solution is a spherical wave,
$$A_z(r,t) = A_0\frac{e^{i(kr-\omega t)}}{r} ,$$
with an unknown amplitude, $A_0$, that we can find by matching the scalar and vector potential in the near field (kr < 1) using eqn (A.14),
$$\nabla\cdot\mathbf{A} = \cos\theta\,\frac{\partial A_z}{\partial r} = A_0\cos\theta\left(-\frac{1}{r^2} + \frac{ik}{r}\right)e^{i(kr-\omega t)} . \tag{A.19}$$
⁹ A similar derivation is given in Souza (1983).
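For kr < 1 we can neglect the second term. Also in the near field, the potential follows that of the static dipole. The potential due to a charge q at the origin is
$$\phi(r,\theta) = \frac{q}{4\pi\epsilon_0 r} .$$
Adding a negative charge displaced by a distance z along the z axis (placed at −z, so that the dipole moment points along +z), the potential becomes
$$\phi(r,\theta) = \frac{q}{4\pi\epsilon_0}\left[\frac{1}{r} - \frac{1}{(r^2+z^2+2rz\cos\theta)^{1/2}}\right] = \frac{q}{4\pi\epsilon_0 r}\left[1 - 1 + \frac{z\cos\theta}{r} + O\!\left(\frac{z^2}{r^2}\right)\right] \approx \frac{d\cos\theta}{4\pi\epsilon_0 r^2} ,$$
where the static dipole moment is d = qz. For the oscillating dipole, the near-field potential therefore takes the form
$$\phi(r,\theta,t) = \frac{d\cos\theta}{4\pi\epsilon_0 r^2}\,e^{i(kr-\omega t)} .$$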
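Substituting in eqn (A.14) we find
$$-A_0\cos\theta\,\frac{1}{r^2}\,e^{i(kr-\omega t)} - i\omega\frac{d\cos\theta}{4\pi\epsilon_0 c^2 r^2}\,e^{i(kr-\omega t)} = 0 ,$$
so
$$A_0 = -\frac{i\omega d}{4\pi\epsilon_0 c^2} = -\frac{i\mu_0\omega d}{4\pi} . \tag{A.20}$$
Substituting eqn (A.20) into eqn (A.19) and using eqn (A.14), we find
$$-\frac{i\omega}{c^2}\phi - \frac{i\mu_0\omega d\cos\theta}{4\pi}\left(-\frac{1}{r^2} + \frac{ik}{r}\right)e^{i(kr-\omega t)} = 0 ,$$
so
$$\phi(r,\theta,t) = \frac{d\cos\theta}{4\pi\epsilon_0}\left(\frac{1}{r^2} - \frac{ik}{r}\right)e^{i(kr-\omega t)} .$$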
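The electric field is given by
$$\mathbf{E} = -\nabla\phi - \frac{\partial\mathbf{A}}{\partial t} ,$$
with
$$\nabla\phi = \frac{\partial\phi}{\partial r}\hat{\mathbf{r}} + \frac{1}{r}\frac{\partial\phi}{\partial\theta}\hat{\boldsymbol{\theta}} = \frac{d}{4\pi\epsilon_0}\left[\cos\theta\left(\frac{k^2}{r} + \frac{2ik}{r^2} - \frac{2}{r^3}\right)\hat{\mathbf{r}} - \sin\theta\left(\frac{1}{r^3} - \frac{ik}{r^2}\right)\hat{\boldsymbol{\theta}}\right]e^{i(kr-\omega t)} .$$
Using
$$\mathbf{A} = -\frac{i\mu_0\omega d}{4\pi r}\,(\cos\theta\,\hat{\mathbf{r}} - \sin\theta\,\hat{\boldsymbol{\theta}})\,e^{i(kr-\omega t)} ,$$
we find that
$$\frac{\partial\mathbf{A}}{\partial t} = -\frac{\mu_0\omega^2 d}{4\pi r}\,(\cos\theta\,\hat{\mathbf{r}} - \sin\theta\,\hat{\boldsymbol{\theta}})\,e^{i(kr-\omega t)} = -\frac{d}{4\pi\epsilon_0 r}\,k^2\,(\cos\theta\,\hat{\mathbf{r}} - \sin\theta\,\hat{\boldsymbol{\theta}})\,e^{i(kr-\omega t)} ,$$
and so
$$\mathbf{E} = \frac{d}{4\pi\epsilon_0}\left[\left(\frac{1}{r^3} - \frac{ik}{r^2}\right)2\cos\theta\,\hat{\mathbf{r}} + \left(\frac{1}{r^3} - \frac{ik}{r^2} - \frac{k^2}{r}\right)\sin\theta\,\hat{\boldsymbol{\theta}}\right]e^{i(kr-\omega t)} .$$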
In the far field (kr > 1) the 1/r term, known as the radiation term, dominates. This term corresponds with a dipolar radiation pattern emitted by a dipole with negligible spatial size. Note that there is no radial component of the radiation (1/r) term, i.e. the radiated field is transverse. In the near field (kr < 1) the higher-order terms dominate. Note that the imaginary part of the field remains finite at the origin. The imaginary part of the $1/r^3$ and $1/r^2$ terms has the form
$$\lim_{r\to 0}\left[\frac{1}{r^3}\sin kr - \frac{k}{r^2}\cos kr\right] = \frac{kr}{r^3} - \frac{k^3r^3}{6r^3} - \frac{k}{r^2} + \frac{k^3r^2}{2r^2} + O(r^2) = \frac{k^3}{3} .$$
It is often useful to look at a particular component of the dipolar field. Here, we choose the component parallel to the direction of the induced dipole, i.e. the z-component,
$$E_z = E_r\cos\theta - E_\theta\sin\theta = \frac{d}{4\pi\epsilon_0}\left[\left(\frac{1}{r^3} - \frac{ik}{r^2}\right)2\cos^2\theta - \left(\frac{1}{r^3} - \frac{ik}{r^2} - \frac{k^2}{r}\right)\sin^2\theta\right]e^{i(kr-\omega t)} .$$
Note that the minus sign arises because as θ increases this reduces the z-component. This last result gives a useful form for the scalar dipolar field:
$$E_d = \frac{d}{4\pi\epsilon_0}\left[\left(\frac{1}{r^3} - \frac{ik}{r^2}\right)(3\cos^2\theta - 1) + \frac{k^2}{r}\sin^2\theta\right]e^{i(kr-\omega t)} , \tag{A.21}$$
from which we obtain the far field, kr ≫ 1, radiation term as:
$$E_d = \frac{d\,k^2}{4\pi\epsilon_0 r}\sin^2\theta\,e^{i(kr-\omega t)} ,$$
in agreement with the quick derivation, eqn (A.18).
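The agreement between eqn (A.21) and eqn (A.18) can also be checked numerically. The following is a minimal sketch rather than part of the derivation: the function names are hypothetical, the common prefactor d/(4πε0) and the time dependence are dropped, and the wavelength is set to 1 in arbitrary units.

```python
import cmath
import math

def dipole_field_z(k, r, theta):
    """Bracketed factor of eqn (A.21) times exp(ikr); prefactor d/(4 pi eps0) omitted."""
    near = (1 / r**3 - 1j * k / r**2) * (3 * math.cos(theta)**2 - 1)
    radiation = (k**2 / r) * math.sin(theta)**2
    return (near + radiation) * cmath.exp(1j * k * r)

def far_field_z(k, r, theta):
    """Radiation term only, eqn (A.18), with the same normalization."""
    return (k**2 / r) * math.sin(theta)**2 * cmath.exp(1j * k * r)

def near_field_imag(k, r):
    """Imaginary part of the 1/r^3 and 1/r^2 terms; tends to k^3/3 as r -> 0."""
    return math.sin(k * r) / r**3 - k * math.cos(k * r) / r**2

k = 2 * math.pi              # assumed wavelength of 1 (arbitrary units)
theta = math.pi / 3
for r in (0.1, 1.0, 10.0, 100.0):
    ratio = abs(dipole_field_z(k, r, theta)) / abs(far_field_z(k, r, theta))
    print(f"kr = {k * r:8.2f}   |E_full| / |E_far| = {ratio:.4f}")

print(near_field_imag(2.0, 1e-3), 2.0**3 / 3)
```

The ratio approaches unity once kr is much larger than 1, and the last line compares the small-r value of the imaginary part with k³/3.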
Dipoles in three dimensions: Finally, we discuss briefly the more complex case of three-dimensional charge distributions. In the far field, we can describe the three-dimensional oscillating charge distribution as a superposition of point-like dipoles along the x, y, and z axes. More convenient is to take a z dipole, $d_z$, and dipoles rotating in both directions around the z axis, $d_x + id_y$ and $d_x - id_y$ (these modes are all symmetric with respect to the z axis).¹⁰ The dipolar modes $d_z$ and $d_x \pm id_y$ are associated with the transitions Δm = 0 and Δm = ±1 and are often labelled as π and σ±. Light emitted due to a π transition is linearly polarized, whereas light emitted due to a σ± transition is either circularly polarized close to the z axis or linearly polarized in the xy plane, see Chapter 4. The intensity distributions for π and σ± transitions are proportional to $\sin^2\theta$ and $\tfrac{1}{2}(1+\cos^2\theta)$, respectively. If we add the radiation from all three modes together we find that the distribution is isotropic and the intensity is a factor of 2 larger than for $d_z$ alone.
¹⁰ These different dipole modes are associated with different internal quantum states. The simplest quantum dipole corresponds with the superposition between quantum states with total angular momentum J = 0 and J = 1. The J = 0 and J = 1 states consist of sub-states labelled m = 0 and m = −1, 0, +1, respectively.
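The isotropy of the summed radiation follows directly from the two angular distributions quoted above. As a minimal sketch (the function names are hypothetical, and the distributions are taken in the unnormalized form given in the text), the sum of one π and two σ patterns can be evaluated at a range of angles:

```python
import math

def pi_pattern(theta):
    """Angular intensity distribution of the d_z (pi) mode."""
    return math.sin(theta)**2

def sigma_pattern(theta):
    """Angular intensity distribution of each rotating (sigma+ or sigma-) mode."""
    return 0.5 * (1 + math.cos(theta)**2)

def total_pattern(theta):
    # One pi mode plus the two sigma modes
    return pi_pattern(theta) + 2 * sigma_pattern(theta)

angles = [i * math.pi / 12 for i in range(13)]    # theta from 0 to pi
totals = [total_pattern(theta) for theta in angles]
print(min(totals), max(totals))
```

The total is independent of θ and equal to twice the peak value of the $d_z$ pattern alone.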
B Fourier transform toolkit

B.1 Executive summary
B.2 δ-function
B.3 Properties
B.4 Convolution
B.5 rect sinc
B.6 gauss gauss
B.7 δ-function constant
B.8 Phasor δ-function
B.9 comb comb
B.10 2D Fourier transforms
B.11 Cartesian separability
B.12 2D rect
B.13 circ jinc
B.14 Fourier on a computer
Exercises

B.1 Executive summary
In this appendix, we focus on the mathematical properties of Fourier transforms and develop a toolkit that can be applied to optics throughout the book.¹ The following dozen equations, all relating to Fourier transforms, are particularly useful in optics. The first six relate to the mathematical properties of the Fourier transform. The second half dozen are particular examples of Fourier transform pairs. We use lower case letters for real space functions and capitals for functions in Fourier (or frequency) space. We define a Fourier transform using:
$$F(u) = \mathcal{F}[f(x)](u) = \int_{-\infty}^{\infty} f(x)\,e^{-i2\pi ux}\,dx , \tag{B.1}$$
$$f(x) = \mathcal{F}^{-1}[F(u)](x) = \int_{-\infty}^{\infty} F(u)\,e^{i2\pi ux}\,du . \tag{B.2}$$
¹ If you are familiar with Fourier transforms and are happy with the dozen equations, B.2–13, you could skip the further details.
The following six properties of Fourier transforms: the central ordinate theorem; linearity; translation; scaling; convolution (inverse convolution); and cartesian separability, can be expressed as:
$$F(0) = \int_{-\infty}^{\infty} f(x)\,dx , \tag{B.3}$$
$$\mathcal{F}[g(x) + h(x)](u) = G(u) + H(u) , \tag{B.4}$$
$$\mathcal{F}[f(x-d)](u) = F(u)\,e^{-i2\pi ud} , \tag{B.5}$$
$$\mathcal{F}\left[f\left(\frac{x}{a}\right)\right](u) = |a|\,F(ua) , \tag{B.6}$$
$$\mathcal{F}[g \ast h](u) = GH , \qquad \mathcal{F}[gh](u) = (G \ast H)(u) , \tag{B.7}$$
$$\mathcal{F}[g(x)h(y)](u,v) = G(u)H(v) , \tag{B.8}$$
where $G(u) = \mathcal{F}[g(x)](u)$ and $H(v) = \mathcal{F}[h(y)](v)$.
Six useful Fourier transform pairs:
$$\mathcal{F}\left[\mathrm{rect}\left(\frac{x}{a}\right)\right](u) = a\,\mathrm{sinc}(\pi ua) , \tag{B.9}$$
$$\mathcal{F}\left[\mathrm{circ}\left(\frac{\rho}{D}\right)\right](u,v) = \frac{\pi D^2}{4}\,\mathrm{jinc}[\pi WD] , \tag{B.10}$$
$$\mathcal{F}\left[\mathrm{gauss}\left(\frac{x}{w_0}\right)\right](u) = \sqrt{\pi}\,w_0\,\mathrm{gauss}(\pi uw_0) , \tag{B.11}$$
$$\mathcal{F}[\delta(x)](u) = 1 , \tag{B.12}$$
$$\mathcal{F}\left[\text{Ш}_d^{(N)}(x)\right](u) = \frac{\sin N\pi ud}{\sin \pi ud} , \tag{B.13}$$
$$\mathcal{F}[\cos 2\pi u_0x](u) = \tfrac{1}{2}\left[\delta(u+u_0) + \delta(u-u_0)\right] , \tag{B.14}$$
relate to rectangles; circles; gaussians; Dirac δ-functions; combs; and cosines; respectively. These functions are defined in the subsequent sections. In these expressions, u is the spatial frequency in the x direction, v is the spatial frequency in the y direction, $\rho = (x^2+y^2)^{1/2}$ and $W = (u^2+v^2)^{1/2}$ are the equivalent real space and Fourier variables in cylindrical coordinates.
B.2 δ-function
The Dirac δ-function is defined as
$$\delta(x) = \begin{cases} \infty & x = 0 \\ 0 & x \neq 0 \end{cases} , \tag{B.15}$$
with the condition that²
$$\int_{-\infty}^{\infty}\delta(x)\,dx = 1 . \tag{B.16}$$
² This normalization condition introduces the scaling property $\delta(x/a) = a\,\delta(x)$.
An important property of the δ-function is the ability to shift any function to a new location using a convolution integral, see also Section B.4,
$$\int_{-\infty}^{\infty} f(x')\,\delta(x'-x_1)\,dx' = f(x_1) . \tag{B.17}$$
If $x_1 = x - d$ then the function is translated by a distance d along the x axis. This property also allows us to make copies of a function,
$$\int_{-\infty}^{\infty} f(x')\left[\delta(x'-x_1) + \delta(x'-x_2)\right]dx' = f(x_1) + f(x_2) . \tag{B.18}$$
We shall use this property later, e.g. in Section B.9.

B.3 Properties
In this section, we consider some properties of the Fourier transform
operator.
Central ordinate theorem: From eqn (B.1), it follows that the amplitude of the zero spatial frequency component is equal to the area under the curve,
$$F(0) = \int_{-\infty}^{\infty} f(x)\,dx . \tag{B.19}$$
This can provide a useful self-consistency check. For an odd function f(x) = −f(−x), we obtain F(0) = 0.
Symmetry: For even functions where f(x) = f(−x), F(u) = F(−u), the ‘forward’ and inverse transforms are the same, and applying the forward transform twice returns the original function. For non-symmetric functions, we find
$$\mathcal{F}[\mathcal{F}[f(x)]] = \mathcal{F}[F(u)] = \int_{-\infty}^{\infty} F(u)\,e^{-i2\pi ux}\,du = f(-x) , \tag{B.20}$$
where we have made use of the definition of the inverse transform.³
³ This double Fourier transform arises in an optical system involving two lenses, such as that considered in Chapter 9.
Linearity: A Fourier transform is a linear operator,
$$\mathcal{F}[g(x) + h(x)] = G(u) + H(u) . \tag{B.21}$$
In optics, the property of linearity leads to Babinet’s principle, Section 6.8.
Translation:⁴ If we move a function along the x axis by a distance d then the Fourier transform becomes
$$\mathcal{F}[f(x-d)](u) = F(u)\,e^{-i2\pi ud} . \tag{B.22}$$
If we translate the function, the Fourier transform acquires a linear phase gradient, $e^{-i2\pi ud}$.
⁴ This property arises frequently in optics, for example, in Fraunhofer diffraction, see Chapter 5.
Scaling:⁵ We can stretch or compress a function, f(x/a), by increasing or decreasing the scaling factor, a, respectively, where a characterizes the ‘width’ or wavelength of the function. The Fourier transform is given by
$$\mathcal{F}\left[f\left(\frac{x}{a}\right)\right](u) = a\,F(ua) . \tag{B.23}$$
If we stretch the function in real space by increasing a, then we compress the width in Fourier space by a factor of 1/a.
⁵ This inverse scaling between real space and Fourier space gives rise to the ‘uncertainty’ relation between position and momentum.
Derivatives: As long as f(x) → 0 for |x| → ∞ then the Fourier transform of a derivative can be found by integrating by parts:
$$\mathcal{F}\left[\frac{d}{dx}f(x)\right](u) = 2\pi iu\,F(u) . \tag{B.24}$$
The differential operator can be applied repeatedly to obtain higher-order derivatives.
Parseval’s (or Rayleigh’s) theorem:
$$\int_{-\infty}^{\infty}|F(u)|^2\,du = \int_{-\infty}^{\infty}|f(x)|^2\,dx . \tag{B.25}$$
This result is related to energy (or flux) conservation.
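The properties in this section are easy to test numerically. The sketch below is an illustration only: `fourier_transform` is a hypothetical helper that approximates the integral in eqn (B.1) with a midpoint sum over a finite window, and the test function is assumed to be the gaussian $e^{-\pi x^2}$, whose transform is $e^{-\pi u^2}$.

```python
import cmath
import math

def fourier_transform(f, u, half_width=10.0, n=4000):
    """Midpoint-rule estimate of eqn (B.1) over [-half_width, half_width]."""
    dx = 2 * half_width / n
    total = 0j
    for i in range(n):
        x = -half_width + (i + 0.5) * dx
        total += f(x) * cmath.exp(-2j * math.pi * u * x)
    return total * dx

def gauss(x):
    # Test function exp(-pi x^2); negligible outside the integration window
    return math.exp(-math.pi * x**2)

u, d, a = 0.7, 0.5, 2.0
F_u = fourier_transform(gauss, u)
shifted = fourier_transform(lambda x: gauss(x - d), u)
scaled = fourier_transform(lambda x: gauss(x / a), u)

translation_error = abs(shifted - F_u * cmath.exp(-2j * math.pi * u * d))  # eqn (B.22)
scaling_error = abs(scaled - a * fourier_transform(gauss, u * a))          # eqn (B.23)
central_ordinate_error = abs(fourier_transform(gauss, 0.0) - 1.0)          # eqn (B.19)
print(translation_error, scaling_error, central_ordinate_error)
```

All three differences should be at the level of rounding error, because the midpoint sum converges very quickly for a smooth, rapidly decaying function.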
B.4 Convolution
The convolution⁶ of two functions of x, g(x) and h(x), is defined as⁷
$$(g \ast h)(x) = \int_{-\infty}^{\infty} g(x')\,h(x-x')\,dx' . \tag{B.26}$$
⁶ The convolution concept is useful in optics because it is often possible to write the input field either in the form of a convolution integral, or as a product of two functions, in which case the field downstream can be expressed as a convolution. Convolution is also useful in electronics, where for example, the output of a filter is given by a convolution of the input function with the frequency response of the filter.
⁷ Also sometimes written as g(x) ⊗ h(x).
Fig. B.1 The convolution of g(x′) and h(x′): The left panel shows g(x′) and h(x − x′) as a function of x′ for five values of x. The parameter x behaves as an offset; as x is varied, the function h(x − x′) moves along the x′ axis. Note that h(−x′) is a mirror reflection of h(x′) about the x′ axis. The overlap between g(x′) and h(x − x′) is shaded. The shaded area gives the value of the convolution as indicated by the black dots in the right-hand panel. The black dots trace out the full convolution function, g ∗ h.
For a particular value of x this has the form of an overlap integral with h reflected about the x′ axis and re-centred at x, see Fig. B.1. In the left-hand column of Fig. B.1, we illustrate this overlap in grey between two arbitrary functions, g(x′) and h(x′), where g(x′) is a rect function and h(x′) is a skewed-right distribution, rising steeply on the left and falling slowly on the right. The function h(x − x′) is flipped left–right becoming skewed left (rising slowly on the left and falling quickly on the right) and re-centred at x′ = x. The five rows correspond to five different values of x. The corresponding values of the convolution at each of these values of x is indicated by the black dots in the right-hand column of Fig. B.1. The black line shows the convolution at all
values of x. Operationally, we can think of a convolution as equivalent to sliding one function (flipped left–right) through the other and summing the overlap at each position.⁸ We shall find the convolution particularly useful for translating functions and hence making multiple copies of a function. For example, if we want to re-centre the function rect(x/a) to a position x = d, then we take rect(x/a) ∗ δ(x − d), and if we want to repeat the function rect(x/a) at x = −d, we take rect(x/a) ∗ [δ(x + d) + δ(x − d)].
⁸ The example shown is similar to the case in electronics where a square pulse is sent through an RC-filter: g(x′) describes the pulse and h(x′) the response of the filter (in practice, a step function on the left-hand and an exponential decrease on the right-hand side). The convolution gives the output pulse shape.
The Fourier transform of a convolution of two functions is equal to a product of their Fourier transforms,
$$\mathcal{F}[(g \ast h)(x)] = G(u)H(u) . \tag{B.27}$$
This is known as the convolution theorem. The inverse convolution theorem states that the Fourier transform of a product of two functions is equal to the convolution of their Fourier transforms, i.e.
$$\mathcal{F}[g(x)h(x)] = (G \ast H)(u) . \tag{B.28}$$
This equation is very useful in optics when we are able to write the input field in terms of a product of functions.⁹ We consider another example of a convolution of two rect functions in Exercise B.11, see Fig. B.15.
⁹ For angular frequencies the convolution integral should contain the same 2π factor that appears in the Fourier transform, i.e.
$$(G \ast H)(k_x) = \int_{-\infty}^{\infty} G(k_x')\,H(k_x - k_x')\,\frac{dk_x'}{2\pi} . \tag{B.29}$$
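A discrete analogue of the convolution theorem, eqn (B.27), can be checked in a few lines. This is a sketch under stated assumptions: the continuous transform is replaced by a direct (slow) discrete Fourier transform, the convolution is circular rather than over an infinite range, and the sample values and function names are chosen purely for illustration.

```python
import cmath

def dft(samples):
    """Direct discrete Fourier transform with the exp(-i 2 pi jk/n) sign convention."""
    n = len(samples)
    return [sum(samples[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def circular_convolve(g, h):
    """Discrete convolution with periodic wrap-around: sum_j g[j] h[m - j]."""
    n = len(g)
    return [sum(g[j] * h[(m - j) % n] for j in range(n)) for m in range(n)]

g = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]       # sampled rect-like pulse
h = [1.0, 0.6, 0.36, 0.216, 0.0, 0.0, 0.0, 0.0]    # sampled decaying response
delta_shifted = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # discrete delta at index 2

lhs = dft(circular_convolve(g, h))
rhs = [G * H for G, H in zip(dft(g), dft(h))]
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_error)
print(circular_convolve(g, delta_shifted))   # g re-centred two samples to the right
```

The second printed line illustrates the translation property described above: convolving with a displaced δ-function shifts the pulse without changing its shape.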
B.5 rect sinc

The rect function describing a rectangular ‘pulse’ or ‘top hat’ of width a is given by
$$\mathrm{rect}\left(\frac{x}{a}\right) = \begin{cases} 0 & |x| > a/2 \\ 1 & |x| \le a/2 \end{cases} . \tag{B.30}$$
The Fourier transform is
$$\mathcal{F}\left[\mathrm{rect}\left(\frac{x}{a}\right)\right](u) = \int_{-\infty}^{\infty}\mathrm{rect}\left(\frac{x}{a}\right)e^{-i2\pi ux}\,dx = \int_{-a/2}^{a/2} e^{-i2\pi ux}\,dx = \frac{e^{-i\pi ua} - e^{i\pi ua}}{-i2\pi u} = \frac{\sin(\pi ua)}{\pi u} = a\,\mathrm{sinc}(\pi ua) , \tag{B.31}$$
where sinc α = sin α/α is the sinc function. The sinc function is shown in Fig. B.2.
Fig. B.2 rect–sinc Fourier transform pair: (a) f(x) = rect(x/a). (b) $F(u) = \mathcal{F}[f](u)$. For a rect function with width a, the first zeros in the sinc function are at u = ±1/a.
Apart from a sign change, the Fourier transform and inverse transform are the same. Using the double Fourier transform relationship of eqn (B.20), it follows that the Fourier transform of a sinc function is a rect, i.e. rect and sinc are a Fourier transform pair. The sinc function is maximum at the origin,
$$\lim_{u\to 0} a\,\mathrm{sinc}(\pi ua) = a ,$$
the amplitude at u = 0 is equal to the area under the rect function in agreement with the central ordinate theorem, eqn (B.3). As is shown in Fig. B.3, the zeros of the sinc function occur at
$$\pi ua = \pm m\pi . \tag{B.32}$$
In optics, the spatial frequency, u, often maps back into a real-space coordinate according to u = x/λz, which gives us that the position of the first zero (m = 1) is x = (λ/a)z.
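The rect–sinc pair of eqn (B.31) can be verified by evaluating the integral directly. The following sketch uses a midpoint sum over the width of the rect; the function names and the choice a = 2 are assumptions made for the example, and `sinc` follows the unnormalized definition sin α/α used in this appendix.

```python
import cmath
import math

def sinc(alpha):
    """Unnormalized sinc, sin(alpha)/alpha, with the limiting value 1 at alpha = 0."""
    return 1.0 if alpha == 0 else math.sin(alpha) / alpha

def rect_transform_numeric(u, a, n=2000):
    """Midpoint-rule estimate of the integral of exp(-i 2 pi u x) over |x| <= a/2."""
    dx = a / n
    total = 0j
    for i in range(n):
        x = -a / 2 + (i + 0.5) * dx
        total += cmath.exp(-2j * math.pi * u * x)
    return total * dx

a = 2.0
for u in (0.0, 0.25, 0.5, 0.75, 1.0):
    numeric = rect_transform_numeric(u, a)
    analytic = a * sinc(math.pi * u * a)
    print(f"u = {u:4.2f}   numeric = {numeric.real:+.6f}   a sinc(pi u a) = {analytic:+.6f}")
```

For a = 2 the transform equals the area a at u = 0 and vanishes at u = 0.5 and u = 1.0, i.e. at multiples of 1/a, as stated in eqn (B.32).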
Fig. B.3 Effect of scaling on the rect–sinc pair: The Fourier transforms of rect(x/a) with (a) a = 1 and (b) a = 2. The rect functions are shown inset. Doubling the width of the rect halves the width of the sinc (scaling property), and doubles the height (central ordinate theorem). The limit a → ∞, f(x) = 1 gives F(u) = δ(u), where δ(u) is the Dirac δ-function.
B.6 gauss gauss

We have seen that gaussian functions are useful in optics, in the context of laser beams. They also appear in the description of wave packets in quantum physics and pulses in electronics. We define the function labelled gauss as
$$\mathrm{gauss}\left(\frac{x}{w_0}\right) = e^{-x^2/w_0^2} . \tag{B.33}$$
gauss(x/w0) is unity at the origin and falls to 1/e at x = ±w0. We refer to w0 as the ‘width’ of the gaussian.¹⁰ The Fourier transform of gauss is
$$\mathcal{F}\left[\mathrm{gauss}\left(\frac{x}{w_0}\right)\right] = \int_{-\infty}^{\infty}\exp\left(-\frac{x^2}{w_0^2}\right)e^{-i2\pi ux}\,dx = e^{-\pi^2u^2w_0^2}\int_{-\infty}^{\infty} e^{-(x/w_0 + i\pi uw_0)^2}\,dx ,$$
where we have rearranged the exponent by completing the square. If we define a new variable, ξ = x/w0 + iπuw0, then dξ = dx/w0 and¹¹
$$\mathcal{F}\left[\mathrm{gauss}\left(\frac{x}{w_0}\right)\right](u) = w_0\,e^{-\pi^2u^2w_0^2}\int_{-\infty}^{\infty}\exp\left(-\xi^2\right)d\xi = \sqrt{\pi}\,w_0\,e^{-\pi^2u^2w_0^2} = \sqrt{\pi}\,w_0\,\mathrm{gauss}(\pi uw_0) . \tag{B.34}$$
¹⁰ Note that the definition of ‘width’ depends on the context. For example, in probability theory the normal distribution is defined as
$$P(x) = \frac{1}{(2\pi\sigma_x^2)^{1/2}}\exp\left(-\frac{x^2}{2\sigma_x^2}\right) ,$$
and the ‘standard deviation’, σx, corresponds to a 1/√e ‘width’. In contrast, the intensity of a laser beam is
$$I(x) = I_0\exp\left(-\frac{2x^2}{w_0^2}\right) ,$$
and the ‘width’, w0, is the 1/e² radius.
¹¹ $\int_{-\infty}^{\infty}\exp\left(-\xi^2\right)d\xi = \sqrt{\pi}$.
The Fourier transform of a gaussian is a gaussian!
This is depicted in Fig. B.4. We could also write the Fourier transform in terms of angular spatial frequency,
$$\mathcal{F}\left[\mathrm{gauss}\left(\frac{x}{w_0}\right)\right](k_x) = \sqrt{\pi}\,w_0\,\mathrm{gauss}\left(\frac{k_xw_0}{2}\right) . \tag{B.35}$$
Note that for $w_0^2 = 1/\pi$ we have
$$\mathcal{F}\left[e^{-\pi x^2}\right](u) = e^{-\pi u^2} , \tag{B.36}$$
demonstrating the self-Fourier property, $\mathcal{F}[f(x)](u) = f(u)$, of a gaussian and its transform. From eqn (B.24), we find that the derivatives of a gaussian share the same self-similar property except for a phase shift,
$$\mathcal{F}\left[x\,e^{-\pi x^2}\right](u) = -iu\,e^{-\pi u^2} . \tag{B.37}$$
This result is important in laser physics, where we find that both gaussians and derivatives of gaussians (related to Hermite polynomials) can be used to describe the transverse-field profile of the field inside laser cavities, see Chapter 11.
We can also have a gaussian with a complex argument. For a purely imaginary argument the gaussian is a cosine and sine of a quadratic function, like the wave front of a circular wave in Fig. 2.1. The Fourier transform remains self-similar, for example with $w_0^2 = i\lambda z/\pi$,
$$\mathcal{F}\left[e^{-\pi x^2/i\lambda z}\right](u) = \sqrt{i\lambda z}\;e^{-i\pi\lambda zu^2} , \tag{B.38}$$
which is relevant to the derivation of the Fresnel diffraction integral, see Chapter 5.
Fig. B.4 gauss gauss: (a) f(x) = gauss(x/w0). (b) $F(u) = \mathcal{F}[f](u)$ = gauss(πw0u).
Example B.1
Heisenberg uncertainty relationship: Consider a gaussian wave packet described by the probability distribution
$$P(x) = \frac{1}{(2\pi\sigma_x^2)^{1/2}}\exp\left(-\frac{x^2}{2\sigma_x^2}\right) ,$$
where σx is the 1/√e width or standard deviation. In electromagnetism or quantum mechanics this probability distribution would correspond to the field amplitude or wave function,¹²
$$\psi(x) = \frac{1}{(2\pi\sigma_x^2)^{1/4}}\exp\left(-\frac{x^2}{4\sigma_x^2}\right) .$$
The amplitude distribution of angular spatial frequencies is given by the Fourier transform with respect to kx,
$$\tilde{\psi}(k_x) = \frac{1}{(2\pi\sigma_x^2)^{1/4}}\,\pi^{1/2}\,2\sigma_x\exp\left(-k_x^2\sigma_x^2\right) ,$$
and the probability distribution is
$$P(k_x) = 2\sqrt{2}\,\pi^{1/2}\sigma_x\exp\left(-2k_x^2\sigma_x^2\right) .$$
We can also write the probability distribution of spatial frequencies as $P(k) = e^{-k_x^2/2\sigma_k^2}/(2\pi\sigma_k^2)^{1/2}$. Comparing these two expressions we find that $2\sigma_x^2 = 1/2\sigma_k^2$, so defining the uncertainties in x and k as the standard deviations, Δx = σx and Δkx = σk, we find
$$\Delta x\,\Delta k_x = \tfrac{1}{2} . \tag{B.39}$$
Using p = ħk we have ΔxΔpx = ħ/2, which is the Heisenberg uncertainty principle. In fact, a gaussian wave packet has the special property that the product of the real and momentum space width is a minimum. This result applies to light at the focus of a laser beam, see Chapter 6.
¹² In optics, the field amplitude squared determines the intensity or flux and hence the probability of detecting a photon. In quantum mechanics, for the ground state of the harmonic oscillator, $\sigma_x = a_0/\sqrt{2}$, where $a_0 = (\hbar/m\omega_{\rm osc})^{1/2}$ is the 1/e width and ωosc is the oscillation frequency.
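The product in eqn (B.39) can be reproduced numerically without using the analytic transform. The sketch below is illustrative only: the helper names are hypothetical, the integrals are replaced by midpoint sums on finite grids extending to eight standard deviations, and the transform is taken with respect to the angular spatial frequency kx.

```python
import cmath
import math

def psi(x, sigma_x):
    """Gaussian amplitude whose modulus-squared is the normal distribution P(x)."""
    return (2 * math.pi * sigma_x**2) ** -0.25 * math.exp(-x**2 / (4 * sigma_x**2))

def midpoint_grid(half_width, n):
    step = 2 * half_width / n
    return [-half_width + (i + 0.5) * step for i in range(n)], step

def standard_deviation(points, weights):
    norm = sum(weights)
    mean = sum(p * w for p, w in zip(points, weights)) / norm
    variance = sum((p - mean) ** 2 * w for p, w in zip(points, weights)) / norm
    return math.sqrt(variance)

def uncertainty_product(sigma_x, n=400):
    xs, dx = midpoint_grid(8 * sigma_x, n)
    ks, _ = midpoint_grid(8 / (2 * sigma_x), n)   # expected sigma_k = 1/(2 sigma_x)
    amplitudes = [psi(x, sigma_x) for x in xs]
    p_x = [amp**2 for amp in amplitudes]
    # |psi~(k)|^2 from a direct sum over the x grid
    p_k = [abs(sum(amp * cmath.exp(-1j * k * x) for amp, x in zip(amplitudes, xs)) * dx) ** 2
           for k in ks]
    return standard_deviation(xs, p_x) * standard_deviation(ks, p_k)

product = uncertainty_product(1.0)
print(product)
```

The printed value should be close to 1/2 for any choice of σx, since narrowing the wave packet in real space broadens it in k-space by the same factor.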
B.7 δ-function constant

The Fourier transform of the real-space Dirac δ-function, δ(x), is
$$F(u) = \mathcal{F}[\delta(x)] = \int_{-\infty}^{\infty}\delta(x)\,e^{-i2\pi ux}\,dx = 1 , \tag{B.40}$$
i.e. δ(x) and 1 are a Fourier transform pair as illustrated in Fig. B.5. Similarly, for angular spatial frequencies,
$$F(k_x) = \mathcal{F}[\delta(x)] = \int_{-\infty}^{\infty}\delta(x)\,e^{-ik_xx}\,dx = 1 . \tag{B.41}$$
Fig. B.5 δ-function constant: (a) f(x) = δ(x) (the vertical arrow indicates a δ-function) and (b) $F(u) = \mathcal{F}[f](u)$.
The real space δ-function, δ(x), contains all angular spatial frequencies with an equal amplitude, i.e. F(kx) = 1. Inserting F(u) = 1 into the inverse transform, eqn (6.9), we find
$$\delta(x) = \int_{-\infty}^{\infty} e^{i2\pi ux}\,du . \tag{B.42}$$
If we swap x and u in this expression, we find
$$\delta(u) = \int_{-\infty}^{\infty} e^{i2\pi ux}\,dx = \int_{-\infty}^{\infty} e^{-i2\pi ux}\,dx = \mathcal{F}[1] , \tag{B.43}$$
i.e. δ(u) is the Fourier transform of a constant in real space f(x) = 1. Or to put it another way, a constant f(x) = 1 corresponds to a single spatial frequency with frequency zero, δ(u). To convert between spatial and angular spatial frequency we note that it is the area ‘under’ the δ-function that is fixed, so if we rescale the width we need to renormalize the ‘height’. This gives
$$\mathcal{F}[1](u) = \delta(u) = \delta\left(\frac{k_x}{2\pi}\right) = 2\pi\,\delta(k_x) . \tag{B.44}$$
We have to remember this 2π factor if we use angular spatial frequencies. In optics the Dirac δ-function appears most frequently when we have a sinusoidally varying field component as an input, as we consider next.
B.8 Phasor δ-function

The Fourier transform of a cosine with spatial frequency u0 is
$$\mathcal{F}[\cos(2\pi u_0x)](u) = \frac{1}{2}\int_{-\infty}^{\infty}\left[e^{-i2\pi(u-u_0)x} + e^{-i2\pi(u+u_0)x}\right]dx = \tfrac{1}{2}\left[\delta(u-u_0) + \delta(u+u_0)\right] , \tag{B.45}$$
where u0 is the spatial frequency of the cosine. Be careful not to confuse u0, which is a constant, with the Fourier variable u. This expression shows that cosine and two δ-functions are a Fourier transform pair, see Fig. B.6. We can think of the two δ-functions as representing counter-propagating plane waves which interfere to produce a cosine standing wave. If instead we work with angular spatial frequencies, the Fourier transform contains an extra 2π factor,
$$\mathcal{F}[\cos(k_0x)](k_x) = \pi\left[\delta(k_x-k_0) + \delta(k_x+k_0)\right] . \tag{B.46}$$
Fig. B.6 cosine and two δ-functions: (a) f(x) = cos(2πu0x). (b) $F(u) = \mathcal{F}[f](u)$.
B.9 comb comb

A comb function is a sum of regularly spaced δ-functions. The dimensionless unit-spacing Dirac comb is defined as¹³
$$\text{Ш}(\tilde{x}) = \sum_{m=-\infty}^{\infty}\delta(\tilde{x} - m) , \tag{B.47}$$
where x̃ is dimensionless and m is an integer. The unit-spacing Dirac comb is also a self-Fourier function,
$$\mathcal{F}[\text{Ш}(\tilde{x})](\tilde{u}) = \text{Ш}(\tilde{u}) . \tag{B.48}$$
In optics, we are interested in comb functions in space or time, with teeth separated by a distance d or a time T. For example, a function to describe N teeth separated by a distance d, may be written as¹⁴
$$\text{Ш}_d^{(N)}(x) = \sum_{m=-(N-1)/2}^{(N-1)/2}\delta(x - md) , \tag{B.49}$$
and, similarly, in time,
$$\text{Ш}_T^{(N)}(t) = \sum_{m=-(N-1)/2}^{(N-1)/2}\delta(t - mT) . \tag{B.50}$$
This comb function is particularly useful as a replicator. To make N copies of the spatial function g(x), with spacing d, we convolve with a replicating comb as follows:
$$f(x) = g(x) \ast \text{Ш}_d^{(N)}(x) = \sum_{m=-(N-1)/2}^{(N-1)/2} g(x - md) . \tag{B.51}$$
¹³ The cyrillic symbol Ш is used to represent the Dirac comb: ‘the fluffy-gray, three-stemmed Russian letter that stands for sh, a letter as old as the rushes of the Nile’, Speak Memory, 1951 by Vladimir Vladimirovich Nabokov (St. Petersburg 1899–Montreux 1977).
¹⁴ We can relate the replicating comb function and the Dirac comb as follows:
$$\text{Ш}_d(x) = \sum_{m=-\infty}^{\infty}\delta(x - md) = \frac{1}{d}\sum_{m=-\infty}^{\infty}\delta\left(\frac{x}{d} - m\right) = \frac{1}{d}\,\text{Ш}\left(\frac{x}{d}\right) ,$$
where the 1/d factor arises due to the scaling property of the δ-function: δ(x/d) = dδ(x). To generate a finite comb we can use
$$\text{Ш}_d^{(N)}(x) = \frac{1}{d}\,\text{Ш}\left(\frac{x}{d}\right)\mathrm{rect}\left(\frac{x}{Nd}\right) .$$
The Fourier transform of the array, f(x), is given by the product of the Fourier transforms of g(x) and $\text{Ш}_d^{(N)}(x)$, as follows from the convolution theorem, eqn (B.27). Using eqn (B.40) and the translation property of the Fourier transform, eqn (B.22), one finds that the Fourier transform of a finite comb with spacing d is¹⁵
$$\mathcal{F}\left[\text{Ш}_d^{(N)}(x)\right] = e^{-i(N-1)\pi ud} + e^{-i(N-1)\pi ud}e^{i2\pi ud} + \ldots + e^{i(N-1)\pi ud} = \frac{\sin N\pi ud}{\sin \pi ud} .$$
¹⁵ Each successive term is multiplied by a factor of $e^{i2\pi ud}$, and we can use the formula for geometrical progression,
$$\sum_{j=0}^{N-1} ar^j = \frac{a(1-r^N)}{1-r} ,$$
to perform the sum analytically:
$$\mathcal{F}\left[\text{Ш}_d^{(N)}(x)\right](u) = e^{-i(N-1)\pi ud}\sum_{n=0}^{N-1} e^{in2\pi ud} = e^{-i(N-1)\pi ud}\,\frac{1 - e^{iN2\pi ud}}{1 - e^{i2\pi ud}} .$$
For small N, it is more convenient to write the Fourier transform in terms of a discrete sum of phasors. For N = 2,
$$\mathcal{F}\left[\text{Ш}_d^{(2)}(x)\right] = e^{-i\pi ud} + e^{i\pi ud} = 2\cos \pi ud ; \tag{B.52}$$
for N = 3,
$$\mathcal{F}\left[\text{Ш}_d^{(3)}(x)\right] = e^{-i2\pi ud} + 1 + e^{i2\pi ud} , \tag{B.53}$$
and so on. Figure B.7 shows the example of a $\text{Ш}_d^{(3)}(x)$, and its transform. The position of the first zero in the phasor sum is when the phasors are evenly distributed around the clock face at angles of 2πn/N, with n going from −(N − 1)/2 to +(N − 1)/2, see Fig. B.8. For N = 3, Fig. B.8(i), the phasor angles are −2π/3, 0 and +2π/3. When we take the modulus-squared, we obtain a principal maximum with height 9 and subsidiary maxima with height 1.
The modulus-squared of the Fourier transform of $\text{Ш}_d^{(N)}(x)$ for N = 2–6 is illustrated in Fig. B.9. The generic properties of the N-phasor sum are:
• The intensities of the principal maxima are proportional to N².
• The position of the next principal maximum is u = 1/d.
• The position of the first zero is u = 1/(Nd).
• There are N − 2 subsidiary maxima between the principal maxima.
For N rect functions with width a and spacing d, the Fourier transform is
$$\mathcal{F}\left[\mathrm{rect}\left(\frac{x}{a}\right) \ast \text{Ш}_d^{(N)}(x)\right](u) = a\,\mathrm{sinc}(\pi ua)\,\frac{\sin N\pi ud}{\sin \pi ud} .$$
In Fig. B.10 the square of the Fourier transform is plotted for N = 1, 2, 3, 4, 5, and 12 with d = 2a. In optics, this expression is used to describe the diffraction pattern for N-slits or a diffraction grating. As the number of rect functions increases the peaks become narrower but their height is still given by the envelope arising from a single rect function. For large N the spectrum approaches the discrete Fourier series that we saw in trying to build a square, see Fig. 6.3. In optics, the square-wave pattern with equal regions of on and off is known as a Ronchi grating.
Fig. B.7 (a) $f(x) = \text{Ш}_d^{(3)}(x)$ (the vertical arrows indicate δ-functions) and (b) $F(u) = \mathcal{F}[f](u)$.
Fig. B.8 Phasor diagrams corresponding to the first zero of the Fourier transform of $\text{Ш}_d^{(N)}(x)$ for (i) N = 3, (ii) 4, and (iii) 5.
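The generic properties of the N-phasor sum listed above can be checked by summing the phasors directly and comparing with the closed form. This is a minimal sketch: the function names are hypothetical, the tooth positions follow eqn (B.49) (half-integer multiples of d when N is even), and the closed form is not evaluated at u = 0 or at multiples of 1/d, where it is a 0/0 limit.

```python
import cmath
import math

def comb_phasor_sum(u, d, n_teeth):
    """Sum of exp(-i 2 pi u m d) over teeth at m = -(N-1)/2 ... +(N-1)/2."""
    offsets = [m - (n_teeth - 1) / 2 for m in range(n_teeth)]
    return sum(cmath.exp(-2j * math.pi * u * m * d) for m in offsets)

def comb_closed_form(u, d, n_teeth):
    """sin(N pi u d) / sin(pi u d); undefined where sin(pi u d) = 0."""
    return math.sin(n_teeth * math.pi * u * d) / math.sin(math.pi * u * d)

d = 1.5
for n_teeth in (2, 3, 5):
    u = 0.13
    print(n_teeth,
          comb_phasor_sum(u, d, n_teeth).real,
          comb_closed_form(u, d, n_teeth),
          abs(comb_phasor_sum(0.0, d, n_teeth)) ** 2,              # principal maximum, N^2
          abs(comb_phasor_sum(1 / (n_teeth * d), d, n_teeth)))     # first zero at u = 1/(Nd)
```

For N = 3, evaluating the modulus-squared at u = 1/(2d) gives the subsidiary maximum of height 1 quoted in the text.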
We can also write a finite comb with N terms as a product of an infinite comb and a rect function, with dimension L = Nd,
$$\text{Ш}_d^{(N)}(x) = \frac{1}{d}\,\text{Ш}\left(\frac{x}{d}\right)\mathrm{rect}\left(\frac{x}{Nd}\right) .$$
As the Fourier transform of the scaled Dirac comb is
$$\mathcal{F}\left[\text{Ш}\left(\frac{x}{d}\right)\right](u) = d\,\text{Ш}(ud) ,$$
then the Fourier transform of N rect functions is
$$\mathcal{F}\left[\mathrm{rect}\left(\frac{x}{a}\right) \ast \frac{1}{d}\,\text{Ш}\left(\frac{x}{d}\right)\mathrm{rect}\left(\frac{x}{Nd}\right)\right] = Na\,\mathrm{sinc}(\pi ua)\left[d\,\text{Ш}(ud) \ast \mathrm{sinc}(N\pi ud)\right]$$
$$= Na\,\mathrm{sinc}(\pi ua)\left[\sum_{m=-\infty}^{\infty}\delta\left(u - \frac{m}{d}\right) \ast \mathrm{sinc}(N\pi ud)\right]$$
$$= Na\,\mathrm{sinc}(\pi ua)\sum_{m=-\infty}^{\infty}\mathrm{sinc}\left[N\pi(ud - m)\right] .$$
This is a useful form as it shows that for large N (bottom row in Fig. B.10) the narrow peaks are described by sinc functions, see Section 6.7. This topic is also explored in Chapter 7.
Fig. B.9 The modulus-squared of the Fourier transform of $\text{Ш}_d^{(N)}(x)$ for increasing values of N.
Fig. B.10 The convolution of a rect and $\text{Ш}_d^{(N)}(x)$ (with d = 2a) for N = 1–11. The left-hand panel shows the convolution and the right-hand panel shows the corresponding modulus-squared of the Fourier transform. The square-wave pattern on the bottom row corresponds to a Ronchi grating in optics.
B.10 2D Fourier transforms

In 2D a Fourier transform takes the form of a double integral. In this case the definition of the Fourier transform and its inverse become
$$F(u,v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,e^{-i2\pi(ux+vy)}\,dx\,dy , \tag{B.54}$$
$$f(x,y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(u,v)\,e^{i2\pi(ux+vy)}\,du\,dv , \tag{B.55}$$
where u and v are the spatial frequencies corresponding with the x and y directions, respectively. In optics there are two cases which occur frequently: (i) cartesian separable where the function can be written as a product of functions of x and y; and (ii) cylindrical symmetry, where the function has cylindrical symmetry.
B.11 Cartesian separability

If the function f(x, y) is cartesian separable, we can write f(x, y) = g(x)h(y), then
$$\mathcal{F}[f(x,y)] = \mathcal{F}[g(x)]\,\mathcal{F}[h(y)] = G(u)H(v) , \tag{B.56}$$
where $G(u) = \mathcal{F}[g(x)]$ and $H(v) = \mathcal{F}[h(y)]$ are the 1D transforms. Note that 2D transforms have the same symmetry as the function, e.g. if the function has two axes of symmetry then the Fourier transform will also have two axes of symmetry.
B.12 2D rect
A simple example of a cartesian separable function is the two-dimensional rect function given by the product of two rect functions in the x and y directions, i.e.
$$\mathrm{rect}\left(\frac{x}{a}\right)\mathrm{rect}\left(\frac{y}{b}\right) = \begin{cases} 0 & |x| > a/2 \text{ or } |y| > b/2 \\ 1 & |x| \le a/2 \text{ and } |y| \le b/2 \end{cases} . \tag{B.57}$$
Using the one-dimensional Fourier transform of rect, eqn (B.31), and cartesian separability, eqn (B.56), we find that
$$\mathcal{F}\left[\mathrm{rect}\left(\frac{x}{a}\right)\mathrm{rect}\left(\frac{y}{b}\right)\right] = ab\,\mathrm{sinc}(\pi ua)\,\mathrm{sinc}(\pi vb) . \tag{B.58}$$
The Fourier transform of a two-dimensional rectangular function with width three times larger than the height, a = 3b, is shown in Fig. B.11.
Fig. B.11 Two-dimensional intensity map corresponding to the modulus-squared of the Fourier transform of a two-dimensional rect function with width three times larger than the height, a = 3b. Both the function and the transform have two axes of symmetry.
B.13 circ jinc

Another important class of two-dimensional functions are those with
cylindrical symmetry. In this case, it is convenient to use the
B.13 circ jinc 251

cylindrical
√ radial distance ρ = x2 + y 2 and its Fourier partner W =
u2 + v 2 .
The circ function is the cylindrically-symmetrical equivalent of the
rect function, and in optics it describes a circular aperture. The circ
function with diameter D is written as
ρ
0 ρ > D/2
circ = (B.59)
D 1 ρ ≤ D/2 .
The Fourier transform is given by
ρ πD2
F circ (u, v) = jinc (π W D) , (B.60)
D 4
where jinc is the cylindrical analogue of sinc. The derivation proceeds
as follows: we re-write the two-dimensional Fourier transform in polar
coordinates,
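$$\begin{aligned} \mathcal{F}[f] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\, e^{-i2\pi(ux+vy)}\, dx\, dy , \\ &= \int_{0}^{2\pi}\int_{0}^{\infty} f(\rho, \theta)\, e^{-i2\pi W\rho(\sin\phi\sin\theta+\cos\phi\cos\theta)}\, \rho\, d\rho\, d\theta , \end{aligned}$$
where W and φ are the Fourier space equivalents of ρ and θ. Using the
identity sin φ sin θ + cos φ cos θ = cos(φ − θ), if f(ρ, θ) is independent of
θ, then the angular integral gives the Bessel function,
$$J_0(2\pi W\rho) = \frac{1}{2\pi}\int_{0}^{2\pi} e^{-i2\pi W\rho\cos(\phi-\theta)}\, d\theta ,$$
and we can rewrite the Fourier transform as an integral over ρ only,
$$\begin{aligned} \mathcal{F}\left[\mathrm{circ}\left(\frac{\rho}{D}\right)\right] &= 2\pi\int_{0}^{D/2} J_0(2\pi W\rho)\, \rho\, d\rho , \\ &= \frac{\pi D^2}{4}\,\frac{2J_1(\pi W D)}{\pi W D} , \end{aligned}$$
using the Bessel function identity
$$\alpha J_1(\alpha) = \int_{0}^{\alpha} \beta J_0(\beta)\, d\beta .$$
Comparing with eqn (B.60) gives $\mathrm{jinc}(\alpha) = 2J_1(\alpha)/\alpha$,
which is equal to 1 at α = 0.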
In Fig. B.12 we plot the modulus-squared of the jinc function. In
optics the light distribution corresponding to jinc² is known as the Airy
pattern. The first zero of the jinc pattern occurs at
$$W D = 1.22 , \tag{B.61}$$
compared to uD = 1 for a sinc. The jinc function appears frequently
in optics because we often deal with systems with cylindrical symmetry
and with circular apertures. For example, lenses are typically circular
and we will find that the diffraction-limited focus of a lens corresponds
to the Airy pattern, and that this sets the limit for the resolution of
most optical instruments, such as telescopes.
Fig. B.12 (i) The circ function. (ii) The modulus-squared of the Fourier
transform corresponds to an Airy pattern with first zero at a radial
displacement corresponding to a spatial frequency of 1.22/D.
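The value 1.22 in eqn (B.61) can be recovered numerically. Since jinc is proportional to J1(α)/α, its first zero coincides with the first positive zero of J1. The sketch below evaluates J1 from its standard integral representation and locates the root by bisection; the function names, the step count, and the bracketing interval [3, 4.5] are assumptions made for illustration.

```python
import math


def bessel_j1(x, steps=400):
    """J1(x) = (1/pi) * integral from 0 to pi of cos(theta - x sin(theta)) d(theta)."""
    dtheta = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * dtheta  # midpoint rule
        total += math.cos(theta - x * math.sin(theta))
    return total * dtheta / math.pi


def first_zero_of_j1(lo=3.0, hi=4.5, tol=1e-10):
    """Bisection for the first positive root of J1 inside an assumed bracket."""
    f_lo = bessel_j1(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j1(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


alpha_1 = first_zero_of_j1()        # root in the variable alpha = pi * W * D
wd_first_zero = alpha_1 / math.pi   # corresponding value of W D
print(f"first zero of J1: {alpha_1:.4f}")
print(f"W D at the first zero: {wd_first_zero:.4f}")
```

Dividing the root by π converts from the argument of jinc to the product WD used in eqn (B.61).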
Example B.2
Arrays of identical shapes: Arrays of the same object are formed by a convolution
with a replicating comb function, see Sections 6.7 and B.9. Here we consider
the simple example of two circ functions separated by a distance, d, as shown in
Fig. B.13(i). Two circ functions centred at positions ±d/2 along the x axis are
described by the function
$$f(x, y) = \mathrm{circ}\left(\frac{\rho}{D}\right) \ast X_d^{(2)}(x) . \tag{B.62}$$
The Fourier transform is
$$\mathcal{F}[f(x, y)](u, v) = 2\cos(\pi u d)\,\frac{\pi D^2}{4}\,\mathrm{jinc}(\pi W D) . \tag{B.63}$$
The modulus-squared of the Fourier transform corresponds to an Airy pattern with
cosine-squared interference fringes, as observed in Young’s two-hole experiment, see
Fig. B.13(ii) plus Chapters 3 and 5.
Fig. B.13 Fourier transform of two circ functions: (i) The two circ
functions in the (x, y) plane separated by a distance d in the x direction.
(ii) The modulus-squared of the Fourier transform in the (u, v) plane. In
optics, this distribution is observed in Young’s two-hole experiment. The
maxima in the interference pattern occur at u = m/d, where m is an integer
and the first zero in the Airy pattern envelope occurs at u = 1.22/D.
B.14 Fourier on a computer

The application of Fourier transforms has become more widespread
following the development of algorithms allowing their implementation
on a computer. Most scientific software packages contain an in-built
Fourier module often called the Fast Fourier Transform (FFT). We can
use this as a black box but having some idea of how it works is useful.
One difficulty is that the FFT algorithm does not use the variables x
and u. Instead it works with integer indices, which we shall call m
and n, and our first task is to map the continuous input function,
f(x), into an array with N values uniformly distributed in x, e.g.,
f[m] = f(x[m]), where $x[m] = x_{\min} + m(x_{\max} - x_{\min})/N$, and m is
an integer between 0 and N − 1 (or 1 and N). The FFT algorithm
uses a discrete Fourier transform to find the amplitudes of the complex
harmonics, F[n], required to build a wave form that passes through the
points f[m]. For N points, the discrete Fourier transform is written as
$$F[n] = \sum_{m=0}^{N-1} f[m]\, e^{-i2\pi nm/N} , \tag{B.64}$$
where n is also an integer between 0 and N − 1. In this expression,
the real and Fourier space variables x and u are replaced by the
integers m and n, respectively. This equation is not exactly the discrete
variant of the Fourier transform because the sum is only over positive
frequencies (because computers prefer positive indices), whereas the
Fourier transform integrates over both positive and negative frequencies.
We can see the consequence of this difference by looking at a specific
example.16
16 One of the quickest routes to determining whether FFT delivers the
output we expect is to find the Fourier transform pair for a simple case
such as cos 2πu0 x, and checking that the Fourier transform returns peaks
at the spatial frequencies ±u0.
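The cosine check suggested in note 16 can be sketched with a direct evaluation of eqn (B.64). The sampling window, the number of points N, and the frequency u0 are assumed values, chosen so that a whole number of periods fits in the window. The mapping from index n to spatial frequency, u = n/(N dx), is the usual convention for a discrete Fourier transform and is not spelled out in the text.

```python
import cmath
import math


def dft(samples):
    """Direct evaluation of eqn (B.64): F[n] = sum_m f[m] exp(-i 2 pi n m / N)."""
    count = len(samples)
    return [
        sum(samples[m] * cmath.exp(-2j * math.pi * n * m / count) for m in range(count))
        for n in range(count)
    ]


N = 64                      # assumed number of samples
x_min, x_max = 0.0, 8.0     # assumed sampling window
u0 = 1.5                    # assumed spatial frequency (12 periods in the window)

dx = (x_max - x_min) / N
x = [x_min + m * dx for m in range(N)]
f = [math.cos(2 * math.pi * u0 * xm) for xm in x]

F = dft(f)
magnitudes = [abs(value) for value in F]

# The two largest entries mark the peaks. Index n maps to u = n / (N dx);
# indices above N/2 stand for negative frequencies u = (n - N) / (N dx).
peaks = sorted(sorted(range(N), key=lambda n: magnitudes[n])[-2:])
for n in peaks:
    u = n / (N * dx) if n < N / 2 else (n - N) / (N * dx)
    print(f"n = {n:2d}  ->  u = {u:+.2f}")
```

The negative-frequency peak appears near the end of the output array rather than at a negative index, which is the consequence of summing over positive indices only.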
Example B.3
Discrete FT: Using eqn (B.64) we implement a Fourier transform manually and
compare it to an in-built routine. As an input we choose the binomial sequence with
N = 9 values, i.e. f[n] = [1, 8, 29, 58, 72, 58, 29, 8, 1]. This has a similar shape to a
gaussian, as illustrated in the top row of Fig. B.14. To map a gaussian centred at
the origin onto the m axis, we would use x[m] = [−4, −3, −2, −1, 0, 1, 2, 3, 4].
The values of F[n] found using eqn (B.64) are plotted as points in the middle row.
The values are complex so we plot the modulus. The grey boxes indicate the output
of the ‘python’ open-source fft module. The fact that they agree shows that the
in-built module is using an algorithm based on eqn (B.64). Note that the first term
in our output array is F[0] which is the zero frequency component (or dc offset in
electronics). Putting n = 0 in eqn (B.64), we recover the central ordinate theorem,

$$F[0] = \sum_{m=0}^{N-1} f[m] , \tag{B.65}$$
which in our example is 264. For n > 0 the terms decrease and then increase again
close to n = N − 1. This increase is not so surprising when we remember that for a
discrete Fourier series the wave form repeats, and if we continued with values n ≥ N
we would return an identical sequence of numbers. This cyclic nature of the sum
means that we can shift the sequence such that zero is in the middle.17 The shifted
output is gaussian-like, Fig. B.14(bottom row), as expected for a ‘gaussian’ input.
17 In ‘python’, the fftshift function performs this operation.
Fig. B.14 The top row shows the input function. The middle row shows the
prediction of eqn (B.64) (black circles) and the output of an open source fft
code. The bottom row shows the effect of applying the function fftshift to the
output.
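The manual transform of Example B.3 can be sketched in a few lines of standard-library Python. The input is the sequence quoted in the example. The helper shift_zero_to_centre is a hypothetical stand-in for fftshift, written out so that the cyclic shift is explicit; with NumPy installed, numpy.fft.fft and numpy.fft.fftshift would be the natural in-built routines to compare against.

```python
import cmath
import math


def dft(samples):
    """Direct evaluation of eqn (B.64)."""
    count = len(samples)
    return [
        sum(samples[m] * cmath.exp(-2j * math.pi * n * m / count) for m in range(count))
        for n in range(count)
    ]


def shift_zero_to_centre(values):
    """Cyclic shift that moves the n = 0 term to the middle of the list."""
    half = len(values) // 2
    return values[-half:] + values[:-half]


f = [1, 8, 29, 58, 72, 58, 29, 8, 1]   # input sequence quoted in Example B.3
F = dft(f)
modulus = [abs(value) for value in F]
shifted = shift_zero_to_centre(modulus)

print("F[0]    =", round(F[0].real))
print("|F[n]|  :", [round(m, 2) for m in modulus])
print("shifted :", [round(m, 2) for m in shifted])
```

The first printed value is the sum of the input, eqn (B.65), and the shifted list has its largest entry in the middle, as in the bottom row of Fig. B.14.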
Exercises
(B.1) Fourier transform properties
Evaluate the Fourier transforms of the following functions, using
F(u) = F[f(x)](u), G(u) = F[g(x)](u), H(u) = F[h(x)](u), and
H(v) = F[h(y)](v), etc. where appropriate:
(i) g(x) + h(y).
(ii) f(x − d).
(iii) f(x − d/a).
(iv) f[(x − d)/a].
(v) g(x)h(x).
(vi) g(x)h(y).
(vii) [f(x) ∗ g(x)]h(x).
(viii) [f(x) ∗ g(x)]h(y).
(B.2) Fourier transforms
Evaluate the Fourier transforms of the following:
(i) rect(x/a) ∗ δ(x − d).
(ii) $\mathrm{rect}(x/a)\,e^{i2\pi u_0 x}$.
(iii) $X_d^{(4)}(x)$.
(iv) $X_d^{(2)}(x) \ast \mathrm{rect}(x/a)$.
(v) $\mathrm{gauss}(x/w_x)\,\mathrm{gauss}(y/w_y)$.
(vi) $\mathrm{gauss}(\rho/w_0)$.
(vii) $\mathrm{rect}(x/a)\,\mathrm{gauss}(y/w_0)$.
(viii) $\mathrm{circ}(\rho/D)\cos(2\pi u_0 x)$.
Comment on when you might encounter each of these functions in optics.
(B.3) rect
(a) Sketch the following functions: (i) rect(2x), (ii) (1/2)rect(x/2), and
(iii) 5rect[(x − 5)/5]. (b) Write down the Fourier transforms of the
following functions (i) 5rect(x), (ii) 3rect(x/3), and (iii) rect(x/5).
Sketch the sum of the functions, and the Fourier transform of the sum.
(B.4) Convolution of different-width rect functions
Use a graphical technique to convolve rect(x/a) with rect(x/b), where
b > a.
(B.5) sine and cosine
The Fourier transform of cosine is two δ functions,
$$\mathcal{F}[\cos(2\pi u_0 x)](u) = \tfrac{1}{2}\left[\delta(u - u_0) + \delta(u + u_0)\right] .$$
Which two properties of Fourier transforms are used in deriving this
result? What happens to the Fourier transform in the limit u0 → 0? What is
the Fourier transform of sin(2πu0 x)?
(B.6) Gaussian functions
Write an equation for the Fourier transform of f(x) = gauss(x/a). In the
propagation of light, such as the paraxial approximation to a plane wave
at an angle, we encounter quadratic phase factors of the form,
$H(k_x) = e^{ik_x^2 z/2k}$. The far-field light distribution can be written
as a convolution of the inverse Fourier transform of this function,
$h = \mathcal{F}^{-1}[H(k_x)](x)$, with the input field. Find the function h.
[Hint: H is a gaussian with a complex width, a, given by $1/a^2 = -iz/k$.]
(B.7) Dirac combs and replicators
By substituting $\tilde{x} = x/d$ and $\tilde{u} = ud$ into
$$\mathcal{F}[X(\tilde{x})] = \int_{-\infty}^{\infty}\sum_{m=-\infty}^{\infty} \delta(\tilde{x} - m)\, e^{-i2\pi\tilde{u}\tilde{x}}\, d\tilde{x} = X(\tilde{u}) ,$$
show that
$$\mathcal{F}\left[X\left(\frac{x}{d}\right)\right](u) = d\,X(ud) .$$
If we define the replicator functions:
$$X_d(x) = \sum_{m=-\infty}^{\infty} \delta(x - md) = \frac{1}{d}\,X\left(\frac{x}{d}\right) ,$$
$$X_{1/d}(u) = \sum_{m=-\infty}^{\infty} \delta\left(u - \frac{m}{d}\right) = d\,X(ud) ,$$
use the results above to show that
$$\mathcal{F}[X_d(x)](u) = X_{1/d}(u) .$$
Comment on the unusual scaling property of this pair of functions.
(B.8) comb
An aperture consists of four narrow slits at x = −2d, −d, d, and 2d. Write
an expression for the aperture function, f(x), both as a difference
between a comb function and a δ-function, and as a convolution of two comb
functions. Write expressions for the Fourier transform in both cases as a
sum of phasors, and show that they are the same. How many subsidiary
maxima are there between the principal maxima in the interference pattern
far downstream of the aperture? Comment on your reasoning by describing or
drawing a phasor diagram and giving the angles corresponding to each of
the zeros.
(B.9) Spatial frequency or angular spatial frequency
Rewrite eqns (B.3) to (B.14) in terms of angular spatial frequencies
kx = 2πu and ky = 2πv.
(B.10) Two-dimensional Fourier transforms
If F(kx, ky) = F[f(x, y)], find an expression for F(0, 0) for
f(x, y) = rect(x/D)rect(y/D) and f(x, y) = circ(ρ/D). What is the ratio?
(B.11) tri
Find the Fourier transform of the function,
tri(x/a) = (1/a)[rect(x/a) ∗ rect(x/a)], see Fig. B.15. Comment on the
dimensions of tri(x/a).
Fig. B.15 The convolution of two rect functions. See Exercise B.11.
C Induced dipoles
C.1 Induced dipole moment
Here, we present a semi-classical derivation of a polarizability, α, for a
two-level quantum system such as an atom, ion, molecule, or quantum
dot. The two-level system consists of a ground state, labelled a, and
an excited state, labelled b, with energies $E_a = 0$ and $E_b = \hbar\omega_0$,
respectively. The expectation value of the dipole operator, or induced
dipole moment, is
$$d = -e\langle\psi|r_\lambda|\psi\rangle , \tag{C.1}$$
where $r_\lambda$ is a projection of the electron position, r, on the
polarization vector of the light. For the two-state wave function,
$$|\psi\rangle = c_a e^{-iE_a t/\hbar}|a\rangle + c_b e^{-iE_b t/\hbar}|b\rangle , \tag{C.2}$$
we find that
$$\begin{aligned} d &= -D_0\left(c_a^* c_b e^{-i\omega_0 t} + c_b^* c_a e^{i\omega_0 t}\right) \\ &= -D_0\left(\rho_{ab} e^{-i\omega_0 t} + \rho_{ba} e^{i\omega_0 t}\right) , \end{aligned} \tag{C.3}$$
where $\rho_{ij}$ are known as coherences, the off-diagonal elements of the
density matrix $\rho = |\psi\rangle\langle\psi|$, and
$D_0 = |D_{ij}| = |{-e}\langle i|r_\lambda|j\rangle|$ is the dipole
matrix element for a transition between states $|i\rangle$ and $|j\rangle$.1 It is
important to make a clear distinction between D0, which is a constant
and depends only on the specific properties of the atom, and d, the
induced dipole moment, which depends on the amplitude of the
applied field.
1 If $|i\rangle$ and $|j\rangle$ are real then $D_{ji} = D_{ij}$. We choose to
define D0 as the modulus of the matrix element such that it is positive, but
this means that we need to include the minus sign explicitly in the
expression for the induced dipole.
Not at all transparent in eqn (C.3) is that (i) the induced dipole
oscillates at the same frequency as the driving field and (ii) the induced
dipole moment is real. If the two-level system at z = 0 is subject to a
monochromatic field,
E = E0 cos(kz − ωt) , (C.4)
then it is convenient to rewrite eqn (C.3) as
d = −D0 (ρ̃ab e−iωt + ρ̃ba eiωt ) , (C.5)
where
ρ̃ab = ρab eiΔt , and ρ̃ba = ρba e−iΔt , (C.6)

and Δ = ω − ω0 is the detuning. Below we show that for constant field

amplitude E, ρ̃ is time independent, and hence the dipole oscillates at
the same frequency as the field. Next, we separate the terms in eqn (C.5)
into components that oscillate in phase and in quadrature (90◦ out of
phase) with the driving field,
d = −2D0 [u cos ωt − v sin ωt] , (C.7)
where
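$$u = \frac{1}{2}\left(\tilde{\rho}_{ab} + \tilde{\rho}_{ba}\right) , \quad \text{and} \quad v = \frac{1}{2i}\left(\tilde{\rho}_{ab} - \tilde{\rho}_{ba}\right) . \tag{C.8}$$
For a constant, real field amplitude E0, both u and v are constant and
real. We find u and v by solving the optical Bloch equations.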
C.2 Optical Bloch equations

Starting from the Schrödinger equation,
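$$i\hbar\frac{\partial|\psi\rangle}{\partial t} = (H_0 + H_{\rm int})|\psi\rangle ,$$
with
$$H_0 = E_a|a\rangle\langle a| + E_b|b\rangle\langle b| , \quad \text{and} \quad H_{\rm int} = -\hat{\mathbf{D}}\cdot\mathbf{E} ,$$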
where D̂ = −er is the dipole operator. Note that the electric field
vector is directed from positive to negative, whereas the dipole vector is
directed from negative to positive, as in Fig. C.1. The lower energy state
is when D̂ and E are parallel, in which case we can write Hint = −D̂E,
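where $\hat{D} = -er_\lambda$, with $r_\lambda$ the projection of r onto the
polarization vector of the light. Inserting E = E0 cos(kz − ωt) and the
two-state wave function, we have, at z = 0,
$$\begin{aligned} & i\hbar\left(\dot{c}_a e^{-iE_a t/\hbar}|a\rangle + \dot{c}_b e^{-iE_b t/\hbar}|b\rangle\right) + c_a E_a e^{-iE_a t/\hbar}|a\rangle + c_b E_b e^{-iE_b t/\hbar}|b\rangle \\ &\quad = \left(E_a|a\rangle\langle a| + E_b|b\rangle\langle b| - \hat{D}E_0\cos\omega t\right)\left(c_a e^{-iE_a t/\hbar}|a\rangle + c_b e^{-iE_b t/\hbar}|b\rangle\right) . \end{aligned}$$
Taking the inner product with $\langle a|e^{iE_a t/\hbar}$, we find
$$i\hbar\dot{c}_a = -D_0 E_0 c_b e^{-i\omega_0 t}\cos\omega t ,$$
where $D_0 = |\langle a|\hat{D}|b\rangle|$ is the modulus of the dipole matrix
element.
Fig. C.1 A classical dipole in an electric field.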

Next we define a Rabi frequency,
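$$\Omega = \frac{D_0 E_0}{\hbar} . \tag{C.9}$$
Also we assume that $|\Delta| \ll \omega + \omega_0$, which allows us to neglect
terms in ω + ω0 (the rotating wave approximation). This gives
$$\dot{c}_a = -i\frac{\Omega}{2} c_b e^{i\Delta t} .$$
By taking the inner product with $\langle b|e^{iE_b t/\hbar}$ we obtain,
$$\dot{c}_b = -i\frac{\Omega}{2} c_a e^{-i\Delta t} .$$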
The solutions of these equations are easily found by differentiating the
first equation, and substituting from the second; and vice versa.2 The
next step is to include decay due to spontaneous emission which requires
that we use the density matrix, where the coherence term is
$$\dot{\rho}_{ba} = c_b\dot{c}_a^* + \dot{c}_b c_a^* = -i\frac{\Omega}{2}(\rho_{aa} - \rho_{bb})e^{-i\Delta t} .$$
We make the substitution, $\tilde{\rho}_{ij} = \rho_{ij}e^{-i\Delta t}$, giving
$$\dot{\tilde{\rho}}_{ba} = -i\frac{\Omega}{2}(\rho_{aa} - \rho_{bb}) + i\Delta\tilde{\rho}_{ba} .$$
2 For $c_a(0) = 1$ we find $|c_a|^2 = \cos^2(\Omega t/2)$ and
$|c_b|^2 = \sin^2(\Omega t/2)$, i.e. the populations oscillate at the Rabi
frequency.
Finally, we add a damping term. If the only damping mechanism is
spontaneous emission from the excited state then the coherence term
decays at a rate equal to one half of the spontaneous emission rate Γ
and the optical Bloch equation for the coherence becomes
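$$\dot{\tilde{\rho}}_{ba} = -i\frac{\Omega}{2}(\rho_{aa} - \rho_{bb}) + i\Delta\tilde{\rho}_{ba} - \frac{\Gamma}{2}\tilde{\rho}_{ba} .$$
Similarly, we obtain the corresponding equation for the rate of change
of the excited state population,
$$\dot{\rho}_{bb} = -i\frac{\Omega}{2}(\tilde{\rho}_{ab} - \tilde{\rho}_{ba}) - \Gamma\rho_{bb} .$$
We can combine these equations into equations for u, v, and the
population difference $w = \frac{1}{2}(\rho_{bb} - \rho_{aa})$:
$$\begin{aligned} \dot{u} &= -\frac{1}{2}\Gamma u + \Delta v , \\ \dot{v} &= -\Delta u - \frac{1}{2}\Gamma v - \Omega w , \\ \dot{w} &= \Omega v - \Gamma\left(w + \frac{1}{2}\right) . \end{aligned}$$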
In this form of the optical Bloch equations, the field acts as a torque
(Ω, 0, −Δ) on the Bloch vector (u, v, w) (see Adams et al., 1994). The
steady-state solutions of these equations are
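$$u_{\rm st} = \frac{\Delta}{\Omega}\frac{s}{1+s} , \quad v_{\rm st} = \frac{\Gamma}{2\Omega}\frac{s}{1+s} , \quad \text{and} \quad w_{\rm st} = -\frac{1}{2}\frac{1}{1+s} ,$$
where $s = (\Omega^2/2)/(\Delta^2 + \Gamma^2/4)$ is known as the saturation parameter.
Another useful result is the steady-state population in the excited state,
$$\rho_{bb} = w_{\rm st} + \frac{1}{2} = \frac{1}{2}\frac{s}{1+s} = \frac{\Omega^2/4}{\Delta^2 + \Gamma^2/4 + \Omega^2/2} .$$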
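The steady-state expressions can be checked by substituting them back into the equations for u, v, and w: all three time derivatives should vanish. A minimal sketch follows, with assumed illustrative values in units where Γ = 1; the function names are hypothetical.

```python
def steady_state(rabi, detuning, gamma):
    """Steady-state Bloch vector (u, v, w) from the closed-form expressions."""
    s = (rabi ** 2 / 2) / (detuning ** 2 + gamma ** 2 / 4)  # saturation parameter
    u = (detuning / rabi) * s / (1 + s)
    v = (gamma / (2 * rabi)) * s / (1 + s)
    w = -0.5 / (1 + s)
    return u, v, w


def bloch_derivatives(u, v, w, rabi, detuning, gamma):
    """Right-hand sides of the optical Bloch equations for u, v and w."""
    du = -0.5 * gamma * u + detuning * v
    dv = -detuning * u - 0.5 * gamma * v - rabi * w
    dw = rabi * v - gamma * (w + 0.5)
    return du, dv, dw


# Assumed illustrative values, in units where gamma = 1.
rabi, detuning, gamma = 0.8, 0.3, 1.0
u, v, w = steady_state(rabi, detuning, gamma)
residuals = bloch_derivatives(u, v, w, rabi, detuning, gamma)

rho_bb = w + 0.5
rho_bb_closed_form = (rabi ** 2 / 4) / (detuning ** 2 + gamma ** 2 / 4 + rabi ** 2 / 2)

print("residuals:", residuals)
print("rho_bb:", rho_bb, "closed form:", rho_bb_closed_form)
```

The residuals should be zero to rounding error, and the two values of the excited-state population should agree.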
In the linear-optics regime, Ω < Γ, the excited-state population on
resonance is $\rho_{bb} = (\Omega/\Gamma)^2 = (D_0 E_0/\hbar\Gamma)^2$. Using the
steady-state solution, we find that the steady-state induced dipole is
$$\begin{aligned} d &= -2D_0\left(u_{\rm st}\cos\omega t - v_{\rm st}\sin\omega t\right) , \\ &= -\frac{D_0^2}{\hbar}\left[\frac{\Delta}{\Gamma^2/4 + \Delta^2 + \Omega^2/2}\cos\omega t - \frac{\Gamma/2}{\Gamma^2/4 + \Delta^2 + \Omega^2/2}\sin\omega t\right]E_0 , \end{aligned} \tag{C.10}$$
where we have used $\Omega = D_0 E_0/\hbar$ in the numerator. From this form of the
average dipole we see that on resonance, Δ = 0, there is no component
of the dipole in phase with the field, $u_{\rm st} = 0$, and the oscillation of
the dipole lags the field by π/2 radians. Also, on resonance, the dipole
attains a maximum value of $D_0/\sqrt{2}$ at a Rabi frequency of
$\Omega = \sqrt{2}\,\Gamma/2$. This Rabi frequency corresponds to the saturation
intensity. In the linear-optics regime we can neglect the Ω² terms in the
denominator and the induced dipole is linearly proportional to E0,
$$d = -\frac{D_0^2}{\hbar}\left[\frac{\Delta}{\Gamma^2/4 + \Delta^2}\cos\omega t - \frac{\Gamma/2}{\Gamma^2/4 + \Delta^2}\sin\omega t\right]E_0 . \tag{C.11}$$
This result can be rewritten using complex short hand.
C.3 Complex polarizability

For a complex driving field, $E = E_0 e^{-i\omega t}$, we can rewrite eqn (C.11) as
d = αE0, where
$$\alpha = -\frac{D_0^2}{\hbar}\frac{1}{\Delta + i\Gamma/2} , \tag{C.12}$$
and the real and imaginary parts,
$$\mathrm{Re}\,\alpha = -\frac{D_0^2}{\hbar}\frac{\Delta}{\Delta^2 + (\Gamma/2)^2} \quad \text{and} \quad \mathrm{Im}\,\alpha = \frac{D_0^2}{\hbar}\frac{\Gamma/2}{\Delta^2 + (\Gamma/2)^2} , \tag{C.13}$$
match the terms in eqn (C.11). Note that far off resonance, |Δ| ∼ ω0,
we can neglect the damping but we must include the counter-rotating
(ω + ω0) terms, and the polarizability is
$$\alpha = \frac{D_0^2}{\hbar(\omega_0 - \omega)} + \frac{D_0^2}{\hbar(\omega_0 + \omega)} . \tag{C.14}$$
The dc polarizability3 is
$$\alpha = \frac{2D_0^2}{\hbar\omega_0} . \tag{C.16}$$
In Example C.1 we derive a relationship between Γ and the dipole
matrix element, D0.
3 We have restricted our attention to a system with one excited state.
However, in a multi-level system there will be many energy levels, with
energies $E_k$, that contribute to the polarizability. In this case, we sum
the contribution of each mode of the dipole to give a polarizability of
state j,
$$\alpha_j = \sum_{k \neq j}\frac{2|D_{jk}|^2}{E_k - E_j} . \tag{C.15}$$
Example C.1
Scattering rate: To find an expression for the scattering (or spontaneous decay)
rate, Γ, we can use the fact that the power radiated by the dipole must be equal to
the energy loss rate of a single quantum emitter. For a two-level atom, the energy loss
rate is equal to the probability of being in the excited state, ρbb , times the excited
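state decay rate, Γ, times the energy of one photon, $\hbar\omega_0$, i.e.
$$P = \Gamma\rho_{bb}\hbar\omega_0 , \tag{C.17}$$
where on resonance (Δ = 0) the steady-state excited-state probability is
$\rho_{bb} = (D_0 E_0/\hbar\Gamma)^2$. The power radiated by an oscillating dipole is
derived in Chapter 13, eqn (13.43). Equating the energy loss, eqn (C.17), and the
power radiated using the resonant polarizability, $\alpha = i2D_0^2/\hbar\Gamma$, we obtain
$$\Gamma\left(\frac{D_0 E_0}{\hbar\Gamma}\right)^2\hbar\omega_0 = \frac{1}{4\pi\epsilon_0}\left(\frac{2D_0^2}{\hbar\Gamma}\right)^2\frac{ck^4}{3}E_0^2 , \tag{C.18}$$
which gives
$$\Gamma = \frac{k^3 D_0^2}{3\pi\epsilon_0\hbar} = \frac{D_0^2}{3\pi\epsilon_0\hbar(\lambda/2\pi)^3} . \tag{C.19}$$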
From this result we see that the coupling between a dipole and the vacuum
(characterized by Γ) is approximately equal to the coupling between two dipoles
separated by a distance r = λ/2π. The dipole–dipole interaction energy is D0 Ed ,
where Ed is the dipolar field, eqn (A.21).
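To get a feel for the size of Γ in eqn (C.19), the sketch below evaluates both forms of the expression. The dipole matrix element (one elementary charge times one Bohr radius) and the wavelength (780 nm) are hypothetical round numbers chosen only for illustration; they are not taken from the text and do not describe any particular atom.

```python
import math

E_CHARGE = 1.602176634e-19        # elementary charge, C
EPSILON_0 = 8.8541878128e-12      # vacuum permittivity, F/m
HBAR = 1.054571817e-34            # reduced Planck constant, J s
BOHR_RADIUS = 5.29177210903e-11   # m


def decay_rate_from_k(d0, wavelength):
    """First form of eqn (C.19): Gamma = k^3 D0^2 / (3 pi eps0 hbar)."""
    k = 2 * math.pi / wavelength
    return k ** 3 * d0 ** 2 / (3 * math.pi * EPSILON_0 * HBAR)


def decay_rate_from_reduced_wavelength(d0, wavelength):
    """Second form of eqn (C.19), written with the length lambda / (2 pi)."""
    reduced = wavelength / (2 * math.pi)
    return d0 ** 2 / (3 * math.pi * EPSILON_0 * HBAR * reduced ** 3)


d0 = E_CHARGE * BOHR_RADIUS   # assumed dipole matrix element, C m
wavelength = 780e-9           # assumed optical wavelength, m

gamma_k = decay_rate_from_k(d0, wavelength)
gamma_r = decay_rate_from_reduced_wavelength(d0, wavelength)
print(f"Gamma (k form)        = {gamma_k:.3e} per second")
print(f"Gamma (lambda/2pi form) = {gamma_r:.3e} per second")
```

The two forms are algebraically identical, so the printed values should agree; the cubic dependence on k means that halving the wavelength raises Γ by a factor of eight for the same D0.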
C.4 Lorentz model

In 1904, before the discovery of quantum mechanics, Lorentz4 proposed
a harmonic oscillator model of light–matter interactions. Although
the model makes the false assumption that the induced dipole is linearly
proportional to the field, this is reasonable in the linear-optics regime,
and it is possible to add in the non-linear terms in a perturbative manner.
We will derive the classical equation of motion of a harmonically bound
charge, and use the solution to derive an expression for the refractive
index.
4 Lorentz, Hendrik Antoon, Electromagnetic Phenomena in a System
Moving with any Velocity Smaller than that of Light, Verslagen Konignklijke
Akademie Van Wetenschapen (Amsterdam). Proceedings of the Section of
Science, 6, 1904, pp. 809–836; p. 813.
Example C.2
Classical equation of motion: The bound charge is subject to a driving field,
E = E0 e−iωt , which induces a displacement x in the centre of mass of the electric
charge distribution. The displacement oscillates at the same frequency as the driving
field, x = x0 e−iωt . The amplitude of the oscillation depends on the amplitude of
the driving field, E0 , the drive frequency, ω, and the damping or scattering rate, Γ.
For convenience, we assume that the optical response is dominated by one resonance,
with resonant frequency ω0.5 This difference between the drive frequency and the
resonant frequency is called the detuning, Δ = ω − ω0.
Consider a dipole consisting of a single bound electron with resonant frequency
ω0. If the charge displacement, x, is small compared to the optical wavelength, then
the equation of motion has the form
$$\ddot{x} + \gamma\dot{x} + \omega_0^2 x = -\frac{e}{m}E_0 e^{-i\omega t} , \tag{C.20}$$
where −e and m are the charge and mass of the electron. Substituting the trial
solution, $x = x_0 e^{-i\omega t}$, we find
$$x_0 = -\frac{e}{m}\frac{1}{-\omega^2 + \omega_0^2 - i\omega\gamma}E_0 . \tag{C.21}$$
The induced dipole moment is defined as charge times displacement,
$$d = -ex_0 = -\frac{e^2}{m}\frac{1}{\omega^2 - \omega_0^2 + i\omega\gamma}E_0 . \tag{C.22}$$
5 For real atoms or molecules the bound charge has multiple resonances, but
often the light is closer to resonance with one particular resonance and we
can neglect the others.
Example C.3
Refractive index: From the induced dipole, eqn (C.22), assuming that the dipoles
do not interact, we obtain an expression for the refractive index,
$$n^2 - 1 = \chi = \frac{N d}{\epsilon_0 E} = -\frac{N e^2}{m\epsilon_0}\frac{1}{\omega^2 - \omega_0^2 + i\omega\gamma} , \tag{C.23}$$
which, as we will show, is similar to the quantum model. In a metal, there is no
binding, ω0 = 0, and $\omega_p = [N e^2/(m\epsilon_0)]^{1/2}$ is known as the plasma
frequency and the refractive index is given by
$$n^2 = 1 - \frac{\omega_p^2}{\omega^2 + i\omega\gamma} . \tag{C.24}$$
At high frequency, ω > γ, this reduces to
$$n^2 = 1 - \frac{\omega_p^2}{\omega^2} . \tag{C.25}$$
Example C.4
Comparison between quantum and classical dipole models: If the drive
frequency is close to resonance, $\omega + \omega_0 \approx 2\omega_0$ and
$\omega \approx \omega_0$, then
$$d = \frac{e^2}{m\omega_0}\frac{1}{-2\Delta - i\Gamma}E_0 . \tag{C.26}$$
Rewriting the oscillation frequency in terms of the harmonic oscillator length,
$a_0^2 = \hbar/2m\omega_0$, we obtain
$$d = -\frac{e^2 a_0^2}{\hbar}\frac{1}{\Delta + i\Gamma/2}E_0 . \tag{C.27}$$
We can define a semi-classical analogue of the dipole matrix element as
$D_0 = -ea_0$,6 and then
$$d = -\frac{D_0^2}{\hbar}\frac{1}{\Delta + i\Gamma/2}E_0 = \alpha E_0 , \tag{C.28}$$
where
$$\alpha = -\frac{D_0^2}{\hbar}\frac{1}{\Delta + i\Gamma/2} . \tag{C.29}$$
This is the same result derived for the two-level system in the linear optics regime,
eqn (C.12). Note also the difference between the quantum and classical models:
eqn (C.28) predicts that the induced dipole grows linearly in proportion to the applied
field, whereas the (correct) quantum model, see eqn (C.10), shows that there is
saturation.
6 As previously, there is a clear distinction between the constant D0 which
only depends on the properties of the atom, and the induced dipole moment
d which depends on the amplitude of the applied field.
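The near-resonance approximation that takes eqn (C.22) to eqn (C.26) can be tested numerically. The sketch below compares the two expressions for d/E0 in assumed units where e²/m = 1 and ω0 = 1, with the classical damping rate γ playing the role of Γ; the function names and parameter values are illustrative.

```python
def lorentz_response(omega, omega0, gamma, e2_over_m=1.0):
    """d / E0 from the full Lorentz expression, eqn (C.22)."""
    return -e2_over_m / (omega ** 2 - omega0 ** 2 + 1j * omega * gamma)


def near_resonance_response(omega, omega0, gamma, e2_over_m=1.0):
    """d / E0 from eqn (C.26), valid when omega is close to omega0."""
    detuning = omega - omega0
    return (e2_over_m / omega0) / (-2 * detuning - 1j * gamma)


omega0 = 1.0     # assumed resonant frequency (sets the unit)
gamma = 1e-3     # assumed damping rate, much smaller than omega0
for detuning in (-5e-3, -1e-3, 0.0, 1e-3, 5e-3):
    omega = omega0 + detuning
    exact = lorentz_response(omega, omega0, gamma)
    approx = near_resonance_response(omega, omega0, gamma)
    relative_error = abs(exact - approx) / abs(exact)
    print(f"Delta = {detuning:+.0e}   relative error = {relative_error:.2e}")
```

The relative error grows with |Δ|/ω0, which is the small parameter dropped when ω + ω0 is replaced by 2ω0.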
References
Abbott, B. P. et al. (LIGO Scientific Collaboration and Virgo

Collaboration) (2016) Observation of gravitational waves from a binary
black hole merger Physical Review Letters 116 061102.
Adams, C. S., Siegel, M., and Mlynek, J. (1994) Atom optics Physics
Reports 240 143–210.
Aitchison, I. J. R. and Hey, A. J. G. (1989) Gauge Theories in Particle
Physics, 2nd edition, Taylor and Francis, Bristol.
Berg, M. J., Sorensen, C.M., and Chakrabarti, A. (2011) A
new explanation of the extinction paradox Journal of Quantitative
Spectroscopy and Radiative Transfer 112 1170–81.
Bettles, R. J., Gardiner, S. A., and Adams, C. S. (2016) Enhanced
optical cross section via collective coupling of atomic dipoles in a 2D
array Physical Review Letters 116 103602.
Born, M., and Wolf, E. (1999) Principles of Optics 7th edition,
Cambridge University Press, Cambridge.
Boyd, R. W. (1980) Intuitive explanation of the phase anomaly of
focused light beams Journal of the Optical Society of America B 70
877–80.
Boyd, R. W. (1992) Nonlinear Optics 3rd edition, Academic Press,
London.
Boyd, R. W. and Gauthier, D. J. (2002) “Slow” and “fast” light
Progress in Optics 43 497–530.
Brillouin, L. (1960) Wave Propagation and Group Velocity Academic
Press, London.
Brooker, G. (2003) Modern Classical Optics Oxford University Press,
Oxford.
Brown, T. G. (2011) Unconventional polarization states: beam
propagation, focusing and imaging Progress in Optics 56 81–129.
Champeney, D. C. (1973) Fourier Transforms and their Physical
Applications Academic Press, London.
Chen, Z., Hua, L. and Pu, J. (2012) Tight focusing of light beams: effect
of polarization, phase, and coherence Progress in Optics 57 219–260.
Clemmow, P. C. (1966) The Plane Wave Spectrum Representation of
Electromagnetic Fields Pergamon Press, Oxford.
Corney, A. (1977) Atomic and Laser Spectroscopy. Oxford University
Press, Oxford.
Dalibard, J. and Cohen-Tannoudji, C. (1989) Laser cooling below the

Doppler limit by polarization gradients: simple theoretical models
Journal of the Optical Society of America B 6 2023–2045.
Darrigol, O. (2012) A History of Optics from Greek Antiquity to the
Nineteenth Century Oxford University Press, Oxford.
Dorn, R., Quabis, S., and Leuchs, G. (2003) Sharper focus for a radially
polarized light beam Physical Review Letters 91 233901.
Durak, K., Nguyen, C. H., Leong, V., Straupe, S., and Kurtsiefer,
C. (2014) Diffraction-limited Fabry-Perot cavity in the near concentric
regime New Journal of Physics 14 103002.
Evangelista, C., Kraft, P., Dacke, M., Labhart T. and Srinivasan M.
V. (2014) Honeybee navigation: critically examining the role of the
polarization compass Philosophical Transactions of the Royal Society
of London B: Biological Sciences 369 20130037.
Fearn, H., James, D. F. V., and Milonni, M. W. (1996) Microscopic
approach to reflection, transmission, and the Ewald–Oseen extinction
theorem American Journal of Physics 64 986–95.
Fischer, R. E., Tadic-Galeb, B. and Yoder, P. R. (2008) Optical System
Design, 2nd edition, SPIE Press, McGraw Hill, New York.
Fleischhauer, M., Imamoglu, A. and Marangos, J. P. (2005)
Electromagnetically induced transparency: Optics in coherent media
Reviews of Modern Physics 77 633–73.
Foot, C. J. (2004) Atomic Physics Oxford University Press, Oxford.
Freegarde, T. (2012) Introduction to the Physics of Waves Cambridge
University Press, Cambridge.
Gehring, G. M., Schweinsberg, A., Barsi, C., Kostinski, N., and Boyd,
R. W. (2006) Observation of backward pulse propagation through a
medium with a negative group velocity Science 312 895–7.
Goodman, J. W. (1985) Statistical Optics John Wiley and Sons, New
York.
Harris, F. J. (1978) On the use of windows for harmonic analysis with
the discrete Fourier transform Proceedings of the IEEE 66 51–83.
Hartog, A. H. and Adams, M. J. (1977) On the accuracy of the WKB
approximation in optical dielectric waveguides Optical and Quantum
Electronics 9 223–32.
Hau, L. V., Harris, S. E., Dutton, Z. and Behroozi, C. H. (1999) Light
speed reduction to 17 metres per second in an ultracold atomic gas
Nature 397 594–8.
Hooker, S. and Webb, C. (2010) Laser Physics Oxford University Press,
Oxford.
Hopkins, H. H. (1950) Wave Theory of Aberrations Oxford University
Press, Oxford.
Hopkins, S. A. and Durrant, A. V. (1997) Parameters for polarization
gradients in three-dimensional electromagnetic standing waves Physical
Review A 56 4012–22.
Horváth, G., Barta, A., Pomozi, I., Suhai, B., Hegedüs, R., Åkesson,
S., Meyer-Rochow, B. and Wehner, R (2011) On the trail of Vikings
with polarized skylight: experimental study of the atmospheric optical
prerequisites allowing polarimetric navigation by Viking seafarers
Philosophical Transactions of the Royal Society of London B: Biological
Sciences 366 772–82.
Hughes, I. G. and Hase, T. P. A. (2010) Measurements and their
Uncertainties: A Practical Guide to Modern Error Analysis Oxford
University Press, Oxford.
Hwang, J., Pototschnig, M., Lettow, R., Zumofen, G., Renn, A.,
Götzinger, S., and Sandoghdar V., (2009) A single-molecule optical
transistor Nature 460 76–80.
Isham, C. J. (1995) Lectures On Quantum Theory: Mathematical And
Structural Foundations Imperial College Press, London.
Jackson, J. D. (1999) Classical Electrodynamics 3rd edition, John Wiley
& Sons, New York.
Jackson, J. D. and Okun, L. B. (2001) Historical roots of gauge
invariance Review of Modern Physics 73 663–80.
Jackson, J. D. (2002) From Lorenz to Coulomb and other explicit gauge
transformations American Journal of Physics 70 917–28.
Jacques, V., Lai, N. D., Dréau, A., Zheng, D., Chauvat, D., Treussart,
F., Grangier, P. and Roch, J.-F. (2008) Illustration of quantum
complementarity using single photons interfering on a grating New
Journal of Physics 10 123009.
Jacquinot, P. and Roizen-Dossier, B. (1964) Apodisation Progress in
Optics 3 29–186.
Jennewein, S., Sortais, Y. R. P., Greffet, J. J., and Browaeys, A. (2016)
Propagation of light through small clouds of cold interacting atoms
Physical Review A 94 053828.
Jones, P. H., Maragò, O. M., and Volpe, G (2015) Optical Tweezers:
Principles and Applications Cambridge University Press, Cambridge.
Joos, E. and Zeh, H. D. (2003) Decoherence and the Appearance of a
Classical World in Quantum Theory Springer, Berlin.
Kasevich, M. and Chu, S. (1992) Laser cooling below a photon recoil
with three-Level atoms Physical Review Letters 69 1741–4.
Kasdin, N. J., Vanderbei, R. J., Spergel, D. N., and Littman, M.
G. (2003) Extrasolar planet finding via optimal apodized-pupil and
shaped-pupil coronagraphs The Astrophysical Journal 582 1147–61.
Keaveney, J., Hughes, I. G., Sargsyan, A., Sarkisyan, D., and Adams,
C. S. (2012) Maximal refraction and superluminal propagation in a
gaseous nanolayer Physical Review Letters 109 233001.
Krist, J. E., Hook, R. N., Stoehr, F. (2011) 20 years of Hubble Space
Telescope optical modeling using Tiny Tim Proceedings of the SPIE
8127 81270J.
Kahr, B., Freudenthal, J., Phillips, S., and Kaminsky W. (2009)

Herapathite, Science 324 1407.
Kuhr, S., Gleyzes, S., Guerlin, C., Bernu, J., Hoff, U. B., Deléglise,
S., Osnaghi, S. M. Brune, M., and Raimond, J.-M. (2007) Ultrahigh
finesse Fabry–Perot superconducting resonator, Applied Physics Letters
90 164101.
Kuzmich, A., Dogariu, A., Wang, L. J., Milonni, P. W., and Chiao, R.
Y. (2001) Signal velocity, causality, and quantum noise in superluminal
light pulse propagation Physical Review Letters 86 3925.
Lancaster, T. and Blundell, S. J. (2014) Quantum Field Theory for the
Gifted Amateur Oxford University Press, Oxford.
Lax, M., Louisell, W. H., and McKnight, W. B. (1975) From Maxwell
to paraxial wave optics Physical Review A 11 1365–70.
Lekner, J (2003) Polarization of tightly focused laser beams Journal of
Optics A: Pure and Applied Optics 5 6–14.
Levitt, T. (2009) The Shadow of the Enlightenment Oxford University
Press, Oxford.
Loudon, R. (2000) The Quantum Theory of Light Oxford University
Press, Oxford.
Mandel, L. and Wolf, E. (1995) Optical Coherence and Quantum Optics
Cambridge University Press, Cambridge.
Maucher, F., Skupin, S., Gardiner, S. A., and Hughes, I. G (2018)
Creating optical longitudinal polarization structures Physical Review
Letters 120 163903.
Michelson, A. A. and Pease, F. G. (1921) Measurement of the diameter
of alpha Orionis with the interferometer Astrophysical Journal 53 249–
59.
Milnor, J (1978) Analytic proofs of the “hairy ball theorem” and the
Brouwer fixed point theorem The American Mathematical Monthly 85
521–4.
Milonni, P. (2005) Fast Light, Slow Light and Left-Handed Light Taylor
and Francis, New York.
Mulligan, J. F. (1998) Who were Fabry and Perot? American Journal
of Physics 66 797–802.
Nielsen, M. A. and Chuang, I. L. (2011) Quantum Computation and
Quantum Information Cambridge University Press, Cambridge.
Novotny, L. and Hecht, B. (2012) Principles of Nano-Optics 2nd
edition, Cambridge University Press, Cambridge.
O’Neill, E. L. (1963) Introduction to Statistical Optics Dover, New
York.
Porter, A. B. (1906) On the diffraction theory of microscopic vision The
London, Edinburgh, and Dublin Philosophical Magazine and Journal of
Science 11 154–66.
Potton, R. J. (2004) Reciprocity in optics Rep. Prog. Phys. 67 717–54.
Poynting, J. H. (1909) The wave motion of a revolving shaft, and

a suggestion as to the angular momentum in a beam of circularly
polarised light Proc. R. Soc. London A 82 560–7.
Radwell, N., Hawley, R. D., Götte, J. B., and Franke-Arnold, S.
(2016) Achromatic vector vortex beams from a glass cone Nature
Communications 7 10564.
Rashid, R. (1990) A pioneer in anaclastics: Ibn Sahl on burning mirrors
and lenses Isis 81 464–91.
Rhodes, D. R. (1964) On a fundamental principle in the theory of
planar antennas Proceedings of the IEEE 52 1013–21.
Richards, B. and Wolf, E. (1959) Electromagnetic diffraction in optical
systems. II. Structure of the image field in an aplanatic system Proc.
R. Soc. London A 253 358–79.
Robinson, F. N. H. (1973) Electromagnetism Oxford University Press,
Oxford.
Roychoudhuri, C., Kracklauer, A. F., and Creath, K. (2008) The Nature
of Light: What is a Photon? CRC Press, Boca Raton.
Saleh, B. E. A. and Teich, M. C. (1991) Fundamentals of Photonics
Wiley, New York.
Schmidt, O., Wynands, R., Hussein, Z. and Meschede, D. (1996) Steep
dispersion and group velocity below c/3000 in coherent population
trapping Physical Review A 53 R27–30.
Sherman, G. C. and Bremermann, H. J. (1969) Generalization of the
angular spectrum of plane waves and the diffraction Transform, J. Opt.
Soc. B 59 146–56.
Souza, J. A. (1983) Alternative derivation of the electric dipole
radiation fields American Journal of Physics 51 54.
Smith, G.S. (1997) An Introduction to Classical Electromagnetic
Radiation Cambridge University Press, Cambridge.
Stamnes, J. J. (1986) Waves in Focal Regions Hilger, Bristol.
Taylor, G. I. (1909) Interference fringes with feeble light Proceedings
of the Cambridge Philosophical Society 15 114–5.
Tey, M. K., Chen, Z., Aljunid, S. A., Chng, B., Huber, F., Maslennikov,
G., and Kurtsiefer, Ch. (2008) Strong interaction between light and a
single trapped atom without the need for a cavity Nature Physics 4
924–7.
Tinoco Jr., I. and Freeman, M. P. (1957) The optical activity of oriented
copper helices J. Phys. Chem. 61 1196–200.
Wang, L. J., Kuzmich, A., and Dogariu, A. (2000) Gain-assisted
superluminal light propagation Nature 406 277–9.
Weller, L., Kleinbach, K. S., Zentile, M. A., Knappe, S., Hughes, I. G.,
and Adams, C. S. (2012) Optical isolator using an atomic vapor in the
hyperfine Paschen-Back regime Optics Letters 37 3405–7.
Whittaker, K. A., Keaveney, J., Hughes, I. G., and Adams, C. S. (2015)
Hilbert transform: applications to atomic spectra Physical Review A
91 032513.
Wiener, N. (1930) Generalized harmonic analysis Acta Math. 55 117–
258.
Wolf, E. (1959) Electromagnetic diffraction in optical systems. I. An
integral representation of the image field Proc. R. Soc. London A 253
349–57.
Yariv, A. and Yeh, P. (2007) Photonics: Optical Electronics in Modern
Communications 6th edition, Oxford University Press, Oxford.
Youngworth, K. S. and Brown, T. G. (2000) Focusing of high numerical
aperture cylindrical-vector beams Optics Express 7 77–87.
Zangwill, A. (2013) Modern Electrodynamics Cambridge University
Press, Cambridge.
Zhan, Q. (2009) Cylindrical vector beams: from mathematical concepts
to applications Advances in Optics and Photonics 1 1–57.
Index
A axis bunched light, 136

fast, 59, 60–1 b-V plot, 190–1
Abbe, Ernst, 147 optical, 9, 11, 19, 23–4, 29, 38, 74,
aberration, 26, 147, 181, 183 113, 150, 186, 203–4 C
chromatic, 28, 75 slow, 59, 60–1
spherical, 28, 149, 203–4 autocorrelation function of the electric calcite, 68
absorption, 57, 63, 123, 217 field, 132, 133–5 carrier wave, 112, 119–20
Aharonov, Yakir, 235 azimuthal polarization, 201–3 cartesian separability, 76–7, 79, 83,
Aharonov–Bohm effect, 235 162, 178, 187, 239, 250
Airy pattern, 78, 102, 148–52, 160, 174, B Cauchy’s dispersion theorem, 126
215, 251–2 causality, 100, 123–4, 220, 231
al Haytham, Ibn, 26 Babinet, Jacques, 104 cavity, 44
amplitude division, 45, 47, 49, 132 Babinet’s principle, 104–6, 107, 109, Fabry-Perot, 45–7
anaclastic lens, 26 215, 225, 241 laser, 63, 115–16, 125–6, 177, 182–3,
angular frequency, 3 backward propagating wave, 213 194, 202, 222
angular resolution limit, 151 Bacon, Roger, 26 limits of stability, 182
angular spatial frequency, 6 bandwidth, 113, 115–16, 121, 123, 125, cell, 26
angular spectrum, vii, 91, 95–8, 107–8, 129, 133–5, 142, 146 central ordinate theorem, 239, 241
111–2, 117, 152–3, 178, 184–5 theorem, 113 chaotic light, 128, 133
vector, 195, 198–204, 210–11 beam charge density, 219
anharmonic, 229–30 gaussian, see gaussian beam chiral chemistry, 51
anisotropic radius, 83–5, 97–8, 104, 167, 179, 186 chirality, 57, 62
intrinsic for spherical electromagnetic vector field associated with, 197–210
chirp, 120
waves, 198 waist, 98, 108, 177–82, 221
chromatic aberration, 28, 75
medium, 57 beamsplitter, 47–8, 50, 58
chromatic dispersion coefficient, 121
annular filter, 163–4, 207 beats, 116, 131
circ function, 78, 141, 188, 250–2
anomalous dispersion, 119, 122–4, 126, Beer, August, 217
circular aperture, 75–6, 86, 102–3, 141,
231 bees, 51
163, 251
anti-bunched light, 136 Bessel function, 78, 164, 188–91, 205,
circular basis, 52, 55–6
anti-reflection coating, 46 251
circular birefringence, 61–4
aperture function, 73, 138, 153, 159–60 Betelgeuse, 141
beyond paraxial, 196 circular dichroism, 63
array, 102
complementary aperture, 104–5 Biot, Jean-Baptiste, 58, 62 circular polarization, 52–5
double slit, 79, 87, 101, 113, 153 birefringence, 57, 59, 61–3 cladding, of a fibre, 188–92
grating, 103–4 bit error rate (BER), 121 Clausius, Rudolf, 220
lens, 78 Bloch Clausius–Mossotti relationship, 219–20
single slit, 79–80, 102 optical Bloch equations, 256–7 coherence, vii, 12, 33, 48, 127–45
apodization, vii, 158–64, 173 sphere, 56 area, 140–1
Arago, Francois, 58, 72 blue sky, 223, 231 first order, 132, 136
Arago, spot of, 71–2, 76, 88 Bohm, David, 235 length, 129–32, 135–6, 139–40
argon-ion laser, 115–16 Bohr second order, 136
array theorem, 103 complementarity, 156, 171–2 time, 129–30, 133
aspherical surface, 28 model, 4, 229 collimation, 28, 32
atom, 37, 63–4, 66, 75, 105, 113, 123, radius, 4, 260 collisional broadening, 128
128, 143, 156, 162, 214, Bouguer-Beer-Lambert law, 217 comb, 14
219–20, 223–4, 235, 255, boundary condition, 20–1, 115, 182, Dirac, 104, 247–8, 254
258–60 184, 189–91 frequency, 114
attenuation, 57–8, 62, 123, 214, 217, Brewster’s angle, 22, 58 function, 102–4, 113, 115, 170, 240,
225, 230 Brillouin, Leon Nicholas, 123–4, 189 247–8, 252, 254
attenuation coefficient, 61, 217, 228 Broglie, Louis Victor de, 11 complementarity, 156, 171–2
complementary aperture, 76, 104, 105, degree of coherence, 128, 136, 139–40 wave, 4, 13
165, 225 density matrix, 255 electromagnetically induced transparency
complex depth of (EIT), 122
beam parameter, 178, 180–1, 197 field, 158 electron, 4, 11, 37, 61, 156, 213
notation, Fourier series, 93 focus, 158 bound, 20, 213
notation, polarizability, 214, 258 dextro rotatory, 62 free, 213, 225
notation, Fourier transform, 94, 113 dichroism, 57–8, 63 ellipsoid, 26
notation, polarization, 54–5, 68–9 dielectric, 4, 213, 219–220 elliptical polarization
notation, plane waves, 16, 19, 32 diffraction, 57–8, 63 polarization, 55–6, 60, 68, 70
numbers, vii, 8–9 Fraunhofer, vii, 77–85, 101–105, 148, wave fronts, 26
modulus-squared, 8 153, 181, 184, 241 emission
conductivity, 227–8 Fresnel, vii, 72–7, 85–8 spontaneous, 128, 257–8
conjugate planes, 30 grating, 43–5, 77, 82, 87–8, 104, 109, endoscope, 183
connection formula, 190 114, 170, 172, 174–5, 248–9 energy, 35, 42, 195, 223
continuity equation, 7 limit, 147–8, 181 conservation, 21, 23, 37, 42, 50, 120,
continuous, 93 diode 180, 205, 226, 242
medium, 215 laser, 130, 182, 210 density, 7, 14
spectrum, 131, 135 optical, 63, 69 flux, 4, 7, 21, 37
sum, 93 dipole, 22, 213–225, 228–31, 235–8, level, 12, 187, 258
continuous wave (cw), 112 255–8 photon, 11
convolution, 100, 102–3, 105–6, 112, Dirac δ-function, 95, 240, 246 entrance pupil, 152
115, 118, 132, 148–50, 163, discrete envelope function, 112
168–70, 220, 239–40, 242–3, Fourier transform, 252–3 etalon, Fabry-Perot, see Fabry-Perot
248–9, 252–4 medium, 215 Euler, Leonhard, viii
core, of a fibre, 188–92 spectrum, 94, 96–7, 114–5, 131–2 evanescent wave, 199, 227
corn syrup, 62 sum, 40, 73, 91–4, 96, 131–2, 184–5, Ewald, Paul, 218
correlation, 248 Ewald–Oseen extinction theorem,
and coherence, 127 dispersion, vii, 20, 28, 32, 43, 116–24, 217–18
intensity, 136 126, 213–4, 221, 227, 231 extinction, 58, 217–18
photon, 12 anomalous, 119 cross section, 222, 224
spatial and van Cittert and Zernike, normal, 119 paradox, 105, 109, 225
140 dispersionless, 117–9, 122 extraordinary axis, 59
time, 133 dispersive medium, 117 eye diagram, 121
and Wiener–Khinchin–Einstein Doppler broadening, 128
theorem, 132 double slit, 40, 49–50, 79, 81–2, 87, 90, F
cosine wave, 92–3, 108 101, 113, 136–9, 144–6, 149,
COSTAR (Corrective optics space 152–3, 156, 172 f-number, 78, 158
telescope axial replacement), doublet, achromatic, 28 Fabry, Charles, 45
149 Drude–Lorentz model, 226–7 Fabry-Perot, 45–47
counter-propagating waves, 36, 65–6, false detail, 166
247 E Faraday, Michael, 1
cross section, 213, 223–5 Faraday
crystallography, 83, 89, 103 edge detection, 168 effect, 57, 61–3, 69
current density, 200, 226–7, 233–5 eigenfunction, 59 rotation, 56
curved wave fronts, vii, 10, 15–16, 23, eigenmode, 182 far field, 38–40, 78–84, 87, 101–2,
26 Einstein, Albert, 132 105–6, 178–9, 201, 222, 225
cylindrical special relativity, 123 fast axis, 59–60, 68
lens, 168 electric fast light, 122–3
symmetry, 27, 73–4, 78, 83, 88–9, current, 213, 225, 230, 234 fibre
160, 162–3, 188, 250–1 dipole, see dipole graded index, 185–7
vector beam, 201, 203 field, vii, 2–4, 14, 111, 127, 177, multimode, 183, 190–2
wave, 15, 23, 38, 40, 77, 87 196–8, 213, 228–9, 233–7, 256, 259 photonic crystal, 194
waveguide, 183–8 scalar approximation, 9 single mode, 183, 189–92
vector, 51–5, 57, 64–6, 198–200, step index, 188–92
D 202, 204–7 field, vii, 233
susceptibility, 215–21, 228–9 electric and magnetic, 1–5, 233
damping, 226–7, 257–9 electromagnetic electromagnetic, 2
de Broglie relation, 11, 13, 108 field, 2, 7, 11, 35, 61, 111, 196, 198, film
decay rate, 258 201, 213–14, 225, 227, 233–5 thin, 22, 46–7, 49–50
decoherence, 128 spectrum, 75 filter, 242
high-pass, 165–70 approximation, 24, 38 H
low-pass, 165–70 coefficients, 20–1, 202, 218
polarizing, 58, 63 diffraction, 72–7, 184 hairy-ball theorem, 198
RC, 243 diffraction integral, 73, 99–101, 118, half-wave plate, 59–60
spatial, 159–60, 163–76 148, 151, 215, 224, 245 Hamilton, William, 8
spectral, 139, 143–4 integrals, 85–7 Hamiltonian, 235, 256
finesse, 46–7 lens, 72 Hanbury Brown, Robert, 136
fluctuation number, 75–6 Hanbury Brown and Twiss effect, 136
quantum, 12 reflection, 20–1, 58, 218 harmonic
flux, 4, 7, 37, 22, 80, 245 zone plate, 75 motion, 185, 229
continuity of, 20–1, 225 zone, 74–5, 140, 215 oscillator, 187, 229, 245, 259–60
focal fringe wave, 3–8, 11, 15–16, 52, 92, 97, 116,
length, 27–8, 74–5, 147, 155, 163, central, 80, 184 127
165, 181 interference, 35–6, 40–2, 46, 49–50, harmonics, 108, 252
plane, 77–81, 101, 148–55, 159–60, 70, 82, 86–7, 96, 101–2, 128, Heaviside, Oliver, 2
206 130–46, 153, 155–6, 160, 163–4, Heaviside function, 220–1
shift, 181, 193 166–8, 172, 252 hedgehog equation, 98–9, 178, 183
spot, 89, 148, 158, 163, 206–7 Ramsey, 113–14 in time, 117
Fourier, Joseph, vii, 91 vector, 199–200
Fourier G Heisenberg
analysis, 92 microscope, 156, 172
on a computer, 252–3 Gabor, Dennis, 41 uncertainty relation, 98, 156, 245–6
optics, 15, 25, 71, 91, 100–1 gain, 116, 123, 162 helicity, 57, 62
relation, 84 Galileo, Galilei, 26, 33–4 Helmholtz equation, 12, 99–100, 184,
series, 91–3, 97 gauge 187–9, 221, 226
spectrum, 117 Coulomb, 234–5 Herapath, William Bird, 58
synthesis, 92 Lorenz, 235–6 herapathite, 58
toolkit, 100–1, 239–52 transformation, 234–5 high-energy physics, 53
transform, vii, 78, 94–5, 97, 99–105, Gauss, Carl Friedrich, 83 high-pass filter, see filter
111–17, 120, 128 gauss function, 102, 104, 160, 171, 178, high-NA focusing, 9, 203–8
pair, 96 240, 244–5, 253–4 high numerical aperture lens, 156
spectroscopy, 131 gaussian higher-order modes, 190–1
Fourier transform, properties apodization, 163 high-reflectivity mirrors, 23, 46–7
cartesian separability, 76, 250 beam, 83–4, 97–8, 101, 104–6, 167–8, Hilbert, David, 221
central ordinate theorem, 239, 241 177–83, 185–9, 193–7, 202, 205–7, Hilbert
convolution, 242–3 221–2 space, 55
definition, 239 distribution, 115–7, 120, 133, 144, transform, 221, 231
linearity, 241 158, 160 history of
scaling property, 84, 90, 155, 239, filter, 168 diffraction, 71–2
241, 244, 247 pulse, 117, 120–4 imaging, 147–8
translation property, 102, 150, 239, wave packet, 120, 246 lenses, 26
241, 248 geometrical optics, 1–2, 43, 62
Fox Talbot, Henry, 87 optics, 155 wave interference, 33–4
Fraunhofer, Joseph, 43 progression, 248 hologram, 41
Fraunhofer shadow, 34, 71, 86 holography, 41, 130
approximation, 39 wave fronts, 201 honey lens, vii, 34, 71
diffraction, vii, 77–85, 101–5, 148, global phase, 39, 52, 154 Hooke, Robert, 26
153, 181, 184, 241 graded index fibre, see fibre Hooke’s law, 229
diffraction formula, 101 graded index (GRIN) lens, 185 Hopkins, Harald Horace, 183
free spectral range, 47, 50 gravitational wave detection, 45, 48, Hubble Space Telescope, 149, 157
frequency 50–1 Huygens, Christian, 1
angular, 3 Gouy, Louis George, 23, 180 Huygens secondary waves, 71–2
angular spatial, 6 Gouy phase, 23, 100, 154, 180, 222, 224 Huygens–Fresnel principle, 15, 71–2,
spatial, 4, 5–6, 14, 16, 18, 24–5, 36, grating, see diffraction grating 100
40, 43, 91–7, 147–8, 160, 165–71 Green’s function, 99
spectrum, 93, 97, 111–12, 117, 124, Grosseteste, Robert, 26 I
131, 134, 146, 162, 230 group index, 119
Fresnel, Augustin-Jean, vii, 1, 33–4, group velocity, 119 Ibn Sahl, see Sahl, Ibn
71–2 dispersion, 120 image, vii, 28–30, 41, 137, 148–9, 155,
Fresnel 159, 164
imaging, 28–30, 87, 172, 204 diode, 130, 182 low-pass filter, 165–70
phase-contrast, 171–2 He–Ne, 85, 130, 193, 197
incoherence, 127 modes, see mode M
index of refraction, see refractive index pointer, 14, 49, 90, 105, 108, 182
in-quadrature, 8, 216, 256 ultra-stable, 128, 130 Mach–Zehnder interferometer, 45
intensity, 4, 7–9, 11–12, 20–3, 34–7 laser beam propagation, see magnetic field, 2, 4, 61–3, 65–7, 198–9
correlations, 136 propagation, laser beam derived from potential, 233–5
fluctuations, 12 Law of refraction, 20–2, 25–6 in the paraxial limit, 210
map, 12 Left-circularly polarized light, in phase with electric field, 17
maxima and minima, 40 convention, 53 not in phase with electric field, 65
point spread function, 149, 161–2 length, coherence, see coherence, for a plane wave, 17, 52
reflection coefficient, 22, 126 length, wave equation, 3
saturation, 258 lens magneto-optic media, 64
time average, 7, 9 aberrations, see aberration Malus, Étienne-Louis, 58
interference, vii, 1, 12, 22, 33–43, 64–5, achromatic, 28, 43 Malus’ Law, 58
71, 74, 81–2, 87, 96, 113, anaclastic, 26, 222 matrix element for light–matter
116–17, 128, 131, 136–41, 147, angular resolution limit, 151 interaction, 214, 255, 258,
156, 164, 172, 188, 216, 223, aplanatic, 203, 204, 211 260
252 beam divergence, 90, 97–8, 178, 189 matter wave, 11–12, 126, 156, 183, 189
interferogram, 131, 162 equation, 30 Maxwell, James, 1
interferometer, 37, 45–8, 129–30, 135, f to f, 151–2 Maxwell’s equations, 2, 3–4, 7, 17, 21,
137 f number, 78, 158 23, 65, 72, 184, 195–8, 201,
Michelson, 130–2, 135, 162 finite size, consequence of, 78 221, 225, 233–4
Michelson’s stellar, 141, 145 focal length, 27–8 meridional plane, 204
Ramsey, 113 Fourier-transform property of, see f Michelson, Albert, 45
Young’s, see Young’s two-hole to f Michelson
experiment Fraunhofer diffraction realized with, interferometry, 47–9, 70, 130–2, 135,
interferometry, 45–8, 130, 135 77–8 162
inverse apodization, 159, 163–4 Fresnel, 72, 86 stellar interferometry, see stellar
inverse Fourier transform, 94 geometry, 27–8 interferometry
iodoquinine sulfate, 58 graded-index, 185–6 microscope, 26, 147–8, 155, 166
history, 26, 147 microwave, 4, 196, 227–8
J honey drop, 34, 71 mirror
imprinting quadratic phase, 28, 29, history, 26
Jamin interferometer, 45 186 in laser cavity, 182–3
Janssen, Hans and Zacharias, 26 modifying point-spread function, see in telescope, 141, 149
jinc function, 78, 141, 149–50, 163–4, apodization metal, 1, 19, 43, 184, 213, 225–7, 260
240, 250–2 numerical aperture of, 148 mode, 177
Jupiter, 26 thin-lens approximation, 27–9 azimuthally polarized, 201–4
two-lens system, see two-lens system cavity, 182–3
K zone plate as, 74–5 fibre, 185–192
lifetime, of excited state, 214 Hermite–Gauss, 202, 211
Kelvin, see Thomson, William light-matter interaction, vii, 4, 55, longitudinal of laser, 115–16
Kerr effect, 59, 228 213–29 matching, 221–2
Khinchin, Aleksandr, 132 limit, Abbe diffraction, 147–8 radially polarized, 201–2
Kirchhoff, Gustav, 72 linearity of Fourier transform, 241 transverse of laser, 182–3
Kohlrausch, Rudolf, 2 linearly-polarized light, see transverse electric (TE) mode, see
Kramers, Hendrik, 180, 221 polarization, linear transverse electric (TE) mode
Kramers-Kronig relations, 57, 123, longitudinal component of electric field transverse electric and magnetic
220–1, 231 in a light beam, 9, 196–8, (TEM) mode, see transverse
Kronig, Ralph, 221 203, 206–8, 210 electric and magnetic (TEM)
Lorentz, Hendrik, 219 mode
L Lorentz–Lorenz law, 219–20 transverse magnetic (TM) mode, see
Lorentz force, 2 transverse magnetic (TM) mode
laevo rotatory, 62 Lorentz model, 259 of a waveguide, 183
laser Lorentzian lineshape, 47, 128–9, 143, within a slit, 184–5
argon-ion, 115–16 180 modulus squared, 8
cavity, 115–16, 182–4 Lorentzian chaotic light, 133 momentum
coherence time, 130 Lorenz, Ludvig, 219 distribution, 83–4, 95, 97–8, 108,
cooling, 64, 162 Lorenz gauge, 235–6 184–5, 241, 246
of a photon, 11, 19, 156 P polarizability, 213–23, 228, 255, 258–9
monochromatic wave, 3, 6, 11, 15, 38, polarization 51–66
48, 71, 78, 95, 99, 112, 128, p polarization, 21, 58 azimuthal, 201–3, 207
131–3, 178, 215, 224, 255 paraxial, beyond the approximation, circular, 52–7, 59, 61, 63, 66, 198,
Mossotti, Ottaviano-Fabrizio, 220 195, 196–7, 203, 207 201, 224
multimode paraxial distance, 24, 25, 38 circular basis, 52, 55–6
guide, 183, 192 paraxial optics, 9, 23–6, 27–9, 38, elliptical, 55–6, 60
laser, 130 71–5, 100, 137, 178–9, 224 gradient, 64–6
multiple-path interference, 45, 188 Parseval’s theorem, 242 linear, 53–55, 60, 65, 198, 202
multiple reflections, 22–3, 47 partial coherence, 130, 133, 137, 140 linear basis, 52–3, 55
multiplication in Fourier space, 170 particle in a box, 184, 187, 194 radial, 202–3, 207
path, which and complementarity, 139, by reflection, 22
N 156, 172 by scattering, 51
Pérot, Jean-Baptiste, 45 polarization density 219
narrow slit, 85–6, 139 period (optical), 3–4, 53–4 polarizer, 58, 69
natural optical activity, 61–2, 64 periodic function, space, 5–6, 36, 92, polarizing beam splitter, 58
near field 94, 96 polaroid, 58
diffraction, 74, 84, 105–6, 179 1D, 165–6 polychromatic light, 130
of a dipole, 217–19, 236–8 2D, 166–7 potential
negative frequency, 9, 93, 113, 125 periodic function, time, 3, 7, 53–4, of a dipole, 236–8
Newton, Isaac, 34 114–16, 143 electric scalar, 2, 233–5
Newton’s permeability, magnetic of free space, 2, magnetic vector, 2, 69, 233, 234–7
rings, 41 14, 233 retarded, 236
two-knife experiment, 71 permittivity, electric of free space, 2, power, 7, 14, 134, 222, 258–9
Nimrod lens, 26 14, 233 power-equivalent width, 133
noise, 123, 128 phase, 3–6, 8, 15–20, 25–9, 33, 39, 41–2, power spectral density, 133–5
non-linear optics, 1, 182, 213, 228–30 45–9, 52–7, 59–66, 72–82, Poynting, John, 7
normal dispersion, 119, 122, 126 95–6, 98–103, 117–19, 127–30, Poynting’s conjecture as to the angular
normal distribution, 244 152–6, 164, 171, 180, 187, 197, momentum in a circularly
numerical aperture, 148, 156, 203–4, 199–204, 214–22 polarized light wave, 57
207–8 phase change on reflection, 22, 46, 202 Poynting vector, 7, 21, 35–6, 222
phase-contrast imaging, 171 precursor, 123–4
O phase encoding, 171 pressure broadening, 128
phase jump, 128–9, 143–4 principal maximum, 42–5, 82, 85,
object distance, 29–30 phase velocity, see velocity, phase, 102–4, 248
objective, 155 phasor, 5, 19, 34, 38–46, 72, 82, 85, prism, 58
obliquity factor, 76 231, 247–9 probability distribution, 134, 143, 193,
oil film, interference in, 22, 46 photography, 78, 158 244–6
oil immersion lens, 208 photon, 1, 7, 11, 12, 40, 51, 56–7, 75, propagation, vii, 7, 10–1, 16–19, 24–6,
optical activity, 56, 61–4 98, 124, 136, 139, 156, 172, 35, 71–8, 87–8, 95–9, 152–3,
optical axis, 9, 11, 19, 23–4, 26, 29, 38, 223, 228, 245, 258 165, 225
73–4, 113, 150, 185–6, 203–4 counting, 1, 7, 12, 136, 151 of coherence, 140–1
optical cavity, see cavity statistics, 12, 136 of laser beam, 177–81
optical diode, 63, 69 photonic bandgap, 188 in matter, 214–20
optical fibre, see fibre photonic crystal fibre, 188 in an optical fibre, 188–91
optical image processing, 165 planar wave fronts, 16, 26, 28, 204 of a vector beam, 195–203
optical isolator, see optical diode plane wave, vii, 7, 15, 16–19, 20–1, 25, of a wave group, 117–24
optical Kerr effect, 228 27, 40–3, 52, 55–6, 73–4, 78, in a wave guide, 183–7
optical path, 38, 44, 47, 73 83, 91–106, 111, 116, 128, 137, propagator, 19, 98–100, 120, 178
optical tweezers, 12, 203 147–52, 167, 189, 195–206, pulse, optical, 111, 112–16
order 213–7, 224 double, 113–14
of diffraction peak, 44–5 two plane waves, 35–6, 52, 64–6 multiple, 114–16
missing, 82 plasma, 226–7, 231, 260 pulse train, mode-locked, 115–16, 125
ordinary axis, 59 Pockels effect, 59 single, 112–13
orthogonality Poincaré sphere, 56 pupil function, 152, 160–4
of complex basis vectors, 55 point spread function, 147, 148–9, 150,
of sine and cosine, 93 159–164 Q
oscillator, 187, 213–14, 235, 259 Poisson, Siméon, 72
Poisson’s equation, 235–6 quadratures, 8, 216, 256
Poisson’s spot, see Arago, spot of quality of fringes, see visibility,
quantum dipole model, 235, 238, 255, resonance frequency, 20, 62, 121–4, Snellius, Willebrord, 20
260 214–6, 222–5, 228–9, 258–60 Snell’s law, 20, 22
quantum field, 2, 11, 246 resonator, optical, see cavity Sommerfeld, Arnold, 123
quantum mechanics, 1, 11–12, 37, 55, retardation, in wave plate, 59–60 spatial filter (4f), 164–72
57, 59, 91, 113, 128, 136, 156, retarded time, 236 spatial filtering, 91, 148, 159, 164,
183, 189, 224–5, 234–5, 244–5, Richards–Wolf vector diffraction integral, 165–72
259–60 204 spatial frequency, 4, 5, 6, 14, 16, 18, 24,
quantum optics, 1, 12, 136 Right-circularly polarized light, 36, 92–7, 166–71, 184–5,
quantum tunneling, 191 convention, 53 240–7, 251
quarter-wave plate, 59–61 Ronchi grating, 87–8, 248–9 spectral width, 121, 128–30
quartz, birefringence, 68 rotatory spectrometer, Fourier Transform, 131,
quasi-monochromatic light, 136, 139 dextro, 62 160–2
laevo, 62 spectroscopy, Fourier Transform, 131,
R 135, 162
S spectrum
Rabi frequency, 256 angular, 91, 95, 96–8, 111, 152–3,
Rabi oscillations, 257–8 s polarization, 21, 58 178, 185
radiation Sagnac interferometer, 45 frequency, 111, 112–13, 117, 124,
field, 200–1, 215, 222, 238 Sahl, Ibn, 20, 26 131, 134, 146, 162, 230
term, of electric dipole, 238 Saturn, rings, 26 power, 113, 132–3, 134, 135, 143–4
rainbow, 20 scalar approximation, 9 vector angular spectrum, 195, 198,
Raman transition, 162 scalar potential, 2, 233, 235 199–204
Ramsey, Norman, 113 scalar wave, see wave, scalar speed of light, 2–4, 14, 16, 111, 123–4
Ramsey, interferometer and fringes, scaling property of Fourier transforms, sphere, field on, 198, 222
113–14 84, 90, 155, 239, 241, 244, 247 spherical aberration, see aberration,
scattered field (or light), 19, 41, 51, 62, spherical
randomness, 62, 127–9, 132, 134, 136,
156, 223–5 spherical wave, 15, 23, 25–30, 34,
156
scattering, 8, 41, 105, 122, 156, 213–17, 36–42, 71–2, 179, 193, 201,
ray, 1, 26, 200, 204
223–5, 258–9 204, 214–15
Rayleigh
scattering cross section, 213, 222, spin, photon, 56
criterion, 150, 159
223–5
distance (length), 79, 80, 84 spontaneous emission, 128, 257
Schrödinger equation, 12, 126, 184, 187,
limit, 151, 160 spot of Arago, see Arago, spot of
256
range, 83, 105, 178, 179–81, 186, 222 spot, focal, 89, 148, 158, 163, 206–7
secondary wave, 71–3, 100
scattering, 223 square wave, 93, 108, 248–9
second-order coherence, 136
theorem, 242 stability of laser cavity, 182–3
selection rules, 57
real field, 8, 9, 50, 54–5, 68, 113, 125 self Fourier, 245, 247 standing wave, 36, 64–6, 69, 93, 247
reciprocal rotation, 61–2 self replicating, 87 stationary phase, method of, 200
reciprocity theorem, 64, 172 shadow, 33–4, 71–2, 76, 86–7, 159, 225 stationary random process, 134, 143
rect function, 79, 84, 101, 103–5, short laser pulses, 113, 125 stellar interferometry, 141
112–15, 138, 141, 152–3, 161, signal-to-noise, 123, 149, 151 step-index fibre, 185, 188–9, 190
168–9, 240, 243–4, 247–51 signum function, 220 step function, 123, 243
reflection coefficient, see Fresnel sinc function, 80–2, 84, 101–2, 104–6, Heaviside function, 220
coefficients, 112–15, 138–41, 153, 161–2, stored light, 122
refraction, 20–2, 26, 75, 204, 213–4 168, 184–5, 240, 243, 244, stress-induced birefringence, 59
refractive index, 19–22, 28, 46, 59–60, 248–51 subsidiary maxima, 42–4, 103–4, 160–4,
117–22, 148, 186–8, 204, 208, sine condition (Abbe), 203 174, 248
229 sine wave, 92–3, 108 sugar, 61–2
for a medium of dipoles, 214–16, single mode superluminal propagation, 122–4
218–20, 236, 259–60 fibre, 189–92 superposition
for a plasma, 226–7 laser, 130 linear, vii, 15, 18–19, 55, 61, 64–5,
relative phase, 9, 11, 33–4, 37, 39, 42, waveguide, 183 72, 91–6, 99, 111, 116–17, 195–9,
48, 52, 56, 59, 61, 127, 129, skin depth, 184, 226–8 203, 213, 222, 238
138, 202, 214, 224 sky, colour of, 51, 223, 231 principle of, 1, 3, 10, 21, 33, 132
replicating function, see comb function, slit, 38, 45, 79, 80–8, 104 super resolution, 159, 163–4
resolution slow axis, 59, 60–1 susceptibility, electric, 215–21, 228–9
angular, 141–51 slow light, 122–4, 126 symmetry
limit, 147, 151, 160, 251 slowly varying envelope, 116 cartesian (cartesian separability), 76,
wavelength, 45 small-angle approximation, 24, 36, 40, 83–4, 90, 100–1, 103, 162, 167,
resolving power, 45, 156–7 95, 98, 139, 148 178, 187, 202, 250
cylindrical, 27, 73, 160, 185, 188, 201, van Cittert–Zernike theorem, 137–40 wave number, 6
204–5, 224, 250–1 variance, 134 wave-particle duality, 11, 156
of Fourier transform, 241 vector light field, 197–210 wave plate, 59, 60–1, 68–70
time-reversal, see time-reversal vector wave, 2–3, 199, 201 half-wave plate, 59–60
symmetry vector potential, 2, 69, 233, 234–7 quarter-wave plate, 59–61
velocity segmented, 202
T of electromagnetic waves, 2 wave vector, 3, 6, 11, 16–17, 19, 23–5,
energy transport, 124 93–4, 96, 98, 111, 117, 197,
Talbot effect, 87–8, 166 front, 123 199, 201, 218
telescope, 26, 43, 147, 159, 251 group, 117, 116, 119–20, 122–4, 227 waveguide, 12, 183–4, 187, 196, 198, 228
Hubble Space Telescope, point-spread of information, 123–4 Weber, Wilhelm, 2
function, 149, 157 phase, 4, 116–17, 119, 124, 227 wedge-shaped slit, 72
temporal broadening of pulse, 121, 126 signal, 124 wedge fringes, 36, 49
temporal coherence, 128, 128–9, 136 Verdet coefficient, 63 white light, 46, 111, 223
thermal light, 136 Vikings, 51 coherence time, 130
thin lens, 27–30, 186 visibility, 128, 130, 172 fringes in Young’s double-slit
thin slab, 213, 231–2 and spatial coherence, 137–9 experiment, 146
Thomson, William, 34, 62 and stellar interferometry, 141 interferometry, 130–1
tide, 33–4, 91 and temporal coherence, 131–3, 135
width, of a spectral function 134–5
tide-predicting machine, 34 and which-path information, 156, 172
Wiener, Norbert, 132
tilt fringes, 70
Wiener–Khinchin–Einstein theorem,
time average, 4, 7–9, 21, 35–7, 129, 132, W
132, 134–5, 143–4
136, 222
window function, 160
time-reversal symmetry, 172 waist of a gaussian beam, 83, 98, 177,
WKB approximation, 189–91
see also reciprocity theorem 178–83, 193–4, 197
top hat, 243 wave
X
see also rect function circular, see circular wave
total internal reflection, 183, 202 cylindrical, see cylindrical wave
X-rays, 1, 33–4
transition, atomic, 57, 63, 162, 238, 255 electromagnetic, 2, 7, 198
translation property of Fourier electromagnetic wave in a metal, crystallography and diffraction, 83,
transforms, 102, 150, 239, 225–6 89
241, 248 electromagnetic wave in a plasma, diffraction and Babinet’s principle,
transmission coefficient, 21–2, 46 227 104
transmission function, 73, 160–2, 168–9 equation, 2, 3–4, 9–11, 225–6, 235 focusing by zone plate, 75
see also aperture function evanescent, 199, 227
transverse coherence length, 139, 140–1 front, 10, 15–17, 23, 26–8, 33, 35, 71, Y
transverse electric (TE) mode, 198 201
transverse electric and magnetic (TEM) front curvature, 16, 25–6, 83, 151–3, Young, Thomas, 1, 33–4
mode, 177 186 Young’s interferometer, see Young’s
transverse magnetic (TM) mode, 198 front curvature of gaussian beam, two-hole experiment
transverse wave, 17–18, 195–6 179, 182–3 Young’s two-hole experiment, 34, 37,
triangle function, 125, 161–2, 254 front division interferometry, 45 38–40, 50, 87, 90, 136–40,
triangular aperture, 103 harmonic, see harmonic wave 144–6, 152–3, 156, 172, 252
Twiss, Richard, 136 longitudinal, see longitudinal wave
two-lens system, 12, 147, 153–6, 159 matter, see matter wave Z
packet, 92, 97, 120, 130, 244–6
U plane, see plane wave Zeeman, Pieter, 219
scalar, 9–11, 98, 111, 225 Zeiss, Carl, 147
ultra-stable laser, 130 secondary, see secondary wave Zernike, Frits, 137, 171
ultra-violet region of spectrum, 20, 223, sound, 34 zero
226 spherical, see spherical wave first, 41, 43–4, 78, 80, 82, 84, 102,
uncertainty principle, 98, 241, 245–6 standing, see standing wave 140, 145, 163–4, 244, 248, 251
unpolarized light, 51, 58 transverse, see transverse wave frequency or spatial frequency, 6, 93,
travelling, 10 97, 246, 253
V water, 16, 33, 37, 50 order, 45
wavelength, 1, 3–6, 11, 18, 20, 28, 43–8, zone, Fresnel, see Fresnel zone
van Cittert, Pieter, 137 111, 130, 222–3 zone plate, 74, 75, 86, 89
