BYUOpticsBook PDF

Physics of Light and Optics
Justin Peatross
Michael Ware
Brigham Young University
August 13, 2010

Preface
This curriculum was originally developed for a senior-level optics course in the
Department of Physics and Astronomy at Brigham Young University. Topics are
addressed from a physics perspective and include the propagation of light in
matter, reflection and transmission at boundaries, polarization effects,
dispersion, coherence, ray optics and imaging, diffraction, and the quantum nature of
light. Students using this book should be familiar with complex numbers, vector
calculus, and Fourier transforms. A brief summary of these mathematical tools is
provided in Chapter 0.
While the authors retain the copyright, we have made this book available free
of charge at optics.byu.edu. This is our contribution toward a future world with
free textbooks! The web site also provides a link to purchase bound copies of the
book for the cost of printing. A collection of electronic material related to the
text is available at the same site, including videos of students performing the lab
assignments found in the book.
We have included a number of historical sketches about scientists who helped
develop the field of optics. These sketches are not authoritative (most of them
are summaries of Wikipedia entries). However, we feel that it adds richness to
the course when students can learn something of the people who pioneered the
material they are studying.
The authors may be contacted at opticsbook@byu.edu. We enjoy hearing
reports from those using the book and welcome constructive feedback. We
occasionally revise the text. The title page indicates the date of the last revision.
We would like to thank all those who have helped improve this material. We
especially thank John Colton, Bret Hess, and Harold Stokes for their careful review
and extensive suggestions. This curriculum benefitted from a CCLI grant from
the National Science Foundation Division of Undergraduate Education
(DUE-9952773).
Contents
Preface
Table of Contents
0 Mathematical Tools
0.1 Vector Calculus
0.2 Complex Numbers
0.3 Fourier Theory
0.4 Linear Algebra
Appendix 0.A Table of Integrals and Sums
Exercises
1 Electromagnetic Phenomena
1.1 Gauss’ Law
1.2 Gauss’ Law for Magnetic Fields
1.3 Faraday’s Law
1.4 Ampere’s Law
1.5 Maxwell’s Adjustment to Ampere’s Law
1.6 Polarization of Materials
1.7 The Wave Equation
Exercises
2 Plane Waves and Refractive Index
2.1 Plane Wave Solutions to the Wave Equation
2.2 Index of Refraction
2.3 The Lorentz Model of Dielectrics
2.4 Index of Refraction of a Conductor
2.5 Poynting’s Theorem
2.6 Irradiance of a Plane Wave
Appendix 2.A Energy Density of Electric Fields
Appendix 2.B Energy Density of Magnetic Fields
Appendix 2.C Radiometry Versus Photometry
Exercises
3 Reflection and Refraction
3.1 Refraction at an Interface
3.2 The Fresnel Coefficients
3.3 Reflectance and Transmittance
3.4 Brewster’s Angle
3.5 Total Internal Reflection
3.6 Reflections from Metal
Appendix 3.A Boundary Conditions For Fields at an Interface
Exercises
4 Multiple Parallel Interfaces
4.1 Double-Interface Problem Solved Using Fresnel Coefficients
4.2 Two-Interface Transmittance at Sub Critical Angles
4.3 Beyond Critical Angle: Tunneling of Evanescent Waves
4.4 Fabry-Perot
4.5 Setup of a Fabry-Perot Instrument
4.6 Distinguishing Nearby Wavelengths in a Fabry-Perot Instrument
4.7 Multilayer Coatings
4.8 Repeated Multilayer Stacks
Exercises
5 Propagation in Anisotropic Media
5.1 Constitutive Relation in Crystals
5.2 Plane Wave Propagation in Crystals
5.3 Biaxial and Uniaxial Crystals
5.4 Refraction at a Uniaxial Crystal Surface
5.5 Poynting Vector in a Uniaxial Crystal
Appendix 5.A Symmetry of Susceptibility Tensor
Appendix 5.B Rotation of Coordinates
Appendix 5.C Electric Field in Crystals
Appendix 5.D Huygens’ Elliptical Construct for a Uniaxial Crystal
Exercises
Review, Chapters 1–5
6 Polarization of Light
6.1 Linear, Circular, and Elliptical Polarization
6.2 Jones Vectors for Representing Polarization
6.3 Elliptically Polarized Light
6.4 Linear Polarizers and Jones Matrices
6.5 Jones Matrix for Polarizers at Arbitrary Angles
6.6 Jones Matrices for Wave Plates
6.7 Polarization Effects of Reflection and Transmission
Appendix 6.A Ellipsometry
Appendix 6.B Partially Polarized Light
Exercises
© 2010 Peatross and Ware

7 Superposition of Quasi-Parallel Plane Waves
7.1 Intensity of Superimposed Plane Waves
7.2 Group vs. Phase Velocity: Sum of Two Plane Waves
7.3 Frequency Spectrum of Light
7.4 Packet Propagation and Group Delay
7.5 Quadratic Dispersion
7.6 Generalized Context for Group Delay
Appendix 7.A Pulse Chirping in a Grating Pair
Appendix 7.B Causality and Exchange of Energy with the Medium
Appendix 7.C Kramers-Kronig Relations
Exercises
8 Coherence Theory
8.1 Michelson Interferometer
8.2 Temporal Coherence
8.3 Fringe Visibility and Coherence Length
8.4 Fourier Spectroscopy
8.5 Young’s Two-Slit Setup and Spatial Coherence
Appendix 8.A Spatial Coherence for a Continuous Source
Appendix 8.B Van Cittert-Zernike Theorem
Exercises
9 Light as Rays
9.1 The Eikonal Equation
9.2 Fermat’s Principle
9.3 Paraxial Rays and ABCD Matrices
9.4 Reflection and Refraction at Curved Surfaces
9.5 ABCD Matrices for Combined Optical Elements
9.6 Image Formation
9.7 Principal Planes for Complex Optical Systems
9.8 Stability of Laser Cavities
Appendix 9.A Aberrations and Ray Tracing
Exercises
10 Diffraction
10.1 Huygens’ Principle as Formulated by Fresnel
10.2 Scalar Diffraction Theory
10.3 Fresnel Approximation
10.4 Fraunhofer Approximation
10.5 Diffraction with Cylindrical Symmetry
Appendix 10.A Fresnel-Kirchhoff Diffraction Formula
Appendix 10.B Green’s Theorem
Exercises
11 Diffraction Applications
11.1 Fraunhofer Diffraction Through a Lens
11.2 Resolution of a Telescope
11.3 The Array Theorem
11.4 Diffraction Grating
11.5 Spectrometers
11.6 Diffraction of a Gaussian Field Profile
11.7 Gaussian Laser Beams
Appendix 11.A ABCD Law for Gaussian Beams
Exercises
12 Interferograms and Holography
12.1 Interferograms
12.2 Testing Optical Components
12.3 Generating Holograms
12.4 Holographic Wavefront Reconstruction
Exercises
13 Blackbody Radiation
13.1 Stefan-Boltzmann Law
13.2 Failure of the Equipartition Principle
13.3 Planck’s Formula
13.4 Einstein’s A and B Coefficients
Appendix 13.A Thermodynamic Derivation of the Stefan-Boltzmann Law
Appendix 13.B Boltzmann Factor
Exercises
References
Index
Physical Constants

Chapter 0
Mathematical Tools
The study of optics requires a variety of mathematical tools. This chapter reviews
many of these, in particular those relied upon in this book. This book often refers
back to sections in this chapter. Students are encouraged to look over this material
before starting a course in optics, which typically would begin with Chapter 1. The
material in this chapter is not a comprehensive review. It is assumed that students
are already familiar with differentiation, integration, and standard trigonometric
and algebraic manipulation.
The topics in this mathematical overview appear in the order that they are
encountered in the text. Section 0.1 is an overview of vector calculus and related
theorems, which are used extensively in electromagnetic theory and invoked
frequently throughout this book. Although the material in 0.1 is only occasionally
needed for homework problems, if students are comfortable with vector calculus,
they will better grasp the connection between electromagnetic principles and
optical phenomena as these principles are explained. Section 0.2 reviews complex
arithmetic, and students really need to know this material by heart. Section 0.3 is
an introduction to Fourier theory. Fourier transforms are used extensively in the
study of optics, beginning with chapter 7 in this book. The presentation below is
sufficiently comprehensive for the student who encounters Fourier transforms
here for the first time. Such a student is strongly advised to study section 0.3
before starting chapter 7.
0.1 Vector Calculus

René Descartes (1596-1650, French) was a mathematician, physicist, and
philosopher. He is credited with inventing the Cartesian coordinate system,
which is named after him.

In optics we are concerned primarily with electromagnetic fields that are defined
throughout space. Each position in space corresponds to a unique vector
$\mathbf{r} \equiv x\hat{\mathbf{x}} + y\hat{\mathbf{y}} + z\hat{\mathbf{z}}$, where
$\hat{\mathbf{x}}$, $\hat{\mathbf{y}}$, and $\hat{\mathbf{z}}$ are unit vectors with length one, pointing along
their respective axes. The boldface type indicates a vector. The use of
$\hat{\mathbf{x}}$, $\hat{\mathbf{y}}$, and $\hat{\mathbf{z}}$
denotes a Cartesian coordinate system. Electric and magnetic fields are vectors
whose magnitude and direction can depend on position, as denoted by $\mathbf{E}(\mathbf{r})$
or $\mathbf{B}(\mathbf{r})$. An example of such a field is
$\mathbf{E}(\mathbf{r}) = q(\mathbf{r}-\mathbf{r}_0)/(4\pi\epsilon_0|\mathbf{r}-\mathbf{r}_0|^3)$, which is
the static electric field surrounding a point charge located at position $\mathbf{r}_0$. The
absolute-value brackets indicate the magnitude (or length) of the vector given by
$$
\begin{aligned}
|\mathbf{r}-\mathbf{r}_0| &= \left|(x-x_0)\hat{\mathbf{x}} + (y-y_0)\hat{\mathbf{y}} + (z-z_0)\hat{\mathbf{z}}\right| \\
&= \sqrt{(x-x_0)^2 + (y-y_0)^2 + (z-z_0)^2}
\end{aligned}
\tag{0.1}
$$
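Example 0.1
Compute the electric field at $\mathbf{r} = (2\hat{\mathbf{x}} + 2\hat{\mathbf{y}} + 2\hat{\mathbf{z}})$ Å due to a charge $q$ positioned at
$\mathbf{r}_0 = (1\hat{\mathbf{x}} + 1\hat{\mathbf{y}} + 2\hat{\mathbf{z}})$ Å.

Solution: As mentioned above, the field is given by
$\mathbf{E}(\mathbf{r}) = q(\mathbf{r}-\mathbf{r}_0)/(4\pi\epsilon_0|\mathbf{r}-\mathbf{r}_0|^3)$.
We have
$$
\mathbf{r}-\mathbf{r}_0 = \left((2-1)\hat{\mathbf{x}} + (2-1)\hat{\mathbf{y}} + (2-2)\hat{\mathbf{z}}\right)\text{Å} = \left(1\hat{\mathbf{x}} + 1\hat{\mathbf{y}}\right)\text{Å}
$$
and
$$
|\mathbf{r}-\mathbf{r}_0| = \sqrt{(1)^2 + (1)^2}\,\text{Å} = \sqrt{2}\,\text{Å}
$$
The electric field is then
$$
\mathbf{E} = \frac{q\left(1\hat{\mathbf{x}} + 1\hat{\mathbf{y}}\right)\text{Å}}{4\pi\epsilon_0\left(\sqrt{2}\,\text{Å}\right)^3}
$$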
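As a rough numerical companion to Example 0.1, the following sketch evaluates the point-charge field in SI units. The function name `point_charge_field` and the choice of one elementary charge for q are illustrative assumptions; the example itself leaves q unspecified.

```python
import math

EPSILON_0 = 8.8541878128e-12   # vacuum permittivity in F/m
ANGSTROM = 1e-10               # metres per angstrom

def point_charge_field(q, r, r0):
    """Static field E(r) = q (r - r0) / (4 pi eps0 |r - r0|^3), in V/m."""
    d = [a - b for a, b in zip(r, r0)]         # separation vector r - r0
    dist = math.sqrt(sum(c * c for c in d))    # |r - r0|, as in (0.1)
    scale = q / (4 * math.pi * EPSILON_0 * dist**3)
    return [scale * c for c in d]

# Geometry of Example 0.1. The charge value is an assumption for illustration.
q = 1.602176634e-19            # one elementary charge, in coulombs
r = [2 * ANGSTROM, 2 * ANGSTROM, 2 * ANGSTROM]
r0 = [1 * ANGSTROM, 1 * ANGSTROM, 2 * ANGSTROM]

E = point_charge_field(q, r, r0)
E_magnitude = math.sqrt(sum(c * c for c in E))
print(E, E_magnitude)
```

The field has equal x and y components and no z component, as expected from the separation vector found in the example.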
In addition to position, the electric and magnetic fields almost always depend
also on time in optics problems. For example, a common time-dependent field is
E(r, t ) = E0 cos(k·r−ωt ). The dot product k·r is an example of vector multiplication,
and signifies the following operation:
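$$
\begin{aligned}
\mathbf{k}\cdot\mathbf{r} &= \left(k_x\hat{\mathbf{x}} + k_y\hat{\mathbf{y}} + k_z\hat{\mathbf{z}}\right)\cdot\left(x\hat{\mathbf{x}} + y\hat{\mathbf{y}} + z\hat{\mathbf{z}}\right) \\
&= k_x x + k_y y + k_z z \\
&= |\mathbf{k}||\mathbf{r}|\cos\phi
\end{aligned}
\tag{0.2}
$$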
where φ is the angle between the vectors k and r.
Proof of the final line of (0.2)
It is always possible to choose a plane containing the two vectors $\mathbf{k}$ and $\mathbf{r}$. Call
it the $x'y'$-plane. In this coordinate system, the two vectors can be written as
$\mathbf{k} = k\cos\theta\,\hat{\mathbf{x}}' + k\sin\theta\,\hat{\mathbf{y}}'$ and
$\mathbf{r} = r\cos\alpha\,\hat{\mathbf{x}}' + r\sin\alpha\,\hat{\mathbf{y}}'$, where $\theta$ and $\alpha$ are the
respective angles that the two vectors make with the $x'$-axis. The dot product gives
$\mathbf{k}\cdot\mathbf{r} = kr(\cos\theta\cos\alpha + \sin\theta\sin\alpha)$. This simplifies to
$\mathbf{k}\cdot\mathbf{r} = kr\cos(\theta-\alpha)$ (see (0.14)),
where $\theta - \alpha = \phi$ is the angle between the vectors. Thus, the dot product between
two vectors is the product of the magnitudes of each vector times the cosine of the
angle between them.

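Another type of vector multiplication is the cross product, which is
accomplished in the following manner:
$$
\begin{aligned}
\mathbf{E}\times\mathbf{B} &=
\begin{vmatrix}
\hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\
E_x & E_y & E_z \\
B_x & B_y & B_z
\end{vmatrix} \\
&= \left(E_y B_z - E_z B_y\right)\hat{\mathbf{x}} - \left(E_x B_z - E_z B_x\right)\hat{\mathbf{y}} + \left(E_x B_y - E_y B_x\right)\hat{\mathbf{z}}
\end{aligned}
\tag{0.3}
$$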
Note that the cross product results in a vector, whereas the dot product results
in a scalar (i.e. a number with appropriate units). The resultant vector is always
perpendicular to the two vectors that are cross multiplied.
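The component formulas (0.2) and (0.3) translate directly into a few lines of code. The sketch below uses two arbitrary illustrative vectors (not taken from the text) to check that the component form of the dot product agrees with the magnitude-times-cosine form, and that the cross product is perpendicular to both of its factors.

```python
import math

def dot(a, b):
    """Dot product from the component form in (0.2)."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product from the determinant expansion in (0.3)."""
    return [a[1] * b[2] - a[2] * b[1],
            -(a[0] * b[2] - a[2] * b[0]),
            a[0] * b[1] - a[1] * b[0]]

def angle_between(a, b):
    """Angle phi between two vectors, from the last line of (0.2)."""
    return math.acos(dot(a, b) / math.sqrt(dot(a, a) * dot(b, b)))

# Arbitrary illustrative vectors.
k = [1.0, 2.0, 2.0]
r = [3.0, 0.0, 4.0]

phi = angle_between(k, r)
k_dot_r = dot(k, r)
k_cross_r = cross(k, r)
print(k_dot_r, math.degrees(phi), k_cross_r)
```

Both `dot(k_cross_r, k)` and `dot(k_cross_r, r)` vanish, which is the perpendicularity property stated above.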
We will use several multidimensional derivatives in our study of optics, namely
the gradient, the divergence, and the curl. In Cartesian coordinates, the gradient
of a scalar function is given by
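$$
\nabla f(x,y,z) = \frac{\partial f}{\partial x}\hat{\mathbf{x}} + \frac{\partial f}{\partial y}\hat{\mathbf{y}} + \frac{\partial f}{\partial z}\hat{\mathbf{z}}
\tag{0.4}
$$
the divergence, which applies to vector functions, is given by
$$
\nabla\cdot\mathbf{E} = \frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z}
\tag{0.5}
$$
and the curl, which also applies to vector functions, is given by
$$
\begin{aligned}
\nabla\times\mathbf{E} &=
\begin{vmatrix}
\hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\
\partial/\partial x & \partial/\partial y & \partial/\partial z \\
E_x & E_y & E_z
\end{vmatrix} \\
&= \left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z}\right)\hat{\mathbf{x}}
 - \left(\frac{\partial E_z}{\partial x} - \frac{\partial E_x}{\partial z}\right)\hat{\mathbf{y}}
 + \left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y}\right)\hat{\mathbf{z}}
\end{aligned}
\tag{0.6}
$$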
Example 0.2
Derive the gradient (0.4) in cylindrical coordinates defined by the transformations
x = ρ cos φ and y = ρ sin φ. (The coordinate z remains unchanged.)
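Solution: By inspection of Fig. 1, the Cartesian unit vectors may be expressed as
$$
\hat{\mathbf{x}} = \cos\phi\,\hat{\boldsymbol{\rho}} - \sin\phi\,\hat{\boldsymbol{\phi}}
\quad\text{and}\quad
\hat{\mathbf{y}} = \sin\phi\,\hat{\boldsymbol{\rho}} + \cos\phi\,\hat{\boldsymbol{\phi}}
$$
Figure 1 The unit vectors $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ may be expressed in terms of
components along $\hat{\boldsymbol{\phi}}$ and $\hat{\boldsymbol{\rho}}$ in cylindrical coordinates.

In accordance with the rules of calculus, the needed partial derivatives expressed
in terms of the new variables are
$$
\frac{\partial}{\partial x} = \left(\frac{\partial\rho}{\partial x}\right)\frac{\partial}{\partial\rho} + \left(\frac{\partial\phi}{\partial x}\right)\frac{\partial}{\partial\phi}
\quad\text{and}\quad
\frac{\partial}{\partial y} = \left(\frac{\partial\rho}{\partial y}\right)\frac{\partial}{\partial\rho} + \left(\frac{\partial\phi}{\partial y}\right)\frac{\partial}{\partial\phi}
$$
Meanwhile, the inverted form of the coordinate transformation is
$$
\rho = \sqrt{x^2+y^2} \quad\text{and}\quad \phi = \tan^{-1}(y/x)
$$
from which we obtain the following derivatives:
$$
\begin{aligned}
\frac{\partial\rho}{\partial x} &= \frac{x}{\sqrt{x^2+y^2}} = \cos\phi, &
\frac{\partial\phi}{\partial x} &= -\frac{y}{x^2+y^2} = -\frac{\sin\phi}{\rho}, \\
\frac{\partial\rho}{\partial y} &= \frac{y}{\sqrt{x^2+y^2}} = \sin\phi, &
\frac{\partial\phi}{\partial y} &= \frac{x}{x^2+y^2} = \frac{\cos\phi}{\rho}
\end{aligned}
$$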
Putting this all together, we arrive at
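$$
\begin{aligned}
\nabla f &= \frac{\partial f}{\partial x}\hat{\mathbf{x}} + \frac{\partial f}{\partial y}\hat{\mathbf{y}} + \frac{\partial f}{\partial z}\hat{\mathbf{z}} \\
&= \left(\cos\phi\frac{\partial f}{\partial\rho} - \frac{\sin\phi}{\rho}\frac{\partial f}{\partial\phi}\right)\left(\cos\phi\,\hat{\boldsymbol{\rho}} - \sin\phi\,\hat{\boldsymbol{\phi}}\right) \\
&\quad + \left(\sin\phi\frac{\partial f}{\partial\rho} + \frac{\cos\phi}{\rho}\frac{\partial f}{\partial\phi}\right)\left(\sin\phi\,\hat{\boldsymbol{\rho}} + \cos\phi\,\hat{\boldsymbol{\phi}}\right) + \frac{\partial f}{\partial z}\hat{\mathbf{z}} \\
&= \frac{\partial f}{\partial\rho}\hat{\boldsymbol{\rho}} + \frac{1}{\rho}\frac{\partial f}{\partial\phi}\hat{\boldsymbol{\phi}} + \frac{\partial f}{\partial z}\hat{\mathbf{z}}
\end{aligned}
$$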
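The result of Example 0.2 can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the test function f = xy + z², the sample point, and the helper names are our own choices. It estimates the cylindrical derivatives with central differences, rotates the components back onto the Cartesian unit vectors, and compares with the analytic Cartesian gradient (y, x, 2z).

```python
import math

def f_cyl(rho, phi, z):
    """Illustrative test function f = x*y + z**2 in cylindrical variables."""
    x, y = rho * math.cos(phi), rho * math.sin(phi)
    return x * y + z**2

def grad_cyl(f, rho, phi, z, h=1e-6):
    """(rho-hat, phi-hat, z-hat) components from the last line of Example 0.2,
    with the partial derivatives estimated by central differences."""
    df_drho = (f(rho + h, phi, z) - f(rho - h, phi, z)) / (2 * h)
    df_dphi = (f(rho, phi + h, z) - f(rho, phi - h, z)) / (2 * h)
    df_dz = (f(rho, phi, z + h) - f(rho, phi, z - h)) / (2 * h)
    return df_drho, df_dphi / rho, df_dz

def cyl_to_cart(g_rho, g_phi, g_z, phi):
    """Rotate (rho-hat, phi-hat) components back onto (x-hat, y-hat)."""
    gx = g_rho * math.cos(phi) - g_phi * math.sin(phi)
    gy = g_rho * math.sin(phi) + g_phi * math.cos(phi)
    return gx, gy, g_z

rho, phi, z = 2.0, 0.7, 0.5                      # arbitrary sample point
grad_from_cyl = cyl_to_cart(*grad_cyl(f_cyl, rho, phi, z), phi)
x, y = rho * math.cos(phi), rho * math.sin(phi)
grad_cartesian = (y, x, 2 * z)                   # analytic gradient of x*y + z**2
print(grad_from_cyl, grad_cartesian)
```

The two tuples agree to within the finite-difference error.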
We will sometimes need a multidimensional second derivative called the

Laplacian. When applied to a scalar function, it is defined as the divergence of a
gradient:
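$$
\nabla^2 f(x,y,z) \equiv \nabla\cdot\left[\nabla f(x,y,z)\right]
\tag{0.7}
$$
In Cartesian coordinates, this reduces to
$$
\nabla^2 f(x,y,z) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}
\tag{0.8}
$$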
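Since the Laplacian applied to a scalar gives a result that is also a scalar, in
Cartesian coordinates we deal with vector functions by applying the Laplacian to the
scalar function attached to each unit vector:
$$
\begin{aligned}
\nabla^2\mathbf{E} &= \left(\frac{\partial^2 E_x}{\partial x^2} + \frac{\partial^2 E_x}{\partial y^2} + \frac{\partial^2 E_x}{\partial z^2}\right)\hat{\mathbf{x}}
 + \left(\frac{\partial^2 E_y}{\partial x^2} + \frac{\partial^2 E_y}{\partial y^2} + \frac{\partial^2 E_y}{\partial z^2}\right)\hat{\mathbf{y}} \\
&\quad + \left(\frac{\partial^2 E_z}{\partial x^2} + \frac{\partial^2 E_z}{\partial y^2} + \frac{\partial^2 E_z}{\partial z^2}\right)\hat{\mathbf{z}}
\end{aligned}
\tag{0.9}
$$
This is possible because each unit vector is a constant in Cartesian coordinates.

Pierre-Simon Laplace (1749-1827, French) was a mathematician and astronomer.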

The various multidimensional derivatives take on more complicated forms in
non-cartesian coordinates such as cylindrical or spherical. One can derive the
Laplacian for these other coordinate systems by changing variables and rewriting
the unit vectors starting from the above Cartesian expression. (See Problem 0.11.)
Regardless of the coordinate system, the Laplacian for a vector function can
always be obtained from first derivatives through
$$
\nabla^2\mathbf{E} \equiv \nabla(\nabla\cdot\mathbf{E}) - \nabla\times(\nabla\times\mathbf{E})
\tag{0.10}
$$

Verification of (0.10) in Cartesian coordinates
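From (0.6), we have
$$
\nabla\times\mathbf{E} = \left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z}\right)\hat{\mathbf{x}}
 - \left(\frac{\partial E_z}{\partial x} - \frac{\partial E_x}{\partial z}\right)\hat{\mathbf{y}}
 + \left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y}\right)\hat{\mathbf{z}}
$$
and
$$
\begin{aligned}
\nabla\times(\nabla\times\mathbf{E}) &=
\begin{vmatrix}
\hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\
\partial/\partial x & \partial/\partial y & \partial/\partial z \\
\left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z}\right) &
-\left(\frac{\partial E_z}{\partial x} - \frac{\partial E_x}{\partial z}\right) &
\left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y}\right)
\end{vmatrix} \\
&= \left[\frac{\partial}{\partial y}\left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y}\right) + \frac{\partial}{\partial z}\left(\frac{\partial E_z}{\partial x} - \frac{\partial E_x}{\partial z}\right)\right]\hat{\mathbf{x}} \\
&\quad - \left[\frac{\partial}{\partial x}\left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y}\right) - \frac{\partial}{\partial z}\left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z}\right)\right]\hat{\mathbf{y}} \\
&\quad + \left[-\frac{\partial}{\partial x}\left(\frac{\partial E_z}{\partial x} - \frac{\partial E_x}{\partial z}\right) - \frac{\partial}{\partial y}\left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z}\right)\right]\hat{\mathbf{z}}
\end{aligned}
$$
After adding and subtracting
$\frac{\partial^2 E_x}{\partial x^2} + \frac{\partial^2 E_y}{\partial y^2} + \frac{\partial^2 E_z}{\partial z^2}$
and then rearranging, we get
$$
\begin{aligned}
\nabla\times(\nabla\times\mathbf{E}) &=
\left[\frac{\partial^2 E_x}{\partial x^2} + \frac{\partial^2 E_y}{\partial x\partial y} + \frac{\partial^2 E_z}{\partial x\partial z}\right]\hat{\mathbf{x}}
 + \left[\frac{\partial^2 E_x}{\partial x\partial y} + \frac{\partial^2 E_y}{\partial y^2} + \frac{\partial^2 E_z}{\partial y\partial z}\right]\hat{\mathbf{y}}
 + \left[\frac{\partial^2 E_x}{\partial x\partial z} + \frac{\partial^2 E_y}{\partial y\partial z} + \frac{\partial^2 E_z}{\partial z^2}\right]\hat{\mathbf{z}} \\
&\quad - \left[\frac{\partial^2 E_x}{\partial x^2} + \frac{\partial^2 E_x}{\partial y^2} + \frac{\partial^2 E_x}{\partial z^2}\right]\hat{\mathbf{x}}
 - \left[\frac{\partial^2 E_y}{\partial x^2} + \frac{\partial^2 E_y}{\partial y^2} + \frac{\partial^2 E_y}{\partial z^2}\right]\hat{\mathbf{y}}
 - \left[\frac{\partial^2 E_z}{\partial x^2} + \frac{\partial^2 E_z}{\partial y^2} + \frac{\partial^2 E_z}{\partial z^2}\right]\hat{\mathbf{z}}
\end{aligned}
$$
After some factorization, we obtain
$$
\begin{aligned}
\nabla\times(\nabla\times\mathbf{E}) &=
\left[\hat{\mathbf{x}}\frac{\partial}{\partial x} + \hat{\mathbf{y}}\frac{\partial}{\partial y} + \hat{\mathbf{z}}\frac{\partial}{\partial z}\right]
\left[\frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z}\right]
 - \left[\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right]
\left[E_x\hat{\mathbf{x}} + E_y\hat{\mathbf{y}} + E_z\hat{\mathbf{z}}\right] \\
&= \nabla(\nabla\cdot\mathbf{E}) - \nabla^2\mathbf{E}
\end{aligned}
$$
where on the final line we invoked (0.4), (0.5), and (0.8).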
We will also encounter several integral theorems involving vector functions in
the course of this book. The divergence theorem for a vector function $\mathbf{F}$ is
$$
\oint_S \mathbf{F}\cdot\hat{\mathbf{n}}\,da = \int_V \nabla\cdot\mathbf{F}\,dv
\tag{0.11}
$$
The integration on the left-hand side is over the closed surface S, which contains
the volume V associated with the integration on the right-hand side. The unit
vector $\hat{\mathbf{n}}$ points outward, normal to the surface. The divergence theorem is
especially useful in connection with Gauss’ law, where the left-hand side is interpreted
as the number of field lines exiting a closed surface.
Example 0.3
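Check the divergence theorem (0.11) for
$\mathbf{F}(x,y,z) = y^2\hat{\mathbf{x}} + xy\,\hat{\mathbf{y}} + x^2 z\,\hat{\mathbf{z}}$. Take as the
volume a cube contained by the six planes $x = \pm 1$, $y = \pm 1$, and $z = \pm 1$.

Solution: First, we evaluate the left side of (0.11) for this function:
$$
\begin{aligned}
\oint_S \mathbf{F}\cdot\hat{\mathbf{n}}\,da &=
\int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dy\,\left(x^2 z\right)_{z=1}
 - \int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dy\,\left(x^2 z\right)_{z=-1}
 + \int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dz\,\left(xy\right)_{y=1} \\
&\quad - \int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dz\,\left(xy\right)_{y=-1}
 + \int_{-1}^{1}\!\!\int_{-1}^{1} dy\,dz\,\left(y^2\right)_{x=1}
 - \int_{-1}^{1}\!\!\int_{-1}^{1} dy\,dz\,\left(y^2\right)_{x=-1} \\
&= 2\int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dy\,x^2 + 2\int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dz\,x
 = 4\left.\frac{x^3}{3}\right|_{-1}^{1} + 4\left.\frac{x^2}{2}\right|_{-1}^{1} = \frac{8}{3}
\end{aligned}
$$
Now we evaluate the right side of (0.11):
$$
\int_V \nabla\cdot\mathbf{F}\,dv = \int_{-1}^{1}\!\!\int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dy\,dz\,\left[x + x^2\right]
 = 4\int_{-1}^{1} dx\,\left[x + x^2\right]
 = 4\left[\frac{x^2}{2} + \frac{x^3}{3}\right]_{-1}^{1} = \frac{8}{3}
$$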
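The check in Example 0.3 can also be carried out numerically. The following is a minimal sketch, assuming a simple midpoint rule on a uniform grid (the grid size and function names are our own choices); it approximates both sides of (0.11) for the same field and cube, and both come out close to 8/3.

```python
def F(x, y, z):
    """Vector field of Example 0.3."""
    return (y**2, x * y, x**2 * z)

def div_F(x, y, z):
    """Divergence from (0.5): d(y^2)/dx + d(xy)/dy + d(x^2 z)/dz = x + x^2."""
    return x + x**2

def midpoints(n):
    """Midpoints of n equal cells spanning [-1, 1], and the cell width."""
    h = 2.0 / n
    return [-1.0 + (i + 0.5) * h for i in range(n)], h

def volume_integral(n=40):
    """Right side of (0.11): integral of div F over the cube."""
    pts, h = midpoints(n)
    return sum(div_F(x, y, z) for x in pts for y in pts for z in pts) * h**3

def surface_flux(n=40):
    """Left side of (0.11): outward flux of F through the six faces."""
    pts, h = midpoints(n)
    total = 0.0
    for u in pts:
        for v in pts:
            total += F(1.0, u, v)[0] - F(-1.0, u, v)[0]   # faces x = +1, -1
            total += F(u, 1.0, v)[1] - F(u, -1.0, v)[1]   # faces y = +1, -1
            total += F(u, v, 1.0)[2] - F(u, v, -1.0)[2]   # faces z = +1, -1
    return total * h**2

flux = surface_flux()
volume = volume_integral()
print(flux, volume, 8 / 3)
```

The small residual difference from 8/3 is the discretization error of the midpoint rule and shrinks as n grows.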
Another important theorem is Stokes’ theorem:
$$
\int_S \nabla\times\mathbf{F}\cdot\hat{\mathbf{n}}\,da = \oint_C \mathbf{F}\cdot d\boldsymbol{\ell}
\tag{0.12}
$$
The integration on the left-hand side is over an open surface S (not enclosing a
volume). The integration on the right-hand side is around the edge of the surface.
Again, $\hat{\mathbf{n}}$ is a unit vector that always points normal to the surface. The vector $d\boldsymbol{\ell}$
points along the curve C that bounds the surface S. If the fingers of your right
hand point in the direction of integration around C, then your thumb points
in the direction of $\hat{\mathbf{n}}$. Stokes’ theorem is especially useful in connection with
Ampere’s law and Faraday’s law. The right-hand side is an integration of a field
around a loop.
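Stokes’ theorem can be checked numerically in the same spirit. The sketch below is an illustration under our own assumptions: the test field F = −y³ x̂ + x³ ŷ, the unit disk in the xy-plane as the surface S, and a midpoint rule for both integrals. With n̂ = ẑ, the right-hand rule requires the loop to be traversed counterclockwise; both sides should approach 3π/2.

```python
import math

def F(x, y):
    """In-plane test field F = -y^3 x-hat + x^3 y-hat (illustrative choice)."""
    return (-y**3, x**3)

def curl_z(x, y):
    """z-component of curl F from (0.6): dFy/dx - dFx/dy = 3x^2 + 3y^2."""
    return 3 * x**2 + 3 * y**2

def surface_integral(n_rho=200, n_phi=200):
    """Left side of (0.12): (curl F) . z-hat integrated over the unit disk."""
    d_rho, d_phi = 1.0 / n_rho, 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_rho):
        rho = (i + 0.5) * d_rho
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            x, y = rho * math.cos(phi), rho * math.sin(phi)
            total += curl_z(x, y) * rho * d_rho * d_phi   # da = rho d_rho d_phi
    return total

def line_integral(n=2000):
    """Right side of (0.12): F . dl counterclockwise around the unit circle."""
    d_theta = 2 * math.pi / n
    total = 0.0
    for j in range(n):
        theta = (j + 0.5) * d_theta
        x, y = math.cos(theta), math.sin(theta)
        Fx, Fy = F(x, y)
        # dl = (-sin(theta), cos(theta)) d_theta on the unit circle
        total += (-Fx * math.sin(theta) + Fy * math.cos(theta)) * d_theta
    return total

lhs = surface_integral()
rhs = line_integral()
print(lhs, rhs, 1.5 * math.pi)
```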
The following vector integral theorem will also be useful:
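$$
\int_V \left[\mathbf{F}(\nabla\cdot\mathbf{G}) + (\mathbf{G}\cdot\nabla)\mathbf{F}\right]dv = \oint_S \mathbf{F}\,(\mathbf{G}\cdot\hat{\mathbf{n}})\,da
\tag{0.13}
$$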
0.2 Complex Numbers

It is often convenient to represent electromagnetic wave phenomena (i.e. light) as
a superposition of sinusoidal functions, each having the form $A\cos(\alpha+\beta)$. The
sine function is intrinsically present in this formula through the identity
$$
\cos(\alpha+\beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta
\tag{0.14}
$$
This is a good formula to commit to memory, as well as the frequently used
identity
$$
\sin(\alpha+\beta) = \sin\alpha\cos\beta + \sin\beta\cos\alpha
\tag{0.15}
$$

With a basic familiarity with trigonometry, one can approach many optical
problems including those involving the addition of multiple waves. However,
the manipulation of trigonometric functions via identities such as (0.14) and
(0.15) can be cumbersome and tedious. Fortunately, complex notation offers an
equivalent approach with far less busy work. (The modest investment needed to
become comfortable with complex notation is definitely worth it; optics problems
can become cumbersome enough even with the complex notation!)
The convenience of complex notation has its origins in Euler’s formula:
$$
e^{i\phi} = \cos\phi + i\sin\phi
\tag{0.16}
$$
where $i \equiv \sqrt{-1}$ is an imaginary number. Euler’s formula is easily proven using a
Taylor’s series expansion:
$$
f(x) = f(x_0) + \frac{1}{1!}(x-x_0)\left.\frac{df}{dx}\right|_{x=x_0} + \frac{1}{2!}(x-x_0)^2\left.\frac{d^2 f}{dx^2}\right|_{x=x_0} + \cdots
\tag{0.17}
$$
By expanding each function appearing in (0.16) in a Taylor’s series about the
origin we obtain
$$
\begin{aligned}
\cos\phi &= 1 - \frac{\phi^2}{2!} + \frac{\phi^4}{4!} - \cdots \\
i\sin\phi &= i\phi - i\frac{\phi^3}{3!} + i\frac{\phi^5}{5!} - \cdots \\
e^{i\phi} &= 1 + i\phi - \frac{\phi^2}{2!} - i\frac{\phi^3}{3!} + \frac{\phi^4}{4!} + i\frac{\phi^5}{5!} - \cdots
\end{aligned}
\tag{0.18}
$$

Leonhard Euler (1707-1783, Swiss) was a mathematician and physicist
who made many contributions to math, mechanics, fluid dynamics, optics, and
astronomy.

The last line of (0.18) is seen to be the sum of the first two lines, from which Euler’s
formula directly follows.
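The series argument can be illustrated numerically. The sketch below (the function name and the number of terms are our own choices) sums the first terms of the Taylor series for the exponential with an imaginary argument and compares the result with cos φ + i sin φ.

```python
import cmath
import math

def exp_i_series(phi, terms=30):
    """Partial sum of the Taylor series of exp(i*phi) about the origin."""
    total = 0j
    term = 1 + 0j                      # the n = 0 term, (i*phi)^0 / 0!
    for n in range(terms):
        total += term
        term *= 1j * phi / (n + 1)     # next term: (i*phi)^(n+1) / (n+1)!
    return total

phi = 1.2                              # arbitrary test angle in radians
series_value = exp_i_series(phi)
euler_value = complex(math.cos(phi), math.sin(phi))
print(series_value, euler_value, cmath.exp(1j * phi))
```

The even-order terms of the sum build up the cosine and the odd-order terms build up i times the sine, which is the content of (0.18).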
By inverting Euler’s formula (0.16) we can obtain the following representation
of the cosine and sine functions:
$$
\begin{aligned}
\cos\phi &= \frac{e^{i\phi} + e^{-i\phi}}{2}, \\
\sin\phi &= \frac{e^{i\phi} - e^{-i\phi}}{2i}
\end{aligned}
\tag{0.19}
$$
Example 0.4
Prove (0.14) and (0.15) as well as cos2 φ + sin2 φ = 1 by taking advantage of (0.19).
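Solution: We start with (0.14). By direct application of (0.19) and some rearranging,
we have
$$
\begin{aligned}
\cos\alpha\cos\beta - \sin\alpha\sin\beta
&= \frac{e^{i\alpha} + e^{-i\alpha}}{2}\,\frac{e^{i\beta} + e^{-i\beta}}{2}
 - \frac{e^{i\alpha} - e^{-i\alpha}}{2i}\,\frac{e^{i\beta} - e^{-i\beta}}{2i} \\
&= \frac{e^{i(\alpha+\beta)} + e^{i(\alpha-\beta)} + e^{-i(\alpha-\beta)} + e^{-i(\alpha+\beta)}}{4} \\
&\quad + \frac{e^{i(\alpha+\beta)} - e^{i(\alpha-\beta)} - e^{-i(\alpha-\beta)} + e^{-i(\alpha+\beta)}}{4} \\
&= \frac{e^{i(\alpha+\beta)} + e^{-i(\alpha+\beta)}}{2} = \cos(\alpha+\beta)
\end{aligned}
$$
We can prove (0.15) using the same technique.
$$
\begin{aligned}
\sin\alpha\cos\beta + \sin\beta\cos\alpha
&= \frac{e^{i\alpha} - e^{-i\alpha}}{2i}\,\frac{e^{i\beta} + e^{-i\beta}}{2}
 + \frac{e^{i\beta} - e^{-i\beta}}{2i}\,\frac{e^{i\alpha} + e^{-i\alpha}}{2} \\
&= \frac{e^{i(\alpha+\beta)} + e^{i(\alpha-\beta)} - e^{-i(\alpha-\beta)} - e^{-i(\alpha+\beta)}}{4i} \\
&\quad + \frac{e^{i(\alpha+\beta)} - e^{i(\alpha-\beta)} + e^{-i(\alpha-\beta)} - e^{-i(\alpha+\beta)}}{4i} \\
&= \frac{e^{i(\alpha+\beta)} - e^{-i(\alpha+\beta)}}{2i} = \sin(\alpha+\beta)
\end{aligned}
$$
Finally, for $\cos^2\phi + \sin^2\phi = 1$ we have
$$
\begin{aligned}
\cos^2\phi + \sin^2\phi
&= \left(\frac{e^{i\phi} + e^{-i\phi}}{2}\right)^2 + \left(\frac{e^{i\phi} - e^{-i\phi}}{2i}\right)^2 \\
&= \frac{e^{2i\phi} + 2 + e^{-2i\phi}}{4} - \frac{e^{2i\phi} - 2 + e^{-2i\phi}}{4} = 1
\end{aligned}
$$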
Equation (0.19) shows how ordinary sines and cosines are intimately related
to hyperbolic cosines and hyperbolic sines. If φ happens to be imaginary such that
φ = i γ where γ is real, then we have
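$$
\begin{aligned}
\sin i\gamma &= \frac{e^{-\gamma} - e^{\gamma}}{2i} = i\sinh\gamma \\
\cos i\gamma &= \frac{e^{-\gamma} + e^{\gamma}}{2} = \cosh\gamma
\end{aligned}
\tag{0.20}
$$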
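The relations (0.20) are easy to confirm with complex arithmetic. In the sketch below the value of γ is an arbitrary choice; the sine and cosine of the imaginary angle are computed both from the exponential forms in (0.19) and with the standard library.

```python
import cmath
import math

gamma = 0.8   # arbitrary real number

# Exponential forms (0.19) evaluated at phi = i*gamma.
sin_from_exp = (math.exp(-gamma) - math.exp(gamma)) / (2j)
cos_from_exp = (math.exp(-gamma) + math.exp(gamma)) / 2

# Library evaluation of the same quantities.
sin_i_gamma = cmath.sin(1j * gamma)
cos_i_gamma = cmath.cos(1j * gamma)

print(sin_from_exp, 1j * math.sinh(gamma))   # both equal i*sinh(gamma)
print(cos_from_exp, math.cosh(gamma))        # both equal cosh(gamma)
```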
There are several situations in optics where one is interested in a complex
angle, φ = η + i γ where η and γ are real numbers. For example, the solution
to the wave equation when absorption or amplification takes place contains
an exponential with a complex argument. In this case, the imaginary part of
φ introduces exponential decay or growth, as is apparent upon examination of
(0.19). Another important situation occurs when one attempts to calculate the
transmission angle for light incident upon a surface beyond the critical angle for
total internal reflection. In this case, it is necessary to compute the arcsine of
a number greater than one in an effort to satisfy Snell’s law. Even though such
an angle does not exist in the usual sense, a complex value for φ can be found
which satisfies (0.19). The complex value for the angle is useful in computing the
characteristics of an evanescent wave that exists on the transmitted side of the
surface.
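To make the idea of a complex angle concrete, the sketch below applies Snell’s law, n_i sin θ_i = n_t sin θ_t, beyond the critical angle. The indices (1.5 for glass, 1.0 for air) and the 60° incident angle are illustrative assumptions, not values from the text. The required sine exceeds one, and the complex arcsine returns an angle with a nonzero imaginary part whose sine reproduces that value.

```python
import cmath
import math

n_incident = 1.5                  # assumed index on the incident side (glass)
n_transmitted = 1.0               # assumed index on the transmitted side (air)
theta_i = math.radians(60.0)      # assumed incident angle, beyond critical

# Snell's law demands this value for sin(theta_t); here it exceeds one.
sin_theta_t = n_incident * math.sin(theta_i) / n_transmitted

# No real angle has this sine, but a complex one does.
theta_t = cmath.asin(sin_theta_t)

print(sin_theta_t)
print(theta_t)
print(cmath.sin(theta_t))         # reproduces sin_theta_t to rounding error
```

The real part of the complex angle is π/2, and the magnitude of its imaginary part γ satisfies cosh γ = sin θ_t, consistent with (0.20).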
As was mentioned previously, we will be interested in waves of the form
$A\cos(\alpha+\beta)$. We can use complex notation to represent this wave simply by
writing
$$
A\cos(\alpha+\beta) = \mathrm{Re}\left\{\tilde{A}e^{i\alpha}\right\}
\tag{0.21}
$$
where the phase $\beta$ is conveniently contained within the complex factor $\tilde{A} \equiv Ae^{i\beta}$.
The operation $\mathrm{Re}\{\}$ means to retain only the real part of the argument without
regard for the imaginary part. As an example, we have $\mathrm{Re}\{1+2i\} = 1$. The
expression (0.21) is a direct result of Euler’s equation (0.16).
It is common (even conventional) to omit the explicit writing of $\mathrm{Re}\{\}$. Thus,
physicists agree that $\tilde{A}e^{i\alpha}$ actually means $A\cos(\alpha+\beta)$. This laziness is
permissible because it is possible to perform linear operations on $\mathrm{Re}\{f\}$ such as addition,
differentiation, or integration while procrastinating the taking of the real part
until the end:
$$
\begin{aligned}
\mathrm{Re}\{f\} + \mathrm{Re}\{g\} &= \mathrm{Re}\{f+g\} \\
\frac{d}{dx}\mathrm{Re}\{f\} &= \mathrm{Re}\left\{\frac{df}{dx}\right\} \\
\int \mathrm{Re}\{f\}\,dx &= \mathrm{Re}\left\{\int f\,dx\right\}
\end{aligned}
\tag{0.22}
$$
As an example, note that $\mathrm{Re}\{1+2i\} + \mathrm{Re}\{3+4i\} = \mathrm{Re}\{(1+2i)+(3+4i)\} = 4$.

However, one must be careful when performing other operations such as
multiplication. In this case, it is essential to take the real parts before performing the
operation. Notice that
$$
\mathrm{Re}\{f\}\times\mathrm{Re}\{g\} \neq \mathrm{Re}\{f\times g\}
\tag{0.23}
$$
As an example, we see $\mathrm{Re}\{1+2i\}\times\mathrm{Re}\{3+4i\} = 3$, but $\mathrm{Re}\{(1+2i)(3+4i)\} = -5$.

Gerolamo Cardano (1501-1576, Italian) conceived and defined the notion
of complex numbers (he called them “fictitious”) during his attempts to find
solutions to cubic equations.
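The two numerical examples above can be reproduced directly with built-in complex numbers. This short sketch uses the same values, f = 1 + 2i and g = 3 + 4i, to show that taking the real part commutes with addition but not with multiplication.

```python
f = 1 + 2j
g = 3 + 4j

# Addition is linear, so the real part may be taken before or after.
sum_of_reals = f.real + g.real          # Re{f} + Re{g}
real_of_sum = (f + g).real              # Re{f + g}

# Multiplication is not: the two orders of operation disagree.
product_of_reals = f.real * g.real      # Re{f} x Re{g}
real_of_product = (f * g).real          # Re{f x g}

print(sum_of_reals, real_of_sum)
print(product_of_reals, real_of_product)
```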

When dealing with complex numbers it is often advantageous to transform
between a Cartesian representation and a polar representation. With the aid of
Euler’s formula, it is possible to transform any complex number a + i b into the
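form $\rho e^{i\phi}$, where $a$, $b$, $\rho$, and $\phi$ are real. From (0.16), the required connection
between $(\rho, \phi)$ and $(a, b)$ is
$$
\rho e^{i\phi} = \rho\cos\phi + i\rho\sin\phi = a + ib
\tag{0.24}
$$
The real and imaginary parts of this equation must separately be equal. Thus, we
have
$$
\begin{aligned}
a &= \rho\cos\phi \\
b &= \rho\sin\phi
\end{aligned}
\tag{0.25}
$$
These equations can be inverted to yield
$$
\begin{aligned}
\rho &= \sqrt{a^2 + b^2} \\
\phi &= \tan^{-1}\frac{b}{a} \quad (a > 0)
\end{aligned}
\tag{0.26}
$$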
When a < 0, we must adjust φ by π since the arctangent has a range only from
−π/2 to π/2.
The transformations in (0.25) and (0.26) have a clear geometrical interpretation
in the complex plane, and this makes it easier to remember them. They are
just the usual connections between Cartesian and polar coordinates. As seen in
Fig. 2, $\rho$ is the hypotenuse of a right triangle having legs with lengths $a$ and $b$, and
$\phi$ is the angle that the hypotenuse makes with the x-axis. Again, students should
be careful when $a$ is negative since the arctangent is defined in quadrants I and
IV. An easy way to deal with the situation of a negative $a$ is to factor the minus
sign out before proceeding (i.e. $a + ib = -(-a - ib)$). Then the transformation is
made on $-a - ib$ where $-a$ is positive. The minus sign out in front is just carried
along unaffected and can be factored back in at the end. Notice that $-\rho e^{i\phi}$ is the
same as $\rho e^{i(\phi\pm\pi)}$.

Figure 2 A number in the complex plane can be represented either by
Cartesian or polar representation.
Example 0.5
Write −3 + 4i in polar format.
Solution: We must be careful with the negative real part since it indicates a
quadrant (in this case II) outside of the domain of the inverse tangent (quadrants I and
IV). Best to factor the negative out and deal with it separately.
$$
-3 + 4i = -(3 - 4i) = -\sqrt{3^2 + (-4)^2}\;e^{i\tan^{-1}\frac{(-4)}{3}} = e^{i\pi}\,5\,e^{-i\tan^{-1}\frac{4}{3}} = 5\,e^{i\left(\pi - \tan^{-1}\frac{4}{3}\right)}
$$
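The procedure of Example 0.5 can be written as a small routine. The sketch below is one possible implementation (the function name `to_polar` is our own); it follows (0.26) for a > 0 and factors out the minus sign when a < 0, which adds π to the angle. The standard-library call `cmath.phase` serves as an independent cross-check.

```python
import cmath
import math

def to_polar(z):
    """Return (rho, phi) with z = rho * exp(i*phi), following (0.26)."""
    a, b = z.real, z.imag
    rho = math.sqrt(a**2 + b**2)
    if a > 0:
        phi = math.atan(b / a)
    elif a < 0:
        # Factor out the minus sign: z = -(-a - ib), and -1 = exp(i*pi).
        phi = math.atan((-b) / (-a)) + math.pi
    else:
        # a == 0: the number lies on the imaginary axis (or at the origin).
        phi = math.copysign(math.pi / 2, b) if b != 0 else 0.0
    return rho, phi

z = -3 + 4j
rho, phi = to_polar(z)
print(rho, phi)
print(math.pi - math.atan(4 / 3))     # angle found in Example 0.5
print(cmath.phase(z))                 # library cross-check
```

For numbers in the third quadrant this routine returns an angle between π and 3π/2, whereas `cmath.phase` returns the equivalent angle shifted by −2π; both represent the same complex number.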
Finally, we consider the concept of a complex conjugate. The conjugate of a

complex number z = a + i b is denoted with an asterisk and amounts to changing
the sign on the imaginary part of the number:
$$
z^* = (a + ib)^* \equiv a - ib
\tag{0.27}
$$
The complex conjugate is useful when computing the absolute value of a complex
number:
$$
|z| = \sqrt{z^* z} = \sqrt{(a - ib)(a + ib)} = \sqrt{a^2 + b^2} = \rho
\tag{0.28}
$$
Note that the absolute value of a complex number is the same as its magnitude $\rho$
as defined in (0.26). The complex conjugate is also useful for eliminating complex
numbers from the denominator of expressions:
$$
\frac{a + ib}{c + id} = \frac{(a + ib)(c - id)}{(c + id)(c - id)} = \frac{ac + bd + i(bc - ad)}{c^2 + d^2}
\tag{0.29}
$$
No matter how complicated an expression, the complex conjugate is
calculated by simply inserting a minus sign in front of all occurrences of $i$ in the
expression, and placing an asterisk on all complex variables in the expression.
For example, the complex conjugate of $\rho e^{i\phi}$ is $\rho e^{-i\phi}$ assuming $\rho$ and $\phi$ are real,
as can be seen from Euler’s formula (0.16). As another example consider
$$
\left[E_o\exp\{i(Kz - \omega t)\}\right]^* = E_o^*\exp\{-i(K^* z - \omega t)\}
$$
assuming $z$, $\omega$, and $t$ are real, but $E_o$ and $K$ are complex.

A common way of obtaining the real part of an expression is simply by adding
the complex conjugate and dividing the result by 2:
$$\mathrm{Re}\{z\} = \frac{1}{2}\left(z + z^*\right) \tag{0.30}$$
Notice that the expression for cos φ in (0.19) is an example of this formula. Some-
times when a complicated expression is added to its complex conjugate, we let
“C.C.” represent the complex conjugate in order to avoid writing the expression
twice.
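The conjugate manipulations in (0.28) through (0.30) map directly onto a few lines of code. The following is a minimal sketch; the helper names `real_part` and `divide` are hypothetical and chosen only for illustration.

```python
def real_part(z):
    """Re{z} = (z + z*) / 2, following (0.30)."""
    return (z + z.conjugate()) / 2

def divide(numerator, denominator):
    """Divide by multiplying top and bottom by the conjugate, as in (0.29)."""
    conj = denominator.conjugate()
    # denominator * conj is real (c^2 + d^2), so only its real part is kept.
    return (numerator * conj) / (denominator * conj).real

z1 = complex(1, -1)
z2 = complex(3, 4)
print(abs(z2))               # magnitude, equal to sqrt(z* z) as in (0.28)
print(real_part(z1 * z2))
print(divide(z1, z2))
```

In practice the built-in `/` operator does the same job; the explicit version is shown only to mirror the algebra.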
0.3 Fourier Theory
We often decompose complicated light fields into a superposition of pure sinu-

soidal waves. This is called Fourier analysis, and it enables us to consider the
behavior of the individual frequency components one at a time. This is important
since, for example, refractive index is different for different frequencies. After
determining how individual sine waves move through an optical system (say, a
piece of glass), we can reassemble the sinusoidal waves to see the effect of the
system on the overall waveform. Fourier transforms are used for this purpose. In
fact, it will be possible to work simultaneously with infinitely many sinusoidal
waves, where the frequencies comprising a light field are spread continuously
over a range. Fourier transforms are also helpful in diffraction problems where
waves propagating in many different directions (all at the same frequency) are
superimposed.
Joseph Fourier (1768-1830, French) was a mathematician, physicist, and
historian. He made significant contributions to the study of heat transfer and
vibrations.
We begin with a derivation of the Fourier integral theorem. A periodic function
can be represented in terms of the sine and the cosine in the following manner:
$$f(t) = \sum_{n=0}^{\infty} a_n \cos(n\Delta\omega t) + b_n \sin(n\Delta\omega t) \tag{0.31}$$
This is called a Fourier expansion. It is similar in idea to a Taylor’s series (0.17),

which rewrites a function as a polynomial. In both cases, the goal is to represent
one function in terms of a linear combination of other functions (requiring a
complete basis set). In a Taylor’s series the basis functions are polynomials and
in a Fourier expansion the basis functions are sines and cosines with various
frequencies.
By inspection, we see that all terms in (0.31) repeat with a maximum period
of 2π/∆ω. This is why the expansion is limited in its use to periodic functions.
A function represented by such an expansion satisfies f(t) = f(t + 2π/∆ω).
The expansion (0.31) is possible even if f (t ) is complex (requiring a n and b n to
be complex).
We can rewrite the sines and cosines in the expansion (0.31) using (0.19) as

follows:
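$$\begin{aligned}
f(t) &= \sum_{n=0}^{\infty} a_n \frac{e^{in\Delta\omega t} + e^{-in\Delta\omega t}}{2} + b_n \frac{e^{in\Delta\omega t} - e^{-in\Delta\omega t}}{2i} \\
&= a_0 + \sum_{n=1}^{\infty} \frac{a_n - i b_n}{2}\, e^{in\Delta\omega t} + \sum_{n=1}^{\infty} \frac{a_n + i b_n}{2}\, e^{-in\Delta\omega t}
\end{aligned} \tag{0.32}$$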
Thus, (0.31) becomes simply
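$$f(t) = \sum_{n=-\infty}^{\infty} c_n e^{-in\Delta\omega t} \tag{0.33}$$
where
$$\begin{aligned}
c_{n<0} &\equiv \frac{a_{-n} - i b_{-n}}{2} \\
c_{n>0} &\equiv \frac{a_n + i b_n}{2} \\
c_0 &\equiv a_0
\end{aligned} \tag{0.34}$$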
Notice that if $c_{-n} = c_n^*$ for all $n$, then $f(t)$ is real (i.e. real $a_n$ and $b_n$); otherwise
$f(t)$ is complex. The real parts of the $c_n$ coefficients are connected with the cosine
terms in (0.31), and the imaginary parts of the $c_n$ coefficients are connected with
the sine terms in (0.31).
Given a known function f (t ), we can compute the various coefficients c n .
There is a trick for figuring out how to do this. We multiply both sides of (0.33) by
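$e^{im\Delta\omega t}$, where $m$ is an integer, and integrate over the function period $2\pi/\Delta\omega$:
$$\begin{aligned}
\int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} f(t)\, e^{im\Delta\omega t}\, dt &= \sum_{n=-\infty}^{\infty} c_n \int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} e^{i(m-n)\Delta\omega t}\, dt \\
&= \sum_{n=-\infty}^{\infty} c_n \left[ \frac{e^{i(m-n)\Delta\omega t}}{i(m-n)\Delta\omega} \right]_{-\pi/\Delta\omega}^{\pi/\Delta\omega} \\
&= \sum_{n=-\infty}^{\infty} \frac{2\pi c_n}{\Delta\omega} \left[ \frac{e^{i(m-n)\pi} - e^{-i(m-n)\pi}}{2i(m-n)\pi} \right] \\
&= \sum_{n=-\infty}^{\infty} \frac{2\pi c_n}{\Delta\omega}\, \frac{\sin[(m-n)\pi]}{(m-n)\pi}
\end{aligned} \tag{0.35}$$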
The function $\sin[(m-n)\pi] / [(m-n)\pi]$ is equal to zero for all $n \neq m$, and it is
equal to one when $n = m$ (to see this, use L’Hospital’s rule on the zero-over-zero
situation, or just go back and re-perform the above integral for $n = m$). Thus, only
one term contributes to the summation in (0.35). We now have
$$c_m = \frac{\Delta\omega}{2\pi} \int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} f(t)\, e^{im\Delta\omega t}\, dt \tag{0.36}$$
from which the coefficients c n can be computed, given a function f (t ). (Note that
m is a dummy index so we can change it back to n if we like.)
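The coefficient formula (0.36) and the expansion (0.33) can be tried out numerically. The sketch below is only an illustration under stated assumptions: the integral is approximated with a midpoint rule, the function names and the test function $f(t) = \cos(\Delta\omega t) + 2\sin(2\Delta\omega t)$ are hypothetical choices, and the sign conventions are those of (0.33) and (0.36). For this test function $a_1 = 1$ and $b_2 = 2$, so (0.34) predicts $c_{\pm 1} = 1/2$, $c_2 = i$, and $c_{-2} = -i$.

```python
import cmath
import math

def fourier_coefficient(f, m, delta_omega, samples=2000):
    """Midpoint-rule estimate of c_m = (dw / 2 pi) * integral of f(t) exp(i m dw t) dt
    over one period, following (0.36)."""
    period = 2 * math.pi / delta_omega
    dt = period / samples
    total = 0j
    for k in range(samples):
        t = -period / 2 + (k + 0.5) * dt
        total += f(t) * cmath.exp(1j * m * delta_omega * t) * dt
    return delta_omega / (2 * math.pi) * total

def synthesize(coeffs, t, delta_omega):
    """Rebuild f(t) from the c_n using (0.33): sum of c_n exp(-i n dw t)."""
    return sum(c * cmath.exp(-1j * n * delta_omega * t) for n, c in coeffs.items())

dw = 3.0  # illustrative fundamental frequency
f = lambda t: math.cos(dw * t) + 2 * math.sin(2 * dw * t)
coeffs = {n: fourier_coefficient(f, n, dw) for n in range(-3, 4)}

print(coeffs[1], coeffs[2], coeffs[-2])
print(synthesize(coeffs, 0.37, dw), f(0.37))
```

Because the test function is real, the computed coefficients also satisfy $c_{-n} = c_n^*$.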

This completes the circle. If we know the function f (t ), we can find the
coefficients c n via (0.36), and, if we know the coefficients c n , we can generate the
function f (t ) via (0.33). If we are feeling a bit silly, we might combine these into a
single identity:
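$$f(t) = \sum_{n=-\infty}^{\infty} \left[ \frac{\Delta\omega}{2\pi} \int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} f(t)\, e^{in\Delta\omega t}\, dt \right] e^{-in\Delta\omega t} \tag{0.37}$$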
We start with a function f (t ) followed by a lot of computation and obtain the

function back again! (This is not quite as foolish as it first appears, as we will see
later.)
As mentioned above, Fourier expansions represent functions f (t ) that are
periodic over the interval 2π/∆ω. This is disappointing since many optical wave-
forms do not repeat (e.g. a single short laser pulse). Nevertheless, we can represent
a function f (t ) that is not periodic if we let the period 2π/∆ω become infinitely
long. In other words, we can accommodate non-periodic functions if we take the
limit as ∆ω goes to zero so that the spacing of terms in the series becomes very
fine. Applying this limit to (0.37) we obtain
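$$f(t) = \frac{1}{2\pi} \lim_{\Delta\omega \to 0} \sum_{n=-\infty}^{\infty} \left[ e^{-in\Delta\omega t} \int_{-\infty}^{\infty} f(t')\, e^{in\Delta\omega t'}\, dt' \right] \Delta\omega \tag{0.38}$$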
At this point, a brief review of the definition of an integral is helpful to better

understand the next step that we shall administer to (0.38).
Changing the summation in (0.38) over to an integral
Recall that an integral is really a summation of rectangles under a curve with finely
spaced steps:
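$$\begin{aligned}
\int_a^b g(\omega)\, d\omega &\equiv \lim_{\Delta\omega \to 0} \sum_{n=0}^{\frac{b-a}{\Delta\omega}} g(a + n\Delta\omega)\, \Delta\omega \\
&= \lim_{\Delta\omega \to 0} \sum_{n=-\frac{b-a}{2\Delta\omega}}^{\frac{b-a}{2\Delta\omega}} g\!\left(\frac{a+b}{2} + n\Delta\omega\right) \Delta\omega
\end{aligned} \tag{0.39}$$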
The final expression has been manipulated so that the index ranges through both
negative and positive numbers. If we set a = −b and take the limit b → ∞, then the
above expression becomes
$$\int_{-\infty}^{\infty} g(\omega)\, d\omega = \lim_{\Delta\omega \to 0} \sum_{n=-\infty}^{\infty} g(n\Delta\omega)\, \Delta\omega \tag{0.40}$$
This concludes our short review of calculus.
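The limit in (0.40) can be watched converging numerically. The sketch below is a hedged illustration: it assumes a Gaussian test function $g(\omega) = e^{-\omega^2}$ (whose integral over all $\omega$ is $\sqrt{\pi}$), truncates the infinite sum at $|\omega| = 10$, and uses a hypothetical helper name.

```python
import math

def symmetric_riemann_sum(g, delta_omega, n_max):
    """Sum g(n * dw) * dw for n = -n_max .. n_max, a truncated version of (0.40)."""
    return sum(g(n * delta_omega) * delta_omega for n in range(-n_max, n_max + 1))

g = lambda w: math.exp(-w * w)   # test function; exact integral is sqrt(pi)

for dw in (1.0, 0.5, 0.1):
    n_max = int(10 / dw)         # truncate the sum at |w| = 10
    print(dw, symmetric_riemann_sum(g, dw, n_max), math.sqrt(math.pi))
```

As the spacing shrinks, the sum of rectangles approaches the value of the integral, which is the step used to pass from (0.38) to the Fourier integral theorem.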

Obviously, (0.38) has the same form as (0.40) if g (n∆ω) represents everything
in the square brackets of (0.38). The result is the Fourier integral theorem:
$$f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-i\omega t} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t')\, e^{i\omega t'}\, dt' \right] d\omega \tag{0.41}$$
The piece in brackets is called the Fourier transform, and the rest of the operation
is called the inverse Fourier transform. The Fourier integral theorem (0.41) is often
written with the following (potentially confusing) notation:
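$$\begin{aligned}
f(\omega) &\equiv \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{i\omega t}\, dt \\
f(t) &\equiv \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(\omega)\, e^{-i\omega t}\, d\omega
\end{aligned} \tag{0.42}$$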
The transform and inverse transform are also sometimes written as $f(\omega) \equiv \mathcal{F}\{f(t)\}$
and $f(t) \equiv \mathcal{F}^{-1}\{f(\omega)\}$. Note that the functions $f(t)$ and $f(\omega)$ are
entirely different, even taking on different units (e.g. the latter having extra units of
per frequency). The two functions are distinguished by their arguments, which
also have different units (e.g. time vs. frequency). Nevertheless, it is customary to
use the same letter to denote either function since they form a transform pair.
You should be aware that it is arbitrary which of the expressions in (0.42) is
called the transform and which is called the inverse transform. In other words,
the signs in the exponents of (0.42) may be interchanged. The convention varies
in published works. Also, the factor 2π may be placed on either the transform or
the inverse transform, or divided equally between the two as has been done here.
Example 0.6
Compute the Fourier transform of $E(t) = E_0 e^{-t^2/2T^2} e^{-i\omega_0 t}$ followed by the inverse
Fourier transform.
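Solution: According to (0.42), the Fourier transform is
$$E(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \left( E_0 e^{-t^2/2T^2} e^{-i\omega_0 t} \right) e^{i\omega t}\, dt = \frac{E_0}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-t^2/2T^2 + i(\omega - \omega_0)t}\, dt$$
The integration can be performed with the help of (0.55), which yields
$$E(\omega) = \frac{E_0}{\sqrt{2\pi}} \sqrt{\frac{\pi}{1/2T^2}}\; e^{-\frac{(\omega - \omega_0)^2}{4(1/2T^2)}} = T E_0\, e^{-T^2(\omega - \omega_0)^2/2}$$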
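Similarly, the inverse Fourier transform of the above function is
$$E(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \left( T E_0\, e^{-T^2(\omega - \omega_0)^2/2} \right) e^{-i\omega t}\, d\omega = \frac{T E_0}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{T^2}{2}\omega^2 + (T^2\omega_0 - it)\omega - \frac{T^2}{2}\omega_0^2}\, d\omega$$
where again we use (0.55) to obtain
$$E(t) = \frac{T E_0}{\sqrt{2\pi}} \sqrt{\frac{\pi}{T^2/2}}\; e^{\frac{(T^2\omega_0 - it)^2}{4(T^2/2)} - \frac{T^2}{2}\omega_0^2} = E_0\, e^{-t^2/2T^2 - i\omega_0 t}$$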
which brings us back to where we started.
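The analytic transform found in Example 0.6 can be compared against a direct numerical integration of (0.42). This is a minimal sketch, assuming a midpoint rule on a finite time window and illustrative parameter values ($E_0$, $T$, $\omega_0$ are arbitrary choices, and the function names are hypothetical).

```python
import cmath
import math

def fourier_transform(f, omega, t_min, t_max, steps=4000):
    """Midpoint-rule estimate of (1/sqrt(2 pi)) * integral of f(t) exp(i omega t) dt."""
    dt = (t_max - t_min) / steps
    total = 0j
    for k in range(steps):
        t = t_min + (k + 0.5) * dt
        total += f(t) * cmath.exp(1j * omega * t)
    return total * dt / math.sqrt(2 * math.pi)

E0, T, omega0 = 2.0, 1.5, 4.0   # illustrative values only

def pulse(t):
    # E(t) = E0 exp(-t^2 / 2T^2) exp(-i omega0 t)
    return E0 * math.exp(-t**2 / (2 * T**2)) * cmath.exp(-1j * omega0 * t)

def analytic(omega):
    # E(omega) = T E0 exp(-T^2 (omega - omega0)^2 / 2)
    return T * E0 * math.exp(-T**2 * (omega - omega0)**2 / 2)

for omega in (3.0, 4.0, 5.5):
    print(omega, fourier_transform(pulse, omega, -12 * T, 12 * T), analytic(omega))
```

The window of $\pm 12T$ is wide enough that the truncated Gaussian tails are negligible; a narrower window would introduce a visible error.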
As was previously mentioned, it would seem rather pointless to perform a

Fourier transform on the function f (t ) followed by an inverse Fourier transform,
just to end up with f (t ) again. Nevertheless, we are interested in this because
we want to know the effect of an optical system on a waveform (represented by
f (t )). It turns out that in many cases, the effect of the optical system can only
be applied to f (ω) (for example, if the effect is frequency dependent). Thus, we
perform a Fourier transform on f (t ), then apply the frequency-dependent effect
on f (ω), and finally perform an inverse Fourier transform on the result. In this
case, the final function will be different from f (t ). Keep in mind that f (ω) is the
continuous analog of the discrete coefficients c n (or the a n and b n ). The real part
of f (ω) indicates the amplitudes of the cosine waves necessary to construct the
function f (t ). The imaginary part of f (ω) indicates the amplitudes of the sine
waves necessary to construct the function f (t ).
Finally, we comment on the delta function, which is defined indirectly through
$$f(t) = \int_{-\infty}^{\infty} f(t')\, \delta(t' - t)\, dt' \tag{0.43}$$
The delta function $\delta(t' - t)$ is zero everywhere except at $t' = t$ where it is infinite
in such a way as to make the integral take on the value of the function $f(t)$. (One
can think of $\delta(t' - t)\, dt'$ as an infinitely tall and infinitely thin rectangle centered
at $t' = t$ with unit area.) The integral pays attention to the value of $f(t')$
only at the point $t' = t$.

A remarkable attribute of the delta function can be seen from the Fourier
integral theorem. After rearranging the order of integration, the Fourier integral
theorem (0.41) can be written as
$$f(t) = \int_{-\infty}^{\infty} f(t') \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i\omega(t' - t)}\, d\omega \right] dt' \tag{0.44}$$
A comparison of (0.43) and (0.44) shows that one may write the delta function
as a uniform superposition of all frequency components:
$$\delta(t' - t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i\omega(t' - t)}\, d\omega \tag{0.45}$$

Example 0.7
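Use (0.45) to prove Parseval’s theorem:
$$\int_{-\infty}^{\infty} |f(\omega)|^2\, d\omega = \int_{-\infty}^{\infty} |f(t)|^2\, dt$$
which is used extensively in the study of optics.
Solution:
$$\begin{aligned}
\int_{-\infty}^{\infty} |f(\omega)|^2\, d\omega &= \int_{-\infty}^{\infty} f(\omega)\, f^*(\omega)\, d\omega \\
&= \int_{-\infty}^{\infty} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{i\omega t}\, dt \right] \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f^*(t')\, e^{-i\omega t'}\, dt' \right] d\omega
\end{aligned}$$
After the substitution $t' \to -t'$ in the second bracket, the order of integration can be changed to give
$$\begin{aligned}
\int_{-\infty}^{\infty} |f(\omega)|^2\, d\omega &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(t)\, f^*(-t') \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i\omega(t' - (-t))}\, d\omega \right] dt\, dt' \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(t)\, f^*(-t')\, \delta(t' - (-t))\, dt\, dt' \\
&= \int_{-\infty}^{\infty} f(t)\, f^*(t)\, dt = \int_{-\infty}^{\infty} |f(t)|^2\, dt
\end{aligned}$$
Equation (0.45) was used to reach the final result.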
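Parseval’s theorem also lends itself to a numerical spot check. The sketch below is an illustration under assumptions: the test function, the grid limits, and the helper names are arbitrary choices, and both integrals are approximated with midpoint sums on finite windows chosen wide enough to contain essentially all of the function and its spectrum.

```python
import cmath
import math

def f(t):
    # Arbitrary test function: Gaussian envelope, linear prefactor, carrier at omega = 2.
    return (1 + 0.5 * t) * math.exp(-t**2 / 2) * cmath.exp(-2j * t)

def midpoints(lo, hi, n):
    """Return n midpoint sample locations on [lo, hi] and the step size."""
    step = (hi - lo) / n
    return [lo + (k + 0.5) * step for k in range(n)], step

t_grid, dt = midpoints(-8.0, 8.0, 300)
w_grid, dw = midpoints(-6.0, 10.0, 300)   # centered on the carrier frequency
f_t = [f(t) for t in t_grid]

def transform(omega):
    """Midpoint estimate of f(omega) using the convention of (0.42)."""
    s = sum(ft * cmath.exp(1j * omega * t) for ft, t in zip(f_t, t_grid))
    return s * dt / math.sqrt(2 * math.pi)

energy_time = sum(abs(v)**2 for v in f_t) * dt
energy_freq = sum(abs(transform(w))**2 for w in w_grid) * dw
print(energy_time, energy_freq)
```

The two printed numbers should agree closely, since the same "energy" is being tallied in the time and frequency domains.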
0.4 Linear Algebra

In the study of optics, it is common to encounter sets of linear equations. Most
often, there are just two equations with two variables to solve. The simplest
example of such a set of equations is
Ax + B y = F and Cx + Dy = G (0.46)
where x and y are variables. Here, each equation specifies a line, and for that
reason they are called linear equations. A set of linear equations such as (0.46)
can be expressed using matrix notation as
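$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} Ax + By \\ Cx + Dy \end{bmatrix} = \begin{bmatrix} F \\ G \end{bmatrix} \tag{0.47}$$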
The above 2×2 matrix multiplied onto the two-dimensional column vector results
in a vector as seen above. The elements of rows on the left are multiplied onto

elements of columns on the right and summed to create each new element in the
result. This applies whether a matrix is multiplied onto a vector or onto another
matrix (resulting in a matrix).
To solve a matrix equation such as (0.47), one multiplies both sides by the
inverse matrix, which gives
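$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} \begin{bmatrix} F \\ G \end{bmatrix} \tag{0.48}$$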
The inverse matrix has the property that

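$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \tag{0.49}$$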
where the right-hand side is called the identity matrix. You can easily check that
the identity matrix leaves unchanged anything that it multiplies, and so (0.48)
simplifies to
$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} \begin{bmatrix} F \\ G \end{bmatrix}$$
Once the inverse matrix is found, the matrix multiplication on the right can be
performed and the answers for x and y obtained as the upper and lower elements
of the result.
The inverse of a 2 × 2 matrix is given by
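$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \frac{1}{\begin{vmatrix} A & B \\ C & D \end{vmatrix}} \begin{bmatrix} D & -B \\ -C & A \end{bmatrix} \tag{0.50}$$
where
$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} \equiv AD - CB$$
is called the determinant.¹ We can check that (0.50) is correct by direct substitution:
$$\begin{aligned}
\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} \begin{bmatrix} A & B \\ C & D \end{bmatrix} &= \frac{1}{AD - BC} \begin{bmatrix} D & -B \\ -C & A \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} \\
&= \frac{1}{AD - BC} \begin{bmatrix} AD - BC & 0 \\ 0 & AD - BC \end{bmatrix} \\
&= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\end{aligned} \tag{0.51}$$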
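The recipe of (0.48) through (0.50) can be written out as a short program. The following is a minimal sketch with hypothetical function names; the coefficients in the usage lines are arbitrary and only serve as an example.

```python
def inverse_2x2(A, B, C, D):
    """Inverse of [[A, B], [C, D]] from (0.50), returned as a flat tuple."""
    det = A * D - C * B
    if det == 0:
        raise ValueError("matrix is singular (zero determinant)")
    return (D / det, -B / det, -C / det, A / det)

def solve_2x2(A, B, C, D, F, G):
    """Solve Ax + By = F and Cx + Dy = G by multiplying with the inverse matrix."""
    a, b, c, d = inverse_2x2(A, B, C, D)
    return (a * F + b * G, c * F + d * G)

# Example system: 2x + y = 5 and x + 3y = 10.
x, y = solve_2x2(2.0, 1.0, 1.0, 3.0, 5.0, 10.0)
print(x, y)
```

The explicit check for a zero determinant matters because (0.50) divides by it; a vanishing determinant means the two lines are parallel and no unique solution exists.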
The above review of linear algebra is very basic. In contrast, we next discuss
Sylvester’s theorem, which students probably have not previously encountered.
1 We used a three dimensional version of the determinant to perform a cross product in (0.3),
which is altogether a different context.

Sylvester’s theorem is useful when multiplying the same 2 × 2 matrix (with a
determinant of unity) together many times (i.e. raising the matrix to a power).
This situation occurs when modeling periodic multilayer mirror coatings or when
considering light rays trapped in a laser cavity as they reflect many times.
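Sylvester’s Theorem: If the determinant of a 2×2 matrix is one (i.e. $AD - BC = 1$),
then
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{N} = \frac{1}{\sin\theta} \begin{bmatrix} A \sin N\theta - \sin(N-1)\theta & B \sin N\theta \\ C \sin N\theta & D \sin N\theta - \sin(N-1)\theta \end{bmatrix} \tag{0.52}$$
where
$$\cos\theta = \frac{1}{2}(A + D) \tag{0.53}$$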
James Joseph Sylvester (1814-1897, English) made fundamental contributions
to matrix theory, invariant theory, number theory, partition theory and
combinatorics. He played a leadership role in American mathematics in the
later half of the 19th century as a professor at the Johns Hopkins University
and as founder of the American Journal of Mathematics.
Proof of Sylvester’s theorem by induction
When $N = 1$, the equation is seen to be correct by direct substitution. Next we
assume that the theorem holds for arbitrary $N$, and we check to see if it holds for
$N + 1$:
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{N+1} = \frac{1}{\sin\theta} \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} A \sin N\theta - \sin(N-1)\theta & B \sin N\theta \\ C \sin N\theta & D \sin N\theta - \sin(N-1)\theta \end{bmatrix}$$
Now we inject the condition $AD - BC = 1$ into the right-hand side
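$$\frac{1}{\sin\theta} \begin{bmatrix} (A^2 + BC) \sin N\theta - A \sin(N-1)\theta & (AB + BD) \sin N\theta - B \sin(N-1)\theta \\ (AC + CD) \sin N\theta - C \sin(N-1)\theta & (D^2 + BC) \sin N\theta - D \sin(N-1)\theta \end{bmatrix}$$
and rearrange the result to give
$$\frac{1}{\sin\theta} \begin{bmatrix} (A^2 + AD - 1) \sin N\theta - A \sin(N-1)\theta & B\left[(A + D) \sin N\theta - \sin(N-1)\theta\right] \\ C\left[(A + D) \sin N\theta - \sin(N-1)\theta\right] & (D^2 + AD - 1) \sin N\theta - D \sin(N-1)\theta \end{bmatrix}$$
and then
$$\frac{1}{\sin\theta} \begin{bmatrix} A\left[(A + D) \sin N\theta - \sin(N-1)\theta\right] - \sin N\theta & B\left[(A + D) \sin N\theta - \sin(N-1)\theta\right] \\ C\left[(A + D) \sin N\theta - \sin(N-1)\theta\right] & D\left[(A + D) \sin N\theta - \sin(N-1)\theta\right] - \sin N\theta \end{bmatrix}$$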
In each matrix element, the expression
$$(A + D) \sin N\theta = 2 \cos\theta \sin N\theta = \sin(N+1)\theta + \sin(N-1)\theta \tag{0.54}$$
occurs, which we have rearranged using $\cos\theta = \frac{1}{2}(A + D)$ while twice invoking
(0.15). The result is
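$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{N+1} = \frac{1}{\sin\theta} \begin{bmatrix} A \sin(N+1)\theta - \sin N\theta & B \sin(N+1)\theta \\ C \sin(N+1)\theta & D \sin(N+1)\theta - \sin N\theta \end{bmatrix}$$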
which completes the proof.
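Sylvester’s theorem can be checked against brute-force matrix multiplication. The sketch below is only an illustration: the matrix entries are arbitrary, $C$ is chosen so that $AD - BC = 1$, and the code assumes $|A + D| < 2$ so that $\theta$ from (0.53) is real. All function names are hypothetical.

```python
import math

def matmul(m1, m2):
    """Product of two 2x2 matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def power_by_multiplication(m, n):
    """Raise m to the n-th power by repeated multiplication."""
    result = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(n):
        result = matmul(result, m)
    return result

def power_by_sylvester(m, n):
    """Raise a unit-determinant matrix to the n-th power using (0.52)."""
    (A, B), (C, D) = m
    theta = math.acos((A + D) / 2)   # assumes |A + D| < 2
    s = math.sin(theta)
    sn, sn1 = math.sin(n * theta), math.sin((n - 1) * theta)
    return (((A * sn - sn1) / s, B * sn / s), (C * sn / s, (D * sn - sn1) / s))

A, B, D = 0.6, 0.5, 0.8
C = (A * D - 1) / B                  # chosen so that AD - BC = 1
M = ((A, B), (C, D))
print(power_by_multiplication(M, 7))
print(power_by_sylvester(M, 7))
```

For large $N$ the closed form costs the same as for small $N$, which is the practical appeal of the theorem when modeling many identical layers or cavity round trips.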

Appendix 0.A Table of Integrals and Sums

The following formulas are useful for various problems encountered in
the text.
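$$\int_{-\infty}^{\infty} e^{-ax^2 + bx + c}\, dx = \sqrt{\frac{\pi}{a}}\; e^{\frac{b^2}{4a} + c} \qquad \mathrm{Re}\{a\} > 0 \tag{0.55}$$
$$\int_{0}^{\infty} \frac{e^{iax}}{1 + x^2/b^2}\, dx = \frac{\pi |b|}{2}\, e^{-|ab|} \qquad b > 0 \tag{0.56}$$
$$\int_{0}^{2\pi} e^{\pm i a \cos(\theta - \theta')}\, d\theta = 2\pi J_0(a) \tag{0.57}$$
$$\int_{0}^{a} J_0(bx)\, x\, dx = \frac{a}{b} J_1(ab) \tag{0.58}$$
$$\int_{0}^{\infty} e^{-ax^2} J_0(bx)\, x\, dx = \frac{e^{-b^2/4a}}{2a} \tag{0.59}$$
$$\int_{0}^{\infty} \frac{\sin^2(ax)}{(ax)^2}\, dx = \frac{\pi}{2a} \tag{0.60}$$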
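$$\int_{0}^{\pi} \sin(ax) \sin(bx)\, dx = \int_{0}^{\pi} \cos(ax) \cos(bx)\, dx = \frac{\pi}{2}\, \delta_{ab} \qquad (a, b \text{ integer}) \tag{0.61}$$
$$\sum_{n=1}^{N} a r^{n-1} = a\, \frac{1 - r^N}{1 - r} \tag{0.62}$$
$$\sum_{n=1}^{\infty} a r^{n-1} = \frac{a}{1 - r} \qquad (|r| < 1) \tag{0.63}$$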
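Since (0.55) is used repeatedly in the text, a numerical spot check may be reassuring. The sketch below compares the formula with a midpoint-rule integral. It is an illustration only: it assumes real $a > 0$ (complex $b$ and $c$ are allowed), a finite integration window, and hypothetical function names and parameter values.

```python
import cmath
import math

def gaussian_integral_numeric(a, b, c, half_width=15.0, steps=6000):
    """Midpoint-rule estimate of the integral of exp(-a x^2 + b x + c) over a finite window."""
    dx = 2 * half_width / steps
    total = 0j
    for k in range(steps):
        x = -half_width + (k + 0.5) * dx
        total += cmath.exp(-a * x * x + b * x + c)
    return total * dx

def gaussian_integral_formula(a, b, c):
    """Right-hand side of (0.55), here restricted to real a > 0."""
    return math.sqrt(math.pi / a) * cmath.exp(b * b / (4 * a) + c)

a, b, c = 0.5, 0.3 + 2j, -0.1   # illustrative values only
print(gaussian_integral_numeric(a, b, c))
print(gaussian_integral_formula(a, b, c))
```

A complex $b$ is exactly the situation in Example 0.6, where the linear term in the exponent carries the factor $i(\omega - \omega_0)$.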

Exercises
Exercises for 0.1 Vector Calculus
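P0.1 Let $\mathbf{r} = (\hat{\mathbf{x}} + 2\hat{\mathbf{y}} - 3\hat{\mathbf{z}})$ m and $\mathbf{r}' = (-\hat{\mathbf{x}} + 3\hat{\mathbf{y}} + 2\hat{\mathbf{z}})$ m.
(a) Find the magnitude of $\mathbf{r}$.
(b) Find $\mathbf{r} - \mathbf{r}'$.
(c) Find the angle between $\mathbf{r}$ and $\mathbf{r}'$.
Answer: (a) $r = \sqrt{14}$ m; (c) 94°.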
P0.2 Prove that the cross product between two vectors is the product of
the magnitudes of the two vectors multiplied by the sine of the angle
between them. The result is a vector directed perpendicular to the
plane containing the original two vectors in accordance with the right
hand rule.
P0.3 Verify the “BAC-CAB” rule: A × (B × C) = B (A · C) − C (A · B).
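P0.4 Prove the following identity:
$$\nabla_{\mathbf{r}} \frac{1}{|\mathbf{r} - \mathbf{r}'|} = -\frac{(\mathbf{r} - \mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|^3},$$
where $\nabla_{\mathbf{r}}$ operates only on $\mathbf{r}$, treating $\mathbf{r}'$ as a constant vector.
P0.5 Prove that $\nabla_{\mathbf{r}} \cdot \frac{(\mathbf{r} - \mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|^3}$ is zero, except at $\mathbf{r} = \mathbf{r}'$ where a singularity situation
occurs.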
P0.6 Verify ∇ · (∇ × f) = 0 for any vector function f.
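P0.7 Verify $\nabla \times (\mathbf{f} \times \mathbf{g}) = \mathbf{f}(\nabla \cdot \mathbf{g}) - \mathbf{g}(\nabla \cdot \mathbf{f}) + (\mathbf{g} \cdot \nabla)\mathbf{f} - (\mathbf{f} \cdot \nabla)\mathbf{g}$.
P0.8 Verify $\nabla \cdot (\mathbf{f} \times \mathbf{g}) = \mathbf{g} \cdot (\nabla \times \mathbf{f}) - \mathbf{f} \cdot (\nabla \times \mathbf{g})$.
P0.9 Verify $\nabla \cdot (g\mathbf{f}) = \mathbf{f} \cdot \nabla g + g \nabla \cdot \mathbf{f}$.
P0.10 Verify $\nabla \times (g\mathbf{f}) = (\nabla g) \times \mathbf{f} + g \nabla \times \mathbf{f}$.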
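P0.11 Show that the Laplacian in cylindrical coordinates can be written as
$$\nabla^2 = \frac{1}{\rho} \frac{\partial}{\partial\rho} \left( \rho \frac{\partial}{\partial\rho} \right) + \frac{1}{\rho^2} \frac{\partial^2}{\partial\phi^2} + \frac{\partial^2}{\partial z^2}.$$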
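Solution: Continuing with the approach in Example 0.2, we have
$$\begin{aligned}
\frac{\partial^2 f}{\partial x^2} &= \frac{\partial^2\rho}{\partial x^2} \frac{\partial f}{\partial\rho} + \frac{\partial\rho}{\partial x} \frac{\partial}{\partial\rho} \left( \frac{\partial f}{\partial x} \right) + \frac{\partial^2\phi}{\partial x^2} \frac{\partial f}{\partial\phi} + \frac{\partial\phi}{\partial x} \frac{\partial}{\partial\phi} \left( \frac{\partial f}{\partial x} \right) \\
&= \frac{\partial^2\rho}{\partial x^2} \frac{\partial f}{\partial\rho} + \frac{\partial\rho}{\partial x} \frac{\partial}{\partial\rho} \left[ \left( \frac{\partial\rho}{\partial x} \right) \frac{\partial f}{\partial\rho} + \left( \frac{\partial\phi}{\partial x} \right) \frac{\partial f}{\partial\phi} \right] + \frac{\partial^2\phi}{\partial x^2} \frac{\partial f}{\partial\phi} + \frac{\partial\phi}{\partial x} \frac{\partial}{\partial\phi} \left[ \left( \frac{\partial\rho}{\partial x} \right) \frac{\partial f}{\partial\rho} + \left( \frac{\partial\phi}{\partial x} \right) \frac{\partial f}{\partial\phi} \right]
\end{aligned}$$
and
$$\begin{aligned}
\nabla^2 f &= \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} \\
&= \left( \frac{\partial^2\rho}{\partial x^2} + \frac{\partial^2\rho}{\partial y^2} \right) \frac{\partial f}{\partial\rho} + \left( \left( \frac{\partial\rho}{\partial x} \right)^2 + \left( \frac{\partial\rho}{\partial y} \right)^2 \right) \frac{\partial^2 f}{\partial\rho^2} + 2 \left[ \left( \frac{\partial\phi}{\partial x} \right) \left( \frac{\partial\rho}{\partial x} \right) + \left( \frac{\partial\phi}{\partial y} \right) \left( \frac{\partial\rho}{\partial y} \right) \right] \frac{\partial^2 f}{\partial\phi\, \partial\rho} \\
&\quad + \left[ \left( \frac{\partial^2\phi}{\partial x^2} \right) + \left( \frac{\partial^2\phi}{\partial y^2} \right) \right] \frac{\partial f}{\partial\phi} + \left[ \left( \frac{\partial\phi}{\partial x} \right)^2 + \left( \frac{\partial\phi}{\partial y} \right)^2 \right] \frac{\partial^2 f}{\partial\phi^2} + \frac{\partial^2 f}{\partial z^2}
\end{aligned}$$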
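The needed first derivatives are given in Example 0.2. The needed second derivatives are
$$\begin{aligned}
\frac{\partial^2\rho}{\partial x^2} &= \frac{1}{\sqrt{x^2 + y^2}} - \frac{x^2}{(x^2 + y^2)^{3/2}} = \frac{\sin^2\phi}{\rho} \\
\frac{\partial^2\phi}{\partial x^2} &= \frac{2xy}{(x^2 + y^2)^2} = \frac{2 \sin\phi \cos\phi}{\rho^2} \\
\frac{\partial^2\rho}{\partial y^2} &= \frac{1}{\sqrt{x^2 + y^2}} - \frac{y^2}{(x^2 + y^2)^{3/2}} = \frac{\cos^2\phi}{\rho} \\
\frac{\partial^2\phi}{\partial y^2} &= -\frac{2xy}{(x^2 + y^2)^2} = -\frac{2 \sin\phi \cos\phi}{\rho^2}
\end{aligned}$$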
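Substitution of the derivatives into the above expression yields
$$\begin{aligned}
\nabla^2 f &= \left( \frac{\sin^2\phi}{\rho} + \frac{\cos^2\phi}{\rho} \right) \frac{\partial f}{\partial\rho} + \left( (\cos\phi)^2 + (\sin\phi)^2 \right) \frac{\partial^2 f}{\partial\rho^2} \\
&\quad + 2 \left[ \left( -\frac{\sin\phi}{\rho} \right) (\cos\phi) + \left( \frac{\cos\phi}{\rho} \right) (\sin\phi) \right] \frac{\partial^2 f}{\partial\phi\, \partial\rho} + \left( \frac{2 \sin\phi \cos\phi}{\rho^2} - \frac{2 \sin\phi \cos\phi}{\rho^2} \right) \frac{\partial f}{\partial\phi} \\
&\quad + \left[ \left( -\frac{\sin\phi}{\rho} \right)^2 + \left( \frac{\cos\phi}{\rho} \right)^2 \right] \frac{\partial^2 f}{\partial\phi^2} + \frac{\partial^2 f}{\partial z^2} \\
&= \frac{1}{\rho} \frac{\partial f}{\partial\rho} + \frac{\partial^2 f}{\partial\rho^2} + \frac{1}{\rho^2} \frac{\partial^2 f}{\partial\phi^2} + \frac{\partial^2 f}{\partial z^2} \\
&= \frac{1}{\rho} \frac{\partial}{\partial\rho} \left( \rho \frac{\partial f}{\partial\rho} \right) + \frac{1}{\rho^2} \frac{\partial^2 f}{\partial\phi^2} + \frac{\partial^2 f}{\partial z^2}
\end{aligned}$$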
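The cylindrical form of the Laplacian can be sanity-checked with finite differences. The sketch below is an illustration under assumptions: the test function $f = x^3 y + y z^2$ (whose Cartesian Laplacian is $6xy + 2y$), the evaluation point, the step size, and the function names are all arbitrary choices.

```python
import math

def f_cyl(rho, phi, z):
    # Test function f = x^3 y + y z^2, evaluated from cylindrical coordinates.
    x, y = rho * math.cos(phi), rho * math.sin(phi)
    return x**3 * y + y * z**2

def laplacian_cylindrical(f, rho, phi, z, h=1e-3):
    """Central-difference estimate of
    (1/rho) d/drho (rho df/drho) + (1/rho^2) d2f/dphi2 + d2f/dz2,
    written in the expanded form (1/rho) df/drho + d2f/drho2 + ..."""
    f0 = f(rho, phi, z)
    d_rho = (f(rho + h, phi, z) - f(rho - h, phi, z)) / (2 * h)
    d2_rho = (f(rho + h, phi, z) - 2 * f0 + f(rho - h, phi, z)) / h**2
    d2_phi = (f(rho, phi + h, z) - 2 * f0 + f(rho, phi - h, z)) / h**2
    d2_z = (f(rho, phi, z + h) - 2 * f0 + f(rho, phi, z - h)) / h**2
    return d_rho / rho + d2_rho + d2_phi / rho**2 + d2_z

rho, phi, z = 1.3, 0.7, -0.4
x, y = rho * math.cos(phi), rho * math.sin(phi)
exact = 6 * x * y + 2 * y   # Cartesian Laplacian of the test function
print(laplacian_cylindrical(f_cyl, rho, phi, z), exact)
```

Agreement at one point is not a proof, but a mismatch would quickly expose a sign or factor error in the derivation.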
P0.12 Verify Stokes’ theorem (0.12) for the function given in Example 0.3.
Take the surface to be a square in the xy-plane bounded by $|x| = 1$
and $|y| = 1$.
P0.13 Use the divergence theorem to show that the function in P0.5 is 4π
times the three-dimensional delta function.
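Solution: We have by the divergence theorem
$$\oint_S \frac{(\mathbf{r} - \mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|^3} \cdot \hat{\mathbf{n}}\, da = \int_V \nabla_{\mathbf{r}} \cdot \frac{(\mathbf{r} - \mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|^3}\, dv$$
From P0.5, the argument in the integral on the right-hand side is zero except at $\mathbf{r} = \mathbf{r}'$. Therefore,
if the volume $V$ does not contain the point $\mathbf{r} = \mathbf{r}'$, then the result of both integrals must be zero.
Let us construct a volume between an arbitrary surface $S_1$ containing $\mathbf{r} = \mathbf{r}'$ and $S_2$, the surface
of a tiny sphere centered on $\mathbf{r} = \mathbf{r}'$. Since the point $\mathbf{r} = \mathbf{r}'$ is excluded by the tiny sphere, the result
of either integral in the divergence theorem is still zero. However, we have on the tiny sphere
$$\oint_{S_2} \frac{(\mathbf{r} - \mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|^3} \cdot \hat{\mathbf{n}}\, da = -\int_0^{2\pi} \int_0^{\pi} \left( \frac{1}{r_\epsilon^2} \right) r_\epsilon^2 \sin\phi\, d\phi\, d\alpha = -4\pi$$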
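Therefore, for the outer surface $S_1$ (containing $\mathbf{r} = \mathbf{r}'$) we must have the equal and opposite
result:
$$\oint_{S_1} \frac{(\mathbf{r} - \mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|^3} \cdot \hat{\mathbf{n}}\, da = 4\pi$$
This implies
$$\int_V \nabla_{\mathbf{r}} \cdot \frac{(\mathbf{r} - \mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|^3}\, dv = \begin{cases} 4\pi & \text{if } V \text{ contains } \mathbf{r}' \\ 0 & \text{otherwise} \end{cases}$$
The argument of this integral exhibits the same characteristics as the delta function
$\delta^3(\mathbf{r}' - \mathbf{r}) \equiv \delta(x' - x)\, \delta(y' - y)\, \delta(z' - z)$. Namely,
$$\int_V \delta^3(\mathbf{r}' - \mathbf{r})\, dv = \begin{cases} 1 & \text{if } V \text{ contains } \mathbf{r}' \\ 0 & \text{otherwise} \end{cases}$$
Therefore, $\nabla_{\mathbf{r}} \cdot \frac{(\mathbf{r} - \mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|^3} = 4\pi \delta^3(\mathbf{r} - \mathbf{r}')$. The delta function is defined in (0.43).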
Exercises for 0.2 Complex Numbers
P0.14 Without using a calculator, compute z 1 − z 2 and z 1 /z 2 in both rectan-

gular and polar form for z 1 = 1 − i and z 2 = 3 + 4i .
P0.15 Show that
$$\frac{a - ib}{a + ib} = e^{-2i \tan^{-1}\frac{b}{a}}$$
regardless of the sign of $a$, assuming $a$ and $b$ are real.
P0.16 Invert (0.16) to get both formulas in (0.19).
P0.17 Show Re {A} × Re {B } = (AB + A ∗ B ) /4 +C .C .
P0.18 If $E_o = |E_o| e^{i\delta_E}$ and $B_o = |B_o| e^{i\delta_B}$, and if $k$, $z$, $\omega$, and $t$ are all real, prove
$$\mathrm{Re}\left\{ E_o e^{i(kz - \omega t)} \right\} \mathrm{Re}\left\{ B_o e^{i(kz - \omega t)} \right\} = \frac{1}{4} \left( E_o^* B_o + E_o B_o^* \right) + \frac{1}{2} |E_o| |B_o| \cos\left[ 2(kz - \omega t) + \delta_E + \delta_B \right]$$
P0.19 (a) If $\sin\phi = 2$, show that $\cos\phi = i\sqrt{3}$. HINT: Use $\sin^2\phi + \cos^2\phi = 1$.
(b) Show that the angle $\phi$ in (a) is $\pi/2 - i \ln(2 + \sqrt{3})$.
P0.20 Write $A\cos(\omega t) + 2A\sin(\omega t + \pi/4)$ as a simple phase-shifted cosine wave
(i.e. find the amplitude and phase of the resultant cosine wave).

Exercises for 0.3 Fourier Theory
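P0.21 Prove linear superposition of Fourier Transforms:
$$\mathcal{F}\{a g(t) + b h(t)\} = a g(\omega) + b h(\omega)$$
where $g(\omega) \equiv \mathcal{F}\{g(t)\}$ and $h(\omega) \equiv \mathcal{F}\{h(t)\}$.
P0.22 Prove $\mathcal{F}\{g(at)\} = \frac{1}{|a|}\, g\!\left(\frac{\omega}{a}\right)$.
P0.23 Prove $\mathcal{F}\{g(t - \tau)\} = g(\omega)\, e^{i\omega\tau}$.
P0.24 Show that the Fourier transform of $E(t) = E_0 e^{-(t/T)^2} \cos\omega_0 t$ is
$$E(\omega) = \frac{T E_0}{2\sqrt{2}} \left( e^{-\frac{(\omega + \omega_0)^2}{4/T^2}} + e^{-\frac{(\omega - \omega_0)^2}{4/T^2}} \right)$$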
P0.25 Take the inverse Fourier transform of the result in P0.24. Check that it
returns exactly the original function.
P0.26 The following operation is referred to as the convolution of the func-
tions $g(t)$ and $h(t)$:
$$g(t) \otimes h(t)\big|_{\tau} \equiv \int_{-\infty}^{\infty} g(t)\, h(\tau - t)\, dt$$
A convolution measures the overlap of $g(t)$ and a reversed $h(t)$ as a
function of the offset $\tau$. The result is a function of $\tau$.
(a) Prove the convolution theorem:
$$\mathcal{F}\left\{ g(t) \otimes h(t)\big|_{\tau} \right\}\Big|_{\omega} = \sqrt{2\pi}\, g(\omega)\, h(\omega)$$
(b) Prove this related form of the convolution theorem:
$$\mathcal{F}\left\{ g(t)\, h(t) \right\}\Big|_{\bar\omega} = \frac{1}{\sqrt{2\pi}}\, g(\omega) \otimes h(\omega)\big|_{\bar\omega}$$
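Solution: Part (a)
$$\begin{aligned}
\mathcal{F}\left\{ \int_{-\infty}^{\infty} g(t)\, h(\tau - t)\, dt \right\}\Bigg|_{\omega} &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} g(t)\, h(\tau - t)\, dt \right] e^{i\omega\tau}\, d\tau \qquad (\text{Let } \tau = t' + t) \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(t)\, h(t')\, e^{i\omega(t' + t)}\, dt\, dt' \\
&= \sqrt{2\pi}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\, e^{i\omega t}\, dt\; \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} h(t')\, e^{i\omega t'}\, dt' \\
&= \sqrt{2\pi}\, g(\omega)\, h(\omega)
\end{aligned}$$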

P0.27 Prove the autocorrelation theorem:
$$\mathcal{F}\left\{ \int_{-\infty}^{\infty} h(t)\, h^*(t - \tau)\, dt \right\} = \sqrt{2\pi}\, |h(\omega)|^2$$
P0.28 (a) Compute the Fourier transform of a Gaussian function, $f_1(t) = e^{-t^2/2T^2}$.
Do the integral by hand using the table in Appendix 0.A.
(b) Compute the Fourier transform of a sine function, $f_2(t) = \sin\omega_0 t$.
Don’t use a computer to do the integral; use the fact that
$\sin(x) = \frac{1}{2i}\left(e^{ix} - e^{-ix}\right)$, combined with the integral formula (0.45).
(c) Use your results to parts (a) and (b) and a convolution theorem from
P0.26 to evaluate the Fourier transform of $g(t) = e^{-t^2/2T^2} \sin\omega_0 t$. (The
answer should be similar to P0.24).
(d) Plot $g(t)$ and the imaginary part of its Fourier transform for the
parameters $\omega_0 = 1$ and $T = 8$.

Chapter 1
Electromagnetic Phenomena
In the 1860s, James Maxwell assembled the various known relationships of elec-
tricity and magnetism into a concise¹ set of equations:
$$\nabla \cdot \mathbf{E} = \frac{\rho}{\epsilon_0} \qquad \text{(Gauss’s Law)} \tag{1.1}$$
$$\nabla \cdot \mathbf{B} = 0 \qquad \text{(Gauss’s Law for magnetism)} \tag{1.2}$$
$$\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} \qquad \text{(Faraday’s Law)} \tag{1.3}$$
$$\nabla \times \frac{\mathbf{B}}{\mu_0} = \epsilon_0 \frac{\partial \mathbf{E}}{\partial t} + \mathbf{J} \qquad \text{(Ampere’s Law revised by Maxwell)} \tag{1.4}$$
Here $\mathbf{E}$ and $\mathbf{B}$ represent electric and magnetic fields, respectively. The charge
density $\rho$ describes the charge per volume distributed through space.² The current
density $\mathbf{J}$ describes the motion of charge density (in units of $\rho$ times velocity). The
constant $\epsilon_0$ is called the permittivity, and the constant $\mu_0$ is called the permeability.
Taken together, these are known as Maxwell’s equations.
After introducing a key revision into Ampere’s law, Maxwell realized that to-
gether these equations comprise a complete self-consistent theory of electromag-
netic phenomena. Moreover, the equations imply the existence of electromag-
netic waves, which travel at the speed of light. Since the speed of light had been
measured before Maxwell’s time, it was immediately apparent (as was already
suspected) that light is a high-frequency manifestation of the same phenomena
that govern the influence of currents and charges upon each other. Previously,
optics was considered to be a topic quite separate from electricity and magnetism.
Once the connection was made, it became clear that Maxwell’s equations form
the theoretical foundations of optics, and this is where we begin our study of light.
In this chapter, we review the physical principles associated with each of
Maxwell’s equations and illustrate the connection between electromagnetic phe-
nomena and light. While many of the details discussed in this chapter (e.g. static
fields and magnetic effects) are not directly used in later chapters, they are in-
cluded to better appreciate the basic physics that Maxwell’s equations describe. It
may be helpful to study the vector calculus review in section 0.1 before beginning
this chapter.
1 In Maxwell’s original notation, this set of equations was hardly concise, written without the
convenience of modern vector notation or ∇. His formulation wouldn’t fit easily on a T-shirt!
2 Later in the book we use ρ for the radius in cylindrical coordinates, not to be confused with
charge density.
1.1 Gauss’ Law

The force on a point charge $q$ located at $\mathbf{r}$ exerted by another point charge $q'$
located at $\mathbf{r}'$ is
$$\mathbf{F} = q\mathbf{E}(\mathbf{r}) \tag{1.5}$$
where
$$\mathbf{E}(\mathbf{r}) = \frac{q'}{4\pi\epsilon_0} \frac{(\mathbf{r} - \mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|^3} \tag{1.6}$$
This relationship is known as Coulomb’s law. The force is directed along the
vector $\mathbf{r} - \mathbf{r}'$, which points from charge $q'$ to $q$ as seen in Fig. 1.1. The length or
magnitude of this vector is given by $|\mathbf{r} - \mathbf{r}'|$ (i.e. the distance between $q'$ and $q$).
The familiar inverse square law can be seen by noting that $(\mathbf{r} - \mathbf{r}')/|\mathbf{r} - \mathbf{r}'|$ is a unit
vector. We have written the force in terms of an electric field $\mathbf{E}(\mathbf{r})$, which is defined
throughout space (regardless of whether a second charge $q$ is actually present).
The permittivity $\epsilon_0$ amounts to a proportionality constant.
Figure 1.1 The geometry of Coulomb’s law for a point charge
The total force from a collection of charges is found by summing expression
(1.5) over all charges $q'_n$ associated with their specific locations $\mathbf{r}'_n$. If the charges
are distributed continuously throughout space, having density $\rho(\mathbf{r}')$ (units of
charge per volume), the summation for finding the net electric field at $\mathbf{r}$ becomes
an integral:
$$\mathbf{E}(\mathbf{r}) = \frac{1}{4\pi\epsilon_0} \int_V \rho(\mathbf{r}') \frac{(\mathbf{r} - \mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|^3}\, dv' \tag{1.7}$$
This three-dimensional integral³ gives the net electric field produced by the
charge density $\rho$ distributed throughout the volume $V$.
Figure 1.2 The geometry of Coulomb’s law for a charge distribution.
Gauss’ law (1.1), the first of Maxwell’s equations, follows directly from (1.7).
No new physical phenomenon is introduced. Gauss’ law is simply a mathematical
interpretation of Coulomb’s law.⁴
Derivation of Gauss’ law
We begin with the divergence of (1.7):
$$\nabla \cdot \mathbf{E}(\mathbf{r}) = \frac{1}{4\pi\epsilon_0} \int_V \rho(\mathbf{r}')\, \nabla_{\mathbf{r}} \cdot \frac{(\mathbf{r} - \mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|^3}\, dv' \tag{1.8}$$
3 Here $dv'$ stands for $dx'\, dy'\, dz'$ and $\mathbf{r}' = x'\hat{\mathbf{x}} + y'\hat{\mathbf{y}} + z'\hat{\mathbf{z}}$ (in Cartesian coordinates).
4 Coulomb’s law is incomplete since it implies an instantaneous response of the field to a recon-
figuration of the charge. The generalized version of Coulomb’s law, one of Jefimenko’s equations,
incorporates the fact that electromagnetic news travels at the speed of light. Ironically, Gauss’ law
derived from Coulomb’s law holds perfectly whether the charges remain still or are in motion.
The subscript on $\nabla_{\mathbf{r}}$ indicates that it operates on $\mathbf{r}$ while treating $\mathbf{r}'$, the dummy
variable of integration, as a constant. As messy as this integral appears, it contains a
remarkable mathematical property that can be exploited, even without specifying
the form of the charge distribution $\rho(\mathbf{r}')$. In modern mathematical language, the
vector expression in the integral is a three-dimensional delta function:
$$\nabla_{\mathbf{r}} \cdot \frac{(\mathbf{r} - \mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|^3} \equiv 4\pi\delta^3(\mathbf{r}' - \mathbf{r}) \equiv 4\pi\, \delta(x' - x)\, \delta(y' - y)\, \delta(z' - z) \tag{1.9}$$
A derivation of this formula and a description of its properties are addressed in
problem P0.13. The delta function allows the integral in (1.8) to be performed, and
the relation becomes simply
$$\nabla \cdot \mathbf{E}(\mathbf{r}) = \frac{\rho(\mathbf{r})}{\epsilon_0}$$
which is the differential form of Gauss’ law (1.1).
Carl Friedrich Gauss (1777–1855, German) was born in Braunschweig, Germany
to a poor family. Gauss was a child prodigy, and he made his first
significant advances to mathematics as a teenager. In grade school, he purportedly
was asked to add all integers from 1 to 100, which he did in seconds to the
astonishment of his teacher. (Presumably, Friedrich immediately realized that
the numbers form fifty pairs equal to 101.) Gauss made important advances
in number theory and differential geometry. He developed the law discussed here
as one of Maxwell’s equations in 1835, but it was not published until 1867, after
Gauss’ death. Ironically, Maxwell was already using Gauss’ law by that time.
Example 1.1
Suppose we have an electric field given by $\mathbf{E} = (\alpha x^2 y^3 \hat{\mathbf{x}} + \beta z^4 \hat{\mathbf{y}}) \cos\omega t$. Use Gauss’
law (1.1) to find the charge density $\rho(x, y, z, t)$.
Solution:
$$\rho = \epsilon_0 \nabla \cdot \mathbf{E} = \epsilon_0 \left( \hat{\mathbf{x}} \frac{\partial}{\partial x} + \hat{\mathbf{y}} \frac{\partial}{\partial y} + \hat{\mathbf{z}} \frac{\partial}{\partial z} \right) \cdot (\alpha x^2 y^3 \hat{\mathbf{x}} + \beta z^4 \hat{\mathbf{y}}) \cos\omega t = 2\epsilon_0 \alpha x y^3 \cos\omega t$$
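The charge density found in Example 1.1 can be compared with a finite-difference divergence of the same field. The sketch below is an illustration under assumptions: the values of $\alpha$, $\beta$, $\omega$, the evaluation point, and the function names are arbitrary choices made only for the check.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m

def E_field(x, y, z, t, alpha=1.0, beta=1.0, omega=1.0):
    """Field of Example 1.1: (alpha x^2 y^3, beta z^4, 0) cos(omega t)."""
    c = math.cos(omega * t)
    return (alpha * x**2 * y**3 * c, beta * z**4 * c, 0.0)

def divergence(field, x, y, z, t, h=1e-5):
    """Central-difference estimate of dEx/dx + dEy/dy + dEz/dz."""
    dEx = (field(x + h, y, z, t)[0] - field(x - h, y, z, t)[0]) / (2 * h)
    dEy = (field(x, y + h, z, t)[1] - field(x, y - h, z, t)[1]) / (2 * h)
    dEz = (field(x, y, z + h, t)[2] - field(x, y, z - h, t)[2]) / (2 * h)
    return dEx + dEy + dEz

x, y, z, t = 0.5, -1.2, 0.8, 0.3
rho_numeric = EPS0 * divergence(E_field, x, y, z, t)
rho_formula = 2 * EPS0 * 1.0 * x * y**3 * math.cos(1.0 * t)   # 2 eps0 alpha x y^3 cos(omega t)
print(rho_numeric, rho_formula)
```

Note that the $\beta z^4 \hat{\mathbf{y}}$ term contributes nothing, since it does not depend on $y$.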
The (perhaps more familiar) integral form of Gauss’ law can be obtained by
integrating (1.1) over a volume $V$ and applying the divergence theorem (0.11) to
the left-hand side:
$$\oint_S \mathbf{E}(\mathbf{r}) \cdot \hat{\mathbf{n}}\, da = \frac{1}{\epsilon_0} \int_V \rho(\mathbf{r})\, dv \tag{1.10}$$
This form of Gauss’ law shows that the total electric field flux extruding through a
closed surface S (i.e. the integral on the left side) is proportional to the net charge
contained within it (i.e. within volume V contained by S).
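The integral form (1.10) can be illustrated numerically for the simplest charge distribution, a single point charge with the Coulomb field (1.6). The sketch below is only a rough check under stated assumptions: the closed surface is a cube of side 2 centered on the origin, the flux is estimated with a midpoint rule on each face, and the charge value, positions, and function names are arbitrary illustrative choices.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m

def coulomb_field(r, q, r_src):
    """Electric field at r from a point charge q located at r_src, following (1.6)."""
    dx, dy, dz = r[0] - r_src[0], r[1] - r_src[1], r[2] - r_src[2]
    d3 = (dx * dx + dy * dy + dz * dz) ** 1.5
    k = q / (4 * math.pi * EPS0 * d3)
    return (k * dx, k * dy, k * dz)

def flux_through_cube(q, r_src, half=1.0, n=100):
    """Midpoint-rule estimate of the outward flux of E through the cube |x|,|y|,|z| <= half."""
    step = 2 * half / n
    da = step * step
    total = 0.0
    for axis in range(3):                 # face normal along x, y, or z
        for sign in (-1.0, 1.0):          # the two opposite faces
            for i in range(n):
                for j in range(n):
                    u = -half + (i + 0.5) * step
                    v = -half + (j + 0.5) * step
                    point = [u, v]
                    point.insert(axis, sign * half)
                    # Outward normal is sign * unit vector along `axis`.
                    total += sign * coulomb_field(point, q, r_src)[axis] * da
    return total

q = 1e-9
inside = flux_through_cube(q, (0.1, -0.2, 0.05))
outside = flux_through_cube(q, (3.0, 0.0, 0.0))
print(inside, q / EPS0, outside)
```

The flux for the enclosed charge should come out close to $q/\epsilon_0$ regardless of where inside the cube the charge sits, while the charge placed outside should give a flux near zero.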
Figure 1.3 Gauss’ law in integral form relates the flux of the electric field
through a surface to the charge contained inside that surface.
1.2 Gauss’ Law for Magnetic Fields

In order to ‘feel’ a magnetic force, a charge $q$ must be moving at some velocity (call
it $\mathbf{v}$). The magnetic field itself arises from charges that are in motion. We consider
the magnetic field to arise from a distribution of moving charges described by a
current density $\mathbf{J}(\mathbf{r}')$ throughout space. The current density has units of charge
times velocity per volume (or equivalently, current per cross sectional area). The
magnetic force law analogous to Coulomb’s law is
$$\mathbf{F} = q\mathbf{v} \times \mathbf{B} \tag{1.11}$$

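where
$$\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi} \int_V \mathbf{J}(\mathbf{r}') \times \frac{(\mathbf{r} - \mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|^3}\, dv' \tag{1.12}$$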
The latter equation is known as the Biot-Savart law. The permeability µ0 dictates
the strength of the magnetic field, given the current distribution.
As with Coulomb’s law, we can apply mathematics to the Biot-Savart law
to obtain another of Maxwell’s equations. Nevertheless, the essential physics
is already inherent in the Biot-Savart law.⁵ Using the result from P0.4, we can
rewrite (1.12) as⁶
$$\mathbf{B}(\mathbf{r}) = -\frac{\mu_0}{4\pi} \int_V \mathbf{J}(\mathbf{r}') \times \nabla_{\mathbf{r}} \frac{1}{|\mathbf{r} - \mathbf{r}'|}\, dv' = \frac{\mu_0}{4\pi} \nabla \times \int_V \frac{\mathbf{J}(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\, dv' \tag{1.13}$$
Since the divergence of a curl is identically zero (see P0.6), we get the second of
Maxwell’s equations (1.2)
$$\nabla \cdot \mathbf{B} = 0$$
which is known as Gauss’ law for magnetic fields. (Two equations down; two to
go.)
The similarity between $\nabla \cdot \mathbf{B} = 0$ and $\nabla \cdot \mathbf{E} = \rho/\epsilon_0$, Gauss’ law for electric fields,
is immediately apparent. In integral form, Gauss’ law for magnetic fields looks the
same as (1.10), only with zero on the right-hand side. If one were to imagine the
existence of magnetic monopoles (i.e. isolated north or south ‘charges’), then the
right-hand side would not be zero. The law implies that the total magnetic flux
extruding through any closed surface balances with as many field lines pointing
inwards as pointing outwards.
Jean-Baptiste Biot (1774-1862, French) was born in Paris. He attended
the École Polytechnique where mathematician Gaspard Monge recognized
his academic potential. After graduating, Biot joined the military and
then took part in an insurrection on the side of the Royalists. He was captured,
and his career might have met a tragic ending there had Monge
not successfully pleaded for his release from jail. Biot went on to become a
professor of physics at the College de France. Among other contributions,
Biot participated in the first hot-air balloon ride with Gay-Lussac and correctly
deduced that meteorites that fell on L’Aigle, France in 1803 came from
space. Later Biot collaborated with the younger Felix Savart (1791-1841)
on the theory of magnetism and electrical currents. They formulated
their famous law in 1820.
Example 1.2
The field surrounding a magnetic dipole is given by
B = β 3xz x̂ + 3y z ŷ + 3z 2 − r 2 ẑ /r 5
£ ¡ ¢ ¤
where r ≡ x 2 + y 2 + z 2 . Show that this field satisfies Gauss’ law for magnetic
p
fields (1.2).
5 Like Coulomb’s law, the Biot-Savart law is incomplete since it also implies an instantaneous
response of the magnetic field to a reconfiguration of the currents. The generalized version of the
Biot-Savart law, another of Jefimenko’s equations, incorporates the fact that electromagnetic news
travels at the speed of light. Ironically, Gauss’ law for magnetic fields and Maxwell’s version of
Ampere’s law, derived from the Biot-Savart law, hold perfectly whether the currents are steady or
vary in time. Jefimenko’s equations, analogs of Coulomb and Biot-Savart, also embody Faraday’s
law, the only one of Maxwell’s equations that cannot be derived from the usual forms of Coulomb’s law
and the Biot-Savart law (together with the continuity equation).
6 Note that $\nabla_{\mathbf{r}}$ ignores the variable of integration $\mathbf{r}'$.

Solution:
$$\nabla \cdot \mathbf{B} = \beta\left[\frac{\partial}{\partial x}\left(\frac{3xz}{r^5}\right) + \frac{\partial}{\partial y}\left(\frac{3yz}{r^5}\right) + \frac{\partial}{\partial z}\left(\frac{3z^2}{r^5} - \frac{1}{r^3}\right)\right]$$
$$= \beta\left[3\left(\frac{z}{r^5} - \frac{5xz}{r^6}\frac{\partial r}{\partial x}\right) + 3\left(\frac{z}{r^5} - \frac{5yz}{r^6}\frac{\partial r}{\partial y}\right) + \left(\frac{6z}{r^5} - \frac{15z^2}{r^6}\frac{\partial r}{\partial z}\right) + \frac{3}{r^4}\frac{\partial r}{\partial z}\right]$$
$$= \beta\left[\frac{12z}{r^5} - \frac{15z}{r^6}\left(x\frac{\partial r}{\partial x} + y\frac{\partial r}{\partial y} + z\frac{\partial r}{\partial z}\right) + \frac{3}{r^4}\frac{\partial r}{\partial z}\right]$$
The necessary derivatives are $\partial r/\partial x = x/\sqrt{x^2 + y^2 + z^2} = x/r$, ∂r/∂y = y/r, and
∂r/∂z = z/r, which lead to
$$\nabla \cdot \mathbf{B} = \beta\left[\frac{12z}{r^5} - \frac{15z}{r^5} + \frac{3z}{r^5}\right] = 0$$

1.3 Faraday’s Law

Michael Faraday discovered that changing magnetic fields induce electric fields.
This distinct physical effect, called induction, can be observed when a magnet is
waved around by a loop of wire. Faraday showed that a change in magnetic flux
through a circuit loop (see Fig. 1.4) induces an electromotive force around the
loop according to
$$\oint_C \mathbf{E} \cdot d\boldsymbol{\ell} = -\frac{\partial}{\partial t} \int_S \mathbf{B} \cdot \hat{\mathbf{n}}\, da \tag{1.14}$$
This relation, known as Faraday’s law, is the integral form of the next entry in
our list of Maxwell’s equations. The right side describes a change in the magnetic
flux through a surface and the left side describes the voltage around the loop
bounding that surface.

Michael Faraday (1791–1867, English) was one of the greatest experimental physicists in history. Born on the outskirts of London, his family was not well off, his father being a blacksmith. The young Michael Faraday only had access to a very basic education, and so he was mostly self-taught and never did acquire much skill in mathematics. As a teenager, he obtained a seven-year apprenticeship with a book binder, during which time he read many books, including books on science and electricity. Given his background, Faraday’s entry into the scientific community was very gradual, from servant to assistant and eventually to director of the laboratory at the Royal Institution. Faraday is perhaps best known for his work that established the law of induction and for the discovery that magnetic fields can interact with light, known as the Faraday effect. He also made many advances to chemistry during his career including figuring out how to liquefy several gases. Faraday was a deeply religious man, serving as a Deacon in his church.

To obtain the differential form of Faraday’s law, we apply Stokes’ theorem to
the left-hand side and obtain
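$$\int_S \nabla \times \mathbf{E} \cdot \hat{\mathbf{n}}\, da = -\frac{\partial}{\partial t} \int_S \mathbf{B} \cdot \hat{\mathbf{n}}\, da \quad \text{or} \quad \int_S \left(\nabla \times \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t}\right) \cdot \hat{\mathbf{n}}\, da = 0 \tag{1.15}$$
Since this equation is true regardless of what surface we choose, it implies
$$\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}$$
the differential form of Faraday’s law (1.3) (three of Maxwell’s equations down;
one to go).
Figure 1.4 Faraday’s law.
Example 1.3
For the electric field given in Example 1.1, $\mathbf{E} = (\alpha x^2 y^3\,\hat{\mathbf{x}} + \beta z^4\,\hat{\mathbf{y}}) \cos\omega t$, use Faraday’s
law (1.3) to find B(x, y, z, t).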

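Solution:
$$\frac{\partial \mathbf{B}}{\partial t} = -\nabla \times \mathbf{E} = -\cos\omega t \begin{vmatrix} \hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ \alpha x^2 y^3 & \beta z^4 & 0 \end{vmatrix}$$
$$= -\cos\omega t \left[\hat{\mathbf{x}}\frac{\partial}{\partial y}(0) - \hat{\mathbf{x}}\frac{\partial}{\partial z}\left(\beta z^4\right) - \hat{\mathbf{y}}\frac{\partial}{\partial x}(0) + \hat{\mathbf{y}}\frac{\partial}{\partial z}\left(\alpha x^2 y^3\right) + \hat{\mathbf{z}}\frac{\partial}{\partial x}\left(\beta z^4\right) - \hat{\mathbf{z}}\frac{\partial}{\partial y}\left(\alpha x^2 y^3\right)\right]$$
$$= \left(4\beta z^3\,\hat{\mathbf{x}} + 3\alpha x^2 y^2\,\hat{\mathbf{z}}\right) \cos\omega t$$
Integrating in time, we get
$$\mathbf{B} = \left(4\beta z^3\,\hat{\mathbf{x}} + 3\alpha x^2 y^2\,\hat{\mathbf{z}}\right) \frac{\sin\omega t}{\omega}$$
plus possibly a constant field.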
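The result of Example 1.3 can be spot-checked numerically. The following is a minimal sketch, not part of the original text: it evaluates ∇ × E + ∂B/∂t by central finite differences at one point and confirms that it is close to zero. The values of α, β, and ω, the step size, and the helper names `partial`, `curl`, and `faraday_residual` are assumptions made only for this illustration.

```python
import math

# Hypothetical constants chosen only for this check (not from the text).
ALPHA, BETA, OMEGA = 1.3, 0.7, 2.0

def E(x, y, z, t):
    """Electric field of Example 1.1 as (Ex, Ey, Ez)."""
    c = math.cos(OMEGA * t)
    return (ALPHA * x**2 * y**3 * c, BETA * z**4 * c, 0.0)

def B(x, y, z, t):
    """Magnetic field found in Example 1.3 (constant field set to zero)."""
    s = math.sin(OMEGA * t) / OMEGA
    return (4 * BETA * z**3 * s, 0.0, 3 * ALPHA * x**2 * y**2 * s)

def partial(f, comp, var, p, h=1e-5):
    """Central difference of component `comp` of f with respect to p[var]."""
    hi, lo = list(p), list(p)
    hi[var] += h
    lo[var] -= h
    return (f(*hi)[comp] - f(*lo)[comp]) / (2 * h)

def curl(f, p):
    """Curl of the vector field f at the point p = (x, y, z, t)."""
    return (partial(f, 2, 1, p) - partial(f, 1, 2, p),
            partial(f, 0, 2, p) - partial(f, 2, 0, p),
            partial(f, 1, 0, p) - partial(f, 0, 1, p))

def faraday_residual(p):
    """Largest component of curl E + dB/dt at p; Faraday's law says ~0."""
    c = curl(E, p)
    return max(abs(c[i] + partial(B, i, 3, p)) for i in range(3))

print(faraday_residual((0.4, -0.8, 1.1, 0.3)))
```

The printed residual should be tiny compared with the field values themselves; it is limited only by the finite-difference step.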
1.4 Ampere’s Law

The Biot-Savart law (1.12) can also be used to obtain another of Maxwell’s equa-
tions: Ampere’s law. Ampere’s law is merely the inversion of the Biot-Savart law
(1.12) so that J appears by itself, unfettered by integrals or the like. This is ac-
complished through mathematics, so again no new physical phenomenon is
introduced, only a new interpretation.
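André-Marie Ampère (1775-1836, French) was a physicist and mathematician who did pioneering work in electromagnetism.

Inversion of Biot-Savart Law
We take the curl of (1.12):
$$\nabla \times \mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi} \int_V \nabla_{\mathbf{r}} \times \left[\mathbf{J}(\mathbf{r}') \times \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3}\right] dv' \tag{1.16}$$
We next apply the differential vector rule from P0.7 while noting that $\mathbf{J}(\mathbf{r}')$ does not
depend on r so that only two terms survive. The curl of B (r) then becomes
$$\nabla \times \mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi} \int_V \left(\mathbf{J}(\mathbf{r}') \left[\nabla_{\mathbf{r}} \cdot \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3}\right] - \left[\mathbf{J}(\mathbf{r}') \cdot \nabla_{\mathbf{r}}\right] \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3}\right) dv' \tag{1.17}$$
According to (1.9), the first term in the integral is $4\pi \mathbf{J}(\mathbf{r}')\,\delta^3(\mathbf{r}' - \mathbf{r})$, which is easily
integrated. To make progress on the second term, we observe that the gradient can
be changed to operate on the primed variables without affecting the final result
(i.e. $\nabla_{\mathbf{r}} \to -\nabla_{\mathbf{r}'}$). In addition, we take advantage of the vector integral theorem
(0.13) to arrive at
$$\nabla \times \mathbf{B}(\mathbf{r}) = \mu_0 \mathbf{J}(\mathbf{r}) - \frac{\mu_0}{4\pi} \int_V \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3} \left[\nabla_{\mathbf{r}'} \cdot \mathbf{J}(\mathbf{r}')\right] dv' + \frac{\mu_0}{4\pi} \oint_S \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3} \left[\mathbf{J}(\mathbf{r}') \cdot \hat{\mathbf{n}}\right] da' \tag{1.18}$$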

The last term in (1.18) vanishes if we assume that the current density J is completely
contained within the volume V so that it is zero at the surface S. Thus, the
expression for the curl of B (r) reduces to
$$\nabla \times \mathbf{B}(\mathbf{r}) = \mu_0 \mathbf{J}(\mathbf{r}) - \frac{\mu_0}{4\pi} \int_V \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3} \left[\nabla_{\mathbf{r}'} \cdot \mathbf{J}(\mathbf{r}')\right] dv' \tag{1.19}$$
The latter term in (1.19) vanishes if
$$\nabla \cdot \mathbf{J} \cong 0 \quad \text{(steady-state approximation)} \tag{1.20}$$
in which case we have succeeded in inverting the Biot-Savart law.
As originally formulated, Ampere’s law
∇ × B = µ0 J (1.21)
only applies to quasi steady-state situations, since the final term in (1.19) is
ignored. The approximation (1.20) is valid only if the charge distribution ρ does
not vary much in time. Keep in mind that the current J moves charge density ρ
from place to place, so in general this is not a good approximation (especially for
optical phenomena).
One can better appreciate the physical interpretation of Ampere’s law from
the integral form, obtained by integrating both sides of (1.21) over an open surface
S, bounded by contour C . Stokes’ theorem (0.12) is applied to the left-hand side
to get
$$\oint_C \mathbf{B}(\mathbf{r}) \cdot d\boldsymbol{\ell} = \mu_0 \int_S \mathbf{J}(\mathbf{r}) \cdot \hat{\mathbf{n}}\, da \equiv \mu_0 I \tag{1.22}$$
This law says that the line integral of B around a closed loop C is proportional to
the total current flowing through the loop (see Fig. 1.5). Recall that the units of J
are current per area, so the surface integral containing J yields the current I in
units of charge per time.
Figure 1.5 Ampere’s law.
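As a rough numerical illustration of how the Biot-Savart law (1.12) and the integral form of Ampere’s law (1.22) fit together, the sketch below computes the field of a long straight wire from the Biot-Savart integral and then sums B · dℓ around a circle centered on the wire. It assumes a thin wire on the z axis, so that J dv′ is replaced by I dz′ ẑ, and it truncates the wire at a finite length; the function names, wire length, and step counts are choices made only for this example.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space in T*m/A

def wire_field(x, y, current, half_length=200.0, n=20000):
    """Biot-Savart field (Bx, By) at (x, y, 0) of a thin wire on the z axis.

    Thin-wire assumption: J dv' becomes I dz' z_hat, so the integrand of
    (1.12) is z_hat x (r - r') / |r - r'|^3 = (-y, x, 0) / |r - r'|^3.
    """
    dz = 2 * half_length / n
    s2 = x * x + y * y
    total = 0.0
    for i in range(n):
        zp = -half_length + (i + 0.5) * dz  # midpoint rule along the wire
        total += dz / (s2 + zp * zp) ** 1.5
    pref = MU0 * current / (4 * math.pi) * total
    return (-y * pref, x * pref)

def loop_integral(radius, current, n_seg=36):
    """Approximate the closed line integral of B . dl on a circle around the wire."""
    total = 0.0
    dphi = 2 * math.pi / n_seg
    for j in range(n_seg):
        phi = (j + 0.5) * dphi
        x, y = radius * math.cos(phi), radius * math.sin(phi)
        bx, by = wire_field(x, y, current)
        # dl is tangent to the circle: radius * dphi * (-sin(phi), cos(phi))
        total += (-bx * math.sin(phi) + by * math.cos(phi)) * radius * dphi
    return total

I = 2.0  # amperes (arbitrary test value)
print(loop_integral(1.0, I) / (MU0 * I))  # ratio should be close to 1
```

The ratio falls slightly short of 1 because the wire is cut off at a finite length; lengthening the wire relative to the loop radius brings it closer.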
1.5 Maxwell’s Adjustment to Ampere’s Law

Maxwell was the first to realize that Ampere’s law is incomplete as written in (1.21)
since there exist situations where ∇ · J ≠ 0. Maxwell figured out that (1.20) should
be replaced with
$$\nabla \cdot \mathbf{J} = -\frac{\partial \rho}{\partial t} \tag{1.23}$$
This is called the continuity equation for charge and current densities. Simply
stated, if there is net current flowing into a volume there ought to be charge
piling up inside. For the steady-state situation inherently considered by Ampere,
the current into and out of a volume is balanced so that ∂ρ/∂t = 0.
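A one-dimensional toy model makes the continuity equation concrete. In the sketch below, which is an illustration under assumed values rather than anything from the text, a Gaussian blob of charge drifts rigidly at speed V, so that the current density is J = ρV. The sum ∂ρ/∂t + ∂J/∂x is then evaluated by central differences and should vanish.

```python
import math

V = 3.0       # drift velocity (arbitrary test value)
WIDTH = 0.5   # width of the charge blob (arbitrary test value)

def rho(x, t):
    """Charge density: a Gaussian blob drifting rigidly at speed V."""
    return math.exp(-((x - V * t) / WIDTH) ** 2)

def J(x, t):
    """x component of the current density carried by the drifting charge."""
    return rho(x, t) * V

def continuity_residual(x, t, h=1e-5):
    """d(rho)/dt + dJ/dx by central differences; (1.23) says this is zero."""
    drho_dt = (rho(x, t + h) - rho(x, t - h)) / (2 * h)
    dJ_dx = (J(x + h, t) - J(x - h, t)) / (2 * h)
    return drho_dt + dJ_dx

print(continuity_residual(0.2, 0.1))
```

At a point ahead of the blob, more current flows in than out and the local density rises; behind it the opposite happens. Either way the two terms cancel.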

Derivation of the Continuity Equation
Consider a volume of space enclosed by a surface S through which current is
flowing. The total current exiting the volume is
$$I = \oint_S \mathbf{J} \cdot \hat{\mathbf{n}}\, da \tag{1.24}$$
where n̂ is the outward normal to the surface. The units on this equation are those
of current, or charge per time, leaving the volume.
Since we have considered a closed surface S, the net current leaving the enclosed
volume V must be the same as the rate at which charge within the volume vanishes:
$$I = -\frac{\partial}{\partial t} \int_V \rho\, dv \tag{1.25}$$
Upon equating these two expressions for current, as well as applying the divergence
theorem (0.11) to the former, we get
$$\int_V \nabla \cdot \mathbf{J}\, dv = -\int_V \frac{\partial \rho}{\partial t}\, dv \quad \text{or} \quad \int_V \left(\nabla \cdot \mathbf{J} + \frac{\partial \rho}{\partial t}\right) dv = 0 \tag{1.26}$$
Since (1.26) is true regardless of which volume V we choose, it implies (1.23).

James Clerk Maxwell (1831–1879, Scottish) was born to a wealthy family in Edinburgh, Scotland. Originally, his name was John Clerk, but he added his mother’s maiden name when he inherited an estate from her family. Maxwell was a bright and inquisitive child and displayed an unusual gift for mathematics at an early age. He attended Edinburgh University and then Trinity College at Cambridge University. Maxwell started his career as a professor at Aberdeen University, but lost his job a few years later during restructuring, at which time Maxwell took a post at King’s College of London. Maxwell is best known for his fundamental contributions to electricity and magnetism and the kinetic theory of gases. He studied numerous other subjects, including the human perception of color and color-blindness, and is credited with producing the first color photograph. He originally postulated that electromagnetic waves propagated in a mechanical ‘luminiferous ether’. He founded the Cavendish laboratory at Cambridge in 1874, which has produced 28 Nobel prizes to date. Maxwell, one of Einstein’s heroes, died of stomach cancer in his forties.

Maxwell’s main contribution (aside from organizing other people’s formulas
and recognizing them as a complete set of differential equations, which was a big deal) was
the injection of the continuity equation (1.23) into the derivation of Ampere’s law
(1.19). This yields
$$\nabla \times \mathbf{B} = \mu_0 \mathbf{J} + \frac{\mu_0}{4\pi} \frac{\partial}{\partial t} \int_V \rho(\mathbf{r}') \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3}\, dv' \tag{1.27}$$
Then substitution of (1.7) into this formula gives
$$\nabla \times \frac{\mathbf{B}}{\mu_0} = \mathbf{J} + \epsilon_0 \frac{\partial \mathbf{E}}{\partial t}$$
the last of Maxwell’s equations (1.4).
This revised Ampere’s law includes the extra term ε0 ∂E/∂t, which is known as
the displacement current (density). The displacement current exists even in the
absence of any actual charge density ρ.7 It indicates that a changing electric field
behaves like a current in the sense that it produces magnetic fields. The similarity
between Faraday’s law (1.3) and the corrected Ampere’s law is apparent. No doubt
this played a part in motivating Maxwell’s work.
7 One might think that the displacement current ε0 ∂E/∂t ought to be zero in a region of space
with no charge density ρ. However, in (1.27) ρ appears in a volume integral over a region of space
sufficiently large to include any charges responsible for the field E; there can be no field without a
source.

In summary, in the previous section we saw that the basic physics in Ampere’s
law is present in the Biot-Savart law. Infusing it with charge conservation (1.23)
as well as Gauss’ law (1.10) yields the corrected form of Ampere’s law.
Example 1.4
(a) Use Gauss’s law to find the electric field in a gap that interrupts a current-carrying
wire, as shown in Fig. 1.6.
(b) Find the strength of the magnetic field on contour C using Ampere’s law applied
to surface S1.
(c) Show that the displacement current in the gap leads to the identical magnetic
field when using surface S2.
Figure 1.6 Charging capacitor.
Solution: (a) We’ll assume that the wire, with cross-sectional area A, is much wider
than the gap separation. Then the electric field in the gap will be uniform, and the
integral on the left-hand side of (1.10) reduces to EA since there is essentially
no field other than in the gap. If the accumulated charge on the ‘plate’ is Q, then
the right-hand side of (1.10) integrates to Q/ε0, and the electric field turns out
to be E = Q/(ε0 A).
(b) Let the contour C be a circle at radius r. The magnetic field points around the
circumference with constant strength. The left-hand side of (1.22) becomes 2πrB
while the right-hand side is
$$\mu_0 \int_S \mathbf{J} \cdot \hat{\mathbf{n}}\, da = \mu_0 I = \mu_0 \frac{\partial Q}{\partial t}$$
This gives for the magnetic field
$$B = \frac{\mu_0}{2\pi r} \frac{\partial Q}{\partial t}$$
(c) If instead we use the displacement current ε0 ∂E/∂t in place of J in the
right-hand side of (1.22), we get for that piece
$$\mu_0 \int_S \epsilon_0 \frac{\partial \mathbf{E}}{\partial t} \cdot \hat{\mathbf{n}}\, da = \mu_0 \epsilon_0 A \frac{\partial E}{\partial t} = \mu_0 \frac{\partial Q}{\partial t}$$
which is the same as before.
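The agreement between parts (b) and (c) can also be seen with numbers. The sketch below is only an illustration: the current, plate area, and contour radius passed in are made-up values, the charge is assumed to grow linearly as Q = It, and the function names are hypothetical. It computes the magnetic field once from the conduction current through S1 and once from the displacement current through S2.

```python
import math

EPS0 = 8.8541878128e-12   # permittivity of free space, F/m
MU0 = 4e-7 * math.pi      # permeability of free space, T*m/A

def gap_field(charge, area):
    """Uniform field in the gap from part (a): E = Q / (eps0 * A)."""
    return charge / (EPS0 * area)

def b_from_conduction(current, r):
    """Part (b): surface S1 cuts the wire, so only J contributes."""
    return MU0 * current / (2 * math.pi * r)

def b_from_displacement(current, area, r, dt=1e-9):
    """Part (c): surface S2 passes through the gap, so only eps0 dE/dt contributes."""
    # Assumed charging law Q(t) = current * t; differentiate E(t) numerically.
    dE_dt = (gap_field(current * dt, area) - gap_field(0.0, area)) / dt
    displacement_current = EPS0 * dE_dt * area
    return MU0 * displacement_current / (2 * math.pi * r)

# Hypothetical values: 0.5 A charging current, 1 cm^2 plate, contour radius 2 cm.
print(b_from_conduction(0.5, 0.02))
print(b_from_displacement(0.5, 1e-4, 0.02))
```

The plate area cancels in part (c), which is why the two routes give the same field regardless of the wire’s cross section.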
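Example 1.5
For the electric field $\mathbf{E} = (\alpha x^2 y^3\,\hat{\mathbf{x}} + \beta z^4\,\hat{\mathbf{y}}) \cos\omega t$ (see Example 1.1) and the
associated magnetic field $\mathbf{B} = (4\beta z^3\,\hat{\mathbf{x}} + 3\alpha x^2 y^2\,\hat{\mathbf{z}}) \frac{\sin\omega t}{\omega}$ (see Example 1.3), find the
current density J (x, y, z, t).

Solution:
$$\mathbf{J} = \nabla \times \frac{\mathbf{B}}{\mu_0} - \epsilon_0 \frac{\partial \mathbf{E}}{\partial t}$$
$$= \frac{\sin\omega t}{\mu_0 \omega} \begin{vmatrix} \hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ 4\beta z^3 & 0 & 3\alpha x^2 y^2 \end{vmatrix} + \epsilon_0 \omega \left(\alpha x^2 y^3\,\hat{\mathbf{x}} + \beta z^4\,\hat{\mathbf{y}}\right) \sin\omega t$$
$$= \frac{\sin\omega t}{\mu_0 \omega} \left[6\alpha x^2 y\,\hat{\mathbf{x}} - 6\alpha x y^2\,\hat{\mathbf{y}} + 12\beta z^2\,\hat{\mathbf{y}}\right] + \epsilon_0 \omega \left(\alpha x^2 y^3\,\hat{\mathbf{x}} + \beta z^4\,\hat{\mathbf{y}}\right) \sin\omega t$$
$$= \left[\left(\epsilon_0 \omega \alpha x^2 y^3 + \frac{6\alpha x^2 y}{\mu_0 \omega}\right) \hat{\mathbf{x}} + \left(\epsilon_0 \omega \beta z^4 + \frac{12\beta z^2}{\mu_0 \omega} - \frac{6\alpha x y^2}{\mu_0 \omega}\right) \hat{\mathbf{y}}\right] \sin\omega t$$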
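A finite-difference check of this example is sketched below. It evaluates ∇ × B/µ0 − ε0 ∂E/∂t numerically and compares it with the closed form obtained by carrying out the curl term by term; note that the z derivative of 4βz³ contributes 12βz² to the ŷ component. The constants, including the stand-in values for µ0 and ε0, are hypothetical numbers chosen for numerical convenience rather than physical units, and the helper names are assumptions of this sketch.

```python
import math

# Hypothetical values chosen for numerical convenience, not physical units.
ALPHA, BETA, OMEGA = 1.3, 0.7, 2.0
MU0, EPS0 = 2.0, 0.5

def E(x, y, z, t):
    """Electric field of Example 1.1."""
    c = math.cos(OMEGA * t)
    return (ALPHA * x**2 * y**3 * c, BETA * z**4 * c, 0.0)

def B(x, y, z, t):
    """Magnetic field of Example 1.3."""
    s = math.sin(OMEGA * t) / OMEGA
    return (4 * BETA * z**3 * s, 0.0, 3 * ALPHA * x**2 * y**2 * s)

def partial(f, comp, var, p, h=1e-5):
    """Central difference of component `comp` of f with respect to p[var]."""
    hi, lo = list(p), list(p)
    hi[var] += h
    lo[var] -= h
    return (f(*hi)[comp] - f(*lo)[comp]) / (2 * h)

def curl(f, p):
    """Curl of the vector field f at the point p = (x, y, z, t)."""
    return (partial(f, 2, 1, p) - partial(f, 1, 2, p),
            partial(f, 0, 2, p) - partial(f, 2, 0, p),
            partial(f, 1, 0, p) - partial(f, 0, 1, p))

def J_numeric(p):
    """J = curl(B)/mu0 - eps0 dE/dt, evaluated by finite differences."""
    c = curl(B, p)
    return tuple(c[i] / MU0 - EPS0 * partial(E, i, 3, p) for i in range(3))

def J_closed_form(x, y, z, t):
    """Closed form from differentiating B and E by hand."""
    s = math.sin(OMEGA * t)
    jx = (EPS0 * OMEGA * ALPHA * x**2 * y**3
          + 6 * ALPHA * x**2 * y / (MU0 * OMEGA)) * s
    jy = (EPS0 * OMEGA * BETA * z**4
          + (12 * BETA * z**2 - 6 * ALPHA * x * y**2) / (MU0 * OMEGA)) * s
    return (jx, jy, 0.0)

p = (0.4, -0.8, 1.1, 0.3)
print(J_numeric(p))
print(J_closed_form(*p))
```

The two printed tuples should agree to within the finite-difference error.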
1.6 Polarization of Materials

We are essentially finished with our analysis of Maxwell’s equations except for a
brief discussion of current density J and charge density ρ. The current density
can be decomposed into three general categories. First, as one would expect,
currents can arise from free charges in motion such as electrons in a metal. We
denote this type of current as Jfree . Second, individual atoms can exhibit internal
currents that give rise to paramagnetic and diamagnetic effects, denoted by Jm .
These are seldom important in optics problems, and so we will ignore these types
of currents. Third, molecules in a material can elongate and become dipoles in
response to an applied electric field. We denote this type of current, which arises
from the polarization of the medium, by Jp .
The polarization current Jp is associated with a dipole distribution function P,
called the polarization (in units of dipoles per volume, or charge times length per
volume). Physically, if the dipoles (depicted in Fig. 1.7) change their strength or
orientation as a function of time in some coordinated fashion, an effective current
density arises in the medium. Since the time-derivative of dipole moments
renders charge times velocity, a distribution of ‘sloshing’ dipoles gives a current
density equal to
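$$\mathbf{J}_p = \frac{\partial \mathbf{P}}{\partial t} \tag{1.28}$$
We thus write the total current in an optical medium (ignoring magnetic effects)
as
$$\mathbf{J} = \mathbf{J}_\text{free} + \frac{\partial \mathbf{P}}{\partial t} \tag{1.29}$$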
Turning our attention now to charge density ρ, we seldom consider the prop-
agation of electromagnetic waveforms through electrically charged materials. We
therefore will write ρ free = 0. One might be tempted in this case to set the overall
charge density ρ to zero, but this would be wrong. The polarization of a neutral
material, described by P, can vary spatially, leading to local concentrations of
positive or negative charges.

We let ρp denote the charge density created by variations in the polarization
P(r). To determine an expression for ρp, we write the continuity equation (1.23)
as applied to the currents and charges associated with this polarization:
$$\nabla \cdot \mathbf{J}_p = -\frac{\partial \rho_p}{\partial t} \tag{1.30}$$
Substitution of (1.28) into this equation immediately yields
$$\rho_p = -\nabla \cdot \mathbf{P} \tag{1.31}$$
To better appreciate local charge buildup due to variation in the medium
polarization, consider the divergence theorem (0.11) applied to P (r):
$$-\oint_S \mathbf{P}(\mathbf{r}) \cdot \hat{\mathbf{n}}\, da = -\int_V \nabla \cdot \mathbf{P}(\mathbf{r})\, dv \tag{1.32}$$
The left-hand side of (1.32) is a surface integral, which after integrating gives units
of charge. Physically, it is the sum of the charges touching the inside of surface S
(multiplied by a minus sign since dipole vectors point from the negatively charged end
of a molecule to the positively charged end). The situation is depicted in Fig. 1.7.
Keep in mind that P (r) is a continuous function so that Fig. 1.7 depicts crudely
an enormous number of very tiny dipoles (no fair drawing a surface that avoids
cutting the dipoles; cut through them at random). When ∇ · P is zero, there are
equal numbers of positive and negative charges touching S from within. When
∇ · P is not zero, the positive and negative charges touching S are not balanced.
Essentially, excess charge ends up within the volume because the non-uniform
alignment of dipoles causes them to be cut preferentially at the surface.
Figure 1.7 A polarized medium with (a) ∇ · P = 0 and with (b) ∇ · P ≠ 0.
Since typical optical media do not include unbalanced free charges, we write
the charge density according to (1.31) as
$$\rho = -\nabla \cdot \mathbf{P} \tag{1.33}$$
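To make the link between ρ = −∇ · P and the divergence theorem (1.32) concrete, the sketch below uses a one-dimensional stand-in: a polarization that points along x and varies only with x, so that ∇ · P reduces to dP/dx. The Gaussian profile, the slab limits, and the function names are assumptions made for this illustration. The bound charge integrated over a slab is compared with minus the outward flux of P through the slab’s two faces.

```python
import math

def P(x):
    """Hypothetical x-directed polarization profile (arbitrary units)."""
    return math.exp(-x * x)

def rho_bound(x, h=1e-5):
    """Bound charge density rho = -dP/dx, the 1-D form of rho = -div P."""
    return -(P(x + h) - P(x - h)) / (2 * h)

def enclosed_charge(a, b, n=20000):
    """Integrate rho_bound over a < x < b (per unit cross-sectional area)."""
    dx = (b - a) / n
    return sum(rho_bound(a + (i + 0.5) * dx) for i in range(n)) * dx

def surface_term(a, b):
    """Minus the outward flux of P through the two faces: -(P(b) - P(a))."""
    return -(P(b) - P(a))

print(enclosed_charge(0.0, 1.5), surface_term(0.0, 1.5))
```

Because P is larger at the left face than at the right face of this slab, more dipoles are cut on the left, and a net positive bound charge is left inside.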
In summary, in electrically neutral non-magnetic media, Maxwell’s equations
(in terms of the medium polarization P) are8
$$\nabla \cdot \mathbf{E} = -\frac{\nabla \cdot \mathbf{P}}{\epsilon_0} \quad \text{(Gauss’s law)} \tag{1.34}$$
$$\nabla \cdot \mathbf{B} = 0 \quad \text{(Gauss’s law for magnetism)} \tag{1.35}$$
$$\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} \quad \text{(Faraday’s law)} \tag{1.36}$$
$$\nabla \times \frac{\mathbf{B}}{\mu_0} = \epsilon_0 \frac{\partial \mathbf{E}}{\partial t} + \frac{\partial \mathbf{P}}{\partial t} + \mathbf{J}_\text{free} \quad \text{(Ampere’s law; fixed by Maxwell)} \tag{1.37}$$
8 It is not uncommon to see the macroscopic Maxwell equations written in terms of two auxiliary
fields: H and D. The field H is useful in magnetic materials. In these materials, the combination
B/µ0 in Ampere’s law is replaced by H ≡ B/µ0 − M, where Jm = ∇ × M is the current associated
with the material’s magnetization. Since we only consider nonmagnetic materials (M = 0), there
is little point in using H. The field D, called the displacement, is defined as D ≡ ε0 E + P. This
combination of E and P occurs in Coulomb’s law and Ampere’s law. For the purposes of this book,
it is conceptually more clear to retain the polarization P as a separate field in these two equations.

1.7 The Wave Equation

When Maxwell unified electromagnetic theory, he immediately noticed that waves
are solutions to this set of equations. In fact his desire to find a set of equations
that allowed for waves aided his effort to find the correct equations. After all, it
was already known that light traveled as waves. Kirchhoff had previously noticed
that $1/\sqrt{\epsilon_0 \mu_0}$ gives the correct speed of light c = 3.00 × 10⁸ m/s (which had already
been measured). Faraday and Kerr had observed that strong magnetic and electric
fields affect light propagating in crystals. The time was right to suspect that light
was merely a high-frequency manifestation of electromagnetic phenomena.
At first glance, Maxwell’s equations might not immediately suggest (to the
inexperienced eye) that waves are solutions. However, we can manipulate the
equations (first order differential equations that couple E to B) into the familiar
wave equation (decoupled second order differential equations for either E or B).
Students should become familiar with the derivation of the wave equation from
Maxwell’s equations. In what follows, we will derive the wave equation for E. The
derivation of the wave equation for B is very similar (see problem P1.6).
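Derivation of the Wave Equation
Taking the curl of (1.3) gives
$$\nabla \times (\nabla \times \mathbf{E}) + \frac{\partial}{\partial t}(\nabla \times \mathbf{B}) = 0 \tag{1.38}$$
We may eliminate ∇ × B by substitution from (1.4), which gives
$$\nabla \times (\nabla \times \mathbf{E}) + \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = -\mu_0 \frac{\partial \mathbf{J}}{\partial t} \tag{1.39}$$
Next we apply the differential vector identity (0.10), ∇ × (∇ × E) = ∇ (∇ · E) − ∇²E,
and use Gauss’ law (1.1) to replace the term ∇ · E, which brings us to
$$\nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = \mu_0 \frac{\partial \mathbf{J}}{\partial t} + \frac{\nabla \rho}{\epsilon_0} \tag{1.40}$$
Substitution from (1.29) and (1.33) gives the more-useful-for-optics form
$$\nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = \mu_0 \frac{\partial \mathbf{J}_\text{free}}{\partial t} + \mu_0 \frac{\partial^2 \mathbf{P}}{\partial t^2} - \frac{1}{\epsilon_0} \nabla (\nabla \cdot \mathbf{P}) \tag{1.41}$$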
The left-hand side of (1.41) is the familiar wave equation. However, the right-
hand side contains a number of source terms, which arise when various currents
and/or polarizations are present. The first term on the right-hand side of (1.41)
describes currents of free charges, which are important for determining the re-
flection of light from a metallic surface or for determining the propagation of
light in a plasma. The second term describes dipole oscillations, which behave
similarly to currents. The final term on the right-hand side of (1.41) is important in
anisotropic media such as crystals. In this case, the polarization P responds to the
electric field along a direction not necessarily parallel to E, due to the influence of
the crystal lattice (addressed in chapter 5).
For most problems in optics, some of the terms on the right-hand side of (1.41)
are zero. Usually, at least one of the terms must be retained when considering
propagation in a medium other than vacuum. For example, in a non-conducting
optical material such as glass, Jfree = 0 and ∇ · P = 0, but ∂²P/∂t² is not zero,
as the medium polarization responds to the light field. This polarization current
determines the refractive index of the material (discussed in chapter 2).
Even though the magnetic field B satisfies a similar wave equation, decoupled
from E (see P1.6), the two waves are not independent. The fields for E and B must
be chosen to be consistent with each other through Maxwell’s equations. After
solving the wave equation (1.41) for E, one may simply obtain B from E through
an application of Faraday’s law (1.36).
In vacuum all of the terms on the right-hand side in (1.41) are zero, in which
case the equation reduces to
$$\nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = 0 \quad \text{(vacuum)} \tag{1.42}$$
The solutions to the vacuum wave equation (1.42) propagate with speed
$$c \equiv 1/\sqrt{\epsilon_0 \mu_0} = 2.9979 \times 10^8 \text{ m/s} \tag{1.43}$$
and any functional form E is a valid solution as long as it carries the dependence
on the argument û · r − ct , where û is a unit vector specifying the direction of
propagation. The argument û · r − ct preserves the shape of the waveform as
it propagates in the û direction; features occurring at a given position recur
‘downstream’ at a distance ct after a time t . By checking this solution in (1.42),
one effectively verifies that the speed of propagation is c (see P1.8). Note that we
may add together any combination of solutions (even with differing directions of
propagation) to form other valid solutions.
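Both claims above can be probed numerically: that 1/√(ε0µ0) reproduces the speed of light, and that a waveform depending only on û · r − ct satisfies (1.42). The sketch below is a numerical spot check, not a proof. The Gaussian pulse shape, the particular unit vector, the evaluation point, and the step size are arbitrary choices for this illustration, and only a single scalar component of the field is tested.

```python
import math

EPS0 = 8.8541878128e-12   # permittivity of free space, F/m
MU0 = 4e-7 * math.pi      # permeability of free space, T*m/A
C = 1 / math.sqrt(EPS0 * MU0)

U = (1 / 3, 2 / 3, 2 / 3)   # unit vector: (1 + 4 + 4) / 9 = 1

def pulse(s):
    """Arbitrary waveform shape (here a Gaussian) as a function of s = u.r - c t."""
    return math.exp(-s * s)

def field(x, y, z, t):
    """One scalar component of E carrying the dependence u.r - c t."""
    return pulse(U[0] * x + U[1] * y + U[2] * z - C * t)

def wave_residual(x, y, z, t, h=1e-3):
    """Laplacian(E) - (1/c^2) d2E/dt2 by second-order central differences."""
    f0 = field(x, y, z, t)
    lap = (field(x + h, y, z, t) + field(x - h, y, z, t)
           + field(x, y + h, z, t) + field(x, y - h, z, t)
           + field(x, y, z + h, t) + field(x, y, z - h, t) - 6 * f0) / h**2
    dt = h / C  # time step matched to the spatial step
    d2t = (field(x, y, z, t + dt) + field(x, y, z, t - dt) - 2 * f0) / dt**2
    return lap - d2t / C**2

print(C)
print(wave_residual(0.3, 0.1, -0.2, 1e-9))
```

Swapping in a different `pulse` function, or a different unit vector, should leave the residual near zero, which is the content of the statement that any functional form is allowed.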
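Example 1.6
Show that the electric field
$$\mathbf{E} = (\alpha x^2 y^3\,\hat{\mathbf{x}} + \beta z^4\,\hat{\mathbf{y}}) \cos\omega t$$
and the associated charge density (see Example 1.1)
$$\rho = 2\epsilon_0 \alpha x y^3 \cos\omega t$$
together with the associated current density (see Example 1.5)
$$\mathbf{J} = \left[\left(\epsilon_0 \omega \alpha x^2 y^3 + \frac{6\alpha x^2 y}{\mu_0 \omega}\right) \hat{\mathbf{x}} + \left(\epsilon_0 \omega \beta z^4 + \frac{12\beta z^2}{\mu_0 \omega} - \frac{6\alpha x y^2}{\mu_0 \omega}\right) \hat{\mathbf{y}}\right] \sin\omega t$$
satisfy the wave equation (1.40).

Solution: We have
$$\nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = \left[\alpha\left(2y^3 + 6x^2 y\right) \hat{\mathbf{x}} + 12\beta z^2\,\hat{\mathbf{y}}\right] \cos\omega t + \mu_0 \epsilon_0 \omega^2 \left(\alpha x^2 y^3\,\hat{\mathbf{x}} + \beta z^4\,\hat{\mathbf{y}}\right) \cos\omega t$$
$$= \left[\alpha\left(2y^3 + 6x^2 y + \mu_0 \epsilon_0 \omega^2 x^2 y^3\right) \hat{\mathbf{x}} + \beta\left(12z^2 + \mu_0 \epsilon_0 \omega^2 z^4\right) \hat{\mathbf{y}}\right] \cos\omega t$$
Similarly,
$$\mu_0 \frac{\partial \mathbf{J}}{\partial t} + \frac{\nabla \rho}{\epsilon_0} = \left[\left(\mu_0 \epsilon_0 \omega^2 \alpha x^2 y^3 + 6\alpha x^2 y\right) \hat{\mathbf{x}} + \left(\mu_0 \epsilon_0 \omega^2 \beta z^4 + 12\beta z^2 - 6\alpha x y^2\right) \hat{\mathbf{y}}\right] \cos\omega t + \left[2\alpha y^3\,\hat{\mathbf{x}} + 6\alpha x y^2\,\hat{\mathbf{y}}\right] \cos\omega t$$
$$= \left[\alpha\left(\mu_0 \epsilon_0 \omega^2 x^2 y^3 + 6x^2 y + 2y^3\right) \hat{\mathbf{x}} + \beta\left(\mu_0 \epsilon_0 \omega^2 z^4 + 12z^2\right) \hat{\mathbf{y}}\right] \cos\omega t$$
The two expressions are equivalent, and the wave equation is satisfied.9
9 The expressions in Example 1.6 hardly look like waves. The (quite unlikely) current and charge
distributions, which fill all space, would have to be artificially induced rather than arise naturally
in response to a field disturbance on a medium.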
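The same verification can be done numerically, as sketched below. Each side of the wave equation (1.40) is evaluated by finite differences at a sample point and the two are compared component by component. The current density is entered with the ŷ term 12βz²/(µ0ω), which is what differentiating 4βz³ with respect to z produces. All constants, including the stand-in values for µ0 and ε0, are hypothetical and chosen for numerical convenience, and the helper names are assumptions of this sketch.

```python
import math

# Hypothetical values chosen for numerical convenience, not physical units.
ALPHA, BETA, OMEGA = 1.3, 0.7, 2.0
MU0, EPS0 = 2.0, 0.5

def E(x, y, z, t):
    """Electric field of the example."""
    c = math.cos(OMEGA * t)
    return (ALPHA * x**2 * y**3 * c, BETA * z**4 * c, 0.0)

def rho(x, y, z, t):
    """Charge density of the example."""
    return 2 * EPS0 * ALPHA * x * y**3 * math.cos(OMEGA * t)

def J(x, y, z, t):
    """Current density of the example."""
    s = math.sin(OMEGA * t)
    jx = (EPS0 * OMEGA * ALPHA * x**2 * y**3
          + 6 * ALPHA * x**2 * y / (MU0 * OMEGA)) * s
    jy = (EPS0 * OMEGA * BETA * z**4
          + (12 * BETA * z**2 - 6 * ALPHA * x * y**2) / (MU0 * OMEGA)) * s
    return (jx, jy, 0.0)

def shifted(p, var, h):
    """Copy of the point p = (x, y, z, t) with p[var] moved by h."""
    q = list(p)
    q[var] += h
    return q

def d1(f, var, p, h=1e-5):
    """First central difference of the scalar function f along p[var]."""
    return (f(*shifted(p, var, h)) - f(*shifted(p, var, -h))) / (2 * h)

def d2(f, var, p, h=1e-3):
    """Second central difference of the scalar function f along p[var]."""
    return (f(*shifted(p, var, h)) - 2 * f(*p) + f(*shifted(p, var, -h))) / h**2

def residual(p):
    """Largest |LHS - RHS| of the wave equation (1.40) over the three components."""
    worst = 0.0
    for i in range(3):
        Ei = lambda *q: E(*q)[i]
        Ji = lambda *q: J(*q)[i]
        lhs = sum(d2(Ei, v, p) for v in range(3)) - MU0 * EPS0 * d2(Ei, 3, p)
        rhs = MU0 * d1(Ji, 3, p) + d1(rho, i, p) / EPS0
        worst = max(worst, abs(lhs - rhs))
    return worst

print(residual((0.4, -0.8, 1.1, 0.3)))
```

The residual is limited by the truncation error of the second differences and should be several orders of magnitude smaller than either side of the equation.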

Exercises
Exercises for 1.1 Gauss’ Law
P1.1 Consider an infinitely long hollow cylinder (inner radius a, outer radius
b) which carries a volume charge density ρ = k/s² for a < s < b and no
charge elsewhere, where s is the distance from the axis of the cylinder
as shown in Fig. 1.8. Use Gauss’s Law in integral form to find the
electric field produced by this charge for each of the three regions:
s < a, a < s < b, and s > b.
HINT: For each region first draw an appropriate “Gaussian surface” and
integrate the charge density over the volume to figure out the enclosed
charge. Then use Gauss’s law in integral form and the symmetry of the
problem to solve for the electric field.
Figure 1.8 A charged cylinder with charge located between a and b.
Exercises for 1.3 Faraday’s Law
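P1.2 Suppose that an electric field is given by $\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 \cos(\mathbf{k} \cdot \mathbf{r} - \omega t + \phi)$,
where k⊥E0 and φ is a constant phase. Show that
$$\mathbf{B}(\mathbf{r}, t) = \frac{\mathbf{k} \times \mathbf{E}_0}{\omega} \cos(\mathbf{k} \cdot \mathbf{r} - \omega t + \phi)$$
is consistent with (1.3).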
Exercises for 1.4 Ampere’s Law
P1.3 A conducting cylinder with the same geometry as P1.1 carries a volume
current density J = k/s ẑ along the axis of the cylinder for a < s < b.
Using Ampere’s Law in integral form, find the magnetic field due to this
current. Find the field for each of the three regions: s < a, a < s < b,
and s > b.
HINT: For each region first draw an appropriate ‘Amperian loop’ and
integrate the current density over the surface to figure out how much
current passes through the loop. Then use Ampere’s law in integral
form and the symmetry of the problem to solve for the magnetic field.
Exercises for 1.6 Polarization of Materials
P1.4 Memorize Maxwell’s equations (1.1)–(1.4) together with (1.29) and (1.33).
Explain (very) briefly the meaning of each equation and the assumptions
that go into (1.29) and (1.33). Be prepared to reproduce them from
memory on an exam. After studying, write everything from memory
on your homework page for submission.

P1.5 Check that the E and B fields in P1.2 satisfy the rest of Maxwell’s equations
(1.1), (1.2), and (1.4). What are the implications for J and ρ?
Exercises for 1.7 The Wave Equation
P1.6 Derive the wave equation for the magnetic field B in vacuum (i.e. J = 0
and ρ = 0).
P1.7 Show that the magnetic field in P1.2 is consistent with the wave equa-
tion derived in P1.6.
P1.8 Verify that E(û · r − ct) satisfies the vacuum wave equation (1.42), where
E has an arbitrary functional form.
P1.9 (a) Show that $\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 \cos[k(\hat{\mathbf{u}} \cdot \mathbf{r} - ct) + \phi]$ is a solution to the vacuum
wave equation (1.42), where û is an arbitrary unit vector and k is
a constant with units of inverse length.
(b) Show that each wave front forms a plane, which is why such solu-
tions are often called ‘plane waves’. HINT: A wavefront is a surface in
space where the argument of the cosine (i.e. the phase of the wave) has
a constant value. Set the cosine argument to an arbitrary constant and
see what positions are associated with that phase.
(c) Determine the speed v = ∆r /∆t that a wave front moves in the û
direction. HINT: Set the cosine argument to a constant, solve for r, and
differentiate.
(d) By analysis, determine the wavelength λ in terms of k. HINT: Find
the distance between identical wave fronts by changing the cosine
argument by 2π at a given instant in time.
(e) Use (1.34) to show that E0 and û must be perpendicular to each
other in vacuum.
L1.10 Measure the speed of light using a rotating mirror. Provide an estimate
of the experimental uncertainty in your answer (not the percentage
error from the known value). (video)
Figure 1.9 Geometry for lab 1.10. (Labels: Laser, A; Rotating Mirror, B; Delay Path; C; Screen, D.)
Figure 1.10 A schematic of the setup for lab 1.10. (Labels: Laser; Collimation Telescope; Rotating mirror; Retro-reflecting Mirror; Long Corridor; front of laser can serve as screen for returning light.)
Figure 1.9 shows a simplified geometry for the optical path for light
in this experiment. Laser light from A reflects from a rotating mirror
at B towards C . The light returns to B , where the mirror has rotated,
sending the light to point D. Notice that a mirror rotation of θ deflects
the beam by 2θ.
P1.11 Ole Roemer made the first successful measurement of the speed of light
in 1676 by observing the orbital period of Io, a moon of Jupiter with a
period of 42.5 hours. When Earth is moving toward Jupiter, the period
is measured to be shorter than 42.5 hours because light indicating the
end of the moon’s orbit travels less distance than light indicating the
beginning. When Earth is moving away from Jupiter, the situation is
reversed, and the period is measured to be longer than 42.5 hours.
Figure 1.11 Geometry for P1.11 (labels: Sun, Earth, Io, Jupiter)
(a) If you were to measure the time for 40 observed orbits of Io when
Earth is moving directly toward Jupiter and then several months later
measure the time for 40 observed orbits when Earth is moving directly
away from Jupiter, what would you expect the difference between these
two measurements to be? Take the Earth’s orbital radius to be 1.5 × 10¹¹ m.
To simplify the geometry, just assume that Earth moves directly toward
or away from Jupiter over the entire 40 orbits (see Fig. 1.11).
(b) Roemer did the experiment described in part (a), and experimentally
measured a 22 minute difference. What speed of light would one
deduce from that value?
P1.12 In an isotropic medium (i.e. ∇ · P = 0), the polarization can often be
written as a function of the electric field: P = ε0 χ(E) E, where χ(E) =
χ1 + χ2 E + χ3 E² + ··· . The higher order coefficients in the expansion (i.e.
χ2, χ3, ...) are typically small, so only the first term is important at
low intensities. The field of nonlinear optics deals with intense light-matter
interactions, where the higher order terms of the expansion are
important. This can lead to phenomena such as harmonic generation.
Starting with Maxwell’s equations, derive the wave equation for nonlinear
optics in an isotropic medium:
$$\nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \left(1 + \chi_1\right) \frac{\partial^2 \mathbf{E}}{\partial t^2} = \mu_0 \epsilon_0 \frac{\partial^2 \left[\left(\chi_2 E + \chi_3 E^2 + \cdots\right) \mathbf{E}\right]}{\partial t^2} + \mu_0 \frac{\partial \mathbf{J}}{\partial t}$$
We have retained the possibility of current here since, for example, in
a gas some of the molecules might ionize in the presence of a strong
field, giving rise to a current.

Ole Roemer (1644–1710, Danish) was a man of many interests. In addition to measuring the speed of light, he created a temperature scale which with slight modification became the Fahrenheit scale, introduced a system of standard weights and measures, and was heavily involved in civic affairs (city planning, etc.). Scientists initially became interested in Io’s orbit because its eclipse (when it went behind Jupiter) was an event that could be seen from many places on earth. By comparing accurate measurements of the local time when Io was eclipsed by Jupiter at two remote places on earth, scientists in the 1600s were able to determine the longitude difference between the two places.

Chapter 2
Plane Waves and Refractive Index
In this chapter we focus on sinusoidal solutions to Maxwell’s equations, called

plane waves. Restricting our attention to plane waves may seem limiting at first,
since (as mentioned in chapter 1) any waveform shape can satisfy the wave
equation in vacuum, as long as it travels at c and has the requisite connections
between E and B. However, an arbitrary waveform can always be constructed
from a linear superposition of sinusoidal waves. Thus, there is no loss of generality
if we focus our attention on plane-wave solutions.
In a material, the electric field of a plane wave induces oscillating dipoles,
and these oscillating dipoles in turn alter the electric field. We use the index of
refraction to describe this effect. Plane waves of different frequencies experience
different refractive indexes, which causes them to travel at different speeds in
materials. Thus, an arbitrary waveform, which is composed of multiple sinusoidal
waves, invariably changes shape as it travels in a material, as the different sinu-
soidal waves change relationship with respect to one another. This phenomenon
(called dispersion), discussed in chapter 7, is one of the primary reasons why
physicists and engineers choose to work with sinusoidal waves. Every waveform
except for individual sinusoidal waves changes shape as it travels in a material.
When describing plane waves, it is convenient to employ complex numbers
to represent physical quantities. This is particularly true for problems involving
absorption, which takes place inside metals and, to a lesser degree (usually),
inside dielectrics (i.e. non-conducting materials such as glass). When the electric
field is represented using complex notation, the index of refraction also becomes
a complex number. The imaginary part controls the rate at which the field decays,
while the real part governs the familiar oscillatory behavior. Complex notation
will be used extensively throughout this book. Students should make sure they
are comfortable with everything in the review provided in section 0.2.
In this chapter we introduce a very successful physical model for index of
refraction developed by Hendrik Lorentz. We also discuss Poynting’s theorem,
which governs the flow of energy carried by electromagnetic fields. This leads to
the concept of irradiance (or intensity), power per area delivered by an electro-
magnetic wave.
2.1 Plane Wave Solutions to the Wave Equation

Consider the wave equation for an electric field waveform propagating in vacuum
(1.42):
$$\nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = 0 \tag{2.1}$$
We are interested in solutions to (2.1) that have the functional form (see P1.9)
$$\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 \cos\left(\mathbf{k} \cdot \mathbf{r} - \omega t + \phi\right) \tag{2.2}$$
Here φ represents an arbitrary (constant) phase term. The vector k, called the
wave vector, may be written as
$$\mathbf{k} \equiv k\,\hat{\mathbf{u}} = \frac{2\pi}{\lambda_{\rm vac}}\,\hat{\mathbf{u}} \quad \text{(vacuum)} \tag{2.3}$$
where k has units of inverse length, û is a unit vector defining the direction of
propagation, and λvac is the length by which r must vary (in the direction of û) to
cause the cosine to go through a complete cycle. This distance is known as the
(vacuum) wavelength. The frequency of oscillation is related to the wavelength via
$$\omega = \frac{2\pi c}{\lambda_{\rm vac}} \quad \text{(vacuum)} \tag{2.4}$$
The frequency ω has units of radians per second. Frequency is also often expressed
as ν ≡ ω/2π in units of inverse seconds or Hz. Notice that k and ω are
not independent of each other; they are related through the vacuum dispersion
relation
$$k = \frac{\omega}{c} \quad \text{(vacuum)} \tag{2.5}$$
Typical values for λvac are given in Fig. 2.1. Sometimes the spatial frequency of the
wave is expressed as 1/λvac, in units of cm$^{-1}$, called the wave number.
A magnetic wave accompanies any electric wave, and it obeys a similar wave
equation (see P1.6). The magnetic wave corresponding to (2.2) is
$$\mathbf{B}(\mathbf{r}, t) = \mathbf{B}_0 \cos\left(\mathbf{k} \cdot \mathbf{r} - \omega t + \phi\right), \tag{2.6}$$
but it is important to note that B0, k, ω, and φ are not independently chosen in
(2.6). In order to satisfy Faraday’s law (1.3), the arguments of the cosine in (2.2)
and (2.6) must be identical. Therefore, in vacuum the electric and magnetic fields
travel in phase. In addition, Faraday’s law requires (see P1.2)
$$\mathbf{B}_0 = \frac{\mathbf{k} \times \mathbf{E}_0}{\omega} \tag{2.7}$$
The above cross product means that B0 is perpendicular to both E0 and k.
Since k and E0 are also perpendicular (Gauss’s law in vacuum requires k · E0 = 0),
the three vectors are mutually perpendicular, and the magnitudes of the fields are related through
$B_0 = kE_0/\omega$ or $B_0 = E_0/c$, in view of (2.5).
Figure 2.1 The electromagnetic spectrum. Axes: frequency (Hz) and wavelength (m). Labeled bands: radio (AM, FM), radar, microwave, infrared, visible, ultraviolet, X-rays, gamma rays.
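The vacuum relations (2.3) through (2.7) can be checked numerically. The following is a minimal sketch, not part of the original text; the function names, the 500 nm wavelength, and the 100 V/m amplitude are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def cross(a, b):
    """Cross product of two 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def vacuum_wave(wavelength_vac, direction, e0):
    """Return (k_vector, omega, nu, b0) for a vacuum plane wave.

    direction is assumed to be a unit vector perpendicular to e0.
    """
    k_mag = 2 * math.pi / wavelength_vac        # (2.3), rad/m
    omega = 2 * math.pi * C / wavelength_vac    # (2.4), rad/s
    nu = omega / (2 * math.pi)                  # frequency in Hz
    k_vec = tuple(k_mag * u for u in direction)
    b0 = tuple(comp / omega for comp in cross(k_vec, e0))  # (2.7)
    return k_vec, omega, nu, b0

# Illustrative case: wave traveling along z, electric field along x.
k_vec, omega, nu, b0 = vacuum_wave(500e-9, (0.0, 0.0, 1.0), (100.0, 0.0, 0.0))
```

With these inputs the magnetic amplitude points along y and has magnitude E0/c, as expected from (2.5) and (2.7).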

The influence of the magnetic field only becomes important (in comparison
to the electric field) for charged particles moving near the speed of light. This
typically takes place only for extremely intense lasers (intensities above $10^{18}$ W/cm$^2$,
see P2.12) where the electric field is sufficiently strong to cause electrons to
oscillate with velocities near the speed of light. Therefore, the magnetic field can be
ignored in most optics problems. Throughout the remainder of this book, we will
focus our attention mainly on the electric field with the understanding that we
can at any time deduce the (less important) magnetic field from the electric field
via Faraday’s law.
The depiction of the electric field (2.2) and the associated magnetic field (2.6)
in Fig. 2.2 shows the fields drawn like transverse waves on a string. However,
they are actually large planar sheets of uniform fields (different fields in different
planes) that move in the direction of k. The name plane wave is given since
the argument in (2.2) at any moment is constant (and hence the electric field is
uniform) across planes that are perpendicular to k. A plane wave fills all space and
may be thought of as a series of infinite sheets of uniform electric and magnetic
field moving in the k direction.
Figure 2.2 Depiction of electric and magnetic fields associated with a plane wave.
At this point, we rewrite our plane wave solution using complex number notation.
Although this change in notation will not make the task at hand any easier
(and may even appear to complicate things), we introduce it here in preparation
for later sections, where it will save considerable labor. (For a review of complex
notation, see section 0.2.)
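Using complex notation we rewrite (2.2) as
$$\mathbf{E}(\mathbf{r}, t) = {\rm Re}\left\{\tilde{\mathbf{E}}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}\right\} \tag{2.8}$$
where we have hidden the phase term φ inside of Ẽ0 as follows:
$$\tilde{\mathbf{E}}_0 \equiv \mathbf{E}_0 e^{i\phi} \tag{2.9}$$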
The next step we take is to become intentionally sloppy. Physicists throughout
the world have conspired to avoid writing Re { } in an effort (or lack thereof if
you prefer) to make expressions less cluttered. Nevertheless, only the real part of
the field is physically relevant even though expressions and calculations contain
both real and imaginary terms. This sloppy notation is okay since the real and
imaginary parts of complex numbers never intermingle when adding, subtracting,
differentiating, or integrating. We can delay taking the real part of the expression
until the end of the calculation. Also, when hiding a phase φ inside of the field
amplitude as in (2.8), we drop the tilde (might as well since we are already being
sloppy); when using complex notation, we will automatically assume that the
complex field amplitude contains phase information. Putting this all together,
our plane wave solution in complex notation is written simply as
$$\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.10}$$
It is possible to construct any electromagnetic disturbance from a linear superpo-
sition of such waves, which we will do in chapter 7.
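The claim that the real part can be taken at the end of a calculation is easy to test for addition. The sketch below (an illustration, with hypothetical function names and arbitrary numerical values) adds two waves of the same k and ω once through their complex amplitudes and once through explicit cosines.

```python
import cmath
import math

def complex_amplitude(e0, phi):
    """Complex amplitude E0 * exp(i*phi), as in (2.9)."""
    return e0 * cmath.exp(1j * phi)

def field_from_complex(amplitude, k_dot_r, omega_t):
    """Real part of amplitude * exp(i(k.r - wt)), as in (2.8)."""
    return (amplitude * cmath.exp(1j * (k_dot_r - omega_t))).real

def field_from_cosine(e0, phi, k_dot_r, omega_t):
    """The explicit cosine form (2.2)."""
    return e0 * math.cos(k_dot_r - omega_t + phi)

# Two waves with the same k and omega but different phases (illustrative values).
a1 = complex_amplitude(2.0, 0.7)
a2 = complex_amplitude(1.5, -1.1)
total_complex = field_from_complex(a1 + a2, 1.3, 0.4)
total_cosine = (field_from_cosine(2.0, 0.7, 1.3, 0.4)
                + field_from_cosine(1.5, -1.1, 1.3, 0.4))
```

The two totals agree to rounding error. The same shortcut does not hold for products of fields, which is why the real parts must be taken before multiplying.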

Example 2.1
Verify that the complex plane wave (2.10) is a solution to the wave equation (2.1).
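Solution: The first term gives
$$\begin{aligned}
\nabla^2 \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} &= \mathbf{E}_0 \left[\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right] e^{i(k_x x + k_y y + k_z z - \omega t)} \\
&= -\mathbf{E}_0 \left(k_x^2 + k_y^2 + k_z^2\right) e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \\
&= -k^2 \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}
\end{aligned} \tag{2.11}$$
and the second term gives
$$\frac{1}{c^2}\frac{\partial^2}{\partial t^2}\left(\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}\right) = -\frac{\omega^2}{c^2}\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.12}$$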
Upon insertion into (2.1) we obtain the vacuum dispersion relation (2.5), which
specifies the connection between the wavenumber k and the frequency ω. While
the vacuum dispersion relation is simple, it emphasizes that k and ω cannot be
independently chosen (as we saw in (2.3) and (2.4)).
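The result of Example 2.1 can also be checked numerically. The sketch below is an illustration only: it restricts the wave to one dimension (propagation along z), uses central finite differences, and all names and sample values are assumptions rather than anything from the text.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def plane_wave(z, t, k, omega):
    """Unit-amplitude complex plane wave (2.10), propagating along z."""
    return cmath.exp(1j * (k * z - omega * t))

def second_derivative(f, x, h):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

def wave_equation_residual(k, omega, z=0.3e-6, t=1.0e-15):
    """|d2E/dz2 - (1/c^2) d2E/dt2| divided by k^2, evaluated at (z, t)."""
    hz = 1e-3 / k       # spatial step, small compared with 1/k
    ht = 1e-3 / omega   # time step, small compared with 1/omega
    d2z = second_derivative(lambda zz: plane_wave(zz, t, k, omega), z, hz)
    d2t = second_derivative(lambda tt: plane_wave(z, tt, k, omega), t, ht)
    return abs(d2z - d2t / C**2) / k**2

omega = 2 * math.pi * 5e14                                  # illustrative optical frequency
residual_ok = wave_equation_residual(omega / C, omega)        # k satisfies (2.5)
residual_bad = wave_equation_residual(1.1 * omega / C, omega)  # k violates (2.5)
```

The residual is near zero only when k = ω/c. Choosing k ten percent too large leaves a residual of order 0.17 in these normalized units.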
2.2 Index of Refraction

Let’s take a look at how plane waves behave in dielectric media (e.g. glass). We
assume an isotropic, homogeneous, and non-conducting medium (i.e. Jfree = 0).
In this case, we expect E and P to be parallel to each other so ∇ · P = 0 from (1.34).
The general wave equation (1.41) for the electric field reduces in this case to
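$$\nabla^2 \mathbf{E} - \epsilon_0 \mu_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = \mu_0 \frac{\partial^2 \mathbf{P}}{\partial t^2} \tag{2.13}$$
Since we are considering sinusoidal waves, we consider solutions of the form
$$\begin{aligned}
\mathbf{E} &= \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \\
\mathbf{P} &= \mathbf{P}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}
\end{aligned} \tag{2.14}$$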
By writing this, we are making the (reasonable) assumption that if an electric field
stimulates a medium at frequency ω, then the polarization in the medium also
oscillates at frequency ω. This assumption is typically rather good except when
extreme electric fields are used (see P1.12). Recall that by our prior agreement,
the complex amplitudes of E0 and P0 carry phase information. Thus, while E and
P in (2.14) oscillate at the same frequency, they can be out of phase with respect to
each other. This phase discrepancy is most pronounced for materials that absorb
energy at the plane wave frequency.
Substitution of the trial solutions (2.14) into (2.13) yields
$$-k^2 \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \epsilon_0 \mu_0 \omega^2 \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} = -\mu_0 \omega^2 \mathbf{P}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.15}$$
At this point, we need to make an explicit connection between E0 and P0. In
a linear medium (essentially any material if the electric field strength is not
extreme), the polarization amplitude is proportional to the strength of the applied
electric field:
$$\mathbf{P}_0(\omega) = \epsilon_0 \chi(\omega) \mathbf{E}_0(\omega) \tag{2.16}$$
This is known as a constitutive relation. We have introduced a dimensionless pro-
portionality factor χ(ω) called the susceptibility, which depends on the frequency
of the field. We account for the possibility that E and P oscillate out of phase by
allowing χ(ω) to be a complex number in (2.16).
By inserting (2.16) into (2.15) and canceling the field terms, we obtain the
dispersion relation in dielectrics:
$$k^2 = \epsilon_0 \mu_0 \left[1 + \chi(\omega)\right] \omega^2 \quad \text{or} \quad k = \frac{\omega}{c}\sqrt{1 + \chi(\omega)} \tag{2.17}$$
where we have used $c \equiv 1/\sqrt{\epsilon_0 \mu_0}$. In general, χ(ω) is a complex number, which
leads to a complex index of refraction, defined by¹
$$\mathcal{N}(\omega) \equiv n(\omega) + i\kappa(\omega) = \sqrt{1 + \chi(\omega)} \tag{2.18}$$
where n and κ are respectively the real and imaginary parts of the index. (Note
that κ is not k.) According to (2.17), the magnitude of the wave vector is then also
complex:
$$k = \frac{\mathcal{N}\omega}{c} = \frac{(n + i\kappa)\,\omega}{c} \tag{2.19}$$
The use of complex index of refraction only makes sense in the context of complex
representation of plane waves.
The complex index N takes account of absorption as well as the usual oscilla-
tory behavior of the wave. We see this by explicitly placing (2.19) into (2.14):
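$$\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 e^{-{\rm Im}\{\mathbf{k}\}\cdot\mathbf{r}} e^{i({\rm Re}\{\mathbf{k}\}\cdot\mathbf{r} - \omega t)} = \mathbf{E}_0 e^{-\frac{\kappa\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}} e^{i\left(\frac{n\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r} - \omega t\right)} \tag{2.20}$$
As before, here û is a real unit vector specifying the direction of k. Again, when
looking at (2.20), by special agreement in advance, we should just think of the real
part, namely²
$$\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 e^{-\frac{\kappa\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}} \cos\left(\frac{n\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r} - \omega t + \phi\right) \tag{2.21}$$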
where an overall phase φ was formerly held in the complex vector Ẽ0. (The tilde
had been suppressed.) Figure 2.3 shows a graph of (2.21). The imaginary part of
the index κ causes the wave to decay as it travels. The real part of the index n is
associated with the oscillations of the wave.
Figure 2.3 Electric field of a decaying plane wave. For convenience in plotting, the direction of propagation is chosen to be in the z direction (i.e. û = ẑ).
1 Electrodynamics books often use the electric displacement $\mathbf{D} \equiv \epsilon_0 \mathbf{E} + \mathbf{P} = \epsilon \mathbf{E}$. The permittivity
ε encapsulates the constitutive relation that connects P with E. In a linear medium we have
$\epsilon \equiv \epsilon_0 (1 + \chi)$, so that the index of refraction is given by $\mathcal{N} = \sqrt{\epsilon/\epsilon_0}$.
2 For the sake of simplicity in writing (2.21) we assumed linearly polarized light. That is, all vector
components of E0 were assumed to have the same complex phase φ. The expression would be
somewhat more complicated, for example, in the case of circularly polarized light (described in
chapter 6).
By inspection of the cosine argument
in (2.21), we see that the speed of the diminishing sinusoidal wave fronts is
$$v_{\rm phase}(\omega) = c/n(\omega) \tag{2.22}$$
It is apparent that n(ω) is the ratio of the speed of the light in vacuum to the speed
of the wave in the material.
In a dielectric, the vacuum relations (2.3) and (2.4) are modified to read
$${\rm Re}\{\mathbf{k}\} \equiv \frac{2\pi}{\lambda}\,\hat{\mathbf{u}}, \tag{2.23}$$
where
$$\lambda \equiv \lambda_{\rm vac}/n. \tag{2.24}$$
While the frequency ω is the same, whether in a material or in vacuum, the
wavelength λ in the material is different from the wavelength in vacuum, as
indicated by (2.24).
Example 2.2
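When n = 1.5, κ = 0.1, and $\nu = 5 \times 10^{14}$ Hz, find (a) the wavelength inside the
material, and (b) the propagation distance over which the amplitude of the wave
diminishes by the factor $e^{-1}$ (called the skin depth).
Solution: (a)
$$\lambda = \frac{\lambda_{\rm vac}}{n} = \frac{2\pi c}{n\omega} = \frac{c}{n\nu} = \frac{3 \times 10^8~\text{m/s}}{1.5\left(5 \times 10^{14}~\text{Hz}\right)} = 400~\text{nm}$$
(b)
$$e^{-\frac{\kappa\omega}{c} z} = e^{-1} \quad \Rightarrow \quad z = \frac{c}{\kappa\omega} = \frac{c}{2\pi\kappa\nu} = \frac{3 \times 10^8~\text{m/s}}{2\pi\,(0.1)\left(5 \times 10^{14}~\text{Hz}\right)} \approx 950~\text{nm}$$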
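The arithmetic of Example 2.2 can be reproduced in a few lines. This is a sketch with hypothetical function names; it uses the rounded value c = 3 × 10^8 m/s that appears in the example.

```python
import math

C = 3.0e8  # speed of light, m/s (rounded as in Example 2.2)

def wavelength_in_material(n, nu):
    """Wavelength inside the material, lambda = c / (n * nu), from (2.24)."""
    return C / (n * nu)

def amplitude_decay_length(kappa, nu):
    """Distance over which the field amplitude falls by 1/e: c / (kappa * omega)."""
    return C / (2 * math.pi * kappa * nu)

lam = wavelength_in_material(1.5, 5e14)        # part (a)
depth = amplitude_decay_length(0.1, 5e14)      # part (b)
```

Part (a) gives 400 nm. Part (b) evaluates to about 955 nm, which rounds to the 950 nm quoted at two significant figures.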
Obtaining n and κ from the complex susceptibility χ
From (2.18) we have
$$(n + i\kappa)^2 = n^2 - \kappa^2 + i\,2n\kappa = 1 + \chi \tag{2.25}$$
The real parts and imaginary parts in the above equation are separately equal,
which gives
$$n^2 - \kappa^2 = 1 + {\rm Re}\{\chi\} \quad \text{and} \quad 2n\kappa = {\rm Im}\{\chi\} \tag{2.26}$$
From the latter equation we have
$$\kappa = {\rm Im}\{\chi\}/2n \tag{2.27}$$

When this is substituted into the first equation of (2.26) we get a quadratic in $n^2$
$$n^4 - \left(1 + {\rm Re}\{\chi\}\right) n^2 - \frac{\left({\rm Im}\{\chi\}\right)^2}{4} = 0 \tag{2.28}$$
The positive³ real root to this equation is
$$n = \sqrt{\frac{\left(1 + {\rm Re}\{\chi\}\right) + \sqrt{\left(1 + {\rm Re}\{\chi\}\right)^2 + \left({\rm Im}\{\chi\}\right)^2}}{2}} \tag{2.29}$$
The imaginary part of the index is then obtained from (2.27).
When absorption is small we can neglect the imaginary part of χ(ω), and
(2.29) reduces to
$$n(\omega) = \sqrt{1 + \chi(\omega)} \quad \text{(negligible absorption)} \tag{2.30}$$
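Equations (2.27) and (2.29) can be compared against a direct complex square root of 1 + χ, as in (2.18). The sketch below is illustrative: the function name and the sample susceptibility are assumptions, and the code assumes n is nonzero so that the division in (2.27) is defined.

```python
import cmath
import math

def index_from_susceptibility(chi):
    """Return (n, kappa) from a complex susceptibility using (2.29) and (2.27).

    Assumes the resulting n is nonzero.
    """
    re_part = 1 + chi.real
    im_part = chi.imag
    n = math.sqrt((re_part + math.hypot(re_part, im_part)) / 2)  # (2.29)
    kappa = im_part / (2 * n)                                    # (2.27)
    return n, kappa

chi_sample = 1.25 + 0.3j                     # illustrative value, not from the text
n, kappa = index_from_susceptibility(chi_sample)
direct = cmath.sqrt(1 + chi_sample)          # (2.18) evaluated directly
```

Both routes give the same n and κ, since the principal complex square root has a non-negative real part, which matches the choice of the positive root in (2.29).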
3 It is possible to have n < 0 for so-called metamaterials, not considered here.
2.3 The Lorentz Model of Dielectrics

Hendrik Antoon Lorentz (1853–1928, Dutch) was born in Arnhem, Netherlands, the son of a successful nurseryman. Hendrik’s mother died when he was nine years old. He studied classical languages and then entered the University of Leiden where he was strongly influenced by astronomy professor Frederik Kaiser whose niece Hendrik married. Hendrik was persuaded to become a physicist and wrote a doctoral dissertation entitled “On the theory of reflection and refraction of light,” in which he refined Maxwell’s electromagnetic theory and used it to explain the reflection and refraction of light. Lorentz correctly hypothesized that the atoms were composed of charged particles, and that their movement was the source of light. He also derived the transformations of space and time used in Einstein’s theory of relativity. Lorentz won the Nobel prize in 1902 for his contributions to electromagnetic theory.

To compute the index of refraction in either a dielectric or a conducting material,
we require a model that describes the response of electrons in the material to
the passing electric field wave. Of course, the model in turn influences how the
electric field propagates, which is what influences the material in the first place!
The model therefore must be solved together with the propagating field in a
self-consistent manner.
Hendrik Lorentz developed a very successful model in the late 1800s, which
treats each (active) electron in the medium as a classical particle obeying Newton’s
second law (F = ma). In the case of a dielectric medium, electrons are subject to
an elastic restoring force that keeps each electron bound to its respective atom
and a damping force that dissipates energy and gives rise to absorption.
The Lorentz model determines the susceptibility χ(ω) (the connection between
the electric field E0 and the polarization P0) and hence the index of refraction.
The model assumes that all atoms (or molecules) in the medium are
identical, each with one (or a few) active electrons responding to the external
field. The atoms are uniformly distributed throughout space with N identical
active electrons per volume (units of number per volume). The polarization of
the material is then
$$\mathbf{P} = N q_e \mathbf{r}_{\rm micro} \tag{2.31}$$
Recall that polarization has units of dipoles per volume. Each dipole has strength
$q_e \mathbf{r}_{\rm micro}$, where $\mathbf{r}_{\rm micro}$ is a microscopic displacement of the electron from equilibrium.
At the time of Lorentz, atoms were thought to be clouds of positive charge
wherein point-like electrons sat at rest unless stimulated by an applied electric
field. In our modern quantum-mechanical viewpoint, $\mathbf{r}_{\rm micro}$ corresponds to an
average displacement of the electronic cloud, which surrounds the nucleus (see
Fig. 2.4). The displacement $\mathbf{r}_{\rm micro}$ of the electron charge in an individual atom
depends on the local strength of the applied electric field E at the position of the
atom. Since the diameter of the electronic cloud is tiny compared to a wavelength
of (visible) light, we may consider the electric field to be uniform across any
individual atom.
The Lorentz model uses Newton’s equation of motion to describe an electron
displacement from equilibrium within an atom. In accordance with the classical
laws of motion, the electron mass $m_e$ times its acceleration is equal to the sum of
the forces on the electron:
$$m_e \ddot{\mathbf{r}}_{\rm micro} = q_e \mathbf{E} - m_e \gamma \dot{\mathbf{r}}_{\rm micro} - k_{\rm Hooke} \mathbf{r}_{\rm micro} \tag{2.32}$$
The electric field pulls on the electron with force $q_e \mathbf{E}$.⁴ A drag force (or friction)
$-m_e \gamma \dot{\mathbf{r}}_{\rm micro}$ opposes the electron motion and accounts for absorption of energy.
Without this term, it is only possible to describe optical index at frequencies away
from where absorption takes place. Finally, $-k_{\rm Hooke} \mathbf{r}_{\rm micro}$ is a force accounting
for the fact that the electron is bound to the nucleus. This restoring force can be
thought of as an effective spring that pulls the displaced electron back towards
equilibrium with a force proportional to the amount of displacement, so this
term is essentially the familiar Hooke’s law. With some rearranging, (2.32) can be
written as
$$\ddot{\mathbf{r}}_{\rm micro} + \gamma \dot{\mathbf{r}}_{\rm micro} + \omega_0^2 \mathbf{r}_{\rm micro} = \frac{q_e}{m_e} \mathbf{E} \tag{2.33}$$
where $\omega_0 \equiv \sqrt{k_{\rm Hooke}/m_e}$ is the natural oscillation frequency (or resonant
frequency) associated with the electron mass and the “spring constant.”
Figure 2.4 A distorted electronic cloud becomes a dipole. (Panels: unperturbed; in an electric field.)
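In accordance with our examination of a single sinusoidal wave, we insert
(2.14) into (2.33) and obtain
$$\ddot{\mathbf{r}}_{\rm micro} + \gamma \dot{\mathbf{r}}_{\rm micro} + \omega_0^2 \mathbf{r}_{\rm micro} = \frac{q_e}{m_e} \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.34}$$
Note that within a given atom the excursions of $\mathbf{r}_{\rm micro}$ are so small that k·r remains
essentially constant, since k·r varies with displacements on the scale of an optical
wavelength, which is huge compared to the size of an atom. The inhomogeneous
solution to (2.34) is (see P2.1)
$$\mathbf{r}_{\rm micro} = \left(\frac{q_e}{m_e}\right) \frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{\omega_0^2 - i\omega\gamma - \omega^2} \tag{2.35}$$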
4 The electron also experiences a force due to the magnetic field of the light, $\mathbf{F} = q_e \mathbf{v}_{\rm micro} \times \mathbf{B}$,
but this force is tiny for typical optical fields.
The electron position $\mathbf{r}_{\rm micro}$ oscillates (not surprisingly) with the same frequency
ω as the driving electric field. This solution illustrates the convenience of the
complex notation. The imaginary part in the denominator implies that the electron
oscillates with a different phase from the electric field oscillations; the damping
term γ (the imaginary part in the denominator) causes the two to be out of phase
somewhat. The complex algebra in (2.35) accomplishes quite easily what would
otherwise be cumbersome (i.e. working out a trigonometric phase).
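The phase relationship contained in (2.35) can be read off numerically from the complex factor in its denominator. The sketch below is an illustration with hypothetical names; frequencies are expressed in units of ω0 and the value γ = 0.1 ω0 is an arbitrary assumption. With the e^{-iωt} convention used here, a positive phase of the complex factor corresponds to the displacement lagging behind the driving force q_e E.

```python
import cmath
import math

def electron_response(omega, omega0, gamma):
    """Complex factor 1 / (w0^2 - i*w*gamma - w^2) from (2.35)."""
    return 1.0 / (omega0**2 - 1j * omega * gamma - omega**2)

def phase_lag_degrees(omega, omega0, gamma):
    """Phase by which r_micro lags the driving force, in degrees."""
    return math.degrees(cmath.phase(electron_response(omega, omega0, gamma)))

lag_below = phase_lag_degrees(0.1, 1.0, 0.1)   # well below resonance
lag_at = phase_lag_degrees(1.0, 1.0, 0.1)      # at resonance
lag_above = phase_lag_degrees(10.0, 1.0, 0.1)  # well above resonance
```

In this sketch the lag is small far below resonance, equals 90 degrees at ω = ω0, and approaches 180 degrees far above resonance, which is the familiar behavior of a damped driven oscillator.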
We are now able to write the polarization in terms of the electric field. By
substituting (2.35) into (2.31) and rearranging, we obtain
$$\mathbf{P} = \epsilon_0 \left(\frac{\omega_p^2}{\omega_0^2 - i\omega\gamma - \omega^2}\right) \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.36}$$
where the plasma frequency $\omega_p$ is
$$\omega_p \equiv \sqrt{\frac{N q_e^2}{\epsilon_0 m_e}} \tag{2.37}$$
A comparison of (2.36) with (2.16) reveals the (complex) susceptibility:
$$\chi(\omega) = \frac{\omega_p^2}{\omega_0^2 - i\omega\gamma - \omega^2} \tag{2.38}$$
The index of refraction is then found by substituting the susceptibility (2.38) into
(2.18). The real and imaginary parts of the index are solved by equating separately
the real and imaginary parts of (2.18), namely
$$(n + i\kappa)^2 = 1 + \chi(\omega) = 1 + \frac{\omega_p^2}{\omega_0^2 - i\omega\gamma - \omega^2} \tag{2.39}$$
A graph of n and κ is given in Fig. 2.5.
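The single-oscillator result (2.39) is straightforward to evaluate. The following sketch uses hypothetical function names and works in units where γ = 1; the choice ωp = 10γ follows the caption of Fig. 2.5, while ω0 = 10γ is an assumption made here purely for illustration.

```python
import cmath

def lorentz_susceptibility(omega, omega_p, omega0, gamma):
    """Complex susceptibility of a single Lorentz oscillator, (2.38)."""
    return omega_p**2 / (omega0**2 - 1j * omega * gamma - omega**2)

def lorentz_index(omega, omega_p, omega0, gamma):
    """Return (n, kappa) from (2.39) using the principal square root."""
    big_n = cmath.sqrt(1 + lorentz_susceptibility(omega, omega_p, omega0, gamma))
    return big_n.real, big_n.imag

GAMMA = 1.0
OMEGA_P = 10.0 * GAMMA   # as in the Fig. 2.5 caption
OMEGA_0 = 10.0 * GAMMA   # assumed resonance frequency (not stated in the text)

n_low, kappa_low = lorentz_index(1.0, OMEGA_P, OMEGA_0, GAMMA)      # far below resonance
n_res, kappa_res = lorentz_index(10.0, OMEGA_P, OMEGA_0, GAMMA)     # at resonance
n_high, kappa_high = lorentz_index(1000.0, OMEGA_P, OMEGA_0, GAMMA) # far above resonance
```

Under these assumptions the absorption κ peaks near the resonance, n exceeds 1 well below resonance, and n approaches 1 from below at very high frequency.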

Most materials actually have more than one species of active electron, and
different active electrons behave differently. The generalization of (2.39) in this
case is
$$(n + i\kappa)^2 = 1 + \chi(\omega) = 1 + \sum_j \frac{f_j\, \omega_{p\,j}^2}{\omega_{0\,j}^2 - i\omega\gamma_j - \omega^2} \tag{2.40}$$
where $f_j$ is the aptly named oscillator strength for the j th species of active electron.
Each species also has its own plasma frequency $\omega_{p\,j}$, natural frequency $\omega_{0\,j}$, and
damping coefficient $\gamma_j$.
Figure 2.5 Real and imaginary parts of the index for a single Lorentz oscillator dielectric with ωp = 10γ.
Lorentz introduced this model well before the development of quantum
mechanics. Even though the model pays no attention to quantum physics, it
works surprisingly well for describing frequency-dependent optical indices and
absorption of light. As it turns out, the Schrödinger equation applied to two levels
in an atom reduces in mathematical form to the Lorentz model in the limit of
low-intensity light. Quantum mechanics also explains the oscillator strength,
which before the development of quantum mechanics had to be inserted ad hoc
to make the model agree with experiments. The friction term γ turns out not to be
associated with something internal to atoms but rather with collisions between
atoms that on average give rise to the same behavior.

2.4 Index of Refraction of a Conductor

In a conducting medium, the outer electrons of atoms are free to move without
being tethered to any particular atom. However, the electrons are still subject to a
damping force due to collisions that remove energy and give rise to absorption.
Such collisions are associated with resistance in a conductor. As it turns out,
we can obtain a simple formula for the refractive index of a conductor from the
Lorentz model in section 2.3. We simply remove the restoring force that binds
electrons to their atoms. That is, we set $\omega_0 = 0$ in (2.39), which gives
$$(n + i\kappa)^2 = 1 - \frac{\omega_p^2}{i\omega\gamma + \omega^2} \tag{2.41}$$
This underscores the fact that ∂P/∂t is a current very much like Jfree. When
we remove the restoring force $k_{\rm Hooke} = m_e \omega_0^2$ from the atomic model, the
electrons effectively become free, and it is not surprising that they exactly mimic the
behavior of a free current Jfree. A graph of n and κ in the conductor model is given
in Fig. 2.6. Below, we provide the derivation for (2.41) in the context of Jfree rather
than as a limiting case of the dielectric model.
Derivation of Refractive Index for a Conductor
We will include the current density Jfree while setting the medium polarization P
to zero. The wave equation is
$$\nabla^2 \mathbf{E} - \epsilon_0 \mu_0 \frac{\partial^2}{\partial t^2} \mathbf{E} = \mu_0 \frac{\partial}{\partial t} \mathbf{J}_{\rm free} \tag{2.42}$$
We assume that the current is made up of individual electrons traveling with
velocity $\mathbf{v}_{\rm micro}$:
$$\mathbf{J}_{\rm free} = N q_e \mathbf{v}_{\rm micro} \tag{2.43}$$
As before, N is the number density of free electrons (in units of number per volume).
Recall that current density Jfree has units of charge times velocity per volume
(or current per cross sectional area), so (2.43) may be thought of as a definition of
current density in a fundamental sense.
Figure 2.6 Real and imaginary parts of the index for conductor with ωp = 50γ.
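Again, the electrons satisfy Newton’s equation of motion, similar to (2.32) except
without a restoring force:
$$m_e \ddot{\mathbf{r}}_{\rm micro} = q_e \mathbf{E} - m_e \gamma \dot{\mathbf{r}}_{\rm micro} \tag{2.44}$$
For a sinusoidal electric field $\mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}$, the solution to this equation is
$$\mathbf{v}_{\rm micro} \equiv \dot{\mathbf{r}}_{\rm micro} = \left(\frac{q_e}{m_e}\right) \frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{\gamma - i\omega} \tag{2.45}$$
where again we assume that the electron oscillation excursions described by $\mathbf{r}_{\rm micro}$
are small compared to the wavelength so that r can be treated as a constant in
(2.44). The current density (2.43) in terms of the electric field is then
$$\mathbf{J}_{\rm free} = \left(\frac{N q_e^2}{m_e}\right) \frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{\gamma - i\omega} \tag{2.46}$$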

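We substitute this together with the electric field into the wave equation (2.42) and
get
$$-k^2 \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\omega^2}{c^2} \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} = -i\omega\mu_0 \left(\frac{N q_e^2}{m_e}\right) \frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{\gamma - i\omega} \tag{2.47}$$
This simplifies down to the dispersion relation
$$k^2 = \frac{\omega^2}{c^2} \left(1 - \frac{\omega_p^2}{i\gamma\omega + \omega^2}\right) \tag{2.48}$$
which agrees with (2.41). We have made the substitution $\omega_p^2 = N q_e^2/\epsilon_0 m_e$ in
accordance with (2.37). As usual, $k^2 = \frac{\omega^2}{c^2}(n + i\kappa)^2 = \frac{\omega^2}{c^2}(1 + \chi)$.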
Note that in the low-frequency limit (i.e. ω << γ), the current density (2.46)
reduces to Ohm’s law J = σE, where σ = N q e2 /m e γ is the DC conductivity. In
the high-frequency limit (i.e. ω >> γ), the behavior changes over to that of a
free plasma, where collisions, which are responsible for resistance, become less
important since the excursions of the electrons during oscillations become very
small. This formula captures the general behavior of metals, but actual values of
the index vary from this somewhat (see P2.6).
In either the conductor or dielectric model, the damping term removes energy
from electron oscillations. The damping term gives rise to an imaginary part
of the index, which causes an exponential attenuation of the plane wave as it
propagates.
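The conductor model can be explored the same way as the dielectric model. The sketch below evaluates (2.41) and the frequency-dependent factor in (2.46). Function names are hypothetical, frequencies are in units of γ, and ωp = 50γ follows the caption of Fig. 2.6.

```python
import cmath

def conductor_index(omega, omega_p, gamma):
    """Return (n, kappa) for the free-electron model, (2.41)."""
    big_n = cmath.sqrt(1 - omega_p**2 / (1j * omega * gamma + omega**2))
    return big_n.real, big_n.imag

def ac_conductivity(omega, sigma_dc, gamma):
    """Ratio J/E from (2.46), written as sigma_dc * gamma / (gamma - i*omega),
    where sigma_dc = N q_e^2 / (m_e gamma) is the DC conductivity."""
    return sigma_dc * gamma / (gamma - 1j * omega)

GAMMA = 1.0
OMEGA_P = 50.0 * GAMMA   # as in the Fig. 2.6 caption

n_below, kappa_below = conductor_index(10.0, OMEGA_P, GAMMA)    # omega < omega_p
n_above, kappa_above = conductor_index(100.0, OMEGA_P, GAMMA)   # omega > omega_p
sigma_slow = ac_conductivity(1e-6, 1.0, GAMMA)                  # omega << gamma
```

In this sketch the wave is strongly attenuated below the plasma frequency (κ much larger than n), propagates with little loss above it, and the current obeys Ohm's law with the DC conductivity when ω is much smaller than γ.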
2.5 Poynting’s Theorem

Until now, we have described light as the propagation of an electromagnetic
disturbance. However, we typically observe light by detecting absorbed energy
rather than the field amplitude directly. In this section we examine the connection
between propagating electromagnetic fields (such as the plane waves discussed
above) and the energy transported by such fields.
In the late 1800s John Poynting developed (from Maxwell’s equations) the
theoretical foundation that describes light energy transport. Students should ap-
preciate and remember the ideas involved, especially the definition and meaning
of the Poynting vector, even if they forget the specifics of its derivation.
Derivation of Poynting’s Theorem
We require just two of Maxwell’s Equations: (1.3) and (1.4). We take the dot product
of B/µ0 with the first equation and the dot product of E with the second equation.
Then by subtracting the second equation from the first we obtain
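$$\frac{\mathbf{B}}{\mu_0} \cdot \left(\nabla \times \mathbf{E}\right) - \mathbf{E} \cdot \left(\nabla \times \frac{\mathbf{B}}{\mu_0}\right) + \epsilon_0 \mathbf{E} \cdot \frac{\partial \mathbf{E}}{\partial t} + \frac{\mathbf{B}}{\mu_0} \cdot \frac{\partial \mathbf{B}}{\partial t} = -\mathbf{E} \cdot \mathbf{J} \tag{2.49}$$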

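The first two terms can be simplified using the vector identity P0.8. The next two
terms are the time derivatives of $\epsilon_0 E^2/2$ and $B^2/2\mu_0$, respectively. The relation
(2.49) then becomes
$$\nabla \cdot \left(\mathbf{E} \times \frac{\mathbf{B}}{\mu_0}\right) + \frac{\partial}{\partial t}\left(\frac{\epsilon_0 E^2}{2} + \frac{B^2}{2\mu_0}\right) = -\mathbf{E} \cdot \mathbf{J} \tag{2.50}$$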
This is Poynting’s theorem. Each term in this equation has units of power per
volume.
The conventional way of writing Poynting’s theorem is as follows:

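$$\nabla \cdot \mathbf{S} + \frac{\partial u_{\rm field}}{\partial t} = -\frac{\partial u_{\rm medium}}{\partial t} \tag{2.51}$$
where
$$\mathbf{S} \equiv \mathbf{E} \times \frac{\mathbf{B}}{\mu_0} \tag{2.52}$$
is called the Poynting vector and has units of power per area, called irradiance.
The expression
$$u_{\rm field} \equiv \frac{\epsilon_0 E^2}{2} + \frac{B^2}{2\mu_0} \tag{2.53}$$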
is the energy per volume stored in the electric and magnetic fields. Derivations of
the electric field energy density and the magnetic field energy density are given in
Appendices 2.A and 2.B. (See (2.69) and (2.76).) Finally,
$$\frac{\partial u_{\rm medium}}{\partial t} \equiv \mathbf{E} \cdot \mathbf{J} \tag{2.54}$$
is the power per volume delivered to the medium. Equation (2.54) is reminiscent
of the familiar circuit power law, Power = Voltage × Current. Power is delivered
when a charged particle traverses a distance while experiencing a force. This
happens when currents flow in the presence of electric fields.
Poynting’s theorem is essentially a statement of the conservation of energy,
where S describes the flow of energy. To appreciate this, consider Poynting’s
theorem (2.51) integrated over a volume V (enclosed by surface S). If we also
apply the divergence theorem (0.11) to the term involving ∇ · S we obtain
$$\oint_S \mathbf{S} \cdot \hat{\mathbf{n}}\, da = -\frac{\partial}{\partial t} \int_V \left(u_{\rm field} + u_{\rm medium}\right) dv \tag{2.55}$$
Notice that the volume integral over energy densities $u_{\rm field}$ and $u_{\rm medium}$ gives
the total energy stored in V, whether in the form of electromagnetic field energy
density or as energy density that has been given to the medium. The integration
of the Poynting vector over the surface gives the net Poynting vector flux directed
outward. Equation (2.55) indicates that the outward Poynting vector flux matches
the rate that total energy disappears from the interior of V. Conversely, if the
Poynting vector is directed inward (negative), then the net inward flux matches
the rate that energy increases within V. The vector S defines the flow of energy
through space. Its units of power per area are just what is needed to describe the
brightness of light impinging on a surface.

John Henry Poynting (1852–1914, English) was the youngest son of a Unitarian minister who operated a school near Manchester, England, where John received his childhood education. He later attended Owens College in Manchester and then went on to Cambridge University where he distinguished himself in mathematics and worked under James Maxwell in the Cavendish Laboratory. Poynting joined the faculty of the University of Birmingham (then called Mason Science College) where he was a professor of physics from 1880 until his death. Besides developing his famous theorem on the conservation of energy in electromagnetic fields, he performed innovative measurements of Newton’s gravitational constant and discovered that the Sun’s radiation draws in small particles towards it, the Poynting-Robertson effect. Poynting was the principal author of a multi-volume undergraduate physics textbook, which was in wide use until the 1930s.
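One way to see that the Poynting vector (2.52) and the field energy density (2.53) describe the same energy is to evaluate both for a vacuum plane wave, where B = E/c and the two fields are perpendicular. The sketch below is an illustration with hypothetical names and an arbitrary field value; it checks that the instantaneous energy flux equals the energy density times c.

```python
import math

EPS0 = 8.8541878128e-12          # vacuum permittivity, F/m
MU0 = 1.25663706212e-6           # vacuum permeability, H/m
C = 1.0 / math.sqrt(EPS0 * MU0)  # speed of light from c = 1/sqrt(eps0*mu0)

def energy_density(e_field, b_field):
    """Field energy density (2.53), in J/m^3."""
    return EPS0 * e_field**2 / 2 + b_field**2 / (2 * MU0)

def poynting_magnitude(e_field, b_field):
    """Magnitude of S = E x B / mu0 (2.52), assuming E is perpendicular to B."""
    return e_field * b_field / MU0

e_inst = 250.0                   # instantaneous field in V/m (illustrative)
b_inst = e_inst / C              # vacuum plane-wave relation B = E/c
u = energy_density(e_inst, b_inst)
s = poynting_magnitude(e_inst, b_inst)
```

The electric and magnetic contributions to u are equal for this wave, and s equals c times u: the stored energy is carried along at the speed of light.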

2.6 Irradiance of a Plane Wave

Consider the electric field wave described by (2.10). The magnetic field that
accompanies this electric field can be found from Maxwell’s equation (1.3), and it
turns out to be (compare with problem P1.2)
$$\mathbf{B}(\mathbf{r}, t) = \frac{\mathbf{k} \times \mathbf{E}_0}{\omega} e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.56}$$
When k is complex, B is out of phase with E, and this occurs when absorption
takes place. When there is no absorption, then k is real, and B and E carry the
same complex phase.
Before computing the Poynting vector (2.52), which involves multiplication,
we must remember our unspoken agreement that only the real parts of the fields
are relevant. We necessarily remove the imaginary parts before multiplying (see
(0.23)). To obtain the real parts of the fields, we add their respective complex
conjugates and divide the result by 2 (see (0.30)). The real field associated with
(2.10) is
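$$\mathbf{E}(\mathbf{r}, t) = \frac{1}{2}\left[\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \mathbf{E}_0^* e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)}\right] \tag{2.57}$$
and the real field associated with (2.56) is
$$\mathbf{B}(\mathbf{r}, t) = \frac{1}{2}\left[\frac{\mathbf{k} \times \mathbf{E}_0}{\omega} e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\mathbf{k}^* \times \mathbf{E}_0^*}{\omega} e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)}\right] \tag{2.58}$$
We have merely exercised our previous (conspiratorial) agreement that only the
real parts of (2.10) and (2.56) are to be retained.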
Now we are ready to calculate the Poynting vector. The algebra is a little messy
in general, so we restrict the analysis to the case of an isotropic medium for the
sake of simplicity.
Calculation of the Poynting Vector for a Plane Wave
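Using (2.57) and (2.58) in (2.52) gives
$$\begin{aligned}
\mathbf{S} &\equiv \mathbf{E} \times \frac{\mathbf{B}}{\mu_0} \\
&= \frac{1}{2}\left[\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \mathbf{E}_0^* e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)}\right] \times \frac{1}{2\mu_0}\left[\frac{\mathbf{k} \times \mathbf{E}_0}{\omega} e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\mathbf{k}^* \times \mathbf{E}_0^*}{\omega} e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)}\right] \\
&= \frac{1}{4\mu_0}\left[\frac{\mathbf{E}_0 \times (\mathbf{k} \times \mathbf{E}_0)}{\omega} e^{2i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\mathbf{E}_0^* \times (\mathbf{k} \times \mathbf{E}_0)}{\omega} e^{i(\mathbf{k} - \mathbf{k}^*)\cdot\mathbf{r}} \right. \\
&\qquad\quad \left. + \frac{\mathbf{E}_0 \times (\mathbf{k}^* \times \mathbf{E}_0^*)}{\omega} e^{i(\mathbf{k} - \mathbf{k}^*)\cdot\mathbf{r}} + \frac{\mathbf{E}_0^* \times (\mathbf{k}^* \times \mathbf{E}_0^*)}{\omega} e^{-2i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)}\right] \\
&= \frac{1}{4\mu_0}\left[\frac{k}{\omega} \mathbf{E}_0 \times (\hat{\mathbf{u}} \times \mathbf{E}_0)\, e^{2i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{k}{\omega} \mathbf{E}_0^* \times (\hat{\mathbf{u}} \times \mathbf{E}_0)\, e^{-2\frac{\kappa\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}} + \text{C.C.}\right]
\end{aligned} \tag{2.59}$$
The letters ‘C.C.’ stand for the complex conjugate of what precedes in the square
brackets. The direction of k is specified with the real unit vector û. We have also
used (2.19) to rewrite $i(\mathbf{k} - \mathbf{k}^*)$ as $-2(\kappa\omega/c)\,\hat{\mathbf{u}}$.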

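The assumption of an isotropic medium (not a crystal) means that ∇ · E(r, t) = 0
and therefore û · E0 = 0. We can use this fact together with the BAC-CAB rule P0.3
to reduce the above expression to
$$\mathbf{S} = \frac{\hat{\mathbf{u}}}{4\mu_0}\left[\frac{k}{\omega}\left(\mathbf{E}_0 \cdot \mathbf{E}_0\right) e^{2i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{k}{\omega}\left(\mathbf{E}_0 \cdot \mathbf{E}_0^*\right) e^{-2\frac{\kappa\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}} + \text{C.C.}\right] \tag{2.60}$$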
The final expression shows that (in an isotropic medium) the flow of energy is in
the direction of û (or k). This agrees with our intuition that energy flows in the
direction that the wave propagates.
Very often, we are interested in the time-average of the Poynting vector,
denoted by $\langle \mathbf{S} \rangle_t$. There are no electronics that can keep up with the rapid oscillation
of visible light (i.e. > $10^{14}$ Hz). Therefore, what is always measured is the
time-averaged absorption of energy. Under time averaging, the first term in (2.60)
vanishes since it rapidly oscillates positive and negative. Note that k is the only
factor in the second term that is (potentially) not real. The time-averaged Poynting
vector becomes
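$$\begin{aligned}
\langle \mathbf{S} \rangle_t &= \frac{\hat{\mathbf{u}}}{4\mu_0} \frac{k + k^*}{\omega}\left(\mathbf{E}_0 \cdot \mathbf{E}_0^*\right) e^{-2\frac{\kappa\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}} \\
&= \hat{\mathbf{u}}\, \frac{n\epsilon_0 c}{2}\left(|E_{0x}|^2 + |E_{0y}|^2 + |E_{0z}|^2\right) e^{-2\frac{\kappa\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}}
\end{aligned} \tag{2.61}$$
We have used (2.19) to rewrite $k + k^*$ as $2(n\omega/c)$. We have also used (1.43) to
rewrite $1/\mu_0 c$ as $\epsilon_0 c$.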
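The time-averaging step can be illustrated numerically. For two quantities that oscillate as Re{a e^{-iωt}} and Re{b e^{-iωt}}, the product contains a term oscillating at 2ω, which averages to zero, and a constant term equal to (1/2)Re{a b*}. The sketch below uses hypothetical names and arbitrary complex amplitudes, and averages over one full period by direct sampling.

```python
import cmath
import math

def time_average_product(a, b, samples=1000):
    """Average of Re{a e^{-i w t}} * Re{b e^{-i w t}} over one period."""
    total = 0.0
    for j in range(samples):
        rotation = cmath.exp(-1j * 2 * math.pi * j / samples)
        total += (a * rotation).real * (b * rotation).real
    return total / samples

def half_real_conjugate_product(a, b):
    """The surviving constant term, (1/2) Re{a b*}."""
    return 0.5 * (a * b.conjugate()).real

amp_a = 2.0 + 1.0j    # illustrative complex amplitudes
amp_b = 0.5 - 3.0j
averaged = time_average_product(amp_a, amp_b)
shortcut = half_real_conjugate_product(amp_a, amp_b)
```

The sampled average and the shortcut agree to rounding error. Setting b equal to a gives the familiar factor of one half times the squared magnitude that appears in (2.61).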
The expression (2.61) is formally called the irradiance (with the direction û
included). However, we often speak of the intensity of a field I , which amounts to
the same thing, but without regard for the direction û. The definition of intensity
is thus less specific, and it can be applied, for example, to standing waves where
the net irradiance is technically zero (i.e. counter-propagating plane waves with
zero net energy flow). Nevertheless, atoms in standing waves ‘feel’ the oscillating
field. In general, the intensity is written as
$$ I = \frac{n\epsilon_0 c}{2}\,\mathbf{E}_0\cdot\mathbf{E}_0^* = \frac{n\epsilon_0 c}{2}\left(|E_{0x}|^2+|E_{0y}|^2+|E_{0z}|^2\right) \tag{2.62} $$
where in this case we have ignored absorption (i.e. κ ≈ 0). Alternatively, we could
consider |E0x|^2, |E0y|^2, and |E0z|^2 to include the factor exp(−2(κω/c)û · r) so that
they correspond to the local electric field.
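As a rough numerical illustration of (2.62), the sketch below evaluates the intensity from a set of complex field amplitudes and compares it with a brute-force time average of the instantaneous flux n ε0 c E(t)^2 for a real, linearly polarized field. The function names and the sampling scheme are our own choices for illustration, not part of the text, and absorption is ignored (κ ≈ 0).

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m
C = 299_792_458.0             # vacuum speed of light in m/s


def intensity(e0_components, n=1.0):
    """Intensity (2.62) in W/m^2 from complex amplitudes (E0x, E0y, E0z) in V/m."""
    return 0.5 * n * EPSILON_0 * C * sum(abs(e) ** 2 for e in e0_components)


def time_averaged_flux(e0, n=1.0, steps=1000):
    """Average n*eps0*c*E(t)^2 over one optical period for E(t) = e0*cos(phase).

    This is a hypothetical helper: it samples the phase uniformly over one
    period instead of using the analytic result <cos^2> = 1/2.
    """
    total = 0.0
    for j in range(steps):
        phase = 2.0 * math.pi * j / steps
        total += n * EPSILON_0 * C * (e0 * math.cos(phase)) ** 2
    return total / steps


if __name__ == "__main__":
    # A 1 kV/m field polarized along x, in vacuum.
    print(intensity([1.0e3, 0.0, 0.0]))
    print(time_averaged_flux(1.0e3))
```

The two printed numbers should agree, which is just the statement that the time average of cos^2 is 1/2.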
Appendix 2.A Energy Density of Electric Fields

In this appendix we show that the term ε0 E^2/2 in (2.53) corresponds to the energy
density of an electric field. The electric potential φ(r) (in units of energy per
charge, or in other words volts) describes the potential energy that a charge would
experience if placed at any given point in the field. The electric field and the
potential are connected through
$$ \mathbf{E}(\mathbf{r}) = -\nabla\phi(\mathbf{r}) \tag{2.63} $$
The energy U necessary to assemble a distribution of charges (owing to attraction
or repulsion) can be written in terms of a summation over all of the charges (or
charge density ρ(r)) located within the potential:
$$ U = \frac{1}{2}\int_V \phi(\mathbf{r})\,\rho(\mathbf{r})\,dv \tag{2.64} $$
We consider the potential to arise from the charges themselves. The factor 1/2
is necessary to avoid double counting. To appreciate this factor consider just
two point charges: We only need to count the energy due to one charge in the
presence of the other’s potential to obtain the energy required to bring the charges
together.
A substitution of (1.1) for ρ(r) into (2.64) gives
$$ U = \frac{\epsilon_0}{2}\int_V \phi(\mathbf{r})\,\nabla\cdot\mathbf{E}(\mathbf{r})\,dv \tag{2.65} $$
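Next, we use the vector identity in P0.9 and get
$$ U = \frac{\epsilon_0}{2}\int_V \nabla\cdot\left[\phi(\mathbf{r})\,\mathbf{E}(\mathbf{r})\right]dv - \frac{\epsilon_0}{2}\int_V \mathbf{E}(\mathbf{r})\cdot\nabla\phi(\mathbf{r})\,dv \tag{2.66} $$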
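An application of the divergence theorem (0.11) on the first integral and a
substitution of (2.63) into the second integral yields
$$ U = \frac{\epsilon_0}{2}\oint_S \phi(\mathbf{r})\,\mathbf{E}(\mathbf{r})\cdot\hat{\mathbf{n}}\,da + \frac{\epsilon_0}{2}\int_V \mathbf{E}(\mathbf{r})\cdot\mathbf{E}(\mathbf{r})\,dv \tag{2.67} $$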
We can consider the volume V (enclosed by S) to be as large as we like, say
a sphere of radius R, so that all charges are contained well within it. Then the
surface integral over S vanishes as R → ∞ since φ ∼ 1/R and E ∼ 1/R^2, whereas
da ∼ R^2. Then the total energy is expressed solely in terms of the electric field:
$$ U = \int_{\text{All Space}} u_E(\mathbf{r})\,dv \tag{2.68} $$
where
$$ u_E(\mathbf{r}) \equiv \frac{\epsilon_0 E^2}{2} \tag{2.69} $$
is interpreted as the energy density of the electric field.
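As a sanity check on (2.68) and (2.69), the sketch below treats a case not worked in the text: a thin spherical shell of radius a carrying total charge q. For this assumed geometry (2.64) gives U = qφ(a)/2 = q^2/(8π ε0 a), and integrating the field energy density over all space outside the shell should give the same number. The function names, the logarithmic radial grid, and the cutoff radius are illustrative choices only.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m


def energy_from_potential(q, a):
    """U from (2.64) for a charged thin shell: (1/2) * q * phi(a)."""
    phi_surface = q / (4.0 * math.pi * EPSILON_0 * a)
    return 0.5 * q * phi_surface


def energy_from_field(q, a, r_max_factor=1.0e6, steps=20000):
    """U from (2.68): integrate u_E = eps0*E^2/2 over a < r < a*r_max_factor.

    The field is zero inside the shell and q/(4 pi eps0 r^2) outside.
    A midpoint rule on a logarithmic grid handles the wide range of radii.
    """
    log_start = math.log(a)
    dlog = math.log(r_max_factor) / steps
    total = 0.0
    for j in range(steps):
        r = math.exp(log_start + (j + 0.5) * dlog)
        e_field = q / (4.0 * math.pi * EPSILON_0 * r ** 2)
        u_e = 0.5 * EPSILON_0 * e_field ** 2
        # Volume element of a spherical shell: dv = 4 pi r^2 dr with dr = r dlog.
        total += u_e * 4.0 * math.pi * r ** 2 * r * dlog
    return total


if __name__ == "__main__":
    q, a = 1.0e-9, 0.01  # 1 nC on a shell of radius 1 cm
    print(energy_from_potential(q, a), energy_from_field(q, a))
```

The small residual difference between the two results comes from truncating the integral at a finite radius, which mirrors the argument that the surface term in (2.67) vanishes only as R → ∞.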

Appendix 2.B Energy Density of Magnetic Fields

In a derivation similar to that in appendix 2.A, we consider the energy associated
with magnetic fields. The magnetic vector potential A (r) (in units of energy
per charge×velocity) describes the potential energy that a charge moving with
velocity v would experience if placed in the field. The magnetic field and the
vector potential are connected through
$$ \mathbf{B}(\mathbf{r}) = \nabla\times\mathbf{A}(\mathbf{r}) \tag{2.70} $$
The energy U necessary to assemble a distribution of currents can be written in
terms of a summation over all of the currents (or current density J(r)) located
within the vector potential field:
$$ U = \frac{1}{2}\int_V \mathbf{J}(\mathbf{r})\cdot\mathbf{A}(\mathbf{r})\,dv \tag{2.71} $$
As in (2.64), the factor 1/2 is necessary to avoid double counting the influence of
the currents on each other.
Under the assumption of steady currents (no variations in time), we may
substitute Ampere’s law (1.21) into (2.71), which yields
$$ U = \frac{1}{2\mu_0}\int_V \left[\nabla\times\mathbf{B}(\mathbf{r})\right]\cdot\mathbf{A}(\mathbf{r})\,dv \tag{2.72} $$
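Next we employ the vector identity P0.8 from which the previous expression
becomes
$$ U = \frac{1}{2\mu_0}\int_V \mathbf{B}(\mathbf{r})\cdot\left[\nabla\times\mathbf{A}(\mathbf{r})\right]dv - \frac{1}{2\mu_0}\int_V \nabla\cdot\left[\mathbf{A}(\mathbf{r})\times\mathbf{B}(\mathbf{r})\right]dv \tag{2.73} $$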
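Upon substituting (2.70) into the first integral and applying the divergence
theorem (0.11) on the second integral, this expression for total energy becomes
$$ U = \frac{1}{2\mu_0}\int_V \mathbf{B}(\mathbf{r})\cdot\mathbf{B}(\mathbf{r})\,dv - \frac{1}{2\mu_0}\oint_S \left[\mathbf{A}(\mathbf{r})\times\mathbf{B}(\mathbf{r})\right]\cdot\hat{\mathbf{n}}\,da \tag{2.74} $$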
As was done in connection with (2.67), if we choose a large enough volume (a
sphere with radius R → ∞), the surface integral vanishes since A ∼ 1/R and
B ∼ 1/R^2, whereas da ∼ R^2. The total energy (2.74) then reduces to
$$ U = \int_{\text{All Space}} u_B(\mathbf{r})\,dv \tag{2.75} $$
where
$$ u_B(\mathbf{r}) \equiv \frac{B^2}{2\mu_0} \tag{2.76} $$
is the energy density for a magnetic field.

Appendix 2.C Radiometry Versus Photometry

Photometry refers to the characterization of light sources in the context of the
spectral response of the human eye. However, physicists most often deal with
radiometry, which treats light of any wavelength on equal footing. Table 2.1 lists
several concepts important in radiometry. The last two entries are associated
with the average Poynting flux described in section 2.6.
The concepts used in photometry are similar, except that the radiometric
quantities are multiplied by the spectral response of the human eye, a curve
that peaks at λvac = 555 nm and drops to near zero for wavelengths longer than
λvac = 700 nm or shorter than λvac = 400 nm. Photometric units, which may seem
a little obscure, were first defined in terms of an actual candle with prescribed
dimensions made from whale tallow. The basic unit of luminous power is called
the lumen, defined to be (1/683) W of light with wavelength λvac = 555 nm, the
peak of the eye’s response. More radiant power is required to achieve the same
number of lumens for wavelengths away from the center of the eye’s spectral
response. Photometric units are often used to characterize room lighting as well
as photographic, projection, and display equipment. Table 2.1 gives the names
of the various photometric quantities, which parallel the entries for radiometric
quantities. We include a variety of units that are sometimes encountered.
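The conversion between radiant and luminous power described above can be sketched as follows. Only the factor of 683 lm/W at the 555 nm peak comes from the text; the `eye_response` parameter is a placeholder for the eye's relative spectral response (1 at the peak, smaller elsewhere), which the caller must supply from a tabulated curve. The function names are hypothetical.

```python
LUMENS_PER_WATT_AT_PEAK = 683.0  # 1 lm = (1/683) W at 555 nm


def luminous_power(radiant_power_watts, eye_response=1.0):
    """Luminous power in lumens for monochromatic light.

    eye_response is the relative spectral response of the eye (between 0
    and 1) at the wavelength in question; it equals 1 at 555 nm.
    """
    if not 0.0 <= eye_response <= 1.0:
        raise ValueError("eye_response must lie between 0 and 1")
    return LUMENS_PER_WATT_AT_PEAK * eye_response * radiant_power_watts


def radiant_power_needed(lumens, eye_response=1.0):
    """Radiant power in watts required to produce a given number of lumens."""
    if not 0.0 < eye_response <= 1.0:
        raise ValueError("eye_response must be positive and at most 1")
    return lumens / (LUMENS_PER_WATT_AT_PEAK * eye_response)


if __name__ == "__main__":
    print(luminous_power(1.0e-3))          # 1 mW at the 555 nm peak
    print(radiant_power_needed(1.0, 0.5))  # 1 lm where the response is half of peak
```

The second call illustrates the statement that more radiant power is required to reach the same number of lumens away from the center of the eye's response.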
Radiometric quantities and units

Radiant Power (of a source): Electromagnetic energy emitted per time from a
source. Units: Watts, W = J/s
Radiant Solid-Angle Intensity (of a source): Radiant power per steradian emitted
from a point-like source (4π steradians in a sphere). Units: W/Sr
Radiance or Brightness (of a source): Radiant solid-angle intensity per unit
projected area of an extended source. The projected area foreshortens by cos θ,
where θ is the observation angle relative to the surface normal.
Units: W/(Sr · cm^2)
Radiant Emittance or Exitance (from a source): Radiant power emitted per unit
surface area of an extended source (the Poynting flux leaving). Units: W/cm^2
Irradiance (to a receiver), often called intensity: Electromagnetic power
delivered per area to a receiver (the Poynting flux arriving). Units: W/cm^2

Photometric quantities and units

Luminous Power (of a source): Visible light energy emitted per time from a
source. Units: lumens (lm), lm = (1/683) W @ 555 nm
Luminous Solid-Angle Intensity (of a source): Luminous power per steradian
emitted from a point-like source. Units: candelas (cd), cd = lm/Sr
Luminance (of a source): Luminous solid-angle intensity per projected area of an
extended source. (The projected area foreshortens by cos θ, where θ is the
observation angle relative to the surface normal.) Units: cd/cm^2 = stilb,
cd/m^2 = nit, lambert = 3183 nit, footlambert = 3.4 nit
Luminous Emittance or Exitance (from a source): Luminous power emitted per unit
surface area of an extended source. Units: lm/cm^2
Illuminance (to a receiver): Incident luminous power delivered per area to a
receiver. Units: lm/m^2 = lux, lm/cm^2 = phot, lm/ft^2 = footcandle

Table 2.1 A comparison of radiometric and photometric concepts.

Exercises
Exercises for 2.2 Index of Refraction
P2.1 Verify that (2.35) is a solution to (2.34).
P2.2 Derive the Sellmeier equation
$$ n^2 = 1 + \frac{A\lambda_{\rm vac}^2}{\lambda_{\rm vac}^2 - \lambda_{0,{\rm vac}}^2} $$
from (2.39) for a gas with negligible absorption (i.e. γ ≅ 0, valid far
from resonance ω0), where λ0,vac corresponds to frequency ω0 and A is
a constant. Many materials (e.g. glass, air) have strong resonances in
the ultraviolet. In such materials, do you expect the index of refraction
for blue light to be greater than that for red light? Make a sketch of n as
a function of wavelength for visible light down to the ultraviolet (where
λ0,vac is located).
P2.3 In the Lorentz model, take N = 10^28 m^-3 for the density of bound
electrons in an insulator (note that N is number per volume, not just
number), and a single transition at ω0 = 6 × 10^15 rad/sec (in the UV),
and damping γ = ω0/5 (quite broad). Assume E0 is 10^4 V/m.
For three frequencies ω = ω0 − 2γ, ω = ω0 , and ω = ω0 + 2γ find the
magnitude and phase of the following (give the phase relative to the
phase of E 0 ). Give correct SI units with each quantity. You don’t need
to worry about vector directions.
(a) The charge displacement amplitude r_micro (2.35)
(b) The polarization amplitude P (ω)
(c) The susceptibility χ(ω). What would the susceptibility be for twice
the E-field strength as before?
For the following no phase is needed:
(d) Find n and κ at the three frequencies. You will have to solve for the
real and imaginary parts of (n + i κ)2 = 1 + χ(ω).
(e) Find the three speeds of light in terms of c. Find the three wave-
lengths λ.
(f) Find how far light penetrates into the material before only 1/e of the
amplitude of E remains. Find how far light penetrates into the material
before only 1/e of the intensity I remains.
P2.4 (a) Use a computer graphing program and the Lorentz model to plot n
and κ as a function of frequency ω for a dielectric (i.e. obtain graphs
such as the ones in Fig. 2.5). Use these parameters to keep things
simple: ωp = 1, ω0 = 10, and γ = 1; plot your function from ω = 0 to
ω = 20.
(b) Plot n and κ as a function of frequency for a material that has
three resonant frequencies: ω0,1 = 10, γ1 = 1, f1 = 0.5; ω0,2 = 15, γ2 = 1,
f2 = 0.25; and ω0,3 = 25, γ3 = 3, f3 = 0.25. Use ωp = 1 for all three
resonances, and plot the results from ω = 0 to ω = 30. Comment on
your plots.
Exercises for 2.4 Index of Refraction of a Conductor
P2.5 For silver, the complex refractive index is characterized by n = 0.14
and κ = 4.0. Find the distance that light travels inside of silver before
the field is reduced by a factor of 1/e. Assume a wavelength of λvac =
633 nm. What is the speed of the wave crests in the silver (written as a
number times c)? Are you surprised?
P2.6 Use (2.27), (2.29), and (2.48) to estimate the index of silver at λ = 633 nm.
The density of free electrons in silver is N = 5.86 × 10^28 m^-3 and the
DC conductivity is σ = 6.62 × 10^7 C^2/(J · m · s). Compare with the actual
index given in P2.5.
Answer: n + i κ = 0.02 + i 4.50
P2.7 The dielectric model and the conductor model give identical results
for n in the case of a low-density plasma where there is no restoring
force (i.e. ω0 = 0) and no dragging term (i.e. γ = 0). Use this to model
the ionosphere (the uppermost part of the atmosphere that is ionized
by solar radiation to form a low-density plasma).
(a) If the index of refraction of the ionosphere is n = 0.9 for an FM
station at ν = ω/2π = 100 MHz, calculate the number of free electrons
per cubic meter.
(b) What is the complex refractive index of the ionosphere for radio
waves at 1160 kHz (KSL radio station)? Is this frequency above or below
the plasma frequency? Assume the same density of free electrons as in
part (a).
For your information, AM radio reflects better than FM radio from the
ionosphere (like visible light from a metal mirror). At night, the lower
layer of the ionosphere goes away so that AM radio waves reflect from
a higher layer.
P2.8 Use a computer to plot n and κ as a function of frequency for a
conductor (obtain plots such as the ones in Fig. 2.6). To keep things
simple, let γ = 0.02ωp and plot your function from
ω = 0.6ωp to ω = 2ωp.

Exercises for 2.6 Irradiance of a Plane Wave
P2.9 In the case of a linearly-polarized plane wave, where the phase of each
vector component of E0 is the same, re-derive (2.61) directly from the
real field (2.21). For simplicity, you may ignore absorption (i.e. κ ≅ 0).
HINT: The time-average of cos^2(k · r − ωt + φ) is 1/2.
P2.10 (a) Find the intensity (in W/cm^2) produced by a short laser pulse (linearly
polarized) with duration ∆t = 2.5 × 10^-14 s and energy E = 100 mJ,
focused in vacuum to a round spot with radius r = 5 µm.
(b) What is the peak electric field (in V/Å)?
HINT: The SI units of electric field are N/C = V/m.
(c) What is the peak magnetic field (in T = kg/(s · C))?
P2.11 (a) What is the intensity (in W/cm^2) on the retina when looking directly
at the sun? Assume that the eye’s pupil has a radius r_pupil = 1 mm.
Take the Sun’s irradiance at the earth’s surface to be 1.4 kW/m^2, and
neglect refractive index (i.e. set n = 1). HINT: The Earth-Sun distance
is d_o = 1.5 × 10^8 km and the pupil-retina distance is d_i = 22 mm. The
radius of the Sun r_Sun = 7.0 × 10^5 km is de-magnified on the retina
according to the ratio d_i/d_o.
(b) What is the intensity at the retina when looking directly into a
1 mW HeNe laser? Assume that the smallest radius of the laser beam
is r waist = 0.5 mm positioned d o = 2 m in front of the eye, and that the
entire beam enters the pupil. Compare with part (a).
P2.12 Show that the magnetic field of an intense laser with λ = 1 µm becomes
important for a free electron oscillating in the field at intensities above
10^18 W/cm^2. This marks the transition to relativistic physics. Nevertheless,
for convenience, use classical physics in making the estimate.
HINT: At lower intensities, the oscillating electric field dominates, so
the electron motion can be thought of as arising solely from the electric
field. Use this motion to calculate the magnetic force on the moving
electron, and compare it to the electric force. The forces become
comparable at 10^18 W/cm^2.

Chapter 3
Reflection and Refraction
As we know from everyday experience, when light arrives at an interface between
materials it is partially reflected and partially transmitted. In this chapter, we
examine what happens to plane waves when they propagate from one material
(characterized by index n or even by complex index N ) to another material. We
will derive expressions to quantify the amount of reflection and transmission.
The results depend on the angle of incidence (i.e. the angle between k and the
normal to the surface) as well as on the orientation of the electric field (called
polarization—not to be confused with P, also called polarization). In this chap-
ter, we consider only isotropic materials (e.g. glass); in chapter 5 we consider
anisotropic materials (e.g. a crystal).
As we develop the connection between incident, reflected, and transmitted
light waves, several familiar relationships will emerge naturally (e.g. Snell’s law
and Brewster’s angle). The formalism also describes polarization-dependent
phase shifts upon reflection (especially interesting in the case of total internal
reflection or in the case of reflections from absorbing surfaces such as metals).
For simplicity, we initially neglect the imaginary part of the refractive index.
Each plane wave is thus characterized by a real wave vector k. We will write each
plane wave in the form E(r, t ) = E0 exp [i (k · r − ωt )], where, as usual, only the real
part of the field corresponds to the physical field. The restriction to real refractive
indices is not as serious as it might seem. The use of the letter n instead of N
hardly matters. The math is all the same, which demonstrates the power of the
complex notation. We can simply update our expressions in the end to include
complex refractive indices, but in the meantime it is easier to think of absorption
as negligible.
3.1 Refraction at an Interface

To study the reflection and transmission of light at a material interface, we will
examine three distinct waves traveling in the directions ki, kr, and kt as depicted
in Fig. 3.1, to which we will often refer in the upcoming development. We
assume a planar boundary between the two materials. The index n_i characterizes
the material on the left, and the index n_t characterizes the material on the right.
ki specifies an incident plane wave making an angle θi with the normal to the
interface. kr specifies a reflected plane wave making an angle θr with the interface
normal. These two waves exist only to the left of the interface. kt specifies
a transmitted plane wave making an angle θt with the interface normal. The
transmitted wave exists only to the right of the material interface.
We choose the y–z plane to be the plane of incidence, containing ki , kr , and
kt (i.e. the plane represented by the surface of this page). By symmetry, all three
k-vectors must lie in a single plane, assuming an isotropic material. We are free to
orient our coordinate system in many different ways (and every textbook seems
to do it differently!). We choose the normal incidence on the interface to be along
the z-direction. The x-axis points into the page.
In an isotropic medium, the electric field amplitude E0 is confined to a plane
perpendicular to k. For a given ki, the electric field vector Ei can be decomposed
into arbitrary components as long as they are perpendicular to ki. For convenience,
we choose one of the electric field vector components to be that which
lies within the plane of incidence, as depicted in Fig. 3.1. E_i^(p) denotes this
component, represented by an arrow in the plane of the page. We call this p-polarized
light, where p stands for parallel to the plane of incidence.
The remaining electric field vector component, denoted by E_i^(s), is directed
normal to the plane of incidence. The superscript s stands for senkrecht, a German
word meaning perpendicular. We call this s-polarized light. In Fig. 3.1, E_i^(s) is
represented by the tail of an arrow pointing into the page, or the x-direction, by
our convention.
Figure 3.1 Incident, reflected, and transmitted plane wave fields at a
material interface. (Axis labels in the figure: z-axis; x-axis directed into page.)
The other fields Er and Et are similarly split into s and p components as
indicated in Fig. 3.1. All field components are considered to be positive when they
point in the direction of their respective arrows.1
By inspection of Fig. 3.1, we can write the various k-vectors in terms of the ŷ
and ẑ unit vectors:
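$$ \begin{aligned} \mathbf{k}_i &= k_i\left(\hat{\mathbf{y}}\sin\theta_i + \hat{\mathbf{z}}\cos\theta_i\right) \\ \mathbf{k}_r &= k_r\left(\hat{\mathbf{y}}\sin\theta_r - \hat{\mathbf{z}}\cos\theta_r\right) \\ \mathbf{k}_t &= k_t\left(\hat{\mathbf{y}}\sin\theta_t + \hat{\mathbf{z}}\cos\theta_t\right) \end{aligned} \tag{3.1} $$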
Also by inspection of Fig. 3.1 (following the conventions for the electric fields
indicated by the arrows), we can write the incident, reflected, and transmitted
fields in terms of x̂, ŷ, and ẑ:
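$$ \begin{aligned} \mathbf{E}_i &= \left[E_i^{(p)}\left(\hat{\mathbf{y}}\cos\theta_i - \hat{\mathbf{z}}\sin\theta_i\right) + \hat{\mathbf{x}}E_i^{(s)}\right]e^{i\left[k_i\left(y\sin\theta_i + z\cos\theta_i\right) - \omega_i t\right]} \\ \mathbf{E}_r &= \left[E_r^{(p)}\left(\hat{\mathbf{y}}\cos\theta_r + \hat{\mathbf{z}}\sin\theta_r\right) + \hat{\mathbf{x}}E_r^{(s)}\right]e^{i\left[k_r\left(y\sin\theta_r - z\cos\theta_r\right) - \omega_r t\right]} \\ \mathbf{E}_t &= \left[E_t^{(p)}\left(\hat{\mathbf{y}}\cos\theta_t - \hat{\mathbf{z}}\sin\theta_t\right) + \hat{\mathbf{x}}E_t^{(s)}\right]e^{i\left[k_t\left(y\sin\theta_t + z\cos\theta_t\right) - \omega_t t\right]} \end{aligned} \tag{3.2} $$
1 Many textbooks draw the arrow for E_r^(p) in the direction opposite of ours. However, that choice
leads to an awkward situation at normal incidence (i.e. θi = θr = 0) where the arrows for the incident
and reflected fields are parallel for the s-component but antiparallel for the p-component.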

Each field has the form (2.8), and we have utilized the k-vectors (3.1) in the
exponents of (3.2).
Now we are ready to connect the fields on one side of the interface to the
fields on the other side. This is done using boundary conditions. As explained
in appendix 3.A, Maxwell’s equations require that the components of E that are
parallel to the interface must be the same on either side of the boundary. In
our coordinate system, the x̂ and ŷ components are parallel to the interface, and
z = 0 defines the interface. This means that at z = 0 the x̂ and ŷ components
of the combined incident and reflected fields must equal the corresponding
components of the transmitted field:
$$ \left[E_i^{(p)}\hat{\mathbf{y}}\cos\theta_i + \hat{\mathbf{x}}E_i^{(s)}\right]e^{i\left(k_i y\sin\theta_i - \omega_i t\right)} + \left[E_r^{(p)}\hat{\mathbf{y}}\cos\theta_r + \hat{\mathbf{x}}E_r^{(s)}\right]e^{i\left(k_r y\sin\theta_r - \omega_r t\right)} = \left[E_t^{(p)}\hat{\mathbf{y}}\cos\theta_t + \hat{\mathbf{x}}E_t^{(s)}\right]e^{i\left(k_t y\sin\theta_t - \omega_t t\right)} \tag{3.3} $$
Figure 3.2 Animation of s- and p-polarized fields incident on an interface as
the angle of incidence is varied.
Since this equation must hold for all conceivable values of t and y, we are com-
pelled to set all the phase factors in the complex exponentials equal to each other.
The time portion of the phase factors requires the frequency of all waves to be the
same:
ωi = ωr = ωt ≡ ω (3.4)
(We could have guessed that all frequencies would be the same; otherwise wave
fronts would be annihilated or created at the interface.) Similarly, equating the
spatial terms in the exponents of (3.3) requires
k i sin θi = k r sin θr = k t sin θt (3.5)
Now recall from (2.19) the relations k_i = k_r = n_i ω/c and k_t = n_t ω/c. With these
relations, (3.5) yields the law of reflection
$$ \theta_r = \theta_i \tag{3.6} $$
and Snell’s law
$$ n_i\sin\theta_i = n_t\sin\theta_t \tag{3.7} $$
The three angles θi, θr, and θt are not independent. The reflected angle matches
the incident angle, and the transmitted angle obeys Snell’s law. The phenomenon
of refraction refers to the fact that θi and θt are different.

Willebrord Snell (or Snellius) (1580–1626, Dutch) was an astronomer and
mathematician born in Leiden, Netherlands. In 1613 he succeeded his father
as professor of mathematics at the University of Leiden. He was an accomplished
mathematician, developing a new method for calculating π as well
as an improved method for measuring the circumference of the earth. He is
most famous for his rediscovery of the law of refraction in 1621. (The law was
known (in table form) to the ancient Greek mathematician Ptolemy, to Arab
engineer Ibn Sahl (900s), and to Polish philosopher Witelo (1200s).) Snell
authored several books, including one on trigonometry, published a year after his
death.

Because the exponents are all identical, (3.3) reduces to two relatively simple
equations (one for each dimension, x̂ and ŷ):
$$ E_i^{(s)} + E_r^{(s)} = E_t^{(s)} \tag{3.8} $$
and
$$ \left(E_i^{(p)} + E_r^{(p)}\right)\cos\theta_i = E_t^{(p)}\cos\theta_t \tag{3.9} $$
We have derived these equations from the boundary condition (3.52) on the
parallel component of the electric field. This set of equations has four unknowns
(E_r^(p), E_r^(s), E_t^(p), and E_t^(s)) assuming that we pick the incident fields, so we require
two further equations to solve the system. These are obtained using the separate
boundary condition on the parallel component of magnetic fields given in (3.56)
(also discussed in appendix 3.A).
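The angular relations (3.6) and (3.7) are easy to evaluate numerically. The following is a minimal sketch, assuming real refractive indices and angles in radians; the function name and the choice to return None when no real transmitted angle exists (the total internal reflection case taken up in section 3.5) are ours.

```python
import math


def refraction_angle(n_i, n_t, theta_i):
    """Transmitted angle theta_t (radians) from Snell's law (3.7).

    Returns None when n_i*sin(theta_i)/n_t exceeds 1, i.e. when no real
    transmitted angle exists.
    """
    sin_theta_t = n_i * math.sin(theta_i) / n_t
    if abs(sin_theta_t) > 1.0:
        return None
    return math.asin(sin_theta_t)


if __name__ == "__main__":
    theta_i = math.radians(30.0)
    theta_r = theta_i  # law of reflection (3.6)
    theta_t = refraction_angle(1.0, 1.5, theta_i)  # air to glass
    print(math.degrees(theta_r), math.degrees(theta_t))
    # Going from glass to air at a steep angle gives no real theta_t.
    print(refraction_angle(1.5, 1.0, math.radians(60.0)))
```

For light going from air (n_i = 1) into glass (n_t = 1.5) at 30°, sin θt = 1/3, so the transmitted ray bends toward the normal.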
From Faraday’s law (1.3), we have for a plane wave (see (2.56))
$$ \mathbf{B} = \frac{\mathbf{k}\times\mathbf{E}}{\omega} = \frac{n}{c}\,\hat{\mathbf{u}}\times\mathbf{E} \tag{3.10} $$
where û ≡ k/k is a unit vector in the direction of k. We have also utilized (2.19)
for a real index. This expression is useful for writing Bi , Br , and Bt in terms of the
electric field components that we have already introduced. When injecting (3.1)
and (3.2) into (3.10), the incident, reflected, and transmitted magnetic fields turn
out to be
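$$ \begin{aligned} \mathbf{B}_i &= \frac{n_i}{c}\left[-\hat{\mathbf{x}}E_i^{(p)} + E_i^{(s)}\left(-\hat{\mathbf{z}}\sin\theta_i + \hat{\mathbf{y}}\cos\theta_i\right)\right]e^{i\left[k_i\left(y\sin\theta_i + z\cos\theta_i\right) - \omega_i t\right]} \\ \mathbf{B}_r &= \frac{n_r}{c}\left[\hat{\mathbf{x}}E_r^{(p)} + E_r^{(s)}\left(-\hat{\mathbf{z}}\sin\theta_r - \hat{\mathbf{y}}\cos\theta_r\right)\right]e^{i\left[k_r\left(y\sin\theta_r - z\cos\theta_r\right) - \omega_r t\right]} \\ \mathbf{B}_t &= \frac{n_t}{c}\left[-\hat{\mathbf{x}}E_t^{(p)} + E_t^{(s)}\left(-\hat{\mathbf{z}}\sin\theta_t + \hat{\mathbf{y}}\cos\theta_t\right)\right]e^{i\left[k_t\left(y\sin\theta_t + z\cos\theta_t\right) - \omega_t t\right]} \end{aligned} \tag{3.11} $$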
Next, we apply the boundary condition (3.56), namely that the components of B
parallel to the interface (i.e. in the x̂ and ŷ dimensions) are the same2 on either
side of the plane z = 0. Since we already know that the exponents are all equal
and that θr = θi and n i = n r , the boundary condition gives
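$$ \frac{n_i}{c}\left[-\hat{\mathbf{x}}E_i^{(p)} + E_i^{(s)}\hat{\mathbf{y}}\cos\theta_i\right] + \frac{n_i}{c}\left[\hat{\mathbf{x}}E_r^{(p)} - E_r^{(s)}\hat{\mathbf{y}}\cos\theta_i\right] = \frac{n_t}{c}\left[-\hat{\mathbf{x}}E_t^{(p)} + E_t^{(s)}\hat{\mathbf{y}}\cos\theta_t\right] \tag{3.12} $$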
As before, (3.12) reduces to two relatively simple equations (one for the x̂ dimen-
sion and one for the ŷ dimension):
$$ n_i\left(E_i^{(p)} - E_r^{(p)}\right) = n_t E_t^{(p)} \tag{3.13} $$
and
$$ n_i\left(E_i^{(s)} - E_r^{(s)}\right)\cos\theta_i = n_t E_t^{(s)}\cos\theta_t \tag{3.14} $$
These two equations together with (3.8) and (3.9) allow us to solve for the reflected
fields Er and transmitted fields Et for the s and p polarization components. However,
(3.8), (3.9), (3.13), and (3.14) are not yet in their most convenient form.
3.2 The Fresnel Coefficients

Augustin Fresnel first derived the results of the previous section. Since he lived
well before Maxwell’s time, he did not have the benefit of Maxwell’s equations as
we have. Instead, Fresnel thought of light as transverse mechanical waves
propagating within materials. (Fresnel was naturally a proponent of ‘luminiferous
ether’.) Instead of relating the parallel components of the electric and magnetic
fields across the boundary between the materials, Fresnel used the principle that,
as a transverse mechanical wave propagates from one material to the other, the
two materials should not slip past each other at the interface. This “gluing” of the
materials at the interface also forbids the possibility of the materials detaching
from one another (creating gaps) or passing through one another as they
experience wave vibrations. This mechanical approach to light worked splendidly
and explained polarization effects along with the variations in reflectance and
transmittance as a function of the incident angle of the light.
2 We assume the permeability µ0 is the same everywhere (no magnetic effects).
Fresnel wrote the relationships between the various plane waves depicted
in Fig. 3.1 in terms of coefficients that compare the reflected and transmitted
field amplitudes to those of the incident field. He then calculated the ratio of
the reflected and transmitted field components to the incident field components
for each polarization. In the following example, we illustrate this procedure for
s-polarized light. It is left as a homework exercise to solve the equations for
p-polarized light (see P3.1).

Augustin Fresnel (1788–1829, French) was born in Broglie, France, the son of
an architect. As a child, he was slow to develop and still could not read when he
was eight years old, but by age sixteen he excelled and entered the École
Polytechnique where he earned distinction. As a young man, Fresnel began a
successful career as an engineer, but he lost his post in 1815 when Napoleon returned
to power. (Fresnel had supported the Bourbons.) This difficult year was when
Fresnel turned his attention to optics. Fresnel became a major proponent of
the wave theory of light and four years later wrote a paper on diffraction for
which he was awarded a prize by the French Academy of Sciences. A year
later he was appointed commissioner of lighthouses, which motivated the
invention of the Fresnel lens (still used in many commercial applications). Fresnel
was underappreciated before his untimely death from tuberculosis. Many
of his papers did not make it into print until years later. Fresnel made huge
advances in the understanding of reflection, diffraction, polarization, and
birefringence. In 1824 Fresnel wrote to Thomas Young, “All the compliments
that I have received from Arago, Laplace and Biot never gave me so
much pleasure as the discovery of a theoretic truth, or the confirmation of
a calculation by experiment.” Augustin Fresnel is a hero of one of the authors
of this textbook.

Example 3.1

Calculate the ratio of transmitted field to the incident field and the ratio of the
reflected field to incident field for s-polarized light.

Solution: We use (3.8)
$$ E_i^{(s)} + E_r^{(s)} = E_t^{(s)} \qquad \text{[3.8]} $$
and (3.14), which with the help of Snell’s law is written
$$ E_i^{(s)} - E_r^{(s)} = \frac{\sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i}E_t^{(s)} \tag{3.15} $$
If we add these two equations, we get
$$ 2E_i^{(s)} = \left[1 + \frac{\sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i}\right]E_t^{(s)} \tag{3.16} $$
and after dividing by E_i^(s) and doing a little algebra, it turns into
$$ \frac{E_t^{(s)}}{E_i^{(s)}} = \frac{2\sin\theta_t\cos\theta_i}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t} $$
To get the ratio of reflected to incident, we subtract (3.15) from (3.8) to obtain
$$ 2E_r^{(s)} = \left[1 - \frac{\sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i}\right]E_t^{(s)} \tag{3.17} $$
and then divide (3.17) by (3.16). After a little algebra, we arrive at
$$ \frac{E_r^{(s)}}{E_i^{(s)}} = \frac{\sin\theta_t\cos\theta_i - \sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t} $$

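The ratios of the reflected and transmitted field components to the incident
field components are specified by the following coefficients, called the Fresnel
coefficients:
$$ r_s \equiv \frac{E_r^{(s)}}{E_i^{(s)}} = \frac{\sin\theta_t\cos\theta_i - \sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t} = -\frac{\sin\left(\theta_i - \theta_t\right)}{\sin\left(\theta_i + \theta_t\right)} = \frac{n_i\cos\theta_i - n_t\cos\theta_t}{n_i\cos\theta_i + n_t\cos\theta_t} \tag{3.18} $$
$$ t_s \equiv \frac{E_t^{(s)}}{E_i^{(s)}} = \frac{2\sin\theta_t\cos\theta_i}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t} = \frac{2\sin\theta_t\cos\theta_i}{\sin\left(\theta_i + \theta_t\right)} = \frac{2n_i\cos\theta_i}{n_i\cos\theta_i + n_t\cos\theta_t} \tag{3.19} $$
$$ r_p \equiv \frac{E_r^{(p)}}{E_i^{(p)}} = \frac{\cos\theta_t\sin\theta_t - \cos\theta_i\sin\theta_i}{\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i} = -\frac{\tan\left(\theta_i - \theta_t\right)}{\tan\left(\theta_i + \theta_t\right)} = \frac{n_i\cos\theta_t - n_t\cos\theta_i}{n_i\cos\theta_t + n_t\cos\theta_i} \tag{3.20} $$
$$ t_p \equiv \frac{E_t^{(p)}}{E_i^{(p)}} = \frac{2\cos\theta_i\sin\theta_t}{\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i} = \frac{2\cos\theta_i\sin\theta_t}{\sin\left(\theta_i + \theta_t\right)\cos\left(\theta_i - \theta_t\right)} = \frac{2n_i\cos\theta_i}{n_i\cos\theta_t + n_t\cos\theta_i} \tag{3.21} $$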
All of the above forms of the Fresnel coefficients are potentially useful, depending
on the problem at hand. Remember that the angles in the coefficient are not inde-
pendently chosen, but are subject to Snell’s law (3.7). (The right-most expression
for each coefficient is obtained from the first form using Snell’s law).
The Fresnel coefficients pin down the electric field amplitudes on the two
sides of the boundary. They also keep track of phase shifts at a boundary. In
Fig. 3.3 we have plotted the Fresnel coefficients for the case of an air-glass interface.
Notice that the reflection coefficients are sometimes negative in this plot, which
corresponds to a phase shift of π upon reflection (remember e^{iπ} = −1). Later we
will see that when absorbing materials are encountered, more complicated phase
shifts can arise due to the complex index of refraction.
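The right-most forms of (3.18)–(3.21) translate directly into a few lines of code. The sketch below assumes real indices and an incident angle below any critical angle; the function name and return order are our own. It can be used to reproduce curves like those in Fig. 3.3 by sweeping θi.

```python
import math


def fresnel_coefficients(n_i, n_t, theta_i):
    """Return (r_s, t_s, r_p, t_p) for real indices, theta_i in radians.

    Uses the right-most forms of (3.18)-(3.21); theta_t follows from
    Snell's law (3.7). Raises ValueError if no real theta_t exists.
    """
    sin_t = n_i * math.sin(theta_i) / n_t
    if abs(sin_t) > 1.0:
        raise ValueError("no real transmitted angle (total internal reflection)")
    cos_i = math.cos(theta_i)
    cos_t = math.sqrt(1.0 - sin_t ** 2)
    r_s = (n_i * cos_i - n_t * cos_t) / (n_i * cos_i + n_t * cos_t)
    t_s = 2.0 * n_i * cos_i / (n_i * cos_i + n_t * cos_t)
    r_p = (n_i * cos_t - n_t * cos_i) / (n_i * cos_t + n_t * cos_i)
    t_p = 2.0 * n_i * cos_i / (n_i * cos_t + n_t * cos_i)
    return r_s, t_s, r_p, t_p


if __name__ == "__main__":
    # Air-glass interface, as in Fig. 3.3, sampled every 15 degrees.
    for degrees in range(0, 90, 15):
        coefficients = fresnel_coefficients(1.0, 1.5, math.radians(degrees))
        print(degrees, [round(value, 4) for value in coefficients])
```

At normal incidence on an air-glass interface both reflection coefficients come out to −0.2, the negative sign being the phase shift of π mentioned above.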
Figure 3.3 The Fresnel coefficients plotted versus θi for the case of an
air-glass interface with n_i = 1 and n_t = 1.5.
3.3 Reflectance and Transmittance

We are often interested in knowing the fraction of intensity that transmits through
or reflects from a boundary. Since intensity is proportional to the square of the
amplitude of the electric field, we can write the fraction of the light reflected from
the surface, or reflectance, in terms of the Fresnel coefficients:
$$ R_s \equiv |r_s|^2 \quad\text{and}\quad R_p \equiv |r_p|^2 \tag{3.22} $$
These expressions are applied individually to each polarization component (s or
p). The intensity reflected for each of these orthogonal polarizations is additive
because the two electric fields are orthogonal and cannot interfere with each
other. The total reflected intensity is therefore
$$ I_r^{(\text{total})} = I_r^{(s)} + I_r^{(p)} = R_s I_i^{(s)} + R_p I_i^{(p)} \tag{3.23} $$
where the incident intensity is given by (2.62):
$$ I_i^{(\text{total})} = I_i^{(s)} + I_i^{(p)} = \frac{1}{2}n_i\epsilon_0 c\left[\left|E_i^{(s)}\right|^2 + \left|E_i^{(p)}\right|^2\right] \tag{3.24} $$

Since intensity is power per area, we can rewrite (3.23) in terms of incident and
reflected power:
$$ P_r^{(\text{total})} = P_r^{(s)} + P_r^{(p)} = R_s P_i^{(s)} + R_p P_i^{(p)} \tag{3.25} $$
Using this expression and requiring that energy be conserved (i.e. P_i^(total) = P_r^(total) +
P_t^(total)), we find that the portion of the power that transmits is
$$ P_t^{(\text{total})} = \left(P_i^{(s)} + P_i^{(p)}\right) - \left(P_r^{(s)} + P_r^{(p)}\right) = \left(1 - R_s\right)P_i^{(s)} + \left(1 - R_p\right)P_i^{(p)} \tag{3.26} $$
From this expression we see that the transmittance (i.e. the fraction of the light
that transmits) for either polarization is
Ts ≡ 1 − Rs and Tp ≡ 1 − Rp (3.27)
Figure 3.4 shows typical reflectance and transmittance values for an air-glass
interface.
Figure 3.4 The reflectance and transmittance plotted versus θi for
the case of an air-glass interface with n_i = 1 and n_t = 1.5.
You might be surprised at first to learn that
$$ T_s \neq |t_s|^2 \quad\text{and}\quad T_p \neq |t_p|^2 \tag{3.28} $$
However, recall that the transmitted intensity (in terms of the transmitted fields)
depends also on the refractive index. The Fresnel coefficients t_s and t_p relate the
bare electric fields to each other, whereas the transmitted intensity (similar to
(3.24)) is
$$ I_t^{(\text{total})} = I_t^{(s)} + I_t^{(p)} = \frac{1}{2}n_t\epsilon_0 c\left[\left|E_t^{(s)}\right|^2 + \left|E_t^{(p)}\right|^2\right] \tag{3.29} $$
Therefore, we expect T s and T p to depend on the ratio of the refractive indices n t
and n i as well as on the squares of t s and t p .
There is another more subtle reason for the inequalities in (3.28). Consider
a lateral strip of light associated with a plane wave incident upon the material
interface in Fig. 3.5. Upon refraction into the second medium, the strip is seen
to change its width by the factor cos θt / cos θi . This is a geometrical effect, owing
to the change in propagation direction at the interface. The change in direction
alters the intensity (power per area) but not the power. In computing the trans-
mittance, we must remove this geometrical effect from the ratio of the intensities,
which leads to the following transmittance coefficients:
$$T_s = \frac{n_t \cos\theta_t}{n_i \cos\theta_i}\, |t_s|^2, \qquad T_p = \frac{n_t \cos\theta_t}{n_i \cos\theta_i}\, |t_p|^2 \qquad \text{(valid when no total internal reflection)} \tag{3.30}$$
Figure 3.5 Light refracting into a surface
Note that (3.30) is valid only if a real angle θt exists; it does not hold when the
incident angle exceeds the critical angle for total internal reflection, discussed in
section 3.5. In that situation, we must stick with (3.27).
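
The role of the geometric and index factors in (3.30) can be checked numerically. The following is a minimal sketch, assuming real indices, an incident angle below any critical angle, and the index-based forms of the Fresnel coefficients (the function name `fresnel` and the 60° sample angle are illustrative choices, not taken from the text). It shows that $R + T = 1$ holds with (3.30), while $R + |t|^2$ does not sum to one.

```python
import numpy as np

def fresnel(n_i, n_t, theta_i):
    """Return (r_s, r_p, t_s, t_p, theta_t) for real indices, no total internal reflection.

    Index-based forms of the Fresnel coefficients, with the sign convention
    in which r_p -> -(n_t - n_i)/(n_t + n_i) at normal incidence.
    """
    theta_t = np.arcsin(n_i * np.sin(theta_i) / n_t)  # Snell's law
    ci, ct = np.cos(theta_i), np.cos(theta_t)
    r_s = (n_i * ci - n_t * ct) / (n_i * ci + n_t * ct)
    t_s = 2 * n_i * ci / (n_i * ci + n_t * ct)
    r_p = (n_i * ct - n_t * ci) / (n_i * ct + n_t * ci)
    t_p = 2 * n_i * ci / (n_i * ct + n_t * ci)
    return r_s, r_p, t_s, t_p, theta_t

n_i, n_t = 1.0, 1.5                 # air to glass, as in Fig. 3.4
theta_i = np.radians(60.0)          # sample incident angle (assumed)
r_s, r_p, t_s, t_p, theta_t = fresnel(n_i, n_t, theta_i)

# Factor multiplying |t|^2 in (3.30)
geom = n_t * np.cos(theta_t) / (n_i * np.cos(theta_i))
R_s, R_p = r_s**2, r_p**2
T_s, T_p = geom * t_s**2, geom * t_p**2

print("R_s + T_s     =", R_s + T_s)      # energy is conserved
print("R_p + T_p     =", R_p + T_p)
print("R_s + |t_s|^2 =", R_s + t_s**2)   # not equal to one, see (3.28)
```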

Example 3.2
Show analytically for p-polarized light that R p + Tp = 1, where R p is given by (3.22)
and T p is given by (3.30).
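Solution: From (3.20) we have
$$\begin{aligned}
R_p &= \left| \frac{\cos\theta_t \sin\theta_t - \cos\theta_i \sin\theta_i}{\cos\theta_t \sin\theta_t + \cos\theta_i \sin\theta_i} \right|^2 \\
&= \frac{\cos^2\theta_t \sin^2\theta_t - 2\cos\theta_i \sin\theta_i \cos\theta_t \sin\theta_t + \cos^2\theta_i \sin^2\theta_i}{\left(\cos\theta_t \sin\theta_t + \cos\theta_i \sin\theta_i\right)^2}
\end{aligned}$$
From (3.21) and (3.30) we have
$$\begin{aligned}
T_p &= \frac{n_t \cos\theta_t}{n_i \cos\theta_i} \left[ \frac{2\cos\theta_i \sin\theta_t}{\cos\theta_t \sin\theta_t + \cos\theta_i \sin\theta_i} \right]^2 \\
&= \frac{\sin\theta_i \cos\theta_t}{\sin\theta_t \cos\theta_i} \left[ \frac{4\cos^2\theta_i \sin^2\theta_t}{\left(\cos\theta_t \sin\theta_t + \cos\theta_i \sin\theta_i\right)^2} \right] \\
&= \frac{4\cos\theta_i \sin\theta_t \sin\theta_i \cos\theta_t}{\left(\cos\theta_t \sin\theta_t + \cos\theta_i \sin\theta_i\right)^2}
\end{aligned}$$
where the second line uses Snell's law in the form $n_t/n_i = \sin\theta_i/\sin\theta_t$. Then
$$\begin{aligned}
R_p + T_p &= \frac{\cos^2\theta_t \sin^2\theta_t + 2\cos\theta_i \sin\theta_i \cos\theta_t \sin\theta_t + \cos^2\theta_i \sin^2\theta_i}{\left(\cos\theta_t \sin\theta_t + \cos\theta_i \sin\theta_i\right)^2} \\
&= \frac{\left(\cos\theta_t \sin\theta_t + \cos\theta_i \sin\theta_i\right)^2}{\left(\cos\theta_t \sin\theta_t + \cos\theta_i \sin\theta_i\right)^2} \\
&= 1
\end{aligned}$$
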
3.4 Brewster’s Angle

Notice r p and R p go to zero at a certain angle in Figs. 3.3 and 3.4, indicating that
no p-polarized light is reflected at this angle. This behavior is quite general, as we
can see from the second form of the Fresnel coefficient formula for r p in (3.20),
which has tan (θi + θt ) in the denominator. Since the tangent “blows up” at π/2,
the reflection coefficient goes to zero when
$$\theta_i + \theta_t = \frac{\pi}{2} \qquad \text{(requirement for zero p-polarized reflection)} \tag{3.31}$$
By inspecting Fig. 3.1, we see that this condition occurs when the reflected and
transmitted wave vectors, kr and kt , are perpendicular to each other. If we insert
(3.31) into Snell’s law (3.7), we can solve for the incident angle θi that gives rise to
this special circumstance:
$$n_i \sin\theta_i = n_t \sin\left(\frac{\pi}{2} - \theta_i\right) = n_t \cos\theta_i \tag{3.32}$$
The special incident angle that satisfies this equation, in terms of the refractive
indices, is found to be
$$\theta_B = \tan^{-1}\frac{n_t}{n_i} \tag{3.33}$$
We have replaced the specific θi with θB in honor of Sir David Brewster who first
discovered the phenomenon. The angle θB is called Brewster’s angle. At Brewster’s
angle, no p-polarized light reflects (see L 3.4). Physically, the p-polarized light
cannot reflect because kr and kt are perpendicular. A reflection would require
the microscopic dipoles at the surface of the second material to radiate along
their axes, which they cannot do. Maxwell’s equations ‘know’ about this, and so
everything is nicely consistent.
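
As a quick numerical illustration of (3.31) and (3.33), the sketch below evaluates Brewster's angle and confirms that $r_p$ vanishes there. The air-to-water indices are an assumed example, and $r_p$ is written in its index-based form; none of the variable names come from the text.

```python
import numpy as np

n_i, n_t = 1.0, 1.33                                # air to water (assumed example)

theta_B = np.arctan(n_t / n_i)                      # Brewster's angle, (3.33)
theta_t = np.arcsin(n_i * np.sin(theta_B) / n_t)    # Snell's law

# Index-based form of the p-polarized reflection coefficient
r_p = ((n_i * np.cos(theta_t) - n_t * np.cos(theta_B))
       / (n_i * np.cos(theta_t) + n_t * np.cos(theta_B)))

print("theta_B (deg)           =", np.degrees(theta_B))
print("theta_B + theta_t (deg) =", np.degrees(theta_B + theta_t))  # condition (3.31)
print("r_p at theta_B          =", r_p)
```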
David Brewster (1781–1868, Scottish) was born in Jedburgh, Scotland. His father
was a teacher and wanted David to become a clergyman. At age twelve, David went
to the University of Edinburgh for that purpose, but his inclination for natural
science soon became apparent. He became licensed to preach, but his interests in
science distracted him from that profession, and he spent much of his time
studying diffraction. Taking an empirical approach, Brewster independently
discovered many of the same things usually credited to Fresnel. He even made a
dioptric apparatus for lighthouses before Fresnel developed his. Brewster became
somewhat famous in his day for the development of the kaleidoscope and
stereoscope for enjoyment by the general public. Brewster was a prolific science
writer and editor throughout his life. Among his works is an important biography
of Isaac Newton. He was knighted for his accomplishments in 1831.

3.5 Total Internal Reflection

From Snell’s law (3.7), we can compute the transmitted angle in terms of the
incident angle:
$$\theta_t = \sin^{-1}\left(\frac{n_i}{n_t}\sin\theta_i\right) \tag{3.34}$$
The angle θt is real only if the argument of the inverse sine is less than or equal to
one. If n i > n t , we can find a critical angle at which the argument begins to exceed
one:
$$\theta_c \equiv \sin^{-1}\frac{n_t}{n_i} \tag{3.35}$$
When θi > θc , then there is total internal reflection and we can directly show that
R s = 1 and R p = 1 (see P3.7). To demonstrate this, one computes the Fresnel
coefficients (3.18) and (3.20) while employing the following substitutions:
$$\sin\theta_t = \frac{n_i}{n_t}\sin\theta_i \qquad \text{(Snell's law)} \tag{3.36}$$
and
$$\cos\theta_t = i\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1} \qquad (\theta_i > \theta_c) \tag{3.37}$$
(see P0.19).
In this case, θt is a complex number. However, we do not assign geometrical
significance to it in terms of any direction. Actually, we don’t even need to know
the value for θt ; we need only the values for sin θt and cos θt , as specified in (3.36)
and (3.37). Even though sin θt is greater than one and cos θt is imaginary, we can
use their values to compute r s , r p , t s , and t p . (Complex notation is wonderful!)
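Upon substitution of (3.36) and (3.37) into the Fresnel reflection coefficients
(3.18) and (3.20) we obtain
$$r_s = \frac{\dfrac{n_i}{n_t}\cos\theta_i - i\sqrt{\dfrac{n_i^2}{n_t^2}\sin^2\theta_i - 1}}{\dfrac{n_i}{n_t}\cos\theta_i + i\sqrt{\dfrac{n_i^2}{n_t^2}\sin^2\theta_i - 1}} \qquad (\theta_i > \theta_c) \tag{3.38}$$
and
$$r_p = -\frac{\cos\theta_i - i\,\dfrac{n_i}{n_t}\sqrt{\dfrac{n_i^2}{n_t^2}\sin^2\theta_i - 1}}{\cos\theta_i + i\,\dfrac{n_i}{n_t}\sqrt{\dfrac{n_i^2}{n_t^2}\sin^2\theta_i - 1}} \qquad (\theta_i > \theta_c) \tag{3.39}$$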
These Fresnel coefficients can be manipulated (see P3.7) into the forms
$$r_s = \exp\left\{-2i\tan^{-1}\left[\frac{n_t}{n_i\cos\theta_i}\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1}\,\right]\right\} \qquad (\theta_i > \theta_c) \tag{3.40}$$
and
$$r_p = -\exp\left\{-2i\tan^{-1}\left[\frac{n_i}{n_t\cos\theta_i}\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1}\,\right]\right\} \qquad (\theta_i > \theta_c) \tag{3.41}$$
Each coefficient has a different phase (note $n_i/n_t$ vs. $n_t/n_i$ in the expressions),
which means that the s- and p-polarized fields experience different phase shifts
upon reflection. Nevertheless, we definitely have $|r_s| = 1$ and $|r_p| = 1$. We rightly
conclude that 100% of the light reflects. The transmittance is zero as dictated by
(3.25). We emphasize that one should not employ (3.29) or (3.30) in the case of
total internal reflection, as the imaginary θt makes the geometric factor in this
equation invalid.

Figure 3.6 Animation of light waves incident on an interface both below and beyond the critical angle.
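
The substitutions (3.36) and (3.37) are straightforward to evaluate with complex arithmetic. The sketch below is a minimal check, assuming a glass-to-air interface with $n_i = 1.5$ and a 60° incident angle (both illustrative choices), that the index-based Fresnel coefficients have unit magnitude beyond the critical angle and agree with the phase forms (3.40) and (3.41).

```python
import numpy as np

n_i, n_t = 1.5, 1.0                       # glass to air (assumed example)
theta_c = np.arcsin(n_t / n_i)            # critical angle (3.35), about 41.8 degrees
theta_i = np.radians(60.0)                # chosen beyond theta_c

sin_t = n_i * np.sin(theta_i) / n_t       # (3.36): greater than one
cos_t = 1j * np.sqrt(sin_t**2 - 1)        # (3.37): purely imaginary

ci = np.cos(theta_i)
r_s = (n_i * ci - n_t * cos_t) / (n_i * ci + n_t * cos_t)
r_p = (n_i * cos_t - n_t * ci) / (n_i * cos_t + n_t * ci)

# Phase-only forms (3.40) and (3.41)
root = np.sqrt((n_i / n_t)**2 * np.sin(theta_i)**2 - 1)
r_s_phase = np.exp(-2j * np.arctan(n_t * root / (n_i * ci)))
r_p_phase = -np.exp(-2j * np.arctan(n_i * root / (n_t * ci)))

print("|r_s|, |r_p| =", abs(r_s), abs(r_p))
print("phase of r_s, r_p (rad) =", np.angle(r_s), np.angle(r_p))
```

The two printed phases differ, which is the different phase shift for s- and p-polarized light noted above.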
Even with zero transmittance, the boundary conditions from Maxwell’s equa-
tions (see appendix 3.A) require that the fields be non-zero on the transmitted
side of the boundary, meaning t s 6= 0 and t p 6= 0. While this situation may seem
like a contradiction at first, it is an accurate description of what actually happens.
The coefficients t s and t p characterize evanescent waves that exist on the trans-
mitted side of the interface. The evanescent wave travels parallel to the interface
so that no energy is conveyed away from the interface deeper into the medium
on the transmission side.
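To compute the explicit form of the evanescent wave, we plug (3.36) and (3.37)
into the transmitted field (3.2):
$$\begin{aligned}
\mathbf{E}_t &= \left[ E_t^{(p)}\left(\hat{\mathbf{y}}\cos\theta_t - \hat{\mathbf{z}}\sin\theta_t\right) + \hat{\mathbf{x}}E_t^{(s)} \right] e^{i\left[k_t\left(y\sin\theta_t + z\cos\theta_t\right) - \omega t\right]} \\
&= \left[ t_p E_i^{(p)}\left(\hat{\mathbf{y}}\, i\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1} - \hat{\mathbf{z}}\,\frac{n_i}{n_t}\sin\theta_i\right) + \hat{\mathbf{x}}\, t_s E_i^{(s)} \right] e^{-k_t z\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1}}\; e^{i\left[k_t y\frac{n_i}{n_t}\sin\theta_i - \omega t\right]}
\end{aligned} \tag{3.42}$$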
Figure 3.7 plots the evanescent wave described by (3.42) along with the associated
incident wave. The phase of the evanescent wave indicates that it propagates
parallel to the boundary (in the y-dimension). Its strength decays exponentially
away from the boundary (in the z-dimension). We leave the calculation of $t_s$ and
$t_p$ as an exercise (P3.8).

Figure 3.7 A wave experiencing total internal reflection creates an evanescent wave that propagates parallel to the interface. (The reflected wave is not shown.)
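
The decay constant of the evanescent wave can be read directly off the real exponential in (3.42). The following sketch wraps that in a small helper; the function name `evanescent_decay_length` and the water-to-air example values are assumptions for illustration, and $k_t = 2\pi n_t/\lambda_{\rm vac}$ is used for the wave number on the transmitted side.

```python
import numpy as np

def evanescent_decay_length(n_i, n_t, theta_i, lambda_vac):
    """Distance from the interface at which the field amplitude falls to 1/e.

    From (3.42) the amplitude varies as
    exp(-k_t z sqrt((n_i/n_t)^2 sin^2(theta_i) - 1)).
    """
    k_t = 2 * np.pi * n_t / lambda_vac
    arg = (n_i / n_t)**2 * np.sin(theta_i)**2 - 1
    if arg <= 0:
        raise ValueError("theta_i does not exceed the critical angle")
    return 1.0 / (k_t * np.sqrt(arg))

# Hypothetical example: water to air at 60 degrees with a 633 nm source
z_e = evanescent_decay_length(1.33, 1.0, np.radians(60.0), 633e-9)
print("1/e decay length (nm):", z_e * 1e9)
```

The decay length is a fraction of a wavelength for angles well beyond critical and grows without bound as θi approaches θc from above.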

3.6 Reflections from Metal

In this section we extend our analysis to materials with complex refractive index
N ≡ n + i κ. As a reminder, the imaginary part of the index controls attenuation
of a wave as it propagates within a material. The real part of the index governs the
oscillatory nature of the wave. It turns out that both the imaginary and real parts
of the index strongly influence the reflection of light from a surface. The reader
may be grateful that there is no need to re-derive the Fresnel coefficients (3.18)–
(3.21) for the case of complex indices. The coefficients remain valid whether the
index is real or complex—just replace the real index n with the complex index N .
However, we do need to be a bit careful when applying them.
We restrict our discussion to reflections from a metallic or other absorbing
material surface. As we found in the case of total internal reflection, we actually do
not need to know the transmitted angle θt to employ Fresnel reflection coefficients
(3.18) and (3.20). We need only acquire expressions for cos θt and sin θt , and we
can obtain these from Snell’s law (3.7). To minimize complications, we let the
incident refractive index be n i = 1 (which is often the case). Let the index on the
transmitted side be written simply as $N_t = N$. Then by Snell’s law, the sine of the
transmitted angle is
$$\sin\theta_t = \frac{\sin\theta_i}{N} \tag{3.43}$$
This expression is of course complex since N is complex, but that is just fine. The
cosine of the same angle is
$$\cos\theta_t = \sqrt{1 - \sin^2\theta_t} = \frac{1}{N}\sqrt{N^2 - \sin^2\theta_i} \tag{3.44}$$
The positive sign in front of the square root is appropriate since it is clearly the
right choice if the imaginary part of the index approaches zero.
Upon substitution of these expressions, the Fresnel reflection coefficients
(3.18) and (3.20) become
$$r_s = \frac{\cos\theta_i - \sqrt{N^2 - \sin^2\theta_i}}{\cos\theta_i + \sqrt{N^2 - \sin^2\theta_i}} \tag{3.45}$$
and
$$r_p = \frac{\sqrt{N^2 - \sin^2\theta_i} - N^2\cos\theta_i}{\sqrt{N^2 - \sin^2\theta_i} + N^2\cos\theta_i} \tag{3.46}$$

Figure 3.8 The reflectances (top) with associated phases (bottom) for silver, which has index n = 0.2 and κ = 3.4. Note the minimum of $R_p$ corresponding to a kind of Brewster’s angle.

These expressions are tedious to evaluate. It is usually desirable to put them
into the form
$$r_s = |r_s|\, e^{i\phi_s} \tag{3.47}$$
and
$$r_p = |r_p|\, e^{i\phi_p} \tag{3.48}$$
However, we refrain from putting (3.45) and (3.46) into this form using the general
expressions; we would get a big mess. It is a good idea to let your calculator or
a computer do it after a specific value for $N \equiv n + i\kappa$ is chosen. An important
point to notice is that the phases upon reflection can be very different for s- and
p-polarization components (i.e. $\phi_p$ and $\phi_s$ can be very different). This is true in
general, even when the reflectivity is high (i.e. $|r_s|$ and $|r_p|$ on the order of unity).
Brewster’s angle exists also for surfaces with complex refractive index. However,
in general the expressions (3.46) and (3.48) do not go to zero at any incident
angle θi . Rather, the reflection of p-polarized light can go through a minimum at
some angle θi , which we refer to as Brewster’s angle (see Fig. 3.8). This minimum
is best found numerically since the general expression for $|r_p|$ in terms of n and κ
and as a function of θi can be unwieldy.
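
A computer evaluation of (3.45) through (3.48) might look like the following sketch. It assumes incidence from $n_i = 1$ and uses the silver index quoted in the caption of Fig. 3.8; the function name `metal_reflection` and the angular grid are illustrative choices. NumPy's complex square root returns the branch with non-negative real part, which matches the sign choice discussed after (3.44).

```python
import numpy as np

def metal_reflection(N, theta_i):
    """r_s and r_p from (3.45) and (3.46) for incidence from n_i = 1."""
    root = np.sqrt(N**2 - np.sin(theta_i)**2 + 0j)  # principal branch of the complex root
    c = np.cos(theta_i)
    r_s = (c - root) / (c + root)
    r_p = (root - N**2 * c) / (root + N**2 * c)
    return r_s, r_p

N = 0.2 + 3.4j                                   # silver index from the Fig. 3.8 caption
theta = np.radians(np.linspace(0.0, 89.9, 900))  # grid of incident angles
r_s, r_p = metal_reflection(N, theta)

R_s, R_p = np.abs(r_s)**2, np.abs(r_p)**2        # reflectances
phi_s, phi_p = np.angle(r_s), np.angle(r_p)      # phases in (3.47) and (3.48)

# Brewster's angle for a metal: the angle where R_p is smallest on the grid
theta_min = np.degrees(theta[np.argmin(R_p)])
print("R_p is smallest near", theta_min, "degrees, where R_p =", R_p.min())
```

A grid search is the simplest approach; a root finder applied to the derivative of $R_p$ would refine the angle further if needed.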
Appendix 3.A Boundary Conditions For Fields at an Interface

We are interested in the continuity of fields across a boundary from one medium
with index n 1 to another medium with index n 2 . We will show that the compo-
nents of electric field and the magnetic field parallel to the interface surface must
be the same on either side (adjacent to the interface). This result is independent
of the refractive index of the materials; in the case of the magnetic field we assume
the permeability µ0 is the same on both sides. To derive the boundary conditions,
we consider a surface S (a rectangle) that is perpendicular to the interface between
the two media and which extends into both media, as depicted in Fig. 3.9.
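First we examine the integral form of Faraday’s law (1.14)
$$\oint_C \mathbf{E}\cdot d\boldsymbol{\ell} = -\frac{\partial}{\partial t}\int_S \mathbf{B}\cdot\hat{\mathbf{n}}\, da \tag{3.49}$$
applied to the rectangular contour depicted in Fig. 3.9. We perform the path
integration on the left-hand side around the loop as follows:
$$\oint \mathbf{E}\cdot d\boldsymbol{\ell} = E_{1\parallel} d - E_{1\perp}\ell_1 - E_{2\perp}\ell_2 - E_{2\parallel} d + E_{2\perp}\ell_2 + E_{1\perp}\ell_1 = \left(E_{1\parallel} - E_{2\parallel}\right) d \tag{3.50}$$
Figure 3.9 Interface of two materials.
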
Here, E 1|| refers to the component of the electric field in the material with index
n 1 that is parallel to the interface. E 1⊥ refers to the component of the electric field
in the material with index n 1 which is perpendicular to the interface. Similarly,
E 2|| and E 2⊥ are the parallel and perpendicular components of the electric field
in the material with index n 2 . We have assumed that the rectangle is small enough
that the fields are uniform within the half rectangle on either side of the boundary.
Next, we shrink the loop down until it has zero surface area by letting the
lengths $\ell_1$ and $\ell_2$ go to zero. In this situation, the right-hand side of Faraday’s law
(3.49) goes to zero
$$\int_S \mathbf{B}\cdot\hat{\mathbf{n}}\, da \rightarrow 0 \tag{3.51}$$
and we are left with
$$E_{1\parallel} = E_{2\parallel} \tag{3.52}$$

This simple relation is a general boundary condition, which is met at any material
interface. The component of the electric field that lies in the plane of the interface
must be the same on both sides of the interface.
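We now derive a similar boundary condition for the magnetic field using the
integral form of Ampere’s law:3
$$\oint_C \mathbf{B}\cdot d\boldsymbol{\ell} = \mu_0 \int_S \left(\mathbf{J} + \epsilon_0\frac{\partial \mathbf{E}}{\partial t}\right)\cdot\hat{\mathbf{n}}\, da \tag{3.53}$$
As before, we are able to perform the path integration on the left-hand side for
the geometry depicted in the figure, which gives
$$\oint \mathbf{B}\cdot d\boldsymbol{\ell} = B_{1\parallel} d - B_{1\perp}\ell_1 - B_{2\perp}\ell_2 - B_{2\parallel} d + B_{2\perp}\ell_2 + B_{1\perp}\ell_1 = \left(B_{1\parallel} - B_{2\parallel}\right) d \tag{3.54}$$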
The notation for parallel and perpendicular components on either side of the
interface is similar to that used in (3.50).
Again, we can shrink the loop down until it has zero surface area by letting the
lengths $\ell_1$ and $\ell_2$ go to zero. In this situation, the right-hand side of (3.53) goes to
zero (ignoring the possibility of surface currents):
$$\int_S \left(\mathbf{J} + \epsilon_0\frac{\partial \mathbf{E}}{\partial t}\right)\cdot\hat{\mathbf{n}}\, da \rightarrow 0 \tag{3.55}$$
and we are left with
$$B_{1\parallel} = B_{2\parallel} \tag{3.56}$$
This is a general boundary condition that must be satisfied at the material inter-
face.
3 This form can be obtained from (1.4) by integration over the surface S in Fig. 3.9 and applying
Stokes’ theorem (0.12) to the magnetic field term.

Exercises
Exercises for 3.2 The Fresnel Coefficients
P3.1 Derive the Fresnel coefficients (3.20) and (3.21) for p-polarized light.
P3.2 Verify that each of the alternative forms given in (3.18)–(3.21) are equiv-
alent (given Snell’s law). Show that at normal incidence (i.e. θi = θt = 0)
the Fresnel coefficients reduce to
$$\lim_{\theta_i\to 0} r_s = \lim_{\theta_i\to 0} r_p = -\frac{n_t - n_i}{n_t + n_i} \quad \text{and} \quad \lim_{\theta_i\to 0} t_s = \lim_{\theta_i\to 0} t_p = \frac{2n_i}{n_t + n_i}$$
P3.3 Undoubtedly the most important interface in optics is when air meets
glass. Use a computer to make the following plots for this interface as a
function of the incident angle. Use n i = 1 for air and n t = 1.6 for glass.
Explicitly label Brewster’s angle on all of the applicable graphs.
(a) r p and t p (plot together on same graph)
(b) R p and T p (plot together on same graph)
(c) r s and t s (plot together on same graph)
(d) R s and T s (plot together on same graph)
Exercises for 3.3 Reflectance and Transmittance
L3.4 (a) In the laboratory, measure the reflectance for both s and p polarized
light from a flat glass surface at about ten points. You can normalize
the detector by placing it in the incident beam of light before the glass
surface. Especially watch for Brewster’s angle (described in section 3.4).
Figure 3.10 illustrates the experimental setup. (video)
Figure 3.10 Experimental setup for lab 3.4.
(b) Use a computer to calculate the theoretical air-to-glass reflectance

as a function of incident angle (i.e. plot R s and R p as a function of θi ).

Take the index of refraction for glass to be n t = 1.54 and the index for air
to be one. Plot this theoretical calculation as a smooth line on a graph.
Plot your experimental data from (a) as points on this same graph (not
points connected by lines).
P3.5 Show analytically for s-polarized light that R s + T s = 1, where R s is

given by (3.22) and T s is given by (3.30).
Exercises for 3.4 Brewster’s Angle
P3.6 Find Brewster’s angle for glass n = 1.5.
Exercises for 3.5 Total Internal Reflection
P3.7 Derive (3.40) and (3.41) and show that R s = 1 and R p = 1. HINT: See
problem P0.15.
P3.8 Compute t s and t p in the case of total internal reflection.
P3.9 Use a computer to plot the air-to-water transmittance as a function

of incident angle (i.e. plot (3.27) as a function of θi ). Also plot the
water-to-air transmittance on a separate graph. Plot both T s and T p on
each graph. The index of refraction for water is n = 1.33. Take the index
of air to be one.
P3.10 Light (λvac = 500 nm) reflects internally from a glass surface (n = 1.5)
surrounded by air. The incident angle is θi = 45◦ . An evanescent wave
travels parallel to the surface on the air side. At what distance from the
surface is the amplitude of the evanescent wave 1/e of its value at the
surface?
Exercises for 3.6 Reflections from Metal
P3.11 Using a computer, plot |r s |, |r p | versus θi for silver (n = 0.14 and κ = 4.5).
Make a separate plot of the phases φs and φp from (3.47) and (3.48).
Clearly label each plot, and comment on how the phase shifts are
different from those experienced when reflecting from glass.4
P3.12 Find Brewster’s angle for silver (n = 0.14 and κ = 4.5) by calculating R p
and finding its minimum. You will want to use a computer program to
do this (Matlab, Maple, Mathematica, etc.).
P3.13 The complex index for silver is given by n = 0.14 and κ = 4.5. Find r s
and r p when θi = 80◦ and put them into the forms (3.47) and (3.48).
Figure 3.11 Geometry for P3.13
4 Are you surprised that the real part of the index can be less than one?

Chapter 4
Multiple Parallel Interfaces
In chapter 3, we studied the transmission and reflection of light at a single in-

terface between two isotropic and homogeneous materials with indices n i and
n t . We found that the percent of light reflected and transmitted depends on the
incident angle and on whether the light is s- or p-polarized. The Fresnel coeffi-
cients (3.18)–(3.21) connect the reflected and transmitted fields to the incident
field. The fraction of the incident intensity going into the reflected or transmitted
beams is given by either R s and T s or R p and T p , depending on the polarization
of the incident light (see (3.22) and (3.25)).
In this chapter we consider the overall transmission and reflection through
multiple parallel interfaces. We start with a two-interface system, where a layer of
material is inserted between the initial and final materials. This situation occurs
frequently in optics. For example, lenses are often coated with a thin layer of
material in an effort to reduce reflections. Metal mirrors usually have a thin oxide
layer or a protective coating between the metal and the air. We can develop
reflection and transmission coefficients r tot and t tot , which apply to the overall
double-boundary system, similar to the Fresnel coefficients for a single boundary.
Likewise, we can compute an overall reflectance and transmittance R tot and T tot .
These can be used to compute the ‘tunneling’ of evanescent waves across a gap
between two parallel surfaces when the critical angle for total internal reflection
is exceeded.
The formalism we develop for the double-boundary problem is useful for
describing a simple instrument called a Fabry-Perot etalon (or interferometer if
the instrument has the capability of variable spacing between the two surfaces).
Such an instrument, which is constructed from two partially reflective parallel
surfaces, is useful for distinguishing closely spaced wavelengths.
Finally, in this chapter we will extend our analysis to multilayer coatings,
where an arbitrary number of interfaces exist between many material layers.
Multilayers are often used to make highly reflective mirror coatings from dielectric
materials (as opposed to metallic materials). Such mirror coatings can reflect
with efficiencies greater than 99.9% at certain wavelengths. In contrast, metallic
mirrors typically reflect with ∼ 96% efficiency, which can be a significant loss
if there are many mirrors in an optical system. Dielectric multilayer coatings

also have the advantage of being more durable and less prone to damage from
high-intensity lasers.
4.1 Double-Interface Problem Solved Using Fresnel Coefficients

Consider a slab of material sandwiched between two other materials as depicted
in Fig. 4.1. Because there are multiple reflections inside the middle layer, we have
dropped the subscripts i, r, and t used in chapter 3 and instead use the symbols
⇄ and ⇆ to indicate forward and backward traveling waves, respectively. Let n 1
stand for the refractive index of the middle layer. For consistency with notation
that we will later use for many-layer systems, let n 0 and n 2 represent the indices
of the other two regions. For simplicity, we assume that indices are real. As with
the single-boundary problem, we are interested in finding the overall transmitted
fields $E_{\rightleftarrows 2}^{(s)}$ and $E_{\rightleftarrows 2}^{(p)}$ in terms of the incident fields $E_{\rightleftarrows 0}^{(s)}$ and $E_{\rightleftarrows 0}^{(p)}$. Similarly, we can
also find the reflected fields $E_{\leftrightarrows 0}^{(s)}$ and $E_{\leftrightarrows 0}^{(p)}$ in terms of the incident fields $E_{\rightleftarrows 0}^{(s)}$ and $E_{\rightleftarrows 0}^{(p)}$.
Both forward and backward traveling plane waves exist in the middle region.
Our intuition rightly tells us that in this region there are many reflections, bounc-
ing both forward and backward between the two surfaces. It might therefore seem
that there should be an infinite number of fields represented, each corresponding
to a different bounce. Fortunately, the many forward-traveling plane waves aris-
ing from arbitrary numbers of bounces all travel in the same direction. Similarly,
the backward-traveling plane waves arising from various bounces are all parallel.
Hence, these plane-wave fields join neatly into a net forward-moving plane-wave
field and a net backward-moving plane-wave field.1

Figure 4.1 Waves propagating through a dual interface between materials. (The x-axis is directed into the page.)

As of yet, we do not know the amplitudes or phases of the net forward and net
backward traveling plane waves in the middle layer. We simply denote them by
$E_{\rightleftarrows 1}^{(s)}$ and $E_{\leftrightarrows 1}^{(s)}$ or by $E_{\rightleftarrows 1}^{(p)}$ and $E_{\leftrightarrows 1}^{(p)}$, separated into their s and p components as usual.
Similarly, $E_{\leftrightarrows 0}^{(s)}$ and $E_{\leftrightarrows 0}^{(p)}$ as well as $E_{\rightleftarrows 2}^{(s)}$ and $E_{\rightleftarrows 2}^{(p)}$ are understood to include light that
‘leaks’ through the surfaces from the middle region. Thus, we need only concern
ourselves with five plane waves depicted in Fig. 4.1.
The various plane-wave fields are connected to each other at the boundaries
via the single-boundary Fresnel coefficients (3.18)–(3.21). At the first surface we
define
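$$\begin{aligned}
r_s^{0\to 1} &\equiv \frac{\sin\theta_1\cos\theta_0 - \sin\theta_0\cos\theta_1}{\sin\theta_1\cos\theta_0 + \sin\theta_0\cos\theta_1} & r_p^{0\to 1} &\equiv \frac{\cos\theta_1\sin\theta_1 - \cos\theta_0\sin\theta_0}{\cos\theta_1\sin\theta_1 + \cos\theta_0\sin\theta_0} \\
t_s^{0\to 1} &\equiv \frac{2\sin\theta_1\cos\theta_0}{\sin\theta_1\cos\theta_0 + \sin\theta_0\cos\theta_1} & t_p^{0\to 1} &\equiv \frac{2\cos\theta_0\sin\theta_1}{\cos\theta_1\sin\theta_1 + \cos\theta_0\sin\theta_0}
\end{aligned} \tag{4.1}$$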
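The notation 0 → 1 indicates the first surface from the perspective of starting
on the incident side and propagating towards the middle layer. The Fresnel
coefficients for the backward traveling light approaching the first interface from
within the middle layer are given by
$$\begin{aligned}
r_s^{1\to 0} &= -r_s^{0\to 1} & r_p^{1\to 0} &= -r_p^{0\to 1} \\
t_s^{1\to 0} &\equiv \frac{2\sin\theta_0\cos\theta_1}{\sin\theta_0\cos\theta_1 + \sin\theta_1\cos\theta_0} & t_p^{1\to 0} &\equiv \frac{2\cos\theta_1\sin\theta_0}{\cos\theta_0\sin\theta_0 + \cos\theta_1\sin\theta_1}
\end{aligned} \tag{4.2}$$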
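where 1 → 0 again indicates connections at the first interface, but from the
perspective of beginning inside the middle layer. Finally, the single-boundary
coefficients for light approaching the second interface are
$$\begin{aligned}
r_s^{1\to 2} &\equiv \frac{\sin\theta_2\cos\theta_1 - \sin\theta_1\cos\theta_2}{\sin\theta_2\cos\theta_1 + \sin\theta_1\cos\theta_2} & r_p^{1\to 2} &\equiv \frac{\cos\theta_2\sin\theta_2 - \cos\theta_1\sin\theta_1}{\cos\theta_2\sin\theta_2 + \cos\theta_1\sin\theta_1} \\
t_s^{1\to 2} &\equiv \frac{2\sin\theta_2\cos\theta_1}{\sin\theta_2\cos\theta_1 + \sin\theta_1\cos\theta_2} & t_p^{1\to 2} &\equiv \frac{2\cos\theta_1\sin\theta_2}{\cos\theta_2\sin\theta_2 + \cos\theta_1\sin\theta_1}
\end{aligned} \tag{4.3}$$
In a similar fashion, the notation 1 → 2 indicates connections made at the second
interface from the perspective of beginning in the middle layer.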
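
The angle-based forms in (4.1) through (4.3) share one pattern, so a single helper can generate all of them. The sketch below is a hedged illustration for s-polarized light with assumed indices and incident angle (the helper name `fresnel_s` is hypothetical). It checks the relation $r_s^{1\to 0} = -r_s^{0\to 1}$ from (4.2), along with the identity $t_s^{0\to 1} t_s^{1\to 0} - r_s^{0\to 1} r_s^{1\to 0} = 1$ that follows from these forms.

```python
import numpy as np

def fresnel_s(theta_a, theta_b):
    """s-polarized (r, t) for light going from a medium with propagation angle
    theta_a into a medium with angle theta_b, in the sine/cosine form of (4.1).
    Not usable exactly at normal incidence, where both sines vanish."""
    denom = np.sin(theta_b) * np.cos(theta_a) + np.sin(theta_a) * np.cos(theta_b)
    r = (np.sin(theta_b) * np.cos(theta_a) - np.sin(theta_a) * np.cos(theta_b)) / denom
    t = 2 * np.sin(theta_b) * np.cos(theta_a) / denom
    return r, t

n0, n1, n2 = 1.0, 1.38, 1.5                      # illustrative indices (assumed)
theta0 = np.radians(30.0)
theta1 = np.arcsin(n0 * np.sin(theta0) / n1)     # Snell's law at the first interface
theta2 = np.arcsin(n1 * np.sin(theta1) / n2)     # Snell's law at the second interface

r01, t01 = fresnel_s(theta0, theta1)             # (4.1)
r10, t10 = fresnel_s(theta1, theta0)             # (4.2)
r12, t12 = fresnel_s(theta1, theta2)             # (4.3)

print("r10 + r01         =", r10 + r01)          # should vanish
print("t01*t10 - r01*r10 =", t01 * t10 - r01 * r10)
```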
We would like to find the various fields depicted in Fig. 4.1, in particular
the overall transmitted and reflected fields, in terms of the incident field. For
each polarization (s or p), there are five fields defined, so we will need four
equations to connect them (taking the incident field as a given). For definiteness,
we will consider s-polarized light in the upcoming analysis. The equations for
p-polarized light look exactly the same; just replace the subscript s with p. Through
the remainder of this section and the next, we will continue to economize by
writing the equations only for s-polarized light with the understanding that they
apply equally well to p-polarized light.
1 More specifically, the sum of parallel plane waves $\sum_j E_j e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}$, where the phase of each wave
is contained in $E_j$, can be written as $\left(\sum_j E_j\right) e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}$, which is effectively one plane wave.
The forward-traveling wave in the middle region arises from both a transmission
of the incident wave and a reflection of the backward-traveling wave in the
middle region at the first interface. Using the Fresnel coefficients, we can write
$E_{\rightleftarrows 1}^{(s)}$ as the sum of fields arising from $E_{\rightleftarrows 0}^{(s)}$ and $E_{\leftrightarrows 1}^{(s)}$ as follows:
$$E_{\rightleftarrows 1}^{(s)} = t_s^{0\to 1} E_{\rightleftarrows 0}^{(s)} + r_s^{1\to 0} E_{\leftrightarrows 1}^{(s)} \tag{4.4}$$
The factors $t_s^{0\to 1}$ and $r_s^{1\to 0}$ are the single-boundary Fresnel coefficients selected
appropriately from (4.1) and (4.2). Similarly, the overall reflected field $E_{\leftrightarrows 0}^{(s)}$ is given by the
reflection of the incident field and the transmission of the backward-traveling
field in the middle region according to
$$E_{\leftrightarrows 0}^{(s)} = r_s^{0\to 1} E_{\rightleftarrows 0}^{(s)} + t_s^{1\to 0} E_{\leftrightarrows 1}^{(s)} \tag{4.5}$$
Two connections done; two to go.

Before we continue, we need to specify the origin so that we can calculate
phase shifts associated with propagation in the middle region. Propagation was
not an issue in the single-boundary problem studied back in chapter 3. However,
in the double-boundary problem, the thickness of the middle region dictates
phase variations that strongly influence the result. We take the origin to be
located on the first interface, as shown in Fig. 4.1. Since all fields in (4.4) and
(4.5) are evaluated at the origin (y, z) = (0, 0), no phase factors appear explicitly
(the propagation phase is zero there).
We will connect the plane-wave fields across the second interface at the
point $\mathbf{r} = \hat{\mathbf{z}}d$. Since the field $E_{\rightleftarrows 1}^{(s)}$ represents the forward-traveling field at (y, z) =
(0, 0), the appropriate phase-adjusted2 field at (y, z) = (0, d ) becomes $E_{\rightleftarrows 1}^{(s)} e^{i\mathbf{k}_{\rightleftarrows 1}\cdot\mathbf{r}} = E_{\rightleftarrows 1}^{(s)} e^{i k_1 d\cos\theta_1}$.
The transmitted field in the final medium arises only from the
forward-traveling field in the middle region, and at our selected point it is
$$E_{\rightleftarrows 2}^{(s)} = t_s^{1\to 2} E_{\rightleftarrows 1}^{(s)} e^{i k_1 d\cos\theta_1} \tag{4.6}$$
Note that $E_{\rightleftarrows 2}^{(s)}$ stands for the transmitted field at the point (y, z) = (0, d ); its local
phase can be built into its definition, so there is no need to write an explicit phase.
The backward-traveling plane wave in the middle region arises from the
reflection of the forward-traveling plane wave in that region:
$$E_{\leftrightarrows 1}^{(s)} e^{-i k_1 d\cos\theta_1} = r_s^{1\to 2} E_{\rightleftarrows 1}^{(s)} e^{i k_1 d\cos\theta_1} \tag{4.7}$$
Like before, $E_{\leftrightarrows 1}^{(s)}$ is referenced to the origin (y, z) = (0, 0). Therefore, the factor
$e^{i\mathbf{k}_{\leftrightarrows 1}\cdot\mathbf{r}} = e^{-i k_1 d\cos\theta_1}$ is needed at (y, z) = (0, d ).
The relations (4.4)–(4.7) permit us to find overall transmission and reflection
coefficients for the two-interface problem.
2 In the middle region, $\mathbf{k}_{\rightleftarrows 1} = k_1\left(\hat{\mathbf{y}}\sin\theta_1 + \hat{\mathbf{z}}\cos\theta_1\right)$ and $\mathbf{k}_{\leftrightarrows 1} = k_1\left(\hat{\mathbf{y}}\sin\theta_1 - \hat{\mathbf{z}}\cos\theta_1\right)$.

Example 4.1
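Derive the transmission coefficient that connects the final transmitted field to the
incident field for the double-interface problem according to $t_s^{\rm tot} \equiv E_{\rightleftarrows 2}^{(s)}/E_{\rightleftarrows 0}^{(s)}$.
Solution: From (4.6) we may write
$$E_{\rightleftarrows 1}^{(s)} = \frac{E_{\rightleftarrows 2}^{(s)}}{t_s^{1\to 2}}\, e^{-i k_1 d\cos\theta_1} \tag{4.8}$$
Substitution of this into (4.7) gives
$$E_{\leftrightarrows 1}^{(s)} = \frac{r_s^{1\to 2}}{t_s^{1\to 2}}\, E_{\rightleftarrows 2}^{(s)}\, e^{i k_1 d\cos\theta_1} \tag{4.9}$$
Finally, substituting both (4.8) and (4.9) into (4.4) yields the connection we seek
between the incident and transmitted fields:
$$\frac{E_{\rightleftarrows 2}^{(s)}}{t_s^{1\to 2}}\, e^{-i k_1 d\cos\theta_1} = t_s^{0\to 1} E_{\rightleftarrows 0}^{(s)} + r_s^{1\to 0}\,\frac{r_s^{1\to 2}}{t_s^{1\to 2}}\, E_{\rightleftarrows 2}^{(s)}\, e^{i k_1 d\cos\theta_1} \tag{4.10}$$
After rearranging, we arrive at the more useful form
$$t_s^{\rm tot} \equiv \frac{E_{\rightleftarrows 2}^{(s)}}{E_{\rightleftarrows 0}^{(s)}} = \frac{t_s^{0\to 1}\, t_s^{1\to 2}}{e^{-i k_1 d\cos\theta_1} - r_s^{1\to 0}\, r_s^{1\to 2}\, e^{i k_1 d\cos\theta_1}} \qquad \text{(p can be switched for s)} \tag{4.11}$$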
The coefficient t stot derived in Example 4.1 connects the amplitude and phase
of the incident field to the amplitude and phase of the transmitted field in a
manner similar to the single-boundary Fresnel coefficients. The numerator of
(4.11) reminds us of the physics of the situation: the field transmits through the
first interface, acquires a phase due to propagating through the middle layer, and
transmits through the second interface. The denominator of (4.11) modifies the
result to account for feedback from multiple reflections in the middle region.
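The overall reflection coefficient is found to be (see P4.1)
$$r_s^{\rm tot} \equiv \frac{E_{\leftrightarrows 0}^{(s)}}{E_{\rightleftarrows 0}^{(s)}} = r_s^{0\to 1} + \frac{t_s^{0\to 1}\, e^{i k_1 d\cos\theta_1}\, r_s^{1\to 2}\, e^{i k_1 d\cos\theta_1}\, t_s^{1\to 0}}{1 - r_s^{1\to 0}\, r_s^{1\to 2}\, e^{i 2 k_1 d\cos\theta_1}} \qquad \text{(p can be switched for s)} \tag{4.12}$$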
Again the equation reminds us of the basic physics, and we did not completely
simplify the expression to make this more apparent. There is an initial reflection
from the first interface. That light is joined by light that transmits through the first
interface (looking at only the numerator of the second term), propagates through
the middle layer, reflects from the second interface, propagates back through the
middle layer, and transmits back through the first interface. The denominator of
the second term accounts for the effects of multiple-reflection feedback.
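
The multiple-reflection picture described above can be tested directly: summing the contributions from successive round trips in the middle layer should reproduce the closed forms (4.11) and (4.12). The sketch below does this for s-polarized light. The indices, layer thickness, wavelength, and helper name `fresnel_s` are assumptions for illustration, and the single-boundary coefficients are written in their index-based form.

```python
import numpy as np

def fresnel_s(n_a, n_b, cos_a, cos_b):
    """Single-boundary (r, t) for s-polarized light going from medium a to b."""
    denom = n_a * cos_a + n_b * cos_b
    return (n_a * cos_a - n_b * cos_b) / denom, 2 * n_a * cos_a / denom

# Illustrative values (assumed, not from the text)
n0, n1, n2 = 1.0, 1.38, 1.5
theta0 = np.radians(30.0)
lambda_vac, d = 633e-9, 250e-9

s = n0 * np.sin(theta0)                        # n sin(theta) is the same in every layer
cos0 = np.cos(theta0)
cos1 = np.sqrt(1 - (s / n1) ** 2)
cos2 = np.sqrt(1 - (s / n2) ** 2)
phi = 2 * np.pi * n1 / lambda_vac * d * cos1   # k1 d cos(theta1)

r01, t01 = fresnel_s(n0, n1, cos0, cos1)
r10, t10 = fresnel_s(n1, n0, cos1, cos0)
r12, t12 = fresnel_s(n1, n2, cos1, cos2)

# Closed forms (4.11) and (4.12)
t_closed = t01 * t12 / (np.exp(-1j * phi) - r10 * r12 * np.exp(1j * phi))
r_closed = r01 + (t01 * np.exp(1j * phi) * r12 * np.exp(1j * phi) * t10
                  / (1 - r10 * r12 * np.exp(2j * phi)))

# Explicit sum over round trips in the middle layer
m = np.arange(200)
loop = r12 * r10 * np.exp(2j * phi)            # factor picked up per round trip
t_series = np.sum(t01 * np.exp(1j * phi) * t12 * loop ** m)
r_series = r01 + np.sum(t01 * np.exp(2j * phi) * r12 * t10 * loop ** m)

print("transmission mismatch:", abs(t_series - t_closed))
print("reflection mismatch:  ", abs(r_series - r_closed))
```

Because $|r_s^{1\to 0} r_s^{1\to 2}| < 1$, the series converges geometrically, which is why the denominators in (4.11) and (4.12) take the form they do.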
4.2 Two-Interface Transmittance at Sub Critical Angles

Often we are interested in the intensity of the light that goes through or reflects
from the double-interface setup. Because the transmission coefficient (4.11) has
a simpler form than the reflection coefficient (4.12), it will be easier to calculate
the total transmittance $T_s^{\rm tot}$ and obtain the reflectance, if desired, from the
relationship
$$T_s^{\rm tot} + R_s^{\rm tot} = 1 \tag{4.13}$$
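When the transmitted angle θ2 is real, we may write the fraction of the transmitted
power as in (3.30):
$$T_s^{\rm tot} = \frac{n_2\cos\theta_2}{n_0\cos\theta_0}\left|t_s^{\rm tot}\right|^2 = \frac{n_2\cos\theta_2}{n_0\cos\theta_0}\,\frac{\left|t_s^{0\to 1}\right|^2\left|t_s^{1\to 2}\right|^2}{\left|e^{-i k_1 d\cos\theta_1} - r_s^{1\to 0}\, r_s^{1\to 2}\, e^{i k_1 d\cos\theta_1}\right|^2} \qquad (\theta_2\ \text{real; p can be switched for s}) \tag{4.14}$$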
Equation (4.14) remains valid even if the angle θ1 is complex. Thus, it can be applied
to the case of evanescent waves ‘tunneling’ through a gap where θ0 lies beyond
the critical angle for total internal reflection from the middle layer. This will be
studied further in section 4.3.
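When there are no evanescent waves in any of the regions (i.e. neither θ0 nor θ1
exceeds a critical angle) we can simplify (4.14) into the following useful form
(see P4.3):
$$T_s^{\rm tot} = \frac{T_s^{\rm max}}{1 + F_s\sin^2\left(\frac{\Phi}{2}\right)} \qquad (\theta_1\ \text{and}\ \theta_2\ \text{real; p can be switched for s}) \tag{4.15}$$
where
$$T_s^{\rm max} \equiv \frac{T_s^{0\to 1}\, T_s^{1\to 2}}{\left(1 - \sqrt{R_s^{1\to 0} R_s^{1\to 2}}\right)^2} \tag{4.16}$$
$$\Phi \equiv 2k_1 d\cos\theta_1 + \delta_{r_s^{1\to 0}} + \delta_{r_s^{1\to 2}} \tag{4.17}$$
and
$$F_s \equiv \frac{4\sqrt{R_s^{1\to 0} R_s^{1\to 2}}}{\left(1 - \sqrt{R_s^{1\to 0} R_s^{1\to 2}}\right)^2} \tag{4.18}$$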
The quantity $T_s^{\rm max}$ is the maximum possible transmittance of intensity through
the two surfaces. The single-interface transmittances ($T_s^{0\to 1}$ and $T_s^{1\to 2}$) and
reflectances ($R_s^{1\to 0}$ and $R_s^{1\to 2}$) are calculated from the single-interface Fresnel
coefficients in the usual way as described in chapter 3. The numerator of $T_s^{\rm max}$
represents the combined transmittances for the two interfaces without considering
feedback due to multiple reflections. The denominator enhances this value
for the case of optimal reinforcing feedback in the middle layer.
The phase delay experienced by the plane wave in the middle region is described
by Φ. The first term $2k_1 d\cos\theta_1$ represents the phase delay acquired
during round-trip propagation in the middle region. The terms $\delta_{r_s^{1\to 0}}$ and $\delta_{r_s^{1\to 2}}$
account for phase shifts upon reflection from each interface. They are defined
indirectly from the single-boundary Fresnel reflection coefficients:
$$r_s^{1\to 0} = \left|r_s^{1\to 0}\right| e^{i\delta_{r_s^{1\to 0}}} \quad \text{and} \quad r_s^{1\to 2} = \left|r_s^{1\to 2}\right| e^{i\delta_{r_s^{1\to 2}}} \tag{4.19}$$
If all the indices in the double-boundary system are real, then $\delta_{r_s^{1\to 0}}$ and $\delta_{r_s^{1\to 2}}$
can only be zero or π (i.e. the coefficients can only be positive or negative real
numbers).
$F_s$ is called the coefficient of finesse (not to be confused with reflecting finesse
defined in section 4.6), which determines how strongly the transmittance is
influenced when Φ is varied (for example, through varying d or the wavelength
$\lambda_{\rm vac}$).
Example 4.2
Consider a ‘beam splitter’ designed for s-polarized light incident on a substrate of
glass (n = 1.5) at 45° as shown in Fig. 4.2. A thin coating of zinc sulfide (n = 2.32)
is applied to the front of the glass to cause about half of the light to reflect. A
magnesium fluoride (n = 1.38) coating is applied to the back surface of the glass
to minimize reflections.3 Each coating constitutes a separate double-interface
problem. The front coating is deferred to problem P4.5. In this example, find the
highest transmittance possible through the antireflection film at the back of the
‘beam splitter’ and the smallest possible $d_2$ that accomplishes this for light with
wavelength λvac = 633 nm.

Figure 4.2 Side view of a beamsplitter. (Figure labels: partial reflection coating, 46%, 54%, glass, anti-reflection coating.)

Solution: For the back coating, we have $n_0 = 1.5$, $n_1 = 1.38$, and $n_2 = 1$. We can
find θ0 and θ1 from θ2 = 45° using Snell’s law
$$n_1\sin\theta_1 = \sin\theta_2 \;\Rightarrow\; \theta_1 = \sin^{-1}\left(\frac{\sin 45^\circ}{1.38}\right) = 30.82^\circ$$
$$n_0\sin\theta_0 = \sin\theta_2 \;\Rightarrow\; \theta_0 = \sin^{-1}\left(\frac{\sin 45^\circ}{1.5}\right) = 28.13^\circ$$
Next we calculate the single-boundary Fresnel coefficients:
$$r_s^{12} = -\frac{\sin(\theta_1 - \theta_2)}{\sin(\theta_1 + \theta_2)} = -\frac{\sin(30.82^\circ - 45^\circ)}{\sin(30.82^\circ + 45^\circ)} = 0.253$$
$$r_s^{10} = -\frac{\sin(\theta_1 - \theta_0)}{\sin(\theta_1 + \theta_0)} = -\frac{\sin(30.82^\circ - 28.13^\circ)}{\sin(30.82^\circ + 28.13^\circ)} = -0.0549$$
These coefficients give us the phase shift due to reflection
$$\delta_{r_s^{10}} + \delta_{r_s^{12}} = \pi + 0 = \pi$$
The single-boundary reflectances are given by
$$R_s^{10} \equiv \left|r_s^{10}\right|^2 = |-0.0549|^2 = 0.0030$$
$$R_s^{12} \equiv \left|r_s^{12}\right|^2 = |0.253|^2 = 0.0640$$
3 We ignore possible feedback between the front and rear coatings. Since the antireflection
films are usually imperfect, beam splitter substrates are often slightly wedged so that unwanted
reflections from the second surface travel in a different direction.

and the transmittances are
$$T_s^{01} = T_s^{10} = 1 - R_s^{10} = 1 - 0.0030 = 0.997$$
$$T_s^{12} = 1 - R_s^{12} = 1 - 0.0640 = 0.936$$
Finally, we calculate the coefficient of finesse
$$F = \frac{4\sqrt{R_s^{10}R_s^{12}}}{\left(1 - \sqrt{R_s^{10}R_s^{12}}\right)^2} = \frac{4\sqrt{(0.0030)(0.0640)}}{\left(1 - \sqrt{(0.0030)(0.0640)}\right)^2} = 0.0570$$
and the maximum transmittance
$$T_s^{\rm max} = \frac{T_s^{01}T_s^{12}}{\left(1 - \sqrt{R_s^{10}R_s^{12}}\right)^2} = \frac{(0.997)(0.936)}{\left(1 - \sqrt{(0.0030)(0.0640)}\right)^2} = 0.960$$
Putting everything together, we have
$$T_s^{\rm tot} = \frac{0.960}{1 + 0.0570\,\sin^2\left(\frac{2k_1 d_2\cos\theta_1 + \pi}{2}\right)}$$
The maximum transmittance occurs when the sine is zero. In that case, T stot =
0.960, meaning that 96% of the light is transmitted. We find the thickness by setting
the argument of the sine to π
2k 1 d 2 cos θ1 + π = 2π
Since k 1 = 2πn 1 /λvac , we have
$$d_2 = \frac{\lambda_{\rm vac}}{4n_1\cos\theta_1} = \frac{633\ {\rm nm}}{4(1.38)\cos 30.82^\circ} = 134\ {\rm nm}$$
Without the coating (i.e. $d_2 = 0$), the transmittance through the back surface
would be 0.908, so the coating does give an improvement.
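The arithmetic of Example 4.2 can be reproduced with a few lines of code. The following is a minimal sketch using only the Python standard library; the helper names (`r_s`, `phase`, `T_tot`, `d_best`) are illustrative choices and not part of the text. It evaluates (4.15)–(4.18) with the indices, angle, and wavelength given in the example.

```python
import math

# Values taken from Example 4.2: glass, MgF2 coating, air.
n0, n1, n2 = 1.5, 1.38, 1.0
theta2 = math.radians(45.0)   # exit angle in air
lam_vac = 633e-9              # vacuum wavelength in meters

# Snell's law gives the angles in the coating and in the glass.
theta1 = math.asin(n2 * math.sin(theta2) / n1)
theta0 = math.asin(n2 * math.sin(theta2) / n0)

def r_s(theta_in, theta_out):
    """Single-boundary s-polarized Fresnel reflection coefficient."""
    return -math.sin(theta_in - theta_out) / math.sin(theta_in + theta_out)

def phase(r):
    """Reflection phase for a real coefficient: 0 if positive, pi if negative."""
    return 0.0 if r >= 0 else math.pi

r10 = r_s(theta1, theta0)
r12 = r_s(theta1, theta2)
R10, R12 = r10**2, r12**2
T01, T12 = 1 - R10, 1 - R12

root = math.sqrt(R10 * R12)
F = 4 * root / (1 - root)**2          # coefficient of finesse (4.18)
T_max = T01 * T12 / (1 - root)**2     # maximum transmittance (4.16)

def T_tot(d):
    """Total transmittance (4.15) for a coating of thickness d (meters)."""
    k1 = 2 * math.pi * n1 / lam_vac
    Phi = 2 * k1 * d * math.cos(theta1) + phase(r10) + phase(r12)
    return T_max / (1 + F * math.sin(Phi / 2)**2)

# Smallest thickness that makes Phi equal to 2*pi.
d_best = lam_vac / (4 * n1 * math.cos(theta1))

print(f"theta1 = {math.degrees(theta1):.2f} deg, theta0 = {math.degrees(theta0):.2f} deg")
print(f"F = {F:.4f}, T_max = {T_max:.3f}")
print(f"d_best = {d_best * 1e9:.1f} nm, T(d_best) = {T_tot(d_best):.3f}, T(0) = {T_tot(0.0):.3f}")
```

Evaluating `T_tot` over a range of thicknesses shows how weakly the transmittance depends on d when the coefficient of finesse is this small.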
4.3 Beyond Critical Angle: Tunneling of Evanescent Waves

When θ0 is greater than the critical angle, an evanescent wave forms in the middle
region. In this case the formula (4.15) for the transmittance cannot be used.
However, the formula (4.14) still holds if the angle θ2 is real (i.e. if the critical
angle in the absence of the middle layer is not exceeded). Thus, we can use (4.14)
to describe frustrated total internal reflection. In this case an evanescent wave
occurs in the middle region, and if the second surface is sufficiently close to the
first surface, the evanescent wave stimulates the second surface to produce a
transmitted wave propagating at an angle θ2. This behavior is sometimes referred
to as ‘tunneling’.

Figure 4.3 Animation showing frustrated total internal reflection.

We do not need to deal directly with the complex angle θ1 . Rather, we just need
sin θ1 and cos θ1 in order to calculate the single-boundary Fresnel coefficients.
From Snell’s law we have
$$\sin\theta_1 = \frac{n_0}{n_1}\sin\theta_0 = \frac{n_2}{n_1}\sin\theta_2 \tag{4.20}$$
and for the middle layer we write
$$\cos\theta_1 = i\sqrt{\sin^2\theta_1 - 1} \tag{4.21}$$
Note that beyond the critical angle, sin θ1 is greater than one. We illustrate how to
apply (4.14) via a specific example:
Example 4.3
Calculate the transmittance of p-polarized light through the region between two
closely spaced 45◦ right prisms, as shown in Fig. 4.4, as a function of λvac and
the prism spacing d . Take the index of refraction of the prisms to be n = 1.5
surrounded by index n = 1, and use θ0 = θ2 = 45◦ . Neglect possible reflections
from the exterior surfaces of the prisms.
Solution: From (4.20) and (4.21) we have
$$\sin\theta_1 = 1.5\sin 45^\circ = 1.0607$$
and
$$\cos\theta_1 = i\sqrt{1.0607^2 - 1} = i\,0.3536$$
We must compute various expressions involving Fresnel coefficients that appear
in (4.14):
$$\left|t_p^{01}\right|^2 = \left|\frac{2\cos\theta_0\sin\theta_1}{\cos\theta_1\sin\theta_1 + \cos\theta_0\sin\theta_0}\right|^2 = \left|\frac{2\frac{1}{\sqrt{2}}(1.061)}{(i\,0.3536)(1.0607) + \frac{1}{\sqrt{2}}\frac{1}{\sqrt{2}}}\right|^2 = 5.76 \tag{4.22}$$
$$\left|t_p^{12}\right|^2 = \left|\frac{2\cos\theta_1\sin\theta_2}{\cos\theta_2\sin\theta_2 + \cos\theta_1\sin\theta_1}\right|^2 = \left|\frac{2(i\,0.3536)\frac{1}{\sqrt{2}}}{\frac{1}{\sqrt{2}}\frac{1}{\sqrt{2}} + (i\,0.3536)(1.0607)}\right|^2 = 0.640 \tag{4.23}$$
$$r_p^{12} = r_p^{10} = -r_p^{01} = -\frac{\cos\theta_1\sin\theta_1 - \cos\theta_0\sin\theta_0}{\cos\theta_1\sin\theta_1 + \cos\theta_0\sin\theta_0} = -\frac{(i\,0.3536)(1.0607) - \frac{1}{\sqrt{2}}\frac{1}{\sqrt{2}}}{(i\,0.3536)(1.061) + \frac{1}{\sqrt{2}}\frac{1}{\sqrt{2}}} = e^{-i1.287} \tag{4.24}$$

Figure 4.4 Frustrated total internal reflection in two prisms.

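For the last step above, see problem P0.15. Note that $r_p^{12} = r_p^{10}$ since $n_0 = n_2$. We
also need
$$k_1 d\cos\theta_1 = \frac{2\pi}{\lambda_{\rm vac}}d\cos\theta_1 = \frac{2\pi}{\lambda_{\rm vac}}d\,(i\,0.3536) = i\,2.22\left(\frac{d}{\lambda_{\rm vac}}\right) \tag{4.25}$$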
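We are now ready to compute the total transmittance (4.14). The factors out in
front reduce to one since θ0 = θ2 and $n_0 = n_2$, and we have
$$\begin{aligned}
T_p^{\rm tot} &= \frac{\left|t_p^{01}\right|^2\left|t_p^{12}\right|^2}{\left|e^{-ik_1 d\cos\theta_1} - r_p^{10}r_p^{12}e^{ik_1 d\cos\theta_1}\right|^2} \\
&= \frac{(5.76)(0.640)}{\left|e^{-i\left[i2.22\frac{d}{\lambda_{\rm vac}}\right]} - e^{-i1.287}e^{-i1.287}e^{i\left[i2.22\frac{d}{\lambda_{\rm vac}}\right]}\right|^2} \\
&= \frac{3.69}{\left(e^{2.22\frac{d}{\lambda_{\rm vac}}} - e^{-2.22\frac{d}{\lambda_{\rm vac}} - i2.574}\right)\left(e^{2.22\frac{d}{\lambda_{\rm vac}}} - e^{-2.22\frac{d}{\lambda_{\rm vac}} + i2.574}\right)} \\
&= \frac{3.69}{e^{4.44\frac{d}{\lambda_{\rm vac}}} + e^{-4.44\frac{d}{\lambda_{\rm vac}}} - 2\left(\frac{e^{i2.574} + e^{-i2.574}}{2}\right)} \\
&= \frac{3.69}{e^{4.44\frac{d}{\lambda_{\rm vac}}} + e^{-4.44\frac{d}{\lambda_{\rm vac}}} - 2\cos(2.574)} \\
&= \frac{3.69}{e^{4.44\frac{d}{\lambda_{\rm vac}}} + e^{-4.44\frac{d}{\lambda_{\rm vac}}} + 1.69}
\end{aligned} \tag{4.26}$$

Figure 4.5 Transmittance of p-polarized light through a gap between two 45° prisms with n = 1.5 as a function of gap thickness (Example 4.3).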
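As a cross-check on the algebra in Example 4.3, the general expression (4.14) can be evaluated directly with complex arithmetic and compared against the closed form (4.26). This is a minimal sketch using the Python standard library; the function names `T_general` and `T_closed` are illustrative, and the closed form uses the rounded constants quoted in the example.

```python
import cmath
import math

# Values from Example 4.3: two n = 1.5 prisms separated by a gap with n = 1.
n0 = n2 = 1.5
n1 = 1.0
theta0 = theta2 = math.radians(45.0)

sin0, cos0 = math.sin(theta0), math.cos(theta0)
sin2, cos2 = math.sin(theta2), math.cos(theta2)
sin1 = n0 * sin0 / n1                     # (4.20); exceeds one beyond critical angle
cos1 = 1j * math.sqrt(sin1**2 - 1)        # (4.21); purely imaginary

# Single-boundary p-polarized Fresnel coefficients as used in (4.22)-(4.24).
t01 = 2 * cos0 * sin1 / (cos1 * sin1 + cos0 * sin0)
t12 = 2 * cos1 * sin2 / (cos2 * sin2 + cos1 * sin1)
r10 = -(cos1 * sin1 - cos0 * sin0) / (cos1 * sin1 + cos0 * sin0)
r12 = r10                                 # because n0 = n2

def T_general(d_over_lam):
    """Transmittance from (4.14) for gap thickness d / lambda_vac."""
    kd_cos = 2 * math.pi * n1 * d_over_lam * cos1     # k1 d cos(theta1), imaginary
    denom = cmath.exp(-1j * kd_cos) - r10 * r12 * cmath.exp(1j * kd_cos)
    prefactor = (n2 * cos2) / (n0 * cos0)             # equals one here
    return prefactor * abs(t01)**2 * abs(t12)**2 / abs(denom)**2

def T_closed(d_over_lam):
    """Closed form (4.26) with the rounded constants from the text."""
    x = 4.44 * d_over_lam
    return 3.69 / (math.exp(x) + math.exp(-x) + 1.69)

for x in (0.0, 0.25, 0.5, 1.0, 2.0):
    print(f"d/lambda = {x:4.2f}: (4.14) gives {T_general(x):.4f}, (4.26) gives {T_closed(x):.4f}")
```

The two columns agree to within the rounding of the constants, and both fall off roughly exponentially once the gap exceeds a fraction of a wavelength.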
Figure 4.5 shows a plot of the transmittance (4.26) calculated in Example 4.3.
Notice that the transmittance is 100% when the two prisms are brought together
as expected ($T_p^{\rm tot}(d/\lambda_{\rm vac} = 0) = 1$). When the prisms are about a wavelength apart,
the transmittance is significantly reduced, and as the distance gets large compared
to a wavelength, the transmittance quickly goes to zero ($T_p^{\rm tot}(d/\lambda_{\rm vac} \gg 1) \approx 0$).

4.4 Fabry-Perot

In the 1890s, Charles Fabry realized that a double interface could be used to
distinguish wavelengths of light that are very close together. He and a talented
experimentalist colleague, Alfred Perot, constructed an instrument and began to
use it to make measurements on various spectral sources. The Fabry-Perot instrument
consists simply of two identical (parallel) surfaces separated by spacing d.
We can use our analysis in section 4.2 to describe this instrument. For simplicity,
we choose the refractive index before the initial surface and after the final surface
to be the same (i.e. $n_0 = n_2$). We assume that the transmission angles are such
that total internal reflection is avoided. Whether the double-boundary setup
has high or low transmittance depends on the exact spacing between the two
boundaries and on the reflectivity of the surfaces, as well as on the wavelength of
the light.

Maurice Paul Auguste Charles Fabry (1867-1945, French) was born in Marseille,
France. At age 18, he entered the École Polytechnique in Paris where he
studied for two years. Following that, he spent a number of years teaching state
secondary school while simultaneously working on a doctoral dissertation on
interference phenomena. After completing his doctorate, he began working
as a lecturer and laboratory assistant at the University of Marseille where a
decade later he was appointed a professor of physics. Soon after his arrival
at the University of Marseille, Fabry began a long and fruitful collaboration
with Alfred Perot (1863-1925). Fabry focused on theoretical analysis and
measurements while his colleague did the design work and construction of their
new interferometer, which they continually improved over the years. During his
career, Fabry made significant contributions to spectroscopy and astrophysics
and is credited with co-discovery of the ozone layer.

If the spacing d separating the two parallel surfaces is adjustable (scanned),
the instrument is called a Fabry-Perot interferometer. If the spacing is fixed while
the angle of the incident light is varied, the instrument is called a Fabry-Perot
etalon. An etalon can therefore be as simple as a piece of glass with parallel
surfaces. Sometimes, a thin optical membrane called a pellicle is used as an
etalon (occasionally inserted into laser cavities to discriminate against certain
wavelengths). However, to achieve sharp discrimination between closely-spaced
wavelengths, a large spacing d is desirable.
As we saw previously in (4.15), the transmittance through a double boundary
is
$$T^{\rm tot} = \frac{T^{\rm max}}{1 + F\sin^2\left(\frac{\Phi}{2}\right)} \tag{4.27}$$
In the case of identical interfaces on the incident and transmitted sides, the
transmittance and reflectance coefficients are the same at each surface (i.e.
$T = T^{01} = T^{12}$ and $R = R^{10} = R^{12}$). In this case, the maximum transmittance
and the finesse coefficient simplify to
$$T^{\rm max} = \frac{T^2}{(1 - R)^2} \tag{4.28}$$
and
$$F = \frac{4R}{(1 - R)^2} \tag{4.29}$$

Jean-Baptiste Alfred Perot (1863-1925, French)
In principle, these equations should be evaluated for either s- or p-polarized light.
However, a Fabry-Perot interferometer or etalon is usually operated near normal
incidence so that there is little difference between the two polarizations.
When using a Fabry-Perot instrument, one observes the transmittance T tot as
the parameter Φ is varied. The parameter Φ can be varied by altering d , θ1 , or λ
as prescribed by
$$\Phi = \frac{4\pi n_1 d}{\lambda_{\rm vac}}\cos\theta_1 + \delta_r \tag{4.30}$$
To increase the sensitivity of the instrument, it is desirable to have the transmittance
$T^{\rm tot}$ vary strongly when Φ is varied. By inspection of (4.27), we see that
$T^{\rm tot}$ varies most strongly if the finesse coefficient F is large. We achieve a large finesse
coefficient by increasing the reflectance R. To accomplish a large R, the two
surfaces need to reflect much better than, say, a simple air-glass interface.

The total transmittance $T^{\rm tot}$ (4.27) through a Fabry-Perot instrument is depicted
in Fig. 4.6 as a function of Φ. The various curves correspond to different
values of F. Typical values of Φ can be extremely large. For example, suppose
that the instrument is used at near-normal incidence (i.e. $\cos\theta_1 \cong 1$) with a wavelength
of λvac = 500 nm and an interface separation of $d_0$ = 1 cm. From (4.30) the
value of Φ (ignoring the constant phase terms $\delta_r$) is approximately
$$\Phi_0 = \frac{4\pi(1\ {\rm cm})}{500\ {\rm nm}} = 80{,}000\pi$$
As we vary d, λ, or θ1 by small amounts, we can easily cause Φ to change by 2π as
depicted in Fig. 4.6. The figure shows small changes in Φ above a value $\Phi_0$, which
represents a large multiple of 2π.

Figure 4.6 Transmittance as the phase Φ is varied. The different curves correspond to different values of the finesse coefficient. $\Phi_0$ represents a large multiple of 2π.
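The dependence of the fringe contrast on the finesse coefficient can be explored numerically. Below is a minimal sketch (standard-library Python; the function names `T_tot` and `finesse_coefficient` and the sample reflectances are illustrative choices, and $T^{\rm max}$ is set to one for simplicity) that evaluates (4.27) and (4.29) and reproduces the estimate of $\Phi_0$ above.

```python
import math

def finesse_coefficient(R):
    """Finesse coefficient (4.29) for two identical surfaces of reflectance R."""
    return 4 * R / (1 - R)**2

def T_tot(Phi, F, T_max=1.0):
    """Two-interface transmittance (4.27) as a function of the phase Phi."""
    return T_max / (1 + F * math.sin(Phi / 2)**2)

# Phase for d0 = 1 cm and lambda_vac = 500 nm at normal incidence,
# ignoring the constant reflection phase.
Phi0 = 4 * math.pi * 1e-2 / 500e-9
print(f"Phi0 / pi = {Phi0 / math.pi:.0f}")

# Transmittance midway between peaks (Phi = pi) for a few sample reflectances.
for R in (0.04, 0.5, 0.9):
    F = finesse_coefficient(R)
    print(f"R = {R:4.2f}: F = {F:7.2f}, T at peak = {T_tot(2 * math.pi, F):.3f}, "
          f"T between peaks = {T_tot(math.pi, F):.4f}")
```

For a bare air-glass reflectance (about 0.04) the transmittance barely dips between peaks, whereas for R = 0.9 it drops to well under one percent, which is why highly reflective coatings are needed.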
The basic setup of a Fabry-Perot instrument is shown in Fig. 4.7. In order to
achieve a relatively high reflectivity R (and therefore high finesse coefficient F),
special coatings can be applied to the surfaces, for example, a thin layer of silver
(or some other coating) to achieve a partial reflection, say 90%. Typically, two glass
substrates are separated by distance d, with the coated surfaces facing each other
as shown in the figure. The substrates are aligned so that the interior surfaces are
parallel to each other. It is typical for each substrate to be slightly wedge-shaped
so that unwanted reflections from the outer surfaces do not interfere with the
double boundary situation between the two plates.

Figure 4.7 Typical Fabry-Perot setup. If the spacing d is variable, it is called an interferometer; otherwise, it is called an etalon. (Figure labels: incident light, R, T, A, Ag coatings.)

Technically, each coating constitutes its own double-boundary problem (or
multiple-boundary as the case may be). We can ignore this detail and simply
think of the overall setup as a single two-interface problem. Without regard for the
details of the coatings, we can say that each coating has a certain reflectance
R and transmittance T. However, as light goes through the coating, it can also be
attenuated through absorption. Therefore, at each coating surface we have
$$R + T + A = 1 \tag{4.31}$$
where A represents the amount of light absorbed at a coating. The attenuation A
reduces the amount of light that makes it through the instrument, but it does not
impact the nature of the interferences within the instrument.

The reflection phase $\delta_r$ in (4.30) depends on the exact nature of the coatings
in the Fabry-Perot instrument. However, we do not need to know the value of $\delta_r$
(depending on both the complex index of the coating material and its thickness).
Whatever the value of $\delta_r$, we only care that it is constant. Experimentally, we can
always compensate for the $\delta_r$ by ‘tweaking’ the spacing d. Note that the required
‘tweak’ on the spacing need only be a fraction of a wavelength, which is typically
tiny compared to the overall spacing d.

Figure 4.8 Setup for a Fabry-Perot interferometer. (Figure labels: collimated light, interferometer, actuated substrate, angle adjustment, aperture, detector, trigger signal, oscilloscope.)
4.5 Setup of a Fabry-Perot Instrument

Figure 4.8 shows the typical experimental setup for a Fabry-Perot interferometer.
A collimated beam of light is sent through the instrument. The beam is aligned so
that it is normal to the surfaces. It is critical for the two surfaces of the interferometer
to be extremely close to parallel. When aligned correctly, the transmission
of a collimated beam will ‘blink’ all together as the spacing d is changed (by tiny
amounts). A mechanical actuator can be used to vary the spacing between the
plates while the transmittance is observed on a detector. To make the alignment
of the instrument somewhat less critical, a small aperture can be placed in front
of the detector so that it observes only a small portion of the beam.

The transmittance as a function of plate separation is shown in Fig. 4.9. In this
case, Φ varies via changes in d (see (4.30) with cos θ1 = 1 and fixed wavelength).
As the spacing is increased by only a half wavelength, the transmittance changes
through a complete period. The various peaks in the figure are called fringes.

Figure 4.9 Transmittance as the separation d is varied (F = 100). $d_0$ represents a large distance for which Φ is a multiple of 2π.
The setup for a Fabry-Perot etalon is similar to that of the interferometer
except that the spacing d remains fixed. Often the two surfaces in the etalon
are held parallel to each other by a precision ring spacer. An advantage to the
Fabry-Perot etalon (as opposed to the interferometer) is that no moving parts are
needed. To make measurements with an etalon, the angle of the light is varied
rather than the plate separation. After all, to see fringes, we just need to cause Φ in
(4.27) to vary in some way. According to (4.30), we can do that as easily by varying
θ1 as we can by varying d. One way to obtain a range of angles is to observe light
from a ‘point source’, as depicted in Fig. 4.10. Different portions of the beam go
through the device at different angles. When aligned straight on, the transmitted
light forms a ‘bull’s-eye’ pattern on a screen, as will be described below.

Figure 4.10 A diverging monochromatic beam traversing a Fabry-Perot etalon. (The angle of divergence is exaggerated in the figure.) (Figure labels: point source, etalon, angle adjustment, screen.)
In Fig. 4.11 we graph the transmittance $T^{\rm tot}$ (4.27) as a function of angle
(holding λvac = 500 nm and d = 1 cm fixed). Since cos θ1 is not a linear function,
the spacing of the peaks varies with angle. As θ1 increases from zero, the cosine
steadily decreases, causing Φ to decrease. Each time Φ decreases by 2π we get a
new peak. Not surprisingly, only a modest change in angle is necessary to cause
the transmittance to vary from maximum to minimum, or vice versa.

Figure 4.11 Transmittance through a Fabry-Perot etalon (F = 10) as the angle θ1 is varied. It is assumed that the distance d is chosen such that Φ is a multiple of 2π when the angle is zero.

The bull’s-eye pattern in Fig. 4.10 can be understood as the curve in Fig. 4.11
rotated about a circle. Depending on the exact spacing between the plates, the
radii (or angles) where the fringes occur can be different. For example, the center
spot could be dark.

Spectroscopic samples often are not compact point-like sources. Rather, they
are extended diffuse sources. The earlier setup shown in Fig. 4.10 won’t work for
extended sources unless all of the light at the sample is blocked except for a tiny
point. This is impractical if there remains insufficient illumination at the final
screen for observation.

In order to preserve as much light as possible, we can sandwich the etalon
between two lenses. We place the diffuse source at the focal point of the first lens.
We place the screen at the focal point of the second lens. This causes an image of
the source to appear on the screen.4

Each point of the diffuse source is mapped to a corresponding point on the
screen. Moreover, the light associated with any particular point of the source
travels as a collimated beam in the region between the lenses. Each collimated
beam traverses the etalon with a unique angle. The light associated with each
point traverses the etalon with higher or lower transmittance, according to the
differing angles. The result is that a bull’s eye pattern becomes superimposed
on the image of the diffuse source. One can observe the pattern directly by
substituting the lens and retina of the eye for the final lens and screen.

Figure 4.12 Setup of a Fabry-Perot etalon for looking at a diffuse source. (Figure labels: diffuse source, lens, etalon, lens, screen.)
4 If the diffuse source has the shape of Mickey Mouse, then an image of Mickey Mouse appears
on the screen. Imaging techniques are discussed in chapter 9.
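To get a feel for the angular scale of the bull’s-eye pattern, one can solve (4.30) for the angles at which Φ has dropped by whole multiples of 2π from its value at θ1 = 0. The sketch below (standard-library Python) assumes, as in Fig. 4.11, that Φ is a multiple of 2π at zero angle, so that the m-th ring satisfies cos θ1 = 1 − mλvac/(2n1d). The function name `ring_angles` is an illustrative choice.

```python
import math

def ring_angles(lam_vac, d, n1=1.0, count=5):
    """Angles (radians) of the first `count` transmission peaks of an etalon.

    Assumes Phi is a multiple of 2*pi at theta1 = 0, so peak m occurs where
    (4*pi*n1*d/lam_vac) * (1 - cos(theta1)) = 2*pi*m.
    """
    angles = []
    for m in range(count):
        cos_theta = 1 - m * lam_vac / (2 * n1 * d)
        angles.append(math.acos(cos_theta))
    return angles

# Values used for Fig. 4.11: lambda_vac = 500 nm, d = 1 cm.
angles = ring_angles(500e-9, 1e-2)
for m, theta in enumerate(angles):
    print(f"ring {m}: theta1 = {math.degrees(theta):.3f} deg")
```

The rings crowd together with increasing angle because cos θ1 falls off quadratically near zero, consistent with the unequal peak spacing described for Fig. 4.11.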

4.6 Distinguishing Nearby Wavelengths in a Fabry-Perot Instrument

Thus far, we have examined how the transmittance through a Fabry-Perot instru-
ment varies with surface separation d and angle θ1 . However, the main purpose
of a Fabry-Perot instrument is to measure small changes in the wavelength of
light, which similarly affect the value of Φ (see (4.30)).
Consider a Fabry-Perot interferometer where the transmittance through the
instrument is plotted as a function of surface separation d. Let the spacing
$d_0$ correspond to the case when Φ is a multiple of 2π for the wavelength λvac = $\lambda_0$.
Next, suppose we adjust the wavelength of the light from λvac = $\lambda_0$ to λvac =
$\lambda_0 + \Delta\lambda$ while observing the transmittance. As we do this, the value of Φ changes.
Figure 4.13 shows what happens as we scan the spacing d of the interferometer in
the neighborhood of $d_0$. The dashed line corresponds to a different wavelength.
As the wavelength changes, the plate separation at which a particular fringe
occurs also changes.

Figure 4.13 Transmittance as the spacing d is varied for two different wavelengths (F = 100). The solid line plots the transmittance of light with a wavelength of $\lambda_0$, and the dashed line plots the transmittance of a wavelength shorter than $\lambda_0$. Note that the fringes shift positions for different wavelengths.

We now derive the connection between a change in wavelength and the
amount that Φ changes, which gives rise to the fringe shift seen in Fig. 4.13.
Suppose that the transmittance through the Fabry-Perot instrument is maximum
at the wavelength $\lambda_0$. That is, we have
$$\Phi_0 = \frac{4\pi n_1 d_0\cos\theta_1}{\lambda_0} + \delta_r \tag{4.32}$$
where $\Phi_0$ is an integer multiple of 2π. At a new wavelength (all else remaining the
same) we have
$$\Phi = \frac{4\pi n_1 d_0\cos\theta_1}{\lambda_0 + \Delta\lambda} + \delta_r \tag{4.33}$$
The change in wavelength ∆λ is usually very small compared to λ0 , so we can
represent the denominator with the first two terms of a Taylor-series expansion:
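$$\frac{1}{\lambda_0 + \Delta\lambda} = \frac{1}{\lambda_0\left(1 + \Delta\lambda/\lambda_0\right)} \cong \frac{1 - \Delta\lambda/\lambda_0}{\lambda_0} \tag{4.34}$$
Then the difference between $\Phi_0$ and Φ can be rewritten as
$$\Delta\Phi \equiv \Phi_0 - \Phi = \frac{4\pi n_1 d_0\cos\theta_1}{\lambda_0^2}\Delta\lambda \tag{4.35}$$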
If the change in wavelength is enough to cause ∆Φ = 2π, the fringes in Fig. 4.13
shift through a whole period, and the picture looks the same.
This brings up an important limitation of the instrument. If the fringes shift
by too much, we might become confused as to what exactly has changed, owing
to the periodic nature of the fringes. If two wavelengths aren’t sufficiently close,
the fringes of one wavelength may be shifted past several fringes of the other
wavelength, and we will not be able to tell by how much they differ.

This introduces the concept of free spectral range, which is the wavelength
change ∆λFSR that causes the fringes to shift through one period. We find this by
setting (4.35) equal to 2π. After rearranging, we get
$$\Delta\lambda_{\rm FSR} = \frac{\lambda_{\rm vac}^2}{2n_1 d_0\cos\theta_1} \tag{4.36}$$
The free spectral range tends to be extremely narrow; a Fabry-Perot instrument is
not well suited for measuring wavelength ranges wider than this. In summary, the
free spectral range is the largest change in wavelength permissible while avoiding
confusion. To convert this wavelength difference ∆λFSR into a corresponding
frequency difference, one differentiates ν = c/λvac to get
$$\left|\Delta\nu_{\rm FSR}\right| = \frac{c\,\Delta\lambda_{\rm FSR}}{\lambda_{\rm vac}^2} \tag{4.37}$$
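Written out, the differentiation step and what follows from combining it with (4.36) read as below. This is a short worked step rather than part of the original derivation; the numerical illustration assumes $n_1 = 1$, $d_0 = 1$ cm, and normal incidence, the same values used for illustration in section 4.4.

```latex
\nu = \frac{c}{\lambda_{\rm vac}}
\quad\Rightarrow\quad
d\nu = -\frac{c}{\lambda_{\rm vac}^2}\,d\lambda_{\rm vac}
\quad\Rightarrow\quad
\left|\Delta\nu_{\rm FSR}\right|
  = \frac{c}{\lambda_{\rm vac}^2}\,\Delta\lambda_{\rm FSR}
  = \frac{c}{\lambda_{\rm vac}^2}\cdot\frac{\lambda_{\rm vac}^2}{2 n_1 d_0\cos\theta_1}
  = \frac{c}{2 n_1 d_0\cos\theta_1}
% Illustration (assumed values): n_1 = 1, d_0 = 1 cm, cos(theta_1) = 1
% |Delta nu_FSR| = (3.00e8 m/s) / (2 * 0.01 m) = 1.5e10 Hz = 15 GHz
```

Expressed in frequency, the free spectral range no longer depends on the wavelength; it is fixed by the optical path between the plates.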
We next consider the smallest change in wavelength that can be noticed,

or resolved with a Fabry-Perot instrument. For example, if two very near-by
wavelengths are sent through the instrument simultaneously, we can distinguish
them only if the separation between their corresponding fringe peaks is at least
as large as the width of individual peaks. This situation of two barely resolvable
fringe peaks is illustrated in Fig. 4.14 for a diverging beam traversing an etalon.
We will look for the wavelength change that causes a peak to shift by its own
width. We define the width of a peak by its full width at half maximum (FWHM).
Again, let $\Phi_0$ be a multiple of 2π so that a peak in transmittance occurs when
Φ = $\Phi_0$. In this case, we have from (4.27) that
$$T^{\rm tot} = \frac{T^{\rm max}}{1 + F\sin^2\left(\frac{\Phi_0}{2}\right)} = T^{\rm max} \tag{4.38}$$
If Φ varies from $\Phi_0$ to $\Phi_0 \pm \Phi_{\rm FWHM}/2$, then, by definition, the transmittance drops
to one half. Therefore, we may write
$$T^{\rm tot} = \frac{T^{\rm max}}{1 + F\sin^2\left(\frac{\Phi_0 \pm \Phi_{\rm FWHM}/2}{2}\right)} = \frac{T^{\rm max}}{2} \tag{4.39}$$

Figure 4.14 Transmittance of a diverging beam through a Fabry-Perot etalon. Two nearby wavelengths are sent through the instrument simultaneously, (top) barely resolved and (bottom) easily resolved.
We solve (4.39) for ΦFWHM , and we see that this equation requires
$$F\sin^2\left(\frac{\Phi_{\rm FWHM}}{4}\right) = 1 \tag{4.40}$$
where we have taken advantage of the fact that Φ0 is a multiple of 2π. Next,
we suppose that ΦFWHM is rather small so that we may represent the sine by its
argument. This approximation is okay if the finesse coefficient F is rather large
(say, 100). With this approximation, (4.40) simplifies to
$$\Phi_{\rm FWHM} \cong \frac{4}{\sqrt{F}} \tag{4.41}$$
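The quality of the small-angle approximation leading to (4.41) is easy to check by solving (4.40) exactly. The following is a minimal sketch in standard-library Python; the function names are illustrative, and the exact solution requires F ≥ 1 so that the arcsine is defined.

```python
import math

def fwhm_exact(F):
    """Exact solution of (4.40): F * sin^2(Phi_FWHM / 4) = 1 (requires F >= 1)."""
    return 4 * math.asin(1 / math.sqrt(F))

def fwhm_approx(F):
    """Small-angle approximation (4.41)."""
    return 4 / math.sqrt(F)

def T_tot(Phi, F):
    """Transmittance (4.27) normalized to T_max = 1."""
    return 1 / (1 + F * math.sin(Phi / 2)**2)

for F in (1, 10, 100, 1000):
    exact, approx = fwhm_exact(F), fwhm_approx(F)
    rel_err = (exact - approx) / exact
    print(f"F = {F:5d}: exact = {exact:.5f}, approx = {approx:.5f}, "
          f"relative error = {100 * rel_err:.2f}%")
```

For F = 100 the approximation is already accurate to a fraction of a percent, while for F near one the peaks are so broad that the notion of a narrow width loses its meaning.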

The ratio of the period between peaks 2π to the width ΦFWHM of individual peaks
is called the reflecting finesse (or just finesse).
$$f \equiv \frac{2\pi}{\Phi_{\rm FWHM}} = \frac{\pi\sqrt{F}}{2} \tag{4.42}$$
This parameter is often used to characterize the performance of a Fabry-Perot
instrument. Note that a higher finesse f implies sharper fringes in comparison to
the fringe spacing.
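The ratio of the free spectral range $\Delta\lambda_{\rm FSR}$ to the minimum resolvable wavelength
change $\Delta\lambda_{\rm FWHM}$ is the same as the ratio of a whole period 2π to $\Phi_{\rm FWHM}$, namely the
reflecting finesse f. Therefore, we have
$$\Delta\lambda_{\rm FWHM} = \frac{\Delta\lambda_{\rm FSR}}{f} = \frac{\lambda_{\rm vac}^2}{\pi n_1 d_0\cos\theta_1\sqrt{F}} \tag{4.43}$$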
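As a final note, the ratio of $\lambda_0$ to $\Delta\lambda_{\rm min}$, where $\Delta\lambda_{\rm min}$ is the minimum change
of wavelength that the instrument can distinguish in the neighborhood of $\lambda_0$, is
called the resolving power. For a Fabry-Perot instrument, $\Delta\lambda_{\rm min} = \Delta\lambda_{\rm FWHM}$, so
$$RP \equiv \frac{\lambda_0}{\Delta\lambda_{\rm FWHM}} \tag{4.44}$$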
Fabry-Perot instruments tend to have very high resolving powers since they re-
spond to very small differences in wavelength.
Example 4.4
A Fabry-Perot instrument has plate spacing d 0 = 1 cm, reflectivity R = 0.85, and
index n 1 = 1. If it is used in the neighborhood of λvac = 500 nm, find the free
spectral range, the finesse, the minimum distinguishable wavelength separation,
and the resolving power.
Solution: From (4.36), the free spectral range is
$$\Delta\lambda_{\rm FSR} = \frac{\lambda_{\rm vac}^2}{2n_1 d_0\cos\theta_1} = \frac{(500\ {\rm nm})^2}{2(1)(1\ {\rm cm})\cos 0^\circ} = 0.0125\ {\rm nm}$$
This means that we should not use the instrument to distinguish wavelengths that
are separated by more than this small amount. From (4.29), the finesse coefficient
is
$$F = \frac{4R}{(1 - R)^2} = \frac{4(0.85)}{(1 - 0.85)^2} = 151$$
and by (4.42) the finesse is
$$f = \frac{\pi\sqrt{F}}{2} = \frac{\pi\sqrt{151}}{2} = 19.3$$
The minimum resolvable wavelength change is then
$$\Delta\lambda_{\rm FWHM} = \frac{\Delta\lambda_{\rm FSR}}{f} = \frac{0.0125\ {\rm nm}}{19.3} = 0.00065\ {\rm nm} \tag{4.45}$$
The instrument can distinguish two wavelengths separated by this tiny amount,
which gives an impressive resolving power of
$$RP = \frac{\lambda_{\rm vac}}{\Delta\lambda_{\rm FWHM}} = \frac{500\ {\rm nm}}{0.00065\ {\rm nm}} \approx 772{,}000$$
(The quoted value results from carrying the unrounded $\Delta\lambda_{\rm FWHM}$ through the division.)
For comparison, the resolving power of a typical grating spectrometer is much less
(a few thousand). However, a grating spectrometer has the advantage that it can
simultaneously observe wavelengths over hundreds of nanometers, whereas the
Fabry-Perot instrument is confined to the extremely narrow free spectral range.
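The numbers in Example 4.4 follow from (4.29), (4.36), (4.42), (4.43), and (4.44) in sequence, which makes them convenient to script. The sketch below uses standard-library Python with the values stated in the example; the variable names are illustrative.

```python
import math

# Values from Example 4.4.
lam_vac = 500e-9    # wavelength in meters
d0 = 1e-2           # plate spacing in meters
n1 = 1.0            # index between the plates
R = 0.85            # reflectance of each surface
theta1 = 0.0        # normal incidence

fsr = lam_vac**2 / (2 * n1 * d0 * math.cos(theta1))   # free spectral range (4.36)
F = 4 * R / (1 - R)**2                                # finesse coefficient (4.29)
f = math.pi * math.sqrt(F) / 2                        # reflecting finesse (4.42)
dlam_fwhm = fsr / f                                   # minimum resolvable change (4.43)
RP = lam_vac / dlam_fwhm                              # resolving power (4.44)

print(f"free spectral range = {fsr * 1e9:.4f} nm")
print(f"finesse coefficient F = {F:.1f}")
print(f"finesse f = {f:.1f}")
print(f"minimum resolvable change = {dlam_fwhm * 1e9:.5f} nm")
print(f"resolving power = {RP:.0f}")
```

Changing `R` shows how quickly the resolving power improves with reflectance, while changing `d0` trades resolving power against free spectral range.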
4.7 Multilayer Coatings

As we saw in Example 4.2, a single coating cannot always accomplish a desired
effect, especially if the goal is to make a highly reflective mirror. For example, if
we want to make a mirror surface using a dielectric (i.e. nonmetallic) coating,
a single layer is insufficient to reflect the majority of the light. In P4.5 we find
that a single dielectric layer deposited on glass can reflect at most about 46%
of the light, even when we use a material with very high index. We would like
to do much better (e.g. >99%), and this can be accomplished with multilayer
dielectric coatings. Multilayer dielectric coatings can perform considerably better
than metal surfaces such as silver and have the advantage of being less prone to
damage.
In this section, we develop the formalism for dealing with arbitrary numbers
of parallel interfaces (i.e. multilayer coatings). Rather than incorporate the single-
interface Fresnel coefficients into the problem as we did in section 4.1, we will
find it easier to return to the fundamental boundary conditions for the electric
and magnetic fields at each interface between the layers.
We examine p-polarized light incident on an arbitrary multilayer coating (all
interfaces parallel to each other). It is left as an exercise to re-derive the formalism
for s-polarized light (see P4.13). The upcoming derivation is valid also for complex
refractive indices, although our notation suggests real indices. The ability to deal
with complex indices is very important if, for example, we want to make mirror
coatings work in the extreme ultraviolet wavelength range where virtually every
material is absorptive. Consider the diagram of a multilayer coating in Fig. 4.15
for which the angle of light propagation in each region may be computed from
Snell’s law:
n 0 sin θ0 = n 1 sin θ1 = · · · = n N sin θN = n N +1 sin θN +1 (4.46)
where N denotes the number of layers in the coating. The subscript 0 represents
the initial medium outside of the multilayer, and the subscript N + 1 represents
the final material, or the substrate on which the layers are deposited.
In each layer, only two plane waves exist, each of which is composed of light
arising from the many possible bounces from various layer interfaces. The arrows
pointing right indicate plane wave fields in individual layers that travel roughly
in the forward (incident) direction, and the arrows pointing left indicate plane
wave fields that travel roughly in the backward (reflected) direction. In the final
region, there is only one plane wave traveling with a forward direction ($E_{N+1\rightarrow}^{(p)}$)
which gives the overall transmitted field.

Figure 4.15 Light propagation through multiple layers. (Axis label: z-direction.)
As we have studied in chapter 3 (see (3.9) and (3.13)), the boundary conditions
for the parallel components of the E field and for the parallel components of the
B field lead respectively to
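$$\cos\theta_0\left(E_{0\rightarrow}^{(p)} + E_{0\leftarrow}^{(p)}\right) = \cos\theta_1\left(E_{1\rightarrow}^{(p)} + E_{1\leftarrow}^{(p)}\right) \tag{4.47}$$
and
$$n_0\left(E_{0\rightarrow}^{(p)} - E_{0\leftarrow}^{(p)}\right) = n_1\left(E_{1\rightarrow}^{(p)} - E_{1\leftarrow}^{(p)}\right) \tag{4.48}$$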
Similar equations give the field connection for s-polarized light (see (3.8) and
(3.14)).
We have applied these boundary conditions at the first interface only. Of
course there are many more interfaces in the multilayer. For the connection
between the j th layer and the next, we may similarly write
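$$\cos\theta_j\left(E_{j\rightarrow}^{(p)}e^{ik_j\ell_j\cos\theta_j} + E_{j\leftarrow}^{(p)}e^{-ik_j\ell_j\cos\theta_j}\right) = \cos\theta_{j+1}\left(E_{j+1\rightarrow}^{(p)} + E_{j+1\leftarrow}^{(p)}\right) \tag{4.49}$$
and
$$n_j\left(E_{j\rightarrow}^{(p)}e^{ik_j\ell_j\cos\theta_j} - E_{j\leftarrow}^{(p)}e^{-ik_j\ell_j\cos\theta_j}\right) = n_{j+1}\left(E_{j+1\rightarrow}^{(p)} - E_{j+1\leftarrow}^{(p)}\right) \tag{4.50}$$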
Here we have set the origin within each layer at the left surface. Then when
making the connection with the subsequent layer at the right surface, we must
specifically take into account the phase $\mathbf{k}_j\cdot\left(\ell_j\hat{\mathbf{z}}\right) = k_j\ell_j\cos\theta_j$. This corresponds
to the phase acquired by the plane wave field in traversing the layer with thickness
$\ell_j$. The right-hand sides of (4.49) and (4.50) need no phase adjustment since the
(j + 1)th field is evaluated on the left side of its layer.
At the final interface, the boundary conditions are
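$$\cos\theta_N\left(E_{N\rightarrow}^{(p)}e^{ik_N\ell_N\cos\theta_N} + E_{N\leftarrow}^{(p)}e^{-ik_N\ell_N\cos\theta_N}\right) = \cos\theta_{N+1}E_{N+1\rightarrow}^{(p)} \tag{4.51}$$
and
$$n_N\left(E_{N\rightarrow}^{(p)}e^{ik_N\ell_N\cos\theta_N} - E_{N\leftarrow}^{(p)}e^{-ik_N\ell_N\cos\theta_N}\right) = n_{N+1}E_{N+1\rightarrow}^{(p)} \tag{4.52}$$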
since there is no backward-traveling field in the final medium.

At this point we are ready to solve (4.47)–(4.52). We would like to eliminate
all fields besides $E_{0\rightarrow}^{(p)}$, $E_{0\leftarrow}^{(p)}$, and $E_{N+1\rightarrow}^{(p)}$. Then we will be able to find the overall
reflectance and transmittance of the multilayer coating. In solving (4.47)–(4.52),
we must proceed with care, or the algebra can quickly get out of hand. Fortunately,
most students have had training in linear algebra, and this is a case where that
training pays off.
We first write a general matrix equation that summarizes the mathematics in
(4.47)–(4.52), as follows:
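$$\begin{bmatrix}\cos\theta_j e^{i\beta_j} & \cos\theta_j e^{-i\beta_j}\\ n_j e^{i\beta_j} & -n_j e^{-i\beta_j}\end{bmatrix}\begin{bmatrix}E_{j\rightarrow}^{(p)}\\ E_{j\leftarrow}^{(p)}\end{bmatrix} = \begin{bmatrix}\cos\theta_{j+1} & \cos\theta_{j+1}\\ n_{j+1} & -n_{j+1}\end{bmatrix}\begin{bmatrix}E_{j+1\rightarrow}^{(p)}\\ E_{j+1\leftarrow}^{(p)}\end{bmatrix} \tag{4.53}$$
where
$$\beta_j \equiv \begin{cases}0 & j = 0\\ k_j\ell_j\cos\theta_j & 1 \le j \le N\end{cases} \tag{4.54}$$
and
$$E_{N+1\leftarrow}^{(p)} \equiv 0 \tag{4.55}$$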
(It would be good to take a moment to convince yourself that this set of matrix
equations properly represents (4.47)–(4.52) before proceeding.) We rewrite (4.53)
as
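$$\begin{bmatrix}E_{j\rightarrow}^{(p)}\\ E_{j\leftarrow}^{(p)}\end{bmatrix} = \begin{bmatrix}\cos\theta_j e^{i\beta_j} & \cos\theta_j e^{-i\beta_j}\\ n_j e^{i\beta_j} & -n_j e^{-i\beta_j}\end{bmatrix}^{-1}\begin{bmatrix}\cos\theta_{j+1} & \cos\theta_{j+1}\\ n_{j+1} & -n_{j+1}\end{bmatrix}\begin{bmatrix}E_{j+1\rightarrow}^{(p)}\\ E_{j+1\leftarrow}^{(p)}\end{bmatrix} \tag{4.56}$$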
Keep in mind that (4.56) represents a distinct matrix equation for each differ-
ent j . We can substitute the j = 1 equation into the j = 0 equation to get
(p) ¸−1 (p)
E 0 cos θ0 cos θ0 cos θ2 cos θ2 E 2
· ¸ · · ¸· ¸
(p)
(p) = M1 (p) (4.57)
E0 n0 −n 0 n2 −n 2 E2
where we have grouped the matrices related to the j = 1 layer together via
¸−1
cos θ1 cos θ1 cos θ1 e i β1 cos θ1 e −i β1
· ¸·
(p)
M1 ≡ (4.58)
n1 −n 1 n 1 e i β1 −n 1 e −i β1
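We can continue to substitute into this equation progressively higher order equations
(i.e. for $j = 2$, $j = 3$, ...) until we reach the $j = N$ layer. All together this will
give
$$\begin{bmatrix} \overrightarrow{E}_0^{(p)} \\ \overleftarrow{E}_0^{(p)} \end{bmatrix} = \begin{bmatrix} \cos\theta_0 & \cos\theta_0 \\ n_0 & -n_0 \end{bmatrix}^{-1} \left( \prod_{j=1}^{N} M_j^{(p)} \right) \begin{bmatrix} \cos\theta_{N+1} & \cos\theta_{N+1} \\ n_{N+1} & -n_{N+1} \end{bmatrix} \begin{bmatrix} \overrightarrow{E}_{N+1}^{(p)} \\ 0 \end{bmatrix} \tag{4.59}$$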

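where the matrices related to the $j$th layer are grouped together according to
$$\begin{aligned} M_j^{(p)} &\equiv \begin{bmatrix} \cos\theta_j & \cos\theta_j \\ n_j & -n_j \end{bmatrix} \begin{bmatrix} \cos\theta_j e^{i\beta_j} & \cos\theta_j e^{-i\beta_j} \\ n_j e^{i\beta_j} & -n_j e^{-i\beta_j} \end{bmatrix}^{-1} \\ &= \begin{bmatrix} \cos\beta_j & -i \sin\beta_j \cos\theta_j / n_j \\ -i n_j \sin\beta_j / \cos\theta_j & \cos\beta_j \end{bmatrix} \end{aligned} \tag{4.60}$$
The matrix inversion in the first line was performed using (0.50). The symbol $\prod$
signifies the product of the matrices with the lowest subscripts on the left:
$$\prod_{j=1}^{N} M_j^{(p)} \equiv M_1^{(p)} M_2^{(p)} \cdots M_N^{(p)} \tag{4.61}$$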
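As a finishing touch, we divide (4.59) by the incident field $\overrightarrow{E}_0^{(p)}$ as well as perform
the matrix inversion on the right-hand side to obtain
$$\begin{bmatrix} 1 \\ \overleftarrow{E}_0^{(p)} \big/ \overrightarrow{E}_0^{(p)} \end{bmatrix} = A^{(p)} \begin{bmatrix} \overrightarrow{E}_{N+1}^{(p)} \big/ \overrightarrow{E}_0^{(p)} \\ 0 \end{bmatrix} \tag{4.62}$$
where
$$A^{(p)} \equiv \begin{bmatrix} a_{11}^{(p)} & a_{12}^{(p)} \\ a_{21}^{(p)} & a_{22}^{(p)} \end{bmatrix} = \frac{1}{2 n_0 \cos\theta_0} \begin{bmatrix} n_0 & \cos\theta_0 \\ n_0 & -\cos\theta_0 \end{bmatrix} \left( \prod_{j=1}^{N} M_j^{(p)} \right) \begin{bmatrix} \cos\theta_{N+1} & 0 \\ n_{N+1} & 0 \end{bmatrix} \tag{4.63}$$
In the final matrix in (4.63) we have replaced the entries in the right column with
zeros. This is permissible since it operates on a column vector with zero in the
bottom component.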
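Equation (4.62) represents two equations, which must be solved simultaneously
to find the ratios $\overleftarrow{E}_0^{(p)} / \overrightarrow{E}_0^{(p)}$ and $\overrightarrow{E}_{N+1}^{(p)} / \overrightarrow{E}_0^{(p)}$. Once the matrix $A^{(p)}$ is computed,
this is a relatively simple task:
$$t_p \equiv \frac{\overrightarrow{E}_{N+1}^{(p)}}{\overrightarrow{E}_0^{(p)}} = \frac{1}{a_{11}^{(p)}} \qquad \text{(Multilayer)} \tag{4.64}$$
$$r_p \equiv \frac{\overleftarrow{E}_0^{(p)}}{\overrightarrow{E}_0^{(p)}} = \frac{a_{21}^{(p)}}{a_{11}^{(p)}} \qquad \text{(Multilayer)} \tag{4.65}$$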
The convenience of this notation lies in the fact that we can deal with an
arbitrary number of layers N with varying thickness and index. The essential
information for each layer is contained succinctly in its respective 2 × 2 matrix.
To find the overall effect of the many layers, we need only multiply the matrices
for each layer together to find A from which we compute the reflection and
transmission coefficients for the whole system.
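The matrix recipe above translates almost directly into code. The following Python sketch is only an illustration under our own assumptions: the function names, the use of Snell's law ($n_0 \sin\theta_0 = n_j \sin\theta_j$) to obtain each $\cos\theta_j$, and the sample coating (a single layer of index 1.38, a quarter wave thick at 550 nm, on glass of index 1.5) are illustrative choices and not part of the derivation. It builds each layer matrix from (4.60), forms the elements $a_{11}^{(p)}$ and $a_{21}^{(p)}$ of (4.63), and returns $r_p$ and $t_p$ from (4.64) and (4.65).

```python
import cmath
import math

def matmul2(a, b):
    """Multiply two 2x2 matrices stored as nested tuples."""
    return (
        (a[0][0]*b[0][0] + a[0][1]*b[1][0], a[0][0]*b[0][1] + a[0][1]*b[1][1]),
        (a[1][0]*b[0][0] + a[1][1]*b[1][0], a[1][0]*b[0][1] + a[1][1]*b[1][1]),
    )

def cos_theta(n0, theta0, n):
    """Cosine of the propagation angle in a medium of index n (assumes Snell's law)."""
    s = n0*math.sin(theta0)/n
    return cmath.sqrt(1 - s*s)

def layer_matrix_p(n, cos_t, thickness, lam_vac):
    """Layer matrix (4.60) for p-polarized light."""
    beta = 2*math.pi*n*thickness*cos_t/lam_vac  # (4.54) with k_j = 2*pi*n_j/lam_vac
    return (
        (cmath.cos(beta), -1j*cmath.sin(beta)*cos_t/n),
        (-1j*n*cmath.sin(beta)/cos_t, cmath.cos(beta)),
    )

def multilayer_p(n0, n_sub, layers, theta0, lam_vac):
    """Return (r_p, t_p); layers is a list of (index, thickness), first layer first."""
    c0 = math.cos(theta0)
    c_sub = cos_theta(n0, theta0, n_sub)
    m = ((1, 0), (0, 1))
    for n, d in layers:  # lowest subscript on the left, as in (4.61)
        m = matmul2(m, layer_matrix_p(n, cos_theta(n0, theta0, n), d, lam_vac))
    left = ((n0, c0), (n0, -c0))
    right = ((c_sub, 0), (n_sub, 0))
    a = matmul2(matmul2(left, m), right)
    a11 = a[0][0]/(2*n0*c0)
    a21 = a[1][0]/(2*n0*c0)
    return a21/a11, 1/a11

# Illustrative use: one quarter-wave layer (n = 1.38) on glass at normal incidence.
lam = 550e-9
r_p, t_p = multilayer_p(1.0, 1.5, [(1.38, lam/(4*1.38))], 0.0, lam)
print("R =", abs(r_p)**2)
```

With an empty layer list the function reduces to the single-interface Fresnel coefficients, which makes a convenient sanity check before adding layers.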
The derivation for s-polarized light is similar to the above derivation for p-polarized
light. The equation corresponding to (4.62) for s-polarized light turns
out to be
$$\begin{bmatrix} 1 \\ \overleftarrow{E}_0^{(s)} \big/ \overrightarrow{E}_0^{(s)} \end{bmatrix} = A^{(s)} \begin{bmatrix} \overrightarrow{E}_{N+1}^{(s)} \big/ \overrightarrow{E}_0^{(s)} \\ 0 \end{bmatrix} \tag{4.66}$$
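where
$$A^{(s)} \equiv \begin{bmatrix} a_{11}^{(s)} & a_{12}^{(s)} \\ a_{21}^{(s)} & a_{22}^{(s)} \end{bmatrix} = \frac{1}{2 n_0 \cos\theta_0} \begin{bmatrix} n_0 \cos\theta_0 & 1 \\ n_0 \cos\theta_0 & -1 \end{bmatrix} \left( \prod_{j=1}^{N} M_j^{(s)} \right) \begin{bmatrix} 1 & 0 \\ n_{N+1} \cos\theta_{N+1} & 0 \end{bmatrix} \tag{4.67}$$
and
$$M_j^{(s)} = \begin{bmatrix} \cos\beta_j & -i \sin\beta_j / (n_j \cos\theta_j) \\ -i n_j \cos\theta_j \sin\beta_j & \cos\beta_j \end{bmatrix} \tag{4.68}$$
The transmission and reflection coefficients are found (as before) from
$$t_s \equiv \frac{\overrightarrow{E}_{N+1}^{(s)}}{\overrightarrow{E}_0^{(s)}} = \frac{1}{a_{11}^{(s)}} \qquad \text{(Multilayer)} \tag{4.69}$$
$$r_s \equiv \frac{\overleftarrow{E}_0^{(s)}}{\overrightarrow{E}_0^{(s)}} = \frac{a_{21}^{(s)}}{a_{11}^{(s)}} \qquad \text{(Multilayer)} \tag{4.70}$$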
4.8 Repeated Multilayer Stacks

Many different types of multilayer coatings are possible. For example, a Brewster’s-
angle polarizer has a coating designed to transmit with high efficiency p-polarized
light while simultaneously reflecting s-polarized light with high efficiency. The
backside of the substrate is left uncoated where p-polarized light passes with
100% efficiency at Brewster’s angle.
Sometimes multilayer coatings are made with repeated stacks of layers. In
general, if the same series of layers in (??) is repeated many times, say $q$ times,
Sylvester’s theorem (see appendix 0.4) can come in handy:
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^q = \frac{1}{\sin\theta} \begin{bmatrix} A \sin q\theta - \sin (q-1)\theta & B \sin q\theta \\ C \sin q\theta & D \sin q\theta - \sin (q-1)\theta \end{bmatrix} \tag{4.71}$$
where
$$\cos\theta \equiv \frac{1}{2}(A + D). \tag{4.72}$$
This formula relies on the condition $AD - BC = 1$, which is true for matrices of
the form (4.60) and (4.68) or any product of them. Here, $A$, $B$, $C$, and $D$ represent
the elements of a matrix composed of a block of matrices corresponding to a
repeated pattern within the stack.
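As a quick numerical check of (4.71), the sketch below compares Sylvester's formula with brute-force repeated multiplication for one high/low layer pair. The helper names, the restriction to normal incidence, and the sample values (indices 2.32 and 1.38 with phases $\beta = 0.7$ and $1.1$) are our own illustrative assumptions. The formula breaks down when $\sin\theta = 0$ (i.e. $A + D = \pm 2$), a case this sketch does not handle.

```python
import cmath

def matmul2(a, b):
    """Multiply two 2x2 matrices stored as nested tuples."""
    return (
        (a[0][0]*b[0][0] + a[0][1]*b[1][0], a[0][0]*b[0][1] + a[0][1]*b[1][1]),
        (a[1][0]*b[0][0] + a[1][1]*b[1][0], a[1][0]*b[0][1] + a[1][1]*b[1][1]),
    )

def layer_matrix(n, beta):
    """Layer matrix of the form (4.60) at normal incidence (cos(theta_j) = 1)."""
    return (
        (cmath.cos(beta), -1j*cmath.sin(beta)/n),
        (-1j*n*cmath.sin(beta), cmath.cos(beta)),
    )

def matpow_direct(m, q):
    """q-th power by repeated multiplication."""
    result = ((1, 0), (0, 1))
    for _ in range(q):
        result = matmul2(result, m)
    return result

def matpow_sylvester(m, q):
    """q-th power of a unit-determinant 2x2 matrix from (4.71) and (4.72)."""
    (A, B), (C, D) = m
    theta = cmath.acos((A + D)/2)  # complex-valued when |A + D| > 2
    s = cmath.sin(theta)
    sq = cmath.sin(q*theta)
    sq1 = cmath.sin((q - 1)*theta)
    return (
        ((A*sq - sq1)/s, B*sq/s),
        (C*sq/s, (D*sq - sq1)/s),
    )

# One repeated block: a high-index layer followed by a low-index layer.
block = matmul2(layer_matrix(2.32, 0.7), layer_matrix(1.38, 1.1))
direct = matpow_direct(block, 5)
sylvester = matpow_sylvester(block, 5)
print(max(abs(direct[i][j] - sylvester[i][j]) for i in range(2) for j in range(2)))
```

The printed number is the largest element-wise difference between the two results; it should be at the level of floating-point round-off.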
In general, high-reflection coatings are designed with alternating high and
low refractive indices. For high reflectivity, each layer should have a quarter-wave
thickness. Since the layers alternate high and low indices, at every other
boundary there is a phase shift of $\pi$ upon reflection from the interface. Hence,
the quarter wavelength spacing is appropriate to give constructive interference in
the reflected direction.
Figure 4.16 A repeated multilayer structure with alternating high and low indexes where each layer is a quarter wavelength in thickness. This structure can achieve very high reflectance.

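Example 4.5
Derive the reflection and transmission coefficients for p-polarized light interacting
with a high reflector constructed using a $\lambda/4$ stack.
Solution: For a $\lambda/4$ stack we need
$$\beta_j = \frac{\pi}{2}$$
This amounts to a thickness requirement of
$$\ell_j = \frac{\lambda_{\text{vac}}}{4 n_j \cos\theta_j}$$
In this situation, the matrix (4.60) for each layer simplifies to
$$M_j^{(p)} = \begin{bmatrix} 0 & -i \cos\theta_j / n_j \\ -i n_j / \cos\theta_j & 0 \end{bmatrix}$$
The matrices for a high and a low refractive index layer are multiplied together in
the usual manner. Each layer pair takes the form
$$\begin{bmatrix} 0 & -\dfrac{i \cos\theta_H}{n_H} \\ -\dfrac{i n_H}{\cos\theta_H} & 0 \end{bmatrix} \begin{bmatrix} 0 & -\dfrac{i \cos\theta_L}{n_L} \\ -\dfrac{i n_L}{\cos\theta_L} & 0 \end{bmatrix} = \begin{bmatrix} -\dfrac{n_L \cos\theta_H}{n_H \cos\theta_L} & 0 \\ 0 & -\dfrac{n_H \cos\theta_L}{n_L \cos\theta_H} \end{bmatrix}$$
To extend to $q = N/2$ identical layer pairs, we have
$$\prod_{j=1}^{N} M_j^{(p)} = \begin{bmatrix} -\dfrac{n_L \cos\theta_H}{n_H \cos\theta_L} & 0 \\ 0 & -\dfrac{n_H \cos\theta_L}{n_L \cos\theta_H} \end{bmatrix}^q = \begin{bmatrix} \left(-\dfrac{n_L \cos\theta_H}{n_H \cos\theta_L}\right)^q & 0 \\ 0 & \left(-\dfrac{n_H \cos\theta_L}{n_L \cos\theta_H}\right)^q \end{bmatrix}$$
Substituting this into (4.63), we obtain
$$A^{(p)} = \frac{1}{2} \begin{bmatrix} \left(-\dfrac{n_L \cos\theta_H}{n_H \cos\theta_L}\right)^q \dfrac{\cos\theta_{N+1}}{\cos\theta_0} + \left(-\dfrac{n_H \cos\theta_L}{n_L \cos\theta_H}\right)^q \dfrac{n_{N+1}}{n_0} & 0 \\ \left(-\dfrac{n_L \cos\theta_H}{n_H \cos\theta_L}\right)^q \dfrac{\cos\theta_{N+1}}{\cos\theta_0} - \left(-\dfrac{n_H \cos\theta_L}{n_L \cos\theta_H}\right)^q \dfrac{n_{N+1}}{n_0} & 0 \end{bmatrix}$$
With $A^{(p)}$ in hand, we can now calculate the transmission coefficient from (4.64)
$$t_p = \frac{2}{\left(-\dfrac{n_L \cos\theta_H}{n_H \cos\theta_L}\right)^q \dfrac{\cos\theta_{N+1}}{\cos\theta_0} + \left(-\dfrac{n_H \cos\theta_L}{n_L \cos\theta_H}\right)^q \dfrac{n_{N+1}}{n_0}} \qquad (\lambda/4 \text{ stack, p-polarized}) \tag{4.73}$$
and the reflection coefficient from (4.65)
$$r_p = \frac{\left(-\dfrac{n_L \cos\theta_H}{n_H \cos\theta_L}\right)^q \dfrac{\cos\theta_{N+1}}{\cos\theta_0} - \left(-\dfrac{n_H \cos\theta_L}{n_L \cos\theta_H}\right)^q \dfrac{n_{N+1}}{n_0}}{\left(-\dfrac{n_L \cos\theta_H}{n_H \cos\theta_L}\right)^q \dfrac{\cos\theta_{N+1}}{\cos\theta_0} + \left(-\dfrac{n_H \cos\theta_L}{n_L \cos\theta_H}\right)^q \dfrac{n_{N+1}}{n_0}} \qquad (\lambda/4 \text{ stack, p-polarized}) \tag{4.74}$$
Figure 4.17 The transmission and reflection coefficients for a quarter wave stack as $q$ is varied ($n_L = 1.38$ and $n_H = 2.32$).
The quarter-wave multilayer considered in Example 4.5 can achieve extraordinarily
high reflectivity. In the limit of $q \to \infty$, we have $t_p \to 0$ and $r_p \to -1$ (see
Fig. 4.17), giving 100% reflection with a $\pi$ phase shift.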
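To get a feel for how quickly the reflectance saturates, the closed-form result of Example 4.5 can be evaluated for a few values of $q$. The sketch below is illustrative only: the function name, the assumption that angles inside the layers follow from Snell's law, and the choice of air and a glass substrate ($n = 1.5$) are ours. It computes $r_p = a_{21}^{(p)}/a_{11}^{(p)}$ and $t_p = 1/a_{11}^{(p)}$ from the two nonzero elements of $A^{(p)}$ found in the example, using the indices quoted for Fig. 4.17.

```python
import math

def quarter_wave_stack_p(n0, n_sub, n_high, n_low, theta0, q):
    """(r_p, t_p) for q high/low quarter-wave pairs; q must be an integer."""
    s = n0*math.sin(theta0)

    def cos_in(n):
        # Cosine of the angle inside a medium of index n (assumes Snell's law).
        return math.sqrt(1 - (s/n)**2)

    c0, c_sub = math.cos(theta0), cos_in(n_sub)
    c_h, c_l = cos_in(n_high), cos_in(n_low)
    a = (-n_low*c_h/(n_high*c_l))**q * c_sub/c0   # first term of 2*a11
    b = (-n_high*c_l/(n_low*c_h))**q * n_sub/n0   # second term of 2*a11
    return (a - b)/(a + b), 2/(a + b)

# Reflectance versus number of layer pairs at normal incidence, air to glass.
for q in (1, 2, 4, 8):
    r_p, t_p = quarter_wave_stack_p(1.0, 1.5, 2.32, 1.38, 0.0, q)
    print(q, r_p**2)
```

Setting `q = 0` removes the stack entirely and recovers the bare-interface Fresnel coefficients, which is a useful consistency check.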

Exercises
Exercises for 4.1 Double-Interface Problem Solved Using Fresnel Coefficients
P4.1 Use (4.4)–(4.7) to derive $r_s^{\text{tot}}$ given in (4.12).
P4.2 Consider a 1 micron thick coating of dielectric material ($n = 2$) on
a piece of glass ($n = 1.5$). Use a computer to plot the magnitude of
the overall Fresnel coefficient (4.11) from air into the glass at normal
incidence. Plot as a function of wavelength for wavelengths between
200 nm and 800 nm (assume the index remains constant over this
range).
Exercises for 4.2 Two-Interface Transmittance at Sub Critical Angles
P4.3 Verify that in the case that θ1 and θ2 are real that (4.14) simplifies to
(4.15).
P4.4 A light wave impinges at normal incidence on a thin glass plate with
index $n$ and thickness $d$.
(a) Show that the transmittance through the plate as a function of
wavelength is
$$T^{\text{tot}} = \frac{1}{1 + \dfrac{(n^2 - 1)^2}{4 n^2} \sin^2\left(\dfrac{2\pi n d}{\lambda_{\text{vac}}}\right)}$$
HINT: Find
$$r_{12} = r_{10} = -r_{01} = \frac{n - 1}{n + 1}$$
and then use
$$T_{01} = 1 - R_{01}$$
$$T_{12} = 1 - R_{12}$$
(b) If $n = 1.5$, what is the maximum and minimum transmittance
through the plate?
(c) If the plate thickness is $d = 150$ µm, what wavelengths transmit with
maximum efficiency? Express your answer as a formula involving an
integer $j$.
P4.5 Show that the maximum reflectance possible from the front coating in
Example 4.2 is 46%. Find the smallest possible d 1 that accomplishes
this for light with wavelength λvac = 633 nm.

Exercises for 4.3 Beyond Critical Angle: Tunneling of Evanescent Waves
P4.6 Re-compute (4.26) in the case of s-polarized light. Write the result in
the same form as the last expression in (4.26). HINT: You need to redo
(4.22)–(4.24).
L4.7 Consider s-polarized microwaves ($\lambda_{\text{vac}} = 3$ cm) encountering an air
gap separating two paraffin wax prisms ($n = 1.5$). The 45° right-angle
prisms are arranged with the geometry shown in Fig. 4.4. The presence
of the second prism ‘frustrates’ the total internal reflection.
(a) Use a computer to plot the transmittance through the gap (i.e. the
result of P4.6) as a function of separation $d$ (normal to gap surface).
Neglect reflections from other surfaces of the prisms.
(b) Measure the transmittance of the microwaves through the prisms
as a function of spacing $d$ (normal to the surface) and superimpose the
results on the graph of part (a). Figure 4.18 shows a plot of typical data
taken with this setup. (video)
Figure 4.18 Theoretical vs. measured microwave transmission through wax prisms. Mismatch is presumably due to imperfections in microwave collimation and/or extraneous reflections. (Horizontal axis: Separation (cm).)
Figure 4.19 [Setup labels: Microwave Source, Paraffin Lens, Paraffin Prisms, Paraffin Lens, Microwave Detector.]
Exercises for 4.6 Distinguishing Nearby Wavelengths in a Fabry-Perot Instrument
P4.8 A Fabry-Perot interferometer has silver-coated plates each with
reflectance $R = 0.9$, transmittance $T = 0.05$, and absorbance $A = 0.05$.
The plate separation is $d = 0.5$ cm with interior index $n_1 = 1$. Suppose
that the wavelength being observed near normal incidence is 587 nm.
(a) What is the maximum and minimum transmittance through the
interferometer?
(b) What are the free spectral range $\Delta\lambda_{\text{FSR}}$ and the fringe width $\Delta\lambda_{\text{FWHM}}$?
(c) What is the resolving power?
P4.9 Generate a plot like Fig. 4.11(a), showing the fringes you get in a Fabry-
Perot etalon when θ1 is varied. Let Tmax = 1, F = 10, λ = 500 nm,
d = 1 cm, and n 1 = 1.
(a) Plot T vs. θ1 over the angular range used in Fig. 4.11(a).

(c) Suppose d was slightly different, say 1.00001 cm. Make a plot of T
vs θ1 for this situation.
P4.10 Consider the configuration depicted in Fig. 4.10, where the center of the
diverging light beam λvac = 633 nm approaches the plates at normal
incidence. Suppose that the spacing of the plates (near d = 0.5 cm) is
just right to cause a bright fringe to occur at the center. Let n 1 = 1. Find
the angle for the $m$th circular bright fringe surrounding the central spot
(the 0th fringe corresponding to the center). HINT: $\cos\theta \cong 1 - \theta^2/2$. The
answer has the form $a\sqrt{m}$; find the value of $a$.
L4.11 Characterize a Fabry-Perot etalon in the laboratory using a HeNe laser
($\lambda_{\text{vac}} = 633$ nm). Assume that the bandwidth $\Delta\lambda_{\text{HeNe}}$ of the HeNe laser
is very narrow compared to the fringe width of the etalon $\Delta\lambda_{\text{FWHM}}$.
Assume two identical reflective surfaces separated by 5.00 mm. Deduce
the free spectral range $\Delta\lambda_{\text{FSR}}$, the fringe width $\Delta\lambda_{\text{FWHM}}$, the resolving
power, and the reflecting finesse (small $f$). (video)
Figure 4.20 [Setup labels: Laser, Diverging Lens, Filter, Fabry-Perot Etalon, CCD Camera.]
L4.12 Use the same Fabry-Perot etalon to observe the Zeeman splitting of the
yellow line $\lambda = 587.4$ nm emitted by a krypton lamp when a magnetic
field is applied. As the line splits and moves through half of the free
spectral range, the peak of the decreasing wavelength and the peak of
the increasing wavelength meet on the screen. When this happens, by
how much has each wavelength shifted? (video)
Figure 4.21 [Setup labels: N, S, Filter, Fabry-Perot Etalon, CCD Camera.]
Exercises for 4.7 Multilayer Coatings
P4.13 (a) Write (4.47) through (4.52) for s-polarized light.

(b) From these equations, derive (4.66)–(4.68).
P4.14 Show that (4.69) for a single layer (i.e. two interfaces), is equivalent to
(4.11). WARNING: This is more work than it may appear at first.

Exercises for 4.8 Repeated Multilayer Stacks
P4.15 (a) What should be the thickness of the high and the low index layers in
a periodic high-reflector mirror? Let the light be p-polarized and strike
the mirror surface at 45°. Take the indices of the layers to be $n_H = 2.32$
and $n_L = 1.38$, deposited on a glass substrate with index $n = 1.5$. Let
the wavelength be $\lambda_{\text{vac}} = 633$ nm.
(b) Find the reflectance R with 1, 2, 4, and 8 periods in the high-low
stack.
P4.16 Find the high-reflector matrix for s-polarized light that corresponds to
(??).
P4.17 Design an anti-reflection coating for use in air (assume the index of air
is 1):
(a) Show that for normal incidence and $\lambda/4$ films (thickness = 1/4 the
wavelength of light inside the material), the reflectance of a single layer
($n_1$) coating on a glass is
$$R = \left( \frac{n_g - n_1^2}{n_g + n_1^2} \right)^2$$
(b) Show that for a two coating setup (air-$n_1$-$n_2$-glass; $n_1$ and $n_2$ are
each a $\lambda/4$ film), that
$$R = \left( \frac{n_2^2 - n_g n_1^2}{n_2^2 + n_g n_1^2} \right)^2$$
(c) If $n_g = 1.5$, and you have a choice of these common coating materials:
ZnS ($n = 2.32$), CeF$_3$ ($n = 1.63$) and MgF$_2$ ($n = 1.38$), find the
combination that gives you the lowest $R$ for part (b). (Be sure to specify
which material is $n_1$ and which is $n_2$.) What $R$ does this combination
give?
P4.18 Consider a two-coating ‘anti-reflection optic’ (each coating set for
$\lambda/4$, as in problem P4.17) using $n_1 = 1.6$ and $n_2 = 2.1$ applied to a
glass substrate $n_g = 1.5$ at normal incidence. Suppose the coating
thicknesses are optimized for $\lambda = 550$ nm (in the middle of the visible
range) and ignore possible variations of the indices with $\lambda$. Use the
matrix techniques and a computer to plot $R(\lambda_{\text{air}})$ for 400 to 700 nm
(visible range). Do this for a single bilayer (one layer of each coating),
two bilayers, four bilayers, and 25 bilayers.

Chapter 5
Propagation in Anisotropic Media
To this point, we have considered only isotropic media where the susceptibility
χ(ω) (and hence the index of refraction) is the same for all propagation directions
and polarizations. In anisotropic materials, such as crystals, it is possible for
light to experience a different index of refraction depending on the orientation
(i.e. polarization) of the electric field $\mathbf{E}$. This difference in the index of refraction
occurs when the direction and strength of the induced dipoles depend on
the lattice structure of the material in addition to the propagating field.1 The
unique properties of anisotropic materials make them important elements in
many optical systems.
In the following section 5.1 we discuss how to connect E and P in anisotropic
media using a susceptibility tensor. In section 5.2 we apply Maxwell’s equations to
a plane wave traveling in a crystal. The analysis leads to Fresnel’s equation, which
relates the components of the k-vector to the components of the susceptibility
tensor. In section 5.3 we apply Fresnel’s equation to a uniaxial crystal (e.g. quartz,
sapphire) where $\chi_x = \chi_y \neq \chi_z$. In the context of a uniaxial crystal, we show the
general principle that in a crystal the Poynting vector and the k-vector are not
parallel.
More than a century before Fresnel, Christian Huygens successfully described
birefringence in crystals using the idea of elliptical wavelets. His method gives
the direction of the Poynting vector associated with the extraordinary ray in a
crystal. It was Huygens who coined the term ‘extraordinary’ since one of the
rays in a birefringent material appeared not to obey Snell’s law. Actually, the
k-vector always obeys Snell’s law, but in a crystal, the k-vector points in a different
direction than the Poynting vector, and it is the Poynting vector that delivers the
energy seen by an observer. Huygens’ approach is outlined in Appendix 5.D.
1 Not all crystals are anisotropic. For instance, crystals with a cubic lattice structure (such as
NaCl) are highly symmetric and respond to electric fields the same in any direction.
5.1 Constitutive Relation in Crystals

In an anisotropic crystal, asymmetries in the lattice can cause the medium polarization
$\mathbf{P}$ to respond in a different direction than the electric field $\mathbf{E}$ (i.e. $\mathbf{P} \neq \epsilon_0 \chi \mathbf{E}$).
However, at low intensities the response of materials is still linear (or proportional)
to the strength of the electric field. The linear constitutive relation which
connects $\mathbf{P}$ to $\mathbf{E}$ in a crystal can be expressed in its most general form as
$$\begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix} = \epsilon_0 \begin{pmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{yx} & \chi_{yy} & \chi_{yz} \\ \chi_{zx} & \chi_{zy} & \chi_{zz} \end{pmatrix} \begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} \tag{5.1}$$
The matrix in (5.1) is called the susceptibility tensor. To visualize the behavior
of electrons in such a material we imagine each electron bound as though by
tiny springs with different strengths in different dimensions to represent the
anisotropy (see Fig. 5.1). When an external electric field is applied, the electron
experiences a force that moves it from its equilibrium position. The ‘springs’
(actually the electric force from ions bound in the crystal lattice) exert a restoring
force, but the restoring force is not equal in all directions; the electron tends to
move more along the dimension of the weaker spring. The displaced electron
creates a microscopic dipole, but the asymmetric restoring force causes $\mathbf{P}$ to be in
a direction different than $\mathbf{E}$ as depicted in Fig. 5.2.
Figure 5.1 A physical model of an electron bound in a crystal lattice with the coordinate system specially chosen along the principal axes so that the susceptibility tensor takes on a simple form.
To understand the geometrical interpretation of the many coefficients $\chi_{ij}$,
assume, for example, that the electric field is directed along the x-axis (i.e. $E_y = E_z = 0$)
as depicted in Fig. 5.2. In this case, the three equations encapsulated in
(5.1) reduce to
$$P_x = \epsilon_0 \chi_{xx} E_x$$
$$P_y = \epsilon_0 \chi_{yx} E_x$$
$$P_z = \epsilon_0 \chi_{zx} E_x$$
Notice that the coefficient $\chi_{xx}$ connects the strength of $\mathbf{P}$ in the $\hat{\mathbf{x}}$ direction with
the strength of $\mathbf{E}$ in that same direction, just as in the isotropic case. The other two
coefficients ($\chi_{yx}$ and $\chi_{zx}$) describe the amount of polarization $\mathbf{P}$ produced in the
$\hat{\mathbf{y}}$ and $\hat{\mathbf{z}}$ directions by the electric field component in the x-dimension. Likewise,
the other coefficients with mixed subscripts in (5.1) describe the contribution to
$\mathbf{P}$ in one dimension made by an electric field component in another dimension.
Figure 5.2 The applied field $\mathbf{E}$ and the induced polarization $\mathbf{P}$ in general are not parallel in a crystal lattice.
As you might imagine, working with nine susceptibility coefficients can get
complicated. Fortunately, we can greatly reduce the complexity of the description
by a judicious choice of coordinate system. In Appendix 5.A we explain how
conservation of energy requires that the susceptibility tensor (5.1) for typical
non-absorbing crystals be real and symmetric (i.e. $\chi_{ij} = \chi_{ji}$).2
2 By ‘typical’ we mean that the crystal does not exhibit optical activity. Optically active crystals
have a complex susceptibility tensor, even when no absorption takes place. Conservation of energy
in this more general case requires that the susceptibility tensor be Hermitian (χi j = χ∗j i ).

Appendix 5.B shows that, given a real symmetric tensor, it is always possible
to choose a coordinate system for which off-diagonal elements vanish. This is
true even if the lattice planes in the crystal are not mutually orthogonal (e.g.
rhombus, hexagonal, etc.). We will imagine that this rotation of coordinates
has been accomplished. In other words, we can let the crystal itself dictate the
orientation of the coordinate system, aligned to the principal axes of the crystal
for which the off-diagonal elements of (5.1) are zero.
With the coordinate system aligned to the principal axes, the constitutive
relation for a non-absorbing crystal simplifies to
$$\begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix} = \epsilon_0 \begin{pmatrix} \chi_x & 0 & 0 \\ 0 & \chi_y & 0 \\ 0 & 0 & \chi_z \end{pmatrix} \begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} \tag{5.2}$$
or without the matrix notation (since it no longer offers much convenience)
$$\mathbf{P} = \hat{\mathbf{x}} \epsilon_0 \chi_x E_x + \hat{\mathbf{y}} \epsilon_0 \chi_y E_y + \hat{\mathbf{z}} \epsilon_0 \chi_z E_z \tag{5.3}$$
By assumption, $\chi_x$, $\chi_y$, and $\chi_z$ are all real. (We have dropped the double subscript;
$\chi_x$ stands for $\chi_{xx}$, etc.)
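A small numerical illustration may help make (5.3) concrete. The sketch below applies the diagonal constitutive relation to a field that is not aligned with a principal axis and reports the angle between $\mathbf{P}$ and $\mathbf{E}$. The susceptibility values and the field direction are hypothetical numbers chosen only for illustration, as are the function names.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m

def polarization(chi, e_field):
    """Apply (5.3) in the principal-axis frame: P_i = eps0 * chi_i * E_i."""
    return tuple(EPS0*c*e for c, e in zip(chi, e_field))

def angle_between(u, v):
    """Angle in radians between two 3-vectors."""
    dot = sum(a*b for a, b in zip(u, v))
    norm = math.sqrt(sum(a*a for a in u))*math.sqrt(sum(b*b for b in v))
    return math.acos(max(-1.0, min(1.0, dot/norm)))

chi = (1.0, 1.0, 2.0)   # hypothetical chi_x, chi_y, chi_z
E = (1.0, 0.0, 1.0)     # hypothetical field, halfway between the x and z axes
P = polarization(chi, E)
print(math.degrees(angle_between(E, P)))
```

When the field points along a single principal axis the angle is zero, so the misalignment between $\mathbf{P}$ and $\mathbf{E}$ appears only for fields with components along axes of differing susceptibility.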
5.2 Plane Wave Propagation in Crystals

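We consider a plane wave with frequency $\omega$ propagating in a crystal. In a manner
similar to our previous analysis of plane waves propagating in isotropic materials,
we write as trial solutions
$$\begin{aligned} \mathbf{E} &= \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \\ \mathbf{B} &= \mathbf{B}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \\ \mathbf{P} &= \mathbf{P}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \end{aligned} \tag{5.4}$$
where restrictions on $\mathbf{E}_0$, $\mathbf{B}_0$, $\mathbf{P}_0$, and $\mathbf{k}$ are yet to be determined. As usual, the
phase of each wave is included in the amplitudes $\mathbf{E}_0$, $\mathbf{B}_0$, and $\mathbf{P}_0$, whereas $\mathbf{k}$ is real
in accordance with our assumption of no absorption.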
We can make a quick observation about the behavior of these fields by applying
Maxwell’s equations directly. Gauss’s law for electric fields requires
$$\nabla \cdot (\epsilon_0 \mathbf{E} + \mathbf{P}) = \mathbf{k} \cdot (\epsilon_0 \mathbf{E} + \mathbf{P}) = 0 \tag{5.5}$$
and Gauss’s law for magnetism gives
$$\nabla \cdot \mathbf{B} = \mathbf{k} \cdot \mathbf{B} = 0 \tag{5.6}$$
We immediately notice the following peculiarity: From its definition, the Poynting
vector $\mathbf{S} \equiv \mathbf{E} \times \mathbf{B}/\mu_0$ is perpendicular to both $\mathbf{E}$ and $\mathbf{B}$, and by (5.6) the k-vector is
perpendicular to $\mathbf{B}$. However, by (5.5) the k-vector is not necessarily perpendicular
to $\mathbf{E}$, since in general $\mathbf{k} \cdot \mathbf{E} \neq 0$ if $\mathbf{P}$ points in a direction other than $\mathbf{E}$. Therefore, $\mathbf{k}$
and $\mathbf{S}$ are not necessarily parallel in a crystal. In other words, the flow of energy
and the direction of the phase-front propagation can be different in anisotropic
media.
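Our main goal here is to relate the k-vector to the susceptibility parameters $\chi_x$,
$\chi_y$, and $\chi_z$. To do this, we plug our trial plane-wave fields into the wave equation
(1.41). Under the assumption $\mathbf{J}_{\text{free}} = 0$, we have
$$\nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = \mu_0 \frac{\partial^2 \mathbf{P}}{\partial t^2} + \nabla (\nabla \cdot \mathbf{E}) \tag{5.7}$$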
Derivation of the dispersion relation in crystals
We begin by substituting the trial solutions (5.4) into the wave equation (5.7). After
carrying out the derivatives we find
$$k^2 \mathbf{E} - \omega^2 \mu_0 (\epsilon_0 \mathbf{E} + \mathbf{P}) = \mathbf{k} (\mathbf{k} \cdot \mathbf{E}) \tag{5.8}$$
Inserting the constitutive relation (5.3) for crystals into (5.8) yields
$$k^2 \mathbf{E} - \omega^2 \mu_0 \epsilon_0 \left[ (1 + \chi_x) E_x \hat{\mathbf{x}} + (1 + \chi_y) E_y \hat{\mathbf{y}} + (1 + \chi_z) E_z \hat{\mathbf{z}} \right] = \mathbf{k} (\mathbf{k} \cdot \mathbf{E}) \tag{5.9}$$
This relationship is unwieldy because of the mix of electric field components that
appear in the expression. This was not a problem when we investigated isotropic
materials for which the k-vector is perpendicular to $\mathbf{E}$, making the right-hand side
of the equations zero. However, there is a trick for dealing with this.
Relation (5.9) actually contains three equations, one for each dimension. Explicitly,
these equations are
$$\left[ k^2 - \frac{\omega^2}{c^2} (1 + \chi_x) \right] E_x = k_x (\mathbf{k} \cdot \mathbf{E}) \tag{5.10}$$
$$\left[ k^2 - \frac{\omega^2}{c^2} (1 + \chi_y) \right] E_y = k_y (\mathbf{k} \cdot \mathbf{E}) \tag{5.11}$$
and
$$\left[ k^2 - \frac{\omega^2}{c^2} (1 + \chi_z) \right] E_z = k_z (\mathbf{k} \cdot \mathbf{E}) \tag{5.12}$$
We have replaced the constants $\mu_0 \epsilon_0$ with $1/c^2$ according to (1.43). We multiply
(5.10)–(5.12) respectively by $k_x$, $k_y$, and $k_z$ and also move the factor in square
brackets in each equation to the denominator on the right-hand side. Then by
adding the three equations together we get
$$\frac{k_x^2 (\mathbf{k} \cdot \mathbf{E})}{\left[ k^2 - \frac{\omega^2 (1 + \chi_x)}{c^2} \right]} + \frac{k_y^2 (\mathbf{k} \cdot \mathbf{E})}{\left[ k^2 - \frac{\omega^2 (1 + \chi_y)}{c^2} \right]} + \frac{k_z^2 (\mathbf{k} \cdot \mathbf{E})}{\left[ k^2 - \frac{\omega^2 (1 + \chi_z)}{c^2} \right]} = k_x E_x + k_y E_y + k_z E_z = (\mathbf{k} \cdot \mathbf{E}) \tag{5.13}$$
Now $\mathbf{k} \cdot \mathbf{E}$ appears in every term and can be divided away. This gives the dispersion
relation (unencumbered by field components):
$$\frac{k_x^2}{\left[ k^2 c^2/\omega^2 - (1 + \chi_x) \right]} + \frac{k_y^2}{\left[ k^2 c^2/\omega^2 - (1 + \chi_y) \right]} + \frac{k_z^2}{\left[ k^2 c^2/\omega^2 - (1 + \chi_z) \right]} = \frac{\omega^2}{c^2} \tag{5.14}$$
As a final touch, we have multiplied the equation through by $\omega^2/c^2$.

The dispersion relation (5.14) allows us to find a suitable $\mathbf{k}$, given values for $\omega$,
$\chi_x$, $\chi_y$, and $\chi_z$. Actually, it only restricts the magnitude of $\mathbf{k}$; we must still decide
on a direction for the wave to travel (i.e. we must choose the ratios between $k_x$, $k_y$,
and $k_z$). To remind ourselves of this fact, we introduce a unit vector that points in
the direction of $\mathbf{k}$:
$$\mathbf{k} = k_x \hat{\mathbf{x}} + k_y \hat{\mathbf{y}} + k_z \hat{\mathbf{z}} = k \left( u_x \hat{\mathbf{x}} + u_y \hat{\mathbf{y}} + u_z \hat{\mathbf{z}} \right) = k \hat{\mathbf{u}} \tag{5.15}$$
With this unit vector inserted, the dispersion relation (5.14) for plane waves in a
crystal becomes
$$\frac{u_x^2}{\left[ k^2 c^2/\omega^2 - (1 + \chi_x) \right]} + \frac{u_y^2}{\left[ k^2 c^2/\omega^2 - (1 + \chi_y) \right]} + \frac{u_z^2}{\left[ k^2 c^2/\omega^2 - (1 + \chi_z) \right]} = \frac{\omega^2}{k^2 c^2} \tag{5.16}$$
We may define refractive index as the ratio of the speed of light in vacuum
$c$ to the speed of phase propagation in a material $\omega/k$ (see P1.9). The relation
introduced for isotropic media (i.e. (2.19) for real index) remains appropriate.
That is
$$n = \frac{kc}{\omega} \tag{5.17}$$
This familiar relationship between $k$ and $\omega$, in the case of a crystal, depends on
the direction of propagation in accordance with (5.16).
Inspired by (2.30), we will find it helpful to introduce several refractive-index
parameters:
$$\begin{aligned} n_x &\equiv \sqrt{1 + \chi_x} \\ n_y &\equiv \sqrt{1 + \chi_y} \\ n_z &\equiv \sqrt{1 + \chi_z} \end{aligned} \tag{5.18}$$
With these definitions (5.17)-(5.18), the dispersion relation (5.16) becomes
$$\frac{u_x^2}{\left( n^2 - n_x^2 \right)} + \frac{u_y^2}{\left( n^2 - n_y^2 \right)} + \frac{u_z^2}{\left( n^2 - n_z^2 \right)} = \frac{1}{n^2} \tag{5.19}$$
This is called Fresnel’s equation (not to be confused with the Fresnel coefficients
studied in chapter 3). The relationship contains the yet unknown index $n$ that
varies with the direction of the k-vector (i.e. the direction of the unit vector $\hat{\mathbf{u}}$).
After multiplying through by all of the denominators (and after a fortuitous
cancelation owing to $u_x^2 + u_y^2 + u_z^2 = 1$), Fresnel’s equation (5.19) can be rewritten
as a quadratic in $n^2$. The two solutions are
$$n^2 = \frac{B \pm \sqrt{B^2 - 4AC}}{2A} \tag{5.20}$$
where
$$A \equiv u_x^2 n_x^2 + u_y^2 n_y^2 + u_z^2 n_z^2 \tag{5.21}$$
$$B \equiv u_x^2 n_x^2 \left( n_y^2 + n_z^2 \right) + u_y^2 n_y^2 \left( n_x^2 + n_z^2 \right) + u_z^2 n_z^2 \left( n_x^2 + n_y^2 \right) \tag{5.22}$$
$$C \equiv n_x^2 n_y^2 n_z^2 \tag{5.23}$$

The upper and lower signs ( + and −) in (5.20) give two positive solutions for
n 2 . The positive square root of these solutions yields two physical values for n.
It turns out that each of the two values for n is associated with a polarization
direction of the electric field, given a propagation direction k. A broader analysis
carried out in appendix 5.C renders the orientation of the electric fields, whereas
here we only show how to find the two values of n. We refer to the two indices as
the slow and fast index, since the waves associated with each propagate at speed
v = c/n.
In the special cases of propagation along one of the principal axes of the
crystal, the index n takes on two of the values n x , n y , or n z , depending on which
are orthogonal to the direction of propagation.
Example 5.1
Calculate the two possible values for the index of refraction when $\mathbf{k}$ is in the $\hat{\mathbf{z}}$
direction (in the crystal principal frame).
Solution: With $u_z = 1$ and $u_x = u_y = 0$ we have
$$A = n_z^2; \qquad B = n_z^2 \left( n_x^2 + n_y^2 \right); \qquad C = n_x^2 n_y^2 n_z^2$$
The square-root term is then
$$\begin{aligned} \sqrt{B^2 - 4AC} &= \sqrt{n_z^4 \left( n_x^4 + 2 n_x^2 n_y^2 + n_y^4 \right) - 4 n_x^2 n_y^2 n_z^4} \\ &= \sqrt{n_z^4 \left( n_x^2 - n_y^2 \right)^2} \\ &= n_z^2 \left( n_x^2 - n_y^2 \right) \end{aligned}$$
Inserting this expression into (5.20), we find the two values for the index
$$n = n_x, \; n_y$$
The index $n_x$ is experienced by light whose electric field points in the x-dimension,
and the index $n_y$ is experienced by light whose electric field points in the y-dimension
(see appendix 5.C).
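The same calculation is easy to carry out numerically for any propagation direction. The sketch below is an illustration under our own naming: `fresnel_indices` is a hypothetical helper that evaluates (5.21)–(5.23) and returns the two roots of (5.20). As sample values we borrow the potassium niobate indices quoted in the caption of Fig. 5.4.

```python
import math

def fresnel_indices(u, n_principal):
    """Return (n_fast, n_slow) from (5.20)-(5.23) for a unit vector u = (ux, uy, uz)."""
    ux, uy, uz = u
    nx, ny, nz = n_principal
    A = ux**2*nx**2 + uy**2*ny**2 + uz**2*nz**2
    B = (ux**2*nx**2*(ny**2 + nz**2)
         + uy**2*ny**2*(nx**2 + nz**2)
         + uz**2*nz**2*(nx**2 + ny**2))
    C = nx**2*ny**2*nz**2
    # Guard against a slightly negative discriminant caused by round-off.
    root = math.sqrt(max(B*B - 4*A*C, 0.0))
    n_fast = math.sqrt((B - root)/(2*A))
    n_slow = math.sqrt((B + root)/(2*A))
    return n_fast, n_slow

n_knbo3 = (2.22, 2.35, 2.41)  # n_x, n_y, n_z from the Fig. 5.4 caption

# Propagation along z, as in Example 5.1.
print(fresnel_indices((0.0, 0.0, 1.0), n_knbo3))

# A general direction specified by polar angle 60 deg and azimuth 30 deg.
theta, phi = math.radians(60), math.radians(30)
u = (math.sin(theta)*math.cos(phi), math.sin(theta)*math.sin(phi), math.cos(theta))
print(fresnel_indices(u, n_knbo3))
```

For propagation along a principal axis the function returns the two indices belonging to the other two axes, in agreement with the example.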
Figure 5.3 Spherical coordinates.
Before moving on, let us briefly summarize what has been accomplished so
far. Given values for $\chi_x$, $\chi_y$, and $\chi_z$ associated with light in a crystal at a given
frequency, one defines the indices $n_x$, $n_y$, and $n_z$, according to (5.18). Next, a
direction for the k-vector is chosen (i.e. $u_x$, $u_y$, and $u_z$). This direction generally
has two values for the index of refraction associated with it, found using Fresnel’s
equation (5.20). Each index is associated with a specific polarization direction
for the electric field as outlined in appendix 5.C. Every propagation direction $\hat{\mathbf{u}}$
has its own natural set of polarization components for the electric field. The two
polarization components travel at different speeds, so even though the frequency
is the same, the wavelength within the crystal for each component is different.
This is known as birefringence.
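The earlier observation that $\mathbf{k}$ and $\mathbf{S}$ need not be parallel can also be checked numerically. The following sketch is a rough illustration resting on two assumptions of ours. First, for a root $n$ of (5.20) that differs from $n_x$, $n_y$, and $n_z$, equations (5.10)–(5.12) give field components proportional to $u_i/(n^2 - n_i^2)$. Second, $\mathbf{B}$ is taken proportional to $\mathbf{k} \times \mathbf{E}$ (Faraday's law), so that $\mathbf{S} \propto \mathbf{E} \times (\hat{\mathbf{u}} \times \mathbf{E})$. The function names are hypothetical, and the degenerate case $n = n_i$ is not handled.

```python
import math

def fresnel_indices(u, n_principal):
    """Return (n_fast, n_slow) from (5.20)-(5.23)."""
    ux, uy, uz = u
    nx, ny, nz = n_principal
    A = ux**2*nx**2 + uy**2*ny**2 + uz**2*nz**2
    B = (ux**2*nx**2*(ny**2 + nz**2)
         + uy**2*ny**2*(nx**2 + nz**2)
         + uz**2*nz**2*(nx**2 + ny**2))
    C = nx**2*ny**2*nz**2
    root = math.sqrt(max(B*B - 4*A*C, 0.0))
    return math.sqrt((B - root)/(2*A)), math.sqrt((B + root)/(2*A))

def field_direction(u, n, n_principal):
    """Unnormalized E from (5.10)-(5.12); assumes n differs from every n_i."""
    return tuple(ui/(n**2 - ni**2) for ui, ni in zip(u, n_principal))

def walkoff_angle(u, e_field):
    """Angle between u (direction of k) and S ~ E x (u x E) = u (E.E) - E (u.E)."""
    ee = sum(e*e for e in e_field)
    ue = sum(a*e for a, e in zip(u, e_field))
    s = tuple(a*ee - e*ue for a, e in zip(u, e_field))
    s_norm = math.sqrt(sum(c*c for c in s))
    cos_rho = sum(a*c for a, c in zip(u, s))/s_norm
    return math.acos(max(-1.0, min(1.0, cos_rho)))

n_knbo3 = (2.22, 2.35, 2.41)  # n_x, n_y, n_z from the Fig. 5.4 caption
theta, phi = math.radians(60), math.radians(30)
u = (math.sin(theta)*math.cos(phi), math.sin(theta)*math.sin(phi), math.cos(theta))
for n in fresnel_indices(u, n_knbo3):
    e = field_direction(u, n, n_knbo3)
    print(n, math.degrees(walkoff_angle(u, e)))
```

Each printed line gives one of the two indices together with the angle (in degrees) between the energy flow and the k-vector for the corresponding polarization.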

5.3 Biaxial and Uniaxial Crystals

All anisotropic crystals have certain special propagation directions where the
two values for n from Fresnel’s equation are equal. These directions are referred
to as the optic axes of the crystal. When propagation is along an optic axis, all
polarization components experience the same index of refraction. If the values
of n x , n y , and n z are all unique, a crystal will have two optic axes, and hence is
referred to as a biaxial crystal.
It is often convenient to use spherical coordinates to represent the components
of $\hat{\mathbf{u}}$ (see Fig. 5.3):
$$\begin{aligned} u_x &= \sin\theta \cos\phi \\ u_y &= \sin\theta \sin\phi \\ u_z &= \cos\theta \end{aligned} \tag{5.24}$$
Here θ is the polar angle measured from the z-axis of the crystal and φ is the
azimuthal angle measured from the x-axis of the crystal. These equations em-
phasize the fact that there are only two degrees of freedom when specifying
propagation direction (θ and φ). It is important to remember that these angles
must be specified in the frame of the crystal’s principal axes, which are often not
aligned with the faces of a cut crystal in an optical setup.
By convention, we order the crystal axes for biaxial crystals so that $n_x < n_y < n_z$.
Under this convention, the two optic axes occur in the x-z plane ($\phi = 0$) at
two values of the polar angle $\theta$, measured from the z-axis (see P5.3):
$$\cos\theta = \pm \frac{n_x}{n_y} \sqrt{\frac{n_z^2 - n_y^2}{n_z^2 - n_x^2}} \qquad \text{(Optic axes directions, biaxial crystal)} \tag{5.25}$$
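One way to convince yourself of (5.25) is to evaluate Fresnel's equation along the predicted direction and confirm that the two indices coincide. The sketch below does this for the potassium niobate indices quoted in the caption of Fig. 5.4; the function names are our own, and `fresnel_indices` simply evaluates (5.20)–(5.23).

```python
import math

def fresnel_indices(u, n_principal):
    """Return (n_fast, n_slow) from (5.20)-(5.23)."""
    ux, uy, uz = u
    nx, ny, nz = n_principal
    A = ux**2*nx**2 + uy**2*ny**2 + uz**2*nz**2
    B = (ux**2*nx**2*(ny**2 + nz**2)
         + uy**2*ny**2*(nx**2 + nz**2)
         + uz**2*nz**2*(nx**2 + ny**2))
    C = nx**2*ny**2*nz**2
    # Along an optic axis the discriminant vanishes; clip round-off to zero.
    root = math.sqrt(max(B*B - 4*A*C, 0.0))
    return math.sqrt((B - root)/(2*A)), math.sqrt((B + root)/(2*A))

def optic_axis_angle(nx, ny, nz):
    """Polar angle from (5.25), positive root; assumes nx < ny < nz."""
    return math.acos((nx/ny)*math.sqrt((nz**2 - ny**2)/(nz**2 - nx**2)))

nx, ny, nz = 2.22, 2.35, 2.41
theta = optic_axis_angle(nx, ny, nz)
u = (math.sin(theta), 0.0, math.cos(theta))  # x-z plane, phi = 0
n_fast, n_slow = fresnel_indices(u, (nx, ny, nz))
print(math.degrees(theta), n_fast, n_slow)
```

Along this direction both roots should equal $n_y$ to within round-off, whereas any other direction in the x-z plane gives two distinct values.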
While finding the optic axes in a biaxial crystal is not too bad, an expression for
the two indices of refraction is messy. The lower value is commonly referred to
as the ‘fast’ index and the higher value the ‘slow’ index. Figure 5.4 shows the
two refractive indices (i.e. the solutions to Fresnel’s equation (5.20)) for a biaxial
crystal plotted with color shading on the surface of a sphere. Each point on the
sphere represents a different $\theta$ and $\phi$. The two optic axes are apparent in the plot
of the difference between $n_{\text{slow}}$ and $n_{\text{fast}}$. When propagating in these directions,
either polarization experiences the same index. For the remainder of this chapter,
we will focus on the simpler case of uniaxial crystals.
Figure 5.4 The fast and slow refractive indices (and their difference) as a function of direction for potassium niobate (KNbO$_3$) at $\lambda = 500$ nm ($n_x = 2.22$, $n_y = 2.35$, and $n_z = 2.41$).
In uniaxial crystals two of the coefficients $\chi_x$, $\chi_y$, and $\chi_z$ are the same. In
this case, there is only one optic axis for the crystal (hence the name uniaxial).
By convention, in uniaxial crystals we label the dimension that has the unique
susceptibility as the z-axis (i.e. $\chi_x = \chi_y \neq \chi_z$). This makes the z-axis the optic axis.
The unique index of refraction is called the extraordinary index
$$n_z = n_e \tag{5.26}$$
and the other index is called the ordinary index
$$n_x = n_y = n_o \tag{5.27}$$

These names were coined by Huygens, one of the early scientists to study light
in crystals (see appendix 5.D). A uniaxial crystal with n e > n o is referred to as a
positive crystal, and one with n e < n o is referred to as a negative crystal.
To calculate the index of refraction for a wave propagating in a uniaxial crystal,
we use definitions (5.26) and (5.27) along with the spherical representation of $\hat{\mathbf{u}}$
(5.24) in Fresnel’s equation (5.20) to find the following two values for $n$ (see P5.4):
$$n = n_o \qquad \text{(uniaxial crystal)} \tag{5.28}$$
and
$$n = n_e(\theta) \equiv \frac{n_o n_e}{\sqrt{n_o^2 \sin^2\theta + n_e^2 \cos^2\theta}} \qquad \text{(uniaxial crystal)} \tag{5.29}$$
The index $n_e(\theta)$ in (5.29) is also commonly referred to as the extraordinary index
along with the constant $n_e = n_z$. While this has the potential for some confusion,
the practice is so common that we will perpetuate it here. We will write $n_e(\theta)$
when the angle dependent quantity specified by (5.29) is required, and write $n_e$
in formulas where the constant (5.26) is called for (as in the right hand side of
(5.29)). Notice that $n_e(\theta)$ depends only on $\theta$ (the polar angle measured from the
optic axis $\hat{\mathbf{z}}$) and not $\phi$ (the azimuthal angle). Figure 5.5 shows the two refractive
indices (5.28) and (5.29) as a function of $\theta$ and $\phi$. Since $n_e(\theta)$ has no $\phi$ dependence
and $n_o$ is constant, the variation is much simpler than for the biaxial case.
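As a brief illustration of (5.29), the sketch below tabulates $n_e(\theta)$ for a few angles using the beta barium borate values quoted in the caption of Fig. 5.5 ($n_o = 1.68$, $n_e = 1.56$). The function name and the choice of angles are ours.

```python
import math

def n_extraordinary(theta, n_o, n_e):
    """Angle-dependent extraordinary index (5.29); theta is measured from the optic axis."""
    return n_o*n_e/math.sqrt(n_o**2*math.sin(theta)**2 + n_e**2*math.cos(theta)**2)

n_o, n_e = 1.68, 1.56  # values from the Fig. 5.5 caption
for deg in (0, 30, 60, 90):
    print(deg, n_extraordinary(math.radians(deg), n_o, n_e))
```

The two limits are the useful checks: at $\theta = 0$ the expression reduces to $n_o$, and at $\theta = 90°$ it reduces to the constant $n_e$.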
As outlined in appendix 5.C, the index $n_o$ corresponds to an electric field
component that points perpendicular to the plane containing û and ẑ (e.g. if
û is in the x-z plane, $n_o$ is associated with light polarized in the y-dimension).
On the other hand, the index $n_e(\theta)$ corresponds to field polarization that lies
within the plane containing û and ẑ. In this case, the polarization component
is directed partially along the optic axis (i.e. it has a z-component). That is why
(5.29) gives for the refractive index a mixture of $n_o$ and $n_e$. If θ = 0, then the
k-vector is directed exactly along the optic axis, and $n_e(\theta)$ reduces to $n_o$ so that
both polarization components experience the same index $n_o$.
Figure 5.5 The extraordinary and ordinary refractive indices (and their difference)
as a function of direction for beta barium borate (BBO) at λ = 500 nm ($n_o = 1.68$
and $n_e = 1.56$).

5.4 Refraction at a Uniaxial Crystal Surface
Next we consider refraction as light enters a uniaxial crystal. Snell’s law (3.7)
describes the connection between the k-vectors incident upon and transmitted
through the surface. One must consider separately the portion of the light that ex-
periences the ordinary index from the portion that experiences the extraordinary
index. Because of the different indices, the ordinary and extraordinary polarized
light refract into the crystal at two different angles; they travel at two different
velocities in the crystal; and they have two different wavelengths in the crystal.
If we assume that the index outside of the crystal is one, Snell’s law for the
ordinary polarization is
$$\sin\theta_i = n_o \sin\theta_t \qquad \text{(ordinary polarized light)} \tag{5.30}$$

where $n_o$ is the index inside the crystal. The extraordinary polarized light also
obeys Snell’s law, but now the index of refraction in the crystal depends on the
direction of propagation inside the crystal relative to the optic axis. Snell’s law for the
extraordinary polarization is
$$\sin\theta_i = n_e(\theta') \sin\theta_t \qquad \text{(extraordinary polarized light)} \tag{5.31}$$
where $\theta'$ is the angle between the optic axis inside the crystal and the direction of
propagation in the crystal (given by $\theta_t$ in the plane of incidence). When the optic
axis is at an arbitrary angle with respect to the surface, the connection between
$\theta'$ and $\theta_t$ is cumbersome. We will examine Snell’s law only for the specific case
when the optic axis is perpendicular to the crystal surface, for which $\theta_t = \theta'$.
Snell’s Law for a Uniaxial Crystal with Optic Axis Perpendicular to the Surface
Refer to Fig. 5.6. With the optic axis perpendicular to the surface, if the light
hits the crystal surface straight on, the index of refraction is $n_o$, regardless of the
orientation of polarization since $\theta' = 0$. When the light strikes the surface at an
angle, s-polarized light continues to experience the index $n_o$, while p-polarized
light experiences the extraordinary index $n_e(\theta)$.³
When we insert (5.29) into Snell’s law (5.31) with $\theta' = \theta_t$, the expression can be
inverted to find the transmitted angle $\theta_t$ in terms of $\theta_i$ (see P5.5):
$$\tan\theta_t = \frac{n_e \sin\theta_i}{n_o \sqrt{n_e^2 - \sin^2\theta_i}} \qquad \text{(extraordinary polarized, optic axis } \perp \text{ surface)} \tag{5.32}$$
As strange as this formula may appear, it is Snell’s law, but with an angularly
dependent index.
Figure 5.6 Propagation of light in a uniaxial crystal with its optic axis
perpendicular to the surface. (Axis labels: y-axis, z-axis, x-axis directed into page.)
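The two refraction laws (5.30) and (5.32) can be compared numerically. The sketch below is only an illustration under stated assumptions: the optic axis is perpendicular to the surface, the outside index is one, the BBO indices come from the caption of Fig. 5.5, and the function names and the 45° incident angle are our own choices. It also checks that the angle from (5.32) satisfies Snell’s law (5.31) with the angle-dependent index (5.29).

```python
import math

def n_e_theta(theta, n_o, n_e):
    """Angle-dependent extraordinary index, Eq. (5.29)."""
    return n_o * n_e / math.sqrt((n_o * math.sin(theta)) ** 2 + (n_e * math.cos(theta)) ** 2)

def theta_t_ordinary(theta_i, n_o):
    """Transmitted k-vector angle for ordinary light, Eq. (5.30)."""
    return math.asin(math.sin(theta_i) / n_o)

def theta_t_extraordinary(theta_i, n_o, n_e):
    """Transmitted k-vector angle for extraordinary light, Eq. (5.32).

    Valid only when the optic axis is perpendicular to the surface.
    """
    s = math.sin(theta_i)
    return math.atan(n_e * s / (n_o * math.sqrt(n_e ** 2 - s ** 2)))

N_O, N_E = 1.68, 1.56            # BBO at 500 nm (Fig. 5.5 caption)
theta_i = math.radians(45.0)     # illustrative incident angle

t_o = theta_t_ordinary(theta_i, N_O)
t_e = theta_t_extraordinary(theta_i, N_O, N_E)

# Consistency check: Eq. (5.31) with theta' = theta_t should be satisfied.
residual = math.sin(theta_i) - n_e_theta(t_e, N_O, N_E) * math.sin(t_e)

print(f"ordinary:      theta_t = {math.degrees(t_o):.3f} deg")
print(f"extraordinary: theta_t = {math.degrees(t_e):.3f} deg")
print(f"Snell residual for extraordinary light: {residual:.2e}")
```

Because BBO is a negative crystal ($n_e < n_o$), the extraordinary k-vector refracts at a slightly larger angle than the ordinary one.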
5.5 Poynting Vector in a Uniaxial Crystal

When an object is observed through a crystal (acting as a window), the energy
associated with ordinary and extraordinary polarized light follows different paths,
giving rise to two different images. This phenomenon is one of the more commonly
observed manifestations of birefringence. Since the Poynting vector dictates
the direction of energy flow, it is the direction of S that determines the
separation of the double image seen when looking through a birefringent crystal.
Snell’s law dictates the connection between the directions of the incident
and transmitted k-vectors. The Poynting vector S for purely ordinary polarized
light points in the same direction as the k-vector, so the direction of energy flow
for ordinary polarized light also obeys Snell’s law. However, for extraordinary
polarized light, the Poynting vector S is not parallel to k (recall the discussion in
connection with (5.5) and (5.6)). Thus, the energy flow associated with extraordinary
polarized light does not obey Snell’s law. When Christiaan Huygens saw this
in the 1600s, he exclaimed “how extraordinary!” Huygens’ method for describing
the phenomenon is outlined in appendix 5.D.
3 The correspondence between s and p and ordinary and extraordinary polarization components
is specific to the orientation of the optic axis in this example. For arbitrary orientations of the
optic axis with respect to the surface, the ordinary and extraordinary components will generally be
mixtures of s and p polarized light.
To analyze this situation, it is necessary to derive an expression for extraordi-
nary polarized light similar to Snell’s law, but which applies to S rather than to k.
This describes the direction that the energy associated with extraordinary rays
takes upon entering the crystal. To calculate the direction that the extraordinary
polarized S takes upon entering a crystal, we first calculate the direction of k
inside the crystal using Snell’s law (5.31). Then we use the expression (5.62) for E
along with B = (k × E)/ω, to evaluate S = E × B/µ0 . In general, this process is best
done numerically, since Snell’s law (5.31) for extraordinary polarized light usually
does not have simple analytic solutions.
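Poynting Vector in a Uniaxial Crystal with Optic Axis Perpendicular to the Surface
To find the direction of energy flow, we must calculate $\mathbf{S} = \mathbf{E} \times \mathbf{B}/\mu_0$. We will need
to know E associated with $n_e(\theta)$. We can obtain E from the procedures outlined
in appendix 5.C. Equivalently, we can obtain it from the constitutive relation (5.3)
with the definitions (5.18): we have
$$\begin{aligned}
\epsilon_0 \mathbf{E} + \mathbf{P} &= \epsilon_0 \left[ (1+\chi_x) E_x \hat{\mathbf{x}} + (1+\chi_y) E_y \hat{\mathbf{y}} + (1+\chi_z) E_z \hat{\mathbf{z}} \right] \\
&= \epsilon_0 \left( n_o^2 E_x \hat{\mathbf{x}} + n_o^2 E_y \hat{\mathbf{y}} + n_e^2 E_z \hat{\mathbf{z}} \right)
\end{aligned} \tag{5.33}$$
Let the k-vector lie in the y-z plane. We may write it as $\mathbf{k} = k \left( \hat{\mathbf{y}} \sin\theta_t + \hat{\mathbf{z}} \cos\theta_t \right)$.
Then the ordinary component of the field points in the x-direction, while the
extraordinary component lies in the y-z plane.
Since Gauss’s law requires $\mathbf{k} \cdot (\epsilon_0 \mathbf{E} + \mathbf{P}) = 0$ in the absence of free charge, (5.33) gives
$$\begin{aligned}
\mathbf{k} \cdot (\epsilon_0 \mathbf{E} + \mathbf{P}) &= k \left( \hat{\mathbf{y}} \sin\theta_t + \hat{\mathbf{z}} \cos\theta_t \right) \cdot \epsilon_0 \left( n_o^2 E_x \hat{\mathbf{x}} + n_o^2 E_y \hat{\mathbf{y}} + n_e^2 E_z \hat{\mathbf{z}} \right) \\
&= \epsilon_0 k \left( n_o^2 E_y \sin\theta_t + n_e^2 E_z \cos\theta_t \right) \\
&= 0
\end{aligned} \tag{5.34}$$
Therefore, the y and z components of the extraordinary field are related through
$$E_z = -\frac{n_o^2 E_y}{n_e^2} \tan\theta_t \tag{5.35}$$
We may write the extraordinary polarized electric field (leaving off $e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}$) as
$$\mathbf{E} = E_y \left( \hat{\mathbf{y}} - \hat{\mathbf{z}} \frac{n_o^2}{n_e^2} \tan\theta_t \right) \qquad \text{(extraordinary polarized)} \tag{5.36}$$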
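The associated magnetic field (see (2.56)) is
$$\begin{aligned}
\mathbf{B} &= \frac{\mathbf{k} \times \mathbf{E}}{\omega} \\
&= \frac{k \left( \hat{\mathbf{y}} \sin\theta_t + \hat{\mathbf{z}} \cos\theta_t \right) \times E_y \left( \hat{\mathbf{y}} - \hat{\mathbf{z}} \frac{n_o^2}{n_e^2} \tan\theta_t \right)}{\omega} \\
&= -\hat{\mathbf{x}} \frac{k E_y}{\omega} \left( \frac{n_o^2}{n_e^2} \sin\theta_t \tan\theta_t + \cos\theta_t \right)
\end{aligned} \qquad \text{(extraordinary polarized)} \tag{5.37}$$
Using (5.36) and (5.37), the time-averaged Poynting vector becomes
$$\begin{aligned}
\mathbf{S} &= \mathbf{E} \times \frac{\mathbf{B}}{2\mu_0} \\
&= -E_y \left( \hat{\mathbf{y}} - \hat{\mathbf{z}} \frac{n_o^2}{n_e^2} \tan\theta_t \right) \times \frac{k E_y}{2\mu_0 \omega} \left( \frac{n_o^2}{n_e^2} \sin\theta_t \tan\theta_t + \cos\theta_t \right) \hat{\mathbf{x}} \\
&= \frac{k E_y^2}{2\mu_0 \omega} \left( \frac{n_o^2}{n_e^2} \sin\theta_t \tan\theta_t + \cos\theta_t \right) \left( \hat{\mathbf{z}} + \hat{\mathbf{y}} \frac{n_o^2}{n_e^2} \tan\theta_t \right)
\end{aligned} \qquad \text{(extraordinary polarized)} \tag{5.38}$$
Let us label the direction of the Poynting vector with the angle $\theta_S$. By definition,
the tangent of this angle is the ratio of the two vector components of S:
$$\tan\theta_S \equiv \frac{S_y}{S_z} = \frac{n_o^2}{n_e^2} \tan\theta_t \qquad \text{(extraordinary polarized)} \tag{5.39}$$
While the k-vector is characterized by the angle $\theta_t$, the Poynting vector is
characterized by the angle $\theta_S$. Combining (5.32) and (5.39), we can connect $\theta_S$ to the
incident angle $\theta_i$:
$$\tan\theta_S = \frac{n_o \sin\theta_i}{n_e \sqrt{n_e^2 - \sin^2\theta_i}} \qquad \text{(extraordinary polarized)} \tag{5.40}$$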
As we noted in the last example, we have the case where ordinary polarized light is
s-polarized light, and extraordinary polarized light is p-polarized light due to our
specific choice of orientation for the optic axis in this section. In general, the s- and
p-polarized portions of the incident light can each give rise to both extraordinary
and ordinary rays.
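To get a feel for the size of the effect, the sketch below compares the k-vector angle from (5.32) with the Poynting-vector angle from (5.40). It is an illustration only: the optic axis is assumed perpendicular to the surface, the indices are the BBO values from the caption of Fig. 5.5, and the function names and the 45° incident angle are our own.

```python
import math

def theta_t_extraordinary(theta_i, n_o, n_e):
    """Direction of the extraordinary k-vector inside the crystal, Eq. (5.32)."""
    s = math.sin(theta_i)
    return math.atan(n_e * s / (n_o * math.sqrt(n_e ** 2 - s ** 2)))

def theta_s_extraordinary(theta_i, n_o, n_e):
    """Direction of the extraordinary Poynting vector inside the crystal, Eq. (5.40)."""
    s = math.sin(theta_i)
    return math.atan(n_o * s / (n_e * math.sqrt(n_e ** 2 - s ** 2)))

N_O, N_E = 1.68, 1.56            # BBO at 500 nm (Fig. 5.5 caption)
theta_i = math.radians(45.0)     # illustrative incident angle

theta_t = theta_t_extraordinary(theta_i, N_O, N_E)
theta_s = theta_s_extraordinary(theta_i, N_O, N_E)

print(f"k-vector angle        theta_t = {math.degrees(theta_t):.3f} deg")
print(f"Poynting-vector angle theta_S = {math.degrees(theta_s):.3f} deg")
print(f"angle between S and k         = {math.degrees(theta_s - theta_t):.3f} deg")
```

The two angles are linked by (5.39), so for a negative crystal such as BBO the energy flow is tilted farther from the optic axis than the k-vector.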
Appendix 5.A Symmetry of Susceptibility Tensor

Here we show that the assumption of a non-absorbing (and non optically active)
medium implies that the susceptibility tensor is symmetric. We assume that P is
due to a single species of electron, so that we have P = N p. Here N is the number
of microscopic dipoles per volume and p = q e r, where q e is the charge on the
electron and r is the microscopic displacement of the electron. The force on this
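electron due to the electric field is given by $\mathbf{F} = \mathbf{E} q_e$. With these definitions, we
can use (5.1) to write a connection between the force due to a static E and the
electron displacement:
$$N q_e \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{\epsilon_0}{q_e} \begin{pmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{yx} & \chi_{yy} & \chi_{yz} \\ \chi_{zx} & \chi_{zy} & \chi_{zz} \end{pmatrix} \begin{pmatrix} F_x \\ F_y \\ F_z \end{pmatrix} \tag{5.41}$$
The column vector on the left represents the components of the displacement
r. We next invert (5.41) to find the force of the electric field on an electron as a
function of its displacement⁴
$$\begin{pmatrix} F_x \\ F_y \\ F_z \end{pmatrix} = \begin{pmatrix} k_{xx} & k_{xy} & k_{xz} \\ k_{yx} & k_{yy} & k_{yz} \\ k_{zx} & k_{zy} & k_{zz} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \tag{5.42}$$
where
$$\begin{pmatrix} k_{xx} & k_{xy} & k_{xz} \\ k_{yx} & k_{yy} & k_{yz} \\ k_{zx} & k_{zy} & k_{zz} \end{pmatrix} \equiv \frac{N q_e^2}{\epsilon_0} \begin{pmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{yx} & \chi_{yy} & \chi_{yz} \\ \chi_{zx} & \chi_{zy} & \chi_{zz} \end{pmatrix}^{-1} \tag{5.43}$$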
The total work done on an electron in moving it to its displaced position is
given by
$$W = \int_{\text{path}} \mathbf{F}(\mathbf{r}') \cdot d\mathbf{r}' \tag{5.44}$$
While there are many possible paths for getting the electron to any specific dis-
placement (each path specified by a different history of the electric field), the
work done along any of these paths must be the same if the system is conservative
(i.e. no absorption). For example, for a final displacement of r = x x̂ + y ŷ we could
have the following two paths:
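[Figure: Path 1 runs from (0,0,0) along the x-direction to (x,0,0) and then along the
y-direction to (x,y,0). Path 2 runs from (0,0,0) along the y-direction to (0,y,0) and
then along the x-direction to (x,y,0).]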
We can use (5.42) in (5.44) to calculate the total work done on the electron
along path 1:
$$\begin{aligned}
W &= \int_0^x F_x(x', y'=0, z'=0)\, dx' + \int_0^y F_y(x'=x, y', z'=0)\, dy' \\
&= \int_0^x k_{xx} x'\, dx' + \int_0^y (k_{yx} x + k_{yy} y')\, dy' \\
&= \frac{k_{xx}}{2} x^2 + k_{yx} x y + \frac{k_{yy}}{2} y^2
\end{aligned}$$
4 This inversion assumes the field changes slowly so the forces on the electron are always es-
sentially balanced. This is not true for optical fields, but the proof gives the right flavor for why
conservation of energy results in the symmetry. A more formal proof that doesn’t make this as-
sumption can be found in Principles of Optics, 7th Ed., Born and Wolf, pp. 790-791 (Ref. [1]).

If we take path 2, the total work is
$$\begin{aligned}
W &= \int_0^y F_y(x'=0, y', z'=0)\, dy' + \int_0^x F_x(x', y'=y, z'=0)\, dx' \\
&= \int_0^y k_{yy} y'\, dy' + \int_0^x (k_{xx} x' + k_{xy} y)\, dx' \\
&= \frac{k_{yy}}{2} y^2 + k_{xy} x y + \frac{k_{xx}}{2} x^2
\end{aligned}$$
Since the work must be the same for these two paths, we clearly have k x y = k y x .
Similar arguments for other pairs of dimensions ensure that the matrix of k
coefficients is symmetric. From linear algebra, we learn that if the inverse of a
matrix is symmetric then the matrix itself is also symmetric. When we combine
this result with the definition (5.43), we see that the assumption of no absorption
requires the susceptibility matrix to be symmetric.
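The path-independence argument can be checked numerically. The following sketch integrates F · dr along the two paths for a force of the form (5.42), restricted to the x-y plane. The coefficient values and helper names are hypothetical and chosen purely for illustration; the midpoint rule is exact here (up to rounding) because the force is linear in the displacement.

```python
def force(K, x, y):
    """Force model F = K r of Eq. (5.42), restricted to the x-y plane."""
    return (K[0][0] * x + K[0][1] * y, K[1][0] * x + K[1][1] * y)

def work_along(K, corners, steps=200):
    """Midpoint-rule estimate of the line integral (5.44) along straight segments."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(corners[:-1], corners[1:]):
        dx, dy = (x1 - x0) / steps, (y1 - y0) / steps
        for i in range(steps):
            fx, fy = force(K, x0 + (i + 0.5) * dx, y0 + (i + 0.5) * dy)
            total += fx * dx + fy * dy
    return total

x, y = 1.0, 2.0                            # final displacement (arbitrary units)
path_1 = [(0.0, 0.0), (x, 0.0), (x, y)]    # along x first, then along y
path_2 = [(0.0, 0.0), (0.0, y), (x, y)]    # along y first, then along x

K_sym = [[2.0, 0.5], [0.5, 3.0]]           # hypothetical coefficients with k_xy = k_yx
K_asym = [[2.0, 0.5], [0.9, 3.0]]          # hypothetical coefficients with k_xy != k_yx

for name, K in (("symmetric", K_sym), ("asymmetric", K_asym)):
    w1, w2 = work_along(K, path_1), work_along(K, path_2)
    print(f"{name:10s}  W(path 1) = {w1:.6f}  W(path 2) = {w2:.6f}  difference = {w1 - w2:.6f}")
```

For the asymmetric case the two results differ by $(k_{yx} - k_{xy})\,x y$, which is exactly the mismatch between the closed-form expressions for the two paths.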
Appendix 5.B Rotation of Coordinates
In this appendix, we go through the labor of showing that (5.1) can always be
written as (5.3) via rotations of the coordinate system, given that the susceptibility
tensor is symmetric (i.e. χi j = χ j i ). We have
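$$\mathbf{P} = \epsilon_0 \boldsymbol{\chi} \mathbf{E} \tag{5.45}$$
where
$$\mathbf{E} \equiv \begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} \qquad \mathbf{P} \equiv \begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix} \qquad \boldsymbol{\chi} \equiv \begin{pmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{xy} & \chi_{yy} & \chi_{yz} \\ \chi_{xz} & \chi_{yz} & \chi_{zz} \end{pmatrix} \tag{5.46}$$
Our task is to find a new coordinate system $x'$, $y'$, and $z'$ for which the susceptibility
tensor is diagonal. That is, we want to choose $x'$, $y'$, and $z'$ such that
$$\mathbf{P}' = \epsilon_0 \boldsymbol{\chi}' \mathbf{E}' \tag{5.47}$$
where
$$\mathbf{E}' \equiv \begin{pmatrix} E'_{x'} \\ E'_{y'} \\ E'_{z'} \end{pmatrix} \qquad \mathbf{P}' \equiv \begin{pmatrix} P'_{x'} \\ P'_{y'} \\ P'_{z'} \end{pmatrix} \qquad \boldsymbol{\chi}' \equiv \begin{pmatrix} \chi'_{x'x'} & 0 & 0 \\ 0 & \chi'_{y'y'} & 0 \\ 0 & 0 & \chi'_{z'z'} \end{pmatrix} \tag{5.48}$$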
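To arrive at the new coordinate system, we are free to make pure rotation
transformations. In a manner similar to (6.29), a rotation through an angle γ about the
z-axis, followed by a rotation through an angle β about the resulting y-axis, and
finally a rotation through an angle α about the new x-axis, can be written as
$$\begin{aligned}
\mathbf{R} &\equiv \begin{pmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{pmatrix} \\
&= \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & \sin\alpha \\ 0 & -\sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\gamma & \sin\gamma & 0 \\ -\sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \\
&= \begin{pmatrix} \cos\beta\cos\gamma & \cos\beta\sin\gamma & \sin\beta \\ -\cos\alpha\sin\gamma - \sin\alpha\sin\beta\cos\gamma & \cos\alpha\cos\gamma - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta \\ \sin\alpha\sin\gamma - \cos\alpha\sin\beta\cos\gamma & -\sin\alpha\cos\gamma - \cos\alpha\sin\beta\sin\gamma & \cos\alpha\cos\beta \end{pmatrix}
\end{aligned} \tag{5.49}$$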
The matrix R produces an arbitrary rotation of coordinates in three dimensions.
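Specifically, we can write:
$$\mathbf{E}' = \mathbf{R}\mathbf{E} \qquad \mathbf{P}' = \mathbf{R}\mathbf{P} \tag{5.50}$$
These transformations can be inverted to give
$$\mathbf{E} = \mathbf{R}^{-1}\mathbf{E}' \qquad \mathbf{P} = \mathbf{R}^{-1}\mathbf{P}' \tag{5.51}$$
where
$$\begin{aligned}
\mathbf{R}^{-1} &= \begin{pmatrix} \cos\beta\cos\gamma & -\cos\alpha\sin\gamma - \sin\alpha\sin\beta\cos\gamma & \sin\alpha\sin\gamma - \cos\alpha\sin\beta\cos\gamma \\ \cos\beta\sin\gamma & \cos\alpha\cos\gamma - \sin\alpha\sin\beta\sin\gamma & -\sin\alpha\cos\gamma - \cos\alpha\sin\beta\sin\gamma \\ \sin\beta & \sin\alpha\cos\beta & \cos\alpha\cos\beta \end{pmatrix} \\
&= \begin{pmatrix} R_{11} & R_{21} & R_{31} \\ R_{12} & R_{22} & R_{32} \\ R_{13} & R_{23} & R_{33} \end{pmatrix} = \mathbf{R}^T
\end{aligned} \tag{5.52}$$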
Note that the inverse of the rotation matrix is the same as its transpose, an impor-
tant feature that we exploit in what follows.
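Upon inserting (5.51) into (5.45) we have
$$\mathbf{R}^{-1}\mathbf{P}' = \epsilon_0 \boldsymbol{\chi} \mathbf{R}^{-1} \mathbf{E}' \tag{5.53}$$
or
$$\mathbf{P}' = \epsilon_0 \mathbf{R} \boldsymbol{\chi} \mathbf{R}^{-1} \mathbf{E}' \tag{5.54}$$
From this equation we see that the new susceptibility tensor we seek for (5.47) is
$$\begin{aligned}
\boldsymbol{\chi}' &\equiv \mathbf{R} \boldsymbol{\chi} \mathbf{R}^{-1} \\
&= \begin{pmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{pmatrix} \begin{pmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{xy} & \chi_{yy} & \chi_{yz} \\ \chi_{xz} & \chi_{yz} & \chi_{zz} \end{pmatrix} \begin{pmatrix} R_{11} & R_{21} & R_{31} \\ R_{12} & R_{22} & R_{32} \\ R_{13} & R_{23} & R_{33} \end{pmatrix} \\
&= \begin{pmatrix} \chi'_{x'x'} & \chi'_{x'y'} & \chi'_{x'z'} \\ \chi'_{x'y'} & \chi'_{y'y'} & \chi'_{y'z'} \\ \chi'_{x'z'} & \chi'_{y'z'} & \chi'_{z'z'} \end{pmatrix}
\end{aligned} \tag{5.55}$$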

We have expressly indicated that the off-diagonal terms of $\boldsymbol{\chi}'$ are symmetric (i.e.
$\chi'_{ij} = \chi'_{ji}$). This can be verified by performing the multiplication in (5.55). It is a
consequence of $\boldsymbol{\chi}$ being symmetric and $\mathbf{R}^{-1}$ being equal to $\mathbf{R}^T$.
The three off-diagonal elements of $\boldsymbol{\chi}'$ (appearing both above and below the
diagonal) are found by performing the matrix multiplication in the second line
of (5.55). The specific expressions for these three elements are not particularly
enlightening. The important point is that we can make all three of them equal to
zero since we have three degrees of freedom in the angles α, β, and γ. Although
we do not expressly solve for the angles, we have demonstrated that it is always
possible to set
$$\chi'_{x'y'} = 0 \qquad \chi'_{x'z'} = 0 \qquad \chi'_{y'z'} = 0 \tag{5.56}$$
This justifies (5.3).
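The diagonalization can also be demonstrated numerically. The sketch below does not solve for the angles α, β, and γ of (5.49). Instead it swaps in the Jacobi method, which applies a sequence of single-plane rotations, each chosen to zero one off-diagonal element, and accumulates them into one overall rotation R. The susceptibility values and helper names are hypothetical, chosen only for illustration.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def plane_rotation(A, p, q):
    """Rotation in the p-q plane chosen so that (R A R^T)[p][q] = 0 for symmetric A."""
    R = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if A[p][q] == 0.0:
        return R
    diff = A[p][p] - A[q][q]
    phi = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * A[p][q] / diff)
    c, s = math.cos(phi), math.sin(phi)
    R[p][p], R[p][q], R[q][p], R[q][q] = c, s, -s, c
    return R

def diagonalize(chi, sweeps=10):
    """Return (R, chi_prime) with chi_prime = R chi R^T (nearly) diagonal."""
    A = [row[:] for row in chi]
    R_total = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        for p, q in ((0, 1), (0, 2), (1, 2)):
            R = plane_rotation(A, p, q)
            A = matmul(matmul(R, A), transpose(R))
            R_total = matmul(R, R_total)
    return R_total, A

# Hypothetical symmetric susceptibility tensor (illustrative values only).
chi = [[1.20, 0.10, 0.05],
       [0.10, 1.50, 0.20],
       [0.05, 0.20, 1.90]]

R, chi_prime = diagonalize(chi)
for row in chi_prime:
    print("  ".join(f"{value: .6f}" for value in row))
```

Each plane rotation has unit determinant, so the accumulated R is a pure rotation whose inverse equals its transpose, as in (5.52).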
Appendix 5.C Electric Field in Crystals
To determine the direction of the electric field associated with each value
of n, we return to (5.10), (5.11), and (5.12) in the analysis in section 5.2. These
equations can be written in matrix format as
$$\begin{pmatrix} \frac{\omega^2}{c^2}(1+\chi_x) - k_y^2 - k_z^2 & k_x k_y & k_x k_z \\ k_x k_y & \frac{\omega^2}{c^2}(1+\chi_y) - k_x^2 - k_z^2 & k_y k_z \\ k_x k_z & k_y k_z & \frac{\omega^2}{c^2}(1+\chi_z) - k_x^2 - k_y^2 \end{pmatrix} \begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} = 0 \tag{5.57}$$
where we have used $k_x^2 + k_y^2 + k_z^2 = k^2$. We can divide every element by $k^2$ and
employ the definitions (5.15), (5.17), and (5.18) to make this matrix equation look
slightly nicer:
$$\begin{pmatrix} \frac{n_x^2}{n^2} - u_y^2 - u_z^2 & u_x u_y & u_x u_z \\ u_x u_y & \frac{n_y^2}{n^2} - u_x^2 - u_z^2 & u_y u_z \\ u_x u_z & u_y u_z & \frac{n_z^2}{n^2} - u_x^2 - u_y^2 \end{pmatrix} \begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} = 0 \tag{5.58}$$
For (5.58) to have a non-trivial solution (i.e. non zero fields), the determinant
of the matrix must be zero. Imposing this requirement is an equivalent way to
derive Fresnel’s equation (5.19) for n.
Given a direction for û and a value for n (from Fresnel’s equation), we can use
(5.58) to determine the direction of the electric field associated with that index.
It is left as an exercise (see P5.8) to show that when $u_x \neq 0$, $u_y \neq 0$, and $u_z \neq 0$, the
appropriate field direction for a value of n is given by
$$\begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} \propto \begin{pmatrix} \dfrac{u_x}{n^2 - n_x^2} \\ \dfrac{u_y}{n^2 - n_y^2} \\ \dfrac{u_z}{n^2 - n_z^2} \end{pmatrix} \tag{5.59}$$
This is a proportionality rather than an equation, since Maxwell’s equations only
specify the direction of E; we are free to choose the amplitude. Because Fresnel’s
equation gives two values for n, (5.59) specifies two distinct polarization
components associated with each û. These polarization components form a natural
basis for describing light propagation in a crystal. When light is composed of a
mixture of these two polarizations, the two polarization components experience
different indices of refraction.
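As a numerical illustration, the sketch below finds the two indices for a chosen propagation direction and then builds the corresponding field directions from (5.59). It assumes Fresnel’s equation in the quadratic form $A n^4 - B n^2 + C = 0$ given in the hint to P5.1, uses the Potassium Niobate indices quoted in the caption of Fig. 5.7, and picks θ = 30° and φ = π/4 arbitrarily. The function names are ours. As a check, each field is substituted back into the matrix of (5.58).

```python
import math

def fresnel_indices(u, n_xyz):
    """Slow and fast roots n of A n^4 - B n^2 + C = 0 (form from the hint to P5.1)."""
    ux2, uy2, uz2 = (c * c for c in u)
    nx2, ny2, nz2 = (n * n for n in n_xyz)
    A = (nx2 + ny2 + nz2) - ux2 * (ny2 + nz2) - uy2 * (nx2 + nz2) - uz2 * (nx2 + ny2)
    B = (nx2 * ny2 + nx2 * nz2 + ny2 * nz2) - ux2 * ny2 * nz2 - uy2 * nx2 * nz2 - uz2 * nx2 * ny2
    C = nx2 * ny2 * nz2
    root = math.sqrt(max(B * B - 4.0 * A * C, 0.0))
    return math.sqrt((B + root) / (2.0 * A)), math.sqrt((B - root) / (2.0 * A))

def field_direction(u, n, n_xyz):
    """Normalized E from Eq. (5.59); requires all components of u to be nonzero."""
    E = [ui / (n * n - ni * ni) for ui, ni in zip(u, n_xyz)]
    norm = math.sqrt(sum(e * e for e in E))
    return [e / norm for e in E]

def residual(u, n, n_xyz, E):
    """Apply the matrix of Eq. (5.58) to E; every entry should be close to zero."""
    out = []
    for i in range(3):
        total = 0.0
        for j in range(3):
            m = n_xyz[i] ** 2 / n ** 2 - (1.0 - u[i] ** 2) if i == j else u[i] * u[j]
            total += m * E[j]
        out.append(total)
    return out

N_XYZ = (2.22, 2.34, 2.41)                     # KNbO3 at 500 nm (Fig. 5.7 caption)
theta, phi = math.radians(30.0), math.pi / 4   # illustrative propagation direction
u = (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))

n_slow, n_fast = fresnel_indices(u, N_XYZ)
E_slow = field_direction(u, n_slow, N_XYZ)
E_fast = field_direction(u, n_fast, N_XYZ)
cos_angle = sum(a * b for a, b in zip(E_slow, E_fast))

print(f"n_slow = {n_slow:.5f}, n_fast = {n_fast:.5f}")
print(f"angle between the two field directions = {math.degrees(math.acos(abs(cos_angle))):.3f} deg")
```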
If any of the components of û (i.e. $u_x$, $u_y$, or $u_z$) is precisely zero, the
corresponding entry in (5.59) yields a zero-over-zero situation. This happens because at
least one of the dimensions in (5.58) becomes decoupled from the others. In
these cases, one can go back and re-solve (5.58) for the polarization directions, as in the
following example.
Example 5.2
Determine the directions of the two polarization components associated with light
propagating in the û = ẑ direction. (Compare with Example 5.1.)
Solution: In this case we have $u_x = u_y = 0$, so as noted above, we have to go back
to (5.58) and re-solve. In our case, the set of equations becomes
$$\begin{pmatrix} \frac{n_x^2}{n^2} - 1 & 0 & 0 \\ 0 & \frac{n_y^2}{n^2} - 1 & 0 \\ 0 & 0 & \frac{n_z^2}{n^2} \end{pmatrix} \begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} = 0 \tag{5.60}$$
Notice that all three dimensions are decoupled in this system (i.e. there are no
off-diagonal terms). In Example 5.1 we found that the two values of n associated
with û = ẑ are $n_x$ and $n_y$. If we use $n = n_x$ in our set of equations, we have
$$\begin{pmatrix} 0 & 0 & 0 \\ 0 & \frac{n_y^2}{n_x^2} - 1 & 0 \\ 0 & 0 & \frac{n_z^2}{n_x^2} \end{pmatrix} \begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} = 0$$
Assuming $n_x$ and $n_y$ are unique so that $n_y/n_x \neq 1$, these equations require
$E_y = E_z = 0$ but allow $E_x$ to be non-zero. This proves our earlier assertion that the index
$n_x$ is associated with light polarized in the x-dimension in the special case of û = ẑ.
Similarly, when $n_y$ is inserted into (5.60), we find that it is associated with light
polarized in the y-dimension.

We can use (5.59) to study the behavior of polarization direction as the direction
of propagation varies. Figure 5.7 shows plots of the polarization direction (i.e.
normalized $E_x$, $E_y$, and $E_z$) in Potassium Niobate as the propagation direction
is varied. The plot is created by inserting the spherical representation of û (5.24)
into Fresnel’s equation (5.20) for a chosen sign of the ±, and then inserting the
resulting n into (5.59) to find the associated electric field. As we saw in Example 5.2,
at θ = 0 the light associated with the slow index is polarized along the y-axis and
the light associated with the fast index is polarized along the x-axis.
In Fig. 5.7(c) we have plotted the angle between the two polarization
components. At θ = 0, the two polarization components are 90° apart, as one would
expect. However, notice that in other propagation directions the two linear
polarization components are not precisely perpendicular.⁵ Even so, the two
polarization components of E are orthogonal in a mathematical sense, so that they still
comprise a useful basis for decomposing the light field.
Figure 5.7 Polarization direction associated with the two values of n in Potassium
Niobate (KNbO3) at λ = 500 nm ($n_x = 2.22$, $n_y = 2.34$, and $n_z = 2.41$) and φ = π/4.
Frame (c) shows the angle between the two polarization components. (Panels:
(a) Polarization Direction for Slow Index, (b) Polarization Direction for Fast Index,
(c) Angle Between Polarization Components.)
Determining the Fields in a Uniaxial Crystal
In a uniaxial crystal, the polarization component associated with the ordinary index
$n = n_o$ is
$$\mathbf{E}_o(\hat{\mathbf{u}}) \propto \begin{pmatrix} -\sin\phi \\ \cos\phi \\ 0 \end{pmatrix} \tag{5.61}$$
This is shown by inserting $n = n_o$ into the requirement (5.58), and finding the
allowed fields (see P5.9). This field component is associated with the ordinary
wave because, just as in an isotropic medium such as glass, the index of refraction
for light with this polarization does not vary with θ. The polarization component
associated with $n_e(\theta)$ is found by using (5.59):
$$\mathbf{E}_e(\hat{\mathbf{u}}) \propto \begin{pmatrix} \dfrac{\sin\theta\cos\phi}{n_e^2(\theta) - n_o^2} \\ \dfrac{\sin\theta\sin\phi}{n_e^2(\theta) - n_o^2} \\ \dfrac{\cos\theta}{n_e^2(\theta) - n_e^2} \end{pmatrix} \tag{5.62}$$
Notice that this polarization component is partially directed along the optic axis
(i.e. it has a z-component), and it is not perpendicular to k since $\hat{\mathbf{u}} \cdot \mathbf{E}_e(\hat{\mathbf{u}}) \neq 0$ (see
P5.10). It is, however, perpendicular to the ordinary polarization component, since
$\mathbf{E}_e \cdot \mathbf{E}_o = 0$.
Notice that when θ = 0, (5.29) reduces to $n = n_o$ so that both indices are the same.
On the other hand, if θ = π/2 then (5.29) reduces to $n = n_e$.
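The two statements about (5.61) and (5.62) can be spot-checked numerically. The sketch below is an illustration only: it uses the BBO indices from the caption of Fig. 5.5, an arbitrarily chosen direction (θ = 30°, φ = 50°), and function names of our own. It evaluates $\mathbf{E}_e \cdot \mathbf{E}_o$ and $\hat{\mathbf{u}} \cdot \mathbf{E}_e$ for the unnormalized fields.

```python
import math

def n_e_theta(theta, n_o, n_e):
    """Angle-dependent extraordinary index, Eq. (5.29)."""
    return n_o * n_e / math.sqrt((n_o * math.sin(theta)) ** 2 + (n_e * math.cos(theta)) ** 2)

def e_ordinary(phi):
    """Unnormalized ordinary field direction, Eq. (5.61)."""
    return (-math.sin(phi), math.cos(phi), 0.0)

def e_extraordinary(theta, phi, n_o, n_e):
    """Unnormalized extraordinary field direction, Eq. (5.62); needs 0 < theta < pi/2."""
    ne2 = n_e_theta(theta, n_o, n_e) ** 2
    return (math.sin(theta) * math.cos(phi) / (ne2 - n_o ** 2),
            math.sin(theta) * math.sin(phi) / (ne2 - n_o ** 2),
            math.cos(theta) / (ne2 - n_e ** 2))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

N_O, N_E = 1.68, 1.56                                   # BBO at 500 nm (Fig. 5.5 caption)
theta, phi = math.radians(30.0), math.radians(50.0)     # illustrative direction
u = (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))

E_o = e_ordinary(phi)
E_e = e_extraordinary(theta, phi, N_O, N_E)

print(f"E_e . E_o = {dot(E_e, E_o):.2e}   (perpendicular polarization components)")
print(f"u . E_e   = {dot(u, E_e):.5f}   (E_e is not perpendicular to k)")
```

With the normalization used in (5.62), substituting (5.29) shows that $\hat{\mathbf{u}} \cdot \mathbf{E}_e$ works out to $1/n_e^2(\theta)$, which is never zero.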
5 The two components of the electric displacement vector $\mathbf{D} = \epsilon_0 \mathbf{E} + \mathbf{P}$ remain perpendicular.

Appendix 5.D Huygens’ Elliptical Construct for a Uniaxial Crystal
In 1690 Christiaan Huygens developed a way to predict the direction of extraordinary
rays in a crystal by examining an elliptical wavelet. The point on the elliptical
wavelet that propagates along the optic axis is assumed to experience the index
$n_o$. The point on the elliptical wavelet that propagates perpendicular to the optic
axis is assumed to experience the index $n_e$, consistent with the semi-axes of the
ellipse in (5.63). It turns out that Huygens’ approach
agreed with the direction of energy propagation (5.40) (as opposed to the direction
of the k-vector). This was quite satisfactory in Huygens’ day (except that he was
largely ignored for a century, owing to Newton’s corpuscular theory) since the
direction of energy propagation is what an observer sees.
Consider a plane wave entering a uniaxial crystal with the optic axis perpen-
dicular to the surface. In Huygens’ point of view, each point on a wave front acts
as a wavelet source which combines with neighboring wavelets to preserve the
overall plane wave pattern. Inside the crystal, the wavelets propagate in the shape
of an ellipse. The equation for an elliptical wave front after propagating during a
time t is
$$\frac{y^2}{(ct/n_e)^2} + \frac{z^2}{(ct/n_o)^2} = 1 \tag{5.63}$$
After rearranging, the equation of the ellipse inside the crystal can also be written
as
$$z = \frac{ct}{n_o} \sqrt{1 - \frac{y^2}{(ct/n_e)^2}} \tag{5.64}$$
In order to have the wavelet join neatly with other wavelets to build a plane wave,
the wave front of the ellipse must be parallel to a new wave front entering the
surface at a distance $ct/\sin\theta_i$ above the original point. This distance is represented
by the hypotenuse of the right triangle seen in Fig. 5.8. Let the point where the
wave front touches the ellipse be denoted by $(y, z) = (z\tan\theta_S, z)$. The slope (rise
over run) of the line that connects these two points is then
$$\frac{dz}{dy} = -\frac{z}{ct/\sin\theta_i - z\tan\theta_S} \tag{5.65}$$
At the point where the wave front touches the ellipse (i.e., $(y, z) = (z\tan\theta_S, z)$), the
slope of the curve for the ellipse is
$$\frac{dz}{dy} = \frac{-y n_e^2}{n_o\, ct \sqrt{1 - \frac{y^2}{(ct/n_e)^2}}} = -\frac{n_e^2 y}{n_o^2 z} = -\frac{n_e^2}{n_o^2} \tan\theta_S \tag{5.66}$$
We would like these two slopes to be the same. We therefore set them equal to
each other:
$$-\frac{n_e^2}{n_o^2} \tan\theta_S = -\frac{z}{ct/\sin\theta_i - z\tan\theta_S} \quad\Rightarrow\quad \frac{ct}{z} \frac{n_e^2 \tan\theta_S}{n_o^2 \sin\theta_i} = \frac{n_e^2}{n_o^2} \tan^2\theta_S + 1 \tag{5.67}$$
Figure 5.8 Elliptical wavelet.

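If we evaluate (5.63) for the point $(y, z) = (z\tan\theta_S, z)$, we obtain
$$\frac{ct}{z} = n_o \sqrt{\frac{n_e^2}{n_o^2} \tan^2\theta_S + 1} \tag{5.68}$$
Upon substitution of this into (5.67) we arrive at
$$\frac{n_e^2 \tan\theta_S}{n_o \sin\theta_i} = \sqrt{\frac{n_e^2}{n_o^2} \tan^2\theta_S + 1} \quad\Rightarrow\quad \frac{n_e^4 \tan^2\theta_S}{n_o^2 \sin^2\theta_i} = \frac{n_e^2}{n_o^2} \tan^2\theta_S + 1 \tag{5.69}$$
$$\Rightarrow\quad \left[ \frac{n_e^2}{\sin^2\theta_i} - 1 \right] \tan^2\theta_S = \frac{n_o^2}{n_e^2} \quad\Rightarrow\quad \tan\theta_S = \frac{n_o \sin\theta_i}{n_e \sqrt{n_e^2 - \sin^2\theta_i}} \tag{5.70}$$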
This agrees with (5.40) as anticipated. Again, Huygens’ approach obtained the
correct direction of the Poynting vector associated with the extraordinary wave.
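
Huygens’ construction can also be carried out numerically and compared with (5.40). Rather than equating slopes as in (5.67), the sketch below uses the standard tangent-line equation of an ellipse: the tangent at $(y_1, z_1)$ is $y y_1/a^2 + z z_1/b^2 = 1$, and it must pass through the surface point $(ct/\sin\theta_i, 0)$. This is an equivalent way of stating the same tangency condition. The function names are ours, and the BBO indices from the caption of Fig. 5.5 serve only as an example.

```python
import math

def huygens_theta_s(theta_i, n_o, n_e, ct=1.0):
    """Ray direction from the tangent construction on the elliptical wavelet (5.63)."""
    a = ct / n_e                    # semi-axis along y (perpendicular to the optic axis)
    b = ct / n_o                    # semi-axis along z (along the optic axis)
    y0 = ct / math.sin(theta_i)     # where the later wave front meets the surface
    y1 = a * a / y0                 # tangency condition: y0 * y1 / a^2 = 1
    z1 = b * math.sqrt(1.0 - (y1 / a) ** 2)
    return math.atan2(y1, z1)       # angle of the tangent point measured from the z-axis

def formula_theta_s(theta_i, n_o, n_e):
    """Poynting-vector direction from Eq. (5.40)."""
    s = math.sin(theta_i)
    return math.atan(n_o * s / (n_e * math.sqrt(n_e ** 2 - s ** 2)))

N_O, N_E = 1.68, 1.56    # BBO at 500 nm (Fig. 5.5 caption)

for deg in (10, 45, 80):
    ti = math.radians(deg)
    print(f"theta_i = {deg:2d} deg   Huygens: {math.degrees(huygens_theta_s(ti, N_O, N_E)):.4f} deg"
          f"   Eq. (5.40): {math.degrees(formula_theta_s(ti, N_O, N_E)):.4f} deg")
```

The result does not depend on the value chosen for ct, since the whole construction scales with it.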

Exercises
Exercises for 5.2 Plane Wave Propagation in Crystals
P5.1 Solve Fresnel’s equation (5.19) to find the two values of n associated
with a given û. Show that both solutions yield a positive index of
refraction.
HINT: Show that (5.19) can be manipulated into the form
$$\begin{aligned}
0 = {} & \left[ \left( u_x^2 + u_y^2 + u_z^2 \right) - 1 \right] n^6 \\
& + \left[ \left( n_x^2 + n_y^2 + n_z^2 \right) - u_x^2 \left( n_y^2 + n_z^2 \right) - u_y^2 \left( n_x^2 + n_z^2 \right) - u_z^2 \left( n_x^2 + n_y^2 \right) \right] n^4 \\
& - \left[ \left( n_x^2 n_y^2 + n_x^2 n_z^2 + n_y^2 n_z^2 \right) - u_x^2 n_y^2 n_z^2 - u_y^2 n_x^2 n_z^2 - u_z^2 n_x^2 n_y^2 \right] n^2 + n_x^2 n_y^2 n_z^2
\end{aligned}$$
The coefficient of $n^6$ is identically zero since by definition we have
$u_x^2 + u_y^2 + u_z^2 = 1$.
P5.2 Suppose you have a crystal with $n_x = 1.5$, $n_y = 1.6$, and $n_z = 2.0$. Use
Fresnel’s equation to determine what the two indices of refraction are
for a k-vector in the crystal along the $\hat{\mathbf{u}} = (\hat{\mathbf{x}} + 2\hat{\mathbf{y}} + 3\hat{\mathbf{z}})/\sqrt{14}$ direction.
Exercises for 5.3 Biaxial and Uniaxial Crystals
P5.3 Given that the optic axes are in the x-z plane, show that the direction
of the optic axes are given by (5.25).
HINT: The two indices are the same when B 2 − 4AC = 0. You will want
to use polar coordinates for the direction unit vector, as in (5.24). Set
φ = 0 so you are in the x-z plane. Use sin2 θ + cos2 θ = 1 to get an
equation that only has cosine terms and solve for cos2 θ.
P5.4 Use definitions (5.26) and (5.27) along with the spherical representation
of û (5.24) in Fresnel’s equation (5.20) to calculate the two values for
the index in a uniaxial crystal (i.e. (5.28) and (5.29)).
HINT: First show that
$$A = n_o^2 \sin^2\theta + n_e^2 \cos^2\theta$$
$$B = n_o^2 n_e^2 + n_o^4 \sin^2\theta + n_e^2 n_o^2 \cos^2\theta$$
$$C = n_o^4 n_e^2$$
and then use these expressions to evaluate Fresnel’s equation.
P5.5 Derive (5.32).

P5.6 A quartz plate (uniaxial crystal with the optic axis perpendicular to
the surfaces) has thickness d = 0.96 mm. The indices of refraction
are $n_o = 1.54424$ and $n_e = 1.55335$. A plane wave with wavelength
$\lambda_{vac} = 633$ nm passes through the plate. After emerging from the crystal,
there is a phase difference ∆ between the two polarization components
of the plane wave, and this phase difference depends on incident angle
$\theta_i$. Use a computer to plot ∆ as a function of incident angle from zero
to 90°.
HINT: For s-polarized light, show that the number of wavelengths
that fit in the plate is $\dfrac{d}{(\lambda_{vac}/n_o)\cos\theta_t^{(s)}}$. For p-polarized light, show that
the number of wavelengths that fit in the plate and the extra leg δ
outside of the plate (see Fig. 5.9) is $\dfrac{d}{(\lambda_{vac}/n_p)\cos\theta_t^{(p)}} + \dfrac{\delta}{\lambda_{vac}}$, where
$\delta = d \left[ \tan\theta_t^{(s)} - \tan\theta_t^{(p)} \right] \sin\theta_i$ and $n_p$ is given by (5.29). Find the difference
between these expressions and multiply by 2π to find ∆.
Figure 5.9 Diagram for P5.6.
L5.7 In the laboratory, send a HeNe laser ($\lambda_{vac} = 633$ nm) through two
crossed polarizers, oriented at 45° and 135°. Place the quartz plate
described in P5.6 between the polarizers on a rotation stage. Now
equal amounts of s- and p-polarized light strike the crystal as it is
rotated from normal incidence. (video)
Figure 5.11 Schematic for L5.7. (Laser, polarizer, quartz crystal on a rotation
stage, polarizer, and screen showing bright spots and dim spots.)
If the phase shift between the two paths is an odd integer times π, the
crystal acts as a half wave plate and maximum transmission through
the second polarizer results. If the phase shift is an even integer times π,
then minimum transmission through the second polarizer results. Plot
these measured maximum and minimum points on your computer-
generated graph of the previous problem.
Figure 5.10 Plot for P5.6 and L5.7. (Axis label: Phase Difference.)
Exercises for 5.C Electric Field in Crystals
P5.8 Show that (5.59) is a solution to (5.58).
P5.9 Show that the field polarization component associated with $n = n_o$ in
a uniaxial crystal is directed perpendicular to the plane containing û
and ẑ by substituting this value for n into (5.58) and determining what
combination of field components are allowable.
HINT: Use (5.24) to represent û with φ = 0 (the index is the same for all
φ, so you may as well use one that makes calculation easy). When you
substitute into (5.58) you will find that $E_y$ can be any value because of
the location of zeros in the matrix.
To get a requirement on $E_x$ and $E_z$, collapse the matrix equation down
to a 2 × 2 system. For non-trivial solutions to exist (i.e. $E_x \neq 0$ or $E_z \neq 0$),
the determinant of the matrix must be zero. Show that this is only the
case if $n_o = n_e$ (i.e. the crystal is isotropic).
P5.10 Show that the electric field for extraordinary polarized light Ee (û) in a
uniaxial crystal is not perpendicular to k (i.e. û), but that it is perpen-
dicular to the ordinary polarization component Eo (û).

Review, Chapters 1–5
Students preparing for an exam should understand the following questions and
problems thoroughly enough to be able to work them without referring back to
previous chapters.
True and False Questions
R1 T or F: The optical index of any material (not vacuum) varies with
frequency.
R2 T or F: The frequency of light can change as it enters a crystal (consider
low intensity, with no nonlinear effects).
R3 T or F: The entire expression $\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}$ associated with a light field
(both the real part and the imaginary parts) is physically relevant.
R4 T or F: The real part of the refractive index cannot be less than one.
R5 T or F: s-polarized light and p-polarized light experience the same
phase shift upon reflection from a material with complex index.
R6 T or F: When light is incident upon a material interface at Brewster’s
angle, only one polarization can transmit.
R7 T or F: When light is incident upon a material interface at Brewster’s
angle one of the polarizations stimulates dipoles in the material to
oscillate with orientation along the direction of the reflected k-vector.
R8 T or F: The critical angle for total internal reflection exists on both sides
of a material interface.
R9 T or F: From any given location above a (smooth flat) surface of water,
it is possible to see objects positioned anywhere under the water.
R10 T or F: From any given location beneath a (smooth flat) surface of water,
it is possible to see objects positioned anywhere above the water.
R11 T or F: An evanescent wave travels parallel to an interface surface on
the transmitted side.
R12 T or F: When p-polarized light enters a material at Brewster’s angle, the
intensity of the transmitted beam is the same as the intensity of the
incident beam.
R13 T or F: For incident angles beyond the critical angle for total internal
reflection, the Fresnel coefficients $t_s$ and $t_p$ are both zero.
R14 T or F: It is always possible to completely eliminate reflections using a
single-layer antireflection coating if you are free to choose the coating
thickness but not its index.
R15 T or F: For a given incident angle and value of n, there is only one
single-layer coating thickness d that will minimize reflections.
R16 T or F: When coating each surface of a lens with a single-layer
antireflection coating, the thickness of the coating on the exit surface will
need to be different from the thickness of the coating on the entry
surface.
R17 T or F: As light enters a crystal, the Poynting vector always obeys Snell’s
law.
R18 T or F: As light enters a crystal, the k-vector does not obey Snell’s law for
the extraordinary wave.
Problems
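R19 (a) Write down Maxwell’s equations.
(b) Derive the wave equation for E under the assumptions that $\mathbf{J}_{free} = 0$
and $\mathbf{P} = \epsilon_0 \chi \mathbf{E}$. Note: $\nabla \times (\nabla \times \mathbf{f}) = \nabla(\nabla \cdot \mathbf{f}) - \nabla^2 \mathbf{f}$.
(c) Show by direct substitution that $\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}$ is a solution to
the wave equation. Find the resulting connection between k and ω.
Give appropriate definitions for c and n, assuming that χ is real.
(d) If $\mathbf{k} = k\hat{\mathbf{z}}$ and $\mathbf{E}_0 = E_0\hat{\mathbf{x}}$, find the associated B-field.
(e) The Poynting vector is $\mathbf{S} = \mathbf{E} \times \mathbf{B}/\mu_0$, where the fields are real. Derive
an expression for $I \equiv \langle S \rangle_t$.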
R20 Consider an interface between two isotropic media where the incident
field is defined by
$$\mathbf{E}_i = \left[ E_i^{(p)} \left( \hat{\mathbf{y}} \cos\theta_i - \hat{\mathbf{z}} \sin\theta_i \right) + \hat{\mathbf{x}} E_i^{(s)} \right] e^{i\left[ k_i (y \sin\theta_i + z \cos\theta_i) - \omega_i t \right]}$$
The plane of incidence is shown in Fig. 5.12.
Figure 5.12 (Axis labels: z-axis; x-axis directed into page.)
(a) By inspection of the figure, write down similar expressions for the
reflected and transmitted fields (i.e. $\mathbf{E}_r$ and $\mathbf{E}_t$).
(b) Find an expression relating $\mathbf{E}_i$, $\mathbf{E}_r$, and $\mathbf{E}_t$ using the boundary
condition at the interface. From this expression obtain the law of reflection
and Snell’s law.
(c) The boundary condition requiring that the tangential component
of B must be continuous leads to
$$n_i \left( E_i^{(p)} - E_r^{(p)} \right) = n_t E_t^{(p)}$$
$$n_i \left( E_i^{(s)} - E_r^{(s)} \right) \cos\theta_i = n_t E_t^{(s)} \cos\theta_t$$
Use this and the results from part (b) to derive
$$r_p \equiv \frac{E_r^{(p)}}{E_i^{(p)}} = -\frac{\tan(\theta_i - \theta_t)}{\tan(\theta_i + \theta_t)}$$
You may use the identity
$$\frac{\sin\theta_i \cos\theta_i - \sin\theta_t \cos\theta_t}{\sin\theta_i \cos\theta_i + \sin\theta_t \cos\theta_t} = \frac{\tan(\theta_i - \theta_t)}{\tan(\theta_i + \theta_t)}$$
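R21 The Fresnel equations are
$$r_s \equiv \frac{E_r^{(s)}}{E_i^{(s)}} = \frac{\sin\theta_t \cos\theta_i - \sin\theta_i \cos\theta_t}{\sin\theta_t \cos\theta_i + \sin\theta_i \cos\theta_t}$$
$$t_s \equiv \frac{E_t^{(s)}}{E_i^{(s)}} = \frac{2 \sin\theta_t \cos\theta_i}{\sin\theta_t \cos\theta_i + \sin\theta_i \cos\theta_t}$$
$$r_p \equiv \frac{E_r^{(p)}}{E_i^{(p)}} = \frac{\cos\theta_t \sin\theta_t - \cos\theta_i \sin\theta_i}{\cos\theta_t \sin\theta_t + \cos\theta_i \sin\theta_i}$$
$$t_p \equiv \frac{E_t^{(p)}}{E_i^{(p)}} = \frac{2 \cos\theta_i \sin\theta_t}{\cos\theta_t \sin\theta_t + \cos\theta_i \sin\theta_i}$$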
(a) Find what each of these equations reduces to when θi = 0. Give your
answer in terms of n i and n t .
(b) What percent of light (intensity) reflects from a glass surface (n =
1.5) when light enters from air (n = 1) at normal incidence?
(c) What percent of light reflects from a glass surface when light exits
into air at normal incidence?
R22 Light goes through a glass prism with optical index n = 1.55. The light
enters at Brewster’s angle and exits at normal incidence as shown in
Fig. 5.13.
Figure 5.13
(a) Derive and calculate Brewster’s angle θB . You may use the results of
R20 (c).

(b) Calculate φ.
(c) What percent of the light (power) goes all the way through the prism
if it is p-polarized? You may use the Fresnel coefficients given in R21.
(d) What percent for s-polarized light?
R23 A 45°-90°-45° prism is a good device for reflecting a beam of light parallel to the initial beam (see Fig. 5.14). The exiting beam will be parallel to the entering beam even when the incoming beam is not normal to the front surface (although it needs to be in the plane of the drawing).
Figure 5.14
(a) How large an angle θ can be tolerated before there is no longer total internal reflection at both interior surfaces? Assume n = 1 outside of the prism and n = 1.5 inside.
(b) If the light enters and leaves the prism at normal incidence, what will the difference in phase be between the s and p-polarizations? You may use the Fresnel coefficients given in R21.
R24 A thin glass plate with index n = 1.5 is oriented at Brewster’s angle so
that p-polarized light with wavelength λvac = 500 nm goes through
with 100% transmittance.
(a) What is the minimum thickness that will make the reflection of
s-polarized light be maximum?
(b) What is the total transmittance $T_s^{\text{tot}}$ for this thickness assuming s-polarized light?
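R25 Consider a Fabry-Perot interferometer. Note: $R_1 = R_2 = R$.
(a) Show that the free spectral range for a Fabry-Perot interferometer is
$$\Delta\lambda_{\text{FSR}} = \frac{\lambda^2}{2 n d \cos\theta}$$
(b) Show that the fringe width is
$$\Delta\lambda_{\text{FWHM}} = \frac{\lambda^2}{\pi \sqrt{F}\, n d \cos\theta}$$
where $F \equiv \dfrac{4R}{(1-R)^2}$.
(c) Derive the reflecting finesse $f = \Delta\lambda_{\text{FSR}} / \Delta\lambda_{\text{FWHM}}$.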
R26 For a Fabry-Perot etalon, let $R = 0.90$, $\lambda_{\text{vac}} = 500$ nm, $n = 1$, and $d = 5.0$ mm.
(a) Suppose that a maximum transmittance occurs at the angle θ = 0. What is the nearest angle where the transmittance will be half of the maximum transmittance? You may assume that $\cos\theta \cong 1 - \theta^2/2$.
(b) You desire to use a Fabry-Perot etalon to view the light from a large
diffuse source rather than a point source. Draw a diagram depicting
where lenses should be placed, indicating relevant distances. Explain
briefly how it works.
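R27 You need to make an antireflective coating for a glass lens designed to work at normal incidence.
The matrix equation relating the incident field to the reflected and transmitted fields (at normal incidence) is
$$\begin{bmatrix} 1 \\ n_0 \end{bmatrix} + \begin{bmatrix} 1 \\ -n_0 \end{bmatrix} \frac{E_0^{\leftarrow}}{E_0^{\rightarrow}} = \begin{bmatrix} \cos k_1 \ell & \dfrac{-i}{n_1} \sin k_1 \ell \\ -i n_1 \sin k_1 \ell & \cos k_1 \ell \end{bmatrix} \begin{bmatrix} 1 \\ n_2 \end{bmatrix} \frac{E_2^{\rightarrow}}{E_0^{\rightarrow}}$$
where the arrows indicate the direction of travel of each field.
Figure 5.15
(a) What is the minimum thickness the coating should have?
HINT: It is less work if you can figure this out without referring to the above equation. You may assume $n_1 < n_2$.
(b) Find the index of refraction $n_1$ that will make the reflectivity be zero.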
R28 Second harmonic generation (the conversion of light with frequency ω into light with frequency 2ω) can occur when very intense laser light travels in a material. For good harmonic production, the laser light and the second harmonic light need to travel at the same speed in the material. In other words, both frequencies need to have the same index of refraction so that harmonic light produced downstream joins in phase with the harmonic light produced upstream, referred to as phase matching. This ensures a coherent building of the second harmonic field rather than destructive cancellations.
Unfortunately, the index of refraction is almost never the same for different frequencies in a given material, owing to dispersion. However, we can achieve phase matching in some crystals where one frequency propagates as an ordinary wave and the other propagates as an extraordinary wave. We cause the two indices to be precisely the same by tuning the angle of the crystal.
Consider a ruby laser propagating and generating the second harmonic in a uniaxial KDP crystal (potassium dihydrogen phosphate). The indices of refraction are given by $n_o$ and
$$\frac{n_o n_e}{\sqrt{n_o^2 \sin^2\phi + n_e^2 \cos^2\phi}}$$
where φ is the angle made with the optic axis. At the frequency of a ruby laser, KDP has indices $n_o(\omega) = 1.505$ and $n_e(\omega) = 1.465$. At the frequency of the second harmonic, the indices are $n_o(2\omega) = 1.534$ and $n_e(2\omega) = 1.487$.
Show that phase matching can be achieved if the laser is polarized so that it experiences only the ordinary index and the second harmonic light is polarized perpendicular to that. At what angle φ does this phase matching occur?
Selected Answers
R21: (b) 4% (c) 4%.
R22: (b) 33°, (c) 95%, (d) 79%.
R23: (a) 4.8°, (b) 74°.
R24: (a) 100 nm. (b) 0.55.
R26: (a) 0.074°.
R27: (b) 1.24.
R28: 51.12°.

Chapter 6
Polarization of Light
When the direction of the electric field of light oscillates in a regular, predictable
fashion, we say that the light is polarized. Polarization describes the direction
of the oscillating electric field, a distinct concept from dipoles per volume in a
material P – also called polarization. In this chapter, we develop a formalism for
describing polarized light and the effect of devices that modify polarization. If the
electric field oscillates in a plane, we say that it is linearly polarized. The electric
field can also spiral around while a plane wave propagates, and this is called
elliptical polarization. There is a convenient way for keeping track of polarization
using a two-dimensional Jones vector.
Many devices can affect polarization, such as polarizers and wave plates. Their effects on a light field can be represented by 2 × 2 Jones matrices that operate on the Jones vector representing the light. A Jones matrix can describe, for example, a linear polarizer oriented at an arbitrary angle with respect to the coordinate system. Likewise, a Jones matrix can describe the manner in which a wave plate introduces a relative phase between two components of the electric field. A wave plate can be used to convert, for example, linearly polarized light into circularly polarized light.
Figure 6.1 Animation showing different polarization states of light.
In this chapter, we will also see how reflection and transmission at a material interface influence field polarization. The Fresnel coefficients studied in chapters 3 and 4 can be conveniently incorporated into the 2 × 2 matrix formulation for handling polarization. As we saw previously, the amount of light reflected from a surface depends on the type of polarization, s or p. In addition, upon reflection, s-polarized light can acquire a phase lag or phase advance relative to p-polarized light. This is especially true at metal surfaces, which have complex indices of refraction (i.e. highly absorptive). Ellipsometry, outlined in appendix 6.A, is the science of characterizing optical properties of materials through an examination of these effects.
Throughout this chapter, we consider light to have well characterized polar-
ization. However, in most natural sources of light (e.g. sunlight or the light from an
incandescent lamp) the direction of the electric field varies rapidly and randomly.
Such sources are commonly referred to as unpolarized. It is common to have a
mixture of unpolarized and polarized light, called partially polarized light. The
Jones vector formalism used in this chapter is inappropriate for describing the
unpolarized portions of the light. In appendix 6.B we describe a more general
formalism for dealing with light with an arbitrary degree of polarization.
6.1 Linear, Circular, and Elliptical Polarization

Consider the plane-wave solution to Maxwell’s equations given by
$$\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{6.1}$$
The wave vector $\mathbf{k}$ specifies the direction of propagation. We neglect absorption so that the refractive index is real and $k = n\omega/c = 2\pi n/\lambda_{\text{vac}}$ (see (2.19)–(2.24)). In an isotropic medium we know that $\mathbf{k}$ and $\mathbf{E}_0$ are perpendicular, but even after the direction of $\mathbf{k}$ is specified, we are still free to have $\mathbf{E}_0$ point anywhere in two dimensions perpendicular to $\mathbf{k}$. If we orient our coordinate system with the z-axis in the direction of $\mathbf{k}$, we can write (6.1) as
$$\mathbf{E}(z, t) = \left( E_x \hat{\mathbf{x}} + E_y \hat{\mathbf{y}} \right) e^{i(kz - \omega t)} \tag{6.2}$$
As always, only the real part of (6.2) is physically relevant. The complex amplitudes of $E_x$ and $E_y$ keep track of the phase of the oscillating field components. In general the complex phases of $E_x$ and $E_y$ can differ, so that the wave in one of the dimensions lags or leads the wave in the other dimension.
The relationship between $E_x$ and $E_y$ describes the polarization of the light. For example, if $E_y$ is zero, the plane wave is said to be linearly polarized along the x-dimension. Linearly polarized light can have any orientation in the x–y plane, and it occurs whenever $E_x$ and $E_y$ have the same complex phase (or a phase differing by an integer times π). For our purposes, we will take the x-dimension to be horizontal and the y-dimension to be vertical unless otherwise noted.
As an example, suppose $E_y = i E_x$, where $E_x$ is real. The y-component of the field is then out of phase with the x-component by the factor $i = e^{i\pi/2}$. Taking the real part of the field (6.2) we get
$$\begin{aligned} \mathbf{E}(z, t) &= \text{Re}\left[ E_x e^{i(kz - \omega t)} \right] \hat{\mathbf{x}} + \text{Re}\left[ e^{i\pi/2} E_x e^{i(kz - \omega t)} \right] \hat{\mathbf{y}} \\ &= E_x \cos(kz - \omega t)\, \hat{\mathbf{x}} + E_x \cos(kz - \omega t + \pi/2)\, \hat{\mathbf{y}} \qquad \text{(left circular)} \\ &= E_x \left[ \cos(kz - \omega t)\, \hat{\mathbf{x}} - \sin(kz - \omega t)\, \hat{\mathbf{y}} \right] \end{aligned} \tag{6.3}$$
In this example, the field in the y-dimension lags the field in the x-dimension by a quarter cycle. That is, the behavior seen in the x-dimension happens in the y-dimension a quarter cycle later. The field never goes to zero simultaneously in both dimensions. In fact, in this example the strength of the electric field is constant, and it rotates in a circular pattern in the x-y dimensions. For this reason, this type of field is called circularly polarized. Figure 6.2 graphically shows the two linear polarized pieces in (6.3) adding to make circularly polarized light.
Figure 6.2 The combination of two orthogonally polarized plane waves that are out of phase results in elliptically polarized light. Here we have left circularly polarized light created as specified by (6.3).
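The constant field strength claimed for (6.3) can be checked numerically. The following is a minimal Python sketch, not part of the original text: the helper name `real_field` and the sample amplitude are illustrative assumptions. It takes the real part of (6.2) at several phases across one cycle and records the field magnitude.

```python
import cmath
import math

def real_field(Ex, Ey, kz_minus_wt):
    """Real x and y field components of (6.2) for complex amplitudes Ex, Ey."""
    phase = cmath.exp(1j * kz_minus_wt)
    return (Ex * phase).real, (Ey * phase).real

# Assumed sample amplitudes: E_y = i E_x with E_x real, as in the example.
Ex = 1.0
Ey = 1j * Ex

# Sample the field at 8 values of (kz - wt) spanning one full cycle.
magnitudes = []
for n in range(8):
    arg = 2 * math.pi * n / 8
    fx, fy = real_field(Ex, Ey, arg)
    magnitudes.append(math.hypot(fx, fy))
```

Every entry of `magnitudes` should equal the amplitude `Ex` to within rounding, which is the signature of circular polarization.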

If we view a circularly polarized light field throughout space at a frozen instant in time (as shown in Fig. 6.2), the electric field vector spirals as we move along the z-dimension. If the sense of the spiral (with time frozen) matches that of a common wood screw oriented along the z-axis, the polarization is called right handed. (It makes no difference whether the screw is flipped end for end.) If instead the field spirals in the opposite sense, then the polarization is called left handed. The field shown at the right side of Fig. 6.2 is an example of left-handed circularly polarized light.
An equivalent way to view the handedness convention is to imagine the light
impinging on a screen as a function of time. The field of a right-handed circularly
polarized wave rotates counterclockwise at the screen, when looking along the k
direction (towards the front side of the screen). The field rotates clockwise for a
left-handed circularly polarized wave.
Linearly polarized light can become circularly or, in general, elliptically po-
larized after reflection from a metal surface if the incident light has both s- and
p-polarized components. A good experimentalist working with light needs to
know this. Reflections from multilayer dielectric mirrors can also exhibit these
phase shifts.
6.2 Jones Vectors for Representing Polarization

In 1941, R. Clark Jones introduced a two-dimensional matrix algebra that is useful
for keeping track of light polarization and the effects of optical elements that
influence polarization. The algebra deals with light having a definite polarization,
such as plane waves. It does not apply to unpolarized or partially polarized light
(e.g. sunlight). For partially polarized light, a four-dimensional algebra known as
Stokes calculus is used (see Appendix 6.B).
In preparation for introducing Jones vectors, we explicitly write the complex phases of the field components in (6.2) as
$$\mathbf{E}(z, t) = \left( |E_x| e^{i\phi_x} \hat{\mathbf{x}} + |E_y| e^{i\phi_y} \hat{\mathbf{y}} \right) e^{i(kz - \omega t)} \tag{6.4}$$
and then factor (6.4) as follows:
$$\mathbf{E}(z, t) = E_{\text{eff}} \left( A \hat{\mathbf{x}} + B e^{i\delta} \hat{\mathbf{y}} \right) e^{i(kz - \omega t)} \tag{6.5}$$
where
$$E_{\text{eff}} \equiv \sqrt{|E_x|^2 + |E_y|^2}\; e^{i\phi_x} \tag{6.6}$$
$$A \equiv \frac{|E_x|}{\sqrt{|E_x|^2 + |E_y|^2}} \tag{6.7}$$
$$B \equiv \frac{|E_y|}{\sqrt{|E_x|^2 + |E_y|^2}} \tag{6.8}$$
$$\delta \equiv \phi_y - \phi_x \tag{6.9}$$
R. Clark Jones (1916–2004, American) was born in Toledo, Ohio. He was one of six high school seniors to receive a Harvard College National Prize Fellowship. He earned both his undergraduate (summa cum laude 1938) and Ph.D. degrees from Harvard (1941). After working several years at Bell Labs, he spent most of his professional career at Polaroid Corporation in Cambridge MA, until his retirement in 1982. He is well-known for a series of papers on polarization published during the period 1941-1956. He also contributed greatly to the development of infrared detectors. He was an avid train enthusiast, and even wrote papers on railway engineering.
Please notice that $A$ and $B$ are real non-negative dimensionless numbers that satisfy $A^2 + B^2 = 1$. If $E_y$ is zero, then $B = 0$ and everything is well-defined. On the other hand, if $E_x$ happens to be zero, then its phase $e^{i\phi_x}$ is indeterminate. In this case we let $E_{\text{eff}} = |E_y| e^{i\phi_y}$, $B = 1$, and $\delta = 0$.
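The factoring in (6.5)–(6.9), including the special case where $E_x$ vanishes, can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `jones_parameters` is hypothetical, and two choices are assumptions rather than statements from the text, namely wrapping δ into the range 0 to 2π and setting δ = 0 when $E_y$ vanishes (where δ is irrelevant because B = 0).

```python
import cmath
import math

def jones_parameters(Ex, Ey):
    """Return (E_eff, A, B, delta) as defined in (6.6)-(6.9) for complex Ex, Ey."""
    norm = math.sqrt(abs(Ex) ** 2 + abs(Ey) ** 2)
    if norm == 0:
        raise ValueError("field is zero; polarization is undefined")
    if abs(Ex) == 0:
        # Special case from the text: the phase of E_x is indeterminate,
        # so E_eff = |E_y| exp(i phi_y), B = 1, and delta = 0.
        return complex(Ey), 0.0, 1.0, 0.0
    A = abs(Ex) / norm
    B = abs(Ey) / norm
    E_eff = norm * cmath.exp(1j * cmath.phase(Ex))
    if abs(Ey) == 0:
        delta = 0.0  # assumed convention; delta has no effect when B = 0
    else:
        # Assumed convention: wrap delta into [0, 2*pi).
        delta = (cmath.phase(Ey) - cmath.phase(Ex)) % (2 * math.pi)
    return E_eff, A, B, delta

# Example: E_y = i E_x gives A = B = 1/sqrt(2) and delta = pi/2.
E_eff, A, B, delta = jones_parameters(1.0, 1j)
```

Multiplying back, `E_eff * A` and `E_eff * B * exp(i*delta)` should reproduce the original `Ex` and `Ey`.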
The overall field strength $E_{\text{eff}}$ is often unimportant in a discussion of polarization. It represents the strength of an effective linearly polarized field that would give the same intensity that (6.4) would yield. Specifically, from (6.5) and (2.62) we have
$$I = \langle S \rangle_t = \frac{1}{2} n c \epsilon_0 \mathbf{E} \cdot \mathbf{E}^* = \frac{1}{2} n c \epsilon_0 |E_{\text{eff}}|^2 \tag{6.10}$$
The phase of $E_{\text{eff}}$ represents an overall phase shift that one can trivially adjust by physically moving the light source (a laser, say) forward or backward by a fraction of a wavelength.
The portion of (6.5) that is relevant to our discussion of polarization is the vector $A \hat{\mathbf{x}} + B e^{i\delta} \hat{\mathbf{y}}$, referred to as the Jones vector. This vector contains the essential information regarding field polarization. Notice that the Jones vector is a kind of unit vector, in that $(A \hat{\mathbf{x}} + B e^{i\delta} \hat{\mathbf{y}}) \cdot (A \hat{\mathbf{x}} + B e^{i\delta} \hat{\mathbf{y}})^* = 1$. (The asterisk represents the complex conjugate.) When writing a Jones vector we dispense with the $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ notation and organize the components into a column vector (for later use in matrix algebra) as follows:
$$\begin{bmatrix} A \\ B e^{i\delta} \end{bmatrix} \tag{6.11}$$
This vector can describe the polarization state of any plane wave field. Table 6.1 lists some Jones vectors representing various polarization states.

Table 6.1 Jones vectors for several common polarization states.
Linearly polarized along x: $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$
Linearly polarized along y: $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$
Linearly polarized at angle α (measured from the x-axis): $\begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix}$
Right circularly polarized: $\dfrac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -i \end{bmatrix}$
Left circularly polarized: $\dfrac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ i \end{bmatrix}$

6.3 Elliptically Polarized Light

In general, the Jones vector (6.11) represents a polarization state in between linear and circular. This ‘in-between’ state is known as elliptically polarized light. As the wave travels, the field vector makes a spiral motion. If we observe the field vector at a point as the field goes by, the field vector traces out an ellipse oriented perpendicular to the direction of travel (i.e. in the x–y plane). One of the axes of the ellipse occurs at the angle
$$\alpha = \frac{1}{2} \tan^{-1}\left( \frac{2AB \cos\delta}{A^2 - B^2} \right) \tag{6.12}$$
with respect to the x-axis (see P6.8). This angle sometimes corresponds to the minor axis and sometimes to the major axis of the ellipse, depending on the exact values of $A$, $B$, and $\delta$. The other axis of the ellipse (major or minor) then occurs at $\alpha \pm \pi/2$ (see Fig. 6.3).
We can deduce whether (6.12) corresponds to the major or minor axis of the ellipse by comparing the strength of the electric field when it spirals through the direction specified by $\alpha$ and when it spirals through $\alpha \pm \pi/2$. The strength of the electric field at $\alpha$ is given by (see P6.8)
$$E_\alpha = |E_{\text{eff}}| \sqrt{A^2 \cos^2\alpha + B^2 \sin^2\alpha + AB \cos\delta \sin 2\alpha} \qquad (E_{\max} \text{ or } E_{\min}) \tag{6.13}$$
and the strength of the field when it spirals through the orthogonal direction ($\alpha \pm \pi/2$) is given by
$$E_{\alpha \pm \pi/2} = |E_{\text{eff}}| \sqrt{A^2 \sin^2\alpha + B^2 \cos^2\alpha - AB \cos\delta \sin 2\alpha} \qquad (E_{\max} \text{ or } E_{\min}) \tag{6.14}$$
After computing (6.13) and (6.14), we decide which represents $E_{\min}$ and which $E_{\max}$ according to
$$E_{\max} \geq E_{\min} \tag{6.15}$$
We could predict in advance which of (6.13) or (6.14) corresponds to the major axis and which corresponds to the minor axis. However, making this prediction is as complicated as simply evaluating (6.13) and (6.14) and determining which is greater.
Elliptically polarized light is often characterized by the ratio of the minor axis to the major axis. This ratio is called the ellipticity, which is a dimensionless number:
$$e \equiv \frac{E_{\min}}{E_{\max}} \tag{6.16}$$
The ellipticity $e$ ranges between zero (corresponding to linearly polarized light) and one (corresponding to circularly polarized light). Finally, the helicity or handedness of elliptically polarized light is as follows (see P6.2):
$$0 < \delta < \pi \;\rightarrow\; \text{left-handed helicity} \tag{6.17}$$
$$\pi < \delta < 2\pi \;\rightarrow\; \text{right-handed helicity} \tag{6.18}$$
Figure 6.3 The electric field of elliptically polarized light traces an ellipse in the plane perpendicular to its propagation direction. The two plots are for different values of $A$, $B$, and $\delta$. The angle $\alpha$ can describe the major axis (left figure) or the minor axis (right figure), depending on the values of these parameters.
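The recipe in (6.12)–(6.18) translates directly into a short calculation. The Python sketch below is illustrative only: the function name `ellipse_parameters` is hypothetical, field strengths are reported in units of $|E_{\text{eff}}|$, and the handling of the case $A = B$ (where the argument of the arctangent diverges) is an assumption made here rather than something the text specifies.

```python
import math

def ellipse_parameters(A, B, delta):
    """Return (alpha, E_alpha, E_perp, ellipticity, helicity) from (6.12)-(6.18).

    Field strengths are in units of |E_eff|.
    """
    num = 2 * A * B * math.cos(delta)
    den = A ** 2 - B ** 2
    if den != 0:
        alpha = 0.5 * math.atan(num / den)           # eq. (6.12)
    else:
        # Assumed handling of A = B: arctan(+/- infinity) = +/- pi/2.
        alpha = math.copysign(math.pi / 4, num)
    cross = A * B * math.cos(delta) * math.sin(2 * alpha)
    # max(0, ...) guards against tiny negative values caused by rounding.
    E_alpha = math.sqrt(max(0.0, (A * math.cos(alpha)) ** 2
                            + (B * math.sin(alpha)) ** 2 + cross))   # (6.13)
    E_perp = math.sqrt(max(0.0, (A * math.sin(alpha)) ** 2
                           + (B * math.cos(alpha)) ** 2 - cross))    # (6.14)
    E_max, E_min = max(E_alpha, E_perp), min(E_alpha, E_perp)        # (6.15)
    ellipticity = E_min / E_max                                      # (6.16)
    d = delta % (2 * math.pi)
    if A == 0 or B == 0:
        helicity = "linear"
    elif 0 < d < math.pi:
        helicity = "left"                                            # (6.17)
    elif math.pi < d < 2 * math.pi:
        helicity = "right"                                           # (6.18)
    else:
        helicity = "linear"
    return alpha, E_alpha, E_perp, ellipticity, helicity

# Example: A = 0.8, B = 0.6, delta = pi/2 describes an ellipse aligned with the axes.
alpha, E_alpha, E_perp, e, helicity = ellipse_parameters(0.8, 0.6, math.pi / 2)
```

Comparing `E_alpha` with `E_perp` tells whether `alpha` marks the major or the minor axis, which is the comparison the text recommends over trying to predict the answer in advance.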
6.4 Linear Polarizers and Jones Matrices
In 1928, Edwin Land invented Polaroid at the age of nineteen. He did it by stretch-
ing a polymer sheet and infusing it with iodine. The stretching causes the polymer
chains to align along a common direction, whereupon the sheet is cemented to
a substrate. The infusion of iodine causes the individual chains to become con-
ductive. When light impinges upon the Polaroid sheet, the component of electric
field that is parallel to the polymer chains causes a current Jfree to oscillate in
that dimension. The resistance to the current quickly dissipates the energy (i.e.
the refractive index is complex) and the light is absorbed. The thickness of the
Polaroid sheet is chosen sufficiently large to ensure that virtually none of the light
with electric field component oscillating along the chains makes it through the
device.
The component of electric field that is orthogonal to the polymer chains encounters electrons that are essentially bound, unable to leave their polymer chains. For this polarization component, the wave passes through the material like it does through typical dielectrics such as glass (i.e. the refractive index is real). Today, there are a wide variety of technologies for making polarizers, many very different from Polaroid.
Figure 6.4 Light transmitting through a Polaroid sheet. The conducting polymer chains run vertically in this drawing, and light polarized along the chains is absorbed. Light polarized perpendicular to the polymer chains passes through the polarizer.
A polarizer can be represented as a 2 × 2 matrix that operates on Jones vectors. The function of a polarizer is to pass only the component of electric field that is oriented along the polarizer transmission axis. Thus, if a polarizer is oriented with its transmission axis along the x-dimension, then only the x-component of polarization transmits; the y-component is killed. If the polarizer is oriented with its transmission axis along the y-dimension, then only the y-component of the field transmits, and the x-component is killed. These two scenarios can be represented with the following Jones matrices:
$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \qquad \text{(polarizer with transmission along x-axis)} \tag{6.19}$$
$$\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \qquad \text{(polarizer with transmission along y-axis)} \tag{6.20}$$
These matrices operate on any Jones vector representing the polarization of
incident light. The result gives the Jones vector for the light exiting the polarizer.
Example 6.1
Use the Jones matrix (6.19) to calculate the effect of a horizontal polarizer on light that is initially horizontally polarized, vertically polarized, and arbitrarily polarized.
Solution: First we consider a horizontally polarized plane wave traversing a polarizer with its transmission axis also oriented horizontally (x-dimension):
$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \qquad \text{(horizontal polarizer on horizontally polarized field)}$$
As expected, the polarization state is unaffected by the polarizer (ignoring small surface reflections).
Now consider vertically polarized light traversing the same horizontal polarizer. In this case, we have:
$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad \text{(horizontal polarizer on vertical linear polarization)}$$
As expected, the polarizer extinguishes the light.
Finally, when a horizontally oriented polarizer operates on light with an arbitrary Jones vector (6.11), we have
$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} A \\ B e^{i\delta} \end{bmatrix} = \begin{bmatrix} A \\ 0 \end{bmatrix} \qquad \text{(horizontal polarizer on arbitrary polarization)}$$
Only the horizontal component of polarization is transmitted through the polarizer.
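The three products in Example 6.1 can be reproduced with a few lines of Python. This is a minimal sketch for illustration: the helper `apply_jones` and the constant names are hypothetical, and left circularly polarized light from Table 6.1 stands in for the arbitrary input.

```python
def apply_jones(J, v):
    """Multiply a 2x2 Jones matrix J (nested lists) by a Jones vector v."""
    return [J[0][0] * v[0] + J[0][1] * v[1],
            J[1][0] * v[0] + J[1][1] * v[1]]

H_POLARIZER = [[1, 0], [0, 0]]   # eq. (6.19)
V_POLARIZER = [[0, 0], [0, 1]]   # eq. (6.20)

horizontal = [1, 0]
vertical = [0, 1]
left_circular = [2 ** -0.5, 1j * 2 ** -0.5]   # from Table 6.1

out_h = apply_jones(H_POLARIZER, horizontal)      # passes unchanged
out_v = apply_jones(H_POLARIZER, vertical)        # extinguished
out_c = apply_jones(H_POLARIZER, left_circular)   # only the x-component survives
```

Swapping in `V_POLARIZER` gives the complementary results, with only the y-component surviving.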

While students will readily agree that the matrices given in (6.19) and (6.20)
can be used to get the right result for light traversing a horizontal or a vertical
polarizer, the real advantage of the matrix formulation has yet to be demonstrated.
In the next few sections we will derive Jones matrices for a number of optical
elements that can modify polarization: polarizers at arbitrary angle, wave plates
at arbitrary angle, and reflection or transmission at an interface. Table 6.2 shows
Jones matrices for each of these devices. Before deriving these specific Jones ma-
trices, however, we take a moment to appreciate why the Jones matrix formulation
is useful.
The real power of the formalism becomes clear as we consider situations where light encounters multiple polarization elements in sequence. In these situations, we use a product of Jones matrices to represent the effect of the compound system. We can represent this situation by
$$\begin{bmatrix} A' \\ B' \end{bmatrix} = \mathbf{J}_{\text{system}} \begin{bmatrix} A \\ B e^{i\delta} \end{bmatrix} \tag{6.21}$$
where the unprimed Jones vector represents light going into the system and the primed Jones vector represents light emerging from the system. In general, $A'$ and $B'$ will turn out to be complex. However, if desired they can be changed into the usual form by writing
$$\begin{bmatrix} A' \\ B' \end{bmatrix} = e^{i\phi_{A'}} \begin{bmatrix} |A'| \\ |B'| e^{i\delta'} \end{bmatrix}$$
where $\phi_{A'}$ is an unimportant overall phase, and $\delta'$ is the phase difference between $B'$ and $A'$.
The matrix $\mathbf{J}_{\text{system}}$ is a Jones matrix formed by the series of polarization devices. If there are $N$ devices in the system, the compound matrix is calculated as
$$\mathbf{J}_{\text{system}} \equiv \mathbf{J}_N \mathbf{J}_{N-1} \cdots \mathbf{J}_2 \mathbf{J}_1 \tag{6.22}$$
where $\mathbf{J}_n$ is the matrix for the $n$th polarizing optical element encountered in the system. Notice that the matrices operate on the Jones vector in the order that the light encounters the devices. Therefore, the matrix for the first device ($\mathbf{J}_1$) is written on the right, and so on until the last device encountered, which is written on the left, farthest from the Jones vector.
When part of the light is absorbed by passing through one or more polarizers in a system, the Jones vector of the exiting light is no longer normalized to magnitude one. Since the components of a Jones vector represent the electric field, we find the factor by which the intensity of the light decreases by dotting the vector with its complex conjugate. In accordance with (6.10), the intensity of the exiting light is
$$\begin{aligned} I &= \frac{1}{2} n c \epsilon_0 |E_{\text{eff}}|^2 \left( A' \hat{\mathbf{x}} + B' e^{i\delta'} \hat{\mathbf{y}} \right) \cdot \left( A' \hat{\mathbf{x}} + B' e^{i\delta'} \hat{\mathbf{y}} \right)^* \\ &= \frac{1}{2} n c \epsilon_0 |E_{\text{eff}}|^2 \left( |A'|^2 + |B'|^2 \right) \end{aligned} \tag{6.23}$$
Notice that the intensity is attenuated by the factor $|A'|^2 + |B'|^2$ after propagating through the system. Recall that $E_{\text{eff}}$ represents the effective strength of the field before it enters the polarizer (or other device), so that the initial Jones vector is normalized to one (see (6.10)). By convention we normally remove an overall phase factor from the Jones vector so that $A'$ is real and non-negative, and we choose $\delta'$ so that $B'$ is real and non-negative. However, if we don’t bother doing this, the absolute value signs on $A'$ and $B'$ in (6.23) ensure that we get the correct value for intensity.

Table 6.2 Summary of Jones Matrices. The variable θ is measured with respect to the x-axis and specifies the transmission axis for the linear polarizer and the fast axis for the wave plates.
Linear polarizer: $\begin{bmatrix} \cos^2\theta & \sin\theta \cos\theta \\ \sin\theta \cos\theta & \sin^2\theta \end{bmatrix}$
Half wave plate: $\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}$
Quarter wave plate: $\begin{bmatrix} \cos^2\theta + i \sin^2\theta & (1 - i) \sin\theta \cos\theta \\ (1 - i) \sin\theta \cos\theta & \sin^2\theta + i \cos^2\theta \end{bmatrix}$
Right circular polarizer: $\dfrac{1}{2} \begin{bmatrix} 1 & i \\ -i & 1 \end{bmatrix}$
Left circular polarizer: $\dfrac{1}{2} \begin{bmatrix} 1 & -i \\ i & 1 \end{bmatrix}$
Reflection from an interface: $\begin{bmatrix} -r_p & 0 \\ 0 & r_s \end{bmatrix}$
Transmission through an interface: $\begin{bmatrix} t_p & 0 \\ 0 & t_s \end{bmatrix}$
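The ordering rule in (6.22) and the attenuation factor in (6.23) can be illustrated with a short Python sketch. All function names here are hypothetical, and the two-polarizer arrangement is an assumed example chosen to show that the order of the matrices matters.

```python
import math
from functools import reduce

def matmul(J2, J1):
    """2x2 matrix product J2 J1 (J1 acts on the light first)."""
    return [[sum(J2[r][k] * J1[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def system_matrix(elements):
    """Compound matrix (6.22); `elements` lists matrices in the order light meets them."""
    return reduce(lambda acc, J: matmul(J, acc), elements)

def apply_jones(J, v):
    """Multiply a 2x2 Jones matrix J by a Jones vector v."""
    return [J[0][0] * v[0] + J[0][1] * v[1],
            J[1][0] * v[0] + J[1][1] * v[1]]

def intensity_factor(v):
    """Attenuation factor |A'|^2 + |B'|^2 from (6.23)."""
    return abs(v[0]) ** 2 + abs(v[1]) ** 2

def linear_polarizer(theta):
    """Linear polarizer with transmission axis at angle theta (Table 6.2)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, s * c], [s * c, s * s]]

# Horizontally polarized light meets a 45-degree polarizer, then a vertical one.
J = system_matrix([linear_polarizer(math.pi / 4), linear_polarizer(math.pi / 2)])
out = apply_jones(J, [1, 0])
factor = intensity_factor(out)
```

Listing the same two polarizers in the opposite order blocks the light entirely, since the vertical polarizer then acts on horizontally polarized light first.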
6.5 Jones Matrix for Polarizers at Arbitrary Angles

In this section we develop a Jones matrix for describing an ideal polarizer with its transmission axis at an arbitrary angle θ from the x-axis. We will do this in a general context so that we can take advantage of present work when discussing wave plates at a later point. To help keep things on a more conceptual level, we revert back to using electric field components directly. We will make the connection with Jones calculus at the end.
Figure 6.5 Light transmitting through a polarizer oriented with transmission axis at angle θ from x-axis.
The polarizer acts on a plane wave with arbitrary polarization. The electric field of our plane wave may be written as
$$\mathbf{E}(z, t) = \left( E_x \hat{\mathbf{x}} + E_y \hat{\mathbf{y}} \right) e^{i(kz - \omega t)} \tag{6.24}$$
Let the transmission axis of the polarizer be specified by the unit vector $\hat{\mathbf{e}}_1$ and the absorption axis of the polarizer be specified by $\hat{\mathbf{e}}_2$ (orthogonal to the transmission axis). The vector $\hat{\mathbf{e}}_1$ is oriented at an angle θ from the x-axis. We need to write the electric field components in terms of the new basis specified by the unit vectors $\hat{\mathbf{e}}_1$ and $\hat{\mathbf{e}}_2$ as shown in Fig. 6.6. By inspection of the geometry, the x-y unit vectors are connected to the new coordinate system via:
$$\begin{aligned} \hat{\mathbf{x}} &= \cos\theta\, \hat{\mathbf{e}}_1 - \sin\theta\, \hat{\mathbf{e}}_2 \\ \hat{\mathbf{y}} &= \sin\theta\, \hat{\mathbf{e}}_1 + \cos\theta\, \hat{\mathbf{e}}_2 \end{aligned} \tag{6.25}$$
Substitution of (6.25) into (6.24) yields for the electric field
$$\mathbf{E}(z, t) = \left( E_1 \hat{\mathbf{e}}_1 + E_2 \hat{\mathbf{e}}_2 \right) e^{i(kz - \omega t)} \tag{6.26}$$
where
$$\begin{aligned} E_1 &\equiv E_x \cos\theta + E_y \sin\theta \\ E_2 &\equiv -E_x \sin\theta + E_y \cos\theta \end{aligned} \tag{6.27}$$
Figure 6.6 Electric field components written in the $\hat{\mathbf{e}}_1$–$\hat{\mathbf{e}}_2$ basis.
Now we introduce the effect of the polarizer on the field: $E_1$ is transmitted unaffected, while $E_2$ is extinguished. To account for the effect of the device, we multiply $E_2$ by a parameter ξ. In the case of the polarizer, ξ is simply zero, but when we consider wave plates we will use other values for ξ. After traversing the polarizer, the field becomes
$$\mathbf{E}_{\text{after}}(z, t) = \left( E_1 \hat{\mathbf{e}}_1 + \xi E_2 \hat{\mathbf{e}}_2 \right) e^{i(kz - \omega t)} \tag{6.28}$$

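We now have the field after the polarizer, but it would be nice to rewrite it in terms of the original x–y basis. By inverting (6.25), or by inspection of Fig. 6.5, if preferred, we see that
$$\begin{aligned} \hat{\mathbf{e}}_1 &= \cos\theta\, \hat{\mathbf{x}} + \sin\theta\, \hat{\mathbf{y}} \\ \hat{\mathbf{e}}_2 &= -\sin\theta\, \hat{\mathbf{x}} + \cos\theta\, \hat{\mathbf{y}} \end{aligned} \tag{6.29}$$
Substitution of these relationships into (6.28) together with the definitions (6.27) for $E_1$ and $E_2$ yields
$$\begin{aligned} \mathbf{E}_{\text{after}}(z, t) &= \left[ \left( E_x \cos\theta + E_y \sin\theta \right) \left( \cos\theta\, \hat{\mathbf{x}} + \sin\theta\, \hat{\mathbf{y}} \right) + \xi \left( -E_x \sin\theta + E_y \cos\theta \right) \left( -\sin\theta\, \hat{\mathbf{x}} + \cos\theta\, \hat{\mathbf{y}} \right) \right] e^{i(kz - \omega t)} \\ &= \left[ E_x \left( \cos^2\theta + \xi \sin^2\theta \right) + E_y \left( \sin\theta \cos\theta - \xi \sin\theta \cos\theta \right) \right] \hat{\mathbf{x}}\, e^{i(kz - \omega t)} \\ &\quad + \left[ E_x \left( \sin\theta \cos\theta - \xi \sin\theta \cos\theta \right) + E_y \left( \sin^2\theta + \xi \cos^2\theta \right) \right] \hat{\mathbf{y}}\, e^{i(kz - \omega t)} \end{aligned} \tag{6.30}$$
Notice that if ξ = 1 (i.e. no polarizer), then we get back exactly what we started with (i.e. (6.30) reduces to (6.24)).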
To get to the Jones matrix for the polarizer, we note that (6.30) is a linear mixture of $E_x$ and $E_y$ which can be represented with matrix algebra. If we represent the electric field as a two dimensional column vector with its x-component in the top and its y-component in the bottom (like a Jones vector), then we can rewrite (6.30) as
$$\mathbf{E}_{\text{after}}(z, t) = \begin{bmatrix} \cos^2\theta + \xi \sin^2\theta & \sin\theta \cos\theta - \xi \sin\theta \cos\theta \\ \sin\theta \cos\theta - \xi \sin\theta \cos\theta & \sin^2\theta + \xi \cos^2\theta \end{bmatrix} \begin{bmatrix} E_x \\ E_y \end{bmatrix} e^{i(kz - \omega t)} \tag{6.31}$$
The matrix here is a proper Jones matrix, although we did not bother factoring out $E_{\text{eff}}$ to make a properly normalized Jones vector, as specified in (6.5). We can now write down the Jones matrix for a polarizer by simply inserting ξ = 0 into the matrix:
$$\begin{bmatrix} \cos^2\theta & \sin\theta \cos\theta \\ \sin\theta \cos\theta & \sin^2\theta \end{bmatrix} \qquad \text{(polarizer with transmission axis at angle } \theta \text{)} \tag{6.32}$$
Notice that when θ = 0 this matrix reduces to that of a horizontal polarizer (6.19), and when θ = π/2, it reduces to that of a vertical polarizer (6.20).
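The limiting cases mentioned above are easy to confirm numerically. The following Python sketch builds the matrix in (6.31) for a given θ and ξ; the function name `xi_matrix` is a hypothetical label introduced here for illustration.

```python
import math

def xi_matrix(theta, xi):
    """Matrix from (6.31): the e1 component passes unchanged, the e2 component is scaled by xi."""
    c, s = math.cos(theta), math.sin(theta)
    off = s * c - xi * s * c
    return [[c * c + xi * s * s, off],
            [off, s * s + xi * c * c]]

identity = xi_matrix(0.7, 1)               # xi = 1: no device, light is unchanged
horizontal = xi_matrix(0.0, 0)             # theta = 0: horizontal polarizer (6.19)
vertical = xi_matrix(math.pi / 2, 0)       # theta = pi/2: vertical polarizer (6.20)
```

Because `math.cos(math.pi / 2)` is not exactly zero in floating point, `vertical` matches (6.20) only to within rounding error.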
6.6 Jones Matrices for Wave Plates

We next consider wave plates (or retarders), which are usually made from birefringent crystals. The index of refraction in the crystal depends on the orientation of the electric field polarization. A wave plate has the appearance of a thin window through which the light passes. The crystal is cut such that the wave plate has a fast and a slow axis, which are 90° apart in the plane of the window. If the light is polarized along the fast axis, it experiences index $n_{\text{fast}}$. The orthogonal polarization component experiences higher index $n_{\text{slow}}$.
When a plane wave passes through a wave plate, the component of the electric field oriented along the fast axis travels faster than its orthogonal counterpart. The fast component gets ahead, and this introduces a relative phase between the two polarization components. The wave vectors associated with the individual electric field components within the wave plate are given by
$$k_{\text{slow}} = \frac{2\pi n_{\text{slow}}}{\lambda_{\text{vac}}} \quad \text{and} \quad k_{\text{fast}} = \frac{2\pi n_{\text{fast}}}{\lambda_{\text{vac}}} \tag{6.33}$$
As light passes through a wave plate of thickness $d$, the phase difference that accumulates between the fast and the slow polarization components is
$$k_{\text{slow}} d - k_{\text{fast}} d = \frac{2\pi d}{\lambda_{\text{vac}}} \left( n_{\text{slow}} - n_{\text{fast}} \right) \tag{6.34}$$
By adjusting the thickness of the wave plate, one can introduce any desired phase difference.
Figure 6.7 Wave plate interacting with a plane wave. The transmitted polarization components have altered relative phase.
The most common types of wave plates are the quarter-wave plate and the half-wave plate. The quarter-wave plate introduces a phase difference of
$$k_{\text{slow}} d - k_{\text{fast}} d = \pi/2 + 2\pi m \qquad \text{(quarter-wave plate)} \tag{6.35}$$
between the two polarization components, where $m$ is an integer. This means that the polarization component along the slow axis is delayed spatially by a quarter wavelength (or five quarters, etc.).
The half-wave plate introduces a phase difference of
$$k_{\text{slow}} d - k_{\text{fast}} d = \pi + 2\pi m \qquad \text{(half-wave plate)} \tag{6.36}$$
where $m$ is an integer. This means that the polarization component along the slow axis is delayed spatially by a half wavelength (or three halves, etc.). When $m = 0$ in either (6.35) or (6.36), the wave plate is said to be zero order.
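To get a feel for the thicknesses involved, (6.34) can be solved for $d$ given the required phase difference in (6.35) or (6.36). The Python sketch below is illustrative only: the function name `wave_plate_thickness` is hypothetical, and the indices and wavelength are assumed round numbers chosen for the example, not values taken from the text.

```python
import math

def wave_plate_thickness(phase, n_slow, n_fast, wavelength_vac, m=0):
    """Thickness d satisfying k_slow d - k_fast d = phase + 2 pi m, from (6.34)."""
    return (phase + 2 * math.pi * m) * wavelength_vac / (2 * math.pi * (n_slow - n_fast))

# Illustrative numbers only (assumed, not from the text).
n_slow, n_fast = 1.553, 1.544
wavelength = 633e-9  # vacuum wavelength in metres

d_quarter = wave_plate_thickness(math.pi / 2, n_slow, n_fast, wavelength)   # (6.35), m = 0
d_half = wave_plate_thickness(math.pi, n_slow, n_fast, wavelength)          # (6.36), m = 0
d_quarter_m1 = wave_plate_thickness(math.pi / 2, n_slow, n_fast, wavelength, m=1)
```

With an index difference this small, the zero-order quarter-wave thickness comes out to a few tens of microns, which suggests why plates with $m > 0$ can be easier to fabricate and handle.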
The derivation of the Jones matrix for the two wave plates is essentially the
same as the derivation for the polarizer in the previous section. Let ê1 correspond
to the fast axis, and let ê2 correspond to the slow axis, as illustrated in Fig. 6.7. We
proceed as before. However, instead of setting ξ equal to zero in (6.31), we must
choose values for ξ appropriate for each wave plate. Since nothing is absorbed,
ξ should have a magnitude equal to one. The important feature is the phase of
ξ. As seen in (6.34), the field component along the slow axis accumulates excess
phase relative to the component along the fast axis, and we let ξ account for this.
In the case of the quarter-wave plate, the appropriate factor from (6.35) is
$$\xi = e^{i\pi/2} = i \quad\text{(quarter-wave plate)} \tag{6.37}$$
This describes a relative phase delay for the light emerging with polarization along
the slow axis. Substituting (6.37) into (6.30) yields the Jones matrix for a quarter
wave plate:
$$\begin{bmatrix} \cos^2\theta + i\sin^2\theta & \sin\theta\cos\theta - i\sin\theta\cos\theta \\ \sin\theta\cos\theta - i\sin\theta\cos\theta & \sin^2\theta + i\cos^2\theta \end{bmatrix} \quad\text{(quarter-wave plate Jones matrix)} \tag{6.38}$$
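For the half-wave plate, the appropriate factor applied to the slow axis is
$$\xi = e^{i\pi} = -1 \quad\text{(half-wave plate)} \tag{6.39}$$
and the Jones matrix becomes:
$$\begin{bmatrix} \cos^2\theta - \sin^2\theta & 2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \sin^2\theta - \cos^2\theta \end{bmatrix} = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix} \quad\text{(half-wave plate Jones matrix)} \tag{6.40}$$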
Remember that θ refers to the angle that the fast axis makes with respect to the
x-axis.
Before moving on, consider the following two examples that illustrate how
wave plates are often used:
Example 6.2
Calculate the Jones matrix for a quarter wave plate at θ = 45◦ , and calculate its
effect on horizontally polarized light.
Solution: At θ = 45°, the Jones matrix for the quarter-wave plate (6.38) reduces to
$$\frac{e^{i\pi/4}}{\sqrt{2}} \begin{bmatrix} 1 & -i \\ -i & 1 \end{bmatrix} \quad\text{(quarter-wave plate, fast axis at } \theta = 45^\circ) \tag{6.41}$$
The overall phase factor $e^{i\pi/4}$ in front is not important since it merely accompanies
the overall phase of the beam, which can be adjusted arbitrarily by moving the
light source forwards or backwards through a fraction of a wavelength.
Now we calculate the effect of the quarter-wave plate (oriented at θ = 45°) operating
on horizontally polarized light:
$$\frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -i \\ -i & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -i \end{bmatrix} \tag{6.42}$$
Figure 6.8 Animation showing effects of polarizers and wave plates on polarized light.
The previous example shows that a quarter-wave plate (properly oriented) can
turn linearly polarized light into right-circularly polarized light (see Table 6.1).
On the other hand, as seen in the next example, a half wave plate can rotate the
polarization angle of linearly polarized light by varying degrees while preserving
the linear polarization.
Example 6.3
Calculate the effect of a half wave plate at an arbitrary θ on horizontally polarized
light.
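Solution: Carrying out the multiplication, we obtain
$$\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \cos 2\theta \\ \sin 2\theta \end{bmatrix} \tag{6.43}$$
The resulting Jones vector describes linearly polarized light at an angle of α = 2θ from
the x-axis (see Table 6.1).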
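The two examples above can be checked numerically. The sketch below is only an illustration: it assumes the general wave-plate matrix has the form implied by (6.38) and (6.40), with a phase factor ξ multiplying the slow-axis contribution, and the helper names `wave_plate` and `apply` are hypothetical.

```python
import cmath
import math

def wave_plate(theta, xi):
    """Jones matrix of a wave plate with fast axis at angle theta (radians).

    xi is the phase factor applied to the slow-axis component:
    xi = 1j reproduces (6.38) and xi = -1 reproduces (6.40).
    """
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c + xi * s * s, s * c - xi * s * c],
            [s * c - xi * s * c, s * s + xi * c * c]]

def apply(matrix, vector):
    """Multiply a 2x2 Jones matrix onto a two-component Jones vector."""
    (a, b), (c, d) = matrix
    x, y = vector
    return [a * x + b * y, c * x + d * y]

horizontal = [1, 0]

# Example 6.2: quarter-wave plate at 45 degrees; strip the overall phase exp(i*pi/4).
out_qwp = apply(wave_plate(math.pi / 4, 1j), horizontal)
out_qwp = [component * cmath.exp(-1j * math.pi / 4) for component in out_qwp]

# Example 6.3: half-wave plate at an arbitrary angle (20 degrees chosen for illustration).
theta = math.radians(20)
out_hwp = apply(wave_plate(theta, -1), horizontal)

print(out_qwp)  # compare with (1/sqrt(2)) * [1, -i]
print(out_hwp)  # compare with [cos(2*theta), sin(2*theta)]
```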

6.7 Polarization Effects of Reflection and Transmission

When light encounters a material interface, the amount of reflected and trans-
mitted light depends on the polarization. The Fresnel coefficients (3.18)–(3.21)
dictate how much of each polarization is reflected and how much is transmitted.
In addition, the Fresnel coefficients keep track of phases intrinsic in the reflec-
tion phenomenon. This is true also for reflections from multilayer coatings with
effective Fresnel coefficients (4.64), (4.65), (4.69) and (4.70).
To the extent that the s and p components of the field behave differently,
the overall polarization state is altered. For example, a linearly-polarized field
upon reflection can become elliptically polarized (see L 6.9). Even when a wave
reflects at normal incidence so that the s and p components are indistinguishable,
right-circular polarized light becomes left-circular polarized. This is the same
effect that causes a right-handed person to appear left-handed when viewed in a
mirror.
We can use Jones calculus to keep track of how reflection and transmission
influence polarization. However, before proceeding, we emphasize that in this
context we do not strictly adhere to a single coordinate system as we did in
chapter 3, for example in Fig. 3.1. Instead, we consider each plane wave, whether
incident, reflected or transmitted, to propagate in the z-direction of its own frame,
regardless of the relative angles between the incident and reflected wave. This
loose manner of defining coordinate systems, depicted in Fig. 6.9, has a great
advantage. The x and y dimensions in each individual frame are aligned parallel
to their respective s and p field components. We will adopt the convention that
p-polarized light in all cases is associated with the x-dimension (horizontal, say).
The s-polarized component then lies along the y-dimension (vertical).
Figure 6.9 Incident, reflected and transmitted plane waves, each propagating along the z-axis of its own reference frame. (Label: x-axis directed into page.)
We are now in a position to see why there is a handedness inversion upon
reflection from a mirror. Notice in Fig. 6.9 that for the incident light, the
s-component of the field crossed into the p-component of the field yields a vector
pointing along the beam's propagation direction. However, for the reflected light,
the s-component crossed into the p-component points opposite to that beam's
propagation direction.
The Jones matrix corresponding to reflection from a surface is simply
$$\begin{bmatrix} -r_p & 0 \\ 0 & r_s \end{bmatrix} \quad\text{(Jones matrix for reflection)} \tag{6.44}$$
By convention, we place the minus sign on the coefficient r p to take care of

handedness inversion. We could put the minus sign on r s instead; the important
point is that the two polarizations acquire a relative phase differential of π when
the propagation direction flips.
The Fresnel coefficients specify the ratios of the exiting fields to the incident
ones. When (6.44) operates on an arbitrary Jones vector such as (6.11), −r p
multiplies the horizontal component of the field, and r s multiplies the vertical
component of the field. Especially in the case of reflection from an absorbing

surface such as a metal, the phases of the two polarization components can
vary markedly (see P6.11). Thus, linearly polarized light containing both s- and
p-components in general becomes elliptically polarized when reflected from a
surface. When light undergoes total internal reflection, again the phases of the s-
and p-components differ markedly, which can cause linearly polarized light to
become elliptically polarized (see P6.12).
Transmission through a material interface can also influence the polarization
of the field, although typically to a lesser degree. However, there is no handedness
inversion, since the light continues on in a forward sense. The Jones matrix for
transmission is
$$\begin{bmatrix} t_p & 0 \\ 0 & t_s \end{bmatrix} \quad\text{(Jones matrix for transmission)} \tag{6.45}$$
Figure 6.10 If the plane of incidence does not coincide for successive elements in an optical system, a rotation matrix must be applied to rotate the x-axis to the plane of incidence before computing the effect of each element. (Labels: original x-axis, original y-axis, rotated x-axis (in the plane of incidence), rotated y-axis.)
If a beam of light encounters a series of mirrors, the final polarization is
determined by multiplying the sequence of appropriate Jones matrices (6.44)
onto the initial polarization. This procedure is straightforward if the normals
to all of the mirrors lie in a single plane (say parallel to the surface of an optical
bench). However, if the beam path deviates from this plane (due to vertical
tilt on the mirrors), then we must reorient our coordinate system before each
mirror to have a new ‘horizontal’ (p-polarized dimension) and the new ‘vertical’
(s-polarized dimension). Earlier in this chapter we performed a rotation of a
coordinate system through an angle θ, described in (6.27), which is also useful
here. The rotation can be accomplished by multiplying the following matrix onto
the incident Jones vector:
$$\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \quad\text{(rotation of coordinates through an angle } \theta) \tag{6.46}$$
This is understood as a rotation about the z-axis. The angle of rotation θ is
chosen such that the rotated x-axis lies in the plane of incidence for the mirror.
When such a reorientation of coordinates is necessary, the two orthogonal field
components in the initial coordinate system are stirred together to form the
field components in the new system. This does not change the fundamental
characteristics of the polarization, just its representation.
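The matrices (6.44)–(6.46) are simple enough to try out directly. The sketch below is an illustration under stated assumptions: the reflection coefficient value is hypothetical, and the normal-incidence case is modeled by setting r_s = r_p, in keeping with the remark that the s and p components are then indistinguishable.

```python
import math

def apply(matrix, vector):
    """Multiply a 2x2 Jones matrix onto a two-component Jones vector."""
    (a, b), (c, d) = matrix
    x, y = vector
    return [a * x + b * y, c * x + d * y]

def reflection(r_p, r_s):
    """Jones matrix for reflection, Eq. (6.44)."""
    return [[-r_p, 0], [0, r_s]]

def transmission(t_p, t_s):
    """Jones matrix for transmission, Eq. (6.45)."""
    return [[t_p, 0], [0, t_s]]

def rotation(theta):
    """Rotation of coordinates through angle theta (radians), Eq. (6.46)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

# Right-circular light, (1/sqrt(2)) [1, -i], reflecting at normal incidence.
right_circular = [1 / math.sqrt(2), -1j / math.sqrt(2)]
r = -0.2  # hypothetical reflection coefficient, assumed equal for s and p
reflected = apply(reflection(r, r), right_circular)
ratio = reflected[1] / reflected[0]  # +i corresponds to left-circular light

# Linear light at angle alpha, expressed in axes rotated by the same angle,
# lies entirely along the new x-axis.
alpha = math.radians(30)
in_new_frame = apply(rotation(alpha), [math.cos(alpha), math.sin(alpha)])

print(ratio, in_new_frame)
```

The rotation changes only the representation: the sum of the squared magnitudes of the two components is the same before and after.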
Appendix 6.A Ellipsometry

Measuring the polarization of light reflected from a surface can yield information
regarding the optical constants of that surface (i.e. n and κ). As done in L 6.9, it is
possible to characterize the polarization of a beam of light using a quarter-wave
plate and a polarizer. However, we often want to know n and κ at a range of
frequencies, and this would require a different quarter-wave plate thickness d for
each wavelength used (see (6.35)). Therefore, many commercial ellipsometers do
not try to extract the helicity of the light, but only the ellipticity. In this case only
polarizers are used, which can be made to work over a wide range of wavelengths.

If in addition a variety of incident angles are measured, it is possible to extract

detailed information about the optical constants n and κ and the thicknesses of
possibly many layers of materials influencing the reflection.
Commercial ellipsometers typically employ two polarizers, one before and
one after the sample, where s and p-polarized reflections take place. The first
polarizer ensures that linearly polarized light arrives at the test surface (polarized
at angle α to give both s and p-components). The Jones matrix for the test surface
reflection is given by (6.44), and the Jones matrix for the analyzing polarizer
oriented at angle θ is given by (6.32). The Jones vector for the light arriving at the
detector is then
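$$\begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} \begin{bmatrix} -r_p & 0 \\ 0 & r_s \end{bmatrix} \begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix} = \begin{bmatrix} -r_p\cos\alpha\cos^2\theta + r_s\sin\alpha\sin\theta\cos\theta \\ -r_p\cos\alpha\sin\theta\cos\theta + r_s\sin\alpha\sin^2\theta \end{bmatrix} \tag{6.47}$$
and the intensity arriving at the detector is
$$\begin{aligned} I &\propto \left|-r_p\cos\alpha\cos^2\theta + r_s\sin\alpha\cos\theta\sin\theta\right|^2 + \left|-r_p\cos\alpha\cos\theta\sin\theta + r_s\sin\alpha\sin^2\theta\right|^2 \\ &= |r_p|^2\cos^2\alpha\cos^2\theta + |r_s|^2\sin^2\alpha\sin^2\theta - \frac{r_p r_s^* + r_s r_p^*}{4}\sin 2\alpha\,\sin 2\theta \end{aligned} \tag{6.48}$$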
For ellipsometry measurements, it is customary to express the ratio of Fresnel
coefficients as
$$r_p / r_s \equiv \tan\Psi\, e^{i\Delta} \tag{6.49}$$
In this case, the intensity may be shown to be proportional to (see problem P6.13)
$$I \propto 1 - \eta\sin 2\theta + \xi\cos 2\theta \tag{6.50}$$
where
$$\eta \equiv \frac{2\tan\Psi\cos\Delta\tan\alpha}{\tan^2\Psi + \tan^2\alpha} \quad\text{and}\quad \xi \equiv \frac{\tan^2\Psi - \tan^2\alpha}{\tan^2\Psi + \tan^2\alpha} \tag{6.51}$$
In commercial ellipsometers, the angle θ of the analyzing polarizer often rotates at
a high speed, and the time dependence of the light reaching a detector is analyzed.
From this type of measurement, the coefficients η and ξ can be extracted with
high precision. Then equations (6.51) can be inverted (see problem P6.13) to
reveal
$$\tan\Psi = \sqrt{\frac{1+\xi}{1-\xi}}\,\left|\tan\alpha\right| \quad\text{and}\quad \cos\Delta = \frac{\eta}{\sqrt{1-\xi^2}}\,\mathrm{sign}(\alpha) \tag{6.52}$$
From a series of these types of measurements, it is possible to extract the values
of n and κ for materials from the expressions for r s and r p (with the aid of a
computer!). A more extensive series of such measurements is needed in the case
of multilayers with varying thicknesses.
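To make the chain from (6.47) to (6.52) concrete, the sketch below compares the intensity computed directly from the Jones matrices with the form (6.50), and then recovers tan Ψ and cos Δ via (6.52). The Fresnel coefficients used are hypothetical complex numbers picked for illustration, not values for any real surface, and the function names are likewise assumptions.

```python
import cmath
import math

def detected_intensity(r_p, r_s, alpha, theta):
    """Intensity (up to a constant) of the Jones vector in Eq. (6.47)."""
    amplitude = (-r_p * math.cos(alpha) * math.cos(theta)
                 + r_s * math.sin(alpha) * math.sin(theta))
    e_x = amplitude * math.cos(theta)
    e_y = amplitude * math.sin(theta)
    return abs(e_x) ** 2 + abs(e_y) ** 2

def eta_xi(psi, delta, alpha):
    """Coefficients eta and xi of Eq. (6.51)."""
    tp2, ta2 = math.tan(psi) ** 2, math.tan(alpha) ** 2
    eta = 2 * math.tan(psi) * math.cos(delta) * math.tan(alpha) / (tp2 + ta2)
    xi = (tp2 - ta2) / (tp2 + ta2)
    return eta, xi

def invert(eta, xi, alpha):
    """Recover tan(Psi) and cos(Delta) from eta and xi, Eq. (6.52)."""
    tan_psi = abs(math.tan(alpha)) * math.sqrt((1 + xi) / (1 - xi))
    cos_delta = eta / math.sqrt(1 - xi * xi) * math.copysign(1.0, alpha)
    return tan_psi, cos_delta

# Hypothetical Fresnel coefficients and polarizer angles.
r_s = 0.9 * cmath.exp(2.0j)
r_p = 0.7 * cmath.exp(2.9j)
alpha = math.radians(30)   # first polarizer
theta = math.radians(50)   # analyzing polarizer

ratio = r_p / r_s
psi, delta = math.atan(abs(ratio)), cmath.phase(ratio)
eta, xi = eta_xi(psi, delta, alpha)

# Proportionality constant that (6.50) leaves out.
scale = abs(r_s) ** 2 * math.cos(alpha) ** 2 * (
    math.tan(psi) ** 2 + math.tan(alpha) ** 2) / 2

direct = detected_intensity(r_p, r_s, alpha, theta)
model = scale * (1 - eta * math.sin(2 * theta) + xi * math.cos(2 * theta))
tan_psi, cos_delta = invert(eta, xi, alpha)
print(direct, model, tan_psi, cos_delta)
```

Note that (6.52) yields only cos Δ, so the sign of Δ is not determined by this kind of measurement, consistent with the remark that such ellipsometers extract the ellipticity but not the helicity.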

Appendix 6.B Partially Polarized Light

We outline here an approach for dealing with partially polarized light, which is a
mixture of polarized and unpolarized light. Most natural light such as sunshine is
unpolarized. The transverse electric field direction in natural light varies rapidly
(and quasi randomly). Such variations imply the superposition of multiple fre-
quencies as opposed to the single frequency assumed in the formulation of Jones
calculus earlier in this chapter. Unpolarized light can become partially polarized
when it, for example, reflects from a surface at oblique incidence, since s and p
components of the polarization might reflect with differing strength.
Stokes vectors are used to keep track of the partial polarization (and attenua-
tion) of a light beam as the light progresses through an optical system. In contrast,
Jones vectors are designed for pure polarization states. Partially polarized light is
a mixture of polarized and unpolarized light. We can consider any light beam as
an intensity sum of completely unpolarized light and perfectly polarized light:
$$I = I_\mathrm{pol} + I_\mathrm{un} \tag{6.53}$$
It is assumed that both types of light propagate in the same direction.
The main characteristic of unpolarized light is that it cannot be extinguished
by a single polarizer (in combination with a wave plate). Moreover, the transmission
of unpolarized light through an ideal polarizer (in combination with a wave
plate) is always 50%. On the other hand, polarized light (be it linearly, circularly, or
elliptically polarized) can always be represented by a Jones vector, and it is always
possible to extinguish polarized light with a wave plate and a single polarizer.
Sir George Gabriel Stokes (1819–1903, Irish) was born in Skreen, Ireland. He
entered Cambridge University at age 18 and graduated four years later with the
distinction of senior wrangler. In 1849, he became a professor of mathematics
at Cambridge where he later worked with James Clerk Maxwell and Lord
Kelvin to form the Cambridge School of Mathematical Physics. Stokes was a
powerful mathematician as well as a good experimentalist, often testing his
theoretical solutions in the laboratory. In addition to his contributions to optics,
Stokes made important contributions to fluid dynamics (e.g. the Navier-Stokes
equations) and to mathematical physics; Stokes’ theorem is employed several
places in this book.
We may introduce the degree of polarization as the fraction of the intensity
that is in a definite polarization state:
$$\xi_\mathrm{pol} \equiv \frac{I_\mathrm{pol}}{I_\mathrm{pol} + I_\mathrm{un}} \tag{6.54}$$
The degree of polarization takes on values between zero and one. Thus, if the
light is completely unpolarized (such that I pol = 0), then the degree of polarization
is zero. On the other hand, if the beam is fully polarized (such that I un = 0), then
the degree of polarization is one.
A Stokes vector, which characterizes a partially polarized beam, is a column
vector written as
$$\begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix}$$
The parameter
$$S_0 \equiv \frac{I}{I_\mathrm{in}} \tag{6.55}$$
is a comparison of the beam’s intensity (or power) to a benchmark or ’input’ inten-
sity, I in , measured before the beam enters an optical system under consideration.

I represents the intensity at the point of investigation, where one wishes to char-
acterize the beam. Thus, the value S 0 = 1 represents the input intensity, and S 0
can drop to values less than one, to account for attenuation of light by polarizers
in the system. (Alternatively, S 0 could grow in the atypical case of amplification.)
The next parameter, $S_1$, describes how much the light looks either horizontally
or vertically polarized, and it is defined as
$$S_1 \equiv \frac{2 I_\mathrm{hor}}{I_\mathrm{in}} - S_0 \tag{6.56}$$
Here, $I_\mathrm{hor}$ represents the amount of light detected if an ideal linear polarizer is
placed with its axis aligned horizontally directly in front of the detector (inserted
where the light is characterized). $S_1$ ranges between negative one and one, taking
on the value one when unattenuated light is linearly polarized horizontally and
negative one when it is linearly polarized vertically. If the light has been attenuated,
it may still be perfectly horizontally
polarized even if $S_1$ has a magnitude less than one. (One might wish to examine
$S_1/S_0$, which is guaranteed to be a number ranging between negative one and one.)
The parameter S 2 describes how much the light looks linearly polarized along
the diagonals. It is given by
$$S_2 \equiv \frac{2 I_{45^\circ}}{I_\mathrm{in}} - S_0 \tag{6.57}$$
Similar to the previous case, I 45◦ represents the amount of light detected if an
ideal linear polarizer is placed with its axis at 45◦ directly in front of the detector
(inserted where the light is characterized). As before, S 2 ranges between negative
one and one, taking on extremes when the light is linearly polarized either at 45◦
or 135◦ .
Finally, S 3 characterizes the extent to which the beam is either right or left
circularly polarized:
$$S_3 \equiv \frac{2 I_\text{r-cir}}{I_\mathrm{in}} - S_0 \tag{6.58}$$
Here, I r-cir represents the amount of light detected if an ideal right-circular po-
larizer is placed directly in front of the detector. A right-circular polarizer is
one that passes right-handed polarized light, but blocks left handed polarized
light. One way to construct such a polarizer is a quarter wave plate, followed
by a linear polarizer with the transmission axis aligned 45◦ from the wave-plate
fast axis, followed by another quarter wave plate at −45◦ from the polarizer (see
P6.14).¹ Again, this parameter ranges between negative one and one, taking on
the value one for right circular polarization and negative one for left circular polarization.
¹ The final quarter wave plate is to put the light back into the original circular state; it is not needed
to measure the Stokes parameter.
Importantly, if any of the parameters $S_1$, $S_2$, or $S_3$ take on their extreme values
(i.e. a magnitude equal to $S_0$), the other two parameters necessarily equal zero. As
an example, if a beam is linearly horizontally polarized with $I = I_\mathrm{in}$, then we have
$I_\mathrm{hor} = I_\mathrm{in}$, $I_{45^\circ} = I_\mathrm{in}/2$, and $I_\text{r-cir} = I_\mathrm{in}/2$. This yields $S_0 = 1$, $S_1 = 1$, $S_2 = 0$, and $S_3 = 0$.
As a second example, suppose that the light has been attenuated to $I = I_\mathrm{in}/3$ but is
purely left circularly polarized. Then we have $I_\mathrm{hor} = I_\mathrm{in}/6$, $I_{45^\circ} = I_\mathrm{in}/6$, and $I_\text{r-cir} = 0$.
The Stokes parameters are then $S_0 = 1/3$, $S_1 = 0$, $S_2 = 0$, and $S_3 = -1/3$.
Another interesting case is completely unpolarized light, which transmits 50%
through all of the polarizers discussed above. In this case, I hor = I 45◦ = I r-cir = I /2
and S 1 = S 2 = S 3 = 0.
Example 6.4
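Find the Stokes parameters for perfectly polarized light, represented by an arbitrary
Jones vector $\begin{bmatrix} A \\ B \end{bmatrix}$ where A and B are complex.² Depending on the values A and B,
the polarization can follow any ellipse.
Solution: The input intensity of this polarized beam is $I_\mathrm{in} = I_\mathrm{pol} = |A|^2 + |B|^2$, according
to Eq. (6.23), where we absorb the factor $\frac{1}{2}\epsilon_0 c |E_\mathrm{eff}|^2$ into $|A|^2$ and $|B|^2$
for convenience. The Jones vector for the light that passes through a horizontal
polarizer is
$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} A \\ 0 \end{bmatrix}$$
which gives a measured intensity of $I_\mathrm{hor} = |A|^2$. Similarly, the Jones vector when
the beam is passed through a polarizer oriented at 45° is
$$\frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \frac{A+B}{2} \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
leading to an intensity of
$$I_{45^\circ} = \frac{|A+B|^2}{2} = \frac{|A|^2 + |B|^2 + A^*B + AB^*}{2}$$
Finally, the Jones vector for light passing through a right-circular polarizer (see
P6.14) is
$$\frac{1}{2} \begin{bmatrix} 1 & i \\ -i & 1 \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \frac{A+iB}{2} \begin{bmatrix} 1 \\ -i \end{bmatrix}$$
giving an intensity of
$$I_\text{r-cir} = \frac{|A+iB|^2}{2} = \frac{|A|^2 + |B|^2 + i\left(A^*B - AB^*\right)}{2}$$
Thus, the Stokes parameters become
$$\begin{aligned} S_0 &= \frac{|A|^2 + |B|^2}{I_\mathrm{in}} = 1 \\ S_1 &= \frac{2|A|^2}{I_\mathrm{in}} - \frac{|A|^2 + |B|^2}{I_\mathrm{in}} = \frac{|A|^2 - |B|^2}{I_\mathrm{in}} \\ S_2 &= \frac{|A|^2 + |B|^2 + A^*B + AB^*}{I_\mathrm{in}} - \frac{|A|^2 + |B|^2}{I_\mathrm{in}} = \frac{A^*B + AB^*}{I_\mathrm{in}} \\ S_3 &= \frac{|A|^2 + |B|^2 + i\left(A^*B - AB^*\right)}{I_\mathrm{in}} - \frac{|A|^2 + |B|^2}{I_\mathrm{in}} = i\,\frac{A^*B - AB^*}{I_\mathrm{in}} \end{aligned}$$
² We will find it easier in this appendix to write $\begin{bmatrix} A \\ B \end{bmatrix}$ instead of $\begin{bmatrix} |A| \\ |B| e^{i\delta} \end{bmatrix}$, where δ is the phase
difference between B and A.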
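The result of Example 6.4 can be cross-checked against the measurement-based definitions (6.55)–(6.58). The sketch below is illustrative only: the function names and the sample values of A and B are hypothetical, and the test-polarizer intensities are taken from the expressions worked out in the example.

```python
import math

def stokes_from_jones(A, B, I_in=None):
    """Stokes parameters of fully polarized light with Jones vector [A, B]."""
    A, B = complex(A), complex(B)
    I_pol = abs(A) ** 2 + abs(B) ** 2
    if I_in is None:
        I_in = I_pol  # unattenuated beam, as in Example 6.4
    cross = A.conjugate() * B            # A* B
    S0 = I_pol / I_in
    S1 = (abs(A) ** 2 - abs(B) ** 2) / I_in
    S2 = 2 * cross.real / I_in           # A*B + AB* = 2 Re(A*B)
    S3 = -2 * cross.imag / I_in          # i(A*B - AB*) = -2 Im(A*B)
    return [S0, S1, S2, S3]

def stokes_from_measurements(I, I_hor, I_45, I_rcir, I_in):
    """Stokes parameters from detected intensities, Eqs. (6.55)-(6.58)."""
    S0 = I / I_in
    return [S0, 2 * I_hor / I_in - S0, 2 * I_45 / I_in - S0, 2 * I_rcir / I_in - S0]

# Hypothetical complex amplitudes for an elliptical state.
A, B = 0.8, 0.3 + 0.4j
I_in = abs(A) ** 2 + abs(B) ** 2
measured = stokes_from_measurements(
    I=I_in,
    I_hor=abs(A) ** 2,
    I_45=abs(A + B) ** 2 / 2,
    I_rcir=abs(A + 1j * B) ** 2 / 2,
    I_in=I_in,
)
direct = stokes_from_jones(A, B)

# Attenuated left-circular light with I = I_in / 3, as discussed in the text.
left_attenuated = stokes_from_jones(1 / math.sqrt(2), 1j / math.sqrt(2), I_in=3)
print(direct, measured, left_attenuated)
```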

It is clear from the linear dependence of $S_0$, $S_1$, $S_2$, and $S_3$ on intensity (see
Eqs. (6.55)–(6.58)) that the overall Stokes vector may be regarded as the sum of
the individual Stokes vectors for polarized and unpolarized light. That is, we may
write $S_j = S_j^{(\mathrm{pol})} + S_j^{(\mathrm{un})}$, $j = 0, 1, 2, 3$.
This is certainly true for
$$S_0 = \frac{I}{I_\mathrm{in}} = \frac{I_\mathrm{pol} + I_\mathrm{un}}{I_\mathrm{in}} \tag{6.59}$$
and in the other cases the unpolarized portion of the light does not contribute to
the Stokes parameters. Half of the unpolarized light survives any of the test filters,
which neatly cancels the unpolarized portion of $S_0$ in Eqs. (6.56)–(6.58).
A completely general form of the Stokes vector may then be written as (see
Example 6.4)
$$\begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix} = \frac{1}{I_\mathrm{in}} \begin{bmatrix} I_\mathrm{pol} + I_\mathrm{un} \\ |A|^2 - |B|^2 \\ A^*B + AB^* \\ i\left(A^*B - AB^*\right) \end{bmatrix} \tag{6.60}$$
where the Jones vector for the polarized portion of the light is
$$\begin{bmatrix} A \\ B \end{bmatrix}$$
and the intensity of the polarized portion of the light is
$$I_\mathrm{pol} = |A|^2 + |B|^2 \tag{6.61}$$
As in Example 6.4, the factor $\frac{1}{2}\epsilon_0 c |E_\mathrm{eff}|^2$ is absorbed into $|A|^2$ and $|B|^2$.

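We would like to express the degree of polarization in terms of the Stokes
parameters. We first note that the quantity $\sqrt{S_1^2 + S_2^2 + S_3^2}$ can be expressed as
$$\begin{aligned} \sqrt{S_1^2 + S_2^2 + S_3^2} &= \sqrt{\left(\frac{|A|^2 - |B|^2}{I_\mathrm{in}}\right)^2 + \left(\frac{A^*B + AB^*}{I_\mathrm{in}}\right)^2 + \left(\frac{i\left(A^*B - AB^*\right)}{I_\mathrm{in}}\right)^2} \\ &= \frac{|A|^2 + |B|^2}{I_\mathrm{in}} \\ &= \frac{I_\mathrm{pol}}{I_\mathrm{in}} \end{aligned} \tag{6.62}$$
Substituting (6.59) and (6.62) into the expression for the degree of polarization
(6.54) yields
$$\xi_\mathrm{pol} \equiv \frac{1}{S_0}\sqrt{S_1^2 + S_2^2 + S_3^2} \tag{6.63}$$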
If the light is polarized such that it perfectly transmits through or is perfectly
extinguished by one of the three test polarizers associated with S 1 , S 2 , or S 3 , then
the degree of polarization will be unity. Obviously, it is possible to have pure
polarization states that are not aligned with the axes of any one of these test

polarizers. In this situation, the degree of polarization is still one, although the
values S 1 , S 2 , and S 3 may all three contribute to (6.63).
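A short numerical sketch of (6.63) follows. The Stokes vectors below are hypothetical examples: a beam that is 60% horizontally polarized and 40% unpolarized, and a pure state in which all three of the parameters $S_1$, $S_2$, $S_3$ are nonzero.

```python
import math

def degree_of_polarization(S):
    """Degree of polarization from a Stokes vector [S0, S1, S2, S3], Eq. (6.63)."""
    S0, S1, S2, S3 = S
    return math.sqrt(S1 ** 2 + S2 ** 2 + S3 ** 2) / S0

def add_stokes(Sa, Sb):
    """Stokes vectors of mutually incoherent beams add component by component."""
    return [a + b for a, b in zip(Sa, Sb)]

# Hypothetical mixture, intensities in units of I_in.
polarized_part = [0.6, 0.6, 0.0, 0.0]    # horizontally polarized, I_pol = 0.6
unpolarized_part = [0.4, 0.0, 0.0, 0.0]  # unpolarized, I_un = 0.4
mixed = add_stokes(polarized_part, unpolarized_part)

# A pure state not aligned with any single test polarizer.
pure_tilted = [1.0, 0.5, 0.5, 1 / math.sqrt(2)]

print(degree_of_polarization(mixed))
print(degree_of_polarization(pure_tilted))
```

For the mixture, the result agrees with the direct definition (6.54), since I_pol/(I_pol + I_un) = 0.6.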
Finally, it is possible to represent polarizing devices as matrices that operate
on the Stokes vectors in much the same way that Jones matrices operate on
Jones vectors. Since Stokes vectors are four-dimensional, the matrices used are
four-by-four. These are known as Mueller matrices.
Derivation: Mueller Matrix for a Linear Polarizer
We know that 50% of the unpolarized light transmits through a polarizer,
ending up as polarized light with Jones vector
$$\begin{bmatrix} A'_1 \\ B'_1 \end{bmatrix} = \sqrt{\frac{I_\mathrm{un}}{2}} \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$$
(see table 6.1). As usual, let θ give the angle of the transmission axis relative to the
horizontal. The Jones matrix (6.32) acts on the polarized portion of the light as
follows
$$\begin{bmatrix} A'_2 \\ B'_2 \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \cos\theta\sin\theta \\ \cos\theta\sin\theta & \sin^2\theta \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \left[A\cos\theta + B\sin\theta\right] \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$$
One might be tempted to add $\begin{bmatrix} A'_1 \\ B'_1 \end{bmatrix}$ and $\begin{bmatrix} A'_2 \\ B'_2 \end{bmatrix}$, but this would be wrong, since
the two beams are not coherent. As mentioned previously, unpolarized light
necessarily contains multiple frequencies, and so the fields from the polarized and
unpolarized beams destructively interfere as often as they constructively interfere.
In this case, we simply add intensities rather than fields. That is, we have
$$\begin{aligned} \left|A'\right|^2 = \left|A'_1\right|^2 + \left|A'_2\right|^2 &= \left[\frac{I_\mathrm{un}}{2} + \left|A\cos\theta + B\sin\theta\right|^2\right]\cos^2\theta \\ &= \left[\frac{I_\mathrm{un}}{2} + |A|^2\cos^2\theta + |B|^2\sin^2\theta + \left(A^*B + AB^*\right)\sin\theta\cos\theta\right]\cos^2\theta \\ &= I_\mathrm{in}\left[\frac{S_0}{2} + \frac{\cos 2\theta}{2} S_1 + \frac{\sin 2\theta}{2} S_2\right]\cos^2\theta \end{aligned}$$
Similarly,
$$\left|B'\right|^2 = \left|B'_1\right|^2 + \left|B'_2\right|^2 = I_\mathrm{in}\left[\frac{S_0}{2} + \frac{\cos 2\theta}{2} S_1 + \frac{\sin 2\theta}{2} S_2\right]\sin^2\theta$$
Hans Mueller (Swiss) was a shepherd until his late teens. As a physics professor
at MIT, he built on the work of Stokes and in 1943 formulated a matrix
method for manipulating Stokes vectors. He was an engaging lecturer into the
1950s and was known for his exciting demonstrations. He was a student of
Arnold Sommerfeld, and did seminal work on ferroelectricity (he is reported
to have coined the term).
Since the light has gone through a linear polarizer, we are guaranteed that $A'$ and
$B'$ have the same phase. Therefore, $A'^* B' = A' B'^* = |A'||B'|$. In view of (6.60), these
results lead to
$$\begin{aligned} S'_0 &= \frac{\left|A'\right|^2 + \left|B'\right|^2}{I_\mathrm{in}} = \frac{S_0}{2} + \frac{\cos 2\theta}{2} S_1 + \frac{\sin 2\theta}{2} S_2 \\ S'_1 &= \frac{\left|A'\right|^2 - \left|B'\right|^2}{I_\mathrm{in}} = \left[\frac{S_0}{2} + \frac{\cos 2\theta}{2} S_1 + \frac{\sin 2\theta}{2} S_2\right]\left(\cos^2\theta - \sin^2\theta\right) \\ &= \frac{\cos 2\theta}{2} S_0 + \frac{\cos^2 2\theta}{2} S_1 + \frac{\sin 4\theta}{4} S_2 \\ S'_2 &= \frac{\left|A'\right|\left|B'\right| + \left|A'\right|\left|B'\right|}{I_\mathrm{in}} = 2\left[\frac{S_0}{2} + \frac{\cos 2\theta}{2} S_1 + \frac{\sin 2\theta}{2} S_2\right]\cos\theta\sin\theta \\ &= \frac{\sin 2\theta}{2} S_0 + \frac{\sin 4\theta}{4} S_1 + \frac{\sin^2 2\theta}{2} S_2 \\ S'_3 &= i\,\frac{\left|A'\right|\left|B'\right| - \left|A'\right|\left|B'\right|}{I_\mathrm{in}} = 0 \end{aligned}$$
These transformations expressed in matrix format become
$$\begin{bmatrix} S'_0 \\ S'_1 \\ S'_2 \\ S'_3 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & \cos 2\theta & \sin 2\theta & 0 \\ \cos 2\theta & \cos^2 2\theta & \frac{1}{2}\sin 4\theta & 0 \\ \sin 2\theta & \frac{1}{2}\sin 4\theta & \sin^2 2\theta & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix}$$
which reveals the Mueller matrix for a linear polarizer.
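As a quick sanity check on the Mueller matrix for a linear polarizer, the sketch below applies it to unpolarized light and to horizontally polarized light. The function names and the 30° test angle are hypothetical choices for illustration.

```python
import math

def polarizer_mueller(theta):
    """Mueller matrix of an ideal linear polarizer, transmission axis at theta (radians)."""
    c2, s2, s4 = math.cos(2 * theta), math.sin(2 * theta), math.sin(4 * theta)
    rows = [[1, c2, s2, 0],
            [c2, c2 * c2, 0.5 * s4, 0],
            [s2, 0.5 * s4, s2 * s2, 0],
            [0, 0, 0, 0]]
    return [[0.5 * element for element in row] for row in rows]

def apply_mueller(matrix, stokes):
    """Multiply a 4x4 Mueller matrix onto a Stokes vector."""
    return [sum(m * s for m, s in zip(row, stokes)) for row in matrix]

theta = math.radians(30)  # arbitrary illustrative angle

# Unpolarized input: half the light survives and emerges fully polarized.
unpolarized_out = apply_mueller(polarizer_mueller(theta), [1, 0, 0, 0])

# Horizontally polarized input: transmitted fraction should be cos^2(theta).
horizontal_out = apply_mueller(polarizer_mueller(theta), [1, 1, 0, 0])

print(unpolarized_out)
print(horizontal_out)
```

The second case reproduces the familiar cos² dependence quoted as the answer to P6.7(a) with α = 0.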
The Mueller matrix for a half wave plate is worked out below. The Mueller
matrix for a quarter wave plate is deferred to problem P6.15.
Derivation: Mueller Matrix for a Half Wave Plate
We know that all of the light transmits through the wave plate. This immediately
gives
$$S'_0 = S_0$$
The wave plate does nothing to unpolarized light. On the other hand, the polarized
portion of the light is influenced by the wave plate as follows (see (6.40)):
$$\begin{bmatrix} A' \\ B' \end{bmatrix} = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} A\cos 2\theta + B\sin 2\theta \\ A\sin 2\theta - B\cos 2\theta \end{bmatrix}$$
As usual, θ is the angle of the fast axis relative to the horizontal. (As expected,
$|A'|^2 + |B'|^2 = |A|^2 + |B|^2$; the intensity of the light is unaltered.) Using (6.60) we get
$$\begin{aligned} S'_1 &= \frac{\left|A'\right|^2 - \left|B'\right|^2}{I_\mathrm{in}} = \frac{\left|A\cos 2\theta + B\sin 2\theta\right|^2 - \left|A\sin 2\theta - B\cos 2\theta\right|^2}{I_\mathrm{in}} \\ &= \frac{\left(|A|^2 - |B|^2\right)\cos 4\theta + \left(A^*B + AB^*\right)\sin 4\theta}{I_\mathrm{in}} = S_1\cos 4\theta + S_2\sin 4\theta \end{aligned}$$
$$\begin{aligned} S'_2 &= \frac{A'^* B' + A' B'^*}{I_\mathrm{in}} \\ &= \frac{\left(A^*\cos 2\theta + B^*\sin 2\theta\right)\left(A\sin 2\theta - B\cos 2\theta\right)}{I_\mathrm{in}} + \frac{\left(A\cos 2\theta + B\sin 2\theta\right)\left(A^*\sin 2\theta - B^*\cos 2\theta\right)}{I_\mathrm{in}} \\ &= \frac{|A|^2 - |B|^2}{I_\mathrm{in}}\sin 4\theta - \frac{AB^* + A^*B}{I_\mathrm{in}}\cos 4\theta = S_1\sin 4\theta - S_2\cos 4\theta \end{aligned}$$
$$\begin{aligned} S'_3 &= i\,\frac{A'^* B' - A' B'^*}{I_\mathrm{in}} \\ &= i\,\frac{\left(A^*\cos 2\theta + B^*\sin 2\theta\right)\left(A\sin 2\theta - B\cos 2\theta\right)}{I_\mathrm{in}} - i\,\frac{\left(A\cos 2\theta + B\sin 2\theta\right)\left(A^*\sin 2\theta - B^*\cos 2\theta\right)}{I_\mathrm{in}} \\ &= -i\,\frac{A^*B - AB^*}{I_\mathrm{in}} = -S_3 \end{aligned}$$
These transformations expressed in matrix format become
$$\begin{bmatrix} S'_0 \\ S'_1 \\ S'_2 \\ S'_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos 4\theta & \sin 4\theta & 0 \\ 0 & \sin 4\theta & -\cos 4\theta & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix}$$
which reveals the Mueller matrix for a half wave plate.
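The half wave plate result can be checked by taking two routes to the output Stokes vector: applying the Mueller matrix to the input Stokes vector, or applying the Jones matrix (6.40) first and then converting via (6.60). The sketch below assumes fully polarized input; the amplitudes, the angle, and the function names are hypothetical.

```python
import math

def half_wave_mueller(theta):
    """Mueller matrix of a half wave plate with fast axis at theta (radians)."""
    c4, s4 = math.cos(4 * theta), math.sin(4 * theta)
    return [[1, 0, 0, 0],
            [0, c4, s4, 0],
            [0, s4, -c4, 0],
            [0, 0, 0, -1]]

def apply_mueller(matrix, stokes):
    """Multiply a 4x4 Mueller matrix onto a Stokes vector."""
    return [sum(m * s for m, s in zip(row, stokes)) for row in matrix]

def stokes_from_jones(A, B, I_in):
    """Stokes vector of fully polarized light, following Eq. (6.60)."""
    A, B = complex(A), complex(B)
    cross = A.conjugate() * B  # A* B
    return [(abs(A) ** 2 + abs(B) ** 2) / I_in,
            (abs(A) ** 2 - abs(B) ** 2) / I_in,
            2 * cross.real / I_in,
            -2 * cross.imag / I_in]

# Hypothetical input state and plate orientation.
A, B = 0.8, 0.3 + 0.4j
I_in = abs(A) ** 2 + abs(B) ** 2
theta = math.radians(25)

# Route 1: Jones matrix (6.40), then convert to Stokes parameters.
c2, s2 = math.cos(2 * theta), math.sin(2 * theta)
A_out = A * c2 + B * s2
B_out = A * s2 - B * c2
via_jones = stokes_from_jones(A_out, B_out, I_in)

# Route 2: convert to Stokes parameters, then apply the Mueller matrix.
via_mueller = apply_mueller(half_wave_mueller(theta), stokes_from_jones(A, B, I_in))

print(via_jones)
print(via_mueller)
```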

Exercises
Exercises for 6.2 Jones Vectors for Representing Polarization

P6.1 Show that $\left(A\hat{x} + Be^{i\delta}\hat{y}\right) \cdot \left(A\hat{x} + Be^{i\delta}\hat{y}\right)^* = 1$, as defined in connection
with (6.5).
P6.2 Prove that if 0 < δ < π, the helicity is left-handed, and if π < δ < 2π the
helicity is right-handed.
HINT: Write the relevant real field associated with (6.5)
$$\mathbf{E}(z,t) = |E_\mathrm{eff}|\left[\hat{x}A\cos\left(kz - \omega t + \phi\right) + \hat{y}B\cos\left(kz - \omega t + \phi + \delta\right)\right]$$
where φ is the phase of E eff . Freeze time at, say, t = φ/ω. Determine the
field at z = 0 and at z = λ/4 (a quarter cycle), say. If E (0, t ) × E (λ/4, t )
points in the direction of k, then the helicity matches that of a wood
screw.
P6.3 For the following cases, what is the orientation of the major axis, and
what is the ellipticity of the light? Case I: $A = B = 1/\sqrt{2}$; δ = 0. Case II:
$A = B = 1/\sqrt{2}$; δ = π/2. Case III: $A = B = 1/\sqrt{2}$; δ = π/4.
L6.4 Determine how much right-handed circularly polarized light (λvac =

633 nm) is delayed (or advanced) with respect to left-handed circularly
polarized light as it goes through approximately 3 cm of Karo syrup
(the neck of the bottle). This phenomenon is called optical activity.
Because of a definite-handedness to the molecules in the syrup, right-
and left-handed polarized light experience slightly different refractive
indices. (video)
Figure 6.11 Lab schematic for L 6.4 (Labels: laser, polarizer, Karo Light Corn Syrup, polarizer, screen.)
HINT: Linearly polarized light contains equal amounts of right and left
circularly polarized light. Consider
$$\frac{1}{2}\begin{bmatrix} 1 \\ i \end{bmatrix} + \frac{e^{i\phi}}{2}\begin{bmatrix} 1 \\ -i \end{bmatrix}$$
where φ is the phase delay of the right circular polarization. Show that
this can be written as
$$e^{i\delta}\begin{bmatrix} \cos\phi/2 \\ \sin\phi/2 \end{bmatrix}$$
The overall phase δ is unimportant. Compare this with
$$\begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix}$$
where α is the angle of linearly polarized light (see table 6.1).
Exercises for 6.4 Linear Polarizers and Jones Matrices
P6.5 (a) Suppose that linearly polarized light is oriented at an angle α with
respect to the horizontal axis (x-axis) (see table 6.1). What fraction of
the original intensity gets through a vertically oriented polarizer?
(b) If the original light is right-circularly polarized, what fraction of the
original intensity gets through the same polarizer?
Exercises for 6.5 Jones Matrix for Polarizers at Arbitrary Angles
P6.6 Horizontally polarized light (α = 0) is sent through two polarizers, the

first oriented at θ1 = 45◦ and the second at θ2 = 90◦ . What fraction of
the original intensity emerges? What is the fraction if the ordering of
the polarizers is reversed?
P6.7 (a) Suppose that linearly polarized light is oriented at an angle α with
respect to the horizontal or x-axis. What fraction of the original inten-
sity emerges from a polarizer oriented with its transmission at angle θ
from the x-axis?
Answer: cos2 (θ − α); compare with P6.5.
(b) If the original light is right circularly polarized, what fraction of the
original intensity emerges from the same polarizer?
P6.8 Derive (6.12), (6.13), and (6.14).

HINT: Analyze the Jones vector just as you would analyze light in the
laboratory. Put a polarizer in the beam and observe the intensity of the
light as a function of polarizer angle. Compute the intensity via (6.23).
Then find the polarizer angle (call it α) that gives a maximum (or a
minimum) of intensity. The angle then corresponds to an axis of the
ellipse inscribing the E-field as it spirals. When taking the arctangent,
remember that it is defined only over a range of π. You can add π for
another valid result (which corresponds to the second ellipse axis).

Exercises for 6.6 Jones Matrices for Wave Plates
L6.9 Create a source of unknown elliptical polarization by reflecting a lin-

early polarized laser beam (with both s and p-components) from a
metal mirror with a large incident angle (i.e. θi ≥ 80◦ ). Use a quarter-
wave plate and a polarizer to determine the Jones vector of the reflected
beam. Find the ellipticity, the helicity (right or left handed), and the
orientation of the major axis. (video)
Figure 6.12 Lab schematic for L 6.9 (Labels: laser, polarizer, silver mirror at ~80° angle of incidence, 1/4 wave plate, polarizer set at 45°, screen.)
HINT: A polarizer alone can reveal the direction of the major and minor
axes and the ellipticity, but it does not reveal the helicity. Use a quarter-
wave plate (oriented at a special angle θ) to convert the unknown
elliptically polarized light into linearly polarized light. A subsequent
polarizer can then extinguish the light, from which you can determine
the Jones vector of the light coming through the wave plate. This must
equal the original (unknown) Jones vector (6.11) operated on by the
wave plate (6.38). As you solve the matrix equation, it is helpful to note
that the inverse of (6.38) is its own complex conjugate.
P6.10 What is the minimum thickness (called zero-order thickness) of a

quartz plate made to operate as a quarter-wave plate for λvac = 500 nm?
The indices of refraction are n fast = 1.54424 and n slow = 1.55335.
Exercises for 6.7 Polarization Effects of Reflection and Transmission
P6.11 Light is linearly polarized at α = 45° with a Jones vector according to
table 6.1. The light is reflected from a vertical silver mirror with angle
of incidence θi = 80°, as described in (P3.13). Find the Jones vector
representation for the polarization of the reflected light. NOTE: The
answer may be somewhat different than the result measured in L 6.9.
For one thing, we have not considered that a silver mirror inevitably
has a thin oxide layer.
Answer: $\begin{pmatrix} 0.668 \\ 0.702\,e^{1.13i} \end{pmatrix}$.
Figure 6.13 Geometry for P6.11 (Labels: 80° angle of incidence, s, p.)

P6.12 Calculate the angle θ to cut the glass in a Fresnel rhomb such that after
the two internal reflections there is a phase difference of π/2 between
the two polarization states. The rhomb then acts as a quarter wave
plate.
HINT: You need to find the phase difference between (3.40) and (3.41).
Set the difference equal to π/4 for each bounce. The equation you get
does not have a clean analytic solution, but you can plot it to find a
numerical solution.
Answer: There are two angles that work: θ ≅ 50° and θ ≅ 53°.
Figure 6.14 Fresnel Rhomb geometry for P6.12 (side view).
Exercises for 6.A Ellipsometry
P6.13 Derive (6.50) and (6.52), often used for ellipsometry measurements.
HINT: Using $\sin^2\theta = \frac{1-\cos 2\theta}{2}$ and $\cos^2\theta = \frac{1+\cos 2\theta}{2}$, first show
$$
I \propto 1
 - \frac{\left[\left(r_p r_s^* + r_s r_p^*\right)/|r_s|^2\right]\tan\alpha}{|r_p|^2/|r_s|^2 + \tan^2\alpha}\,\sin 2\theta
 + \frac{|r_p|^2/|r_s|^2 - \tan^2\alpha}{|r_p|^2/|r_s|^2 + \tan^2\alpha}\,\cos 2\theta
$$
Exercises for 6.B Partially Polarized Light
P6.14 (a) One way to construct a right-circular polarizer is using a quarter
wave plate with fast axis at 45°, followed by a linear polarizer oriented
vertically, and finally a quarter wave plate with fast axis at −45°. Calculate
the Jones matrix for this system.
Answer:
$$\frac{1}{2}\begin{bmatrix} 1 & i \\ -i & 1 \end{bmatrix}$$
(b) Check that the device leaves right-circularly polarized light unaltered
while killing left-circularly polarized light.
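P6.15 Derive the Mueller matrix for a quarter wave plate.
Answer:
$$
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \cos^2 2\theta & \tfrac{1}{2}\sin 4\theta & -\sin 2\theta \\
0 & \tfrac{1}{2}\sin 4\theta & \sin^2 2\theta & \cos 2\theta \\
0 & \sin 2\theta & -\cos 2\theta & 0
\end{bmatrix}
$$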

Chapter 7
Superposition of Quasi-Parallel
Plane Waves
To this point in our study of optical behavior, we have typically only considered
individual plane wave fields which have uniform intensity throughout space and
time. Some optical fields can be well-approximated by a plane wave, but most
have much more complicated structure. Nevertheless, it turns out that any field
(e.g. pulses or focused beams), regardless of how complicated, can be described
by a superposition of many plane wave fields. In this chapter, we develop the
techniques for superimposing plane waves.
We begin our analysis with a discrete sum of plane wave fields and show how
to calculate the intensity in this case. We will introduce the concept of group
velocity, which describes the motion of interference ‘ripples’ resulting when
multiple plane waves are superimposed. Group velocity is distinct from phase
velocity that we encountered previously. As we saw in chapter 2, the real part of
refractive index in certain situations can be less than one, indicating superluminal
wave crest propagation (i.e. greater than c)! In this case, the group velocity is
usually less than c. Since group velocity tracks the speed of the interference
ripples, regions of light intensity tend to advance with the group velocity rather
than the phase velocity.
Beginning in section 7.3, we extend our analysis of wave superposition to
waveforms composed of continua of plane waves rather than from discrete sums.
The analysis is based on Fourier theory (see section 0.3 for a review), which in
essence is simply a tool for keeping track of the plane waves that make up a given
wave form E (r1 , t ). Since it is easiest to deal with plane waves, we will learn how
to decompose arbitrary wave forms into plane waves for purposes of determining
effects such as propagation in a material (with a frequency-dependent index).
Conversely, we will also learn how to reassemble plane waves into a final pulse at
the end of propagation.
Different frequency components of the waveform experience different phase
velocities, causing the waveform to undergo distortion as it propagates, a phe-
nomenon called dispersion. We shall see that the group velocity tracks the move-
ment of the center of the wave packet. For narrowband packets (i.e. packets
comprised of a narrow range of frequencies and hence long duration), the packet
tends to maintain its shape (with some spreading) while propagating at the group
velocity. On the other hand, broadband pulses (i.e. packets comprised of many
frequencies and possibly of short duration) tend to distort severely while prop-
agating in materials. Nevertheless, the group velocity tracks the center of the
pulse.
It turns out that group velocity can become superluminal when significant
absorption and/or amplification of the light pulse is involved. This is no cause
for alarm (nor is it cause for an abundance of gee-whiz papers on the subject).
Absorption and amplification can cause a pulse to appear to move unexpectedly
fast through a reshaping effect. Group velocity, or rather its inverse group delay,
takes this into account, which makes it remarkably general. In such a scenario,
energy can be lost from the back of a pulse or perhaps added to an already-present
forward portion of a pulse such that the average pulse position appears to advance
abruptly. When all energy is accounted for (both the energy in the medium and in
the light pulse), however, nothing advances faster than the universal speed limit
c. Appendix 7.B gives a good look under the hood at how a medium exchanges
energy with a pulse to produce these eye-catching effects.

Sir Isaac Newton (1643–1727, English) was born in Lincolnshire, England
three months after the death of his father who was a farmer. Newton spent
much of his childhood with his maternal grandmother, after his mother remarried.
(Newton did not like his stepfather.) In his teenage years, Newton’s mother
tried to persuade him to take up farming, but his love for education won out.
He became the top-ranked student and was admitted into Trinity College,
Cambridge at age 18. Newton was influenced by the works of Descartes,
Copernicus, Galileo and Kepler. Upon graduation four years later, the university
closed for two years because of a plague. Newton’s return to farm life coincided
with a remarkable period when he first developed ideas on calculus,
gravitation, and optics. Newton later returned to Cambridge where he spent
his extraordinarily prolific career and became the first scientist to be knighted.
In optics, Newton advanced the ray theory of light and image formation. He
showed that ‘white’ light is comprised of many colors and that the amount of
refraction depends on color. He built the first reflecting telescope, which avoids
chromatic aberration. Newton advocated against the wave theory of light in
favor of his ‘corpuscular’ theory. (Imagining that by this Newton foresaw the
quantized nature of light energy gives too much credit!)

7.1 Intensity of Superimposed Plane Waves

We can construct arbitrary waveforms by adding together many plane waves with
different propagation directions, amplitudes, phases, frequencies and polarizations.
Consider the following discrete sum of plane waves:
$$\mathbf{E}(\mathbf{r},t) = \sum_j \mathbf{E}_j\, e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)} \tag{7.1}$$
The corresponding magnetic field according to (2.56) is
$$\mathbf{B}(\mathbf{r},t) = \sum_j \mathbf{B}_j\, e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)} = \sum_j \frac{\mathbf{k}_j\times\mathbf{E}_j}{\omega_j}\, e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)} \tag{7.2}$$
As usual, the (time and space independent) individual field components $\mathbf{E}_j$
contain both amplitude and phase information for each plane wave.
The Poynting vector (2.52) associated with the fields (7.1) and (7.2) is
$$
\begin{aligned}
\mathbf{S}(\mathbf{r},t) &= \mathrm{Re}\{\mathbf{E}(\mathbf{r},t)\}\times\frac{\mathrm{Re}\{\mathbf{B}(\mathbf{r},t)\}}{\mu_0} \\
&= \sum_{j,m}\frac{1}{\omega_m\mu_0}\,\mathrm{Re}\left\{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}\right\}\times\left(\mathbf{k}_m\times\mathrm{Re}\left\{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}\right\}\right)
\end{aligned}
\tag{7.3}
$$
(Recall the conspiracy that only the real parts of the fields are relevant – crucial
when multiplying.) The above expression is cumbersome because of the many
cross terms that arise when the two summations are multiplied. We need some
simplifying assumptions before we can make any real progress on this expression.
For example, we can time-average the rapid fluctuations in the expression that
vary on the scale of optical frequencies. Additionally, it is common to encounter
the situation where all plane-wave components travel roughly parallel to each
other, which allows a dramatic simplification of (7.3). In the following derivation
we use these assumptions to obtain a simple formula for the time averaged
Poynting vector which retains validity in many useful situations.
Intensity for Quasi Parallel-Traveling Light
For simplicity, we assume that all vectors k j are real. If the wave vectors are com-
plex, the result is essentially the same, but, as in (2.62), the field amplitudes E j
would simply correspond to local amplitudes (adjusted for absorption or amplifi-
cation during prior propagation). We apply the BAC-CAB rule (P0.3) to (7.3) and
obtain
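$$
\begin{aligned}
\mathbf{S}(\mathbf{r},t) = \sum_{j,m}\frac{1}{\omega_m\mu_0}\Big[\,&\mathbf{k}_m\left(\mathrm{Re}\left\{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}\right\}\cdot\mathrm{Re}\left\{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}\right\}\right) \\
&-\mathrm{Re}\left\{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}\right\}\left(\mathrm{Re}\left\{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}\right\}\cdot\mathbf{k}_m\right)\Big]
\end{aligned}
\tag{7.4}
$$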
The last term in (7.4) can be dismissed if all k-vectors are approximately parallel to
each other, in which case all of the km are essentially perpendicular to each of the
E j . We will make this rather stringent assumption and kill the last line in (7.4).
The magnitude of the Poynting vector then becomes (with the help of (0.30))
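$$
\begin{aligned}
S(\mathbf{r},t) &= \sum_{j,m}\frac{k_m}{\omega_m\mu_0}
\left[\frac{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)} + \mathbf{E}_j^* e^{-i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}}{2}\right]
\cdot
\left[\frac{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)} + \mathbf{E}_m^* e^{-i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}}{2}\right] \\
&= \sum_{j,m}\frac{k_m}{4\omega_m\mu_0}\Big\{
\mathbf{E}_j\cdot\mathbf{E}_m\, e^{i[(\mathbf{k}_j+\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j+\omega_m)t]}
+ \mathbf{E}_j^*\cdot\mathbf{E}_m^*\, e^{-i[(\mathbf{k}_j+\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j+\omega_m)t]} \\
&\qquad\qquad\quad
+ \mathbf{E}_j\cdot\mathbf{E}_m^*\, e^{i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]}
+ \mathbf{E}_j^*\cdot\mathbf{E}_m\, e^{-i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]}\Big\}
\end{aligned}
\qquad\text{(parallel k-vectors)}\tag{7.5}
$$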
The terms involving (ω j + ωm )t oscillate rapidly and time-average to zero. On the
other hand, the terms involving (ω j − ωm )t oscillate by comparison very slowly,
especially when the ω j are in the neighborhood of the ωm ; the terms with j = m
don’t oscillate at all. We will retain the slower fluctuations and discard the rapid
oscillations.
In accordance with (1.43) and (2.19) we may write $k_m/(\omega_m\mu_0) = n_m\epsilon_0 c$, where $n_m$
refers to the refractive index associated with the frequency $\omega_m$. For purposes of
computing the intensity (as opposed to determining phase changes with propagation)
we can approximate the index as a constant (i.e. $n_m \cong n$). We seldom
measure intensity inside of materials anyway.

With the rapid oscillations discarded, (7.5) simplifies to
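$$
\begin{aligned}
\langle S(\mathbf{r},t)\rangle_{\rm osc}
&= \frac{n\epsilon_0 c}{2}\sum_{j,m}\frac{\mathbf{E}_j\cdot\mathbf{E}_m^*\, e^{i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]} + \mathbf{E}_j^*\cdot\mathbf{E}_m\, e^{-i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]}}{2} \\
&= \frac{n\epsilon_0 c}{2}\,\mathrm{Re}\left\{\sum_j \mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}\cdot\sum_m \mathbf{E}_m^* e^{-i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}\right\} \\
&= \frac{n\epsilon_0 c}{2}\,\mathrm{Re}\left\{\mathbf{E}(\mathbf{r},t)\cdot\mathbf{E}^*(\mathbf{r},t)\right\}
\end{aligned}
\qquad\text{(parallel k-vectors)}\tag{7.6}
$$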
Note that this expression is already manifestly real so there is no need to apply the
operation Re {}.
In the above derivation, we assumed that the k-vectors for the various added
plane waves are approximately parallel and that the refractive index is similar for
all frequencies. We call the time-averaged magnitude of the Poynting vector in
(7.6) the intensity:
$$I(\mathbf{r},t) = \frac{n\epsilon_0 c}{2}\,\mathbf{E}(\mathbf{r},t)\cdot\mathbf{E}^*(\mathbf{r},t) \tag{7.7}$$
(valid for parallel or antiparallel k-vectors and constant n)
In a surprising turn of events, it is important that E(r, t ) in (7.7) be written as the
entire complex expression for the electric field rather than just the real part. When
we do this, the formula (7.7) automatically time-averages over rapid oscillations
in such a way that I retains a slowly varying time dependence. This expression is
reminiscent of (2.62), but it should be kept in mind that we previously considered
only a single plane wave (perhaps with two distinct polarization components).
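As a numerical illustration of the claim above, the following minimal sketch compares the windowed time average of the real-field Poynting magnitude with the complex-field formula (7.7) for two collinear plane waves at r = 0. It assumes vacuum (n = 1), equal scalar amplitudes, and two arbitrarily chosen, closely spaced angular frequencies; the function names and the averaging window (one cycle of the mean frequency) are illustrative choices, not part of the text.

```python
import cmath
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m
C = 299792458.0          # vacuum speed of light in m/s


def intensity_from_complex_field(t, w1, w2, amp=1.0, n=1.0):
    """Intensity (7.7) at r = 0 from the full complex field of two plane waves."""
    field = amp * (cmath.exp(-1j * w1 * t) + cmath.exp(-1j * w2 * t))
    return 0.5 * n * EPS0 * C * (field * field.conjugate()).real


def poynting_from_real_field(t, w1, w2, amp=1.0, n=1.0):
    """Instantaneous Poynting magnitude n*eps0*c*(Re E)^2 for collinear waves."""
    real_field = amp * (math.cos(w1 * t) + math.cos(w2 * t))
    return n * EPS0 * C * real_field ** 2


def average_over_window(func, center, width, samples=2000):
    """Midpoint-rule average of func(t) over a window centered on `center`."""
    step = width / samples
    start = center - 0.5 * width
    return sum(func(start + (i + 0.5) * step) for i in range(samples)) / samples


if __name__ == "__main__":
    w1, w2 = 1.00, 1.02  # illustrative angular frequencies (arbitrary units)
    period = 2.0 * math.pi / (0.5 * (w1 + w2))  # one cycle of the mean frequency
    for t in (0.0, 40.0, 80.0, 120.0, 157.0):
        averaged = average_over_window(
            lambda s: poynting_from_real_field(s, w1, w2), t, period)
        print(t, averaged, intensity_from_complex_field(t, w1, w2))
```

The two printed columns should agree to within a percent or so of the peak value; the small residual comes from averaging over a finite window, and it grows as the two frequencies move apart.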
If some of the k-vectors point in an anti-parallel direction, we can still use
(7.7) with negative signs entered explicitly for those components. This brings up
a distinction between irradiance S and intensity I . For example, 〈S〉 is zero for
standing waves because there is no net flow of energy, whereas (7.7) still gives a
result. Intensity specifies whether atoms locally experience an oscillating electric
field without regard for whether there is a net flow of energy carried by a light field.
At extreme intensities, however, when the influence of the magnetic field becomes
comparable to that of the electric field, the distinction between propagating and
standing fields becomes important to the behavior of charged particles in that
field.
The assumption that all vectors k j are parallel is not as serious as it might
seem at first. For example, the output of a Michelson interferometer (studied
in chapter 8) is the superposition of two fields, each composed of a range of
frequencies with parallel k j ’s. We can relax the restriction of parallel k j ’s slightly
and apply (7.7) also to plane waves with nearly parallel k j ’s such as occurs in a
Young’s two-slit diffraction experiment (studied in chapter 8). In such diffraction
problems, (7.7) is viewed as an approximation valid to the extent that the vectors
k j are close to parallel. For the remainder of the chapter we will assume that the
k-vectors for all frequency components in our waveform are essentially parallel.

7.2 Group vs. Phase Velocity: Sum of Two Plane Waves

To begin our study of interference, consider just two plane waves with equal
amplitudes given by
$$\mathbf{E}_1 = \mathbf{E}_0\, e^{i(\mathbf{k}_1\cdot\mathbf{r}-\omega_1 t)} \quad\text{and}\quad \mathbf{E}_2 = \mathbf{E}_0\, e^{i(\mathbf{k}_2\cdot\mathbf{r}-\omega_2 t)} \tag{7.8}$$
As we previously studied (see P1.9), the velocities of the wave crests for these two
waves are
$$v_{p1} = \omega_1/k_1 \quad\text{and}\quad v_{p2} = \omega_2/k_2 \tag{7.9}$$
These are known as the phase velocities of the individual plane waves.
Next consider a composite wave created from the superposition of the above
two plane waves:
$$\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0\, e^{i(\mathbf{k}_1\cdot\mathbf{r}-\omega_1 t)} + \mathbf{E}_0\, e^{i(\mathbf{k}_2\cdot\mathbf{r}-\omega_2 t)} \tag{7.10}$$
The two plane waves interfere, producing regions of higher and lower intensity
that move in time. Remarkably, these intensity peaks can propagate at speeds
quite different from either of the phase velocities in (7.9). The intensity (7.7) for
the field (7.10) is computed as follows:
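$$
\begin{aligned}
I(\mathbf{r},t) &= \frac{n\epsilon_0 c}{2}\,\mathbf{E}_0\cdot\mathbf{E}_0^*\left[e^{i(\mathbf{k}_1\cdot\mathbf{r}-\omega_1 t)} + e^{i(\mathbf{k}_2\cdot\mathbf{r}-\omega_2 t)}\right]\left[e^{-i(\mathbf{k}_1\cdot\mathbf{r}-\omega_1 t)} + e^{-i(\mathbf{k}_2\cdot\mathbf{r}-\omega_2 t)}\right] \\
&= \frac{n\epsilon_0 c}{2}\,\mathbf{E}_0\cdot\mathbf{E}_0^*\left[2 + e^{i[(\mathbf{k}_2-\mathbf{k}_1)\cdot\mathbf{r}-(\omega_2-\omega_1)t]} + e^{-i[(\mathbf{k}_2-\mathbf{k}_1)\cdot\mathbf{r}-(\omega_2-\omega_1)t]}\right] \\
&= n\epsilon_0 c\,\mathbf{E}_0\cdot\mathbf{E}_0^*\left[1 + \cos\left[(\mathbf{k}_2-\mathbf{k}_1)\cdot\mathbf{r}-(\omega_2-\omega_1)t\right]\right] \\
&= n\epsilon_0 c\,\mathbf{E}_0\cdot\mathbf{E}_0^*\left[1 + \cos\left(\Delta\mathbf{k}\cdot\mathbf{r}-\Delta\omega\, t\right)\right]
\end{aligned}
\tag{7.11}
$$
where
$$\Delta\mathbf{k} \equiv \mathbf{k}_2 - \mathbf{k}_1, \qquad \Delta\omega \equiv \omega_2 - \omega_1 \tag{7.12}$$
Figure 7.1 Animation showing superposition of two plane waves (electric fields)
with different frequencies and traveling at different speeds.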
The darker line in Fig. 7.2 shows the intensity computed with (7.11). Keep in
mind that this intensity is averaged over rapid oscillations. For comparison, the
lighter line shows the Poynting flux with the rapid oscillations retained, according
to (7.5). It is left as an exercise (see P7.3) to show that the rapid-oscillation peaks
in Fig. 7.2 move with a phase velocity derived from the average k and average ω
of the two plane waves.
Figure 7.2 Intensity of two interfering plane waves. The solid line shows
intensity averaged over rapid oscillations. (Axes: intensity versus position.)
A careful examination of the cosine argument in (7.11) reveals that the time-
averaged curve in Fig. 7.2 (solid) travels with speed
$$v_g \equiv \frac{\Delta\omega}{\Delta k} \tag{7.13}$$
This is known as the group velocity. Essentially, $v_g$ may be thought of as the
velocity for the envelope that encloses the rapid oscillations.
In general, $v_g$ and $v_p$ are not the same. This means that as the waveform
propagates, the rapid oscillations move within the larger modulation pattern, for
example, continually disappearing at the front and reappearing at the back of
each modulation. The group velocity is identified with the propagation of overall
waveforms. The presence of field energy in a waveform is clearly tied more to $v_g$
than to $v_p$.
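To make (7.13) concrete, here is a small sketch that follows one maximum of the time-averaged intensity (7.11) along the propagation axis and measures how fast it moves. The wavenumbers and frequencies are arbitrary illustrative values (not taken from the text), the waves are assumed collinear so that Δk·r reduces to Δk z, and the brute-force peak search is just one simple way to locate the envelope maximum.

```python
import math


def beat_intensity(z, t, k1, w1, k2, w2):
    """Normalized time-averaged intensity 1 + cos(dk*z - dw*t) from (7.11)."""
    return 1.0 + math.cos((k2 - k1) * z - (w2 - w1) * t)


def peak_position(t, k1, w1, k2, w2, z_min, z_max, steps=20000):
    """Brute-force search for the intensity maximum on [z_min, z_max]."""
    best_z, best_val = z_min, -1.0
    for i in range(steps + 1):
        z = z_min + (z_max - z_min) * i / steps
        val = beat_intensity(z, t, k1, w1, k2, w2)
        if val > best_val:
            best_z, best_val = z, val
    return best_z


def measured_envelope_speed(k1, w1, k2, w2, dt=2.0):
    """Speed of the envelope peak that sits at z = 0 when t = 0.

    The search window is a quarter beat length on either side of z = 0, so
    dt must be short enough that the peak stays inside it.
    """
    quarter_beat = 0.5 * math.pi / abs(k2 - k1)
    z0 = peak_position(0.0, k1, w1, k2, w2, -quarter_beat, quarter_beat)
    z1 = peak_position(dt, k1, w1, k2, w2, -quarter_beat, quarter_beat)
    return (z1 - z0) / dt


if __name__ == "__main__":
    k1, w1, k2, w2 = 10.0, 9.0, 10.5, 9.3  # illustrative values, arbitrary units
    print("phase velocities:", w1 / k1, w2 / k2)
    print("predicted group velocity:", (w2 - w1) / (k2 - k1))
    print("measured envelope speed:", measured_envelope_speed(k1, w1, k2, w2))
```

With these values the envelope moves at about 0.6 while the individual wave crests move at about 0.9, which is the distinction between group and phase velocity drawn in the text.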
Example 7.1
Determine the phase velocity and group velocity for the superposition of two plane
waves in a plasma (see P2.7).
Solution: The index of refraction is given by

$$n_{\rm plasma}(\omega) = \sqrt{1-\omega_p^2/\omega^2} < 1 \qquad (\text{assuming } \omega > \omega_p) \tag{7.14}$$
The phase velocity for each frequency is computed by
$$v_{p1} = c/n_{\rm plasma}(\omega_1), \qquad v_{p2} = c/n_{\rm plasma}(\omega_2) \tag{7.15}$$
Since $n_{\rm plasma} < 1$, both of these velocities exceed c. However, the group velocity is
$$v_g = \frac{\Delta\omega}{\Delta k} \cong \frac{d\omega}{dk} = \left[\frac{dk}{d\omega}\right]^{-1} = \left[\frac{d}{d\omega}\,\frac{\omega\, n_{\rm plasma}(\omega)}{c}\right]^{-1} = n_{\rm plasma}(\omega)\, c \tag{7.16}$$
which is clearly less than c. The derivation of the final expression in (7.16) from
the previous one is left as an exercise. For convenience, we have taken ω1 and ω2
to lie very close to each other.

John William Strutt (3rd Baron Rayleigh) (1842–1919, British) was
born in Langford Grove, Essex, England and was frequently ill during
his youth. He entered the University of Cambridge in 1861 and graduated four
years later as senior wrangler in mathematics. He married in 1871 and became
the father of three sons. In 1873, Strutt inherited the Barony of Rayleigh (and
the title Lord Rayleigh) from his father who died that year. In 1879 Strutt
succeeded James Clerk Maxwell as the Cavendish Professor of Physics at Cambridge.
Rayleigh studied a wide variety of subjects. He is credited with the discovery
of argon. He studied how atoms scatter light (Rayleigh scattering) and
explained why the sky is blue. He developed the notion of group velocity and
used it to understand the propagation of sound. He won the Nobel prize in
physics in 1904.

Example 7.1 illustrates that in an environment where the index of refraction is
real (i.e. no net exchange of energy with the medium), the group velocity does not
exceed c, even when the phase velocity does. The ‘fast-moving’ phase velocity
$v_p$ results merely from an interplay between the field and the plasma. In a similar
sense, the speed at which an ocean wave's point of intersection travels along the
shoreline can also exceed c, if different points on the wave front happen to strike
the shore nearly simultaneously. The point of intersection between the wave and the shoreline does not
constitute an actual object under motion. Similarly, wave crests of individual
plane waves do not necessarily constitute actual objects that are moving. In short,
$v_p$ is not the relevant speed at which events upstream influence events
downstream.
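The numbers in Example 7.1 can be checked with a short sketch. It evaluates the plasma index (7.14), the phase velocity from (7.15), and a finite-difference estimate of dω/dk, which is the close-spacing limit of (7.13). The plasma frequency and light frequency below are hypothetical values chosen only for illustration, and the central-difference step size is an assumption of this sketch.

```python
import math

C = 299792458.0  # vacuum speed of light in m/s


def plasma_index(w, wp):
    """Refractive index (7.14) of a plasma, valid for w > wp."""
    if w <= wp:
        raise ValueError("this sketch only handles w > wp (real index)")
    return math.sqrt(1.0 - (wp / w) ** 2)


def wavenumber(w, wp):
    """k(w) = n(w) * w / c."""
    return plasma_index(w, wp) * w / C


def phase_velocity(w, wp):
    """v_p = w / k = c / n."""
    return w / wavenumber(w, wp)


def group_velocity(w, wp, rel_step=1e-6):
    """Central-difference estimate of dw/dk at frequency w."""
    dw = rel_step * w
    dk = wavenumber(w + dw, wp) - wavenumber(w - dw, wp)
    return 2.0 * dw / dk


if __name__ == "__main__":
    wp, w = 1.0e15, 2.0e15  # hypothetical plasma and light frequencies (rad/s)
    print("n   =", plasma_index(w, wp))
    print("v_p =", phase_velocity(w, wp) / C, "c")
    print("v_g =", group_velocity(w, wp) / C, "c")
```

The finite-difference group velocity should agree with the closed form n c in (7.16), and the product of phase and group velocity comes out equal to c² for this index.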
Individual plane waves have infinite length and infinite duration. They do not
exist in isolation except in our imagination. All real waveforms are comprised of a
range of frequency components, and so interference always happens. Energy is
associated with regions of constructive interference between those waves. Group
velocity v g tracks the presence of field energy, whether that energy propagates
or is extracted from a material. If there is an exchange of energy between the
field and the medium (i.e. if the index of refraction is complex), v g can in fact
become superluminal. Nevertheless, energy (or signals) are never transported
faster than the universal speed limit c. An examination of energy flow in these
exotic situations is given in Appendix 7.B.

7.3 Frequency Spectrum of Light

A waveform constructed from a discrete sum (as in the previous two sections)
must eventually repeat over and over (i.e. it is periodic). To create a waveform that
does not repeat (e.g. a single laser pulse or, technically speaking, any waveform
that exists in the physical world since no light source repeats forever) we must
replace the discrete sum (7.1) with an integral that combines a continuum of
plane waves. Such a waveform at a point r can be expressed as
$$\mathbf{E}(\mathbf{r},t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r},\omega)\, e^{-i\omega t}\, d\omega \tag{7.17}$$
The function E (r, ω), called the spectrum, has units of field per frequency. Essen-
tially, it gives the amplitude and phase of each plane wave that makes up the over-
all waveform. It includes any spatially dependent factors such as exp {i k (ω) · r}.
We distinguish the spectrum E (r, ω) from the wholly separate function E(r, t ) by
its argument (i.e. ω instead of t ). (Sorry for using E for both functions, but this is
standard notation.) The operation (7.17) is called an inverse Fourier transform
as outlined in section 0.3. The factor $1/\sqrt{2\pi}$ is introduced to match our Fourier-
transform convention. Regardless of what the function is called, please notice
that (7.17) merely sums together a range of plane waves in much the same way
that our earlier discrete summation (7.1) does.
If we have a waveform E(r, t ), one might wonder what plane waves should be
added together in order to construct it. Equation (7.17) can be inverted, which
remarkably has a very similar form:
$$\mathbf{E}(\mathbf{r},\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r},t)\, e^{i\omega t}\, dt \tag{7.18}$$
This operation is called the Fourier transform. It is used to generate the spectrum
E (r, ω) from the field E(r, t ) in much the same way that (7.17) is used to generate
the field E(r, t ) from the spectrum E (r, ω).
Although only the real part of E(r, t ) is physically relevant, we can continue
our habit of working with the complex field and taking the real part at our leisure.
As we shall see, often there is no problem with taking the Fourier transform of
a complex field.1 In fact, we will find it advantageous to work with the complex
field instead of only the real part.
The intensity formula (7.7) remains useful for continuous superpositions of
plane waves (i.e. a field defined by the inverse Fourier transform (7.17)):
$$I(\mathbf{r},t) \equiv \frac{n\epsilon_0 c}{2}\,\mathbf{E}(\mathbf{r},t)\cdot\mathbf{E}^*(\mathbf{r},t) \tag{7.19}$$
1 Since Fourier transforms are linear, one can take the Fourier transform of the real and imaginary
parts of a field separately. Appropriate modifications to E (r, ω) in the frequency domain will not
cause the two parts to become mingled. Upon taking the inverse Fourier transform to obtain E(r, t )
again, the original real part remains purely real, and the original imaginary part remains purely
imaginary.

This formula specifically requires the fields to be in complex format, and it takes
care of the time-average over rapid oscillations automatically.2
Similarly, we will define the power spectrum produced from E (r, ω), which we
write as
$$I(\mathbf{r},\omega) \equiv \frac{n\epsilon_0 c}{2}\,\mathbf{E}(\mathbf{r},\omega)\cdot\mathbf{E}^*(\mathbf{r},\omega) \tag{7.20}$$
The power spectrum I (r, ω) is what one observes when the waveform is sent into
a spectral analyzer or spectrometer. We must apologize again for the potentially
confusing notation (in wide usage): I (r, ω) is not the Fourier transform of I (r, t )!
These functions are defined only through (7.17) and (7.20).
Parseval’s theorem (see example 0.7) imposes an interesting connection be-
tween the time-integral of the intensity and the frequency-integral of the power
spectrum:
$$\int_{-\infty}^{\infty} I(\mathbf{r},t)\, dt = \int_{-\infty}^{\infty} I(\mathbf{r},\omega)\, d\omega \tag{7.21}$$
As a reminder, the above expressions for I (r, t ) and I (r, ω) assume that all relevant
k-vectors are essentially parallel.
With the above formalities out of the way, we will illustrate the use of Fourier
transforms through some examples.
Figure 7.3 Real part of electric field (7.22) with $T = 4\pi/\omega_0$ and $T = 10\pi/\omega_0$,
where $2\pi/\omega_0$ is the period of the carrier frequency.
Example 7.2
Find $\mathbf{E}(\mathbf{r},\omega)$ associated with the field
$$\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0(\mathbf{r})\, e^{-t^2/2T^2}\, e^{-i\omega_0 t} \tag{7.22}$$
The real part of this field is shown in Fig. 7.3 for two different durations T . The
intensity profile computed by (7.19) is also shown.
Solution: The argument r is unimportant to our calculation. It merely specifies
that we are considering the field at the point r. We compute the Fourier transform
as follows:
$$
\begin{aligned}
\mathbf{E}(\mathbf{r},\omega) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}_0(\mathbf{r})\, e^{-t^2/2T^2}\, e^{-i\omega_0 t}\, e^{i\omega t}\, dt \\
&= \frac{\mathbf{E}_0(\mathbf{r})}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-t^2/2T^2 + i(\omega-\omega_0)t}\, dt
\end{aligned}
\tag{7.23}
$$
This integral can be performed with the help of (0.55), and we obtain
$$\mathbf{E}(\mathbf{r},\omega) = T\,\mathbf{E}_0(\mathbf{r})\, e^{-\frac{T^2(\omega-\omega_0)^2}{2}} \tag{7.24}$$
Notice that $\mathbf{E}(\mathbf{r},\omega)$ has units of field multiplied by time, or in other words, field per
frequency.
Figure 7.4 The intensity (7.19) of the fields in Fig. 7.3.
2 To use this expression there needs to be a sufficient number of oscillations within the waveform
to make the rapid time average meaningful.

In general, E (r, ω) is a complex function (although it turns out to have uniform
complex phase in the above example, matching the phase of E0 ). E (r, ω) keeps
track of the amplitude and phase of each plane wave needed to compose the
waveform E(r, t ). More often than not, E (r, ω) exhibits a complicated complex
phase structure, depending on the time-shape of E(r, t ).
The spectrum (7.18) of the field in example 7.2 is shown in Fig. 7.5. The
corresponding power spectrum (7.20) is also plotted in Fig. 7.6. As expected, the
waveform includes frequencies in the neighborhood of ω0 . A range of frequencies
is needed to construct waveforms that turn on and off. The shorter the duration
of the waveform, the more frequency components are necessary. This trend
can be seen for the two pulse durations T plotted.
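The result of Example 7.2 can be checked numerically. The sketch below evaluates the Fourier transform (7.18) of the Gaussian pulse (7.22) with a plain midpoint-rule sum and compares it with the closed form (7.24). It assumes E0 = 1, works in units where ω0 = 1, and truncates the integral at ±8T; those choices, and the function names, are assumptions of this illustration rather than anything prescribed by the text.

```python
import cmath
import math


def gaussian_pulse(t, T, w0):
    """Field (7.22) with E0 = 1: Gaussian envelope times carrier exp(-i*w0*t)."""
    return math.exp(-t ** 2 / (2.0 * T ** 2)) * cmath.exp(-1j * w0 * t)


def fourier_transform(func, w, t_max, steps=20000):
    """Midpoint-rule estimate of (7.18) on the interval [-t_max, t_max]."""
    dt = 2.0 * t_max / steps
    total = 0j
    for i in range(steps):
        t = -t_max + (i + 0.5) * dt
        total += func(t) * cmath.exp(1j * w * t)
    return total * dt / math.sqrt(2.0 * math.pi)


def analytic_spectrum(w, T, w0):
    """Closed form (7.24) with E0 = 1."""
    return T * math.exp(-(T * (w - w0)) ** 2 / 2.0)


if __name__ == "__main__":
    w0 = 1.0
    for T in (4.0 * math.pi / w0, 10.0 * math.pi / w0):
        for w in (0.9, 1.0, 1.05):
            numeric = fourier_transform(lambda t: gaussian_pulse(t, T, w0), w, 8.0 * T)
            print(T, w, abs(numeric), analytic_spectrum(w, T, w0))
```

Normalized to its own peak, the spectrum of the shorter pulse falls off more slowly away from ω0, which is the trend described for the two durations.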
Example 7.3
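Check Parseval’s theorem for the field and spectrum in Example 7.2.
Solution: The time integration in (7.21) yields
$$
\begin{aligned}
\int_{-\infty}^{\infty} I(\mathbf{r},t)\, dt &= \frac{n\epsilon_0 c}{2}\,\mathbf{E}_0(\mathbf{r})\cdot\mathbf{E}_0^*(\mathbf{r})\int_{-\infty}^{\infty} e^{-t^2/T^2}\, dt \\
&= \frac{n\epsilon_0 c}{2}\,\mathbf{E}_0(\mathbf{r})\cdot\mathbf{E}_0^*(\mathbf{r})\, T\sqrt{\pi}
\end{aligned}
$$
where we have used (0.55) to perform the integration. This result has units of
energy per area. It is the energy per area, for example, of the entire pulse absorbed
by a detector. The frequency integration in (7.21) yields
$$
\begin{aligned}
\int_{-\infty}^{\infty} I(\mathbf{r},\omega)\, d\omega &= \frac{n\epsilon_0 c}{2}\,\mathbf{E}_0(\mathbf{r})\cdot\mathbf{E}_0^*(\mathbf{r})\, T^2\int_{-\infty}^{\infty} e^{-T^2(\omega-\omega_0)^2}\, d\omega \\
&= \frac{n\epsilon_0 c}{2}\,\mathbf{E}_0(\mathbf{r})\cdot\mathbf{E}_0^*(\mathbf{r})\, T^2\,\frac{\sqrt{\pi}}{T}
\end{aligned}
$$
which is the same answer. The interpretation of this latter expression is the area
under the spectral intensity curve measured when the waveform is sent into a
spectrometer.
Figure 7.5 Spectral components (7.24) of the fields in Fig. 7.3 with $T = 4\pi/\omega_0$
and $T = 10\pi/\omega_0$, where $2\pi/\omega_0$ is the period of the carrier frequency.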
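The two integrals in Example 7.3 can also be evaluated numerically as a sanity check. This sketch integrates |E(t)|² for the pulse (7.22) and |E(ω)|² for the spectrum (7.24), both with E0 = 1 and with the common prefactor nε0c/2 dropped, and compares each with T√π. The integration limits (eight widths on either side) and the midpoint rule are assumptions of this illustration.

```python
import math


def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule integral of func over [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * dx) for i in range(steps)) * dx


def time_side(T):
    """Integral over t of |E(t)|^2 for the pulse (7.22) with E0 = 1."""
    return integrate(lambda t: math.exp(-t ** 2 / T ** 2), -8.0 * T, 8.0 * T)


def frequency_side(T, w0):
    """Integral over w of |E(w)|^2 for the spectrum (7.24) with E0 = 1."""
    half_width = 8.0 / T
    return integrate(lambda w: T ** 2 * math.exp(-(T * (w - w0)) ** 2),
                     w0 - half_width, w0 + half_width)


if __name__ == "__main__":
    w0 = 1.0
    T = 4.0 * math.pi / w0
    print(time_side(T), frequency_side(T, w0), T * math.sqrt(math.pi))
```

All three printed numbers should coincide, as Parseval's theorem (7.21) requires.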
As mentioned previously, the inverse Fourier transform is interpreted as summing
together many plane waves to create a waveform.
Figure 7.6 Power spectrum based on (7.27) for the spectral components shown
in Fig. 7.5.
Example 7.4
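Take the inverse Fourier transform of (7.24) to recover the original waveform (7.22).
Solution: The inverse Fourier transform (7.17) is
$$
\begin{aligned}
\mathbf{E}(\mathbf{r},t) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r},\omega)\, e^{-i\omega t}\, d\omega \\
&= \frac{T\,\mathbf{E}_0(\mathbf{r})}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{T^2(\omega-\omega_0)^2}{2}}\, e^{-i\omega t}\, d\omega \\
&= \frac{T\,\mathbf{E}_0(\mathbf{r})}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{T^2\omega^2}{2} + (T^2\omega_0 - it)\omega - \frac{T^2\omega_0^2}{2}}\, d\omega
\end{aligned}
\tag{7.25}
$$
This integral can be performed with the help of (0.55), which gives
$$
\begin{aligned}
\mathbf{E}(\mathbf{r},t) &= \frac{T\,\mathbf{E}_0(\mathbf{r})}{\sqrt{2\pi}}\sqrt{\frac{\pi}{T^2/2}}\; e^{\frac{(T^2\omega_0 - it)^2}{4(T^2/2)} - \frac{T^2\omega_0^2}{2}} \\
&= \mathbf{E}_0(\mathbf{r})\, e^{-t^2/2T^2}\, e^{-i\omega_0 t}
\end{aligned}
$$
Q.E.D.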
Since only the real part of the time profile E(r, t ) is physically relevant, students
might be curious about how the Fourier transform of the real part of the field
compares with that of the complex version of the field that we have been using.
Indeed, there are situations where it is more appropriate to use the real version
of the field rather than its complex form. For example, if a waveform includes
multiple propagation directions or if a waveform contains only a few cycles, then
the approximations used to derive (7.19) fail and the convenience of the complex
format begins to wane.
Fourier Transform of the Real Part of a Field
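The real part of (7.22) is
$$
\begin{aligned}
\mathbf{E}_r(\mathbf{r},t) &= \frac{\mathbf{E}(\mathbf{r},t) + \mathbf{E}^*(\mathbf{r},t)}{2} \\
&= e^{-t^2/2T^2}\,\frac{\mathbf{E}_0(\mathbf{r})\, e^{-i\omega_0 t} + \mathbf{E}_0^*(\mathbf{r})\, e^{i\omega_0 t}}{2}
\end{aligned}
\tag{7.26}
$$
If $\mathbf{E}_0(\mathbf{r})$ happens to be real, then this field can be written as $\mathbf{E}_0(\mathbf{r})\, e^{-t^2/2T^2}\cos(\omega_0 t)$.
Upon applying (7.18) we get (see P0.24)
$$\mathbf{E}_r(\mathbf{r},\omega) = T\,\frac{\mathbf{E}_0(\mathbf{r})\, e^{-\frac{T^2(\omega-\omega_0)^2}{2}} + \mathbf{E}_0^*(\mathbf{r})\, e^{-\frac{T^2(\omega+\omega_0)^2}{2}}}{2} \tag{7.27}$$
The first term follows directly from (7.24); the conjugate term, whose carrier is
$e^{i\omega_0 t}$, is centered at $-\omega_0$.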
One thing to notice is that the transform of the real part of the field tends to be more
cumbersome than the transform of the entire complex field, which is a primary
reason the complex format is more often used. The spectrum is shown in Fig. 7.7.
Notice that both positive and negative frequency components contribute to
the overall spectrum. Moreover, the Fourier transform of a real function $\mathbf{E}_r(\mathbf{r},t)$
obeys the following symmetry relation:
$$\mathbf{E}_r(\mathbf{r},-\omega) = \mathbf{E}_r^*(\mathbf{r},\omega) \qquad (\text{if } \mathbf{E}_r(\mathbf{r},t) \text{ is real}) \tag{7.28}$$
Notice that the Fourier transform of the real field depicted in Fig. 7.7 obeys this
symmetry relation (7.28), whereas the Fourier transform of the complex field
depicted in Fig. 7.5 does not. Essentially, the spectrum of the complex representation
of the field can be understood to be twice the spectrum of the real representation,
but plotted only for the positive frequencies.
Figure 7.7 Spectrum based on (7.27) with $T = 10\pi/\omega_0$. Compare with the
lower curve in Fig. 7.5.
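The symmetry relation (7.28) and the two-peaked spectrum of the real field can be verified with a short numerical sketch. It takes the real part of the pulse (7.22) for a complex amplitude E0, evaluates the transform (7.18) with a midpoint-rule sum, and compares it with a closed form in which, by linearity from (7.24), E0 weights the peak at +ω0 and E0* weights the peak at −ω0. The amplitude phase of 0.7 rad, the units with ω0 = 1, and the integration limits are arbitrary assumptions of this sketch.

```python
import cmath
import math


def real_field(t, T, w0, e0):
    """Real part of the pulse (7.22) for a (possibly complex) amplitude e0."""
    return (e0 * math.exp(-t ** 2 / (2.0 * T ** 2)) * cmath.exp(-1j * w0 * t)).real


def fourier_transform(func, w, t_max, steps=20000):
    """Midpoint-rule estimate of (7.18) on the interval [-t_max, t_max]."""
    dt = 2.0 * t_max / steps
    total = 0j
    for i in range(steps):
        t = -t_max + (i + 0.5) * dt
        total += func(t) * cmath.exp(1j * w * t)
    return total * dt / math.sqrt(2.0 * math.pi)


def analytic_real_spectrum(w, T, w0, e0):
    """Two Gaussians: weight e0 at +w0 and weight conj(e0) at -w0."""
    plus = e0 * math.exp(-(T * (w - w0)) ** 2 / 2.0)
    minus = e0.conjugate() * math.exp(-(T * (w + w0)) ** 2 / 2.0)
    return 0.5 * T * (plus + minus)


if __name__ == "__main__":
    w0, T, e0 = 1.0, 10.0 * math.pi, cmath.exp(0.7j)
    for w in (1.0, -1.0, 0.95):
        numeric = fourier_transform(lambda t: real_field(t, T, w0, e0), w, 8.0 * T)
        print(w, numeric, analytic_real_spectrum(w, T, w0, e0))
```

The value printed for −ω should be the complex conjugate of the value printed for +ω, and near ω = ω0 the real-field spectrum is half of the complex-field spectrum (7.24).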
7.4 Packet Propagation and Group Delay

Once we have the spectrum for a waveform, we can apply effects to the individual
spectral components. In particular, we can find how a waveform propagates,
taking advantage of our knowledge of how individual plane waves propagate. At
any point, we can perform an inverse Fourier transform on the modified spectral
components to see how the waveform looks in time. Thus, we can predict the
temporal profile of a waveform at any location given knowledge of that waveform
at a particular location.
Let E(r0 , t ) give the temporal profile of a pulse at some point r0 in a medium.
The spectrum of this pulse E(r0 , ω) (found using (7.18)) gives the amplitudes and
phases of the individual plane wave components at the point r0 . We already
know how to propagate individual plane waves through a material (see (2.20)). A
phase shift associated with a displacement ∆r modifies the spectral components
according to
$$\mathbf{E}(\mathbf{r}_0+\Delta\mathbf{r},\omega) = \mathbf{E}(\mathbf{r}_0,\omega)\, e^{i\mathbf{k}(\omega)\cdot\Delta\mathbf{r}} \tag{7.29}$$
The k-vector contains the frequency-dependent information about the material
via k = n(ω)ω/c. (A complex wave vector k may also be used if absorption or
amplification is present.) Once we have the spectrum E (r0 + ∆r, ω) at the end of
propagation, we take the inverse Fourier transform to determine the waveform
E (r0 + ∆r, t ) at the new position:
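$$
\begin{aligned}
\mathbf{E}(\mathbf{r}_0+\Delta\mathbf{r},t) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r}_0+\Delta\mathbf{r},\omega)\, e^{-i\omega t}\, d\omega \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r}_0,\omega)\, e^{i(\mathbf{k}(\omega)\cdot\Delta\mathbf{r}-\omega t)}\, d\omega
\end{aligned}
\tag{7.30}
$$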
As a reminder, the second line of (7.30) is a sum over traveling plane waves.
Being able to predict the shape and arrival time of a waveform is important
since a waveform traversing a material such as glass can undergo significant
temporal dispersion as different frequency components experience different indices
of refraction. Each frequency component propagates at its own phase velocity,
and if the frequency components have a large spread in phase velocities the
temporal form of a pulse can change significantly. For example, an ultra-short laser
pulse traversing a glass window or a lens can emerge with significantly longer
duration, owing to this effect.
The exponent in (7.29) is called the phase delay for the pulse propagation. It
is often expanded in a Taylor series about the pulse carrier frequency ω0 :
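$$\mathbf{k}\cdot\Delta\mathbf{r} \cong \left[\mathbf{k}\Big|_{\omega_0} + \frac{\partial\mathbf{k}}{\partial\omega}\bigg|_{\omega_0}(\omega-\omega_0) + \frac{1}{2}\frac{\partial^2\mathbf{k}}{\partial\omega^2}\bigg|_{\omega_0}(\omega-\omega_0)^2 + \cdots\right]\cdot\Delta\mathbf{r} \tag{7.31}$$
The k-vector has a sometimes-complicated frequency dependence through the
functional form of n(ω). If we retain only the first two terms in this expansion
then (7.30) becomes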
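$$
\begin{aligned}
\mathbf{E}(\mathbf{r}_0+\Delta\mathbf{r},t) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r}_0,\omega)\, e^{i\left(\left[\mathbf{k}(\omega_0) + \frac{\partial\mathbf{k}}{\partial\omega}\big|_{\omega_0}(\omega-\omega_0)\right]\cdot\Delta\mathbf{r} - \omega t\right)}\, d\omega \\
&= e^{i\left[\mathbf{k}(\omega_0) - \omega_0\frac{\partial\mathbf{k}}{\partial\omega}\big|_{\omega_0}\right]\cdot\Delta\mathbf{r}}\;\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r}_0,\omega)\, e^{-i\omega\left(t - \frac{\partial\mathbf{k}}{\partial\omega}\big|_{\omega_0}\cdot\Delta\mathbf{r}\right)}\, d\omega \\
&= e^{i\left[\mathbf{k}(\omega_0)\cdot\Delta\mathbf{r} - \omega_0 t'\right]}\;\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r}_0,\omega)\, e^{-i\omega(t-t')}\, d\omega
\end{aligned}
\tag{7.32}
$$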
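where in the last line we have used the definition
$$t' \equiv \frac{\partial\mathbf{k}}{\partial\omega}\bigg|_{\omega_0}\cdot\Delta\mathbf{r} \cong \frac{\partial\,\mathrm{Re}\,\mathbf{k}}{\partial\omega}\bigg|_{\omega_0}\cdot\Delta\mathbf{r} \tag{7.33}$$
and assumed that the imaginary part of k is roughly constant near ω0 so that $t'$ is
real. The integral in (7.32) is then seen to be simply the inverse Fourier transform
(7.17) of the original spectrum, evaluated at the new time argument $t - t'$. It
therefore returns the original pulse with a shifted time argument, and we obtain
$$\mathbf{E}(\mathbf{r}_0+\Delta\mathbf{r},t) = e^{i\left[\mathbf{k}(\omega_0)\cdot\Delta\mathbf{r} - \omega_0 t'\right]}\,\mathbf{E}\left(\mathbf{r}_0, t - t'\right) \tag{7.34}$$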
The first factor in (7.34) merely gives an overall phase shift due to propagation. It
is dictated by the phase velocity of the carrier frequency (see (7.15)):
$$v_p^{-1}(\omega_0) = \frac{k(\omega_0)}{\omega_0} \tag{7.35}$$
Otherwise the pulse in (7.34) is unaltered except for a delay $t'$, the time required
for the pulse to traverse the displacement Δr. The function
$\partial\,\mathrm{Re}\,\mathbf{k}/\partial\omega\cdot\Delta\mathbf{r}$ is known as
the group delay function, and in (7.33) it is evaluated only at the carrier frequency
ω0 . Traditional group velocity is obtained by dividing the displacement Δr by the
group delay time $t'$ to obtain
$$v_g^{-1}(\omega_0) = \frac{\partial\,\mathrm{Re}\{k(\omega)\}}{\partial\omega}\bigg|_{\omega_0} \tag{7.36}$$

Group delay (or group velocity) essentially tracks the center of the packet.
In our derivation we have assumed that the phase delay k(ω)·∆r could be well-
represented by the first two terms of the expansion (7.31). While this assumption
gives results that are often useful, the other terms also play a role. In section 7.5
we’ll study what happens if you keep the next higher order term in the expansion.
We’ll find that this term controls the rate at which the wave packet spreads as it
travels. We should also note that there are times when the expansion (7.31) fails
to converge (usually when ω0 is near a resonance of the medium), and the above
expansion approach is not valid. We’ll address how to analyze pulse propagation
for these situations in section 7.6.
7.5 Quadratic Dispersion
A light pulse traversing a material in general undergoes dispersion when different
frequency components take on different phase velocities. As an example, consider
a short laser pulse traversing an optical component such as a lens or window,
as depicted in Fig. 7.8. The light can undergo temporal dispersion, where a
short light pulse spreads out in time with the different frequency components
becoming separated (often called stretching or chirping). Dispersion can occur
even if the optic absorbs very little of the light. Dispersion does not alter the
power spectrum of the light pulse (7.20), ignoring absorption or reflections. That
is, if the amplitude of E(r, ω) does not change, but only its phase according to
(7.29), then the power spectrum (7.20) is unaltered.
Figure 7.8 A 25 fs pulse traversing a 1 cm piece of BK7 glass. (Figure labels: 25 fs, 56 fs.)
To compute the effect of dispersion on a pulse after it travels a distance in
glass, we need to choose a specific pulse form. Suppose that just before entering
the glass, the pulse has a Gaussian temporal profile given by (7.22). We’ll place
r0 at the start of the glass at z = 0 and assume that all plane-wave components
travel in the ẑ-direction, so that k · ∆r = kz. Let the polarization of the field be the
same for all frequencies. The Fourier transform of the Gaussian pulse is given in
(7.24). Hence we have
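$$
\begin{aligned}
\mathbf{E}(0,t) &= \mathbf{E}_0\, e^{-t^2/2T^2}\, e^{-i\omega_0 t} \\
\mathbf{E}(0,\omega) &= T\,\mathbf{E}_0\, e^{-\frac{T^2(\omega-\omega_0)^2}{2}}
\end{aligned}
\tag{7.37}
$$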
To find the field downstream we invoke (7.29), which gives the appropriate phase
shift for each plane wave component:
$$
\mathbf{E}(z,\omega) = \mathbf{E}(0,\omega)\,e^{ik(\omega)z} = T\,\mathbf{E}_0\, e^{-\frac{T^2(\omega-\omega_0)^2}{2}}\, e^{ik(\omega)z}
\tag{7.38}
$$
To find the waveform at the new position z (where the pulse presumably has just
exited the glass), we take the inverse Fourier transform of (7.38). However, before
doing this we must specify the function k (ω). For example, if the glass material is
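replaced by vacuum, the wave number is simply k_vac(ω) = ω/c. In this case, the
final waveform is
$$
\text{(vacuum)}\qquad
\mathbf{E}(z,t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}_0\,T\, e^{-\frac{T^2(\omega-\omega_0)^2}{2}}\, e^{i\frac{\omega}{c}z}\, e^{-i\omega t}\,d\omega
= \mathbf{E}_0\, e^{-\frac{(t-z/c)^2}{2T^2}}\, e^{i(k_0 z-\omega_0 t)}
\tag{7.39}
$$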
where k0 ≡ ω0/c. Not surprisingly, after traveling a distance z through vacuum, the
pulse looks identical to the original pulse, only its peak occurs at a later time z/c.
The term k 0 z appropriately adjusts the phase at different points in space so that
at the time z/c the overall phase at z goes to zero.
Of course the functional form of the k-vector in glass is different (and more
complicated) than in vacuum. One could represent the index with a Sellmeier
equation such as in P2.2, but in that case, we could only perform the inverse
Fourier transform numerically. For our present purposes, we again resort to
an expansion of the type (7.31), but this time we will keep three terms in the
expansion rather than just two as in the previous section. We will also suppose
the imaginary part of the index is negligible. We will retain up to the quadratic
term in expansion (7.31), which we write as
$$
k(\omega)z \cong k_0 z + v_g^{-1}(\omega-\omega_0)z + \alpha(\omega-\omega_0)^2 z + \cdots
\tag{7.40}
$$
where
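$$
k_0 \equiv k(\omega_0) = \frac{\omega_0\, n(\omega_0)}{c}
\tag{7.41}
$$
$$
v_g^{-1} \equiv \left.\frac{\partial k}{\partial\omega}\right|_{\omega_0} = \frac{n(\omega_0)}{c} + \frac{\omega_0\, n'(\omega_0)}{c}
\tag{7.42}
$$
$$
\alpha \equiv \frac{1}{2}\left.\frac{\partial^2 k}{\partial\omega^2}\right|_{\omega_0} = \frac{n'(\omega_0)}{c} + \frac{\omega_0\, n''(\omega_0)}{2c}
\tag{7.43}
$$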
With this approximation for k (ω), we are now able to perform the inverse
Fourier transform on (7.38):
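$$
\begin{aligned}
\mathbf{E}(z,t) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}_0\,T\, e^{-\frac{T^2(\omega-\omega_0)^2}{2}}\,
e^{ik_0 z + i v_g^{-1}(\omega-\omega_0)z + i\alpha(\omega-\omega_0)^2 z}\, e^{-i\omega t}\,d\omega \\
&= \frac{T\,\mathbf{E}_0\, e^{i(k_0 z-\omega_0 t)}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}
e^{-\left(T^2/2 - i\alpha z\right)(\omega-\omega_0)^2}\,
e^{i v_g^{-1}(\omega-\omega_0)z - i(\omega-\omega_0)t}\,d\omega
\end{aligned}
\tag{7.44}
$$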
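We can avoid considerable clutter if we change variables to ω′ ≡ ω − ω0. Then the
inverse Fourier transform becomes
$$
\mathbf{E}(z,t) = \frac{T\,\mathbf{E}_0\, e^{i(k_0 z-\omega_0 t)}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}
e^{-\frac{T^2}{2}\left(1-i2\alpha z/T^2\right)\omega'^2 - i\left(t-z/v_g\right)\omega'}\,d\omega'
\tag{7.45}
$$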

The above integral can be performed with the aid of (0.55). The result is
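$$
\begin{aligned}
\mathbf{E}(z,t) &= \frac{T\,\mathbf{E}_0\, e^{i(k_0 z-\omega_0 t)}}{\sqrt{2\pi}}
\sqrt{\frac{\pi}{\frac{T^2}{2}\left(1-i2\alpha z/T^2\right)}}\;
e^{-\frac{\left(t-z/v_g\right)^2}{4\,\frac{T^2}{2}\left(1-i2\alpha z/T^2\right)}} \\
&= \mathbf{E}_0\, e^{i(k_0 z-\omega_0 t)}\,
\frac{e^{\frac{i}{2}\tan^{-1}\frac{2\alpha z}{T^2}}}{\sqrt[4]{1+\left(2\alpha z/T^2\right)^2}}\;
e^{-\frac{\left(t-z/v_g\right)^2\left(1+i2\alpha z/T^2\right)}{2T^2\left[1+\left(2\alpha z/T^2\right)^2\right]}}
\end{aligned}
\tag{7.46}
$$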
Next, we spruce up the appearance of this rather cumbersome formula as follows:
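$$
\mathbf{E}(z,t) = \frac{\mathbf{E}_0}{\sqrt{\tilde{T}(z)/T}}\;
e^{-\frac{\left(t-z/v_g\right)^2}{2\tilde{T}^2(z)}}\;
e^{-i\frac{\left(t-z/v_g\right)^2}{2\tilde{T}^2(z)}\Phi(z) + i(k_0 z-\omega_0 t) + \frac{i}{2}\tan^{-1}\Phi(z)}
\tag{7.47}
$$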
where
$$
\Phi(z) \equiv \frac{2\alpha}{T^2}\,z
\tag{7.48}
$$
and
$$
\tilde{T}(z) \equiv T\sqrt{1+\Phi^2(z)}
\tag{7.49}
$$
We can immediately make a few observations about (7.47). First, note that at
z = 0 (i.e. zero thickness of glass), (7.47) reduces to the input pulse given in (7.37),
as we would expect. Secondly, the peak of the pulse moves at speed v_g since the
term
$$
e^{-\frac{\left(t-z/v_g\right)^2}{2\tilde{T}^2(z)}}
$$
controls the pulse amplitude, while the other terms (multiplied by i) in the
exponent of (7.47) merely alter the phase. Also note that the duration of the pulse
increases and its peak intensity decreases as it travels, since T̃(z) increases with
z. In P7.8 we will find that (7.47) also predicts that for large z, the field of the
spread-out pulse oscillates less rapidly at the beginning of the pulse than at the
end (assuming α > 0). This phenomenon is known as chirp, and indicates that
red frequencies get ahead of blue frequencies during propagation since the red
frequencies experience a lower index of refraction.
Figure 7.9 Animation of a Gaussian-envelope pulse (electric field) undergoing
dispersion during transit.
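As a rough numerical illustration of (7.41)-(7.43) and (7.48)-(7.49), the following sketch estimates k0, v_g^{-1}, and α by finite differences and then evaluates the stretched duration T̃(z). Several inputs are assumptions made only for this illustration and are not specified in the text: the Sellmeier coefficients used for BK7 glass, the 800 nm carrier wavelength, and the reading of "25 fs" as a full width at half maximum of the intensity. The function names are hypothetical.

```python
import numpy as np

C_UM_PER_FS = 0.299792458  # speed of light in micrometers per femtosecond

# Assumed Sellmeier coefficients for BK7 glass (wavelength in micrometers).
SELLMEIER_B = np.array([1.03961212, 0.231792344, 1.01046945])
SELLMEIER_C = np.array([0.00600069867, 0.0200179144, 103.560653])

def k_glass(omega):
    """Wave number k = omega*n(omega)/c in rad/um, for omega in rad/fs."""
    lam_sq = (2 * np.pi * C_UM_PER_FS / omega) ** 2
    n = np.sqrt(1.0 + np.sum(SELLMEIER_B * lam_sq / (lam_sq - SELLMEIER_C)))
    return omega * n / C_UM_PER_FS

def expansion_coefficients(omega0, h=1e-3):
    """Central-difference estimates of k0, 1/v_g and alpha in (7.41)-(7.43)."""
    k_minus, k0, k_plus = k_glass(omega0 - h), k_glass(omega0), k_glass(omega0 + h)
    inv_vg = (k_plus - k_minus) / (2 * h)
    alpha = 0.5 * (k_plus - 2 * k0 + k_minus) / h**2
    return k0, inv_vg, alpha

omega0 = 2 * np.pi * C_UM_PER_FS / 0.8      # assumed 800 nm carrier
k0, inv_vg, alpha = expansion_coefficients(omega0)

fwhm_in = 25.0                               # fs, assumed intensity FWHM
T = fwhm_in / (2 * np.sqrt(np.log(2)))       # intensity ~ exp(-t^2/T^2)
z = 1.0e4                                    # 1 cm expressed in micrometers
phi = 2 * alpha * z / T**2                   # (7.48)
T_tilde = T * np.sqrt(1 + phi**2)            # (7.49)
fwhm_out = fwhm_in * T_tilde / T             # FWHM scales with T_tilde

print(f"group index c/v_g = {C_UM_PER_FS * inv_vg:.4f}")
print(f"alpha = {alpha * 1e3:.2f} fs^2/mm, Phi = {phi:.2f}")
print(f"intensity FWHM: {fwhm_in:.0f} fs -> {fwhm_out:.0f} fs")
```

Because Φ scales as 1/T², halving the input duration quadruples Φ for the same glass, which is why very short pulses are so much more sensitive to a given thickness of material.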
While we have derived these results for the specific case of a Gaussian pulse,
the results are qualitatively similar for all pulses. Although the exact details
will vary by pulse shape, all short pulses eventually broaden and chirp as they
propagate through a dispersive medium such as glass. The higher order terms in
the expansion (7.31) cause spreading, chirping, and other deformations to the
pulses as they propagate. The influence of each order becomes progressively
more cumbersome to study analytically. In this case, it is easier to perform the
inverse Fourier transform numerically; there is no need to expand k (ω) if the
integration is done numerically.
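A minimal sketch of such a numerical inverse transform is given below, assuming a uniformly sampled envelope and NumPy's FFT routines. It works with the envelope in the delayed time frame τ = t − z/v_g and the detuning ω′ = ω − ω0, so that only the quadratic phase αω′²z remains, and it compares the result with the analytic envelope implied by (7.47). The function name `propagate_envelope` and all numerical values are hypothetical choices for illustration; any other phase function could be passed in place of the quadratic one.

```python
import numpy as np

def propagate_envelope(a0, dt, phase_of_detuning):
    """Multiply each spectral component of the envelope a0(t) by exp(i*phase)."""
    detuning = 2 * np.pi * np.fft.fftfreq(a0.size, d=dt)   # omega - omega0
    # The text's forward transform uses exp(+i*omega*t), which corresponds to
    # numpy's ifft; its inverse uses exp(-i*omega*t), which corresponds to fft.
    spectrum = np.fft.ifft(a0)
    return np.fft.fft(spectrum * np.exp(1j * phase_of_detuning(detuning)))

T = 15.0                       # input duration parameter (arbitrary units, e.g. fs)
phi = 2.0                      # chosen value of Phi(z) = 2*alpha*z/T^2
alpha_z = phi * T**2 / 2       # corresponding product alpha*z

n_pts, dt = 4096, 0.5
tau = (np.arange(n_pts) - n_pts // 2) * dt     # delayed time t - z/v_g
a0 = np.exp(-tau**2 / (2 * T**2))              # Gaussian envelope of (7.37), E0 = 1

a_num = propagate_envelope(a0, dt, lambda w: alpha_z * w**2)

# Envelope implied by (7.47) with the carrier factor exp(i(k0 z - omega0 t)) removed.
T_tilde = T * np.sqrt(1 + phi**2)
a_ref = ((1 + phi**2) ** -0.25 * np.exp(0.5j * np.arctan(phi))
         * np.exp(-tau**2 * (1 + 1j * phi) / (2 * T_tilde**2)))

max_error = np.max(np.abs(a_num - a_ref))
print(f"largest deviation from (7.47): {max_error:.2e}")
```

The time window must be wide enough to hold the stretched pulse, since the discrete transform treats the waveform as periodic and anything that spreads past the edge wraps around.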

7.6 Generalized Context for Group Delay

The expansion of k (ω) in (7.31) is inconvenient if the frequency content (band-
width) of a waveform encompasses a substantial portion of a resonance structure.
In this case, it becomes necessary to retain a large number of terms in (7.31) to
describe accurately the phase delay k (ω) · ∆r. Moreover, if the bandwidth of the
waveform is wider than the spectral resonance of the medium (such as shown
in Fig. 7.11), the series altogether fails to converge. These difficulties have led
to the traditional viewpoint that group velocity loses meaning for broadband
waveforms near a resonance. In this section, we study a broader context for group
velocity (or rather its inverse, group delay d k/d ω), which is always valid, even for
broadband pulses where the expansion (7.31) utterly fails. The analysis avoids
the expansion and so is not restricted to a narrowband context.
Figure 7.10 Real and imaginary parts of the refractive index for an absorptive medium.
We are interested in the arrival time of a waveform (or pulse) to a point, say,
where a detector is located. The definition of the arrival time of pulse energy
need only involve the Poynting flux (or the intensity), since it alone is responsible
for energy transport. To deal with arbitrary broadband pulses, the arrival time
should avoid presupposing a specific pulse shape, since the pulse may evolve
in complicated ways during propagation. For example, the pulse peak or the
midpoint on the rising edge of a pulse are poor indicators of arrival time if the
pulse contains multiple peaks or a long and non-uniform rise time.
For the reasons given, we use a time expectation integral (or time ‘center-of-mass’)
to describe the arrival time of a pulse:
$$
\langle t\rangle_{\mathbf{r}} \equiv \frac{\int_{-\infty}^{\infty} t\, I(\mathbf{r},t)\,dt}{\int_{-\infty}^{\infty} I(\mathbf{r},t)\,dt}
\tag{7.50}
$$
For simplification, we have assumed that the light travels in a uniform direction
by using intensity rather than the Poynting vector.
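For a sampled intensity, the expectation integral (7.50) reduces to a weighted average. The short sketch below assumes a uniform time grid (so the grid spacing cancels between numerator and denominator); the function name and the test waveforms are hypothetical. The second waveform has two peaks, a case where a peak-based definition of arrival time would be ambiguous.

```python
import numpy as np

def arrival_time(t, intensity):
    """Time 'center-of-mass' <t> of (7.50) for samples on a uniform grid."""
    return np.sum(t * intensity) / np.sum(intensity)

t = np.linspace(-50.0, 50.0, 5001)

# A single Gaussian intensity profile centered at t = 3.
single = np.exp(-(t - 3.0) ** 2 / 4.0)

# Two peaks of equal width at t = -10 and t = +10, the later one three times stronger.
double = np.exp(-(t + 10.0) ** 2) + 3.0 * np.exp(-(t - 10.0) ** 2)

t_single = arrival_time(t, single)   # close to 3
t_double = arrival_time(t, double)   # close to (-10 + 3*10)/4 = 5
print(t_single, t_double)
```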
Consider a pulse as it travels from point r0 to point r = r0 + ∆r in a homogeneous
medium. The difference in arrival times at the two points is
$$
\Delta t \equiv \langle t\rangle_{\mathbf{r}} - \langle t\rangle_{\mathbf{r}_0}
\tag{7.51}
$$
The pulse shape can evolve in complicated ways between the two points, spreading
with different portions being absorbed (or amplified) during transit as depicted
in Fig. 7.12. Nevertheless, (7.51) renders an unambiguous time interval
between the passage of the pulse center at each point.
Figure 7.11 Normalized power spectrum of a broadband pulse before and after
propagation through an absorbing medium with the complex index shown in
Fig. 7.10. The absorption line eats a hole in the spectrum.
Figure 7.12 Transit time defined as the difference between arrival time at two points.
(Figure labels: initial pulse, final pulse, detector.)
This difference in arrival time can be shown to consist of two terms (see
P7.11):
$$
\Delta t = \Delta t_G(\mathbf{r}) + \Delta t_R(\mathbf{r}_0)
\tag{7.52}
$$
The first term, called the net group delay, dominates if the field waveform is
initially symmetric in time (e.g. an unchirped Gaussian). It amounts to a spectral
average of the group delay function taken with respect to the spectral content of
the pulse arriving at the final point r = r0 + ∆r:
$$
\Delta t_G(\mathbf{r}) = \frac{\int_{-\infty}^{\infty} I(\mathbf{r},\omega)\left(\frac{\partial\,\mathrm{Re}\,\mathbf{k}}{\partial\omega}\cdot\Delta\mathbf{r}\right)d\omega}{\int_{-\infty}^{\infty} I(\mathbf{r},\omega)\,d\omega}
\tag{7.53}
$$
where I (r, ω) is given in (7.20). The two curves in Fig. 7.11 show I (r0 , ω) (before
propagation) and I (r, ω) (after propagation) for an initially Gaussian pulse. As
seen in (7.53), the pulse travel time depends on the spectral shape of the pulse at
the end of propagation.
Note the close resemblance between the formulas (7.50) and (7.53). Both are
expectation integrals. The former is executed as a ‘center-of-mass’ integral on
time; the latter is executed in the frequency domain on ∂Re k/∂ω · ∆r, the group
delay function. The group delay at every frequency present in the pulse influences
the result. If the pulse has a narrow bandwidth in the neighborhood of ω0, the
integral reduces to ∂Re k/∂ω|ω0 · ∆r, in agreement with (7.36) (see P7.9). The
net group delay depends only on the spectral content of the pulse, independent
of its temporal organization (i.e., the phase of E (r, ω) has no influence). Only the
real part of the k-vector plays a direct role in (7.53).
Figure 7.13 The center of a chirped pulse can shift owing to the reshaping effect
when spectrum is removed.
The second term in (7.52), called the reshaping delay, represents a delay
that arises solely from a reshaping of the spectral amplitude. Often this term is
negligible. The term takes into account how the pulse time center-of-mass shifts
as portions of the spectrum are removed (or added), as illustrated in Fig. 7.13. It
is computed at r0 before propagation takes place:3
$$
\Delta t_R(\mathbf{r}_0) = \left.\langle t\rangle_{\mathbf{r}_0}\right|_{\text{altered}} - \langle t\rangle_{\mathbf{r}_0}
\tag{7.54}
$$
Here ⟨t⟩r0 represents the usual arrival time of the pulse at the initial point r0,
according to (7.50). The intensity at this point is associated with a field E(r0, t)
whose spectrum is E(r0, ω). On the other hand, ⟨t⟩r0|altered is the arrival time of
a pulse with modified spectrum E(r0, ω) e^{−Im k·∆r}. Notice that E(r0, ω) e^{−Im k·∆r} is
still evaluated at the initial point r0. Only the spectral amplitude (not the phase)
is modified, according to what is anticipated to be lost (or gained) during the trip.
In contrast to the net group delay, the reshaping delay is sensitive to how a pulse
is organized. The reshaping delay is negligible if the pulse is initially symmetric
(in amplitude and phase) before propagation. The reshaping delay also goes to
zero in the narrowband limit, and the total delay reduces to the net group delay.
3 The reshaping delay can instead be computed after propagation takes place, in which case the
net group delay should be computed with the initial rather than final spectrum.
Example 7.5
Find the time required for a Gaussian pulse (7.22) to traverse a slab of absorbing
material (neglecting possible surface reflections). Let the material response be
described by the Lorentz model described in section 2.2 with the carrier frequency
of the pulse ω0 coinciding with the material resonance frequency. Let the slab
have thickness ∆r = cγ^{−1}/10 and absorption strength ω_p^2 = 10γ.
Figure 7.14 Animation comparing narrowband vs. broadband Gaussian pulses
traversing an absorbing slab (green stripe) on resonance. Note the logarithmic
scale. See Example 7.5.
Solution: The spectrum of the initially Gaussian pulse is given by (7.24), and its
power spectrum is4
$$
I(\mathbf{r}_0,\omega) \propto e^{-T^2(\omega-\omega_0)^2}
$$
After propagating from r0 to r = r0 + ∆r, the power spectrum becomes
$$
I(\mathbf{r},\omega) \propto e^{-T^2(\omega-\omega_0)^2}\, e^{-2\frac{\kappa(\omega)\,\omega}{c}\Delta r}
$$
The net group delay is then
$$
\Delta t_G(\mathbf{r}) = \frac{\int_{-\infty}^{\infty} I(\mathbf{r},\omega)\left(\frac{\partial(\omega n/c)}{\partial\omega}\Delta r\right)d\omega}{\int_{-\infty}^{\infty} I(\mathbf{r},\omega)\,d\omega}
= \frac{\Delta r}{c}\,\frac{\int_{-\infty}^{\infty} e^{-T^2(\omega-\omega_0)^2}\, e^{-2\frac{\kappa\omega}{c}\Delta r}\left(n+\omega\frac{\partial n}{\partial\omega}\right)d\omega}{\int_{-\infty}^{\infty} e^{-T^2(\omega-\omega_0)^2}\, e^{-2\frac{\kappa\omega}{c}\Delta r}\,d\omega}
$$
The index of refraction n + iκ is given by (2.39) (see also (2.27) and (2.29)). Since
the expressions for n and κ are complicated, the integration in the above formula
must be performed numerically.
The result when T = T1 = 10γ^{−1}/√2 (narrowband) is
$$
\Delta t_G = -5.1/\gamma = -51\,\Delta r/c = -0.72\,T_1
$$
and the result when T = T2 = γ^{−1}/√2 (broadband) is
$$
\Delta t_G = 0.67/\gamma = 6.7\,\Delta r/c = 0.95\,T_2
$$
The reshaping delay (7.54) in both cases is negligible.
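The numerical integration called for in this kind of calculation can be sketched as follows. This is only an illustration under stated assumptions: the complex index is taken to have the single-resonance Lorentz form (n + iκ)² = 1 + ω_p²/(ω0² − iωγ − ω²), frequencies are measured in units of γ with c = 1, and the ratio ω0/γ = 100 and strength ω_p² = 100γ² are arbitrary illustrative choices, so the output is not expected to reproduce the numbers quoted in Example 7.5. The function names are hypothetical.

```python
import numpy as np

def lorentz_index(omega, omega0, gamma, wp2):
    """Complex index n + i*kappa for an assumed single Lorentz resonance."""
    chi = wp2 / (omega0**2 - 1j * omega * gamma - omega**2)
    return np.sqrt(1 + chi)

def net_group_delay(T, omega0, gamma, wp2, dr, c=1.0, span=12.0, n_pts=48001):
    """Spectral average (7.53) of the group delay, weighted by the final spectrum."""
    omega = np.linspace(omega0 - span * gamma, omega0 + span * gamma, n_pts)
    index = lorentz_index(omega, omega0, gamma, wp2)
    n, kappa = index.real, index.imag
    group_delay = np.gradient(omega * n / c, omega) * dr    # d(Re k)/d(omega) * dr
    # Final power spectrum: Gaussian times absorption factor. The logarithm is
    # shifted by its maximum before exponentiating to avoid underflow.
    log_weight = -T**2 * (omega - omega0) ** 2 - 2 * kappa * omega * dr / c
    weight = np.exp(log_weight - log_weight.max())
    return np.sum(weight * group_delay) / np.sum(weight)   # uniform grid: d(omega) cancels

gamma, omega0, wp2, dr = 1.0, 100.0, 100.0, 0.1             # assumed illustrative values

delay_vacuum = net_group_delay(10 / np.sqrt(2), omega0, gamma, 0.0, dr)
delay_narrow = net_group_delay(10 / np.sqrt(2), omega0, gamma, wp2, dr)
delay_broad = net_group_delay(1 / np.sqrt(2), omega0, gamma, wp2, dr)
print(delay_vacuum, delay_narrow, delay_broad)
```

With no medium (ω_p² = 0) the function returns ∆r/c, which is a convenient sanity check before trusting the on-resonance values.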
The narrowband pulse (with duration T1) in Example 7.5 traverses the absorbing
medium superluminally (i.e. faster than c). The negative transit time
means that the ‘center-of-mass’ of the exiting pulse emerges even before the
‘center-of-mass’ of the entering pulse reaches the medium! On the other hand,
the broadband pulse (with the shorter duration T2) has a large positive delay time,
indicating that the exiting pulse emerges subluminally.
4 In general, one should write ω̄0 to distinguish the carrier frequency of the pulse from the
resonance frequency of the material ω0; in practice, these are often different.
Figure 7.14 shows the intensity profiles for these two pulses as they traverse
the absorption slab, calculated with the aid of (7.30). By eye, one can see how
the centers of the two pulses are either advanced or delayed as they go through
the absorption medium. In both cases, the pulse that emerges is well within
the envelope of the original pulse propagated forward at c. In the case of the
broadband pulse, the absorption peak eats a hole in the center of the spectrum
as shown in Fig. 7.11, causing the emerging pulse to be distorted in time. The
analysis in this section predicts the center of pulses, whereas to see the shape of
pulses one needs to calculate (7.30).
The results for the two pulse durations in Example 7.5 indicate a trend.
Superluminal behavior only occurs for long, boring (i.e. narrowband) pulses. In the
case of a single absorption resonance, this comes with a severe cost of attenuation.
Figure 7.15 shows the delay time as a function of pulse duration. As the injected
pulse becomes more sharply defined in time, the superluminal behavior does not persist.
Sharply defined waveforms (i.e. broadband) cannot propagate superluminally
precisely because much of their bandwidth lies away from the frequencies with
superluminal group delays.
Figure 7.15 Delay as a function of pulse duration.
It should be mentioned that superluminal behavior cannot persist for indefinite
distances since the medium eventually removes the superluminal spectral
components through absorption (or else adds subluminal spectral components in
the case of amplification). This limits the amount that a pulse center can be
advanced, which is on the scale of the pulse’s own duration.
As we saw for the absorption situation, the exiting pulse is tiny and resides
well within the original envelope of the pulse propagated forward at speed c,
as depicted in Fig. 7.16. Without the absorbing material in place, the signal
would be detectable just as early. This statement is also true for any spectral
behavior of a medium, including amplifying media. One can use the Lorentz model
(2.40) to describe an amplifying medium with a negative oscillator strength f.
Figure 7.17 shows narrowband and broadband pulses traversing an amplifying
medium. In this case, superluminal behavior occurs for spectra nearby but not
on an amplifying resonance. If the pulse is too broadband, its spectrum will be
amplified, which adds slower components to the overall group delay.
Figure 7.16 Narrowband pulse traversing an absorbing medium.
Appendix 7.A Pulse Chirping in a Grating Pair

Grating pairs can be used to introduce large amounts of dispersion into a light
pulse. Gratings are especially useful for amplification of ultrashort laser pulses,
where laser pulses are first stretched in time before amplification (to prevent
damage to the amplifier) and then compressed back to short duration just before
the experiment (called chirped pulse amplification). Diffraction from a grating
causes each k-vector to travel at a different angle. A second grating parallel to the
first can realign all of the k-vectors to be parallel to each other. Since laser beams
are not infinitely wide, the light is typically sent through the grating pair twice
to undo the tendency of the different frequency components becoming laterally
separated. In the present analysis, we will consider an infinitely wide plane wave
pulse incident upon a grating. The scenario is depicted in Fig. 7.19: A short plane
wave pulse strikes the grating at an angle, and a spreading pulse emerges.
Figure 7.17 Animation comparing narrowband vs. broadband Gaussian pulses
traversing an amplifying slab (green stripe) slightly off resonance.
Consider a plane-wave pulse that ricochets between a pair of parallel grating
surfaces. Although different k-vectors point with different angles, they are all
straightened out upon diffracting from the second grating. Therefore, for
simplicity we can consider all k-vectors as being parallel with each other. We will
consider a pulse just before the first bounce and just after the second bounce, but
our analysis will concentrate on the dispersion in the region between.
Consider a plane wave incident on a grating at an incident angle θi with
respect to the grating normal (aligned with the x-axis in our coordinate system) as
depicted in Fig. 7.18. The plane wave diffracts from the first grating, and reflects
away at an angle θr (also referenced from the grating normal). This angle is
governed by the grating diffraction formula5
$$
\theta_r = \sin^{-1}\left[\frac{2\pi c}{\omega d} - \sin\theta_i\right]
\tag{7.55}
$$
where d is the grating groove spacing. By examining the geometry of the figure,
we see that the reflected k-vector is given by k = [x̂ cos(θr) + ŷ sin(θr)] ω/c.
Figure 7.18 Direction of k-vector between parallel gratings (top view). Grating
rulings run in and out of the page. (Figure labels: first grating, second grating.)
Suppose we know the pulse at a point r0 on the first grating. Next we choose a
point r0 + ∆r on the second grating where we will determine the outgoing pulse.
Since we are considering an infinitely wide plane-wave pulse, it doesn’t matter
where we choose that point as long as it lies on the surface of the second grating.
The waveform will be the same everywhere along the surface of the second grating;
only its arrival time will trivially differ. For convenience, we might as well take the
second point to be r0 + ∆r = r0 + L x̂ (i.e. ∆r = L x̂) as shown in Fig. 7.18.
The phase delay needed for (7.29) becomes
$$
\mathbf{k}(\omega)\cdot\Delta\mathbf{r} = \frac{L\omega}{c}\cos\theta_r
\tag{7.56}
$$
We will express this as a Taylor-series expansion similar to (7.40) so that we can
perform the inverse Fourier transform analytically. We will approximate (7.56) as
$$
\mathbf{k}(\omega)\cdot\Delta\mathbf{r} \cong k_0 L + v_g^{-1}(\omega-\omega_0)L + \alpha(\omega-\omega_0)^2 L + \cdots
\tag{7.57}
$$
so that we can take advantage of formula (7.47). To calculate the terms in this
expansion we will need the derivative of θr:
$$
\begin{aligned}
\frac{d\theta_r}{d\omega}
&= \frac{1}{\sqrt{1-\left(\frac{2\pi c}{\omega d}-\sin\theta_i\right)^2}}\left(-\frac{2\pi c}{\omega^2 d}\right)
= \frac{1}{\sqrt{1-\sin^2\theta_r}}\left(-\frac{2\pi c}{\omega^2 d}\right) \\
&= -\frac{2\pi c}{\omega^2 d\cos\theta_r}
= -\frac{1}{\omega\cos\theta_r}\,\frac{2\pi c}{\omega d}
= -\frac{\sin\theta_i+\sin\theta_r}{\omega\cos\theta_r}
\end{aligned}
\tag{7.58}
$$
5 This formula is equivalent to d sin θi + d sin θr = λ with λ = 2πc/ω.
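The derivatives necessary for the Taylor’s series expansion are
$$
\begin{aligned}
\frac{d\mathbf{k}}{d\omega}\cdot\Delta\mathbf{r}
&= \frac{L}{c}\left(\cos\theta_r - \omega\sin\theta_r\frac{d\theta_r}{d\omega}\right) \\
&= \frac{L}{c}\left(\cos\theta_r + \sin\theta_r\,\frac{\sin\theta_i+\sin\theta_r}{\cos\theta_r}\right) \\
&= \frac{L}{c}\left(\frac{1+\sin\theta_r\sin\theta_i}{\cos\theta_r}\right)
\end{aligned}
\tag{7.59}
$$
and
$$
\begin{aligned}
\frac{d^2\mathbf{k}}{d\omega^2}\cdot\Delta\mathbf{r}
&= \frac{L}{c}\left(\sin\theta_i + \frac{\sin\theta_r\left(1+\sin\theta_r\sin\theta_i\right)}{\cos^2\theta_r}\right)\frac{d\theta_r}{d\omega} \\
&= \frac{L}{c}\left(\frac{\sin\theta_i+\sin\theta_r}{\cos^2\theta_r}\right)\left(-\frac{\sin\theta_i+\sin\theta_r}{\omega\cos\theta_r}\right) \\
&= -\frac{L}{\omega c}\,\frac{\left(\sin\theta_i+\sin\theta_r\right)^2}{\cos^3\theta_r}
\end{aligned}
\tag{7.60}
$$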
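The coefficients in (7.57) then become
$$
k_0 \equiv \left.\mathbf{k}\right|_{\omega_0}\cdot\frac{\Delta\mathbf{r}}{L} = \left.\frac{\omega_0}{c}\cos\theta_r\right|_{\omega_0}
\tag{7.61}
$$
$$
v_g^{-1} \equiv \left.\frac{d\mathbf{k}}{d\omega}\right|_{\omega_0}\cdot\frac{\Delta\mathbf{r}}{L} = \left.\frac{1+\sin\theta_r\sin\theta_i}{c\cos\theta_r}\right|_{\omega_0}
\tag{7.62}
$$
$$
\alpha \equiv \frac{1}{2}\left.\frac{d^2\mathbf{k}}{d\omega^2}\right|_{\omega_0}\cdot\frac{\Delta\mathbf{r}}{L} = -\left.\frac{\left(\sin\theta_i+\sin\theta_r\right)^2}{2c\,\omega\cos^3\theta_r}\right|_{\omega_0}
\tag{7.63}
$$
In the case of a Gaussian pulse, we can employ (7.47), where L takes the place of
z, and k0, v_g^{-1}, and α are defined by (7.61) – (7.63). The duration of the pulse is
controlled by (7.63) and the spacing between the gratings L.
Figure 7.19 Animation showing a short plane-wave pulse diffracting from a
grating positioned along the left edge of the frame.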
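The closed-form coefficients (7.62) and (7.63) can be checked against direct numerical differentiation of the phase delay (7.56). The sketch below does this for one arbitrary configuration; the groove density (1200 per mm), incidence angle (30°), and carrier wavelength (800 nm) are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import numpy as np

C_UM_PER_FS = 0.299792458  # speed of light in micrometers per femtosecond

def diffraction_angle(omega, d, theta_i):
    """Diffracted angle theta_r from the grating formula (7.55)."""
    return np.arcsin(2 * np.pi * C_UM_PER_FS / (omega * d) - np.sin(theta_i))

def phase_delay(omega, d, theta_i, L):
    """Phase delay k(omega).Delta r of (7.56)."""
    return L * omega / C_UM_PER_FS * np.cos(diffraction_angle(omega, d, theta_i))

def grating_coefficients(omega0, d, theta_i):
    """Return (1/v_g, alpha) from the closed forms (7.62) and (7.63)."""
    theta_r = diffraction_angle(omega0, d, theta_i)
    si, sr, cr = np.sin(theta_i), np.sin(theta_r), np.cos(theta_r)
    inv_vg = (1 + sr * si) / (C_UM_PER_FS * cr)
    alpha = -(si + sr) ** 2 / (2 * C_UM_PER_FS * omega0 * cr**3)
    return inv_vg, alpha

d = 1.0 / 1.2                                  # groove spacing in um (assumed)
theta_i = np.radians(30.0)                     # incidence angle (assumed)
omega0 = 2 * np.pi * C_UM_PER_FS / 0.8         # 800 nm carrier (assumed), rad/fs
L = 1.0                                        # per unit grating separation

inv_vg, alpha = grating_coefficients(omega0, d, theta_i)

# Central differences of (7.56) for comparison.
h = 1e-4
p_minus = phase_delay(omega0 - h, d, theta_i, L)
p_zero = phase_delay(omega0, d, theta_i, L)
p_plus = phase_delay(omega0 + h, d, theta_i, L)
inv_vg_fd = (p_plus - p_minus) / (2 * h) / L
alpha_fd = 0.5 * (p_plus - 2 * p_zero + p_minus) / h**2 / L

print(inv_vg, inv_vg_fd)
print(alpha, alpha_fd)
```

Note that α in (7.63) is negative for any angles, the opposite sign from the positive α assumed for glass in the discussion of chirp following (7.47).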
Appendix 7.B Causality and Exchange of Energy with the Medium

The group delay function indicates the average arrival of field energy to a point.
Since this is only part of the whole energy story, there is no problem when it
becomes superluminal. The overly rapid appearance of electromagnetic energy at
one point and its simultaneous disappearance at another point merely indicates
an exchange of energy between the electric field and the medium.
We should not be dazzled by the magician who invites the audience to look
only at the field energy while energy transfers into and out of the ‘unwatched’
domain of the medium. Extra field energy seems to appear ‘prematurely’ down-
stream only if there is already non-zero field energy downstream to stimulate a
transfer of energy from the medium. The actual transport of energy is strictly
bounded by c; superluminal propagation of a signal front is impossible.

In accordance with Poynting’s theorem (2.51), the total energy density stored
in an electromagnetic field and in a medium is given by
$$
u(\mathbf{r},t) = u_{\text{field}}(\mathbf{r},t) + u_{\text{med}}(\mathbf{r},t) + u(\mathbf{r},-\infty)
\tag{7.64}
$$
where the time-dependent accumulation of energy transferred into the medium
from the field (ignoring possible free current J_free) is
$$
u_{\text{med}}(\mathbf{r},t) = \int_{-\infty}^{t}\mathbf{E}\left(\mathbf{r},t'\right)\cdot\frac{\partial\mathbf{P}\left(\mathbf{r},t'\right)}{\partial t'}\,dt'
\tag{7.65}
$$
The expression (7.64) for the energy density includes all (relevant) forms of energy,
including a non-zero integration constant u (r, −∞) corresponding to energy
stored in the medium before the arrival of any pulse (important in the case of an
amplifying medium). u field (r, t ) and u med (r, t ) are both zero before the arrival of
the pulse (i.e. at t = −∞). In addition, u field (r, t ), given by (2.53), returns to zero
after the pulse has passed (i.e. at t = +∞).
As u med increases, the energy in the medium increases. Conversely, as u med
decreases, the medium surrenders energy to the electromagnetic field. While it is
possible for u med to become negative, the combination u med + u (−∞) (i.e. the net
energy in the medium) can never go negative since a material cannot surrender
more energy than it has to begin with.
Poynting’s theorem (2.51) has the form of a continuity equation which when
integrated spatially over a small volume V yields
$$
\oint_A \mathbf{S}\cdot d\mathbf{a} = -\frac{\partial}{\partial t}\int_V u\,dV
\tag{7.66}
$$
where the left-hand side has been transformed into a surface integral representing
the power leaving the volume. Let the volume be small enough to take S to be
uniform throughout V.
We can define an energy transport velocity (directed along S) as the effective
speed at which all of the energy density would need to travel in order to achieve
the Poynting flux:
$$
\mathbf{v}_E \equiv \frac{\mathbf{S}}{u}
\tag{7.67}
$$
Note that this ratio of the Poynting flux to the energy density has units of velocity.
When the total energy density u is used in computing (7.67), the energy transport
velocity has a fictitious nature; it is not the actual velocity of the total energy
(since part is stationary), but rather the effective velocity necessary to achieve
the same energy transport that the electromagnetic flux alone delivers. If we
reduce the denominator to the subset of the energy that can move, namely u_field,
the Cauchy-Schwarz inequality (i.e. α² + β² ≥ 2αβ) ensures an energy transport
velocity v_E that remains strictly bounded by the speed of light in vacuum c. The total
energy density u is at least as great as the field energy density u_field. Hence, this
strict luminality is maintained.
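The step from the inequality α² + β² ≥ 2αβ to the bound on the energy transport velocity can be written out as follows. This is a sketch that assumes the familiar forms S = E × B/μ0 and u_field = ε0E²/2 + B²/(2μ0) for the Poynting vector and field energy density (the text refers to (2.53) for the latter), with α = (ε0/2)^{1/2}|E| and β = |B|/(2μ0)^{1/2}, and it uses c = 1/(ε0μ0)^{1/2}:

```latex
\begin{aligned}
|\mathbf{S}| &= \frac{|\mathbf{E}\times\mathbf{B}|}{\mu_0}
 \le \frac{|\mathbf{E}|\,|\mathbf{B}|}{\mu_0}
 = 2c\left(\sqrt{\frac{\epsilon_0}{2}}\,|\mathbf{E}|\right)\left(\frac{|\mathbf{B}|}{\sqrt{2\mu_0}}\right) \\
 &\le c\left(\frac{\epsilon_0 |\mathbf{E}|^2}{2} + \frac{|\mathbf{B}|^2}{2\mu_0}\right)
 = c\,u_{\text{field}}
\end{aligned}
```

Under these assumptions |S|/u_field ≤ c, and since u ≥ u_field the same bound carries over to |S|/u.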

Centroid of Energy
Consider a weighted average of the energy transport velocity:
$$
\langle\mathbf{v}_E\rangle \equiv \frac{\int \mathbf{v}_E\,u\,d^3r}{\int u\,d^3r} = \frac{\int \mathbf{S}\,d^3r}{\int u\,d^3r}
\tag{7.68}
$$
where we have substituted from (7.67).
Integration by parts leads to
$$
\langle\mathbf{v}_E\rangle = -\frac{\int \mathbf{r}\,\nabla\cdot\mathbf{S}\,d^3r}{\int u\,d^3r} = \frac{\int \mathbf{r}\,\frac{\partial u}{\partial t}\,d^3r}{\int u\,d^3r}
\tag{7.69}
$$
where we have assumed that the volume for the integration encloses all energy in
the system and that the field near the edges of this volume is zero. Since we have
included all energy, Poynting’s theorem (2.51) can be written with no source terms
(i.e. ∇ · S + ∂u/∂t = 0). This means that the total energy in the system is conserved
and is given by the integral in the denominator of (7.69). This allows the derivative
to be brought out in front of the entire expression giving
$$
\langle\mathbf{v}_E\rangle = \frac{\partial\langle\mathbf{r}\rangle}{\partial t}
\qquad\text{where}\qquad
\langle\mathbf{r}\rangle \equiv \frac{\int \mathbf{r}\,u\,d^3r}{\int u\,d^3r}
\tag{7.70}
$$
The latter expression represents the ‘center-of-mass’ or centroid of the total
energy in the system, which is guaranteed to evolve strictly luminally since v_E is
everywhere luminal.6
It is enlightening to consider u_med within a frequency-domain context. In an
isotropic medium, the polarization for an individual plane wave can be written in
terms of the linear susceptibility defined in (2.16):
$$
\mathbf{P}(\mathbf{r},\omega) = \epsilon_0\,\chi(\mathbf{r},\omega)\,\mathbf{E}(\mathbf{r},\omega)
\tag{7.71}
$$
We can use this to express u_med in terms of the electric field and material
susceptibility.
6 Although (7.70) guarantees that the centroid of the total energy moves strictly luminally, there is
no such limitation on the centroid of field energy alone. The steps leading to (7.70) are not possible
if u_field is used in place of u. Explicitly, that is
$$
\left\langle\frac{\mathbf{S}}{u_{\text{field}}}\right\rangle \neq \frac{\partial}{\partial t}\,\frac{\int \mathbf{r}\,u_{\text{field}}\,d^3r}{\int u_{\text{field}}\,d^3r}
$$
As was pointed out, the left-hand side is strictly luminal. However, the right-hand side can easily
exceed c as the medium exchanges energy with the field. In an amplifying medium, for example, the
rapid appearance of a pulse downstream can occur when the leading portion of a pulse stimulates
energy already present in the medium to convert to the form of field energy. Group velocity is
related to this method of accounting, which is why it also can become superluminal.

Expressing u_med in terms of the power spectrum

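The field E(r, t) can be expressed as an inverse Fourier transform (7.17). Similarly,
the polarization P can be written as7
$$
\mathbf{P}(\mathbf{r},t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{P}(\mathbf{r},\omega)\,e^{-i\omega t}\,d\omega
\quad\Rightarrow\quad
\frac{\partial\mathbf{P}(\mathbf{r},t)}{\partial t} = \frac{-i}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\omega\,\mathbf{P}(\mathbf{r},\omega)\,e^{-i\omega t}\,d\omega
\tag{7.72}
$$
The energy density in the medium (7.65) can then be written as
$$
u_{\text{med}}(\mathbf{r},\infty) = \int_{-\infty}^{\infty}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}\left(\mathbf{r},\omega'\right)e^{-i\omega' t'}\,d\omega'\right]\cdot\left[\frac{-i\epsilon_0}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\omega\,\chi(\mathbf{r},\omega)\,\mathbf{E}(\mathbf{r},\omega)\,e^{-i\omega t'}\,d\omega\right]dt'
\tag{7.73}
$$
where we have incorporated (7.71) and evaluated u_med after the pulse is over at
t = ∞. We may change the order of integration and write
$$
u_{\text{med}}(\mathbf{r},\infty) = -i\epsilon_0\int_{-\infty}^{\infty}d\omega\,\omega\,\chi(\mathbf{r},\omega)\,\mathbf{E}(\mathbf{r},\omega)\cdot\int_{-\infty}^{\infty}d\omega'\,\mathbf{E}\left(\mathbf{r},\omega'\right)\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-i\left(\omega+\omega'\right)t'}\,dt'
\tag{7.74}
$$
The final integral is a delta function similar to (0.45), which allows
the middle integral also to be performed. The expression for u_med then reduces to
$$
u_{\text{med}}(\mathbf{r},\infty) = -i\epsilon_0\int_{-\infty}^{\infty}\omega\,\chi(\mathbf{r},\omega)\,\mathbf{E}(\mathbf{r},\omega)\cdot\mathbf{E}(\mathbf{r},-\omega)\,d\omega
\tag{7.75}
$$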
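In this derivation, we take E(r, t) and P(r, t) to be real functions, so we can employ
the symmetry (7.28) along with
$$
\mathbf{P}^*(\mathbf{r},\omega) = \mathbf{P}(\mathbf{r},-\omega)
\qquad\text{and}\qquad
\chi^*(\mathbf{r},\omega) = \chi(\mathbf{r},-\omega).
$$
Then we obtain
$$
u_{\text{med}}(\mathbf{r},\infty) = \epsilon_0\int_{-\infty}^{\infty}\omega\,\mathrm{Im}\,\chi(\mathbf{r},\omega)\,\mathbf{E}(\mathbf{r},\omega)\cdot\mathbf{E}^*(\mathbf{r},\omega)\,d\omega
\tag{7.76}
$$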
The expression (7.76) describes the net energy density transferred to a point
in the medium after all action has finished (i.e. at t = ∞). It involves the power
spectrum of the pulse. We can modify this formula in an intuitive way so that it
describes the transfer of energy density to the medium for any time during the
pulse.
Since the medium is unable to anticipate the spectrum of the entire pulse
before experiencing it, the material responds to the pulse according to the history
of the field up to each instant. In particular, the material has to be prepared for
the possibility of an abrupt cessation of the pulse at any moment, in which case
all exchange of energy with the medium immediately ceases. In this extreme sce-
nario, there is no possibility for the medium to recover from previously incorrect
attenuation or amplification, so it must have gotten it right already.
7 We assume that the real forms of the fields in the time domain are used for the sake of this
multiplication.

If the pulse were in fact to abruptly terminate at a given instant, it would
not be necessary to integrate the Fourier transform (7.18) beyond the
termination time t after which all contributions are zero. Causality requires that
the medium be indifferent to whether a pulse actually terminates if that possibility
lies in the future. Therefore, (7.76) can apply for any time t (not just for t = ∞)
if the spectrum (7.18) is evaluated just for that portion of the field previously
experienced by the medium (up to time t ).
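The following is then an exact representation for the energy density (7.65)
transferred to the medium:
$$
u_{\text{med}}(\mathbf{r},t) = \epsilon_0\int_{-\infty}^{\infty}\omega\,\mathrm{Im}\,\chi(\mathbf{r},\omega)\,\mathbf{E}_t(\mathbf{r},\omega)\cdot\mathbf{E}_t^*(\mathbf{r},\omega)\,d\omega
\tag{7.77}
$$
where
$$
\mathbf{E}_t(\mathbf{r},\omega) \equiv \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}\mathbf{E}\left(\mathbf{r},t'\right)e^{i\omega t'}\,dt'
\tag{7.78}
$$
This time dependence enters only through E_t(r, ω) · E_t*(r, ω), known as the
instantaneous power spectrum.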
The expression (7.77) gives physical insight into the manner in which causal
dielectric materials exchange energy with different parts of an electromagnetic
pulse. Since the function E t (ω) is the Fourier transform of the pulse truncated
at the current time t and set to zero thereafter, it can include many frequency
components that are not present in the pulse taken in its entirety. This explains
why the medium can respond differently to the front of a pulse compared to the
back. Even though absorption or amplification resonances may lie outside of
the spectral envelope of a pulse taken in its entirety, the instantaneous spectrum
on a portion of the pulse can momentarily lap onto or off of resonances in the
medium.
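The broadening of the instantaneous spectrum can be seen in a short numerical sketch of (7.78). The example below assumes a real Gaussian-envelope field sampled on a uniform grid and evaluates the truncated transform by a direct sum at just two frequencies, the carrier and a point four inverse durations above it. The function names, the carrier frequency, and the truncation times are hypothetical choices for illustration.

```python
import numpy as np

def instantaneous_spectrum(t, field, omega, t_cut):
    """E_t(omega) of (7.78): transform of the field experienced up to time t_cut."""
    dt = t[1] - t[0]
    seen = t <= t_cut                                   # portion already experienced
    kernel = np.exp(1j * np.outer(omega, t[seen]))      # exp(+i*omega*t') convention
    return kernel @ field[seen] * dt / np.sqrt(2 * np.pi)

T = 1.0                                                 # pulse duration parameter
omega_c = 20.0 / T                                      # assumed carrier frequency
t = np.linspace(-10 * T, 10 * T, 4001)
field = np.exp(-t**2 / (2 * T**2)) * np.cos(omega_c * t)   # real field

# Compare the carrier with a frequency four inverse durations away from it.
omega = np.array([omega_c, omega_c + 4.0 / T])

def wing_to_peak(t_cut):
    """Ratio of instantaneous power in the spectral wing to that at the carrier."""
    power = np.abs(instantaneous_spectrum(t, field, omega, t_cut)) ** 2
    return power[1] / power[0]

ratio_front = wing_to_peak(0.0)      # only the front half of the pulse has arrived
ratio_full = wing_to_peak(t[-1])     # the entire pulse has passed
print(ratio_front, ratio_full)
```

The wing carries a far larger share of the instantaneous power while only the front of the pulse has arrived than it does once the whole pulse has passed, which is the sense in which the instantaneous spectrum can lap onto a resonance lying outside the spectrum of the complete pulse.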
In view of (7.77) and (7.78) it is straightforward to predict when the
electromagnetic energy of a pulse will exhibit superluminal or subluminal behavior. In
section 7.6, we saw that this behavior is controlled by the group velocity function.
However, with (7.77) and (7.78), it is not necessary to examine the group velocity
directly, but only the imaginary part of the susceptibility χ(r, ω).
If the entire pulse passing through point r has a spectrum in the neighborhood
of an amplifying resonance, but not on the resonance, superluminal behavior
can result. The instantaneous spectrum during the front portion of the pulse is
generally wider and can therefore lap onto the nearby gain peak. The medium
accordingly amplifies this perceived spectrum, and the front of the pulse grows.
The energy is then returned to the medium from the latter portion of the pulse
as the instantaneous spectrum narrows and withdraws from the gain peak. The
effect is not only consistent with the principle of causality, it is a direct and general
consequence of causality as demonstrated by (7.77) and (7.78).
Figure 7.20 Real and imaginary parts of the refractive index for an amplifying medium.
As an illustration, consider the broadband waveform with T2 = γ^{−1}/√2
described in Example 7.5. Consider an amplifying medium with index shown in
Fig. 7.20 with the amplifying resonance (negative oscillator strength) set on the
frequency ω0 = ω̄0 + 2γ, where ω̄0 is the carrier frequency. Thus, the resonance
structure is centered a modest distance above the carrier frequency, and there is
only minor spectral overlap between the pulse and the resonance structure.
Superluminal behavior can occur in amplifying materials when the forward
edge of a narrow-band pulse receives extra amplification. Fig. 7.21 shows how the
early portion of a pulse has a wide instantaneous spectrum computed by (7.78)
that can lap onto the amplifying resonance. As the wings grow and access the
neighboring resonance, the pulse extracts more energy from the medium. As the
wings diminish, the pulse surrenders much of that energy back to the medium,
which shifts the center of the pulse forward.
In this appendix we have indirectly proven that a sharply defined signal edge
cannot propagate faster than c. If a signal edge begins abruptly at time t₀, the
instantaneous spectrum E_t(ω) clearly remains identically zero until that time. In
other words, no energy may be exchanged with the medium until the field energy
from the pulse arrives. Since, as was pointed out in connection with (7.67), the
Cauchy-Schwarz inequality prevents the field energy from traveling faster than c,
at no point in the medium can a signal front exceed c.
Figure 7.21 Animation of a narrowband pulse traversing an amplifying medium off resonance.
The black dot shows the movement of the center of all energy. The red line inside the medium
shows the energy held in that medium, which cannot go negative. The lower figure shows the
instantaneous spectrum of the pulse at the front of the medium relative to the narrow amplifying
resonance.

Appendix 7.C Kramers-Kronig Relations

In the late 1920s, Ralph Kronig and Hendrik Kramers independently discovered
a remarkable relationship between the real and imaginary parts of a material's
susceptibility χ(ω). Recall that the susceptibility as defined in (2.16) relates the
polarization of a material to the field that stimulates the medium:
$$
P(\omega) = \epsilon_0 \chi(\omega) E(\omega) \tag{7.79}
$$
They made an argument based on causality (i.e. effect cannot precede cause),
which allows one to obtain the real part of χ(ω) from the imaginary part of χ(ω), if
it is known for all ω. Similarly, one can obtain the imaginary part of χ(ω) from the
real part of χ(ω), if it is known for all ω. We develop the Kramers-Kronig formulas
below.
We can replace E(ω) in (7.79) with the Fourier transform of E(t) in accordance
with (7.18). In addition, we take the inverse Fourier transform (7.17) of both sides
of (7.79) and obtain
$$
P(t) = \frac{\epsilon_0}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \chi(\omega) \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} E(t')\, e^{i\omega t'}\, dt' \right] e^{-i\omega t}\, d\omega \tag{7.80}
$$
Next we interchange the order of integration to get
$$
P(t) = \frac{\epsilon_0}{2\pi} \int_{-\infty}^{\infty} E(t') \left[ \int_{-\infty}^{\infty} \chi(\omega)\, e^{-i\omega(t-t')}\, d\omega \right] dt' \tag{7.81}
$$

Now for the causality argument: The polarization of the medium P(t) cannot
depend on the field E(t′) at future times t′ > t. Therefore the expression in square
brackets must be identically zero unless t − t′ > 0. This places a restriction on the
functional form of χ(ω) as we shall see.
The causality argument comes explicitly into play when we employ the following
integral formula:⁸
$$
e^{-i\omega(t-t')} = \operatorname{sign}\{t-t'\}\, \frac{1}{i\pi} \int_{-\infty}^{\infty} \frac{e^{-i\omega'(t-t')}}{\omega - \omega'}\, d\omega' \tag{7.82}
$$
Apparently, we require the positive sign from
$$
\operatorname{sign}\{t-t'\} \equiv \begin{cases} +1 & (t > t') \\ -1 & (t < t') \end{cases}
$$
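Upon substitution of (7.82) into (7.81) and after changing the order of integration
within the square brackets we obtain
$$
P(t) = \frac{\epsilon_0}{2\pi} \int_{-\infty}^{\infty} E(t') \left[ \int_{-\infty}^{\infty} \left( \frac{1}{i\pi} \int_{-\infty}^{\infty} \frac{\chi(\omega)}{\omega - \omega'}\, d\omega \right) e^{-i\omega'(t-t')}\, d\omega' \right] dt' \tag{7.83}
$$
For (7.81) and (7.83) to be the same, we require
$$
\chi(\omega) = \frac{1}{i\pi} \int_{-\infty}^{\infty} \frac{\chi(\omega')}{\omega' - \omega}\, d\omega' \tag{7.84}
$$
or
$$
\mathrm{Re}\,\chi(\omega) + i\, \mathrm{Im}\,\chi(\omega) = \frac{1}{i\pi} \int_{-\infty}^{\infty} \frac{\mathrm{Re}\,\chi(\omega') + i\, \mathrm{Im}\,\chi(\omega')}{\omega' - \omega}\, d\omega' \tag{7.85}
$$
Finally, equating separately the real and imaginary parts of the above equation
yields
$$
\mathrm{Re}\,\chi(\omega) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\mathrm{Im}\,\chi(\omega')}{\omega' - \omega}\, d\omega' \quad \text{and} \quad \mathrm{Im}\,\chi(\omega) = -\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\mathrm{Re}\,\chi(\omega')}{\omega' - \omega}\, d\omega' \tag{7.86}
$$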
These are known as the Kramers-Kronig relations on real and imaginary parts of
χ.⁹ If the real part of χ is known at all frequencies, we can use the Kramers-Kronig
relations to generate the imaginary part, and vice versa. We see that the real and
imaginary parts of χ cannot be chosen independently if we are to respect the
principle of causality.
8 This integral, which is a specific instance of Cauchy's theorem, is tricky because it involves two
diverging pieces, one on either side of the singularity ω′ = ω. The divergences have opposite sign so that
they cancel. The integration must approach the singularity in the same manner from either side, in
which case the result is called the principal value. In practical terms, if the integral is performed
numerically, the sampling of points should straddle the singularity symmetrically; other sampling
schemes can change the result dramatically and give an incorrect value.
9 As with (7.82), the principal value of the integral must be calculated. If the integral is performed
numerically, the sampling of points should straddle the singularity symmetrically. Separately, the
integral on each side of ω′ = ω diverges, but with opposite sign.
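The advice about sampling the principal value symmetrically can be tried out numerically. The following is a minimal sketch, not part of the original text: it assumes a single-resonance (Lorentz-type) susceptibility with illustrative parameters (`omega0`, `gamma`, `strength` are hypothetical values in arbitrary units) and uses the first relation in (7.86) to rebuild Re χ from Im χ. The function names are made up for this illustration.

```python
import math

def chi(omega, omega0=1.0, gamma=0.2, strength=1.0):
    """Assumed single-resonance susceptibility (illustrative parameters)."""
    return strength / (omega0**2 - omega**2 - 1j * gamma * omega)

def kk_real_from_imag(omega, half_width=100.0, h=0.01):
    """Estimate Re chi(omega) from Im chi using the first relation in (7.86).

    The sample points are omega +/- (j + 1/2) h, so they straddle the
    singularity symmetrically and the diverging pieces cancel in pairs.
    The integration range is truncated at +/- half_width (an approximation).
    """
    n = int(round(half_width / h))
    total = 0.0
    for j in range(n):
        offset = (j + 0.5) * h
        # Contributions from omega' = omega + offset and omega' = omega - offset
        total += (chi(omega + offset).imag - chi(omega - offset).imag) / offset
    return total * h / math.pi

if __name__ == "__main__":
    for w in (0.5, 0.9, 1.3):
        print(w, kk_real_from_imag(w), chi(w).real)
```

Pairing each point above the singularity with its mirror image below is one simple way to enforce the symmetric sampling described in the footnotes; shifting the grid so that it is lopsided about ω′ = ω would spoil the cancellation.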

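Example 7.6
Show that the expression in square brackets of (7.81) is zero when t′ > t, if χ(ω)
satisfies the Kramers-Kronig relations (7.86).
Solution: The expression may be written as
$$
\begin{aligned}
\int_{-\infty}^{\infty} \chi(\omega)\, e^{-i\omega(t-t')}\, d\omega
&= \int_{-\infty}^{\infty} \mathrm{Re}\,\chi(\omega)\, e^{-i\omega(t-t')}\, d\omega + i \int_{-\infty}^{\infty} \mathrm{Im}\,\chi(\omega)\, e^{-i\omega(t-t')}\, d\omega \\
&= \int_{-\infty}^{\infty} \mathrm{Re}\,\chi(\omega)\, e^{-i\omega(t-t')}\, d\omega + i \int_{-\infty}^{\infty} \left[ -\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\mathrm{Re}\,\chi(\omega')}{\omega' - \omega}\, d\omega' \right] e^{-i\omega(t-t')}\, d\omega \\
&= \int_{-\infty}^{\infty} \mathrm{Re}\,\chi(\omega)\, e^{-i\omega(t-t')}\, d\omega + \int_{-\infty}^{\infty} \mathrm{Re}\,\chi(\omega') \left[ \frac{1}{i\pi} \int_{-\infty}^{\infty} \frac{e^{-i\omega(t-t')}}{\omega' - \omega}\, d\omega \right] d\omega'
\end{aligned}
\tag{7.87}
$$
where we have invoked the Kramers-Kronig relation for Im χ(ω) (7.86)
and interchanged the order of integration in the final expression. Since we
are specifically considering future times t′ > t, we have by (7.82)
$$
\frac{1}{i\pi} \int_{-\infty}^{\infty} \frac{e^{-i\omega(t-t')}}{\omega' - \omega}\, d\omega = -e^{-i\omega'(t-t')}
$$
Hence
$$
\int_{-\infty}^{\infty} \chi(\omega)\, e^{-i\omega(t-t')}\, d\omega = \int_{-\infty}^{\infty} \mathrm{Re}\,\chi(\omega)\, e^{-i\omega(t-t')}\, d\omega - \int_{-\infty}^{\infty} \mathrm{Re}\,\chi(\omega')\, e^{-i\omega'(t-t')}\, d\omega' = 0 \tag{7.88}
$$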
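Finally, it is worth noting that the Kramers-Kronig relations also apply to
the real and imaginary parts of the index of refraction (subtract one).¹⁰
$$
n(\omega) - 1 = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\kappa(\omega')}{\omega' - \omega}\, d\omega' \quad \text{and} \quad \kappa(\omega) = -\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{n(\omega') - 1}{\omega' - \omega}\, d\omega' \tag{7.89}
$$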
One can use the Kramers-Kronig relations to find the real part of the index from
a measurement of absorption, if the measurement is done over a broad enough
range of the spectrum. This is the most useful form of the Kramers-Kronig rela-
tions.
It is sometimes convenient to multiply the numerator and denominator inside
the integrands of (7.89) by ω′ + ω. Then noting that n is an even function and
κ is an odd function allows us to dismiss either ω′ or ω in the numerator and
integrate¹¹ over positive frequencies only:
$$
n(\omega) - 1 = \frac{2}{\pi} \int_{0}^{\infty} \frac{\omega' \kappa(\omega')}{\omega'^2 - \omega^2}\, d\omega' \quad \text{and} \quad \kappa(\omega) = -\frac{2\omega}{\pi} \int_{0}^{\infty} \frac{n(\omega') - 1}{\omega'^2 - \omega^2}\, d\omega' \tag{7.90}
$$
10 This follows from Cauchy's theorem since the complex index is the square root of 1 + χ(ω).
The Kramers-Kronig relations for χ(ω) guarantee that χ(ω) has no poles in the upper half
complex plane, when ω is considered (for mathematical purposes) to be a complex variable. Taking
the square root does not introduce poles to the upper half plane.
11 The integrals (7.89) and (7.90) diverge to either side of ω′ = ω, but with opposite sign. Again,
the principal value of the integral is required, which means a numeric grid should straddle the
singularity symmetrically.

Exercises
Exercises for 7.1 Intensity of Superimposed Plane Waves
P7.1 (a) Consider two counter-propagating fields described by $\hat{\mathbf{x}} E_1 e^{i(kz-\omega t)}$
and $\hat{\mathbf{x}} E_2 e^{i(-kz-\omega t)}$ where $E_1$ and $E_2$ are both real. Show that their sum
can be written as
$$
\hat{\mathbf{x}} E_{\rm tot}(z)\, e^{i(\Phi(z)-\omega t)}
$$
where
$$
E_{\rm tot}(z) = E_1 \sqrt{\left(1 - \frac{E_2}{E_1}\right)^2 + 4 \frac{E_2}{E_1} \cos^2 kz}
$$
and
$$
\Phi(z) = \tan^{-1}\left[ \frac{(1 - E_2/E_1)}{(1 + E_2/E_1)} \tan kz \right]
$$
Outside the range $-\frac{\pi}{2} \le kz \le \frac{\pi}{2}$ the pattern repeats.
(b) Suppose that two counter-propagating laser fields have separate
intensities, $I_1$ and $I_2 = I_1/100$. The ratio of the fields is then $E_2/E_1 = 1/10$.
In the standing interference pattern that results, what is the ratio
of the peak intensity to the minimum intensity? Are you surprised how
high this is?
P7.2 Equation (7.7) implies that there is no interference between fields that
are polarized along orthogonal dimensions. That is, the intensity of
$$
\mathbf{E}(\mathbf{r}, t) = \hat{\mathbf{x}} E_0 e^{i[(k\hat{\mathbf{z}})\cdot\mathbf{r}-\omega t]} + \hat{\mathbf{y}} E_0 e^{i[(k\hat{\mathbf{x}})\cdot\mathbf{r}-\omega t]}
$$
according to (7.7) is uniform throughout space. Of course (7.7) does not
apply since the k-vectors are not parallel. Show that the time-average
of S(r, t) according to (7.4) exhibits interference in the distribution of
net energy flow.
Exercises for 7.2 Group vs. Phase Velocity: Sum of Two Plane Waves
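P7.3 Show that (7.10) can be written as
$$
\mathbf{E}(\mathbf{r}, t) = 2\mathbf{E}_0\, e^{i\left(\frac{\mathbf{k}_2+\mathbf{k}_1}{2}\cdot\mathbf{r} - \frac{\omega_2+\omega_1}{2} t\right)} \cos\left(\frac{\Delta\mathbf{k}}{2}\cdot\mathbf{r} - \frac{\Delta\omega}{2} t\right)
$$
From this show that the speed of the rapid-oscillation intensity peaks
in Fig. 7.2 is $v_p = \bar{\omega}/\bar{k}$ where
$$
\bar{k} \equiv \frac{(k_1 + k_2)}{2} \quad \text{and} \quad \bar{\omega} \equiv \frac{(\omega_1 + \omega_2)}{2}
$$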
P7.4 Confirm the right-hand side of (7.16).

Exercises for 7.3 Frequency Spectrum of Light
P7.5 The continuous field of a very narrowband continuous laser may be
approximated as a pure plane wave: $\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 e^{i(k_0 z - \omega_0 t)}$. Suppose the
wave encounters a shutter at the plane z = 0.
(a) Compute the power spectrum of the light before the shutter. HINT:
The answer is proportional to the square of a delta function centered
on ω0 (see (0.45)).
(b) Compute the power spectrum after the shutter if it is opened during
the interval −T /2 ≤ t ≤ T /2. Plot the result. Are you surprised that the
shutter appears to create extra frequency components?
HINT: Write your answer in terms of the sinc function defined by
sincα ≡ sin α/α.
P7.6 (a) Determine the Full-Width-at-Half-Maximum of the intensity (i.e.
the width of I(r, t) represented by Δt_FWHM) and of the power spectrum
(i.e. the width of I(r, ω) represented by Δω_FWHM) for the Gaussian pulse
defined in (7.24).
HINT: Both answers are in terms of T.
(b) Give an uncertainty principle for the product of Δt_FWHM and Δω_FWHM.
Exercises for 7.5 Quadratic Dispersion
P7.7 The intensity of a Gaussian laser pulse has a FWHM duration T_FWHM =
25 fs with carrier frequency ω₀ corresponding to λ_vac = 800 nm. The
pulse goes through a lens of thickness ℓ = 1 cm (laser quality glass type
BK7) with index of refraction given approximately by
$$
n(\omega) \cong 1.4948 + 0.016 \frac{\omega}{\omega_0}
$$
What is the full-width-at-half-maximum of the intensity for the emerging
pulse?
HINT: For the input pulse we have
$$
T = \frac{T_{\rm FWHM}}{2\sqrt{\ln 2}}
$$
(see P7.6).
P7.8 If the pulse defined in (7.47) travels through the material for a very long
distance z such that T(z) → TΦ(z) and tan⁻¹Φ(z) → π/2, show that
the instantaneous frequency of the pulse is
$$
\omega_0 + \frac{t - 2z/v_g}{4\alpha z}
$$
COMMENT: As the wave travels, the earlier part of the pulse oscillates
more slowly than the later part. This is called chirp, and it means that
the red frequencies get ahead of the blue ones since they experience a
lower index.
Exercises for 7.6 Generalized Context for Group Delay
P7.9 When the spectrum is narrow compared to features in a resonance
(such as in Fig. 7.10), the reshaping delay (7.54) tends to zero and can
be ignored. Show that when the spectrum is narrow the net group delay
(7.53) reduces to
$$
\lim_{T \to \infty} \Delta t_G(\mathbf{r}) = \left. \frac{\partial\, \mathrm{Re}\,\mathbf{k}}{\partial \omega} \cdot \Delta\mathbf{r} \right|_{\bar{\omega}}
$$
P7.10 When the spectrum is very broad the reshaping delay (7.54) also tends
to zero and can be ignored. Show that when the spectrum is extremely
broad, the net group delay reduces to
$$
\lim_{T \to 0} \Delta t_G(\mathbf{r}) = \frac{\Delta r}{c}
$$
assuming k and Δr are parallel. This implies that a sharply defined
signal cannot travel faster than c.
HINT: The real index of refraction n goes to unity far from resonance,
and the imaginary part κ goes to zero.
P7.11 Work through the derivation of (7.52).
HINT: This somewhat lengthy derivation can be found in Optics Express
9, 506-518 (2001).
Exercises for 7.A Pulse Chirping in a Grating Pair
P7.12 A Gaussian pulse with T = 20 fs is incident with θᵢ = 20° on a grating
pair with groove separation d = 1.67 µm. What grating separation L
will lead to a pulse duration of T = 100 ps? Assume two passes through
the grating pair for a total effective separation of 2L. Take the pulse
carrier frequency to correspond to λ₀ = 800 nm.

Chapter 8
Coherence Theory
Most students of physics become familiar with a Michelson interferometer (shown
in Fig. 8.1) early in their course work. This preliminary understanding is usually
gained in terms of a single-frequency plane wave that travels through the instrument.
A Michelson interferometer divides the initial beam into two identical
beams and then delays one beam with respect to the other before bringing them
back together. Depending on the relative path difference d (roundtrip by our
convention) between the two arms of the system, the light can interfere constructively
or destructively in the direction of the detector. We will find it convenient to
express the relative path difference as a relative time delay τ ≡ d/c. The intensity
seen at the detector as a function of path difference is computed to be
$$
\begin{aligned}
I_{\rm det}(\tau) &= \frac{c\epsilon_0}{2} \left[ \mathbf{E}_0 e^{i(kz-\omega t)} + \mathbf{E}_0 e^{i(kz-\omega(t-\tau))} \right] \cdot \left[ \mathbf{E}_0 e^{i(kz-\omega t)} + \mathbf{E}_0 e^{i(kz-\omega(t-\tau))} \right]^* \\
&= \frac{c\epsilon_0}{2} \left[ 2\mathbf{E}_0 \cdot \mathbf{E}_0^* + 2\mathbf{E}_0 \cdot \mathbf{E}_0^* \cos(\omega\tau) \right] \\
&= 2I_0 \left[ 1 + \cos(\omega\tau) \right]
\end{aligned}
\tag{8.1}
$$
where $I_0 \equiv \frac{c\epsilon_0}{2} \mathbf{E}_0 \cdot \mathbf{E}_0^*$ is the intensity from one beam alone (when the other arm
of the interferometer is blocked). This formula is familiar and it describes how
the intensity at the detector oscillates between zero and four times the intensity
of one beam alone. Keep in mind that if a 50:50 beam splitter is used, then the
intensity arriving at the detector from one arm alone (with other arm blocked) is
one fourth of the original beam, since the light meets the beam splitter twice.

Figure 8.1 Michelson interferometer.
In this chapter, we will derive an appropriate replacement for (8.1) when light
containing a continuous band of frequencies is sent through the interferometer.
Instead of repeating indefinitely, the oscillations in the intensity at the detector
become less pronounced as the mirror in one arm of the interferometer is scanned
away from the position where the two paths are equal. Remarkably, this decrease
in fringe visibility depends only upon the frequency content of the light without
regard to whether the frequency components are organized into a short pulse or
a longer time pattern. This brings up the concept of temporal coherence, which is
related to how fast fringe visibility diminishes as delay is introduced in an arm of
the Michelson interferometer.

In section 8.4, we discuss a practical application known as Fourier spec-
troscopy. This powerful technique makes it possible to deduce the spectral con-
tent of light using a Michelson interferometer rather than a grating spectrometer.
Finally, we will study a Young’s two-slit setup, which is similar to a Michelson
interferometer in that light takes two different paths and then interferes. The
concept of spatial coherence arises naturally with the two-slit setup in the same
way that the concept of temporal coherence arises naturally with the Michelson
interferometer.
8.1 Michelson Interferometer

Albert Abraham Michelson (1852–1931, United States) was born in Poland, but he immigrated to the
US with his parents and grew up in the rough mining towns of California and Nevada where his father
was a merchant. Michelson attended high school in San Francisco. He entered the US Naval Academy
in 1869 (with intervention from US President Grant after Michelson pleaded his case on the grounds
near the White House). After two years at sea, Michelson returned to the Naval Academy to teach
physics and mathematics for several years. Michelson was fascinated by the problem of determining
the speed of light, and developed successive experiments to measure it more accurately. He is
probably most famous for his experiment conducted at Case School of Applied Science in Cleveland
with Edward Morley to detect the motion of the earth through the ether. Michelson later was a
professor at the University of Chicago and then at Caltech. In 1907 he became the first American to
win a Nobel prize in the sciences, for his contributions to optics. Michelson married late in life
and was the father of four.

Consider a waveform E(t) that has traveled through the first arm of a Michelson
interferometer to arrive at the detector in Fig. 8.1. Specifically, E(t) is the value of
the field at the detector when the second arm of the interferometer is blocked.
The waveform E(t) in general may be composed of many frequency components
according to the inverse Fourier transform (7.17). For convenience we will think
of E(t) as a pulse containing a finite amount of energy. (We will comment on
continuous light sources in the next section.) The beam that travels through the
second arm of the interferometer is associated with the same waveform, albeit
with a delay τ according to the path difference between the two arms. Thus,
E(t − τ) indicates the field at the detector from the second arm when the first arm
of the interferometer is blocked. (Again, τ represents the round-trip delay of the
adjustable path relative to the position where the two paths have equal lengths.)
The total field at the detector is composed of the two waveforms:
$$
\mathbf{E}_{\rm det}(t, \tau) = \mathbf{E}(t) + \mathbf{E}(t - \tau) \tag{8.2}
$$
The intensity (7.20) at the detector (setting n = 1) is then
$$
\begin{aligned}
I_{\rm det}(t, \tau) &= \frac{c\epsilon_0}{2} \mathbf{E}_{\rm det}(t, \tau) \cdot \mathbf{E}_{\rm det}^*(t, \tau) \\
&= \frac{c\epsilon_0}{2} \left[ \mathbf{E}(t) \cdot \mathbf{E}^*(t) + \mathbf{E}(t) \cdot \mathbf{E}^*(t-\tau) + \mathbf{E}(t-\tau) \cdot \mathbf{E}^*(t) + \mathbf{E}(t-\tau) \cdot \mathbf{E}^*(t-\tau) \right] \\
&= I(t) + I(t-\tau) + \frac{c\epsilon_0}{2} \left[ \mathbf{E}(t) \cdot \mathbf{E}^*(t-\tau) + \mathbf{E}(t-\tau) \cdot \mathbf{E}^*(t) \right] \\
&= I(t) + I(t-\tau) + c\epsilon_0\, \mathrm{Re}\left\{ \mathbf{E}(t) \cdot \mathbf{E}^*(t-\tau) \right\}
\end{aligned}
\tag{8.3}
$$
The function I (t ) corresponds to the intensity of one of the beams arriving at the
detector while the opposite path of the interferometer is blocked. Notice that we
have retained the dependence on t in I det (t , τ) in addition to the dependence on
the path delay τ. This allows for pulses with arbitrary duration and shape. The
rapid oscillations of the light are automatically averaged away in I (t ), but not the
slowly varying form of the pulse.
The total energy (per area), or fluence, accumulated at the detector is found
by integrating the intensity over time. In other words, we let the detector integrate

the energy for the entire pulse before taking a reading. For short laser pulses
(sub-nanosecond), a detector automatically integrates the entire energy (per area)
of the pulse since a detector cannot keep up with temporal variations on such a
rapid time scale. The integration of (8.3) over time yields the signal at the detector
(that varies with delay τ):
$$
\begin{aligned}
\mathrm{Sig}(\tau) &\propto \int_{-\infty}^{\infty} I_{\rm det}(t, \tau)\, dt \\
&= \int_{-\infty}^{\infty} I(t)\, dt + \int_{-\infty}^{\infty} I(t-\tau)\, dt + c\epsilon_0\, \mathrm{Re} \int_{-\infty}^{\infty} \mathbf{E}(t) \cdot \mathbf{E}^*(t-\tau)\, dt
\end{aligned}
\tag{8.4}
$$
The first two integrals on the right-hand side are equal:
$$
\mathcal{E} \equiv \int_{-\infty}^{\infty} I(t)\, dt = \int_{-\infty}^{\infty} I(t-\tau)\, dt \tag{8.5}
$$
$\mathcal{E}$ represents the fluence (accumulated energy per area) from one arm of the
interferometer when the other arm is blocked. Note that the second integral
is insensitive to τ since a change of variables t′ = t − τ converts it into the first
integral.
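The final integral in (8.4) remains unchanged if we take a Fourier transform
followed by an inverse Fourier transform:
$$
\int_{-\infty}^{\infty} \mathbf{E}(t) \cdot \mathbf{E}^*(t-\tau)\, dt = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} d\omega\, e^{-i\omega\tau} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} d\tau\, e^{i\omega\tau} \int_{-\infty}^{\infty} \mathbf{E}(t) \cdot \mathbf{E}^*(t-\tau)\, dt \right] \tag{8.6}
$$
The reason for this procedure is so that we can take advantage of the autocorrelation
theorem (see P0.27). With it, the expression in square brackets simplifies
to $\sqrt{2\pi}\, \mathbf{E}(\omega) \cdot \mathbf{E}^*(\omega) = \sqrt{2\pi}\, 2I(\omega)/c\epsilon_0$. With the aid of (8.5) and (8.6), the overall
fluence (8.4) becomes
$$
\int_{-\infty}^{\infty} I_{\rm det}(t, \tau)\, dt = 2\mathcal{E} \left[ 1 + \mathrm{Re}\, \frac{1}{\mathcal{E}} \int_{-\infty}^{\infty} I(\omega)\, e^{-i\omega\tau}\, d\omega \right] \tag{8.7}
$$

Figure 8.2 The output or signal from a Michelson interferometer for light with a Gaussian spectrum.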
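It is convenient to rewrite this result in terms of the degree of coherence function
γ(τ):
$$
\mathrm{Sig}(\tau) \propto 1 + \mathrm{Re}\,\gamma(\tau) \tag{8.8}
$$
where
$$
\gamma(\tau) \equiv \frac{\int_{-\infty}^{\infty} I(\omega)\, e^{-i\omega\tau}\, d\omega}{\int_{-\infty}^{\infty} I(\omega)\, d\omega} \tag{8.9}
$$
The denominator of (8.9) was rewritten with the help of Parseval's theorem and (8.5):
$\mathcal{E} \equiv \int_{-\infty}^{\infty} I(t)\, dt = \int_{-\infty}^{\infty} I(\omega)\, d\omega$.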
In summary, (8.8) describes the signal (i.e. accumulated energy per area)
arriving to the detector after the Michelson interferometer. The dependence on
the path delay τ is entirely contained in the function γ (τ).
Example 8.1
Compute the output signal when a Gaussian pulse with spectrum (7.24) is sent
into a Michelson interferometer.
Solution: The power spectrum of the pulse is¹
$$
I(\mathbf{r}, \omega) = \frac{\epsilon_0 c}{2} \mathbf{E}_0 \cdot \mathbf{E}_0^*\, T^2 e^{-T^2(\omega-\omega_0)^2}
$$
where T is the pulse duration, not to be confused with τ in this section, which is
the delay of the interferometer arm. As shown in Example 7.3, we also have
$$
\int_{-\infty}^{\infty} I(\mathbf{r}, \omega)\, d\omega = \frac{\epsilon_0 c}{2} \mathbf{E}_0 \cdot \mathbf{E}_0^*\, T \sqrt{\pi}
$$
The degree of coherence (8.9) is then
$$
\begin{aligned}
\gamma(\tau) &= \frac{T}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-T^2(\omega-\omega_0)^2} e^{-i\omega\tau}\, d\omega \\
&= \frac{T}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-T^2\omega^2 + (2T^2\omega_0 - i\tau)\omega - T^2\omega_0^2}\, d\omega
= \frac{T}{\sqrt{\pi}} \sqrt{\frac{\pi}{T^2}}\; e^{\frac{(2T^2\omega_0 - i\tau)^2}{4T^2} - T^2\omega_0^2} \\
&= e^{-\frac{\tau^2}{4T^2}}\, e^{-i\omega_0\tau}
\end{aligned}
$$
Formula (0.55) was used to complete the integration.
According to (8.8), the signal at the detector is then
$$
\mathrm{Sig}(\tau) \propto 1 + \mathrm{Re}\,\gamma(\tau) = 1 + e^{-\frac{\tau^2}{4T^2}} \cos(\omega_0\tau)
$$
Fig. 8.2 shows this signal for a given T. As delay is added (or subtracted), the output
signal oscillates. Eventually enough delay is introduced such that the very short
pulses no longer interfere (arriving sequentially), and the output signal becomes
steady.
1 Technically, the output intensity is one fourth this, but our calculation of the degree of coherence
is insensitive to amplitude.
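The closed form found in Example 8.1 can be cross-checked by evaluating (8.9) directly. The following is a minimal sketch, not part of the original text: it approximates the two integrals in (8.9) with a midpoint sum over a Gaussian spectrum. The function names and the values ω₀ = 10 and T = 1 (arbitrary units) are assumptions chosen only for illustration.

```python
import cmath
import math

def spectrum(omega, omega0, T):
    """Gaussian power spectrum, up to an overall constant that cancels in (8.9)."""
    return math.exp(-T**2 * (omega - omega0)**2)

def degree_of_coherence(tau, omega0=10.0, T=1.0, span=8.0, n=4000):
    """Midpoint-sum estimate of gamma(tau) in (8.9).

    The frequency window omega0 +/- span/T is an assumed truncation;
    the Gaussian is negligible outside it.
    """
    d_omega = 2 * span / (T * n)
    numerator = 0j
    denominator = 0.0
    for k in range(n):
        omega = omega0 - span / T + (k + 0.5) * d_omega
        weight = spectrum(omega, omega0, T)
        numerator += weight * cmath.exp(-1j * omega * tau)
        denominator += weight
    return numerator / denominator

def analytic_coherence(tau, omega0=10.0, T=1.0):
    """Closed form from Example 8.1."""
    return math.exp(-tau**2 / (4 * T**2)) * cmath.exp(-1j * omega0 * tau)

if __name__ == "__main__":
    for tau in (0.0, 0.5, 2.0):
        gamma = degree_of_coherence(tau)
        # Detector signal is proportional to 1 + Re(gamma), as in (8.8)
        print(tau, 1 + gamma.real, 1 + analytic_coherence(tau).real)
```

Because (8.9) is a ratio, the overall constant in front of the spectrum drops out, which is why the sketch omits it.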

8.2 Temporal Coherence

We could have derived (8.8) using another strategy, which may seem more intu-
itive than the approach in the previous section. Equation (8.1) gives the intensity
at the detector when a single plane wave of frequency ω goes through the inter-
ferometer. Now suppose that a waveform composed of many frequencies is sent
through the interferometer. The intensity associated with each frequency acts
independently, obeying (8.1) individually.
The total energy (per area) accumulated at the detector is then a linear superposition
of the spectral intensities of all frequencies present:
$$
\int_{-\infty}^{\infty} I_{\rm det}(\omega, \tau)\, d\omega = \int_{-\infty}^{\infty} 2I(\omega) \left[ 1 + \cos(\omega\tau) \right] d\omega \tag{8.10}
$$
While this procedure may seem obvious, the fact that we can do it is remarkable!
Remember that it is usually the fields that we must add together before finding
the intensity of the resulting superposition. The formula (8.10) with its super-
position of intensities relies on the fact that the different frequencies inside the
interferometer when time-averaged (over all time) do not interfere. Certainly,
the fields at different frequencies do interfere (or beat in time). However, they
constructively interfere as often as they destructively interfere, and over time it is
as though the individual frequency components transmit independently. Again,
in writing (8.10) we considered the light to be pulsed rather than continuous so
that the integrals converge.
We can manipulate (8.10) as follows:
$$
\int_{-\infty}^{\infty} I_{\rm det}(\omega, \tau)\, d\omega = 2 \int_{-\infty}^{\infty} I(\omega)\, d\omega \left[ 1 + \frac{\int_{-\infty}^{\infty} I(\omega) \cos(\omega\tau)\, d\omega}{\int_{-\infty}^{\infty} I(\omega)\, d\omega} \right] \tag{8.11}
$$
This is the same as (8.7) since we can replace cos(ωτ) with Re{e^{−iωτ}}, and we can
apply Parseval's theorem (8.5) to the other integrals. Thus, the above arguments
lead to (8.8) and (8.9), in complete agreement with the previous section.
Finally, let us consider the case of a continuous light source for which the
integrals in (8.8) diverge. This is the case for starlight or for a continuous wave
(CW) laser source. The integral $\int_{-\infty}^{\infty} I(t)\, dt$ diverges since a source that is on
forever (or at least for a very long time) emits infinite (or very much) energy.
However, note that the integrals on both sides of (8.8) diverge in the same way.
We can renormalize (8.8) in this case by replacing the integrals on each side with
the average value of the intensity:
$$
I_{\rm ave} \equiv \langle I(t) \rangle_t = \frac{1}{T} \int_{-T/2}^{T/2} I(t)\, dt \quad \text{(continuous source)} \tag{8.12}
$$

The duration T must be large enough to average over any fluctuations that are
present in the light source. The average in (8.12) should not be used on a pulsed
light source since the result would depend on the duration T of the temporal
window.
In the continuous wave (CW) case (e.g. starlight or a CW laser), the signal at
the detector (8.8) becomes
$$
\langle I_{\rm det}(t, \tau) \rangle_t = 2 \langle I(t) \rangle_t \left[ 1 + \mathrm{Re}\,\gamma(\tau) \right] \quad \text{(continuous source)} \tag{8.13}
$$
Although technically the integrals involved in computing γ (τ) (8.9) also diverge
in the case of CW light, the numerator and the denominator diverge in the same
way. Therefore, we may renormalize I (ω) in any way we like to deal with this
problem, and this does not affect the final result. Regardless of how large I (ω)
is, and regardless of the units on the measurement (volts or whatever), we can
simply plug the instrument reading directly into (8.9). The units in the numerator
and denominator cancel so that γ (τ) always remains dimensionless.
A very remarkable aspect of the above result is that the behavior of the light in
the Michelson interferometer does not depend on the phase of E (ω). It depends
only on the amount of light associated with each frequency component through
I(ω) ≡ (ε₀c/2) E(ω) · E∗(ω). When the light at one frequency undergoes constructive
interference for a given path difference τ, the light at another frequency might
undergo destructive interference. The net effect is embodied in the degree of
coherence function γ (τ), which contains the essential information describing
interference. Fig. 8.3 depicts the degree of coherence function as one arm of the
interferometer is adjusted through various delays τ.
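As a brief worked illustration of how two frequencies can interfere constructively and destructively by turns, consider an idealized spectrum consisting of two equally intense narrow lines at ω₁ and ω₂. This case is an assumption introduced here for illustration and is not treated in the text. Inserting it into (8.9) gives

$$
I(\omega) \propto \delta(\omega - \omega_1) + \delta(\omega - \omega_2)
\quad \Rightarrow \quad
\gamma(\tau) = \frac{1}{2}\left( e^{-i\omega_1\tau} + e^{-i\omega_2\tau} \right)
= e^{-i\bar{\omega}\tau} \cos\left( \frac{\Delta\omega\, \tau}{2} \right)
$$

where $\bar{\omega} \equiv (\omega_1 + \omega_2)/2$ and $\Delta\omega \equiv \omega_2 - \omega_1$. The real part, $\cos(\bar{\omega}\tau)\cos(\Delta\omega\,\tau/2)$, shows rapid fringes at the mean frequency under a slow envelope. The envelope first reaches zero at τ = π/Δω, where the two lines are exactly out of step, and the fringes return at larger delay because only two frequencies are present.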
8.3 Fringe Visibility and Coherence Length

The degree of coherence function γ (τ) is responsible for oscillations in intensity at
the detector as the mirror in one arm of the interferometer is moved. The real part
Re γ(τ) is analogous to cos(ωτ) in (8.1). For large delays τ, the oscillations tend
to die off as different frequencies get out of synch: some interfere
constructively while others interfere destructively. Narrowband light is temporally more
coherent than broadband light because there is less opportunity for frequencies to
get out of synch. Still, for large path differences, the oscillations tend to eventually
die off, and the intensity at the detector remains steady as the mirror is moved
further.
We define the coherence time to be the amount of delay necessary to cause
γ(τ) to quit oscillating (i.e. its amplitude approaches zero). A useful (although
arbitrary) definition for the coherence time is
$$
\tau_c \equiv \int_{-\infty}^{\infty} \left| \gamma(\tau) \right|^2 d\tau = 2 \int_{0}^{\infty} \left| \gamma(\tau) \right|^2 d\tau \tag{8.14}
$$
The coherence length is the distance that light travels in this time:
$$
\ell_c \equiv c\tau_c \tag{8.15}
$$

Another useful concept is fringe visibility. The fringe visibility is defined in
the following way:
$$
V(\tau) \equiv \frac{I_{\max} - I_{\min}}{I_{\max} + I_{\min}} \quad \text{(continuous source)} \tag{8.16}
$$
or
$$
V(\tau) \equiv \frac{\mathcal{E}_{\max} - \mathcal{E}_{\min}}{\mathcal{E}_{\max} + \mathcal{E}_{\min}} \quad \text{(pulsed source)} \tag{8.17}
$$
where $\mathcal{E}_{\max} \equiv \max \int_{-\infty}^{\infty} I_{\rm det}(t, \tau)\, dt$ refers to the accumulated energy (per area) at
the detector when the mirror is positioned such that the amount of throughput to
the detector is a local maximum (i.e. the left-hand side of (8.8)). $\mathcal{E}_{\min}$ refers to the
accumulated energy at the detector when the mirror is positioned such that the
amount of throughput to the detector is a local minimum. As the mirror moves a
large distance from the equal-path-length position, the oscillations become less
pronounced because the values of $\mathcal{E}_{\min}$ and $\mathcal{E}_{\max}$ tend to take on the same value,
and the fringe visibility goes to zero. The fringe visibility goes to zero when γ(τ)
goes to zero. It is left as an exercise to show that the fringe visibility can be written
simply as
$$
V(\tau) = \left| \gamma(\tau) \right| \tag{8.18}
$$
Example 8.2
Find the fringe visibility and the coherence time for the Gaussian pulse studied in
Example 8.1.
Solution: By (8.18), the fringe visibility is
$$
V(\tau) = \left| \gamma(\tau) \right| = e^{-\frac{\tau^2}{4T^2}}
$$
This is shown as the dashed line in Fig. 8.3. As expected, the fringe visibility dies
off as delay τ changes from zero, the point where the interferometer arms are
equidistant. From (8.14) the coherence time is
$$
\tau_c = \int_{-\infty}^{\infty} \left| \gamma(\tau) \right|^2 d\tau = \int_{-\infty}^{\infty} e^{-\frac{\tau^2}{2T^2}}\, d\tau = \sqrt{2\pi}\, T
$$
which is the delay necessary to cause the fringes to substantially diminish.

Figure 8.3 Re[γ(τ)] (solid) and |γ(τ)| (dashed) for a light pulse with a Gaussian spectrum as in
examples 8.1 and 8.2.
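The two results of Example 8.2 can be illustrated numerically. The sketch below is not part of the original text. It takes γ(τ) from Example 8.1, estimates the visibility by applying (8.17) to the signal 1 + Re γ(τ) scanned over one fringe period, and integrates (8.14) with a midpoint sum. The values ω₀ = 200 and T = 1 (arbitrary units) and all function names are assumptions; the comparison with (8.18) is only approximate because the envelope changes slightly within one fringe.

```python
import math

def coherence(tau, omega0=200.0, T=1.0):
    """Degree of coherence from Example 8.1 (Gaussian spectrum)."""
    envelope = math.exp(-tau**2 / (4 * T**2))
    return envelope * complex(math.cos(omega0 * tau), -math.sin(omega0 * tau))

def visibility_from_fringes(tau, omega0=200.0, T=1.0, samples=400):
    """Apply (8.17) to readings of 1 + Re(gamma) taken across one fringe period."""
    period = 2 * math.pi / omega0
    readings = [
        1 + coherence(tau + period * (k / samples - 0.5), omega0, T).real
        for k in range(samples + 1)
    ]
    return (max(readings) - min(readings)) / (max(readings) + min(readings))

def coherence_time(omega0=200.0, T=1.0, span=10.0, n=2000):
    """Midpoint-sum estimate of (8.14) over the assumed window |tau| <= span*T."""
    d_tau = 2 * span * T / n
    total = 0.0
    for k in range(n):
        tau = -span * T + (k + 0.5) * d_tau
        total += abs(coherence(tau, omega0, T))**2
    return total * d_tau

if __name__ == "__main__":
    print(visibility_from_fringes(1.0), abs(coherence(1.0)))
    print(coherence_time(), math.sqrt(2 * math.pi))
```

The agreement between the scanned visibility and |γ(τ)| improves as ω₀T grows, since more fringes then fit under the slowly varying envelope.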
8.4 Fourier Spectroscopy

As we have seen in the previous discussion, the signal output from a Michelson
interferometer (8.7) for a pulsed input may be written as
$$
\mathrm{Sig}(\tau) \propto 2\mathcal{E} + 2\,\mathrm{Re} \int_{-\infty}^{\infty} I(\omega)\, e^{-i\omega\tau}\, d\omega \tag{8.19}
$$
Typically, the signal comes in the form of a voltage or a current from a sensor.
However, the signal can easily be normalized to the beam fluence. In particular,
for large τ the fringe visibility goes to zero (i.e. γ(τ) = 0), and the normalized
signal must approach
$$
\lim_{\tau \to \infty} \mathrm{Sig}(\tau) = 2\mathcal{E} = 2 \int_{-\infty}^{\infty} I(t)\, dt \tag{8.20}
$$
We will assume that this normalization has taken place and write (8.19) as an
equality.
Given our measurement of Sig(τ), we would like to find I (ω), or the spectrum
of the light. Unfortunately, I (ω) is buried within an integral in (8.19). However,
since the integral looks like an inverse Fourier transform of I (ω), we will be
able to extract the desired spectrum after some manipulation. This procedure
for extracting I (ω) from an interferometric measurement is known as Fourier
spectroscopy.
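We first take the Fourier transform of (8.19):²
$$
\mathcal{F}\left\{ \mathrm{Sig}(\tau) \right\} = \mathcal{F}\left\{ 2\mathcal{E} \right\} + \mathcal{F}\left\{ 2\,\mathrm{Re} \int_{-\infty}^{\infty} I(\omega)\, e^{-i\omega\tau}\, d\omega \right\} \tag{8.21}
$$
The left-hand side is known since it is the measured data, and a computer can
be employed to take the Fourier transform of it. The first term on the right-hand
side is the Fourier transform of a constant:
$$
\mathcal{F}\left\{ 2\mathcal{E} \right\} = 2\mathcal{E}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{i\omega\tau}\, d\tau = 2\mathcal{E} \sqrt{2\pi}\, \delta(\omega) \tag{8.22}
$$
Notice that (8.22) is zero everywhere except where ω = 0, where a spike occurs.
This represents the DC component of $\mathcal{F}\{\mathrm{Sig}(\tau)\}$.
The second term of (8.21) can be written as
$$
\begin{aligned}
\mathcal{F}\left\{ 2\,\mathrm{Re} \int_{-\infty}^{\infty} I(\omega)\, e^{-i\omega\tau}\, d\omega \right\}
&= \mathcal{F}\left\{ \int_{-\infty}^{\infty} I(\omega)\, e^{-i\omega\tau}\, d\omega + \int_{-\infty}^{\infty} I(\omega)\, e^{i\omega\tau}\, d\omega \right\} \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} I(\omega')\, e^{-i\omega'\tau}\, d\omega' \right] e^{i\omega\tau}\, d\tau
 + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} I(\omega')\, e^{i\omega'\tau}\, d\omega' \right] e^{i\omega\tau}\, d\tau \\
&= \sqrt{2\pi} \left[ \int_{-\infty}^{\infty} I(\omega') \left( \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i(\omega'-\omega)\tau}\, d\tau \right) d\omega'
 + \int_{-\infty}^{\infty} I(\omega') \left( \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i(\omega'+\omega)\tau}\, d\tau \right) d\omega' \right] \\
&= \sqrt{2\pi} \left[ \int_{-\infty}^{\infty} I(\omega')\, \delta(\omega' - \omega)\, d\omega' + \int_{-\infty}^{\infty} I(\omega')\, \delta(\omega' + \omega)\, d\omega' \right] \\
&= \sqrt{2\pi} \left[ I(\omega) + I(-\omega) \right]
\end{aligned}
\tag{8.23}
$$
2 This is weird since normally we take Fourier transforms on fields rather than expressions
involving intensity!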

With (8.22) and (8.23) we can write (8.21) as
$$
\frac{\mathcal{F}\left\{ \mathrm{Sig}(\tau) \right\}}{\sqrt{2\pi}} = 2\mathcal{E}\, \delta(\omega) + I(\omega) + I(-\omega) \tag{8.24}
$$
The Fourier transform of the measured signal is seen to contain three terms, one
of which is the power spectrum that we are after, namely I(ω). Fortunately, when
graphed as a function of ω (shown in Fig. 8.4), the three terms on the right-hand
side typically do not overlap. As a reminder, the measured signal as a function of
τ looks something like that in Fig. 8.2. The oscillation frequency of the fringes lies
in the neighborhood of ω₀. To obtain I(ω) the procedure is clear: Record Sig(τ);
if desired, normalize by its value at large τ; take its Fourier transform; extract the
curve at positive frequencies.

Figure 8.4 A graphical depiction of $\mathcal{F}\{\mathrm{Sig}(\tau)\}/\sqrt{2\pi}$.
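The procedure just described can be sketched in a few lines. The example below is not part of the original text and rests on stated assumptions. The "measured" signal is synthesized from the Gaussian result of Example 8.1, with the spectrum normalized so that I(ω₀) = 1. The large-delay baseline 2ℰ is subtracted before transforming, so the delta-function term in (8.24) does not appear. The Fourier transform is done by a direct midpoint sum. The names `signal` and `recovered_spectrum` and the values ω₀ = 10, T = 1 (arbitrary units) are hypothetical.

```python
import cmath
import math

OMEGA0 = 10.0                      # assumed carrier frequency (arbitrary units)
T = 1.0                            # assumed pulse duration (arbitrary units)
FLUENCE = math.sqrt(math.pi) / T   # integral of I(omega) = exp(-T^2 (omega - OMEGA0)^2)

def signal(tau):
    """Synthetic normalized signal (8.19) for the Gaussian spectrum above."""
    envelope = math.exp(-tau**2 / (4 * T**2))
    return 2 * FLUENCE * (1 + envelope * math.cos(OMEGA0 * tau))

def recovered_spectrum(omega, tau_max=20.0, n=4000):
    """Estimate I(omega) + I(-omega) from the signal, following (8.24).

    The scan range |tau| <= tau_max is an assumed truncation; the fringes
    have died away well before the ends of the scan.
    """
    baseline = signal(tau_max)     # value at large delay, equal to 2 * fluence
    d_tau = 2 * tau_max / n
    total = 0j
    for k in range(n):
        tau = -tau_max + (k + 0.5) * d_tau
        total += (signal(tau) - baseline) * cmath.exp(1j * omega * tau)
    # (1/sqrt(2 pi)) from the transform times (1/sqrt(2 pi)) from (8.24)
    return (total * d_tau / (2 * math.pi)).real

if __name__ == "__main__":
    for omega in (8.0, 9.0, 10.0, 11.0, 12.0):
        exact = math.exp(-T**2 * (omega - OMEGA0)**2)
        print(omega, recovered_spectrum(omega), exact)
```

Evaluating `recovered_spectrum` at negative ω returns the mirror-image term I(−ω) of (8.24), which is why only the curve at positive frequencies is kept.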
8.5 Young’s Two-Slit Setup and Spatial Coherence

In close analogy with the Michelson interferometer, which is able to investigate
temporal coherence, a Young’s two-slit setup can be used to investigate spatial co-
herence of quasi-monochromatic light. Thomas Young, who lived nearly a century
before Michelson, used his two-slit setup for the first conclusive demonstration
that light propagates as a wave. The Young’s double-slit setup and the Michelson
interferometer have in common that two beams of light travel different paths
and then interfere. In the Michelson interferometer, one path is delayed with
respect to the other so that temporal effects can be studied. In the Young’s two-slit
setup, two laterally separate points of the same wave are compared as they are
sent through two slits. Depending on the coherence of the wave at the two points,
the fringe pattern observed can exhibit good or poor visibility.
Just as the Michelson interferometer is sensitive to the spectral content of
light, the Young’s two-slit setup is sensitive to the spatial extent of the light source
illuminating the two slits. For example, if light from a distant star (restricted
by a filter to a narrow spectral range) is used to illuminate a double-slit setup,
the resulting interference pattern appearing on a subsequent screen contains
information regarding the angular width of the star. Michelson was the first to
use this type of setup to measure the angular width of stars.
Light emerging from a single ideal point source has wave fronts that are
spatially uniform in a lateral sense (see Fig. 8.5). Such wave fronts are said to be
spatially coherent, even if the temporal coherence is not perfect (i.e. if a range
of frequencies is present). When spatially coherent light illuminates a Young’s
two-slit setup, fringes of maximum visibility are seen at a distant screen, meaning
the fringes vary between a maximum intensity and zero. If a larger source of light
(with randomly varying phase across its extent) is used to illuminate the Young’s
two-slit setup as in Fig. 8.6, the wave fronts at the two slits are less correlated,
and the visibility of the fringes on the distant screen diminishes because fringes
fluctuate rapidly in time and partially ‘wash out’.

Figure 8.5 A point source produces coherent (locked-phase) light. When this light
traverses two slits and arrives at a screen, it produces a fringe pattern.
When the slits of a Young’s two-slit setup are illuminated with spatially coher-
ent light, the resulting pattern on a far-away screen is given by
\[
I = 2I_0\left[ 1 + \cos\left( k(d_2 - d_1) + \phi_2 - \phi_1 \right) \right] = 2I_0\left[ 1 + \cos\left( \frac{khy}{D} + \Delta\phi \right) \right] \tag{8.25}
\]
where φ1 and φ2 are the phases of the wave front at the two slits, respectively.
Notice the close similarity with a Michelson interferometer (see (8.1)). Here
the controlling variable is h (the separation of the slits) rather than τ (the delay
introduced by moving a mirror in the Michelson interferometer). To obtain the
final expression in (8.25) we have made the approximations:
\[
d_1(y) = \sqrt{(y - h/2)^2 + D^2} = D\sqrt{1 + \frac{(y - h/2)^2}{D^2}} \cong D\left[ 1 + \frac{(y - h/2)^2}{2D^2} + \cdots \right] \tag{8.26}
\]
and
\[
d_2(y) = \sqrt{(y + h/2)^2 + D^2} = D\sqrt{1 + \frac{(y + h/2)^2}{D^2}} \cong D\left[ 1 + \frac{(y + h/2)^2}{2D^2} + \cdots \right] \tag{8.27}
\]
These approximations are valid as long as D ≫ y and D ≫ h.
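As a quick numerical check of these approximations, the sketch below compares the fringe pattern computed from the exact path lengths with the small-angle form in (8.25). The wavelength, slit separation, and screen distance are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical setup (values are assumptions chosen for illustration).
WAVELENGTH = 633e-9   # m
H = 0.5e-3            # slit separation, m
D = 1.5               # slit-to-screen distance, m
K = 2.0 * math.pi / WAVELENGTH

def d1(y):
    """Exact distance to screen position y from the slit displaced by +h/2."""
    return math.hypot(y - H / 2.0, D)

def d2(y):
    """Exact distance to screen position y from the slit displaced by -h/2."""
    return math.hypot(y + H / 2.0, D)

def intensity_exact(y, i0=1.0, dphi=0.0):
    """First form of (8.25), using the exact path difference."""
    return 2.0 * i0 * (1.0 + math.cos(K * (d2(y) - d1(y)) + dphi))

def intensity_approx(y, i0=1.0, dphi=0.0):
    """Second form of (8.25), using d2 - d1 ~ h y / D."""
    return 2.0 * i0 * (1.0 + math.cos(K * H * y / D + dphi))

# Spacing between adjacent bright fringes in the approximate pattern.
FRINGE_SPACING = WAVELENGTH * D / H

# Compare the two patterns across +/- 1 cm on the screen.
ys = [-0.01 + 1e-5 * n for n in range(2001)]
worst = max(abs(intensity_exact(y) - intensity_approx(y)) for y in ys)
print(f"fringe spacing = {FRINGE_SPACING * 1e3:.3f} mm, worst mismatch = {worst:.2e}")
```

The mismatch grows with y, as expected from the neglected higher-order terms in (8.26) and (8.27).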

We next consider how to modify (8.25) so that it applies to the case when
the two slits are illuminated by a host of point sources distributed over a finite
lateral extent. This situation is depicted in Fig. 8.6 and it leads to partial spatial
coherence if the phase of each emitter is random. We will find that a larger source
gives less coherent wave fronts at the slits.
To simplify our analysis, we restrict the distribution of point sources to vary
only in the $y'$ dimension. We assume that the light is quasi-monochromatic so
that its frequency is approximately ω with a phase that fluctuates randomly over
time intervals much longer than the period of oscillation 2π/ω. This necessarily
implies that there will be some frequency bandwidth, however small.

Figure 8.6 Light from an extended source is only partially coherent. Fringes are still
possible, but they exhibit less contrast.
The light emerging from the $j$th point at $y'_j$ travels by means of two very
narrow slits to a point $y$ on a screen. Let $E_1(y'_j)$ and $E_2(y'_j)$ be the fields on the
screen at $y$, each originating from the point $y'_j$ and traveling respectively through
the two slits. We suppress the vectorial nature of $E_1(y'_j)$ and $E_2(y'_j)$, and we ignore
possible complications due to field polarization. The total field contribution at
the screen from the $j$th point is obtained by adding $E_1(y'_j)$ and $E_2(y'_j)$. Let us make
the assumption that $E_1(y'_j)$ and $E_2(y'_j)$ have the same amplitude $|E(y'_j)|$. Thus,
the two fields differ only in their phases according to the respective distances
traveled to the screen. This allows us to write the two fields as
\[
E_1(y'_j) = \left| E(y'_j) \right| e^{i\left\{ k\left[ r_1(y'_j) + d_1(y) \right] - \omega t + \phi(y'_j) \right\}} \tag{8.28}
\]
and
\[
E_2(y'_j) = \left| E(y'_j) \right| e^{i\left\{ k\left[ r_2(y'_j) + d_2(y) \right] - \omega t + \phi(y'_j) \right\}} \tag{8.29}
\]
Notice that we have explicitly included an arbitrary phase $\phi(y'_j)$, which is different
for each point source.
We now set about finding the cumulative field at $y$ arising from the many
points indexed by the subscript $j$. We therefore sum over the index $j$. Again,
for simplicity we have assumed that the point sources are distributed along one
dimension, in the $y'$-direction. The upcoming results can be generalized to a
two-dimensional source where the point sources are distributed also in and out
of the plane of Fig. 8.6. However, in this case, the slits should be replaced with
two pinholes.
The net field on the screen at point y is
\[
E_{\rm net}(h) = \sum_j \left[ E_1(y'_j) + E_2(y'_j) \right] \tag{8.30}
\]
This net field depends not only on $h$, but also on $y$, $R$, $D$, and $k$, as well as on the
phase $\phi(y'_j)$ at each point. Nevertheless, in the end we will mainly emphasize the
dependence on the slit separation $h$. The intensity of this field is
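\[
\begin{aligned}
I_{\rm net}(h) &= \frac{\epsilon_0 c}{2}\left| E_{\rm net}(h) \right|^2 \\
&= \frac{\epsilon_0 c}{2}\left[ \sum_j E_1(y'_j) + E_2(y'_j) \right]\left[ \sum_m E_1(y'_m) + E_2(y'_m) \right]^* \\
&= \frac{\epsilon_0 c}{2}\sum_{j,m}\left[ E_1(y'_j) E_1^*(y'_m) + E_2(y'_j) E_2^*(y'_m) + 2\,\mathrm{Re}\, E_1(y'_j) E_2^*(y'_m) \right]
\end{aligned}
\tag{8.31}
\]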
When inserting the field expressions (8.28) and (8.29) into this expression for the
intensity at the screen, we get
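\[
\begin{aligned}
I_{\rm net}(h) = \frac{\epsilon_0 c}{2}\sum_{j,m}\Big[ &\left| E(y'_j) \right| \left| E(y'_m) \right| e^{ik\left[ r_1(y'_j) - r_1(y'_m) \right]} e^{i\left[ \phi(y'_j) - \phi(y'_m) \right]} \\
&+ \left| E(y'_j) \right| \left| E(y'_m) \right| e^{ik\left[ r_2(y'_j) - r_2(y'_m) \right]} e^{i\left[ \phi(y'_j) - \phi(y'_m) \right]} \\
&+ 2\,\mathrm{Re} \left| E(y'_j) \right| \left| E(y'_m) \right| e^{ik\left[ r_1(y'_j) - r_2(y'_m) \right]} e^{ik\left[ d_1(y) - d_2(y) \right]} e^{i\left[ \phi(y'_j) - \phi(y'_m) \right]} \Big]
\end{aligned}
\tag{8.32}
\]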
At this juncture we make a critical assumption that the phase of the emission
$\phi(y'_j)$ varies in time independently at every point on the source. This assumption
is appropriate for the emission from thermal sources such as starlight, a
glowing filament (filtered to a narrow frequency range), or spontaneous emission
from an excited gas or plasma. The assumption of random phase, however, is
inappropriate for coherent sources such as laser light. We comment on this in
Appendix 8.B.
A wonderful simplification happens to (8.32) when $\phi(y'_j) - \phi(y'_m)$ varies randomly
in time for $j \neq m$ (i.e. when there is no correlation between the two phases).
Keep in mind that to the extent that the phases vary in time, the frequency spectrum
of the light broadens in competition with our quasi-monochromatic assumption.
If we average the intensity over an extended time, then $e^{i\left[ \phi(y'_j) - \phi(y'_m) \right]}$
averages to zero unless we have $j = m$, in which case the factor reduces to $e^0 = 1$.
Thus, we have
\[
\left\langle e^{i\left[ \phi(y'_j) - \phi(y'_m) \right]} \right\rangle_t = \delta_{j,m} \equiv
\begin{cases} 1 & \text{if } j = m, \\ 0 & \text{if } j \neq m. \end{cases}
\qquad \text{(random phase assumption)} \tag{8.33}
\]
The function $\delta_{j,m}$ is known as the Kronecker delta function.
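The averaging in (8.33) can be illustrated with a small Monte Carlo sketch. It assumes (as a stand-in for a real thermal source) that each emitter's phase is redrawn uniformly and independently at every time step; the function name `phase_correlation` and its parameters are hypothetical.

```python
import cmath
import math
import random

def phase_correlation(n_sources=3, n_steps=20000, seed=0):
    """Time-average exp(i[phi_j - phi_m]) for independently fluctuating phases."""
    rng = random.Random(seed)
    sums = [[0j] * n_sources for _ in range(n_sources)]
    for _ in range(n_steps):
        # Assumption: every emitter gets a fresh, independent phase at each step.
        phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_sources)]
        for j in range(n_sources):
            for m in range(n_sources):
                sums[j][m] += cmath.exp(1j * (phases[j] - phases[m]))
    return [[s / n_steps for s in row] for row in sums]

avg = phase_correlation()
for row in avg:
    # Magnitudes: close to 1 on the diagonal, close to 0 elsewhere.
    print("  ".join(f"{abs(v):.3f}" for v in row))
```

The off-diagonal entries shrink roughly as one over the square root of the number of time steps, which is why a sufficiently long average is needed for the Kronecker delta to emerge.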

The time-averaged intensity under the random-phase assumption (8.33) becomes
\[
\left\langle I_{\rm net}(h) \right\rangle_t = \sum_j I(y'_j) + \sum_j I(y'_j) + 2\,\mathrm{Re}\sum_j I(y'_j)\, e^{ik\left[ r_1(y'_j) - r_2(y'_j) \right]} e^{ik\left[ d_1(y) - d_2(y) \right]} \tag{8.34}
\]
We may use (8.26) and (8.27) to simplify $d_1(y) - d_2(y) \cong -hy/D$, and similarly, we may simplify
$r_1(y'_j) - r_2(y'_j) \cong -y'_j h/R$ with the approximations
\[
r_1(y'_j) = \sqrt{\left( y'_j - h/2 \right)^2 + R^2} \cong R\left[ 1 + \frac{\left( y'_j - h/2 \right)^2}{2R^2} + \cdots \right] \tag{8.35}
\]
and
\[
r_2(y'_j) = \sqrt{\left( y'_j + h/2 \right)^2 + R^2} \cong R\left[ 1 + \frac{\left( y'_j + h/2 \right)^2}{2R^2} + \cdots \right] \tag{8.36}
\]
With these simplifications, (8.34) becomes
\[
\left\langle I_{\rm net}(h) \right\rangle_t = 2\sum_j I(y'_j) + 2\,\mathrm{Re}\, e^{-i\frac{khy}{D}} \sum_j I(y'_j)\, e^{-i\frac{khy'_j}{R}} \qquad \text{(random phase assumption)} \tag{8.37}
\]
The only thing left to do is to put this formula into a slightly more familiar form:
\[
\left\langle I_{\rm net}(h) \right\rangle_t = 2\left[ \sum_j I(y'_j) \right]\left[ 1 + \mathrm{Re}\,\gamma(h) \right] \qquad \text{(random phase assumption)} \tag{8.38}
\]
where
\[
\gamma(h) \equiv \frac{ e^{-i\frac{khy}{D}} \sum_j I(y'_j)\, e^{-i\frac{khy'_j}{R}} }{ \sum_j I(y'_j) } \tag{8.39}
\]
Students should notice the close similarity to the Michelson interferometer, (8.8)
and (8.9). As before, γ(h) is known as the degree of coherence, in this case spatial
coherence. It controls the fringe pattern seen at the screen.

Thomas Young (1773–1829, English) was born in Milverton, Somerset, and
was the eldest of ten children. By age fourteen, he had become proficient at a
dozen different languages. As a young adult, he studied medicine and then
went to Göttingen, Germany, where he earned a doctoral degree in physics.
In 1801, he was appointed professor of natural philosophy at the Royal Institution,
but he also maintained an active medical practice on the side. He contributed
to a wide variety of fields and helped to decipher ancient Egyptian hieroglyphs,
including the Rosetta Stone. He published descriptions of the heart and arteries
as well as how the eye accommodates to see at different depths and how the eye
perceives color. In engineering fields, Young is well known for his analysis of
stresses and strains in elastic media. Young’s double-slit experiment gave
convincing evidence of the wave nature of light, overturning Newton’s corpuscular
theory. Regarding this, Thomas Young traded ideas with Augustin Fresnel
through correspondence.

We can generalize (8.38) so that it applies to the case of a continuous distribution
of light as opposed to a collection of discrete point sources. In Appendix 8.A
we show how summations in (8.38) and (8.39) become integrals over the source
intensity distribution, and we write
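\[
\left\langle I_{\rm net}(h) \right\rangle_t = 2\left\langle I_{\rm oneslit} \right\rangle_t \left[ 1 + \mathrm{Re}\,\gamma(h) \right] \qquad \text{(random phase assumption)} \tag{8.40}
\]
where
\[
\gamma(h) \equiv \frac{ e^{-i\frac{khy}{D}} \int_{-\infty}^{\infty} I(y')\, e^{-i\frac{khy'}{R}}\, dy' }{ \int_{-\infty}^{\infty} I(y')\, dy' } \tag{8.41}
\]
Note that $I(y')$ has units of intensity per length in this expression.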
The factor $\exp(-ikhy/D)$ defines the positions of the periodic fringes on
the screen. The remainder of (8.41) controls the depth of the fringes as the slit
separation h is varied. When the slit separation h increases, the amplitude of γ(h)
tends to diminish until the intensity at the screen becomes uniform. When the
two slits have very small separation (such that $e^{-ikhy'/R} \cong 1$) then we have $|\gamma(h)| = 1$
and very good fringe visibility results. As the slit separation h increases, the fringe
visibility
\[
V(h) = \left| \gamma(h) \right| \tag{8.42}
\]
diminishes, eventually approaching zero (see (8.18)). In analogy to the temporal
case (see (8.14)), we can define a slit separation sufficiently large to make the
fringes at the screen disappear:
\[
h_c \equiv 2\int_0^{\infty} \left| \gamma(h) \right|^2 dh \tag{8.43}
\]
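To make (8.41) through (8.43) concrete, here is a minimal sketch for a source of uniform intensity and width a. The numerical values of `WAVELENGTH`, `R`, and `A` are hypothetical. The degree of coherence is evaluated by a midpoint-rule version of (8.41) and compared with the closed form |sin x / x|, x = kha/(2R), that the uniform source yields; the integral (8.43) is then truncated at a finite upper limit, so the result is only approximate.

```python
import cmath
import math

# Hypothetical values chosen for illustration only.
WAVELENGTH = 633e-9   # m
R = 1.0               # source-to-slits distance, m
A = 0.2e-3            # width of the uniform source, m
K = 2.0 * math.pi / WAVELENGTH

def gamma_numeric(h, y=0.0, d_screen=2.0, n=400):
    """Midpoint-rule version of (8.41) for uniform I(y') on [-a/2, a/2]."""
    dy = A / n
    total = sum(cmath.exp(-1j * K * h * (-A / 2.0 + (j + 0.5) * dy) / R)
                for j in range(n)) * dy
    return cmath.exp(-1j * K * h * y / d_screen) * total / A

def visibility_closed_form(h):
    """|gamma(h)| = |sin(x)/x| with x = k h a / (2R) for the uniform source."""
    x = K * h * A / (2.0 * R)
    return 1.0 if x == 0.0 else abs(math.sin(x) / x)

def coherence_separation(periods=200, pts=100):
    """h_c = 2 * integral of |gamma|^2 dh from (8.43), truncated after `periods` zeros."""
    h_zero = WAVELENGTH * R / A          # first zero of the visibility
    dh = h_zero / pts
    return 2.0 * sum(visibility_closed_form((i + 0.5) * dh) ** 2
                     for i in range(periods * pts)) * dh

for h in (0.5e-3, 1.0e-3, 2.0e-3):
    print(f"h = {h * 1e3:.1f} mm: numeric {abs(gamma_numeric(h)):.4f}, "
          f"closed form {visibility_closed_form(h):.4f}")
print(f"h_c ~ {coherence_separation() * 1e3:.3f} mm")
```

The truncation discards a slowly decaying tail of the integrand, so increasing `periods` moves the estimate closer to the exact value.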
Appendix 8.A Spatial Coherence for a Continuous Source

In this appendix we examine the spatial coherence of light from a continuous
spatial distribution (as opposed to a collection of discrete point sources) and
justify (8.41) and (8.42) under the assumption of randomly varying phase at the
source. We begin by replacing the summations in (8.32) with integrals over a con-
tinuous emission source. As we do this, we must consider the field contributions
to be in units of field per length of the extended source. We make the following
replacements:
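\[
\begin{aligned}
\sum_j E_1(y'_j) &\to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} E_1(y')\, dy' \\
\sum_m E_1(y'_m) &\to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} E_1(y'')\, dy'' \\
\sum_j E_2(y'_j) &\to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} E_2(y')\, dy' \\
\sum_m E_2(y'_m) &\to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} E_2(y'')\, dy''
\end{aligned}
\tag{8.44}
\]
We include the factor $1/\sqrt{2\pi}$ here as part of the definition of the field distributions
for later convenience. With the above replacements, (8.32) becomes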
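\[
\begin{aligned}
I_{\rm net}(h) = \frac{\epsilon_0 c}{2}\Bigg[ &\frac{1}{2\pi}\int_{-\infty}^{\infty} \left| E(y') \right| e^{ikr_1(y')} e^{i\phi(y')}\, dy' \int_{-\infty}^{\infty} \left| E(y'') \right| e^{-ikr_1(y'')} e^{-i\phi(y'')}\, dy'' \\
&+ \frac{1}{2\pi}\int_{-\infty}^{\infty} \left| E(y') \right| e^{ikr_2(y')} e^{i\phi(y')}\, dy' \int_{-\infty}^{\infty} \left| E(y'') \right| e^{-ikr_2(y'')} e^{-i\phi(y'')}\, dy'' \\
&+ 2\,\mathrm{Re}\, \frac{e^{ik\left[ d_1(y) - d_2(y) \right]}}{2\pi}\int_{-\infty}^{\infty} \left| E(y') \right| e^{ikr_1(y')} e^{i\phi(y')}\, dy' \int_{-\infty}^{\infty} \left| E(y'') \right| e^{-ikr_2(y'')} e^{-i\phi(y'')}\, dy'' \Bigg]
\end{aligned}
\tag{8.45}
\]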

The next step is to make the average over random phases. Rather than deal
with a time average of randomly varying phases, we will instead work with a linear
superposition of all conceivable phase factors. That is, we will write the phase as
$\phi(y'_j) \to K y'$, where $K$ is a parameter with units of inverse length, which we allow
to take on all possible real values with uniform likelihood. The way we modify
(8.33) for the continuous case is then
\[
\left\langle e^{i\left[ \phi(y'_j) - \phi(y'_m) \right]} \right\rangle_t = \delta_{j,m} \;\to\; \int_{-\infty}^{\infty} e^{iK(y' - y'')}\, dK = 2\pi\,\delta(y'' - y') \tag{8.46}
\]
Instead of taking the time average, we integrate both sides of (8.45) over all pos-
sible values of the phase parameter K , whereupon the delta function in (8.46)
naturally arises on the right-hand side of the equation.
When (8.45) is integrated over K , the result is
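\[
\begin{aligned}
\int_{-\infty}^{\infty} I_{\rm net}(h)\, dK = \frac{\epsilon_0 c}{2}\Bigg[ &\int_{-\infty}^{\infty} \left| E(y') \right| e^{ikr_1(y')}\, dy' \int_{-\infty}^{\infty} \left| E(y'') \right| e^{-ikr_1(y'')}\, \delta(y'' - y')\, dy'' \\
&+ \int_{-\infty}^{\infty} \left| E(y') \right| e^{ikr_2(y')}\, dy' \int_{-\infty}^{\infty} \left| E(y'') \right| e^{-ikr_2(y'')}\, \delta(y'' - y')\, dy'' \\
&+ 2\,\mathrm{Re}\, e^{ik\left[ d_1(y) - d_2(y) \right]} \int_{-\infty}^{\infty} \left| E(y') \right| e^{ikr_1(y')}\, dy' \int_{-\infty}^{\infty} \left| E(y'') \right| e^{-ikr_2(y'')}\, \delta(y'' - y')\, dy'' \Bigg]
\end{aligned}
\tag{8.47}
\]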
It may seem strange at first that the left-hand side of (8.47) has units of inten-
sity per unit length. This is somewhat abstract. However, these units result from
the natural way of dealing with the random phases when the source is continuous.
As K varies, the phase distribution at the source varies. The integral in (8.47)
averages all of these possibilities.
The delta functions in (8.47) allow us to perform another stage of integration
for each term on the right-hand side. We can also make substitutions from (8.26),
(8.27), (8.35) and (8.36). The result is
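\[
\int_{-\infty}^{\infty} I_{\rm net}(h)\, dK = 2\int_{-\infty}^{\infty} I(y')\, dy' + 2\,\mathrm{Re}\, e^{-i\frac{khy}{D}} \int_{-\infty}^{\infty} I(y')\, e^{-i\frac{khy'}{R}}\, dy' \tag{8.48}
\]
where
\[
I(y') \equiv \frac{1}{2}\epsilon_0 c \left| E(y') \right|^2 \tag{8.49}
\]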
Notice that $I(y')$ in the present context has units of intensity per length squared
since $E(y')$ has units of field per length. As they should, the units on the two sides
of (8.48) match, both having units of intensity per length. (Recall that $K$ has units
of inverse length and $I_{\rm net}(h)$ has the usual units of intensity.) We can renormalize these
strange units on each side of the equation. We can redefine the left-hand side
$\int_{-\infty}^{\infty} I_{\rm net}(h)\, dK$ to be the intensity at the screen and the integral on the right-hand
side $\int_{-\infty}^{\infty} I(y')\, dy'$ to be the intensity at the screen when only one slit is open. Then
(8.48) reduces to (8.40) and (8.41).

Appendix 8.B Van Cittert-Zernike Theorem

In this appendix we avoid making the assumption of randomly varying phase.
This would be the case when the source of light is, for example, a laser. By
substituting (8.28) and (8.29) into (8.45) we have
¯ ∞ ¯2 ¯ ∞ ¯2
²0 c ¯¯ ¯ i φ( y 0 )+i k y 02 −i kh y 0 ¯ i φ(y 0 )+i k y 02 i kh y 0
¯Z · ¸ ¯ ¯Z · ¸ ¯
0 0 0 0
¯ ¯ ¯ ¯ ¯
I net (h) = p ¯E (y )¯ e 2R e 2R d y ¯ + ¯ ¯E (y )¯ e 2R e 2R d y ¯
2 2π ¯
¯ ¯ ¯ ¯
−∞ −∞
¯ ¯ ¯
kh y Z∞ ·
 ∞
 ∗ 
ei D ¯ ¡ 0 ¢¯ i φ(y 0 )+i k y 02 −i kh y 0 k y 002 kh y 00
¸  Z ·¯ ¡ ¢¯ ¸
¯E y 00 ¯ e i φ( y )+i 2R e i 2R d y 00 
00

+ 2Re p ¯E y ¯ e 2R e 2R d y 0
2π  
−∞ −∞
(8.50)
The three terms on the right-hand side of (8.50) can be understood as follows.
The first term is the intensity on the screen when the lower slit is covered. The
second term is the intensity on the screen when the upper slit is covered. The last
term is the interference term, which modifies the sum of the individual intensities
when both slits are uncovered.
Notice the occurrence of Fourier transforms (over position) on the quantities
inside of the square brackets. Later, when we study diffraction theory, we will
recognize these transforms as determining the strength of fields impinging on
the individual slits. This corresponds to a major difference between a spatially
coherent source and a random-phase source. With the random-phase source, the
slits are always illuminated with the same strength regardless of the separation.
However, with a coherent source, ‘beaming’ can occur such that the strength as
well as the phase of the field at each slit depends on the slit separation.
A beautiful simplification occurs when the phase of the emitted light has the
following distribution:
\[
\phi(y') = -\frac{ky'^2}{2R} \qquad \text{(converging spherical wave)} \tag{8.51}
\]
Equation (8.51) is not as arbitrary as it may first appear. This particular phase
is an approximation to a concave spherical wave front converging to the center
between the two slits. This type of wave front is created when a plane wave passes
through a lens. With the special phase (8.51), the intensity (8.50) reduces to
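\[
\begin{aligned}
I_{\rm net}(h) = \frac{\epsilon_0 c}{2}\Bigg\{ &\left| \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \left| E(y') \right| e^{-i\frac{khy'}{2R}}\, dy' \right|^2
 + \left| \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \left| E(y') \right| e^{i\frac{khy'}{2R}}\, dy' \right|^2 \\
&+ 2\,\mathrm{Re}\left[ \frac{e^{-i\frac{khy}{D}}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \left| E(y') \right| e^{-i\frac{khy'}{2R}}\, dy'
 \left( \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \left| E(y'') \right| e^{i\frac{khy''}{2R}}\, dy'' \right)^{*} \right] \Bigg\}
\end{aligned}
\qquad \text{(converging spherical wave)} \tag{8.52}
\]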
There is a close resemblance between the expression

\[
\left| E_{\text{slit one}}(h/2) \right| \equiv \left| \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \left| E(y') \right| e^{-i\frac{khy'}{2R}}\, dy' \right| \tag{8.53}
\]
and the magnitude of the degree of coherence $V = |\gamma(h)|$ from (8.41). Here
$E_{\text{slit one}}$ denotes the field impinging on the screen that goes through the upper slit
positioned at a distance h/2 from center. The field strength when the single slit is
positioned at h compared to that when it is positioned at zero is
\[
\left| \frac{E_{\text{slit one}}(h)}{E_{\text{slit one}}(0)} \right| = \frac{ \left| \int_{-\infty}^{\infty} \left| E(y') \right| e^{-i\frac{khy'}{R}}\, dy' \right| }{ \left| \int_{-\infty}^{\infty} \left| E(y') \right| dy' \right| } \qquad \text{(converging spherical wave assumption)} \tag{8.54}
\]
This looks very much like $|\gamma(h)|$ of (8.41) except that the magnitude of the field
appears in (8.54), whereas the intensity appears in (8.41).

If we replace the field in (8.54) with one that is proportional to the intensity
(i.e. $|E_{\rm new}(y')| \propto I(y') \propto |E_{\rm old}(y')|^2$), then the expression becomes the same as
(8.41). This may seem rather contrived, but at least it is cute, and it is known as
the van Cittert-Zernike theorem. It says that the spatial coherence of an extended
source with randomly varying phase corresponds to the field distribution created
by replacing the extended source with a converging spherical wave whose field
amplitude distribution is the same as the original intensity distribution.
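The statement of the theorem can be checked numerically. The sketch below assumes a Gaussian source intensity of hypothetical width `WIDTH` and evaluates the normalized transform appearing in both (8.41) and (8.54) by the midpoint rule. Feeding it a field amplitude proportional to I(y′) reproduces |γ(h)|, whereas feeding it the original amplitude (proportional to the square root of I) does not; for this particular Gaussian profile the latter happens to give |γ(h)|².

```python
import cmath
import math

# Hypothetical values chosen for illustration only.
WAVELENGTH = 633e-9   # m
R = 1.0               # source-to-slits distance, m
WIDTH = 0.1e-3        # Gaussian width parameter of the source, m
K = 2.0 * math.pi / WAVELENGTH

def source_intensity(yp):
    """Assumed source intensity profile I(y') (arbitrary units)."""
    return math.exp(-(yp / WIDTH) ** 2)

def normalized_transform(profile, h, n=1200, span=6.0):
    """|sum of profile(y') exp(-i k h y'/R)| / |sum of profile(y')| on a midpoint grid."""
    dy = 2.0 * span * WIDTH / n
    pts = [-span * WIDTH + (j + 0.5) * dy for j in range(n)]
    num = sum(profile(yp) * cmath.exp(-1j * K * h * yp / R) for yp in pts)
    den = sum(profile(yp) for yp in pts)
    return abs(num) / abs(den)

def coherence_magnitude(h):
    """|gamma(h)| from (8.41): the profile is the intensity itself."""
    return normalized_transform(source_intensity, h)

def field_ratio_new(h):
    """(8.54) with the replacement |E_new(y')| proportional to I(y')."""
    return normalized_transform(lambda yp: source_intensity(yp), h)

def field_ratio_old(h):
    """(8.54) with the original amplitude |E_old(y')| proportional to sqrt(I(y'))."""
    return normalized_transform(lambda yp: math.sqrt(source_intensity(yp)), h)

h_test = 1.0e-3
print(f"|gamma|         = {coherence_magnitude(h_test):.4f}")
print(f"ratio, E ~ I    = {field_ratio_new(h_test):.4f}")
print(f"ratio, E ~ sqrt = {field_ratio_old(h_test):.4f}")
```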

Exercises
Exercises for 8.2 Temporal Coherence
P8.1 Show that Re{γ(τ)} defined in (8.9) reduces to cos(ω0τ) in the case of a
plane wave $E(t) = E_0 e^{i(k_0 z - \omega_0 t)}$ being sent through a Michelson interferometer.
In other words, the output intensity from the interferometer
reduces to
\[
I = 2I_0\left[ 1 + \cos(\omega_0 \tau) \right]
\]
as you already expect.
HINT: Don’t be afraid of delta functions. After integration, the left-over
delta functions cancel.
P8.2 Light emerging from a dense hot gas has a collisionally broadened
power spectrum described by the Lorentzian function
\[
I(\omega) = \frac{I(\omega_0)}{1 + \left( \dfrac{\omega - \omega_0}{\Delta\omega_{\rm FWHM}/2} \right)^2}
\]
The light is sent into a Michelson interferometer. Make a graph of the
average power arriving to the detector as a function of τ.
HINT: See (0.56).
P8.3 Consider the light source described in P8.2.
(a) Regardless of how the phase of E(ω) is organized, the oscillation
of the energy arriving to the detector as a function of τ is the same.
The spectral phase of the light in P8.2 is randomly organized. Describe
qualitatively how the light probably looks as a function of time.
(b) Now suppose that the phase of the light is somehow neatly organized
such that
\[
E(\omega) = \frac{ i E(\omega_0)\, e^{i\frac{\omega}{c} z} }{ i + \dfrac{\omega - \omega_0}{\Delta\omega_{\rm FWHM}/2} }
\]
Perform the inverse Fourier transform on the field and find how the
intensity of the light looks as a function of time.
HINT:
\[
\int_{-\infty}^{\infty} \frac{e^{-iax}}{x + \beta}\, dx =
\begin{cases} -2i\pi e^{ia\beta} & \text{if } a > 0 \\ 0 & \text{if } a < 0 \end{cases}
\qquad \left( \mathrm{Im}\,\beta > 0 \right)
\]
The constants I(ω0) and Δω_FWHM will appear in the answer.

Exercises for 8.3 Fringe Visibility and Coherence Length
P8.4 (a) Verify that (8.18) gives the fringe visibility.
HINT: Write $\gamma = |\gamma| e^{i\phi}$ and assume that the oscillations in γ that give
rise to fringes are due entirely to changes in φ and that $|\gamma|$ is a slowly
varying function in comparison to the oscillations.
(b) What is the coherence time τc of the light in P8.2?
P8.5 (a) Show that the fringe visibility of a Gaussian spectral distribution
(see Example 8.2) goes from 1 to $e^{-\pi/2} = 0.21$ as the round-trip path in
one arm of the instrument is extended by a coherence length.
(b) Find the FWHM bandwidth in wavelength $\Delta\lambda_{\rm FWHM}$ in terms of the
coherence length $\ell_c$ and the center wavelength $\lambda_0$.
HINT: First determine $\Delta\omega_{\rm FWHM}$, defined to be the width of I(ω) at half
of its peak. To convert to a wavelength difference, use $\omega = 2\pi c/\lambda \Rightarrow
\Delta\omega_{\rm FWHM} \cong -\frac{2\pi c}{\lambda_0^2}\Delta\lambda_{\rm FWHM}$. You can ignore the minus sign; it simply
means that wavelength decreases as frequency increases.
Exercises for 8.4 Fourier Spectroscopy
L8.6 (a) Use a scanning Michelson interferometer to measure the wavelength
of the ultrashort laser pulses from a mode-locked Ti:sapphire
oscillator.
(b) Measure the coherence length of the source by observing the distance
over which the visibility diminishes. From your measurement,
what is the bandwidth $\Delta\lambda_{\rm FWHM}$ of the source, assuming the Gaussian
profile in the previous problem? See P8.5.
(c) Use a computer to perform a fast Fourier transform (FFT) of the
signal output. For the positive frequencies, plot the laser spectrum as a
function of λ and compare with the results of (a) and (b).
(d) How do the results change if the ultrashort pulses are first stretched
in time by traversing a thick piece of glass?
Figure 8.7 (labels: Beam Splitter, Detector)
Exercises for 8.5 Young’s Two-Slit Setup and Spatial Coherence
P8.7 (a) A point source with wavelength λ = 500 nm illuminates two parallel
slits separated by h = 1.0 mm. If the screen is D = 2 m away, what is
the separation between the diffraction peaks on the screen? Make a
sketch.
(b) A thin piece of glass with thickness d = 0.01 mm and index n = 1.5 is
placed in front of one of the slits. By how many fringes does the pattern
at the screen move?

HINT: This effectively introduces a relative phase ∆φ in (8.25). Compare
the phase of the light when traversing the glass versus traversing
an empty region of the same thickness.
L8.8 (a) Carefully measure the separation of a double slit in the lab (h ∼
1 mm separation) by shining a HeNe laser (λ = 633 nm) through it and
measuring the diffraction peak separations on a distant wall (say, 2 m
from the slits).
HINT: For better accuracy, measure across several fringes and divide.
Figure 8.8 (labels: Laser; Rotating diffuser to create phase variation; Single slit, width a; Double slit, separation h; Filter; CCD Camera)
(b) Create an extended light source with a HeNe laser using a time-
varying diffuser followed by an adjustable single slit. (The diffuser
must rotate rapidly to create random time variation of the phase at
each point as would occur automatically for a natural source such
as a star.) Place the double slit at a distance of R ≈ 100 cm after the
first slit. (Take note of the exact value of R, as you will need it for the
next problem.) Use a lens to image the diffraction pattern that would
have appeared on a far-away screen into a video camera. Observe
the visibility of the fringes. Adjust the width of the source with the
single slit until the visibility of the fringes disappears. After making the
source wide enough to cause the fringe pattern to degrade, measure
the single slit width a by shining a HeNe laser through it and observing
the diffraction pattern on the distant wall.
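HINT: As we will study later, a single slit of width a produces an intensity
pattern on a screen a distance L away described by
\[
I(x) = I_{\rm peak}\, \mathrm{sinc}^2\left( \frac{\pi a}{\lambda L} x \right)
\]
where $\mathrm{sinc}(\alpha) \equiv \dfrac{\sin\alpha}{\alpha}$ and $\lim_{\alpha \to 0} \dfrac{\sin\alpha}{\alpha} = 1$.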
NOTE: It would have been nicer to vary the separation of the two slits
to determine the width of a fixed source. However, because it is hard to
make an adjustable double slit, we varied the size of the source until
the spatial coherence of the light matched the slit separation.

P8.9 (a) Compute $h_c$ for a uniform intensity distribution of width a using
(8.43).
(b) Use this formula to check that your measurements in L8.8 agree
with spatial coherence theory.
HINT: In your experiment h c is the double slit separation. Use your
measured R and h to calculate what the width of the single slit (i.e.
a) should have been when the fringes disappeared and compare this
calculation to your direct measurement of a.
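Solution: (This is only a partial solution)
\[
\begin{aligned}
\gamma(h) &= \frac{ \int_{-a/2}^{a/2} I_0 \exp\left[ -ikh\left( \frac{y'}{R} + \frac{y}{D} \right) \right] dy' }{ \int_{-a/2}^{a/2} I_0\, dy' }
 = \frac{ e^{-ikh\frac{y}{D}} }{a} \int_{-a/2}^{a/2} e^{-ikh\frac{y'}{R}}\, dy'
 = \frac{ e^{-ikh\frac{y}{D}} }{a} \left[ \frac{ e^{-ikh\frac{y'}{R}} }{ -ikh/R } \right]_{-a/2}^{a/2} \\
&= e^{-ikh\frac{y}{D}} \left[ \frac{ e^{-ikh\frac{a/2}{R}} - e^{-ikh\frac{-a/2}{R}} }{ -2i\,kh\,\frac{a/2}{R} } \right]
 = e^{-ikh\frac{y}{D}}\, \mathrm{sinc}\left( \frac{kha}{2R} \right)
\end{aligned}
\]
Note that
\[
\int_0^{\infty} \frac{\sin^2 \alpha x}{(\alpha x)^2}\, dx = \frac{\pi}{2\alpha}
\]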

R29 T or F: In our notation (widely used), I(t) is the Fourier transform of
I(ω).
R30 T or F: The integral of I(t) over all t equals the integral of I(ω) over all
ω.
R31 T or F: The phase velocity of light (the speed of an individual frequency
component of the field) never exceeds the speed of light c.
R32 T or F: The group velocity of light in a homogeneous material can
exceed c if absorption or amplification takes place.
R33 T or F: The group velocity of light never exceeds the phase velocity.
R34 T or F: A Michelson interferometer can be used to measure the spectral
intensity of light I(ω).
R35 T or F: A Michelson interferometer can be used to measure the duration
of a short laser pulse and thereby characterize its chirp.
R36 T or F: A Michelson interferometer can be used to measure the wavelength
of light.
R37 T or F: A Michelson interferometer can be used to measure the phase
of E(ω).
R38 T or F: The Fourier transform (or inverse Fourier transform if you prefer)
of I(ω) is proportional to the degree of temporal coherence.
R39 T or F: A Michelson interferometer is ideal for measuring the spatial
coherence of light.
R40 T or F: The Young’s two-slit setup is ideal for measuring the temporal
coherence of light.
R41 T or F: Vertically polarized light illuminates a Young’s double-slit setup
and fringes are seen on a distant screen with good visibility. A half wave
plate is placed in front of one of the slits so that the polarization for that
slit becomes horizontally polarized. Here’s the statement: The fringes
at the screen will shift position but maintain their good visibility.
Problems
R42 (a) Horizontally polarized light enters a system and first travels through
a horizontal and then a vertical polarizer in series. What is the Jones
vector of the transmitted field?
(b) Now a polarizer at 45° is inserted between the two polarizers in the
system described in (a). What is the Jones vector of the transmitted
field? How does the final intensity compare to initial intensity?
(c) Now a quarter wave plate with a fast-axis angle at 45° is inserted
between the two polarizers (instead of the polarizer of part (b)). What
is the Jones vector of the transmitted field? How does the final intensity
compare to initial intensity?
Figure 8.9 (labels: Horizontal Polarizer, Vertical Polarizer)
R43 (a) Find the Jones matrix for a half wave plate with its fast axis making an
arbitrary angle θ with the x-axis.
HINT: Project an arbitrary polarization with $E_x$ and $E_y$ onto the fast
and slow axes of the wave plate. Shift the slow axis phase by π, and then
project the field components back onto the horizontal and vertical axes.
The answer is
\[
\begin{bmatrix} \cos^2\theta - \sin^2\theta & 2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \sin^2\theta - \cos^2\theta \end{bmatrix}
\]
(b) We desire to create a variable attenuator for a polarized laser beam
using a half wave plate and a polarizer aligned to the initial polarization
of the beam (see figure). The fast axis of the half wave plate is initially
aligned in the direction of polarization and then rotated through an
angle θ. What is the ratio of the intensity exiting the polarizer to the
incoming intensity as a function of θ?
Figure 8.10 Polarizing Elements (labels: x-axis, y-axis, Fast axis, Transmission Axis)
R44 (a) What is the spectral content (i.e., I(ω)) of a square laser pulse
\[
E(t) = \begin{cases} E_0 e^{-i\omega_0 t}, & |t| \le \tau/2 \\ 0, & |t| > \tau/2 \end{cases}
\]
Make a sketch of I(ω), indicating the location of the first zeros.
(b) What is the temporal shape (i.e., I(t)) of a light pulse with frequency
content
\[
E(\omega) = \begin{cases} E_0, & |\omega - \omega_0| \le \Delta\omega/2 \\ 0, & |\omega - \omega_0| > \Delta\omega/2 \end{cases}
\]
where in this case $E_0$ has units of E-field per frequency. Make a sketch
of I(t), indicating the location of the first zeros.
(c) If E(ω) is known (any arbitrary function, not the same as above), and
the light goes through a material of thickness $\ell$ and index of refraction
n(ω), how would you find the form of the pulse E(t) after passing
through the material? Please set up the integral.
R45 (a) Prove Parseval’s theorem:
\[
\int_{-\infty}^{\infty} \left| E(\omega) \right|^2 d\omega = \int_{-\infty}^{\infty} \left| E(t) \right|^2 dt .
\]
HINT:
\[
\delta(t' - t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega(t' - t)}\, d\omega
\]
(b) Explain the physical relevance of Parseval’s theorem to light pulses.
Suppose that you have a detector that measures the total energy in
a pulse of light, say 1 mJ directed onto an area of 1 mm². Next you
measure the spectrum of light and find it to have a width of ∆λ =
50 nm, centered at λ0 = 800 nm. Assume that the light has a Gaussian
frequency profile
\[
I(\omega) = I(\omega_0)\, e^{-\left( \frac{\omega - \omega_0}{\delta\omega} \right)^2}
\]
Use as an approximate value $\delta\omega \cong \frac{2\pi c}{\lambda^2}\Delta\lambda$. Find a value and correct
units for I(ω0).
HINT:
\[
\int_{-\infty}^{\infty} e^{-Ax^2 + Bx + C}\, dx = \sqrt{\frac{\pi}{A}}\, e^{B^2/4A + C} \qquad \mathrm{Re}\{A\} > 0
\]
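R46 Continuous light entering a Michelson interferometer has a spectrum
described by
\[
I(\omega) = \begin{cases} I_0, & |\omega - \omega_0| \le \Delta\omega/2 \\ 0, & |\omega - \omega_0| > \Delta\omega/2 \end{cases}
\]
The Michelson interferometer uses a 50:50 beam splitter. The emerging
light has intensity $\langle I_{\rm det}(t, \tau) \rangle_t = 2\langle I(t) \rangle_t \left[ 1 + \mathrm{Re}\,\gamma(\tau) \right]$, where the degree of
coherence is
\[
\gamma(\tau) = \int_{-\infty}^{\infty} I(\omega)\, e^{-i\omega\tau}\, d\omega \Bigg/ \int_{-\infty}^{\infty} I(\omega)\, d\omega
\]
Find the fringe visibility $V \equiv (I_{\max} - I_{\min})/(I_{\max} + I_{\min})$ as a function of τ
(i.e. the round-trip delay due to moving one of the mirrors).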

R47 Light emerging from a point travels by means of two very narrow slits
to a point y on a screen. The intensity at the screen arising from a point
source at position $y'$ is found to be
\[
I_{\rm screen}(y', h) = 2I(y')\left\{ 1 + \cos\left[ kh\left( \frac{y}{D} + \frac{y'}{R} \right) \right] \right\}
\]
where an approximation has restricted us to small angles.
(Figure labels: Extended Source, Fringe Pattern)
(a) Now, suppose that $I(y')$ characterizes emission from a wider source
with randomly varying phase across its width. Write down an expression
(in integral form) for the resulting intensity at the screen:
\[
I_{\rm screen}(h) \equiv \int_{-\infty}^{\infty} I_{\rm screen}(y', h)\, dy'
\]
(b) Assume that the source has an emission distribution with the form
$I(y') = \left( I_0/\Delta y' \right) e^{-y'^2/\Delta y'^2}$. What is the function γ(h) where the intensity
is written $I_{\rm screen}(h) = 2\sqrt{\pi}\, I_0 \left[ 1 + \mathrm{Re}\,\gamma(h) \right]$?
HINT:
\[
\int_{-\infty}^{\infty} e^{-Ax^2 + Bx + C}\, dx = \sqrt{\frac{\pi}{A}}\, e^{B^2/4A + C} \qquad \mathrm{Re}\{A\} > 0.
\]
(c) As h varies, the intensity at a point on the screen y oscillates. As h
grows wider, the amplitude of oscillations decreases. How wide must
the slit separation h become (in terms of R, k, and $\Delta y'$) to reduce the
visibility to
\[
V \equiv \frac{I_{\max} - I_{\min}}{I_{\max} + I_{\min}} = \frac{1}{3}
\]
Selected Answers
R42: (b) 1/4, (c) 1/2.

R45: (b) $3.8 \times 10^{-16}$ J/(cm$^2$ · s$^{-1}$).

Chapter 9
Light as Rays
So far in our study of optics, we have described light in terms of waves, which
satisfy Maxwell’s equations. However, as is well known to students, in many
situations light can be thought of as rays directed along the flow of energy. A ray
picture is useful when one is interested in the macroscopic flow of light energy, but
rays fail to reveal fine details. For example, simple ray theory suggests that a lens
can focus light down to a point. However, if a beam of light were concentrated
onto a true point, the intensity would be infinite! Clearly ray theory cannot
describe the intensity profile of focused light, where it is necessary to consider
waves and diffraction phenomena. Nevertheless, ray theory is useful for predicting
where a focus occurs. It is also useful for describing imaging properties of optical
systems (e.g. lenses and mirrors).
Beginning in section 9.3 we study the details of ray theory and the imaging
properties of optical systems. First, however, we examine the justification for ray
theory starting from Maxwell’s equations. In the short-wavelength limit, Maxwell’s
equations give rise to the eikonal equation, which governs the direction of rays
in a medium with an index of refraction that varies with position. The German
word ‘eikonal’ comes from the Greek ‘εικων’ from which the modern word ‘icon’
derives. The eikonal equation therefore has a descriptive title since it controls the
formation of images. Although we will not use the eikonal equation extensively,
we will show how it embodies the underlying justification for ray theory. As will be
apparent in its derivation, the eikonal equation relies on an approximation that
the features of interest in the light distribution are large relative to the wavelength
of the light.
The eikonal equation describes the flow of energy in an optical medium.
This applies even to complicated situations such as desert mirages where air is
heated near the ground and has a different index than the air further from the
ground. Rays of light from the sky that initially are directed toward the ground
can be bent such that they travel parallel to or even up from the ground, owing
to the inhomogeneous refractive index. If the index of refraction as a function of
position is known, the eikonal equation can be used to determine the propagation
of such rays. This also applies to practical problems such as the propagation of
rays through lenses (where the index also varies with position, albeit abruptly).
The eikonal equation can be used to deduce Fermat’s principle, which in short
says that light travels from point A to point B following a path that takes the mini-
mum time. Of course Fermat asserted his principle more than a century before
Maxwell’s equations were known, but it is nice to give justification retroactively to
Fermat’s principle using the modern perspective.
We will analyze the propagation of rays through optical systems composed of
lenses and/or curved mirrors in the context of paraxial ray theory. The paraxial
approximation restricts rays to travel nearly parallel to the axis of such systems.
We consider the effects of three basic optical elements acting on paraxial rays.
The first element is simply an unobstructed distance d through which rays prop-
agate in a uniform medium; if the ray is not exactly parallel to the optical axis,
then it moves further away from (or closer to) the optical axis as it travels. The
second element is a curved spherical mirror, which reflects a ray and changes its
angle with respect to the optical axis. The third element is a spherical interface
between two materials with differing refractive indices. The effects of each of
these elements on a ray of light can be represented as a 2 × 2 matrix. These three
basic elements can be combined to construct more complex imaging systems
(such as a lens or a series of lenses and curved mirrors). The overall effect of a
complex system on a ray can be computed by multiplying together the matrices
associated with each of the basic elements.
Image formation occurs in the context of the paraxial approximation, includ-
ing the familiar formula
\[ \frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i} \tag{9.1} \]
which describes the location of images produced by a curved mirror or a thin lens.
We will see that complicated multi-element optical systems also obey (9.1) if d o
and d i are referenced to principal planes rather than the single plane of a thin lens.
In this case, the overall system has an effective focal length f eff . In appendix 9.A
we address deviations from the paraxial ray theory known as aberrations. We
also comment on ray-tracing techniques, used for designing optical systems that
minimize such aberrations.
Paraxial ray theory can also be used to study the stability of laser cavities. The
ray formalism predicts whether a ray, after many round trips in the cavity, remains
near the optical axis (trapped and therefore stable) or if it drifts endlessly away
from the axis of the cavity on successive round trips.
9.1 The Eikonal Equation

In this chapter, we consider light to consist of only a single frequency ω. The wave
equation (2.13) for a medium with a real index of refraction in this case may be
written as
\[ \nabla^2 \mathbf{E}(\mathbf{r},t) + \frac{n^2(\mathbf{r})\,\omega^2}{c^2}\, \mathbf{E}(\mathbf{r},t) = 0 \tag{9.2} \]
where we have already performed the time differentiation on the assumed time
dependence e −i ωt . Although in chapter 2 we considered solutions to the wave
equation in a homogeneous material, the wave equation remains perfectly valid
when the index of refraction varies throughout space (i.e. if n (r) is an arbitrary
function of r). In this case, the usual plane-wave solutions no longer satisfy the
wave equation.
As a trial solution for (9.2), we take
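\[ \mathbf{E}(\mathbf{r},t) = \mathbf{E}_0(\mathbf{r})\, e^{i[k_{\rm vac} R(\mathbf{r}) - \omega t]} \tag{9.3} \]
where
\[ k_{\rm vac} = \frac{\omega}{c} = \frac{2\pi}{\lambda_{\rm vac}} \tag{9.4} \]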
Here R (r) is a real scalar function (which depends on position) having the dimension
of length. By taking R (r) to be real, we do not account for absorption or
amplification in the medium. Even though the trial solution (9.3) looks somewhat
like a plane wave,1 the function R (r) accommodates wave fronts that can be
curved or distorted as depicted in Fig. 9.1. At any given instant t, the
curved surfaces described by R (r) = constant can be interpreted as wave fronts
of the solution. The wave fronts travel in the direction for which R (r) varies the
fastest. This direction is given by ∇R (r), which lies in the direction perpendicular
to surfaces of constant phase.
Figure 9.1 Wave fronts (i.e. surfaces of constant phase given by R(r)) distributed
throughout space in the presence of a spatially inhomogeneous refractive index. The
gradient of R gives the direction of travel for a wavefront.
The substitution of the trial solution (9.3) into the wave equation (9.2) gives
\[ \frac{1}{k_{\rm vac}^2} \nabla^2 \left[ \mathbf{E}_0(\mathbf{r})\, e^{i k_{\rm vac} R(\mathbf{r})} \right] + n^2(\mathbf{r})\, \mathbf{E}_0(\mathbf{r})\, e^{i k_{\rm vac} R(\mathbf{r})} = 0 \tag{9.5} \]
where we have divided each term by $e^{-i\omega t}$.
Computing the Laplacian in (9.5)
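The gradient of the x component of the field is
\[ \nabla \left[ E_{0x}(\mathbf{r})\, e^{i k_{\rm vac} R(\mathbf{r})} \right] = [\nabla E_{0x}(\mathbf{r})]\, e^{i k_{\rm vac} R(\mathbf{r})} + i k_{\rm vac} E_{0x}(\mathbf{r})\, [\nabla R(\mathbf{r})]\, e^{i k_{\rm vac} R(\mathbf{r})} \]
The Laplacian of the x component is
\[ \nabla \cdot \nabla \left[ E_{0x}(\mathbf{r})\, e^{i k_{\rm vac} R(\mathbf{r})} \right] = \Big\{ \nabla^2 E_{0x}(\mathbf{r}) - k_{\rm vac}^2 E_{0x}(\mathbf{r})\, [\nabla R(\mathbf{r})] \cdot [\nabla R(\mathbf{r})] + i k_{\rm vac} E_{0x}(\mathbf{r}) \left[ \nabla^2 R(\mathbf{r}) \right] + 2 i k_{\rm vac} [\nabla E_{0x}(\mathbf{r})] \cdot [\nabla R(\mathbf{r})] \Big\}\, e^{i k_{\rm vac} R(\mathbf{r})} \]
Upon combining the result for each vector component of E0 (r), the required spatial
derivative can be written as
\[ \nabla^2 \left[ \mathbf{E}_0(\mathbf{r})\, e^{i k_{\rm vac} R(\mathbf{r})} \right] = \Big( \nabla^2 \mathbf{E}_0(\mathbf{r}) - k_{\rm vac}^2 \mathbf{E}_0(\mathbf{r})\, [\nabla R(\mathbf{r})] \cdot [\nabla R(\mathbf{r})] + i k_{\rm vac} \mathbf{E}_0(\mathbf{r}) \left[ \nabla^2 R(\mathbf{r}) \right] + 2 i k_{\rm vac} \big\{ \hat{\mathbf{x}}\, [\nabla E_{0x}(\mathbf{r})] \cdot [\nabla R(\mathbf{r})] + \hat{\mathbf{y}}\, [\nabla E_{0y}(\mathbf{r})] \cdot [\nabla R(\mathbf{r})] + \hat{\mathbf{z}}\, [\nabla E_{0z}(\mathbf{r})] \cdot [\nabla R(\mathbf{r})] \big\} \Big)\, e^{i k_{\rm vac} R(\mathbf{r})} \]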

1 If the index is spatially independent (i.e. n (r) → n), then (9.3) reduces to the usual plane-wave
solution of the wave equation. In this case, we have R (r) = k · r/k vac and the field amplitude
becomes constant (i.e. E0 (r) → E0 ).

After performing the Laplacian and after some rearranging, (9.5) becomes
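\[ \left[ \nabla R(\mathbf{r}) \cdot \nabla R(\mathbf{r}) - n^2(\mathbf{r}) \right] \mathbf{E}_0(\mathbf{r}) = \frac{\nabla^2 \mathbf{E}_0(\mathbf{r})}{k_{\rm vac}^2} + \frac{i}{k_{\rm vac}} \mathbf{E}_0(\mathbf{r})\, \nabla^2 R(\mathbf{r}) + \frac{2i}{k_{\rm vac}} \left[ \hat{\mathbf{x}}\, \nabla E_{0x}(\mathbf{r}) \cdot \nabla R(\mathbf{r}) + \hat{\mathbf{y}}\, \nabla E_{0y}(\mathbf{r}) \cdot \nabla R(\mathbf{r}) + \hat{\mathbf{z}}\, \nabla E_{0z}(\mathbf{r}) \cdot \nabla R(\mathbf{r}) \right] \tag{9.6} \]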
Don’t be afraid; at this point we are ready to make an important approxima-
tion. We take the limit of a very short wavelength (i.e. 1/k vac = λvac /2π → 0), and
the entire right-hand side of (9.6) vanishes (thank goodness)! With it we lose the
effects of diffraction. We also lose surface reflections at abrupt index changes
unless specifically considered. This approximation works best in situations where
only macroscopic features are of concern.
Our wave equation has been simplified to
[∇R (r)] · [∇R (r)] = n 2 (r) (9.7)

Written another way, this equation is
∇R (r) = n (r) ŝ (r) (9.8)
where ŝ is a unit vector pointing in the direction ∇R (r), the direction normal to
wave front surfaces. Equation (9.8) is called the eikonal equation.
Example 9.1
Suppose that a region of air above the desert on a hot day has an index of refraction
that varies with height y according to $n(y) = n_0 \sqrt{1 + y^2/h^2}$. Verify that
$R(x,y) = n_0 \left( x \pm y^2/2h \right)$ is a solution to the eikonal equation. (See problem P9.1 for a more
general solution.)

Solution: The gradient of our trial solution gives
\[ \nabla R(x,y) = n_0 \left( \hat{\mathbf{x}} \pm \hat{\mathbf{y}}\, y/h \right) \]
Substituting this into (9.7) gives
\[ \nabla R \cdot \nabla R = n_0 \left( \hat{\mathbf{x}} \pm \hat{\mathbf{y}}\, y/h \right) \cdot n_0 \left( \hat{\mathbf{x}} \pm \hat{\mathbf{y}}\, y/h \right) = n_0^2 \left( 1 + y^2/h^2 \right) = n^2(y) \]
which confirms that it is a solution. The direction of light propagation is
\[ \hat{\mathbf{s}}(y) \equiv \frac{\nabla R}{|\nabla R|} = \frac{n_0 \left( \hat{\mathbf{x}} \pm \hat{\mathbf{y}}\, y/h \right)}{n_0 \sqrt{1 + y^2/h^2}} = \frac{\hat{\mathbf{x}} \pm \hat{\mathbf{y}}\, y/h}{\sqrt{1 + y^2/h^2}} \]
Computed at various heights, the direction for rays turns out to be
\[ \hat{\mathbf{s}}(h) = \frac{\hat{\mathbf{x}} \pm \hat{\mathbf{y}}}{\sqrt{2}} \qquad \hat{\mathbf{s}}(h/2) = \frac{\hat{\mathbf{x}} \pm \hat{\mathbf{y}}/2}{\sqrt{5/4}} \qquad \hat{\mathbf{s}}(h/4) = \frac{\hat{\mathbf{x}} \pm \hat{\mathbf{y}}/4}{\sqrt{17/16}} \]
These are represented in Fig. 9.2. In a desert mirage, light from the sky can appear
to come from a lower position. We can determine a path for the rays by setting
dy/dx equal to the slope of ŝ:
\[ \frac{dy}{dx} = \pm\frac{y}{h} \quad \Rightarrow \quad y = y_0\, e^{\pm (x - x_0)/h} \]
Figure 9.2 Depiction of possible light ray paths in a region with varying index.
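The check in Example 9.1 can also be carried out numerically. The following is a minimal sketch, not part of the original text: the values of `N0` and `H` are arbitrary illustrative assumptions, and the helper names are hypothetical. It estimates ∇R by finite differences, evaluates the residual of (9.7), and steps along a ray with a simple Euler method to compare against the exponential path.

```python
import math

N0 = 1.0003  # assumed index at y = 0 (illustrative value only)
H = 10.0     # assumed length scale h (illustrative value only)

def index(y):
    """Refractive index n(y) = n0 * sqrt(1 + y^2/h^2) from Example 9.1."""
    return N0 * math.sqrt(1.0 + (y / H) ** 2)

def eikonal_R(x, y, sign=1):
    """Trial solution R(x, y) = n0 * (x +/- y^2 / 2h)."""
    return N0 * (x + sign * y ** 2 / (2.0 * H))

def grad_R(x, y, sign=1, step=1e-4):
    """Central-difference estimate of the gradient of R."""
    dR_dx = (eikonal_R(x + step, y, sign) - eikonal_R(x - step, y, sign)) / (2 * step)
    dR_dy = (eikonal_R(x, y + step, sign) - eikonal_R(x, y - step, sign)) / (2 * step)
    return dR_dx, dR_dy

def eikonal_residual(x, y, sign=1):
    """Left side minus right side of (9.7); near zero for a valid solution."""
    gx, gy = grad_R(x, y, sign)
    return gx * gx + gy * gy - index(y) ** 2

def trace_ray(x0, y0, x_end, sign=1, steps=20000):
    """Follow a ray using dy/dx = +/- y/h (simple Euler stepping)."""
    dx = (x_end - x0) / steps
    y = y0
    for _ in range(steps):
        y += sign * (y / H) * dx
    return y

if __name__ == "__main__":
    for height in (H, H / 2, H / 4):
        print(f"y = {height:5.2f}  residual = {eikonal_residual(3.0, height):.2e}")
    y_numeric = trace_ray(0.0, 1.0, 5.0)
    y_analytic = 1.0 * math.exp(5.0 / H)
    print(f"ray height at x = 5: numeric {y_numeric:.5f}, analytic {y_analytic:.5f}")
```

The residual should vanish to within rounding error at any height, and the stepped ray should approach the exponential curve as the step size shrinks.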

Under the assumption of an infinitely short wavelength, the Poynting vector

is directed along ŝ as demonstrated in P9.2. In other words, the direction of
ŝ specifies the direction of energy flow. The unit vector ŝ at each location in
space points perpendicular to the wave fronts and indicates the direction that the
waves travel as seen in Fig. 9.1. We refer to a collection of vectors ŝ distributed
throughout space as rays.
In retrospect, we might have jumped straight to (9.8) without going through
the above derivation. After all, we know that each part of a wave front advances
in the direction of its gradient ∇R (r) (i.e. in the direction that R (r) varies most
rapidly). We also know that each part of a wave front defined by R (r) = constant
travels at speed c/n (r). The slower a given part of the wave front advances, the
more rapidly R (r) changes with position r and the closer the contours of constant
phase. It follows that ∇R (r) must be proportional to n (r) since ∇R (r) denotes the
rate of change in R (r).
9.2 Fermat’s Principle

As we have seen, the eikonal equation (9.8) governs the path that rays follow as
they traverse a region of space, where the index varies with position. Another way
of deducing the correct path of rays is via Fermat’s principle. Fermat’s principle says
that if a ray happens to travel through point A and through point B, it will follow a
path between the points that takes the least time.

Pierre de Fermat (1601–1665, French) was born in Beaumont-de-Lomagne,
France to a wealthy merchant family. He attended the University of Toulouse
before moving to Bordeaux in the late 1620s where Fermat distinguished himself
as a mathematician. Fermat was proficient in many languages and went on to
obtain a law degree in 1631 from the University of Orleans. He continued his
study of mathematics as a hobby throughout his life. He corresponded with a
number of notable mathematicians, and through his letters made notable
contributions to analytic geometry, probability theory, and number theory. He was
often quite secretive about the methods used to obtain his results.
Mathematicians suspect that Fermat didn’t actually prove his famous last theorem,
which was not able to be verified until the 1990s. Fermat was the first to assert
that the path taken by a beam of light is the one that can be traveled in the least
amount of time.

Derivation of Fermat’s Principle from the Eikonal Equation
We begin by taking the curl of (9.8) to obtain (see footnote 2)
\[ \nabla \times [n(\mathbf{r})\, \hat{\mathbf{s}}(\mathbf{r})] = \nabla \times [\nabla R(\mathbf{r})] = 0 \tag{9.9} \]
This can be integrated over an open surface of area A to give
\[ \int_A \nabla \times [n(\mathbf{r})\, \hat{\mathbf{s}}(\mathbf{r})] \cdot d\mathbf{a} = \oint_C n(\mathbf{r})\, \hat{\mathbf{s}}(\mathbf{r}) \cdot d\boldsymbol{\ell} = 0 \tag{9.10} \]
where we have applied Stokes’ theorem (0.12) to convert the area integral into a
path integral around the perimeter contour C.
Equation (9.10) states that the integration of $n\hat{\mathbf{s}} \cdot d\boldsymbol{\ell}$ around a closed loop is always
zero. If we consider a closed loop from point A to point B and then back again to
point A, but along another route, the integrals for the two legs always cancel, even
when one leg is held fixed while the other is varied. This means
\[ \int_A^B n\hat{\mathbf{s}} \cdot d\boldsymbol{\ell} \ \text{ is independent of path from A to B.} \tag{9.11} \]
Now consider a path from A to B that is parallel to ŝ, as depicted in Fig. 9.3. In
this case, the cosine in the dot product is always one. If we choose some other
path that connects A and B, the cosine associated with the dot product is often
less than one, whereas the result of the integral is the same. Therefore, if we
artificially remove the dot product from the integral (i.e. exclude the cosine factor),
the result of the integral will exceed the true value unless the path chosen follows
the direction of ŝ (i.e. the path that corresponds to the one that light rays actually
follow).
2 The curl of a gradient is identically zero for any function.

In mathematical form, this argument can be expressed as
\[ \int_A^B n\hat{\mathbf{s}} \cdot d\boldsymbol{\ell} = \min \left\{ \int_A^B n\, d\ell \right\} \tag{9.12} \]
The integral on the right is called the optical path length (OPL) between points A
and B:
\[ OPL|_A^B \equiv \int_A^B n\, d\ell \tag{9.13} \]
Figure 9.3 A ray of light leaving point A arriving at B.
The conclusion is that the true path that light follows between two points (i.e.
the one that stays parallel to ŝ) is the one with the shortest optical path length.
The index n may vary with position and therefore can be different for each of the
incremental distances $d\ell$.
Fermat’s principle is usually stated in terms of the time it takes light to travel
between points. The travel time ∆t depends not only on the path taken by the
light but also on the velocity of the light v (r), which varies spatially with the
refractive index:
\[ \Delta t|_A^B = \int_A^B \frac{d\ell}{v(\mathbf{r})} = \int_A^B \frac{d\ell}{c/n(\mathbf{r})} = \frac{OPL|_A^B}{c} \tag{9.14} \]
To find the correct path for the light ray that leaves point A and crosses point
B, we need only minimize the optical path length between the two points. Mini-
mizing the optical path length is equivalent to minimizing the time of travel since
it differs from the time of travel only by the constant c. The optical path length
is not the actual distance that the light travels; it is proportional to the number
of wavelengths that fit into that distance (see (2.24)). Thus, as the wavelength
shortens due to a higher index of refraction, the optical path length increases.
The correct ray traveling from A to B does not necessarily follow a straight line
but can follow a complicated curve according to how the index varies.
An imaging situation occurs when many paths from point A to point B have
the same optical path length. An example of this occurs when a lens causes an
image to form. In this case all rays leaving point A (on an object) and traveling
through the system to point B (on the image) experience equal optical path
lengths. This situation is depicted in Fig. 9.4. Note that while the rays traveling
through the center of the lens have a shorter geometric path length, they travel
through more material so that the optical path length is the same for all rays.
Figure 9.4 Rays of light leaving point A with the same optical path length to B.
To summarize Fermat’s principle, of the many rays that might emanate
from a point A, the ray that crosses a second point B is the one that follows the
shortest optical path length. If many rays tie for having the shortest optical path,
we say that an image of point A forms at point B. It should be noted that Fermat’s
principle, as we have written it, does not work for anisotropic media such as
crystals where n depends on the direction of a ray as well as on its location (see
P9.4).
Example 9.2
Use Fermat’s principle to derive Snell’s law.
Solution: Consider the many rays of light that leave point A seen in Fig. 9.5. Only
one of the rays passes through point B. Within each medium we expect the light to
travel in a straight line since the index is uniform. However, at the boundary we
must allow for bending since the index changes.
The optical path length between points A and B may be written
\[ OPL = n_i \sqrt{x_i^2 + y_i^2} + n_t \sqrt{x_t^2 + y_t^2} \tag{9.15} \]
We need to minimize this optical path length to find the correct one according to
Fermat’s principle.
Since points A and B are fixed, we may regard $x_i$ and $x_t$ as constants. The distances
$y_i$ and $y_t$ are not constants although the combination
\[ y_{\rm tot} = y_i + y_t \tag{9.16} \]
is constant. Thus, we may rewrite (9.15) as
\[ OPL(y_i) = n_i \sqrt{x_i^2 + y_i^2} + n_t \sqrt{x_t^2 + (y_{\rm tot} - y_i)^2} \tag{9.17} \]
where everything on the right-hand side is constant except for $y_i$.
Figure 9.5 Rays of light leaving point A; not all of them will traverse point B.
We now minimize the optical path length by taking the derivative and setting it
equal to zero:
\[ \frac{d(OPL)}{dy_i} = n_i \frac{y_i}{\sqrt{x_i^2 + y_i^2}} + n_t \frac{-(y_{\rm tot} - y_i)}{\sqrt{x_t^2 + (y_{\rm tot} - y_i)^2}} = 0 \tag{9.18} \]
Notice that
\[ \sin\theta_i = \frac{y_i}{\sqrt{x_i^2 + y_i^2}} \quad \text{and} \quad \sin\theta_t = \frac{y_t}{\sqrt{x_t^2 + y_t^2}} \tag{9.19} \]
When these are substituted into (9.18) we obtain
\[ n_i \sin\theta_i = n_t \sin\theta_t \tag{9.20} \]
which is the familiar Snell’s law.
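The minimization in Example 9.2 can be reproduced numerically. The sketch below is illustrative only: the golden-section search is a stand-in for the analytic derivative in (9.18), and the function names and the geometry passed to them are assumptions rather than anything defined in the text. It minimizes (9.17) over the crossing height and then compares the two sides of (9.20).

```python
import math

def optical_path_length(y_i, n_i, n_t, x_i, x_t, y_tot):
    """Optical path length (9.17) as a function of the crossing height y_i."""
    return n_i * math.hypot(x_i, y_i) + n_t * math.hypot(x_t, y_tot - y_i)

def golden_section_minimum(func, low, high, tol=1e-10):
    """Locate the minimum of a single-minimum function on [low, high]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = low, high
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return 0.5 * (a + b)

def snell_sides(n_i, n_t, x_i, x_t, y_tot):
    """Return (n_i sin(theta_i), n_t sin(theta_t)) for the least-OPL path."""
    y_i = golden_section_minimum(
        lambda y: optical_path_length(y, n_i, n_t, x_i, x_t, y_tot), 0.0, y_tot)
    y_t = y_tot - y_i
    sin_i = y_i / math.hypot(x_i, y_i)
    sin_t = y_t / math.hypot(x_t, y_t)
    return n_i * sin_i, n_t * sin_t

if __name__ == "__main__":
    left, right = snell_sides(n_i=1.0, n_t=1.5, x_i=1.0, x_t=1.0, y_tot=1.0)
    print(f"n_i sin(theta_i) = {left:.6f}, n_t sin(theta_t) = {right:.6f}")
```

The two printed values should agree to within the tolerance of the search, which is what (9.20) asserts.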

Example 9.3
Use Fermat’s principle to derive the equation of curvature for a reflective surface
that causes all rays leaving one point to image to another. Do the calculation in
two dimensions rather than in three.3
Solution: We adopt the convention that the origin is half way between the points,
which are separated by a distance 2a, as shown in Fig. 9.6. If the points are to
image to each other, Fermat’s principle requires that the total path length be a
constant; call it b. By inspection of the figure, we set the reflected path equal to the
constant b:
\[ \sqrt{(x+a)^2 + y^2} + \sqrt{(x-a)^2 + y^2} = b \tag{9.21} \]
To get (9.21) into a more recognizable form, we isolate the first square root and
square both sides of the equation, which gives
\[ (x+a)^2 + y^2 = b^2 + (x-a)^2 + y^2 - 2b \sqrt{(x-a)^2 + y^2} \]
Figure 9.6
After squaring the two binomial terms, some nice cancellations occur, and we get
\[ 4ax - b^2 = -2b \sqrt{(x-a)^2 + y^2} \]
which we square again to obtain
\[ 16a^2x^2 - 8ab^2x + b^4 = 4b^2 \left( x^2 - 2ax + a^2 + y^2 \right) \]
After some cancellations and regrouping this becomes
\[ \left( 16a^2 - 4b^2 \right) x^2 - 4b^2 y^2 = 4a^2b^2 - b^4 \]
Finally, we divide both sides by the term on the right to obtain the (hopefully)
familiar form of an ellipse
\[ \frac{x^2}{\left( \frac{b^2}{4} \right)} + \frac{y^2}{\left( \frac{b^2}{4} - a^2 \right)} = 1 \tag{9.22} \]
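As a quick numerical sanity check on (9.22), the sketch below picks points on the ellipse and confirms that the reflected path length in (9.21) equals b for each of them. The parametrization by an angle and the sample values of a and b are assumptions made for illustration.

```python
import math

def ellipse_point(a, b, angle):
    """Point on the ellipse (9.22) with semi-axes b/2 and sqrt(b^2/4 - a^2)."""
    semi_major = b / 2.0
    semi_minor = math.sqrt(b * b / 4.0 - a * a)
    return semi_major * math.cos(angle), semi_minor * math.sin(angle)

def reflected_path_length(a, x, y):
    """Left side of (9.21): distance (-a, 0) -> (x, y) plus (x, y) -> (a, 0)."""
    return math.hypot(x + a, y) + math.hypot(x - a, y)

if __name__ == "__main__":
    a, b = 1.0, 3.0  # assumed half-separation and total path length (b > 2a)
    for angle_deg in (0, 30, 90, 200):
        x, y = ellipse_point(a, b, math.radians(angle_deg))
        print(f"angle {angle_deg:3d} deg: path length = {reflected_path_length(a, x, y):.12f}")
```

Every point should report a path length equal to b, which is the equal-path condition Fermat’s principle requires for imaging.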
9.3 Paraxial Rays and ABCD Matrices

We now turn our attention to the effects of curved mirrors and lenses on rays of
light. Keep in mind that when describing light as a collection of rays rather than
as waves, the results can only describe features that are macroscopic compared to
a wavelength. The rays of light at each location in space describe approximately
the direction of travel of the wave fronts at that location. Since the wavelength of
visible light is extraordinarily small compared to the macroscopic features that we
perceive in our day-to-day world, the ray approximation is often a very good one.
3 This configuration is used to direct flash lamp energy into a laser amplifier rod. One ‘point’ in
Fig. 9.6 represents the end of an amplifier rod while the other represents the end of a thin flash-lamp
tube.

This is the reason that ray optics was developed long before light was understood
as a wave.
We consider ray theory within the paraxial approximation, meaning that
we restrict our attention to rays that are near and almost parallel to an optical
axis of a system, say the z-axis. It is within this approximation that the familiar
imaging properties of lenses occur. An image occurs when all rays from a point
on an object converge to a corresponding point on what is referred to as the
image. To the extent that the paraxial approximation is violated, the clarity of
an image can suffer, and we say that there are aberrations present. The field of
optical engineering is often concerned with the minimization of aberrations in cases
where the paraxial approximation is not strictly followed. This is done so that, for
example, a camera can take pictures of objects that occupy a fairly wide angular
field of view, where rays violate the paraxial approximation. Optical systems are
typically engineered using the science of ray tracing, which is described briefly in
section 9.A.
As we develop paraxial ray theory, we should remember that rays impinging
on devices such as lenses or curved mirrors should strike the optical component
at near normal incidence. To quantify this statement, the paraxial approximation
is valid to the extent that
\[ \sin\theta \cong \theta \tag{9.23} \]
is a good approximation, and similarly
\[ \tan\theta \cong \theta \tag{9.24} \]
Here, the angle θ (in radians) represents the angle that a particular ray makes
with respect to the optical axis. There is an important mathematical reason for
this approximation. The sine is a nonlinear function, but at small angles it is
approximately linear and can be represented by its argument. It is this linearity
that is crucial to the process of forming images. The linearity also greatly simplifies
the formulation since it reduces the problem to linear algebra. Conveniently, we
will be able to keep track of imaging effects with a 2×2 matrix formalism.
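To get a feel for how quickly (9.23) and (9.24) degrade, the following sketch tabulates the relative error of each approximation at a few angles. The function name and the chosen angles are illustrative assumptions.

```python
import math

def paraxial_errors(angle_deg):
    """Relative errors of sin(theta) ~ theta and tan(theta) ~ theta."""
    theta = math.radians(angle_deg)
    sin_error = abs(math.sin(theta) - theta) / math.sin(theta)
    tan_error = abs(math.tan(theta) - theta) / math.tan(theta)
    return sin_error, tan_error

if __name__ == "__main__":
    for angle in (1.0, 5.0, 10.0, 30.0):
        sin_error, tan_error = paraxial_errors(angle)
        print(f"{angle:5.1f} deg: sin error {sin_error:.4%}, tan error {tan_error:.4%}")
```

For small angles the errors scale roughly as θ²/6 and θ²/3 respectively, so the tangent approximation fails first.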
Consider a ray propagating in the y–z plane where the optical axis is in the z-
direction. Let us specify a ray at position z 1 by two coordinates: the displacement
from the axis y 1 and the orientation angle θ1 (see Fig. 9.7). If the index is uniform
everywhere, the ray travels along a straight path. It is straightforward to predict the
coordinates of the same ray down stream, say at $z_2$. First, since the ray continues
in the same direction, we have
\[ \theta_2 = \theta_1 \tag{9.25} \]
By referring to Fig. 9.7 we can write $y_2$ in terms of $y_1$ and $\theta_1$:
\[ y_2 = y_1 + d \tan\theta_1 \tag{9.26} \]
where $d \equiv z_2 - z_1$. Equation (9.26) is nonlinear in $\theta_1$. However, in the paraxial
approximation (9.24), it becomes linear, which after all is the point of the
approximation. In this approximation the expression for $y_2$ simplifies to
\[ y_2 = y_1 + \theta_1 d \tag{9.27} \]
Figure 9.7 The behavior of a ray as light traverses a distance d.

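Equations (9.25) and (9.27) describe a linear transformation which in matrix
notation can be consolidated into the form
\[ \begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \tag{9.28} \]
(ABCD matrix for propagation through a distance d)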
Here, the vectors in this equation specify the essential information about the ray
before and after traversing the distance d , and the matrix describes the effect of
traversing the distance. This type of matrix is called an ABCD matrix; sometimes
physicists are not very inventive with names.
Example 9.4
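Let the distance d be subdivided into two distances, a and b, such that d = a + b.
Show that an application of the ABCD matrix for distance a followed by an
application of the ABCD matrix for b renders the same result as an application of the
ABCD matrix for distance d.
Solution: Individually, the effects of propagation through a and through b are
\[ \begin{bmatrix} y_{\rm mid} \\ \theta_{\rm mid} \end{bmatrix} = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix} \begin{bmatrix} y_{\rm mid} \\ \theta_{\rm mid} \end{bmatrix} \tag{9.29} \]
where the subscript “mid” refers to the ray in the middle position after traversing
the distance a. If we combine the equations, we get
\[ \begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \tag{9.30} \]
which is in agreement with (9.28) since the ABCD matrix for the entire displacement
is
\[ \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & a+b \\ 0 & 1 \end{bmatrix} \tag{9.31} \]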
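The composition in Example 9.4 is easy to check with a few lines of code. This is a minimal sketch using plain tuples for the 2 × 2 matrices; the helper names `matmul`, `propagate`, and `apply_to_ray` are hypothetical and chosen only for this illustration.

```python
def matmul(m1, m2):
    """Product m1 * m2 of two 2x2 matrices stored as ((A, B), (C, D))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def propagate(distance):
    """ABCD matrix (9.28) for travel through a uniform region."""
    return ((1.0, distance), (0.0, 1.0))

def apply_to_ray(matrix, ray):
    """Apply an ABCD matrix to a ray given as (y, theta)."""
    (a, b), (c, d) = matrix
    y, theta = ray
    return (a * y + b * theta, c * y + d * theta)

if __name__ == "__main__":
    a, b = 0.3, 0.2  # assumed distances, arbitrary units
    combined = matmul(propagate(b), propagate(a))  # rightmost matrix acts first
    print("two steps :", combined)
    print("one step  :", propagate(a + b))
    print("ray after :", apply_to_ray(combined, (0.01, 0.02)))
```

The ordering mirrors (9.30): the matrix for the first leg sits on the right so that it acts on the incoming ray first.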
9.4 Reflection and Refraction at Curved Surfaces

We next consider the effect of reflection from a spherical surface as depicted in
Fig. 9.8. We consider only the act of reflection without considering propagation
before or after the reflection takes place. Thus, the incident and reflected rays
in the figure are symbolic only of the direction of propagation before and after
reflection; they do not indicate any amount of travel. We immediately write
y2 = y1 (9.32)
since the ray has no chance to go anywhere.

We adopt the widely used convention that, upon reflection, the positive z-
direction is reoriented so that we consider the rays still to travel in the positive

z sense. An easy way to remember this is that the positive z direction is always
taken to be down stream of where the light is headed. Notice that in Fig. 9.8, the
reflected ray approaches the z-axis. In this case θ2 is a negative angle (as opposed
to θ1 which is drawn as a positive angle) and is equal to
θ2 = − (θ1 + 2θi ) (9.33)
where θi is the angle of incidence with respect to the normal to the spherical
mirror surface. By the law of reflection, the incident and reflected ray both occur
at an angle θi referenced to the surface normal. The surface normal points towards
the center of curvature of the mirror surface, which we assume is on the z-axis a
distance R away. By convention, the radius of curvature R is a positive number
if the mirror surface is concave and a negative number if the mirror surface is
convex.
Elimination of θi from (9.33) in favor of θ1 and y 1
By inspection of Fig. 9.8 we can write
\[ \frac{y_1}{R} = \sin\phi \cong \phi \tag{9.34} \]
where we have applied the paraxial approximation (9.23). (The angles in Fig. 9.8
are exaggerated.) By inspection of the geometry, we also have
\[ \phi = \theta_1 + \theta_i \tag{9.35} \]
and when this is combined with (9.34), we get
\[ \theta_i = \frac{y_1}{R} - \theta_1 \tag{9.36} \]
With this we are able to put (9.33) into a useful linear form:
\[ \theta_2 = -\frac{2}{R}\, y_1 + \theta_1 \tag{9.37} \]
Figure 9.8 A ray depicted in the act of reflection from a spherical surface.
Equations (9.32) and (9.37) describe a linear transformation that can be concisely
formulated as
\[ \begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -2/R & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \tag{9.38} \]
(ABCD matrix for a curved mirror)
The ABCD matrix in this transformation describes the act of reflection from a
concave mirror with radius of curvature R. The radius R is negative when the
mirror is convex.
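One way to see what (9.38) does is to send rays that are parallel to the axis onto a concave mirror and then let them travel a distance R/2. The sketch below does this with plain tuples; the helper names and the sample radius are assumptions for illustration only.

```python
def matmul(m1, m2):
    """Product m1 * m2 of two 2x2 matrices stored as ((A, B), (C, D))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def propagate(distance):
    """ABCD matrix (9.28) for travel through a uniform region."""
    return ((1.0, distance), (0.0, 1.0))

def curved_mirror(radius):
    """ABCD matrix (9.38); radius > 0 for concave, < 0 for convex."""
    return ((1.0, 0.0), (-2.0 / radius, 1.0))

def apply_to_ray(matrix, ray):
    """Apply an ABCD matrix to a ray given as (y, theta)."""
    (a, b), (c, d) = matrix
    y, theta = ray
    return (a * y + b * theta, c * y + d * theta)

def trace_to_half_radius(height, radius):
    """Reflect a ray parallel to the axis, then travel a distance radius/2."""
    system = matmul(propagate(radius / 2.0), curved_mirror(radius))
    return apply_to_ray(system, (height, 0.0))

if __name__ == "__main__":
    radius = 0.5  # assumed concave mirror radius, arbitrary units
    for height in (0.001, 0.005, 0.010):
        print(height, "->", trace_to_half_radius(height, radius))
```

Within the paraxial approximation, every incoming height lands at y = 0 after traveling R/2, which is consistent with a mirror focal length of R/2.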
The final basic element that we shall consider is a spherical interface between
two materials with indices n i and n t (see Fig. 9.9). This has an effect similar to
that of the curved mirror, which changes the direction of a ray without altering
its distance y 1 from the optical axis. Please note that here the radius of curvature
is considered to be positive for a convex surface (opposite convention from that

of the mirror). In this way, if the lower index is on the left, a positive radius R for
either the interface or the mirror tends to deflect rays towards the axis. Again, we
are interested only in the act of transmission without any travel before or after
the interface. As before, (9.32) applies (i.e. y 2 = y 1 ).
At the interface, the rays obey Snell’s law, which in the paraxial approximation
is written
\[ n_i \theta_i = n_t \theta_t \tag{9.39} \]
The angles θi and θt are referenced from the surface normal, as seen in Fig. 9.9.
Substituting $\theta_1$, $\theta_2$ and $y_1$ into Snell’s Law
By inspection of Fig. 9.9, we have
\[ \theta_i = \theta_1 + \phi \tag{9.40} \]
and
\[ \theta_t = \theta_2 + \phi \tag{9.41} \]
where φ is the angle that the surface normal makes with the z-axis. As before (see
(9.34)), within the paraxial approximation we may write
\[ \phi \cong y_1/R \]
When this is used in (9.40) and (9.41), which are substituted into (9.39), Snell’s law
becomes
\[ \theta_2 = \left( \frac{n_i}{n_t} - 1 \right) \frac{y_1}{R} + \frac{n_i}{n_t}\, \theta_1 \tag{9.42} \]
Figure 9.9 A ray depicted in the act of transmission at a curved material interface.
The compact matrix form of (9.32) and (9.42) is written
\[ \begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ (n_i/n_t - 1)/R & n_i/n_t \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \tag{9.43} \]
(ABCD matrix for a curved interface)
9.5 ABCD Matrices for Combined Optical Elements

To summarize the previous two sections, we have developed ABCD matrices for
three basic elements: 1) propagation through a region of uniform index (9.28),
2) reflection from a curved mirror (9.38), and 3) transmission through a curved
interface between regions with different indices (9.43). All other ABCD matrices
that we will use are composites of these three. For example, one can construct the
ABCD matrix for a lens by using two matrices like those in (9.43) to represent the
entering and exiting surfaces of the lens. A distance matrix (9.28) can be inserted
to account for the thickness of the lens. It is left as an exercise to derive the ABCD
matrix for a thick lens (see P9.6).

Example 9.5
Derive the ABCD matrix for a thin lens, where the thickness between the two lens
surfaces is ignored.
Solution: A thin lens is depicted in Fig. 9.10. $R_1$ is the radius of curvature for the
first surface (which is positive if convex as drawn), and $R_2$ is the radius of curvature
for the second surface (which is negative as drawn). For either surface, the radius
of curvature is considered to be positive if the surface is convex from the perspective
of rays that encounter it.
Figure 9.10 Thin lens.
We take the index outside of the lens to be unity and that of the lens material to
be n. We apply the ABCD matrix (9.43) in sequence, once for entering the lens and
once for exiting:
\[ \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \frac{1}{R_2}(n-1) & n \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \frac{1}{R_1}\left( \frac{1}{n} - 1 \right) & \frac{1}{n} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -(n-1)\left( \frac{1}{R_1} - \frac{1}{R_2} \right) & 1 \end{bmatrix} \tag{9.44} \]
(ABCD matrix for a thin lens)
The matrix for the first interface is written on the right, where it operates first on
an incoming ray vector. In this case, $n_i = 1$ and $n_t = n$. The matrix for the second
surface is written on the left so that it operates afterwards. For the second surface,
$n_i = n$ and $n_t = 1$.
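The product in (9.44) can be checked numerically. In the sketch below, the helper names and the sample lens parameters (index 1.5, radii of ±0.1) are illustrative assumptions. The lower-left element of the product is compared against −(n − 1)(1/R1 − 1/R2).

```python
def matmul(m1, m2):
    """Product m1 * m2 of two 2x2 matrices stored as ((A, B), (C, D))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def curved_interface(n_i, n_t, radius):
    """ABCD matrix (9.43); radius > 0 if convex as seen by the incoming ray."""
    return ((1.0, 0.0), ((n_i / n_t - 1.0) / radius, n_i / n_t))

def thin_lens_from_surfaces(n, r1, r2):
    """Second surface (glass to air) times first surface (air to glass)."""
    return matmul(curved_interface(n, 1.0, r2), curved_interface(1.0, n, r1))

if __name__ == "__main__":
    n, r1, r2 = 1.5, 0.1, -0.1  # assumed biconvex lens, arbitrary length units
    lens = thin_lens_from_surfaces(n, r1, r2)
    expected_c = -(n - 1.0) * (1.0 / r1 - 1.0 / r2)
    print("matrix      :", lens)
    print("expected C  :", expected_c)
```

For these sample values the lower-left element comes out near −10, so the sketch describes a converging lens with a focal length of about 0.1 in the same units.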
Notice the close similarity between (9.44) and the matrix in (9.38). The ABCD
matrix for either a thin lens or a mirror can be written as
\[ \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \tag{9.45} \]
where in the case of the thin lens the focal length is given by the lens maker’s
formula
\[ \frac{1}{f} = (n-1) \left( \frac{1}{R_1} - \frac{1}{R_2} \right) \quad \text{(focal length of thin lens)} \tag{9.46} \]
and in the case of a curved mirror, the focal length is
\[ f = R/2 \quad \text{(focal length for a curved mirror)} \tag{9.47} \]
Table 9.1 is a summary of ABCD matrices of common optical elements.

Distance within a material, excluding interfaces:
\[ \begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix} \]
Window, starting and stopping in air:
\[ \begin{bmatrix} 1 & d/n \\ 0 & 1 \end{bmatrix} \]
Thin lens or mirror:
\[ \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \]
Thin lens: $\frac{1}{f} = (n-1) \left( \frac{1}{R_1} - \frac{1}{R_2} \right)$
Mirror: $\frac{1}{f} = \frac{2}{R}$
Example 9.6
Thick lens
Derive the ABCD matrix for a window with thickness d and index n.
1+ d n 1 −1 d
 ³ ´ 
 R1 n 
µ ¶
(1−n) R1 − R1 + R dR 2− n 1 −n 1− d n
1 −1
 ³ ´ ³ ´
Solution: We can again take advantage of the ABCD matrix for a curved interface 1 2 1 2 R2
(9.43), only in this problem we will let R 1 = ∞ and R 2 = ∞ to provide flat surfaces.
We take the index outside of the window to be unity and the index inside the
Table 9.1 Summary of ABCD
matrices for common optical
elements.
window to be n. We use the ABCD matrix (9.43) twice, once for each interface,
sandwiching matrix (9.31), which endows the window with thickness:
A B 1 0 1 d 1 0
· ¸ · ¸· ¸· ¸
= 1
C D 0 n 0 1 0 n
(9.48)
1 d /n
· ¸
= (window)
0 1
As far as rays are concerned, a window is effectively shorter to traverse than free
space.4 Fig. 9.11 illustrates why this is the case. The displacement of the exiting ray
is not as great as it would have been without the window. The window impedes
the rate at which the ray can move away from or toward the optical axis.
Figure 9.11 Window.
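The window result can be checked numerically. The following sketch builds the window from two flat interfaces and a propagation step, then compares the lateral displacement of a ray with the displacement over the same distance in air. The helper names and the numbers (12 mm of glass with n = 1.5, a 50 mrad ray) are illustrative assumptions.

```python
def matmul(m1, m2):
    """Multiply two 2x2 matrices given as ((a, b), (c, d))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def flat_interface(n_i, n_t):
    """Flat-surface limit (R -> infinity) of the interface matrix."""
    return ((1.0, 0.0), (0.0, n_i / n_t))

def propagate(d):
    """Propagation through a distance d inside a uniform material."""
    return ((1.0, d), (0.0, 1.0))

n, d = 1.5, 0.012            # assumed index and thickness (meters)

# Entrance face on the right, exit face on the left.
window = matmul(flat_interface(n, 1.0),
                matmul(propagate(d), flat_interface(1.0, n)))

theta_in = 0.05              # paraxial entrance angle in radians
shift_window = window[0][1] * theta_in   # displacement across the glass
shift_air = d * theta_in                 # displacement across the same gap in air

print("window matrix:", window)
print("displacement in glass:", shift_window, " in air:", shift_air)
```

The upper-right element comes out as d/n, so the ray moves away from the axis more slowly inside the glass than it would in air.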
Example 9.7
Find the ray $\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix}$ that results when $\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix}$ propagates through a distance a, reflects from
a mirror of radius R, and then propagates through a distance b. See Fig. 9.12.
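Solution: The final ray in terms of the initial one is computed as follows:
\[
\begin{aligned}
\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix}
&= \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -2/R & 1 \end{bmatrix}
\begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \\
&= \begin{bmatrix} 1 - 2b/R & a + b - 2ab/R \\ -2/R & 1 - 2a/R \end{bmatrix}
\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \\
&= \begin{bmatrix} (1 - 2b/R)\, y_1 + (a + b - 2ab/R)\, \theta_1 \\ (-2/R)\, y_1 + (1 - 2a/R)\, \theta_1 \end{bmatrix}
\end{aligned}
\tag{9.49}
\]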
As always, the ordering of the matrices is important. The first effect that the ray
experiences is represented by the matrix on the right, which is in the position that
first operates on $\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix}$.
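The ordering rule can be illustrated with a short numerical sketch. It builds the system of Example 9.7 with the first element on the right, compares it with the closed form in (9.49), and shows that swapping the two propagation distances gives a different matrix. The helper names and the values of a, b, and R are illustrative assumptions.

```python
def matmul(m1, m2):
    """Multiply two 2x2 matrices given as ((a, b), (c, d))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def propagate(d):
    """Free propagation through a distance d."""
    return ((1.0, d), (0.0, 1.0))

def mirror(R):
    """Reflection from a curved mirror of radius R."""
    return ((1.0, 0.0), (-2.0 / R, 1.0))

def apply(m, ray):
    """Apply an ABCD matrix to a ray (y, theta)."""
    (A, B), (C, D) = m
    y, theta = ray
    return (A * y + B * theta, C * y + D * theta)

a, b, R = 0.30, 0.20, 0.50          # assumed distances in meters

# Rightmost matrix acts first: travel a, reflect, travel b.
system = matmul(propagate(b), matmul(mirror(R), propagate(a)))

# Closed form from (9.49) for comparison.
closed_form = ((1 - 2 * b / R, a + b - 2 * a * b / R),
               (-2 / R, 1 - 2 * a / R))

# Swapping the two propagation distances describes a different system.
swapped = matmul(propagate(a), matmul(mirror(R), propagate(b)))

ray_in = (0.01, 0.02)               # y1 = 1 cm, theta1 = 20 mrad
print("correct order:", apply(system, ray_in))
print("swapped order:", apply(swapped, ray_in))
```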
Figure 9.12 A ray that travels through a distance a, reflects from a mirror, and
then travels through a distance b.
We have derived our basic ABCD matrices for rays traveling in the y–z plane,
as suggested in Figs. 9.7–9.12. This may have given the impression that it is
necessary to work within a plane that contains the z-axis. However, within the
paraxial approximation, the ABCD matrices are valid for rays that become
displaced simultaneously in both the x and y dimensions while propagating along z.
As we demonstrate below, the behavior of rays functions independently in
the x and y dimensions. If desired, one can write a ray vector for each dimension,
namely $\begin{bmatrix} x \\ \theta_x \end{bmatrix}$ and $\begin{bmatrix} y \\ \theta_y \end{bmatrix}$. Moreover, the identical matrices, for example any
in Table 9.1, are used for either dimension. Figs. 9.7–9.12 therefore represent
projections of rays onto the y–z plane. To complete the story, one can imagine
corresponding figures representing the projection of the rays onto the x–z plane.
4 In contrast, the optical path length OPL is effectively longer than free space by the factor n.

Independence of Rays in the x and y Dimensions
Imagine a ray contained within a plane that is parallel to the y–z plane but for
which x > 0. One might be concerned that when the ray meets, for example, a
spherically concave mirror, the radius of curvature in the perspective of the y–z
dimension might be different for x > 0 than for x = 0 (at the center of the mirror).
This concern is actually quite legitimate and is the source of what is known as
spherical aberration. Nevertheless, in the paraxial approximation the intersection
with the curved mirror of all planes that are parallel to the optical axis gives the
same curve.
To see why this is so, consider the curvature of the mirror in Fig. 9.8. As we
move away from the mirror center (in the x or y-dimension or some combination
thereof), the mirror surface deviates to the left by the amount
\[
\delta = R - R\cos\phi \tag{9.50}
\]
In the paraxial approximation, we have $\cos\phi \cong 1 - \phi^2/2$. And since in this
approximation we may also write $\phi \cong \sqrt{x^2 + y^2}\,/R$, (9.50) becomes
\[
\delta \cong \frac{x^2}{2R} + \frac{y^2}{2R} \tag{9.51}
\]
In the paraxial approximation, we see that the curve of the mirror is parabolic, and
therefore separable between the x and y dimensions. That is, the curvature in the
x-dimension (i.e. $\partial\delta/\partial x = x/R$) is independent of y, and the curvature in the
y-dimension (i.e. $\partial\delta/\partial y = y/R$) is independent of x. A similar argument can be made
for a spherical interface between two media within the paraxial approximation.

Galileo di Vincenzo Bonaiuti de’ Galilei (1564–1642, Italian) was born
in Pisa, Italy, the son of a musician. Galileo enrolled in the University of Pisa
with the intent to study medicine but soon became diverted into mathematics.
He served three years as chair of mathematics in Pisa beginning in 1589 and
then moved to the University of Padua where he taught geometry, mechanics,
and astronomy for two decades. While Galileo did not invent the telescope, he
considerably improved the design. With it he discovered four moons of Jupiter
and was the first to observe sunspots and mountains and valleys on the Moon.
Galileo also was the first to document the phases of Venus, similar to the
phases of the moon. He used these observations to argue in favor of the
Copernican model of the solar system, but this conflicted with the prevailing
views of the Catholic Church at the time, and he was placed under house
arrest and forbidden to publish any of his works. While under house arrest,
he wrote much on kinematics and other principles of physics and is considered to
be the father of modern physics. Galileo attempted to measure the speed of light
by observing an assistant uncover a lantern on a distant hill in response to a
light signal. He concluded that light is “really fast” if not instantaneous.

9.6 Image Formation

Consider Example 9.7 where a ray travels a distance a, reflects from a curved
mirror, and then travels a distance b. From (9.49), the ABCD matrix for the overall
process is
\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
= \begin{bmatrix} 1 - b/f & a + b - ab/f \\ -1/f & 1 - a/f \end{bmatrix}
\tag{9.52}
\]
where by (9.47) we have replaced 2/R with 1/f. Because of the similarity between
the behavior of a curved mirror and a thin lens, the above expression can also
represent a ray traveling a distance a, traversing a thin lens with focal length f,
and then traveling a distance b. The only difference is that, in the case of the thin
lens, f is given by the lens maker’s formula (9.46).
As is well known, it is possible to form an image with either a curved mirror
or a lens. Suppose that the initial ray is one of many rays that leaves a particular
point on an object positioned $a = d_o$ before the mirror (or lens). In order for an
image to occur at $d_i = b$, it is essential that all rays leaving the particular point on
the object converge to a corresponding point on the image. That is, we want rays
leaving the point $y_1$ on the object (which may take on a range of angles $\theta_1$) all to
converge to a single point $y_2$ at the image. In the following equation we need $y_2$
to be independent of $\theta_1$:
\[
\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix}
= \begin{bmatrix} A & B \\ C & D \end{bmatrix}
\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix}
= \begin{bmatrix} A y_1 + B \theta_1 \\ C y_1 + D \theta_1 \end{bmatrix}
\tag{9.53}
\]
Figure 9.13 Image formation by a thin lens.
The condition for image formation is therefore
\[
B = 0 \quad \text{(condition for image formation)} \tag{9.54}
\]
When this condition is applied to (9.52), we obtain
\[
d_o + d_i - \frac{d_o d_i}{f} = 0 \;\Rightarrow\; \frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i} \tag{9.55}
\]
which is the familiar imaging formula (9.1). When the object is infinitely far away
(i.e. $d_o \to \infty$), the image appears at $d_i \to f$. This gives a physical interpretation
to the focal length f, as we have been calling it. Please note that $d_o$ and $d_i$ can
each be either positive (real, as depicted in Fig. 9.13) or negative (virtual, meaning
a screen cannot be inserted to display the image).
The magnification of the image is found by comparing the size of $y_2$ to $y_1$.
From (9.52)–(9.55), the magnification is found to be
\[
M \equiv \frac{y_2}{y_1} = A = 1 - \frac{2 d_i}{R} = -\frac{d_i}{d_o} \tag{9.56}
\]
The negative sign indicates that for positive distances $d_o$ and $d_i$ the image is
inverted.
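A brief numerical sketch may help make the imaging condition concrete. It picks a thin lens and an object distance, finds the image distance from (9.55), and confirms that the overall matrix has B = 0 and A = −d_i/d_o, so that the output height does not depend on the input angle. The helper names and numbers are illustrative assumptions.

```python
def matmul(m1, m2):
    """Multiply two 2x2 matrices given as ((a, b), (c, d))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def propagate(d):
    """Free propagation through a distance d."""
    return ((1.0, d), (0.0, 1.0))

def thin_lens(f):
    """Thin lens (or curved mirror) with focal length f."""
    return ((1.0, 0.0), (-1.0 / f, 1.0))

f, d_o = 0.10, 0.30                       # assumed focal length and object distance (m)
d_i = 1.0 / (1.0 / f - 1.0 / d_o)         # imaging formula (9.55)

# Object plane -> lens -> image plane (the first step sits on the right).
system = matmul(propagate(d_i), matmul(thin_lens(f), propagate(d_o)))
(A, B), (C, D) = system
magnification = A

y_1 = 0.005                               # object point 5 mm off axis
heights = [A * y_1 + B * theta_1 for theta_1 in (-0.02, 0.0, 0.03)]

print("d_i =", d_i, " B =", B, " M =", magnification)
print("image heights for three ray angles:", heights)
```

All three rays land at the same image height, which is the content of the condition B = 0.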
In the above discussion, we have examined image formation by a thin lens
or a curved mirror. Of course, images can also be formed by thick lenses or by
more complex composite optical systems (e.g. a system of lenses and spaces).
The ABCD matrices for the elements in a composite system are simply multiplied
together (the first element that rays encounter appearing on the right) to obtain an
overall ABCD matrix. The principles for image formation with an arbitrary ABCD
matrix are the same as those for a thin lens or curved mirror. As before, consider
propagation a distance $d_o$ from an object to the optical element followed by
propagation a distance $d_i$ to an image. The ABCD matrix for the overall operation is
\[
\begin{bmatrix} 1 & d_i \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
\begin{bmatrix} 1 & d_o \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} A + d_i C & d_o A + B + d_o d_i C + d_i D \\ C & d_o C + D \end{bmatrix}
= \begin{bmatrix} A' & B' \\ C' & D' \end{bmatrix}
\tag{9.57}
\]
An image occurs according to (9.54) when $B' = 0$, or
\[
d_o A + B + d_o d_i C + d_i D = 0 \quad \text{(general condition for image formation)} \tag{9.58}
\]
with magnification
\[
M = A + d_i C \tag{9.59}
\]
For a complex lens system, the matrix elements A, B, C, and D can be complicated
expressions. A convenient way to simplify the analysis is discussed in the
next section.
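The general imaging condition lends itself to a small numerical sketch. Solving (9.58) for the image distance gives d_i = −(d_o A + B)/(d_o C + D), and (9.59) then gives the magnification. The example element below (two thin lenses separated by a gap) and all function names are illustrative assumptions rather than anything prescribed by the text.

```python
def matmul(m1, m2):
    """Multiply two 2x2 matrices given as ((a, b), (c, d))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def propagate(d):
    return ((1.0, d), (0.0, 1.0))

def thin_lens(f):
    return ((1.0, 0.0), (-1.0 / f, 1.0))

def image_distance(element, d_o):
    """Solve the image condition (9.58) for d_i."""
    (A, B), (C, D) = element
    return -(d_o * A + B) / (d_o * C + D)

def magnification(element, d_i):
    """Magnification (9.59)."""
    (A, _), (C, _) = element
    return A + d_i * C

# Assumed composite element: lens f1, then a 5 cm gap, then lens f2.
element = matmul(thin_lens(0.20), matmul(propagate(0.05), thin_lens(0.10)))

d_o = 0.30
d_i = image_distance(element, d_o)
M = magnification(element, d_i)

# Full object-to-image matrix, as in (9.57).
overall = matmul(propagate(d_i), matmul(element, propagate(d_o)))

print("d_i =", d_i, " M =", M)
print("overall matrix:", overall)
```

Because the determinant is one, the lower-right element of the overall matrix should equal 1/M once B' vanishes, which offers a convenient consistency check.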

9.7 Principal Planes for Complex Optical Systems

For every ABCD matrix representing an optical system, there exist two principal
planes located (in our convention) a distance $p_1$ before entering the system and
a distance $p_2$ after exiting the system. When the matrices corresponding to the
(appropriately chosen) distances to those planes are appended to the original
ABCD matrix of the system, the overall matrix simplifies to one that looks identical
to the matrix for a simple thin lens (9.45).
With knowledge of the positions of the principal planes, one can treat the
complicated imaging system in the same way that one treats a simple thin lens.
That is, we can simply use the common formulas (9.55) and (9.56). The only
difference is that $d_o$ is the distance from the object to the first principal plane and
$d_i$ is the distance from the second principal plane to the image. (In the case of an
actual thin lens, both principal planes are at $p_1 = p_2 = 0$. For a composite system,
$p_1$ and $p_2$ can be either positive or negative.)
Figure 9.14 A multi-element system represented as an ABCD matrix for which
principal planes always exist.
We assert that for any optical system,5 $p_1$ and $p_2$ can always be selected such
that we can write
\[
\begin{bmatrix} 1 & p_2 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
\begin{bmatrix} 1 & p_1 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} A + p_2 C & p_1 A + B + p_1 p_2 C + p_2 D \\ C & p_1 C + D \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ -1/f_{\rm eff} & 1 \end{bmatrix}
\tag{9.60}
\]
The final matrix is that of a simple thin lens, and it takes the place of the composite
system including the distances to the principal planes.
Determination of $p_1$ and $p_2$ and Justification of (9.60)
Our task is to find the values of $p_1$ and $p_2$ that make (9.60) true. We can
straightaway make the definition
\[
f_{\rm eff} \equiv -1/C \tag{9.61}
\]
We can also solve for $p_1$ and $p_2$ by setting the diagonal elements of the matrix to 1.
Explicitly, we get
\[
p_1 C + D = 1 \;\Rightarrow\; p_1 = \frac{1-D}{C} \tag{9.62}
\]
and
\[
A + p_2 C = 1 \;\Rightarrow\; p_2 = \frac{1-A}{C} \tag{9.63}
\]
It remains to be shown that the upper right element in (9.60) (i.e.
$p_1 A + B + p_1 p_2 C + p_2 D$) automatically goes to zero for our choices of $p_1$ and $p_2$.
This may seem unlikely at first, but watch what happens!
5 The starting and ending refractive index must be the same.

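When (9.62) and (9.63) are substituted into the upper right matrix element of (9.60)
we get
\[
\begin{aligned}
p_1 A + B + p_1 p_2 C + p_2 D
&= \frac{1-D}{C} A + B + \frac{1-D}{C}\,\frac{1-A}{C}\, C + \frac{1-A}{C} D \\
&= \frac{1}{C}\left[ 1 - AD + BC \right] \\
&= \frac{1}{C}\left( 1 - \begin{vmatrix} A & B \\ C & D \end{vmatrix} \right)
\end{aligned}
\tag{9.64}
\]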
This vanishes (as desired) if the determinant of the original ABCD matrix equals
one. Fortunately, this is always the case as long as we begin and end in the same
index of refraction:
\[
\begin{vmatrix} A & B \\ C & D \end{vmatrix} = 1 \tag{9.65}
\]
Notice that the determinants of all of the matrices in Table 9.1 are one. Moreover,
ABCD matrices constructed of these will also have determinants equal to one.6
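The recipe in (9.61)–(9.63) is easy to try numerically. The sketch below computes p_1, p_2, and f_eff for an assumed two-lens system and then checks that sandwiching the system between the two propagation matrices reproduces the thin-lens form of (9.60). The example system and the function names are illustrative assumptions.

```python
def matmul(m1, m2):
    """Multiply two 2x2 matrices given as ((a, b), (c, d))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def propagate(d):
    return ((1.0, d), (0.0, 1.0))

def thin_lens(f):
    return ((1.0, 0.0), (-1.0 / f, 1.0))

def principal_planes(system):
    """Return (p1, p2, f_eff) from (9.62), (9.63), and (9.61)."""
    (A, B), (C, D) = system
    return (1.0 - D) / C, (1.0 - A) / C, -1.0 / C

# Assumed system: lens f1 = 10 cm, a 5 cm gap, lens f2 = 20 cm.
f1, f2, gap = 0.10, 0.20, 0.05
system = matmul(thin_lens(f2), matmul(propagate(gap), thin_lens(f1)))

p1, p2, f_eff = principal_planes(system)

# Sandwich as in (9.60); negative p values simply mean the plane lies inside.
reduced = matmul(propagate(p2), matmul(system, propagate(p1)))

print("p1 =", p1, " p2 =", p2, " f_eff =", f_eff)
print("reduced matrix:", reduced)
```

For two thin lenses the effective focal length found this way should agree with the familiar combination 1/f_eff = 1/f1 + 1/f2 − gap/(f1 f2).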
6 The determinant of (9.43) is not one since it starts and ends with different indices of refraction.
However, when this matrix is used in succession to form a lens, the resulting matrix has determinant
equal to one.

9.8 Stability of Laser Cavities
The ABCD matrix formulation provides a powerful tool to analyze the stability of
a laser cavity. The basic elements of a laser cavity include an amplifying medium
and mirrors to provide feedback. Presumably, at least one of the end mirrors is
partially transmitting so that energy is continuously extracted from the cavity.
Here, we dispense with the amplifying medium and concentrate our attention on
the optics providing the feedback.
As might be expected, the mirrors must be carefully aligned or successive
reflections might cause rays to ‘walk’ continuously away from the optical axis,
so that they eventually leave the cavity out the side. If a simple cavity is formed
with two flat mirrors that are perfectly aligned parallel to each other, one might
suppose that the mirrors would provide ideal feedback. However, all rays except
for those that are perfectly aligned to the mirror surface normals would eventually
wander out of the side of the cavity as illustrated in Fig. 9.15a. Such a cavity is said
to be unstable. We would like to do a better job of trapping the light in the cavity.
To improve the situation, a cavity can be constructed with concave end mirrors
to help confine the beams within the cavity. Even so, one must choose
carefully the curvature of the mirrors and their separation L. If this is not done
correctly, the curved mirrors can ‘overcompensate’ for the tendency of the rays
to wander out of the cavity and thus aggravate the problem. Such an unstable
scenario is depicted in Fig. 9.15b.
Figure 9.15c depicts a cavity made with curved mirrors where the separation
L is chosen appropriately to make the cavity stable. Although a ray, as it makes
successive bounces, can strike the end mirrors at a variety of points, the curvature
of the mirrors keeps the ‘trajectories’ contained within a narrow region so that
they cannot escape out the sides of the cavity.
Figure 9.15 (a) A ray bouncing between two parallel flat mirrors. (b) A ray
bouncing between two curved mirrors in an unstable configuration. (c) A ray
bouncing between two curved mirrors in a stable configuration. (d) Stable cavity
utilizing a lens and two flat end mirrors.
There are many ways to make a stable laser cavity. For example, a stable cavity
can be made using a lens between two flat end mirrors as shown in Fig. 9.15d. Any
combination of lenses (perhaps more than one) and curved mirrors can be used
to create stable cavity configurations. Ring cavities can also be made to be stable
where in no place do the rays retro-reflect from a mirror but circulate through
a series of elements like cars going around a racetrack. The ABCD matrix for a
round trip in the cavity will be useful for this analysis.
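Example 9.8
Find the round-trip ABCD matrix for the cavities shown in Figs. 9.15c and 9.15d.
Solution: The round-trip ABCD matrix for the cavity shown in Fig. 9.15c is
\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
= \begin{bmatrix} 1 & L \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -2/R_2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & L \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -2/R_1 & 1 \end{bmatrix}
\tag{9.66}
\]
where we have begun the round trip just before a reflection from the first mirror,
so that the matrix for that reflection appears on the right and acts first.
The round-trip ABCD matrix for the cavity shown in Fig. 9.15d is
\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
= \begin{bmatrix} 1 & 2L_1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2L_2 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix}
\tag{9.67}
\]
where we have begun the round trip just before a transmission through the lens
moving to the right. It is somewhat arbitrary where a round trip begins. The
multiplication of the above matrices will need to be carried out to do problems
P9.13 and P9.14.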
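For readers who want to experiment before doing the algebra, the round-trip products can be formed numerically. This is a minimal sketch that follows the matrix ordering written in (9.66) and (9.67); the helper names (`chain`, `round_trip_two_mirrors`, `round_trip_lens_cavity`) and the cavity dimensions are hypothetical choices made only for illustration.

```python
from functools import reduce

def matmul(m1, m2):
    """Multiply two 2x2 matrices given as ((a, b), (c, d))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def chain(*matrices):
    """Multiply matrices in the order written (the rightmost acts first)."""
    return reduce(matmul, matrices)

def propagate(d):
    return ((1.0, d), (0.0, 1.0))

def mirror(R):
    return ((1.0, 0.0), (-2.0 / R, 1.0))

def thin_lens(f):
    return ((1.0, 0.0), (-1.0 / f, 1.0))

def round_trip_two_mirrors(L, R1, R2):
    """Round trip ordered as in (9.66)."""
    return chain(propagate(L), mirror(R2), propagate(L), mirror(R1))

def round_trip_lens_cavity(L1, L2, f):
    """Round trip ordered as in (9.67); flat end mirrors act as the identity."""
    return chain(propagate(2 * L1), thin_lens(f), propagate(2 * L2), thin_lens(f))

def det(m):
    (A, B), (C, D) = m
    return A * D - B * C

# Assumed example dimensions in meters.
cavity_c = round_trip_two_mirrors(L=0.5, R1=1.0, R2=2.0)
cavity_d = round_trip_lens_cavity(L1=0.2, L2=0.3, f=0.5)

for name, m in (("two mirrors", cavity_c), ("lens cavity", cavity_d)):
    print(name, m, "det =", det(m))
```

Both determinants should come out equal to one, as expected for any round trip that starts and ends in the same medium.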
To determine whether a given configuration of a cavity is stable, we need to
know what a ray does after making many round trips in the cavity. To find the
effect of propagation through many round trips, we multiply the round-trip ABCD
matrix together N times, where N is the number of round trips that we wish to
consider. We can then examine what happens to an arbitrary ray after making N
round trips in the cavity as follows:
\[
\begin{bmatrix} y_{N+1} \\ \theta_{N+1} \end{bmatrix}
= \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{N}
\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix}
\tag{9.68}
\]
At this point students might be concerned that taking an ABCD matrix to the Nth
power can be a lot of work. (It is already significant work just to compute the
ABCD matrix for a single round trip.) In addition, we are interested in letting N
be very large, perhaps even infinity. Students can relax because we have a neat
trick to accomplish this daunting task.

By Sylvester’s theorem in appendix 0.4, we have
\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{N}
= \frac{1}{\sin\theta}
\begin{bmatrix}
A \sin N\theta - \sin (N-1)\theta & B \sin N\theta \\
C \sin N\theta & D \sin N\theta - \sin (N-1)\theta
\end{bmatrix}
\tag{9.69}
\]
where
\[
\cos\theta = \frac{1}{2}(A + D). \tag{9.70}
\]
This is valid as long as the determinant of the ABCD matrix is one. As noted earlier
(see (9.65)), we are in luck! The determinant is one any time a ray begins and stops
in the same refractive index, which by definition is guaranteed for any round trip.
We therefore can employ Sylvester’s theorem for any N that we might choose,
including very large integers.
We would like the elements of (9.69) to remain finite as N becomes very large.
If this is the case, then we know that a ray remains trapped within the cavity
and stays reasonably close to the optical axis. Since N only appears within the
argument of a sine function, which is always bounded between −1 and 1 for
real arguments, it might seem that the elements of (9.69) always remain finite
as N approaches infinity. However, it turns out that θ can become imaginary
depending on the outcome of (9.70), in which case the sine becomes a hyperbolic
sine, which can ‘blow up’ as N becomes large. In the end, the condition for cavity
stability is that a real θ must exist for (9.70), or in other words we need
\[
-1 < \frac{1}{2}(A + D) < 1 \quad \text{(condition for a stable cavity)} \tag{9.71}
\]
It is left as an exercise to apply this condition to (9.66) and (9.67) to find the
necessary relationships between the various element curvatures and spacing in
order to achieve cavity stability.
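The stability condition can also be explored numerically before working out the general relationships. The sketch below, which is only illustrative, compares the Nth power obtained by brute-force multiplication with Sylvester's theorem (in its standard form for a unit-determinant 2×2 matrix), and follows a ray through many round trips in one stable and one unstable two-mirror cavity. The function names, mirror radii, and separations are assumptions chosen for the demonstration.

```python
import math

def matmul(m1, m2):
    """Multiply two 2x2 matrices given as ((a, b), (c, d))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def mat_power(m, N):
    """Raise a 2x2 matrix to the Nth power by repeated multiplication."""
    result = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(N):
        result = matmul(m, result)
    return result

def sylvester_power(m, N):
    """Nth power via Sylvester's theorem; needs det = 1 and |A + D| < 2."""
    (A, B), (C, D) = m
    theta = math.acos(0.5 * (A + D))
    s = math.sin(theta)
    sN, sN1 = math.sin(N * theta), math.sin((N - 1) * theta)
    return (((A * sN - sN1) / s, B * sN / s),
            (C * sN / s, (D * sN - sN1) / s))

def is_stable(m):
    """Stability condition (9.71)."""
    (A, _), (_, D) = m
    return -1.0 < 0.5 * (A + D) < 1.0

def two_mirror_round_trip(L, R):
    """Round trip (9.66) for two identical mirrors of radius R separated by L."""
    P = ((1.0, L), (0.0, 1.0))
    M = ((1.0, 0.0), (-2.0 / R, 1.0))
    return matmul(P, matmul(M, matmul(P, M)))

def height_after(m, N, y1=1e-3, theta1=0.0):
    """Ray height after N round trips, starting from (y1, theta1)."""
    (A, B), _ = mat_power(m, N)
    return A * y1 + B * theta1

stable = two_mirror_round_trip(L=0.8, R=1.0)      # assumed dimensions (m)
unstable = two_mirror_round_trip(L=2.5, R=1.0)

print("stable cavity?  ", is_stable(stable), is_stable(unstable))
print("Sylvester vs brute force:", sylvester_power(stable, 7), mat_power(stable, 7))
print("height after 1000 trips (stable):", height_after(stable, 1000))
print("height after 20 trips (unstable):", height_after(unstable, 20))
```

In the stable case the ray height stays of the same order as its starting value, while in the unstable case it grows by many orders of magnitude within a few round trips.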
Appendix 9.A Aberrations and Ray Tracing

The paraxial approximation places serious limitations on the performance of
optical systems (see (9.23) and (9.24)). To stay within the approximation, all rays
traveling in the system should travel very close to the optical axis with very shallow
angles with respect to the optical axis. To the extent that this is not the case, the
collection of rays associated with a single point on an object may not converge to
a single point on the associated image. The resulting distortion or “blurring” of
the image is known as aberration.
Figure 9.16 (a) Paraxial theory predicts that the light imaged from a point source
will converge to a point (i.e. have spherical wave fronts coming to the image point).
(b) The image of a point source made by a real lens with aberrations is an extended
and blurred patch of light and the converging wavefronts are only quasi-spherical.
Common experience with photographic and video equipment suggests that
it is possible to image scenes that have a relatively wide angular extent (many
tens of degrees), in apparent serious violation of the paraxial approximation.
The paraxial approximation is indeed violated in these devices, so they must be
designed using more complicated analysis techniques than those we have learned
in this chapter. The most common approach is to use a computationally intensive
procedure called ray tracing in which sin θ and tan θ are rendered exactly. The
nonlinearity of these functions precludes the possibility of obtaining analytic
solutions describing the imaging performance of such optical systems.
The typical procedure is to start with a collection of rays from a test point such
as shown in Fig. 9.17. Each ray is individually traced through the system using
the exact representation of geometric surfaces as well as the exact representation
of Snell’s law. On close analysis, the rays typically do not converge to a distinct
imaging point. Rather, the rays can be ‘blurred’ out over a range of points where
the image is supposed to occur. Depending on the angular distribution of the
rays as well as on the elements in the setup, the spread of rays around the image
point can be large or small. The engineer who designs the system must determine
whether the amount of aberration is acceptable, given the various constraints of
the device.
Figure 9.17 Ray tracing through a simple lens.
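To give a flavor of what exact ray tracing involves, here is a deliberately small sketch: rays parallel to the axis strike a single convex spherical surface separating air from glass, Snell's law is applied without the small-angle approximation, and the point where each refracted ray crosses the axis is compared with the paraxial prediction nR/(n − 1). The single-surface geometry, the function name `axis_crossing`, and the numbers are assumptions chosen for simplicity; a real design program traces many surfaces and many ray bundles.

```python
import math

def axis_crossing(h, R, n):
    """Exact trace of a ray parallel to the axis at height h.

    The ray travels in air and meets a spherical glass surface of radius R
    (vertex at z = 0, center of curvature at z = R). Returns the z position
    where the refracted ray crosses the axis inside the glass.
    """
    theta_i = math.asin(h / R)                    # incidence angle from the surface normal
    theta_t = math.asin(math.sin(theta_i) / n)    # exact Snell's law, air to glass
    sag = R - math.sqrt(R * R - h * h)            # axial position of the point of incidence
    # The refracted ray is tilted toward the axis by (theta_i - theta_t).
    return sag + h / math.tan(theta_i - theta_t)

R, n = 0.10, 1.5                  # assumed radius (m) and index
paraxial = n * R / (n - 1.0)      # small-angle limit of the same calculation

for h in (0.001, 0.01, 0.03, 0.05):
    z = axis_crossing(h, R, n)
    print(f"h = {h:.3f} m  crossing at z = {z:.5f} m  (paraxial {paraxial:.5f} m)")
```

Rays far from the axis cross it noticeably before the paraxial focus, which is the spread of rays described above.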
To minimize aberrations below typical tolerance levels, several lenses can
be used together. If properly chosen, the lenses (some positive, some negative)
separated by specific distances, can result in remarkably low aberration levels
over certain ranges of operation for the device. Ray tracing is best done with
commercial software designed for this purpose (e.g. Zemax or other professional
products). Such software packages are able to develop and optimize designs for
specific applications. A nice feature is that the user can specify that the design
should employ only standard optical components available from known optics
companies. In any case, it is typical to specify that all lenses in the system should
have spherical surfaces since these are much less expensive to manufacture. We
mention briefly a few types of aberrations that you may encounter. Multiple
aberrations can often be observed in a single lens.
Chromatic aberration arises from the fact that the index of refraction for glass
varies with the wavelength of light. Since the focal length of a lens depends on
the index of refraction (see, for example, Eq. (9.46)), the focal length of a lens
varies with the wavelength of light. Chromatic aberration can be compensated
for by using a pair of lenses made from two types of glass as shown in Fig. 9.18
(the pair is usually cemented together to form a “doublet” lens). The lens with the
shortest focal length is made of the glass whose index has the lesser dependence
on wavelength. By properly choosing the prescription of the two lenses, you
can exactly compensate for chromatic aberration at two wavelengths and do a
good job for a wide range of others. Achromatic doublets can also be designed to
minimize spherical aberration (see below), so they are often a good choice when
you need a high quality lens.
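The size of the effect can be estimated directly from the lens maker's formula. In the sketch below the three refractive indices are approximate values for a common borosilicate crown glass at blue, yellow, and red wavelengths; they, the radii of curvature, and the function name are assumptions used only for illustration.

```python
def focal_length(n, R1, R2):
    """Thin-lens focal length from the lens maker's formula (9.46)."""
    return 1.0 / ((n - 1.0) * (1.0 / R1 - 1.0 / R2))

# Approximate indices of a typical crown glass (illustrative values only).
indices = {
    "blue (486 nm)": 1.5224,
    "yellow (588 nm)": 1.5168,
    "red (656 nm)": 1.5143,
}

R1, R2 = 0.10, -0.10              # assumed symmetric biconvex lens (m)

focal = {color: focal_length(n, R1, R2) for color, n in indices.items()}
for color, f in focal.items():
    print(f"{color}: f = {1000 * f:.2f} mm")

spread = focal["red (656 nm)"] - focal["blue (486 nm)"]
print(f"focal shift between red and blue: {1000 * spread:.2f} mm")
```

With these numbers the blue focus falls roughly a millimeter or two closer to the lens than the red focus, which is the shift that an achromatic doublet is designed to cancel.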
Monochromatic aberrations arise from the shape of the lens rather than the
variation of n with wavelength. Before computers made the widespread use of
ray tracing practical, these aberrations had to be analyzed primarily
with analytic techniques. The analytic results derived previously in this chapter
were based on first order approximations (e.g. sin θ ≈ θ). This analysis predicts
that a lens can image a point source to an exact image point, which implies
spherically converging wavefronts at the image point as shown in Fig. 9.16(a). You
can increase the accuracy of the theory for non-paraxial rays by retaining
second-order correction terms in the analysis. With these second-order terms included,
the wave fronts converging towards an image point are mostly spherical, but have
second-order aberration terms added in (shown conceptually in Fig. 9.16(b)).
There are five aberration terms in this second-order analysis, and these represent
a convenient basis for discussing aberration.
Figure 9.18 Chromatic aberration causes lenses to have different focal lengths for
different wavelengths. It can be corrected using an achromatic doublet lens.
(Figure labels: low dispersion glass, high dispersion glass.)
The first aberration term is known as spherical aberration. This type of aber-
ration results from the fact that rays traveling through a spherical lens at large
radii experience a different focal length than those traveling near the axis. For a
converging lens, this causes wide-radius rays to focus before the near-axis rays
as shown in Fig. 9.19. This problem can be helped by orienting lenses so that
the face with the least curvature is pointed towards the side where the light rays
have the largest angle. This procedure splits the bending of rays more evenly
between the front and back surface of the lens. As mentioned above, you can also
cement two lenses made from different types of glass together so that spherical
aberrations from one lens are corrected by the other.
Figure 9.19 Spherical aberration in a plano-convex lens.
The aberration term referred to as astigmatism occurs when an off-axis object
point is imaged to an off-axis image point. In this case a spherical lens has a
different focal length in the horizontal and vertical dimensions. For a focusing
lens this causes the two dimensions to focus at different distances, producing a
vertical line at one image plane and a horizontal line at another. A lens can also be
inherently astigmatic even when viewed on axis if it is football shaped rather than
spherical. In this case, the astigmatic aberration can be corrected by inserting a
cylindrical lens at the correct orientation (this is a common correction needed in
eyeglasses).
A third aberration term is referred to as coma. This is observed when off-axis
points are imaged and produces a comet shaped tail with its head at the point
predicted by paraxial theory. (The term ‘coma’ refers to the atmosphere of a
comet, which is how the aberration got its name.) This aberration is distinct from
astigmatism, which is also observed for off-axis points, since coma is observed
even when all of the rays are in one plane (see Fig. 9.20). You have probably seen
coma if you’ve ever played with a magnifying glass in the sun: just tilt the lens
slightly and you see a comet-like image rather than a point.
The curvature of the field aberration term arises from the fact that spherical
lenses image spherical surfaces to another spherical surface, rather than imaging
a plane to a plane. This is not so bad for your eyeball, which has a curved screen,
but for things like cameras and movie projectors we would like to image to a flat
screen. When a flat screen is used and the curvature of the field aberration is
present, the image will be focused well near the center, but become progressively
out of focus as you move to the edge of the screen (i.e. the flat screen is further
from the curved image surface as you move from the center).
The final aberration term is referred to as distortion. This aberration occurs
when the magnification of a lens depends on the distance from the center of
the screen. If magnification decreases as the distance from the center increases,
then ‘barrel’ distortion is observed. When magnification increases with distance,
‘pincushion’ distortion is observed (see Fig. 9.21).
Figure 9.20 Illustration of coma. Rays traveling through the center of the lens are
imaged to point a as predicted by paraxial theory. Rays that travel through the lens at
radius $\rho_b$ in the plane of the figure are imaged to point b. Rays that travel through the
lens at radius $\rho_b$, but outside the plane of the figure are imaged to other points on the
circle (in the image plane) containing point b. Rays that travel through the lens at
other radii on the lens (e.g. $\rho_c$) also form circles in the image plane with radius
proportional to $\rho^2$ with the center offset from point a a distance proportional to $\rho^2$. When
light from each of these circles combines on the screen it produces an imaged point
with a “comet tail.”
Figure 9.21 Distortion occurs when magnification is not constant across an extended
image. (Panels: undistorted, barrel distortion, pincushion distortion.)

All lenses will exhibit some combination of the aberrations listed above (i.e.
chromatic aberration plus the five second-order aberration terms). In addition to
the five named monochromatic aberrations, there are many other higher order
aberrations that also have to be considered. Aberrations can be corrected to a high
degree with multiple-element systems (designed using ray-tracing techniques)
composed of lenses and irises to eliminate off-axis light. For example, a camera
lens with a focal length of 50 mm, one of the simplest lenses in photography, is
typically composed of about six individual elements. However, optical systems
never completely eliminate all aberration, so designing a system always involves
some degree of compromise in choosing which aberrations to minimize and
which ones you can live with.

Exercises
Exercises for 9.1 The Eikonal Equation
P9.1 Consider the index described in Example 9.1. The solution given in
the example corresponds to rays that asymptotically approach y = 0. A
more general solution is given by
\[
\nabla R = n_0 \left( \hat{\mathbf{x}} \sqrt{1 + \alpha} \pm \hat{\mathbf{y}} \sqrt{y^2/h^2 - \alpha} \right)
\qquad \left( 1 + \alpha > 0 \ \text{and} \ y^2/h^2 - \alpha > 0 \right)
\]
This corresponds to rays that either hit the ground or return toward the
sky without reaching the ground, depending on the sign of α.
(a) Verify that $\nabla R$ satisfies the eikonal equation and determine the
function $R(x, y)$.
HINT: $\int d\xi \sqrt{\xi^2 - \alpha} = \frac{\xi}{2}\sqrt{\xi^2 - \alpha} - \frac{\alpha}{2} \ln\left( \xi + \sqrt{\xi^2 - \alpha} \right)$ $\left( \xi^2 - \alpha > 0 \right)$.
(b) Verify that the light path is given by $y = h\sqrt{\alpha}\, \cosh\left( \frac{x - x_0}{h\sqrt{1+\alpha}} \right)$ when
α > 0 and is given by $y = h\sqrt{|\alpha|}\, \sinh\left| \frac{x - x_0}{h\sqrt{1+\alpha}} \right|$ when α < 0. Consider only
the region y > 0 (i.e. above ground). Notice that these solutions can
make rays that travel either to the right or to the left.
HINT: $\cosh^2\xi - \sinh^2\xi = 1$, $\frac{d}{d\xi}\cosh\xi = \sinh\xi$, $\frac{d}{d\xi}\sinh\xi = \cosh\xi$.
(c) Make a sketch of these two solution classes in the case of α = ±4.
P9.2 Prove that under the approximation of very short wavelength, the
Poynting vector is directed along ∇R (r) or ŝ.
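Solution: (partial)
First, from Faraday’s law (1.36), with fields that vary in time as $e^{-i\omega t}$, we have
\[
\mathbf{B}(\mathbf{r}, t) = -\frac{i}{\omega} \nabla \times \left[ \mathbf{E}_0(\mathbf{r})\, e^{i(k_{\rm vac} R(\mathbf{r}) - \omega t)} \right]
\]
Applying the identity $\nabla \times (\mathbf{a}\psi) = \psi(\nabla \times \mathbf{a}) + \nabla\psi \times \mathbf{a}$ to this equation, we obtain:
\[
\begin{aligned}
\mathbf{B}(\mathbf{r}, t)
&= -\frac{i}{\omega} \left( e^{i(k_{\rm vac} R(\mathbf{r}) - \omega t)} \left[ \nabla \times \mathbf{E}_0(\mathbf{r}) \right]
+ i k_{\rm vac}\, e^{i(k_{\rm vac} R(\mathbf{r}) - \omega t)} \left[ \nabla R(\mathbf{r}) \times \mathbf{E}_0(\mathbf{r}) \right] \right) \\
&= -\frac{i \lambda_{\rm vac}}{2\pi c}\, e^{i[k_{\rm vac} R(\mathbf{r}) - \omega t]} \left[ \nabla \times \mathbf{E}_0(\mathbf{r}) \right]
+ \frac{1}{c}\, e^{i[k_{\rm vac} R(\mathbf{r}) - \omega t]} \left[ \nabla R(\mathbf{r}) \times \mathbf{E}_0(\mathbf{r}) \right]
\end{aligned}
\]
The first term vanishes in the limit of very short wavelength, and we have:
\[
\mathbf{B}(\mathbf{r}, t) \to \frac{1}{c} \left[ \nabla R(\mathbf{r}) \right] \times \mathbf{E}_0(\mathbf{r})\, e^{i[k_{\rm vac} R(\mathbf{r}) - \omega t]}. \tag{9.72}
\]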
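Next, from Gauss’s law (1.34) and the constitutive relation (2.16) we have
\[
\nabla \cdot \left[ \left( 1 + \chi(\mathbf{r}) \right) \mathbf{E}_0(\mathbf{r})\, e^{i(k_{\rm vac} R(\mathbf{r}) - \omega t)} \right] = 0
\]
Applying the identity $\nabla \cdot (\mathbf{a}\psi) = \mathbf{a} \cdot \nabla\psi + \psi \nabla \cdot \mathbf{a}$ to this expression yields:
\[
e^{i(k_{\rm vac} R(\mathbf{r}) - \omega t)}\, \nabla \cdot \left[ \left( 1 + \chi(\mathbf{r}) \right) \mathbf{E}_0(\mathbf{r}) \right]
+ i k_{\rm vac}\, e^{i(k_{\rm vac} R(\mathbf{r}) - \omega t)} \left( 1 + \chi(\mathbf{r}) \right) \left[ \nabla R(\mathbf{r}) \cdot \mathbf{E}_0(\mathbf{r}) \right] = 0
\]
Canceling the common exponential term, using $k_{\rm vac} = 2\pi/\lambda_{\rm vac}$, and some algebra then gives
\[
-\frac{i \lambda_{\rm vac}}{2\pi}\, \frac{\nabla \cdot \left[ \left( 1 + \chi(\mathbf{r}) \right) \mathbf{E}_0(\mathbf{r}) \right]}{1 + \chi(\mathbf{r})}
+ \nabla R(\mathbf{r}) \cdot \mathbf{E}_0(\mathbf{r}) = 0
\]
In the limit of very short wavelength, this becomes
\[
\nabla R(\mathbf{r}) \cdot \mathbf{E}_0(\mathbf{r}) \to 0 \tag{9.73}
\]
Finally, compute the time average of the Poynting vector
\[
\begin{aligned}
\mathbf{S} &= \frac{1}{\mu_0}\, \mathrm{Re}\left\{ \mathbf{E}(\mathbf{r}, t) \right\} \times \mathrm{Re}\left\{ \mathbf{B}(\mathbf{r}, t) \right\} \\
&= \frac{1}{4\mu_0} \left[ \mathbf{E}(\mathbf{r}, t) + \mathbf{E}^*(\mathbf{r}, t) \right] \times \left[ \mathbf{B}(\mathbf{r}, t) + \mathbf{B}^*(\mathbf{r}, t) \right]
\end{aligned}
\]
You will need to employ expressions (9.72) and (9.73), as well as the BAC-CAB rule (see P0.3).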
Exercises for 9.2 Fermat’s Principle
P9.3 Use Fermat’s Principle to derive the law of reflection (3.6) for a reflective
surface.
HINT: Do not consider light that goes directly from A to B; require a
single bounce.
Figure 9.22 (labels: A, B)
P9.4 Show that Fermat’s Principle fails to give the correct path for an
extraordinary ray entering a uniaxial crystal whose optic axis is perpendicular
to the surface.
HINT: With the index given by (5.29), show that Fermat’s principle leads
to an answer that neither agrees with the direction of the k-vector (5.32)
nor with the direction of the Poynting vector (5.40).
Exercises for 9.4 Reflection and Refraction at Curved Surfaces
P9.5 Derive the ABCD matrix that takes a ray on a round trip through a
simple laser cavity consisting of a flat mirror and a concave mirror of
radius R separated by a distance L. HINT: Start at the flat mirror. Use
the matrix in (9.28) to travel a distance L. Use the matrix in (9.38) to
represent reflection from the curved mirror. Then use the matrix in
(9.28) to return to the flat mirror. The matrix for reflection from the flat
mirror is the identity matrix (i.e. R flat → ∞).

P9.6 Derive the ABCD matrix for a thick lens made of material n 2 sur-
rounded by a liquid of index n 1 . Let the lens have curvatures R 1 and R 2
and thickness d .
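Answer:
\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
= \begin{bmatrix}
1 + \frac{d}{R_1}\left( \frac{n_1}{n_2} - 1 \right) & d\,\frac{n_1}{n_2} \\
-\left( \frac{n_2}{n_1} - 1 \right)\left( \frac{1}{R_1} - \frac{1}{R_2} \right) + \frac{d}{R_1 R_2}\left( 2 - \frac{n_1}{n_2} - \frac{n_2}{n_1} \right) & 1 - \frac{d}{R_2}\left( \frac{n_1}{n_2} - 1 \right)
\end{bmatrix}
\]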
Exercises for 9.6 Image Formation
P9.7 (a) Show that the ABCD matrix for a thick lens (see P9.6) reduces to that
of a thin lens (9.45) when the thickness goes to zero. Take the index
outside of the lens to be n 1 = 1.
(b) Find the ABCD matrix for a thick window (thickness d ). Take the
index outside of the window to be n 1 = 1. HINT: A window is a thick
lens with infinite radii of curvature.
P9.8 An object is placed in front of a concave mirror. Find the location of
the image $d_i$ and magnification M when $d_o = R$, $d_o = R/2$, $d_o = R/4$,
and $d_o = -R/2$ (virtual object). Make a diagram for each situation,
depicting rays traveling from a single off-axis point on the object to
a corresponding point on the image. You may want to emphasize
especially the ray that initially travels parallel to the axis and the ray
that initially travels in a direction intersecting the axis at the focal point
R/2.
Exercises for 9.7 Principal Planes for Complex Optical Systems

P9.9 A complicated lens element is represented by an ABCD matrix. An
object placed a distance $d_1$ before the unknown element causes an
image to appear a distance $d_2$ after the unknown element.
Suppose that when $d_1 = \ell$, we find that $d_2 = 2\ell$. Also, suppose that
when $d_1 = 2\ell$, we find that $d_2 = 3\ell/2$ with magnification $-1/2$. What is
the ABCD matrix for the unknown element?
HINT: Use the conditions for an image (9.58) and (9.59). If the index
of refraction is the same before and after, then (9.65) applies. HINT:
First find linear expressions for A, B, and C in terms of D. Then put the
results into (9.65).
Figure 9.23 (diagram label: unknown element)
P9.10 (a) Consider a lens with thickness $d = 5$ cm, $R_1 = 5$ cm, $R_2 = -10$ cm,
$n = 1.5$. Compute the ABCD matrix of the lens. HINT: See P9.6.
(b) Where are the principal planes located and what is the effective
focal length $f_{\rm eff}$ for this system?
Figure 9.24 (diagram labels: principal planes)

L9.11 Deduce the positions of the principal planes and the effective focal
length of a compound lens system. Reference the positions of the
principal planes to the outside ends of the metal hardware that encloses
the lens assembly. (video)
HINT: Obtain three sets of distances to the object and image planes
and place the data into (9.58) to create three distinct equations for the
unknowns A, B, C, and D. Find A, B, and C in terms of D and place the
results into (9.65) to obtain the values for A, B, C, and D. The effective
focal length and principal planes can then be found through (9.61)–(9.63).
Figure 9.25
P9.12 Use a computer program to calculate the ABCD matrix for the com-
pound system shown in Fig. 9.26, known as the “Tessar lens.” The
details of this lens are as follows (all distances are in the same units,
and only the magnitude of curvatures are given—you decide the sign):
Convex-convex lens 1 (thickness 0.357, R 1 = 1.628, R 2 = 27.57, n =
1.6116) is separated by 0.189 from concave-concave lens 2 (thickness
0.081, R 1 = 3.457, R 2 = 1.582, n = 1.6053), which is separated by 0.325
from plano-concave lens 3 (thickness 0.217, R 1 = ∞, R 2 = 1.920, n =
1.5123), which is directly followed by convex-convex lens 4 (thickness
0.396, R 1 = 1.920, R 2 = 2.400, n = 1.6116).
HINT: You can reduce the number of matrices you need to multiply by
using the “thick lens” matrix.
Figure 9.26 (lens elements labeled 1, 2, 3, 4)
Exercises for 9.8 Stability of Laser Cavities
P9.13 (a) Show that the cavity depicted in Fig. 9.15c is stable if
$$0 < \left(1 - \frac{L}{R_1}\right)\left(1 - \frac{L}{R_2}\right) < 1$$
(b) The two concave mirrors have radii R 1 = 60 cm and R 2 = 100 cm.
Over what range of mirror separation L is it possible to form a stable
laser cavity?
HINT: There are two different stable ranges with an unstable range
between them.
P9.14 Find the stable ranges for L 1 = L 2 = L for the laser cavity depicted in
Fig. 9.15d with focal length f = 50 cm.
L9.15 Experimentally determine the stability range of a HeNe laser with
adjustable end mirrors. Check that this agrees reasonably well with theory.
Can you think of reasons for any discrepancy? (video)
Figure 9.27

Chapter 10
Diffraction
In the 1600s, Christiaan Huygens developed a wave description for light. However,
his ideas were largely overlooked at the time because Sir Isaac Newton promoted
a competing theory. Newton proposed that light should be thought of as many tiny
bullets, or corpuscles as he called them. Newton’s ideas prevailed for more than
a century, perhaps because he was right on so many other things, until Thomas
Young performed his famous two-slit experiment, conclusively demonstrating the
wave nature of light. Even then, Young’s conclusions were accepted only gradually
by others, a notable exception being a young Frenchman, Augustin Fresnel. The
two formed a close friendship through correspondence, and it was Fresnel who
followed up on Young’s conclusions and dedicated his life to a study of light.
Fresnel’s skill as a mathematician allowed him to transform physical intuition
into powerful and concise ideas. Perhaps Fresnel’s greatest accomplishment was
the adaptation of Huygens’ principle of wavelet superposition into a mathematical
formula. Ironically, he used Newton’s calculus to achieve this. Huygens’ principle
asserts that a wave front can be thought of as many wavelets, which propagate and
interfere to form new wave fronts. This is illustrated in Fig. 10.1. The phenomenon
of diffraction through an aperture is then understood as the spilling of wavelets
around obstructions (e.g. the edges of a hole through which light spills).
After formulating Huygens’ principle as a diffraction integral, Fresnel made
an approximation to his own formula (called the Fresnel approximation) for the
sake of making the integration easier to perform. As far as approximations go,
the Fresnel approximation is surprisingly accurate in describing the light field
in the region downstream from an aperture. The diffraction pattern can evolve
in complicated ways as the distance from an aperture increases. At distances far
downstream from an aperture, the diffraction pattern acquires a final form that
no longer evolves, other than to grow in proportion to distance. The far-away
diffraction pattern is often of interest, and it turns out that the Fresnel diffraction
formula can be simplified further in this case. The far-away limit of the Fresnel
diffraction formula is called the Fraunhofer approximation.
From the modern perspective, Fresnel’s diffraction formula needs justification
starting from Maxwell’s equations. The diffraction formula is based on scalar
diffraction theory, which ignores polarization effects. In some situations, ignoring
polarization is benign, but in other situations ignoring polarization effects
produces significant errors. These issues as well as the approximations leading to
scalar diffraction theory are discussed in section 10.2.

Christiaan Huygens (1629–1695, Dutch) was born in The Hague, Netherlands.
His father was friends with the mathematician René Descartes, which probably
influenced his upbringing. Huygens studied law and mathematics at the University
of Leiden, which preceded a very productive career as a scientist and
mathematician. During mid career, Huygens held a position in the French Academy
of Sciences in Paris for 15 years, but spent the majority of his life in The Hague.
Huygens was the first to advocate the wave theory of light. He was able to explain
birefringence in terms of his wave theory together with a refractive index that
varied with direction. Huygens constructed a telescope with which he discovered
Saturn’s moon Titan. He also made the first detailed observations of the Orion
nebula. Huygens made significant advancements in clock-making technology and
wrote a book on probability theory. Huygens was one of the earliest science-fiction
writers and speculated that life exists on other planets in his book Cosmotheoros.
10.1 Huygens’ Principle as Formulated by Fresnel

In this section we discuss the calculus of summing up the contributions from
the many wavelets originating in an aperture illuminated by a light field. Each
point in the aperture is thought of as a source of a spherical wave.¹ In our modern
notation, such a spherical wave can be written as proportional to $e^{ikR}/R$, where $R$
is the distance from the source. As a spherical wave propagates, its strength falls
off in inverse proportion to the distance traveled and the phase is related to the
distance propagated, similar to the phase of a plane wave.
Students should be aware that a spherical wave of the form $e^{ikR}/R$ (even if
some sort of polarization is attached) is a poor solution to Maxwell’s equations
(see P10.2). It utterly fails near $R = 0$. However, if $R$ is much larger than a
wavelength, this spherical wave starts to approximate actual solutions to Maxwell’s
equations. It is within this regime that the diffraction formula derived here is
successful. It should be noted that by choosing $k$, we consider only a single
wavelength of light (i.e. one frequency).
Figure 10.1 Wave fronts depicted as a series of Huygens’ wavelets.
Consider an aperture or opening in an opaque screen in the plane $z = 0$. Let
the aperture be illuminated with a light field distribution $E(x', y', z = 0)$ within
the aperture. Then for a point $(x, y, z)$ lying somewhere after the aperture ($z > 0$)
the net field is given by adding together wavelets emitted from each point in the
aperture.
Each spherical wavelet takes on the strength and phase of the field at the
point where it originates. Mathematically, this summation takes the form
$$E(x, y, z) = -\frac{i}{\lambda} \iint_{\rm aperture} E(x', y', 0)\, \frac{e^{ikR}}{R}\, dx'\, dy' \tag{10.1}$$
where
$$R = \sqrt{(x - x')^2 + (y - y')^2 + z^2} \tag{10.2}$$
is the radius of each wavelet as it individually intersects the point $(x, y, z)$. The
constant $-i/\lambda$ in front of the integral in (10.1) ensures the right phase and field
strength (not to mention units). We will see how these factors arise in sections 10.2
and 10.3. To summarize, (10.1) tells us how to compute the field downstream
given knowledge of the field in an aperture. The field at each point $(x', y')$ in
the aperture, with its unique strength and phase, is treated as the source for a
spherical wave. The integral in (10.1) sums the contributions for all of these
wavelets.
Figure 10.2
¹ For simplicity, we use the term ‘spherical wave’ in this book to refer to waves of the type
imagined by Huygens (i.e. of the form $e^{ikR}/R$). There is a different family of waves based on
spherical harmonics that are also sometimes referred to as spherical waves. These waves have
angular as well as radial dependence, and they are solutions to Maxwell’s equations. For details see
pp. 429–432 of Jackson’s Classical Electrodynamics, 3rd Ed. (Ref. [2]).
Example 10.1
Find the on-axis² (i.e. $x, y = 0$) intensity following a circular aperture of diameter $\ell$
illuminated by a uniform plane wave.
Figure 10.3 Circular aperture illuminated by a plane wave.
Solution: The diffraction integral (10.1) takes the form
$$E(0, 0, z) = -\frac{i}{\lambda} \iint_{\rm aperture} E(x', y', 0)\, \frac{e^{ik\sqrt{x'^2 + y'^2 + z^2}}}{\sqrt{x'^2 + y'^2 + z^2}}\, dx'\, dy'$$
The circular hole encourages a change to cylindrical coordinates: $x' = \rho' \cos\phi'$ and
$y' = \rho' \sin\phi'$; $dx'\, dy' \to \rho'\, d\rho'\, d\phi'$. In this case, the limits of integration define
the geometry of the aperture, and the integration is accomplished as follows:
$$E(0, 0, z) = -\frac{i E_0}{\lambda} \int_0^{2\pi} d\phi' \int_0^{\ell/2} \frac{e^{ik\sqrt{\rho'^2 + z^2}}}{\sqrt{\rho'^2 + z^2}}\, \rho'\, d\rho'$$
$$= -\frac{i E_0}{\lambda}\, 2\pi\, \left.\frac{e^{ik\sqrt{\rho'^2 + z^2}}}{ik}\right|_0^{\ell/2} = -E_0 \left( e^{ik\sqrt{(\ell/2)^2 + z^2}} - e^{ikz} \right)$$
The on-axis intensity becomes
$$I(0, 0, z) \propto E(0, 0, z)\, E^*(0, 0, z) = |E_0|^2 \left( e^{ik\sqrt{(\ell/2)^2 + z^2}} - e^{ikz} \right) \left( e^{-ik\sqrt{(\ell/2)^2 + z^2}} - e^{-ikz} \right)$$
$$= 2 |E_0|^2 \left[ 1 - \cos\left( k\sqrt{(\ell/2)^2 + z^2} - kz \right) \right] \tag{10.3}$$
See problem P10.6 for a graph of this function.
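The closed form in Example 10.1 can be checked against a direct numerical evaluation of (10.1). The following is a minimal sketch, not part of the original text: the function names, the trapezoidal rule, and the dimensionless test values (lengths in units of the wavelength) are illustrative assumptions.

```python
import cmath
import math


def on_axis_field_numeric(z, diameter, wavelength, e0=1.0, steps=20000):
    """Evaluate (10.1) on axis by trapezoidal integration over the radius rho'."""
    k = 2 * math.pi / wavelength
    radius = diameter / 2
    h = radius / steps

    def integrand(rho):
        r = math.sqrt(rho**2 + z**2)
        return cmath.exp(1j * k * r) / r * rho

    total = 0.5 * (integrand(0.0) + integrand(radius))
    for j in range(1, steps):
        total += integrand(j * h)
    # The integral over the azimuthal angle contributes a factor of 2*pi.
    return -1j * e0 / wavelength * 2 * math.pi * total * h


def on_axis_field_exact(z, diameter, wavelength, e0=1.0):
    """Closed-form on-axis field derived in Example 10.1."""
    k = 2 * math.pi / wavelength
    r_edge = math.sqrt((diameter / 2) ** 2 + z**2)
    return -e0 * (cmath.exp(1j * k * r_edge) - cmath.exp(1j * k * z))


def on_axis_intensity(z, diameter, wavelength, e0=1.0):
    """Equation (10.3) with the proportionality constant set to one."""
    k = 2 * math.pi / wavelength
    r_edge = math.sqrt((diameter / 2) ** 2 + z**2)
    return 2 * abs(e0) ** 2 * (1 - math.cos(k * r_edge - k * z))


if __name__ == "__main__":
    z, diameter, wavelength = 100.0, 40.0, 1.0  # assumed illustrative values
    numeric = on_axis_field_numeric(z, diameter, wavelength)
    exact = on_axis_field_exact(z, diameter, wavelength)
    print("difference between numeric and exact field:", abs(numeric - exact))
    print("intensity from (10.3):", on_axis_intensity(z, diameter, wavelength))
```

The same quadrature does not extend off axis without reinstating the angular integral, which is why only the on-axis case is compared here.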
When an aperture has a complicated shape, it is more convenient to break
up the diffraction integral (10.1) into several pieces. Students are already used to
doing this sort of piecewise approach to integration in other settings. It seems
hardly worth giving a name to this technique, but it is called Babinet’s principle;
perhaps in Babinet’s day people were not as comfortable with calculus.
As an example of how to use Babinet’s principle, suppose that we have an
aperture that consists of a circular obstruction within a square opening as
depicted in Fig. 10.4. Thus, the light transmits through the region between the circle
and the square. One can evaluate the overall diffraction pattern by first evaluating
the diffraction integral for the entire square (ignoring the circular block) and then
subtracting the diffraction integral for a circular opening having the shape of the
block. This removes the unwanted part of the previous integration and yields the
overall result. It is important to add and subtract the integrals (i.e. fields), not
their squares (i.e. intensity).
Figure 10.4 Aperture comprised of the region between a circle and a square.
² An analytical solution is not possible off axis.
As trivial as Babinet’s principle may seem to the modern student, it may
not be obvious at first that Babinet’s principle also applies to an infinitely wide
plane wave that is interrupted by finite obstructions. In this case, one simply
computes the diffraction of the blocked portions of the field as though these
portions were openings in a mask. This result is then subtracted from the plane
wave (no integration needed for the plane), as depicted in Fig. 10.5.
Figure 10.5 A block in a plane wave giving rise to diffraction in the geometric
shadow. (Diagram labels: Mask, Block.)
When Fresnel first presented his diffraction formula to the French Academy of
Sciences, a certain judge of scientific papers named Siméon Poisson noticed that
the formula predicted that there should be light in the center of the geometric
shadow behind a circular obstruction. This seemed so absurd that Fresnel’s work
was initially disbelieved until the spot was experimentally confirmed. Needless to
say, Fresnel’s paper was then awarded first prize, and this spot appearing behind
circular blocks has since been known as Poisson’s spot.
Example 10.2
Find the on-axis (i.e. $x, y = 0$) intensity behind a circular block of diameter $\ell$ placed
in a uniform plane wave.
Solution: From Example 10.1, the on-axis field behind a circular aperture is
$E_0 \left( e^{ikz} - e^{ik\sqrt{(\ell/2)^2 + z^2}} \right)$. Babinet’s principle says to subtract this result from a plane
wave to obtain the field behind the circular block. The situation is depicted in
Fig. 10.5 (side view). The on-axis field is then
$$E(0, 0, z) = E_0 e^{ikz} - E_0 \left( e^{ikz} - e^{ik\sqrt{(\ell/2)^2 + z^2}} \right) = E_0 e^{ik\sqrt{(\ell/2)^2 + z^2}}$$
The on-axis intensity becomes simply
$$I(0, 0, z) \propto E(0, 0, z)\, E^*(0, 0, z) = |E_0|^2 e^{ik\sqrt{(\ell/2)^2 + z^2}} e^{-ik\sqrt{(\ell/2)^2 + z^2}} = |E_0|^2$$
This result says that, in the exact center of the shadow behind a circular obstruction,
the intensity is the same as the illuminating plane wave for all distances $z$. A spot of
light in the center forms right away; no wonder Poisson was astonished!
Siméon Denis Poisson (1781–1840, French)
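The subtraction of fields prescribed by Babinet’s principle in Example 10.2 can be illustrated with a few lines of code. This is only a sketch built on the closed-form on-axis results of Examples 10.1 and 10.2; the function names and the sample distances are assumptions made for illustration.

```python
import cmath
import math


def aperture_field(z, diameter, wavelength, e0=1.0):
    """On-axis field behind a circular aperture (result of Example 10.1)."""
    k = 2 * math.pi / wavelength
    r_edge = math.sqrt((diameter / 2) ** 2 + z**2)
    return e0 * (cmath.exp(1j * k * z) - cmath.exp(1j * k * r_edge))


def block_field(z, diameter, wavelength, e0=1.0):
    """Babinet's principle: plane wave minus the field of the matching aperture."""
    k = 2 * math.pi / wavelength
    return e0 * cmath.exp(1j * k * z) - aperture_field(z, diameter, wavelength, e0)


if __name__ == "__main__":
    wavelength, diameter = 1.0, 40.0  # assumed values, lengths in wavelengths
    for z in (50.0, 100.0, 137.0, 400.0):
        i_aperture = abs(aperture_field(z, diameter, wavelength)) ** 2
        i_block = abs(block_field(z, diameter, wavelength)) ** 2
        # The aperture intensity swings between 0 and 4; the block gives 1.
        print(z, i_aperture, i_block)
```

Subtracting the intensities instead of the fields would not reproduce the constant on-axis value, which is the point of the warning about adding fields rather than their squares.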
10.2 Scalar Diffraction Theory

In this section we provide the background motivation for Huygens’ principle and
Fresnel’s formulation of it. Consider a light field with a single frequency $\omega$. The
light field can be represented by $\mathbf{E}(\mathbf{r})\, e^{-i\omega t}$, which obeys the wave equation
$$\nabla^2 \mathbf{E}(\mathbf{r})\, e^{-i\omega t} - \frac{n^2}{c^2}\, \mathbf{E}(\mathbf{r})\, \frac{\partial^2 e^{-i\omega t}}{\partial t^2} = 0 \tag{10.4}$$
We take the index $n$ to be uniform throughout space. Since the temporal part of
the field is written explicitly, the time derivative in (10.4) is easily performed, and
the equation reduces to
$$\nabla^2 \mathbf{E}(\mathbf{r}) + k^2 \mathbf{E}(\mathbf{r}) = 0 \tag{10.5}$$
where $k \equiv n\omega/c$ is the magnitude of the usual wave vector. Equation (10.5) is
called the Helmholtz equation. It is the wave equation written for the case of a
single frequency, where the trivial time dependence has been removed from the
equation. To obtain the full wave solution, the factor $e^{-i\omega t}$ is simply appended to
the solution of the Helmholtz equation $\mathbf{E}(\mathbf{r})$.
At this point we take an egregious step: We simply ignore the vectorial
nature of $\mathbf{E}(\mathbf{r})$ and write (10.5) using only the magnitude $E(\mathbf{r})$. When using scalar
diffraction theory, we must keep in mind that it is based on this serious step.
Under the scalar approximation, the vector Helmholtz equation (10.5) becomes
the scalar Helmholtz equation:
$$\nabla^2 E(\mathbf{r}) + k^2 E(\mathbf{r}) = 0 \tag{10.6}$$
This equation of course is consistent with (10.5) in the case of a plane wave.
However, we are interested in spherical waves of the form $E(r) = E_0 r_0 e^{ikr}/r$. It
turns out that such spherical waves are exact solutions to the scalar Helmholtz
equation (10.6). The proof is left as an exercise (see P10.3). Nevertheless, spherical
waves of this form only approximately satisfy the vector Helmholtz equation (10.5).
We can get away with this sleight of hand if the radius $r$ is large compared to a
wavelength (i.e., $kr \gg 1$) and we restrict $\mathbf{r}$ to a narrow range perpendicular to the
polarization.
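The claim that $e^{ikr}/r$ satisfies the scalar Helmholtz equation (10.6) away from the origin can be spot-checked numerically without giving away the analytic proof requested in P10.3. The sketch below uses a second-order central-difference Laplacian; the step size, the value of $k$, and the test point are arbitrary assumptions.

```python
import cmath
import math


def spherical_wave(x, y, z, k):
    """Scalar spherical wave exp(ikr)/r with unit amplitude."""
    r = math.sqrt(x * x + y * y + z * z)
    return cmath.exp(1j * k * r) / r


def laplacian(f, x, y, z, h=1e-3):
    """Central-difference estimate of the Laplacian of f at (x, y, z)."""
    neighbors = (
        f(x + h, y, z) + f(x - h, y, z)
        + f(x, y + h, z) + f(x, y - h, z)
        + f(x, y, z + h) + f(x, y, z - h)
    )
    return (neighbors - 6 * f(x, y, z)) / h**2


def helmholtz_residual(x, y, z, k):
    """Left-hand side of (10.6); it should be near zero away from r = 0."""
    def wave(a, b, c):
        return spherical_wave(a, b, c, k)

    return laplacian(wave, x, y, z) + k**2 * wave(x, y, z)


if __name__ == "__main__":
    print(abs(helmholtz_residual(1.0, 0.5, 2.0, k=2.0)))
```

The residual is limited by the finite-difference truncation error, so it is small but not exactly zero.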
Significance of the Scalar Wave Approximation
The solution of the scalar Helmholtz equation is not completely unassociated with
the solution to the vector Helmholtz equation. In fact, if $E_{\rm scalar}(\mathbf{r})$ obeys the scalar
Helmholtz equation (10.6), then
$$\mathbf{E}(\mathbf{r}) = \mathbf{r} \times \nabla E_{\rm scalar}(\mathbf{r}) \tag{10.7}$$
obeys the vector Helmholtz equation (10.5).
Consider a spherical wave, which is a solution to the scalar Helmholtz equation:
$$E_{\rm scalar}(\mathbf{r}) = E_0 r_0 e^{ikr}/r \tag{10.8}$$
Remarkably, when this expression is placed into (10.7) the result is zero. Although
zero is in fact a solution to the vector Helmholtz equation, it is not very interesting.
A more interesting solution to the scalar Helmholtz equation is
$$E_{\rm scalar}(\mathbf{r}) = r_0 E_0 \left( 1 + \frac{i}{kr} \right) \frac{e^{ikr}}{r} \cos\theta \tag{10.9}$$
which is one of an infinite number of unique ‘spherical’ solutions that exist. Notice
that in the limit of large $r$, this expression looks similar to (10.8), aside from the
factor $\cos\theta$. The vector form of this field according to (10.7) is
$$\mathbf{E}(\mathbf{r}) = -\hat{\phi}\, r_0 E_0 \left( 1 + \frac{i}{kr} \right) \frac{e^{ikr}}{r} \sin\theta \tag{10.10}$$
This field looks approximately like the scalar spherical wave solution (10.8) in the
limit of large $r$ if the angle is chosen to lie near $\theta \cong \pi/2$ (spherical coordinates).
Since our use of the scalar Helmholtz equation is in connection with this spherical
wave under these conditions, the results are close to those obtained from the
vector Helmholtz equation.
Fresnel developed his diffraction formula (10.1) a half century before Maxwell
assembled the equations of electromagnetic theory. In 1882, Gustav Kirchhoff
showed that Fresnel’s diffraction formula satisfies the scalar Helmholtz equation.
In doing this he clearly showed the approximations implicit in the theory, and
made a slight revision to the formula:
$$E(x, y, z) = -\frac{i}{\lambda} \iint_{\rm aperture} E(x', y', z = 0)\, \frac{e^{ikR}}{R} \left[ \frac{1 + \cos(\mathbf{r}, \hat{z})}{2} \right] dx'\, dy' \tag{10.11}$$
The factor in square brackets, Kirchhoff’s revision, is known as the obliquity factor.
Here, $\cos(\mathbf{r}, \hat{z})$ indicates the cosine of the angle between $\mathbf{r}$ and $\hat{z}$. Notice that this
factor is approximately equal to one when the point $(x, y, z)$ is chosen to be in
the forward direction; we usually study diffraction under this circumstance. On
the other hand, the obliquity factor equals zero for fields traveling in the reverse
direction (i.e. in the $-\hat{z}$ direction). This fixes a problem with Fresnel’s version of
the formula (10.1) based on Huygens’ wavelets, which suggested that light could
as easily diffract in the reverse direction as in the forward direction.
In honor of Kirchhoff’s work, (10.11) is referred to as the Fresnel-Kirchhoff
diffraction formula. The details of Kirchhoff’s more rigorous derivation, including
how the factor $-i/\lambda$ naturally arises, are given in Appendix 10.A. Since the Fresnel-
Kirchhoff formula can be understood as a superposition of spherical waves, it is
not surprising that it satisfies the scalar Helmholtz equation (10.6).
10.3 Fresnel Approximation

Although the Fresnel-Kirchhoff integral looks innocent enough, it is actually quite
difficult to evaluate analytically. It is problematic even if the field $E(x', y', z = 0)$ is
constant across the aperture and if the obliquity factor $(1 + \cos(\mathbf{r}, \hat{z}))/2$ is
approximated as one (i.e. forward direction).
Fresnel introduced an approximation to his diffraction formula that makes
the integration somewhat easier to perform. The approximation is analogous
to the paraxial approximation made for rays in chapter 9. Thus, the Fresnel
approximation requires the avoidance of large angles with respect to the z-axis.
Besides letting the obliquity factor be one, Fresnel approximated $R$ by the
distance $z$ in the denominator of (10.11). He thereby removed the dependence
on $x'$ and $y'$ so that the denominator can be brought out in front of the integral.
This is valid to the extent that we restrict ourselves to small angles:
$$R \cong z \qquad \text{(denominator only; Fresnel approximation)} \tag{10.12}$$
The above approximation is wholly inappropriate in the exponent of (10.11) since
small changes in $R$ can result in dramatic variations in the periodic function $e^{ikR}$.
To approximate $R$ in the exponent, we must proceed with caution. To this end
we expand (10.2) under the assumption $z^2 \gg (x - x')^2 + (y - y')^2$. Again, this is
consistent with the idea of restricting ourselves to relatively small angles. The
expansion of (10.2) is written as
$$R = z \sqrt{1 + \frac{(x - x')^2 + (y - y')^2}{z^2}} \cong z \left[ 1 + \frac{(x - x')^2 + (y - y')^2}{2z^2} + \cdots \right]$$
$$\text{(exponent; Fresnel approximation)} \tag{10.13}$$
Substitution of (10.12) and (10.13) into the Fresnel diffraction formula (10.1) yields
$$E(x, y, z) \cong -\frac{i e^{ikz} e^{i\frac{k}{2z}(x^2 + y^2)}}{\lambda z} \iint_{\rm aperture} E(x', y', 0)\, e^{i\frac{k}{2z}(x'^2 + y'^2)}\, e^{-i\frac{k}{z}(x x' + y y')}\, dx'\, dy'$$
$$\text{(Fresnel approximation)} \tag{10.14}$$
This is Fresnel’s approximation to his diffraction integral formula. It may look a bit
messier than before, but in terms of being able to make progress on integration
we are better off than previously. Notice that the integral can be interpreted as a
two-dimensional Fourier transform on $E(x', y', 0)\, e^{i\frac{k}{2z}(x'^2 + y'^2)}$.
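To get a feel for when the expansion (10.13) is adequate in the exponent, one can compare the exact distance (10.2) with the truncated series and convert the difference into a phase error $k\,\Delta R$. The sketch below is illustrative only; the function names and the sample numbers (a 1 mm transverse offset, 10 cm propagation, 500 nm light) are assumptions.

```python
import math


def exact_distance(dx, dy, z):
    """Exact R from (10.2), with dx = x - x' and dy = y - y'."""
    return math.sqrt(dx * dx + dy * dy + z * z)


def fresnel_distance(dx, dy, z):
    """Truncated expansion (10.13) used in the exponent."""
    return z * (1 + (dx * dx + dy * dy) / (2 * z * z))


def fresnel_phase_error(dx, dy, z, wavelength):
    """Phase error k * (R_fresnel - R_exact) in radians."""
    k = 2 * math.pi / wavelength
    return k * (fresnel_distance(dx, dy, z) - exact_distance(dx, dy, z))


def leading_phase_error(dx, dy, z, wavelength):
    """First neglected term of the binomial series, k * rho^4 / (8 z^3)."""
    k = 2 * math.pi / wavelength
    rho_sq = dx * dx + dy * dy
    return k * rho_sq**2 / (8 * z**3)


if __name__ == "__main__":
    args = (1e-3, 0.0, 0.1, 500e-9)  # assumed sample geometry in meters
    print(fresnel_phase_error(*args), leading_phase_error(*args))
```

The approximation is reasonable as long as this phase error stays well below one radian over the whole aperture and observation region.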
Example 10.3
Compute the Fresnel diffraction field following a rectangular aperture (dimensions
$\Delta x$ by $\Delta y$) illuminated by a uniform plane wave.
Solution: According to (10.14), the field downstream is
$$E(x, y, z) = -i E_0 \frac{e^{ikz} e^{i\frac{k}{2z}(x^2 + y^2)}}{\lambda z} \int_{-\Delta x/2}^{\Delta x/2} dx'\, e^{i\frac{k}{2z} x'^2} e^{-i\frac{kx}{z} x'} \int_{-\Delta y/2}^{\Delta y/2} dy'\, e^{i\frac{k}{2z} y'^2} e^{-i\frac{ky}{z} y'}$$
Unfortunately, the integration in the preceding example must be performed
numerically. This is often the case for diffraction integrals in the Fresnel
approximation. Figure 10.6 shows the result of such an integration for a rectangular
aperture with a height twice its width.
Figure 10.6 Field amplitude following a rectangular aperture computed in the
Fresnel approximation.
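As a rough illustration of the numerical integration mentioned in Example 10.3, the two one-dimensional integrals can each be evaluated with a simple trapezoidal rule and then multiplied. This is a sketch under assumed names and step counts, not the method used to produce Fig. 10.6.

```python
import cmath
import math


def slit_integral(u, width, k, z, steps=2000):
    """Trapezoidal estimate of one of the 1D integrals in Example 10.3.

    u is the observation coordinate (x or y); width is the aperture size
    along that axis.
    """
    h = width / steps

    def integrand(s):
        return cmath.exp(1j * k * s * s / (2 * z)) * cmath.exp(-1j * k * u * s / z)

    total = 0.5 * (integrand(-width / 2) + integrand(width / 2))
    for j in range(1, steps):
        total += integrand(-width / 2 + j * h)
    return total * h


def fresnel_rect_field(x, y, z, width_x, width_y, wavelength, e0=1.0):
    """Field from (10.14) for a uniformly illuminated rectangular aperture."""
    k = 2 * math.pi / wavelength
    prefactor = (-1j * e0 * cmath.exp(1j * k * z)
                 * cmath.exp(1j * k * (x * x + y * y) / (2 * z))
                 / (wavelength * z))
    return (prefactor
            * slit_integral(x, width_x, k, z)
            * slit_integral(y, width_y, k, z))


if __name__ == "__main__":
    # Assumed values: lengths in units of the wavelength, height twice the width.
    for x in (0.0, 2.0, 4.0, 6.0):
        print(x, abs(fresnel_rect_field(x, 0.0, 50.0, 10.0, 20.0, 1.0)))
```

The number of steps needs to grow when the quadratic phase $k x'^2/(2z)$ oscillates rapidly across the aperture, i.e. for wide apertures or short distances.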
Paraxial Wave Equation

If we assume that the light coming through the aperture is highly directional, such
that it propagates mainly in the z-direction, we are motivated to write the field
as $E(x, y, z) = \tilde{E}(x, y, z)\, e^{ikz}$. Upon substitution of this into the scalar Helmholtz
equation (10.6), we arrive at
$$\left( \frac{\partial^2 \tilde{E}}{\partial x^2} + \frac{\partial^2 \tilde{E}}{\partial y^2} + 2ik \frac{\partial \tilde{E}}{\partial z} + \frac{\partial^2 \tilde{E}}{\partial z^2} \right) e^{ikz} = 0 \tag{10.15}$$
At this point we make the paraxial wave approximation, which is
$\left| 2k\, \partial \tilde{E}/\partial z \right| \gg \left| \partial^2 \tilde{E}/\partial z^2 \right|$.
That is, we assume that the amplitude of the field varies slowly in the z-direction
such that the wave looks much like a plane wave. We permit the amplitude to
change as the wave propagates in the z-direction as long as it does so on a scale
much longer than a wavelength. This leads to the paraxial wave equation:
$$\left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + 2ik \frac{\partial}{\partial z} \right) \tilde{E}(x, y, z) \cong 0 \qquad \text{(paraxial wave equation)} \tag{10.16}$$
It turns out that the Fresnel approximation (10.14) is an exact solution to the
paraxial wave equation. As demonstrated in problem P10.5, (10.16) is satisfied by
$$\tilde{E}(x, y, z) \cong -\frac{i}{\lambda z} \iint_{-\infty}^{\infty} \tilde{E}(x', y', 0)\, e^{i\frac{k}{2z}\left[ (x - x')^2 + (y - y')^2 \right]}\, dx'\, dy' \tag{10.17}$$
When the factor $e^{ikz}$ is appended, this field is identical to (10.14).
10.4 Fraunhofer Approximation
An additional approximation to the diffraction integral was made famous by
Joseph von Fraunhofer. The Fraunhofer approximation is simply the limiting
case of the Fresnel approximation when the field is observed at a distance far
after the aperture (called the far field). A diffraction pattern continuously evolves
along the z-direction, as described by the Fresnel approximation. Eventually
it evolves into a final diffraction pattern that maintains itself as it continues to
propagate (although it increases its size in proportion to distance). It is this
far-away diffraction pattern that is obtained from the Fraunhofer approximation.
Since the Fresnel approximation requires the angles to be small (i.e. the paraxial
approximation), so does the Fraunhofer approximation.

Joseph von Fraunhofer (1787–1826, German) was born in Straubing, Bavaria.
He was orphaned at age 11, whereupon he was apprenticed to a glassmaker.
The workshop collapsed, trapping him in the rubble. The Prince of Bavaria
directed the rescue efforts and thereafter took an interest in Fraunhofer’s
education. The prince required the glassmaker to allow young Joseph time to
study, and he naturally took an interest in optics. Fraunhofer later worked at
the Optical Institute at Benediktbeuern, where he learned techniques for making
the finest optical glass in his day. Fraunhofer developed numerous glass recipes
and was expert at creating optical devices. Fraunhofer was the inventor of the
spectroscope, making it possible to do quantitative spectroscopy. Using his
spectroscope, Fraunhofer was the first to observe and document hundreds of
absorption lines in the sun’s spectrum. He also noticed that these varied for
different stars, thus establishing the field of stellar spectroscopy. He was also
the inventor of the diffraction grating. In 1822, he was granted an honorary
doctorate from the University of Erlangen. Fraunhofer passed away at age 39,
perhaps due to heavy-metal poisoning from glass blowing.

In many textbooks, the Fraunhofer approximation is presented first because
the formula is easier to use. However, since it is a special case of the Fresnel
approximation, it logically should be discussed afterwards as we are doing here.
To obtain the diffraction pattern very far after the aperture, we make the following
approximation:
$$e^{i\frac{k}{2z}(x'^2 + y'^2)} \cong 1 \qquad \text{(far field)} \tag{10.18}$$
The validity of this approximation depends on a comparison of the size of the
aperture to the distance $z$ where the diffraction pattern is observed. We need
$$z \gg \frac{k}{2} \left( \text{aperture radius} \right)^2 \qquad \text{(condition for far field)} \tag{10.19}$$
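A quick calculation shows what (10.19) demands in practice. The sketch below computes the length scale $k a^2/2$ for an aperture of radius $a$ and the largest phase neglected in (10.18) at a chosen distance; the function names and the sample numbers (a 1 mm diameter aperture with 633 nm light) are assumptions for illustration.

```python
import math


def far_field_scale(aperture_radius, wavelength):
    """Right-hand side of (10.19): (k / 2) * (aperture radius)^2."""
    k = 2 * math.pi / wavelength
    return 0.5 * k * aperture_radius**2


def neglected_phase(aperture_radius, wavelength, z):
    """Largest value of k * rho'^2 / (2 z) dropped in (10.18), in radians."""
    return far_field_scale(aperture_radius, wavelength) / z


if __name__ == "__main__":
    radius, wavelength = 0.5e-3, 633e-9  # assumed sample values in meters
    scale = far_field_scale(radius, wavelength)
    print("z must be large compared to", scale, "m")
    print("neglected phase at 20 m:", neglected_phase(radius, wavelength, 20.0), "rad")
```

For these assumed values the scale is a bit over a meter, so even a modest aperture requires an observation distance of many meters before the Fraunhofer pattern is fully formed.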
By removing the factor (10.18) from (10.14), we obtain the Fraunhofer
diffraction formula:
$$E(x, y, z) \cong -\frac{i e^{ikz} e^{i\frac{k}{2z}(x^2 + y^2)}}{\lambda z} \iint_{\rm aperture} E(x', y', 0)\, e^{-i\frac{k}{z}(x x' + y y')}\, dx'\, dy'$$
$$\text{(Fraunhofer approximation)} \tag{10.20}$$
Obviously, the removal of $e^{i\frac{k}{2z}(x'^2 + y'^2)}$ from the integrand improves our chances of
being able to perform the integration. Notice that the integral can now be
interpreted as a two-dimensional Fourier transform on the aperture field $E(x', y', 0)$.
Once we are in the Fraunhofer regime, a change in $z$ is not very interesting
since it appears in the combination $x/z$ or $y/z$ inside the integral. At a larger
distance $z$, the same diffraction pattern is obtained with a proportionately larger
value of $x$ or $y$. The Fraunhofer diffraction pattern thus preserves itself indefinitely
as the field propagates. It grows in size as the distance $z$ increases, but the angular
size defined by $x/z$ or $y/z$ remains the same.
Example 10.4
Compute the Fraunhofer diffraction pattern following a rectangular aperture
(dimensions $\Delta x$ by $\Delta y$) illuminated by a uniform plane wave.
Solution: According to (10.20), the field is
$$E(x, y, z) = -i E_0 \frac{e^{ikz} e^{i\frac{k}{2z}(x^2 + y^2)}}{\lambda z} \int_{-\Delta x/2}^{\Delta x/2} dx'\, e^{-i\frac{kx}{z} x'} \int_{-\Delta y/2}^{\Delta y/2} dy'\, e^{-i\frac{ky}{z} y'}$$
It is left as an exercise (see P10.8) to perform the integration and compute the
intensity. The result turns out to be
$$I(x, y, z) = I_0 \frac{\Delta x^2 \Delta y^2}{\lambda^2 z^2}\, {\rm sinc}^2\left( \frac{\pi \Delta x}{\lambda z} x \right) {\rm sinc}^2\left( \frac{\pi \Delta y}{\lambda z} y \right) \tag{10.21}$$
where ${\rm sinc}\, \xi \equiv \sin\xi / \xi$. Note that $\lim_{\xi \to 0} \sin\xi / \xi = 1$.
Figure 10.7 Fraunhofer diffraction pattern (field amplitude) generated by a
uniformly illuminated rectangular aperture with a height twice the width.
ξ→0
10.5 Diffraction with Cylindrical Symmetry

Sometimes the field transmitted by an aperture is cylindrically symmetric. In this
case, the field at the aperture can be written as
$$E(x', y', z = 0) = E(\rho', z = 0) \tag{10.22}$$
where $\rho \equiv \sqrt{x^2 + y^2}$. Under cylindrical symmetry, the two-dimensional
integration over $x'$ and $y'$ in (10.14) or (10.20) can be reduced to a single-dimensional
integral over a cylindrical coordinate $\rho'$. The Fresnel diffraction integral (10.14) in
this situation is given by
$$E(\rho, z) = -\frac{i e^{ikz} e^{i\frac{k\rho^2}{2z}}}{\lambda z} \int_0^{2\pi} d\theta' \int_{\rm aperture} \rho'\, d\rho'\, E(\rho', 0)\, e^{i\frac{k\rho'^2}{2z}}\, e^{-i\frac{k}{z}\left( \rho\cos\theta\, \rho'\cos\theta' + \rho\sin\theta\, \rho'\sin\theta' \right)} \tag{10.23}$$
where
$$x \equiv \rho\cos\theta \qquad y \equiv \rho\sin\theta \qquad x' \equiv \rho'\cos\theta' \qquad y' \equiv \rho'\sin\theta' \tag{10.24}$$
Notice that in the exponent of (10.23) we can write
$$\rho'\rho \left( \cos\theta'\cos\theta + \sin\theta'\sin\theta \right) = \rho'\rho \cos\left( \theta' - \theta \right) \tag{10.25}$$
With this simplification, the diffraction formula (10.23) can be written as
$$E(\rho, z) = -\frac{i e^{ikz} e^{i\frac{k\rho^2}{2z}}}{\lambda z} \int_{\rm aperture} \rho'\, d\rho'\, E(\rho', 0)\, e^{i\frac{k\rho'^2}{2z}} \int_0^{2\pi} d\theta'\, e^{-i\frac{k\rho\rho'}{z}\cos(\theta - \theta')} \tag{10.26}$$
We are able to perform the integration over $\theta'$ with the help of the formula (0.57):
$$\int_0^{2\pi} e^{-i\frac{k\rho\rho'}{z}\cos(\theta - \theta')}\, d\theta' = 2\pi J_0\left( \frac{k\rho\rho'}{z} \right) \tag{10.27}$$
$J_0$ is called the zero-order Bessel function. Equation (10.26) then reduces to
$$E(\rho, z) = -\frac{2\pi i e^{ikz} e^{i\frac{k\rho^2}{2z}}}{\lambda z} \int_{\rm aperture} \rho'\, d\rho'\, E(\rho', 0)\, e^{i\frac{k\rho'^2}{2z}}\, J_0\left( \frac{k\rho\rho'}{z} \right)$$
$$\text{(Fresnel approximation with cylindrical symmetry)} \tag{10.28}$$
The integral in (10.28) is called a Hankel transform on $E(\rho', 0)\, e^{i\frac{k\rho'^2}{2z}}$.
In the case of the Fraunhofer approximation, the diffraction integral becomes
a Hankel transform on just the field $E(\rho', z = 0)$ since $\exp\left( i\frac{k\rho'^2}{2z} \right)$ goes to one.
Under cylindrical symmetry, the Fraunhofer approximation is
$$E(\rho, z) = -\frac{2\pi i e^{ikz} e^{i\frac{k\rho^2}{2z}}}{\lambda z} \int_{\rm aperture} \rho'\, d\rho'\, E(\rho', 0)\, J_0\left( \frac{k\rho\rho'}{z} \right)$$
$$\text{(Fraunhofer approximation with cylindrical symmetry)} \tag{10.29}$$
Just as fast Fourier transform algorithms aid in the numerical evaluation of
diffraction integrals in Cartesian coordinates, fast Hankel transforms exist and can be
used with cylindrically symmetric diffraction integrals.
Figure 10.8 Field amplitude following a circular aperture computed in the Fresnel
approximation (panels at z = 25/k, 75/k, 200/k, and 1000/k).
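The reduction from (10.26) to (10.28) can be explored numerically without a special-function library by evaluating $J_0$ straight from the angular integral (10.27). The sketch below does this with an equally spaced average over one period and then applies a plain trapezoidal rule to (10.28) for a uniformly illuminated circular aperture. It is a simple direct quadrature, not a fast Hankel transform; all names, sample counts, and test values are assumptions.

```python
import cmath
import math


def bessel_j0(x, samples=200):
    """J0(x) from (10.27): average of exp(-i x cos(theta')) over one period.

    Accurate as long as |x| is well below the number of samples.
    """
    total = 0.0
    for j in range(samples):
        total += cmath.exp(-1j * x * math.cos(2 * math.pi * j / samples)).real
    return total / samples


def fresnel_circ_field(rho, z, diameter, wavelength, e0=1.0, steps=2000):
    """Trapezoidal evaluation of (10.28) for a uniform circular aperture."""
    k = 2 * math.pi / wavelength
    radius = diameter / 2
    h = radius / steps

    def integrand(rp):
        return rp * cmath.exp(1j * k * rp * rp / (2 * z)) * bessel_j0(k * rho * rp / z)

    total = 0.5 * (integrand(0.0) + integrand(radius))
    for j in range(1, steps):
        total += integrand(j * h)
    prefactor = (-2j * math.pi * e0 * cmath.exp(1j * k * z)
                 * cmath.exp(1j * k * rho * rho / (2 * z)) / (wavelength * z))
    return prefactor * total * h


def fresnel_circ_on_axis(z, diameter, wavelength, e0=1.0):
    """Closed form of (10.28) at rho = 0, where J0 = 1 and the integral is elementary."""
    k = 2 * math.pi / wavelength
    radius = diameter / 2
    return -e0 * cmath.exp(1j * k * z) * (cmath.exp(1j * k * radius**2 / (2 * z)) - 1)


if __name__ == "__main__":
    z, diameter, wavelength = 150.0, 40.0, 1.0  # assumed, in units of wavelength
    print(abs(fresnel_circ_field(0.0, z, diameter, wavelength)
              - fresnel_circ_on_axis(z, diameter, wavelength)))
```

On axis the Fresnel result is the small-angle limit of Example 10.1, which provides a convenient consistency check on the quadrature.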

Example 10.5
Compute the Fresnel and Fraunhofer diffraction patterns following a circular
aperture (diameter $\ell$) illuminated by a uniform plane wave.
Solution: According to (10.28), the field in the Fresnel approximation is
$$E(\rho, z) = -i E_0 \frac{2\pi e^{ikz} e^{i\frac{k\rho^2}{2z}}}{\lambda z} \int_0^{\ell/2} \rho'\, d\rho'\, e^{i\frac{k\rho'^2}{2z}}\, J_0\left( \frac{k\rho\rho'}{z} \right)$$
Unfortunately, the integration must be performed numerically. The result of the
calculation for a uniform field illuminating a circular aperture is shown in Fig. 10.8.
On the other hand, the field in the Fraunhofer limit (10.29) is
$$E(\rho, z) = -i E_0 \frac{2\pi e^{ikz} e^{i\frac{k\rho^2}{2z}}}{\lambda z} \int_0^{\ell/2} \rho'\, d\rho'\, J_0\left( \frac{k\rho\rho'}{z} \right)$$
which can be integrated analytically. It is left as an exercise to perform the
integration and to show that the intensity of the Fraunhofer pattern is
$$I(\rho, z) = I_0 \left( \frac{\pi \ell^2}{4\lambda z} \right)^2 \left[ 2\, \frac{J_1\left( k\ell\rho/2z \right)}{\left( k\ell\rho/2z \right)} \right]^2 \tag{10.30}$$
The function $2J_1(x)/x$ (sometimes called the jinc function) looks similar to the sinc
function (see Example 10.4) except that its first zero is at $x = 1.22\pi$ rather than at $\pi$.
Note that $\lim_{\xi \to 0} 2J_1(\xi)/\xi = 1$.
Figure 10.9 Fraunhofer diffraction pattern (field amplitude) generated for a
uniformly illuminated circular aperture.
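The statement that the first zero of the jinc function lies at $1.22\pi$ can be verified with a short calculation. The sketch below obtains $J_1$ from Bessel’s integral, $J_1(x) = \frac{1}{\pi}\int_0^{\pi} \cos(\theta - x\sin\theta)\, d\theta$ (a standard identity that is not quoted in the text above), finds the first zero by bisection, and evaluates (10.30). The names, the bracketing interval, and the sample values are assumptions.

```python
import math


def bessel_j1(x, samples=200):
    """J1(x) from Bessel's integral, as an average over one full period.

    Accurate as long as |x| is well below the number of samples.
    """
    total = 0.0
    for j in range(samples):
        theta = 2 * math.pi * j / samples
        total += math.cos(theta - x * math.sin(theta))
    return total / samples


def jinc(x):
    """2 J1(x) / x, with the limiting value 1 at x = 0."""
    return 1.0 if x == 0 else 2 * bessel_j1(x) / x


def airy_intensity(rho, z, diameter, wavelength, i0=1.0):
    """Fraunhofer intensity (10.30) for a uniformly illuminated circular aperture."""
    k = 2 * math.pi / wavelength
    u = k * diameter * rho / (2 * z)
    return i0 * (math.pi * diameter**2 / (4 * wavelength * z)) ** 2 * jinc(u) ** 2


def first_zero_of_j1(lo=3.0, hi=4.5, iterations=60):
    """Bisection for the first positive zero of J1, bracketed by [lo, hi]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if bessel_j1(lo) * bessel_j1(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    zero = first_zero_of_j1()
    print("first zero of jinc:", zero, "=", zero / math.pi, "* pi")
    # Radius of the first dark ring for assumed values (units of wavelength).
    z, diameter, wavelength = 1.0e4, 40.0, 1.0
    print("first dark ring at rho =", zero * wavelength * z / (math.pi * diameter))
```

In terms of the aperture diameter, the first dark ring therefore sits at $\rho \approx 1.22\, \lambda z/\ell$.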
Appendix 10.A Fresnel-Kirchhoff Diffraction Formula

To begin our derivation of the Fresnel-Kirchhoff diffraction formula, we employ
Green’s theorem (proven in appendix 10.B):
∂V ∂U
I · ¸ Z
U −V da = U ∇2V − V ∇2U d v (10.31)
£ ¤
∂n ∂n
S V
The notation ∂/∂n implies a derivative in the direction normal to the surface. We
choose for the functions to be used in this formula
V ≡ e i kr /r
(10.32)
U ≡ E (r)
where E (r) is assumed to satisfy the scalar Helmholtz equation, (10.6). When
these functions are used in Green’s theorem (10.31), we obtain
∂ e i kr e i kr ∂E i kr i kr
I " # Z " #
e e
E − da = E ∇2 − ∇2 E d v (10.33)
∂n r r ∂n r r
S V

The right-hand side of this equation vanishes (as long as we exclude the point
r = 0; see P0.4 and P0.5) since we have
$$E\nabla^2\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\nabla^2 E = -k^2E\,\frac{e^{ikr}}{r} + \frac{e^{ikr}}{r}\,k^2E = 0 \tag{10.34}$$
where we have taken advantage of the fact that E (r) and e i kr /r both satisfy (10.6).
This is exactly the reason for our judicious choices of the functions V and U since
with them we were able to make half of (10.31) disappear. We are left with
$$\oint_S\left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right]da = 0 \tag{10.35}$$
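The claim that e^{ikr}/r satisfies the Helmholtz equation away from r = 0 can be spot-checked numerically. The sketch below uses the radial form of the Laplacian, (1/r) ∂²(rψ)/∂r², with a central finite difference; the function names, the step size, and the sample values of k and r are assumptions made for illustration.

```python
import cmath

def spherical_wave(r, k):
    """The function V = exp(i k r) / r."""
    return cmath.exp(1j * k * r) / r

def radial_laplacian(psi, r, h=1e-4):
    """(1/r) d^2(r psi)/dr^2 approximated by a central difference."""
    g = lambda s: s * psi(s)
    return (g(r + h) - 2 * g(r) + g(r - h)) / (h * h * r)

K = 2.0      # assumed wave number (arbitrary units)
R = 1.5      # assumed radius, away from the excluded point r = 0
psi = lambda s: spherical_wave(s, K)

# Helmholtz equation: laplacian(psi) + k^2 psi should vanish.
residual = radial_laplacian(psi, R) + K**2 * psi(R)
print(abs(residual))
```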
Now consider a volume between a small sphere of radius ε at the origin and an
outer surface of whatever shape. The total surface that encloses the volume
consists of two parts (i.e. $S = S_1 + S_2$ as depicted in Fig. 10.10).
When we apply (10.35) to the surface in Fig. 10.10, we have
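$$\oint_{S_2}\left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right]da = -\oint_{S_1}\left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right]da \tag{10.36}$$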
Our motivation for choosing this geometry with multiple surfaces is that eventually
we want to find the field at the origin (inside the little sphere) from knowledge
of the field on the outside surface. To this end, we assume that ε is small so that
E(r) is approximately the same everywhere on the surface $S_1$. Then the integral
over $S_1$ becomes
$$\oint_{S_1}\left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right]da = \lim_{r=\epsilon\to 0}\int_0^{2\pi}d\phi\int_0^{\pi}\left[E\left(\frac{\partial}{\partial r}\frac{e^{ikr}}{r}\right)\frac{\partial r}{\partial n} - \frac{e^{ikr}}{r}\left(\frac{\partial E}{\partial r}\right)\frac{\partial r}{\partial n}\right]r^2\sin\theta\,d\theta \tag{10.37}$$
Figure 10.10 A two-part surface enclosing volume V.
where we have used spherical coordinates. Notice that we have employed the
chain rule to execute the normal derivative ∂/∂n. Since r always points opposite
to the direction of the surface normal n̂, the normal derivative ∂r/∂n is always
equal to −1. (From the definition of the normal derivative we have ∂r/∂n ≡
∇r · n̂ = −n̂ · n̂ = −1.) We can now perform the integration in (10.37) as well as take
the limit as ε → 0 to obtain
$$\begin{aligned}
\lim_{\epsilon\to 0}\oint_{S_1}\left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right]da
&= -4\pi\lim_{\epsilon\to 0}\left[r^2\left(-\frac{e^{ikr}}{r^2} + ik\frac{e^{ikr}}{r}\right)E - r^2\,\frac{e^{ikr}}{r}\frac{\partial E}{\partial r}\right]_{r=\epsilon}\\
&= -4\pi\lim_{\epsilon\to 0}\left[\left(-e^{ik\epsilon} + ik\epsilon\, e^{ik\epsilon}\right)E - e^{ik\epsilon}\,\epsilon\,\frac{\partial E}{\partial r}\right]_{r=\epsilon}\\
&= 4\pi E(0)
\end{aligned} \tag{10.38}$$
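The result (10.38) can be checked numerically for a specific field. The sketch below assumes the test field E = e^{ikz} (a plane wave, which satisfies the Helmholtz equation and has E(0) = 1) and integrates the S_1 integrand over a small sphere with the midpoint rule. The function name and the sample values of ε and k are illustrative assumptions; the printed ratio should come out close to 1.

```python
import cmath
import math

def small_sphere_integral(eps, k, n_theta=2000):
    """Surface integral over S1 for the assumed test field E = exp(i k z)."""
    green = cmath.exp(1j * k * eps) / eps
    dgreen_dr = (1j * k / eps - 1 / eps**2) * cmath.exp(1j * k * eps)
    h = math.pi / n_theta
    total = 0j
    for j in range(n_theta):
        theta = (j + 0.5) * h
        field = cmath.exp(1j * k * eps * math.cos(theta))    # E on the sphere
        dfield_dr = 1j * k * math.cos(theta) * field          # dE/dr on the sphere
        # The normal derivative picks up dr/dn = -1 on S1.
        integrand = -(field * dgreen_dr - green * dfield_dr)
        total += integrand * eps**2 * math.sin(theta) * h
    return 2 * math.pi * total    # the phi integral is trivial for this field

ratio = small_sphere_integral(1e-3, 2.0) / (4 * math.pi)
print(ratio)
```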
With the aid of (10.38), Green’s theorem applied to our specific geometry
(10.36) reduces to
$$E(0) = \frac{1}{4\pi}\oint_{S_2}\left[\frac{e^{ikr}}{r}\frac{\partial E}{\partial n} - E\frac{\partial}{\partial n}\frac{e^{ikr}}{r}\right]da \tag{10.39}$$

The field E on the left is understood to be the value of the field inside the little
sphere at the origin. The field E inside the integral is the value of the field on the
surface of integration. Hence, if we know the field everywhere on the outer surface
S 2 , then we can predict the field at the origin. Of course we are free to choose
any coordinate system in order to find the field anywhere inside the surface by
moving the origin.
Now let us choose a specific surface $S_2$. We choose an infinite mask with a
finite aperture connected to a hemisphere of infinite radius R → ∞. In the end,
we will actually be interested in light that enters through the mask and propagates
to the origin. In our present coordinate system, the vectors r and n̂ point opposite
to the incoming light. We will transform our coordinate system at a later point.
Figure 10.11 Surface $S_2$ depicted as a mask and a large hemisphere. (Figure labels: mask, aperture, origin.)
We must evaluate (10.39) on the surface depicted in the figure. For the portion
of $S_2$ which is on the hemisphere, the integrand tends to zero as R becomes large.
To argue this, it is necessary to recognize the fact that at large distances the field
takes on a form proportional to $e^{ikR}/R$ so that the two terms in the integrand
cancel. On the mask, we assume, as did Kirchhoff, that both ∂E/∂n and E are zero.
(Later Sommerfeld noticed that these two assumptions actually contradict each
other, and he revised Kirchhoff’s work to be more accurate. However, the revision
in practice makes only a tiny difference as light spills onto the back of the aperture
over a distance of only a wavelength. We ignore this and make Kirchhoff’s (slightly
flawed) assumptions since it saves a lot of work.) Thus, we are left with only the
integration over the open aperture:
$$E(0) = \frac{1}{4\pi}\iint_{\text{aperture}}\left[\frac{e^{ikr}}{r}\frac{\partial E}{\partial n} - E\frac{\partial}{\partial n}\frac{e^{ikr}}{r}\right]da \tag{10.40}$$
We have essentially arrived at the result that we are seeking. The field coming
through the aperture is integrated to find the field at the origin, which is located
beyond the aperture. Let us manipulate the formula a little further. The second
term in the integral of (10.40) can be rewritten as follows:
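$$\frac{\partial}{\partial n}\frac{e^{ikr}}{r} = \left(\frac{\partial}{\partial r}\frac{e^{ikr}}{r}\right)\frac{\partial r}{\partial n} = \left(\frac{ik}{r} - \frac{1}{r^2}\right)e^{ikr}\cos(\mathbf{r},\hat{\mathbf{n}}) \;\underset{r\gg\lambda}{\longrightarrow}\; \frac{ik\,e^{ikr}}{r}\cos(\mathbf{r},\hat{\mathbf{n}}) \tag{10.41}$$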
where ∂r/∂n = cos(r, n̂) indicates the cosine of the angle between r and n̂. We
have also assumed that the distance r is much larger than a wavelength in order
to drop a term. Next, we assume that the field in the plane of the aperture can be
written as $E \cong \tilde{E}(x,y)\,e^{ikz}$. This represents a field traveling through the aperture
from left to right. Then, we may write the first term in the integral of (10.40) as
$$\frac{\partial E}{\partial n} = \frac{\partial E}{\partial z}\frac{\partial z}{\partial n} = ik\tilde{E}(x,y)\,e^{ikz}\,(-1) = -ikE \tag{10.42}$$
Substituting (10.41) and (10.42) into (10.40) yields
$$E(0) = -\frac{i}{\lambda}\iint_{\text{aperture}} E\,\frac{e^{ikr}}{r}\left[\frac{1+\cos(\mathbf{r},\hat{\mathbf{n}})}{2}\right]da \tag{10.43}$$

Finally, we wish to rearrange our coordinate system to that depicted in Fig. 10.2.
In our derivation, it was less cumbersome to place the origin at a point after the
aperture. Now that we have completed our mathematics, it is convenient to make
a change of coordinate system and move the origin to the plane of the aperture as
in Fig. 10.2. Then, we can obtain the field at a point lying somewhere after the
aperture by computing
$$E(x,y,z=d) = -\frac{i}{\lambda}\iint_{\text{aperture}} E(x',y',z=0)\,\frac{e^{ikR}}{R}\left[\frac{1+\cos(\mathbf{r},\hat{\mathbf{z}})}{2}\right]dx'\,dy' \tag{10.44}$$
where
$$R = \sqrt{(x-x')^2 + (y-y')^2 + d^2} \tag{10.45}$$
Equation (10.11) is the same as (10.43) after applying a coordinate transformation.
It is called the Fresnel-Kirchhoff diffraction formula and it agrees with (10.1)
except for the obliquity factor [1 + cos(r, ẑ)]/2.
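To get a feel for the obliquity factor, the short sketch below tabulates [1 + cos θ]/2 for a few angles θ between r and ẑ. The function name and the chosen angles are illustrative assumptions. The factor equals 1 when the angle is zero, 1/2 at 90°, and 0 at 180°, so it stays close to 1 for the small angles typical of paraxial diffraction problems.

```python
import math

def obliquity(angle_rad):
    """[1 + cos(r, z)] / 2 for a given angle between r and z-hat."""
    return 0.5 * (1 + math.cos(angle_rad))

for deg in (0, 10, 30, 90, 180):
    print(deg, obliquity(math.radians(deg)))
```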
Appendix 10.B Green’s Theorem

To derive Green’s theorem, we begin with the divergence theorem (see (0.11)):
$$\oint_S \mathbf{f}\cdot\hat{\mathbf{n}}\,da = \int_V \nabla\cdot\mathbf{f}\,dv \tag{10.46}$$
The unit vector n̂ always points normal to the surface of volume V over which
the integral is taken. Let the vector function f be U ∇V , where U and V are both
analytical functions of the position coordinate r. Then (10.46) becomes
$$\oint_S (U\nabla V)\cdot\hat{\mathbf{n}}\,da = \int_V \nabla\cdot(U\nabla V)\,dv \tag{10.47}$$
We recognize ∇V · n̂ as the directional derivative of V directed along the surface
normal n̂. This is often represented in shorthand notation as
$$\nabla V\cdot\hat{\mathbf{n}} = \frac{\partial V}{\partial n} \tag{10.48}$$
The argument of the integral on the right-hand side of (10.47) can be expanded
with the product rule:
$$\nabla\cdot(U\nabla V) = \nabla U\cdot\nabla V + U\nabla^2 V \tag{10.49}$$
With these substitutions, (10.47) becomes
$$\oint_S U\frac{\partial V}{\partial n}\,da = \int_V\left[\nabla U\cdot\nabla V + U\nabla^2 V\right]dv \tag{10.50}$$
Actually, so far we haven’t done much. Equation (10.50) is nothing more than
the divergence theorem applied to the vector function U∇V. Similarly, we can
apply the divergence theorem to an alternative vector function given by the
combination V∇U. Thus, we can write an equation similar to (10.50) where U
and V are interchanged:
$$\oint_S V\frac{\partial U}{\partial n}\,da = \int_V\left[\nabla V\cdot\nabla U + V\nabla^2 U\right]dv \tag{10.51}$$
We simply subtract (10.51) from (10.50), and this leads to (10.31), known as Green’s
theorem.
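Written out, the subtraction in this last step is as follows. The symmetric term ∇U · ∇V appears in both volume integrals and cancels:

```latex
\oint_S \left[ U\frac{\partial V}{\partial n} - V\frac{\partial U}{\partial n} \right] da
  = \int_V \left[ \left(\nabla U\cdot\nabla V + U\nabla^2 V\right)
                - \left(\nabla V\cdot\nabla U + V\nabla^2 U\right) \right] dv
  = \int_V \left[ U\nabla^2 V - V\nabla^2 U \right] dv
```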

Exercises
Exercises for 10.1 Huygens’ Principle as Formulated by Fresnel
P10.1 Huygens’ principle is often used to describe diffraction through slits,
but it can also be used to describe refraction. Use a drawing program
or a ruler and compass to produce a picture similar to Fig. 10.12, which
shows the graphical prediction of the refracted angle from Huygens’
principle. Verify that the Huygens picture matches the numerical
prediction from Snell’s Law for an incident angle of your choice. Use
$n_i = 1$ and $n_t = 2$.
HINT: Draw the wavefronts hitting the interface at an angle and treat
each point where the wavefronts strike the interface as the source of
circular waves propagating into the n = 2 material. The wavelength of
the circular waves must be exactly half the wavelength of the incident
light since $\lambda = \lambda_{\text{vac}}/n$. Use at least four point sources and connect the
matching wavefronts by drawing tangent lines as in the figure.
Figure 10.12
P10.2 (a) Show that the function
$$f(r) = \frac{A}{r}\cos(kr - \omega t)$$
is a solution to the wave equation in spherical coordinates with only
radial dependence,
$$\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial f}{\partial r}\right) = \frac{1}{v^2}\frac{\partial^2 f}{\partial t^2}$$
Determine what v is, in terms of k and ω.

(b) If the electric field were a scalar field, we might be done there.
However, it’s a vector field, and moreover it must satisfy Maxwell’s
equations. We know from experience that it’s generally transverse, and
since it’s traveling radially let’s make a guess that it’s oscillating in the φ̂
direction:
$$\mathbf{E}(r) = \frac{A}{r}\cos(kr - \omega t)\,\hat{\boldsymbol{\phi}}$$
Show that this choice for E is not consistent with Maxwell’s equations.
In particular: (i) show that it does satisfy Gauss’s Law (1.1); (ii) compute
the curl of E and use Faraday’s Law (1.3) to deduce B; (iii) show that this B
does satisfy Gauss’s Law for magnetism (1.2); (iv) show that this B does not
satisfy Ampere’s law (1.4).
(c) A somewhat more complicated ‘spherical’ wave
$$\mathbf{E}(r,\theta) = \frac{A\sin\theta}{r}\left[\cos(kr - \omega t) - \frac{1}{kr}\sin(kr - \omega t)\right]\hat{\boldsymbol{\phi}}$$
does satisfy Maxwell’s equations. Describe how this wave behaves
as a function of r and θ. What conditions need to be satisfied for this
equation to reduce to the spherical wave formula used in the diffraction
formulas?
Exercises for 10.2 Scalar Diffraction Theory
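P10.3 Show that $E(r) = E_0 r_0 e^{ikr}/r$ is a solution to the scalar Helmholtz
equation (10.6).
HINT:
$$\nabla^2\psi = \frac{1}{r}\frac{\partial^2(r\psi)}{\partial r^2} + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\psi}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\psi}{\partial\phi^2}$$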
P10.4 Learn by heart the derivation of the Fresnel-Kirchhoff diffraction
formula (outlined in Appendix 10.A). Indicate the percentage of how well
you understand the derivation. If you write 100%, it means
that you can reproduce the derivation after closing your notes.
P10.5 Check that (10.17) is the solution to the paraxial wave equation (10.16).
Exercises for 10.4 Fraunhofer Approximation
P10.6 (a) Repeat Example 10.1 to find the on-axis intensity after a circular
aperture in both the Fresnel and Fraunhofer approximations. (HINT:
Use (10.28) and (10.29) to obtain the fields at ρ = 0.) Also make suitable
approximations directly to (10.3) to obtain the same answers.
(b) Check how well the Fresnel and Fraunhofer approximations work
by graphing the three curves (i.e. (10.3) and the curves obtained in part
(a)) on a single plot as a function of z. Take ℓ = 10 µm and λ = 500 nm.
To see the result better, use a log scale on the z-axis.
Figure 10.13 “The Fraunhofer Approximation” by Sterling Cornaby
Figure 10.14 (curves: Fresnel, Fraunhofer, Fresnel-Kirchhoff; horizontal axis: z (mm))

L10.7 (a) Why does the on-axis intensity behind a circular opening fluctuate
(see Example 10.1) whereas the on-axis intensity behind a circular
obstruction remains constant (see Example 10.2)?
(b) Create a collimated laser beam several centimeters wide. Observe
the on-axis intensity on a movable screen (e.g. a hand-held card) behind
a small circular aperture and behind a small circular obstruction
placed in the beam. (video)
(c) In the case of the circular aperture, measure the distance to several
on-axis minima and check that it agrees with prediction. (See problem
P10.6.)
Figure 10.15 (figure label: Laser)
P10.8 Calculate the Fraunhofer diffraction field and intensity patterns for a
rectangular aperture (dimensions ∆x by ∆y) illuminated by a plane
wave E 0 . In other words, derive (10.21).
P10.9 A single narrow slit has a mask placed over it so the aperture function
is not a square pulse but rather a cosine: $E(x', y', 0) = E_0\cos(x'/L)$ for
$-L/2 < x' < L/2$ and $E(x', y', 0) = 0$ otherwise. Calculate the far-field
(Fraunhofer) diffraction pattern. Make a plot of intensity as a function
of xkL/2z; qualitatively compare the pattern to that of a regular single
slit.
Exercises for 10.5 Diffraction with Cylindrical Symmetry
P10.10 Calculate the Fraunhofer diffraction intensity pattern (10.30) for a
circular aperture (diameter ℓ) illuminated by a plane wave $E_0$.

Chapter 11
Diffraction Applications
In this chapter, we consider a number of practical examples of diffraction. We first
discuss diffraction theory in systems involving lenses. The Fraunhofer diffraction
pattern discussed in section 10.4, applicable in the far-field limit, is imaged to the
focus of a lens when the lens is placed in the stream of light. This has important
implications for the resolution of instruments such as telescopes or the human
eye.
The array theorem, which applies in the Fraunhofer limit, is introduced in
section 11.3. This theorem is a powerful mathematical tool that enables one to deal
conveniently with diffraction from an array of identical apertures. One of the
important uses of the array theorem is in determining Fraunhofer diffraction
from a grating, since a diffraction grating can be thought of as an array of narrow
slit apertures. In section 11.5, we study the workings of a diffraction spectrometer.
To find the resolution limitations, one combines the diffraction properties of
gratings with the Fourier properties of lenses.
Finally, we consider a Gaussian laser beam to understand its focusing and
diffraction properties. The information presented here comes up remarkably
often in research activity. We often think of lasers as collimated beams of light
that propagate indefinitely without expanding. However, the laws of diffraction
require that every finite beam eventually grow in width. The rate at which a laser
beam diffracts depends on its beam waist size. Because laser beams usually have
narrow divergence angles and therefore obey the paraxial approximation, we can
calculate their behavior via the Fresnel approximation discussed in section 10.3.
Appendix 11.A discusses the ABCD law for Gaussian beams, which is a method
of computing the effects of optical elements represented by ABCD matrices on
Gaussian laser beams.
11.1 Fraunhofer Diffraction Through a Lens

The Fraunhofer limit corresponds to the ultimate amount of diffraction that
light in an optical system experiences. As has been previously discussed, the
Fraunhofer approximation applies to diffraction when the propagation distance
from an aperture is sufficiently large (see (10.19) and (10.20)). Mathematically, it
is obtained via a two-dimensional Fourier transform. The intensity of the far-field
diffraction pattern is
$$I(x,y,z) = \frac{1}{2}c\epsilon_0\left|\frac{1}{\lambda z}\iint_{\text{aperture}} E(x',y',0)\,e^{-i\frac{k}{z}(xx'+yy')}\,dx'\,dy'\right|^2 \tag{11.1}$$
Notice that the dependence of the diffraction on x, y, and z comes only
through the combinations $\theta_x \cong x/z$ and $\theta_y \cong y/z$. Therefore, the diffraction
pattern in the Fraunhofer limit is governed by the two angles $\theta_x$ and $\theta_y$, and
the pattern preserves itself indefinitely. As the light continues to propagate, the
pattern increases in size at a rate proportional to distance traveled so that the
angular width is preserved. The situation is depicted in Fig. 11.1.
Figure 11.1 Diffraction in the far field.
Recall that in order to use the Fraunhofer diffraction formula we need to
satisfy $z \gg \pi(\text{aperture radius})^2/\lambda$ (see (10.19)). As an example, if an aperture
with a 1 cm radius (not necessarily circular) is used with visible light, the light
must travel more than a kilometer in order to reach the Fraunhofer limit. It
may therefore seem unlikely to reach the Fraunhofer limit in a typical optical
system, especially if the aperture or beam size is relatively large. Nevertheless,
spectrometers, which typically utilize diffraction gratings many centimeters wide,
depend on achieving the Fraunhofer limit within the confines of a manageable
instrument box. This is accomplished using imaging techniques. The Fraunhofer
limit is also important to the performance of other optical instruments that use
lenses (e.g. a telescope).
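The kilometer-scale estimate quoted for a 1 cm aperture can be reproduced with a one-line calculation. The sketch below assumes a wavelength of 500 nm as a representative value for visible light; the function name is hypothetical. The propagation distance must be much larger than the printed length, which is why distances beyond a kilometer are needed.

```python
import math

def fraunhofer_scale(radius, wavelength):
    """pi * radius^2 / wavelength; z must greatly exceed this length."""
    return math.pi * radius**2 / wavelength

# 1 cm aperture radius with an assumed wavelength of 500 nm (result in meters).
scale_m = fraunhofer_scale(0.01, 500e-9)
print(scale_m)
```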
Consider a lens with focal length f placed in the path of light following an
aperture (see Fig. 11.2). Let the lens be placed an arbitrary distance L after the
aperture. The lens produces an image of the Fraunhofer pattern at a new location
d i following the lens according to the imaging formula (see (9.55))
$$\frac{1}{f} = \frac{1}{-(z-L)} + \frac{1}{d_i}. \tag{11.2}$$
Keep in mind that the lens interrupts the light before the Fraunhofer pattern
has a chance to form. This means that the Fraunhofer diffraction pattern may
be thought of as a virtual object a distance z − L to the right of the lens. Since
the Fraunhofer diffraction pattern occurs at very large distances (i.e. z → ∞) the
image of the Fraunhofer pattern appears at the focus of the lens:
$$d_i \cong f. \tag{11.3}$$
Figure 11.2 Imaging of the Fraunhofer diffraction pattern to the focus of a lens. (Figure label: image of the pattern that would have appeared at infinity.)
Thus, a lens makes it very convenient to observe the Fraunhofer diffraction pat-
tern even from relatively large apertures. It is not necessary to let the light propa-
gate for kilometers. We need only observe the pattern at the focus of the lens as
shown in Fig. 11.2. Notice that the spacing L between the aperture and the lens is
unimportant to this conclusion.
Even though we know that the Fraunhofer diffraction pattern occurs at the
focus of a lens, the question remains as to the size of the image. To find the answer,
let us examine the magnification (9.56), which is given by
$$M = -\frac{d_i}{-(z-L)} \tag{11.4}$$
Taking the limit of very large z and employing (11.3), the magnification becomes
$$M \to \frac{f}{z} \tag{11.5}$$
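The limits (11.3) and (11.5) can be illustrated numerically by solving the imaging formula (11.2) for increasing z. In the sketch below the function names and the sample values (f = 0.5 m, L = 0.2 m) are assumptions for illustration. As z grows, the image distance approaches f and the ratio M/(f/z) approaches 1.

```python
def image_distance(f, z, L):
    """Solve 1/f = 1/(-(z - L)) + 1/d_i for d_i."""
    return 1.0 / (1.0 / f + 1.0 / (z - L))

def magnification(f, z, L):
    """M = -d_i / (-(z - L)), as in (11.4)."""
    return image_distance(f, z, L) / (z - L)

F, L = 0.5, 0.2    # assumed focal length and aperture-to-lens spacing (m)
for z in (10.0, 1e3, 1e6):
    print(z, image_distance(F, z, L), magnification(F, z, L) * z / F)
```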
This is a remarkable result. When the lens is inserted, the size of the diffraction
pattern decreases by the ratio of the lens focal length f to the original distance
z to a far-away screen. Since in the Fraunhofer regime the diffraction pattern is
proportional to distance (i.e. size ∝ z), the image at the focus of the lens scales
in proportion to the focal length (i.e. size ∝ f). This means that the angular
width of the pattern is preserved! With the lens in place, we can rewrite (11.1)
straightaway as
$$I(x,y,L+f) \cong \frac{1}{2}c\epsilon_0\left|\frac{1}{\lambda f}\iint_{\text{aperture}} E(x',y',0)\,e^{-i\frac{k}{f}(xx'+yy')}\,dx'\,dy'\right|^2 \tag{11.6}$$
which describes the intensity distribution pattern at the focus of the lens.
Although (11.6) correctly describes the intensity, we cannot easily write the
electric field since the imaging techniques that we have used do not render the
phase information. To obtain an expression for the field, it will be necessary to
employ the Fresnel diffraction formula. In addition, we need to know how a lens
adjusts the phase fronts of the light passing through it.
Phase Front Alteration by a Lens
Consider a monochromatic light field that goes through a thin lens with focal
length f. In traversing the lens, the wavefront undergoes a phase shift that varies
across the lens. We will reference the phase shift to that experienced by the light
that goes through the center of the lens. In Fig. 11.3, $R_1$ is a positive radius
of curvature, and $R_2$ is a negative radius of curvature, according to our previous
convention. We take the distances $\ell_1$ and $\ell_2$, as drawn, to be positive.
The light passing through the off-axis portion of the lens experiences less material
than the light passing through the center. The difference in optical path length is
$(n-1)(\ell_1+\ell_2)$ (see discussion connected with (9.14)). This means that the phase
of the field passing through the off-axis portion of the lens relative to the phase of
the field passing through the center is
$$\Delta\phi = -k(n-1)(\ell_1+\ell_2). \tag{11.7}$$
The negative sign indicates a phase advance (i.e. same sign as −ωt). Since off axis
the light travels through less material, the phase of the wave front gets ahead of
the light traveling through the center of the lens. In (11.7), k represents the wave
number in vacuum (i.e. $2\pi/\lambda_{\text{vac}}$), since $\ell_1$ and $\ell_2$ correspond to distances outside
of the lens material.
Figure 11.3 A thin lens, which modifies the phase of a field passing through.
We can find expressions for $\ell_1$ and $\ell_2$ from the equations describing the spherical
surfaces of the lens:
$$(R_1 - \ell_1)^2 + x^2 + y^2 = R_1^2, \qquad (R_2 + \ell_2)^2 + x^2 + y^2 = R_2^2 \tag{11.8}$$
In the Fresnel approximation, which takes place in the paraxial limit, it is
appropriate to neglect the terms $\ell_1^2$ and $\ell_2^2$ in comparison with the other terms present.
Within this approximation, equations (11.8) become
$$\ell_1 \cong \frac{x^2+y^2}{2R_1} \quad\text{and}\quad \ell_2 \cong -\frac{x^2+y^2}{2R_2} \tag{11.9}$$
Substitution into (11.7) yields
$$\Delta\phi = -k(n-1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right)\frac{x^2+y^2}{2} = -\frac{k}{2f}\left(x^2+y^2\right) \tag{11.10}$$
where the focal length of a thin lens f has been introduced according to the
lens-maker’s formula (9.46).
In summary, the light traversing a lens experiences a relative phase shift given by
$$E(x,y,z_{\text{after lens}}) = E(x,y,z_{\text{before lens}})\,e^{-i\frac{k}{2f}(x^2+y^2)} \tag{11.11}$$
Equation (11.11) introduces a wave-front curvature to the field. For example, if a
plane wave (i.e. a uniform field $E_0$) passes through the lens, the field emerges with
a spherical-like wave front converging towards the focus of the lens.
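The quality of the paraxial step from (11.8) to (11.9) can be gauged by comparing the exact depth of a spherical surface with the approximation r²/2R, where r² = x² + y². The sketch below is illustrative: the function names and the sample radius of curvature (10 cm) are assumptions. It also evaluates the phase of (11.10) for a given focal length.

```python
import math

def exact_sag(r, R):
    """Exact solution of (R - l)^2 + r^2 = R^2 for the depth l."""
    return R - math.sqrt(R * R - r * r)

def paraxial_sag(r, R):
    """Paraxial approximation r^2 / (2R) used in (11.9)."""
    return r * r / (2 * R)

def lens_phase(x, y, wavelength, f):
    """Relative phase -k (x^2 + y^2) / (2 f) from (11.10)."""
    k = 2 * math.pi / wavelength
    return -k * (x * x + y * y) / (2 * f)

R1 = 0.10    # assumed radius of curvature (m)
for r in (0.001, 0.005, 0.02):
    print(r, exact_sag(r, R1), paraxial_sag(r, R1))
```

The two columns agree closely for small r and drift apart as r becomes a sizable fraction of R, which is where the paraxial treatment of the lens stops being reliable.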
We compute the diffraction pattern after the lens in three steps, as illustrated
in Fig. 11.4. First, we use the Fresnel diffraction formula to compute the field
arriving at the lens. Second, we adjust the phase front of the light passing through
the lens according to (11.11). Third, we use the field exiting the lens as the input
for a second Fresnel diffraction integral to find the field at the lens focus. The
result gives an intensity pattern in agreement with (11.6). It also provides the full
expression for the field, including its phase.
Starting from the known field $E(x', y', 0)$ at the aperture, we compute the field
incident on the lens using the Fresnel approximation:
$$E(x'',y'',L) = -i\,\frac{e^{ikL}\,e^{i\frac{k}{2L}(x''^2+y''^2)}}{\lambda L}\iint E(x',y',0)\,e^{i\frac{k}{2L}(x'^2+y'^2)}\,e^{-i\frac{k}{L}(x''x'+y''y')}\,dx'\,dy' \tag{11.12}$$
(The double primes keep track of distinct variables in sequential diffraction
integrals.) As mentioned, the field gains a phase factor according to (11.11) upon
transmitting through the lens. Finally, we use the Fresnel diffraction formula a
second time to propagate the distance f from the back of the thin lens:
$$E(x,y,L+f) = -i\,\frac{e^{ikf}\,e^{i\frac{k}{2f}(x^2+y^2)}}{\lambda f}\iint\left[E(x'',y'',L)\,e^{-i\frac{k}{2f}(x''^2+y''^2)}\right]e^{i\frac{k}{2f}(x''^2+y''^2)}\,e^{-i\frac{k}{f}(xx''+yy'')}\,dx''\,dy'' \tag{11.13}$$
As is immediately appreciated by students, the injection of (11.12) into (11.13)
makes a rather long formula involving four dimensions of integration. Nevertheless,
two of the integrals can be performed in advance of choosing the aperture
(i.e. those over x″ and y″). This is accomplished with the help of the integral
formula (0.55) (even though in this instance the real part of A is zero). After this
cumbersome work, (11.13) reduces to
$$E(x,y,L+f) = -i\,\frac{e^{ik(L+f)}\,e^{i\frac{k}{2f}(x^2+y^2)}\,e^{-i\frac{kL}{2f^2}(x^2+y^2)}}{\lambda f}\iint E(x',y',0)\,e^{-i\frac{k}{f}(xx'+yy')}\,dx'\,dy' \tag{11.14}$$
Figure 11.4 Diffraction from an aperture viewed at the focus of a lens.
Notice that at least the integration portion of this formula looks exactly like
the Fraunhofer diffraction formula! This happened even though in the preceding
discussion we did not at any time specifically make the Fraunhofer approximation.
The result (11.14) implies the intensity distribution (11.6) as anticipated. However,
the phase of the field is also revealed in (11.14).
In general, the field carries a wave front curvature as it passes through the
focal plane of the lens. In the special case L = f, the diffraction formula takes a
particularly simple form:
$$E(x,y,L+f)\Big|_{L=f} = -i\,\frac{e^{2ikf}}{\lambda f}\iint E(x',y',0)\,e^{-i\frac{k}{f}(xx'+yy')}\,dx'\,dy' \tag{11.15}$$
When the lens is placed at this special distance following the aperture, the
Fraunhofer diffraction pattern viewed at the focus of the lens carries a flat wave front.
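The flat wave front at L = f can be seen by combining the two quadratic phase factors that multiply the integral in (11.14). The following worked step only regroups terms already present in that formula:

```latex
e^{i\frac{k}{2f}(x^2+y^2)}\, e^{-i\frac{kL}{2f^2}(x^2+y^2)}
  = \exp\!\left[ i\,\frac{k}{2f}\left(1-\frac{L}{f}\right)\left(x^2+y^2\right) \right]
  \;\xrightarrow{\;L=f\;}\; 1,
\qquad
e^{ik(L+f)} \;\xrightarrow{\;L=f\;}\; e^{2ikf}
```

For L different from f the exponent is nonzero, which is the residual wave front curvature mentioned above.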
11.2 Resolution of a Telescope

In the previous section we learned that the Fraunhofer diffraction pattern appears
at the focus of a lens. This has important implications for telescopes and other
optical instruments. In essence, any optical instrument incorporates an aperture,
limiting the light that enters. If nothing else, the diameter of a lens itself acts
effectively as an aperture. The pupil of the human eye is an aperture that induces a
Fraunhofer diffraction pattern to occur at the retina. Cameras have irises which
aperture the light, again causing a Fraunhofer diffraction pattern to occur at the
image plane.
Of course, the focus of the lens is just where one needs to look in order to
see images of distant objects. The Fraunhofer pattern, which occurs at the focus,
represents the ultimate amount of diffraction caused by an aperture. This has the
effect of blurring out features in the image and limiting resolution. This illustrates
why it is impossible to focus light to a true point.
Suppose you point a telescope at two distant stars. An image of each star is
formed in the focal plane of the lens. The angular separation between the two
images (referenced from the lens) is the same as the angular separation between
the stars.1 This is depicted in Fig. 11.5.
Figure 11.5 To resolve distinct images at the focus of a lens, the angular separation must exceed the width of the Fraunhofer diffraction patterns.
A resolution problem occurs when the Fraunhofer diffraction causes the
image of each star to blur by more than the angular separation between them.
In this case the two images cannot be resolved because they ‘bleed’ into one
another.
The Fraunhofer diffraction pattern from a circular aperture was computed
previously (see (10.30)). At the focus of a lens, this pattern becomes
$$I(\rho,f) = I_0\left(\frac{\pi\ell^2}{4\lambda f}\right)^2\left[2\,\frac{J_1\!\left(k\ell\rho/2f\right)}{\left(k\ell\rho/2f\right)}\right]^2 \tag{11.16}$$
where f, the focal length of the lens, takes the place of z in the diffraction formula.
The parameter ℓ is the diameter of the lens. This intensity pattern contains the
first-order Bessel function $J_1$, which behaves somewhat like a sine wave as seen in
Fig. 11.6. The main differences are that the zero crossings are not exactly periodic
and the function slowly diminishes with larger arguments. The first zero crossing
(after x = 0) occurs at 1.22π.
The intensity pattern described by (11.16) contains the factor $2J_1(\xi)/\xi$, where
ξ represents the combination kℓρ/2f. As seen in Fig. 11.6, $J_1(\xi)$ goes to zero
at ξ = 0. Thus, we have a zero-divided-by-zero situation when evaluating $2J_1(\xi)/\xi$
at the origin. This is similar to the sinc function (i.e. sin(ξ)/ξ), which approaches
one at the origin. In fact, $2J_1(\xi)/\xi$ is sometimes called the jinc function because it
also approaches one at the origin. The square of the jinc is shown in Fig. 11.6b.
This curve is proportional to the intensity described in (11.16). This pattern is
sometimes called an Airy pattern after Sir George Biddell Airy (English, 1801–1892)
who first described the pattern. As can be seen in Fig. 11.6b, the intensity quickly
drops at larger radii.
Figure 11.6 (a) First-order Bessel function. (b) Square of the jinc function.
1 In the thin-lens approximation, the ray from either star that traverses the center of the lens (i.e.
y = 0) maintains its angle:
$$\begin{bmatrix} 0 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix}\begin{bmatrix} 0 \\ \theta_1 \end{bmatrix} = \begin{bmatrix} 0 \\ \theta_1 \end{bmatrix}$$
We now return to the question of whether the images of two nearby stars
as depicted in Fig. 11.5 can be distinguished. Since the peak in Fig. 11.6b is the
dominant feature in the diffraction pattern, we will say that the two stars are
resolved if the angle between them is enough to keep their respective diffraction
peaks from seriously overlapping. We will adopt the criterion suggested by Lord
Rayleigh that the peaks are distinguishable if the peak of one pattern is no closer
than the first zero to the other peak. This situation is shown in Fig. 11.7.
The angle that corresponds to this separation of diffraction patterns is found
simply by setting the argument of (11.16) equal to 1.22π, the location of the first
zero:
$$\frac{k\ell\rho}{2f} = 1.22\pi \tag{11.17}$$
With a little rearranging we have
$$\theta_{\min} \cong \frac{\rho}{f} = \frac{1.22\lambda}{\ell} \tag{11.18}$$
Here we have associated the ratio ρ/f (i.e. the radius of the diffraction pattern
compared to the distance from the lens) with an angle $\theta_{\min}$. The Rayleigh criterion
requires that the diffraction patterns be separated by at least this angle before we
say that they are resolved.
Figure 11.7 The Rayleigh criterion for a circular aperture.
The angle $\theta_{\min}$ depends on the diameter of the lens ℓ as well as on the wavelength of the
light. Since the angle between the images and the angle between the objects are
the same, $\theta_{\min}$ gives the minimum angle between objects that can be resolved with
a given instrument. This analysis assumes that the light from the two objects is
incoherent, meaning the intensities in the image plane add; interferences between
the two fields fluctuate rapidly in time and average away.
Example 11.1
What minimum telescope diameter is required to distinguish a Jupiter-like planet
(orbital radius $8\times10^{8}$ km) from its star if they are 10 light-years away?
Solution: From (11.18) and assuming 500 nm light, we need
$$\ell > \frac{1.22\lambda}{\theta_{\min}} = \frac{1.22\,(500\times10^{-9}\ \mathrm{m})}{(8\times10^{11}\ \mathrm{m})/(10\ \mathrm{ly})}\times\frac{9.5\times10^{15}\ \mathrm{m}}{\mathrm{ly}} = 0.07\ \mathrm{m}$$
This seems like a piece of cake; a telescope with a diameter bigger than 7 cm will do
the trick. However, the vastly unequal brightness of the star and the planet is the
real technical challenge. The faint diffraction rings in the star’s diffraction pattern
completely swamp the faint signal from the planet.
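The arithmetic of this example can be reproduced with the Rayleigh criterion (11.18). In the sketch below the function names are hypothetical, and the light-year conversion uses the same rounded value (9.5 × 10^15 m) as the example.

```python
LIGHT_YEAR_M = 9.5e15    # meters per light-year, rounded as in the example

def rayleigh_angle(wavelength, diameter):
    """Minimum resolvable angle 1.22 * lambda / diameter from (11.18)."""
    return 1.22 * wavelength / diameter

def min_diameter(wavelength, separation_angle):
    """Smallest aperture diameter that resolves a given angular separation."""
    return 1.22 * wavelength / separation_angle

# Orbital radius 8e11 m seen from 10 light-years, with 500 nm light.
angle = 8e11 / (10 * LIGHT_YEAR_M)
diameter_m = min_diameter(500e-9, angle)
print(angle, diameter_m)
```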

11.3 The Array Theorem

In this section we develop the array theorem, which is used for calculating the
Fraunhofer diffraction from an array of N identical apertures. We will be using
the theorem to compute diffraction from a grating, which may be thought of as a
mask with many closely spaced identical slits. However, the array theorem can be
applied to apertures with any shape, as suggested by Fig. 11.8.
Consider N apertures in a mask, each with the identical field distribution
described by $E_{\text{aperture}}(x', y', 0)$. Outside of the aperture, we suppose $E_{\text{aperture}}(x', y', 0)$
is zero, so in a diffraction integral we won’t need to worry about the limits of
integration; we can just integrate over the whole mask. Each identical aperture has
a unique location on the mask. Let the location of the nth aperture be designated
by the coordinates $(x'_n, y'_n)$. The field associated with the nth aperture is then
$E_{\text{aperture}}(x' - x'_n, y' - y'_n, 0)$, where the offset in the arguments simply shifts the
location of the aperture. The field comprising all of the identical apertures is
$$E(x',y',0) = \sum_{n=1}^{N} E_{\text{aperture}}(x'-x'_n,\, y'-y'_n,\, 0) \tag{11.19}$$
Figure 11.8 Array of identical apertures.
We next compute the Fraunhofer diffraction pattern for the above field. Upon
inserting (11.19) into the Fraunhofer diffraction formula (10.20) we obtain
$$E(x,y,z) = -i\,\frac{e^{ikz}\,e^{i\frac{k}{2z}(x^2+y^2)}}{\lambda z}\sum_{n=1}^{N}\int_{-\infty}^{\infty}dx'\int_{-\infty}^{\infty}dy'\;E_{\text{aperture}}(x'-x'_n,\, y'-y'_n,\, 0)\,e^{-i\frac{k}{z}(xx'+yy')} \tag{11.20}$$
where we have taken the summation out in front of the integral. We have also
integrated over the entire (infinitely wide) mask since $E_{\text{aperture}}$ is nonzero only
inside each aperture.
Even without yet choosing the shape of the identical apertures, we can make
some progress on (11.20) with the change of variables $x'' \equiv x' - x'_n$ and $y'' \equiv y' - y'_n$:
$$E(x,y,z) = -i\,\frac{e^{ikz}\,e^{i\frac{k}{2z}(x^2+y^2)}}{\lambda z}\sum_{n=1}^{N}\int_{-\infty}^{\infty}dx''\int_{-\infty}^{\infty}dy''\;E_{\text{aperture}}(x'',y'',0)\,e^{-i\frac{k}{z}\left[x(x''+x'_n)+y(y''+y'_n)\right]} \tag{11.21}$$
Next we simply pull the factor $\exp\{-i\frac{k}{z}(xx'_n + yy'_n)\}$ out in front of the integral to
arrive at our final result:
$$E(x,y,z) = \left[\sum_{n=1}^{N} e^{-i\frac{k}{z}(xx'_n+yy'_n)}\right]\times\left[-i\,\frac{e^{ikz}\,e^{i\frac{k}{2z}(x^2+y^2)}}{\lambda z}\int_{-\infty}^{\infty}dx'\int_{-\infty}^{\infty}dy'\;E_{\text{aperture}}(x',y',0)\,e^{-i\frac{k}{z}(xx'+yy')}\right] \tag{11.22}$$

For the sake of elegance, we have traded back $x'$ for $x''$ and $y'$ for $y''$ as the
variables of integration. Equation (11.22) is known as the array theorem.² Note
that the second factor in brackets is exactly the Fraunhofer diffraction pattern
from a single aperture centered on $x' = 0$ and $y' = 0$. When more than one
identical aperture is present, we only need to evaluate the Fraunhofer diffraction
formula for a single aperture. Then, the single-aperture result is multiplied by the
summation in front, which entirely contains the information about the placement
of the (many) identical apertures.
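The factorization expressed by (11.22) can be checked numerically. The following is only a one-dimensional sketch under stated assumptions, not part of the original derivation: the Gaussian-shaped aperture function, the aperture positions, and the value of `q` (which stands in for $kx/z$) are all hypothetical choices made for illustration, and the integral is estimated with a simple midpoint sum.

```python
import cmath
import math

def aperture(xp, s=0.3):
    """Hypothetical smooth aperture field centered on x' = 0 (an assumption)."""
    return math.exp(-(xp / s) ** 2)

def fourier_integral(field, q, lo=-12.0, hi=12.0, steps=6000):
    """Midpoint-sum estimate of the integral of field(x') exp(-i q x') dx'."""
    dx = (hi - lo) / steps
    total = 0j
    for j in range(steps):
        xp = lo + (j + 0.5) * dx
        total += field(xp) * cmath.exp(-1j * q * xp)
    return total * dx

positions = [-3.0, 0.5, 4.0]   # assumed aperture locations x'_n
q = 1.7                        # stands for k x / z (arbitrary units)

def mask(xp):
    """Field of the whole mask: a sum of shifted copies, as in (11.19)."""
    return sum(aperture(xp - xn) for xn in positions)

# Left side: diffraction integral over the entire mask, as in (11.20).
direct = fourier_integral(mask, q)

# Right side: array factor times the single-aperture integral, as in (11.22).
array_factor = sum(cmath.exp(-1j * q * xn) for xn in positions)
factored = array_factor * fourier_integral(aperture, q)

print(abs(direct - factored))  # expected to be at round-off level
```

The common prefactor in front of the integrals is omitted because it appears identically on both sides.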
Example 11.2
Calculate the Fraunhofer diffraction pattern for two identical circular apertures
with diameter $\ell$ whose centers are separated by a spacing $h$.
Solution: As computed previously, the single-aperture Fraunhofer diffraction
pattern from a circular aperture (see (10.30)) is
$$I(\rho, z) = I_0 \left(\frac{\pi \ell^2}{4\lambda z}\right)^2 \left[2\frac{J_1(k\ell\rho/2z)}{k\ell\rho/2z}\right]^2$$
where $\rho \equiv \sqrt{x^2+y^2}$. From the array theorem (11.22), the intensity of the
overall diffraction pattern is
$$I(x, y, z) = \left|\sum_{n=1}^{2} e^{-i\frac{k}{z}(xx'_n+yy'_n)}\right|^2 \times I_0 \left(\frac{\pi \ell^2}{4\lambda z}\right)^2 \left[2\frac{J_1(k\ell\rho/2z)}{k\ell\rho/2z}\right]^2$$
Figure 11.9 Fraunhofer diffraction pattern from two identical circular holes
separated by twice their diameters.
Let $y'_1 = y'_2 = 0$. To create the separation $h$, let $x'_1 = -h/2$ and $x'_2 = h/2$. Then
$$\sum_{n=1}^{2} e^{-i\frac{k}{z}(xx'_n+yy'_n)} = e^{-i\frac{k}{z}\left(-\frac{hx}{2}\right)} + e^{-i\frac{k}{z}\left(\frac{hx}{2}\right)} = 2\cos\left(\frac{khx}{2z}\right)$$
The overall pattern then becomes
$$I(x, y, z) = I_0 \left(\frac{\pi \ell^2}{2\lambda z}\right)^2 \left[2\frac{J_1(k\ell\rho/2z)}{k\ell\rho/2z}\right]^2 \cos^2\left(\frac{khx}{2z}\right)$$
This pattern can be seen in Fig. 11.9.
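The result of Example 11.2 can be evaluated along the $x$ axis with a short script. This is a minimal sketch rather than the authors' code: the function names are hypothetical, the numerical values (HeNe wavelength, 1 m distance, 0.1 mm holes) are assumed for illustration, and $J_1$ is computed from Bessel's integral so that only the standard library is needed. The intensity is normalized to its on-axis value.

```python
import math

def bessel_j1(u, steps=2000):
    """J1(u) from Bessel's integral (1/pi) * int_0^pi cos(t - u sin t) dt."""
    dt = math.pi / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * dt
        total += math.cos(t - u * math.sin(t))
    return total * dt / math.pi

def two_hole_intensity(x, wavelength, z, diameter, spacing):
    """I(x, y=0, z) / I(0, 0, z) for two circular holes, following Example 11.2."""
    k = 2.0 * math.pi / wavelength
    u = k * diameter * abs(x) / (2.0 * z)
    airy = 1.0 if u == 0.0 else (2.0 * bessel_j1(u) / u) ** 2
    fringes = math.cos(k * spacing * x / (2.0 * z)) ** 2
    return airy * fringes

# Assumed illustration values (not from the text): lengths in meters.
wavelength, z = 633e-9, 1.0
diameter = 1.0e-4
spacing = 2.0 * diameter        # holes separated by twice their diameter

first_fringe_zero = wavelength * z / (2.0 * spacing)   # where khx/2z = pi/2
for x in (0.0, 0.5 * first_fringe_zero, first_fringe_zero):
    print(x, two_hole_intensity(x, wavelength, z, diameter, spacing))
```

The cosine-squared fringes come entirely from the array factor, while the slowly varying envelope is the single-hole pattern.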

² A somewhat abstract alternative route to the array theorem recognizes that the field for
each aperture can be written as a 2-D convolution (see P0.26) between the aperture function
$E_{\text{aperture}}(x', y', 0)$ and delta functions specifying the aperture location:
$$E_{\text{aperture}}(x' - x'_n, y' - y'_n, 0) = \int_{-\infty}^{\infty} dx'' \int_{-\infty}^{\infty} dy''\, \delta(x'' - x'_n)\, \delta(y'' - y'_n)\, E_{\text{aperture}}(x' - x'', y' - y'', 0)$$
The integral in (11.20) therefore may be viewed as a 2-D Fourier transform of a convolution, where
$kx/z$ and $ky/z$ play the role of spatial frequencies. The convolution theorem (see P0.26) indicates
that this is the same as the product of Fourier transforms. The 2-D Fourier transform for the delta
function (times $2\pi$) is
$$\int_{-\infty}^{\infty} dx'' \int_{-\infty}^{\infty} dy''\, \delta(x'' - x'_n)\, \delta(y'' - y'_n)\, e^{-i\frac{k}{z}(xx''+yy'')} = e^{-i\frac{k}{z}(xx'_n+yy'_n)}$$
The array theorem (11.22) exhibits this factor. It multiplies the single-aperture Fraunhofer
diffraction integral, which is the Fourier transform of the other function.

11.4 Diffraction Grating

In this section we will use the array theorem to calculate the diffraction from
a grating comprised of an array of equally spaced identical slits. An array of
uniformly spaced slits is called a transmission grating (see Fig. 11.10). Reflection
gratings are similar, being composed of an array of narrow rectangular mirrors
that behave similarly to the slits.
The Fraunhofer diffraction pattern from a single rectangular aperture was
previously calculated (see Example 10.4 and problem P10.8):
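$$E_{\text{aperture}}(x, y, z) = -i E_0 \frac{\Delta x \Delta y\, e^{ikz}}{\lambda z} e^{i\frac{k}{2z}(x^2+y^2)} \operatorname{sinc}\left(\frac{\pi\Delta x}{\lambda z}x\right) \operatorname{sinc}\left(\frac{\pi\Delta y}{\lambda z}y\right) \tag{11.23}$$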
The only part of (11.22) that remains to be evaluated is the summation out in
front. Let the apertures be positioned at
$$x'_n = \left(n - \frac{N+1}{2}\right)h, \qquad y'_n = 0 \tag{11.24}$$
Figure 11.10 Transmission grating.
where N is the total number of slits. Then the summation in the array theorem,
(11.22), becomes
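$$\sum_{n=1}^{N} e^{-i\frac{k}{z}(xx'_n+yy'_n)} = e^{i\frac{khx}{z}\left(\frac{N+1}{2}\right)} \sum_{n=1}^{N} e^{-i\frac{khx}{z}n} \tag{11.25}$$
This summation is recognized as a geometric sum, which can be performed using
formula (0.62). Equation (11.25) then simplifies to
$$\begin{aligned}
\sum_{n=1}^{N} e^{-i\frac{k}{z}(xx'_n+yy'_n)} &= e^{i\frac{k}{z}\left(\frac{N+1}{2}\right)xh}\, e^{-i\frac{khx}{z}}\, \frac{e^{-i\frac{khx}{z}N} - 1}{e^{-i\frac{khx}{z}} - 1} \\
&= \frac{e^{-i\frac{khx}{2z}N} - e^{i\frac{khx}{2z}N}}{e^{-i\frac{khx}{2z}} - e^{i\frac{khx}{2z}}} = \frac{\sin\left(N\frac{khx}{2z}\right)}{\sin\left(\frac{khx}{2z}\right)}
\end{aligned} \tag{11.26}$$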
By combining (11.23) and (11.26) we obtain the full Fraunhofer diffraction pattern
for a diffraction grating. The expression for the field is
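$$E(x, y, z) = \frac{\sin\left(N\frac{khx}{2z}\right)}{\sin\left(\frac{khx}{2z}\right)} \left[-i E_0 \frac{\Delta x \Delta y\, e^{ikz}}{\lambda z} e^{i\frac{k}{2z}(x^2+y^2)} \operatorname{sinc}\left(\frac{\pi\Delta x}{\lambda z}x\right) \operatorname{sinc}\left(\frac{\pi\Delta y}{\lambda z}y\right)\right] \tag{11.27}$$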
Now let us suppose that the slits are really tall (parallel to the y-dimension)
such that $\Delta y \gg \lambda$. If the slits are infinitely tall, the final sinc function
in Eq. (11.27) can be approximated as one.³ The intensity pattern in the horizontal
direction can then be written in terms of the peak intensity of the diffraction
pattern on the screen:
$$I(x) = I_{\text{peak}} \operatorname{sinc}^2\left(\frac{\pi\Delta x}{\lambda z}x\right) \frac{\sin^2\left(N\frac{\pi h x}{\lambda z}\right)}{N^2 \sin^2\left(\frac{\pi h x}{\lambda z}\right)} \tag{11.28}$$
Note that $\lim_{\alpha\to 0} \frac{\sin N\alpha}{\sin\alpha} = N$, so we have placed $N^2$ in the
denominator when introducing our definition of $I_{\text{peak}}$, which represents the
intensity on the screen at $x = 0$. In principle, the intensity $I_{\text{peak}}$ is a function
of $y$ and depends on the exact details of how the slits are illuminated as a function
of $y$, but this is usually not of interest as long as we stay with a given value of $y$
as we scan along $x$.
It is left as an exercise to study the functional form of (11.28), especially
how the number of slits $N$ influences the behavior. The case of $N = 2$ describes
the diffraction pattern for a Young’s double slit experiment. We now have a
description of the Young’s two-slit pattern in the case that the slits have finite
openings of width $\Delta x$ rather than infinitely narrow ones.
A final note: You may wonder why we are interested in Fraunhofer diffraction
from a grating. The reason is that we are actually interested in separating different
wavelengths by observing their distinct diffraction patterns separated in space. In
order to achieve good spatial separation between light of different wavelengths,
it is necessary to allow the light to propagate a far distance. Optimal separation
(the maximum possible) occurs therefore in the Fraunhofer regime.
³ This is mostly the right idea, but is still a bit of a fake. In fact, the field often does not have a
uniform phase along the entire slit in the y-dimension, so our use of the function
$\operatorname{sinc}\left[(\pi\Delta y/\lambda z)\,y\right]$ was inappropriate to begin with. The energy in a real
spectrometer is usually spread out in a diffuse pattern in the y-dimension. However, its form in y
is of little relevance; the spectral information is carried in the x-dimension only.
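The grating pattern (11.28) is easy to explore numerically. The following is a hedged sketch, not a definitive implementation: the function names are hypothetical, `sinc` is assumed to mean $\sin(u)/u$ as implied by the arguments in (11.23), and the slit parameters are borrowed from P11.7 purely for illustration.

```python
import math

def sinc(u):
    """sin(u)/u with the u -> 0 limit handled (assumed convention)."""
    return 1.0 if u == 0.0 else math.sin(u) / u

def grating_intensity(x, N, slit_width, spacing, wavelength, z):
    """I(x) / I_peak according to (11.28)."""
    alpha = math.pi * spacing * x / (wavelength * z)
    envelope = sinc(math.pi * slit_width * x / (wavelength * z)) ** 2
    s = math.sin(alpha)
    if abs(s) < 1e-12:
        # Zero-over-zero at a principal peak: sin(N a) / (N sin a) -> +/-1.
        interference = 1.0
    else:
        interference = (math.sin(N * alpha) / (N * s)) ** 2
    return envelope * interference

# Illustration values (meters), matching the parameters quoted in P11.7.
slit_width, spacing, wavelength, z = 5e-6, 10e-6, 500e-9, 1.0

for N in (2, 5, 10, 100):
    # Sample halfway between the central and first-order peaks.
    x_mid = 0.5 * wavelength * z / spacing
    print(N, grating_intensity(0.0, N, slit_width, spacing, wavelength, z),
          grating_intensity(x_mid, N, slit_width, spacing, wavelength, z))
```

Plotting `grating_intensity` over a range of `x` for several values of `N` shows the principal peaks narrowing as `N` grows while the single-slit envelope stays fixed.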
11.5 Spectrometers
The formula (11.28) can be exploited to make wavelength measurements. This
forms the basis of a diffraction grating spectrometer. A spectrometer has relatively
poor resolving power compared to a Fabry-Perot interferometer. Nevertheless, a
spectrometer is not hampered by the serious limitation imposed by free spectral
range. A spectrometer is able to measure a wide range of wavelengths simultaneously.
The Fabry-Perot interferometer and the grating spectrometer in this
sense are complementary, the one being able to make very precise measurements
within a narrow wavelength range and the other being able to characterize wide
ranges of wavelengths simultaneously.
To appreciate how a spectrometer works, consider Fraunhofer diffraction
from a grating, as described by (11.28). The structure of the diffraction pattern
has various peaks. For example, Fig. 11.11a shows the diffraction peaks from a
Young’s double slit (i.e. $N = 2$). The diffraction pattern is comprised of the typical
Young’s double-slit pattern multiplied by the diffraction pattern of a single slit.
(Note that $\sin^2\left(2\frac{\pi h x}{\lambda z}\right) / \left[4\sin^2\left(\frac{\pi h x}{\lambda z}\right)\right] = \cos^2\left(\frac{\pi h x}{\lambda z}\right)$.)
Figure 11.11 Diffraction through various numbers of slits, each with $\Delta x = h/2$
(slit widths half the separation). The dotted line shows the single slit diffraction
pattern. (a) Diffraction from a double slit ($N = 2$). (b) Diffraction from 5 slits.
(c) Diffraction from 10 slits. (d) Diffraction from 100 slits.
As the number of slits $N$ is increased, the peaks seen in the Young’s double-slit
pattern tend to sharpen with additional smaller peaks appearing in between.
Figure 11.11b shows the case for $N = 5$. The more significant peaks occur when
$\sin(\pi h x/\lambda z)$ in the denominator of (11.28) goes to zero. Keep in mind that the
numerator goes to zero at the same places, creating a zero-over-zero situation, so
the peaks are not infinitely tall.
With larger values of N , the peaks can become extremely sharp, and the small
secondary peaks in between are smaller in comparison. Fig. 11.11c shows the
case of N = 10 and Fig. 11.11d shows the case of N = 100.
When very many slits are used, the diffraction pattern becomes very useful for
measuring spectra of light, since the position of the diffraction peaks depends on
wavelength (except for the center peak at x = 0). If light of different wavelengths
is simultaneously present, then the diffraction peaks associated with different
wavelengths appear in different locations. It helps to have very many slits involved
(i.e. large N ) so that the diffraction peaks are sharply defined. Then closely spaced
wavelengths can be more easily distinguished.
Consider the inset in Fig. 11.11d, which gives a close-up view of the first-order
diffraction peak for N = 100. The location of this peak on a distant screen varies
with the wavelength of the light. How much must the wavelength change to cause
the peak to move by half of its ‘width’ as marked in the inset of Fig. 11.11d? We
will say that this is the minimum separation of wavelengths that still allows the
two peaks to be distinguished.
Figure 11.12 Animation showing diffraction through a number of slits.
Finding the Minimum Distinguishable Wavelength Separation
As mentioned, the main diffraction peaks occur when the denominator of (11.28)
goes to zero, i.e.
$$\frac{\pi h x}{\lambda z} = m\pi \tag{11.29}$$
The numerator of (11.28) goes to zero at these same locations (i.e. $N\pi h x/\lambda z = Nm\pi$),
so the peaks remain finite. If two nearby wavelengths $\lambda_1$ and $\lambda_2$ are sent
through the grating simultaneously, their $m$th peaks are located at
$$x_1 = \frac{mz\lambda_1}{h} \quad \text{and} \quad x_2 = \frac{mz\lambda_2}{h} \tag{11.30}$$
These are spatially separated by
$$\Delta x_\lambda \equiv x_2 - x_1 = \frac{mz}{h}\Delta\lambda \tag{11.31}$$
where $\Delta\lambda \equiv \lambda_2 - \lambda_1$.
Meanwhile, we can find the spatial width of, say, the first peak by considering the
change in $x_1$ that causes the sine in the numerator of (11.28) to reach the nearby
zero (see inset in Fig. 11.11d). This condition implies
$$N\frac{\pi h\left(x_1 + \Delta x_{\text{peak}}\right)}{\lambda_1 z} = Nm\pi + \pi \tag{11.32}$$
We will say that two peaks, associated with $\lambda_1$ and $\lambda_2$, are barely distinguishable
when $\Delta x_\lambda = \Delta x_{\text{peak}}$. We also substitute from (11.30) to rewrite (11.32) as
$$N\frac{\pi h\left(mz\lambda_1/h + mz\Delta\lambda/h\right)}{\lambda_1 z} = Nm\pi + \pi \quad \Rightarrow \quad \Delta\lambda = \frac{\lambda}{Nm} \tag{11.33}$$
Here we have dropped the subscript on the wavelength in the spirit of $\lambda_1 \approx \lambda_2 \approx \lambda$.

As we did for the Fabry-Perot interferometer, we can define the resolving
power of the diffraction grating as
$$RP \equiv \frac{\lambda}{\Delta\lambda} = mN \tag{11.34}$$
The resolving power is proportional to the number of slits illuminated on the
diffraction grating. The resolving power also improves for higher diffraction
orders m.
Example 11.3
What is the resolving power of a 2-cm-wide grating with 500 slits per millimeter,
and how wide is the 1st-order diffraction peak for 500-nm light after 1-m focusing?
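Solution: From (11.34) the resolving power is
$$RP = mN = (2\,\text{cm})\,\frac{5000}{\text{cm}} = 10^4$$
and the minimum distinguishable wavelength separation is
$$\Delta\lambda = \lambda/RP = 500\,\text{nm}/10^4 = 0.05\,\text{nm}$$
From (11.31), with $z \to f$, we have
$$\Delta x = \frac{mf}{h}\Delta\lambda = \frac{1\,\text{m}}{2\times 10^{-6}\,\text{m}}\, 0.05\,\text{nm} = 25\,\mu\text{m}$$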
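The arithmetic of Example 11.3 can be reproduced with a few lines of code. This is a small sketch under the example's own assumptions (the whole grating is illuminated, first order, focusing distance used in place of $z$); the function name and argument names are hypothetical.

```python
def grating_example(width, slits_per_meter, wavelength, focal_length, order=1):
    """Return (resolving power, delta lambda, peak width) using (11.34) and (11.31)."""
    n_slits = width * slits_per_meter          # N, number of illuminated slits
    h = 1.0 / slits_per_meter                  # slit spacing
    resolving_power = order * n_slits          # RP = m N
    d_lambda = wavelength / resolving_power    # smallest distinguishable separation
    dx = order * focal_length / h * d_lambda   # (11.31) with z -> f
    return resolving_power, d_lambda, dx

# 2-cm-wide grating, 500 slits per millimeter, 500-nm light, 1-m focusing.
rp, d_lambda, dx = grating_example(2e-2, 500e3, 500e-9, 1.0)
print(rp, d_lambda, dx)   # all lengths in meters
```

Changing `order` shows directly how higher diffraction orders improve the resolving power while leaving the peak width on the screen unchanged.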
11.6 Diffraction of a Gaussian Field Profile

Consider a Gaussian field profile described, at the plane $z = 0$, with the functional
form
$$E(x', y', 0) = E_0\, e^{-\frac{x'^2+y'^2}{w_0^2}} \tag{11.35}$$
where $w_0$, called the beam waist, specifies the radius of the Gaussian profile. It is
depicted in Fig. 11.13. To better appreciate the meaning of $w_0$, consider the
intensity of the above field distribution:
$$I(x', y', 0) = I_0\, e^{-2\rho'^2/w_0^2} \tag{11.36}$$
where $\rho'^2 \equiv x'^2 + y'^2$. In (11.36) we see that $w_0$ indicates the radius at which the
intensity reduces by the factor $e^{-2} = 0.135$.
Figure 11.13 Diffraction of a Gaussian field profile.
We would like to know how this field evolves when it propagates forward from
the plane $z = 0$. We compute the field downstream using the Fresnel approximation
(10.14):
$$E(x, y, z) = -i \frac{e^{ikz} e^{i\frac{k}{2z}(x^2+y^2)}}{\lambda z} \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy' \left[E_0\, e^{-(x'^2+y'^2)/w_0^2}\right] e^{i\frac{k}{2z}(x'^2+y'^2)}\, e^{-i\frac{k}{z}(xx'+yy')} \tag{11.37}$$

The Gaussian profile itself limits the dimension of the emission region, so there is
no problem in integrating to infinity. Equation (11.37) can be rewritten as
$$E(x, y, z) = -i \frac{E_0\, e^{ikz} e^{i\frac{k}{2z}(x^2+y^2)}}{\lambda z} \int_{-\infty}^{\infty} dx'\, e^{-\left(\frac{1}{w_0^2} - i\frac{k}{2z}\right)x'^2 - i\frac{kx}{z}x'} \int_{-\infty}^{\infty} dy'\, e^{-\left(\frac{1}{w_0^2} - i\frac{k}{2z}\right)y'^2 - i\frac{ky}{z}y'} \tag{11.38}$$
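The integrals over $x'$ and $y'$ have the identical form and can be done individually
with the help of the integral formula (0.55). The algebra is cumbersome, but the
integral in the $x'$ dimension becomes
$$\begin{aligned}
\int_{-\infty}^{\infty} dx'\, e^{-\left(\frac{1}{w_0^2} - i\frac{k}{2z}\right)x'^2 - i\frac{kx}{z}x'}
&= \left[\frac{\pi}{\frac{1}{w_0^2} - i\frac{k}{2z}}\right]^{1/2} \exp\left[\frac{\left(-i\frac{kx}{z}\right)^2}{4\left(\frac{1}{w_0^2} - i\frac{k}{2z}\right)}\right] \\
&= \left[\frac{\pi}{-i\frac{k}{2z}\left(1 + i\frac{2z}{kw_0^2}\right)}\right]^{1/2} \exp\left[\frac{-\frac{kx^2}{2z}}{\frac{2z}{kw_0^2} - i}\right] \\
&= \left[\frac{i\lambda z}{\sqrt{1 + \left(\frac{2z}{kw_0^2}\right)^2}\; e^{i\tan^{-1}\frac{2z}{kw_0^2}}}\right]^{1/2} \exp\left[\frac{-\frac{kx^2}{2z}\left(\frac{2z}{kw_0^2} + i\right)}{1 + \left(\frac{2z}{kw_0^2}\right)^2}\right]
\end{aligned} \tag{11.39}$$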
A similar expression results from the integration on y 0 .
When (11.39) and the equivalent expression for the y-dimension are used in
(11.38), the result is
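$$E(x, y, z) = E_0 \frac{e^{ikz} e^{i\frac{k}{2z}(x^2+y^2)}}{\sqrt{1 + \left(\frac{2z}{kw_0^2}\right)^2}} \exp\left[-\frac{\left(x^2+y^2\right)\left(\frac{1}{w_0^2} + i\frac{k}{2z}\right)}{1 + \left(\frac{2z}{kw_0^2}\right)^2}\right] e^{-i\tan^{-1}\frac{2z}{kw_0^2}} \tag{11.40}$$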
This rather complicated-looking expression for the field distribution is in fact very
useful and can be directly interpreted, as discussed in the next section.
Gaussian Field in Cylindrical Coordinates
A Gaussian field profile is one of the few diffraction problems that can be handled
conveniently in either Cartesian (as above) or cylindrical coordinates. In cylindrical
coordinates, the Fresnel diffraction integral (10.28) is
$$E(\rho, z) = -\frac{2\pi i\, e^{ikz} e^{i\frac{k\rho^2}{2z}}}{\lambda z} \int_{0}^{\infty} \rho'\, d\rho' \left[E_0\, e^{-\rho'^2/w_0^2}\right] e^{i\frac{k\rho'^2}{2z}} J_0\left(\frac{k\rho\rho'}{z}\right)$$

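We can use the integral formula (0.59) to obtain
$$\begin{aligned}
E(\rho, z) &= -i E_0 \frac{2\pi\, e^{ikz} e^{i\frac{k\rho^2}{2z}}}{\lambda z}\; \frac{\exp\left[-\frac{\left(k\rho/z\right)^2}{4\left(\frac{1}{w_0^2} - i\frac{k}{2z}\right)}\right]}{2\left(\frac{1}{w_0^2} - i\frac{k}{2z}\right)} \\
&= E_0 \frac{e^{ikz} e^{i\frac{k\rho^2}{2z}}}{\sqrt{1 + \left(\frac{2z}{kw_0^2}\right)^2}} \exp\left[-\frac{\rho^2\left(\frac{1}{w_0^2} + i\frac{k}{2z}\right)}{1 + \left(\frac{2z}{kw_0^2}\right)^2}\right] e^{-i\tan^{-1}\frac{2z}{kw_0^2}}
\end{aligned}$$
which is identical to (11.40).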
11.7 Gaussian Laser Beams

The cumbersome Gaussian-field expression (11.40) can be cleaned up through
the judicious introduction of new quantities:
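$$E(\rho, z) = E_0 \frac{w_0}{w(z)}\, e^{-\frac{\rho^2}{w^2(z)}}\, e^{ikz + i\frac{k\rho^2}{2R(z)}}\, e^{-i\tan^{-1}\frac{z}{z_0}} \tag{11.41}$$
where
$$\rho^2 \equiv x^2 + y^2, \tag{11.42}$$
$$w(z) \equiv w_0\sqrt{1 + z^2/z_0^2}, \tag{11.43}$$
$$R(z) \equiv z + z_0^2/z, \tag{11.44}$$
$$z_0 \equiv \frac{k w_0^2}{2} \tag{11.45}$$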
This formula describes the lowest-order Gaussian mode, the most common laser
beam profile. (Please be aware that some lasers are multimode and exhibit more
complicated structures.)
It turns out that (11.41) works equally well for negative values of z. The
expression can therefore be used to describe the field of a simple laser beam
everywhere (before and after it goes through a focus). In fact, the expression
works also near z = 0!⁴ At z = 0 the diffracted field (11.41) returns the exact
expression for the original field profile (11.35) (see P11.11). In short, (11.41) may
be used with impunity as long as the divergence angle of the beam is not too wide.
⁴ There is good reason for this since the Fresnel diffraction integral is an exact solution to the
paraxial wave equation (10.16). The beam (11.41) therefore satisfies the paraxial wave equation for
all z.
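The claim that (11.41) is just a tidier form of (11.40) can be spot-checked numerically. The sketch below is illustrative only: the function names and the beam parameters (a 0.5 mm waist at the HeNe wavelength) are assumptions, and both expressions are evaluated at a couple of arbitrary points on either side of the waist.

```python
import cmath
import math

def beam_parameters(z, w0, wavelength):
    """Return (w, R, gouy) from (11.43)-(11.45); R requires z != 0."""
    k = 2.0 * math.pi / wavelength
    z0 = k * w0 ** 2 / 2.0
    w = w0 * math.sqrt(1.0 + (z / z0) ** 2)
    R = z + z0 ** 2 / z
    return w, R, math.atan(z / z0)

def field_11_41(rho, z, w0, wavelength, E0=1.0):
    """Gaussian beam written with w(z), R(z), and the Gouy phase."""
    k = 2.0 * math.pi / wavelength
    w, R, gouy = beam_parameters(z, w0, wavelength)
    phase = k * z + k * rho ** 2 / (2.0 * R) - gouy
    return E0 * (w0 / w) * math.exp(-(rho / w) ** 2) * cmath.exp(1j * phase)

def field_11_40(rho, z, w0, wavelength, E0=1.0):
    """The same field written directly in terms of a = 2z / (k w0^2)."""
    k = 2.0 * math.pi / wavelength
    a = 2.0 * z / (k * w0 ** 2)
    exponent = (1j * k * z + 1j * k * rho ** 2 / (2.0 * z)
                - rho ** 2 * (1.0 / w0 ** 2 + 1j * k / (2.0 * z)) / (1.0 + a ** 2)
                - 1j * math.atan(a))
    return E0 * cmath.exp(exponent) / math.sqrt(1.0 + a ** 2)

w0, wavelength = 5e-4, 633e-9      # assumed values, in meters
for z in (0.5, -0.8):              # points after and before the waist
    diff = field_11_41(3e-4, z, w0, wavelength) - field_11_40(3e-4, z, w0, wavelength)
    print(z, abs(diff))
```

The residual difference is limited by floating-point rounding of the large phase $kz$, not by any disagreement between the two formulas.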

To begin our interpretation of (11.41), consider the intensity profile $I \propto E^* E$
as depicted in Fig. 11.14:
$$I(\rho, z) = I_0 \frac{w_0^2}{w^2(z)}\, e^{-\frac{2\rho^2}{w^2(z)}} = \frac{I_0}{1 + z^2/z_0^2}\, e^{-\frac{2\rho^2}{w^2(z)}} \tag{11.46}$$
By inspection, we see that $w(z)$ gives the radius of the beam anywhere along
$z$. At $z = 0$, the beam waist, $w(z = 0)$ reduces to $w_0$, as expected. The parameter
$z_0$, known as the Rayleigh range, specifies the distance along the axis from $z = 0$
to the point where the intensity decreases by a factor of 2. Note that $w_0$ and $z_0$
are not independent of each other but are connected through the wavelength
according to (11.45). There is a tradeoff: a small beam waist means a short depth
of focus. That is, a small $w_0$ means a small Rayleigh range $z_0$.
Figure 11.14 A Gaussian laser field profile in the vicinity of its beam waist.
We next consider the phase terms that appear in the field expression (11.41).
The phase term $ikz + ik\rho^2/2R(z)$ describes the phase of curved wave fronts,
where $R(z)$ is the radius of curvature of the wave front at $z$. At $z = 0$, the radius of
curvature is infinite (see (11.44)), meaning that the wave front is flat at the laser
beam waist. In contrast, at very large values of $z$ we have $R(z) \cong z$ (see (11.44)).
In this case, we may write these phase terms as
$kz + \frac{k\rho^2}{2R(z)} \cong k\sqrt{z^2 + \rho^2}$. This describes a spherical wave front
emanating from the origin out to point $(\rho, z)$. The
Fresnel approximation (same as the paraxial approximation) represents spherical
wave fronts with the former parabolic approximation. As a reminder, to restore
the temporal dependence of the field, we simply append $e^{-i\omega t}$ to the solution, as
discussed in connection with (10.5).
The phase $-i\tan^{-1}(z/z_0)$ is perhaps a bit more mysterious. It is called the Gouy
shift and is actually present for any light that goes through a focus, not just laser
beams. The Gouy shift is not overly dramatic since the expression $\tan^{-1}(z/z_0)$
ranges from $-\pi/2$ (at $z = -\infty$) to $\pi/2$ (at $z = +\infty$). Nevertheless, when light goes
through a focus, it experiences an overall phase shift of $\pi$.
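The statements above about the Rayleigh range and the Gouy shift can be illustrated with a few helper functions. This is a minimal sketch, assuming hypothetical function names and an arbitrary 0.5 mm waist at 633 nm; it evaluates (11.43), (11.45), and the on-axis part of (11.46).

```python
import math

def rayleigh_range(w0, wavelength):
    """z0 = k w0^2 / 2 = pi w0^2 / lambda, from (11.45)."""
    return math.pi * w0 ** 2 / wavelength

def beam_radius(z, w0, z0):
    """w(z) from (11.43)."""
    return w0 * math.sqrt(1.0 + (z / z0) ** 2)

def on_axis_intensity(z, z0, I0=1.0):
    """I(rho = 0, z) from (11.46)."""
    return I0 / (1.0 + (z / z0) ** 2)

def gouy_phase(z, z0):
    """tan^-1(z / z0), the magnitude of the Gouy term in (11.41)."""
    return math.atan(z / z0)

w0, wavelength = 5e-4, 633e-9          # assumed values, in meters
z0 = rayleigh_range(w0, wavelength)

print("Rayleigh range (m):", z0)
print("on-axis intensity at z0:", on_axis_intensity(z0, z0))
print("beam radius at z0 over w0:", beam_radius(z0, w0, z0) / w0)
print("Gouy phase change from -1000 z0 to +1000 z0:",
      gouy_phase(1000 * z0, z0) - gouy_phase(-1000 * z0, z0))
```

Halving `w0` reduces the Rayleigh range by a factor of four, which is the tradeoff between a tight focus and a short depth of focus.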
Example 11.4
Write the beam waist $w_0$ in terms of the f-number, defined to be the ratio of $z$ to
the beam diameter $2w(z)$ far from the beam waist.
Solution: Far away from the beam waist (i.e. $z \gg z_0$) the laser beam expands
along a cone. That is, its diameter increases in proportion to distance.
$$w(z) = w_0\sqrt{1 + z^2/z_0^2} \to w_0 z/z_0$$
The cone angle is parameterized by the f-number, the ratio of the cone height to
its base:
$$f_\# \equiv \lim_{z\to\pm\infty} \frac{z}{2w(z)} = \frac{z}{2w_0 z/z_0} = \frac{z_0}{2w_0}$$
Substitution of (11.45) into this expression yields
$$w_0 = \frac{2\lambda f_\#}{\pi} \tag{11.47}$$
Figure 11.15

Equation (11.47) gives a convenient way to predict the size of a laser focus. One
obtains the f-number by dividing the distance to the focus by the diameter of the
beam at a lens. However, in practice you may be very surprised at how badly
a beam focuses compared to the theoretical prediction (due to aberrations, etc.).
It is always good practice to directly measure your focus if its size is important to
an experiment.
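Equation (11.47) translates directly into a one-line estimate of the focal spot size. The sketch below assumes hypothetical function names and an example of a HeNe beam focused at f/10; as the text cautions, a real focus may be considerably larger than this ideal value.

```python
import math

def waist_from_f_number(wavelength, f_number):
    """Ideal beam waist w0 = 2 lambda f# / pi, from (11.47)."""
    return 2.0 * wavelength * f_number / math.pi

def f_number_from_waist(wavelength, w0):
    """Inverse relation f# = z0 / (2 w0), with z0 = pi w0^2 / lambda."""
    z0 = math.pi * w0 ** 2 / wavelength
    return z0 / (2.0 * w0)

wavelength = 633e-9                    # assumed HeNe wavelength, in meters
w0 = waist_from_f_number(wavelength, 10.0)
print("predicted waist radius (m):", w0)
print("recovered f-number:", f_number_from_waist(wavelength, w0))
```

In the laboratory the f-number would come from a measured distance to the focus and a measured beam diameter at the lens, using the same definition of beam radius as $w(z)$.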
Appendix 11.A ABCD Law for Gaussian Beams

In this section we discuss and justify the ABCD law for Gaussian beams. The
law enables one to predict the parameters of a Gaussian beam that exits from an
optical system, given the parameters of an input Gaussian beam. To make the
prediction, one needs only the ABCD matrix for the optical system, taken as a
whole. The system may be arbitrarily complex with many optical components.
At first, it may seem unlikely that such a prediction should be possible since
ABCD matrices were introduced to describe the propagation of rays. On the other
hand, Gaussian beams are governed by the laws of diffraction. As an example of
this dichotomy, consider a collimated Gaussian beam that traverses a converging
lens. By ray theory, one expects the Gaussian beam to focus near the focal point
of the lens. However, a collimated beam by definition is already in the act of going
through focus. In the absence of the lens, there is a tendency for the beam to grow
via diffraction, especially if the beam waist is small. This tendency competes with
the focusing effect of the lens, and a new beam waist can occur at a wide range of
locations, depending on the exact outcome of this competition.
A Gaussian beam is characterized by its Rayleigh range $z_0$. From this, the
beam waist radius $w_0$ may be extracted via (11.45), assuming the wavelength is
known. Suppose that a Gaussian beam encounters an optical system at position
$z$, referenced to the position of the beam’s waist as shown in Fig. 11.16. The beam
exiting from the system, in general, has a new Rayleigh range $z'_0$. The waist of the
new beam also occurs at a different location. Let $z'$ denote the location of the exit
of the optical system, referenced to the location of the waist of the new beam. If
the exiting beam diverges as in Fig. 11.16, then it emerges from a virtual beam
waist located before the exit point of the system. In this case, $z'$ is taken to be
positive. On the other hand, if the emerging beam converges to an actual waist,
then $z'$ is taken to be negative since the exit point of the system occurs before the
focus.
The ABCD law is embodied in the following relationship:
$$z' + i z'_0 = \frac{A(z + i z_0) + B}{C(z + i z_0) + D} \tag{11.48}$$
where $A$, $B$, $C$, and $D$ are the matrix elements of the optical system. The imaginary
number $i \equiv \sqrt{-1}$ imbues the law with complex arithmetic. It makes two equations
from one, since the real and imaginary parts of (11.48) must separately be equal.

Figure 11.16 Gaussian laser beam traversing an optical system described by an ABCD
matrix. The dark lines represent the incoming and exiting beams. The gray line repre-
sents where the exiting beam appears to have been.
We now prove the ABCD law. We begin by showing that the law holds for
two specific ABCD matrices. First, consider the matrix for propagation through a
distance $d$:
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix} \tag{11.49}$$
We know that simple propagation has minimal effect on a beam. The Rayleigh
range is unchanged, so we expect that the ABCD law should give $z'_0 = z_0$. The
propagation through a distance $d$ modifies the beam position by $z' = z + d$. We
now check that the ABCD law agrees with these results by inserting (11.49) into
(11.48):
$$z' + i z'_0 = \frac{1\,(z + i z_0) + d}{0\,(z + i z_0) + 1} = z + d + i z_0 \quad \text{(propagation through distance } d\text{)} \tag{11.50}$$
Thus, the law holds in this case.
Next we consider the ABCD matrix of a thin lens (or a curved mirror):
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \tag{11.51}$$
A beam that traverses a thin lens undergoes the phase shift $-k\rho^2/2f$, according
to (11.11). This modifies the original phase of the wave front $k\rho^2/2R(z)$, seen in
(11.41). The phase of the exiting beam is therefore
$$\frac{k\rho^2}{2R(z')} = \frac{k\rho^2}{2R(z)} - \frac{k\rho^2}{2f} \tag{11.52}$$
where we do not keep track of unimportant overall phases such as $kz$ or $kz'$. With
(11.44) this relationship reduces to
$$\frac{1}{R(z')} = \frac{1}{R(z)} - \frac{1}{f} \quad \Rightarrow \quad \frac{1}{z' + z_0'^2/z'} = \frac{1}{z + z_0^2/z} - \frac{1}{f} \tag{11.53}$$
In addition to this relationship, the local radius of the beam given by (11.43)
cannot change while traversing the ‘thin’ lens. Therefore,
$$w(z') = w(z) \quad \Rightarrow \quad z'_0\left(1 + \frac{z'^2}{z_0'^2}\right) = z_0\left(1 + \frac{z^2}{z_0^2}\right) \tag{11.54}$$
On the other hand, the ABCD law for the thin lens gives
$$z' + i z'_0 = \frac{1\,(z + i z_0) + 0}{-(1/f)(z + i z_0) + 1} \quad \text{(traversing a thin lens with focal length } f\text{)} \tag{11.55}$$
It is left as an exercise (see P11.14) to show that (11.55) is consistent with (11.53)
and (11.54).
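Before working through the algebra, it can be reassuring to check the two special cases numerically. The following sketch is not a proof and is not taken from the text: the function names are hypothetical, and the beam position, Rayleigh range, propagation distance, and focal length are arbitrary assumed values. It applies (11.48) with the matrices (11.49) and (11.51) and compares against (11.50), (11.53), and (11.54).

```python
def abcd_law(q, A, B, C, D):
    """Apply (11.48) to q = z + i z0 and return z' + i z0'."""
    return (A * q + B) / (C * q + D)

def inverse_curvature(z, z0):
    """1 / R(z) with R(z) = z + z0^2 / z, from (11.44)."""
    return z / (z * z + z0 * z0)

def size_measure(z, z0):
    """z0 (1 + z^2 / z0^2), which is proportional to w(z)^2; see (11.54)."""
    return z0 * (1.0 + (z / z0) ** 2)

z, z0 = 0.3, 1.2          # assumed position and Rayleigh range (meters)
q = complex(z, z0)

# Propagation through an assumed distance d = 0.5 m, matrix (11.49).
q_prop = abcd_law(q, 1.0, 0.5, 0.0, 1.0)
print("after propagation:", q_prop)

# Thin lens with an assumed focal length f = 0.75 m, matrix (11.51).
f = 0.75
q_lens = abcd_law(q, 1.0, 0.0, -1.0 / f, 1.0)
zp, z0p = q_lens.real, q_lens.imag
print("curvature check (11.53):",
      inverse_curvature(zp, z0p), inverse_curvature(z, z0) - 1.0 / f)
print("beam-size check (11.54):", size_measure(zp, z0p), size_measure(z, z0))
```

With these numbers the new position `zp` comes out negative, meaning the beam leaving the lens converges toward a real waist.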
So far we have shown that the ABCD law works for two specific examples,
namely propagation through a distance d and transmission through a thin lens
with focal length $f$. From these elements we can derive more complicated systems.
The ABCD matrix for a thick lens cannot be constructed from just
these two elements. However, we can construct the matrix for a thick lens if we
sandwich a thick window (as opposed to empty space) between two thin lenses.
The proof that the matrix for a thick window obeys the ABCD law is left as an
exercise (see P11.17). With these relatively few elements, essentially any optical
system can be constructed, provided that the beam propagation begins and ends
up in the same index of refraction.
To complete our proof of the general ABCD law, we need only show that when
it is applied to the compound element
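$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} A_2 & B_2 \\ C_2 & D_2 \end{bmatrix} \begin{bmatrix} A_1 & B_1 \\ C_1 & D_1 \end{bmatrix} = \begin{bmatrix} A_2 A_1 + B_2 C_1 & A_2 B_1 + B_2 D_1 \\ C_2 A_1 + D_2 C_1 & C_2 B_1 + D_2 D_1 \end{bmatrix} \tag{11.56}$$
it gives the same answer as when the law is applied sequentially, first on
$$\begin{bmatrix} A_1 & B_1 \\ C_1 & D_1 \end{bmatrix}$$
and then on
$$\begin{bmatrix} A_2 & B_2 \\ C_2 & D_2 \end{bmatrix}$$
Explicitly, we have
$$\begin{aligned}
z'' + i z''_0 &= \frac{A_2\left(z' + i z'_0\right) + B_2}{C_2\left(z' + i z'_0\right) + D_2} \\
&= \frac{A_2\left[\frac{A_1(z + i z_0) + B_1}{C_1(z + i z_0) + D_1}\right] + B_2}{C_2\left[\frac{A_1(z + i z_0) + B_1}{C_1(z + i z_0) + D_1}\right] + D_2} \\
&= \frac{A_2\left[A_1(z + i z_0) + B_1\right] + B_2\left[C_1(z + i z_0) + D_1\right]}{C_2\left[A_1(z + i z_0) + B_1\right] + D_2\left[C_1(z + i z_0) + D_1\right]} \\
&= \frac{\left(A_2 A_1 + B_2 C_1\right)(z + i z_0) + \left(A_2 B_1 + B_2 D_1\right)}{\left(C_2 A_1 + D_2 C_1\right)(z + i z_0) + \left(C_2 B_1 + D_2 D_1\right)} \\
&= \frac{A(z + i z_0) + B}{C(z + i z_0) + D}
\end{aligned} \tag{11.57}$$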
Thus, we can construct any ABCD matrix that we wish from matrices that are
known to obey the ABCD law. The resulting matrix also obeys the ABCD law.
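The composition property shown in (11.57) can also be confirmed numerically for a particular case. This sketch uses hypothetical helper names and arbitrary assumed values: a thin lens of focal length 0.75 m followed by 0.4 m of propagation. Matrices are stored as tuples `(A, B, C, D)`.

```python
def abcd_law(q, A, B, C, D):
    """Apply (11.48) to q = z + i z0 and return z' + i z0'."""
    return (A * q + B) / (C * q + D)

def compose(second, first):
    """Matrix product second * first, written out as in (11.56)."""
    A2, B2, C2, D2 = second
    A1, B1, C1, D1 = first
    return (A2 * A1 + B2 * C1, A2 * B1 + B2 * D1,
            C2 * A1 + D2 * C1, C2 * B1 + D2 * D1)

lens = (1.0, 0.0, -1.0 / 0.75, 1.0)   # assumed thin lens, f = 0.75 m
drift = (1.0, 0.4, 0.0, 1.0)          # assumed propagation, d = 0.4 m

q_in = complex(0.3, 1.2)              # assumed z + i z0 entering the system

# Apply the law element by element: lens first, then propagation.
q_sequential = abcd_law(abcd_law(q_in, *lens), *drift)

# Apply the law once with the matrix of the whole system.
q_combined = abcd_law(q_in, *compose(drift, lens))

print(q_sequential, q_combined)
```

Note the ordering in `compose`: the element the beam meets first sits on the right, matching the subscripts in (11.56).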

Exercises
Exercises for 11.1 Fraunhofer Diffraction Through a Lens
P11.1 Fill in the steps leading to (11.14) from (11.13). Show that the intensity
distribution (11.6) is consistent with (11.14).
L11.2 Set up a collimated ‘plane wave’ in the laboratory using a HeNe laser
(λ = 633 nm) and appropriate lenses.
(a) Choose a rectangular aperture ($\Delta x$ by $\Delta y$) and place it in the plane
wave. Observe the Fraunhofer diffraction on a very far away screen (i.e.,
where $z \gg \frac{k}{2}\left(\text{aperture radius}\right)^2$ is satisfied). Check that the location of
the ‘zeros’ agrees with (10.21).

(b) Place a lens in the beam after the aperture. Use a CCD camera
to observe the Fraunhofer diffraction profile at the focus of the lens.
Check that the location of the ‘zeros’ agrees with (10.21), replacing z
with f .
(c) Repeat parts (a) and (b) using a circular aperture with diameter $\ell$.
Check the position of the first ‘zero’. (video)
Figure 11.17 (setup diagram; labels: laser, far-away mirror, removable mirror,
aperture, filters, CCD camera, screen)
Exercises for 11.2 Resolution of a Telescope
P11.3 On the night of April 18, 1775, a signal was sent from the Old North
Church steeple to Paul Revere, who was 1.8 miles away: “One if by
land, two if by sea.” If in the dark, Paul’s pupils had 4 mm diameters,
what is the minimum possible separation between the two lanterns
that would allow him to correctly interpret the signal? Assume that the
predominant wavelength of the lanterns was 580 nm.
HINT: In the eye, the index of refraction is about 1.33 so the wavelength
is shorter. This leads to a smaller diffraction pattern on the retina.
However, in accordance with Snell’s law, two rays separated by an angle
θ outside of the eye are separated by an angle θ/1.33 inside the
eye. The two rays then hit on the retina closer together. As far as
resolution is concerned, the two effects exactly compensate.

L11.4 Simulate two stars with laser beams (λ = 633 nm). Align them nearly
parallel with a small lateral displacement. Send the beams down a long
corridor until diffraction causes both beams to grow into one another
so that it is no longer apparent that they are from two distinct sources.
Use a lens to image the two sources onto a CCD camera. The camera
should be placed close to the focal plane of the lens. Use a variable iris
near the lens to create different pupil openings.
Figure 11.18 (setup diagram; labels: two lasers, pupil, filter, CCD camera)
Experimentally determine the pupil diameter that just allows you to

resolve the two sources according to the Rayleigh criterion. Check your
measurement against theoretical prediction. (video)
HINT: The angular separation between the two sources is obtained by

dividing propagation distance into the lateral separation of the beams.
Exercises for 11.3 The Array Theorem
P11.5 Find the diffraction pattern created by an array of nine circles, each
with radius $a$, which are centered at the following $(x', y')$ coordinates:
(−b, b), (0, b), (b, b), (−b, 0), (0, 0), (b, 0), (−b, −b), (0, −b), (b, −b) (a is
less than b). Make a plot of the result for the situation where (in some
choice of units) a = 1, b = 5a, and k/d = 1. View the plot at different
“zoom levels” to see the finer detail.
P11.6 (a) A plane wave is incident on a screen of $N^2$ uniformly spaced identical
rectangular apertures of dimension $\Delta x$ by $\Delta y$ (see Fig. 11.19). Their
positions are described by $x_n = h\left(n - \frac{N+1}{2}\right)$ and $y_m = s\left(m - \frac{N+1}{2}\right)$. Find
the far-field (Fraunhofer) pattern of the light transmitted by the grid.
(b) You look at a distant sodium street lamp (somewhat monochromatic)
through a curtain made from a fine mesh fabric with crossed
threads. Make a sketch of what you expect to see (how the lamp will
look to you).
Figure 11.19
HINT: Remember that the lens of your eye causes the Fraunhofer
diffraction of the mesh to appear at the retina.

Exercises for 11.4 Diffraction Grating
P11.7 Consider Fraunhofer diffraction from a grating of N slits having widths

∆x and equal separations h. Make plots (label relevant points and
scaling) of the intensity pattern for N = 1, N = 2, N = 5, and N =
1000 in the case where h = 2∆x, ∆x = 5 µm, and λ = 500 nm. Let the
Fraunhofer diffraction be observed at the focus of a lens with focal
length f = 100 cm. Do you expect I peak to be the same value for all of
these cases?
P11.8 For the case of N = 1000 in P11.7, you wish to position a narrow slit at
the focus of the lens so that it transmits only the first-order diffraction
peak (i.e. at $khx/(2f) = \pm\pi$). (a) How wide should the slit be if it is to
be half the separation between the first intensity zeros to either side of
the peak?
(b) What small change in wavelength (away from λ = 500 nm) will
cause the intensity peak to shift by the width of the slit found in part
(a)?
Exercises for 11.5 Spectrometers
L11.9 (a) Use a HeNe laser to determine the period h of a reflective grating.
(b) Give an estimate of the blaze angle φ on the grating. HINT: Assume
that the blaze angle is optimized for first-order diffraction of the HeNe
laser (for one side) at normal incidence. The blaze angle enables a
mirror-like reflection of the diffracted light on each groove. (video)
Figure 11.20
(c) You have two mirrors of focal length 75 cm and the reflective grating
in the lab. You also have two very narrow adjustable slits and the ability
to ‘tune’ the angle of the grating. Sketch how to use these items to make
a monochromator (scans through one wavelength at a time). If the
beam that hits the grating is 5 cm wide, what do you expect the ultimate
resolving power of the monochromator to be in the wavelength range
of 500 nm? Do not worry about aberration such as astigmatism from
using the mirrors off axis.
Figure 11.21 (monochromator diagram; labels: light in, slit, grating, slit, light out)

L11.10 Study the Jarrell Ash monochromator. Use a tungsten lamp as a source
and observe how the instrument works by taking the entire top off.
Do not breathe or touch when you do this. In the dark, trace the light
inside of the instrument with a white plastic card and observe what
happens when you change the wavelength setting. Place the top back
on when you are done. (video)
(a) Predict the best theoretical resolving power that this instrument can
do assuming 1200 lines per millimeter.
(b) What should the width ∆x of the entrance and exit slits be to obtain
this resolving power? Assume λ = 500 nm.
HINT: Set ∆x to be the distance between the peak and the first zero of
the diffraction pattern at the exit slit for monochromatic light.
Exercises for 11.7 Gaussian Laser Beams
P11.11 (a) Confirm that (11.41) reduces to (11.35) when z = 0.

(b) Take the limit $z \gg z_0$ to find the field far from the laser focus.
P11.12 Use the Fraunhofer integral formula (either (10.20) or (10.29)) to deter-
mine the far-field pattern of a Gaussian laser focus (11.35).
HINT: The answer should agree with P11.11 part (b).
L11.13 Consider the following setup where a diverging laser beam is collimated
using an uncoated lens. A double reflection from both surfaces of the
lens (known as a ghost) comes out in the forward direction, focusing
after a short distance. Use a CCD camera to study this focused beam.
The collimated beam serves as a reference to reveal the phase of the
focused beam through interference. Because the weak ghost beam
concentrates near its focus, the two beams can have similar intensities
for optimal interference effects. (video)
Figure 11.22 [diagram label: Ghost Beam]
Figure 11.23 [diagram labels: Laser, Filter, Pin Hole, Lens, Uncoated Lens, CCD Camera, 150 cm]
The ghost beam $E_1(\rho, z)$ is described by (11.41), where the origin is at
the focus. Let the collimated beam be approximated as a plane wave
$E_2 e^{ikz+i\phi}$, where $\phi$ is the relative phase between the two beams. The
net intensity is then $I_t(\rho, z) \propto \left|E_1(\rho, z) + E_2 e^{ikz+i\phi}\right|^2$ or
$$I_t(\rho, z) = I_2 + I_1(\rho, z) + 2\sqrt{I_2 I_1(\rho, z)}\,\cos\left[\frac{k\rho^2}{2R(z)} - \tan^{-1}\frac{z}{z_0} - \phi\right]$$
where $I_1(\rho, z)$ is given by (11.46). We now have a formula that retains
both $R(z)$ and the Gouy shift $\tan^{-1}(z/z_0)$, which are not present in the
intensity distribution of a single beam (see (11.46)).
(a) Determine the f-number for the ghost beam (see Example 11.4).
Use this measurement to predict a value for $w_0$. HINT: You know that
at the lens, the focusing beam is the same size as the collimated beam.
(b) Measure the actual spot size $w_0$ at the focus. How does it compare
to the prediction?
HINT: Before measuring the spot size, make a subtle adjustment to
the tilt of the lens. This incidentally causes the phase between the two
beams to vary by small amounts, which you can set to $\phi = \pm\pi/2$. Then
at the focus the cosine term vanishes and the two beams don’t interfere
(i.e. the intensities simply add). This is accomplished if the center of
the interference pattern is as dark as possible either far before or far
after the focus.
(c) Observe the effect of the Gouy shift. Since $\tan^{-1}(z/z_0)$ varies over a
range of π, you should see that the ring pattern before versus after the
focus inverts (i.e. the bright rings exchange with the dark ones).
(d) Predict the Rayleigh range $z_0$ and check that the radius of curvature
$R(z) \equiv z + z_0^2/z$ agrees with measurement.
HINT: You should see interference rings similar to those in Fig. ??. The
only phase term that varies with ρ is $k\rho^2/2R(z)$. If you count N fringes
out to a radius ρ, then $k\rho^2/2R(z)$ has varied by 2πN.
Figure 11.24 [panels: z = −3z0, z = −2z0, z = −z0, z = 0, z = +z0, z = +2z0, z = +3z0, z = +4z0]
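The net-intensity formula of L11.13 can be explored numerically before going to the lab. The following is a minimal sketch, not part of the original exercise: it assumes the usual Gaussian profile for $I_1$ (peak value times $(w_0/w)^2 e^{-2\rho^2/w^2}$), and the function names and the sample values of $w_0$ and λ are hypothetical choices made only for illustration.

```python
import math

def beam_parameters(z, w0, wavelength):
    """Return k, z0, w(z), R(z) and the Gouy phase for a Gaussian beam."""
    k = 2.0 * math.pi / wavelength
    z0 = k * w0 ** 2 / 2.0                      # Rayleigh range
    w = w0 * math.sqrt(1.0 + (z / z0) ** 2)     # beam radius
    R = math.inf if z == 0 else z + z0 ** 2 / z  # wavefront radius of curvature
    gouy = math.atan(z / z0)
    return k, z0, w, R, gouy

def net_intensity(rho, z, w0, wavelength, I1_peak, I2, phi):
    """Ghost beam plus plane-wave reference, following the formula in L11.13."""
    k, z0, w, R, gouy = beam_parameters(z, w0, wavelength)
    # Assumed Gaussian intensity profile for the ghost beam alone.
    I1 = I1_peak * (w0 / w) ** 2 * math.exp(-2.0 * rho ** 2 / w ** 2)
    phase = k * rho ** 2 / (2.0 * R) - gouy - phi
    return I2 + I1 + 2.0 * math.sqrt(I2 * I1) * math.cos(phase)

# Hypothetical values: 50 micron waist, HeNe wavelength, phi set to -pi/2.
w0, lam, phi = 50e-6, 633e-9, -math.pi / 2
z0 = math.pi * w0 ** 2 / lam
center_before = net_intensity(0.0, -50 * z0, w0, lam, 1.0, 1.0, phi)
center_focus = net_intensity(0.0, 0.0, w0, lam, 1.0, 1.0, phi)
center_after = net_intensity(0.0, +50 * z0, w0, lam, 1.0, 1.0, phi)
print(center_before, center_focus, center_after)
```

With φ = −π/2 the cross term vanishes at the focus, so the two intensities simply add there, while the center of the pattern falls below the reference level far before the focus and rises above it far after, which is the inversion described in part (c).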
Exercises for 11.A ABCD Law for Gaussian Beams
P11.14 Find the solutions to (11.55) (i.e. find $z'$ and $z_0'$ in terms of $z$ and $z_0$).
Show that the results are in agreement with (11.53) and (11.54).
P11.15 Assuming a collimated beam (i.e. z = 0 and beam waist $w_0$), find the
location $L = -z'$ and size $w_0'$ of the resulting focus when the beam goes
through a thin lens with focal length f.
L11.16 Place a lens in a HeNe laser beam soon after the exit mirror of the cavity.
Characterize the focus of the resulting laser beam, and compare the
results with the expressions derived in P11.15.
P11.17 Prove the ABCD law for a beam propagating through a thick window of
material with matrix
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & d/n \\ 0 & 1 \end{bmatrix}$$

Chapter 12
Interferograms and Holography
In chapter 8, we studied a Michelson interferometer in an idealized sense: 1) The
light entering the instrument was considered to be a plane wave. 2) The
retro-reflecting mirrors were considered to be aligned perpendicular to the beams
impinging on them. 3) All reflective surfaces were taken to be perfectly flat. If
any of these conditions are not met, the beam emerging from the interferometer
is likely to exhibit an interference or fringe pattern. A recorded fringe pattern
(on a CCD or photographic film) is called an interferogram. In section 12.1, we
shall examine typical fringe patterns that can be produced in an interferometer.
Such patterns are very useful for testing the prescription and quality of optical
components.
In this chapter, we will also study the principles of holography. In optical
holography, an interference pattern (or fringe pattern) is recorded and then
later used to diffract light, much like gratings diffract light.1 A recorded fringe
pattern, when used for the purpose of diffracting light, is called a hologram. When
light diffracts from a hologram, it can mimic the light field originally used to
generate the previously recorded fringe pattern. This is true even for very complex
fields generated when light is scattered from arbitrary three-dimensional objects.
When the light field is re-created through diffraction by the fringe pattern, an
observer perceives the presence of the original object. The image looks three-
dimensional since the holographic fringes re-construct the original light pattern
simultaneously for a wide range of viewing angles.
12.1 Interferograms
Consider the Michelson interferometer seen in Fig. 12.1. Suppose that the
beamsplitter divides the fields evenly, so that the overall output intensity is
given by (8.1):
$$I_{\rm det} = 2I_0\left[1 + \cos(\omega\tau)\right] \qquad (12.1)$$
where τ is the roundtrip delay time of one path relative to the other. This equation
is based on the idealized case, where the amplitude and phase of the two beams
are uniform and perfectly aligned to each other following the beamsplitter. The
entire beam ‘blinks’ on and off as the delay τ is varied.
Figure 12.1 Michelson interferometer.
1 In fact, a grating can be considered to be a hologram and holographic techniques are often
employed to produce gratings.
What happens if one of the retro-reflecting mirrors is misaligned by a small
angle θ? The fringe patterns seen in Fig. 12.2 (a)-(c) are the result. By the law
of reflection, the beam returning from the misaligned mirror deviates from the
‘ideal’ path by an angle 2θ. This puts a relative phase term of
$$\phi = kx\sin(2\theta_x) + ky\sin(2\theta_y) \qquad (12.2)$$
on the misaligned beam. Here $\theta_x$ represents the tilt of the mirror in the
x-dimension and $\theta_y$ represents the amount of tilt in the y-dimension.
When the two plane waves join, the resulting intensity pattern is
$$I_{\rm det} = 2I_0\left[1 + \cos(\phi + \omega\tau)\right] \qquad (12.3)$$
Of course, the phase term φ depends on the local position within the beam
through x and y. Regions of uniform phase, called fringes (in this case individual
stripes), have the same intensity. As the delay τ is varied, the fringes seem to
‘move’ across the detector, owing to the fact that the phase varies smoothly across
the beam. The fringes emerge from one edge of the beam and disappear at the
other.
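As a rough illustration of (12.2), the stripe spacing follows from asking how far one must move across the beam for φ to change by 2π, which gives a spacing of λ/sin(2θ) for tilt in a single dimension. The short sketch below evaluates this; the function name and the 0.02° tilt are hypothetical values chosen only for the example.

```python
import math

def tilt_fringe_spacing(wavelength, mirror_tilt_deg):
    """Distance between adjacent fringes for a mirror tilted in one dimension.

    From (12.2) the phase along x is k*x*sin(2*theta); adjacent fringes are
    separated by the distance over which this phase changes by 2*pi.
    """
    two_theta = 2.0 * math.radians(mirror_tilt_deg)
    return wavelength / math.sin(two_theta)

# Hypothetical example: HeNe light and a mirror tilt of 0.02 degrees.
spacing = tilt_fringe_spacing(633e-9, 0.02)
print(f"fringe spacing = {spacing * 1e3:.2f} mm")
```

For these numbers the stripes are a little under a millimeter apart, and the spacing shrinks in inverse proportion to the tilt for small angles.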
Another interesting situation arises when the beams in a Michelson interfer-
ometer are diverging. A fringe pattern of concentric circles will be seen at the
detector when the two beam paths are unequal (see Fig. 12.2 (d)). The radius of
curvature for the beam traveling the longer path is increased by the added path
length d = cτ. Thus, if beam 1 has radius of curvature $R_1$ when returning to the
beam splitter, then beam 2 will have radius $R_2 = R_1 + d$ upon return (assuming flat
mirrors). The relative phase (see phase term in (11.41)) between the two beams is
$$\phi = k\rho^2/2R_1 - k\rho^2/2R_2 \qquad (12.4)$$
and the intensity pattern at the detector is given as before by (12.3).
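To get a feel for the size of the rings, one can solve (12.4) for the radius at which the relative phase has advanced by 2πm from its on-axis value. The sketch below does this under the assumption that both beams share a common axis; the function names and the sample values of $R_1$ and d are hypothetical.

```python
import math

def relative_phase(rho, wavelength, R1, d):
    """Relative phase (12.4) between beams with curvature radii R1 and R1 + d."""
    k = 2.0 * math.pi / wavelength
    return k * rho ** 2 / (2.0 * R1) - k * rho ** 2 / (2.0 * (R1 + d))

def ring_radius(m, wavelength, R1, d):
    """Radius at which the phase (12.4) equals 2*pi*m."""
    curvature_difference = 1.0 / R1 - 1.0 / (R1 + d)
    return math.sqrt(2.0 * m * wavelength / curvature_difference)

# Hypothetical example: R1 = 1 m, extra path d = 2 cm, HeNe wavelength.
lam, R1, d = 633e-9, 1.0, 0.02
radii = [ring_radius(m, lam, R1, d) for m in (1, 2, 3, 4)]
print([f"{r * 1e3:.2f} mm" for r in radii])
```

The radii grow as the square root of m, so the rings crowd together toward the outside of the pattern, as in Fig. 12.2 (d).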
12.2 Testing Optical Components

A Michelson interferometer is ideal for testing the quality of optical surfaces. If
any of the flat surfaces (including the beam splitter) in the interferometer are
distorted, the fringe pattern readily reveals it. Figure 12.3 shows an example of a
fringe pattern when one of the mirrors in the interferometer has an arbitrary
deformity in the surface figure.2 A new fringe stripe occurs for every half wavelength
that the surface varies. (The round trip turns a half wavelength into a whole
wavelength.) This makes it possible to determine the flatness of a surface with
very high precision. Of course, in order to test a given surface in an interferometer,
the quality of all other surfaces in the interferometer must first be ensured.
Figure 12.2 Fringe patterns for a Michelson interferometer: (a) Horizontally
misaligned beams. (b) Vertically misaligned beams. (c) Both vertically and
horizontally misaligned beams. (d) Diverging beam with unequal paths. (e) Diverging
beam with unequal paths and horizontal misalignment.
2 The surface figure is a name for how well a surface contour matches a desired prescription.
A typical industry standard for research-grade optics is to specify the surface
flatness to within one tenth of an optical wavelength (633 nm HeNe laser). This
means that the interferometer should reveal no more than one fifth of a fringe
variation across the substrate surface. The fringe pattern tells the technician how
the surface should continue to be polished in order to achieve the desired surface
flatness. Figure 12.3(a) shows the fringe pattern for a surface with significant
variations in the surface figure.
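The conversion between surface height variation and fringe count used above is simple enough to capture in two lines: because of the round trip, each half wavelength of height change produces one full fringe. The helper names below are hypothetical and the sketch is only a restatement of that rule.

```python
def fringes_from_height(height_variation, wavelength):
    """Number of fringes produced by a given surface height variation."""
    return 2.0 * height_variation / wavelength

def height_from_fringes(fringe_count, wavelength):
    """Surface height variation implied by an observed fringe count."""
    return fringe_count * wavelength / 2.0

wavelength = 633e-9  # HeNe laser
# A surface flat to one tenth of a wavelength shows one fifth of a fringe.
print(fringes_from_height(wavelength / 10.0, wavelength))
```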
When testing a surface, it is not necessary to remove all tilt from the alignment
before the effects of surface variations become apparent in the fringe pattern.
In fact, it can be helpful to observe the distortions as deflections in a normally
regularly striped fringe pattern. Figure 12.3(b) shows fringes from a distorted
surface when some tilt is left in the interferometer alignment. An important
advantage to leaving some tilt in the beam is that one can better tell the sign of
the phase errors. We can see, for example, in the case of tilt that the two major
distortion regions in Fig. 12.3 have opposite phase; we can tell that one region of
the substrate protrudes while the other dishes in. On the other hand, this is not
clear for an interferogram with no tilt.
Figure 12.3 (a) Fringe pattern arising from an arbitrarily distorted mirror in a
perfectly aligned interferometer with plane wave beams. (b) Fringe pattern from the
same mirror as (a) when the mirror is tilted (still plane wave beams). The distortion
due to surface variation is still easily seen.
Other types of optical components (besides flat mirrors) can also be tested
with an interferometer. Figure 12.4 shows how a lens can be tested using a
convex mirror to compensate for the focusing action of the lens. With appropriate
spacing, the lens-mirror combination can act like a flat surface. Distortions in the
lens figure are revealed in the fringe pattern. In this case, the surfaces of the lens
are tested together, and variations in optical path length are observed. In order
to record fringes, say with a CCD camera, it is often convenient to image a larger
beam onto a relatively small active area of the detector. The imaging objective
should be adjusted to produce an image of the test optic on the detector screen.
Of course the diameter of the objective lens needs to accommodate the whole
beam.
Figure 12.4 Twyman-Green setup for testing lenses. [diagram labels: Optic to be tested, Imaging Objective, Camera]

12.3 Generating Holograms

In the late 1940s, Dennis Gabor developed the concept of holography, but it wasn’t
until after the invention of the laser that this field really blossomed. Consider
a coherent monochromatic beam of light that is split in half by a beamsplitter,
similar to that in a Michelson interferometer. Let one beam, called the reference
beam, proceed directly to a recording film, and let the other beam scatter from
an arbitrary object back towards the same film. The two beams interfere at the
recording film. It is best to split the beam initially into unequal intensities such
that the light scattered from the object has an intensity similar to the reference
beam at the film.
The purpose of the film is to record the interference pattern. It is important
that the coherence length of the light be much longer than the difference in
path length starting from the beam splitter and ending at the film. In addition,
during exposure to the film, it is important that the whole setup be stable against
vibrations on the scale of a wavelength, since such vibrations cause the fringes to
wash out. For simplicity, we neglect the vector nature of the electric field, assuming
that the scattering from the object for the most part preserves polarization and that
the angle between the two beams incident on the film is modest (so that the electric
fields of the two beams are close to parallel). To the extent that the light scattered
from the object contains the polarization component orthogonal to that of the
reference beam, it provides a uniform (unwanted) background exposure to the
film on top of which the fringe pattern is recorded.
Figure 12.5 Exposure of holographic film. [diagram labels: Object, Film, Beamsplitter]
In general terms, we may write the electric field arriving at the film as
$$E_{\rm film}(\mathbf{r})\,e^{-i\omega t} = E_{\rm object}(\mathbf{r})\,e^{-i\omega t} + E_{\rm ref}(\mathbf{r})\,e^{-i\omega t} \qquad (12.5)$$
Here, the coordinate r indicates locations on the film surface, which may have
arbitrary shape, but often is a plane. The field E object (r), which is scattered from
the object, is in general very complicated. The field E ref (r) may be equally compli-
cated, but typically it is convenient if it has a simple form such as a plane wave,
since this beam must be re-created later in order to view the hologram.
The intensity of the field (12.5) is given by
$$I_{\rm film}(\mathbf{r}) = \frac{1}{2}c\epsilon_0\left|E_{\rm object}(\mathbf{r}) + E_{\rm ref}(\mathbf{r})\right|^2 = \frac{1}{2}c\epsilon_0\left[\left|E_{\rm object}(\mathbf{r})\right|^2 + \left|E_{\rm ref}(\mathbf{r})\right|^2 + E_{\rm ref}^*(\mathbf{r})E_{\rm object}(\mathbf{r}) + E_{\rm ref}(\mathbf{r})E_{\rm object}^*(\mathbf{r})\right] \qquad (12.6)$$
For typical photographic film, the exposure of the film is proportional to the
intensity of the light hitting it. This is known as the linear response regime. That
is, after the film is developed, the transmittance T of the light through the film is
proportional to the intensity of the light that exposed it ($I_{\rm film}$). However, for low
exposure levels, or for film specifically designed for holography, the transmission
of the light through the film can be proportional to the square of the intensity
of the light that exposes the film. Thus, after the film is exposed to the fringe
pattern and developed, the film acquires a spatially varying transmission function
according to
$$T(\mathbf{r}) \propto I_{\rm film}^2(\mathbf{r}) \qquad (12.7)$$
If at a later point in time light of intensity $I_{\rm incident}$ is directed onto the film, it will
transmit according to $I_{\rm transmitted} = T(\mathbf{r})\,I_{\rm incident}$. In this case, the field, as it emerges
from the other side of the film, will be
$$E_{\rm transmitted}(\mathbf{r}) = t(\mathbf{r})\,E_{\rm incident}(\mathbf{r}) \propto I_{\rm film}(\mathbf{r})\,E_{\rm incident}(\mathbf{r}) \qquad (12.8)$$
where $t(\mathbf{r}) = \sqrt{T(\mathbf{r})}$.

Dennis Gabor (1900–1979, Hungarian) was born in Budapest, fought for Hungary in
World War I as a teenager, and then studied at the Technical University of Budapest
and later at the Technical University of Berlin. In 1927, Gabor completed his doctoral
dissertation on cathode ray tubes and began a long career working on electron-beam
devices such as oscilloscopes, televisions, and electron microscopes. It was in the
context of ‘electron optics’ that he invented the concept of holography, which relied
on the wave nature of electron beams. Gabor did this work while working for a
British company, after fleeing Germany when Hitler came to power. Holography
did not become practical until after the invention of the laser, which provided
a bright coherent light source. (Gabor had attempted to make holograms earlier
using a spectral line from a mercury lamp.) In 1964 the first hologram was
produced. Soon after, holograms became commercially available and were
popularized. Gabor accepted a post as professor of applied physics at the Imperial
College of London from 1958 until he retired in 1967. He was awarded the
Nobel prize in physics in 1971 for the invention of holography.

© 2010 Peatross and Ware
12.4 Holographic Wavefront Reconstruction

To see a holographic image, we re-illuminate film (previously exposed and
developed) with the original reference beam. That is, we send in
$$E_{\rm incident}(\mathbf{r}) = E_{\rm ref}(\mathbf{r}) \qquad (12.9)$$
and view the light that is transmitted. According to (12.6) and (12.8), the
transmitted field is proportional to
$$E_{\rm transmitted}(\mathbf{r}) \propto I_{\rm film}(\mathbf{r})\,E_{\rm ref}(\mathbf{r}) = \left[\left|E_{\rm object}(\mathbf{r})\right|^2 + \left|E_{\rm ref}(\mathbf{r})\right|^2\right]E_{\rm ref}(\mathbf{r}) + \left|E_{\rm ref}(\mathbf{r})\right|^2 E_{\rm object}(\mathbf{r}) + E_{\rm ref}^2(\mathbf{r})\,E_{\rm object}^*(\mathbf{r}) \qquad (12.10)$$
Although (12.10) looks fairly complicated, each of the three terms has a direct
interpretation. The first term is just the reference beam $E_{\rm ref}(\mathbf{r})$ with an amplitude
modified by the transmission through the film. It is the residual undeflected beam,
similar to the zero-order diffraction peak for a transmission grating. The second
term is interpreted as a reconstruction of the light field originally scattered from
the object $E_{\rm object}(\mathbf{r})$. Its amplitude is modified by the intensity of the reference
beam, but if the reference beam is uniform across the film, this hardly matters.
An observer looking into the film sees a wavefront identical to the one produced
by the original object (superimposed with the other fields in (12.10)). Thus,
the observer sees a virtual image at the location of the original object. Since
the wavefront of the original object has genuinely been recreated, the image
looks ‘three-dimensional’, because the observer is free to view from different
perspectives.
Figure 12.6 Holographic reconstruction of wavefront through diffraction from
fringes on film. Compare with Fig. 12.5. [diagram labels: Image, Film, Observer]
The final term in (12.10) is proportional to the complex conjugate of the
original field from the object. It also contains twice the phase of the reference
beam, which we can overlook if the reference beam is uniform on the film. In
this case, the complex conjugate of the object field actually converges to a real
image of the original object. This image is located on the observer’s side of the
film, but it is often of less interest since the image is inside out. An ideal screen for
viewing this real image would be an item shaped identical to the original object,
which of course defeats the purpose of the hologram! To the extent that the film is
not flat or to the extent that the reference beam is not a plane wave, the phase of
$E_{\rm ref}^2(\mathbf{r})$ severely distorts the image. On the other hand, the virtual image previously
described never suffers from this problem.
Example 12.1

Analyze the three field terms in (12.10) for a hologram made from a point object,
as depicted in Fig. 12.7.

Solution: Presumably, the point object is illuminated sufficiently brightly so as to
make the scattered light have an intensity similar to the reference beam at the film.
Let the reference plane wave strike the film at normal incidence. Then the reference
beam field will have constant amplitude and phase across it; call it $E_{\rm ref}$. The field from
the point object can be treated as a spherical wave:
$$E_{\rm object}(\rho) = \frac{E_{\rm ref}L}{\sqrt{L^2+\rho^2}}\,e^{ik\sqrt{L^2+\rho^2}} \quad \text{(point source example)} \qquad (12.11)$$
Here ρ represents the radial distance from the center of the film to some other
point on the film. We have taken the amplitude of the object field to match $E_{\rm ref}$ in
the center of the film.
After the film is exposed, developed, and re-illuminated by the reference beam, the
field emerging from the right-hand-side of the film, according to (12.10), becomes
$$E_{\rm transmitted}(\rho) \propto \left[\frac{E_{\rm ref}^2L^2}{L^2+\rho^2} + E_{\rm ref}^2\right]E_{\rm ref} + E_{\rm ref}^2\,\frac{E_{\rm ref}L}{\sqrt{L^2+\rho^2}}\,e^{ik\sqrt{L^2+\rho^2}} + E_{\rm ref}^2\,\frac{E_{\rm ref}L}{\sqrt{L^2+\rho^2}}\,e^{-ik\sqrt{L^2+\rho^2}} \qquad (12.12)$$
We see the three distinct waves that emerge from the holographic film. The first
term in (12.12) represents the plane wave reference beam passing straight through
the film with some variation in amplitude (depicted in Fig. 12.8 (a)). The second
term in (12.12) has the identical form as the field from the original object (aside
from an overall amplitude factor). It describes an outward-expanding spherical
wave, which gives rise to a virtual image at the location of the original point object,
as depicted in Fig. 12.8 (b). The final term in (12.12) corresponds to a converging
spherical wave, which focuses to a point at a distance L from the observer’s side of
the screen (depicted in Fig. 12.8 (c)).
Figure 12.7 Exposure to holographic film by a point source and a reference plane
wave. The holographic fringe pattern for a point object and a plane wave reference
beam exposing a flat film is shown on the right. [diagram labels: Reference Beam, Film, Point Object]
Figure 12.8 Reference beam incident on previously exposed holographic film.
(a) Part of the beam goes through undeflected. (b) Part of the beam takes on the
field profile of the original object. (c) Part of the beam converges to a
real image of the original object. [diagram labels: Reference beam, Film, Undeflected,
Virtual image, Field associated with virtual image, Real image, Field associated with real image]
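The bookkeeping behind (12.10) and (12.12) can be checked numerically. The sketch below, which is only an illustration and not part of the original example, builds the point-source field (12.11), multiplies the recorded intensity by the reference field, and compares the result with the sum of the three terms. The constant factor ½cε₀ is dropped, and the function names and the values of L, λ, and $E_{\rm ref}$ are hypothetical.

```python
import cmath
import math

def object_field(rho, L, k, E_ref):
    """Spherical wave (12.11) from a point a distance L from the film center."""
    r = math.hypot(L, rho)
    return E_ref * L / r * cmath.exp(1j * k * r)

def transmitted_terms(E_obj, E_ref):
    """The three terms of (12.10), up to a common proportionality constant."""
    undeflected = (abs(E_obj) ** 2 + abs(E_ref) ** 2) * E_ref
    virtual_image = abs(E_ref) ** 2 * E_obj
    real_image = E_ref ** 2 * E_obj.conjugate()
    return undeflected, virtual_image, real_image

def decomposition_error(rho, L=0.3, wavelength=633e-9, E_ref=1.0):
    """|I_film * E_ref - (sum of the three terms)| at radius rho on the film."""
    k = 2.0 * math.pi / wavelength
    E_obj = object_field(rho, L, k, E_ref)
    direct = abs(E_obj + E_ref) ** 2 * E_ref
    return abs(direct - sum(transmitted_terms(E_obj, E_ref)))

for rho in (0.0, 1e-3, 5e-3):
    print(rho, decomposition_error(rho))
```

The printed differences should be at the level of floating-point rounding, confirming that the virtual-image and real-image terms are exactly the two cross terms of the recorded intensity multiplied by the reference field.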

Exercises
Exercises for 12.1 Interferograms
P12.1 An ideal Michelson interferometer that uses flat mirrors is perfectly
aligned to a wide collimated laser beam. Suppose that one of the mirrors
is then misaligned by 0.1°. What is the spacing between adjacent
fringes on the screen if the wavelength is λ = 633 nm? What would
happen if instead of tilting one of the mirrors the angle of the input
beam (before the beamsplitter) changed by 0.1°?
P12.2 An ideal Michelson interferometer uses flat mirrors perfectly aligned

to an expanding beam that diverges from a point 50 cm before the
beamsplitter. Suppose that one mirror is 10 cm away from the beam
splitter, and the other is 11 cm. Suppose also that the center of the
resulting bull’s-eye fringe pattern is dark. If a screen is positioned 10 cm
after the beam splitter, what is the radial distance to the next dark fringe
on the screen if the wavelength is λ = 633 nm?
Exercises for 12.2 Testing Optical Components
L12.3 Set up an interferometer and observe distortions to a mirror substrate
when the setscrew is overtightened.
Exercises for 12.3 Generating Holograms
P12.4 Consider a diffraction grating as a simple hologram. Let the light from
the “object” be a plane wave (object placed at infinity) directed onto
a flat film at angle θ. Let the reference beam strike the film at normal
incidence, and take the wavelength to be λ.
(a) What is the period of the fringes?
(b) Show that when re-illuminated by the reference beam, the three
terms in (12.10) give rise to zero-order and 1st-order diffraction (occur-
ring on each side of zero-order).
P12.5 Consider the holographic pattern produced by the point object
described in section 12.4.
(a) Show that the phase of the real image in (12.12) may be approximated
as $\Delta\phi = -k\rho^2/2L$, aside from a spatially independent overall
phase. Compare with (11.10) and comment.
(b) This hologram is similar to a Fresnel zone plate, used to focus
extreme ultraviolet light or x-rays, for which it is difficult to make a lens.
Graph the field transmission for the hologram as a function of ρ and
superimpose a similar graph for a “best-fit” mask that has regions of
either 100% or 0% transmission. Use λ = 633 nm and $L = (5 \times 10^5 - \tfrac{1}{4})\lambda$
(this places the point source about 32 cm before the screen). See
Fig. 12.9.
L12.6 Make a hologram.
Figure 12.9 Field transmission for a point-source hologram (upper) and a Fresnel
zone plate (middle), and a plot of both as a function of radius (bottom).
[plot legend: Zone Plate Transmittance, Hologram Transmittance]

R48 T or F: The eikonal equation and Fermat’s principle depend on the

assumption that the wavelength is relatively small compared to features
of interest.

assumption that the index of refraction varies only gradually.

assumption that the angles involved must not be too big.

assumption that the polarization is important to the problem.
R52 T or F: Spherical aberration can be important even when the paraxial

approximation works well.
R53 T or F: Chromatic aberration (the fact that refractive index depends on

frequency) is an example of the violation of the paraxial approximation.
R54 T or F: The Fresnel approximation falls within the paraxial approxima-

tion.
R55 T or F: The imaging relation 1/ f = 1/d o + 1/d i relies on the paraxial ray
approximation.
R56 T or F: The spherical waves given by e i kR /R are exact solutions to

Maxwell’s equations.
R57 T or F: Spherical waves can be used to understand diffraction from

apertures that are relatively large compared to λ.
R58 T or F: Fresnel was the first to conceive of spherical waves.
R59 T or F: Spherical waves were accepted by Poisson immediately without

experimental proof.
R60 T or F: The array theorem is useful for deriving the Fresnel diffraction
from a grating.
R61 T or F: A diffraction grating with a period h smaller than a wavelength

is ideal for making a spectrometer.
R62 T or F: The blaze on a reflection grating can improve the amount of

energy in a desired order of diffraction.
R63 T or F: The resolving power of a spectrometer used in a particular

diffraction order depends only on the number of lines illuminated (not
wavelength or grating period).
R64 T or F: The central peak of the Fraunhofer diffraction from two nar-
row slits separated by spacing h has the same width as the central
diffraction peak from a single slit with width ∆x = h.
R65 T or F: The central peak of the Fraunhofer diffraction from a circular
aperture of diameter ℓ has the same width as the central diffraction
peak from a single slit with width ∆x = ℓ.
R66 T or F: The Fraunhofer diffraction pattern appearing at the focus of a

lens varies in angular width, depending on the focal length of the lens
used.
R67 T or F: Fraunhofer diffraction can be viewed as a spatial Fourier trans-

form (or inverse transform if you prefer) on the field at the aperture.
Problems
R68 (a) Derive Snell’s law using Fermat’s principle.

(b) Derive the law of reflection using Fermat’s principle.
R69 (a) Consider a ray of light emitted from an object, which travels a
distance $d_o$ before traversing a lens of focal length f and then traveling
a distance $d_i$.
Write a vector equation relating $\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix}$ to $\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix}$. Be sure to simplify
the equation so that only one ABCD matrix is involved.
HINT: $\begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix}$
Figure 12.10 [diagram labels: object, image]
(b) Explain the requirement on the ABCD matrix in part (a) that ensures
that an image appears for the distances chosen. From this requirement,
extract a familiar constraint on $d_o$ and $d_i$. Also, make a reasonable
definition for magnification M in terms of $y_1$ and $y_2$, then substitute to
find M in terms of $d_o$ and $d_i$.

(c) A telescope is formed with two thin lenses separated by the sum of
their focal lengths $f_1$ and $f_2$. Rays from a given far-away point all strike
the first lens with essentially the same angle $\theta_1$. Angular magnification
$M_\theta$ quantifies the telescope’s purpose of enlarging the apparent angle
between points in the field of view.
Give a sensible definition for angular magnification in terms of $\theta_1$ and
$\theta_2$. Use ABCD-matrix formulation to derive the angular magnification
of the telescope in terms of $f_1$ and $f_2$.
Figure 12.11
R70 (a) Show that a system represented by a matrix $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$ (beginning
and ending in the same index of refraction) can be made to look like
the matrix for a thin lens if the beginning and ending positions along
the z-axis are referenced from two principal planes, located distances
$p_1$ and $p_2$ before and after the system.
HINT: $\begin{vmatrix} A & B \\ C & D \end{vmatrix} = 1$.
(b) Where are the principal planes located and what is the effective
focal length for two identical thin lenses with focal lengths f that are
separated by a distance d = f (see Fig. 12.12)?
R71 Derive the on-axis intensity (i.e. x, y = 0) of a Gaussian laser beam if
you know that at z = 0 the electric field of the beam is
$$E(\rho', z = 0) = E_0\,e^{-\rho'^2/w_0^2}$$
Fresnel:
$$E(x, y, d) \cong -\frac{i\,e^{ikd}\,e^{i\frac{k}{2d}(x^2+y^2)}}{\lambda d}\iint E(x', y', 0)\,e^{i\frac{k}{2d}(x'^2+y'^2)}\,e^{-i\frac{k}{d}(xx'+yy')}\,dx'\,dy'$$
$$\int_{-\infty}^{\infty} e^{-Ax^2+Bx+C}\,dx = \sqrt{\frac{\pi}{A}}\;e^{\frac{B^2}{4A}+C}$$
Figure 12.12
R72 (a) You decide to construct a simple laser cavity with a flat mirror and
another mirror with concave curvature of R = 100 cm. What is the
longest possible stable cavity that you can make?
HINT: Sylvester’s theorem is
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^N = \frac{1}{\sin\theta}\begin{bmatrix} A\sin N\theta - \sin(N-1)\theta & B\sin N\theta \\ C\sin N\theta & D\sin N\theta - \sin(N-1)\theta \end{bmatrix}$$
where $\cos\theta = \frac{1}{2}(A + D)$.
(b) The amplifier is YLF crystal, which lases at λ = 1054 nm. You decide
to make the cavity 10 cm shorter than the longest possible (i.e. found in
part (a)). What is the value of $w_0$, and where is the beam waist located
inside the cavity (the place we assign to z = 0)?
HINT: One can interpret the parameter R(z) as the radius of curvature
of the wave front. For a mode to exist in a laser cavity, the radius of
curvature of each of the end mirrors must match the radius of curvature
of the beam at that location.
$$E(\rho, z) = E_0\,\frac{w_0}{w(z)}\,e^{-\frac{\rho^2}{w^2(z)}}\,e^{ikz + i\frac{k\rho^2}{2R(z)}}\,e^{-i\tan^{-1}\frac{z}{z_0}}$$
$$\rho^2 \equiv x^2 + y^2$$
$$w(z) \equiv w_0\sqrt{1 + z^2/z_0^2}$$
$$R(z) \equiv z + z_0^2/z$$
$$z_0 \equiv \frac{kw_0^2}{2}$$
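The Gaussian-beam definitions listed in the hint translate directly into a few helper functions. The sketch below is one hedged way to organize them for a cavity with one flat and one concave mirror; it assumes the waist sits on the flat mirror (where the wave front is flat) and that R(z) matches the curved mirror at the other end. The function names are hypothetical. For the numbers in R72(b) it gives a waist of roughly 0.32 mm, consistent with the value listed under Selected Answers.

```python
import math

def rayleigh_range_flat_curved(cavity_length, mirror_radius):
    """z0 for a flat/concave cavity, from R(L) = L + z0**2 / L = mirror_radius."""
    z0_squared = cavity_length * (mirror_radius - cavity_length)
    if z0_squared <= 0:
        raise ValueError("no stable mode: need 0 < cavity_length < mirror_radius")
    return math.sqrt(z0_squared)

def waist_radius(z0, wavelength):
    """w0 from the definition z0 = k * w0**2 / 2 with k = 2*pi/wavelength."""
    return math.sqrt(wavelength * z0 / math.pi)

def beam_radius(z, w0, z0):
    """w(z) = w0 * sqrt(1 + z**2 / z0**2)."""
    return w0 * math.sqrt(1.0 + (z / z0) ** 2)

def wavefront_radius(z, z0):
    """R(z) = z + z0**2 / z (valid for z != 0)."""
    return z + z0 ** 2 / z

# Numbers from R72(b): R = 100 cm mirror, cavity 10 cm shorter, YLF at 1054 nm.
z0 = rayleigh_range_flat_curved(0.90, 1.00)
w0 = waist_radius(z0, 1054e-9)
print(f"z0 = {z0:.3f} m, w0 = {w0 * 1e3:.2f} mm")
```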
R73 (a) Compute the Fraunhofer diffraction intensity pattern for a uni-
formly illuminated circular aperture with diameter ℓ.
HINT:
$$E(x, y, d) \cong -\frac{i\,e^{ikd}\,e^{i\frac{k}{2d}(x^2+y^2)}}{\lambda d}\iint E(x', y', 0)\,e^{-i\frac{k}{d}(xx'+yy')}\,dx'\,dy'$$
$$J_0(\alpha) = \frac{1}{2\pi}\int_0^{2\pi} e^{\pm i\alpha\cos(\theta-\theta')}\,d\theta'$$
$$\int_0^a J_0(bx)\,x\,dx = \frac{a}{b}J_1(ab)$$
$$J_1(1.22\pi) = 0$$
$$\lim_{x\to 0}\frac{2J_1(x)}{x} = 1$$
(b) The first lens of a telescope has a diameter of 30 cm, which is the
only place where light is clipped. You wish to use the telescope to
examine two stars in a binary system. The stars are approximately 25
light-years away. How far apart need the stars be (in the perpendicular
sense) for you to distinguish them in the visible range of λ = 500 nm?
Compare with the radius of Earth’s orbit, $1.5 \times 10^8$ km.
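The arithmetic in part (b) can be sketched as follows, assuming the usual Rayleigh criterion: two point sources are just distinguishable when one sits on the first zero of the other's Airy pattern, which the hint $J_1(1.22\pi) = 0$ places at an angle of about 1.22λ/ℓ. The function names and the light-year conversion constant are supplied here for illustration and are not from the text.

```python
LIGHT_YEAR_KM = 9.4607e12  # kilometers in one light-year

def rayleigh_angle(wavelength, aperture_diameter):
    """Angular radius (radians) of the first Airy zero for a circular aperture."""
    return 1.22 * wavelength / aperture_diameter

def minimum_separation(distance, wavelength, aperture_diameter):
    """Smallest resolvable transverse separation at the given distance."""
    return distance * rayleigh_angle(wavelength, aperture_diameter)

# 30 cm aperture, 500 nm light, stars 25 light-years away.
separation_km = minimum_separation(25 * LIGHT_YEAR_KM, 500e-9, 0.30)
print(f"separation = {separation_km:.2e} km")
print(f"ratio to Earth's orbital radius = {separation_km / 1.5e8:.1f}")
```

The result is a few times the radius of Earth’s orbit, in line with the figure quoted under Selected Answers.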
R74 (a) Derive the Fraunhofer diffraction pattern for the field from a uni-
formly illuminated single slit of width ∆x. (Don’t worry about the
y-dimension.)
(b) Find the Fraunhofer intensity pattern for a grating of N slits of width
∆x positioned on the mask at $x'_n = h\left(n - \frac{N+1}{2}\right)$ so that the spacing
between all slits is h.
HINT: The array theorem says that the diffraction pattern is $\sum_{n=1}^{N} e^{-i\frac{k}{d}xx'_n}$
times the diffraction pattern of a single slit. You will need
$$\sum_{n=1}^{N} r^n = r\,\frac{r^N - 1}{r - 1}$$
(c) Consider Fraunhofer diffraction from the grating in part (b). The
grating is 5.0 cm wide and is uniformly illuminated. For best resolution
in a monochromator with a 50 cm focal length, what should the width
of the exit slit be? Assume a wavelength of λ = 500 nm.
R75 (a) A monochromatic plane wave with intensity $I_0$ and wavelength λ
is incident on a circular aperture of diameter ℓ followed by a lens of
focal length f. Write the intensity distribution at a distance f behind
the lens.
Figure 12.13
(b) You wish to spatially filter the beam such that, when it emerges from
the focus, it varies smoothly without diffraction rings or hard edges. A
pinhole is placed at the focus, which transmits only the central portion
of the Airy pattern (inside of the first zero). Calculate the intensity
pattern at a distance f after the pinhole using the approximation given
in the hint below.
HINT: A reasonably good approximation of the transmitted field is
that of a Gaussian $E(\rho, 0) = E_f\,e^{-\rho^2/w_0^2}$, where $E_f$ is the magnitude of
the field at the center of the focus found in part (a), and the width
is $w_0 = 2\lambda f^\#/\pi$ and $f^\# \equiv f/\ell$. The figure below shows how well the
Gaussian approximation fits the actual curve. We have assumed that
the first aperture is a distance f before the lens so that at the focus after
the lens the wave front is flat at the pinhole. To avoid integration, you
may want to use the result of P11.12 or P11.11(b) to get the Fraunhofer
limit of the Gaussian profile. (See figure below.)
Figure 12.14
Selected Answers
R72: (a) 100 cm (b) 0.32 mm.
R73: (b) $4.8 \times 10^8$ km.
R74: (c) 5 µm.

Chapter 13
Blackbody Radiation
Hot objects glow. In 1860, Kirchhoff proposed that the radiation emitted by hot
objects as a function of frequency is approximately the same for all materials.1
The notion that all materials behave similarly led to the concept of an ideal
blackbody radiator. Most materials have a certain shininess that causes light to
reflect or scatter in addition to being absorbed and reemitted. However, light
that falls upon an ideal blackbody is absorbed perfectly before the possibility of
reemission, hence the name blackbody.
The distribution of frequencies emitted by a blackbody radiator is related to its
temperature. The key concept of a blackbody radiator is that the light surrounding
it is in thermal equilibrium with the radiator. If some of the light escapes to the
environment, the object inevitably must cool as it continually moves towards a
new thermal equilibrium.
The Sun is a good example of a blackbody radiator. The light emitted from the
Sun is associated with its surface temperature. Any light that arrives to the Sun
from outer space is virtually 100% absorbed, however little light that might be.
Mostly, light escapes to the much colder surrounding space, and the temperature
of the Sun’s surface is maintained by the fusion process within.
Experimentally, a near perfect blackbody radiator can be constructed from
a hollow object. An example is shown in Fig. ??. As the interior of the object is
heated, the light present inside the internal cavity is in equilibrium with the
glowing walls. A small hole can be drilled through the wall into the interior to observe
the radiation there without significantly disturbing the system. The observation
hole can be thought of as a perfect blackbody since any light entering the hole
from the outside is eventually absorbed (before being potentially reemitted), if not
on the first bounce then on subsequent bounces inside the cavity. In this case, the
walls of the cavity and light field are in thermal equilibrium. As another example,
a glowing tungsten filament in an ordinary light bulb makes a reasonably good
blackbody radiator. However, if not formed into a cavity, one must take surface
reflections into account because the emissivity is less than unity.

Gustav Kirchhoff (1824–1887, German)
was born in Königsberg, the son of a lawyer. Kirchhoff attended the University
of Königsberg. While still a student, he developed what are now called Kirchhoff’s
laws for electrical circuits. During his career, Kirchhoff was a professor in
Breslau, Heidelberg, and finally Berlin. Kirchhoff was one of the first to study
the spectra emitted by various objects when heated. Not coincidentally, his
colleague in Heidelberg was Robert Bunsen, inventor of the Bunsen burner.
Kirchhoff coined the term ‘blackbody’ radiation. He demonstrated that an excited
gas gives off a discrete spectrum, and that an unexcited gas surrounding
a blackbody emitter produces dark lines in the blackbody spectrum. Together
Kirchhoff and Bunsen discovered caesium and rubidium. Later in his career,
Kirchhoff showed how to derive Fresnel’s diffraction formula starting from
the wave equation.
1 An important exception is atomic vapors, which have relatively few discrete spectral lines.
However, Kirchhoff’s assumption holds quite well for most solids, which are sufficiently complex.
In this chapter, we develop a theoretical understanding of blackbody radiation

and provide some historical perspective. The explanation given by Max Planck
in 1900 marks the birth of quantum mechanics. He postulated the existence of
electromagnetic quanta, which we now call photons. Einstein used Planck’s ideas
to explain the photoelectric effect and to develop the concept of stimulated and
spontaneous emission. Because of his analysis, Einstein can be thought of as the
father of light amplification by stimulated emission of radiation (LASER).
13.1 Stefan-Boltzmann Law

One of the earliest properties deduced about blackbody radiation is known as
the Stefan-Boltzmann law, derived from thermodynamic ideas in 1879, before
blackbody radiation was well understood. This early (somewhat cumbersome)
derivation is provided in appendix 13.A. (It is less effort to obtain the Stefan-
Boltzmann law using the Planck radiation formula as a starting point (see P13.3).)
Figure 13.1 Blackbody radiator. Thermal light emerges from the small hole in the end.
The Stefan-Boltzmann law says that the intensity I (including all frequencies)
that flows outward from the surface of an object is given by
$$I = e\sigma T^4,$$ (13.1)
where σ is called the Stefan-Boltzmann constant and T is the absolute temperature
(in Kelvin) of the surface. The value of the Stefan-Boltzmann constant is
$\sigma = 5.6696 \times 10^{-8}$ W/m$^2$·K$^4$. The dimensionless parameter e called the emissivity
is equal to one for an ideal blackbody surface. However, it takes on smaller values
for actual materials because of surface reflections. For example, the emissivity of
tungsten is approximately e = 0.4. Surface reflections make it harder for a material
to emit light as well as to absorb light. One can construct an ideal blackbody
radiator from a material with e < 1 by creating an enclosure, or cavity, as depicted
in Fig. 13.2. A small hole drilled into the wall of the cavity behaves to the outside
world like an ideal blackbody surface. From the perspective of the outside world,
the hole ‘surface’ has emissivity e = 1. Light within the cavity recirculates to the
extent that it avoids absorption (i.e. due to reflection) when it encounters walls.
The intensity automatically reaches that of an ideal blackbody radiator.
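As a rough numerical illustration of (13.1), the following minimal sketch evaluates the radiated intensity for an ideal blackbody and for bare tungsten. The function name `radiated_intensity` and the filament temperature of 2800 K are assumptions chosen for illustration; only σ and the tungsten emissivity e ≈ 0.4 come from the text.

```python
SIGMA = 5.6696e-8  # Stefan-Boltzmann constant quoted in the text, W/(m^2 K^4)


def radiated_intensity(temperature_k, emissivity=1.0):
    """Intensity I = e * sigma * T^4 leaving a surface, in W/m^2."""
    if temperature_k < 0:
        raise ValueError("absolute temperature cannot be negative")
    return emissivity * SIGMA * temperature_k ** 4


# Assumed illustrative filament temperature (not a value from the text).
T_FILAMENT = 2800.0

ideal = radiated_intensity(T_FILAMENT)          # cavity hole, e = 1
tungsten = radiated_intensity(T_FILAMENT, 0.4)  # bare tungsten, e ~ 0.4

print(f"ideal blackbody: {ideal:.3e} W/m^2")
print(f"bare tungsten:   {tungsten:.3e} W/m^2")
```

Because of the fourth-power dependence, doubling the temperature raises the intensity by a factor of 16, while the emissivity only rescales the result.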
It is sometimes useful to express intensity in terms of the energy density of
the light field $u_{\text{field}}$ (given by (2.53) in units of energy per volume). The connection
between the intensity emerging from the observation hole in the wall of a
blackbody cavity and the energy density of the thermal light within the cavity is
$$I = \frac{c\,u_{\text{field}}}{4} \quad\Rightarrow\quad u_{\text{field}} = \frac{4\sigma T^4}{c}$$ (13.2)
Figure 13.2 Blackbody radiator constructed as a cavity with a small hole to sample the
internal light.
Within the enclosed cavity, light travels at speed c isotropically in all directions.
A factor of 1/2 arises because only half of the energy within the cavity travels
towards the hole rather than away from it. The remaining factor of 1/2 occurs
because the light emerging from the hole is directionally distributed over a
hemisphere as opposed to flowing in the direction of the surface normal n̂. The
average over the hemisphere is carried out as follows:
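$$\frac{\int_0^{2\pi} d\phi \int_0^{\pi/2} \hat{\mathbf{r}}\cdot\hat{\mathbf{n}}\,\sin\theta\, d\theta}{\int_0^{2\pi} d\phi \int_0^{\pi/2} \sin\theta\, d\theta} = \frac{\int_0^{2\pi} d\phi \int_0^{\pi/2} \cos\theta\,\sin\theta\, d\theta}{\int_0^{2\pi} d\phi \int_0^{\pi/2} \sin\theta\, d\theta} = \frac{1}{2}$$ (13.3)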
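The hemisphere average in (13.3) and the energy density in (13.2) can be checked numerically. The sketch below is only illustrative: the function names, the midpoint-rule integration, and the value used for c are assumptions, not part of the text.

```python
import math


def hemisphere_average_cos(steps=100_000):
    """Midpoint-rule estimate of the average of cos(theta) over a hemisphere."""
    d_theta = (math.pi / 2) / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        # Solid-angle weight; the phi integral cancels between top and bottom.
        weight = math.sin(theta) * d_theta
        numerator += math.cos(theta) * weight
        denominator += weight
    return numerator / denominator


def energy_density(temperature_k, sigma=5.6696e-8, c=2.998e8):
    """u_field = 4 sigma T^4 / c from (13.2), in J/m^3 (c is an assumed value)."""
    return 4.0 * sigma * temperature_k ** 4 / c


avg = hemisphere_average_cos()
print(f"hemisphere average of cos(theta): {avg:.6f}")
print(f"u_field at 5750 K: {energy_density(5750.0):.3e} J/m^3")
```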
Although (13.1) describes the total intensity of the light that leaves a blackbody
surface, it does not describe what frequencies make up the radiation field. This
frequency distribution was not fully described for another two decades, when
Max Planck developed his famous formula. Planck was first to arrive at the cor-
rect formula for the spectrum of blackbody radiation. At first, he developed the
formula to match available experimental data. When he attempted to explain it,
he was forced to introduce the concept of light quanta. Even Planck was uncomfortable
with, and perhaps disbelieved, the assumptions that his formula implied,
but he deserves credit for recognizing and articulating them.
13.2 Failure of the Equipartition Principle

In 1900, Lord Rayleigh attempted to explain the blackbody spectral distribution
(intensity per frequency) as a function of temperature by applying the equipar-
tition theorem to the problem. James Jeans gave a more complete derivation in
1905, which included an overall proportionality constant. They were hopelessly
behind, since Planck nailed the answer in 1900, but their failed approach is useful
pedagogically, and for that reason it gets more attention than it deserves. We
also will examine the Rayleigh-Jeans approach to illustrate the shortcomings of
classical concepts, which will help us better appreciate the quantum ideas. As we
will see later, the Rayleigh-Jeans approach actually gets the right answer in the
long-wavelength limit. In fairness to Rayleigh and Jeans, they represented their
formula as being useful only for long wavelengths.
The thermodynamic law of equipartition implies that the energy in a system
on the average is distributed equally among all degrees of freedom in the system.
For example, a system composed of oscillators (say, electrons attached to ‘springs’
representing the response of the material on the walls of a blackbody radiator)
has an energy of $k_B T/2$ for each degree of freedom, where $k_B = 1.38 \times 10^{-23}$ J/K is
Boltzmann’s constant. Rayleigh and Jeans supposed that each unique mode of the
electromagnetic field should carry energy k B T just as each mechanical spring in
thermal equilibrium carries energy k B T (k B T /2 as kinetic and k B T /2 as potential
energy). The problem then reduces to that of finding the number of unique modes
for the radiation at each frequency. They anticipated that requiring each mode of
electromagnetic energy to hold energy k B T should reveal the spectral shape of
blackbody radiation.

Number of Modes in an Electromagnetic Field

Each frequency is associated with a specific wave number $k = \sqrt{k_x^2 + k_y^2 + k_z^2}$.
Notice that there are many ways (i.e. combinations of k x , k y , and k z ) to come up
with the same wave number k = 2πν/c (corresponding to a single frequency ν).
To count these ways properly, we can let our experience with Fourier series guide
us. Consider a box with each side of length L. The Fourier theorem (0.33) states
that the total field inside the box (no matter how complicated the distribution)
can always be represented as a superposition of sine (and cosine) waves. The total
field in the box can therefore be written as
$$\mathrm{Re}\left\{ \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \sum_{\ell=-\infty}^{\infty} E_{n,m,\ell}\; e^{i(n k_0 x + m k_0 y + \ell k_0 z)} \right\}$$ (13.4)
where each component of the wave number in any of the three dimensions is an
integer times
k 0 = 2π/L (13.5)
We note that considering a box of size L does not artificially restrict our analysis,
since we may later take the limit L → ∞ so that our box represents the entire
universe. In fact, L naturally disappears from our calculation as we consider the
density of modes.
We can think of a given wave number k as specifying the equation of a sphere in a
coordinate system with axes labeled n, m, and ℓ:
$$n^2 + m^2 + \ell^2 = \left(\frac{k}{k_0}\right)^2$$ (13.6)
We need to know how many more ways there are to choose n, m, and ℓ when the
wave number $k/k_0$ increases to $(k + dk)/k_0$. The answer is the difference in the
volume of the two spheres as shown in Fig. 13.3:
$$\#\ \text{modes in}\ (k, k+dk) = 4\pi \frac{k^2}{k_0^2}\left(\frac{dk}{k_0}\right)$$ (13.7)
Figure 13.3 The volume of a thin spherical shell in n, m, ℓ space.
This represents the number of ways to come up with a wave number between
k and k + dk. Again, this is the number of terms in (13.4) with a wave number
between k and k + dk. Recall that n, m, and ℓ are integers. Notice that we have
included the possibility of negative integers. This automatically takes into account
the fact that for each mode (defined by a set n, m, and ℓ) the field may travel in
the forwards or the backwards direction.
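The shell-volume estimate (13.7) can be compared against a brute-force count of integer triples. The sketch below works in units of $k_0$; the radius 40 and thickness 1, as well as the function names, are assumptions made for illustration. Agreement is only approximate because the shell has finite thickness and the lattice is discrete.

```python
import math


def count_modes_in_shell(radius, thickness):
    """Count integer triples (n, m, ell) with radius <= |(n, m, ell)| < radius + thickness."""
    r_max = radius + thickness
    limit = int(math.ceil(r_max))
    inner_sq = radius * radius
    outer_sq = r_max * r_max
    count = 0
    for n in range(-limit, limit + 1):
        for m in range(-limit, limit + 1):
            nm_sq = n * n + m * m
            if nm_sq >= outer_sq:
                continue  # no value of ell can bring this pair inside the shell
            for ell in range(-limit, limit + 1):
                r_sq = nm_sq + ell * ell
                if inner_sq <= r_sq < outer_sq:
                    count += 1
    return count


def thin_shell_estimate(radius, thickness):
    """Shell volume 4 pi (k/k0)^2 (dk/k0), as in (13.7)."""
    return 4.0 * math.pi * radius ** 2 * thickness


RADIUS = 40     # k / k0, assumed for illustration
THICKNESS = 1   # dk / k0, assumed for illustration

counted = count_modes_in_shell(RADIUS, THICKNESS)
estimated = thin_shell_estimate(RADIUS, THICKNESS)
print(f"counted modes: {counted}, shell-volume estimate: {estimated:.0f}")
```

In the limit L → ∞ the spacing $k_0$ shrinks, the shell contains ever more lattice points, and the relative difference between the count and the volume estimate becomes negligible.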
Since according to the Rayleigh-Jeans assumption, each mode should carry on

average equal energy k B T , the energy density (energy per volume) associated with
a specified range of wave numbers d k is then k B T /L 3 times (13.7), the number of
modes within that range. Actually, (13.4) does not account for the two distinct
polarizations for each mode (i.e. for each individual term in the summation (13.4)). So we
will need to double the answer.
The total energy density in the field involving all wave numbers is
$$u_{\text{field}} = 2 \times \int_0^{\infty} \frac{k_B T}{L^3} \times \frac{4\pi k^2}{k_0^3}\, dk = k_B T \int_0^{\infty} \frac{k^2}{\pi^2}\, dk$$ (13.8)

where the extra factor of 2 accounts for the two independent polarizations. As
anticipated, the dependence on L has disappeared from (13.8) after substituting
from (13.5).
We can see that (13.8) disagrees drastically with the Stefan-Boltzmann law
(13.2), since (13.8) is proportional to temperature rather than to its fourth power.
In addition, the integral in (13.8) is seen to diverge, meaning that regardless of the
temperature, the light carries infinite energy density! This has since been named
the ultraviolet catastrophe since the divergence occurs on the short wavelength
end of the spectrum. This is a clear failure of classical physics to explain blackbody
radiation. Nevertheless, Rayleigh emphasized the fact that his formula works
well for the longer wavelengths. He did not necessarily want to abandon classical
physics. Such dramatic changes take time.
It is instructive to make the change of variables k = 2πν/c in the integral to
write
$$u_{\text{field}} = k_B T \int_0^{\infty} \frac{8\pi\nu^2}{c^3}\, d\nu$$ (13.9)
The important factor $8\pi\nu^2/c^3$ can now be understood to be the number of modes
per volume per frequency. Then (13.9) is rewritten as
$$u_{\text{field}} = \int_0^{\infty} \rho(\nu)\, d\nu$$ (13.10)
where
$$\rho_{\text{Rayleigh-Jeans}}(\nu) = \frac{8\pi\nu^2}{c^3}\, k_B T$$ (13.11)
describes (incorrectly) the spectral energy density of the radiation field associated
with blackbody radiation.
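The divergence of the Rayleigh-Jeans energy density can be seen by integrating (13.11) up to a finite cutoff frequency and then raising the cutoff. The following is a minimal sketch; the temperature of 1500 K, the cutoff values, the value of c, and the function names are assumptions made for illustration.

```python
import math

K_B = 1.38e-23   # Boltzmann's constant from the text, J/K
C = 2.998e8      # speed of light, m/s (assumed value, not quoted in the text)


def rho_rayleigh_jeans(nu, temperature_k):
    """Rayleigh-Jeans spectral energy density (13.11), in J s/m^3."""
    return 8.0 * math.pi * nu ** 2 / C ** 3 * K_B * temperature_k


def energy_density_up_to(nu_cutoff, temperature_k, steps=10_000):
    """Midpoint-rule integral of (13.11) from 0 to nu_cutoff, in J/m^3."""
    d_nu = nu_cutoff / steps
    total = 0.0
    for i in range(steps):
        total += rho_rayleigh_jeans((i + 0.5) * d_nu, temperature_k)
    return total * d_nu


TEMPERATURE = 1500.0  # assumed illustrative temperature, K
for cutoff in (1e13, 1e14, 1e15, 1e16):
    u = energy_density_up_to(cutoff, TEMPERATURE)
    print(f"cutoff {cutoff:.0e} Hz -> u_field = {u:.3e} J/m^3")
```

Each tenfold increase of the cutoff multiplies the result by a thousand, since the integral grows as the cube of the cutoff frequency and never settles to a finite value.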
13.3 Planck’s Formula
In the late 1800’s as spectrographic technology improved, experimenters acquired

considerable data on the spectra of blackbody radiation. For the first time, de-
tailed maps of the intensity per frequency associated with blackbody radiation
became available over a fairly wide wavelength range. In keeping with Kirchhoff’s
notion of an ideal blackbody radiator, the results were observed to be indepen-
dent of the material for most solids. The intensity per frequency depended only
on temperature and when integrated over all frequencies agreed with the Stefan-
Boltzmann law (13.1).
In 1896, Wilhelm Wien considered the known physical and mathematical
constraints on the spectrum of blackbody radiation and proposed a spectral

function that seemed to work:2
$$\rho_{\text{Wien}}(\nu) = \frac{8\pi h \nu^3 e^{-h\nu/k_B T}}{c^3}$$ (13.12)
An important feature of (13.12) is that it gives a result proportional to $T^4$ when
integrated over all frequency ν (i.e. the Stefan-Boltzmann law).
Wien’s formula did a fairly good job of fitting the experimental data. However,
in 1900 Lummer and Pringsheim, colleagues of Max Planck, reported experimental
data that deviated from the Wien distribution at long wavelengths (infrared).
Planck was privy to this information early on and came up with a modest revision
to Wien’s formula that fit the data beautifully everywhere:
$$\rho_{\text{Planck}}(\nu) = \frac{8\pi h \nu^3}{c^3 \left[ e^{h\nu/k_B T} - 1 \right]}$$ (13.13)
where $h = 6.626 \times 10^{-34}$ J·s is an experimentally determined constant.

Figure 13.4 shows the Planck spectral distribution curve together with the
Rayleigh-Jeans curve (13.11) and the Wien curve (13.12). As is apparent, the Wien
distribution does a good job nearly everywhere. However, at long wavelengths
it was off by just enough for the experimentalists to notice that something was
wrong.
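The relationship among the three curves can be made concrete by evaluating (13.11), (13.12), and (13.13) at a few frequencies. This is a sketch under assumptions: the temperature of 1500 K, the sample frequencies, the value of c, and the function names are illustrative choices rather than values from the text.

```python
import math

H = 6.626e-34    # Planck's constant from the text, J s
K_B = 1.38e-23   # Boltzmann's constant from the text, J/K
C = 2.998e8      # speed of light, m/s (assumed value)


def rho_planck(nu, temperature_k):
    """Planck spectral energy density (13.13)."""
    x = H * nu / (K_B * temperature_k)
    return 8.0 * math.pi * H * nu ** 3 / (C ** 3 * math.expm1(x))


def rho_wien(nu, temperature_k):
    """Wien spectral energy density (13.12)."""
    x = H * nu / (K_B * temperature_k)
    return 8.0 * math.pi * H * nu ** 3 * math.exp(-x) / C ** 3


def rho_rayleigh_jeans(nu, temperature_k):
    """Rayleigh-Jeans spectral energy density (13.11)."""
    return 8.0 * math.pi * nu ** 2 * K_B * temperature_k / C ** 3


TEMPERATURE = 1500.0  # assumed illustrative temperature, K
for nu in (1e11, 1e13, 1e14, 3e14):
    p = rho_planck(nu, TEMPERATURE)
    print(f"nu = {nu:.0e} Hz: Wien/Planck = {rho_wien(nu, TEMPERATURE) / p:.4f}, "
          f"Rayleigh-Jeans/Planck = {rho_rayleigh_jeans(nu, TEMPERATURE) / p:.4g}")
```

At low frequency the Rayleigh-Jeans ratio approaches one while the Wien ratio falls well below one; at high frequency the roles reverse, which is the behavior described for Fig. 13.4.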
Figure 13.4 Energy density per frequency according to Planck, Wien, and Rayleigh-Jeans.
At this point, it may seem fair to ask, what did Planck do that was so great?
After all, he simply guessed a function that was only a slight modification of
Wien’s distribution. And he knew the ‘answer from the back of the book’, namely
Lummer’s and Pringsheim’s well done experimental results. (At the time, Planck
was unaware of the work by Rayleigh.) What Planck did that was so great was
to interpret the meaning of his new formula. His interpretation was what he
called an “act of desperation.” While Planck was able to explain the implications
of his formula, he did not assert that the implications were necessarily right; in
fact, he presented them somewhat apologetically. It was several years later that
the young Einstein published his paper explaining the photoelectric effect in
terms of the implications of Planck’s formula. Planck’s insight was an enormous
step towards understanding the quantum nature of light. The full theory of
quantum electrodynamics would not be developed until nearly three decades
later. Students should appreciate that the very people who developed quantum
mechanics were also bothered by its confrontation with deep-seated intuition. If
quantum mechanics bothers you, you are in good company!
Planck found that he could derive his formula only if he made the following
strange assumption: A given mode of the electromagnetic field is not able to
carry an arbitrary amount of energy (for example, k B T as Rayleigh and Jeans
used, which varies continuously as the temperature varies). Rather, the field
can only carry discrete amounts of energy separated by spacing hν. Under this
assumption, the probability $P_n$ that a mode of the field is excited to the $n$th level
is proportional to the Boltzmann statistical weighting factor $e^{-nh\nu/k_B T}$. A review
of the Boltzmann factor is given in Appendix 13.B.
2 The constant h had not yet been introduced by Planck. The actual way that Wien wrote his
distribution was $\rho_{\text{Wien}}(\nu) = a\nu^3 e^{-b\nu/T}$, where a and b were parameters used to fit the data.
Probable Energy in Each Field Mode
The Boltzmann factor can be normalized by dividing by the sum of all such factors
to obtain the probability of having energy nhν in a particular mode:
$$P_n = \frac{e^{-nh\nu/k_B T}}{\sum_{m=0}^{\infty} e^{-mh\nu/k_B T}} = e^{-nh\nu/k_B T}\left[ 1 - e^{-h\nu/k_B T} \right]$$ (13.14)
We are able to accomplish the above sum since it is a geometric series. The ex-
pected energy in each mode of the field is the sum of the probabilities times each
energy-level possibility:
$$\sum_{n=0}^{\infty} h\nu\, n P_n = h\nu \left[ 1 - e^{-h\nu/k_B T} \right] \sum_{n=0}^{\infty} n\, e^{-nh\nu/k_B T}$$
$$= h\nu \left[ e^{-h\nu/k_B T} - 1 \right] \frac{\partial}{\partial (h\nu/k_B T)} \sum_{n=0}^{\infty} e^{-nh\nu/k_B T}$$ (13.15)
$$= \frac{h\nu}{e^{h\nu/k_B T} - 1}$$
Equation (13.15) provides the expected energy in any of the modes of the
radiation field, as dictated by Planck’s assumption. This simply replaces $k_B T$ in
the Rayleigh-Jeans derivation. That is, we substitute (13.15) for $k_B T$ in (13.11) to
obtain the Planck distribution (13.13).
It is interesting that we are now able to derive the constant in the Stefan-
Boltzmann law (13.2) in terms of Planck’s constant h (see P13.3). The Stefan-
Boltzmann law is obtained by integrating the spectral density function (13.13)
over all frequencies to obtain the total field energy density, which is in thermal
equilibrium with the blackbody radiator:
$$u_{\text{field}} = \int_0^{\infty} \rho_{\text{Planck}}(\nu)\, d\nu = \frac{4}{c}\, \frac{2\pi^5 k_B^4}{15 c^2 h^3}\, T^4 = \frac{4}{c}\, \sigma T^4$$ (13.16)
The Stefan-Boltzmann constant is thus calculated in terms of Planck’s constant.
Since Planck’s constant was not introduced until a couple of decades after the Stefan-
Boltzmann law was developed, one might more appropriately say that the Stefan-
Boltzmann constant pins down Planck’s constant.

Max Planck (1858–1947, German)
was born in Kiel, the sixth child in his family. His father was a law professor.
When Max was about nine years old, his family moved to Munich where he
attended gymnasium. A mathematician, Herman Muller, took an interest in his
schooling and tutored him in mechanics and astronomy. Planck was a gifted
musician, but he decided to pursue a career in physics. At age 16 he enrolled
in the University of Munich. By age 22, he had finished his doctoral dissertation
and habilitation thesis. He was initially ignored by the academic community and
worked for a time as an unpaid lecturer. He became an associate professor of
theoretical physics at the University of Kiel and then a few years later took
over Kirchhoff’s post at the University of Berlin. After nearly twenty years of
idyllic and happy family life, a series of tragedies hit the Planck household.
Planck’s first wife, the mother of four, died. Then his eldest son was killed
in action during World War I. Soon after, his twin daughters each died
giving birth to their first child. Later Planck’s remaining son from his first
marriage was executed for participating in a failed attempt to assassinate Hitler.
Planck won the Nobel prize in 1918 for his introduction of energy quanta, but
he had serious reservations about the course that quantum mechanics theory
took.
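Two results of this section lend themselves to a quick numerical check: the closed form (13.15) for the expected energy per mode, and the Stefan-Boltzmann constant implied by (13.16). The sketch below is illustrative only; the function names, the truncation of the sum at 2000 terms, the sample frequency and temperature, and the value of c are assumptions.

```python
import math

H = 6.626e-34    # Planck's constant from the text, J s
K_B = 1.38e-23   # Boltzmann's constant from the text, J/K
C = 2.998e8      # speed of light, m/s (assumed value)


def mean_energy_by_sum(nu, temperature_k, n_max=2000):
    """Sum n*h*nu*P_n directly, with P_n built from normalized Boltzmann factors."""
    x = H * nu / (K_B * temperature_k)
    weights = [math.exp(-n * x) for n in range(n_max + 1)]
    partition = sum(weights)
    return sum(n * H * nu * w for n, w in enumerate(weights)) / partition


def mean_energy_closed_form(nu, temperature_k):
    """Expected energy per mode from (13.15)."""
    x = H * nu / (K_B * temperature_k)
    return H * nu / math.expm1(x)


def stefan_boltzmann_constant():
    """sigma = 2 pi^5 k_B^4 / (15 c^2 h^3), read off from (13.16)."""
    return 2.0 * math.pi ** 5 * K_B ** 4 / (15.0 * C ** 2 * H ** 3)


NU = 1e14             # assumed sample frequency, Hz
TEMPERATURE = 1500.0  # assumed sample temperature, K
print(mean_energy_by_sum(NU, TEMPERATURE), mean_energy_closed_form(NU, TEMPERATURE))
print(f"sigma from h, k_B, c: {stefan_boltzmann_constant():.4e} W/(m^2 K^4)")
```

With the rounded constants used here, the computed σ differs slightly from the value quoted in Section 13.1, which is expected given that $k_B$ is kept to only three significant figures.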
Example 13.1
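Determine $\rho_{\text{Planck}}(\omega)$ and $\rho_{\text{Planck}}(\lambda)$ such that
$$u_{\text{field}} = \int_0^{\infty} \rho_{\text{Planck}}(\nu)\, d\nu = \int_0^{\infty} \rho_{\text{Planck}}(\omega)\, d\omega = \int_0^{\infty} \rho_{\text{Planck}}(\lambda)\, d\lambda$$
where $\rho_{\text{Planck}}(\omega)$ and $\rho_{\text{Planck}}(\lambda)$ represent distinct functions denoted by their
arguments.
Solution: The change of variables $\omega \equiv 2\pi\nu \Rightarrow d\nu = d\omega/2\pi$ on the integrand (13.13)
yields
$$u_{\text{field}} = \int_0^{\infty} \rho_{\text{Planck}}(\nu)\, d\nu = \int_0^{\infty} \frac{8\pi h \left( \frac{\omega}{2\pi} \right)^3}{c^3 \left[ e^{h\frac{\omega}{2\pi}/k_B T} - 1 \right]} \frac{d\omega}{2\pi} = \int_0^{\infty} \frac{\hbar\omega^3}{\pi^2 c^3 \left[ e^{\hbar\omega/k_B T} - 1 \right]}\, d\omega$$
where we have introduced $\hbar \equiv h/2\pi$. By inspection, we have
$$\rho_{\text{Planck}}(\omega) = \frac{\hbar\omega^3}{\pi^2 c^3 \left[ e^{\hbar\omega/k_B T} - 1 \right]}$$ (13.17)
Similarly, the change of variables $\lambda \equiv c/\nu \Rightarrow d\nu = -c\, d\lambda/\lambda^2$ gives
$$u_{\text{field}} = \int_{\infty}^{0} \frac{8\pi h (c/\lambda)^3}{c^3 \left[ e^{h(c/\lambda)/k_B T} - 1 \right]} \left( -c\, \frac{d\lambda}{\lambda^2} \right) = \int_0^{\infty} \frac{8\pi h c}{\lambda^5 \left[ e^{hc/\lambda k_B T} - 1 \right]}\, d\lambda$$
By inspection, we get
$$\rho_{\text{Planck}}(\lambda) = \frac{8\pi h c}{\lambda^5 \left[ e^{hc/\lambda k_B T} - 1 \right]}$$ (13.18)
It is interesting to note that the maximum of $\rho_{\text{Planck}}(\lambda)$ occurring at $\lambda_{\max}$ and the
maximum of $\rho_{\text{Planck}}(\nu)$ occurring at $\nu_{\max}$ do not correspond to a matching wavelength
and frequency. That is, $\lambda_{\max} \neq c/\nu_{\max}$, because of the nonlinear nature of the
variable transformation. (See problem P13.4.)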
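The mismatch between $\lambda_{\max}$ and $c/\nu_{\max}$ can be illustrated numerically. In terms of the dimensionless variable $x = h\nu/k_B T = hc/\lambda k_B T$, (13.13) is proportional to $x^3/(e^x - 1)$ and (13.18) is proportional to $x^5/(e^x - 1)$; setting the derivative of $x^p/(e^x - 1)$ to zero gives $p(1 - e^{-x}) = x$. The sketch below solves this by fixed-point iteration. The function names, the iteration scheme, the value of c, and the use of T = 5750 K are assumptions made for illustration.

```python
import math

H = 6.626e-34    # Planck's constant from the text, J s
K_B = 1.38e-23   # Boltzmann's constant from the text, J/K
C = 2.998e8      # speed of light, m/s (assumed value)


def peak_position(power, iterations=100):
    """Solve power * (1 - exp(-x)) = x, the peak condition for x**power / (exp(x) - 1)."""
    x = float(power)
    for _ in range(iterations):
        x = power * (1.0 - math.exp(-x))
    return x


X_NU = peak_position(3)      # peak of the frequency distribution (13.13)
X_LAMBDA = peak_position(5)  # peak of the wavelength distribution (13.18)


def nu_max(temperature_k):
    """Frequency at which rho_Planck(nu) peaks, in Hz."""
    return X_NU * K_B * temperature_k / H


def lambda_max(temperature_k):
    """Wavelength at which rho_Planck(lambda) peaks, in m."""
    return H * C / (X_LAMBDA * K_B * temperature_k)


T = 5750.0  # assumed temperature, K
print(f"lambda_max = {lambda_max(T) * 1e9:.0f} nm")
print(f"c / nu_max = {C / nu_max(T) * 1e9:.0f} nm")
```

The two wavelengths differ by the temperature-independent ratio X_NU / X_LAMBDA, so the mismatch is a property of the change of variables and not of any particular temperature.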
13.4 Einstein’s A and B Coefficients

More than a decade after Planck introduced his formula, and after Bohr had pro-
posed that electrons occupy discrete energy states in atoms, Einstein reexamined
blackbody radiation in terms of Bohr’s new idea. If the material of a blackbody
radiator interacts with a mode of the field with frequency ν, then electrons in the
material must make transitions between two energy levels with energy separation
hν. Since the radiation of a blackbody is in thermal equilibrium with the material,
Einstein postulated that the field stimulates electron transitions between the
states. In addition, he postulated that some transitions must occur spontaneously.
(If the possibility of spontaneous transitions is not included, then there can be no
way for a field mode to receive energy if none is present to begin with.)
Einstein wrote down rate equations for populations of the two levels N1 and
N2 associated with the transition hν:
$$\dot{N}_1 = A_{21} N_2 - B_{12}\,\rho(\nu)\, N_1 + B_{21}\,\rho(\nu)\, N_2,$$
$$\dot{N}_2 = -A_{21} N_2 + B_{12}\,\rho(\nu)\, N_1 - B_{21}\,\rho(\nu)\, N_2$$ (13.19)

The coefficient A 21 is the rate of spontaneous emission from state 2 to state 1,

B 12 ρ (ν) is the rate of stimulated absorption from state 1 to state 2, and B 21 ρ (ν) is
the rate of stimulated emission from state 2 to state 1.
In thermal equilibrium, the rate equations (13.19) are both equal to zero (i.e.,
Ṅ1 = Ṅ2 = 0) since the relative populations of each level must remain constant.
We can then solve for the spectral density ρ (ν) at the given frequency. Either
expression in (13.19) yields
$$\rho(\nu) = \frac{A_{21}}{\frac{N_1}{N_2} B_{12} - B_{21}}$$ (13.20)
In thermal equilibrium, the spectral density must match the Planck spectral
density formula (13.13). In making the comparison, we should first rewrite the
ratio N1 /N2 of the populations in the two levels using the Boltzmann probability
factor:
$$\frac{N_1}{N_2} = \frac{e^{-E_1/k_B T}}{e^{-E_2/k_B T}} = e^{(E_2 - E_1)/k_B T} = e^{h\nu/k_B T}$$ (13.21)
Then when equating (13.20) to the Planck blackbody spectral density (13.13) we
get
$$\frac{A_{21}}{e^{h\nu/k_B T} B_{12} - B_{21}} = \frac{8\pi h \nu^3}{c^3 \left[ e^{h\nu/k_B T} - 1 \right]}$$ (13.22)
From this expression we deduce that
$$B_{12} = B_{21}$$ (13.23)
and
$$A_{21} = \frac{8\pi h \nu^3}{c^3}\, B_{21}$$ (13.24)
We see from (13.23) that the rate of stimulated absorption is the same as the
rate of stimulated emission. In addition, if one knows the rate of stimulated
emission between a pair of states, it follows from (13.24) that one also knows the
rate of spontaneous emission. This is remarkable because to derive $A_{21}$ directly,
one needs to use the full theory of quantum electrodynamics (the complete
photon description). However, to obtain $B_{21}$, it is actually only necessary to
use the semiclassical theory, where the light is treated classically and the energy
levels in the material are treated quantum-mechanically using the Schrödinger
equation. The usual semiclassical theory cannot explain spontaneous emission,
but it can explain stimulated emission, and the rate of spontaneous emission can
then be obtained indirectly through (13.24). It should be mentioned that (13.23)
and (13.24) assume that the energy levels 1 and 2 are non-degenerate. Some
modifications must be made in the case of degenerate levels, but the procedure is
similar.

Albert Einstein (1879–1955, German)
is without a doubt the most famous scientist in history. Time Magazine named
him Person of the Century. Born in Ulm to a (non practicing) Jewish family,
young Albert was influenced by a medical student, Max Talmud, who took
meals with his family and enthusiastically introduced the 10-year-old Albert
to geometry and other topics. Einstein’s father wanted Albert to be trained as an
electrical engineer, but Albert clashed with his teachers in that program and
withdrew. Einstein then attended school in Switzerland, and subsequently
entered a mathematics program at the Polytechnic in Zurich. There, Einstein
met his first wife, Mileva Maric, a fellow math student, whom he later divorced
before marrying Elsa Lowenthal. Early on, Einstein could not find a job as
a professor, and so he worked in the Swiss patent office until his "Miracle
Year" (1905), when he published four major papers, including relativity and the
photoelectric effect (for which he later received the Nobel prize). Thereafter,
job offers were never in short supply. In 1933, as the Nazi regime came to
power, Einstein immigrated from Berlin to the US and became a professor at
Princeton University. Einstein is most noted for special and general relativity,
for which he became a celebrity scientist in his own lifetime. Einstein also made
huge contributions to statistical and quantum mechanics.
In writing the rate equations, (13.19), Einstein predicted the possibility of
creating lasers fifty years in advance of their development. These rate equations
are still valid even if the light is not in thermal equilibrium with the material.

The equations suggest that if the population in the upper state 2 can be made
artificially large, then amplification will result via the stimulated transition. The
rate equations also show that a population inversion (more population in the
upper state than in the lower one) cannot be achieved by ‘pumping’ the material
with the same frequency of light that one hopes to amplify. This is because the
stimulated absorption rate is balanced by the stimulated emission rate. The
material-dependent parameters A 21 and B 12 = B 21 are called the Einstein A and B
coefficients.
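The behavior of the rate equations (13.19) can be explored with a simple numerical integration. The sketch below assumes $B_{12} = B_{21}$ as in (13.23), measures rates in units of $A_{21}$, and uses forward-Euler time stepping; the function name, step size, initial populations, and the choice $h\nu/k_B T = 1$ are all assumptions made for illustration. For a thermal field, combining (13.13) with (13.24) gives $B_{21}\rho(\nu) = A_{21}/(e^{h\nu/k_B T} - 1)$.

```python
import math


def evolve_populations(stimulated_rate, n1=1.0, n2=0.0, a21=1.0,
                       dt=1e-3, t_end=20.0):
    """Euler-integrate the two-level rate equations (13.19) with B12 = B21.

    stimulated_rate stands for B*rho(nu), in the same units as a21.
    """
    steps = int(round(t_end / dt))
    for _ in range(steps):
        flow_down = (a21 + stimulated_rate) * n2  # spontaneous + stimulated emission
        flow_up = stimulated_rate * n1            # stimulated absorption
        n1 += (flow_down - flow_up) * dt
        n2 += (flow_up - flow_down) * dt
    return n1, n2


X = 1.0  # assumed value of h*nu / (k_B*T)
thermal_rate = 1.0 / math.expm1(X)  # B*rho_Planck in units of A21
n1, n2 = evolve_populations(thermal_rate)
print(f"thermal field: N1/N2 = {n1 / n2:.5f}, exp(h nu/k_B T) = {math.exp(X):.5f}")

# Strong pumping at the transition frequency (assumed rate 50 * A21).
n1_pumped, n2_pumped = evolve_populations(50.0)
print(f"strong pumping: N2/N1 = {n2_pumped / n1_pumped:.4f}")
```

In the thermal case the populations settle to the Boltzmann ratio (13.21). Under strong pumping at the same frequency the ratio $N_2/N_1$ approaches but never exceeds one, consistent with the statement that an inversion cannot be produced this way.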
Appendix 13.A Thermodynamic Derivation of the Stefan-Boltzmann Law
In this appendix, we derive the Stefan-Boltzmann law without relying on the
Planck blackbody formula. This derivation is included mainly for historical inter-
est. The derivation relies on the 1st and 2nd laws of thermodynamics. Consider a
container whose walls are all at the same temperature and in thermal equilibrium
with the radiation field inside, according to the properties of an ideal blackbody
radiator.
Notice that the units of energy density u field (energy per volume) are equiva-
lent to force per area, or in other words pressure. It turns out that the radiation
exerts a pressure of
P = u field /3 (13.25)
on the walls of the container. This can be derived from the fact that radiation of
energy ∆E imparts a momentum
$$\Delta p = \frac{\Delta E}{c} \cos\theta$$ (13.26)
when it is absorbed with incident angle θ on a surface.3 A similar momentum is

imparted when radiation is emitted.
Derivation of 13.25
Consider a thin layer of space adjacent to a container wall with area A. If the layer
has thickness ∆z, then the volume in the layer is A∆z. Half of the radiation inside
the layer flows toward the wall, where it is absorbed. The total energy in the layer
that will be absorbed is then ∆E = (A∆z)u field /2, which arrives during the interval
∆t = ∆z/(c cos θ), assuming for the moment that all light is directed with angle θ;
we must average the angle of light propagation over a hemisphere.
Figure 13.5 Field inside a blackbody radiator.
3 The fact that light carries momentum was understood well before the development of the
theory of relativity and the photon description of light.
The pressure on the wall due to absorption (i.e. force or dp/dt per area) is then
$$P_{\text{abs}} = \frac{\int_0^{2\pi} d\phi \int_0^{\pi/2} \frac{\Delta p}{\Delta t}\frac{1}{A}\, \sin\theta\, d\theta}{\int_0^{2\pi} d\phi \int_0^{\pi/2} \sin\theta\, d\theta} = \frac{u_{\text{field}}}{2} \int_0^{\pi/2} \cos^2\theta\, \sin\theta\, d\theta = \frac{u_{\text{field}}}{6}$$ (13.27)
In equilibrium, an equal amount of radiation is also emitted from the wall as is

absorbed. This gives an additional pressure P emit = P abs , which confirms that the
total pressure is given by (13.25).
We derive the Stefan-Boltzmann law using the concept of entropy, which is

defined in differential form by the quantity
$$dS = \frac{dQ}{T}$$ (13.28)
where d Q is the injection of heat (or energy) into the radiation field in the box
and T is the temperature at which that injection takes place. We would like to
write d Q in terms of u field , V , and T . Then we may invoke the fact that S is a state
variable, which implies
$$\frac{\partial^2 S}{\partial T\, \partial V} = \frac{\partial^2 S}{\partial V\, \partial T}$$ (13.29)
This is a mathematical statement of the fact that S is fully defined if the internal
energy, temperature, and volume of a system are specified. That is, S does not
depend on past temperature and volume history; it is dictated by the present
state of the system.
To obtain d Q in the form that we need, we can use the 1st law of thermody-
namics. It states that a change in internal energy dU = d (u fieldV ) can take place
by the injection of heat d Q or by doing work dW = P dV as the volume increases:
$$dQ = dU + P\,dV = d(u_{\text{field}} V) + P\,dV$$
$$= V\, du_{\text{field}} + u_{\text{field}}\, dV + \frac{1}{3} u_{\text{field}}\, dV$$ (13.30)
$$= V \frac{du_{\text{field}}}{dT}\, dT + \frac{4}{3} u_{\text{field}}\, dV$$
We have used energy density times volume to obtain the total energy U in the radi-
ation field in the box. We have also used (13.25) to obtain the work accomplished
by pressure as the volume changes.
We can use (13.30) to rewrite (13.28) as
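$$dS = \frac{V}{T} \frac{du_{\text{field}}}{dT}\, dT + \frac{4 u_{\text{field}}}{3T}\, dV$$ (13.31)
When we differentiate (13.31) with respect to temperature or volume we get
$$\frac{\partial S}{\partial V} = \frac{4 u_{\text{field}}}{3T}, \qquad \frac{\partial S}{\partial T} = \frac{V}{T} \frac{du_{\text{field}}}{dT}$$ (13.32)
We are now able to evaluate the partial derivatives in (13.29), which give
$$\frac{\partial^2 S}{\partial T\, \partial V} = \frac{4}{3} \frac{\partial}{\partial T} \frac{u_{\text{field}}}{T} = \frac{4}{3} \frac{1}{T} \frac{\partial u_{\text{field}}}{\partial T} - \frac{4}{3} \frac{u_{\text{field}}}{T^2}, \qquad \frac{\partial^2 S}{\partial V\, \partial T} = \frac{1}{T} \frac{du_{\text{field}}}{dT}$$ (13.33)
Substitution of these into (13.29) yields a differential equation relating the
internal energy of the system to the temperature:
$$\frac{4}{3} \frac{1}{T} \frac{\partial u_{\text{field}}}{\partial T} - \frac{4}{3} \frac{u_{\text{field}}}{T^2} = \frac{1}{T} \frac{du_{\text{field}}}{dT} \quad\Rightarrow\quad \frac{\partial u_{\text{field}}}{\partial T} = \frac{4 u_{\text{field}}}{T}$$ (13.34)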
The solution to this differential equation is (13.2), where 4σ/c is a constant to be
determined experimentally.
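For completeness, the integration of (13.34) can be sketched by separation of variables. The symbol a below is an assumed name for the integration constant; identifying it with 4σ/c is what reproduces (13.2).

```latex
\frac{du_{\text{field}}}{u_{\text{field}}} = 4\,\frac{dT}{T}
\quad\Rightarrow\quad
\ln u_{\text{field}} = 4 \ln T + \ln a
\quad\Rightarrow\quad
u_{\text{field}} = a\,T^4 ,
\qquad a \equiv \frac{4\sigma}{c}
```

The thermodynamic argument therefore fixes the $T^4$ dependence but not the value of the constant, which in this derivation must come from experiment.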
Appendix 13.B Boltzmann Factor

The entropy of an object is defined by
S obj = k ln n obj (13.35)
which depends on the number of configurations $n_{\text{obj}}$ for a given state (defined,
for example, by fixed energy and volume). Now imagine that the object is placed
in contact with a very large thermal reservoir. For example, the ‘object’ could
be the electromagnetic radiation inside a hollow blackbody apparatus, and the
reservoir could be the walls of the apparatus, capable of holding far more energy
than the light field can hold. The condition for thermal equilibrium between the
object and the reservoir is
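$$\frac{\partial S_{\text{obj}}}{\partial U_{\text{obj}}} = \frac{\partial S_{\text{res}}}{\partial U_{\text{res}}} \equiv \frac{1}{T}$$ (13.36)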
where temperature has been introduced as a definition, which is consistent with
(13.28).
The total number of configurations for the combined system is N = n obj n res ,
where n obj and n res are the number of configurations available within the object
and the reservoir separately. A thermodynamic principle is that all possible
configurations are equally probable. In thermal equilibrium, the probability for a
given configuration in the object will be proportional to
$$P \propto \frac{N}{n_{\text{obj}}} = n_{\text{res}} = e^{S_{\text{res}}/k}$$ (13.37)
where we have invoked (13.35).

Meanwhile, a Taylor’s series expansion of S res yields
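$$S_{\text{res}}(U_{\text{res}}) \cong S_{\text{res}}\left( U_{\text{res}}^{\text{eq}} \right) + \left. \frac{\partial S_{\text{res}}}{\partial U_{\text{res}}} \right|_{U_{\text{res}}^{\text{eq}}} \left( U_{\text{res}} - U_{\text{res}}^{\text{eq}} \right) + \ldots$$ (13.38)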

Higher order terms are not needed since we assume the reservoir to be very large
so that it is disturbed only slightly by variations in the object. Since the overall
energy of the system is fixed, we may write
\[ U_{\rm res} - U_{\rm res}^{\rm eq} = \Delta U_{\rm res} = -\Delta U_{\rm obj} \tag{13.39} \]
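where $\Delta U_{\rm obj}$ is a small change in energy in the object. From (13.36), we may write
\[ \left.\frac{\partial S_{\rm res}}{\partial U_{\rm res}}\right|_{U_{\rm res}^{\rm eq}} = \frac{1}{T} \]
Inserting (13.38) and (13.39) into (13.37) then gives
$P \propto e^{S_{\rm res}(U_{\rm res}^{\rm eq})/k - \Delta U_{\rm obj}/kT}$, or simply
\[ P \propto e^{-\Delta U_{\rm obj}/kT} \tag{13.40} \]
since the first term in the exponent is constant. $\Delta U_{\rm obj}$ represents the amount of
energy added to the object when a mode in a configuration becomes occupied.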
In the case of blackbody radiation, a mode takes on energy nhν, where n is the
number of energy quanta in that mode. Boltzmann’s factor (13.40) is proportional
to the probability that the mode has this energy.
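
To illustrate how (13.40) might be used, the Python sketch below normalizes the Boltzmann factors for a single mode with energies $nh\nu$ and computes the resulting average energy. The function names, the example frequency and temperature, and the cutoff `n_max` are illustrative assumptions rather than anything prescribed by the text. The closed form $h\nu/(e^{h\nu/kT}-1)$ used for comparison is the standard geometric-series result for this sum.

```python
import math

K_B = 1.380e-23  # Boltzmann's constant in J/K
H = 6.626e-34    # Planck's constant in J*s


def occupation_probabilities(nu, temperature, n_max=200):
    """Normalized probabilities that a mode of frequency nu holds n quanta.

    Each weight is the Boltzmann factor exp(-n*h*nu/(k*T)); n_max is an
    assumed cutoff that truncates the (rapidly converging) sum.
    """
    x = H * nu / (K_B * temperature)
    weights = [math.exp(-n * x) for n in range(n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_mode_energy(nu, temperature, n_max=200):
    """Average energy of the mode, weighting n*h*nu by its probability."""
    probs = occupation_probabilities(nu, temperature, n_max)
    return sum(n * H * nu * p for n, p in enumerate(probs))


nu = 5.0e14  # Hz, an arbitrary visible-light frequency (assumption)
T = 5750.0   # K, an arbitrary example temperature (assumption)

probs = occupation_probabilities(nu, T)
mean_energy = mean_mode_energy(nu, T)
closed_form = H * nu / math.expm1(H * nu / (K_B * T))

print("P(n=0), P(n=1), P(n=2):", probs[:3])
print("mean energy from the sum (J):", mean_energy)
print("closed-form mean energy (J): ", closed_form)
```

The proportionality in (13.40) becomes an equality only after dividing by the sum of all the factors, which is what `occupation_probabilities` does.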

Exercises
Exercises for 13.1 Stefan-Boltzmann Law
P13.1 The Sun has a radius of $R_S = 6.96 \times 10^8$ m. What is the total power that
it radiates, given a surface temperature of 5750 K?
P13.2 A 1 cm-radius spherical ball of polished gold hangs suspended inside
an evacuated chamber that is at room temperature (20 °C). There is no
pathway for thermal conduction to the chamber wall.
(a) If the gold is at a temperature of 100 °C, what is the initial rate of
temperature loss in °C/s? The emissivity for polished gold is e = 0.02.
The specific heat of gold is 129 J/(kg·°C) and its density is 19.3 g/cm$^3$.
HINT: $Q = mc\Delta T$ and Power $= Q/\Delta t$.
(b) What is the initial rate of temperature loss if the ball is coated with
flat black paint, which has emissivity e = 0.95?
HINT: You should consider the energy flowing both ways.
Exercises for 13.3 Planck’s Formula
P13.3 Derive (or try to derive) the Stefan-Boltzmann law by integrating the
(a) Rayleigh-Jeans energy density
\[ u_{\rm field} = \int_0^\infty \rho_{\text{Rayleigh-Jeans}}(\nu)\,d\nu \]
Please comment.
(b) Wien energy density
\[ u_{\rm field} = \int_0^\infty \rho_{\rm Wien}(\nu)\,d\nu \]
Please evaluate σ.
HINT: $\int_0^\infty x^3 e^{-ax}\,dx = \frac{6}{a^4}$.
(c) Planck energy density
\[ u_{\rm field} = \int_0^\infty \rho_{\rm Planck}(\nu)\,d\nu \]
Please evaluate σ. Compare results of (b) and (c).
HINT: $\int_0^\infty \frac{x^3\,dx}{e^{ax}-1} = \frac{\pi^4}{15a^4}$.
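
The two integrals quoted in the hints for P13.3 can be checked numerically before they are used. The sketch below is one way to do that with the standard library only; the helper names, the test value of `a`, the upper cutoff, and the choice of Simpson's rule are all assumptions made for illustration.

```python
import math


def simpson(f, lower, upper, n=20000):
    """Composite Simpson's rule on [lower, upper] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (upper - lower) / n
    total = f(lower) + f(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lower + i * h)
    return total * h / 3


def wien_integrand(x, a):
    """Integrand of the hint for part (b): x^3 exp(-a x)."""
    return x**3 * math.exp(-a * x)


def planck_integrand(x, a):
    """Integrand of the hint for part (c): x^3 / (exp(a x) - 1)."""
    # The integrand tends to 0 as x -> 0, so return that limit explicitly.
    return 0.0 if x == 0 else x**3 / math.expm1(a * x)


a = 1.7            # arbitrary positive test value (assumption)
upper = 60.0 / a   # both integrands are negligible beyond a*x = 60

wien_numeric = simpson(lambda x: wien_integrand(x, a), 0.0, upper)
planck_numeric = simpson(lambda x: planck_integrand(x, a), 0.0, upper)
wien_hint = 6 / a**4
planck_hint = math.pi**4 / (15 * a**4)

print("part (b):", wien_numeric, "vs hint", wien_hint)
print("part (c):", planck_numeric, "vs hint", planck_hint)
```

Changing `a` to another positive value is a quick way to confirm the $1/a^4$ scaling of both results.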

P13.4 (a) Derive Wien’s displacement law
\[ \lambda_{\max} = \frac{0.00290\ \mathrm{m \cdot K}}{T} \]
which gives the strongest wavelength present in the blackbody spectral
distribution.
HINT: See Example 13.1. You may like to know that the solution to the
transcendental equation $(5 - x)e^x = 5$ is $x = 4.965$.
(b) What is the strongest wavelength emitted by the Sun, which has a
surface temperature of 5750 K (see P13.1)?
(c) Also find $\nu_{\max}$ and show that it is not the same as $c/\lambda_{\max}$. Why would
we be interested mainly in $\lambda_{\max}$?
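
The root quoted in the hint for P13.4(a) can be reproduced with a few lines of Python. The sketch below uses simple bisection; the function names and the bracket [4, 5] are assumptions chosen for illustration. The bracket is picked to exclude the trivial root at x = 0, which also satisfies the equation.

```python
import math


def wien_condition(x):
    """Left side minus right side of (5 - x) e^x = 5."""
    return (5.0 - x) * math.exp(x) - 5.0


def bisect(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi] by bisection; f must change sign there."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid              # sign change lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # sign change lies in the upper half
    return 0.5 * (lo + hi)


root = bisect(wien_condition, 4.0, 5.0)
print(f"x = {root:.4f}")
```

A similar root search can be adapted for the transcendental equation that arises for $\nu_{\max}$ in part (c).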

Bibliography
[1] M. Born and E. Wolf, Principles of Optics, 7th ed. (Cambridge University
Press, 1999).
[2] J. D. Jackson, Classical Electrodynamics, 3rd ed. (Wiley, 1999).
[3] G. R. Fowles, Introduction to Modern Optics, 2nd ed. (Dover, 1975).
[4] J. W. Goodman, Introduction to Fourier Optics (McGraw-Hill, 1968).
[5] R. D. Guenther, Modern Optics (Wiley, 1990).
[6] P. W. Milonni and J. H. Eberly, Lasers (Wiley, 1988).
[7] P. W. Milonni, The Quantum Vacuum: An Introduction to Quantum
Electrodynamics (Academic Press, 1994).
[8] J. R. Reitz, F. J. Milford, and R. W. Christy, Foundations of Electromagnetic
Theory, 4th ed. (Addison-Wesley, 1992).
[9] A. Yariv and P. Yeh, Optical Waves in Crystals (Wiley, 1984).

Physical Constants
Constant                    Symbol         Value
Permittivity                $\epsilon_0$   $8.854 \times 10^{-12}$ C$^2$/N·m$^2$
Permeability                $\mu_0$        $4\pi \times 10^{-7}$ T·m/A (same as kg·m/C$^2$)
Speed of light in vacuum    $c$            $2.9979 \times 10^{8}$ m/s
Charge of an electron       $q_e$          $1.602 \times 10^{-19}$ C
Mass of an electron         $m_e$          $9.109 \times 10^{-31}$ kg
Boltzmann’s constant        $k_B$          $1.380 \times 10^{-23}$ J/K
Planck’s constant           $h$            $6.626 \times 10^{-34}$ J·s
                            $\hbar$        $1.054 \times 10^{-34}$ J·s
Stefan-Boltzmann constant   $\sigma$       $5.670 \times 10^{-8}$ W/m$^2$·K$^4$
