Background
1. Name
2. Background
3. Scientific interest
4. Programming experience/language
Reserve List
Dale B. Haidvogel and Aike Beckmann, Numerical Ocean Circulation Modeling, Imperial College Press, 1999. (CGFD)
Dale R. Durran, Numerical Methods for Wave Equations in Geophysical Fluid Dynamics, Springer, New York, 1998. (CGFD)
George J. Haltiner and Roger T. Williams, Numerical Prediction and Dynamic Meteorology, Wiley, 1980. (CGFD)
John C. Tannehill, Dale A. Anderson, and Richard H. Pletcher, Computational Fluid Mechanics and Heat Transfer, Taylor and Francis, 1997. (FDM)
G. E. Forsythe and W. R. Wasow, Finite-Difference Methods for Partial Differential Equations, John Wiley and Sons, Inc., New York, 1960. (FDM)
R. D. Richtmyer and K. W. Morton, Difference Methods for Initial-Value Problems, Interscience Publishers (J. Wiley & Sons), New York, 1967.
Useful References
Gordon D. Smith, Numerical Solution of Partial Differential Equations: Finite Difference Methods, Oxford University Press, New York, 1985. (FDM)
K. W. Morton and D. F. Mayers, Numerical Solution of Partial Differential Equations: An Introduction, Cambridge University Press, New York, 1994. (FDM)
P. J. Roache, Computational Fluid Dynamics, Hermosa Publishers, 1972, ISBN 0-913478-05-9. (FDM)
C. A. J. Fletcher, Computational Techniques for Fluid Dynamics, 2 volumes, 2nd ed., Springer-Verlag, New York, 1991-1992. (Num. Sol. of PDE's)
Roger Peyret and Thomas D. Taylor, Computational Methods for Fluid Flow, Springer-Verlag, New York, 1990. (Num. Sol. of PDE's)
Roger Peyret, Handbook of Computational Fluid Mechanics, Academic Press, San Diego, 1996. (QA911 .H347 1996)
Joel H. Ferziger and M. Peric, Computational Methods for Fluid Dynamics, Springer-Verlag, New York, 1996.
R. S. Varga, Matrix Iterative Analysis, Prentice-Hall, New York, 1962.
Bengt Fornberg, A Practical Guide to Pseudospectral Methods, Cambridge University Press, Cambridge, 1998. (Spectral methods)
C. Canuto, M. Y. Hussaini, A. Quarteroni and T. A. Zang, Spectral Methods in Fluid Dynamics, Springer-Verlag, New York, 1991. (Spectral methods)
John P. Boyd, Chebyshev and Fourier Spectral Methods, Dover Publications, 2000. (Spectral methods)
O. C. Zienkiewicz and R. L. Taylor, The Finite Element Method, 4th edition, McGraw-Hill, 1989.
George Em. Karniadakis and Spencer J. Sherwin, Spectral/hp Element Methods for CFD, Oxford University Press, New York, 1999. (Spectral Elements)
Michel O. Deville, Paul F. Fischer and E. H. Mund, High-Order Methods for Incompressible Fluid Flow, Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, 2002.
Useful Software
Plotting Software (e.g. matlab, NCAR Graphics, gnuplot)
Linear Algebra (e.g. matlab, LAPACK, IMSL)
Fast Fourier Transforms (e.g. matlab, fftpack, ?)
Fortran Compiler (debuggers are useful too)
Numerical Methods in Ocean Modeling
Lecture Notes for MPO662
Introduction
1.1 Justification of CFD
Fluid motion is governed by the Navier-Stokes equations, a set of coupled and nonlinear partial differential equations derived from the basic laws of conservation of mass, momentum and energy. The unknowns are usually the velocity, the pressure and the density (for stratified fluids), along with tracers such as temperature and salinity. The analytical, paper-and-pencil solution of these equations is practically impossible save for the simplest of flows. The simplifications can take the form of geometrical simplification (the flow is in a rectangle or a circle), and/or physical simplification (periodicity, homogeneous density, linearity, etc.). Occasionally, it is possible to make headway using asymptotic analysis techniques, and there have been remarkable successes in the past O(100) years, such as the development of boundary layer theory.
Scientists had to resort to laboratory experiments when theoretical analysis was impossible. There, physical complexity can be restored to the system. The answers delivered are, however, usually qualitatively different, since dynamical and geometric similitudes are difficult to enforce simultaneously between the lab experiment and the prototype. A prime example is Reynolds number similarity, which if violated can turn a turbulent flow laminar. Furthermore, the design and construction of these experiments can be difficult (and costly), particularly for stratified rotating flows.
Computational fluid dynamics (CFD) is an additional tool in the arsenal of scientists. In its early days CFD was often controversial, as it involved additional approximations to the governing equations and raised additional (legitimate) issues. Nowadays CFD is an established discipline alongside theoretical and experimental methods. This position is in large part due to the exponential growth of computer power, which has allowed us to tackle ever larger and more complex problems.
1.2 Discretization
The central process in CFD is discretization, i.e. taking differential equations with an infinite number of degrees of freedom and reducing them to a system with a finite number of degrees of freedom. Hence, instead of determining the solution everywhere and for all times, we will be satisfied with its calculation at a finite number of locations and at specified time intervals.
The partial differential equations are then reduced to a system of algebraic equations that can be solved on a computer.
Errors creep in during the discretization process. The nature and characteristics of these errors must be controlled in order to ensure that 1) we are solving the correct equations (the consistency property), and 2) the error can be decreased as we increase the number of degrees of freedom (stability and convergence). Once these two criteria are established, the power of computing machines can be leveraged to solve the problem in a numerically reliable fashion.
Various discretization schemes have been developed to cope with a variety of issues. The most notable for our purposes are: finite difference methods, finite volume methods, finite element methods, and spectral methods.
The finite difference method replaces the continuous limit in the definition of the derivative,

$$f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}, \quad (1.1)$$

with a finite limiting process, i.e. the same quotient evaluated at a small but finite $\Delta x$.
Figure 1.2: Elemental partition of the global ocean as seen from the eastern and western equatorial Pacific. The inset shows the master element in the computational plane. The location of the interpolation points is marked with circles, and the structured nature of this local grid is evident from the predictable adjacency pattern between collocation points.
1.4 Turbulence
Most flows occurring in nature are turbulent, in that they contain energy at all scales, ranging from hundreds of kilometers to a few centimeters. It is obviously not possible to model all these scales at once. It is often sufficient to model the "large scale physics" and to relegate the small unresolvable scales to a subgrid model. Subgrid modeling is a large discipline in CFD; we will use the simplest of these models, which relies on a simple eddy diffusivity coefficient.
and diffusion by viscous action. Equation (1.7) is an example of a parabolic partial differential equation requiring initial and boundary data for its unique solution. Equation (1.6) is an example of an elliptic partial differential equation; in this instance it is a Poisson equation linking a given vorticity distribution to a streamfunction. Occasionally the terms prognostic and diagnostic are applied to the vorticity and streamfunction, respectively, to mean that the vorticity evolves dynamically in time to balance the conservation of momentum, while the streamfunction responds instantly to changes in vorticity to enforce kinematic constraints. A numerical solution of this coupled set of equations would proceed in the following manner: given an initial distribution of vorticity, the corresponding streamfunction is computed from the solution of the Poisson equation along with the associated velocity; the vorticity equation is then integrated in time using the previous values of the unknown fields; the new streamfunction is subsequently updated. The process is repeated until the desired time is reached. A minimal sketch of this cycle is shown below.
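The following Python sketch makes the update cycle concrete. It assumes a doubly periodic grid, a user-supplied Poisson solver (the hypothetical placeholder solve_poisson), the sign convention u = -psi_y, v = psi_x, and a plain forward Euler step; none of these choices are prescribed by the notes.

    import numpy as np

    def ddx(f, dx):
        # centered x-derivative on a periodic grid; f[i, j] ~ f(x_i, y_j)
        return (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2.0 * dx)

    def ddy(f, dy):
        return (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / (2.0 * dy)

    def laplacian(f, dx, dy):
        return ((np.roll(f, -1, axis=0) - 2*f + np.roll(f, 1, axis=0)) / dx**2
              + (np.roll(f, -1, axis=1) - 2*f + np.roll(f, 1, axis=1)) / dy**2)

    def step(zeta, solve_poisson, dt, dx, dy, nu):
        # one forward Euler step of zeta_t + u zeta_x + v zeta_y = nu * laplacian(zeta)
        psi = solve_poisson(zeta)              # diagnostic step: nabla^2 psi = zeta
        u, v = -ddy(psi, dy), ddx(psi, dx)     # velocities from the streamfunction
        adv = u * ddx(zeta, dx) + v * ddy(zeta, dy)
        return zeta + dt * (-adv + nu * laplacian(zeta, dx, dy))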
In order to gauge which process, advection or diffusion, dominates the vorticity evolution, we proceed to non-dimensionalize the variables with time, length and velocity scales T, L and U, respectively. The vorticity scale is then $\zeta = U/L$ from the vorticity definition. The time rate of change, advection and diffusion terms then scale as $\zeta/T$, $U\zeta/L$, and $\nu\zeta/L^2$, as shown in the third line of equation 1.7. Line 4 shows the relative sizes of the terms after multiplying line 3 by $T/\zeta$. If the time scale is chosen to be the advective time scale, i.e. T = L/U, then we obtain line 5, which shows a single dimensionless parameter, the Reynolds number Re = UL/$\nu$, controlling the evolution of $\zeta$. When Re $\ll$ 1 diffusion dominates and the equation reduces to the so-called heat equation $\zeta_t = \nu\nabla^2\zeta$. If Re $\gg$ 1 advection dominates almost everywhere in the domain. Notice that dropping the viscous term is problematic, since it has the highest order derivative and hence controls the imposition of boundary conditions. Diffusion then has to become dominant near the boundary through an increase of the vorticity gradient in thin boundary layers, where advection and viscous action become balanced.
What are the implications of the above analysis for the numerical solution? By carefully analysing the vorticity dynamics we have shown that a low Reynolds number simulation requires attention to the viscous operator, whereas advection dominates in high Reynolds number flows. Furthermore, close attention must be paid to the boundary layers forming near the edges of the domain. Further checks on the solution can be obtained by spatially integrating various forms of the vorticity equation to show that energy (kinetic energy here) and enstrophy $\zeta^2/2$ should be conserved in the inviscid case, Re = $\infty$, when the domain is closed.
Chapter 2
Basics of PDEs
Partial differential equations are used to model a wide variety of physical phenomena. As such we expect their solution methodology to depend on the physical principles used to derive them. A number of properties can be used to distinguish the different types of differential equations encountered. In order to give concrete examples of the discussion to follow, we will use the following partial differential equation:
$$a u_{xx} + b u_{xy} + c u_{yy} + d u_x + e u_y + f = 0 \quad (2.1)$$

The unknown in equation (2.1) is the function u, which depends on the two independent variables x and y. The coefficients a, b, ..., f are as yet unspecified.
The following properties of a partial differential equation are useful in determining its character, properties, method of analysis, and numerical solution:

Order: The order of a PDE is the order of the highest occurring derivative. The order of equation (2.1) is 2. A more detailed description would specify the order for each independent variable; equation 2.1 is second order in both x and y. Most equations derived from physical principles are first order in time, and first or second order in space.
Linear: The PDE is linear if none of its coefficients depend on the unknown function. In our case this is tantamount to requiring that the functions a, b, ..., f are independent of u. Linear PDEs are important since linear combinations of their solutions are again solutions. More mathematically, if u and v are solutions of the PDE, then so is $w = \alpha u + \beta v$, where $\alpha$ and $\beta$ are constants independent of u, x and y. The Laplace equation

$$u_{xx} + u_{yy} = 0$$

is linear, while the one-dimensional Burgers equation

$$u_t + u u_x = 0$$

is nonlinear. The majority of numerical analysis results are valid for linear equations, and little is known or generalizable about nonlinear equations.
Quasi Linear: A PDE is quasi linear if it is linear in its highest order derivatives, i.e. the coefficients multiplying the highest order derivatives depend at most on the function and its lower order derivatives. Thus, in our example, a, b and c may depend on u, $u_x$ and $u_y$ but not on $u_{xx}$, $u_{yy}$ or $u_{xy}$. Quasi linear equations form an important subset of the larger nonlinear class because methods of analysis and solution developed for linear equations can safely be applied to quasi linear equations. The vorticity transport equation of quasi-geostrophic motion:

$$\frac{\partial \nabla^2\psi}{\partial t} + \frac{\partial\psi}{\partial y}\frac{\partial \nabla^2\psi}{\partial x} - \frac{\partial\psi}{\partial x}\frac{\partial \nabla^2\psi}{\partial y} = 0 \quad (2.2)$$

where $\nabla^2\psi = \psi_{xx} + \psi_{yy}$, is a third order quasi linear PDE for the streamfunction $\psi$.
The equation can be simplified if $\xi$ and $\eta$ can be chosen such that A = C = 0, which in terms of the transformation factors requires:

$$a\xi_x^2 + b\xi_x\xi_y + c\xi_y^2 = 0 \quad (2.16)$$
$$a\eta_x^2 + b\eta_x\eta_y + c\eta_y^2 = 0$$

Assuming $\xi_y$ and $\eta_y$ are not equal to zero, we can rearrange the above equations into the form $ar^2 + br + c = 0$, where $r = \xi_x/\xi_y$ or $\eta_x/\eta_y$. The number of roots of this quadratic depends on the sign of the discriminant $b^2 - 4ac$. Before considering the different cases, we note that the sign of the discriminant is independent of the choice of the coordinate system. Indeed it can easily be shown that the discriminant in the new system is $B^2 - 4AC = (b^2 - 4ac)(\xi_x\eta_y - \xi_y\eta_x)^2$, and hence has the same sign as the discriminant in the old system, since $(\xi_x\eta_y - \xi_y\eta_x)^2$ is nothing but the squared Jacobian of the mapping between (x, y) and $(\xi, \eta)$ space, and the Jacobian has to be one-signed for the mapping to be valid.
2.1.1 Hyperbolic Equation: $b^2 - 4ac > 0$
In the case where $b^2 - 4ac > 0$ the equation has two distinct real roots and is called hyperbolic. The roots are given by $r_\pm = \frac{-b \pm \sqrt{b^2-4ac}}{2a}$. The coordinate transformation is defined by the curves

$$\frac{dy}{dx} = -\frac{\xi_x}{\xi_y} = \frac{b + \sqrt{b^2-4ac}}{2a} \quad (2.19)$$

$$\frac{dy}{dx} = -\frac{\eta_x}{\eta_y} = \frac{b - \sqrt{b^2-4ac}}{2a} \quad (2.20)$$

The roots of the quadratic are hence nothing but the slopes of the constant-$\xi$ and constant-$\eta$ curves. These curves are called the characteristic curves. They are the preferred directions along which information propagates in a hyperbolic system.
In the $(\xi, \eta)$ system the equation takes the canonical form:

$$B u_{\xi\eta} + D u_\xi + E u_\eta + F = 0 \quad (2.21)$$

The solution can be easily obtained in the case D = E = F = 0, and takes the form:

$$u = G(\xi) + H(\eta) \quad (2.22)$$

where G and H are functions determined by the boundary and initial conditions.
Example 1: The one-dimensional wave equation:

$$u_{tt} - \lambda^2 u_{xx} = 0, \quad -\infty < x < \infty \quad (2.23)$$

where $\lambda$ is the wave speed, is an example of a hyperbolic system, since its $b^2 - 4ac = 4\lambda^2 > 0$. The slopes of the characteristic curves are given by

$$\frac{dx}{dt} = \pm\lambda \quad (2.24)$$

which, for constant $\lambda$, gives the two families of characteristics:

$$\xi = x - \lambda t, \qquad \eta = x + \lambda t \quad (2.25)$$
Initial conditions are needed to solve this problem; assuming they are of the form:

$$u(x,0) = f(x), \quad u_t(x,0) = g(x) \quad (2.26)$$

we get the equations:

$$F(x) + G(x) = f(x) \quad (2.27)$$
$$-\lambda F'(x) + \lambda G'(x) = g(x) \quad (2.28)$$

The second equation can be integrated in x to give

$$-\lambda[F(x) - F(x_0)] + \lambda[G(x) - G(x_0)] = \int_{x_0}^x g(\tau)\, d\tau \quad (2.29)$$

where $x_0$ is arbitrary and $\tau$ an integration variable. We now have two equations in the two unknowns F and G, and the system can be solved to get:

$$F(x) = \frac{f(x)}{2} - \frac{1}{2\lambda}\int_{x_0}^x g(\tau)\, d\tau + \frac{F(x_0) - G(x_0)}{2} \quad (2.30)$$

$$G(x) = \frac{f(x)}{2} + \frac{1}{2\lambda}\int_{x_0}^x g(\tau)\, d\tau - \frac{F(x_0) - G(x_0)}{2} \quad (2.31)$$

To obtain the final solution we replace x by $x - \lambda t$ in the expression for F and by $x + \lambda t$ in the expression for G; adding the resulting expressions (the constant terms cancel), the final solution to the PDE takes the form:

$$u(x,t) = \frac{f(x-\lambda t) + f(x+\lambda t)}{2} + \frac{1}{2\lambda}\int_{x-\lambda t}^{x+\lambda t} g(\tau)\, d\tau \quad (2.32)$$
Figure 2.1 shows the solution of the wave equation for the case where $\lambda = 1$, g = 0, and f(x) is a square wave. The time increases from left to right. The succession of plots shows two travelling waves, going in the positive and negative x-directions respectively at the speed $\lambda = 1$. Notice that after crossing, the two square waves travel with no change of shape.
Figure 2.1: Solution to the second order wave equation. The top left figure shows the initial conditions, and the subsequent ones the solution at t = 0.4, 0.8, 1.2, 1.6 and 2.0.
Fourier transforming the heat equation in x, where k is the wavenumber, turns the PDE into a simple first order ODE in time, which can be easily solved with the help of the initial condition $u(x,0) = u_0(x)$:

$$\tilde{u}(k,t) = \tilde{u}_0\, e^{-k^2\nu^2 t} \quad (2.38)$$

The final solution is obtained by an inverse Fourier transform, $u(x,t) = \mathcal{F}^{-1}(\tilde u)$. The latter can be written as a convolution, since $u_0 = \mathcal{F}^{-1}(\tilde u_0)$ and the inverse transform of the Gaussian factor is known:

$$u(x,t) = \frac{1}{2\nu\sqrt{\pi t}}\int_{-\infty}^{\infty} u_0(X)\, e^{-\frac{(x-X)^2}{4\nu^2 t}}\, dX \quad (2.39)$$
Figure 2.2: Solution to the heat equation. The figure shows the initial condition (the top hat profile), and the solution at times t = 0.01, 0.1, 0.2.
As an example we show the solution of the heat equation using the same square initial condition as for the wave equation. The solution can be written in terms of the error function $\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}}\int_0^z e^{-s^2}\, ds$. The solution is shown in figure 2.2. Instead of a travelling wave, the solution shows a smearing of the two discontinuities as time increases, accompanied by a decrease of the maximum amplitude. As its name indicates, the heat equation is an example of a diffusion equation, where gradients in the solution tend to be smeared out.
1. Elliptic case, $\lambda = -\beta^2 < 0$. The solution for this case is then:

$$u(x,y) = \frac{\sin(Nx)}{N^2}\cosh(\beta N y), \qquad v(x,y) = \frac{\cos(Nx)}{N^2}\sinh(\beta N y) \quad (2.46)$$
Notice that even though the PDEs are identical, the two solutions u and v are different because of the boundary conditions. For $N \to \infty$ the boundary conditions become identical, and hence one would expect the solutions to become identical. However, it is easy to verify that $|u - v| \to \infty$ for any finite y > 0 as $N \to \infty$. Hence small changes in the boundary conditions lead to large changes in the solution, and the continuity between the boundary data and the solution has been lost. The problem in this case is that no boundary condition has been imposed at $y \to \infty$, as required by the elliptic nature of the problem.
2. Hyperbolic case, $\lambda = \beta^2 > 0$. The solution is then:

$$u(x,y) = \frac{\sin(Nx)}{N^2}\cos(\beta N y), \qquad v(x,y) = -\frac{\cos(Nx)}{N^2}\sin(\beta N y) \quad (2.47)$$

Notice that in this case u, v $\to$ 0 when $N \to \infty$ for any finite value of y.
Figure 2.3: Characteristic lines for the linear advection equation. The solid lines are the characteristics emanating from different locations on the initial axis. The dashed lines represent the signal at times t = 0 and t = 1. If the solution at (x, t) is desired, we first need to find the foot of the characteristic, $x_0 = x - ct$; the value of the function there at the initial time is copied to time t.
where du is the total differential of the function u. Since the right hand side is zero, the function must be constant along the lines dx/dt = c, and this constant must be equal to the value of u at the initial time. The solution can then be written as:

$$u(x,t) = u_0(x_0) \quad \text{along } \frac{dx}{dt} = c \quad (2.51)$$

where $x_0$ is the location of the foot of the characteristic, the intersection of the characteristic with the t = 0 line. The simplest way to visualize this picture is to consider the case where c is constant. The characteristic lines can then be obtained analytically: they are straight lines given by $x = x_0 + ct$. A family of characteristic lines is shown in figure 2.3, where c is assumed positive. In this example the information travels from left to right at the constant speed c, and the initial hump translates to the right without change of shape.
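As an illustration, here is a minimal Python sketch of the characteristic construction for constant c; the initial profile u0 is an arbitrary choice, not one from the notes.

    import numpy as np

    c = 1.0
    u0 = lambda x: np.exp(-x**2)      # hypothetical initial hump

    def u(x, t):
        x0 = x - c * t                # foot of the characteristic through (x, t)
        return u0(x0)                 # the initial value is copied along the line

    x = np.linspace(-5.0, 5.0, 11)
    print(u(x, 1.0))                  # the hump translated right by c*t = 1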
If the domain is of finite extent, say $0 \le x \le L$, and the characteristic intersects the line x = 0 (assuming c > 0), then a boundary condition is needed on the left boundary to provide the information needed to determine the solution uniquely. That is, we need to provide the variation of u at the "inlet" boundary in the form u(x = 0, t) = g(t). The solution can now be written as:

$$u(x,t) = \begin{cases} u_0(x - ct) & \text{for } x - ct > 0 \\ g(t - x/c) & \text{for } x - ct < 0 \end{cases} \quad (2.52)$$
Note that since the information travels from left to right, the boundary condition is needed at the left boundary and not at the right. The solution would be impossible to determine had the boundary condition been given on the right boundary x = L; the problem would then be ill-posed for lack of proper boundary conditions.
Figure 2.4: Characteristics for Burgers' equation (left panel), and the solution (right panel) at different times for a periodic problem with initial condition $u(x,0) = 1 - \sin(\pi x)$. The black line is the initial condition, the red line the solution at t = 1/8, the blue at t = 1/4, and the magenta at t = 3/4. The solution becomes discontinuous when characteristics intersect.
The right boundary may be called an "outlet" boundary, since information is leaving the domain. No boundary condition can be imposed there, since the solution is determined from "upstream" information. In the case where c < 0 the information travels from right to left, and the boundary condition must be imposed at x = L.
If the advection speed c varies in x and t, then the characteristics are not necessarily straight lines of constant slope but are in general curves. Since the slopes of the curves vary, characteristic lines may intersect. These intersection points are places where the solution is discontinuous, with sharp jumps. At these locations the slopes are infinite and space and time derivatives become meaningless, i.e. the PDE is no longer valid. This breakdown usually occurs because of the neglect of important physical terms, such as dissipative terms, that act to prevent truly discontinuous solutions.
An example of an equation that can lead to discontinuous solutions is the Burgers equation:

$$u_t + u u_x = 0 \quad (2.53)$$

where c = u. This equation is nonlinear, with a variable advection speed that depends on the solution. The characteristics are given by the lines:

$$\frac{dx}{dt} = u \quad (2.54)$$

along which the PDE takes the form du = 0. Hence u is constant along characteristics, which means that their slopes are also constant according to equation (2.54), and hence the characteristics must be straight lines. Even in this nonlinear case the characteristics are straight lines, but with varying slopes. The behavior of the solution can become quite complicated as characteristic lines intersect, as shown in figure 2.4. The solution of hyperbolic equations in the presence of discontinuities can become quite complicated; we refer the interested reader to [20, 13] for further discussions.
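The steepening mechanism can be seen by tracing the straight characteristics numerically; a small sketch, using the initial condition of figure 2.4:

    import numpy as np

    u0 = lambda x: 1.0 - np.sin(np.pi * x)   # initial condition of figure 2.4
    x0 = np.linspace(-1.0, 1.0, 21)          # feet of the characteristics

    for t in (0.0, 0.125, 0.25):
        x = x0 + u0(x0) * t                  # straight lines x = x0 + u0(x0) t
        dx_min = np.diff(np.sort(x)).min()   # spacing shrinks where lines converge
        print(f"t={t}: minimum characteristic spacing = {dx_min:.3f}")

When the minimum spacing reaches zero, characteristics have crossed and the solution has become discontinuous.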
Reminder: A diagonal matrix is one where all entries are zero save for those on the main diagonal; in the case above the diagonal entries are the eigenvalues of the matrix. The system can then be uncoupled by defining the auxiliary variables $\hat{w}$ through $w = T\hat{w}$; replacing in the original equation we get:

$$\frac{\partial \hat{w}}{\partial t} + D\frac{\partial \hat{w}}{\partial x} = 0 \quad\Longleftrightarrow\quad \frac{\partial \hat{w}_i}{\partial t} + \lambda_i\frac{\partial \hat{w}_i}{\partial x} = 0 \quad (2.56)$$

The equation on the left is written in vector form, whereas the component form on the right shows the uncoupling of the equations. The component form clearly shows how the system is hyperbolic and analogous to the scalar form.
Example 5: The linearized equations governing tidal flow in a channel of constant cross section and depth are the shallow water equations:

$$\frac{\partial}{\partial t}\begin{pmatrix} u \\ \eta \end{pmatrix} + \begin{pmatrix} 0 & g \\ h & 0 \end{pmatrix}\frac{\partial}{\partial x}\begin{pmatrix} u \\ \eta \end{pmatrix} = 0, \qquad A = \begin{pmatrix} 0 & g \\ h & 0 \end{pmatrix} \quad (2.57)$$

where u and $\eta$ are the unknown velocity and surface elevation, g is the gravitational acceleration, and h the water depth. The eigenvalues of the matrix A can be found by solving the equation:

$$\det(A - \lambda I) = 0 \iff \det\begin{pmatrix} -\lambda & g \\ h & -\lambda \end{pmatrix} = \lambda^2 - gh = 0. \quad (2.58)$$

The two real roots of the equation are $\lambda = \pm c$, where $c = \sqrt{gh}$; hence the eigenvalues are the familiar gravity wave speeds. Two eigenvectors of the system are $\mathbf{u}_1$ and $\mathbf{u}_2$, corresponding to the positive and negative roots, respectively:

$$\mathbf{u}_1 = \begin{pmatrix} 1 \\ \frac{c}{g} \end{pmatrix}, \qquad \mathbf{u}_2 = \begin{pmatrix} 1 \\ -\frac{c}{g} \end{pmatrix}. \quad (2.59)$$

The eigenvectors are the columns of the transformation matrix T, and we have

$$T = \begin{pmatrix} 1 & 1 \\ \frac{c}{g} & -\frac{c}{g} \end{pmatrix}, \qquad T^{-1} = \frac{1}{2}\begin{pmatrix} 1 & \frac{g}{c} \\ 1 & -\frac{g}{c} \end{pmatrix}. \quad (2.60)$$

It is easy to verify that

$$D = T^{-1} A T = \begin{pmatrix} c & 0 \\ 0 & -c \end{pmatrix} \quad (2.61)$$
Figure 2.5: Characteristics for the one-dimensional tidal equations. The new variables $\hat{u}$ and $\hat{\eta}$ are constant along the right- and left-going characteristics (of slopes c and $-c$), respectively. The solution at the point (x, t) can be computed by finding the feet of the two characteristic curves at the initial time, $x_a$ and $x_b$, and applying the conservation of $\hat{u}$ and $\hat{\eta}$: $\hat{u}(x,t) = \hat{u}_0(x_a)$, $\hat{\eta}(x,t) = \hat{\eta}_0(x_b)$.
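The eigenstructure above is easy to verify numerically; a sketch with arbitrary illustrative values of g and h (NumPy normalizes the eigenvectors, so T differs from (2.60) by column scalings, which leaves D unchanged):

    import numpy as np

    g, h = 9.81, 100.0
    A = np.array([[0.0, g], [h, 0.0]])

    lam, T = np.linalg.eig(A)          # eigenvalues and eigenvector matrix
    print(lam, np.sqrt(g * h))         # lam = +/- sqrt(g h), the gravity wave speeds

    D = np.linalg.inv(T) @ A @ T       # similarity transform diagonalizes A
    print(np.round(D, 10))             # diag(+c, -c)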
is a diagonal matrix with real eigenvalues for all points within a neighborhood of (x, y).
Chapter 3

Finite Difference Approximation of Derivatives
The derivative of a function u at a point x is defined through the limit

$$u'(x) = \lim_{\Delta x\to 0}\frac{u(x+\Delta x) - u(x)}{\Delta x} \quad (3.1)$$
Computers, however, cannot deal with the limit $\Delta x \to 0$, and hence a discrete analogue of the continuous case needs to be adopted. In a discretization step, the set of points on which the function is defined is finite, and the function value is available on a discrete set of points. Approximations to the derivative will have to come from this discrete table of the function.
Figure 3.1 shows the discrete set of points $x_i$ where the function is known. We will use the notation $u_i = u(x_i)$ to denote the value of the function at the i-th node of the computational grid. The nodes divide the axis into a set of intervals of width $\Delta x_i = x_{i+1} - x_i$. When the grid spacing is fixed, i.e. all intervals are of equal size, we will refer to the grid spacing as $\Delta x$. There are definite advantages to a constant grid spacing, as we will see later.
The simplest approximation replaces the derivative with the finite difference quotient

$$u'(x_i) \approx \frac{u_{i+1} - u_i}{\Delta x} \quad (3.2)$$

where now $\Delta x$ is finite and small but not necessarily infinitesimally small. This is known as a forward Euler approximation since it uses forward differencing. Intuitively, the approximation will improve, i.e. the error will be smaller, as $\Delta x$ is made smaller.
Figure 3.1: Computational grid and examples of backward, forward, and central approximations to the derivative at point $x_i$. The dash-dot line shows the centered parabolic interpolation, while the dashed lines show the backward (blue), forward (red) and centered (magenta) linear interpolations to the function.
The above is not the only approximation possible; two equally valid approximations are:

backward Euler:

$$u'(x_i) \approx \frac{u(x_i) - u(x_i - \Delta x)}{\Delta x} = \frac{u_i - u_{i-1}}{\Delta x} \quad (3.3)$$

centered difference:

$$u'(x_i) \approx \frac{u(x_i + \Delta x) - u(x_i - \Delta x)}{2\Delta x} = \frac{u_{i+1} - u_{i-1}}{2\Delta x} \quad (3.4)$$

All these definitions are equivalent in the continuum but lead to different approximations in the discrete case. The question becomes: which one is better, and is there a way to quantify the error committed? The answer lies in the application of Taylor series analysis. We briefly describe Taylor series in the next section, before applying them to investigate the approximation errors of finite difference formulae.
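Before turning to the formal analysis, here is a quick numerical comparison of the three formulae (3.2)-(3.4) at a single point, using u(x) = sin x (an arbitrary smooth test function) whose derivative cos x is known:

    import numpy as np

    u, du = np.sin, np.cos
    xi, dx = 0.5, 0.1

    fd = (u(xi + dx) - u(xi)) / dx             # forward Euler (3.2)
    bd = (u(xi) - u(xi - dx)) / dx             # backward Euler (3.3)
    cd = (u(xi + dx) - u(xi - dx)) / (2 * dx)  # centered difference (3.4)

    for name, val in (("FD", fd), ("BD", bd), ("CD", cd)):
        print(name, abs(val - du(xi)))         # the centered error is markedly smaller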
$$u'(x) = u'(x_i) + \int_{x_i}^{x} u''(s)\, ds \quad (3.6)$$

Replacing this expression in the original formula and carrying out the integration (since $u(x_i)$ is constant) we get:

$$u(x) = u(x_i) + (x - x_i)u'(x_i) + \int_{x_i}^{x}\!\int_{x_i}^{x} u''(s)\, ds\, ds \quad (3.7)$$
The process can be repeated with

$$u''(x) = u''(x_i) + \int_{x_i}^{x} u'''(s)\, ds \quad (3.8)$$

to get:

$$u(x) = u(x_i) + (x - x_i)u'(x_i) + \frac{(x-x_i)^2}{2!}u''(x_i) + \int_{x_i}^{x}\!\int_{x_i}^{x}\!\int_{x_i}^{x} u'''(s)\, ds\, ds\, ds \quad (3.9)$$
This process can be repeated under the assumption that u(x) is sufficiently differentiable, and we find:

$$u(x) = u(x_i) + (x - x_i)u'(x_i) + \frac{(x-x_i)^2}{2!}u''(x_i) + \cdots + \frac{(x-x_i)^n}{n!}u^{(n)}(x_i) + R_{n+1} \quad (3.10)$$
where $\Delta x_{i-1} = x_i - x_{i-1}$. We can now get an expression for the error corresponding to the backward difference approximation:

$$\frac{u_i - u_{i-1}}{\Delta x_{i-1}} - u'(x_i) = \underbrace{-\frac{\Delta x_{i-1}}{2!}u''(x_i) + \cdots}_{\text{Truncation Error}}$$
It is now clear that the truncation error of the backward difference, while not the same as that of the forward difference, behaves similarly in terms of order of magnitude analysis and is linear in $\Delta x$:

$$\left.\frac{\partial u}{\partial x}\right|_{x_i} = \frac{u_i - u_{i-1}}{\Delta x_{i-1}} + O(\Delta x) \quad (3.20)$$

Notice that in both cases we have used the information provided at just two points to derive the approximation, and the error behaves linearly in both instances.
Higher order approximations of the first derivative can be obtained by combining the two Taylor series equations (3.15) and (3.18). Notice first that the higher order derivatives of the function u are all evaluated at the same point $x_i$, and are the same in both expansions. We can now form a linear combination of the equations whereby the leading error term is made to vanish. In the present case this can be done by inspection of equations (3.16) and (3.19). Multiplying the first by $\Delta x_{i-1}$ and the second by $\Delta x_i$ and adding, we get:

$$\frac{\frac{\Delta x_{i-1}}{\Delta x_i}(u_{i+1}-u_i) + \frac{\Delta x_i}{\Delta x_{i-1}}(u_i-u_{i-1})}{\Delta x_i + \Delta x_{i-1}} - \left.\frac{\partial u}{\partial x}\right|_{x_i} = \frac{\Delta x_i\,\Delta x_{i-1}}{3!}\left.\frac{\partial^3 u}{\partial x^3}\right|_{x_i} + \cdots \quad (3.21)$$
There are several points to note about the preceding expression. First, the approximation uses information about the function u at three points: $x_{i-1}$, $x_i$ and $x_{i+1}$. Second, the truncation error is T.E. $\sim O(\Delta x_{i-1}\Delta x_i)$ and is second order; that is, if the grid spacing is decreased by 1/2, the T.E. decreases by a factor of $2^2$. Thirdly, the previous point can be made clearer by focussing on the important case where the grid spacing is constant: with $\Delta x_{i-1} = \Delta x_i = \Delta x$, the expression simplifies to:

$$\frac{u_{i+1} - u_{i-1}}{2\Delta x} - \left.\frac{\partial u}{\partial x}\right|_{x_i} = \frac{\Delta x^2}{3!}\left.\frac{\partial^3 u}{\partial x^3}\right|_{x_i} + \cdots \quad (3.22)$$
Hence, for an equally spaced grid the centered difference approximation converges quadratically as $\Delta x \to 0$:

$$\left.\frac{\partial u}{\partial x}\right|_{x_i} = \frac{u_{i+1} - u_{i-1}}{2\Delta x} + O(\Delta x^2) \quad (3.23)$$

Note that, like the forward and backward Euler difference formulae, the centered difference uses information at only two points but delivers twice the order of the other two methods. This property holds in general whenever the grid spacing is constant and the computational stencil, i.e. the set of points used in approximating the derivative, is symmetric.
3.3.2 Higher order approximations

The Taylor expansion provides a very useful tool for the derivation of higher order approximations to derivatives of any order. There are several approaches to achieve this. We will first look at an expedient one before elaborating on the more systematic one. In most of the following we will assume the grid spacing to be constant, as is usually the case in applications.
Equation (3.22) provides us with the simplest way to derive a fourth order approximation. An important property of this centered formula is that its truncation error contains only odd derivative terms:

$$\frac{u_{i+1} - u_{i-1}}{2\Delta x} = \frac{\partial u}{\partial x} + \frac{\Delta x^2}{3!}\frac{\partial^3 u}{\partial x^3} + \frac{\Delta x^4}{5!}\frac{\partial^5 u}{\partial x^5} + \frac{\Delta x^6}{7!}\frac{\partial^7 u}{\partial x^7} + \cdots + \frac{\Delta x^{2m}}{(2m+1)!}\frac{\partial^{2m+1} u}{\partial x^{2m+1}} + \cdots$$
The process can be repeated to derive higher order approximations.
3.3.3 Remarks

The validity of the Taylor series analysis of the truncation error depends on the existence of higher order derivatives. If these derivatives do not exist, then the higher order approximations cannot be expected to hold. To demonstrate the issue more clearly we will look at specific examples.
Example 6: The function $u(x) = \sin \pi x$ is infinitely smooth and differentiable, and its first derivative is given by $u_x = \pi\cos \pi x$. Given the smoothness of the function we expect the Taylor series analysis of the truncation error to hold. We set about verifying this claim in a practical calculation. We lay down a computational grid on the interval $-1 \le x \le 1$ with constant grid spacing $\Delta x = 2/M$. The approximation points are then $x_i = i\Delta x - 1$, i = 0, 1, ..., M.
Figure 3.2: Finite difference approximation to the derivative of the function $\sin \pi x$. The top left panel shows the function as a function of x. The top right panel shows the spatial distribution of the error using the forward difference (black line), the backward difference (red line), and the centered differences of various orders (magenta lines) for the case M = 1024; the centered difference curves lie atop each other because their errors are much smaller than those of the first order schemes. The lower panels are convergence curves showing the rate of decrease of the rms and maximum errors as the number of grid cells increases.
Let $\epsilon$ be the error between the finite difference approximation to the first derivative, $\tilde{u}_x$, and the analytical derivative $u_x$:

$$\epsilon_i = \tilde{u}_x(x_i) - u_x(x_i) \quad (3.30)$$

The numerical approximation $\tilde{u}_x$ will be computed using the forward difference, equation (3.17), the backward difference, equation (3.20), and the centered difference approximations of order 2, 4 and 6, equations (3.22), (3.27) and (3.29). We will use two measures to characterize the error $\epsilon_i$ and its rate of decrease as the number of grid points is increased. One is a bulk measure, the root mean square error, and the other is the maximum error magnitude:

$$\|\epsilon\|_2 = \left(\frac{1}{M}\sum_{i=0}^{M}\epsilon_i^2\right)^{\frac12}, \qquad \|\epsilon\|_\infty = \max_{0\le i\le M}|\epsilon_i| \quad (3.31)$$
The convergence curves in figure 3.2 confirm the order of the approximations as predicted by the Taylor series analysis. The slopes on this log-log plot are $-1$ for the forward and backward differences, and $-2$, $-4$ and $-6$ for the centered difference schemes of order 2, 4 and 6, respectively. Notice that the maximum error decreases at the same rate as the rms error even though it reports a higher error. Finally, if one were to gauge the efficiency with which information is used, it is evident that for a given M the high order methods achieve the lowest error.
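A convergence test in the spirit of Example 6 can be coded in a few lines; this sketch measures the rms and maximum errors of the second order centered difference only, on a periodic grid for $u = \sin \pi x$:

    import numpy as np

    for M in (16, 32, 64, 128):
        dx = 2.0 / M
        x = -1.0 + dx * np.arange(M)       # periodic grid, right endpoint omitted
        u = np.sin(np.pi * x)
        ux = np.pi * np.cos(np.pi * x)
        approx = (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)
        err = approx - ux
        print(M, np.sqrt(np.mean(err**2)), np.abs(err).max())  # both shrink like 1/M^2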
Example 7: We now investigate the numerical approximation of a function with finite differentiability, more precisely, one that has a discontinuous third derivative. This function is defined as follows:

            u(x)           u_x(x)                u_xx(x)                 u_xxx(x)
    x < 0:  sin x          cos x                 -sin x                  -cos x
    x > 0:  x e^{-x^2}     (1 - 2x^2) e^{-x^2}   2x(2x^2 - 3) e^{-x^2}   -2(3 - 12x^2 + 4x^4) e^{-x^2}
Figure 3.3: Finite difference approximation to the derivative of a function with discontinuous third derivative. The top left panel shows the function u(x) which, to the eyeball norm, appears to be quite smooth. The top right panel shows the spatial distribution of the error (M = 1024) using the fourth order centered difference: notice the spike at the discontinuity in the derivative. The lower panels are convergence curves showing the rate of decrease of the rms and maximum errors as the number of grid cells increases.
The convergence curves show that the convergence rates of the fourth and sixth order centered schemes are no better than that of the second order one. This is a direct consequence of the discontinuity in the third derivative, whereby the Taylor expansion is valid only up to the third term. The effects of the discontinuity are more clearly seen in the maximum error plot (lower right panel) than in the mean error one (lower left panel). The main message of this example is that for functions with a finite number of derivatives, the Taylor series prediction for the high order schemes does not hold. Notice that the errors of the fourth and sixth order schemes are lower than those of the other three, but their rate of convergence is the same as the second order scheme's. This is largely coincidental and would change according to the function.
3.3.4 Systematic derivation of higher order derivatives

The Taylor series expansion provides a systematic way of deriving approximations to derivatives of any order (provided of course that the function is smooth enough). Here we assume that the grid spacing is uniform for simplicity. Suppose that the stencil chosen includes the points $x_j$ such that $i - l \le j \le i + r$. There are thus l points to the left and r points to the right of the point i where the derivative is desired, for a total of r + l + 1 points. The Taylor expansion is:

$$u_{i+m} = u_i + \frac{(m\Delta x)}{1!}u_x + \frac{(m\Delta x)^2}{2!}u_{xx} + \frac{(m\Delta x)^3}{3!}u_{xxx} + \frac{(m\Delta x)^4}{4!}u_{xxxx} + \frac{(m\Delta x)^5}{5!}u_{xxxxx} + \cdots \quad (3.33)$$
for $m = -l, \ldots, r$. Multiplying each of these expansions by a constant $a_m$ and summing them up, we obtain the following equation:

$$\sum_{\substack{m=-l\\ m\neq 0}}^{r} a_m u_{i+m} - \left(\sum_{\substack{m=-l\\ m\neq 0}}^{r} a_m\right) u_i = \left(\sum_{\substack{m=-l\\ m\neq 0}}^{r} m\, a_m\right)\frac{\Delta x}{1!}\left.\frac{\partial u}{\partial x}\right|_i + \left(\sum_{\substack{m=-l\\ m\neq 0}}^{r} m^2 a_m\right)\frac{\Delta x^2}{2!}\left.\frac{\partial^2 u}{\partial x^2}\right|_i + \left(\sum_{\substack{m=-l\\ m\neq 0}}^{r} m^3 a_m\right)\frac{\Delta x^3}{3!}\left.\frac{\partial^3 u}{\partial x^3}\right|_i + \left(\sum_{\substack{m=-l\\ m\neq 0}}^{r} m^4 a_m\right)\frac{\Delta x^4}{4!}\left.\frac{\partial^4 u}{\partial x^4}\right|_i + \left(\sum_{\substack{m=-l\\ m\neq 0}}^{r} m^5 a_m\right)\frac{\Delta x^5}{5!}\left.\frac{\partial^5 u}{\partial x^5}\right|_i + \cdots \quad (3.34)$$

It is clear that the coefficient of the k-th derivative is given by $b_k = \sum_{m=-l, m\neq 0}^{r} m^k a_m$. Equation (3.34) allows us to determine the r + l coefficients $a_m$ according to the derivative desired and the order desired. Hence if the first derivative is needed at fourth order accuracy, we would set $b_1$ to 1 and $b_2 = b_3 = b_4 = 0$. This would provide us with four equations,
and hence we need at least four points in order to determine the solution uniquely. More generally, if we need the k-th derivative to order p of accuracy, then the highest derivative to be matched must be of order k + p - 1, and hence k + p - 1 equations (and as many unknown coefficients) are needed. The equations then take the form:

$$b_q = \sum_{\substack{m=-l\\ m\neq 0}}^{r} m^q a_m = \delta_{qk}, \quad q = 1, 2, \ldots, k + p - 1 \quad (3.35)$$
where $\delta_{qk}$ is the Kronecker delta ($\delta_{qk} = 1$ if q = k and 0 otherwise). For the solution to exist and be unique we must have $l + r = k + p - 1$. Once the solution is obtained we can determine the leading order truncation term by calculating the coefficient multiplying the next higher derivative in the truncation error series:

$$b_{k+p} = \sum_{\substack{m=-l\\ m\neq 0}}^{r} m^{k+p}\, a_m. \quad (3.36)$$
For example, on the stencil m = -3, -2, -1, 1 the system (3.35) takes the form:

$$\begin{pmatrix} -3 & -2 & -1 & 1 \\ (-3)^2 & (-2)^2 & (-1)^2 & (1)^2 \\ (-3)^3 & (-2)^3 & (-1)^3 & (1)^3 \\ (-3)^4 & (-2)^4 & (-1)^4 & (1)^4 \end{pmatrix}\begin{pmatrix} a_{-3} \\ a_{-2} \\ a_{-1} \\ a_1 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix} \quad (3.37)$$
If the first derivative is desired to fourth order accuracy, we would set $b_1 = 1$ and $b_2 = b_3 = b_4 = 0$, while if the second derivative is required to third order accuracy we would set $b_1 = b_3 = b_4 = 0$ and $b_2 = 1$. The coefficients for the first example are:

$$\begin{pmatrix} a_{-3} \\ a_{-2} \\ a_{-1} \\ a_1 \end{pmatrix} = \frac{1}{12}\begin{pmatrix} -1 \\ 6 \\ -18 \\ 3 \end{pmatrix} \quad (3.38)$$
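The system (3.35) is small enough to solve numerically; the sketch below reproduces the coefficients (3.38) for the stencil m = -3, -2, -1, 1:

    import numpy as np

    m = np.array([-3.0, -2.0, -1.0, 1.0])   # stencil offsets (m = 0 excluded)
    q = np.arange(1, 5)                      # match derivatives of order 1..4
    V = m[None, :] ** q[:, None]             # V[q-1, j] = m_j^q, as in (3.37)
    b = np.array([1.0, 0.0, 0.0, 0.0])       # first derivative, fourth order
    a = np.linalg.solve(V, b)
    print(a * 12)                            # -> [-1, 6, -18, 3], matching (3.38)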
The second order derivative can be computed by noticing that

$$\delta_x u_{i+\frac12} = \frac{u_{i+1} - u_i}{\Delta x} = \left.u_x\right|_{i+\frac12} + O(\Delta x^2) \quad (3.42)$$

$$\delta_x(\delta_x u_i) = u_{xx} + O(\Delta x^2) \quad (3.43)$$

$$\frac{1}{\Delta x}\left[(\delta_x u)_{i+\frac12} - (\delta_x u)_{i-\frac12}\right] = u_{xx} + O(\Delta x^2) \quad (3.44)$$

$$\frac{u_{i+1} - 2u_i + u_{i-1}}{\Delta x^2} = u_{xx} + O(\Delta x^2) \quad (3.45)$$

The truncation error can be verified by going through the formal Taylor series analysis.
Another application of operator notation is the derivation of higher order formulae. For example, we know from the Taylor series that

$$\delta_{2x} u_i = u_x + \frac{\Delta x^2}{3!}\,u_{xxx} + O(\Delta x^4) \quad (3.46)$$

If we can estimate the third derivative to second order, we can substitute this estimate in the above formula to get a fourth order estimate. Applying the $\delta_x^2$ operator to both sides of the above equation we get:

$$\delta_x^2(\delta_{2x} u_i) = \delta_x^2\!\left(u_x + \frac{\Delta x^2}{3!}u_{xxx} + O(\Delta x^4)\right) = u_{xxx} + O(\Delta x^2) \quad (3.47)$$

Thus we have

$$\delta_{2x} u_i = u_x + \frac{\Delta x^2}{3!}\left[\delta_x^2\,\delta_{2x} u_i + O(\Delta x^2)\right] \quad (3.48)$$

Rearranging the equation we have:

$$\left.u_x\right|_{x_i} = \left(1 - \frac{\Delta x^2}{3!}\,\delta_x^2\right)\delta_{2x} u_i + O(\Delta x^4) \quad (3.49)$$

Expanding the operators yields the familiar fourth order centered difference formula.
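A short sketch verifying the resulting five-point formula, $u_x \approx (-u_{i+2} + 8u_{i+1} - 8u_{i-1} + u_{i-2})/(12\Delta x)$, on a periodic grid:

    import numpy as np

    M = 64
    dx = 2 * np.pi / M
    x = dx * np.arange(M)
    u = np.sin(x)

    ux4 = (-np.roll(u, -2) + 8*np.roll(u, -1)
           - 8*np.roll(u, 1) + np.roll(u, 2)) / (12 * dx)
    print(np.abs(ux4 - np.cos(x)).max())     # error decreases like dx^4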
3.4.2 Quadratic Fit

It is easily verified that the following quadratic interpolant fits the function values at the points $x_{i-1}$, $x_i$ and $x_{i+1}$:

$$L_2(x) = \frac{(x-x_i)(x-x_{i+1})}{2\Delta x^2}\,u_{i-1} - \frac{(x-x_{i-1})(x-x_{i+1})}{\Delta x^2}\,u_i + \frac{(x-x_{i-1})(x-x_i)}{2\Delta x^2}\,u_{i+1} \quad (3.52)$$
Differentiating this function and evaluating the result at $x_i$, we get expressions for the first and second derivatives:

$$\left.\frac{\partial L_2}{\partial x}\right|_{x_i} = \frac{u_{i+1} - u_{i-1}}{2\Delta x} \quad (3.53)$$

$$\left.\frac{\partial^2 L_2}{\partial x^2}\right|_{x_i} = \frac{u_{i+1} - 2u_i + u_{i-1}}{\Delta x^2} \quad (3.54)$$

Notice that these expressions are identical to the formulae obtained earlier. A Taylor series analysis would confirm that both expressions are second order accurate.
3.4.3 Higher order formulae

Higher order formulae can be developed from Lagrange polynomials of increasing degree. A word of caution: high order Lagrange interpolation is practical only when the evaluation point is in the middle of the stencil. High order Lagrange interpolation is notoriously noisy near the ends of the stencil when equal grid spacing is used, and leads to the well known problem of Runge oscillations [9]. Spectral methods that do not use periodic Fourier functions (the usual "sin" and "cos" functions) rely on unevenly spaced points.
Chapter 4

Application of Finite Differences to ODEs
4.1 Introduction
Here we show how an ODE may be obtained in the process of solving a partial differential equation numerically. Let us consider the problem of solving the following PDE:

$$u_t + c\,u_x = \nu\, u_{xx}, \quad 0 \le x \le L \quad (4.1)$$

subject to periodic boundary conditions. Equation (4.1) is an advection diffusion equation, with c being the advecting velocity and $\nu$ the viscosity coefficient. We will take c and $\nu$ to be positive constants. The two independent variables are t for time and x for space. Because of the periodicity, it is sensible to expand the unknown function in a Fourier series:
$$u(x,t) = \sum_{n=-\infty}^{\infty} \hat{u}_n(t)\, e^{ik_n x} \quad (4.2)$$

where $\hat{u}_n$ are the complex amplitudes and depend only on the time variable, whereas $e^{ik_n x}$ are the Fourier functions with wavenumber $k_n$. Because of the periodicity requirement we have $k_n = 2\pi n/L$, where n is an integer. The Fourier functions form what is called an orthonormal basis, and the amplitudes can be determined as follows: multiply the two sides of equation (4.2) by $e^{-ik_m x}$, where m is an integer, and integrate over the interval [0, L] to get:
$$\int_0^L u\, e^{-ik_m x}\, dx = \sum_{n=-\infty}^{\infty} \hat{u}_n(t)\int_0^L e^{i(k_n - k_m)x}\, dx \quad (4.3)$$
Now notice that the integral on the right hand side of equation (4.3) satisfies the orthogonality property:

$$\int_0^L e^{i(k_n-k_m)x}\, dx = \begin{cases} \dfrac{e^{i(k_n-k_m)L} - 1}{i(k_n-k_m)} = \dfrac{e^{i2\pi(n-m)} - 1}{i(k_n-k_m)} = 0, & n \neq m \\[2mm] L, & n = m \end{cases} \quad (4.4)$$
The role of the integration is to pick out the m-th Fourier component, since all the other integrals are zero. We end up with the following expression for the Fourier coefficients:

$$\hat{u}_m = \frac{1}{L}\int_0^L u(x)\, e^{-ik_m x}\, dx \quad (4.5)$$
Equation (4.5) allows us to calculate the Fourier coefficients of a known function u. Note that for a real function the Fourier coefficients satisfy

$$\hat{u}_{-n} = \hat{u}_n^* \quad (4.6)$$

where the superscript * stands for the complex conjugate. Thus only the positive Fourier components need to be determined; the negative ones are simply the complex conjugates of the positive components.
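A discrete analogue of (4.5) and (4.6) can be checked with the FFT; a sketch (the test function is an arbitrary choice):

    import numpy as np

    L, N = 2 * np.pi, 32
    x = L / N * np.arange(N)
    u = np.cos(2 * x) + 0.5 * np.sin(3 * x)    # arbitrary real test function

    u_hat = np.fft.fft(u) / N                  # mimics the (1/L) integral in (4.5)
    print(np.allclose(u_hat[-3], np.conj(u_hat[3])))   # True: conjugate symmetry (4.6)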
The Fourier series (4.2) can now be differentiated term by term to get expressions for the derivatives of u, namely:

$$u_x = \sum_{n=-\infty}^{\infty} ik_n\,\hat{u}_n\, e^{ik_n x} \quad (4.7)$$

$$u_{xx} = \sum_{n=-\infty}^{\infty} -k_n^2\,\hat{u}_n\, e^{ik_n x} \quad (4.8)$$

$$u_t = \sum_{n=-\infty}^{\infty} \frac{d\hat{u}_n}{dt}\, e^{ik_n x} \quad (4.9)$$
Replacing these expressions for the derivatives in the original equation and collecting terms, we arrive at the following equation:

$$\sum_{n=-\infty}^{\infty}\left[\frac{d\hat{u}_n}{dt} + (ick_n + \nu k_n^2)\,\hat{u}_n\right] e^{ik_n x} = 0 \quad (4.10)$$
Note that the above equation has to be satisfied for all x, and hence each of its Fourier amplitudes must vanish (just remember the orthogonality property (4.4), and replace u by zero). Each Fourier component can thus be studied separately, thanks to the linearity and the constant coefficients of the PDE.
The governing equation for the Fourier amplitude is now

$$\frac{d\hat{u}}{dt} = \underbrace{-(ick + \nu k^2)}_{\kappa}\,\hat{u} \quad (4.11)$$

where we have removed the subscript n to simplify the notation, and have introduced the complex number $\kappa$. The solution to this simple ODE is:

$$\hat{u} = \hat{u}_0\, e^{\kappa t} \quad (4.12)$$

where $\hat{u}_0 = \hat{u}(t = 0)$ is the Fourier amplitude at the initial time. Taking the ratio of the solution between times t and $t + \Delta t$, we can see the expected behavior of the solution between two consecutive times:

$$\frac{\hat{u}(t+\Delta t)}{\hat{u}(t)} = e^{\kappa\Delta t} = e^{Re(\kappa)\Delta t}\, e^{i\, Im(\kappa)\Delta t} \quad (4.13)$$

where $Re(\kappa)$ and $Im(\kappa)$ refer to the real and imaginary parts of $\kappa$. It is now easy to follow the evolution of the amplitude of the Fourier components:

$$|\hat{u}(t+\Delta t)| = |\hat{u}(t)|\, e^{Re(\kappa)\Delta t} \quad (4.14)$$

The analytical solution predicts an exponential decrease if $Re(\kappa) < 0$, an exponential increase if $Re(\kappa) > 0$, and a constant amplitude if $Re(\kappa) = 0$. The imaginary part of $\kappa$ influences only the phase of the solution, which changes by an amount $Im(\kappa)\Delta t$. We now turn to the issue of devising numerical solutions to the ODE.
The terms on the right hand side are the truncation errors of the forward Euler approximation. The formal definition of the truncation error is that it is the difference between the analytical and approximate representations of the differential equation. The leading error term (for sufficiently small $\Delta t$) is linear in $\Delta t$, and hence we expect the errors to decrease linearly. Most importantly, the approximation is consistent in that the truncation error goes to zero as $\Delta t \to 0$.
Given the initial condition $u(t = 0) = u_0$ we can advance the solution in time to get:

$$u^1 = (1 + \kappa\Delta t)u^0$$
$$u^2 = (1 + \kappa\Delta t)u^1 = (1 + \kappa\Delta t)^2 u^0$$
$$u^3 = (1 + \kappa\Delta t)u^2 = (1 + \kappa\Delta t)^3 u^0 \quad (4.17)$$
$$\vdots$$
$$u^n = (1 + \kappa\Delta t)u^{n-1} = (1 + \kappa\Delta t)^n u^0$$
Let us study what happens when we let $\Delta t \to 0$ for a fixed integration time $t_n = n\Delta t$. The only factor we need to worry about is the numerical amplification factor $(1 + \kappa\Delta t)^n$:

$$\lim_{\Delta t\to 0}(1 + \kappa\Delta t)^{\frac{t_n}{\Delta t}} = \lim_{\Delta t\to 0} e^{\frac{t_n}{\Delta t}\ln(1+\kappa\Delta t)} = \lim_{\Delta t\to 0} e^{\frac{t_n}{\Delta t}\left(\kappa\Delta t - \frac{\kappa^2\Delta t^2}{2} + \cdots\right)} = e^{\kappa t_n} \quad (4.18)$$

where we have used the logarithm Taylor series $\ln(1+\epsilon) = \epsilon - \frac{\epsilon^2}{2} + \cdots$, assuming that $\kappa\Delta t$ is small. Hence we have proven convergence of the numerical solution to the analytic solution in the limit $\Delta t \to 0$. The question is what happens for finite $\Delta t$?
Notice that in analogy to the analytic solution we can define an amplification factor associated with the numerical solution, namely:

$$A = \frac{u^n}{u^{n-1}} = |A|\, e^{i\theta} \quad (4.19)$$

where $\theta$ is the argument of the complex number A. The amplitude of A determines whether the numerical solution is amplifying or decaying, and its argument determines the change in phase. The numerical amplification factor should mimic the analytical amplification factor and should lead to an analogous increase or decrease of the solution. For small $\Delta t$ it can be seen that A is just the first term of the Taylor series expansion of $e^{\kappa\Delta t}$ and is hence only first order accurate. Let us investigate the magnitude of A in terms of $\kappa$, a problem parameter, and $\Delta t$, the numerical parameter; we have:

$$|A|^2 = AA^* = 1 + 2Re(\kappa)\Delta t + |\kappa|^2\Delta t^2 \quad (4.20)$$
We focus in particular on the condition under which the amplification factor is less than 1. The following condition then needs to be fulfilled (assuming $\Delta t > 0$):

$$\Delta t \le \frac{-2\, Re(\kappa)}{|\kappa|^2} \quad (4.21)$$

There are two cases to consider, depending on the sign of $Re(\kappa)$. If $Re(\kappa) > 0$ then $|A| > 1$ for $\Delta t > 0$, and the finite difference solution grows like the analytical solution. For $Re(\kappa) = 0$, the solution also grows in amplitude, whereas the analytical solution predicts a neutral amplification. If $Re(\kappa) < 0$, then $|A| > 1$ for $\Delta t > -2Re(\kappa)/|\kappa|^2$, whereas the analytical solution predicts a decay. The moral of the story is that the numerical solution can behave in unexpected ways. We can rewrite the amplification factor in the following form:

$$|A|^2 = AA^* = [Re(z) + 1]^2 + [Im(z)]^2 \quad (4.22)$$

where $z = \kappa\Delta t$. The above equation can be interpreted as the equation for a circle centered at $(-1, 0)$ in the complex plane with radius $|A|$. Thus z must be within the unit circle centered at $(-1, 0)$ for $|A| \le 1$.
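A quick numerical check of (4.20)-(4.21); the value of $\kappa$ is an arbitrary decaying-and-oscillating example:

    import numpy as np

    kappa = -1.0 + 2.0j                         # Re(kappa) < 0: decaying solution
    for dt in (0.1, 0.4, 0.5):
        A = 1 + kappa * dt                      # forward Euler amplification factor
        limit = -2 * kappa.real / abs(kappa)**2 # right hand side of (4.21)
        print(dt, abs(A), "stable" if dt <= limit else "unstable")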
each other, and invoking the linearity of the process, we can derive an equation for the evolution of the error in time, namely:

$$e^n = A\, e^{n-1} - T^{n-1}\Delta t \quad (4.27)$$
where $e^n = U^n - u^n$ is the total error at time $t_n = n\Delta t$. Reapplying this formula to $e^{n-1}$ transforms it to:

$$e^n = A\left(Ae^{n-2} - T^{n-2}\Delta t\right) - T^{n-1}\Delta t = A^2 e^{n-2} - \Delta t\left[A\,T^{n-2} + T^{n-1}\right] \quad (4.28)$$

where $A^2 = A\cdot A$, and the superscripts on T indicate time levels. Repeated application of this formula shows that:

$$e^n = A^n e^0 - \Delta t\left[A^{n-1}T^0 + A^{n-2}T^1 + \cdots + A\,T^{n-2} + T^{n-1}\right] \quad (4.29)$$
Equation (4.29) shows that the error at time level n depends on the initial error, on the history of the truncation error, and on the discretization through the factor A. We will now attempt to bound this error and show that this is possible if the truncation error can be made arbitrarily small (the consistency condition), and if the scheme is stable according to the definition shown above. A simple application of the triangle inequality shows that

$$|e^n| \le |A|^n|e^0| + \Delta t\left[|A|^{n-1}|T^0| + |A|^{n-2}|T^1| + \cdots + |A|\,|T^{n-2}| + |T^{n-1}|\right] \quad (4.30)$$
and the amplification factor is simply $A = 1/(1 - \kappa\Delta t)$. Its magnitude is given by $|A|^2 = 1/\left([1 - Re(\kappa\Delta t)]^2 + [Im(\kappa\Delta t)]^2\right)$, which is less than 1 whenever $Re(\kappa) \le 0$: backward Euler is stable for all $\Delta t$ when the analytical solution decays.
The trapezoidal scheme can be analyzed using simple Taylor series expansions about time n + 1/2. The truncation error is $O(\Delta t^2)$ and the method is hence second order accurate. It is implicit since $u^{n+1}$ is used in the evaluation of the right hand side. The unknown function can be updated as:

$$u^{n+1} = \frac{1 + \frac{\kappa\Delta t}{2}}{1 - \frac{\kappa\Delta t}{2}}\, u^n \quad (4.48)$$

The amplification factor is

$$A = \frac{1 + \frac{\kappa\Delta t}{2}}{1 - \frac{\kappa\Delta t}{2}}, \qquad |A|^2 = \frac{1 + Re(\kappa\Delta t) + \frac{|\kappa\Delta t|^2}{4}}{1 - Re(\kappa\Delta t) + \frac{|\kappa\Delta t|^2}{4}} \quad (4.49)$$
Figure 4.1: Solution of the oscillation equation using the forward (x), backward (+) and trapezoidal (o) schemes. The analytical solution is indicated by a red asterisk.
Figure 4.2: Phase errors for several two-level schemes when $Re(\kappa) = 0$. The forward and backward differencing schemes (FD and BD) have the same decelerating relative phase. The trapezoidal scheme (TZ) has lower phase error for a given $\kappa\Delta t$ than the two first order schemes. The Runge-Kutta schemes of order 2 and 3 are accelerating. The best performance is for RK4, which stays closest to the analytical curve for the largest portion of the spectrum.
In general it is hard to get a simple formula for the phase error, since the expressions often involve $\tan^{-1}$ functions with complicated arguments. Figure 4.2 shows the relative phase as a function of $\kappa\Delta t$ for the case $Re(\kappa) = 0$ for several time integration schemes. The solid black line (R = 1) is the reference for an exact phase. The forward, backward and trapezoidal differencing have negative phase errors (and hence the schemes are decelerating), while the RK schemes (to be presented below) have an accelerating phase.
A family of second order Runge-Kutta schemes can be obtained by varying $b_2$. Two common choices are:

Midpoint rule, with $b_2 = 1$, so that $b_1 = 0$ and $a_{21} = c_2 = \frac12$. The scheme becomes:

$$u^{(1)} = u^n + \frac{\Delta t}{2}\, f(u^n, t_n) \quad (4.71)$$

$$u^{n+1} = u^n + \Delta t\, f\!\left(u^{(1)}, t_n + \frac{\Delta t}{2}\right) \quad (4.72)$$

The first stage of the midpoint rule is a forward Euler half step, followed by a centered approximation at the mid-time level.
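A sketch of the midpoint rule (4.71)-(4.72) applied to the linear test problem $u' = \kappa u$ (the values of $\kappa$ and $\Delta t$ are arbitrary):

    import numpy as np

    kappa, dt, nsteps = 1j, 0.1, 100      # purely oscillatory test case
    f = lambda u, t: kappa * u

    u, t = 1.0 + 0.0j, 0.0
    for n in range(nsteps):
        u1 = u + 0.5 * dt * f(u, t)           # forward Euler half step
        u = u + dt * f(u1, t + 0.5 * dt)      # centered evaluation at mid-level
        t += dt

    print(abs(u - np.exp(kappa * t)))         # global error is O(dt^2)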
Figure 4.3: Stability regions for the Runge-Kutta methods of order 2, 3, and 4 (left figure), and for the Adams-Bashforth schemes of order 2 and 3 (right figure). The RK2 and AB2 stability curves are tangent to the imaginary axis at the origin, and hence these methods are not stable for purely imaginary $\kappa\Delta t$.
5. A new family of Runge-Kutta schemes was devised in recent years to cope with the requirements of Total Variation Diminishing (TVD) schemes. For second order methods, the Heun scheme is TVD. The third order TVD Runge-Kutta scheme is:

$$q_1 = \Delta t\, f(u^n, t_n), \qquad u^{(1)} = u^n + q_1$$
$$q_2 = \Delta t\, f(u^{(1)}), \qquad u^{(2)} = \tfrac{3}{4}u^n + \tfrac{1}{4}u^{(1)} + \tfrac{1}{4}q_2 \quad (4.77)$$
$$q_3 = \Delta t\, f(u^{(2)}), \qquad u^{n+1} = \tfrac{1}{3}u^n + \tfrac{2}{3}u^{(2)} + \tfrac{2}{3}q_3$$
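A sketch of the scheme in (4.77), written for an autonomous right hand side and exercised on the linear test problem:

    import numpy as np

    def tvd_rk3_step(f, u, dt):
        q1 = dt * f(u)
        u1 = u + q1                                   # first stage: full Euler step
        q2 = dt * f(u1)
        u2 = 0.75 * u + 0.25 * u1 + 0.25 * q2         # second stage
        q3 = dt * f(u2)
        return u / 3.0 + 2.0 / 3.0 * u2 + 2.0 / 3.0 * q3

    kappa, dt = 1j, 0.1
    u = 1.0 + 0.0j
    for n in range(100):
        u = tvd_rk3_step(lambda v: kappa * v, u, dt)
    print(abs(u - np.exp(kappa * 10.0)))              # third order global error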
The determination of the amplification factor is complicated by the fact that two time levels are involved in the calculation. Nevertheless, let us assume that the amplification factor is the same for each time step, i.e. $u^n = Au^{n-1}$ and $u^{n+1} = Au^n$. We then arrive at a quadratic equation for A with two roots $A_\pm$, so that the numerical solution is capable of behaving in two different ways, or modes. The mode associated with $A_+$ is referred to as the physical mode because it approximates the solution of the original differential equation. The mode associated with $A_-$ is referred to as the computational mode; in the limit $\Delta t \to 0$ it tends to $-1$, and hence the computational mode is expected to keep its amplitude but switch sign at every time step. Applying the leap-frog scheme we see that all even time levels will have the correct value: $u^2 = u^4 = \ldots = u^0$. The odd time levels will be contaminated by the error in estimating the second initial condition needed to jump start the calculations. If $u^1 = u^0 + \epsilon$, where $\epsilon$ is the initial error committed, the solution at all odd time levels will then be $u^{2n+1} = u^0 + \epsilon$. The numerical solution for the present simple case can be written entirely in terms of the physical (initial condition) and computational (initial condition error) modes:

$$u^n = u^0 + \frac{\epsilon}{2} - (-1)^n\,\frac{\epsilon}{2} \quad (4.82)$$
Absolute stability requires that $|A| \le 1$; notice however that the product of the two roots is $A_+A_- = -1$, which implies that $|A_+||A_-| = 1$. Hence, if one root, say $A_+$, is stable with $|A_+| < 1$, the other one must be unstable with $|A_-| = 1/|A_+| > 1$; the only exception is when both amplification factors have a neutral amplification, $|A_+| = |A_-| = 1$. For real z, $Im(z) = 0$, one of the two roots has modulus exceeding 1, and the scheme is always unstable. Let us for a moment assume that $z = i\alpha$; we then have $A_\pm = i\alpha \pm \sqrt{1 - \alpha^2}$. If $\alpha \le 1$ then the quantity under the square root sign is positive and we have two roots such that $|A_+| = |A_-| = 1$. To make further progress in studying the stability of the leap frog scheme, let $z = \sinh(w)$ where w is a complex number. Using the identity $\cosh^2 w - \sinh^2 w = 1$ we arrive at the expression $A_\pm = \sinh w \pm \cosh w$. Setting $w = a + ib$ where a, b are real, substituting in the previous expression for the amplification factor, and calculating its modulus, we get $|A_\pm| = e^{\pm a}$. Hence a = 0 for both amplification factors to be stable. The region of stability is hence $z = i\sin b$ with b real, and is confined to the unit slit along the imaginary axis $|Im(z)| \le 1$.
The leap frog scheme is a popular scheme for integrating PDEs of primarily hyperbolic type, in spite of the existence of the computational mode. The reason lies primarily in its neutral stability and good phase properties. The control of the computational mode can be achieved effectively either with an Asselin time filter (see [13]) or by periodically discarding the solution at level n - 1 and taking a two time level scheme.
Multi-Step schemes

A family of multi-step schemes can be built by interpolating the right hand side of the ODE in the interval $[t_n, t_{n+1}]$ and performing the integral. The derivation starts from the exact solution of the ODE:

$$u^{n+1} = u^n + \int_{t_n}^{t_{n+1}} f(u, t)\, dt \quad (4.83)$$
Since the integrand is unknown in $[t_n, t_{n+1}]$, we need to find a way to approximate it given information at specific time levels. A simple way to achieve this is to use a polynomial that interpolates the integrand at the points $(t_k, u^k)$, $n - p \le k \le n$, where the solution is known. If we write:

$$f(u,t) = V_p(t) + E_p(t) \quad (4.84)$$

where $V_p$ is the polynomial approximation and $E_p$ the error associated with it, then the numerical scheme becomes:

$$u^{n+1} = u^n + \int_{t_n}^{t_{n+1}} V_p(t)\, dt + \int_{t_n}^{t_{n+1}} E_p(t)\, dt \quad (4.85)$$

If the integration of $V_p$ is performed exactly, then the only approximation errors present are due to the integration of the interpolation error term; this term can be bounded by $\max(|E_p|)\Delta t$.
The explicit family of Adams-Bashforth schemes relies on Lagrange interpolation. Specifically,

$$V_p(t) = \sum_{k=0}^{p} h_k^p(t)\, f^{n-k} \quad (4.86)$$

$$h_k^p(t) = \prod_{\substack{m=0\\ m\neq k}}^{p} \frac{t - t_{n-m}}{t_{n-k} - t_{n-m}} \quad (4.87)$$

$$= \frac{t - t_n}{t_{n-k} - t_n} \cdots \frac{t - t_{n-(k-1)}}{t_{n-k} - t_{n-(k-1)}}\;\frac{t - t_{n-k-1}}{t_{n-k} - t_{n-k-1}} \cdots \frac{t - t_{n-p}}{t_{n-k} - t_{n-p}} \quad (4.88)$$

It is easy to verify that $h_k^p(t)$ is a polynomial of degree p in t, and that $h_k^p(t_{n-m}) = 0$ for $m \neq k$ while $h_k^p(t_{n-k}) = 1$. These last two properties ensure that $V_p(t_{n-k}) = f^{n-k}$. The error associated with Lagrange interpolation with p + 1 points is $O(\Delta t^{p+1})$. Inserting the expression for $V_p$ in the numerical scheme, we get:

$$u^{n+1} = u^n + \sum_{k=0}^{p} f^{n-k}\int_{t_n}^{t_{n+1}} h_k^p(t)\, dt + \Delta t\, O(\Delta t^{p+1}) \quad (4.89)$$

Note that the error appearing in the above formula is only the local error; the global error is one order less, i.e. it is $O(\Delta t^{p+1})$.
The second order Adams-Bashforth scheme (AB2) is based on the linear interpolant through $(t_{n-1}, f^{n-1})$ and $(t_n, f^n)$:

$$V_1(t) = \frac{t - t_{n-1}}{t_n - t_{n-1}}\, f^n + \frac{t - t_n}{t_{n-1} - t_n}\, f^{n-1}$$

The integral over the interval $[t_n, t_{n+1}]$ is

$$\int_{t_n}^{t_{n+1}} V_1(t)\, dt = \int_{t_n}^{t_{n+1}} \frac{t - t_{n-1}}{t_n - t_{n-1}}\, dt\; f^n + \int_{t_n}^{t_{n+1}} \frac{t - t_n}{t_{n-1} - t_n}\, dt\; f^{n-1} \quad (4.91)$$

$$= \Delta t\left(\frac{3}{2} f^n - \frac{1}{2} f^{n-1}\right) \quad (4.92)$$

The final expression for the second order Adams-Bashforth formula is:

$$u^{n+1} = u^n + \Delta t\left(\frac{3}{2} f^n - \frac{1}{2} f^{n-1}\right) + O(\Delta t^3) \quad (4.93)$$
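A sketch of AB2, jump-started with one forward Euler step (the starting method and parameters are arbitrary choices):

    import numpy as np

    kappa, dt, nsteps = -1.0, 0.1, 100
    f = lambda u: kappa * u

    u_prev = 1.0
    u = u_prev + dt * f(u_prev)          # starting method: one forward Euler step
    for n in range(1, nsteps):
        u, u_prev = u + dt * (1.5 * f(u) - 0.5 * f(u_prev)), u

    print(abs(u - np.exp(kappa * dt * nsteps)))   # second order global error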
A third order formula can be designed similarly. Starting with the quadratic interpolation polynomial $V_2(t)$:

$$V_2(t) = \frac{[t-t_{n-1}][t-t_{n-2}]}{[t_n-t_{n-1}][t_n-t_{n-2}]}\, f^n + \frac{[t-t_n][t-t_{n-2}]}{[t_{n-1}-t_n][t_{n-1}-t_{n-2}]}\, f^{n-1} + \frac{[t-t_n][t-t_{n-1}]}{[t_{n-2}-t_n][t_{n-2}-t_{n-1}]}\, f^{n-2} \quad (4.94)$$

Its integral can be evaluated and plugged into equation (4.89) to get:

$$u^{n+1} = u^n + \Delta t\left(\frac{23}{12} f^n - \frac{16}{12} f^{n-1} + \frac{5}{12} f^{n-2}\right) \quad (4.95)$$
The stability of the AB2 scheme can be easily determined for the sample problem. The amplification factors are the roots of the equation

$$A^2 - \left(1 + \frac{3z}{2}\right)A + \frac{z}{2} = 0 \quad (4.96)$$

and the two roots are:

$$A_\pm = \frac{1}{2}\left[\left(1 + \frac{3}{2}z\right) \pm \sqrt{\left(1 + \frac{3}{2}z\right)^2 - 2z}\;\right]. \quad (4.97)$$
Like the leap frog scheme, AB2 suffers from the existence of a computational mode. In the limit of good resolution, $z \to 0$, we have $A_+ \to 1$ and $A_- \to 0$; that is, the computational mode is heavily damped. Figure 4.4 shows the moduli of the physical and computational modes for $Re(z) = 0$ and $Im(z) < 1$. The modulus of the computational mode's amplification factor is quite small for the entire range of z considered. On the other hand, the physical mode is unstable for purely imaginary z, as the modulus of its amplification factor exceeds 1. Note however that a series expansion of $|A_+|$ for small $z = i\kappa\Delta t$ shows that $|A_+| = 1 + (\kappa\Delta t)^4/4$, and hence the instability grows very slowly for sufficiently small $\Delta t$.
62 CHAPTER 4. APPLICATION OF FINITE DIFFERENCES TO ODE
1.6
1.4
1.2
|A±|
0.8
0.6
0.4
0.2
0
0 0.2 0.4 0.6 0.8 1
κ∆ t
Figure 4.4: Modulus of amplication factor for the physical and computational modes
of AB2 when Re() = 0.
can be anticipated that AB3 will have one physical mode and two computational modes
since its stability analysis leads to a third order equation for the amplication factor.
Like AB2, AB3 strongly damps the two computational modes it has the added benet
of providing conditional stability for Im(z ) 6= 0. The complete stability regions for AB2
and AB3 is shown in the right panel of gure 4.3.
Like all multilevel schemes there are some disadvantages to Adams Bashforth meth-
ods. First a starting method is required to jump start the calculations. Second the
stability region shrinks with the order of the method. The good news is that although
AB2 is unstable for imaginary , its instability is small and tolerable for nite integration
time. The third order Adams Bashforth scheme on the other hand includes portion of
the imaginary axis, which makes AB3 quite valuable for the integration of advection like
operators. The main advantage of AB schemes over Runge-Kutta is that they require
but one evaluation of the right hand side per time step and use a similar amount of
storage.
2 2 3 3 4 4
un; k+1 = un+1 ; (kt) du + (kt) d u ; (kt) d u + (kt) d u + : : : (4.98)
1! dt 2! dt2 3! dt3 4! dt4
4.8. STRONGLY STABLE SCHEMES 63
p a1 a2 a3 a4 Ppk=1 kak
1 1 1
2 4=3 ;1=3 2 =3
3 18=11 ;9=11 2=11 6=11
4 48=25 ;36=25 16=25 ;3=25 12=25
11
;
11 t
64 CHAPTER 4. APPLICATION OF FINITE DIFFERENCES TO ODE
4
3.5
2.5
1.5
0.5
Figure 4.5: Stability regions for the Backward Dierencing schemes of order 1, 2 and 3.
The schemes are unstable within the enclosed region and stable everywhere else. The
instability regions grow with the order. The stability regions are symmetric about the
real axis
48 un ; 36 un 1 + 16 un 2 ; 3 un 3 + + 12 t u jn+1 + O(t4 )(4.106)
un+1 = 25 t
; ; ;
25 25 25 25
Notice that we have shown explicitly the time level at which the time derivative is
approximated. The BDF's scheme lead to implicit expressions to update the solution at
time level un+1 . Like the Adams-Bashforth formula the BDF schemes require a starting
method. They also generate computational modes whose number depends on how many
previous time levels have been used. Their most important advantage is their stability
regions in the complex plane which are much larger then equivalent explicit schemes.
is hence naturally damped. Figure 4.5 shows the stability regions for the BDF schemes.
The contours of jAj = 1 are shown in the gure. The schemes are unstable within the
regions shown and stable outside it. The instability region grows with increasing order.
4.9. SYSTEMS OF ODES 65
67
68 CHAPTER 5. NUMERICAL SOLUTION OF PDE'S
Equation 5.4 provides a simple formula for updating the solution at time level n +1 from
the values at time n:
unj +1 = (1 ; )unj + unj 1 where = cxt
;
(5.5)
The variable is known as the Courant number and will gure prominently in the study
of the stability of the scheme. Equation 5.5 can be written as a matrix operation in the
following form:
0 u 1n+1 0 1 10 u 1n
1
B
B u 2 C
C BB 1 ; CC BB u12 CC
B
B .. C
C BB ... ... CC BB .. CC
B
B . C
C BB CC BB . CC
B
B u j 1 C
C BB 1; CC BB uj 1 CC
B C B 1; CC BB uj CC
; ;
B uj C = BB
B
B uj+1 C
C BB 1; CC BB uj+1 CC
B
B .. C
C B ... ... CC BB .. CC
B
B . C
C BB CC BB . CC
@ uN 1 A @ 1; A @ uN 1 A
1;
; ;
uN uN
(5.6)
where we have assumed that the boundary condition is given by u(x1 t) = u0 (x1 ).
The following legitimate question can now be asked:
1. Consistency: Is the discrete equation (5.4) a correct approximation to the con-
tinuous form, eq. (5.3), and does this discrete form reduce to the PDE in the limit
of t x ! 0.
2. Convergence Does the numerical solution unj ! vjn as t x ! 0.
3. Errors What are the errors committed by the approximation, and how should one
expect them to behave as the numerical resolution is increased.
4. Stability Does the numerical solution remained bounded by the data specifying
the problem? or are the numerical errors increasing as the computations are carried
out.
We will now turn to the issue of dening these concepts more precisely, and hint to
the role they play in devising nite dierence schemes. We will return to the issue of
illustrating their applications in practical situations later.
5.1.1 Convergence
Let enj = unj ; vjn denote the error between the numerical and analytical solutions of the
PDE at time nt and point j x. If this error tends to 0 as the grid and time steps are
decreased, the nite dierence solution converges to the analytical solution. Moreover,
a nite dierence scheme is said to be convergent of order (p q) if kek = O(tp xq ) as
t x ! 0.
5.2. TRUNCATION ERROR 69
en 1 = Aen 2 + zn 2 t
; ; ;
(5.16)
5.3. THE LAX RICHTMEYER THEOREM 71
where we have assumed that the matrix A does not change with time to simplify the dis-
cussion (this is tantamount to assuming constant coecients for the PDE). By repeated
application of this argument we get:
en = A2 en 2 + Azn 2 + zn 1 t
; ; ;
(5.17)
= A3 en 3 + A2 zn 3 + Azn 2 + zn 1 t
; ; ; ;
(5.18)
..
.
= An e0 + An z0 + An 1 z1 + : : : + Azn 2 + zn 1 t
; ; ;
(5.19)
Equation 5.19 shows that the error growth depends on the truncation error at all time
levels, and on the discretization through the matrix A. We can use the triangle inequality
to get a bound on the norm of the error. Thus,
kenk kAnk ke0 k + kAnk kz0 k + kAn 1k kz1 k + : : : + kAk kzn 2 k + kzn 1 k t
; ; ;
(5.20)
In order to make further progress we assume that the norm of the truncation error at
any time is bounded by a constant such that
= 0 max
m n 1
(kzm k)
;
(5.21)
where C is a constant independent of n, t and x. The sum in bracket can be bounded
by the factor nC the nal expression becomes:
kenk C ke0 k + tn (5.24)
where tn = nt is the nal integration time. When x ! 0, the initial error ken k can be
made as small as desired. Furthermore, by consistency, the truncation error ! 0 when
t x ! 0. The global error is hence guarateed to go to zero as the computational grid
is rened, and the scheme is convergent.
72 CHAPTER 5. NUMERICAL SOLUTION OF PDE'S
5.4 The Von Neumann Stability Condition
The sole requirements we have put on the scheme for convergence are consistency and
stability. The latter took the form:
kAm k C (5.25)
where C is independent of t, x and n. By the matrix norm properties we have:
kAm k kAkm (5.26)
hence it is sucient to require that kAkm C , or that
kAk C m1 = e tmt ln C = 1 + lnt C t + : : : = 1 + O(t) (5.27)
m
The Von neumann stability condition is hence that
kAk 1 + O(t) (5.28)
Note that this stability condition does not make any reference on whether the con-
tinuous (exact) solution grows or decays in time. Furthermore, the stability condition is
established for nite integration times with the limit t ! 0. In practical computations
the computations are necessarily carried out with a small but nite t, and it is fre-
quently the case that the evolution equation puts a bound on the growth of the solution.
Since the numerical solution and its errors are subject to the same growth factors via the
matrix A, it is reasonable, and in most cases essential to require the stronger condition
kAk 1 for stability for non growing solutions.
A nal practical detail still needs to be ironed out, namely what norm should be
used to measure the error? From the properties of the matrix norm it is immediately
clear that the spectral radius (A) kAk, hence (A) 1 is a necessary condition
for stability but not sucient. There are classes of matrices A where it is sucient,
for example those that posess a complete set of linear eigenvectors such as those that
arise from the discretization of hyperbolic equation. If the 1 or 1-norms are used the
condition for stability becomes sucient.
Example 10 In the case of the advection equation, the matrix A given in equation 5.6
has norm:
kAk1 = jAk = jj + j1 ; j
1 (5.29)
For stability we thus require that jj + j1 ; j 1. Two cases need to be considered:
1. 0 1: kAk = + 1 ; = 1, stable.
2. < 0: kAk = 1 ; 2 > 1, unstable.
3. > 1: kAk = 1 + 2 > 1, unstable.
The scheme is hence guaranteed to converge when 0 1.
5.5. VON NEUMANN STABILITY ANALYSIS 73
with the following expression for the growth of the Fourier coecients:
h i n
u^nk +1 = (1 ; ) + e ikx u^ (5.32)
} k
;
| {z
A
The expression in bracket is nothing but the amplication factor for Fourier mode k.
Stability requires that jAj < 1 for all k.
h ih i
jAj2 = AA = (1 ; ) + e ikx (1 ; ) + eikx
;
(5.33)
= (1 ; )2 + (1 ; )(eikx + e ikx ) + 2 ;
(5.34)
= 1 ; 2 + 2(1 ; ) cos kx + 22 (5.35)
= 1 ; 2(1 ; cos kx) + 22 (1 ; cos kx) (5.36)
= 1 ; 4 sin2 k2 x (1 ; ) (5.37)
It is now clear that jAj2 1 if (1 ; ) > 0, i.e. 0 1. It is the same stability
criterion derived via the matrix analysis procedure.
The third derivative term is indicative of the presence of dispersive errors in the numerical
solution the magnitude of these errors is amplied by the coecient multiplying the third
order derivative. This coecient is always negative in the stability region 0 1.
One can expect a lagging phase error with respect to the analytical solution. Notice also
that the coecients of the higher order derivative on the right hand side term go to zero
for = 1. This \ideal" value for the time step makes the scheme at least third order
accurate according to the modied equation in fact it is easy to convince one self on the
basis of the characteristic analysis that the exact solution is recovered.
Notice that the derivation of the modied equation uses the Taylor series form of
the nite dierence scheme, equation 5.9, rather then the original partial dierential
equations to derive the estimates for the high order derivative. This is essential to
account for the discretization errors. The book by Tannehill et all 1997 discusses a
systematic procedure for deriving the higher order terms in the modied equation.
76 CHAPTER 5. NUMERICAL SOLUTION OF PDE'S
Chapter 6
For c > 0 the scheme simplies to a FTBS, and for c < 0 it becomes a FTFS (forward
time and forward space) scheme. Here we will consider solely the case c > 0 to simplify
things. Figure 6.1 shows plots of the amplication factor for the donor cell scheme. Prior
to discussing the gures we would like to make the following remarks.
6.2.1 Remarks
1. The scheme is conditionally stable since the time step cannot be chosen inde-
pendently of the spatial discretization and must satisfy t tmax = c=x.
77
78 CHAPTER 6. NUMERICAL SOLUTION OF THE ADVECTION EQUATION
|A| for FTBS µ = 0.25, 0.5, and 0.75
1
0.9
0.8 µ=0.25,0.75
µ=0.50
0.7
0.6
|A|
0.5
0.4
0.3
0.2
0.1
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
k ∆ x/π
1.4
µ=0.75
1.2
µ=0.5,1.0
1
0.8
a
Φ/Φ
0.6
0.4
0.2 µ=0.25
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
k ∆ x/π
Figure 6.1: Amplitude and phase diagram of the donor cell scheme as a function of the
wavenumber
6.2. DONOR CELL SCHEME 79
2. The wavelength appearing in the Von-Neumann stability analysis has not been
specied yet. Small values of k correspond to very long wavelegths, i.e. Fourier
modes that are well represented on the computational grid. Large values of k corre-
spond to very short wavelength. This correspondence is evident by the expression
kx = 2x=, where is the wavelength of the Fourier mode. For example, a
twenty kilometers wave represented on a grid with x = 2 kilometers would have
10 points per wavelength and its kx = 22=10 = 2=5.
3. There is an lower limit on the value of the shortest wave representable on a discrete
grid. This wave has a wavelength equal to 2x and takes the form of a see-saw
function its kx = . Any wavelength shorter then this limit will be aliased
into a longer wavelegth. This phenomenon is similar to the one encountered in the
Fourier analysis of time series where the Nyquist limit sets a lower bound to the
smallest measurable wave period.
4. In the previous chapter we have focussed primarily on the magnitude of the ampli-
cation factor as it is the one that impacts the issue of stability. However, additional
information is contained in the expression for the amplication factor that relates to
the dispersive properties of the nite dierence scheme. The analytical expression
for the amplication factor for a Fourier mode is
Aa = e ;ickt: (6.2)
Thus the analytical solution expects a unit amplication per time step, jAa j = 1,
and a change of phase of a = ;ckt = ;kx. The amplication factor for the
donor cell scheme is however:
A = jAjei (6.3)
jAj = 1 ; (1 ; )4 sin2 k2 x (6.4)
= tan 1 1 ; (1sin;kcos
;
x
kx) (6.5)
where is the argument of the complex number A. The ratio of =a gives the
relative error in the phase. A ratio less then 1 means that the numerical phase
error is less then the analytical one, and the scheme is decelerating, while a ratio
greater then indicates an accelerating scheme. We will return to phase errors later
when we look at the dispersive properties of the scheme.
5. The donor cell scheme for c positive can be written in the form:
unj +1 = (1 ; )unj + unj ; 1 (6.6)
which is a linear, convex (for 0 1), combination of the two values at the
previous time levels upstream of the point (j n). Since the two factors are positive
we have
min(unj unj 1 ) unj +1 max(unj unj 1 )
; ;
(6.7)
80 CHAPTER 6. NUMERICAL SOLUTION OF THE ADVECTION EQUATION
In plain words the value at the next time level cannot exceed the maximum of the
two value upstream, nor be less then the minimum of these two values. This is
referred to as the monotonicity property. It plays an important role in devising
scheme which do not generate spurious oscillation because of under-resolved gra-
dients. We will return to this point several times when discussing dispersive errors
and special advection schemes.
Figure 6.1 shows jAj and =a for the donor cell scheme as a function of kx for
several values of the Courant number . The long waves (small kx are damped the
least for 0 1 whereas high wave numbers kx ! 0 are damped the most. The
most vigorous damping occurs for the shortest wavelength for = 1=2, where the donor-
cell scheme reduces to an average of the two upstream value, the amplication factor
magnitude is then jAj = 0, i.e. 2x waves are eliminated after a single time step. The
amplication curves are symmetric about = 1=2, and damping lessens as becomes
smaller for a xed wavelength. The dispersive errors are small for long waves they are
decelerating for all wavelengths for < 1=2 and accelerating for 1=2 1 they reach
their peak acceleration for = 3=4.
6.3.1 Remarks
1. truncation error The Taylor series analysis (expansion about time level n + 1)
leads to the following equation:
2 2 !
t c x t
ut + cux = ; ; 2 utt + 3 uxxx + 6 uttt + O(t3 x4) (6.9)
The leading truncation error term is O(t x2 ), and hence the scheme is rst
order in time and second order in space. Moreover, the truncation error goes to
zero for t x ! 0, and hence the scheme is consistent.
2. The Von Neumann stability analysis leads to the following amplication factor:
A = 11+;i2 sin
sin kx
2 k x (6.10)
jAj = q 1 < 1 for all kx (6.11)
1 + 2 sin2 kx
= tan 1 (; sin kx)
;
(6.12)
a ;kx
6.3. BACKWARD TIME CENTERED SPACE (BTCS) 81
|A| for BTCS
1
µ=0.25
0.9 µ=0.50
0.8 µ=0.75
µ=1.00
|A|
0.7
0.6
0.5
µ=2.00
0.4
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
k ∆ x/π
1
0.9
0.25
0.8
0.7 0.50
0.6
0.75
a
Φ/Φ
0.5
0.4 1.00
0.3
2.00
0.2
0.1
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
k ∆ x/π
Figure 6.2: Amplitude and phase diagrams of the BTCS scheme as a function of the
wavenumber
82 CHAPTER 6. NUMERICAL SOLUTION OF THE ADVECTION EQUATION
The scheme is unconditionally stable since jAj < 1 irrespective of the time step
t. By the Lax-Richtmeyer theorem the consistency and stability of the scheme
guarantee it is also convergent.
3. The modied equation for BTCS is
2 2 3 !
ut + cux = c 2t uxx ; c6x + c6 t2 uxxx + : : : (6.13)
The numerical viscosity is hence always positive and lends the scheme its stable
and damping character. Notice that the damping increasing with increasing c and
t.
4. Notice that the scheme cannot update the solution values a grid point at a time,
since the values unj +1 and unj +11 are unknown and must be determined simultane-
ously. This is an example of an implicit scheme which requires the inversion of a
system of equation. Segregating the unknowns on the left hand side of the equation
we get:
; 2 unj +11 + unj +1 + 2 unj+1
;
+1 = un
j (6.14)
which consititutes a matrix equation for the vector of unknowns at the next time
level. The equation in matrix forms are:
0 1 0 u1 1n+1 0 u1 1n
BB CC BB ... CC BB ... CC
BB CC BB u CC BB CC
BB
;
2 1
2 CC BB j 1 CC BB uj 1 CC
CC BB uj CC = BB uj CC
; ;
BB ;
2 1 (6.15)
2
BB 0 2 1 2
;
CC BB uj+1 CC BB uj+1 CC
@ A B@ ... CA B@ ... CA
uN uN
The special structure of the matrix is that the only non-zero entries are those along
the diagonal, and on the rst upper and lower diagonals. This special structure is
referred to as a tridiagonal matrix. Its inversion is far cheaper then that of a full
matrix and can be done in O(N ) addition and multiplication through the Thomas
algorithm for tridiagonal matrices in contrast, a full matrix would require O(N 3 )
operations. Finally, the rst and last rows of the matrix have to be modied to
take into account boundary conditions. We will return to the issue of boundary
conditions later.
Figures 6.2 shows the magnitude of the amplication factor jAj for several Courant
numbers. The curves are symmetric about kx = =2. The high and low wave
numbers are the least damped whereas the intermediate wave numbers are the
most damped. The departure of jAj from 1 deteriorates with increasing . Finally
the scheme is decelarating for all wavenumbers and Courant numbers, and the
deceleration deteriorates for the shorter wavelengths.
6.4. CENTERED TIME CENTERED SPACE (CTCS) 83
1 1.00
0.9
0.25
0.8 0.50
0.75
0.7
0.6
a
Φ/Φ
0.5
0.4
0.3
0.2
0.1
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
k ∆ x/π
Figure 6.3: Phase diagrams of the CTCS scheme as a function of the wavenumber
2t + c 2x = 0 ;
(6.16)
6.4.1 Remarks
1. truncation error The Taylor series analysis leads to the following equation:
2 2 !
t c x
ut + cux = ; ; 3 uttt + 3 uxxx + O(t4 x4 ) (6.17)
The leading truncation error term is O(t2 x2 ), and hence the scheme is second
order in time and space. Moreover, the truncation error goes to zero for t x !
0, and hence the scheme is consistent.
84 CHAPTER 6. NUMERICAL SOLUTION OF THE ADVECTION EQUATION
2. The Von Neumann stability analysis leads to a quadratic equation for the ampli-
cation factor: A2 + 2 sin kxA ; 1 = 0. Its two solutions are:
q
A = ;i sin kx 1 ; 2 sin2 kx
(6.18)
jA j = 1 for all jj < 1
(6.19)
= 1 tan 1 q ; sin kx (6.20)
;kx
;
a 1 ; 2 sin2 kx
The scheme is conditionally stable for jj < 1, and its amplication is neutral
since jAj = 1 within the stability region. An attribute of the CTCS scheme is
that its amplication factor mirror the neutral amplication of the analytical solu-
tion. By the Lax-Richtmeyer theorem the consistency and stability of the scheme
guarantee it is also convergent.
3. The modied equation for CTCS is
2 x4 (94 ; 102 + 1)u
ut + cux = c6x (2 ; 1)uxxx ; c120 xxxxx + : : : (6.21)
The even derivative are absent from the modied equation indicating the total
absence of numerical dissipation. The only errors are dispersive in nature due to
the presence of odd derivative in the modied equation.
4. The model requires a starting procedure to kick start the computations. It also
has a computational mode that must be damped.
5. Figures 6.3 shows the phase errors for CTCS for several Courant numbers. All
wave numbers are decelerating and the shortest wave are decelerated more then
the long waves.
0.9
µ=0.25
0.8
0.7
0.6
|A|
0.5 µ=0.50
0.4
0.3
0.2
µ=0.75
0.1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
k ∆ x/π
1.4
1.2
1 1.00
0.25 0.75
0.50
0.8
a
Φ/Φ
0.6
0.4
0.2
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
k ∆ x/π
Figure 6.4: Amplitude and phase diagrams of the Lax-Wendro scheme as a function of
the wavenumber
86 CHAPTER 6. NUMERICAL SOLUTION OF THE ADVECTION EQUATION
All that remains to be done is to use high order approximations for the spatial derivatives
ux and ux x. We use centered derivatives for both terms as they are second order accurate
to get the nal expression:
unj +1 ; unj unj+1 ; unj 1 c2 t unj+1 ; 2unj + unj 1
t + c 2x + 2 =0 (6.25)
; ;
x2
6.5.1 Remarks
1. truncation error The Taylor series analysis (expansion about time level n + 1
leads to the following equation:
2 2 !
t c x
ut + cux = ; ; 3 uttt + 3 uxxx + O(t4 x4 ) (6.26)
The leading truncation error term is O(t2 x2 ), and hence the scheme is second
order in time and space. Moreover, the truncation error goes to zero for t x !
0, and hence the scheme is consistent.
2. The Von Neumann stability analysis leads to:
A = 1 ; 2 (1 ; cos kx) ; i sin kx (6.27)
jAj = !1 ; (1 ; cos kx)] + sin kx
2 2 2 2 2 (6.28)
1 1 ; sin kx
= ;kx tan 1 ; 2 (1 ; cos kx) (6.29)
;
a
The scheme is conditionally stable for jj < 1. By the Lax-Richtmeyer theorem
the consistency and stability of the scheme guarantee it is also convergent.
3. The modied equation for Lax Wendro is
2 3
ut + cux = c6x (2 ; 1)uxxx ; c8x (1 ; 2 )uxxxx + : : : (6.30)
4. Figures 6.4 shows the amplitude and phase errors for the Lax Wendro schemes.
The phase errors are predominantly lagging, the only accelerating errors are those
of the short wave at relatively high values of the Courant number.
and ! the corresponding frequency. Inserting this expression in equation 5.3 we get the
dispersion relation:
! = ck (6.31)
The associate phase speed, Cp , and group velocity, Cg , of the system is as follows
Cp = !k = c (6.32)
Cg = @!
@k = c (6.33)
The two velocities are constant and the system is non-dispersive, i.e. all waves travels
with the same phase speed regardless of wavenumber. The group velocity is also constant
in the present case and reects the speed of energy propagation. One can anticipate that
this property will be violated in the numerical discretization process based on what we
know of the phase error plots there it was shown that phase errors are dierent for the
dierent wave number. We will make this assertion clearer by looking at the numerical
dispersion relation.
6.6.2 Numerical Dispersion Relation: Spatial Dierencing
To keep the algebra tractable, we assume that only the spatial dimension is discretized
and the time dimenion is kept continuous. The semi-discrete form of the following
schemes are:
1. Centered second order scheme
; uj 1 = 0
ut + uj+12 ;
(6.34)
x
2. Centered fourth order scheme
ut + 8(uj +1 ; uj 12
1 ) ; (uj +2 ; uj 2 ) = 0
;
x
;
(6.35)
4. Donor cell
ut + uj ;uxj 1 = 0
;
(6.37)
5. Third order upwind
; 6uj 1) + uj 2 = 0
ut + 2uj +1 + 3uj 6 ; ;
(6.38)
x
88 CHAPTER 6. NUMERICAL SOLUTION OF THE ADVECTION EQUATION
The dispersion for these numerical scheme can be derived also on the basis of periodic
solution of the form uj = u~ei(kxj
) . The biggest dierence is of course that the Fourier
;
expansion is discrete in space. The following expression for the phase velocity can be
derived for the dierent schemes:
CD2 = c sin kx
CD4 k = c 8 sin
kkxx ; sin 2kx
k sin 3k 6k;9xsin 2kx + 45 sin kx
x
CD6 k =c 30kx (6.39)
Donor = c sin kx ; i(1 ; cos kx)
k kx
Third Upwind k = c (8 sin k x ; sin 2kx) ; i2(1 ; cos kx)2
kx
Several things stand out in the numerical dispersion of the various schemes. First, all
of them are dispersive, and hence one expects that a wave form made up of the sum of
individual Fourier components will evolve such that the fast travelling wave will pass the
slower moving ones. Second, all the centered dierence scheme show a real frequency,
i.e. they introduce no amplitude errors. The o-centered schemes on the other hand
have real and imaginary parts. The former inuences the phase speed whereas the
former inuences the amplitude. The amplitude decays if I m() < 0, and increases for
I m() > 0. Furthermore, the upwind biased schemes have the same real part as the next
higher order centered scheme thus their dispersive properties are as good as the higher
order centered scheme except for the damping associated with their imaginary part (this
is not necessarily a bad things at least for the short waves).
Figure 6.5 shows the dispersion curve for the various scheme discussed in this section
versus the analytical dispersion curve (the solid straight line). One can immediately see
the impact of higher order spatial dierencing in improving the propagation character-
istics of the intermediate wavenumber range. As the order is increased, the dispersion
curves rise further towards the analytical curve, particularly near kx > =2, hence a
larger portion of the spectrum is propagating correctly. The lower panel shows the impact
of biasing the dierencing towards the upstream side. The net eect is the introduction
of numerical dissipation. The latter is strongest for the short waves, and decreases with
the order of the scheme.
Figure 6.6 shows the phase speed (upper panel) and group velocity (lower panel) of
the various schemes. Again it is evident that a larger portion of the wave spectrum is
propagating correctly whereas as the order is increased. None of the schemes allows the
shortest wave to propagate. The same trend can be seen for the group velocity plot.
However, the impact of the numerical error is more dramatic there since the short waves
have negative group velocities, i.e. they are propagating in the opposite direction. This
trend worsens as the accuracy is increased.
6.6. NUMERICAL DISPERSION 89
3.5
2.5
2
ω/c
FE1
1.5
CD6
CD4
1
CD2
0.5
0
0 0.2 0.4 0.6 0.8 1
k ∆ x/π
2
1.8
1.6
1.4
1.2
ω /c
1 1
i
0.8
0.6
0.4
3
0.2
0
0 0.2 0.4 0.6 0.8 1
k ∆ x/π
Figure 6.5: Dispsersion relation for various semi-discrete schemes. The upper panel
shows the real part of the frequency whereas the bottom panel shows the imaginary part
for the rst and third order upwind schemes. The real part of the frequency for these
two schemes is identical to that of the second and fourth order centered schemes.
90 CHAPTER 6. NUMERICAL SOLUTION OF THE ADVECTION EQUATION
0.9
0.8
0.7 FE1
0.6 CD6
σ/(kc)
0.5 CD4
0.4
0.3 CD2
0.2
0.1
0
0 0.2 0.4 0.6 0.8 1
k ∆ x/π
1
0.5
−0.5
c /c
−1 CD2
g
−1.5 CD4
−2 CD6
−2.5
FE1
−3
0 0.2 0.4 0.6 0.8 1
k∆ x/π
Figure 6.6: Phase speed (upper panel) and Group velocity (lower panel) for various
semi-discrete schemes.
Chapter 7
The nite dierence representation 7.2 of the Poisson equation results in a coupled system
of algebraic equations that must be solved simultaneously. In matrix notation the system
can be written in the form Ax = b, where x represents the vector of unknowns, b
represents the right hand side, and A the matrix representing the system. Boundary
conditions must be applied prior to solving the system of equations.
5 s s s s s
64 s s s s s
k3 s s s s s
2 s s s s s
1 s s s s s
1 2 3 4 5
j -
Figure 7.1: Finite Dierence Grid for a Poisson equation.
91
92 CHAPTER 7. SOLVING THE POISSON EQUATIONS
Example 12 For a square domain divided into 4x4 cells, as shown in gure 7.1, subject
to Dirichlet boundary conditions on all boundaries, there are 9 unknowns ujk , with
(j k) = 1 2 3. The nite dierence equations applied at these points provide us with
the system:
0 ;4 1 0 1 0 0 0 0 0
10 u 1 0f 1 0 u +u 1
B
B 1 ;4 1 0 1 0 0 0 0 CC BB u2322 CC BB f2322 CC BB 21u31 12 CC
B
B 0 1 ;4 1 0 1 0 0 0 CC BB u42 CC BB f42 CC BB u41 CC
B
B 1 0 1 ;4 1 0 1 0 0 CC BB u23 CC BB f23 CC BB 0 CC
B
B 0 1 0 1 ;4 1 0 1 0
CC BB u CC = BB f CC;BB 0
CC
B
B CC BB 33 CC BB 33 CC BB CC
B
B 0 0 1 0 1 ;4 1 0 1 CC BB u43 CC BB f43 CC BB 0 CC
B
B 0 0 0 1 0 1 ;4 1 0 CC BB u24 CC BB f24 CC BB u25 CC
@ 0 0 0 0 1 0 1 ;4 1 A@ u 34 A @f 34 A @ u
35 A
0 0 0 0 0 1 0 1 ;4 u44 f44 u45 + u54
(7.3)
where x = y = . Notice that the system is symmetric, and pentadiagonal (5 non-
zero diagonal). This last property precludes the ecient solution of the system using the
ecient tridiagonal solver.
The crux of the work in solving elliptic PDE is the need to update the unknowns
simultaneously by inverting the system Ax = b. The solution methodologies fall under 2
broad categories:
1. Direct solvers: calculate the solution x = A 1 b exactly (up to round-o errors).
;
equations to reduce the CPU cost and accelerate convergence. Here we mention a
few of the more common iterative schemes:
(a) Fixed point methods: Jacobi and Gauss-Seidel Methods)
(b) Multigrid methods
(c) Krylov Method: Preconditioned Conjugate Gradient (PCG)
0.9 0.9
0.8 0.8
0.7 0.7
0.6 0.6
λ∆ y/π
λ∆ y/π
0.5 0.5
0.4 0.4
0.3 0.3
0.2 0.2
0.1 0.1
0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
κ∆ x/π κ∆ x/π
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Figure 7.2: Magnitude of the amplication factor for the Jacobi method (left) and Gauss-
Seidel method (right) as a function of the (x y) Fourier components.
by
G = cos x +2 cos y (7.9)
where ( ) are the wavenumbers in the (x y) directions. A plot of jGj is shown in gure
7.2. It is clear that the shortest (x ! ) and longest (x ! 0) error components
are damped the least. The intermediate wavelengths (x = =2) are damped most
eectively.
7.1.2 Gauss-Seidel method
A simple modication to the Gauss-Seidel method can improve the storage and con-
vergence rate of the Jacobi method. Note that in Jacobi, only values at the previous
iterations are used to update the solution. An improved algorithm can be obtained if the
most recent value is used. Assuming that we are sweeping through the grid by increased
j and k indeces, then the Gauss-Seidel method can be written in the form:
unj+1k + unj +11k + unjk+1 + unjk+1 1 2
n+1
ujk = 4 ; 4 fjk ;
(7.10) ;
The major advantages of this scheme are that only one time level need be stored (the
values can be updated on the y), and the convergence rate can be improved substantially
(double the rate of the Jacobi method). The latter point can be quantied by looking
at the error amplication which now takes the form:
i i
G = 4 ; (ee i+ +e e i ) jGj2 = 9 ; 4(cos 1 ++cos( ;
) (7.11)
; ;
cos
) + cos( ;
)
where = x, and
= y. A plot of jGj versus wavenumbers is shown in gure 7.2,
and clearly shows the reduction in the area where jGj is close to 1. Notice that unlike
the Jacobi method, the smallest wavelengths are damped at the rate of 1=3 at every time
step. The error components that are damped the least are the long ones:
! 0.
7.1. ITERATIVE METHODS 95
1
1
0.9
0.9
0.8 0.8
0.7 0.7
0.6 0.6
λ∆ y/π
λ∆ y/π
0.5 0.5
0.4 0.4
0.3 0.3
0.2 0.2
0.1 0.1
0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
κ∆ x/π κ∆ x/π
0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Figure 7.3: Magnitude of the amplication factor for the Gauss-Seidel by rows (left)
and Gauss-Seidel method by rows and columns (right) as a function of the (x y) Fourier
components.
where ! is the correction factor. For ! = 1 we revert to the Gauss-Seidel update, for
! < 1 the correction is under-relaxed, and for ! > 1 the correction is over-relaxed. For
convergence, it can be shown that 1 ! 2. The optimal !, !o, can be quite hard
to compute and depends on the number of points in each direction and the boundary
conditions applied. Analytic values for !o can be obtained for a Dirichlet problem:
p 2 x2 3
2
1; 1;
cos + cos
!o = 2
= 4 M 1 yx2 N 1 5
2; ;
(7.13)
1 + y2
where M and N are the number of points in the x and y directions, respectively.
7.1.4 Iteration by Lines
A closer examination of the Gauss-Seidel method in equation 7.10 reveals that an ecient
algorithm, relying on tridiagonal solvers, can be produced if the iteration is changed to:
unj+1
+1 + un+1 + r 2 (un
k j 1k jk +1 + unjk+1 1) x2
n +1
ujk = 2(1 + r2 ) ; 4 fjk (7.14)
; ;
where r = x=y is the aspect ratio of the grid. Notice that 7.14 has 3 unknowns only
at row j since unjk+1 1 would be known from either a boundary condition or a previous
;
96 CHAPTER 7. SOLVING THE POISSON EQUATIONS
iteration, and unjk+1 is still lagged in time. Hence a simple tridiagonal solver can be
used to update the rows one-by-one. The amplication factor for this variation on the
Gauss-Seidel method is given by:
4
jGj2 = !2(1 + r2 ; cos )]2 + !2(1r + r2 ; cos )]2 cos
+ r4 (7.15)
A plot of jGj for r = 1 is shown in gure 7.3. The areas with small jGj have expanded
with resepect to those shown in gure 7.2. In order to symmetrize the iterations along
the two directions, it is natural to follow a sweep-by-row by a sweep-by-columns. The
amplication factor for this iteration is shown in the left panel of gure 7.3 and show a
substantial reduction in error amplitude for all wavelenths except the longest ones.
Example 13 In order to illustrate the eciency of the dierent methods outline above
we solve the following Laplace equation
r2u = 0 0 x y 1 (7.16)
u(0 y) = u(1 y) = 0 (7.17)
u(x 1) = sin x (7.18)
u(x 1) = e 16(x 41 )2 sin x
; ;
(7.19)
We divide the unit square into M N grid points and we use the following meth-
ods:Jacobi, Gauss-Seidel, SOR, SOR by line in the x-direction, and SOR by line in both
directions. We monitor the convergence history with the rms change in u from one
iteration to the next:
2 3 12
1 X
4 (un+1 ; un )2 5
kk =
2 MN jk jk (7.20)
jk
The stopping criterion is kk2 < 10 13 , and we limit the maximum number of iterarions
;
to 7,000. We start all iterations with u = 0 (save for the bc) as an initial guess. The
convergence history is shown in gure 7.4 for M = N = 65. The Jacobi and Gauss-Seidel
have similar convergence history except near the end where Gauss-Seidel is converging
faster. The SOR iterations are the fastest reducing the number of iterations required
by a factor of 100 almost. We have used the optimal relaxation factor since it is was
computable in our case. The SOR iterations are also quite similar showing a slow decrease
of the error in the initial stages but very rapid decrease in the nal stages. The criteria
for the selection of an iteration algorithm should not rely solely on the algorithm's rate
of convergence it should also the operation count needed to complete each iteration.
The convergence history for the above example shows that the 2-way line SOR is the
most ecient per iterations. However, table 7.1 shows the total CPU time is cheapest
for the point-SOR. Thus, the overhead of the tridiagonal solver is not compensated by
the higher eciency of the SOR by line iterations. Table 7.1 also shows that, where
applicable, the FFT-based fast solvers are the most ecient and cheapest.
7.1. ITERATIVE METHODS 97
−2
10
−4
10
−6
10
−8
2
10
|ε|
−10
10
−12
10
−14
10
0 1 2 3 4
10 10 10 10 10
n
Figure 7.4: Convergence history for the Laplace equation. The system of equation is
solved with: Jacobi (green), Gauss-Seidel (red), SOR (black), Line SOR in x (solid
blue), and line SOR in x and y (dashed blue). Here M = N = 65.
33 65 129
Jacobi 0.161 0.682 2.769
Gauss-Seidel 0.131 2.197 10.789
SOR 0.009 0.056 0.793
SOR-Line 0.013 0.164 1.291
SOR-Line 2 0.014 0.251 1.403
FFT 0.000 0.001 0.004
Table 7.1: CPU time in second to solve Laplace equation versus the number of points
(top row).
98 CHAPTER 7. SOLVING THE POISSON EQUATIONS
7.1.5 Matrix Analysis
The relaxation schemes presented above are not restricted to the Poisson equation but
can be re-intrepeted as specic instances of a larger class of schemes. We present the
matrix approach in order to unify the dierent schemes presented. Let the matrix A be
split into
A=N ;P (7.21)
where N and P are matrices of the same order as A. The system of equations becomes:
Nx = Px + b (7.22)
Starting with an arbitrary vector x(0) , we dene a sequence of vectors x(v) by the recursion
Nx(n) = Px(n ; 1) + b n = 1 2 3 ::: (7.23)
It is now clear what kind of restrictions need to be imposed on the matrices in order to
solve for x, namely: the matrix N must be non-singular: det(N ) 6= 0, and the matrix N
must be easily invertible so that computing y from Ny = z is computationally ecient.
In order to study how fast the iterations are converging to the correct solution, we
introduce the matrix M = N 1 P , and the error vectors e(n) = x(n) ; x. Substracting
;
equation 7.22 from equation 7.23, we obtain an equation governing the evolution of the
error, thus:
e(n) = Me(n 1) = M 2e(n 2) = : : : = M n e(0)
; ;
(7.24)
where e(0) is the initial error. Thus, it is clear that a sucient condition for convergence,
i.e. that limn e(n) = 0, is that limn M n = O. This is also necessary for the
!1 !1
method to converge for all e(0) . The condition for a matrix to be convergent is that its
spectral radius (M ) < 1. (Reminder: the spectral radius of a matrix M is dened as the
maximum eigenvalue in magnitude: (M ) = maxi ji j). Since computing the eigenvalues
is dicult usually, and since the spectral radius is a lower bound for any matrix norm,
we often revert to imposing conditions on the matrix norm to enforce convergence thus
( M ) kM k < 1: (7.25)
In particular, it is common to use either the 1- or innity-norms since these are the
simplest to calculate.
The spectral radius is also useful in dening the rate of convergence of the method.
In fact since, using equation 7.24, one can bound the norm of the error by:
ke(n) k kM nkke(0) k (7.26)
ke(n) k !(M )]n ke(0) k (7.27)
Thus the number of iteration needed to reduce the initial error by a factor of is
n ln = ln!(M )]. Thus, a small spectral radius reduces the number of iterations (and
hence CPU cost) needed for convergence.
7.1. ITERATIVE METHODS 99
Jacobi Method
The Jacobi method derived for the Poisson equation can be generalized by dening the
matrix N as the diagonal of matrix A:
N = D P = A ; D (7.28)
The matrix D = aij ij , where ij is the Kronecker delta. The matrix M = D 1 (D ; A) = ;
X
K
xni = a1 aij xnj ;1 (7.29)
ii j=1
j =i
6
The procedure can be employed if aii 6= 0, i.e. all the diagonal elements of A are dierent
from zero. The rate of convergence is in general dicult to obtain since the eigenvalues
are not easily available. However, the innity and/or 1-norm of M can be easily obtained:
(M ) min(kM k1 kM k ) (7.30)
X
kM k1 = max j aaij < 1
1
(7.31)
i=1 ii
i=j
X aij
6
(7.33)
Gauss-Seidel Method
A change of splitting leads to the Gauss-Seidel method. Thus we split the matrix into a
lower triangular matrix, and an upper triangular matrix:
0 1
a11
BB a21 a22 CC
N =BB@ .. ... CC P = N ; A (7.34)
. A
aK 1 aK 2 aKK
A slightly dierent form of writing this splitting is as A = D + L + U where D is again
the diagonal part of A, L is a strictly lower triangular matrix, and U is a strictly upper
triangular matrix here N = D + L. The matrix notation for the SOR iteration is a little
complicated but can be computed:
xm = Mxm 1 + (I + D 1 L) 1 D 1 b
; ; ; ;
(7.35)
M = (I + D 1 L) 1!(1 ; )I ; D 1 U ]
; ; ;
(7.36)
100 CHAPTER 7. SOLVING THE POISSON EQUATIONS
7.2 Krylov Method-CG
Consider the system of equations Ax = b, where the matrix A is a symmetric positive
denite matrix. The solution of the system of equations is equivalent to minimizing the
functional:
'(x) = 21 xT Ax ; xT b: (7.37)
The extremum occurs for @@x = Ax ; b = 0, thanks to the symmetry of the matrix, and
the positivity of the matrix shows that this extremum is a minimum, i.e. @@x2 2 = A. The
iterations have the form:
xk = xk 1 + pk
; (7.38)
where xk is the kth iterate, is a scalar and pk are the search directions. The two
parameters at our disposal are and p. We also dene the residual vector rk = b ; Axk .
We can now relate '(xk ) to '(xk 1 ):
;
For an ecient iteration algorithm, the 2nd and 3rd terms on the right hand side of
equation 7.39 have to be minimized separately. The task is considerably simplied if we
require the search directions pk to be A-orthogonal to the solution:
xTk 1 Apk = 0:
;
(7.40)
The remaining task is to choose such that the last term in 7.39 is minimized. It is a
simple matter to show that the optimal occurs for
Tb
= pTpAp
k (7.41)
k k
and that the new value of the functional will be:
T 2
'(xk ) = '(xk 1 ) ; 21 (pTk b) : (7.42)
;
pk Apk
We can use the orthogonality requirement 7.40 to rewrite the above two equations as:
T k 1 T k 1 )2
= ppkT rAp
;
'(xk ) = '(xk 1 ) ; 21 (ppkTrAp
;
;
: (7.43)
k k k k
The remaining task is dening the iteration is to determine the algorithm needed to
update the search vectors pk the latter must satisfy the orthogonality condition 7.40,
and must maximum the decrease in the functional. Let us denote by Pk the matrix
7.2. KRYLOV METHOD-CG 101
formed by the (k ; 1) column vectors pi , then since the iterates are linear combinations
of the search vectors,we can write:
kX1
;
xk ; 1 = i pi = Pk 1 y
; (7.44)
hi=1 i
Pk 1 = p1 p2 : : : pk 1 (7.45)
0 1
; ;
1
BB 2 CC
y = B .. B C (7.46)
@ . CA
k 1 ;
We note that the solution vector xk 1 belongs to the space spanned by the search vectors
pi i = 1 : : : k ; 1. The orthogonality property can now be written as yT PkT 1 Apk = 0.
;
This property is easy to satisfy if the new search vector pk is A-orthogonal to all the
;
previous search vectors, i.e. if PkT 1 Apk = 0. The algorith can now be summarized as
;
follows: First we initialize our computations by dening our initial guess and its residual
second we perform the following iterations:
while krk k < :
1. Choose pk such that pTi Apk = 0 8i < k, and maximize pTk rk 1 . ;
T k ;1
2. Compute the optimal k = ppkTrAp
k k
3. Update the guess xk = xk 1 + k pk , and residual rk = rk 1 ; k Apk
end
; ;
A vector pk which is A-orthogonal to all previous search, and such that pTk rk 1 6= 0, ;
vectors can always be found. Note that if pTk rk 1 = 0, then the functional does not
;
decrease and the minimum has been reached, i.e. the system has been solved. To bring
about the largest decrease in '(xk ), we must maximize the inner product pTk rk 1 . This ;
can be done by minimizing the angle between the two vectors pk and rk 1 , i.e. minimizing
krk 1 ; pk k.
;
and under this condition pTk APk 1 = 0, and kpk ;rk k is minimized. We have the following
;
property:
PkT rkT = 0 (7.49)
i.e. the search vectors are orthogonal to the residual vectors. We note that
Spanfp1 p2 : : : pk g = Spanfr0 r1 : : : rk 1 g = Spanfb Ab : : : Ak 1 bg
;
;
(7.50)
102 CHAPTER 7. SOLVING THE POISSON EQUATIONS
i.e. these dierent basis sets are spanning the same vector space. The nal steps in the
conjugate gradient algorithm is that the search vectors can be written in the simple form:
pk = rk 1 +
k pk 1
; ; (7.51)
pT Ar
k = ; pTk 1Apk 1
; ;
(7.52)
k 1 k 1; ;
T
r r
k = ; kpT1Apk 1
; ;
(7.53)
k k
The conjugate gradient algorithm can now be summarized as
Initialize: r = b ; Ax, p = r, = krk2 .
while < :
k
k+1
w
Ap2
= prT w
k k
update guess: x
x + p
update residual: r
r ; w
new residual norm:
krk2
0
end
It can be shown that the error in the CG algorithm after k iteration is bounded by:
p !k
kxk ; xkA 2kx0 ; xkA p ;+ 11 (7.54)
i i
the ratio of maximum
p eigenvalue to minimum eigenvalue. The error estimate uses the
A-norm: kwkA = wT Aw. Note that for very large condition numbers, 1 , the rate
of residual decrease approaches 1:
p !
p ;+ 11 1 ; p2 1 (7.56)
Hence the number of iterations needed to reach convergence increases. For ecient
iterations must be close to 1, i.e. the eigenvalues cluster around the unit circle. The
problem becomes one of converting the original problem Ax = b into A~x~ = ~b with
(A~) 1.
7.3. DIRECT METHODS 103
ujk = MN u^mn e ;
M e ;
N (7.57)
m=0 n=0
where u^mn are the Discrete Fourier Coecients. A similar expression can be written for
the right hand side function f . Replace the Fourier expression in the original Laplace
equation we get:
1 MX1 NX1 u^
; ;
2jm
i 2kn
i 2m 2m i 2Nn 2n
MN mn e i M
;
e ;
N e
;
M + ei M e ;
+ ei N =
m=0 n=0
1 MX1 NX1 f^ e i 2jm
2 MN
;
2kn
;
M e i N (7.58)
mn
; ;
m=0 n=0
Since the Fourier functions form an orthogonal basis, the Fourier coecients should
match individually. Thus, one can obtain the following expression for the unknowns
u^mn:
2^
u^mn = 2m fmn 2n m = 0 1 : : : M ; 1 n = 0 1 : : : N ; 1 (7.59)
2 cos M + cos N ; 2
Nonlinear equations
Linear stability analysis is not sucient to establish the stability of nite dierence
approximation to nonlinear PDE's. The nonlinearities add a severe complications to
the equations by providing a continuous source for the generation of small scales. Here
we investigate how to approach nonlinear problems, and ways to mitigate/control the
growth of nonlinear instabilities.
8.1 Aliasing
In a constant coecient linear PDE, no new Fourier components are created that are
not present either in the initial and boundary conditions conditions, or in the forcing
functions. This is not the case if nonlinear terms are present or if the coecients of a
linear PDE are not constant. For example, if two periodic functions: = eik1 xj and
= eik2 xj , are multiplied during the course of a calculation, a new Fourier mode with
wavenumber k1 + k2 is generated:
= ei(k1 +k2 )xj : (8.1)
The new wave generated will be shorter then its parents if k12 have the same sign,
i.e. k12+k2 < k212 . The representation of this new wave on the nite dierence grid can
become problematic if its wavelength is smaller then twice the grid spacing. In this case
the wave can be mistaken for a longer wave via aliasing.
Aliasing occurs because a function dened on a discrete grid has a limit on the shortest
wave number it can represent all wavenumbers shorter then this limit appear as a long
wave. The shortest wavelength representable of a nite dierence grid with step size x
is s = 2x and hence the largest wavenumber is kmax = 2=s = =x. Figure 8.1
shows an example of a long and short waves aliased on a nite dierence grid consisting
of 6 cells. The solid line represents the function sin 4 6x and is indistinguishable from
x
x (dashed line): the two functions coincide
the function sin 4 x at all points of the grid
(as marked by the solid circles). This coincidence can be explained by re-writing each
Fourier mode as:
h ij
eikxj = eikjx = eikjx ei2n = ei(k+ 2nx )j x (8.2)
105
106 CHAPTER 8. NONLINEAR EQUATIONS
where n = 0 1 2 : : : Relation 8.2 is satised at all the FD grid points xj = j x it
shows that all waves with wavenumber k + 2nx are indistinguishable on a nite dierence
grid with grid size x. In the case shown in gure 8.1, the long wave has length 4x
and the short wave has length 4x=3, so that the equation 8.2 applies with n = ;1.
Going back to the example of the quadratic nonlinearity , although the individual
functions, and , are representable on the FD grid, i.e. jk12 j =x, their product
may not be since now jk1 + k2 j 2=x. In particular, if =x jk1 + k2 j 2=x,
the product will be unresolvable on the discrete grid and will be aliased into wavenumber
k~ given by (
k~ = kk1 + k2 ; 2x if k1 + k2 > x (8.3)
1 + k + 2 if k + k < ;
2 x 1 2 x
Note that very short waves are aliased to very long waves: k~ ! 0 when jk12 j ! x .
d &t %t
The aliasing wavenumber can be visualized by looking at the wavenumber axis shown in
kc jk~j k1 + k2 k
-
0 6 2
x x
into jk~ j = kmax ; (2kc ; kmax), and the latter must satisfy jk~ j > kc , and we end up with
kc < 32 kmax (8.4)
For a nite dierence grid this is equivalent to kc < 32 .
x
@t 2x
Multiplying by uj and summing over the interval we get:
@ PNj=0 u2j =2 1X N
@t = ; 2 j =0(uj uj +1 ; uj uj 1):
; (8.9)
Notice that the terms within the summation sign do not cancel out, and hence energy is
not conserved. Likewise, a nite dierence approximation to the conservative form:
@uj = ; u2j +1 ; u2j 1
;
(8.10)
@t 4x
is not conserving as its discrete energy equation attests:
@ PNj=0 u2j =2 1X N
@t = ; 4 j =0(uj uj +1 ; uj uj 1):
2 2
;
(8.11)
108 CHAPTER 8. NONLINEAR EQUATIONS
1.5
0.56
1 0.55
0.54
0.5
0.53
∆x ΣN u /2
2
0
i=1
0.52
−0.5 0.51
0.5
−1
0.49
−1.5 0.48
−1 −0.5 0 0.5 1 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35
x t
Figure 8.3: Left: Solution of the inviscid Burger equation at t = 0:318 < 1= using
the advective form (black), momentum conserving form (blue), and energy conserving
form (red) the analytical solution is superimposed in Green. The initial conditions are
u(x 0) = ; sin x, the boundary conditions are periodic, the time step is t = 0:01325,
and x = 2=16 RK4 was used for the time integration. Right: Energy budget for the
dierent Burger schemes: red is the energy conserving, blue is the momentum conserving,
and black is the advective form.
The only energy conserving available for the Burger equation is the following:
@uj = ; uj +1 + uj + uj 1 uj+1 ; uj 1 : ; ;
(8.12)
@t 3 2x
where the advection velocity is a three-term average of the velocity at the central point.
Its discrete energy equation is given by:
@ PNj=0 u2j =2 1X N
@t = ; 6 j =0 !uj uj +1 (uj + uj +1 ) ; uj uj 1 (uj + uj 1)]
; ; (8.13)
where the term inside the summation sign does cancel out. Figure 8.3 shows solutions
of the Burger equations using the 3 schemes listed above. The advective form, shown
in black, does not conserve energy and exhibits oscillations near the front region. The
oscillations are absent in the both the ux and energy conserving forms. The ux form,
equation 8.10, exhibits a decrease in the energy and a decrease in the amplitude of the
waves. Note that the solution is shown just prior to the formation of the shock at time
t = 0:318 < 1=.
Can the term T r (vT ) be also written in ux form (for then it will be conserved upon
summation). We concentrate on the x-component for simplicity:
x2
Tx ux T x x
= x u T ; uxT xxT x (8.33)
x2
T 2 x
x
= x u T ; u x 2
x (8.34)
x2
T 2 T 2 x x
= x ux T ; x u 2 ; 2 x u (8.35)
x2
T 2 x! x2 T 2! T2
x
= x u T ; x u 2 ; 4 x xu x 2 + 2 x ux
x
2 2!
+ 4x x x u x T2 (8.36)
" f2x # 2
= x T2 ux ; T2 x (ux ) (8.37)
Equality 8.33 follows from property 8.20, 8.34 from 8.23, 8.35 from 8.19. The second
and third terms of equation 8.35 can rewritten with the help of equations 8.21 and 8.24,
respectively. The third and fth terms on the right hand side of equation 8.36 cancel.
The nal equation 8.37 is obtained by combining the rst and second terms of equation
8.36 (remember the operators are linear), and using equation 8.25. A similar derivation
can be carries out for the y-component of the divergence:
" f2 y #
2
= y T2 vy ; T2 y (vy )
Ty v y T y (8.38)
Thus, the semi-discrete second moment conservation becomes:
" x # " y #
@T 2 =2 = ; Tf2 ux ; Tf2 vy + T 2 ! (ux) + (v y )] (8.39)
@t x 2 y 2 2 x y
The rst and second term in the semi-discrete conservation equation are in ux form,
and hence will cancel out upon summation. The third term on the right hand side is
nothing but the discrete divergence constraint. Thus, the second order moment of T will
be conserved provided that the velocity eld is discretely divergence-free.
The following is a FD approximation to v rT consistent with the above derivation:
u @T @uT
@x = @x ; T@x
@u (8.40)
= x ux T x ; Tx (ux ) (8.41)
= Tx (ux) + ux x T x ; Tx (ux ) (8.42)
= ux x T x (8.43)
Thus, we have
v rT = uxxT x + vy y T y (8.44)
112 CHAPTER 8. NONLINEAR EQUATIONS
8.5 Conservation in vorticity streamfunction formulation
Nonlinear instabilities can develop if energy is falsely generated and persistently chan-
neled towards the shortest resolvable wavelengths. Arakawa !1, 3] devised an elegant
method to eliminate these articial sources of energy. His methodology is based on the
streamfunction-vorticity formulation of two dimensional, divergence-free uid ows. The
continuity constraint can be easily enforced in 2D ow by introducing a streamfunction,
, such that v = k r in component form this is:
u = ;y v = x (8.45)
The vorticity = r v reduces to
= vx ; uy = x x + y y = r2 (8.46)
and the vorticity advection equation can be obtained by taking the curl of the momentum
equation, thus:
@ r2 = J (r2 ) (8.47)
@t
where J stand for the Jacobian operator:
J (a b) = ax by ; bxay (8.48)
The Jacobian operator possesses some interesting properties
1. It is anti-symmetric, i.e.
J (b a) = ;J (a b) (8.49)
2. The Jacobian can be written in the useful forms:
J (a b) = ra k rb (8.50)
= r (k arb) (8.51)
= ;r (k bra) (8.52)
3. The integral of the Jacobian over a closed domain can be turned into a boundary
integral thanks to the above equations
Z Z @b Z @a
J (a b)dA = a @s ds = ; b @s ds (8.53)
@ @
where s is the tangential direction to the boundary. Hence, the integral of the
Jacobian vanishes if either a or b is constant along @ . In particular, if the boundary
is a streamline or a vortex line, the Jacobian integral vanishes. The area-averaged
vorticity is hence conserved.
4. The following relations hold:
2
aJ (a b) = J ( a2 b) (8.54)
2
bJ (a b) = J (a b2 ) (8.55)
8.5. CONSERVATION IN VORTICITY STREAMFUNCTION FORMULATION 113
Thus, the area integrals of aJ (a b) and bJ (a b) are zero if either a or b are constant
along the boundary.
It is easy to show that enstrophy, 2 =2, and kinetic energy, jrj2 =2, are conserved if
the boundary is closed. We would like to investigate if we can conserve vorticity, energy
and enstrophy in the discrete equations. We begin rst by noting that the Jacobian in
the continuum form can be written in one of 3 ways:
J ( ) = x y ; y x (8.56)
= (y )x ; (x )y (8.57)
= (x )y ; (y )x (8.58)
We can thus dene 3 centered dierence approximations to the above denitions:
J1 ( ) = xx y y ;y y x x (8.59)
x y
J2 ( ) = x y y ; y x x (8.60)
y x
J3 ( ) = y x x ; x y y (8.61)
It is obvious that J2 and J3 will conserve vorticity since they are in ux form J1 can
also be shown to conserve the rst moment since:
h i x h i y
J1 ( ) = x y xy x ; xx y y ; y xxy y + y y x x (8.62)
" 2 # " 2 #
= x y xy x ; 4x x y y x ; y x xy y ; 4y y x x y (8.63)
:
The last equation above shows that J1 can indeed be written in ux form, and hence
vorticity conservation is ensured. Now we turn our attention to the conservation of
quadratic quantities, namely, kinetic energy and enstrophy. It is easy to show that J2
conserves kinetic energy since:
x y
J2 ( ) = x y y ; y x x (8.64)
x yy x x y y
= x xy y ; y x x ; y y x + x x y (8.65)
" y #
x y x x2
= x y ; 4 x x y
" x#
y x y y2
; y x ; 4 y y x (8.66)
(8.67)
114 CHAPTER 8. NONLINEAR EQUATIONS
Notice that the nite dierence Jacobians satisfy the following property:
J1 ( ) = ;J1( ) (8.68)
J2 ( ) = ;J3( ): (8.69)
Hence, from equation 8.66 J3 ( ) can be written in ux form, and from equation 8.67
J1 +2 J3 can also be written in ux form. These results can be tabulated:
energy conserving enstrophy conserving
J2 J3
J1 + J3 J1 + J2
2 2
Notice that any linear combination of the energy conserving schemes will also be energy
conserving, likewise for the enstrophy conserving forms. Thus, it is possible to nd an
energy and enstrophy conserving Jacobian if we can nd two constants and
such
that:
JA = J2 + (1 ; ) J1 +2 J3 =
J3 + (1 ;
) J1 +2 J2 (8.70)
Equating like terms in J1 , J2 and J3 we can solve the system of equation. The nal
result can be written as:
JA = J1 + J32 + J3 (8.71)
Equation 8.71 denes the Arakawa Jacobian, named in honor of Akio Arakawa who
proposed it rst. The expression for JA in terms of the FD computational stencil is a
little complicated. We give the expression for a square grid spacing:x = y.
12xyJA ( ) = (j +1k + jk ) (j +1k+1 + jk+1 ; j +1k 1 ; jk 1)
; ;
us q jk u s
c vs
c
Figure 8.4: Conguration of unknowns on an Arakawa C-Grid. The C-Grid velocity
points ui+ 12 and vij + 21 are located a distance d=2 to the left and top, respectively, of
pressure point
ij .
2
The (x y ) coordinate system is rotated 45 degrees counterclockwise to the (x y) coor-
0 0
dinate system i.e. it is in the diagonal directions w.r.t. to the original axis.
8.7. CONSERVATION FOR DIVERGENT FLOWS 117
ther squaring the space-averaged velocity components, or averaging the squared velocity.
The latter however leads to a straightforward FD analogue of the kinetic energy and is
therefore preferred. This leads to the following PV-conserving momentum advection and
Coriolis force operators:
v ru ; fv = 21 x u2x + v2 y ; vxy qy (8.97)
v rv + fu = 21 y u2 x + v2y + uxy qx (8.98)
It can be shown that the above operator also conserves potential enstrophy hq2 =2.
The derivation of schemes that conserve both PV and kinetic energy is very complex.
Arakawa and Lamb !4, 2] did derive such a dierencing scheme. Here we quote the nal
result:
1 u2 x + v2 y ; V y qxy x ; 1 ;( V ) q + 1 U x qx + 1 U qx x(8.99)
x
x x y x y x y x y
0 0 0 0 0 0 0 0
2 48 12 12
1 u2 x + v2 y ; U xqxy y + 1 ;( U ) q ; 1 V y qy ; 1 V qy(8.100)
xy
y y x x y y x y x
0 0 0 0 0 0 0 0
2 48 12 12
where is the discrete dierential operator without division by grid distance.
0
Chapter 9
where the \alpha_k are coefficients that depend on the specific scheme used. A linear scheme is one where the coefficients \alpha_k are independent of the solution T_j. For a scheme to be monotone with respect to the T^n_{j+k}, we need the condition

\frac{\partial T^{n+1}_j}{\partial T^n_{j+k}} \ge 0   (9.3)

For example, the donor cell scheme for u > 0, T^{n+1}_j = T^n_j - \mu\,(T^n_j - T^n_{j-1}) with \mu = u\Delta t/\Delta x, satisfies this condition whenever 0 \le \mu \le 1, since the coefficients 1-\mu and \mu are then both non-negative. Godunov has shown that the only linear monotonic scheme is the first order (upstream) donor cell scheme. All higher-order linear schemes are not monotonic and will permit spurious extrema to be generated. High-order schemes must therefore be nonlinear in order to preserve monotonicity.
where f|_{x_{j\pm1/2}} = (uT)|_{x_{j\pm1/2}} are the advective fluxes through the edges of cell j. This equation is nothing but a restatement of the partial differential equation as the rate at which the budget of T in cell j changes according to the advective fluxes in and out of the cell. As a matter of fact the above equation can be reinterpreted as a finite volume method if the integral is replaced by \frac{\partial \overline{T}_j}{\partial t}\,\Delta x, where \overline{T}_j refers to the average of T in cell j, whose size is \Delta x. We now have:

\frac{\partial \overline{T}_j}{\partial t} + \frac{f_{j+1/2} - f_{j-1/2}}{\Delta x} = 0   (9.6)
If the analytical flux is now replaced by a numerical flux, F, we can generate a family of discrete schemes. If we choose an upstream-biased scheme where the value within each cell is considered constant, i.e. F_{j+1/2} = u_{j+1/2} T_j for u_{j+1/2} > 0 and F_{j+1/2} = u_{j+1/2} T_{j+1} for u_{j+1/2} < 0, we get the donor cell scheme. Note that the two cases above can be re-written (and programmed) as:

F_{j+1/2} = \frac{u_{j+1/2} + |u_{j+1/2}|}{2}\, T_j + \frac{u_{j+1/2} - |u_{j+1/2}|}{2}\, T_{j+1}   (9.7)

The scheme will be monotone if we advance in time stably using a forward Euler method. If on the other hand we choose to approximate T at the cell edge as the average of the two cells:

F_{j+1/2} = u_{j+1/2}\,\frac{T_j + T_{j+1}}{2}   (9.8)

we obtain the second-order centered-in-space scheme. Presumably the second-order scheme will provide a more accurate solution in those regions where the advected profile is smooth, whereas it will create spurious oscillations in regions where the solution is "rough".
The idea behind the flux corrected transport algorithm is to use a combination of the higher order flux and the lower order flux to prevent the generation of new extrema. The algorithm can be summarized as follows (a code sketch follows the list):

1. Compute the low order fluxes F^L_{j+1/2}.

2. Compute the high order fluxes F^H_{j+1/2}, e.g. second order interpolation of T to the cell edges or higher.

3. Define the anti-diffusive flux A_{j+1/2} = F^H_{j+1/2} - F^L_{j+1/2}. This flux is dubbed anti-diffusive because the higher order fluxes attempt to correct the overly diffusive effects of the low order fluxes.

4. Update the solution using the low order fluxes to obtain a first order, diffused, but monotonic approximation:

   T^d_j = T^n_j - \frac{F^L_{j+1/2} - F^L_{j-1/2}}{\Delta x}\,\Delta t   (9.9)

5. Limit the anti-diffusive flux so that the corrected solution will be free of extrema not found in T^n_j or T^d_j. The limiting is effected through a factor: A^c_{j+1/2} = C_{j+1/2}\, A_{j+1/2}, where 0 \le C_{j+1/2} \le 1.

6. Apply the anti-diffusive flux to get the corrected solution:

   T^{n+1}_j = T^d_j - \frac{A^c_{j+1/2} - A^c_{j-1/2}}{\Delta x}\,\Delta t   (9.10)

Notice that for C = 0 the anti-diffusive fluxes are not applied, and we end up with T^{n+1}_j = T^d_j, while for C = 1 they are applied at full strength.
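A minimal Fortran sketch of steps 1-6, assuming a constant positive velocity u on a uniform grid; A(j) and C(j) stand for the interface quantities A_{j+1/2} and C_{j+1/2}, and the computation of the limiting factors C is deferred to the Zalesak limiter described next:

! One FCT update: donor-cell low-order flux, centered high-order flux.
do j = 1,nj-1
   FL(j) = u*T(j)                  ! low order flux (donor cell, u > 0)
   FH(j) = u*0.5*(T(j)+T(j+1))     ! high order flux (centered)
   A(j)  = FH(j) - FL(j)           ! anti-diffusive flux
enddo
do j = 2,nj-1
   Td(j) = T(j) - dt/dx*(FL(j)-FL(j-1))       ! diffused, monotone estimate
enddo
! ... compute limiting factors C(j) in [0,1] from Td (Zalesak limiter) ...
do j = 2,nj-1
   Tnew(j) = Td(j) - dt/dx*(C(j)*A(j)-C(j-1)*A(j-1))  ! corrected solution
enddo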
9.3.2 One-Dimensional Flux Correction Limiter
In order to elucidate the role of the limiter we expand the last expression in terms of the high and low order fluxes to obtain:

T^{n+1}_j = T^n_j - \frac{\Delta t}{\Delta x}\left\{\left[C_{j+1/2} F^H_{j+1/2} + (1-C_{j+1/2}) F^L_{j+1/2}\right] - \left[C_{j-1/2} F^H_{j-1/2} + (1-C_{j-1/2}) F^L_{j-1/2}\right]\right\}   (9.11)

The terms under the brackets can thus be interpreted as weighted averages of the low and high order fluxes, where the weights depend on the local smoothness of the solution. Thus for a rough neighborhood we should choose C \to 0 to avoid oscillations, while for a smooth neighborhood C \to 1 to improve accuracy. As one can imagine, the power and versatility of FCT lie in the algorithm that prescribes the limiter. Here we present the Zalesak limiter.

1. Optional step designed to eliminate the correction near extrema. Set A_{j+1/2} = 0 if:

A_{j+1/2}\left(T^d_{j+1} - T^d_j\right) < 0  and either  A_{j+1/2}\left(T^d_{j+2} - T^d_{j+1}\right) < 0  or  A_{j+1/2}\left(T^d_j - T^d_{j-1}\right) < 0   (9.12)
7. We now choose the limiting factors so as to enforce the extrema constraints simultaneously on adjacent cells:

C_{j+1/2} = \begin{cases} \min\left(R^+_{j+1},\, R^-_j\right) & \text{if } A_{j+1/2} > 0 \\ \min\left(R^+_j,\, R^-_{j+1}\right) & \text{if } A_{j+1/2} < 0 \end{cases}   (9.20)

Finally, the incoming and outgoing fluxes in cell (j,k) are given by:

P^+_{jk} = \max(0, A_{j-1/2,k}) - \min(0, A_{j+1/2,k}) + \max(0, A_{j,k-1/2}) - \min(0, A_{j,k+1/2})   (9.24)

P^-_{jk} = \max(0, A_{j+1/2,k}) - \min(0, A_{j-1/2,k}) + \max(0, A_{j,k+1/2}) - \min(0, A_{j,k-1/2})   (9.25)
MC: C(r) = \max\left[0, \min\left(2r, \frac{1+r}{2}, 2\right)\right]

The graphs of these limiters are shown in figure 9.1. The different functions have a number of common features. All limiters set C = 0 near extrema (r \le 0). They all asymptote to 2, save for the minmod limiter which asymptotes to 1, when the function changes rapidly (r \gg 1). The minmod limiter is the most stringent of the limiters and prevents the solution gradient from changing quickly in neighboring cells; this limiter is known as being diffusive. The other limiters are more lenient, the MC one being the most lenient, and permit the gradient in a cell to be up to twice as large as the one in the neighboring cell. The Van Leer limiter is the smoothest of the limiters and asymptotes to 2 for r \to \infty. (Standard forms of all four limiters are collected in the sketch below.)
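The document's own definitions of these limiter functions were lost in extraction; for reference, their standard forms from the general literature, written as Fortran expressions in the slope ratio r:

! Slope-ratio limiters C(r); r is the ratio of consecutive solution slopes.
minmod   = max(0.0, min(1.0, r))
superbee = max(0.0, min(2.0*r, 1.0), min(r, 2.0))
vanleer  = (r + abs(r))/(1.0 + abs(r))
mc       = max(0.0, min(2.0*r, 0.5*(1.0+r), 2.0))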
9.5 MPDATA
The Multidimensional Positive Definite Advection Transport Algorithm (MPDATA) was presented by Smolarkiewicz (1983) as an algorithm to preserve the positivity of the field throughout the simulation. The motivation behind his work is that chemical tracers
Figure 9.1: Graph of the different limiters (minmod, superbee, Van Leer, and MC) as a function of the slope ratio r.
must remain positive. Non-oscillatory schemes like FCT are positive definite but are deemed too expensive, particularly since oscillations are tolerable as long as they do not involve negative values. MPDATA is built on the monotone donor cell scheme and on its modified equation. The latter is used to determine the diffusive errors in the scheme and to correct for them near the zero values of the field. The scheme is presented here in its one-dimensional form for simplicity. The modified equation for the donor cell scheme, where the fluxes are defined as in equation 9.7, is:

\frac{\partial T}{\partial t} + \frac{\partial uT}{\partial x} = \frac{\partial}{\partial x}\left(\kappa\,\frac{\partial T}{\partial x}\right) + O(\Delta x^2)   (9.32)

where \kappa is the numerical diffusion generated by the donor cell scheme:

\kappa = \frac{|u|\,\Delta x - u^2\,\Delta t}{2}   (9.33)
The donor cell scheme will produce a first estimate of the field which is guaranteed to be non-negative if the field is initially non-negative. This estimate, however, is too diffused and must be corrected to eliminate these first order errors. MPDATA achieves the correction by casting the second order derivative in the modified equation 9.32 as another transport step with a pseudo-velocity \tilde{u}:

\frac{\partial T}{\partial t} = -\frac{\partial \tilde{u}T}{\partial x}, \qquad \tilde{u} = \begin{cases} \dfrac{\kappa}{T}\dfrac{\partial T}{\partial x} & T > 0 \\ 0 & T = 0 \end{cases}   (9.34)

and re-using the donor cell scheme to discretize it. The velocity \tilde{u} plays the role of an anti-diffusion velocity that tries to compensate for the diffusive error of the first step. The correction step takes the form:

\tilde{u}_{j+1/2} = \left(|u_{j+1/2}|\,\Delta x - u^2_{j+1/2}\,\Delta t\right) \frac{T^d_{j+1} - T^d_j}{\left(T^d_{j+1} + T^d_j + \epsilon\right)\Delta x}   (9.35)

\tilde{F}_{j+1/2} = \frac{\tilde{u}_{j+1/2} + |\tilde{u}_{j+1/2}|}{2}\, T^d_j + \frac{\tilde{u}_{j+1/2} - |\tilde{u}_{j+1/2}|}{2}\, T^d_{j+1}   (9.36)

T^{n+1}_j = T^d_j - \frac{\tilde{F}_{j+1/2} - \tilde{F}_{j-1/2}}{\Delta x}\,\Delta t   (9.37)

where T^d_j is the diffused solution from the donor-cell step, and \epsilon is a small positive number, e.g. 10^{-15}, meant to prevent the denominator from vanishing when T^d_j = T^d_{j+1} = 0. The second donor cell step is stable provided the original one is too, and hence the correction does not penalize the stability of the scheme. The procedure to derive the two-dimensional version of the scheme is similar; the major difficulty is in deriving the modified equation and the corresponding anti-diffusion velocity. It turns out that the x-component of the anti-diffusion velocity remains the same while the y-component takes a similar form with u replaced by v, and \Delta x by \Delta y.
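The two passes can be compressed into a short sketch; here u(j) holds the interface velocity u_{j+1/2}, eps is the regularization constant of equation 9.35, and the array names and loop bounds are illustrative:

! Pass 1: donor cell fluxes, eq. 9.7, give the diffused estimate Td.
do j = 1,nj-1
   F(j) = 0.5*(u(j)+abs(u(j)))*T(j) + 0.5*(u(j)-abs(u(j)))*T(j+1)
enddo
do j = 2,nj-1
   Td(j) = T(j) - dt/dx*(F(j)-F(j-1))
enddo
! Pass 2: donor cell with the anti-diffusion velocity of eq. 9.35.
do j = 2,nj-2
   ut(j) = (abs(u(j))*dx - dt*u(j)**2) * (Td(j+1)-Td(j)) &
         / ((Td(j+1)+Td(j)+eps)*dx)
   F(j)  = 0.5*(ut(j)+abs(ut(j)))*Td(j) + 0.5*(ut(j)-abs(ut(j)))*Td(j+1)
enddo
do j = 3,nj-2
   T(j) = Td(j) - dt/dx*(F(j)-F(j-1))
enddo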
Given the cells I_i = [z_{i-1/2}, z_{i+1/2}], the reconstruction problem is to find a polynomial p_i(z), of degree at most k-1, for each cell I_i, such that it is a k-th order accurate approximation to the function T(z) inside I_i:

p_i(z) = T(z) + O(\Delta z^k), \qquad z \in I_i, \quad i = 1, 2, \ldots, N   (9.42)

The polynomial p_i(z) interpolates the function within cells. It also provides for a discontinuous interpolation at cell boundaries, since a cell boundary is shared by more than one cell; we thus write:

T^-_{i+1/2} = p_i(z_{i+1/2}), \qquad T^+_{i-1/2} = p_i(z_{i-1/2})   (9.43)
Given the cell I_i and the order of accuracy k, we first choose a stencil, S(i; k, l), based on I_i, with l cells to the left of I_i and s cells to the right, such that l + s + 1 = k. S(i) consists of the cells:

S(i) = \{I_{i-l}, I_{i-l+1}, \ldots, I_{i+s}\}   (9.44)

There is a unique polynomial p(z) of degree k-1 = l+s whose cell average in each of the cells of S(i) agrees with that of T(z):

\overline{T}_j = \frac{1}{\Delta z_j}\int_{z_{j-1/2}}^{z_{j+1/2}} p(z')\, dz', \qquad j = i-l, \ldots, i+s   (9.45)
The polynomial in question is nothing but the derivative of the Lagrange interpolant of the primitive function of T(z) at the cell boundaries. To see this, we look at the primitive function of T(z):

\mathcal{T}(z) = \int_{-\infty}^{z} T(z')\, dz'   (9.46)

where the choice of the lower integration limit is immaterial. The function \mathcal{T}(z) at cell edges can be expressed in terms of the cell averages:

\mathcal{T}(z_{i+1/2}) = \sum_{j=-\infty}^{i} \int_{z_{j-1/2}}^{z_{j+1/2}} T(z')\, dz' = \sum_{j=-\infty}^{i} \overline{T}_j\, \Delta z_j   (9.47)

Thus, the cell averages define the primitive function at the cell boundaries. If we denote by P(z) the unique polynomial of degree at most k which interpolates \mathcal{T} at the k+1 points z_{i-l-1/2}, \ldots, z_{i+s+1/2}, and denote its derivative by p(z), it is easy to verify that:
The construction of the polynomial p(z) is now straightforward. We can start with the Lagrange interpolant on the k+1 cell boundaries and differentiate with respect to z to obtain:

p(z) = \sum_{m=0}^{k}\,\sum_{j=0}^{m-1} \overline{T}_{i-l+j}\,\Delta z_{i-l+j}\; \frac{\displaystyle\sum_{n=0,\,n\neq m}^{k}\ \prod_{q=0,\,q\neq m,n}^{k} \left(z - z_{i-l+q-1/2}\right)}{\displaystyle\prod_{n=0,\,n\neq m}^{k} \left(z_{i-l+m-1/2} - z_{i-l+n-1/2}\right)}   (9.53)

The order of the sums can be exchanged to obtain an alternative form which may be computationally more practical:

p(z) = \sum_{j=0}^{k-1} C_{lj}(z)\, \overline{T}_{i-l+j}   (9.54)

where C_{lj}(z) is given by:

C_{lj}(z) = \Delta z_{i-l+j} \sum_{m=j+1}^{k} \frac{\displaystyle\sum_{n=0,\,n\neq m}^{k}\ \prod_{q=0,\,q\neq m,n}^{k} \left(z - z_{i-l+q-1/2}\right)}{\displaystyle\prod_{n=0,\,n\neq m}^{k} \left(z_{i-l+m-1/2} - z_{i-l+n-1/2}\right)}   (9.55)

The coefficients C_{lj} need not be computed at each time step if the computational grid is fixed; instead they can be precomputed and stored to save CPU time. The expression for the C_{lj} simplifies (because many terms vanish) when the point z coincides with a cell edge and/or when the grid is equally spaced (\Delta z_j = \Delta z\ \forall j).
ENO reconstruction

The accuracy estimate holds only if the function is smooth inside the entire stencil S(i; k, l) used in the interpolation. If the function is not smooth, Gibbs oscillations appear. The idea behind ENO reconstruction is to vary the stencil S(i; k, l), by changing the left shift l, so as to choose a discontinuity-free stencil; this choice of S(i; k, l) is called an "adaptive stencil". A smoothness criterion is needed to choose the smoothest stencil, and ENO uses Newton divided differences: the stencil with the smoothest Newton divided differences is chosen.

ENO properties:

1. The accuracy condition is valid for any cell which does not contain a discontinuity.

2. P_i(z) is monotone in any cell I_i which does contain a discontinuity.

3. The reconstruction is Total Variation Bounded (TVB, as opposed to TVD), that is, there is a function Q(z) satisfying Q(z) = P_i(z) + O(\Delta z_i^{k+1}), z \in I_i, such that TV(Q) \le TV(T).

ENO disadvantages:

1. The choice of stencil is sensitive to round-off errors near the roots of the solution and its derivatives.

2. The numerical flux is not smooth, as the stencil pattern may change at neighboring points.

3. In the stencil choosing process k stencils are considered, covering 2k-1 cells, but only one of the stencils is used. If information from all the cells were used one could potentially get (2k-1)-th order accuracy in smooth regions.

4. ENO stencil choosing is not computationally efficient because of the repeated use of "if" structures in the code.
9.6.2 WENO reconstruction
WENO attempts to address the disadvantages of ENO, primarily by making a more efficient use of CPU time to gain accuracy in smooth regions without sacrificing the TVB property in the presence of discontinuities. The basic idea is to use a convex combination of all the stencils used in ENO to form a better estimate of the function value. Suppose the k candidate stencils S(i; k, l), l = 0, \ldots, k-1, produce the k different estimates

T^{(l)}_{i+1/2} = \sum_{j=0}^{k-1} C_{lj}\, \overline{T}_{i-l+j}

The WENO estimate is the convex combination

T_{i+1/2} = \sum_{l=0}^{k-1} \omega_l\, T^{(l)}_{i+1/2}, \qquad \omega_l \ge 0, \qquad \sum_{l=0}^{k-1} \omega_l = 1   (9.58)
Furthermore, when the solution has a discontinuity in one or more of the stencils we would like the corresponding weights to be essentially 0, so as to emulate the ENO idea. The weights should also be smooth functions of the cell averages; the weights described below are in fact C^\infty:

\omega_l = \frac{\alpha_l}{\sum_{n=0}^{k-1}\alpha_n}, \qquad \alpha_l = \frac{d_l}{(\epsilon + \beta_l)^2}   (9.59)

Here, the d_l are the factors needed to maximize the accuracy of the estimate, i.e. to make the truncation error O(\Delta z^{2k-1}). Note that the weights must be as close as possible to the d_l in smooth regions; in fact we have the requirement that \omega_l = d_l + O(\Delta z^k). The factor \epsilon is introduced to avoid division by zero; a value of \epsilon = 10^{-6} seems standard. Finally, the \beta_l are the smoothness indicators of the stencils S(i; k, l). These factors are responsible for the success of WENO; they also account for much of the CPU cost increase over traditional methods. The requirements on the smoothness indicators are that \beta_l = O(\Delta z^2) in smooth regions and O(1) in the presence of discontinuities. This translates into weights \omega_l that are O(1) in smooth regions and O(\Delta z^4) in the presence of discontinuities. The smoothness measures advocated by Shu et al. look like weighted H^{k-1} norms of the interpolating functions:

\beta_l = \sum_{n=1}^{k-1} \int_{z_{i-1/2}}^{z_{i+1/2}} \Delta z^{2n-1} \left(\frac{\partial^n p_l}{\partial z^n}\right)^2 dz   (9.60)

The right hand side is just the sum of the squares of the scaled L^2 norms of all the derivatives of the polynomial p_l over the interval [z_{i-1/2}, z_{i+1/2}]. The factor \Delta z^{2n-1} is introduced to remove any \Delta z dependence from the derivatives in order to preserve self-similarity; the smoothness indicators are then the same regardless of the underlying grid. The smoothness indicators for the case k = 2 are:

\beta_0 = \left(\overline{T}_{i+1} - \overline{T}_i\right)^2, \qquad \beta_1 = \left(\overline{T}_i - \overline{T}_{i-1}\right)^2   (9.61)

Higher order formulae can be found in [18, 5]. The formulae given here have a one-point upwind bias in the optimal linear stencil, suitable for a problem with wind blowing from left to right. If the wind blows the other way, the procedure should be modified symmetrically with respect to z_{i+1/2}. (A code sketch of the k = 2 case follows.)
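To fix ideas, a sketch of the k = 2 (third-order) reconstruction of T at the right edge z_{i+1/2} from the cell averages Tb; the optimal weights d_0 = 2/3 and d_1 = 1/3 for this upwind-biased pair of stencils are standard values, and eps is the regularization constant of equation 9.59:

! WENO3 estimate of T at z_{i+1/2} from cell averages Tb(i-1:i+1).
b0 = (Tb(i+1)-Tb(i))**2           ! smoothness indicators, eq. 9.61
b1 = (Tb(i)-Tb(i-1))**2
a0 = (2.0/3.0)/(eps + b0)**2      ! un-normalized weights, d0 = 2/3
a1 = (1.0/3.0)/(eps + b1)**2      ! d1 = 1/3
w0 = a0/(a0+a1)
w1 = a1/(a0+a1)
T0 = 0.5*Tb(i)    + 0.5*Tb(i+1)   ! candidate from stencil {i, i+1}
T1 = -0.5*Tb(i-1) + 1.5*Tb(i)     ! candidate from stencil {i-1, i}
Tface = w0*T0 + w1*T1             ! convex combination, eq. 9.58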
Figure 9.3: Convergence rate (in the maximum norm) for ENO (left panel) and WENO (right panel) reconstruction. The dashed curves are for a left shift set to 0, while the solid curves are for centered interpolation. The numbers refer to the interpolation order.
The fifth-order WENO5-RK3 scheme has less dissipation and better phase properties than the WENO3-RK2 scheme. For the narrow Gaussian hill the peak is well preserved and the profile is narrower; the solution is indistinguishable from the analytical one for the wider profile. Finally, although the scheme does not enforce the TVD property, there is no evidence of dispersive ripples in the case of the hat profile; there are, however, small negative values.
I have tried to implement the shape-preserving WENO scheme proposed by [19] and [5]. Their limiting attempts to preserve the high order accuracy of WENO near discontinuities and smooth extrema, and as such includes a peak discriminator that picks out smooth extrema from discontinuous ones. As such, I think the scheme will fail to preserve the peaks of the original shape and will allow some new extrema to be generated, because there is no foolproof discriminator. Consider what happens to a square wave advected by a uniform velocity field. The discontinuity is initially confined to one cell; the discriminator will rightly flag it as a discontinuous extremum and will apply the limiter at full strength. Subsequently, numerical dissipation will smear the front across a few cells and the front width will occupy a wider stencil. The discriminator, which works by comparing the curvature over a fixed number of cells, will fail to pick out the widening front as a discontinuity, and will permit out-of-range values to be mistaken for permissible smooth extrema.
In order to test the effectiveness of the limiter, I have tried the 1D advection of a square wave using the limited and unlimited WENO5 (5th-order) coupled with RK2. Figure 9.5 compares the negative minimum obtained with the limited (black) and unlimited (red) schemes; the x-axis represents time (the profile has undergone 4 revolutions of the periodic domain by the end of the simulation). The different panels show the results using 16, 32, 64 and 128 cells. The trend in all cases is similar for the unlimited scheme: a rapid growth of the negative extremum before it reaches a quasi-steady state. The trend for the limited case is different. Initially, the negative values are suppressed, the black curves starting away from time 0; this initial period increases with better resolution. After the first nega-
Figure 9.4: Advection of several Shchepetkin profiles. The black solid line refers to the analytical solution, the red crosses to WENO3 (with RK2 time-stepping), and the blue stars to WENO5 with RK3. The WENO5 solution is indistinguishable from the analytical solution for the narrow profile.
tive values appear, there is a rapid deterioration in the minimum value before reaching a steady state. This steady state value decreases with the number of points, and becomes negligible for the 128 cell case. Finally, note that the unlimited case produces a slightly better minimum for the case of 16 cells, but does not improve substantially as the number of points is increased. For this experiment, the interval is the unit interval and the hat profile is confined to |x| < 1/4; the time step is held fixed at \Delta t = 1/80, so the Courant number increases with the number of cells used.

I have repeated the experiments for the narrow profile case (Shchepetkin's profile), and confirmed that the limiter is indeed able to suppress the generation of negative values, even for a resolution as low as 64 cells (the reference case uses 256 cells). The discriminator, however, allows a very slight and occasional increase of the peak value. By and large, the limiter does a good job. The 2D cone experiments with the limiters are shown in the Cone section.
Figure 9.5: Negative minima of the unlimited (red) and limited (black) WENO schemes on a square hat profile. Top left 16 points, top right 32 points, bottom left 64 points, and bottom right 128 points.
9.7 Utopia
The Uniform Third-Order Polynomial Interpolation Algorithm (UTOPIA) was derived explicitly to be a multi-dimensional, two-time-level, conservative advection scheme. The formulation is based on a finite volume formulation of the advection equation:

\left(V\,\overline{T}\right)_t + \oint_{\partial V} \mathbf{F}\cdot\mathbf{n}\, dS = 0   (9.62)

where \overline{T} is the average of T over the cell V, and \mathbf{F}\cdot\mathbf{n} are the fluxes passing through the surfaces \partial V of the control volume. A further integration in time reduces the solution to the following:

\overline{T}^{n+1} = \overline{T}^{n} - \frac{1}{V}\int_{t^n}^{t^{n+1}}\oint_{\partial V} \mathbf{F}\cdot\mathbf{n}\, dS\, dt   (9.63)

A further definition will help us interpret the above formula. If we define the time-averaged flux passing through the surfaces bounding V as \overline{\mathbf{F}}^t = \frac{1}{\Delta t}\int_0^{\Delta t} \mathbf{F}\, dt, we end up with the following
Figure 9.6: Sketch of the particles entering cell (j,k) through its left edge (j-1/2, k), assuming positive velocity components u and v.
so that the center of the box is located at (0,0) and the left and right edges at (\mp 1/2, 0), respectively. The interpolation formula is designed so as to produce the proper cell averages when the function is integrated over cells (j,k), (j\pm1,k) and (j,k\pm1). The area integral in equation 9.65 must be broken into several integrals: first, the area ABCD straddles two cells, (j-1,k) and (j-1,k-1), with two different interpolations for T; second, the trapezoidal area integral can be simplified. We now have

\frac{1}{\Delta x\,\Delta y}\int_{ABCD} T\, dx\, dy = I_1(j,k) - I_2(j,k) + I_2(j,k-1)   (9.68)

where the individual contributions from each area are given by:

I_1(j,k) = \int\!\!\int_{AEFD} T_{jk}(\xi,\eta)\, d\xi\, d\eta = \int_{-1/2}^{1/2}\int_{1/2-\mu}^{1/2} T_{jk}(\xi,\eta)\, d\xi\, d\eta   (9.69)

I_2(j,k) = \int\!\!\int_{AEB} T_{jk}(\xi,\eta)\, d\xi\, d\eta = \int_{1/2-\mu}^{1/2}\int_{\eta_{AB}(\xi)}^{1/2} T_{jk}(\xi,\eta)\, d\eta\, d\xi   (9.70)

with \mu = u\Delta t/\Delta x. The equation for the line AB is:

\eta_{AB}(\xi) = \frac{1}{2} + \frac{p}{q}\left(\xi - \frac{1}{2}\right)   (9.71)

using the local coordinates of cell (j,k). Performing the integration is rather tedious; the output of a symbolic computer program is shown in figure 9.7.

A different derivation of the UTOPIA scheme can be obtained if we consider the cell values to be function values and not cell averages. The finite difference form is then given by the equation shown in figure 9.8.
Figure 9.7: Flux for the finite volume form of the UTOPIA algorithm.

Figure 9.8: Flux of UTOPIA when the variables represent function values; this is the finite difference form of the scheme.
Chapter 10

Finite Element Methods

10.1 MWR

Consider the following problem: find the function u such that

L(u) = 0   (10.1)

where L is a linear operator defined on a domain \Omega; if L is a differential operator, appropriate initial and boundary conditions must be supplied. The continuum problem as defined in equation 10.1 is an infinite-dimensional problem, as it requires us to find u at every point of \Omega. The essence of numerical discretization is to turn this infinite-dimensional system into a finite-dimensional one:

\tilde{u} = \sum_{j=0}^{N} \hat{u}_j\, \phi_j(x)   (10.2)

Here \tilde{u} stands for the approximate solution of the problem, the \hat{u}_j are the N+1 degrees of freedom available to minimize the error, and the \phi_j are the interpolation or trial functions. Equation 10.2 can be viewed as an expansion of the solution in terms of a basis defined by the functions \phi_j. Applying the operator L to this series we obtain

L(\tilde{u}) = R(x)   (10.3)

where R(x) is a residual which is different from zero unless \tilde{u} is the exact solution of equation 10.1. The degrees of freedom \hat{u}_j can now be chosen to minimize R. In order to determine the problem uniquely, we can impose N+1 constraints. For the Method of Weighted Residuals (MWR) we require that the inner product of R with each of the N+1 test functions v_j vanish:

(R, v_j) = 0, \qquad j = 0, 1, 2, \ldots, N   (10.4)

Recalling the chapter on linear analysis, this is equivalent to saying that the projection of R on the set of functions v_j is zero. In the case of the inner product defined in equation 11.13 this is equivalent to

\int_\Omega R\, v_j\, dx = 0, \qquad j = 0, 1, 2, \ldots, N   (10.5)

A number of different numerical methods can be derived by assigning different choices to the test functions.
10.1.1 Collocation
If the test functions are chosen to be the Dirac delta functions, v_j = \delta(x - x_j), then constraint 10.4 becomes:

R(x_j) = 0   (10.6)

i.e. it requires the residual to be identically zero at the collocation points x_j. Finite differences can thus be cast as an MWR with collocation points defined on the finite difference grid. The residual is free to oscillate between the collocation points, where it is pinned to zero; the oscillation amplitude will decrease with the number of collocation points if the residual is a smooth function.
10.1.3 Galerkin
In the Galerkin method the test functions are taken to be the same as the trial functions, so that v_j = \phi_j. This is the most popular choice in the FE community and will be the one we concentrate on. There are variations on the Galerkin method where the test functions are perturbations of the trial functions; this method is usually referred to as the Petrov-Galerkin method. The perturbations are introduced to improve the numerical stability of the scheme, for example to introduce upwinding in the solution of advection-dominated flows.
\int_0^1 \left(\frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \lambda\, u v\right) dx = \int_0^1 f v\, dx + \left[v\,\frac{\partial u}{\partial x}\right]_0^1   (10.14)

The essential boundary condition on the left edge eliminates the third term on the right hand side of the equation since v(0) = 0, and the Neumann boundary condition at the right edge can be substituted in the second term on the right hand side. The final form is thus:

\int_0^1 \left(\frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \lambda\, u v\right) dx = \int_0^1 f v\, dx + q\, v(1)   (10.15)

For the weak form to be sensible, we must require that the integrals appearing in the formulation be finite. The most severe restriction stems from the first order derivatives appearing on the left hand side of 10.15. For this term to be finite we must require that the functions \partial u/\partial x and \partial v/\partial x be integrable, i.e. piecewise continuous with finite jump discontinuities; this translates into the requirement that the functions u and v be continuous, the so-called C^0 continuity requirement.
10.2.2 Galerkin form
The solution is approximated with a finite sum:

u(x) = \sum_{i=0}^{N} \hat{u}_i\, \phi_i(x)   (10.16)

and the test functions are taken to be v = \phi_j, j = 1, 2, \ldots, N. The trial functions \phi_i must be chosen such that \phi_{i>0}(0) = 0, in accordance with the restriction that v(0) = 0. We also set, without loss of generality, \phi_0(0) = 1; the first term of the series is then nothing but the value of the function at the edge where the Dirichlet condition is applied: u(0) = \hat{u}_0. The substitution of the expansion and test functions into the weak form yields the following system of N equations in the N+1 variables \hat{u}_i:

\sum_{i=0}^{N} \left[\int_0^1 \left(\frac{\partial\phi_i}{\partial x}\frac{\partial\phi_j}{\partial x} + \lambda\,\phi_i\phi_j\right) dx\right]\hat{u}_i = \int_0^1 f\,\phi_j\, dx + q\,\phi_j(1), \qquad j = 1, \ldots, N   (10.17)

In matrix form this can be re-written as

\sum_{i=0}^{N} K_{ji}\hat{u}_i = b_j, \qquad K_{ji} = \int_0^1 \left(\frac{\partial\phi_j}{\partial x}\frac{\partial\phi_i}{\partial x} + \lambda\,\phi_i\phi_j\right) dx, \qquad b_j = \int_0^1 f\,\phi_j\, dx + q\,\phi_j(1)   (10.18)

Note that the matrix K is symmetric, K_{ji} = K_{ij}, so that only half the matrix entries need to be evaluated. The Galerkin formulation of the weak variational statement 10.15 will always produce a symmetric matrix regardless of the choice of expansion functions; the necessary condition for the symmetry is that the left hand side operator in equation 10.15 be symmetric with respect to the u and v variables. The matrix K is usually referred to as the stiffness matrix, a legacy term dating to the early application of the finite element method to problems in solid mechanics.
10.2.3 Essential Boundary Conditions
The series has the N unknowns \hat{u}_{1 \le i \le N}; thus the matrix equation above must be modified to take the boundary condition into account. We do this by moving all known quantities to the right hand side, and we end up with the following system of equations:

\sum_{i=1}^{N} K_{ji}\hat{u}_i = c_j, \qquad c_j = b_j - K_{j0}\,\hat{u}_0, \qquad j = 1, 2, \ldots, N   (10.19)

Had the boundary condition on the right edge of the domain been of Dirichlet type, we would have to add the restriction \phi_i(1) = 0, 1 \le i \le N-1, on the trial functions.
Figure 10.1: 2-element discretization of the interval [0,1] showing the interpolation functions \phi_0, \phi_1, \phi_2.
point. If our expansion functions are Lagrange interpolants, then the coefficients \hat{u}_i represent the values of the function at the collocation points x_j:

u(x_j) = u_j = \sum_{i=0}^{N} \hat{u}_i\,\phi_i(x_j) = \hat{u}_j, \qquad j = 0, 1, \ldots, N   (10.22)

We will omit the circumflex accents on the coefficients whenever the expansion functions are Lagrangian interpolation functions. The use of Lagrangian interpolation simplifies the imposition of the C^0 continuity requirement, and the function values at the collocation points are obtained directly without further processing.

There are other expansion functions in use in the FE community. For example, Hermitian interpolation functions are used when the solution and its derivative must be continuous across element boundaries (the solution is then C^1 continuous), or Hermitian expansions are used to model infinite elements. These expansion functions are usually reserved for special situations and we will not address them further.
In the following 3 sections we will illustrate how the FEM solution of equation 10.15 proceeds. We will approach the problem from 3 different perspectives in order to highlight the algorithmic steps of the finite element method. The first approach will consider a small expansion for the approximate solution; the matrix equation can then be written and inverted manually. The second approach repeats this procedure using a longer expansion; the matrix entries are derived but the solution of the larger system must be done numerically. The third approach considers the same large problem as number two above but introduces the local coordinate and numbering systems, and the mapping between the local and global systems. This local approach to constructing the FE stiffness matrix is key to its success and versatility since it localizes the computational details to elements and subroutines. A great variety of local finite element approximations can then be introduced at the elemental level with little additional complexity at the global level.
10.2.5 FEM solution using 2 linear elements
We illustrate the application of the Galerkin procedure to a 2-element discretization of the interval [0,1]. Element 1 spans the interval [0,1/2] and element 2 the interval [1/2,1], and we use the following interpolation:

u(x) = u_0\,\phi_0(x) + u_1\,\phi_1(x) + u_2\,\phi_2(x)   (10.23)

where the interpolation functions and their derivatives are tabulated below:

                  \phi_0(x)   \phi_1(x)   \phi_2(x)   \phi_0'   \phi_1'   \phi_2'
0 \le x \le 1/2   1-2x        2x          0           -2        2         0        (10.24)
1/2 \le x \le 1   0           2(1-x)      2x-1        0         -2        2

and shown in figure 10.1. It is easy to verify that the \phi_i are Lagrangian interpolation functions at the 3 collocation points x = 0, 1/2, 1, i.e. \phi_i(x_j) = \delta_{ij}. Furthermore, the expansion functions are continuous across element interfaces (so that the C^0 continuity requirement is satisfied), but their derivatives are discontinuous. It is easy to show that the interpolation 10.23 amounts to a linear interpolation of the solution within each element.
Since the boundary condition at x = 0 is of Dirichlet type, we need only test with functions that satisfy v(0) = 0; in our case the functions \phi_1 and \phi_2 are the only candidates. Notice also that we have only 2 unknowns, u_1 and u_2, u_0 being known from the Dirichlet boundary condition; thus only 2 equations are needed to determine the solution. The matrix entries can now be determined. We illustrate this for one of the entries, assuming \lambda is constant for simplicity:

K_{10} = \int_0^1 \left(\frac{\partial\phi_1}{\partial x}\frac{\partial\phi_0}{\partial x} + \lambda\,\phi_1\phi_0\right) dx = \int_0^{1/2} \left[-4 + \lambda\,2x(1-2x)\right] dx = -2 + \frac{\lambda}{12}   (10.25)

Notice that the integral over the entire domain reduces to an integral over a single element because the interpolation and test functions \phi_0 and \phi_1 are non-zero only over element 1. This property, which localizes the operations needed to build the matrix equation, is key to the success of the method.
The entry K_{11} requires integration over both elements:

K_{11} = \int_0^1 \left[\left(\frac{\partial\phi_1}{\partial x}\right)^2 + \lambda\,\phi_1^2\right] dx   (10.26)

     = \int_0^{1/2} \left[4 + \lambda\,4x^2\right] dx + \int_{1/2}^1 \left[4 + \lambda\,4(1-x)^2\right] dx   (10.27)

     = \left(2 + \frac{2\lambda}{12}\right) + \left(2 + \frac{2\lambda}{12}\right) = 4 + \frac{4\lambda}{12}   (10.28)
The remaining entries can be evaluated in a similar manner. The final matrix equation takes the form:

\begin{pmatrix} -2+\frac{\lambda}{12} & 4+\frac{4\lambda}{12} & -2+\frac{\lambda}{12} \\ 0 & -2+\frac{\lambda}{12} & 2+\frac{2\lambda}{12} \end{pmatrix} \begin{pmatrix} u_0 \\ u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}   (10.29)

Note that since the value of u_0 is known we can move it to the right hand side to obtain the following system of equations:

\begin{pmatrix} 4+\frac{4\lambda}{12} & -2+\frac{\lambda}{12} \\ -2+\frac{\lambda}{12} & 2+\frac{2\lambda}{12} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} b_1 + \left(2-\frac{\lambda}{12}\right)u_0 \\ b_2 \end{pmatrix}   (10.30)
whose solution is given by

\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{1}{\Delta} \begin{pmatrix} 2+\frac{2\lambda}{12} & 2-\frac{\lambda}{12} \\ 2-\frac{\lambda}{12} & 4+\frac{4\lambda}{12} \end{pmatrix} \begin{pmatrix} b_1 + \left(2-\frac{\lambda}{12}\right)u_0 \\ b_2 \end{pmatrix}   (10.31)

where \Delta = 8\left(1+\frac{\lambda}{12}\right)^2 - \left(\frac{\lambda}{12}-2\right)^2 is the determinant of the matrix. The only missing piece is the evaluation of the right hand side. This is easy since the function f and the flux q are known. It is possible to evaluate the integrals directly if the global expression for f is available. However, more often than not, f is either a complicated function or is known only at the collocation points. The interpolation methodology that was used for the unknown function can be used to interpolate the forcing function and evaluate the associated integrals. Thus we write:

b_j = \int_0^1 f\,\phi_j\, dx + q\,\phi_j(1)   (10.32)

    = \int_0^1 \left(\sum_{i=0}^{2} f_i\,\phi_i\right)\phi_j\, dx + q\,\phi_j(1)   (10.33)

    = \sum_{i=0}^{2} \left(\int_0^1 \phi_i\,\phi_j\, dx\right) f_i + q\,\phi_j(1)   (10.34)

\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \frac{1}{12}\begin{pmatrix} 1 & 4 & 1 \\ 0 & 1 & 2 \end{pmatrix}\begin{pmatrix} f_0 \\ f_1 \\ f_2 \end{pmatrix} + \begin{pmatrix} 0 \\ q \end{pmatrix}   (10.35)
The final solution can thus be written as:

\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{1}{\Delta}\begin{pmatrix} 2+\frac{2\lambda}{12} & 2-\frac{\lambda}{12} \\ 2-\frac{\lambda}{12} & 4+\frac{4\lambda}{12} \end{pmatrix} \begin{pmatrix} \frac{f_0+4f_1+f_2}{12} + \left(2-\frac{\lambda}{12}\right)u_0 \\ \frac{f_1+2f_2}{12} + q \end{pmatrix}   (10.36)

If u_0 = 0, \lambda = 0 and f = 0, the analytical solution to the problem is u = qx. The finite element solution yields:

\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{1}{4}\begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} 0 \\ q \end{pmatrix} = \begin{pmatrix} q/2 \\ q \end{pmatrix}   (10.37)

which is the exact solution of the PDE. The FEM procedure produces the exact result because the solution to the PDE is linear in x. Notice that the FE solution is exact at the interpolation points x = 0, 1/2, 1 and inside the elements.
If f = -1, and the remaining parameters are unchanged, the exact solution is quadratic, u_e = x^2/2 + (q-1)x, and the finite element solution is

\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{1}{4}\begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} -\frac{1}{2} \\ q-\frac{1}{4} \end{pmatrix} = \begin{pmatrix} \frac{4q-3}{8} \\ q-\frac{1}{2} \end{pmatrix}   (10.38)

Notice that the FE procedure yields the exact value of the solution at the 3 interpolation points. The errors committed are due to the interpolation of the function within the elements: the solution is quadratic whereas the FE interpolation provides only for a linear variation.
Figure 10.2: Comparison of analytical (solid line) and FE (dashed line) solutions of the equation u_xx + f = 0 with homogeneous Dirichlet condition on the left edge and Neumann condition u_x = 1 on the right edge, for the cases f = 0, f = -x and f = -x^2. The circles indicate the location of the interpolation points. Two linear finite elements are used in this example.
For f = -x, the exact solution is u_e = x^3/6 + (q-1/2)x, and the FE solution is:

\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{1}{4}\begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} -\frac{1}{4} \\ q-\frac{5}{24} \end{pmatrix} = \begin{pmatrix} \frac{24q-11}{48} \\ q-\frac{1}{3} \end{pmatrix}   (10.39)

The solution is again exact at the interpolation points but in error within the elements due to the linear interpolation. This fortuitous state of affairs is due to the exact evaluation of the forcing term f, which is exactly interpolated by the linear functions.
For f = -x^2, the exact solution is u = x^4/12 + (q-1/3)x, and the FE solution is:

\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \frac{1}{4}\begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} -\frac{2}{12} \\ q-\frac{9}{48} \end{pmatrix} = \begin{pmatrix} \frac{48q-17}{96} \\ q-\frac{26}{96} \end{pmatrix}   (10.40)

This time the solution is in error at the interpolation points also, because the quadratic forcing is no longer represented exactly by its linear interpolant. Figure 10.2 compares the analytical and FE solutions for the 3 cases after setting q = 1.
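These small systems are easy to verify numerically. A minimal sketch for the f = -x^2 case with \lambda = 0 and q = 1, assembling the right hand side from equation 10.35 and inverting the 2x2 system of equation 10.31 directly (\Delta = 4 when \lambda = 0):

program two_elem
   real :: f(0:2), b1, b2, u1, u2, q
   q  = 1.0
   f  = (/ 0.0, -0.25, -1.0 /)        ! f = -x**2 at x = 0, 1/2, 1
   b1 = (f(0) + 4.0*f(1) + f(2))/12.0 ! right hand side, eq. 10.35
   b2 = (f(1) + 2.0*f(2))/12.0 + q
   u1 = 0.25*(2.0*b1 + 2.0*b2)        ! invert 2x2 system, eq. 10.31
   u2 = 0.25*(2.0*b1 + 4.0*b2)
   print *, u1, u2   ! compare with exact u = x**4/12 + (q-1/3)*x
end program two_elem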
Figure 10.3: A partition of the domain into N linear elements. The element edges are indicated by the filled dots.
In order to decrease the error further for cases where f is a more complex function, we need to increase the number of elements. This will increase the size of the matrix system and the computational cost. We go through the process of developing the stiffness equations for this system since it will be useful for the understanding of general FE concepts. Suppose that we have divided the interval into N elements (not necessarily of equal size); then the interpolation formula becomes:

u(x) = \sum_{i=0}^{N} u_i\,\phi_i(x)   (10.41)

Element number j, shown in figure 10.3, spans the interval [x_{j-1}, x_j], for j = 1, 2, \ldots, N; its left neighbor is element j-1 and its right neighbor is element j+1. The length of each element is \Delta x_j = x_j - x_{j-1}. The linear interpolation function associated with node j is

\phi_j(x) = \begin{cases} 0 & x < x_{j-1} \\ \dfrac{x - x_{j-1}}{x_j - x_{j-1}} & x_{j-1} \le x \le x_j \\ \dfrac{x_{j+1} - x}{x_{j+1} - x_j} & x_j \le x \le x_{j+1} \\ 0 & x_{j+1} < x \end{cases}   (10.42)
Let us now focus on building the stiffness matrix row by row. The j-th row of K corresponds to setting the test function to \phi_j. Since the function \phi_j is non-zero only on the interval [x_{j-1}, x_{j+1}], the integral in the stiffness matrix reduces to an integration over the two elements that share node j; for the diagonal entry:

K_{jj} = \int_{x_{j-1}}^{x_j}\left[\frac{1}{\Delta x_j^2} + \lambda\left(\frac{x - x_{j-1}}{\Delta x_j}\right)^2\right] dx + \int_{x_j}^{x_{j+1}}\left[\frac{1}{\Delta x_{j+1}^2} + \lambda\left(\frac{x_{j+1} - x}{\Delta x_{j+1}}\right)^2\right] dx   (10.46)

     = \left(\frac{1}{\Delta x_j} + \frac{2\lambda\,\Delta x_j}{6}\right) + \left(\frac{1}{\Delta x_{j+1}} + \frac{2\lambda\,\Delta x_{j+1}}{6}\right)

Note that the rows of the matrix equation are identical in form except for those associated with the end points, where the diagonal entries differ. It is easy to show that we must have:

K_{00} = \frac{1}{\Delta x_1} + \frac{2\lambda\,\Delta x_1}{6}   (10.51)

K_{NN} = \frac{1}{\Delta x_N} + \frac{2\lambda\,\Delta x_N}{6}   (10.52)
The evaluation of the right hand side leads to the following equations for b_j:

b_j = \int_{x_{j-1}}^{x_{j+1}} f\,\phi_j\, dx + q\,\phi_j(1)   (10.53)

    = \sum_{i=j-1}^{j+1} \left(\int_{x_{j-1}}^{x_{j+1}} \phi_i\,\phi_j\, dx\right) f_i + q\,\delta_{Nj}   (10.54)

Again, special consideration must be given to the edge points to account for the boundary conditions properly. In the present case b_0 is not needed since a Dirichlet boundary condition is applied on the left boundary. On the right boundary the right hand side term is given by:

b_N = \frac{1}{6}\left[\Delta x_N\, f_{N-1} + 2\,\Delta x_N\, f_N\right] + q   (10.56)

Note that the flux boundary condition affects only the last entry of the right hand side. If the grid spacing is constant, a typical row of the matrix equation is:

\left(-\frac{1}{\Delta x} + \frac{\lambda\,\Delta x}{6}\right)u_{j-1} + 2\left(\frac{1}{\Delta x} + \frac{2\lambda\,\Delta x}{6}\right)u_j + \left(-\frac{1}{\Delta x} + \frac{\lambda\,\Delta x}{6}\right)u_{j+1} = \frac{\Delta x}{6}\left(f_{j-1} + 4f_j + f_{j+1}\right)   (10.57)

For \lambda = 0 it is easy to show that the left hand side reduces to the centered finite difference approximation of the second order derivative (scaled by -\Delta x). The finite element discretization produces a more complex approximation for the right hand side, involving a weighting of the function at several neighboring points.
10.2.7 Local stiffness matrix and global assembly

We note that in the previous section we built the stiffness matrix by constantly referring to a global node numbering system and a global coordinate system across all elements. Although this is practical and simple in one-dimensional problems, it becomes very tedious in two and three dimensions, where elements can have arbitrary orientations with respect to the global coordinate system. It is thus useful to transform the computations to a local coordinate system and a local numbering system in order to simplify and automate the building of the stiffness matrix. We illustrate these local entities in the one-dimensional case since they are easiest to grasp in this setting.

For each element we introduce a local coordinate system that maps the element j, defined over x_{j-1} \le x \le x_j, onto -1 \le \xi \le 1. The following linear map is the simplest:

x(\xi) = \frac{1-\xi}{2}\,x_{j-1} + \frac{1+\xi}{2}\,x_j

Figure 10.4: Local coordinate system and local numbering system.
We also introduce a local numbering system so the unknowns can be identified locally. The superscript j, whenever it appears, indicates that a local numbering system is used to refer to entities defined over element j. In the present instance u^j_1 refers to the global unknown u_{j-1} and u^j_2 refers to the global unknown u_j. Finally, the global expansion functions \phi_j are transformed into local expansion functions, so that the interpolation of the solution u within element j is:

u^j(\xi) = u^j_1\, h_1(\xi) + u^j_2\, h_2(\xi)   (10.60)

where the functions h_{1,2} are the local Lagrangian functions:

h_1(\xi) = \frac{1-\xi}{2}, \qquad h_2(\xi) = \frac{1+\xi}{2}   (10.61)

It is easy to see that h_1 should be identified with the right limb of the global function \phi_{j-1}, while h_2 should be identified with the left limb of the global function \phi_j(x).

The operations that must be carried out in computational space include differentiation and integration. The differentiation in physical space is evaluated with the help of the chain rule:

\frac{\partial u^j}{\partial x} = \frac{\partial u^j}{\partial \xi}\,\frac{\partial \xi}{\partial x} = \frac{\partial u^j}{\partial \xi}\,\frac{2}{\Delta x_j}   (10.62)

where \partial\xi/\partial x is the metric of the mapping of element j from physical space to computational space. For the linear mapping used here this metric is constant. The derivative of the function in computational space is obtained by differentiating the interpolation formula 10.60:

\frac{\partial u^j}{\partial \xi} = u^j_1\,\frac{\partial h_1}{\partial \xi} + u^j_2\,\frac{\partial h_2}{\partial \xi}   (10.63)

                                  = \frac{u^j_2 - u^j_1}{2}   (10.64)

For the linear interpolation functions used in this example, the derivative is constant throughout the element.
We now introduce the local stiffness matrices, which are the contributions of the local element integration to the global stiffness matrix:

K^j_{mn} = \int_{x_{j-1}}^{x_j} \left(\frac{\partial h_m}{\partial x}\frac{\partial h_n}{\partial x} + \lambda\, h_m h_n\right) dx, \qquad m, n = 1, 2   (10.65)

Notice that the local stiffness matrix has a small dimension, 2x2 for linear interpolation functions, and is symmetric. We evaluate these integrals in computational space by breaking them up into the two pieces D^j_{mn} and M^j_{mn} defined as follows:

D^j_{mn} = \int_{x_{j-1}}^{x_j} \frac{\partial h_m}{\partial x}\frac{\partial h_n}{\partial x}\, dx = \int_{-1}^{1} \frac{\partial h_m}{\partial \xi}\frac{\partial h_n}{\partial \xi}\left(\frac{\partial \xi}{\partial x}\right)^2 \frac{\partial x}{\partial \xi}\, d\xi   (10.66)

M^j_{mn} = \int_{x_{j-1}}^{x_j} h_m\, h_n\, dx = \int_{-1}^{1} h_m\, h_n\, \frac{\partial x}{\partial \xi}\, d\xi   (10.67)

The integrals in physical space have been replaced by integrals in computational space in which the metric of the mapping appears. For the linear mapping and interpolation functions, these integrals are easily evaluated:

M^j = \frac{\Delta x_j}{2}\int_{-1}^{1} \begin{pmatrix} (1-\xi)^2/4 & (1-\xi^2)/4 \\ (1-\xi^2)/4 & (1+\xi)^2/4 \end{pmatrix} d\xi = \frac{\Delta x_j}{6}\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}   (10.68)
! Assign connectivity in 1D: N elements, P collocation points per element,
! so that the total number of global nodes is Nt = N*(P-1)+1.
do j = 1,N                        ! element loop
   do m = 1,P                     ! loop over collocation points
      imap(m,j) = (j-1)*(P-1)+m   ! assign global node numbers
   enddo
enddo

! Gather/scatter operations
do j = 1,N                        ! element loop
   do m = 1,P                     ! loop over local node numbers
      mg = imap(m,j)              ! global node number of node m in element j
      ul(m,j) = u(mg)             ! local gathering operation
      v(mg) = vl(m,j)             ! local scattering operation
   enddo
enddo

! Assembly operation
K(1:Nt,1:Nt) = 0.0                ! initialize global stiffness matrix
do j = 1,N                        ! element loop
   do n = 1,P
      ng = imap(n,j)              ! global node number of local node n
      do m = 1,P
         mg = imap(m,j)           ! global node number of local node m
         K(mg,ng) = K(mg,ng) + Kl(m,n,j)  ! accumulate local contribution
      enddo
   enddo
enddo
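For linear elements, the local matrices Kl(m,n,j) fed into the assembly loop above follow directly from equations 10.66-10.68; a minimal sketch, where dx(j) holds the element size \Delta x_j and lambda is the constant coefficient of the mass term in the weak form:

! Build local stiffness matrices for linear elements: Kl = D + lambda*M,
! with D^j = (1/dx_j) [ 1 -1; -1 1 ] and M^j = (dx_j/6) [ 2 1; 1 2 ].
do j = 1,N
   Kl(1,1,j) =  1.0/dx(j) + lambda*2.0*dx(j)/6.0
   Kl(1,2,j) = -1.0/dx(j) + lambda*    dx(j)/6.0
   Kl(2,1,j) = Kl(1,2,j)                        ! symmetry
   Kl(2,2,j) = Kl(1,1,j)
enddo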
h_m(\xi) = \frac{-(1-\xi^2)\, L'_{P-1}(\xi)}{P(P-1)\, L_{P-1}(\xi_m)\, (\xi - \xi_m)}   (10.79)

L_{P-1} denotes the Legendre polynomial of degree P-1 and L'_{P-1} denotes its derivative. The P collocation points \xi_n are the P Gauss-Lobatto roots of the Legendre polynomial, i.e. the roots of (1-\xi^2)\, L'_{P-1}(\xi) = 0.
Figure 10.6: Plot of the Lagrangian interpolants h^P_m for different polynomial degrees: linear (top left), quadratic (top right), cubic (bottom left), and fifth (bottom right). The collocation points are shown by circles, and are located at the Gauss-Lobatto roots of the Legendre polynomial. The superscript indicates the total number of collocation points, and the subscript the collocation point with which the polynomial interpolant is associated.
R_Q = \frac{2^{2Q+1}\,(Q!)^4}{(2Q+1)\,[(2Q)!]^3}\, \frac{\partial^{2Q} g}{\partial \xi^{2Q}}\bigg|_{\xi=\xi'}, \qquad |\xi'| < 1   (10.84)

Gauss quadrature omits the end points of the interval, \xi = \pm 1, and considers only interior points. Notice that if the integrand is a polynomial of degree 2Q-1 its 2Q-th derivative is zero and the remainder vanishes identically. Thus a Q-point Gauss quadrature integrates exactly all polynomials of degree less than 2Q.
Gauss-Lobatto Quadrature
The Gauss-Lobatto quadrature formula includes the end points in its estimate of the integral. The roots, weights, and remainder of a Q-point Gauss-Lobatto quadrature are:

\left(1 - (\xi_p^{GL})^2\right) L'_{Q-1}(\xi_p^{GL}) = 0   (10.85)

\omega_p^{GL} = \frac{2}{Q(Q-1)\left[L_{Q-1}(\xi_p^{GL})\right]^2}, \qquad p = 1, 2, \ldots, Q   (10.86)

R_Q = -\frac{Q\,(Q-1)^3\, 2^{2Q-1}\, [(Q-2)!]^4}{(2Q-1)\,[(2Q-2)!]^3}\, \frac{\partial^{2Q-2} g}{\partial \xi^{2Q-2}}\bigg|_{\xi=\xi'}, \qquad |\xi'| < 1   (10.87)
Evaluating the mass matrix integral with the quadrature rule, we get:

M^j_{mn} = \sum_{p=1}^{P} h_m(\xi_p)\, h_n(\xi_p)\, \frac{\partial x}{\partial \xi}\bigg|_{\xi_p} \omega_p   (10.89)

         = \sum_{p=1}^{P} \delta_{mp}\,\delta_{np}\, \frac{\partial x}{\partial \xi}\bigg|_{\xi_p} \omega_p   (10.90)

         = \delta_{mn}\, \frac{\partial x}{\partial \xi}\bigg|_{\xi_m} \omega_m   (10.91)

Equation 10.91 shows that the mass matrix becomes diagonal when the quadrature and collocation points are identical. This rule would apply equally well had we chosen the Gauss points for quadrature and collocation. However, the Gauss-Lobatto roots are preferred since they simplify the imposition of C^0 continuity across elements (there are formulations where C^0 continuity is not necessary, and there Gauss quadrature and collocation become sensible). The implication of a diagonal mass matrix is profound, for it simplifies considerably the computations of time-dependent problems. As we will see later, time-integration requires the inversion of the mass matrix, and this task is infinitely easier when the mass matrix is diagonal. The process of reducing the mass matrix to diagonal form is occasionally referred to as mass lumping. One should be careful when low order finite elements are used to build the mass matrix, as the reduced quadrature introduces error. For low order interpolation (linear and quadratic) mass lumping reduces the accuracy of the finite element method substantially; the impact is less pronounced for higher order interpolation. The rule of thumb is that mass lumping is terrible for linear elements and has little impact for P > 3.
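As a concrete check for linear elements, using the matrices already derived above: exact integration gives the consistent mass matrix of equation 10.68, while the 2-point Gauss-Lobatto rule (\xi_p = \pm 1, \omega_p = 1) collapses it onto the diagonal, since h_m(\xi_p) = \delta_{mp}; both forms have the same row sum \Delta x_j, so the total mass is preserved:

M^j_{\text{exact}} = \frac{\Delta x_j}{6}\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad M^j_{\text{lumped}} = \frac{\Delta x_j}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}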
Example 14  We solve the 1D problem u_xx - 4u = 0 in 0 \le x \le 1, subject to the boundary conditions u(0) = 0 and u_x(1) = 1. The analytical solution is u = \sinh 2x/(2\cosh 2). The rms error between the finite element solution and the analytical solution is shown in figure 10.8 as a function of the number of degrees of freedom. The plots show the error for linear (red) and quadratic (blue) interpolation. The left panel uses a semi-logarithmic scale to highlight the exponential convergence of the spectral element method, while the right panel shows a log-log plot of the same quantities to display the algebraic decrease of the error for linear and quadratic interpolation, as evidenced by the straight convergence lines. The slope of the convergence curve for the spectral element method keeps increasing as the number of degrees of freedom is increased. This decrease of the error is most pronounced when the degrees of freedom are added as increased spectral interpolation within each element, as opposed to increasing the number of elements. Note that the spectral element computations used Gauss-Lobatto quadrature to evaluate the integrals, whereas exact integration was used for the linear and quadratic finite elements. The inexact quadrature does not destroy the spectral character of the method.
Figure 10.8: Convergence curves for the finite element solution of u_xx - 4u = 0. The red and blue lines show the solutions obtained using linear and quadratic interpolation, respectively, using exact integration; the black lines show the spectral element solution.
\|u\|^2_V. This, in combination with the coercivity condition, guarantees that the bilinear form A is norm-equivalent to \|\cdot\|_V. The above theorem guarantees the existence and uniqueness of the continuous solution. We take up the issue of the discrete solution in the following section.
10.3.2 Uniqueness and Existence of the discrete solution

The infinite-dimensional continuous solution \hat{u} must be approximated with a finite-dimensional approximation u_N, where N characterizes the dimensionality (number of degrees of freedom) of the discrete solution. Let V_N \subset V be a finite-dimensional subspace of V providing a dense coverage of V, so that in the limit N \to \infty, \lim_{N\to\infty} V_N = V. Since V_N is a subset of V, the conditions of the Lax-Milgram theorem are fulfilled and hence a unique solution exists for the discrete problem. The case where the discrete space V_N \subset V (V_N is a subset of V) is called the conforming case. The proofs of existence and uniqueness follow from the simple fact that V_N is a subset of V. Additionally, for the Galerkin approximation the following stability condition is satisfied:

\|u_N\|_V \le C\, \|f\|   (10.93)

where C is a positive constant independent of N. One has moreover that:

\|u_N - \hat{u}\|_V \le C \inf_{v \in V_N} \|\hat{u} - v\|_V   (10.94)

The inequality (10.93) shows that the V-norm of the numerical solution is bounded by the L^2-norm of the forcing function f (the data). This is essentially the stability criterion of the Lax-Richtmyer theorem. Inequality (10.94) says that the V-norm of the error is bounded by the smallest error achievable by any v \in V_N in describing \hat{u} \in V. By making further assumptions about the smoothness of the solution \hat{u} it is possible to devise error estimates in terms of the size of the elements h. The above estimate guarantees that the left hand side of (10.94) goes to zero as N \to \infty, since V_N \to V; hence the numerical solution converges to the true solution. Since two of the conditions of the Lax-Richtmyer equivalence theorem (stability and convergence) hold, the third condition (consistency) must follow; the discretization is hence also consistent.
10.3.3 Error estimates
Inequality (10.94) provides the means to bound the error in the numerical solution. Let \hat{u}_I = I_N \hat{u} be the interpolant of \hat{u} in V_N. An upper bound on the approximation error can be obtained since

\|u_N - \hat{u}\|_V \le C \inf_{v \in V_N} \|v - \hat{u}\|_V \le C\, \|\hat{u}_I - \hat{u}\|_V   (10.95)

Let h reflect the characteristic size of an element (for a uniform discretization in 1D, this would be equivalent to \Delta x). One expects \|u_N - \hat{u}\|_V \to 0 as h \to 0; the rate at which that occurs depends on the smoothness of the solution \hat{u}.

For linear interpolation, we have the following estimate:

\|u_N - \hat{u}\|_{H^1} \le C\, h\, |\hat{u}|_{H^2}, \qquad |\hat{u}|_{H^2} = \left(\int_0^1 (u_{xx})^2\, dx\right)^{1/2}   (10.96)

where |\hat{u}|_{H^2} is the so-called H^2 semi-norm (essentially a measure of the "size" of the second derivative), and C is a generic positive constant independent of h. If the solution admits square-integrable second derivatives, then the H^1-norm of the error decays linearly with the grid size h. The L^2-norm of the error, however, decreases quadratically according to:

\|u_N - \hat{u}\|_{L^2} \le \tilde{C}\, h^2\, |\hat{u}|_{H^2}   (10.97)

The difference between the rates of convergence of the two error norms is due to the fact that the H^1-norm takes the derivatives of the function into account: the first derivative of \hat{u} is approximated to first order in h, while \hat{u} itself is approximated to second order in h.

For an interpolation using polynomials of degree k, the L^2-norm of the error is given by

\|u_N - \hat{u}\| \le \hat{C}\, h^{k+1}\, |\hat{u}|_{H^{k+1}}, \qquad |\hat{u}|_{H^{k+1}} = \left[\int_0^1 \left(\frac{d^{k+1}u}{dx^{k+1}}\right)^2 dx\right]^{1/2}   (10.98)

provided that the solution is smooth enough, i.e. the (k+1)-st derivative of the solution is square-integrable.
For the spectral element approximation using a single element, the error depends on N and on the regularity of the solution. If \hat{u} \in H^m with m > 0, then the approximation error in the L^2 norm is bounded by

\|u_N - \hat{u}\| \le C\, N^{-m}\, \|\hat{u}\|_{H^m}   (10.99)

The essential difference between the p-version of the finite element method and the spectral element method lies in the exponent of the error (note that h \sim N^{-1}).
Figure 10.9: Natural area coordinate system for triangular elements.
Figure 10.10: Linear interpolation functions over linear triangular elements. The triangle is shown in dark shades. The three interpolation functions are shown in a light shade and appear as inclined planes in the plot.
where u_i, u_j and u_k are the values of the solution at the nodes i, j and k. The interpolation formula for the triangle guarantees the continuity of the solution across element boundaries. Indeed, on edge j-k for example, the interpolation does not involve node i and is essentially a linear combination of the function values at nodes j and k, thus ensuring continuity. The linear interpolation functions for the triangle are shown in figure 10.10 as a three-dimensional plot.

The usefulness of the area coordinates stems from the existence of the following integration formula over the triangle:

\int_A a_i^p\, a_j^q\, a_k^r\, dA = 2A\, \frac{p!\, q!\, r!}{(p+q+r+2)!}   (10.115)

where the notation p! = 1 \cdot 2 \cdots p stands for the factorial of the integer p. It is now easy to verify that the local mass matrix is given by

M^e = \int_A \begin{pmatrix} \phi_i\phi_i & \phi_i\phi_j & \phi_i\phi_k \\ \phi_j\phi_i & \phi_j\phi_j & \phi_j\phi_k \\ \phi_k\phi_i & \phi_k\phi_j & \phi_k\phi_k \end{pmatrix} dA = \frac{A}{12}\begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}   (10.116)

The entries of the matrix arising from the discretization of the Laplace operator are easy to compute, since the gradients of the interpolation and test functions are constant over an element; thus we have:

D^e = \int_A \begin{pmatrix} \nabla\phi_i\cdot\nabla\phi_i & \nabla\phi_i\cdot\nabla\phi_j & \nabla\phi_i\cdot\nabla\phi_k \\ \nabla\phi_j\cdot\nabla\phi_i & \nabla\phi_j\cdot\nabla\phi_j & \nabla\phi_j\cdot\nabla\phi_k \\ \nabla\phi_k\cdot\nabla\phi_i & \nabla\phi_k\cdot\nabla\phi_j & \nabla\phi_k\cdot\nabla\phi_k \end{pmatrix} dA   (10.117)
Figure 10.11: FEM grid and contours of the solution to the Laplace equation in a circular annulus.
D^e = \frac{1}{4A}\begin{pmatrix} b_i b_i + c_i c_i & b_i b_j + c_i c_j & b_i b_k + c_i c_k \\ b_j b_i + c_j c_i & b_j b_j + c_j c_j & b_j b_k + c_j c_k \\ b_k b_i + c_k c_i & b_k b_j + c_k c_j & b_k b_k + c_k c_k \end{pmatrix}   (10.118)
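A minimal Fortran sketch of equations 10.116-10.118 for a single linear triangle; the coefficients b and c are the usual coordinate differences of the linear shape-function gradients (the original symbols were lost in extraction, so this identification is an assumption consistent with the standard linear triangle):

! Local matrices for a linear triangle with vertices (x(1:3), y(1:3));
! shape function gradients are grad(phi_m) = (b(m), c(m))/(2A).
b(1) = y(2) - y(3);  c(1) = x(3) - x(2)
b(2) = y(3) - y(1);  c(2) = x(1) - x(3)
b(3) = y(1) - y(2);  c(3) = x(2) - x(1)
A = 0.5*( x(2)*y(3) - x(3)*y(2) + x(3)*y(1) - x(1)*y(3) &
        + x(1)*y(2) - x(2)*y(1) )
do n = 1,3
   do m = 1,3
      De(m,n) = (b(m)*b(n) + c(m)*c(n))/(4.0*A)   ! Laplace term, eq. 10.118
      Me(m,n) = A/12.0                            ! mass matrix, eq. 10.116
      if (m == n) Me(m,n) = A/6.0
   enddo
enddo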
As an example of the application of the FEM, we solve the Laplace equation in a circular annulus subject to Dirichlet boundary conditions, using linear triangular elements. The FEM grid and contours of the solution are shown in figure 10.11. The grid contains 754 nodes and 1364 elements. The boundary conditions were set to \cos\theta on the outer radius and \sin\theta on the inner radius, where \theta is the azimuthal angle. The inner and outer radii are 1/2 and 1, respectively. The contour lines were obtained by interpolating the FEM solution to a high resolution (401x401) structured grid prior to contouring it.
10.4.2 Higher order triangular elements
It is possible to dene higher order interpolation formula within an element without
changing the shape of an element. For example the quadratic triangular elements with
collocation points at the triangle vertices and mid-points of edges are given by
u(x) = ui2i (x) + uj 2j (x) + uk 2k (x)
+ ui j 2i j (x) + uj k 2j k (x) + uk i2k i (x)
; ; ; ; ; ;
(10.119)
2i = ai (ai ; 1) (10.120)
2j = aj (aj ; 1) (10.121)
2k = ak (ak ; 1) (10.122)
2i j = 4ai aj
;
(10.123)
166 CHAPTER 10. FINITE ELEMENT METHODS
Figure 10.12: Mapping of a quadrilateral between the unit square in computational space (left) and physical space (right).
\phi^2_{j-k} = 4\, a_j\, a_k   (10.124)
\phi^2_{k-i} = 4\, a_k\, a_i   (10.125)

where i-k denotes the midpoint of edge i-k. There are 6 degrees of freedom associated with each quadratic triangular element. Although it is possible to define even higher order interpolation in the triangle by proceeding as before, it is not so simple to implement. The alternative is to use a mapping to a structured computational space and to define the interpolation and collocation functions in that space. It is important to choose the collocation points appropriately in order to avoid Gibbs oscillations as the polynomial degree increases; spectral triangular elements are the optimal choice in this regard. We will not discuss spectral element triangles here; we refer the interested reader to [16].
10.4.3 Quadrilateral elements

Change of Coordinates

The derivation of the interpolation and integration formulas for quadrilateral finite elements follows the lines of the previous section. The main task resides in defining the local coordinate system in the master element shown in figure 10.12. For straight-edged elements the mapping between physical and computational space can be easily effected through the following bilinear map:

x(\xi,\eta) = \frac{1-\xi}{2}\left(\frac{1-\eta}{2}\,x_1 + \frac{1+\eta}{2}\,x_3\right) + \frac{1+\xi}{2}\left(\frac{1-\eta}{2}\,x_2 + \frac{1+\eta}{2}\,x_4\right)   (10.126)

In order to derive all the expressions needed to express the Galerkin integrals in computational space, it is useful to introduce a basis in computational space (\mathbf{e}_\xi, \mathbf{e}_\eta) tangential to the coordinate lines (\xi, \eta); we denote by (x, y) the coordinates in physical space. Let \mathbf{r} = x\,\mathbf{i} + y\,\mathbf{j} denote the position vector of a point P located inside the element. We define the basis vectors tangent to the coordinate lines as

\mathbf{e}_\xi = \frac{\partial \mathbf{r}}{\partial \xi} = x_\xi\,\mathbf{i} + y_\xi\,\mathbf{j}, \qquad \mathbf{e}_\eta = \frac{\partial \mathbf{r}}{\partial \eta} = x_\eta\,\mathbf{i} + y_\eta\,\mathbf{j}   (10.127)
where r denotes the position vector of a point P in space and (i j) forms an othonormal
basis in the physical space. Inverting the above relationship one obtains
i = yJ e ; yJ e j = ; xJ e + xJ e
where J = x y ; x y is the Jacobian of the mapping. The norms of e and e are given
by
je j2 = e e = (x )2 + (y )2 je j2 = e e = (x )2 + (y )2
The basis in the computational plane is orthogonal if e e = x x + y y = 0 in general
the basis is not orthogonal unless the element is rectangular.
It is now possible to derive expression for length and area segements in computational
space. These are needed in order to compute boundary and area integrals arising from
the Galerkin formulation. Using the denition ( ds)2 = dr dr with dr = r d + r d
,
we have:
( ds)2 = je d + e d
j2 = je j2 d 2 + je j2 d
2 + 2e e d d
(10.128)
The dierential area of a curved surface is dened as the area of its tangent plane
approximation (in 2D, the area is always at.) The area of the parallelogram dened by
the vectors d e and d
e is
i j k
dA = jj d e d
e jj = x y 0 d d
= jx y ; x y j d d
= jJ j d d
x y 0
(10.129)
after using the denition of (e e ) in terms of (i j).
Since $x = x(\xi,\eta)$ and $y = y(\xi,\eta)$, the derivatives in physical space can be expressed in terms of derivatives in computational space by using the chain rule of differentiation; in matrix form this can be expressed as:
\[
\begin{pmatrix} u_x \\ u_y \end{pmatrix} =
\begin{pmatrix} \xi_x & \eta_x \\ \xi_y & \eta_y \end{pmatrix}
\begin{pmatrix} u_\xi \\ u_\eta \end{pmatrix} \tag{10.130}
\]
Notice that the chain rule involves the derivatives of $(\xi,\eta)$ with respect to $(x,y)$, whereas the bilinear map readily delivers the derivatives of $(x,y)$ with respect to $(\xi,\eta)$. In order to avoid inverting the mapping from physical to computational space we derive expressions for $\nabla\xi$ and $\nabla\eta$ in terms of $x_\xi$, $x_\eta$, etc. Applying the chain rule to $x$ and $y$ we obtain (noticing that the two variables are independent) the system of equations:
\[
\begin{pmatrix} \xi_x & \eta_x \\ \xi_y & \eta_y \end{pmatrix}
\begin{pmatrix} x_\xi & y_\xi \\ x_\eta & y_\eta \end{pmatrix} =
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \tag{10.131}
\]
The solution is
\[
\begin{pmatrix} \xi_x & \eta_x \\ \xi_y & \eta_y \end{pmatrix} = \frac{1}{J}
\begin{pmatrix} y_\eta & -y_\xi \\ -x_\eta & x_\xi \end{pmatrix}, \qquad
J = x_\xi y_\eta - x_\eta y_\xi \tag{10.132}
\]
Figure 10.13: Bilinear shape functions in quadrilateral elements; the upper left panel shows $h_1(\xi)h_1(\eta)$, the upper right panel shows $h_2(\xi)h_1(\eta)$, the lower left panel shows $h_1(\xi)h_2(\eta)$, and the lower right shows $h_2(\xi)h_2(\eta)$.
For the bilinear map of equation 10.126, the metrics and Jacobian can be easily computed by differentiation:
\[
x_\xi = \frac{1-\eta}{2}\frac{x_2-x_1}{2} + \frac{1+\eta}{2}\frac{x_4-x_3}{2}, \qquad
y_\xi = \frac{1-\eta}{2}\frac{y_2-y_1}{2} + \frac{1+\eta}{2}\frac{y_4-y_3}{2} \tag{10.133}
\]
\[
x_\eta = \frac{1-\xi}{2}\frac{x_3-x_1}{2} + \frac{1+\xi}{2}\frac{x_4-x_2}{2}, \qquad
y_\eta = \frac{1-\xi}{2}\frac{y_3-y_1}{2} + \frac{1+\xi}{2}\frac{y_4-y_2}{2} \tag{10.134}
\]
The remaining expressions can be obtained simply by plugging in the various expressions derived earlier.
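For illustration, here is a minimal Fortran sketch of how equations 10.132-10.134 translate into code; the routine name quad_metrics and the argument layout are arbitrary choices, not part of the notes:

subroutine quad_metrics(xv, yv, xi, eta, jac, xix, xiy, etax, etay)
  implicit none
  real, intent(in)  :: xv(4), yv(4)  ! vertex coordinates, numbered as in eq. 10.126
  real, intent(in)  :: xi, eta       ! evaluation point in the master element [-1,1]x[-1,1]
  real, intent(out) :: jac           ! Jacobian J = x_xi*y_eta - x_eta*y_xi
  real, intent(out) :: xix, xiy, etax, etay  ! xi_x, xi_y, eta_x, eta_y
  real :: xxi, yxi, xeta, yeta
  ! metric terms, equations 10.133 and 10.134
  xxi  = 0.5*(1.0-eta)*0.5*(xv(2)-xv(1)) + 0.5*(1.0+eta)*0.5*(xv(4)-xv(3))
  yxi  = 0.5*(1.0-eta)*0.5*(yv(2)-yv(1)) + 0.5*(1.0+eta)*0.5*(yv(4)-yv(3))
  xeta = 0.5*(1.0-xi )*0.5*(xv(3)-xv(1)) + 0.5*(1.0+xi )*0.5*(xv(4)-xv(2))
  yeta = 0.5*(1.0-xi )*0.5*(yv(3)-yv(1)) + 0.5*(1.0+xi )*0.5*(yv(4)-yv(2))
  ! Jacobian and inverse metrics, equation 10.132
  jac  = xxi*yeta - xeta*yxi
  xix  =  yeta/jac
  xiy  = -xeta/jac
  etax = -yxi/jac
  etay =  xxi/jac
end subroutine quad_metrics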
We can use the collocation points located at the vertices of the quadrilateral to define the following interpolation:
\[
u(\xi,\eta) = \sum_{m=1}^{4} u_m \phi_m(\xi,\eta) = u_1\phi_1 + u_2\phi_2 + u_3\phi_3 + u_4\phi_4 \tag{10.135}
\]
where the two-dimensional Lagrangian interpolants are tensorized products of the one-dimensional interpolants defined in equation 10.61:
\[
\phi_1(\xi,\eta) = h_1(\xi)\, h_1(\eta), \qquad \phi_2(\xi,\eta) = h_2(\xi)\, h_1(\eta) \tag{10.136}
\]
\[
\phi_3(\xi,\eta) = h_1(\xi)\, h_2(\eta), \qquad \phi_4(\xi,\eta) = h_2(\xi)\, h_2(\eta)
\]
and shown in figure 10.13. Note that the bilinear interpolation functions above satisfy the $C^0$ continuity requirement. This can be easily verified by noting first that the interpolation along an edge involves only the collocation points along that edge; hence neighboring elements sharing an edge will interpolate the solution identically if the value of the function on the collocation points is unique. Another important feature of the bilinear interpolation is that, unlike the linear interpolation in triangular elements, it contains a term of second degree: $\xi\eta$. Hence the interpolation within an element is non-linear; it is only linear along edges.
Before proceeding further, we introduce a new notation for interpolation within quadrilaterals, to bring out explicitly the tensorized form of the interpolation functions. This is accomplished by breaking the single two-dimensional index $m$ in equation 10.135 into two one-dimensional indices $(i,j)$ such that $m = (j-1)2 + i$, where $i,j = 1,2$. The index $i$ runs along the $\xi$ direction and the index $j$ along the $\eta$ direction; thus $m = 1$ becomes identified with $(i,j) = (1,1)$, $m = 2$ with $(i,j) = (2,1)$, etc. The interpolation formula can now be written as
\[
u(\xi,\eta) = \sum_{j=1}^{2}\sum_{i=1}^{2} u_{ij}\, h_i(\xi)\, h_j(\eta) \tag{10.137}
\]
where $u_{ij}$ are the function values at point $(i,j)$.
With this notation in hand, it is now possible to write down arbitrarily high-order Lagrangian interpolation in 2D using the tensor product formula. Thus, a 1D interpolation using $N$ points per 1D element can be extended to 2D via:
\[
u(\xi,\eta) = \sum_{j=1}^{N}\sum_{i=1}^{N} u_{ij}\, h_i^N(\xi)\, h_j^N(\eta) \tag{10.138}
\]
The superscript $N$ has been introduced on the Lagrangian interpolants to stress that they are polynomials of degree $N-1$ and use $N$ collocation points per direction. The collocation points for 7th-degree interpolation polynomials are shown in figure 10.14.
Note, finally, that sum factorization algorithms must be used to compute the various quantities on structured sets of points $(p,q)$ in order to reduce the computational overhead. For example, the $\xi$-derivative of the function at point $(p,q)$ can be computed as:
\[
u_\xi \big|_{pq} = \sum_{j=1}^{N} \left( \sum_{i=1}^{N} u_{ij} \left.\frac{dh_i^N}{d\xi}\right|_{\xi_p} \right) h_j^N(\eta_q). \tag{10.139}
\]
Figure 10.14: Collocation points within a spectral element using 8 collocation points per direction to interpolate the solution; there are $8^2 = 64$ points in total per element.
First the term in parentheses is computed and saved in a temporary array; second, the final expression is computed and saved. This essentially reduces the operation count from $O(N^4)$ to $O(N^3)$. Further reduction in the operation count can be obtained under special circumstances. For instance, if $\eta_q$ happens to be a collocation point, then $h_j^N(\eta_q) = \delta_{jq}$ and the formula reduces to a single sum:
\[
u_\xi \big|_{pq} = \sum_{i=1}^{N} u_{iq} \left.\frac{dh_i^N}{d\xi}\right|_{\xi_p} \tag{10.140}
\]
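The two-stage evaluation of equation 10.139 can be coded directly. The sketch below is an illustration, not part of the notes; the routine name dudxi, the number of target points M per direction, and the array layout are assumed:

subroutine dudxi(N, M, u, dh, h, uxi)
  implicit none
  integer, intent(in) :: N, M
  real, intent(in)  :: u(N,N)    ! nodal values u_ij
  real, intent(in)  :: dh(M,N)   ! dh(p,i) = dh_i^N/dxi evaluated at xi_p
  real, intent(in)  :: h(M,N)    ! h(q,j)  = h_j^N evaluated at eta_q
  real, intent(out) :: uxi(M,M)  ! u_xi at the M x M target points
  real :: tmp(M,N)
  integer :: i, j, p, q
  ! stage 1: tmp(p,j) = sum_i u(i,j)*dh(p,i)  -- the term in parentheses
  tmp = 0.0
  do j = 1, N
     do i = 1, N
        do p = 1, M
           tmp(p,j) = tmp(p,j) + u(i,j)*dh(p,i)
        enddo
     enddo
  enddo
  ! stage 2: uxi(p,q) = sum_j tmp(p,j)*h(q,j)
  uxi = 0.0
  do q = 1, M
     do j = 1, N
        do p = 1, M
           uxi(p,q) = uxi(p,q) + tmp(p,j)*h(q,j)
        enddo
     enddo
  enddo
end subroutine dudxi

Each stage is a matrix-matrix product, so the cost per element scales like $O(N^3)$ (for M of order N) instead of the $O(N^4)$ cost of the naive double sum at every target point.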
Gauss quadrature with $Q$ points per direction applied to the Galerkin integrals yields a matrix $D_{ijkl}$ of the form:
\[
\begin{aligned}
D_{ijkl} = &\sum_{n=1}^{Q}\sum_{m=1}^{Q} h_i'(\xi_m)\, h_j(\eta_n)\, h_k'(\xi_m)\, h_l(\eta_n) \left[\nabla\xi\cdot\nabla\xi\, |J|\right]_{mn} \omega_m \omega_n \\
+ &\sum_{n=1}^{Q}\sum_{m=1}^{Q} h_i'(\xi_m)\, h_j(\eta_n)\, h_k(\xi_m)\, h_l'(\eta_n) \left[\nabla\xi\cdot\nabla\eta\, |J|\right]_{mn} \omega_m \omega_n \\
+ &\sum_{n=1}^{Q}\sum_{m=1}^{Q} h_i(\xi_m)\, h_j'(\eta_n)\, h_k'(\xi_m)\, h_l(\eta_n) \left[\nabla\eta\cdot\nabla\xi\, |J|\right]_{mn} \omega_m \omega_n \\
+ &\sum_{n=1}^{Q}\sum_{m=1}^{Q} h_i(\xi_m)\, h_j'(\eta_n)\, h_k(\xi_m)\, h_l'(\eta_n) \left[\nabla\eta\cdot\nabla\eta\, |J|\right]_{mn} \omega_m \omega_n
\end{aligned} \tag{10.148}
\]
where the expressions in brackets are evaluated at the quadrature points $(\xi_m,\eta_n)$ (we have omitted the superscript $G$ from the quadrature points). Again, substantial savings can be achieved if a Gauss-Lobatto quadrature of the same order as the interpolation polynomial is used. The expression for $D_{ijkl}$ then reduces to:
\[
\begin{aligned}
D_{ijkl} = \delta_{jl} &\sum_{m=1}^{Q} h_i'(\xi_m)\, h_k'(\xi_m) \left[\nabla\xi\cdot\nabla\xi\, |J|\right]_{mj} \omega_m \omega_j \\
+ \delta_{ik} &\sum_{n=1}^{Q} h_j'(\eta_n)\, h_l'(\eta_n) \left[\nabla\eta\cdot\nabla\eta\, |J|\right]_{in} \omega_i \omega_n \\
+ &\; h_i'(\xi_k)\, h_l'(\eta_j) \left[\nabla\xi\cdot\nabla\eta\, |J|\right]_{kj} \omega_k \omega_j \\
+ &\; h_k'(\xi_i)\, h_j'(\eta_l) \left[\nabla\eta\cdot\nabla\xi\, |J|\right]_{il} \omega_i \omega_l
\end{aligned} \tag{10.149}
\]
The approximation of the advective term using linear finite elements has resulted in a centered-difference approximation for that term, whereas the time-derivative term has produced the mass matrix. Notice that any integration scheme, even an explicit one like leap-frog, would necessarily require the inversion of the mass matrix. Thus, one can already anticipate that the computational cost of solving the advection equation with FEM will be higher than with a similarly configured explicit finite difference method. For linear elements in 1D, the mass matrix is tridiagonal and the increase in cost is minimal since tridiagonal solvers are very efficient. Quadratic elements in 1D lead to a pentadiagonal system, which is hence costlier to solve. This increased cost may be justifiable if it is compensated by a sufficient increase in accuracy. In multi-dimensions, the mass matrix is not tridiagonal but has only a limited bandwidth that depends on the global numbering of the nodes; thus even linear elements would require a full matrix inversion.
Many solutions have been proposed to tackle the extra cost of the full mass matrix. One solution is to use reduced integration based on Gauss-Lobatto quadrature which, as we saw in the Gaussian quadrature section, leads immediately to a diagonal matrix; this procedure is often referred to as mass lumping. For low-order elements, mass lumping degrades significantly the accuracy of the finite element method, particularly in regards to its phase properties. For the 1D advection equation, mass lumping of linear elements is equivalent to a centered difference approximation. For high-order interpolation, i.e. higher than degree 3, the loss of accuracy due to the inexact quadrature is tolerable, and is of the same order of accuracy as the interpolation formula. Another alternative revolves around the use of discontinuous test functions and is appropriate for the solution of mostly hyperbolic equations; this approach is dubbed the Discontinuous Galerkin method, and will be examined in a following section.
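To make the cost argument concrete, here is a minimal sketch of the tridiagonal mass-matrix solve for 1D linear elements, assuming a uniform grid of spacing dx and ignoring any boundary-condition modifications; the routine name mass_solve is an arbitrary choice:

subroutine mass_solve(N, dx, r, u)
  ! Solves M u = r, where M is the linear-element mass matrix:
  ! interior rows (dx/6)*(1, 4, 1), end rows (dx/6)*(2, 1) and (dx/6)*(1, 2)
  implicit none
  integer, intent(in) :: N
  real, intent(in)  :: dx, r(N)
  real, intent(out) :: u(N)
  real :: a(N), b(N), c(N), cp(N), rp(N)
  integer :: i
  a = dx/6.0                 ! sub-diagonal  (a(1) unused)
  c = dx/6.0                 ! super-diagonal (c(N) unused)
  b = 4.0*dx/6.0             ! diagonal
  b(1) = 2.0*dx/6.0
  b(N) = 2.0*dx/6.0
  ! Thomas algorithm: forward sweep ...
  cp(1) = c(1)/b(1)
  rp(1) = r(1)/b(1)
  do i = 2, N
     cp(i) = c(i)/(b(i) - a(i)*cp(i-1))
     rp(i) = (r(i) - a(i)*rp(i-1))/(b(i) - a(i)*cp(i-1))
  enddo
  ! ... and back substitution
  u(N) = rp(N)
  do i = N-1, 1, -1
     u(i) = rp(i) - cp(i)*u(i+1)
  enddo
end subroutine mass_solve

The solve costs O(N) operations per time step, which is why the tridiagonal mass matrix of 1D linear elements adds so little overhead.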
The system of ODEs can now be integrated using one of the time-stepping algorithms, for example second-order leap-frog, third-order Adams-Bashforth, or one of the Runge-Kutta schemes. For linear finite elements, the stability limit can be easily studied with the help of Von Neumann stability analysis. For example, it is easy to show that a leap-frog scheme applied to equation 10.157 will result in a stability limit of the form $\mu = c\Delta t/\Delta x < 1/\sqrt{3}$, and hence is much more restrictive than the finite difference scheme, which merely requires that the Courant number be less than 1. However, an examination of the phase properties of the linear FE scheme will reveal its superiority over centered differences.
Figure 10.15: Comparison of the dispersive properties of linear finite elements with centered differences; the left panel shows the ratio of numerical phase speed to analytical phase speed, and the right panel shows the ratio of the group velocities.
We study the phase properties implied by the linear finite element discretization by looking for periodic solutions, in space and time, of the system of equations 10.157: $u_j(t) = e^{i(k x_j - \omega t)}$. We thus get the following dispersion relationship and phase speed:
\[
\omega = \frac{3c}{\Delta x}\, \frac{\sin k\Delta x}{2 + \cos k\Delta x} \tag{10.158}
\]
\[
\frac{c_{FE}}{c} = \frac{3 \sin k\Delta x}{k\Delta x\, (2 + \cos k\Delta x)} \tag{10.159}
\]
The numerical phase speed should be contrasted with the ones obtained from centered second and fourth-order finite differences:
\[
\frac{c_{CD2}}{c} = \frac{\sin k\Delta x}{k\Delta x} \tag{10.160}
\]
\[
\frac{c_{CD4}}{c} = \frac{1}{3}\left( 4\, \frac{\sin k\Delta x}{k\Delta x} - \frac{\sin 2k\Delta x}{2k\Delta x} \right) \tag{10.161}
\]
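The phase-speed ratios 10.159-10.161 are easily tabulated; the short program below (an illustration, not part of the notes) evaluates the quantities plotted in the left panel of figure 10.15:

program dispersion
  implicit none
  integer, parameter :: npts = 10
  integer :: i
  real :: kdx, pi, rfe, rcd2, rcd4
  pi = 2.0*asin(1.0)
  write(6,*) '  k dx/pi     FE        CD2       CD4'
  do i = 1, npts
     kdx  = pi*real(i)/real(npts)
     rfe  = 3.0*sin(kdx)/(kdx*(2.0+cos(kdx)))               ! equation 10.159
     rcd2 = sin(kdx)/kdx                                    ! equation 10.160
     rcd4 = (4.0*sin(kdx)/kdx - sin(2.0*kdx)/(2.0*kdx))/3.0 ! equation 10.161
     write(6,'(4f10.5)') kdx/pi, rfe, rcd2, rcd4
  enddo
end program dispersion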
Figure 10.15 compares the dispersion of linear finite elements with that of centered difference schemes of order 2, 4 and 6. It is immediately apparent that the FE formulation yields a more accurate phase speed at all wavenumbers, and that the linear interpolation is equivalent to, if not slightly better than, a sixth-order centered FD approximation; in particular the intermediate to short waves travel slightly faster than in FD. The group velocity, shown in the right panel of figure 10.15, shows similar results for the long to intermediate waves. The group velocities of the short waves are, however, in serious error for the FE formulation; in particular they have a negative phase speed and propagate upstream of the signal at a faster speed than in the finite difference schemes. A mass-lumped version of the FE method would collapse the FE curve onto that of the centered second-order method.
Figure 10.16: Flow field and initial condition for the Gaussian Hill experiment.
Figure 10.17: Convergence curves in the $L_2$ norm for the Gaussian Hill initial condition using, from top to bottom, CGM and DGM. The labels indicate the number of elements in each direction. The abscissae of the left graphs represent the spectral truncation, and those of the right graphs the total number of collocation points.
Figure 10.18: Contour lines of the rotating cone problem after one revolution for CG (left) and DG (right), using the 10 x 6 grid. The contours are irregularly spaced to highlight the Gibbs oscillations.
for the 4 schemes at the end of the simulations. We note that the contour levels are irregularly spaced and were chosen to highlight the presence of Gibbs oscillations around the 0-level contour.
For CG, the oscillations are present in the entire computational region, and have peaks that reach $-0.03$. Although the DG solution also exhibits Gibbs oscillations, these oscillations are confined to the immediate neighborhood of the cone. Their largest amplitude is one third of that observed with CG. Further reduction of these oscillations requires the use of some form of dissipation, e.g. a Laplacian, high-order filters, or slope limiters. We observe that CG shows a similar decay in the peak amplitude of the cone, with DG doing a slightly better job at preserving the peak amplitude. Figure 10.19 shows the evolution of the largest negative $T$ as a function of the grid resolution for CG and DG. We notice that the DG simulation produces smaller negative values, by up to a factor of 5, than CG at the same resolution.
Figure 10.19: Min(T) as a function of the number of elements at the end of the simulation. The red lines are for DG and the blue lines for CG. The number of points per element is fixed at 5, 7 and 9 as indicated by the labels.
Chapter 11
Linear Analysis
11.1 Linear Vector Spaces
In the following we will use the convention that bold roman letters, such as x, denote vectors, Greek symbols denote scalars (real or complex), and capital roman letters denote operators.
11.1.1 Definition of an Abstract Vector Space
We call a set $V$ of vectors a linear vector space if the following requirements are satisfied:
1. We can define an addition operation, denoted by `+', such that for any 2 elements of the vector space x and y, the result of the operation z = (x + y) belongs to $V$. We say that the set $V$ is closed under addition. Furthermore the addition must have the following properties:
(a) commutativity: x + y = y + x
(b) associativity: (x + y) + z = x + (y + z)
(c) neutral element: there exists a zero vector 0 such that x + 0 = x
(d) for each vector x in $V$ there exists a vector y such that x + y = 0; we denote this vector by y = $-$x.
2. We can define a scalar multiplication operation between any vector x in $V$ and a scalar $\alpha$ such that $\alpha$x belongs to $V$, i.e. $V$ is closed under scalar multiplication. The following properties must also hold for any 2 scalars $\alpha$ and $\beta$, and any 2 vectors x and y:
(a) $\alpha(\beta \mathbf{x}) = (\alpha\beta)\mathbf{x}$
(b) distributivity over scalar addition: $(\alpha + \beta)\mathbf{x} = \alpha\mathbf{x} + \beta\mathbf{x}$
(c) distributivity over vector addition: $\alpha(\mathbf{x} + \mathbf{y}) = \alpha\mathbf{x} + \alpha\mathbf{y}$
(d) $1\mathbf{x} = \mathbf{x}$
(e) $0\mathbf{x} = \mathbf{0}$
11.1.2 Definition of a Norm
In order to provide the abstract vector space with the sense of length and distance, we define the norm or length of a vector x as the number $\|\mathbf{x}\|$. In order for this number to make sense as a distance we put the following restrictions on the definition of the norm:
1. $\|\alpha \mathbf{x}\| = |\alpha|\, \|\mathbf{x}\|$
2. positivity: $\|\mathbf{x}\| > 0$ for all $\mathbf{x} \neq 0$, and $\|\mathbf{x}\| = 0$ if and only if $\mathbf{x} = 0$
3. triangle or Minkowski inequality: $\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$
With the help of the norm we can now define the distance between 2 vectors x and y as $\|\mathbf{x} - \mathbf{y}\|$, i.e. the norm of their difference. So 2 vectors are equal or identical if their distance is zero. Furthermore, we can now talk about the convergence of a vector sequence. Specifically, we say that a vector sequence $\mathbf{x}_n$ converges to x as $n \to \infty$ if for any $\epsilon > 0$ there is an $N$ such that $\|\mathbf{x}_n - \mathbf{x}\| < \epsilon$ for all $n > N$.
11.1.3 Definition of an Inner Product
It is generally useful to introduce the notion of angle between 2 vectors x and y. We thus define the scalar (x, y), which must satisfy the following properties:
1. conjugate symmetry: $(\mathbf{x},\mathbf{y}) = \overline{(\mathbf{y},\mathbf{x})}$, where the overbar denotes the complex conjugate.
2. linearity: $(\alpha\mathbf{x} + \beta\mathbf{y}, \mathbf{z}) = \alpha(\mathbf{x},\mathbf{z}) + \beta(\mathbf{y},\mathbf{z})$
3. positiveness: $(\mathbf{x},\mathbf{x}) > 0$ for all $\mathbf{x} \neq 0$, and $(\mathbf{x},\mathbf{x}) = 0$ if $\mathbf{x} = 0$.
The above properties imply the Schwartz inequality:
\[
|(\mathbf{x},\mathbf{y})| \le \sqrt{(\mathbf{x},\mathbf{x})}\, \sqrt{(\mathbf{y},\mathbf{y})} \tag{11.1}
\]
This inequality suggests that $\sqrt{(\mathbf{x},\mathbf{x})}$ can be adopted as a norm, so that $\|\mathbf{x}\| = (\mathbf{x},\mathbf{x})^{1/2}$. With this definition of a norm we call 2 vectors orthogonal iff $(\mathbf{x},\mathbf{y}) = 0$. Moreover, iff $|(\mathbf{x},\mathbf{y})| = \|\mathbf{x}\|\, \|\mathbf{y}\|$ the 2 vectors are collinear or aligned.
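A quick numerical illustration of the Schwartz inequality 11.1, using Fortran's intrinsic dot_product as the inner product (the program itself is an illustration, not part of the notes):

program schwartz
  implicit none
  integer, parameter :: n = 5
  real :: x(n), y(n)
  call random_number(x)               ! two arbitrary vectors
  call random_number(y)
  write(6,*) '|(x,y)|      = ', abs(dot_product(x,y))
  write(6,*) '||x|| ||y||  = ', sqrt(dot_product(x,x))*sqrt(dot_product(y,y))
  y = 2.0*x                           ! collinear case: equality holds
  write(6,*) '|(x,2x)|     = ', abs(dot_product(x,y))
  write(6,*) '||x|| ||2x|| = ', sqrt(dot_product(x,x))*sqrt(dot_product(y,y))
end program schwartz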
11.1.4 Basis
A set of vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$, all different from 0, is called a basis for the vector space $V$ if it has the following 2 properties:
1. Linear independence:
\[
\sum_{i=1}^{N} \alpha_i \mathbf{e}_i = 0 \iff \alpha_i = 0 \ \forall i \tag{11.2}
\]
If at least one of the $\alpha_i$ is non-zero, the set is called linearly dependent, and one of the vectors can be written as a linear combination of the others.
The orthogonality property can then be written as $(\mathbf{e}_i, \mathbf{e}_j) = \delta_{ij}\, (\mathbf{e}_j, \mathbf{e}_j)$, and the basis is called orthogonal. For an orthogonal basis the system reduces to the uncoupled (diagonal) system
\[
(\mathbf{e}_j, \mathbf{e}_j)\, \alpha_j = (\mathbf{x}, \mathbf{e}_j) \tag{11.6}
\]
and the coordinates can be computed easily as $\alpha_j = (\mathbf{x}, \mathbf{e}_j)/(\mathbf{e}_j, \mathbf{e}_j)$.
Consider now two functions $x(t)$ and $y(t)$ represented on the interval $[a,b]$ (i.e. in discrete space) by their pointwise values $x_i$ and $y_i$ at the points $t_i$, and let us define the discrete inner product as the Riemann-type sum:
\[
(x,y) = \frac{b-a}{N} \sum_{i=1}^{N} x_i y_i \tag{11.12}
\]
In the limit as $N$ tends to infinity the above discrete sum becomes
\[
(x,y) = \lim_{N\to\infty} \frac{b-a}{N} \sum_{i=1}^{N} x_i y_i = \int_a^b x(t)\, y(t)\, dt. \tag{11.13}
\]
Two functions are said to be orthogonal if $(x,y) = 0$. Similarly we can define the $p$-norm of a function as:
\[
\|x\|_p = \left( \int_a^b |x(t)|^p\, dt \right)^{1/p} \tag{11.14}
\]
The 1-norm, 2-norm and $\infty$-norm follow by setting $p = 1$, $2$, and $\infty$.
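The convergence of the discrete inner product 11.12 to the integral 11.13 can be checked numerically. The sketch below, an illustration rather than part of the notes, uses $x(t) = y(t) = \sin \pi t$ on $[0,1]$, for which the limit is $\int_0^1 \sin^2 \pi t\, dt = 1/2$:

program inner
  implicit none
  integer :: i, N, k
  real :: pi, t, s
  pi = 2.0*asin(1.0)
  N = 10
  do k = 1, 4
     s = 0.0
     do i = 1, N
        t = (real(i)-0.5)/real(N)   ! midpoints of N subintervals of [0,1]
        s = s + sin(pi*t)**2
     enddo
     s = s*(1.0-0.0)/real(N)        ! the (b-a)/N factor of equation 11.12
     write(6,*) N, s                ! tends to 0.5 as N grows
     N = N*10
  enddo
end program inner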
The main difficulties with function spaces are twofold. First, they are infinite-dimensional and thus require an infinite set of vectors to define a basis; proving completeness is hence difficult. Second, functions are usually defined on a continuum where limits may be in or outside the vector space. Some interesting issues arise. Consider for example the sequence of functions
\[
x_n(t) = \begin{cases} 0 & -1 \le t \le 0 \\ nt & 0 \le t \le \frac{1}{n} \\ 1 & \frac{1}{n} \le t \le 1 \end{cases} \tag{11.15}
\]
defined on $C(-1,1)$. This sequence converges to the step function, also known as the Heaviside function, $H(t)$, which does not belong to $C(-1,1)$ since it is discontinuous. Thus, although the sequence $x_n$ is in $C(-1,1)$, its limit as $n \to \infty$ is not; we say that the space does not contain its limit points and hence is not closed. This is akin to the space $C(-1,1)$ having holes in it. This is rather unfortunate as closed spaces are more tractable.
It is possible to create a closed space by changing the definition of the space slightly to that of the Lebesgue space $L_2(a,b)$, namely the space of functions that are square-integrable on the interval $[a,b]$, i.e.
\[
\|x\|^2 = \int_a^b |x(t)|^2\, dt < \infty. \tag{11.16}
\]
$L_2(a,b)$ is an example of a Hilbert space: a closed inner product space with the norm $\|x\| = (x,x)^{1/2}$.
The issue of defining a basis for a function space is complicated by the infinite dimension of the space. Assume we have an infinite set of linearly independent vectors: if we remove a single element of that set, the set is still infinite but clearly cannot generate the space. It turns out that it is possible to prove completeness, but we will defer the discussion until later. For the moment, we assume that it is possible to define such a basis. Furthermore, this basis can be made to be orthogonal. The Legendre polynomials $P_n$ are an example of an orthogonal set of functions over the interval $[-1,1]$, and the trigonometric functions $e^{inx}$ are orthogonal over $[-\pi,\pi]$.
Suppose that an orthogonal and complete basis $e_i(t)$ has been defined; then we can expand a vector in this basis:
\[
x = \sum_{i=1}^{\infty} \alpha_i e_i \tag{11.17}
\]
The above expansion is referred to as a generalized Fourier series, and the $\alpha_i$ are the Fourier coefficients of $x$ in the basis $e_i$. We can also follow the procedure outlined for the finite-dimensional spaces to compute the $\alpha_i$'s by taking the inner product of both sides of the expansion. The determination of the coordinates is, again, particularly easy if the basis is orthogonal:
\[
\alpha_i = \frac{(x, e_i)}{(e_i, e_i)} \tag{11.18}
\]
In particular, if the basis is orthonormal, $\alpha_i = (x, e_i)$.
In the following we show that the Fourier coefficients are the best approximation to the function in the 2-norm, i.e. the coefficients $\alpha_i$ minimize the norm $\|x - \sum_i \alpha_i e_i\|_2$. We have:
\[
\left\| x - \sum_i \alpha_i e_i \right\|_2^2 = \left( x - \sum_i \alpha_i e_i,\; x - \sum_i \alpha_i e_i \right) \tag{11.19}
\]
\[
= (x,x) - \sum_i \overline{\alpha_i}\, (x, e_i) - \sum_i \alpha_i\, (e_i, x) + \sum_i \sum_j \alpha_i \overline{\alpha_j}\, (e_i, e_j) \tag{11.20}
\]
The orthonormality of the basis functions can be used to simplify the last term on the right hand side to $\sum_i |\alpha_i|^2$. If we furthermore define $a_i = (x, e_i)$ we have:
\[
\left\| x - \sum_i \alpha_i e_i \right\|_2^2 = \|x\|^2 + \sum_i \left[ \alpha_i \overline{\alpha_i} - a_i \overline{\alpha_i} - \overline{a_i}\, \alpha_i \right] \tag{11.21}
\]
\[
= \|x\|^2 + \sum_i \left[ (a_i - \alpha_i)\overline{(a_i - \alpha_i)} - a_i \overline{a_i} \right] \tag{11.22}
\]
\[
= \|x\|^2 + \sum_i |a_i - \alpha_i|^2 - \sum_i |a_i|^2 \tag{11.23}
\]
Note that since the first and last terms are fixed, and the middle term is always greater than or equal to zero, the left hand side is minimized by the choice $\alpha_i = a_i = (x, e_i)$. The minimum norm has the value $\|x\|^2 - \sum_i |a_i|^2$. Since this value must always be positive, then
\[
\sum_i |a_i|^2 \le \|x\|^2 \tag{11.24}
\]
a result known as the Bessel inequality. If the basis set is complete and the minimum norm tends to zero as the number of basis functions increases to infinity, we have:
\[
\sum_i |a_i|^2 = \|x\|^2 \tag{11.25}
\]
which is known as the Parseval equality; this is a generalized ``Pythagorean Theorem''.
Both functions belong to $C(-1,1)$ and are identical for all $t$ except $t = 0$. If we use the 2-norm to judge the distance between the 2 functions, we get that $\|x - y\|_2 = 0$; hence the functions are the same. However, in the maximum norm, the 2 functions are not identical since $\|x - y\|_\infty = 1$. This example makes it apparent that the choice of norm is critical in deciding whether 2 functions are the same or different; 2 functions that may be considered identical in one norm can become different in another norm. This is simply an apparent contradiction and reflects the fact that different norms measure different things. The 2-norm, for example, looks at the global picture and asks if the 2 functions are the same over the whole interval; this is the so-called mean-square convergence. The infinity norm, on the other hand, measures pointwise convergence.
Figure 11.1: Left: Fourier fit to the step function $f = 1$; the blue line is for $N = 1$, the black line for $N = 7$, and the red line for $N = 15$. Right: Fourier expansion of $f = e^{\sin \pi x}$; the blue line is for $N = 1$, red for $N = 2$, and black for $N = 3$; the circles show the function $f$.
the boundary conditions by the periodicity conditions $y(a) = y(b)$ and $y'(a) = y'(b)$.
Note carefully that the domain of the Sturm-Liouville system is the set of functions that are twice differentiable and that satisfy certain boundary conditions, and yet the theorem asserts completeness over the much broader space $L_2$, which contains functions that need not even be continuous and that need not satisfy the boundary conditions.
The various special forms of the Sturm-Liouville problem give rise to the various commonly known Fourier series. For example, the choice $w = p = 1$ and $r = 0$ gives rise to the Fourier trigonometric series. For $w = 1$, $p = 1 - t^2$ we get the Fourier-Legendre series, etc.
Example 15 The function $y(t) = 1$ on $0 \le t \le 1$ can be expanded in the trigonometric series $\sum_{m=1}^{N} f_m \sin(m\pi t)$. It is easy to verify that
\[
(\sin m\pi t, \sin k\pi t) = \int_0^1 \sin k\pi t\, \sin m\pi t\, dt = \frac{1}{2}\,\delta_{km} \tag{11.40}
\]
Table 11.1: Fourier coefficients of the expansion in equation 11.41.

  m          A_m                       B_m
  0    0.000000000000000         0.532131755504017
  1    1.13031820798497          4.097072104153782e-17
  2    4.968164449421373e-18    -0.271495339534077
  3   -4.433684984866381e-02    -5.790930955238388e-17
  4   -3.861917048379140e-17     5.474240442093724e-03
  5    5.429263119140263e-04     2.457952426063564e-17
  6   -3.898111864745672e-16    -4.497732295430222e-05
  7   -3.198436462370136e-06     5.047870580187706e-17
  8    3.363168791013840e-16     1.992124806619515e-07
  9    1.103677177269183e-08    -1.091272736411650e-16
 10   -4.748619297285671e-17    -5.505895970193617e-10
 11   -2.497959777896152e-11     1.281761515677824e-16
and that the Fourier coefficients are given by $f_m = 0$ for even $m$, and $f_m = \frac{4}{m\pi}$ for odd $m$. Figure 11.1 illustrates the convergence of the series as the number of Fourier functions retained, $N$, increases.
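For illustration, a short program (not part of the notes) that evaluates the partial sums of this series; it can be used to reproduce the convergence behavior seen in the left panel of figure 11.1:

program squarewave
  ! Partial sums of the sine series of Example 15: y(t) = 1 on (0,1),
  ! with f_m = 4/(m*pi) for odd m and f_m = 0 for even m
  implicit none
  integer, parameter :: N = 15, npts = 11
  integer :: m, i
  real :: pi, t, s
  pi = 2.0*asin(1.0)
  do i = 1, npts
     t = real(i-1)/real(npts-1)
     s = 0.0
     do m = 1, N, 2                      ! odd m only; even coefficients vanish
        s = s + 4.0/(real(m)*pi)*sin(real(m)*pi*t)
     enddo
     write(6,*) t, s                     ! s -> 1 in the interior as N grows
  enddo
end program squarewave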
Example 16 The function $f = e^{\sin \pi x}$ is periodic over the interval $[-1,1]$ and can be expanded in a Fourier series of the following form:
\[
f(x) = \sum_{m=0}^{N} A_m \cos m\pi x + B_m \sin m\pi x \tag{11.41}
\]
The Fourier coefficients can be determined from:
\[
A_m = \frac{\int_{-1}^{1} e^{\sin \pi x} \cos m\pi x\, dx}{\int_{-1}^{1} \cos^2 m\pi x\, dx}, \qquad
B_m = \frac{\int_{-1}^{1} e^{\sin \pi x} \sin m\pi x\, dx}{\int_{-1}^{1} \sin^2 m\pi x\, dx} \tag{11.42}
\]
Since the integrals cannot be evaluated analytically, we compute them numerically using a very high order method. The first few Fourier coefficients are listed in table 11.1. Notice in particular the rapid decrease of $|A_m|$ and $|B_m|$ as $m$ increases, and the fact that with 3 Fourier modes the series expansion and the original function are visually identical.
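The notes do not spell out which quadrature was used. The sketch below (an illustration, with the normalizations of equation 11.42 assumed) uses the composite trapezoidal rule, which converges spectrally fast for smooth periodic integrands and so qualifies as a "very high order" method here:

program fourier_coeff
  implicit none
  integer, parameter :: nq = 256       ! quadrature points over [-1,1]
  integer :: m, i
  real :: pi, x, am, bm, cnorm
  pi = 2.0*asin(1.0)
  do m = 0, 5
     am = 0.0
     bm = 0.0
     do i = 0, nq-1                    ! periodic: each endpoint counted once
        x = -1.0 + 2.0*real(i)/real(nq)
        am = am + exp(sin(pi*x))*cos(real(m)*pi*x)
        bm = bm + exp(sin(pi*x))*sin(real(m)*pi*x)
     enddo
     am = am*2.0/real(nq)              ! trapezoidal rule, spacing 2/nq
     bm = bm*2.0/real(nq)
     cnorm = 1.0                       ! int of cos^2(m pi x) over [-1,1], m>0
     if (m == 0) cnorm = 2.0           ! ... equals 2 for m = 0
     write(6,'(i3,2e20.10)') m, am/cnorm, bm
  enddo
end program fourier_coeff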
Example 17 Let us take the example of the Laplace equation $\nabla^2 u = 0$ defined on the rectangular domain $0 \le x \le a$ and $0 \le y \le b$, and subject to the boundary conditions that $u = 0$ on all boundaries except the top boundary, where $u(x, y = b) = v(x)$. Separation of variables assumes that the solution can be written as the product of functions that each depend on a single independent variable: $u(x,y) = X(x)\, Y(y)$. When this trial solution is substituted in the PDE, we can derive the identity:
\[
\frac{X_{xx}}{X} = -\frac{Y_{yy}}{Y} \tag{11.43}
\]
Since the left hand side is a function of $x$ only, and the right hand side a function of $y$ only, and the equality must hold for arbitrary $x$ and $y$, the two ratios must be equal to a constant, which we set to $-\lambda^2$, and we end up with the 2 equations
\[
X_{xx} + \lambda^2 X = 0 \tag{11.44}
\]
\[
Y_{yy} - \lambda^2 Y = 0 \tag{11.45}
\]
Notice that the 2 equations are in the form of a Sturm-Liouville problem. The solutions of the above 2 systems are the following sets of functions:
\[
X = A \cos \lambda x + B \sin \lambda x \tag{11.46}
\]
\[
Y = C \cosh \lambda y + D \sinh \lambda y \tag{11.47}
\]
where $A$, $B$, $C$ and $D$ are integration constants. Applying the boundary conditions at $x = 0$ and $y = 0$, we deduce that $A = C = 0$. The boundary condition at $x = a$ produces the equation
\[
B \sin \lambda a = 0 \tag{11.48}
\]
The solution $B = 0$ is rejected since this would result in the trivial solution $u = 0$, and hence we require that $\sin \lambda a = 0$, which yields the permissible values of $\lambda$:
\[
\lambda_n = \frac{n\pi}{a}, \qquad n = 1, 2, \ldots \tag{11.49}
\]
There are thus an infinite number of possible $\lambda$'s, which we have tagged by the subscript $n$, with corresponding $X_n(x)$, $Y_n(y)$, and unknown constants. Since the problem is linear, the sum of these solutions is also a solution and hence we set
\[
u(x,y) = \sum_{n=1}^{\infty} B_n \sin \lambda_n x\, \sinh \lambda_n y
\]
Example 19 Solution of the wave equation $u_{tt} = c^2 \nabla^2 u$ in the disk $0 \le r \le a$ using cylindrical coordinates will produce the following set of equations in each independent variable:
\[
T_{tt} + \lambda^2 c^2 T = 0 \tag{11.55}
\]
\[
\Theta_{\theta\theta} + n^2 \Theta = 0 \tag{11.56}
\]
\[
r^2 R_{rr} + r R_r + (\lambda^2 r^2 - n^2) R = 0 \tag{11.57}
\]
Since the domain is periodic in the azimuthal direction, we should expect a periodic solution and hence $n$ must be an integer. The radial equation is nothing but the Bessel equation in the variable $\lambda r$; its solutions are given by $R = A_n J_n(\lambda r) + B_n Y_n(\lambda r)$. $B_n = 0$ must be imposed if the solution is to be finite at $r = 0$. The eigenvalues are determined by imposing a boundary condition at $r = a$. For homogeneous Dirichlet conditions, the eigenvalues are determined by the roots $\alpha_{mn}$ of the Bessel functions, $J_n(\alpha_{mn}) = 0$, and hence $\lambda_{mn} = \alpha_{mn}/a$. Note that $\lambda_{mn}$ is the radial wavenumber. The solution can now be expanded as:
\[
u(r,\theta,t) = \sum_{m=1}^{\infty} \sum_{n=0}^{\infty} J_n(\lambda_{mn} r) \left( A_{mn} \cos n\theta + B_{mn} \sin n\theta \right) \left( C_{mn} \cos \omega_{mn} t + D_{mn} \sin \omega_{mn} t \right)
\]
where $\omega_{mn} = \lambda_{mn} c$ is the time frequency. The integration constants must be determined from the initial conditions, of which there must be 2 since we have a second derivative in time. In the present example the radial eigenfunctions are given by the Bessel functions of the first kind and order $n$, and the eigenvalues are determined by the roots of $J_n$. Notice that the Bessel equation is also a Sturm-Liouville problem, and hence the basis $J_n(\lambda_{mn} r)$ must be complete and orthogonal. Periodicity in the azimuthal direction yields the trigonometric functions, and quantizes the eigenvalues to the set of integers.
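The roots of $J_n$ can be located numerically. The sketch below (an illustration, not part of the notes) brackets and bisects the first three roots of $J_0$ using the Fortran 2008 intrinsic bessel_jn:

program bessel_roots
  ! First roots alpha_{m0} of J_0, from which lambda_{m0} = alpha_{m0}/a
  implicit none
  real :: xl, xr, xm, x, a
  integer :: m, it
  a = 1.0                                   ! disk radius
  xl = 0.1
  do m = 1, 3
     x = xl                                 ! scan for a sign change of J_0
     do while (bessel_jn(0,x)*bessel_jn(0,x+0.1) > 0.0)
        x = x + 0.1
     enddo
     xl = x
     xr = x + 0.1
     do it = 1, 50                          ! refine the bracket by bisection
        xm = 0.5*(xl+xr)
        if (bessel_jn(0,xl)*bessel_jn(0,xm) <= 0.0) then
           xr = xm
        else
           xl = xm
        endif
     enddo
     write(6,*) 'root', m, ':', xm, '  lambda =', xm/a
     xl = xm + 0.1                          ! restart the scan past this root
  enddo
end program bessel_roots

The expected roots are approximately 2.405, 5.520 and 8.654.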
Chapter 12
Rudiments of Linear Algebra
12.1.2 Inner product
An inner product is an operation that associates a real number with any two vectors u and v. It is usually denoted by (u, v) or u . v and has the following properties:
1. commutativity: (u, v) = (v, u).
2. linearity under scalar multiplication: $(\alpha\mathbf{u}, \mathbf{v}) = \alpha\, (\mathbf{u}, \mathbf{v})$.
3. linearity under vector addition: (u, v + w) = (u, v) + (u, w).
4. positivity: (u, u) >= 0, and (u, u) = 0 implies u = 0.
The norm and inner product definitions allow us to define the cosine of the angle between two vectors as:
\[
\cos\theta = \frac{(\mathbf{u},\mathbf{v})}{\|\mathbf{u}\|\, \|\mathbf{v}\|} \tag{12.2}
\]
The properties of the inner product allow it to define a vector norm, often called the inner-product induced norm: $\|\mathbf{u}\| = \sqrt{(\mathbf{u},\mathbf{u})}$.
Commonly used matrix norms include:
1. the 1-norm, $\|L\|_1 = \max_j \sum_i |L_{ij}|$, the maximum column sum;
2. the $\infty$-norm, $\|L\|_\infty = \max_i \sum_j |L_{ij}|$, the maximum row sum;
3. the 2-norm, $\|L\|_2 = \sqrt{\rho(L^T L)}$, where $\rho$ is the spectral radius (see below). If the matrix $L$ is symmetric, $L^T = L$, then $\|L\|_2 = \sqrt{\rho(L^2)} = \rho(L)$.
for $i = 1, 2, \ldots, N$.
For periodic partial differential equations, the tridiagonal system is often of the form
\[
L = \begin{pmatrix}
a & b &        &        &        & c \\
c & a & b      &        &        &   \\
  & c & a      & b      &        &   \\
  &   & \ddots & \ddots & \ddots &   \\
  &   &        & c      & a      & b \\
b &   &        &        & c      & a
\end{pmatrix} \tag{12.13}
\]
The eigenvalues are then given by:
\[
\lambda_i = a + (b + c) \cos \frac{2\pi(i-1)}{N} + \mathrm{i}\, (c - b) \sin \frac{2\pi(i-1)}{N} \tag{12.14}
\]
where $\mathrm{i}$ denotes $\sqrt{-1}$ and $i = 1, \ldots, N$. A final useful result is the following: if a real tridiagonal matrix has either all its off-diagonal elements positive or all its off-diagonal elements negative, then all its eigenvalues are real.
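Equation 12.14 can be verified by applying the operator 12.13 to a discrete Fourier mode. The sketch below (an illustration, with sample stencil values a, b, c) checks the eigen-relation for one mode:

program circulant_eig
  implicit none
  integer, parameter :: N = 8
  real, parameter :: a = -2.0, b = 1.5, c = 0.5  ! sample asymmetric stencil
  complex :: v(N), Lv(N), lam
  real :: pi, arg
  integer :: j, k
  pi = 2.0*asin(1.0)
  k = 3                                     ! mode number, 1 <= k <= N
  arg = 2.0*pi*real(k-1)/real(N)
  do j = 1, N
     v(j) = exp(cmplx(0.0, -arg*real(j-1))) ! discrete Fourier mode
  enddo
  do j = 1, N                               ! apply L with periodic wrap-around
     Lv(j) = c*v(1+modulo(j-2,N)) + a*v(j) + b*v(1+modulo(j,N))
  enddo
  lam = cmplx(a + (b+c)*cos(arg), (c-b)*sin(arg))  ! equation 12.14
  write(6,*) 'max |Lv - lambda v| =', maxval(abs(Lv - lam*v))
end program circulant_eig

The printed residual should be at the level of single-precision round-off.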
Chapter 13
Programming Tips
13.1 Introduction
The implementation of numerical algorithms requires familiarity with a number of software packages and utilities. Here is a short list of the minimum required to get started in programming:
1. Basics of the Operating System, such as file manipulation. The RSMAS library has UNIX: the basics. The library also has a bunch of electronic books on the subject. Two titles I came across are: Unix Unleashed, and Learning the Unix Operating System: Nutshell Handbook.
2. A text editor to write the computer program. Most Unix books have a short tutorial on using either vi or emacs for editing text files. There are a number of simpler visual editors too, such as jed. Web sites for vi or its close cousin vim are:
http://www.asu.edu/it/fyi/dst/helpdocs/editing/vi/
This is actually a very concise and fast introduction to vi. Start with it and then go to the other web sites for more in-depth information.
http://docs.freebsd.org/44doc/usd/12.vi/paper.html
http://www.vim.org/
program waves
implicit none
integer, parameter :: M = 21               ! number of grid points
real, parameter :: xmin = -1.0, xmax = 1.0 ! interval end points
real :: f(M)
integer :: i
real :: pi,x,y,dx
!.End of Variable Declaration
write(6,*) 'Hello World'
open(unit=9, file='waves.out')   ! file receiving the three-column output
pi = 2.0*asin(1.0) ! initialize pi
dx = (xmax-xmin)/real(M-1) ! grid-size
do i = 1,M ! counter: starts at 1, increments by 1 and ends at M
!...indent statements within loop for clarity
   x = (i-1)*dx + xmin ! location on interval
   y = sin(pi*x) ! compute function 1
   f(i) = cos(pi*x) ! compute function 2
   write(6,*) x,y ! write two columns to terminal
   write(9,*) x,y,f(i) ! write three columns to file
enddo ! end of do loop must be marked.
write(6,*)'Done'
stop
end program waves
!
!
! Compiling the program and creating an executable called waves:
! $ f90 waves.f90 -o waves
! If "-o waves" is omitted the executable will be called a.out
! by default. The fortran 90 compiler (f90) may have a different name
! on your system. Possible alternatives are pgf90 (Portland Group
! compiler), ifc (Intel Fortran Compiler), and xlf90 (on IBMs).
!
!
! Running the program
! $ waves
!
!
! Expected Terminal output is:
! Hello World
! -1.000000 8.7422777E-08
! -0.9000000 -0.3090170
! -0.8000000 -0.5877852
! -0.7000000 -0.8090170
! -0.6000000 -0.9510565
! -0.5000000 -1.000000
! -0.4000000 -0.9510565
! -0.3000000 -0.8090171
! -0.2000000 -0.5877852
! -9.9999964E-02 -0.3090169
! 0.0000000E+00 0.0000000E+00
! 0.1000000 0.3090171
! 0.2000000 0.5877854
! 0.3000001 0.8090171
! 0.4000000 0.9510565
! 0.5000000 1.000000
! 0.6000000 0.9510565
! 0.7000000 0.8090169
! 0.8000001 0.5877850
! 0.9000000 0.3090170
! 1.000000 -8.7422777E-08
! Done
!
!
! Visualizing results with matlab:
! $ matlab
!> z = load('waves.out') % read file into array z
!> size(z) % get dimensions of z
!> plot(z(:,1),z(:,2),'k') % plot second column versus first in black
!> hold on % add additional lines
!> plot(z(:,1),z(:,3),'r') % plot third column versus first in red
!> xlabel('x') % add labels to x-axis.
!> ylabel('f(x)') % add labels to y-axis.
!> title('sine and cosine curves') % add title to plot
!> legend('sin','cos',0) % add legend
!> print -depsc waves % save results to a color
!> % encapsulated postscript file
!> % called waves.eps. The extension
!> % eps will be added automatically.
! Viewing the postscript file:
! $ ghostscript waves.eps
!
! Printing the file to color printer
! $ lpr -Pmpocol waves.eps
Use list files to catch the compiler report. When compiling, the compiler rapidly throws a list of errors at you; being out of context, they are hard to understand, and even harder to remember when you return to the editor. Using two windows, one for editing and one for debugging, is helpful. You can also ask the compiler to generate a LIST FILE that you can look at (or print) in a separate window.
Use modules and interfaces to double-check the argument lists of calling subroutines; a small example follows.
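A minimal sketch of the module mechanism (the names utils and axpy are arbitrary): because the subroutine lives in a module, the compiler sees its explicit interface at every call site and can reject calls with mismatched argument lists at compile time.

module utils
  implicit none
contains
  subroutine axpy(n, a, x, y)
     ! y <- a*x + y
     integer, intent(in) :: n
     real, intent(in) :: a, x(n)
     real, intent(inout) :: y(n)
     y = a*x + y
  end subroutine axpy
end module utils

program test_interface
  use utils                   ! the explicit interface of axpy is now known
  implicit none
  real :: x(5), y(5)
  x = 1.0
  y = 2.0
  call axpy(5, 3.0, x, y)     ! correct call
! call axpy(5.0, 3, x, y)     ! wrong argument types: rejected at compile time
  write(6,*) y
end program test_interface

Had axpy been an external subroutine compiled separately, the bad call in the comment would have compiled silently and failed, if at all, only at run time.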