Lecture Notes On Problem Discretization Using Approximation Theory
Contents

1 Unified Problem Representation
2 Polynomial Approximation [3]
5 Least Square Approximations
  5.1 Solution of Linear Least Square Problem
  5.2 Geometric Interpretation of Least Squares Approximation [11]
    5.2.1 Distance of a Point from a Line
    5.2.2 Distance of a Point from a Subspace
    5.2.3 Additional Geometric Insights
  5.3 Projection Theorem in a General Hilbert Space [6]
    5.3.1 Simple Polynomial Models and Hilbert Matrices [11, 7]
    5.3.2 Approximation of Numerical Data by a Polynomial [7]
  5.4 Problem Discretization using Minimum Residual Methods
    5.4.1 Rayleigh-Ritz Method [11, 12]
    5.4.2 Discretization of ODE-BVP / PDEs using Finite Element Method
    5.4.3 Method of Least Squares [4]
    5.4.4 Galerkin's Method [4, 2]
To begin with, we listed and categorized different types of equations that arise in a variety of engineering problems. The fundamentals of vector spaces were introduced in the subsequent module. With this background, we are ready to start our journey in numerical analysis. We first show that the concept of vector space allows us to develop a unified representation of seemingly different problems to be solved, which were categorized as algebraic equations, ODE-IVPs, ODE-BVPs, PDEs etc. When the transformations involved in a problem at hand are nonlinear, it is often not possible to solve the problem analytically. In all such cases, the problem is approximated and transformed to a computationally tractable form

[Original Problem]  --(Approximation)-->  [Computationally Tractable Approximation]

and we compute an approximate solution using the computable version. In this module, we explain the process of problem approximation using various popular approaches available in the literature. In the end, we distill out equation forms that frequently arise in the process of problem approximation.

The set of all elements for which an operator T is defined is called the domain of T, and the set of all elements generated by transforming elements in the domain by T is called the range of T. If for every y ∈ Y there is at most one x ∈ M for which T(x) = y, then T(·) is said to be one-to-one. If for every y ∈ Y there is at least one x ∈ M, then T is said to map M onto Y. A transformation is said to be invertible if it is both one-to-one and onto.

Recall that a transformation T is linear if and only if

T(α x^(1) + β x^(2)) = α T(x^(1)) + β T(x^(2))    (1)

for arbitrary scalars α and β. Note that any transformation that does not satisfy this definition is not a linear transformation.
Example 4 Operators

1. Consider the transformation

y = Ax    (2)

where y ∈ R^m, x ∈ R^n, A ∈ R^(m×n) and T(x) = Ax. Whether this mapping is onto R^m depends on the rank of the matrix. It is easy to check that T is a linear operator.

2. Consider the transformation

y = Ax + b    (3)

where y, b ∈ R^m, x ∈ R^n, A ∈ R^(m×n) and T(x) = Ax + b. Here, b is a fixed non-zero vector. Note that this transformation does not satisfy equation (1) and does not qualify as a linear transformation.

4. Consider the transformation

y(t) = dx(t)/dt

where t ∈ [a, b]. Here, T(·) = d/dt is an operator from X ≡ C^(1)[a, b], the space of continuously differentiable functions, to the space of continuous functions, i.e. Y ≡ C[a, b]. It is easy to check that this is a linear operator.
5. Consider the ODE-IVP

dx/dt = f[t, x(t)],  t ∈ [0, ∞)    (4)

with the initial condition x(0) = ξ. Defining the product space Y ≡ C[0, ∞) × R, the transformation T: C^(1)[0, ∞) → Y can be stated as

T[x(·)] ≡ (dx/dt − f[t, x(t)],  x(0))

so that the ODE-IVP takes the form T[x(·)] = (0(t), ξ), where 0(t) represents the zero function over the interval [0, ∞), i.e. 0(t) = 0 for all t ∈ [0, ∞).

6. Consider the ODE-BVP

a d²u/dz² + b du/dz + c g(u) = 0    (0 ≤ z ≤ 1)

B.C. at z = 0 :  f₁(du(0)/dz, u(0)) = α₀

B.C. at z = 1 :  f₂(du(1)/dz, u(1)) = α₁

The transformation

T[u(z)] ≡ (a d²u(z)/dz² + b du(z)/dz + c g(u(z)),  f₁(u'(0), u(0)),  f₂(u'(1), u(1)))

maps the space X ≡ C^(2)[0, 1] to Y ≡ C[0, 1] × R × R, and the ODE-BVP can be represented as follows

T[u(·)] = (0(z), α₀, α₁)

7. Consider the PDE

a ∂²u/∂z² + b ∂u/∂z + c g(u) − ∂u/∂t = 0

defined over 0 < z < 1 and t ≥ 0, with the initial and the boundary conditions specified as follows

u(z, 0) = h(z)  for 0 < z < 1

B.C. at z = 0 :  f₁(∂u(0, t)/∂z, u(0, t)) = α₀  for t ≥ 0

B.C. at z = 1 :  f₂(∂u(1, t)/∂z, u(1, t)) = α₁  for t ≥ 0

In this case, the transformation T[u(z, t)] defined as

T[u(z, t)] ≡ (a ∂²u(z, t)/∂z² + b ∂u(z, t)/∂z + c g(u(z, t)) − ∂u/∂t,  u(z, 0),  f₁(u'(0, t), u(0, t)),  f₂(u'(1, t), u(1, t)))

maps the space X ≡ C^(2)[0, 1] × C^(1)[0, ∞) to Y ≡ C[0, 1] × C[0, 1] × R × R, and the PDE can be represented as follows

T[u(·, ·)] = (0(z, t), h(z), α₀, α₁)
A large number of problems arising in applied mathematics can be stated as follows [4]:

Direct Problems: Given operator T and x, find y. In this case, we are trying to compute the output of a given system given the input. The computation of definite integrals is an example of this type.

Inverse Problems: Given operator T and y, find x. In this case we are looking for the input which generates the observed output. Solving a system of simultaneous (linear / nonlinear) algebraic equations, ordinary and partial differential equations, and integral equations are examples of this category.

The direct problems can be treated relatively easily. Inverse problems and identification problems are relatively difficult to solve and form the central theme of numerical analysis. In this course, the inverse problems are of particular interest. When the operator involved is nonlinear, it is difficult to solve the problem (5) analytically. The problem is approximated and transformed to a computable form

[y = T(x)]  --(Discretization)-->  [ỹ = T̂(x̃)]    (6)

where x̃ ∈ X_n and ỹ ∈ Y_n, with X_n and Y_n being finite dimensional spaces, and T̂(·) is an approximation of the original operator T(·). This process is called discretization. The main tool used for discretization is the approximation of continuous functions using polynomials. In the sections that follow, we discuss the theoretical basis for this choice and different commonly used polynomial based approaches for problem discretization.
2 Polynomial Approximation [3]

Given an arbitrary continuous function over an interval, can we approximate it with another "simple" function to an arbitrary degree of accuracy? This question assumes significant importance while developing many numerical methods. In fact, this question can be posed in any general vector space. We often use such simple approximations while performing computations. The classic examples of such approximations are the use of a rational number to approximate an irrational number (e.g. 22/7 is used in place of π, or a finite series expansion of the number e) or the polynomial approximation of a continuous function. This section discusses the rationale behind such approximations.

Definition 5 (Dense Set): A set D is said to be dense in a normed space X if, for each element x ∈ X and every ε > 0, there exists an element d ∈ D such that ‖x − d‖ < ε.

Thus, if a set D is dense in X, then there are points of D arbitrarily close to any element of X. Given any x ∈ X, a sequence can be constructed in D which converges to x. A classic example of such a dense set is the set of rational numbers in the real line. Another dense set, which is widely used for approximations, is the set of polynomials. This set is dense in C[a, b], and any continuous function f(t) ∈ C[a, b] can be approximated by a polynomial function p(t) with an arbitrary degree of accuracy, as is evident from the following result. This classical result is stated here without proof.

Theorem 6 (Weierstrass Approximation Theorem): Consider the space C[a, b], the set of all continuous functions over the interval [a, b], together with the ∞-norm defined on it as

‖f(t)‖∞ = max_{t ∈ [a, b]} |f(t)|    (7)

Given any ε > 0, for every f(t) ∈ C[a, b] there exists a polynomial p_n(t) such that ‖f(t) − p_n(t)‖∞ < ε.

This fundamental result forms the basis of discretization in the majority of numerical techniques. It may be noted that this is only an existence theorem and does not provide any method of constructing the polynomial approximation. Approaches used for constructing polynomial approximations are discussed in the subsequent sections.
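The theorem can be illustrated numerically. The following sketch (the test function and the use of least-squares fits in place of the unspecified "best" polynomials are illustrative assumptions) shows the sup-norm error shrinking as the degree grows:

```python
# Illustration of the Weierstrass result (not a proof): the maximum-norm
# error of a fitted polynomial on [a, b] = [0, 1] shrinks as the degree grows.
# The test function exp(t)*sin(5t) is an arbitrary choice.
import numpy as np

t = np.linspace(0.0, 1.0, 2001)            # dense sampling of [0, 1]
f = np.exp(t) * np.sin(5.0 * t)            # a continuous function

for n in [2, 4, 8, 12]:
    p = np.polynomial.Polynomial.fit(t, f, deg=n)   # degree-n fit
    err = np.max(np.abs(f - p(t)))                  # discrete sup-norm error
    print(f"degree {n:2d}: max |f - p_n| = {err:.2e}")
```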
3 Discretization using Taylor Series Approximation
3.1 Local approximation by Taylor series expansion [9]
To begin with, let us consider the Taylor series expansion of a real valued scalar function. For any scalar function f(x): R → R which is continuously differentiable (n+1) times at x = x̄, the Taylor series expansion in the neighborhood of the point x = x̄ can be expressed as

f(x) = f(x̄) + (∂f(x̄)/∂x) Δx + (1/2!) (∂²f(x̄)/∂x²) (Δx)² + ... + (1/n!) (∂^n f(x̄)/∂x^n) (Δx)^n + r_n(x̄, Δx)    (8)

r_n(x̄, Δx) = (1/(n+1)!) (∂^(n+1) f(x̄ + λΔx)/∂x^(n+1)) (Δx)^(n+1),  where Δx = x − x̄ and (0 < λ < 1)

Similarly, a multivariate function F(x): R^n → R^m which is continuously differentiable (n+1) times at x = x̄ can be expanded as

F(x) = F(x̄) + (∂F(x̄)/∂x) Δx + (1/2!) (∂²F(x̄)/∂x²) (Δx, Δx) + ... + (1/n!) (∂^n F(x̄)/∂x^n) (Δx, Δx, ..., Δx) + R_n(x̄, Δx)    (9)

R_n(x̄, Δx) = (1/(n+1)!) (∂^(n+1) F(x̄ + λΔx)/∂x^(n+1)) (Δx, Δx, ..., Δx),  where (0 < λ < 1)

It may be noted that, since F(x) ∈ R^m, the Jacobian [∂F(x̄)/∂x] is a matrix of dimension (m × n), [∂²F(x̄)/∂x²] is an (m × n × n) dimensional array, and so on. In general, [∂^r F(x̄)/∂x^r] is an (m × n × n × ... × n) dimensional array such that, when the vector Δx operates on it r times, the product is an m × 1 vector. The following two multidimensional cases are used very frequently in numerical analysis.

Case A: Scalar Function f(x): R^n → R

f(x) = f(x̄) + [∇f(x̄)]^T Δx + (1/2!) Δx^T [∇²f(x̄)] Δx + R₃(x̄, Δx)    (10)

∇f(x̄) = ∂f(x̄)/∂x = [∂f/∂x₁  ∂f/∂x₂  ...  ∂f/∂x_n]^T |_{x=x̄}    (Gradient)

∇²f(x̄) = ∂²f(x̄)/∂x² =
[ ∂²f/∂x₁²      ∂²f/∂x₁∂x₂   ...  ∂²f/∂x₁∂x_n ]
[ ∂²f/∂x₂∂x₁   ∂²f/∂x₂²      ...  ∂²f/∂x₂∂x_n ]
[ ......        ......        ...  ......       ]
[ ∂²f/∂x_n∂x₁  ∂²f/∂x_n∂x₂  ...  ∂²f/∂x_n²    ] |_{x=x̄}    (Hessian)

R₃(x̄, Δx) = (1/3!) Σ_{i=1}^n Σ_{j=1}^n Σ_{k=1}^n (∂³f(x̄ + λΔx)/∂x_i∂x_j∂x_k) Δx_i Δx_j Δx_k,  (0 < λ < 1)

Note that the gradient ∇f(x̄) is an n × 1 vector and the Hessian ∇²f(x̄) is an n × n matrix.

For a function of two variables (n = 2), the expansion takes the form

f(x) = f(x̄) + [∂f/∂x₁  ∂f/∂x₂]_{x=x̄} Δx + (1/2!) [Δx]^T
[ ∂²f/∂x₁²     ∂²f/∂x₁∂x₂ ]
[ ∂²f/∂x₂∂x₁  ∂²f/∂x₂²    ]_{x=x̄} Δx + R₃(x̄, Δx)    (11)

For example, expanding a particular f(x₁, x₂) around x̄ = [1  1]^T yields

f(x) = 2(1 + e²) + [(2 + e²)  (2 + e²)] [x₁ − 1 ; x₂ − 1]
+ [x₁ − 1 ; x₂ − 1]^T [ (2 + e²)  e² ; e²  (2 + e²) ] [x₁ − 1 ; x₂ − 1] + R₃(x̄, Δx)    (12)
Case B: Function Vector F(x): R^n → R^n

F(x) = F(x̄) + (∂F(x̄)/∂x) Δx + R₂(x̄, Δx)    (13)

∂F(x̄)/∂x =
[ ∂f₁/∂x₁   ∂f₁/∂x₂   ...  ∂f₁/∂x_n ]
[ ∂f₂/∂x₁   ∂f₂/∂x₂   ...  ∂f₂/∂x_n ]
[ ......     ......     ...  ......    ]
[ ∂f_n/∂x₁  ∂f_n/∂x₂  ...  ∂f_n/∂x_n ] |_{x=x̄}    (Jacobian)

3.2 Discretization of ODE-BVP using Finite Difference Method

Consider the following general form of 2nd order ODE-BVP problem frequently encountered in engineering problems

Ψ[d²u/dz², du/dz, u, z] = 0  for z ∈ (0, 1)    (14)

B.C. 1 (at z = 0) :  f₁[du(0)/dz, u(0)] = 0    (15)

B.C. 2 (at z = 1) :  f₂[du(1)/dz, u(1)] = 0    (16a)
Let u*(z) ∈ C^(2)[0, 1] denote the exact / true solution to the above ODE-BVP. Depending on the nature of the operator Ψ, it may or may not be possible to find the true solution to the problem. In the present case, however, we are interested in finding an approximate numerical solution, say ũ(z), to the above ODE-BVP. The basic idea in the finite difference approach is to convert the ODE-BVP into a set of coupled linear or nonlinear algebraic equations using local approximation of the derivatives based on the Taylor series expansion. In order to achieve this, the domain 0 ≤ z ≤ 1 is divided into (n + 1) equi-spaced grid points z₀, z₁, ..., z_n located such that

z_i = i Δz,  Δz = 1/n,  i = 0, 1, ..., n

which is considered for the subsequent development. Let the value of ũ(z) at location z_i be denoted as u_i = ũ(z_i). If Δz is sufficiently small, then, using the Taylor series expansion, we can write

u_{i+1} = u_i + (du/dz)_{z=z_i} (Δz) + (1/2!) (d²u/dz²)_{z=z_i} (Δz)² + (1/3!) (d³u/dz³)_{z=z_i} (Δz)³ + ...    (17)

u_{i−1} = u_i − (du/dz)_{z=z_i} (Δz) + (1/2!) (d²u/dz²)_{z=z_i} (Δz)² − (1/3!) (d³u/dz³)_{z=z_i} (Δz)³ + ...    (18)

From equations (17) and (18) we can arrive at several approximate expressions for (du/dz)_{z=z_i}. Rearranging equation (17) we obtain

(du/dz)_{z=z_i} = (u_{i+1} − u_i)/Δz − [(d²u/dz²)_{z=z_i} (Δz/2) + ...]    (19)

and, neglecting the higher order terms when Δz is sufficiently small, we obtain the forward difference approximation of the local first order derivative, i.e.

(du/dz)_{z=z_i} ≈ (u_{i+1} − u_i)/Δz

Similarly, starting from equation (18), we can arrive at the backward difference approximation of the local first order derivative, i.e.

(du/dz)_{z=z_i} ≈ (u_i − u_{i−1})/Δz    (20)

Combining equations (17) and (18) we can arrive at the following expression

(du/dz)_{z=z_i} = (u_{i+1} − u_{i−1})/(2Δz) − [(d³u/dz³)_{z=z_i} ((Δz)²/3!) + ...]    (21)

and, neglecting the higher order terms when Δz is sufficiently small, we obtain the central difference approximation of the local first order derivative, i.e.

(du/dz)_{z=z_i} ≈ (u_{i+1} − u_{i−1})/(2Δz)    (22)

The errors in the forward and the backward difference approximations are of the order of Δz, which is denoted as O(Δz). The central difference approximation is accurate to O[(Δz)²] and is more commonly used. Equations (17) and (18) can be added to obtain

(d²u/dz²)_{z=z_i} = (u_{i+1} − 2u_i + u_{i−1})/(Δz)² − [(d⁴u/dz⁴)_{z=z_i} (2(Δz)²/4!) + ...]    (23)

which yields an approximation for the second order derivative at the i'th grid point

(d²u/dz²)_{z=z_i} ≈ (u_{i+1} − 2u_i + u_{i−1})/(Δz)²    (24)

Note that the errors in the approximations (22) and (24) are of order O[(Δz)²].
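These error orders can be verified numerically. In the sketch below, u(z) = sin z is an assumed test function so that the exact derivatives are known:

```python
# Check that the forward difference error is O(dz) while the central and
# three-point second-derivative errors are O(dz^2), using u(z) = sin(z).
import numpy as np

u, du, d2u = np.sin, np.cos, lambda z: -np.sin(z)
zi = 0.7
for dz in [0.1, 0.05, 0.025]:
    fwd = (u(zi + dz) - u(zi)) / dz
    cen = (u(zi + dz) - u(zi - dz)) / (2 * dz)
    sec = (u(zi + dz) - 2 * u(zi) + u(zi - dz)) / dz**2
    print(f"dz={dz:6.3f}  fwd={abs(fwd - du(zi)):.2e}  "
          f"cen={abs(cen - du(zi)):.2e}  sec={abs(sec - d2u(zi)):.2e}")
```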
While discretizing the ODE, it is preferable to use approximations having similar accuracies. These local approximations of the derivatives can be used to discretize the ODE-BVP: forcing the residual to zero at the (n − 1) internal grid points gives (n − 1) equations in the (n + 1) unknowns {u₀, ..., u_n}, and the remaining equations are generated from the boundary conditions using one of the following approaches.

– Approach 1: Use one-sided differences only at the boundary points, i.e.,

f₁[(u₁ − u₀)/Δz, u₀] = 0    (27)

f₂[(u_n − u_{n−1})/Δz, u_n] = 0    (28)

This gives the remaining two equations.

– Approach 2: Use central differences at the boundary points, i.e.,

f₁[(u₁ − u₋₁)/(2Δz), u₀] = 0    (29)

f₂[(u_{n+1} − u_{n−1})/(2Δz), u_n] = 0    (30)

This approach introduces two more variables, u₋₁ and u_{n+1}, at hypothetical grid points. Thus we have (n + 3) variables and (n + 1) equations; two more algebraic equations can be generated by setting the residual to zero at the boundary points, i.e., at z₀ and z_n:

R₀ = 0 and R_n = 0

This results in (n + 3) equations in (n + 3) unknowns.

It may be noted that the local approximations of the derivatives are developed under the assumption that Δz is chosen sufficiently small. Consequently, it can be expected that the quality of the approximate solution would improve with an increase in the number of grid points.
Example 9 Consider the problem of steady state heat conduction in a slab with uniform heat generation

k d²T/dz² + q = 0  for 0 < z < L    (31)

B.C. at z = 0 :  T(0) = T*    (32)

B.C. at z = L :  −k dT(L)/dz = h[T(L) − T∞]    (33)

Note that this problem can be solved analytically. However, it is used here to introduce the concepts of discretization by the finite difference approach. Dividing the region 0 ≤ z ≤ L into n equal subregions and setting the residuals to zero at the internal grid points, we have

(T_{i−1} − 2T_i + T_{i+1})/(Δz)² + q/k = 0    (34)

for i = 1, 2, ..., (n − 1). Using the boundary condition (32), the residual at i = 1 reduces to

−2T₁ + T₂ = −(Δz)² (q/k) − T*    (35)

Using a one-sided derivative at z = L, the boundary condition (33) reduces to

k (T_n − T_{n−1})/Δz = h(T∞ − T_n)    (36)

or

(1 + hΔz/k) T_n − T_{n−1} = (hΔz/k) T∞    (37)

Rearranging the equations in matrix form, we have

A x̃ = ỹ

x̃ = [T₁  T₂  ...  T_n]^T

ỹ = [−(Δz)²(q/k) − T*   −(Δz)²(q/k)   ...   −(Δz)²(q/k)   −h(Δz)T∞/k]^T

T̂ ≡ A =
[ −2   1    0   ...   ...    0            ]
[  1  −2    1    0    ...    ...          ]
[  0   1   −2    1    ...    ...          ]
[  :    :    :   ...   ...    :           ]
[  :    :    :   ...   −2     1           ]
[  0   0    0   ...    1   −(1 + hΔz/k)  ]

Thus, after discretization, the ODE-BVP is reduced to a set of linear algebraic equations. It may also be noted that we end up with a tridiagonal matrix A, which is a sparse matrix, i.e. it contains a large number of zero elements.
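A minimal numerical sketch of this discretized problem is given below; the parameter values are illustrative assumptions, not taken from the notes:

```python
# Assemble and solve the tridiagonal system A x = y from (34)-(37),
# assuming k = 1, q = 1, L = 1, h = 1, Tstar = 1 and Tinf = 0.25.
import numpy as np

n, k, q, L, h, Tstar, Tinf = 20, 1.0, 1.0, 1.0, 1.0, 1.0, 0.25
dz = L / n
A = np.zeros((n, n))                   # unknowns T1, ..., Tn
y = np.zeros(n)
for i in range(n - 1):                 # rows: T_{i-1} - 2 T_i + T_{i+1} = -(dz^2 q / k)
    A[i, i] = -2.0
    A[i, i + 1] = 1.0
    if i > 0:
        A[i, i - 1] = 1.0
    y[i] = -dz**2 * q / k
y[0] -= Tstar                          # known value T0 = Tstar moved to the RHS
A[n - 1, n - 2] = 1.0                  # last row from equation (37)
A[n - 1, n - 1] = -(1.0 + h * dz / k)
y[n - 1] = -h * dz * Tinf / k

T = np.linalg.solve(A, y)
print(T[0], T[-1])                     # temperatures near z = 0 and at z = L
```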
Example 10 Consider the ODE-BVP describing the steady state conditions in a tubular reactor with axial mixing (TRAM) in which an irreversible 2nd order reaction is carried out at a constant temperature. The steady state behavior can be modelled using the following ODE-BVP:

(1/Pe) d²C/dz² − dC/dz − Da C² = 0    (0 ≤ z ≤ 1)    (38)

B.C. at z = 0 :  dC/dz = Pe (C − 1)  at z = 0    (39)

B.C. at z = 1 :  dC/dz = 0  at z = 1    (40)

Forcing the residuals at the (n − 1) internal grid points to zero, we have

(1/Pe) (C_{i+1} − 2C_i + C_{i−1})/(Δz)² − (C_{i+1} − C_{i−1})/(2Δz) = Da C_i²

i = 1, 2, ..., n − 1

Defining

α = 1/((Δz)² Pe) + 1/(2Δz) ;  β = 2/((Δz)² Pe)

the above set of nonlinear equations can be rearranged as follows

(β − α) C_{i+1} − β C_i + α C_{i−1} = Da C_i²

i = 1, 2, ..., n − 1

The two boundary conditions yield two additional equations

(C₁ − C₀)/Δz = Pe (C₀ − 1)

(C_n − C_{n−1})/Δz = 0

The above set of nonlinear algebraic equations can be arranged as follows

T̂(x̃) ≡ A x̃ − G(x̃) = 0    (41)

where

A =
[ −(1 + Δz Pe)   1      0      ...   ...    0   ]
[  α            −β    (β−α)    0     ...    0   ]
[  ...           ...    ...    ...   ...        ]
[  0             ...    α     −β    (β−α)       ]
[  0             ...    ...    ...   −1     1   ]    (42)

x̃ = [C₀  C₁  ...  C_n]^T ;  G(x̃) = [−Pe Δz   Da C₁²   ...   Da C²_{n−1}   0]^T    (43)

Thus, the ODE-BVP is reduced to a set of coupled nonlinear algebraic equations after discretization.
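The nonlinear algebraic system (41) has to be solved iteratively. A sketch using scipy's nonlinear solver, with Pe and Da chosen arbitrarily for illustration:

```python
# Solve the finite-difference TRAM equations with fsolve,
# assuming illustrative parameters Pe = 6 and Da = 2.
import numpy as np
from scipy.optimize import fsolve

Pe, Da, n = 6.0, 2.0, 50
dz = 1.0 / n

def residuals(C):
    R = np.empty(n + 1)
    R[0] = (C[1] - C[0]) / dz - Pe * (C[0] - 1.0)        # B.C. (39)
    for i in range(1, n):                                 # interior residuals
        R[i] = ((C[i+1] - 2*C[i] + C[i-1]) / (Pe * dz**2)
                - (C[i+1] - C[i-1]) / (2 * dz) - Da * C[i]**2)
    R[n] = (C[n] - C[n-1]) / dz                           # B.C. (40)
    return R

C = fsolve(residuals, np.ones(n + 1))   # constant initial guess C(z) = 1
print(C[0], C[-1])                      # inlet and outlet concentrations
```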
3.3 Discretization of PDEs using Finite Difference [2]

Typical second order PDEs that we encounter in engineering problems are of the form

a ∇²u + b ∇u + c g(u) − ∂u/∂t = f(x₁, x₂, x₃, t)

x_{1,L} < x₁ < x_{1,H} ;  x_{2,L} < x₂ < x_{2,H} ;  x_{3,L} < x₃ < x_{3,H}

subject to appropriate boundary conditions and initial conditions. For example, the Laplacian operator ∇² and the gradient operator ∇ are defined in Cartesian coordinates as follows

∇u = ∂u/∂x + ∂u/∂y + ∂u/∂z

∇²u = ∂²u/∂x² + ∂²u/∂y² + ∂²u/∂z²

In the Cartesian coordinate system, we construct grid lines parallel to the x, y and z axes and force the residuals to zero at the internal grid points. For example, the partial derivatives of the dependent variable u with respect to x at the grid point (x_i, y_j, z_k) can be approximated as follows

(∂u/∂x)_{ijk} = (u_{i+1,j,k} − u_{i−1,j,k})/(2Δx)

(∂²u/∂x²)_{ijk} = (u_{i+1,j,k} − 2u_{i,j,k} + u_{i−1,j,k})/(Δx)²

The partial derivatives in the remaining directions can be approximated in an analogous manner. It may be noted that the partial derivatives are approximated by considering one variable at a time, which is equivalent to the application of the Taylor series expansion of a scalar function. When the PDE involves only the spatial derivatives, the discretization process yields either a coupled set of linear / nonlinear algebraic equations or an ODE-BVP. When the PDE involves time derivatives, the discretization is carried out only in the spatial coordinates. As a consequence, the discretization process yields coupled nonlinear ODEs with initial conditions specified, i.e. an ODE-IVP.

Example 11 Consider the PDE describing the unsteady state conditions in a tubular reactor with axial mixing (TRAM) in which an irreversible 2nd order reaction is carried out.

∂C/∂t = (1/Pe) ∂²C/∂z² − ∂C/∂z − Da C²  in (0 < z < 1)    (44)
Example 12 (Furnace PDE) The Laplace equation represents a prototype for steady state diffusion processes. For example, the 2-dimensional Laplace equation

∂²T/∂x² + ∂²T/∂y² = f(x, y)    (54)

0 < x < 1 ;  0 < y < 1

where T is temperature and x, y are dimensionless space coordinates. Equations similar to this arise in many problems of fluid mechanics, heat transfer and mass transfer. In the present case, T(x, y) represents the dimensionless temperature distribution in a furnace and α represents the thermal diffusivity. Three walls of the furnace are insulated and maintained at a constant temperature. Convective heat transfer occurs from the fourth boundary to the atmosphere. The boundary conditions are as follows:

x = 0 : T = T* ;  x = 1 : T = T*    (55)

y = 0 : T = T*    (56)

y = 1 :  k dT(x, 1)/dy = h [T∞ − T(x, 1)]    (57)

We construct a 2-dimensional grid with (nx + 1) equispaced grid lines parallel to the y axis and (ny + 1) equispaced grid lines parallel to the x axis. The temperature T at the (i, j)'th grid point is denoted as T_{ij} = T(x_i, y_j). We then force the residual to be zero at each internal grid point to obtain the following set of equations:

(T_{i+1,j} − 2T_{i,j} + T_{i−1,j})/(Δx)² + (T_{i,j+1} − 2T_{i,j} + T_{i,j−1})/(Δy)² = f(x_i, y_j)    (58)

for (i = 1, 2, ..., nx − 1) and (j = 1, 2, ..., ny − 1). Note that, regardless of the size of the system, each equation contains not more than 5 unknowns, resulting in a sparse linear algebraic system. Consider the special case when

Δx = Δy

If we rearrange the above set of equations in the form Ax̃ = b, then A turns out to be a large sparse matrix. Even for a modest choice of 10 internal grid lines in each direction, we would get a 100 × 100 sparse matrix associated with 100 variables.
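The sparsity pattern is easy to see by assembling the system explicitly. The sketch below (with zero boundary values and f = 1 assumed purely for illustration) builds the 100 × 100 matrix via Kronecker products:

```python
# Assemble the 5-point stencil of equation (58) for 10 internal grid
# lines per direction (dx = dy), giving a sparse 100 x 100 system.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

m = 10                                       # internal grid lines per direction
d = 1.0 / (m + 1)
D2 = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(m, m)) / d**2
I = sp.identity(m)
Lap = sp.kron(I, D2) + sp.kron(D2, I)        # d2/dx2 + d2/dy2 on the grid
print(Lap.shape, Lap.nnz)                    # (100, 100), only ~460 nonzeros

f = np.ones(m * m)                           # illustrative source term
T = spsolve(Lap.tocsr(), f)                  # solves the sparse linear system
```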
which can be used to eliminate the variables in the above set of ODEs that lie on the corresponding edges. From the boundary conditions at y = 0 and y = 1, we have two boundary conditions for the set of ODEs

x̃(0) = T*

dx̃(L_y)/dt = G[x̃(L_y)]
Example 14 Consider the 2-dimensional unsteady state heat transfer problem

∂T/∂t = α [∂²T/∂x² + ∂²T/∂y²] + f(x, y, t)    (64)

t = 0 :  T = H(x, y)    (65)

x = 0 :  T(0, y, t) = T* ;  x = 1 :  T(1, y, t) = T*    (66)

y = 0 :  T(x, 0, t) = T*    (67)

y = 1 :  k dT(x, 1, t)/dy = h (T∞ − T(x, 1, t))    (68)

where T(x, y, t) is the temperature at location (x, y) at time t and α is the thermal diffusivity. Following the finite difference approach, we construct a 2-dimensional grid with (nx − 1) equispaced internal grid lines parallel to the y-axis and (ny − 1) internal grid lines parallel to the x-axis. The temperature T at the (i, j)'th grid point is given by

T_{ij}(t) = T(x_i, y_j, t)

Now, we force the residual to zero at each internal grid point to generate a set of coupled ODE-IVPs as

dT_{ij}/dt = (α/(Δx)²) [T_{i+1,j} − 2T_{i,j} + T_{i−1,j}] + (α/(Δy)²) [T_{i,j+1} − 2T_{i,j} + T_{i,j−1}] + f(x_i, y_j, t)    (70)

for i = 1, 2, ..., nx − 1 and j = 1, 2, ..., ny − 1. Using the boundary conditions, the variables on the four boundaries can be eliminated, and the PDE after discretization is reduced to a set of coupled ODEs of the form

T̂(x̃) ≡ dx̃/dt − F(x̃, t) = 0

subject to the initial condition x̃(0)

x̃(0) = [H(x₁, y₁)  H(x₁, y₂) ... H(x_{nx−1}, y₁) ... H(x_{nx−1}, y_{ny−1})]^T
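Integrating the resulting ODE-IVP in time is often called the method of lines. A minimal sketch, assuming α = 1, f = 0, zero Dirichlet values on all four edges (a simplification of (66)-(68)) and an illustrative initial field H(x, y):

```python
# Method-of-lines integration of equation (70) with scipy's solve_ivp.
import numpy as np
from scipy.integrate import solve_ivp

m, alpha = 20, 1.0
d = 1.0 / (m + 1)
x = np.linspace(d, 1.0 - d, m)               # internal grid points
X, Y = np.meshgrid(x, x, indexing="ij")
T0 = np.sin(np.pi * X) * np.sin(np.pi * Y)   # assumed H(x, y)

def rhs(t, Tflat):
    T = Tflat.reshape(m, m)
    Tp = np.pad(T, 1)                         # zero values on the boundary
    lap = ((Tp[2:, 1:-1] - 2*T + Tp[:-2, 1:-1])
           + (Tp[1:-1, 2:] - 2*T + Tp[1:-1, :-2])) / d**2
    return (alpha * lap).ravel()

sol = solve_ivp(rhs, (0.0, 0.1), T0.ravel(), method="BDF")
print(sol.y[:, -1].max())                     # peak temperature at t = 0.1
```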
Consider now the problem of solving a set of nonlinear algebraic equations

F(x) = 0

Given a guess solution x̃, the function vector F(x) can be approximated locally using the Taylor series expansion truncated after the linear term, i.e.

F(x*) ≈ F(x̃) + [∂F/∂x]_{x=x̃} Δx̃ = 0

Δx̃ = x* − x̃
The approximated operator equation can be rearranged as follows

[∂F/∂x]_{x=x̃} [Δx̃] = −F(x̃)

(n × n matrix) (n × 1 vector) = (n × 1 vector)

which corresponds to the standard form Ax = b. Solving the above linear equation yields Δx̃ and, if the guess solution x̃ is sufficiently close to the true solution, then

x* ≈ x̃ + Δx̃    (73)

However, we may not reach the true solution in a single iteration. Thus, equation (73) is used to generate a new guess solution, say x̃_New, as follows

x̃_New = x̃ + Δx̃    (74)
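The iteration (74) is the familiar Newton-Raphson scheme. A compact sketch, with the Jacobian approximated by forward differences and an arbitrary two-equation test system:

```python
# Newton-Raphson iteration implementing equations (73)-(74).
import numpy as np

def newton(F, x, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        J = np.empty((x.size, x.size))        # finite-difference Jacobian
        h = 1e-7
        for j in range(x.size):
            xp = x.copy()
            xp[j] += h
            J[:, j] = (F(xp) - Fx) / h
        dx = np.linalg.solve(J, -Fx)          # J dx = -F(x_tilde)
        x = x + dx                            # x_new = x_tilde + dx
    return x

F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
print(newton(F, np.array([1.0, 0.0])))        # -> (1/sqrt(2), 1/sqrt(2))
```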
4.1 Lagrange Interpolation

In Lagrange interpolation, it is desired to find an interpolating polynomial p(z) of the form

p(z) = α₀ + α₁ z + ... + α_n z^n    (76)

such that

p(z_i) = u_i  for i = 0, 1, 2, ..., n

To find the coefficients of the polynomial that passes exactly through {u_i : i = 0, 1, 2, ..., n}, consider the (n + 1) equations

α₀ + α₁ z₀ + ... + α_n z₀^n = u₀
α₀ + α₁ z₁ + ... + α_n z₁^n = u₁
.... = ....
α₀ + α₁ z_n + ... + α_n z_n^n = u_n
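The coefficient matrix of this system is the Vandermonde matrix of the grid points. A short sketch (data u_i = exp(z_i) assumed for illustration):

```python
# Lagrange interpolation posed as the linear system above.
import numpy as np

z = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # grid points z0, ..., zn
u = np.exp(z)                                # assumed data ui

A = np.vander(z, increasing=True)            # rows [1, zi, zi^2, ..., zi^n]
alpha = np.linalg.solve(A, u)                # polynomial coefficients
p = np.polynomial.Polynomial(alpha)
print(np.max(np.abs(p(z) - u)))              # ~0: p passes through all data
```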
For spline interpolation, the interval is divided into sub-intervals, and a lower order spline approximation is developed on each sub-interval. Let [a, b] be a finite interval. We introduce a partition of the interval by placing points

a = z₀ < z₁ < ... < z_n = b

There are in total 4n unknown coefficients {α₀,₀, α₁,₀, ..., α₃,ₙ₋₁} to be determined. In order to ensure continuity and smoothness of the approximation, the following conditions are imposed: the spline must interpolate the data at the grid points, with

p_{n−1}(z_n) = u_n    (86)

and the first and second derivatives of neighboring polynomials must match at the internal grid points. Using constraints (85-89), we get a set of coupled linear algebraic equations for i = 0, 1, 2, ..., n − 2. In addition, using the free boundary conditions, we have

α₂,₀ = 0    (97)

α₂,ₙ₋₁ + 3 α₃,ₙ₋₁ (Δz_{n−1}) = 0    (98)

Eliminating the coefficients α₃,ᵢ, we have

α₃,ᵢ = (α₂,ᵢ₊₁ − α₂,ᵢ)/(3 Δz_i)  for i = 0, 1, 2, ..., n − 2    (99)

α₃,ₙ₋₁ = −α₂,ₙ₋₁/(3 Δz_{n−1})    (100)
and eliminating α₁,ₙ₋₁ using equation (93), we have

α₁,ᵢ = (1/Δz_i)(α₀,ᵢ₊₁ − α₀,ᵢ) − (Δz_i/3)(2α₂,ᵢ + α₂,ᵢ₊₁)    (101)

for i = 0, 1, 2, ..., n − 2

α₁,ₙ₋₁ = (u_n − α₀,ₙ₋₁)/Δz_{n−1} − (Δz_{n−1}) α₂,ₙ₋₁ − α₃,ₙ₋₁ (Δz_{n−1})²    (102)

Thus, we are left with only {α₂,ᵢ : i = 0, 1, ..., n − 1} as unknowns, and the resulting set of linear equations assumes the form

α₂,₀ = 0    (103)

(Δz_{i−1}) α₂,ᵢ₋₁ + 2(Δz_i + Δz_{i−1}) α₂,ᵢ + (Δz_i) α₂,ᵢ₊₁ = b_i    (104)

for i = 1, 2, ..., n − 2

where

b_i = 3(α₀,ᵢ₊₁ − α₀,ᵢ)/Δz_i − 3(α₀,ᵢ − α₀,ᵢ₋₁)/Δz_{i−1}
    = 3(u_{i+1} − u_i)/Δz_i − 3(u_i − u_{i−1})/Δz_{i−1}

for (i = 1, 2, ..., n − 2)

(1/3)(Δz_{n−2}) α₂,ₙ₋₂ + (2/3)(Δz_{n−2} + Δz_{n−1}) α₂,ₙ₋₁ = b_n    (105)

b_n = u_n/Δz_{n−1} − u_{n−1} (1/Δz_{n−1} + 1/Δz_{n−2}) + u_{n−2}/Δz_{n−2}

Defining the vector α₂ as

α₂ = [α₂,₀  α₂,₁  ...  α₂,ₙ₋₁]^T

the above set of equations can be written compactly as

A α₂ = b    (106)
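In practice this tridiagonal system is rarely assembled by hand; for instance, scipy's spline routine with the "natural" option imposes exactly the free boundary conditions (97)-(98):

```python
# Cubic spline interpolation with free (natural) end conditions.
import numpy as np
from scipy.interpolate import CubicSpline

z = np.array([0.0, 0.2, 0.5, 0.7, 1.0])      # non-uniform partition
u = np.sin(2 * np.pi * z)                     # assumed data

cs = CubicSpline(z, u, bc_type="natural")
print(np.max(np.abs(cs(z) - u)))              # ~0: interpolates the data
print(cs(0.0, 2), cs(1.0, 2))                 # second derivatives ~0 at the ends
```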
4.3 Interpolation using Linearly Independent Functions

While polynomials are a popular choice of basis for interpolation, any set of linearly independent functions defined on [a, b] can be used for developing an interpolating function. Let {f₀(z), f₁(z), ..., f_n(z)} represent a set of linearly independent functions in C[a, b]. Then, we can construct an interpolating function, g(z), as follows

g(z) = θ₀ f₀(z) + θ₁ f₁(z) + ... + θ_n f_n(z)

Forcing the interpolating function to have values u_i at z = z_i leads to the following set of linear algebraic equations

g(z_i) = θ₀ f₀(z_i) + θ₁ f₁(z_i) + ... + θ_n f_n(z_i) = u_i

i = 0, 1, ..., n

which can be further rearranged as Aθ = u, where

A =
[ f₀(z₀)  f₁(z₀)  ...  f_n(z₀) ]
[ f₀(z₁)  f₁(z₁)  ...  f_n(z₁) ]
[ ...      ...     ...  ...     ]
[ f₀(z_n)  f₁(z_n)  ...  f_n(z_n) ]    (109)

and the vectors θ and u are defined by equations (78) and (79), respectively. Commonly used interpolating functions are

– Chebyshev polynomials

– Exponential functions {e^(λ_i z) : i = 0, 1, ..., n} with λ₀, ..., λ_n specified, i.e.

g(z) = θ₀ e^(λ₀ z) + θ₁ e^(λ₁ z) + ... + θ_n e^(λ_n z)    (110)
4.4.1 Discretization of ODE-BVP

Consider the second order ODE-BVP given by equations (14), (15) and (16a). To see how the problem discretization can be carried out using Lagrange interpolation, consider a selected set of collocation (grid) points {z_i : i = 0, 1, ..., n} in the domain [0, 1] such that z₀ = 0 and z_n = 1 and {z₁, z₂, ..., z_{n−1}} ∈ (0, 1) such that

z₀ < z₁ < ... < z_n

Let {u_i = u(z_i) : i = 0, 1, ..., n} represent the values of the dependent variable at these collocation points. Given these points, we can propose an approximate solution, ũ(z), to the ODE-BVP of the form

ũ(z) = α₀ + α₁ z + ... + α_n z^n

i.e. an interpolation polynomial that passes exactly through {u_i : i = 0, 1, ..., n}. This requires that the following set of equations hold at the collocation points

ũ(z_i) = α₀ + α₁ z_i + ... + α_n z_i^n = u_i

i = 0, 1, ..., n

The unknown polynomial coefficients {α_i : i = 0, 1, ..., n} can be expressed in terms of the unknowns {u_i : i = 0, 1, ..., n} as follows

α = A⁻¹ u

where the matrix A is defined in equation (77). To approximate the ODE-BVP in (0, 1), we force the residuals at the collocation points to zero using the approximate solution ũ(z), i.e.

R_i ≡ Ψ[d²ũ(z_i)/dz², dũ(z_i)/dz, ũ(z_i), z_i] = 0    (111)

for i = 1, 2, ..., n − 1. Thus, we need to compute the first and second derivatives of the approximate solution ũ(z) at the collocation points. The first derivative at the i'th collocation point can be computed as follows

dũ(z_i)/dz = 0·α₀ + α₁ + 2α₂ z_i + ... + n α_n z_i^(n−1)    (112)

= [0  1  2z_i  ...  n z_i^(n−1)] α    (113)

= [0  1  2z_i  ...  n z_i^(n−1)] A⁻¹ u    (114)

Defining the vector

s^(i)T = [0  1  2z_i  ...  n z_i^(n−1)] A⁻¹

we have

dũ(z_i)/dz = s^(i)T u

Similarly, the second derivative can be expressed in terms of the vector u as follows:

d²ũ(z_i)/dz² = 0·α₀ + 0·α₁ + 2α₂ + ... + n(n−1) α_n z_i^(n−2)    (115)

= [0  0  2  ...  n(n−1) z_i^(n−2)] α    (116)

= [0  0  2  ...  n(n−1) z_i^(n−2)] A⁻¹ u    (117)

Defining the vector

t^(i)T = [0  0  2  ...  n(n−1) z_i^(n−2)] A⁻¹

we have

d²ũ(z_i)/dz² = t^(i)T u

Substituting for the first and the second derivatives of ũ(z_i) in equation (111), we have

Ψ[t^(i)T u, s^(i)T u, u_i, z_i] = 0    (118)

for i = 1, 2, ..., n − 1.
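The weight vectors s^(i) and t^(i) are straightforward to generate numerically. A sketch (the helper name collocation_matrices is ours, not from the notes):

```python
# Build S and T, whose rows are s^(i)T and t^(i)T, from equations (112)-(117).
import numpy as np

def collocation_matrices(z):
    n = len(z) - 1
    A = np.vander(z, increasing=True)              # A[i, k] = zi^k
    Ainv = np.linalg.inv(A)
    k = np.arange(n + 1)
    # row entries k*zi^(k-1) and k*(k-1)*zi^(k-2); prefactors vanish for k < 1, k < 2
    D1 = k * z[:, None] ** np.maximum(k - 1, 0)
    D2 = k * (k - 1) * z[:, None] ** np.maximum(k - 2, 0)
    return D1 @ Ainv, D2 @ Ainv

z = np.array([0.0, 0.1127, 0.5, 0.8873, 1.0])
S, T = collocation_matrices(z)
print(np.round(S, 2))        # compare with equation (123) below
print(np.round(T, 2))        # compare with equation (124) below
```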
Table 1: Roots of Shifted Legendre Polynomials

Order (m)   Roots
1           0.5
2           0.21132, 0.78868
3           0.1127, 0.5, 0.8873
4           0.0695, 0.3297, 0.6703, 0.9305
5           0.0475, 0.2286, 0.5034, 0.7662, 0.9543
6           0.0346, 0.1681, 0.3792, 0.6262, 0.8221, 0.9698
7           0.0267, 0.1246, 0.3076, 0.4853, 0.7151, 0.8667, 0.9740

Example 15 [2] Consider the ODE-BVP describing steady state conditions in a tubular reactor with axial mixing (TRAM) in which an irreversible 2nd order reaction is carried out. Using the method of orthogonal collocation with n = 4 and defining the vector

C = [C₀  C₁  ...  C₄]^T

at the collocation points

z₀ = 0 ;  z₁ = 0.1127 ;  z₂ = 0.5 ;  z₃ = 0.8873 ;  z₄ = 1

we get the following set of five simultaneous nonlinear algebraic equations

(1/Pe) [t^(i)T C] − [s^(i)T C] − Da C_i² = 0

i = 1, 2, 3
[s^(0)T C] − Pe (C₀ − 1) = 0

[s^(4)T C] = 0

where the matrices A, S and T for the selected set of collocation points are

A =
[ 1  0       0          0          0         ]
[ 1  0.1127  (0.1127)²  (0.1127)³  (0.1127)⁴ ]
[ 1  0.5     (0.5)²     (0.5)³     (0.5)⁴    ]
[ 1  0.8873  (0.8873)²  (0.8873)³  (0.8873)⁴ ]
[ 1  1       1          1          1         ]    (122)

    [ s^(0)T ]   [ −13     14.79   −2.67     1.88   −1    ]
    [ s^(1)T ]   [ −5.32    3.87    2.07    −1.29    0.68 ]
S = [ s^(2)T ] = [  1.5    −3.23    0        3.23   −1.5  ]
    [ s^(3)T ]   [ −0.68    1.29   −2.07    −3.87    5.32 ]
    [ s^(4)T ]   [  1      −1.88    2.67   −14.79   13    ]    (123)

    [ t^(0)T ]   [  84    −122.06   58.67  −44.60   24    ]
    [ t^(1)T ]   [  53.24  −73.33   26.67  −13.33    6.76 ]
T = [ t^(2)T ] = [  −6      16.67  −21.33   16.67   −6    ]
    [ t^(3)T ]   [   6.76  −13.33   26.67  −73.33   53.24 ]
    [ t^(4)T ]   [  24     −44.60   58.67 −122.06   84    ]    (124)

Thus, discretization yields a set of nonlinear algebraic equations.
Remark 16 Are the two methods presented above, i.e. the finite difference and collocation methods, doing something fundamentally different? Let us compare the following two cases: (a) the finite difference method with 3 internal grid points, and (b) collocation with 3 internal grid points, on the basis of the expressions used for approximating the first and second order derivatives at one of the grid points. For the sake of comparison, we take equi-spaced grid points for the collocation method instead of taking them at the roots of the 3'rd order orthogonal polynomial. Thus, for both the collocation and the finite difference method, the grid (or collocation) points are at {z₀ = 0, z₁ = 1/4, z₂ = 1/2, z₃ = 3/4, z₄ = 1}. Comparing the expressions for the approximate derivatives at z = z₂ used in the two approaches, it is clear that the essential difference between them is the way the derivatives at any grid (or collocation) point are approximated. The finite difference method takes only the immediate neighboring points for approximating the derivatives, while the collocation method finds the derivatives as a weighted sum of all the collocation (grid) points. As a consequence, the approximate solutions generated by these approaches will be different.
Example 17 Consider the PDE describing the unsteady state conditions in the tubular reactor with axial mixing (TRAM) given by equation (44). Using the method of orthogonal collocation with (n − 1) internal collocation points, we get

dC_i(t)/dt = (1/Pe) [t^(i)T C(t)] − [s^(i)T C(t)] − Da C_i(t)²

i = 1, 2, 3, ..., n − 1

where

C(t) = [C₀(t)  C₁(t)  ...  C_n(t)]^T

C_i(t) represents the time varying concentration at the i'th collocation point, and the vectors t^(i)T and s^(i)T represent row vectors of the matrices T and S defined by equation (120). The two boundary conditions yield the following algebraic constraints

s^(0)T C(t) = Pe (C₀(t) − 1)

s^(n)T C(t) = 0

These equations can be used to eliminate the variables C₀(t) and C_n(t) from the set of ODEs resulting from discretization. The resulting set of (n − 1) ODEs together with the initial conditions

C₁(0) = f(z₁) ;  C₂(0) = f(z₂) ; ..... ;  C_{n−1}(0) = f(z_{n−1})    (125)

is the discretized problem. For example, when we select 3 internal grid points as discussed in Example 15, the boundary constraints can be stated as follows

−(13 + Pe) C₀(t) + 14.79 C₁(t) − 2.67 C₂(t) + 1.88 C₃(t) − C₄(t) = −Pe

C₀(t) − 1.88 C₁(t) + 2.67 C₂(t) − 14.79 C₃(t) + 13 C₄(t) = 0

These equations can be used to eliminate the variables C₀(t) and C₄(t) in terms of {C₁(t), C₂(t), C₃(t)} by solving the following equation

[ −(13 + Pe)  −1 ] [ C₀(t) ]   [ −14.79 C₁(t) + 2.67 C₂(t) − 1.88 C₃(t) − Pe ]
[   1         13 ] [ C₄(t) ] = [   1.88 C₁(t) − 2.67 C₂(t) + 14.79 C₃(t)     ]
Example 18 [2] Consider the 2-dimensional Laplace equation given in Example 12. We consider a scenario where the thermal diffusivity α is a function of temperature. To begin with, we choose (nx − 1) internal collocation points along the x-axis and (ny − 1) internal collocation points along the y-axis. Using (nx − 1) internal grid lines parallel to the y-axis and (ny − 1) grid lines parallel to the x-axis, we get (nx − 1) × (ny − 1) internal collocation points. Corresponding to the chosen collocation points, we can compute the matrices (S_x, T_x) and (S_y, T_y) using equations (121). Using these matrices, the PDE can be transformed to a set of coupled algebraic equations as follows

α(T_{i,j}) [t_x^(i)T T_x^(j) + t_y^(j)T T_y^(i)] = f(x_i, y_j)

i = 1, 2, ..., nx − 1 ;  j = 1, 2, ..., ny − 1

where the vectors T_x^(j) and T_y^(i) are defined as

T_x^(j) = [T_{0,j}  T_{1,j}  ...  T_{nx,j}]^T

T_y^(i) = [T_{i,0}  T_{i,1}  ...  T_{i,ny}]^T

At the boundaries, we have

T_{0,j} = T* ;  (j = 0, 1, ..., ny)

T_{nx,j} = T* ;  (j = 0, 1, ..., ny)

T_{i,0} = T* ;  (i = 0, 1, ..., nx)

k [s_y^(ny)T T_y^(i)] = h(T∞ − T_{i,ny}) ;  (i = 1, ..., nx)

The above discretization procedure yields a set of (nx + 1) × (ny + 1) nonlinear algebraic equations in (nx + 1) × (ny + 1) unknowns, which have to be solved simultaneously.
4.5 Orthogonal Collocations on Finite Elements (OCFE)

The main difficulty with polynomial interpolation is that the Vandermonde matrix becomes ill-conditioned when the order of the interpolation polynomial is selected to be large. A remedy to this problem is to sub-divide the region into finite elements and assume a lower order polynomial spline solution. The collocation points are then selected within each finite element, where the residuals are forced to zero. The continuity conditions (equal slopes) at the boundaries of neighboring finite elements give rise to additional constraints. We illustrate this method by taking a specific example.

Example 19 [2] Consider the ODE-BVP describing steady state conditions in a tubular reactor with axial mixing (TRAM) in which an irreversible 2nd order reaction is carried out. It is desired to solve this problem by the OCFE approach.

Step 1: The first step is to create finite elements in the domain. Let us assume that we create 3 sub-domains. Finite Element 1: 0 ≤ z ≤ 0.3; Finite Element 2: 0.3 ≤ z ≤ 0.7; Finite Element 3: 0.7 ≤ z ≤ 1. It may be noted that these sub-domains need not be equi-sized.

Step 2: On each finite element, we define a scaled spatial variable as follows

ξ₁ = (z − Z₀)/(Z₁ − Z₀) ;  ξ₂ = (z − Z₁)/(Z₂ − Z₁) ;  ξ₃ = (z − Z₂)/(Z₃ − Z₂)

where Z₀ = 0, Z₁ = 0.3, Z₂ = 0.7 and Z₃ = 1 represent the boundary points of the finite elements. It is desired to develop a polynomial spline solution such that the polynomial on each finite element is of 4'th order. Thus, within each element, we select 3 collocation points at the roots of the 3'rd order shifted Legendre polynomial, i.e.

ξ = 0.1127, 0.5, 0.8873

in the i'th element Z_{i−1} ≤ z ≤ Z_i. In the present case, we have a total of 9 collocation points. In addition, we have two points where the neighboring polynomials meet, i.e. at Z₁ = 0.3 and Z₂ = 0.7. Thus, there are in total 11 internal points and two boundary points, i.e. Z₀ = 0 and Z₃ = 1.

Step 3: Let the total set of points created in the previous step be denoted as {z₀, z₁, ..., z₁₂} and let the corresponding values of the dependent variable be denoted as {C₀, C₁, ..., C₁₂}.
Note that the variables associated with each of the finite elements are as follows

Finite Element 1 :  C^(1) = [C₀  C₁  C₂  C₃  C₄]^T

Finite Element 2 :  C^(2) = [C₄  C₅  C₆  C₇  C₈]^T

Finite Element 3 :  C^(3) = [C₈  C₉  C₁₀  C₁₁  C₁₂]^T

Now, we force the residuals to zero at all the internal collocation points within each finite element. Let h₁, h₂ and h₃ denote the lengths of the individual finite elements, i.e.

h₁ = Z₁ − Z₀ ;  h₂ = Z₂ − Z₁ ;  h₃ = Z₃ − Z₂    (126)

In terms of the scaled variables, the ODE on the i'th element becomes

(1/Pe)(1/h_i²) d²C/dξ_i² − (1/h_i) dC/dξ_i − Da C² = 0  for Z_{i−1} ≤ z ≤ Z_i and i = 1, 2, 3    (127)

The main difference here is that only the variables associated with an element are used while discretizing the derivatives. Thus, at the collocation point z₁, the residual is computed as follows

R₁ ≡ (1/Pe)(1/h₁²) [t^(1)T C^(1)] − (1/h₁) [s^(1)T C^(1)] − Da (C₁)² = 0    (128)

t^(1)T C^(1) = (53.24 C₀ − 73.33 C₁ + 26.67 C₂ − 13.33 C₃ + 6.76 C₄)

s^(1)T C^(1) = (−5.32 C₀ + 3.87 C₁ + 2.07 C₂ − 1.29 C₃ + 0.68 C₄)

where the vectors s^(1)T and t^(1)T are rows of the matrices (123) and (124), respectively. Similarly, at the collocation point z = z₇, which corresponds to ξ₂ = 0.8873 in Finite Element 2, the residual is computed as follows

R₇ ≡ (1/Pe)(1/h₂²) [t^(3)T C^(2)] − (1/h₂) [s^(3)T C^(2)] − Da (C₇)² = 0    (129)

t^(3)T C^(2) = 6.76 C₄ − 13.33 C₅ + 26.67 C₆ − 73.33 C₇ + 53.24 C₈

s^(3)T C^(2) = −0.68 C₄ + 1.29 C₅ − 2.07 C₆ − 3.87 C₇ + 5.32 C₈

The other equations arising from forcing the residuals to zero are

Finite Element 1 :  R₂ = R₃ = 0
Finite Element 2 :  R₅ = R₆ = 0
Finite Element 3 :  R₉ = R₁₀ = R₁₁ = 0

In addition to these 9 equations arising from the residuals at the collocation points, there are two constraints at the points z₄ and z₈ which ensure smoothness between the two neighboring polynomials, i.e.

(1/h₁) [s^(4)T C^(1)] = (1/h₂) [s^(0)T C^(2)]

(1/h₂) [s^(4)T C^(2)] = (1/h₃) [s^(0)T C^(3)]

The remaining two equations come from the discretization of the boundary conditions

(1/h₁) [s^(0)T C^(1)] = Pe (C₀ − 1)

(1/h₃) [s^(4)T C^(3)] = 0

Thus, we have 13 equations in 13 unknowns. It may be noted that, when we collect all the equations together, we get the following form of equation

A C = F(C)

A =
[ A₁   [0]  [0] ]
[ [0]  A₂   [0] ]
[ [0]  [0]  A₃  ]  (13 × 13)

C = [C₀  C₁  ...  C₁₂]^T

and F(C) is a 13 × 1 function vector containing all the nonlinear terms. Here, A₁, A₂ and A₃ are each 5 × 5 matrices. Thus, the matrix A is a sparse block diagonal matrix.

The method described above can be easily generalized to any number of finite elements. Also, the method can be extended to the discretization of PDEs in a similar way. These extensions are left to the reader as an exercise and are not discussed separately. Note that block diagonal and sparse matrices naturally arise when we apply this method.
5 Least Square Approximations

In the development that follows, we slightly change the way the data points are numbered, and the first data point is indexed as (u₁, z₁). Thus, we are given a data set {(u_i, z_i) : i = 1, ..., n}, where u_i denotes the value of the dependent variable at z = z_i and {z_i : i = 1, ..., n} ⊂ [a, b]. Let {f₁(z), ..., f_m(z)} represent a set of linearly independent functions in C[a, b]. Then, we propose to construct an approximating function, say g(z), as follows

g(z) = θ₁ f₁(z) + θ₂ f₂(z) + ... + θ_m f_m(z)    (130)

where m < n and the unknown coefficients {θ₁, ..., θ_m} are determined from the data set in some optimal manner. Defining the approximation error at point z_i as

e_i = u_i − g(z_i),  i = 1, 2, ..., n    (131)

the problem of finding the best approximation g(z) is posed as finding the parameters {θ₁, ..., θ_m} such that some norm of the error vector e is minimized. The most commonly used norm is the weighted 2-norm, i.e.

‖e‖²_{W,2} = ⟨e, e⟩_W = e^T W e = Σ_{i=1}^n w_i e_i²

where

W = diag[w₁  w₂  ...  w_n]

and w_i > 0 for all i. The set of equations (131) can be expressed as follows

e = u − A θ

θ = [θ₁  θ₂  ...  θ_m]^T    (132)

u = [u₁  u₂  ...  u_n]^T    (133)

A =
[ f₁(z₁)  f₂(z₁)  ...  f_m(z₁) ]
[ f₁(z₂)  f₂(z₂)  ...  f_m(z₂) ]
[ ...      ...     ...  ...     ]
[ f₁(z_n)  f₂(z_n)  ...  f_m(z_n) ]    (134)
It may be noted that e ∈ R^n, u ∈ R^n, θ ∈ R^m and A is a non-square matrix of dimension (n × m). Thus, it is desired to choose a solution that minimizes the scalar quantity Φ = e^T W e, i.e.

min_θ Φ = min_θ e^T W e = min_θ (u − Aθ)^T W (u − Aθ)    (135)

The resulting approximate function is called the least square approximation. Another option is to find the parameters such that the ∞-norm of the vector e is minimized w.r.t. the parameters, i.e.

min_θ ‖e‖∞ = min_θ max_i |e_i|

These problems involve optimization of a scalar function with respect to the minimizing argument θ, which is a vector. The necessary and sufficient conditions qualifying a point to be an optimum are given in the Appendix.

5.1 Solution of Linear Least Square Problem

Consider the weighted least squares problem

min_θ Φ = min_θ {(u − Aθ)^T W (u − Aθ)}    (136)

To obtain a unique solution to this problem, the matrices A and W should satisfy the following conditions:

Condition C1: The columns of A are linearly independent, i.e. rank(A) = m.

Condition C2: The weighting matrix W is positive definite.
Applying the above rules to the scalar function

Φ = u^T W u − (Aθ)^T W u − u^T W A θ + θ^T (A^T W A) θ

together with the necessary condition for optimality yields the following constraint

∂Φ/∂θ = −A^T W u − A^T W u + 2(A^T W A) θ = 0    (140)

Rearranging the above equation, we have

(A^T W A) θ_LS = A^T W u    (141)

It may be noted that we have used the fact that W^T = W and that the matrix A^T W A is symmetric. Also, even though A is a non-square (n × m) matrix, A^T W A is an (m × m) square matrix. When Conditions C1 and C2 are satisfied, the matrix (A^T W A) is invertible and the least square estimate of the parameters can be computed as

θ_LS = (A^T W A)⁻¹ A^T W u    (142)

Thus, the linear least square estimation problem is finally reduced to solving the linear equation (141). Using the sufficient condition for optimality, the Hessian matrix

∂²Φ/∂θ² = 2(A^T W A)    (143)

should be positive definite or positive semi-definite for the stationary point to be a minimum. When Conditions C1 and C2 are satisfied, it can be easily shown that (A^T W A) is positive definite. Thus, the sufficiency condition is satisfied and the stationary point is a minimum. As Φ is a convex function, it can be shown that the solution θ_LS is the global minimum of Φ = e^T W e.
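A short numerical sketch of the estimator (142), using synthetic data generated from a known straight line (the weights and noise level are illustrative assumptions):

```python
# Weighted linear least squares estimate theta_LS = (A'WA)^(-1) A'W u.
import numpy as np

rng = np.random.default_rng(0)
z = np.linspace(0.0, 1.0, 20)
u = 2.0 + 3.0 * z + 0.05 * rng.standard_normal(z.size)   # noisy line

A = np.column_stack([np.ones_like(z), z])    # basis f1(z) = 1, f2(z) = z
W = np.eye(z.size)                            # equal weights
theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ u)
print(theta)                                  # close to [2, 3]
```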
5.2 Geometric Interpretation of Least Squares Approximation [11]

In the previous subsection, this result was derived by purely algebraic manipulations. In this section, we interpret this result from the geometric viewpoint.
5.2.1 Distance of a Point from a Line

Suppose we are given a vector b ∈ R³ and we want to find its distance from the line in the direction of vector a ∈ R³. In other words, we are looking for a point p along the line that is closest to b (see Figure 5.2.1), i.e. p = θa such that

Φ = ‖e‖₂² = ‖θa − b‖₂²

is minimum. This problem can be solved by minimizing Φ with respect to θ, i.e.

min_θ Φ = min_θ ⟨θa − b, θa − b⟩    (147)

= min_θ [θ² ⟨a, a⟩ − 2θ ⟨a, b⟩ + ⟨b, b⟩]    (148)

Using the necessary condition for optimality,

∂Φ/∂θ = 2θ ⟨a, a⟩ − 2 ⟨a, b⟩ = 0  ⟹  θ_LS = ⟨a, b⟩/⟨a, a⟩

and the closest point is

p = θ_LS a = (⟨a, b⟩/⟨a, a⟩) a    (151)

It follows that ⟨p − b, a⟩ = 0, which implies that the error vector e = p − b is perpendicular to a. From school geometry, we know that if p is such a point, then the vector (b − p) is perpendicular to the direction a. We have derived this geometric result using principles of optimization. Equation (151) can be further rearranged as

p = ⟨a/√⟨a, a⟩, b⟩ (a/√⟨a, a⟩) = ⟨b, â⟩ â    (153)

where â = a/√⟨a, a⟩ is the unit vector along the direction of a, and the point p is the projection of vector b along the direction â. Note that the above derivation holds in any general n-dimensional space a, b ∈ R^n, or even in any infinite dimensional vector space.
The equation can be rearranged as

p = (a^T b / a^T a) a = [(1/(a^T a)) a a^T] b = P_r b    (154)

where P_r = (1/(a^T a)) a a^T is an n × n matrix, called the projection matrix, which projects vector b onto its column space.
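The properties of P_r are easy to verify numerically (the vectors a and b below are arbitrary):

```python
# Check equation (154): p = Pr b lies along a and b - p is orthogonal to a.
import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, -1.0, 4.0])

Pr = np.outer(a, a) / (a @ a)      # projection matrix a a^T / (a^T a)
p = Pr @ b
print((b - p) @ a)                 # ~0: the error is perpendicular to a
print(np.allclose(Pr @ Pr, Pr))    # True: Pr^2 = Pr
```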
5.2.2 Distance of a Point from a Subspace

The situation is exactly the same when we are given a point b ∈ R³ and a plane S in R³ which is spanned by two linearly independent vectors {a^(1), a^(2)}. We would like to find the distance of b from S, i.e. a point p ∈ S such that ‖p − b‖₂ is minimum (see Figure 5.2.2). Again, from school geometry, we know that such a point can be obtained by drawing a perpendicular from b to S; p is the point where this perpendicular meets S (see Figure 5.2.2). We would like to formally derive this result using optimization.

More generally, consider an m-dimensional subspace S = span{a^(1), a^(2), ..., a^(m)} of R^n, where the vectors a^(1), a^(2), ..., a^(m) ∈ R^n are linearly independent. Given an arbitrary point b ∈ R^n, the problem is to find a point p in the subspace S such that it is closest to the vector b (see Figure 5.2.2). As p ∈ S, we have

p = θ₁ a^(1) + θ₂ a^(2) + ... + θ_m a^(m) = Σ_{i=1}^m θ_i a^(i)    (155)

In other words, we would like to find a point p ∈ S such that the 2-norm of the error vector e = p − b, i.e.

‖e‖₂ = ‖p − b‖₂ = ‖Σ_{i=1}^m θ_i a^(i) − b‖₂    (156)

is minimum. This problem is equivalent to minimizing Φ = ‖e‖₂², i.e.

min_θ Φ_LS = min_θ ⟨Σ_{i=1}^m θ_i a^(i) − b, Σ_{i=1}^m θ_i a^(i) − b⟩    (157)
5.2.3 Additional Geometric Insights
Definition 21 (Row Space): The space spanned by the row vectors of matrix A is called the row space of matrix A and is denoted as R(A^T).

Definition 22 (Null Space): The set of all vectors x such that Ax = 0 is called the null space of matrix A and is denoted as N(A).

Definition 23 (Left Null Space): The set of all vectors y such that A^T y = 0 is called the left null space of matrix A and is denoted as N(A^T).

The following fundamental result, which relates the dimensions of the row and column spaces with the rank of a matrix, holds true for any m × n matrix A:

dim[R(A)] = dim[R(A^T)] = rank(A)

In other words, the number of linearly independent columns of A equals the number of linearly independent rows of A.

With this background on the vector spaces associated with a matrix, the following comments regarding the projection matrix are in order.

If the columns of A are linearly independent, then the matrix A^T A is invertible and the point p, which is the projection of b onto the column space of A (i.e. R(A)), is given as

p = A θ_LS = A (A^T A)⁻¹ A^T b = [P_r] b    (162)

P_r = A (A^T A)⁻¹ A^T    (163)

Here the matrix P_r is the projection matrix, which projects vector b onto R(A), i.e. the column space of A. Note that [P_r] b is the component of b in R(A), while

b − [P_r] b = [I − P_r] b    (164)

is the component of b orthogonal to R(A). The projection matrix has the following properties:

– [P_r]² = P_r

– [P_r]^T = P_r

If the vector b lies in the column space of A, i.e. b = Aξ for some ξ, then b projects onto itself, i.e.

p = [P_r] b = b    (165)

This follows since

p = A (A^T A)⁻¹ A^T (Aξ) = A (A^T A)⁻¹ (A^T A) ξ = Aξ = b    (166)

When A is square and invertible, every vector projects onto itself, i.e. p = A(A^T A)⁻¹A^T b = (AA⁻¹)(A^T)⁻¹A^T b = b.

The matrix (A^T A)⁻¹ A^T is called the pseudo-inverse of matrix A, as post-multiplication of this matrix by A yields the identity matrix.
5.3 Projection Theorem in a General Hilbert Space [6]

Theorem 25 (Classical Projection Theorem): Let X be a Hilbert space and S a finite dimensional subspace of X. Corresponding to any vector u ∈ X, there is a unique vector p ∈ S such that ‖u − p‖₂ ≤ ‖u − s‖₂ for any vector s ∈ S. Furthermore, a necessary and sufficient condition for p ∈ S to be the unique minimizing vector is that the vector (u − p) is orthogonal to S.

Thus, given any finite dimensional sub-space S spanned by linearly independent vectors {a^(1), a^(2), ..., a^(m)} and an arbitrary vector u ∈ X, we seek a vector p = θ₁ a^(1) + θ₂ a^(2) + ... + θ_m a^(m) ∈ S such that

‖u − (θ₁ a^(1) + θ₂ a^(2) + ... + θ_m a^(m))‖₂    (168)

is minimized with respect to the scalars θ₁, ..., θ_m. Now, according to the projection theorem, the unique minimizing vector p is the orthogonal projection of u on S. This translates to the following set of equations

⟨u − p, a^(i)⟩ = ⟨u − (θ₁ a^(1) + θ₂ a^(2) + ... + θ_m a^(m)), a^(i)⟩ = 0    (169)

for i = 1, 2, ..., m. In particular, when the spanning vectors {e^(1), e^(2), ..., e^(m)} form an orthonormal set, the coefficients are obtained directly as

θ_i = ⟨e^(i), u⟩

since ⟨e^(i), e^(j)⟩ = 0 when i ≠ j. It is important to note that, if we choose an orthonormal set {e^(1), e^(2), ..., e^(m)} and we want to include an additional orthonormal vector, say e^(m+1), in this set, then we can compute θ_{m+1} as

θ_{m+1} = ⟨e^(m+1), u⟩

without having to recompute θ₁, ..., θ_m.

Remark 26 Given any Hilbert space X and an orthonormal basis {e^(1), e^(2), ..., e^(m), ...} for the Hilbert space, we can express any vector u ∈ X as

u = θ₁ e^(1) + θ₂ e^(2) + ... + θ_m e^(m) + ...    (172)

θ_i = ⟨e^(i), u⟩    (173)

The series

u = ⟨e^(1), u⟩ e^(1) + ⟨e^(2), u⟩ e^(2) + ... + ⟨e^(i), u⟩ e^(i) + ...    (174)

= Σ_{i=1}^∞ ⟨e^(i), u⟩ e^(i)    (175)

is called the generalized Fourier series expansion of the element u, and the coefficients θ_i = ⟨e^(i), u⟩ are the corresponding Fourier coefficients.
5.3.1 Simple Polynomial Models and Hilbert Matrices [11, 7]

Consider the problem of approximating a continuous function, say u(z), over the interval [0, 1] by a simple polynomial model of the form

û(z) = θ₁ + θ₂ z + θ₃ z² + ... + θ_{m+1} z^m    (176)

We want to find a polynomial of the form (176) which approximates u(z) in the least square sense. Geometrically, we want to project u(z) onto the (m + 1) dimensional subspace of C₂[0, 1] spanned by the vectors

f₁(z) = 1 ;  f₂(z) = z ;  ... ;  f_{m+1}(z) = z^m    (177)

The element h_{ij} of the matrix on the L.H.S. of the resulting normal equations can be computed as

h_{ij} = ∫₀¹ z^{j+i−2} dz = 1/(i + j − 1)    (179)

so that the normal equations become H_{m+1} θ = b, where

H_{m+1} =
[ 1        1/2    1/3  ...  1/(m+1)  ]
[ 1/2      1/3    1/4  ...  1/(m+2)  ]
[ ...      ...    ...  ...  ...      ]
[ 1/(m+1)  ...    ...  ...  1/(2m+1) ]  ((m+1) × (m+1))    (181)

The matrix H_{m+1} is known as the Hilbert matrix, and this matrix is highly ill-conditioned for m + 1 > 3. The following table shows condition numbers for a few values of m (refer to the Lecture Notes on Solving Linear Algebraic Equations for the concepts of condition number and matrix conditioning).

m + 1 :   3     4       5       6      7       8
c₂(H) :  524   1.55e4  4.67e5  1.5e7  4.75e8  1.53e10    (182)

Thus, for polynomial models of small order, say m = 3, we obtain a good situation, but beyond this order, whatever be the method of solution, we get approximations of less and less accuracy. This implies that approximating a continuous function by a polynomial of the form (176), with the choice of basis vectors (177), is an extremely ill-conditioned problem from the viewpoint of numerical computations. Also, note that if we want to increase the degree of the polynomial from m to (m + 1), then we have to recompute all of θ₁, ..., θ_{m+1} along with the new coefficient θ_{m+2}.
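The condition numbers in (182) can be reproduced directly:

```python
# Condition numbers of the Hilbert matrix; entries are 1/(i + j - 1).
import numpy as np
from scipy.linalg import hilbert

for m in range(3, 9):
    print(m, f"{np.linalg.cond(hilbert(m)):.3g}")
```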
On the other hand, consider the model

û(z) = θ₁ p₁(z) + θ₂ p₂(z) + ... + θ_m p_m(z)    (183)

where p_i(z) represents the i'th orthonormal basis function on C₂[0, 1], i.e.

⟨p_i(z), p_j(z)⟩ = { 1 if i = j ;  0 if i ≠ j }    (184)

In this case, the normal equation reduces to

[ 1   0  ...  0 ] [ θ₁  ]   [ ⟨p₁(z), u(z)⟩  ]
[ 0   1  ...  0 ] [ θ₂  ]   [ ⟨p₂(z), u(z)⟩  ]
[ ... ... ... ...] [ ... ] = [ ....            ]
[ 0   0  ...  1 ] [ θ_m ]   [ ⟨p_m(z), u(z)⟩ ]    (185)

or simply

θ_i = ⟨p_i(z), u(z)⟩ ;  i = 1, 2, ..., m    (186)

Obviously, the approximation problem is extremely well conditioned in this case. In fact, if we want to increase the degree of the polynomial from m to (m + 1), then we do not have to recompute θ₁, ..., θ_m as in the case of the basis (177). We simply have to compute the additional coefficient θ_{m+1} as

θ_{m+1} = ⟨p_{m+1}(z), u(z)⟩    (187)
5.3.2 Approximation of Numerical Data by a Polynomial [7]

Suppose we only know the numerical values {u₁, u₂, ..., u_n} at the points {z₁, z₂, ..., z_n} ∈ [0, 1], and we want to develop a simple polynomial model of the form (176). Substituting the data into the polynomial model leads to an overdetermined set of equations

u_i = θ₁ + θ₂ z_i + θ₃ z_i² + ... + θ_m z_i^{m−1} + e_i    (188)

i = 1, 2, ..., n    (189)

The least square estimates of the model parameters (for W = I) can be obtained by solving the normal equation

(A^T A) θ̂ = A^T u    (190)

where

A =
[ 1  z₁  z₁²  ...  z₁^{m−1} ]
[ ... ... ...  ...  ....     ]
[ 1  z_n  z_n²  ...  z_n^{m−1} ]    (191)

A^T A =
[ n           Σz_i        Σz_i²  ...  Σz_i^{m−1}  ]
[ Σz_i        Σz_i²       ...    ...  Σz_i^m      ]
[ ...         ...         ...    ...  ...         ]
[ Σz_i^{m−1}  ...         ...    ...  Σz_i^{2m−2} ]    (192)

i.e.,

(A^T A)_{jk} = Σ_{i=1}^n z_i^{j+k−2}    (193)

Let us assume that the z_i are uniformly distributed in the interval [0, 1]. For large n, approximating dz ≈ z_i − z_{i−1} ≈ 1/n, we can write

(A^T A)_{jk} = Σ_{i=1}^n z_i^{j+k−2} ≈ n ∫₀¹ z^{j+k−2} dz = n/(j + k − 1)    (194)

(j, k = 1, 2, ..., m)    (195)

i.e. A^T A ≈ n H_m, which is highly ill-conditioned for large m. Thus, whether we have a continuous function or numerical data over the interval [0, 1], the numerical difficulties persist, as the Hilbert matrix appears in both cases.
5.4 Problem Discretization using Minimum Residual Methods

5.4.1 Rayleigh-Ritz Method [11, 12]

To understand the motivation for developing this approach, first consider a linear system of equations

Ax = b    (197)

where A is an n × n positive definite and symmetric matrix and it is desired to solve for the vector x. We can pose this as a minimization problem by defining an objective function of the form

Φ(x) = (1/2) x^T A x − x^T b    (198)

The necessary condition for optimality gives

∂Φ/∂x = Ax − b = 0

which is precisely the equation we want to solve. Since the Hessian matrix

∂²Φ/∂x² = A

is positive definite, the stationary point is the unique minimum of Φ(x), and minimizing Φ(x) is equivalent to solving Ax = b. Consider now the ODE-BVP

Lu ≡ −d²u/dz² = f(z)    (201)

B.C. 1 :  u(0) = 0    (202)

B.C. 2 :  u(1) = 0    (203)

Similar to the linear operator (matrix) A, which operates on a vector x ∈ R^n to produce another vector b ∈ R^n, the linear operator L = [−d²/dz²] operates on a vector u(z) ∈ C^(2)[0, 1] to produce f(z) ∈ C[0, 1]. Note that the matrix A in our motivating example is symmetric and positive definite, i.e.

A^T = A  and  x^T A x > 0 for any x ≠ 0

In order to see how the concept of a symmetric matrix can be generalized to operators on infinite dimensional spaces, let us first define the adjoint of a matrix.
The adjoint A* of a matrix A is defined through the relation ⟨y, Ax⟩ = ⟨A*y, x⟩, and for a real matrix it is easy to see that A* = A^T, i.e.

⟨y, Ax⟩ = y^T A x = (A^T y)^T x = ⟨A^T y, x⟩

A matrix is called self-adjoint if A* = A, i.e. if it is symmetric. To begin with, let us check whether the operator L defined by equations (201-203) is self-adjoint.

⟨v, Lu⟩ = ∫₀¹ v(z) (−d²u/dz²) dz

= [−v(z) du/dz]₀¹ + ∫₀¹ (dv/dz)(du/dz) dz

= [−v(z) du/dz]₀¹ + [(dv/dz) u(z)]₀¹ + ∫₀¹ (−d²v/dz²) u(z) dz

Using the boundary conditions u(0) = u(1) = 0, we have

[(dv/dz) u(z)]₀¹ = 0

If we set

B.C. 1 :  v(0) = 0

B.C. 2 :  v(1) = 0

then

[v(z) du/dz]₀¹ = 0

and we have

⟨v, Lu⟩ = ∫₀¹ (−d²v/dz²) u(z) dz = ⟨L*v, u⟩

In fact, it is easy to see that the operator L is self-adjoint, as L* = L, B.C.1* = B.C.1 and B.C.2* = B.C.2. In addition to the self-adjointness of L, we have

⟨u, Lu⟩ = [−u(z) du/dz]₀¹ + ∫₀¹ (du/dz)² dz

= ∫₀¹ (du/dz)² dz > 0

when u(z) is a non-zero vector in C^(2)[0, 1]. In other words, solving the ODE-BVP is analogous to solving Ax = b by the optimization formulation, where A is a symmetric and positive definite matrix, i.e.

A ↔ −d²/dz² ;  x ↔ u(z) ;  b ↔ f(z)
Let u(z) = u*(z) represent the true solution of the ODE-BVP. Now, taking motivation from the optimization formulation for solving Ax = b, we can formulate a minimization problem to compute the solution

Φ[u(z)] = (1/2) ∫₀¹ u(z) (−d²u/dz²) dz − ∫₀¹ u(z) f(z) dz    (206)

u*(z) = min_{u(z)} Φ[u(z)]    (207)

= min_{u(z)} [(1/2) ⟨u(z), Lu(z)⟩ − ⟨u(z), f(z)⟩]    (208)

Thus, solving the ODE-BVP has been converted to solving a minimization problem. Integrating the first term in equation (206) by parts, we have

∫₀¹ u(z) (−d²u/dz²) dz = ∫₀¹ (du/dz)² dz − [u du/dz]₀¹    (210)

Now, using the boundary conditions, we have

[u du/dz]₀¹ = u(1) (du/dz)_{z=1} − u(0) (du/dz)_{z=0} = 0    (211)

so that the minimization problem becomes

u*(z) = min_{u(z)} [(1/2) ∫₀¹ (du/dz)² dz − ∫₀¹ u(z) f(z) dz]

The above equation is similar to an energy function, where the first term is analogous to kinetic energy and the second term is analogous to potential energy. As

∫₀¹ (du/dz)² dz

is positive and symmetric, we are guaranteed to find the minimum. The main difficulty in performing the search is that, unlike the previous case where we were working in R^n, the search space is infinite dimensional, as u(z) ∈ C^(2)[0, 1]. One remedy to alleviate this difficulty is to reduce the infinite dimensional search problem to a finite dimensional search space by constructing an approximate solution using a finite set of trial functions. Let v^(0)(z), ..., v^(n)(z) represent the trial functions. Then, the approximate solution is constructed as follows

û(z) = θ₀ v^(0)(z) + ... + θ_n v^(n)(z)    (213)

where v^(i)(z) represents the i'th trial function. Using this approximation, we convert the infinite dimensional optimization problem to a finite dimensional optimization problem as follows

min_θ Φ̂(θ) = (1/2) ∫₀¹ [dû/dz]² dz − ∫₀¹ û f(z) dz    (214)

= (1/2) ∫₀¹ [θ₀ dv^(0)(z)/dz + ... + θ_n dv^(n)(z)/dz]² dz − ∫₀¹ f(z) [θ₀ v^(0)(z) + ... + θ_n v^(n)(z)] dz    (215)
The trial functions v^(i)(z) are chosen in advance and the coefficients θ₀, ..., θ_n are treated as unknowns. Also, let us assume that these functions are selected such that û(0) = û(1) = 0. Then, using the necessary conditions for optimality, we get

∂Φ̂/∂θ_i = 0  for i = 0, 1, 2, ..., n    (216)

These equations can be rearranged as follows

∂Φ̂/∂θ = Aθ − b = 0    (217)

where

θ = [θ₀  θ₁  ...  θ_n]^T

A =
[ ⟨dv^(0)/dz, dv^(0)/dz⟩  ...  ⟨dv^(0)/dz, dv^(n)/dz⟩ ]
[ ..................        ...  .................       ]
[ ⟨dv^(n)/dz, dv^(0)/dz⟩  ...  ⟨dv^(n)/dz, dv^(n)/dz⟩ ]    (218)

b =
[ ⟨v^(0)(z), f(z)⟩ ]
[ ............      ]
[ ⟨v^(n)(z), f(z)⟩ ]    (219)

Thus, the optimization problem under consideration can be recast as follows

min_θ Φ̂(θ) = min_θ [(1/2) θ^T A θ − θ^T b]    (220)

It is easy to see that the matrix A is positive definite and symmetric, and the global minimum of the above optimization problem can be found by using the necessary condition for optimality, i.e. ∂Φ̂/∂θ = Aθ − b = 0, or θ* = A⁻¹b. Note the similarity of the above equation with the normal equation arising from the projection theorem. Thus, the steps in the Rayleigh-Ritz method can be summarized as follows:

1. Choose the trial functions {v^(0)(z), ..., v^(n)(z)} and form the approximate solution (213).

2. Compute the matrix A and the vector b using equations (218) and (219).

3. Solve Aθ = b.
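A minimal Rayleigh-Ritz sketch for −u'' = f with u(0) = u(1) = 0, using sine trial functions (an assumed choice that satisfies the boundary conditions) and f(z) = 1, for which the exact solution is u(z) = z(1 − z)/2:

```python
# Rayleigh-Ritz with trial functions v^(i)(z) = sin(i pi z).
import numpy as np

n = 5
z = np.linspace(0.0, 1.0, 2001)
f = np.ones_like(z)

V = np.array([np.sin(i * np.pi * z) for i in range(1, n + 1)])
dV = np.array([i * np.pi * np.cos(i * np.pi * z) for i in range(1, n + 1)])

w = z[1] - z[0]                      # rectangle-rule quadrature weight
A = (dV @ dV.T) * w                  # A_ij = <dv_i/dz, dv_j/dz>
b = (V @ f) * w                      # b_i = <v_i, f>
theta = np.linalg.solve(A, b)

u_hat = theta @ V
print(np.max(np.abs(u_hat - z * (1.0 - z) / 2.0)))   # small error
```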
5.4.2 Discretization of ODE-BVP / PDEs using Finite Element Method

The finite element method is a powerful tool for solving PDEs, particularly when the system under consideration has complex geometry. This method is based on the least squares approximation. In this section, we provide a very brief introduction to the discretization of PDEs and ODE-BVPs using the finite element method.
The approximate solution û(z) is constructed by dividing [0, 1] into n elements and fitting a line segment û_i(z) = a_i + b_i z on the i'th element. In principle, we can work with this piecewise polynomial approximation directly. However, the resulting optimization problem has the coefficients (a_i, b_i : i = 1, 2, ..., n) as unknowns. If the optimization problem has to be solved numerically, it is hard to generate an initial guess for these unknown coefficients. Thus, it is necessary to parameterize the polynomial in terms of unknowns for which it is relatively easy to generate the initial guess. This can be achieved as follows. Let û_i denote the value of the approximate solution û(z) at z = z_i, i.e.

û_i = û(z_i)    (224)

Then, at the boundary points of the i'th element, we have

û_i(z_{i−1}) = û_{i−1} = a_i + b_i z_{i−1}    (225)

û_i(z_i) = û_i = a_i + b_i z_i    (226)

Using these equations, we can express (a_i, b_i) in terms of the unknowns (û_{i−1}, û_i) as follows

a_i = (û_{i−1} z_i − û_i z_{i−1})/Δz ;  b_i = (û_i − û_{i−1})/Δz    (227)

Thus, the polynomial on the i'th segment can be written as

û_i(z) = (û_{i−1} z_i − û_i z_{i−1})/Δz + [(û_i − û_{i−1})/Δz] z    (228)

z_{i−1} ≤ z ≤ z_i  for i = 1, 2, ..., n

and the approximate solution can be expressed as follows

û(z) =
{ (û₀ z₁ − û₁ z₀)/Δz + [(û₁ − û₀)/Δz] z        for z₀ ≤ z ≤ z₁
{ (û₁ z₂ − û₂ z₁)/Δz + [(û₂ − û₁)/Δz] z        for z₁ ≤ z ≤ z₂
{ ..............
{ (û_{n−1} z_n − û_n z_{n−1})/Δz + [(û_n − û_{n−1})/Δz] z   for z_{n−1} ≤ z ≤ z_n    (229)

Thus, now we can work in terms of the unknown values {û₀, û₁, ..., û_n} instead of the parameters a_i and b_i. Since the unknowns {û₀, û₁, ..., û_n} correspond to some physical variable, it is relatively easy to generate good guesses for these unknowns from knowledge of the underlying physics of the problem. The resulting form is still not convenient from the viewpoint of evaluating the integrals involved in the computation of Φ[û(z)]. A more elegant and useful form of equation (229) can be found by defining shape functions. To arrive at this representation, consider the rearrangement of the line segment equation on the i'th element as follows

û_i(z) = (û_{i−1} z_i − û_i z_{i−1})/Δz + [(û_i − û_{i−1})/Δz] z    (230)

= û_{i−1} (z_i − z)/Δz + û_i (z − z_{i−1})/Δz
Let us define two functions, $M_i(z)$ and $N_i(z)$, called shape functions, as follows

$$M_i(z) = \frac{z_i - z}{\Delta z}, \qquad N_i(z) = \frac{z - z_{i-1}}{\Delta z}$$

$$z_{i-1} \le z \le z_i \quad \text{for } i = 1, 2, \dots, n$$

The graphs of these shape functions are straight lines and they have the fundamental properties

$$M_i(z) = \begin{cases} 1 & z = z_{i-1} \\ 0 & z = z_i \end{cases} \tag{231}$$

$$N_i(z) = \begin{cases} 0 & z = z_{i-1} \\ 1 & z = z_i \end{cases} \tag{232}$$

This allows us to express $\hat{u}_i(z)$ as

$$\hat{u}_i(z) = \hat{u}_{i-1} M_i(z) + \hat{u}_i N_i(z), \qquad i = 1, 2, \dots, n$$
Note that the coefficient $\hat{u}_i$ appears in the polynomials $\hat{u}_i(z)$ and $\hat{u}_{i+1}(z)$, i.e.

$$\hat{u}_i(z) = \hat{u}_{i-1} M_i(z) + \hat{u}_i N_i(z)$$

$$\hat{u}_{i+1}(z) = \hat{u}_i M_{i+1}(z) + \hat{u}_{i+1} N_{i+1}(z)$$

Thus, we can define a continuous trial function by combining $N_i(z)$ and $M_{i+1}(z)$ as follows

$$v^{(i)}(z) = \begin{cases}
N_i(z) = \dfrac{z - z_{i-1}}{\Delta z} = 1 + \dfrac{z - z_i}{\Delta z} & z_{i-1} \le z \le z_i \\[2mm]
M_{i+1}(z) = \dfrac{z_{i+1} - z}{\Delta z} = 1 - \dfrac{z - z_i}{\Delta z} & z_i \le z \le z_{i+1} \\[2mm]
0 & \text{elsewhere}
\end{cases} \tag{233}$$

$$i = 1, 2, \dots, n-1$$
This yields the simplest and most widely used hat function, which is shown in Figure 5.4.2. This is a continuous linear function of z, but it is not differentiable at $z_{i-1}$, $z_i$, and $z_{i+1}$. Also, note that at the grid points we have

$$v^{(i)}(z_j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \qquad j = 1, 2, \dots, n \tag{234}$$

Thus, the plot of this function looks like a symmetric triangle. The two functions at the boundary points are defined as ramps

$$v^{(0)}(z) = \begin{cases} M_1(z) = 1 - \dfrac{z}{\Delta z} & 0 \le z \le z_1 \\[2mm] 0 & \text{elsewhere} \end{cases} \tag{235}$$
$$v^{(n)}(z) = \begin{cases} N_n(z) = 1 + \dfrac{z - z_n}{\Delta z} & z_{n-1} \le z \le z_n \\[2mm] 0 & \text{elsewhere} \end{cases} \tag{236}$$

The approximate solution can now be expressed as a linear combination of these trial functions

$$\hat{u}(z) = \hat{u}_0 v^{(0)}(z) + \hat{u}_1 v^{(1)}(z) + \dots + \hat{u}_n v^{(n)}(z) \tag{237}$$

Substituting this approximation into the functional and applying the necessary conditions for optimality, we obtain the linear system

$$A\hat{\mathbf{u}} - b = 0 \tag{238}$$
where

$$(A)_{ij} = \left\langle \frac{dv^{(i)}}{dz}, \frac{dv^{(j)}}{dz} \right\rangle \tag{239}$$

and

$$\frac{dv^{(i)}}{dz} = \begin{cases} 1/\Delta z & \text{on the interval to the left of } z_i \\ -1/\Delta z & \text{on the interval to the right of } z_i \end{cases}$$
If the intervals do not overlap, then

$$\left\langle \frac{dv^{(i)}}{dz}, \frac{dv^{(j)}}{dz} \right\rangle = 0 \tag{240}$$

The intervals overlap when

$$i = j: \quad \left\langle \frac{dv^{(i)}}{dz}, \frac{dv^{(i)}}{dz} \right\rangle = \int_{z_{i-1}}^{z_i} (1/\Delta z)^2\, dz + \int_{z_i}^{z_{i+1}} (-1/\Delta z)^2\, dz = 2/\Delta z \tag{241}$$

or

$$i = j+1: \quad \left\langle \frac{dv^{(i)}}{dz}, \frac{dv^{(i-1)}}{dz} \right\rangle = \int_{z_{i-1}}^{z_i} (1/\Delta z)(-1/\Delta z)\, dz = -1/\Delta z \tag{242}$$

$$i = j-1: \quad \left\langle \frac{dv^{(i)}}{dz}, \frac{dv^{(i+1)}}{dz} \right\rangle = \int_{z_i}^{z_{i+1}} (1/\Delta z)(-1/\Delta z)\, dz = -1/\Delta z \tag{243}$$
Thus, the matrix A is a tridiagonal matrix

$$A = \frac{1}{\Delta z}\begin{bmatrix}
2 & -1 & \dots & \dots & 0 \\
-1 & 2 & -1 & \dots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \dots & \dots & -1 & 2
\end{bmatrix} \tag{244}$$
which is similar to the matrix obtained using the finite difference method. The components of the vector b on the R.H.S. are computed as

$$b_i = \left\langle v^{(i)}(z), f(z) \right\rangle = \int_{z_{i-1}}^{z_{i+1}} v^{(i)}(z) f(z)\, dz, \qquad i = 1, 2, \dots, n-1 \tag{247}$$

which is a weighted average of f(z) over the interval $z_{i-1} \le z \le z_{i+1}$. Note that the R.H.S. is significantly different from that of the finite difference method.
In this sub-section, we have developed an approximate solution using piecewise linear approximation. It is possible to develop piecewise quadratic or piecewise cubic approximations and thereby generate better approximations. Readers are referred to Computational Science and Engineering by Gilbert Strang [13].
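As an illustration, the following Python sketch assembles the tridiagonal matrix (244) and the load vector (247) for the model problem $-d^2u/dz^2 = f(z)$ with $u(0) = u(1) = 0$; the model problem, mesh size, and right hand side are assumptions made only for this sketch.

import numpy as np
from scipy.integrate import quad

# Linear finite element sketch for -u'' = f on (0,1), u(0) = u(1) = 0.
n = 10                             # number of elements (assumed)
z = np.linspace(0.0, 1.0, n + 1)   # nodes z_0, ..., z_n
dz = z[1] - z[0]
f = lambda x: 1.0

# Tridiagonal stiffness matrix of equation (244), size (n-1) x (n-1).
A = (2.0 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)) / dz

# Load vector (247): b_i = <v^(i), f>, a weighted average of f near z_i.
b = np.zeros(n - 1)
for i in range(1, n):
    hat = lambda x, zi=z[i]: max(0.0, 1.0 - abs(x - zi) / dz)
    b[i - 1] = quad(lambda x: hat(x) * f(x), z[i - 1], z[i + 1])[0]

u = np.zeros(n + 1)                # boundary values remain zero
u[1:n] = np.linalg.solve(A, b)     # interior nodal values
print(u[n // 2], z[n // 2] * (1.0 - z[n // 2]) / 2.0)  # exact for f = 1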
Discretization of PDE using Finite Element Method [12] The Rayleigh-Ritz method can be easily applied to discretize PDEs when the operators involved are self-adjoint. Consider the Laplace / Poisson equation

$$Lu = -\frac{\partial^2 u}{\partial x^2} - \frac{\partial^2 u}{\partial y^2} = f(x,y) \tag{248}$$

in the open set S, with u(x, y) = 0 on the boundary. Let the inner product on the space $C^{(2)}[0,1] \times C^{(2)}[0,1]$ be defined as

$$\langle f(x,y), g(x,y) \rangle = \int_0^1 \int_0^1 f(x,y)\, g(x,y)\, dx\, dy \tag{249}$$

The solution of equation (248) minimizes the functional

$$\Phi(u) = \frac{1}{2}\left\langle u(x,y),\ \left(-\frac{\partial^2 u}{\partial x^2} - \frac{\partial^2 u}{\partial y^2}\right)\right\rangle - \langle u(x,y), f(x,y)\rangle \tag{250}$$
Integrating by parts and using the boundary conditions, this functional can be expressed as

$$\Phi(u) = \int\!\!\int \left[\frac{1}{2}\left(\frac{\partial u}{\partial x}\right)^2 + \frac{1}{2}\left(\frac{\partial u}{\partial y}\right)^2 - f u\right] dx\, dy \tag{251}$$

$$= \frac{1}{2}\left\langle \frac{\partial u}{\partial x}, \frac{\partial u}{\partial x}\right\rangle + \frac{1}{2}\left\langle \frac{\partial u}{\partial y}, \frac{\partial u}{\partial y}\right\rangle - \langle f(x,y), u(x,y)\rangle \tag{252}$$

Let the domain be discretized using a uniform grid of points

$$x_i = ih \quad (i = 1, 2, \dots, n-1), \qquad y_j = jh \quad (j = 1, 2, \dots, n-1)$$
In two dimensions, the simplest element divides the region into triangles, on which simple polynomials are fitted. For example, u(x, y) can be approximated as

$$\hat{u}(x,y) = a + bx + cy$$

where the coefficients a, b, c can be expressed in terms of the values of $\hat{u}(x,y)$ at the triangle vertices. For example, consider the triangle defined by $(x_i, y_j)$, $(x_{i+1}, y_j)$ and $(x_i, y_{j+1})$. The values of the approximate solution at the corner points are denoted by

$$\hat{u}_{i,j} = \hat{u}(x_i, y_j), \qquad \hat{u}_{i+1,j} = \hat{u}(x_{i+1}, y_j), \qquad \hat{u}_{i,j+1} = \hat{u}(x_i, y_{j+1})$$
Then, over this triangle, the approximate solution can be written as

$$\hat{u}(x,y) = \hat{u}_{i,j} + \frac{\hat{u}_{i+1,j} - \hat{u}_{i,j}}{h}(x - x_i) + \frac{\hat{u}_{i,j+1} - \hat{u}_{i,j}}{h}(y - y_j)$$

$$= \hat{u}_{i,j}\left(1 - \frac{x - x_i}{h} - \frac{y - y_j}{h}\right) + \hat{u}_{i+1,j}\,\frac{x - x_i}{h} + \hat{u}_{i,j+1}\,\frac{y - y_j}{h} \tag{253}$$
Now, the coefficient $\hat{u}_{i,j}$ appears in the shape functions of the four triangular elements around $(x_i, y_j)$. Collecting these shape functions, we can define a two dimensional trial function as follows

$$v^{(i,j)}(x,y) = \begin{cases}
1 - \dfrac{x - x_i}{h} - \dfrac{y - y_j}{h} & x_i \le x \le x_{i+1},\ y_j \le y \le y_{j+1} \\[2mm]
1 + \dfrac{x - x_i}{h} - \dfrac{y - y_j}{h} & x_{i-1} \le x \le x_i,\ y_j \le y \le y_{j+1} \\[2mm]
1 - \dfrac{x - x_i}{h} + \dfrac{y - y_j}{h} & x_i \le x \le x_{i+1},\ y_{j-1} \le y \le y_j \\[2mm]
1 + \dfrac{x - x_i}{h} + \dfrac{y - y_j}{h} & x_{i-1} \le x \le x_i,\ y_{j-1} \le y \le y_j \\[2mm]
0 & \text{elsewhere}
\end{cases}$$
Figure 1: Trial function in two dimensions.
The shape of this trial function is like a pyramid (see Figure 1). We can define trial functions at the boundary points in a similar manner. Thus, expressing the approximate solution using trial functions and using the fact that $\hat{u}(x,y) = 0$ at the boundary points, we get

$$\hat{u}(x,y) = \sum_{i=1}^{n-1}\sum_{j=1}^{n-1} \hat{u}_{i,j}\, v^{(i,j)}(x,y)$$

where $v^{(i,j)}(x,y)$ represents the (i, j)'th trial function. For the sake of convenience, let us renumber these trial functions and coefficients using a single index $l = 1, \dots, N$ such that

$$l = i + (n-1)(j-1), \qquad i, j = 1, \dots, n-1, \qquad N = (n-1)(n-1)$$

The approximate solution can then be written as

$$\hat{u}(x,y) = \hat{u}_1 v^{(1)}(x,y) + \dots + \hat{u}_N v^{(N)}(x,y)$$
Substituting this expansion into the functional, the optimization problem becomes

$$\min_{\hat{\mathbf{u}}}\ \Phi(\hat{u}) = \frac{1}{2}\left\langle \frac{\partial \hat{u}}{\partial x}, \frac{\partial \hat{u}}{\partial x} \right\rangle + \frac{1}{2}\left\langle \frac{\partial \hat{u}}{\partial y}, \frac{\partial \hat{u}}{\partial y} \right\rangle - \left\langle f(x,y), \hat{u}(x,y) \right\rangle$$

where

$$\hat{\mathbf{u}} = \begin{bmatrix} \hat{u}_1 & \hat{u}_2 & \dots & \hat{u}_N \end{bmatrix}^T$$

denotes the vector of unknown coefficients.
Thus, the above objective function can be reformulated as

$$\min_{\hat{\mathbf{u}}}\ \Phi(\hat{\mathbf{u}}) = \frac{1}{2}\hat{\mathbf{u}}^T A \hat{\mathbf{u}} - \hat{\mathbf{u}}^T b \tag{254}$$
where

$$(A)_{ij} = \left\langle \frac{\partial v^{(i)}}{\partial x}, \frac{\partial v^{(j)}}{\partial x} \right\rangle + \left\langle \frac{\partial v^{(i)}}{\partial y}, \frac{\partial v^{(j)}}{\partial y} \right\rangle \tag{255}$$

$$b_i = \left\langle f(x,y), v^{(i)}(x,y) \right\rangle \tag{256}$$

Applying the necessary condition for optimality, we get

$$\partial\Phi/\partial\hat{\mathbf{u}} = A\hat{\mathbf{u}} - b = 0 \tag{257}$$

The matrix A will again be a sparse matrix. The main limitation of the Rayleigh-Ritz method is that it works only when the operator L is symmetric or self-adjoint.
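It can be shown that, for the uniform triangulation used above, the assembled stiffness matrix takes the five-point stencil form familiar from finite differences. The following Python sketch builds this sparse system and solves it; the grid size and right hand side are illustrative assumptions, and the load vector uses the approximation $b_l \approx h^2 f(x_i, y_j)$, which holds for smooth f.

import numpy as np
from scipy.sparse import diags, identity, kron
from scipy.sparse.linalg import spsolve

# Sparse assembly sketch for the 2D problem above on a uniform mesh.
n = 20                      # grid intervals per direction (assumed)
h = 1.0 / n
m = n - 1                   # interior nodes per direction

# For this triangulation the stiffness matrix has the five-point
# stencil form: 4 on the diagonal, -1 for each of the four neighbours.
T = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = kron(identity(m), T) + kron(T, identity(m))

# Load vector b_l ~ h^2 f(x_i, y_j), with numbering l = i + (n-1)(j-1).
x = np.linspace(h, 1.0 - h, m)
X, Y = np.meshgrid(x, x, indexing="ij")
fxy = np.ones_like(X)                     # f(x, y) = 1 (illustrative)
b = (h * h * fxy).ravel(order="F")

u = spsolve(A.tocsr(), b)
print(u.max())    # largest nodal value, attained near the domain centre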
5.4.3 Method of Least Squares [4]

This is probably the best known minimum residual method. When used for solving linear operator equations, this approach does not require self-adjointness of the linear operator. To understand the method, let us first consider a linear ODE-BVP of the form

$$L[u(z)] = f(z), \qquad 0 < z < 1$$

with homogeneous boundary conditions $u(0) = u(1) = 0$.
Consider an approximate solution constructed as a linear combination of a finite number of linearly independent functions

$$\hat{u}(z) = \theta_1 \hat{u}_1(z) + \theta_2 \hat{u}_2(z) + \dots + \theta_n \hat{u}_n(z)$$

Let us assume that these basis functions are selected such that the two boundary conditions are satisfied, i.e. $\hat{u}_i(0) = \hat{u}_i(1) = 0$. Given this approximate solution, the residual is defined as follows

$$R(z) = L[\hat{u}(z)] - f(z), \qquad 0 < z < 1$$

The idea is to determine

$$\theta = \begin{bmatrix} \theta_1 & \theta_2 & \dots & \theta_n \end{bmatrix}^T$$
such that

$$\min_{\theta}\ \Phi(\theta) = \langle R(z), R(z) \rangle$$

$$\langle R(z), R(z) \rangle = \int_0^1 \omega(z) R(z)^2\, dz$$

where $\omega(z)$ is a positive weighting function on $0 < z < 1$. This minimization problem leads to a generalized normal equation of the form
$$\begin{bmatrix}
\langle L\hat{u}_1, L\hat{u}_1 \rangle & \langle L\hat{u}_1, L\hat{u}_2 \rangle & \dots & \langle L\hat{u}_1, L\hat{u}_n \rangle \\
\langle L\hat{u}_2, L\hat{u}_1 \rangle & \langle L\hat{u}_2, L\hat{u}_2 \rangle & \dots & \langle L\hat{u}_2, L\hat{u}_n \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle L\hat{u}_n, L\hat{u}_1 \rangle & \langle L\hat{u}_n, L\hat{u}_2 \rangle & \dots & \langle L\hat{u}_n, L\hat{u}_n \rangle
\end{bmatrix}
\begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix}
= \begin{bmatrix} \langle L\hat{u}_1, f(z) \rangle \\ \langle L\hat{u}_2, f(z) \rangle \\ \vdots \\ \langle L\hat{u}_n, f(z) \rangle \end{bmatrix} \tag{261}$$
Example 29 [4] Use the least squares method to find an approximate solution of the equation

$$L[u(z)] = \frac{d^2 u}{dz^2} - u = 1 \tag{262}$$

$$B.C.\ 1: \quad u(0) = 0 \tag{263}$$

$$B.C.\ 2: \quad u(1) = 0 \tag{264}$$

Let the approximate solution be chosen as

$$\hat{u}(z) = \theta_1 \sin(\pi z) + \theta_2 \sin(2\pi z)$$

It may be noted that this choice ensures that the boundary conditions are satisfied. Now,

$$L[\hat{u}_1(z)] = -(\pi^2 + 1)\sin(\pi z)$$

$$L[\hat{u}_2(z)] = -(4\pi^2 + 1)\sin(2\pi z)$$

With $\omega(z) = 1$, the normal equation (261) reduces to

$$\begin{bmatrix} (\pi^2 + 1)^2/2 & 0 \\ 0 & (4\pi^2 + 1)^2/2 \end{bmatrix}\begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} -2(\pi^2 + 1)/\pi \\ 0 \end{bmatrix}$$

which yields $\theta_1 = -4/[\pi(\pi^2 + 1)]$ and $\theta_2 = 0$.
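The computations in this example are easy to verify numerically. The following Python sketch rebuilds the normal equation (261) by quadrature and solves it; the integrals and the closed-form value of $\theta_1$ are as derived above.

import numpy as np
from scipy.integrate import quad

# Numerical check of Example 29: L[u] = u'' - u, f(z) = 1,
# basis u1(z) = sin(pi z), u2(z) = sin(2 pi z), weight w(z) = 1.
Lu = [lambda z: -(np.pi**2 + 1.0) * np.sin(np.pi * z),
      lambda z: -(4.0 * np.pi**2 + 1.0) * np.sin(2.0 * np.pi * z)]

A = np.array([[quad(lambda z: Lu[i](z) * Lu[j](z), 0.0, 1.0)[0]
               for j in range(2)] for i in range(2)])
b = np.array([quad(lambda z: Lu[i](z) * 1.0, 0.0, 1.0)[0]
              for i in range(2)])

theta = np.linalg.solve(A, b)
print(theta)                                   # ~ [-0.1171, 0.0]
print(-4.0 / (np.pi * (np.pi**2 + 1.0)))       # closed-form theta_1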
When boundary conditions are non-homogeneous, it is sometimes possible to transform them to homogeneous conditions. Alternatively, the optimization problem is formulated in such a way that the boundary conditions are satisfied in the least squares sense [4]. While this method can, in principle, be extended to the discretization of a general ODE-BVP of type (14-16a), working with the parameter vector $\theta$ as the minimizing argument can pose practical difficulties, as the resulting minimization problem has to be solved numerically. Coming up with an initial guess of $\theta$ to start the iterative algorithms can prove to be a tricky task. Alternatively, one can work with trial solutions of the form (237) or (253) to make the problem computationally tractable.
5.4.4 Galerkin's Method [4, 2]

Galerkin's method can be applied to any problem where the differential operator is not self-adjoint or symmetric. Instead of minimizing $\Phi(\hat{u})$, we solve for the coefficients such that

$$\left\langle v^{(i)}(z), L\hat{u}(z) \right\rangle = \left\langle v^{(i)}(z), f(z) \right\rangle, \qquad i = 0, 1, 2, \dots, n$$

where the approximate solution is of the form

$$\hat{u}(z) = \hat{u}_0 v^{(0)}(z) + \hat{u}_1 v^{(1)}(z) + \dots + \hat{u}_n v^{(n)}(z) \tag{265}$$

From these equations, we can observe that the parameters $\hat{u}_0, \dots, \hat{u}_n$ are computed such that the error or residual vector

$$e(z) = L\hat{u}(z) - f(z)$$

is orthogonal to each of the trial functions $v^{(i)}(z)$. These equations can be rearranged in the form

$$A\hat{\mathbf{u}} = b \tag{266}$$
where

$$A = \begin{bmatrix}
\langle v^{(0)}, L(v^{(0)}) \rangle & \dots & \langle v^{(0)}, L(v^{(n)}) \rangle \\
\vdots & \ddots & \vdots \\
\langle v^{(n)}, L(v^{(0)}) \rangle & \dots & \langle v^{(n)}, L(v^{(n)}) \rangle
\end{bmatrix} \tag{267}$$
$$b = \begin{bmatrix} \langle v^{(0)}(z), f(z) \rangle \\ \vdots \\ \langle v^{(n)}(z), f(z) \rangle \end{bmatrix}$$
Solving for $\hat{\mathbf{u}}$ gives the approximate solution given by equation (265). When the operator L is self-adjoint, Galerkin's method reduces to the Rayleigh-Ritz method. For problems where the operator is not self-adjoint, the Rayleigh-Ritz method cannot be applied to generate an approximate solution; Galerkin's method, however, remains applicable.
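To illustrate, the following Python sketch applies Galerkin's method to a non-self-adjoint operator, using the assumed example $L[u] = u'' + u' = 1$ on (0,1) with $u(0) = u(1) = 0$ and sine trial functions; none of these specific choices come from the text above.

import numpy as np
from scipy.integrate import quad

# Galerkin sketch for a non-self-adjoint operator (assumed example):
# L[u] = u'' + u' = 1 on (0,1), u(0) = u(1) = 0, sine trial functions.
n = 6
v = lambda i, z: np.sin(i * np.pi * z)
Lv = lambda i, z: (-(i * np.pi) ** 2 * np.sin(i * np.pi * z)
                   + i * np.pi * np.cos(i * np.pi * z))

A = np.array([[quad(lambda z: v(i, z) * Lv(j, z), 0.0, 1.0)[0]
               for j in range(1, n + 1)] for i in range(1, n + 1)])
b = np.array([quad(lambda z: v(i, z) * 1.0, 0.0, 1.0)[0]
              for i in range(1, n + 1)])

c = np.linalg.solve(A, b)          # note: A is not symmetric here
u_hat = lambda z: sum(c[k] * v(k + 1, z) for k in range(n))

# Exact solution of this example: u(z) = z - (1 - exp(-z))/(1 - exp(-1)).
print(u_hat(0.5), 0.5 - (1.0 - np.exp(-0.5)) / (1.0 - np.exp(-1.0)))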
It may be noted that one need not restrict to linear transformations while applying Galerkin's method. This approach can be used even when the ODE-BVP or PDE at hand is a nonlinear transformation. Given a general nonlinear transformation of the form

$$T(u) = f(z)$$

we select a set of trial functions $\{v^{(i)}(z) : i = 0, 1, \dots, n\}$ and an approximate solution of the form (265), and solve for

$$\left\langle v^{(i)}(z), T(\hat{u}(z)) \right\rangle = \left\langle v^{(i)}(z), f(z) \right\rangle \qquad \text{for } i = 0, 1, 2, \dots, n$$
Example 31 [2] Consider the ODE-BVP describing steady state conditions in a tubular reactor with axial mixing (TRAM), in which an irreversible 2nd order reaction is carried out:

$$T(C) = \frac{1}{Pe}\frac{d^2 C}{dz^2} - \frac{dC}{dz} - Da\, C^2 = 0 \qquad (0 \le z \le 1)$$

$$\frac{dC}{dz} = Pe\,(C - 1) \quad \text{at } z = 0$$

$$\frac{dC}{dz} = 0 \quad \text{at } z = 1$$
The approximate solution is chosen as

$$\hat{C}(z) = \hat{C}_0 v^{(0)}(z) + \dots + \hat{C}_n v^{(n)}(z) = \sum_{i=0}^{n} \hat{C}_i v^{(i)}(z) \tag{271}$$

Substituting this approximate solution into the Galerkin formulation will give rise to equations that are nonlinear in the unknown coefficients. Thus, we get (n+1) nonlinear algebraic equations in (n+1) unknowns, which have to be solved simultaneously to compute the unknown coefficients $\hat{C}_0, \dots, \hat{C}_n$. Details of computing these integrals and developing piecewise approximating functions on finite elements can be found in [2].
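The structure of the resulting nonlinear algebraic equations can be illustrated with a simpler hypothetical example (not the TRAM problem itself, whose Robin boundary conditions require extra care): $T(u) = u'' - u^2 = f(z)$ on (0,1) with $u(0) = u(1) = 0$, where f is manufactured so that $u(z) = \sin(\pi z)$ is the exact solution.

import numpy as np
from scipy.integrate import quad
from scipy.optimize import fsolve

# Galerkin sketch for a nonlinear operator (hypothetical example):
# T(u) = u'' - u^2 = f(z), u(0) = u(1) = 0, with f chosen so that
# the exact solution is u(z) = sin(pi z).
n = 3
v = lambda i, z: np.sin((i + 1) * np.pi * z)       # trial functions
f = lambda z: -np.pi**2 * np.sin(np.pi * z) - np.sin(np.pi * z)**2

def residuals(c):
    u = lambda z: sum(c[k] * v(k, z) for k in range(n))
    d2u = lambda z: sum(-((k + 1) * np.pi) ** 2 * c[k] * v(k, z)
                        for k in range(n))
    Tu_minus_f = lambda z: d2u(z) - u(z) ** 2 - f(z)
    # Galerkin conditions: <v_i, T(u) - f> = 0 for each trial function.
    return [quad(lambda z: v(i, z) * Tu_minus_f(z), 0.0, 1.0)[0]
            for i in range(n)]

c0 = np.array([0.5, 0.0, 0.0])       # initial guess (assumed)
c = fsolve(residuals, c0)            # n nonlinear equations, n unknowns
print(c)                             # expect approximately [1, 0, 0]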
The fact that computations can be carried out only with a finite number of digits of precision introduces round-off errors. It may be noted that these round-off errors occur in every iteration and their cumulative effect on the final solution is difficult to predict.

Discretization errors arise because an infinite dimensional transformation is approximated by a finite dimensional one. Thus, while studying the discretization errors, we have to understand the behavior of the approximate solution $\tilde{x}$ with reference to the true solution $x$. It is reasonable to expect that a numerical method be capable of yielding arbitrarily accurate answers by making the discretization sufficiently fine. A method that gives a sequence of approximations converging to the true solution is called a convergent approximation method.
In summary, the problem approximation approaches discussed in this module reduce the original problem to one of the following fundamental computational forms:

Linear algebraic equations: solve $Ax = b$ for $x$.

Nonlinear algebraic equations: solve $F(x) = 0$ for $x$.

Optimization problem: minimize some scalar objective function $\Phi(x) : \mathbb{R}^n \to \mathbb{R}$ with respect to the argument $x$.

The numerical solution techniques for these fundamental problems form the basic toolkit of numerical analysis. In the modules that follow, we examine each tool separately and in greater detail.
8 Appendix: Necessary and Sufficient Conditions for Unconstrained Optimality
8.1 Preliminaries
Consider a real valued scalar function $\Phi(x) : \mathbb{R}^n \to \mathbb{R}$ defined for any $x \in \mathbb{R}^n$.

Definition 32 (Global Minimum): If there exists a point $x^* \in \mathbb{R}^n$ such that $\Phi(x^*) < \Phi(x)$ for any $x \neq x^* \in \mathbb{R}^n$, then $x^*$ is called the global minimum of $\Phi(x)$.

Definition 34 (Local Minimum): If there exists an $\varepsilon$-neighborhood $N_\varepsilon(x^*)$ around $x^*$ such that $\Phi(x^*) < \Phi(x)$ for each $x \neq x^* \in N_\varepsilon(x^*)$, then $x^*$ is called a local minimum of $\Phi(x)$.

Before we prove the necessary and sufficient conditions for optimality, we revise some relevant definitions from linear algebra. A matrix A is said to be positive semi-definite if

$$x^T A x \ge 0 \quad \text{for all } x \in \mathbb{R}^n \tag{273}$$

and negative semi-definite if

$$x^T A x \le 0 \quad \text{for all } x \in \mathbb{R}^n \tag{275}$$

If strict inequality holds for all $x \neq 0$, the matrix is called positive definite and negative definite, respectively.
8.2 Necessary Condition for Optimality
The necessary condition for optimality, which can be used to establish whether a given point
is a stationary (maximum or minimum) point, is given by the following theorem.
Theorem 39 If $\Phi(x)$ is continuous and differentiable and has an extreme (or stationary) point (i.e. maximum or minimum) at $x = \bar{x}$, then

$$\nabla\Phi(\bar{x}) = \begin{bmatrix} \dfrac{\partial \Phi}{\partial x_1} & \dfrac{\partial \Phi}{\partial x_2} & \dots & \dfrac{\partial \Phi}{\partial x_N} \end{bmatrix}^T_{x = \bar{x}} = 0 \tag{276}$$
Proof: Suppose $x = \bar{x}$ is a minimum point and one of the partial derivatives, say the k'th one, does not vanish at $x = \bar{x}$. Then, by Taylor's theorem,

$$\Phi(\bar{x} + \Delta x) = \Phi(\bar{x}) + \sum_{i=1}^{N} \frac{\partial \Phi}{\partial x_i}(\bar{x})\, \Delta x_i + R_2(\bar{x}, \Delta x) \tag{277}$$

Choosing $\Delta x_i = 0$ for $i \neq k$, we get

$$\Phi(\bar{x} + \Delta x) - \Phi(\bar{x}) = \Delta x_k \frac{\partial \Phi}{\partial x_k}(\bar{x}) + R_2(\bar{x}, \Delta x) \tag{278}$$
Since $R_2(\bar{x}, \Delta x)$ is of order $(\Delta x_i)^2$, the terms of order $\Delta x_i$ will dominate over the higher order terms for sufficiently small $\Delta x$. Thus, the sign of $\Phi(\bar{x} + \Delta x) - \Phi(\bar{x})$ is decided by the sign of

$$\Delta x_k \frac{\partial \Phi}{\partial x_k}(\bar{x})$$

Suppose

$$\frac{\partial \Phi}{\partial x_k}(\bar{x}) > 0 \tag{279}$$

Then, choosing $\Delta x_k < 0$ implies

$$\Phi(\bar{x} + \Delta x) - \Phi(\bar{x}) < 0 \tag{280}$$

and $\Phi(x)$ can be further reduced by reducing $\Delta x_k$. This contradicts the assumption that $x = \bar{x}$ is a minimum point. Similarly, if

$$\frac{\partial \Phi}{\partial x_k}(\bar{x}) < 0 \tag{281}$$

then choosing $\Delta x_k > 0$ implies

$$\Phi(\bar{x} + \Delta x) - \Phi(\bar{x}) < 0 \tag{282}$$

and $\Phi(x)$ can again be further reduced. This contradicts the assumption that $x = \bar{x}$ is a minimum point. Thus, $x = \bar{x}$ will be a minimum of $\Phi(x)$ only if

$$\frac{\partial \Phi}{\partial x_k}(\bar{x}) = 0 \qquad \text{for } k = 1, 2, \dots, N \tag{283}$$

Similar arguments can be made if $x = \bar{x}$ is a maximum of $\Phi(x)$.
8.3 Sufficient Condition for Optimality

The sufficient condition for optimality, which can be used to establish whether a stationary point is a maximum or a minimum, is given by the following theorem.

Theorem 40 A sufficient condition for a stationary point $x = \bar{x}$ to be an extreme point (i.e. maximum or minimum) is that the matrix $\left[\dfrac{\partial^2 \Phi}{\partial x_i \partial x_j}\right]$ (the Hessian of $\Phi$) evaluated at $x = \bar{x}$ is

1. positive definite when $x = \bar{x}$ is a minimum

2. negative definite when $x = \bar{x}$ is a maximum
Proof: Using the Taylor series expansion, we have

$$\Phi(\bar{x} + \Delta x) = \Phi(\bar{x}) + \sum_{i=1}^{N} \frac{\partial \Phi}{\partial x_i}(\bar{x})\, \Delta x_i + \frac{1}{2!}\sum_{i=1}^{N}\sum_{j=1}^{N} \frac{\partial^2 \Phi(\bar{x} + \lambda \Delta x)}{\partial x_i \partial x_j}\, \Delta x_i \Delta x_j, \qquad (0 < \lambda < 1) \tag{284}$$

Since $x = \bar{x}$ is a stationary point,

$$\nabla\Phi(\bar{x}) = 0 \tag{285}$$

and the above equation reduces to

$$\Phi(\bar{x} + \Delta x) - \Phi(\bar{x}) = \frac{1}{2!}\sum_{i=1}^{N}\sum_{j=1}^{N} \frac{\partial^2 \Phi(\bar{x} + \lambda \Delta x)}{\partial x_i \partial x_j}\, \Delta x_i \Delta x_j, \qquad (0 < \lambda < 1) \tag{286}$$
This implies that the sign of $\Phi(\bar{x} + \Delta x) - \Phi(\bar{x})$ at the extreme point $\bar{x}$ is the same as the sign of the R.H.S. Since the second partial derivatives $\dfrac{\partial^2 \Phi}{\partial x_i \partial x_j}$ are continuous in the neighborhood of $x = \bar{x}$, their values at $x = \bar{x} + \lambda\Delta x$ will have the same signs as their values at $x = \bar{x}$ for all sufficiently small $\Delta x$. If the quantity

$$\sum_{i=1}^{N}\sum_{j=1}^{N} \frac{\partial^2 \Phi(\bar{x} + \lambda \Delta x)}{\partial x_i \partial x_j}\, \Delta x_i \Delta x_j \simeq (\Delta x)^T \left[\nabla^2 \Phi(\bar{x})\right] \Delta x > 0 \tag{287}$$

for all $\Delta x \neq 0$, then $x = \bar{x}$ will be a local minimum. In other words, if the Hessian matrix $[\nabla^2 \Phi(\bar{x})]$ is positive definite, then $x = \bar{x}$ will be a local minimum. If the quantity

$$\sum_{i=1}^{N}\sum_{j=1}^{N} \frac{\partial^2 \Phi(\bar{x} + \lambda \Delta x)}{\partial x_i \partial x_j}\, \Delta x_i \Delta x_j \simeq (\Delta x)^T \left[\nabla^2 \Phi(\bar{x})\right] \Delta x < 0 \tag{288}$$

for all $\Delta x \neq 0$, then $x = \bar{x}$ will be a local maximum. In other words, if the Hessian matrix $[\nabla^2 \Phi(\bar{x})]$ is negative definite, then $x = \bar{x}$ will be a local maximum.
It may be noted that the need to define positive definite or negative definite matrices arises naturally from geometric considerations while qualifying a stationary point in a multi-dimensional optimization problem. Whether a matrix is positive (semi-)definite, negative (semi-)definite or indefinite can be established using algebraic conditions, such as the signs of the eigenvalues of the matrix. If the eigenvalues of a matrix are all real and non-negative (i.e. $\lambda_i \ge 0$ for all i), then the matrix is positive semi-definite. If the eigenvalues of a matrix are all real and non-positive (i.e. $\lambda_i \le 0$ for all i), then the matrix is negative semi-definite. When the eigenvalues have mixed signs, the matrix is indefinite.
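A quick numerical check of these conditions is straightforward; in the following Python sketch the Hessian matrix is a hypothetical example chosen only for illustration.

import numpy as np

# Classify a stationary point from the eigenvalues of the Hessian.
H = np.array([[2.0, -1.0],
              [-1.0, 2.0]])       # hypothetical Hessian at x = x_bar
lam = np.linalg.eigvalsh(H)       # symmetric matrix: real eigenvalues

if np.all(lam > 0):
    print("positive definite: x_bar is a local minimum")
elif np.all(lam < 0):
    print("negative definite: x_bar is a local maximum")
elif np.all(lam >= 0) or np.all(lam <= 0):
    print("semi-definite: the test is inconclusive")
else:
    print("indefinite: x_bar is a saddle point")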
References
[1] Bazaraa, M. S., Sherali, H. D., Shetty, C. M.; Nonlinear Programming. John Wiley, New York, 1979.

[2] Gupta, S. K.; Numerical Methods for Engineers. Wiley Eastern, New Delhi, 1995.

[3] Kreyszig, E.; Introduction to Functional Analysis with Applications. John Wiley, New York, 1978.

[4] Linz, P.; Theoretical Numerical Analysis. Dover, New York, 1979.

[5] Luenberger, D. G.; Optimization by Vector Space Methods. John Wiley, New York, 1969.

[6] Luenberger, D. G.; Optimization by Vector Space Methods. John Wiley, New York, 1969.

[7] Gourdin, A. and Boumahrat, M.; Applied Numerical Methods. Prentice Hall India, New Delhi.

[8] Moursund, D. G., Duris, C. S.; Elementary Theory and Application of Numerical Analysis. Dover, New York, 1988.

[9] Rall, L. B.; Computational Solutions of Nonlinear Operator Equations. John Wiley, New York, 1969.

[10] Rao, S. S.; Optimization: Theory and Applications. Wiley Eastern, New Delhi, 1978.

[11] Strang, G.; Linear Algebra and Its Applications. Harcourt Brace Jovanovich College Publishers, New York, 1988.

[12] Strang, G.; Introduction to Applied Mathematics. Wellesley-Cambridge Press, Massachusetts, 1986.

[13] Strang, G.; Computational Science and Engineering. Wellesley-Cambridge Press, Massachusetts, 2007.