
Direct Methods


Numerical Methods and Computation

MTL 107

Harish Kumar
(hkumar@iitd.ac.in)

Dept. of Mathematics, IIT Delhi


Direct Methods for Linear Systems

▶ Review of basic concepts regarding linear systems


▶ Basics of linear system solution
▶ Vector and matrix norms
▶ Orthogonal vectors and matrices
▶ LU factorization
▶ Pivoting
▶ Error Estimation and Condition Number
Lecture 8 & 9
Gauss Elimination and LU Decomposition
Basic concepts: linear system of equations

 
▶ Find x = (x1, x2)⊤ which satisfies

a11 x1 + a12 x2 = b1
a21 x1 + a22 x2 = b2

or, Ax = b, where A = [a11 a12; a21 a22].
Basic concepts: linear system of equations (cont.)
Solving a 2 × 2 system.
▶ Unique solution if and only if the two lines are not parallel.

[Figure 1: Solving a 2 × 2 system of equations]


Range space and null space

We can write Ax = b as

a1 x1 + a2 x2 + · · · + an xn = b

where aj is the j-th column of A.


If A is nonsingular, then the columns aj are linearly independent.
Then there is a unique way of writing any vector b as a linear
combination of these columns.
Hence, if A is nonsingular, then there is a unique solution for the
system Ax = b.
Range space and null space (cont.)

▶ In general, for a square n × n system there is a unique solution


if one of the following equivalent statements holds:
▶ A is nonsingular;
▶ det(A) ̸= 0;
▶ A has linearly independent columns (equivalently, rows);
▶ there exists an inverse A−1 satisfying AA−1 = I = A−1 A;
▶ range(A) = Rn;
▶ null(A) = {0}.
range(A) = all vectors that can be written as linear combination of
columns of A.
null(A) = nullspace of A = all vectors z for which Az = 0.
Example (Range and nullspace)

 
A = [1 1; 3 3]

null(A) = { α (1, −1)⊤ : α ∈ R },

range(A) = { β (1, 3)⊤ : β ∈ R }.

Give all solutions of Ax = b with b = (2, 6)⊤.
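One way to answer (a short sketch): b = 2 · (1, 3)⊤ lies in range(A), so xp = (2, 0)⊤ is a particular solution; adding the nullspace gives all solutions x = (2, 0)⊤ + α (1, −1)⊤, α ∈ R.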
Example (Almost singularity)

Let’s look at the linear system of equations

[1+ε 1; 3 3] [x1; x2] = [2+ε; 6],   where 0 < ε ≪ 1.

The unique solution is x = (1, 1)⊤.

The system matrix above is almost singular, i.e., a small perturbation
makes it singular.
The nearby singular problem has many solutions, some of which are far from
x: an ill-conditioned problem.
Vector norms

A vector norm is a function ∥·∥ : Rn → R+ such that


1. ∥x∥ ≥ 0, for all x ∈ Rn and ∥x∥ = 0 ⇐⇒ x = 0.
2. ∥αx∥ = |α| ∥x∥ for all α ∈ R.
3. ∥x + y∥ ≤ ∥x∥ + ∥y∥, for all x, y ∈ Rn .
This generalizes absolute value or magnitude of a scalar.
Frequently used norms are
▶ The Euclidean norm or 2-norm or l2 norm:
It measures the Euclidean length of a vector and is defined by

∥x∥2 = ( Σ_{i=1}^{n} |xi|² )^(1/2).
Vector norms (cont.)

▶ The infinity norm or maximum norm:
It measures the largest element in modulus and is defined by

∥x∥∞ = max_{1≤i≤n} |xi|

▶ The 1-norm or l1 norm:
It measures the sum of the moduli of all the elements and is defined by

∥x∥1 = Σ_{i=1}^{n} |xi|.
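A minimal sketch (with an assumed example vector) comparing the three norms in MATLAB:

x = [3; -4; 1];
n2   = norm(x, 2);     % Euclidean norm: sqrt(9 + 16 + 1)
ninf = norm(x, Inf);   % maximum norm: 4
n1   = norm(x, 1);     % 1-norm: 8
fprintf('2-norm %.4f, inf-norm %.4f, 1-norm %.4f\n', n2, ninf, n1);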
Vector norms (cont.)

‘Unit circles’ for the l1, l2, and l∞ norms (in 2D).

[Figure 2: ‘Unit circles’ of the l1, l2 and l∞ norms]
Vector norms (cont.)

In the finite dimensional case considered here, all norms are


equivalent, meaning that for two norms ∥·∥a and ∥·∥b there are
positive constants C1 , C2 such that

C1 ∥x∥a ≤ ∥x∥b ≤ C2 ∥x∥a , ∀x.

We can choose freely in which norm we want to measure distance.

Often a good choice of norm simplifies the argument when proving a result.
Nevertheless, the result then holds in any norm; the difference is just a
constant. In particular, for the three norms above we have

∥x∥∞ ≤ ∥x∥2 ≤ √n ∥x∥∞

∥x∥∞ ≤ ∥x∥1 ≤ n ∥x∥∞

∥x∥2 ≤ ∥x∥1 ≤ √n ∥x∥2
Matrix norms

A matrix norm is a function ∥·∥ : Rm×n → R+ such that


1. ∥A∥ ≥ 0, for all A ∈ Rm×n and ∥A∥ = 0 ⇐⇒ A = 0.
2. ∥αA∥ = |α| ∥A∥ for all α ∈ R.
3. ∥A + B∥ ≤ ∥A∥ + ∥B∥, for all A, B ∈ Rm×n .
Note that matrix norms are often also required to satisfy the
submultiplicative property,

∥AB∥ ≤ ∥A∥ ∥B∥ .

Some matrix norms are induced by vector norms by the definition

∥A∥ = sup_{∥x∥=1} ∥Ax∥
Matrix norms (cont.)

For the vector norms we have introduced before, the corresponding


induced matrix norms are
▶ The spectral norm or 2-norm:

∥A∥2 = sup_{∥x∥2=1} ∥Ax∥2 = σmax(A)

It can be shown that ∥A∥2 equals the square root of the largest eigenvalue of
A⊤A, i.e., the largest singular value of A. It follows that the 2-norm is
invariant under orthogonal transformations, i.e., we have

∥QA∥2 = ∥AQ∥2 = ∥A∥2

whenever Q⊤Q = I.
Matrix norms (cont.)

▶ The infinity norm or maximum row sum norm:


∥A∥∞ = sup_{∥x∥∞=1} ∥Ax∥∞ = max_{1≤i≤n} Σ_{j=1}^{n} |aij|

The last identity comes from the observation that the vector x
which maximizes the supremum is given by x = (±1, ±1, · · · , ±1)⊤
with the sign of the entries chosen according to the sign of the
entries in the row of A with the largest row sum.
Matrix norms (cont.)

▶ The 1-norm or maximum column sum norm:


∥A∥1 = sup_{∥x∥1=1} ∥Ax∥1 = max_{1≤j≤n} Σ_{i=1}^{n} |aij|

The last identity holds because the supremum is attained for


the value x = (0, 0, · · · , 0, 1, 0, · · · , 0)⊤ with 1 at the column
position of A with the largest column sum.
▶ Frobenius norm:
The commonly used Frobenius norm does not arise from
vector norms:
∥A∥F = ( Σ_{i,j=1}^{n} |aij|² )^(1/2) = ( Σ_{i=1}^{n} σi²(A) )^(1/2)
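A short sketch (assumed example matrix) comparing the four matrix norms in MATLAB:

A = [1 -2; 3 4];
n2   = norm(A, 2);       % spectral norm = largest singular value
ninf = norm(A, Inf);     % maximum row sum: 7
n1   = norm(A, 1);       % maximum column sum: 6
nF   = norm(A, 'fro');   % Frobenius norm: sqrt(1 + 4 + 9 + 16)
[n2, ninf, n1, nF]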
Orthogonal vectors and matrices
Orthogonal vectors: Two vectors u, v ∈ Rn are orthogonal if

u⊤ v = 0

Vector u is normalized if ∥u∥2 = 1.


A square matrix Q is orthogonal if its columns are pairwise
orthonormal, i.e.,

Q⊤Q = I.   Hence, also Q−1 = Q⊤.

Important property: for any orthogonal matrix Q and vector x:

∥Qx∥2 = ∥x∥2

Hence,
∥Q∥2 = ∥Q−1∥2 = 1.
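A tiny sketch (assumed 2 × 2 rotation matrix) illustrating these two properties in MATLAB:

t = 0.7;                            % arbitrary angle
Q = [cos(t) -sin(t); sin(t) cos(t)];
x = [3; -4];
norm(Q*x, 2) - norm(x, 2)           % ~0 up to roundoff: the 2-norm is preserved
norm(Q, 2)                          % = 1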
Linear systems: Problem statement

We consider linear systems of equations of the form

Σ_{k=1}^{n} aik xk = bi,   i = 1, 2, · · · , n

or,
Ax = b
The matrix elements aik and the right-hand side elements bi are
given. We are looking for the unknowns xk .
Direct vs. iterative methods

Ax = b
A is a given, real, nonsingular n × n matrix and b is a given real vector.
Such problems are ubiquitous!
Two types of solution approaches:
1. Direct method: yield exact solution in absence of roundoff
error.
▶ Variations of Gaussian elimination.
2. Iterative method: iterate in a similar fashion to what we do
for nonlinear problems.
▶ Use only when direct methods are ineffective.
Existence and uniqueness of LU decomposition

The decomposition of the matrix A into a (unit) lower triangular


matrix L and an upper triangular matrix U, A = LU, is called LU
decomposition. The process that computes the LU decomposition
is called Gaussian elimination.
Theorem
The square matrix A ∈ Rn×n has a unique decomposition A = LU
if and only if the leading principal submatrices
Ak = A(1 : k, 1 : k) k = 1, 2, · · · , n − 1 are nonsingular.

Theorem
If A is nonsingular, then one can find a row permutation P such
that PA satisfies the conditions of the previous theorem, that is PA
= LU exists and is unique.
Gaussian elimination for Ax = b

x1 x2 x3 x4 1
a11 a12 a13 a14 b1
a21 a22 a23 a24 b2
a31 a32 a33 a34 b3
a41 a42 a43 a44 b4

1. Permute rows i = 1, ..., 4 (if necessary) such that a11 ̸= 0.


This element is called the pivot.
2. Subtract the multiple li1 = ai1 /a11 of row 1 from row i,
i = 2, · · · , 4.
3. Set a′ik = aik − li1 a1k, k, i = 2, · · · , 4.
4. Set bi′ = bi − li1 b1 , i = 2, · · · , 4
Gaussian elimination for Ax = b (cont.)

x1     x2     x3     x4     1
a11    a12    a13    a14    b1
0      a′22   a′23   a′24   b′2
0      a′32   a′33   a′34   b′3
0      a′42   a′43   a′44   b′4

1. Permute rows i = 2, ..., 4 (if necessary) such that a′22 ̸= 0.
   This is the next pivot.
2. Subtract the multiple l′i2 = a′i2 / a′22 of row 2 from row i, i = 3, 4.
3. Set a′′ik = a′ik − l′i2 a′2k, k, i = 3, 4.
4. Set b′′i = b′i − l′i2 b′2, i = 3, 4.
Gaussian elimination for Ax = b (cont.)

x1     x2     x3     x4      1
a11    a12    a13    a14     b1
0      a′22   a′23   a′24    b′2
0      0      a′′33  a′′34   b′′3
0      0      a′′43  a′′44   b′′4

1. Permute rows i = 3, 4 (if necessary) such that a′′33 ̸= 0.
   This is the next pivot.
2. Subtract the multiple l′′43 = a′′43 / a′′33 of row 3 from row 4.
3. Set a′′′44 = a′′44 − l′′43 a′′34.
4. Set b′′′4 = b′′4 − l′′43 b′′3.
Gaussian elimination for Ax = b (cont.)

x1   x2   x3   x4   1             x1   x2    x3     x4      1
u11  u12  u13  u14  c1            a11  a12   a13    a14     b1
0    u22  u23  u24  c2    ⇐⇒      0    a′22  a′23   a′24    b′2
0    0    u33  u34  c3            0    0     a′′33  a′′34   b′′3
0    0    0    u44  c4            0    0     0      a′′′44  b′′′4

Actual storage scheme

x1   x2    x3     x4      1
a11  a12   a13    a14     b1
l21  a′22  a′23   a′24    b′2
l31  l′32  a′′33  a′′34   b′′3
l41  l′42  l′′43  a′′′44  b′′′4

plus vector that stores info on permutations


Gaussian elimination for Ax = b (matrix notation)

We stick with the 4 × 4 example. Let


    
L1 = [1 0 0 0; l21 1 0 0; l31 0 1 0; l41 0 0 1],
L2 = [1 0 0 0; 0 1 0 0; 0 l′32 1 0; 0 l′42 0 1],
L3 = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 l′′43 1].

Then, we executed the following

U = L3−1 P3 L2−1 P2 L1−1 P1 A

which can be interpreted as

U = L3−1 (P3 L2−1 P3−1)(P3 P2 L1−1 P2−1 P3−1)(P3 P2 P1) A

with L3−1 (P3 L2−1 P3−1)(P3 P2 L1−1 P2−1 P3−1) = L−1 and (P3 P2 P1) = P, so that PA = LU.
Algorithm: Solving Ax = b by LU decomposition

1. Compute LU factorization with column pivoting

LU = PA.

Then LUx = PAx = Pb.


2. Solve
Lc = Pb
by forward substitution.
3. Solve
Ux = c
by backward substitution.
Note: L is a unit lower triangular matrix, i.e., it has ones on the
diagonal.
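A minimal sketch of these three steps using MATLAB's built-in lu (the 3 × 3 matrix and right-hand side are assumed example data):

A = [2 1 1; 4 3 3; 8 7 9];
b = [4; 10; 24];
[L, U, P] = lu(A);      % step 1: LU = PA with partial pivoting
c = L \ (P*b);          % step 2: forward substitution, Lc = Pb
x = U \ c;              % step 3: backward substitution, Ux = c
% x should agree with A\b up to roundoff.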
Computing the determinant of A

Since
det(A) = det(P−1) det(L) det(U)
and
det(L) = 1,
and
det(P−1) = det(P) = (−1)^V
where V is the number of row permutations in the Gaussian
elimination, we have

det(A) = (−1)^V ∏_{i=1}^{n} uii

The uii are the diagonal elements of U.


Algorithm: LU factorization

for k = 1:n-1                            % elimination step k
  for i = k+1:n
    l(i,k) = a(i,k)/a(k,k);              % multiplier; pivot a(k,k) assumed nonzero
    for j = k+1:n
      a(i,j) = a(i,j) - l(i,k)*a(k,j);   % update the remaining (n-k) x (n-k) block
    end
  end
end
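As a quick sanity check, the loop can be wrapped around a small example and the product of the computed factors compared with A (a minimal sketch; the 2 × 2 matrix is an assumed example, chosen so that no pivoting is needed):

A = [4 3; 6 3];
n = size(A,1);  a = A;  l = eye(n);
for k = 1:n-1
  for i = k+1:n
    l(i,k) = a(i,k)/a(k,k);
    for j = k+1:n
      a(i,j) = a(i,j) - l(i,k)*a(k,j);
    end
  end
end
U = triu(a);        % the strictly lower part of a was never zeroed, so take triu
norm(l*U - A)       % should be ~0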
Complexity: LU factorization

In step k of the LU factorization:


Divide ai,k for i = k + 1, ..., n by the pivot to get li,k
Update ai,j , for k + 1 ≤ i, j ≤ n by li,k ak,j .
The cost of the Gaussian elimination algorithm in terms of floating
point operations (flops) is thus

(n − 1) + (n − 2) + · · · + 1 + 2[(n − 1)² + (n − 2)² + · · · + 1²]

= (1/2) n(n − 1) + (1/3) n(n − 1)(2n − 1)

= (2/3) n³ − (1/2) n² − (1/6) n

= (2/3) n³ + O(n²)
Algorithm: forward and backward substitution

▶ The complexity of forward substitution is n² + O(n) flops.
▶ The complexity of backward substitution is n² + O(n) flops
(see the sketch below).
Exercise
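A minimal sketch of the two triangular solves (assumes b is a column vector, L is unit lower triangular and U is upper triangular with nonzero diagonal):

n = length(b);
% Forward substitution: solve L*c = b (unit diagonal, so no division needed).
c = b;
for i = 2:n
  c(i) = b(i) - L(i,1:i-1)*c(1:i-1);
end
% Backward substitution: solve U*x = c.
x = zeros(n,1);
for i = n:-1:1
  x(i) = (c(i) - U(i,i+1:n)*x(i+1:n)) / U(i,i);
end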
Complexity: complete system solve

The three steps of Gaussian elimination thus cost

(2/3) n³ + O(n²) + 2n² + O(n) = (2/3) n³ + O(n²) = O(n³)

▶ Important: LU factorization costs O(n3 ) while the actual


solve costs only O(n2 ).
▶ Can solve economically for several right-hand sides! Costs are
O(n³ + n²k) (not O(n³k)) for k right-hand sides, as in the sketch below.
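A sketch of reusing one factorization for k right-hand sides (the sizes and random test data are assumptions):

n = 500;  k = 10;
A = rand(n) + n*eye(n);     % assumed well-conditioned test matrix
B = rand(n, k);             % k right-hand sides
[L, U, P] = lu(A);          % O(n^3) work, done once
X = U \ (L \ (P*B));        % 2k triangular solves, O(n^2 k) work
norm(A*X - B, 'fro')        % should be small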
Computing the inverse of a matrix

▶ We can compute A−1 by decomposing LU = PA once, and


then solving Ax = b for n right hand sides [e1 , e2 , ..., en ], the
columns of the identity matrix I .
▶ Cost (approx.): (2/3) n³ + n(2n²) = (8/3) n³.
▶ However, typically we will try to avoid computing the inverse
A−1; the need to compute it explicitly is rare.
▶ General rule: Don’t compute the explicit matrix inverse.
Lecture 10
Pivoting of Matrices
Gaussian elimination with partial pivoting

To solve the linear system of equation A x = b we proceed as


follows:
1. Compute LU factorization with partial pivoting

LU = PA

Then LUx = PAx = Pb.


2. Solve Lc=Pb by forward substitution
3. Solve Ux=c by backward substitution
The cost is (2/3) n³ + O(n²) for the factorization and 2n² + O(n) for
the forward and backward substitutions, in total

(2/3) n³ + O(n²) FLOPs
Need for pivoting

    
Ax = b  ⇐⇒  [0 1; 1 1] [x1; x2] = [4; 7]

x1 x2 1
0 1 4
1 1 7

Pivot a11 = 0. We exchange rows 1 and 2.

x1 x2 1
1 1 7
0 1 4
Need for pivoting (cont.)

Let’s consider the numerical example (5 decimal digits)

x1 x2 1
0.00035 1.2654 3.5267
1.2547 1.3182 6.8541

x1 x2 1
→ 0.00035 1.2654 3.5267 with
0 -4535.0 -12636

l21 = 1.2547/0.00035 ≈ 3584.9

u22 = 1.3182 − 3584.9 × 1.2654 ≈ 1.3182 − 4536.3 ≈ −4535.0


c2 = 6.8541 − 3584.9 × 3.5267 ≈ 6.8541 − 12643 ≈ −12636.
Need for pivoting (cont.)

Backsubstitution gives

x2 = −12636/(−4535.0) ≈ 2.7863

x1 = (3.5267 − 1.2654 × 2.7863)/0.00035 ≈ 2.5714


The exact solution (up to 5 decimal digits) is

x1 = 2.5354, x2 = 2.7863

We have cancellation in the computation of x1 ! The very small


pivot in the first elimination step induced large numbers u22 and
c2 . Information, e.g. a22 , got lost.
GE with partial pivoting (GEPP)

The most common strategy for pivoting is to search for the


maximal element in modulus: The index p of the pivot in the k-th
step of GE is determined by

|apk^(k−1)| = max_{i≥k} |aik^(k−1)|

If p > k then rows p and k are exchanged.

This strategy implies that |lik| ≤ 1.
GE with partial pivoting (GEPP) (cont.)

Let’s consider the same example again, now with the rows exchanged (partial pivoting):


x1 x2 1
1.2547 1.3182 6.8541
0.00035 1.2654 3.5267

x1 x2 1
→ 1.2547 1.3182 6.8541 with
0 1.2650 3.5248

l21 = 0.00027895

x2 = 2.7864
x1 = (6.8541−1.3182×2.7864)/1.2547 ≈ 3.1811/1.2547 ≈ 2.5353
There is a deviation from the exact solution in the last digit only.
GEPP stability

Question: Does the incorporation of a partial pivoting procedure


guarantee stability of the Gaussian elimination algorithm, in the
sense that roundoff errors do not get amplified by factors that
grow unboundedly as the matrix size n increases?
We have seen that |lik| ≤ 1. What about |uik|?
We would like to be sure that the growth factor

gn(A) = max_{i,j,k} |aij^(k)|

stays moderate, i.e., that the elements in U do not grow too fast in the course of GE.


Unfortunately, it is not possible in general to guarantee stability of
GEPP.
Wilkinson’s counter example


1,
 if i = j or j = n,
n
An = (aij )i,j=1 with −1 if i > j,

0 otherwise.

1 0 0 0 0 1
 
 −1
 1 0 0 0 1 

 −1 −1 1 0 0 1 
A6 =  
 −1
 −1 −1 1 0 1 

 −1 −1 −1 −1 1 1 
−1 −1 −1 −1 −1 1
Partial pivoting does not trigger any row exchange, and the entries of the last column grow exponentially during the elimination.
Complete pivoting

In complete pivoting we look for (one of) the largest element in


modulus
|aqr^(k−1)| = max_{k≤i,j≤n} |aij^(k−1)|

at each stage of GE and interchange row q with row k and column r with
column k.
This strategy does guarantee stability. However, searching among
(n − k)² elements in each stage makes it significantly more expensive
than partial pivoting.
Instances in which (scaled) partial pivoting really fails appear to be
extremely rare in practice → GEPP is the commonly used
approach.
When pivoting is not needed

Definition
A matrix A ∈ Rn×n is diagonally dominant, if
|aii| ≥ Σ_{k≠i} |aik|,   i = 1, ..., n.

Theorem
If A is nonsingular and diagonally dominant then the LU
factorization can be computed without pivoting.
Proof

We show that after reduction of the first row, the reduced system
is again diagonally dominant. We have
aik^(1) = aik − ai1 a1k / a11,   i, k = 2, ..., n.

For the diagonal elements we get the estimate

|aii^(1)| = |aii − ai1 a1i / a11| ≥ |aii| − |ai1 a1i| / |a11|,   i = 2, ..., n.

We now want to show that

|aii^(1)| ≥ Σ_{k=2, k≠i}^{n} |aik^(1)|,   i = 2, ..., n.
Proof (cont.)
Now for the sum of the moduli of the off-diagonal elements of row
i (i = 2, . . . , n) we get
Σ_{k=2,k≠i}^{n} |aik^(1)| = Σ_{k=2,k≠i}^{n} |aik − ai1 a1k / a11|

  ≤ Σ_{k=2,k≠i}^{n} |aik| + |ai1 / a11| Σ_{k=2,k≠i}^{n} |a1k|

  = ( Σ_{k=1,k≠i}^{n} |aik| − |ai1| ) + |ai1 / a11| ( Σ_{k=2}^{n} |a1k| − |a1i| )

  ≤ ( |aii| − |ai1| ) + |ai1 / a11| ( |a11| − |a1i| )

  = |aii| − |ai1 a1i| / |a11| ≤ |aii^(1)|.
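A minimal sketch (an assumed helper, not from the lecture) for checking the diagonal dominance condition numerically in MATLAB:

function dd = isDiagDominant(A)
  % true if |a_ii| >= sum_{k~=i} |a_ik| for every row i
  d = abs(diag(A));
  offdiag = sum(abs(A), 2) - d;
  dd = all(d >= offdiag);
end

% e.g. isDiagDominant([4 1 2; 1 5 3; 0 2 3]) returns true.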
Lecture 11
Cholesky Decomposition, Linear Algebra Libraries and Matlab
routine
Symmetric, positive definite systems

Definition
A symmetric matrix A is positive definite, if the corresponding
quadratic form is positive definite, i.e., if

Q(x) := x⊤ Ax > 0, for all x ̸= 0.


Symmetric, positive definite systems

Theorem
If a symmetric matrix A ∈ Rn×n is positive definite, then the
following conditions hold.
1. aii > 0 for i = 1, ..., n.
2. aik² < aii akk for i ̸= k, i, k = 1, ..., n.
3. There is a k with max_{i,j} |aij| = akk.

Proof.
1. aii = ei⊤ A ei > 0.
2. (ξei + ek)⊤ A (ξei + ek) = aii ξ² + 2 aik ξ + akk > 0. This quadratic in ξ
has no real zero, therefore its discriminant 4 aik² − 4 aii akk must be negative.
3. Clear.
Gaussian elimination works w/o pivoting. Since a11 > 0 we have

A = [a11 a1⊤; a1 A1]
  = [1 0⊤; a1/a11 I] · [a11 a1⊤; 0 A1 − a1 a1⊤/a11]
  = [1 0⊤; a1/a11 I] · [a11 0⊤; 0 A^(1)] · [1 a1⊤/a11; 0 I]

with A^(1) = A1 − a1 a1⊤/a11. But for any x ∈ R^(n−1)\{0} we have

x⊤ A^(1) x = [0; x]⊤ [a11 0⊤; 0 A^(1)] [0; x]
           = [0; x]⊤ [1 0⊤; −a1/a11 I] [a11 a1⊤; a1 A1] [1 −a1⊤/a11; 0 I] [0; x]
           = y⊤ A y > 0.

Hence A^(1) is again symmetric positive definite and the elimination can be continued in the same way.
Cholesky decomposition

From the proof we see that

A = LU = LDL⊤ with U = DL⊤.

Note that L is a unit lower triangular matrix and D is a diagonal
matrix with positive diagonal entries.
We can easily compute the positive definite diagonal matrix D^(1/2)
with D^(1/2) D^(1/2) = D.

Definition
Let L1 = L D^(1/2). Then

A = L1 L1⊤

is called the Cholesky decomposition of A.
Cholesky decomposition: Solving systems and complexity

Linear system solving with the Cholesky factorization:

A = LL⊤ (Cholesky decomposition)

Lc = b (Forward substitution)
L⊤ x = c (Backward substitution)
The complexity is half that of the LU factorization:

(1/3) n³ + O(n²)
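A minimal sketch of solving with MATLAB's built-in chol (the SPD matrix and right-hand side are assumed example data):

A = [4 1 1; 1 3 0; 1 0 2];    % symmetric positive definite
b = [6; 4; 3];
R = chol(A);                  % upper triangular factor, R'*R = A
c = R' \ b;                   % forward substitution
x = R  \ c;                   % backward substitution
norm(A*x - b)                 % should be ~0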
Efficient implementation

▶ In general, the performance of an algorithm depends not only


on the number of arithmetic operations that are carried out
but also on the frequency of memory accesses.
▶ A first step towards efficiently accessing memory is
vectorization. Matlab supports vector operations in a
convenient way.
▶ However, good performance is only achievable with blocked
algorithms. Only operating on blocks of data raises the ratio of
floating point operations (flops) to memory accesses (in bytes)
beyond O(1).
LAPACK and the BLAS

▶ LAPACK (Linear Algebra PACKage) is a library of Fortran 77


subroutines for solving the most commonly occurring problems
in numerical linear algebra. It has been designed to be
efficient on a wide range of high-performance computers with
a hierarchy of memory levels.
▶ LAPACK supersedes LINPACK and EISPACK.
▶ LAPACK is written in a way that as much as possible of the
computation is performed by calls to the Basic Linear Algebra
Subprograms (BLAS). While LINPACK and EISPACK relied
on the vector operations in BLAS-1, LAPACK calls BLAS-2
(matrix-vector operations) and BLAS-3 (matrix-matrix
operations) to exploit the fast memories (caches) of today's
high-performance computers.
▶ Most of the work has been done at universities (U of TN at
Knoxville, U of CA at Berkeley) and at NAG, Oxford.
▶ The software is freely available at
http://www.netlib.org/lapack
▶ The LINPACK benchmark gives rise to the TOP500 list, the
list of the 500 most powerful computers:
http://www.top500.org
▶ IIT Delhi: HP Apollo 6000 XL230/250, Xeon E5-2680v3 12C
2.5GHz, InfiniBand FDR, NVIDIA Tesla K40m (Hewlett-Packard).
Rank: 217, 4th in India.
BLAS

See http://www.netlib.org/blas
BLAS-1: vector operations (real, double, complex variants)
▶ Swap two vectors, copy two vectors, scale a vector
▶ AXPY operation: y = αx + y
▶ 2-norm, 1-norm, dot product
▶ IAMAX, the index of the largest element in modulus:
first i such that |xi| ≥ |xk| for all k.
▶ O(1) flops per Byte memory access.
BLAS (cont.)

BLAS-2: matrix-vector operations


▶ matrix-vector multiplication (variants for various matrix types:
general, symmetric, banded, triangular)
▶ triangular solves (various variants)
▶ O(1) flops per Byte memory access.
BLAS (cont.)

BLAS-3: matrix-matrix operations


▶ matrix-matrix multiplication
▶ triangular solves for multiple right-hand sides (various
variants)
▶ O(b) flops per Byte memory access where b is block size.
Matlab

▶ Matlab has built-in Gaussian Elimination to solve Ax = b.


Use x=A\b.
▶ Can compute decompositions with lu and chol.
▶ Do not implement Gaussian elimination yourself!
▶ Use numerical libraries (LAPACK), NAG, MKL, or MATLAB!
▶ MATLAB operator: \
▶ In fact: all implementations are based on LAPACK.
Lecture 12
Error Analysis, Condition Number
Error estimation

Two questions regarding the accuracy of x̃ as an approximation to


the solution x of the linear system of equations Ax = b.
▶ First we investigate what we can derive from the size of the
residual r̃ = b − Ax̃.
Note that r = b − Ax = 0.
▶ Then, what is the effect of errors in the initial data (b, A) on
the solution x? That is, how sensitive is the solution to
perturbations in the initial data?
Error estimation (cont.)

Let

A = [1.2969 0.8648; 0.2161 0.1441],   b = [0.8642; 0.1440].

Suppose somebody came up with the approximate solution

x̃ = [0.9911; −0.4870].

Then, r̃ = b − Ax̃ = [−10^(−8); 10^(−8)]  =⇒  ∥r̃∥∞ = 10^(−8).

Since x = [2; −2], the error satisfies ∥z̃∥∞ = ∥x̃ − x∥∞ = 1.513, which is ≈ 10^8
times larger than the residual.
Error estimation (cont.)

So, how does the residual r̃ = b − Ax̃ affect the error z̃ = x̃ − x?

We assume that ∥A∥ is any matrix norm and ∥x∥ a compatible
vector norm, i.e., we have ∥Ax∥ ≤ ∥A∥ ∥x∥ for all x.
We have
Az̃ = A(x̃ − x) = Ax̃ − b = −r̃.
Therefore

∥b∥ = ∥Ax∥ ≤ ∥A∥ ∥x∥,   ∥z̃∥ = ∥−A−1 r̃∥ ≤ ∥A−1∥ ∥r̃∥.

We get an estimate for the relative error of x:

∥z̃∥/∥x∥ = ∥x̃ − x∥/∥x∥ ≤ ∥A∥ ∥A−1∥ ∥r̃∥/∥b∥ = κ(A) ∥r̃∥/∥b∥
Error estimation (cont.)

Definition
The quantity
κ(A) = ∥A∥ ∥A−1∥
is called the condition number of A.

The condition number is at least 1:

1 = ∥I∥ = ∥AA−1∥ ≤ ∥A∥ ∥A−1∥ = κ(A).

If A is nonsingular and E is the matrix with smallest norm such
that A + E is singular, then

∥E∥2 / ∥A∥2 = 1 / κ2(A)
Error estimation (cont.)

Previous example continued.


 
A−1 = 10^8 · [0.1441 −0.8648; −0.2161 1.2969]

This yields

∥A−1∥∞ = 1.513×10^8  =⇒  κ∞(A) = 2.1617 × 1.513×10^8 ≈ 3.27×10^8.

The numbers 1.513/2 < 3.27/0.8642 confirm the estimate

∥z̃∥∞/∥x∥∞ ≤ κ∞(A) ∥r̃∥∞/∥b∥∞
Error estimation (cont.)

We now make A singular by replacing a22 by a12 a21 / a11.

Then A + E is singular with

E = [0 0; 0 (0.8648 · 0.2161)/1.2969 − 0.1441] ≈ [0 0; 0 −7.7×10^(−9)].

So, indeed,
∥E∥ / ∥A∥ ≈ 1 / κ(A).
(This estimate holds in the l2 and l∞ norms.)
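A quick numerical check of these numbers (a sketch; cond and norm are MATLAB built-ins):

A  = [1.2969 0.8648; 0.2161 0.1441];
b  = [0.8642; 0.1440];
xt = [0.9911; -0.4870];                        % the approximate solution above
r  = b - A*xt;                                 % residual, about 1e-8 in size
kappa  = cond(A, inf);                         % about 3.27e8
bound  = kappa * norm(r, inf) / norm(b, inf)   % upper bound on the relative error
relerr = norm(xt - [2; -2], inf) / 2           % actual relative error, ~0.76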
Sensitivity on matrix coefficients

▶ Input data A and b are often perturbed due, e.g., to rounding.


▶ How big is the change δx if the matrix A is altered by δA and
b is perturbed by δb?
▶ The LU decomposition can be considered as the exact
decomposition of a perturbed matrix Ã = A + δA.
▶ Forward/backward substitution add (rounding) errors.
▶ Altogether: the computed solution x̃ is the solution of a
perturbed problem

(A + δA)x̃ = b + δb,   x̃ = x + δx.

▶ How do these perturbations affect the solution? This is called
backward error analysis.
Sensitivity on matrix coefficients (cont.)

(A + δA)(x + δx) = (b + δb)


Multiplying out:

Ax + Aδx + δAx + δAδx = b + δb

or, with Ax=b,


Aδx + δAx + δAδx = δb
or, equivalently,
Aδx = δb − δAx − δAδx
Sensitivity on matrix coefficients (cont.)

Thus,
δx = A−1 (δb − δAx − δAδx)
For compatible norms we have

∥δx∥ = ∥A−1 (δb − δAx − δAδx)∥
     ≤ ∥A−1∥ (∥δb∥ + ∥δA∥ ∥x∥ + ∥δA∥ ∥δx∥)

From this we get

(1 − ∥A−1∥ ∥δA∥) ∥δx∥ ≤ ∥A−1∥ (∥δb∥ + ∥δA∥ ∥x∥)


Sensitivity on matrix coefficients (cont.)

Now assume that the perturbation in A is small: ∥A−1∥ ∥δA∥ < 1.
Then,

∥δx∥ ≤ ∥A−1∥ / (1 − ∥A−1∥ ∥δA∥) · (∥δb∥ + ∥δA∥ ∥x∥)

Since we are interested in an estimate of the relative error we use

∥b∥ = ∥Ax∥ ≤ ∥A∥ ∥x∥  =⇒  ∥x∥ ≥ ∥b∥/∥A∥  =⇒  1/∥x∥ ≤ ∥A∥/∥b∥.

Therefore, we have

∥δx∥/∥x∥ ≤ ∥A−1∥ / (1 − ∥A−1∥ ∥δA∥) · (∥δb∥/∥x∥ + ∥δA∥)
Sensitivity on matrix coefficients (cont.)

With the condition number κ(A) = ∥A∥ ∥A−1∥ we get

∥δx∥/∥x∥ ≤ κ(A) / (1 − κ(A) ∥δA∥/∥A∥) · (∥δb∥/∥b∥ + ∥δA∥/∥A∥)

The condition number κ(A) is the decisive quantity that describes


the sensitivity of the solution x on both relative changes in A as
well as in b.
▶ If κ(A) ≫ 1 small perturbations in A can lead to large relative
errors in the solution of the linear system of equations.
▶ If κ(A) ≫ 1 even a stable algorithm can produce solutions with
large relative error!
▶ A stable algorithm produces (acceptably) small errors if the
problem is well-conditioned (i.e. κ(A) ‘not too large’).
Rule of thumb

Let’s assume we compute with d decimal digits such that the


initial errors in the input data are about

∥δb∥/∥b∥ ≤ 5·10^(−d),   ∥δA∥/∥A∥ ≤ 5·10^(−d).

Let’s assume that the condition number of A is 10^α.

If 5·10^(α−d) ≪ 1 then we get

∥δx∥/∥x∥ ≤ 10^(α−d+1)
Rule of thumb (cont.)

Rule of thumb: If a linear system Ax = b is solved with d-digit
floating point numbers and κ(A) ≈ 10^α then, due to unavoidable
errors in the initial data, we have to expect relative errors in x of
the order of 10^(α−d+1).
Note that this statement is about the largest components of x.
The small components can have larger relative errors!
Note on δA and δb

It can be shown that Gaussian elimination with pivoting yields a


perturbation bounded by

∥δA∥∞ ≤ ηϕ(n)gn (A)

where ϕ(n) is a low order polynomial in n (cubic at most) and η is


the rounding unit. The bounds on the forward and backward
substitutions and on δb are significantly smaller.
Thus, as long as the pivoting keeps gn (A) growing only moderately
and n is not too large then the overall perturbations δA and δb are
not larger than a few orders of magnitude times η.
Scaling

▶ It would be nice to have well-conditioned problems.


▶ Can we easily improve the condition of a linear system before
starting the LU factorization?
▶ Scaling is a possibility that often works.
Let D1 and D2 be diagonal matrices. The solution of A x = b can
be found by solving the scaled system

D1−1 AD2 y = D1−1 b

by Gaussian elimination and then setting

x = D2 y.

Scaling A, b, and y requires only O(n²) flops. Often D2 = I, i.e.,
only row scaling is used, unless the matrix is symmetric.
Scaling (cont.)

Example

[10 100000; 1 1] [x1; x2] = [100000; 2]

The row-equivalent scaled problem is

[0.0001 1; 1 1] [x1; x2] = [1; 2]

The solutions with 3 decimal digits are x̃ = (0, 1.00)T for the
unscaled system and x̃ = (1.00, 1.00)T for the scaled system.
The correct solution is x = (1.0001, 0.9999)T .
See Example 5.11 in Ascher and Greif
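A sketch of this row scaling in MATLAB (D1 collects the row sums, as in the theorem on the next slide; the data are the example above):

A  = [10 100000; 1 1];
b  = [100000; 2];
D1 = diag(sum(abs(A), 2));        % row sums of |A|
y  = (D1 \ A) \ (D1 \ b);         % solve the row-scaled system
x  = y;                           % here D2 = I, so x = y
[cond(A, inf), cond(D1 \ A, inf)] % the scaled matrix is much better conditioned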
Theorem on scaling

Theorem
Let A ∈ Rn×n be nonsingular. Let the diagonal matrix Dz be
defined by
 −1
n
X
dii =  |aij |
j=1

Then
κ∞ (Dz A) ≤ κ∞ (DA)
for all nonsingular diagonal D.
See: Dahmen & Reusken: Numerik für Ingenieure und
Naturwissenschaftler. 2nd ed., Springer, 2008.
Remark on determinants

Although det(A) = 0 for singular matrices, small determinants do


not indicate bad condition!
Example (Small determinant)
A = diag(0.1, 0.1, ..., 0.1) ∈ Rn×n has det(A) = 10^(−n) and κ(A) = 1.

Example (Unit determinant)
The condition number of the upper triangular matrix A below grows
exponentially with n, but det(A) = 1.

A = [1 −1 ... −1;
     0  1 ... −1;
        ...
     0  0 ...  1]
Condition estimation

▶ After having solved Ax = b via PA = LU we would like to
ascertain the number of correct digits in the computed x̃.
▶ We need an estimate for κ∞(A) = ∥A∥∞ ∥A−1∥∞.
▶ Known: ∥A∥∞ = max_{1≤i≤n} Σ_{j=1}^{n} |aij|.
▶ How to get at ∥A−1∥∞?
▶ Idea: Ay = d =⇒ ∥A−1∥∞ ≥ ∥y∥∞ / ∥d∥∞.
▶ Find d with ∥d∥∞ = 1 such that ∥y∥∞ is as big as possible.
▶ Observation: Estimation of ∥U−1∥ gives a good
approximation of ∥A−1∥. Solve Uy = d with di = ±1.
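In practice one rarely forms A−1 explicitly; MATLAB's built-in estimator can be used instead (a sketch; note that condest estimates the 1-norm condition number, not the ∞-norm one):

A = [1.2969 0.8648; 0.2161 0.1441];
kappa_est  = condest(A);     % cheap estimate based on the LU factors
kappa_true = cond(A, 1);     % exact 1-norm condition number (expensive for large n)
[kappa_est, kappa_true]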
