Abstract. The aim of this paper is to study some iterative methods for solving partial differential equations, like Jacobi, Gauss-Seidel, SOR and multigrid, comparing them from the point of view of their computational complexity.
1. INTRODUCTION
Various problems arising from Physics, Fluid Dynamics, Chemistry, Biology, etc. can be modeled mathematically by means of partial differential equations. It is known that, sometimes, the exact solution (or solutions) is difficult to determine, so one has to compute an approximation of it, generated by means of the approximate problem attached to the continuous one.
In order to solve the approximate problem, obtained by discretizing the initial,
continuous problem, several numerical methods can be used: direct (e.g. Gaussian
elimination, factorization techniques, etc.) or iterative (e.g. Jacobi, Gauss-Seidel,
SOR, multigrid, etc.).
Sometimes the direct solvers are preferred, but if the problem is too large, the iterative ones are more appropriate. Iterative methods also seem more attractive from the computational point of view when more than one processor is used, that is, from the parallel computing point of view (see [6], [7]).
The aim of this paper is to review some parallel approaches to the Jacobi, Gauss-Seidel, SOR and multigrid methods, emphasizing the last one and trying to reduce its computational complexity by using a different type of communication among the processors. In order to present these ideas, Poisson's equation will be used as the model problem:
$$\frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} = f(x,y) \quad \text{in } \Omega, \qquad (1)$$
with Ω, let us say, the unit square 0 < x, y < 1, and with some boundary conditions; let us consider the simplest case

$$u(x,y) = 0 \quad \text{on } \partial\Omega. \qquad (2)$$

Discretizing (1)-(2) by the standard five-point scheme on a uniform grid with step h = 1/(n+1), denoting by u_{i,j} the approximation of u(ih, jh) and by b_{i,j} = −h² f(ih, jh), one obtains the equations

$$4u_{i,j} - u_{i-1,j} - u_{i+1,j} - u_{i,j-1} - u_{i,j+1} = b_{i,j} \qquad (3)$$

for all 1 ≤ i, j ≤ n (the values with an index 0 or n+1 are the zero boundary values).
In matrix form, (3) is a linear system of N = n² equations with N unknowns; let us write it

$$A \cdot u = b. \qquad (4)$$
!" #%$ $'&)(%*,+" - . / * 0 " 1 * +2+3 4657" 8 9:)" 0 (<;
! "$# % & ' # (*)+& (# &,) - .$"/0"$) 12"$32) "4% 5 +& (
8
Using the linearized order of unknowns on a 2D grid, as in fig.2, the system will
be:
$$\begin{pmatrix} T & -I & & \\ -I & T & -I & \\ & -I & T & -I \\ & & -I & T \end{pmatrix}
\begin{pmatrix} u_{11} \\ u_{21} \\ u_{31} \\ u_{41} \\ u_{12} \\ \vdots \\ u_{34} \\ u_{44} \end{pmatrix} =
\begin{pmatrix} b_{11} \\ b_{21} \\ b_{31} \\ b_{41} \\ b_{12} \\ \vdots \\ b_{34} \\ b_{44} \end{pmatrix},
\qquad
T = \begin{pmatrix} 4 & -1 & & \\ -1 & 4 & -1 & \\ & -1 & 4 & -1 \\ & & -1 & 4 \end{pmatrix},$$

here written for n = 4 (N = 16 unknowns), where I is the 4 × 4 identity matrix and the unknowns u_{ij} and right-hand side entries b_{ij} are listed column by column.
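For concreteness, the block structure above can be generated numerically. The following short Python sketch (my own illustration, not code from the paper; it assumes SciPy is available) assembles the matrix of the five-point scheme via Kronecker products and, for n = 4, reproduces the block tridiagonal matrix written above.

```python
import scipy.sparse as sp

def poisson_matrix(n):
    """Matrix of the 5-point discrete Laplacian on an n x n grid of unknowns
    (zero Dirichlet boundary), in the linearized ordering of the unknowns."""
    I = sp.identity(n, format="csr")
    # 1D second-difference matrix: 2 on the diagonal, -1 on the off-diagonals
    T1 = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
    # 2D operator: kron(T1, I) + kron(I, T1) gives 4 on the diagonal
    # and -1 for each of the four grid neighbours
    return sp.kron(T1, I) + sp.kron(I, T1)

A = poisson_matrix(4)
print(A.toarray())   # the 16 x 16 block tridiagonal matrix shown above
```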
Starting from an initial guess u^(0), the Jacobi method updates every unknown using only values from the previous iterate,

$$u_{i,j}^{(k+1)} = \frac{1}{4}\left( u_{i-1,j}^{(k)} + u_{i+1,j}^{(k)} + u_{i,j-1}^{(k)} + u_{i,j+1}^{(k)} + b_{i,j} \right), \qquad (5)$$

the Gauss-Seidel method uses the already updated values as soon as they are available,

$$u_{i,j}^{(k+1)} = \frac{1}{4}\left( u_{i-1,j}^{(k+1)} + u_{i,j-1}^{(k+1)} + u_{i+1,j}^{(k)} + u_{i,j+1}^{(k)} + b_{i,j} \right), \qquad (6)$$

and the SOR (successive over-relaxation) method, with relaxation parameter ω, takes a weighted combination of the old value and the Gauss-Seidel update,

$$u_{i,j}^{(k+1)} = (1-\omega)\, u_{i,j}^{(k)} + \frac{\omega}{4}\left( u_{i-1,j}^{(k+1)} + u_{i,j-1}^{(k+1)} + u_{i+1,j}^{(k)} + u_{i,j+1}^{(k)} + b_{i,j} \right). \qquad (7)$$
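As an informal illustration of formulas (5)-(7), here is a minimal Python sketch of one sweep of each method over the interior grid points (this is my own example code, not taken from the paper; the array u carries an extra layer of boundary zeros, b holds the values b_{i,j}, and SOR with ω = 1 reduces to Gauss-Seidel).

```python
import numpy as np

def jacobi_sweep(u, b):
    """One Jacobi sweep, formula (5); u has a border of boundary zeros."""
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                              u[1:-1, :-2] + u[1:-1, 2:] + b[1:-1, 1:-1])
    return new

def sor_sweep(u, b, omega=1.0):
    """One SOR sweep, formula (7); omega = 1 gives Gauss-Seidel, formula (6)."""
    n = u.shape[0] - 2
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            gs = 0.25 * (u[i - 1, j] + u[i + 1, j] + u[i, j - 1] + u[i, j + 1] + b[i, j])
            u[i, j] = (1.0 - omega) * u[i, j] + omega * gs
    return u

# toy usage: a 4 x 4 grid of unknowns with an arbitrary right-hand side
n = 4
u, b = np.zeros((n + 2, n + 2)), np.ones((n + 2, n + 2))
for _ in range(100):
    u = sor_sweep(u, b, omega=1.5)
```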
Denoting by f the time per flop, α the time for a message startup and β the time per word in a message, and recalling that the Jacobi method needs O(N) iterations, each of them costing O(N/p) flops, O(1) messages and O(n/√p) words per processor (on a √p × √p mesh of processors), the total parallel time of the Jacobi method is

Time = O(N) · ( O(N/p) f + O(1) α + O(n/√p) β ) = O(N²/p) f + O(N) α + O(N^{3/2}/√p) β.
Remark. The Gauss-Seidel method converges twice as fast as Jacobi, but requires twice as many parallel steps when the checkerboard ordering of the nodes is used (see [4], [2], [3]), so in practice it has about the same run time. This is the reason why it does not appear in the table.
In [3] we showed that, by changing the connectivity among the processors and using a ring communication, the cost per step of the SOR method can be lowered, and with it the overall complexity of the parallel SOR method.
It is known (see e.g. [1], [5]) that multigrid is a divide-and-conquer algorithm for solving discrete problems; it is widely used for partial differential equations as well. It is divide-and-conquer in two related senses. First, it obtains an initial solution for an (n × n) grid by using an (n/2 × n/2) grid as an approximation, taking every other point from the (n × n) grid. The coarser (n/2 × n/2) grid is in turn approximated by an (n/4 × n/4) grid, and so on recursively. The second sense is in the frequency domain: the error of the approximation is thought of as a sum of components of different frequencies.
Then the work we do on a particular grid will eliminate the error in half of the
frequency components not eliminated on other grids.
Without loss of generality, one considers a (2^m − 1) × (2^m − 1) grid of unknowns; adding the nodes on the boundary, which have the given value 0, one gets a (2^m + 1) × (2^m + 1) grid on which the algorithm will operate. Let us denote n = 2^m + 1. Also, let P(i) denote the problem of solving Poisson's equation on a (2^i + 1) × (2^i + 1) grid, with (2^i − 1) × (2^i − 1) unknowns. The problem is specified by the grid size i, the coefficient matrix A_i and the right-hand side b_i. A sequence of related problems P(m), P(m−1), ..., P(1) on coarser and coarser grids is generated, where the solution to P(i−1) is a good approximation to the solution of P(i). Some grids for n = 9 are shown in fig. 3.
If we denote by b_i the right-hand side of the linear system P(i) and by x_i an approximate solution of P(i) (thus x_i and b_i are (2^i − 1) × (2^i − 1) arrays of values, one for each grid point), the basic multigrid V-cycle (MGV) is:
function MGV(b_i, x_i)   {returns an improved solution x_i to P(i)}
   if i = 1   {only one unknown}
      compute the exact solution x_1 of P(1)
      return (b_1, x_1)
   else
      x_i = S_i(b_i, x_i)   {improve the solution}
      (b_i, d_i) = I_{i-1}(MGV(R_i(b_i, x_i)))   {solve recursively on the next coarser grid}
      x_i = x_i - d_i   {correct the fine grid solution}
      x_i = S_i(b_i, x_i)   {improve the solution some more}
      return (b_i, x_i)
   endif
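To make the recursion concrete, the following self-contained Python sketch implements a V-cycle for the 1D analogue of the model problem (this is my own illustrative code, not the paper's: it takes weighted Jacobi as the smoother S_i, full weighting as the restriction R_i and linear interpolation as the prolongation I_{i−1}, and works on vectors of 2^i − 1 unknowns with the scaled stencil 2u_j − u_{j−1} − u_{j+1} = b_j on every level).

```python
import numpy as np

def smooth(b, x, omega=2.0 / 3.0, sweeps=2):
    """Weighted Jacobi sweeps for 2*x[j] - x[j-1] - x[j+1] = b[j] (zero boundary)."""
    for _ in range(sweeps):
        xp = np.pad(x, 1)                       # add the zero boundary values
        x = (1 - omega) * x + omega * 0.5 * (xp[:-2] + xp[2:] + b)
    return x

def residual(b, x):
    xp = np.pad(x, 1)
    return b - (2 * x - xp[:-2] - xp[2:])

def restrict(r):
    """Full weighting: each coarse value from three neighbouring fine values."""
    return 0.25 * (r[:-2:2] + 2 * r[1:-1:2] + r[2::2])

def interpolate(e):
    """Linear interpolation from the coarse grid to the next finer grid."""
    fine = np.zeros(2 * len(e) + 1)
    fine[1::2] = e
    fine[2:-1:2] = 0.5 * (e[:-1] + e[1:])
    fine[0], fine[-1] = 0.5 * e[0], 0.5 * e[-1]
    return fine

def mgv(b, x):
    """Multigrid V-cycle, mirroring the MGV function of the text."""
    if len(x) == 1:                             # only one unknown: solve exactly
        return b / 2.0
    x = smooth(b, x)                            # improve the solution (S_i)
    # solve recursively on the coarser grid; the factor 4 accounts for the
    # change of the mesh size h in the scaled stencil when coarsening
    d = interpolate(mgv(4 * restrict(residual(b, x)), np.zeros((len(x) - 1) // 2)))
    x = x + d                                   # correct the fine grid solution
    return smooth(b, x)                         # improve the solution some more

# usage sketch: -u'' = 1 on (0, 1) with u(0) = u(1) = 0, m = 5 levels
m = 5
n_unknowns = 2 ** m - 1
h = 1.0 / (n_unknowns + 1)
b = (h ** 2) * np.ones(n_unknowns)
x = np.zeros(n_unknowns)
for _ in range(10):
    x = mgv(b, x)
```

Each cycle visits every level exactly once, which is precisely the pattern whose cost is analyzed in the remainder of this section.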
Fig. 3. The sequence of grids used by the multigrid algorithm for n = 9. (figure)
Remark 1. In the MGV algorithm above, S_i is the smoothing operator (a few relaxation sweeps, which damp the high frequency components of the error); I_{i−1} is the interpolation (prolongation) operator, which maps an approximate solution of P(i−1) onto the next finer grid; R_i is the restriction operator, which maps the approximate solution for P(i) to the next coarser grid; d_i is the defect, i.e. it measures how much the solution x_i fails to verify the system.
Remark 2. The algorithm is called a V-cycle because, if we draw it schematically in time and grid number i, with a point for each recursive call to MGV, it looks like fig. 4, starting with a call to MGV(P(5), x_5) in the upper left corner. This calls MGV on grid 4, then 3, and so on down to the coarsest grid 1, and then back up to grid 5 again.
"!# %$
If we perform the algorithm on a serial computer (so with one processor), the complexity of MGV can be determined, in terms of big-Oh, in the following way. We observe that the work at each point of the V-cycle is proportional to the number of unknowns on that grid, since the value at each grid point is just averaged with its nearest neighbors. Thus, each point at grid level i on the V in the V-cycle will cost (2^i − 1)², which is of order O(4^i) operations. If the finest grid is at level m, the total serial work will be given by the geometric sum

$$\sum_{i=1}^{m} O(4^i) = O(4^m) = O(n^2) = O(N).$$

To perform the algorithm in parallel, the finest grid is distributed on a √p × √p mesh of processors, each processor owning an ((n−1)/√p) × ((n−1)/√p) subgrid of unknowns. Going to coarser and coarser grids, one reaches a level at which each processor has only one grid point. After this, only some processors participate in the computation.
!"$##&%# #')( * +-, +-.%$.)/ 0 ( *
Making a big-Oh analysis of the computation costs, for simplicity let us consider p = 4^k = 2^{2k}, so that each processor owns a (2^{m−k} × 2^{m−k}) subgrid. Consider a V-cycle starting at level m. Denoting by f the time per flop, α the time for a message startup and β the time per word to send a message, the following complexity analysis can be made (see [4]).
At levels k to m.
The time at level i is:
O(4^{i−k}) f (number of flops, proportional to the number of grid points per processor)
+ O(1) α (send a constant number of messages to neighbors)
+ O(2^{i−k}) β (number of words sent).
Summing all these terms for i = k to m yields:
Time at levels k to m:
O(4^{m−k}) f + O(m − k) α + O(2^{m−k}) β =
= O(n²/p) f + O(log(n/√p)) α + O(n/√p) β.
At levels k − 1 to 1.
Because at levels k − 1 through 1 fewer than all processors own an active grid point, some processors remain idle, and the time complexity differs. The time at level i is:
O(1) f (number of flops, proportional to the number of grid points per processor)
+ O(1) α (send a constant number of messages to neighbors)
+ O(1) β (number of words sent).
Summing all these terms for i = 1 to k − 1 yields:
Time at levels 1 to k − 1:
O(k − 1) f + O(k − 1) α + O(k − 1) β =
= O(log p) f + O(log p) α + O(log p) β.
So the total time for a V-cycle starting at the finest level is:
Time = O(n²/p + log p) f + O(log n) α + O(n/√p + log p) β.
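As a rough numerical illustration (my own sketch, not data from the paper), one can plug assumed machine parameters into the leading-order terms of the Jacobi bound obtained earlier and of the V-cycle bound above; with all constants set to 1, only the growth rates are meaningful.

```python
import math

def jacobi_time(n, p, f=1.0, alpha=1.0, beta=1.0):
    """Model O(N^2/p) f + O(N) alpha + O(N^(3/2)/sqrt(p)) beta, with N = n*n."""
    N = n * n
    return (N ** 2 / p) * f + N * alpha + (N ** 1.5 / math.sqrt(p)) * beta

def vcycle_time(n, p, f=1.0, alpha=1.0, beta=1.0):
    """Model O(n^2/p + log p) f + O(log n) alpha + O(n/sqrt(p) + log p) beta."""
    return (n * n / p + math.log2(p)) * f + math.log2(n) * alpha \
           + (n / math.sqrt(p) + math.log2(p)) * beta

for p in (4, 64, 1024):
    print(p, jacobi_time(1025, p), vcycle_time(1025, p))
```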
Remark. Denoting by N = n² the number of unknowns, we can state that for p ≪ N the speedup over the serial multigrid is nearly perfect, but if we have enough processors, i.e. p = N, the run time reduces to O(log₂(N)).
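To spell out the last claim, one can substitute p = N = n² directly into the bound above (a simple computation, not an additional result of the paper):

$$O\!\left(\frac{n^2}{N} + \log N\right) f + O(\log n)\,\alpha + O\!\left(\frac{n}{\sqrt{N}} + \log N\right)\beta = O(\log_2 N)\,(f + \alpha + \beta),$$

so with the maximum number of processors the run time of a V-cycle is only logarithmic in the number of unknowns, whereas the serial run time is O(N).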
Some improvement in the speedup of the multigrid V-cycle can be obtained if the cost of communication per step is lowered. This can be done by using another connectivity among the processors, different from the lattice one, for instance a tree network.
Let us consider first the 1D situation, with n = 9. In fig. 6 we see the grids used in the computation:
!"
! #$ % &' (&*) +
We have n = 2^m + 1 data (with m = 3 in our example) on the finest grid. Let us store the data in p = n processors, the leaves of an m-ary tree, like the one in fig. 7.
!#"%$'& ( $ $*) +-,.,./10% ) 2-& + 0301$ & 45+ ( 6%7849 & ";:=<?>
Each processor at level k computes one unknown, using the values of three processors from level k + 1, and sends the result to a processor at level k − 1.
The time at level k is:
O(1) f (number of flops, proportional to the number of grid points per processor)
+ O(1) α (send a constant number of messages to neighbors)
+ O(1) β (number of words sent).
Summing all these terms for k = 1 to m yields:
Time at levels 1 to m:
O(m) f + O(m) α + O(m) β = O(m)(f + α + β) = O(log n)(f + α + β).
Remark. The complexity obtained in this way is also of order O(log(n)). One can show that the result is similar in the 2D case, using a more complex tree communication.
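As a rough way to picture why the tree pattern needs only O(log n) parallel steps (this is my own simplified simulation of the communication pattern described above, not the algorithm of [3]), each level of the tree combines triples of values from the level below, here with the full-weighting stencil, at a constant cost per level:

```python
def tree_levels(values, combine=lambda a, b, c: 0.25 * (a + 2.0 * b + c)):
    """Simulate the tree communication: every value of a coarser level is computed
    from three values of the finer level; one parallel step per level."""
    levels = [values]
    while len(values) > 1:
        values = [combine(values[i - 1], values[i], values[i + 1])
                  for i in range(1, len(values), 2)]   # assumes 2**i - 1 values
        levels.append(values)
    return levels

# finest 1D grid with 2**3 - 1 = 7 interior unknowns (n = 9 points with the boundary)
print(tree_levels([1.0] * 7))   # three levels: 7 -> 3 -> 1 values, i.e. O(log n) steps
```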
6. CONCLUSIONS
Comparing the Jacobi, Gauss-Seidel, SOR and multigrid methods for solving the discrete Poisson equation on an n × n grid of N = n² unknowns from the complexity point of view, we see that the best method is multigrid. But there are also other methods (of Krylov type, or based on the FFT) which achieve similarly good execution times.
References
[1] Briggs, W., A Multigrid Tutorial, SIAM, 1987.
[2] Chiorean, I., On some 2D Parallel Relaxation Schemes, Proceedings of ROGER 2002, Sibiu, Burg-Verlag, 2002, 77-85.
[3] Chiorean, I., On the complexity of some relaxation schemes, Proc. of ICAM3,
Borsa, Oct. 2002, Bul. St. Univ. Baia-Mare, Ser. B, Vol. XVIII (2002), nr. 2,
171-176.
[4] Demmel, J., Lecture Notes on Numerical Linear Algebra, Berkeley Lecture Notes
in Mathematics, UC Berkeley, 1996.
[5] Hackbusch, W., An Introduction to Multigrid Method, Springer Verlag, 1983.
[6] Kumar, V. et al., Introduction to Parallel Computing, Benjamin/Cummings Publishing Company, 1994.
[7] Modi, J. J., Parallel Algorithms and Matrix Computation, Oxford, 1987.