
Kragujevac J. Math. 25 (2003), 5-18.

PARALLEL METHODS FOR SOLVING PARTIAL DIFFERENTIAL EQUATIONS
Ioana Chiorean
Babes-Bolyai University, Department of Mathematics, Cluj-Napoca, Romania

(Received May 28, 2003)

Abstract. The aim of this paper is to study some iterative methods for solving Partial Differential Equations, like Jacobi, Gauss-Seidel, SOR and multigrid, making a comparison among them from the computational complexity point of view.

1. INTRODUCTION

Various problems arising from Physics, Fluid Dynamics, Chemistry, Biology, etc. can be modeled mathematically by means of partial differential equations. It is known that, sometimes, the exact solution (or solutions) is difficult to determine, so one has to compute an approximation of it, generated by means of the approximate problem attached to the continuous one.
In order to solve the approximate problem, obtained by discretizing the initial, continuous problem, several numerical methods can be used: direct (e.g. Gaussian elimination, factorization techniques, etc.) or iterative (e.g. Jacobi, Gauss-Seidel, SOR, multigrid, etc.).

Sometimes the direct solvers are preferred, but if the problem is too large, the iterative ones are more appropriate. They also seem more attractive from the computational point of view when more than one processor is used, that is, from the parallel computing point of view (see [6], [7]).
The aim of this paper is to review some parallel approaches to the Jacobi, Gauss-Seidel, SOR and multigrid methods, emphasizing the last one and trying to reduce its computational complexity by using a different type of communication among processors. In order to present these ideas, Poisson's equation will be used as the model problem.

2. REVIEW OF THE DISCRETE POISSON'S EQUATION


This equation may arise in heat flow, electrostatics, gravity, etc. and, in 2 dimensions, is:

∂²u(x, y)/∂x² + ∂²u(x, y)/∂y² = f(x, y)  in Ω        (1)

with Ω, let's say, the unit square 0 < x, y < 1, and with some boundary conditions; let's consider the simplest case

u(x, y) = 0  on ∂Ω        (2)

This is the continuous problem we have to solve. We discretize this equation by means, e.g., of finite differences (see [3]). We use an (n + 1) × (n + 1) grid on Ω (that is, on the unit square), where h = 1/(n + 1) is the grid spacing. Let's denote by u_ij the approximate solution at x = ih and y = jh. This is shown in fig.1, for n = 7.
Denoting b_ij = −f(ih, jh) · h², the approximate problem becomes:

4u_ij − u_{i−1,j} − u_{i+1,j} − u_{i,j−1} − u_{i,j+1} = b_ij        (3)

for all 1 ≤ i, j ≤ n.
In matrix form, (3) is a linear system of N = n² equations with N unknowns; let's write it

A · u = b.        (4)
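As a quick sanity check of the discretization (an illustration of ours, not part of the paper), the following sketch verifies the five-point stencil (3) numerically, taking the convention b_ij = −h² · f(ih, jh) and the test function u(x, y) = x² + y², whose Laplacian is the constant f = 4; the stencil is exact on quadratics.

```python
import numpy as np

# Check the five-point stencil (3): 4*u[i,j] - neighbors = b[i,j],
# with b[i,j] = -h^2 * f(ih, jh), on u(x, y) = x^2 + y^2 (f = u_xx + u_yy = 4).
n = 7
h = 1.0 / (n + 1)
g = np.arange(n + 2) * h                        # grid coordinates 0, h, ..., 1
u = g[:, None] ** 2 + g[None, :] ** 2           # u sampled at (ih, jh)

lhs = (4 * u[1:-1, 1:-1]
       - u[:-2, 1:-1] - u[2:, 1:-1]             # u[i-1,j], u[i+1,j]
       - u[1:-1, :-2] - u[1:-1, 2:])            # u[i,j-1], u[i,j+1]
b = -h**2 * 4.0                                 # f = 4 everywhere

print(np.max(np.abs(lhs - b)))                  # ~0: the stencil is exact on quadratics
```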

[Fig.1: the (n + 1) × (n + 1) grid on the unit square, for n = 7]

[Fig.2: the linearized order of the unknowns on the 2D grid]

Using the linearized order of unknowns on a 2D grid, as in fig.2, the system will be A · u = b with the block tridiagonal matrix

A =
|  T  −I   0   0 |
| −I   T  −I   0 |
|  0  −I   T  −I |
|  0   0  −I   T |

where I is the 4 × 4 identity matrix and

T =
|  4  −1   0   0 |
| −1   4  −1   0 |
|  0  −1   4  −1 |
|  0   0  −1   4 |

and where the unknowns and the right-hand side are ordered row by row:

u = (u11, u21, u31, u41, u12, u22, u32, u42, u13, u23, u33, u43, u14, u24, u34, u44)^T,
b = (b11, b21, b31, b41, b12, b22, b32, b42, b13, b23, b33, b43, b14, b24, b34, b44)^T.
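A compact way to assemble this block tridiagonal matrix (our illustration, not part of the paper) is via Kronecker products of the 1D stencil, assuming the row-wise ordering of fig.2:

```python
import numpy as np

n = 4                                                    # unknowns per row; N = n*n = 16
T = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # tridiagonal block (one grid row)
E = -np.eye(n, k=1) - np.eye(n, k=-1)                    # coupling between adjacent rows
A = np.kron(np.eye(n), T) + np.kron(E, np.eye(n))        # the 16x16 matrix of system (4)
print(A.shape)                                           # (16, 16)
```

The diagonal blocks carry the 4 and the in-row −1 couplings, while the off-diagonal identity blocks carry the −1 couplings between vertically adjacent unknowns.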

3. CLASSICAL RELAXATION SCHEMES

We include here Jacobi, Gauss-Seidel and SOR methods.


Denoting by k and k + 1 two successive iterations, these methods derive from (3) in the following way:

Jacobi:
u_ij^(k+1) = (u_{i−1,j}^(k) + u_{i,j−1}^(k) + u_{i+1,j}^(k) + u_{i,j+1}^(k) + b_ij)/4        (5)

Gauss-Seidel:
u_ij^(k+1) = (u_{i−1,j}^(k+1) + u_{i,j−1}^(k+1) + u_{i+1,j}^(k) + u_{i,j+1}^(k) + b_ij)/4        (6)

SOR:
u_ij^(k+1) = u_ij^(k) + ω · (u_{i−1,j}^(k+1) + u_{i,j−1}^(k+1) + u_{i+1,j}^(k) + u_{i,j+1}^(k) + b_ij − 4u_ij^(k))/4        (7)

(with 1 < ω < 2 to have over-relaxation).
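To make the update rules concrete, here is a small sketch of ours (not the paper's) of one sweep of each scheme on the unit-square problem; u carries the zero boundary values in its border entries, b is the n × n right-hand side of (3), and omega = 1 recovers Gauss-Seidel (6):

```python
import numpy as np

def jacobi_sweep(u, b):
    """One Jacobi sweep (5): every update uses only the previous iterate."""
    v = u.copy()
    v[1:-1, 1:-1] = (u[:-2, 1:-1] + u[1:-1, :-2] +
                     u[2:, 1:-1] + u[1:-1, 2:] + b) / 4.0
    return v

def sor_sweep(u, b, omega=1.5):
    """One SOR sweep (7) in place; omega = 1 gives Gauss-Seidel (6)."""
    n = b.shape[0]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            r = (u[i-1, j] + u[i, j-1] + u[i+1, j] + u[i, j+1]
                 + b[i-1, j-1] - 4.0 * u[i, j])
            u[i, j] += omega * r / 4.0
    return u
```

Repeated sweeps drive the residual of (3) toward zero; reordering the updates (e.g. the checkerboard ordering discussed later) changes only the sweep pattern, not the fixed point.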


In [4], parallel alternatives for these methods are given, when a lattice connectivity among processors is used. A comparison among their computational complexities is made, which can be summarized in the following table:

p = number of processors
f = time per flop
α = startup time for a message
β = time per word in a message

Complexity(Jacobi) = number of steps × cost per step =
= O(N) × ((N/p) · f + α + (n/p) · β) =
= O(N²/p) · f + O(N) · α + O(N^{3/2}/p) · β

Complexity(SOR) = number of steps × cost per step =
= O(√N) × ((N/p) · f + α + (n/p) · β) =
= O(N^{3/2}/p) · f + O(√N) · α + O(N/p) · β
Remark. The Gauss-Seidel method converges twice as fast as Jacobi, but requires
twice as many parallel steps, using the checkerboard ordering of nodes (see [4], [2],
[3]), so about the same run time, in practice. This is the reason why it does not
appear in the table.
In [3] we show that by changing the connectivity among processors and using a ring communication, the cost per step of the SOR method can be made lower, and hence the whole complexity of the parallel SOR method.

4. THE MULTIGRID METHODS

It is known (see e.g. [1], [5]) that multigrid is a divide-and-conquer algorithm for solving discrete problems, and it is widely used on partial differential equations as well. It is divide-and-conquer in two related senses. First, it obtains an initial solution for an (n × n) grid using an (n/2 × n/2) grid as an approximation, taking every other grid point from the (n × n) grid. The coarser (n/2 × n/2) grid is in turn approximated by an (n/4 × n/4) grid, and so on recursively.
The second way multigrid uses divide-and-conquer is in the frequency domain. This requires us to think of the error as a sum of sine curves of different frequencies. Then the work we do on a particular grid will eliminate the error in half of the frequency components not eliminated on other grids.
Without loss of generality, consider a (2^m − 1) × (2^m − 1) grid of unknowns; adding the nodes at the boundary, which have the given value 0, one gets a (2^m + 1) × (2^m + 1) grid on which the algorithm will operate. Let's denote n = 2^m + 1. Also, let P(i) denote the problem of solving Poisson's equation on a (2^i + 1) × (2^i + 1) grid, with (2^i − 1) × (2^i − 1) unknowns. The problem is specified by the grid size i, the coefficient matrix A_i and the right-hand side b_i. A sequence of related problems P(m), P(m−1), ..., P(1) on coarser and coarser grids is generated, where the solution to P(i−1) is a good approximation to the solution of P(i). Some grids for n = 9 are shown in fig.3.
If we denote by b_i the right-hand side of the linear system P(i) and by x_i an approximate solution of P(i) (thus x_i and b_i are (2^i − 1) × (2^i − 1) arrays of values at each grid point), the basic Multigrid V-cycle (MGV) is:
function MGV(b_i, x_i)    {return an improved solution x_i to P(i)}
    if i = 1    {only one unknown}
        compute the exact solution x_1 on P(1)
        return (b_1, x_1)
    else
        x_i = S_i(b_i, x_i)    {improve the solution}
        (b_i, d_i) = I_{i−1}(MGV(R_i(b_i, x_i)))    {solve recursively}
        x_i = x_i − d_i    {correct fine grid solution}
        x_i = S_i(b_i, x_i)    {improve solution some more}
        return (b_i, x_i)
    endif
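The pseudocode above can be turned into a runnable sketch. The following is our own minimal 1D analogue (solving 2x_j − x_{j−1} − x_{j+1} = b_j with zero boundary values), with a weighted-Jacobi smoother S, full-weighting restriction R and linear-interpolation prolongation I; note that it applies the coarse-grid result with a plus sign, i.e. d here is the correction rather than the defect of the pseudocode:

```python
import numpy as np

def smooth(x, b, passes=2, w=2.0/3.0):
    """S: a few weighted-Jacobi relaxations of 2x[j] - x[j-1] - x[j+1] = b[j]."""
    for _ in range(passes):
        x[1:-1] += w * (b[1:-1] - (2*x[1:-1] - x[:-2] - x[2:])) / 2.0
    return x

def residual(x, b):
    r = np.zeros_like(x)
    r[1:-1] = b[1:-1] - (2*x[1:-1] - x[:-2] - x[2:])
    return r

def restrict(r):
    """R: full weighting onto the next coarser grid (half as many intervals)."""
    rc = np.zeros((len(r) - 1)//2 + 1)
    rc[1:-1] = (r[1:-2:2] + 2*r[2:-1:2] + r[3::2]) / 4.0
    return rc

def prolong(xc):
    """I: linear interpolation onto the next finer grid."""
    x = np.zeros(2*len(xc) - 1)
    x[::2] = xc
    x[1::2] = 0.5*(xc[:-1] + xc[1:])
    return x

def mgv(b, x):
    """Multigrid V-cycle; the factor 4 rescales the stencil for the doubled spacing."""
    if len(b) == 3:                          # only one unknown: solve exactly
        x[1] = b[1] / 2.0
        return x
    x = smooth(x, b)                         # improve the solution
    d = mgv(4.0 * restrict(residual(x, b)),
            np.zeros((len(b) - 1)//2 + 1))   # solve for the correction recursively
    x += prolong(d)                          # correct the fine grid solution
    return smooth(x, b)                      # improve the solution some more
```

With b = h² · f for f ≡ 1 on a 33-point grid, a few V-cycles recover the exact discrete solution x(1 − x)/2 to high accuracy; each cycle reduces the error by a roughly constant factor, which is the point of the method.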






[Fig.3: some of the grids used by multigrid, for n = 9]
Remark 1. S_i denotes the smoothing operator (see e.g. [1]), which finally is one or more relaxation steps; I_{i−1} is the prolongation operator, which takes an approximate solution x_{i−1} for P(i−1) and converts it to an approximation x_i for the problem P(i) on the next finer grid; R_i is the restriction operator, which maps the approximate solution for P(i) to the next coarser grid; d_i is the defect, that is, how much the solution x_i fails to verify the system.
Remark 2. The algorithm is called a V-cycle because if we draw it schematically in space and time (with grid number i), with a point for each recursive call to MGV, it looks like fig.4, starting with a call to MGV(P(5), x_5) in the upper left corner. This calls MGV on grid 4, then 3, and so down to the coarsest grid 1, and then back up to grid 5 again.

[Fig.4: the V-cycle, drawn schematically in space and time]

If we perform the algorithm on a serial computer (so with one processor), the complexity of MGV can be determined, in terms of big-Oh, in the following way: we observe that the work at each point in the algorithm is proportional to the number of unknowns, since the value at each grid point is just averaged with its nearest neighbors. Thus, each point at grid level i on the V in the V-cycle will cost (2^i − 1)², which is of order O(4^i) operations. If the finest grid is at level m, the total serial work will be given by the geometric sum

Σ_{i=1}^{m} (2^i − 1)², which is of order O(4^m),

so the total serial work is proportional to the number of unknowns. In general, it is of order O(N), with N = n².
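The geometric-sum claim is easy to check numerically (our illustration): the ratio of the total work to 4^m stays bounded, approaching the constant 4/3, so the finest grid dominates the cost.

```python
# Work per level i is (2^i - 1)^2; the sum over i = 1..m is O(4^m).
m = 10
total = sum((2**i - 1)**2 for i in range(1, m + 1))
print(total / 4**m)        # close to 4/3: the finest level dominates
```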
Remark. In [4], the Full Multigrid algorithm, which uses the Multigrid V-cycle as a building block, is also studied. We do not insist on it here, because it is shown that it has the same serial complexity as the Multigrid V-cycle.

5. THE COMPLEXITY OF PARALLEL MULTIGRID METHOD

We know that multigrid requires each grid point to be updated depending on as many as 8 neighbors (those to the N, E, W, S, NW, SW, SE and NE). [4] studies the case in which a lattice of processors is used in order to execute the multigrid algorithm. So, having an n × n = (2^m + 1) × (2^m + 1) grid of data, one supposes that this is laid out on an s × s grid of processors (so p = s² processors), with each processor owning an ((n−1)/s) × ((n−1)/s) subgrid. This situation is illustrated in fig.5, taking into account a 33 × 33 mesh with a 4 × 4 processor grid.
The grid points in the top processor row have been labeled by the grid number i of the problem P(i) in which they participate. There is exactly one point labeled 2 per processor. The only grid point in P(1) with a non-boundary value is owned by the processor above the coloured one. In the lower half of the mesh, grid points labeled m need to be communicated to the coloured processor in problem P(m) of multigrid. The coloured processor owns the grid points inside the coloured box, and will communicate with its neighbours in the following way: to update its own grid points for P(5), it requires 8 grid point values from its N, S, E and W neighbors, as well as single point values from its NW, SW, SE and NE neighbors. Similarly, updating the values for P(4), it requires 4 grid point values from the N, S, E and W neighbors, and one each from the NW, SW, SE and NE neighbors. This pattern continues until each processor has only one grid point. After this, only some processors participate in the computation, requiring one value each from 8 other processors.

[Fig.5: the grid points of a 33 × 33 mesh laid out on a 4 × 4 processor grid]

Making a big-Oh analysis of the computation costs, for simplicity let's consider p = 4^k = 2^{2k}, so each processor owns a (2^{m−k} × 2^{m−k}) subgrid. Consider a V-cycle starting at level m. Denoting by f the time per flop, α the time for a message startup and β the time per word to send a message, the following study of complexity can be made (see [4]).

At levels k to m:
Time at level i is:
O(4^{i−k}) · f    (number of flops, proportional to the number of grid points per processor)
+ O(1) · α    (send a constant number of messages to neighbors)
+ O(2^{i−k}) · β    (number of words sent)
Summing all these terms for i = k to m yields
Time at levels k to m:
O(4^{m−k}) · f + O(m − k) · α + O(2^{m−k}) · β =
= O(n²/p) · f + O(log(n/√p)) · α + O(n/√p) · β

At levels k − 1 to 1:
Because in levels k − 1 through 1 fewer than all processors own an active grid point, some processors remain idle, and the time complexity differs:
Time at level i is:
O(1) · f    (number of flops, proportional to the number of grid points per processor)
+ O(1) · α    (send a constant number of messages to neighbors)
+ O(1) · β    (number of words sent)
Summing all these for i = 1 to k − 1 yields
Time at levels 1 to k − 1:
O(k − 1) · f + O(k − 1) · α + O(k − 1) · β =
= O(log(p)) · f + O(log(p)) · α + O(log(p)) · β

So the total time for a V-cycle starting at the finest level is therefore
Time: O(n²/p + log(p)) · f + O(log(n)) · α + O(n/√p + log(p)) · β
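The level-by-level accounting above can be collected into a small cost model (our sketch; the constants f, alpha and beta are illustrative placeholders, not measured values):

```python
def vcycle_time(m, k, f=1e-9, alpha=1e-6, beta=1e-8):
    """Sum the per-level costs of a V-cycle on a 2^k x 2^k processor lattice.

    Levels k..m: every processor is active, owning O(4^(i-k)) grid points.
    Levels 1..k-1: O(1) work per level, with most processors idle.
    """
    t = 0.0
    for i in range(k, m + 1):
        t += 4**(i - k) * f + alpha + 2**(i - k) * beta
    for i in range(1, k):
        t += f + alpha + beta
    return t

print(vcycle_time(10, 2) > vcycle_time(10, 5))   # True for these illustrative constants
```

For a fixed problem size, adding processors shrinks the f and β terms while the α (latency) term grows like log(p), which is exactly the trade-off stated in the remark below.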
Remark. Denoting by N = n² the number of unknowns, we can state that, for p ≪ N, the speedup over the serial multigrid is nearly perfect, but if we have enough processors, that is, p = N, the speedup is only of order N/log₂(N).
Some improvements in the speedup of the multigrid V-cycle method can be made if the cost of communication per step is lowered. This can be done if we use another connectivity among processors, different from the lattice one, for instance a tree network.
Let's consider, firstly, the 1D situation, with n = 9. In fig.6 we see the grids used in computation:

[Fig.6: the grids used in the 1D computation, for n = 9]
We have n = 2^m + 1 data (with m = 3 in our example) on the finest grid. Let's memorize the data in p = n processors, the leaves of an m-ary tree, like that in fig.7.

[Fig.7: the tree connectivity among the processors]

Each processor at a level k computes one unknown, using the values of three processors from level k + 1, and sending the result to a processor at level k − 1.

Time at level k is:
O(1) · f    (number of flops, proportional to the number of grid points per processor)
+ O(1) · α    (send a constant number of messages to neighbors)
+ O(1) · β    (number of words sent)
Summing all these terms for k = 1 to m yields
Time at levels 1 to m:
O(m) · f + O(m) · α + O(m) · β =
= O(m) · (f + α + β) = O(log(n)) · (f + α + β)
Remark. The complexity in this way is also of order O(log(n)). One can show
that the result is similar for the 2D case, using a more complex tree communication.

6. CONCLUSIONS

Comparing the Jacobi, Gauss-Seidel, SOR and multigrid methods for solving the discrete Poisson's equation on an n × n grid of N = n² unknowns from the complexity point of view, we see that the best method is the multigrid. But there are also other iterative methods (of Krylov type, or based on the FFT) which achieve similarly good execution times.

References
[1] Briggs, W., A Multigrid Tutorial, SIAM, 1987.
[2] Chiorean, I., On Some 2D Parallel Relaxation Schemes, Proceedings of ROGER 2002, Sibiu, 2002, Burg-Verlag, 2002, 77-85.
[3] Chiorean, I., On the Complexity of Some Relaxation Schemes, Proc. of ICAM3, Borsa, Oct. 2002, Bul. St. Univ. Baia-Mare, Ser. B, Vol. XVIII (2002), nr. 2, 171-176.
[4] Demmel, J., Lecture Notes on Numerical Linear Algebra, Berkeley Lecture Notes in Mathematics, UC Berkeley, 1996.
[5] Hackbusch, W., An Introduction to Multigrid Methods, Springer Verlag, 1983.
[6] Kumar, V. et al., Introduction to Parallel Computing, The Benjamin/Cummings Pub. Company, 1994.
[7] Modi, J. J., Parallel Algorithms and Matrix Computation, Oxford, 1987.
