The Sigma-SOR Algorithm and the Optimal Strategy for the Utilization of the SOR Iterative Method
VOLUME 62, NUMBER 206
APRIL 1994, PAGES 619-644
ZBIGNIEW I. WOZNICKI
Abstract. The paper describes, discusses, and numerically illustrates the meth-
od for obtaining a priori estimates of the optimum relaxation factor in the SOR
iteration method. The computational strategy of this method uses the so-called
Sigma-SOR algorithm based on the theoretical result proven in the paper. The
method presented is especially efficient for problems with a slowly convergent
iteration process, and in this case it is strongly competitive with adaptive
procedures used for determining dynamically the optimum relaxation factor
during the course of the SOR solution.
1. Introduction
The SOR (Successive Over-Relaxation) method and its line variants are
among the most popular and efficient iterative methods used for solving large
and sparse linear systems of equations arising in many areas of science and en-
gineering. The popularity of SOR algorithms is in a great measure due to their
simplicity from the programming point of view. The rate of convergence of the
SOR method depends strongly on the relaxation factor $\omega$; therefore, the main
difficulty in the efficient use of this method lies in making a good estimate of
the optimum relaxation factor $\omega_{\text{opt}}$ which maximizes the rate of convergence.
For a large class of matrix problems arising in the discretization of elliptic
partial differential equations the coefficient matrices have certain eigenvalue
properties allowing us to determine explicitly the optimum relaxation factor
$\omega_{\text{opt}}$. In the case when the coefficient matrix is 2-cyclic and consistently ordered
[1] (this property will be assumed in the remainder), $\omega_{\text{opt}}$ can be determined by
finding the value of the spectral radius $\rho(\mathcal{L}_1)$ of the associated Gauss-Seidel
iteration matrix $\mathcal{L}_1$.
However, it is well known that the nature of the dependence of $\omega_{\text{opt}}$ on $\rho(\mathcal{L}_1)$
indicates the sensitivity of the rate of convergence to the accuracy in determining
$\omega_{\text{opt}}$, as $\rho(\mathcal{L}_1)$ approaches unity [1, 2]. When $\rho(\mathcal{L}_1)$ is very close to unity,
small changes in the estimate of $\rho(\mathcal{L}_1)$ can seriously decrease the rate of
convergence, and just in this case the availability of an accurate value of $\rho(\mathcal{L}_1)$ is
an essential point for the efficient use of the SOR method.
In practice two approaches are used to determine $\omega_{\text{opt}}$. One approach proposed
in the literature [2, 3, 4] is determining $\omega_{\text{opt}}$ dynamically, as the SOR
iteration proceeds with using some $\omega_i < \omega_{\text{opt}}$. Then by examining certain
conditions for quantities derived from current numerical results, $\omega_i$ is updated to
a new relaxation factor $\omega_{i+1} < \omega_{\text{opt}}$ until the assumed tolerance criterion is
satisfied.
The second approach for determining $\omega_{\text{opt}}$ is based on obtaining an a priori
estimate of $\rho(\mathcal{L}_1)$, usually by means of the power method or its modifications.
As is well known, the rate of convergence of the power method is governed
by the ratio of the largest subdominant (in absolute value) to the dominant
eigenvalue. If this ratio is close to unity, the power method will converge very
slowly, and in such a case determining $\omega_{\text{opt}}$ may be more time-consuming than
the SOR iteration itself.
Basically, there is no general comparison procedure to determine which approach
is "best". However, in the case of 2-cyclic consistently ordered matrices,
an accurate estimate for $\rho(\mathcal{L}_1)$ prior to the SOR iteration solution can be
effectively obtained by an appropriate use of power method iterations, and this
topic is the main purpose of the paper.
In the next section the SOR iterative method and the power method are
briefly described, and well-known basic results are recalled. These basic results
are essential in deriving the Sigma-SOR algorithm. The computational strategy
for determining the optimum relaxation factor $\omega_{\text{opt}}$ is described in the third
subsection of §2.
The secondary purpose of this paper, discussed in §3, is to give numerical
results for a variety of problems presented in the literature in order to illustrate
the efficiency of the proposed method for the a priori determination of the
optimum relaxation factor $\omega_{\text{opt}}$.
2. Formulation
2.1. The SOR iteration method. In the iterative solution of the linear system
(1) Ax = b
the n x n matrix A is usually defined by the following decomposition:
(2) A = D-L-U,
where D, L, and U are diagonal, strictly lower triangular and strictly upper
triangular matrices, respectively.
The SOR iterative method [1] is defined by
(3) $Dx^{(t+1)} = \omega[Lx^{(t+1)} + Ux^{(t)} + b] - (\omega - 1)Dx^{(t)}, \quad t = 0, 1, 2, \ldots,$

or equivalently, if $D$ is a nonsingular matrix,

(4) $x^{(t+1)} = \mathcal{L}_\omega x^{(t)} + \omega(D - \omega L)^{-1}b,$

where

(5) $\mathcal{L}_\omega = (D - \omega L)^{-1}[\omega U - (\omega - 1)D]$

is called the SOR iteration matrix and $\omega$ is the relaxation factor. For $\omega = 1$ the
SOR method reduces to the classical scheme known as the Gauss-Seidel iterative
method and

(6) $\mathcal{L}_1 = (D - L)^{-1}U$
is called the Gauss-Seidel iteration matrix.
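As an illustration, the point form of scheme (3) can be sketched in a few lines (a Python sketch with our own names, tolerance, and stopping rule, not the paper's FORTRAN implementation):

```python
import numpy as np

def sor_solve(A, b, omega, tol=1e-8, max_iter=10_000):
    """Point-SOR sweep, a direct transcription of (3): each sweep
    updates components in place, using already-updated values for the
    lower-triangular (L) part.  Illustrative sketch only."""
    n = len(b)
    x = np.zeros(n)
    for it in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # sigma collects the off-diagonal contributions: updated
            # values below the diagonal, previous values above it
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (1 - omega) * x_old[i] + omega * (b[i] - sigma) / A[i, i]
        if np.max(np.abs(x - x_old)) < tol:
            return x, it + 1
    return x, max_iter
```

For $\omega = 1$ the same loop performs Gauss-Seidel iterations.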
In the point algorithm, the iteration proceeds for one component of the ap-
proximate solution vector at a time. For block or line algorithms, the iteration
involves improving simultaneously groups of components, and therefore they
are more efficient than the point algorithm. In this case the matrices D, L,
and U have a block structure corresponding to the assumed partitioning of
components.
It is well known [1] that in the case of 2-cyclic consistent orderings, when the
associated nonnegative Jacobi iteration matrix

$B = D^{-1}(L + U) \ge 0$

is convergent (i.e., $\rho(B) < 1$), then $\mathcal{L}_1$ has only nonnegative eigenvalues $\lambda_i$
such that

(7) $1 > \rho(\mathcal{L}_1) = \lambda_1 > \lambda_2 \ge \lambda_3 \ge \cdots,$

and the following fundamental relation due to Young (see, for example, [1] and
the references given therein) holds between $\lambda_j$ and the corresponding eigenvalues
$\nu_j$ of $\mathcal{L}_\omega$:

(8) $\lambda_j = \left[\frac{\nu_j + \omega - 1}{\omega\,\nu_j^{1/2}}\right]^2.$

Moreover, $\rho(\mathcal{L}_\omega) = \max_{1 \le i \le n} |\nu_i| < 1$ for $0 < \omega < 2$, and its minimum value
is attained when

(9) $\omega = \omega_{\text{opt}} = \bar{\omega}_1 = \frac{2}{1 + \sqrt{1 - \rho(\mathcal{L}_1)}},$

in which case

(10) $\rho(\mathcal{L}_{\omega_{\text{opt}}}) = \omega_{\text{opt}} - 1 = \frac{1 - \sqrt{1 - \rho(\mathcal{L}_1)}}{1 + \sqrt{1 - \rho(\mathcal{L}_1)}}.$
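Formulas (9) and (10) translate directly into code; this small sketch (function names are our own) computes $\omega_{\text{opt}}$ and $\rho(\mathcal{L}_{\omega_{\text{opt}}})$ from a given value of $\rho(\mathcal{L}_1)$:

```python
import math

def omega_opt(rho_L1):
    """Optimum relaxation factor (9) for a 2-cyclic consistently
    ordered matrix, from the Gauss-Seidel spectral radius rho(L_1)."""
    return 2.0 / (1.0 + math.sqrt(1.0 - rho_L1))

def rho_sor_opt(rho_L1):
    """Spectral radius (10) of the SOR iteration matrix at omega_opt."""
    return omega_opt(rho_L1) - 1.0
```

For example, $\rho(\mathcal{L}_1) = 0.9999$ gives $\omega_{\text{opt}} \approx 1.98020$ and $\rho(\mathcal{L}_{\omega_{\text{opt}}}) \approx 0.98020$.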
In the convergence analysis of iterative methods the (asymptotic) rate of convergence

(11) $R(\mathcal{G}) = -\ln \rho(\mathcal{G})$

is certainly the simplest practical measure of the speed of convergence for a convergent
matrix $\mathcal{G}$. The rate of convergence is especially useful for comparing
the efficiency of different iterative methods, because the number of iterations $t$
required for reducing the error norm in a given method by a prescribed factor
$\zeta$ is roughly inversely proportional to the rate of convergence, and is given by

(12) $t \cong \frac{-\ln \zeta}{R(\mathcal{G})},$

where $\mathcal{G}$ is the iteration matrix of the method.
Thus, the efficiency of different iterative methods (with a similar arithmetical
effort per iteration) can be theoretically evaluated by a comparison of the rates
of convergence. The data given in Table 1 (next page) illustrate the efficiency
of the SOR method by comparing it with the Gauss-Seidel method.
As can be seen from Table 1, the efficiency of the SOR method drastically
increases as $\rho(\mathcal{L}_1)$ becomes close to unity. For the case when $\rho(\mathcal{L}_1) = 0.9999$,
the SOR method is asymptotically 200 times faster than the Gauss-Seidel
method. Since $\omega_{\text{opt}}$ is a function only of the spectral radius $\rho(\mathcal{L}_1)$,
any efficient use of the SOR method requires computing an accurate value of
$\rho(\mathcal{L}_1)$, and the order of the accuracy needed in estimating $\rho(\mathcal{L}_1)$ depends
on the closeness of $\rho(\mathcal{L}_1)$ to unity.
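The 200-fold speedup quoted above for $\rho(\mathcal{L}_1) = 0.9999$ can be checked directly from (9)-(11):

```python
import math

rho_gs = 0.9999                              # rho(L_1) from the text
omega = 2 / (1 + math.sqrt(1 - rho_gs))      # (9): optimum relaxation factor
rho_sor = omega - 1                          # (10): rho at omega_opt
R_gs = -math.log(rho_gs)                     # (11): Gauss-Seidel rate
R_sor = -math.log(rho_sor)                   # (11): optimal-SOR rate
print(R_sor / R_gs)                          # asymptotic speedup, about 200
```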
2.2. The power method. Usually, an estimate for $\rho(\mathcal{L}_1)$ is obtained by using
the ordinary power method [5], which will be used in the analysis presented
in this paper. The power method is conceptually and computationally the simplest
iterative procedure for approximating the eigenvector corresponding to the
dominant (largest in modulus) eigenvalue of a given matrix $\mathcal{G}$. It is defined by
the iterative process

(14) $z^{(t)} = \mathcal{G} z^{(t-1)}, \quad t = 1, 2, \ldots,$

where the eigenvalues of $\mathcal{G}$ are assumed to satisfy

(15) $\lambda_1 > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n|.$

Since by assumption $\mathcal{G}$ has a complete set of eigenvectors $u_i$, an arbitrary
nonzero vector $z^{(0)}$ can be expressed in the form

(16) $z^{(0)} = \sum_{i=1}^{n} a_i u_i,$

so that

(17) $z^{(t)} = \sum_{i=1}^{n} a_i \lambda_i^t u_i = \lambda_1^t \left[ a_1 u_1 + e^{(t)} \right], \qquad e^{(t)} = \sum_{i=2}^{n} a_i \left(\frac{\lambda_i}{\lambda_1}\right)^t u_i.$

Since $|\lambda_i/\lambda_1| < 1$ for all $i \ge 2$, it is clear that $z^{(t)}$ converges to $u_1$ as $t \to \infty$,
provided only that $a_1 \ne 0$.

Thus the vector $z^{(t)}$ is an approximation to an unnormalized eigenvector $u_1$
belonging to $\lambda_1$, which can be considered as accurate if $\|e^{(t)}\|$ is sufficiently
small. Since

$z^{(t+1)} = \lambda_1^{t+1}\left[ a_1 u_1 + e^{(t+1)} \right],$
The above result leads to computing the dominant eigenvalue by means of suc-
cessive approximations of the corresponding eigenvector in the simple power
method.
In practice, in order to keep the components of $z^{(t)}$ within the range of
practical calculation, its components are scaled at each iteration step, and (14)
is replaced by the pair of equations

(19) $y^{(t)} = \mathcal{G} z^{(t-1)},$

(20) $z^{(t)} = y^{(t)} / \|y^{(t)}\|,$

and in this case,

(21) $z^{(t)} \to u_1 / \|u_1\|$

and

(22) $\|y^{(t)}\| \to \lambda_1$ as $t \to \infty$,

where two norms, either the maximum norm $\|\cdot\|_\infty$ or the Euclidean norm
$\|\cdot\|_2$, are most commonly used.
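The scaled iteration (19)-(22) can be sketched as follows (Euclidean norm, with an illustrative stopping rule of our own choosing):

```python
import numpy as np

def power_method(G, z0, tol=1e-10, max_iter=5000):
    """Scaled power method (19)-(20): y = G z, z = y/||y||.
    The scaling factor ||y|| converges to the dominant eigenvalue (22).
    Euclidean norm, as in the paper's experiments; a sketch only."""
    z = z0 / np.linalg.norm(z0)
    lam_old = 0.0
    for it in range(max_iter):
        y = G @ z                      # (19)
        lam = np.linalg.norm(y)        # (22): ||y|| -> lambda_1
        z = y / lam                    # (20)
        if abs(lam - lam_old) < tol:
            return lam, z, it + 1
        lam_old = lam
    return lam, z, max_iter
```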
The rate of convergence will depend on the constants $a_i$, but more essentially
on the separation of the dominant eigenvalue from the largest subdominant
eigenvalues of $\mathcal{G}$, that is, on the ratios $|\lambda_2|/\lambda_1, |\lambda_3|/\lambda_1, \ldots$, and it is evident
that the smaller these values, the faster the convergence. However, it may occur
that if $z^{(0)}$ is chosen as almost orthogonal to $u_1$, then $a_1$ in (17) will be quite
small compared to the other coefficients, whence for appropriately "small"
values of $t$, $|a_1 \lambda_1^t| < |a_2 \lambda_2^t|$ and the ratio $z_j^{(t+1)}/z_j^{(t)}$ will better approximate
$\lambda_2$ than $\lambda_1$, assuming of course that $\lambda_1 > |\lambda_2|$. In the case when $a_1 = 0$, the
power method converges theoretically to the second eigenvector. However, in
practice rounding errors will introduce small components of $u_1$ into the vector
$z^{(t)}$, and those components will be magnified in subsequent iterations. Hence,
convergence is still likely to be to the first eigenvector, although with a larger
number of iterations than in the case when a more suitable starting vector $z^{(0)}$
is chosen.

In particular, if $|\lambda_2|/\lambda_1$ is close to unity, the accuracy of $z^{(t)}$ will be proportional
to $(|\lambda_2|/\lambda_1)^t$ and the convergence may be intolerably slow, but still to the
dominant eigenvalue $\lambda_1$. In such cases some practical techniques such as a shift
of origin, or Aitken's $\delta^2$-process [5], can be used to speed up the convergence
of the simple power method.
In general, when $\lambda_1$ is the principal eigenvalue, the ratio

(23) $\sigma = \frac{|\lambda_2|}{\lambda_1}$

governs the convergence of the power method.
However, it seems that from the terminology point of view some comments are
necessary. In the literature the term "dominance ratio" is usually used for $\sigma$
by some authors. But it is also interesting to notice that other authors (especially
the authors of books dealing with the convergence analysis of eigenvalue
problems) do not use the term "dominance ratio" at all. In the author's view
the term "subdominance ratio" for $\sigma$ seems more appropriate, because
$\sigma$ increases with the absolute value of the largest subdominant eigenvalue, while
the dominance of the principal eigenvalue decreases.
Since the convergence to the dominant eigenvalue in the power method is
geometric in the subdominance ratio $\sigma$, by analogy to the analysis of
iterative methods for solving linear systems of equations one can define the
(asymptotic) rate of convergence as

(24) $R(\mathcal{G}) = -\ln \sigma,$

which is a useful measure for the speed of convergence to the dominant eigenvalue
of a given matrix $\mathcal{G}$ in the power method.
Referring back to the SOR method, we find it convenient to first consider
the behavior of the eigenvalues $\nu_i$ of $\mathcal{L}_\omega$ as a function of $\omega$ for the case of
2-cyclic consistently ordered nonsingular matrices $A = D - L - U$ of (1), where
$D$, $L$, and $U$ are nonsingular diagonal, strictly lower triangular and strictly
upper triangular nonnegative matrices, respectively. As is well known [1], the
eigenvalues $\nu_i$ of $\mathcal{L}_\omega$ are related by (8) to the eigenvalues $\lambda_i$ of the Gauss-Seidel
iteration matrix $\mathcal{L}_1$, the special case of $\mathcal{L}_\omega$ with $\omega = 1$. The matrix
$\mathcal{L}_1$ has at least half of its eigenvalues equal to zero, and the remaining ones are
positive and real, and such that

(25) $1 > \rho(\mathcal{L}_1) = \lambda_1 > \lambda_2 \ge \lambda_3 \ge \cdots.$

In the analysis of convergence properties of the SOR method, it is very useful
to investigate the behavior of the roots of (8),

(26) $\nu_j^{(1,2)} = \frac{1}{2}\left\{ \omega^2 \lambda_j \pm \sqrt{\omega^2 \lambda_j \left[ \omega^2 \lambda_j - 4(\omega - 1) \right]} \right\} - (\omega - 1).$
Figure 1
Computing the relaxation factor

(35a) $\omega^* = \frac{2}{1 + \sqrt{1 - \sigma^* \lambda^*}},$

we can obtain $\nu^* = \rho(\mathcal{L}_{\omega^*})$ by the power method iteration until a required
convergence criterion is satisfied. Then from the relation (29) one obtains

(35b) $\lambda_1 = \left[\frac{\nu^* + \omega^* - 1}{\omega^* (\nu^*)^{1/2}}\right]^2$

and

(35c) $\bar{\omega}_1 = \frac{2}{1 + \sqrt{1 - \lambda_1}},$

an a priori "accurate" estimate for $\omega_{\text{opt}}$. Thus, the accuracy of $\omega_{\text{opt}}$ is conditioned
by the computation of an accurate value of $\nu^*$.
As is demonstrated in the numerical experiments given in the next section, the
above algorithm, even with crude approximations $\lambda^*$ and $\sigma^*$, is very efficient
and strongly competitive with the SOR adaptive procedure [1] when $\rho(\mathcal{L}_1)$ is
very close to unity ($0.999 < \rho(\mathcal{L}_1) < 1$).
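The strategy (35a)-(35c) can be sketched as follows; the function interface, in particular the caller-supplied routine returning $\nu^* = \rho(\mathcal{L}_{\omega^*})$, is our own assumption, since the paper prescribes the formulas rather than an API:

```python
import math

def sigma_sor_omega(lam_star, sigma_star, nu_star_of):
    """One pass of the Sigma-SOR strategy (35a)-(35c).

    lam_star, sigma_star : crude a priori estimates of lambda_1 and sigma_1.
    nu_star_of           : caller-supplied routine returning
                           nu* = rho(L_omega) for a given omega, e.g. a
                           power iteration on the SOR matrix (hypothetical
                           interface, not prescribed by the paper).
    """
    omega_star = 2.0 / (1.0 + math.sqrt(1.0 - sigma_star * lam_star))  # (35a)
    nu_star = nu_star_of(omega_star)
    lam1 = ((nu_star + omega_star - 1.0)
            / (omega_star * math.sqrt(nu_star))) ** 2                  # (35b)
    return 2.0 / (1.0 + math.sqrt(1.0 - lam1))                         # (35c)
```

With exact Young-relation eigenvalues the recovered $\bar{\omega}_1$ is exact even for crude $\lambda^*$, $\sigma^*$; in practice the accuracy is conditioned by the computed $\nu^*$, as stated above.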
Estimates for $\sigma^*$ approximating $\sigma_1$ can be obtained by observing the decay
rate of some quantities, for instance

(36) $\sigma^{(t+1)} = \frac{|\lambda^{(t+1)} - \lambda^{(t)}|}{|\lambda^{(t)} - \lambda^{(t-1)}|},$

or ratios of differences between the components of successive eigenvectors in the
iteration process of the power method (19), (20), using a suitable norm (see, for
example, [4], where the term dominance ratio is used for $\sigma$).
(38) $(e^{(t)})_j = \sum_{i=2}^{n} a_i \left(\frac{\lambda_i}{\lambda_1}\right)^t (u_i)_j.$

Substituting (37) into (36), one obtains

(39) $\sigma^{(t+1)} = \frac{(e^{(t+1)})_j - 2(e^{(t)})_j + (e^{(t-1)})_j}{(e^{(t)})_j - 2(e^{(t-1)})_j + (e^{(t-2)})_j}.$

Assume now that for any $t' \ge 1$

(40) $\left| \left(\frac{\lambda_2}{\lambda_1}\right)^{t'} a_2 (u_2)_j \right| > \left| \left(\frac{\lambda_l}{\lambda_1}\right)^{t'} a_l (u_l)_j \right|$ for all $3 \le l \le n$,

so that, collecting the terms with $i \ge 3$ into $(\tilde{e}^{(t)})_j$,

(41) $\sigma^{(t+1)} = \frac{\left[\left(\frac{\lambda_2}{\lambda_1}\right)^{t+1} - 2\left(\frac{\lambda_2}{\lambda_1}\right)^{t} + \left(\frac{\lambda_2}{\lambda_1}\right)^{t-1}\right] a_2(u_2)_j + (\tilde{e}^{(t+1)})_j - 2(\tilde{e}^{(t)})_j + (\tilde{e}^{(t-1)})_j}{\left[\left(\frac{\lambda_2}{\lambda_1}\right)^{t} - 2\left(\frac{\lambda_2}{\lambda_1}\right)^{t-1} + \left(\frac{\lambda_2}{\lambda_1}\right)^{t-2}\right] a_2(u_2)_j + (\tilde{e}^{(t)})_j - 2(\tilde{e}^{(t-1)})_j + (\tilde{e}^{(t-2)})_j}.$

But when $t > t'$, the relation (40) implies that $(\tilde{e}^{(t)})_j$ becomes sufficiently
small, and it can be concluded that

(43) $\sigma^{(t+1)} \approx \frac{\lambda_2}{\lambda_1} = \sigma_1.$

In practice, (36) is evaluated with the eigenvalue estimates $\lambda_M^{(t)}$ and $\lambda_E^{(t)}$
obtained from (22) with the maximum norm and the Euclidean norm, respectively,
giving the measures

(44a) $\sigma_M^{(t+1)} = \frac{|\lambda_M^{(t+1)} - \lambda_M^{(t)}|}{|\lambda_M^{(t)} - \lambda_M^{(t-1)}|}$

and

(44b) $\sigma_E^{(t+1)} = \frac{|\lambda_E^{(t+1)} - \lambda_E^{(t)}|}{|\lambda_E^{(t)} - \lambda_E^{(t-1)}|}.$
Usually, the convergence behavior of both $\lambda_M$ and $\sigma_M$ has a monotone decreasing
character, whereas for $\lambda_E$ and $\sigma_E$ it was observed that they first increase
and then (mainly for $\lambda_E$) slowly decrease as the number of iterations
increases.
In the case of using the Euclidean norm for scaling purposes, the following
two additional measures for $\sigma$ can be used:

(44c) $\sigma_{EM}^{(t+1)} = \frac{\left|\,\|y_E^{(t+1)}\|_\infty - \|y_E^{(t)}\|_\infty\,\right|}{\left|\,\|y_E^{(t)}\|_\infty - \|y_E^{(t-1)}\|_\infty\,\right|}$

and

(44d) $\sigma_{EE}^{(t+1)} = \frac{\left|\,\|y_E^{(t+1)} - y_E^{(t)}\|_2 - \|y_E^{(t)} - y_E^{(t-1)}\|_2\,\right|}{\left|\,\|y_E^{(t)} - y_E^{(t-1)}\|_2 - \|y_E^{(t-1)} - y_E^{(t-2)}\|_2\,\right|},$

where the successive eigenvectors $y_E^{(t+1)}$, $y_E^{(t)}$, $y_E^{(t-1)}$, and $y_E^{(t-2)}$ are generated
by (19)-(22) using the Euclidean norm for scaling.
As demonstrated in the numerical experiments, the most rapid convergence is
observed for $\sigma_{EE}$, with a monotone increasing character, which provides
values estimating the true $\sigma_1$ from below.
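As one concrete instance of these measures, the eigenvalue-difference ratio (36) can be tracked along the Euclidean-scaled power iteration (19)-(20) (a sketch; the paper's full set of measures (44a)-(44d) differs in the norms and quantities used):

```python
import numpy as np

def sigma_estimates(G, z0, n_iter):
    """Run the Euclidean-scaled power iteration (19)-(20) and track the
    subdominance-ratio estimate (36),
        sigma^(t+1) = |lam^(t+1)-lam^(t)| / |lam^(t)-lam^(t-1)|.
    Illustrative sketch with our own bookkeeping."""
    z = z0 / np.linalg.norm(z0)
    lams, sigmas = [], []
    for _ in range(n_iter):
        y = G @ z
        lam = np.linalg.norm(y)        # (22): eigenvalue estimate
        z = y / lam                    # (20): rescale
        lams.append(lam)
        if len(lams) >= 3:
            num = abs(lams[-1] - lams[-2])
            den = abs(lams[-2] - lams[-3])
            if den > 0:
                sigmas.append(num / den)
    return lams, sigmas
```

For a small non-normal test matrix with eigenvalues 0.9 and 0.45, the tracked ratio settles near $\sigma_1 = 0.5$.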
As can be seen from Figure 1, the behavior of $\sigma_\omega$ near $\bar{\omega}_2$ is similar in
nature to the behavior of $\rho(\mathcal{L}_\omega)$ near $\bar{\omega}_1$. From an inspection of the slope of
the curve for $\sigma_\omega$ near $\bar{\omega}_2$, it follows that errors underestimating $\bar{\omega}_2$ give
larger values of $\sigma_\omega$ than errors (comparable in size) overestimating $\bar{\omega}_2$.
In the range $1 < \omega < \bar{\omega}_2$, the value of $\sigma_\omega$ can be determined from (33) in
dependence on $\sigma_1$ and $(\omega - 1)/\nu_2$ (or $(\omega - 1)/\nu_1$ in the case of (33a)), and in
the range $\bar{\omega}_2 < \omega < \bar{\omega}_1$ it is defined by $|\omega - 1|/\nu_1$.

Thus, from the viewpoint of obtaining the maximum rate of convergence in
the power method, overestimating $\bar{\omega}_2$ is less dangerous than underestimating
$\bar{\omega}_2$ by the same amount, but as $\sigma_1$ approaches unity this becomes a more
important problem, because underestimating $\bar{\omega}_2$ drastically decreases the rate
of convergence.
On the other hand, however, underestimating $\bar{\omega}_2$ may be attractive for
accelerating convergence by the use of the Aitken $\delta^2$-process [5]. This procedure,
known also under the name of Aitken extrapolation, is a useful tool for improving
convergence, and can be used for any process converging linearly (i.e., as in
(14), $z^{(t)} = \mathcal{G} z^{(t-1)}$). In the case of the simple power method, the convergent
sequence $\{\lambda^{(t)}\}$ for the dominant eigenvalue can be transformed into a more
rapidly convergent sequence $\{\hat{\lambda}^{(t)}\}$ by using

(45) $\hat{\lambda}^{(t)} = \lambda^{(t-2)} - \frac{\left(\lambda^{(t-2)} - \lambda^{(t-1)}\right)^2}{\lambda^{(t-2)} - 2\lambda^{(t-1)} + \lambda^{(t)}}.$
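Formula (45) applied to the last three eigenvalue estimates can be sketched as:

```python
def aitken(lams):
    """Aitken delta-squared extrapolation (45) on the last three terms
    of a linearly convergent sequence of eigenvalue estimates."""
    l0, l1, l2 = lams[-3], lams[-2], lams[-1]
    denom = l0 - 2.0 * l1 + l2
    if denom == 0.0:
        return l2          # sequence already (numerically) converged
    return l0 - (l0 - l1) ** 2 / denom
```

For an exactly geometric error, e.g. $\lambda^{(t)} = 1 + 0.5^t$, the extrapolation recovers the limit 1 from just three terms.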
this occurs when $\omega$ is close to $\bar{\omega}_3$, which minimizes $\nu_3$ for all $1 < \omega < \bar{\omega}_3$ and
provides the best separation of $\nu_1$ and $\nu_2$ from $\nu_3$. The distance of separation
is a decreasing function as $\omega$ increases for $\bar{\omega}_3 < \omega < \bar{\omega}_2$, and vanishes for
$\bar{\omega}_2 < \omega < \bar{\omega}_1$ because in this region all subdominant eigenvalues have the same
absolute value. Thus, the use of $\sigma_{EE}$, providing an underestimated value of $\sigma_1$,
can give some advantages in the form of an increased rate of convergence when
the Aitken extrapolation is applied. This aspect will be discussed and illustrated
by numerical results in the next section.
In conclusion it should be stated that in the efficient use of the power method
for determining an accurate value of the optimum relaxation factor in the SOR
iterative method, the relaxation factors $\bar{\omega}_2$ and $\bar{\omega}_3$ play an important role; $\bar{\omega}_2$
maximizes the rate of convergence in the simple power method, whereas $\bar{\omega}_3$,
providing the best separation of the two dominant eigenvalues from the remaining
subdominant eigenvalues of the SOR iteration matrix, maximizes the rate
of convergence of the Aitken extrapolation used as a practical technique for
improving the convergence of the power method.
3. Numerical Experiments
In this section the results of numerical experiments are presented for the
numerical solution of a two-dimensional elliptic equation of the form
(46) $-D(x, y)\left[\frac{\partial^2 \varphi}{\partial x^2} + \frac{\partial^2 \varphi}{\partial y^2}\right] + Z(x, y)\,\varphi = s(x, y)$ for $x, y \in \Omega$,

with

$\varphi(x, y) = g(x, y)$ or $\frac{\partial \varphi}{\partial n} = g(x, y)$ for $x, y \in \partial\Omega$,

where $\Omega$ is an open bounded region with boundary $\partial\Omega$, $n$ is the exterior
normal, $D(x, y) > 0$, and $Z(x, y) \ge 0$.
The standard finite difference discretization of (46) in a spatial mesh imposed
on Q leads to a system of linear equations of the form
(47) $A\phi = b,$
where the components of $\phi$ approximate the values of $\varphi$ at each mesh point
$(x, y)$. In the case of the natural ordering of mesh points for the standard five-point
difference operator, the $n \times n$ coefficient matrix $A$ has only five nonzero
diagonals forming a tridiagonal block structure suitable for the implementation
of the 1-line SOR algorithm, and is 2-cyclic consistently ordered [1].
Five test problems taken from the literature [6, 7] are considered, with discontinuous
coefficients $D$ and $Z$ chosen to be constant in each subregion
$\Omega_k$, and with different boundary conditions on $\partial\Omega$ for uniform and nonuniform
mesh structures.
Test Problem 1. This example, obtained by assuming $D = 1$ and $Z = 0$ in $\Omega$,
the unit square $(0, 1) \times (0, 1)$, with the Dirichlet boundary condition $\varphi = 0$ on
$\partial\Omega$, is usually used as a model problem in the analysis of numerical solutions
of elliptic-type problems. A square mesh with width $h = 1/(N + 1)$ yields $n = N^2$
mesh points, which is also the order of $A$. We assume $n = 48 \times 48 = 2304$, as
in Problem A in [6].
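For orientation only: for point (not line) iterations on this model problem the classical value $\rho(B) = \cos \pi h$ is known, so rough magnitudes of $\rho(\mathcal{L}_1)$ and $\omega_{\text{opt}}$ can be computed as below; the paper's line-SOR variants have different (smaller) spectral radii, so these are not its reported figures.

```python
import math

# Model-problem geometry of Test Problem 1: classical point-Jacobi
# spectral radius rho(B) = cos(pi*h).  Textbook values for point SOR,
# quoted for orientation only.
N = 48
h = 1.0 / (N + 1)
rho_B = math.cos(math.pi * h)
lam1 = rho_B ** 2                               # rho(L_1) = rho(B)^2
w_opt = 2.0 / (1.0 + math.sqrt(1.0 - lam1))     # (9)
print(lam1, w_opt)
```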
Test Problem 2. In this problem (Problem B in [6]), the domain and coefficients
are depicted in Figure 2.

Figure 2. Test Problem 2: $\Omega = (0, 97) \times (0, 23)$

Figure 3. Test Problem 3: $\Omega = (0, 4.65925) \times (0, 4.65925)$, $\partial\varphi/\partial n = 0$ on $\partial\Omega$

Figure 4. Test Problem 4: $\Omega = (0, 1) \times (0, 1)$, $\partial\varphi/\partial n = 0$ on $\partial\Omega$

Figure 5. Test Problem 5: $\Omega = (0, 2.1) \times (0, 2.1)$, $\partial\varphi/\partial n = 0$ on $\partial\Omega$
Test Problem 5. This problem, also taken from [7] (and analyzed in [8]), has a
slightly modified mesh division, giving n = 42 x 42 = 1764, in order to keep
the number of horizontal lines divisible by 2 for convenient use of 2-line SOR
algorithms. The domain, coefficients and mesh division (assumed the same
in both directions) are depicted in Figure 5. In [7, 8] a uniform mesh with
h = 0.05 was used, giving the number of mesh points n = 43 x 43 = 1849.
For solving (47) in the above five test problems, the following line algorithms
of the SOR iterative method are used [1, 6]:
1. SLOR—1-line system,
2. S2LOR—2-line system,
3. S2LCROR—2-line cyclically reduced system.
In our computations for each problem it was assumed that $s(x, y) = 0$ in
(46), so that the unique solution of each discrete problem is the null vector. All
components of the starting vector $\phi^{(0)}$ were equal to unity, and computations
for each iterative method were continued until the maximum absolute value of
all components of the iterate $\phi^{(t)}$ was less than a prescribed number $\epsilon$. Thus,
the stopping criterion

(48) $\varepsilon^{(t)} = \|\phi^{(t)}\|_\infty < \epsilon$

can be considered as the most reliable measure of the error vector in estimating
the accuracy of the solution.
All computations were carried out on a PC computer in single-precision FOR-
TRAN for the SOR iteration (including the calculation of the coefficient matri-
ces A), and in double-precision FORTRAN for the power iteration. The results
of computation are shown in Table 2 (next page).
The accurate value of $\lambda_1$ was obtained with $\omega = 1$ when the stabilization
to nine significant figures of $\lambda_E$ was observed in the power method ((19)-(22),
using the Euclidean norm); $I$ and $I_A$ are the numbers of iterations observed
in the power method without and with using the Aitken extrapolation (45),
respectively; $I_S$ is the number of SOR iterations required to satisfy the stopping
criterion (48) for two successive iterations with $\bar{\omega}_1$ as the optimum relaxation
factor and for two values $\epsilon = 10^{-6}$ and $\epsilon = 10^{-8}$.
The results obtained when using the SOR adaptive subroutine [1, pp. 368-
372] are shown under items 6, 7, and 8 of Table 2.
The data given in items 9-15 are related to computing the accurate value of $\nu_1$
(with stabilization to nine significant figures) in the power method with the value
of $\omega = \bar{\omega}_2$ determined from (27a), where $\lambda_2 = \sigma_1[\text{accur}] \times \lambda_1$, and $\sigma_1[\text{accur}]$,
approximated by $\sigma_{EE}$ (defined by (44d)), was obtained with the calculation of
$\lambda_1$ in item 1 for $\omega = 1$. Hence, by (8) and (27), the accurate value of $\bar{\omega}_1$ can
be found. Provided $\sigma_1$ is known, the accurate value of the optimum relaxation
factor $\omega_{\text{opt}} = \bar{\omega}_1$ can thus be efficiently computed. Comparison of the numbers
of iterations $I$ (or $I_A$) given in items 2 and 13 allows us to illustrate the
efficiency of the power method used in the case when $\omega = \bar{\omega}_2$ for each test
problem. The values of $\sigma_\omega$ given in items 14 and 15, and computed from
$(\bar{\omega}_2 - 1)/\nu_1$ and (32), respectively, indicate the consistency of the results in all
cases, except for Test Problem 5 solved by the SLOR iterative method, where
$\sigma_1 = 0.9944$ was found to only four significant figures.
The results obtained for the Sigma-SOR algorithm are given in items 16-27.
The subdominance ratio $\sigma_1$, approximated by $\sigma_{EE}$, is estimated once the
stopping criterion

(49) $\delta^{(t)} = |\sigma_{EE}^{(t)} - \sigma_{EE}^{(t-1)}| < \delta = 10^{-3}$

has been satisfied in two successive iterations in all test problems; $I_{EE}$ is the
number of power iterations needed to satisfy (49).
Table 2
Figures 6-10. Test Problems 1-5: behavior of the measures $\sigma$ versus iteration number. In each figure, M: $\sigma_M$ (eq. (44a)); E: $\sigma_E$ (eq. (44b)); EM: $\sigma_{EM}$ (eq. (44c)); EE: $\sigma_{EE}$ (eq. (44d)).
Moreover, it is interesting to notice that the convergence behavior of $\sigma_{EE}$
and $\sigma_M$ has a continuous character when passing from convergence to $\lambda_3/\lambda_1$
to convergence to $\lambda_2/\lambda_1$, whereas for $\sigma_E$ and $\sigma_{EM}$ strong deviations similar to
discontinuities are observed.
It is a well-known fact that for the SOR iterative method the optimum relaxation
factor $\omega_{\text{opt}} = \bar{\omega}_1$, which theoretically maximizes the rate of convergence,
does not provide the best results. In practice, one observes the existence of
a best relaxation factor $\omega_b$ (slightly greater than $\omega_{\text{opt}}$) which minimizes the
number of iterations for the required accuracy of the solution. Unfortunately,
there is no rigorous analysis in the literature explaining the reasons for this $\omega_b$
and predicting its value. From numerical experience, it can be concluded that
$\omega_b$ is a function of $\omega_{\text{opt}}$ and the required degree of accuracy of the solution.
One observes that an empirical formula with the correction coefficient $c = 1.02$
when using $\epsilon = 10^{-6}$, and $c = 1.01$ when using $\epsilon = 10^{-8}$, provides a quite
satisfactory estimate for $\omega_b$. The use of $\omega_b$ obtained from this formula allows
us to improve the convergence. Usually, the number of iterations obtained with
$\omega_b$ is about 15% less than that obtained with $\omega_{\text{opt}}$ for slowly convergent
problems. The results obtained with $\omega_b$ for two different stopping criteria are
given in items 28-31.
The deterioration in the rate of convergence resulting from using an inaccurate
value of $\omega_{\text{opt}}$ is strongly dependent on the closeness of $\rho(\mathcal{L}_1)$ to unity,
and it seems reasonable that this dependence should be taken into consideration
when estimating $\omega_{\text{opt}}$ a priori. The nature of calculating $\rho(\mathcal{L}_1)$ by
Figure 11. Test Problems 1 and 2
means of the power method is such that the first few significant figures of $\rho(\mathcal{L}_1)$
are rapidly fixed at the beginning of the power iterations, whereas convergence
to the next figures begins to be governed by the subdominance ratio $\sigma_1$. The
behavior of $\rho(\mathcal{L}_1)$ versus the number of power iterations for Test Problems
1 and 2 is depicted in Figure 11, where the dashed curves (denoted by 1a and
2a) correspond to using Aitken extrapolation for accelerating the convergence
in the power method.
In the determination of $\omega_{\text{opt}}$ based on a priori estimates for $\rho(\mathcal{L}_1)$, the
application of the stopping criterion

(52) $\delta^{(t)} = |\lambda^{(t)} - \lambda^{(t-1)}| < \delta = 10^{-3}\,|1 - \lambda^{(t)}|,$

where $\lambda^{(t)}$ is an approximation of $\lambda_1 = \rho(\mathcal{L}_1)$ in the power iteration $t$ using the
Aitken extrapolation, yields results strongly competitive with the SOR adaptive
procedure [1] when the values of $\rho(\mathcal{L}_1)$ are close to unity.
In items 32-35 of Table 3 (next page) results are given for all test problems
solved by the SLOR method, in which the estimate of $\omega_{\text{opt}}$ is based on the
computation of $\lambda_1 = \rho(\mathcal{L}_1)$ using the stopping criterion (52); the remaining
items quoted from Table 2 are given for comparison purposes.
Table 4 summarizes the results obtained for different computational strategies
implemented in four programs used for solving the test problems. The data
given in this table represent the numbers of iterations required to obtain the
solution with the stopping criterion $\|\phi^{(t)}\|_\infty < 10^{-6}$ satisfied for two successive
iterations.
Tables 3 and 4
applied to the original system, and the D3 program, which uses the 2-line cyclic
Chebyshev method applied to the cyclically reduced system. Both these pro-
grams were used in [6] for solving Test Problems 1, 2, and 3 only; the results
from these programs for Test Problems 4 and 5 were not available.
4. Concluding Remarks
From the practical point of view, the best solution method is one that for the
required accuracy provides the solution with the minimum total arithmetical
effort, which is what mainly determines the cost of computations. In the case
of the SOR iterative method, the arithmetical effort is roughly proportional to
the number of SOR iterations required for obtaining the solution with a given
degree of accuracy, and the number of power iterations required for estimating
the appropriate relaxation factor $\omega$. Since the numbers of arithmetical operations
per iteration in both the SOR and power methods are comparable (the power
method defined by (19)-(22) needs a few additional arithmetical operations for
computing the Euclidean norm and for division by this norm), the efficiency of
the assumed solution method can be measured in terms of the total number of
iterations. Moreover, this total number of iterations, as well as the fraction of
both SOR and power iterations, may change from problem to problem.
The number of SOR iterations is roughly inversely proportional to the rate of
convergence, where the deterioration of the convergence rate resulting from using
an inaccurate value of $\omega_{\text{opt}}$ is strongly dependent on the closeness of $\rho(\mathcal{L}_1)$
to unity. The speed of convergence in the power method is governed by the
value of the subdominance ratio $\sigma_\omega$, which determines the rate of convergence
similarly as $\rho(\mathcal{L}_\omega)$ does in the SOR method, and the number of power iterations
is also strongly dependent on the closeness of $\sigma_1$ to unity, or on the degree of
separation of the two dominant eigenvalues from the remaining ones if the Aitken
extrapolation is used. Thus, it seems that the selection and application of the
iterative strategy for solving different problems should be based more on the
analysis of results obtained in practice than on theoretical considerations.
In the test problems considered in this work, representing a class of nuclear
engineering problems, we have

$0.978 < \rho(\mathcal{L}_1) < 0.99999$ and $0.96 < \sigma_1 < 0.995$,
so that the analysis of numerical results obtained for these problems should also
be conclusive with solving large-scale scientific problems.
It seems that in the selection of a computational strategy for solving elliptic-type
problems, the SOR adaptive technique (implemented in the A1, A2, and
A3 programs) is favored in the literature [1, 2, 3, 4, 6] as a more efficient
solution method in comparison with the computational strategy based on an a priori
estimate of $\omega_{\text{opt}}$. However, the numerical experiments on all test problems
considered here show that the B1, B2, and B3 programs, in which an a priori
estimate for $\omega_{\text{opt}}$ is obtained by calculating $\lambda_1 = \rho(\mathcal{L}_1)$ with the power method
accelerated by Aitken extrapolation and using the stopping criterion (52), are
competitive with the A1, A2, and A3 programs, especially when $\rho(\mathcal{L}_1)$ is close
to unity.
As can be seen in Table 4, in the case of Test Problem 1 the B1 program
needs 14 iterations more (that is, about 10% more) than the A1 program. But
for Test Problem 4 the difference is equal to 912 iterations in favor of the B1
program, which corresponds to about 40% more iterations in the A1 program.
Since both test problems have the same size (2304 mesh points), the advantages
resulting from solving Test Problem 4 by the B1 program in comparison to the
A1 program can be estimated by this difference of iterations, which in this case
is about seven times greater than the total number of iterations required for
solving Test Problem 1 by the A1 or B1 programs.
Suppose that both problems are solved with an a priori estimate for ω_opt
based on the accurate value of ρ(ℒ₁) given in item 1 of Table 3 and
obtained with 650 and 329 iterations (item 2 of Table 3) for Test Problems 1
and 4, respectively. Then, in the case of Test Problem 1 the solution is obtained
with 106 iterations (the same number of iterations as for the B1 solution), but
the total number of iterations increases to 755, that is, 615 iterations more
than for the B1 solution given in Table 4. For Test Problem 4 the total number
of iterations (accompanied by a small decrease in SOR iterations) increases
to 2377, that is, 199 iterations more than for the B1 solution but still much less
than for the A1 solution. A similar behavior can be observed when comparing
the results of Table 4 given for the A2 and A3 programs with those given for
the B2 and B3 programs, respectively.
From the above comparisons, it is apparent that in the solution method based
on a priori estimates for ω_opt, the main difficulty lies in the choice of the degree
of accuracy appropriate for estimating ρ(ℒ₁) in a given problem; it is probably
for this reason that a priori estimates for ω_opt are given less attention in the
literature. However, as can be concluded from the results given in Table 4 for
the B1, B2, and B3 programs, the simple trick of using the stopping criterion
(52), conditioned by the closeness of ρ(ℒ₁) to unity, allows us in some sense
to avoid this main difficulty and makes a priori estimation of ω_opt a more
useful computational technique, competitive with the solution method based
on the SOR adaptive procedure [1], especially for problems in which
the values of ρ(ℒ₁) are very close to unity. In the range 0.98 < ρ(ℒ₁) <
0.999, represented by Test Problems 1 and 2, the SOR adaptive procedure,
discussed extensively and illustrated numerically in [1] for just this range of
values of ρ(ℒ₁), provides solutions with a smaller number of iterations than
in the case of using a priori estimates for ω_opt based on the stopping test (52).
But as was demonstrated above for Test Problem 1, the advantages resulting
from decreasing the total number of iterations have no practical significance,
because in this range of spectral radii the deterioration of the convergence
rate caused by using an inaccurate value of ω_opt does not strongly change the
number of iterations. For the class of problems with 0.999 < ρ(ℒ₁) < 0.99999,
represented by Test Problems 3, 4, and 5, the efficiency of the solution becomes
more sensitive to the accuracy of ω_opt as ρ(ℒ₁) approaches unity, and
the computational strategy based on determining an accurate value of ω_opt
prior to the SOR solution is much superior to the SOR adaptive technique,
as can be seen in Table 4. In this case, the last estimate for ω_opt in the SOR
adaptive technique is the most time-consuming, because the dominant eigenvalue
of ℒ_ω becomes close to unity (see Figure 1). It is interesting to note that in the
case of Test Problem 5 extremely small numbers of iterations are required to
estimate ω_opt a priori in the B1, B2, and B3 programs.
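The sensitivity to an inaccurate ω_opt described above can be quantified by Young's classical formulas for the spectral radius of the SOR iteration matrix; this is standard theory (see [1]), restated here for orientation rather than taken from the present results:

```latex
% mu = rho(J) is the Jacobi spectral radius, so rho(L_1) = mu^2
\rho(\mathcal{L}_\omega) =
  \begin{cases}
    1 - \omega + \tfrac{1}{2}\omega^2\mu^2
      + \omega\mu\sqrt{\,1 - \omega + \tfrac{1}{4}\omega^2\mu^2\,},
      & 0 < \omega \le \omega_{\mathrm{opt}},\\[4pt]
    \omega - 1, & \omega_{\mathrm{opt}} \le \omega < 2,
  \end{cases}
\qquad
\omega_{\mathrm{opt}} = \frac{2}{1 + \sqrt{1 - \mu^2}} .
```

As ω approaches ω_opt from below, the derivative of the upper branch tends to −∞, while for ω > ω_opt the slope is only 1; hence underestimating ω_opt degrades convergence much more severely than a comparable overestimate, the more so the closer μ is to unity.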
ACKNOWLEDGMENT
The author would like to thank Drs. J. Kubowski and K. Pytel for their
useful discussions and comments, as well as M.Sc. P. Jarzembowski for his
expert programming assistance. Thanks are also due to the Editor, Professor
W. Gautschi, for his significant contribution in revising the manuscript.
Bibliography
1. L. A. Hageman and D. Young, Applied iterative methods, Academic Press, New York, 1981.
2. B. A. Carré, The determination of the optimum accelerating factor for successive over-
relaxation, Comput. J. 4 (1961), 73-78.
3. H. E. Kulsrud, A practical technique for the determination of the optimum relaxation factor
of the successive over-relaxation method, Comm. ACM 4 (1961), 184-187.
4. L. A. Hageman and R. B. Kellogg, Estimating optimum relaxation factors for use in the
successive overrelaxation and the Chebyshev polynomial methods of iteration, WAPD-TM-
592, 1966.
5. J. H. Wilkinson, The algebraic eigenvalue problem, Oxford Univ. Press, London, 1965.
6. L. A. Hageman and R. S. Varga, Block iterative methods for cyclically reduced matrix equa-
tions, Numer. Math. 6 (1964), 106-119.
7. P. Concus, G. H. Golub, and G. Meurant, Block preconditioning for the conjugate gradient
method, SIAM J. Sci. Statist. Comput. 6 (1985), 220-252.
8. Z. I. Woznicki, On numerical analysis of conjugate gradient method, Japan J. Indust. Appl.
Math. 10 (1993), 487-519.