
MATHEMATICS OF COMPUTATION

VOLUME 62, NUMBER 206

APRIL 1994, PAGES 619-644

THE SIGMA-SOR ALGORITHM AND THE OPTIMAL STRATEGY
FOR THE UTILIZATION OF THE SOR ITERATIVE METHOD

ZBIGNIEW I. WOZNICKI

Abstract. The paper describes, discusses, and numerically illustrates a method for obtaining a priori estimates of the optimum relaxation factor in the SOR iteration method. The computational strategy of this method uses the so-called Sigma-SOR algorithm, based on the theoretical result proven in the paper. The method presented is especially efficient for problems with a slowly convergent iteration process, and in this case it is strongly competitive with adaptive procedures used for determining the optimum relaxation factor dynamically during the course of the SOR solution.

1. Introduction
The SOR (Successive Over-Relaxation) method and its line variants are
among the most popular and efficient iterative methods used for solving large
and sparse linear systems of equations arising in many areas of science and en-
gineering. The popularity of SOR algorithms is in a great measure due to their
simplicity from the programming point of view. The rate of convergence of the
SOR method depends strongly on the relaxation factor ω; therefore, the main
difficulty in the efficient use of this method lies in making a good estimate of
the optimum relaxation factor ω_opt which maximizes the rate of convergence.
For a large class of matrix problems arising in the discretization of elliptic
partial differential equations, the coefficient matrices have certain eigenvalue
properties allowing us to determine explicitly the optimum relaxation factor
ω_opt. In the case when the coefficient matrix is 2-cyclic and consistently ordered
[1] (this property will be assumed in the remainder), ω_opt can be determined by
finding the value of the spectral radius ρ(ℒ₁) of the associated Gauss-Seidel
iteration matrix ℒ₁.
However, it is well known that the nature of the dependence of ω_opt on ρ(ℒ₁)
indicates the sensitivity of the rate of convergence to the accuracy in determining
ω_opt as ρ(ℒ₁) approaches unity [1, 2]. When ρ(ℒ₁) is very close to unity,
small changes in the estimate of ρ(ℒ₁) can seriously decrease the rate of convergence,
and just in this case the availability of an accurate value of ρ(ℒ₁) is
an essential point for the efficient use of the SOR method.

Received by the editor October 9, 1992.


1991 Mathematics Subject Classification. Primary 65B99, 65F10, 65F15, 65F50.
Key words and phrases. SOR iteration method, power method, acceleration of convergence,
eigenvalues of iteration matrix, estimation of optimum relaxation factor.
©1994 American Mathematical Society
0025-5718/94 $1.00 + $.25 per page

619
620 Z. I. WOZNICKI

In practice two approaches are used to determine ω_opt. One approach proposed
in the literature [2, 3, 4] is to determine ω_opt dynamically as the SOR
iteration proceeds, using some ω_i < ω_opt. Then, by examining certain conditions
for quantities derived from current numerical results, ω_i is updated to
a new relaxation factor ω_{i+1} < ω_opt until the assumed tolerance criterion is
satisfied.
The second approach for determining ω_opt is based on obtaining an a priori
estimate of ρ(ℒ₁), usually by means of the power method or its modifications.
As is well known, the rate of convergence of the power method is governed
by the ratio of the largest subdominant eigenvalue (in absolute value) to the dominant
eigenvalue. If this ratio is close to unity, the power method will converge very
slowly, and in such a case determining ω_opt may be more time-consuming than
the SOR iteration itself.
Basically, there is no general comparison procedure to determine which approach
is "best". However, in the case of 2-cyclic consistently ordered matrices,
an accurate estimate of ρ(ℒ₁) prior to the SOR iteration solution can be
effectively obtained by an appropriate use of power method iterations, and this
topic is the main purpose of the paper.
In the next section the SOR iterative method and the power method are
briefly described, and well-known basic results are recalled. These basic results
are essential in deriving the Sigma-SOR algorithm. The computational strategy
for determining the optimum relaxation factor ω_opt is described in the third
subsection of §2.
The secondary purpose of this paper, discussed in §3, is to give numerical
results for a variety of problems presented in the literature in order to illustrate
the efficiency of the proposed method for the a priori determination of the
optimum relaxation factor ω_opt.
2. Formulation
2.1. The SOR iteration method. In the iterative solution of the linear system

(1) Ax = b

the n × n matrix A is usually defined by the following decomposition:

(2) A = D − L − U,

where D, L, and U are diagonal, strictly lower triangular, and strictly upper
triangular matrices, respectively.
The SOR iterative method [1] is defined by

(3) Dx^(t+1) = ω[Lx^(t+1) + Ux^(t) + b] − (ω − 1)Dx^(t),  t = 0, 1, 2, ...,

or equivalently, if D is a nonsingular matrix,

(4) x^(t+1) = ℒ_ω x^(t) + ω(D − ωL)⁻¹ b,

where

(5) ℒ_ω = (D − ωL)⁻¹[ωU − (ω − 1)D]

is called the SOR iteration matrix and ω is the relaxation factor. For ω = 1 the
SOR method reduces to the classical scheme known as the Gauss-Seidel iterative
method, and

(6) ℒ₁ = (D − L)⁻¹U

is called the Gauss-Seidel iteration matrix.
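The point SOR sweep (3) can be sketched as follows. This is a minimal illustrative dense-matrix implementation, not the paper's code; the function name, tolerance, and iteration cap are assumed.

```python
import numpy as np

def sor_solve(A, b, omega, tol=1e-10, max_iter=10_000):
    """Point SOR iteration for Ax = b; A is assumed to have a nonzero diagonal."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # Gauss-Seidel value for component i, then over-relax by omega
            s = b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (1.0 - omega) * x_old[i] + omega * s / A[i, i]
        if np.max(np.abs(x - x_old)) < tol:
            break
    return x
```

For ω = 1 the update reduces to the Gauss-Seidel sweep, matching (6).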

In the point algorithm, the iteration proceeds for one component of the ap-
proximate solution vector at a time. For block or line algorithms, the iteration
involves improving simultaneously groups of components, and therefore they
are more efficient than the point algorithm. In this case the matrices D, L,
and U have a block structure corresponding to the assumed partitioning of
components.
It is well known [1] that in the case of 2-cyclic consistent orderings, when the
associated nonnegative Jacobi iteration matrix

B = D⁻¹(L + U) ≥ 0

is convergent (i.e., ρ(B) < 1), then ℒ₁ has only nonnegative eigenvalues λ_i
such that

(7) 1 > ρ(ℒ₁) = λ₁ ≥ λ₂ ≥ λ₃ ≥ ⋯,

and the following fundamental relation due to Young (see, for example, [1] and
the references given therein) holds between λ_j and the corresponding eigenvalues
ν_j of ℒ_ω:

(8) λ_j = (ν_j + ω − 1)² / (ω² ν_j).

Moreover, ρ(ℒ_ω) = max_{1≤i≤n} |ν_i| < 1 for 0 < ω < 2, and its minimum value
is attained when

(9) ω = ω_opt = ω̄ = 2 / (1 + √(1 − ρ(ℒ₁))),

in which case

(10) ρ(ℒ_ω̄) = ω̄ − 1 = (1 − √(1 − ρ(ℒ₁))) / (1 + √(1 − ρ(ℒ₁))).
In the convergence analysis of iterative methods the (asymptotic) rate of convergence

(11) R(𝒮) = −ln ρ(𝒮)

is certainly the simplest practical measure of the speed of convergence for a convergent
matrix 𝒮. The rate of convergence is especially useful for comparing
the efficiency of different iterative methods, because the number of iterations t
required for reducing the error norm in a given method by a prescribed factor
f is roughly inversely proportional to the rate of convergence, and is given by

(12) t ≈ −ln f / R(𝒢),

where 𝒢 is the iteration matrix of the method.
Thus, the efficiency of different iterative methods (with a similar arithmetical
effort per iteration) can be theoretically evaluated by a comparison of the rates
of convergence. The data given in Table 1 (next page) illustrate the efficiency
of the SOR method by comparing it with the Gauss-Seidel method, where

(13) E_t = R(ℒ_ω̄) / R(ℒ₁)

is the theoretical coefficient of efficiency and ω = ω_opt.


Table 1

  ρ(ℒ₁)     ρ(ℒ_ω̄)    E_t
  0.9       0.5195      6
  0.99      0.8182     20
  0.999     0.9387     63
  0.9999    0.9802    200
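The entries of Table 1 follow directly from (9), (10), and the definition of E_t; the sketch below reproduces them (function name assumed).

```python
import math

def sor_efficiency(rho_gs):
    """Given rho(L_1), return (omega_opt, rho(L_omega_bar), E_t)."""
    omega = 2.0 / (1.0 + math.sqrt(1.0 - rho_gs))   # (9)
    rho_opt = omega - 1.0                           # (10)
    e_t = math.log(rho_opt) / math.log(rho_gs)      # (13): R(L_omega_bar)/R(L_1)
    return omega, rho_opt, e_t

for rho in (0.9, 0.99, 0.999, 0.9999):
    print(rho, sor_efficiency(rho))
```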

As can be seen from Table 1, the efficiency of the SOR method drastically
increases as ρ(ℒ₁) becomes close to unity. For the case when ρ(ℒ₁) =
0.9999, the SOR method is asymptotically 200 times faster than the Gauss-Seidel
method. Since ω_opt is a function only of the spectral radius ρ(ℒ₁),
any efficient use of the SOR method requires computing an accurate value of
ρ(ℒ₁), and the order of accuracy needed in estimating ρ(ℒ₁) depends on the
closeness of ρ(ℒ₁) to unity.
2.2. The power method. Usually, an estimate for ρ(ℒ₁) is obtained by using
the ordinary power method [5], which will be used in the analysis presented
in this paper. The power method is conceptually and computationally the simplest
iterative procedure for approximating the eigenvector corresponding to the
dominant (largest in modulus) eigenvalue of a given matrix 𝒜. It is defined by
the iterative process

(14) z^(t) = 𝒜z^(t−1) = 𝒜²z^(t−2) = ⋯ = 𝒜^t z^(0),  t = 1, 2, ...,

which converges for almost any randomly chosen nonzero starting vector z^(0).
We assume, throughout this paper, that the n × n real matrix 𝒜 has n
linearly independent eigenvectors u_i, and its eigenvalues λ_i will be ordered
such that

(15) λ₁ > |λ₂| ≥ |λ₃| ≥ ⋯ ≥ |λ_n|.

Since by assumption 𝒜 has a complete set of eigenvectors u_i, an arbitrary
nonzero vector z^(0) can be expressed in the form

(16) z^(0) = Σ_{i=1}^n a_i u_i,

where the a_i are scalars, not all zero.
Then the sequence (14) has the representation

(17) z^(t) = Σ_{i=1}^n a_i λ_i^t u_i = λ₁^t [a₁u₁ + Σ_{i=2}^n a_i (λ_i/λ₁)^t u_i] = λ₁^t [a₁u₁ + ε^(t)].

Since |λ_i/λ₁| < 1 for all i ≥ 2, it is clear that z^(t) converges to u₁ as t → ∞,
provided only that a₁ ≠ 0.
Thus the vector z^(t) is an approximation to an unnormalized eigenvector u₁
belonging to λ₁, which can be considered accurate if ‖ε^(t)‖ is sufficiently
small. Since

z^(t+1) = λ₁^(t+1) [a₁u₁ + ε^(t+1)],

it follows that for any 7'th component z¡ of the vector z,


z('+i) fll(Ul)j + (e('+1));
(18) JTiT=^ Ai as t —»00.
axinx)j + i¿»)j

The above result leads to computing the dominant eigenvalue by means of suc-
cessive approximations of the corresponding eigenvector in the simple power
method.
In practice, in order to keep the components of z^(t) within the range of
practical calculation, they are scaled at each iteration step, and (14)
is replaced by the pair of equations

(19) y^(t) = 𝒜z^(t−1),
(20) z^(t) = y^(t) / ‖y^(t)‖,

and in this case,

(21) z^(t) → u₁ / ‖u₁‖,

and

(22) ‖y^(t)‖ → λ₁ as t → ∞,

where one of two norms, either the maximum norm ‖·‖_∞ or the Euclidean norm
‖·‖₂, is most commonly used.
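The scaled iteration (19)-(22) can be sketched as follows; this is an illustrative implementation assuming, as in the SOR setting of this paper, a positive dominant eigenvalue (names and stopping rule are assumptions).

```python
import numpy as np

def power_method(A, z0, norm=np.inf, tol=1e-12, max_iter=5000):
    """Scaled power iteration; returns the dominant-eigenvalue estimate ||y||."""
    z = z0 / np.linalg.norm(z0, norm)
    lam = 0.0
    for _ in range(max_iter):
        y = A @ z                          # (19)
        lam_new = np.linalg.norm(y, norm)  # (22): ||y^(t)|| -> lambda_1
        z = y / lam_new                    # (20): rescale to avoid over/underflow
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam_new, z
```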
The rate of convergence will depend on the constants a_i, but more essentially
on the separation of the dominant eigenvalue from the largest subdominant
eigenvalues of 𝒜, that is, on the ratios |λ₂|/λ₁, |λ₃|/λ₁, ..., and it is evident
that the smaller these values, the faster the convergence. However, it may occur
that if z^(0) is chosen almost orthogonal to u₁, then a₁ in (17) will be quite
small compared to the other coefficients, whence for appropriately "small"
values of t, |a₁λ₁^t| < |a₂λ₂^t|, and the ratio z_j^(t+1)/z_j^(t) will better approximate
λ₂ than λ₁, assuming of course that λ₁ > |λ₂|. In the case when a₁ = 0, the
power method converges theoretically to the second eigenvector. However, in
practice rounding errors will introduce small components of u₁ into the vector
z^(t), and those components will be magnified in subsequent iterations. Hence,
convergence is still likely to be to the first eigenvector, although with a larger
number of iterations than if a more suitable starting vector z^(0)
had been chosen.
In particular, if |λ₂|/λ₁ is close to unity, the accuracy of z^(t) will be proportional
to (|λ₂|/λ₁)^t and the convergence may be intolerably slow, but still to the
dominant eigenvalue λ₁. In such cases some practical techniques, such as a shift
of origin or Aitken's δ²-process [5], can be used to speed up the convergence
of the simple power method.
In general, when λ₁ is the principal eigenvalue, the ratio

(23) σ = max_{2≤i≤n} |λ_i| / λ₁

will be called the subdominance ratio, which with the assumed ordering of the λ_i
according to (15) is equivalent to

(23a) σ = |λ₂| / λ₁.

However, it seems that from the terminology point of view some comments are
necessary. In the literature the term "dominance ratio" is usually used for σ
by some authors. But it is also interesting to notice that other authors (especially
the authors of books dealing with the convergence analysis of eigenvalue
problems) do not use the term "dominance ratio" at all. In the author's view
the term "subdominance ratio" for σ seems to be more appropriate, because
σ increases with the absolute value of the largest subdominant eigenvalue, while
the dominance of the principal eigenvalue decreases.
Since the convergence to the dominant eigenvalue in the power method is
geometric in the subdominance ratio σ, by analogy with the analysis of
iterative methods for solving linear systems of equations one can define the
(asymptotic) rate of convergence as

(24) R(𝒜) = −ln σ,

which is a useful measure of the speed of convergence to the dominant eigenvalue
of a given matrix 𝒜 in the power method.
Referring back to the SOR method, we find it convenient to first consider
the behavior of the eigenvalues ν_i of ℒ_ω as a function of ω for the case of
2-cyclic consistently ordered nonsingular matrices A = D − L − U of (1), where
D is a nonsingular diagonal matrix and L and U are strictly lower triangular and strictly
upper triangular nonnegative matrices, respectively. As is well known [1], the
eigenvalues ν_i of ℒ_ω are related by (8) to the eigenvalues λ_i of the Gauss-Seidel
iteration matrix ℒ₁, the special case of ℒ_ω with ω = 1. The matrix
ℒ₁ has at least half of its eigenvalues equal to zero, and the remaining ones are
positive and real, and such that

(25) 1 > ρ(ℒ₁) = λ₁ ≥ λ₂ ≥ λ₃ ≥ ⋯.

In the analysis of convergence properties of the SOR method, it is very useful
to investigate the behavior of the roots of (8),

(26) ν_j^± = ½ω²λ_j ± ½√(ω²λ_j[ω²λ_j − 4(ω − 1)]) − (ω − 1).

Thus, when ω = 1, it is clear that ν_j^+ = λ_j and ν_j^− = 0. As ω increases from
unity, ν_j^+ and ν_j^− are decreasing and increasing functions of ω, respectively,
until ω²λ_j − 4(ω − 1) = 0, which occurs when

(27) ω = ω̄_j = 2 / (1 + √(1 − λ_j)),

and both roots coincide with the same value, that is, ν_j^+ = ν_j^− = ω̄_j − 1. For
ω > ω̄_j, the roots ν_j^+ and ν_j^− become a complex conjugate pair and increase,
their absolute value being ω − 1. It is obvious that, for

1 < ω < ω̄₁ = 2 / (1 + √(1 − λ₁)),

ρ(ℒ_ω) = ν₁^+ is a real and strictly decreasing function of ω, while for ω̄₁ ≤
ω < 2 one has ρ(ℒ_ω) = |ω − 1|.
However, we should add a note about negative eigenvalues ν_i which may
exist. The matrix ℒ₁ has s (usually half of n) positive eigenvalues and n − s
zero eigenvalues. The positive eigenvalues λ_i give rise to the roots ν_i^+, while the zero
eigenvalues λ_{i+s} give rise to the roots ν_i^−, which vanish at ω = 1. If 2s < n, then there
are zero eigenvalues λ_j, 2s < j ≤ n, which also satisfy relation (8);
hence the corresponding eigenvalues ν_j = −(ω − 1) are negative for all ω > 1.

Figure 1. The behavior of ν and σ_ω vs. ω
The typical behavior of the eigenvalues ν_i of ℒ_ω versus ω is shown in
Figure 1 for the example in which the three largest eigenvalues of ℒ₁ are λ₁ =
0.98, λ₂ = 0.94, and λ₃ = 0.9, and the subdominance ratio σ₁ = λ₂/λ₁ =
0.9592. As can be seen from Figure 1, there exist only two positive eigenvalues
ν₁ and ν₂ for ω̄₃ ≤ ω < ω̄₂, only one ν₁ for ω̄₂ ≤ ω < ω̄₁, and for
ω ≥ ω̄₁ all eigenvalues ν_i are complex (and negative if they exist) with
absolute value equal to ω − 1.
It is obvious that the subdominance ratio σ_ω for the SOR matrix ℒ_ω is
a function of ω, and σ_ω = σ₁ = λ₂/λ₁ when ω = 1. For 1 ≤ ω < ω̄₂,
σ_ω = ν₂/ν₁ is a strictly decreasing function as ω increases from unity (because
ν₁ = ν₁^+ decreases much less rapidly than ν₂ = ν₂^+), and at

(27a) ω = ω̄₂ = 2 / (1 + √(1 − λ₂))

it achieves its minimum σ̄_ω = ν₂/ν₁ = (ω̄₂ − 1)/ν₁. For ω̄₂ ≤ ω < ω̄₁, σ_ω =
|ω − 1|/ν₁ is a strictly increasing function of ω, and for all ω̄₁ ≤ ω < 2, σ_ω = 1,
because all eigenvalues ν_i have the same absolute value equal to |ω − 1|.
In the example shown in Figure 1, the dashed curve illustrates the behavior
of σ_ω versus ω, where the minimum σ̄_ω = 0.6639 occurs at ω̄₂ = 1.6065.
In terms of the rate of convergence, the theoretical coefficient of efficiency

(28) Ē_t = R(σ̄_ω) / R(σ₁)

is equal to 9.84. Thus for this example the computation of ρ(ℒ_ω) by means
of the power method with ω = ω̄₂ is asymptotically about 10 times faster than
with ω = 1.

2.3. The Sigma-SOR algorithm and computational strategy. The observations
in the previous subsection show the existence of the minimum value σ̄_ω < σ₁,
and moreover they allow us to identify precisely its location, which occurs at
ω = ω̄₂, minimizing the value of the subdominant eigenvalue ν₂. The question
now arises whether there exists a mathematical basis for determining the value
of σ̄_ω in dependence on σ₁ = λ₂/λ₁. The following theorem gives an answer
to this question.
Theorem. Let ν_j be the eigenvalues of the n × n SOR iteration matrix

ℒ_ω = (D − ωL)⁻¹[ωU − (ω − 1)D]

and let λ_j be the eigenvalues of the Gauss-Seidel iteration matrix

ℒ₁ = (D − L)⁻¹U.

If the eigenvalues of both matrices are related by

(29) λ_j = (ν_j + ω − 1)² / (ω² ν_j)

and ℒ₁ has only nonnegative real eigenvalues such that

(30) 1 > λ₁ ≥ λ₂ ≥ λ₃ ≥ ⋯,

then the subdominance ratio σ_ω = ν₂/ν₁ of ℒ_ω achieves its minimum σ̄_ω =
ν̄₂/ν̄₁ with

(31) ω = ω̄₂ = 2 / (1 + √(1 − λ₂)) = 2 / (1 + √(1 − σ₁λ₁)),

and this minimum is defined by the following formula:

(32) σ̄_ω = 2 / (1 + √(1 − σ₁)) − 1 = (1 − √(1 − σ₁)) / (1 + √(1 − σ₁)),

where σ₁ = λ₂/λ₁.
Proof. By using (29), one obtains

(33) σ₁ = λ₂/λ₁ = (ν₁/ν₂) · [(ν₂ + ω − 1)/(ν₁ + ω − 1)]² = σ_ω · [(1 + (ω − 1)/ν₂) / (1 + (ω − 1)/ν₁)]²,

or equivalently

(33a) σ₁ = [σ_ω + (ω − 1)/ν₁]² / (σ_ω [1 + (ω − 1)/ν₁]²).

The proof follows immediately from a close inspection of (33). As was already
stated, σ_ω is minimized when ω = ω̄₂, and its value is σ̄_ω = ν̄₂/ν̄₁,
where ν̄₂ = ω̄₂ − 1, so that (ω̄₂ − 1)/ν̄₁ = σ̄_ω. Hence, for ω = ω̄₂, (33a) reduces to

(34) σ₁ = 4σ̄_ω / (1 + σ̄_ω)²

and has the solution

σ̄_ω = 2 / (1 + √(1 − σ₁)) − 1 = (1 − √(1 − σ₁)) / (1 + √(1 − σ₁)).

This completes the proof of the theorem. □
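The theorem can be checked numerically on the example of Figure 1 (λ₁ = 0.98, λ₂ = 0.94); the sketch below computes ν₁ from (26) and compares the minimum σ̄_ω = (ω̄₂ − 1)/ν₁ with formula (32). Names are illustrative.

```python
import math

def nu_roots(lam, omega):
    """Roots nu± of relation (8), cf. (26); valid in the real case omega <= omega_bar_j."""
    a = omega * omega * lam
    disc = a * (a - 4.0 * (omega - 1.0))
    r = math.sqrt(max(disc, 0.0))
    return (0.5 * a - (omega - 1.0) + 0.5 * r,
            0.5 * a - (omega - 1.0) - 0.5 * r)

lam1, lam2 = 0.98, 0.94                                   # example of Figure 1
sigma1 = lam2 / lam1
omega2 = 2.0 / (1.0 + math.sqrt(1.0 - lam2))              # (31)
nu1 = nu_roots(lam1, omega2)[0]                           # dominant eigenvalue nu_1
sigma_min = (omega2 - 1.0) / nu1                          # nu_2 = omega2 - 1 at omega2
sigma_formula = 2.0 / (1.0 + math.sqrt(1.0 - sigma1)) - 1.0   # (32)
print(omega2, sigma_min, sigma_formula)
```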

It is necessary, however, to make some comments on the above result, because
(34) has two roots, σ̄_ω^+ < 1 (corresponding to the above result) and σ̄_ω^− > 1.
Since with ω = ω̄₂ the matrix ℒ_ω has only four real eigenvalues (see Figure
1) such that ν₁^+ > ν₂^+ = ν̄₂ = ν₂^− = ω̄₂ − 1 > ν₁^− > 0, then for σ₁ < 1

σ̄_ω^+ = ν̄₂/ν₁^+ = 2/(1 + √(1 − σ₁)) − 1 < 1 and σ̄_ω^− = ν̄₂/ν₁^− = 2/(1 − √(1 − σ₁)) − 1 > 1.

But both ν̄₂ and ν₁^− are subdominant eigenvalues, and therefore the fact that
σ̄_ω^− > 1 has no practical significance.
The most interesting conclusion from this theorem is the fact that the minimum
values of both the spectral radius and the subdominance ratio for the SOR
iteration matrix are governed by the same formula (see (10) and (32)). In other
words, for the same values of ρ(ℒ₁) and σ₁, the quantities ρ(ℒ_ω) and σ_ω
achieve the same minimum value, but with different values of ω. It is evident
that on replacing ρ(ℒ₁), ρ(ℒ_ω̄), and E_t in Table 1 by σ₁, σ̄_ω, and Ē_t (defined
by (28)), respectively, the data of this table also illustrate the efficiency of the
power method in the asymptotic range, as in the case of the SOR method.
Thus, the result of this theorem is of fundamental importance in the computational
strategy for a "rapid" estimate of an "accurate" value of the optimum
relaxation factor ω_opt in the SOR method.
The algorithm for determining ω_opt, called the Sigma-SOR algorithm, is
based on the following computational strategy. Assume that λ* and σ*, appropriate
estimates for λ₁ = ρ(ℒ₁) and σ₁, respectively, are known. Using

(35a) ω* = 2 / (1 + √(1 − σ*λ*)),

we can obtain ν* = ρ(ℒ_ω*) by power method iteration until a required
convergence criterion is satisfied. Then from relation (29) one obtains

(35b) λ₁ = (ν* + ω* − 1)² / ((ω*)² ν*),

which allows us to determine

(35c) ω̄₁ = 2 / (1 + √(1 − λ₁)),

an a priori "accurate" estimate for ω_opt. Thus, the accuracy of ω_opt is conditional
on the computation of an accurate value of ν*.
As is demonstrated in the numerical experiments given in the next section, the
above algorithm, even with crude approximations λ* and σ*, is very efficient
and strongly competitive with the SOR adaptive procedure [1] when ρ(ℒ₁) is
very close to unity (0.999 < ρ(ℒ₁) < 1).
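The strategy (35a)-(35c) can be sketched end-to-end on a small 2-cyclic consistently ordered example, the 1-D Laplacian. This is an illustrative dense implementation with assumed names, not the single/double-precision FORTRAN code used in §3; it also assumes the dominant eigenvector of ℒ_ω* is sign-definite, so the scaled ∞-norm converges to ν*.

```python
import math
import numpy as np

def sigma_sor_omega(A, lam_est, sigma_est, iters=500):
    """Sigma-SOR sketch: pick omega* by (35a), power-iterate on L_omega* for nu*,
    recover lambda_1 by (35b), and return the omega_opt estimate (35c)."""
    D = np.diag(np.diag(A))
    L = -np.tril(A, -1)
    U = -np.triu(A, 1)
    omega = 2.0 / (1.0 + math.sqrt(1.0 - sigma_est * lam_est))            # (35a)
    L_om = np.linalg.solve(D - omega * L, omega * U - (omega - 1.0) * D)  # (5)
    z = np.ones(A.shape[0])
    for _ in range(iters):               # power method for nu* = rho(L_omega*)
        y = L_om @ z
        nu = np.linalg.norm(y, np.inf)
        z = y / nu
    lam1 = (nu + omega - 1.0) ** 2 / (omega ** 2 * nu)                    # (35b)
    return 2.0 / (1.0 + math.sqrt(1.0 - lam1))                            # (35c)

# 1-D Laplacian: tridiagonal (-1, 2, -1), 2-cyclic and consistently ordered
n = 20
A = np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
print(sigma_sor_omega(A, lam_est=0.9, sigma_est=0.9))
```

Even with the crude guesses λ* = σ* = 0.9, the recovered ω̄₁ agrees closely with the exact value 2/(1 + √(1 − cos²(π/(n+1)))) for this model problem.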
Estimates σ* approximating σ₁ can be obtained by observing the decay
rate of certain quantities, for instance

(36) σ^(t+1) = |λ^(t+1) − λ^(t)| / |λ^(t) − λ^(t−1)|,

or ratios of differences between the components of successive eigenvectors in the
iteration process of the power method (19), (20), using a suitable norm (see, for
example, [4], where the term dominance ratio is used for σ). As follows from

(18), for each jth nonzero component z_j of z^(t) approximating the eigenvector
corresponding to the dominant eigenvalue in the power method, we have that

(37) λ^(t+1) = z_j^(t+1) / z_j^(t) = λ₁ · [a₁(u₁)_j + (ε^(t+1))_j] / [a₁(u₁)_j + (ε^(t))_j] → λ₁ as t → ∞,

where

(38) (ε^(t))_j = Σ_{i=2}^n (λ_i/λ₁)^t a_i(u_i)_j.

Substituting (37) into (36), one obtains

σ^(t+1) ≈ [a₁(u₁)_j + (ε^(t−2))_j] / [a₁(u₁)_j + (ε^(t))_j] · [(ε^(t+1))_j − 2(ε^(t))_j + (ε^(t−1))_j] / [(ε^(t))_j − 2(ε^(t−1))_j + (ε^(t−2))_j],

and for t sufficiently large, a₁(u₁)_j ≫ (ε^(t))_j, so that

(39) σ^(t+1) ≈ [(ε^(t+1))_j − 2(ε^(t))_j + (ε^(t−1))_j] / [(ε^(t))_j − 2(ε^(t−1))_j + (ε^(t−2))_j].

Assume now that for any t' ≥ 1

(40) (λ₂/λ₁)^{t'} a₂(u₂)_j ≫ (λ_i/λ₁)^{t'} a_i(u_i)_j for all 3 ≤ i ≤ n.

Equation (38) can be written in the form

(41) (ε^(t))_j = (λ₂/λ₁)^t a₂(u₂)_j + (ε̄^(t))_j,

where

(42) (ε̄^(t))_j = Σ_{i=3}^n (λ_i/λ₁)^t a_i(u_i)_j.

Substituting (41) into (39) yields

σ^(t+1) ≈ { [(λ₂/λ₁)^{t+1} − 2(λ₂/λ₁)^t + (λ₂/λ₁)^{t−1}] a₂(u₂)_j + (ε̄^(t+1))_j − 2(ε̄^(t))_j + (ε̄^(t−1))_j } / { [(λ₂/λ₁)^t − 2(λ₂/λ₁)^{t−1} + (λ₂/λ₁)^{t−2}] a₂(u₂)_j + (ε̄^(t))_j − 2(ε̄^(t−1))_j + (ε̄^(t−2))_j }.

But when t > t', relation (40) implies that the (ε̄^(t))_j become sufficiently
small, and it can be concluded that

(43) σ^(t+1) → λ₂/λ₁ = σ₁.
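The estimate (36) and its limit (43) can be illustrated on a synthetic sequence whose error decays geometrically; the numbers below are invented for the demonstration.

```python
def sigma_estimates(lams):
    """Successive estimates (36): sigma^(t+1) = |l^(t+1)-l^(t)| / |l^(t)-l^(t-1)|."""
    return [abs(lams[t + 1] - lams[t]) / abs(lams[t] - lams[t - 1])
            for t in range(1, len(lams) - 1)]

# synthetic lambda^(t) sequence with error decaying like sigma1^t, sigma1 = 0.8
lam1, sigma1 = 0.95, 0.8
lams = [lam1 * (1.0 + 0.3 * sigma1 ** t) for t in range(1, 12)]
print(sigma_estimates(lams)[-1])
```

For an exactly geometric error the ratio (36) equals σ₁ at every step, in agreement with (43).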

In the calculation of ρ(ℒ₁) (or ρ(ℒ_ω)) by means of the power method
algorithm defined by (19)-(22), the notation λ_M = ρ(ℒ₁) (or ν_M = ρ(ℒ_ω))
corresponds to using the maximum norm ‖·‖_∞, and λ_E = ρ(ℒ₁) (or ν_E =
ρ(ℒ_ω)) corresponds to using the Euclidean norm ‖·‖₂ in the scaling procedure.
With these notations, (36) provides the two measures

(44a) σ_M^(t+1) = |λ_M^(t+1) − λ_M^(t)| / |λ_M^(t) − λ_M^(t−1)|,

(44b) σ_E^(t+1) = |λ_E^(t+1) − λ_E^(t)| / |λ_E^(t) − λ_E^(t−1)|.

Usually, the convergence behavior of both λ_M and σ_M has a monotone decreasing
character, whereas for λ_E and σ_E it was observed that they first increase
and then (mainly for λ_E) slowly decrease as the number of iterations
increases.
In the case of using the Euclidean norm for scaling purposes, the following
two additional measures for σ can be used:

(44c) σ_EM^(t+1) = | ‖y_E^(t+1)‖_∞ − ‖y_E^(t)‖_∞ | / | ‖y_E^(t)‖_∞ − ‖y_E^(t−1)‖_∞ |

and

(44d) σ_EE^(t+1) = | ‖y_E^(t+1) − y_E^(t)‖₂ − ‖y_E^(t) − y_E^(t−1)‖₂ | / | ‖y_E^(t) − y_E^(t−1)‖₂ − ‖y_E^(t−1) − y_E^(t−2)‖₂ |,

where the successive eigenvector iterates y_E^(t+1), y_E^(t), y_E^(t−1), and y_E^(t−2) are generated
by (19)-(22) using the Euclidean norm for scaling.
As demonstrated in the numerical experiments, the most rapid convergence is
observed for σ_EE, with a monotone increasing character, which provides
estimates approaching the true σ₁ from below.
As can be seen from Figure 1, the behavior of σ_ω near ω̄₂ is similar in
nature to the behavior of ρ(ℒ_ω) near ω̄₁. From an inspection of the slope of
the curve for σ_ω near ω̄₂, it follows that errors underestimating ω̄₂ give
larger values of σ_ω than errors (comparable in size) overestimating ω̄₂.
In the range 1 ≤ ω < ω̄₂, the value of σ_ω can be determined from (33) in
dependence on σ₁ and (ω − 1)/ν₂ (or (ω − 1)/ν₁ in the case of (33a)), and in
the range ω̄₂ ≤ ω < ω̄₁ it is defined by |ω − 1|/ν₁.
Thus, from the viewpoint of obtaining the maximum rate of convergence in
the power method, overestimating ω̄₂ is less dangerous than underestimating
it by the same amount; but as σ₁ approaches unity this becomes a more
important problem, because underestimating ω̄₂ drastically decreases the rate
of convergence.
On the other hand, underestimating ω̄₂ may be attractive for accelerating
convergence by use of the Aitken δ²-process [5]. This procedure,
also known under the name of Aitken extrapolation, is a useful tool for improving
convergence, and can be applied to any linearly convergent process (i.e., as in
(14), z^(t) = 𝒜z^(t−1)). In the case of the simple power method, the convergent
sequence {λ^(t)} for the dominant eigenvalue can be transformed into a more
rapidly convergent sequence {λ̂^(t)} by using

(45) λ̂^(t) = λ^(t−2) − (λ^(t−2) − λ^(t−1))² / (λ^(t−2) − 2λ^(t−1) + λ^(t)).

This process will be most effective if both eigenvalues ν₁ = ν₁^+ and ν₂ = ν₂^+ are
real and well separated from ν₃ = ν₃^+. As can be easily concluded from Figure 1,
this occurs when ω is close to ω̄₃, which minimizes ν₃ over all 1 ≤ ω ≤ ω̄₃ and
provides the best separation of ν₁ and ν₂ from ν₃. The distance of separation
is a decreasing function as ω increases for ω̄₃ ≤ ω < ω̄₂, and it vanishes for
ω̄₂ ≤ ω < ω̄₁, because in this region all subdominant eigenvalues have the same
absolute value. Thus, the use of σ_EE, providing an underestimated value of σ₁,
can give some advantages in the form of an increased rate of convergence when
Aitken extrapolation is applied. This aspect will be discussed and illustrated
by numerical results in the next section.
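Aitken extrapolation (45) can be sketched as follows; for an exactly geometric error it recovers the limit in one application (the sequence below is synthetic, and the function name is assumed).

```python
def aitken(lams):
    """Aitken delta^2 extrapolation (45) applied to the last three iterates."""
    l0, l1, l2 = lams[-3], lams[-2], lams[-1]
    denom = l0 - 2.0 * l1 + l2
    return l0 - (l0 - l1) ** 2 / denom if denom != 0.0 else l2

# linearly convergent sequence lam^(t) = 3 + 2 * 0.5**t; Aitken recovers the limit 3
seq = [3.0 + 2.0 * 0.5 ** t for t in range(6)]
print(aitken(seq))
```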
In conclusion, it should be stated that in the efficient use of the power method
for determining an accurate value of the optimum relaxation factor in the SOR
iterative method, the relaxation factors ω̄₂ and ω̄₃ play an important role: ω̄₂
maximizes the rate of convergence of the simple power method, whereas ω̄₃,
providing the best separation of the two dominant eigenvalues from the remaining
subdominant eigenvalues of the SOR iteration matrix, maximizes the rate
of convergence of Aitken extrapolation used as a practical technique for
improving the convergence of the power method.
3. Numerical Experiments
In this section the results of numerical experiments are presented for the
numerical solution of a two-dimensional elliptic equation of the form

(46) −D(x, y) [∂²φ/∂x² + ∂²φ/∂y²] + Σ(x, y)φ = s(x, y) for (x, y) ∈ Ω,

with

φ(x, y) = g(x, y) or ∂φ/∂n = g(x, y) for (x, y) ∈ ∂Ω,

where Ω is an open bounded region with boundary ∂Ω, n is the exterior
normal, D(x, y) > 0, and Σ(x, y) ≥ 0.
The standard finite difference discretization of (46) on a spatial mesh imposed
on Ω leads to a system of linear equations of the form

(47) Aφ = b,

where the components of φ approximate the values of φ at each mesh point
(x, y). In the case of the natural ordering of mesh points for the standard five-point
difference operator, the n × n coefficient matrix A has only five nonzero
diagonals forming a tridiagonal block structure suitable for the implementation
of the 1-line SOR algorithm, and is 2-cyclic consistently ordered [1].
Five test problems taken from the literature [6, 7] are considered, with discontinuous
coefficients D and Σ chosen to be constant in each subregion
Ω_k, and with different boundary conditions on ∂Ω for uniform and nonuniform
mesh structures.
Test Problem 1. This example, obtained by assuming D = 1 and Σ = 0 in Ω,
the unit square (0, 1) × (0, 1), with the Dirichlet boundary condition φ = 0 on
∂Ω, is usually used as a model problem in the analysis of numerical solutions
of elliptic-type problems. A square mesh of width h yields n = N²
mesh points, which is also the order of A. We assume n = 48 × 48 = 2304, as
in Problem A in [6].
Test Problem 2. In this problem (Problem B in [6]), whose domain and coeffi-
cients are depicted in Figure 2 (the numbers on the x-axis and y-axis in this

Figure 2. Test Problem 2 (Ω = (0, 97) × (0, 23); the coefficient values and mixed boundary conditions shown in the figure are not legible in this scan)

Figure 3. Test Problem 3 (Ω = (0, 4.65925) × (0, 4.65925), ∂φ/∂n = 0 on ∂Ω; the nonuniform mesh division shown in the figure is not legible in this scan)


and subsequent figures are indices of mesh lines, not values of x and y), there
is a discontinuity of the coefficients in the vertical direction, and mixed boundary
conditions are used on ∂Ω, as shown in Figure 2. The number of mesh points
is n = 96 × 24 = 2304, where h = 1 is assumed in both the horizontal and vertical
directions.
Test Problem 3. In this problem (Problem C in [6]), with n = 24 x 24 = 576
and discontinuous coefficients, a nonuniform mesh is used. The mesh division,
assumed the same in both horizontal and vertical direction, corresponds to the
mesh division used in Problem 5 given in Reference 7 of [6]. The domain,
coefficients and the mesh division are depicted in Figure 3.
Test Problem 4. This problem, taken from [7] (and analyzed in [8]), has a
strongly discontinuous D, and n = 48 x 48 = 2304 in the square mesh shown
in Figure 4.

Figure 4. Test Problem 4 (Ω = (0, 1) × (0, 1); the strongly discontinuous coefficient values shown in the figure are not legible in this scan)

Figure 5. Test Problem 5 (Ω = (0, 2.1) × (0, 2.1), ∂φ/∂n = 0 on ∂Ω; the mesh division shown in the figure is not legible in this scan)
Test Problem 5. This problem, also taken from [7] (and analyzed in [8]), has a
slightly modified mesh division, giving n = 42 x 42 = 1764, in order to keep
the number of horizontal lines divisible by 2 for convenient use of 2-line SOR
algorithms. The domain, coefficients and mesh division (assumed the same
in both directions) are depicted in Figure 5. In [7, 8] a uniform mesh with
h = 0.05 was used, giving the number of mesh points n = 43 x 43 = 1849.

For solving (47) in the above five test problems, the following line algorithms
of the SOR iterative method are used [1, 6]:
1. SLOR—1-line system,
2. S2LOR—2-line system,
3. S2LCROR—2-line cyclically reduced system.
In our computations for each problem it was assumed that s(x, y) = 0 in
(46), so that the unique solution of each discrete problem is the null vector. All
components of the starting vector φ^(0) were equal to unity, and computations
for each iterative method were continued until the maximum absolute value of
all components of the iterate φ^(t) was less than a prescribed number ε. Thus,
the stopping criterion

(48) ε^(t) = ‖φ^(t)‖_∞ < ε

can be considered as the most reliable measure of the error vector in estimating
the accuracy of the solution.
All computations were carried out on a PC in single-precision FORTRAN
for the SOR iteration (including the calculation of the coefficient matrices
A), and in double-precision FORTRAN for the power iteration. The results
of the computations are shown in Table 2 (next page).
The accurate value of λ₁ was obtained with ω = 1 when stabilization
to nine significant figures of λ_E was observed in the power method ((19)-(22),
using the Euclidean norm); I_λ and I_A are the numbers of iterations observed
in the power method without and with the Aitken extrapolation (45),
respectively; I_S is the number of SOR iterations required to satisfy the stopping
criterion (48) for two successive iterations, with ω̄₁ as the optimum relaxation
factor and for the two values ε = 10⁻⁶ and ε = 10⁻⁸.
The results obtained when using the SOR adaptive subroutine [1, pp. 368-
372] are shown under items 6, 7, and 8 of Table 2.
The data given in items 9-15 relate to computing the accurate value of ν₁
(with stabilization to nine significant figures) in the power method with the value
of ω = ω̄₂ determined from (27a), where λ₂ = {σ₁[accur]} × λ₁ and σ₁[accur],
approximated by σ_EE (defined by (44d)), was obtained during the calculation of
λ₁ in item 1 for ω = 1. Hence, by (8) and (27), the accurate value of ω̄₁ can
be found. Provided σ₁ is known, the accurate value of the optimum relaxation
factor ω_opt = ω̄₁ can thus be efficiently computed. Comparison of the numbers
of iterations I_λ (or I_A) given in items 2 and 13 illustrates the
efficiency of the power method used in the case ω = ω̄₂ for each test
problem. The values of σ̄_ω given in items 14 and 15, computed from
(ω̄₂ − 1)/ν₁ and (32), respectively, indicate the consistency of the results in all
cases, except for Test Problem 5 solved by the SLOR iterative method, where
σ₁ = 0.9944 was found to only four significant figures.
The results obtained for the Sigma-SOR algorithm are given in items 16-
27. The subdominance ratio σ₁, approximated by σ_EE, is estimated once the
stopping criterion

(49) δ^(t) = |σ_EE^(t) − σ_EE^(t−1)| < δ = 10⁻³

has been satisfied in two successive iterations in all test problems; I_EE is the
respective number of iterations required. As can be seen in Table 2, the above stopping test provides an underestimation of σ₁ in all cases except for the 2-line cyclically reduced SOR method in Test Problem 3, for which σ_EE gives a slight overestimation of σ₁. In computing ω̄₂[est] according to (27a) it was assumed that λ₂[est] = σ_EE λ₁^(t), where λ₁^(t) is the approximation of λ₁ obtained at iteration t = I_EE and using Aitken extrapolation. In item 20 the value ν_E approximating ν₁ with ω = ω̄₂[est] is obtained by satisfying the stopping criterion

(50) S^(t) = |ν_E^(t) − ν_E^(t−1)| < δ = 10⁻⁸,

which is achieved after I_E iterations without Aitken extrapolation. The corresponding values of λ_E and ω_E are given in items 22 and 23. In items 24-27 the same quantities are given when Aitken extrapolation is used. For the SLOR method in Test Problem 1 there is a small difference between ω_E and ω_A = ω̄₁, but in all remaining cases it is observed that ω_E = ω_A = ω̄₁ and I_A is smaller than I_E, as ω̄₂ is more underestimated by ω̄₂[est]; in this case the separation of λ₁ and λ₂ from the remaining eigenvalues increases and Aitken extrapolation becomes more efficient. In the case where the ω used is close to the true value of ω̄₂ (item 11), this separation of λ₁ and λ₂ from the remaining eigenvalues disappears and the numbers of iterations I_E and I_A are comparable (item 13).
Thus, with the choice δ = 10⁻³ for σ_EE and δ = 10⁻⁸ for ν_A, and with the use of Aitken extrapolation, the Sigma-SOR algorithm provides an estimate for ω_A = ω̄₁ = ω_opt to six significant figures in all considered test problems, with I_EE + I_A (items 17 and 25) being the number of iterations required for obtaining this estimate.
In all eigenvalue calculations carried out by means of the power method, all components of the starting vector z^(0) were taken to be unity.
The behavior of σ_E, σ_M, σ_EE, and σ_EM (defined by (44a, b, c, d)), representing different measures for σ₁, versus the number of iterations is depicted in Figures 6-10 for all five test problems solved by means of the SLOR iterative method. As can be seen in these figures, σ_EE converges most rapidly to σ₁. (The true value of σ₁, given in item 9 of Table 2, is marked in the figures by a straight line parallel to the x-axis.) In the initial phase of the iteration process, σ_EE provides estimates of σ₁ from below, which are helpful in using the Aitken extrapolation.
In the convergence behavior of σ_M, a decreasing character is observed as the number of iterations increases, but there are strong local variations (occurring sometimes also for σ_EM) visible in all figures, except for Test Problem 2 depicted in Figure 7. In the case of Test Problems 1 and 4 (Figures 6 and 9), it can be observed that for our starting vector z^(0), all of whose components are equal to unity, all measures considered for σ₁ tend first to λ₃/λ₁ and then to σ₁ = λ₂/λ₁ as the number of iterations increases. This is due to the fact that for the assumed starting vector z^(0) the inequality a₃ > a₂ in the representation (16) implies that, in spite of λ₂ > λ₃, the inequality |a₃λ₃^t| ≫ |a₂λ₂^t| holds for appropriately "small" values of t, so that the inequality (40) is not satisfied because t < t' (where t may not necessarily be very small if t' is very large, as occurs in the case of Test Problem 4), and σ^(t) will converge to λ₃/λ₁, the dominant term in this range of t-values.
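This masking effect is easy to reproduce with a scalar model of the expansion (16); the eigenvalues and coefficients below are invented for the illustration and are not taken from the test problems:

```python
# Scalar model of the power-iterate expansion (16):
#   x_t = a1*lam1**t + a2*lam2**t + a3*lam3**t,  with a3 >> a2.
# The deflated residue d_t = x_{t+1} - lam1*x_t contains no a1 term,
# so successive ratios d_{t+1}/d_t track whichever of lam2, lam3
# currently dominates the residue.
lam1, lam2, lam3 = 0.99, 0.95, 0.90
a2, a3 = 1e-8, 1.0

def d(t):
    # x_{t+1} - lam1*x_t, with the a1 term cancelled analytically
    return a2 * lam2**t * (lam2 - lam1) + a3 * lam3**t * (lam3 - lam1)

early = d(51) / d(50)     # a3-term still dominates: ratio near lam3 = 0.90
late  = d(801) / d(800)   # a3*lam3**t has died away: ratio near lam2 = 0.95
```

Dividing the ratios by lam1 gives the corresponding estimates λ₃/λ₁ and λ₂/λ₁ of the measures discussed above.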
Figure 6. Test Problem 1 (σ versus iteration number)
M: σ_M (eq. (44a)); E: σ_E (eq. (44b)); EM: σ_EM (eq. (44c)); EE: σ_EE (eq. (44d))

Figure 7. Test Problem 2 (σ versus iteration number)
M: σ_M (eq. (44a)); E: σ_E (eq. (44b)); EM: σ_EM (eq. (44c)); EE: σ_EE (eq. (44d))
Figure 8. Test Problem 3 (σ versus iteration number)
M: σ_M (eq. (44a)); E: σ_E (eq. (44b)); EM: σ_EM (eq. (44c)); EE: σ_EE (eq. (44d))

Figure 9. Test Problem 4 (σ versus iteration number)
M: σ_M (eq. (44a)); E: σ_E (eq. (44b)); EM: σ_EM (eq. (44c)); EE: σ_EE (eq. (44d))
Figure 10. Test Problem 5 (σ versus iteration number)
M: σ_M (eq. (44a)); E: σ_E (eq. (44b)); EM: σ_EM (eq. (44c)); EE: σ_EE (eq. (44d))
Moreover, it is interesting to notice that the convergence behavior of σ_EE and σ_M has a continuous character when passing from convergence to λ₃/λ₁ to convergence to λ₂/λ₁, whereas for σ_E and σ_EM strong deviations similar to discontinuities are observed.
It is a well-known fact that for the SOR iterative method the optimum relaxation factor ω_opt = ω̄₁, which theoretically maximizes the rate of convergence, does not provide the best results in practice. One observes the existence of a best relaxation factor ω_b (slightly greater than ω_opt) which minimizes the number of iterations for the required accuracy of the solution. Unfortunately, there is no rigorous analysis in the literature explaining the reasons for this ω_b and predicting its value. From numerical experience, it can be concluded that ω_b is a function of ω_opt and the required degree of accuracy of the solution. One observes that the following empirical formula:

(51) ln(ω_b − 1) = (1/c) ln(ω_opt − 1),

where the correction coefficient c = 1.02 when using ε = 10⁻⁶, and c = 1.01 when using ε = 10⁻⁸, provides a quite satisfactory estimate for ω_b. The use of ω_b obtained from the above formula allows us to improve the convergence. Usually, the number of iterations obtained with ω_b is about 15% less than that obtained with ω_opt for slowly convergent problems. The results obtained with ω_b for two different stopping criteria are given in items 28-31.
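Reading the empirical correction (51) as ln(ω_b − 1) = (1/c) ln(ω_opt − 1), the reading consistent with the tabulated values of ω_b, the computation can be sketched as follows; the function name is illustrative:

```python
from math import exp, log

def omega_best(omega_opt, eps):
    """Empirical best relaxation factor, reading (51) as
    ln(w_b - 1) = ln(w_opt - 1)/c, with c = 1.02 for eps = 1e-6
    and c = 1.01 for eps = 1e-8."""
    c = 1.02 if eps >= 1e-6 else 1.01
    return 1.0 + exp(log(omega_opt - 1.0) / c)

# Test Problem 1: omega_opt = 1.83407 (item 3 of Table 3)
wb = omega_best(1.83407, 1e-6)   # close to 1.83704 (item 28)
```

With the item-3 values of ω̄₁ and c = 1.02, this reproduces the item-28 values of ω_b in Table 3 to within a unit in the last tabulated digit.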
The deterioration in the rate of convergence resulting from using an inaccurate value of ω_opt is strongly dependent on the closeness of ρ(ℒ₁) to unity, and it seems reasonable that this dependence should be taken into consideration when estimating ω_opt a priori. The nature of calculating ρ(ℒ₁) by means of the power method is such that the first few significant figures of ρ(ℒ₁) are rapidly fixed at the beginning of the power iterations, whereas convergence to the next figures begins to be governed by the subdominance ratio σ₁. The behavior of ρ(ℒ₁^(t)) versus the number of power iterations for Test Problems 1 and 2 is depicted in Figure 11, where the dashed curves (denoted by 1a and 2a) correspond to using Aitken extrapolation for accelerating the convergence in the power method.

Figure 11. Test Problems 1 and 2 (ρ(ℒ₁^(t)) versus iteration number)
In the determination of ω_opt based on a priori estimates for ρ(ℒ₁), the application of the stopping criterion

(52) S^(t) = |λ_A^(t) − λ_A^(t−1)| < δ = 10⁻³ |1 − λ_A^(t)|,

where λ_A^(t) is an approximation of λ₁ = ρ(ℒ₁) in the power iteration t using the Aitken extrapolation, yields results strongly competitive with the SOR adaptive procedure [1] when the values of ρ(ℒ₁) are close to unity.
In items 32-35 of Table 3, results are given for all test problems solved by the SLOR method in which the estimate of ω_opt is based on the computation of λ₁ = ρ(ℒ₁) by using the stopping criterion (52); the remaining items, quoted from Table 2, are given for comparison purposes.

Table 3. Results obtained using the "dynamic" stopping criterion (52)

                              Test        Test        Test        Test        Test
                           Problem 1   Problem 2   Problem 3   Problem 4   Problem 5
 1. λ₁[accur]            0.991815239 0.998951986 0.999961143 0.999983580 0.999956430
 2. I                            650         462         571         329         145
 3. ω̄₁                      1.83407     1.93728     1.98761     1.99193     1.98689
 4. I_S [ε = 10⁻⁶]               106         269        1347        2048        1281
 6. ω_adp                    1.83328     1.93587     1.98765     1.99186     1.98700
 7. I [ε = 10⁻⁶]                 127         343        1853        3090        1738
17. I_EE                          39          46          22          25          22
25. I_A                          100          67          76          69          27
28. ω_b                      1.83704     1.93847     1.98785     1.99209     1.98715
29. I_S [ε = 10⁻⁶]                99         229        1139        1736        1077
32. λ₁ [δ = 10⁻³(1−λ₁)]  0.991816463 0.998929054 0.999960952 0.999983490 0.999956396
33. I_A                           35          96         155         101          18
34. ω_est                    1.83408     1.93662     1.98758     1.99191     1.98688
35. I_S [ε = 10⁻⁶]               106         283        1365        2077        1286

Table 4 summarizes the results obtained for different computational strategies implemented in four programs used for solving the test problems. The data given in this table represent the numbers of iterations required to obtain the solution for which the stopping criterion ||φ^(t)||∞ < 10⁻⁶ is satisfied for two successive iterations. The numbers given in parentheses correspond to the number of iterations required to compute the relaxation factor ω for a given strategy.

Table 4. Comparison of computational strategies

  Program        Test        Test        Test        Test        Test
    No.        Problem 1   Problem 2   Problem 3   Problem 4   Problem 5

    A1            127         343        1853        3090        1738
    B1         106 (35)    283 (96)  1365 (155)  2077 (101)   1286 (18)
    C1          99 (139)   229 (113)   1139 (98)   1736 (84)   1077 (49)

    A2             83         208        1132        2047        1154
    B2          72 (21)    193 (58)    866 (74)   1501 (55)     890 (8)
    C2          66 (82)    160 (91)    740 (93)   1284 (69)    759 (80)
    D2             61         169         752

 2-line cyclically reduced:
    A3             70         195         997        1705         925
    B3          63 (17)    162 (45)    733 (70)   1270 (43)     775 (4)
    C3          58 (82)    136 (87)    634 (75)   1103 (62)    654 (99)
    D3             52         145         681
The A program uses the SOR adaptive procedure [1]. In the B program the estimate of ω_opt is based on computing λ₁ = ρ(ℒ₁) by using the stopping criterion (52) and Aitken extrapolation as an acceleration procedure. The C program uses the Sigma-SOR algorithm for computing ω_b. The numbers attached to the programs correspond to the applied solution methods, which are specified in the first column of the table. In addition, the results from [6] are quoted under the D2 program, which uses the 2-line cyclic Chebyshev method applied to the original system, and the D3 program, which uses the 2-line cyclic Chebyshev method applied to the cyclically reduced system. Both these programs were used in [6] for solving Test Problems 1, 2, and 3 only; the results from these programs for Test Problems 4 and 5 were not available.

4. Concluding Remarks
From the practical point of view, the best solution method is one that, for the required accuracy, provides the solution with the minimum total arithmetical effort, which is what mainly determines the cost of computations. In the case of the SOR iterative method, the arithmetical effort is roughly proportional to the number of SOR iterations required for obtaining the solution with a given degree of accuracy, and the number of power iterations required for estimating the appropriate relaxation factor ω. Since the numbers of arithmetical operations per iteration in the SOR and power methods are comparable (the power method defined by (19)-(22) needs a few additional arithmetical operations for computing the Euclidean norm and for division by this norm), the efficiency of the assumed solution method can be measured in terms of the total number of iterations. Moreover, this total number of iterations, as well as the fraction of both SOR and power iterations, may change from problem to problem.
The number of SOR iterations is roughly inversely proportional to the rate of convergence, where the deterioration of the convergence rate resulting from using an inaccurate value of ω_opt is strongly dependent on the closeness of ρ(ℒ₁) to unity. The speed of convergence in the power method is governed by the value of the subdominance ratio σ_ω, which determines the rate of convergence similarly as ρ(ℒ_ω) does in the SOR method, and the number of power iterations is also strongly dependent on the closeness of σ₁ to unity, or on the degree of separation of the two dominant eigenvalues from the remaining ones if the Aitken extrapolation is used. Thus, it seems that the selection and application of the iterative strategy for solving different problems should be based more on the analysis of results obtained in practice than on theoretical considerations.
In the test problems considered in this work, representing a class of nuclear engineering problems, we have

0.978 < ρ(ℒ₁) < 0.99999 and 0.96 < σ₁ < 0.995,

so that the analysis of numerical results obtained for these problems should also be conclusive for solving large-scale scientific problems.
It seems that in the selection of a computational strategy for solving elliptic-type problems, the SOR adaptive technique (implemented in the A1, A2, and A3 programs) is favored in the literature [1, 2, 3, 4, 6] as the more efficient solution method in comparison with the computational strategy based on an a priori estimate of ω_opt. However, the numerical experiments on all test problems considered here show that the B1, B2, and B3 programs, in which an a priori estimate for ω_opt is obtained by calculating λ₁ = ρ(ℒ₁) with the power method accelerated by Aitken extrapolation and using the stopping criterion (52), are competitive with the A1, A2, and A3 programs, especially when ρ(ℒ₁) is close to unity.
As can be seen in Table 4, in the case of Test Problem 1 the B1 program needs 14 iterations more (that is, about 10% more) than the A1 program. But for Test Problem 4 the difference is equal to 912 iterations in favor of the B1 program, which corresponds to about 40% more iterations in the A1 program. Since both test problems have the same size (2304 mesh points), the advantages resulting from solving Test Problem 4 by the B1 program in comparison to the A1 program can be estimated by this difference of iterations, which in this case is about seven times greater than the total number of iterations required for solving Test Problem 1 by the A1 or B1 programs.
Suppose that both problems are solved with an a priori estimate for ω_opt based on using the accurate value of ρ(ℒ₁) given in item 1 of Table 3 and obtained with 650 and 329 iterations (item 2 of Table 3) for Test Problems 1 and 4, respectively. Then, in the case of Test Problem 1 the solution is obtained with 106 iterations (the same number of iterations as for the B1 solution), but the total number of iterations is increased to 756, that is, 615 iterations more than for the B1 solution given in Table 4. For Test Problem 4 the total number of iterations (accompanied by a small decrease of SOR iterations) is increased to 2377, that is, 199 iterations more than for the B1 solution but still much less than for the A1 solution. A similar behavior can be observed when comparing the results of Table 4 given for the A2 and A3 programs with those given for the B2 and B3 programs, respectively.
From the above comparisons, it is apparent that in the solution method based on a priori estimates for ω_opt, the main difficulty lies in the choice of the degree of accuracy appropriate for estimating ρ(ℒ₁) in a given problem; it is probably for this reason that a priori estimates for ω_opt are given less attention in the literature. However, as can be concluded from the results given in Table 4 for the B1, B2, and B3 programs, the simple trick of using the stopping criterion (52), conditioned by the closeness of ρ(ℒ₁) to unity, allows us in some sense to avoid this main difficulty and to make a priori estimation of ω_opt a more useful computational technique, competitive with the solution method based on using the SOR adaptive procedure [1], especially for problems in which the values of ρ(ℒ₁) are very close to unity. In the range 0.98 < ρ(ℒ₁) < 0.999, represented by Test Problems 1 and 2, the SOR adaptive procedure, discussed extensively and illustrated numerically in [1] just for this range of values of ρ(ℒ₁), provides solutions with a smaller number of iterations than in the case of using a priori estimates for ω_opt based on the stopping test (52). But as was demonstrated above for Test Problem 1, the advantages resulting from decreasing the total number of iterations have no practical significance, because in this range of spectral radii the deterioration of the convergence rate caused by using an inaccurate value of ω_opt does not strongly change the number of iterations. For the class of problems with 0.999 < ρ(ℒ₁) < 0.99999, represented by Test Problems 3, 4, and 5, the efficiency of solution becomes more sensitive to the accurate value of ω_opt as ρ(ℒ₁) approaches unity, and the computational strategy based on determining an accurate value of ω_opt prior to the SOR solution is much superior to the SOR adaptive technique, as can be seen in Table 4. In this case, the last estimate for ω_opt in the SOR adaptive technique is the most time-consuming because σ_ω becomes close to unity (see Figure 1). It is interesting to note that in the case of Test Problem 5 extremely small numbers of iterations are required to estimate ω_opt a priori in the B1, B2, and B3 programs.
In the C1, C2, and C3 programs, the Sigma-SOR algorithm defined by (35a)-(35c) is used for the a priori determination of ω_opt, whose value to six significant figures was computed with the choice of δ = 10⁻³ for approximating σ₁ by σ_EE and δ = 10⁻⁸ for approximating ν₁ by ν_A, and using the Aitken extrapolation. The detailed results are given in items 16-27 of Table 2. In the SOR iterations the best relaxation factor ω_b is used, which is computed from the relation (51) and is given in item 28 of Table 2. As can be seen in Table 4, the Sigma-SOR algorithm needs about 100 iterations for computing ω_opt to six significant figures in all test problems. For Test Problem 1 the number of iterations required to obtain this accurate estimate for ω_opt exceeds the number of SOR iterations, so that the total number of iterations in the C1, C2, and C3 programs is about two times greater than in the A1, A2, and A3 programs, respectively. However, as ρ(ℒ₁) becomes close to unity in the subsequent test problems, the efficiency of the computational strategy with the Sigma-SOR algorithm improves strongly in comparison to the former solution methods. Moreover, it is observed that in the case of Test Problems 3, 4, and 5 solved by the C1, C2, and C3 programs, the total number of iterations (needed for estimating ω_b and obtaining the solution) is smaller than the number of SOR iterations observed when using the accurate value of ω_opt = ω̄₁ (items 3 and 4 in Table 2).
The results for Test Problems 1, 2, and 3 obtained in [6] by means of the D2 program, using the 2-line cyclic Chebyshev method applied to the original system, and the D3 program, using the 2-line cyclic Chebyshev method applied to the cyclically reduced system, are given additionally in Table 4. From an inspection of these results, it is apparent that the solution efficiency of the D2 and D3 programs, which is the best in the case of Test Problem 1, decreases when going to Test Problems 2 and 3 in comparison to the convergence behavior of the C2 and C3 programs, respectively. For Test Problem 3, the C2 and C3 programs provide solutions with a total number of iterations somewhat greater than in the D2 and D3 programs. However, as follows from an exact calculation of the number of arithmetical operations for the obtained solutions, the C2 and C3 programs need somewhat less total arithmetical effort than the D2 and D3 programs, respectively. This is due to the fact that in each iteration of the D2 and D3 programs, apart from the arithmetical operations related to the solution, additional arithmetical operations are required for the computation of the Euclidean norm, whereas in the C2 and C3 programs only about 10% of the number of iterations (the numbers given in parentheses in Table 4) are related to those additional computations.
Thus, it can be concluded from the results obtained for our test problems that the Sigma-SOR algorithm, based on the important theoretical result given by (32), is a useful computational tool for the calculation of an accurate a priori estimate of ω_opt, which in turn allows one to determine the best relaxation factor ω_b from (51) when solving problems for which 0.999 < ρ(ℒ₁) < 1. In comparison to the SOR adaptive procedure, the efficiency of the Sigma-SOR algorithm increases as ρ(ℒ₁) and σ₁ become closer to unity; and it seems that for the range 0.999 < σ₁ < 1, the Sigma-SOR algorithm should be extremely efficient. In the case when the matrix problem (47) is to be solved many times for different vectors b, the advantages resulting from using ω_b obtained by means of the Sigma-SOR algorithm are obvious.
Finally, it should be mentioned that the subsequent updated values of ω_i in the SOR adaptive technique are underestimated with respect to ω_opt, and this underestimation drastically decreases the rate of convergence as ρ(ℒ₁) becomes close to unity; therefore the efficiency of the SOR adaptive procedure also decreases when ρ(ℒ₁) approaches unity.

ACKNOWLEDGMENT
The author would like to thank Drs. J. Kubowski and K. Pytel for their useful discussions and comments, as well as M. Sci. P. Jarzembowski for his expert programming assistance. Thanks are also due to the Editor, Professor W. Gautschi, for his significant contribution in revising the manuscript.

Bibliography

1. L. A. Hageman and D. Young, Applied iterative methods, Academic Press, New York, 1981.
2. B. A. Carré, The determination of the optimum accelerating factor for successive over-
relaxation, Comput. J. 4 (1961), 73-78.
3. H. E. Kulsrud, A practical technique for the determination of the optimum relaxation factor
of the successive over-relaxation method, Comm. ACM 4 (1961), 184-187.
4. L. A. Hageman and R. B. Kellogg, Estimating optimum relaxation factors for use in the
successive overrelaxation and the Chebyshev polynomial methods of iteration, WAPD-TM-
592, 1966.
5. J. H. Wilkinson, The algebraic eigenvalue problem, Oxford Univ. Press, London, 1965.
6. L. A. Hageman and R. S. Varga, Block iterative methods for cyclically reduced matrix equa-
tions, Numer. Math. 6 (1964), 106-119.
7. P. Concus, G. H. Golub, and G. Meurant, Block preconditioning for the conjugate gradient
method, SIAM J. Sci. Statist. Comput. 6 (1985), 220-252.
8. Z. I. Woznicki, On numerical analysis of conjugate gradient method, Japan J. Indust. Appl.
Math. 10 (1993), 487-519.

Institute of Atomic Energy, 05-400 Otwock-Swierk, Poland


E-mail address: r05zw@cx1.cyf.gov.pl
