A Comparison of Subspace Methods for Sylvester Equations

Mathematics Institute
Preprint No. 1183
March, 2001
Abstract
Sylvester equations $AX - XB = C$ play an important role in numerical linear algebra. For example, they arise in the computation of invariant subspaces, in control problems, as linearizations of algebraic Riccati equations, and in the discretization of partial differential equations. For small systems, direct methods are feasible. For large systems, iterative solution methods are available, like Krylov subspace methods.

It can be observed that there are essentially two types of subspace methods for Sylvester equations: one in which block matrices are treated as rigid objects (functions on a grid), and one in which the blocks are seen as a basis of a subspace.

In this short note we compare the two different types, and aim to identify which applications should make use of which solution methods.
Equations of type (B) are different in the sense that it does not really matter whether $X$ or $XF$ is produced by the numerical algorithm, where $F$ may be any basis transformation of $\mathbb{R}^k$; indeed, right-multiplication of $X$ by $F$ does not change the column span, showing that $F$ does not even have to be known explicitly. This freedom should, whenever possible, be exploited by the solution algorithms.
We remind the reader [4] that the Sylvester equation $AX - XB = C$ is non-singular if and only if $A$ and $B$ do not have an eigenvalue in common. For perturbation theory (which is different from that for general linear systems) we refer to [6].
1.1 Kronecker product formulation
Recall that any Sylvester equation can be written as an ordinary linear system of equations since $T : X \mapsto AX - XB$ is a linear mapping on $\mathbb{R}^{n \times k}$. Defining a function $\mathrm{vec}$ from the space of $n \times k$ matrices to the space of $nk$-vectors by
$$\mathrm{vec}(X) = \mathrm{vec}\,[\,x_1 \; \cdots \; x_k\,] = (x_1^\top, \ldots, x_k^\top)^\top, \qquad (1)$$
the action of $T$ can be mimicked by an ordinary left-multiplication:
$$\mathrm{vec}(T(X)) = \mathrm{vec}(AX - XB) = (I_k \otimes A - B^\top \otimes I_n)\,\mathrm{vec}(X). \qquad (2)$$
Here, $I_q$ is the $q \times q$ identity matrix and $\otimes$ the Kronecker product, which, for general matrices $Y = (y_{ij})$ and $Z = (z_{ij})$, is defined as
$$Y \otimes Z = \begin{bmatrix} y_{11}Z & \cdots & y_{1n}Z \\ \vdots & & \vdots \\ y_{n1}Z & \cdots & y_{nn}Z \end{bmatrix}. \qquad (3)$$
Observation 1.1 The Kronecker product formulation in $\mathbb{R}^{nk}$ endowed with the standard $\ell_2$-inner product is equivalent to the formulation in the space $H(n,k)$ by the identity
$$\mathrm{vec}(A)^\top \mathrm{vec}(B) = \langle A, B \rangle. \qquad (4)$$
This shows that the application of standard solution methods for linear systems to the Kronecker product formulation of a Sylvester equation results in methods that are particularly fit for equations of type (A). A small numerical check of (2) and (4) is given below.
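The following Matlab-like sketch verifies identities (2) and (4) on random data; the sizes and variable names are ours, chosen purely for illustration.

    % Check identities (2) and (4) on random test matrices.
    n = 5; k = 3;                                 % hypothetical small sizes
    A = randn(n,n); B = randn(k,k);
    X = randn(n,k); Y = randn(n,k);

    T = kron(eye(k),A) - kron(B',eye(n));         % matrix representation of T
    r2 = norm(T*X(:) - reshape(A*X - X*B, n*k, 1));  % identity (2)
    r4 = abs(X(:)'*Y(:) - trace(X'*Y));           % identity (4): vec vs. <.,.>
    disp([r2 r4])                                 % both of rounding-error size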
1.2 Basis transformations and assumptions
In theory, but practically only feasible if $k$ is small, any basis transformation $BF = FT$ of $B$ can be used to change the equation $AX - XB = C$ into
$$AY - YT = CF \quad\text{with}\quad Y = XF \quad\text{and}\quad T = F^{-1}BF. \qquad (5)$$
This shows for example that if $B$ is diagonalizable, the Sylvester equation reduces to $k$ decoupled linear systems, as illustrated in the sketch below.
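A minimal sketch of this decoupling, assuming $B$ is diagonalizable; the sizes and the use of Matlab's eig are our own illustration (note that $F$ may be complex even for real $B$).

    % Decoupling A*X - X*B = C via an eigenvector basis F of B, cf. (5).
    n = 100; k = 4;                          % hypothetical sizes
    A = randn(n,n); B = randn(k,k); C = randn(n,k);

    [F,T] = eig(B);                          % B*F = F*T with T diagonal
    G = C*F;                                 % transformed right-hand side C*F
    Y = zeros(n,k);
    for j = 1:k
        Y(:,j) = (A - T(j,j)*eye(n)) \ G(:,j);   % k decoupled linear systems
    end
    X = real(Y/F);                           % X = Y*inv(F); real for real data
    norm(A*X - X*B - C, 'fro')               % should be of rounding-error size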
We will assume that $k \leq n$ and that $k$ and $n$ are such that direct solution methods are not feasible. Hence we concentrate on iterative methods. Moreover, we assume that if $k$ is small, $B$ is not diagonalizable, since the resulting decoupling would remove the typical Sylvester features and lead to ordinary linear systems.
2 Two model problems
In order to illustrate the two dierent types of Sylvester equations mentioned in the
previous section, we will now describe two sets of model problems. The rst set of
problems depends on a parameter that changes a partial dierential equation from
diusive to convective, whereas in the second set, the matrix A can be taken from
the Harwell-Boeing collection.
2.1 A model problem of type (A): Convection-Diffusion equation
Consider the following simple convection-diffusion problem defined on a rectangular domain $\Omega$, with constant convection vector $b = (b_1, b_2)$ and right-hand side $f$,
$$-\Delta u + b \cdot \nabla u = f \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega. \qquad (6)$$
We will use a grid of rectangles on $\Omega$, where the $x_1$-direction is subdivided into $n+1$ intervals of size $h$, and the $x_2$-direction into $k+1$ intervals of size $s$. This yields $n \cdot k$ unknowns $u(ih, js)$ that can be collected in an $n \times k$ matrix $X = (x_{ij})$ with $x_{ij} = u(ih, js)$. Note that due to numbering and notational conventions, the vertical columns of $X$ represent the horizontal $x_1$-direction. The following discrete problem results,
$$\left(\frac{1}{h^2} D_n + \frac{b_1}{2h} K_n\right) X + X \left(\frac{1}{s^2} D_k + \frac{b_2}{2s} K_k\right) = F. \qquad (7)$$
Here, $D_j$, for $j$ either $n$ or $k$, is the $j \times j$ tridiagonal matrix corresponding to the $[-1\ 2\ -1]$ approximation of the second derivative, and $K_j$ the $j \times j$ tridiagonal matrix corresponding to the $[-1\ 0\ 1]$ approximation of the first derivative. Left-multiplication by these matrices represents differentiation in the $x_1$-direction, and right-multiplication differentiation in the $x_2$-direction. Finally, $F = (f_{ij}) = (f(ih, js))$. A sketch of the assembly is given below.
2.2 A model problem of type (B): Invariant Subspace problem
A typical invariant subspace problem for a given matrix $A$ would be to find a full-rank long tall matrix $Y$ and a small matrix $M$ such that $AY = YM$. If such $Y$ and $M$ are found, it also holds that $A\hat{X} = \hat{X}(\hat{X}^\top A \hat{X})$, where $\hat{X}R = Y$ symbolizes a QR-decomposition of $Y$. This is because $\Pi := I - \hat{X}\hat{X}^\top$ represents the orthogonal projection on the orthogonal complement of the column span of $\hat{X}$, so $\Pi A\hat{X} = 0$. Now suppose we have an orthogonal matrix $X_j$ that approximates the invariant subspace $\hat{X}$; then a new and hopefully better approximation $X_{j+1}$ can be found by solving
$$AX_{j+1} - X_{j+1}(X_j^\top A X_j) = AX_j - X_j(X_j^\top A X_j). \qquad (8)$$
This is one iteration of the block Rayleigh quotient method. Clearly, it is only the column span of $X_{j+1}$ that is of interest here; a small set-up sketch follows below.
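In Matlab-like terms, the data of (8) are set up as follows (sizes are hypothetical); the resulting Sylvester equation is then handed to an iterative solver such as those of Section 3.

    % Set up the Sylvester equation (8) for one block Rayleigh quotient step.
    n = 60; k = 3;                        % hypothetical small sizes
    A = randn(n,n);
    [Xj,~] = qr(randn(n,k), 0);           % current orthonormal approximation
    M = Xj'*A*Xj;                         % block Rayleigh quotient, k x k
    C = A*Xj - Xj*M;                      % right-hand side of (8)
    % Solve A*X - X*M = C for X, e.g. with FOM(I) or FOM(II),
    % and use the column span of X as the next approximation X_{j+1}.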
Remark 2.1 Another approach leads to a Sylvester equation that is neither of type (A) nor (B). Let $\Pi := I - X_j X_j^\top$. Then $X_j + Q$ with $X_j^\top Q = 0$ spans an invariant subspace if $Q$ satisfies
$$X_j^\top Q = 0 \quad\text{and}\quad \Pi A Q - Q(X_j^\top A X_j) = Q(X_j^\top A)Q - \Pi A X_j. \qquad (9)$$
This is a generalized algebraic Riccati equation [2] for $Q$. Approximations to solutions $Q$ can be found by iteration: set $Q_0 = 0$ and solve the Sylvester equations
$$X_j^\top Q_{i+1} = 0 \quad\text{and}\quad \Pi A Q_{i+1} - Q_{i+1}(X_j^\top A X_j) = Q_i(X_j^\top A)Q_i - \Pi A X_j. \qquad (10)$$
Since $Q_i$ denotes a correction to an invariant subspace approximation, the precise columns of $Q_i$ are indeed of interest. But since the columns of $X_j$ are to a certain extent arbitrary, no particular structure can be expected to be present in $Q_i$. For theory on convergence of the above and related iterations, we refer to [13, 3, 9].
so, residual correction in an $m$-dimensional Krylov subspace. In the literature, two essentially different types of Krylov subspace methods for Sylvester equations are frequently found. In the first, one Krylov subspace belonging to the operator $T$ is used to project upon. In the second, a Krylov subspace for $A$ is tensored with a (left-)Krylov subspace for $B$ and the result is used to project upon.
3.2.1 Krylov subspace methods of type (I)
Krylov subspace methods can be applied to the Kronecker product formulation (2) of a Sylvester equation. By Observation 1.1, it follows that in GCR, GMRES and FOM, a linear combination of the matrices $T(R_0), \ldots, T^m(R_0)$ is determined that approximates the initial residual $R_0$ in some sense. Explicitly, in GCR and GMRES, scalars $\gamma_1, \ldots, \gamma_m$ are determined such that
$$R^A_1 := R_0 - \sum_{j=1}^{m} \gamma_j\, T^j(R_0) \qquad (13)$$
has minimal Frobenius norm, while in the Galerkin method FOM those scalars are determined such that $R^A_1$ resulting from (13) is $\langle\cdot,\cdot\rangle$-orthogonal to $T^j(R_0)$ for all $j = 1, \ldots, m$.
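As an illustration, the minimization in (13) can be carried out naively by forming the blocks $T^j(R_0)$ explicitly and solving a small least-squares problem; in practice one would use the Arnoldi recursion of Section 3.3 instead. Names and sizes below are ours.

    % Naive realization of the GMRES-type minimization (13).
    n = 50; k = 3; m = 5;                 % hypothetical sizes
    A = randn(n,n); B = randn(k,k); R0 = randn(n,k);

    Tfun = @(X) A*X - X*B;                % the Sylvester operator T
    V = zeros(n*k, m); W = R0;
    for j = 1:m
        W = Tfun(W);                      % W = T^j(R0)
        V(:,j) = W(:);
    end
    gamma = V \ R0(:);                    % least squares in the Frobenius norm
    R1 = reshape(R0(:) - V*gamma, n, k);  % the residual of (13)
    norm(R1, 'fro')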
3.2.2 Krylov subspace methods of type (II)
The second approach, due to Hu and Reichel [7], is to associate Krylov subspaces to $A$ and $B$ separately, and to construct the tensor product space. Generally, assume that $V_p$ is an orthogonal $n \times p$ matrix and $W_q$ an orthogonal $k \times q$ matrix. Then, each $p \times q$ matrix $Y_{pq}$ induces an approximation $V_p Y_{pq} W_q^\top$ of the solution $U_0$ of $AU_0 - U_0B = R_0$ by demanding that
$$V_p^\top (A V_p Y_{pq} W_q^\top - V_p Y_{pq} W_q^\top B - R_0)\, W_q = 0. \qquad (14)$$
By the identity
$$\mathrm{vec}(V_p Y_{pq} W_q^\top) = (W_q \otimes V_p)\,\mathrm{vec}(Y_{pq}) \qquad (15)$$
it can be seen that (14) is a Galerkin projection onto the $pq$-dimensional subspace $W_q \otimes V_p$ of $\mathbb{R}^{nk}$. By choosing for $V_p$ and $W_q$ block Krylov subspaces with full-rank starting blocks $R_A$ and $R_B$ such that $R_0 = R_A R_B^\top$, (14) can be written as
$$H_A Y_{pq} - Y_{pq} H_B = (V_p^\top R_A)(W_q^\top R_B)^\top, \qquad (16)$$
where $H_A := V_p^\top A V_p$ is $p \times p$ upper Hessenberg, $H_B := W_q^\top B W_q$ is $q \times q$ upper Hessenberg, and both $V_p^\top R_A$ and $W_q^\top R_B$ are tall upper triangular matrices. It was
shown by Simoncini [8] that this Galerkin method results in a truncation of an
exact series representation of the solution in terms of block Krylov matrices and
minimal polynomials. Hu and Reichel [7] also present a minimal residual method
based on the same idea.
Remark 3.1 In the case that $k$ is small, $W_q$ may be chosen as the $k \times k$ identity matrix. The action of $B$ is then used exactly. The resulting projected equation is then
$$H_A Y_{pk} - Y_{pk} B = V_p^\top R_0. \qquad (17)$$
After computing a Schur decomposition of $B$, the Golub-Nash-Van Loan algorithm [5] can then be employed to solve the projected system; a sketch follows below.
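A sketch of that solve, using Matlab's complex Schur form to avoid the 2-by-2 bumps of the real form; for simplicity we also ignore the Hessenberg structure of $H_A$ that the algorithm of [5] would exploit.

    % Solve the projected equation (17), HA*Y - Y*B = G with G = Vp'*R0,
    % via a Schur decomposition of B and column-wise substitution.
    p = 30; k = 4;                        % hypothetical sizes
    HA = triu(randn(p,p), -1);            % p x p upper Hessenberg
    B  = randn(k,k); G = randn(p,k);

    [Q,T] = schur(B, 'complex');          % B = Q*T*Q', T upper triangular
    Gt = G*Q; Y = zeros(p,k);
    for j = 1:k                           % substitute column by column
        rhs = Gt(:,j) + Y(:,1:j-1)*T(1:j-1,j);
        Y(:,j) = (HA - T(j,j)*eye(p)) \ rhs;
    end
    Y = real(Y*Q');                       % back-transform; real for real data
    norm(HA*Y - Y*B - G, 'fro')           % check: of rounding-error size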
3.2.3 Comparison of the costs
In the Galerkin method of type (I), the subspaces consist of $m$ blocks of size $n \times k$, while the projected matrix is only of size $m \times m$. A sparse Sylvester action costs $O(kn^2 + k^2n)$ operations. The orthogonalization in step $j$ costs $j$ Frobenius inner products, each of cost $kn^2$, so up to step $m$ the construction of the Hessenberg matrix and the projected right-hand side costs $O(km^2n^2)$. Constructing the solution of the Hessenberg system costs only $O(m^2)$ operations. Producing the solution of the large system costs $O(mkn^2)$. So, assuming that $k \ll n$, the overall costs are $O(mnk)$ for storage and $O(km^2n^2)$ for computation.
In the method of type (II), the storage is $pn + qk$ for the two Krylov matrices. The construction of those matrices costs about $pn^2 + qk^2$ for the actions of sparse $A$ and $B$. Orthogonalizations are $O(p^2n^2)$ and $O(q^2k^2)$. The Hessenberg matrices are of size $p \times p$ and $q \times q$, and solution is about $O(k^3 + kp^2)$ for the Schur decomposition and solving $k$ Hessenberg systems. Again assuming that $k \ll n$, the storage costs are dominated by $O(pn)$ and the computational costs by $O(p^2n^2)$.
Observation 3.2 Assuming that $p \approx km$, which means that the number of $n$-vectors involved in the projection process is the same for both methods, the second method is slightly more computationally expensive. Put differently, with the same computational costs, the first method is more efficient in the use of memory.
3.3 Implementation of the Galerkin methods
The implementation of the Galerkin methods FOM(I) and FOM(II) of type (I) and
(II) respectively, is done through Arnoldi orthonormalization of the blocks from
which the approximation is constructed. The orthogonalization takes place in dif-
ferent inner products, and for dierent operators. For FOM(I), the operator T is
used, for FOM(II) we assume that C has full rank and put Wp equal to the identity
of size k as in Remark 3.1. The Arnoldi parts are given as MatLab-like code below.
******************** META-CODE USED IN FOM(I) *********************

function [V,H,E] = BARNOLDI(A,B,C,m)
  E = norm(C,'fro'); V{1} = C/E;       % normalize the start block
  for k=2:m
    W = A*V{k-1} - V{k-1}*B;           % Sylvester action T
    for j = 1:k-1
      H(j,k-1) = trace(V{j}'*W);       % Frobenius inner product
      W = W - V{j}*H(j,k-1);
    end
    H(k,k-1) = norm(W,'fro');
    V{k} = W/H(k,k-1);
  end

******************** META-CODE USED IN FOM(II) ********************

function [V,H,E] = BARNOLDI(A,C,m)
  [V{1},E] = qr(C,0);                  % orthonormalize the start block
  for k=2:m
    W = A*V{k-1};                      % action of A only
    for j = 1:k-1
      H{j,k-1} = V{j}'*W;              % block inner product
      W = W - V{j}*H{j,k-1};
    end
    [V{k},H{k,k-1}] = qr(W,0);         % block orthonormalization
  end

*******************************************************************
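A possible driver for the FOM(II) variant; the calling sequence and all names besides BARNOLDI are ours. It builds the block Krylov basis, solves the projected equation of the form (17) naively via the Kronecker formulation, and assembles the approximation.

    % Hypothetical driver for the FOM(II) meta-code above.
    n = 200; k = 3; m = 8;                % hypothetical sizes
    A = sprandn(n,n,0.05) - 2*speye(n);   % some sparse test matrix
    B = randn(k,k); C = randn(n,k);

    [V,H,E] = BARNOLDI(A, C, m);          % FOM(II) block Arnoldi listed above
    Vm = cat(2, V{:});                    % n x p basis matrix, p = m*k
    HA = Vm'*(A*Vm);                      % projected (block Hessenberg) matrix
    G  = Vm'*C;                           % projected right-hand side
    p  = size(HA, 1);                     % (H and E are not used here)
    y  = (kron(eye(k),HA) - kron(B',eye(p))) \ G(:);  % solve HA*Y - Y*B = G
    X  = Vm*reshape(y, p, k);             % the FOM(II) approximation
    norm(A*X - X*B - C, 'fro')            % true residual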
4 Numerical experiments
Both FOM(I) and FOM(II) will be applied to solve the Sylvester equations of type
(A) and (B) described in Section 2. First problem is the convection-diusion problem
of Section 2.1 with n = 200 and dierent values for k and h = s = 0:001. This could
correspond to a problem in a thin tube. Convection parameter was set to ten and
in the long direction only. Listed in Table 1 is the amount of
ops needed to get a
relative residual reduction of 10,6 , and also the number of iterations.
            CONVECTION-DIFFUSION PROBLEM ON A THIN DOMAIN

          k    flops(I)   iters(I)   flops(II)   iters(II)
          1     1.1e6        49       1.1e6         49
          2     2.1e6        47       1.9e6         44
          3     3.1e6        47       2.7e6         43
          4     4.4e6        48       3.3e6         41
          5     5.9e6        50       3.7e6         39

Table 1. Number of flops and number of iterations for the convection-diffusion problem of type (A).
As a second problem we took one iteration of the block Rayleigh quotient iteration, as explained in Section 2.2, applied to the matrix SHERMAN2 from the Harwell-Boeing collection. This is a matrix of size $1080 \times 1080$. Again, for different values of $k$, we computed the next iterate with both FOM(I) and FOM(II), starting with the same approximation. In Table 2 below, the results are given in the same format as for Table 1.
          k    flops(I)   iters(I)   flops(II)   iters(II)
          1     3.5e5         5       3.3e5          5
          2     2.0e6        12       8.5e5          6
          3     2.2e7        46       2.8e6         11
          4     1.2e7        26       2.2e6          7
          5     4.2e7        49       1.9e6          5
         10       ∞           ∞       5.3e6          6

Table 2. Number of flops and number of iterations for the invariant subspace problem of type (B).
4.1 Conclusions
In both cases, the method FOM(II) performed better than FOM(I). For the problem of type (A), the difference is small, and it should also be noted that in spite of the slightly larger number of flops needed for FOM(I), it was faster in time. For the problem of type (B), FOM(II) clearly outperformed FOM(I).

The main difference between the methods is that FOM(I) in general produces the exact solution only after $nk$ steps, while FOM(II), due to the exact representation of $B$, needs only $n/k$ steps to bring $A$ into upper Hessenberg form. Note that much depends on the rank of the right-hand side matrix. In all our experiments, we took it to be of full rank. If it is not of full rank, FOM(II) runs into problems because it produces a rank-deficient Krylov basis.
Acknowledgments
The research leading to this note has been made possible through a Fellowship of the
Royal Netherlands Academy of Arts and Sciences (KNAW). The support of KNAW
is gratefully acknowledged.
References
[1] R.H. Bartels and G.W. Stewart (1972). Solution of the equation AX + XB = C, Comm. ACM, 15:820–826.
[2] S. Bittanti, A.J. Laub, and J.C. Willems (Eds.) (1991). The Riccati Equation, Communications and Control Engineering Series, Springer-Verlag, Berlin.
[3] J.H. Brandts (2000). A Riccati Algorithm for Eigenvalues and Invariant Subspaces, Preprint nr. 1150 of the Department of Mathematics, Utrecht University, Netherlands.
[4] G.H. Golub and C.F. van Loan (1996). Matrix Computations (third edition), The Johns Hopkins University Press, Baltimore and London.
[5] G.H. Golub, S. Nash, and C.F. van Loan (1979). A Hessenberg-Schur method for the problem AX + XB = C, IEEE Trans. Automat. Control, AC-24:909–913.
[6] N.J. Higham (1993). Perturbation theory and backward error for AX - XB = C, BIT, 33:124–136.
[7] D.Y. Hu and L. Reichel (1992). Krylov subspace methods for the Sylvester equation, Linear Algebra Appl., 172:283–314.
[8] V. Simoncini (1996). On the numerical solution of AX - XB = C, BIT, 36(4):182–198.
[9] V. Simoncini and M. Sadkane (1996). Arnoldi-Riccati method for large eigenvalue problems, BIT, 36(3):579–594.
[10] G.L.G. Sleijpen and H.A. van der Vorst (1996). A Jacobi-Davidson iteration method for linear eigenvalue problems, SIAM J. Matrix Anal. Appl., 17:401–425.
[11] E. de Souza and S.P. Bhattacharyya (1981). Controllability, observability and the solution of AX - XB = C, Linear Algebra Appl., 39:167–188.
[12] G. Starke and W. Niethammer (1991). SOR for AX - XB = C, Linear Algebra Appl., 154–156:355–375.
[13] G.W. Stewart (1973). Error and perturbation bounds for subspaces associated with certain eigenvalue problems, SIAM Review, 15(4):727–764.
[14] G.W. Stewart and J.G. Sun (1990). Matrix Perturbation Theory, Academic Press, London.