arXiv:physics/0610206v3 [physics.comp-ph] 4 Jul 2008

Efficient numerical diagonalization of hermitian 3×3 matrices

Joachim Kopp*
Max-Planck-Institut für Kernphysik, Postfach 10 39 80, 69029 Heidelberg, Germany
* Email: jkopp@mpi-hd.mpg.de

A very common problem in science is the numerical diagonalization of symmetric or hermitian 3×3 matrices. Since standard "black box" packages may be too inefficient if the number of matrices is large, we study several alternatives. We consider optimized implementations of the Jacobi, QL, and Cuppen algorithms and compare them with an analytical method relying on Cardano's formula for the eigenvalues and on vector cross products for the eigenvectors. Jacobi is the most accurate, but also the slowest method, while QL and Cuppen are good general purpose algorithms. The analytical algorithm outperforms the others by more than a factor of 2, but becomes inaccurate or may even fail completely if the matrix entries differ greatly in magnitude. This can mostly be circumvented by using a hybrid method, which falls back to QL if conditions are such that the analytical calculation might become too inaccurate. For all algorithms, we give an overview of the underlying mathematical ideas, and present detailed benchmark results. C and Fortran implementations of our code are available for download from http://www.mpi-hd.mpg.de/globes/3x3/.

1. INTRODUCTION

In many scientific problems, the numerical diagonalization of a large number of symmetric or hermitian 3×3 matrices plays a central role. For a matrix A, this means calculating a set of eigenvalues λᵢ and eigenvectors vᵢ satisfying

  A vᵢ = λᵢ vᵢ.                                                     (1)

An example from classical mechanics or molecular science is the determination of the principal axes of a solid object [1]. The author's interest in the problem arises from the numerical computation of neutrino oscillation probabilities in matter [2-4], which requires the diagonalization of the Hamiltonian operator

  H = U \begin{pmatrix} 0 & & \\ & Δm²₂₁ & \\ & & Δm²₃₁ \end{pmatrix} U†
      + \begin{pmatrix} V & & \\ & 0 & \\ & & 0 \end{pmatrix}.      (2)

Here, U is the leptonic mixing matrix, Δm²₂₁ and Δm²₃₁ are the differences of the squared neutrino masses, and V is the MSW (Mikheyev-Smirnov-Wolfenstein) potential describing coherent forward scattering in matter. If certain non-standard physics contributions are considered, the MSW matrix can also contain more than one non-zero entry [5].

There exist many publicly available software packages for the calculation of matrix eigensystems, e.g. LAPACK [6], the GNU Scientific Library [7], or the Numerical Recipes algorithms [8]. These packages exhibit excellent accuracy, but being designed mainly for very large matrices, they may produce a lot of computational overhead in the simple 3×3 case. This overhead comes partly from the algorithms themselves, and partly from the implementational details.

In this letter, we will study the performance of several algorithms which were optimized specifically for 3×3 matrices. We will discuss the well-known Jacobi, QL, and Cuppen algorithms, and compare their speed and accuracy to that of a direct analytical calculation using Cardano's formula for the eigenvalues, and vector cross products for the eigenvectors. The application of Cardano's formula to the 3×3 eigenproblem has been suggested previously in [9], and formulas for the eigenvectors based on the computation of the Euler angles have been presented in [10].

The outline of the paper is as follows: In Secs. 2 and 3, we will describe the mathematical background of the considered algorithms as well as the most important implementational issues, and discuss their numerical properties. In Sec. 4, we will briefly mention some other algorithms capable of solving the 3×3 eigenproblem, and give reasons why we do not consider them to be the optimal choice for such small matrices. Our purely theoretical discussion will be complemented in Sec. 5 by the presentation of detailed benchmark results. Finally, we will draw our conclusions in Sec. 6. The appendix contains two alternative derivations of Cardano's formulas, and the documentation of our C and Fortran code, which is available for download from http://www.mpi-hd.mpg.de/globes/3x3/.

2. ITERATIVE ALGORITHMS

2.1. The Jacobi method

One of the oldest methods for the diagonalization of an arbitrary symmetric or hermitian n×n matrix A is the Jacobi algorithm. Discussions of this algorithm can be found in [8, 11-14]. Its basic idea is to iteratively zero the off-diagonal elements of A by unitary transformations of the form

  P_pq = \begin{pmatrix}
    1 &        &            &        &           &        &   \\
      & \ddots &            &        &           &        &   \\
      &        & c          & \cdots & s e^{iα}  &        &   \\
      &        & \vdots     & 1      & \vdots    &        &   \\
      &        & -s e^{-iα} & \cdots & c         &        &   \\
      &        &            &        &           & \ddots &   \\
      &        &            &        &           &        & 1
  \end{pmatrix}.                                                    (3)

The matrices P_pq differ from the unit matrix only in the (pp), (pq), (qp), and (qq) elements; c and s are required to satisfy s² + c² = 1, and can thus be expressed in the form

  c = cos θ,   s = sin θ                                            (4)

for some real angle θ. The complex phase α is absent if A has only real entries. θ and α are chosen in such a way that the similarity transformation

  A → P_pq† A P_pq                                                  (5)

eliminates the (pq) element (and thus also the (qp) element) of A. Of course it will in general become nonzero again in the next iteration, where p or q is different, but one can show that the iteration (5) converges to a diagonal matrix, with the eigenvalues of A along the diagonal, if p and q cycle through the rows or columns of A [14]. The normalized eigenvectors are given by the columns of the matrix

  Q = P_{p₁q₁} P_{p₂q₂} P_{p₃q₃} ⋯ .                                (6)
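
To make the iteration concrete, the following is a minimal sketch of a cyclic Jacobi sweep for the real symmetric 3×3 case. The function and variable names are ours and serve only as an illustration; the published routines described in Appendix B additionally handle the complex phase α and use a more refined threshold strategy.

#include <math.h>
#include <float.h>

/* Minimal cyclic Jacobi diagonalization of a real symmetric 3x3 matrix.
 * The rotation angle is chosen so that the (p,q) element is zeroed,
 * cf. Eqs. (3)-(5); the rotations are accumulated in Q, cf. Eq. (6).
 * On success, w holds the eigenvalues and the columns of Q the
 * normalized eigenvectors. A is overwritten. Returns 0 on success. */
static int jacobi3_sketch(double A[3][3], double Q[3][3], double w[3])
{
  double norm2 = 0.0;                   /* squared Frobenius norm, used to
                                           scale the convergence threshold */
  for (int i = 0; i < 3; i++)
    for (int j = 0; j < 3; j++) {
      Q[i][j] = (i == j) ? 1.0 : 0.0;   /* start with Q = identity */
      norm2  += A[i][j] * A[i][j];
    }

  for (int sweep = 0; sweep < 30; sweep++) {
    double off = A[0][1]*A[0][1] + A[0][2]*A[0][2] + A[1][2]*A[1][2];
    if (off <= DBL_EPSILON * DBL_EPSILON * norm2) {
      for (int i = 0; i < 3; i++)
        w[i] = A[i][i];                 /* diagonal now holds eigenvalues */
      return 0;
    }
    for (int p = 0; p < 2; p++)
      for (int q = p + 1; q < 3; q++) { /* cycle through (0,1), (0,2), (1,2) */
        if (A[p][q] == 0.0)
          continue;
        double theta = 0.5 * atan2(2.0 * A[p][q], A[q][q] - A[p][p]);
        double c = cos(theta), s = sin(theta);
        double app = A[p][p], aqq = A[q][q], apq = A[p][q];
        A[p][p] = c*c*app - 2.0*s*c*apq + s*s*aqq;
        A[q][q] = s*s*app + 2.0*s*c*apq + c*c*aqq;
        A[p][q] = A[q][p] = 0.0;        /* the element we just annihilated */
        for (int r = 0; r < 3; r++) {
          if (r != p && r != q) {       /* update the remaining row/column */
            double arp = A[r][p], arq = A[r][q];
            A[r][p] = A[p][r] = c*arp - s*arq;
            A[r][q] = A[q][r] = s*arp + c*arq;
          }
          double qrp = Q[r][p], qrq = Q[r][q];
          Q[r][p] = c*qrp - s*qrq;      /* accumulate Q <- Q P_pq */
          Q[r][q] = s*qrp + c*qrq;
        }
      }
  }
  return -1;                            /* no convergence (not expected) */
}

Each rotation costs only a handful of operations here, which is why the work outside the innermost loops, discussed in Sec. 2.3 below, matters so much for 3×3 matrices.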

2.2. The QR and QL algorithms

The QR and QL algorithms are among the most widely used methods in large scale linear algebra because they are very efficient and accurate, although their implementation is a bit more involved than that of the Jacobi method [8, 12, 15]. They are only competitive if applied to real symmetric tridiagonal matrices of the form

  T = \begin{pmatrix}
    * & *      &        &   \\
    * & \ddots & \ddots &   \\
      & \ddots & \ddots & * \\
      &        & *      & *
  \end{pmatrix}.                                                    (7)

Therefore, as a preliminary step, A has to be brought to this form.

2.2.1. Reduction of A to tridiagonal form

There are two main ways to accomplish the tridiagonalization: The Givens method, which consists of successively applying plane rotations of the form of Eq. (3) (in contrast to the Jacobi reduction to diagonal form, the Givens reduction to tridiagonal form is non-iterative), and the Householder method, which we will discuss here.

A Householder transformation is defined by the unitary transformation matrix

  P = I − ω u u†                                                    (8)

with

  u = x ∓ |x| eᵢ                                                    (9)

and

  ω = (1/|u|²) (1 + x†u / u†x).                                     (10)

Here, x is arbitrary for the moment, and eᵢ denotes the i-th unit vector. From a purely mathematical point of view, the choice of sign in Eq. (9) is arbitrary, but in the actual implementation we choose it to be equal to the sign of the real part of xᵢ to avoid cancellation errors. P has the property that Px ∝ eᵢ because

  P x = (I − ω u u†) x
      = x − (1 + x†u/u†x) u u† x / (2|x|² ∓ |x|(xᵢ + xᵢ*))
      = x − (u†x + x†u)(x ∓ |x| eᵢ) / (2|x|² ∓ |x|(xᵢ + xᵢ*))
      = x − (|x|² ∓ |x| xᵢ + |x|² ∓ |x| xᵢ*)(x ∓ |x| eᵢ) / (2|x|² ∓ |x|(xᵢ + xᵢ*))
      = ±|x| eᵢ.                                                    (11)

This means that, if we choose x to contain the lower n−1 elements of the first column of A and set x₁ = 0, eᵢ = e₂, then

  P₁ A P₁ = \begin{pmatrix}
    a₁₁    & ∓|x|   & 0 & \cdots \\
    ∓|x|   & *      & * & \cdots \\
    0      & *      & * &        \\
    \vdots & \vdots &   & \ddots
  \end{pmatrix}.                                                    (12)

Note that the first row and the first column of this matrix are real even if A is not. In the next step, x contains the lower n−2 elements of the second column of P₁AP₁, x₁ = x₂ = 0, and eᵢ = e₃, so that the second row (and column) is brought to the desired form, while the first remains unchanged. This is repeated n−1 times, until A is fully tridiagonal and real.

For the actual implementation of the Householder method, we do not calculate the matrix product P A P directly, but instead evaluate

  p = ω A u,                                                        (13)
  K = (ω/2) u† p,                                                   (14)
  q = p − K u.                                                      (15)

With these definitions, we have the final expression

  P A P = P (A − p u†)
        = A − p u† − u p† + 2K u u†
        = A − q u† − u q†.                                          (16)

Note that in the last step we have made use of the fact that K is real, as can be seen from Eqs. (13) and (14), and from the hermiticity of A.
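
For a 3×3 matrix, a single transformation of this kind already produces the tridiagonal form. The following is a minimal sketch for the real symmetric case, where ω = 2/|u|² is real and only the lower 2×2 block needs explicit updating (the names are ours; the published ZHETRD3 routine of Appendix B and its real counterpart implement the general scheme):

#include <math.h>

/* Reduce a real symmetric 3x3 matrix to tridiagonal form with one
 * Householder transformation, following Eqs. (8)-(16). On exit, d holds
 * the diagonal, e the off-diagonal of T, and Q the transformation with
 * A = Q T Q^T. The first row/column transforms analytically to
 * (d[0], -sign(x1)|x|, 0), so only the lower 2x2 block is updated. */
static void tred3_sketch(const double A[3][3], double Q[3][3],
                         double d[3], double e[2])
{
  /* x = lower two elements of the first column, x1 = 0; cf. Eq. (12) */
  double x1 = A[1][0], x2 = A[2][0];
  double norm = sqrt(x1*x1 + x2*x2);

  /* u = x + sign(x1)|x| e; the sign choice avoids cancellation, Eq. (9) */
  double u1 = x1 + (x1 >= 0.0 ? norm : -norm);
  double u2 = x2;
  double u_sq = u1*u1 + u2*u2;
  double omega = (u_sq > 0.0) ? 2.0 / u_sq : 0.0;   /* Eq. (10), real case */

  /* p = omega A u restricted to rows 1,2 (u has no 0-component), Eq. (13) */
  double p1 = omega * (A[1][1]*u1 + A[1][2]*u2);
  double p2 = omega * (A[2][1]*u1 + A[2][2]*u2);
  double K  = 0.5 * omega * (u1*p1 + u2*p2);        /* Eq. (14) */
  double q1 = p1 - K*u1, q2 = p2 - K*u2;            /* Eq. (15) */

  /* A -> A - q u^T - u q^T in the lower 2x2 block, Eq. (16) */
  d[0] = A[0][0];
  d[1] = A[1][1] - 2.0*q1*u1;
  d[2] = A[2][2] - 2.0*q2*u2;
  e[0] = (x1 >= 0.0 ? -norm : norm);                /* new (2,1) element */
  e[1] = A[1][2] - q1*u2 - u1*q2;

  /* Q = P = I - omega u u^T, embedded in the lower 2x2 block */
  Q[0][0] = 1.0; Q[0][1] = Q[0][2] = Q[1][0] = Q[2][0] = 0.0;
  Q[1][1] = 1.0 - omega*u1*u1;
  Q[1][2] = Q[2][1] = -omega*u1*u2;
  Q[2][2] = 1.0 - omega*u2*u2;
}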

2.2.2. The QL algorithm for real tridiagonal matrices

The QL algorithm is based on the fact that any real matrix can be decomposed into an orthogonal matrix Q and a lower triangular matrix L according to

  A = Q L.                                                          (17)

Equivalently, one could also start from a decomposition of the form A = QR, with R being upper triangular, to obtain the QR algorithm, which has similar properties. For tridiagonal A, the QL decomposition is most efficiently calculated by a sequence of plane rotations of the form of Eq. (3). The iteration prescription is

  A → Qᵀ A Q,                                                       (18)

but to accelerate convergence it is advantageous to use the method of shifting, which means decomposing A − kI instead of A. In each step, k is chosen in such a way that the convergence to zero of the uppermost non-zero off-diagonal element of A is maximized (see [15] for a discussion of the shifting strategy and for corresponding convergence theorems). Since in practice the subtraction of k from the diagonal elements of A can introduce large numerical errors, the most widely used form of the QL algorithm is one with implicit shifting, where not all of these differences need to be evaluated explicitly, although the method is mathematically equivalent to the ordinary QL algorithm with shifting.
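
As an illustration, here is a sketch of the implicitly shifted QL iteration for a real symmetric tridiagonal 3×3 matrix, eigenvalues only, modeled on the tqli routine of Ref. [8]. The names and the convergence threshold are ours; the published QL routines of Appendix B also accumulate the eigenvectors.

#include <math.h>
#include <float.h>

/* QL iteration with implicit shifts for a real symmetric tridiagonal
 * 3x3 matrix, eigenvalues only. d[0..2] holds the diagonal, e[0..1] the
 * off-diagonal. On exit, d contains the (unsorted) eigenvalues and e is
 * destroyed. Returns 0 on success, -1 on failure to converge. */
static int ql3_sketch(double d[3], double e[2])
{
  for (int l = 0; l < 2; l++) {
    int iter = 0, m;
    do {
      /* Look for a negligible off-diagonal element; the matrix then
       * splits there, and d[l] is an eigenvalue when m == l */
      for (m = l; m < 2; m++) {
        double dd = fabs(d[m]) + fabs(d[m+1]);
        if (fabs(e[m]) <= DBL_EPSILON * dd)
          break;
      }
      if (m != l) {
        if (iter++ == 30)
          return -1;
        /* Form the implicit shift k from the leading 2x2 block */
        double g = (d[l+1] - d[l]) / (2.0 * e[l]);
        double r = hypot(g, 1.0);
        g = d[m] - d[l] + e[l] / (g + copysign(r, g));
        double s = 1.0, c = 1.0, p = 0.0;
        int early = 0;
        /* Chase the off-diagonal entry back up with plane rotations
         * of the form of Eq. (3) */
        for (int i = m - 1; i >= l; i--) {
          double f = s * e[i];
          double b = c * e[i];
          r = hypot(f, g);
          e[i+1] = r;
          if (r == 0.0) {               /* rotation annihilated early */
            d[i+1] -= p;
            e[m] = 0.0;
            early = 1;
            break;
          }
          s = f / r;
          c = g / r;
          g = d[i+1] - p;
          r = (d[i] - g) * s + 2.0 * c * b;
          p = s * r;
          d[i+1] = g + p;
          g = c * r - b;
        }
        if (early)
          continue;
        d[l] -= p;
        e[l] = g;
        e[m] = 0.0;                     /* deflate the converged block */
      }
    } while (m != l);
  }
  return 0;
}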

2.3. Efficiency and accuracy of the Jacobi and QL algorithms

As we have mentioned before, one of the main benefits of the QL algorithm is its efficiency: For matrices of large dimension n, it requires ∼30n² floating point operations (combined multiply/add) if only the eigenvalues are to be computed, or ∼6n³ operations if also the complex eigenvectors are desired [8]. Additionally, the preceding complex Householder transformation requires 8n³/3 resp. 16n³/3 operations. In contrast, the complex Jacobi algorithm takes about 3n² to 5n² complex Jacobi rotations, each of which involves 12n operations for the eigenvalues, or 24n operations for the complete eigensystem. Therefore, the total workload is 35n³-60n³ resp. 70n³-120n³ operations.

For the small matrices considered here, these estimates are not reliable. In particular, the QL method suffers from the fact that the first few eigenvalues require more iterations than the later ones, since for these the corresponding off-diagonal elements have already been brought close to zero in the preceding steps. Furthermore, the operations taking place before and after the innermost loops will play a role for small matrices. Since these are more complicated for QL than for Jacobi, they will give an additional penalty to QL. For these reasons we expect the performance bonus of QL over Jacobi to be smaller for 3×3 matrices than for larger problems.

The numerical properties of both iterative methods are independent of the matrix size and have been studied in great detail by others, so we will only give a brief overview here. For real symmetric positive definite matrices, Demmel and Veselić have shown that the Jacobi method is more accurate than the QL algorithm [16, 17]. In particular, if the eigenvalues are distributed over many orders of magnitude, QL may become inaccurate for the small eigenvalues. In the example given in [17], the extremely fast convergence of this algorithm is also its main weakness: In the attempt to bring the lowest diagonal element close to the smallest eigenvalue of the matrix, a difference of two almost equal numbers has to be taken.

If the requirement of positive definiteness is omitted, one can also find matrices for which QL is more accurate than Jacobi [18].

3. NON-ITERATIVE ALGORITHMS

3.1. Direct analytical calculation of the eigenvalues

For 3×3 matrices, the fastest way to calculate the eigenvalues is by directly solving the characteristic equation

  P(λ) = |A − λI| = 0.                                              (19)

If we write

  A = \begin{pmatrix}
    a₁₁  & a₁₂  & a₁₃ \\
    a₁₂* & a₂₂  & a₂₃ \\
    a₁₃* & a₂₃* & a₃₃
  \end{pmatrix},                                                    (20)

Eq. (19) takes the form

  P(λ) = λ³ + c₂λ² + c₁λ + c₀ = 0                                   (21)

with the coefficients

  c₂ = −a₁₁ − a₂₂ − a₃₃,                                            (22)
  c₁ = a₁₁a₂₂ + a₁₁a₃₃ + a₂₂a₃₃ − |a₁₂|² − |a₁₃|² − |a₂₃|²,          (23)
  c₀ = a₁₁|a₂₃|² + a₂₂|a₁₃|² + a₃₃|a₁₂|² − a₁₁a₂₂a₃₃ − 2 Re(a₁₃* a₁₂ a₂₃).   (24)

To solve Eq. (21), we follow the method proposed by del Ferro, Tartaglia, and Cardano in the 16th century [19]. In Appendix A we will discuss two alternative approaches and show that they lead to the same algorithm as Cardano's method if numerical considerations are taken into account.

Cardano's method requires first transforming Eq. (21) to the form

  x³ − 3x = t,                                                      (25)

by defining

  p = c₂² − 3c₁,                                                    (26)
  q = −(27/2) c₀ − c₂³ + (9/2) c₂ c₁,                               (27)
  t = 2 p^{−3/2} q,                                                 (28)
  x = (3/√p) (λ + (1/3) c₂).                                        (29)

It is easy to check that a solution to Eq. (25) is then given by

  x = 1/u + u,                                                      (30)

with

  u = ∛( t/2 ± √(t²/4 − 1) ).                                       (31)

There is a sixfold ambiguity in u (two choices for the sign of the square root and three solutions for the complex cube root), but it reduces to the expected threefold ambiguity in x.

To achieve optimal performance, we would like to avoid complex arithmetic as far as possible. Therefore, we will now show that p, and thus t, are always real. We know from linear algebra that the characteristic polynomial P(λ) of the hermitian matrix A must have three real roots, which is only possible if the stationary points of P(λ),

  λ̃₁/₂ = −(1/3) c₂ ± (1/3) √(c₂² − 3c₁) = −(1/3) c₂ ± (1/3) √p,

are real. Since c₂ is real, this implies that also p must be real. Furthermore, from the same argument, we have the requirement that P(λ̃₁) ≥ 0 ≥ P(λ̃₂), which in turn implies that −2 ≤ t ≤ 2. Therefore, √(t²/4 − 1) is always purely imaginary, and from this it is easy to see that |u| = 1. Therefore we can write u = e^{iφ}, with

  φ = (1/3) arctan( √|t²/4 − 1| / (t/2) )
    = (1/3) arctan( √(p³ − q²) / q )
    = (1/3) arctan( √( 27 ( (1/4) c₁² (p − c₁) + c₀ (q + (27/4) c₀) ) ) / q ).   (32)

The last step is necessary to improve the numerical accuracy, which would suffer dramatically if we computed the difference p³ − q² directly.

When evaluating Eq. (32), care must be taken to correctly resolve the ambiguity in the arctan by taking into account the sign of q: For q > 0, the solution must lie in the first quadrant, for q < 0 it must be located in the second. In contrast to this, solutions differing by multiples of 2π are equivalent, so x can take three different values,

  x₁ = 2 cos φ,
  x₂ = 2 cos(φ + 2π/3) = −cos φ − √3 sin φ,                         (33)
  x₃ = 2 cos(φ − 2π/3) = −cos φ + √3 sin φ.

These correspond to the three eigenvalues of A:

  λᵢ = (√p/3) xᵢ − (1/3) c₂.                                        (34)

Similar formulas have been derived previously in [9].
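
As a concrete illustration, a compact sketch of this procedure for the real symmetric case is given below. The function name is ours; the published Cardano routines (cf. Appendix B) follow the same scheme with additional care in the evaluation of the coefficients.

#include <math.h>

/* Eigenvalues of a real symmetric 3x3 matrix via Cardano's formula,
 * Eqs. (21)-(34). Sketch of the real case only. */
static void cardano3_sketch(const double A[3][3], double w[3])
{
  double a = A[0][0], b = A[1][1], c = A[2][2];
  double de = A[0][1], df = A[0][2], ef = A[1][2];

  /* Coefficients of the characteristic polynomial, Eqs. (22)-(24) */
  double c2 = -(a + b + c);
  double c1 = a*b + a*c + b*c - de*de - df*df - ef*ef;
  double c0 = a*ef*ef + b*df*df + c*de*de - a*b*c - 2.0*df*de*ef;

  double p = c2*c2 - 3.0*c1;                        /* Eq. (26) */
  double q = -13.5*c0 - c2*c2*c2 + 4.5*c2*c1;       /* Eq. (27) */

  /* sqrt(p^3 - q^2), rewritten as in Eq. (32) to avoid cancellation;
   * fmax guards against tiny negative arguments from round-off */
  double num = 27.0 * (0.25*c1*c1*(p - c1) + c0*(q + 6.75*c0));
  double phi = atan2(sqrt(fmax(num, 0.0)), q) / 3.0;  /* quadrant via atan2 */

  double sqrt_p = sqrt(fmax(p, 0.0));
  double cp = cos(phi), sp = sin(phi);

  /* Eqs. (33) and (34) */
  w[0] =  (sqrt_p/3.0) * 2.0 * cp               - c2/3.0;
  w[1] = -(sqrt_p/3.0) * (cp + sqrt(3.0)*sp)    - c2/3.0;
  w[2] = -(sqrt_p/3.0) * (cp - sqrt(3.0)*sp)    - c2/3.0;
}

Note that the atan2 call resolves the quadrant ambiguity of Eq. (32) automatically, since its first argument is non-negative by construction.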

The most expensive steps of Cardano's algorithm are the evaluations of the trigonometric functions. Nevertheless, the method is extremely fast, and will therefore be the best choice for many practical problems. However, from Eq. (34) we can see that it becomes unstable for matrices with largely different eigenvalues: In general, c₂ is of the order of the largest eigenvalue λ_max. Therefore, in order to obtain the smaller eigenvalues, considerable cancellation between (√p/3) xᵢ and (1/3) c₂ must occur, which can yield large errors and is very susceptible to tiny relative errors in the calculation of c₂.

In general, the roots of Eq. (21) can be very sensitive to small errors in the coefficients, which might arise due to cancellations in Eqs. (22)-(24). If ε is the machine precision, we can estimate the absolute accuracy of the eigenvalues to be of O(ε λ_max), which may imply a significant loss of relative accuracy for the small eigenvalues.

Consider for example the matrix

  \begin{pmatrix}
    10⁴⁰ & 10¹⁹ & 10¹⁹ \\
    10¹⁹ & 10²⁰ & 10⁹  \\
    10¹⁹ & 10⁹  & 1
  \end{pmatrix},                                                    (35)

which has the (approximate) eigenvalues 10⁴⁰, 10²⁰, and 1. However, Cardano's method yields 10⁴⁰, 5·10¹⁹, and −5·10¹⁹. Note that for the matrix (35), also the QL algorithm has problems and delivers one negative eigenvalue. Only Jacobi converges to a reasonable relative accuracy. See [16] for a discussion of this.

3.2. Direct analytical calculation of the eigenvectors

Once the eigenvalues λᵢ of A have been computed, the eigenvectors vᵢ can be calculated very efficiently by using vector cross products, which are a unique tool in three-dimensional space.

The vᵢ satisfy by definition

  (A − λᵢ I) · vᵢ = 0.                                              (36)

Taking the hermitian conjugate of this equation and multiplying it with an arbitrary vector x ∈ ℂ³, we obtain

  vᵢ† (A − λᵢ I) x = 0.                                             (37)

In particular, if x is the j-th unit vector eⱼ, this becomes

  vᵢ† (Aⱼ − λᵢ eⱼ) = 0   ∀j,                                        (38)

where Aⱼ denotes the j-th column of A. Consequently, as long as A₁ − λᵢe₁ and A₂ − λᵢe₂ are linearly independent, we have

  vᵢ = [ (A₁ − λᵢ e₁) × (A₂ − λᵢ e₂) ]*.                            (39)

In the special case that A₁ − λᵢe₁ = μ (A₂ − λᵢe₂), vᵢ is immediately given by

  vᵢ = (1/√(1 + |μ|²)) \begin{pmatrix} 1 \\ −μ \\ 0 \end{pmatrix}.  (40)
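
In code, the generic branch of this procedure is very short. The following sketch shows it for the real symmetric case, where the complex conjugation in Eq. (39) is trivial (our names; the published vector product routines of Appendix B add the special-case handling described next):

#include <math.h>

/* Eigenvector of a real symmetric 3x3 matrix for a given eigenvalue,
 * via the cross product formula, Eq. (39). Generic branch only. */
static void eigenvector3_sketch(const double A[3][3], double lambda,
                                double v[3])
{
  /* c1 = A_1 - lambda e_1 and c2 = A_2 - lambda e_2,
   * the first two columns of A - lambda I */
  double c1[3] = { A[0][0] - lambda, A[0][1],          A[0][2] };
  double c2[3] = { A[0][1],          A[1][1] - lambda, A[1][2] };

  /* v = c1 x c2; cf. Eq. (39) */
  v[0] = c1[1]*c2[2] - c1[2]*c2[1];
  v[1] = c1[2]*c2[0] - c1[0]*c2[2];
  v[2] = c1[0]*c2[1] - c1[1]*c2[0];

  /* Normalize; a vanishing norm signals (nearly) parallel columns,
   * where the full algorithm switches to Eq. (40) or to the
   * degenerate-eigenvalue branch discussed below */
  double n = sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
  if (n > 0.0)
    for (int i = 0; i < 3; i++)
      v[i] /= n;
}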

When implementing the above procedure, care must be taken if there is a degenerate eigenvalue, because in this case the algorithm will only find one of the two corresponding eigenvectors. Therefore, if we detect a degenerate eigenvalue, say λ₁ = λ₂, we calculate the second eigenvector as the cross product of v₁ with one of the columns of A − λ₁I. In principle, this alternative formula would also work for non-degenerate eigenvalues, but we try to avoid it as far as possible because it abets the propagation of errors. On the other hand, we have to keep in mind that the test for degeneracy may fail if the eigenvalues have been calculated too inaccurately. If this happens, the algorithm will deliver wrong results.

The calculation of the third eigenvector can be greatly accelerated by using the formula v₃ = (v₁ × v₂)*. Of course, this is again vulnerable to error propagation, but it turns out that this is usually tolerable.

For many practical purposes that require only moderate accuracy, the vector product algorithm is the method of choice because it is considerably faster than all other approaches. However, the limitations to its accuracy need to be kept in mind. First, the eigenvectors suffer from errors in the eigenvalues. Under certain circumstances, these errors can even be greatly enhanced by the algorithm. For example, the matrix

  \begin{pmatrix}
    10²⁰ & 10⁹  & 10⁹ \\
    10⁹  & 10²⁰ & 10⁹ \\
    10⁹  & 10⁹  & 1
  \end{pmatrix}                                                     (41)

has the approximate eigenvalues (1 + 10⁻¹¹)·10²⁰, (1 − 10⁻¹¹)·10²⁰, and 0.98. The corresponding eigenvectors are approximately

  v₁ = \begin{pmatrix} 1/√2 \\ 1/√2 \\ 10⁻¹¹ \end{pmatrix},
  v₂ = \begin{pmatrix} 1/√2 \\ −1/√2 \\ 10⁻¹¹ \end{pmatrix},
  v₃ = \begin{pmatrix} −10⁻¹¹ \\ −10⁻¹¹ \\ 1 \end{pmatrix}.         (42)

If we erroneously start the vector product algorithm with the approximate eigenvalues 10²⁰, 10²⁰, and 0.98, the error that is introduced when subtracting λ₁ from the diagonal elements is of order O(10⁹) and thus comparable to the off-diagonal elements. Consequently, the calculated eigenvectors

  v₁ = \begin{pmatrix} 1/√3 \\ 1/√3 \\ 1/√3 \end{pmatrix},
  v₂ = \begin{pmatrix} −2/√6 \\ 1/√6 \\ 1/√6 \end{pmatrix},
  v₃ = \begin{pmatrix} 0 \\ −1/√2 \\ 1/√2 \end{pmatrix}             (43)

are completely wrong.

Another flaw of the vector product algorithm is the fact that the subtractions (Aⱼ − λᵢeⱼ) and the subtractions in the evaluation of the cross products are very prone to cancellation errors.

3.3. A hybrid algorithm

To circumvent the cases where the cross product method fails or becomes too inaccurate, we have devised a hybrid algorithm, which uses the analytical calculations from Secs. 3.1 and 3.2 as the default branch, but falls back to QL if this procedure is estimated to be too error-prone. The condition for the fallback is

  ‖vᵢ‖² ≤ 2⁸ ε² Λ̃².                                                (44)

Here, vᵢ is the analytically calculated and yet unnormalized eigenvector from Eq. (39), ε is the machine precision, Λ̃ = max(λ²_max, λ_max) is an estimate for the largest number appearing in the problem, and 2⁸ is introduced as a safety factor.

Since in typical problems only a small fraction of matrices will fulfill condition (44), the hybrid approach can be expected to retain the efficiency of the analytical algorithm. In reality, it is even slightly faster because fewer conditional branches are used.
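
A sketch of the fallback test could look as follows. The names are ours, and the precise form of the estimate Λ̃ is reconstructed from the description above, so treat the constants as illustrative rather than definitive:

#include <math.h>
#include <float.h>

/* Fallback test of the hybrid method, Eq. (44): if the unnormalized
 * cross product v is this small, it is likely dominated by round-off
 * and we switch to QL. wmax is the eigenvalue of largest modulus. */
static int need_ql_fallback(const double v[3], double wmax)
{
  double norm2  = v[0]*v[0] + v[1]*v[1] + v[2]*v[2];
  double lambda = fmax(wmax*wmax, fabs(wmax));   /* estimate of the largest
                                                    number in the problem */
  return norm2 <= 256.0 * DBL_EPSILON*DBL_EPSILON * lambda*lambda;
                                                 /* 2^8 safety factor */
}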

3.4. Cuppen's Divide and Conquer algorithm

In recent years, the Divide and Conquer paradigm for symmetric eigenproblems has received considerable attention. The idea was originally invented by Cuppen [20], and current implementations are faster than the QL method for large matrices [6]. One can estimate that for 3×3 matrices, a divide and conquer strategy might also be beneficial, because it means splitting the problem into a trivial 1×1 and an analytically accessible 2×2 problem.

However, let us first discuss Cuppen's algorithm in its general form for n×n matrices. As a preliminary step, the matrix has to be reduced to symmetric tridiagonal form, as discussed in Sec. 2.2.1. The resulting matrix T is then split up in the form

  T = \begin{pmatrix} T₁ & 0 \\ 0 & T₂ \end{pmatrix} + β H,         (45)

where T₁ and T₂ are again tridiagonal, and H is a very simple rank 1 matrix, e.g.

  H = \begin{pmatrix}
    0 &        &   &   &        &   \\
      & \ddots &   &   &        &   \\
      &        & 1 & 1 &        &   \\
      &        & 1 & 1 &        &   \\
      &        &   &   & \ddots &   \\
      &        &   &   &        & 0
  \end{pmatrix}.                                                    (46)

Then, the smaller matrices T₁ and T₂ are brought to the diagonal form Tᵢ = Qᵢ Dᵢ Qᵢᵀ, so that T becomes

  T = \begin{pmatrix} Q₁D₁Q₁ᵀ & 0 \\ 0 & Q₂D₂Q₂ᵀ \end{pmatrix} + β H
    = \begin{pmatrix} Q₁ & 0 \\ 0 & Q₂ \end{pmatrix}
      \left[ \begin{pmatrix} D₁ & 0 \\ 0 & D₂ \end{pmatrix} + H̃ \right]
      \begin{pmatrix} Q₁ᵀ & 0 \\ 0 & Q₂ᵀ \end{pmatrix}.             (47)

Here, H̃ = z zᵀ is another rank 1 matrix with a generating vector z consisting of the last row of Q₁ and the first row of Q₂, both multiplied with √β. The remaining problem is to find an eigenvalue λ and an eigenvector v satisfying

  \left[ \begin{pmatrix} D₁ & 0 \\ 0 & D₂ \end{pmatrix} + z zᵀ − λ I \right] v = 0.   (48)

By multiplying this equation from the left with zᵀ diag((D₁ − λI)⁻¹, (D₂ − λI)⁻¹) and dividing off the scalar zᵀv, we obtain the characteristic equation in the form

  1 + Σᵢ zᵢ² / (dᵢ − λ) = 0,                                        (49)

where the dᵢ are the diagonal entries of D₁ and D₂.

There are several obvious strategies for solving this equation in the 3×3 case: First, by multiplying out the denominators we obtain a third degree polynomial which can be solved exactly as discussed in Sec. 3.1. This is very fast, but we have seen that it can be numerically unstable. Second, one could apply the classical Newton-Raphson iteration, which is fast and accurate in the vicinity of the root, but may fail altogether if a bad starting value is chosen. Finally, the fact that the roots of Eq. (49) are known to be separated by the singularities d₁, d₂, and d₃ suggests the usage of a bisection method to obtain the eigenvalues very accurately, but with slow convergence.

To get the optimum results, we apply Cardano's analytical method to get estimates for the roots, and refine them with a hybrid Newton-Raphson/Bisection method based on [8]. This method usually takes Newton-Raphson steps, but if the convergence gets too slow or if Newton-Raphson runs out of the bracketing interval, it falls back to bisection.
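
For illustration, the pure bisection branch for a root of Eq. (49) bracketed between two adjacent poles could look as follows. This is a deliberately simplified sketch with our own names; the actual routine combines it with Newton-Raphson steps as described above.

/* Root of the secular equation (49) in the open interval (dlo, dhi)
 * between two adjacent poles d_i, located by plain bisection. On this
 * interval, w(lambda) increases monotonically from -inf to +inf. */
static double secular_root_sketch(const double d[3], const double z[3],
                                  double dlo, double dhi)
{
  for (int iter = 0; iter < 200; iter++) {
    double lambda = 0.5 * (dlo + dhi);
    if (lambda <= dlo || lambda >= dhi)
      break;                            /* interval exhausted in double
                                           precision */
    /* w(lambda) = 1 + sum_i z_i^2 / (d_i - lambda), Eq. (49) */
    double w = 1.0;
    for (int i = 0; i < 3; i++)
      w += z[i]*z[i] / (d[i] - lambda);
    if (w > 0.0)
      dhi = lambda;                     /* root lies below the midpoint */
    else
      dlo = lambda;
  }
  return 0.5 * (dlo + dhi);
}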

The elements vᵢⱼ of the eigenvectors vᵢ of diag(D₁, D₂) + H̃ are then obtained by the simple formula

  vᵢⱼ = zⱼ / (dⱼ − λᵢ)                                              (50)

and just need to be transformed back to the original basis by undoing the transformations Q₁, Q₂, and the tridiagonalization.

If implemented carefully, Cuppen's method can reach an accuracy comparable to that of the QL method. A major issue is the calculation of the differences dᵢ − λⱼ in the evaluation of the characteristic equation, Eq. (49), and in the calculation of the eigenvectors, Eq. (50). To keep the errors as small as possible when solving for the eigenvalue λⱼ, we subtract our initial estimate for λⱼ from all dᵢ before starting the iteration. This ensures that the thus transformed eigenvalue is very close to zero and therefore small compared to the dᵢ.

As we have mentioned before, the Divide and Conquer algorithm is faster than the QL method for large matrices. It also requires O(n³) operations, but since the expensive steps (the reduction to tridiagonal form and the back-transformation of the eigenvectors) are both outside the iterative loops, the coefficient of n³ is significantly smaller. For the small matrices that we are considering, however, the most expensive part is solving the characteristic equation. Furthermore, many conditional branches are required to implement necessary case differentiations, to avoid divisions by zero, and to handle special cases like multiple eigenvalues. Therefore, we expect the algorithm to be about as fast as the QL method.

It is of course possible to reduce the calculational effort at the expense of reducing the accuracy and stability of the algorithm, but it will always be slower than Cardano's method combined with the vector product algorithm.

4. OTHER ALGORITHMS

Apart from the Jacobi, QL, Cuppen, and vector product algorithms, there are several other methods for finding the eigensystems of symmetric matrices. We will briefly outline some of them here, and give reasons why they are inappropriate for 3×3 problems.

4.1. Iterative root finding methods

In order to avoid the instabilities of Cardano's method which were discussed in Sec. 3.1, one can use an iterative root finding method to solve the characteristic equation. Root bracketing algorithms like classical bisection or the Dekker-Brent method start with an interval which is known to contain the root. This interval is then iteratively narrowed until the root is known with the desired accuracy. Their speed of convergence is fair, but they are usually superseded by the Newton-Raphson method, which follows the gradient of the function until it finds the root. However, the Newton-Raphson algorithm is not guaranteed to converge in all cases.

Although these problems can partly be circumvented by using a hybrid method like the one discussed in

Sec. 3.4, iterative root finders are still unable to find multiple roots, and these special cases would have to be treated separately. Furthermore, the accuracy is limited by the accuracy with which the characteristic polynomial can be evaluated. As we have already mentioned in Sec. 3.1, this can be spoiled by cancellations in the calculation of the coefficients c₀, c₁, and c₂.

4.2. Inverse iteration

Inverse iteration is a powerful tool for finding eigenvectors and improving the accuracy of eigenvalues. The method starts with some approximation λ̃ᵢ for the desired eigenvalue λᵢ, and a random vector b. One then solves the linear equation

  (A − λ̃ᵢ I) ṽᵢ = b                                                 (51)

to find an approximation ṽᵢ for the eigenvector vᵢ. An improved estimate for λᵢ is calculated by using the formula

  (A − λ̃ᵢ I) ṽᵢ ≈ (λᵢ − λ̃ᵢ) ṽᵢ.                                     (52)

We estimate that inverse iteration is impractical for small matrices because there are many special cases that need to be detected and handled separately. This would slow the computation down significantly.

4.3. Vectorization

In a situation where a large number N of small matrices needs to be diagonalized, and all these matrices are available at the same time, it may be advantageous to vectorize the algorithm, i.e. to make the loop from 1 to N the innermost loop¹. This makes consecutive operations independent of each other (because they affect different matrices), and allows them to be pipelined and executed very efficiently.

¹ We thank Charles van Loan for drawing our attention to this possibility.
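
As a trivial illustration of this loop structure, consider computing the coefficient c₂ of Eq. (22) for N matrices at once. The structure-of-arrays layout and all names below are our own assumptions for the sketch:

/* Vectorization-friendly layout: the loop over the N matrices is the
 * innermost (here the only) loop, so consecutive iterations are
 * independent and can be pipelined or auto-vectorized. The diagonal
 * entries of the k-th matrix are a11[k], a22[k], a33[k]. */
void trace_coeff_vectorized(int N, const double *a11, const double *a22,
                            const double *a33, double *c2)
{
  for (int k = 0; k < N; k++)           /* innermost loop over matrices */
    c2[k] = -(a11[k] + a22[k] + a33[k]);
}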

A detailed discussion of this approach is beyond the scope of the present work, but our estimate is that, as long as only the eigenvalues are to be computed, a vectorization of Cardano's method would be most beneficial, because this method requires only few performance-limiting conditional branches, so that the number of processor pipeline stalls is reduced to a minimum. However, the accuracy limitations discussed above would still apply in the vectorized version.

If we want to calculate the full eigensystem, a vectorized vector product method can only give a limited performance bonus, because in the calculation of the vector products, many special cases can arise which need to be detected and treated separately. This renders efficient vectorization impossible. The same is true for Cuppen's Divide and Conquer algorithm. On the other hand, the iterative methods are problematic if the required number of iterations is not approximately the same for all matrices. Then, only the first few iterations can be vectorized efficiently. Afterwards, matrices for which the algorithm has not converged yet need to be treated separately.

5. BENCHMARK RESULTS

In this section we report on the computational performance and on the numerical accuracy of the above algorithms. For the iterative methods we use implementations which are similar to those discussed in [8]. Additionally, we study the LAPACK implementation of the QL/QR algorithm [6] and the QL routine from the GNU Scientific Library [7]. For the analytical methods we use our own implementations. Some ideas in our Cuppen routine are based on ideas realized in LAPACK.

Note that we do not show results for the LAPACK implementation of a Divide and Conquer algorithm (routine ZHEEVD), because this algorithm falls back to QL for small matrices (n < 25) and would therefore not yield anything new. We also neglect the new LAPACK routine ZHEEVR, because for the 3×3 problems considered here it is significantly slower than the other algorithms.

We have implemented our code in C and Fortran, but here we will only discuss results for the Fortran version. We have checked that the C code yields a similar numerical accuracy, but is about 10% slower. This performance deficit can be ascribed to the fact that C code is harder to optimize for the compiler.

Our code has been compiled with the GNU compiler, using the standard optimization flag -O3. We did not use any further compiler optimizations, although we are aware of the fact that it is possible to increase the execution speed by about 10% if options like -ffast-math are used to allow optimizations that violate the IEEE 754 standard for floating point arithmetic.

Our numerical tests were conducted in double precision arithmetic (64 bit, 15-16 decimal digits) on an AMD Opteron 250 (64-bit, 2.4 GHz) system running Linux. To maximize the LAPACK performance on this system, we used the highly optimized AMD Core Math Library for the corresponding tests. Note that on some platforms, in particular on Intel and AMD desktop processors, you may obtain a higher numerical accuracy than is reported here, because these processors internally use 80 bit wide floating point registers.

To measure the accuracy of the calculated eigensystems we use three different benchmarks:

• The relative difference of the eigenvalue λ and the corresponding result λ_LAPACK of the LAPACK QL routine ZHEEV:

  Δ₁ = | λ / λ_LAPACK − 1 |.                                        (53)

We use the LAPACK QL implementation as a reference because it is very well tested and stable.

• The relative difference of the eigenvector v and the corresponding LAPACK result, v_LAPACK:

  Δ₂ = ‖v − v_LAPACK‖₂ / ‖v_LAPACK‖₂,                               (54)

where ‖·‖₂ denotes the Euclidean norm. This definition of Δ₂ is of course not meaningful for matrices with degenerate eigenvalues, because for these, different algorithms might find different bases for the multidimensional eigenspaces. Even for non-degenerate eigenvalues, Δ₂ is only meaningful if the same phase convention is chosen for v and v_LAPACK. Therefore, when computing Δ₂, we reject matrices with degenerate eigenvalues, and for all others we re-phase the eigenvectors in such a way that the component which has the largest modulus in v_LAPACK is real.

• The deviation of the eigenvalue/eigenvector pair (λ, v) from its defining property A v = λ v:

  Δ₃ = ‖A v − λ v‖₂ / ‖v‖₂.                                         (55)

Note that for the algorithms considered here, ‖v‖₂ = 1, so the definitions of Δ₂ and Δ₃ can be slightly simplified.
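
For reference, the third benchmark is essentially a one-liner in code; a sketch for the real case, with our own function name:

#include <math.h>

/* Residual measure Delta_3 of Eq. (55) for a real symmetric matrix and
 * one computed eigenpair (lambda, v): ||A v - lambda v||_2 / ||v||_2. */
static double delta3_sketch(const double A[3][3], double lambda,
                            const double v[3])
{
  double r2 = 0.0, n2 = 0.0;
  for (int i = 0; i < 3; i++) {
    double Av_i = A[i][0]*v[0] + A[i][1]*v[1] + A[i][2]*v[2];
    double res  = Av_i - lambda * v[i];
    r2 += res * res;                    /* squared residual norm */
    n2 += v[i] * v[i];                  /* squared vector norm */
  }
  return sqrt(r2 / n2);
}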

We have first verified the correctness of the algorithms by diagonalizing 10⁷ random hermitian resp. symmetric matrices with integer entries from the interval [−10, 10] and (automatedly) checking that Δ₁, Δ₂, and Δ₃ were as small as expected. We have repeated this test with logarithmically distributed integer entries from the interval [0, 10¹⁰]. Such matrices tend to have largely different eigenvalues and are therefore a special challenge to the algorithms.

For the results that we are going to present here, we have used similar tests, but we have allowed the matrix entries to be real, which corresponds more closely to what is found in actual scientific problems. Furthermore, for the logarithmically distributed case we have changed the interval to [10⁻⁵, 10⁵].

              Real symmetric matrices
Algorithm      Eigenvalues        Eigenvectors
               Lin.    Log.       Lin.    Log.
Jacobi         12.0    10.0       12.9    10.9
QL              8.3     6.3        9.0     6.9
Cuppen          6.6     8.7        9.3    11.5
GSL            14.1    11.6       18.8    15.7
LAPACK QL      21.5    19.3       49.9    45.2
Analytical      2.9     3.0        4.0     4.1
Hybrid          3.0     3.8        3.4     4.5

              Complex hermitian matrices
Algorithm      Eigenvalues        Eigenvectors
               Lin.    Log.       Lin.    Log.
Jacobi         17.0    15.4       21.0    18.4
QL             10.8     8.9       13.5    11.2
Cuppen          8.3     9.8       12.5    14.1
GSL            20.4    17.8       38.2    32.5
LAPACK QL      29.6    26.7       61.9    57.6
Analytical      3.5     3.7        5.9     6.0
Hybrid          3.7     3.6        4.8     5.1

Table I: Performance of different algorithms for calculating the eigenvalues and eigenvectors of symmetric or hermitian 3×3 matrices. We show the running times in seconds for the calculation of only the eigenvalues (left) and of the complete eigensystems (right) of 10⁷ random matrices. Furthermore, we compare the cases of linearly and logarithmically distributed matrix entries. For the scenarios with real matrices, specialized versions of the algorithms have been used. This table refers to the Fortran version of our code; the C version is about 10% slower.
they do not perform any pre-treatment of the matrix such
as rescaling it to avoid overflows and underflows. We be-
5.1. Performance lieve that for pathological matrices, LAPACK may be
more accurate than our QL method.
The results of our timing tests are summarized in Ta- It is interesting to observe that the iterative algorithms
ble I. The most important observation from this table are slightly faster for matrices with logarithmically dis-
is the fact that the standard libraries are slower by a tributed entries. The reason is that for many of these ma-
substantial factor than the specialized algorithms. This trices, the off-diagonal entries are already small initially,
9

so fewer iterations are required for the diagonalization. distributed scenario, 1 and 2 are systematically larger
This is one of the reasons why QL is always faster than for real than for complex matrices. This does not have
Cuppen in the logarithmically distributed scenario, but a deeper reason but is simply due to the fact that in the
may be slower in the linear case. Another reason for real case, there are fewer random degrees of freedom, so
this is the fact that logarithmically distributed matrices there is a higher chance for ill-conditioned matrices to
are more likely to create situations in which the hybrid occur. The effect is not visible in 3 because there it is
root finder, which is used to solve Eq. (49), converges compensated by the fact that this quantity receives large
very slowly. This happens for example if the analytically contributions mainly when in the evaluation of A vi in
calculated starting values have large errors. Eq. (55), a multiplication of a large matrix entry with
However, even in favorable situations, neither QL nor a small and thus relatively inaccurate vector component
Cuppen can compete with the analytical methods. These occurs. It follows from combinatorics that this is more
do not require any pre-treatment of the matrix (like the likely to happen if A and v are complex.
transformation to tridiagonal form in the case of QL and
Cuppen), and though their implementation may look a
bit lengthy due to the many special cases that can arise, 6. CONCLUSIONS
the number of floating point operations that are finally
executed for each matrix is very small. In this article, we have studied the numerical three-
If we compare the performance for real symmetric vs. dimensional eigenproblem for symmetric and hermitian
complex hermitian matrices, we find that purely real matrices. We have discussed the Jacobi, QL, and Cuppen
problems can be solved much more efficiently. This is algorithms as well as an analytical method using Car-
especially true for the Jacobi algorithm since real Ja- danos formula and vector cross products. Our bench-
cobi transformations are much cheaper than complex marks reveal that standard packages are very slow for
ones. For QL and Cuppen, the performance bonus is less small matrices. Optimized versions of the standard al-
dramatic because large parts of these algorithms always gorithms are a lot faster while retaining similar numer-
operate on purely real data structures. The Cardano ical properties, but even their speed is not competitive
and vector product algorithms contain complex arith- with that of the analytical methods. We have, however,
metic, but their speed is also limited by the evaluation of seen that the latter have limited numerical accuracy in
trigonometric functions (Cardano) and by several condi- extreme situations. Moreover, they were not designed
tional branches (vector product), therefore they too do to avoid overflow and underflow conditions. To partly
not benefit as much as the Jacobi method. circumvent these problems, we have devised a hybrid
algorithm, which employs analytical calculations as the
standard branch, but falls back to QL if it estimates the

5.2. Numerical accuracy

The excellent performance of the analytical and hybrid algorithms is relativized by the results of our accuracy tests, which are shown in Table II. While for linearly distributed matrix entries, all algorithms get close to the machine precision of about 2·10⁻¹⁶, the Cardano and vector product methods become unstable for the logarithmically distributed case. In particular, the large average values of Δ₃ > O(10⁻³) show that the calculated eigenvalues and eigenvectors often fail to fulfill their defining property Av = λv. This problem is mitigated by the hybrid technique, but even this approach is still far less accurate than the Jacobi, QL, and Cuppen algorithms. For these, Δ₃ is still of order 10⁻⁹ (QL & Cuppen) resp. 10⁻¹⁰ (Jacobi). The fact that Jacobi is more accurate than QL and Cuppen confirms our expectations from Sec. 2.3.

Note that the values of Δ₁ and Δ₂ for the LAPACK QL algorithm are zero in the case of complex matrices and extremely small for real matrices. The reason is that LAPACK QL is the reference algorithm used in the definition of these quantities. In the case of real matrices, Δ₁ and Δ₂ reveal the tiny differences between the LAPACK ZHEEV and DSYEV routines.

It is interesting to observe that in the logarithmically distributed scenario, Δ₁ and Δ₂ are systematically larger for real than for complex matrices. This does not have a deeper reason but is simply due to the fact that in the real case, there are fewer random degrees of freedom, so there is a higher chance for ill-conditioned matrices to occur. The effect is not visible in Δ₃, because there it is compensated by the fact that this quantity receives large contributions mainly when, in the evaluation of A·vᵢ in Eq. (55), a multiplication of a large matrix entry with a small and thus relatively inaccurate vector component occurs. It follows from combinatorics that this is more likely to happen if A and v are complex.

              Linearly distributed real matrix entries
                    Δ₁                      Δ₂                      Δ₃
Algorithm    Avg.        Max.        Avg.        Max.        Avg.        Max.
Jacobi       1.34·10⁻¹⁵  3.52·10⁻⁹   4.01·10⁻¹⁶  1.32·10⁻¹²  2.01·10⁻¹⁵  1.02·10⁻⁸
QL           1.89·10⁻¹⁵  5.59·10⁻⁹   4.09·10⁻¹⁶  2.05·10⁻¹²  3.58·10⁻¹⁵  1.24·10⁻⁸
Cuppen       1.95·10⁻¹⁵  9.53·10⁻⁹   6.83·10⁻¹⁶  1.80·10⁻¹²  4.21·10⁻¹⁵  1.45·10⁻⁸
GSL          1.29·10⁻¹⁵  3.18·10⁻⁹   3.56·10⁻¹⁶  2.18·10⁻¹²  2.40·10⁻¹⁵  5.02·10⁻⁹
LAPACK QL    5.80·10⁻¹⁷  3.61·10⁻¹¹  3.17·10⁻¹⁷  6.10·10⁻¹³  2.69·10⁻¹⁵  8.28·10⁻⁹
Analytical   1.87·10⁻¹⁵  9.53·10⁻⁹   6.19·10⁻¹⁵  1.80·10⁻⁸   1.36·10⁻¹⁴  4.32·10⁻⁸
Hybrid       1.87·10⁻¹⁵  9.53·10⁻⁹   4.91·10⁻¹⁵  6.49·10⁻⁹   1.16·10⁻¹⁴  2.91·10⁻⁸

              Linearly distributed complex matrix entries
                    Δ₁                      Δ₂                      Δ₃
Algorithm    Avg.        Max.        Avg.        Max.        Avg.        Max.
Jacobi       1.96·10⁻¹⁵  7.66·10⁻⁹   4.64·10⁻¹⁶  1.13·10⁻¹³  1.42·10⁻¹⁴  3.44·10⁻⁷
QL           2.08·10⁻¹⁴  5.46·10⁻⁷   4.83·10⁻¹⁶  8.16·10⁻¹⁴  4.27·10⁻¹⁴  1.14·10⁻⁶
Cuppen       4.37·10⁻¹⁵  6.54·10⁻⁸   6.60·10⁻¹⁶  2.03·10⁻¹³  3.95·10⁻¹⁴  1.03·10⁻⁶
GSL          8.01·10⁻¹⁵  1.88·10⁻⁷   4.56·10⁻¹⁶  8.36·10⁻¹⁴  2.14·10⁻¹⁴  5.26·10⁻⁷
LAPACK QL    0.0         0.0         0.0         0.0         2.41·10⁻¹⁴  6.03·10⁻⁷
Analytical   4.19·10⁻¹⁵  6.54·10⁻⁸   5.66·10⁻¹⁶  3.17·10⁻¹¹  3.05·10⁻¹⁴  7.95·10⁻⁷
Hybrid       4.19·10⁻¹⁵  6.54·10⁻⁸   5.56·10⁻¹⁶  3.17·10⁻¹¹  3.03·10⁻¹⁴  7.95·10⁻⁷

              Logarithmically distributed real matrix entries
                    Δ₁                      Δ₂                      Δ₃
Algorithm    Avg.        Max.        Avg.        Max.        Avg.        Max.
Jacobi       2.96·10⁻¹⁰  1.94·10⁻⁴   3.05·10⁻¹²  3.91·10⁻⁷   8.16·10⁻¹¹  1.10·10⁻⁴
QL           4.88·10⁻¹⁰  4.29·10⁻⁴   2.59·10⁻¹²  7.14·10⁻⁷   1.03·10⁻⁹   1.18·10⁻³
Cuppen       4.28·10⁻¹⁰  4.29·10⁻⁴   3.58·10⁻¹²  6.55·10⁻⁷   8.90·10⁻¹⁰  1.12·10⁻³
GSL          1.86·10⁻¹⁰  1.62·10⁻⁴   2.78·10⁻¹²  4.01·10⁻⁷   9.87·10⁻¹⁰  2.04·10⁻³
LAPACK QL    8.36·10⁻¹²  1.14·10⁻⁵   1.28·10⁻¹³  1.81·10⁻⁷   1.11·10⁻⁹   1.18·10⁻³
Analytical   1.87·10⁻⁹   7.20·10⁻³   1.80·10⁻⁷   1.36·10⁺⁰   3.47·10⁻¹   1.07·10⁺⁶
Hybrid       1.40·10⁻⁹   1.16·10⁻³   3.84·10⁻¹¹  2.03·10⁻⁴   2.19·10⁻⁴   4.75·10⁺¹

              Logarithmically distributed complex matrix entries
                    Δ₁                      Δ₂                      Δ₃
Algorithm    Avg.        Max.        Avg.        Max.        Avg.        Max.
Jacobi       1.55·10⁻¹⁰  1.64·10⁻⁴   2.23·10⁻¹³  7.43·10⁻⁸   1.19·10⁻¹⁰  8.24·10⁻⁵
QL           2.25·10⁻¹⁰  6.84·10⁻⁴   1.96·10⁻¹³  1.17·10⁻⁷   7.85·10⁻¹⁰  5.93·10⁻⁴
Cuppen       2.03·10⁻¹⁰  6.02·10⁻⁴   2.71·10⁻¹³  1.30·10⁻⁷   7.59·10⁻¹⁰  5.86·10⁻⁴
GSL          1.06·10⁻¹⁰  8.69·10⁻⁵   2.17·10⁻¹³  1.30·10⁻⁷   1.38·10⁻⁹   1.15·10⁻³
LAPACK QL    0.0         0.0         0.0         0.0         1.27·10⁻⁹   1.24·10⁻³
Analytical   1.16·10⁻⁹   7.10·10⁻⁴   6.10·10⁻⁹   8.29·10⁻²   2.88·10⁻³   2.84·10⁺³
Hybrid       1.11·10⁻⁹   6.84·10⁻⁴   8.55·10⁻¹²  3.55·10⁻⁵   1.15·10⁻⁴   8.81·10⁺¹

Table II: Numerical accuracy of different algorithms for calculating the eigenvalues and eigenvectors of symmetric or hermitian 3×3 matrices. This table refers to the Fortran implementation, but we have checked that the values obtained with the C code are similar.

6. CONCLUSIONS

In this article, we have studied the numerical three-dimensional eigenproblem for symmetric and hermitian matrices. We have discussed the Jacobi, QL, and Cuppen algorithms as well as an analytical method using Cardano's formula and vector cross products. Our benchmarks reveal that standard packages are very slow for small matrices. Optimized versions of the standard algorithms are a lot faster while retaining similar numerical properties, but even their speed is not competitive with that of the analytical methods. We have, however, seen that the latter have limited numerical accuracy in extreme situations. Moreover, they were not designed to avoid overflow and underflow conditions. To partly circumvent these problems, we have devised a hybrid algorithm, which employs analytical calculations as the standard branch, but falls back to QL if it estimates the problem to be ill-conditioned.

Depending on what kind of problem is to be solved, we give the following recommendations:

• The hybrid algorithm is recommended for problems where computational speed is more important than accuracy, and the matrices are not too ill-conditioned in the sense that their eigenvalues do not differ by more than a few orders of magnitude. For example, in the initial example of the neutrino oscillation Hamiltonian, where the physical uncertainties in the parameters are much larger than the numerical errors, the hybrid algorithm turns out to be the optimal choice [4].

• The QL algorithm is a good general purpose black box method, since it is reasonably fast and, except in some very special situations like the example given in Eq. (35), also very accurate. If speed is not an issue, one can use standard implementations of QL like the LAPACK function ZHEEV. For better performance we recommend simpler implementations like our function ZHEEVQ3 or the function tqli from Ref. [8], on which our routine is based.


• Cuppen's Divide and Conquer method can achieve an accuracy similar to that of the QL algorithm and may be slightly faster for complex problems if the input matrix is not already close to diagonal. The choice between Cuppen and QL will therefore depend on the details of the problem that is to be solved.

• If the highest possible accuracy is desired, Jacobi's method is the algorithm of choice. It is extremely accurate even for very pathological matrices, but it is significantly slower than the other algorithms, especially for complex problems.

• The purely analytical method is not recommended for practical applications, because it is superseded by the hybrid algorithm. It is, however, of academic interest since it reveals both the strengths and the limitations of analytical eigensystem calculations.

Let us remark that in scenarios where the diagonalization of a hermitian 3×3 matrix is only part of a larger problem, it may be advantageous to choose a slower but more accurate algorithm, because this may improve the convergence of the surrounding algorithm, thus speeding up the overall process. The final choice of diagonalization algorithm will always depend strongly on the details of the problem which is to be solved.

Acknowledgments

I would like to thank M. Lindner, P. Huber, and the readers of the NA Digest for useful discussion and comments. Furthermore I would like to acknowledge support from the Studienstiftung des Deutschen Volkes.

Appendix A: ALTERNATIVE DERIVATIONS OF CARDANO'S METHOD

In this appendix, we will discuss two alternative solution strategies for the third degree polynomial equation (21), and show that in the end, numerical considerations lead again to Eq. (33).

1. A trigonometric approach

If we substitute x = 2 cos(θ/3) on the left hand side of Eq. (25) and use the trigonometric identity (2 cos α)³ − 3 (2 cos α) = 2 cos 3α to obtain

  2 cos θ = t,                                                      (A1)

we can show that the solutions to Eq. (25) can be written as

  x = 2 cos(θ/3) = 2 cos( (1/3) arccos(t/2) ).                      (A2)

Our previous result −2 ≤ t ≤ 2 (see Sec. 3.1) ensures that this is well-defined. If we replace the arccos by a numerically more stable arctan, we immediately recover Eq. (33).

2. Lagrange resolvents

The second alternative to Cardano's derivation that we are going to consider employs the concept of Lagrange resolvents. We start from the observation that the coefficients of Eq. (21) can be expressed in terms of the roots λ₁, λ₂, and λ₃ of P(λ), because we can write P(λ) = Π_{i=1,2,3} (λ − λᵢ). In particular, c₀, c₁, and c₂ are, up to a sign, the so-called elementary symmetric polynomials in λ₁, λ₂, and λ₃:

  c₂ = −Σᵢ λᵢ,
  c₁ = Σ_{i<j} λᵢ λⱼ,                                               (A3)
  c₀ = −Πᵢ λᵢ.

Next, we consider the Lagrange resolvents of Eq. (21), which are defined by

  r₁ = λ₁ + λ₂ + λ₃,
  r₂ = λ₁ + e^{−2πi/3} λ₂ + e^{2πi/3} λ₃,                           (A4)
  r₃ = λ₁ + e^{2πi/3} λ₂ + e^{−2πi/3} λ₃.

We observe that rᵢ³ is invariant under cyclic permutations of the λᵢ, and so, by the fundamental theorem of symmetric functions [21], can be expressed in terms of c₀, c₁, and c₂. Indeed, with the definitions from Eqs. (26) and (27), we obtain

  r₁³ = −c₂³,
  r₂³ = q + √(q² − p³),                                             (A5)
  r₃³ = q − √(q² − p³).

We can then recover λ₁, λ₂, and λ₃ according to

  λ₁ = (1/3) (r₁ + r₂ + r₃),
  λ₂ = (1/3) (r₁ + e^{2πi/3} r₂ + e^{−2πi/3} r₃),                   (A6)
  λ₃ = (1/3) (r₁ + e^{−2πi/3} r₂ + e^{2πi/3} r₃).

For a practical implementation of these formulas, one would like to avoid complex arithmetic. This is possible because we have seen before that √(q² − p³) is always purely imaginary. This observation allows us to write

  r₂ = √p · e^{iφ},
  r₃ = √p · e^{−iφ},                                                (A7)

with φ = (1/3) arctan(√(p³ − q²)/q) as before, and thus

  λ₁ = (1/3) (−c₂ + 2√p cos φ),
  λ₂ = (1/3) (−c₂ − √p (cos φ + √3 sin φ)),                         (A8)
  λ₃ = (1/3) (−c₂ − √p (cos φ − √3 sin φ)).

These expressions are equivalent to Eqs. (33) and (34), so the practical implementation of the Lagrange method is again identical to the previous algorithms.

Appendix B: DOCUMENTATION OF THE C AND FORTRAN CODE

Along with the publication of this article, we provide C and Fortran implementations of the algorithms discussed here for download. They are intended to be used for further numerical experiments or for the solution of actual scientific problems.

Our C code follows the C99 standard, which provides the complex data type double complex. In gcc, this requires the usage of the compiler option -std=c99. The Fortran code is essentially Fortran 77, except for the fact that not all variable and function names obey the 6-character limitation.

Both versions of the code contain detailed comments, describing the structure of the routines, the purpose of the different functions, and their arguments. The C version also contains detailed information about local variables, which was omitted in the Fortran version to keep the code compact.

Our nomenclature conventions for functions and subroutines may seem a bit cryptical, because we tried to keep as close as possible to the LAPACK conventions: The first letter indicates the data type (D for double or Z for double complex), the second and third letters indicate the matrix type (SY for symmetric and HE for hermitian), while the remaining characters specify the purpose of the function: EV means eigenvalues and/or eigenvectors, J stands for Jacobi, Q for QL, D for Divide & Conquer (Cuppen), V for vector product, and C for Cardano. We also add the suffix 3 to indicate that our routines are designed for 3×3 matrices.

In the following we will describe the interface of the individual routines. We will discuss only those functions which are relevant to the complex case, because their real counterparts are similar, with the data type COMPLEX*16 resp. double complex being replaced by DOUBLE PRECISION resp. double. Furthermore, we will only discuss the Fortran code here, because the corresponding C functions have identical names and arguments. For example, the Fortran subroutine

SUBROUTINE ZHEEVJ3(A, Q, W)
COMPLEX*16 A(3, 3)
COMPLEX*16 Q(3, 3)
DOUBLE PRECISION W(3)

corresponds to the C function

int zheevj3(double complex A[3][3], double complex Q[3][3], double w[3]).
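
A minimal C usage example of this interface is given below. The prototype is exactly the one quoted above; the way the declaration is made available (header name, linking) depends on how the downloaded package is integrated, so we simply declare the function here:

#include <complex.h>
#include <stdio.h>

/* Prototype as documented above; normally provided by the package. */
int zheevj3(double complex A[3][3], double complex Q[3][3], double w[3]);

int main(void)
{
  /* Only the diagonal and upper triangle need to be set */
  double complex A[3][3] = {
    { 2.0, 1.0 + 1.0*I, 0.0         },
    { 0.0, 3.0,         1.0 - 2.0*I },
    { 0.0, 0.0,         1.0         }
  };
  double complex Q[3][3];
  double w[3];

  zheevj3(A, Q, w);   /* eigenvalues in w, eigenvectors in columns of Q */
  printf("eigenvalues: %g %g %g\n", w[0], w[1], w[2]);
  return 0;
}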

1. Main driver functions

SUBROUTINE ZHEEVJ3(A, Q, W)
COMPLEX*16 A(3, 3)
COMPLEX*16 Q(3, 3)
DOUBLE PRECISION W(3)

This routine uses Jacobi's method (see Sec. 2.1) to find the eigenvalues and normalized eigenvectors of a hermitian 3×3 matrix A. The eigenvalues are stored in W, while the eigenvectors are returned in the columns of Q. The upper triangular part of A is destroyed during the calculation, the diagonal elements are read but not destroyed, and the lower triangular elements are not referenced at all.

SUBROUTINE ZHEEVQ3(A, Q, W)
COMPLEX*16 A(3, 3)
COMPLEX*16 Q(3, 3)
DOUBLE PRECISION W(3)

This is our implementation of the QL algorithm from Sec. 2.2. It finds the eigenvalues and normalized eigenvectors of a hermitian 3×3 matrix A and stores them in W and in the columns of Q. The function accesses only the diagonal and upper triangular parts of A. The access is read-only.

SUBROUTINE ZHEEVD3(A, Q, W)
COMPLEX*16 A(3, 3)
COMPLEX*16 Q(3, 3)
DOUBLE PRECISION W(3)

This is Cuppen's Divide and Conquer algorithm, optimized for 3-dimensional problems (see Sec. 3.4). The function assumes A to be a hermitian 3×3 matrix, and calculates its eigenvalues Wi, as well as its normalized eigenvectors. The latter are returned in the columns of Q. The function accesses only the diagonal and upper triangular parts of A. The access is read-only.

SUBROUTINE ZHEEVC3(A, W)
COMPLEX*16 A(3, 3)
DOUBLE PRECISION W(3)

This routine calculates the eigenvalues Wi of a hermitian 3×3 matrix A using Cardano's analytical algorithm (see Sec. 3.1). Only the diagonal and upper triangular parts of A are accessed, and the access is read-only.

SUBROUTINE ZHEEVV3(A, Q, W)
COMPLEX*16 A(3, 3)
COMPLEX*16 Q(3, 3)
DOUBLE PRECISION W(3)

This function first calls ZHEEVC3 to find the eigenvalues of the hermitian 3×3 matrix A, and then uses vector cross products to analytically calculate the normalized eigenvectors (see Sec. 3.2). The eigenvalues are stored in W, the normalized eigenvectors in the columns of Q. Only the diagonal and upper triangular parts of A need to contain meaningful values, but all of A may be used as temporary storage and might hence be destroyed.

SUBROUTINE ZHEEVH3(A, Q, W)
COMPLEX*16 A(3, 3)
COMPLEX*16 Q(3, 3)
DOUBLE PRECISION W(3)

This is the hybrid algorithm from Sec. 3.3. Its default behavior is identical to that of ZHEEVV3, but under certain circumstances, it falls back to calling ZHEEVQ3. As for the other routines, A has to be a hermitian 3×3 matrix, and the eigenvalues and eigenvectors are stored in W resp. in the columns of Q. Only the diagonal and upper triangular parts of A need to contain meaningful values, and access to A is read-only.

2. Helper functions

SUBROUTINE DSYEV2(A, B, C, RT1, RT2, CS, SN)
DOUBLE PRECISION A, B, C
DOUBLE PRECISION RT1, RT2, CS, SN

This subroutine calculates the eigenvalues and eigenvectors of a real symmetric 2×2 matrix

  \begin{pmatrix} A & B \\ B & C \end{pmatrix}.                     (B1)

The result satisfies

  \begin{pmatrix} RT1 & 0 \\ 0 & RT2 \end{pmatrix}
  = \begin{pmatrix} CS & SN \\ -SN & CS \end{pmatrix}
    \begin{pmatrix} A & B \\ B & C \end{pmatrix}
    \begin{pmatrix} CS & -SN \\ SN & CS \end{pmatrix}               (B2)

and RT1 ≥ RT2. Note that this convention is different from the convention used in the corresponding LAPACK function DLAEV2, where |RT1| ≥ |RT2|. We use a different convention here because it helps to avoid several conditional branches in ZHEEVD3 and DSYEVD3.

SUBROUTINE ZHETRD3(A, Q, D, E)
COMPLEX*16 A(3, 3)
COMPLEX*16 Q(3, 3)
DOUBLE PRECISION D(3)
DOUBLE PRECISION E(2)

This routine reduces a hermitian matrix A to real tridiagonal form by applying a Householder transformation Q according to Sec. 2.2:

  A = Q \begin{pmatrix} D1 & E1 &    \\ E1 & D2 & E2 \\    & E2 & D3 \end{pmatrix} Qᵀ.   (B3)

The function accesses only the diagonal and upper triangular parts of A. The access is read-only.

[1] H. Goldstein, Classical Mechanics (Addison-Wesley, 2002).
[2] E. K. Akhmedov (1999), hep-ph/0001264.
[3] P. Huber, M. Lindner, and W. Winter, Comput. Phys. Commun. 167, 195 (2005), hep-ph/0407333, URL http://www.mpi-hd.mpg.de/globes.
[4] P. Huber, J. Kopp, M. Lindner, M. Rolinec, and W. Winter (2007), hep-ph/0701187, URL http://www.mpi-hd.mpg.de/globes.
[5] E. Roulet, Phys. Rev. D44, R935 (1991).
[6] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, et al., LAPACK Users' Guide (Society for Industrial and Applied Mathematics, Philadelphia, PA, 1999), 3rd ed., ISBN 0-89871-447-8 (paperback).
[7] M. Galassi et al., GNU Scientific Library Reference Manual (Network Theory Ltd., 2003), 2nd ed., ISBN 0-9541617-3-4, URL http://www.gnu.org/software/gsl/.
[8] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C (Cambridge University Press, 1988).
[9] O. K. Smith, Commun. ACM 4, 168 (1961), ISSN 0001-0782.
[10] A. W. Bojanczyk and A. Lutoborski, SIAM J. Matrix Anal. Appl. 12, 41 (1991), ISSN 0895-4798.
[11] A. Ralston and P. Rabinowitz, A First Course in Numerical Analysis (Dover Publications, 2001), 2nd ed.
[12] D. Fadeev and V. Fadeeva, Computational Methods in Linear Algebra (Freeman, San Francisco, CA, 1963).
[13] E. Süli and D. F. Mayers, An Introduction to Numerical Analysis (Cambridge University Press, 2003).
[14] G. E. Forsythe and P. Henrici, Trans. Am. Math. Soc. 94, 1 (1960), ISSN 0002-9947.
[15] J. Stoer and R. Bulirsch, Introduction to Numerical Analysis (Springer-Verlag, Berlin, Germany, 1993), 2nd ed.
[16] J. Demmel and K. Veselić, Tech. Rep., Knoxville, TN, USA (1989).
[17] J. Demmel, LAPACK Working Note 45, UT-CS-92-162, May 1992, URL http://www.netlib.org/lapack/lawnspdf/lawn45.pdf.
[18] G. W. Stewart, Tech. Rep., College Park, MD, USA (1995).
[19] G. Cardano, Ars Magna (1545).
[20] J. J. M. Cuppen, Numer. Math. 36, 177 (1981).
[21] B. L. van der Waerden, Algebra 1 (Springer-Verlag, 2006).
