Analysis of Two Partial-Least-Squares Algorithms for Multivariate Calibration
ROLF MANNE
Abstract
Manne, R., 1987. Analysis of two partial-least-squares algorithms for multivariate calibration. Chemometrics and Intelligent Laboratory Systems, 2: 187-197.
Two algorithms for multivariate calibration are analysed in terms of standard linear
regression theory. The matrix inversion problem of linear regression is shown to be solved
by transformations to a bidiagonal form in PLS1 and to a triangular form in PLS2. PLS1
gives results identical with those of the bidiagonalization algorithm of Golub and Kahan, which is similar to the method of conjugate gradients. The general efficiency of the algorithms is discussed.
1 INTRODUCTION
Partial least squares (PLS) is the name of a set of algorithms developed by Wold for use in
econometrics[1,2]. They have in common that no a priori assumptions are made about the model
structure, a fact which has given rise to the name soft modelling for the PLS approach. Instead,
estimates of reliability may be made using the jack-knife or cross-validation (for a review of these
techniques, see ref.3). Although such reliability estimates seem to be essential in the description
of the PLS approach [4], they are not considered here.
The PLS approach has been used in chemometrics for extracting chemical information from
complex spectra which contain interference effects from other factors (noise) than those of primary
interest [4-10]. This problem can also be solved by using more or less standard least-squares
methods, provided that the collinearity problem is considered [11]. What is required by these
methods is a proper method for calculating generalized inverses of matrices. The PLS approach,
however, is so far only described through its algorithms, and appears to have an intuitive
character. There has been some confusion about what the PLS algorithms do, what their
underlying mathematics are, and how they relate to other formulations of linear regression theory.
The purpose of this paper is to place the two PLS algorithms that have been used in
chemometrics in relation to a more conventional description of linear regression. The two PLS
algorithms are called, in the terminology in chemometrics, PLS1 and PLS2 [6]. PLS2 considers
the case when several chemical variables are to be fitted to the spectrum and has PLS1, which
fits one such variable, as a special case. A third algorithm, which is equivalent to PLS1, has been
suggested by Martens and Næs [7]. It differs from the latter in its orthogonality relations, but
has not been used for actual calculations. It will be referred to here only in passing.
Some properties of the PLS1 algorithm have previously been described by Wold et al. [4], in
particular its equivalence to a conjugate-gradient algorithm. This equivalence, however, has not
been used further in the literature, e.g., in comparisons with other methods. Such comparisons
have been made, particularly with the method of principal components regression (PCR), by Næs
and Martens [12] and Helland [13]. A recent tutorial article by Geladi and Kowalski [14] also
attempts to present the PLS algorithms in relation to PCR but, unfortunately, suffers from a
certain lack of precision.
The outline of this paper is as follows. After the notation has been established, the solution
to the problem of linear regression is developed using the Moore-Penrose generalized inverse. The
bidiagonalization algorithm of Golub and Kahan [15] is sketched and shown to be equivalent to
the PLS1 algorithm. Properties of this solution are discussed.
The Ulvik workshop made it clear that within chemistry and the geosciences there is growing
interest in the methods of multivariate statistics but, at the same time, the theoretical background
of workers in the field is highly variable. With this situation in mind, an attempt has been made
to make the presentation reasonably self-contained.
2 NOTATION
The notation used for the PLS method varies from publication to publication. Further confusion is caused by the use of different conventions for normalization. In the following, we arrange the measurements of the calibration set in a matrix X = {X_{ij}}, where each row contains the measurements for a given sample and each column the measurements for a given variable. The number of samples (or rows) is given as n and the number of variables (or columns) as p. With the experimental situation in mind, we shall call the p measurements for sample i a spectrum, which we denote by x_i. For each sample in the calibration set, there is, in addition, a chemical variable y_i which is represented by a column vector y. In the prediction step the spectrum x of a new sample is used to predict the value ŷ for the sample.

We use bold-face capital letters for matrices and bold-face lower-case letters for vectors. The transpose of vectors and matrices will be denoted by a prime, e.g., y′ and X′. A scalar product of two (column) vectors is thus written (a′b). Whenever possible, scalar quantities obtained from vector or matrix multiplication are enclosed in parentheses. The Euclidean norm of a column vector a is written ‖a‖ = (a′a)^{1/2}. The Kronecker delta, δ_{ij}, is used to describe orthogonality relations. It takes the values 1 for i = j and 0 for i ≠ j.
3 CALIBRATION AND THE LEAST-SQUARES METHOD
The relationship between a set of spectra X and the known values y of the chemical variable is
assumed to be
y_i = b_0 + Σ_{j=1}^{p} X_{ij} b_j + noise   (i = 1, 2, ..., n)   (1)
If all variables are measured relative to their averages, the natural estimate of b_0 is zero, and in this sense b_0 may be eliminated from eqn. 1. We thus assume that the zero points of the variables y, x_1, x_2, ..., x_p are chosen so that
Σ_{i=1}^{n} y_i = 0   and   Σ_{i=1}^{n} X_{ij} = 0   (j = 1, ..., p)   (2)

Leaving out the noise term, eqn. 1 then reads

y_i = Σ_{j=1}^{p} X_{ij} b_j   (3)

or, in matrix form,

y = Xb   (4)
For the estimation of b, eq. 4 in general does not have an exact solution. In standard least squares one instead minimizes the residual error ‖e‖², defined by the relationship

e = y − Xb   (5)

The minimization leads to the normal equations

X′Xb = X′y   (6)
which reduce to eq. 4 for non-singular square matrices X. Provided that the inverse of X′X exists, the solution may be written as

b = (X′X)⁻¹X′y   (7)
If the inverse does not exist, there will be non-zero vectors c_j which fulfil

X′Xc_j = 0   (8)

Then, if b is a solution to eq. 6, so is b + Σ_j λ_j c_j (λ_j arbitrary scalars). This may be expressed by substituting for (X′X)⁻¹ any generalized inverse of X′X. A particular such generalized inverse is chosen as follows. Write X as a product of three matrices:
X = URW′   (9)

where

U′U = W′W = 1   (10)

The generalized inverse of X is then taken as

X⁺ = WR⁻¹U′   (11)
Truncations are commonly introduced by choosing the dimension of R equal to r smaller than the rank a of X. As written (no truncation implied), X⁺ fulfils not only the defining condition for a generalized inverse, i.e.,

XX⁺X = X   (12)

but also

X⁺XX⁺ = X⁺,   (XX⁺)′ = XX⁺,   (X⁺X)′ = X⁺X   (13)

which define the Moore-Penrose generalized inverse (see, e.g., ref. 17). Insertion of eq. 9 into eq. 6 gives

WW′b = WR⁻¹U′y   (14)
It may be shown [17] that the Moore-Penrose generalized inverse gives the minimum-norm solution to the least-squares problem, i.e.,

b = X⁺y = WR⁻¹U′y   (15)

is the solution which minimizes (b′b) in the case that X′X is singular and eq. 6 has multiple solutions.
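As a numerical illustration of the minimum-norm property, the following NumPy sketch (synthetic data, variable names our own) compares b = X⁺y with another solution of the normal equations for a rank-deficient X.

```python
import numpy as np

rng = np.random.default_rng(0)

# A rank-deficient calibration matrix: 8 samples, 5 variables, rank 3.
n, p, rank = 8, 5, 3
X = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, p))
y = rng.normal(size=n)

# Minimum-norm least-squares solution b = X^+ y (Moore-Penrose inverse, eq. 15).
b_min = np.linalg.pinv(X) @ y

# Another solution of the normal equations: add a null-space vector c of X'X (eq. 8).
c = np.linalg.svd(X)[2][-1]          # right singular vector with zero singular value
b_other = b_min + 2.0 * c

# Both solutions give the same residual, but b_min has the smaller norm.
print(np.linalg.norm(y - X @ b_min), np.linalg.norm(y - X @ b_other))
print(np.linalg.norm(b_min) < np.linalg.norm(b_other))          # True
```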
The orthogonal decomposition 9 gives, together with eq. 15, a simple expression for the residual error:

e = y − Xb = (1 − URW′WR⁻¹U′)y = (1 − UU′)y   (16)
There are many degrees of freedom in the orthogonal decomposition 9. From the computational point of view, it is important that the inverse R⁻¹ is simple to calculate. This may be achieved, e.g., with R diagonal or triangular. For a right triangular matrix one has R_{ij} = 0 for i > j. In that case, from the definition of the inverse it follows that

Σ_{k<j} (R⁻¹)_{ik} R_{kj} + (R⁻¹)_{ij} R_{jj} = δ_{ij}   (17)

or

(R⁻¹)_{ij} = (δ_{ij} − Σ_{k<j} (R⁻¹)_{ik} R_{kj}) / R_{jj}   (18)
A bidiagonal matrix is a special case of a triangular matrix with R_{ij} = 0 except for i = j and either i = j − 1 (right bidiagonal) or i = j + 1 (left bidiagonal). From successive application
of eq.18 it follows that the inverse of a right triangular matrix (including the bidiagonal case) is
itself right triangular.
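Eq. 18 translates directly into a short routine; the following Python sketch (function name and test matrix are our own) builds R⁻¹ column by column and only ever divides by the diagonal elements R_{jj}.

```python
import numpy as np

def right_triangular_inverse(R):
    """Invert a right (upper) triangular matrix by the recursion of eq. 18."""
    m = R.shape[0]
    Rinv = np.zeros_like(R, dtype=float)
    for j in range(m):                        # build column j of R^{-1}
        for i in range(j + 1):                # R^{-1} is itself right triangular
            delta = 1.0 if i == j else 0.0
            s = sum(Rinv[i, k] * R[k, j] for k in range(j))
            Rinv[i, j] = (delta - s) / R[j, j]
    return Rinv

R = np.triu(np.arange(1.0, 10.0).reshape(3, 3))   # a 3 x 3 right triangular matrix
print(np.allclose(right_triangular_inverse(R) @ R, np.eye(3)))   # True
```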
With a diagonal matrix R, i.e., with R_{ij} = 0 for i ≠ j, the decomposition 9 is known as the singular-value decomposition. The singular values, which are chosen to be ≥ 0, are the diagonal elements R_{ii}. Another decomposition of interest for matrix inversion is the QR decomposition
with R right triangular and W = 1, the unit matrix.
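Both special cases are available as standard library routines; the small NumPy sketch below (synthetic data, our own variable names) shows the two forms of the decomposition 9 side by side.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))

# Singular-value decomposition: X = U R W' with R diagonal and singular values >= 0.
U, s, Wt = np.linalg.svd(X, full_matrices=False)
print(np.allclose(X, U @ np.diag(s) @ Wt))                   # True

# QR decomposition: X = U R with R right triangular (the case W = 1).
Q, R = np.linalg.qr(X)
print(np.allclose(X, Q @ R), np.allclose(R, np.triu(R)))     # True True
```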
4 MATRIX TRANSFORMATIONS-PLS1

X′u_i = Σ_{j=1}^{i+1} w_j (w_j′X′u_i)   (22)
The decomposition of X according to eq. 9 therefore makes R_{ij} = u_i′Xw_j a right bidiagonal matrix. Eqs. 21 and 22 may therefore be reformulated as

X′u_i = w_{i+1}(w_{i+1}′X′u_i) + w_i(w_i′X′u_i)   (24)
Step 1: X_1 = X; y_1 = y

The iteration in Step 2 may be continued until the rank of X_i equals zero (a = rank of X) or may be interrupted earlier using, e.g., a stopping criterion from cross-validation. The present description differs from that given, e.g., by Martens and Næs [7] only by the introduction of normalized vectors {u_i}. The latter write

Step 2.2a: t_i = X_i w_i

but have equations for the other steps that give results identical with our formulation. The PLS decomposition of the X matrix then takes the form

X = TP   (26)
The equivalence of this algorithm with the ordinary PLS1 algorithm has been pointed out by
Næs and Martens [12] and shown in detail by Helland [13].
In the PLS1 algorithm the orthogonality of {u_i} follows from Steps 2.2 and 2.3 through induction. One may then reformulate Step 2.3 as

Step 2.3b: X_{i+1} = (1 − Σ_{k=1}^{i} u_k u_k′) X
with specifications for the parameter ci , called the inner PLS relation. This updating therefore
has no effect on the result. Later publications which use the inner PLS relationship [6,14] have,
in fact, the same updating expression as used here in Step 2.4.
Comparing PLS1 with Bidiag2 one finds that Steps 2.2 and 2.3b of the former give the second step of the latter, eq. 20, since (u_k′Xw_i) = 0 for k < i − 1. In order to show the equivalence of the first step of Bidiag2, eq. 19, and Step 2.1b, we write the latter as
From Steps 2.1 and 2.3 one finds

w_{i+1}‖X_{i+1}′y‖ = X_{i+1}′y = X_i′(1 − u_i u_i′)y = w_i‖X_i′y‖ − X′u_i(u_i′y)   (32)

which, inserted back in eq. 30, gives the bidiagonalization equation 19 apart from, possibly, a sign factor.
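For reference, the following NumPy sketch implements the PLS1 iteration as reconstructed from the references to Steps 2.1-2.4 above (normalized u_i, keeping the original y throughout, which the argument around eq. 32 shows gives the same vectors); it is our own transcription, not the published code, and it checks that R = U′XW comes out right bidiagonal.

```python
import numpy as np

def pls1(X, y, r):
    """PLS1 iteration (Steps 2.1-2.3 as reconstructed above); returns W, U and R = U'XW."""
    n, p = X.shape
    W, U = np.zeros((p, r)), np.zeros((n, r))
    Xi = X.copy()
    for i in range(r):
        w = Xi.T @ y
        w /= np.linalg.norm(w)             # Step 2.1: w_i = X_i'y / ||X_i'y||
        t = Xi @ w
        u = t / np.linalg.norm(t)          # Step 2.2: u_i = X_i w_i / ||X_i w_i||
        Xi = Xi - np.outer(u, u @ Xi)      # Step 2.3: X_{i+1} = (1 - u_i u_i') X_i
        W[:, i], U[:, i] = w, u
    return W, U, U.T @ X @ W

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 8)); X -= X.mean(axis=0)     # centred calibration data (eq. 2)
y = rng.normal(size=20);      y -= y.mean()

W, U, R = pls1(X, y, r=4)
b = W @ np.linalg.inv(R) @ U.T @ y                    # regression vector, eq. 36

# R has non-zeros only on the diagonal and the first superdiagonal (right bidiagonal).
outside = (R - np.triu(R)) + np.triu(R, 2)
print(np.allclose(outside, 0.0, atol=1e-10))          # True
```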
5 MATRIX TRANSFORMATIONS-PLS2
The PLS2 algorithm was designed for the case when several chemical variable vectors y_k are to be fitted using the same measured spectra X. The chemical vectors are collected as columns in a matrix Y. The algorithm may be described as follows:

Step 1: X_1 = X; Y_1 = Y
As for PLS1, the iteration may be continued until the rank of X_i is zero or may be stopped
earlier (r < a).
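Because the inner Steps 2.2.1-2.2.4 are not reproduced here, the following sketch is our own reconstruction of the PLS2 outer loop: instead of the alternating iteration of the original algorithm, it obtains w_i directly as the dominant eigenvector of X_i′Y_iY_i′X_i (eq. 60) by straightforward power iteration and then deflates both X and Y with (1 − u_iu_i′), as described in the section Understanding PLS2.

```python
import numpy as np

def pls2_factors(X, Y, r, n_power=200):
    """Sketch of the PLS2 outer loop (our reconstruction): w_i is the dominant
    eigenvector of X_i'Y_iY_i'X_i, u_i = X_i w_i/||X_i w_i||, then X and Y are deflated."""
    n, p = X.shape
    W, U = np.zeros((p, r)), np.zeros((n, r))
    Xi, Yi = X.copy(), Y.copy()
    for i in range(r):
        M = Xi.T @ Yi @ Yi.T @ Xi
        w = np.ones(p)
        for _ in range(n_power):            # power iteration for the largest eigenvalue
            w = M @ w
            w /= np.linalg.norm(w)
        t = Xi @ w
        u = t / np.linalg.norm(t)
        Xi = Xi - np.outer(u, u @ Xi)       # deflation of X
        Yi = Yi - np.outer(u, u @ Yi)       # deflation of Y
        W[:, i], U[:, i] = w, u
    return W, U, U.T @ X @ W

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 6)); X -= X.mean(axis=0)
Y = rng.normal(size=(20, 3)); Y -= Y.mean(axis=0)

W, U, R = pls2_factors(X, Y, r=4)
print(np.allclose(np.tril(R, -1), 0.0, atol=1e-10))    # U'XW is right triangular
```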
Several simplifications of the algorithm can be made. What is important in the present con-
text, however, is that the u_i and w_i vectors form two orthonormal sets, and that the transformed matrix U′XW is right triangular. Also, it may be shown that PLS2 reduces to PLS1 when the Y matrix has only one column. In the latter case z_i from Step 2.2.3 has only a single element, equal to 1. Convergence of w_i is then obtained in the first iteration.
The orthogonality u_i′u_j = δ_{ij} follows from Steps 2.2.2 and 2.3 in the same way as in PLS1. This in turn makes it possible to show that eq. 21 is valid also for PLS2, which proves the triangularity of the transformed matrix R = U′XW. Finally, the orthogonality w_i′w_j = δ_{ij} (i > j) may be established from
We write here this expression as for PLS1. The extension to PLS2, however, is trivial. Utilizing
the fact that R⁻¹ is right triangular both in PLS1 and in PLS2, the vector of regression coefficients
can be written as
b = Σ_j Σ_{i≤j} w_i (R⁻¹)_{ij} (u_j′y) = Σ_j d_j (u_j′y)   (36)
where w_i and u_j are columns of the matrices W and U, respectively. The substitution
d_j = Σ_{i≤j} w_i (R⁻¹)_{ij}   (37)

or

Σ_{k≤j} d_k R_{kj} = w_j   (38)

gives

d_j = (w_j − Σ_{k<j} d_k R_{kj}) / R_{jj}   (39)
This equation makes it possible to calculate b with little use of computer memory, especially since also

(u_j′y) = −(u_{j−1}′y) R_{j−1,j} / R_{jj}   (41)
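In code, eqs. 37-39 amount to a forward recursion that never forms R⁻¹ explicitly; the sketch below (our own notation, with an arbitrary right triangular R standing in for the PLS one) checks it against the direct expression b = WR⁻¹(U′y) of eq. 36.

```python
import numpy as np

rng = np.random.default_rng(4)
p, r = 6, 4
W = np.linalg.qr(rng.normal(size=(p, r)))[0]             # orthonormal columns, as in PLS1
R = np.triu(rng.normal(size=(r, r)) + 3.0 * np.eye(r))   # a right triangular R
uy = rng.normal(size=r)                                  # the scalars (u_j'y)

# Eq. 39: build the vectors d_j by forward recursion, no matrix inverse required.
D = np.zeros((p, r))
for j in range(r):
    D[:, j] = (W[:, j] - D[:, :j] @ R[:j, j]) / R[j, j]

b_recursive = D @ uy                       # eq. 36: b = sum_j d_j (u_j'y)
b_direct = W @ np.linalg.inv(R) @ uy       # eq. 36 with explicit R^{-1}
print(np.allclose(b_recursive, b_direct))  # True
```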
In the following, we develop eq. 35 to yield the equations for the prediction used in the PLS
literature. A basic feature of these equations is that the regression vector b is never explicitly
calculated. For this reason, the predicted value is written as
ŷ = (xb) = Σ_{j=1}^{r} (xd_j)(u_j′y) = Σ_{j=1}^{r} h_j (u_j′y)   (42)
These expressions differ from those given by Martens and Næs [7] only in the normalization. The latter write

ŷ = Σ_{j=1}^{r} t_j q_j   (44)

t_j = (x − Σ_{k<j} t_k p_k) w_j   (46)
and
p_k = t_k′X_k / (t_k′t_k) = u_k′X / (u_k′Xw_k)   (47)
i.e.
t_j = (xw_j) − Σ_{k<j} t_k R_{kj} / R_{kk}   (48)
From the identification t_j = h_j R_{jj} in eq. 48, it follows that the prediction equation 44 of Martens and Næs [7] gives the same result as eq. 42.
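The identification is easily checked numerically; the sketch below (hypothetical W, R and spectrum x) computes the scores t_j by the recursion of eq. 48 and the quantities h_j = (xd_j) of eq. 42, and confirms t_j = h_jR_{jj}. For a right bidiagonal R only the k = j − 1 term of the sum actually contributes.

```python
import numpy as np

rng = np.random.default_rng(5)
p, r = 6, 4
W = np.linalg.qr(rng.normal(size=(p, r)))[0]             # hypothetical weight vectors w_j
R = np.triu(rng.normal(size=(r, r)) + 3.0 * np.eye(r))   # hypothetical right triangular R
x = rng.normal(size=p)                                   # spectrum of a prediction sample

# Scores by the recursion of eq. 48.
t = np.zeros(r)
for j in range(r):
    t[j] = x @ W[:, j] - sum(t[k] * R[k, j] / R[k, k] for k in range(j))

# h_j = (x d_j) with the d_j of eq. 39.
D = np.zeros((p, r))
for j in range(r):
    D[:, j] = (W[:, j] - D[:, :j] @ R[:j, j]) / R[j, j]
h = x @ D

print(np.allclose(t, h * np.diag(R)))                    # t_j = h_j R_jj
```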
Wold et al. [6] and Geladi and Kowalski [14] give expressions for prediction containing the
inner PLS relationship. They write for the PLS1 case
ŷ = Σ_{j=1}^{r} c_j t_j q_j   (49)
The need for the inner relationship comes from the normalization of q_j to |q_j| = 1 used by these authors. A detailed calculation shows that c_j q_j in eq. 49 equals q_j as defined by Martens and Næs [7]. Also for PLS2, the inner relationship of eq. 49 gives results identical with those of Martens and Næs [7].
On the other hand, we believe that Sjöström et al. [5] make an erroneous use of the inner PLS
relationship in the prediction step. These authors make still another choice of normalization.
Also in other respects their prediction equations indicate an early stage of development.
Geladi and Kowalski, in their tutorial [14], discuss a procedure for obtaining orthogonal t values which we, so far, have chosen to overlook. This procedure, however, is said to be not absolutely necessary. What is described is a scaling procedure for the vectors p_j, t_j and w_j so that
p_j^{new} = p_j / ‖p_j‖
t_j^{new} = t_j ‖p_j‖
w_j^{new} = w_j ‖p_j‖   (50)
It should be noted that both before and after this scaling the vectors t_j are orthogonal. The replacements in eq. 50 also scale the values of c_j and t_j appearing in eq. 49 but have no effect upon the predicted value ŷ.
As the iteration proceeds, ‖y_s‖² becomes smaller, and the stability of this quantity may be
taken as a stopping criterion. Using Step 2.1 and eq. 21 we write
which may be used to evaluate the residual error (eq. 52). As mentioned above in connection with eq. 41, one may also use eq. 54 in the evaluation of regression coefficients.
Another quantity of interest is the normalization integral ‖X′e_s‖ = ‖X_{s+1}′y‖, which appears in the denominator of Step 2.1 of PLS1. When this quantity approaches zero the iteration scheme becomes unstable. The equation
may be derived from eq. 30. The use of eqs. 52 and 55 and further criteria for stopping was discussed by Paige and Saunders [18]. These criteria are simple to evaluate and relate directly to
the numerical properties of the PLS1 iteration scheme. For this reason, they may be of advantage
as a complement to the cross-validation currently used.
and obtain

X′X = Σ_i d_i² f_i f_i′   (57)
(no contribution from vectors f_i with d_i = 0). Application of the Lanczos equation (25), expanded according to eq. 57, yields

w_2 ∝ X′Xw_1 − w_1(w_1′X′Xw_1) = Σ_i f_i c_i [d_i² − (w_1′X′Xw_1)]   (59)

where c_i are the expansion coefficients of w_1 in the basis {f_i}.
Partial summations over functions f_i with degenerate eigenvalues d_i give the same results for w_2 as for w_1. One may therefore show that the number of linearly independent terms in the sequence {w_i} is no greater than the number of eigenvectors with distinct eigenvalues contributing to the expansion of w_1. Almost degenerate or clustered eigenvalues coupled with finite numerical
accuracy may make the expansion even shorter in practice. Compare this with PCR, where all
eigenvectors with large eigenvalues are used irrespective of their degeneracy. These properties of
PLS1 have been pointed out by Næs and Martens [12] and by Helland [13]. They are also well
established in the literature on the conjugate gradient method (e.g.,ref. 20).
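The effect is easy to demonstrate numerically. In the sketch below (synthetic data, our own construction) X is built with only three distinct singular values; since the PLS1 vectors w_i span the Krylov space generated by X′X from the starting vector X′y (cf. the Lanczos equation 25), the rank of that space, and hence the number of useful PLS factors, cannot exceed three.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 30, 10

# Build X with only three distinct singular values (a heavily degenerate spectrum).
G = np.linalg.qr(rng.normal(size=(n, p)))[0]
F = np.linalg.qr(rng.normal(size=(p, p)))[0]
d = np.array([5.0] * 4 + [2.0] * 4 + [0.5] * 2)      # three distinct values among ten
X = G @ np.diag(d) @ F.T
y = rng.normal(size=n)

# Krylov sequence X'y, (X'X)X'y, (X'X)^2 X'y, ... spanned by the PLS1 vectors {w_i}.
K = np.empty((p, 6))
v = X.T @ y
for i in range(6):
    K[:, i] = v / np.linalg.norm(v)
    v = X.T @ (X @ K[:, i])

# Its rank cannot exceed the number of distinct eigenvalues contributing to w_1.
print(np.linalg.matrix_rank(K, tol=1e-8))            # 3
```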
The exclusion of d_i = 0 not only for all w_j but also for all u_j is the main advantage of Bidiag2 over the Bidiag1 algorithm used by Paige and Saunders in their LSQR algorithm [18]. The latter starts the bidiagonalization with u_1 ∝ y and w_1 ∝ X′y, and obtains a left bidiagonal matrix along lines similar to Bidiag2. The two algorithms generate the same set of vectors {w_i},
but Bidiag1 runs into singularity problems for least-squares problems, which, however, are solved
by the application of the QR algorithm. The resulting right bidiagonal matrix is the same as that
obtained directly from Bidiag2. We have not found any obvious advantage of this algorithm over
the direct use of Bidiag2 as in PLS1.
In PCR the stopping or truncation criterion is usually the magnitude of the eigenvalue d_i. As discussed by Jolliffe [24], this may lead to omission of vectors f_i which are important for reducing the residual error ‖e_s‖², eq. 51.
On the other hand, the Bidiag2/PLS1 algorithms or, equivalently, the Lanczos algorithm do not favour only the principal components with large eigenvalues d_i. Instead, it is our own experience from eigenvalue calculations with the Lanczos algorithm in the initial tridiagonalization step [25] that convergence is first reached for eigenvalues at the ends of the spectrum. That means in the present context that principal components with large and small eigenvalues are favoured over those with eigenvalues in the middle. The open question is the extent to which the principal components with small
eigenvalues represent noise and therefore should be excluded from model building. Without
additional information about the data there is no simple solution to this problem.
9 UNDERSTANDING PLS2
In this section we consider the triangularization algorithm in PLS2. At convergence the iteration
loop contained in Step 2.2 of the algorithm (see Matrix transformations - PLS2) leads to the eigenvalue relationships

X_i′Y_iY_i′X_i w_i = k_i² w_i   (60)
where k_i², which are the numerically largest eigenvalues, may be evaluated from the product of normalization integrals in Steps 2.2.1-2.2.4. We interpret these relationships as principal component relationships of the matrix X_i′Y_i. For each matrix one thus obtains the principal component with largest variance. As mentioned before, the vectors w_i obtained in this way are mutually orthogonal, but the vectors z_i are not.

The matrix X_i′Y_i may be simplified to X′Y_i. Hence for each iteration, those parts of the column vectors of Y which overlap with u_i = X_i w_i/‖X_i w_i‖ are removed. Eventually, a u_i is
obtained that has zero (or a small) overlap with all the columns of Y, and the iteration has to
be stopped.
As for PLS1, it is of interest to relate the vectors w_i to the singular-value decomposition of X, eq. 56. We write eq. 60 as

X′A_i X w_i = k_i² w_i   (62)

We obtain

Σ_j f_j (d_j g_j′ A_i X w_i) = k_i² Σ_j f_j c_{ji}   (64)
10 DISCUSSION
The first result of this study is that both PLS algorithms, as given by Martens and Næs [7], yield the ordinary least-squares solution for invertible matrices X′X. The algorithms correspond to
standard methods for inverting matrices or solving systems of linear equations, and the various
steps of these methods are identified in the PLS algorithms. This result is likely to be known
to those who know the method in detail. However, as parts of the PLS literature are obscure,
and as even recent descriptions of the algorithms in refereed publications contain errors, it is
felt necessary to make this statement. There are, however, no reasons to believe that the errors
mentioned carry over into current computer codes.
The close relationship with conjugate gradient techniques makes it possible to speculate about
the computational utility of PLS methods relative to other methods of linear regression. As
pointed out also by Wold et al. [4], the matrix transformations of Bidiag2 are computationally
simpler than those of the original PLS1 method. Further, in the prediction step some saving
would be possible using the equations given here. For problems of moderate size this saving will
not, however, be large. For small matrices a still faster procedure for bi- or tridiagonalization is Householder's method. Savings by using this method would be important for both matrix inversion and matrix diagonalization (principal components regression). The real saving with methods of the conjugate gradient type discussed here is for large and sparse matrices where the elements can only be accessed in a fixed order.
On the other hand, with present technology neither matrix inversion nor matrix diagonal-
ization is particularly difficult, even on a small computer. The cost of obtaining high-quality
chemical data for the calibration is likely to be much higher than the cost of computing. This
puts a limit on the amount of effort one may want to invest in program refinement.
Compared with principal components regression/singular value decomposition it is clear that PLS1/Bidiag2 manages with fewer latent vectors. Like PCR, the PLS methods avoid exact linear dependences, i.e., the zero eigenvalues of the X′X matrix. On the other hand, there is room for
uncertainty in how PLS treats approximate linear dependences, i.e., small positive eigenvalues of X′X. Is it desirable to include such eigenvalues irrespective of the data considered? Detailed
studies of this problem in a PCR procedure might lead to a cut-off criterion where the smallness
of the eigenvalue is compared with the importance of the eigenvector for reducing the residual
error.
The points where the PLS algorithms depart most from standard regression methods are the
use of latent vectors (PLS factors) instead of regression coefficients in the prediction step, and
that the matrix inversion of standard regression methods is actually performed anew for each
prediction sample. As is clear from the present work and also from that of Helland [13], the
latter procedure is by no means a requirement. Once the latent vectors are obtained they may
be combined into regression coefficients, (eq. 36), i.e., into one vector giving the same predicted
value as obtained with several PLS or PCR vectors. A possible use of the PLS factors would
then be for the detection of outliers among samples supplied for prediction. For this purpose, a
regression vector is insufficient as it spans only one dimension. On the other hand, there seems
to be no guarantee that the space spanned by the PLS vectors is more suitable for this purpose
than that spanned by principal components.
It seems as if the PLS2 method has few numerical or computational advantages both relative to
PLS1/Bidiag2 performed for each dependent variable y and relative to PCR. The power method
of extracting eigenvalues, although simple to program, is inefficient, especially for near-degenerate
eigenvalues. In contrast to principal components analysis, the PLS2 eigenvalue problem changes
from iteration to iteration, which makes the saving small if matrix diagonalization is used instead.
As long as the number of dependent variables is relatively small, the use of PLS1 for each
dependent variable may well be worth the effort.
In conclusion, it can be stated that the PLS1 algorithm provides one solution to the calibration
problem using collinear data. This solution has a number of attractive features, some of which
have not yet been exploited. It is an open question, however, whether this method is the optimal
solution to the problem or not. For an answer one would have to consider the structure of the
input data in greater detail than has been done so far.
ACKNOWLEDGEMENTS
Numerous discussions with Olav M. Kvalheim are gratefully acknowledged. Thanks are also due to John Birks, Inge Helland, Terje V. Karstang, H.J.H. MacFie, Harald Martens and an unnamed referee for valuable comments.
References
[1] H. Wold, Soft modelling. The basic design and some extensions, in K. Jöreskog and H. Wold (Editors), Systems under Indirect Observation, North-Holland, Amsterdam, 1982, Vol. II, pp. 1-54.
[2] H. Wold, Partial least squares, in S. Kotz and N.L. Johnson (Editors), Encyclopedia of Statistical Sciences, Vol. 6, Wiley, New York, 1985, pp. 581-591.
[3] B. Efron and G. Gong, A leisurely look at the bootstrap, jackknife and cross-validation, The
American Statistician, 37(1983)37-48
[4] S. Wold, A. Ruhe, H. Wold and W.J. Dunn III, The collinearity problem in linear regression.
The partial least squares (PLS) approach to generalized inverses, SIAM Journal on Scientific and Statistical Computing, 5(1984)735-743.
[5] M. Sjöström, S. Wold, W. Lindberg, J.-Å. Persson and H. Martens, A multivariate calibration problem in analytical chemistry solved by partial least-squares models in latent variables, Analytica Chimica Acta, 150(1983)61-70.
[6] S. Wold, C. Albano, W.J. Dunn III, K. Esbensen, S. Hellberg, E. Johansson and M. Sjöström, Pattern recognition: finding and using regularities in multivariate data, in H. Martens and H. Russwurm, Jr. (Editors), Food Research and Data Analysis, Applied Science Publishers, London, 1983, pp. 147-188.
[7] H. Martens and T. Næs, Multivariate calibration by data compression, in H.A. Martens, Multivariate Calibration. Quantitative Interpretation of Non-selective Chemical Data, Dr. techn. thesis, Technical University of Norway, Trondheim, 1985, pp. 167-286; K. Norris and P.C. Williams (Editors), Near Infrared Technology in Agricultural and Food Industries, American Cereal Association, St. Paul, MN, in press.
[8] T.V. Karstang and R. Eastgate, Multivariate calibration of an X-ray diffractometer by partial
least squares regression, Chemometrics and Intelligent Laboratory Systems, 2(1987)209-219.
[9] A.A. Christy, R.A. Velapoldi, T.V. Karstang, O.M. Kvalheim, E. Sletten and N. Telnæs, Multivariate calibration of diffuse reflectance infrared spectra of coals as an alternative to rank determination by vitrinite reflectance, Chemometrics and Intelligent Laboratory Systems, 2(1987)221-232.
[10] K.H. Esbensen and H. Martens, Predicting oil-well permeability and porosity from wire-line
geophysical logs-a feasibility study using partial least squares regression, Chemometrics and
Intelligent Laboratory Systems, 2(1987)221-232.
[11] P.J. Brown, Multivariate calibration, Proceedings of the Royal Statistical Society, Series B,
44(1982) 287-321
[13] I.S. Helland, On the structure of partial least squares regression, Reports from the Department of Mathematics and Statistics, Agricultural University of Norway, 21(1986)44.
[14] P. Geladi and B.R. Kowalski, Partial least-squares regression: A tutorial, Analytica Chimica
Acta, 185(1986)1-17
[15] G.H. Golub and W. Kahan, Calculating the singular values and pseudo-inverse of a matrix,
SIAM Journal on Numerical Analysis, Series B, 2(1965)205-224.
[16] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear
differential and integral operators, Journal of Research of the National Bureau of Standards,
45(1950)255-282.
[17] C.R. Rao and S.K. Mitra, Generalized Inverse of Matrices and its Applications, Wiley, New
York, 1971
[18] C.C. Paige and M.A. Saunders, A bidiagonalization algorithm for sparse linear equations
and least squares problems, ACM Transactions on Mathematical Software, 8(1982)43-71.
[19] M.R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems,
Journal of Research of the National Bureau of Standards, 49(1952)409-436.
[21] H. Wold, Estimation of principal components and related models by iterative least squares,
in P.R. Krishnaiah (Editor), Multivariate Analysis, Academic Press, New York, 1966, pp.391-
420.
[22] A.S. Householder, The Theory of Matrices in Numerical Analysis, Blaisdell Publ. Corp., New
York, 1964, reprinted by Dover Publications, New York, 1975, p. 198.
[23] C.R. Müntz, Solution directe de l'équation séculaire et des problèmes analogues transcendants, Comptes Rendus de l'Académie des Sciences, Paris, 156(1913)443-46.
[24] I.T. Jolliffe, A note on the use of principal components in regression, Applied Statistics,
31(1982)300-303.