Technical Report
CSD-TR-03-02
May 28, 2003
Department of Computer Science
Royal Holloway, University of London
Egham, Surrey TW20 0EX, England
1 Introduction
During recent years there have been advances in data learning using kernel methods. Kernel representation offers an alternative to learning non-linear functions by projecting the data into a high-dimensional feature space in order to increase the computational power of linear learning machines, though this still leaves open the issue of how best to choose the features or the kernel function in ways that will improve performance. We review some of the methods that have been developed for learning the feature space.
Proposed by H. Hotelling in 1936 [12], CCA can be seen as the problem of finding basis vectors for two sets of variables such that the correlation between the projections of the variables onto these basis vectors is mutually maximised. In an attempt to increase the flexibility of the feature selection, kernelisation of CCA (KCCA) has been applied to map the hypotheses to a higher-dimensional feature space. KCCA has been applied in preliminary work by Fyfe & Lai [8], Akaho [1] and, more recently, Vinokourov et al. [19] with improved results.
During recent years there has been a vast increase in the amount of multimedia content available both off-line and online, though we are unable to access or make use of this data unless it is organised in such a way as to allow efficient browsing. To enable content-based retrieval with no reference to labelling, we attempt to learn the semantic representation of images and their associated text. We present a general approach using KCCA that can be used for content-based [11] as well as mate-based retrieval [18, 11]. In both cases we compare the KCCA approach to the Generalised Vector Space Model (GVSM), which aims at capturing some term-term correlations by looking at co-occurrence information.
This study aims to serve as a tutorial and to make novel contributions in the following ways:
• In this study we follow the work of Borga [4] in representing the eigenproblem as two eigenvalue equations, as this allows us to reduce the computation time and the dimensionality of the eigenvectors.
• Further to that, we follow the idea of Bach & Jordan [2] to compute a new correlation matrix with reduced dimensionality. Though Bach & Jordan [2] address a very different problem, they use the same underlying technique of Cholesky decomposition to re-represent the kernel matrices. We show that partial Gram-Schmidt orthogonalisation [6] is equivalent to incomplete Cholesky decomposition, in the sense that incomplete Cholesky decomposition can be seen as a dual implementation of partial Gram-Schmidt.
• We show that the general approach can be adapted to two different types of problems, content and mate retrieval, by only changing the selection of eigenvectors used in the semantic projection.
In Section 2 we review the theoretical foundations of CCA, and in Section 3 we present the CCA and KCCA algorithms. Approaches to deal with the computational problems that arise in Section 3 are presented in Section 4. Our experimental results are presented in Section 5. In Section 6 we present the generalisation framework for CCA, while Section 7 draws final conclusions.
2 Theoretical Foundations
Proposed by H. Hotelling in 1936 [12], Canonical Correlation Analysis can be seen as the problem of finding basis vectors for two sets of variables such that the correlation between the projections of the variables onto these basis vectors is mutually maximised. Correlation analysis is dependent on the co-ordinate system in which the variables are described, so even if there is a very strong linear relationship between two sets of multidimensional variables, depending on the co-ordinate system used, this relationship might not be visible as a correlation. Canonical correlation analysis seeks a pair of linear transformations, one for each of the sets of variables, such that when the sets of variables are transformed the corresponding co-ordinates are maximally correlated.
Consider a multivariate random vector of the form (x, y). Suppose we are given a sample of instances $S = ((x_1, y_1), \ldots, (x_n, y_n))$ of (x, y); we use $S_x$ to denote $(x_1, \ldots, x_n)$ and similarly $S_y$ to denote $(y_1, \ldots, y_n)$. We can consider defining a new co-ordinate for x by choosing a direction $w_x$ and projecting x onto that direction,
$$x \mapsto \langle w_x, x\rangle.$$
If we do the same for y by choosing a direction $w_y$, we obtain samples of the new x and y co-ordinates, and the first stage of canonical correlation is to choose $w_x$ and $w_y$ so that the correlation between the two projected samples is maximised:
$$\rho = \max_{w_x, w_y} \operatorname{corr}(S_x w_x, S_y w_y) = \max_{w_x, w_y} \frac{\langle S_x w_x, S_y w_y\rangle}{\|S_x w_x\|\,\|S_y w_y\|}.$$
If we use $\hat{E}[f(x, y)]$ to denote the empirical expectation of the function $f(x, y)$, where
$$\hat{E}[f(x, y)] = \frac{1}{n}\sum_{i=1}^{n} f(x_i, y_i),$$
we can rewrite the correlation as
$$\rho = \max_{w_x, w_y} \frac{\hat{E}[\langle w_x, x\rangle\langle w_y, y\rangle]}{\sqrt{\hat{E}[\langle w_x, x\rangle^2]\,\hat{E}[\langle w_y, y\rangle^2]}} = \max_{w_x, w_y} \frac{\hat{E}[w_x' x y' w_y]}{\sqrt{\hat{E}[w_x' x x' w_x]\,\hat{E}[w_y' y y' w_y]}}.$$
It follows that
$$\rho = \max_{w_x, w_y} \frac{w_x' \hat{E}[x y'] w_y}{\sqrt{w_x' \hat{E}[x x'] w_x \; w_y' \hat{E}[y y'] w_y}}.$$
The total covariance matrix of (x, y) is the block matrix
$$C = \hat{E}\!\left[\begin{pmatrix} x \\ y \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}'\right] = \begin{pmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{pmatrix}, \qquad (2.1)$$
where the within-sets covariance matrices are $C_{xx}$ and $C_{yy}$ and the between-sets covariance matrices are $C_{xy} = C_{yx}'$. Hence we can rewrite the correlation as
$$\rho = \max_{w_x, w_y} \frac{w_x' C_{xy} w_y}{\sqrt{w_x' C_{xx} w_x \; w_y' C_{yy} w_y}}. \qquad (2.2)$$
The maximum canonical correlation is the maximum of $\rho$ with respect to $w_x$ and $w_y$.
3 Algorithm
In this section we give an overview of the Canonical Correlation Analysis (CCA) and kernel CCA (KCCA) algorithms, formulating the optimisation problem as a generalised eigenproblem.
Observe that the solution of equation (2.2) is not affected by re-scaling $w_x$ or $w_y$, either together or independently, so that, for example, replacing $w_x$ by $\alpha w_x$ gives the quotient
$$\frac{\alpha\, w_x' C_{xy} w_y}{\sqrt{\alpha^2\, w_x' C_{xx} w_x \; w_y' C_{yy} w_y}} = \frac{w_x' C_{xy} w_y}{\sqrt{w_x' C_{xx} w_x \; w_y' C_{yy} w_y}}.$$
Since the choice of re-scaling is therefore arbitrary, the CCA optimisation problem formulated in equation (2.2) is equivalent to maximising the numerator
subject to
$$w_x' C_{xx} w_x = 1, \qquad w_y' C_{yy} w_y = 1.$$
The corresponding Lagrangian is
$$L(\lambda, w_x, w_y) = w_x' C_{xy} w_y - \frac{\lambda_x}{2}\,(w_x' C_{xx} w_x - 1) - \frac{\lambda_y}{2}\,(w_y' C_{yy} w_y - 1).$$
Taking derivatives with respect to $w_x$ and $w_y$ we obtain
$$\frac{\partial f}{\partial w_x} = C_{xy} w_y - \lambda_x C_{xx} w_x = 0,$$
$$\frac{\partial f}{\partial w_y} = C_{yx} w_x - \lambda_y C_{yy} w_y = 0.$$
Subtracting $w_y'$ times the second equation from $w_x'$ times the first, and using the constraints, gives $\lambda_x = \lambda_y$; let $\lambda = \lambda_x = \lambda_y$. Assuming $C_{yy}$ is invertible we have $w_y = \frac{1}{\lambda} C_{yy}^{-1} C_{yx} w_x$, and substituting back gives
$$C_{xy} C_{yy}^{-1} C_{yx} w_x = \lambda^2 C_{xx} w_x. \qquad (3.4)$$
As the covariance matrices $C_{xx}$ and $C_{yy}$ are symmetric positive definite, we are able to decompose them using a complete Cholesky decomposition (more details on Cholesky decomposition can be found in Section 4.2),
$$C_{xx} = R_{xx} R_{xx}',$$
where $R_{xx}$ is a lower triangular matrix. If we let $u_x = R_{xx}' w_x$ we are able to rewrite equation (3.4) as follows:
$$C_{xy} C_{yy}^{-1} C_{yx} (R_{xx}')^{-1} u_x = \lambda^2 R_{xx} u_x,$$
$$R_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx} (R_{xx}')^{-1} u_x = \lambda^2 u_x.$$
We are therefore left with a symmetric eigenproblem of the form $Ax = \lambda x$.
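For concreteness, the primal CCA solution can be computed as in the following NumPy sketch. This is our illustration, not code from the report; it assumes centred data and well-conditioned covariance matrices, and the function and variable names are ours.

```python
import numpy as np

def primal_cca(X, Y):
    """Minimal primal CCA sketch: leading correlation and directions.

    X, Y: data matrices with one (centred) sample per row.
    No regularisation; C_xx and C_yy are assumed well conditioned.
    """
    n = X.shape[0]
    Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

    # Complete Cholesky factor: C_xx = R_xx R_xx'
    Rxx = np.linalg.cholesky(Cxx)
    Rinv = np.linalg.inv(Rxx)

    # Symmetric matrix R_xx^{-1} C_xy C_yy^{-1} C_yx R_xx'^{-1}
    A = Rinv @ Cxy @ np.linalg.solve(Cyy, Cxy.T) @ Rinv.T
    lam2, U = np.linalg.eigh(A)              # eigenvalues are lambda^2, ascending
    u_x = U[:, -1]                           # leading eigenvector
    w_x = np.linalg.solve(Rxx.T, u_x)        # recover w_x from u_x = R_xx' w_x
    lam = np.sqrt(max(lam2[-1], 0.0))
    w_y = np.linalg.solve(Cyy, Cxy.T @ w_x) / max(lam, 1e-12)
    return lam, w_x, w_y
```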
CCA may not extract useful descriptors of the data because of its linearity. Kernel CCA offers an alternative solution by first projecting the data into a higher-dimensional feature space before performing CCA in the new feature space, essentially moving from the primal to the dual representation. Kernels are methods of implicitly mapping data into a higher-dimensional feature space, a method known as the "kernel trick". A kernel is a function $K$ such that for all $x, z \in X$
$$K(x, z) = \langle \phi(x), \phi(z)\rangle,$$
where $\phi$ is a mapping from $X$ to a feature space $F$.
Using the definition of the covariance matrix in equation (2.1) we can rewrite the covariance matrices using the data matrices (of vectors) $X$ and $Y$, which have the sample vectors as rows and are therefore of size $m \times N$; we obtain
$$C_{xx} = X'X, \qquad C_{xy} = X'Y.$$
The directions $w_x$ and $w_y$ can be rewritten as projections of the data, $w_x = X'\alpha$ and $w_y = Y'\beta$. Substituting into equation (2.2) and writing $K_x = XX'$ and $K_y = YY'$ for the kernel matrices, we obtain
$$\rho = \max_{\alpha, \beta} \frac{\alpha' K_x K_y \beta}{\sqrt{\alpha' K_x^2 \alpha \cdot \beta' K_y^2 \beta}}. \qquad (3.7)$$
We find that in equation (3.7) the variables are now represented in the dual form.
Observe that, as with the primal form presented in equation (2.2), equation (3.7) is not affected by re-scaling of $\alpha$ and $\beta$ either together or independently. Hence the KCCA optimisation problem is equivalent to maximising the numerator subject to
$$\alpha' K_x^2 \alpha = 1, \qquad \beta' K_y^2 \beta = 1.$$
The corresponding Lagrangian is
$$L(\lambda, \alpha, \beta) = \alpha' K_x K_y \beta - \frac{\lambda_\alpha}{2}\,(\alpha' K_x^2 \alpha - 1) - \frac{\lambda_\beta}{2}\,(\beta' K_y^2 \beta - 1).$$
Taking derivatives with respect to $\alpha$ and $\beta$ we obtain
$$\frac{\partial f}{\partial \alpha} = K_x K_y \beta - \lambda_\alpha K_x^2 \alpha = 0, \qquad (3.8)$$
$$\frac{\partial f}{\partial \beta} = K_y K_x \alpha - \lambda_\beta K_y^2 \beta = 0. \qquad (3.9)$$
Subtracting $\beta'$ times the second equation from $\alpha'$ times the first we have
$$0 = \alpha' K_x K_y \beta - \lambda_\alpha\, \alpha' K_x^2 \alpha - \beta' K_y K_x \alpha + \lambda_\beta\, \beta' K_y^2 \beta = \lambda_\beta\, \beta' K_y^2 \beta - \lambda_\alpha\, \alpha' K_x^2 \alpha,$$
which together with the constraints implies that $\lambda_\alpha - \lambda_\beta = 0$; let $\lambda = \lambda_\alpha = \lambda_\beta$.
Considering the case where the kernel matrices $K_x$ and $K_y$ are invertible, we have
$$\beta = \frac{K_y^{-1} K_y^{-1} K_y K_x \alpha}{\lambda} = \frac{K_y^{-1} K_x \alpha}{\lambda},$$
and substituting in equation (3.8) we obtain
$$K_x K_y K_y^{-1} K_x \alpha - \lambda^2 K_x K_x \alpha = 0.$$
Hence
$$K_x K_x \alpha - \lambda^2 K_x K_x \alpha = 0$$
or
$$I\alpha = \lambda^2 \alpha. \qquad (3.10)$$
We are left with a generalised eigenproblem of the form $Ax = \lambda x$. We can deduce from equation (3.10) that $\lambda = 1$ for every vector $\alpha$; hence we can choose the projections $w_x$ to be the unit vectors $j_i$, $i = 1, \ldots, m$, while $w_y$ are the columns of $\frac{1}{\lambda} K_y^{-1} K_x$. Hence when $K_x$ or $K_y$ is invertible, perfect correlation can be formed.
Since kernel methods provide high dimensional representations such
independence is not uncommon. It is therefore clear that a naive application of
CCA in kernel defined feature space will not provide useful results. In the next
section we investigate how this problem can be avoided.
4 Computational Issues
We observe from equation (3.10) that if $K_x$ is invertible maximal correlation is obtained, suggesting that learning is trivial. To force non-trivial learning we introduce a control on the flexibility of the projections by penalising the norms of the associated weight vectors by a convex combination of constraints based on Partial Least Squares. Another computational issue that can arise is the use of large training sets, as this can lead to computational problems and degeneracy. To overcome this issue we apply partial Gram-Schmidt orthogonalisation (equivalently, incomplete Cholesky decomposition) to reduce the dimensionality of the kernel matrices.
4.1 Regularisation
To force non-trivial learning on the correlation we introduce a control on the flexibility of the projection mappings using Partial Least Squares (PLS) to penalise the norms of the associated weights. We convexly combine the PLS term with the KCCA term in the denominator of equation (3.7), obtaining
$$\rho = \max_{\alpha, \beta} \frac{\alpha' K_x K_y \beta}{\sqrt{(\alpha' K_x^2 \alpha + \kappa \|w_x\|^2)\cdot(\beta' K_y^2 \beta + \kappa \|w_y\|^2)}} = \max_{\alpha, \beta} \frac{\alpha' K_x K_y \beta}{\sqrt{(\alpha' K_x^2 \alpha + \kappa\, \alpha' K_x \alpha)\cdot(\beta' K_y^2 \beta + \kappa\, \beta' K_y \beta)}},$$
using $\|w_x\|^2 = \alpha' K_x \alpha$ and $\|w_y\|^2 = \beta' K_y \beta$. We observe that the new regularised equation is not affected by re-scaling of $\alpha$ or $\beta$, hence the optimisation problem is equivalent to maximising the numerator subject to
$$\alpha' K_x^2 \alpha + \kappa\, \alpha' K_x \alpha = 1,$$
$$\beta' K_y^2 \beta + \kappa\, \beta' K_y \beta = 1.$$
Taking derivatives with respect to $\alpha$ and $\beta$ we obtain
$$\frac{\partial f}{\partial \alpha} = K_x K_y \beta - \lambda_\alpha (K_x^2 \alpha + \kappa K_x \alpha) = 0, \qquad (4.1)$$
$$\frac{\partial f}{\partial \beta} = K_y K_x \alpha - \lambda_\beta (K_y^2 \beta + \kappa K_y \beta) = 0. \qquad (4.2)$$
Subtracting $\beta'$ times the second equation from $\alpha'$ times the first we have
$$0 = \alpha' K_x K_y \beta - \lambda_\alpha\, \alpha'(K_x^2 \alpha + \kappa K_x \alpha) - \beta' K_y K_x \alpha + \lambda_\beta\, \beta'(K_y^2 \beta + \kappa K_y \beta) = \lambda_\beta\, \beta'(K_y^2 \beta + \kappa K_y \beta) - \lambda_\alpha\, \alpha'(K_x^2 \alpha + \kappa K_x \alpha),$$
which together with the constraints implies that $\lambda_\alpha = \lambda_\beta$; let $\lambda = \lambda_\alpha = \lambda_\beta$. Considering the case where the kernel matrices $K_x$ and $K_y$ are invertible, we have
$$\beta = \frac{(K_y + \kappa I)^{-1} K_x \alpha}{\lambda},$$
and substituting in equation (4.1) gives
$$K_x K_y (K_y + \kappa I)^{-1} K_x \alpha = \lambda^2\, K_x (K_x + \kappa I)\alpha,$$
$$K_y (K_y + \kappa I)^{-1} K_x \alpha = \lambda^2\, (K_x + \kappa I)\alpha,$$
$$(K_x + \kappa I)^{-1} K_y (K_y + \kappa I)^{-1} K_x \alpha = \lambda^2 \alpha.$$
We obtain a generalised eigenproblem of the form $Ax = \lambda x$.
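When the kernel matrices are small enough to handle explicitly, the regularised dual problem above can be solved directly. The following NumPy sketch is ours (variable names are illustrative and no claim is made that it reproduces the authors' implementation); it solves the final eigenproblem and recovers β.

```python
import numpy as np

def regularised_kcca(Kx, Ky, kappa):
    """Sketch of regularised KCCA on full (centred) kernel matrices.

    Solves (K_x + kI)^{-1} K_y (K_y + kI)^{-1} K_x alpha = lambda^2 alpha and
    recovers beta = (K_y + kI)^{-1} K_x alpha / lambda.  Direct illustration
    only; for large kernels the report uses the reduction of Section 4.
    """
    n = Kx.shape[0]
    I = np.eye(n)
    A = np.linalg.solve(Kx + kappa * I, Ky) @ np.linalg.solve(Ky + kappa * I, Kx)
    lam2, V = np.linalg.eig(A)                     # A is not symmetric in general
    order = np.argsort(-lam2.real)
    lam = np.sqrt(np.clip(lam2.real[order], 0.0, None))
    alphas = V.real[:, order]
    betas = np.linalg.solve(Ky + kappa * I, Kx @ alphas) / np.maximum(lam, 1e-12)
    return lam, alphas, betas
```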
4.2 Cholesky Decomposition
A symmetric matrix $A$ does not always admit a factorisation of the form $A = LL'$ with $L$ lower triangular; for example, no such factorisation exists for the matrix
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
On the other hand, if the symmetric matrix $A$ is positive definite (i.e., $x'Ax > 0$ whenever $x'x > 0$), then the factorisation is possible. For large kernel matrices we do not compute the factorisation exactly; instead we use the following incomplete Cholesky decomposition, which skips pivots once the remaining contribution falls below a precision parameter $\eta$.
Input: N x N matrix K, precision parameter η.

1. Initialisation: i = 1, K' = K, P = I; for j in [1, N], G_jj = K_jj.
2. While Σ_{j=i}^{N} G_jj > η and i ≠ N + 1:
   • Find best new element: j* = argmax_{j in [i,N]} G_jj
   • Update j* = (j* + i) − 1 (re-index j* relative to the full matrix)
   • Update permutation P: P_next = I, P_next(i,i) = 0, P_next(j*,j*) = 0, P_next(i,j*) = 1, P_next(j*,i) = 1; P = P · P_next
   • Permute elements i and j* in K': K' = P_next · K' · P_next
   • Update (due to the new permutation) the already calculated elements of G: G(i, 1:i−1) ↔ G(j*, 1:i−1), G(i,i) ↔ G(j*,j*)
   • Set G_ii = sqrt(G_ii)
   • Calculate the ith column of G: G_{i+1:n, i} = (1 / G_ii) (K'_{i+1:n, i} − Σ_{j=1}^{i−1} G_{i+1:n, j} G_{ij})
   • Update only the diagonal elements: for j in [i+1, N], G_jj = K'_jj − Σ_{k=1}^{i} G_{jk}^2
   • Update i = i + 1
3. Output P, G and M = i
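A compact NumPy rendering of the pivoted incomplete Cholesky procedure is given below. It is a sketch of ours that mirrors the pseudocode above rather than a verbatim transcription; the index bookkeeping (0-based) is an adaptation.

```python
import numpy as np

def incomplete_cholesky(K, eta):
    """Pivoted incomplete Cholesky sketch following the pseudocode above.

    Returns a permutation `perm` and an N x M factor G with
    K[perm][:, perm] approximately equal to G @ G.T (residual trace <= eta).
    """
    K = K.copy().astype(float)
    N = K.shape[0]
    perm = np.arange(N)
    d = np.diag(K).copy()                      # residual diagonal (G_jj above)
    G = np.zeros((N, N))
    i = 0
    while i < N and d[i:].sum() > eta:
        j = i + int(np.argmax(d[i:]))          # best new pivot
        # swap rows/columns i and j of K, the permutation, the residual diagonal
        perm[[i, j]] = perm[[j, i]]
        d[[i, j]] = d[[j, i]]
        K[[i, j], :] = K[[j, i], :]
        K[:, [i, j]] = K[:, [j, i]]
        G[[i, j], :i] = G[[j, i], :i]          # already calculated elements of G
        G[i, i] = np.sqrt(d[i])
        G[i + 1:, i] = (K[i + 1:, i] - G[i + 1:, :i] @ G[i, :i]) / G[i, i]
        d[i + 1:] = np.diag(K)[i + 1:] - (G[i + 1:, :i + 1] ** 2).sum(axis=1)
        i += 1
    return perm, G[:, :i]
```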
Initialisations:
m = size of the N x N matrix K (so m = N)
j = 1
size and index are vectors with the same length as K
feat is a zero matrix equal to the size of K
for i = 1 to m do
    norm2[i] = K_ii;

Algorithm:
while Σ_i norm2[i] > η do
    i_j = argmax_i (norm2[i]);
    index[j] = i_j;
    size[j] = sqrt(norm2[i_j]);
    for i = 1 to m do
        feat[i, j] = ( k(d_i, d_{i_j}) − Σ_{t=1}^{j−1} feat[i, t] · feat[i_j, t] ) / size[j];
        norm2[i] = norm2[i] − feat(i, j) · feat(i, j);
    end;
    j = j + 1;
end;
return feat

Output:
‖K − feat · feat'‖ ≤ η, where feat is an N × M lower triangular matrix (see Appendix 1.2 for a proof).
We observe that the output is equivalent to the output of ICD.
To project a new example into this representation we compute, for j = 1 to M,
$$newfeat[j] = \Big(K_{i, index[j]} - \sum_{t=1}^{j-1} newfeat[t]\cdot feat[index[j], t]\Big) / size[j],$$
where $K_{i, index[j]}$ denotes the kernel value between the new example $i$ and training example $index[j]$.
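As a small illustration (ours, with 0-based indexing and illustrative names), projecting a new example onto the Gram-Schmidt basis produced above can be written as:

```python
import numpy as np

def project_new_example(k_new, index, size, feat):
    """Project a new example onto the partial Gram-Schmidt basis.

    k_new : array of kernel values k(d_new, d_i) against the training examples.
    index, size, feat : output of the partial Gram-Schmidt routine above
                        (index and size truncated to the M selected basis vectors).
    Returns the M-dimensional feature vector `newfeat` of the pseudocode.
    """
    M = len(size)
    newfeat = np.zeros(M)
    for j in range(M):
        newfeat[j] = (k_new[index[j]]
                      - newfeat[:j] @ feat[index[j], :j]) / size[j]
    return newfeat
```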
Let $R_x$ and $R_y$ be the lower triangular matrices obtained by performing partial Gram-Schmidt orthogonalisation on $K_x$ and $K_y$, so that $K_x \approx R_x R_x'$ and $K_y \approx R_y R_y'$, and define $Z_{xx} = R_x' R_x$, $Z_{yy} = R_y' R_y$, $Z_{xy} = R_x' R_y$, $Z_{yx} = R_y' R_x$ together with the new variables
$$\tilde{\alpha} = R_x'\alpha, \qquad \tilde{\beta} = R_y'\beta.$$
Substituting in equations (4.9) and (4.10) we find that we return to the primal representation of CCA with a dual representation of the data,
$$Z_{xx} Z_{xy}\tilde{\beta} - \lambda Z_{xx}^2\tilde{\alpha} = 0,$$
$$Z_{yy} Z_{yx}\tilde{\alpha} - \lambda Z_{yy}^2\tilde{\beta} = 0.$$
Assuming that $Z_{xx}$ and $Z_{yy}$ are invertible, we multiply the first equation by $Z_{xx}^{-1}$ and the second by $Z_{yy}^{-1}$, giving
$$Z_{xy}\tilde{\beta} - \lambda Z_{xx}\tilde{\alpha} = 0, \qquad (4.11)$$
$$Z_{yx}\tilde{\alpha} - \lambda Z_{yy}\tilde{\beta} = 0. \qquad (4.12)$$
We are able to rewrite $\tilde{\beta}$ from equation (4.12) as
$$\tilde{\beta} = \frac{Z_{yy}^{-1} Z_{yx}\tilde{\alpha}}{\lambda},$$
and substituting in equation (4.11) gives
$$Z_{xy} Z_{yy}^{-1} Z_{yx}\tilde{\alpha} = \lambda^2 Z_{xx}\tilde{\alpha}. \qquad (4.13)$$
We are left with a generalised eigenproblem of the form $Ax = \lambda Bx$. Let $SS'$ be the complete Cholesky decomposition of $Z_{xx}$, such that $Z_{xx} = SS'$ where $S$ is a lower triangular matrix, and let $\hat{\alpha} = S'\tilde{\alpha}$. Substituting in equation (4.13) we obtain
$$S^{-1} Z_{xy} Z_{yy}^{-1} Z_{yx} (S')^{-1}\hat{\alpha} = \lambda^2\hat{\alpha},$$
a symmetric eigenproblem of the form $Ax = \lambda x$.
We are able to rewrite $\tilde{\beta}$ from equation (4.17) as
$$\tilde{\beta} = \frac{(Z_{yy} + \kappa I)^{-1} Z_{yx}\tilde{\alpha}}{\lambda},$$
and substituting in equation (4.16) gives
$$Z_{xy}(Z_{yy} + \kappa I)^{-1} Z_{yx}\tilde{\alpha} = \lambda^2 (Z_{xx} + \kappa I)\tilde{\alpha}.$$
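The reduced, regularised problem is a symmetric-definite generalised eigenproblem and can be handed to a standard solver. The sketch below is ours; it assumes the factors $R_x$, $R_y$ from partial Gram-Schmidt and uses scipy.linalg.eigh.

```python
import numpy as np
from scipy.linalg import eigh

def reduced_kcca(Rx, Ry, kappa):
    """Sketch of the reduced, regularised KCCA eigenproblem.

    Rx, Ry: lower-triangular factors from partial Gram-Schmidt, so that
    K_x ~ Rx Rx' and K_y ~ Ry Ry'.  Solves
        Z_xy (Z_yy + kI)^{-1} Z_yx a = lambda^2 (Z_xx + kI) a.
    """
    Zxx, Zyy, Zxy = Rx.T @ Rx, Ry.T @ Ry, Rx.T @ Ry
    My = Zyy + kappa * np.eye(Zyy.shape[0])
    A = Zxy @ np.linalg.solve(My, Zxy.T)          # Z_xy (Z_yy + kI)^{-1} Z_yx
    B = Zxx + kappa * np.eye(Zxx.shape[0])
    lam2, alphas = eigh(A, B)                     # generalised problem, ascending
    lam = np.sqrt(np.clip(lam2[::-1], 0.0, None)) # descending order
    alphas = alphas[:, ::-1]
    betas = np.linalg.solve(My, Zxy.T @ alphas) / np.maximum(lam, 1e-12)
    return lam, alphas, betas
```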
5 Experimental Results
In the following experiments the problem of learning the semantics of multimedia content by combining image and text data is addressed. The synthesis is addressed by the kernel Canonical Correlation Analysis described in Section 4.3. We test the use of the derived semantic space in an image retrieval task that uses only image content. The aim is to allow retrieval of images from a text query but without reference to any labelling associated with the image. This can be viewed as a cross-modal retrieval task. We used the combined multimedia image-text web database kindly provided by the authors of [15], on which we also attempt mate retrieval on a test set. The data was divided into three classes (Figure 1): Sports, Aviation and Paintball, with 400 records each, consisting of JPEG images retrieved from the Internet with attached text. We randomly split each class into two halves, used as training and test data respectively. The features extracted from the data are the same as in [15] (a detailed description of the features can be found there): image HSV colour, image Gabor texture and term frequencies in text.
We compute the value of κ for the regularisation by running KCCA with the association between image and text randomised. Let λ(κ) be the spectrum without randomisation (the database paired with itself) and λ_R(κ) be the spectrum with randomisation (the database paired with a randomised version of itself), where by spectrum we mean the vector whose entries are the eigenvalues. We would like the non-random spectrum to be as distant as possible from the randomised spectrum, since if the same correlation occurs for λ(κ) and λ_R(κ) then clearly over-fitting is taking place. Therefore, for κ = 0 (no regularisation) and j = (1, . . . , 1)' the all-ones vector, we expect that we may have λ(κ) = λ_R(κ) = j, since it is very possible that the examples are linearly independent. Though we find that only 50% of the examples are linearly independent, this does not affect the selection of κ through this method. We choose the κ for which the difference between the two spectra is maximal:
$$\kappa = \arg\max_{\kappa} \|\lambda_R(\kappa) - \lambda(\kappa)\|.$$

[Figure 1: example images from the three classes Sports, Aviation and Paintball.]
We find κ = 7, and we set the Gram-Schmidt precision parameter η = 0.5 via a heuristic technique.
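The κ-selection heuristic can be sketched as follows. This is our illustration: `kcca_spectrum` is a hypothetical placeholder for a routine returning the vector of KCCA correlations (for instance the solver sketched in Section 4), and the grid of candidate κ values is our assumption.

```python
import numpy as np

def select_kappa(Kx, Ky, kappa_grid, kcca_spectrum, rng=None):
    """Heuristic a-priori choice of the regularisation parameter kappa.

    kcca_spectrum(Kx, Ky, kappa) is assumed to return the spectrum (vector of
    KCCA correlations).  The pairing is shuffled to obtain lambda_R.
    """
    rng = np.random.default_rng(rng)
    shuffled = rng.permutation(Ky.shape[0])
    Ky_rand = Ky[np.ix_(shuffled, shuffled)]      # randomised image-text association
    best_kappa, best_gap = None, -np.inf
    for kappa in kappa_grid:
        lam = kcca_spectrum(Kx, Ky, kappa)
        lam_r = kcca_spectrum(Kx, Ky_rand, kappa)
        gap = np.linalg.norm(lam_r - lam)         # ||lambda_R(kappa) - lambda(kappa)||
        if gap > best_gap:
            best_kappa, best_gap = kappa, gap
    return best_kappa
```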
To perform the test image retrieval we compute the features of the test images and text queries using the Gram-Schmidt algorithm. Once we have obtained the features for the test query (text) and test images, we project them into the semantic feature space using $\tilde{\beta}$ and $\tilde{\alpha}$ (which are computed through training) respectively. We can then compare them using the inner product of their semantic feature vectors: the higher the value of the inner product, the more similar the two objects are. Hence we retrieve the images whose inner products with the test query are highest.
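The retrieval step itself reduces to inner products in the semantic space. The following sketch is ours (array names are illustrative) and ranks test images against a projected text query.

```python
import numpy as np

def retrieve_images(query_feats, image_feats, alpha, beta, n_retrieve=10):
    """Rank test images against a projected test text query.

    query_feats : feature vector of the text query in the Gram-Schmidt basis.
    image_feats : one row per test image in the corresponding image basis.
    alpha, beta : projection directions learned by KCCA (image and text sides).
    Returns the indices of the n_retrieve images with the largest inner product.
    """
    sem_query = query_feats @ beta           # project the text query
    sem_images = image_feats @ alpha         # project the candidate images
    scores = sem_images @ sem_query          # inner products in semantic space
    return np.argsort(-scores)[:n_retrieve]
```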
Image Set GVSM success KCCA success (30) KCCA success (5)
10 78.93% 85% 90.97%
30 76.82% 83.02% 90.69%
[Figure: success rate (%) of KCCA (with 5 and 30 eigenvectors) and GVSM against image set size.]
Figure 3 Images retrieved for the text query: ”height: 6-11 weight: 235 lbs
position: forward born: september 18, 1968, split, croatia college: none”
Image set GVSM success KCCA success (30) KCCA success (150)
10 8% 17.19% 59.5%
30 19% 32.32% 69%
In Table 3 we compare the performance of the KCCA algorithm with the GVSM over 10 and 30 image sets, while in Table 4 we present the overall success over all image sets. In Figure 6 we see the overall performance of the KCCA method against the GVSM for all possible image sets.

[Figure: overall success (%) against the number of eigenvectors used.]
The success rate in Table 3 and Figure 6 is computed as the percentage of the 600 test queries for which the correct match is contained in the retrieved image set,
$$\text{success} = \frac{1}{600}\sum_{j=1}^{600} \text{count}_j,$$
where count_j = 1 if the match for query j is retrieved and 0 otherwise.
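As a small illustration (ours, not from the original experiments), the success rate can be computed from a vector of per-query outcomes:

```python
import numpy as np

def success_rate(count):
    """Success rate (%) over the test queries.

    count : 0/1 array where count[j] = 1 if the correct match for test
            query j was among the retrieved images.  Illustrative helper only.
    """
    count = np.asarray(count, dtype=float)
    return 100.0 * count.sum() / len(count)
```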
Figure 5 Images retrieved for the text query: ”at phoenix sky harbor on july 6,
1997. 757-2s7, n907wa phoenix suns taxis past n902aw teamwork america
west america west 757-2s7, n907wa phoenix suns taxis past n901aw
arizona at phoenix sky harbor on july 6, 1997.” The actual match is the
middle picture in the first row.
Figure 6 Success plot for KCCA mate-based against GVSM (success (%)
against image set size).
The difference in performance between the a priori value κ̂ and the newly found optimal value κ is 1.0423% for 5 eigenvectors and 5.031% for 30 eigenvectors. The more substantial increase in performance in the latter case is due to the increase in the regularisation parameter, which compensates for the substantial decrease in performance of the content-based retrieval (Figure 6) when a high-dimensional semantic feature space is used.
[Figures: overall success (%) plotted against the number of eigenvectors and against the regularisation parameter κ.]
The difference in performance between the a priori value κ̂ and the newly found optimal value κ is 0.627% for 150 eigenvectors and 0.7586% for 30 eigenvectors.
Our observed results support our proposed method for selecting the regularisation parameter κ in an a priori fashion, since the difference between the actual optimal κ and the a priori κ̂ is very slight.
6 Generalisation of Canonical Correlation Analysis

Proposition 3. Consider an optimisation problem with a convex objective function $f(x, y)$ of the form
$$\min_{x, y} f(x, y) \qquad (6.4)$$
subject to (6.5)
$$g(y) = 0, \qquad (6.6)$$
$$x \in \mathbb{R}^m,\ y \in \mathbb{R}^n. \qquad (6.7)$$
Let $Y \subseteq \mathbb{R}^n$ be the feasibility domain for $y$ determined by the constraint $g(y) = 0$. Replacing $x$ by the optimal solution $x(y)$ of the inner problem for each fixed $y$, the problem
$$\min_{y} f(x(y), y) \qquad (6.8)$$
subject to (6.9)
$$g(y) = 0, \qquad (6.10)$$
$$y \in \mathbb{R}^n, \qquad (6.11)$$
has the same optimal solution in $y$ as equation (6.4).

Proof. Let the optimal solution of equation (6.4) be denoted by $(x_1, y_1)$ and that of equation (6.8) by $y_2$. From the convexity of $f$ and the identical feasibility domains, the optimal solutions must coincide.
Let $H^{(1)}$ and $H^{(2)}$ be the data matrices of the two views, with the observed variables as their columns. Introducing notation for the products of these matrices to simplify the formulas:
$$\Sigma_{ij} = H^{(i)\prime} H^{(j)}, \qquad i, j = 1, 2. \qquad (6.12)$$
We are looking for linear combinations of the columns of these matrices such that the first pair of vectors $(a^{(1)}_1, a^{(2)}_1)$ is the optimal solution of the optimisation problem
$$\max_{a^{(1)}_1, a^{(2)}_1} a^{(1)\prime}_1 \Sigma_{12}\, a^{(2)}_1 \qquad (6.13)$$
subject to
$$a^{(1)\prime}_1 \Sigma_{11}\, a^{(1)}_1 = 1, \qquad a^{(2)\prime}_1 \Sigma_{22}\, a^{(2)}_1 = 1.$$
The meaning of this optimisation problem is to find the maximum correlation between linear combinations of the columns of the matrices $H^{(1)}$ and $H^{(2)}$, subject to the lengths of the vectors corresponding to these linear combinations being normalised to 1.
To determine the remaining pairs of vectors, the columns of $A^{(1)}$ and $A^{(2)}$, a series of optimisation problems is solved successively. For the pair of vectors $(a^{(1)}_r, a^{(2)}_r)$, $r = 2, \ldots, p$, we have
$$\max_{a^{(1)}_r, a^{(2)}_r} a^{(1)\prime}_r \Sigma_{12}\, a^{(2)}_r$$
subject to
$$a^{(k)\prime}_r \Sigma_{kk}\, a^{(k)}_r = 1,$$
$$a^{(k)\prime}_r \Sigma_{kk}\, a^{(k)}_j = 0,$$
$$a^{(k)\prime}_r \Sigma_{kl}\, a^{(l)}_j = 0, \qquad (6.17)$$
$$k, l = 1, 2, \quad j = 1, \ldots, r - 1.$$
This is the problem (6.13) expanded by the orthogonality constraints (6.17): the components of every new pair in the iteration have to be orthogonal to the components of the previous pairs.
After the substitution $y^{(k)}_1 = \Sigma_{kk}^{1/2} a^{(k)}_1$, $k = 1, 2$, and with the notation $D_{12} = \Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1/2}$, the problem for the first pair becomes
$$\max_{y^{(1)}_1, y^{(2)}_1} y^{(1)\prime}_1 D_{12}\, y^{(2)}_1 \qquad (6.26)$$
subject to (6.27)
$$y^{(k)\prime}_1 y^{(k)}_1 = 1, \qquad k = 1, 2. \qquad (6.28)$$
The corresponding Lagrangian is
$$L_1 = y^{(1)\prime}_1 D_{12}\, y^{(2)}_1 + \frac{\lambda_1}{2}\big(1 - y^{(1)\prime}_1 y^{(1)}_1\big) + \frac{\lambda_2}{2}\big(1 - y^{(2)\prime}_1 y^{(2)}_1\big), \qquad (6.30)$$
where $\lambda_1$ and $\lambda_2$ are the Lagrangian multipliers. The vectors of partial derivatives of $L_1$ with respect to the vectors $y^{(1)}_1$, $y^{(2)}_1$ are equal to 0 by the KKT conditions, thus we get
$$\frac{\partial L_1}{\partial y^{(1)}_1} = D_{12}\, y^{(2)}_1 - \lambda_1 y^{(1)}_1 = 0, \qquad (6.31)$$
$$\frac{\partial L_1}{\partial y^{(2)}_1} = D_{21}\, y^{(1)}_1 - \lambda_2 y^{(2)}_1 = 0. \qquad (6.32)$$
Multiplying equation (6.31) by $y^{(1)\prime}_1$ and equation (6.32) by $y^{(2)\prime}_1$ provides
$$y^{(1)\prime}_1 D_{12}\, y^{(2)}_1 - \lambda_1 y^{(1)\prime}_1 y^{(1)}_1 = 0, \qquad (6.33)$$
$$y^{(2)\prime}_1 D_{21}\, y^{(1)}_1 - \lambda_2 y^{(2)\prime}_1 y^{(2)}_1 = 0. \qquad (6.34)$$
Based on the constraints of the optimisation problem (6.26) and the identity $D_{21} = D_{12}'$ we have
$$\lambda_1 = \lambda_2 = y^{(1)\prime}_1 D_{12}\, y^{(2)}_1.$$
After replacing $\lambda_1$ and $\lambda_2$ with $\lambda$ the following equality system can be formulated:
$$\begin{pmatrix} -\lambda I & D_{12} \\ D_{21} & -\lambda I \end{pmatrix}\begin{pmatrix} y^{(1)}_1 \\ y^{(2)}_1 \end{pmatrix} = 0.$$
It is not too hard to realise that this equality system is a singular value problem of the matrix $D_{12}$, having $y^{(1)}_1$ and $y^{(2)}_1$ as a left and a right singular vector and the value of the Lagrangian $\lambda$ equal to the corresponding singular value. Based on these statements we can claim that the optimal solutions are the singular vectors belonging to the greatest singular value of the matrix $D_{12}$.
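This characterisation suggests a direct computation: form $D_{12}$ and take its singular value decomposition. The following NumPy sketch is ours (it assumes centred data and non-singular within-set matrices) and recovers the first p canonical correlations and weight vectors.

```python
import numpy as np

def _inv_sqrt(S):
    """Inverse symmetric square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca_via_svd(H1, H2, p=1):
    """Sketch of CCA through the SVD of D_12 = S11^{-1/2} S12 S22^{-1/2}."""
    S11, S22, S12 = H1.T @ H1, H2.T @ H2, H1.T @ H2
    S11i, S22i = _inv_sqrt(S11), _inv_sqrt(S22)
    D12 = S11i @ S12 @ S22i
    U, s, Vt = np.linalg.svd(D12)
    A1 = S11i @ U[:, :p]        # undo the substitution y = Sigma^{1/2} a
    A2 = S22i @ Vt[:p, :].T
    return s[:p], A1, A2        # singular values are the canonical correlations
```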
The canonical vectors can also be obtained simultaneously, by solving
$$\max_{(a^{(1)}_1, a^{(2)}_1), \ldots, (a^{(1)}_p, a^{(2)}_p)} \sum_{i=1}^{p} a^{(1)\prime}_i \Sigma_{12}\, a^{(2)}_i \qquad (6.38)$$
subject to (6.39)
$$a^{(1)\prime}_i \Sigma_{11}\, a^{(1)}_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise,} \end{cases} \qquad (6.40)$$
$$a^{(2)\prime}_i \Sigma_{22}\, a^{(2)}_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise,} \end{cases} \qquad (6.41)$$
$$i, j = 1, \ldots, p, \qquad (6.42)$$
$$a^{(1)\prime}_i \Sigma_{12}\, a^{(2)}_j = 0, \qquad (6.43)$$
$$i, j = 1, \ldots, p, \quad j \neq i.$$
Based on equation (6.37) and the definition of the Frobenius norm we have a compact formulation of the canonical correlation problem:
$$\max_{A^{(1)}, A^{(2)}} \operatorname{Tr}\!\big(A^{(1)\prime}\, \Sigma_{12}\, A^{(2)}\big) \qquad (6.44)$$
subject to
$$A^{(k)\prime}\, \Sigma_{kk}\, A^{(k)} = I,$$
$$a^{(k)\prime}_i \Sigma_{kl}\, a^{(l)}_j = 0,$$
$$k, l = \{1, 2\},\ l \neq k, \quad i, j = 1, \ldots, p,\ j \neq i,$$
where $I$ is the identity matrix of size $p \times p$. Repeating the substitution in equation (6.23), the set of feasible vectors for the simultaneous problem is equal to the left and right singular vectors of the matrix $D_{12}$; hence the optimal solution is compatible with that of the successive problems.
The same solution can be obtained from a minimisation problem, minimising the squared Frobenius-norm distance between the transformed sets,
$$\min_{A^{(1)}, A^{(2)}} \big\|H^{(1)} A^{(1)} - H^{(2)} A^{(2)}\big\|_F^2 \qquad (6.49)$$
subject to
$$A^{(k)\prime}\, \Sigma_{kk}\, A^{(k)} = I,$$
$$a^{(k)\prime}_i \Sigma_{kl}\, a^{(l)}_j = 0,$$
$$k, l = 1, \ldots, 2,\ l \neq k, \quad i, j = 1, \ldots, p,\ j \neq i.$$
Unfolding the objective function of the minimisation problem (6.49) shows that the optimisation problem is the same as the maximisation problem (6.44). This formulation generalises naturally to $K \geq 2$ sets of variables: minimise the sum of the squared distances between all pairs of transformed sets,
$$\min_{A^{(1)}, \ldots, A^{(K)}} \sum_{k \neq l} \big\|H^{(k)} A^{(k)} - H^{(l)} A^{(l)}\big\|_F^2$$
subject to
$$A^{(k)\prime}\, \Sigma_{kk}\, A^{(k)} = I,$$
$$a^{(k)\prime}_i \Sigma_{kl}\, a^{(l)}_j = 0,$$
$$k, l = 1, \ldots, K,\ l \neq k, \quad i, j = 1, \ldots, p,\ j \neq i.$$
In the forthcoming sections we will show how to simplify this problem.
The total squared distance, i.e. the sum of the squared Euclidean distances between all possible pairs of vectors in X, is equal to
$$\frac{1}{2}\sum_{k=1}^{m}\sum_{l=1,\,l\neq k}^{m} \|x_k - x_l\|_2^2 = \frac{1}{2}\sum_{k,l=1}^{m}\sum_{i=1}^{n} (x_{ki} - x_{li})^2 = \frac{1}{2}\sum_{k,l=1}^{m}\sum_{i=1}^{n} \big(x_{ki}^2 + x_{li}^2 - 2 x_{ki} x_{li}\big)$$
$$= \frac{1}{2}\sum_{i=1}^{n}\Big(m\sum_{k=1}^{m} x_{ki}^2 + m\sum_{l=1}^{m} x_{li}^2 - 2\sum_{k=1}^{m} x_{ki}\sum_{l=1}^{m} x_{li}\Big) = m\sum_{i=1}^{n}\sum_{k=1}^{m} (x_{ki} - M_i)^2,$$
where $M_i$ denotes the mean of the $i$th components of the vectors in X. Hence the total squared distance turns out to be equal to the sum of the component-wise variances of the vectors in X multiplied by the square of the number of vectors. Similarly, the vector $z$ minimising the total squared distance to the vectors of X has components
$$z_i = \frac{1}{m}\sum_{k=1}^{m} x_{ki},$$
i.e. the components of the optimal solution are equal to the mean values of the corresponding components of the known vectors.
where $a^{(k)}_i$ denotes the $i$th column of the matrix $A^{(k)}$ containing the possible linear combinations. After the substitution $y^{(k)}_i = \Sigma_{kk}^{1/2} a^{(k)}_i$ the constraints read
$$y^{(k)\prime}_i y^{(k)}_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise,} \end{cases} \qquad (6.76)$$
$$k = 1, \ldots, K, \quad i, j = 1, \ldots, p, \qquad (6.77)$$
$$y^{(k)\prime}_i D_{kl}\, y^{(l)}_j = 0, \qquad (6.78)$$
$$k, l = 1, \ldots, K,\ k \neq l, \quad i, j = 1, \ldots, p,\ i \neq j, \qquad (6.79)$$
where
$$Q_k = H^{(k)}\, \Sigma_{kk}^{-1/2}, \qquad D_{kl} = Q_k' Q_l.$$
We can derive another statement about the optimal solution of the problem. Exploiting the definition of the Frobenius norm, the objective function (6.70) can be rewritten as a sum of Euclidean norms of column vectors, where $x_i$ denotes the $i$th column of the matrix X:
$$\frac{1}{K}\sum_{k=1}^{K}\big\|X - H^{(k)} A^{(k)}\big\|_F^2 = \frac{1}{K}\sum_{k=1}^{K}\sum_{i=1}^{p}\big\|x_i - H^{(k)} a^{(k)}_i\big\|^2 = \frac{1}{K}\sum_{k=1}^{K}\sum_{i=1}^{p}\big\|x_i - Q_k y^{(k)}_i\big\|^2$$
$$= \frac{1}{K}\sum_{k=1}^{K}\sum_{i=1}^{p}\big\langle x_i - Q_k y^{(k)}_i,\ x_i - Q_k y^{(k)}_i\big\rangle.$$
The constraints are formulated in equation (6.76).
For the Lagrangian function of the optimisation problem we have
$$L = \sum_{k=1}^{K}\sum_{i=1}^{p}\big\langle x_i - Q_k y^{(k)}_i,\ x_i - Q_k y^{(k)}_i\big\rangle \qquad (6.88)$$
$$\quad + \sum_{k}\sum_{i} \lambda_{k,ii}\big(1 - y^{(k)\prime}_i y^{(k)}_i\big) \qquad (6.89)$$
$$\quad + \sum_{k}\sum_{\substack{i,j,\ i\neq j}} \lambda_{k,ij}\big(-y^{(k)\prime}_i y^{(k)}_j\big) \qquad (6.90)$$
$$\quad + \sum_{\substack{k,l,\ k\neq l}}\sum_{\substack{i,j,\ i\neq j}} \lambda_{kl,ij}\big(-y^{(k)\prime}_i D_{kl}\, y^{(l)}_j\big), \qquad (6.91)$$
where we disregard the constant $\frac{1}{K}$ from the objective function (6.70).
After computing the partial derivatives, where $x_i$ denotes the $i$th column of the matrix X, we get
$$\frac{\partial L}{\partial x_i} = \sum_{k=1}^{K}\big(2 x_i - 2 Q_k y^{(k)}_i\big) = 0, \qquad i = 1, \ldots, p, \qquad (6.92)$$
$$\frac{\partial L}{\partial y^{(k)}_i} = 2 D_{kk}\, y^{(k)}_i - 2 Q_k' x_i - 2\lambda_{k,ii}\, y^{(k)}_i - 2\sum_{j \neq i}^{p} \lambda_{k,ij}\, y^{(k)}_j - 2\sum_{l \neq k}^{K}\sum_{j \neq i}^{p} \lambda_{kl,ij}\, D_{kl}\, y^{(l)}_j = 0, \qquad (6.93)$$
$$k = 1, \ldots, K, \quad i = 1, \ldots, p. \qquad (6.94)$$
Based on Proposition 3 we can replace the variable X in equation (6.70) by an expression of the other variables without changing the optimum value or the optimal solution. Thus we arrive at the variance formulation of the problem.
7 Conclusions
Through this study we have presented a tutorial on canonical correlation analysis and have established a novel general approach to retrieving images based solely on their content, which we applied to content-based and mate-based retrieval. Experiments show that image retrieval can be more accurate than with the Generalised Vector Space Model. We demonstrate that one can choose the regularisation parameter κ a priori so that it performs well in very different regimes. Hence we have come to the conclusion that kernel Canonical Correlation Analysis is a powerful tool for image retrieval via content. In the future we will extend our experiments to other data collections.
These approaches can give tools to handle some problems in the kernel space, where the inner products and the distances between the points are known but the coordinates are not. For some problems it is sufficient to know only the coordinates of a few special points, which can be expressed from the known inner products, e.g. performing cluster analysis in the kernel space and computing the coordinates of the cluster centres only.
Acknowledgments
We would like to acknowledge the financial support of EU Projects KerMIT,
No. IST-2000-25341 and LAVA, No. IST-2001-34405.
1 Proof of $\|K - G^i G^{i\prime}\| \leq \eta$
1.1 Some notation
Lemma 4. Let A and B be square matrices and let $\operatorname{Trace}(A) = \sum_{i=1}^{n} a_{ii}$. Then $\operatorname{Trace}(AB) = \operatorname{Trace}(BA)$.
Proof.
$$\operatorname{Trace}(AB) = \sum_{i=1}^{n} (AB)_{ii} = \sum_{i,j=1}^{n} a_{ij} b_{ji} = \sum_{j,i=1}^{n} b_{ji} a_{ij} = \sum_{j=1}^{n} (BA)_{jj} = \operatorname{Trace}(BA).$$
Lemma 5. Let A be a symmetric matrix with eigenvalue decomposition $A = V\Lambda V'$, where $V$ is orthonormal and $\Lambda$ is the diagonal matrix of eigenvalues $\lambda_i$. Then $\operatorname{Trace}(A) = \sum_i \lambda_i$.
Proof.
$$\operatorname{Trace}(\Lambda) = \operatorname{Trace}(V'AV) = \operatorname{Trace}((V'A)V) = \operatorname{Trace}(V(V'A)) = \operatorname{Trace}(VV'A) = \operatorname{Trace}(A),$$
and $\Lambda_{ii} = \lambda_i$.
Lemma 6. Let A be a symmetric positive semi-definite matrix with eigenvalues $\lambda_i$. Then $\|A\| = \max_i \lambda_i$.
Proof. By definition
$$\|A\| = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|}.$$
For any $c \in \mathbb{R}$, scaling $x$ by $c$ does not change the quotient, hence we may take $\|x\| = 1$ and obtain
$$\|Ax\|^2 = x'A'Ax.$$
Let $UDU'$ be the eigenvalue decomposition of $A'A$, so that $D$ is a diagonal matrix containing the squares of the eigenvalues of A:
$$A'A = UDU', \qquad \|Ax\|^2 = x'UDU'x.$$
Setting $w = U'x$, and as $U$ is orthogonal we can rewrite $\|x\| = 1$ as $\|w\| = 1$, so
$$\|A\|^2 = \max_{\|w\|=1} w'Dw = \max_{\|w\|=1} \sum_i \lambda_i^2 w_i^2 = \max_i \lambda_i^2.$$
Hence we obtain $\|A\| = \max_i \lambda_i$.
1.2 Proof
Theorem 7. If K is a positive definite matrix and $GG'$ is its incomplete Cholesky decomposition, then the Euclidean norm of $GG'$ subtracted from K is less than or equal to the trace of the uncalculated part of K. Let $\Delta K^i$ be the uncalculated part of K and let $\eta = \operatorname{Trace}(\Delta K^i)$; then $\|K - G^i G^{i\prime}\| \leq \eta$.
Proof. Let $K = GG'$ be the complete Cholesky decomposition of K, where G is a lower triangular matrix (the upper triangular part is zero), written in block form as
$$G = \begin{pmatrix} A & 0 \\ B & C \end{pmatrix}.$$
Let $G^i G^{i\prime}$ be the incomplete decomposition of K, where $i$ is the number of iterations of the Cholesky factorisation procedure performed, so that
$$G^i = G_{1:n, 1:i} = \begin{pmatrix} A \\ B \end{pmatrix}$$
and $G^i G^{i\prime} = \tilde{K}^i$, where $\tilde{K}^i$ is the approximation of K subject to a symmetric permutation of rows and columns. Assuming that the rows and columns of K have already been permuted accordingly, we have
$$\tilde{K}^i = G^i G^{i\prime} = \begin{pmatrix} AA' & AB' \\ BA' & BB' \end{pmatrix}, \qquad \Delta K^i = K - \tilde{K}^i = \begin{pmatrix} 0 & 0 \\ 0 & CC' \end{pmatrix}.$$
We show that $CC'$ is positive semi-definite:
$$CC' = K_{i+1:n,\, i+1:n} - \tilde{K}^i_{i+1:n,\, i+1:n} = K_{i+1:n,\, i+1:n} - BB' = K_{i+1:n,\, i+1:n} - B \cdot A^{-1} \cdot A \cdot B'$$
$$= K_{i+1:n,\, i+1:n} - B \cdot A^{-1} \cdot (AB') = K_{i+1:n,\, i+1:n} - G_{i+1:n,\, 1:i} \cdot G_{1:i,\, 1:i}^{-1} \cdot K_{1:i,\, i+1:n}.$$
Therefore, for any x,
$$x'CC'x = \langle C'x,\ C'x\rangle \geq 0,$$
so all eigenvalues of $CC'$ are non-negative; $CC'$ is a positive semi-definite matrix and hence $\Delta K^i$ is also positive semi-definite. Using Lemma 6 we are now able to show that
$$\|K - \tilde{K}^i\| = \|K - G^i G^{i\prime}\| = \|\Delta K^i\| = \max_i \lambda_i,$$
where the $\lambda_i$ are the eigenvalues of $\Delta K^i$. As the maximum eigenvalue is less than or equal to the sum of all the eigenvalues, using Lemma 5 we are able to rewrite the expression as
$$\|K - G^i G^{i\prime}\| \leq \sum_i \lambda_i = \operatorname{Trace}(\Lambda) = \operatorname{Trace}(\Delta K^i).$$
Therefore,
$$\|K - G^i G^{i\prime}\| \leq \eta.$$
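The bound can be spot-checked numerically. The snippet below is ours; it reuses the pivoted incomplete Cholesky sketch from Section 4 (a function of our own, not the authors' code) and verifies that the spectral norm of the residual stays below the stopping threshold η.

```python
import numpy as np

# Numerical spot-check of Theorem 7 (illustrative only): for a random PSD
# kernel, the spectral norm of K - G G' should not exceed the stopping
# threshold eta used by the incomplete_cholesky sketch given earlier.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 5))
K = X @ X.T                                     # a positive semi-definite kernel
eta = 1e-3
perm, G = incomplete_cholesky(K, eta)           # sketch from Section 4
Kp = K[np.ix_(perm, perm)]                      # symmetrically permuted kernel
residual_norm = np.linalg.norm(Kp - G @ G.T, 2) # spectral norm of the residual
assert residual_norm <= eta + 1e-9
```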
Bibliography
[2] Francis Bach and Michael Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1–48, 2002.
[6] Nello Cristianini, John Shawe-Taylor, and Huma Lodhi. Latent semantic kernels. In Carla Brodley and Andrea Danyluk, editors, Proceedings of ICML-01, 18th International Conference on Machine Learning, pages 66–73. Morgan Kaufmann Publishers, San Francisco, US, 2001.
[7] Colin Fyfe and Pei Ling Lai. ICA using kernel canonical correlation analysis.
[8] Colin Fyfe and Pei Ling Lai. Kernel and nonlinear canonical correlation analysis.
International Journal of Neural Systems, 2001.
[11] David R. Hardoon and John Shawe-Taylor. KCCA for different level precision in content-based image retrieval. Submitted to the Third International Workshop on Content-Based Multimedia Indexing, IRISA, Rennes, France, 2003.
[12] H. Hotelling. Relations between two sets of variates. Biometrika, 28:312– 377,
1936.
[13] E. Isaacson and H. B. Keller. Analysis of Numerical Methods. John Wiley &
Sons, Inc, 1966.
[16] Malte Kuss and Thore Graepel. The geometry of kernel canonical correlation analysis. 2002.
[17] Yong Rui, Thomas S. Huang, and Shih-Fu Chang. Image retrieval: Current techniques, promising directions, and open issues. Journal of Visual Communications and Image Representation, 10:39–62, 1999.
[18] Alexei Vinokourov, David R. Hardoon, and John Shawe-Taylor. Learning the semantics of multimedia content with application to web image retrieval and classification. In Proceedings of the Fourth International Symposium on Independent Component Analysis and Blind Source Separation, Nara, Japan, 2003.
[19] Alexei Vinokourov, John Shawe-Taylor, and Nello Cristianini. Inferring a semantic representation of text via cross-language correlation analysis. In Advances in Neural Information Processing Systems 15 (to appear), 2002.