Summary

The widespread use of multi-sensor technology and the emergence of big datasets have highlighted the limitations of standard flat-view matrix models and the necessity to move towards more versatile data analysis tools. We show that higher-order tensors (i.e., multiway arrays) enable such a fundamental paradigm shift towards models that are essentially polynomial and whose uniqueness, unlike the matrix methods, is guaranteed under very mild and natural conditions. Benefiting from the power of multilinear algebra as their mathematical backbone, data analysis techniques using tensor decompositions are shown to have great flexibility in the choice of constraints that match data properties, and to find more general latent components in the data than matrix-based methods. A comprehensive introduction to tensor decompositions is provided from a signal processing perspective, starting from the algebraic foundations, via basic Canonical Polyadic and Tucker models, through to advanced cause-effect and multi-view data analysis schemes. We show that tensor decompositions enable natural generalizations of some commonly used signal processing paradigms, such as canonical correlation and subspace techniques, signal separation, linear regression, feature extraction and classification. We also cover computational aspects, and point out how ideas from compressed sensing and scientific computing may be used for addressing the otherwise unmanageable storage and manipulation problems associated with big datasets. The concepts are supported by illustrative real world case studies illuminating the benefits of the tensor framework, as efficient and promising tools for modern signal processing, data analysis and machine learning applications; these benefits also extend to vector/matrix data through tensorization.

INTRODUCTION

Historical notes. The roots of multiway analysis can be traced back to studies of homogeneous polynomials in the 19th century; contributors include Gauss, Kronecker, Cayley, Weyl and Hilbert — in modern day interpretation these are fully symmetric tensors. Decompositions of non-symmetric tensors have been studied since the early 20th century [1], whereas the benefits of using more than two matrices in factor analysis [2] became apparent in several communities since the 1960s. The Tucker decomposition for tensors was introduced in psychometrics [3], [4], while the Canonical Polyadic Decomposition (CPD) was independently rediscovered and put into an application context under the names of Canonical Decomposition (CANDECOMP) in psychometrics [5] and Parallel Factor Model (PARAFAC) in linguistics [6]. Tensors were subsequently adopted in diverse branches of data analysis such as chemometrics, food industry and social sciences [7], [8]. When it comes to Signal Processing, the early 1990s saw a considerable interest in Higher-Order Statistics (HOS) [9] and it was soon realized that for the multivariate case HOS are effectively higher-order tensors; indeed, algebraic approaches to Independent Component Analysis (ICA) using HOS [10]–[12] were inherently tensor-based. Around 2000, it was realized that the Tucker decomposition represents a MultiLinear Singular Value Decomposition (MLSVD) [13]. Generalizing the matrix SVD, the workhorse of numerical linear algebra, the MLSVD spurred the interest in tensors in applied mathematics and scientific computing in very high dimensions [14]–[16]. In parallel, CPD was successfully adopted as a tool for sensor array processing and deterministic signal separation in wireless communication [17], [18]. Subsequently, tensors have been used in audio, image and video processing, machine
...
…algebra is much structurally richer than linear algebra. For example, even basic notions such as rank have a more subtle meaning, uniqueness conditions of higher-order tensor decompositions are more relaxed and accommodating than those for matrices [31], [32], while matrices and tensors also have completely different geometric properties [20]. This boils down to matrices representing linear transformations and quadratic forms, while tensors are connected with multilinear mappings and multivariate polynomials [29].
...
where D = diag(λ_1, λ_2, . . . , λ_R) is a scaling (normalizing) matrix, the columns of B represent the unknown source signals (factors or latent variables depending on the tasks in hand), the columns of A represent the associated mixing vectors (or factor loadings), while E is noise due to an unmodelled data part or model error. In other words, model (1) assumes that the data matrix X comprises hidden components b_r (r = 1, 2, . . . , R) that are mixed together in an unknown manner through the coefficients A, or, equivalently, that the data contain factors that have
an associated loading for every data channel. Figure 2 (top) depicts the model (1) as a dyadic decomposition, whereby the terms a_r ◦ b_r = a_r b_r^T are rank-1 matrices.

The well-known indeterminacies intrinsic to this model are: (i) arbitrary scaling of components, and (ii) permutation of the rank-1 terms. Another indeterminacy is related to the physical meaning of the factors: if the model in (1) is unconstrained, it admits infinitely many combinations of A and B. Standard matrix factorizations in linear algebra, such as the QR-factorization, Eigenvalue Decomposition (EVD), and Singular Value Decomposition (SVD), are only special cases of (1), and owe their uniqueness to hard and restrictive constraints such as triangularity and orthogonality. On the other hand, certain properties of the factors in (1) can be represented by appropriate constraints, making possible unique estimation or extraction of such factors. These constraints include statistical independence, sparsity, nonnegativity, exponential structure, uncorrelatedness, constant modulus, finite alphabet, smoothness and unimodality. Indeed, the first four properties form the basis of Independent Component Analysis (ICA) [12], [33], [34], Sparse Component Analysis (SCA) [30], Nonnegative Matrix Factorization (NMF) [19], and harmonic retrieval [35].

TENSORIZATION — BLESSING OF DIMENSIONALITY

While one-way (vectors) and two-way (matrices) algebraic structures were respectively introduced as natural representations for segments of scalar measurements and measurements on a grid, tensors were initially used purely for the mathematical benefits they provide in data analysis; for instance, it seemed natural to stack together excitation-emission spectroscopy matrices in chemometrics into a third-order tensor [7].

The procedure of creating a data tensor from lower-dimensional original data is referred to as tensorization, and we propose the following taxonomy for tensor generation:

1) Rearrangement of lower dimensional data structures. Large-scale vectors or matrices are readily tensorized to higher-order tensors, and can be compressed through tensor decompositions if they admit a low-rank tensor approximation; this principle facilitates big data analysis [21], [27], [28]
...
where b = [1, z, z^2, · · · ]^T.
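As a minimal illustration of this rank-1 Hankel structure, the following NumPy sketch builds a Hankel matrix from an exponential signal x(k) = a z^k and checks that it is rank one; the signal values and the matrix sizes are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Hypothetical exponential signal x(k) = a * z**k (a, z are illustrative values).
a, z = 2.0, 0.9
x = a * z ** np.arange(8)

# Arrange the samples into a Hankel matrix H[i, j] = x[i + j].
I, J = 4, 5
H = np.array([[x[i + j] for j in range(J)] for i in range(I)])

# The Hankel matrix of an exponential is rank 1: H = a * outer(b_I, b_J) with b = [1, z, z^2, ...]^T.
print(np.linalg.matrix_rank(H))                 # -> 1
b_I, b_J = z ** np.arange(I), z ** np.arange(J)
print(np.allclose(H, a * np.outer(b_I, b_J)))   # -> True
```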
Also, in sensor array processing, tensor
structures naturally emerge when combin-
...
plays) [40]. Also in scientific computing we often need to evaluate a discretized multivariate function; this is a natural tensor, as illustrated in Figure 1 (bottom) for a trivariate function f(x, y, z) [21], [27], [28].

The high dimensionality of the tensor format is associated with blessings — these include possibilities to obtain compact representations, uniqueness of decompositions, flexibility in the choice of constraints, and generality of components that can be identified.

CANONICAL POLYADIC DECOMPOSITION

Definition. A Polyadic Decomposition (PD) represents an Nth-order tensor X ∈ R^{I1×I2×···×IN} as a linear combination of rank-1 tensors in the form

X = ∑_{r=1}^{R} λ_r b_r^(1) ◦ b_r^(2) ◦ · · · ◦ b_r^(N).    (3)

Equivalently, X is expressed as a multilinear product with a diagonal core:

X = D ×_1 B^(1) ×_2 B^(2) · · · ×_N B^(N) = ⟦D; B^(1), B^(2), . . . , B^(N)⟧,    (4)

where D = diag_N(λ_1, λ_2, . . . , λ_R) (cf. the matrix case in (1)). Figure 2 (bottom) illustrates these two interpretations for a third-order tensor. The tensor rank is defined as the smallest value of R for which (3) holds exactly; the minimum rank PD is called canonical (CPD) and is desired in signal separation. The term CPD may also be considered as an abbreviation of CANDECOMP/PARAFAC decomposition, see Historical notes. The matrix/vector form of CPD can be obtained via the Khatri-Rao products as:

X_(n) = B^(n) D (B^(N) ⊙ · · · ⊙ B^(n+1) ⊙ B^(n−1) ⊙ · · · ⊙ B^(1))^T    (5)
vec(X) = [B^(N) ⊙ B^(N−1) ⊙ · · · ⊙ B^(1)] d,

where d = (λ_1, λ_2, . . . , λ_R)^T.
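A small NumPy sketch of (3) and (5) for a third-order tensor; the dimensions, rank and random factors are assumptions chosen for illustration, and the exact Khatri-Rao factor ordering depends on the unfolding convention used.

```python
import numpy as np

I1, I2, I3, R = 4, 5, 6, 3
rng = np.random.default_rng(0)
B1, B2, B3 = (rng.standard_normal((d, R)) for d in (I1, I2, I3))
lam = rng.standard_normal(R)

# Rank-R tensor built as a sum of weighted outer products, as in (3).
X = np.zeros((I1, I2, I3))
for r in range(R):
    X += lam[r] * np.einsum('i,j,k->ijk', B1[:, r], B2[:, r], B3[:, r])

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

# With NumPy's C-ordered reshape, the mode-1 unfolding satisfies
# X_(1) = B1 diag(lam) (B2 khatri-rao B3)^T (factor order follows this unfolding convention, cf. (5)).
X1 = X.reshape(I1, I2 * I3)
print(np.allclose(X1, B1 @ np.diag(lam) @ khatri_rao(B2, B3).T))   # -> True
```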
Rank. As mentioned earlier, rank-related properties are very different for matrices and tensors. For instance, the number of complex-valued rank-1 terms needed to represent a higher-order tensor can be strictly less than the number of real-valued rank-1 terms [20], while the determination of tensor rank is in general NP-hard [41]. Fortunately, in signal processing applications, rank estimation most often corresponds to determining the number of tensor components that can be retrieved with sufficient accuracy, and often there are only a few data components present. A pragmatic first assessment of the number of components may be through the inspection of the multilinear singular value spectrum (see Section Tucker Decomposition), which indicates the size of the core tensor in Figure 2 (bottom-right). Existing techniques for rank estimation include the CORCONDIA algorithm (core consistency diagnostic), which checks whether the core tensor is (approximately) diagonalizable [7], while a number of techniques operate by balancing the approximation error versus the number of degrees of freedom for a varying number of rank-1 terms [42]–[44].

Uniqueness. Uniqueness conditions give theoretical bounds for exact tensor decompositions. A classical uniqueness condition is due to Kruskal [31], which states that for third-order tensors the CPD is unique up to unavoidable scaling and permutation ambiguities, provided that k_{B^(1)} + k_{B^(2)} + k_{B^(3)} ≥ 2R + 2, where the Kruskal rank k_B of a matrix B is the maximum value ensuring that any subset of k_B columns is linearly independent. In sparse modeling, the term (k_B + 1) is also known as the spark [30]. A generalization to Nth-order tensors is due to Sidiropoulos and Bro [45], and is given by:

∑_{n=1}^{N} k_{B^(n)} ≥ 2R + N − 1.    (6)

More relaxed uniqueness conditions can be obtained when one factor matrix has full column rank [46]–[48]; for a thorough study of the third-order case, we refer to [32]. This all shows that, compared to matrix decompositions, CPD is unique under more natural and relaxed conditions, which only require the components to be “sufficiently different” and their number not unreasonably large. These conditions do not have a matrix counterpart, and are at the heart of tensor-based signal separation.
Figure 2: Analogy between dyadic (top) and polyadic (bottom) decompositions; the Tucker format has a diagonal core. The uniqueness of these decompositions is a prerequisite for blind source separation and latent variable analysis.

Computation. Certain conditions, including Kruskal’s, enable explicit computation of the factor matrices in (3) using linear algebra (essentially, by solving sets of linear equations and by computing (generalized) Eigenvalue Decomposition) [6], [47], [49], [50]. The presence of noise in data means that CPD is rarely exact, and we need to fit a CPD model to the data by minimizing a suitable cost function. This is typically achieved by minimizing the Frobenius norm of the difference between the
given data tensor and its CP approximation, or alternatively by least absolute error fitting when the noise is Laplacian [51]. Theoretical Cramér-Rao Lower Bound (CRLB) and Cramér-Rao Induced Bound (CRIB) for the assessment of CPD performance were derived in [52] and [53].

Since the computation of CPD is intrinsically multilinear, we can arrive at the solution through a sequence of linear sub-problems as in the Alternating Least Squares (ALS) framework, whereby the LS cost function is optimized for one component matrix at a time, while keeping the other component matrices fixed [6]. As seen from (5), such a conditional update scheme boils down to solving overdetermined sets of linear equations.
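A bare-bones NumPy sketch of this alternating scheme for a third-order tensor; the fixed rank, random initialization and plain pseudoinverse updates are simplifying assumptions rather than the optimized algorithms discussed here.

```python
import numpy as np

def khatri_rao(A, B):
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def cpd_als(X, R, n_iter=100, seed=0):
    """Alternating Least Squares for a third-order CPD; returns factor matrices A, B, C."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((d, R)) for d in (I, J, K))
    # Mode-n unfoldings (C ordering): X1[i, jK+k], X2[j, iK+k], X3[k, iJ+j].
    X1 = X.reshape(I, J * K)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)   # LS update for A with B, C fixed
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# Sanity check on a synthetic rank-3 tensor.
rng = np.random.default_rng(2)
A0, B0, C0 = rng.standard_normal((4, 3)), rng.standard_normal((5, 3)), rng.standard_normal((6, 3))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cpd_als(X, R=3)
print(np.linalg.norm(X - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(X))  # close to zero
```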
While the ALS is attractive for its simplicity and satisfactory performance for a few well separated components and at sufficiently high SNR, it also inherits the problems of alternating algorithms and is not guaranteed to converge to a stationary point. This can be rectified by only updating the factor matrix for which the cost function has most decreased at a given step [54], but this results in an N-times increase in computational cost per iteration. The convergence of ALS is not yet completely understood — it is quasi-linear close to the stationary point [55], while it becomes rather slow for ill-conditioned cases; for more detail we refer to [56], [57].

Conventional all-at-once algorithms for numerical optimization such as nonlinear conjugate gradients, quasi-Newton or nonlinear least squares [58], [59] have been shown to often outperform ALS for ill-conditioned cases and to be typically more robust to overfactoring, but come at a cost of a much higher computational load per iteration. More sophisticated versions use the rank-1 structure of the terms within CPD to perform efficient computation and storage of the Jacobian and (approximate) Hessian; their complexity is on par with ALS, while for ill-conditioned cases the performance is often superior [60], [61].

An important difference between matrices and tensors is that the existence of a best rank-R approximation of a tensor of rank greater than R is not guaranteed [20], [62], since the set of tensors whose rank is at most R is not closed. As a result, cost functions for computing factor matrices may only have an infimum (instead of a minimum) so that their minimization will approach the boundary of that set without ever reaching the boundary point. This will cause two or more rank-1 terms to go to infinity upon convergence of an algorithm; however, numerically the diverging terms will almost completely cancel one another while the overall cost function will still decrease along the iterations [63]. These diverging terms indicate an inappropriate data model: the mismatch between the CPD and the original data tensor may arise due to an underestimated number of components, not all tensor components having a rank-1 structure, or data being too noisy.

Constraints. As mentioned earlier, under quite mild conditions the CPD is unique by itself, without requiring additional constraints. However, in order to enhance the accuracy and robustness with respect to noise, prior knowledge of data properties (e.g., statistical inde-
pendence, sparsity) may be incorporated into the constraints on factors so as to facilitate their physical interpretation, relax the uniqueness conditions, and even simplify computation [64]–[66]. Moreover, the orthogonality and nonnegativity constraints ensure the existence of the minimum of the optimization criterion used [63], [64], [67].

Applications. The CPD has already been established as an advanced tool for signal separation in vastly diverse branches of signal processing and data analysis, such as in audio and speech processing, biomedical engineering, chemometrics, and machine learning [7], [22], [23], [26]. Note that algebraic ICA algorithms are effectively based on the CPD of a tensor of the statistics of recordings; the statistical independence of the sources is reflected in the diagonality of the core tensor in Figure 2, that is, in vanishing cross-statistics [11], [12]. The CPD is also heavily used in exploratory data analysis, where the rank-1 terms capture essential properties of dynamically complex signals [8]. Another example is in wireless communication, where the signals transmitted by different users correspond to rank-1 terms in the case of line-of-sight propagation [17]. Also, in harmonic retrieval and direction of arrival type applications, real or complex exponentials have a rank-1 structure, for which the use of CPD is natural [36], [65].

Example 1. Consider a sensor array consisting of K displaced but otherwise identical subarrays of I sensors, with Ĩ = KI sensors in total. For R narrowband sources in the far field, the baseband equivalent model of the array output becomes X = A S^T + E, where A ∈ C^{Ĩ×R} is the global array response, S ∈ C^{J×R} contains J snapshots of the sources, and E is noise. A single source (R = 1) can be obtained from the best rank-1 approximation of the matrix X, however, for R > 1 the decomposition of X is not unique, and hence the separation of sources is not possible without incorporating additional information. Constraints on the sources that may yield a unique solution are, for instance, constant modulus or statistical independence [12], [68].

Consider a row-selection matrix J_k ∈ C^{I×Ĩ} that extracts the rows of X corresponding to the k-th subarray, k = 1, . . . , K. For two identical subarrays, the generalized EVD of the matrices J_1 X and J_2 X corresponds to the well-known ESPRIT [69]. For the case K > 2, we shall consider J_k X as slices of the tensor X ∈ C^{I×J×K} (see Section Tensorization). It can be shown that the signal part of X admits a CPD as in (3)–(4), with λ_1 = · · · = λ_R = 1, J_k A = B^(1) diag(b_{k1}^(3), . . . , b_{kR}^(3)) and B^(2) = S [17], and the consequent source separation under rather mild conditions — its uniqueness does not require constraints such as statistical independence or constant modulus. Moreover, the decomposition is unique even in cases when the number of sources R exceeds the number of subarray sensors I, or even the total number of sensors Ĩ. Notice that particular array geometries, such as linearly and uniformly displaced subarrays, can be converted into a constraint on CPD, yielding a further relaxation of the uniqueness conditions, reduced sensitivity to noise, and often faster computation [65].

TUCKER DECOMPOSITION

Figure 3 illustrates the principle of Tucker decomposition which treats a tensor X ∈ R^{I1×I2×···×IN} as a multilinear transformation of a (typically dense but small) core tensor G ∈ R^{R1×R2×···×RN} by the factor matrices B^(n) = [b_1^(n), b_2^(n), . . . , b_{Rn}^(n)] ∈ R^{In×Rn}, n = 1, 2, . . . , N [3], [4], given by

X = ∑_{r1=1}^{R1} ∑_{r2=1}^{R2} · · · ∑_{rN=1}^{RN} g_{r1 r2 ··· rN} b_{r1}^(1) ◦ b_{r2}^(2) ◦ · · · ◦ b_{rN}^(N)    (7)

or equivalently

X = G ×_1 B^(1) ×_2 B^(2) · · · ×_N B^(N) = ⟦G; B^(1), B^(2), . . . , B^(N)⟧.    (8)

Via the Kronecker products (see Table II) Tucker decomposition can be expressed in a matrix/vector form as:

X_(n) = B^(n) G_(n) (B^(N) ⊗ · · · ⊗ B^(n+1) ⊗ B^(n−1) ⊗ · · · ⊗ B^(1))^T
vec(X) = [B^(N) ⊗ B^(N−1) ⊗ · · · ⊗ B^(1)] vec(G).

Although Tucker initially used the orthogonality and ordering constraints on the core tensor and factor matrices [3], [4], we can also employ other meaningful constraints (see below).
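The multilinear product in (8) and its matricized form can be checked with a few lines of NumPy; the dimensions and random factors are assumptions for illustration, and the Kronecker factor ordering shown matches the C-ordered unfolding used in the code rather than any particular convention.

```python
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product T x_n M: contracts mode n of T with the columns of M."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, n)), 0, n)

rng = np.random.default_rng(3)
R1, R2, R3 = 2, 3, 2
I1, I2, I3 = 4, 5, 6
G = rng.standard_normal((R1, R2, R3))
B1, B2, B3 = rng.standard_normal((I1, R1)), rng.standard_normal((I2, R2)), rng.standard_normal((I3, R3))

# X = G x_1 B1 x_2 B2 x_3 B3, cf. (8)
X = mode_n_product(mode_n_product(mode_n_product(G, B1, 0), B2, 1), B3, 2)

# Matricized form: with C-ordered unfoldings, X_(1) = B1 G_(1) (B2 kron B3)^T.
X1, G1 = X.reshape(I1, -1), G.reshape(R1, -1)
print(np.allclose(X1, B1 @ G1 @ np.kron(B2, B3).T))   # -> True
```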
Multilinear rank. For a core tensor of minimal size, R1 is the column rank (the dimension of the subspace spanned by mode-1 fibers), R2 is the row rank (the dimension of the subspace spanned by mode-2 fibers), and so on. A remarkable difference from matrices is that the values of R1, R2, . . . , RN can be different for N ≥ 3.
...
Figure 4: Multiway Component Analysis (MWCA) for a third-order tensor, assuming that the components are: principal and orthogonal in the first mode, nonnegative and sparse in the second mode and statistically independent in the third mode.
… are zero. The columns of U_n may thus be seen as multilinear singular vectors, while the norms of the slices of the core are multilinear singular values [13]. As in the matrix case, the multilinear singular values govern the multilinear rank, while the multilinear singular vectors allow, for each mode separately, an interpretation as in PCA [8].

Low multilinear rank approximation. Analogous to PCA, a large-scale data tensor X can be approximated by discarding the multilinear singular vectors and slices of the core tensor that correspond to small multilinear singular values, that is, through truncated matrix SVDs. Low multilinear rank approximation is always well-posed; however, the truncation is not necessarily optimal in the LS sense, although a good estimate can often be made as the approximation error corresponds to the degree of truncation. When it comes to finding the best approximation, the ALS type algorithms exhibit similar advantages and drawbacks to those used for CPD [8], [70]. Optimization-based algorithms exploiting second-order information have also been proposed [71], [72].
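A compact sketch of this truncation — a truncated multilinear SVD computed from mode-n unfoldings; the data tensor and the target multilinear rank are illustrative assumptions.

```python
import numpy as np

def unfold(T, n):
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def mode_n_product(T, M, n):
    return np.moveaxis(np.tensordot(M, T, axes=(1, n)), 0, n)

def truncated_mlsvd(X, ranks):
    """Truncated multilinear SVD: B(n) = leading left singular vectors of the mode-n unfolding."""
    B = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r] for n, r in enumerate(ranks)]
    G = X
    for n, Bn in enumerate(B):          # core: G = X x_1 B1^T x_2 B2^T x_3 B3^T
        G = mode_n_product(G, Bn.T, n)
    return G, B

rng = np.random.default_rng(4)
X = rng.standard_normal((10, 12, 14))
G, B = truncated_mlsvd(X, ranks=(3, 3, 3))
X_hat = G
for n, Bn in enumerate(B):              # reconstruct the low multilinear rank approximation
    X_hat = mode_n_product(X_hat, Bn, n)
print(X_hat.shape, np.linalg.norm(X - X_hat) / np.linalg.norm(X))  # approximation error
```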
Constraints and Tucker-based multiway component analysis (MWCA). Besides orthogonality, constraints that may help to find unique basis vectors in a Tucker representation include statistical independence, sparsity, smoothness and nonnegativity [19], [73], [74]. Components of a data tensor seldom have the same properties in its modes, and for physically meaningful representation different constraints may be required in different modes, so as to match the properties of the data at hand. Figure 4 illustrates the concept of MWCA and its flexibility in choosing the mode-wise constraints; a Tucker representation of MWCA naturally accommodates such diversities in different modes.

Other applications. We have shown that Tucker decomposition may be considered as a multilinear extension of PCA [8]; it therefore generalizes signal subspace techniques, with applications including classification, feature extraction, and subspace-based harmonic retrieval [25], [39], [75], [76]. For instance, a low multilinear rank approximation achieved through Tucker decomposition may yield a higher Signal-to-Noise Ratio (SNR) than the SNR in the original raw data tensor, making Tucker decomposition a very natural tool for compression and signal enhancement [7], [8], [24].

BLOCK TERM DECOMPOSITIONS

We have already shown that CPD is unique under quite mild conditions; a further advantage of tensors over matrices is that it is even possible to relax the rank-1 constraint on the terms, thus opening completely new possibilities in e.g. BSS. For clarity, we shall consider the third-order case, whereby, by replacing the rank-1 matrices b_r^(1) ◦ b_r^(2) = b_r^(1) b_r^(2)T in (3) by low-rank matrices A_r B_r^T, the tensor X can be represented as (Figure 5, top):

X = ∑_{r=1}^{R} (A_r B_r^T) ◦ c_r.    (11)
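For concreteness, the terms in (11) can be generated as follows; the block sizes L_r (taken equal here), the number of terms and the random factors are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
I, J, K, R, L = 8, 9, 4, 2, 2          # R block terms, each of rank L in the first two modes

X = np.zeros((I, J, K))
for r in range(R):
    A_r = rng.standard_normal((I, L))
    B_r = rng.standard_normal((J, L))
    c_r = rng.standard_normal(K)
    X += np.einsum('ij,k->ijk', A_r @ B_r.T, c_r)   # (A_r B_r^T) outer c_r, cf. (11)

# Each frontal slice X[:, :, k] is a weighted sum of the low-rank matrices A_r B_r^T.
print(np.linalg.matrix_rank(X[:, :, 0]))            # at most R * L (here 4)
```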
Figure 5 (bottom) shows that we can even use terms that are only required to have a low
...
These so-called Block Term Decompositions (BTD) admit the modelling of more complex signal components than CPD, and are unique
...
duration correlated sources, BSS was performed
...
were contaminated by white Gaussian noise, to give the mixtures X = AS + E ∈ R^{5×60}, where S(t) = [s_1(t), s_2(t)]^T and A ∈ R^{5×2} was a random matrix whose columns (mixing vectors) satisfy a_1^T a_2 = 0.1, ‖a_1‖ = ‖a_2‖ = 1. The 3Hz sine wave did not complete a full period over the 60 samples, so that the two sources had a correlation degree of |s_1^T s_2| / (‖s_1‖_2 ‖s_2‖_2) = 0.35. The tensor approaches, CPD, Tucker decomposition and BTD employed a third-order tensor X of size 24 × 37 × 5 generated from five Hankel matrices whose elements obey X(i, j, k) = X(k, i + j − 1) (see Section Tensorization).
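The Hankelization used here maps each row of the mixture matrix to a Hankel slice; a sketch with the sizes quoted in the example (5 channels, 60 samples, slices of size 24 × 37) and random stand-in mixtures.

```python
import numpy as np

def hankelize_rows(M, I, J):
    """Map each row m of M (length >= I + J - 1) to a Hankel slice H[i, j] = m[i + j]."""
    K = M.shape[0]
    T = np.empty((I, J, K))
    for k in range(K):
        for i in range(I):
            T[i, :, k] = M[k, i:i + J]
    return T

# Assumed toy mixtures with the dimensions quoted in the example.
rng = np.random.default_rng(6)
mixtures = rng.standard_normal((5, 60))
T = hankelize_rows(mixtures, I=24, J=37)
print(T.shape)   # (24, 37, 5); T[i, j, k] = mixtures[k, i + j], the 0-based form of X(i, j, k) = X(k, i + j - 1)
```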
The average squared angular error (SAE) was used as the performance measure. Figure 6 shows the simulation results, illustrating that:
• PCA failed since the mixing vectors were not orthogonal and the source signals were correlated, both violating the assumptions for PCA.
...

Figure 6: Blind separation of the mixture of a pure sine wave and an exponentially modulated sine wave using PCA, ICA, CPD, Tucker decomposition (TKD) and BTD. The sources s_1 and s_2 are correlated and of short duration; the symbols ŝ_1 and ŝ_2 denote the estimated sources.

HIGHER-ORDER COMPRESSED SENSING

The aim of Compressed Sensing (CS) is to provide faithful reconstruction of a signal of interest when the set of available measurements is (much) smaller than the size of the original signal [80]–[83]. Formally, we have available M (compressive) data samples y ∈ R^M, which are assumed to be linear transformations of the original signal x ∈ R^I (M < I). In other words, y = Φx, where the sensing matrix Φ ∈ R^{M×I} is usually random. Since the projections are of a lower dimension than the original data, the reconstruction is an ill-posed inverse
...
Figure 10: Efficient computation of the CP and Tucker decompositions, whereby tensor decompositions are computed in parallel for sampled blocks, these are then merged to obtain the global components A, B, C and a core tensor G.
that can be represented by two parameters: the scaling factor a and the generator z (cf. (2) in Section Tensorization). Non-symmetric terms provide further opportunities, beyond the sum-of-exponential representation by symmetric low-rank tensors. Huge matrices and tensors may be dealt with in the same manner. For instance, an Nth-order tensor X ∈ R^{I1×···×IN}, with I_n = q^{Ln}, can be quantized in all modes simultaneously to yield a (q × q × · · · × q) quantized tensor of higher order. In QTN, q is small, typically q = 2, 3, 4; for example, the binary encoding (q = 2) reshapes an Nth-order tensor with (2^{L1} × 2^{L2} × · · · × 2^{LN}) elements into a tensor of order (L1 + L2 + · · · + LN) with the same number of elements. The tensor train decomposition applied to quantized tensors is referred to as the quantized TT (QTT); variants for other tensor representations have also been derived [27], [28]. In scientific computing, such formats provide the so-called super-compression — a logarithmic reduction of storage requirements: O(I^N) → O(N log_q(I)).
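The quantization step itself is just a reshape; a sketch for the binary case q = 2, with an assumed vector length of 2^L.

```python
import numpy as np

L = 10
x = np.random.default_rng(7).standard_normal(2 ** L)   # a long vector with 2**L entries

# Binary quantization: reshape the length-2**L vector into an L-th order (2 x 2 x ... x 2) tensor.
X_q = x.reshape((2,) * L)
print(X_q.shape, X_q.size == x.size)                   # (2, 2, ..., 2)  True

# The same idea applies mode-wise to matrices/tensors whose mode sizes are I_n = q**L_n.
```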
Computation of the decomposition/representation. Now that we have addressed the possibilities for efficient tensor representation, the question that needs to be answered is how these representations can be computed from the data in an efficient manner. The first approach is to process the data in smaller blocks rather than in a batch manner [95]. In such a “divide-and-conquer” approach, different blocks may be processed in parallel and their decompositions carefully recombined (see Figure 10) [95], [96]. In fact, we may even compute the decomposition through recursive updating, as new data arrive [97]. Such recursive techniques may be used for efficient computation and for tracking decompositions in the case of nonstationary data.

The second approach would be to employ compressed sensing ideas (see Section Higher-Order Compressed Sensing) to fit an algebraic model with a limited number of parameters to possibly large data. In addition to completion, the goal here is a significant reduction of the cost of data acquisition, manipulation and storage — breaking the Curse of Dimensionality being an extreme case.

While algorithms for this purpose are available both for low rank and low multilinear rank representation [59], [87], an even more drastic approach would be to directly adopt sampled fibers as the bases in a tensor representation. In the Tucker decomposition setting we would choose the columns of the factor matrices B^(n) as mode-n fibers of the tensor, which requires addressing the following two problems: (i) how to find fibers that allow us to best represent the tensor, and (ii) how to compute the corresponding core tensor at a low cost (i.e., with minimal access to the data). The matrix counterpart of this problem (i.e., representation of a large matrix on the basis of a few columns and rows) is referred to as the pseudoskeleton approximation [98], where the optimal representation corresponds to the columns and rows that intersect in the submatrix of maximal volume (maximal absolute value of the determinant). Finding the optimal submatrix is computationally hard, but quasi-optimal submatrices may be found by heuristic so-called “cross-approximation” methods that only require a limited, partial exploration of the data matrix.
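A greedy cross-approximation sketch for the matrix case — a simplified residual-pivoting rule rather than the maximal-volume algorithm itself; the sizes and the exactly low-rank test matrix are assumptions for illustration.

```python
import numpy as np

def cross_approximation(M, rank):
    """Greedy skeleton/CUR-style approximation built from a few rows and columns of M."""
    R = M.copy()
    approx = np.zeros_like(M)
    for _ in range(rank):
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)   # pivot = largest residual entry
        if R[i, j] == 0:
            break
        update = np.outer(R[:, j], R[i, :]) / R[i, j]            # rank-1 cross from row i and column j
        approx += update
        R -= update
    return approx

rng = np.random.default_rng(8)
M = rng.standard_normal((200, 6)) @ rng.standard_normal((6, 150))   # exactly rank-6 matrix
M_hat = cross_approximation(M, rank=6)
print(np.linalg.norm(M - M_hat) / np.linalg.norm(M))                # ~0 for an exactly low-rank matrix
```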
Tucker variants of this approach have
...
Figure 11: Tucker representation through fiber sampling and cross-approximation: the columns of factor ...
...
X(I × J × K) and Y(I × M × N) can be flattened into long matrices X(I × JK) and Y(I × MN), so as to admit matrix-PLS (see Figure 12). However, the flattening prior to standard bilinear PLS obscures structure in multiway data and compromises the interpretation of latent components.
2) By low rank tensor approximation. The so-called N-PLS attempts to find score vectors having maximal covariance with response variables, under the constraints that tensors X and Y are decomposed as a sum of rank-one tensors [104].
3) By a BTD-type approximation, as in the Higher Order PLS (HOPLS) model shown in Figure 13 [105]. The use of block terms within HOPLS equips it with additional flexibility, together with a more realistic analysis than unfolding-PLS and N-PLS.
The principle of HOPLS can be formalized as a set of sequential approximate decompositions of the independent tensor X ∈ R^{I1×I2×···×IN} and the dependent tensor Y ∈ R^{J1×J2×···×JM} (with I1 = J1), so as to ensure maximum similarity (correlation) between the scores t_r and u_r within the loadings matrices T and U, based on

X ≅ ∑_{r=1}^{R} G_X^(r) ×_1 t_r ×_2 P_r^(1) · · · ×_N P_r^(N−1)    (17)
Y ≅ ∑_{r=1}^{R} G_Y^(r) ×_1 u_r ×_2 Q_r^(1) · · · ×_M Q_r^(M−1).    (18)

A number of data-analytic problems can be reformulated as either regression or “similarity analysis” (ANOVA, ARMA, LDA, CCA), so that both the matrix and tensor PLS solutions can be generalized across exploratory data analysis.

Figure 13: The principle of Higher Order PLS (HOPLS) for third-order tensors. The core tensors G_X and G_Y are block-diagonal. The BTD-type structure allows for the modelling of general components that are highly correlated in the first mode.
Example 4: Decoding of a 3D hand movement trajectory from the electrocorticogram (ECoG). The predictive power of tensor-based PLS is illustrated on a real-world example of the prediction of arm movement trajectory from ECoG. Fig. 14(left) illustrates the experimental setup, whereby 3D arm movement of a monkey was captured by an optical motion capture system with reflective markers affixed to the left shoulder, elbow, wrist, and hand; for full detail see (http://neurotycho.org). The predictors (32 ECoG channels) naturally build a fourth-order tensor X (time×channel no×epoch length×frequency) while the movement trajectories for the four markers (response) can be represented as a third-order tensor Y (time×3D marker position×marker no). The goal of the training stage is to identify the HOPLS parameters: G_X^(r), G_Y^(r), P_r^(n), Q_r^(n), see also Figure 13. In the test stage, the movement trajectories, Y*, for the new ECoG data, X*, are predicted through multilinear projections: (i) the new scores, t_r*, are found from new data, X*, and the existing model parameters: G_X^(r), P_r^(1), P_r^(2), P_r^(3); (ii) the predicted trajectory is calculated as Y* ≈ ∑_{r=1}^{R} G_Y^(r) ×_1 t_r* ×_2 Q_r^(1) ×_3 Q_r^(2) ×_4 Q_r^(3). In the simulations, standard PLS was applied in the same way to the unfolded tensors.
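The prediction step (ii) is a sum of multilinear products; a schematic NumPy version with assumed small dimensions and random stand-ins for the trained parameters G_Y^(r), t_r* and Q_r^(n).

```python
import numpy as np

def mode_n_product(T, M, n):
    return np.moveaxis(np.tensordot(M, T, axes=(1, n)), 0, n)

rng = np.random.default_rng(9)
R, I1 = 3, 20                      # number of latent components, number of test samples
L2, L3, L4 = 2, 2, 2               # assumed core sizes in modes 2-4 of Y
J2, J3, J4 = 30, 3, 4              # e.g. time x 3D coordinate x marker number

Y_pred = np.zeros((I1, J2, J3, J4))
for r in range(R):
    G_Y = rng.standard_normal((1, L2, L3, L4))              # stand-in for the trained core G_Y^(r)
    t_new = rng.standard_normal((I1, 1))                     # new score vector t_r* (as a column)
    Q1, Q2, Q3 = (rng.standard_normal((J, L)) for J, L in ((J2, L2), (J3, L3), (J4, L4)))
    term = mode_n_product(G_Y, t_new, 0)                     # x_1 t_r*
    for n, Q in enumerate((Q1, Q2, Q3), start=1):
        term = mode_n_product(term, Q, n)                    # x_2 Q_r^(1) x_3 Q_r^(2) x_4 Q_r^(3)
    Y_pred += term
print(Y_pred.shape)                                          # (I1, J2, J3, J4)
```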
Figure 14: Prediction of arm movement from brain electrical responses. Left: Experiment setup. Middle: Construction of the data and response tensors and training. Right: The new data tensor (bottom) and the predicted 3D arm movement trajectories (X, Y, Z coordinates) obtained by tensor-based HOPLS and standard matrix-based PLS (top).

Figure 14(right) shows that although the standard PLS was able to predict the movement corresponding to each marker individually, such prediction is quite crude as the two-way PLS does not adequately account for mutual information among the four markers. The enhanced predictive performance of the BTD-based HOPLS (red line in Fig. 14(right)) is therefore attributed to its ability to model interactions between complex latent components of both predictors and responses.
...
The linked multiway component analysis (LMWCA) [106], shown in Figure 15, performs such decomposition into shared and individual factors, and is formulated as a set of approximate joint Tucker decompositions of a set of data tensors X^(k) ∈ R^{I1×I2×···×IN}, (k = 1, 2, . . . , K):

X^(k) ≅ G^(k) ×_1 B^(1,k) ×_2 B^(2,k) · · · ×_N B^(N,k),    (19)
...
Figure 15: Coupled Tucker decomposition for linked multiway component analysis (LMWCA). The data tensors have both shared and individual components. Constraints such as orthogonality, statistical independence, sparsity and non-negativity may be imposed where appropriate.

Classification based on LMWCA: find the class whose common features best match the test sample (comparison of LWCA with KNN-PCA and LDA-PCA on sample images from different and same categories).
...vanced algorithms for CPD, nonnegative Tucker decomposition and MWCA [112], [113].
• The Tensorlab toolbox builds upon the complex optimization framework and offers numerical algorithms for computing the CPD, BTD and Tucker decompositions. The toolbox includes a library of constraints (e.g. nonnegativity, orthogonality) and the possibility to combine and jointly factorize dense, sparse and incomplete tensors [89].
• The N-Way Toolbox, which includes (constrained) CPD, Tucker decomposition and PLS in the context of chemometrics applications [114]. Many of these methods can handle constraints (e.g., nonnegativity, orthogonality) and missing elements.
• The TT Toolbox, the Hierarchical Tucker Toolbox and the Tensor Calculus library provide tensor tools for scientific computing [115]–[117].
• Code developed for multiway analysis is also available from the Three-Mode Company [118].

CONCLUSIONS AND FUTURE DIRECTIONS

We live in a world overwhelmed by data, from multiple pictures of Big Ben on various social web links to terabytes of data in multiview medical imaging, while we may need to repeat the scientific experiments many times to obtain ground truth. Each snapshot gives us a somewhat incomplete view of the same object, and involves different angles, illumination, lighting conditions, facial expressions, and noise.

We have cast a light on tensor decompositions as a perfect match for exploratory analysis of such multifaceted data sets, and have illustrated their applications in multi-sensor and multi-modal signal processing. Our emphasis has been to show that tensor decompositions and multilinear algebra open completely new possibilities for component analysis, as compared with the “flat view” of standard two-way methods.

Unlike matrices, tensors are multiway arrays of data samples whose representations are typically overdetermined (fewer parameters in the decomposition than the number of data entries). This gives us an enormous flexibility in finding hidden components in data and the ability to enhance both robustness to noise and tolerance to missing data samples and faulty sensors. We have also discussed multilinear variants of several standard signal processing tools such as multilinear SVD, ICA, NMF and PLS, and have shown that tensor methods can operate in a deterministic way on signals of very short duration.

At present the uniqueness conditions of standard tensor models are relatively well understood and efficient computation algorithms do exist; however, for future applications several challenging problems remain to be addressed in more depth:
• A whole new area emerges when several decompositions which operate on different datasets are coupled, as in multiview data where some details of interest are visible in only one mode. Such techniques need theoretical support in terms of existence, uniqueness, and numerical properties.
• As the complexity of advanced models increases, their computation requires efficient iterative algorithms, extending beyond the ALS class.
• Estimation of the number of components in data, and the assessment of their dimensionality would benefit from automation, especially in the presence of noise and outliers.
• Both new theory and algorithms are needed to further extend the flexibility of tensor models, e.g., for the constraints to be combined in many ways, and tailored to the particular signal properties in different modes.
• Work on efficient techniques for saving and/or fast processing of ultra large-scale tensors is urgent; these now routinely occupy tera-bytes, and will soon require peta-bytes of memory.
• Tools for rigorous performance analysis and rule of thumb performance bounds need to be further developed across tensor decomposition models.
• Our discussion has been limited to tensor models in which all entries take values independently of one another. Probabilistic versions of tensor decompositions incorporate prior knowledge about complex variable interaction, various data alphabets, or noise distributions, and so promise to model data more accurately and efficiently [119], [120].

It is fitting to conclude with a quote from Marcel Proust: “The voyage of discovery is not in
seeking new landscapes but in having new eyes”. We hope to have helped to bring to the eyes of the Signal Processing Community the multidisciplinary developments in tensor decompositions, and to have shared our enthusiasm about tensors as powerful tools to discover new landscapes. The future computational, visualization and interpretation tools will be important next steps in supporting the different communities working on large-scale and big data analysis problems.

BIOGRAPHICAL NOTES

Andrzej Cichocki received the Ph.D. and Dr.Sc. (habilitation) degrees, all in electrical engineering, from the Warsaw University of Technology (Poland). He is currently a Senior Team Leader of the Laboratory for Advanced Brain Signal Processing at RIKEN Brain Science Institute (Japan) and Professor at the Systems Research Institute, Polish Academy of Science (Poland). He has authored more than 400 publications and 4 monographs in the areas of signal processing and computational neuroscience. He serves as Associate Editor for the IEEE Transactions on Signal Processing and Journal of Neuroscience Methods.

Danilo P. Mandic is a Professor of signal processing at Imperial College London, London, U.K. and has been working in the area of nonlinear and multidimensional adaptive signal processing and time-frequency analysis. His publication record includes two research monographs titled Recurrent Neural Networks for Prediction (West Sussex, U.K.: Wiley, August 2001) and Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models, an edited book titled Signal Processing for Information Fusion, and more than 200 publications on signal and image processing.

Anh Huy Phan received the Ph.D. degree from the Kita Kyushu Institute of Technology, Japan in 2011. He worked as Deputy Head of the Research and Development Department, Broadcast Research and Application Center, Vietnam Television, and is currently a Research Scientist at the Laboratory for Advanced Brain Signal Processing, and a Visiting Research Scientist with the Toyota Collaboration Center, Brain Science Institute, RIKEN. He has served on the Editorial Board of the International Journal of Computational Mathematics. His research interests include multilinear algebra, tensor computation, blind source separation, and brain computer interface.

Cesar F. Caiafa received the Ph.D. degree in engineering from the Faculty of Engineering, University of Buenos Aires, in 2007. He is currently Adjunct Researcher with the Argentinean Radioastronomy Institute (IAR) - CONICET and Assistant Professor with the Faculty of Engineering, University of Buenos Aires. He is also Visiting Scientist at the Lab. for Advanced Brain Signal Processing, BSI - RIKEN, Japan.

Guoxu Zhou received his Ph.D. degree in intelligent signal and information processing from South China University of Technology, Guangzhou, China, in 2010. He is currently a Research Scientist of the Laboratory for Advanced Brain Signal Processing, at RIKEN Brain Science Institute, Japan. His research interests include statistical signal processing, tensor analysis, intelligent information processing, and machine learning.

Qibin Zhao received his Ph.D. degree from the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China, in 2009. He is currently a research scientist at the Laboratory for Advanced Brain Signal Processing in RIKEN Brain Science Institute, Japan and a visiting research scientist in the BSI TOYOTA Collaboration Center, RIKEN-BSI. His research interests include multiway data analysis, brain computer interface and machine learning.

Lieven De Lathauwer received the Ph.D. degree from the Faculty of Engineering, KU Leuven, Belgium, in 1997. From 2000 to 2007 he was Research Associate with the Centre National de la Recherche Scientifique, France. He is currently Professor with KU Leuven. He is affiliated with both the Group Science, Engineering and Technology of Kulak, with the Stadius Center for Dynamical Systems, Signal Processing and Data Analytics of the Electrical Engineering Department (ESAT) and with iMinds Future Health Department. He is Associate Editor of the SIAM Journal on Matrix Analysis and Applications and has served as Associate Editor for the IEEE Transactions on Signal Processing. His research concerns the development of tensor tools for engineering applications.

REFERENCES

[1] F. L. Hitchcock, “Multiple invariants and generalized rank of a p-way matrix or tensor,” Journal of Mathematics and Physics, vol. 7, pp. 39–79, 1927.
[2] R. Cattell, “Parallel proportional profiles and other principles for determining the choice of factors by rotation,” Psychometrika, vol. 9, pp. 267–283, 1944.
[3] L. R. Tucker, “The extension of factor analysis to three-dimensional matrices,” in Contributions to Mathematical Psychology, H. Gulliksen and N. Frederiksen, Eds. New York: Holt, Rinehart and Winston, 1964, pp. 110–127.
[4] ——, “Some mathematical notes on three-mode factor analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, September 1966.
[5] J. Carroll and J.-J. Chang, “Analysis of individual differences in multidimensional scaling via an n-way generalization of ’Eckart-Young’ decomposition,” Psychometrika, vol. 35, no. 3, pp. 283–319, September 1970.
[6] R. A. Harshman, “Foundations of the PARAFAC procedure: Models and conditions for an explanatory multimodal factor analysis,” UCLA Working Papers in Phonetics, vol. 16, pp. 1–84, 1970.
[7] A. Smilde, R. Bro, and P. Geladi, Multi-way Analysis: Applications in the Chemical Sciences. New York: John Wiley & Sons Ltd, 2004.
[8] P. Kroonenberg, Applied Multiway Data Analysis. New York: John Wiley & Sons Ltd, 2008.
[9] C. Nikias and A. Petropulu, Higher-Order Spectra Analysis: A Nonlinear Signal Processing Framework. Prentice Hall, 1993.
[10] J.-F. Cardoso and A. Souloumiac, “Blind beamforming for non-Gaussian signals,” in IEE Proceedings F (Radar and Signal Processing), vol. 140, no. 6. IET, 1993, pp. 362–370.
[11] P. Comon, “Independent component analysis, a new concept?” Signal Processing, vol. 36, no. 3, pp. 287–314, 1994.
[12] P. Comon and C. Jutten, Eds., Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, 2010.
[13] L. De Lathauwer, B. De Moor, and J. Vandewalle, “A multilinear singular value decomposition,” SIAM Journal of Matrix Analysis and Applications, vol. 24, pp. 1253–1278, 2000.
[14] G. Beylkin and M. Mohlenkamp, “Algorithms for numerical analysis in high dimensions,” SIAM J. Scientific Computing, vol. 26, no. 6, pp. 2133–2159, 2005.
[15] J. Ballani, L. Grasedyck, and M. Kluge, “Black box approximation of tensors in hierarchical Tucker format,” Linear Algebra and its Applications, vol. 438, no. 2, pp. 639–657, 2013.
[16] I. V. Oseledets, “Tensor-train decomposition,” SIAM J. Scientific Computing, vol. 33, no. 5, pp. 2295–2317, 2011.
[17] N. Sidiropoulos, R. Bro, and G. Giannakis, “Parallel factor analysis in sensor array processing,” IEEE Transactions on Signal Processing, vol. 48, no. 8, pp. 2377–2388, 2000.
[18] N. Sidiropoulos, G. Giannakis, and R. Bro, “Blind PARAFAC receivers for DS-CDMA systems,” IEEE Transactions on Signal Processing, vol. 48, no. 3, pp. 810–823, 2000.
[19] A. Cichocki, R. Zdunek, A.-H. Phan, and S. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Chichester: Wiley, 2009.
[20] J. Landsberg, Tensors: Geometry and Applications. AMS, 2012.
[21] W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, ser. Springer series in computational mathematics. Heidelberg: Springer, 2012, vol. 42.
[22] E. Acar and B. Yener, “Unsupervised multiway data analysis: A literature survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, pp. 6–20, 2009.
[23] T. Kolda and B. Bader, “Tensor decompositions and applications,” SIAM Review, vol. 51, no. 3, pp. 455–500, September 2009.
[24] P. Comon, X. Luciani, and A. L. F. de Almeida, “Tensor decompositions, Alternating Least Squares and other Tales,” Jour. Chemometrics, vol. 23, pp. 393–405, 2009.
[25] H. Lu, K. Plataniotis, and A. Venetsanopoulos, “A survey of multilinear subspace learning for tensor data,” Pattern Recognition, vol. 44, no. 7, pp. 1540–1551, 2011.
[26] M. Mørup, “Applications of tensor (multiway array) factorizations and decompositions in data mining,” Wiley Interdisc. Rew.: Data Mining and Knowledge Discovery, vol. 1, no. 1, pp. 24–40, 2011.
[27] B. Khoromskij, “Tensors-structured numerical methods in scientific computing: Survey on recent advances,” Chemometrics and Intelligent Laboratory Systems, vol. 110, no. 1, pp. 1–19, 2011.
[28] L. Grasedyck, D. Kessner, and C. Tobler, “A literature survey of low-rank tensor approximation techniques,” CGAMM-Mitteilungen, vol. 36, pp. 53–78, 2013.
[29] P. Comon, “Tensors: A brief survey,” IEEE Signal Processing Magazine, p. (accepted), 2014.
[30] A. Bruckstein, D. Donoho, and M. Elad, “From sparse solutions of systems of equations to sparse modeling of signals and images,” SIAM Review, vol. 51, no. 1, pp. 34–81, 2009.
[31] J. Kruskal, “Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics,” Linear Algebra and its Applications, vol. 18, no. 2, pp. 95–138, 1977.
[32] I. Domanov and L. De Lathauwer, “On the uniqueness of the canonical polyadic decomposition of third-order tensors — part i: Basic results and uniqueness of one factor matrix and part ii: Uniqueness of the overall decomposition,” SIAM J. Matrix Anal. Appl., vol. 34, no. 3, pp. 855–903, 2013.
[33] A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing. John Wiley, Chichester, 2003.
[34] A. Hyvärinen, “Independent component analysis: recent advances,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 371, no. 1984, 2013.
[35] M. Elad, P. Milanfar, and G. H. Golub, “Shape from moments – an estimation theory perspective,” Signal Processing, IEEE Transactions on, vol. 52, no. 7, pp. 1814–1829, 2004.
[36] N. Sidiropoulos, “Generalizing Caratheodory’s uniqueness of harmonic parameterization to N dimensions,” IEEE Trans. Information Theory, vol. 47, no. 4, pp. 1687–1690, 2001.
[37] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and É. Moulines, “A blind source separation technique using second-order statistics,” IEEE Trans. Signal Processing, vol. 45, no. 2, pp. 434–444, 1997.
[38] F. Miwakeichi, E. Martínez-Montes, P. Valdés-Sosa, N. Nishiyama, H. Mizuhara, and Y. Yamaguchi, “Decomposing EEG data into space−time−frequency components using parallel factor analysis,” NeuroImage, vol. 22, no. 3, pp. 1035–1045, 2004.
[39] M. Vasilescu and D. Terzopoulos, “Multilinear analysis of image ensembles: Tensorfaces,” in Proc. European Conf. on Computer Vision (ECCV), vol. 2350, Copenhagen, Denmark, May 2002, pp. 447–460.
[40] M. Hirsch, D. Lanman, G. Wetzstein, and R. Raskar, “Tensor displays,” in Int. Conf. on Computer Graphics and Interactive Techniques, SIGGRAPH 2012, Los Angeles, CA, USA, Aug. 5-9, 2012, Emerging Technologies Proceedings, 2012, pp. 24–42.
[41] J. Hastad, “Tensor rank is NP-complete,” Journal of Algorithms, vol. 11, no. 4, pp. 644–654, 1990.
[42] M. Timmerman and H. Kiers, “Three mode principal components analysis: Choosing the numbers of components and sensitivity to local optima,” British Journal of Mathematical and Statistical Psychology, vol. 53, no. 1, pp. 1–16, 2000.
[43] E. Ceulemans and H. Kiers, “Selecting among three-mode principal component models of different types and complexities: A numerical convex-hull based method,” British Journal of Mathematical and Statistical Psychology, vol. 59, no. 1, pp. 133–150, May 2006.
[44] M. Mørup and L. K. Hansen, “Automatic relevance determination for multiway models,” Journal of Chemometrics, Special Issue: In Honor of Professor Richard A. Harshman, vol. 23, no. 7-8, pp. 352–363, 2009. [Online]. Available: http://www2.imm.dtu.dk/pubdb/p.php?5806
[45] N. Sidiropoulos and R. Bro, “On the uniqueness of multilinear decomposition of N-way arrays,” J. Chemometrics, vol. 14, no. 3, pp. 229–239, 2000.
[46] T. Jiang and N. D. Sidiropoulos, “Kruskal’s permutation lemma and the identification of CANDECOMP/PARAFAC and bilinear models,” IEEE Trans. Signal Processing, vol. 52, no. 9, pp. 2625–2636, 2004.
[47] L. De Lathauwer, “A link between the canonical decomposition in multilinear algebra and simultaneous matrix diagonalization,” SIAM J. Matrix Analysis Applications, vol. 28, no. 3, pp. 642–666, 2006.
[48] A. Stegeman, “On uniqueness conditions for Candecomp/Parafac and Indscal with full column rank in one mode,” Linear Algebra and its Applications, vol. 431, no. 1–2, pp. 211–227, 2009.
[49] E. Sanchez and B. Kowalski, “Tensorial resolution: a direct trilinear decomposition,” J. Chemometrics, vol. 4, pp. 29–45, 1990.
[50] I. Domanov and L. De Lathauwer, “Canonical polyadic decomposition of third-order tensors: Reduction to generalized eigenvalue decomposition,” ESAT, KU Leuven, ESAT-SISTA Internal Report 13-36, 2013.
[51] S. Vorobyov, Y. Rong, N. Sidiropoulos, and A. Gershman, “Robust iterative fitting of multilinear models,” IEEE Transactions Signal Processing, vol. 53, no. 8, pp. 2678–2689, 2005.
[52] X. Liu and N. Sidiropoulos, “Cramer-Rao lower bounds for low-rank decomposition of multidimensional arrays,” IEEE Trans. on Signal Processing, vol. 49, no. 9, pp. 2074–2086, Sep. 2001.
[53] P. Tichavsky, A. Phan, and Z. Koldovsky, “Cramér-Rao-induced bounds for CANDECOMP/PARAFAC tensor decomposition,” IEEE Transactions on Signal Processing, vol. 61, no. 8, pp. 1986–1997, 2013.
[54] B. Chen, S. He, Z. Li, and S. Zhang, “Maximum block improvement and polynomial optimization,” SIAM Journal on Optimization, vol. 22, no. 1, pp. 87–107, 2012.
[55] A. Uschmajew, “Local convergence of the alternating least squares algorithm for canonical tensor approximation,” SIAM J. Matrix Anal. Appl., vol. 33, no. 2, pp. 639–652, 2012.
[56] M. J. Mohlenkamp, “Musings on multilinear fitting,” Linear Algebra and its Applications, vol. 438, no. 2, pp. 834–852, 2013.
[57] M. Razaviyayn, M. Hong, and Z.-Q. Luo, “A unified convergence analysis of block successive minimization methods for nonsmooth optimization,” SIAM Journal on Optimization, vol. 23, no. 2, pp. 1126–1153, 2013.
[58] P. Paatero, “The multilinear engine: A table-driven least squares program for solving multilinear problems, including the n-way parallel factor analysis model,” Journal of Computational and Graphical Statistics, vol. 8, no. 4, pp. 854–888, Dec. 1999.
[59] E. Acar, D. Dunlavy, T. Kolda, and M. Mørup, “Scalable tensor factorizations for incomplete data,” Chemometrics and Intelligent Laboratory Systems, vol. 106 (1), pp. 41–56, 2011. [Online]. Available: http://www2.imm.dtu.dk/pubdb/p.php?5923
[60] A.-H. Phan, P. Tichavsky, and A. Cichocki, “Low complexity Damped Gauss-Newton algorithms for CANDECOMP/PARAFAC,” SIAM Journal on Matrix Analysis and Applications (SIMAX), vol. 34, no. 1, pp. 126–147, 2013.
[61] L. Sorber, M. Van Barel, and L. De Lathauwer, “Optimization-based algorithms for tensor decompositions: Canonical Polyadic Decomposition, decomposition in rank-(Lr, Lr, 1) terms and a new generalization,” SIAM J. Optimization, vol. 23, no. 2, 2013.
[62] V. de Silva and L.-H. Lim, “Tensor rank and the ill-posedness of the best low-rank approximation problem,” SIAM J. Matrix Anal. Appl., vol. 30, pp. 1084–1127, September 2008.
[63] W. Krijnen, T. Dijkstra, and A. Stegeman, “On the non-existence of optimal solutions and the occurrence of “degeneracy” in the Candecomp/Parafac model,” Psychometrika, vol. 73, pp. 431–439, 2008.
[64] M. Sørensen, L. De Lathauwer, P. Comon, S. Icart, and L. Deneire, “Canonical Polyadic Decomposition with orthogonality constraints,” SIAM J. Matrix Anal. Appl., vol. 33, no. 4, pp. 1190–1213, 2012.
[65] M. Sørensen and L. De Lathauwer, “Blind signal separation via tensor decomposition with Vandermonde factor: Canonical polyadic decomposition,” IEEE Trans. Signal Processing, vol. 61, no. 22, pp. 5507–5519, Nov. 2013.
[66] G. Zhou and A. Cichocki, “Canonical Polyadic Decomposition based on a single mode blind source separation,” IEEE Signal Processing Letters, vol. 19, no. 8, pp. 523–526, 2012.
[67] L.-H. Lim and P. Comon, “Nonnegative approximations of nonnegative tensors,” Journal of Chemometrics, vol. 23, no. 7-8, pp. 432–441, 2009.
[68] A. van der Veen and A. Paulraj, “An analytical constant modulus algorithm,” IEEE Transactions Signal Processing, vol. 44, pp. 1136–1155, 1996.
[69] R. Roy and T. Kailath, “ESPRIT – estimation of signal parameters via rotational invariance techniques,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 37, no. 7, pp. 984–995, 1989.
[70] L. De Lathauwer, B. De Moor, and J. Vandewalle, “On the best rank-1 and rank-(R1, R2, . . . , RN) approximation of higher-order tensors,” SIAM Journal of Matrix Analysis and Applications, vol. 21, no. 4, pp. 1324–1342, 2000.
[71] B. Savas and L.-H. Lim, “Quasi-Newton methods on Grassmannians and multilinear approximations of tensors,” SIAM J. Scientific Computing, vol. 32, no. 6, pp. 3352–3393, 2010.
[72] M. Ishteva, P.-A. Absil, S. Van Huffel, and L. De Lathauwer, “Best low multilinear rank approximation of higher-order tensors, based on the Riemannian trust-region scheme,” SIAM J. Matrix Analysis Applications, vol. 32, no. 1, pp. 115–135, 2011.
[73] G. Zhou and A. Cichocki, “Fast and unique Tucker decompositions via multiway blind source separation,” Bulletin of Polish Academy of Science, vol. 60, no. 3, pp. 389–407, 2012.
[74] A. Cichocki, “Generalized Component Analysis and Blind Source Separation Methods for Analyzing Multichannel Brain Signals,” in Statistical and Process Models for Cognitive Neuroscience and Aging. Lawrence Erlbaum Associates, 2007, pp. 201–272.
[75] M. Haardt, F. Roemer, and G. D. Galdo, “Higher-order SVD based subspace estimation to improve the parameter estimation accuracy in multi-dimensional harmonic retrieval problems,” IEEE Trans. Signal Processing, vol. 56, pp. 3198–3213, Jul. 2008.
[76] A. Phan and A. Cichocki, “Tensor decompositions for feature extraction and classification of high dimensional datasets,” Nonlinear Theory and Its Applications, IEICE, vol. 1, no. 1, pp. 37–68, 2010.
[77] L. De Lathauwer, “Decompositions of a higher-order tensor in block terms – Part I and II,” SIAM Journal on Matrix Analysis and Applications (SIMAX), vol. 30, no. 3, pp. 1022–1066, 2008, Special Issue on Tensor Decompositions and Applications. [Online]. Available: http://publi-etis.ensea.fr/2008/De08e
[78] L. De Lathauwer, “Blind separation of exponential polynomials and the decomposition of a tensor in rank-(Lr, Lr, 1) terms,” SIAM J. Matrix Analysis Applications, vol. 32, no. 4, pp. 1451–1474, 2011.
[79] L. De Lathauwer, “Block component analysis, a new concept for blind source separation,” in Proc. 10th International Conf. LVA/ICA, Tel Aviv, March 12-15, 2012, pp. 1–8.
[80] E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
[81] E. J. Candes and T. Tao, “Near-optimal signal recovery from random projections: Universal encoding strategies?” Information Theory, IEEE Transactions on, vol. 52, no. 12, pp. 5406–5425, 2006.
[82] D. L. Donoho, “Compressed sensing,” Information Theory, IEEE Transactions on, vol. 52, no. 4, pp. 1289–1306, 2006.
[83] Y. Eldar and G. Kutyniok, “Compressed Sensing: Theory and Applications,” New York: Cambridge Univ. Press, vol. 20, p. 12, 2012.
[84] M. F. Duarte and R. G. Baraniuk, “Kronecker compressive sensing,” IEEE Transactions on Image Processing, vol. 21, no. 2, pp. 494–504, 2012.
[85] C. Caiafa and A. Cichocki, “Computing sparse representations of multidimensional signals using Kronecker bases,” Neural Computation, vol. 25, no. 1, pp. 186–220, 2013.
[86] ——, “Multidimensional compressed sensing and their applications,” WIREs Data Mining and Knowledge Discovery, 2013 (accepted).
[87] S. Gandy, B. Recht, and I. Yamada, “Tensor completion and low-n-rank tensor recovery via convex optimization,” Inverse Problems, vol. 27, no. 2, 2011.
[88] M. Signoretto, Q. T. Dinh, L. De Lathauwer, and J. A. Suykens, “Learning with tensors: A framework based on convex optimization and spectral regularization,” Machine Learning, pp. 1–49, 2013.
[89] L. Sorber, M. Van Barel, and L. De Lathauwer, “Tensorlab v1.0,” Feb. 2013. [Online]. Available: http://esat.kuleuven.be/sista/tensorlab/
[90] N. Sidiropoulos and A. Kyrillidis, “Multi-way compressed sensing for sparse low-rank tensors,” IEEE Signal Processing Letters, vol. 19, no. 11, pp. 757–760, 2012.
[91] D. Foster, K. Amano, S. Nascimento, and M. Foster, “Frequency of metamerism in natural scenes,” Journal of the Optical Society of America A, vol. 23, no. 10, pp. 2359–2372, 2006.
[92] A. Cichocki, “Era of big data processing: A new approach via tensor networks and tensor decompositions (invited talk),” in Proc. Int. Workshop on Smart Info-Media Systems in Asia (SISA 2013), Nagoya, Japan, Sept. 30 – Oct. 2, 2013.
[93] R. Orus, “A Practical Introduction to Tensor Networks: Matrix Product States and Projected Entangled Pair States,” The Journal of Chemical Physics, 2013.
[94] J. Salmi, A. Richter, and V. Koivunen, “Sequential unfolding SVD for tensors with applications in array signal processing,” IEEE Transactions on Signal Processing, vol. 57, pp. 4719–4733, 2009.
[95] A.-H. Phan and A. Cichocki, “PARAFAC algorithms for large-scale problems,” Neurocomputing, vol. 74, no. 11, pp. 1970–1984, 2011.
[96] S. K. Suter, M. Makhynia, and R. Pajarola, “TAMRESH: Tensor approximation multiresolution hierarchy for interactive volume visualization,” Comput. Graph. Forum, vol. 32, no. 3, pp. 151–160, 2013.
[97] D. Nion and N. Sidiropoulos, “Adaptive algorithms to track the PARAFAC decomposition of a third-order tensor,” IEEE Trans. on Signal Processing, vol. 57, no. 6, pp. 2299–2310, Jun. 2009.
[98] S. A. Goreinov, N. L. Zamarashkin, and E. E. Tyrtyshnikov, “Pseudo-skeleton approximations by matrices of maximum volume,” Mathematical Notes, vol. 62, no. 4, pp. 515–519, 1997.
[99] C. Caiafa and A. Cichocki, “Generalizing the column-row matrix decomposition to multi-way arrays,” Linear Algebra and its Applications, vol. 433, no. 3, pp. 557–573, 2010.
[100] S. A. Goreinov, “On cross approximation of multi-index array,” Doklady Math., vol. 420, no. 4, pp. 404–406, 2008.
[101] I. Oseledets, D. V. Savostyanov, and E. Tyrtyshnikov, “Tucker dimensionality reduction of three-dimensional arrays in linear time,” SIAM J. Matrix Analysis and Applications, vol. 30, no. 3, pp. 939–956, 2008.
[102] I. Oseledets and E. Tyrtyshnikov, “TT-cross approximation for multidimensional arrays,” Linear Algebra and its Applications, vol. 432, no. 1, pp. 70–88, 2010.
[103] M. W. Mahoney, M. Maggioni, and P. Drineas, “Tensor-CUR decompositions for tensor-based data,” SIAM Journal on Matrix Analysis and Applications, vol. 30, no. 3, pp. 957–987, 2008.
[104] R. Bro, “Multiway calibration. Multilinear PLS,” Journal of Chemometrics, vol. 10, pp. 47–61, 1996.
[105] Q. Zhao, C. Caiafa, D. Mandic, Z. Chao, Y. Nagasaka, N. Fujii, L. Zhang, and A. Cichocki, “Higher-order partial least squares (HOPLS): A generalized multilinear regression method,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), vol. 35, no. 7, pp. 1660–1673, 2013.
[106] A. Cichocki, “Tensor decompositions: New concepts for brain data analysis?” Journal of Control, Measurement, and System Integration (SICE), vol. 47, no. 7, pp. 507–517, 2011.
[107] V. Calhoun, J. Liu, and T. Adali, “A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data,” NeuroImage, vol. 45, pp. 163–172, 2009.
[108] Y.-O. Li, T. Adali, W. Wang, and V. Calhoun, “Joint blind source separation by multiset canonical correlation analysis,” IEEE Transactions on Signal Processing, vol. 57, no. 10, pp. 3918–3929, Oct. 2009.
[109] E. Acar, T. Kolda, and D. Dunlavy, “All-at-once optimization for coupled matrix and tensor factorizations,” CoRR, vol. abs/1105.3422, 2011.
[110] G. Zhou, A. Cichocki, S. Xie, and D. Mandic, “Beyond Canonical Correlation Analysis: Common and individual features analysis,” IEEE Transactions on PAMI, 2013. [Online]. Available: http://arxiv.org/abs/1212.3913
[111] B. Bader, T. G. Kolda et al., “MATLAB tensor toolbox version 2.5,” Feb. 2012. [Online]. Available: http://www.sandia.gov/~tgkolda/TensorToolbox
[112] G. Zhou and A. Cichocki, “TDALAB: Tensor Decomposition Laboratory,” LABSP, Wako-shi, Japan, 2013. [Online]. Available: http://bsp.brain.riken.jp/TDALAB/
[113] A.-H. Phan, P. Tichavský, and A. Cichocki, “TENSORBOX: A MATLAB package for tensor decomposition,” LABSP, RIKEN, Japan, 2012. [Online]. Available: http://www.bsp.brain.riken.jp/~phan/tensorbox.php
[114] C. Andersson and R. Bro, “The N-way toolbox for MATLAB,” Chemometrics Intell. Lab. Systems, vol. 52, no. 1, pp. 1–4, 2000. [Online]. Available: http://www.models.life.ku.dk/nwaytoolbox
[115] I. Oseledets, “TT-Toolbox 2.2,” 2012. [Online]. Available: https://github.com/oseledets/TT-Toolbox
[116] D. Kressner and C. Tobler, “htucker – A MATLAB toolbox for tensors in hierarchical Tucker format,” MATHICSE, EPF Lausanne, 2012. [Online]. Available: http://anchp.epfl.ch/htucker
[117] M. Espig, M. Schuster, A. Killaitis, N. Waldren, P. Wähnert, S. Handschuh, and H. Auer, “Tensor Calculus library,” 2012. [Online]. Available: http://gitorious.org/tensorcalculus
[118] P. Kroonenberg, “The Three-Mode Company: A company devoted to creating three-mode software and promoting three-mode data analysis.” [Online]. Available: http://three-mode.leidenuniv.nl/
[119] S. Zhe, Y. Qi, Y. Park, I. Molloy, and S. Chari, “DinTucker: Scaling up Gaussian process models on multidimensional arrays with billions of elements,” IEEE Trans. PAMI (in print); arXiv preprint arXiv:1311.2663, 2014.
[120] K. Yilmaz and A. T. Cemgil, “Probabilistic latent tensor factorisation,” in Proc. of International Conference on Latent Variable Analysis and Signal Separation, vol. 6365, 2010, pp. 346–353.