THE TENSOR T-FUNCTION: A DEFINITION FOR FUNCTIONS OF THIRD-ORDER TENSORS∗
KATHRYN LUND†
Abstract. A definition for functions of multidimensional arrays is presented. The definition is valid for third-order tensors in the tensor t-product formalism, which regards third-order tensors as block circulant matrices. The tensor function definition is shown to have properties similar to those of standard matrix function definitions in fundamental scenarios. To demonstrate the definition’s po-
tential in applications, the notion of network communicability is generalized to third-order tensors
and computed for a small-scale example via block Krylov subspace methods for matrix functions. A
complexity analysis for these methods in the context of tensors is also provided.
Key words. tensors, multidimensional arrays, tensor t-product, matrix functions, block circulant matrices, network analysis
AMS subject classifications. 15A69, 65F10, 65F60
∗ This work was supported in part by the U.S. National Science Foundation under grant DMS-
1418882, the U.S. Department of Energy under grant DE-SC 0016578, and the Charles University
PRIMUS grant, project no. PRIMUS/19/SCI/11.
† Charles University, Prague, Czech Republic (kathryn.lund@karlin.mff.cuni.cz).
focus on adapting the block Krylov subspace methods (KSMs) from [13], and in light
of the so-called “curse of dimensionality,” we also present a computational complexity
analysis for this algorithm in the tensor function context. We further propose modifications to the algorithm based on the discrete Fourier transform, which was shown in [21] to increase computational efficiency for the tensor t-product.
This report proceeds as follows. We recapitulate matrix function definitions and
properties in Section 1.1. Section 2 restates the tensor t-product framework and proposes the tensor t-function, a new definition for a function of a tensor within this framework. We also present statements and proofs of t-function properties in analogy
to the core properties of matrix functions. A possible application for the tensor t-
exponential as a generalized communicability measure is discussed in Section 3. In
Section 4, we show how block KSMs for matrix functions can be used to compute
the tensor t-function, and demonstrate the efficacy of these methods for the tensor
t-exponential. We make concluding remarks in Section 5.
Before proceeding, we make a brief comment on syntax and disambiguation: the
phrase “tensor function” already has an established meaning in physics; see, e.g.,
[3, 4, 33]. The most precise phrase for our object of interest would be “a function of a
multidimensional array,” in analogy to “a function of a matrix.” However, since combinations of prepositional phrases can be cumbersome in English, we accept the risk of complicating literature searches and resort to the shorter term “tensor function.”
1.1. Definitions of matrix functions. Following [15, 17], we concern ourselves
with the three main matrix function definitions, based on the Jordan canonical form,
Hermite interpolating polynomials, and the Cauchy-Stieltjes integral form. In each
case, the validity of the definition boils down to the differentiability of f on the
spectrum of A. When f is analytic on the spectrum of A, all the definitions are
equivalent, and we can switch between them freely.
Let A ∈ C^{n×n} be a matrix with spectrum spec(A) := {λ_j}_{j=1}^N, where N ≤ n and the λ_j are distinct. An m × m Jordan block J_m(λ) of an eigenvalue λ has the form
$$
J_m(\lambda) = \begin{bmatrix}
\lambda & 1 & & \\
 & \lambda & \ddots & \\
 & & \ddots & 1 \\
 & & & \lambda
\end{bmatrix} \in \mathbb{C}^{m \times m}.
$$
Every A ∈ C^{n×n} can be written in the Jordan canonical form
$$
A = X J X^{-1}, \qquad J = \operatorname{diag}\bigl(J_{m_1}(\lambda_{j_1}), \ldots, J_{m_p}(\lambda_{j_\ell})\bigr), \tag{1.1}
$$
where X is nonsingular and m_1 + · · · + m_p = n. A function f is said to be defined on the spectrum of A if all the values
$$
f^{(k)}(\lambda_j), \quad k = 0, \ldots, n_j - 1, \quad j = 1, \ldots, N,
$$
exist, where n_j is the index of λ_j, i.e., the order of the largest Jordan block in which λ_j appears.
Definition 1.1. Suppose A ∈ Cn×n has Jordan form (1.1) and that f is defined
on the spectrum of A. Then we define
f(A) := X f(J) X^{-1},
where f(J) := diag(f(J_{m_1}(λ_{j_1})), . . . , f(J_{m_p}(λ_{j_ℓ}))), and
$$
f(J_{m_i}(\lambda_{j_k})) := \begin{bmatrix}
f(\lambda_{j_k}) & f'(\lambda_{j_k}) & \dfrac{f''(\lambda_{j_k})}{2!} & \cdots & \dfrac{f^{(n_{j_k}-1)}(\lambda_{j_k})}{(n_{j_k}-1)!} \\
0 & f(\lambda_{j_k}) & f'(\lambda_{j_k}) & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \dfrac{f''(\lambda_{j_k})}{2!} \\
 & & \ddots & \ddots & f'(\lambda_{j_k}) \\
0 & \cdots & & 0 & f(\lambda_{j_k})
\end{bmatrix} \in \mathbb{C}^{m_i \times m_i}.
$$
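When A is diagonalizable, all Jordan blocks are 1 × 1 and Definition 1.1 reduces to f(A) = X diag(f(λ_1), . . . , f(λ_n)) X^{-1}. The following Python snippet is a minimal sketch (not part of the original text; all names are illustrative) that checks this reduction against SciPy's expm for a random, generically diagonalizable matrix.

```python
import numpy as np
from scipy.linalg import expm

# Sketch: for a diagonalizable A, Definition 1.1 gives f(A) = X diag(f(lambda_j)) X^{-1}.
# Here f = exp, compared against scipy.linalg.expm.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
w, X = np.linalg.eig(A)                      # eigenvalues w, eigenvectors X
fA = (X * np.exp(w)) @ np.linalg.inv(X)      # X @ diag(exp(w)) @ X^{-1}
print(np.allclose(fA, expm(A)))              # True up to roundoff
```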
Fig. 2.1: Different views of a third-order tensor A ∈ Cn1 ×n2 ×n3 . (a) column fibers: A(:, j, k);
(b) row fibers: A(i, :, k); (c) tube fibers: A(i, j, :); (d) horizontal slices: A(i, :, :); (e) lateral
slices: A(:, j, :); (f) frontal slices: A(:, :, k)
2. A definition for tensor functions. We direct the reader now to Figure 2.1
for different “views” of a third-order tensor, which will be useful in visualizing the
forthcoming concepts. We also make use of some notions from block matrices. Define the standard block unit vectors E_k^{np×n} := ê_k^p ⊗ I_{n×n}, where ê_k^p ∈ C^p is the vector of all zeros except for a one in the kth entry, and I_{n×n} is the identity in C^{n×n}. When the dimensions are clear from context, we drop the superscripts. See (2.1) for various ways of expressing E_1^{np×n}:
$$
E_1^{np\times n} = \begin{bmatrix} I_{n\times n} \\ 0 \\ \vdots \\ 0 \end{bmatrix}
= \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \otimes I_{n\times n}
= \operatorname{unfold}(I_{n\times n\times p}). \tag{2.1}
$$
For A ∈ C^{m×n×p} with frontal slices A^{(1)}, . . . , A^{(p)}, unfold(A) denotes the mp × n block column obtained by stacking the frontal slices, fold reverses this operation, and bcirc(A) ∈ C^{mp×np} is the block circulant matrix whose first block column is unfold(A). The t-product of A ∈ C^{m×n×p} and B ∈ C^{n×s×p} is then defined as A ∗ B := fold(bcirc(A) unfold(B)).
Note that the operators fold, unfold, and bcirc are linear.
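To make the bookkeeping concrete, the following Python/NumPy sketch implements unfold, fold, bcirc, and the t-product under the standard conventions of the t-product literature (e.g., [21, 22]); the function names are illustrative and not taken from any particular package.

```python
import numpy as np

def unfold(A):
    # Stack the p frontal slices of an m x n x p tensor into an (mp) x n block column.
    m, n, p = A.shape
    return A.transpose(2, 0, 1).reshape(p * m, n)

def fold(M, m, n, p):
    # Inverse of unfold: rebuild an m x n x p tensor from an (mp) x n block column.
    return M.reshape(p, m, n).transpose(1, 2, 0)

def bcirc(A):
    # Block circulant matrix whose first block column is unfold(A).
    m, n, p = A.shape
    C = np.zeros((p * m, p * n), dtype=A.dtype)
    for i in range(p):
        for j in range(p):
            C[i * m:(i + 1) * m, j * n:(j + 1) * n] = A[:, :, (i - j) % p]
    return C

def t_product(A, B):
    # A * B := fold(bcirc(A) @ unfold(B)) for A of size m x n x p and B of size n x s x p.
    m, n, p = A.shape
    n2, s, p2 = B.shape
    assert (n, p) == (n2, p2)
    return fold(bcirc(A) @ unfold(B), m, s, p)
```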
The notion of transposition is defined face-wise, i.e., A∗ is the n × m × p tensor
obtained by taking the conjugate transpose of each frontal slice of A and then reversing
the order of the second through pth transposed slices.
For tensors with n × n square faces, there is a tensor identity In×n×p ∈ Cn×n×p ,
whose first frontal slice is the n × n identity matrix and whose remaining frontal slices
are all the zero matrix. With I_{n×n×p}, one can then define the notion of an inverse with respect to the t-product: A, B ∈ C^{n×n×p} are inverses of each other if A ∗ B = I_{n×n×p} and B ∗ A = I_{n×n×p}. The t-product formalism further gives rise to its own notion of polynomials, with powers of tensors defined as A^j := A ∗ A ∗ · · · ∗ A (j times).
Assuming that A ∈ C^{n×n×p} has diagonalizable faces, we can also define a tensor eigendecomposition. That is, we have that A^{(k)} = X^{(k)} D^{(k)} (X^{(k)})^{-1} for all k = 1, . . . , p, and define X and D to be the tensors whose faces are X^{(k)} and D^{(k)}, respectively. Then
$$
A = X * D * X^{-1}, \quad\text{or, equivalently,}\quad A * \vec{X}_i = \vec{X}_i * d_i, \quad i = 1, \ldots, n, \tag{2.3}
$$
where X⃗_i are the n × 1 × p lateral slices of X (see Figure 2.1(e)) and d_i are the 1 × 1 × p tube fibers of D (see Figure 2.1(c)). The tensor D is f-diagonal, i.e., each of its frontal faces is a diagonal matrix.
The eigenvalue decomposition (2.3) is not unique. See [16] for an alternative
circulant-based interpretation of third-order tensors, as well as a deeper exploration
of a unique canonical eigendecomposition for tensors.
2.1. The tensor t-exponential. As motivation, we consider the solution to a
multidimensional ordinary differential equation. Suppose that A has square frontal
faces, i.e., that A ∈ C^{n×n×p}, and let B : [0, ∞) → C^{n×s×p} be an unknown function with B(0) given. With d/dt acting element-wise, we consider the differential equation
$$
\frac{\mathrm{d}B}{\mathrm{d}t}(t) = A * B(t). \tag{2.4}
$$
Unfolding both sides leads to
$$
\frac{\mathrm{d}}{\mathrm{d}t}\begin{bmatrix} B^{(1)}(t) \\ \vdots \\ B^{(p)}(t) \end{bmatrix}
= \operatorname{bcirc}(A) \begin{bmatrix} B^{(1)}(t) \\ \vdots \\ B^{(p)}(t) \end{bmatrix},
$$
whose solution is unfold(B(t)) = exp(t · bcirc(A)) unfold(B(0)). Folding the solution back into a tensor motivates the following definition: for a function f defined on the spectrum of bcirc(A) and B ∈ C^{n×s×p}, we define
$$
f(A) * B := \operatorname{fold}\bigl(f(\operatorname{bcirc}(A)) \cdot \operatorname{unfold}(B)\bigr), \tag{2.6}
$$
which we call the tensor t-function. Note that f(bcirc(A)) · unfold(B) is merely a matrix function times a block vector. If B = I_{n×n×p}, then by equation (2.1) the definition for f(A) reduces to
$$
f(A) := \operatorname{fold}\bigl(f(\operatorname{bcirc}(A)) E_1^{np\times n}\bigr). \tag{2.7}
$$
A natural question is whether the definition (2.6) behaves “as expected” in common
scenarios. To answer this question, we require some results on block circulant matrices
and the tensor t-product.
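As a concrete illustration of definition (2.6) and its special case (2.7), the following sketch evaluates the tensor t-exponential by forming f(bcirc(A)) explicitly; it reuses the unfold, fold, and bcirc helpers sketched above, all names are illustrative, and it is intended only for small examples, since bcirc(A) is np × np.

```python
import numpy as np
from scipy.linalg import expm

def t_function(f, A, B):
    # f(A) * B := fold(f(bcirc(A)) @ unfold(B)), cf. (2.6);
    # f maps an np x np matrix to an np x np matrix, e.g., scipy.linalg.expm.
    n, _, p = A.shape
    _, s, _ = B.shape
    return fold(f(bcirc(A)) @ unfold(B), n, s, p)

# Tensor t-exponential acting on the t-product identity, cf. (2.7).
n, p = 4, 3
I_t = np.zeros((n, n, p))
I_t[:, :, 0] = np.eye(n)                      # identity tensor I_{n x n x p}
A = np.random.rand(n, n, p)
exp_A = t_function(expm, A, I_t)              # n x n x p tensor exp(A)
```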
Theorem 2.1 (Theorem 5.6.5 in [7]). Suppose A, B ∈ C^{np×np} are block circulant matrices with n × n blocks. Let {α_j}_{j=1}^k be scalars. Then A^T, A^∗, α_1 A + α_2 B, AB, q(A) = \sum_{j=1}^k α_j A^j, and A^{-1} (when it exists) are also block circulant.
Remark 2.2. From (2.2), we can see that any block circulant matrix C ∈ C^{np×np} can be represented by its first block column C E_1^{np×n}. Let 𝒞 ∈ C^{n×n×p} be the tensor whose frontal faces are the block entries of C E_1^{np×n}. Then 𝒞 = fold(C E_1^{np×n}).
Lemma 2.3. Let A ∈ C^{m×n×p} and B ∈ C^{n×s×p}. Then
(i) unfold(A) = bcirc(A) E_1^{np×n};
(ii) bcirc(fold(bcirc(A) E_1^{np×n})) = bcirc(A);
(iii) bcirc(A ∗ B) = bcirc(A) bcirc(B);
(iv) bcirc(A)^j = bcirc(A^j), for all j = 0, 1, . . .; and
(v) (A ∗ B)^∗ = B^∗ ∗ A^∗.
Proof. We drop the superscripts on E_1^{np×n} for ease of presentation. Parts (i) and (ii) follow from Remark 2.2. To prove part (iii), we note by part (i) that
$$
\operatorname{bcirc}(A * B) = \operatorname{bcirc}\bigl(\operatorname{fold}(\operatorname{bcirc}(A)\operatorname{unfold}(B))\bigr)
= \operatorname{bcirc}\bigl(\operatorname{fold}(\operatorname{bcirc}(A)\operatorname{bcirc}(B)E_1)\bigr).
$$
Since bcirc(A) bcirc(B) is itself block circulant by Theorem 2.1, Remark 2.2 gives
$$
\operatorname{bcirc}\bigl(\operatorname{fold}(\operatorname{bcirc}(A)\operatorname{bcirc}(B)E_1)\bigr) = \operatorname{bcirc}(A)\operatorname{bcirc}(B).
$$
Part (iv) follows by induction on part (iii). Part (v) is the same as [22, Lemma 3.16].
Let D be an n × n × p f-diagonal tensor, i.e., a tensor whose n × n frontal slices are diagonal matrices. Alternatively, one can think of such a tensor as an n × n matrix with nonzero tube fibers on the diagonal and zero tube fibers everywhere else (cf. Figure 2.1(c)). The following theorem summarizes the relationship between the block circulant of D and those of its tube fibers.
Theorem 2.4. Let D ∈ C^{n×n×p} be f-diagonal, and let {d_i}_{i=1}^n ⊂ C^{1×1×p} denote its diagonal tube fibers. Then the spectrum of bcirc(D) is identical to the union of the spectra of bcirc(d_i), i = 1, . . . , n.
Proof. We begin by deriving an expression for bcirc(D) in terms of the p × p circulant matrices bcirc(d_i). Denote each frontal slice as D^{(k)}, k = 1, . . . , p, with diagonal entries d_i^{(k)}, i = 1, . . . , n; i.e.,
$$
D^{(k)} = \begin{bmatrix} d_1^{(k)} & & \\ & \ddots & \\ & & d_n^{(k)} \end{bmatrix}.
$$
Then
$$
\operatorname{bcirc}(D) =
\begin{bmatrix}
D^{(1)} & D^{(p)} & \cdots & D^{(2)} \\
D^{(2)} & D^{(1)} & \ddots & \vdots \\
\vdots & \ddots & \ddots & D^{(p)} \\
D^{(p)} & \cdots & D^{(2)} & D^{(1)}
\end{bmatrix},
$$
where each block D^{(k)} is the diagonal matrix diag(d_1^{(k)}, . . . , d_n^{(k)}). Collecting the entries associated with the first tube fiber, note that its block circulant is given as
$$
\operatorname{bcirc}(d_1) =
\begin{bmatrix}
d_1^{(1)} & d_1^{(p)} & \cdots & d_1^{(2)} \\
d_1^{(2)} & d_1^{(1)} & \ddots & \vdots \\
\vdots & \ddots & \ddots & d_1^{(p)} \\
d_1^{(p)} & \cdots & d_1^{(2)} & d_1^{(1)}
\end{bmatrix}.
$$
Defining
$$
\widehat{I}_1 := \begin{bmatrix} 1 & & & \\ & 0 & & \\ & & \ddots & \\ & & & 0 \end{bmatrix} \in \mathbb{C}^{n\times n},
$$
it holds that
$$
\operatorname{bcirc}(d_1) \otimes \widehat{I}_1 =
\begin{bmatrix}
d_1^{(1)} \widehat{I}_1 & d_1^{(p)} \widehat{I}_1 & \cdots & d_1^{(2)} \widehat{I}_1 \\
d_1^{(2)} \widehat{I}_1 & d_1^{(1)} \widehat{I}_1 & \ddots & \vdots \\
\vdots & \ddots & \ddots & d_1^{(p)} \widehat{I}_1 \\
d_1^{(p)} \widehat{I}_1 & \cdots & d_1^{(2)} \widehat{I}_1 & d_1^{(1)} \widehat{I}_1
\end{bmatrix},
$$
i.e., the np × np matrix that retains precisely the entries of bcirc(D) associated with the first tube fiber and is zero elsewhere. Noting the same pattern for each i = 1, . . . , n, it is not hard to see that
$$
\operatorname{bcirc}(D) = \sum_{i=1}^n \operatorname{bcirc}(d_i) \otimes \widehat{I}_i, \tag{2.8}
$$
where \widehat{I}_i ∈ C^{n×n} is zero everywhere except for the (i, i) entry, which is one.
It is known that a circulant matrix is unitarily diagonalizable by the discrete Fourier transform (DFT); see, e.g., [7, Section 3.2]. That is, for a p × p circulant matrix C, and with F_p denoting the unitary p × p DFT matrix, F_p C F_p^* = Λ, where Λ ∈ C^{p×p} is diagonal. Since each bcirc(d_i) is a p × p circulant matrix, there exists for each i = 1, . . . , n a diagonal Λ_i ∈ C^{p×p} such that
$$
F_p \operatorname{bcirc}(d_i) F_p^* = \Lambda_i. \tag{2.9}
$$
We also have the following useful property of the Kronecker product for matrices A, B, C, and D such that the products AC and BD exist; see, e.g., [19, Lemma 4.2.10]:
$$
(A \otimes B)(C \otimes D) = (AC) \otimes (BD). \tag{2.10}
$$
Consequently,
$$
\begin{aligned}
(F_p \otimes I_{n\times n})\operatorname{bcirc}(D)(F_p^* \otimes I_{n\times n})
&= (F_p \otimes I_{n\times n})\left(\sum_{i=1}^n \operatorname{bcirc}(d_i) \otimes \widehat{I}_i\right)(F_p^* \otimes I_{n\times n}) \\
&= \sum_{i=1}^n (F_p \otimes I_{n\times n})\bigl(\operatorname{bcirc}(d_i) \otimes \widehat{I}_i\bigr)(F_p^* \otimes I_{n\times n}) \\
&= \sum_{i=1}^n \bigl(F_p \operatorname{bcirc}(d_i) F_p^*\bigr) \otimes \bigl(I_{n\times n} \widehat{I}_i I_{n\times n}\bigr), && \text{by (2.10)} \\
&= \sum_{i=1}^n \Lambda_i \otimes \widehat{I}_i, && \text{by (2.9)}.
\end{aligned}
$$
Noting that F_p ⊗ I_{n×n} is unitary and that the matrix Λ := \sum_{i=1}^n Λ_i ⊗ \widehat{I}_i is a diagonal matrix whose entries are precisely the diagonal entries of all the Λ_i concludes the proof.
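A quick numerical sanity check of Theorem 2.4 (a sketch only, reusing the bcirc helper from the Section 2 code): the eigenvalues of bcirc(D) should coincide, as a multiset and up to roundoff, with the eigenvalues of the n circulant matrices bcirc(d_i).

```python
import numpy as np

n, p = 3, 4
rng = np.random.default_rng(1)
D = np.zeros((n, n, p))
for k in range(p):
    D[:, :, k] = np.diag(rng.random(n))        # f-diagonal: diagonal frontal slices

eig_D = np.sort_complex(np.linalg.eigvals(bcirc(D)))
eig_tubes = np.sort_complex(np.concatenate(
    [np.linalg.eigvals(bcirc(D[i:i+1, i:i+1, :])) for i in range(n)]))
print(np.allclose(eig_D, eig_tubes))           # True up to roundoff and ordering
```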
Corollary 2.5. Let D ∈ C^{n×n×p} be f-diagonal, and let {d_i}_{i=1}^n ⊂ C^{1×1×p} denote its diagonal tube fibers. Then a function f being defined on the spectrum of bcirc(D) is equivalent to f being defined on the union of the spectra of bcirc(d_i), i = 1, . . . , n.
An immediate consequence of Theorem 2.4 is that a function f being defined on
the spectrum of bcirc(D) is equivalent to f being defined on the spectra of bcirc(di ),
i = 1, . . . , n. The interpolating polynomials for f (bcirc(D)) and f (bcirc(di )), i =
1, . . . , n, are also related.
The following theorem ensures that definition (2.6) is well defined when f is a
polynomial, when A and B are second-order tensors (i.e., matrices), and when f is
the inverse function.
Theorem 2.6. Let A ∈ Cn×n×p and B ∈ Cn×s×p .
(i) If f ≡ q, where q is a polynomial, then the tensor t-function definition (2.6) matches the polynomial notion in the t-product formalism, i.e.,
$$
f(A) * B = q(A) * B = \operatorname{fold}\bigl(\operatorname{bcirc}(q(A)) \cdot \operatorname{unfold}(B)\bigr).
$$
Part (ii) is a special case of part (i). As for part (iii), since p = 1, we have that
fold(A) = bcirc(A) = A = unfold(A), and similarly for B. Then the definition
of f (A) ∗ B reduces immediately to the matrix function case. Part (iv) follows by
carefully unwrapping the definition of f (A):
$$
\begin{aligned}
f(A) * A &= \operatorname{fold}\bigl(\operatorname{bcirc}(A)^{-1}\operatorname{unfold}(A)\bigr) \\
&= \operatorname{fold}\bigl(\operatorname{bcirc}(A)^{-1}\operatorname{bcirc}(A)E_1^{np\times n}\bigr), && \text{by Lemma 2.3(i)} \\
&= \operatorname{fold}\bigl(E_1^{np\times n}\bigr) = I_{n\times n\times p}.
\end{aligned}
$$
The definition (2.6) possesses generalized versions of many of the core properties
of matrix functions.
Theorem 2.7. Let A ∈ C^{n×n×p}, and let f : C → C be defined on a region in the complex plane containing the spectrum of bcirc(A). For part (iv), assume that A has an eigendecomposition as in equation (2.3), with A ∗ X⃗_i = D ∗ X⃗_i = X⃗_i ∗ d_i, i = 1, . . . , n. Then it holds that
(i) f(A) commutes with A;
(ii) f(A^∗) = f(A)^∗;
(iii) f(X ∗ A ∗ X^{-1}) = X ∗ f(A) ∗ X^{-1}; and
(iv) f(D) ∗ X⃗_i = X⃗_i ∗ f(d_i), for all i = 1, . . . , n.
Proof. For parts (i)–(iii), it suffices by Theorem 2.6(ii) to show that the statements hold for f(z) = \sum_{j=1}^m c_j z^j. Part (i) then follows immediately. To prove part (ii), we need only show that (A^j)^∗ = (A^∗)^j for all j = 0, 1, . . ., which follows by induction from Lemma 2.3(v). Part (iii) also follows inductively. The base cases j = 0, 1 clearly hold. Assume for some j = k that (X ∗ A ∗ X^{-1})^k = X ∗ A^k ∗ X^{-1}, and then note that
$$
(X * A * X^{-1})^{k+1} = (X * A * X^{-1})^k * (X * A * X^{-1})
= X * A^k * X^{-1} * X * A * X^{-1} = X * A^{k+1} * X^{-1}.
$$
For part (iv), we fix i ∈ {1, . . . , n}. By Corollary 2.5, f being defined on spec(bcirc(D)) implies that it is also defined on spec(bcirc(d_i)). Let q and q_i be the polynomials guaranteed by Theorem 1.2 such that f(bcirc(D)) = q(bcirc(D)) and f(bcirc(d_i)) = q_i(bcirc(d_i)). By Theorem 2.4, spec(bcirc(d_i)) ⊂ spec(bcirc(D)), so it follows that q_i(bcirc(d_i)) = q(bcirc(d_i)). It then suffices to prove part (iv) for D^j, j = 0, 1, . . .. The cases j = 0, 1 clearly hold, and we assume
the statement holds for some j = k ≥ 1. Then
where the inner matrix should be regarded three-dimensionally, with its elements
being tube fibers (cf. Figure 2.1(c)).
3. Centrality and communicability of a third-order network. We use the
term network to denote an undirected, unweighted graph with n nodes. The graph,
and by extension, the network, can be represented by its adjacency matrix A ∈ Rn×n .
The ijth entry of A is 1 if nodes i and j are connected, and 0 otherwise. As a rule, a
node is not connected to itself, so Aii = 0. The centrality of the ith node is defined
as exp(A)ii , while the communicability between nodes i and j is defined as exp(A)ij .
These notions can be extended to higher-order situations. Suppose we are concerned with triplets, instead of pairs, of nodes. Then it is possible to construct an adjacency tensor A, where a 1 at entry A_{ijk} indicates that distinct nodes i, j, and k are connected, and a 0 indicates otherwise. Information will, however, be lost if only the
adjacency tensor is considered, since pairwise connectivity is stored in the adjacency
matrix. Multilayer networks, such as those describing a city’s bus, metro, and tram
systems, constitute a more natural application, and several notions of centrality are
explored in [9]. Alternatively, it is not hard to imagine a time-dependent network
stored as a tensor, where each frontal face corresponds to a sampling of the network
at discrete times; see, e.g., [32]. In any of these situations, we could compute the
communicability of a triple as exp(A)ijk , where exp(A) is our tensor t-exponential.
Centrality for a node i would thus be defined as exp(A)iii .
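As a small, self-contained sketch of these quantities (illustrative only; the tensor here is synthetic, and the fold, unfold, and bcirc helpers are those from the Section 2 code), one can build a symmetric, binary adjacency tensor and read off t-communicabilities exp(A)_{ijk} and t-centralities exp(A)_{iii}:

```python
import numpy as np
from scipy.linalg import expm

n = p = 6
rng = np.random.default_rng(2)
A = np.zeros((n, n, p))
for k in range(p):
    U = np.triu(rng.integers(0, 2, (n, n)), 1)   # random undirected graph, no self-loops
    A[:, :, k] = U + U.T

I_t = np.zeros((n, n, p))
I_t[:, :, 0] = np.eye(n)
expA = fold(expm(bcirc(A)) @ unfold(I_t), n, n, p)        # exp(A) * I, cf. (2.7)

communicability = expA[1, 2, 3]                           # exp(A)_{ijk}, here (i,j,k) = (2,3,4) in 1-based indexing
centrality = np.array([expA[i, i, i] for i in range(n)])  # exp(A)_{iii}
```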
Algorithm 4.1 is the generalized block Arnoldi procedure. We assume that Algo-
rithm 4.1 runs to completion without breaking down, i.e., that we obtain
(i) a ⟨⟨·, ·⟩⟩_S-orthonormal basis {V_k}_{k=1}^{m+1} ⊂ C^{np×n}, such that each V_k has full rank and K_m^S(A, B) = span^S{V_k}_{k=1}^m, and
(ii) a block upper Hessenberg matrix H_m ∈ S^{m×m} and H_{m+1,m} ∈ S,
all satisfying the block Arnoldi relation
$$
A \mathcal{V}_m = \mathcal{V}_m H_m + V_{m+1} H_{m+1,m} E_m^*, \tag{4.1}
$$
where 𝒱_m = [V_1 | · · · | V_m] ∈ C^{np×nm} and (H_m)_{ij} = H_{ij}. Note that H_m has dimension
mn × mn; so long as m ≪ p, Hm will be significantly smaller than A. Otherwise,
it will be necessary to partition the right-hand side B and compute the action of
f (A) on each partition separately. Furthermore, in the global paradigm, Hm has a
Kronecker structure H ⊗ In , where H ∈ Cm×m , so the storage of Hm can be reduced.
The paper [13] also establishes theory for a block full orthogonalization method for functions of matrices (B(FOM)²). The B(FOM)² approximation is defined as
$$
F_m := \mathcal{V}_m f(H_m) E_1 B, \tag{4.2}
$$
which indeed reduces to a block FOM approximation when f (z) = z −1 ; see, e.g., [29].
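For concreteness, the following sketch implements a plain, unrestarted classical block Arnoldi iteration together with the approximation (4.2). It fixes the classical paradigm with the standard Euclidean block inner product rather than the general ⟨⟨·, ·⟩⟩_S framework of [13], works with real data, and all function names are illustrative rather than taken from bfomfom.

```python
import numpy as np
from scipy.linalg import qr, expm

def block_arnoldi(A, B, m):
    # Classical block Arnoldi: builds blocks V_1, ..., V_{m+1} (stored side by side) and the
    # block upper Hessenberg H so that A V_m = V_m H_m + V_{m+1} H_{m+1,m} E_m^*, cf. (4.1).
    N, s = B.shape
    V = np.zeros((N, (m + 1) * s))
    H = np.zeros(((m + 1) * s, m * s))
    V[:, :s], B0 = qr(B, mode='economic')          # B = V_1 B0
    for k in range(m):
        W = A @ V[:, k * s:(k + 1) * s]
        for j in range(k + 1):                     # block modified Gram-Schmidt
            Hjk = V[:, j * s:(j + 1) * s].T @ W
            H[j * s:(j + 1) * s, k * s:(k + 1) * s] = Hjk
            W -= V[:, j * s:(j + 1) * s] @ Hjk
        Q, R = qr(W, mode='economic')
        V[:, (k + 1) * s:(k + 2) * s] = Q
        H[(k + 1) * s:(k + 2) * s, k * s:(k + 1) * s] = R
    return V, H, B0

def bfom2(f, A, B, m):
    # Unrestarted B(FOM)^2 approximation F_m = V_m f(H_m) E_1 B0, cf. (4.2).
    N, s = B.shape
    V, H, B0 = block_arnoldi(A, B, m)
    fH = f(H[:m * s, :m * s])
    return V[:, :m * s] @ (fH[:, :s] @ B0)         # extract first block column of f(H_m)

# Example: approximate exp(A) B for a random symmetric A and a block vector B.
N, s, m = 200, 4, 20
A = np.random.rand(N, N); A = (A + A.T) / 2
B = np.random.rand(N, s)
F = bfom2(expm, A, B, m)
```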
With the end application being tensors, restarts will be necessary to mitigate
memory limitations imposed by handling higher-order data. Restarts for B(FOM)2
are developed in detail in [13] for functions with Cauchy-Stieltjes representations,
including the matrix exponential; we present here a high-level summary of the pro-
cedure as Algorithm 4.2. Restarts are performed by approximating an error function via adaptive quadrature rules; this step is represented by D̃_m^{(k)} in line 6.
Algorithm 4.2 B(FOM)²(m): block full orthogonalization method for functions of matrices with restarts
1: Given f, A, B, S, ⟨⟨·, ·⟩⟩_S, N, m, t, tol
2: Run Algorithm 4.1 with inputs A, B, S, ⟨⟨·, ·⟩⟩_S, N, and m and store 𝒱_{m+1}^{(1)}, H_m^{(1)}, and B^{(1)}
3: Compute and store F_m^{(1)} = 𝒱_m^{(1)} f(H_m^{(1)}) E_1 B^{(1)}
4: for k = 1, 2, . . ., until convergence do
5:   Run Algorithm 4.1 with inputs A, V_{m+1}^{(k)}, S, ⟨⟨·, ·⟩⟩_S, N, and m and store 𝒱_{m+1}^{(k+1)} in place of the previous basis
6:   Compute error approximation D̃_m^{(k)}
7:   Update F_m^{(k+1)} := F_m^{(k)} + D̃_m^{(k)}
8: end for
9: return F_m^{(k+1)}
multiplying V by a scalar, which is again O(n3 ). The total cost of global Arnoldi is
consequently dominated by AV products:
$$
C_{\text{gl-Arnoldi}} = O\!\left(mn^4 + \tfrac{1}{2}(m+1)(m+2)\,n^3\right). \tag{4.4}
$$
Classical B(FOM)2 . The computation of (4.2) can be broken into three stages:
f (Hm ), a V C-type product, and the evaluation of the basis V m on the resulting
matrix. The matrix function should be computed via a direct algorithm, such as
the Schur-Parlett Algorithm, described in [17, Chapter 9]. Without function-specific
information, funm requires up to O(m4 n4 ); function-specific algorithms, or algorithms
that take advantage of the eigenvalue distribution of A (described in other chapters of
[17]) may be cheaper. The product f (Hm )E1 does not require computation, since we
can just extract the first block column from f (Hm ); f (Hm )E1 B then requires only
O(mn3 ). Finally, V m applied to an mn × n matrix requires O(mn4 ). Including the
Arnoldi cost (4.3), the total for computing (4.2) is then
$$
C_{\text{cl-B(FOM)}^2} = O\!\left((m+1)n^6 + \bigl(m^4 + m(m+3)\bigr)n^4 + mn^3\right). \tag{4.5}
$$
Global B(FOM)2 . The same three stages apply for the global version of B(FOM)2
as for the classical, but we can make many computations cheaper. The matrix function
f(H_m) = f(H) ⊗ I_n, where the Kronecker product need not be formed explicitly, so the cost reduces to O(m^4). The matrix B can be regarded as a scalar, and using the same column-extracting trick, f(H_m)E_1 B comes at a negligible cost, O(m). Finally, the product with the basis 𝒱_m can be reduced to taking scalar combinations of the basis vectors, amounting to O((m − 1)n^3). The total for (4.2), including the Arnoldi cost (4.4), is
$$
C_{\text{gl-B(FOM)}^2} = O\!\left(mn^4 + \left(\tfrac{1}{2}(m+1)(m+2) + m - 1\right)n^3 + m^4\right). \tag{4.6}
$$
Restarts. Determining the computational complexity for restarted B(FOM)2 is
challenging, because the quadrature rule is adaptive, and the number of nodes per
restart cycle plays a crucial role in how much work is done. Typically, the cost per
additional cycle should be less than computing the first step, and it should decrease as
the algorithm approaches convergence, because the error function can be approximated progressively less accurately; see, e.g., [12]. In the worst-case scenario, however,
the cost of successive cycles may be as expensive as the first, so it is reasonable to
regard (4.5) and (4.6) as upper bounds.
4.3. Block diagonalization and the discrete Fourier transform. Per rec-
ommendations in [21, 22], we can reduce the computational effort of f(A) ∗ B by
taking advantage of the fact that bcirc(A) can be block diagonalized by the discrete
Fourier transform (DFT) along the tubal fibers of A. Let Fp denote the DFT of size
p × p. Then we have that
$$
(F_p \otimes I_n)\operatorname{bcirc}(A)(F_p^* \otimes I_n) =
\begin{bmatrix} D_1 & & & \\ & D_2 & & \\ & & \ddots & \\ & & & D_p \end{bmatrix} =: D,
$$
where the D_k are n × n matrices. Then by Theorem 1.5(iii),
$$
f(\operatorname{bcirc}(A)) = (F_p^* \otimes I_n)\, f(D)\, (F_p \otimes I_n).
$$
Since each Di , i = 1, . . . , p may be a full n×n matrix, applying D itself to block vectors
will still take O(n4 ) operations. However, this structure is easier to parallelize and
requires less memory-movement than using A directly, which will play an important
role in high-performance applications.
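As a sketch of how this block diagonalization can be exploited in practice (names are illustrative; the conventions match the helpers used earlier), the t-function can be evaluated face-wise in the Fourier domain, avoiding the explicit formation of bcirc(A) altogether:

```python
import numpy as np
from scipy.linalg import expm

def t_function_dft(f, A, B):
    # Evaluate f(A) * B via the DFT along the tube fibers: the frontal faces of
    # fft(A, axis=2) are (up to DFT scaling conventions) the diagonal blocks D_1, ..., D_p.
    n, _, p = A.shape
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    F_hat = np.empty_like(B_hat)
    for k in range(p):
        F_hat[:, :, k] = f(A_hat[:, :, k]) @ B_hat[:, :, k]   # apply f face-wise
    return np.real(np.fft.ifft(F_hat, axis=2))                # real output for real A, B

# For real A and B this agrees, up to roundoff, with fold(f(bcirc(A)) @ unfold(B)).
n, p = 4, 3
A = np.random.rand(n, n, p)
I_t = np.zeros((n, n, p)); I_t[:, :, 0] = np.eye(n)
expA_dft = t_function_dft(expm, A, I_t)
```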
4.4. The tensor t-exponential on a small third-order network. We take
A ∈ Cn×n×p to be a tensor whose p frontal faces are each adjacency matrices for
an undirected, unweighted network. More specifically, the frontal faces of A are
symmetric, and the entries are binary. The sparsity structure of this tensor is given
in Figure 4.1 for n = p = 50. Note that we must actually compute exp(A) ∗ I =
fold(exp(bcirc(A))E1 ) (cf. Definition (2.7)). With n = p = 50, this leads to a
2500 × 2500 matrix function times a 2500 × 50 block vector. The sparsity patterns
of bcirc(A) and D are shown in Figure 4.2. The block matrix D is determined by
applying Matlab’s fast Fourier transform to bcirc(A). Note that bcirc(A) is not
symmetric, but it has a banded structure. It should also be noted that while the
blocks of D appear to be structurally identical, they are not numerically equal.
Fig. 4.1: Sparsity structure for A. Blue indicates that a face is closer to the “front” and
pink farther to the “back”; see Figure 2.1(f) for how the faces are oriented.
Fig. 4.2: Sparsity patterns of bcirc(A) (left), D (middle), and a zoom-in on D (right).
We compute exp(A) ∗ I with the classical and global versions of B(FOM)2 using
the Matlab software package bfomfom.1 We run Matlab 2019a on Windows 10 on
¹ The script tensor_texp_network_v2.m used to generate our results can be found at https://gitlab.com/katlund/bfomfom-main, along with the main body of code.
a laptop with 16GB RAM and an Intel i7 processor at 1.80GHz. The convergence
behavior of each version is displayed in Figure 4.3, where we report the relative error
per restart cycle, i.e., per m iterations of Algorithm 4.1. The restart cycle length is
m = 5, and the error tolerance is 10−12 . The methods based on D (case (A)) are
Fig. 4.3: Convergence plots for (A) classical and global methods on exp(D)(F_p ⊗ I_n)E_1, and (B) classical and global methods on exp(bcirc(A))E_1. m = 5.
only a little less accurate than those based on bcirc(A) (case (B)), and they require
the same number of iterations to converge. The global methods require only one more
cycle than the classical ones, but considering the computational complexity per cycle
(cf. (4.5) and (4.6)), it is clear that the global methods require far less work overall.
Table 4.1 shows that for larger m, both classical and global methods require the
same number of cycles (for either D- or bcirc(A)-based approaches). For smaller
values of m, global methods cannot attain the desired tolerance, because they exceed
the maximum number of quadrature nodes allowed to perform the error update in
line 6 of Algorithm 4.2. See, however, Figure 4.4 for the convergence behavior of the
global method when m = 2. It still attains a high level of accuracy with much less
work overall than the classical method.
Table 4.1: Number of cycles needed to converge to 10^{-12} for different basis sizes m

            m = 2    m = 5    m = 10    m = 15
classical     18       6        3         2
global         –       7        3         2
5. Conclusion. The main purpose of this report is to establish a first notion for
functions of multidimensional arrays and demonstrate that it is feasible to compute
this object with well-understood tools from the matrix function literature. Our def-
inition for the tensor t-function f (A) ∗ B shows versatility and consistency, and our
numerical results indicate that block KSMs can compute f (A) ∗ B with few iterations
and still achieve high accuracy. In particular, the global block KSM shows promise
for moderate sizes, since its overall workload is significantly smaller than that of its
classical counterpart. For smaller basis sizes, which are more favorable in the context
of large tensors, global methods may struggle to converge, and remedies for this situ-
ation remain an open problem. One potential solution, which should first be explored
Fig. 4.4: Convergence plots for (A) classical and global methods on exp(D)(F_p ⊗ I_n)E_1, and (B) classical and global methods on exp(bcirc(A))E_1. m = 2.
for simple matrix functions, is to switch between global and classical paradigms in
some optimal way so as to minimize overall computational effort while maximizing
attainable accuracy.
The second aim of this report is to invite fellow researchers to pursue the many
open problems posed by this new definition and to devise tensor function definitions
for other paradigms. Other key problems include exploring applications of f (A) ∗ B in
real-life scenarios and comparing our definition of communicability for a third-order
network to existing network analysis tools.
Acknowledgments. The author would like to thank Misha Kilmer for useful con-
versations and the images used in Figure 2.1, Andreas Frommer and Daniel B. Szyld
for comments on Theorem 2.7, Francesca Arrigo for suggesting a new application in
Section 3, and the anonymous referee for multiple insights.
REFERENCES
[1] A. H. Al-Mohy and N. J. Higham, Computing the action of the matrix exponential with
an application to exponential integrators, SIAM J. Sci. Comput., 33 (2011), pp. 488–511,
https://doi.org/10.1137/100788860.
[2] F. Arrigo, M. Benzi, and C. Fenu, Computation of generalized matrix functions, SIAM J.
Matrix Anal. Appl., 37 (2016), pp. 836–860, https://doi.org/10.1137/15M1049634, https://
arxiv.org/abs/1512.01446.
[3] J. Betten, Creep mechanics, Springer, Berlin, 3rd ed., 2008.
[4] J. P. Boehler, ed., Applications of Tensor Functions in Solid Mechanics, Springer, Wien,
1987.
[5] K. Braman, Third-order tensors as linear operators on a space of matrices, Linear Algebra
Appl., 433 (2010), pp. 1241–1253, https://doi.org/10.1016/j.laa.2010.05.025, http://dx.
doi.org/10.1016/j.laa.2010.05.025.
[6] A. Cichocki, Era of Big Data Processing: A New Approach via Tensor Networks and Tensor
Decompositions, Tech. Report arXiv:1403.2048v4, 2014, http://arxiv.org/abs/1403.2048,
https://arxiv.org/abs/1403.2048.
[7] P. J. Davis, Circulant Matrices, AMS Chelsea Publishing, Providence, 2nd ed., 2012.
[8] L. de Lathauwer, B. de Moor, and J. Vandewalle, A multilinear singular value decom-
position, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278, https://doi.org/10.1137/
S0895479896305696, http://epubs.siam.org/doi/abs/10.1137/S0895479896305696.
[9] M. D. Domenico, A. Solé-Ribalta, E. Omodei, S. Gómez, and A. Arenas, Centrality in
Interconnected Multilayer Networks, Phys. D Nonlinear Phenom., 323 (2013), pp. 1–12,
https://doi.org/10.1016/j.physd.2016.01.002, https://arxiv.org/abs/arXiv:1311.2906v1.
[10] L. Elbouyahyaoui, A. Messaoudi, and H. Sadok, Algebraic properties of the block GMRES
and block Arnoldi methods, Electron. Trans. Numer. Anal., 33 (2008), pp. 207–220.
[11] E. Estrada and D. J. Higham, Network properties revealed through matrix functions, SIAM
Rev., 52 (2010), pp. 696–714, https://doi.org/10.1137/090761070, http://epubs.siam.org/
doi/10.1137/090761070.
[12] A. Frommer, S. Güttel, and M. Schweitzer, Efficient and stable Arnoldi restarts for matrix
functions based on quadrature, SIAM J. Matrix Anal. Appl., 35 (2014), pp. 661–683.
[13] A. Frommer, K. Lund, and D. B. Szyld, Block Krylov subspace methods for functions of
matrices, Electron. Trans. Numer. Anal., 47 (2017), pp. 100–126, http://etna.mcs.kent.
edu/vol.47.2017/pp100-126.dir/pp100-126.pdf.
[14] A. Frommer, K. Lund, and D. B. Szyld, Block Krylov subspace methods for functions of
matrices II: Modified block FOM, tech. report, MATHICSE Technical Report, 2019, http://
infoscience.epfl.ch/record/265508.
[15] A. Frommer and V. Simoncini, Matrix functions, in Model Order Reduct. Theory, Res. Asp.
Appl., W. H. A. Schilders, H. A. van der Vorst, and J. Rommes, eds., vol. 13 of Mathematics
in Industry, Berlin, 2008, Springer, pp. 275–304.
[16] D. F. Gleich, C. Greif, and J. M. Varah, The power and Arnoldi methods in an algebra of
circulants, Numer. Linear Algebr. with Appl., 20 (2013), pp. 809–831, https://doi.org/10.
1002/nla, https://arxiv.org/abs/arXiv:1112.5346v3.
[17] N. J. Higham, Functions of Matrices, SIAM, Philadelphia, 2008.
[18] M. Hochbruck and A. Ostermann, Exponential integrators, Acta Numer., 19 (2010), pp. 209–
286, https://doi.org/10.1017/S0962492910000048.
[19] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge University Press,
Cambridge, 1991.
[20] B. N. Khoromskij, Tensor numerical methods for multidimensional PDES: theoretical analysis
and initial applications, ESAIM Proc. Surv., 48 (2015), pp. 1–28, https://doi.org/10.1051/
proc/201448001, http://www.esaim-proc.org/10.1051/proc/201448001.
[21] M. E. Kilmer, K. Braman, N. Hao, and R. C. Hoover, Third-order tensors as operators on
matrices: a theoretical and computational framework with applications in imaging, SIAM
J. Matrix Anal. Appl., 34 (2013), pp. 148–172, https://doi.org/10.1137/110837711.
[22] M. E. Kilmer and C. D. Martin, Factorization strategies for third-order tensors, Linear
Algebra Appl., 435 (2011), pp. 641–658, https://doi.org/10.1016/j.laa.2010.09.020, http://
dx.doi.org/10.1016/j.laa.2010.09.020.
[23] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev., 51
(2008), pp. 455–500, https://doi.org/10.1137/07070111X.
[24] T. G. Kolda and J. R. Mayo, Shifted power method for computing tensor eigenpairs, SIAM
J. Matrix Anal. Appl., 32 (2011), pp. 1095–1124.
[25] L.-H. Lim, Singular values and eigenvalues of tensors: a variational approach, in Proc. IEEE
Int. Work. Comput. Adv. Multi-Sensor Adapt. Process. (CAMSAP ’05), vol. 3, 2005,
pp. 129–132, http://ieeexplore.ieee.org/document/1574201/.
[26] M. Ng, L. Qi, and G. Zhou, Finding the largest eigenvalue of a nonnegative tensor, SIAM J.
Matrix Anal. Appl., 31 (2009), pp. 1090–1099.
[27] L. Qi, Eigenvalues of a real supersymmetric tensor, J. Symb. Comput., 40 (2005), pp. 1302–
1324, https://doi.org/10.1016/j.jsc.2005.05.007.
[28] L. Qi, Eigenvalues and invariants of tensors, J. Math. Anal. Appl., 325 (2007), pp. 1363–1377,
https://doi.org/10.1016/j.jmaa.2006.02.071.
[29] Y. Saad, Iterative methods for sparse linear systems, SIAM, Philadelphia, 2nd ed., 2003.
[30] V. Simoncini and L. Lopez, Analysis of projection methods for rational function approxima-
tion to the matrix exponential, SIAM J. Numer. Anal., 44 (2006), pp. 613–635, https://
doi.org/10.1137/05062590.
[31] L. N. Trefethen and D. I. Bau, Numerical Linear Algebra, SIAM, 1997.
[32] Z. Zhang, G. Ely, S. Aeron, N. Hao, and M. E. Kilmer, Novel methods for multilinear
data completion and de-noising based on tensor-SVD, Proc. IEEE Comput. Soc. Conf.
Comput. Vis. Pattern Recognit., (2014), pp. 3842–3849, https://doi.org/10.1109/CVPR.
2014.485, https://arxiv.org/abs/1407.1785.
[33] Q.-S. Zheng, Theory of representations for tensor functions– A unified invariant ap-
proach to constitutive equations, Appl. Mech. Rev., 47 (1994), p. 545, https://doi.
org/10.1115/1.3111066, http://appliedmechanicsreviews.asmedigitalcollection.asme.org/
article.aspx?articleid=1395390.