
Introduction to Matrix
Analysis and Applications

Fumio Hiai and Dénes Petz

Graduate School of Information Sciences
Tohoku University, Aoba-ku, Sendai, 980-8579, Japan
E-mail: hiai.fumio@gmail.com

Alfréd Rényi Institute of Mathematics
Reáltanoda utca 13-15, H-1364 Budapest, Hungary
E-mail: petz.denes@renyi.mta.hu
Preface
A part of the material of this book is based on the lectures of the authors in the Graduate School of Information Sciences of Tohoku University and in the Budapest University of Technology and Economics. The aim of the lectures was to explain certain important topics of matrix analysis from the point of view of functional analysis. The concept of Hilbert space appears many times, but only finite-dimensional spaces are used. The book treats some aspects of analysis related to matrices, including such topics as matrix monotone functions, matrix means, majorization, entropies, quantum Markov triplets and so on. Several of the matrix applications discussed here come from quantum theory.
The book is organized into seven chapters. Chapters 1-3 form an introductory part of the book and could be used as a textbook for an advanced undergraduate special topics course. The word "matrix" was first used in 1848, and applications soon appeared in many different areas. Chapters 4-7 contain a number of more advanced and less well-known topics. They could be used for an advanced specialized graduate-level course aimed at students who will specialize in quantum information, but the best use for this part is as a reference for active researchers in the field of quantum information theory. Researchers in statistics, engineering and economics may also find this book useful.
Chapter 1 contains the basic subjects. We prefer the Hilbert space concepts, so complex numbers are used. Spectrum and eigenvalues are important. Determinant and trace are used later in several applications. The tensor product has symmetric and antisymmetric subspaces. In this book positive means $\ge 0$; the word non-negative is not used here. The end of the chapter contains many exercises.
Chapter 2 contains block-matrices, partial ordering and an elementary theory of von Neumann algebras in the finite-dimensional setting. The Hilbert space concept requires the projections $P = P^2 = P^*$. Self-adjoint matrices are real linear combinations of projections. Not only single matrices are required; subalgebras are also used. The material includes Kadison's inequality and completely positive mappings.
Chapter 3 contains the matrix functional calculus. The functional calculus provides a new matrix $f(A)$ when a matrix $A$ and a function $f$ are given. This is an essential tool in matrix theory as well as in operator theory. A typical example is the exponential function $e^A = \sum_{n=0}^{\infty} A^n/n!$. If $f$ is sufficiently smooth, then $f(A)$ is also smooth and we have a useful Fréchet differential formula.
Chapter 4 contains matrix monotone functions. A real function defined on an interval is matrix monotone if $A \le B$ implies $f(A) \le f(B)$ for Hermitian matrices $A, B$ whose eigenvalues are in the domain interval. We have a beautiful theory of such functions, initiated by Löwner in 1934. A highlight is the integral representation of such functions. Matrix convex functions are also considered. Graduate students in mathematics and in information theory will benefit from a single source for all of this material.
Chapter 5 contains matrix (operator) means for positive matrices. Matrix extensions of the arithmetic mean $(a+b)/2$ and the harmonic mean $\left(\frac{a^{-1}+b^{-1}}{2}\right)^{-1}$ are rather trivial, but it is non-trivial to define the matrix version of the geometric mean $\sqrt{ab}$. This was first done by Pusz and Woronowicz. A general theory of matrix means developed by Kubo and Ando is closely related to operator monotone functions on $(0,\infty)$. There are also more complicated means. The mean transformation $M(A,B) := m(L_A, R_B)$, a mean of the left-multiplication $L_A$ and the right-multiplication $R_B$, was recently studied by Hiai and Kosaki. Another concept is a multivariable extension of two-variable matrix means.
Chapter 6 contains majorizations for eigenvalues and singular values of ma-
trices. Majorization is a certain order relation between two real vectors. Sec-
tion 6.1 recalls classical material that is available from other sources. There
are several famous majorizations for matrices which have strong applications
to matrix norm inequalities in symmetric norms. For instance, an extremely
useful inequality is called the Lidskii-Wielandt theorem.
The last chapter contains topics related to quantum applications. Positive matrices with trace 1 are the states in quantum theories and they are also called density matrices. The relative entropy appeared in 1962 and matrix theory has many applications in the quantum formalism. Unknown quantum states can be estimated by the use of positive operators $F(x)$ with $\sum_x F(x) = I$. Such a family is called a POVM, and a few mathematical results about it are presented, but in quantum theory there are many more relevant subjects. These subjects are close to the authors and some very recent results are included.
The authors thank several colleagues for useful communications. They are
particularly grateful to Professor Tsuyoshi Ando for insightful comments and
to Professor Rajendra Bhatia for valuable advice.
Fumio Hiai and Dénes Petz
April, 2013
Contents
1 Fundamentals of operators and matrices 5
1.1 Basics on matrices . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Hilbert space . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Jordan canonical form . . . . . . . . . . . . . . . . . . . . . . 17
1.4 Spectrum and eigenvalues . . . . . . . . . . . . . . . . . . . . 19
1.5 Trace and determinant . . . . . . . . . . . . . . . . . . . . . . 25
1.6 Positivity and absolute value . . . . . . . . . . . . . . . . . . . 31
1.7 Tensor product . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.8 Notes and remarks . . . . . . . . . . . . . . . . . . . . . . . . 49
1.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2 Mappings and algebras 59
2.1 Block-matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.2 Partial ordering . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.3 Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.4 Subalgebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.5 Kernel functions . . . . . . . . . . . . . . . . . . . . . . . . . . 87
2.6 Positivity-preserving mappings . . . . . . . . . . . . . . . . . . 90
2.7 Notes and remarks . . . . . . . . . . . . . . . . . . . . . . . . 99
2.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
3 Functional calculus and derivation 106
3.1 The exponential function . . . . . . . . . . . . . . . . . . . . . 107
3.2 Other functions . . . . . . . . . . . . . . . . . . . . . . . . . . 115
3.3 Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
3.4 Fréchet derivatives . . . . . . . . . . . . . . . . . . . . . . 129
3.5 Notes and remarks . . . . . . . . . . . . . . . . . . . . . . . . 135
3.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
4 Matrix monotone functions and convexity 140
4.1 Some examples of functions . . . . . . . . . . . . . . . . . . . 141
4.2 Convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
4.3 Pick functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
4.4 Löwner's theorem . . . . . . . . . . . . . . . . . . . . . . . 167
4.5 Some applications . . . . . . . . . . . . . . . . . . . . . . . . . 175
4.6 Notes and remarks . . . . . . . . . . . . . . . . . . . . . . . . 186
4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
5 Matrix means and inequalities 190
5.1 The geometric mean . . . . . . . . . . . . . . . . . . . . . . . 191
5.2 General theory . . . . . . . . . . . . . . . . . . . . . . . . . . 199
5.3 Mean examples . . . . . . . . . . . . . . . . . . . . . . . . . . 210
5.4 Mean transformation . . . . . . . . . . . . . . . . . . . . . . . 216
5.5 Notes and remarks . . . . . . . . . . . . . . . . . . . . . . . . 226
5.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
6 Majorization and singular values 231
6.1 Majorization of vectors . . . . . . . . . . . . . . . . . . . . . . 232
6.2 Singular values . . . . . . . . . . . . . . . . . . . . . . . . . . 237
6.3 Symmetric norms . . . . . . . . . . . . . . . . . . . . . . . . . 246
6.4 More majorizations for matrices . . . . . . . . . . . . . . . . . 258
6.5 Notes and remarks . . . . . . . . . . . . . . . . . . . . . . . . 273
6.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
7 Some applications 277
7.1 Gaussian Markov property . . . . . . . . . . . . . . . . . . . . 278
7.2 Entropies and monotonicity . . . . . . . . . . . . . . . . . . . 281
7.3 Quantum Markov triplets . . . . . . . . . . . . . . . . . . . . 292
7.4 Optimal quantum measurements . . . . . . . . . . . . . . . . . 297
7.5 Cramér-Rao inequality . . . . . . . . . . . . . . . . . . . . . 311
7.6 Notes and remarks . . . . . . . . . . . . . . . . . . . . . . . . 325
7.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
Index 327
Bibliography 334
Chapter 1
Fundamentals of operators and
matrices
A linear mapping is essentially a matrix if the vector space is finite-dimensional. In this book the vector space is typically a finite-dimensional complex Hilbert space. The first chapter collects introductory material on matrices and operators. Section 1.2 is a concise exposition of Hilbert spaces. The polar and the spectral decompositions, useful in studying operators in Hilbert spaces, are also essential for matrices. A finer decomposition for matrices is the Jordan canonical form in Section 1.3. Among the most basic notions for matrices are eigenvalues, singular values, trace and determinant, included in the subsequent sections. A less elementary but important subject is the tensor product, discussed in the last section.
1.1 Basics on matrices
For $n, m \in \mathbb{N}$, $M_{n\times m} = M_{n\times m}(\mathbb{C})$ denotes the space of all $n \times m$ complex matrices. A matrix $M \in M_{n\times m}$ is a mapping $\{1,2,\dots,n\} \times \{1,2,\dots,m\} \to \mathbb{C}$. It is represented as an array with $n$ rows and $m$ columns:
$$M = \begin{pmatrix} m_{11} & m_{12} & \cdots & m_{1m} \\ m_{21} & m_{22} & \cdots & m_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ m_{n1} & m_{n2} & \cdots & m_{nm} \end{pmatrix},$$
where $m_{ij}$ is the intersection of the $i$th row and the $j$th column. If the matrix is denoted by $M$, then this entry is denoted by $M_{ij}$. If $n = m$, then we write $M_n$ instead of $M_{n\times n}$. A simple example is the identity matrix $I_n \in M_n$ defined as $m_{ij} = \delta_{i,j}$, or
$$I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$
$M_{n\times m}$ is a complex vector space of dimension $nm$. The linear operations are defined as follows:
$$[\lambda A]_{ij} := \lambda A_{ij}, \qquad [A+B]_{ij} := A_{ij} + B_{ij},$$
where $\lambda$ is a complex number and $A, B \in M_{n\times m}$.
Example 1.1 For $i, j = 1, \dots, n$ let $E(ij)$ be the $n\times n$ matrix such that the $(i,j)$-entry equals one and all other entries equal zero. Then the $E(ij)$ are called the matrix units and form a basis of $M_n$:
$$A = \sum_{i,j=1}^n A_{ij}\, E(ij).$$
In particular,
$$I_n = \sum_{i=1}^n E(ii).$$
If $A \in M_{n\times m}$ and $B \in M_{m\times k}$, then the product $AB$ of $A$ and $B$ is defined by
$$[AB]_{ij} = \sum_{\ell=1}^m A_{i\ell} B_{\ell j},$$
where $1 \le i \le n$ and $1 \le j \le k$. Hence $AB \in M_{n\times k}$. So $M_n$ becomes an algebra. The most significant feature of matrices is the non-commutativity of the product: $AB \ne BA$ in general. For example,
$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
In the matrix algebra $M_n$, the identity matrix $I_n$ behaves as a unit: $I_n A = A I_n = A$ for every $A \in M_n$. The matrix $A \in M_n$ is invertible if there is a $B \in M_n$ such that $AB = BA = I_n$. This $B$ is called the inverse of $A$, in notation $A^{-1}$.
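As a quick numerical illustration (a sketch in Python with NumPy; the 2×2 matrices are the arbitrary examples used above), one can check the non-commutativity of the product and the defining property of the inverse:

    import numpy as np

    A = np.array([[0, 1], [0, 0]])
    B = np.array([[0, 0], [1, 0]])
    print(A @ B)            # [[1, 0], [0, 0]]
    print(B @ A)            # [[0, 0], [0, 1]]  -- so AB != BA

    M = np.array([[1.0, 2.0], [3.0, 4.0]])
    Minv = np.linalg.inv(M)                    # the inverse M^{-1}
    print(np.allclose(M @ Minv, np.eye(2)))    # True: M M^{-1} = I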
Example 1.2 The linear equations
$$ax + by = u, \qquad cx + dy = v$$
can be written in a matrix formalism:
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix}.$$
If $x$ and $y$ are the unknown parameters and the coefficient matrix is invertible, then the solution is
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1}\begin{pmatrix} u \\ v \end{pmatrix}.$$
So the solution of linear equations is based on the inverse matrix, which is formulated in Theorem 1.33.
The transpose $A^t$ of the matrix $A \in M_{n\times m}$ is an $m\times n$ matrix,
$$[A^t]_{ij} = A_{ji} \qquad (1 \le i \le m,\ 1 \le j \le n).$$
It is easy to see that if the product $AB$ is defined, then $(AB)^t = B^t A^t$. The adjoint matrix $A^*$ is the complex conjugate of the transpose $A^t$. The space $M_n$ is a *-algebra:
$$(AB)C = A(BC), \qquad (A+B)C = AC + BC, \qquad A(B+C) = AB + AC,$$
$$(A+B)^* = A^* + B^*, \qquad (\lambda A)^* = \bar\lambda A^*, \qquad (A^*)^* = A, \qquad (AB)^* = B^* A^*.$$
Let $A \in M_n$. The trace of $A$ is the sum of the diagonal entries:
$$\mathrm{Tr}\, A := \sum_{i=1}^n A_{ii}. \tag{1.1}$$
It is easy to show that $\mathrm{Tr}\, AB = \mathrm{Tr}\, BA$, see Theorem 1.28.
The determinant of $A \in M_n$ is slightly more complicated:
$$\det A := \sum_{\pi} (-1)^{\sigma(\pi)} A_{1\pi(1)} A_{2\pi(2)} \cdots A_{n\pi(n)}, \tag{1.2}$$
where the sum is over all permutations $\pi$ of the set $\{1, 2, \dots, n\}$ and $\sigma(\pi)$ is the parity of the permutation $\pi$. Therefore
$$\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc,$$
and another example is the following:
$$\det \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix} = A_{11} \det \begin{pmatrix} A_{22} & A_{23} \\ A_{32} & A_{33} \end{pmatrix} - A_{12} \det \begin{pmatrix} A_{21} & A_{23} \\ A_{31} & A_{33} \end{pmatrix} + A_{13} \det \begin{pmatrix} A_{21} & A_{22} \\ A_{31} & A_{32} \end{pmatrix}.$$
It can be proven that
$$\det(AB) = (\det A)(\det B).$$
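The multiplicativity of the determinant is easy to test numerically; a short sketch (Python/NumPy, randomly generated matrices):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3))
    # det(AB) = det(A) det(B)
    print(np.isclose(np.linalg.det(A @ B),
                     np.linalg.det(A) * np.linalg.det(B)))   # True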
1.2 Hilbert space

Let $\mathcal{H}$ be a complex vector space. A functional $\langle\,\cdot\,,\cdot\,\rangle : \mathcal{H}\times\mathcal{H}\to\mathbb{C}$ of two variables is called an inner product if it satisfies

(1) $\langle x+y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$ $(x, y, z \in \mathcal{H})$,
(2) $\langle \lambda x, y\rangle = \bar\lambda\,\langle x, y\rangle$ $(\lambda \in \mathbb{C},\ x, y \in \mathcal{H})$,
(3) $\langle x, y\rangle = \overline{\langle y, x\rangle}$ $(x, y \in \mathcal{H})$,
(4) $\langle x, x\rangle \ge 0$ for every $x \in \mathcal{H}$ and $\langle x, x\rangle = 0$ only for $x = 0$.

Condition (2) states that the inner product is conjugate linear in the first variable (and it is linear in the second variable). The Schwarz inequality
$$|\langle x, y\rangle|^2 \le \langle x, x\rangle\, \langle y, y\rangle \tag{1.3}$$
holds. The inner product determines a norm for the vectors:
$$\|x\| := \sqrt{\langle x, x\rangle}.$$
This has the properties
$$\|x+y\| \le \|x\| + \|y\| \quad\text{and}\quad |\langle x, y\rangle| \le \|x\|\,\|y\|.$$
$\|x\|$ is interpreted as the length of the vector $x$. A further requirement in the definition of a Hilbert space is that every Cauchy sequence must be convergent, that is, the space is complete. (In the finite-dimensional case, completeness always holds.)

The linear space $\mathbb{C}^n$ of all $n$-tuples of complex numbers becomes a Hilbert space with the inner product
$$\langle x, y\rangle = \sum_{i=1}^n \overline{x_i}\, y_i = [\overline{x_1}, \overline{x_2}, \dots, \overline{x_n}] \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},$$
where $\bar z$ denotes the complex conjugate of the complex number $z \in \mathbb{C}$. Another example is the space of square integrable complex-valued functions on the real Euclidean space $\mathbb{R}^n$. If $f$ and $g$ are such functions, then
$$\langle f, g\rangle = \int_{\mathbb{R}^n} \overline{f(x)}\, g(x)\, dx$$
gives the inner product. The latter space is denoted by $L^2(\mathbb{R}^n)$ and it is infinite-dimensional, contrary to the $n$-dimensional space $\mathbb{C}^n$. Below we are mostly satisfied with finite-dimensional spaces.
If $\langle x, y\rangle = 0$ for vectors $x$ and $y$ of a Hilbert space, then $x$ and $y$ are called orthogonal, in notation $x \perp y$. When $H \subset \mathcal{H}$, $H^{\perp} := \{x \in \mathcal{H} : x \perp h$ for every $h \in H\}$ is called the orthogonal complement of $H$. For any subset $H \subset \mathcal{H}$, $H^{\perp}$ is a closed subspace.

A family $\{e_i\}$ of vectors is called orthonormal if $\langle e_i, e_i\rangle = 1$ and $\langle e_i, e_j\rangle = 0$ if $i \ne j$. A maximal orthonormal system is called a basis or orthonormal basis. The cardinality of a basis is called the dimension of the Hilbert space. (The cardinality of any two bases is the same.)

In the space $\mathbb{C}^n$, the standard orthonormal basis consists of the vectors
$$\delta_1 = (1, 0, \dots, 0),\ \delta_2 = (0, 1, 0, \dots, 0),\ \dots,\ \delta_n = (0, 0, \dots, 0, 1); \tag{1.4}$$
each vector has the coordinate 0 exactly $n-1$ times and one coordinate equal to 1.

Example 1.3 The space $M_n$ of matrices becomes a Hilbert space with the inner product
$$\langle A, B\rangle = \mathrm{Tr}\, A^* B,$$
which is called the Hilbert-Schmidt inner product. The matrix units $E(ij)$ $(1 \le i, j \le n)$ form an orthonormal basis.
It follows that the Hilbert-Schmidt norm
$$\|A\|_2 := \sqrt{\langle A, A\rangle} = \sqrt{\mathrm{Tr}\, A^* A} = \Big( \sum_{i,j=1}^n |A_{ij}|^2 \Big)^{1/2} \tag{1.5}$$
is a norm for the matrices.
Assume that in an $n$-dimensional Hilbert space linearly independent vectors $v_1, v_2, \dots, v_n$ are given. By the Gram-Schmidt procedure an orthonormal basis can be obtained by linear combinations:
$$e_1 := \frac{1}{\|v_1\|}\, v_1,$$
$$e_2 := \frac{1}{\|w_2\|}\, w_2 \quad\text{with}\quad w_2 := v_2 - \langle e_1, v_2\rangle e_1,$$
$$e_3 := \frac{1}{\|w_3\|}\, w_3 \quad\text{with}\quad w_3 := v_3 - \langle e_1, v_3\rangle e_1 - \langle e_2, v_3\rangle e_2,$$
$$\vdots$$
$$e_n := \frac{1}{\|w_n\|}\, w_n \quad\text{with}\quad w_n := v_n - \langle e_1, v_n\rangle e_1 - \dots - \langle e_{n-1}, v_n\rangle e_{n-1}.$$
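The Gram-Schmidt procedure translates directly into code. A minimal sketch (Python/NumPy, assuming the given vectors are linearly independent):

    import numpy as np

    def gram_schmidt(v):
        """Orthonormalize the rows of v by the Gram-Schmidt procedure."""
        e = []
        for vk in v:
            # subtract the projections <e_j, v_k> e_j onto the vectors found so far
            wk = vk - sum(np.vdot(ej, vk) * ej for ej in e)
            e.append(wk / np.linalg.norm(wk))
        return np.array(e)

    v = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
    E = gram_schmidt(v)
    print(np.allclose(E @ E.conj().T, np.eye(3)))   # rows are orthonormal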
The next theorem tells that any vector has a unique Fourier expansion.

Theorem 1.4 Let $e_1, e_2, \dots$ be a basis in a Hilbert space $\mathcal{H}$. Then for any vector $x \in \mathcal{H}$ the expansion
$$x = \sum_n \langle e_n, x\rangle\, e_n$$
holds. Moreover,
$$\|x\|^2 = \sum_n |\langle e_n, x\rangle|^2.$$

Let $\mathcal{H}$ and $\mathcal{K}$ be Hilbert spaces. A mapping $A : \mathcal{H} \to \mathcal{K}$ is called linear if it preserves linear combinations:
$$A(\lambda f + \mu g) = \lambda Af + \mu Ag \qquad (f, g \in \mathcal{H},\ \lambda, \mu \in \mathbb{C}).$$
The kernel and the range of $A$ are
$$\ker A := \{x \in \mathcal{H} : Ax = 0\}, \qquad \mathrm{ran}\, A := \{Ax \in \mathcal{K} : x \in \mathcal{H}\}.$$
The dimension formula familiar in linear algebra is
$$\dim \mathcal{H} = \dim(\ker A) + \dim(\mathrm{ran}\, A).$$
The quantity $\dim(\mathrm{ran}\, A)$ is called the rank of $A$; $\mathrm{rank}\, A$ is the notation. It is easy to see that $\mathrm{rank}\, A \le \min\{\dim \mathcal{H}, \dim \mathcal{K}\}$.
Let $e_1, e_2, \dots, e_n$ be a basis of the Hilbert space $\mathcal{H}$ and $f_1, f_2, \dots, f_m$ be a basis of $\mathcal{K}$. The linear mapping $A : \mathcal{H} \to \mathcal{K}$ is determined by the vectors $Ae_j$, $j = 1, 2, \dots, n$. Furthermore, the vector $Ae_j$ is determined by its coordinates:
$$Ae_j = c_{1,j} f_1 + c_{2,j} f_2 + \dots + c_{m,j} f_m.$$
The numbers $c_{i,j}$, $1 \le i \le m$, $1 \le j \le n$, form an $m \times n$ matrix, which is called the matrix of the linear transformation $A$ with respect to the bases $(e_1, e_2, \dots, e_n)$ and $(f_1, f_2, \dots, f_m)$. If we want to distinguish the linear operator $A$ from its matrix, then the latter will be denoted by $[A]$. We have
$$[A]_{ij} = \langle f_i, Ae_j\rangle \qquad (1 \le i \le m,\ 1 \le j \le n).$$
Note that the order of the basis vectors is important. We shall mostly consider linear operators of a Hilbert space into itself. Then only one basis is needed and the matrix of the operator is square. So a linear transformation and a basis yield a matrix. If an $n \times n$ matrix is given, then it can always be considered as a linear transformation of the space $\mathbb{C}^n$ endowed with the standard basis (1.4).
The inner product of the vectors $|x\rangle$ and $|y\rangle$ will often be denoted as $\langle x|y\rangle$; this notation, sometimes called bra and ket, is popular in physics. On the other hand, $|x\rangle\langle y|$ is a linear operator which acts on the vector $|z\rangle$ as
$$(|x\rangle\langle y|)\,|z\rangle := |x\rangle\,\langle y|z\rangle \equiv \langle y|z\rangle\,|x\rangle.$$
Therefore,
$$|x\rangle\langle y| = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} [\overline{y_1}, \overline{y_2}, \dots, \overline{y_n}]$$
is conjugate linear in $|y\rangle$, while $\langle x|y\rangle$ is linear in $|y\rangle$.
The next example shows the possible use of the bra and ket.

Example 1.5 If $X, Y \in M_n(\mathbb{C})$, then
$$\sum_{i,j=1}^n \mathrm{Tr}\, E(ij)\,X\,E(ji)\,Y = (\mathrm{Tr}\, X)(\mathrm{Tr}\, Y). \tag{1.6}$$
Since both sides are bilinear in the variables $X$ and $Y$, it is enough to check the case $X = E(ab)$ and $Y = E(cd)$. A simple computation gives that the left-hand side is $\delta_{ab}\,\delta_{cd}$, and this is the same as the right-hand side.
Another possibility is to use the formula $E(ij) = |e_i\rangle\langle e_j|$. So
$$\sum_{i,j} \mathrm{Tr}\, E(ij)\,X\,E(ji)\,Y = \sum_{i,j} \mathrm{Tr}\, |e_i\rangle\langle e_j|X|e_j\rangle\langle e_i|Y = \sum_{i,j} \langle e_j|X|e_j\rangle\,\langle e_i|Y|e_i\rangle = \Big(\sum_j \langle e_j|X|e_j\rangle\Big)\Big(\sum_i \langle e_i|Y|e_i\rangle\Big)$$
and the last expression is $(\mathrm{Tr}\, X)(\mathrm{Tr}\, Y)$.
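Identity (1.6) can also be verified numerically. A short sketch (Python/NumPy, random complex matrices):

    import numpy as np

    n = 3
    rng = np.random.default_rng(1)
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

    def E(i, j, n=n):
        M = np.zeros((n, n))
        M[i, j] = 1.0          # matrix unit E(ij)
        return M

    lhs = sum(np.trace(E(i, j) @ X @ E(j, i) @ Y)
              for i in range(n) for j in range(n))
    print(np.isclose(lhs, np.trace(X) * np.trace(Y)))   # True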
Example 1.6 Fix a natural number $n$ and let $\mathcal{H}$ be the space of polynomials of degree at most $n$. Assume that the variable of these polynomials is $t$ and the coefficients are complex numbers. The typical elements are
$$p(t) = \sum_{i=0}^n u_i t^i \quad\text{and}\quad q(t) = \sum_{i=0}^n v_i t^i.$$
If their inner product is defined as
$$\langle p(t), q(t)\rangle := \sum_{i=0}^n \overline{u_i}\, v_i,$$
then $\{1, t, t^2, \dots, t^n\}$ is an orthonormal basis.
Differentiation is a linear operator:
$$\sum_{k=0}^n u_k t^k \mapsto \sum_{k=1}^n k u_k t^{k-1}.$$
In the above basis, its matrix is
$$\begin{pmatrix}
0 & 1 & 0 & \dots & 0 & 0 \\
0 & 0 & 2 & \dots & 0 & 0 \\
0 & 0 & 0 & \dots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \dots & 0 & n \\
0 & 0 & 0 & \dots & 0 & 0
\end{pmatrix}.$$
This is an upper triangular matrix; the $(i, j)$ entry is 0 if $i > j$.
Let $\mathcal{H}_1$, $\mathcal{H}_2$ and $\mathcal{H}_3$ be Hilbert spaces and let us fix a basis in each of them. If $B : \mathcal{H}_1 \to \mathcal{H}_2$ and $A : \mathcal{H}_2 \to \mathcal{H}_3$ are linear mappings, then the composition
$$f \mapsto A(Bf) \in \mathcal{H}_3 \qquad (f \in \mathcal{H}_1)$$
is linear as well and it is denoted by $AB$. The matrix $[AB]$ of the composition $AB$ can be computed from the matrices $[A]$ and $[B]$ as follows:
$$[AB]_{ij} = \sum_k [A]_{ik} [B]_{kj}.$$
The right-hand side is defined to be the product $[A]\,[B]$ of the matrices $[A]$ and $[B]$, that is, $[AB] = [A]\,[B]$ holds. It is obvious that for an $\ell \times m$ matrix $[A]$ and an $m \times n$ matrix $[B]$, their product $[A]\,[B]$ is an $\ell \times n$ matrix.

Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be Hilbert spaces and let us fix a basis in each of them. If $A, B : \mathcal{H}_1 \to \mathcal{H}_2$ are linear mappings, then their linear combination
$$(\lambda A + \mu B) f := \lambda(Af) + \mu(Bf)$$
is a linear mapping and
$$[\lambda A + \mu B]_{ij} = \lambda [A]_{ij} + \mu [B]_{ij}.$$

Let $\mathcal{H}$ be a Hilbert space. The linear operators $\mathcal{H} \to \mathcal{H}$ form an algebra. This algebra $B(\mathcal{H})$ has a unit, the identity operator denoted by $I$, and the product is non-commutative. Assume that $\mathcal{H}$ is $n$-dimensional and fix a basis. Then to each linear operator $A \in B(\mathcal{H})$ an $n \times n$ matrix $[A]$ is associated. The correspondence $A \mapsto [A]$ is an algebraic isomorphism from $B(\mathcal{H})$ to the algebra $M_n(\mathbb{C})$ of $n \times n$ matrices. This isomorphism shows that the theory of linear operators on an $n$-dimensional Hilbert space is the same as the theory of $n \times n$ matrices.
Theorem 1.7 (Riesz-Fischer theorem) Let $\phi : \mathcal{H} \to \mathbb{C}$ be a linear mapping on a finite-dimensional Hilbert space $\mathcal{H}$. Then there is a unique vector $v \in \mathcal{H}$ such that $\phi(x) = \langle v, x\rangle$ for every vector $x \in \mathcal{H}$.

Proof: Let $e_1, e_2, \dots, e_n$ be an orthonormal basis in $\mathcal{H}$. Then we need a vector $v \in \mathcal{H}$ such that $\phi(e_i) = \langle v, e_i\rangle$. So
$$v = \sum_i \overline{\phi(e_i)}\, e_i$$
will satisfy the condition. $\square$

The linear mappings $\phi : \mathcal{H} \to \mathbb{C}$ are called functionals. If the Hilbert space is not finite-dimensional, then in the previous theorem the boundedness condition $|\phi(x)| \le c\|x\|$ should be added, where $c$ is a positive number.

Let $\mathcal{H}$ and $\mathcal{K}$ be finite-dimensional Hilbert spaces. The operator norm of a linear operator $A : \mathcal{H} \to \mathcal{K}$ is defined as
$$\|A\| := \sup\{\|Ax\| : x \in \mathcal{H},\ \|x\| = 1\}.$$
It can be shown that $\|A\|$ is finite. In addition to the common properties $\|A+B\| \le \|A\| + \|B\|$ and $\|\lambda A\| = |\lambda|\,\|A\|$, the submultiplicativity
$$\|AB\| \le \|A\|\,\|B\|$$
also holds.
If $\|A\| \le 1$, then the operator $A$ is called a contraction.
The set of linear operators $\mathcal{H} \to \mathcal{H}$ is denoted by $B(\mathcal{H})$. The convergence $A_n \to A$ means $\|A - A_n\| \to 0$. In the case of a finite-dimensional Hilbert space the norm here can be the operator norm, but the Hilbert-Schmidt norm can also be used. The operator norm of a matrix is not expressed explicitly by the matrix entries.

Example 1.8 Let $A \in B(\mathcal{H})$ and $\|A\| < 1$. Then $I - A$ is invertible and
$$(I - A)^{-1} = \sum_{n=0}^{\infty} A^n.$$
Since
$$(I - A) \sum_{n=0}^{N} A^n = I - A^{N+1} \quad\text{and}\quad \|A^{N+1}\| \le \|A\|^{N+1},$$
we can see that the limit of the first equation is
$$(I - A) \sum_{n=0}^{\infty} A^n = I.$$
This shows the statement; the formula is called a Neumann series.
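A numerical sketch of the Neumann series (Python/NumPy; the matrix below is an arbitrary example with operator norm smaller than 1):

    import numpy as np

    A = np.array([[0.1, 0.4], [0.2, 0.3]])
    assert np.linalg.norm(A, 2) < 1           # operator norm of A is below 1

    S = np.zeros_like(A)
    P = np.eye(2)
    for _ in range(200):                      # partial sums of sum_n A^n
        S += P
        P = P @ A
    print(np.allclose(S, np.linalg.inv(np.eye(2) - A)))   # True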
Let $\mathcal{H}$ and $\mathcal{K}$ be Hilbert spaces. If $T : \mathcal{H} \to \mathcal{K}$ is a linear operator, then its adjoint $T^* : \mathcal{K} \to \mathcal{H}$ is determined by the formula
$$\langle x, Ty\rangle_{\mathcal{K}} = \langle T^* x, y\rangle_{\mathcal{H}} \qquad (x \in \mathcal{K},\ y \in \mathcal{H}).$$
The operator $T \in B(\mathcal{H})$ is called self-adjoint if $T^* = T$. The operator $T$ is self-adjoint if and only if $\langle x, Tx\rangle$ is a real number for every vector $x \in \mathcal{H}$. For the self-adjoint operators on $\mathcal{H}$ and the self-adjoint $n \times n$ matrices the notations $B(\mathcal{H})^{\mathrm{sa}}$ and $M_n^{\mathrm{sa}}$ are used.

Theorem 1.9 The properties of the adjoint are:
(1) $(A+B)^* = A^* + B^*$, $(\lambda A)^* = \bar\lambda A^*$ $(\lambda \in \mathbb{C})$,
(2) $(A^*)^* = A$, $(AB)^* = B^* A^*$,
(3) $(A^{-1})^* = (A^*)^{-1}$ if $A$ is invertible,
(4) $\|A\| = \|A^*\|$, $\|A^* A\| = \|A\|^2$.
Example 1.10 Let $A : \mathcal{H} \to \mathcal{H}$ be a linear operator and $e_1, e_2, \dots, e_n$ be a basis in the Hilbert space $\mathcal{H}$. The $(i, j)$ element of the matrix of $A$ is $\langle e_i, Ae_j\rangle$. Since
$$\langle e_i, Ae_j\rangle = \overline{\langle e_j, A^* e_i\rangle},$$
this is the complex conjugate of the $(j, i)$ element of the matrix of $A^*$.
If $A$ is self-adjoint, then the $(i, j)$ element of the matrix of $A$ is the conjugate of the $(j, i)$ element. In particular, all diagonal entries are real. The self-adjoint matrices are also called Hermitian matrices.

Theorem 1.11 (Projection theorem) Let $\mathcal{M}$ be a closed subspace of a Hilbert space $\mathcal{H}$. Any vector $x \in \mathcal{H}$ can be written in a unique way in the form $x = x_0 + y$, where $x_0 \in \mathcal{M}$ and $y \perp \mathcal{M}$.

Note that a subspace of a finite-dimensional Hilbert space is always closed. The mapping $P : x \mapsto x_0$ defined in the context of the previous theorem is called the orthogonal projection onto the subspace $\mathcal{M}$. This mapping is linear:
$$P(\lambda x + \mu y) = \lambda Px + \mu Py.$$
Moreover, $P^2 = P = P^*$. The converse is also true: if $P^2 = P = P^*$, then $P$ is an orthogonal projection (onto its range).
Example 1.12 The matrix $A \in M_n$ is self-adjoint if $A_{ji} = \overline{A_{ij}}$. A particular example is a Toeplitz matrix:
$$\begin{pmatrix}
a_1 & a_2 & a_3 & \dots & a_{n-1} & a_n \\
\overline{a_2} & a_1 & a_2 & \dots & a_{n-2} & a_{n-1} \\
\overline{a_3} & \overline{a_2} & a_1 & \dots & a_{n-3} & a_{n-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\overline{a_{n-1}} & \overline{a_{n-2}} & \overline{a_{n-3}} & \dots & a_1 & a_2 \\
\overline{a_n} & \overline{a_{n-1}} & \overline{a_{n-2}} & \dots & \overline{a_2} & a_1
\end{pmatrix},$$
where $a_1 \in \mathbb{R}$.

An operator $U \in B(\mathcal{H})$ is called a unitary if $U^*$ is the inverse of $U$. Then $U^* U = I$ and
$$\langle x, y\rangle = \langle U^* Ux, y\rangle = \langle Ux, Uy\rangle$$
for any vectors $x, y \in \mathcal{H}$. Therefore the unitary operators preserve the inner product. In particular, orthogonal unit vectors are mapped by a unitary operator into orthogonal unit vectors.
Example 1.13 The permutation matrices are simple unitaries. Let $\pi$ be a permutation of the set $\{1, 2, \dots, n\}$. The $A_{i,\pi(i)}$ entries of $A \in M_n(\mathbb{C})$ are 1 and all others are 0. Every row and every column contain exactly one entry 1. If such a matrix $A$ is applied to a vector, it permutes the coordinates:
$$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} x_2 \\ x_3 \\ x_1 \end{pmatrix}.$$
This shows the reason for the terminology. Another possible formalism is $A(x_1, x_2, x_3) = (x_2, x_3, x_1)$.
An operator $A \in B(\mathcal{H})$ is called normal if $AA^* = A^* A$. It follows immediately that
$$\|Ax\| = \|A^* x\|$$
for any vector $x \in \mathcal{H}$. Self-adjoint and unitary operators are normal.

The operators we need are mostly linear, but sometimes conjugate-linear operators appear. $\Lambda : \mathcal{H} \to \mathcal{K}$ is conjugate-linear if
$$\Lambda(\lambda x + \mu y) = \bar\lambda\, \Lambda x + \bar\mu\, \Lambda y$$
for any complex numbers $\lambda$ and $\mu$ and for any vectors $x, y \in \mathcal{H}$. The adjoint $\Lambda^*$ of a conjugate-linear operator is determined by the equation
$$\langle x, \Lambda y\rangle_{\mathcal{K}} = \langle y, \Lambda^* x\rangle_{\mathcal{H}} \qquad (x \in \mathcal{K},\ y \in \mathcal{H}). \tag{1.7}$$
A mapping $\phi : \mathcal{H} \times \mathcal{H} \to \mathbb{C}$ is called a complex bilinear form if $\phi$ is linear in the second variable and conjugate linear in the first variable. The inner product is a particular example.

Theorem 1.14 On a finite-dimensional Hilbert space there is a one-to-one correspondence
$$\phi(x, y) = \langle Ax, y\rangle$$
between the complex bilinear forms $\phi : \mathcal{H} \times \mathcal{H} \to \mathbb{C}$ and the linear operators $A : \mathcal{H} \to \mathcal{H}$.

Proof: Fix $x \in \mathcal{H}$. Then $y \mapsto \phi(x, y)$ is a linear functional. Due to the Riesz-Fischer theorem $\phi(x, y) = \langle z, y\rangle$ for a vector $z \in \mathcal{H}$. We set $Ax = z$. $\square$

The polarization identity
$$4\phi(x, y) = \phi(x+y, x+y) + \mathrm{i}\,\phi(x+\mathrm{i}y, x+\mathrm{i}y) - \phi(x-y, x-y) - \mathrm{i}\,\phi(x-\mathrm{i}y, x-\mathrm{i}y) \tag{1.8}$$
shows that a complex bilinear form $\phi$ is determined by its so-called quadratic form $x \mapsto \phi(x, x)$.
1.3 Jordan canonical form

A Jordan block is a matrix
$$J_k(a) = \begin{pmatrix}
a & 1 & 0 & \dots & 0 \\
0 & a & 1 & \dots & 0 \\
0 & 0 & a & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & 1 \\
0 & 0 & 0 & \dots & a
\end{pmatrix},$$
where $a \in \mathbb{C}$. This is an upper triangular matrix $J_k(a) \in M_k$. We also use the notation $J_k := J_k(0)$. Then
$$J_k(a) = a I_k + J_k$$
and the sum consists of commuting matrices.
Example 1.15 The matrix $J_k$ is
$$(J_k)_{ij} = \begin{cases} 1 & \text{if } j = i+1, \\ 0 & \text{otherwise.} \end{cases}$$
Therefore
$$(J_k)_{ij} (J_k)_{jk} = \begin{cases} 1 & \text{if } j = i+1 \text{ and } k = i+2, \\ 0 & \text{otherwise.} \end{cases}$$
It follows that
$$(J_k^2)_{ij} = \begin{cases} 1 & \text{if } j = i+2, \\ 0 & \text{otherwise.} \end{cases}$$
We observe that, taking the powers of $J_k$, the line of 1 entries moves upwards; in particular $J_k^k = 0$. The matrices $J_k^m$ $(0 \le m \le k-1)$ are linearly independent.
If $a \ne 0$, then $\det J_k(a) \ne 0$ and $J_k(a)$ is invertible. We can search for the inverse by the equation
$$(a I_k + J_k)\Big( \sum_{j=0}^{k-1} c_j J_k^j \Big) = I_k.$$
Rewriting this equation we get
$$a c_0 I_k + \sum_{j=1}^{k-1} (a c_j + c_{j-1}) J_k^j = I_k.$$
The solution is
$$c_j = (-1)^j a^{-(j+1)} \qquad (0 \le j \le k-1).$$
In particular,
$$\begin{pmatrix} a & 1 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{pmatrix}^{-1} = \begin{pmatrix} a^{-1} & -a^{-2} & a^{-3} \\ 0 & a^{-1} & -a^{-2} \\ 0 & 0 & a^{-1} \end{pmatrix}.$$
Computation with Jordan blocks is convenient.
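The inverse of a 3×3 Jordan block can be checked directly; a minimal sketch (Python/NumPy, with the arbitrary value a = 2):

    import numpy as np

    a = 2.0
    J = np.array([[a, 1, 0], [0, a, 1], [0, 0, a]])
    Jinv = np.array([[1/a, -1/a**2,  1/a**3],
                     [0,    1/a,    -1/a**2],
                     [0,    0,       1/a   ]])
    print(np.allclose(J @ Jinv, np.eye(3)))   # True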
The Jordan canonical form theorem is the following.

Theorem 1.16 Given a matrix $X \in M_n$, there is an invertible matrix $S \in M_n$ such that
$$X = S \begin{pmatrix}
J_{k_1}(\lambda_1) & 0 & \dots & 0 \\
0 & J_{k_2}(\lambda_2) & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & J_{k_m}(\lambda_m)
\end{pmatrix} S^{-1} = S J S^{-1},$$
where $k_1 + k_2 + \dots + k_m = n$. The Jordan matrix $J$ is uniquely determined (up to the permutation of the Jordan blocks in the diagonal).

Note that the numbers $\lambda_1, \lambda_2, \dots, \lambda_m$ are not necessarily different. The theorem is about complex matrices. Example 1.15 showed that it is rather easy to handle a Jordan block. If the Jordan canonical decomposition is known, then the inverse can be computed.

Example 1.17 An essential application concerns the determinant. Since $\det X = \det(S J S^{-1}) = \det J$, it is enough to compute the determinant of the upper-triangular Jordan matrix $J$. Therefore
$$\det X = \prod_{j=1}^m \lambda_j^{k_j}. \tag{1.9}$$
The characteristic polynomial of $X \in M_n$ is defined as
$$p(x) := \det(x I_n - X).$$
From the computation (1.9) we have
$$p(x) = \prod_{j=1}^m (x - \lambda_j)^{k_j} = x^n - \Big( \sum_{j=1}^m k_j \lambda_j \Big) x^{n-1} + \dots + (-1)^n \prod_{j=1}^m \lambda_j^{k_j}. \tag{1.10}$$
The numbers $\lambda_j$ are the roots of the characteristic polynomial.
The powers of a matrix $X \in M_n$ are well-defined. For a polynomial $p(x) = \sum_{k=0}^m c_k x^k$ the matrix $p(X)$ is
$$\sum_{k=0}^m c_k X^k.$$
A polynomial $q$ is said to annihilate a matrix $X \in M_n$ if $q(X) = 0$.
The next result is the Cayley-Hamilton theorem.

Theorem 1.18 If $p$ is the characteristic polynomial of $X \in M_n$, then $p(X) = 0$.
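A quick numerical check of the Cayley-Hamilton theorem (Python/NumPy; np.poly returns the coefficients of the characteristic polynomial of a square matrix):

    import numpy as np

    X = np.array([[1.0, 2.0], [3.0, 4.0]])
    c = np.poly(X)        # coefficients of p(x) = det(xI - X), highest power first
    pX = sum(ck * np.linalg.matrix_power(X, len(c) - 1 - k)
             for k, ck in enumerate(c))
    print(np.allclose(pX, np.zeros((2, 2))))   # p(X) = 0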
1.4 Spectrum and eigenvalues

Let $\mathcal{H}$ be a Hilbert space. For $A \in B(\mathcal{H})$ and $\lambda \in \mathbb{C}$, we say that $\lambda$ is an eigenvalue of $A$ if there is a non-zero vector $v \in \mathcal{H}$ such that $Av = \lambda v$. Such a vector $v$ is called an eigenvector of $A$ for the eigenvalue $\lambda$. If $\mathcal{H}$ is finite-dimensional, then $\lambda \in \mathbb{C}$ is an eigenvalue of $A$ if and only if $A - \lambda I$ is not invertible.
Generally, the spectrum $\sigma(A)$ of $A \in B(\mathcal{H})$ consists of the numbers $\lambda \in \mathbb{C}$ such that $A - \lambda I$ is not invertible. Therefore in the finite-dimensional case the spectrum is the set of eigenvalues.

Example 1.19 We show that $\sigma(AB) = \sigma(BA)$ for $A, B \in M_n$. It is enough to prove that $\det(\lambda I - AB) = \det(\lambda I - BA)$. Assume first that $A$ is invertible. We then have
$$\det(\lambda I - AB) = \det\big(A^{-1}(\lambda I - AB)A\big) = \det(\lambda I - BA)$$
and hence $\sigma(AB) = \sigma(BA)$.
When $A$ is not invertible, choose a sequence $\lambda_k \in \mathbb{C} \setminus \sigma(A)$ with $\lambda_k \to 0$ and set $A_k := A - \lambda_k I$. Then
$$\det(\lambda I - AB) = \lim_k \det(\lambda I - A_k B) = \lim_k \det(\lambda I - B A_k) = \det(\lambda I - BA).$$
(Another argument is in Exercise 3 of Chapter 2.)
Example 1.20 In the history of matrix theory the particular matrix
$$\begin{pmatrix}
0 & 1 & 0 & \dots & 0 & 0 \\
1 & 0 & 1 & \dots & 0 & 0 \\
0 & 1 & 0 & \dots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \dots & 0 & 1 \\
0 & 0 & 0 & \dots & 1 & 0
\end{pmatrix} \tag{1.11}$$
has importance. Its eigenvalues were computed by Joseph Louis Lagrange in 1759. He found that the eigenvalues are $2\cos\big(j\pi/(n+1)\big)$ $(j = 1, 2, \dots, n)$.
The matrix (1.11) is tridiagonal. This means that $A_{ij} = 0$ if $|i - j| > 1$.
Example 1.21 Let $\lambda \in \mathbb{R}$ and consider the matrix
$$J_3(\lambda) = \begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{pmatrix}.$$
Now $\lambda$ is the only eigenvalue and $(1, 0, 0)$ is the only eigenvector up to constant multiple. The situation is similar in the $k \times k$ generalization $J_k(\lambda)$: $\lambda$ is the eigenvalue of $S J_k(\lambda) S^{-1}$ for an arbitrary invertible $S$ and there is one eigenvector (up to constant multiple).
If $X$ has the Jordan form as in Theorem 1.16, then all the $\lambda_j$'s are eigenvalues. Therefore the roots of the characteristic polynomial are eigenvalues. When $\lambda$ is an eigenvalue of $X$, $\ker(X - \lambda I) = \{v \in \mathbb{C}^n : (X - \lambda I)v = 0\}$ is called the eigenspace of $X$ for $\lambda$. Note that the dimension of $\ker(X - \lambda I)$ is the number of $j$ such that $\lambda_j = \lambda$, which is called the geometric multiplicity of $\lambda$; on the other hand, the multiplicity of $\lambda$ as a root of the characteristic polynomial is called the algebraic multiplicity.
For the above $J_3(\lambda)$ we can see that
$$J_3(\lambda)(0, 0, 1) = (0, 1, \lambda), \qquad J_3(\lambda)^2 (0, 0, 1) = (1, 2\lambda, \lambda^2).$$
Therefore $(0, 0, 1)$ and these two vectors linearly span the whole space $\mathbb{C}^3$. The vector $(0, 0, 1)$ is called a cyclic vector.
Assume that a matrix $X \in M_n$ has a cyclic vector $v \in \mathbb{C}^n$, which means that the set $\{v, Xv, X^2 v, \dots, X^{n-1} v\}$ spans $\mathbb{C}^n$. Then $X = S J_n(\lambda) S^{-1}$ with some invertible matrix $S$, so the Jordan canonical form consists of a single block.
Theorem 1.22 Assume that $A \in B(\mathcal{H})$ is normal. Then there exist $\lambda_1, \dots, \lambda_n \in \mathbb{C}$ and $u_1, \dots, u_n \in \mathcal{H}$ such that $\{u_1, \dots, u_n\}$ is an orthonormal basis of $\mathcal{H}$ and $Au_i = \lambda_i u_i$ for all $1 \le i \le n$.

Proof: Let us prove by induction on $n = \dim \mathcal{H}$. The case $n = 1$ trivially holds. Suppose the assertion holds for dimension $n-1$. Assume that $\dim \mathcal{H} = n$ and $A \in B(\mathcal{H})$ is normal. Choose a root $\lambda_1$ of $\det(\lambda I - A) = 0$. As explained before the theorem, $\lambda_1$ is an eigenvalue of $A$ so that there is an eigenvector $u_1$ with $Au_1 = \lambda_1 u_1$. One may assume that $u_1$ is a unit vector, i.e., $\|u_1\| = 1$. Since $A$ is normal, we have
$$(A - \lambda_1 I)^*(A - \lambda_1 I) = (A^* - \overline{\lambda_1} I)(A - \lambda_1 I) = A^* A - \lambda_1 A^* - \overline{\lambda_1} A + \lambda_1\overline{\lambda_1}\, I = A A^* - \lambda_1 A^* - \overline{\lambda_1} A + \lambda_1\overline{\lambda_1}\, I = (A - \lambda_1 I)(A - \lambda_1 I)^*,$$
that is, $A - \lambda_1 I$ is also normal. Therefore,
$$\|(A^* - \overline{\lambda_1} I) u_1\| = \|(A - \lambda_1 I)^* u_1\| = \|(A - \lambda_1 I) u_1\| = 0,$$
so that $A^* u_1 = \overline{\lambda_1}\, u_1$. Let $\mathcal{H}_1 := \{u_1\}^{\perp}$, the orthogonal complement of $u_1$. If $x \in \mathcal{H}_1$ then
$$\langle Ax, u_1\rangle = \langle x, A^* u_1\rangle = \langle x, \overline{\lambda_1} u_1\rangle = \overline{\lambda_1}\, \langle x, u_1\rangle = 0,$$
$$\langle A^* x, u_1\rangle = \langle x, A u_1\rangle = \langle x, \lambda_1 u_1\rangle = \lambda_1\, \langle x, u_1\rangle = 0,$$
so that $Ax, A^* x \in \mathcal{H}_1$. Hence we have $A\mathcal{H}_1 \subset \mathcal{H}_1$ and $A^* \mathcal{H}_1 \subset \mathcal{H}_1$. So one can define $A_1 := A|_{\mathcal{H}_1} \in B(\mathcal{H}_1)$. Then $A_1^* = A^*|_{\mathcal{H}_1}$, which implies that $A_1$ is also normal. Since $\dim \mathcal{H}_1 = n-1$, the induction hypothesis can be applied to obtain $\lambda_2, \dots, \lambda_n \in \mathbb{C}$ and $u_2, \dots, u_n \in \mathcal{H}_1$ such that $\{u_2, \dots, u_n\}$ is an orthonormal basis of $\mathcal{H}_1$ and $A_1 u_i = \lambda_i u_i$ for all $i = 2, \dots, n$. Then $\{u_1, u_2, \dots, u_n\}$ is an orthonormal basis of $\mathcal{H}$ and $Au_i = \lambda_i u_i$ for all $i = 1, 2, \dots, n$. Thus the assertion holds for dimension $n$ as well. $\square$
It is an important consequence that the matrix of a normal operator is
diagonal in an appropriate orthonormal basis and the trace is the sum of the
eigenvalues.
Theorem 1.23 Assume that $A \in B(\mathcal{H})$ is self-adjoint. If $Av = \lambda v$ and $Aw = \mu w$ with non-zero eigenvectors $v, w$ and the eigenvalues $\lambda$ and $\mu$ are different, then $v \perp w$ and $\lambda, \mu \in \mathbb{R}$.

Proof: First we show that the eigenvalues are real:
$$\lambda\,\langle v, v\rangle = \langle v, \lambda v\rangle = \langle v, Av\rangle = \langle Av, v\rangle = \langle \lambda v, v\rangle = \overline{\lambda}\,\langle v, v\rangle.$$
The orthogonality $\langle v, w\rangle = 0$ comes similarly:
$$\mu\,\langle v, w\rangle = \langle v, \mu w\rangle = \langle v, Aw\rangle = \langle Av, w\rangle = \langle \lambda v, w\rangle = \lambda\,\langle v, w\rangle. \qquad \square$$

If $A$ is a self-adjoint operator on an $n$-dimensional Hilbert space, then from the eigenvectors we can find an orthonormal basis $v_1, v_2, \dots, v_n$. If $Av_i = \lambda_i v_i$, then
$$A = \sum_{i=1}^n \lambda_i\, |v_i\rangle\langle v_i| \tag{1.12}$$
which is called the Schmidt decomposition. The Schmidt decomposition is unique if all the eigenvalues are different, otherwise not. Another useful decomposition is the spectral decomposition. Assume that the self-adjoint operator $A$ has the eigenvalues $\mu_1 > \mu_2 > \dots > \mu_k$. Then
$$A = \sum_{j=1}^k \mu_j P_j, \tag{1.13}$$
where $P_j$ is the orthogonal projection onto the eigenspace for the eigenvalue $\mu_j$. (From the Schmidt decomposition (1.12),
$$P_j = \sum_i |v_i\rangle\langle v_i|,$$
where the summation is over all $i$ such that $\lambda_i = \mu_j$.) This decomposition is always unique. Actually, the Schmidt decomposition and the spectral decomposition exist for all normal operators.
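For a self-adjoint matrix the Schmidt decomposition (1.12) is produced by a standard eigensolver. A short sketch (Python/NumPy, an arbitrary 2×2 Hermitian example):

    import numpy as np

    A = np.array([[2.0, 1.0], [1.0, 3.0]])          # self-adjoint
    lam, V = np.linalg.eigh(A)                       # eigenvalues, orthonormal eigenvectors
    # A = sum_i lambda_i |v_i><v_i|
    rebuilt = sum(lam[i] * np.outer(V[:, i], V[:, i].conj())
                  for i in range(len(lam)))
    print(np.allclose(A, rebuilt))                   # True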
If $\lambda_i \ge 0$ in (1.12), then we can set $|x_i\rangle := \sqrt{\lambda_i}\, |v_i\rangle$ and we have
$$A = \sum_{i=1}^n |x_i\rangle\langle x_i|.$$
If the orthogonality of the vectors $|x_i\rangle$ is not assumed, then there are several similar decompositions, but they are connected by a unitary. The next lemma and its proof are a good exercise for the bra and ket formalism. (The result and the proof are due to Schrödinger [79].)
Lemma 1.24 If
$$A = \sum_{j=1}^n |x_j\rangle\langle x_j| = \sum_{i=1}^n |y_i\rangle\langle y_i|,$$
then there exists a unitary matrix $[U_{ij}]_{i,j=1}^n$ such that
$$\sum_{j=1}^n U_{ij}\, |x_j\rangle = |y_i\rangle \qquad (1 \le i \le n). \tag{1.14}$$

Proof: Assume first that the vectors $|x_j\rangle$ are orthogonal. Typically they are not unit vectors and several of them can be 0. Assume that $|x_1\rangle, |x_2\rangle, \dots, |x_k\rangle$ are not 0 and $|x_{k+1}\rangle = \dots = |x_n\rangle = 0$. Then the vectors $|y_i\rangle$ are in the linear span of $\{|x_j\rangle : 1 \le j \le k\}$. Therefore
$$|y_i\rangle = \sum_{j=1}^k \frac{\langle x_j | y_i\rangle}{\langle x_j | x_j\rangle}\, |x_j\rangle$$
is the orthogonal expansion. We can define $[U_{ij}]$ by the formula
$$U_{ij} = \frac{\langle x_j | y_i\rangle}{\langle x_j | x_j\rangle} \qquad (1 \le i \le n,\ 1 \le j \le k).$$
We easily compute that
$$\sum_{i=1}^n U_{it}\, \overline{U_{iu}} = \sum_{i=1}^n \frac{\langle x_t | y_i\rangle}{\langle x_t | x_t\rangle} \cdot \frac{\langle y_i | x_u\rangle}{\langle x_u | x_u\rangle} = \frac{\langle x_t | A | x_u\rangle}{\langle x_u | x_u\rangle\, \langle x_t | x_t\rangle} = \delta_{t,u},$$
and this relation shows that the $k$ column vectors of the matrix $[U_{ij}]$ are orthonormal. If $k < n$, then we can append further columns to get an $n \times n$ unitary, see Exercise 37. (One can see in (1.14) that if $|x_j\rangle = 0$, then $U_{ij}$ does not play any role.)
In the general situation
$$A = \sum_{j=1}^n |z_j\rangle\langle z_j| = \sum_{i=1}^n |y_i\rangle\langle y_i|,$$
we can make a unitary $U$ from an orthogonal family to the $|y_i\rangle$'s and a unitary $V$ from the same orthogonal family to the $|z_i\rangle$'s. Then $UV^*$ goes from the $|z_i\rangle$'s to the $|y_i\rangle$'s. $\square$
Example 1.25 Let $A \in B(\mathcal{H})$ be a self-adjoint operator with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ (counted with multiplicity). Then
$$\lambda_1 = \max\{\langle v, Av\rangle : v \in \mathcal{H},\ \|v\| = 1\}. \tag{1.15}$$
We can take the Schmidt decomposition (1.12). Assume that
$$\max\{\langle v, Av\rangle : v \in \mathcal{H},\ \|v\| = 1\} = \langle w, Aw\rangle$$
for a unit vector $w$. This vector has the expansion
$$w = \sum_{i=1}^n c_i\, |v_i\rangle$$
and we have
$$\langle w, Aw\rangle = \sum_{i=1}^n |c_i|^2 \lambda_i \le \lambda_1.$$
The equality holds if and only if $\lambda_i < \lambda_1$ implies $c_i = 0$. The maximizer should be an eigenvector for the eigenvalue $\lambda_1$.
Similarly,
$$\lambda_n = \min\{\langle v, Av\rangle : v \in \mathcal{H},\ \|v\| = 1\}. \tag{1.16}$$
The formulas (1.15) and (1.16) will be extended below.
Theorem 1.26 (Poincaré's inequality) Let $A \in B(\mathcal{H})$ be a self-adjoint operator with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ (counted with multiplicity) and let $\mathcal{K}$ be a $k$-dimensional subspace of $\mathcal{H}$. Then there are unit vectors $x, y \in \mathcal{K}$ such that
$$\langle x, Ax\rangle \le \lambda_k \quad\text{and}\quad \langle y, Ay\rangle \ge \lambda_k.$$

Proof: Let $v_k, \dots, v_n$ be orthonormal eigenvectors corresponding to the eigenvalues $\lambda_k, \dots, \lambda_n$. They span a subspace $\mathcal{M}$ of dimension $n - k + 1$ which must have a non-zero intersection with $\mathcal{K}$. Take a unit vector $x \in \mathcal{K} \cap \mathcal{M}$ which has the expansion
$$x = \sum_{i=k}^n c_i v_i,$$
and it has the required property:
$$\langle x, Ax\rangle = \sum_{i=k}^n |c_i|^2 \lambda_i \le \lambda_k \sum_{i=k}^n |c_i|^2 = \lambda_k.$$
To find the other vector $y$, the same argument can be used with the matrix $-A$. $\square$
The next result is a minimax principle.

Theorem 1.27 Let $A \in B(\mathcal{H})$ be a self-adjoint operator with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ (counted with multiplicity). Then
$$\lambda_k = \min\Big\{ \max\{\langle v, Av\rangle : v \in \mathcal{K},\ \|v\| = 1\} : \mathcal{K} \subset \mathcal{H},\ \dim \mathcal{K} = n + 1 - k \Big\}.$$

Proof: Let $v_k, \dots, v_n$ be orthonormal eigenvectors corresponding to the eigenvalues $\lambda_k, \dots, \lambda_n$. They span a subspace $\mathcal{K}$ of dimension $n + 1 - k$. According to (1.15) we have
$$\lambda_k = \max\{\langle v, Av\rangle : v \in \mathcal{K},\ \|v\| = 1\}$$
and it follows that $\ge$ holds in the statement of the theorem.
To complete the proof we have to show that for any subspace $\mathcal{K}$ of dimension $n + 1 - k$ there is a unit vector $v$ such that $\lambda_k \le \langle v, Av\rangle$, or equivalently $-\lambda_k \ge \langle v, (-A)v\rangle$. The decreasing eigenvalues of $-A$ are $-\lambda_n \ge -\lambda_{n-1} \ge \dots \ge -\lambda_1$, where the $\ell$-th is $-\lambda_{n+1-\ell}$. The existence of such a unit vector $v$ is guaranteed by Poincaré's inequality, where we take $\ell = n + 1 - k$. $\square$
1.5 Trace and determinant

When $e_1, \dots, e_n$ is an orthonormal basis of $\mathcal{H}$, the trace $\mathrm{Tr}\, A$ of $A \in B(\mathcal{H})$ is defined as
$$\mathrm{Tr}\, A := \sum_{i=1}^n \langle e_i, Ae_i\rangle. \tag{1.17}$$

Theorem 1.28 The definition (1.17) is independent of the choice of an orthonormal basis $e_1, \dots, e_n$ and $\mathrm{Tr}\, AB = \mathrm{Tr}\, BA$ for all $A, B \in B(\mathcal{H})$.

Proof: We have
$$\mathrm{Tr}\, AB = \sum_{i=1}^n \langle e_i, ABe_i\rangle = \sum_{i=1}^n \langle A^* e_i, Be_i\rangle = \sum_{i=1}^n \sum_{j=1}^n \overline{\langle e_j, A^* e_i\rangle}\, \langle e_j, Be_i\rangle = \sum_{j=1}^n \sum_{i=1}^n \langle e_i, Ae_j\rangle\, \langle e_j, Be_i\rangle = \sum_{j=1}^n \langle e_j, BAe_j\rangle = \mathrm{Tr}\, BA.$$
Now, let $f_1, \dots, f_n$ be another orthonormal basis of $\mathcal{H}$. Then a unitary $U$ is defined by $Ue_i = f_i$, $1 \le i \le n$, and we have
$$\sum_{i=1}^n \langle f_i, Af_i\rangle = \sum_{i=1}^n \langle Ue_i, AUe_i\rangle = \mathrm{Tr}\, U^* AU = \mathrm{Tr}\, AUU^* = \mathrm{Tr}\, A,$$
which says that the definition of $\mathrm{Tr}\, A$ is actually independent of the choice of an orthonormal basis. $\square$
When $A \in M_n$, the trace of $A$ is nothing but the sum of the principal diagonal entries of $A$:
$$\mathrm{Tr}\, A = A_{11} + A_{22} + \dots + A_{nn}.$$
The trace is the sum of the eigenvalues.
Computation of the trace is very simple; the case of the determinant (1.2) is very different. In terms of the Jordan canonical form described in Theorem 1.16, we have
$$\mathrm{Tr}\, X = \sum_{j=1}^m k_j \lambda_j \quad\text{and}\quad \det X = \prod_{j=1}^m \lambda_j^{k_j}.$$
Formula (1.10) shows that the trace and the determinant are certain coefficients of the characteristic polynomial.
The next example is about the determinant of a special linear mapping.

Example 1.29 On the linear space $M_n$ we can define a linear mapping $\alpha : M_n \to M_n$ as $\alpha(A) = VAV^*$, where $V \in M_n$ is a fixed matrix. We are interested in $\det \alpha$.
Let $V = SJS^{-1}$ be the canonical Jordan decomposition and set
$$\alpha_1(A) = S^{-1} A (S^{-1})^*, \qquad \alpha_2(B) = JBJ^*, \qquad \alpha_3(C) = SCS^*.$$
Then $\alpha = \alpha_3 \circ \alpha_2 \circ \alpha_1$ and $\det \alpha = \det \alpha_3 \cdot \det \alpha_2 \cdot \det \alpha_1$. Since $\alpha_1 = \alpha_3^{-1}$, we have $\det \alpha = \det \alpha_2$, so only the Jordan block part has influence on the determinant.
The following example helps to understand the situation. Let
$$J = \begin{pmatrix} \lambda_1 & x \\ 0 & \lambda_2 \end{pmatrix}$$
and
$$A_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad A_4 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
Then $\{A_1, A_2, A_3, A_4\}$ is a basis in $M_2$. If $\alpha(A) = JAJ^*$, then from the data
$$\alpha(A_1) = \lambda_1\overline{\lambda_1}\, A_1, \qquad \alpha(A_2) = \lambda_1\bar{x}\, A_1 + \lambda_1\overline{\lambda_2}\, A_2,$$
$$\alpha(A_3) = \overline{\lambda_1}x\, A_1 + \lambda_2\overline{\lambda_1}\, A_3, \qquad \alpha(A_4) = x\bar{x}\, A_1 + x\overline{\lambda_2}\, A_2 + \lambda_2\bar{x}\, A_3 + \lambda_2\overline{\lambda_2}\, A_4$$
we can observe that the matrix of $\alpha$ is upper triangular:
$$\begin{pmatrix}
\lambda_1\overline{\lambda_1} & \lambda_1\bar{x} & \overline{\lambda_1}x & x\bar{x} \\
0 & \lambda_1\overline{\lambda_2} & 0 & x\overline{\lambda_2} \\
0 & 0 & \lambda_2\overline{\lambda_1} & \lambda_2\bar{x} \\
0 & 0 & 0 & \lambda_2\overline{\lambda_2}
\end{pmatrix}.$$
So its determinant is the product of the diagonal entries:
$$\lambda_1\overline{\lambda_1}\cdot \lambda_1\overline{\lambda_2}\cdot \lambda_2\overline{\lambda_1}\cdot \lambda_2\overline{\lambda_2} = |\lambda_1\lambda_2|^4 = |\det J|^4.$$
Now let $J \in M_n$ and assume that only the entries $J_{ii}$ and $J_{i,i+1}$ can be non-zero. In $M_n$ we choose the basis of the matrix units in lexicographical order,
$$E(1,1), E(1,2), \dots, E(1,n), E(2,1), \dots, E(2,n), \dots, E(n,1), \dots, E(n,n).$$
We want to see that the matrix of $\alpha$ is upper triangular. From the computation
$$J E(j,k) J^* = J_{j-1,j}\overline{J_{k-1,k}}\, E(j-1, k-1) + J_{j-1,j}\overline{J_{k,k}}\, E(j-1, k) + J_{jj}\overline{J_{k-1,k}}\, E(j, k-1) + J_{jj}\overline{J_{k,k}}\, E(j, k)$$
we can see that the matrix of the mapping $A \mapsto JAJ^*$ is upper triangular. (In the lexicographical order of the matrix units, $E(j-1, k-1)$, $E(j-1, k)$ and $E(j, k-1)$ come before $E(j, k)$.) The determinant is the product of the diagonal entries:
$$\prod_{j,k=1}^n J_{jj}\overline{J_{kk}} = (\det J)^n\, \overline{(\det J)}^{\,n}.$$
Hence the determinant of $\alpha(A) = VAV^*$ is $(\det V)^n\, \overline{(\det V)}^{\,n} = |\det V|^{2n}$, since the determinant of $V$ equals the determinant of its Jordan matrix $J$. If $\alpha(A) = VAV^t$, then the argument is similar, $\det \alpha = (\det V)^{2n}$, thus only the conjugation is missing.
Next we deal with the space $\mathcal{M}$ of real symmetric $n \times n$ matrices. Let $V$ be a real matrix and set $\alpha : \mathcal{M} \to \mathcal{M}$, $\alpha(A) = VAV^t$. Then $\alpha$ is a symmetric operator on the real Hilbert space $\mathcal{M}$. The Jordan blocks in Theorem 1.16 for the real $V$ are generally non-real. However, the real form of the Jordan canonical decomposition holds in such a way that there is a real invertible matrix $S$ such that
$$V = SJS^{-1}, \qquad J = \begin{pmatrix} J_1 & 0 & \dots & 0 \\ 0 & J_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & J_m \end{pmatrix}$$
and each block $J_i$ is either a Jordan block $J_k(\lambda)$ with real $\lambda$ or a matrix of the form
$$\begin{pmatrix} C & I & 0 & \dots & 0 \\ 0 & C & I & \dots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \dots & C & I \\ 0 & 0 & \dots & 0 & C \end{pmatrix}, \qquad C = \begin{pmatrix} a & b \\ -b & a \end{pmatrix} \ \text{with real } a, b, \qquad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Similarly to the above argument, $\det \alpha$ is equal to the determinant of $X \mapsto JXJ^t$. Since the computation using the above real Jordan decomposition and the basis $\{E(j,k) + E(k,j) : 1 \le j \le k \le n\}$ in $\mathcal{M}$ is rather complicated, we are satisfied with the computation for the special case
$$J = \begin{pmatrix} \alpha & \beta & 0 & 0 \\ -\beta & \alpha & 0 & 0 \\ 0 & 0 & \gamma & 1 \\ 0 & 0 & 0 & \gamma \end{pmatrix}.$$
For a $4 \times 4$ real symmetric matrix
$$X = \begin{pmatrix} X_{11} & X_{12} \\ X_{12}^t & X_{22} \end{pmatrix} \quad\text{with}\quad X_{11} = \begin{pmatrix} x_{11} & x_{12} \\ x_{12} & x_{22} \end{pmatrix}, \quad X_{12} = \begin{pmatrix} x_{13} & x_{14} \\ x_{23} & x_{24} \end{pmatrix}, \quad X_{22} = \begin{pmatrix} x_{33} & x_{34} \\ x_{34} & x_{44} \end{pmatrix},$$
a direct computation gives
$$JXJ^t = \begin{pmatrix} Y_{11} & Y_{12} \\ Y_{12}^t & Y_{22} \end{pmatrix}$$
with
$$Y_{11} = \begin{pmatrix} \alpha^2 x_{11} + 2\alpha\beta x_{12} + \beta^2 x_{22} & -\alpha\beta x_{11} + (\alpha^2-\beta^2) x_{12} + \alpha\beta x_{22} \\ -\alpha\beta x_{11} + (\alpha^2-\beta^2) x_{12} + \alpha\beta x_{22} & \beta^2 x_{11} - 2\alpha\beta x_{12} + \alpha^2 x_{22} \end{pmatrix},$$
$$Y_{12} = \begin{pmatrix} \gamma\alpha x_{13} + \alpha x_{14} + \gamma\beta x_{23} + \beta x_{24} & \gamma\alpha x_{14} + \gamma\beta x_{24} \\ -\gamma\beta x_{13} - \beta x_{14} + \gamma\alpha x_{23} + \alpha x_{24} & -\gamma\beta x_{14} + \gamma\alpha x_{24} \end{pmatrix},$$
$$Y_{22} = \begin{pmatrix} \gamma^2 x_{33} + 2\gamma x_{34} + x_{44} & \gamma^2 x_{34} + \gamma x_{44} \\ \gamma^2 x_{34} + \gamma x_{44} & \gamma^2 x_{44} \end{pmatrix}.$$
Therefore, the matrix of $X \mapsto JXJ^t$ is the direct sum of
$$\begin{pmatrix} \alpha^2 & 2\alpha\beta & \beta^2 \\ -\alpha\beta & \alpha^2-\beta^2 & \alpha\beta \\ \beta^2 & -2\alpha\beta & \alpha^2 \end{pmatrix}, \qquad \begin{pmatrix} \gamma\alpha & \alpha & \gamma\beta & \beta \\ 0 & \gamma\alpha & 0 & \gamma\beta \\ -\gamma\beta & -\beta & \gamma\alpha & \alpha \\ 0 & -\gamma\beta & 0 & \gamma\alpha \end{pmatrix}, \qquad \begin{pmatrix} \gamma^2 & 2\gamma & 1 \\ 0 & \gamma^2 & \gamma \\ 0 & 0 & \gamma^2 \end{pmatrix},$$
acting on the coordinates $(x_{11}, x_{12}, x_{22})$, $(x_{13}, x_{14}, x_{23}, x_{24})$ and $(x_{33}, x_{34}, x_{44})$, respectively. The determinant can be computed as the product of the determinants of the above three matrices and it is
$$(\alpha^2+\beta^2)^5\, \gamma^{10} = (\det J)^5 = (\det V)^5.$$
For a general $n \times n$ real $V$ we have $\det \alpha = (\det V)^{n+1}$.
Theorem 1.30 The determinant of a positive matrix $A \in M_n$ does not exceed the product of the diagonal entries:
$$\det A \le \prod_{i=1}^n A_{ii}.$$
This is a consequence of the concavity of the log function, see Example 4.18 (or Example 1.43).
If $A \in M_n$ and $1 \le i, j \le n$, then in the next theorems $[A]_{ij}$ denotes the $(n-1) \times (n-1)$ matrix which is obtained from $A$ by striking out the $i$th row and the $j$th column.

Theorem 1.31 Let $A \in M_n$ and $1 \le j \le n$. Then
$$\det A = \sum_{i=1}^n (-1)^{i+j} A_{ij}\, \det([A]_{ij}).$$
Example 1.32 Here is a simple computation using the row version of the previous theorem:
$$\det \begin{pmatrix} 1 & 2 & 0 \\ 3 & 0 & 4 \\ 0 & 5 & 6 \end{pmatrix} = 1\cdot(0 \cdot 6 - 5 \cdot 4) - 2\cdot(3 \cdot 6 - 0 \cdot 4) + 0\cdot(3 \cdot 5 - 0 \cdot 0).$$
The theorem is useful if the matrix has several 0 entries.
The determinant has an important role in the computation of the inverse.

Theorem 1.33 Let $A \in M_n$ be invertible. Then
$$[A^{-1}]_{ki} = (-1)^{i+k}\, \frac{\det([A]_{ik})}{\det A}$$
for $1 \le i, k \le n$.
Example 1.34 A standard formula is
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$
when the determinant $ad - bc$ is not 0.
The next example is about the Haar measure on a group of matrices. Mathematical analysis is essential there.

Example 1.35 Let $\mathcal{G}$ denote the set of invertible real $2 \times 2$ matrices. $\mathcal{G}$ is a (non-commutative) group and $\mathcal{G} \subset M_2(\mathbb{R}) \cong \mathbb{R}^4$ is an open set. Therefore it is a locally compact topological group.
The Haar measure $\mu$ is defined by the left-invariance property:
$$\mu(H) = \mu(\{BA : A \in H\}) \qquad (B \in \mathcal{G})$$
($H \subset \mathcal{G}$ is measurable). We assume that
$$\mu(H) = \int_H p(A)\, dA,$$
where $p : \mathcal{G} \to \mathbb{R}_+$ is a function and $dA$ is the Lebesgue measure in $\mathbb{R}^4$:
$$A = \begin{pmatrix} x & y \\ z & w \end{pmatrix}, \qquad dA = dx\, dy\, dz\, dw.$$
The left-invariance is equivalent to the condition
$$\int f(A)\, p(A)\, dA = \int f(BA)\, p(A)\, dA$$
for all continuous functions $f : \mathcal{G} \to \mathbb{R}$ and for every $B \in \mathcal{G}$. The integral can be transformed:
$$\int f(BA)\, p(A)\, dA = \int f(A')\, p(B^{-1}A')\, \Big|\frac{\partial A}{\partial A'}\Big|\, dA',$$
where $BA$ is replaced with $A'$. If
$$B = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$
then
$$A' := BA = \begin{pmatrix} ax + bz & ay + bw \\ cx + dz & cy + dw \end{pmatrix}$$
and the Jacobi matrix is
$$\frac{\partial A'}{\partial A} = \begin{pmatrix} a & 0 & b & 0 \\ 0 & a & 0 & b \\ c & 0 & d & 0 \\ 0 & c & 0 & d \end{pmatrix} = B \otimes I_2.$$
We have
$$\Big|\frac{\partial A}{\partial A'}\Big| := \Big|\det \frac{\partial A}{\partial A'}\Big| = \frac{1}{|\det(B \otimes I_2)|} = \frac{1}{(\det B)^2}$$
and
$$\int f(A)\, p(A)\, dA = \int f(A)\, \frac{p(B^{-1}A)}{(\det B)^2}\, dA.$$
So the condition for the invariance of the measure is
$$p(A) = \frac{p(B^{-1}A)}{(\det B)^2}.$$
The solution is
$$p(A) = \frac{1}{(\det A)^2}.$$
This defines the left invariant Haar measure, and it is actually also right invariant.
For $n \times n$ matrices the computation is similar; then
$$p(A) = \frac{1}{|\det A|^n}.$$
(Another example is in Exercise 61.)
1.6 Positivity and absolute value

Let $\mathcal{H}$ be a Hilbert space and $T : \mathcal{H} \to \mathcal{H}$ be a bounded linear operator. $T$ is called a positive operator (or a positive semidefinite matrix) if $\langle x, Tx\rangle \ge 0$ for every vector $x \in \mathcal{H}$, in notation $T \ge 0$. It follows from the definition that a positive operator is self-adjoint. Moreover, if $T_1$ and $T_2$ are positive operators, then $T_1 + T_2$ is positive as well.

Theorem 1.36 Let $T \in B(\mathcal{H})$ be an operator. The following conditions are equivalent.
(1) $T$ is positive.
(2) $T = T^*$ and the spectrum of $T$ lies in $\mathbb{R}_+ = [0, \infty)$.
(3) $T$ is of the form $A^* A$ for some operator $A \in B(\mathcal{H})$.

An operator $T$ is positive if and only if $UTU^*$ is positive for a unitary $U$.
We can reformulate positivity for a matrix $T \in M_n$. For $(a_1, a_2, \dots, a_n) \in \mathbb{C}^n$ the inequality
$$\sum_{i,j} \overline{a_i}\, T_{ij}\, a_j \ge 0 \tag{1.18}$$
should be true. It is easy to see that if $T \ge 0$, then $T_{ii} \ge 0$ for every $1 \le i \le n$. For a special unitary $U$ the matrix $UTU^*$ can be diagonal, $\mathrm{Diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$, where the $\lambda_i$'s are the eigenvalues. So the positivity of $T$ means that it is Hermitian and the eigenvalues are positive (condition (2) above).

Example 1.37 If the matrix
$$A = \begin{pmatrix} a & b & c \\ \bar{b} & d & e \\ \bar{c} & \bar{e} & f \end{pmatrix}$$
is positive, then the matrices
$$B = \begin{pmatrix} a & b \\ \bar{b} & d \end{pmatrix}, \qquad C = \begin{pmatrix} a & c \\ \bar{c} & f \end{pmatrix}$$
are positive as well. (We take the positivity condition (1.18) for $A$, and the choice $a_3 = 0$ gives the positivity of $B$. A similar argument with $a_2 = 0$ works for $C$.)
Theorem 1.38 Let $T$ be a positive operator. Then there is a unique positive operator $B$ such that $B^2 = T$. If a self-adjoint operator $A$ commutes with $T$, then it commutes with $B$ as well.

Proof: We restrict ourselves to the finite-dimensional case. In this case it is enough to find the eigenvalues and the eigenvectors. If $Bx = \lambda x$, then $x$ is an eigenvector of $T$ with eigenvalue $\lambda^2$. This determines $B$ uniquely; $T$ and $B$ have the same eigenvectors.
$AB = BA$ holds if for any eigenvector $x$ of $B$ the vector $Ax$ is an eigenvector, too. If $TA = AT$, then this follows. $\square$

$B$ is called the square root of $T$; $T^{1/2}$ and $\sqrt{T}$ are the notations. It follows from the theorem that the product of commuting positive operators $T$ and $A$ is positive. Indeed,
$$TA = T^{1/2} T^{1/2} A = T^{1/2} A T^{1/2} = (A^{1/2} T^{1/2})^*\, A^{1/2} T^{1/2}.$$
For each $A \in B(\mathcal{H})$, we have $A^* A \ge 0$. So, define $|A| := (A^* A)^{1/2}$, which is called the absolute value of $A$. The mapping
$$|A|x \mapsto Ax$$
is norm preserving:
$$\| |A|x \|^2 = \langle |A|x, |A|x\rangle = \langle x, |A|^2 x\rangle = \langle x, A^* Ax\rangle = \langle Ax, Ax\rangle = \|Ax\|^2.$$
It can be extended to a unitary $U$. So $A = U|A|$ and this is called the polar decomposition of $A$.
$|A| := (A^* A)^{1/2}$ makes sense if $A : \mathcal{H}_1 \to \mathcal{H}_2$. Then $|A| \in B(\mathcal{H}_1)$. The above argument tells that $|A|x \mapsto Ax$ is norm preserving, but it is not always true that it can be extended to a unitary. If $\dim \mathcal{H}_1 \le \dim \mathcal{H}_2$, then $|A|x \mapsto Ax$ can be extended to an isometry $V : \mathcal{H}_1 \to \mathcal{H}_2$. Then $A = V|A|$, where $V^* V = I$.
The eigenvalues $s_i(A)$ of $|A|$ are called the singular values of $A$. If $A \in M_n$, then the usual notation is
$$s(A) = (s_1(A), \dots, s_n(A)), \qquad s_1(A) \ge s_2(A) \ge \dots \ge s_n(A).$$
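The absolute value, the polar decomposition and the singular values can all be obtained from the singular value decomposition. A sketch (Python/NumPy, an arbitrary square example):

    import numpy as np

    A = np.array([[1.0, 2.0], [0.0, 3.0]])
    W, s, Vh = np.linalg.svd(A)            # A = W diag(s) Vh; s are the singular values
    absA = Vh.conj().T @ np.diag(s) @ Vh   # |A| = (A* A)^{1/2}
    U = W @ Vh                             # unitary factor of the polar decomposition
    print(np.allclose(U @ absA, A))        # A = U |A|
    print(s)                               # s_1 >= s_2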
Example 1.39 Let $T$ be a positive operator acting on a finite-dimensional Hilbert space such that $\|T\| \le 1$. We want to show that there is a unitary operator $U$ such that
$$T = \frac{1}{2}(U + U^*).$$
We can choose an orthonormal basis $e_1, e_2, \dots, e_n$ consisting of eigenvectors of $T$, and in this basis the matrix of $T$ is diagonal, say $\mathrm{Diag}(t_1, t_2, \dots, t_n)$, with $0 \le t_j \le 1$ from the positivity. For any $1 \le j \le n$ we can find a real number $\theta_j$ such that
$$t_j = \frac{1}{2}\big(e^{\mathrm{i}\theta_j} + e^{-\mathrm{i}\theta_j}\big).$$
Then the unitary operator $U$ with matrix $\mathrm{Diag}(\exp(\mathrm{i}\theta_1), \dots, \exp(\mathrm{i}\theta_n))$ will have the desired property.

If $T$ acts on a finite-dimensional Hilbert space which has an orthonormal basis $e_1, e_2, \dots, e_n$, then $T$ is uniquely determined by its matrix
$$[\langle e_i, Te_j\rangle]_{i,j=1}^n.$$
$T$ is positive if and only if its matrix is positive (semidefinite).
Example 1.40 Let
$$A = \begin{pmatrix} \lambda_1 & \lambda_2 & \dots & \lambda_n \\ 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 0 \end{pmatrix}.$$
Then
$$[A^* A]_{i,j} = \overline{\lambda_i}\, \lambda_j \qquad (1 \le i, j \le n)$$
and this matrix is positive:
$$\sum_{i,j} \overline{a_i}\, [A^* A]_{i,j}\, a_j = \Big| \sum_i \lambda_i a_i \Big|^2 \ge 0.$$
Every positive matrix is a sum of matrices of this form. (The minimum number of the summands is the rank of the matrix.)
Example 1.41 Take numbers $\lambda_1, \lambda_2, \dots, \lambda_n > 0$ and set
$$A_{ij} = \frac{1}{\lambda_i + \lambda_j},$$
which is called a Cauchy matrix. We have
$$\frac{1}{\lambda_i + \lambda_j} = \int_0^{\infty} e^{-t\lambda_i}\, e^{-t\lambda_j}\, dt,$$
and the matrix
$$A(t)_{ij} := e^{-t\lambda_i}\, e^{-t\lambda_j}$$
is positive for every $t \in \mathbb{R}$ due to Example 1.40. Therefore
$$A = \int_0^{\infty} A(t)\, dt$$
is positive as well.
The above argument can be generalized. If $r > 0$, then
$$\frac{1}{(\lambda_i + \lambda_j)^r} = \frac{1}{\Gamma(r)} \int_0^{\infty} e^{-t\lambda_i}\, e^{-t\lambda_j}\, t^{r-1}\, dt.$$
This implies that
$$A_{ij} = \frac{1}{(\lambda_i + \lambda_j)^r} \qquad (r > 0)$$
is positive.
The Cauchy matrix is an example of an infinitely divisible matrix. If $A$ is an entrywise positive matrix, then it is called infinitely divisible if the matrices
$$A(r)_{ij} = (A_{ij})^r$$
are positive for every number $r > 0$.
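The positivity of the Cauchy matrix, and of its entrywise powers, can be tested numerically. A small sketch (Python/NumPy, with arbitrarily chosen numbers lambda_i):

    import numpy as np

    lam = np.array([0.5, 1.0, 2.0, 3.5])
    C = 1.0 / (lam[:, None] + lam[None, :])     # Cauchy matrix 1/(lambda_i + lambda_j)
    for r in (0.5, 1.0, 2.0):
        eigs = np.linalg.eigvalsh(C ** r)       # entrywise power, then eigenvalues
        print(r, eigs.min() >= -1e-12)          # all eigenvalues non-negative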
Theorem 1.42 Let $T \in B(\mathcal{H})$ be an invertible self-adjoint operator and $e_1, e_2, \dots, e_n$ be a basis in the Hilbert space $\mathcal{H}$. $T$ is positive if and only if for any $1 \le k \le n$ the determinant of the $k \times k$ matrix
$$[\langle e_i, Te_j\rangle]_{i,j=1}^k$$
is positive (that is, $> 0$).

An invertible positive matrix is called positive definite. Such matrices appear in probability theory in the concept of the Gaussian distribution. Work with Gaussian distributions in probability theory requires experience with matrices. (This appears in the next example, but also in Example 2.7.)
Example 1.43 Let $M$ be a positive definite $n \times n$ real matrix and $x = (x_1, x_2, \dots, x_n)$. Then
$$f_M(x) := \sqrt{\frac{\det M}{(2\pi)^n}}\, \exp\Big( -\frac{1}{2}\langle x, Mx\rangle \Big) \tag{1.19}$$
is a multivariate Gaussian probability density (with 0 expectation; see, for example, III.6 in [37]). The matrix $M$ will be called the quadratic matrix of the Gaussian distribution.
For an $n \times n$ matrix $B$, the relation
$$\int \langle x, Bx\rangle\, f_M(x)\, dx = \mathrm{Tr}\, BM^{-1} \tag{1.20}$$
holds.
We first note that if (1.20) is true for a matrix $M$, then
$$\int \langle x, Bx\rangle\, f_{U^* MU}(x)\, dx = \int \langle U^* x, BU^* x\rangle\, f_M(x)\, dx = \mathrm{Tr}\,(UBU^*)M^{-1} = \mathrm{Tr}\, B(U^* MU)^{-1}$$
for a unitary $U$, since the Lebesgue measure on $\mathbb{R}^n$ is invariant under unitary transformations. This means that (1.20) holds also for $U^* MU$. Therefore, to check (1.20), we may assume that $M$ is diagonal. Another reduction concerns $B$: we may assume that $B$ is a matrix unit $E_{ij}$. Then the $n$-variable integral reduces to integrals on $\mathbb{R}$ and the known integrals
$$\int_{\mathbb{R}} t \exp\Big( -\frac{1}{2}t^2 \Big)\, dt = 0 \quad\text{and}\quad \int_{\mathbb{R}} t^2 \exp\Big( -\frac{1}{2}t^2 \Big)\, dt = \sqrt{2\pi}$$
can be used.
Formula (1.20) has an important consequence. When the joint distribution of the random variables $(\xi_1, \xi_2, \dots, \xi_n)$ is given by (1.19), then the covariance matrix is $M^{-1}$.
The Boltzmann entropy of a probability density $f(x)$ is defined as
$$h(f) := -\int f(x) \log f(x)\, dx$$
if the integral exists. For a Gaussian $f_M$ we have
$$h(f_M) = \frac{n}{2}\log(2\pi e) - \frac{1}{2}\log \det M.$$
Assume that $f_M$ is the joint distribution of the (real-valued) random variables $\xi_1, \xi_2, \dots, \xi_n$. Their joint Boltzmann entropy is
$$h(\xi_1, \xi_2, \dots, \xi_n) = \frac{n}{2}\log(2\pi e) + \frac{1}{2}\log \det M^{-1}$$
and the Boltzmann entropy of $\xi_i$ is
$$h(\xi_i) = \frac{1}{2}\log(2\pi e) + \frac{1}{2}\log (M^{-1})_{ii}.$$
The subadditivity of the Boltzmann entropy is the inequality
$$h(\xi_1, \xi_2, \dots, \xi_n) \le h(\xi_1) + h(\xi_2) + \dots + h(\xi_n),$$
which is
$$\log \det A \le \sum_{i=1}^n \log A_{ii}$$
in our particular Gaussian case, $A = M^{-1}$. What we obtained is the Hadamard inequality
$$\det A \le \prod_{i=1}^n A_{ii}$$
for a positive definite matrix $A$, see Theorem 1.30.
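The Hadamard inequality is easy to test numerically; a sketch generating a random positive definite matrix (Python/NumPy):

    import numpy as np

    rng = np.random.default_rng(2)
    B = rng.standard_normal((4, 4))
    A = B @ B.T + np.eye(4)                           # positive definite
    print(np.linalg.det(A) <= np.prod(np.diag(A)))    # True: det A <= prod_i A_ii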
Example 1.44 If the matrix $X \in M_n$ can be written in the form
$$X = S\, \mathrm{Diag}(\lambda_1, \lambda_2, \dots, \lambda_n)\, S^{-1}$$
with $\lambda_1, \lambda_2, \dots, \lambda_n > 0$, then $X$ is called weakly positive. Such a matrix has $n$ linearly independent eigenvectors with strictly positive eigenvalues. If the eigenvectors are orthogonal, then the matrix is positive definite. Since $X$ has the form
$$\Big( S\, \mathrm{Diag}\big(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \dots, \sqrt{\lambda_n}\big)\, S^* \Big)\Big( (S^*)^{-1}\, \mathrm{Diag}\big(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \dots, \sqrt{\lambda_n}\big)\, S^{-1} \Big),$$
it is the product of two positive definite matrices.
Although this $X$ is not positive, the eigenvalues are strictly positive. Therefore we can define the square root as
$$X^{1/2} = S\, \mathrm{Diag}\big(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \dots, \sqrt{\lambda_n}\big)\, S^{-1}.$$
(See also Example 3.16.)
The next result is called the Wielandt inequality. In the proof the operator norm will be used.

Theorem 1.45 Let $A$ be a self-adjoint operator such that for some numbers $a, b > 0$ the inequalities $aI \ge A \ge bI$ hold. Then for orthogonal unit vectors $x$ and $y$ the inequality
$$|\langle x, Ay\rangle|^2 \le \Big( \frac{a-b}{a+b} \Big)^2 \langle x, Ax\rangle\, \langle y, Ay\rangle$$
holds.

Proof: The conditions imply that $A$ is a positive invertible operator. The next argument holds for any real number $\lambda$:
$$\langle x, Ay\rangle = \langle x, Ay\rangle - \lambda\langle x, y\rangle = \langle x, (A - \lambda I)y\rangle = \langle A^{1/2}x, (I - \lambda A^{-1})A^{1/2}y\rangle$$
and
$$|\langle x, Ay\rangle|^2 \le \langle x, Ax\rangle\, \|I - \lambda A^{-1}\|^2\, \langle y, Ay\rangle.$$
It is enough to prove that
$$\|I - \lambda A^{-1}\| \le \frac{a-b}{a+b}$$
for an appropriate $\lambda$.
Since $A$ is self-adjoint, it is diagonal in a basis, so $A = \mathrm{Diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$ and
$$I - \lambda A^{-1} = \mathrm{Diag}\Big( 1 - \frac{\lambda}{\lambda_1}, \dots, 1 - \frac{\lambda}{\lambda_n} \Big).$$
Recall that $b \le \lambda_i \le a$. If we choose
$$\lambda = \frac{2ab}{a+b},$$
then it is elementary to check that
$$-\frac{a-b}{a+b} \le 1 - \frac{\lambda}{\lambda_i} \le \frac{a-b}{a+b},$$
which gives the proof. $\square$
The generalized inverse of an $m \times n$ matrix can be described in terms of the singular value decomposition.
Let $A \in M_{m\times n}$ with strictly positive singular values $\mu_1, \mu_2, \dots, \mu_k$. (Then $k \le m, n$.) Define a matrix $\Sigma \in M_{m\times n}$ as
$$\Sigma_{ij} = \begin{cases} \mu_i & \text{if } i = j \le k, \\ 0 & \text{otherwise.} \end{cases}$$
This matrix appears in the singular value decomposition described in the next theorem.

Theorem 1.46 A matrix $A \in M_{m\times n}$ has the decomposition
$$A = U \Sigma V^*, \tag{1.21}$$
where $U \in M_m$ and $V \in M_n$ are unitaries and $\Sigma \in M_{m\times n}$ is defined above.

For the sake of simplicity we consider the case $m = n$. Then $A$ has the polar decomposition $U_0 |A|$ and $|A|$ can be diagonalized:
$$|A| = U_1\, \mathrm{Diag}(\mu_1, \mu_2, \dots, \mu_k, 0, \dots, 0)\, U_1^*.$$
Therefore, $A = (U_0 U_1)\, \Sigma\, U_1^*$, where $U_0$ and $U_1$ are unitaries.
Theorem 1.47 For a matrix $A \in M_{m\times n}$ there exists a unique matrix $A^+ \in M_{n\times m}$ such that the following four properties hold:
(1) $AA^+A = A$,
(2) $A^+AA^+ = A^+$,
(3) $AA^+$ is self-adjoint,
(4) $A^+A$ is self-adjoint.

It is easy to describe $A^+$ in terms of the singular value decomposition (1.21). Namely, $A^+ = V \Sigma^+ U^*$, where
$$\Sigma^+_{ij} = \begin{cases} 1/\mu_i & \text{if } i = j \le k, \\ 0 & \text{otherwise.} \end{cases}$$
If $A$ is invertible, then $n = m$ and $\Sigma^+ = \Sigma^{-1}$. Hence $A^+$ is the inverse of $A$. Therefore $A^+$ is called the generalized inverse of $A$, or the Moore-Penrose generalized inverse. The generalized inverse has the properties
$$(\lambda A)^+ = \lambda^{-1} A^+ \ (\lambda \ne 0), \qquad (A^+)^+ = A, \qquad (A^*)^+ = (A^+)^*.$$
It is worthwhile to note that for a matrix $A$ with real entries $A^+$ has real entries as well. Another important note is the fact that the generalized inverse of $AB$ is not always $B^+ A^+$.
Example 1.48 If $M \in M_m$ is an invertible matrix and $v \in \mathbb{C}^m$, then the linear system
$$Mx = v$$
has the obvious solution $x = M^{-1}v$. If $M \in M_{m\times n}$, then the generalized inverse can be used. From property (1) a necessary condition of the solvability of the equation is $MM^{+}v = v$. If this condition holds, then the solution is
$$x = M^{+}v + (I_n - M^{+}M)z$$
with arbitrary $z \in \mathbb{C}^n$. This example justifies the importance of the generalized inverse.
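In the rectangular case this solvability criterion and the general solution are easy to demonstrate with the same `pinv` routine; the concrete $M$, $v$, $z$ below are only an illustration.

```python
import numpy as np

M = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])           # 2x3 underdetermined system
v = np.array([3.0, 5.0])
Mp = np.linalg.pinv(M)

print(np.allclose(M @ Mp @ v, v))          # solvability condition M M+ v = v
z = np.array([1.0, -2.0, 0.5])             # arbitrary vector in C^n (here R^3)
x = Mp @ v + (np.eye(3) - Mp @ M) @ z      # general solution
print(np.allclose(M @ x, v))               # indeed a solution
```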
1.7 Tensor product

Let $\mathcal{H}$ be the linear space of polynomials in the variable $x$ and with degree less than or equal to $n$. A natural basis consists of the powers $1, x, x^2, \dots, x^n$. Similarly, let $\mathcal{K}$ be the space of polynomials in $y$ of degree less than or equal to $m$. Its basis is $1, y, y^2, \dots, y^m$. The tensor product of these two spaces is the space of polynomials of two variables with basis $x^iy^j$, $0 \le i \le n$ and $0 \le j \le m$. This simple example contains the essential ideas.

Let $\mathcal{H}$ and $\mathcal{K}$ be Hilbert spaces. Their algebraic tensor product consists of the formal finite sums
$$\sum_{i,j} x_i \otimes y_j \qquad (x_i \in \mathcal{H},\ y_j \in \mathcal{K}).$$
Computing with these sums, one should use the following rules:
$$(x_1 + x_2)\otimes y = x_1\otimes y + x_2\otimes y, \qquad (\lambda x)\otimes y = \lambda(x\otimes y),$$
$$x\otimes(y_1 + y_2) = x\otimes y_1 + x\otimes y_2, \qquad x\otimes(\lambda y) = \lambda(x\otimes y). \qquad (1.22)$$
The inner product is defined as
$$\Big\langle \sum_{i,j} x_i \otimes y_j,\ \sum_{k,l} z_k \otimes w_l \Big\rangle = \sum_{i,j,k,l} \langle x_i, z_k\rangle\, \langle y_j, w_l\rangle.$$
When $\mathcal{H}$ and $\mathcal{K}$ are finite-dimensional spaces, then we arrive at the tensor product Hilbert space $\mathcal{H}\otimes\mathcal{K}$; otherwise the algebraic tensor product must be completed in order to get a Hilbert space.
Example 1.49 $L^2[0,1]$ is the Hilbert space of the square integrable functions on $[0,1]$. If $f, g \in L^2[0,1]$, then the elementary tensor $f\otimes g$ can be interpreted as a function of two variables, $f(x)g(y)$ defined on $[0,1]\times[0,1]$. The computational rules (1.22) are obvious in this approach.
The tensor product of finitely many Hilbert spaces is defined similarly.

If $e_1, \dots, e_n$ and $f_1, \dots, f_m$ are bases in finite-dimensional $\mathcal{H}$ and $\mathcal{K}$, respectively, then $\{e_i\otimes f_j : i, j\}$ is a basis in the tensor product space. This basis is called the product basis. An arbitrary vector $x \in \mathcal{H}\otimes\mathcal{K}$ admits an expansion
$$x = \sum_{i,j} c_{ij}\, e_i\otimes f_j$$
for some coefficients $c_{ij}$, $\sum_{i,j}|c_{ij}|^2 = \|x\|^2$. This kind of expansion is general, but sometimes it is not the best.
Lemma 1.50 Any unit vector $x \in \mathcal{H}\otimes\mathcal{K}$ can be written in the form
$$x = \sum_k \sqrt{p_k}\; g_k\otimes h_k, \qquad (1.23)$$
where the vectors $g_k \in \mathcal{H}$ and $h_k \in \mathcal{K}$ are orthonormal and $(p_k)$ is a probability distribution.
Proof: We can define a conjugate-linear mapping $\Lambda : \mathcal{H} \to \mathcal{K}$ as
$$\langle \Lambda\alpha, \beta\rangle = \langle x, \alpha\otimes\beta\rangle$$
for every vector $\alpha \in \mathcal{H}$ and $\beta \in \mathcal{K}$. In the computation we can use the bases $(e_i)_i$ in $\mathcal{H}$ and $(f_j)_j$ in $\mathcal{K}$. If $x$ has the expansion $x = \sum_{i,j}c_{ij}\, e_i\otimes f_j$ above, then
$$\langle \Lambda e_i, f_j\rangle = c_{ij}$$
and the adjoint $\Lambda^{*}$ is determined by
$$\langle \Lambda^{*} f_j, e_i\rangle = c_{ij}.$$
(Concerning the adjoint of a conjugate-linear mapping, see (1.7).)

One can compute that the partial trace $\mathrm{Tr}_2\, |x\rangle\langle x|$ of the matrix $|x\rangle\langle x|$ is $D := \Lambda^{*}\Lambda$ (see the definition before Example 1.56). It is enough to check that
$$\langle x, (|e_k\rangle\langle e_\ell|\otimes I_{\mathcal{K}})x\rangle = \mathrm{Tr}\,\Lambda^{*}\Lambda\, |e_k\rangle\langle e_\ell|$$
for every $k$ and $\ell$.

Choose now the orthogonal unit vectors $g_k$ such that they are eigenvectors of $D$ with corresponding non-zero eigenvalues $p_k$, $Dg_k = p_kg_k$. Then
$$h_k := \frac{1}{\sqrt{p_k}}\,\Lambda g_k$$
is a family of pairwise orthogonal unit vectors. Now
$$\langle x, g_k\otimes h_\ell\rangle = \langle \Lambda g_k, h_\ell\rangle = \frac{1}{\sqrt{p_\ell}}\langle \Lambda g_k, \Lambda g_\ell\rangle = \frac{1}{\sqrt{p_\ell}}\langle g_\ell, Dg_k\rangle = \delta_{k,\ell}\sqrt{p_k}$$
and we arrived at the orthogonal expansion (1.23). □
The product basis tells us that
$$\dim(\mathcal{H}\otimes\mathcal{K}) = \dim(\mathcal{H})\,\dim(\mathcal{K}).$$
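The decomposition (1.23) is the Schmidt decomposition, and numerically it is just a singular value decomposition of the coefficient matrix $(c_{ij})$. A minimal sketch (function and variable names are ours):

```python
import numpy as np

def schmidt(c):
    """Schmidt decomposition of a unit vector with coefficient matrix c[i, j]."""
    U, s, Vh = np.linalg.svd(c)            # c = U diag(s) V^*
    p = s ** 2                             # Schmidt coefficients: a probability distribution
    return p, U.T, Vh                      # rows of U.T / Vh are the vectors g_k, h_k

# coefficient matrix of a unit vector in C^2 (x) C^3
c = np.array([[0.6, 0.0, 0.0],
              [0.0, 0.8, 0.0]])
p, g, h = schmidt(c)
x = c.reshape(-1)                          # the vector in the lexicographic product basis
y = sum(np.sqrt(pk) * np.kron(gk, hk) for pk, gk, hk in zip(p, g, h))
print(np.allclose(x, y), np.isclose(p.sum(), 1.0))   # True True
```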
Example 1.51 In the quantum formalism the orthonormal basis in the two-dimensional Hilbert space $\mathcal{H}$ is denoted as $|{\uparrow}\,\rangle, |{\downarrow}\,\rangle$. Instead of $|{\uparrow}\,\rangle\otimes|{\downarrow}\,\rangle$, the notation $|{\uparrow}{\downarrow}\,\rangle$ is used. Therefore the product basis is
$$|{\uparrow}{\uparrow}\,\rangle,\quad |{\uparrow}{\downarrow}\,\rangle,\quad |{\downarrow}{\uparrow}\,\rangle,\quad |{\downarrow}{\downarrow}\,\rangle.$$
Sometimes ${\uparrow}$ is replaced by 0 and ${\downarrow}$ by 1.

Another basis
$$\frac{1}{\sqrt 2}(|00\rangle + |11\rangle),\quad \frac{1}{\sqrt 2}(|01\rangle + |10\rangle),\quad \frac{i}{\sqrt 2}(|10\rangle - |01\rangle),\quad \frac{1}{\sqrt 2}(|00\rangle - |11\rangle)$$
is often used, which is called the Bell basis.
Example 1.52 In the Hilbert space $L^2(\mathbb{R}^2)$ we can get a basis if the space is considered as $L^2(\mathbb{R})\otimes L^2(\mathbb{R})$. In the space $L^2(\mathbb{R})$ the Hermite functions
$$\varphi_n(x) = \exp(-x^2/2)H_n(x)$$
form a good basis, where $H_n(x)$ is the appropriately normalized Hermite polynomial. Therefore, the two-variable Hermite functions
$$\varphi_{nm}(x,y) := e^{-(x^2+y^2)/2}H_n(x)H_m(y) \qquad (n, m = 0, 1, \dots)$$
form a basis in $L^2(\mathbb{R}^2)$.
The tensor product of linear transformations can be defined as well. If $A : \mathcal{H}_1 \to \mathcal{K}_1$ and $B : \mathcal{H}_2 \to \mathcal{K}_2$ are linear transformations, then there is a unique linear transformation $A\otimes B : \mathcal{H}_1\otimes\mathcal{H}_2 \to \mathcal{K}_1\otimes\mathcal{K}_2$ such that
$$(A\otimes B)(v_1\otimes v_2) = Av_1\otimes Bv_2 \qquad (v_1 \in \mathcal{H}_1,\ v_2 \in \mathcal{H}_2).$$
Since the linear mappings (between finite-dimensional Hilbert spaces) are identified with matrices, the tensor product of matrices appears as well.
Example 1.53 Let $e_1, e_2, e_3$ be a basis in $\mathcal{H}$ and $f_1, f_2$ be a basis in $\mathcal{K}$. If $[A_{ij}]$ is the matrix of $A \in B(\mathcal{H})$ and $[B_{kl}]$ is the matrix of $B \in B(\mathcal{K})$, then
$$(A\otimes B)(e_j\otimes f_l) = \sum_{i,k} A_{ij}B_{kl}\; e_i\otimes f_k.$$
It is useful to order the tensor product bases lexicographically: $e_1\otimes f_1,\ e_1\otimes f_2,\ e_2\otimes f_1,\ e_2\otimes f_2,\ e_3\otimes f_1,\ e_3\otimes f_2$. Fixing this ordering, we can write down the matrix of $A\otimes B$ and we have
$$\begin{pmatrix}
A_{11}B_{11} & A_{11}B_{12} & A_{12}B_{11} & A_{12}B_{12} & A_{13}B_{11} & A_{13}B_{12}\\
A_{11}B_{21} & A_{11}B_{22} & A_{12}B_{21} & A_{12}B_{22} & A_{13}B_{21} & A_{13}B_{22}\\
A_{21}B_{11} & A_{21}B_{12} & A_{22}B_{11} & A_{22}B_{12} & A_{23}B_{11} & A_{23}B_{12}\\
A_{21}B_{21} & A_{21}B_{22} & A_{22}B_{21} & A_{22}B_{22} & A_{23}B_{21} & A_{23}B_{22}\\
A_{31}B_{11} & A_{31}B_{12} & A_{32}B_{11} & A_{32}B_{12} & A_{33}B_{11} & A_{33}B_{12}\\
A_{31}B_{21} & A_{31}B_{22} & A_{32}B_{21} & A_{32}B_{22} & A_{33}B_{21} & A_{33}B_{22}
\end{pmatrix}.$$
In the block-matrix formalism we have
$$A\otimes B = \begin{pmatrix} A_{11}B & A_{12}B & A_{13}B\\ A_{21}B & A_{22}B & A_{23}B\\ A_{31}B & A_{32}B & A_{33}B \end{pmatrix}, \qquad (1.24)$$
see Section 2.1. The tensor product of matrices is also called the Kronecker product.
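The block pattern (1.24) is exactly what `numpy.kron` produces, so it can be checked mechanically; the matrices below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))

K = np.kron(A, B)                     # 6x6 Kronecker (tensor) product
# block (i, j) of K equals A[i, j] * B
blocks_ok = all(np.allclose(K[2*i:2*i+2, 2*j:2*j+2], A[i, j] * B)
                for i in range(3) for j in range(3))
print(K.shape, blocks_ok)             # (6, 6) True
```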
Example 1.54 When $A \in M_n$ and $B \in M_m$, the matrix
$$I_m\otimes A + B\otimes I_n \in M_{nm}$$
is called the Kronecker sum of $A$ and $B$.

If $u$ is an eigenvector of $A$ with eigenvalue $\lambda$ and $v$ is an eigenvector of $B$ with eigenvalue $\mu$, then
$$(I_m\otimes A + B\otimes I_n)(v\otimes u) = \lambda(v\otimes u) + \mu(v\otimes u) = (\lambda+\mu)(v\otimes u).$$
So $v\otimes u$ is an eigenvector of the Kronecker sum with eigenvalue $\lambda + \mu$.
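A quick numerical confirmation of the eigenvalue rule (NumPy, arbitrary small matrices): every eigenvalue of the Kronecker sum is a sum $\lambda_i + \mu_j$.

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])          # eigenvalues 2, 3
B = np.array([[1.0, 0.0], [4.0, 5.0]])          # eigenvalues 1, 5
n, m = A.shape[0], B.shape[0]

S = np.kron(np.eye(m), A) + np.kron(B, np.eye(n))   # Kronecker sum
got = np.sort(np.linalg.eigvals(S).real)
expected = np.sort([a + b for a in (2, 3) for b in (1, 5)])
print(np.allclose(got, expected))               # True
```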
The computation rules of the tensor product of Hilbert spaces imply straightforward properties of the tensor product of matrices (or linear operators).

Theorem 1.55 The following rules hold:

(1) $(A_1 + A_2)\otimes B = A_1\otimes B + A_2\otimes B$,

(2) $B\otimes(A_1 + A_2) = B\otimes A_1 + B\otimes A_2$,

(3) $(\lambda A)\otimes B = A\otimes(\lambda B) = \lambda(A\otimes B)$ $\ (\lambda \in \mathbb{C})$,

(4) $(A\otimes B)(C\otimes D) = AC\otimes BD$,

(5) $(A\otimes B)^{*} = A^{*}\otimes B^{*}$,

(6) $(A\otimes B)^{-1} = A^{-1}\otimes B^{-1}$ if $A$ and $B$ are invertible,

(7) $\|A\otimes B\| = \|A\|\,\|B\|$.

For example, the tensor product of self-adjoint matrices is self-adjoint, and the tensor product of unitaries is unitary.
The linear mapping $M_n\otimes M_m \to M_n$ defined as
$$\mathrm{Tr}_2 : A\otimes B \mapsto (\mathrm{Tr}\,B)A$$
is called a partial trace. The other partial trace is
$$\mathrm{Tr}_1 : A\otimes B \mapsto (\mathrm{Tr}\,A)B.$$
Example 1.56 Assume that $A \in M_n$ and $B \in M_m$. Then $A\otimes B$ is an $nm\times nm$ matrix. Let $C \in M_{nm}$. How can we decide if it has the form $A\otimes B$ for some $A \in M_n$ and $B \in M_m$?

First we study how to recognize $A$ and $B$ from $A\otimes B$. (Of course, $A$ and $B$ are not uniquely determined, since $(\lambda A)\otimes(\lambda^{-1}B) = A\otimes B$.) If we take the trace of all entries of (1.24), then we get
$$\begin{pmatrix} A_{11}\mathrm{Tr}\,B & A_{12}\mathrm{Tr}\,B & A_{13}\mathrm{Tr}\,B\\ A_{21}\mathrm{Tr}\,B & A_{22}\mathrm{Tr}\,B & A_{23}\mathrm{Tr}\,B\\ A_{31}\mathrm{Tr}\,B & A_{32}\mathrm{Tr}\,B & A_{33}\mathrm{Tr}\,B \end{pmatrix} = \mathrm{Tr}\,B \begin{pmatrix} A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\\ A_{31} & A_{32} & A_{33} \end{pmatrix} = (\mathrm{Tr}\,B)A.$$
The sum of the diagonal blocks is
$$A_{11}B + A_{22}B + A_{33}B = (\mathrm{Tr}\,A)B.$$
If $X = A\otimes B$, then
$$(\mathrm{Tr}\,X)X = (\mathrm{Tr}_2\,X)\otimes(\mathrm{Tr}_1\,X).$$
For example, the matrix
$$X := \begin{pmatrix} 0&0&0&0\\ 0&1&1&0\\ 0&1&1&0\\ 0&0&0&0 \end{pmatrix}$$
in $M_2\otimes M_2$ is not a tensor product. Indeed,
$$\mathrm{Tr}_1 X = \mathrm{Tr}_2 X = \begin{pmatrix} 1&0\\0&1 \end{pmatrix}$$
and their tensor product is the identity in $M_4$, which differs from $(\mathrm{Tr}\,X)X = 2X$.
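The two partial traces and the product test $(\mathrm{Tr}\,X)X = (\mathrm{Tr}_2 X)\otimes(\mathrm{Tr}_1 X)$ are easy to code for $M_n\otimes M_m$; the helper below (our naming) reshapes the $nm\times nm$ matrix into a 4-index tensor and traces out one factor.

```python
import numpy as np

def partial_traces(X, n, m):
    """Return (Tr_2 X, Tr_1 X) for X acting on C^n (x) C^m."""
    T = X.reshape(n, m, n, m)          # T[i, k, j, l] = X[(i,k), (j,l)]
    tr2 = np.einsum('ikjk->ij', T)     # trace over the second factor -> n x n
    tr1 = np.einsum('kikj->ij', T)     # trace over the first factor  -> m x m
    return tr2, tr1

X = np.array([[0., 0., 0., 0.],
              [0., 1., 1., 0.],
              [0., 1., 1., 0.],
              [0., 0., 0., 0.]])
tr2, tr1 = partial_traces(X, 2, 2)
print(tr2, tr1, sep='\n')                              # both equal the identity
print(np.allclose(np.trace(X) * X, np.kron(tr2, tr1))) # False: X is not a tensor product
```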
Let $\mathcal{H}$ be a Hilbert space. The $k$-fold tensor product $\mathcal{H}\otimes\dots\otimes\mathcal{H}$ is called the $k$th tensor power of $\mathcal{H}$, in notation $\mathcal{H}^{\otimes k}$. When $A \in B(\mathcal{H})$, then $A^{(1)}\otimes A^{(2)}\otimes\dots\otimes A^{(k)}$ is a linear operator on $\mathcal{H}^{\otimes k}$ and it is denoted by $A^{\otimes k}$. (Here the $A^{(i)}$'s are copies of $A$.)

$\mathcal{H}^{\otimes k}$ has two important subspaces, the symmetric and the antisymmetric ones. If $v_1, v_2, \dots, v_k \in \mathcal{H}$ are vectors, then their antisymmetric tensor product is the linear combination
$$v_1\wedge v_2\wedge\dots\wedge v_k := \frac{1}{\sqrt{k!}}\sum_\pi (-1)^{\sigma(\pi)}\, v_{\pi(1)}\otimes v_{\pi(2)}\otimes\dots\otimes v_{\pi(k)},$$
where the summation is over all permutations $\pi$ of the set $\{1, 2, \dots, k\}$ and $\sigma(\pi)$ is the number of inversions in $\pi$. The terminology "antisymmetric" comes from the property that an antisymmetric tensor changes its sign if two elements are exchanged. In particular, $v_1\wedge v_2\wedge\dots\wedge v_k = 0$ if $v_i = v_j$ for different $i$ and $j$.
The computational rules for the antisymmetric tensors are similar to (1.22):
$$\lambda(v_1\wedge v_2\wedge\dots\wedge v_k) = v_1\wedge v_2\wedge\dots\wedge v_{\ell-1}\wedge(\lambda v_\ell)\wedge v_{\ell+1}\wedge\dots\wedge v_k$$
for every $\ell$ and
$$(v_1\wedge v_2\wedge\dots\wedge v_{\ell-1}\wedge v\wedge v_{\ell+1}\wedge\dots\wedge v_k) + (v_1\wedge v_2\wedge\dots\wedge v_{\ell-1}\wedge v_\ell\wedge v_{\ell+1}\wedge\dots\wedge v_k)$$
$$= v_1\wedge v_2\wedge\dots\wedge v_{\ell-1}\wedge(v + v_\ell)\wedge v_{\ell+1}\wedge\dots\wedge v_k.$$
Lemma 1.57 The inner product of $v_1\wedge v_2\wedge\dots\wedge v_k$ and $w_1\wedge w_2\wedge\dots\wedge w_k$ is the determinant of the $k\times k$ matrix whose $(i,j)$ entry is $\langle v_i, w_j\rangle$.

Proof: The inner product is
$$\frac{1}{k!}\sum_{\pi,\kappa}(-1)^{\sigma(\pi)}(-1)^{\sigma(\kappa)}\langle v_{\pi(1)}, w_{\kappa(1)}\rangle\langle v_{\pi(2)}, w_{\kappa(2)}\rangle\cdots\langle v_{\pi(k)}, w_{\kappa(k)}\rangle$$
$$= \frac{1}{k!}\sum_{\pi,\kappa}(-1)^{\sigma(\pi)}(-1)^{\sigma(\kappa)}\langle v_1, w_{\kappa\pi^{-1}(1)}\rangle\langle v_2, w_{\kappa\pi^{-1}(2)}\rangle\cdots\langle v_k, w_{\kappa\pi^{-1}(k)}\rangle$$
$$= \frac{1}{k!}\sum_{\pi,\kappa}(-1)^{\sigma(\kappa\pi^{-1})}\langle v_1, w_{\kappa\pi^{-1}(1)}\rangle\langle v_2, w_{\kappa\pi^{-1}(2)}\rangle\cdots\langle v_k, w_{\kappa\pi^{-1}(k)}\rangle$$
$$= \sum_\pi(-1)^{\sigma(\pi)}\langle v_1, w_{\pi(1)}\rangle\langle v_2, w_{\pi(2)}\rangle\cdots\langle v_k, w_{\pi(k)}\rangle.$$
This is the determinant. □
It follows from the previous lemma that $v_1\wedge v_2\wedge\dots\wedge v_k \ne 0$ if and only if the vectors $v_1, v_2, \dots, v_k$ are linearly independent. The subspace spanned by the vectors $v_1\wedge v_2\wedge\dots\wedge v_k$ is called the $k$th antisymmetric tensor power of $\mathcal{H}$, in notation $\mathcal{H}^{\wedge k}$. So $\mathcal{H}^{\wedge k} \subset \mathcal{H}^{\otimes k}$.
Lemma 1.58 The linear extension of the map
$$x_1\otimes\dots\otimes x_k \mapsto \frac{1}{\sqrt{k!}}\, x_1\wedge\dots\wedge x_k$$
is the projection of $\mathcal{H}^{\otimes k}$ onto $\mathcal{H}^{\wedge k}$.

Proof: Let $P$ be the defined linear operator. First we show that $P^2 = P$:
$$P^2(x_1\otimes\dots\otimes x_k) = \frac{1}{(k!)^{3/2}}\sum_\pi(-1)^{\sigma(\pi)}\, x_{\pi(1)}\wedge\dots\wedge x_{\pi(k)}
= \frac{1}{(k!)^{3/2}}\sum_\pi(-1)^{\sigma(\pi)+\sigma(\pi)}\, x_1\wedge\dots\wedge x_k$$
$$= \frac{1}{\sqrt{k!}}\, x_1\wedge\dots\wedge x_k = P(x_1\otimes\dots\otimes x_k).$$
Moreover, $P = P^{*}$:
$$\langle P(x_1\otimes\dots\otimes x_k),\, y_1\otimes\dots\otimes y_k\rangle = \frac{1}{k!}\sum_\pi(-1)^{\sigma(\pi)}\prod_{i=1}^k\langle x_{\pi(i)}, y_i\rangle$$
$$= \frac{1}{k!}\sum_\pi(-1)^{\sigma(\pi^{-1})}\prod_{i=1}^k\langle x_i, y_{\pi^{-1}(i)}\rangle = \langle x_1\otimes\dots\otimes x_k,\, P(y_1\otimes\dots\otimes y_k)\rangle.$$
So $P$ is an orthogonal projection. □
Example 1.59 A transposition is a permutation of $\{1, 2, \dots, k\}$ which exchanges the place of two entries. For a transposition $\kappa$, there is a unitary $U_\kappa : \mathcal{H}^{\otimes k} \to \mathcal{H}^{\otimes k}$ such that
$$U_\kappa(v_1\otimes v_2\otimes\dots\otimes v_k) = v_{\kappa(1)}\otimes v_{\kappa(2)}\otimes\dots\otimes v_{\kappa(k)}.$$
Then
$$\mathcal{H}^{\wedge k} = \{x \in \mathcal{H}^{\otimes k} : U_\kappa x = -x \text{ for every } \kappa\}. \qquad (1.25)$$
The terminology "antisymmetric" comes from this description.
If $e_1, e_2, \dots, e_n$ is a basis in $\mathcal{H}$, then
$$\{e_{i(1)}\wedge e_{i(2)}\wedge\dots\wedge e_{i(k)} : 1 \le i(1) < i(2) < \dots < i(k) \le n\}$$
is a basis in $\mathcal{H}^{\wedge k}$. It follows that the dimension of $\mathcal{H}^{\wedge k}$ is
$$\binom{n}{k} \quad\text{if } k \le n,$$
otherwise for $k > n$ the power $\mathcal{H}^{\wedge k}$ has dimension 0. Consequently, $\mathcal{H}^{\wedge n}$ has dimension 1.
If $A \in B(\mathcal{H})$, then the transformation $A^{\otimes k}$ leaves the subspace $\mathcal{H}^{\wedge k}$ invariant. Its restriction is denoted by $A^{\wedge k}$, which is equivalently defined as
$$A^{\wedge k}(v_1\wedge v_2\wedge\dots\wedge v_k) = Av_1\wedge Av_2\wedge\dots\wedge Av_k.$$
For any operators $A, B \in B(\mathcal{H})$, we have
$$(A^{*})^{\wedge k} = (A^{\wedge k})^{*}, \qquad (AB)^{\wedge k} = A^{\wedge k}B^{\wedge k}$$
and
$$A^{\wedge n} = \lambda\times\text{identity}. \qquad (1.26)$$
The constant $\lambda$ is the determinant:

Theorem 1.60 For $A \in M_n$, the constant $\lambda$ in (1.26) is $\det A$.
Proof: If $e_1, e_2, \dots, e_n$ is a basis in $\mathcal{H}$, then in the space $\mathcal{H}^{\wedge n}$ the vector $e_1\wedge e_2\wedge\dots\wedge e_n$ forms a basis. We should compute $A^{\wedge n}(e_1\wedge e_2\wedge\dots\wedge e_n)$:
$$(A^{\wedge n})(e_1\wedge e_2\wedge\dots\wedge e_n) = (Ae_1)\wedge(Ae_2)\wedge\dots\wedge(Ae_n)$$
$$= \Big(\sum_{i(1)=1}^n A_{i(1),1}e_{i(1)}\Big)\wedge\Big(\sum_{i(2)=1}^n A_{i(2),2}e_{i(2)}\Big)\wedge\dots\wedge\Big(\sum_{i(n)=1}^n A_{i(n),n}e_{i(n)}\Big)$$
$$= \sum_{i(1),i(2),\dots,i(n)=1}^n A_{i(1),1}A_{i(2),2}\cdots A_{i(n),n}\; e_{i(1)}\wedge\dots\wedge e_{i(n)}$$
$$= \sum_\pi A_{\pi(1),1}A_{\pi(2),2}\cdots A_{\pi(n),n}\; e_{\pi(1)}\wedge\dots\wedge e_{\pi(n)}$$
$$= \sum_\pi A_{\pi(1),1}A_{\pi(2),2}\cdots A_{\pi(n),n}(-1)^{\sigma(\pi)}\; e_1\wedge\dots\wedge e_n.$$
Here we used the fact that $e_{i(1)}\wedge\dots\wedge e_{i(n)}$ can be non-zero only if the vectors $e_{i(1)}, \dots, e_{i(n)}$ are all different, in other words, this is a permutation of $e_1, e_2, \dots, e_n$. □
Example 1.61 Let $A \in M_n$ be a self-adjoint matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$. The corresponding eigenvectors $v_1, v_2, \dots, v_n$ form a good basis. The largest eigenvalue of the antisymmetric power $A^{\wedge k}$ is $\prod_{i=1}^k\lambda_i$:
$$A^{\wedge k}(v_1\wedge v_2\wedge\dots\wedge v_k) = Av_1\wedge Av_2\wedge\dots\wedge Av_k = \Big(\prod_{i=1}^k\lambda_i\Big)(v_1\wedge v_2\wedge\dots\wedge v_k).$$
All other eigenvalues can be obtained from the basis of the antisymmetric product (as in the proof of the next lemma).
The next lemma contains a relation of singular values with the antisymmetric powers.

Lemma 1.62 For $A \in M_n$ and for $k = 1, \dots, n$, we have
$$\prod_{i=1}^k s_i(A) = s_1(A^{\wedge k}) = \|A^{\wedge k}\|.$$

Proof: Since $|A|^{\wedge k} = |A^{\wedge k}|$, we may assume that $A \ge 0$. Then there exists an orthonormal basis $u_1, \dots, u_n$ of $\mathcal{H}$ such that $Au_i = s_i(A)u_i$ for all $i$. We have
$$A^{\wedge k}(u_{i(1)}\wedge\dots\wedge u_{i(k)}) = \Big(\prod_{j=1}^k s_{i(j)}(A)\Big)u_{i(1)}\wedge\dots\wedge u_{i(k)},$$
and so $\{u_{i(1)}\wedge\dots\wedge u_{i(k)} : 1 \le i(1) < \dots < i(k) \le n\}$ is a complete set of eigenvectors of $A^{\wedge k}$. Hence the assertion follows. □
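Numerically, $A^{\wedge k}$ can be represented by the $k$th compound matrix, whose entries are the $k\times k$ minors of $A$ indexed by increasing row and column subsets; Lemma 1.62 then says that the product of the $k$ largest singular values of $A$ equals the operator norm of this compound. A small self-contained check (helper names are ours):

```python
import numpy as np
from itertools import combinations

def compound(A, k):
    """k-th compound matrix (matrix of A^{wedge k}): k x k minors over increasing index sets."""
    n = A.shape[0]
    idx = list(combinations(range(n), k))
    return np.array([[np.linalg.det(A[np.ix_(r, c)]) for c in idx] for r in idx])

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
s = np.linalg.svd(A, compute_uv=False)          # singular values in decreasing order

for k in range(1, 5):
    lhs = np.prod(s[:k])
    rhs = np.linalg.norm(compound(A, k), 2)     # operator norm = largest singular value
    print(k, np.isclose(lhs, rhs))              # True for every k
```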
The symmetric tensor product of the vectors $v_1, v_2, \dots, v_k \in \mathcal{H}$ is
$$v_1\vee v_2\vee\dots\vee v_k := \frac{1}{\sqrt{k!}}\sum_\pi v_{\pi(1)}\otimes v_{\pi(2)}\otimes\dots\otimes v_{\pi(k)},$$
where the summation is over all permutations $\pi$ of the set $\{1, 2, \dots, k\}$ again. The linear span of the symmetric tensors is the symmetric tensor power $\mathcal{H}^{\vee k}$. Similarly to (1.25), we have
$$\mathcal{H}^{\vee k} = \{x \in \mathcal{H}^{\otimes k} : U_\kappa x = x \text{ for every } \kappa\}.$$
It follows immediately that $\mathcal{H}^{\vee k} \perp \mathcal{H}^{\wedge k}$ for any $k \ge 2$. Let $u \in \mathcal{H}^{\vee k}$ and $v \in \mathcal{H}^{\wedge k}$. Then
$$\langle u, v\rangle = \langle U_\kappa u, U_\kappa v\rangle = -\langle u, v\rangle$$
and $\langle u, v\rangle = 0$.

If $e_1, e_2, \dots, e_n$ is a basis in $\mathcal{H}$, then $\mathcal{H}^{\vee k}$ has the basis
$$\{e_{i(1)}\vee e_{i(2)}\vee\dots\vee e_{i(k)} : 1 \le i(1) \le i(2) \le \dots \le i(k) \le n\}.$$
Similarly to the proof of Lemma 1.57 we have
$$\langle v_1\vee v_2\vee\dots\vee v_k,\, w_1\vee w_2\vee\dots\vee w_k\rangle = \sum_\pi \langle v_1, w_{\pi(1)}\rangle\langle v_2, w_{\pi(2)}\rangle\cdots\langle v_k, w_{\pi(k)}\rangle.$$
The right-hand side is similar to a determinant, but the sign is not changing. The permanent is defined as
$$\mathrm{per}\,A = \sum_\pi A_{1,\pi(1)}A_{2,\pi(2)}\cdots A_{n,\pi(n)} \qquad (1.27)$$
similarly to the determinant formula (1.2).
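For small matrices the permanent can be evaluated directly from (1.27); the brute-force sketch below (exponential in $n$, for illustration only) also checks the value for the doubly stochastic matrix with all entries $1/n$ against $n!/n^n$, the bound discussed in the next section.

```python
import numpy as np
from itertools import permutations
from math import factorial

def per(A):
    """Permanent via formula (1.27); exponential cost, small matrices only."""
    n = A.shape[0]
    return sum(np.prod([A[i, p[i]] for i in range(n)]) for p in permutations(range(n)))

J = np.full((4, 4), 1 / 4)                        # doubly stochastic, all entries 1/n
print(np.isclose(per(J), factorial(4) / 4**4))    # True: the value n!/n^n
A = np.array([[1., 2.], [3., 4.]])
print(per(A), np.linalg.det(A))                   # 10.0 vs -2.0: same terms, no signs
```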
1.8 Notes and remarks
The history of matrices goes back to ancient times. Their rst appearance in
application to linear equations was in ancient China. The notion of determi-
nants preceded the introduction and development of matrices and linear alge-
bra. Determinants were rst studied by a Japanese mathematician Takakazu
Seki in 1683 and by Gottfried Leibnitz (1646-1716) in 1693. In 1750 Gabriel
Cramer (1704-1752) discovered his famous determinant-based formula of so-
lutions to systems of linear equations. From the 18th century to the beginning
of the 19th, theoretical studies of determinants were made by Vandermonde
(famous for the determinant named after him), Joseph-Louis Lagrange (1736-
1813) who characterized the maxima and minima of multivariate functions
by his method known as the method of Lagrange multipliers, Pierre-Simon
Laplace (1749-1827), and Augustin Louis Cauchy (1789-1857). Gaussian
elimination to solve systems of linear equations by successively eliminating
variables was developed around 1800 by Johann Carl Friedrich Gauss (1777-
1855). However, as mentioned at the beginning, a prototype of this method
appeared in the important Chinese text in ancient times. The method is also
referred to as Gauss-Jordan elimination since it was published in 1887 in an
extended form by Wilhelm Jordan.
As explained above, determinants had been more dominant than matrices
in the rst stage of the history of matrix theory up to the middle of the 19th
century. The modern treatment of matrices emerged when Arthur Cayley
(1821-1895) published in 1858 his monumental work, Memoir on the Theory of Matrices. Before that, in 1851 James Joseph Sylvester (1814-1897) introduced the term "matrix" after the Latin word for womb. Cayley studied
matrices in modern style by making connections with linear transformations.
The axiomatic denition of vector spaces was nally introduced by Giuseppe
Peano (1858-1932) in 1888. Computation of the determinant of concrete spe-
cial matrices has a huge literature; for example, the book Thomas Muir, A
Treatise on the Theory of Determinants (originally published in 1928) has
more than 700 pages.
In this book matrices are mostly complex matrices, which can be studied
from three dierent aspects. The rst aspect is algebraic. The nn matrices
form a -algebra with linear operations, product AB and adjoint A

as de-
scribed in the rst section. The second is the topological/analytic aspect of
matrices as described in Section 1.2. Since a matrix corresponds to a linear
transformation between nite-dimensional vector spaces, the operator norm
is naturally assigned to a matrix. It is also important that the n n ma-
trices form a Hilbert space with the Hilbert-Schmidt inner product. In this
respect, Section 1.2 may be a concise introduction to Hilbert spaces though
mostly restricted to the nite-dimensional case. The third aspect is the order
structure of matrices described in Section 1.6 and in further detail in Section
2.2. These three structures are closely related to each other and that is an
essential point for study of matrix analysis.
Cauchy proved in 1829 that the eigenvalues of a symmetric matrix are all
real numbers (see the proof of Theorem 1.23). The cofactor expansion for the
determinant in Theorem 1.31 was shown by Laplace. The famous Cayley-
Hamilton theorem (Theorem 1.18) for 2x2 and 3x3 matrices was contained in Cayley's work mentioned above, and later William Rowan Hamilton showed it for 4x4 matrices. The formula for inverse matrices (Theorem 1.33) was
also established by Cayley.
Spectral and polar decompositions are fundamental in operator theory.
The phase operator U of the polar decomposition A = U[A[ cannot always
be a unitary in the innite-dimensional case. However, a unitary U can
be chosen for matrices, though it is not unique. A ner decomposition for
general square matrices is the Jordan canonical form in Section 1.3, which
is due to Camille Jordan (1838-1922), a person different from the Jordan of Gauss-Jordan elimination.
The useful minimax principle in Theorem 1.27 is often called the Courant-
Fischer-Weyl minimax principle due to their contributions (see Section III.7
of [20] for details). A similar expression for singular values will be given in
(6.5), and another expression called Ky Fan's maximum principle for the sum $\sum_{j=1}^{k}\lambda_j$ of the $k$ largest eigenvalues of a self-adjoint matrix is also useful
in matrix theory. Theorem 1.30 is the Hadamard inequality established by
Jacques Hadamard in 1893. Weakly positive matrices were introduced by
Eugene P. Wigner in 1963. He showed that if the product of two or three
weakly positive matrices is self-adjoint, then it is positive denite.
Hilbert spaces and operators are essential in the mathematical formulation
of quantum mechanics. John von Neumann (1903-1957) introduced several
concepts in connection with operator/matrix theory and quantum physics in
Mathematische Grundlagen der Quantenmechanik, 1932. Nowadays, matrix
theory plays an essential role in the theory of quantum information as well,
see [74]. When quantum theory appeared in the 1920s, some matrices already
appeared in the work of Werner Heisenberg. Later the physicist Paul Adrien
Maurice Dirac (1902-1984) introduced the bra-ket notation, which is used
sometimes in this book. But for column vectors x, y, matrix theorists prefer
to write $x^{*}y$ for the inner product $\langle x|y\rangle$ and $xy^{*}$ for the rank one operator $|x\rangle\langle y|$.
Note that the Kronecker sum is often denoted by AB in the literature,
but in this book is the notation for the direct sum. The antisymmetric
and symmetric tensor products are used in the construction of antisymmet-
ric (Fermion) and symmetric (Boson) Fock spaces in the study of quantum
mechanics and quantum eld theory. The antisymmetric tensor product is a
powerful technique in matrix theory as well, as will be seen in Chapter 6.
Concerning the permanent (1.27), a famous conjecture of Van der Waerden made in 1926 was that if $A$ is an $n\times n$ doubly stochastic matrix then
$$\mathrm{per}\,A \ge \frac{n!}{n^n},$$
and the equality holds if and only if $A_{ij} = 1/n$ for all $1 \le i, j \le n$. (The proof was given in 1981 by G.P. Egorychev and D. Falikman. It is included in the book [87].)
1.9 Exercises
1. Let A : 1
2
1
1
, B : 1
3
1
2
and C : 1
4
1
3
be linear mappings.
Show that
rank AB + rank BC rank B + rank ABC.
(This is called Frobenius inequality).
52 CHAPTER 1. FUNDAMENTALS OF OPERATORS AND MATRICES
2. Let A : 1 1 be a linear mapping. Show that
dim ker A
n+1
= dim ker A+
n

k=1
dim(ranA
k
ker A).
3. Show that in the Schwarz inequality (1.3) the equality occurs if and
only if x and y are linearly dependent.
4. Show that
|x y|
2
+|x + y|
2
= 2|x|
2
+ 2|y|
2
for the norm in a Hilbert space. (This is called parallelogram law.)
5. Show the polarization identity (1.8).
6. Show that an orthonormal family of vectors is linearly independent.
7. Show that the vectors [x
1
, [x
2
, , . . . , [x
n
form an orthonormal basis in
an n-dimensional Hilbert space if and only if

i
[x
i
x
i
[ = I.
8. Show that Gram-Schmidt procedure constructs an orthonormal basis
e
1
, e
2
, . . . , e
n
. Show that e
k
is the linear combination of v
1
, v
2
, . . . , v
k
(1 k n).
9. Show that the upper triangular matrices form an algebra.
10. Verify that the inverse of an upper triangular matrix is upper triangular
if the inverse exists.
11. Compute the determinant of the matrix
_

_
1 1 1 1
1 2 3 4
1 3 6 10
1 4 10 20
_

_
.
Give an n n generalization.
12. Compute the determinant of the matrix
_

_
1 1 0 0
x h 1 0
x
2
hx h 1
x
3
hx
2
hx h
_

_
.
Give an n n generalization.
1.9. EXERCISES 53
13. Let A, B M
n
and
B
ij
= (1)
i+j
A
ij
(1 i, j n).
Show that det A = det B.
14. Show that the determinant of the Vandermonde matrix
_

_
1 1 1
a
1
a
2
a
n
.
.
.
.
.
.
.
.
.
.
.
.
a
n1
1
a
n1
2
a
n1
n
_

_
is

i<j
(a
j
a
i
).
15. Show the following properties:
([uv[)

= [vu[, ([u
1
v
1
[)([u
2
v
2
[) = v
1
, u
2
[u
1
v
2
[,
A([uv[) = [Auv[, ([uv[)A = [uA

v[ for all A B(1).


16. Let A, B B(1). Show that |AB| |A| |B|.
17. Let 1 be an n-dimensional Hilbert space. For A B(1) let |A|
2
:=

Tr A

A. Show that |A+B|


2
|A|
2
+|B|
2
. Is it true that |AB|
2

|A|
2
|B|
2
?
18. Find constants c(n) and d(n) such that
c(n)|A| |A|
2
d(n)|A|
for every matrix A M
n
(C).
19. Show that |A

A| = |A|
2
for every A B(1).
20. Let 1 be an n-dimensional Hilbert space. Show that given an operator
A B(1) we can choose an orthonormal basis such that the matrix of
A is upper triangular.
21. Let A, B M
n
be invertible matrices. Show that A+B is invertible if
and only if A
1
+ B
1
is invertible, moreover
(A+ B)
1
= A
1
A
1
(A
1
+ B
1
)
1
A
1
.
22. Let A M
n
be self-adjoint. Show that
U = (I iA)(I + iA)
1
is a unitary. (U is the Cayley transform of A.)
54 CHAPTER 1. FUNDAMENTALS OF OPERATORS AND MATRICES
23. The self-adjoint matrix
$$0 \le \begin{pmatrix} a & b\\ \bar b & c \end{pmatrix}$$
has eigenvalues $\lambda$ and $\mu$. Show that
$$|b|^2 \le \Big(\frac{\lambda-\mu}{\lambda+\mu}\Big)^2 ac. \qquad (1.28)$$
24. Show that
_
+ z x iy
x + iy z
_
1
=
1

2
x
2
y
2
z
2
_
z x + iy
x iy + z
_
for real parameters , x, y, z.
25. Let m n, A M
n
, B M
m
, Y M
nm
and Z M
mn
. Assume that
A and B are invertible. Show that A+Y BZ is invertible if and only if
B
1
+ ZA
1
Y is invertible. Moreover,
(A+ Y BZ)
1
= A
1
A
1
Y (B
1
+ ZA
1
Y )
1
ZA
1
.
26. Let
1
,
2
, . . . ,
n
be the eigenvalues of the matrix A M
n
(C). Show
that A is normal if and only if
n

i=1
[
i
[
2
=
n

i,j=1
[A
ij
[
2
.
27. Show that A M
n
is normal if and only if A

= AU for a unitary
U M
n
.
28. Give an example such that A
2
= A, but A is not an orthogonal projec-
tion.
29. A M
n
is called idempotent if A
2
= A. Show that each eigenvalue of
an idempotent matrix is either 0 or 1.
30. Compute the eigenvalues and eigenvectors of the Pauli matrices:
$$\sigma_1 = \begin{pmatrix} 0&1\\1&0 \end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix} 0&-i\\ i&0 \end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix} 1&0\\0&-1 \end{pmatrix}. \qquad (1.29)$$
31. Show that the Pauli matrices (1.29) are orthogonal to each other (with
respect to the HilbertSchmidt inner product). What are the matrices
which are orthogonal to all Pauli matrices?
1.9. EXERCISES 55
32. The n n Pascal matrix is dened as
P
ij
=
_
i + j 2
i 1
_
(1 i, j n).
What is the determinant? (Hint: Generalize the particular relation
_

_
1 1 1 1
1 2 3 4
1 3 6 10
1 4 10 20
_

_
=
_

_
1 0 0 0
1 1 0 0
1 2 1 0
1 3 3 1
_

_
1 1 1 1
0 1 2 3
0 0 1 3
0 0 0 1
_

_
to n n matrices.)
33. Let be an eigenvalue of a unitary operator. Show that [[ = 1.
34. Let A be an n n matrix and let k 1 be an integer. Assume that
A
ij
= 0 if j i + k. Show that A
nk
is the 0 matrix.
35. Show that [ det U[ = 1 for a unitary U.
36. Let U M
n
and u
1
, . . . , u
n
be n column vectors of U, i.e., U =
[u
1
u
2
. . . u
n
]. Prove that U is a unitary matrix if and only if u
1
, . . . , u
n

is an orthonormal basis of C
n
.
37. Let a matrix U = [u
1
u
2
. . . u
n
] M
n
be described by column vectors.
Assume that u
1
, . . . , u
k
are given and orthonormal in C
n
. Show that
u
k+1
, . . . , u
n
can be chosen in such a way that U will be a unitary matrix.
38. Compute det(I A) when A is the tridiagonal matrix (1.11).
39. Let U B(1) be a unitary. Show that
lim
n
1
n
n

i=1
U
n
x
exists for every vector x 1. (Hint: Consider the subspaces x 1 :
Ux = x and Ux x : x 1.) What is the limit
lim
n
1
n
n

i=1
U
n
?
(This is the ergodic theorem.)
56 CHAPTER 1. FUNDAMENTALS OF OPERATORS AND MATRICES
40. Let
[
0
=
1

2
([00 +[11) C
2
C
2
and
[
i
= (
i
I
2
)[
0
(i = 1, 2, 3)
by means of the Pauli matrices
i
. Show that [
i
: 0 i 3 is the
Bell basis.
41. Show that the vectors of the Bell basis are eigenvectors of the matrices

i
, 1 i 3.
42. Show the identity
[ [
0
=
1
2
3

k=0
[
k

k
[
in C
2
C
2
C
2
, where [ C
2
and [
i
C
2
C
2
is dened above.
43. Write the so-called Dirac matrices in the form of elementary tensor
(of two 2 2 matrices):

1
=
_

_
0 0 0 i
0 0 i 0
0 i 0 0
i 0 0 0
_

_
,
2
=
_

_
0 0 0 1
0 0 1 0
0 1 0 0
1 0 0 0
_

_
,

3
=
_

_
0 0 i 0
0 0 0 i
i 0 0 0
0 i 0 0
_

_
,
4
=
_

_
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
_

_
.
44. Give the dimension of 1
k
if dim(1) = n.
45. Let A B(/) and B B(1) be operators on the nite-dimensional
spaces 1 and /. Show that
det(AB) = (det A)
m
(det B)
n
,
where n = dim1 and m = dim/. (Hint: The determinant is the
product of the eigenvalues.)
46. Show that |AB| = |A| |B|.
1.9. EXERCISES 57
47. Use Theorem 1.60 to prove that det(AB) = det Adet B. (Hint: Show
that (AB)
k
= (A
k
)(B
k
).)
48. Let x
n
+c
1
x
n1
+ +c
n
be the characteristic polynomial of A M
n
.
Show that c
k
= Tr A
k
.
49. Show that
11 = (1 1) (1 1)
for a Hilbert space 1.
50. Give an example of A M
n
(C) such that the spectrum of A is in R
+
and A is not positive.
51. Let A M
n
(C). Show that A is positive if and only if X

AX is positive
for every X M
n
(C).
52. Let A B(1). Prove the equivalence of the following assertions: (i)
|A| 1, (ii) A

A I, and (iii) AA

I.
53. Let A M
n
(C). Show that A is positive if and only if Tr XA is positive
for every positive X M
n
(C).
54. Let |A| 1. Show that there are unitaries U and V such that
A =
1
2
(U + V ).
(Hint: Use Example 1.39.)
55. Show that a matrix is weakly positive if and only if it is the product of
two positive denite matrices.
56. Let V : C
n
C
n
C
n
be dened as V e
i
= e
i
e
i
. Show that
V

(A B)V = A B
for A, B M
n
(C). Conclude the Schur theorem.
57. Show that
[per (AB)[
2
per (AA

)per (B

B).
58. Let A M
n
and B M
m
. Show that
Tr (I
m
A + B I
n
) = mTr A+ nTr B.
58 CHAPTER 1. FUNDAMENTALS OF OPERATORS AND MATRICES
59. For a vector f 1 the linear operator a
+
(f) :
k
1
k+1
1 is dened
as
a
+
(f) v
1
v
2
v
k
= f v
1
v
2
v
k
.
Compute the adjoint of a
+
(f) which is denoted by a(f).
60. For A B(1) let T(A) :
k
1
k
1 be dened as
T(A) v
1
v
2
v
k
=
k

i=1
v
1
v
2
v
i1
Av
i
v
i+1
. . . v
k
.
Show that
T([fg[) = a
+
(f)a(g)
for f, g 1. (Recall that a and a
+
are dened in the previous exercise.)
61. The group
( =
__
a b
0 c
_
: a, b, c R, a ,= 0, c ,= 0
_
is locally compact. Show that the left invariant Haar measure can be
dened as
(H) =
_
H
p(A) dA,
where
A =
_
x y
0 z
_
, p(A) =
1
x
2
[z[
, dA = dxdy dz.
Show that the right invariant Haar measure is similar, but
p(A) =
1
[x[z
2
.
Chapter 2
Mappings and algebras
Mostly the statements and denitions are formulated in the Hilbert space
setting. The Hilbert space is always assumed to be nite-dimensional, so
instead of operator one can consider a matrix. The idea of block-matrices
provides quite a useful tool in matrix theory. Some basic facts on block-
matrices are in Section 2.1. Matrices have two primary structures; one is of
course their algebraic structure with addition, multiplication, adjoint, etc.,
and another is the order structure coming from the partial order of positive
semideniteness, as explained in Section 2.2. Based on this order one can
consider several notions of positivity for linear maps between matrix algebras,
which are discussed in Section 2.6.
2.1 Block-matrices
If 1
1
and 1
2
are Hilbert spaces, then 1
1
1
2
consists of all the pairs
(f
1
, f
2
), where f
1
1
1
and f
2
1
2
. The linear combinations of the pairs
are computed entrywise and the inner product is dened as
(f
1
, f
2
), (g
1
, g
2
) := f
1
, g
1
+f
2
, g
2
.
It follows that the subspaces (f
1
, 0) : f
1
1
1
and (0, f
2
) : f
2
1
2
are
orthogonal and span the direct sum 1
1
1
2
.
Assume that 1 = 1
1
1
2
, / = /
1
/
2
and A : 1 / is a linear
operator. A general element of 1 has the form (f
1
, f
2
) = (f
1
, 0) + (0, f
2
).
We have A(f
1
, 0) = (g
1
, g
2
) and A(0, f
2
) = (g

1
, g

2
) for some g
1
, g

1
/
1
and
g
2
, g

2
/
2
. The linear mapping A is determined uniquely by the following 4
linear mappings:
A
i1
: f
1
g
i
, A
i1
: 1
1
/
i
(1 i 2)
59
60 CHAPTER 2. MAPPINGS AND ALGEBRAS
and
A
i2
: f
2
g

i
, A
i2
: 1
2
/
i
(1 i 2).
We write A in the form
_
A
11
A
12
A
21
A
22
_
.
The advantage of this notation is the formula
_
A
11
A
12
A
21
A
22
_ _
f
1
f
2
_
=
_
A
11
f
1
+ A
12
f
2
A
21
f
1
+ A
22
f
2
_
.
(The right-hand side is A(f
1
, f
2
) written in the form of a column vector.)
Assume that e
i
1
, e
i
2
, . . . , e
i
m(i)
is a basis in 1
i
and f
j
1
, f
j
2
, . . . , f
j
n(j)
is a basis
in /
j
, 1 i, j 2. The linear operators A
ij
: 1
j
/
i
have a matrix [A
ij
]
with respect to these bases. Since
(e
1
t
, 0) : 1 t m(1) (0, e
2
u
) : 1 u m(2)
is a basis in 1 and similarly
(f
1
t
, 0) : 1 t n(1) (0, f
2
u
) : 1 u n(2)
is a basis in /, the operator A has an (n(1) + n(2)) (m(1) +m(2)) matrix
which is expressed by the n(i) m(j) matrices [A
ij
] as
[A] =
_
[A
11
] [A
12
]
[A
21
] [A
22
]
_
.
This is a 2 2 matrix with matrix entries and it is called a block-matrix.
The computation with block-matrices is similar to that of ordinary matri-
ces:
_
[A
11
] [A
12
]
[A
21
] [A
22
]
_

=
_
[A
11
]

[A
21
]

[A
12
]

[A
22
]

_
,
_
[A
11
] [A
12
]
[A
21
] [A
22
]
_
+
_
[B
11
] [B
12
]
[B
21
] [B
22
]
_
=
_
[A
11
] + [B
11
] [A
12
] + [B
12
]
[A
21
] + [B
21
] [A
22
] + [B
22
]
_
and
_
[A
11
] [A
12
]
[A
21
] [A
22
]
_

_
[B
11
] [B
12
]
[B
21
] [B
22
]
_
=
_
[A
11
] [B
11
] + [A
12
] [B
21
] [A
11
] [B
12
] + [A
12
] [B
22
]
[A
21
] [B
11
] + [A
22
] [B
21
] [A
21
] [B
12
] + [A
22
] [B
22
]
_
.
2.1. BLOCK-MATRICES 61
In several cases we do not emphasize the entries of a block-matrix
_
A B
C D
_
.
However, if this matrix is self-adjoint we assume that A = A

, B

= C
and D = D

. (These conditions include that A and D are square matrices,


A M
n
and B M
m
.)
The block-matrix is used for the denition of reducible matrices. A
M
n
is reducible if there is a permutation matrix P M
n
such that
P
t
AP =
_
B C
0 D
_
.
A matrix A M
n
is irreducible if it is not reducible.
For a $2\times 2$ matrix, it is very easy to check the positivity:
$$\begin{pmatrix} a & b\\ \bar b & c \end{pmatrix} \ge 0 \quad\text{if and only if}\quad a \ge 0 \ \text{ and }\ \bar b b \le ac.$$
If the entries are matrices, then the condition for positivity is similar, but it is a bit more complicated. It is obvious that a diagonal block-matrix
$$\begin{pmatrix} A & 0\\ 0 & D \end{pmatrix}$$
is positive if and only if the diagonal entries $A$ and $D$ are positive.

Theorem 2.1 Assume that $A$ is invertible. The self-adjoint block-matrix
$$\begin{pmatrix} A & B\\ B^{*} & C \end{pmatrix} \qquad (2.1)$$
is positive if and only if $A$ is positive and
$$B^{*}A^{-1}B \le C.$$
Proof: First assume that $A = I$. The positivity of
$$\begin{pmatrix} I & B\\ B^{*} & C \end{pmatrix}$$
is equivalent to the condition
$$\Big\langle (f_1, f_2),\ \begin{pmatrix} I & B\\ B^{*} & C \end{pmatrix}(f_1, f_2)\Big\rangle \ge 0$$
for all vectors $f_1$ and $f_2$. A computation gives that this condition is
$$\langle f_1, f_1\rangle + \langle f_2, Cf_2\rangle \ge -2\,\mathrm{Re}\,\langle Bf_2, f_1\rangle.$$
If we replace $f_1$ by $e^{i\theta}f_1$ with real $\theta$, then the left-hand side does not change, while the right-hand side becomes $2|\langle Bf_2, f_1\rangle|$ for an appropriate $\theta$. Choosing $f_1 = Bf_2$, we obtain the condition
$$\langle f_2, Cf_2\rangle \ge \langle f_2, B^{*}Bf_2\rangle$$
for every $f_2$. This means that positivity implies the condition $C \ge B^{*}B$. The converse is also true, since the right-hand side of the equation
$$\begin{pmatrix} I & B\\ B^{*} & C \end{pmatrix} = \begin{pmatrix} I & 0\\ B^{*} & 0 \end{pmatrix}\begin{pmatrix} I & B\\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0\\ 0 & C - B^{*}B \end{pmatrix}$$
is the sum of two positive block-matrices.

For a general positive invertible $A$, the positivity of (2.1) is equivalent to the positivity of the block-matrix
$$\begin{pmatrix} A^{-1/2} & 0\\ 0 & I \end{pmatrix}\begin{pmatrix} A & B\\ B^{*} & C \end{pmatrix}\begin{pmatrix} A^{-1/2} & 0\\ 0 & I \end{pmatrix} = \begin{pmatrix} I & A^{-1/2}B\\ B^{*}A^{-1/2} & C \end{pmatrix}.$$
This gives the condition $C \ge B^{*}A^{-1}B$. □

Another important characterization of the positivity of (2.1) is the condition that $A, C \ge 0$ and $B = A^{1/2}WC^{1/2}$ with a contraction $W$. (Here the invertibility of $A$ or $C$ is not necessary.)

Theorem 2.1 has applications in different areas, see for example the Cramér-Rao inequality, Section 7.5.
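In finite dimensions the criterion of Theorem 2.1 is easy to test numerically: form the Schur complement $C - B^{*}A^{-1}B$ and compare its smallest eigenvalue with that of the full block-matrix. A minimal sketch under a random construction of ours:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
G = rng.standard_normal((n, n))
A = G @ G.T + np.eye(n)                                # positive definite, invertible
B = rng.standard_normal((n, n))
C = np.eye(n)

M = np.block([[A, B], [B.T, C]])                       # self-adjoint block matrix
schur = C - B.T @ np.linalg.solve(A, B)                # C - B* A^{-1} B

pos_M = np.linalg.eigvalsh(M).min() >= -1e-12
pos_schur = np.linalg.eigvalsh(schur).min() >= -1e-12
print(pos_M == pos_schur)                              # True: the two criteria agree
```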
Theorem 2.2 For an invertible A, we have the so-called Schur factoriza-
tion
_
A B
C D
_
=
_
I 0
CA
1
I
_

_
A 0
0 D CA
1
B
_

_
I A
1
B
0 I
_
. (2.2)
The proof is simply the computation of the product on the right-hand side.
Since
_
I 0
CA
1
I
_
1
=
_
I 0
CA
1
I
_
is invertible, the positivity of the left-hand side of (2.2) with C = B

is
equivalent to the positivity of the middle factor of the right-hand side. This
fact gives the second proof of Theorem 2.1.
In the Schur factorization the rst factor is lower triangular, the second
factor is block diagonal and the third one is upper triangular. This structure
allows an easy computation of the determinant and the inverse.
2.1. BLOCK-MATRICES 63
Theorem 2.3 The determinant can be computed as follows.
det
_
A B
C D
_
= det A det (D CA
1
B).
If
M =
_
A B
C D
_
,
then D CA
1
B is called the Schur complement of A in M, in notation
M/A. Hence the determinant formula becomes det M = det A det (M/A).
Theorem 2.4 Let
M =
_
A B
B

C
_
be a positive invertible matrix. Then
M/C = ABC
1
B

= sup
_
X 0 :
_
X 0
0 0
_

_
A B
B

C
__
.
Proof: The condition
_
A X B
B

C
_
0
is equivalent to
AX BC
1
B

,
and this gives the result.
Theorem 2.5 For a block-matrix
0
_
A X
X

B
_
M
n
,
we have
_
A X
X

B
_
= U
_
A 0
0 0
_
U

+ V
_
0 0
0 B
_
V

for some unitaries U, V M
n
.
Proof: We can take
0
_
C Y
Y

D
_
M
n
such that
_
A X
X

B
_
=
_
C Y
Y

D
_ _
C Y
Y

D
_
=
_
C
2
+ Y Y

CV + Y D
Y

C + DY

Y + D
2
_
.
64 CHAPTER 2. MAPPINGS AND ALGEBRAS
It follows that
_
A X
X

B
_
=
_
C 0
Y

0
_ _
C Y
0 0
_
+
_
0 Y
0 D
_ _
0 0
Y

D
_
= T

T + S

S,
where
T =
_
C Y
0 0
_
and S =
_
0 0
Y

D
_
.
When T = U[T[ and S = V [S[ with the unitaries U, V M
n
, then
T

T = U(TT

)U

and S

S = V (SS

)V

.
From the formulas
TT

=
_
C
2
+ Y Y

0
0 0
_
=
_
A 0
0 0
_
, SS

=
_
0 0
0 Y

Y + D
2
_
=
_
0 0
0 B
_
,
we have the result.
Example 2.6 Similarly to the previous theorem we take a block-matrix
0
_
A X
X

B
_
M
n
.
With a unitary
W :=
1

2
_
iI I
iI I
_
we notice that
W
_
A X
X

B
_
W

=
_
A+B
2
+ ImX
AB
2
+ iRe X
AB
2
iRe X
A+B
2
ImX
_
.
So Theorem 2.5 gives
_
A X
X

B
_
= U
_
A+B
2
+ ImX 0
0 0
_
U

+ V
_
0 0
0
A+B
2
ImX
_
V

for some unitaries U, V M
n
.
We have two remarks. If C is not invertible, then the supremum in The-
orem 2.4 is A BC

, where C

is the Moore-Penrose generalized inverse.


The supremum of that theorem can be formulated without the block-matrix
formalism. Assume that P is an ortho-projection (see Section 2.3). Then
[P]M := supN : 0 N M, PN = N. (2.3)
2.1. BLOCK-MATRICES 65
If
P =
_
I 0
0 0
_
and M =
_
A B
B

C
_
,
then [P]M = M/C. The formula (2.3) makes clear that if Q is another
ortho-projection such that P Q, then [P]M [P]QMQ.
It follows from the factorization that for an invertible block-matrix
_
A B
C D
_
,
both A and D CA
1
B must be invertible. This implies that
_
A B
C D
_
1
=
_
I A
1
B
0 I
_

_
A
1
0
0 (DCA
1
B)
1
_

_
I 0
CA
1
I
_
.
After multiplication on the right-hand side, we have the following:
_
A B
C D
_
1
=
_
A
1
+ A
1
BW
1
CA
1
A
1
BW
1
W
1
CA
1
W
1
_
=
_
V
1
V
1
BD
1
D
1
CV
1
D
1
+ D
1
CV
1
BD
1
_
, (2.4)
where W = M/A := D CA
1
B and V = M/D := A BD
1
C.
Example 2.7 Let X
1
, X
2
, . . . , X
m+k
be real random variables with (Gaus-
sian) joint probability distribution
f
M
(z) :=

det M
(2)
m+k
exp
_

1
2
z, Mz
_
,
where z = (z
1
, z
2
, . . . , z
m+k
) and M is a positive denite real (m+k)(m+k)
matrix, see Example 1.43. We want to compute the distribution of the random
variables X
1
, X
2
, . . . , X
m
.
Let
M =
_
A B
B

D
_
be written in the form of a block-matrix, A is m m and D is k k. Let
z = (x
1
, x
2
), where x
1
R
m
and x
2
R
k
. Then the marginal of the Gaussian
probability distribution
f
M
(x
1
, x
2
) =

det M
(2)
m+k
exp
_

1
2
(x
1
, x
2
), M(x
1
, x
2
)
_
66 CHAPTER 2. MAPPINGS AND ALGEBRAS
on R
m
is the distribution
f
1
(x
1
) =

det M
(2)
m
det D
exp
_

1
2
x
1
, (ABD
1
B

)x
1

_
. (2.5)
We have
(x
1
, x
2
), M(x
1
, x
2
) = Ax
1
+ Bx
2
, x
1
+B

x
1
+ Dx
2
, x
2

= Ax
1
, x
1
+Bx
2
, x
1
+B

x
1
, x
2
+Dx
2
, x
2

= Ax
1
, x
1
+ 2B

x
1
, x
2
+Dx
2
, x
2

= Ax
1
, x
1
+D(x
2
+ Wx
1
), (x
2
+ Wx
1
) DWx
1
, Wx
1
,
where W = D
1
B

. We integrate on R
k
as
_
exp
_

1
2
(x
1
, x
2
)M(x
1
, x
2
)
t
_
dx
2
= exp
_

1
2
(Ax
1
, x
1
DWx
1
, Wx
1
)
_

_
exp
_

1
2
D(x
2
+ Wx
1
), (x
2
+ Wx
1
)
_
dx
2
= exp
_

1
2
(ABD
1
B

)x
1
, x
1

_
_
(2)
k
det D
and obtain (2.5).
This computation gives a proof of Theorem 2.3 (for a real positive denite
matrix) as well. If we know that f
1
(x
1
) is Gaussian, then its quadratic matrix
can be obtained from formula (2.4). The covariance of X
1
, X
2
, . . . , X
m+k
is
M
1
. Therefore, the covariance of X
1
, X
2
, . . . , X
m
is (A BD
1
B

)
1
. It
follows that the quadratic matrix is the inverse: A BD
1
B

M/D.
Theorem 2.8 Let A be a positive nn block-matrix with kk entries. Then
A is the sum of block-matrices B of the form [B]
ij
= X

i
X
j
for some k k
matrices X
1
, X
2
, . . . , X
n
.
Proof: A can be written as C

C for some
C =
_

_
C
11
C
12
. . . C
1n
C
21
C
22
. . . C
2n
.
.
.
.
.
.
.
.
.
.
.
.
C
n1
C
n2
. . . C
nn
_

_
.
Let B
i
be the block-matrix such that its ith row is the same as in C and all
other elements are 0. Then C = B
1
+ B
2
+ + B
n
and for t ,= i we have
B

t
B
i
= 0. Therefore,
A = (B
1
+B
2
+ +B
n
)

(B
1
+B
2
+ +B
n
) = B

1
B
1
+B

2
B
2
+ +B

n
B
n
.
2.1. BLOCK-MATRICES 67
The (i, j) entry of B

t
B
t
is C

ti
C
tj
; hence this matrix is of the required form.

Example 2.9 Let 1 be an n-dimensional Hilbert space and A B(1) be


a positive operator with eigenvalues
1

2

n
. If x, y 1 are
orthogonal vectors, then
[x, Ay[
2

1
+
n
_
2
x, Ax y, Ay,
which is called the Wielandt inequality. (It is also in Theorem 1.45.) The
argument presented here includes a block-matrix.
We can assume that x and y are unit vectors and we extend them to a
basis. Let
M =
_
x, Ax x, Ay
y, Ax y, Ay
_
,
and A has a block-matrix
_
M B
B

C
_
.
We can see that M 0 and its determinant is positive:
[x, Ay[
2
x, Ax y, Ay.
If
n
= 0, then the proof is complete. Now we assume that
n
> 0. Let
and be the eigenvalues of M. Formula (1.28) tells that
[x, Ay[
2

_

+
_
2
x, Ax y, Ay.
We need the inequality

+


1

1
+
n
when . This is true, since
1

n
.
As an application of the block-matrix technique, we consider the following
result, called the UL-factorization (or the Cholesky factorization).
Theorem 2.10 Let X be an nn invertible positive matrix. Then there is a
unique upper triangular matrix T with positive diagonal such that X = TT

.
68 CHAPTER 2. MAPPINGS AND ALGEBRAS
Proof: The proof can be done by mathematical induction on n. For n = 1
the statement is clear. We assume that the factorization is true for (n1)
(n 1) matrices and write X in the form
_
A B
B

C
_
, (2.6)
where A is an (invertible) (n 1) (n 1) matrix and C is a number. If
T =
_
T
11
T
12
0 T
22
_
is written in a similar form, then
TT

=
_
T
11
T

11
+ T
12
T

12
T
12
T

22
T
22
T

12
T
22
T

22
_
The condition X = TT

leads to the equations


T
11
T

11
+ T
12
T

12
= A,
T
12
T

22
= B,
T
22
T

22
= C.
If C = 0, then the positivity of (2.6) forces B = 0 so that we can apply the
induction hypothesis to A. So we may assume that C > 0. If T
22
is positive
(number), then T
22
=

C is the unique solution and moreover


T
12
= BC
1/2
, T
11
T

11
= ABC
1
B

.
From the positivity of (2.6), we have ABC
1
B

0 by Theorem 2.1. The


induction hypothesis gives that the latter can be written in the form of T
11
T

11
with an upper triangular T
11
. Therefore T is upper triangular, too.
If 0 A M
n
and 0 B M
m
, then 0 A B. More generally, if
0 A
i
M
n
and 0 B
i
M
m
, then
k

i=1
A
i
B
i
is positive. These matrices in M
n
M
m
are called separable positive ma-
trices. Is it true that every positive matrix in M
n
M
m
is separable? A
counterexample follows.
2.2. PARTIAL ORDERING 69
Example 2.11 Let M
4
= M
2
M
2
and
D :=
1
2
_

_
0 0 0 0
0 1 1 0
0 1 1 0
0 0 0 0
_

_
.
D is a rank 1 positive operator, it is a projection. If D =

i
D
i
, then
D
i
=
i
D. If D is separable, then it is a tensor product. If D is a tensor
product, then up to a constant factor it equals to (Tr
2
D) (Tr
1
D) (as noted
in Example 1.56). We have
Tr
1
D = Tr
2
D =
1
2
_
1 0
0 1
_
.
Their tensor product has rank 4 and it cannot be D. It follows that this D
is not separable.
In quantum theory the non-separable positive operators are called entan-
gled. The positive operator D is maximally entangled if it has minimal
rank (it means rank 1) and the partial traces have maximal rank. The matrix
D in the previous example is maximally entangled.
It is interesting that there is no eective procedure to decide if a positive
operator in a tensor product space is separable or entangled.
2.2 Partial ordering
Let A, B B(1) be self-adjoint operators. The partial ordering A B
holds if B A is positive, or equivalently
x, Ax x, Bx
for all vectors x. From this formulation one can easily see that A B implies
XAX

XBX

for every operator X.


Example 2.12 Assume that for the orthogonal projections P and Q the
inequality P Q holds. If Px = x for a unit vector x, then x, Px
x, Qx 1 shows that x, Qx = 1. Therefore the relation
|x Qx|
2
= x Qx, x Qx = x, x x, Qx = 0
gives that Qx = x. The range of Q includes the range of P.
70 CHAPTER 2. MAPPINGS AND ALGEBRAS
Let A
n
be a sequence of operators on a nite-dimensional Hilbert space.
Fix a basis and let [A
n
] be the matrix of A
n
. Similarly, the matrix of the
operator A is [A]. Let the Hilbert space be m-dimensional, so the matrices
are mm. Recall that the following conditions are equivalent:
(1) |AA
n
| 0.
(2) A
n
x Ax for every vector x.
(3) x, A
n
y x, Ay for every vectors x and y.
(4) x, A
n
x x, Ax for every vector x.
(5) Tr (AA
n
)

(AA
n
) 0
(6) [A
n
]
ij
[A]
ij
for every 1 i, j m.
These conditions describe in several ways the convergence of a sequence of
operators or matrices.
Theorem 2.13 Let A
n
be an increasing sequence of operators with an upper
bound: A
1
A
2
B. Then there is an operator A B such that
A
n
A.
Proof: Let
n
(x, y) := x, A
n
y be a sequence of complex bilinear function-
als. Then
n
(x, x) is a bounded increasing real sequence and it is convergent.
Due to the polarization identity
n
(x, y) is convergent as well and the limit
gives a complex bilinear functional . If the corresponding operator is denoted
by A, then
x, A
n
y x, Ay
for every vectors x and y. This is the convergence A
n
A. The condition
x, Ax x, Bx means A B.
Example 2.14 Assume that $0 \le A \le I$ for an operator $A$. Define a sequence $T_n$ of operators by recursion. Let $T_1 = 0$ and
$$T_{n+1} = T_n + \tfrac{1}{2}(A - T_n^2) \qquad (n \in \mathbb{N}).$$
$T_n$ is a polynomial of $A$ with real coefficients, so these operators commute with each other. Since
$$I - T_{n+1} = \tfrac{1}{2}(I - T_n)^2 + \tfrac{1}{2}(I - A),$$
induction shows that $T_n \le I$.

We show that $T_1 \le T_2 \le T_3 \le \dots$ by mathematical induction again. In the recursion
$$T_{n+1} - T_n = \tfrac{1}{2}\big((I - T_{n-1})(T_n - T_{n-1}) + (I - T_n)(T_n - T_{n-1})\big),$$
$I - T_{n-1} \ge 0$ and $T_n - T_{n-1} \ge 0$ due to the assumption. Since they commute, their product is positive. Similarly $(I - T_n)(T_n - T_{n-1}) \ge 0$. It follows that the right-hand side is positive.

Theorem 2.13 tells us that $T_n$ converges to an operator $B$. The limit of the recursion formula yields
$$B = B + \tfrac{1}{2}(A - B^2).$$
Therefore $A = B^2$. The example is a constructive proof of Theorem 1.38.
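The recursion converges for any positive contraction, so it gives a simple (if slow) way to compute a matrix square root without diagonalization. A short sketch of the iteration (the function name is ours; in practice a general $A \ge 0$ would first be rescaled by its norm):

```python
import numpy as np

def sqrt_iter(A, steps=100):
    """Square root of 0 <= A <= I via T_{n+1} = T_n + (A - T_n^2)/2."""
    T = np.zeros_like(A)
    for _ in range(steps):
        T = T + 0.5 * (A - T @ T)
    return T

# a positive contraction with known spectrum
rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([0.1, 0.3, 0.6, 0.9]) @ Q.T

B = sqrt_iter(A)
print(np.allclose(B @ B, A, atol=1e-8))   # True: B^2 recovers A
```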
Theorem 2.15 Assume that 0 < A, B M
n
are invertible matrices and
A B. Then B
1
A
1
Proof: The condition A B is equivalent to B
1/2
AB
1/2
I and
the statement B
1
A
1
is equivalent to I B
1/2
A
1
B
1/2
. If X =
B
1/2
AB
1/2
, then we have to show that X I implies X
1
I. The
condition X I means that all eigenvalues of X are in the interval (0, 1].
This implies that all eigenvalues of X
1
are in [1, ).
Assume that A B. It follows from (1.15) that the largest eigenvalue of
A is smaller than the largest eigenvalue of B. Let (A) = (
1
(A), . . . ,
n
(A))
denote the vector of the eigenvalues of A in decreasing order (with counting
multiplicities).
The next result is called Weyls monotonicity theorem.
Theorem 2.16 If A B, then
k
(A)
k
(B) for all k.
This is a consequence of the minimax principle, Theorem 1.27.
Corollary 2.17 Let A, B B(1) be self-adjoint operators.
(1) If A B, then Tr A Tr B.
(2) If 0 A B, then det A det B.
72 CHAPTER 2. MAPPINGS AND ALGEBRAS
Theorem 2.18 (Schur theorem) Let A and B be positive nn matrices.
Then
C
ij
= A
ij
B
ij
(1 i, j n)
determines a positive matrix.
Proof: If A
ij
=
i

j
and B
ij
=
i

j
, then C
ij
=
i

j
and C is positive
due to Example 1.40. The general case is reduced to this one.
The matrix C of the previous theorem is called the Hadamard (or Schur)
product of the matrices A and B. In notation, C = A B.
Corollary 2.19 Assume that 0 A B and 0 C D. Then A C
B D.
Proof: The equation
B D A C = (B A) D + A (D C)
implies the statement.
Theorem 2.20 (Oppenheims inequality) If 0 A, B M
n
, then
det(A B)
_
n

i=1
A
ii
_
det B.
Proof: For n = 1 the statement is obvious. The argument will be in
induction on n. We take the Schur complementation and the block-matrix
formalism
A =
_
a A
1
A
2
A
3
_
and B =
_
b B
1
B
2
B
3
_
,
where a, b [0, ). We may assume that a, b > 0. From the induction we
have
det(A
3
(B/b)) A
2,2
A
3,3
. . . A
n,n
det(B/b). (2.7)
From Theorem 2.3 we have det(A B) = ab det(A B/ab) and
A B/ab = A
3
B
3
(A
2
B
2
)a
1
b
1
(A
1
B
1
)
= A
3
(B/b) + (A/a) (B
2
B
1
b
1
).
The matrices A/a and B/b are positive, see Theorem 2.4. So the matrices
A
3
(B/b) and (A/a) (B
2
B
1
b
1
)
2.3. PROJECTIONS 73
are positive as well. So
det(A B) ab det(A
3
(B/b)).
Finally the inequality (2.7) gives
det(A B)
_
n

i=1
A
ii
_
b det(B/b).
Since det B = b det(B/b), the proof is complete.
A linear mapping : M
n
M
n
is called completely positive if it has
the form
(B) =
k

i=1
V

i
BV
i
for some matrices V
i
. The sum of completely positive mappings is completely
positive. (More details about completely positive mappings are in the Theo-
rem 2.49.)
Example 2.21 Let A M
n
be a positive matrix. The mapping S
A
: B
A B sends positive matrix to positive matrix. Therefore it is a positive
mapping.
We want to show that S
A
is completely positive. Since S
A
is additive in
A, it is enough to show the case A
ij
=
i

j
. Then
S
A
(B) = Diag(
1
,
2
, . . . ,
n
) BDiag(
1
,
2
, . . . ,
n
)
and S
A
is completely positive.
2.3 Projections
Let / be a closed subspace of a Hilbert space 1. Any vector x 1 can
be written in the form x
0
+ x
1
, where x
0
/ and x
1
/, see Theorem
1.11. The linear mapping P : x x
0
is called (orthogonal) projection
onto /. The orthogonal projection P has the properties P = P
2
= P

. If
an operator P B(1) satises P = P
2
= P

, then it is an (orthogonal)
projection (onto its range). Instead of orthogonal projection the terminology
ortho-projection is also used.
The partial ordering is very simple for projections, see Example 2.12. If
P and Q are projections, then the relation P Q means that the range
74 CHAPTER 2. MAPPINGS AND ALGEBRAS
of P is included in the range of Q. An equivalent algebraic formulation is
PQ = P. The largest projection in M
n
is the identity I and the smallest one
is 0. Therefore 0 P I for any projection P M
n
.
Example 2.22 In M
2
the non-trivial ortho-projections have rank 1 and they
have the form
P =
1
2
_
1 + a
3
a
1
ia
2
a
1
+ ia
2
1 a
3
_
,
where a
1
, a
2
, a
3
R and a
2
1
+ a
2
2
+ a
2
3
= 1. In terms of the Pauli matrices

0
=
_
1 0
0 1
_
,
1
=
_
0 1
1 0
_
,
2
=
_
0 i
i 0
_
,
3
=
_
1 0
0 1
_
(2.8)
we have
P =
1
2
_

0
+
3

i=1
a
i

i
_
.
An equivalent formulation is P = [xx[, where x C
2
is a unit vector. This
can be extended to an arbitrary ortho-projection Q M
n
(C):
Q =
k

i=1
[x
i
x
i
[,
where the set x
i
: 1 i k is a family of orthogonal unit vectors in C
n
.
(k is the rank of the image of Q, or Tr Q.)
If P is a projection, then I P is a projection as well and it is often
denoted by P

, since the range of I P is the orthogonal complement of the


range of P.
Example 2.23 Let P and Q be projections. The relation P Q means
that the range of P is orthogonal to the range of Q. An equivalent algebraic
formulation is PQ = 0. Since the orthogonality relation is symmetric, PQ = 0
if and only if QP = 0. (We can arrive at this statement by taking adjoint as
well.)
We show that P Q if and only if P + Q is a projection as well. P + Q
is self-adjoint and it is a projection if
(P + Q)
2
= P
2
+ PQ+ QP + Q
2
= P + Q+ PQ+ QP = P + Q
or equivalently
PQ+ QP = 0.
2.3. PROJECTIONS 75
This is true if P Q. On the other hand, the condition PQ+QP = 0 implies
that PQP + QP
2
= PQP + QP = 0 and QP must be self-adjoint. We can
conclude that PQ = 0 which is the orthogonality.
Assume that P and Q are projections on the same Hilbert space. Among
the projections which are smaller than P and Q there is the largest, it is the
orthogonal projection onto the intersection of the ranges of P and Q. This
has the notation P Q.
Theorem 2.24 Assume that $P$ and $Q$ are ortho-projections. Then
$$P\wedge Q = \lim_{n\to\infty}(PQP)^n = \lim_{n\to\infty}(QPQ)^n.$$

Proof: The operator $A := PQP$ is a positive contraction. Therefore the sequence $A^n$ is monotone decreasing and Theorem 2.13 implies that $A^n$ has a limit $R$. The operator $R$ is self-adjoint. Since $(A^n)^2 \to R^2$ we have $R = R^2$; in other words, $R$ is an ortho-projection. If $Px = x$ and $Qx = x$ for a vector $x$, then $Ax = x$ and it follows that $Rx = x$. This means that $R \ge P\wedge Q$.

From the inequality $PQP \le P$, $R \le P$ follows. Taking the limit of $(PQP)^nQ(PQP)^n = (PQP)^{2n+1}$, we have $RQR = R$. From this we have $R(I-Q)R = 0$ and $(I-Q)R = 0$. This gives $R \le Q$.

It has been proved that $R \le P, Q$ and $R \ge P\wedge Q$. So $R = P\wedge Q$ is the only possibility. □
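The limit formula suggests a direct numerical way to compute the projection onto the intersection of two ranges: iterate $B \mapsto PQP$ and take high powers (here by repeated squaring). A small sketch, with two projections in $\mathbb{R}^3$ whose ranges meet in a line (the helper name is ours):

```python
import numpy as np

def proj_onto(V):
    """Orthogonal projection onto the column space of V."""
    return V @ np.linalg.pinv(V)

P = proj_onto(np.array([[1., 0.], [0., 1.], [0., 0.]]))   # range: span{e1, e2}
Q = proj_onto(np.array([[0., 1.], [1., 0.], [0., 1.]]))   # range: span{e2, e1+e3}

R = P @ Q @ P
for _ in range(60):          # high powers of PQP via repeated squaring
    R = R @ R
print(np.round(R, 6))        # projection onto span{e2}, the intersection of the ranges
```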
Corollary 2.25 Assume that P and Q are ortho-projections and 0 H
P, Q. Then H P Q.
Proof: Since (I P)H(I P) = 0 implies H
1/2
(I P) = 0, we have
H
1/2
P = H
1/2
so that PHP = H, and similarly QHQ = H. These imply
(PQP)
n
H(PQP)
n
= H and the limit n gives RHR = H, where
R = P Q. Hence H R.
Let P and Q be ortho-projections. If the ortho-projection R has the prop-
erty R P, Q, then the image of R includes the images of P and Q. The
smallest such R projects to the linear subspace generated by the images of
P and Q. This ortho-projection is denoted by P Q. The set of ortho-
projections becomes a lattice with the operations and . However, the
so-called distributivity
A (B C) = (A B) (A C)
is not true.
76 CHAPTER 2. MAPPINGS AND ALGEBRAS
Example 2.26 We show that any operator X M
n
(C) is a linear combina-
tion of ortho-projections. We write
X =
1
2
(X + X

) +
1
2i
(iX iX

),
where X+X

and iXiX

are self-adjoint operators. Therefore, it is enough


to nd linear combination of ortho-projections for self-adjoint operators. This
is essentially the spectral decomposition (1.13).
Assume that
0
is dened on projections of M
n
(C) and it has the properties

0
(0) = 0,
0
(I) = 1,
0
(P + Q) =
0
(P) +
0
(Q) if P Q.
It is a famous theorem of Gleason that in the case n > 2 the mapping
0
has a linear extension : M
n
(C) C. The linearity implies the form
(X) = Tr X (X M
n
(C))
with a matrix M
n
(C). However, from the properties of
0
we have 0
and Tr = 1. Such a is usually called a density matrix in the quantum
applications. It is clear that if has rank 1, then it is a projection.
In quantum information theory the traditional variance is
Var

(A) = Tr A
2
(Tr A)
2
(2.9)
when is a density matrix and A M
n
(C) is a self-adjoint operator. This is
the straightforward analogy of the variance in probability theory; a standard
notation is A
2
A
2
in both formalism. We note that for two self-adjoint
operators the notion is covariance:
Cov

(A, B) = Tr AB (Tr A)(Tr B).


It is rather dierent from probability theory that the variance (2.9) can be
strictly positive even in the case where has rank 1. If has rank 1, then it
is an ortho-projection of rank 1 and it is also called as pure state.
It is easy to show that
Var

(A+ I) = V ar

(A) for R
and the concavity of the variance functional Var

(A):
Var

(A)

i
Var

i
(A) if =

i
.
2.3. PROJECTIONS 77
(Here
i
0 and

i
= 1.)
The formulation is easier if is diagonal. We can change the basis of the
n-dimensional space such that = Diag(p
1
, p
2
, . . . , p
n
); then we have
Var

(A) =

i,j
p
i
+ p
j
2
[A
ij
[
2

i
p
i
A
ii
_
2
. (2.10)
In the projection example P = Diag(1, 0, . . . , 0), formula (2.10) gives
Var
P
(A) =

i=1
[A
1i
[
2
and this can be strictly positive.
Theorem 2.27 Let be a density matrix. Take all the decompositions such
that
=

i
q
i
Q
i
, (2.11)
where Q
i
are pure states and (q
i
) is a probability distribution. Then
Var

(A) = sup
_

i
q
i
_
Tr Q
i
A
2
(Tr Q
i
A)
2
_
_
, (2.12)
where the supremum is over all decompositions (2.11).
The proof will be an application of matrix theory. The rst lemma contains
a trivial computation on block-matrices.
Lemma 2.28 Assume that
=
_

0
0 0
_
,
i
=
_

i
0
0 0
_
, A =
_
A

B
B

C
_
and
=

i
,

i
.
Then
_
Tr

(A

)
2
(Tr

)
2
_

i
_
Tr

i
(A

)
2
(Tr

i
A

)
2
_
= (Tr A
2
(Tr A)
2
)

i
_
Tr
i
A
2
(Tr
i
A)
2
_
.
78 CHAPTER 2. MAPPINGS AND ALGEBRAS
This lemma shows that if M
n
(C) has a rank k < n, then the computa-
tion of a variance Var

(A) can be reduced to k k matrices. The equality in


(2.12) is rather obvious for a rank 2 density matrix and due to the previous
lemma the computation will be with 2 2 matrices.
Lemma 2.29 For a rank 2 matrix the equality holds in (2.12).
Proof: Due to Lemma 2.28 we can make a computation with 22 matrices.
We can assume that
=
_
p 0
0 1 p
_
, A =
_
a
1
b
b a
2
_
.
Then
Tr A
2
= p(a
2
1
+[b[
2
) + (1 p)(a
2
2
+[b[
2
).
We can assume that
Tr A = pa
1
+ (1 p)a
2
= 0.
Let
Q
1
=
_
p c e
i
c e
i
1 p
_
,
where c =
_
p(1 p). This is a projection and
Tr Q
1
A = a
1
p + a
2
(1 p) + bc e
i
+ bc e
i
= 2c Re b e
i
.
We choose such that Re b e
i
= 0. Then Tr Q
1
A = 0 and
Tr Q
1
A
2
= p(a
2
1
+[b[
2
) + (1 p)(a
2
2
+[b[
2
) = Tr A
2
.
Let
Q
2
=
_
p c e
i
c e
i
1 p
_
.
Then
=
1
2
Q
1
+
1
2
Q
2
and we have
1
2
(Tr Q
1
A
2
+ Tr Q
2
A
2
) = p(a
2
1
+[b[
2
) + (1 p)(a
2
2
+[b[
2
) = Tr A
2
.
Therefore we have an equality.
We denote by r() the rank of an operator . The idea of the proof is to
reduce the rank and the block diagonal formalism will be used.
2.3. PROJECTIONS 79
Lemma 2.30 Let be a density matrix and A = A

be in M
n
(C). Assume
the block-matrix forms
=
_

1
0
0
2
_
, A =
_
A
1
A
2
A

2
A
3
_
.
and r(
1
), r(
2
) > 1. We construct

:=
_

1
X

X
2
_
such that
Tr A = Tr

A,

0, r(

) < r().
Proof: The condition Tr A = Tr

A is equivalent to Tr XA
2
+Tr X

2
=
0 and this holds if and only if Re Tr XA
2
= 0.
We can have unitaries U and W such that U
1
U

and W
2
W

are diagonal:
U
1
U

= Diag(0, . . . , 0, a
1
, . . . , a
k
), W
2
W

= Diag(b
1
, . . . , b
l
, 0, . . . , 0)
where a
i
, b
j
> 0. Then has the same rank as the matrix
_
U 0
0 W
_

_
U

0
0 W

_
=
_
U
1
U

0
0 W
2
W

_
,
the rank is k + l. A possible modication of this matrix is Y :=
_

_
Diag(0, . . . , 0, a
1
, . . . , a
k1
) 0 0 0
0 a
k

a
k
b
1
0
0

a
k
b
1
b
1
0
0 0 0 Diag(b
2
, . . . , b
l
, 0, . . . , 0)
_

_
=
_
U
1
U

M
M W
2
W

_
and r(Y ) = k + l 1. So Y has a smaller rank than . Next we take
_
U

0
0 W

_
Y
_
U 0
0 W
_
=
_

1
U

MW
W

MU
2
_
which has the same rank as Y . If X
1
:= W

MU is multiplied with e
i
( > 0),
then the positivity condition and the rank remain. On the other hand, we can
choose > 0 such that Re Tr e
i
X
1
A
2
= 0. Then X := e
i
X
1
is the matrix
we wanted.
80 CHAPTER 2. MAPPINGS AND ALGEBRAS
Lemma 2.31 Let be a density matrix of rank m > 0 and A = A

be in
M
n
(C). We claim the existence of a decomposition
= p

+ (1 p)
+
,
such that r(

) < m, r(
+
) < m, and
Tr A
+
= Tr A

= Tr A.
Proof: By unitary transformation we can get to the setup of the previous
lemma:
=
_

1
0
0
2
_
, A =
_
A
1
A
2
A

2
A
3
_
.
With

in the previous lemma we choose

+
=

=
_

1
X

X
2
_
,

=
_

1
X

X
2
_
.
Then
=
1
2

+
1
2

+
and the requirements Tr A
+
= Tr A

= Tr A also hold.
Proof of Theorem 2.27: For rank 2 states, the theorem is true because of
Lemma 2.29. Any state with a rank larger than 2 can be decomposed into
the mixture of lower rank states, according to Lemma 2.31, that have the
same expectation value for A, as the original has. The lower rank states
can then be decomposed into the mixture of states with an even lower rank,
until we reach states of rank 2. Thus, any state can be decomposed into
the mixture of pure states
=

p
k
Q
k
such that Tr AQ
k
= Tr A. Hence the statement of the theorem follows.
2.4 Subalgebras
A unital -subalgebra of M
n
(C) is a subspace / that contains the identity
I and is closed under matrix multiplication and adjoint. That is, if A, B /,
then so are AB and A

. In what follows, to simplify the notation, we shall


write subalgebra for all -subalgebras.
2.4. SUBALGEBRAS 81
Example 2.32 A simple subalgebra is
/ =
__
z w
w z
_
: z, w C
_
M
2
(C).
Since A, B / implies AB = BA, this is a commutative subalgebra. In
terms of the Pauli matrices (2.8) we have
/ = z
0
+ w
1
: z, w C .
This example will be generalized.
Assume that P
1
, P
2
, . . . , P
n
are projections of rank 1 in M
n
(C) such that
P
i
P
j
= 0 for i ,= j and

i
P
i
= I. Then
/ =
_
n

i=1

i
P
i
:
i
C
_
is a maximal commutative -subalgebra of M
n
(C). The usual name is MASA
which indicates the expression of maximal Abelian subalgebra.
Let / be any subset of M
n
(C). Then /

, the commutant of /, is given


by
/

= B M
n
(C) : BA = AB for all A /.
It is easy to see that for any set / M
n
(C), /

is a subalgebra. If / is a
MASA, then /

= /.
Theorem 2.33 If / M
n
(C) is a unital -subalgebra, then /

= /.
Proof: We rst show that for any -subalgebra /, B /

and any v C
n
,
there exists an A / such that Av = Bv. Let / be the subspace of C
n
given
by
/ = Av : A /.
Let P be the orthogonal projection onto / in C
n
. Since, by construction, /
is invariant under the action of /, PAP = AP for all A /. Taking the
adjoint, PA

P = PA

for all A /. Since / is a -algebra, this implies


PA = AP for all A /. That is, P /

. Thus, for any B /

, BP = PB
and so / is invariant under the action of /

. In particular, Bv / and
hence, by the denition of /, Bv = Av for some A /.
We apply the previous statement to the -subalgebra
/= AI
n
: A / M
n
(C) M
n
(C) = M
n
2(C).
82 CHAPTER 2. MAPPINGS AND ALGEBRAS
It is easy to see that
/

= B I
n
: B /

M
n
(C) M
n
(C).
Now let v
1
, . . . , v
n
be any basis of C
n
and form the vector
v =
_

_
v
1
v
2
.
.
.
v
n
_

_
C
n
2
.
Then
(AI
n
)v = (B I
n
)v
and Av
j
= Bv
j
for every 1 j n. Since v
1
, . . . , v
n
is a basis of C
n
, this
means B = A /. Since B was an arbitrary element of /

, this shows that


/

/. Since / /

is an automatic consequence of the denitions, this


proves that /

= /.
Next we study subalgebras / B M
n
(C). A conditional expectation
c : B / is a unital positive mapping which has the property
c(AB) = Ac(B) for every A / and B B.
Choosing B = I, we obtain that c acts identically on /. It follows from the
positivity of c that c(C

) = c(C)

. Therefore, c(BA) = c(B)A for every


A / and B B. Another standard notation for a conditional expectation
B / is c
B
A
.
Theorem 2.34 Assume that / B M
n
(C). If : / B is the embed-
ding, then the dual c : B / of with respect to the Hilbert-Schmidt inner
product is a conditional expectation.
Proof: From the denition
Tr (A)B = Tr Ac(B) (A /, B B)
of the dual, we see that c : B / is a positive unital mapping and c(A) = A
for every A /. For every A, A
1
/ and B B we further have
Tr Ac(A
1
B) = Tr (A)A
1
B = Tr (AA
1
)B = Tr AA
1
c(B),
which implies that c(A
1
B) = A
1
c(B).
Note that a conditional expectation c : B / has norm 1, that is,
|c(B)| |B| for every B B. This is seen from Corollary 2.45.
The subalgebras /
1
, /
2
M
n
(C) cannot be orthogonal since I is in /
1
and in /
2
. They are called complementary or quasi-orthogonal if A
i
/
i
and Tr A
i
= 0 for i = 1, 2 imply that Tr A
1
A
2
= 0.
2.4. SUBALGEBRAS 83
Example 2.35 In M
2
(C) the subalgebras
/
i
:= a
0
+ b
i
: a, b C (1 i 3)
are commutative and quasi-orthogonal. This follows from the facts that
Tr
i
= 0 for 1 i 3 and

2
= i
3
,
2

3
= i
1

3

1
= i
2
.
So M
2
(C) has 3 quasi-orthogonal MASAs.
In M
4
(C) = M
2
(C)M
2
(C) we can give 5 quasi-orthogonal MASAs. Each
MASA is the linear combimations of 4 operators:

0
,
0

1
,
1

0
,
1

1
,

0
,
0

2
,
2

0
,
2

2
,

0
,
0

3
,
3

0
,
3

3
,

0
,
1

2
,
2

3
,
3

1
,

0
,
1

3
,
2

1
,
3

2
.

Theorem 2.36 Assume that /


i
: 1 i k is a set of quasi-orthogonal
MASAs in M
n
(C). Then k n + 1.
Proof: The argument is rather simple. The traceless part of M
n
(C) has
dimension n
2
1 and the traceless part of a MASA has dimension n 1.
Therefore k (n
2
1)/(n 1) = n + 1.
The maximal number of quasi-orthogonal MASAs is a hard problem. For
example, if n = 2
m
, then n +1 is really possible, but for an arbitrary n there
is no denite result.
The next theorem gives a characterization of complementarity.
Theorem 2.37 Let /
1
and /
2
be subalgebras of M
n
(C) and the notation
= Tr /n is used. The following conditions are equivalent:
(i) If P /
1
and Q /
2
are minimal projections, then (PQ) = (P)(Q).
(ii) The subalgebras /
1
and /
2
are quasi-orthogonal in M
n
(C).
(iii) (A
1
A
2
) = (A
1
)(A
2
) if A
1
/
1
, A
2
/
2
.
(iv) If c
1
: M
n
(C) /
1
is the trace-preserving conditional expectation, then
c
1
restricted to /
2
is a linear functional (times I).
84 CHAPTER 2. MAPPINGS AND ALGEBRAS
Proof: Note that ((A
1
(A
1
)I)(A
2
(A
2
)I)) = 0 and (A
1
A
2
) =
(A
1
)(A
2
) are equivalent. If they hold for minimal projections, they hold
for arbitrary operators as well. Moreover, (iv) is equivalent to the property
(A
1
c
1
(A
2
)) = (A
1
((A
2
)I)) for every A
1
/
1
and A
2
/
2
, and note that
(A
1
c
1
(A
2
)) = (A
1
A
2
).
Example 2.38 A simple example for quasi-orthogonal subalgebras can be
formulated with tensor product. If / = M
n
(C) M
n
(C), /
1
= M
n
(C)
CI
n
/ and /
2
= CI
n
M
n
(C) /, then /
1
and /
2
are quasi-orthogonal
subalgebras of /. This comes from the property Tr (A B) = Tr A Tr B.
For n = 2 we give another example formulated by the Pauli matrices. The
4-dimensional subalgebra /
1
= M
2
(C) CI
2
is the linear combination of the
set

0
,
1

0
,
2

0
,
3

0
.
Together with the identity, each of the following triplets linearly spans a
subalgebra /
j
isomorphic to M
2
(C) (2 j 4):

1
,
3

2
,
0

3
,

3
,
2

1
,
0

2
,

2
,
1

3
,
0

1
.
It is easy to check that the subalgebras /
1
, . . . , /
4
are complementary.
The orthogonal complement of the four subalgebras is spanned by
0

3
,
3

0
,
3

3
. The linear combination together with
0

0
is a
commutative subalgebra.
The previous example is the general situation for M
4
(C). This will be
the content of the next theorem. It is easy to calculate that the number of
complementary subalgebras isomorphic to M
2
(C) is at most (16 1)/3 = 5.
However, the next theorem says that 5 is not possible.
If x = (x
1
, x
2
, x
3
) R
3
, then the notation
x = x
1

1
+ x
2

2
+ x
3

3
will be used and called a Pauli triplet.
Theorem 2.39 Assume that /
i
: 0 i 3 is a family of pairwise quasi-
orthogonal subalgebras of M
4
(C) which are isomorphic to M
2
(C). For every
0 i 3, there exists a Pauli triplet A(i, j) (j ,= i) such that /

i
/
j
is the
linear span of I and A(i, j). Moreover, the subspace linearly spanned by
I and
_
3
_
i=0
/
i
_

2.4. SUBALGEBRAS 85
is a maximal Abelian subalgebra.
Proof: Since the intersection /

0
/
j
is a 2-dimensional commutative
subalgebra, we can nd a self-adjoint unitary A(0, j) such that /

0
/
j
is
spanned by I and A(0, j) = x(0, j) I, where x(0, j) R
3
. Due to the
quasi-orthogonality of /
1
, /
2
and /
3
, the unit vectors x(0, j) are pairwise
orthogonal (see (2.17)). The matrices A(0, j) are anti-commute:
A(0, i)A(0, j) = i(x(0, i) x(0, j)) I
= i(x(0, j) x(0, i)) I = A(0, j)A(0, i)
for i ,= j. Moreover,
A(0, 1)A(0, 2) = i(x(0, 1) x(0, 2))
and x(0, 1) x(0, 2) = x(0, 3) because x(0, 1) x(0, 2) is orthogonal to both
x(0, 1) and x(0, 2). If necessary, we can change the sign of x(0, 3) such that
A(0, 1)A(0, 2) = iA(0, 3) holds.
Starting with the subalgebras /

1
, /

2
, /

3
we can construct similarly the
other Pauli triplets. In this way, we arrive at the 4 Pauli triplets, the rows of
the following table:
A(0, 1) A(0, 2) A(0, 3)
A(1, 0) A(1, 2) A(1, 3)
A(2, 0) A(2, 1) A(2, 3)
A(3, 0) A(3, 1) A(3, 2)
(2.13)
When /
i
: 1 i 3 is a family of pairwise quasi-orthogonal subalge-
bras, then the commutants /

i
: 1 i 3 are pairwise quasi-orthogonal as
well. /

j
= /
j
and /

i
have nontrivial intersection for i ,= j, actually the pre-
viously dened A(i, j) is in the intersection. For a xed j the three unitaries
A(i, j) (i ,= j) form a Pauli triplet up to a sign. (It follows that changing
sign we can always reach the situation where the rst three columns of table
(2.13) form Pauli triplets. A(0, 3) and A(1, 3) are anti-commute, but it may
happen that A(0, 3)A(1, 3) = iA(2, 3).)
Let C
0
:= A(i, j)A(j, i) : i ,= j I and C := C
0
iC
0
. We want
to show that C is a commutative group (with respect to the multiplication of
unitaries).
Note that the products in C
0
have factors in symmetric position in (2.13)
with respect to the main diagonal indicated by stars. Moreover, A(i, j) /(j)
and A(j, k) /(j)

, and these operators commute.


86 CHAPTER 2. MAPPINGS AND ALGEBRAS
A
1

A
1
A
0
A
0

A
2
A
3

A
2
A
3
A(2,3)
A(0,3)
A(1,3)
This picture shows a family /
i
: 0 i 3 of pairwise quasi-orthogonal
subalgebras of M
4
(C) which are isomorphic to M
2
(C). The edges between
two vertices represent the one-dimensional traceless intersection of the two
subalgebras corresponding to two vertices. The three edges starting from a
vertex represent a Pauli triplet.
We have two cases for a product fromC. Taking the product of A(i, j)A(j, i)
and A(u, v)A(v, u), we have
(A(i, j)A(j, i))(A(i, j)A(j, i)) = I
in the simplest case, since A(i, j) and A(j, i) are commuting self-adjoint uni-
taries. It is slightly more complicated if the cardinality of the set i, j, u, v
is 3 or 4. First,
(A(1, 0)A(0, 1))(A(3, 0)A(0, 3)) = A(0, 1)(A(1, 0)A(3, 0))A(0, 3)
= i(A(0, 1)A(2, 0))A(0, 3)
= iA(2, 0)(A(0, 1)A(0, 3))
= A(2, 0)A(0, 2),
and secondly,
(A(1, 0)A(0, 1))(A(3, 2)A(2, 3)) = iA(1, 0)A(0, 2)(A(0, 3)A(3, 2))A(2, 3)
2.5. KERNEL FUNCTIONS 87
= iA(1, 0)A(0, 2)A(3, 2)(A(0, 3)A(2, 3))
= A(1, 0)(A(0, 2)A(3, 2))A(1, 3)
= iA(1, 0)(A(1, 2)A(1, 3))
= A(1, 0)A(1, 0) = I. (2.14)
So the product of any two operators from C is in C.
Now we show that the subalgebra ( linearly spanned by the unitaries
A(i, j)A(j, i) : i ,= jI is a maximal Abelian subalgebra. Since we know
the commutativity of this algebra, we estimate the dimension. It follows from
(2.14) and the self-adjointness of A(i, j)A(j, i) that
A(i, j)A(j, i) = A(k, )A(, k)
when i, j, k and are dierent. Therefore ( is linearly spanned by A(0, 1)A(1, 0),
A(0, 2)A(2, 0), A(0, 3)A(3, 0) and I. These are 4 dierent self-adjoint uni-
taries.
Finally, we check that the subalgebra ( is quasi-orthogonal to /(i). If the
cardinality of the set i, j, k, is 4, then we have
Tr A(i, j)(A(i, j)A(j, i)) = Tr A(j, i) = 0
and
Tr A(k, )A(i, j)A(j, i) = Tr A(k, )A(k, l)A(, k) = Tr A(, k) = 0.
Moreover, because /(k) is quasi-orthogonal to /(i), we also have A(i, k)A(j, i),
so
Tr A(i, )(A(i, j)A(j, i)) = i Tr A(i, k)A(j, i) = 0.
From this we can conclude that
A(k, ) A(i, j)A(j, i)
for all k ,= and i ,= j.
2.5 Kernel functions
Let A be a nonempty set. A function : A A C is often called a kernel.
A kernel : A A C is called positive denite if
n

j,k=1
c
j
c
k
(x
j
, x
k
) 0
for all nite sets c
1
, c
2
, . . . , c
n
C and x
1
, x
2
, . . . , x
n
A.
88 CHAPTER 2. MAPPINGS AND ALGEBRAS
Example 2.40 It follows from the Schur theorem that the product of posi-
tive denite kernels is a positive denite kernel as well.
If : A A C is positive denite, then
e

n=0
1
n!

m
and

(x, y) = f(x)(x, y)f(y) are positive denite for any function f : A
C.
The function : A A C is called a conditionally negative denite
kernel if (x, y) = (y, x) and
n

j,k=1
c
j
c
k
(x
j
, x
k
) 0
for all nite sets c
1
, c
2
, . . . , c
n
C and x
1
, x
2
, . . . , x
n
A when

n
j=1
c
j
=
0.
The above properties of a kernel depend on the matrices
_

_
(x
1
, x
1
) (x
1
, x
2
) . . . (x
1
, x
n
)
(x
2
, x
1
) (x
2
, x
2
) . . . (x
2
, x
n
)
.
.
.
.
.
.
.
.
.
.
.
.
(x
n
, x
1
) (x
n
, x
2
) . . . (x
n
, x
n
)
_

_
.
If a kernel is positive denite, then f is conditionally negative denite, but
the converse is not true.
Lemma 2.41 Assume that the function : A A C has the property
(x, y) = (y, x) and x x
0
A. Then
(x, y) := (x, y) + (x, x
0
) + (x
0
, y) (x
0
, x
0
)
is positive denite if and only if is conditionally negative denite.
The proof is rather straightforward, but an interesting particular case is
below.
Example 2.42 Assume that f : R
+
R is a C
1
-function with the property
f(0) = f

(0) = 0. Let : R
+
R
+
R be dened as
(x, y) =
_

_
f(x) f(y)
x y
if x ,= y,
f

(x) if x = y.
2.5. KERNEL FUNCTIONS 89
(This is the so-called kernel of divided dierence.) Assume that this is con-
ditionally negative denite. Now we apply the lemma with x
0
= :

f(x) f(y)
x y
+
f(x) f()
x
+
f() f(y)
y
f

()
is positive denite and from the limit 0, we have the positive denite
kernel

f(x) f(y)
x y
+
f(x)
x
+
f(y)
y
=
f(x)y
2
f(y)x
2
x(x y)y
.
Assume that f(x) > 0 for all x > 0. The multiplication by xy/(f(x)f(y))
gives a positive denite kernel
x
2
f(x)

y
2
f(y)
x y
,
which is a divided dierence of the function g(x) := x
2
/f(x) on (0, ).
Theorem 2.43 (Schoenberg theorem) Let A be a nonempty set and let
: A A C be a kernel. Then is conditionally negative denite if and
only if exp(t) is positive denite for every t > 0.
Proof: If exp(t) is positive denite, then 1 exp(t) is conditionally
negative denite and so is
= lim
t0
1
t
(1 exp(t)).
Assume now that is conditionally negative denite. Take x
0
A and
set
(x, y) := (x, y) + (x, x
0
) + (x
0
, y) (x
0
, x
0
),
which is positive denite due to the previous lemma. Then
e
(x,y)
= e
(x,y)
e
(x,x
0
)
e
(y,x
0
)
e
(x
0
,x
0
)
is positive denite. This was t = 1, and the argument is similar for general
t > 0.
The kernel functions are a kind of generalization of matrices. If A M
n
,
then the corresponding kernel function is given by A := 1, 2, . . . , n and

A
(i, j) = A
ij
(1 i, j n).
Therefore the results of this section have matrix consequences.
90 CHAPTER 2. MAPPINGS AND ALGEBRAS
2.6 Positivity-preserving mappings
Let : M
n
M
k
be a linear mapping. It is called positive (or positivity-
preserving) if it sends positive (semidenite) matrices to positive (semide-
nite) matrices. is unital if (I
n
) = I
k
.
The dual

: M
k
M
n
of is dened by the equation
Tr (A)B = Tr A

(B) (A M
n
, B M
k
) .
It is easy to see that is positive if and only if

is positive and is trace-


preserving if and only if

is unital.
The inequality
(AA

) (A)(A)

is called the Schwarz inequality. If the Schwarz inequality holds for a linear
mapping , then is positivity-preserving. If is a positive mapping, then
this inequality holds for normal matrices. This result is called the Kadison
inequality.
Theorem 2.44 Let : M
n
(C) M
k
(C) be a positive unital mapping.
(1) If A M
n
is a normal operator, then
(AA

) (A)(A)

.
(2) If A M
n
is positive such that A and (A) are invertible, then
(A
1
) (A)
1
.
Proof: A has a spectral decomposition

i
P
i
, where P
i
s are pairwise
orthogonal projections. We have A

A =

i
[
i
[
2
P
i
and
_
I (A)
(A)

(A

A)
_
=

i
_
1
i

i
[
i
[
2
_
(P
i
).
Since (P
i
) is positive, the left-hand side is positive as well. Reference to
Theorem 2.1 gives the rst inequality.
To prove the second inequality, use the identity
_
(A) I
I (A
1
)
_
=

i
_

i
1
1
1
i
_
(P
i
)
to conclude that the left-hand side is a positive block-matrix. The positivity
implies our statement.
2.6. POSITIVITY-PRESERVING MAPPINGS 91
Corollary 2.45 A positive unital mapping : M
n
(C) M
k
(C) has norm
1, i.e., |(A)| |A| for every A M
n
(C).
Proof: Let A M
n
(C) be such that |A| 1, and take the polar decom-
position A = U[A[ with a unitary U. By Example 1.39 there is a unitary V
such that [A[ = (V + V

)/2 and so A = (UV + UV

)/2. Hence it suces to
show that |(U)| 1 for every unitary U. This follows from the Kadison
inequality in (1) of the previous theorem as
|(U)|
2
= |(U)

(U)| |(U

U)| = |(I)| = 1.

The linear mapping : M


n
M
k
is called 2-positive if
_
A B
B

C
_
0 implies
_
(A) (B)
(B

) (C)
_
0
when A, B, C M
n
.
Lemma 2.46 Let : M
n
(C) M
k
(C) be a 2-positive mapping. If A, (A) >
0, then
(B)

(A)
1
(B) (B

A
1
B).
for every B M
n
. Hence, a 2-positive unital mapping satises the Schwarz
inequality.
Proof: Since
_
A B
B

A
1
B
_
0,
the 2-positivity implies
_
(A) (B)
(B

) (B

A
1
B)
_
0.
So Theorem 2.1 implies the statement.
If B = B

, then the 2-positivity condition is not necessary in the previous


lemma, positivity is enough.
Lemma 2.47 Let : M
n
M
k
be a 2-positive unital mapping. Then
A

:= A M
n
: (A

A) = (A)

(A) and (AA

) = (A)(A)

is a subalgebra of M
n
and
(AB) = (A)(B) and (BA) = (B)(A)
holds for all A A

and B M
n
.
92 CHAPTER 2. MAPPINGS AND ALGEBRAS
Proof: The proof is based only on the Schwarz inequality. Assume that
(AA

) = (A)(A)

. Then
t((A)(B) + (B)

(A)

)
= (tA

+ B)

(tA

+ B) t
2
(A)(A)

(B)

(B)
((tA

+ B)

(tA

+ B)) t
2
(AA

) (B)

(B)
= t(AB + B

) + (B

B) (B)

(B)
for a real t. Divide the inequality by t and let t . Then
(A)(B) + (B)

(A)

= (AB + B

)
and similarly
(A)(B) (B)

(A)

= (AB B

).
Adding these two equalities we have
(AB) = (A)(B).
The other identity is proven similarly.
It follows from the previous lemma that if is a 2-positive unital mapping
and its inverse is 2-positive as well, then is multiplicative. Indeed, the
assumption implies (A

A) = (A)

(A) for every A.


A linear mapping c : M
n
M
k
is called completely positive if
c id
n
: M
n
M
n
M
k
M
n
is a positive mapping, where id
n
: M
n
M
n
is the identity mapping and
c id
n
is dened by
(c id
n
)([X
ij
]
n
i,j=1
) := [c(X
ij
)]
n
i,j=1
.
(Here, B(1) M
n
is identied with the n n block-matrices whose entries
are operators in B(1).) Note that if a linear mapping c : M
n
M
k
is
completely positive in the above sense, then c id
m
: M
n
M
m
M
k
M
m
is positive for every m N.
Example 2.48 Consider the transpose mapping c : A A
t
on 2 2 matri-
ces:
_
x y
z w
_

_
x z
y w
_
.
2.6. POSITIVITY-PRESERVING MAPPINGS 93
c is obviously positive. The matrix
_

_
2 0 0 2
0 1 1 0
0 1 1 0
2 0 0 2
_

_
.
is positive. The extension of c maps this to
_

_
2 0 0 1
0 1 2 0
0 2 1 0
1 0 0 2
_

_
.
This is not positive, so c is not completely positive.
Theorem 2.49 Let c : M
n
M
k
be a linear mapping. Then the following
conditions are equivalent:
(1) c is completely positive;
(2) The block-matrix X dened by
X
ij
= c(E(ij)) (1 i, j n) (2.15)
is positive, where E(ij) are the matrix units of M
n
;
(3) There are operators V
t
: C
n
C
k
(1 t k
2
) such that
c(A) =

t
V
t
AV

t
; (2.16)
(4) For nite families A
i
M
n
(C) and B
i
M
k
(C) (1 i n), the
inequality

i,j
B

i
c(A

i
A
j
)B
j
0
holds.
Proof: (1) implies (2): The matrix

i,j
E(ij) E(ij) =
1
n
_

i,j
E(ij) E(ij)
_
2
94 CHAPTER 2. MAPPINGS AND ALGEBRAS
is positive. Therefore,
(id
n
c)
_

i,j
E(ij) E(ij)
_
=

i,j
E(ij) c(E(ij)) = X
is positive as well.
(2) implies (3): Assume that the block-matrix X is positive. There are
orthogonal projections P
i
(1 i n) on C
nk
such that they are pairwise
orthogonal and
P
i
XP
j
= c(E(ij)).
We have a decomposition
X =
nk

t=1
[f
t
f
t
[,
where [f
t
are appropriately normalized eigenvectors of X. Since P
i
is a
partition of unity, we have
[f
t
=
n

i=1
P
i
[f
t

and set V
t
: C
n
C
k
by
V
t
[i = P
i
[f
t
.
([i are the canonical basis vectors.) In this notation,
X =

i,j
P
i
[f
t
f
t
[P
j
=

i,j
P
i
_

t
V
t
[ij[V

t
_
P
j
and hence
c(E(ij)) = P
i
XP
j
=

t
V
t
E(ij)V

t
.
Since this holds for all matrix units E(ij), we obtained
c(A) =

t
V
t
AV

t
.
(3) implies (4): Assume that c is of the form (2.16). Then

i,j
B

i
c(A

i
A
j
)B
j
=

i,j
B

i
V
t
(A

i
A
j
)V

t
B
j
=

t
_

i
A
i
V

t
B
i
_

j
A
j
V

t
B
j
_
0
2.6. POSITIVITY-PRESERVING MAPPINGS 95
follows.
(4) implies (1): We consider
c id
n
: M
n
M
n
M
k
M
n
.
Since any positive operator in M
n
M
n
is the sum of operators in the form

i,j
A

i
A
j
E(ij) (Theorem 2.8), it is enough to show that
Y := c id
n
_

i,j
A

i
A
j
E(ij)
_
=

i,j
c(A

i
A
j
) E(ij)
is positive. On the other hand, Y = [Y
ij
]
n
i,j=1
M
k
M
n
is positive if and
only if

i,j
B

i
Y
ij
B
j
=

i,j
B

i
c(A

i
A
j
)B
j
0.
The positivity of this operator is supposed in (4). Hence (1) is shown.
The representation (2.16) is called the Kraus representation. The block-
matrix X dened by (2.15) is called the representing block-matrix (or the
Choi matrix).
Example 2.50 We take / B M
n
(C) and a conditional expectation
c : B /. We can argue that this is completely positive due to conditition
(4) of the previous theorem. For A
i
/ and B
i
B we have

i,j
A

i
c(B

i
B
j
)A
j
= c
__

i
B
i
A
i
_

j
B
j
A
j
__
0
and this is enough.
The next example will be slightly dierent.
Example 2.51 Let 1 and / be Hilbert spaces and (f
i
) be a basis in /. For
each i set a linear operator V
i
: 1 1/ as V
i
e = e f
i
(e 1). These
operators are isometries with pairwise orthogonal ranges and the adjoints act
as V

i
(e f) = f
i
, fe.
The partial trace Tr
2
: B(1/) B(1) introduced in Section 1.7 can
be written as
Tr
2
(A) =

i
V

i
AV
i
(A B(1/)).
The reason for the terminology is the formula Tr
2
(X Y ) = XTr Y . The
above expression implies that Tr
2
is completely positive. It is actually a
conditional expectation up to a constant factor.
96 CHAPTER 2. MAPPINGS AND ALGEBRAS
Example 2.52 The trace Tr : M
k
(C) C is completely positive if Tr id
n
:
M
k
(C) M
n
(C) M
n
(C) is a positive mapping. However, this is a partial
trace which is known to be positive (even completely positive).
It follows that any positive linear functional : M
k
(C) C is completely
positive. Since (A) = Tr DA with a certain positive D, is the composition
of the completely positive mappings A D
1/2
AD
1/2
and Tr .
Example 2.53 Let c : M
n
M
k
be a positive linear mapping such that
c(A) and c(B) commute for any A, B M
n
. We want to show that c is
completely positive.
Any two self-adjoint matrices in the range of c commute, so we can change
the basis such that all of them become diagonal. It follows that c has the
form
c(A) =

i
(A)E
ii
,
where E
ii
are the diagonal matrix units and
i
are positive linear functionals.
Since the sum of completely positive mappings is completely positive, it is
enough to show that A (A)F is completely positive for a positive func-
tional and for a positive matrix F. The complete positivity of this mapping
means that for an m m block-matrix X with entries X
ij
M
n
, if X 0
then the block-matrix [(X
ij
)F]
n
i,j=1
should be positive. This is true, since
the matrix [(X
ij
)]
n
i,j=1
is positive (due to the complete positivity of ).
Example 2.54 A linear mapping c : M
2
M
2
is dened by the formula
c :
_
1 + z x iy
x + iy 1 z
_

_
1 + z x iy
x + iy 1 z
_
with some real parameters , , .
The condition for positivity is
1 , , 1.
It is not dicult to compute the representing block-matrix as follows:
X =
1
2
_

_
1 + 0 0 +
0 1 0
0 1 0
+ 0 0 1 +
_

_
.
This matrix is positive if and only if
[1 [ [ [.
2.6. POSITIVITY-PRESERVING MAPPINGS 97
In quantum information theory this mapping c is called the Pauli channel.

Example 2.55 Fix a positive denite matrix A M


n
and set
T
A
(K) =
_

0
(t + A)
1
K(t + A)
1
dt (K M
n
).
This mapping T
A
: M
n
M
n
is obviously positivity-preserving and approxi-
mation of the integral by nite sum shows also the complete positivity.
If A = Diag(
1
,
2
, . . . ,
n
), then it is seen from integration that the entries
of T
A
(K) are
T
A
(K)
ij
=
log
i
log
j

j
K
ij
.
Another integration gives that the mapping
: L
_
1
0
A
t
LA
1t
dt
acts as
((L))
ij
=

i

j
log
i
log
j
L
ij
.
This shows that
T
1
A
(L) =
_
1
0
A
t
LA
1t
dt.
To show that T
1
A
is not positive, we take n = 2 and consider
T
1
A
_
1 1
1 1
_
=
_

2
log
1
log
2

2
log
1
log
2

2
_

_
.
The positivity of this matrix is equivalent to the inequality
_

2


1

2
log
1
log
2
between the geometric and logarithmis means. The opposite inequality holds,
see Example 5.22, and therefore T
1
A
is not positive.
The next result tells that the Kraus representation of a completely
positive mapping is unique up to a unitary matrix.
98 CHAPTER 2. MAPPINGS AND ALGEBRAS
Theorem 2.56 Let c : M
n
(C) M
m
(C) be a linear mapping which is rep-
resented as
c(A) =
k

t=1
V
t
AV

t
and c(A) =
k

t=1
W
t
AW

t
with operators V
t
, W
t
: C
n
C
m
. Then there exists a k k unitary matrix
[c
tu
] such that
W
t
=

u
c
tu
V
u
(1 t k).
Proof: Without loss of generality we may assume that m n. Indeed,
we can imbed M
m
= B(C
m
) into a bigger M
m
= B(C
m

) and consider c as
M
n
M
m
. Let x
i
be a basis in C
m
and y
j
be a basis in C
n
. Consider the
vectors
v
t
:=
n

j=1
x
j
V
t
y
j
and w
t
:=
n

j=1
x
j
W
t
y
j
.
We have
[v
t
v
t
[ =

j,j

[x
j
x
j
[ V
t
[y
j
y
j
[V

t
and
[w
t
w
t
[ =

j,j

[x
j
x
i
[ W
t
[y
j
y
j
[W

t
.
Our hypothesis implies that

t
[v
t
v
t
[ =

t
[w
t
w
t
[ .
Lemma 1.24 tells us that there is a unitary matrix [c
tu
] such that
w
t
=

u
c
tu
v
u
.
This implies that
W
t
y
j
=

u
c
tu
V
u
y
j
(1 j n).
Hence the statement of the theorem can be concluded.
2.7. NOTES AND REMARKS 99
2.7 Notes and remarks
Theorem 2.5 is from the paper J.-C. Bourin and E.-Y. Lee, Unitary orbits of
Hermitian operators with convex or concave functions, Bull. London Math.
Soc. 44(2012), 10851102.
The Wielandt inequality has an extension to matrices. Let A be an
n n positive matrix with eigenvalues
1

2

n
. Let X and Y be
n p and n q matrices such that X

Y = 0. The generalized inequality is


X

AY (Y

AY )

AX
_

1
+
n
_
2
X

AX,
where a generalized inverse (Y

AY )

is included: BB

B = B. See Song-Gui
Wang and Wai-Cheung Ip, A matrix version of the Wielandt inequality and
its applications to statistics, Linear Algebra Appl. 296(1999), 171181.
The lattice of ortho-projections has applications in quantum theory. The
cited Gleason theorem was obtained by A. M. Gleason in 1957, see also R.
Cooke, M. Keane and W. Moran: An elementary proof of Gleasons theorem,
Math. Proc. Cambridge Philos. Soc. 98(1985), 117128.
Theorem 2.27 is from the paper D. Petz and G. Toth, Matrix variances
with projections, Acta Sci. Math. (Szeged), 78(2012), 683688. An extension
of this result is in the paper Z. Leka and D. Petz, Some decompositions of
matrix variances, to be published.
Theorem 2.33 is the double commutant theorem of von Neumann from
1929, the original proof was for operators on an innite-dimensional Hilbert
space. (There is a relevant dierence between nite and innite dimensions;
in a nite-dimensional space all subspaces are closed.) The conditional ex-
pectation in Theorem 2.34 was rst introduced by H. Umegaki in 1954 and
it is related to the so-called Tomiyama theorem.
The maximum number of complementary MASAs in M
n
(C) is a popular
subject. If n is a prime power, then n+1 MASAs can be constructed, but n =
6 is an unknown problematic case. (The expected number of complementary
MASAs is 3 here.) It is interesting that if in M
n
(C) n MASAs exist, then
n = 1 is only the case, see M. Weiner, A gap for the maximum number of
mutually unbiased bases, Proc. Amer. Math. Soc. 141(2013), 19631969.
Theorem 2.39 is from the paper H. Ohno, D. Petz and A. Sz anto, Quasi-
orthogonal subalgebras of 4 4 matrices, Linear Algebra Appl. 425(2007),
109118. It was conjectured that in the case n = 2
k
the algebra M
n
(C)
cannot have N
k
:= (4
k
1)/3 complementary subalgebras isomorphic to M
2
,
but it was proved that there are N
k
1 copies. 2 is not a typical prime
number in this situation. If p > 2 is a prime number, then in the case n = p
k
100 CHAPTER 2. MAPPINGS AND ALGEBRAS
the algebra M
n
(C) has N
k
:= (p
2k
1)/(p
2
1) complementary subalgebras
isomorphic to M
p
, see the paper H. Ohno, Quasi-orthogonal subalgebras of
matrix algebras, Linear Algebra Appl. 429(2008), 21462158.
Positive and conditionally negative denite kernel functions are well dis-
cussed in the book C. Berg, J.P.R. Christensen and P. Ressel, Harmonic
Analysis on Semigroups. Theory of Positive Denite and Related Functions,
Graduate Texts in Mathematics, 100. Springer, New York, 1984. (It is note-
worthy that the conditionally negative denite is called there negative de-
nite.)
2.8 Exercises
1. Show that
_
A B
B

C
_
0
if and only if B = A
1/2
ZC
1/2
with a matrix Z with |Z| 1.
2. Let X, U, V M
n
and assume that U and V are unitaries. Prove that
_
_
I U X
U

I V
X

V

I
_
_
0
if and only if X = UV .
3. Show that for A, B M
n
the formula
_
I A
0 I
_
1
_
AB 0
B 0
_ _
I A
0 I
_
=
_
0 0
B BA
_
holds. Conclude that AB and BA have the same eigenvectors.
4. Assume that 0 < A M
n
. Show that A + A
1
2I.
5. Assume that
A =
_
A
1
B
B

A
2
_
> 0.
Show that det A det A
1
det A
2
.
6. Assume that the eigenvalues of the self-ajoint matrix
_
A B
B

C
_
2.8. EXERCISES 101
are
1

2
. . .
n
and the eigenvalues of A are
1

2

m
.
Show that

i

i

i+nm
.
7. Show that a matrix A M
n
is irreducible if and only if for every
1 i, j n there is a power k such that (A
k
)
ij
,= 0.
8. Let A, B, C, D M
n
and AC = CA. Show that
det
_
A B
C D
_
= det(AD CB).
9. Let A, B, C M
n
and
_
A B
B

C
_
0.
Show that B

B A C.
10. Let A, B M
n
. Show that A B is a submatrix of A B.
11. Assume that P and Q are projections. Show that P Q is equivalent
to PQ = P.
12. Assume that P
1
, P
2
, . . . , P
n
are projections and P
1
+P
2
+ +P
n
= I.
Show that the projections are pairwise orthogonal.
13. Let A
1
, A
2
, , A
k
M
sa
n
and A
1
+ A
2
+ . . . + A
k
= I. Show that the
following statements are equivalent:
(1) All operators A
i
are projections.
(2) For all i ,= j the product A
i
A
j
= 0 holds.
(3) rank (A
1
) + rank (A
2
) + + rank (A
k
) = n.
14. Let U[A[ be the polar decomposition of A M
n
. Show that A is normal
if and only if U[A[ = [A[U.
15. The matrix M M
n
(C) is dened as
M
ij
= mini, j.
Show that M is positive.
16. Let A M
n
and the mapping S
A
: M
n
M
n
is dened as S
A
: B
A B. Show that the following statements are equivalent.
102 CHAPTER 2. MAPPINGS AND ALGEBRAS
(1) A is positive.
(2) S
A
: M
n
M
n
is positive.
(3) S
A
: M
n
M
n
is completely positive.
17. Let A, B, C be operators on a Hilbert space 1 and A, C 0. Show
that
_
A B
B

C
_
0
if and only if [Bx, y[ Ay, y Cx, x for every x, y 1.
18. Let P M
n
be idempotent, P
2
= P. Show that P is an ortho-
projection if and only if |P| 1.
19. Let P M
n
be an ortho-projection and 0 < A M
n
. Show the
following formulas:
[P](A
2
) ([P]A)
2
, ([P]A)
1/2
[P](A
1/2
), [P](A
1
) ([P]A)

.
20. Show that the kernels
(x, y) = cos(x y), cos(x
2
y
2
), (1 +[x y[)
1
are positive semidenite on R R.
21. Show that the equality
A (B C) = (A B) (A C)
is not true for ortho-projections.
22. Assume that the kernel : AA C is positive denite and (x, x) >
0 for every x A. Show that

(x, y) =
(x, y)
(x, x)(y, y)
is a positive denite kernel.
23. Assume that the kernel : AA C is negative denite and (x, x)
0 for every x A. Show that
log(1 + (x, y))
is a negative denite kernel.
2.8. EXERCISES 103
24. Show that the kernel (x, y) = (sin(x y))
2
is negative semidenite on
R R.
25. Show that the linear mapping c
p,n
: M
n
M
n
dened as
c
p,n
(A) = pA+ (1 p)
I
n
Tr A.
is completely positive if and only if

1
n
2
1
p 1 .
26. Show that the linear mapping c : M
n
M
n
dened as
c(D) =
1
n 1
(Tr (D)I D
t
)
is completely positive unital mapping. (Here D
t
denotes the transpose
of D.) Show that c has negative eigenvalue. (This mapping is called
HolevoWerner channel.)
27. Assume that c : M
n
M
n
is dened as
c(A) =
1
n 1
(I Tr AA).
Show that c is positive but not completely positive.
28. Let p be a real number. Show that the mapping c
p,2
: M
2
M
2
dened
as
c
p,2
(A) = pA+ (1 p)
I
2
Tr A
is positive if and only if 1 p 1. Show that c
p,2
is completely
positive if and only if 1/3 p 1.
29. Show that |(f
1
, f
2
)|
2
= |f
1
|
2
+|f
2
|
2
.
30. Give the analogue of Theorem 2.1 when C is assumed to be invertible.
31. Let 0 A I. Find the matrices B and C such that
_
A B
B

C
_
.
is a projection.
104 CHAPTER 2. MAPPINGS AND ALGEBRAS
32. Let dim1 = 2 and 0 A, B B(1). Show that there is an orthogonal
basis such that
A =
_
a 0
0 b
_
, B =
_
c d
d e
_
with positive numbers a, b, c, d, e 0.
33. Let
M =
_
A B
B A
_
and assume that A and B are self-adjoint. Show that M is positive if
and only if A B A.
34. Determine the inverses of the matrices
A =
_
a b
b a
_
and B =
_

_
a b c d
b a d c
c d a b
d c b a
_

_
.
35. Give the analogue of the factorization (2.2) when D is assumed to be
invertible.
36. Show that the self-adjoint invertible matrix
_
_
A B C
B

D 0
C

0 E
_
_
has inverse in the form
_
_
Q
1
P R
P

D
1
(I + B

P) D
1
B

R
R

BD
1
E
1
(I + C

R)
_
_
,
where
Q = ABD
1
B

CE
1
C

, P = Q
1
BD
1
, R = Q
1
CE
1
.
37. Find the determinant and the inverse of the block-matrix
_
A 0
a 1
_
.
2.8. EXERCISES 105
38. Let A M
n
be an invertible matrix and d C. Show that
det
_
A b
c d
_
= (d cA
1
b)det A
where c = [c
1
, . . . , c
n
] and b = [b
1
, . . . , n
n
]
t
.
39. Show the concavity of the variance functional Var

(A) dened in
(2.9). The concavity is
Var

(A)

i
Var

i
(A) if =

i
when
i
0 and

i
= 1.
40. For x, y R
3
and
x :=
3

i=1
x
i

i
, y :=
3

i=1
y
i

i
show that
(x )(y ) = x, y
0
+ i(x y) , (2.17)
where x y is the vectorial product in R
3
.
Chapter 3
Functional calculus and
derivation
Let A M
n
(C) and p(x) :=

i
c
i
x
i
be a polynomial. It is quite obvious that
by p(A) one means the matrix

i
c
i
A
i
. So the functional calculus is trivial for
polynomials. Slightly more generally, let f be a holomorphic function with
the Taylor expansion f(z) =

k=0
c
k
(z a)
k
. Then for every A M
n
(C)
such that the operator norm |A aI| is less than radius of convergence of
f, one can dene the analytic functional calculus f(A) :=

k=0
c
k
(AaI)
k
.
This analytic functional calculus can be generalized by the Cauchy integral:
f(A) :=
1
2i
_

f(z)(zI A)
1
dz
if f is holomorphic in a domain G containing the eigenvalues of A, where is
a simple closed contour in G surrounding the eigenvalues of A. On the other
hand, when A M
n
(C) is self-adjoint and f is a general function dened on
an interval containing the eigenvalues of A, the functional calculus f(A) is
dened via the spectral decomposition of A or the diagonalization of A, that
is,
f(A) =
k

i=1
f(
i
)P
i
= UDiag(f(
1
), . . . , f(
n
))U

for the spectral decomposition A =

k
i=1

i
P
i
and the diagonalization A =
UDiag(
1
, . . . ,
n
)U

. In this way, one has some types of functional calculus


for matrices (also operators). When dierent types of functional calculus can
be dened for one A M
n
(C), they are the same. The second half of this
chapter contains several formulas for derivatives
d
dt
f(A+ tT)
106
3.1. THE EXPONENTIAL FUNCTION 107
and Frechet derivatives of functional calculus.
3.1 The exponential function
The exponential function is well-dened for all complex numbers. It has a
convenient Taylor expansion and it appears in some dierential equations. It
is important also for matrices.
The Taylor expansion can be used to dene e
A
for a matrix A M
n
(C):
e
A
:=

n=0
A
n
n!
. (3.1)
Here the right-hand side is an absolutely convergent series:

n=0
_
_
_
_
A
n
n!
_
_
_
_

n=0
|A|
n
n!
= e
A
The rst example is in connection with the Jordan canonical form.
Example 3.1 We take
A =
_

_
a 1 0 0
0 a 1 0
0 0 a 1
0 0 0 a
_

_
= aI + J.
Since I and J commute and J
m
= 0 for m > 3, we have
A
n
= a
n
I + na
n1
J +
n(n 1)
2
a
n2
J
2
+
n(n 1)(n 2)
2 3
a
n3
J
3
and

n=0
A
n
n!
=

n=0
a
n
n!
I +

n=1
a
n1
(n 1)!
J +
1
2

n=2
a
n2
(n 2)!
J
2
+
1
6

n=3
a
n3
(n 3)!
J
3
= e
a
I + e
a
J +
1
2
e
a
J
2
+
1
6
e
a
J
3
. (3.2)
So we have
e
A
= e
a
_

_
1 1 1/2 1/6
0 1 1 1/2
0 0 1 1
0 0 0 1
_

_
.
108 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION
Note that (3.2) shows that e
A
is a linear combination of I, A, A
2
, A
3
. (This
is contained in Theorem 3.6, the coecients are specied by dierential equa-
tions.) If B = SAS
1
, then e
B
= Se
A
S
1
.
Example 3.2 It is a basic fact in analysis that
e
a
= lim
n
_
1 +
a
n
_
n
for a complex number a, but we have also for matrices:
e
A
= lim
n
_
I +
A
n
_
n
. (3.3)
This can be checked similarly to the previous example:
e
aI+J
= lim
n
_
I
_
1 +
a
n
_
+
1
n
J
_
n
.
From the point of view of numerical computation (3.1) is a better for-
mula, but (3.3) will be extended in the next theorem. (An extension of the
exponential function will appear later in (6.45).)
Theorem 3.3 Let
T
m,n
(A) =
_
m

k=0
1
k!
_
A
n
_
k
_
n
(m, n N).
Then
lim
m
T
m,n
(A) = lim
n
T
m,n
(A) = e
A
.
Proof: The matrices B = e
A
n
and
T =
m

k=0
1
k!
_
A
n
_
k
commute. Hence
e
A
T
m,n
(A) = B
n
T
n
= (B T)(B
n1
+ B
n2
T + + T
n1
).
We can estimate:
|e
A
T
m,n
(A)| |B T|n max|B|
i
|T|
ni1
: 0 i n 1.
3.1. THE EXPONENTIAL FUNCTION 109
Since |T| e
A
n
and |B| e
A
n
, we have
|e
A
T
m,n
(A)| n|e
A
n
T|e
n1
n
A
.
By bounding the tail of the Taylor series,
|e
A
T
m,n
(A)|
n
(m+ 1)!
_
|A|
n
_
m+1
e
A
n
e
n1
n
A
converges to 0 in the two cases m and n .
Theorem 3.4 If AB = BA, then
e
t(A+B)
= e
tA
e
tB
(t R). (3.4)
Conversely, if this equality holds, then AB = BA.
Proof: First we assume that AB = BA and compute the product e
A
e
B
by
multiplying term by term the series:
e
A
e
B
=

m,n=0
1
m!n!
A
m
B
n
.
Therefore,
e
A
e
B
=

k=0
1
k!
C
k
,
where
C
k
:=

m+n=k
k!
m!n!
A
m
B
n
.
Due to the commutation relation the binomial formula holds and C
k
= (A+
B)
k
. We conclude
e
A
e
B
=

k=0
1
k!
(A + B)
k
which is the statement.
Another proof can be obtained by dierentiation. It follows from the
expansion (3.1) that the derivative of the matrix-valued function t e
tA
dened on R is e
tA
A:
d
dt
e
tA
= e
tA
A = Ae
tA
. (3.5)
Therefore, when AC = CA,
d
dt
e
tA
e
CtA
= e
tA
Ae
CtA
e
tA
Ae
CtA
= 0.
110 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION
It follows that the function t e
tA
e
CtA
is constant. In particular,
e
A
e
CA
= e
C
.
Put A + B in place of C to have the statement (3.4).
The rst derivative of (3.4) is
e
t(A+B)
(A + B) = e
tA
Ae
tB
+ e
tA
e
tB
B
and the second derivative is
e
t(A+B)
(A+ B)
2
= e
tA
A
2
e
tB
+ e
tA
Ae
tB
B + e
tA
Ae
tB
B + e
tA
e
tB
B
2
.
For t = 0 this is BA = AB.
Example 3.5 The matrix exponential function can be used to formulate the
solution of a linear rst-order dierential equation. Let
x(t) =
_

_
x
1
(t)
x
2
(t)
.
.
.
x
n
(t)
_

_
and x
0
=
_

_
x
1
x
2
.
.
.
x
n
_

_
.
The solution of the dierential equation
x

(t) = Ax(t), x(0) = x


0
is x(t) = e
tA
x
0
due to formula (3.5).
Theorem 3.6 Let A M
n
with characteristic polynomial
p() = det(I A) =
n
+ c
n1

n1
+ + c
1
+ c
0
.
Then
e
tA
= x
0
(t)I + x
1
(t)A+ + x
n1
(t)A
n1
,
where the vector
x(t) = (x
0
(t), x
1
(t), . . . , x
n1
(t))
satises the nth order dierential equation
x
(n)
(t) + c
n1
x
(n1)
(t) + + c
1
x

(t) + c
0
x(t) = 0
with the initial condition
x
(k)
(0) = (
1

0 , . . . , 0,
k

1 , 0, . . . , 0)
for 0 k n 1.
3.1. THE EXPONENTIAL FUNCTION 111
Proof: We can check that the matrix-valued functions
F
1
(t) = x
0
(t)I + x
1
(t)A + + x
n1
(t)A
n1
and F
2
(t) = e
tA
satisfy the conditions
F
(n)
(t) + c
n1
F
(n1)
(t) + + c
1
F

(t) + c
0
F(t) = 0
and
F(0) = I, F

(0) = A, . . . , F
(n1)
(0) = A
n1
.
Therefore F
1
= F
2
.
Example 3.7 In case of 2 2 matrices, the use of the Pauli matrices

1
=
_
0 1
1 0
_
,
2
=
_
0 i
i 0
_
,
3
=
_
1 0
0 1
_
is ecient, together with I they form an orthogonal system with respect to
Hilbert-Schmidt inner product.
Let A M
sa
2
be such that
A = c
1

1
+ c
2

2
+ c
3

3
, c
2
1
+ c
2
2
+ c
2
3
= 1
in the representation with Pauli matrices. It is simple to check that A
2
= I.
Therefore, for even powers A
2n
= I, but for odd powers A
2n+1
= A. Choose
c R and combine the two facts with the knowledge of the relation of the
exponential to sine and cosine:
e
icA
=

n=0
i
n
c
n
A
n
n!
=

n=0
(1)
n
c
2n
A
2n
(2n)!
+ i

n=0
(1)
n
c
2n+1
A
2n+1
(2n + 1)!
= (cos c)I + i(sin c)A.
A general matrix has the form C = c
0
I + cA and
e
iC
= e
ic
0
(cos c)I + ie
ic
0
(sin c)A.
(e
C
is similar, see Exercise 13.)
The next theorem gives the so-called the Lie-Trotter formula. (A gen-
eralization is Theorem 5.17.)
112 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION
Theorem 3.8 Let A, B M
n
(C). Then
e
A+B
= lim
m
_
e
A/m
e
B/m
_
n
Proof: First we observe that the identity
X
n
Y
n
=
n1

j=0
X
n1j
(X Y )Y
j
implies the norm estimate
|X
n
Y
n
| nt
n1
|X Y |
for the submultiplicative operator norm when the constant t is chosen such
that |X|, |Y | t.
Now we choose X
n
:= exp((A + B)/n) and Y
n
:= exp(A/n) exp(B/n).
From the above estimate we have
|X
n
n
Y
n
n
| nu|X
n
Y
n
|, (3.6)
if we can nd a constant u such that |X
n
|
n1
, |Y
n
|
n1
u. Since
|X
n
|
n1
( exp((|A| +|B|)/n))
n1
exp(|A| +|B|)
and
|Y
n
|
n1
( exp(|A|/n))
n1
( exp(|B|/n))
n1
exp |A| exp |B|,
u = exp(|A| +|B|) can be chosen to have the estimate (3.6).
The theorem follows from (3.6) if we show that n|X
n
Y
n
| 0. The
power series expansion of the exponential function yields
X
n
= I +
A+ B
n
+
1
2
_
A + B
n
_
2
+
and
Y
n
=
_
I +
A
n
+
1
2
_
A
n
_
2
+ . . .
__
I +
B
n
+
1
2
_
B
n
_
2
+
_
.
If X
n
Y
n
is computed by multiplying the two series in Y
n
, one can observe
that all constant terms and all terms containing 1/n cancel. Therefore
|X
n
Y
n
|
c
n
2
for some positive constant c.
If A and B are self-adjoint matrices, then it can be better to reach e
A+B
as the limit of self-adjoint matrices.
3.1. THE EXPONENTIAL FUNCTION 113
Corollary 3.9
e
A+B
= lim
n
_
e
A
2n
e
B
n
e
A
2n
_
n
.
Proof: We have
_
e
A
2n
e
B
n
e
A
2n
_
n
= e

A
2n
_
e
A/n
e
B/n
_
n
e
A
2n
and the limit n gives the result.
The Lie-Trotter formula can be extended to more matrices:
|e
A
1
+A
2
+...+A
k
(e
A
1
/n
e
A
2
/n
e
A
k
/n
)
n
|

2
n
_
k

j=1
|A
j
|
_
exp
_
n + 2
n
k

j=1
|A
j
|
_
. (3.7)
Theorem 3.10 For matrices A, B M
n
the Taylor expansion of the function
R t e
A+tB
is

k=0
t
k
A
k
(1) ,
where A
0
(s) = e
sA
and
A
k
(s) =
_
s
0
dt
1
_
t
1
0
dt
2

_
t
k1
0
dt
k
e
(st
1
)A
Be
(t
1
t
2
)A
B Be
t
k
A
for s R.
Proof: To make dierentiation easier we write
A
k
(s) =
_
s
0
e
(st
1
)A
BA
k1
(t
1
) dt
1
= e
sA
_
s
0
e
t
1
A
BA
k1
(t
1
) dt
1
for k 1. It follows that
d
ds
A
k
(s) = Ae
sA
_
s
0
e
t
1
A
BA
k1
(t
1
) dt
1
+ e
sA
d
ds
_
s
0
e
t
1
A
BA
k1
(t
1
) dt
1
= AA
k
(s) + BA
k1
(s).
Therefore
F(s) :=

k=0
A
k
(s)
satises the dierential equation
F

(s) = (A+ B)F(s), F(0) = I.


Therefore F(s) = e
s(A+B)
. If s = 1 and we write tB in place of B, then we
get the expansion of e
A+tB
.
114 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION
Corollary 3.11

t
e
A+tB

t=0
=
_
1
0
e
uA
Be
(1u)A
du.
Another important formula for the exponential function is the Baker-
Campbell-Hausdor formula:
e
tA
e
tB
= exp
_
t(A + B) +
t
2
2
[A, B] +
t
3
12
([A, [A, B]] [B, [A, B]]) + O(t
4
)
_
where the commutator [A, B] := AB BA is included.
A function f : R
+
= [0, ) R is completely monotone if the nth
derivative of f has the sign (1)
n
on the whole R
+
and for every n N.
The next theorem is related to a conjecture.
Theorem 3.12 Let A, B M
sa
n
and let t R. The following statements are
equivalent:
(i) The polynomial t Tr (A+tB)
p
has only positive coecients for every
A, B 0 and all p N.
(ii) For every A self-adjoint and B 0, the function t Tr exp (AtB)
is completely monotone on [0, ).
(iii) For every A > 0, B 0 and all p 0, the function t Tr (A+ tB)
p
is completely monotone on [0, ).
Proof: (i)(ii): We have
Tr exp (A tB) = e
A

k=0
1
k!
Tr (A +|A|I tB)
k
and it follows from Bernsteins theorem and (i) that the right-hand side is
the Laplace transform of a positive measure supported in [0, ).
(ii)(iii): Due to the matrix equation
(A+ tB)
p
=
1
(p)
_

0
exp [u(A+ tB)] u
p1
du,
we can see the signs of the derivatives.
(iii)(i): It suces to assume (iii) only for p N. For invertible A, by
Lemma 3.31 below we observe that the rth derivative of Tr (A
0
+ tB
0
)
p
at
3.2. OTHER FUNCTIONS 115
t = 0 is related to the coecient of t
r
in Tr (A + tB)
p
as given by (3.17),
where A, A
0
, B, B
0
are related as in the lemma. The left side of (3.17) has
the sign (1)
r
because it is the derivative of a completely monotone function.
Thus the right-hand side has the correct sign as stated in item (i). The case
of non-invertible A follows from continuity argument.
The Laplace transform of a measure on R
+
is
f(t) =
_

0
e
tx
d(x) (t R
+
).
According to the Bernstein theorem such a measure exists if and only is
f is a completely monotone function.
Bessis, Moussa and Villani conjectured in 1975 that the function t
Tr exp(A tB) is a completely monotone function if A is self-adjoint and
B is positive. Theorem 3.12 due to Lieb and Seiringer gives an equivalent
condition. Property (i) has a very simple formulation.
3.2 Other functions
All reasonable functions can be approximated by polynomials. Therefore, it
is basic to compute p(X) for a matrix X M
n
and for a polynomial p. The
canonical Jordan decomposition
X = S
_

_
J
k
1
(
1
) 0 0
0 J
k
2
(
2
) 0
.
.
.
.
.
.
.
.
.
.
.
.
0 0 J
km
(
m
)
_

_
S
1
= SJS
1
,
gives that
p(X) = S
_

_
p(J
k
1
(
1
)) 0 0
0 p(J
k
2
(
2
)) 0
.
.
.
.
.
.
.
.
.
.
.
.
0 0 p(J
km
(
m
))
_

_
S
1
= Sp(J)S
1
.
The crucial point is the computation of (J
k
())
m
. Since J
k
() = I
n
+J
k
(0) =
I
n
+ J
k
is the sum of commuting matrices, we can compute the mth power
by using the binomial formula:
(J
k
())
m
=
m
I
n
+
m

j=1
_
m
j
_

mj
J
j
k
.
116 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION
The powers of J
k
are known, see Example 1.15. Let m > 3, then the example
J
4
()
m
=
_

m
m
m1
m(m1)
m2
2!
m(m1)(m2)
m3
3!
0
m
m
m1
m(m1)
m2
2!
0 0
m
m
m1
0 0 0
m
_

_
.
shows the point. In another formulation,
p(J
4
()) =
_

_
p() p

()
p

()
2!
p
(3)
()
3!
0 p() p

()
p

()
2!
0 0 p() p

()
0 0 0 p()
_

_
,
which is actually correct for all polynomials and for every smooth function.
We conclude that if the canonical Jordan form is known for X M
n
, then
f(X) is computable. In particular, the above argument gives the following
result.
Theorem 3.13 For X M
n
the relation
det e
X
= exp(Tr X)
holds between trace and determinant.
A matrix A M
n
is diagonizable if
A = S Diag(
1
,
2
, . . . ,
n
)S
1
with an invertible matrix S. Observe that this condition means that in
the Jordan canonical form all Jordan blocks are 1 1 and the numbers

1
,
2
, . . . ,
n
are the eigenvalues of A. In this case,
f(A) = S Diag(f(
1
), f(
2
), . . . , f(
n
))S
1
(3.8)
when the complex-valued function f is dened on the set of eigenvalues of A.
3.2. OTHER FUNCTIONS 117
If the numbers
1
,
2
, . . . ,
n
are dierent, then we can have a polynomial
p(x) of order n 1 such that p(
i
) = f(
i
):
p(x) =
n

j=1

i=j
x
i

j

i
f(
j
) .
(This is the so-called Lagrange interpolation formula.) Therefore we have
f(A) = p(A) =
n

j=1

i=j
A
i
I

j

i
f(
j
).
(Relevant formulations are in Exercises 14 and 15.)
Example 3.14 We consider the self-adjoint matrix
X =
_
1 + z x yi
x + yi 1 z
_
=
_
1 + z w
w 1 z
_
when x, y, z R. From the characteristic polynomial we have the eigenvalues

1
= 1 + R and
2
= 1 R,
where R =
_
x
2
+ y
2
+ z
2
. If R < 1, then X is positive and invertible. The
eigenvectors are
u
1
=
_
R + z
w
_
and u
2
=
_
R z
w
_
.
Set
=
_
1 + R 0
0 1 R
_
, S =
_
R + z R z
w w
_
.
We can check that XS = S, hence
X = SS
1
.
To compute S
1
we use the formula
_
a b
c d
_
1
=
1
ad bc
_
d b
c a
_
.
Hence
S
1
=
1
2wR
_
w R z
w R z
_
.
118 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION
It follows that
X
t
= a
t
_
b
t
+ z w
w b
t
z
_
,
where
a
t
=
(1 + R)
t
(1 R)
t
2R
, b
t
= R
(1 + R)
t
+ (1 R)
t
(1 + R)
t
(1 R)
t
.
The matrix X/2 is a density matrix and has applications in quantum
theory.
In the previous example the function f(x) = x
t
was used. If the eigen-
values of A are positive, then f(A) is well-dened. The canonical Jordan
decomposition is not the only possibility to use. It is known in analysis that
x
p
=
sin p

_

0
x
p1
+ x
d (x (0, ))
when 0 < p < 1. It follows that for a positive matrix A we have
A
p
=
sin p

_

0

p1
A(I + A)
1
d.
For self-adjoint matrices we can have a simple formula, but the previous
integral formula is still useful in some situations, for example for some dier-
entiation.
Remember that self-adjoint matrices are diagonalizable and they have a
spectral decomposition. Let A =

i
P
i
be the spectral decomposition of
a self-adjoint A M
n
(C). (
i
are the dierent eigenvalues and P
i
are the
corresponding eigenprojections; the rank of P
i
is the multiplicity of
i
.) Then
f(A) =

i
f(
i
)P
i
. (3.9)
Usually we assume that f is continuous on an interval containing the eigen-
values of A.
Example 3.15 Consider
f
+
(t) := maxt, 0 and f

(t) := maxt, 0 for t R.


For each A B(1)
sa
dene
A
+
:= f
+
(A) and A

:= f

(A).
3.2. OTHER FUNCTIONS 119
Since f
+
(t), f

(t) 0, f
+
(t) f

(t) = t and f
+
(t)f

(t) = 0, we have
A
+
, A

0, A = A
+
A

, A
+
A

= 0.
These A
+
and A

are called the positive part and the negative part of A,


respectively, and A = A
+
+A

is called the Jordan decomposition of A.


Let f be holomorphic inside and on a positively oriented simple contour
in the complex plane and let A be an n n matrix such that its eigenvalues
are inside of . Then
f(A) :=
1
2i
_

f(z)(zI A)
1
dz (3.10)
is dened by a contour integral. When A is self-adjoint, then (3.9) makes
sense and it is an exercise to show that it gives the same result as (3.10).
Example 3.16 We can dene the square root function on the set
G := re
i
C : r > 0, /2 < < /2
as

re
i
:=

re
i/2
and this is a holomorphic function in G.
When X = S Diag(
1
,
2
, . . . ,
n
) S
1
M
n
is a weakly positive matrix,
then
1
,
2
, . . . ,
n
> 0 and to use (3.10) we can take a positively oriented
simple contour in G such that the eigenvalues are inside. Then

X =
1
2i
_

z(zI X)
1
dz
= S
_
1
2i
_

z Diag(1/(z
1
), 1/(z
2
), . . . , 1/(z
n
)) dz
_
S
1
= S Diag(
_

1
,
_

2
, . . . ,
_

n
) S
1
.

Example 3.17 The logarithm is a well-dened dierentiable function on


positive numbers. Therefore for a strictly positive operator A formula (3.9)
gives log A. Since
log x =
_

0
1
1 + t

1
x + t
dt,
we can use
log A =
_

0
1
1 + t
I (A+ tI)
1
dt. (3.11)
120 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION
If we have a matrix A with eigenvalues out of R

= (, 0], then we can


take the domain
T = re
i
C : r > 0, < <
with the function re
i
log r + i. The integral formula (3.10) can be used
for the calculus. Another useful formula is
log A =
_
1
0
(AI) (t(AI) + I)
1
dt (3.12)
(when A does not have eigenvalue in R

).
Note that log(ab) = log a + log b is not true for any complex numbers, so
it cannot be expected for (commuting) matrices.
Theorem 3.18 If f
k
and g
k
are functions (, ) R such that for some
c
k
R

k
c
k
f
k
(x)g
k
(y) 0
for every x, y (, ), then

k
c
k
Tr f
k
(A)g
k
(B) 0
whenever A, B are self-adjoint matrices with spectra in (, ).
Proof: Let A =

i
P
i
and B =

j

j
Q
j
be the spectral decompositions.
Then

k
c
k
Tr f
k
(A)g
k
(B) =

i,j
c
k
Tr P
i
f
k
(
i
)g
k
(
j
)Q
j
=

i,j
Tr P
i
Q
j

k
c
k
f
k
(
i
)g
k
(
j
) 0
due to the hypothesis.
In the theorem assume that

k
c
k
f
k
(x)g
k
(y) = 0 if and only if x = y.
Then we show that

k
c
k
Tr f
k
(A)g
k
(B) = 0 if and only if A = B. From
the above proof it follows that

k
c
k
Tr f
k
(A)g
k
(B) = 0 holds if and only if
Tr P
i
Q
j
> 0 implies
i
=
j
. This property yields
Q
j
AQ
j
=

i
Q
j
P
i
Q
j
=
j
Q
j
,
3.2. OTHER FUNCTIONS 121
and similarly Q
j
A
2
Q
j
=
2
j
Q
j
. Hence
(AQ
j

j
Q
j
)

(AQ
j

j
Q
j
) = Q
j
A
2
Q
j
2
j
Q
j
AQ
j
+
2
j
Q
j
= 0
so that AQ
j
=
j
Q
j
= BQ
j
for all j, which implies A = B. The converse is
obvious.
Example 3.19 In order to show an application of the previous theorem,
assume that f is convex. Then
f(x) f(y) (x y)f

(y) 0
and
Tr f(A) Tr f(B) + Tr (AB)f

(B) .
Replacing f by (t) = t log t we have
Tr Alog A Tr Blog B + Tr (AB) + Tr (AB) log B
or equivalently
Tr A(log Alog B) Tr (AB) 0.
The left-hand side is the quantum relative entropy S(A|B) of the positive
denite matrices A and B. (In fact, S(A|B) is well-dened for A, B 0 if
ker A ker B; otherwise, it is dened to be +). Moreover, since (t) is
strictly convex, we see that S(A|B) = 0 if and only if A = B.
If Tr A = Tr B, then S(A|B) is the so-called Umegaki relative entropy:
S(A|B) = Tr A(log A log B). For this we can have a better estimate. If
Tr A = Tr B = 1, then all eigenvalues are in [0, 1]. Analysis tells us that for
some (x, y)
(x) + (y) + (x y)

(y) =
1
2
(x y)
2

()
1
2
(x y)
2
when x, y [0, 1]. According to Theorem 3.18 we have
Tr A(log Alog B)
1
2
Tr (AB)
2
. (3.13)
The Streater inequality (3.13) has the consequence that A = B if the rela-
tive entropy is 0. Indeed, a stronger inequality called Pinskers inequality
is known: If Tr A = Tr B, then
Tr A(log Alog B)
1
2
|AB|
2
1
,
where |AB|
1
:= Tr [AB[ is the trace-norm of AB, see Section 6.3.
122 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION
3.3 Derivation
This section contains derivatives of scalar-valued and matrix-valued func-
tions. From the latter one, scalar-valued can be obtained by taking trace, for
example.
Example 3.20 Assume that A M
n
is invertible. Then A+tT is invertible
as well for T M
n
and for a small real number t. The identity
(A+ tT)
1
A
1
= (A+ tT)
1
(A(A+ tT))A
1
= t(A + tT)
1
TA
1
,
gives
lim
t0
1
t
_
(A+ tT)
1
A
1
_
= A
1
TA
1
.
The derivative was computed at t = 0, but if A + tT is invertible, then
d
dt
(A + tT)
1
= (A + tT)
1
T(A+ tT)
1
by a similar computation. We can continue the derivation:
d
2
dt
2
(A+ tT)
1
= 2(A+ tT)
1
T(A+ tT)
1
T(A+ tT)
1
.
d
3
dt
3
(A+ tT)
1
= 6(A + tT)
1
T(A+ tT)
1
T(A+ tT)
1
T(A+ tT)
1
.
So the Taylor expansion is
(A + tT)
1
= A
1
tA
1
TA
1
+ t
2
A
1
TA
1
TA
1
t
3
A
1
TA
1
TA
1
TA
1
+
=

n=0
(t)
n
A
1/2
(A
1/2
TA
1/2
)
n
A
1/2
.
Since
(A + tT)
1
= A
1/2
(I + tA
1/2
TA
1/2
)
1
A
1/2
we can get the Taylor expansion also from the Neumann series of (I +
tA
1/2
TA
1/2
)
1
, see Example 1.8.
Example 3.21 There is an interesting formula for the joint relation of the
functional calculus and derivation:
f
__
A B
0 A
__
=
_
f(A)
d
dt
f(A+ tB)
0 f(A)
_
.
If f is a polynomial, then it is easy to check this formula.
3.3. DERIVATION 123
Example 3.22 Assume that A M
n
is positive invertible. Then A + tT
is positive invertible as well for T M
sa
n
and for a small real number t.
Therefore log(A+ tT) is dened and it is expressed as
log(A+ tT) =
_

0
(x + 1)
1
I (xI + A+ tT)
1
dx.
This is a convenient formula for the derivation (with respect to t R):
d
dt
log(A+ tT) =
_

0
(xI + A)
1
T(xI + A)
1
dx
from the derivative of the inverse. The derivation can be continued and we
have the Taylor expansion
log(A+ tT) = log A + t
_

0
(x + A)
1
T(x + A)
1
dx
t
2
_

0
(x + A)
1
T(x + A)
1
T(x + A)
1
dx +
= log A

n=1
(t)
n
_

0
(x + A)
1/2
((x + A)
1/2
T(x + A)
1/2
)
n
(x + A)
1/2
dx.

Theorem 3.23 Let A, B M


n
(C) be self-adjoint matrices and t R. As-
sume that f : (, ) R is a continuously dierentiable function dened on
an interval and assume that the eigenvalues of A+tB are in (, ) for small
t t
0
. Then
d
dt
Tr f(A+ tB)

t=t
0
= Tr (Bf

(A+ t
0
B)) .
Proof: One can verify the formula for a polynomial f by an easy direct
computation: Tr (A + tB)
n
is a polynomial of the real variable t. We are
interested in the coecient of t which is
Tr (A
n1
B + A
n2
BA+ + ABA
n2
+ BA
n1
) = nTr A
n1
B.
We have the result for polynomials and the formula can be extended to a
more general f by means of polynomial approximation.
Example 3.24 Let f : (, ) R be a continuous increasing function and
assume that the spectrum of the self-adjoint matrices A and C lie in (, ).
We use the previous theorem to show that
A C implies Tr f(A) Tr f(C). (3.14)
124 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION
We may assume that f is smooth and it is enough to show that the deriva-
tive of Tr f(A + tB) is positive when B 0. (To observe (3.14), one takes
B = C A.) The derivative is Tr (Bf

(A + tB)) and this is the trace of the


product of two positive operators. Therefore, it is positive.
Another (simpler) way to show is use of Theorem 2.16. For the eigen-
values of A, C we have
k
(A)
k
(C) (1 k n) and hence Tr f(A) =

k
f(
k
(A))

k
f(
k
(C)) = Tr f(C).
For a holomorphic function f, we can compute the derivative of f(A+tB)
on the basis of (3.10), where is a positively oriented simple contour satisfying
the properties required above. The derivation is reduced to the dierentiation
of the resolvent (zI (A+ tB))
1
and we obtain
X :=
d
dt
f(A+ tB)

t=0
=
1
2i
_

f(z)(zI A)
1
B(zI A)
1
dz . (3.15)
When A is self-adjoint, then it is not a restriction to assume that it is diagonal,
A = Diag(t
1
, t
2
, . . . , t
n
), and we compute the entries of the matrix (3.15) using
the Frobenius formula
1
2i
_

f(z)
(z t
i
)(z t
j
)
dz =
f(t
i
) f(t
j
)
t
i
t
j
(this means f

(t
i
) if t
i
= t
j
). Therefore,
X
ij
=
1
2i
_

f(z)
1
z t
i
B
ij
1
z t
j
dz =
f(t
i
) f(t
j
)
t
i
t
j
B
ij
.
A C
1
function, together with its derivative, can be approximated by polyno-
mials. Hence we have the following result.
Theorem 3.25 Assume that f : (, ) R is a C
1
function and A =
Diag(t
1
, t
2
, . . . , t
n
) with < t
i
< (1 i n). If B = B

, then the
derivative t f(A+ tB) is a Schur product:
d
dt
f(A+ tB)

t=0
= D B, (3.16)
where D is the divided dierence matrix:
D
ij
=
_

_
f(t
i
) f(t
j
)
t
i
t
j
if t
i
t
j
,= 0,
f

(t
i
) if t
i
t
j
= 0.
3.3. DERIVATION 125
Let f : (, ) R be a continuous function. It is called matrix mono-
tone if
A C implies f(A) f(C)
when the spectra of the self-adjoint matrices B and C lie in (, ).
Theorem 2.15 tells us that f(x) = 1/x is a matrix monotone function.
Matrix monotonicity means that f(A+tB) is an increasing function when B
0. The increasing property is equivalent to the positivity of the derivative.
We use the previous theorem to show that the function f(x) =

x is matrix
monotone.
Example 3.26 Assume that A > 0 is diagonal: A = Diag(t
1
, t
2
, . . . , t
n
).
Then derivative of the function

A + tB is D B, where
D
ij
=
_

_
1

t
i
+

t
j
if t
i
t
j
,= 0,
1
2

t
i
if t
i
t
j
= 0.
This is a Cauchy matrix, see Example 1.41 and it is positive. If B is positive,
then so is the Schur product. We have shown that the derivative is positive,
hence f(x) =

x is matrix monotone.
The idea of another proof is in Exercise 28.
A subset K M
n
is convex if for any A, B K and for a real number
0 < < 1
A+ (1 )B K.
The functional F : K R is convex if for A, B K and for a real number
0 < < 1 the inequality
F(A+ (1 )B) F(A) + (1 )F(B)
holds. This inequality is equivalent to the convexity of the function
G : [0, 1] R, G() := F(B + (AB)).
It is well-known in analysis that the convexity is related to the second deriva-
tive.
Theorem 3.27 Let K be the set of self-adjoint nn matrices with spectrum
in the interval (, ). Assume that the function f : (, ) R is a convex
C
2
function. Then the functional A Tr f(A) is convex on K.
126 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION
Proof: The stated convexity is equivalent to the convexity of the numerical
functions
t Tr f(tX
1
+ (1 t)X
2
) = Tr (X
2
+ t(X
1
X
2
)) (t [0, 1]).
It is enough to prove that the second derivative of t Tr f(A+tB) is positive
at t = 0.
The rst derivative of the functional t Tr f(A+tB) is Tr f

(A+tB)B.
To compute the second derivative we dierentiate f

(A+tB). We can assume


that A is diagonal and we dierentiate at t = 0. We have to use (3.16) and
get
_
d
dt
f

(A+ tB)

t=0
_
ij
=
f

(t
i
) f

(t
j
)
t
i
t
j
B
ij
.
Therefore,
d
2
dt
2
Tr f(A+ tB)

t=0
= Tr
_
d
dt
f

(A+ tB)

t=0
_
B
=

i,k
_
d
dt
f

(A+ tB)

t=0
_
ik
B
ki
=

i,k
f

(t
i
) f

(t
k
)
t
i
t
k
B
ik
B
ki
=

i,k
f

(s
ik
)[B
ik
[
2
,
where s
ik
is between t
i
and t
k
. The convexity of f means f

(s
ik
) 0, hence
we conclude the positivity.
In the above theorem one can indeed remove the C
2
assumption for f by
using the so-called regularization (or smoothing) technique; for this technique,
see [20] or [36]. Note that another, less analytic, proof is sketched in Exercise
22.
Example 3.28 The function
(x) =
_
xlog x if x > 0,
0 if x = 0
is continuous and concave on R
+
. For a positive matrix D 0
S(D) := Tr (D)
is called the von Neumann entropy. It follows from the previous theorem
that S(D) is a concave function of D. If we are very rigorous, then we cannot
3.3. DERIVATION 127
apply the theorem, since is not dierentiable at 0. Therefore we should
apply the theorem to f(x) := (x+), where > 0 and take the limit 0.

Example 3.29 Let a self-adjoint matrx H be xed. The state of a quantum


system is described by a density matrix D which has the properties D 0
and Tr D = 1. The equilibrium state is minimizing the energy
F(D) = Tr DH
1

S(D),
where is a positive number. To nd the minimizer, we solve the equation

t
F(D + tX)

t=0
= 0
for self-adjoint matrices X with property Tr X = 0. The equation is
Tr X
_
H +
1

log D +
1

I
_
= 0
and
H +
1

log D +
1

I
must be cI. Hence the minimizer is
D =
e
H
Tr e
H
,
which is called the Gibbs state.
Example 3.30 Next we restrict ourselves to the self-adjoint case A, B
M
n
(C)
sa
in the analysis of (3.15).
The space M
n
(C)
sa
can be decomposed as /
A
/

A
, where /
A
:=
C M
n
(C)
sa
: CA = AC is the commutant of A and /

A
is its orthogonal
complement. When the operator L
A
: X i[A, X] := i(AX XA) is
considered, /
A
is exactly the kernel of L
A
, while /

A
is its range.
When B /
A
, then
1
2i
_

f(z)(zI A)
1
B(zI A)
1
dz =
B
2i
_

f(z)(zI A)
2
dz = Bf

(A)
and we have
d
dt
f(A+ tB)

t=0
= Bf

(A) .
128 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION
When B = i[A, X] /

A
, then we use the identity
(zI A)
1
[A, X](zI A)
1
= [(zI A)
1
, X]
and we conclude
d
dt
f(A+ ti[A, X])

t=0
= i[f(A), X] .
To compute the derivative in an arbitrary direction B we should decompose
B as B
1
B
2
with B
1
/
A
and B
2
/

A
. Then
d
dt
f(A+ tB)

t=0
= B
1
f

(A) + i[f(A), X] ,
where X is the solution to the equation B
2
= i[A, X].
The next lemma was used in the proof of Theorem 3.12.
Lemma 3.31 Let A
0
, B
0
M
sa
n
and assume A
0
> 0. Dene A = A
1
0
and
B = A
1/2
0
B
0
A
1/2
0
, and let t R. For all p, r N
d
r
dt
r
Tr (A
0
+ tB
0
)
p

t=0
=
p
p + r
(1)
r
d
r
dt
r
Tr (A+ tB)
p+r

t=0
. (3.17)
Proof: By induction it is easy to show that
d
r
dt
r
(A+ tB)
p+r
= r!

0i
1
,...,i
r+1
p

j
i
j
=p
(A+ tB)
i
1
B(A + tB)
i
2
B(A + tB)
i
r+1
.
By taking the trace at t = 0 we obtain

1
:=
d
r
dt
r
Tr (A+ tB)
p+r

t=0
= r!

0i
1
,...,i
r+1
p

j
i
j
=p
Tr A
i
1
BA
i
2
BA
i
r+1
.
Moreover, by similar arguments,
d
r
dt
r
(A
0
+ tB
0
)
p
= (1)
r
r!

1i
1
,...,i
r+1
p

j
i
j
=p+r
(A
0
+ tB
0
)
i
1
B
0
(A
0
+ tB
0
)
i
2
B
0
(A
0
+ tB
0
)
i
r+1
.
3.4. FR

ECHET DERIVATIVES 129


By taking the trace at t = 0 and using cyclicity, we get

2
:=
d
r
dt
r
Tr (A
0
+ tB
0
)
p

t=0
= (1)
r
r!

0i
1
,...,i
r+1
p1

j
i
j
=p1
Tr AA
i
1
BA
i
2
BA
i
r+1
.
We have to show that

2
=
p
p + r
(1)
r

1
.
To see this we rewrite
1
in the following way. Dene p + r matrices M
j
by
M
j
=
_
B for 1 j r
A for r + 1 j r + p .
Let S
n
denote the permutation group on 1, . . . , n. Then

1
=
1
p!

S
p+r
Tr
p+r

j=1
M
(j)
.
Because of the cyclicity of the trace we can always arrange the product such
that M
p+r
has the rst position in the trace. Since there are p + r possible
locations for M
p+r
to appear in the product above, and all products are
equally weighted, we get

1
=
p + r
p!

S
p+r1
Tr A
p+r1

j=1
M
(j)
.
On the other hand,

2
= (1)
r
1
(p 1)!

S
p+r1
Tr A
p+r1

j=1
M
(j)
,
so we arrive at the desired equality.
3.4 Frechet derivatives
Let f be a real-valued function on (a, b) R, and we denote by M
sa
n
(a, b) the
set of all matrices A M
sa
n
with (A) (a, b). In this section we discuss the
dierentiability property of the matrix functional calculus A f(A) when
A M
sa
n
(a, b).
130 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION
The case n = 1 corresponds to dierentiation in classical analysis. There
the divided dierences is important and it will appear also here. Let
x
1
, x
2
, . . . be distinct points in (a, b). Then we dene
f
[0]
[x
1
] := f(x
1
), f
[1]
[x
1
, x
2
] :=
f(x
1
) f(x
2
)
x
1
x
2
and recursively for n = 2, 3, . . .,
f
[n]
[x
1
, x
2
, . . . , x
n+1
] :=
f
[n1]
[x
1
, x
2
, . . . , x
n
] f
[n1]
[x
2
, x
3
, . . . , x
n+1
]
x
1
x
n+1
.
The functions f
[1]
, f
[2]
and f
[n]
are called the rst, the second and the nth
divided dierences, respectively, of f.
From the recursive denition the symmetry is not clear. If f is a C
n
-
function, then
f
[n]
[x
0
, x
1
, . . . , x
n
] =
_
S
f
(n)
(t
0
x
0
+ t
1
x
1
+ + t
n
x
n
) dt
1
dt
2
dt
n
, (3.18)
where the integral is on the set S := (t
1
, . . . , t
n
) R
n
: t
i
0,

i
t
i
1
and t
0
= 1

n
i=1
t
i
. From this formula the symmetry is clear and if x
0
=
x
1
= = x
n
= x, then
f
[n]
[x
0
, x
1
, . . . , x
n
] =
f
(n)
(x)
n!
.
Next we introduce the notion of Frechet dierentiability. Assume that
mapping F : M
m
M
n
is dened in a neighbourhood of A M
m
. The
derivative f(A) : M
m
M
n
is a linear mapping such that
|F(A+ X) F(A) F(A)(X)|
2
|X|
2
0 as X M
m
and X 0,
where ||
2
is the Hilbert-Schmidt norm in (1.5). This is the general denition.
In the next theorem F(A) will be the matrix functional calculus f(A) when
f : (a, b) R and A M
sa
n
(a, b). Then the Frechet derivative is a linear
mapping f(A) : M
sa
n
M
sa
n
such that
|f(A+ X) f(A) f(A)(X)|
2
|X|
2
0 as X M
sa
n
and X 0,
or equivalently
f(A+ X) = f(A) + f(A)(X) + o(|X|
2
).
3.4. FR

ECHET DERIVATIVES 131


Since Frechet dierentiability implies G ataux (or directional) dierentiability,
one can dierentiate f(A+ tX) with respect to the real parameter t and
f(A+ tX) f(A)
t
f(A)(X) as t 0.
This notion of Frechet dierentiability for f(A) is inductively extended to
the general higher degree. To do this, we denote by B((M
sa
n
)
m
, M
sa
n
) the set
of all m-multilinear maps from (M
sa
n
)
m
:= M
sa
n
M
sa
n
(m times) to M
sa
n
,
and introduce the norm of B((M
sa
n
)
m
, M
sa
n
) as
|| := sup
_
|(X
1
, . . . , X
m
)|
2
: X
i
M
sa
n
, |X
i
|
2
1, 1 i m
_
. (3.19)
Now assume that m N with m 2 and the (m 1)th Frechet derivative

m1
f(B) B((M
sa
n
)
m1
.M
sa
n
) exists for all B M
sa
n
(a, b) in a neighborhood
of A M
sa
n
(a, b). We say that f(B) is m times Frechet dierentiable at
A if
m1
f(B) is one more Frechet dierentiable at A, i.e., there exists a

m
f(A) B(M
sa
n
, B((M
sa
n
)
m1
, M
sa
n
)) = B((M
sa
n
)
m
, M
sa
n
)
such that
|
m1
f(A+ X)
m1
f(A)
m
f(A)(X)|
|X|
2
0 as X M
sa
n
and X 0,
with respect to the norm (3.19) of B((M
sa
n
)
m1
, M
sa
n
). Then
m
f(A) is called
the mth Frechet derivative of f at A. Note that the norms of M
sa
n
and
B((M
sa
n
)
m
, M
sa
n
) are irrelevant to the denition of Frechet derivatives since
the norms on a nite-dimensional vector space are all equivalent; we can use
the Hilbert-Schmidt norm just for convenience.
Example 3.32 Let f(x) = x
k
with k N. Then (A+X)
k
can be expanded
and f(A)(X) consists of the terms containing exactly one factor of X:
f(A)(X) =
k1

u=0
A
u
XA
k1u
.
To have the second derivative, we put A+ Y in place of A in f(A)(X) and
again we take the terms containing exactly one factor of Y :

2
f(A)(X, Y )
=
k1

u=0
_
u1

v=0
A
v
Y A
u1v
_
XA
k1u
+
k1

u=0
A
u
X
_
k2u

v=0
A
v
Y A
k2uv
_
.
132 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION
The formulation

2
f(A)(X
1
, X
2
) =

u+v+w=n2

A
u
X
(1)
A
v
X
(2)
A
w
is more convenient, where u, v, w 0 and denotes the permutations of
1, 2.
Theorem 3.33 Let m N and assume that f : (a, b) R is a C
m
-function.
Then the following properties hold:
(1) f(A) is m times Frechet dierentiable at every A M
sa
n
(a, b). If the
diagonalization of A M
sa
n
(a, b) is A = UDiag(
1
, . . . ,
n
)U

, then the
mth Frechet derivative
m
f(A) is given as

m
f(A)(X
1
, . . . , X
m
) = U
_
n

k
1
,...,k
m1
=1
f
[m]
[
i
,
k
1
, . . . ,
k
m1
,
j
]

Sm
(X

(1)
)
ik
1
(X

(2)
)
k
1
k
2
(X

(m1)
)
k
m2
k
m1
(X

(m)
)
k
m1
j
_
n
i,j=1
U

for all X
i
M
sa
n
with X

i
= U

X
i
U (1 i m). (S
m
is the permuta-
tions on 1, . . . , m.)
(2) The map A
m
f(A) is a norm-continuous map from M
sa
n
(a, b) to
B((M
sa
n
)
m
, M
sa
n
).
(3) For every A M
sa
n
(a, b) and every X
1
, . . . , X
m
M
sa
n
,

m
f(A)(X
1
, . . . , X
m
) =

m
t
1
t
m
f(A+t
1
X
1
+ +t
m
X
m
)

t
1
==tm=0
.
Proof: When f(x) = x
k
, it is easily veried by a direct computation that

m
f(A) exists and

m
f(A)(X
1
, . . . , X
m
)
=

u
0
,u
1
,...,um0
u
0
+u
1
++um=km

Sm
A
u
0
X
(1)
A
u
1
X
(2)
A
u
2
A
u
m1
X
(m)
A
um
,
see Example 3.32. (If m > k, then
m
f(A) = 0.) The above expression is
further written as

u
0
,u
1
,...,um0
u
0
+u
1
+...+um=km

Sm
U
_

k
1
,...,k
m1
=1

u
0
i

u
1
k
1

u
m1
k
m1

um
j
3.4. FR

ECHET DERIVATIVES 133


(X

(1)
)
ik
1
(X

(2)
)
k
1
k
2
(X

(m1)
)
k
m2
k
m1
(X

(m)
)
k
m1
j
_
n
i,j=1
U

= U
_
n

k
1
,...,k
m1
=1
_

u
0
,u
1
,...,um0
u
0
+u
1
++um=km

u
0
i

u
1
k
1

u
m1
k
m1

um
j
_

Sm
(X

(1)
)
ik
1
(X

(2)
)
k
1
k
2
(X

(m1)
)
k
m2
k
m1
(X

(m)
)
k
m1
j
_
n
i,j=1
U

= U
_
n

k
1
,...,k
m1
=1
f
[m]
[
i
,
k
1
, . . . ,
k
m1
,
j
]

Sm
(X

(1)
)
ik
1
(X

(2)
)
k
1
k
2
(X

(m1)
)
k
m2
k
m1
(X

(m)
)
k
m1
j
_
n
i,j=1
U

by Exercise 31. Hence it follows that


m
f(A) exists and the expression in (1)
is valid for all polynomials f. We can prove this and the continuity assertion
in (2) for all C
m
functions f on (a, b) by a method based on indunction on
m and approximation by polynomials. The details are not given here.
Formula (3) comes from the fact that Frechet dierentiability implies
G ataux (or directional) dierentiability, one can dierentiate f(A + t
1
X
1
+
+ t
m
X
m
) as

m
t
1
t
m
f(A+ t
1
X
1
+ + t
m
X
m
)

t
1
==tm=0
=

m
t
1
t
m1
f(A+ t
1
X
1
+ + t
m1
X
m1
)(X
m
)

t
1
==t
m1
=0
= =
m
f(A)(X
1
, . . . , X
m
).

Example 3.34 In particular, when f is C


1
on (a, b) and A = Diag(
1
, . . . ,
n
)
is diagonal in M
sa
n
(a, b), then the Frechet derivative f(A) at A is written as
f(A)(X) = [f
[1]
(
i
,
j
)]
n
i,j=1
X,
where denotes the Schur product, this was Theorem 3.25.
When f is C
2
on (a, b), the second Frechet derivative
2
f(A) at A =
Diag(
1
, . . . ,
n
) M
sa
n
(a, b) is written as

2
f(A)(X, Y ) =
_
n

k=1
f
[2]
(
i
,
k
,
j
)(X
ik
Y
kj
+ Y
ik
X
kj
)
_
n
i,j=1
.

134 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION


Example 3.35 The Taylor expansion
f(A+ X) = f(A) +

k=1
1
k!

k
f(A)(X, . . . , X
. .
m
)
has a simple computation for a holomorphic function f, see (3.10):
f(A+ X) =
1
2i
_

f(z)(zI AX)
1
dz .
Since
zI A X = (zI A)
1/2
(I (zI A)
1/2
X(zI A)
1/2
)(zI A)
1/2
,
we have the expansion
(zI AX)
1
= (zI A)
1/2
(I (zI A)
1/2
X(zI A)
1/2
)
1
(zI A)
1/2
= (zI A)
1/2

n=0
_
(zI A)
1/2
X(zI A)
1/2
_
n
(zI A)
1/2
= (zI A)
1
+ (zI A)
1
X(zI A)
1
+(zI A)
1
X(zI A)
1
X(zI A)
1
+ .
Hence
f(A+ X) =
1
2i
_

f(z)(zI A)
1
dz
+
1
2i
_

f(z)(zI A)
1
X(zI A)
1
dz +
= f(A) + f(A)(X) +
1
2!

2
f(A)(X, X) + ,
which is the Taylor expansion.
When f satises the C
m
assumption as in Theorem 3.33, we have the
Taylor formula
f(A+ X) = f(A) +

k=1
1
k!

k
f(A)(X
(1)
, . . . , X
(k)
) + o(|X|
m
2
)
as X M
sa
n
and X 0. The details are not given here.
3.5. NOTES AND REMARKS 135
3.5 Notes and remarks
Formula (3.7) is due to Masuo Suzuki, Generalized Trotters formula and
systematic approximants of exponential operators and inner derivations with
applications to many-body problems, Commun. Math. Phys. 51(1976), 183
190.
The Bessis-Moussa-Villani conjecture (or BMV conjecture) was pub-
lished in the paper D. Bessis, P. Moussa and M. Villani, Monotonic converging
variational approximations to the functional integrals in quantum statistical
mechanics, J. Math. Phys. 16(1975), 23182325. Theorem 3.12 is from the
paper [64] of E. H. Lieb and R. Seiringer. A proof appeared in the paper H.
R. Stahl, Proof of the BMV conjecture, arXiv:1107.4875v3.
The contour integral representation (3.10) was found by Henri Poincare in
1899. Formula (3.18) is called Hermite-Genocchi formula.
Formula (3.11) appeared already in the work of J.J. Sylvester in 1833 and
(3.12) is due to H. Richter in 1949. It is remarkable that J. von Neumann
proved in 1929 that |A I|, |B I|, |AB I| < 1 and AB = BA implies
log AB = log A+ log B.
Theorem 3.33 is essentially due to Daleckii and Krein, Ju. L. Daleckii and
S. G. Krein, Integration and dierentiation of functions of Hermitian oper-
ators and applications to the theory of perturbations, Amer. Math. Soc.
Transl., Ser. 2 47(1965), 130. There the higher G ateaux derivatives of the
function t f(A+tX) were obtained for self-adjoint operators in an innite-
dimensional Hilbert space. As to the version of Frechet derivatives in Theorem
3.33, the proof for the case m = 1 is in the book [20] of Rajendra Bhatia and
the proof for the higher degree case is in Fumio Hiai, Matrix Analysis: Ma-
trix Monotone Functions, Matrix Means, and Majorization, Interdisciplinary
Information Sciences 16(2010), 139248.
3.6 Exercises
1. Prove that

t
e
tA
= e
tA
A.
136 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION
2. Compute the exponential of the matrix
_

_
0 0 0 0 0
1 0 0 0 0
0 2 0 0 0
0 0 3 0 0
0 0 0 4 0
_

_
.
What is the extension to the n n case?
3. Use formula (3.3) to prove Theorem 3.4.
4. Let P and Q be ortho-projections. Give an elementary proof for the
inequality
Tr e
P+Q
Tr e
P
e
Q
.
5. Prove the Golden-Thompson inequality using the trace inequality
Tr (CD)
n
Tr C
n
D
n
(n N)
for C, D 0.
6. Give a counterexample for the inequality
[Tr e
A
e
B
e
C
[ Tr e
A+B+C
with Hermitian matrices. (Hint: Use the Pauli matrices.)
7. Solve the equation
e
A
=
_
cos t sin t
sin t cos t
_
where t R is given.
8. Show that
exp
__
A B
0 A
__
=
_
e
A
_
1
0
e
tA
Be
(1t)A
dt
0 e
A
_
.
9. Let A and B be self-adjoint matrices. Show that
[Tr e
A+iB
[ Tr e
A
.
10. Show the estimate
|e
A+B
(e
A/n
e
B/n
)
n
|
2

1
2n
|AB BA|
2
exp(|A|
2
+|B|
2
).
3.6. EXERCISES 137
11. Show that |A I|, |B I|, |AB I| < 1 and AB = BA implies
log AB = log A+ log B for matrices A and B.
12. Find an example that AB = BA for matrices, but log AB ,= log A +
log B.
13. Let
C = c
0
I + c(c
1

1
+ c
2

2
+ c
3

3
) with c
2
1
+ c
2
2
+ c
2
3
= 1,
where
1
,
2
,
3
are the Pauli matrices and c
0
, c
1
, c
2
, c
3
R. Show
that
e
C
= e
c
0
((cosh c)I + (sinh c)(c
1

1
+ c
2

2
+ c
3

3
)) .
14. Let A M
3
have eigenvalues , , with ,= . Show that
e
tA
= e
t
(I + t(AI)) +
e
t
e
t
( )
2
(AI)
2

te
t

(AI)
2
.
15. Assume that A M
3
has dierent eigenvalues , , . Show that e
tA
is
e
t
(AI)(AI)
( )( )
+ e
t
(AI)(AI)
( )( )
+ e
t
(AI)(AI)
( )( )
.
16. Assume that A M
n
is diagonalizable and let f(t) = t
m
with m N.
Show that (3.8) and (3.10) are the same matrices.
17. Prove Corollary 3.11 directly in the case B = AX XA.
18. Let 0 < D M
n
be a xed invertible positive matrix. Show that the
inverse of the linear mapping
J
D
: M
n
M
n
, J
D
(B) :=
1
2
(DB + BD)
is the mapping
J
1
D
(A) =
_

0
e
tD/2
Ae
tD/2
dt .
19. Let 0 < D M
n
be a xed invertible positive matrix. Show that the
inverse of the linear mapping
J
D
: M
n
M
n
, J
D
(B) :=
_
1
0
D
t
BD
1t
dt
is the mapping
J
1
D
(A) =
_

0
(D + tI)
1
A(D + tI)
1
dt. (3.20)
138 CHAPTER 3. FUNCTIONAL CALCULUS AND DERIVATION
20. Prove (3.16) directly for the case f(t) = t
n
, n N.
21. Let f : [, ] R be a convex function. Show that
Tr f(B)

i
f(Tr Bp
i
) . (3.21)
for a pairwise orthogonal family (p
i
) of minimal projections with

i
p
i
=
I and for a self-ajoint matrix B with spectrum in [, ]. (Hint: Use the
spectral decomposition of B.)
22. Prove Theorem 3.27 using formula (3.21). (Hint: Take the spectral
decomposition of B = B
1
+ (1 )B
2
and show
Tr f(B
1
) + (1 )Tr f(B
2
) Tr f(B).)
23. A and B are positive matrices. Show that
A
1
log(AB
1
) = A
1/2
log(A
1/2
B
1
A
1/2
)A
1/2
.
(Hint: Use (3.17).)
24. Show that
d
2
dt
2
log(A+ tK)

t=0
= 2
_

0
(A+ sI)
1
K(A+ sI)
1
K(A + sI)
1
ds.
25. Show that

2
log A(X
1
, X
2
) =
_

0
(A+ sI)
1
X
1
(A + sI)
1
X
2
(A+ sI)
1
ds

_

0
(A + sI)
1
X
2
(A+ sI)
1
X
1
(A+ sI)
1
ds
for a positive invertible matrix A.
26. Prove the BMV conjecture for 2 2 matrices.
27. Show that

2
A
1
(X
1
, X
2
) = A
1
X
1
A
1
X
2
A
1
+ A
1
X
2
A
1
X
1
A
1
for an invertible variable A.
3.6. EXERCISES 139
28. Dierentiate the equation

A+ tB

A+ tB = A+ tB
and show that for positive A and B
d
dt

A+ tB

t=0
0.
29. For a real number 0 < ,= 1 the Renyi entropy is dened as
S

(D) :=
1
1
log Tr D

for a positive matrix D such that Tr D = 1. Show that S

(D) is a
decreasing function of . What is the limit lim
1
S

(D)? Show that


S

(D) is a concave functional of D for 0 < < 1.


30. Fix a positive invertible matrix D M
n
and set a linear mapping
M
n
M
n
by K
D
(A) := DAD. Consider the dierential equation

t
D(t) = K
D(t)
T, D(0) =
0
, (3.22)
where
0
is positive invertible and T is self-adjoint in M
n
. Show that
D(t) = (
1
0
tT)
1
is the solution of the equation.
31. When f(x) = x
k
with k N, verify that
f
[n]
[x
1
, x
2
, . . . , x
n+1
] =

u
1
,u
2
,...,u
n+1
0
u
1
+u
2
+...+u
n+1
=kn
x
u
1
1
x
u
2
2
x
un
n
x
u
n+1
n+1
.
32. Show that for a matrix A > 0 the integral
log(I + A) =
_

1
A(tI + A)
1
t
1
dt
holds. (Hint: Use (3.12).)
Chapter 4
Matrix monotone functions and
convexity
Let (a, b) R be an interval. A function f : (a, b) R is said to be
monotone for n n matrices if f(A) f(B) whenever A and B are self-
adjoint nn matrices, A B and their eigenvalues are in (a, b). If a function
is monotone for every matrix size, then it is called matrix monotone or
operator monotone. (One can see by an approximation argument that if
a function is matrix monotone for every matrix size, then A B implies
f(A) f(B) also for operators on an innite-dimensional Hilbert space.)
The theory of operator/matrix monotone functions was initiated by Karel
L owner, which was soon followed by Fritz Kraus on operator/matrix con-
vex functions. After further developments due to some authors (for instance,
Bendat and Sherman, Koranyi), Hansen and Pedersen established a modern
treatment of matrix monotone and convex functions. A remarkable feature of
L owners theory is that we have several characterizations of matrix monotone
and matrix convex functions from several dierent points of view. The im-
portance of complex analysis in studying matrix monotone functions is well
understood from their characterization in terms of analytic continuation as
Pick functions. Integral representations for matrix monotone and matrix con-
vex functions are essential ingredients of the theory both theoretically and in
applications. The notion of divided dierences has played a vital role in the
theory from its very beginning.
Let (a, b) R be an interval. A function f : (a, b) R is said to be
matrix convex if
f(tA+ (1 t)B) tf(A) + (1 t)f(B)
for all self-adjoint matrices A, B with eigenvalues in (a, b) and for all 0 t
140
4.1. SOME EXAMPLES OF FUNCTIONS 141
1. When f is matrix convex, then f is called matrix concave.
In the real analysis the monotonicity and convexity are not related, but
in the matrix case the situation is very dierent. For example, a matrix
monotone function on (0, ) is matrix concave. Matrix monotone and matrix
convex functions have several applications, but for a concrete function it is
not easy to verify the matrix monotonicity or matrix convexity. The typical
description of these functions is based on integral formulas.
4.1 Some examples of functions
Example 4.1 Let t > 0 be a parameter. The function f(x) = (t + x)
1
is
matrix monotone on [0, ).
Let A and B be positive matrices of the same order. Then A
t
:= tI + A
and B
t
:= tI + B are invertible, and
A
t
B
t
B
1/2
t
A
t
B
1/2
t
I |B
1/2
t
A
t
B
1/2
t
| 1
|A
1/2
t
B
1/2
t
| 1.
Since the adjoint preserves the operator norm, the latest condition is equiva-
lent to |B
1/2
t
A
1/2
t
| 1, which implies that B
1
t
A
1
t
.
Example 4.2 The function f(x) = log x is matrix monotone on (0, ).
This follows from the formula
log x =
_

0
1
1 + t

1
x + t
dt ,
which is easy to verify. The integrand
f
t
(x) :=
1
1 + t

1
x + t
is matrix monotone according to the previous example. It follows that
n

i=1
c
i
f
t(i)
(x)
is matrix monotone for any t(i) and positive c
i
R. The integral is the limit
of such functions. Therefore it is a matrix monotone function as well.
There are several other ways to show the matrix monotonicity of the log-
arithm.
142 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
Example 4.3 The function
f
+
(x) =
0

n=
_
1
(n 1/2) x

n
n
2
+ 1
_
is matrix monotone on the interval (/2, +) and
f

(x) =

n=1
_
1
(n 1/2) x

n
n
2
+ 1
_
is matrix monotone on the interval (, /2). Therefore,
tan x = f
+
(x) + f

(x) =

n=
_
1
(n 1/2) x

n
n
2
+ 1
_
is matrix monotone on the interval (/2, /2).
Example 4.4 To show that the square root function is matrix monotone,
consider the function
F(t) :=

A + tX
dened for t [0, 1] and for xed positive matrices A and X. If F is increas-
ing, then F(0) =

A+ X = F(1).
In order to show that F is increasing, it is enough to see that the eigen-
values of F

(t) are positive. Dierentiating the equality F(t)F(t) = A + tX,


we have
F

(t)F(t) + F(t)F

(t) = X.
As the limit of self-adjoint matrices, F

is self-adjoint and let F

(t) =

i
E
i
be its spectral decomposition. (Of course, both the eigenvalues and the pro-
jections depend on the value of t.) Then

i
(E
i
F(t) + F(t)E
i
) = X
and after multiplication by E
j
from the left and from the right, we have for
the trace
2
j
Tr E
j
F(t)E
j
= Tr E
j
XE
j
.
Since both traces are positive,
j
must be positive as well.
More generally, for every 0 < t < 1 the matrix monotonicity holds: 0
A B implies A
t
B
t
. This is often called the L owner-Heinz inequality.
4.1. SOME EXAMPLES OF FUNCTIONS 143
A proof will be given in Example 4.45, and another approach is in Theorem
5.3 based on the geometric mean.
Next we consider the case t > 1. Take the matrices
A =
_
3
2
0
0
3
4
_
and B =
1
2
_
1 1
1 1
_
.
Then A B 0 can be checked. Since B is an orthogonal projection, for
each p > 1 we have B
p
= B and
A
p
B
p
=
_ _
3
2
_
p

1
2

1
2

1
2
_
3
4
_
p

1
2
_
.
We can compute
det(A
p
B
p
) =
1
2
_
3
8
_
p
(2 3
p
2
p
4
p
).
If A
p
B
p
then we must have det(A
p
B
p
) 0 so that 2 3
p
2
p
4
p
0,
which is not true when p > 1. Hence A
p
B
p
does not hold for any p > 1.

The previous example contained an important idea. To decide about the


matrix monotonicity of a function f, one has to investigate the derivative of
f(A+ tX).
Theorem 4.5 A smooth function f : (a, b) R is matrix monotone for
n n matrices if and only if the divided dierence matrix D M
n
dened as
D
ij
=
_

_
f(t
i
) f(t
j
)
t
i
t
j
if t
i
t
j
,= 0,
f

(t
i
) if t
i
t
j
= 0,
is positive semidenite for t
1
, t
2
, . . . , t
n
(a, b).
Proof: Let A be a self-adjoint and B be a positive semidenite nn matrix.
When f is matrix monotone, the function t f(A + tB) is an increasing
function of the real variable t. Therefore, the derivative, which is a matrix,
must be positive semidenite. To compute the derivative, we use formula
(3.16) of Theorem 3.25. The Schur theorem implies that the derivative is
positive if the divided dierence matrix is positive.
To show the converse, take a matrix B such that all entries are 1. Then
positivity of the derivative D B = D is the positivity of D.
144 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
The assumption about the smooth property in the previous theorem is
not essential. At the beginning of the theory L owner proved that if the
function f : (a, b) R has the property that A B for A, B M
2
implies
f(A) f(B), then f must be a C
1
-function.
The previous theorem can be reformulated in terms of a positive denite
kernel. The divided dierence
(x, y) =
_

_
f(x) f(y)
x y
if x ,= y,
f

(x) if x = y
is an (a, b) (a, b) R kernel function. f is matrix monotone if and only if
is a positive denite kernel.
Example 4.6 The function f(x) := exp x is not matrix monotone, since the
divided dierence matrix
_

_
exp x
exp x exp y
x y
exp y exp x
y x
exp y
_

_
does not have positive determinant (for x = 0 and for large y).
Example 4.7 We study the monotone function
f(x) =
_

x if 0 x 1,
(1 + x)/2 if 1 x.
This is matrix monotone in the intervals [0, 1] and [1, ). Theorem 4.5 helps
to show that this is monotone on [0, ) for 2 2 matrices. We should show
that for 0 < x < 1 and 1 < y
_
f

(x)
f(x)f(y)
xy
f(x)f(y)
xy
f

(y)
_
=
_
f

(x) f

(z)
f

(z) f

(y)
_
(for some z [x, y])
is a positive matrix. This is true, however f is not monotone for larger
matrices.
Example 4.8 The function f(x) = x
2
is matrix convex on the whole real
line. This follows from the obvious inequality
_
A+ B
2
_
2

A
2
+ B
2
2
.

4.2. CONVEXITY 145


Example 4.9 The function f(x) = (x+t)
1
is matrix convex on [0, ) when
t > 0. It is enough to show that
_
A+ B
2
_
1

A
1
+ B
1
2
,
which is equivalent to
_
B
1/2
AB
1/2
+ I
2
_
1

(B
1/2
AB
1/2
)
1
+ I
2
.
This holds, since
_
X + I
2
_
1

X
1
+ I
2
is true for an invertible matrix X 0.
Note that this convexity inequality is equivalent to the relation of arith-
metic and harmonic means.
4.2 Convexity
Let V be a vector space (over the real scalars). Let u, v V . Then they are
called the endpoints of the line-segment
[u, v] := u + (1 )v : R, 0 1.
A subset / V is convex if for any u, v / the line-segment [u, v] is
contained in /. A set / V is convex if and only if for every nite subset
v
1
, v
2
, . . . , v
n
and for every family of real positive numbers
1
,
2
, . . . ,
n
with
sum 1
n

i=1

i
v
i
/.
For example, if | | : V R
+
is a norm, then
v V : |v| 1
is a convex set. The intersection of convex sets is a convex set.
In the vector space M
n
the self-adjoint matrices and the positive matrices
form a convex set. Let (a, b) a real interval. Then
A M
sa
n
: (A) (a, b)
is a convex set.
146 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
Example 4.10 Let
o
n
:= D M
sa
n
: D 0 and Tr D = 1.
This is a convex set, since it is the intersection of convex sets. (In quantum
theory the set is called the state space.)
If n = 2, then a popular parametrization of the matrices in o
2
is
1
2
_
1 +
3

1
i
2

1
+ i
2
1
3
_
=
1
2
(I +
1

1
+
2

2
+
3

3
),
where
1
,
2
,
3
are the Pauli matrices and the necessary and sucient con-
dition to be in o
2
is

2
1
+
2
2
+
2
3
1.
This shows that the convex set o
2
can be viewed as the unit ball in R
3
. If
n > 2, then the geometric picture of o
n
is not so clear.
If / is a subset of the vector space V , then its convex hull is the smallest
convex set containg /, it is denoted by co /, i.e.,
co / :=
_
n

i=1

i
v
i
: v
i
/,
i
0, 1 i n,
n

i=1

i
= 1, n N
_
.
Let / V be a convex set. The vector v / is an extreme point of /
if the conditions
v
1
, v
2
/, 0 < < 1, v
1
+ (1 )v
2
= v
imply that v
1
= v
2
= v.
In the convex set o
2
the extreme points correspond to the parameters
satisfying
2
1
+
2
2
+
2
3
= 1. (If o
2
is viewed as a ball in R
3
, then the extreme
points are in the boundary of the ball.) For extreme points of o
n
, see Exercise
14.
Let J R be an interval. A function f : J R is said to be convex if
f(ta + (1 t)b) tf(a) + (1 t)f(b) (4.1)
for all a, b J and 0 t 1. This inequality is equivalent to the positivity
of the second divided dierence
f
[2]
[a, b, c] =
f(a)
(a b)(a c)
+
f(b)
(b a)(b c)
+
f(c)
(c a)(c b)
=
1
c b
_
f(c) f(a)
c a

f(b) f(a)
b a
_
4.2. CONVEXITY 147
for every dierent a, b, c J. If f C
2
(J), then for x J we have
lim
a,b,cx
f
[2]
[a, b, c] =
f

(x)
2
.
Hence the convexity is equivalent to the positivity of the second derivative.
For a convex function f the Jensen inequality
f
_

i
t
i
a
i
_

i
t
i
f(a
i
)
holds whenever a
i
J, t
i
0 and

i
t
i
= 1. This inequality has an integral
form
f
__
g(x) d(x)
_

_
f g(x) d(x).
For a nite discrete probability measure this is exactly the Jensen inequality,
but it holds for any probability measure on J and for a bounded Borel
function g with values in J.
Denition (4.1) makes sense if J is a convex subset of a vector space and
f is a real functional dened on it.
A functional f is concave if f is convex.
Let V be a nite-dimensional vector space and / V be a convex subset.
The functional F : / R + is called convex if
F(x + (1 )y) F(x) + (1 )F(y)
for every x, y / and real number 0 < < 1. Let [u, v] / be a line-
segment and dene the function
F
[u,v]
() = F(u + (1 )v)
on the interval [0, 1]. F is convex if and only if all functions F
[u,v]
: [0, 1] R
are convex when u, v /.
Example 4.11 We show that the functional
A log Tr e
A
is convex on the self-adjoint matrices, see Example 4.13.
The statement is equivalent to the convexity of the function
f(t) = log Tr (e
A+tB
) (t R) (4.2)
148 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
for every A, B M
sa
n
. To show this we prove that f

(0) 0. It follows from


Theorem 3.23 that
f

(t) =
Tr e
A+tB
B
Tr e
A+tB
.
In the computation of the second derivative we use Dysons expansion
e
A+tB
= e
A
+ t
_
1
0
e
uA
Be
(1u)(A+tB)
du .
In order to write f

(0) in a convenient form we introduce the inner product


X, Y
Bo
:=
_
1
0
Tr e
tA
X

e
(1t)A
Y dt.
(This is frequently termed Bogoliubov inner product.) Now
f

(0) =
I, I
Bo
B, B
Bo
I, B
2
Bo
(Tr e
A
)
2
,
which is positive due to the Schwarz inequality.
Let V be a nite-dimensional vector space with dual V

. Assume that the


duality is given by a bilinear pairing , . For a convex function F : V
R + the conjugate convex function F

: V

R + is given
by the formula
F

(v

) = supv, v

F(v) : v V .
F

is sometimes called the Legendre transform of F. Since F

is the supre-
mum of continuous linear functionals, it is convex and lower semi-continuous.
The following duality theorem is basic in convex analysis.
Theorem 4.12 If F : V R + is a lower semi-continuous convex
functional, then F

= F.
Example 4.13 The negative von Neumann entropy S(D) = Tr (D) =
Tr Dlog D is continuous and convex on the density matrices. Let
F(X) =
_
Tr X log X if X 0 and Tr X = 1,
+ otherwise.
This is a lower semi-continuous convex functional on the linear space of all self-
adjoint matrices. The duality is X, H = Tr XH. The conjugate functional
is
F

(H) = supTr XH F(X) : X M


sa
n

4.2. CONVEXITY 149
= infTr XH S(D) : D M
sa
n
, D 0, Tr D = 1 .
According to Example 3.29 the minimizer is D = e
H
/Tr e
H
, and therefore
F

(H) = log Tr e
H
.
This is a continuous convex function of H M
sa
n
. So Example 4.11 is recov-
ered. The duality theorem gives that
Tr X log X = supTr XH log Tr e
H
: H = H

when X 0 and Tr X = 1.
Example 4.14 Fix a density matrix = e
H
(with a self-adjoint H) and
consider the functional F dened on the self-adjoint matrices by
F(X) :=
_
Tr X(log X H) if X 0 and Tr X = 1 ,
+ otherwise.
F is essentially the relative entropy with respect to :
S(X|) := Tr X(log X log ).
The duality is X, B = Tr XB if X and B are self-adjoint matrices. We
want to show that the functional B log Tr e
H+B
is the Legendre transform
or the conjugate function of F:
log Tr e
B+H
= maxTr XB S(X|e
H
) : X is positive, Tr X = 1 .
Introduce the notation
f(X) = Tr XB S(X|e
H
)
for a density matrix X. When P
1
, . . . , P
n
are projections of rank one with

n
i=1
P
i
= I, we write
f
_
n

i=1

i
P
i
_
=
n

i=1
(
i
Tr P
i
B +
i
Tr P
i
H
i
log
i
) ,
where
i
0,

n
i=1

i
= 1. Since

i
f
_
n

i=1

i
P
i
_

i
=0
= +,
150 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
we see that f(X) attains its maximum at a matrix X
0
> 0, Tr X
0
= 1. Then
for any self-adjoint Z, Tr Z = 0, we have
0 =
d
dt
f(X
0
+ tZ)

t=0
= Tr Z(B + H log X
0
) ,
so that B +H log X
0
= cI with c R. Therefore X
0
= e
B+H
/Tr e
B+H
and
f(X
0
) = log Tr e
B+H
by a simple computation.
On the other hand, if X is positive invertible with Tr X = 1, then
S(X|e
H
) = maxTr XB log Tr e
H+B
: B is self-adjoint
due to the duality theorem.
Theorem 4.15 Let : M
n
M
m
be a positive unital linear mapping and
f : R R be a convex function. Then
Tr f((A)) Tr (f(A))
for every A M
sa
n
.
Proof: Take the spectral decompositions
A =

j
Q
j
and (A) =

i
P
i
.
So we have

i
= Tr ((A)P
i
)/Tr P
i
=

j
Tr ((Q
j
)P
i
)/Tr P
i
,
whereas the convexity of f yields
f(
i
)

j
f(
j
)Tr ((Q
j
)P
i
)/Tr P
i
.
Therefore,
Tr f((A)) =

i
f(
i
)Tr P
i

i,j
f(
j
)Tr ((Q
j
)P
i
) = Tr (f(A)) ,
which was to be proven.
It was stated in Theorem 3.27 that for a convex function f : (a, b) R,
the functional A Tr f(A) is convex. It is rather surprising that in the
convexity of this functional the number coecient 0 < t < 1 can be replaced
by a matrix.
4.2. CONVEXITY 151
Theorem 4.16 Let f : (a, b) R be a convex function and C
i
, A
i
M
n
be
such that
(A
i
) (a, b) and
k

i=1
C
i
C

i
= I.
Then
Tr f
_
k

i=1
C
i
A
i
C

i
_

i=1
Tr C
i
f(A
i
)C

i
.
Proof: We prove only the case
Tr f(CAC

+ DBD

) Tr Cf(A)C

+ Tr Df(B)D

,
when CC

+ DD

= I. (The more general version can be treated similarly.)


Set F := CAC

+ DBD

and consider the spectral decomposition of A


and B as integrals:
X =

X
i
P
X
i
=
_
dE
X
() (X = A, B),
where
X
i
are eigenvalues, P
X
i
are eigenprojections and the operator-valued
measure E
X
is dened on the Borel subsets S of R as
E
X
(S) =

P
X
i
:
X
i
S.
Assume that A, B, C, D M
n
and for a vector C
n
we dene a measure

(S) = (CE
A
(S)C

+ DE
B
(S)D

),
= E
A
(S)C

, C

+E
B
(S)D

, D

.
The reason of the denition of this measure is the formula
F, =
_
d

().
If is a unit eigenvector of F (and f(F)), then
f(CAC

+ DBD

), = f(F), = f(F, ) = f
__
d

()
_

_
f()d

()
= (Cf(A)C

+ Df(B)D

), .
(The inequality follows from the convexity of the function f.) To obtain the
statement we summarize this kind of inequalities for an orthonormal basis of
eigenvectors of F.
152 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
Example 4.17 The example is about a positive block-matrix A and a con-
cave function f : R
+
R. The inequality
Tr f
__
A
11
A
12
A

12
A
22
__
Tr f(A
11
) + Tr f(A
22
)
is called subadditivity. We can take ortho-projections P
1
and P
2
such that
P
1
+ P
2
= I and the subadditivity
Tr f(A) Tr f(P
1
AP
1
) + Tr f(P
2
AP
2
)
follows from the previous theorem. A stronger version of this inequality is
less trivial.
Let P
1
, P
2
and P
3
be ortho-projections such that P
1
+P
2
+P
3
= I. We use
the notation P
12
:= P
1
+P
2
and P
23
:= P
2
+P
3
. The strong subadditivity
is the inequality
Tr f(A) + Tr f(P
2
AP
2
) Tr f(P
12
AP
12
) + Tr f(P
23
AP
23
). (4.3)
Some details about this will come later, see Theorems 4.50 and 4.51.
Example 4.18 The log function is concave. If A M
n
is positive denite
and we set the projections P
i
:= E(ii), then from the previous theorem we
have
Tr log
n

i=1
P
i
AP
i

n

i=1
Tr P
i
(log A)P
i
.
This means
n

i=1
log A
ii
Tr log A
and the exponential is
n

i=1
A
ii
exp(Tr log A) = det A.
This is the well-known Hadamard inequality for the determinant, see The-
orem 1.30.
When /, / are two convex sets of matrices and F : / / R +
is a function of two matrix variables, F is called jointly concave if
F(A
1
+ (1 )A
2
, B
1
+ (1 )B
2
) F(A
1
, B
1
) + (1 )F(A
2
, B
2
)
4.2. CONVEXITY 153
for every A
i
/, B
i
/ and 0 < < 1. The function (A, B) / /
F(A, B) is jointly concave if and only if the function
AB / / F(A, B)
is concave. In this way the joint convexity and concavity are conveniently
studied.
Lemma 4.19 If (A, B) / / F(A, B) is jointly concave, then
f(A) := supF(A, B) : B /
is concave on /.
Proof: Assume that f(A
1
), f(A
2
) < +. Let > 0 be a small number.
We have B
1
and B
2
such that
f(A
1
) F(A
1
, B
1
) + and f(A
2
) F(A
2
, B
2
) + .
Then
f(A
1
) + (1 )f(A
2
) F(A
1
, B
1
) + (1 )F(A
2
, B
2
) +
F(A
1
+ (1 )A
2
, B
1
+ (1 )B
2
) +
f(A
1
+ (1 )A
2
) +
and this gives the proof.
The case of f(A
1
) = + or f(A
2
) = + has a similar proof.
Example 4.20 The quantum relative entropy of X 0 with respect to
Y 0 is dened as
S(X|Y ) := Tr (X log X X log Y ) Tr (X Y ).
It is known (see Example 3.19) that S(X|Y ) 0 and equality holds if and
only if X = Y . (We assumed X, Y > 0 in Example 3.19 but this is true for
general X, Y 0.) A dierent formulation is
Tr Y = max Tr (X log Y X log X + X) : X 0.
For a positive denite D, selecting Y = exp(L + log D) we obtain
Tr exp(L + log D) = maxTr (X(L + log D) X log X + X) : X 0
= maxTr (XL) S(X|D) + Tr D : X 0.
154 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
Since the quantum relative entropy is a jointly convex function, the func-
tion
F(X, D) := Tr (XL) S(X|D) + Tr D
is jointly concave as well. It follows that the maximization in X is concave
and we obtain that the functional
D Tr exp(L + log D) (4.4)
is concave on positive denite matrices. (This was the result of Lieb, but the
present proof is from [82].)
In the next lemma the operators
J
D
X =
_
1
0
D
t
XD
1t
dt, J
1
D
K =
_

0
(t + D)
1
K(t + D)
1
dt
for D, X, K M
n
, D > 0, are used (see Exercise 19 of Chapter 3). Liebs
concavity theorem (see Example ??) says that D > 0 Tr X

D
t
XD
1t
is
concave for every X M
n
. Thus, D > 0 X, J
D
X is concave. By using
this we prove the following:
Theorem 4.21 The functional
(D, K) Q(D, K) := K, J
1
D
K
is jointly convex on the domain D M
n
: D > 0 M
n
.
Proof: M
n
is a Hilbert space 1 with the Hilbert-Schmidt inner product.
The mapping K Q(D, K) is a quadratic form. When / := 1 1 and
D = D
1
+ (1 )D
2
, then
/(K
1
K
2
) := Q(D
1
, K
1
) + (1 )Q(D
2
, K
2
)
A(K
1
K
2
) := Q(D, K
1
+ (1 )K
2
)
are quadratic forms on /. Note that / is non-degenerate. In terms of /
and A the dominance A / is to be shown.
Let m and n be the corresponding sesquilinear forms on /, that is,
/() = m(, ), A() = n(, ) ( /) .
There exists a positive operator X on / such that
n(, ) = m(, X) (, /)
4.2. CONVEXITY 155
and our aim is to show that its eigenvalues are 1. If X(KL) = (KL)
for 0 ,= K L 11, we have
n(K L

, K L) = m(K

, K L)
for every K

, L

1. This is rewritten in terms of the Hilbert-Schmidt inner


product as
K

+(1 )L

, J
1
D
(K +(1 )L) = K

, J
1
D
1
K +(1 )L

, J
1
D
2
L ,
which is equivalent to the equations
J
1
D
(K + (1 )L) = J
1
D
1
K
and
J
1
D
(K + (1 )L) = J
1
D
2
L.
We infer
J
D
M = J
D
1
M + (1 )J
D
2
M
with the new notation M := J
1
D
(K + (1 )L). It follows that
M, J
D
M = M, J
D
1
M + (1 )M, J
D
2
M.
On the other hand, the concavity of D M, J
D
M tells the inequality
M, J
D
M M, J
D
1
M + (1 )M, J
D
2
M
and we arrive at 1 if M ,= 0. Otherwise, if M = 0, then we must have
K = L = 0 so that = 0.
Let J R be an interval. As introduced at the beginning of the chapter,
a function f : J R is said to be matrix convex if
f(tA+ (1 t)B) tf(A) + (1 t)f(B) (4.5)
for all self-adjoint matrices Aand B whose spectra are in J and for all numbers
0 t 1. (The function f is matrix convex if the functional A f(A) is
convex.) f is matrix concave if f is matrix convex.
The classical result is about matrix convex functions on the interval (1, 1).
They have integral decomposition
f(x) =
0
+
1
x +

2
2
_
1
1
x
2
1 x
d(), (4.6)
where is a probability measure and
2
0. (In particular, f must be an
analytic function.) The details will be given in Theorem 4.40.
Since self-adjoint operators on an innite-dimensional Hilbert space may
be approximated by self-adjoint matrices, (4.5) holds for operators when it
holds for all matrices. The point in the next theorem is that in the convex
combination tA+(1t)B the numbers t and 1t can be replaced by matrices.
156 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
Theorem 4.22 Let f : [a, b] R be a matrix convex function and C
i
, A
i
=
A

i
M
n
be such that
(A
i
) [a, b] and
k

i=1
C
i
C

i
= I.
Then
f
_
k

i=1
C
i
A
i
C

i
_

i=1
C
i
f(A
i
)C

i
. (4.7)
Proof: We are content to prove the case k = 2:
f(CAC

+ DBD

) Cf(A)C

+ Df(B)D

,
when CC

+ DD

= I. The essential idea is contained in this case.


The condition CC

+ DD

= I implies that we can nd a unitary block-


matrix
U :=
_
C D
X Y
_
when the entries X and Y are chosen properly. (Indeed, since [D

[ = (I
CC

)
1/2
, we have the polar decomposition D

= W[D

[ with a unitary W.
Then it is an exercise to show that the choice of X = (I C

C)
1/2
and
Y = C

satises the requirements.) Then


U
_
A 0
0 B
_
U

=
_
CAC

+ DBD

CAX

+ DBY

XAC

+ Y BD

XAX

+ Y BY

_
=:
_
A
11
A
12
A
21
A
22
_
.
It is easy to check that
1
2
V
_
A
11
A
12
A
21
A
22
_
V +
1
2
_
A
11
A
12
A
21
A
22
_
=
_
A
11
0
0 A
22
_
for
V =
_
I 0
0 I
_
.
It follows that the matrix
Z :=
1
2
V U
_
A 0
0 B
_
U

V +
1
2
U
_
A 0
0 B
_
U

is diagonal, Z
11
= CAC

+ DBD

and f(Z)
11
= f(CAC

+ DBD

).
Next we use the matrix convexity of the function f:
f(Z)
1
2
f
_
V U
_
A 0
0 B
_
U

V
_
+
1
2
f
_
U
_
A 0
0 B
_
U

_
4.2. CONVEXITY 157
=
1
2
V Uf
__
A 0
0 B
__
U

V +
1
2
Uf
__
A 0
0 B
__
U

=
1
2
V U
_
f(A) 0
0 f(B)
_
U

V +
1
2
U
_
f(A) 0
0 f(B)
_
U

The right-hand side is diagonal with Cf(A)C

+ Df(B)D

as (1, 1) entry.
The inequality implies the inequality between the (1, 1) entries and this is
exactly the inequality (4.7) for k = 2.
In the proof of (4.7) for nn matrices, the ordinary matrix convexity was
used for (2n) (2n) matrices. That is an important trick. The next theorem
is due to Hansen and Pedersen [40].
Theorem 4.23 Let f : [a, b] R and a 0 b.
If f is a matrix convex function, |V | 1 and f(0) 0, then f(V

AV )
V

f(A)V holds if A = A

and (A) [a, b].


If f(PAP) Pf(A)P holds for an orthogonal projection P and A = A

with (A) [a, b], then f is a matrix convex function and f(0) 0.
Proof: If f is matrix convex, we can apply Theorem 4.22. Choose B = 0
and W such that V

V + W

W = I. Then
f(V

AV + W

BW) V

f(A)V + W

f(B)W
holds and gives our statement.
Let A and B be self-adjoint matrices with spectrum in [a, b] and 0 < < 1.
Dene
C :=
_
A 0
0 B
_
, U :=
_
I

1 I

1 I

I
_
, P :=
_
I 0
0 0
_
.
Then C = C

with (C) [a, b], U is a unitary and P is an orthogonal


projection. Since
PU

CUP =
_
A+ (1 )B 0
0 0
_
,
the assumption implies
_
f(A+ (1 )B) 0
0 f(0)I
_
= f(PU

CUP)
Pf(U

CU)P = PU

f(C)UP
158 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
=
_
f(A) + (1 )f(B) 0
0 0
_
.
This implies that f(A+(1 )B) f(A) +(1 )f(B) and f(0) 0.
Example 4.24 From the previous theorem we can deduce that if f : [0, b]
R is a matrix convex function and f(0) 0, then f(x)/x is matrix monotone
on the interval (0, b].
Assume that 0 < A B. Then B
1/2
A
1/2
=: V is a contraction, since
|V |
2
= |V V

| = |B
1/2
AB
1/2
| |B
1/2
BB
1/2
| = 1.
Therefore the theorem gives
f(A) = f(V

BV ) V

f(B)V = A
1/2
B
1/2
f(B)B
1/2
A
1/2
,
which is equivalent to A
1
f(A) B
1
f(B).
Now assume that g : [0, b] R is matrix monotone. We want to show
that f(x) := xg(x) is matrix convex. Due to the previous theorem we need
to show
PAPg(PAP) PAg(A)P
for an orthogonal projection P and A 0. From the monotonicity
g(A
1/2
PA
1/2
) g(A)
and this implies
PA
1/2
g(A
1/2
PA
1/2
)A
1/2
P PA
1/2
g(A)A
1/2
P.
Since g(A
1/2
PA
1/2
)A
1/2
P = A
1/2
Pg(PAP) and A
1/2
g(A)A
1/2
= Ag(A) we
nished the proof.
Example 4.25 Heuristically we can say that Theorem 4.22 replaces all the
numbers in the Jensen inequality f(

i
t
i
a
i
)

i
t
i
f(a
i
) by matrices. There-
fore
f
_

i
a
i
A
i
_

i
f(a
i
)A
i
(4.8)
holds for a matrix convex function f if

i
A
i
= I for the positive matrices
A
i
M
n
and for the numbers a
i
(a, b).
4.2. CONVEXITY 159
We want to show that the property (4.8) is equivalent to the matrix con-
vexity
f(tA+ (1 t)B) tf(A) + (1 t)f(B).
Let
A =

i
P
i
and B =

j
Q
j
be the spectral decompositions. Then

i
tP
i
+

j
(1 t)Q
j
= I
and from (4.8) we obtain
f(tA+ (1 t)B) = f
_

i
t
i
P
i
+

j
(1 t)
j
Q
j
_

i
f(
i
)tP
i
+

j
f(
j
)(1 t)Q
j
= tf(A) + (1 t)f(B).
This inequality was the aim.
An operator Z B(1) is called a contraction if Z

Z I and an ex-
pansion if Z

Z I. For an A M
n
(C)
sa
let (A) = (
1
(A), . . . ,
n
(A))
denote the eigenvalue vector of A in decreasing order with multiplicities.
Theorem 4.23 says that, for a function f : [a, b] R with a 0 b, the
matrix inequality f(Z

AZ) Z

f(A)Z for every A = A

with (A) [a, b]


and every contraction Z characterizes the matrix convexity of f with f(0) 0.
Now we take some similar inequalities in the weaker senses of eigenvalue
dominance under the simple convexity or concavity condition of f.
The rst theorem presents the eigenvalue dominance involving a contrac-
tion when f is a monotone convex function with f(0) 0.
Theorem 4.26 Assume that f is a monotone convex function on [a, b] with
a 0 b and f(0) 0. Then, for every A M
n
(C)
sa
with (A) [a, b]
and for every contraction Z M
n
(C), there exists a unitary U such that
f(Z

AZ) U

f(A)ZU,
or equivalently,

k
(f(Z

AZ))
k
(Z

f(A)Z) (1 k n).
160 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
Proof: We may assume that f is increasing; the other case is covered by
taking f(x) and A. First, note that for every B M
n
(C)
sa
and for every
vector x with |x| 1 we have
f(x, Bx) x, f(B)x. (4.9)
Indeed, taking the spectral decomposition B =

n
i=1

i
[u
i
u
i
[ we have
f(x, Bx) = f
_
n

i=1

i
[x, u
i
[
2
_

i=1
f(
i
)[x, u
i
[
2
+ f(0)(1 |x|
2
)

i=1
f(
i
)[x, u
i
[
2
= x, f(B)x
thanks to convexity of f and f(0) 0. By the minimax expression in
Theorem 1.27 there exists a subspace / of C
n
with dim/ = k 1 such
that

k
(Z

f(A)Z) = max
xM

, x=1
x, Z

f(A)Zx = max
xM

, x=1
Zx, f(A)Zx.
Since Z is a contraction and f is non-decreasing, we apply (4.9) to obtain

k
(Z

f(A)Z) max
xM

, x=1
f(Zx, AZx) = f
_
max
xM

, x=1
x, Z

AZx
_
f(
k
(Z

AZ)) =
k
(f(Z

AZ)).
In the second inequality above we have used the mini-max expression again.

The following corollary was originally proved by Brown and Kosaki [23] in
the von Neumann algebra setting.
Corollary 4.27 Let f be a function on [a, b] with a 0 b, and let A
M
n
(C)
sa
, (A) [a, b], and Z M
n
(C) be a contraction. If f is a convex
function with f(0) 0, then
Tr f(Z

AZ) Tr Z

f(A)Z.
If f is a concave function on R with f(0) 0, then
Tr f(Z

AZ) Tr Z

f(A)Z.
4.2. CONVEXITY 161
Proof: Obviously, the two assertions are equivalent. To prove the rst,
by approximation we may assume that f(x) = x + g(x) with R and a
monotone and convex function g on [a, b] with g(0) 0. Since Tr g(Z

AZ)
Tr Z

gf(A)Z by Theorem 4.26, we have Tr f(Z

AZ) Tr Z

f(A)Z.
The next theorem is the eigenvalue dominance version of Theorem 4.23 for
under the simple convexity condition of f.
Theorem 4.28 Assume that f is a monotone convex function on [a, b]. Then,
for every A
1
, . . . , A
m
M
n
(C)
sa
with (A
i
) [a, b] and every C
1
, . . . , C
m

M
n
(C) with

m
i=1
C

i
C
i
= I, there exists a unitary U such that
f
_
m

i=1
C

i
A
i
C
i
_
U

_
m

i=1
C

i
f(A
i
)C
i
_
U.
Proof: Letting f
0
(x) := f(x) f(0) we have
f
_

i
C

i
A
i
C
i
_
= f(0)I + f
0
_

i
C

i
A
i
C
i
_
,

i
C

i
f(A
i
)C
i
= f(0)I +

i
C

i
f
0
(A
i
)C
i
.
So it may be assumed that f(0) = 0. Set
A :=
_

_
A
1
0 0
0 A
2
0
.
.
.
.
.
.
.
.
.
.
.
.
0 0 A
m
_

_
and Z :=
_

_
C
1
0 0
C
2
0 0
.
.
.
.
.
.
.
.
.
.
.
.
C
m
0 0
_

_
For the block-matrices f(Z

AZ) and Z

f(A)Z, we can take the (1, 1) blocks:


f(

i
C

i
A
i
C
i
) and

i
C

i
f(A
i
)C
i
. Moreover, all other blocks are 0. Hence
Theorem 4.26 implies that

k
_
f
_

i
C

i
A
i
C
i
__

k
_

i
C

i
f(A
i
)C
i
_
(1 k n),
as desired.
A special case of Theorem 4.28 is that if f and A
1
, . . . , A
m
are as above,

1
, . . . ,
m
> 0 and

m
i=1

i
= 1, then there exists a unitary U such that
f
_
m

i=1

i
A
i
_
U

_
m

i=1

i
f(A
i
)
_
U.
This inequality implies the trace inequality in Theorem 4.16 though mono-
tonicity of f is not assumed there.
162 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
4.3 Pick functions
Let C
+
denote the upper half-plane,
C
+
:= z C : Imz > 0 = re
i
C : r > 0, 0 < < .
Now we concentrate on analytic functions f : C
+
C. Recall that the range
f(C
+
) is a connected open subset of C unless f is a constant. An analytic
function f : C
+
C
+
is called a Pick function.
The next examples show that this concept is in connection with the matrix
monotonicity property.
Example 4.29 Let z = re
i
with r > 0 and 0 < < . For a real parameter
p > 0 the function
f
p
(z) = z
p
:= r
p
e
ip
has the range in T if and only if p 1.
This function f
p
(z) is a continuous extension of the real function 0 x
x
p
. The latter is matrix monotone if and only if p 1. The similarity to the
Pick function concept is essential.
Recall that the real function 0 < x log x is matrix monotone as well.
The principal branch of log z dened as
Log z := log r + i
is a continuous extension of the real logarithm function and it is in T as well.

The next Nevanlinnas theorem provides the integral representation of


Pick functions.
Theorem 4.30 A function f : C
+
C is in T if and only if there exists an
R, a 0 and a positive nite Borel measure on R such that
f(z) = + z +
_

1 + z
z
d(), z C
+
. (4.10)
The integral representation (4.10) is also written as
f(z) = + z +
_

_
1
z

2
+ 1
_
d(), z C
+
, (4.11)
where is a positive Borel measure on R given by d() := (
2
+ 1) d()
and so
_

2
+ 1
d() < +.
4.3. PICK FUNCTIONS 163
Proof: The proof of the if part is easy. Assume that f is dened on C
+
as in (4.10). For each z C
+
, since
f(z + z) f(z)
z
= +
_
R

2
+ 1
( z)( z z)
d()
and
sup
_

2
+ 1
( z)( z z)

: R, [z[ <
Imz
2
_
< +,
it follows from the Lebesgue dominated convergence theorem that
lim
0
f(z + z) f(z)
z
= +
_
R

2
+ 1
( z)
2
d().
Hence f is analytic in C
+
. Since
Im
_
1 + z
z
_
=
(
2
+ 1) Imz
[ z[
2
, z C
+
,
we have
Imf(z) =
_
+
_
R

2
+ 1
[ z[
2
d()
_
Imz 0
for all z C
+
. Therefore, we have f T. The equivalence between the two
representations (4.10) and (4.11) is immediately seen from
1 + z
z
= (
2
+ 1)
_
1
z

2
+ 1
_
.
The only if is the signicant part, whose proof is skipped here.
Note that , and in Theorem 4.30 are uniquely determined by f. In
fact, letting z = i in (4.10) we have = Re f(i). Letting z = iy with y > 0
we have
f(iy) = + iy +
_

(1 y
2
) + iy(
2
+ 1)

2
+ y
2
d()
so that
Imf(iy)
y
= +
_

2
+ 1

2
+ y
2
d().
By the Lebesgue dominated convergence theorem this yields
= lim
y
Imf(iy)
y
.
164 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
Hence and are uniquely determined by f. By (4.11), for z = x + iy we
have
Imf(x + iy) = y +
_

y
(x )
2
+ y
2
d(), x R, y > 0. (4.12)
Thus the uniqueness of (hence ) is a consequence of the so-called Stieltjes
inversion formula. (For details omitted here, see [36, pp. 2426] and [20,
pp. 139141]).
For any open interval (a, b), a < b , we denote by T(a, b) the
set of all Pick functions which admits continuous extension to C
+
(a, b) with
real values on (a, b).
The next theorem is a specialization of Nevanlinnas theorem to functions
in T(a, b).
Theorem 4.31 A function f : C
+
C is in T(a, b) if and only if f is
represented as in (4.10) with R, 0 and a positive nite Borel measure
on R (a, b).
Proof: Let f T be represented as in (4.10) with R, 0 and a
positive nite Borel measure on R. It suces to prove that f T(a, b) if
and only if ((a, b)) = 0. First, assume that ((a, b)) = 0. The function f
expressed by (4.10) is analytic in C
+
C

so that f(z) = f(z) for all z C


+
.
For every x (a, b), since
sup
_

2
+ 1
( x)( x z)

: R (a, b), [z[ <


1
2
minx a, b x
_
is nite, the above proof of the if part of Theorem 4.30 by using the
Lebesgue dominated convergence theorem can work for z = x as well, and so
f is dierentiable (in the complex variable z) at z = x. Hence f T(a, b).
Conversely, assume that f T(a, b). It follows from (4.12) that
_

1
(x )
2
+ y
2
d() =
Imf(x + iy)
y
, x R, y > 0.
For any x (a, b), since f(x) R, we have
Imf(x + iy)
y
= Im
f(x + iy) f(x)
y
= Re
f(x + iy) f(x)
iy
Re f

(x)
as y 0 and so the monotone convergence theorem yields
_

1
(x )
2
d() = Re f

(x), x (a, b).


4.3. PICK FUNCTIONS 165
Hence, for any closed interval [c, d] included in (a, b), we have
R := sup
x[c,d]
_

1
(x )
2
d() = sup
x[c,d]
Re f

(x) < +.
For each m N let c
k
:= c + (k/m)(d c) for k = 0, 1, . . . , m. Then
([c, d)) =
m

k=1
([c
k1
, c
k
))
m

k=1
_
[c
k1
,c
k
)
(c
k
c
k1
)
2
(c
k
)
2
d()

k=1
_
d c
m
_
2
_

1
(c
k
)
2
d()
(d c)
2
R
m
.
Letting m gives ([c, d)) = 0. This implies that ((a, b)) = 0 and
therefore ((a, b)) = 0.
Now let f T(a, b). The above theorem says that f(x) on (a, b) admits
the integral representation
f(x) = + x +
_
R\(a,b)
1 + x
x
d()
= + x +
_
R\(a,b)
(
2
+ 1)
_
1
x

2
+ 1
_
d(), x (a, b),
where , and are as in the theorem. For any n N and A, B M
sa
n
with (A), (B) (a, b), if A B then (I A)
1
(I B)
1
for all
R (a, b) (see Example 4.1) and hence we have
f(A) = I + A+
_
R\(a,b)
(
2
+ 1)
_
(I A)
1

2
+ 1
I
_
d()
I + B +
_
R\(a,b)
(
2
+ 1)
_
(I B)
1

2
+ 1
I
_
d() = f(B).
Therefore, f T(a, b) is matrix monotone on (a, b). It will be shown in the
next section that f is matrix monotone on (a, b) if and only if f T(a, b).
The following are examples of integral representations for typical Pick
functions from Example 4.29.
Example 4.32 The principal branch Log z of the logarithm in Example 4.29
is in T(0, ). Its integral representation in the form (4.11) is
Log z =
_
0

_
1
z

2
+ 1
_
d, z C
+
.
166 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
To show this, it suces to verify the above expression for z = x (0, ),
that is,
log x =
_

0
_

1
+ x
+

2
+ 1
_
d, x (0, ),
which is immediate by a direct computation.
Example 4.33 If 0 < p < 1, then z
p
dened in Example 4.29 is in T(0, ).
Its integral representation in the form (4.11) is
z
p
= cos
p
2
+
sin p

_
0

_
1
z

2
+ 1
_
[[
p
d, z C
+
.
For this it suces to verify that
x
p
= cos
p
2
+
sin p

_

0
_

1
+ x
+

2
+ 1
_

p
d, x (0, ), (4.13)
which is computed as follows.
The function
z
p1
1 + z
:=
r
p1
e
i(p1)
1 + re
i
, z = re
i
, 0 < < 2,
is analytic in the cut plane C (, 0] and we integrate it along the contour
z =
_

_
re
i
( r R, = +0),
Re
i
(0 < < 2),
re
i
(R r , = 2 0),
e
i
(2 > > 0),
where 0 < < 1 < R. Apply the residue theorem and let 0 and R
to show that
_

0
t
p1
1 + t
dt =

sin p
. (4.14)
For each x > 0, substitute /x for t in (4.14) to obtain
x
p
=
sin p

_

0
x
p1
+ x
d, x (0, ).
Since
x
+ x
=
1

2
+ 1
+
_

2
+ 1

1
+ x
_
,
4.4. L

OWNERS THEOREM 167


it follows that
x
p
=
sin p

_

0

p1

2
+ 1
d+
sin p

_

0
_

2
+ 1

1
+ x
_

p
d, x (0, ).
Substitute
2
for t in (4.14) with p replaced by p/2 to obtain
_

0

p1

2
+ 1
d =

2 sin
p
2
.
Hence (4.13) follows.
4.4 L owners theorem
The main aim of this section is to prove the primary result in L owners theory
saying that a matrix monotone (i.e., operator monotone) function on (a, b)
belongs to T(a, b).
Operator monotone functions on a nite open interval (a, b) are trans-
formed into those on a symmetric interval (1, 1) via an ane function. So
it is essential to analyze matrix monotone functions on (1, 1). They are
C

-functions and f

(0) > 0 unless f is constant. We denote by / the set of


all matrix monotone functions on (1, 1) such that f(0) = 0 and f

(0) = 1.
Lemma 4.34 Let f /. Then
(1) For every [1, 1], (x + )f(x) is matrix convex on (1, 1).
(2) For every [1, 1], (1 +

x
)f(x) is matrix monotone on (1, 1).
(3) f is twice dierentiable at 0 and
f

(0)
2
= lim
x0
f(x) f

(0)x
x
2
.
Proof: (1) The proof is based on Example 4.24, but we have to change
the argument of the function. Let (0, 1). Since f(x 1 + ) is matrix
monotone on [0, 2 ), it follows that xf(x 1 +) is matrix convex on the
same interval [0, 2 ). So (x + 1 )f(x) is matrix convex on (1 + , 1).
By letting 0, (x + 1)f(x) is matrix convex on (1, 1).
We repeat the same argument with the matrix monotone function f(x)
and get the matrix convexity of (x 1)f(x). Since
(x + )f(x) =
1 +
2
(x + 1)f(x) +
1
2
(x 1)f(x),
168 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
this function is matrix convex as well.
(2) (x + )f(x) is already known to be matrix convex, which divided by
x is matrix monotone.
(3) To prove this, we use the continuous dierentiability of matrix mono-
tone functions. Then, by (2), (1 +
1
x
)f(x) as well as f(x) is C
1
on (1, 1)
so that the function h on (1, 1) dened by h(x) := f(x)/x for x ,= 0 and
h(0) := f

(0) is C
1
. This implies that
h

(x) =
f

(x)x f(x)
x
2
h

(0) as x 0.
Therefore,
f

(x)x = f(x) + h

(0)x
2
+ o([x[
2
)
so that
f

(x) = h(x) + h

(0)x + o([x[) = h(0) + 2h

(0)x + o([x[) as x 0,
which shows that f is twice dierentiable at 0 with f

(0) = 2h

(0). Hence
f

(0)
2
= h

(0) = lim
x0
h(x) h(0)
x
= lim
x0
f(x) f

(0)x
x
2
and the proof is ready.
Lemma 4.35 If f /, then
x
1 + x
f(x) for x (1, 0), f(x)
x
1 x
for x (0, 1).
and [f

(0)[ 2.
Proof: For every x (1, 1), Theorem 4.5 implies that
_
f
[1]
(x, x) f
[1]
(x, 0)
f
[1]
(x, 0) f
[1]
(0, 0)
_
=
_
f

(x) f(x)/x
f(x)/x 1
_
0,
and hence
f(x)
2
x
2
f

(x). (4.15)
By Lemma 4.34 (1),
d
dx
(x 1)f(x) = f(x) + (x 1)f

(x)
4.4. L

OWNERS THEOREM 169


is increasing on (1, 1). Since f(0) f

(0) = 1, we have
f(x) + (x 1)f

(x) 1 for 0 < x < 1, (4.16)


f(x) + (x + 1)f

(x) 1 for 1 < x < 0. (4.17)


By (4.15) and (4.16) we have
f(x) + 1
(1 x)f(x)
2
x
2
.
If f(x) >
x
1x
for some x (0, 1), then
f(x) + 1 >
(1 x)f(x)
x
2

x
1 x
=
f(x)
x
so that f(x) <
x
1x
, a contradiction. Hence f(x)
x
1x
for all x [0, 1).
A similar argument using (4.15) and (4.17) yields that f(x)
x
1+x
for all
x (1, 0].
Moreover, by Lemma 4.34 (3) and the two inequalities just proved,
f

(0)
2
lim
x0
x
1x
x
x
2
= lim
x0
1
1 x
= 1
and
f

(0)
2
lim
x0
x
1+x
x
x
2
= lim
x0
1
1 + x
= 1
so that [f

(0)[ 2.
Lemma 4.36 The set / is convex and compact if it is considered as a subset
of the topological vector space consisting of real functions on (1, 1) with the
locally convex topology of pointwise convergence.
Proof: It is obvious that / is convex. Since f(x) : f / is bounded
for each x (1, 1) thanks to Lemma 4.35, it follows that / is relatively
compact. To prove that / is closed, let f
i
be a net in / converging to a
function f on (1, 1). Then it is clear that f is matrix monotone on (1, 1)
and f(0) = 0. By Lemma 4.34 (2), (1+
1
x
)f
i
(x) is matrix monotone on (1, 1)
for every i. Since lim
x0
(1 +
1
x
)f
i
(x) = f

i
(0) = 1, we thus have
_
1
1
x
_
f
i
(x) 1
_
1 +
1
x
_
f
i
(x), x (0, 1).
Therefore,
_
1
1
x
_
f(x) 1
_
1 +
1
x
_
f(x), x (0, 1).
Since f is C
1
on (1, 1), the above inequalities yield f

(0) = 1.
170 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
Lemma 4.37 The extreme points of / have the form
f(x) =
x
1 x
, where =
f

(0)
2
.
Proof: Let f be an extreme point of /. For each (1, 1) dene
g

(x) :=
_
1 +

x
_
f(x) , x (1, 1).
By Lemma 4.34 (2), g

is matrix monotone on (1, 1). Notice that


g

(0) = f(0) + f

(0) = 0
and
g

(0) = lim
x0
(1 +

x
)f(x)
x
= f

(0) + lim
x0
f(x) f

(0)x
x
2
= 1 +
1
2
f

(0)
by Lemma 4.34 (3). Since 1 +
1
2
f

(0) > 0 by Lemma 4.35, the function


h

(x) :=
(1 +

x
)f(x)
1 +
1
2
f

(0)
is in /. Since
f =
1
2
_
1 +
1
2
f

(0)
_
h

+
1
2
_
1
1
2
f

(0)
_
h

,
the extremality of f implies that f = h

so that
_
1 +
1
2
f

(0)
_
f(x) =
_
1 +

x
_
f(x)
for all (1, 1). This immediately implies that f(x) = x/(1
1
2
f

(0)x).

Theorem 4.38 Let f be a matrix monotone function on (1, 1). Then there
exists a probability Borel measure on [1, 1] such that
f(x) = f(0) + f

(0)
_
1
1
x
1 x
d(), x (1, 1). (4.18)
Proof: The essential case is f /. Let

(x) := x/(1x) for [1, 1].


By Lemmas 4.36 and 4.37, the Krein-Milman theorem says that / is the
closed convex hull of

: [1, 1]. Hence there exists a net f


i
in the
4.4. L

OWNERS THEOREM 171


convex hull of

: [1, 1] such that f


i
(x) f(x) for all x (1, 1).
Each f
i
is written as f
i
(x) =
_
1
1

(x) d
i
() with a probability measure
i
on [1, 1] with nite support. Note that the set /
1
([1, 1]) of probability
Borel measures on [1, 1] is compact in the weak* topology when considered
as a subset of the dual Banach space of C([1, 1]). Taking a subnet we may
assume that
i
converges in the weak* topology to some /
1
([1, 1]).
For each x (1, 1), since

(x) is continuous in [1, 1], we have


f(x) = lim
i
f
i
(x) = lim
i
_
1
1

(x) d
i
() =
_
1
1

(x) d().
To prove the uniqueness of the representing measure , let
1
,
2
be prob-
ability Borel measures on [1, 1] such that
f(x) =
_
1
1

(x) d
1
() =
_
1
1

(x) d
2
(), x (1, 1).
Since

(x) =

k=0
x
k+1

k
is uniformly convergent in [1, 1] for any
x (1, 1) xed, it follows that

k=0
x
k+1
_
1
1

k
d
1
() =

k=0
x
k+1
_
1
1

k
d
2
(), x (1, 1).
Hence
_
1
1

k
d
1
() =
_
1
1

k
d
2
() for all k = 0, 1, 2, . . ., which implies that

1
=
2
.
The integral representation of the above theorem is an example of the so-
called Choquets theorem while we proved it in a direct way. The uniqueness
of the representing measure shows that

: [1, 1] is actually the


set of extreme points of /. Since the pointwise convergence topology on

: [1, 1] agrees with the usual topology on [1, 1], we see that / is
a so-called Bauer simplex.
Theorem 4.39 (L owner theorem) Let a < b and f be a real-
valued function on (a, b). Then f is matrix monotone on (a, b) if and only if
f T(a, b). Hence, a matrix monotone function is analytic.
Proof: The if part was shown after Theorem 4.31. To prove the only
if , it is enough to assume that (a, b) is a nite open interval. Moreover, when
(a, b) is a nite interval, by transforming f into a matrix monotone function
on (1, 1) via a linear function, it suces to prove the only if part when
(a, b) = (1, 1). If f is a non-constant matrix monotone function on (1, 1),
172 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
then by using the integral representation (4.18) one can dene an analytic
continuation of f by
f(z) = f(0) + f

(0)
_
1
1
z
1 z
d(), z C
+
.
Since
Imf(z) = f

(0)
_
1
1
Imz
[1 z[
2
d(),
it follows that f maps C
+
into itself. Hence f T(1, 1).
Theorem 4.40 Let f be a non-linear matrix convex function on (1, 1).
Then there exists a unique probability Borel measure on [1, 1] such that
f(x) = f(0) + f

(0)x +
f

(0)
2
_
1
1
x
2
1 x
d(), x (1, 1).
Proof: To prove this statement, we use the result due to Kraus that if f is a
matrix convex function on (a, b), then f is C
2
and f
[1]
[x, ] is matrix monotone
on (a, b) for every (a, b). Then we may assume that f(0) = f

(0) = 0
by considering f(x) f(0) f

(0)x. Since g(x) := f


[1]
[x, 0] = f(x)/x is a
non-constant matrix monotone function on (1, 1). Hence by Theorem 4.38
there exists a probability Borel measure on [1, 1] such that
g(x) = g

(0)
_
1
1
x
1 x
d(), x (1, 1).
Since g

(0) = f

(0)/2 is easily seen, we have


f(x) =
f

(0)
2
_
1
1
x
2
1 x
d(), x (1, 1).
Moreover, the uniqueness of follows from that of the representing measure
for g.
Theorem 4.41 Let f be a continuous matrix monotone functions on [0, ).
Then there exists a positive measure on (0, ) and 0 such that
f(x) = f(0) + x +
_

0
x
x +
d(), x [0, ), (4.19)
where
_

0

1 +
d() < +.
4.4. L

OWNERS THEOREM 173


Proof: Consider a function : (1, 1) (0, ) dened by
(x) :=
1 + x
1 x
= 1 +
2
1 x
,
which is matrix monotone. Let f be a continuous matrix monotone function
on R
+
. Since g(x) := f((x)) is matrix monotone on (1, 1), by Theorem
4.38 there exists a probability measure on [1, 1] such that
g(x) = g(0) + g

(0)
_
[1,1]
x
1 x
d()
for every x (1, 1). We may assume g

(0) > 0 since otherwise g and hence


f are constant functions. Since g(1) = lim
x0
g(x) = f(0) > , we have
_
[1,1]
1
1 +
d() < +
and in particular (1) = 0. Therefore,
g(x) g(1) = g

(0)
_
(1,1]
1 + x
(1 x)(1 + )
d() =
_
(1,1]
1 + x
1 x
d (),
where d () := g

(0)(1 + )
1
d(). Dene a nite measure m on (0, )
by m :=
1
. Transform the above integral expression by x =
1
(t) to
obtain
f(t) f(0) = t (1) +
_
(0,)
1 +
1
(t)
1
1
()
1
(t)
dm()
= t +
_
(0,)
t(1 + )
t +
dm(),
where := (1). With the measure d() := ((1 + )/) dm() we have
the desired integral expression of f.
Since the integrand
x
x +
= 1

x +
is a matrix monotone function of x (see Example 4.1), it is obvious that a
function on [0, ) admitting the integral expression (4.19) is matrix mono-
tone. The theorem shows that a matrix monotone function on [0, ) is matrix
concave.
Theorem 4.42 If f : R
+
R is matrix monotone, then xf(x) is matrix
convex.
174 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
Proof: Let > 0. First we check the function f(x) = (x + )
1
. Then
xf(x) =
x
+ x
= 1 +

+ x
and it is well-known that x (x + )
1
is matrix convex.
For a general matrix monotone f, we use the integral decomposition (4.19)
and the statement follows from the previous special case.
Theorem 4.43 If f : (0, ) (0, ), then the following conditions are
equivalent:
(1) f is matrix monotone;
(2) x/f(x) is matrix monotone;
(3) f is matrix concave.
Proof: For > 0 the functionf

(x) := f(x +) is dened on [0, ). If the


statement is proved for this function, then the limit 0 gives the result.
So we assume f : [0, ) (0, ).
Recall that (1) (3) was already remarked above.
The implication (3) (2) is based on Example 4.24. It says that f(x)/x
is matrix monotone. Therefore x/f(x) is matrix monotone as well.
(2) (1): Assume that x/f(x) is matrix monotone on (0, ). Let :=
lim
x0
x/f(x). Then it follows from the L owner representation that divided
by x we have
1
f(x)
=

x
+ +
_

0

+ x
d().
This multiplied with 1 is the matrix monotone 1/f(x). Therefore f(x) is
matrix monotone as well.
It was proved that the matrix monotonicity is equivalent to the positive
deniteness of the divided dierence kernel. Matrix concavity has a somewhat
similar property.
Theorem 4.44 Let f : R
+
R
+
be a smooth function. If the divided
dierence kernel function is conditionally negative denite, then f is matrix
convex.
Proof: By continuity it suces to prove that the function f(x) + is
matrix convex on R
+
for any > 0. So we may assume that f > 0. Example
4.5. SOME APPLICATIONS 175
2.42 and Theorem 4.5 give that g(x) = x
2
/f(x) is matrix monotone. Then
x/g(x) = f(x)/x is matrix monotone due to Theorem 4.43. Multiplying by
x we have a matrix convex function by Theorem 4.42.
It is not always easy to decide if a function is matrix monotone. An ecient
method is based on Theorem 4.39. The theorem says that a function R
+
R
is matrix monotone if and only if it has a holomorphic extension to the upper
half-plane C
+
such that its range is in the closure of C
+
. It is remarkable that
a matrix monotone function is very smooth and connected with functions of
a complex variable.
Example 4.45 The representation
x
t
=
sin t

_

0

t1
x
+ x
d
shows that f(x) = x
t
is matrix monotone on R
+
when 0 < t < 1. In other
words,
0 A B imply A
t
B
t
,
which is often called the L owner-Heinz inequality.
We can arrive at the same conclusion by holomorphic extension as a Pick
function (see Examples 4.29 and 4.33) so that f(x) = x
t
is matrix monotone
on R
+
for these values of the parameter but not for any other value. An-
other familiar example of a matrix monotone function is log x on (0, ), see
Examples 4.29 and 4.32.
4.5 Some applications
If the complex extension of a function f : [0, ) R is rather natural,
then it can be checked numerically that the upper half-plane is mapped into
itself, and the function is expected to be matrix monotone. For example,
x x
p
has a natural complex extension. In the following we give a few more
examples of matrix monotone functions on R
+
.
Theorem 4.46 Let
f
p
(x) :=
_
p(x 1)
x
p
1
_ 1
1p
(x > 0).
In particular, f
2
(x) = (x + 1)/2, f
1
(x) =

x and
f
1
(x) := lim
p1
f
p
(x) = e
1
x
x
x1
, f
0
(x) := lim
p0
f
p
(x) =
x 1
log x
.
176 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
Then f
p
is matrix monotone if 2 p 2.
Proof: Since the functions are continuous in the parameter p and the
matrix monotone functions are closed under the pointwise convergence, we
may prove for p [2, 2] such that p ,= 2, 1, 0, 1, 2. By L owners theorem
(Theorem 4.39) it suces to show that f
p
has a holomorphic continuation
to C
+
mapping into itself. We dene log z as log 1 = 0; then in case 2 <
p < 2, the real function p(x 1)/(x
p
1) has a holomorphic continuation
p(z 1)/(z
p
1) to C
+
since z
p
1 ,= 0 in C
+
. Moreover, it is continuous
in the closed upper half-plane Imz 0, Further, since p(z 1)/(z
p
1) ,= 0
(z ,= 1), f
p
has a holomorphic continuation (denoted by the same f
p
) to C
+
and it is also continuous in Imz 0.
Assume p (0, 1) (1, 2). For R > 0 let K
R
:= z : [z[ R, Imz 0,

R
:= z : [z[ = R, Imz > 0 and K

R
be the interior of K
R
; then the
boundary of K
R
is
R
[R, R]. Note that f
p
(K
R
) is a compact set. Recall
the well-known fact that the image of a connected open set by a holomorphic
function is a connected open set unless a single point. This yields that f
p
(K

R
)
is open and hence the boundary of f
p
(K
R
) is included in f
p
(
R
[R, R]).
Below let us prove that for any suciently small > 0, if R is suciently
large (depending on ) then
f
p
(
R
[R, R]) z : arg z + ,
which yields that f
p
(K
R
) z : arg z + . Thus, letting R
(so 0) we conclude that f
p
(C
+
) z : Imz 0.
Clearly, [0, ) is mapped into [0, ) by f
p
. If z (, 0), then arg(z
1) = and p arg(z
p
1) for 0 < p < 1 and arg(z
p
1) p for
1 < p < 2. Hence
0 arg
z 1
z
p
1
(1 p) for 0 < p < 1,
(1 p) arg
z 1
z
p
1
0 for 1 < p < 2.
Thus, since
arg
_
z 1
z
p
1
_
1
1p
=
1
1 p
arg
z 1
z
p
1
,
it follows that 0 arg f
p
(z) , so (, 0) is mapped into Imz 0.
Next, for any small > 0, if R is suciently large, then we have for
every z
R
[ arg(z 1) arg z[ , [ arg(z
p
1) p log z[
4.5. SOME APPLICATIONS 177
so that

arg
z 1
z
p
1
(1 p) arg z

2.
Since

arg
_
z 1
z
p
1
_
1
1p
arg z

1
1 p
_
arg
z 1
z
p
1
(1 p) arg z
_


2
[1 p[
,
we have f
p
(
R
) z : 2/[1 p[ arg z + 2/[1 p[. Thus, the
desired assertion follows.
The case 2 < p < 0 can be treated similarly by noting that
f
p
(x) =
_
[p[x
|p|
(x 1)
x
|p|
1
_
1
1+|p|
.

Theorem 4.47 The function


f
p
(x) =
_
x
p
+ 1
2
_1
p
is matrix monotone if and only if 1 p 1.
Proof: Observe that f
1
(x) = 2x/(x + 1) and f
1
(x) = (x + 1)/2, so f
p
could be matrix monotone only if 1 p 1. We show that it is indeed
matrix monotone. The case p = 0 is well-known. Further, note that if f
p
is
matrix monotone for 0 < p < 1 then
f
p
(x) =
_
_
x
p
+ 1
2
_1
p
_
1
is also matrix monotone since x
p
is matrix monotone decreasing for 0 < p
1.
So let us assume that 0 < p < 1. Then, since z
p
+ 1 ,= 0 in the upper half
plane, f
p
has a holomorphic continuation to the upper half plane (by dening
log z as log 1 = 0). By L owners theorem it suces to show that f
p
maps the
upper half plane into itself. If 0 < arg z < then 0 < arg(z
p
+ 1) < arg z
p
=
p arg z so that
0 < arg
_
z
p
+ 1
2
_1
p
=
1
p
arg
_
z
p
+ 1
2
_
< arg z < .
178 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
Thus z is mapped into the upper half plane.
In the special case p =
1
n
,
f
p
(x) =
_
x
1
n
+ 1
2
_
n
=
1
2
n
n

k=0
_
n
k
_
x
k
n
,
and it is well-known that x

is matrix monotone for 0 < < 1 thus f


p
is also
matrix monotone.
The matrix monotone functions in the next theorem play an important
role in some applications.
Theorem 4.48 For 1 p 2 the function
f
p
(x) = p(1 p)
(x 1)
2
(x
p
1)(x
1p
1)
. (4.20)
is matrix monotone.
Proof: The special cases p = 1, 0, 1, 2 are well-known. For 0 < p < 1 we
can use an integral representation
1
f
p
(x)
=
sin p

_

0
d
p1
_
1
0
ds
_
1
0
dt
1
x((1 t) + (1 s)) + (t + s)
and this shows that 1/f
p
is matrix monotone decreasing since so is the inte-
grand as a function of all variables. It follows that f
p
(x) is matrix monotone
for 0 < p < 1.
The proof below is based on the L owners theorem. Since f
p
= f
1p
, we
may assume p (0, 1) (1, 2). It suces to show that f
p
has a holomorphic
continuation to C
+
mapping into itself. It is clear that [0, ) is mapped into
[0, ) by f
p
. First, when 0 < p < 1, f
p
has a holomorphic continuation to
C
+
in the form
f
p
(z) = p(1 p)
(z 1)
2
(z
p
1)(z
1p
1)
.
For z (, 0), since
p arg(z
p
1) , (1 p) arg(z
1p
1)
and
arg f
p
(z) = arg(z
p
1) arg(z
1p
1),
4.5. SOME APPLICATIONS 179
we have 2 arg f
p
(z) so that Imf
p
(z) 0. Let K
R
and
R
be as
in the proof of Theorem 4.46. For every > 0, if z
R
with a suciently
large R > 0 (depending on ), then we have
[ arg(z 1)
2
2 arg z[ , [ arg(z
p
1) p arg z[ ,
[ arg(z
1p
1) (1 p) arg z[ ,
so that
[ arg f
p
(z) arg z[ = [ arg(z 1)
2
arg(z
p
1) arg(z
1p
1) arg z[ 3,
which yields that f
p
(
R
) [R, R]) z : 3 arg z 3. Thus, letting
R (so 0) we have f
p
(C
+
) z : Imz 0 as in the proof of
Theorem 4.46.
Next, when 1 < p < 2, f
p
has a holomorphic continuation to C
+
in the
form
f
p
(z) = p(p 1)
z
p1
(z 1)
2
(z
p
1)(z
p1
1)
.
For every > 0, if z
R
with a suciently large R, then
[ arg f
p
(z) (2 p) log z[ 3
as above. The assertion follows similarly.
Theorem 4.49 If f : R
+
R is a matrix monotone function and A, B 0,
then
2Af(A) + 2Bf(B)

A + B
_
f(A) + f(B)
_

A + B.
This proof is left for Exercise 8.
For a function f : (a, b) R the notion of strong subadditivity is intro-
duced via the inequality (4.3). Recall that the conditon for f is
Tr f(A) + Tr f(A
22
) Tr f(B) + Tr f(C)
for every matrix A = A

with (A) (a, b) in the form of 3 3 block-matrix


A =
_
_
A
11
A
12
A
13
A

12
A
22
A
23
A

13
A

23
A
33
_
_
and
B =
_
A
11
A
12
A

12
A
22
_
, C =
_
A
22
A
23
A

23
A
33
_
.
The next theorem tells that f(x) = log x on (0, ) is a strong subadditive
function, since log det A = Tr log A for a positive denite matrix A.
180 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
Theorem 4.50 Let
S =
_
_
S
11
S
12
S
13
S

12
S
22
S
23
S

13
S

23
S
33
_
_
be a positive denite block-matrix. Then
det S det S
22
det
_
S
11
S
12
S

12
S
22
_
det
_
S
22
S
23
S

23
S
33
_
and the condition for equality is S
13
= S
12
S
1
22
S
23
.
Proof: Take the ortho-projections
P =
_
_
I 0 0
0 0 0
0 0 0
_
_
and Q =
_
_
I 0 0
0 I 0
0 0 0
_
_
.
Since P Q, we have the matrix inequality
[P]S [P]QSQ,
which implies the determinant inequality
det [P]S det [P]QSQ.
According to the Schur determinant formula (Theorem 2.3), this is exactly
the determinant inequality of the theorem.
The equality in the determinant inequality implies [P]S = [P]QSQ which
is
S
11
[ S
12
, S
13
]
_
S
22
S
23
S
32
S
33
_
1
_
S
21
S
31
_
= S
11
S
12
S
1
22
S
21
.
This can be written as
[ S
12
, S
13
]
_
_
S
22
S
23
S
32
S
33
_
1

_
S
1
22
0
0 0
_
_
_
S
21
S
31
_
= 0 . (4.21)
For a moment, let
_
S
22
S
23
S
32
S
33
_
1
=
_
C
22
C
23
C
32
C
33
_
.
Then
_
S
22
S
23
S
32
S
33
_
1

_
S
1
22
0
0 0
_
=
_
C
23
C
1
33
C
32
C
23
C
32
C
33
_
4.5. SOME APPLICATIONS 181
=
_
C
23
C
1/2
33
C
1/2
33
_
_
C
1/2
33
C
32
C
1/2
33

.
Comparing this with (4.21) we arrive at
[ S
12
, S
13
]
_
C
23
C
1/2
33
C
1/2
33
_
= S
12
C
23
C
1/2
33
+ S
13
C
1/2
33
= 0.
Equivalently,
S
12
C
23
C
1
33
+ S
13
= 0.
Since the concrete form of C
23
and C
33
is known, we can compute that
C
23
C
1
33
= S
1
22
S
23
and this gives the condition stated in the theorem.
The next theorem gives a sucient condition for the strong subadditivity
(4.3) for functions on (0, ).
Theorem 4.51 Let f : (0, ) R be a function such that f

is matrix
monotone. Then the inequality (4.3) holds.
Proof: A matrix monotone function has the representation
a + bx +
_

0
_

2
+ 1

1
+ x
_
d(),
where b 0, see (V.49) in [20]. Therefore, we have the representation
f(t) = c
_
t
1
_
a + bx +
_

0
_

2
+ 1

1
+ x
_
d()
_
dx.
By integration we have
f(t) = d at
b
2
t
2
+
_

0
_

2
+ 1
(1 t) + log
_

+ 1
+
t
+ 1
_
_
d().
The rst quadratic part satises the strong subadditivity and we have to check
the integral. Since log x is a strongly subadditive function due to Theorem
4.50, so is the integrand. The integration keeps the property.
Example 4.52 By dierentiation we can see that f(x) = (x+t) log(x+t)
with t 0 satises the strongly subadditivity. Similarly, f(x) = x
t
satises
the strongly subadditivity if 1 t 2.
In some applications the matrix monotone functions
f
p
(x) = p(1 p)
(x 1)
2
(x
p
1)(x
1p
1)
(0 < p < 1)
182 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
appear.
For p = 1/2 this is a strongly subadditivity function. Up to a constant
factor, the function is
(

x + 1)
2
= x + 2

x + 1
and all terms are known to be strongly subadditive. The function f

1/2
is
evidently matrix monotone.
Numerical computation shows that f

p
seems to be matrix monotone, but
proof is not known.
For K, L 0 and a matrix monotone function f, there is a very particular
relation between f(K) and f(L). This is in the next theorem.
Theorem 4.53 Let f : R
+
R be a matrix monotone function. For positive
matrices K and L, let P be the projection onto the range of (K L)
+
. Then
Tr PL(f(K) f(L)) 0.
Proof: From the integral representation
f(x) =
_

0
x(1 + s)
x + s
d(s)
we have
Tr PL(f(K) f(L)) =
_

0
(1 + s)sTr PL(K + s)
1
(K L)(L + s)
1
d(s).
Hence it is sucient to prove that
Tr PL(K + s)
1
(K L)(L + s)
1
0
for s > 0. Let
0
:= K L and observe the integral representation
(K + s)
1

0
(L + s)
1
=
_
1
0
s(L + t
0
+ s)
1

0
(L + t
0
+ s)
1
dt.
So we can make another reduction:
Tr PL(L + t
0
+ s)
1
t
0
(L + t
0
+ s)
1
0
is enough to be shown. If C := L + t
0
and := t
0
, then L = C and
we have
Tr P(C )(C + s)
1
(C + s)
1
0. (4.22)
4.5. SOME APPLICATIONS 183
We write our operators in the form of 2 2 block-matrices:
V = (C + s)
1
=
_
V
1
V
2
V

2
V
3
_
, P =
_
I 0
0 0
_
, =
_

+
0
0

_
.
The left-hand side of the inequality (4.22) can then be rewritten as
Tr P(C )(V V ) = Tr [(C )(V V )]
11
= Tr [(V
1
s)(V V )]
11
= Tr [V ( + s)(V V )]
11
= Tr (
+
V
11
(
+
+ s)(V V )
11
)
= Tr (
+
(V V V )
11
s(V V )
11
). (4.23)
Because of the positivity of L, we have V
1
+ s, which implies V =
V V
1
V V ( + s)V = V V + sV
2
. As the diagonal blocks of a positive
operator are themselves positive, this further implies
V
1
(V V )
11
s(V
2
)
11
.
Inserting this in (4.23) gives
Tr [(V
1
s)(V V )]
11
= Tr (
+
(V V V )
11
s(V V )
11
)
Tr (
+
s(V
2
)
11
s(V V )
11
)
= sTr (
+
(V
2
)
11
(V V )
11
)
= sTr (
+
(V
1
V
1
+ V
2
V

2
) (V
1

+
V
1
V
2

2
))
= sTr (
+
V
2
V

2
+ V
2

2
).
This quantity is positive.
Theorem 4.54 Let A and B be positive operators, then for all 0 s 1,
2Tr A
s
B
1s
Tr (A+ B [AB[). (4.24)
Proof: For a self-adjoint operator X, X

denotes its positive and negative


parts. Decomposing A B = (AB)
+
(AB)

one gets
Tr A + Tr B Tr [AB[ = 2Tr A2Tr (AB)
+
,
and (4.24) is equivalent to
Tr ATr B
s
A
1s
Tr (AB)
+
.
184 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
From
A A+ (AB)

= B + (AB)
+
and B B+(AB)
+
as well as matrix monotonicity of the function x x
s
,
we can write
Tr ATr B
s
A
1s
= Tr (A
s
B
s
)A
1s
Tr ((B + (A B)
+
)
s
B
s
)A
1s
Tr ((B + (AB)
+
)
s
B
s
)(B + (AB)
+
)
1s
= Tr B + Tr (AB)
+
Tr B
s
(B + (A B)
+
)
1s
Tr B + Tr (AB)
+
Tr B
s
B
1s
= Tr (AB)
+
and the statement is obtained.
The following result is Liebs extension of the Golden-Thompson in-
equality.
Theorem 4.55 (Golden-Thompson-Lieb) Let A, B and C be self-adjoint
matrices. Then
Tr e
A+B+C

_

0
Tr e
A
(t + e
C
)
1
e
B
(t + e
C
)
1
dt .
Proof: Another formulation of the statement is
Tr e
A+Blog D
Tr e
A
J
1
D
(e
B
),
where
J
1
D
K =
_

0
(t + D)
1
K(t + D)
1
dt
(which is the formulation of (3.20)). We choose L = log D+A, = e
B
and
conclude from (4.4) that the functional
F : Tr e
L+log
is convex on the cone of positive denite matrices. It is also homogeneous of
order 1 so that the hypothesis of Lemma 4.56 (from below) is fullled. So
Tr e
A+Blog D
= Tr exp(L + log ) = F()

d
dx
Tr exp(L + log(D + x))

x=0
= Tr e
A
J
1
D
() = Tr e
A
J
1
D
(e
B
).
This is the statement with minus sign.
4.5. SOME APPLICATIONS 185
Lemma 4.56 Let ( be a convex cone in a vector space and F : ( R be
a convex function such that F(A) = F(A) for every > 0 and A (. If
B ( and the limit
lim
x0
F(A+ xB) F(A)
x
=:
B
F(A)
exists, then
F(B)
B
F(A) .
If the equality holds here, then F(A + xB) = (1 x)F(A) + xF(A + B) for
0 x 1.
Proof: Set a function f : [0, 1] R by f(x) := F(A+ xB). This function
is convex:
f(x
1
+ (1 )x
2
) = F((A+ x
1
B) + (1 )(A+ x
2
B))
F(A+ x
1
B) + (1 )F(A+ x
2
B))
= f(x
1
) + (1 )f(x
2
) .
The assumption is the existence of the derivative f

(0) (from the right). From


the convexity
F(A+ B) = f(1) f(0) + f

(0) = F(A) +
B
F(A).
Actually, F is subadditive:
F(A+ B) = 2F(A/2 + B/2) F(A) + F(B),
and the stated inequality follows.
If f

(0) + f(0) = f(1), then f(x) f(0) is linear. (This has also the
description that f

(x) = 0.)
When C = 0 in Theorem 4.55, then we have
Tr e
A+B
Tr e
A
e
B
which is the original Golden-Thompson inequality. If BC = CB, then
in the right-hand side, the integral
_

0
(t + e
C
)
2
dt
appears. This equals e
C
and we have Tr e
A+B+C
Tr e
A
e
B
e
C
. Without the
assumption BC = CB, this inequality is not true.
The Golden-Thompson inequality is equivalent to a kind of monotonicity
of the relative entropy, see [74]. An example of the application of the Golden-
Thompson-Lieb inequality is the strong subadditivity of the von Neumann
entropy.
186 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
4.6 Notes and remarks
About convex analysis R. Tyrell Rockafellar has a famous book: Convex
Analysis, Princeton, Princeton University Press, 1970.
Theorem 4.46 as well as the optimality of the range 2 p 2 was
proved in the paper Y. Nakamura, Classes of operator monotone functions
and Stieltjes functions, in The Gohberg Anniversary Collection, Vol. II, H.
Dym et al. (eds.), Oper. Theory Adv. Appl., Vol. 41, Birkhauser, 1989, pp.
395404. The proof here is a modication of that in the paper

Ad am Besenyei
and Denes Petz, Completely positive mappings and mean matrices, Linear
Algebra Appl. 435 (2011), 984997. Theorem 4.47 was given in the paper
Fumio Hiai and Hideki Kosaki, Means for matrices and comparison of their
norms, Indiana Univ. Math. J. 48 (1999), 899936.
The matrix monotonicity of the function (4.20) for 0 < p < 1 was recog-
nized in [75]. A proof for p [1, 2] is a modication of that in the paper V.
E. Sandor Szabo, A class of matrix monotone functions, Linear Algebra Appl.
420(2007), 7985. Related discussions are in [17], and there is an extension
to
(x a)(x b)
(f(x) f(a))(x/f(x) b/f(b))
in the paper M. Kawasaki and M. Nagisa, Transforms on operator monotone
functions, arXiv:1206.5452. (a = b = 1 and f(x) = x
p
covers (4.20).) A
shorter proof of this is in the paper F. Hansen, WYD-like skew information
measures, J. Stat. Phys. 151(2013), 974979.
The original result of Karl L owner is from 1934 (and he changed his name
to Charles Loewner when he emigrated to the US). Apart from L owners
original proof, three dierent proofs, for example by Bendat and Sherman
based on the Hamburger moment problem, by Koranyi based on the spectral
theorem of self-adjoint operators, and by Hansen and Pedersen based on
the Krein-Milman theorem. In all of them, the integral representation of
operator monotone functions was obtained to prove L owners theorem. The
proof presented here is based on [40].
The integral representation (4.6) was obtained by Julius Bendat and Sey-
mur Sherman [16]. Theorem 4.22 is from the famous paper of Frank Hansen
and Gert G. Pedersen [40], and Theorem 4.16 is from [41] of the same authors.
Theorems 4.26 and 4.28 are from the paper of J. S. Aujla and F. C. Silva
[15], that also contains Theorem 4.16 in a stronger form of majorization (see
Theorem 6.27 in Chapter 6).
Theorem 4.51 is from the paper [14]. It is an interesting question if the
opposite statement is true.
4.7. EXERCISES 187
Theorem 4.54 is in the paper K. M. R. Audenaert et al., Discriminating
states: the quantum Cherno bound, Phys. Rev. Lett. 98(2007), 160501.
The quantum information application is contained in the same paper and
also in the book [74]. The present proof is due to Narutaka Ozawa, which
is contained in the book V. Jaksic, Y. Ogata, Y. Pautrat and C.-A. Pillet,
Entropic Fluctuations in Quantum Statistical Mechanics. An Introduction,
arXiv:1106.3786. Theorem 4.53 from [74] is an extension of Theorem 4.54
and its proof is similar to that of Audenaert et al.
4.7 Exercises
1. Prove that the function : R
+
R, (x) = xlog x+(x+1) log(x+1)
is matrix monotone.
2. Give an example that f(x) = x
2
is not matrix monotone on any positive
interval.
3. Show that f(x) = e
x
is not matrix monotone on [0, ).
4. Show that if f : R
+
R is a matrix monotone function, then f is a
completely monotone function.
5. Let f be a dierentiable function on the interval (a, b) such that for
some a < c < b the function f is matrix monotone for 22 matrices on
the intervals (a, c] and [c, b). Show that f is matrix monotone for 2 2
matrices on (a,b).
6. Show that the function
f(x) =
ax + b
cx + d
(a, b, c, d R, ad > bc)
is matrix monotone on any interval which does not contain d/c.
7. Use the matrices
A =
_
1 1
1 1
_
and B =
_
2 1
1 1
_
to show that f(x) =

x
2
+ 1 is not a matrix monotone function on R
+
.
8. Let f : R
+
R be a matrix monotone function. Prove the inequality
Af(A) + Bf(B)
1
2
(A+ B)
1/2
(f(A) + f(B))(A+ B)
1/2
188 CHAPTER 4. MONOTONE FUNCTIONS AND CONVEXITY
for positive matrices A and B. (Hint: Use that f is matrix concave and
xf(x) is matrix convex.)
9. Show that the canonical representing measure in (5.24) for the standard
matrix monotone function f(x) = (x 1)/ log x is the measure
d() =
2
(1 + )
2
d .
10. The function
log

(x) =
x
1
1
1
(x > 0, > 0, ,= 1)
is called -logaritmic function. Is it matrix monotone?
11. Give an example of a matrix convex function such that the derivative
is not matrix monotone.
12. Show that f(z) = tanz := sin z/ cos z is in T, where cos z := (e
iz
+
e
iz
)/2 and sin z := (e
iz
e
iz
)/2i.
13. Show that f(z) = 1/z is in T.
14. Show that the extreme points of the set
o
n
:= D M
sa
n
: D 0 and Tr D = 1
are the orthogonal projections of trace 1. Show that for n > 2 not all
points in the boundary are extreme.
15. Let the block-matrix
M =
_
A B
B

C
_
be positive and f : R
+
R be a convex function. Show that
Tr f(M) Tr f(A) + Tr f(C).
16. Show that for A, B M
sa
n
the inequality
log Tr e
A+B
log Tr e
A
+
Tr Be
A
Tr e
A
holds. (Hint: Use the function (4.2).)
4.7. EXERCISES 189
17. Let the block-matrix
M =
_
A B
B

C
_
be positive and invertible. Show that
det M det A det C.
18. Show that for A, B M
sa
n
the inequality
[ log Tr e
A+B
log Tr e
A
[ |B|
holds. (Hint: Use the function (4.2).)
19. Is it true that the function

(x) =
x

x
1
(x > 0)
is matrix concave if (0, 2)?
Chapter 5
Matrix means and inequalities
The means of numbers is a popular subject. The inequality
2ab
a + b

ab
a + b
2
is well-known for the harmonic, geometric and arithmetic means of positive
numbers. If we move from 1 1 matrices to n n matrices, then arith-
metic mean does not require any theory. Historically the harmonic mean was
the rst essential subject for matrix means, from the point of view of some
applications the name parallel sum was popular.
Carl Friedrich Gauss worked about an iteration in the period 1791 until
1828:
a
0
:= a, b
0
:= b,
a
n+1
:=
a
n
+ b
n
2
, b
n+1
:=
_
a
n
b
n
,
then the (joint) limit is called the Gauss arithmetic-geometric mean AG(a, b)
today. It has a non-trivial characterization:
1
AG(a, b)
=
2

_

0
dt
_
(a
2
+ t
2
)(b
2
+ t
2
)
.
In this chapter, rst the geometric mean will be generalized to positive
matrices and several other means will be studied in terms of matrix (i.e.,
operator) monotone functions. There is also a natural (limit) denition for
the mean of several matrices, but its explicit description is rather hopeless.
190
5.1. THE GEOMETRIC MEAN 191
5.1 The geometric mean
The geometric mean will be introduced by a motivation including a Rieman-
nian manifold.
The positive denite matrices can be considered as the variance of multi-
variate normal distributions and the information geometry of Gaussians yields
a natural Riemannian metric. Those distributions (with 0 expectation) are
given by a positive denite matrix A M
n
in the form
f
A
(x) :=
1
(2)
n
det A
exp ( A
1
x, x/2) (x C
n
).
The set T
n
of positive denite n n matrices can be considered as an open
subset of the Euclidean space R
n
2
and they form a manifold. The tangent
vectors at a footpoint A T
n
are the self-adjoint matrices M
sa
n
.
A standard way to construct an information geometry is to start with an
information potential function and to introduce the Riemannian metric
by the Hessian of the potential. The information potential is the Boltzmann
entropy
S(f
A
) :=
_
f
A
(x) log f
A
(x) dx = C + Tr log A (C is a constant).
The Hessian is

2
st
S(f
A+tH
1
+sH
2
)

t=s=0
= Tr A
1
H
1
A
1
H
2
and the inner product on the tangent space at A is
g
A
(H
1
, H
2
) = Tr A
1
H
1
A
1
H
2
.
We note here that this geometry has many symmetries, each congruence
transformation of the matrices becomes a symmetry. Namely for any invert-
ible matrix S,
g
SAS
(SH
1
S

, SH
2
S

) = g
A
(H
1
, H
2
). (5.1)
A C
1
dierentiable function : [0, 1] T
n
is called a curve, its tangent
vector at t is

(t) and the length of the curve is


_
1
0
_
g
(t)
(

(t),

(t)) dt.
Given A, B T
n
the curve
(t) = A
1/2
(A
1/2
BA
1/2
)
t
A
1/2
(0 t 1) (5.2)
192 CHAPTER 5. MATRIX MEANS AND INEQUALITIES
connects these two points: (0) = A, (1) = B. The next lemma says
that hhis is the shortest curve connecting the two points, that is called the
geodesic.
Lemma 5.1 The geodesic connecting A, B T
n
is (5.2) and the geodesic
distance is
(A, B) = | log(A
1/2
BA
1/2
)|
2
,
where | |
2
stands for the HilbertSchmidt norm.
Proof: Due to the property (5.1) we may assume that A = I, then (t) =
B
t
. Let (t) be a curve in M
sa
n
such that (0) = (1) = 0. This will be used
for the perturbation of the curve (t) in the form (t) + (t).
We want to dierentiate the length
_
1
0
_
g
(t)+(t)
(

(t) +

(t),

(t) +

(t)) dt
with respect to at = 0. When (t) = B
t
(0 t 1), note that
g
(t)
(

(t),

(t)) = Tr B
t
B
t
(log B)B
t
B
t
log B = Tr (log B)
2
does not depend on t. The derivative of the above integral at = 0 is
_
1
0
1
2
_
g
(t)
(

(t),

(t)
_
1/2

g
(t)+(t)
(

(t) +

(t),

(t) +

(t))

=0
dt
=
1
2
_
Tr (log B)
2

_
1
0

Tr
(B
t
+ (t))
1
(B
t
log B +

(t))(B
t
+ (t))
1
(B
t
log B +

(t))

=0
dt
=
1
_
Tr (log B)
2
_
1
0
Tr (B
t
(log B)
2
(t) + B
t
(log B)

(t)) dt.
To remove

(t), we integrate by part the second term:


_
1
0
Tr B
t
(log B)

(t) dt =
_
Tr B
t
(log B)(t)
_
1
0
+
_
1
0
Tr B
t
(log B)
2
(t) dt .
Since (0) = (1) = 0, the rst term vanishes here and the derivative at = 0
is 0 for every perturbation (t). Thus we can conclude that (t) = B
t
is the
geodesic curve between I and B. The distance is
_
1
0
_
Tr (log B)
2
dt =
_
Tr (log B)
2
= | log B|
2
.
5.1. THE GEOMETRIC MEAN 193
The lemma is proved.
The midpoint of the curve (5.2) will be called the geometric mean of
A, B T
n
and denoted by A#B, that is,
A#B := A
1/2
(A
1/2
BA
1/2
)
1/2
A
1/2
. (5.3)
The motivation is the fact that in case of AB = BA the midpoint is

AB.
This geodesic approach will give an idea for the geometric mean of three
matrices as well.
Let A, B 0 and assume that A is invertible. We want to study the
positivity of the matrix
_
A X
X B
_
. (5.4)
for a positive X. The positivity of the block-matrix implies
B XA
1
X,
see Theorem 2.1. From the matrix monotonicity of the square root function
(Example 3.26), we obtain (A
1/2
BA
1/2
)
1/2
A
1/2
XA
1/2
, or
A
1/2
(A
1/2
BA
1/2
)
1/2
A
1/2
X.
It is easy to see that for X = A#B, the block matrix (5.4) is positive.
Therefore, A#B is the largest positive matrix X such that (5.4) is positive,
that is,
A#B = max
_
X 0 :
_
A X
X B
_
0
_
. (5.5)
The denition (5.3) is for invertible A. For a non-invertible A, an equiva-
lent possibility is
A#B := lim
0
(A+ I)#B.
(The characterization with (5.4) remains true in this general case.) If AB =
BA, then A#B = A
1/2
B
1/2
(= (AB)
1/2
). The inequality between geometric
and arithmetic means holds also for matrices, see Exercise 1.
Example 5.2 The partial ordering of operators has a geometric interpre-
tation for projections. The relation P Q is equivalent to ran P ran Q,
that is, P projects to a smaller subspace than Q. This implies that any two
projections P and Q have the largest lower bound denoted by P Q. This
operator is the orthogonal projection to the (closed) subspace ranP ran Q.
194 CHAPTER 5. MATRIX MEANS AND INEQUALITIES
We want to show that P#Q = P Q. First we show that the block-matrix
_
P P Q
P Q Q
_
is positive. This is equivalent to the relation
_
P + P

P Q
P Q Q
_
0 (5.6)
for every constant > 0. Since
(P Q)(P + P

)
1
(P Q) = P Q
is smaller than Q, the positivity (5.6) is true due to Theorem 2.1. We conclude
that P#Q P Q.
The positivity of
_
P + P

X
X Q
_
gives the condition
Q X(P +
1
P

)X = XPX +
1
XP

X.
Since > 0 is arbitrary, XP

X = 0. The latter condition gives X = XP.


Therefore, Q X
2
. Symmetrically, P X
2
and Corollary 2.25 tells us that
P Q X
2
and so P Q X.
Theorem 5.3 Assume that A
1
, A
2
, B
1
, B
2
are positive matrices and A
1

B
1
, A
2
B
2
. Then A
1
#A
2
B
1
#B
2
.
Proof: The statement is equivalent to the positivity of the block-matrix
_
B
1
A
1
#A
2
A
1
#A
2
B
2
_
.
This is a sum of positive matrices:
_
A
1
A
1
#A
2
A
1
#A
2
A
2
_
+
_
B
1
A
1
0
0 B
2
A
2
_
.
The proof is complete.
The next theorem is the L owner-Heinz inequality already given in Sec-
tion 4.4. The present proof is based on the geometric mean.
5.1. THE GEOMETRIC MEAN 195
Theorem 5.4 Assume that for the matrices A and B the inequalities 0
A B hold and 0 < t < 1 is a real number. Then A
t
B
t
.
Proof: Due to the continuity, it is enough to show the case t = k/2
n
, that
is, t is a dyadic rational number. We use Theorem 5.3 to deduce from the
inequalities A B and I I the inequality
A
1/2
= A#I B#I = B
1/2
.
The second application of Theorem 5.3 gives similarly A
1/4
B
1/4
and A
3/4

B
3/4
. The procedure can be continued to cover all dyadic rational numbers.
Arbitrary t [0, 1] can be the limit of dyadic numbers.
Theorem 5.5 The geometric mean of matrices is jointly concave, that is,
A
1
+ A
2
2
#
A
3
+ A
4
2

A
1
#A
3
+ A
2
#A
4
2
.
Proof: The block-matrices
_
A
1
A
1
#A
2
A
1
#A
2
A
2
_
and
_
A
3
A
3
#A
4
A
4
#A
3
A
4
_
are positive and so is the arithmetic mean,
_
1
2
(A
1
+ A
3
)
1
2
(A
1
#A
2
+ A
3
#A
4
)
1
2
(A
1
#A
2
+ A
3
#A
4
)
1
2
(A
2
+ A
4
)
_
.
Therefore the o-diagonal entry is smaller than the geometric mean of the
diagonal entries.
Note that the joint concavity property is equivalent to the slightly simpler
formula
(A
1
+ A
2
)#(A
3
+ A
4
) (A
1
#A
3
) + (A
2
#A
4
).
Later this inequality will be used.
The next theorem of Ando [7] is a generalization of Example 5.2. For the
sake of simplicity the formulation is in block-matrices.
Theorem 5.6 Take an ortho-projection P and a positive invertible matrix
R:
P =
_
I 0
0 0
_
, R =
_
R
11
R
12
R
21
R
22
_
.
The geometric mean is the following:
P#R = (PR
1
P)
1/2
=
_
(R
11
R
12
R
1
22
R
21
)
1/2
0
0 0
_
.
196 CHAPTER 5. MATRIX MEANS AND INEQUALITIES
Proof: We have already P and R in block-matrix form. Due to (5.5) we
are looking for positive matrices
X =
_
X
11
X
12
X
21
X
22
_
such that
_
P X
X R
_
=
_

_
I 0 X
11
X
12
0 0 X
21
X
22
X
11
X
12
R
11
R
12
X
21
X
22
R
21
R
22
_

_
is positive. From the positivity X
12
= X
21
= X
22
= 0 follows and the
necessary and sucient condition is
_
I 0
0 0
_

_
X
11
0
0 0
_
R
1
_
X
11
0
0 0
_
,
or
I X
11
(R
1
)
11
X
11
.
The latter is equivalent to I ((R
1
)
11
)
1/2
X
2
11
((R
1
)
11
)
1/2
, which implies
that
X
11
((R
1
)
11
)
1/2
.
The inverse of a block-matrix is described in (2.4) and the proof is complete.

For projections P and Q, the theorem and Example 5.2 give


P#Q = P Q = lim
+0
(P(Q+ I)
1
P)
1/2
.
The arithmetic mean of several matrices is simpler: for (positive) matrices
A
1
, A
2
, . . . , A
n
it is
A(A
1
, A
2
, . . . , A
n
) :=
A
1
+ A
2
+ + A
n
n
.
Only the linear structure plays a role. The arithmetic mean is a good example
to show how to move from the means of two variables to three variables.
Suppose we have a device which can compute the mean of two matrices.
How to compute the mean of three? Assume that we aim to obtain the mean
of A, B and C. For the case of arithmetic mean, we can make a new device
W : (A, B, C) (A(A, B), A(A, C), A(B, C)),
5.1. THE GEOMETRIC MEAN 197
which, applied to (A, B, C) many times, gives the mean of A, B and C:
W
n
(A, B, C) = (A
n
, B
n
, C
n
)
and
A
n
, B
n
, C
n
A(A, B, C) as n .
Indeed, A
n
, B
n
, C
n
are convex combinations of A, B and C, so
A
n
=
(n)
1
A+
(n)
2
B +
(n)
3
C.
One can compute the coecients
(n)
i
explicitly and show that
(n)
i
1/3.
The same holds also for B
n
and C
n
. The idea is shown by a picture and will
be extended to the geometric mean.
A
B
1
C
B

@
@
@
@
@
@
@
@
@
@
A
1
C
1
B
2
@
@
@
@
@
@
@
@
@
@
@
@
@
@
@
@
@
@
@
@

A
2
C
2

@
@
@
@
@
r

*
Figure 5.1: The triangles of A
n
, B
n
, C
n
.
Theorem 5.7 Let A, B, C M
n
be positive denite matrices and set a re-
cursion as
A
0
= A, B
0
= B, C
0
= C,
A
m+1
= A
m
#B
m
, B
m+1
= A
m
#C
m
, C
m+1
= B
m
#C
m
.
Then the limits
G
3
(A, B, C) := lim
m
A
m
= lim
m
B
m
= lim
m
C
m
(5.7)
exist.
198 CHAPTER 5. MATRIX MEANS AND INEQUALITIES
Proof: First we assume that A B C. From the monotonicity property
of the geometric mean, see Theorem 5.3, we obtain that A
m
B
m
C
m
. It
follows that the sequence (A
m
) is increasing and (C
m
) is decreasing. There-
fore, the limits
L := lim
m
A
m
and U = lim
m
C
m
exist, and L U. We claim that L = U.
By continuity, B
m
L#U =: M, where L M U. Since
B
m
#C
m
= C
m+1
,
the limit m gives M#U = U. Therefore M = U and so U = L.
The general case can be reduced to the case of ordered triplet. If A, B, C
are arbitrary, we can nd numbers and such that A B C and use
the formula
(X)#(Y ) =
_
(X#Y ) (5.8)
for positive numbers and .
Let
A

1
= A, B

1
= B, C

1
= C,
and
A

m+1
= A

m
#B

m
, B

m+1
= A

m
#C

m
, C

m+1
= B

m
#C

m
.
It is clear that for the numbers
a := 1, b := and c :=
the recursion provides a convergent sequence (a
m
, b
m
, c
m
) of triplets:
()
1/3
= lim
m
a
m
= lim
m
b
m
= lim
m
c
m
.
Since
A
m
= A

m
/a
m
, B
m
= B

m
/b
m
and C
m
= C

m
/c
m
due to property (5.8) of the geometric mean, the limits stated in the theorem
must exist and equal G(A

, B

, C

)/()
1/3
.
The geometric mean of the positive denite matrices A, B, C M
n
is
dened as G
3
(A, B, C) in (5.7). Explicit formula is not known and the same
kind of procedure can be used to make denition of the geometric mean of k
matrices. If P
1
, P
2
, . . . , P
k
are ortho-projections, then Example 5.2 gives the
limit
G
k
(P
1
, P
2
, . . . , P
k
) = P
1
P
2
P
k
.
5.2. GENERAL THEORY 199
5.2 General theory
The rst example is the parallel sum which is a constant multiple of the
harmonic mean.
Example 5.8 It is well-known in electricity that if two resistors with re-
sistance a and b are connected parallelly, then the total resistance q is the
solution of the equation
1
q
=
1
a
+
1
b
.
Then
q = (a
1
+ b
1
)
1
=
ab
a + b
is the harmonic mean up to a factor 2. More generally, one can consider
n-point network, where the voltage and current vectors are connected by a
positive matrix. The parallel sum
A : B = (A
1
+ B
1
)
1
of two positive denite matrices represents the combined resistance of two
n-port networks connected in parallel.
One can check that
A : B = AA(A+ B)
1
A.
Therefore A : B is the Schur complement of A+ B in the block-matrix
_
A A
A A+ B
_
,
see Theorem 2.4.
It is easy to see that for 0 < A C and 0 < B D, then A : B C : D.
The parallel sum can be extended to all positive matrices:
A : B = lim
0
(A+ I) : (B + I) .
Note that all matrix means can be expressed as an integral of parallel sums
(see Theorem 5.11 below).
On the basis of the previous example, the harmonic mean of the positive
matrices A and B is dened as
H(A, B) := 2(A : B)
200 CHAPTER 5. MATRIX MEANS AND INEQUALITIES
Upper part: An n-point network with the input and output voltage vectors.
Below: Two parallelly connected networks
Assume that for all positive matrices A, B (of the same size) the matrix
A B is dened. Then is called an operator connection if it satises
the following conditions:
(i) 0 A C and 0 B D imply
A B C D (joint monotonicity),
(ii) if 0 A, B and C = C

, then
C(A B)C (CAC) (CBC) (transformer inequality), (5.9)
(iii) if 0 A
n
, B
n
and A
n
A, B
n
B then
(A
n
B
n
) (A B) (upper semi-continuity).
5.2. GENERAL THEORY 201
The parallel sum is an example of operator connections.
Lemma 5.9 Assume that is an operator connection. If C = C

is invert-
ible, then
C(A B)C = (CAC) (CBC), (5.10)
and for every 0
(A B) = (A) (B) (positive homogeneity) (5.11)
holds.
Proof: In the inequality (5.9) A and B are replaced by C
1
AC
1
and
C
1
BC
1
, respectively:
A B C(C
1
AC
1
C
1
BC
1
)C.
Replacing C with C
1
, we have
C(A B)C CAC CBC.
This and (5.9) give equality.
When > 0, letting C :=
1/2
I in (5.10) implies (5.11). When = 0, let
0 <
n
0. Then (
n
I) (
n
I) 0 0 by (iii) above while (
n
I) (
n
I) =

n
(I I) 0. Hence 0 = 0 0, which is (5.11) for = 0.
The next fundamental theorem of Kubo and Ando says that there is a one-
to-one correspondence between operator connections and matrix monotone
functions on [0, ).
Theorem 5.10 (Kubo-Ando theorem) For each operator connection
there exists a unique matrix monotone function f : R
+
R
+
such that
f(t)I = I (tI) (t R
+
) (5.12)
and for A > 0 and B 0 the formula
A B = A
1/2
f(A
1/2
BA
1/2
)A
1/2
= f(BA
1
)A (5.13)
holds, where the last term is dened via analytic functional calculus.
Proof: Let be an operator connection. First we show that if an ortho-
projection P commutes with A and B, then P commutes A B and
((AP) (BP)) P = (A B)P. (5.14)
202 CHAPTER 5. MATRIX MEANS AND INEQUALITIES
Since PAP = AP A and PBP = BP B, it follows from (ii) and (i)
of the denition of that
P(A B)P (PAP) (PBP) = (AP) (BP) A B. (5.15)
Hence (A B P(A B)P)
1/2
exists so that

(A B P(A B)P)
1/2
P

2
= P(A B P(A B)P)P = 0.
Therefore, (A B P(A B)P)
1/2
P = 0 and so (A B)P = P(A B)P.
This implies that P commutes with A B. Similarly, P commutes with
(AP) (BP) as well, and (5.14) follows from (5.15). For every t 0, since
I (tI) commute with all ortho-projections, it is a scalar multiple of I. Thus,
we see that there is a function f 0 on [0, ) satisfying (5.12). The unique-
ness of such function f is obvious, and it follows from (iii) of the deni-
tion of the operator connection that f is right-continuous for t 0. Since
t
1
f(t)I = (t
1
I) I for t > 0 thanks to (5.11), it follows from (iii) of the
denition again that t
1
f(t) is left-continuous for t > 0 and so is f(t). Hence
f is continuous on [0, ).
To show the operator monotonicity of f, let us prove that
f(A) = I A. (5.16)
Let A =

m
i=1

i
P
i
, where
i
> 0 and P
i
are projections with

m
i=1
P
i
= I.
Since each P
i
commute with A, using (5.14) twice we have
I A =
m

i=1
(I A)P
i
=
m

i=1
(P
i
(AP
i
))P
i
=
m

i=1
(P
i
(
i
P
i
))P
i
=
m

i=1
(I (
i
I))P
i
=
m

i=1
f(
i
)P
i
= f(A).
For general A 0 choose a sequence 0 < A
n
of the above form such that
A
n
A. By the upper semi-continuity we have
I A = lim
n
I A
n
= lim
n
f(A
n
) = f(A).
So (5.16) is shown. Hence, if 0 A B, then
f(A) = I A I B = f(B)
and we conclude that f is matrix monotone.
5.2. GENERAL THEORY 203
When A is invertible, we can use (5.10):
A B = A
1/2
(I A
1/2
BA
1/2
)A
1/2
= A
1/2
f(A
1/2
BA
1/2
)A
1/2
and the rst part of (5.13) is obtained. The rest is a general property.
Note that the general formula is
A B = lim
0
A

= lim
0
A
1/2

f(A
1/2

A
1/2

)A
1/2

,
where A

:= A+I and B

:= B+I. We call f the representing function


of . For scalars s, t > 0 we have s t = sf(t/s).
The next theorem comes from the integral representation of matrix mono-
tone functions and from the previous theorem.
Theorem 5.11 Every operator connection has an integral representation
A B = aA+ bB +
_
(0,)
1 +

_
(A) : B
_
d() (A, B 0),
where is a positive nite Borel measure on [0, ).
Due to this integral expression, one can often derive properties of general
operator connections by checking them for parallel sum.
Lemma 5.12 For every vector z,
infx, Ax +y, By : x + y = z = z, (A : B)z .
Proof: When A, B are invertible, we have
A : B =
_
B
1
(A+B)A
1
_
1
=
_
(A+B)B
_
(A+B)
1
B = BB(A+B)
1
B.
For all vectors x, y we have
x, Ax +z x, B(z x) z, (A : B)z
= z, Bz +x, (A+ B)x 2Re x, Bz z, (A : B)z
= z, B(A + B)
1
Bz +x, (A+ B)x 2Re x, Bz
= |(A+ B)
1/2
Bz|
2
+|(A+ B)
1/2
x|
2
2Re (A+ B)
1/2
x, (A + B)
1/2
Bz 0.
204 CHAPTER 5. MATRIX MEANS AND INEQUALITIES
In particular, the above is equal to 0 if x = (A+B)
1
Bz. Hence the assertion
is shown when A, B > 0. For general A, B,
z, (A : B)z = inf
>0
z,
_
(A+ I) : (B + I)
_
z
= inf
>0
inf
y
_
x, (A+ I)x +z x, (B + I)(z x)
_
= inf
y
_
x, Ax +z x, B(z x)
_
.
The proof is complete.
The next result is called the transformer inequality, it is a stronger
version of (5.9).
Theorem 5.13 For every A, B 0 and general S,
S

(A B)S (S

AS) (S

BS)
and equality holds if S is invertible.
Proof: For z = x + y Lemma 5.12 implies
z, S

(A : B)Sz = Sz, (A : B)Sz Sx, ASx +Sy, BSy


= x, S

ASx +y, S

BSy.
Hence S

(A : B)S (S

AS) : (S

BS) follows. The statement of the theorem


is true for the parallel sum and by Theorem 5.11 we obtain for any operator
connection. The proof of the last assertion is similar to that of Lemma 5.9.

A very similar argument gives the joint concavity:


(A B) + (C D) (A+ C) (B + D) .
The next theorem is about a recursively dened double sequence.
Theorem 5.14 Let
1
and
2
be operator connections dominated by the
arithmetic mean. For positive matrices A and B set a recursion
A
1
= A, B
1
= B, A
k+1
= A
k

1
B
k
, B
k+1
= A
k

2
B
k
. (5.17)
Then (A
k
) and (B
k
) converge to the same operator connection A B.
5.2. GENERAL THEORY 205
Proof: First we prove the convergence of (A
k
) and (B
k
). From the inequal-
ity
X
i
Y
X + Y
2
we have
A
k+1
+ B
k+1
= A
k

1
B
k
+ A
k

2
B
k
A
k
+ B
k
.
Therefore the decreasing positive sequence has a limit:
A
k
+ B
k
X as k .
Moreover,
a
k+1
:= |A
k+1
|
2
2
+|B
k+1
|
2
2
|A
k
|
2
2
+|B
k
|
2
2

1
2
|A
k
B
k
|
2
2
,
where |X|
2
= (Tr X

X)
1/2
, the Hilbert-Schmidt norm. The numerical se-
quence a
k
is decreasing, it has a limit and it follows that
|A
k
B
k
|
2
2
0
and A
k
, B
k
X/2 as k .
For each k, A
k
and B
k
are operator connections of the matrices A and B,
and the limit is an operator connection as well.
Example 5.15 At the end of the 18th century J.-L. Lagrange and C.F.
Gauss became interested in the arithmetic-geometric mean of positive num-
bers. Gauss worked on this subject in the period 1791 until 1828.
With the initial conditions
a
1
= a, b
1
= b
set the recursion
a
n+1
=
a
n
+ b
n
2
, b
n+1
=
_
a
n
b
n
.
Then the (joint) limit is the so-called Gauss arithmetic-geometric mean
AG(a, b) with the characterization
1
AG(a, b)
=
2

_

0
dt
_
(a
2
+ t
2
)(b
2
+ t
2
)
,
see [34]. It follows from Theorem 5.14 that the Gauss arithmetic-geometric
mean can be dened also for matrices. Therefore the function f(x) = AG(1, x)
is a matrix monotone function.
206 CHAPTER 5. MATRIX MEANS AND INEQUALITIES
It is an interesting remark, that (5.17) can have a small modication:
A
1
= A, B
1
= B, A
k+1
= A
k

1
B
k
, B
k+1
= A
k+1

2
B
k
. (5.18)
A similar proof gives the existence of the limit. (5.17) is called Gaussian
double-mean process, while (5.18) is Archimedean double-mean pro-
cess.
The symmetric matrix means are binary operations on positive matrices.
They are operator connections with the properties A A = A and A B =
B A. For matrix means we shall use the notation m(A, B). We repeat the
main properties:
(1) m(A, A) = A for every A,
(2) m(A, B) = m(B, A) for every A and B,
(3) if A B, then A m(A, B) B,
(4) if A A

and B B

, then m(A, B) m(A

, B

),
(5) m is upper semi-continuous,
(6) C m(A, B) C

m(CAC

, CBC

).
It follows from the Kubo-Ando theorem (Theorem 5.10) that the operator
means are in a one-to-one correspondence with matrix monotone functions
R
+
R
+
satisfying conditions f(1) = 1 and tf(t
1
) = f(t). Such a matrix
monotone function on R
+
is called standard. Given a matrix monotone
function f, the corresponding mean is
m
f
(A, B) = A
1/2
f(A
1/2
BA
1/2
)A
1/2
(5.19)
when A is invertible. (When A is not invertible, take a sequence A
n
of
invertible operators approximating A such that A
n
A and let m
f
(A, B) =
lim
n
m
f
(A
n
, B).) It follows from the denition (5.19) of means that if f g,
then m
f
(A, B) m
g
(A, B).
Theorem 5.16 If f : R
+
R
+
is a standard matrix monotone function,
then
2x
x + 1
f(x)
x + 1
2
.
Proof: From the dierentiation of the formula f(x) = xf(x
1
), we obtain
f

(1) = 1/2. Since f(1) = 1, the concavity of the function f gives f(x)
(1 + x)/2.
5.2. GENERAL THEORY 207
If f is a standard matrix monotone function, then so is f(x
1
)
1
. The
inequality f(x
1
)
1
(1 + x)/2 gives f(x) 2x/(x + 1).
If f(x) is a standard matrix monotone function with the matrix mean
m( , ), then the matrix mean corresponding to x/f(x) is called the dual of
m( , ) and denoted by m

( , ). For instance, the dual of the arithmetic


mean is the harmonic mean and #

= #,
The next theorem is a Trotter-like product formula for matrix means.
Theorem 5.17 For a symmetric matrix mean m and for self-adjoint A, B
we have
lim
n
m(e
A/n
, e
B/n
)
n
= exp
A + B
2
.
Proof: It is an exercise to prove that
lim
t0
m(e
tA
, e
tB
) I
t
=
A+ B
2
.
The choice t = 1/n gives
exp
_
n(I m(e
A/n
, e
B/n
))
_
exp
A+ B
2
.
So it is enough to show that
D
n
:= m(e
A/n
, e
B/n
)
n
exp
_
n(I m(e
A/n
, e
B/n
))
_
0
as n . If A is replaced by A + aI and B is replaced by B + aI with a
real number a, then D
n
does not change. Therefore we can assume A, B 0.
We use the abbreviation F(n) := m(e
A/n
, e
B/n
), so
D
n
= F(n)
n
exp (n(I F(n))) = F(n)
n
e
n

k=0
n
k
k!
F(n)
k
= e
n

k=0
n
k
k!
F(n)
n
e
n

k=0
n
k
k!
F(n)
k
= e
n

k=0
n
k
k!
_
F(n)
n
F(n)
k
_
.
Since F(n) I, we have
|D
n
| e
n

k=0
n
k
k!
|F(n)
n
F(n)
k
| e
n

k=0
n
k
k!
|I F(n)
|kn|
|.
Since
0 I F(n)
|kn|
[k n[(I F(n)),
208 CHAPTER 5. MATRIX MEANS AND INEQUALITIES
it follows that
|D
n
| e
n
|I F(n)|

k=0
n
k
k!
[k n[.
The Schwarz inequality gives that

k=0
n
k
k!
[k n[
_

k=0
n
k
k!
_
1/2
_

k=0
n
k
k!
(k n)
2
_
1/2
= n
1/2
e
n
.
So we have
|D
n
| n
1/2
|n(I F(n))|.
Since |n(I F(n))| is bounded, the limit is really 0.
For the geometric mean the previous theorem gives the Lie-Trotter for-
mula, see Theorem 3.8.
Theorem 5.7 is about the geometric mean of several matrices and it can be
extended for arbitrary symmetric means. The proof is due to Miklos P ala
and the Hilbert-Schmidt norm |X|
2
= (Tr X

X)
1/2
will be used.
Theorem 5.18 Let m( , ) be a symmetric matrix mean and 0 A, B, C
M
n
. Set a recursion:
(1) A
(0)
:= A, B
(0)
:= B, C
(0)
:= C,
(2) A
(k+1)
:= m(A
(k)
, B
(k)
), B
(k+1)
:= m(A
(k)
, C
(k)
) and C
(k+1)
:=
m(B
(k)
, C
(k)
).
Then the limits lim
m
A
(m)
= lim
m
B
(m)
= lim
m
C
(m)
exist and this can be
dened as m(A, B, C).
Proof: From the well-known inequality
m(X, Y )
X + Y
2
(5.20)
we have
A
(k+1)
+ B
(k+1)
+ C
(k+1)
A
(k)
+ B
(k)
+ C
(k)
.
Therefore the decreasing positive sequence has a limit:
A
(k)
+ B
(k)
+ C
(k)
X as k . (5.21)
It follows also from (5.20) that
|m(C, D)|
2
2

|C|
2
2
+|D|
2
2
2

1
4
|C D|
2
2
.
5.2. GENERAL THEORY 209
Therefore,
a
k+1
:= |A
(k+1)
|
2
2
+|B
(k+1)
|
2
2
+|C
(k+1)
|
2
2
|A
(k)
|
2
2
+|B
(k)
|
2
2
+|C
(k)
|
2
2

1
4
_
|A
(k)
B
(k)
|
2
2
+|B
(k)
C
(k)
|
2
2
+|C
(k)
A
(k)
|
2
2
_
=: a
k
c
k
.
Since the numerical sequence a
k
is decreasing, it has a limit and it follows
that c
k
0. Therefore,
A
(k)
B
(k)
0, A
(k)
C
(k)
0.
If we combine these formulas with (5.21), then
A
(k)

1
3
X as k .
Similar convergence holds for B
(k)
and C
(k)
.
Theorem 5.19 The mean m(A, B, C) dened in Theorem 5.18 has the fol-
lowing properties:
(1) m(A, A, A) = A for every A,
(2) m(A, B, C) = m(B, A, C) = m(C, A, B) for every A, B and C,
(3) if A B C, then A m(A, B, C) C,
(4) if A A

, B B

and C C

, then m(A, B, C) m(A

, B

, C

),
(5) m is upper semi-continuous,
(6) Dm(A, B, C) D

m(DAD

, DBD

, DCD

) and equality holds if D


is invertible.
The above properties can be shown from convergence arguments based on
Theorem 5.18. The details are omitted here.
Example 5.20 If $P_1, P_2, P_3$ are ortho-projections, then
$$m(P_1, P_2, P_3) = P_1 \wedge P_2 \wedge P_3$$
holds for several means, see Example 5.23. $\square$

Now we consider the geometric mean $G_3(A,A,B)$. If $A > 0$, then
$$G_3(A,A,B) = A^{1/2}\, G_3(I, I, A^{-1/2}BA^{-1/2})\, A^{1/2}.$$
Since $I, I, A^{-1/2}BA^{-1/2}$ are commuting matrices, it is easy to compute the geometric mean. So
$$G_3(A,A,B) = A^{1/2}(A^{-1/2}BA^{-1/2})^{1/3}A^{1/2}.$$
This is an example of a weighted geometric mean:
$$G_t(A,B) = A^{1/2}(A^{-1/2}BA^{-1/2})^t A^{1/2} \qquad (0 < t < 1).$$
There is a general theory for the weighted geometric means.
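For concreteness, here is a small sketch (assuming the `spd_power` helper from the earlier snippets) of the weighted geometric mean $G_t(A,B)$; at $t = 1/2$ it reduces to the ordinary geometric mean $A\#B$, and at $t = 1/3$ it gives $G_3(A,A,B)$.

```python
import numpy as np

def spd_power(X, t):
    w, V = np.linalg.eigh(X)
    return (V * w**t) @ V.T

def weighted_geom_mean(A, B, t):
    # G_t(A, B) = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}
    Ah, Amh = spd_power(A, 0.5), spd_power(A, -0.5)
    return Ah @ spd_power(Amh @ B @ Amh, t) @ Ah

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.2], [-0.2, 3.0]])
print(weighted_geom_mean(A, B, 1.0 / 3.0))   # G_3(A, A, B)
```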
5.3 Mean examples

Recall that a matrix monotone function $f: \mathbb{R}^+ \to \mathbb{R}^+$ is called standard if $f(1) = 1$ and $xf(x^{-1}) = f(x)$. Standard functions are used to define matrix means in (5.19).

Here are familiar standard matrix monotone functions:
$$\frac{2x}{x+1} \le \sqrt{x} \le \frac{x-1}{\log x} \le \frac{x+1}{2}.$$
The corresponding increasing means are the harmonic, geometric, logarithmic and arithmetic. By Theorem 5.16 we see that the harmonic mean is the smallest and the arithmetic mean is the largest among the symmetric matrix means.

First we study the harmonic mean $H(A,B)$. A variational expression is expressed in terms of $2\times 2$ block-matrices.

Theorem 5.21
$$H(A,B) = \max\left\{ X \ge 0 : \begin{bmatrix} 2A & 0 \\ 0 & 2B \end{bmatrix} \ge \begin{bmatrix} X & X \\ X & X \end{bmatrix} \right\}.$$

Proof: The inequality of the two block-matrices is equivalently written as
$$\langle x, 2Ax\rangle + \langle y, 2By\rangle \ge \langle x+y, X(x+y)\rangle.$$
Therefore the proof is reduced to Lemma 5.12, where $x+y$ is written by $z$ and $H(A,B) = 2(A:B)$. $\square$
Recall the geometric mean
$$G(A,B) = A\#B = A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}$$
which corresponds to $f(x) = \sqrt{x}$. The mean $A\#B$ is the unique positive solution to the equation $XA^{-1}X = B$ and therefore $(A\#B)^{-1} = A^{-1}\#B^{-1}$.

Example 5.22 The function
$$f(x) = \frac{x-1}{\log x}$$
is matrix monotone due to the formula
$$\int_0^1 x^t\, dt = \frac{x-1}{\log x}.$$
The standard property is obvious. The matrix mean induced by the function $f(x)$ is called the logarithmic mean. The logarithmic mean of positive operators $A$ and $B$ is denoted by $L(A,B)$.

From the inequality
$$\frac{x-1}{\log x} = \int_0^1 x^t\, dt = \int_0^{1/2}(x^t + x^{1-t})\, dt \ge \int_0^{1/2} 2\sqrt{x}\, dt = \sqrt{x}$$
of the real functions we have the matrix inequality
$$A\#B \le L(A,B).$$
It can similarly be proved that $L(A,B) \le (A+B)/2$.

From the integral formula
$$\frac{1}{L(a,b)} = \frac{\log a - \log b}{a-b} = \int_0^\infty \frac{1}{(a+t)(b+t)}\, dt$$
one can obtain
$$L(A,B)^{-1} = \int_0^\infty \frac{(tA+B)^{-1}}{t+1}\, dt. \qquad\square$$
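As an illustration (not from the text), the sketch below evaluates the logarithmic mean $L(A,B) = A^{1/2}f(A^{-1/2}BA^{-1/2})A^{1/2}$ with $f(x) = (x-1)/\log x$ and checks the chain $A\#B \le L(A,B) \le (A+B)/2$ in the positive semidefinite order; `spd_fun`, `mean_from_function` and `is_psd` are ad hoc helper names.

```python
import numpy as np

def spd_fun(X, f):
    # functional calculus for a symmetric positive definite matrix
    w, V = np.linalg.eigh(X)
    return (V * f(w)) @ V.T

def mean_from_function(A, B, f):
    # m_f(A, B) = A^{1/2} f(A^{-1/2} B A^{-1/2}) A^{1/2}
    Ah = spd_fun(A, np.sqrt)
    Amh = spd_fun(A, lambda w: 1.0 / np.sqrt(w))
    return Ah @ spd_fun(Amh @ B @ Amh, f) @ Ah

def is_psd(X, tol=1e-10):
    return np.min(np.linalg.eigvalsh((X + X.T) / 2)) >= -tol

A = np.array([[2.0, 0.3], [0.3, 1.0]])
B = np.array([[1.5, -0.4], [-0.4, 2.5]])

geo = mean_from_function(A, B, np.sqrt)
log_mean = mean_from_function(A, B, lambda x: (x - 1) / np.log(x))
arith = (A + B) / 2

print(is_psd(log_mean - geo), is_psd(arith - log_mean))  # both True
```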

In the next example we study the means of ortho-projections.

Example 5.23 Let $P$ and $Q$ be ortho-projections. It was shown in Example 5.2 that $P\#Q = P\wedge Q$. The inequality
$$\begin{bmatrix} 2P & 0 \\ 0 & 2Q \end{bmatrix} \ge \begin{bmatrix} P\wedge Q & P\wedge Q \\ P\wedge Q & P\wedge Q \end{bmatrix}$$
is true since
$$\begin{bmatrix} P & 0 \\ 0 & Q \end{bmatrix} \ge \begin{bmatrix} P\wedge Q & 0 \\ 0 & P\wedge Q \end{bmatrix}, \qquad \begin{bmatrix} P & P\wedge Q \\ P\wedge Q & Q \end{bmatrix} \ge 0.$$
This gives that $H(P,Q) \ge P\wedge Q$ and from the other inequality $H(P,Q) \le P\#Q$, we obtain $H(P,Q) = P\wedge Q = P\#Q$.

The general matrix mean $m_f(P,Q)$ has the integral expression
$$m_f(P,Q) = aP + bQ + \int_{(0,\infty)} \frac{1+\lambda}{\lambda}\big((\lambda P):Q\big)\, d\mu(\lambda).$$
Since
$$(\lambda P):Q = \frac{\lambda}{1+\lambda}\,(P\wedge Q),$$
we have
$$m_f(P,Q) = aP + bQ + c\,(P\wedge Q).$$
Note that $a = f(0)$, $b = \lim_{x\to\infty} f(x)/x$ and $c = \mu((0,\infty))$. If $a = b = 0$, then $c = 1$ (since $m(I,I) = I$) and $m_f(P,Q) = P\wedge Q$. $\square$
Example 5.24 The power difference means are determined by the functions
$$f_t(x) = \frac{t-1}{t}\cdot\frac{x^t - 1}{x^{t-1}-1} \qquad (-1 \le t \le 2), \tag{5.22}$$
where the values $t = -1, 1/2, 1, 2$ correspond to the well-known means as harmonic, geometric, logarithmic and arithmetic. The functions (5.22) are standard matrix monotone [39] and it can be shown that for fixed $x > 0$ the value $f_t(x)$ is an increasing function of $t$. The case $t = n/(n-1)$ is simple so that
$$f_t(x) = \frac{1}{n}\sum_{k=0}^{n-1} x^{k/(n-1)}$$
and the matrix monotonicity is obvious. $\square$
Example 5.25 The Heinz mean
$$H_t(x,y) = \frac{x^t y^{1-t} + x^{1-t}y^t}{2} \qquad (0 \le t \le 1/2)$$
interpolates between the arithmetic and geometric means. The corresponding standard function
$$f_t(x) = \frac{x^t + x^{1-t}}{2}$$
is obviously matrix monotone and a decreasing function of the parameter $t$. Therefore we can have the Heinz mean for matrices. The formula is
$$H_t(A,B) = A^{1/2}\,\frac{(A^{-1/2}BA^{-1/2})^t + (A^{-1/2}BA^{-1/2})^{1-t}}{2}\,A^{1/2}.$$
This is between the geometric and arithmetic means:
$$A\#B \le H_t(A,B) \le \frac{A+B}{2} \qquad (0 \le t \le 1/2). \qquad\square$$
Example 5.26 For $x \ne y$ the Stolarsky mean is
$$m_p(x,y) = \left(p\,\frac{x-y}{x^p - y^p}\right)^{\frac{1}{1-p}} = \left(\frac{1}{y-x}\int_x^y t^{p-1}\, dt\right)^{\frac{1}{p-1}},$$
where the case $p = 1$ is understood as
$$\lim_{p\to 1} m_p(x,y) = \frac{1}{e}\left(\frac{x^x}{y^y}\right)^{\frac{1}{x-y}}.$$
If $-2 \le p \le 2$, then $f_p(x) = m_p(x,1)$ is a matrix monotone function (see Theorem 4.46), so it can make a matrix mean. The case of $p = 1$ is called the identric mean and the case $p = 0$ is the well-known logarithmic mean. $\square$
It is known that the next canonical representation holds for any standard matrix monotone function $\mathbb{R}^+ \to \mathbb{R}^+$.

Theorem 5.27 Let $f: \mathbb{R}^+ \to \mathbb{R}^+$ be a standard matrix monotone function. Then $f$ admits a canonical representation
$$f(x) = \frac{1+x}{2}\exp\int_0^1 \frac{(\lambda-1)(1-x)^2}{(\lambda+x)(1+\lambda x)(1+\lambda)}\, h(\lambda)\, d\lambda \tag{5.23}$$
where $h: [0,1] \to [0,1]$ is a measurable function.
Example 5.28 In the function (5.23) we take
$$h(\lambda) = \begin{cases} 1 & \text{if } a \le \lambda \le b, \\ 0 & \text{otherwise}, \end{cases}$$
where $0 \le a \le b \le 1$. Then an easy calculation gives
$$\frac{(\lambda-1)(1-x)^2}{(\lambda+x)(1+\lambda x)(1+\lambda)} = \frac{2}{1+\lambda} - \frac{1}{\lambda+x} - \frac{x}{1+\lambda x}.$$
Thus
$$\int_a^b \frac{(\lambda-1)(1-x)^2}{(\lambda+x)(1+\lambda x)(1+\lambda)}\, d\lambda
= \Big[\log(1+\lambda)^2 - \log(\lambda+x) - \log(1+\lambda x)\Big]_{\lambda=a}^{b}
= \log\frac{(1+b)^2}{(1+a)^2} - \log\frac{b+x}{a+x} - \log\frac{1+bx}{1+ax}.$$
So
$$f(x) = \frac{(b+1)^2}{2(a+1)^2}\cdot\frac{(1+x)(a+x)(1+ax)}{(b+x)(1+bx)}.$$
For $h \equiv 0$ the largest function $f(x) = (1+x)/2$ comes and $h \equiv 1$ gives the smallest function $f(x) = 2x/(1+x)$. If
$$\int_0^1 \frac{h(\lambda)}{\lambda}\, d\lambda = +\infty,$$
then $f(0) = 0$. $\square$
The next theorem is the canonical representation for the reciprocal $1/f$ of a standard matrix monotone function $f$ ($1/f$ is a matrix monotone decreasing function).

Theorem 5.29 If $f: \mathbb{R}^+ \to \mathbb{R}^+$ is a standard matrix monotone function, then
$$\frac{1}{f(x)} = \int_0^1 \frac{1+\lambda}{2}\left(\frac{1}{x+\lambda} + \frac{1}{1+\lambda x}\right) d\mu(\lambda), \tag{5.24}$$
where $\mu$ is a probability measure on $[0,1]$.

A standard matrix monotone function $f: \mathbb{R}^+ \to \mathbb{R}^+$ is called regular if $f(0) > 0$. The next theorem provides a bijection between the regular standard matrix monotone functions and the non-regular ones.
Theorem 5.30 Let $f: \mathbb{R}^+ \to \mathbb{R}^+$ be a standard matrix monotone function with $f(0) > 0$. Then
$$\tilde f(x) := \frac{1}{2}\left[(x+1) - (x-1)^2\,\frac{f(0)}{f(x)}\right]$$
is standard matrix monotone as well. Moreover, $f \mapsto \tilde f$ gives a bijection between $\{f \in \mathcal{F} : f(0) > 0\}$ and $\{f \in \mathcal{F} : f(0) = 0\}$, where $\mathcal{F}$ is the set of all standard matrix monotone functions $\mathbb{R}^+ \to \mathbb{R}^+$.
Example 5.31 Let $A, B \in M_n$ be positive definite matrices and $m$ be a matrix mean. The block-matrix
$$\begin{bmatrix} A & m(A,B) \\ m(A,B) & B \end{bmatrix}$$
is positive if and only if $m(A,B) \le A\#B$. Similarly,
$$\begin{bmatrix} A^{-1} & m(A,B)^{-1} \\ m(A,B)^{-1} & B^{-1} \end{bmatrix} \ge 0$$
if and only if $m(A,B) \ge A\#B$.

If $\lambda_1, \lambda_2, \ldots, \lambda_n$ are positive numbers, then the matrix $A \in M_n$ defined as
$$A_{ij} = \frac{1}{L(\lambda_i, \lambda_j)}$$
is positive for $n = 2$ according to the above argument. However, this is true for every $n$ due to the formula
$$\frac{1}{L(x,y)} = \int_0^\infty \frac{1}{(x+t)(y+t)}\, dt.$$
(Another argument is in Example 2.55.)

From the harmonic mean we obtain the mean matrix
$$[H(\lambda_i, \lambda_j)] = \left[\frac{2\lambda_i\lambda_j}{\lambda_i + \lambda_j}\right],$$
which is positive since it is the Hadamard product of two positive matrices (one of which is the Cauchy matrix).

A general description of positive mean matrices and many examples are found in the book [45] and the paper [46]. It is worthwhile to note that two different notions, the matrix mean $m(A,B)$ and the mean matrix $[m(\lambda_i, \lambda_j)]$, are associated with a standard matrix monotone function. $\square$
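A quick numerical sketch (not from the text) of the mean matrices just mentioned: it builds $[1/L(\lambda_i,\lambda_j)]$ and $[H(\lambda_i,\lambda_j)]$ for a few positive numbers and checks positive semidefiniteness via the smallest eigenvalue.

```python
import numpy as np

lam = np.array([0.3, 1.0, 2.5, 7.0])

def log_mean_matrix(lam):
    # entrywise logarithmic mean L(lambda_i, lambda_j), with L(x, x) = x
    X, Y = np.meshgrid(lam, lam, indexing="ij")
    M = np.empty_like(X)
    off = ~np.isclose(X, Y)
    M[off] = (X[off] - Y[off]) / (np.log(X[off]) - np.log(Y[off]))
    M[~off] = X[~off]
    return M

X, Y = np.meshgrid(lam, lam, indexing="ij")
inv_log_mean = 1.0 / log_mean_matrix(lam)   # [1 / L(lambda_i, lambda_j)]
harmonic = 2 * X * Y / (X + Y)              # [H(lambda_i, lambda_j)]

for name, M in [("1/L", inv_log_mean), ("harmonic", harmonic)]:
    print(name, np.min(np.linalg.eigvalsh(M)) >= -1e-12)  # both True
```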
5.4 Mean transformation

If $0 \le A, B \in M_n$, then a matrix mean $m_f(A,B) \in M_n$ has a slightly complicated formula expressed by the function $f: \mathbb{R}^+ \to \mathbb{R}^+$ of the mean. If $AB = BA$, then the situation is simpler: $m_f(A,B) = f(AB^{-1})B$. The mean introduced here will be a linear mapping $M_n \to M_n$. If $n > 1$, then this is essentially different from $m_f(A,B)$.

From $A$ and $B$ we have the linear mappings $M_n \to M_n$ defined as
$$L_A X := AX, \qquad R_B X := XB \qquad (X \in M_n).$$
So $L_A$ is the left-multiplication by $A$ and $R_B$ is the right-multiplication by $B$. Obviously, they are commuting operators, $L_A R_B = R_B L_A$, and they can be considered as matrices in $M_n \otimes M_n = M_{n^2}$.

The definition of the mean transformation is
$$M_f(A,B) := m_f(L_A, R_B).$$
Sometimes the notation $J^f_{A,B}$ is used for this.

For $f(x) = \sqrt{x}$ we have the geometric mean which is a simple example.
Example 5.32 Since $L_A$ and $R_B$ commute, the example of the geometric mean is the following:
$$L_A \# R_B = (L_A)^{1/2}(R_B)^{1/2} = L_{A^{1/2}}R_{B^{1/2}}, \qquad X \mapsto A^{1/2}XB^{1/2}.$$
It is not true that $M(A,B)X \ge 0$ if $X \ge 0$, but as a linear mapping $M(A,B)$ is positive:
$$\langle X, M(A,B)X\rangle = \mathrm{Tr}\, X^*A^{1/2}XB^{1/2} = \mathrm{Tr}\, B^{1/4}X^*A^{1/2}XB^{1/4} \ge 0$$
for every $X \in M_n$.

Let $A, B > 0$. The equality $M(A,B)A = M(B,A)A$ immediately implies that $AB = BA$. From $M(A,B) = M(B,A)$ we can find that $A = \lambda B$ with some number $\lambda > 0$. Therefore $M(A,B) = M(B,A)$ is a very special situation for the mean transformation.

The logarithmic mean transformation is
$$M_{\log}(A,B)X = \int_0^1 A^t X B^{1-t}\, dt. \qquad\square$$

In the next example we have a formula for general $M(A,B)$.
Example 5.33 Assume that $A$ and $B$ act on a Hilbert space which has two orthonormal bases $|x_1\rangle, \ldots, |x_n\rangle$ and $|y_1\rangle, \ldots, |y_n\rangle$ such that
$$A = \sum_i \lambda_i |x_i\rangle\langle x_i|, \qquad B = \sum_j \mu_j |y_j\rangle\langle y_j|.$$
Then for $f(x) = x^k$ we have
$$f(L_A R_B^{-1})R_B\, |x_i\rangle\langle y_j| = A^k\, |x_i\rangle\langle y_j|\, B^{-k+1} = \lambda_i^k \mu_j^{-k+1}|x_i\rangle\langle y_j|
= f(\lambda_i/\mu_j)\,\mu_j\, |x_i\rangle\langle y_j| = m_f(\lambda_i, \mu_j)\,|x_i\rangle\langle y_j|$$
and for a general $f$
$$M_f(A,B)\,|x_i\rangle\langle y_j| = m_f(\lambda_i, \mu_j)\,|x_i\rangle\langle y_j|.$$
This shows also that $M_f(A,B) \ge 0$ with respect to the Hilbert-Schmidt inner product.

Another formulation is also possible. Let $A = U\,\mathrm{Diag}(\lambda_1, \ldots, \lambda_n)U^*$, $B = V\,\mathrm{Diag}(\mu_1, \ldots, \mu_n)V^*$ with unitaries $U, V$. Let $|e_1\rangle, \ldots, |e_n\rangle$ be the standard basis vectors. Then
$$M_f(A,B)X = U\big([m_f(\lambda_i, \mu_j)]_{ij} \circ (U^* X V)\big)V^*.$$
It is enough to check the case $X = |x_i\rangle\langle y_j|$. Then
$$U\big([m_f(\lambda_i,\mu_j)]_{ij}\circ(U^*|x_i\rangle\langle y_j|V)\big)V^*
= U\big([m_f(\lambda_i,\mu_j)]_{ij}\circ |e_i\rangle\langle e_j|\big)V^*
= m_f(\lambda_i,\mu_j)\,U|e_i\rangle\langle e_j|V^* = m_f(\lambda_i,\mu_j)\,|x_i\rangle\langle y_j|.$$

For the matrix means we have $m(A,A) = A$, but $M(A,A)$ is rather different, it cannot be $A$ since it is a transformation. If $A = \sum_i \lambda_i|x_i\rangle\langle x_i|$, then
$$M(A,A)\,|x_i\rangle\langle x_j| = m(\lambda_i, \lambda_j)\,|x_i\rangle\langle x_j|.$$
(This is related to the so-called mean matrix, see Example 5.31.) $\square$
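The last formula is easy to put on a computer. Here is a rough sketch (assuming real symmetric positive definite $A$, $B$, and the Schur product written as entrywise multiplication) that builds $M_f(A,B)X = U([m_f(\lambda_i,\mu_j)]\circ(U^*XV))V^*$ for the geometric mean $m_f(\lambda,\mu)=\sqrt{\lambda\mu}$ and compares it with the explicit action $X \mapsto A^{1/2}XB^{1/2}$ from Example 5.32.

```python
import numpy as np

def mean_transformation(A, B, scalar_mean, X):
    # M_f(A, B) X = U ([m_f(lam_i, mu_j)]_{ij} o (U* X V)) V*, with "o" the Schur product
    lam, U = np.linalg.eigh(A)
    mu, V = np.linalg.eigh(B)
    mean_matrix = scalar_mean(lam[:, None], mu[None, :])
    return U @ (mean_matrix * (U.T @ X @ V)) @ V.T

def spd_power(M, t):
    w, W = np.linalg.eigh(M)
    return (W * w**t) @ W.T

A = np.array([[2.0, 0.4], [0.4, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
X = np.array([[0.0, 1.0], [2.0, -1.0]])

geo = mean_transformation(A, B, lambda a, b: np.sqrt(a * b), X)
direct = spd_power(A, 0.5) @ X @ spd_power(B, 0.5)
print(np.allclose(geo, direct))  # True
```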
Example 5.34 Here we show a very special inequality between the geometric mean transformation $M_G(A,B)$ and the arithmetic mean transformation $M_A(A,B)$. They are
$$M_G(A,B)X = A^{1/2}XB^{1/2}, \qquad M_A(A,B)X = \frac{1}{2}(AX + XB).$$
There is an integral formula
$$M_G(A,B)X = \int_{-\infty}^{\infty} A^{it}\big(M_A(A,B)X\big)B^{-it}\, d\nu(t), \tag{5.25}$$
where the probability measure $\nu$ is
$$d\nu(t) = \frac{1}{\cosh(\pi t)}\, dt.$$
From (5.25) it follows that
$$\|M_G(A,B)X\| \le \|M_A(A,B)X\|, \tag{5.26}$$
which is an operator norm inequality. A general comparison theorem of this kind between mean transformations is given in [44]. $\square$
The next theorem gives the transformer inequality.

Theorem 5.35 Let $f: [0,+\infty) \to [0,+\infty)$ be a matrix monotone function and $M(\,\cdot\,,\,\cdot\,)$ be the corresponding mean transformation. If $\beta: M_n \to M_m$ is a 2-positive trace-preserving mapping and matrices $A, B \in M_n$ are positive, then
$$\beta\, M(A,B)\,\beta^* \le M(\beta(A), \beta(B)). \tag{5.27}$$

Proof: By approximation we may assume that $A, B, \beta(A), \beta(B) > 0$. Indeed, assume that the conclusion holds under this positive definiteness condition. For each $\varepsilon > 0$ let
$$\beta_\varepsilon(X) := \frac{\beta(X) + \varepsilon(\mathrm{Tr}\, X)I_m}{1 + \varepsilon m}, \qquad X \in M_n,$$
which is 2-positive and trace-preserving. If $A, B > 0$, then $\beta_\varepsilon(A), \beta_\varepsilon(B) > 0$ as well and hence (5.27) holds for $\beta_\varepsilon$. Letting $\varepsilon \to 0$ implies that (5.27) for $\beta$ is true for all $A, B > 0$. Then by taking the limit from $A + \varepsilon I_n$, $B + \varepsilon I_n$ as $\varepsilon \to 0$, we have (5.27) for all $A, B \ge 0$. Now assume $A, B, \beta(A), \beta(B) > 0$.

Based on the Löwner theorem, we may consider $f(x) = x/(\lambda + x)$ ($\lambda > 0$). Then
$$M(A,B) = L_A(\lambda I + L_A R_B^{-1})^{-1}, \qquad M(A,B)^{-1} = (\lambda I + L_A R_B^{-1})L_A^{-1}$$
and similarly $M(\beta(A), \beta(B))^{-1} = (\lambda I + L_{\beta(A)}R_{\beta(B)}^{-1})L_{\beta(A)}^{-1}$. The statement (5.27) has the equivalent form
$$\beta^*\, M(\beta(A), \beta(B))^{-1}\,\beta \le M(A,B)^{-1},$$
which means
$$\langle \beta(X), (\lambda I + L_{\beta(A)}R_{\beta(B)}^{-1})L_{\beta(A)}^{-1}\beta(X)\rangle \le \langle X, (\lambda I + L_A R_B^{-1})L_A^{-1}X\rangle$$
or
$$\lambda\,\mathrm{Tr}\,\beta(X^*)\beta(A)^{-1}\beta(X) + \mathrm{Tr}\,\beta(X)\beta(B)^{-1}\beta(X^*)
\le \lambda\,\mathrm{Tr}\, X^*A^{-1}X + \mathrm{Tr}\, XB^{-1}X^*.$$
This inequality is true due to the matrix inequality
$$\beta(X^*)\beta(Y)^{-1}\beta(X) \le \beta(X^*Y^{-1}X) \qquad (Y > 0),$$
see Lemma 2.46. $\square$

If $\beta^{-1}$ has the same properties as in the previous theorem, then we have equality in formula (5.27).
Theorem 5.36 Let $f: \mathbb{R}^+ \to \mathbb{R}^+$ be a matrix monotone function with $f(1) = 1$ and $M(\,\cdot\,,\,\cdot\,)$ be the corresponding mean transformation. Assume that $0 \le A \le A'$ and $0 \le B \le B'$ in $M_n$. Then $M(A,B) \le M(A',B')$.

Proof: By continuity we may assume that $A, B > 0$. Based on the Löwner theorem, we may consider $f(x) = x/(\lambda+x)$ ($\lambda > 0$). Then the statement is
$$L_A(\lambda I + L_A R_B^{-1})^{-1} \le L_{A'}(\lambda I + L_{A'}R_{B'}^{-1})^{-1},$$
which is equivalent to the relation
$$\lambda L_{A'}^{-1} + R_{B'}^{-1} = (\lambda I + L_{A'}R_{B'}^{-1})L_{A'}^{-1} \le (\lambda I + L_A R_B^{-1})L_A^{-1} = \lambda L_A^{-1} + R_B^{-1}.$$
This is true, since $L_{A'}^{-1} \le L_A^{-1}$ and $R_{B'}^{-1} \le R_B^{-1}$ due to the assumption. $\square$
Theorem 5.37 Let $f$ be a matrix monotone function with $f(1) = 1$ and $M_f$ be the corresponding mean transformation. It has the following properties:

(1) $M_f(\lambda A, \lambda B) = \lambda M_f(A,B)$ for a number $\lambda > 0$.

(2) $(M_f(A,B)X)^* = M_f(B,A)X^*$.

(3) $M_f(A,A)I = A$.

(4) $\mathrm{Tr}\, M_f(A,A)^{-1}Y = \mathrm{Tr}\, A^{-1}Y$.

(5) $(A,B) \mapsto \langle X, M_f(A,B)Y\rangle$ is continuous.

(6) Let
$$C := \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} \ge 0.$$
Then
$$M_f(C,C)\begin{bmatrix} X & Y \\ Z & W \end{bmatrix} = \begin{bmatrix} M_f(A,A)X & M_f(A,B)Y \\ M_f(B,A)Z & M_f(B,B)W \end{bmatrix}.$$

The proof of the theorem is an elementary computation. Property (6) is very essential. It tells that it is sufficient to know the mean transformation for two identical matrices.
The next theorem is an axiomatic characterization of the mean transformation.

Theorem 5.38 Assume that for any $n \in \mathbb{N}$ and for all $0 < A, B \in M_n$, the linear operator $L(A,B): M_n \to M_n$ is defined. $L(A,B) = M_f(L_A, R_B)$ with a matrix monotone function $f$ if and only if $L$ has the following properties:

(i) $(X,Y) \mapsto \langle X, L(A,B)Y\rangle$ is an inner product on $M_n$.

(ii) $(A,B) \mapsto \langle X, L(A,B)Y\rangle$ is continuous.

(iii) For a trace-preserving completely positive mapping $\beta: M_n \to M_m$,
$$\beta\, L(A,B)\,\beta^* \le L(\beta(A), \beta(B))$$
holds.

(iv) Let
$$C := \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} > 0.$$
Then
$$L(C,C)\begin{bmatrix} X & Y \\ Z & W \end{bmatrix} = \begin{bmatrix} L(A,A)X & L(A,B)Y \\ L(B,A)Z & L(B,B)W \end{bmatrix}.$$

The proof needs a few lemmas. Use the notation $\mathcal{P}_n := \{A \in M_n : A > 0\}$.
Lemma 5.39 If $U, V \in M_n$ are arbitrary unitary matrices, then for every $A, B \in \mathcal{P}_n$ and $X \in M_n$ we have
$$\langle X, L(A,B)X\rangle = \langle UXV^*, L(UAU^*, VBV^*)\,UXV^*\rangle.$$

Proof: For a unitary matrix $U \in M_n$ define $\beta(A) = U^*AU$. Then $\beta: M_n \to M_n$ is trace-preserving completely positive and $\beta^*(A) = \beta^{-1}(A) = UAU^*$. Thus by double application of (iii) we obtain
$$\langle X, L(A,A)X\rangle = \langle X, L(\beta\beta^{-1}A, \beta\beta^{-1}A)X\rangle
\ge \langle \beta^*X, L(\beta^{-1}A, \beta^{-1}A)\beta^*X\rangle
\ge \langle (\beta^{-1})^*\beta^*X, L(A,A)(\beta^{-1})^*\beta^*X\rangle = \langle X, L(A,A)X\rangle,$$
hence
$$\langle X, L(A,A)X\rangle = \langle UXU^*, L(UAU^*, UAU^*)\,UXU^*\rangle.$$
Now for the matrices
$$C = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} \in \mathcal{P}_{2n}, \qquad
Y = \begin{bmatrix} 0 & X \\ 0 & 0 \end{bmatrix} \in M_{2n} \qquad\text{and}\qquad
W = \begin{bmatrix} U & 0 \\ 0 & V \end{bmatrix} \in M_{2n}$$
it follows by (iv) that
$$\langle X, L(A,B)X\rangle = \langle Y, L(C,C)Y\rangle
= \langle WYW^*, L(WCW^*, WCW^*)\,WYW^*\rangle
= \langle UXV^*, L(UAU^*, VBV^*)\,UXV^*\rangle$$
and we have the statement. $\square$
Lemma 5.40 Suppose that $L(A,B)$ is defined by the axioms (i)-(iv). Then there exists a unique continuous function $d: \mathbb{R}^+ \times \mathbb{R}^+ \to \mathbb{R}^+$ such that
$$d(r\lambda, r\mu) = r\, d(\lambda, \mu) \qquad (r, \lambda, \mu > 0)$$
and for every $A = \mathrm{Diag}(\lambda_1, \ldots, \lambda_n)$ and $B = \mathrm{Diag}(\mu_1, \ldots, \mu_n)$ in $\mathcal{P}_n$,
$$\langle X, L(A,B)X\rangle = \sum_{j,k=1}^n d(\lambda_j, \mu_k)\,|X_{jk}|^2.$$
Proof: The uniqueness of such a function d is clear. We concentrate on
the existence.
Denote by E(jk)
(n)
and I
n
the nn matrix units and the nn unit matrix,
respectively. We assume that A = Diag(
1
, . . . ,
n
) and B = Diag(
1
, . . . ,
n
)
are in P
n
.
We rst show that
E(jk)
(n)
, L(A, A)E(lm)
(n)
= 0 if (j, k) ,= (l, m). (5.28)
Indeed, if j ,= k, l, m we let U
j
= Diag(1, . . . , 1, i, 1, . . . , 1) where the imagi-
nary unit is the jth entry and j ,= k, l, m. Then by Lemma 5.39 we have
E(jk)
(n)
, L(A, A)E(lm)
(n)

= U
j
E(jk)
(n)
U

j
, L(U
j
AU

j
, U
j
AU

j
)U
j
E(lm)
(n)
U

= iE(jk)
(n)
, L(A, A)E(lm)
(n)
= iE(jk)
(n)
, L(A, A)E(lm)
(n)

222 CHAPTER 5. MATRIX MEANS AND INEQUALITIES


hence E(jk)
(n)
, L(A, A)E(lm)
(n)
= 0. If one of the indices j, k, l, m is dif-
ferent from the others then (5.28) follows analogously. Finally, applying con-
dition (iv) we obtain that
E(jk)
(n)
, L(A, B)E(lm)
(n)
= E(j, k + n)
(2n)
, m(C, C)E(l, m+ n)
(2n)
= 0
if (j, k) ,= (l, m), because C = Diag(
1
, . . . ,
n
,
1
, . . . ,
n
) H
+
2n
and one of
the indices j, k + n, l, m + n are dierent from the others.
Now we claim that E(jk)
(n)
, L(A, B)E(jk)
(n)
is determined by
j
and

k
. More specically,
|E(jk)
(n)
|
2
A,B
= |E(12)
(2)
|
2
Diag(
j
,
k
)
, (5.29)
where for brevity we introduced the notations
|X|
2
A,B
= X, L(A, B)X and |X|
2
A
= |X|
2
A,A
.
Indeed, if U
j,k+n
M
2n
denotes the unitary matrix which interchanges the
rst and the jth coordinates and further the second and the (k + n)th coor-
dinates, then by condition (iv) and Lemma 5.39 it follows that
|E(jk)
(n)
|
2
A,B
= |E(j, k + n)
(2n)
|
2
C
= |U
j,k+n
E(j, k + n)
(2n)
U

j,k+n
|
2
U
j,k+n
CU

j,k+n
= |E(12)
(2n)
|
2
Diag(
j
,
k
,
3
,...,n)
.
Thus it suces to prove
|E(12)
(2n)
|
2
Diag(
1
,
2
,...,
2n
)
= |E(12)
(2)
|
2
Diag(
1
,
2
)
. (5.30)
Condition (iv) with X = E(12)
(n)
and Y = Z = W = 0 yields
|E(12)
(2n)
|
2
Diag(
1
,
2
,...,
2n
)
= |E(12)
(n)
|
2
Diag(
1
,
2
,...,n)
. (5.31)
Further, consider the following mappings (n 4):
n
: M
n
M
n1
,

n
(E(jk)
(n)
) :=
_
_
_
E(jk)
(n1)
, if 1 j, k n 1,
E(n 1, n 1)
(n1)
, if j = k = n,
0, otherwise,
and

n
: M
n1
M
n
,

n
(E(jk)
(n1)
) := E(jk)
(n1)
if 1 j, k n 2,

n
(E(n 1, n 1)
(n1)
) :=

n1
E(n 1, n 1)
(n)
+
n
E(nn)
(n)

n1
+
n
5.4. MEAN TRANSFORMATION 223
and in the other cases

n
(E(jk)
(n1)
) = 0.
Clearly,
n
and

n
are trace-preserving completely positive mappings hence
by (iii)
|E(12)
(n)
|
2
Diag(
1
,...,n)
= |E(12)
(n)
|
2

nnDiag(
1
,...,n)
|

n
E(12)
(n)
|
2
nDiag(
1
,...,n)
|

n
E(12)
(n)
|
2
Diag(
1
,...,n)
= |E(12)
(n)
|
2
Diag(
1
,...,n)
.
Thus equality holds, which implies that
|E(12)
(n)
|
2
Diag(
1
,...,
n1
,n)
= |E(12)
(n1)
|
2
Diag(
1
,...,
n2
,
n1
+n)
. (5.32)
Now repeated application of (5.31) and (5.32) yields (5.30) and therefore also
(5.29) follows.
For , > 0 let
d(, ) := |E(12)
(2)
|
2
Diag(,)
.
Condition (ii) implies the continuity of d. We furthermore claim that d is
homogeneous of order one, that is,
d(r, r) = rd(, ) (, , r > 0).
First let r = k N. Then the mappings
k
: M
2
M
2k
,
k
: M
2k
M
k
dened by

k
(X) =
1
k
I
k
X
and

k
_

_
X
11
X
12
. . . X
1k
X
21
X
22
. . . X
2k
.
.
.
.
.
.
.
.
.
X
k1
X
k2
. . . X
kk
_

_
= X
11
+ X
22
+ . . . + X
kk
are trace-preserving completely positive, for which

k
= k
k
. So applying
condition (iii) twice it follows that
|E(12)
(2)
|
2
Diag(,)
= |E(12)
(2)
|
2

k

k
Diag(,)
|

k
E(12)
(2)
|
2

k
Diag(,)
|

k
E(12)
(2)
|
2
Diag(,)
= |E(12)
(2)
|
2
Diag(,)
.
Hence equality holds, which means that
|E(12)
(2)
|
2
Diag(,)
= |I
k
E(12)
(2)
|
2
1
k
I
k
Diag(,)
.
224 CHAPTER 5. MATRIX MEANS AND INEQUALITIES
Thus by applying (5.28) and (5.29) we obtain
d(, ) = |I
k
E(12)
(2)
|
2
1
k
I
k
Diag(,)
=
k

j=1
|E(jj)
(k)
E(12)
(2)
|
2
1
k
I
k
Diag(,)
= k|E(11)
(k)
E(12)
(2)
|
2
1
k
I
k
Diag(,)
= kd
_

k
,

k
_
.
If r = /k where , k are positive natural numbers, then
d(r, r) = d
_

k
,

k

_
=
1
k
d(, ) =

k
d(, ).
By condition (ii), the homogeneity follows for every r > 0.
We nish the proof by using (5.28) and (5.29) and obtain
|X|
2
A,B
=
n

j,k=1
d(
j
,
k
)[X
jk
[
2
.

If we require the positivity of $M(A,B)X$ for $X \ge 0$, then from the formula
$$(M(A,B)X)^* = M(B,A)X^*$$
we need $A = B$. If $A = \sum_i \lambda_i|x_i\rangle\langle x_i|$ and $X = \sum_{i,j}|x_i\rangle\langle x_j|$ with an orthonormal basis $\{|x_i\rangle : i\}$, then
$$\big(M(A,A)X\big)_{ij} = m(\lambda_i, \lambda_j).$$
The positivity of this matrix is necessary.

Given the positive numbers $\{\lambda_i : 1 \le i \le n\}$, the matrix
$$K_{ij} = m(\lambda_i, \lambda_j)$$
is called an $n\times n$ mean matrix. From the previous argument the positivity of $M(A,A): M_n \to M_n$ implies the positivity of the $n\times n$ mean matrices of the mean $M$. It is easy to see that if the mean matrices of any size are positive, then $M(A,A): M_n \to M_n$ is a completely positive mapping.

If the mean matrix
$$\begin{bmatrix} \lambda_1 & m(\lambda_1, \lambda_2) \\ m(\lambda_1, \lambda_2) & \lambda_2 \end{bmatrix}$$
is positive, then $m(\lambda_1, \lambda_2) \le \sqrt{\lambda_1\lambda_2}$. It follows that to have a positive mean matrix, the mean $m$ should be smaller than the geometric mean. Indeed, the next general characterization result is known.
Theorem 5.41 Let $f$ be a standard matrix monotone function on $\mathbb{R}^+$ and $m$ the corresponding mean, i.e., $m(x,y) := xf(y/x)$ for $x, y > 0$. Let $M(\,\cdot\,,\,\cdot\,)$ be the corresponding mean transformation. Then the following conditions are equivalent:

(1) $M(A,A)X \ge 0$ for every $0 \le A, X \in M_n$ and every $n \in \mathbb{N}$;

(2) the mean transformation $M(A,A): M_n \to M_n$ is completely positive for every $0 \le A \in M_n$ and every $n \in \mathbb{N}$;

(3) $\|M(A,A)X\| \le \|A^{1/2}XA^{1/2}\|$ for every $A, X \in M_n$ with $A \ge 0$ and every $n \in \mathbb{N}$, where $\|\cdot\|$ is the operator norm;

(4) the mean matrix $[m(\lambda_i, \lambda_j)]_{ij}$ is positive semi-definite for every $\lambda_1, \ldots, \lambda_n > 0$ and every $n \in \mathbb{N}$;

(5) $f(e^t)e^{-t/2}$ is a positive definite function on $\mathbb{R}$ in the sense of Bochner, i.e., it is the Fourier transform of a probability measure on $\mathbb{R}$.

The above condition (5) is stronger than $f(x) \le \sqrt{x}$ and it is a necessary and sufficient condition of the positivity of $M(A,A)X$ for all $A, X \ge 0$.
Example 5.42 The power mean or binomial mean
$$m_t(x,y) = \left(\frac{x^t + y^t}{2}\right)^{1/t}$$
is an increasing function of $t$ when $x$ and $y$ are fixed. The limit $t \to 0$ gives the geometric mean. Therefore the positivity of the mean matrix may appear only for $t \le 0$. Then for $t > 0$,
$$m_{-t}(x,y) = 2^{1/t}\,\frac{xy}{(x^t + y^t)^{1/t}}$$
and the corresponding mean matrix is positive due to the infinitely divisible Cauchy matrix, see Example 1.41. $\square$
5.5 Notes and remarks

The geometric mean of operators first appeared in the paper of Wieslaw Pusz and Stanislav L. Woronowicz, Functional calculus for sesquilinear forms and the purification map, Rep. Math. Phys. 8(1975), 159-170, and the detailed study was in the papers [2, 58] of Tsuyoshi Ando and Fumio Kubo. The geometric mean for more matrices is from the paper [10]. Another approach based on differential geometry is explained in the book [21]. A popularization of the subject is the paper Rajendra Bhatia and John Holbrook, Noncommutative geometric means, Math. Intelligencer 28(2006), 32-39.

Theorem 5.18 is from the paper Miklós Pálfia, A multivariable extension of two-variable matrix means, SIAM J. Matrix Anal. Appl. 32(2011), 385-393. There is a different definition of the geometric mean $X$ of the positive matrices $A_1, A_2, \ldots, A_k$ as defined by the equation $\sum_{i=1}^{k}\log(A_i^{-1}X) = 0$. See the paper Y. Lim and M. Pálfia, Matrix power means and the Karcher mean, J. Funct. Anal. 262(2012), 1498-1514 and the references therein.

The mean transformations are in the paper [44] and the book [45] of Fumio Hiai and Hideki Kosaki. Theorem 5.38 is from the paper [18]. There are several examples of positive and infinitely divisible mean matrices in the paper Rajendra Bhatia and Hideki Kosaki, Mean matrices and infinite divisibility, Linear Algebra Appl. 424(2007), 36-54. (Infinite divisibility means the positivity of the matrices $A_{ij} = m(\lambda_i, \lambda_j)^t$ for every $t > 0$.)

Lajos Molnár proved that if a bijection $\phi: M_n^+ \to M_n^+$ preserves the geometric mean, then for $n \ge 2$, $\phi(A) = SAS^*$ for a linear or conjugate-linear mapping $S$ (Maps preserving the geometric mean of positive operators, Proc. Amer. Math. Soc. 137(2009), 1763-1770).

Theorem 5.27 is from the paper K. Audenaert, L. Cai and F. Hansen, Inequalities for Quantum Skew Information, Lett. Math. Phys. 85(2008), 135-146. On the other hand, Theorem 5.29 is from the paper F. Hansen, Metric adjusted skew information, Proc. Natl. Acad. Sci. USA 105(2008), 9909-9916, and Theorem 5.30 is from P. Gibilisco, F. Hansen, T. Isola, On a correspondence between regular and non-regular operator monotone functions, Linear Algebra Appl. 430(2009), 2225-2232.

The norm inequality (5.26) was obtained by R. Bhatia and C. Davis, A Cauchy-Schwarz inequality for operators with applications, Linear Algebra Appl. 223/224(1995), 119-129. The integral expression (5.25) is due to H. Kosaki, Arithmetic-geometric mean and related inequalities for operators, J. Funct. Anal. 156(1998), 429-451. For a systematic analysis on norm inequalities and integral expressions of this kind as well as the details on Theorem 5.41, see the papers [44, 45, 46].
5.6 Exercises

1. Show that for positive invertible matrices $A$ and $B$ the inequalities
$$2(A^{-1}+B^{-1})^{-1} \le A\#B \le \frac{1}{2}(A+B)$$
hold. What is the condition for equality? (Hint: Reduce the general case to $A = I$.)

2. Show that
$$A\#B = \frac{1}{\pi}\int_0^1 \frac{(tA^{-1}+(1-t)B^{-1})^{-1}}{\sqrt{t(1-t)}}\, dt.$$

3. Let $A, B > 0$. Show that $A\#B = A$ implies $A = B$.

4. Let $0 < A, B \in M_m$. Show that the rank of the matrix
$$\begin{bmatrix} A & A\#B \\ A\#B & B \end{bmatrix}$$
is smaller than $2m$.

5. Show that for any matrix mean $m$,
$$m(A,B)\,\#\,m^*(A,B) = A\#B.$$

6. Let $A \ge 0$ and $P$ be a projection of rank 1. Show that $A\#P = \sqrt{\mathrm{Tr}\, AP}\; P$.

7. Argue that the natural map
$$(A,B) \mapsto \exp\Big(\frac{\log A + \log B}{2}\Big)$$
would not be a good definition for the geometric mean.

8. Show that for positive matrices $A : B = A - A(A+B)^{-1}A$.

9. Show that for positive matrices $A : B \le A$.

10. Show that $0 < A \le B$ imply $A \le 2(A : B) \le B$.

11. Show that $L(A,B) \le (A+B)/2$.

12. Let $A, B > 0$. Show that if for a matrix mean $m_f(A,B) = A$, then $A = B$.

13. Let $f, g: \mathbb{R}^+ \to \mathbb{R}^+$ be matrix monotone functions. Show that their arithmetic and geometric means are matrix monotone as well.

14. Show that the matrix
$$A_{ij} = \frac{1}{H_t(\lambda_i, \lambda_j)}$$
defined by the Heinz mean is positive.

15. Show that
$$\frac{\partial}{\partial t}\, m(e^{tA}, e^{tB})\Big|_{t=0} = \frac{A+B}{2}$$
for a symmetric mean. (Hint: Check the arithmetic and harmonic means, reduce the general case to these examples.)

16. Let $A$ and $B$ be positive matrices and assume that there is a unitary $U$ such that $A^{1/2}UB^{1/2} \ge 0$. Show that $A\#B = A^{1/2}UB^{1/2}$.

17. Show that
$$S^*(A:B)S \le (S^*AS):(S^*BS)$$
for any invertible matrix $S$ and $A, B \ge 0$.

18. Show the property
$$(A:B) + (C:D) \le (A+C):(B+D)$$
of the parallel sum.

19. Show the logarithmic mean formula
$$L(A,B)^{-1} = \int_0^\infty \frac{(tA+B)^{-1}}{t+1}\, dt$$
for positive definite matrices $A, B$.

20. Let $A$ and $B$ be positive definite matrices. Set $A_0 := A$, $B_0 := B$ and define recurrently
$$A_n = \frac{A_{n-1}+B_{n-1}}{2} \quad\text{and}\quad B_n = 2(A_{n-1}^{-1}+B_{n-1}^{-1})^{-1} \qquad (n = 1, 2, \ldots).$$
Show that
$$\lim_{n\to\infty} A_n = \lim_{n\to\infty} B_n = A\#B.$$

21. Show that the function $f_t(x)$ defined in (5.22) has the property
$$\sqrt{x} \le f_t(x) \le \frac{1+x}{2}$$
when $1/2 \le t \le 2$.

22. Let $P$ and $Q$ be ortho-projections. What is their Heinz mean?

23. Show that
$$\det(A\#B) = \sqrt{\det A\,\det B}.$$

24. Assume that $A$ and $B$ are invertible positive matrices. Show that
$$(A\#B)^{-1} = A^{-1}\#B^{-1}.$$

25. Let
$$A := \begin{bmatrix} 3/2 & 0 \\ 0 & 3/4 \end{bmatrix} \quad\text{and}\quad B := \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}.$$
Show that $A \ge B \ge 0$ and for $p > 1$ the inequality $A^p \ge B^p$ does not hold.

26. Show that
$$\det\big(G(A,B,C)\big) = \big(\det A\,\det B\,\det C\big)^{1/3}.$$

27. Show that
$$G(\alpha A, \beta B, \gamma C) = (\alpha\beta\gamma)^{1/3}\, G(A,B,C)$$
for positive numbers $\alpha, \beta, \gamma$.

28. Show that $A_1 \le A_2$, $B_1 \le B_2$, $C_1 \le C_2$ imply
$$G(A_1, B_1, C_1) \le G(A_2, B_2, C_2).$$

29. Show that
$$G(A,B,C) = G(A^{-1}, B^{-1}, C^{-1})^{-1}.$$

30. Show that
$$3(A^{-1}+B^{-1}+C^{-1})^{-1} \le G(A,B,C) \le \frac{1}{3}(A+B+C).$$

31. Show that
$$f_\alpha(x) = 2^{2\alpha-1}\, x^\alpha (1+x)^{1-2\alpha}$$
is a matrix monotone function for $0 < \alpha < 1$.

32. Let $P$ and $Q$ be ortho-projections. Prove that $L(P,Q) = P \wedge Q$.

33. Show that the function
$$f_p(x) = \Big(\frac{x^p+1}{2}\Big)^{1/p}$$
is matrix monotone if and only if $-1 \le p \le 1$.

34. For positive numbers $a$ and $b$
$$\lim_{p\to 0}\Big(\frac{a^p+b^p}{2}\Big)^{1/p} = \sqrt{ab}.$$
Is it true that for $0 < A, B \in M_n(\mathbb{C})$
$$\lim_{p\to 0}\Big(\frac{A^p+B^p}{2}\Big)^{1/p}$$
is the geometric mean of $A$ and $B$?
Chapter 6

Majorization and singular values

A citation from von Neumann: "The object of this note is the study of certain properties of complex matrices of nth order; together with them we shall use complex vectors of nth order." This classical subject in matrix theory is exposed in Sections 6.2 and 6.3 after discussions on vectors in Section 6.1. The chapter also contains several matrix norm inequalities as well as majorization results for matrices, which were mostly developed rather recently.

Basic properties of singular values of matrices are given in Section 6.2. The section also contains several fundamental majorizations, notably the Lidskii-Wielandt and Gelfand-Naimark theorems, for the eigenvalues of Hermitian matrices and the singular values of general matrices. Section 6.3 is an important subject on symmetric or unitarily invariant norms for matrices. Symmetric norms are written as symmetric gauge functions of the singular values of matrices (the von Neumann theorem). So they are closely connected with majorization theory, as manifestly seen from the fact that the weak majorization $s(A) \prec_w s(B)$ for the singular value vectors $s(A), s(B)$ of matrices $A, B$ is equivalent to the inequality $|||A||| \le |||B|||$ for all symmetric norms (as summarized in Theorem 6.23). Therefore, the majorization method is of particular use to obtain various symmetric norm inequalities for matrices.

Section 6.4 further collects several majorization results (hence symmetric norm inequalities), mostly developed rather recently, for positive matrices involving concave or convex functions, or operator monotone functions, or certain matrix means. For instance, the symmetric norm inequalities of Golden-Thompson type and of its complementary type are presented.
6.1 Majorization of vectors

Let $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$ be vectors in $\mathbb{R}^n$. The decreasing rearrangement of $a$ is $a^\downarrow = (a_1^\downarrow, \ldots, a_n^\downarrow)$ and $b^\downarrow = (b_1^\downarrow, \ldots, b_n^\downarrow)$ is similarly defined. The majorization $a \prec b$ means that
$$\sum_{i=1}^k a_i^\downarrow \le \sum_{i=1}^k b_i^\downarrow \qquad (1 \le k \le n) \tag{6.1}$$
and the equality is required for $k = n$. The weak majorization $a \prec_w b$ is defined by the inequality (6.1), where the equality for $k = n$ is not required. The concepts were introduced by Hardy, Littlewood and Pólya.

The majorization $a \prec b$ is equivalent to the statement that $a$ is a convex combination of permutations of the components of the vector $b$. This can be written as
$$a = \sum_U \lambda_U\, Ub,$$
where the summation is over the $n\times n$ permutation matrices $U$ and $\lambda_U \ge 0$, $\sum_U \lambda_U = 1$. The $n\times n$ matrix $D = \sum_U \lambda_U U$ has the property that all entries are positive and the sums of rows and columns are 1. Such a matrix $D$ is called doubly stochastic. So $a = Db$. The proof is a part of the next theorem.

Theorem 6.1 The following conditions for $a, b \in \mathbb{R}^n$ are equivalent:

(1) $a \prec b$;

(2) $\sum_{i=1}^n |a_i - r| \le \sum_{i=1}^n |b_i - r|$ for all $r \in \mathbb{R}$;

(3) $\sum_{i=1}^n f(a_i) \le \sum_{i=1}^n f(b_i)$ for any convex function $f$ on an interval containing all $a_i, b_i$;

(4) $a$ is a convex combination of coordinate permutations of $b$;

(5) $a = Db$ for some doubly stochastic $n\times n$ matrix $D$.
Proof: (1) $\Rightarrow$ (4). We show that there exist a finite number of matrices $D_1, \ldots, D_N$ of the form $\lambda I + (1-\lambda)\Pi$, where $0 \le \lambda \le 1$ and $\Pi$ is a permutation matrix interchanging two coordinates only, such that $a = D_N \cdots D_1 b$. Then (4) follows because $D_N \cdots D_1$ becomes a convex combination of permutation matrices. We may assume that $a_1 \ge \cdots \ge a_n$ and $b_1 \ge \cdots \ge b_n$. Suppose $a \ne b$ and choose the largest $j$ such that $a_j < b_j$. Then there exists a $k$ with $k > j$ such that $a_k > b_k$. Choose the smallest such $k$. Let $\lambda_1 := 1 - \min\{b_j - a_j,\ a_k - b_k\}/(b_j - b_k)$ and $\Pi_1$ be the permutation matrix interchanging the $j$th and $k$th coordinates. Then $0 < \lambda_1 < 1$ since $b_j > a_j \ge a_k > b_k$. Define $D_1 := \lambda_1 I + (1-\lambda_1)\Pi_1$ and $b^{(1)} := D_1 b$. Now it is easy to check that $a \prec b^{(1)} \prec b$ and $b^{(1)}_1 \ge \cdots \ge b^{(1)}_n$. Moreover the $j$th or the $k$th coordinates of $a$ and $b^{(1)}$ are equal. When $a \ne b^{(1)}$, we can apply the above argument to $a$ and $b^{(1)}$. Repeating finitely many times we reach the conclusion.

(4) $\Rightarrow$ (5) is trivial from the fact that any convex combination of permutation matrices is doubly stochastic.

(5) $\Rightarrow$ (2). For every $r \in \mathbb{R}$ we have
$$\sum_{i=1}^n |a_i - r| = \sum_{i=1}^n \Big|\sum_{j=1}^n D_{ij}(b_j - r)\Big| \le \sum_{i,j=1}^n D_{ij}|b_j - r| = \sum_{j=1}^n |b_j - r|.$$

(2) $\Rightarrow$ (1). Taking large $r$ and small $r$ in the inequality of (2) we have $\sum_{i=1}^n a_i = \sum_{i=1}^n b_i$. Noting that $|x| + x = 2x_+$ for $x \in \mathbb{R}$, where $x_+ = \max\{x, 0\}$, we have
$$\sum_{i=1}^n (a_i - r)_+ \le \sum_{i=1}^n (b_i - r)_+ \qquad (r \in \mathbb{R}). \tag{6.2}$$
Now we prove that (6.2) implies that $a \prec_w b$. When $b_k^\downarrow \ge r \ge b_{k+1}^\downarrow$, $\sum_{i=1}^k a_i^\downarrow \le \sum_{i=1}^k b_i^\downarrow$ follows since
$$\sum_{i=1}^n (a_i - r)_+ \ge \sum_{i=1}^k (a_i^\downarrow - r)_+ \ge \sum_{i=1}^k a_i^\downarrow - kr, \qquad \sum_{i=1}^n (b_i - r)_+ = \sum_{i=1}^k b_i^\downarrow - kr.$$

(4) $\Rightarrow$ (3). Suppose that $a_i = \sum_{k=1}^N \lambda_k b_{\pi_k(i)}$, $1 \le i \le n$, where $\lambda_k > 0$, $\sum_{k=1}^N \lambda_k = 1$, and $\pi_k$ are permutations on $\{1, \ldots, n\}$. Then the convexity of $f$ implies that
$$\sum_{i=1}^n f(a_i) \le \sum_{i=1}^n \sum_{k=1}^N \lambda_k f(b_{\pi_k(i)}) = \sum_{i=1}^n f(b_i).$$

(3) $\Rightarrow$ (2) is trivial since $f(x) = |x - r|$ is convex. $\square$

Note that the implication (5) $\Rightarrow$ (4) is seen directly from the well-known theorem of Birkhoff saying that any doubly stochastic matrix is a convex combination of permutation matrices [27].
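A small utility (not from the text) that checks the majorization $a \prec b$ and the weak majorization $a \prec_w b$ directly from the partial-sum definition (6.1); the function names are ad hoc.

```python
import numpy as np

def weakly_majorized(a, b, tol=1e-12):
    # a <_w b : every partial sum of the decreasing rearrangements is dominated
    a_sorted = np.sort(a)[::-1]
    b_sorted = np.sort(b)[::-1]
    return np.all(np.cumsum(a_sorted) <= np.cumsum(b_sorted) + tol)

def majorized(a, b, tol=1e-12):
    # a < b : weak majorization plus equal total sums
    return weakly_majorized(a, b, tol) and abs(np.sum(a) - np.sum(b)) <= tol

b = np.array([0.6, 0.3, 0.1])
D = np.array([[0.50, 0.25, 0.25],     # a doubly stochastic matrix
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
a = D @ b
print(majorized(a, b), majorized(b, a))  # True False
```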
Example 6.2 Let $D_{AB} \in M_n \otimes M_m$ be a density matrix which is a convex combination of tensor products of density matrices: $D_{AB} = \sum_i \lambda_i D_i^A \otimes D_i^B$. We assume that the matrices $D_i^A$ are acting on the Hilbert space $\mathcal{H}_A$ and $D_i^B$ acts on $\mathcal{H}_B$.

The eigenvalues of $D_{AB}$ form a probability vector $r = (r_1, r_2, \ldots, r_{nm})$. The reduced density matrix $D_A = \sum_i \lambda_i(\mathrm{Tr}\, D_i^B)D_i^A$ has $n$ eigenvalues and we add $nm - n$ zeros to get a probability vector $q = (q_1, q_2, \ldots, q_{nm})$. We want to show that there is a doubly stochastic matrix $S$ which transforms $q$ into $r$. This means $r \prec q$.

Let
$$D_{AB} = \sum_k r_k |e_k\rangle\langle e_k| = \sum_j p_j\, |x_j\rangle\langle x_j| \otimes |y_j\rangle\langle y_j|$$
be decompositions of a density matrix in terms of unit vectors $|e_k\rangle \in \mathcal{H}_A \otimes \mathcal{H}_B$, $|x_j\rangle \in \mathcal{H}_A$ and $|y_j\rangle \in \mathcal{H}_B$. The first decomposition is the Schmidt decomposition and the second one is guaranteed by the assumed separability condition. For the reduced density $D_A$ we have the Schmidt decomposition and another one:
$$D_A = \sum_l q_l\, |f_l\rangle\langle f_l| = \sum_j p_j\, |x_j\rangle\langle x_j|,$$
where $\{f_l\}$ is an orthonormal family in $\mathcal{H}_A$. According to Lemma 1.24 we have two unitary matrices $V$ and $W$ such that
$$\sum_j V_{kj}\sqrt{p_j}\, |x_j\rangle\otimes|y_j\rangle = \sqrt{r_k}\, |e_k\rangle, \qquad \sum_l W_{jl}\sqrt{q_l}\, |f_l\rangle = \sqrt{p_j}\, |x_j\rangle.$$
Combine these equations to have
$$\sum_j V_{kj}\sum_l W_{jl}\sqrt{q_l}\, |f_l\rangle\otimes|y_j\rangle = \sqrt{r_k}\, |e_k\rangle$$
and take the squared norm:
$$r_k = \sum_l \Big(\sum_{j_1, j_2} \overline{V_{kj_1}}\,V_{kj_2}\,\overline{W_{j_1l}}\,W_{j_2l}\,\langle y_{j_1}, y_{j_2}\rangle\Big)\, q_l.$$
Introduce a matrix
$$S_{kl} = \sum_{j_1, j_2} \overline{V_{kj_1}}\,V_{kj_2}\,\overline{W_{j_1l}}\,W_{j_2l}\,\langle y_{j_1}, y_{j_2}\rangle$$
and verify that it is doubly stochastic. $\square$
The weak majorization $a \prec_w b$ is defined by the inequality (6.1). A matrix $S$ is called a doubly substochastic $n\times n$ matrix if $\sum_{j=1}^n S_{ij} \le 1$ for $1 \le i \le n$ and $\sum_{i=1}^n S_{ij} \le 1$ for $1 \le j \le n$.

The previous theorem was about majorization and the next one is about weak majorization.

Theorem 6.3 The following conditions for $a, b \in \mathbb{R}^n$ are equivalent:

(1) $a \prec_w b$;

(2) there exists a $c \in \mathbb{R}^n$ such that $a \le c \prec b$, where $a \le c$ means that $a_i \le c_i$, $1 \le i \le n$;

(3) $\sum_{i=1}^n (a_i - r)_+ \le \sum_{i=1}^n (b_i - r)_+$ for all $r \in \mathbb{R}$;

(4) $\sum_{i=1}^n f(a_i) \le \sum_{i=1}^n f(b_i)$ for any increasing convex function $f$ on an interval containing all $a_i, b_i$.

Moreover, if $a, b \ge 0$, then the above conditions are equivalent to the next one:

(5) $a = Sb$ for some doubly substochastic $n\times n$ matrix $S$.

Proof: (1) $\Rightarrow$ (2). By induction on $n$. We may assume that $a_1 \ge \cdots \ge a_n$ and $b_1 \ge \cdots \ge b_n$. Let $\alpha := \min_{1\le k\le n}\big(\sum_{i=1}^k b_i - \sum_{i=1}^k a_i\big)$ and define $\tilde a := (a_1 + \alpha, a_2, \ldots, a_n)$. Then $a \le \tilde a \prec_w b$ and $\sum_{i=1}^k \tilde a_i = \sum_{i=1}^k b_i$ for some $1 \le k \le n$. When $k = n$, $a \le \tilde a \prec b$. When $k < n$, we have $(\tilde a_1, \ldots, \tilde a_k) \prec (b_1, \ldots, b_k)$ and $(\tilde a_{k+1}, \ldots, \tilde a_n) \prec_w (b_{k+1}, \ldots, b_n)$. Hence the induction assumption implies that $(\tilde a_{k+1}, \ldots, \tilde a_n) \le (c_{k+1}, \ldots, c_n) \prec (b_{k+1}, \ldots, b_n)$ for some $(c_{k+1}, \ldots, c_n) \in \mathbb{R}^{n-k}$. Then $a \le (\tilde a_1, \ldots, \tilde a_k, c_{k+1}, \ldots, c_n) \prec b$ is immediate from $\tilde a_k \ge b_k \ge b_{k+1} \ge c_{k+1}$.

(2) $\Rightarrow$ (4). Let $a \le c \prec b$. If $f$ is increasing and convex on an interval $[\alpha, \beta]$ containing $a_i, b_i$, then $c_i \in [\alpha, \beta]$ and
$$\sum_{i=1}^n f(a_i) \le \sum_{i=1}^n f(c_i) \le \sum_{i=1}^n f(b_i)$$
by Theorem 6.1.

(4) $\Rightarrow$ (3) is trivial and (3) $\Rightarrow$ (1) was already shown in the proof of (2) $\Rightarrow$ (1) of Theorem 6.1.

Now assume $a, b \ge 0$ and prove that (2) $\Leftrightarrow$ (5). If $a \le c \prec b$, then we have, by Theorem 6.1, $c = Db$ for some doubly stochastic matrix $D$ and $a_i = \alpha_i c_i$ for some $0 \le \alpha_i \le 1$. So $a = \mathrm{Diag}(\alpha_1, \ldots, \alpha_n)Db$ and $\mathrm{Diag}(\alpha_1, \ldots, \alpha_n)D$ is a doubly substochastic matrix. Conversely if $a = Sb$ for a doubly substochastic matrix $S$, then a doubly stochastic matrix $D$ exists so that $S \le D$ entrywise, whose proof is left for Exercise 1, and hence $a \le Db \prec b$. $\square$
Example 6.4 Let $a, b \in \mathbb{R}^n$ and $f$ be a convex function on an interval containing all $a_i, b_i$. We use the notation $f(a) := (f(a_1), \ldots, f(a_n))$ and similarly $f(b)$. Assume that $a \prec b$. Since $f$ is a convex function, so is $(f(x) - r)_+$ for any $r \in \mathbb{R}$. Hence $f(a) \prec_w f(b)$ follows from Theorems 6.1 and 6.3.

Next assume that $a \prec_w b$ and $f$ is an increasing convex function; then $f(a) \prec_w f(b)$ can be proved similarly. $\square$
Let $a, b \in \mathbb{R}^n$ and $a, b \ge 0$. We define the weak log-majorization $a \prec_{w(\log)} b$ when
$$\prod_{i=1}^k a_i^\downarrow \le \prod_{i=1}^k b_i^\downarrow \qquad (1 \le k \le n) \tag{6.3}$$
and the log-majorization $a \prec_{(\log)} b$ when $a \prec_{w(\log)} b$ and equality holds for $k = n$ in (6.3). It is obvious that if $a$ and $b$ are strictly positive, then $a \prec_{(\log)} b$ (resp., $a \prec_{w(\log)} b$) if and only if $\log a \prec \log b$ (resp., $\log a \prec_w \log b$), where $\log a := (\log a_1, \ldots, \log a_n)$.

Theorem 6.5 Let $a, b \in \mathbb{R}^n$ with $a, b \ge 0$ and assume that $a \prec_{w(\log)} b$. If $f$ is a continuous increasing function on $[0, \infty)$ such that $f(e^x)$ is convex, then $f(a) \prec_w f(b)$. In particular, $a \prec_{w(\log)} b$ implies $a \prec_w b$.

Proof: First assume that $a, b \in \mathbb{R}^n$ are strictly positive and $a \prec_{w(\log)} b$, so that $\log a \prec_w \log b$. Since $g \circ h$ is convex when $g$ and $h$ are convex with $g$ increasing, the function $(f(e^x) - r)_+$ is increasing and convex for any $r \in \mathbb{R}$. Hence by Theorem 6.3 we have
$$\sum_{i=1}^n (f(a_i) - r)_+ \le \sum_{i=1}^n (f(b_i) - r)_+,$$
which implies $f(a) \prec_w f(b)$ by Theorem 6.3 again. When $a, b \ge 0$ and $a \prec_{w(\log)} b$, we can choose $a^{(m)}, b^{(m)} > 0$ such that $a^{(m)} \prec_{w(\log)} b^{(m)}$, $a^{(m)} \to a$, and $b^{(m)} \to b$. Since $f(a^{(m)}) \prec_w f(b^{(m)})$ and $f$ is continuous, we obtain $f(a) \prec_w f(b)$. $\square$
6.2 Singular values

In this section we discuss the majorization theory for eigenvalues and singular values of matrices. Our goal is to prove the Lidskii-Wielandt and the Gelfand-Naimark theorems for singular values of matrices. These are the most fundamental majorizations for matrices.

When $A$ is self-adjoint, the vector of the eigenvalues of $A$ in decreasing order with counting multiplicities is denoted by $\lambda(A)$. The majorization relation of self-adjoint matrices appears also in quantum theory.

Example 6.6 In quantum theory the states are described by density matrices; they are positive with trace 1. Let $D_1$ and $D_2$ be density matrices. The relation $\lambda(D_1) \prec \lambda(D_2)$ has the interpretation that $D_1$ is more mixed than $D_2$. Among the $n\times n$ density matrices the most mixed has all eigenvalues $1/n$.

Let $f: \mathbb{R}^+ \to \mathbb{R}^+$ be an increasing convex function with $f(0) = 0$. We show that
$$\lambda(D) \prec \lambda\big(f(D)/\mathrm{Tr}\, f(D)\big) \tag{6.4}$$
for a density matrix $D$.

Set $\lambda(D) = (\lambda_1, \lambda_2, \ldots, \lambda_n)$. Under the hypothesis on $f$ the inequality $f(y)x \ge f(x)y$ holds for $0 \le x \le y$. Hence for $i \le j$ we have $\lambda_j f(\lambda_i) \ge \lambda_i f(\lambda_j)$ and
$$\big(f(\lambda_1) + \cdots + f(\lambda_k)\big)(\lambda_{k+1} + \cdots + \lambda_n) \ge (\lambda_1 + \cdots + \lambda_k)\big(f(\lambda_{k+1}) + \cdots + f(\lambda_n)\big).$$
Adding to both sides the term $\big(f(\lambda_1) + \cdots + f(\lambda_k)\big)(\lambda_1 + \cdots + \lambda_k)$ we arrive at
$$\big(f(\lambda_1) + \cdots + f(\lambda_k)\big)\sum_{i=1}^n \lambda_i \ge (\lambda_1 + \cdots + \lambda_k)\sum_{i=1}^n f(\lambda_i).$$
This shows that the sum of the $k$ largest eigenvalues of $f(D)/\mathrm{Tr}\, f(D)$ must exceed that of $D$ (which is $\lambda_1 + \cdots + \lambda_k$).

The canonical (Gibbs) state at inverse temperature $\beta = (kT)^{-1}$ possesses the density $e^{-\beta H}/\mathrm{Tr}\, e^{-\beta H}$. Choosing $f(x) = x^{\beta'/\beta}$ with $\beta' > \beta$, the formula (6.4) tells us that
$$e^{-\beta H}/\mathrm{Tr}\, e^{-\beta H} \prec e^{-\beta' H}/\mathrm{Tr}\, e^{-\beta' H},$$
that is, at higher temperature the canonical density is more mixed. $\square$
Let $\mathcal{H}$ be an $n$-dimensional Hilbert space and $A \in B(\mathcal{H})$. Let $s(A) = (s_1(A), \ldots, s_n(A))$ denote the vector of the singular values of $A$ in decreasing order, i.e., $s_1(A) \ge \cdots \ge s_n(A)$ are the eigenvalues of $|A| = (A^*A)^{1/2}$ with counting multiplicities.

The basic properties of the singular values are summarized in the next theorem. Recall that $\|\cdot\|$ is the operator norm. The next theorem includes the definition of the minimax expression, see Theorem 1.27.

Theorem 6.7 Let $A, B, X, Y \in B(\mathcal{H})$ and $k, m \in \{1, \ldots, n\}$. Then

(1) $s_1(A) = \|A\|$.

(2) $s_k(\lambda A) = |\lambda|\, s_k(A)$ for $\lambda \in \mathbb{C}$.

(3) $s_k(A) = s_k(A^*)$.

(4) Minimax expression:
$$s_k(A) = \min\{\|A(I-P)\| : P \text{ is a projection, } \mathrm{rank}\, P = k-1\}. \tag{6.5}$$
If $A \ge 0$ then
$$s_k(A) = \min\big\{\max\{\langle x, Ax\rangle : x \in \mathcal{M}^\perp,\ \|x\| = 1\} : \mathcal{M} \text{ is a subspace of } \mathcal{H},\ \dim\mathcal{M} = k-1\big\}. \tag{6.6}$$

(5) Approximation number expression:
$$s_k(A) = \inf\{\|A - X\| : X \in B(\mathcal{H}),\ \mathrm{rank}\, X < k\}. \tag{6.7}$$

(6) If $0 \le A \le B$ then $s_k(A) \le s_k(B)$.

(7) $s_k(XAY) \le \|X\|\,\|Y\|\, s_k(A)$.

(8) $s_{k+m-1}(A+B) \le s_k(A) + s_m(B)$ if $k + m - 1 \le n$.

(9) $s_{k+m-1}(AB) \le s_k(A)\, s_m(B)$ if $k + m - 1 \le n$.

(10) $|s_k(A) - s_k(B)| \le \|A - B\|$.

(11) $s_k(f(A)) = f(s_k(A))$ if $A \ge 0$ and $f: \mathbb{R}^+ \to \mathbb{R}^+$ is an increasing function.
Proof: First, recall basic decompositions of A B(1). Let A = U[A[ be
the polar decomposition of A and we write the Schmidt decomposition of [A[
as
[A[ =
n

i=1
s
i
(A)[u
i
u
i
[,
where U is a unitary and u
1
, . . . , u
n
is an orthonormal basis of 1. From the
polar decomposition of A and the diagonalization of [A[ one has the expression
A = U
1
Diag(s
1
(A), . . . , s
n
(A))U
2
(6.8)
with unitaries U
1
, U
2
B(1), called the singular value decomposition of
A, see Theorem 1.46.
(1) follows since s
1
(A) = | [A[ | = |A|. (2) is clear from [A[ = [[ [A[.
Also, (3) immediately follows since the Schmidt decomposition of [A

[ is given
as
[A

[ = U[A[U

=
n

i=1
s
i
(A)[Uu
i
Uu
i
[.
(4) Let
k
be the right-hand side of (6.5). For 1 k n dene P
k
:=

k
i=1
[u
i
u
i
[, which is a projection of rank k. We have

k
|A(I P
k1
)| =
_
_
_
_
n

i=k
s
i
(A)[u
i
u
i
[
_
_
_
_
= s
k
(A).
Conversely, for any > 0 choose a projection P with rank P = k 1 such
that |A(I P)| <
k
+. Then there exists a y 1 with |y| = 1 such that
P
k
y = y but Py = 0. Since y =

k
i=1
u
i
, yu
i
, we have

k
+ > | [A[(I P)y| = | [A[y| =
_
_
_
_
k

i=1
u
i
, ys
i
(A)u
i
_
_
_
_
=
_
k

i=1
[u
i
, y[
2
s
i
(A)
2
_
1/2
s
k
(A).
Hence s
k
(A) =
k
and the inmum
k
is attained by P = P
k1
.
Although the second expression (6.6) is included in Theorem 1.27, we give
the proof for convenience. When A 0, we have
s
k
(A) = s
k
(A
1/2
)
2
= min|A
1/2
(IP)|
2
: P is a projection, rank P = k 1.
Since |A
1/2
(I P)|
2
= max
xM

, x=1
x, Ax with / := ranP, the latter
expression follows.
240 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
(5) Let
k
be the right-hand side of (6.7). Let X := AP
k1
, where P
k1
is as in the above proof of (1). Then we have rank X rank P
k1
= k 1 so
that
k
|A(I P
k1
)| = s
k
(A). Conversely, assume that X B(1) has
rank < k. Since rank X = rank [X[ = rank X

, the projection P onto ranX

has rank < k. Then X(I P) = 0 and by (6.5) we have


s
k
(A) |A(I P)| = |(AX)(I P)| |AX|,
implying that s
k
(A)
k
. Hence s
k
(A) =
k
and the inmum
k
is attained
by AP
k1
.
(6) is an immediate consequence of (6.6). It is immediate from (6.5) that
s
n
(XA) |X|s
n
(A). Also s
n
(AY ) = s
n
(Y

A

) |Y |s
n
(A) by (3). Hence
(7) holds.
Next we show (8)(10). By (6.7) there exist X, Y B(1) with rank X <
k, rank Y < m such that |A X| = s
k
(A) and |B Y | = s
m
(B). Since
rank (X + Y ) rank X + rank Y < k + m1, we have
s
k+m1
(A+ B) |(A+ B) (X + Y )| < s
k
(A) + s
m
(B),
implying (8). For Z := XB + (A X)Y we get
rank Z rank X + rank Y < k + m1,
|AB Z| = |(AX)(B Y )| s
k
(A)s
m
(B).
These imply (9). Letting m = 1 and replacing B by B A in (8) we get
s
k
(B) s
k
(A) +|B A|,
which shows (10).
(11) When A 0 has the Schmidt decomposition A =

n
i=1
s
i
(A)[u
i
u
i
[,
we have f(A) =

n
i=1
f(s
i
(A))[u
i
u
i
[. Since f(s
1
(A)) f(s
n
(A)) 0,
s
k
(f(A)) = f(s
k
(A)) follows.
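As a quick numerical illustration of Theorem 6.7 (not from the text), the following sketch checks properties (8) and (10) on random complex matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def s(M):
    # singular values in decreasing order
    return np.linalg.svd(M, compute_uv=False)

sA, sB, sAB = s(A), s(B), s(A + B)

# (10): |s_k(A) - s_k(B)| <= ||A - B|| (operator norm)
print(np.all(np.abs(sA - sB) <= np.linalg.norm(A - B, 2) + 1e-10))

# (8): s_{k+m-1}(A + B) <= s_k(A) + s_m(B) whenever k + m - 1 <= n
ok = all(sAB[k + m - 2] <= sA[k - 1] + sB[m - 1] + 1e-10
         for k in range(1, n + 1) for m in range(1, n + 1) if k + m - 1 <= n)
print(ok)
```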
The next result is called the Weyl majorization theorem and we can
see the usefulness of the antisymmetric tensor technique.
Theorem 6.8 Let A M
n
and
1
(A), ,
n
(A) be the eigenvalues of A
arranged as [
1
(A)[ [
n
(A)[ with counting algebraic multiplicities.
Then
k

i=1
[
i
(A)[
k

i=1
s
i
(A) (1 k n).
6.2. SINGULAR VALUES 241
Proof: If is an eigenvalue of A with algebraic multiplicity m, then there
exists a set y
1
, . . . , y
m
of independent vectors such that
Ay
j
y
j
spany
1
, . . . , y
j1
(1 j m).
Hence one can choose independent vectors x
1
, . . . , x
n
so that Ax
i
=
i
(A)x
i
+
z
i
with z
i
spanx
1
, . . . , x
i1
for 1 i n. Then it is readily checked that
A
k
(x
1
x
k
) = Ax
1
Ax
k
=
_
k

i=1

i
(A)
_
x
1
x
k
and x
1
x
k
,= 0, implying that

k
i=1

i
(A) is an eigenvalue of A
k
. Hence
Lemma 1.62 yields that

i=1

i
(A)

|A
k
| =
k

i=1
s
i
(A).

Note that another formulation of the previous theorem is


([
1
(A)[, . . . , [
n
(A)[)
w(log)
s(A).
The following majorization results are the celebrated Lidskii-Wielandt
theorem for the eigenvalues of self-adjoint matrices as well as for the singular
values of general matrices.
Theorem 6.9 If A, B M
sa
n
, then
(A) (B) (AB),
or equivalently
(
i
(A) +
ni+1
(B)) (A+ B).
Proof: What we need to prove is that for any choice of 1 i
1
< i
2
< <
i
k
n we have
k

j=1
(
i
j
(A)
i
j
(B))
k

j=1

j
(AB). (6.9)
Choose the Schmidt decomposition of AB as
AB =
n

i=1

i
(AB)[u
i
u
i
[
242 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
with an orthonormal basis u
1
, . . . , u
n
of C
n
. We may assume without loss of
generality that
k
(AB) = 0. In fact, we may replace B by B+
k
(AB)I,
which reduces both sides of (6.9) by k
k
(AB). In this situation, the Jordan
decomposition AB = (A B)
+
(AB)

is given as
(AB)
+
=
k

i=1

i
(AB)[u
i
u
i
[, (AB)

=
n

i=k+1

i
(AB)[u
i
u
i
[.
Since A = B+(AB)
+
(AB)

B+(AB)
+
, it follows from Theorem
1.27 that

i
(A)
i
(B + (AB)
+
), 1 i n.
Since B B + (A B)
+
, we also have

i
(B)
i
(B + (AB)
+
), 1 i n.
Hence
k

j=1
(
i
j
(A)
i
j
(B))
k

j=1
(
i
j
(B + (AB)
+
)
i
j
(B))

i=1
(
i
(B + (AB)
+
)
i
(B))
= Tr (B + (AB)
+
) Tr B
= Tr (A B)
+
=
k

j=1

j
(AB),
proving (6.9). Moreover,
n

i=1
(
i
(A)
i
(B)) = Tr (A B) =
n

i=1

i
(AB).
The latter expression is obvious since
i
(B) =
ni+1
(B) for 1 i n.

Theorem 6.10 For every A, B M


n
[s(A) s(B)[
w
s(AB)
holds, that is,
k

j=1
[s
i
j
(A) s
i
j
(B)[
k

j=1
s
j
(AB)
for any choice of 1 i
1
< i
2
< < i
k
n.
6.2. SINGULAR VALUES 243
Proof: Dene
A :=
_
0 A

A 0
_
, B :=
_
0 B

B 0
_
.
Since
A

A =
_
A

A 0
0 AA

_
, [A[ =
_
[A[ 0
0 [A

[
_
,
it follows from Theorem 6.7 (3) that
s(A) = (s
1
(A), s
1
(A), s
2
(A), s
2
(A), . . . , s
n
(A), s
n
(A)).
On the other hand, since
_
I 0
0 I
_
A
_
I 0
0 I
_
= A,
we have
i
(A) =
i
(A) =
2ni+1
(A) for n i 2n. Hence one can
write
(A) = (
1
, . . . ,
n
,
n
, . . . ,
1
),
where
1
. . .
n
0. Since
s(A) = ([A[) = (
1
,
1
,
2
,
2
, . . . ,
n
,
n
),
we have
i
= s
i
(A) for 1 i n and hence
(A) = (s
1
(A), . . . , s
n
(A), s
n
(A), . . . , s
1
(A)).
Similarly,
(B) = (s
1
(B), . . . , s
n
(B), s
n
(B), . . . , s
1
(B)),
(AB) = (s
1
(AB), . . . , s
n
(AB), s
n
(AB), . . . , s
1
(AB)).
Theorem 6.9 implies that
(A) (B) (AB).
Now we note that the components of (A) (B) are
[s
1
(A) s
1
(B)[, . . . , [s
n
(A) s
n
(B)[, [s
1
(A) s
1
(B)[, . . . , [s
n
(A) s
n
(B)[.
Therefore, for any choice of 1 i
1
< i
2
< < i
k
n with 1 k n, we
have
k

j=1
[s
i
j
(A) s
i
j
(B)[
k

i=1

i
(AB) =
k

j=1
s
j
(AB),
the proof is complete.
The following results due to Ky Fan are consequences of the above theo-
rems, which are weaker versions of the Lidskii-Wielandt theorem.
244 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
Corollary 6.11 If A, B M
sa
n
, then
(A+ B) (A) + (B).
Proof: Apply Theorem 6.9 to A + B and B. Then
k

i=1
_

i
(A + B)
i
(B)
_

i=1

i
(A)
so that
k

i=1

i
(A+ B)
k

i=1
_

i
(A) +
i
(B)
_
.
Moreover,

n
i=1

i
(A+ B) = Tr (A+ B) =

n
i=1
(
i
(A) +
i
(B)).
Corollary 6.12 If A, B M
n
, then
s(A+ B)
w
s(A) + s(B).
Proof: Similarly, by Theorem 6.10,
k

i=1
[s
i
(A+ B) s
i
(B)[
k

i=1
s
i
(A)
so that
k

i=1
s
i
(A+ B)
k

i=1
_
s
i
(A) + s
i
(B)
_
.

Another important majorization for singular values of matrices is the


Gelfand-Naimark theorem as follows.
Theorem 6.13 For every A, B M
n
(s
i
(A)s
ni+1
(B))
(log)
s(AB), (6.10)
holds, or equivalently
k

j=1
s
i
j
(AB)
k

j=1
(s
j
(A)s
i
j
(B)) (6.11)
for every 1 i
1
< i
2
< < i
k
n with equality for k = n.
6.2. SINGULAR VALUES 245
Proof: First assume that A and B are invertible matrices and let A =
U
1
Diag(s
1
, . . . , s
n
)U
2
be the singular value decomposition (see (6.8)) with
the singular values s
1
s
n
> 0 of A and unitaries U
1
, U
2
. Write
D := Diag(s
1
, . . . , s
n
). Then s(AB) = s(U
1
DU
2
B) = s(DU
2
B) and s(B) =
s(U
2
B), so we may replace A, B by D, U
2
B, respectively. Hence we may
assume that A = D = Diag(s
1
, . . . , s
n
). Moreover, to prove (6.11), it suces
to assume that s
k
= 1. In fact, when A is replaced by s
1
k
A, both sides of
(6.11) are multiplied by same s
k
k
. Dene

A := Diag(s
1
, . . . , s
k
, 1, . . . , 1); then

A
2
A
2
and

A
2
I. We notice that from Theorem 6.7 that we have
s
i
(AB) = s
i
((B

A
2
B)
1/2
) = s
i
(B

A
2
B)
1/2
s
i
(B


A
2
B)
1/2
= s
i
(

AB)
for every i = 1, . . . , n and
s
i
(

AB) = s
i
(B


A
2
B)
1/2
s
i
(B

B)
1/2
= s
i
(B).
Therefore, for any choice of 1 i
1
< < i
k
n, we have
k

j=1
s
i
j
(AB)
s
i
j
(B)

k

j=1
s
i
j
(

AB)
s
i
j
(B)

n

i=1
s
i
(

AB)
s
i
(B)
=
det [

AB[
det [B[
=
_
det(B

A
2
B)
_
det(B

B)
=
det

A [ det B[
[ det B[
= det

A =
k

j=1
s
j
(A),
proving (6.11). By replacing A and B by AB and B
1
, respectively, (6.11) is
rephrased as
k

j=1
s
i
j
(A)
k

j=1
_
s
j
(AB)s
i
j
(B
1
)
_
.
Since s
i
(B
1
) = s
ni+1
(B)
1
for 1 i n as readily veried, the above
inequality means that
k

j=1
_
s
i
j
(A)s
ni
j
+1
(B)
_

j=1
s
j
(AB).
Hence (6.11) implies (6.10) and vice versa (as long as A, B are invertible).
For general A, B M
n
choose a sequence of complex numbers
l
C
((A) (B)) such that
l
0. Since A
l
:= A
l
I and B
l
:= B
l
I are
invertible, (6.10) and (6.11) hold for those. Then s
i
(A
l
) s
i
(A), s
i
(B
l
)
s
i
(B) and s
i
(A
l
B
l
) s
i
(AB) as l for 1 i n. Hence (6.10) and
(6.11) hold for general A, B.
246 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
An immediate corollary of this theorem is the majorization result due to
Horn.
Corollary 6.14 For every matrices A and B,
s(AB)
(log)
s(A)s(B),
where s(A)s(B) = (s
i
(A)s
i
(B)).
Proof: A special case of (6.11) is
k

i=1
s
i
(AB)
k

i=1
_
s
i
(A)s
i
(B)
_
for every k = 1, . . . , n. Moreover,
n

i=1
s
i
(AB) = det [AB[ = det [A[ det [B[ =
n

i=1
_
s
i
(A)s
i
(B)
_
.
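A small numerical check (not from the text) of Corollary 6.14: the singular values of $AB$ are log-majorized by the products $s_i(A)s_i(B)$, with equality of the full products since both equal $|\det(AB)|$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

sA = np.linalg.svd(A, compute_uv=False)
sB = np.linalg.svd(B, compute_uv=False)
sAB = np.linalg.svd(A @ B, compute_uv=False)

lhs = np.cumprod(sAB)       # prod_{i<=k} s_i(AB)
rhs = np.cumprod(sA * sB)   # prod_{i<=k} s_i(A) s_i(B)
print(np.all(lhs <= rhs * (1 + 1e-10)), np.isclose(lhs[-1], rhs[-1]))  # True True
```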

6.3 Symmetric norms


A norm : R
n
R
+
is said to be symmetric if
(a
1
, a
2
, . . . , a
n
) = (
1
a
(1)
,
2
a
(2)
, . . . ,
n
a
(n)
) (6.12)
for every (a
1
, . . . , a
n
) R
n
, for any permutation on 1, . . . , n and
i
= 1.
The normalization is (1, 0, . . . , 0) = 1. Condition (6.12) is equivalently
written as
(a) = (a

1
, a

2
, . . . , a

n
)
for a = (a
1
, . . . , a
n
) R
n
, where (a

1
, . . . , a

n
) is the decreasing rearrangement
of ([a
1
[, . . . , [a
n
[). A symmetric norm is often called a symmetric gauge
function.
Typical examples of symmetric gauge functions on R
n
are the
p
-norms

p
dened by

p
(a) :=
_

_
_

n
i=1
[a
i
[
p
_
1/p
if 1 p < ,
max[a
i
[ : 1 i n if p = .
(6.13)
The next lemma characterizes the minimal and maximal normalized sym-
metric norms.
6.3. SYMMETRIC NORMS 247
Lemma 6.15 Let be a normalized symmetric norm on R
n
. If a = (a
i
),
b = (b
i
) R
n
and [a
i
[ [b
i
[ for 1 i n, then (a) (b). Moreover,
max
1in
[a
i
[ (a)
n

i=1
[a
i
[ (a = (a
i
) R
n
),
which means


1
.
Proof: In view of (6.12) we may show that
(a
1
, a
2
, . . . , a
n
) (a
1
, a
2
, . . . , a
n
) for 0 1.
This is seen as follows:
(a
1
, a
2
, . . . , a
n
)
=
_
1 +
2
a
1
+
1
2
(a
1
),
1 +
2
a
2
+
1
2
a
2
, . . . ,
1 +
2
a
n
+
1
2
a
n
_

1 +
2
(a
1
, a
2
, . . . , a
n
) +
1
2
(a
1
, a
2
, . . . , a
n
)
= (a
1
, a
2
, . . . , a
n
).
(6.12) and the previous inequality imply that
[a
i
[ = (a
i
, 0, . . . , 0) (a).
This means

. From
(a)
n

i=1
(a
i
, 0, . . . , 0) =
n

i=1
[a
i
[
we have
1
.
Lemma 6.16 If a = (a
i
), b = (b
i
) R
n
and ([a
1
[, . . . , [a
n
[)
w
([b
1
[, . . . , [b
n
[),
then (a) (b).
Proof: Theorem 6.3 gives that there exists a c R
n
such that
([a
1
[, . . . , [a
n
[) c ([b
1
[, . . . , [b
n
[).
Theorem 6.1 says that c is a convex combination of coordinate permutations
of ([b
1
[, . . . , [b
n
[). Lemma 6.15 and (6.12) imply that (a) (c) (b).
Let 1 be an n-dimensional Hilbert space. A norm [[[ [[[ on B(1) is said
to be unitarily invariant if
[[[UAV [[[ = [[[A[[[
248 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
for all A B(1) and all unitaries U, V B(1). A unitarily invariant norm
on B(1) is also called a symmetric norm. The following fundamental
theorem is due to von Neumann.
Theorem 6.17 There is a bijective correspondence between symmetric gauge
functions on R
n
and unitarily invariant norms [[[ [[[ on B(1) determined
by the formula
[[[A[[[ = (s(A)) (A B(1)). (6.14)
Proof: Assume that is a symmetric gauge function on R
n
. Dene [[[[[[ on
B(1) by the formula (6.14). Let A, B B(1). Since s(A+B)
w
s(A)+s(B)
by Corollary 6.12, it follows from Lemma 6.16 that
[[[A+ B[[[ = (s(A+ B)) (s(A) + s(B))
(s(A)) + (s(B)) = [[[A[[[ +[[[B[[[.
Also it is clear that [[[A[[[ = 0 if and only if s(A) = 0 or A = 0. For C
we have
[[[A[[[ = ([[s(A)) = [[ [[[A[[[
by Theorem 6.7. Hence [[[ [[[ is a norm on B(1), which is unitarily invariant
since s(UAV ) = s(A) for all unitaries U, V .
Conversely, assume that [[[ [[[ is a unitarily invariant norm on B(1).
Choose an orthonormal basis e
1
, . . . , e
n
of 1 and dene : R
n
R by
(a) :=

i=1
a
i
[e
i
e
i
[

(a = (a
i
) R
n
).
Then it is immediate to see that is a norm on R
n
. For any permutation
on 1, . . . , n and
i
= 1, one can dene unitaries U, V on 1 by Ue
(i)
=
i
e
i
and V e
(i)
= e
i
, 1 i n, so that
(a) =

U
_
n

i=1
a
(i)
[e
(i)
e
(i)
[
_
V

i=1
a
(i)
[Ue
(i)
V e
(i)
[

i=1

i
a
(i)
[e
i
e
i
[

= (
1
a
(1)
,
2
a
(2)
, . . . ,
n
a
(n)
).
Hence is a symmetric gauge function. For any A B(1) let A = U[A[ be
the polar decomposition of A and [A[ =

n
i=1
s
i
(A)[u
i
u
i
[ be the Schmidt
decomposition of [A[ with an orthonormal basis u
1
, . . . , u
n
. We have a
unitary V dened by V e
i
= u
i
, 1 i n. Since
A = U[A[ = UV
_
n

i=1
s
i
(A)[e
i
e
i
[
_
V

,
6.3. SYMMETRIC NORMS 249
we have
(s(A)) =

i=1
s
i
(A)[e
i
e
i
[

UV
_
n

i=1
s
i
(A)[e
i
e
i
[
_
V

= [[[A[[[,
and so (6.14) holds. Therefore, the theorem is proved.
The next theorem summarizes properties of unitarily invariant (or sym-
metric) norms on B(1).
Theorem 6.18 Let [[[ [[[ be a unitarily invariant norm on B(1) correspond-
ing to a symmetric gauge function on R
n
and A, B, X, Y B(1). Then
(1) [[[A[[[ = [[[A

[[[.
(2) [[[XAY [[[ |X| |Y | [[[A[[[.
(3) If s(A)
w
s(B), then [[[A[[[ [[[B[[[.
(4) Under the normalization we have |A| [[[A[[[ |A|
1
.
Proof: By the denition (6.14), (1) follows from Theorem 6.7. By Theorem
6.7 and Lemma 6.15 we have (2) as
[[[XAY [[[ = (s(XAY )) (|X| |Y |s(A)) = |X| |Y | [[[A[[[.
Moreover, (3) and (4) follow from Lemmas 6.16 and 6.15, respectively.
For instance, for 1 p , we have the unitarily invariant norm | |
p
on B(1) corresponding to the
p
-norm
p
in (6.13), that is, for A B(1),
|A|
p
:=
p
(s(A)) =
_

_
_

n
i=1
s
i
(A)
p
_
1/p
= (Tr [A[
p
)
1/p
if 1 p < ,
s
1
(A) = |A| if p = .
The norm | |
p
is called the Schatten-von Neumann p-norm. In partic-
ular, |A|
1
= Tr [A[ is the trace-norm, |A|
2
= (Tr A

A)
1/2
is the Hilbert-
Schmidt norm and |A|

= |A| is the operator norm. (For 0 < p < 1,


we may dene | |
p
by the same expression as above, but this is not a norm,
and is called quasi-norm.)
Another important class of unitarily invariant norms for n n matrices is
the Ky Fan norm | |
(k)
dened by
|A|
(k)
:=
k

i=1
s
i
(A) for k = 1, . . . , n.
250 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
Obviously, | |
(1)
is the operator norm and | |
(n)
is the trace-norm. In
the next theorem we give two variational expressions for the Ky Fan norms,
which are sometimes quite useful since the Ky Fan norms are essential in
majorization and norm inequalities for matrices.
The right-hand side of the second expression in the next theorem is known
as the K-functional in the real interpolation theory.
Theorem 6.19 Let 1 be an n-dimensional space. For A B(1) and k =
1, . . . , n, we have
(1) |A|
(k)
= max|AP|
1
: P is a projection, rank P = k,
(2) |A|
(k)
= min|X|
1
+ k|Y | : A = X + Y .
Proof: (1) For any projection P of rank k, we have
|AP|
1
=
n

i=1
s
i
(AP) =
k

i=1
s
i
(AP)
k

i=1
s
i
(A)
by Theorem 6.7. For the converse, take the polar decomposition A = U[A[
with a unitary U and the spectral decomposition [A[ =

n
i=1
s
i
(A)P
i
with
mutually orthogonal projections P
i
of rank 1. Let P :=

k
i=1
P
i
. Then
|AP|
1
= |U[A[P|
1
=
_
_
_
_
k

i=1
s
i
(A)P
i
_
_
_
_
1
=
k

i=1
s
i
(A) = |A|
(k)
.
(2) For any decomposition A = X + Y , since s
i
(A) s
i
(X) + |Y | by
Theorem 6.7 (10), we have
|A|
(k)

k

i=1
s
i
(X) + k|Y | |X|
1
+ k|Y |
for any decomposition A = X + Y . Conversely, with the same notations as
in the proof of (1), dene
X := U
k

i=1
(s
i
(A) s
k
(A))P
i
,
Y := U
_
s
k
(A)
k

i=1
P
i
+
n

i=k+1
s
i
(A)P
i
_
.
6.3. SYMMETRIC NORMS 251
Then X + Y = A and
|X|
1
=
k

i=1
s
i
(A) ks
k
(A), |Y | = s
k
(A).
Hence |X|
1
+ k|Y | =

k
i=1
s
i
(A).
The following is a modication of the above expression in (1):
|A|
(k)
= max[Tr (UAP)[ : U a unitary, P a projection, rank P = k.
Here we show the Holder inequality for matrices to illustrate the use-
fulness of the majorization technique.
Theorem 6.20 Let 0 < p, p
1
, p
2
and 1/p = 1/p
1
+ 1/p
2
. Then
|AB|
p
|A|
p
1
|B|
p
2
, A, B B(1).
Proof: When p
1
= or p
2
= , the result is obvious. Assume that
0 < p
1
, p
2
< . Since Corollary 6.14 implies that
(s
i
(AB)
p
)
(log)
(s
i
(A)
p
s
i
(B)
p
),
it follows from Theorem 6.5 that
(s
i
(AB)
p
)
w
(s
i
(A)
p
s
i
(B)
p
).
Since (p
1
/p)
1
+ (p
2
/p)
1
= 1, the usual Holder inequality for vectors shows
that
|AB|
p
=
_
n

i=1
s
i
(AB)
p
_
1/p

_
n

i=1
s
i
(A)
p
s
i
(B)
p
_
1/p

_
n

i=1
s
i
(A)
p
1
_
1/p
1
_
n

i=1
s
i
(B)
p
2
_
1/p
2
|A|
p
1
|B|
p
2
.

Corresponding to each symmetric gauge function , dene

: R
n
R
by

(b) := sup
_
n

i=1
a
i
b
i
: a = (a
i
) R
n
, (a) 1
_
for b = (b
i
) R
n
.
252 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
Then

is a symmetric gauge function again, which is said to be dual to


. For example, when 1 p and 1/p +1/q = 1, the
p
-norm
p
is dual
to the
q
-norm
q
.
The following is another generalized Holder inequality, which can be shown
as Theorem 6.20.
Lemma 6.21 Let ,
1
and
2
be symmetric gauge functions with the cor-
responding unitarily invariant norms [[[ [[[, [[[ [[[
1
and [[[ [[[
2
on B(1),
respectively. If
(ab)
1
(a)
2
(b), a, b R
n
,
then
[[[AB[[[ [[[A[[[
1
[[[B[[[
2
, A, B B(1).
In particular, if [[[ [[[

is the unitarily invariant norm corresponding to

dual to , then
|AB|
1
[[[A[[[ [[[B[[[

, A, B B(1).
Proof: By Corollary 6.14, Theorem 6.5, and Lemma 6.16, we have
(s(AB)) (s(A)s(B))
1
(s(A))
2
(s(B)) [[[A[[[
1
[[[B[[[
2
,
showing the rst assertion. For the second part, note by denition of

that

1
(ab) (a)

(b) for a, b R
n
.
Theorem 6.22 Let and

be dual symmetric gauge functions on R


n
with
the corresponding norms [[[ [[[ and [[[ [[[

on B(1), respectively. Then


[[[ [[[ and [[[ [[[

are dual with respect to the duality (A, B) Tr AB for


A, B B(1), that is,
[[[B[[[

= sup[Tr AB[ : A B(1), [[[A[[[ 1, B B(1). (6.15)


Proof: First note that any linear functional on B(1) is represented as
A B(1) Tr AB for some B B(1). We write [[[B[[[

for the right-hand


side of (6.15). From Lemma 6.21 we have
[Tr AB[ |AB|
1
[[[A[[[ [[[B[[[

so that [[[B[[[

[[[B[[[

for all B B(1). On the other hand, let B = V [B[


be the polar decomposition and [B[ =

n
i=1
s
i
(B)[v
i
v
i
[ be the Schmidt
decomposition of [B[. For any a = (a
i
) R
n
with (a) 1, let A :=
6.3. SYMMETRIC NORMS 253
(

n
i=1
a
i
[v
i
v
i
[)V

. Then s(A) = s(

n
i=1
a
i
[v
i
v
i
[) = (a

1
, . . . , a

n
), the de-
creasing rearrangement of ([a
1
[, . . . , [a
n
[), and hence [[[A[[[ = (s(A)) =
(a) 1. Moreover,
Tr AB = Tr
_
n

i=1
a
i
[v
i
v
i
[
__
n

i=1
s
i
(B)[v
i
v
i
[
_
= Tr
_
n

i=1
a
i
s
i
(B)[v
i
v
i
[
_
=
n

i=1
a
i
s
i
(B)
so that
n

i=1
a
i
s
i
(B) [Tr AB[ [[[A[[[ [[[B[[[

[[[B[[[

.
This implies that [[[B[[[

(s(B)) [[[B[[[

.
As special cases we have | |

p
= | |
q
when 1 p and 1/p+1/q = 1.
The close relation between the (log-)majorization and the unitarily invari-
ant norm inequalities is summarized in the following proposition.
Theorem 6.23 Consider the following conditions for A, B B(1).
(i) s(A)
w(log)
s(B),
(ii) [[[f([A[)[[[ [[[f([B[)[[[ for every unitarily invariant norm [[[ [[[ and
every continuous increasing function f : R
+
R
+
such that f(e
x
) is
convex,
(iii) s(A)
w
s(B),
(iv) |A|
(k)
|B|
(k)
for every k = 1, . . . , n,
(v) [[[A[[[ [[[B[[[ for every unitarily invariant norm [[[ [[[,
(vi) [[[f([A[)[[[ [[[f([B[)[[[ for every unitarily invariant norm [[[ [[[ and
every continuous increasing convex function f : R
+
R
+
.
Then
(i) (ii) =(iii) (iv) (v) (vi).
Proof: (i) (ii). Let f be as in (ii). By Theorems 6.5 and 6.7 (11) we
have
s(f([A[)) = f(s(A))
w
f(s(B)) = s(f([B[)). (6.16)
This implies by Theorem 6.18 (3) that [[[f([A[)[[[ [[[f([B[)[[[ for any uni-
tarily invariant norm.
254 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
(ii) (i). Take [[[[[[ = ||
(k)
, the Ky Fan norms, and f(x) = log(1+
1
x)
for > 0. Then f satises the condition in (ii). Since
s
i
(f([A[)) = f(s
i
(A)) = log( + s
i
(A)) log ,
the inequality |f([A[)|
(k)
|f([B[)|
(k)
means that
k

i=1
( + s
i
(A))
k

i=1
( + s
i
(B)).
Letting 0 gives

k
i=1
s
i
(A)

k
i=1
s
i
(B) and hence (i) follows.
(i) (iii) follows from Theorem 6.5. (iii) (iv) is trivial by denition of
| |
(k)
and (vi) (v) (iv) is clear. Finally assume (iii) and let f be as in
(vi). Theorem 6.7 yields (6.16) again, so that (vi) follows. Hence (iii) (vi)
holds.
By Theorems 6.9, 6.10 and 6.23 we have:
Corollary 6.24 For A, B M
n
and a unitarily invariant norm [[[ [[[, the
inequality
[[[Diag(s
1
(A) s
1
(B), . . . , s
n
(A) s
n
(B))[[[ [[[AB[[[
holds. If A and B are self-adjoint, then
[[[Diag(
1
(A)
1
(B), . . . ,
n
(A)
n
(B))[[[ [[[AB[[[.
The following statements are particular cases for self-adjoint matrices:
_
n

i=1
[
i
(A)
i
(B)[
p
_
1/p
|AB|
p
(1 p < ).
The following is called Weyls inequality:
max
1in
[
i
(A)
i
(B)[ |AB|.
There are similar inequalities in the general case, where
i
is replaced by s
i
.
In the rest of this section we show symmetric norm inequalities (or eigen-
value majorizations) involving convex/concave functions and expansions. An
operator Z is called an expansion if Z

Z I.
6.3. SYMMETRIC NORMS 255
Theorem 6.25 Let f : R
+
R
+
be a concave function. If 0 A M
n
and
Z M
n
is an expansion, then
[[[f(Z

AZ)[[[ [[[Z

f(A)Z[[[
for every unitarily invariant norm [[[ [[[, or equivalently,
(f(Z

AZ))
w
(Z

f(A)Z).
Proof: Note that f is automatically non-decreasing. Due to Theorem 6.22
it suces to prove the inequality for the Ky Fan k-norms | |
(k)
, 1 k n.
Letting f
0
(x) := f(x) f(0) we have
f(Z

AZ) = f(0)I + f
0
(Z

AZ),
Z

f(A)Z = f(0)Z

Z + Z

f
0
(A)Z f(0)I + Z

f
0
(A)Z,
which show that we may assume that f(0) = 0. Then there is a spectral
projection E of rank k for Z

AZ such that
|f(Z

AZ)|
(k)
=
k

j=1
f(
j
(Z

AZ)) = Tr f(Z

AZ)E.
When we show that
Tr f(Z

AZ)E Tr Z

f(A)ZE, (6.17)
it follows that
|f(Z

AZ)|
(k)
Tr Z

f(A)ZE |Z

f(A)Z|
(k)
by Theorem 6.19. For (6.17) we may show that
Tr g(Z

AZ)E Tr Z

g(A)ZE (6.18)
for every convex function on R
+
with g(0) = 0. Such a function g can be
approximated by functions of the type
x +
m

i=1

i
(x
i
)
+
(6.19)
with R and
i
,
i
> 0, where (x )
+
:= max0, x . Consequently,
it suces to show (6.18) for g

(x) := (x )
+
with > 0. From the lemma
below we have a unitary U such that
g

(Z

AZ) U

(A)ZU.
256 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
We hence have
Tr g

(Z

AZ)E =
k

j=1

j
(g

(Z

AZ))
k

j=1

j
(U

(A)ZU)
=
k

j=1

j
(Z

(A)Z) Tr Z

(A)ZE,
that is (6.18) for g = g

.
Lemma 6.26 Let A M
+
n
, Z M be an expansion, and > 0. Then there
exists a unitary U such that
(Z

AZ I)
+
U

(AI)
+
ZU.
Proof: Let P be the support projection of (A I)
+
and set A

:= PA.
Let Q be the support projection of Z

Z. Since Z

AZ Z

Z and
(x )
+
is a non-decreasing function, for 1 j n we have

j
((Z

AZ I)
+
) = (
j
(Z

AZ) )
+
(
j
(Z

Z) )
+
=
j
((Z

Z I)
+
).
So there exists a unitary U such that
(Z

AZ I)
+
U

(Z

Z I)
+
U.
It is obvious that Q is the support projection of Z

PZ. Also, note that


Z

PZ is unitarily equivalent to PZZ

P. Since Z

Z I, it follows that
ZZ

I and so PZZ

P P. Therefore, we have Q Z

PZ. Since
Z

Z Z

PZ Q, we see that
(Z

Z I)
+
= Z

Z Q Z

Z Z

PZ
= Z

(A

P)Z = Z

(AI)
+
Z,
which gives the conclusion.
When f is convex with f(0) = 0, the inequality in Theorem 6.25 is re-
versed.
Theorem 6.27 Let f : R
+
R
+
be a convex function with f(0) = 0. If
0 A M
n
and Z M
n
is an expansion, then
[[[f(Z

AZ)[[[ [[[Z

f(A)Z[[[
for every unitarily invariant norm [[[ [[[.
6.3. SYMMETRIC NORMS 257
Proof: By approximation we may assume that f is of the form (6.19) with
0 and
i
,
i
> 0. By Lemma 6.26 we have
Z

f(A)Z = Z

AZ +

i
Z

(A
i
I)
+
Z
Z

AZ +

i
U
i
(Z

AZ
i
I)
+
U

i
for some unitaries U
i
, 1 i m. We now consider the Ky Fan k-norms
| |
(k)
. For each k = 1, . . . , n there is a projection E of rank k so that
_
_
_Z

AZ +

i
U
i
(Z

AZ
i
I)
+
U

i
_
_
_
(k)
= Tr
_
Z

AZ +

i
U
i
(Z

AZ
i
I)
+
U

i
_
E
= Tr Z

AZE +

i
Tr (Z

AZ
i
I)
+
U

i
EU
i
|Z

AZ|
(k)
+

i
|(Z

AZ
i
I)
+
|
(k)
=
k

j=1
_

j
(Z

AZ) +

i
(
j
(Z

AZ)
i
)
+
_
=
k

j=1
f(
j
(Z

AZ)) = |f(Z

AZ)|
(k)
,
and hence |Z

f(A)Z|
(k)
|f(Z

AZ)|
(k)
. This implies the conclusion.
For the trace function the non-negativity assumption of f is not necessary
so that we have
Theorem 6.28 Let 0 A M
n
and Z M
n
be an expansion. If f is a
concave function on R
+
with f(0) 0, then
Tr f(Z

AZ) Tr Z

f(A)Z.
If f is a convex function on R
+
with f(0) 0, then
Tr f(Z

AZ) Tr Z

f(A)Z.
Proof: The two assertions are obviously equivalent. To prove the second,
by approximation we may assume that f is of the form (6.19) with R
258 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
and
i
,
i
> 0. Then, by Lemma 6.26,
Tr f(Z

AZ) = Tr
_
Z

AZ +

i
(Z

AZ
i
I)
+
_
Tr
_
Z

AZ +

i
Z

(A
i
I)
+
Z
_
= Tr Z

f(A)Z
and the statement is proved.
6.4 More majorizations for matrices
In the rst part of this section, we prove a subadditivity property for certain
symmetric norm functions. Let f : R
+
R
+
be a concave function. Then f
is increasing and it is easy to show that f(a + b) f(a) + f(b) for positive
numbers a and b. The Rotfeld inequality
Tr f(A+ B) Tr (f(A) + f(B)) (A, B M
+
n
)
is a matrix extension. Another extension is
[[[f(A+ B)[[[ [[[f(A) + f(B)[[[ (6.20)
for all 0 A, B M
n
and for any unitarily invariant norm [[[ [[[, which will
be proved in Theorem 6.33 below.
Lemma 6.29 Let g : R
+
R
+
be a continuous function. If g is decreasing
and xg(x) is increasing, then
((A+ B)g(A+ B))
w
(A
1/2
g(A+ B)A
1/2
+ B
1/2
g(A+ B)B
1/2
)
for all A, B M
+
n
.
Proof: Let (A + B) = (
1
, . . . ,
n
) be the eigenvalue vector arranged in
decreasing order and u
1
, . . . , u
n
be the corresponding eigenvectors forming an
orthonormal basis of C
n
. For 1 k n let P
k
be the orthogonal projection
onto the subspace spanned by u
1
, . . . , u
k
. Since xg(x) is increasing, it follows
that
((A+ B)g(A+ B)) = (
1
g(
1
), . . . ,
n
g(
n
)).
Hence, what we need to prove is
Tr (A+ B)g(A+ B)P
k
Tr
_
A
1/2
g(A+ B)A
1/2
+ B
1/2
g(A+ B)B
1/2
_
P
k
,
6.4. MORE MAJORIZATIONS FOR MATRICES 259
since the left-hand side is equal to

k
i=1

i
g(
i
) and the right-hand side is
less than or equal to

k
i=1

i
(A
1/2
g(A + B)A
1/2
+ B
1/2
g(A + B)B
1/2
). The
above inequality immediately follows by summing the following two:
Tr g(A+ B)
1/2
Ag(A+ B)
1/2
P
k
Tr A
1/2
g(A+ B)A
1/2
P
k
, (6.21)
Tr g(A+ B)
1/2
Bg(A+ B)
1/2
P
k
Tr B
1/2
g(A+ B)B
1/2
P
k
. (6.22)
To prove (6.21), we write P
k
, H := g(A+ B) and A
1/2
as
P
k
=
_
I
K
0
0 0
_
, H =
_
H
1
0
0 H
2
_
, A
1/2
=
_
A
11
A
12
A

12
A
22
_
in the form of 2 2 block-matrices corresponding to the orthogonal decom-
position C
n
= / /

with / := P
k
C
n
. Then
P
k
g(A+ B)
1/2
Ag(A+ B)
1/2
P
k
=
_
H
1/2
1
A
2
11
H
1/2
1
+ H
1/2
1
A
12
A

12
H
1/2
1
0
0 0
_
,
P
k
A
1/2
g(A+ B)A
1/2
P
k
=
_
A
11
H
1
A
11
+ A
12
H
2
A

12
0
0 0
_
.
Since g is decreasing, we notice that
H
1
g(
k
)I
K
, H
2
g(
k
)I
K
.
Therefore, we have
Tr H
1/2
1
A
12
A

12
H
1/2
1
= Tr A

12
H
1
A
12
g(
k
)Tr A

12
A
12
= g(
k
)Tr A
12
A

12
Tr A
12
H
2
A

12
so that
Tr (H
1/2
1
A
2
11
H
1/2
1
+ H
1/2
1
A
12
A

12
H
1/2
1
) Tr (A
11
H
1
A
11
+ A
12
H
2
A

12
),
which shows (6.21). (6.22) is similarly proved.
In the next result matrix concavity is assumed.
Theorem 6.30 Let f : R
+
R
+
be a continuous matrix monotone (equiv-
alently, matrix concave) function. Then (6.20) holds for all 0 A, B M
n
and for any unitarily invariant norm [[[ [[[.
Proof: By continuity we may assume that A, B are invertible. Let g(x) :=
f(x)/x; then g satises the assumptions of Lemma 6.29. Hence the lemma
implies that
[[[f(A+ B)[[[ [[[A
1/2
(A + B)
1/2
f(A+ B)(A+ B)
1/2
A
1/2
+B
1/2
(A+ B)
1/2
f(A+ B)(A + B)
1/2
B
1/2
[[[. (6.23)
260 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
Since C := A
1/2
(A + B)
1/2
is a contraction, Theorem 4.23 implies from the
matrix concavity that
A
1/2
(A+ B)
1/2
f(A+ B)(A+ B)
1/2
A
1/2
= Cf(A+ B)C

f(C(A+ B)C

) = f(A),
and similarly
B
1/2
(A+ B)
1/2
f(A+ B)(A+ B)
1/2
B
1/2
f(B).
Therefore, the right-hand side of (6.23) is less than or equal to [[[f(A) +
f(B)[[[.
A particular case of the next theorem is [[[(A+B)
m
[[[ [[[A
m
+B
m
[[[ for
m N, which was shown by Bhatia and Kittaneh [23].
Theorem 6.31 Let g : R
+
R
+
be an increasing bijective function whose
inverse function is operator monotone. Then
[[[g(A+ B)[[[ [[[g(A) + g(B)[[[ (6.24)
for all 0 A, B M
n
and [[[ [[[.
Proof: Let f be the inverse function of g. For every 0 A, B M
n
,
Theorem 6.30 implies that
f((A+ B))
w
(f(A) + f(B)).
Now, replace A and B by g(A) and g(B), respectively. Then we have
f((g(A) + g(B)))
w
(A+ B).
Since f is concave and hence g is convex (and increasing), we have by Example
6.4
(g(A) + g(B))
w
g((A+ B)) = (g(A+ B)),
which means by Theorem 6.23 that [[[g(A) + g(B)[[[ [[[g(A+ B)[[[.
The above theorem can be extended to the next theorem due to Kosem
[57], which is the rst main result of this section. The simpler proof below is
from [30].
Theorem 6.32 Let g : R
+
R
+
be a continuous convex function with
g(0) = 0. Then (6.24) holds for all A, B and [[[ [[[ as above.
6.4. MORE MAJORIZATIONS FOR MATRICES 261
Proof: First, note that a convex function g 0 on R
+
with g(0) = 0 is
non-decreasing. Let denote the set of all non-negative functions g on R
+
for
which the conclusion of the theorem holds. It is obvious that is closed un-
der pointwise convergence and multiplication by non-negative scalars. When
f, g , for the Ky Fan norms | |
(k)
, 1 k n, and for 0 A, B M
n
we
have
|(f + g)(A+ B)|
(k)
= |f(A+ B)|
(k)
+|g(A+ B)|
(k)
|f(A) + f(B)|
(k)
+|g(A) + g(B)|
(k)
|(f + g)(A) + (f + g)(B)|
(k)
,
where the above equality is guaranteed by the non-decreasingness of f, g and
the latter inequality is the triangle inequality. Hence f + g by Theorem
6.23 so that is a convex cone. Notice that any convex function g 0 on R
+
with g(0) = 0 is the pointwise limit of an increasing sequence of functions of
the form

m
l=1
c
l

a
l
(x) with c
l
, a
l
> 0, where
a
is the angle function at a > 0
given as
a
(x) := maxx a, 0. Hence it suces to show that
a
for all
a > 0. To do this, for a, r > 0 we dene
h
a,r
(x) :=
1
2
_
_
(x a)
2
+ r + x

a
2
+ r
_
, x 0,
which is an increasing bijective function on R
+
and whose inverse is
x
r/2
2x +

a
2
+ r a
+

a
2
+ r + a
2
. (6.25)
Since (6.25) is operator monotone on R
+
, we have h
a,r
by Theorem 6.31.
Therefore,
a
since h
a,r

a
as r 0.
The next subadditivity inequality extending Theorem 6.30 was proved by
Bourin and Uchiyama [30], which is the second main result.
Theorem 6.33 Let f : R
+
R
+
be a continuous concave function. Then
(6.20) holds for all A, B and [[[ [[[ as above.
Proof: Let
i
and u
i
, 1 i n, be taken as in the proof of Lemma 6.29,
and P
k
, 1 k n, be also as there. We may prove the weak majorization
k

i=1
f(
i
)
k

i=1

i
(f(A) + f(B)) (1 k n).
To do this, it suces to show that
Tr f(A+ B)P
k
Tr (f(A) + f(B))P
k
. (6.26)
262 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
Indeed, since concave f is necessarily increasing, the left-hand side of (6.26)
is

k
i=1
f(
i
) and the right-hand side is less than or equal to

k
i=1

i
(f(A) +
f(B)). Here, note by Exercise 12 that f is the pointwise limit of a sequence
of functions of the form + x g(x) where 0, > 0, and g 0 is a
continuous convex function on R
+
with g(0) = 0. Hence, to prove (6.26), it
suces to show that
Tr g(A+ B)P
k
Tr (g(A) + g(B))P
k
for any continuous convex function g 0 on R
+
with g(0) = 0. In fact, this
is seen as follows:
Tr g(A+ B)P
k
= |g(A+ B)|
(k)
|g(A) + g(B)|
(k)
Tr (g(A) + g(B))P
k
,
where the above equality is due to the increasingness of g and the rst in-
equality follows from Theorem 6.32.
The subadditivity inequality of Theorem 6.32 was further extended by J.-
C. Bourin in such a way that if f is a positive continuous concave function
on R
+
then
[[[f([A+ B[)[[[ [[[f([A[) + f([B[)[[[
for all normal matrices A, B M
n
and for any unitarily invariant norm [[[ [[[.
In particular,
[[[f([Z[)[[[ [[[f([A[) + f([B[)[[[
when Z = A + iB is the Descartes decomposition of Z.
In the second part of this section, we prove the inequality between norms of
f([AB[) and f(A)f(B) (or the weak majorization for their singular values)
when f is a positive operator monotone function on R
+
and A, B M
+
n
. We
rst prepare some simple facts for the next theorem.
Lemma 6.34 For self-adjoint X, Y M
n
, let X = X
+
X

and Y =
Y
+
Y

be the Jordan decompositions.


(1) If X Y then s
i
(X
+
) s
i
(Y
+
) for all i.
(2) If s(X
+
)
w
s(Y
+
) and s(X

)
w
s(Y

), then s(X)
w
s(Y ).
Proof: (1) Let Q be the support projection of X
+
. Since
X
+
= QXQ QY Q QY
+
Q,
we have s
i
(X
+
) s
i
(QY
+
Q) s
i
(Y
+
) by Theorem 6.7 (7).
6.4. MORE MAJORIZATIONS FOR MATRICES 263
(2) It is rather easy to see that s(X) is the decreasing rearrangement of
the combination of s(X
+
) and s(X

). Hence for each k N we can choose


0 m k so that
k

i=1
s
i
(X) =
m

i=1
s
i
(X
+
) +
km

i=1
s
i
(X

).
Hence
k

i=1
s
i
(X)
m

i=1
s
i
(Y
+
) +
km

i=1
s
i
(Y

)
k

i=1
s
i
(Y ),
as desired.
Theorem 6.35 Let f : R
+
R
+
be a matrix monotone function. Then
[[[f(A) f(B)[[[ [[[f([AB[)[[[
for all 0 A, B M
n
and for any unitarily invariant norm [[[ [[[. Equiva-
lently,
s(f(A) f(B))
w
s(f([AB[)) (6.27)
holds.
Proof: First assume that A B 0 and let C := A B 0. In view of
Theorem 6.23, it suces to prove that
|f(B + C) f(B)|
(k)
|f(C)|
(k)
(1 k n). (6.28)
For each (0, ) let
h

(x) :=
x
x +
= 1

x +
,
which is increasing on R
+
with h

(0) = 0. According to the integral repre-


sentation (4.19) for f with a, b 0 and a positive measure on (0, ), we
have
s
i
(f(C)) = f(s
i
(C))
= a + bs
i
(C) +
_
(0,)
s
i
(C)
s
i
(C) +
d()
= a + bs
i
(C) +
_
(0,)
s
i
(h

(C)) d(),
264 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
so that
|f(C)|
(k)
b|C|
(k)
+
_
(0,)
|h

(C)|
(k)
d(). (6.29)
On the other hand, since
f(B + C) = aI + b(B + C) +
_
(0,)
h

(B + C) d()
as well as the analogous expression for f(B), we have
f(B + C) f(B) = bC +
_
(0,)
(h

(B + C) h

(B)) d(),
so that
|f(B + C) f(B)|
(k)
b|C|
(k)
+
_
(0,)
|h

(B + C) h

(B)|
(k)
d().
By this inequality and (6.29), it suces for (6.28) to show that
|h

(B + C) h

(B)|
(k)
|h

(C)|
(k)
( (0, ), 1 k n).
As h

(x) = h
1
(x/), it is enough to show this inequality for the case = 1
since we may replace B and C with
1
B and
1
C, respectively. Thus, what
remains to prove is the following:
|(B + I)
1
(B + C + I)
1
|
(k)
|I (C + I)
1
|
(k)
(1 k n). (6.30)
Since
(B+I)
1
(B+C+I)
1
= (B+I)
1/2
h
1
((B+I)
1/2
C(B+I)
1/2
)(B+I)
1/2
and |(B + I)
1/2
| 1, we obtain
s
i
((B + I)
1
(B + C + I)
1
) s
i
(h
1
((B + I)
1/2
C(B + I)
1/2
))
= h
1
(s
i
((B + I)
1/2
C(B + I)
1/2
))
h
1
(s
i
(C)) = s
i
(I (C + I)
1
)
by repeated use of Theorem 6.7 (7). Therefore, (6.30) is proved.
Next, let us prove the assertion in the general case A, B 0. Since
0 A B + (A B)
+
, it follows that
f(A) f(B) f(B + (AB)
+
) f(B),
6.4. MORE MAJORIZATIONS FOR MATRICES 265
which implies by Lemma 6.34 (1) that
|(f(A) f(B))
+
|
(k)
|f(B + (AB)
+
) f(B)|
(k)
.
Applying (6.28) to B + (A B)
+
and B, we have
|f(B + (AB)
+
) f(B)|
(k)
|f((AB)
+
)|
(k)
.
Therefore,
s((f(A) f(B))
+
)
w
s(f((AB)
+
)). (6.31)
Exchanging the role of A, B gives
s((f(A) f(B))

)
w
s(f((AB)

)). (6.32)
Here, we may assume that f(0) = 0 since f can be replaced by f f(0).
Then it is immediate to see that
f((AB)
+
)f((AB)

) = 0, f((AB)
+
) + f((AB)

) = f([AB[).
Hence s(f(A)f(B))
w
s(f([AB[)) follows from (6.31) and (6.32) thanks
to Lemma 6.34 (2).
When f(x) = x

with 0 < < 1, the weak majorization (6.27) gives the


norm inequality formerly proved by Birman, Koplienko and Solomyak:
|A

|
p/
|AB|

p
for all A, B M
+
n
and p . The case where = 1/2 and p = 1 is
known as the Powers-Strmer inequality.
The following is an immediate corollary of Theorem 6.35, whose proof is
similar to that of Theorem 6.31.
Corollary 6.36 Let g : R
+
R
+
be an increasing bijective function whose
inverse function is operator monotone. Then
[[[g(A) g(B)[[[ [[[g([AB[)[[[
for all A, B and [[[ [[[ as above.
In [13], Audenaert and Aujla pointed out that Theorem 6.35 is not true
in the case where f : R
+
R
+
is a general continuous concave function and
that Corollary 6.36 is not true in the case where g : R
+
R
+
is a general
continuous convex function.
In the last part of this section we prove log-majorizations results, which
give inequalities strengthening or complementing the Golden-Thompson in-
equality. The following log-majorization is due to Huzihiro Araki.
266 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
Theorem 6.37 For every A, B M
+
n
,
s((A
1/2
BA
1/2
)
r
)
(log)
s(A
r/2
B
r
A
r/2
) (r 1), (6.33)
or equivalently
s((A
p/2
B
p
A
p/2
)
1/p
)
(log)
s((A
q/2
B
q
A
q/2
)
1/q
) (0 < p q). (6.34)
Proof: We can pass to the limit from A + I and B + I as 0 by
Theorem 6.7 (10). So we may assume that A and B are invertible.
First we show that
|(A
1/2
BA
1/2
)
r
| |A
r/2
B
r
A
r/2
| (r 1). (6.35)
It is enough to check that A
r/2
B
r
A
r/2
I implies A
1/2
BA
1/2
I which is
equivalent to a monotonicity: B
r
A
r
implies B A
1
.
We have
((A
1/2
BA
1/2
)
r
)
k
= ((A
k
)
1/2
(B
k
)(A
k
)
1/2
)
r
,
(A
r/2
B
r
A
r/2
)
k
= (A
k
)
r/2
(B
k
)
r
(A
k
)
r/2
,
and instead of A, B in (6.35) we put A
k
, B
k
:
|((A
1/2
BA
1/2
)
r
)
k
| |(A
r/2
B
r
A
r/2
)
k
|.
This means, thanks to Lemma 1.62, that
k

i=1
s
i
((A
1/2
BA
1/2
)
r
)
k

i=1
s
i
(A
r/2
B
r
A
r/2
).
Moreover,
n

i=1
s
i
((A
1/2
BA
1/2
)
r
) = (det A det B)
r
=
n

i=1
s
i
(A
r/2
B
r
A
r/2
).
Hence (6.33) is proved. If we replace A, B by A
p
, B
p
and take r = q/p, then
s((A
p/2
B
p
A
p/2
)
q/p
)
(log)
s(A
q/2
B
q
A
q/2
),
which implies (6.34) by Theorem 6.7 (11).
Let 0 A, B M
m
, s, t R
+
and t 1. Then the theorem implies
Tr (A
1/2
BA
1/2
)
st
Tr (A
t/2
BA
t/2
)
s
(6.36)
which is called the Araki-Lieb-Thirring inequality. The case s = 1 and
integer t was the Lieb-Thirring inequality.
Theorems 6.23 and 6.37 yield:
6.4. MORE MAJORIZATIONS FOR MATRICES 267
Corollary 6.38 Let 0 A, B M
n
and [[[ [[[ be any unitarily invariant
norm. If f is a continuous increasing function on R
+
such that f(0) 0 and
f(e
t
) is convex, then
[[[f((A
1/2
BA
1/2
)
r
)[[[ [[[f(A
r/2
B
r
A
r/2
)[[[ (r 1).
In particular,
[[[(A
1/2
BA
1/2
)
r
[[[ [[[A
r/2
B
r
A
r/2
[[[ (r 1).
The next corollary is the strengthened Golden-Thompson inequality
to the form of log-majorization.
Corollary 6.39 For every self-adjoint H, K M
n
,
s(e
H+K
)
(log)
s((e
rH/2
e
rK
e
rH/2
)
1/r
) (r > 0).
Hence, for every unitarily invariant norm [[[ [[[,
[[[e
H+K
[[[ [[[(e
rH/2
e
rK
e
rH/2
)
1/r
[[[ (r > 0),
and the above right-hand side decreases to [[[e
H+K
[[[ as r 0. In particular,
[[[e
H+K
[[[ [[[e
H/2
e
K
e
H/2
[[[ [[[e
H
e
K
[[[. (6.37)
Proof: The log-majorization follows by letting p 0 in (6.34) thanks to
the above lemma. The second assertion follows from the rst and Theorem
6.23. Thanks to Theorem 6.7 (3) and Theorem 6.37 the second inequality of
(6.37) is seen as
[[[e
H
e
K
[[[ = [[[ [e
K
e
H
[ [[[ = [[[(e
H
e
2K
e
H
)
1/2
[[[ [[[e
H/2
e
K
e
H/2
[[[.

The specialization of the inequality (6.37) to the trace-norm [[ [[


1
is the
Golden-Thompson trace inequality Tr e
H+K
Tr e
H
e
K
. It was shown in
[80] that Tr e
H+K
Tr (e
H/n
e
K/n
)
n
for every n N. The extension (6.37)
was given in [61, 81]. Also (6.37) for the operator norm is known as Segals
inequality (see [78, p. 260]).
Theorem 6.40 If A, B, X M
n
and for the block-matrix
_
A X
X B
_
0,
then we have

_
_
A X
X B
_
_

_
_
A+ B 0
0 0
_
_
.
268 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
Proof: By Example 2.6 and the Ky Fan majorization (Corollary 6.11), we
have

_
_
A X
X B
_
_

_
_
A+B
2
0
0 0
_
_
+
_
_
0 0
0
A+B
2
_
_
=
_
_
A + B 0
0 0
_
_
.
This is the result.
The following statement is a special case of the previous theorem.
Example 6.41 For every X, Y M
n
such that X

Y is Hermitian, we have
(XX

+ Y Y

) (X

X + Y

Y ).
Since
_
XX

+ Y Y

0
0 0
_
=
_
X Y
0 0
_ _
X

0
Y

0
_
is unitarily conjugate to
_
X

0
Y

0
_ _
X Y
0 0
_
=
_
X

X X

Y
Y

X Y

Y
_
and X

Y is Hermitian by assumption, the above corollary implies that

_
_
XX

+ Y Y

0
0 0
_
_

_
_
X

X + Y

Y 0
0 0
_
_
.
So the statement follows.
Next we study log-majorizations and norm inequalities. These involve the
weighted geometric means
A#

B = A
1/2
(A
1/2
BA
1/2
)

A
1/2
,
where 0 1. The log-majorization in the next theorem is due to Ando
and Hiai [8] which is considered as complementary to Theorem 6.37.
Theorem 6.42 For every A, B M
+
n
,
s(A
r
#

B
r
)
(log)
s((A#

B)
r
) (r 1), (6.38)
or equivalently
s((A
p
#

B
p
)
1/p
)
(log)
s((A
q
#

B
q
)
1/q
) (p q > 0). (6.39)
6.4. MORE MAJORIZATIONS FOR MATRICES 269
Proof: First assume that both A and B are invertible. Note that
det(A
r
#

B
r
) = (det A)
r(1)
(det B)
r
= det(A#

B)
r
.
For every k = 1, . . . , n, it is easily veried from the properties of the antisym-
metric tensor powers that
(A
r
#

B
r
)
k
= (A
k
)
r
#

(B
k
)
r
,
((A#

B)
r
)
k
= ((A
k
) #

(B
k
))
r
.
So it suces to show that
|A
r
#

B
r
| |(A#

B)
r
| (r 1), (6.40)
because (6.38) follows from Lemma 1.62 by taking A
k
, B
k
instead of A, B in
(6.40). To show (6.40), we may prove that A#

B I implies A
r
#

B
r
I.
When 1 r 2, let us write r = 2 with 0 1. Let C := A
1/2
BA
1/2
.
Suppose that A#

B I. Then C

A
1
and
A C

, (6.41)
so that thanks to 0 1
A
1
C
(1)
. (6.42)
Now we have
A
r
#

B
r
= A
1

2
A
1+

2
B B

BA
1+

A
1

2
= A
1

2
A

1
2
CA
1/2
(A
1/2
C
1
A
1/2
)

A
1/2
CA

1
2

A
1

2
= A
1/2
A
1
#

[C(A#

C
1
)C]A
1/2
A
1/2
C
(1)
#

[C(C

C
1
)C]A
1/2
by using (6.41), (6.42), and the joint monotonicity of power means. Since
C
(1)
#

[C(C

C
1
)C] = C
(1)(1)
[C(C
(1)
C

)C]

= C

,
we have
A
r
#

B
r
A
1/2
C

A
1/2
= A#

B I.
Therefore (6.38) is proved when 1 r 2. When r > 2, write r = 2
m
s with
m N and 1 s 2. Repeating the above argument we have
s(A
r
#

B
r
)
w(log)
s(A
2
m1
s
#

B
2
m1
s
)
2
.
.
.

w(log)
s(A
s
#

B
s
)
2
m

w(log)
s(A#

B)
r
.
270 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
For general A, B B(1)
+
let A

:= A + I and B

:= B + I for > 0.
Since
A
r
#

B
r
= lim
0
A
r

B
r

and (A#

B)
r
= lim
0
(A

)
r
,
we have (6.38) by the above case and Theorem 6.7 (10). Finally, (6.39) readily
follows from (6.38) as in the last part of the proof of Theorem 6.37.
By Theorems 6.42 and 6.23 we have:
Corollary 6.43 Let 0 A, B M
n
and [[[ [[[ be any unitarily invariant
norm. If f is a continuous increasing function on R
+
such that f(0) 0 and
f(e
t
) is convex, then
[[[f(A
r
#

B
r
)[[[ [[[f((A#

B)
r
)[[[ (r 1).
In particular,
[[[A
r
#

B
r
[[[ [[[(A#

B)
r
[[[ (r 1).
Corollary 6.44 For every self-adjoint H, K M
n
,
s((e
rH
#

e
rK
)
1/r
)
w(log)
s(e
(1)H+K
) (r > 0).
Hence, for every unitarily invariant norm [[[ [[[,
[[[(e
rH
#

e
rK
)
1/r
[[[ [[[e
(1)H+K
[[[ (r > 0),
and the above left-hand side increases to [[[e
(1)H+K
[[[ as r 0.
Specializing to trace inequality we have
Tr (e
rH
#

e
rK
)
1/r
Tr e
(1)H+K
(r > 0),
which was rst proved in [47]. The following logarithmic trace inequalities
are also known for every 0 A, B B(1) and every r > 0:
1
r
Tr Alog B
r/2
A
r
B
r/2
Tr A(log A+log B)
1
r
Tr Alog A
r/2
B
r
A
r/2
, (6.43)
1
r
Tr Alog(A
r
#B
r
)
2
Tr A(log A+ log B). (6.44)
The exponential function has generalization:
exp
p
(X) = (I + pX)
1
p
, (6.45)
where X = X

M
n
and p (0, 1]. (If p 0, then the limit is exp X.)
There is an extension of the Golden-Thompson trace inequality.
6.4. MORE MAJORIZATIONS FOR MATRICES 271
Theorem 6.45 For 0 X, Y M
n
and p (0, 1] the following inequalities
hold:
Tr exp
p
(X + Y ) Tr exp
p
(X + Y + pY
1/2
XY
1/2
)
Tr exp
p
(X + Y + pXY ) Tr exp
p
(X) exp
p
(Y ) .
Proof: Let X
1
:= pX, Y
1
:= pY and q := 1/p. Then
Tr exp
p
(X + Y ) Tr exp
p
(X + Y + pY
1/2
XY
1/2
)
= Tr [(I + X
1
+ Y
1
+ Y
1/2
1
X
1
Y
1/2
1
)
q
]
Tr [(I + X
1
+ Y
1
+ X
1
Y
1
)
q
]
= Tr [((I + X
1
)(I + Y
1
))
q
]
The rst inequality is immediate from the monotonicity of the function (1 +
px)
1/p
and the second is by Lemma 6.46 below. Next we take
Tr [((I + X
1
)(I + Y
1
))
q
] Tr [(I + X
1
)
q
(I + y
1
)
q
] = Tr [exp
p
(X) exp
p
(Y )],
which is by the Araki-Lieb-Thirring inequality (6.36).
Lemma 6.46 For 0 X, Y M
n
we have the following:
Tr [(I + X + Y + Y
1/2
XY
1/2
)
p
] Tr [(I + X + Y + XY )
p
] if p 1,
Tr [(I + X + Y + Y
1/2
XY
1/2
)
p
] Tr [(I + X + Y + XY )
p
] if 0 p 1.
Proof: For every A, B M
sa
n
, let X = A and Z = (BA)
k
for any k N.
Since X

Z = A(BA)
k
is Hermitian, we have
(A
2
+ (BA)
k
(AB)
k
) (A
2
+ (AB)
k
(BA)
k
). (6.46)
When k = 1, by Theorem 6.1 this majorization yields the trace inequalities:
Tr [(A
2
+ BA
2
B)
p
] Tr [(A
2
+ AB
2
A)
p
] if p 1,
Tr [(A
2
+ BA
2
B)
p
] Tr [(A
2
+ AB
2
A)
p
] if 0 p 1.
Moreover, for every 0 X, Y M
n
, let A = (I + X)
1/2
and B = Y
1/2
.
Notice that
Tr [(A
2
+ BA
2
B)
p
] = Tr [(I + X + Y + Y
1/2
XY
1/2
)
p
]
and
Tr [(A
2
+ BA
2
B)
p
] = Tr [((I + X)
1/2
(I + Y )(I + X)
1/2
)
p
]
272 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
= Tr [((I + X)(I + Y ))
p
] = Tr [(I + X + Y + XY )
p
],
where (I +X)(I +Y ) has the eigenvalues in (0, ) so that ((I +X)(I +Y ))
p
is dened via the analytic functional calculus (3.17). Therefore the statement
follows.
The inequalities of Theorem 6.45 can be extended to the symmetric norm
inequality, as shown below together with the complementary inequality with
geometric mean.
Theorem 6.47 Let [[[ [[[ be a symmetric norm on M
n
and p (0, 1]. For
every 0 X, Y M
n
we have
[[[ exp
p
(2X)#exp
p
(2Y )[[[ [[[ exp
p
(X + Y )[[[
[[[ exp
p
(X)
1/2
exp
p
(Y ) exp
p
(X)
1/2
[[[
[[[ exp
p
(X) exp
p
(Y )[[[.
Proof: We have
(exp
p
(2X)#exp
p
(2Y )) = ((I + 2pX)
1/p
#(I + 2pY )
1/p
)

(log)
(((I + 2pX)#(I + 2pY ))
1/p
)
(exp
p
(X + Y )),
where the log-majorization is due to (6.38) and the inequality is due to the
arithmetic-geometric mean inequality:
(I + 2pX)#(I + 2pY )
(I + 2pX) + (I + 2pY )
2
= I + p(X + Y ).
On the other hand, let A := (I +pX)
1/2
and B := (pY )
1/2
. We can use (6.46)
and Theorem 6.37:
(exp
p
(X + Y )) ((A
2
+ BA
2
B)
1/p
)
((A
2
+ AB
2
A)
1/p
)
= (((I + pX)
1/2
(I + pY )(I + pX)
1/2
)
1/p
)

(log)
((I + pX)
1/2p
(I + pY )
1/p
(I + pX)
1/2p
)
= (exp
p
(X)
1/2
exp
p
(Y ) exp
p
(X)
1/2
)

(log)
((exp
p
(X) exp
p
(Y )
2
exp
p
(X))
1/2
)
= ([ exp
p
(X) exp
p
(Y )[).
The above majorizations give the stated norm inequalities.
6.5. NOTES AND REMARKS 273
6.5 Notes and remarks
The rst sentence of the chapter is from the paper of John von Neumann,
Some matrix inequalities and metrization of matric-space, Tomsk. Univ. Rev.
1(1937), 286300. (The paper is also in the book John von Neumann Collected
Works.) Theorem 6.17 and the duality of the
p
norm appeared also in this
paper.
Example 6.2 is from the paper M. A. Nielsen and J. Kempe, Separable
states are more disordered globally than locally, Phys. Rev. Lett. 86(2001),
51845187. The most comprehensive literature on majorization theory for
vectors and matrices is Marshall and Olkins monograph [66]. (There is a
recently reprinted version: A. W. Marshall, I. Olkin and B. C. Arnold, In-
equalities: Theory of Majorization and Its Applications, Second ed., Springer,
New York, 2011.) The contents presented here are mostly based on Fumio
Hiai [43]. Two survey articles [5, 6] of Tsuyoshi Ando are the best sources
on majorizations for the eigenvalues and the singular values of matrices.
The rst complete proof of the Lidskii-Wielandt theorem (Theorem 6.9)
was obtained by Helmut Wielandt in 1955, who proved a complicated mini-
max representation by induction. The proofs of Theorems 6.9 and 6.13 pre-
sented here are surprisingly elemetnary and short (compared with previously
known proofs), which are from the paper C.-K. Li and R. Mathias, The
Lidskii-Mirsky-Wielandt theorem additive and multiplicative versions, Nu-
mer. Math. 81(1999), 377413.
Here is a brief remark on the famous Horn conjecture that was armatively
solved just before 2000. The conjecture is related to three real vectors a =
(a
1
, . . . , a
n
), b = (b
1
, . . . , b
n
), and c = (c
1
, . . . , c
n
). If there are two n n
Hermitian matrices A and B such that a = (A), b = (B), and c = (A+B),
that is, a, b, c are the eigenvalues of A, B, A+B, then the three vectors obey
many inequalities of the type

kK
c
k

iI
a
i
+

jJ
b
j
for certain triples (I, J, K) of subsets of 1, . . . , n, including those coming
from the Lidskii-Wielandt theorem, together with the obvious equality
n

i=1
c
i
=
n

i=1
a
i
+
n

i=1
b
i
.
Horn [52] proposed the procedure how to produce such triples (I, J, K) and
conjectured that all the inequalities obtained in that way are sucient to
274 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
characterize a, b, c that are the eigenvalues of Hermitian matrices A, B, A+B.
This long-standing Horn conjecture was solved by two papers put together,
one by Klyachko [55] and the other by Knuston and Tao [56].
The Lieb-Thirring inequality was proved in 1976 by Elliott H. Lieb and
Walter Thirring in a physical proceeding. It is interesting that Bellmann
proved the particular case Tr (AB)
2
Tr A
2
B
2
in 1980 and he conjectured
Tr (AB)
n
Tr A
n
B
n
. The extension was proved by Huzihiro Araki, On an
inequality of Lieb and Thirring, Lett. Math. Phys. 19(1990), 167170.
Theorem 6.25 is from J.-C. Bourin [29]. Theorem 6.27 from [28] also
appeared in the paper of Aujla and Silva [15] with inequality reversed for a
contraction instead of expansion. The subadditivity inequality in Theorem
6.30 and Theorem 6.35 was rst obtained by T. Ando and X. Zhan, Norm
inequalities related to operator monotone functions, Math. Ann. 315(1999),
771780. The proof of Theorem 6.30 presented here is simpler and it is due
to M. Uchiyama [83]. Theorem 6.35 is due to Ando [4].
In the papers [8, 47] there are more details about the logarithmic trace
inequalities (6.43) and (6.44). Theorem 6.45 is in the paper S. Furuichi and
M. Lin, A matrix trace inequality and its application, Linear Algebra Appl.
433(2010), 13241328.
6.6 Exercises
1. Let S be a doubly substochastic nn matrix. Show that there exists a
doubly stochastic nn matrix D such that S
ij
D
ij
for all 1 i, j n.
2. Let
n
denote the set of all probability vectors in R
n
, i.e.,

n
:= p = (p
1
, . . . , p
n
) : p
i
0,
n

i=1
p
i
= 1.
Prove that
(1/n, 1/n, . . . , 1/n) p (1, 0, . . . , 0) (p
n
).
The Shannon entropy of p
n
is H(p) :=

n
i=1
p
i
log p
i
. Show
that H(q) H(p) log n for all p q in
n
and H(p) = log n if and
only if p = (1/n, . . . , 1/n).
3. Let A M
sa
n
. Prove the expression
k

i=1

i
(A) = maxTr AP : P is a projection, rank P = k
6.6. EXERCISES 275
for 1 k n.
4. Let A, B M
sa
n
. Show that A B implies
k
(A)
k
(B) for 1 k
n.
5. Show that statement of Theorem 6.13 is equivalent with the inequality
k

j=1
_
s
n+1j
(A)s
i
j
(B)
_

j=1
s
i
j
(AB)
for any choice of 1 i
1
< < i
k
n.
6. Give an example that for the generalized inverse (AB)

= B

is not
always true.
7. Describe the generalized inverse for a row matrix.
8. What is the generalized inverse of an orthogonal projection?
9. Let A B(1) with the polar decomposition A = U[A[. Prove that
[x, Ax[
x, [A[x +x, U[A[U

x
2
for x 1.
10. Show that [Tr A[ |A|
1
for A B(1).
11. Let 0 < p, p
1
, p
2
and 1/p = 1/p
1
+1/p
2
. Prove the Holder inequal-
ity for the vectors a, b R
n
:

p
(ab)
p
1
(a)
p
2
(b),
where ab = (a
i
b
i
).
12. Show that a continuous concave function f : R
+
R
+
is the pointwise
limit of a sequence of functions of the form
+ x
m

=1
c

(x),
where 0, , c

, a

> 0 and
a
is as given in the proof of Theorem
6.32.
13. Prove for self-adjoint matrices H, K the Lie-Trotter formula:
lim
r0
(e
rH/2
e
rK
e
rH/2
)
1/r
= e
H+K
.
276 CHAPTER 6. MAJORIZATION AND SINGULAR VALUES
14. Prove for self-adjoint matrices H, K that
lim
r0
(e
rH
#

e
rK
)
1/r
= e
(1)H+K
.
15. Let f be a real function on [a, b] with a 0 b. Prove the converse of
Corollary 4.27, that is, if
Tr f(Z

AZ) Tr Z

f(A)Z
for every A M
sa
2
with (A) [a, b] and every contraction Z M
2
,
then f is convex on [a, b] and f(0) 0.
16. Prove Theorem 4.28 in a direct way similar to the proof of Theorem
4.26.
17. Provide an example of a pair A, B of 22 Hermitian matrices such that

1
([A+ B[) <
1
([A[ +[B[) and
2
([A+ B[) >
2
([A[ +[B[).
From this, show that Theorems 4.26 and 4.28 are not true for a simple
convex function f(x) = [x[.
Chapter 7
Some applications
Matrices are of important use in many areas of both pure and applied math-
ematics. In particular, they are playing essential roles in quantum proba-
bility and quantum information. A discrete classical probability is a vector
(p
1
, p
2
, . . . , p
n
) of p
i
0 with

n
i=1
p
i
= 1. Its counterpart in quantum theory
is a matrix D M
n
(C) such that D 0 and Tr D = 1; such matrices are
called density matrices. Then matrix analysis is a basis of quantum prob-
ability/statistics and quantum information. A point here is that classical
theory is included in quantum theory as a special case where relevant matri-
ces are restricted to diagonal ones. On the other hand, there are concepts in
classical probability theory which are formulated with matrices, for instance,
covariance matrices typical in Gaussian probabilities and Fisher information
matrices in the Cramer-Rao inequality.
This chapter is devoted to some aspects in application sides of matrices.
One of the most important concepts in probability theory is the Markov prop-
erty. This concept is discussed in the rst section in the setting of Gaussian
probabilities. The structure of covariance matrices for Gaussian probabilities
with the Markov property is claried in connection with the Boltzmann en-
tropy. Its quantum analogue in the setting of CCR-algebras CCR(1) is the
subject of Section 7.3. The counterpart of the notion of Gaussian probabili-
ties is that of Gaussian or quasi-free states
A
induced by positive operators
A (similar to covariance matrices) on the underlying Hilbert space 1. In the
situation of the triplet CCR-algebra
CCR(1
1
1
2
1
3
) = CCR(1
1
) CCR(1
2
) CCR(1
3
),
the special structure of A on 1
1
1
2
1
3
and equality in the strong subad-
ditivity of the von Neumann entropy of
A
come out as equivalent conditions
for the Markov property of
A
.
277
278 CHAPTER 7. SOME APPLICATIONS
The most useful entropy in both classical and quantum probabilities is
the relative entropy S(D
1
|D
2
) := Tr D
1
(log D
1
log D
2
) for density matrices
D
1
, D
2
, which was already discussed in Sections 3.2 and 4.5. (It is also known
as the Kullback-Leibler divergence in the classical case.) The notion was
extended to the quasi-entropy:
S
A
f
(D
1
|D
2
) := AD
1/2
2
, f((D
1
/D
2
))(AD
1/2
2
)
associated with a certain function f : R
+
R and a reference matrix A,
where (D
1
/D
2
)X := D
1
XD
1
2
= L
D
1
R
1
D
2
(X). (Recall that M
f
(L
A
, R
B
) =
f(L
A
R
1
B
)R
B
was used for the matrix mean transformation in Section 5.4.)
The original relative entropy S(D
1
|D
2
) is recovered by taking f(x) = xlog x
and A = I. The monotonicity and the joint convexity properties are two
major properties of the quasi-entropies, which are the subject of Section 7.2.
Another important topic in the section is the monotone Riemannian metrics
on the manifold of invertible positive density matrices.
In a quantum system with a state D, several measurements are performed
to recover D, that is the subject of the quantum state tomography. Here,
a measurements is given by a POVM (positive operator-valued measure)
F(x) : x A, i.e., a nite set of positive matrices F(x) M
n
(C) such
that

xX
F(x) = I. In Section 7.4 we study a few results concerning how
to construct optimal quantum measurements.
The last section is concerned with the quantum version of the Cramer-Rao
inequality, that is a certain matrix inequality between a sort of generalized
variance and the quantum Fisher information. The subject belongs to the
quantum estimation theory and is also related to the monotone Riemannian
metrics.
7.1 Gaussian Markov property
In probability theory the matrices have typically real entries, but the content
of this section can be modied for the complex case.
Given a positive denite real matrix M M
n
(R) a Gaussian probability
density is dened on R
n
as
p(x) :=

det M
(2)
n
exp
_

1
2
x, Mx
_
(x R
n
).
Obviously p(x) > 0 and the integral
_
R
n
p(x) dx = 1
7.1. GAUSSIAN MARKOV PROPERTY 279
follows due to the constant factor. Since
_
R
n
x, Bxp(x) dx = Tr BM
1
,
the particular case B = E(ij) gives
_
R
n
x
i
x
j
p(x) dx =
_
R
n
x, E(ij)xp(x) dx = Tr E(ij)M
1
= (M
1
)
ij
.
Thus the inverse of the matrix M is the covariance matrix.
The Boltzmann entropy is
S(p) =
_
R
n
p(x) log p(x) dx =
n
2
log(2e)
1
2
Tr log M. (7.1)
(Instead of Tr log M, the formulation log det M is often used.)
If R
n
= R
k
R

, then the probability density p(x) has a reduction p


1
(y)
on R
k
:
p
1
(y) :=

det M
1
(2)
k
exp
_

1
2
y, M
1
y
_
(y R
k
).
To describe the relation of M and M
1
we take the block matrix form
M =
_
M
11
M
12
M

12
M
22
_
,
where M
11
M
k
(R). The we have
p
1
(y) =

det M
(2)
m
det M
22
exp
_

1
2
y, (M
11
M
12
M
1
22
M

12
)y
_
,
see Example 2.7. Therefore M
1
= M
11
M
12
M
1
22
M

12
= M/M
22
, which is
called the Schur complement of M
22
in M. We have det M
1
det M
22
=
det M.
Let p
2
(z) be the reduction of p(x) to R

and denote the Gaussian matrix


by M
2
. In this case M
2
= M
22
M

12
M
1
11
M
12
= M/M
11
. The following
equivalent conditions hold:
(1) S(p) S(p
1
) + S(p
2
),
(2) Tr log M Tr log M
1
Tr log M
2
,
(3) Tr log M Tr log M
11
+ Tr log M
22
.
280 CHAPTER 7. SOME APPLICATIONS
(1) is known as the subadditivity of the Boltzmann entropy. The equivalence
of (1) and (2) follows directly from formula (7.1). (2) can be rewritten as
log det M (log det M log det M
22
) (log det M log det M
11
)
and we have (3). The equality condition is M
12
= 0. If
M
1
= S =
_
S
11
S
12
S

12
S
22
_
,
then M
12
= 0 is obviously equivalent to S
12
= 0. It is an interesting remark
that (2) is equivalent to the inequality
(2*) Tr log S Tr log S
11
+ Tr log S
22
.
The three-fold factorization R
n
= R
k
R

R
m
is more interesting and
includes essential properties. The Gaussian matrix of the probability density
p is
M =
_
_
M
11
M
12
M
13
M

12
M
22
M
23
M

13
M

23
M
33
_
_
, (7.2)
where M
11
M
k
(R), M
22
M

(R), M
33
M
m
(R). Denote the reduced
probability densities of p by p
1
, p
2
, p
3
, p
12
, p
23
. The strong subadditivity of
the Boltzmann entropy
S(p) + S(p
2
) S(p
12
) + S(p
23
) (7.3)
is equivalent to the inequality
Tr log S + Tr log S
22
Tr log
_
S
11
S
12
S

12
S
22
_
+ Tr log
_
S
22
S
23
S

23
S
33
_
, (7.4)
where
M
1
= S =
_
_
S
11
S
12
S
13
S

12
S
22
S
23
S

13
S

23
S
33
_
_
.
The Markov property in probability theory is typically dened as
p(x
1
, x
2
, x
3
)
p
12
(x
1
, x
2
)
=
p
23
(x
2
, x
3
)
p
2
(x
2
)
(x
1
R
k
, x
2
R

, x
3
R
m
).
Taking the logarithm and integrating with respect to dp, we obtain
S(p) + S(p
12
) = S(p
23
) + S(p
2
) (7.5)
and this is the equality case in (7.3) and in (7.4). The equality case of (7.4)
is described in Theorem 4.50, so we have the following:
7.2. ENTROPIES AND MONOTONICITY 281
Theorem 7.1 The Gaussian probability density described by the block matrix
(7.2) has the Markov property if and only if S
13
= S
12
S
1
22
S
23
for the inverse.
Another condition comes from the inverse property of a 33 block matrix.
Theorem 7.2 Let S = [S
ij
]
3
i,j=1
be an invertible block matrix and assume
that S
22
and [S
ij
]
3
i,j=2
are invertible. Then the (1, 3) entry of the inverse
S
1
= [M
ij
]
3
i,j=1
is given by the following formula:
_
S
11
[ S
12
, S
13
]
_
S
22
, S
23
S
32
S
33
_
1
_
S
12
S
13
_
_
1
(S
12
S
1
22
S
23
S
13
)(S
33
S
32
S
1
22
S
23
)
1
.
Hence M
13
= 0 if and only if S
13
= S
12
S
1
22
S
23
.
It follows that the Gaussian block matrix (7.2) has the Markov property
if and only if M
13
= 0.
7.2 Entropies and monotonicity
Entropy and relative entropy have been important notions in information the-
ory. The quantum versions are in matrix theory. Recall that 0 D M
n
is a
density matrix if Tr D = 1. This means that the eigenvalues (
1
,
2
, . . . ,
n
)
form a probabilistic set:
i
0,

i
= 1. The von Neumann entropy
S(D) = Tr Dlog D of the density matrix D is the Shannon entropy of the
probabilistic set,

i
log
i
.
The partial trace Tr
1
: M
n
M
m
M
m
is a linear mapping which is
dened by the formula Tr
1
(A B) = (Tr A)B on elementary tensors. It is
called the partial trace, since the trace of the rst tensor factor was taken.
Tr
2
: M
n
M
m
M
n
is similarly dened.
The rst example includes the strong subadditivity of the von Neumann
entropy and a condition of the equality is also included. (Other conditions
will appear in Theorem 7.6.)
Example 7.3 We shall need here the concept for three-fold tensor product
and reduced densities. Let D
123
be a density matrix in M
k
M

M
m
. The
reduced density matrices are dened by the partial traces:
D
12
:= Tr
3
D
123
M
k
M

, D
2
:= Tr
13
D
123
M

, D
23
:= Tr
1
D
123
M
k
.
282 CHAPTER 7. SOME APPLICATIONS
The strong subadditivity is the inequality
S(D
123
) + S(D
2
) S(D
12
) + S(D
23
), (7.6)
which is equivalent to
Tr D
123
(log D
123
(log D
12
log D
2
+ log D
23
)) 0.
The operator
exp(log D
12
log D
2
+ log D
23
)
is positive and can be written as D for a density matrix D. Actually,
= Tr exp(log D
12
log D
2
+ log D
23
).
We have
S(D
12
) + S(D
23
) S(D
123
) S(D
2
)
= Tr D
123
(log D
123
(log D
12
log D
2
+ log D
23
))
= S(D
123
|D) = S(D
123
|D) log .
Here S(X|Y ) := Tr X(log X log Y ) is the relative entropy. If X and Y
are density matrices, then S(X|Y ) 0, see the Streater inequality (3.13).
Therefore, 1 implies the positivity of the left-hand side (and the strong
subadditivity). Due to Theorem 4.55, we have
Tr exp(log D
12
log D
2
+log D
23
))
_

0
Tr D
12
(tI +D
2
)
1
D
23
(tI +D
2
)
1
dt.
Applying the partial traces we have
Tr D
12
(tI + D
2
)
1
D
23
(tI + D
2
)
1
= Tr D
2
(tI + D
2
)
1
D
2
(tI + D
2
)
1
and that can be integrated out. Hence
_

0
Tr D
12
(tI + D
2
)
1
D
23
(tI + D
2
)
1
dt = Tr D
2
= 1
and 1 is obtained and the strong subadditivity is proven.
If the equality holds in (7.6), then exp(log D
12
log D
2
+ log D
23
) is a
density matrix and
S(D
123
| exp(log D
12
log D
2
+ log D
23
)) = 0
implies
log D
123
= log D
12
log D
2
+ log D
23
.
This is the necessary and sucient condition for the equality.
7.2. ENTROPIES AND MONOTONICITY 283
For a density matrix D one can dene the q-entropy as
S
q
(D) =
1 Tr D
q
q 1
=
Tr (D
q
D)
1 q
(q > 1).
This is also called the quantum Tsallis entropy. The limit q 1 is the
von Neumann entropy.
The next theorem is the subadditivity of the q-entropy. The result has an
elementary proof, but it was not known for several years.
Theorem 7.4 When the density matrix D M
n
M
m
has the partial den-
sities D
1
:= Tr
2
D and D
2
:= Tr
1
D, the subadditivity inequality S
q
(D)
S
q
(D
1
) + S
q
(D
2
), or equivalently
Tr D
q
1
+ Tr D
q
2
= |D
1
|
q
q
+|D
2
|
q
q
1 +|D|
q
q
= 1 + Tr D
q
holds for q 1.
Proof: It is enough to show the case q > 1. First we use the q-norms and
we prove
1 +|D|
q
|D
1
|
q
+|D
2
|
q
. (7.7)
Lemma 7.5 below will be used.
If 1/q + 1/q

= 1, then for A 0 we have


|A|
q
:= maxTr AB : B 0, |B|
q
1.
It follows that
|D
1
|
q
= Tr XD
1
and |D
2
|
q
= Tr Y D
2
with some X 0 and Y 0 such that |X|
q
1 and |Y |
q
1. It follows
from Lemma 7.5 that
|(X I
m
+ I
n
Y I
n
I
m
)
+
|
q
1
and we have Z 0 such that
Z X I
m
+ I
n
Y I
n
I
m
and |Z|
q
= 1. It follows that
Tr (ZD) + 1 Tr (X I
m
+ I
n
Y )D = Tr XD
1
+ Tr Y D
2
.
Since
|D|
q
Tr (ZD),
284 CHAPTER 7. SOME APPLICATIONS
we have the inequality (7.7).
We examine the maximum of the function f(x, y) = x
q
+y
q
in the domain
M := (x, y) : 0 x 1, 0 y 1, x + y 1 +|D|
q
.
Since f is convex, it is sucient to check the extreme points (0, 0), (1, 0),
(1, |D|
q
), (|D|
q
, 1), (0, 1). It follows that f(x, y) 1+|D|
q
q
. The inequality
(7.7) gives that (|D
1
|
q
, |D
2
|
q
) M and this gives f(|D
1
|
q
, |D
2
|
q
) 1 +
|D|
q
q
and this is the statement.
Lemma 7.5 For q 1 and for the positive matrices 0 X M
n
and
0 Y M
m
assume that |X|
q
, |Y |
q
1. Then the quantity
|(X I
m
+ I
n
Y I
n
I
m
)
+
|
q
1 (7.8)
holds.
Proof: Let x
i
: 1 i n and y
j
: 1 j m be the eigenvalues of X
and Y , respectively. Then
n

i=1
x
q
i
1,
m

j=1
y
q
j
1
and
|(X I
m
+ I
n
Y I
n
I
m
)
+
|
q
q
=

i,j
((x
i
+ y
j
1)
+
)
q
.
The function a (a + b 1)
+
is convex for any real value of b:
_
a
1
+ a
2
2
+ b 1
_
+

1
2
(a
1
+ b 1)
+
+
1
2
(a
2
+ b 1)
+
.
It follows that the vector-valued function
a ((a + y
j
1)
+
: j)
is convex as well. Since the
q
norm for positive real vectors is convex and
monotonously increasing, we conclude that
f(a) :=
_

j
((a + y
j
1)
+
)
q
_
1/q
7.2. ENTROPIES AND MONOTONICITY 285
is a convex function. Since f(0) = 0 and f(1) = 1, we hav the inequality
f(a) a for 0 a 1. Actually, we need this for x
i
. Since 0 x
i
1,
f(x
i
) x
i
follows and

j
((x
i
+ y
j
1)
+
)
q
=

i
f(x
i
)
q

i
x
q
i
1.
So (7.8) is proved.
The next theorem is stated in the setting of Example 7.3.
Theorem 7.6 The following conditions are equivalent:
(i) S(D
123
) + S(D
2
) = S(D
12
) + S(D
23
);
(ii) D
it
123
D
it
23
= D
it
12
D
it
2
for every real t;
(iii) D
1/2
123
D
1/2
23
= D
1/2
12
D
1/2
2
;
(iv) log D
123
log D
23
= log D
12
log D
2
;
(v) There are positive matrices X M
k
M

and Y M

M
m
such that
D
123
= (X I
m
)(I
k
Y ).
In the mathematical formalism of quantum mechanics, instead of n-tuples
of numbers one works with n n complex matrices. They form an algebra
and this allows an algebraic approach.
For positive denite matrices D
1
, D
2
M
n
, for A M
n
and a function
f : R
+
R, the quasi-entropy is dened as
S
A
f
(D
1
|D
2
) := AD
1/2
2
, f((D
1
/D
2
))(AD
1/2
2
)
= Tr D
1/2
2
A

f((D
1
/D
2
))(AD
1/2
2
), (7.9)
where B, C := Tr B

C is the so-called Hilbert-Schmidt inner product


and (D
1
/D
2
) : M
n
M
n
is a linear mapping acting on matrices as follows:
(D
1
/D
2
)A := D
1
AD
1
2
.
This concept was introduced by Petz in [71, 73]. An alternative terminology
is the quantum f-divergence.
For a positive denite matrix D M
n
the left and the right multiplication
operators acting on matrices are dened by
L
D
(X) := DX , R
D
(X) := XD (X M
n
). (7.10)
286 CHAPTER 7. SOME APPLICATIONS
If we set
J
f
D
1
,D
2
:= f(L
D
1
R
1
D
2
)R
D
2
,
then the quasi-entropy has the form
S
A
f
(D
1
|D
2
) = A, J
f
D
1
,D
2
A . (7.11)
It is clear from the denition that
S
A
f
(D
1
|D
2
) = S
A
f
(D
1
|D
2
)
for a positive number .
Let : M
n
M
m
be a mapping between two matrix algebras. The dual

: M
m
M
n
with respect to the Hilbert-Schmidt inner product is positive
if and only if is positive. Moreover, is unital if and only if

is trace
preserving. : M
n
M
m
is called a Schwarz mapping if
(B

B) (B

)(B) (7.12)
for every B M
n
.
The quasi-entropies are monotone and jointly convex.
Theorem 7.7 Assume that f : R
+
R is a matrix monotone function with
f(0) 0 and : M
n
M
m
is a unital Schwarz mapping. Then
S
A
f
(

(D
1
)|

(D
2
)) S
(A)
f
(D
1
|D
2
) (7.13)
holds for A M
n
and for invertible density matrices D
1
and D
2
from the
matrix algebra M
m
.
Proof: The proof is based on inequalities for matrix monotone and matrix
concave functions. First note that
S
A
f+c
(

(D
1
)|

(D
2
)) = S
A
f
(

(D
1
)|

(D
2
)) + c Tr D
1
(A

A)
and
S
(A)
f+c
(D
1
|D
2
) = S
(A)
f
(D
1
|D
2
) + c Tr D
1
((A)

(A))
for a positive constant c. Due to the Schwarz inequality (7.12), we may
assume that f(0) = 0.
Let := (D
1
/D
2
) and
0
:= (

(D
1
)/

(D
2
)). The operator
V X

(D
2
)
1/2
= (X)D
1/2
2
(X /
0
)
7.2. ENTROPIES AND MONOTONICITY 287
is a contraction:
|(X)D
1/2
2
|
2
= Tr D
2
((X)

(X))
Tr D
2
((X

X) = Tr

(D
2
)X

X = |X

(D
2
)
1/2
|
2
since the Schwarz inequality is applicable to . A similar simple computation
gives that
V

V
0
.
Since f is matrix monotone, we have f(
0
) f(V

V ). Recall that f is
matrix concave. Therefore f(V

V ) V

f()V and we conclude


f(
0
) V

f()V .
Application to the vector A

(D
2
)
1/2
gives the statement.
It is remarkable that for a multiplicative (i.e., is a -homomorphism)
we do not need the condition f(0) 0. Moreover, since V

V =
0
, we do
not need the matrix monotonicity of the function f. In this case the matrix
concavity is the only condition to obtain the result analogous to Theorem
7.7. If we apply the monotonicity (7.13) (with f in place of f) to the
embedding (X) = X X of M
n
into M
n
M
n
M
n
M
2
and to the
densities D
1
= E
1
(1 )F
1
, D
2
= E
2
(1 )F
2
, then we obtain the
joint convexity of the quasi-entropy:
Theorem 7.8 If f : R
+
R is a matrix convex function, then S
A
f
(D
1
|D
2
)
is jointly convex in the variables D
1
and D
2
.
If we consider the quasi-entropy in the terminology of means, then we can
have another proof. The joint convexity of the mean is the inequality
f(L
(A
1
+A
2
)/2
R
1
(B
1
+B
2
)/2
)R
(B
1
+B
2
)/2

1
2
f(L
A
1
R
1
B
1
)R
B
1
+
1
2
f(L
A
2
R
1
B
2
)R
B
2
,
which can be simplied as
f(L
A
1
+A
2
R
1
B
1
+B
2
) R
1/2
B
1
+B
2
R
1/2
B
1
f(L
A
1
R
1
B
1
)R
1/2
B
1
R
1/2
B
1
+B
2
+R
1/2
B
1
+B
2
R
1/2
B
2
f(L
A
2
R
1
B
2
)R
1/2
B
2
R
1/2
B
1
+B
2
= Cf(L
A
1
R
1
B
1
)C

+ Df(L
A
2
R
1
B
2
)D

.
Here CC

+ DD

= I and
C(L
A
1
R
1
B
1
)C

+ D(L
A
2
R
1
B
2
)D

= L
A
1
+A
2
R
1
B
1
+B
2
.
288 CHAPTER 7. SOME APPLICATIONS
So the joint convexity of the quasi-entropy has the form
f(CXC

+ DY D

) Cf(X)C

+ Df(Y )D

which is true for a matrix convex function f, see Theorem 4.22.


Example 7.9 The concept of quasi-entropies includes some important spe-
cial cases. If f(t) = t

, then
S
A
f
(D
1
|D
2
) = Tr A

1
AD
1
2
.
If 0 < < 1, then f is matrix monotone. The joint concavity in (D
1
, D
2
) is
the famous Liebs concavity theorem [63].
In the case where A = I we have a kind of relative entropy. For f(x) =
xlog x we have Umegakis relative entropy
S(D
1
|D
2
) = Tr D
1
(log D
1
log D
2
).
(If we want a matrix monotone function, then we can take f(x) = log x and
then we have S(D
2
|D
1
).) Umegakis relative entropy is the most important
example.
Let
f

(x) =
1
(1 )
(1 x

).
This function is matrix monotone decreasing for (1, 1). (For = 0, the
limit is taken and it is log x.) Then the relative entropies of degree
are produced:
S

(D
1
|D
2
) :=
1
(1 )
Tr (I D

1
D

2
)D
2
.
These quantities are essential in the quantum case.
Let /
n
be the set of positive denite density matrices in M
n
. This is a dif-
ferentiable manifold and the set of tangent vectors is A = A

M
n
: Tr A =
0. A Riemannian metric is a family of real inner products
D
(A, B) on the
tangent vectors. For a function f : (0, ) (0, ) with xf(x
1
) = f(x), the
possible denition is similar to (7.11): for A, B M
sa
n
with Tr A = Tr B = 0,

f
D
(A, B) := Tr A(J
f
D
)
1
(B). (7.14)
(Here J
f
D
= J
f
D,D
.) The condition xf(x
1
) = f(x) implies that (J
f
D
)
1
(B)
M
sa
n
if B M
sa
n
. Hence (7.14) actually denes a real inner product.
7.2. ENTROPIES AND MONOTONICITY 289
By a monotone metric we mean a family
D
of Riemannian metrics on
all manifolds /
n
such that

(D)
((A), (A))
D
(A, A) (7.15)
for every completely positive trace-preserving mapping : M
n
M
m
and
every A M
sa
n
with Tr A = 0. If f is matrix monotone, then
f
D
satises this
monotonicity, see [72].
Let : M
n
M
2
M
n
be dened as
_
B
11
B
12
B
21
B
22
_
B
11
+ B
22
.
This is completely positive and trace-preserving, which is a so-called partial
trace. For
D =
_
D
1
0
0 (1 )D
2
_
, A =
_
A
1
0
0 (1 )A
2
_
the inequality (7.15) gives

D
1
+(1)D
2
(A
1
+ (1 )A
2
, A
1
+ (1 )A
2
)

D
1
(A
1
, A
1
) +
(1)D
2
((1 )A
2
, (1 )A
2
).
Since
tD
(tA, tB) = t
D
(A, B), we obtain the joint convexity:
Theorem 7.10 For a matrix monotone function f, the monotone metric

f
D
(A, A) is a jointly convex function of (D, A) of positive denite D and
general A M
n
.
Now let f : (0, ) (0, ) be a continuous function; the denition of f
at 0 is not necessary here. Dene g, h : (0, ) (0, ) by g(x) := xf(x
1
)
and
h(x) :=
_
f(x)
1
+ g(x)
1
2
_
1
, x > 0.
Obviously, h is symmetric, i.e., h(x) = xh(x
1
) for x > 0, so we may call h
the harmonic symmetrization of f.
The dierence between two parameters in J
f
D
1
,D
2
and one parameter in
J
f
D,D
is not essential if the matrix size can be changed. We need the next
lemma.
Lemma 7.11 For D
1
, D
2
> 0 and general X in M
n
let
D :=
_
D
1
0
0 D
2
_
, Y :=
_
0 X
0 0
_
, A :=
_
0 X
X

0
_
.
290 CHAPTER 7. SOME APPLICATIONS
Then
Y, (J
f
D
)
1
Y = X, (J
f
D
1
,D
2
)
1
X, (7.16)
A, (J
f
D
)
1
A = 2X, (J
h
D
1
,D
2
)
1
X. (7.17)
Proof: First we show that
(J
f
D
)
1
_
X
11
X
12
X
21
X
22
_
=
_
(J
f
D
1
)
1
X
11
(J
f
D
1
,D
2
)
1
X
12
(J
f
D
2
,D
1
)
1
X
21
(J
f
D
2
)
1
X
22
_
. (7.18)
Since continuous functions can be approximated by polynomials, it is enough
to check (7.18) for f(x) = x
k
, which is easy. From (7.18), (7.16) is obvious
and
A, (J
f
D
)
1
A = X, (J
f
D
1
,D
2
)
1
X +X

, (J
f
D
2
,D
1
)
1
X

.
From the spectral decompositions
D
1
=

i
P
i
and D
2
=

j
Q
j
we have
J
f
D
1
,D
2
A =

i,j
m
f
(
i
,
j
)P
i
AQ
j
and
X, (J
g
D
1
,D
2
)
1
X =

i,j
m
g
(
i
,
j
)Tr X

P
i
XQ
j
=

i,j
m
f
(
j
,
i
)Tr XQ
j
X

P
i
= X

, (J
f
D
2
,D
1
)
1
X

. (7.19)
Therefore,
A, (J
f
D
)
1
A = X, (J
f
D
1
,D
2
)
1
X +X, (J
g
D
1
,D
2
)
1
X = 2X, (J
h
D
1
,D
2
)
1
X.

Theorem 7.12 In the above situation consider the following conditions:

(i) f is matrix monotone,

(ii) (D, A) ↦ ⟨A, (J_D^f)^{−1} A⟩ is jointly convex in positive definite D and general A in M_n for every n,

(iii) (D_1, D_2, A) ↦ ⟨A, (J_{D_1,D_2}^f)^{−1} A⟩ is jointly convex in positive definite D_1, D_2 and general A in M_n for every n,

(iv) (D, A) ↦ ⟨A, (J_D^f)^{−1} A⟩ is jointly convex in positive definite D and self-adjoint A in M_n for every n,

(v) h is matrix monotone.

Then (i) ⇔ (ii) ⇔ (iii) ⇒ (iv) ⇔ (v).
Proof: (i) ⇒ (ii) is Theorem 7.10 and (ii) ⇒ (iii) follows from (7.16). We prove (iii) ⇒ (i). For each ξ ∈ C^n let X_ξ := [ξ 0 ⋯ 0] ∈ M_n, i.e., the first column of X_ξ is ξ and all other entries of X_ξ are zero. When D_2 = I and X = X_ξ, we have for D > 0 in M_n

    ⟨X_ξ, (J_{D,I}^f)^{−1} X_ξ⟩ = ⟨X_ξ, f(D)^{−1} X_ξ⟩ = ⟨ξ, f(D)^{−1} ξ⟩.

Hence it follows from (iii) that ⟨ξ, f(D)^{−1} ξ⟩ is jointly convex in D > 0 in M_n and ξ ∈ C^n. By a standard convergence argument we see that (D, ξ) ↦ ⟨ξ, f(D)^{−1} ξ⟩ is jointly convex for positive invertible D ∈ B(ℋ) and ξ ∈ ℋ, where B(ℋ) is the set of bounded operators on a separable infinite-dimensional Hilbert space ℋ. Now Theorem 3.1 in [9] is used to conclude that 1/f is matrix monotone decreasing, so f is matrix monotone.

(ii) ⇒ (iv) is trivial. Assume (iv); then it follows from (7.17) that (iii) holds for h instead of f, so (v) holds thanks to (iii) ⇒ (i) for h. From (7.19) when A = A* and D_1 = D_2 = D, it follows that

    ⟨A, (J_D^f)^{−1} A⟩ = ⟨A, (J_D^g)^{−1} A⟩ = ⟨A, (J_D^h)^{−1} A⟩.

Hence (v) implies (iv) by applying (i) ⇒ (ii) to h.    □
Example 7.13 The χ²-divergence

    χ²(p, q) := Σ_i (p_i − q_i)²/q_i = Σ_i (p_i/q_i − 1)² q_i

was first introduced by Karl Pearson in 1900 for probability densities p and q. Since

    ( Σ_i |p_i − q_i| )² = ( Σ_i |p_i/q_i − 1| √q_i · √q_i )² ≤ Σ_i (p_i/q_i − 1)² q_i ,

we have

    ‖p − q‖_1² ≤ χ²(p, q).      (7.20)

A quantum generalization was introduced very recently: for density matrices ρ and σ,

    χ²_α(ρ, σ) = Tr ( (ρ − σ) σ^{−α} (ρ − σ) σ^{α−1} ) = Tr ρ σ^{−α} ρ σ^{α−1} − 1 = ⟨ρ, (J_σ^f)^{−1} ρ⟩ − 1,

where α ∈ [0, 1] and f(x) = x^α. If ρ and σ commute, then this formula is independent of α.

The monotonicity of the χ²-divergence follows from (7.15). The monotonicity and the classical inequality (7.20) imply that

    ‖ρ − σ‖_1² ≤ χ²_α(ρ, σ).

Indeed, if E is the conditional expectation onto the commutative algebra generated by ρ − σ, then

    ‖ρ − σ‖_1² = ‖E(ρ) − E(σ)‖_1² ≤ χ²_α(E(ρ), E(σ)) ≤ χ²_α(ρ, σ).    □
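A small numerical sketch (not from the book; the density matrices are random choices) computes χ²_α(ρ, σ) and checks the bound ‖ρ − σ‖_1² ≤ χ²_α(ρ, σ) stated above.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow

# Sketch: the quantum chi^2-divergence of degree alpha and the trace-norm bound.

def rand_state(d, rng):
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    R = X @ X.conj().T
    return R / np.trace(R).real

def chi2(rho, sigma, alpha):
    return np.trace(rho @ mpow(sigma, -alpha) @ rho @ mpow(sigma, alpha - 1)).real - 1.0

rng = np.random.default_rng(1)
rho, sigma = rand_state(3, rng), rand_state(3, rng)
trace_norm = np.abs(np.linalg.eigvalsh(rho - sigma)).sum()   # ||rho - sigma||_1
for alpha in (0.0, 0.25, 0.5, 1.0):
    assert trace_norm ** 2 <= chi2(rho, sigma, alpha) + 1e-9
print("bound (7.20) holds for the quantum chi^2-divergence in this example")
```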
7.3 Quantum Markov triplets

The CCR-algebra used in this section is an infinite-dimensional C*-algebra, but its parametrization will be by a finite-dimensional Hilbert space ℋ. (CCR is the abbreviation of canonical commutation relation and the book [70] contains the details.)

Assume that for every f ∈ ℋ a unitary operator W(f) is given so that the relations

    W(f_1)W(f_2) = W(f_1 + f_2) exp(i σ(f_1, f_2)),    W(−f) = W(f)*

hold for f_1, f_2, f ∈ ℋ with σ(f_1, f_2) := Im ⟨f_1, f_2⟩. The C*-algebra generated by these unitaries is unique and denoted by CCR(ℋ). Given a positive operator A ∈ B(ℋ), a functional ω_A : CCR(ℋ) → C can be defined as

    ω_A(W(f)) := exp ( −‖f‖²/2 − ⟨f, Af⟩ ).

This is called a Gaussian or quasi-free state. In the so-called Fock representation of CCR(ℋ) the quasi-free state ω_A has the density operator D_A, D_A ≥ 0 and Tr D_A = 1. We do not describe here D_A but we remark that if the λ_i's are the eigenvalues of A, then D_A has the eigenvalues

    Π_i (1/(1 + λ_i)) (λ_i/(1 + λ_i))^{n_i},

where the n_i's are non-negative integers. Therefore the von Neumann entropy is

    S(ω_A) := −Tr D_A log D_A = Tr κ(A),      (7.21)

where κ(t) := −t log t + (t + 1) log(t + 1) is an interesting special function.
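As a concrete check (not in the original text; the numbers are arbitrary), the sketch below verifies that κ(λ) coincides with the Shannon entropy of the geometric eigenvalue distribution quoted above, and evaluates (7.21) for a small positive matrix A.

```python
import numpy as np

# Sketch: S(omega_A) = Tr kappa(A), kappa(t) = -t log t + (t+1) log(t+1).
# For a single eigenvalue lam of A, D_A has eigenvalues (1/(1+lam)) (lam/(1+lam))^n,
# so kappa(lam) equals the Shannon entropy of that geometric distribution.

def kappa(t):
    return -t * np.log(t) + (t + 1) * np.log(t + 1)

lam = 0.7                                    # an arbitrary eigenvalue of A
n = np.arange(0, 200)                        # truncate the geometric distribution
p = (1.0 / (1.0 + lam)) * (lam / (1.0 + lam)) ** n
shannon = -(p * np.log(p)).sum()
print(kappa(lam), shannon)                   # the two numbers agree (up to truncation)

# For a positive matrix A the entropy is the sum of kappa over the eigenvalues of A.
A = np.array([[0.6, 0.2], [0.2, 0.9]])
print(kappa(np.linalg.eigvalsh(A)).sum())    # = Tr kappa(A)
```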
Assume that ℋ = ℋ_1 ⊕ ℋ_2 and write the positive mapping A ∈ B(ℋ) in the form of a block matrix:

    A = ( A_{11}  A_{12} ; A_{21}  A_{22} ).

If f ∈ ℋ_1, then

    ω_A(W(f ⊕ 0)) = exp ( −‖f‖²/2 − ⟨f, A_{11} f⟩ ).

Therefore the restriction of the quasi-free state ω_A to CCR(ℋ_1) is the quasi-free state ω_{A_{11}}.

Let ℋ = ℋ_1 ⊕ ℋ_2 ⊕ ℋ_3 be a finite-dimensional Hilbert space and consider the CCR-algebras CCR(ℋ_i) (1 ≤ i ≤ 3). Then

    CCR(ℋ) = CCR(ℋ_1) ⊗ CCR(ℋ_2) ⊗ CCR(ℋ_3)

holds. Assume that D_{123} is a density operator in CCR(ℋ) and we denote by D_{12}, D_2, D_{23} its reductions into the subalgebras CCR(ℋ_1) ⊗ CCR(ℋ_2), CCR(ℋ_2), CCR(ℋ_2) ⊗ CCR(ℋ_3), respectively. These subalgebras form a Markov triplet with respect to the state D_{123} if

    S(D_{123}) − S(D_{23}) = S(D_{12}) − S(D_2),      (7.22)

where S denotes the von Neumann entropy and we assume that both sides are finite in the equation. (Note that (7.22) is the quantum analogue of (7.5).)
Now we concentrate on the Markov property of a quasi-free state ω_A ≡ ω_{123} with the density operator D_{123}, where A is a positive operator acting on ℋ = ℋ_1 ⊕ ℋ_2 ⊕ ℋ_3 and it has the block matrix form

    A = ( A_{11}  A_{12}  A_{13} ; A_{21}  A_{22}  A_{23} ; A_{31}  A_{32}  A_{33} ).

Then the restrictions D_{12}, D_{23} and D_2 are also Gaussian states with the positive operators

    B = ( A_{11}  A_{12}  0 ; A_{21}  A_{22}  0 ; 0  0  I ),
    C = ( I  0  0 ; 0  A_{22}  A_{23} ; 0  A_{32}  A_{33} )    and
    D = ( I  0  0 ; 0  A_{22}  0 ; 0  0  I ),

respectively. Formula (7.21) tells us that the Markov condition (7.22) is equivalent to

    Tr κ(A) + Tr κ(D) = Tr κ(B) + Tr κ(C).

(This kind of condition appeared already in the study of strongly subadditive functions, see Theorem 4.50.)
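A numerical sketch of this condition (not from the book; the block sizes and matrix entries are arbitrary choices with each ℋ_i one-dimensional) compares the two sides for a block-diagonal A, which turns out Markovian, and for a generic A with A_13 ≠ 0, where the inequality is strict.

```python
import numpy as np

# Sketch: Tr kappa(A) + Tr kappa(D) versus Tr kappa(B) + Tr kappa(C) for 3x3 A.

def kappa_tr(A):
    lam = np.linalg.eigvalsh(A)
    return np.sum(-lam * np.log(lam) + (lam + 1) * np.log(lam + 1))

def blocks(A):
    B = A.copy(); B[2, :] = 0; B[:, 2] = 0; B[2, 2] = 1   # keep the (1,2)-corner
    C = A.copy(); C[0, :] = 0; C[:, 0] = 0; C[0, 0] = 1   # keep the (2,3)-corner
    D = np.diag([1.0, A[1, 1], 1.0])                      # keep only A_22
    return B, C, D

# Markovian example: P = diag(1,1,0) commutes with A (condition (c) of Theorem 7.14).
A_markov = np.array([[1.0, 0.3, 0.0],
                     [0.3, 0.8, 0.0],
                     [0.0, 0.0, 0.5]])
# Generic example with A_13 != 0.
A_gen = np.array([[1.0, 0.3, 0.2],
                  [0.3, 0.8, 0.4],
                  [0.2, 0.4, 0.9]])

for A in (A_markov, A_gen):
    B, C, D = blocks(A)
    lhs, rhs = kappa_tr(A) + kappa_tr(D), kappa_tr(B) + kappa_tr(C)
    print(lhs, rhs, "equal" if abs(lhs - rhs) < 1e-10 else "strict inequality")
```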
Denote by P_i the orthogonal projection from ℋ onto ℋ_i, 1 ≤ i ≤ 3. Of course, P_1 + P_2 + P_3 = I, and we also use the notation P_{12} := P_1 + P_2 and P_{23} := P_2 + P_3.

Theorem 7.14 Assume that A ∈ B(ℋ) is a positive invertible operator and the corresponding quasi-free state is denoted as ω_A ≡ ω_{123} on CCR(ℋ). Then the following conditions are equivalent.

(a) S(ω_{123}) + S(ω_2) = S(ω_{12}) + S(ω_{23});

(b) Tr κ(A) + Tr κ(P_2 A P_2) = Tr κ(P_{12} A P_{12}) + Tr κ(P_{23} A P_{23});

(c) There is a projection P ∈ B(ℋ) such that P_1 ≤ P ≤ P_1 + P_2 and PA = AP.
Proof: Due to the formula (7.21), (a) and (b) are equivalent.

Condition (c) tells us that the matrix A has a special form: there is a decomposition ℋ_2 = ℋ_2^L ⊕ ℋ_2^R such that, as a block matrix with respect to ℋ_1 ⊕ ℋ_2^L ⊕ ℋ_2^R ⊕ ℋ_3,

    A = ( A_{11}  a   0   0 ;
          a*      c   0   0 ;
          0       0   d   b ;
          0       0   b*  A_{33} ),      (7.23)

where the parameters a, b, c, d (and 0) are operators. This is a block diagonal matrix:

    A = ( A_1  0 ; 0  A_2 ),

and the projection P is ( I  0 ; 0  0 ) in this setting.

The Hilbert space ℋ_2 is decomposed as ℋ_2^L ⊕ ℋ_2^R, where ℋ_2^L is the range of the projection PP_2. Therefore,

    CCR(ℋ) = CCR(ℋ_1 ⊕ ℋ_2^L) ⊗ CCR(ℋ_2^R ⊕ ℋ_3)

and ω_{123} becomes a product state ω_L ⊗ ω_R. From this we can easily show the implication (c) ⇒ (a).
The essential part is the proof of (b) ⇒ (c). Now assume (b), that is,

    Tr κ(A) + Tr κ(A_{22}) = Tr κ(B) + Tr κ(C).

We notice that the function κ(x) = −x log x + (x + 1) log(x + 1) admits the integral representation

    κ(x) = ∫_1^∞ t^{−2} log(tx + 1) dt.

By Theorem 4.50 applied to tA + I we have

    Tr log(tA + I) + Tr log(tA_{22} + I) ≤ Tr log(tB + I) + Tr log(tC + I)      (7.24)

for every t > 1. Hence it follows from (7.24) that equality holds in (7.24) for almost every t > 1. By Theorem 4.50 again this implies that

    tA_{13} = tA_{12} (tA_{22} + I)^{−1} tA_{23}

for almost every t > 1. The continuity gives that actually for every t > 1 we have

    A_{13} = A_{12} (A_{22} + t^{−1} I)^{−1} A_{23}.

Since A_{12}(A_{22} + zI)^{−1}A_{23} is an analytic function in {z ∈ C : Re z > 0}, we have

    A_{13} = A_{12} (A_{22} + sI)^{−1} A_{23}    (s ∈ R_+).

Letting s → ∞ shows that A_{13} = 0. Since A_{12} s(A_{22} + sI)^{−1} A_{23} → A_{12}A_{23} as s → ∞, we also have A_{12}A_{23} = 0. The latter condition means that ran A_{23} ⊂ ker A_{12}, or equivalently (ker A_{12})^⊥ ⊂ ker A_{23}*.

The linear combinations of the functions x ↦ 1/(s + x) form an algebra and due to the Stone-Weierstrass theorem A_{12} g(A_{22}) A_{23} = 0 for any continuous function g.

We want to show that the equality implies the structure (7.23) of the operator A. We have A_{23} : ℋ_3 → ℋ_2 and A_{12} : ℋ_2 → ℋ_1. To show the structure (7.23), we have to find a subspace H ⊂ ℋ_2 such that

    A_{22} H ⊂ H,    H^⊥ ⊂ ker A_{12},    H ⊂ ker A_{23}*,

or alternatively K (= H^⊥) ⊂ ℋ_2 should be an invariant subspace of A_{22} such that

    ran A_{23} ⊂ K ⊂ ker A_{12}.

Let

    K := { Σ_i A_{22}^{n_i} A_{23} x_i : x_i ∈ ℋ_3, n_i ≥ 0 }

be the set of finite sums. It is a subspace of ℋ_2. The property ran A_{23} ⊂ K and the invariance under A_{22} are obvious. Since

    A_{12} A_{22}^n A_{23} x = 0,

K ⊂ ker A_{12} also follows. The proof is complete.    □
In the theorem it was assumed that ℋ is a finite-dimensional Hilbert space, but the proof works also in infinite dimension. In the theorem the formula (7.23) shows that A should be a block diagonal matrix. There are nontrivial Markovian Gaussian states which are not a product in the time localization (ℋ = ℋ_1 ⊕ ℋ_2 ⊕ ℋ_3). However, the first and the third subalgebras are always independent.

The next two theorems give different descriptions (but they are not essentially different).

Theorem 7.15 For a quasi-free state ω_A the Markov property (7.22) is equivalent to the condition

    A^{it}(I + A)^{−it} D^{it}(I + D)^{−it} = B^{it}(I + B)^{−it} C^{it}(I + C)^{−it}

for every real t.

Theorem 7.16 The block matrix

    A = ( A_{11}  A_{12}  A_{13} ; A_{21}  A_{22}  A_{23} ; A_{31}  A_{32}  A_{33} )

gives a Gaussian state with the Markov property if and only if

    A_{13} = A_{12} f(A_{22}) A_{23}

for any continuous function f : R → R.

This shows that the CCR condition is much more restrictive than the classical one.
7.4 Optimal quantum measurements

In the matrix formalism the state of a quantum system is a density matrix 0 ≤ ρ ∈ M_d(C) with the property Tr ρ = 1. A finite set {F(x) : x ∈ X} of positive matrices is called a positive operator-valued measure (POVM) if

    Σ_{x∈X} F(x) = I,

where F(x) ≠ 0 can be assumed. Quantum state tomography can recover the state ρ from the probability set {Tr ρF(x) : x ∈ X}. In this section there are arguments for the optimal POVM set. There are a few rules from quantum theory, but the essential part is the frames in the Hilbert space M_d(C).

The space M_d(C) of matrices equipped with the Hilbert-Schmidt inner product ⟨A|B⟩ = Tr A*B is a Hilbert space. We use the bra-ket notation for operators: ⟨A| is an operator bra and |B⟩ is an operator ket. Then |A⟩⟨B| is a linear mapping M_d(C) → M_d(C). For example,

    |A⟩⟨B|C⟩ = (Tr B*C) A,    (|A⟩⟨B|)* = |B⟩⟨A|,
    |𝒜_1(A)⟩⟨𝒜_2(B)| = 𝒜_1 |A⟩⟨B| 𝒜_2*    when 𝒜_1, 𝒜_2 : M_d(C) → M_d(C).

For an orthonormal basis {|E_k⟩ : 1 ≤ k ≤ d²} of M_d(C), a linear superoperator 𝒮 : M_d(C) → M_d(C) can then be written as 𝒮 = Σ_{j,k} s_{jk} |E_j⟩⟨E_k| and its action is defined as

    𝒮|A⟩ = Σ_{j,k} s_{jk} |E_j⟩⟨E_k|A⟩ = Σ_{j,k} s_{jk} E_j Tr (E_k* A).

We denote the identity superoperator as ℐ, and so ℐ = Σ_k |E_k⟩⟨E_k|.

The Hilbert space M_d(C) has an orthogonal decomposition

    {cI : c ∈ C} ⊕ {A ∈ M_d(C) : Tr A = 0}.

In the block-matrix form under this decomposition,

    ℐ = ( 1  0 ; 0  I_{d²−1} )    and    |I⟩⟨I| = ( d  0 ; 0  0 ).
Let X be a finite set. An operator frame is a family of operators {A(x) : x ∈ X} for which there exists a constant a > 0 such that

    a ⟨C|C⟩ ≤ Σ_{x∈X} |⟨A(x)|C⟩|²      (7.25)

for all C ∈ M_d(C). The frame superoperator is defined as

    𝒜 = Σ_{x∈X} |A(x)⟩⟨A(x)|.

It has the properties

    𝒜 B = Σ_{x∈X} |A(x)⟩⟨A(x)|B⟩ = Σ_{x∈X} |A(x)⟩ Tr A(x)*B,
    Tr 𝒜² = Σ_{x,y∈X} |⟨A(x)|A(y)⟩|².      (7.26)

The operator 𝒜 is positive (and self-adjoint), since

    ⟨B|𝒜|B⟩ = Σ_{x∈X} |⟨A(x)|B⟩|² ≥ 0.

Since this formula shows that (7.25) is equivalent to

    aℐ ≤ 𝒜,

it follows that (7.25) holds if and only if 𝒜 has an inverse. The frame is called tight if 𝒜 = aℐ.

Let λ : X → (0, ∞). Then {A(x) : x ∈ X} is an operator frame if and only if {λ(x)A(x) : x ∈ X} is an operator frame.

Let {A_i ∈ M_d(C) : 1 ≤ i ≤ k} be a subset of M_d(C) such that its linear span is M_d(C). (Then k ≥ d².) This is a simple example of an operator frame. If k = d², then the operator frame is tight if and only if {A_i ∈ M_d(C) : 1 ≤ i ≤ d²} is an orthonormal basis up to a multiple constant.

A set {A(x) : x ∈ X} of positive matrices is informationally complete (IC) if for each pair of distinct quantum states ρ ≠ σ there exists an element x ∈ X such that Tr ρA(x) ≠ Tr σA(x). When the A(x)'s are of unit rank we call A rank-one. It is clear that for numbers λ(x) > 0 the set {A(x) : x ∈ X} is IC if and only if {λ(x)A(x) : x ∈ X} is IC.

Theorem 7.17 Let {F(x) : x ∈ X} be a POVM. Then F is informationally complete if and only if {F(x) : x ∈ X} is an operator frame.
Proof: We use the notation

    𝒜 = Σ_{x∈X} |F(x)⟩⟨F(x)|,      (7.27)

which is a positive operator.

Suppose that F is informationally complete and take an operator A = A_1 + iA_2 in self-adjoint decomposition such that

    ⟨A|𝒜|A⟩ = Σ_{x∈X} |Tr F(x)A|² = Σ_{x∈X} |Tr F(x)A_1|² + Σ_{x∈X} |Tr F(x)A_2|² = 0;

then we must have Tr F(x)A_1 = Tr F(x)A_2 = 0. The operators A_1 and A_2 are traceless:

    Tr A_i = Σ_{x∈X} Tr F(x)A_i = 0    (i = 1, 2).

Take a positive definite state ρ and a small number ε > 0. Then ρ + εA_i can be a state and we have

    Tr F(x)(ρ + εA_i) = Tr F(x)ρ    (x ∈ X).

The informationally complete property gives A_1 = A_2 = 0 and so A = 0. It follows that 𝒜 is invertible and the operator frame property comes.

For the converse, assume that for the distinct quantum states ρ ≠ σ we have

    ⟨ρ − σ|𝒜|ρ − σ⟩ = Σ_{x∈X} |Tr F(x)(ρ − σ)|² > 0.

Then there must exist an x ∈ X such that

    Tr (F(x)(ρ − σ)) ≠ 0,

or equivalently, Tr F(x)ρ ≠ Tr F(x)σ, which means that F is informationally complete.    □
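The following sketch (not from the book; the POVMs are arbitrary constructions) realizes the frame superoperator as a d² × d² matrix by column-stacking and uses its rank to test informational completeness, as in Theorem 7.17.

```python
import numpy as np
from scipy.linalg import sqrtm

# Sketch: frame superoperator of a POVM and the invertibility criterion.
# vec() identifies M_d(C) with C^{d^2}, so <A|B> = Tr A*B is the usual inner product.

def vec(A):
    return A.reshape(-1)

def frame_superoperator(povm):
    return sum(np.outer(vec(F), vec(F).conj()) for F in povm)

d = 2
# Not informationally complete: a von Neumann measurement in a single basis.
basis_povm = [np.diag([1, 0]).astype(complex), np.diag([0, 1]).astype(complex)]

# Generically informationally complete: 4 random positive matrices rescaled to sum to I.
rng = np.random.default_rng(2)
raw = [(lambda X: X @ X.conj().T)(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
       for _ in range(4)]
S = sum(raw)
T = np.linalg.inv(sqrtm(S))          # T S T = I, so the rescaled elements form a POVM
ic_povm = [T @ R @ T for R in raw]

for name, povm in (("one basis", basis_povm), ("random 4-element", ic_povm)):
    rank = np.linalg.matrix_rank(frame_superoperator(povm))
    print(name, "rank =", rank, "-> IC" if rank == d * d else "-> not IC")
```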
Suppose that a POVM {F(x) : x ∈ X} is used for quantum measurement when the state is ρ. The outcome of the measurement is an element x ∈ X and its probability is p(x) = Tr ρF(x). If N measurements are performed on N independent quantum systems (in the same state), then the results are y_1, . . . , y_N. The outcome x ∈ X occurs with some multiplicity and the estimate for the probability is

    p̂(x) = p̂(x; y_1, . . . , y_N) := (1/N) Σ_{k=1}^N δ(x, y_k).      (7.28)

From this information the state estimate has the form

    ρ̂ = Σ_{x∈X} p̂(x) Q(x),

where {Q(x) : x ∈ X} is a set of matrices. If we require that

    ρ = Σ_{x∈X} Tr (ρF(x)) Q(x)

should hold for every state ρ, then {Q(x) : x ∈ X} should satisfy some conditions. This idea needs the concept of a dual frame.

For a frame {A(x) : x ∈ X}, a dual frame {B(x) : x ∈ X} is a frame such that

    Σ_{x∈X} |B(x)⟩⟨A(x)| = ℐ,

or equivalently for all C ∈ M_d(C) we have

    C = Σ_{x∈X} ⟨A(x)|C⟩ B(x) = Σ_{x∈X} ⟨B(x)|C⟩ A(x).

The existence of a dual frame is equivalent to the frame inequality (7.25), but we also have a canonical construction: the canonical dual frame is defined by the operators

    |Ã(x)⟩ := 𝒜^{−1}|A(x)⟩.      (7.29)

Recall that the inverse of 𝒜 exists whenever {A(x) : x ∈ X} is an operator frame. Note that given any operator frame {A(x) : x ∈ X} we can construct a tight frame as {𝒜^{−1/2}|A(x)⟩ : x ∈ X}.
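A short sketch (not from the book; the frame is a random choice) constructs the canonical dual (7.29) and checks the resolution of the identity Σ_x |Ã(x)⟩⟨A(x)| = ℐ numerically.

```python
import numpy as np

# Sketch: canonical dual frame in the vec() picture; Atilde(x) = A^{-1} A(x).

def vec(A):
    return A.reshape(-1)

d, n = 2, 6
rng = np.random.default_rng(3)
frame = [rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)) for _ in range(n)]

A = sum(np.outer(vec(F), vec(F).conj()) for F in frame)       # frame superoperator
A_inv = np.linalg.inv(A)
dual = [(A_inv @ vec(F)).reshape(d, d) for F in frame]         # canonical dual frame

resolution = sum(np.outer(vec(Dx), vec(Fx).conj()) for Dx, Fx in zip(dual, frame))
print(np.allclose(resolution, np.eye(d * d)))                  # True
```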
Theorem 7.18 If {Ã(x) : x ∈ X} is the canonical dual of an operator frame {A(x) : x ∈ X} with superoperator 𝒜, then

    𝒜^{−1} = Σ_{x∈X} |Ã(x)⟩⟨Ã(x)|

and the canonical dual of {Ã(x) : x ∈ X} is {A(x) : x ∈ X}. For an arbitrary dual frame {B(x) : x ∈ X} of {A(x) : x ∈ X} the inequality

    Σ_{x∈X} |B(x)⟩⟨B(x)| ≥ Σ_{x∈X} |Ã(x)⟩⟨Ã(x)|

holds and equality holds only if B ≡ Ã.

Proof: 𝒜 and 𝒜^{−1} are self-adjoint superoperators and we have

    Σ_{x∈X} |Ã(x)⟩⟨Ã(x)| = Σ_{x∈X} |𝒜^{−1}A(x)⟩⟨𝒜^{−1}A(x)|
        = 𝒜^{−1} ( Σ_{x∈X} |A(x)⟩⟨A(x)| ) 𝒜^{−1} = 𝒜^{−1} 𝒜 𝒜^{−1} = 𝒜^{−1}.

The second statement is 𝒜|Ã(x)⟩ = |A(x)⟩, which comes immediately from |Ã(x)⟩ = 𝒜^{−1}|A(x)⟩.

Let B be a dual frame of A and define D(x) := B(x) − Ã(x). Then

    Σ_{x∈X} |Ã(x)⟩⟨D(x)| = Σ_{x∈X} ( |Ã(x)⟩⟨B(x)| − |Ã(x)⟩⟨Ã(x)| )
        = 𝒜^{−1} Σ_{x∈X} |A(x)⟩⟨B(x)| − 𝒜^{−1} Σ_{x∈X} |A(x)⟩⟨A(x)| 𝒜^{−1}
        = 𝒜^{−1} ℐ − 𝒜^{−1} 𝒜 𝒜^{−1} = 0.

The adjoint gives

    Σ_{x∈X} |D(x)⟩⟨Ã(x)| = 0,

and

    Σ_{x∈X} |B(x)⟩⟨B(x)| = Σ_{x∈X} |Ã(x)⟩⟨Ã(x)| + Σ_{x∈X} |Ã(x)⟩⟨D(x)|
        + Σ_{x∈X} |D(x)⟩⟨Ã(x)| + Σ_{x∈X} |D(x)⟩⟨D(x)|
        = Σ_{x∈X} |Ã(x)⟩⟨Ã(x)| + Σ_{x∈X} |D(x)⟩⟨D(x)| ≥ Σ_{x∈X} |Ã(x)⟩⟨Ã(x)|

with equality if and only if D ≡ 0.    □
We have the following inequality, which is also known as the frame bound.

Theorem 7.19 Let {A(x) : x ∈ X} be an operator frame with superoperator 𝒜. Then the inequality

    Σ_{x,y∈X} |⟨A(x)|A(y)⟩|² ≥ (Tr 𝒜)²/d²      (7.30)

holds, and we have equality if and only if {A(x) : x ∈ X} is a tight operator frame.

Proof: Due to (7.26) the left-hand side is Tr 𝒜², so the inequality holds. It is clear that equality holds if and only if all eigenvalues of 𝒜 are the same, that is, 𝒜 = cℐ.    □
The trace measure is defined by τ(x) := Tr F(x). A useful superoperator is

    ℱ = Σ_{x∈X} |F(x)⟩⟨F(x)| (τ(x))^{−1}.

Formally this is different from the frame superoperator (7.27). Therefore, we express the POVM F as

    F(x) = P_0(x) √τ(x)    (x ∈ X),

where {P_0(x) : x ∈ X} is called a positive operator-valued density (POVD). Then

    ℱ = Σ_{x∈X} |P_0(x)⟩⟨P_0(x)| = Σ_{x∈X} |F(x)⟩⟨F(x)| (τ(x))^{−1}.      (7.31)

ℱ is invertible if and only if 𝒜 in (7.27) is invertible. As a corollary, we see that for an informationally complete POVM F, the POVD P_0 can be considered as a generalized operator frame. The canonical dual frame (in the sense of (7.29)) then defines a reconstruction operator-valued density

    |R_0(x)⟩ := ℱ^{−1}|P_0(x)⟩    (x ∈ X).

We also use the notation R(x) := R_0(x) τ(x)^{−1/2}. The identity

    Σ_{x∈X} |R(x)⟩⟨F(x)| = Σ_{x∈X} |R_0(x)⟩⟨P_0(x)| = Σ_{x∈X} ℱ^{−1}|P_0(x)⟩⟨P_0(x)| = ℱ^{−1}ℱ = ℐ      (7.32)

then allows state reconstruction in terms of the measurement statistics:

    ρ = ( Σ_{x∈X} |R(x)⟩⟨F(x)| ) ρ = Σ_{x∈X} (Tr ρF(x)) R(x).      (7.33)

So this state-reconstruction formula is an immediate consequence of the action of (7.32) on ρ.

Theorem 7.20 We have

    ℱ^{−1} = Σ_{x∈X} |R(x)⟩⟨R(x)| τ(x)      (7.34)

and the operators R(x) are self-adjoint and Tr R(x) = 1.
Proof: From the mutual canonical dual relation of {P_0(x) : x ∈ X} and {R_0(x) : x ∈ X} we have

    ℱ^{−1} = Σ_{x∈X} |R_0(x)⟩⟨R_0(x)|

by Theorem 7.18, and this is (7.34).

The operators R(x) are self-adjoint since ℱ, and thus ℱ^{−1}, map self-adjoint operators to self-adjoint operators. For an arbitrary POVM, the identity operator is always an eigenvector of the POVM superoperator:

    ℱ|I⟩ = Σ_{x∈X} |F(x)⟩⟨F(x)|I⟩ (τ(x))^{−1} = Σ_{x∈X} |F(x)⟩ = |I⟩.      (7.35)

Thus |I⟩ is also an eigenvector of ℱ^{−1}, and we obtain

    Tr R(x) = ⟨I|R(x)⟩ = τ(x)^{−1/2} ⟨I|R_0(x)⟩ = τ(x)^{−1/2} ⟨I|ℱ^{−1}P_0(x)⟩
        = τ(x)^{−1/2} ⟨I|P_0(x)⟩ = τ(x)^{−1} ⟨I|F(x)⟩ = τ(x)^{−1} τ(x) = 1.    □

Note that we need |X| ≥ d² for F to be informationally complete. If this were not the case then ℱ could not have full rank. An IC-POVM with |X| = d² is called minimal. In this case the reconstruction OVD is unique. In general, however, there will be many different choices.
Example 7.21 Let x_1, x_2, . . . , x_d be an orthonormal basis of C^d. Then Q_i = |x_i⟩⟨x_i| are projections and {Q_i : 1 ≤ i ≤ d} is a POVM. However, it is not informationally complete. The subset

    𝒜 := { Σ_{i=1}^d λ_i |x_i⟩⟨x_i| : λ_1, λ_2, . . . , λ_d ∈ C } ⊂ M_d(C)

is a maximal abelian *-subalgebra, called a MASA.

A good example of an IC-POVM comes from d + 1 similar sets:

    { Q_k^{(m)} : 1 ≤ k ≤ d, 1 ≤ m ≤ d + 1 }

consists of projections of rank one and

    Tr Q_k^{(m)} Q_l^{(n)} = δ_{kl} if m = n,    1/d if m ≠ n.

The class of POVM is described by

    X := { (k, m) : 1 ≤ k ≤ d, 1 ≤ m ≤ d + 1 }

and

    F(k, m) := (1/(d + 1)) Q_k^{(m)},    τ(k, m) := 1/(d + 1)

for (k, m) ∈ X. (Here τ is constant and this is a uniformity.) We have

    Σ_{(k,m)∈X} |F(k, m)⟩⟨F(k, m)|Q_l^{(n)}⟩ = (1/(d + 1)²) ( Q_l^{(n)} + I ).

This implies

    ℱ A = ( Σ_{x∈X} |F(x)⟩⟨F(x)|(τ(x))^{−1} ) A = (1/(d + 1)) ( A + (Tr A) I ).

So ℱ is rather simple: if Tr A = 0, then ℱA = (1/(d+1)) A, and ℱI = I. (Another formulation is (7.36).)

This example is a complete set of mutually unbiased bases (MUBs) [85, 54]. The definition

    𝒜_m := { Σ_{k=1}^d λ_k Q_k^{(m)} : λ_1, λ_2, . . . , λ_d ∈ C } ⊂ M_d(C)

gives d + 1 MASAs. These MASAs are quasi-orthogonal in the following sense. If A_i ∈ 𝒜_i and Tr A_i = 0 (1 ≤ i ≤ d + 1), then Tr A_i A_j = 0 for i ≠ j. The construction of d + 1 quasi-orthogonal MASAs is known when d is a prime power (see also [31]). But d = 6 is not a prime power and it is already a problematic example.
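For d = 2 the three Pauli eigenbases give such a complete MUB set; the sketch below (not part of the text) checks the overlap condition and the action ℱA = (A + (Tr A)I)/(d + 1).

```python
import numpy as np

# Sketch: the d = 2 instance of Example 7.21 (three mutually unbiased qubit bases).

d = 2
vs = {0: [np.array([1, 0]), np.array([0, 1])],                              # sigma_z basis
      1: [np.array([1, 1]) / np.sqrt(2), np.array([1, -1]) / np.sqrt(2)],   # sigma_x basis
      2: [np.array([1, 1j]) / np.sqrt(2), np.array([1, -1j]) / np.sqrt(2)]} # sigma_y basis
Q = {(k, m): np.outer(v, v.conj()) for m, basis in vs.items() for k, v in enumerate(basis)}

# overlaps: delta_{kl} within a basis, 1/d across different bases
for (k, m), Qkm in Q.items():
    for (l, n), Qln in Q.items():
        expected = (1.0 if k == l else 0.0) if m == n else 1.0 / d
        assert np.isclose(np.trace(Qkm @ Qln).real, expected)

def F_super(A):   # F A = sum_x |F(x)><F(x)|A> / tau(x) with F(x) = Q_x/(d+1), tau = 1/(d+1)
    return sum(Qx * np.trace(Qx @ A) for Qx in Q.values()) / (d + 1)

rng = np.random.default_rng(4)
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
print(np.allclose(F_super(A), (A + np.trace(A) * np.eye(d)) / (d + 1)))     # True
```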
It is straightforward to confirm that we have the decomposition

    ℱ = (1/d) |I⟩⟨I| + Σ_{x∈X} |P(x) − I/d⟩⟨P(x) − I/d| τ(x)

for any POVM superoperator (7.31), where P(x) := P_0(x) τ(x)^{−1/2} = F(x) τ(x)^{−1} and

    (1/d) |I⟩⟨I| = ( 1  0 ; 0  0 )

is the projection onto the subspace CI. With the notation

    ℐ_0 := ( 0  0 ; 0  I_{d²−1} ),

an IC-POVM {F(x) : x ∈ X} is tight if

    Σ_{x∈X} |P(x) − I/d⟩⟨P(x) − I/d| τ(x) = a ℐ_0.

Theorem 7.22 F is a tight rank-one IC-POVM if and only if

    ℱ = (ℐ + |I⟩⟨I|)/(d + 1) = ( 1  0 ; 0  (1/(d+1)) I_{d²−1} ).      (7.36)

(The latter is in the block-matrix form.)
Proof: The constant a can be found by taking the superoperator trace:

    a = (1/(d²−1)) Σ_{x∈X} ⟨P(x) − I/d | P(x) − I/d⟩ τ(x)
      = (1/(d²−1)) ( Σ_{x∈X} ⟨P(x)|P(x)⟩ τ(x) − 1 ).

The POVM superoperator of a tight IC-POVM satisfies the identity

    ℱ = a ℐ + ((1 − a)/d) |I⟩⟨I|.      (7.37)

In the special case of a rank-one POVM, a takes its maximum possible value 1/(d + 1). Since this is in fact only possible for rank-one POVMs, by noting that (7.37) can be taken as an alternative definition in the general case, we obtain the proposition.    □
It follows from (7.36) that

    ℱ^{−1} = ( 1  0 ; 0  (d + 1) I_{d²−1} ) = (d + 1) ℐ − |I⟩⟨I|.

This shows that Example 7.21 contains a tight rank-one IC-POVM. Here is another example.

Example 7.23 An example of an IC-POVM is the symmetric informationally complete POVM (SIC POVM). The set {Q_k : 1 ≤ k ≤ d²} consists of projections of rank one such that

    Tr Q_k Q_l = 1/(d + 1)    (k ≠ l).

Then X := {x : 1 ≤ x ≤ d²} and

    F(x) = (1/d) Q_x,    ℱ = (1/d) Σ_{x∈X} |Q_x⟩⟨Q_x|.

We have some simple computations: ℱI = I and

    ℱ(Q_k − I/d) = (1/(d + 1)) (Q_k − I/d).

This implies that if Tr A = 0, then

    ℱA = (1/(d + 1)) A.

So the SIC POVM is a tight rank-one IC-POVM.

SIC-POVMs are conjectured to exist in all dimensions [86, 12].    □
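A concrete qubit SIC POVM can be built from four Bloch vectors pointing to the vertices of a regular tetrahedron; the sketch below (a standard construction, not spelled out in the text) verifies Tr Q_k Q_l = 1/(d + 1) and the tight rank-one action ℱA = A/(d + 1) on traceless A.

```python
import numpy as np

# Sketch: the d = 2 SIC POVM of Example 7.23 from tetrahedron Bloch vectors.

d = 2
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
bloch = [np.array(b) / np.sqrt(3) for b in
         [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]]
Q = [(np.eye(2) + b[0] * sx + b[1] * sy + b[2] * sz) / 2 for b in bloch]

for k in range(4):
    for l in range(4):
        val = np.trace(Q[k] @ Q[l]).real
        assert np.isclose(val, 1.0 if k == l else 1.0 / (d + 1))

def F_super(A):   # F = (1/d) sum_x |Q_x><Q_x|
    return sum(Qx * np.trace(Qx @ A) for Qx in Q) / d

A = sz + 0.3 * sx                      # an arbitrary traceless test operator
print(np.allclose(F_super(A), A / (d + 1)), np.allclose(F_super(np.eye(2)), np.eye(2)))
```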
The next theorem tells us that the SIC POVM is characterized by the IC POVM property.

Theorem 7.24 If a set {Q_k ∈ M_d(C) : 1 ≤ k ≤ d²} consists of projections of rank one such that

    Σ_{k=1}^{d²} λ_k |Q_k⟩⟨Q_k| = (ℐ + |I⟩⟨I|)/(d + 1)      (7.38)

with numbers λ_k > 0, then

    λ_i = 1/d,    Tr Q_i Q_j = 1/(d + 1)    (i ≠ j).
Proof: Note that if both sides of (7.38) are applied to |I⟩, then we get

    Σ_{i=1}^{d²} λ_i Q_i = I.      (7.39)

First we show that λ_i = 1/d. From (7.38) we have

    Σ_{i=1}^{d²} λ_i ⟨A|Q_i⟩⟨Q_i|A⟩ = ⟨A| (ℐ + |I⟩⟨I|)/(d + 1) |A⟩      (7.40)

with

    A := Q_k − (1/(d + 1)) I.

(7.40) becomes

    λ_k d²/(d + 1)² + Σ_{j≠k} λ_j ( Tr Q_j Q_k − 1/(d + 1) )² = d/(d + 1)².      (7.41)

The inequality

    λ_k d²/(d + 1)² ≤ d/(d + 1)²

gives λ_k ≤ 1/d for every 1 ≤ k ≤ d². The trace of (7.39) is

    Σ_{i=1}^{d²} λ_i = d.

Hence it follows that λ_k = 1/d for every 1 ≤ k ≤ d². So from (7.41) we have

    Σ_{j≠k} λ_j ( Tr Q_j Q_k − 1/(d + 1) )² = 0

and this gives the result.    □
The state-reconstruction formula for a tight rank-one IC-POVM also takes an elegant form. From (7.33) we have

    ρ = Σ_{x∈X} R(x) p(x) = Σ_{x∈X} ℱ^{−1}P(x) p(x) = Σ_{x∈X} ( (d + 1)P(x) − I ) p(x)

and obtain

    ρ = (d + 1) Σ_{x∈X} P(x) p(x) − I.
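A short sketch of this reconstruction (not from the book; it reuses the qubit SIC of the previous sketch, for which P(x) = Q_x and p(x) = Tr ρQ_x/d, and the state ρ is an arbitrary choice): with exact probabilities the formula returns ρ exactly, and with finite-sample frequencies it fluctuates around ρ.

```python
import numpy as np

# Sketch: rho = (d+1) sum_x p(x) P(x) - I for the qubit SIC POVM.

d = 2
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
bloch = [np.array(b) / np.sqrt(3) for b in
         [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]]
Q = [(np.eye(2) + b[0] * sx + b[1] * sy + b[2] * sz) / 2 for b in bloch]

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])          # a density matrix
p = [np.trace(rho @ Qx).real / d for Qx in Q]                    # outcome probabilities
rho_rec = (d + 1) * sum(px * Qx for px, Qx in zip(p, Q)) - np.eye(d)
print(np.allclose(rho_rec, rho))                                 # True: exact reconstruction

# With N finite samples the frequencies replace p and the estimate fluctuates around rho.
rng = np.random.default_rng(5)
p_hat = rng.multinomial(10000, p) / 10000
rho_hat = (d + 1) * sum(px * Qx for px, Qx in zip(p_hat, Q)) - np.eye(d)
print(np.linalg.norm(rho_hat - rho))                             # small statistical error
```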
Finally, let us rewrite the frame bound (Theorem 7.19) in the context of quantum measurements.

Theorem 7.25 Let {F(x) : x ∈ X} be a POVM. Then

    Σ_{x,y∈X} ⟨P(x)|P(y)⟩² τ(x)τ(y) ≥ 1 + (Tr ℱ − 1)²/(d² − 1),      (7.42)

with equality if and only if F is a tight IC-POVM.

Proof: The frame bound (7.30) has a slightly improved form

    Tr (𝒜²) ≥ (Tr 𝒜)²/D,

where D is the dimension of the operator space. Setting 𝒜 = ℱ − (1/d)|I⟩⟨I| and D = d² − 1 for M_d(C) ⊖ CI then gives (7.42) (using (7.35)).    □
Informationally complete quantum measurements are precisely those measurements which can be used for quantum state tomography. We will show that, amongst all IC-POVMs, the tight rank-one IC-POVMs are the most robust against statistical error in the quantum tomographic process. We will also find that, for an arbitrary IC-POVM, the canonical dual frame with respect to the trace measure is the optimal dual frame for state reconstruction. These results are shown only for the case of linear quantum state tomography.

Consider a state-reconstruction formula of the form

    ρ = Σ_{x∈X} p(x) Q(x) = Σ_{x∈X} (Tr ρF(x)) Q(x),      (7.43)

where Q(x) : X → M_d(C) is an operator-valued density. If this formula is to remain valid for all ρ, then we must have

    Σ_{x∈X} |Q(x)⟩⟨F(x)| = ℐ = Σ_{x∈X} |Q_0(x)⟩⟨P_0(x)|,      (7.44)

where Q_0(x) = τ(x)^{1/2} Q(x) and P_0(x) = τ(x)^{−1/2} F(x). Equation (7.44) forces {Q(x) : x ∈ X} to be a dual frame of {F(x) : x ∈ X}. Similarly {Q_0(x) : x ∈ X} is a dual frame of {P_0(x) : x ∈ X}. Our first goal is to find the optimal dual frame.
Suppose that we take N independent random samples, y_1, . . . , y_N, and the outcome x occurs with some unknown probability p(x). Our estimate for this probability is (7.28), which of course obeys the expectation E[p̂(x)] = p(x). An elementary calculation shows that the expected covariance for two samples is

    E[(p̂(x) − p(x))(p̂(y) − p(y))] = (1/N) ( p(x) δ(x, y) − p(x)p(y) ).      (7.45)

Now suppose that the p(x)'s are the outcome probabilities for an informationally complete quantum measurement of the state ρ, p(x) = Tr ρF(x). The estimate of ρ is

    ρ̂ = ρ̂(y_1, . . . , y_N) := Σ_{x∈X} p̂(x; y_1, . . . , y_N) Q(x),

and the error can be measured by the squared Hilbert-Schmidt distance:

    ‖ρ̂ − ρ‖_2² = ⟨ρ̂ − ρ, ρ̂ − ρ⟩ = Σ_{x,y∈X} (p̂(x) − p(x))(p̂(y) − p(y)) ⟨Q(x), Q(y)⟩,

which has the expectation E[‖ρ̂ − ρ‖_2²]. We want to minimize this quantity, not for an arbitrary ρ, but for some average. (Integration will be over the set of unitary matrices with respect to the Haar measure.)
Theorem 7.26 Let {F(x) : x ∈ X} be an informationally complete POVM which has a dual frame {Q(x) : x ∈ X} as an operator-valued density. The quantum system has a state ρ and y_1, . . . , y_N are random samples of the measurements. Let

    p̂(x) := (1/N) Σ_{k=1}^N δ(x, y_k),    ρ̂ := Σ_{x∈X} p̂(x) Q(x).

Finally let ρ = ρ(σ, U) := UσU*, parametrized by a unitary U. Then for the average squared distance

    ∫_U E[‖ρ̂ − ρ‖_2²] dμ(U) ≥ (1/N) ( (1/d) Tr (ℱ^{−1}) − Tr (ρ²) )      (7.46)
        ≥ (1/N) ( d(d + 1) − 1 − Tr (ρ²) ).      (7.47)

Equality in the inequality (7.46) occurs if and only if Q is the reconstruction operator-valued density (defined as |R(x)⟩ = ℱ^{−1}|P(x)⟩), and equality in the inequality (7.47) occurs if and only if F is a tight rank-one IC-POVM.
Proof: For a fixed IC-POVM F we have

    E[‖ρ̂ − ρ‖_2²] = (1/N) Σ_{x,y∈X} ( p(x)δ(x, y) − p(x)p(y) ) ⟨Q(x), Q(y)⟩
        = (1/N) ( Σ_{x∈X} p(x) ⟨Q(x), Q(x)⟩ − ⟨ Σ_{x∈X} p(x)Q(x), Σ_{y∈X} p(y)Q(y) ⟩ )
        = (1/N) ( Δ_p(Q) − Tr (ρ²) ),

where the formulas (7.45) and (7.43) are used and moreover

    Δ_p(Q) := Σ_{x∈X} p(x) ⟨Q(x), Q(x)⟩.

Since we have no control over Tr ρ², we want to minimize Δ_p(Q). The IC-POVM which minimizes Δ_p(Q) will in general depend on the quantum state under examination. We thus set ρ = ρ(σ, U) := UσU*, and now remove this dependence by taking the Haar average μ(U) over all U ∈ U(d). Note that

    ∫_{U(d)} U P U* dμ(U)

is the same constant C for any projection P of rank 1. If Σ_{i=1}^d P_i = I, then

    dC = Σ_{i=1}^d ∫_{U(d)} U P_i U* dμ(U) = I

and we have C = I/d. Therefore for A = Σ_{i=1}^d λ_i P_i we have

    ∫_{U(d)} U A U* dμ(U) = Σ_{i=1}^d λ_i C = (I/d) Tr A.

This fact implies

    ∫_{U(d)} Δ_p(Q) dμ(U) = Σ_{x∈X} Tr ( F(x) ∫_{U(d)} UσU* dμ(U) ) ⟨Q(x), Q(x)⟩
        = (1/d) Σ_{x∈X} Tr F(x) ⟨Q(x), Q(x)⟩
        = (1/d) Σ_{x∈X} τ(x) ⟨Q(x), Q(x)⟩ =: (1/d) Δ_τ(Q),

where τ(x) := Tr F(x). We will now minimize Δ_τ(Q) over all choices of Q, while keeping the IC-POVM F fixed. Our only constraint is that {Q(x) : x ∈ X} remains a dual frame to {F(x) : x ∈ X} (see (7.44)), so that the reconstruction formula (7.43) remains valid for all ρ. Theorem 7.18 shows that the reconstruction OVD {R(x) : x ∈ X} defined as |R⟩ = ℱ^{−1}|P⟩ is the optimal choice for the dual frame.

Equation (7.34) shows that Δ_τ(R) = Tr (ℱ^{−1}). We will minimize the quantity

    Tr ℱ^{−1} = Σ_{k=1}^{d²} 1/λ_k,      (7.48)

where λ_1, . . . , λ_{d²} > 0 denote the eigenvalues of ℱ. These eigenvalues satisfy the constraint

    Σ_{k=1}^{d²} λ_k = Tr ℱ = Σ_{x∈X} τ(x) Tr |P(x)⟩⟨P(x)| ≤ Σ_{x∈X} τ(x) = d,

since Tr |P(x)⟩⟨P(x)| = Tr P(x)² ≤ 1. We know that the identity operator I is an eigenvector of ℱ:

    ℱI = Σ_{x∈X} τ(x)|P(x)⟩ = I.

Thus we may in fact take λ_1 = 1 and then Σ_{k=2}^{d²} λ_k ≤ d − 1. Under this latter constraint it is straightforward to show that the right-hand side of (7.48) takes its minimum value if and only if λ_2 = ⋯ = λ_{d²} = (d − 1)/(d² − 1) = 1/(d + 1), or equivalently,

    ℱ = |I⟩⟨I|/d + (1/(d + 1)) ( ℐ − |I⟩⟨I|/d ).      (7.49)

Therefore, by Theorem 7.22, Tr ℱ^{−1} takes its minimum value if and only if F is a tight rank-one IC-POVM. The minimum of Tr ℱ^{−1} comes from (7.49).    □
7.5 Cramér-Rao inequality

The Cramér-Rao inequality belongs to the estimation theory of mathematical statistics. Assume that we have to estimate the state ρ_θ, where θ = (θ_1, θ_2, . . . , θ_N) lies in a subset of R^N. There is a sequence of estimates Φ_n : 𝒳_n → R^N. In mathematical statistics the N × N mean quadratic error matrix

    V_n(θ)_{i,j} := ∫_{𝒳_n} (Φ_n(x)_i − θ_i)(Φ_n(x)_j − θ_j) dμ_{n,θ}(x)    (1 ≤ i, j ≤ N)

is used to express the efficiency of the nth estimation, and in a good estimation scheme V_n(θ) = O(n^{−1}) is expected. Here, 𝒳_n is the set of measurement outcomes and μ_{n,θ} is the probability distribution when the true state is ρ_θ.

An unbiased estimation scheme means

    ∫_{𝒳_n} Φ_n(x)_i dμ_{n,θ}(x) = θ_i    (1 ≤ i ≤ N)

and the formula simplifies:

    V_n(θ)_{i,j} := ∫_{𝒳_n} Φ_n(x)_i Φ_n(x)_j dμ_{n,θ}(x) − θ_i θ_j.

(In mathematical statistics, this is sometimes called the covariance matrix of the estimate.)

The mean quadratic error matrix is used to measure the efficiency of an estimate. Even if the value of θ is fixed, for two different estimations the corresponding matrices are not always comparable, because the ordering of positive definite matrices is highly partial. This fact has inconvenient consequences in classical statistics. In the state estimation of a quantum system the very different possible measurements make the situation even more complicated.
Assume that dμ_{n,θ}(x) = f_{n,θ}(x) dx and fix θ. f_{n,θ} is called the likelihood function. Let

    ∂_j := ∂/∂θ_j.

Differentiating the relation

    ∫_{𝒳_n} f_{n,θ}(x) dx = 1,

we have

    ∫_{𝒳_n} ∂_j f_{n,θ}(x) dx = 0.

If the estimation scheme is unbiased, then

    ∫_{𝒳_n} Φ_n(x)_i ∂_j f_{n,θ}(x) dx = δ_{i,j}.

As a combination, we conclude

    ∫_{𝒳_n} (Φ_n(x)_i − θ_i) ∂_j f_{n,θ}(x) dx = δ_{i,j}

for every 1 ≤ i, j ≤ N. This condition may be written in the slightly different form

    ∫_{𝒳_n} ( (Φ_n(x)_i − θ_i) √f_{n,θ}(x) ) ( ∂_j f_{n,θ}(x) / √f_{n,θ}(x) ) dx = δ_{i,j}.

Now the first factor of the integrand depends on i while the second one on j. We need the following lemma.
We need the following lemma.
Lemma 7.27 Assume that u
i
, v
i
are vectors in a Hilbert space such that
u
i
, v
j
=
i,j
(i, j = 1, 2, . . . , N).
Then the inequality
A B
1
holds for the N N matrices
A
i,j
= u
i
, u
j
and B
i,j
= v
i
, v
j
(1 i, j N).
7.5. CRAM

ER-RAO INEQUALITY 313


The lemma applies to the vectors

    u_i = (Φ_n(x)_i − θ_i) √f_{n,θ}(x)    and    v_j = ∂_j f_{n,θ}(x) / √f_{n,θ}(x),

and the matrix A will be exactly the mean square error matrix V_n(θ), while in place of B we have

    I_n(θ)_{i,j} = ∫_{𝒳_n} ( ∂_i f_{n,θ}(x) ∂_j f_{n,θ}(x) / f_{n,θ}(x)² ) dμ_{n,θ}(x).

Therefore, the lemma tells us the following:

Theorem 7.28 For an unbiased estimation scheme the matrix inequality

    V_n(θ) ≥ I_n(θ)^{−1}      (7.50)

holds (if the likelihood functions f_{n,θ} satisfy certain regularity conditions).

This is the classical Cramér-Rao inequality. The right-hand side is called the Fisher information matrix. The essential content of the inequality is that the lower bound is independent of the estimate Φ_n and depends only on the classical likelihood function. The inequality is called classical because on both sides classical statistical quantities appear.
Example 7.29 Let F be a measurement with values in the finite set 𝒳 and assume that ρ_θ = ρ + Σ_{i=1}^n θ_i B_i, where the B_i are self-adjoint operators with Tr B_i = 0. We want to compute the Fisher information matrix at θ = 0. Since

    ∂_i Tr ρ_θ F(x) = Tr B_i F(x)

for 1 ≤ i ≤ n and x ∈ 𝒳, we have

    I_{ij}(0) = Σ_{x∈𝒳} ( Tr B_i F(x) · Tr B_j F(x) ) / Tr ρF(x).    □
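A small qubit sketch of this formula (not from the book; the parametrization, the state and the measurement are arbitrary choices):

```python
import numpy as np

# Sketch: the classical Fisher information matrix of Example 7.29 for a qubit.
# Assumed parametrization: rho_theta = rho + theta_1 sigma_x + theta_2 sigma_z,
# measured in the sigma_x eigenbasis, F = {(I+sigma_x)/2, (I-sigma_x)/2}.

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

rho = np.array([[0.6, 0.1], [0.1, 0.4]], dtype=complex)
B = [sx, sz]                                  # tangent directions (traceless, self-adjoint)
F = [(np.eye(2) + sx) / 2, (np.eye(2) - sx) / 2]

I0 = np.array([[sum(np.trace(Bi @ Fx).real * np.trace(Bj @ Fx).real / np.trace(rho @ Fx).real
                    for Fx in F) for Bj in B] for Bi in B])
print(I0)
# Only the (1,1) entry is nonzero: this measurement is informative about the
# sigma_x direction and carries no information about the sigma_z direction.
```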
The essential point in the quantum Cramér-Rao inequality, compared with Theorem 7.28, is that the lower bound is a quantity determined by the family {ρ_θ}. Theorem 7.28 allows one to compare different estimates for a given measurement, but two different measurements are not comparable.

As a starting point we give a very general form of the quantum Cramér-Rao inequality in the simple setting of a single parameter. For θ ∈ (−ε, ε) ⊂ R a statistical operator ρ_θ is given and the aim is to estimate the value of the parameter θ close to 0. Formally ρ_θ is an m × m positive semidefinite matrix of trace 1 which describes a mixed state of a quantum mechanical system, and we assume that ρ_θ is smooth (in θ). Assume that an estimation is performed by the measurement of a self-adjoint matrix A playing the role of an observable. (In this case the positive operator-valued measure on R is the spectral measure of A.) A is an unbiased estimator when Tr ρ_θ A = θ. Assume that the true value of θ is close to 0. A is called a locally unbiased estimator (at θ = 0) if

    ∂/∂θ Tr ρ_θ A |_{θ=0} = 1.      (7.51)

Of course, this condition holds if A is an unbiased estimator for θ. To require Tr ρ_θ A = θ for all values of the parameter might be a serious restriction on the observable A, and therefore we prefer to use the weaker condition (7.51).
Example 7.30 Let

    ρ_θ := exp(H + θB) / Tr exp(H + θB)

and assume that ρ_0 = e^H is a density matrix and Tr e^H B = 0. The Fréchet derivative of ρ_θ (at θ = 0) is ∫_0^1 e^{tH} B e^{(1−t)H} dt. Hence the self-adjoint operator A is locally unbiased if

    ∫_0^1 Tr ρ_0^t B ρ_0^{1−t} A dt = 1.

(Note that ρ_θ is a quantum analogue of the exponential family; in terms of physics ρ_θ is a Gibbsian family of states.)    □
Let φ_ρ[B, C] = Tr J_ρ(B)C be an inner product on the linear space of self-adjoint matrices. φ_ρ[ · , · ] and the corresponding superoperator J_ρ depend on the density matrix ρ; the notation reflects this fact. When ρ_θ is smooth in θ, as already assumed above, we have

    ∂/∂θ Tr ρ_θ B |_{θ=0} = φ_{ρ_0}[B, L]      (7.52)

with some L = L*. From (7.51) and (7.52) we have φ_{ρ_0}[A, L] = 1, and the Schwarz inequality yields

Theorem 7.31

    φ_{ρ_0}[A, A] ≥ 1 / φ_{ρ_0}[L, L].      (7.53)
This is the quantum Cramér-Rao inequality for a locally unbiased estimator. It is instructive to compare Theorem 7.31 with the classical Cramér-Rao inequality. If A = Σ_i λ_i E_i is the spectral decomposition, then the corresponding von Neumann measurement is F = {E_i}. Take the estimate Φ(λ_i) = λ_i. Then the mean quadratic error is Σ_i λ_i² Tr ρ_0 E_i (at θ = 0), which is exactly the left-hand side of the quantum inequality provided that

    φ_{ρ_0}[B, C] = (1/2) Tr ρ_0 (BC + CB).

Generally, we want to interpret the left-hand side as a sort of generalized variance of A. To do this it is useful to assume that

    φ_ρ[B, B] = Tr ρB²    if ρB = Bρ.

However, in the non-commutative situation the statistical interpretation seems to be rather problematic and thus we call this quantity a quadratic cost functional.

The right-hand side of (7.53) is independent of the estimator and provides a lower bound for the quadratic cost. The denominator φ_{ρ_0}[L, L] appears here in the role of Fisher information. We call it the quantum Fisher information with respect to the cost function φ_{ρ_0}[ · , · ]. This quantity depends on the tangent of the curve ρ_θ. If the densities ρ_θ and the estimator A commute, then

    L = ρ_0^{−1} dρ_θ/dθ |_{θ=0} = d/dθ log ρ_θ |_{θ=0},

    φ_{ρ_0}[L, L] = Tr ρ_0^{−1} ( dρ_θ/dθ |_{θ=0} )² = Tr ρ_0 ( ρ_0^{−1} dρ_θ/dθ |_{θ=0} )².

The first formula justifies calling L the logarithmic derivative.
The rst formula justies that L is called the logarithmic derivative.
A coarse-graining is an ane mapping sending density matrices into
density matrices. Such a mapping extends to all matrices and provides a
positive and trace-preserving linear transformation. A common example of
coarse-graining sends a density matrix
12
of a composite system M
m
1
M
m
2
into the (reduced) density matrix
1
of component M
m
1
. There are several
reasons to assume completely positivity for a coarse graining and we do so.
Mathematically a coarse-graining is the same as a state transformation in
an information channel. The terminology coarse-graining is used when the
statistical aspects are focused on. A coarse-graining is the quantum analogue
of a statistic.
Assume that

= +B is a smooth curve of density matrices with tangent


B := d/d at . The quantum Fisher information F

(B) is an information
quantity associated with the pair (, B). It appeared in the Cramer-Rao
316 CHAPTER 7. SOME APPLICATIONS
inequality above and the classical Fisher information gives a bound for the
variance of a locally unbiased estimator. Now let be a coarse-graining.
Then (

) is another curve in the state space. Due to the linearity of , the


tangent at () is (B). As it is usual in statistics, information cannot be
gained by coarse graining, Therefore we expect that the Fisher information
at the density matrix in the direction B must be larger than the Fisher
information at () in the direction (B). This is the monotonicity property
of the Fisher information under coarse-graining:
F

(B) F
()
((B)) . (7.54)
Although we do not want to have a concrete formula for the quantum Fisher
information, we require that this monotonicity condition must hold. Another
requirement is that F

(B) should be quadratic in B. In other words, there


exists a non-degenerate real bilinear form

(B, C) on the self-adjoint matrices


such that
F

(B) =

(B, B). (7.55)


When is regarded as a point of a manifold consisting of density matrices
and B is considered as a tangent vector at the foot point , the quadratic
quantity

(B, B) may be regarded as a Riemannian metric on the manifold.


This approach gives a geometric interpretation to the Fisher information.
The requirements (7.54) and (7.55) are strong enough to obtain a reason-
able but still wide class of possible quantum Fisher informations.
We may assume that

    γ_ρ(B, C) = Tr B J_ρ^{−1}(C)

for an operator J_ρ acting on all matrices. (This formula expresses the inner product γ_ρ by means of the Hilbert-Schmidt inner product and the positive linear operator J_ρ.) In terms of the operator J_ρ the monotonicity condition reads as

    β* J_{β(ρ)}^{−1} β ≤ J_ρ^{−1}      (7.56)

for every coarse-graining β. (β* stands for the adjoint of β with respect to the Hilbert-Schmidt product. Recall that β is completely positive and trace-preserving if and only if β* is completely positive and unital.) On the other hand the latter condition is equivalent to

    β J_ρ β* ≤ J_{β(ρ)}.      (7.57)

It is interesting to observe the relevance of a certain quasi-entropy:

    ⟨Bρ^{1/2}, f(L_ρ R_ρ^{−1}) Bρ^{1/2}⟩ = S_f^B(ρ‖ρ),

see (7.9) and (7.11), where L_ρ and R_ρ are as in (7.10). When f : R_+ → R is matrix monotone (we always assume f(1) = 1),

    ⟨β*(B) ρ^{1/2}, f(L_ρ R_ρ^{−1}) β*(B) ρ^{1/2}⟩ ≤ ⟨B β(ρ)^{1/2}, f(L_{β(ρ)} R_{β(ρ)}^{−1}) B β(ρ)^{1/2}⟩

due to the monotonicity of the quasi-entropy, see Theorem 7.7. If we set

    J_ρ = J_ρ^f := f(L_ρ R_ρ^{−1}) R_ρ,

then (7.57) holds. Therefore,

    φ_ρ[B, B] := Tr B J_ρ(B) = ⟨Bρ^{1/2}, f(L_ρ R_ρ^{−1}) Bρ^{1/2}⟩

can be called a quadratic cost function, and the corresponding monotone quantum Fisher information

    γ_ρ(B, C) = Tr B J_ρ^{−1}(C)

will be real for self-adjoint B and C if the function f satisfies the condition f(x) = xf(x^{−1}), see (7.14). This is nothing but a monotone metric as discussed in Section 7.2.
Example 7.32 In order to understand the action of the operator J_ρ, assume that ρ is diagonal, ρ = Σ_i p_i E_{ii}. Then one can check that the matrix units E_{kl} are eigenvectors of J_ρ, namely

    J_ρ(E_{kl}) = p_l f(p_k/p_l) E_{kl}.

The condition f(x) = xf(x^{−1}) gives that the eigenvectors E_{kl} and E_{lk} have the same eigenvalues. Therefore, the symmetrized matrix units E_{kl} + E_{lk} and iE_{kl} − iE_{lk} are eigenvectors as well.

Since

    B = Σ_{k<l} Re B_{kl} (E_{kl} + E_{lk}) + Σ_{k<l} Im B_{kl} (iE_{kl} − iE_{lk}) + Σ_i B_{ii} E_{ii},

we have

    γ_ρ(B, B) = 2 Σ_{k<l} |B_{kl}|² / ( p_l f(p_k/p_l) ) + Σ_i |B_{ii}|² / p_i.

In place of 2 Σ_{k<l}, we can write Σ_{k≠l}.    □

Any monotone cost function has the property φ_ρ[B, B] = Tr ρB² for commuting ρ and B. The examples below show that this is not so in general.
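The closed formula of Example 7.32 can be checked directly; the following sketch (not from the book; the function f and the data are arbitrary choices) compares it with the straightforward evaluation of Tr B (J_ρ^f)^{−1}(B).

```python
import numpy as np

# Sketch: gamma_rho(B, B) for diagonal rho, closed formula vs direct evaluation.
# f(x) = (1+x)/2 is an arbitrary admissible symmetric choice.

def f(x):
    return (1 + x) / 2

rng = np.random.default_rng(6)
p = rng.random(3); p /= p.sum()               # diagonal density rho = diag(p)
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
B = (B + B.conj().T) / 2                      # self-adjoint tangent vector

# direct: (J_rho^f)^{-1} divides the (k,l) entry by m_f(p_k, p_l) = p_l f(p_k/p_l)
m = p[None, :] * f(p[:, None] / p[None, :])
direct = np.trace(B @ (B / m)).real

# closed formula of Example 7.32
closed = sum(2 * abs(B[k, l]) ** 2 / (p[l] * f(p[k] / p[l]))
             for k in range(3) for l in range(k + 1, 3)) \
         + sum(abs(B[i, i]) ** 2 / p[i] for i in range(3))
print(np.isclose(direct, closed))             # True
```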
Example 7.33 The analysis of matrix monotone functions leads to the fact that among all monotone quantum Fisher informations there is a smallest one which corresponds to the (largest) function f_max(t) = (1 + t)/2. In this case

    F_ρ^{min}(B) = Tr BL = Tr ρL²,    where ρL + Lρ = 2B.

For the purpose of a quantum Cramér-Rao inequality the minimal quantity seems to be the best, since its inverse gives the largest lower bound. In fact, the matrix L has been used for a long time under the name of symmetric logarithmic derivative. In this example the quadratic cost function is

    φ_ρ[B, C] = (1/2) Tr ρ(BC + CB)

and we have

    J_ρ(B) = (1/2)(ρB + Bρ)    and    J_ρ^{−1}(C) = 2 ∫_0^∞ e^{−tρ} C e^{−tρ} dt

for the operator J_ρ. Since J_ρ^{−1} is the smallest, J_ρ is the largest (among all possibilities).

There is also a largest among all monotone quantum Fisher informations, and this corresponds to the function f_min(t) = 2t/(1 + t). In this case

    J_ρ^{−1}(B) = (1/2)(ρ^{−1}B + Bρ^{−1})    and    F_ρ^{max}(B) = Tr ρ^{−1}B².

It is known that the function

    f_β(t) = β(1 − β) (t − 1)² / ( (t^β − 1)(t^{1−β} − 1) )

is matrix monotone for β ∈ (0, 1). We denote by F_ρ^β the corresponding Fisher information. When X is self-adjoint, B = i[ρ, X] := i(ρX − Xρ) is orthogonal to the commutant of the foot point ρ in the tangent space (see Example 3.30), and we have

    F_ρ^β(B) = −(1/(β(1 − β))) Tr ([ρ^β, X][ρ^{1−β}, X]).      (7.58)

Apart from a constant factor this expression is the skew information proposed by Wigner and Yanase some time ago. In the limiting cases β → 0 or 1 we have

    f_0(t) = (t − 1)/log t

and the corresponding quantum Fisher information

    γ_ρ(B, C) = K_ρ(B, C) := ∫_0^∞ Tr B(ρ + t)^{−1} C(ρ + t)^{−1} dt

will be named here after Kubo and Mori. The Kubo-Mori inner product plays a role in quantum statistical mechanics. In this case J is the so-called Kubo transform K (and J^{−1} is the inverse Kubo transform K^{−1}),

    K_ρ^{−1}(B) := ∫_0^∞ (ρ + t)^{−1} B (ρ + t)^{−1} dt    and    K_ρ(C) := ∫_0^1 ρ^t C ρ^{1−t} dt.

Therefore the corresponding generalized variance is

    φ_ρ[B, C] = ∫_0^1 Tr B ρ^t C ρ^{1−t} dt.

All Fisher informations discussed in this example are possible Riemannian metrics on manifolds of invertible density matrices. (Manifolds of pure states are rather different.)
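The sketch below (not from the book; ρ and B are arbitrary test data) computes the symmetric logarithmic derivative in the eigenbasis of ρ, together with the largest and the Kubo-Mori Fisher informations, and checks the ordering F_min ≤ F_BKM ≤ F_max that follows from the pointwise ordering of the three functions f.

```python
import numpy as np

# Sketch: Example 7.33 numerically.  The SLD L solves rho L + L rho = 2B; in the
# eigenbasis of rho this is entrywise, L_ij = 2 B_ij / (p_i + p_j).

rng = np.random.default_rng(7)
X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = X @ X.conj().T; rho /= np.trace(rho).real
B = rng.normal(size=(3, 3)); B = (B + B.T) / 2; B -= np.trace(B) / 3 * np.eye(3)

p, U = np.linalg.eigh(rho)
Be = U.conj().T @ B @ U

Le = 2 * Be / (p[:, None] + p[None, :])            # SLD in the eigenbasis of rho
F_min = np.trace(Be @ Le).real                      # = Tr B L = Tr rho L^2

F_max = np.trace(np.linalg.inv(rho) @ B @ B).real   # f_min(t) = 2t/(1+t)

w = np.array([[1 / p[i] if np.isclose(p[i], p[j]) else np.log(p[i] / p[j]) / (p[i] - p[j])
               for j in range(3)] for i in range(3)])
F_bkm = (np.abs(Be) ** 2 * w).sum()                 # Kubo-Mori metric

print(F_min <= F_bkm <= F_max, (F_min, F_bkm, F_max))
```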
A Fisher information appears not only as a Riemannian metric but as an information matrix as well. Let ℳ := {ρ_θ : θ ∈ G} be a smooth m-dimensional manifold of invertible density matrices. The quantum score operators (or logarithmic derivatives) are defined as

    L_i(θ) := J_{ρ_θ}^{−1}( ∂_{θ_i} ρ_θ )    (1 ≤ i ≤ m),

and

    Q_{ij}(θ) := Tr L_i(θ) J_{ρ_θ}(L_j(θ))    (1 ≤ i, j ≤ m)

is the quantum Fisher information matrix. This matrix depends on a matrix monotone function which is involved in the superoperator J. Historically the matrix Q determined by the symmetric logarithmic derivative (or the function f_max(t) = (1 + t)/2) appeared first in the work of Helstrom. Therefore, we call this the Helstrom information matrix and it will be denoted by H(θ).

Theorem 7.34 Fix a matrix monotone function f to induce a quantum Fisher information. Let β be a coarse-graining sending density matrices on the Hilbert space ℋ_1 into those acting on the Hilbert space ℋ_2 and let ℳ := {ρ_θ : θ ∈ G} be a smooth m-dimensional manifold of invertible density matrices on ℋ_1. For the Fisher information matrix Q^{(1)}(θ) of ℳ and for the Fisher information matrix Q^{(2)}(θ) of β(ℳ) := {β(ρ_θ) : θ ∈ G}, we have the monotonicity relation

    Q^{(2)}(θ) ≤ Q^{(1)}(θ).      (7.59)

(This is an inequality between m × m positive matrices.)
Proof: Set B_i(θ) := ∂_{θ_i} ρ_θ. Then J_{β(ρ_θ)}^{−1}(β(B_i(θ))) is the score operator of β(ℳ). Using (7.56), we have

    Σ_{ij} Q_{ij}^{(2)}(θ) a_i a_j = Tr J_{β(ρ_θ)}^{−1}( β( Σ_i a_i B_i(θ) ) ) ( β( Σ_j a_j B_j(θ) ) )
        ≤ Tr J_{ρ_θ}^{−1}( Σ_i a_i B_i(θ) ) ( Σ_j a_j B_j(θ) ) = Σ_{ij} Q_{ij}^{(1)}(θ) a_i a_j

for any real numbers a_i.    □
Assume that the F_j are positive operators acting on a Hilbert space ℋ_1 on which the family ℳ := {ρ_θ : θ ∈ Θ} is given. When Σ_{j=1}^n F_j = I, these operators determine a measurement. For any ρ_θ the formula

    β(ρ_θ) := Diag(Tr ρ_θ F_1, . . . , Tr ρ_θ F_n)

gives a diagonal density matrix. Since this family is commutative, all quantum Fisher informations coincide with the classical I(θ) on the right-hand side of (7.50), and the classical Fisher information stands on the left-hand side of (7.59). We hence have

    I(θ) ≤ Q(θ).      (7.60)

Combination of the classical Cramér-Rao inequality in Theorem 7.28 and (7.60) yields the Helstrom inequality:

    V(θ) ≥ H(θ)^{−1}.
Example 7.35 In this example, we want to investigate (7.60), which is equivalently written as

    Q(θ)^{−1/2} I(θ) Q(θ)^{−1/2} ≤ I_m.

Taking the trace, we have

    Tr Q(θ)^{−1} I(θ) ≤ m.      (7.61)

Assume that

    ρ_θ = ρ + Σ_k θ_k B_k,

where Tr B_k = 0 and the self-adjoint matrices B_k are pairwise orthogonal with respect to the inner product (B, C) ↦ Tr B J_ρ^{−1}(C).

The quantum Fisher information matrix

    Q_{kl}(0) = Tr B_k J_ρ^{−1}(B_l)

is diagonal due to our assumption. Example 7.29 tells us about the classical Fisher information matrix:

    I_{kl}(0) = Σ_j ( Tr B_k F_j · Tr B_l F_j ) / Tr ρF_j.

Therefore,

    Tr Q(0)^{−1} I(0) = Σ_k (1/(Tr B_k J_ρ^{−1}(B_k))) Σ_j (Tr B_k F_j)² / Tr ρF_j
        = Σ_j (1/(Tr ρF_j)) Σ_k ( Tr ( B_k / √(Tr B_k J_ρ^{−1}(B_k)) ) J_ρ^{−1}(J_ρ F_j) )².

We can estimate the latter sum using the fact that

    B_k / √(Tr B_k J_ρ^{−1}(B_k))

is an orthonormal system and it remains so when ρ is added to it:

    (ρ, B_k) = Tr B_k J_ρ^{−1}(ρ) = Tr B_k = 0

and

    (ρ, ρ) = Tr ρ J_ρ^{−1}(ρ) = Tr ρ = 1.

Due to the Parseval inequality, we have

    ( Tr ρ J_ρ^{−1}(J_ρ F_j) )² + Σ_k ( Tr ( B_k / √(Tr B_k J_ρ^{−1}(B_k)) ) J_ρ^{−1}(J_ρ F_j) )²
        ≤ Tr (J_ρ F_j) J_ρ^{−1}(J_ρ F_j)

and

    Tr Q(0)^{−1} I(0) ≤ Σ_j (1/(Tr ρF_j)) ( Tr (J_ρ F_j) F_j − (Tr ρF_j)² )
        = Σ_{j=1}^n ( Tr (J_ρ F_j) F_j ) / ( Tr ρF_j ) − 1 ≤ n − 1

if we show that

    Tr (J_ρ F_j) F_j ≤ Tr ρF_j.

To see this we use the fact that the left-hand side is a quadratic cost and it can be majorized by the largest one (see Example 7.33):

    Tr (J_ρ F_j) F_j ≤ Tr ρF_j² ≤ Tr ρF_j,

because F_j² ≤ F_j.

Since θ = 0 is not essential in the above argument, we have obtained that

    Tr Q(θ)^{−1} I(θ) ≤ n − 1,

which can be compared with (7.61). This bound can be smaller than the general one. The assumption on the B_k's is not very essential, since the orthogonality can be reached by reparametrization.    □
Let ℳ := {ρ_θ : θ ∈ G} be a smooth m-dimensional manifold and assume that a collection A = (A_1, . . . , A_m) of self-adjoint matrices is used to estimate the true value of θ.

Given an operator J we have the corresponding cost function φ_θ ≡ φ_{ρ_θ} for every θ, and the cost matrix of the estimator A is a positive definite matrix defined by φ_θ[A]_{ij} = φ_θ[A_i, A_j]. The bias of the estimator is

    b(θ) = (b_1(θ), b_2(θ), . . . , b_m(θ))
         := (Tr ρ_θ(A_1 − θ_1), Tr ρ_θ(A_2 − θ_2), . . . , Tr ρ_θ(A_m − θ_m)).

From the bias vector we form a bias matrix

    B_{ij}(θ) := ∂_{θ_j} b_i(θ)    (1 ≤ i, j ≤ m).

For a locally unbiased estimator at θ_0, we have B(θ_0) = 0.

The next result is the quantum Cramér-Rao inequality for a biased estimate.

Theorem 7.36 Let A = (A_1, . . . , A_m) be an estimator of θ. Then for the above defined quantities the inequality

    φ_θ[A] ≥ (I + B(θ)) Q(θ)^{−1} (I + B(θ)*)

holds in the sense of the order on positive semidefinite matrices. (Here I denotes the identity matrix.)
Proof: We will use the block-matrix method. Let X = [X_{ij}]_{i,j=1}^m be an m × m matrix with n × n entries X_{ij}, and define Φ(X) := [Φ(X_{ij})]_{i,j=1}^m. For every ξ_1, . . . , ξ_m ∈ C we have

    Σ_{i,j=1}^m ξ̄_i ξ_j Tr (X Φ(X*))_{ij} = Σ_{k=1}^m Tr ( Σ_i ξ̄_i X_{ik} ) Φ( ( Σ_j ξ̄_j X_{jk} )* ) ≥ 0,

because

    Tr Y Φ(Y*) = Tr Y Φ(Y)* = ⟨Φ(Y), Y⟩ ≥ 0

for every n × n matrix Y. Therefore, the m × m ordinary matrix M having the (i, j) entry Tr (X Φ(X*))_{ij} is positive. In the sequel we restrict ourselves to m = 2 for the sake of simplicity and apply the above fact to the case

    X = ( A_1  0  0  0 ; A_2  0  0  0 ; L_1(θ)  0  0  0 ; L_2(θ)  0  0  0 )    and    Φ = J_θ.

Then we have

    M = ( Tr A_1 J_θ(A_1)   Tr A_1 J_θ(A_2)   Tr A_1 J_θ(L_1)   Tr A_1 J_θ(L_2) ;
          Tr A_2 J_θ(A_1)   Tr A_2 J_θ(A_2)   Tr A_2 J_θ(L_1)   Tr A_2 J_θ(L_2) ;
          Tr L_1 J_θ(A_1)   Tr L_1 J_θ(A_2)   Tr L_1 J_θ(L_1)   Tr L_1 J_θ(L_2) ;
          Tr L_2 J_θ(A_1)   Tr L_2 J_θ(A_2)   Tr L_2 J_θ(L_1)   Tr L_2 J_θ(L_2) ) ≥ 0.

Now we rewrite the matrix M in terms of the matrices involved in our Cramér-Rao inequality. The 2 × 2 block M_{11} is the generalized covariance, M_{22} is the Fisher information matrix, and M_{12} is easily expressed as I + B. We have

    M = ( φ_θ[A_1, A_1]   φ_θ[A_1, A_2]   1 + B_{11}(θ)   B_{12}(θ) ;
          φ_θ[A_2, A_1]   φ_θ[A_2, A_2]   B_{21}(θ)       1 + B_{22}(θ) ;
          1 + B_{11}(θ)   B_{21}(θ)       φ_θ[L_1, L_1]   φ_θ[L_1, L_2] ;
          B_{12}(θ)       1 + B_{22}(θ)   φ_θ[L_2, L_1]   φ_θ[L_2, L_2] ) ≥ 0.

The positivity of the block matrix

    M = ( M_1  C ; C*  M_2 ) = ( φ_θ[A]   I + B(θ) ; I + B(θ)*   Q(θ) )

implies M_1 ≥ C M_2^{−1} C*, which is exactly the statement of the theorem. (Concerning positive block-matrices, see Chapter 2.)    □
Let {ρ_θ : θ ∈ Θ} be a smooth manifold of density matrices. The following construction is motivated by classical statistics. Suppose that a positive functional d(ρ_1, ρ_2) of two variables is given on the manifold. In many cases one can obtain a Riemannian metric by differentiation:

    g_{ij}(θ) = ∂²/(∂θ_i ∂θ_j) d(ρ_θ, ρ_{θ′}) |_{θ′ = θ}.

To be more precise, the positive smooth functional d( · , · ) is called a contrast functional if d(ρ_1, ρ_2) = 0 implies ρ_1 = ρ_2.

Following the work of Csiszár in classical information theory, Petz introduced a family of information quantities parametrized by a function F : R_+ → R:

    S_F(ρ_1, ρ_2) = ⟨ρ_1^{1/2}, F(Δ(ρ_2/ρ_1)) ρ_1^{1/2}⟩,

see (7.9); F is written here in place of f. (Δ(ρ_2/ρ_1) := L_{ρ_2} R_{ρ_1}^{−1} is the relative modular operator of the two densities.) When F is matrix monotone decreasing, this quasi-entropy possesses good properties; for example, it is a contrast functional in the above sense if F is not linear and F(1) = 0. In particular, for

    F_λ(t) = (1/(λ(1 − λ))) (1 − t^λ)

we have the relative entropy S_λ(ρ_1, ρ_2) of degree λ of Example 7.9. The differentiation is

    ∂²/(∂t ∂u) S_λ(ρ + tB, ρ + uC) = −(1/(λ(1 − λ))) ∂²/(∂t ∂u) Tr (ρ + tB)^{1−λ}(ρ + uC)^λ =: −K_ρ^λ(B, C)

at t = u = 0 in the affine parametrization. The tangent space at ρ is decomposed into two subspaces: the first consists of self-adjoint matrices of trace zero commuting with ρ, and the second is {i[ρ, X] : X = X*}, the set of commutators. The decomposition is essential both from the viewpoint of differential geometry and from the point of view of differentiation, see Example 3.30. If B and C commute with ρ, then

    K_ρ^λ(B, C) = Tr ρ^{−1} B C

is independent of λ and it is the classical Fisher information (in matrix form). If B = i[ρ, X] and C = i[ρ, Y], then

    K_ρ^λ(B, C) = −(1/(λ(1 − λ))) Tr ([ρ^{1−λ}, X][ρ^λ, Y]).

Thus, K_ρ^λ(B, B) is exactly equal to the skew information (7.58).
7.6 Notes and remarks

As an introduction we suggest the book Oliver Johnson, Information Theory and The Central Limit Theorem, Imperial College Press, 2004. The Gaussian Markov property is popular in probability theory for single parameters, but the vector-valued case is less popular. Section 7.1 is based on the paper T. Ando and D. Petz, Gaussian Markov triplets approached by block matrices, Acta Sci. Math. (Szeged) 75(2009), 329–345.

Classical information theory is in the book I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Cambridge University Press, 2011. The Shannon entropy appeared in the 1940s and it is sometimes written that the von Neumann entropy is its generalization. However, it is a fact that von Neumann started the quantum entropy in 1925. Many details are in the books [67, 74]. The f-entropy of Imre Csiszár is used in classical information theory (and statistics) [35]; see also the paper F. Liese and I. Vajda, On divergences and informations in statistics and information theory, IEEE Trans. Inform. Theory 52(2006), 4394–4412. The quantum generalization was extended by Dénes Petz in 1985, see for example Chapter 7 in [67]. The strong subadditivity of the von Neumann entropy was proved by E. H. Lieb and M. B. Ruskai in 1973. Details about the f-divergence are in the paper [49]. Theorem 7.4 is from the paper K. M. R. Audenaert, Subadditivity of q-entropies for q > 1, J. Math. Phys. 48(2007), 083507. The quantity (Tr D^q − 1)/(1 − q) is called the q-entropy or the Tsallis entropy. It is remarkable that the strong subadditivity is not true for the Tsallis entropy in the matrix case (but it holds for probability); good information is in the paper [38] and in S. Furuichi, Tsallis entropies and their theorems, properties and applications, Aspects of Optical Sciences and Quantum Information, 2007.

A good introduction to the CCR-algebra is the book [70]. This subject is far from matrix analysis, but the quasi-free states are really described by matrices. The description of the Markovian quasi-free state is from the paper A. Jenčová, D. Petz and J. Pitrik, Markov triplets on CCR-algebras, Acta Sci. Math. (Szeged) 76(2010), 111–134.

Section 7.4 on optimal quantum measurements is from the paper A. J. Scott, Tight informationally complete quantum measurements, J. Phys. A: Math. Gen. 39(2006), 13507. MUBs have a big literature. They are commutative quasi-orthogonal subalgebras. The work of Scott motivated the paper D. Petz, L. Ruppert and A. Szántó, Conditional SIC-POVMs, arXiv:1202.5741. It is interesting that the existence of d MUBs in M_d(C) implies the existence of d + 1 MUBs; see the paper M. Weiner, A gap for the maximum number of mutually unbiased bases, Proc. Amer. Math. Soc. 141(2013), 1963–1969.

The quasi-orthogonality of non-commutative subalgebras of M_d(C) also has a big literature; a summary is the paper D. Petz, Algebraic complementarity in quantum theory, J. Math. Phys. 51(2010), 015215. The SIC POVM is constructed in dimension 6 in the paper M. Grassl, On SIC-POVMs and MUBs in dimension 6, http://arxiv.org/abs/quant-ph/0406175.

Section 7.5 is taken from Sections 10.2–10.4 of D. Petz [74]. The Fisher information appeared in the 1920s. We can suggest the book of Oliver Johnson cited above and the paper K. R. Parthasarathy, On the philosophy of Cramér-Rao-Bhattacharya inequalities in quantum statistics, arXiv:0907.2210. The general quantum matrix formalism was started by D. Petz in the paper [72]. A. Lesniewski and M. B. Ruskai discovered in [62] that all monotone Fisher informations are obtained from a quasi-entropy as a contrast functional.
7.7 Exercises

1. Prove Theorem 7.2.

2. Assume that ℋ_2 is one-dimensional in Theorem 7.14. Describe the possible quasi-free Markov triplets.

3. Show that in Theorem 7.6 condition (iii) cannot be replaced by

    D_{123} D_{23}^{−1} = D_{12} D_2^{−1}.

4. Prove Theorem 7.15.

5. The Bogoliubov-Kubo-Mori Fisher information is induced by the function

    f(x) = (x − 1)/log x = ∫_0^1 x^t dt

and

    γ_D^{BKM}(A, B) = Tr A (J_D^f)^{−1} B

for self-adjoint matrices. Show that

    γ_D^{BKM}(A, B) = ∫_0^∞ Tr (D + tI)^{−1} A (D + tI)^{−1} B dt
        = −∂²/(∂t ∂s) S(D + tA ‖ D + sB) |_{t=s=0}.

6. Prove Theorem 7.16.

7. Show that

    x log x = ∫_0^∞ ( x/(1 + t) − x/(x + t) ) dt

and conclude that the function f(x) = x log x is matrix convex.

8. Define

    S_α(ρ_1 ‖ ρ_2) := ( Tr ρ_1^{1+α} ρ_2^{−α} − 1 ) / α

for α ∈ (0, 1). Show that

    S(ρ_1 ‖ ρ_2) ≤ S_α(ρ_1 ‖ ρ_2)

for density matrices ρ_1 and ρ_2.

9. The functions

    g_p(x) := (1/(p(1 − p))) (x − x^p)  if p ≠ 1,    x log x  if p = 1

can be used for quasi-entropy. For which p > 0 is the function g_p matrix concave?

10. Give an example that condition (iv) in Theorem 7.12 does not imply condition (iii).

11. Assume that

    ( A  B ; B*  C ) ≥ 0.

Prove that

    Tr (AC − B*B) ≤ (Tr A)(Tr C) − (Tr B)(Tr B*).

(Hint: Use Theorem 7.4 in the case q = 2.)

12. Let ρ and σ be invertible density matrices. Show that

    S(ρ‖σ) ≤ Tr ( ρ log(ρ^{1/2} σ^{−1} ρ^{1/2}) ).

13. For λ ∈ [0, 1] let

    χ²_λ(ρ, σ) := Tr ρ σ^{−λ} ρ σ^{λ−1} − 1.

Find the value of λ which gives the minimal quantity.
Index
A B, 196
A B, 69
A : B, 196
A#B, 190
A

, 7
A
t
, 7
B(1), 13
B(1)
sa
, 14
E(ij), 6
G
t
(A, B), 206
H(A, B), 196, 207
H

, 9
H
+
n
, 216
I
n
, 5
L(A, B), 208
M/A, 61
[P]M, 63
, , 8
AG(a, b), 187

p
(Q), 306
J
D
, 152
, 242

p
(a), 242
M
f
(A, B), 212
Tr A, 7
Tr
1
, 278
|A|, 13
|A|
p
, 245
|A|
(k)
, 246
M
sa
n
, 14
M
n
, 5

2
-divergence, 288
det A, 7

p
-norms, 242
1, 8
T, 188
ker A, 10
ranA, 10
(A), 19
a
w
b, 231
a
w(log)
b, 232
m
f
(A, B), 203
s(A), 234
v
1
v
2
, 43
2-positive mapping, 89
absolute value, 31
adjoint
matrix, 7
operator, 14
Ando, 222, 269
Ando and Hiai, 265
annihilating
polynomial, 18
antisymmetric tensor-product, 43
arithmetic-geometric mean, 202
Audenaert, 184, 322
Baker-Campbell-Hausdor
formula, 112
basis, 9
Bell, 40
product, 38
Bernstein theorem, 113
Bessis-Moussa-Villani conjecture, 132
Bhatia, 222
bias matrix, 319
bilinear form, 16
Birkho, 229
block-matrix, 58
Boltzmann entropy, 34, 188, 276
Bourin and Uchiyama, 258
bra and ket, 11
Cauchy matrix, 32
Cayley, 48
Cayley transform, 51
Cayley-Hamilton theorem, 18
channel
Pauli, 94
Werner-Holevo, 101
characteristic polynomial, 18
Choi matrix, 93
coarse-graining, 312
completely
monotone, 112
positive, 71, 90
concave, 145
jointly, 151
conditional
expectation, 80
conjecture
BMV, 133
conjugate
convex function, 146
contraction, 13
operator, 157
contrast functional, 321
convex
function, 145
hull, 144
set, 143
cost matrix, 319
covariance, 74
Cramer, 47
Csisz ar, 322
cyclic vector, 20
decomposition
polar, 31
Schmidt, 22
singular value, 36
spectral, 22
decreasing rearrangement, 228
density matrix, 278
determinant, 7, 27
divided dierence, 127, 145
doubly
stochastic, 49, 228, 229
substochastic, 231
dual
frame, 297
mapping, 88
mean, 203
eigenvector, 19
entangled, 67
entropy
Boltzmann, 34, 188
quasi, 282
Renyi, 136
Tsallis, 280, 322
von Neumann, 124
error
mean quadratic, 308
estimator
locally unbiased, 311
expansion operator, 157
exponential, 105, 267
extreme point, 144
factorization
Schur, 60
UL-, 65
family
exponential, 311
Gibbsian, 311
Fisher information, 310
quantum, 314
formula
Baker-Campbell-Hausdor, 112
Lie-Trotter, 109
Stieltjes inversion, 162
Fourier expansion, 10
frame superoperator, 294
Frobenius inequality, 49
Furuichi, 270, 322
Gauss, 47, 202
Gaussian
distribution, 33
probability, 275
geodesic, 188
geometric mean, 207
weighted, 265
Gibbs state, 233
Gleason, 73
Gleason theorem, 97
Golden-Thompson
-Lieb inequality, 181
inequality, 183, 263, 264
Gram-Schmidt procedure, 9
Holder inequality, 247
Haar measure, 28
Hadamard
inequality, 151
product, 69
Heinz mean, 209
Helstrm inequality, 317
Hermitian matrix, 14
Hessian, 188
Hiai, 222
Hilbert space, 5
Hilbert-Schmidt norm, 245
Holbrook, 222
Horn, 242, 270
conjecture, 270
identity matrix, 5
inequality
Araki-Lieb-Thirring, 263
classical Cramer-Rao, 310
Cramer-Rao, 308
Golden-Thompson, 183, 263, 264
Golden-Thompson-Lieb, 181, 183
Holder, 247
Hadamard, 35, 151
Helstrm, 317
Jensen, 145
Kadison, 88
L owner-Heinz, 141, 172, 191
Lieb-Thirring, 263
Poincare, 23
Powers-Strmer, 261
quantum Cramer-Rao, 311
Rotfeld, 254
Schwarz, 8, 88
Segals, 264
Streater, 119
Weyls, 251
Wielandt, 35, 65
information
Fisher, 310
matrix, Helstrom, 316
skew, 315
informationally complete, 295
inner product, 8
Hilbert-Schmidt, 9
inverse, 6, 28
generalized, 37
irreducible matrix, 59
Jensen inequality, 145
joint concavity, 201
Jordan block, 18
K-functional, 246
Kadison inequality, 88
Karcher mean, 222
kernel, 10, 86
positive definite, 86, 142
Klyachko, 270
Knutson and Tao, 270
Kosaki, 222
Kraus representation, 93
Kronecker
product, 41
sum, 41
Kubo, 222
transform, 316
Kubo-Ando theorem, 198
Kubo-Mori
inner product, 316
Ky Fan, 239
norm, 246
Löwner, 165
Lagrange, 19
interpolation, 115
Laplace transform, 113
Legendre transform, 146
Lie-Trotter formula, 109
Lieb, 181
log-majorization, 232
logarithm, 117
logarithmic
derivative, 312, 316
mean, 208
majorization, 228
log-, 232
weak, 228
Markov property, 277
Marshall and Olkin, 269
MASA, 79, 300
matrix
bias, 319
concave function, 138
convex function, 138
cost, 319
Dirac, 54
doubly stochastic, 228
doubly substochastic, 231
infinitely divisible, 33
mean, 202
monotone function, 138
Pauli, 71, 109
permutation, 15
Toeplitz, 15
tridiagonal, 19
upper triangular, 12
matrix-unit, 6
maximally entangled, 67
mean
arithmetic-geometric, 202
binomial, 221
dual, 203
geometric, 190, 207
harmonic, 196, 207
Heinz, 209
Karcher, 222
logarithmic, 208
matrix, 221
power, 221
power dierence, 209
Stolarsky, 210
transformation, 212
weighted, 206
mini-max expression, 158, 234
minimax principle, 24
Molnár, 222
Moore-Penrose
generalized inverse, 37
more mixed, 233
mutually unbiased bases, 301
Neumann series, 14
norm, 8
ℓ_p-, 242
Hilbert-Schmidt, 9, 245
Ky Fan, 246
operator, 13, 245
Schatten-von Neumann, 245
symmetric, 242, 243
trace-, 245
unitarily invariant, 243
normal operator, 15
Ohno, 97
operator
conjugate linear, 16
connection, 196
convex function, 153
frame, 294
monotone function, 138
norm, 245
normal, 15
positive, 30
self-adjoint, 14
Oppenheim's inequality, 70
ortho-projection, 71
orthogonal projection, 15
orthogonality, 9
orthonormal, 9
Pálfia, 222
parallel sum, 196
parallelogram law, 50
partial
ordering, 67
trace, 42, 93, 278
Pascal matrix, 53
Pauli matrix, 71, 109
permanent, 47
permutation matrix, 15, 229
Petz, 97, 323
Pick function, 160
polar decomposition, 31
polarization identity, 16
positive
mapping, 30, 88
matrix, 30
POVD, 299
POVM, 81, 294
Powers-Størmer inequality, 261
projection, 71
quadratic
cost function, 314
matrix, 33
quantum
f-divergence, 282
Cramer-Rao inequality, 311
Fisher information, 314
Fisher information matrix, 316
score operator, 316
quasi-entropy, 282
quasi-free state, 289
quasi-orthogonal, 301
Rényi entropy, 136
rank, 10
reducible matrix, 59
relative entropy, 119, 147, 279
representing
block-matrix, 93
function, 200
Riemannian manifold, 188
Rotfel'd inequality, 254
Schatten-von Neumann, 245
Schmidt decomposition, 21
Schoenberg theorem, 87
Schrödinger, 22
Schur
complement, 61, 196, 276
factorization, 60
theorem, 69
Schwarz mapping, 283
Segal's inequality, 264
self-adjoint operator, 14
separable
positive matrix, 66
Shannon entropy, 271
SIC POVM, 302
singular
value, 31, 234
value decomposition, 235
skew information, 315, 321
spectral decomposition, 21
spectrum, 19
Stolarsky mean, 210
Streater inequality, 119
strong subadditivity, 150, 178, 279
subadditivity, 150
subalgebra, 78
Suzuki, 132
Sylvester, 48
symmetric
dual gauge function, 248
gauge function, 242
logarithmic derivative, 315
matrix mean, 202
norm, 242, 243
Taylor expansion, 132
tensor product, 38
theorem
Bernstein, 113
Cayley-Hamilton, 18
ergodic, 53
Gelfand-Naimark, 240
Jordan canonical, 18
Kubo-Ando, 198
Löwner, 169
Lidskii-Wielandt, 237
Lieb's concavity, 285
Nevanlinna, 160
Riesz-Fischer, 13
Schoenberg, 87
Schur, 55, 69
Tomiyama, 97
Weyl majorization, 236
Weyl's monotonicity, 69
trace, 7, 24
trace-norm, 245
transformer inequality, 201, 214
transpose, 7
triangular, 12
tridiagonal, 19
Tsallis entropy, 280, 322
unbiased estimation scheme, 308
unitarily invariant norm, 243
unitary, 15
van der Waerden, 49
Vandermonde matrix, 51
variance, 74
vector
cyclic, 20
von Neumann, 49, 97, 243, 269, 322
von Neumann entropy, 124
weak majorization, 228, 231
weakly positive matrix, 35
weighted
mean, 206
Weyl
inequality, 251
majorization theorem, 236
monotonicity, 69
Wielandt inequality, 35, 65, 96
Wigner, 49
Bibliography
[1] T. Ando, Generalized Schur complements, Linear Algebra Appl. 27(1979), 173–186.
[2] T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra Appl. 26(1979), 203–241.
[3] T. Ando, Totally positive matrices, Linear Algebra Appl. 90(1987), 165–219.
[4] T. Ando, Comparison of norms |||f(A) - f(B)||| and |||f(|A - B|)|||, Math. Z. 197(1988), 403–409.
[5] T. Ando, Majorization, doubly stochastic matrices and comparison of eigenvalues, Linear Algebra Appl. 118(1989), 163–248.
[6] T. Ando, Majorization and inequalities in matrix theory, Linear Algebra Appl. 199(1994), 17–67.
[7] T. Ando, private communication, 2009.
[8] T. Ando and F. Hiai, Log majorization and complementary Golden-Thompson type inequalities, Linear Algebra Appl. 197/198(1994), 113–131.
[9] T. Ando and F. Hiai, Operator log-convex functions and operator means, Math. Ann. 350(2011), 611–630.
[10] T. Ando, C.-K. Li and R. Mathias, Geometric means, Linear Algebra Appl. 385(2004), 305–334.
[11] T. Ando and D. Petz, Gaussian Markov triplets approached by block matrices, Acta Sci. Math. (Szeged) 75(2009), 265–281.
[12] D. M. Appleby, Symmetric informationally complete-positive operator valued measures and the extended Clifford group, J. Math. Phys. 46(2005), 052107.
[13] K. M. R. Audenaert and J. S. Aujla, On Ando's inequalities for convex and concave functions, Preprint (2007), arXiv:0704.0099.
[14] K. Audenaert, F. Hiai and D. Petz, Strongly subadditive functions, Acta Math. Hungar. 128(2010), 386–394.
[15] J. S. Aujla and F. C. Silva, Weak majorization inequalities and convex functions, Linear Algebra Appl. 369(2003), 217–233.
[16] J. Bendat and S. Sherman, Monotone and convex operator functions, Trans. Amer. Math. Soc. 79(1955), 58–71.
[17] Á. Besenyei, The Hasegawa-Petz mean: properties and inequalities, J. Math. Anal. Appl. 339(2012), 441–450.
[18] Á. Besenyei and D. Petz, Characterization of mean transformations, Linear Multilinear Algebra 60(2012), 255–265.
[19] D. Bessis, P. Moussa and M. Villani, Monotonic converging variational approximations to the functional integrals in quantum statistical mechanics, J. Mathematical Phys. 16(1975), 2318–2325.
[20] R. Bhatia, Matrix Analysis, Springer, New York, 1996.
[21] R. Bhatia, Positive Definite Matrices, Princeton Univ. Press, Princeton, 2007.
[22] R. Bhatia and C. Davis, A Cauchy-Schwarz inequality for operators with applications, Linear Algebra Appl. 223/224(1995), 119–129.
[23] R. Bhatia and F. Kittaneh, Norm inequalities for positive operators, Lett. Math. Phys. 43(1998), 225–231.
[24] R. Bhatia and K. R. Parthasarathy, Positive definite functions and operator inequalities, Bull. London Math. Soc. 32(2000), 214–228.
[25] R. Bhatia and T. Sano, Loewner matrices and operator convexity, Math. Ann. 344(2009), 703–716.
[26] R. Bhatia and T. Sano, Positivity and conditional positivity of Loewner matrices, Positivity 14(2010), 421–430.
[27] G. Birkhoff, Tres observaciones sobre el algebra lineal, Univ. Nac. Tucumán Rev. Ser. A 5(1946), 147–151.
[28] J.-C. Bourin, Convexity or concavity inequalities for Hermitian operators, Math. Ineq. Appl. 7(2004), 607–620.
[29] J.-C. Bourin, A concavity inequality for symmetric norms, Linear Algebra Appl. 413(2006), 212–217.
[30] J.-C. Bourin and M. Uchiyama, A matrix subadditivity inequality for f(A + B) and f(A) + f(B), Linear Algebra Appl. 423(2007), 512–518.
[31] A. R. Calderbank, P. J. Cameron, W. M. Kantor and J. J. Seidel, Z_4-Kerdock codes, orthogonal spreads, and extremal Euclidean line-sets, Proc. London Math. Soc. 75(1997), 436.
[32] M. D. Choi, Completely positive mappings on complex matrices, Linear Algebra Appl. 10(1977), 285–290.
[33] J. B. Conway, Functions of One Complex Variable I, Second edition, Springer, New York-Berlin, 1978.
[34] D. A. Cox, The arithmetic-geometric mean of Gauss, Enseign. Math. 30(1984), 275–330.
[35] I. Csiszár, Information type measure of difference of probability distributions and indirect observations, Studia Sci. Math. Hungar. 2(1967), 299–318.
[36] W. F. Donoghue, Jr., Monotone Matrix Functions and Analytic Continuation, Springer, Berlin-Heidelberg-New York, 1974.
[37] W. Feller, An introduction to probability theory with its applications, vol. II, John Wiley & Sons, Inc., New York-London-Sydney, 1971.
[38] S. Furuichi, On uniqueness theorems for Tsallis entropy and Tsallis relative entropy, IEEE Trans. Infor. Theor. 51(2005), 3638–3645.
[39] T. Furuta, Concrete examples of operator monotone functions obtained by an elementary method without appealing to Löwner integral representation, Linear Algebra Appl. 429(2008), 972–980.
[40] F. Hansen and G. K. Pedersen, Jensen's inequality for operators and Löwner's theorem, Math. Ann. 258(1982), 229–241.
[41] F. Hansen and G. K. Pedersen, Jensen's operator inequality, Bull. London Math. Soc. 35(2003), 553–564.
[42] F. Hansen, Metric adjusted skew information, Proc. Natl. Acad. Sci. USA 105(2008), 9909–9916.
[43] F. Hiai, Log-majorizations and norm inequalities for exponential operators, in Linear operators (Warsaw, 1994), 119–181, Banach Center Publ. 38, Polish Acad. Sci., Warsaw, 1997.
[44] F. Hiai and H. Kosaki, Means for matrices and comparison of their norms, Indiana Univ. Math. J. 48(1999), 899–936.
[45] F. Hiai and H. Kosaki, Means of Hilbert Space Operators, Lecture Notes in Math., vol. 1820, Springer, Berlin, 2003.
[46] F. Hiai, H. Kosaki, D. Petz and M. B. Ruskai, Families of completely positive maps associated with monotone metrics, Linear Algebra Appl., to appear.
[47] F. Hiai and D. Petz, The Golden-Thompson trace inequality is complemented, Linear Algebra Appl. 181(1993), 153–185.
[48] F. Hiai and D. Petz, Riemannian geometry on positive definite matrices related to means, Linear Algebra Appl. 430(2009), 3105–3130.
[49] F. Hiai, M. Mosonyi, D. Petz and C. Bény, Quantum f-divergences and error correction, Rev. Math. Phys. 23(2011), 691–747.
[50] T. Hida, Canonical representations of Gaussian processes and their applications, Mem. Coll. Sci. Univ. Kyoto Ser. A Math. 33(1960/1961), 109–155.
[51] T. Hida and M. Hitsuda, Gaussian processes, Translations of Mathematical Monographs, 120, American Mathematical Society, Providence, RI, 1993.
[52] A. Horn, Eigenvalues of sums of Hermitian matrices, Pacific J. Math. 12(1962), 225–241.
[53] R. A. Horn and C. R. Johnson, Matrix analysis, Cambridge University Press, 1985.
[54] I. D. Ivanovic, Geometrical description of quantal state determination, J. Phys. A 14(1981), 3241.
[55] A. A. Klyachko, Stable bundles, representation theory and Hermitian operators, Selecta Math. 4(1998), 419–445.
[56] A. Knutson and T. Tao, The honeycomb model of GL_n(C) tensor products I: Proof of the saturation conjecture, J. Amer. Math. Soc. 12(1999), 1055–1090.
[57] T. Kosem, Inequalities between ‖f(A + B)‖ and ‖f(A) + f(B)‖, Linear Algebra Appl. 418(2006), 153–160.
[58] F. Kubo and T. Ando, Means of positive linear operators, Math. Ann. 246(1980), 205–224.
[59] P. D. Lax, Functional Analysis, John Wiley & Sons, 2002.
[60] P. D. Lax, Linear algebra and its applications, John Wiley & Sons, 2007.
[61] A. Lenard, Generalization of the Golden-Thompson inequality Tr(e^A e^B) ≥ Tr e^{A+B}, Indiana Univ. Math. J. 21(1971), 457–467.
[62] A. Lesniewski and M. B. Ruskai, Monotone Riemannian metrics and relative entropy on noncommutative probability spaces, J. Math. Phys. 40(1999), 5702–5724.
[63] E. H. Lieb, Convex trace functions and the Wigner-Yanase-Dyson conjecture, Advances in Math. 11(1973), 267–288.
[64] E. H. Lieb and R. Seiringer, Equivalent forms of the Bessis-Moussa-Villani conjecture, J. Stat. Phys. 115(2004), 185–190.
[65] K. Löwner, Über monotone Matrixfunctionen, Math. Z. 38(1934), 177–216.
[66] A. W. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications, Academic Press, New York, 1979.
[67] M. Ohya and D. Petz, Quantum Entropy and Its Use, Springer, Heidelberg, 1993. Second edition 2004.
[68] M. Pálfia, Weighted matrix means and symmetrization procedures, Linear Algebra Appl. 438(2013), 1746–1768.
[69] D. Petz, A variational expression for the relative entropy, Commun. Math. Phys. 114(1988), 345–348.
[70] D. Petz, An invitation to the algebra of the canonical commutation relation, Leuven University Press, Leuven, 1990.
[71] D. Petz, Quasi-entropies for states of a von Neumann algebra, Publ. RIMS Kyoto Univ. 21(1985), 781–800.
[72] D. Petz, Monotone metrics on matrix spaces, Linear Algebra Appl. 244(1996), 81–96.
[73] D. Petz, Quasi-entropies for finite quantum systems, Rep. Math. Phys. 23(1986), 57–65.
[74] D. Petz, Quantum Information Theory and Quantum Statistics, Springer, Berlin, 2008.
[75] D. Petz and H. Hasegawa, On the Riemannian metric of α-entropies of density matrices, Lett. Math. Phys. 38(1996), 221–225.
[76] D. Petz and R. Temesi, Means of positive numbers and matrices, SIAM Journal on Matrix Analysis and Applications 27(2006), 712–720.
[77] D. Petz, From f-divergence to quantum quasi-entropies and their use, Entropy 12(2010), 304–325.
[78] M. Reed and B. Simon, Methods of Modern Mathematical Physics II, Academic Press, New York, 1975.
[79] E. Schrödinger, Probability relations between separated systems, Proc. Cambridge Philos. Soc. 31(1936), 446–452.
[80] M. Suzuki, Quantum statistical Monte Carlo methods and applications to spin systems, J. Stat. Phys. 43(1986), 883–909.
[81] C. J. Thompson, Inequalities and partial orders on matrix spaces, Indiana Univ. Math. J. 21(1971), 469–480.
[82] J. A. Tropp, From joint convexity of quantum relative entropy to a concavity theorem of Lieb, Proc. Amer. Math. Soc. 140(2012), 1757–1760.
[83] M. Uchiyama, Subadditivity of eigenvalue sums, Proc. Amer. Math. Soc. 134(2006), 1405–1412.
[84] H. Wielandt, An extremum property of sums of eigenvalues, Proc. Amer. Math. Soc. 6(1955), 106–110.
[85] W. K. Wootters and B. D. Fields, Optimal state-determination by mutually unbiased measurements, Ann. Phys. 191(1989), 363.
[86] G. Zauner, Quantendesigns - Grundzüge einer nichtkommutativen Designtheorie, PhD thesis (University of Vienna, 1999).
[87] X. Zhan, Matrix inequalities, Springer, 2002.
[88] F. Zhang, The Schur complement and its applications, Springer, 2005.