RM Notes Speicher v2
Random matrices
Lecture notes
Winter 2019/20
Table of contents

1 Introduction
1.1 Brief history of random matrix theory
1.2 What are random matrices and what do we want to know about them?
1.3 Wigner’s semicircle law
1.4 Universality
1.5 Concentration phenomena
1.6 From histograms to moments
1.7 Choice of scaling
1.8 The semicircle and its moments
1.9 Types of convergence
5.2 Stein’s identity for independent Gaussian variables
5.3 Semicircle law for GOE
11 The Circular Law
11.1 Circular law for Ginibre ensemble
11.2 General circular law
13 Exercises
13.1 Assignment 1
13.2 Assignment 2
13.3 Assignment 3
13.4 Assignment 4
13.5 Assignment 5
13.6 Assignment 6
13.7 Assignment 7
13.8 Assignment 8
13.9 Assignment 9
13.10 Assignment 10
13.11 Assignment 11
14 Literature
1 Introduction
We start by giving a brief history of the subject and a feeling for some of the basic objects, questions, and methods; this is just motivational and should be seen as an appetizer. Rigorous versions of the statements will come in later chapters.
1.2 What are random matrices and what do we want to know about them?

A random matrix is a matrix A = (a_ij)_{i,j=1}^N where the entries a_ij are chosen randomly, and we are mainly interested in the eigenvalues of the matrices. Often we require A to be selfadjoint, which guarantees that its eigenvalues are real.
Example 1.1. Choose aij ∈ {−1, +1} with aij = aji for all i, j. We consider all
such matrices and ask for typical or generic behaviour of the eigenvalues. In a more
probabilistic language we declare all allowed matrices to have the same probability
and we ask for probabilities of properties of the eigenvalues. We can do this for
different sizes N . To get a feeling, let us look at different N .
• For N = 1 we have two matrices.

  matrix   eigenvalues   probability of the matrix
  (1)      +1            1/2
  (−1)     −1            1/2
• For N = 2 we have eight matrices.

  matrix            eigenvalues   probability of the matrix
  ( 1  1 ;  1  1)   0, 2          1/8
  ( 1  1 ;  1 −1)   −√2, √2       1/8
  ( 1 −1 ; −1  1)   0, 2          1/8
  (−1  1 ;  1  1)   −√2, √2       1/8
  ( 1 −1 ; −1 −1)   −√2, √2       1/8
  (−1  1 ;  1 −1)   −2, 0         1/8
  (−1 −1 ; −1  1)   −√2, √2       1/8
  (−1 −1 ; −1 −1)   −2, 0         1/8
[Figure: eigenvalue histograms for two such random matrices]
The message of those histograms is not so clear, apart from maybe that degenerate eigenvalues are atypical. However, if we increase N further, then much more structure appears.
• Here are the eigenvalue histograms for two “random” 100 × 100 matrices ...
[Figure: eigenvalue histograms of two random 100 × 100 matrices]
• ... and here for two “random” 3000 × 3000 matrices ...
[Figure: eigenvalue histograms of two random 3000 × 3000 matrices]
To be clear, no coins were thrown to produce those matrices; instead we relied on the MATLAB procedure for creating random matrices. Note that we also rescaled our matrices, as we will address in Section 1.7.
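For readers who want to reproduce such pictures, here is a sketch of the same experiment in Python/numpy (the notes used MATLAB; the function and parameter names below are our own choices):

```python
import numpy as np

def sample_sign_matrix(n, rng):
    """Symmetric random matrix with entries +-1 (a_ij = a_ji), rescaled by 1/sqrt(n)."""
    a = rng.choice([-1.0, 1.0], size=(n, n))
    a = np.triu(a) + np.triu(a, 1).T      # enforce the symmetry a_ij = a_ji
    return a / np.sqrt(n)

rng = np.random.default_rng(0)
eigs = np.linalg.eigvalsh(sample_sign_matrix(1000, rng))
hist, edges = np.histogram(eigs, bins=50, range=(-2.5, 2.5), density=True)
# For large n the histogram is close to the semicircle density on [-2, 2].
```

Because the entries are exactly ±1/√n, the normalized second moment (1/n) Σ λ_i² equals 1 deterministically; higher moments fluctuate only mildly around their limits.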
1.3 Wigner’s semicircle law

What we see in the above figures is the most basic and important result of random matrix theory, the so-called Wigner’s semicircle law . . .

[Figure: the semicircle density]
. . . which says that typically the eigenvalue distribution of such a random matrix
converges to Wigner’s semicircle for N → ∞.
[Figure: eigenvalue histograms with the semicircle density overlaid]
Note the quite surprising feature that the limit of the random eigenvalue distribution for N → ∞ is a deterministic object: the semicircle distribution. The randomness disappears for large N.
1.4 Universality
This statement is valid much more generally. Choose the aij not just from {−1, +1}
but, for example,
• aij ∈ {1, 2, 3, 4, 5, 6},
• aij normally (Gauß) distributed,
• aij distributed according to your favorite distribution,
but still independent (apart from symmetry), then we still have the same result:
The eigenvalue distribution typically converges to a semicircle for N → ∞.
Thus,

  vol(B)/vol(B_1(0)) = 1 − (1 − ε)^n  →  1   for n → ∞.
This says that in high dimensions the volume of a ball is concentrated in an arbitrar-
ily small neighborhood of the surface. This is, of course, not true in small dimension -
hence from our usual 3-dimensional perspective this appears quite counter-intuitive.
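A quick numerical illustration of this concentration of volume (here B is the shell of relative thickness ε inside the unit ball; the helper name is our own):

```python
def shell_fraction(n, eps):
    """Fraction of the volume of the unit ball in R^n lying within
    distance eps of its surface: 1 - (1 - eps)^n."""
    return 1.0 - (1.0 - eps) ** n

for n in (3, 100, 1000):
    print(n, shell_fraction(n, 0.01))
```

Even for ε = 0.01, essentially all of the volume sits in the thin shell once n is in the hundreds, while in dimension 3 the same shell is negligible.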
1.6 From histograms to moments
Let A_N = A = (a_ij)_{i,j=1}^N be our selfadjoint matrix with a_ij = ±1 randomly chosen. Then we typically see for the eigenvalues of A:

  (1/N) · #{i | λ_i ∈ [s, t]}  →  µ_W([s, t])   for N → ∞,   (?)

where λ_1, ..., λ_N are the eigenvalues of A counted with multiplicity and, with 1_[s,t] the characteristic function of [s, t], i.e.,

  1_[s,t](x) = 1 for x ∈ [s, t],   1_[s,t](x) = 0 for x ∉ [s, t],

the left hand side is (1/N) Σ_i f(λ_i) for f = 1_[s,t]. It is easier to calculate this for other functions f, in particular for f of the form f(x) = x^n, i.e.,

  (1/N) Σ_{i=1}^N λ_i^n  →  ∫_R x^n dµ_W(x)   for N → ∞;   (??)
the latter are the moments of µW . (Note that µW must necessarily be a probability
measure.)
We will see later that in our case the validity of (?) for all s < t is equivalent to
the validity of (??) for all n. Hence we want to show (??) for all n.
Remark 1.3. The above raises of course the question: What is the advantage of (??)
over (?), or of xn over 1[s,t] ?
Note that A = A* is selfadjoint and hence can be diagonalized, i.e., A = UDU*, where U is unitary and D is diagonal with d_ii = λ_i for all i (where λ_1, ..., λ_N are the eigenvalues of A, counted with multiplicity). Moreover, we have

  A^n = (UDU*)^n = UD^nU*   with   D^n = diag(λ_1^n, ..., λ_N^n),

hence

  Σ_{i=1}^N λ_i^n = Tr(D^n) = Tr(UD^nU*) = Tr(A^n)

and thus

  (1/N) Σ_{i=1}^N λ_i^n = (1/N) Tr(A^n).

Notation 1.4. We denote by tr = (1/N) Tr the normalized trace of matrices, i.e.,

  tr((a_ij)_{i,j=1}^N) = (1/N) Σ_{i=1}^N a_ii.
So we are claiming that for our matrices we typically have that

  tr(A_N^n)  →  ∫ x^n dµ_W(x)   for N → ∞.

The advantage of this formulation is that the quantity tr(A_N^n) can be expressed in terms of the entries of the matrix, without actually having to calculate the eigenvalues.
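The identity behind this reformulation, (1/N) Σ λ_i^n = tr(A^n), is easy to confirm numerically; a small sketch with a generic symmetric test matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                    # a generic selfadjoint test matrix
eigs = np.linalg.eigvalsh(A)

for m in range(1, 6):
    lhs = np.trace(np.linalg.matrix_power(A, m)) / n   # normalized trace of A^m
    rhs = np.mean(eigs ** m)                           # (1/n) sum_i lambda_i^m
    assert np.isclose(lhs, rhs)
```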
Since this has to converge for N → ∞ we should rescale our matrices,

  A_N ↦ (1/√N) A_N,

i.e., we consider matrices A_N = (a_ij)_{i,j=1}^N, where a_ij = ±1/√N. For this scaling we claim that we typically have that

  tr(A_N^n)  →  ∫_{−2}^{2} x^n dµ_W(x)   for N → ∞.
(ii) The Catalan numbers are uniquely determined by this recursion and by
the initial value C0 = 1.
(2) The semicircular distribution µ_W is a probability measure, i.e.,

  (1/2π) ∫_{−2}^{2} √(4 − x²) dx = 1.
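Both the normalization and the fact that the even moments of µ_W are the Catalan numbers (while the odd ones vanish) can be checked numerically; a sketch using a simple midpoint rule (helper names are our own):

```python
import math
import numpy as np

def catalan(k):
    """Catalan number C_k = binom(2k, k) / (k + 1)."""
    return math.comb(2 * k, k) // (k + 1)

def semicircle_moment(n, num=200000):
    """Midpoint-rule approximation of (1/2pi) int_{-2}^{2} x^n sqrt(4 - x^2) dx."""
    h = 4.0 / num
    x = -2.0 + (np.arange(num) + 0.5) * h
    return float(np.sum(x ** n * np.sqrt(4.0 - x ** 2)) * h / (2.0 * np.pi))

assert abs(semicircle_moment(0) - 1.0) < 1e-6                  # total mass 1
for k in range(1, 5):
    assert abs(semicircle_moment(2 * k) - catalan(k)) < 1e-4   # even moments: C_k
    assert abs(semicircle_moment(2 * k - 1)) < 1e-9            # odd moments vanish
```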
(2) Then we show that with high probability the deviation from the average becomes small as N → ∞.
We will first consider step (1); (2) is a concentration phenomenon and will be
treated later.
Note that step (1) is giving us the insight into why the semicircle shows up. Step
(2) is more of a theoretical nature, adding nothing to our understanding of the
semicircle, but making the (very interesting!) statement that in high dimensions
the typical behaviour is close to the average behaviour.
2 Gaussian Random Matrices:
Wick Formula and
Combinatorial Proof of
Wigner’s Semicircle
We want to prove convergence of our random matrices to the semicircle by showing

  E[tr(A_N^{2k})]  →  C_k   for N → ∞,

  E[X^n] = (1/√(2π)) ∫_R t^n e^{−t²/2} dt.
Proposition 2.2. The moments of a standard Gaussian random variable are of the form

  (1/√(2π)) ∫_{−∞}^{∞} t^n e^{−t²/2} dt = 0 for n odd,   and   (n − 1)!! for n even.
(2) Follows from (i) and Proposition 2.2.
Example 2.6. Usually we draw our pair partitions by connecting the elements in each pair. Then E[X²] = 1 corresponds to the single pairing (1, 2) of {1, 2}, and E[X⁴] = 3 corresponds to the three pairings of {1, 2, 3, 4}:

  (1, 2)(3, 4),   (1, 3)(2, 4),   (1, 4)(2, 3).
Remark 2.7 (Independent Gaussian random variables). We will have several, say
two, Gaussian random variables X, Y and have to calculate their joint moments.
The random variables are independent; this means that their joint distribution is
the product measure of the single distributions,
E [X n Y m ] = E [X n ] · E [Y m ] .
This gives then also a combinatorial description for their mixed moments:

  E[X^n Y^m] = E[X^n] · E[Y^m]
             = #{pairings of X⋯X (n letters)} · #{pairings of Y⋯Y (m letters)}
             = #{pairings of X⋯X Y⋯Y which connect X with X and Y with Y}.

For example, E[XXYY] = 1, coming from the single such pairing (X, X)(Y, Y). On the other hand, E[XXXYXY] = 3, since for the word X X X Y X Y we have the following three possible pairings connecting only equal letters (writing the positions 1, ..., 6):

  (1, 2)(3, 5)(4, 6),   (1, 3)(2, 5)(4, 6),   (1, 5)(2, 3)(4, 6).
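This pairing description is straightforward to implement; a brute-force sketch (function names are our own) that counts the admissible pairings of a word in independent standard Gaussians:

```python
def pairings(positions):
    """All pair partitions of a list of positions."""
    if not positions:
        yield []
        return
    first, rest = positions[0], positions[1:]
    for j in range(len(rest)):
        for p in pairings(rest[:j] + rest[j + 1:]):
            yield [(first, rest[j])] + p

def mixed_moment(word):
    """E[x_1 ... x_n] for a word in independent standard Gaussians:
    the number of pairings connecting only equal letters
    (E[XY] = 0 and E[XX] = E[YY] = 1)."""
    return sum(
        1
        for p in pairings(list(range(len(word))))
        if all(word[i] == word[j] for i, j in p)
    )

assert mixed_moment("XXYY") == 1
assert mixed_moment("XXXYXY") == 3
assert mixed_moment("XXXX") == 3      # E[X^4] = 3!! = 3
```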
Consider now x_1, ..., x_n ∈ {X, Y}. Then we still have

  E[x_1 ⋯ x_n] = Σ_{π ∈ P_2(n)} Π_{(i,j) ∈ π} E[x_i x_j].
Theorem 2.8 (Wick 1950, physics; Isserlis 1918, statistics). Let Y1 , . . . , Yp be inde-
pendent standard Gaussian random variables and consider x1 , . . . , xn ∈ {Y1 , . . . , Yp }.
Then we have the Wick formula

  E[x_1 ⋯ x_n] = Σ_{π ∈ P_2(n)} E_π[x_1, ..., x_n],
Note that the Wick formula is linear in the xi , hence it remains valid if we replace
the xi by linear combinations of the xj . In particular, we can go over to complex
Gaussian variables.
Definition 2.9. A standard complex Gaussian random variable Z is of the form
X + iY
Z= √ ,
2
where X and Y are independent standard real Gaussian variables.
Remark 2.10. Let Z be a standard complex Gaussian, i.e., Z = (X + iY)/√2. Then Z̄ = (X − iY)/√2 and the first and second moments are given by

• E[Z] = 0 = E[Z̄];
• E[Z²] = E[ZZ] = ½(E[XX] − E[YY] + i(E[XY] + E[YX])) = 0;
• E[Z̄²] = 0;
• E[|Z|²] = E[ZZ̄] = ½(E[XX] + E[YY] + i(E[YX] − E[XY])) = 1.

Hence, for z_1, z_2 ∈ {Z, Z̄} and the pairing π = (1, 2) we have

  E[z_1 z_2] = 1 if π connects Z with Z̄,   and   0 if π connects Z with Z or Z̄ with Z̄.
Definition 2.12. A Gaussian random matrix is of the form A_N = (1/√N)(a_ij)_{i,j=1}^N, where

• A_N = A_N*, i.e., a_ij = ā_ji for all i, j;
• {a_ij | i ≥ j} are independent;
• each a_ij is a standard Gaussian random variable, which is complex for i ≠ j and real for i = j.
Remark 2.13. (1) More precisely, we should address the above as selfadjoint Gaus-
sian random matrices.
(2) Another common name for those random matrices is gue, which stands for Gaussian unitary ensemble. “Unitary” corresponds here to the fact that the entries are complex, since such matrices are invariant under unitary transformations. With gue(N) we denote the gue of size N × N. There are also real and quaternionic versions, the Gaussian orthogonal ensembles goe and the Gaussian symplectic ensembles gse.
(3) Note that we can also express this definition in terms of the Wick formula 2.11 as

  E[a_{i(1)j(1)} ⋯ a_{i(n)j(n)}] = Σ_{π ∈ P_2(n)} E_π[a_{i(1)j(1)}, ..., a_{i(n)j(n)}],

for all n and 1 ≤ i(1), j(1), ..., i(n), j(n) ≤ N, and where the second moments are given by
and more concretely, E [a12 a21 a11 a11 ] = 1 and E [a12 a12 a21 a21 ] = 2.
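A gue(N) matrix in the normalization of Definition 2.12 can be sampled with a few lines of numpy; a sketch (the helper name is our own):

```python
import numpy as np

def sample_gue(n, rng):
    """Sample from gue(n) in the normalization of Definition 2.12:
    standard complex Gaussian entries above the diagonal, standard real
    Gaussian entries on the diagonal, overall scaling 1/sqrt(n)."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    a = np.triu(z, 1)
    a = a + a.conj().T                       # a_ij = conj(a_ji)
    a = a + np.diag(rng.standard_normal(n))  # real diagonal
    return a / np.sqrt(n)

rng = np.random.default_rng(2)
A = sample_gue(500, rng)
assert np.allclose(A, A.conj().T)            # selfadjoint
# tr(A^2) concentrates around E[tr(A^2)] = 1
print(np.trace(A @ A).real / 500)
```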
Remark 2.14 (Calculation of E[tr(A_N^m)]). For our Gaussian random matrices we want to calculate the moments

  E[tr(A_N^m)] = (1/N^{m/2+1}) Σ_{i(1),...,i(m)=1}^N E[a_{i(1)i(2)} a_{i(2)i(3)} ⋯ a_{i(m)i(1)}].

Let us first consider small examples before we treat the general case:

(1)

  E[tr(A_N²)] = (1/N²) Σ_{i,j=1}^N E[a_ij a_ji] = (1/N²) · N² = 1,

since each E[a_ij a_ji] = 1,
and calculate

  Σ_{i,j,k,l=1}^N E_{π1}[a_ij, a_jk, a_kl, a_li] = Σ_{i,j,k,l: i=k} 1 = N³,

  Σ_{i,j,k,l=1}^N E_{π2}[a_ij, a_jk, a_kl, a_li] = Σ_{i,j,k,l: j=l} 1 = N³,

  Σ_{i,j,k,l=1}^N E_{π3}[a_ij, a_jk, a_kl, a_li] = Σ_{i,j,k,l: i=l, j=k, j=i, k=l} 1 = Σ_{i=1}^N 1 = N,

hence

  E[tr(A_N⁴)] = (1/N³)(N³ + N³ + N) = 2 + 1/N².

So we have

  lim_{N→∞} E[tr(A_N⁴)] = 2 = C_2.
find that

  E[tr(A_N^m)] = (1/N^{m/2+1}) Σ_{i(1),...,i(m)=1}^N Σ_{π ∈ P_2(m)} Π_{(k,l) ∈ π} E[a_{i(k)i(k+1)} a_{i(l)i(l+1)}]
               = (1/N^{m/2+1}) Σ_{π ∈ P_2(m)} Σ_{i(1),...,i(m)=1}^N Π_k [i(k) = i(π(k) + 1)],

where the condition in the product can be rewritten as i(k) = i(γπ(k)) with γ = (1, 2, ..., m), so that γπ ∈ S_m. Thus we get finally

  E[tr(A_N^m)] = (1/N^{m/2+1}) Σ_{π ∈ P_2(m)} N^{#(γπ)},
Theorem 2.15. Let A_N be a gue(N) random matrix. Then we have for all m ∈ N

  E[tr(A_N^m)] = Σ_{π ∈ P_2(m)} N^{#(γπ) − m/2 − 1}.
Example 2.16. (1) This says in particular that all odd moments are zero, since P_2(2k + 1) = ∅.

(2) Let m = 2, then γ = (1, 2) and we have only one π = (1, 2); then γπ = id = (1)(2), and thus #(γπ) = 2 and

  #(γπ) − m/2 − 1 = 0.

Thus,

  E[tr(A_N²)] = N⁰ = 1.

(3) Let m = 4 and γ = (1, 2, 3, 4). Then there are three π ∈ P_2(4) with the following contributions:

  π            γπ             #(γπ) − 3   contribution
  (1,2)(3,4)   (1,3)(2)(4)     0          N⁰ = 1
  (1,3)(2,4)   (1,4,3,2)      −2          N⁻² = 1/N²
  (1,4)(2,3)   (1)(2,4)(3)     0          N⁰ = 1

so that

  E[tr(A_N⁴)] = 2 + 1/N².

(4) In the same way one can calculate that

  E[tr(A_N⁶)] = 5 + 10/N²,
  E[tr(A_N⁸)] = 14 + 70/N² + 21/N⁴.
(5) For m = 6 the following 5 pairings give contribution N⁰:

  (1,2)(3,4)(5,6),   (1,2)(3,6)(4,5),   (1,6)(2,3)(4,5),   (1,6)(2,5)(3,4),   (1,4)(2,3)(5,6).

Those are non-crossing pairings; all other pairings π ∈ P_2(6) have a crossing, i.e., two pairs (i, k) and (j, l) with i < j < k < l; such a configuration is not allowed in a non-crossing pairing.

We put

  NC_2(m) = {π ∈ P_2(m) | π is non-crossing}.
(3) The 5 elements of NC_2(6) are given in Example 2.16 (5); P_2(6) contains 15 elements, thus there are 15 − 5 = 10 more elements in P_2(6) with crossings.
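Theorem 2.15 and the numbers in Example 2.16 can be verified exactly on a computer by enumerating all pairings and counting the cycles of γπ; a brute-force sketch (helper names are our own):

```python
from fractions import Fraction

def pairings(elems):
    """All pair partitions of a list of points."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for j in range(len(rest)):
        for p in pairings(rest[:j] + rest[j + 1:]):
            yield [(first, rest[j])] + p

def cycle_count(perm):
    """Number of cycles of a permutation given as a dict on {0, ..., m-1}."""
    seen, count = set(), 0
    for start in perm:
        if start not in seen:
            count += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = perm[x]
    return count

def gue_moment(m, N):
    """E[tr(A_N^m)] via Theorem 2.15: sum over pairings of N^{#(gamma pi) - m/2 - 1}."""
    gamma = {i: (i + 1) % m for i in range(m)}   # the long cycle (1, 2, ..., m)
    total = Fraction(0)
    for p in pairings(list(range(m))):
        pi = {}
        for a, b in p:
            pi[a], pi[b] = b, a
        c = cycle_count({i: gamma[pi[i]] for i in range(m)})
        total += Fraction(N) ** (c - m // 2 - 1)
    return total

N = 10
assert gue_moment(2, N) == 1
assert gue_moment(4, N) == 2 + Fraction(1, N ** 2)
assert gue_moment(6, N) == 5 + Fraction(10, N ** 2)
assert gue_moment(8, N) == 14 + Fraction(70, N ** 2) + Fraction(21, N ** 4)
```

Exact rational arithmetic via `Fraction` keeps the 1/N² corrections visible instead of burying them in floating-point noise.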
Remark 2.19. Note that NC-pairings have a recursive structure, which usually is crucial for dealing with them.

(1) The first pair of π ∈ NC_2(2k) must necessarily be of the form (1, 2l), and the remaining pairs can only pair within {2, ..., 2l − 1} or within {2l + 1, ..., 2k}.

(2) Iterating this shows that we must find in any π ∈ NC_2(2k) at least one pair of the form (i, i + 1) with 1 ≤ i ≤ 2k − 1. Removing this pair gives an NC-pairing of 2k − 2 points. This characterizes the NC-pairings as those pairings which can be reduced to the empty set by iterated removal of pairs consisting of neighbors.

An example for the reduction of a non-crossing pairing is the following (removing first the pairs (3,4) and (6,7), then the pair (2,5)):

  (1,8)(2,5)(3,4)(6,7)  →  (1,8)(2,5)  →  (1,8)  →  ∅.

In the case of a crossing pairing, some reductions might be possible, but eventually one arrives at a point where no further reduction can be done, e.g.:

  (1,5)(2,3)(4,6)  →  (1,5)(4,6):  no further reduction possible!
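The reduction in (2) yields a simple algorithmic test for non-crossingness; a sketch (the function name is our own):

```python
def is_noncrossing(pairs):
    """Test a pairing by repeatedly removing pairs of points that are neighbors
    among the points still present; by the reduction of Remark 2.19 the pairing
    is non-crossing iff this process ends at the empty set."""
    pairs = [tuple(sorted(p)) for p in pairs]
    points = sorted(x for p in pairs for x in p)
    while pairs:
        for a, b in pairs:
            if points.index(b) == points.index(a) + 1:   # (a, b) are neighbors
                pairs.remove((a, b))
                points.remove(a)
                points.remove(b)
                break
        else:
            return False   # no neighboring pair left: a crossing remains
    return True

assert is_noncrossing([(1, 8), (2, 5), (3, 4), (6, 7)])
assert not is_noncrossing([(1, 5), (2, 3), (4, 6)])
```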
Proposition 2.20. Consider m even and let π ∈ P_2(m), which we identify with a permutation π ∈ S_m. As before, γ = (1, 2, ..., m) ∈ S_m. Then we have:

(1) #(γπ) − m/2 − 1 ≤ 0 for all π ∈ P_2(m).

(2) #(γπ) − m/2 − 1 = 0 if and only if π ∈ NC_2(m).

Proof. First we note that a pair (i, i + 1) in π corresponds to a fixed point of γπ. More precisely, in such a situation π maps i + 1 to i and γ maps i back to i + 1, while π maps i to i + 1 and γ maps i + 1 to i + 2. Hence γπ contains the cycles (i + 1) and (..., i, i + 2, ...).

This implication also goes in the other direction: if γπ(i + 1) = i + 1, then π(i + 1) = γ⁻¹(i + 1) = i. Since π is a pairing we must then also have π(i) = i + 1, and hence we have the pair (i, i + 1) in π.
If we have (i, i + 1) in π, we can remove the points i and i + 1, yielding another
pairing π̃. By doing so, we remove in γπ the cycle (i + 1) and we remove in the cycle
(. . . , i, i + 2, . . . ) the point i, yielding γ π̃. We reduce thus m by 2 and #(γπ) by 1.
If π is NC we can iterate this until we arrive at π̃ with m = 2. Then we have
π̃ = (1, 2) and γ = (1, 2) such that γ π̃ = (1)(2) and #(γ π̃) = 2. If m = 2k we did
k − 1 reductions where we reduced in each step the number of cycles by 1 and at
the end we remain with 2 cycles, hence
m
#(γπ) = (k − 1) · 1 + 2 = k + 1 = + 1.
2
Here is an example for this (removing first the pairs (3,4) and (6,7), then the pair (2,5)):

  π = (1,8)(2,5)(3,4)(6,7)  →  (1,8)(2,5)  →  (1,8).
Proof. This is true for m odd, since then both sides are equal to zero. Consider m = 2k even. Then Theorem 2.15 and Proposition 2.20 show that

  lim_{N→∞} E[tr(A_N^m)] = lim_{N→∞} Σ_{π ∈ P_2(m)} N^{#(γπ) − m/2 − 1} = Σ_{π ∈ NC_2(m)} 1 = #NC_2(m).

Since the moments of the semicircle are given by the Catalan numbers, it remains to see that #NC_2(2k) is equal to the Catalan number C_k. To see this, we now count d_k := #NC_2(2k) according to the recursive structure of NC-pairings as in 2.19 (1). Namely, we can identify π ∈ NC_2(2k) with {(1, 2l)} ∪ π_0 ∪ π_1, where l ∈ {1, ..., k}, π_0 ∈ NC_2(2(l − 1)) and π_1 ∈ NC_2(2(k − l)). Hence we have

  d_k = Σ_{l=1}^k d_{l−1} d_{k−l},   where d_0 = 1.

This is the recursion for the Catalan numbers, whence d_k = C_k for all k ∈ N.
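Both the recursion for d_k and the identification #NC_2(2k) = C_k are easy to confirm by brute force; a sketch (helper names are our own):

```python
def pairings(elems):
    """All pair partitions of a list of points (each pair (a, b) has a < b)."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for j in range(len(rest)):
        for p in pairings(rest[:j] + rest[j + 1:]):
            yield [(first, rest[j])] + p

def has_crossing(p):
    """A pairing has a crossing iff it contains pairs (a, b), (c, d) with a < c < b < d."""
    return any(a < c < b < d for (a, b) in p for (c, d) in p)

def count_nc(k):
    """#NC_2(2k) by brute force."""
    return sum(1 for p in pairings(list(range(2 * k))) if not has_crossing(p))

# d_k from the recursion d_k = sum_{l=1}^{k} d_{l-1} d_{k-l}, d_0 = 1
d = [1]
for k in range(1, 6):
    d.append(sum(d[l - 1] * d[k - l] for l in range(1, k + 1)))

assert d == [1, 1, 2, 5, 14, 42]                  # the Catalan numbers
assert [count_nc(k) for k in range(1, 6)] == d[1:]
```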
Remark 2.22. (1) One can refine

  #(γπ) − m/2 − 1 ≤ 0   to   #(γπ) − m/2 − 1 = −2g(π)

for g(π) ∈ N_0. This g has the meaning that it is the minimal genus of a surface on which π can be drawn without crossings. NC-pairings are also called planar; they correspond to g = 0. Theorem 2.15 is usually addressed as the genus expansion

  E[tr(A_N^m)] = Σ_{π ∈ P_2(m)} N^{−2g(π)}.
(2) For example, (1, 2)(3, 4) ∈ NC_2(4) has g = 0, but the crossing pairing (1, 3)(2, 4) ∈ P_2(4) has genus g = 1: it has a crossing in the plane, but this can be avoided on a torus.

[Figure: the pairing (1, 3)(2, 4) drawn in the plane and on a torus]
(3) If we denote

  ε_g(k) = #{π ∈ P_2(2k) | π has genus g},

then the genus expansion 2.15 can be written as

  E[tr(A_N^{2k})] = Σ_{g ≥ 0} ε_g(k) N^{−2g}.

We know that

  ε_0(k) = C_k = (1/(k + 1)) · binom(2k, k),

but what about the ε_g(k) for g > 0? There does not exist an explicit formula for them, but Harer and Zagier have shown in 1986 that

  ε_g(k) = ((2k)! / ((k + 1)! (k − 2g)!)) · λ_g(k),
We will come back later to this statement of Harer and Zagier; see Theorem 9.2.
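For small k the numbers ε_g(k) can be computed directly by classifying all pairings by their genus g(π) = (k + 1 − #(γπ))/2; a brute-force sketch (helper names are our own):

```python
def pairings(elems):
    """All pair partitions of a list of points."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for j in range(len(rest)):
        for p in pairings(rest[:j] + rest[j + 1:]):
            yield [(first, rest[j])] + p

def cycle_count(perm):
    """Number of cycles of a permutation given as a dict on {0, ..., m-1}."""
    seen, count = set(), 0
    for start in perm:
        if start not in seen:
            count += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = perm[x]
    return count

def genus_counts(k):
    """Return {g: epsilon_g(k)} by brute force over all pairings of 2k points."""
    m = 2 * k
    gamma = {i: (i + 1) % m for i in range(m)}   # the long cycle (1, 2, ..., m)
    eps = {}
    for p in pairings(list(range(m))):
        pi = {}
        for a, b in p:
            pi[a], pi[b] = b, a
        c = cycle_count({i: gamma[pi[i]] for i in range(m)})
        g = (k + 1 - c) // 2                     # from #(gamma pi) - k - 1 = -2g
        eps[g] = eps.get(g, 0) + 1
    return eps
```

For k = 3 this reproduces ε_0 = 5, ε_1 = 10, and for k = 4 one gets ε_0 = 14, ε_1 = 70, ε_2 = 21, matching the moments in Example 2.16.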
3 Wigner Matrices:
Combinatorial Proof of
Wigner’s Semicircle Law
Wigner’s semicircle law does not only hold for Gaussian random matrices, but more generally for so-called Wigner matrices; there we keep the independence and identical distribution of the entries, but allow arbitrary distributions instead of the Gaussian one. As there is no Wick formula any more, there is no clear advantage of the complex over the real case any more; hence we will consider in the following the real one.
Remark 3.2. (1) In our combinatorial setting we will assume that all moments of
µ exist; that the first moment is 0; and the second moment will be normalized
to 1. In an analytic setting one can deal with more general situations: usually
only the existence of the second moment is needed; and one can also allow
non-vanishing mean.
(2) Often one also allows different distributions for the diagonal and the off-
diagonal entries.
(3) Even more generally, one can give up the assumption of identical distribution of all entries and replace it by uniform bounds on their moments.

(4) We will now try to imitate our combinatorial proof from the Gaussian case in this more general situation. Without a precise Wick formula for the higher moments of the entries, we will not aim at a precise genus expansion; it suffices to see that the leading contributions are still given by the Catalan numbers.
  ∫_R x dµ(x) = 0,   ∫_R x² dµ(x) = 1.
Then

  E[tr(A_N^m)] = (1/N^{1+m/2}) Σ_{i_1,...,i_m=1}^N E[a_{i_1 i_2} a_{i_2 i_3} ⋯ a_{i_m i_1}]
               = (1/N^{1+m/2}) Σ_{σ ∈ P(m)} Σ_{i: [m]→[N], ker i = σ} E[σ],

where we group the appearing indices (i_1, ..., i_m) according to their “kernel”, which is a partition σ of {1, ..., m}.
The V_i are called the blocks of σ. The set of all partitions of [n] is denoted by P(n).
(2) For a multi-index i = (i_1, ..., i_m) we denote by ker i its kernel; this is the partition σ ∈ P(m) such that we have i_k = i_l if and only if k and l are in the same block of σ. If we identify i with a function i: [m] → [N] via i(k) = i_k, then we can also write

  ker i = {i⁻¹(1), i⁻¹(2), ..., i⁻¹(N)},
Example 3.4. For i = (1, 2, 1, 3, 2, 4, 2) we have

  k:    1  2  3  4  5  6  7
  i_k:  1  2  1  3  2  4  2

such that

  ker i = {(1, 3), (2, 5, 7), (4), (6)} ∈ P(7).
Remark 3.5. The relevance of this kernel in our setting is the following: for i = (i_1, ..., i_m) and j = (j_1, ..., j_m) with ker i = ker j we have

  E[a_{i_1 i_2} ⋯ a_{i_m i_1}] = E[a_{j_1 j_2} ⋯ a_{j_m j_1}].

For example, for i = (1, 1, 2, 1, 1, 2) and j = (2, 2, 7, 2, 2, 7) we have

  ker i = {(1, 2, 4, 5), (3, 6)} = ker j

and

  E[a_11 a_12 a_21 a_11 a_12 a_21] = E[a_11²] E[a_12⁴] = E[a_22²] E[a_27⁴] = E[a_22 a_27 a_72 a_22 a_27 a_72].
Thus we get:

  E[tr(A_N^m)] = (1/N^{1+m/2}) Σ_{σ ∈ P(m)} E[σ] · #{i: [m] → [N] | ker i = σ}.
Example 3.7. (1) For σ = {(1, 3), (2, 5), (4)} we have the graph G_σ with the three vertices 1 = 3, 2 = 5 and 4.

(2) For σ = {(1, 5), (2, 4), (3)} we have the graph G_σ with the three vertices 1 = 5, 2 = 4 and 3.

(3) For σ = {(1, 3), (2), (4)} we have the graph G_σ with the three vertices 1 = 3, 2 and 4, and with the edges {1=3, 2} and {1=3, 4}.
The term E[a_{i_1 i_2} a_{i_2 i_3} ⋯ a_{i_m i_1}] corresponds now to a walk in G_σ, with σ = ker i, along the edges with steps

  i_1 → i_2 → i_3 → ⋯ → i_m → i_1.

Hence we are using each edge in G_σ at least once. Note that different edges in G_σ correspond to independent random variables. Hence, if we use an edge only once in our walk, then E[σ] = 0, because the expectation factorizes into a product with one factor being the first moment of a_ij, which is assumed to be zero. Thus, every edge must be used at least twice, but this implies

  #edges in G_σ ≤ (#steps in the walk)/2 = m/2.
Since the number of i with the same kernel is

  #{i: [m] → [N] | ker i = σ} = N(N − 1) ⋯ (N − #V(G_σ) + 1) ≈ N^{#V(G_σ)},

the order of the contribution of σ is governed by the number of vertices of G_σ. For a connected graph G we always have

  #V ≤ #E + 1,

and we have equality if and only if G is a tree, i.e., a connected graph without cycles.
In order to have E[σ] ≠ 0, we can restrict to σ with #E(G_σ) ≤ m/2, which by Proposition 3.8 implies that

  #V(G_σ) ≤ #E(G_σ) + 1 ≤ m/2 + 1.

Hence all terms converge and the only contribution in the limit N → ∞ comes from those σ where we have equality, i.e.,

  #V(G_σ) = #E(G_σ) + 1 = m/2 + 1.

Thus, G_σ must be a tree and in our walk we use each edge exactly twice (necessarily in opposite directions). For such a σ we have E[σ] = 1; thus the limit of E[tr(A_N^m)] counts the number of such walks. We will check in Exercise 9 that the latter number is also counted by the Catalan numbers.
Remark 3.10. Note that our G_σ are not just abstract trees; they come with the walks, which encode

• a starting point, i.e., the G_σ are rooted trees;
• a cyclic order of the outgoing edges at each vertex, which gives a planar drawing of our graph.

Hence what we have to count are rooted planar trees. Note also that a rooted planar tree determines uniquely the corresponding walk.
4 Analytic Tools: Stieltjes
Transform and Convergence of
Measures
Let us recall our setting and goal. We have, for each N ∈ N, selfadjoint N × N
random matrices, which are given by a probability measure PN on the entries of the
matrices; the prescription of PN should be kind of uniform in N .
For example, for the gue(N) we have A = (a_ij)_{i,j=1}^N with the complex entries a_ij = x_ij + √−1 y_ij having real part x_ij and imaginary part y_ij. Since a_ij = ā_ji, we have y_ii = 0 for all i and we remain with the N² many “free variables” x_ii (i = 1, ..., N) and x_ij, y_ij (1 ≤ i < j ≤ N). All those are independent and Gaussian distributed, which can be written in the compact form

  dP(A) = c_N exp(−N Tr(A²)/2) dA,

where dA is the product of all differentials of the N² variables and c_N is a normalization constant, to make P_N a probability measure.

We want now statements about our matrices with respect to this measure P_N, either in average or in probability. Let us be a bit more specific on this.

Denote by Ω_N the space of our selfadjoint N × N matrices, i.e.,

  Ω_N := {A = (x_ij + √−1 y_ij)_{i,j=1}^N | x_ii ∈ R (i = 1, ..., N), x_ij, y_ij ∈ R (i < j)} ≅ R^{N²},
• in average, i.e.,

  µ_N := ∫_{Ω_N} µ_A dP_N(A) = E[µ_A]  →  µ_W   for N → ∞
(2) We have

  Im S_µ(x + iε) = Im ∫_R 1/(t − x − iε) dµ(t) = ∫_R ε/((t − x)² + ε²) dµ(t)

and thus

  ∫_a^b Im S_µ(x + iε) dx = ∫_R ∫_a^b ε/((t − x)² + ε²) dx dµ(t).

For the inner integral we have

  ∫_a^b ε/((t − x)² + ε²) dx = ∫_{(a−t)/ε}^{(b−t)/ε} 1/(x² + 1) dx = tan⁻¹((b − t)/ε) − tan⁻¹((a − t)/ε),

which converges, for ε ↘ 0, to 0 if t ∉ [a, b], to π/2 if t ∈ {a, b}, and to π if t ∈ (a, b). From this the assertion follows.
(3) Now assume that S_µ = S_ν. By the Stieltjes inversion formula it follows then that µ((a, b)) = ν((a, b)) for all open intervals such that neither a nor b is an atom of µ or of ν. Since there can only be countably many atoms, we can write any interval as

  (a, b) = ⋃_{n=1}^∞ (a + ε_n, b − ε_n),

where the sequence ε_n ↘ 0 is chosen such that none of the a + ε_n, b − ε_n is an atom of µ or of ν. By monotone convergence for measures we then get

  µ((a, b)) = lim_{n→∞} µ((a + ε_n, b − ε_n)) = lim_{n→∞} ν((a + ε_n, b − ε_n)) = ν((a, b)).
Proposition 4.4. Let µ be a compactly supported probability measure, say µ([−r, r]) = 1 for some r > 0. Then S_µ has a power series expansion (about ∞) as follows:

  S_µ(z) = − Σ_{n=0}^∞ m_n / z^{n+1}   for |z| > r,
Proposition 4.5. The Stieltjes transform S(z) of the semicircle distribution, dµ_W(t) = (1/2π)√(4 − t²) dt, is, for z ∈ C⁺, uniquely determined by the following:

• S(z) ∈ C⁺;
• S(z) is a solution of the equation S(z)² + zS(z) + 1 = 0.

Explicitly, this means

  S(z) = (−z + √(z² − 4))/2   (z ∈ C⁺).
Proof. By Proposition 4.4, we know that for large |z|:

  S(z) = − Σ_{k=0}^∞ C_k / z^{2k+1},

where the C_k are the Catalan numbers. By using the recursion for the Catalan numbers (see Theorem 1.6), this implies that for large |z| we have S(z)² + zS(z) + 1 = 0. Since we know that S is analytic on C⁺, this equation is, by analytic extension, then valid for all z ∈ C⁺.

This equation has two solutions, (−z ± √(z² − 4))/2, and only the one with the +-sign is in C⁺.
Remark 4.6. Proposition 4.5 gave us the Stieltjes transform of µ_W just from the knowledge of the moments. From S(z) = (−z + √(z² − 4))/2 we can then get the density of µ_W via the Stieltjes inversion formula:

  (1/π) Im S(x + iε) = (1/2π) Im √((x + iε)² − 4)  →(ε ↘ 0)  (1/2π) Im √(x² − 4) = 0 for |x| > 2,   and   = √(4 − x²)/(2π) for |x| ≤ 2.

Thus this analytic machinery gives an effective way to calculate a distribution from its moments (without having to know the density in advance).
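This recovery can be illustrated numerically: evaluating S slightly above the real axis and taking (1/π) Im already gives the semicircle density to high accuracy (a sketch; the branch bookkeeping below is our own):

```python
import numpy as np

def S(z):
    """Stieltjes transform of the semicircle: the root of S^2 + z S + 1 = 0 in C+."""
    root = np.sqrt(z * z - 4 + 0j)
    s = (-z + root) / 2
    # pick, pointwise, the solution lying in the upper half plane
    return np.where(s.imag > 0, s, (-z - root) / 2)

eps = 1e-6
x = np.linspace(-1.9, 1.9, 101)
recovered = S(x + 1j * eps).imag / np.pi
density = np.sqrt(4.0 - x ** 2) / (2.0 * np.pi)
# recovered and density agree up to an error of order eps
```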
The first possibility is problematic in this generality, as it treats atoms too restrictively.

Example: Take µ_N = δ_{1−1/N} and µ = δ_1. Then we surely want that µ_N → µ, but for B = [1, 2] we have µ_N([1, 2]) = 0 for all N, whereas µ([1, 2]) = 1.

Thus the second possibility above is the better definition. But we have to be careful about which class of continuous functions we allow; we need bounded ones, otherwise ∫ f dµ might not exist in general; and, for compactness reasons, it is sometimes better to ignore the behaviour of the measures at infinity.
Definition 4.7. (1) We use the notations

(i) C_0(R) := {f ∈ C(R) | lim_{|t|→∞} f(t) = 0} for the continuous functions on R vanishing at infinity;

(ii) C_b(R) := {f ∈ C(R) | ∃M > 0: |f(t)| ≤ M for all t ∈ R} for the continuous bounded functions on R.

(2) Let µ and µ_N (N ∈ N) be finite measures. Then we say that

(i) µ_N converges vaguely to µ, denoted by µ_N →_v µ, if

  ∫ f(t) dµ_N(t) → ∫ f(t) dµ(t)   for all f ∈ C_0(R);

(ii) µ_N converges weakly to µ, denoted by µ_N →_w µ, if

  ∫ f(t) dµ_N(t) → ∫ f(t) dµ(t)   for all f ∈ C_b(R).
Remark 4.8. (1) Note that weak convergence includes in particular that

  µ_N(R) = ∫ 1 dµ_N(t) → ∫ 1 dµ(t) = µ(R),

and thus the weak limit of probability measures must again be a probability measure. For vague convergence this is not true; there we can lose mass at infinity.

Example: Consider µ_N = ½δ_1 + ½δ_N and µ = ½δ_1; then

  ∫ f(t) dµ_N(t) = ½ f(1) + ½ f(N) → ½ f(1) = ∫ f(t) dµ(t)

for all f ∈ C_0(R). Thus the sequence of probability measures ½δ_1 + ½δ_N converges, for N → ∞, to the finite measure ½δ_1 with total mass 1/2.

(2) The relevance of vague convergence, even if we are only interested in probability measures, is that the probability measures are precompact in the vague topology, but not in the weak topology. E.g., in the above example, µ_N = ½δ_1 + ½δ_N has no subsequence which converges weakly (but it has a subsequence, namely the sequence itself, which converges vaguely).
Proof. (1) From a functional analytic perspective this is a special case of the Banach-Alaoglu theorem: the complex measures on R form the dual space of the Banach space C_0(R), and its weak* topology is exactly the vague topology.

(2) From a measure theory perspective this is known as Helly’s (Selection) Theorem. Here are the main ideas for the proof in this setting.
(i) We describe a finite measure µ by its distribution function F_µ, given by F_µ(t) := µ((−∞, t]). Such a distribution function F satisfies:

• F(−∞) := lim_{t→−∞} F(t) = 0 and F(+∞) := lim_{t→∞} F(t) < ∞;
• F is non-decreasing;
• F is continuous on the right.

(ii) The vague convergence µ_N →_v µ can also be described in terms of the distribution functions F_N, F; µ_N →_v µ is equivalent to:

  F_N(t) → F(t) for all t ∈ R at which F is continuous.
(iii) Let now a sequence (µ_N)_N of probability measures be given. We consider the corresponding distribution functions (F_N)_N and want to find a convergent subsequence (in the sense of (ii)) for those.

For this choose a countable dense subset T = {t_1, t_2, ...} of R. Then, by choosing subsequences of subsequences and taking the “diagonal” subsequence, we get convergence for all t ∈ T. More precisely: choose a subsequence (F_{N_1(m)})_m such that

  F_{N_1(m)}(t_1) → F_T(t_1)   for m → ∞,

choose then a subsequence (F_{N_2(m)})_m of this such that

  F_{N_2(m)}(t_1) → F_T(t_1),   F_{N_2(m)}(t_2) → F_T(t_2)   for m → ∞;

iterating this gives subsequences (F_{N_k(m)})_m such that

  F_{N_k(m)}(t_i) → F_T(t_i)   for all i = 1, ..., k.

The diagonal subsequence (F_{N_m(m)})_m converges then at all t ∈ T to F_T(t).
We improve now F_T to the wanted F by

  F(t) := inf{F_T(s) | s ∈ T, s > t}

and show that

• F is a distribution function;
• F_{N_m(m)}(t) → F(t), for m → ∞, at all continuity points of F.

According to (ii) this gives then the vague convergence µ_{N_m(m)} → µ, where µ is the finite measure corresponding to the distribution function F. Note that F_N(+∞) = 1 for all N ∈ N gives F(+∞) ≤ 1, but we cannot guarantee F(+∞) = 1 in general.
Remark 4.10. If we want compactness in the weak topology, then we must control
the mass at ∞ in a uniform way. This is given by the notion of tightness. A
sequence (µN )N of probability measures is tight if: for all ε > 0 there exists an
interval I = [−R, R] such that µN (I c ) < ε for all N .
Then one has: Any tight sequence of probability measures has a subsequence
which converges weakly; the limit is then necessarily a probability measure.
4.3 Probability measures determined by moments
We can now also relate weak convergence to convergence of moments; which shows
that our combinatorial approach (using moments) and analytic approach (using
Stieltjes transforms) for proving the semicircle law are essentially equivalent. We
want to make this more precise in the following.
Definition 4.11. A probability measure µ on R is determined by its moments if

(i) all moments ∫ t^k dµ(t) (k ∈ N) exist, and

(ii) µ is the only probability measure with those moments: if ν is a probability measure and ∫ t^k dν(t) = ∫ t^k dµ(t) for all k ∈ N, then ν = µ.
Theorem 4.12. Let µ and µ_N (N ∈ N) be probability measures for which all moments exist. Assume that µ is determined by its moments. Assume furthermore that we have convergence of moments, i.e.,

  lim_{N→∞} ∫ t^k dµ_N(t) = ∫ t^k dµ(t)   for all k ∈ N.

Then we have weak convergence: µ_N →_w µ.
Rough idea of proof. One has to note that convergence of moments implies tightness,
which implies the existence of a weakly convergent subsequence, µNm → ν. Further-
more, the assumption that the moments converge implies that they are uniformly
integrable, which implies then that the moments of this subsequence converge to the
moments of ν. (These are kind of standard measure theoretic arguments, though a
bit involved; for details see the book of Billingsley, in particular, his Theorem 25.12
and its Corollary.) However, by assumption the moments of the subsequence converge, like the moments of the whole sequence, to the moments of µ; this means that µ and ν have the same moments and hence, by our assumption that µ is determined by its moments, we have that ν = µ.
In the same way all weakly convergent subsequences of (µ_N)_N must converge to the same µ, and thus the whole sequence must converge weakly to µ.
Remark 4.13. (0) Note that in the first version of these notes (and also in the
recorded lectures) it was claimed that, under the assumption that the limit
is determined by its moments, convergence in moments is equivalent to weak
convergence. This is clearly not true as the following simple example shows.
Consider
µ_N = (1 − 1/N) δ_0 + (1/N) δ_N and µ = δ_0.
Then it is clear that µ_N → µ weakly, and µ is also determined by its moments. But
there is no convergence of moments. For example, the first moment converges,
but to the wrong limit
∫ t dµ_N(t) = (1/N) · N = 1 → 1 ≠ 0 = ∫ t dµ(t),
and the other moments explode
∫ t^k dµ_N(t) = (1/N) · N^k = N^{k−1} → ∞ for k ≥ 2.
In order to have convergence of moments one needs a uniform integrability
assumption; see Billingsley, in particular, his Theorem 25.12 and its Corollary.
(1) Note that there exist measures for which all moments exist but which, however, are not determined by their moments. Weak convergence to them cannot be checked by just looking at convergence of moments.
Example: The log-normal distribution with density
dµ(t) = (1/√(2π)) (1/t) e^{−(log t)²/2} dt on (0, ∞)
(which is the distribution of e^X for X Gaussian) is not determined by its moments.
(2) Compactly supported measures (like the semicircle) or also the Gaussian dis-
tribution are determined by their moments.
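The counterexample in (0) is easy to check numerically. The following small Python sketch (an added illustration; the helper names `moment` and `cdf` are ours) evaluates the distribution function and the moments of µ_N:

```python
# Illustration of Remark 4.13(0): mu_N = (1 - 1/N) delta_0 + (1/N) delta_N
# converges weakly to mu = delta_0, but its moments do not converge.

def moment(N, k):
    """k-th moment of mu_N; equals N^(k-1) for k >= 1."""
    return (1 - 1/N) * 0**k + (1/N) * N**k

def cdf(N, t):
    """Distribution function F_N(t) of mu_N."""
    return (1 - 1/N) * (t >= 0) + (1/N) * (t >= N)

for N in (10, 100, 1000):
    # weak convergence: F_N(t) -> 1_{t >= 0} at every continuity point t != 0
    print(N, cdf(N, -1.0), cdf(N, 1.0))
    # moment explosion: k-th moment = N^(k-1)
    print('   moments k=1,2,3:', [moment(N, k) for k in (1, 2, 3)])
```

The first moment stays at 1 (the wrong limit), while the k-th moment blows up like N^{k−1}.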
Since lim_{|t|→∞} f_z(t) = 0, we have f_z ∈ C_0(R) ⊂ C_b(R) and thus, by definition of weak convergence:
S_{µ_N}(z) = ∫ f_z(t) dµ_N(t) → ∫ f_z(t) dµ(t) = S_µ(z).
Thus the analytic functions S_µ and S_ν agree on D and hence, by the identity theorem for analytic functions, also on C+, i.e., S_µ = S_ν. But this implies, by Theorem 4.2, that ν = µ.
Thus the subsequence (µN (m) )m converges vaguely to the probability measure
µ (and thus also weakly, see Exercise 12). In the same way, any weak cluster
point of (µN )N must be equal to µ, and thus the whole sequence must converge
weakly to µ.
Remark 4.15. If we only assume that S_{µ_N}(z) converges to a limit function S(z), then S must be the Stieltjes transform of a measure ν with ν(R) ≤ 1, and we have the vague convergence µ_N → ν.
5 Analytic Proof of Wigner’s
Semicircle Law for Gaussian
Random Matrices
Now we are ready to give an analytic proof of Wigner’s semicircle law, relying on the
analytic tools we developed in the last chapter. As for the combinatorial approach,
the Gaussian case is easier than the general Wigner case and thus we will restrict to it. The difference between the real and the complex case is not really relevant here; instead of the gue we will treat the goe case.
Definition 5.1. The Gaussian orthogonal ensemble (goe) is given by
Ω_N = {A_N = (x_ij)_{i,j=1}^N | x_ij ∈ R, x_ij = x_ji for all i, j}
and
dP_N(A_N) = c_N exp(−(N/4) Tr(A_N²)) ∏_{i≤j} dx_ij.
Remark 5.2. (1) Note that with this choice of P_N, which is invariant under orthogonal rotations, we actually have different variances on and off the diagonal:
E[x_ij²] = 1/N (i ≠ j) and E[x_ii²] = 2/N.
(2) We consider now, for each N ∈ N, the averaged eigenvalue distribution
µ_N := E[µ_{A_N}] = ∫_{Ω_N} µ_A dP_N(A).
We want to prove that µ_N → µ_W weakly. According to Theorem 4.14 we can prove this by showing lim_{N→∞} S_{µ_N}(z) = S_{µ_W}(z) for all z ∈ C+.
(3) Note that
S_{µ_N}(z) = ∫_R 1/(t−z) dµ_N(t) = E[ ∫_R 1/(t−z) dµ_{A_N}(t) ] = E[ tr[(A_N − z1)^{−1}] ],
since, by Assignment ...., S_{µ_{A_N}}(z) = tr[(A_N − z1)^{−1}]. So what we have to show is, for all z ∈ C+:
lim_{N→∞} E[ tr[(A_N − z1)^{−1}] ] = S_{µ_W}(z).
For this, we want to see that SµN (z) satisfies approximately the quadratic
equation for SµW (z), from 4.5.
(4) Let us use for the resolvents of our matrices A the notation
R_A(z) = (A − z1)^{−1}, so that S_{µ_N}(z) = E[tr(R_{A_N}(z))].
In the following we will usually suppress the index N at our matrices; thus
write just A instead of AN , as long as the N is fixed and clear.
We have then (A − z1) R_A(z) = 1, or A · R_A(z) − z R_A(z) = 1, thus
R_A(z) = −(1/z) 1 + (1/z) A R_A(z).
Taking the normalized trace and expectation of this yields
E[tr(R_A(z))] = −1/z + (1/z) E[tr(A R_A(z))].
The left-hand side is our Stieltjes transform, but what about the right-hand side; can we relate this also to the Stieltjes transform? Note that the function under the expectation is (1/N) Σ_{k,l} x_{kl} [R_A(z)]_{lk}; thus a sum of terms, each being the product of one of our Gaussian variables with a function of all the independent Gaussian variables. There exists actually a very nice and important formula for dealing with such expectations of independent Gaussian variables. In a sense, this is the analytic version of the combinatorial Wick formula.
5.2 Stein’s identity for independent Gaussian
variables
Proposition 5.3 (Stein's identity). Let X_1, …, X_k be independent random variables with Gaussian distribution, with mean zero and variances E[X_i²] = σ_i². Let h: R^k → C be continuously differentiable such that h and all its partial derivatives are of polynomial growth. Then we have for i = 1, …, k:
E[X_i h(X_1, …, X_k)] = σ_i² E[ ∂h/∂x_i (X_1, …, X_k) ].
More explicitly,
∫_{R^k} x_i h(x_1, …, x_k) exp( −x_1²/(2σ_1²) − ⋯ − x_k²/(2σ_k²) ) dx_1 ⋯ dx_k
= σ_i² ∫_{R^k} ∂h/∂x_i (x_1, …, x_k) exp( −x_1²/(2σ_1²) − ⋯ − x_k²/(2σ_k²) ) dx_1 ⋯ dx_k.
Proof. The main argument happens for k = 1. Since x e^{−x²/(2σ²)} = [−σ² e^{−x²/(2σ²)}]′, we get by partial integration
∫_R x h(x) e^{−x²/(2σ²)} dx = ∫_R h(x) [−σ² e^{−x²/(2σ²)}]′ dx = ∫_R h′(x) σ² e^{−x²/(2σ²)} dx;
our assumptions on h are just such that the boundary terms vanish.
For general k, we just do partial integration for the i-th coordinate.
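Stein's identity is easy to test by simulation. Here is a minimal Monte Carlo sketch (our own illustration, using numpy, with the arbitrary polynomially bounded test function h(x) = x³) for the one-dimensional case:

```python
# Monte Carlo check of E[X h(X)] = sigma^2 E[h'(X)] for X ~ N(0, sigma^2),
# with the (arbitrary) test function h(x) = x^3, so h'(x) = 3 x^2.
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.5
X = rng.normal(0.0, sigma, size=1_000_000)

lhs = np.mean(X * X**3)              # E[X h(X)]
rhs = sigma**2 * np.mean(3 * X**2)   # sigma^2 E[h'(X)]
print(lhs, rhs)                      # both approximate E[X^4] = 3 sigma^4
```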
We want to apply this now to our Gaussian random matrices, with Gaussian random variables x_ij (1 ≤ i ≤ j ≤ N) of variance
σ_ij² = 1/N if i ≠ j, and σ_ij² = 2/N if i = j.
Lemma 5.4. For A = (x_ij)_{i,j=1}^N with x_ij = x_ji for all i, j, we have for all i, j, k, l:
∂/∂x_ij [R_A(z)]_{lk} = −[R_A(z)]_{li} · [R_A(z)]_{ik}  if i = j,
and
∂/∂x_ij [R_A(z)]_{lk} = −[R_A(z)]_{li} · [R_A(z)]_{jk} − [R_A(z)]_{lj} · [R_A(z)]_{ik}  if i ≠ j.
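The lemma can be verified by a finite-difference computation; the following sketch (our illustration, using numpy) checks the off-diagonal case on a random symmetric matrix. Note that the variable x_ij = x_ji sits in two entries of A, so the perturbation is symmetric:

```python
# Finite-difference check of the resolvent derivative formula (case i != j).
import numpy as np

rng = np.random.default_rng(1)
N, z, eps = 6, 0.3 + 1.0j, 1e-6
A = rng.normal(size=(N, N))
A = (A + A.T) / 2                                    # real symmetric matrix

def R(B):
    """Resolvent R_B(z) = (B - z 1)^{-1}."""
    return np.linalg.inv(B - z * np.eye(N))

i, j, l, k = 0, 2, 1, 3
E = np.zeros((N, N)); E[i, j] = E[j, i] = 1.0        # x_ij = x_ji is one variable
num = (R(A + eps * E)[l, k] - R(A - eps * E)[l, k]) / (2 * eps)
Rz = R(A)
formula = -Rz[l, i] * Rz[j, k] - Rz[l, j] * Rz[i, k]  # Lemma 5.4, i != j
print(abs(num - formula))                             # tiny finite-difference error
```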
Now we calculate, with A = (x_ij)_{i,j=1}^N,
E[tr[A R_A(z)]] = (1/N) Σ_{k,l=1}^N E[ x_{kl} · [R_A(z)]_{lk} ]
= (1/N) Σ_{k,l=1}^N σ_{kl}² · E[ ∂/∂x_{kl} [R_A(z)]_{lk} ]
= −(1/N) Σ_{k,l=1}^N (1/N) E[ [R_A(z)]_{lk} · [R_A(z)]_{lk} + [R_A(z)]_{ll} · [R_A(z)]_{kk} ].
(Note that the combination of the different covariances and of the different form of
the formula in Lemma 5.4 for on-diagonal and for off-diagonal entries gives in the
end the same result for all pairs of (k, l).)
Now note that (A − z1) is symmetric, hence the same is true for its inverse R_A(z) = (A − z1)^{−1}, and thus [R_A(z)]_{lk} = [R_A(z)]_{kl}. Thus we get finally
E[tr[A R_A(z)]] = −(1/N) E[tr[R_A(z)²]] − E[tr[R_A(z)] · tr[R_A(z)]].
To proceed further we need to deal with the two summands on the right-hand side; we expect that
• the first term, (1/N) E[tr[R_A(z)²]], should go to zero for N → ∞;
• the second term, E[tr[R_A(z)] · tr[R_A(z)]], should be close to its factorized version E[tr[R_A(z)]] · E[tr[R_A(z)]] = S_{µ_N}(z)².
Both these ideas are correct; let us try to make them rigorous.
• A, as a symmetric matrix, can be diagonalized by an orthogonal matrix U:
A = U diag(λ_1, …, λ_N) U^*, and thus R_A(z)² = U diag( 1/(λ_1 − z)², …, 1/(λ_N − z)² ) U^*,
which yields
|tr[R_A(z)²]| ≤ (1/N) Σ_{i=1}^N |1/(λ_i − z)²|.
Note that for all λ ∈ R and all z ∈ C+
|1/(λ − z)| ≤ 1/Im z,
and hence
(1/N) |E[tr(R_A(z)²)]| ≤ (1/N) E[|tr[R_A(z)²]|] ≤ (1/N) · 1/(Im z)² → 0 for N → ∞.
• By definition of the variance we have
Var[X] := E[(X − E[X])²] = E[X²] − E[X]²,
and thus
E[X²] = E[X]² + Var[X].
Hence we can replace E[tr[R_A(z)] · tr[R_A(z)]] by
E[tr[R_A(z)]]² + Var[tr[R_A(z)]] = S_{µ_N}(z)² + Var[S_{µ_A}(z)].
In the next chapter we will show that we have concentration, i.e., the variance
Var [SµA (z)] goes to zero for N → ∞.
With those two ingredients we then have
S_{µ_N}(z) = −1/z + (1/z) E[tr[A R_A(z)]] = −1/z − (1/z) S_{µ_N}(z)² + ε_N,
where ε_N → 0 for N → ∞.
Note that, as above, for any Stieltjes transform S_ν we have
|S_ν(z)| = |∫ 1/(t−z) dν(t)| ≤ ∫ |1/(t−z)| dν(t) ≤ 1/Im z,
and thus (SµN (z))N is a bounded sequence of complex numbers. Hence, by compact-
ness, there exists a convergent subsequence (SµN (m) (z))m , which converges to some
S(z). This S(z) must then satisfy the limit N → ∞ of the above equation, thus
S(z) = −1/z − (1/z) S(z)².
Since all S_{µ_N}(z) are in C+, the limit S(z) must be in C+, which leaves for S(z) only the possibility that
S(z) = (−z + √(z² − 4)) / 2 = S_{µ_W}(z)
(as the other solution is in C−).
In the same way, it follows that any subsequence of (S_{µ_N}(z))_N has a convergent subsequence which converges to S_{µ_W}(z); this forces all cluster points of (S_{µ_N}(z))_N to be S_{µ_W}(z). Thus the whole sequence converges to S_{µ_W}(z). This holds for any z ∈ C+, and thus implies that µ_N → µ_W weakly.
To complete the proof we still have to see the concentration to get the asymptotic
vanishing of the variance. We will address such concentration questions in the next
chapter.
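Modulo the concentration step, the result can already be observed numerically. The following sketch (an added illustration, using numpy) samples goe matrices with the variances of Definition 5.1 and compares the averaged trace of the resolvent at z = i with S_{µ_W}(i):

```python
# Averaged Stieltjes transform of a GOE vs. the semicircle value at z = i.
import numpy as np

rng = np.random.default_rng(2)
N, z, samples = 400, 1.0j, 10
vals = []
for _ in range(samples):
    A = rng.normal(scale=1/np.sqrt(N), size=(N, N))
    A = (A + A.T) / np.sqrt(2)            # symmetric; variance 1/N off, 2/N on the diagonal
    vals.append(np.trace(np.linalg.inv(A - z * np.eye(N))) / N)
S_emp = np.mean(vals)
S_W = (-z + np.sqrt(z**2 - 4)) / 2        # the branch lying in C^+ (valid at z = i)
print(S_emp, S_W)                          # both close to 0.618...j
```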
6 Concentration Phenomena and
Stronger Forms of Convergence
for the Semicircle Law
6.1 Forms of convergence
Remark 6.1. (1) Recall that our random matrix ensemble is given by probability
measures PN on sets ΩN of N × N matrices and we want to see that µAN con-
verges weakly to µW , or, equivalently, that, for all z ∈ C+ , SµAN (z) converges
to SµW (z). There are different levels of this convergence with respect to PN :
(i) convergence in average, i.e.,
E[S_{µ_{A_N}}(z)] → S_{µ_W}(z) for N → ∞;
instead of making this more precise, let us just point out that this almost sure convergence is guaranteed, by the Borel–Cantelli Lemma, if the convergence in (ii) to zero is sufficiently fast in N, so that for all ε > 0
Σ_{N=1}^∞ P_N{A_N | |S_{µ_{A_N}}(z) − S_{µ_W}(z)| ≥ ε} < ∞.
to a dictum of M. Talagrand: “A random variable that depends (in a smooth
way) on the influence of many independent variables (but not too much on
any of them) is essentially constant.”
(3) Note also that many classical results in probability theory (like the law of large numbers) can be seen as instances of this, dealing with linear functions. However, this principle also applies to non-linear functions, like, in our case, tr[(A − z1)^{−1}], considered as a function of the entries of A.
(4) Often, control of the variance of the considered function is a good way to get concentration estimates. In the following we develop some of the basics for this.
Var[X] := E[(X − E[X])²] = E[X²] − E[X]² = ∫_Ω (X(ω) − E[X])² dP(ω).
P{ω | X(ω) ≥ t} ≤ E[X]/t.
E [X] could here also be ∞, but then the statement is not very useful. The
Markov inequality only gives useful information if X has finite mean, and then only
for t > E [X].
Proof. Since X(ω) ≥ 0 for all ω ∈ Ω we can estimate as follows:
E[X] = ∫_Ω X(ω) dP(ω)
= ∫_{X(ω)≥t} X(ω) dP(ω) + ∫_{X(ω)<t} X(ω) dP(ω)
≥ ∫_{X(ω)≥t} X(ω) dP(ω)
≥ ∫_{X(ω)≥t} t dP(ω)
= t · P{X(ω) ≥ t}.
Remark 6.5. Our goal will thus be to control the variance of X = f (X1 , . . . , Xn ) for
X1 , . . . , Xn independent random variables. (In our case, the Xi will be the entries
of the GOE matrix A and f will be the function f = tr[(A − z1)−1 ].) A main idea in
this context is to have estimates which go over from separate control of each variable
to control of all variables together; i.e., which are stable under tensorization. There
are two prominent types of such estimates, namely
(i) Poincaré inequality
(ii) LSI=logarithmic Sobolev inequality
We will focus here on (i) and say a few words on (ii) later.
Remark 6.7. Let us write this also “explicitly” in terms of the distribution µ of the
random variable X : Ω → Rn ; recall that µ is the push-forward of the probability
measure P under the map X to a probability measure on Rn . In terms of µ we have
then
E[f(X)] = ∫_{R^n} f(x_1, …, x_n) dµ(x_1, …, x_n),
where Var(i) denotes taking the variance in the i-th variable, keeping all the other
variables fixed, and the expectation is then integrating over all the other variables.
Proof. We denote the distribution of Xi by µi ; this is, for each i, a probability mea-
sure on R. Since X1 , . . . , Xn are independent, the distribution of X = (X1 , . . . , Xn )
is given by the product measure µ1 × · · · × µn on Rn .
Putting Z = f(X_1, …, X_n), we have
E[Z] = ∫_{R^n} f(x_1, …, x_n) dµ_1(x_1) ⋯ dµ_n(x_n)
and
Var[Z] = ∫_{R^n} ( f(x_1, …, x_n) − E[Z] )² dµ_1(x_1) ⋯ dµ_n(x_n).
We will now compute the expectation E by integrating one variable at a time and controlling each step. For this we write
Z − E[Z] = (Z − E_1[Z]) + (E_1[Z] − E_{1,2}[Z]) + (E_{1,2}[Z] − E_{1,2,3}[Z]) + ⋯ + (E_{1,2,…,n−1}[Z] − E[Z]),
where E1,...,k denotes integration over the variables x1 , . . . , xk , leaving a function of
the variables xk+1 , . . . , xn . Thus, with
∆_i := E_{1,…,i−1}[Z] − E_{1,…,i−1,i}[Z]
(which is a function of the variables x_i, x_{i+1}, …, x_n), we have Z − E[Z] = Σ_{i=1}^n ∆_i, and thus
Var[Z] = E[(Z − E[Z])²]
= E[ ( Σ_{i=1}^n ∆_i )² ]
= Σ_{i=1}^n E[∆_i²] + Σ_{i≠j} E[∆_i ∆_j].
Now observe that for all i ≠ j we have E[∆_i ∆_j] = 0. Indeed, consider, for example, n = 2 and i = 1, j = 2:
E[∆_1 ∆_2] = E[(Z − E_1[Z]) · (E_1[Z] − E_{1,2}[Z])]
= ∫∫ ( f(x_1, x_2) − ∫ f(x̃_1, x_2) dµ_1(x̃_1) ) · ( ∫ f(x̃_1, x_2) dµ_1(x̃_1) − ∫∫ f(x̃_1, x̃_2) dµ_1(x̃_1) dµ_2(x̃_2) ) dµ_1(x_1) dµ_2(x_2).
Integration with respect to x1 now affects only the first factor and integrating this
gives zero. The general case i 6= j works in the same way. Thus we get
Var[Z] = Σ_{i=1}^n E[∆_i²].
We denote now with E(i) integration with respect to the variable xi , leaving a func-
tion of the other variables x1 , . . . , xi−1 , xi+1 , . . . , xn , and
Var(i) [Z] := E(i) [(Z − E(i) [Z])2 ].
Then we have
∆i = E1,...,i−1 [Z] − E1,...,i [Z] = E1,...,i−1 [Z − E(i) [Z]],
and thus by Jensen’s inequality (which is here just the fact that variances are non-
negative),
∆2i ≤ E1,...,i−1 [(Z − E(i) [Z])2 ].
This gives finally
Var[Z] = Σ_{i=1}^n E[∆_i²] ≤ Σ_{i=1}^n E[ E_{(i)}[(Z − E_{(i)}[Z])²] ],
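The Efron–Stein inequality can be tested by simulation. In the sketch below (our illustration; f = max of independent uniforms is an arbitrary choice) we estimate E[Var_(i)] using the standard identity E[Var_(i)[Z]] = ½ E[(f(X) − f(X^(i)))²], where X^(i) has its i-th coordinate replaced by an independent copy:

```python
# Monte Carlo illustration of Var[f(X_1,...,X_n)] <= sum_i E[Var_(i)].
import numpy as np

rng = np.random.default_rng(3)
n, samples = 5, 200_000
X = rng.uniform(size=(samples, n))
f = X.max(axis=1)                                   # f = max (illustrative choice)
var_f = float(f.var())

es_bound = 0.0
for i in range(n):
    Xi = X.copy()
    Xi[:, i] = rng.uniform(size=samples)            # resample coordinate i
    es_bound += 0.5 * float(np.mean((f - Xi.max(axis=1))**2))

print(var_f, es_bound)   # Var[max] = 5/252 ~ 0.0198; the Efron-Stein bound is larger
```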
Theorem 6.10 (Gaussian Poincaré Inequality). Let X1 , . . . , Xn be independent
standard Gaussian random variables, E [Xi ] = 0 and E [Xi2 ] = 1. Then X =
(X_1, …, X_n) satisfies a Poincaré inequality with constant 1; i.e., for each continuously differentiable f: R^n → R we have
Var[f(X)] ≤ E[‖∇f(X)‖²].
Proof. As remarked in Remark 6.11, the general case can, by Theorem 6.9, be
reduced to the one-dimensional case and, by shifting our function f by a constant,
we can also assume that f (X) has mean zero. One possible proof is to approximate
X via a central limit theorem by independent Bernoulli variables Yi .
So let Y1 , Y2 , . . . be independent Bernoulli variables, i.e., P [Yi = 1] = 1/2 =
P [Yi = −1] and put
S_n = (Y_1 + ⋯ + Y_n)/√n.
Then, by the central limit theorem, the distribution of Sn converges weakly, for
n → ∞, to a standard Gaussian distribution. So we can approximate f (X) by
g(Y_1, …, Y_n) = f(S_n) = f( (Y_1 + ⋯ + Y_n)/√n ).
By the Efron–Stein inequality 6.8, we have
Var[f(S_n)] = Var[g(Y_1, …, Y_n)] ≤ Σ_{i=1}^n E[ Var_{(i)}[g(Y_1, …, Y_n)] ] = Σ_{i=1}^n E[ Var_{(i)}[f(S_n)] ].
Put
S_n^{[i]} := S_n − (1/√n) Y_i = (1/√n)(Y_1 + ⋯ + Y_{i−1} + Y_{i+1} + ⋯ + Y_n).
Then
E_{(i)}[f(S_n)] = (1/2) [ f(S_n^{[i]} + 1/√n) + f(S_n^{[i]} − 1/√n) ]
and
Var_{(i)}[f(S_n)] = (1/2) [ ( f(S_n^{[i]} + 1/√n) − E_{(i)}[f(S_n)] )² + ( f(S_n^{[i]} − 1/√n) − E_{(i)}[f(S_n)] )² ]
= (1/4) ( f(S_n^{[i]} + 1/√n) − f(S_n^{[i]} − 1/√n) )²,
and thus
Var[f(S_n)] ≤ (1/4) Σ_{i=1}^n E[ ( f(S_n^{[i]} + 1/√n) − f(S_n^{[i]} − 1/√n) )² ].
Note that the term containing S_n^{[i]} is, in distribution, actually independent of i. Now we take the limit n → ∞ in this inequality; since f(S_n^{[i]} + 1/√n) − f(S_n^{[i]} − 1/√n) ≈ (2/√n) f′(S_n^{[i]}), so that each summand is approximately (4/n) f′(S_n^{[i]})², and since both S_n and S_n^{[i]} converge to our standard Gaussian variable X, we finally obtain the wanted
Var[f(X)] ≤ E[f′(X)²].
Thus it suffices to estimate the variance of real and imaginary part of g(A).
We have, for i < j,
∂g(A)/∂x_ij = ∂/∂x_ij tr[R_A(z)]
= (1/N) Σ_{k=1}^N ∂[R_A(z)]_{kk}/∂x_ij
= −(1/N) Σ_{k=1}^N ( [R_A(z)]_{ki} · [R_A(z)]_{jk} + [R_A(z)]_{kj} · [R_A(z)]_{ik} )   (by Lemma 5.4)
= −(2/N) Σ_{k=1}^N [R_A(z)]_{ik} · [R_A(z)]_{kj}   (since R_A(z) is symmetric, see the proof of 5.5)
= −(2/N) [R_A(z)²]_{ij},
and the same for i = j with 2/N replaced by 1/N .
Thus we get for f(A) := Re g(A) = Re tr[R_A(z)]:
|∂f(A)/∂x_ij| = |Re ∂g(A)/∂x_ij| ≤ (2/N) |[R_A(z)²]_{ij}| ≤ (2/N) ‖R_A(z)²‖ ≤ 2/(N · (Im z)²),
where in the last step we used the usual estimate for resolvents as in the proof of Theorem 5.5. Hence we have
|∂f(A)/∂x_ij|² ≤ 4/(N² · (Im z)⁴),
and thus our Gaussian Poincaré inequality 6.10 (with constant 2/N) yields
Var[f(A)] ≤ (2/N) Σ_{i≤j} |∂f(A)/∂x_ij|² ≤ 8/(N · (Im z)⁴).
The same estimate holds for the imaginary part and thus, finally, we have for the variance of the trace of the resolvent:
Var[tr[R_A(z)]] ≤ 32/(N · (Im z)⁴).
The fact that Var [tr[RA (z)]] goes to zero for N → ∞ closes the gap in our proof
of Theorem 5.5. Furthermore, it also improves the type of convergence in Wigner’s
semicircle law.
Theorem 6.12. Let A_N be goe random matrices as in 5.1. Then the eigenvalue distribution µ_{A_N} converges in probability to the semicircle distribution; namely, for each z ∈ C+ and all ε > 0 we have
P_N{A_N | |S_{µ_{A_N}}(z) − S_{µ_W}(z)| ≥ ε} → 0 for N → ∞.
Proof. By the Chebyshev inequality 6.4, our above estimate for the variance implies for any ε > 0 that
P_N{A_N | |tr[R_{A_N}(z)] − E[tr[R_{A_N}(z)]]| ≥ ε} ≤ Var[tr[R_{A_N}(z)]]/ε²
≤ 32/(N · (Im z)⁴ · ε²) → 0 for N → ∞.
Since we already know, by Theorem 5.5, that limN →∞ E [tr[RAN (z)]] = SµW (z), this
gives the assertion.
Remark 6.13. Note that our estimate Var [...] ∼ 1/N is not strong enough to get
almost sure convergence; one can, however, improve our arguments to get Var [...] ∼
1/N 2 , which implies then also almost sure convergence.
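The decay of the variance is visible in simulations; the following sketch (an added numerical illustration, using numpy) estimates Var[tr R_A(z)] for two matrix sizes and compares it with the bound 32/(N (Im z)⁴):

```python
# Empirical variance of tr R_A(z) for GOE matrices of two sizes.
import numpy as np

rng = np.random.default_rng(4)
z, samples = 1.0j, 100

def var_trace_resolvent(N):
    vals = []
    for _ in range(samples):
        A = rng.normal(scale=1/np.sqrt(N), size=(N, N))
        A = (A + A.T) / np.sqrt(2)            # GOE variances: 1/N off, 2/N on the diagonal
        vals.append(np.trace(np.linalg.inv(A - z * np.eye(N))) / N)
    vals = np.asarray(vals)
    return float(np.mean(np.abs(vals - vals.mean())**2))

v50, v200 = var_trace_resolvent(50), var_trace_resolvent(200)
print(v50, v200)   # decreasing in N, and below the bound 32/(N (Im z)^4)
```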
where
Ent_µ(f) := ∫_{R^n} f log f dµ − ∫_{R^n} f dµ · log( ∫_{R^n} f dµ ).
Remark 6.15. (1) As for Poincaré inequalities, logarithmic Sobolev inequalities are stable under tensorization, and Gaussian measures satisfy LSI.
(2) From a logarithmic Sobolev inequality one can then derive a concentration inequality for our random matrices of the form
P_N{A_N | |tr[R_{A_N}(z)] − E[tr[R_{A_N}(z)]]| ≥ ε} ≤ const · exp( −(N² ε²/2) · (Im z)⁴ ).
7 Analytic Description of the
Eigenvalue Distribution of
Gaussian Random Matrices
In Exercise 7 we showed that the joint distribution of the entries a_ij = x_ij + √−1 y_ij of a gue A = (a_ij)_{i,j=1}^N has density
c · exp( −(N/2) Tr A² ) dA.
This clearly shows the invariance of the distribution under unitary transformations: Let U be a unitary N × N matrix and let B = U^* A U = (b_ij)_{i,j=1}^N. Then we have Tr B² = Tr A², and the volume element is invariant under unitary transformations, dB = dA. Therefore, for the joint distributions of the entries of A and of B, respectively, we have
c · exp( −(N/2) Tr B² ) dB = c · exp( −(N/2) Tr A² ) dA.
Thus the joint distribution of entries of a gue is invariant under unitary trans-
formations, which explains the name Gaussian Unitary Ensemble. What we are
interested in, however, are not the entries but the eigenvalues of our matrices. Thus
we should transform this density from entries to eigenvalues. Instead of gue, we
will mainly consider the real case, i.e., goe.
With goe(n) we denote the goe of size N × N .
Remark 7.2. (1) This is clearly invariant under orthogonal transformation of the
entries.
(2) This is equivalent to independent real Gaussian random variables. Note, however, that the variance for the diagonal entries has to be chosen differently from the off-diagonal ones; see Remark 5.2. Let us check this for N = 2 with
A = ( x_11 x_12 ; x_12 x_22 ).
Then
exp( −(N/4) Tr ( x_11 x_12 ; x_12 x_22 )² ) = exp( −(N/4) (x_11² + 2 x_12² + x_22²) )
= exp( −(N/4) x_11² ) · exp( −(N/2) x_12² ) · exp( −(N/4) x_22² );
those give the density of a Gaussian of variance 2/N for x_11 and x_22 and of variance 1/N for x_12.
(3) From this one can easily determine the normalization constant cN (as a func-
tion of N ).
Since we are usually interested in functions of the eigenvalues, we will now trans-
form this density to eigenvalues.
Example 7.3. As a warmup, let us consider the goe(2) case,
A = ( x_11 x_12 ; x_12 x_22 ) with density p(A) = c_2 exp( −(N/4) Tr A² ).
We parametrize A by its eigenvalues λ_1 and λ_2 and an angle θ by diagonalization A = O^T D O, where
D = ( λ_1 0 ; 0 λ_2 ) and O = ( cos θ −sin θ ; sin θ cos θ );
explicitly
Note that O and D are not uniquely determined by A. In particular, if λ1 = λ2 then
any orthogonal O works. However, this case has probability zero and thus can be
ignored (see Remark 7.4). If λ1 6= λ2 , then we can choose λ1 < λ2 ; O contains then
the normalized eigenvectors for λ1 and λ2 . Those are unique up to a sign, which
can be fixed by requiring that cos θ ≥ 0. Hence θ is not running from −π to π, but
instead it can be restricted to [−π/2, π/2]. We will now transform the density to the coordinates (λ_1, λ_2, θ); for this we need the Jacobian of the map F: (λ_1, λ_2, θ) ↦ (x_11, x_12, x_22). We calculate
det DF = det ( cos²θ, sin²θ, −2(λ_1 − λ_2) sin θ cos θ ;
               sin θ cos θ, −sin θ cos θ, (λ_1 − λ_2)(cos²θ − sin²θ) ;
               sin²θ, cos²θ, 2(λ_1 − λ_2) sin θ cos θ )
= −(λ_1 − λ_2).
Thus, the density for the joint distribution of the eigenvalues on {(λ_1, λ_2) | λ_1 < λ_2} is given by
c̃_2 · e^{−(N/4)(λ_1² + λ_2²)} |λ_1 − λ_2|
with c̃_2 = π c_2.
Remark 7.4. Let us check that the probability of λ_1 = λ_2 is zero. λ_1, λ_2 are the solutions of the characteristic equation of A, a quadratic equation with coefficients depending on (x_11, x_22, x_12). There is only one solution if and only if the discriminant d = b² − 4ac is zero. However,
{(x_11, x_22, x_12) | d(x_11, x_22, x_12) = 0}
is a two-dimensional surface in R³, i.e., its Lebesgue measure is zero.
Now we consider general goe(n).
Theorem 7.5. The joint distribution of the eigenvalues of a goe(n) is given by a density
c̃_N e^{−(N/4)(λ_1² + ⋯ + λ_N²)} ∏_{k<l} (λ_l − λ_k),
where A = (x_kl)_{k,l=1}^N with x_kl real and x_kl = x_lk for all l, k.
Proof. Again we diagonalize A = O^T D O with O orthogonal and D = diag(λ_1, …, λ_N) with λ_1 ≤ ⋯ ≤ λ_N.
As before, degenerate eigenvalues have probability zero, hence this case can be neglected and we assume λ_1 < ⋯ < λ_N. We parametrize O via O = e^{−H} by a skew-symmetric matrix H, that is, H^T = −H, i.e., H = (h_ij)_{i,j=1}^N with h_ij ∈ R and h_ij = −h_ji for all i, j. In particular, h_ii = 0 for all i. We have
O^T = (e^{−H})^T = e^{−H^T} = e^H;
O = e^{−H} is actually a parametrization of the Lie group SO(N) by the Lie algebra so(N) of skew-symmetric matrices.
Note that our parametrization A = e^H D e^{−H} has the right number of parameters: for A we have the variables {x_ij | j ≤ i}, and for e^H D e^{−H} we have the N eigenvalues {λ_1, …, λ_N} and the ½(N² − N) parameters {h_ij | i > j}. In both cases we have ½N(N+1) variables. This parametrization is locally bijective; so we need to compute the Jacobian of the map S: (λ_1, …, λ_N, H) ↦ e^H D e^{−H}. We have
dA = (de^H) D e^{−H} + e^H (dD) e^{−H} + e^H D (de^{−H})
= e^H [ e^{−H}(de^H) D + dD + D (de^{−H}) e^H ] e^{−H}.
This transports the calculation of the derivative at an arbitrary point e^H to the identity element 1 = e^0 in the Lie group. Since the Jacobian is preserved under this transformation, it suffices to calculate the Jacobian at H = 0, i.e., for e^H = 1, de^H = dH and de^{−H} = −dH. Then
dA = dH · D − D · dH + dD,
i.e.,
dx_ij = (λ_j − λ_i) dh_ij + δ_ij dλ_i.
This means that we have
∂x_ij/∂λ_k = δ_ij δ_ik and ∂x_ij/∂h_kl = δ_ik δ_jl (λ_l − λ_k).
Hence the Jacobian is given by
J = det DS = ∏_{k<l} (λ_l − λ_k).
Thus,
q(λ_1, …, λ_N, h_kl) = p(x_ij | i ≥ j) · J = c_N e^{−(N/4) Tr A²} ∏_{k<l} (λ_l − λ_k)
= c_N e^{−(N/4)(λ_1² + ⋯ + λ_N²)} ∏_{k<l} (λ_l − λ_k).
This is independent of the “angles” hkl , so integrating over those variables just
changes the constant cN into another constant c̃N .
In a similar way, the complex case can be treated; see Exercise 19. One gets the
following.
Theorem 7.6. The joint distribution of the eigenvalues of a gue(n) is given by a density
ĉ_N e^{−(N/2)(λ_1² + ⋯ + λ_N²)} ∏_{k<l} (λ_l − λ_k)².
7.2 Rewriting the Vandermonde
Definition 7.7. The function
∆(λ_1, …, λ_N) = ∏_{1≤k<l≤N} (λ_l − λ_k)
0 + 1 + 2 + ⋯ + (N−1) = ½ N(N−1),
which is the same as the degree of ∆(λ_1, …, λ_N). This shows that
∆(λ_1, …, λ_N) = c · det(λ_j^{i−1})_{i,j=1}^N for some c ∈ R.
By comparing the coefficients of 1 · λ_2 · λ_3² ⋯ λ_N^{N−1} on both sides one can check that c = 1.
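This identity is easy to confirm numerically for small N (a quick sketch with numpy, added here as an illustration):

```python
# Check Delta(l_1,...,l_N) = det(l_j^{i-1})_{i,j=1}^N for random points.
import numpy as np

rng = np.random.default_rng(5)
N = 5
lam = rng.normal(size=N)
V = np.array([[lam[j]**i for j in range(N)] for i in range(N)])  # V[i, j] = lam_j^i
det_V = np.linalg.det(V)
delta = np.prod([lam[l] - lam[k] for k in range(N) for l in range(k + 1, N)])
print(det_V, delta)   # equal up to floating-point error
```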
7.3 Rewriting the GUE density in terms of
Hermite kernels
In the following, we will make a special choice for the p_k: we will choose them as the Hermite polynomials, which are orthogonal with respect to the Gaussian distribution (1/√(2π)) e^{−λ²/2} dλ.
Definition 7.10. The Hermite polynomials Hn are defined by the following require-
ments.
(i) Hn is a monic polynomial of degree n.
(ii) For all n, m ≥ 0:
∫_R H_n(x) H_m(x) (1/√(2π)) e^{−x²/2} dx = δ_nm n!.
Remark 7.11. (1) One can get the Hn (x) from the monomials 1, x, x2 , . . . via
Gram-Schmidt orthogonalization as follows.
• We define an inner product on the polynomials by
⟨f, g⟩ = (1/√(2π)) ∫_R f(x) g(x) e^{−x²/2} dx.
and
⟨H_1, H_1⟩ = (1/√(2π)) ∫_R x² e^{−x²/2} dx = 1 = 1!.
and
⟨x², H_0⟩ = (1/√(2π)) ∫_R x² e^{−x²/2} dx = 1.
⟨H_2, H_0⟩ = 0 = ⟨H_2, H_1⟩
and
⟨H_2, H_2⟩ = (1/√(2π)) ∫_R (x² − 1)² e^{−x²/2} dx = (1/√(2π)) ∫_R (x⁴ − 2x² + 1) e^{−x²/2} dx = 3 − 2 + 1 = 2!.
H0 (x) = 1,
H1 (x) = x,
H2 (x) = x2 − 1,
H3 (x) = x3 − 3x,
H4 (x) = x4 − 6x2 + 3.
(4) By Proposition 7.9, we can now use the Hn for writing our Vandermonde
determinant as
∆(λ_1, …, λ_N) = det(H_{i−1}(λ_j))_{i,j=1}^N.
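Since each H_n is monic of degree n, row operations turn the matrix (H_{i−1}(λ_j)) into the Vandermonde matrix without changing the determinant. A quick numerical check (our illustration; numpy's hermite_e module implements exactly these probabilists' Hermite polynomials):

```python
# Check det(H_{i-1}(l_j))_{i,j} = Delta(l_1,...,l_N) numerically.
import numpy as np
from numpy.polynomial.hermite_e import hermeval   # probabilists' Hermite He_n (monic)

rng = np.random.default_rng(6)
N = 5
lam = rng.normal(size=N)
# H[i, j] = He_i(lam_j); He_0 = 1, He_1 = x, He_2 = x^2 - 1, ...
H = np.array([[hermeval(lam[j], [0]*i + [1]) for j in range(N)] for i in range(N)])
det_H = np.linalg.det(H)
vandermonde = np.prod([lam[l] - lam[k] for k in range(N) for l in range(k + 1, N)])
print(det_H, vandermonde)   # equal up to floating-point error
```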
We want to use this for our gue(n) density
q(λ_1, …, λ_N) = ĉ_N e^{−(N/2)(λ_1² + ⋯ + λ_N²)} ∆(λ_1, …, λ_N)²
= ĉ_N e^{−(µ_1² + ⋯ + µ_N²)/2} ∆(µ_1/√N, …, µ_N/√N)²
= ĉ_N e^{−(µ_1² + ⋯ + µ_N²)/2} ∆(µ_1, …, µ_N)² (1/√N)^{N(N−1)},
where the µ_i = √N λ_i are the eigenvalues of the "unnormalized" gue matrix √N A_N. It will be easier to deal with those. We will now also go over from ordered eigenvalues λ_1 < λ_2 < ⋯ < λ_N to unordered eigenvalues (µ_1, …, µ_N) ∈ R^N. Since in the latter case each ordered tuple shows up N! times, this gives an additional factor N! in our density. We collect all these N-dependent factors in our constant c̃_N. So we now have the density
p(µ_1, …, µ_N) = c̃_N e^{−(µ_1² + ⋯ + µ_N²)/2} ∆(µ_1, …, µ_N)²
= c̃_N e^{−(µ_1² + ⋯ + µ_N²)/2} [ det(H_{i−1}(µ_j))_{i,j=1}^N ]²
= c̃_N [ det( e^{−µ_j²/4} H_{i−1}(µ_j) )_{i,j=1}^N ]².
i.e., the Ψn are orthonormal with respect to the Lebesgue measure. Actually,
they form an orthonormal Hilbert space basis of L2 (R).
(2) Now we can continue the calculation:
p(µ_1, …, µ_N) = c_N [ det(Ψ_{i−1}(µ_j))_{i,j=1}^N ]².
Definition 7.14. The N -th Hermite kernel KN is defined by
K_N(x, y) = Σ_{k=0}^{N−1} Ψ_k(x) Ψ_k(y).
Collecting all our notations and calculations we have thus proved the following.
Theorem 7.15. The unordered joint eigenvalue distribution of an unnormalized gue(n) is given by the density
p(µ_1, …, µ_N) = c_N det(K_N(µ_i, µ_j))_{i,j=1}^N.
Moreover, the Hermite kernel K_N is a reproducing kernel:
∫_R K_N(x, u) K_N(u, y) du = K_N(x, y).
Proof. We calculate
∫_R K_N(x, u) K_N(u, y) du = ∫_R ( Σ_{k=0}^{N−1} Ψ_k(x) Ψ_k(u) ) ( Σ_{l=0}^{N−1} Ψ_l(u) Ψ_l(y) ) du
= Σ_{k,l=0}^{N−1} Ψ_k(x) Ψ_l(y) ∫_R Ψ_k(u) Ψ_l(u) du
= Σ_{k,l=0}^{N−1} Ψ_k(x) Ψ_l(y) δ_kl
= Σ_{k=0}^{N−1} Ψ_k(x) Ψ_k(y)
= K_N(x, y).
We assume that all those integrals make sense, as it is the case for our Hermite
kernels.
For n = 3,
det ( K(µ_1,µ_1)  K(µ_1,µ_2)  K(µ_1,µ_3) ;
      K(µ_2,µ_1)  K(µ_2,µ_2)  K(µ_2,µ_3) ;
      K(µ_3,µ_1)  K(µ_3,µ_2)  K(µ_3,µ_3) )
with
∫_R det ( K(µ_1,µ_1) K(µ_1,µ_2) ; K(µ_2,µ_1) K(µ_2,µ_2) ) K(µ_3,µ_3) dµ_3 = det ( K(µ_1,µ_1) K(µ_1,µ_2) ; K(µ_2,µ_1) K(µ_2,µ_2) ) · d,
and
−∫_R det ( K(µ_1,µ_1) K(µ_1,µ_2) ; K(µ_3,µ_1) K(µ_3,µ_2) ) K(µ_2,µ_3) dµ_3
= −∫_R det ( K(µ_1,µ_1) K(µ_1,µ_2) ; K(µ_2,µ_3)K(µ_3,µ_1) K(µ_2,µ_3)K(µ_3,µ_2) ) dµ_3
= −det ( K(µ_1,µ_1) K(µ_1,µ_2) ; K(µ_2,µ_1) K(µ_2,µ_2) ),
and
∫_R det ( K(µ_2,µ_1) K(µ_2,µ_2) ; K(µ_3,µ_1) K(µ_3,µ_2) ) K(µ_1,µ_3) dµ_3
= ∫_R det ( K(µ_2,µ_1) K(µ_2,µ_2) ; K(µ_1,µ_3)K(µ_3,µ_1) K(µ_1,µ_3)K(µ_3,µ_2) ) dµ_3
= det ( K(µ_2,µ_1) K(µ_2,µ_2) ; K(µ_1,µ_1) K(µ_1,µ_2) )
= −det ( K(µ_1,µ_1) K(µ_1,µ_2) ; K(µ_2,µ_1) K(µ_2,µ_2) ).
Putting all terms together gives
∫_R det(K(µ_i, µ_j))_{i,j=1}^3 dµ_3 = (d − 2) det(K(µ_i, µ_j))_{i,j=1}^2.
Remark 7.19. We want to apply this to the Hermite kernel K = K_N. In this case we have
d = ∫_R K_N(x, x) dx = ∫_R Σ_{k=0}^{N−1} Ψ_k(x) Ψ_k(x) dx = Σ_{k=0}^{N−1} ∫_R Ψ_k(x)² dx = N,
and thus, since now d = N = n,
∫_R ⋯ ∫_R det(K_N(µ_i, µ_j))_{i,j=1}^N dµ_1 ⋯ dµ_N = N!.
This now allows us to determine the constant c_N in the density p(µ_1, …, µ_N) in Theorem 7.15. Since p is a probability density on R^N, we have
1 = ∫_{R^N} p(µ_1, …, µ_N) dµ_1 ⋯ dµ_N = c_N ∫_R ⋯ ∫_R det(K_N(µ_i, µ_j))_{i,j=1}^N dµ_1 ⋯ dµ_N = c_N N!,
and thus c_N = 1/N!.
Theorem 7.20. The unordered joint eigenvalue distribution of an unnormalized gue(n) is given by the density
p(µ_1, …, µ_N) = (1/N!) det(K_N(µ_i, µ_j))_{i,j=1}^N,
where K_N is the Hermite kernel
K_N(x, y) = Σ_{k=0}^{N−1} Ψ_k(x) Ψ_k(y).
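For N = 2 the theorem can be checked by direct quadrature, assuming the normalization Ψ_n(x) = H_n(x) e^{−x²/4}/((2π)^{1/4} √(n!)) used above: the density (1/2!) det(K_2(µ_i, µ_j)) should integrate to 1 over R². A sketch (our added illustration, using numpy):

```python
# Quadrature check that (1/2!) det(K_2(mu_i, mu_j)) is a probability density.
import numpy as np

c = (2 * np.pi) ** (-0.25)
psi0 = lambda x: c * np.exp(-x**2 / 4)        # Psi_0, from H_0 = 1
psi1 = lambda x: c * x * np.exp(-x**2 / 4)    # Psi_1, from H_1 = x (1! = 1)
K = lambda x, y: psi0(x) * psi0(y) + psi1(x) * psi1(y)   # Hermite kernel K_2

t = np.linspace(-10, 10, 801)
dx = t[1] - t[0]
X, Y = np.meshgrid(t, t)
p = (K(X, X) * K(Y, Y) - K(X, Y)**2) / 2      # (1/2!) det of the 2x2 kernel matrix
integral = float(p.sum()) * dx * dx           # Riemann sum over R^2
print(integral)                               # close to 1
```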
8 Determinantal Processes and
Non-Crossing Paths:
Karlin–McGregor and
Gessel–Viennot
Our probability distributions for the eigenvalues of gue have a determinantal struc-
ture, i.e., are of the form
p(µ_1, …, µ_N) = (1/N!) det(K_N(µ_i, µ_j))_{i,j=1}^N.
They describe N eigenvalues which repel each other (via the factor (µi − µj )2 ). If we
consider corresponding processes, then the paths of the eigenvalues should not cross;
for this see also Section 8.3. There is a quite general relation between determinants
as above and non-crossing paths. This appeared in fundamental papers in different
contexts:
• in a paper by Karlin and McGregor, 1958, in the context of Markov chains
and Brownian motion
• in a paper of Lindström, 1973, in the context of matroids
• in a paper of Gessel and Viennot, 1985, in combinatorics
cross. Let the x_i be such that all distances are even; i.e., if two paths cross, they have to meet.
Theorem 8.1 (Karlin–McGregor). Consider n independent copies of Y_k, i.e., (Y_k^{(1)}, …, Y_k^{(n)}) with Y_0^{(i)} = x_i, where x_1 > x_2 > ⋯ > x_n. Consider now t ∈ N and y_1 > y_2 > ⋯ > y_n. Denote by
P_t(x_i, y_j) = P[Y_t = y_j | Y_0 = x_i]
the probability of one random walk to get from x_i to y_j in t steps. Then we have
P[ Y_t^{(i)} = y_i for all i, Y_s^{(1)} > Y_s^{(2)} > ⋯ > Y_s^{(n)} for all 0 ≤ s ≤ t ] = det(P_t(x_i, y_j))_{i,j=1}^n.
Example 8.2. For one symmetric random walk Y_t we have the following probabilities to go in two steps from 0 to 2, 0, −2 (with p_i and q_i denoting the probabilities of an up or down step at site i):
0 → 2: p_0 p_1 = 1/4,
0 → 0: p_0 q_1 + q_0 p_{−1} = 1/2,
0 → −2: q_0 q_{−1} = 1/4.
Theorem 8.1 says that we also obtain this probability from the transition probabilities of one random walk as
det ( 1/2 1/4 ; 1/4 1/2 ) = 1/4 − 1/16 = 3/16.
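The value 3/16 can be confirmed by brute-force enumeration of all pairs of two-step paths (a small added Python sketch):

```python
# Exact check of Example 8.2: two independent symmetric walks started at
# x1 = 2, x2 = 0 end after t = 2 steps at y1 = 2, y2 = 0 while staying
# strictly ordered; Karlin-McGregor predicts probability 3/16.
from fractions import Fraction
from itertools import product

count = 0
for steps1 in product([1, -1], repeat=2):        # all 2-step paths of walk 1
    for steps2 in product([1, -1], repeat=2):    # all 2-step paths of walk 2
        w1, w2 = [2], [0]
        for s1, s2 in zip(steps1, steps2):
            w1.append(w1[-1] + s1)
            w2.append(w2[-1] + s2)
        if w1[-1] == 2 and w2[-1] == 0 and all(a > b for a, b in zip(w1, w2)):
            count += 1

prob = Fraction(count, 4 * 4)    # 16 equally likely pairs of paths
print(prob)  # -> 3/16
```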
Proof of Theorem 8.1. Let Ω_ij be the set of all possible paths in t steps from x_i to y_j. Denote by P[π] the probability of such a path π ∈ Ω_ij. Then we have
P_t(x_i, y_j) = Σ_{π ∈ Ω_ij} P[π].
Here, the first term counts all pairs of paths x1 → y1 and x2 → y2 ; hence non-
crossing ones, but also crossing ones. However, such a crossing pair of paths is, via
the “reflection principle” (where we exchange the parts of the two paths after their
first crossing), in bijection with a pair of paths from x1 → y2 and x2 → y1 ; this
bijection also preserves the probabilities.
Those paths, x1 → y2 and x2 → y1 , are counted by the second term in the
determinant. Hence the second term cancels out all the crossing terms in the first
term, leaving only the non-crossing paths.
For general n it works in a similar way.
where we have weights m_ij = m_e on each edge e: i → j. This gives weights for directed paths P via
m(P) = ∏_{e ∈ P} m_e,
and then also a weight for connecting two vertices a, b:
m(a, b) = Σ_{P: a→b} m(P),
where we sum over all directed paths from a to b. Note that this is a finite sum,
because we do not have directed cycles in our graph.
Lemma 8.4 (Gessel–Viennot). Let G be a finite acyclic weighted directed graph and
let A = (a1 , . . . , an ) and B = (b1 , . . . , bn ) be two n-sets of vertices. Then we have
det(m(a_i, b_j))_{i,j=1}^n = Σ_{P: A→B vertex-disjoint} sgn σ(P) ∏_{i=1}^n m(P_i).
Proof. Similar to the proof of Theorem 8.1; the crossing path systems cancel each other out in the determinant.
This lemma can be useful in two directions. Whereas in the stochastic setting
one uses mainly the determinant to count non-crossing paths, one can also count
vertex-disjoint path systems to calculate determinants. The following is an example
of this.
Example 8.5. Let Cn be the Catalan numbers
C0 = 1, C1 = 1, C2 = 2, C3 = 5, C4 = 14, . . .
and consider the Hankel matrix
$$M_n = \begin{pmatrix} C_0 & C_1 & \cdots & C_n \\ C_1 & C_2 & \cdots & C_{n+1} \\ \vdots & \vdots & \ddots & \vdots \\ C_n & C_{n+1} & \cdots & C_{2n} \end{pmatrix}.$$
Then we have
$$\det M_0 = \det(1) = 1, \qquad \det M_1 = \det\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} = 2 - 1 = 1,$$
$$\det M_2 = \det\begin{pmatrix} 1 & 1 & 2 \\ 1 & 2 & 5 \\ 2 & 5 & 14 \end{pmatrix} = 28 + 10 + 10 - 8 - 14 - 25 = 1.$$
This is actually true for all n: det Mn = 1. This is not obvious directly, but follows
easily from Gessel–Viennot, if one chooses the right setting.
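One can check the claim $\det M_n = 1$ numerically for small $n$; here is a short Python sketch (an illustration, not part of the notes), using exact rational arithmetic so that no rounding obscures the result:

```python
from fractions import Fraction

def catalan(n):
    """Catalan numbers C_0, ..., C_n via C_{k+1} = C_k * 2(2k+1)/(k+2)."""
    c = [1]
    for k in range(n):
        c.append(c[-1] * 2 * (2 * k + 1) // (k + 2))
    return c

def det(mat):
    """Exact determinant via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in mat]
    n, sign, d = len(m), 1, Fraction(1)
    for i in range(n):
        piv = next((r for r in range(i, n) if m[r][i] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != i:
            m[i], m[piv] = m[piv], m[i]
            sign = -sign
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return sign * d

C = catalan(12)
for n in range(6):
    M = [[C[i + j] for j in range(n + 1)] for i in range(n + 1)]
    assert det(M) == 1   # the Hankel determinant of Catalan numbers is 1
print("det M_n = 1 checked for n = 0, ..., 5")
```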
Let us show it for M2 . For this, consider the graph
(Figure: a directed lattice graph with vertices $a_0 = b_0$, $a_1$, $a_2$ at the lower left and $b_1$, $b_2$ at the upper right.)
The possible directions in the graph are up and right, and all weights are chosen as 1.
Paths in this graph correspond to Dyck paths, and thus the weights for connecting
the a’s with the b’s are counted by Catalan numbers; e.g.,
m(a0 , b0 ) = C0 ,
m(a0 , b1 ) = C1 ,
m(a0 , b2 ) = C2 ,
m(a2 , b2 ) = C4 .
Thus
$$\det M_2 = \det\begin{pmatrix} m(a_0,b_0) & m(a_0,b_1) & m(a_0,b_2) \\ m(a_1,b_0) & m(a_1,b_1) & m(a_1,b_2) \\ m(a_2,b_0) & m(a_2,b_1) & m(a_2,b_2) \end{pmatrix} = 1,$$
since there is only one such vertex-disjoint system of three paths, corresponding to
σ = id. This is given as follows; note that the path from a0 to b0 is actually a path
with 0 steps.
We have seen that the eigenvalues of random matrices repel each other. This becomes even more apparent when we consider process versions of our random matrices, where the eigenvalue processes then yield non-intersecting paths. Those process versions of our Gaussian ensembles are called Dyson Brownian motions. They are defined as $A_N(t) := (a_{ij}(t))_{i,j=1}^N$ ($t \ge 0$), where each $a_{ij}(t)$ is a classical Brownian motion (complex or real) and they are independent, apart from the symmetry condition $a_{ij}(t) = \bar{a}_{ji}(t)$ for all $t \ge 0$ and all $i,j = 1, \ldots, N$. The eigenvalues $\lambda_1(t), \ldots, \lambda_N(t)$ of $A_N(t)$ then give $N$ non-intersecting Brownian motions.
Here are plots for discretized random walk versions of the Dyson Brownian motion, corresponding to goe(13), gue(13) and, for comparison, also 13 independent Brownian motions; see also Exercise 24. Guess which is which!
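The non-intersection can also be seen from the SDE description of the eigenvalue paths. Here is a rough Euler discretization in Python (an illustrative sketch, not the exact matrix process from the notes; the chosen normalization of the repulsion term, $d\lambda_i = dB_i + \sum_{j\ne i} dt/(\lambda_i - \lambda_j)$, is one common convention for the GOE case):

```python
import math
import random

random.seed(0)

def dyson_paths(n=5, steps=200, dt=1e-4):
    """Euler scheme for d lambda_i = dB_i + sum_{j != i} dt/(lambda_i - lambda_j)."""
    lam = [2.0 * i for i in range(n)]       # well-separated starting points
    paths = [list(lam)]
    for _ in range(steps):
        drift = [sum(dt / (lam[i] - lam[j]) for j in range(n) if j != i)
                 for i in range(n)]
        lam = [lam[i] + drift[i] + random.gauss(0.0, math.sqrt(dt))
               for i in range(n)]
        paths.append(list(lam))
    return paths

paths = dyson_paths()
# The repulsion term keeps the paths strictly ordered (non-intersecting):
assert all(p[i] < p[i + 1] for p in paths for i in range(len(p) - 1))
```

Plotting the columns of `paths` against the step index reproduces pictures like the ones above.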
9 Statistics of the Largest
Eigenvalue and Tracy–Widom
Distribution
Consider gue(n) or goe(n). For large $N$, the eigenvalue distribution is close to a semicircle with density
$$p(x) = \frac{1}{2\pi}\sqrt{4 - x^2}.$$
(Figure: the semicircle; the region around $0$ is the bulk, the regions near the endpoints $\pm 2$ are the edges.)
We will now zoom to a microscopic level and try to understand the behaviour of
a single eigenvalue. The behaviour in the bulk and at the edge is different. We are
particularly interested in the largest eigenvalue. Note that at the moment we do
not even know whether the largest eigenvalue sticks close to 2 with high probability.
Wigner’s semicircle law implies that it cannot go far below 2, but it does not prevent
it from being very large. We will in particular see that this cannot happen.
Behaviour in the bulk: In $[\lambda, \lambda + t]$ there should be $\sim t\,p(\lambda)N$ eigenvalues. This is of order 1 if we choose $t \sim 1/N$. This means that eigenvalues in the bulk have for their position an interval of size $\sim 1/N$, so this is a good guess for the order of fluctuations of an eigenvalue in the bulk.
Behaviour at the edge: near the edge the density behaves like $p(2-t) \sim t^{1/2}$, so in $[2-t, 2]$ there should be $\sim N\int_{2-t}^2 p(x)\,dx \sim t^{3/2}N$ eigenvalues. Thus $1 \sim t^{3/2}N$, i.e., $t \sim N^{-2/3}$. Hence we expect for the largest eigenvalue an interval of fluctuations of size $N^{-2/3}$. Very optimistically, we might expect
$$\lambda_{\max} \approx 2 + N^{-2/3}X,$$
where $X$ has an $N$-independent distribution.
(2) On the other hand, when β is fixed, there is a large universality class for the
corresponding Tracy–Widom distribution. F2 shows up as limiting fluctuations
for
(a) largest eigenvalue of gue (Tracy, Widom, 1993),
(b) largest eigenvalue of more general Wigner matrices (Soshnikov, 1999),
(c) largest eigenvalue of general unitarily invariant matrix ensembles (Deift
et al., 1994-2000),
(d) length of the longest increasing subsequence of random permutations
(Baik, Deift, Johansson, 1999; Okounkov, 2000),
(e) arctic circle for the Aztec diamond (Johansson, 2005),
(f) various growth processes like ASEP (“asymmetric simple exclusion process”) and TASEP (“totally asymmetric simple exclusion process”).
(3) There is still no uniform explanation for this universality. The feeling is that
the Tracy-Widom distribution is somehow the analogue of the normal distri-
bution for a kind of central limit theorem, where independence is replaced by
some kind of dependence. But no one can make this precise at the moment.
(4) Proving Tracy–Widom for gue is out of reach for us, but we will give some
ideas. In particular, we try to derive rigorous estimates which show that our
N −2/3 -heuristic is of the right order and, in particular, we will prove that the
largest eigenvalue converges almost surely to 2.
For the largest eigenvalue $\lambda_{\max}$ of a normalized gue(n) matrix $A_N$ we can estimate, for any $k \in \mathbb{N}$ and $\varepsilon > 0$,
$$P[\lambda_{\max} \ge 2+\varepsilon] \le P\left[\sum_{j=1}^N \lambda_j^{2k} \ge (2+\varepsilon)^{2k}\right] = P\left[\operatorname{tr} A_N^{2k} \ge \frac{(2+\varepsilon)^{2k}}{N}\right] \le \frac{N}{(2+\varepsilon)^{2k}}\,E\left[\operatorname{tr} A_N^{2k}\right].$$
In the last step we used Markov's inequality 6.3; note that we have even powers, and hence the random variable $\operatorname{tr}(A_N^{2k})$ is positive.
In Theorem 2.15 we calculated the expectation in terms of a genus expansion as
$$E\left[\operatorname{tr}(A_N^{2k})\right] = \sum_{\pi \in \mathcal{P}_2(2k)} N^{\#(\gamma\pi) - k - 1} = \sum_{g \ge 0} \varepsilon_g(k) N^{-2g},$$
where
$$\varepsilon_g(k) = \#\left\{\pi \in \mathcal{P}_2(2k) \mid \pi \text{ has genus } g\right\}.$$
The inequality
$$P[\lambda_{\max} \ge 2+\varepsilon] \le \frac{N}{(2+\varepsilon)^{2k}}\,E\left[\operatorname{tr} A_N^{2k}\right]$$
is useless if k is fixed for N → ∞, because then the right hand side goes to ∞. Hence
we also have to scale k with N (we will use k ∼ N 2/3 ), but then the sub-leading
terms in the genus expansion become important. Up to now we only know that
ε0 (k) = Ck , but now we need some information on the other εg (k). This is provided
by a theorem of Harer and Zagier.
Theorem 9.2 (Harer–Zagier, 1986). Let us define $b_k$ by
$$\sum_{g \ge 0} \varepsilon_g(k) N^{-2g} = C_k b_k,$$
where $C_k$ are the Catalan numbers. (Note that the $b_k$ depend also on $N$, but we suppress this dependency in the notation.) Then we have the recursion formula
$$b_{k+1} = b_k + \frac{k(k+1)}{4N^2}\, b_{k-1}$$
for all $k \ge 2$.
We will prove this later; see Section 9.6. For now, let us just check it for small
examples.
Example 9.3. From Remark 2.16 we know
$$C_1 b_1 = E\left[\operatorname{tr} A_N^2\right] = 1, \qquad C_2 b_2 = E\left[\operatorname{tr} A_N^4\right] = 2 + \frac{1}{N^2},$$
$$C_3 b_3 = E\left[\operatorname{tr} A_N^6\right] = 5 + \frac{10}{N^2}, \qquad C_4 b_4 = E\left[\operatorname{tr} A_N^8\right] = 14 + \frac{70}{N^2} + \frac{21}{N^4},$$
which gives
$$b_1 = 1, \qquad b_2 = 1 + \frac{1}{2N^2}, \qquad b_3 = 1 + \frac{2}{N^2}, \qquad b_4 = 1 + \frac{5}{N^2} + \frac{3}{2N^4}.$$
Let us check the recursion for $k = 3$:
$$b_3 + \frac{3\cdot 4}{4N^2}\, b_2 = 1 + \frac{2}{N^2} + \frac{3}{N^2}\left(1 + \frac{1}{2N^2}\right) = 1 + \frac{5}{N^2} + \frac{3}{2N^4} = b_4.$$
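The checks of this kind can be automated; the following Python snippet (an illustration, not part of the notes) verifies the recursion exactly by treating the $b_k$ as polynomials in $x = 1/N^2$ with the coefficients from Example 9.3:

```python
from fractions import Fraction as F

# b_k as polynomials in x = 1/N^2 (coefficient lists, constant term first),
# read off from Example 9.3:
b = {1: [F(1)],
     2: [F(1), F(1, 2)],
     3: [F(1), F(2)],
     4: [F(1), F(5), F(3, 2)]}

def add(p, q):
    n = max(len(p), len(q))
    p = p + [F(0)] * (n - len(p))
    q = q + [F(0)] * (n - len(q))
    return [a + c for a, c in zip(p, q)]

def times_x(p, c):
    """Multiply the polynomial p by c*x."""
    return [F(0)] + [c * a for a in p]

# Harer-Zagier recursion: b_{k+1} = b_k + (k(k+1)/4) x b_{k-1}
for k in (2, 3):
    assert add(b[k], times_x(b[k - 1], F(k * (k + 1), 4))) == b[k + 1]
print("Harer-Zagier recursion holds exactly for k = 2, 3")
```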
Proof. Note that, by definition, $b_k > 0$ for all $k \in \mathbb{N}$ and hence, by Theorem 9.2, $b_{k+1} > b_k$. Thus,
$$b_{k+1} = b_k + \frac{k(k+1)}{4N^2}\, b_{k-1} \le b_k\left(1 + \frac{k(k+1)}{4N^2}\right) \le b_k\left(1 + \frac{k^2}{2N^2}\right);$$
iterating this and using $1 + x \le e^x$ gives
$$b_k \le \prod_{j < k}\left(1 + \frac{j^2}{2N^2}\right) \le \exp\left(\frac{k^3}{2N^2}\right).$$
We can now continue our estimate for the largest eigenvalue:
$$P[\lambda_{\max} \ge 2+\varepsilon] \le \frac{N}{(2+\varepsilon)^{2k}}\,E\left[\operatorname{tr} A_N^{2k}\right] = \frac{N}{(2+\varepsilon)^{2k}}\, C_k b_k \le \frac{N}{(2+\varepsilon)^{2k}}\, C_k \exp\left(\frac{k^3}{2N^2}\right) \le \frac{N}{(2+\varepsilon)^{2k}}\cdot\frac{4^k}{k^{3/2}}\exp\left(\frac{k^3}{2N^2}\right),$$
where in the last step we used the bound
$$C_k \le \frac{4^k}{\sqrt{\pi}\,k^{3/2}} \le \frac{4^k}{k^{3/2}}$$
for the Catalan numbers.
This estimate is now strong enough to see that the largest eigenvalue actually has to converge to 2. For this, let us fix $\varepsilon > 0$ and choose $k$ depending on $N$ as $k_N := \lfloor N^{2/3}\rfloor$, where $\lfloor x\rfloor$ denotes the largest integer $\le x$. Then
$$\frac{N}{k_N^{3/2}} \xrightarrow{N\to\infty} 1 \qquad\text{and}\qquad \frac{k_N^3}{2N^2} \xrightarrow{N\to\infty} \frac12.$$
Hence
$$\limsup_{N\to\infty} P[\lambda_{\max} \ge 2+\varepsilon] \le \lim_{N\to\infty}\left(\frac{2}{2+\varepsilon}\right)^{2k_N}\cdot 1\cdot e^{1/2} = 0,$$
and thus for all $\varepsilon > 0$,
$$\lim_{N\to\infty} P[\lambda_{\max} \ge 2+\varepsilon] = 0.$$
This says that the largest eigenvalue λmax of a gue converges in probability to 2.
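This convergence is easy to observe numerically. The following Python sketch (an illustration, not part of the notes; the GOE normalization and the shift-by-$2I$ trick are choices of this sketch) estimates the largest eigenvalue of one sampled GOE matrix by power iteration, staying within the standard library:

```python
import math
import random

# Largest eigenvalue of a normalized GOE matrix via power iteration on
# A + 2*I (whose top eigenvalue is lambda_max(A) + 2, and whose spectrum
# is essentially non-negative, so power iteration converges to the edge).
random.seed(1)
N = 80

# one standard GOE normalization: variance 1/N off-diagonal, 2/N on the diagonal
A = [[0.0] * N for _ in range(N)]
for i in range(N):
    A[i][i] = random.gauss(0.0, math.sqrt(2.0 / N))
    for j in range(i + 1, N):
        A[i][j] = A[j][i] = random.gauss(0.0, math.sqrt(1.0 / N))

def matvec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

v = [random.gauss(0.0, 1.0) for _ in range(N)]
for _ in range(600):
    w = [wi + 2.0 * vi for wi, vi in zip(matvec(A, v), v)]   # (A + 2I) v
    norm = math.sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]

lam_max = sum(vi * wi for vi, wi in zip(v, matvec(A, v)))    # Rayleigh quotient
print("estimated largest eigenvalue:", lam_max)   # close to 2
```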
Corollary 9.6. For a normalized gue(n) matrix $A_N$ the largest eigenvalue converges in probability, and also almost surely, to 2, i.e.,
$$\lambda_{\max}(A_N) \xrightarrow{N\to\infty} 2 \quad\text{almost surely}.$$
Proof. The convergence in probability was shown above. For the strengthening to almost sure convergence one has to use Borel–Cantelli and the fact that
$$\sum_N \left(\frac{2}{2+\varepsilon}\right)^{2k_N} < \infty.$$
To capture the precise order of the fluctuations we now let $\varepsilon$ depend on $N$: put $\varepsilon_N = tN^{-2/3}$ for $t > 0$, and choose $k_N := \lfloor N^{2/3}r\rfloor$ for a parameter $r > 0$. Then
$$\frac{N}{k_N^{3/2}} \xrightarrow{N\to\infty} \frac{1}{r^{3/2}} \qquad\text{and}\qquad \frac{k_N^3}{2N^2} \xrightarrow{N\to\infty} \frac{r^3}{2},$$
and
$$\frac{4^{k_N}}{(2+\varepsilon_N)^{2k_N}} = \left(\frac{1}{1 + \frac{t}{2N^{2/3}}}\right)^{2\lfloor N^{2/3}r\rfloor} \xrightarrow{N\to\infty} e^{-rt},$$
and thus
$$\limsup_{N\to\infty} P\left[\lambda_{\max} \ge 2 + tN^{-2/3}\right] \le \frac{1}{r^{3/2}}\, e^{r^3/2}\, e^{-rt}$$
for arbitrary $r > 0$. We optimize this now by choosing $r = \sqrt{t}$ for $t > 0$ and get
Proposition 9.7. For a normalized gue(n) matrix $A_N$ we have for all $t > 0$
$$\limsup_{N\to\infty} P\left[\lambda_{\max}(A_N) \ge 2 + tN^{-2/3}\right] \le t^{-3/4}\exp\left(-\frac12 t^{3/2}\right).$$
Although this estimate does not prove the existence of the limit on the left hand
side, it turns out that the right hand side is quite sharp and captures the tail
behaviour of the Tracy–Widom distribution quite well.
9.5 Non-rigorous derivation of Tracy–Widom
distribution
For determining the Tracy–Widom fluctuations in the limit N → ∞ one has to use
the analytic description of the gue joint density. Recall from Theorem 7.20 that
the joint density of the unordered eigenvalues of an unnormalized gue(n) is given
by
$$p(\mu_1, \ldots, \mu_N) = \frac{1}{N!}\det\left(K_N(\mu_i, \mu_j)\right)_{i,j=1}^N,$$
where $K_N$ is the Hermite kernel
$$K_N(x, y) = \sum_{k=0}^{N-1}\Psi_k(x)\Psi_k(y).$$
Now consider
$$P\left[\mu_{\max}^{(N)} \le t\right] = P[\text{there is no eigenvalue in } (t,\infty)] = 1 - P[\text{there is an eigenvalue in } (t,\infty)]$$
$$= 1 - N\,P[\mu_1 \in (t,\infty)] + \binom N2 P[\mu_1,\mu_2 \in (t,\infty)] - \binom N3 P[\mu_1,\mu_2,\mu_3 \in (t,\infty)] + \cdots$$
$$= 1 + \sum_{r=1}^N (-1)^r \binom Nr \int_t^\infty\!\!\cdots\int_t^\infty p_N(\mu_1,\ldots,\mu_r)\,d\mu_1\cdots d\mu_r$$
$$= 1 + \sum_{r=1}^N \frac{(-1)^r}{r!} \int_t^\infty\!\!\cdots\int_t^\infty \det\left(K_N(\mu_i,\mu_j)\right)_{i,j=1}^r d\mu_1\cdots d\mu_r.$$
Note that $p$ is the density for a gue(n) without normalization, i.e., $\mu_{\max}^{(N)} \approx 2\sqrt N$. More precisely, we expect fluctuations
$$\mu_{\max}^{(N)} \approx \sqrt N\left(2 + tN^{-2/3}\right) = 2\sqrt N + tN^{-1/6}.$$
We put
$$\tilde K_N(x,y) = N^{-1/6}\cdot K_N\left(2\sqrt N + xN^{-1/6},\ 2\sqrt N + yN^{-1/6}\right)$$
so that we have
$$P\left[N^{2/3}\left(\frac{\mu_{\max}^{(N)}}{\sqrt N} - 2\right) \le t\right] = \sum_{r=0}^N \frac{(-1)^r}{r!}\int_t^\infty\!\!\cdots\int_t^\infty \det\left(\tilde K_N(x_i,x_j)\right)_{i,j=1}^r dx_1\cdots dx_r.$$
We have to see that the limit of this expression for $N \to \infty$ exists. For this, we need the limit $\lim_{N\to\infty}\tilde K_N(x,y)$. Recall that
$$K_N(x,y) = \sum_{k=0}^{N-1}\Psi_k(x)\Psi_k(y).$$
As this involves $\Psi_k$ for all $k = 0, 1, \ldots, N-1$, this is not directly amenable to taking the limit $N \to \infty$. However, by the Christoffel–Darboux identity for the Hermite polynomials (see Exercise 27)
$$\sum_{k=0}^{n-1}\frac{H_k(x)H_k(y)}{k!} = \frac{H_n(x)H_{n-1}(y) - H_{n-1}(x)H_n(y)}{(x-y)\,(n-1)!}$$
and with
$$\Psi_k(x) = (2\pi)^{-1/4}(k!)^{-1/2}e^{-\frac14 x^2}H_k(x),$$
as defined in Definition 7.12, we can rewrite $K_N$ in the form
$$K_N(x,y) = \frac{1}{\sqrt{2\pi}}e^{-\frac14(x^2+y^2)}\sum_{k=0}^{N-1}\frac{H_k(x)H_k(y)}{k!} = \frac{1}{\sqrt{2\pi}}e^{-\frac14(x^2+y^2)}\frac{H_N(x)H_{N-1}(y) - H_{N-1}(x)H_N(y)}{(x-y)\,(N-1)!} = \sqrt N\cdot\frac{\Psi_N(x)\Psi_{N-1}(y) - \Psi_{N-1}(x)\Psi_N(y)}{x-y}.$$
Note that the $\Psi_N$ satisfy the differential equation (see Exercise 30)
$$\Psi_N'(x) = -\frac x2\Psi_N(x) + \sqrt N\,\Psi_{N-1}(x),$$
and thus
$$K_N(x,y) = \frac{\Psi_N(x)\left[\Psi_N'(y) + \frac y2\Psi_N(y)\right] - \left[\Psi_N'(x) + \frac x2\Psi_N(x)\right]\Psi_N(y)}{x-y} = \frac{\Psi_N(x)\Psi_N'(y) - \Psi_N'(x)\Psi_N(y)}{x-y} - \frac12\Psi_N(x)\Psi_N(y).$$
Now put
$$\tilde\Psi_N(x) := N^{1/12}\cdot\Psi_N\left(2\sqrt N + xN^{-1/6}\right),$$
thus
$$\tilde\Psi_N'(x) = N^{1/12}\cdot\Psi_N'\left(2\sqrt N + xN^{-1/6}\right)\cdot N^{-1/6} = N^{-1/12}\cdot\Psi_N'\left(2\sqrt N + xN^{-1/6}\right).$$
Then
$$\tilde K_N(x,y) = \frac{\tilde\Psi_N(x)\tilde\Psi_N'(y) - \tilde\Psi_N'(x)\tilde\Psi_N(y)}{x-y} - \frac{1}{2N^{1/3}}\tilde\Psi_N(x)\tilde\Psi_N(y).$$
One can show that the limit
$$\operatorname{Ai}(x) := \lim_{N\to\infty}\tilde\Psi_N(x)$$
exists, and hence
$$\lim_{N\to\infty}\tilde K_N(x,y) = \frac{\operatorname{Ai}(x)\operatorname{Ai}'(y) - \operatorname{Ai}'(x)\operatorname{Ai}(y)}{x-y} =: A(x,y).$$
$A$ is called the Airy kernel.
Let us try, again non-rigorously, to characterize this limit function Ai. For the Hermite functions we have (see Exercise 30)
$$\Psi_N''(x) + \left(N + \frac12 - \frac{x^2}{4}\right)\Psi_N(x) = 0.$$
For the $\tilde\Psi_N$ we have
$$\tilde\Psi_N'(x) = N^{-1/12}\cdot\Psi_N'\left(2\sqrt N + xN^{-1/6}\right)$$
and
$$\tilde\Psi_N''(x) = N^{-1/4}\cdot\Psi_N''\left(2\sqrt N + xN^{-1/6}\right).$$
Thus,
$$\tilde\Psi_N''(x) = -N^{-1/4}\left(N + \frac12 - \frac{\left(2\sqrt N + xN^{-1/6}\right)^2}{4}\right)\Psi_N\left(2\sqrt N + xN^{-1/6}\right)$$
$$= -N^{-1/3}\left(N + \frac12 - \frac{4N + 4xN^{1/3} + x^2N^{-1/3}}{4}\right)\tilde\Psi_N(x)$$
$$= -N^{-1/3}\left(\frac12 - \frac{4xN^{1/3} + x^2N^{-1/3}}{4}\right)\tilde\Psi_N(x) \approx x\,\tilde\Psi_N(x) \qquad\text{for large } N.$$
Hence, in the limit, the function Ai should satisfy the Airy differential equation $\operatorname{Ai}''(x) = x\operatorname{Ai}(x)$.
This is indeed the case, but the proof is again beyond our tools. Let us just give
the formal definition of the Airy function and formulate the final result.
The Airy function Ai is the unique solution of $\operatorname{Ai}''(x) = x\operatorname{Ai}(x)$ with the asymptotic behaviour
$$\operatorname{Ai}(x) \sim \frac{1}{2\sqrt\pi}\,x^{-1/4}\exp\left(-\frac23 x^{3/2}\right) \qquad (x \to +\infty).$$
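Since Ai is determined by its differential equation, one can compute it numerically with standard tools. Here is a self-contained Python sketch (an illustration, not part of the notes; the initial values $\operatorname{Ai}(0) = 3^{-2/3}/\Gamma(2/3)$ and $\operatorname{Ai}'(0) = -3^{-1/3}/\Gamma(1/3)$ are the standard ones) that integrates the Airy equation with a Runge–Kutta scheme and compares with the asymptotics above:

```python
import math

def airy(x_end, h=1e-3):
    """Integrate Ai'' = x*Ai from 0 to x_end with classical RK4."""
    y = 1.0 / (3 ** (2.0 / 3.0) * math.gamma(2.0 / 3.0))   # Ai(0)
    p = -1.0 / (3 ** (1.0 / 3.0) * math.gamma(1.0 / 3.0))  # Ai'(0)
    x = 0.0
    while x < x_end - 1e-12:
        # RK4 step for the system y' = p, p' = x*y
        k1y, k1p = p, x * y
        k2y, k2p = p + h / 2 * k1p, (x + h / 2) * (y + h / 2 * k1y)
        k3y, k3p = p + h / 2 * k2p, (x + h / 2) * (y + h / 2 * k2y)
        k4y, k4p = p + h * k3p, (x + h) * (y + h * k3y)
        y, p = (y + h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y),
                p + h / 6 * (k1p + 2 * k2p + 2 * k3p + k4p))
        x += h
    return y

ai2 = airy(2.0)
asym = 2.0 ** (-0.25) / (2 * math.sqrt(math.pi)) * math.exp(-2.0 / 3.0 * 2.0 ** 1.5)
print(ai2, asym)   # the asymptotic formula is already within a few percent at x = 2
```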
Theorem 9.9. The random variable $N^{2/3}(\lambda_{\max}(A_N) - 2)$ of a normalized gue(n) has a limiting distribution as $N \to \infty$. Its limiting distribution function is
$$F_2(t) := \lim_{N\to\infty} P\left[N^{2/3}(\lambda_{\max} - 2) \le t\right] = \sum_{r=0}^\infty \frac{(-1)^r}{r!}\int_t^\infty\!\!\cdots\int_t^\infty \det\left(A(x_i,x_j)\right)_{i,j=1}^r dx_1\cdots dx_r.$$
The form of F2 from Theorem 9.9 is more of a theoretical nature and not very
convenient for calculations. A main contribution of Tracy–Widom in this context
was that they were able to derive another, quite astonishing, representation of F2 .
Theorem 9.10 (Tracy–Widom, 1994). The distribution function $F_2$ satisfies
$$F_2(t) = \exp\left(-\int_t^\infty (x-t)\,q(x)^2\,dx\right),$$
where $q$ is the Hastings–McLeod solution of the Painlevé II equation
$$q''(x) = x\,q(x) + 2q(x)^3, \qquad q(x) \sim \operatorname{Ai}(x) \text{ as } x \to +\infty.$$
(Figure: plot of the density of the Tracy–Widom distribution $F_2$.)
9.6 Proof of the Harer–Zagier recursion
We still have to prove the recursion of Harer–Zagier, Theorem 9.2.
Let us denote
$$T(k, N) := E\left[\operatorname{tr} A_N^{2k}\right] = \sum_{g \ge 0}\varepsilon_g(k)N^{-2g}.$$
The genus expansion shows that T (k, N ) is, for fixed k, a polynomial in N −1 . Ex-
pressing it in terms of integrating over eigenvalues reveals the surprising fact that,
up to a Gaussian factor, it is also a polynomial in k for fixed N . We show this in
the next lemma. This is actually the only place where we need the random matrix
interpretation of this quantity.
Lemma 9.11. The expression
$$\frac{N^k\,T(k,N)}{(2k-1)!!}$$
is a polynomial of degree $N - 1$ in $k$.
Proof. First check the easy case $N = 1$: $T(k,1) = (2k-1)!!$ is the $2k$-th moment of a standard normal variable and
$$\frac{T(k,1)}{(2k-1)!!} = 1$$
is a polynomial of degree 0 in $k$.
with $\alpha_l$ possibly depending on $N$. Thus,
$$T(k,N) = N c_N \sum_{l=0}^{N-1}\alpha_l\int_{\mathbb R}\lambda_1^{2k+2l}e^{-\frac N2\lambda_1^2}\,d\lambda_1 = N c_N \sum_{l=0}^{N-1}\alpha_l\cdot k_N\cdot(2k+2l-1)!!\cdot N^{-(k+l)},$$
since the integral over $\lambda_1$ gives the $(2k+2l)$-th moment of a Gauss variable of variance $N^{-1}$, where $k_N$ contains the $N$-dependent normalization constants of the Gaussian measure; hence
$$\frac{N^k\,T(k,N)}{(2k-1)!!}$$
is a linear combination (with $N$-dependent coefficients) of terms of the form
$$\frac{(2k+2l-1)!!}{(2k-1)!!} = (2k+1)(2k+3)\cdots(2k+2l-1),$$
each of which is a polynomial of degree $l \le N-1$ in $k$.
Let us denote by $t(k,N) = N^{k+1}\,T(k,N) = \sum_{\pi\in\mathcal P_2(2k)}N^{\#(\gamma\pi)}$ the number of ways of coloring, for all $\pi \in \mathcal P_2(2k)$, the cycles of $\gamma\pi$ with at most $N$ colors, and let us introduce
$$\tilde t(k,L) = \sum_{\pi\in\mathcal P_2(2k)}\#\left\{\text{colorings of the cycles of }\gamma\pi\text{ with exactly }L\text{ different colors}\right\};$$
then we have
$$t(k,N) = \sum_{L=1}^N\tilde t(k,L)\binom NL,$$
because if we want to use at most $N$ different colors, then we can do this by using exactly $L$ different colors (for any $L$ between 1 and $N$), and after fixing $L$ we have $\binom NL$ many possibilities to choose the $L$ colors among the $N$ colors.
This relation can be inverted by
$$\tilde t(k,N) = \sum_{L=1}^N(-1)^{N-L}\binom NL\, t(k,L),$$
and hence $\tilde t(k,N)/(2k-1)!!$ is also a polynomial in $k$ of degree $N-1$. But now we have
$$\tilde t(k,N) = 0 \qquad\text{if } k+1 < N,$$
since $\gamma\pi$ has, by Proposition 2.20, at most $k+1$ cycles for $\pi\in\mathcal P_2(2k)$, and we need at least $N$ cycles if we want to use $N$ different colors.
So, $\tilde t(k,N)/(2k-1)!!$ is a polynomial in $k$ of degree $N-1$ and we know $N-1$ of its zeros; hence it must be of the form
$$\frac{\tilde t(k,N)}{(2k-1)!!} = \alpha_N\, k(k-1)\cdots(k-N+2) = \alpha_N\binom{k}{N-1}(N-1)!.$$
Hence,
$$t(k,N) = \sum_{L=1}^N\binom NL\binom{k}{L-1}(L-1)!\,\alpha_L\,(2k-1)!!.$$
To identify $\alpha_N$ we look at
$$\alpha_{N+1}\,N!\,(2N-1)!! = \tilde t(N, N+1) = C_N(N+1)!.$$
Note that only the non-crossing pairings can be colored with exactly $N+1$ colors, and for each such $\pi$ there are $(N+1)!$ ways of doing so. We conclude
$$\alpha_{N+1} = \frac{C_N(N+1)!}{N!\,(2N-1)!!} = \frac{C_N(N+1)}{(2N-1)!!} = \frac{1}{N+1}\binom{2N}{N}\frac{N+1}{(2N-1)!!} = \frac{(2N)!}{N!\,N!\,(2N-1)!!} = \frac{2^N}{N!}.$$
Thus we have, with $\alpha_L = 2^{L-1}/(L-1)!$,
$$T(k,N) = \frac{1}{N^{k+1}}\,t(k,N) = \frac{1}{N^{k+1}}\sum_{L=1}^N\binom NL\binom{k}{L-1}(L-1)!\,\frac{2^{L-1}}{(L-1)!}\,(2k-1)!! = \frac{(2k-1)!!}{N^{k+1}}\sum_{L=1}^N\binom NL\binom{k}{L-1}2^{L-1}.$$
$$= \left(1 + \frac{2s}{1-s}\right)^N = \left(\frac{1+s}{1-s}\right)^N.$$
Note that (as in our calculation for the $\alpha_N$)
$$\frac{1}{(2k-1)!!} = \frac{2^k}{k!\,C_k\,(k+1)}$$
and hence $T(s,N)$ can also be rewritten as a generating function in our main quantity of interest,
$$b_k^{(N)} = \frac{T(k,N)}{C_k}$$
(we make now the dependence of $b_k$ on $N$ explicit), as
$$T(s,N) = 1 + 2\sum_{k=0}^\infty\frac{T(k,N)\,2^k}{(k+1)!\,C_k}\,(Ns)^{k+1} = 1 + \sum_{k=0}^\infty\frac{b_k^{(N)}}{(k+1)!}\,(2Ns)^{k+1}.$$
In order to get a recursion for the $b_k^{(N)}$, we need some functional relation for $T(s,N)$. Note that the recursion in Harer–Zagier involves $b_{k+1}$, $b_k$, $b_{k-1}$ for the same $N$; thus we need a relation which does not change $N$. For this we look at the derivative with respect to $s$. From
$$T(s,N) = \left(\frac{1+s}{1-s}\right)^N$$
we get
$$\frac{d}{ds}T(s,N) = N\left(\frac{1+s}{1-s}\right)^{N-1}\frac{(1-s)+(1+s)}{(1-s)^2} = 2N\left(\frac{1+s}{1-s}\right)^N\frac{1}{(1-s)(1+s)} = 2N\cdot T(s,N)\,\frac{1}{1-s^2},$$
and thus
$$(1-s^2)\,\frac{d}{ds}T(s,N) = 2N\cdot T(s,N).$$
Note that we have
$$\frac{d}{ds}T(s,N) = \sum_{k=0}^\infty\frac{b_k^{(N)}}{k!}\,(2Ns)^k\cdot 2N.$$
Plugging both series into the relation $(1-s^2)\frac{d}{ds}T(s,N) = 2N\,T(s,N)$ and comparing the coefficients of $s^{k+1}$ yields the Harer–Zagier recursion for the $b_k^{(N)}$.
10 Statistics of the Longest
Increasing Subsequence
10.1 Complete order is impossible
Definition 10.1. A permutation σ ∈ Sn is said to have an increasing subsequence
of length k if there exist indices 1 ≤ i1 < · · · < ik ≤ n such that σ(i1 ) < · · · < σ(ik ).
For a decreasing subsequence of length k the above holds with the second set of
inequalities reversed. For a given $\sigma \in S_n$ we denote by $L_n(\sigma)$ the maximal length of an increasing subsequence of $\sigma$.
Example 10.2. (1) Maximal length is achieved for the identity permutation
$$\sigma = \mathrm{id} = \begin{pmatrix} 1 & 2 & \cdots & n-1 & n \\ 1 & 2 & \cdots & n-1 & n \end{pmatrix};$$
in this case $L_n(\sigma) = n$.
(2) Minimal length is achieved for the reversal
$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n-1 & n \\ n & n-1 & \cdots & 2 & 1 \end{pmatrix};$$
in this case all increasing subsequences have length 1, hence $L_n(\sigma) = 1$; but there is a decreasing subsequence of length $n$.
(3) Consider a more “typical” permutation
$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 4 & 2 & 3 & 1 & 6 & 5 & 7 \end{pmatrix};$$
length 3. In the graphical representation
(Figure: graphical representation of $\sigma$ as the points $(i, \sigma(i))$ in the plane; increasing subsequences correspond to chains of points going up and to the right.)
4 1 9 3 2 7 6 8 5
→ 4 1 9 2 3 7 6 8 5
→ 1 9 2 3 4 7 6 8 5
→ 1 9 2 3 4 5 7 6 8
→ 1 9 2 3 4 5 6 7 8
→ 1 2 3 4 5 6 7 8 9
in 9 − 4 = 5 operations.
(2) One has situations with only small increasing subsequences, but then one has
long decreasing subsequences. This is true in general; one cannot avoid both
long decreasing and long increasing subsequences at the same time. According
to the slogan
“Complete order is impossible.” (Motzkin)
Theorem 10.4 (Erdős, Szekeres, 1935). Every permutation σ ∈ Sn2 +1 has a mono-
tone subsequence of length more than n.
Proof. Write σ = a1 a2 · · · an2 +1 . Assign labels (xk , yk ), where xk is the length of
a longest increasing subsequence ending at ak ; and yk is the length of a longest
decreasing subsequence ending at ak . Assume now that there is no monotone subse-
quence of length n+1. Hence we have for all k: 1 ≤ xk , yk ≤ n; i.e., there are only n2
possible labels. By the pigeonhole principle there are i < j with (xi , yi ) = (xj , yj ). If
ai < aj we can append aj to a longest increasing subsequence ending at ai , but then
xj > xi . If ai > aj we can append aj to a longest decreasing subsequence ending at
ai , but then yj > yi . In both cases we have a contradiction.
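The quantity $L_n$ is easy to compute in practice via patience sorting, and the Erdős–Szekeres bound can then be checked by machine. Here is a Python sketch (an illustration, not part of the notes):

```python
import bisect
import itertools
import random

def lis_length(perm):
    """Length of a longest increasing subsequence (patience sorting)."""
    tails = []  # tails[k] = smallest possible last value of an increasing subsequence of length k+1
    for x in perm:
        k = bisect.bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

def lds_length(perm):
    return lis_length([-x for x in perm])

# Erdos-Szekeres: every permutation of n^2 + 1 elements has a monotone
# subsequence of length > n.  Check n = 2 exhaustively (5 elements) ...
for perm in itertools.permutations(range(5)):
    assert max(lis_length(perm), lds_length(perm)) >= 3

# ... and n = 3 on random samples of 10 elements.
random.seed(0)
for _ in range(200):
    perm = random.sample(range(10), 10)
    assert max(lis_length(perm), lds_length(perm)) >= 4

print(lis_length([4, 2, 3, 6, 5, 1, 7]))  # 4, matching Example 10.8 below
```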
the length of the longest subsequence could be related to the statistics of the largest
eigenvalue.
(1) The RSK correspondence relates permutations to Young diagrams. Ln goes
under this mapping to the length of the first row of the diagram.
(2) These Young diagrams correspond to non-intersecting paths.
(3) Via Gessel–Viennot the relevant quantities in terms of NC paths have a de-
terminantal form.
(4) Then one has to show that the involved kernel, suitably rescaled, converges to
the Airy kernel.
In the following we want to give some idea of the first two items in the above list; the main (and very hard) part of the proof is to show the convergence to the Airy kernel.
(2) A Young tableau of shape λ is the Young diagram λ filled with numbers 1, . . . , n
such that in any row the numbers are increasing from left to right and in any
column the numbers are increasing from top to bottom. We denote the set of
all Young tableaux of shape λ by Tab λ.
Example 10.6. (1) For $n = 1$ there is only one Young diagram, consisting of a single box, and one corresponding Young tableau, containing the entry 1.
For $n = 2$, there are two Young diagrams, a row of two boxes and a column of two boxes, each of them having exactly one corresponding Young tableau.
(2) For $n = 3$ there are three Young diagrams; the first (a row) and the third (a column) have only one tableau each, but the middle one has two, with rows $(1\,2),(3)$ and $(1\,3),(2)$, respectively.
(3) The Young tableau with rows $(1\,2\,4\,8)$, $(3\,7)$, $(5)$, $(6)$ corresponds to the walk of growing Young tableaux in which, at step $k$, the box containing $k$ is added.
Remark 10.7. Those objects are extremely important since they parametrize the
irreducible representations of Sn :
λ ` n ←→ irreducible representation πλ of Sn .
Furthermore, the dimension of such a representation πλ is given by the number of
tableaux of shape λ. If one recalls that for any finite group one has the general
statement that the sum of the squares of the dimensions over all irreducible repre-
sentations of the group gives the number of elements in the group, then one has for
the symmetric group the statement that
$$\sum_{\lambda \vdash n}\left(\#\operatorname{Tab}\lambda\right)^2 = \#S_n = n!.$$
This shows that there is a bijection between elements in Sn and pairs of tableaux of
the same shape $\lambda \vdash n$. The RSK correspondence is such a concrete bijection, given by an explicit algorithm. It has the property that $L_n$ corresponds under this bijection to the length of the first row of the associated Young diagram $\lambda$.
Example 10.8. For example, under the RSK correspondence, the permutation
$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 4 & 2 & 3 & 6 & 5 & 1 & 7 \end{pmatrix}$$
corresponds to the pair of Young tableaux with rows
$$(1\,3\,5\,7),\ (2\,6),\ (4) \qquad\text{and}\qquad (1\,3\,4\,7),\ (2\,5),\ (6).$$
Note that $L_7(\sigma) = 4$ is the length of the first row.
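The explicit algorithm behind RSK is repeated row insertion ("bumping"). Here is a minimal Python implementation of the insertion tableau $P$ only (an illustrative sketch, not part of the notes), reproducing the tableau of Example 10.8:

```python
import bisect

def rsk_insert(P, x):
    """Row-insert x into the tableau P (a list of increasing rows)."""
    for row in P:
        k = bisect.bisect_right(row, x)
        if k == len(row):
            row.append(x)          # x is largest: append to this row
            return
        row[k], x = x, row[k]      # bump the entry; insert it one row lower
    P.append([x])                  # bumped entry starts a new row

def rsk(perm):
    """Insertion tableau P of the RSK correspondence."""
    P = []
    for x in perm:
        rsk_insert(P, x)
    return P

P = rsk([4, 2, 3, 6, 5, 1, 7])
print(P)                 # [[1, 3, 5, 7], [2, 6], [4]]
assert len(P[0]) == 4    # = L_7(sigma), the length of the first row
```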
11 The Circular Law
11.1 Circular law for Ginibre ensemble
The non-selfadjoint analogue of gue is given by the Ginibre ensemble, where all
entries are independent and complex Gaussians. A standard complex Gaussian is
of the form
x + iy
z= √ ,
2
where $x$ and $y$ are independent standard real Gaussians, i.e., with joint distribution
$$p(x,y)\,dx\,dy = \frac{1}{2\pi}e^{-\frac{x^2}{2}}e^{-\frac{y^2}{2}}\,dx\,dy.$$
2π
If we rewrite this in terms of a density with respect to the Lebesgue measure for
real and imaginary part
$$z = \frac{x+iy}{\sqrt2} = t_1 + it_2, \qquad \bar z = \frac{x-iy}{\sqrt2} = t_1 - it_2,$$
we get
$$p(t_1,t_2)\,dt_1\,dt_2 = \frac1\pi e^{-(t_1^2+t_2^2)}\,dt_1\,dt_2 = \frac1\pi e^{-|z|^2}\,d^2z, \qquad\text{where } d^2z = dt_1\,dt_2.$$
As for the gue case, Theorem 7.6, we can rewrite the density in terms of eigen-
values. Note that the eigenvalues are now complex.
Theorem 11.2. The joint distribution of the complex eigenvalues of an N × N
Ginibre ensemble is given by a density
$$p(z_1,\ldots,z_N) = c_N\exp\left(-\sum_{k=1}^N|z_k|^2\right)\prod_{1\le i<j\le N}|z_i - z_j|^2.$$
Remark 11.3. (1) Note that typically Ginibre matrices are not normal, i.e., AA∗ 6=
A∗ A. This means that one loses the relation between functions in eigenvalues
and traces of functions in the matrix. The latter is what we can control, the
former is what we want to understand.
(2) As in the selfadjoint case the eigenvalues repel, hence there will almost surely
be no multiple eigenvalues. Thus we can also in the Ginibre case diagonalize
our matrix, i.e., A = V DV −1 , where D = diag(z1 , . . . , zN ) contains the eigen-
values. However, V is now not unitary anymore, i.e., eigenvectors for different
eigenvalues are in general not orthogonal. We can also diagonalize A∗ via A∗ =
(V −1 )∗ D∗ V ∗ , but since V −1 6= V ∗ (if A is not normal) we cannot diagonalize A
and A∗ simultaneously. This means that in general, for example Tr(AA∗ A∗ A)
has no clear relation to N ∗ ∗ ∗ ∗
i z¯i z¯i zi . (Note that Tr(AA A A) 6= Tr(AA AA )
P
i=1 z
if AA∗ 6= A∗ A, but of course N N
P P
i=1 zi z¯i z¯i zi = i=1 zi z¯i z1 z¯i .)
(3) In Theorem 11.2 it seems that we have rewritten the density $\exp(-\operatorname{Tr}(AA^*))$ as $\exp(-\sum_{k=1}^N|z_k|^2)$. However, this is more subtle. One can bring any matrix by a unitary conjugation into triangular form, $A = U(Z+T)U^*$ with $Z = \operatorname{diag}(z_1,\ldots,z_N)$ and $T$ strictly upper triangular (Schur decomposition); then $\operatorname{Tr}(AA^*) = \sum_k|z_k|^2 + \sum_{i<j}|t_{ij}|^2$. Integrating out the $t_{ij}$ ($j > i$) then gives the density for the $z_i$.
(4) As for the gue case (Theorem 7.15) we can write the Vandermonde density in a
determinantal form. The only difference is that we have to replace the Hermite
polynomials Hk (x), which orthogonalize the real Gaussian distribution, by
monomials z k , which orthogonalize the complex Gaussian distribution.
Theorem 11.4. The joint eigenvalue distribution of the Ginibre ensemble is of the determinantal form $p(z_1,\ldots,z_N) = \frac{1}{N!}\det(K_N(z_i,z_j))_{i,j=1}^N$ with the kernel
$$K_N(z,w) = \sum_{k=0}^{N-1}\varphi_k(z)\bar\varphi_k(w), \qquad\text{where}\quad \varphi_k(z) = \frac{1}{\sqrt\pi}\,e^{-\frac12|z|^2}\,\frac{z^k}{\sqrt{k!}}.$$
In particular, the averaged eigenvalue density is
$$p_N(z) = \frac1N K_N(z,z) = \frac{1}{N\pi}e^{-|z|^2}\sum_{k=0}^{N-1}\frac{|z|^{2k}}{k!}.$$
Theorem 11.5 (Circular law for the Ginibre ensemble). The averaged eigenvalue distribution of a normalized Ginibre random matrix $\frac{1}{\sqrt N}A_N$ converges for $N\to\infty$ weakly to the uniform distribution on the unit disc of $\mathbb C$, with density $\frac1\pi 1_{\{z\in\mathbb C\,\mid\,|z|\le 1\}}$.
The density of the normalized ensemble is
$$q_N(z) = N\cdot p_N(\sqrt N z) = \frac1\pi e^{-N|z|^2}\sum_{k=0}^{N-1}\frac{(N|z|^2)^k}{k!}.$$
We have to show that this converges to the circular density. For $|z| < 1$ we have
$$e^{N|z|^2} - \sum_{k=0}^{N-1}\frac{(N|z|^2)^k}{k!} = \sum_{k=N}^\infty\frac{(N|z|^2)^k}{k!} \le \frac{(N|z|^2)^N}{N!}\sum_{l=0}^\infty\frac{(N|z|^2)^l}{(N+1)^l} \le \frac{(N|z|^2)^N}{N!}\cdot\frac{1}{1-\frac{N|z|^2}{N+1}},$$
where the geometric series converges since $N|z|^2 < N+1$.
Furthermore, using the lower bound $N! \ge \sqrt{2\pi}\,N^{N+\frac12}e^{-N}$, we calculate
$$e^{-N|z|^2}\frac{(N|z|^2)^N}{N!} \le e^{-N|z|^2}N^N|z|^{2N}\frac{1}{\sqrt{2\pi}\,N^{N+\frac12}}e^N = \frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt N}e^{-N|z|^2}e^{N\ln|z|^2}e^N = \frac{1}{\sqrt{2\pi}}\frac{\exp\left[N\left(-|z|^2+\ln|z|^2+1\right)\right]}{\sqrt N}\xrightarrow{N\to\infty}0.$$
Here we used that $-|z|^2+\ln|z|^2+1 < 0$ for $|z| < 1$. Hence we conclude that, for $|z| < 1$,
$$1 - e^{-N|z|^2}\sum_{k=0}^{N-1}\frac{(N|z|^2)^k}{k!} \le e^{-N|z|^2}\frac{(N|z|^2)^N}{N!}\cdot\frac{1}{1-\frac{N|z|^2}{N+1}}\xrightarrow{N\to\infty}0.$$
Similarly, for $|z| > 1$,
$$\sum_{k=0}^{N-1}\frac{(N|z|^2)^k}{k!} \le \frac{(N|z|^2)^{N-1}}{(N-1)!}\sum_{l=0}^{N-1}\left(\frac{N-1}{N|z|^2}\right)^l \le \frac{(N|z|^2)^{N-1}}{(N-1)!}\cdot\frac{1}{1-\frac{N-1}{N|z|^2}},$$
and multiplying by $e^{-N|z|^2}$ and using Stirling as before (note that $-|z|^2+\ln|z|^2+1 < 0$ also for $|z| > 1$) shows that $q_N(z) \to 0$ for $|z| > 1$.
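The convergence of $q_N$ is already visible for moderate $N$; the following Python snippet (an illustration, not part of the notes) evaluates the partial exponential sum directly, without sampling any matrix:

```python
import math

def q_N(z_abs2, N):
    """q_N at |z|^2 = z_abs2: (1/pi) e^{-N|z|^2} sum_{k<N} (N|z|^2)^k / k!"""
    lam = N * z_abs2
    term = math.exp(-lam)        # e^{-lam} * lam^0 / 0!
    total = term
    for k in range(1, N):
        term *= lam / k          # next Poisson-type term, computed iteratively
        total += term
    return total / math.pi

N = 50
inside = q_N(0.25, N)    # |z| = 0.5, inside the unit disc
outside = q_N(2.25, N)   # |z| = 1.5, outside the unit disc
print(inside, outside)   # approximately 1/pi and approximately 0
```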
Remark 11.6. (1) The convergence also holds almost surely. Here is a plot of the 3000 eigenvalues of one realization of a $3000\times3000$ Ginibre matrix. (Figure: the eigenvalues fill the unit disc roughly uniformly.)
(2) The circular law also holds for non-Gaussian entries, but proving this is much harder than the corresponding extension of the semicircle law from the Gaussian case to Wigner matrices. Note that only the existence of the second moment of the entry distribution is required; higher moments need not be finite.
Remark 11.8. (1) It took quite a while to prove this in full generality. Here is a
bit about the history of the proof.
• 60’s, Mehta proved it (see his book) in expectation for Ginibre ensemble;
• 80’s, Silverstein proved almost sure convergence for Ginibre;
• 80’s, 90’s, Girko outlined the main ideas for a proof in the general case;
• 1997, Bai gave the first rigorous proof, under additional assumptions on
the distribution
• papers by Tao–Vu, Götze–Tikhomirov, Pan–Zhou and others, weakening
more and more the assumptions;
• 2010, Tao–Vu gave final version under the assumption of the existence of
the second moment.
(2) For measures on C one can use ∗-moments or the Stieltjes transform to describe
them, but controlling the convergence properties is the main problem.
(3) For a matrix A its ∗-moments are all expressions of the form tr(Aε(1) · · · Aε(m) ),
where m ∈ N and ε(1), . . . , ε(m) ∈ {1, ∗}. The eigenvalue distribution
$$\mu_A = \frac1N\left(\delta_{z_1} + \cdots + \delta_{z_N}\right) \qquad (z_1,\ldots,z_N \text{ the complex eigenvalues of } A)$$
Example. Consider the nilpotent shift and the cyclic shift
$$A_N = \begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ & & & & 0 \end{pmatrix} \qquad\text{and}\qquad B_N = \begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ 1 & & & & 0 \end{pmatrix}.$$
Then $\mu_{A_N} = \delta_0$, but $\mu_{B_N}$ is the uniform distribution on the $N$-th roots of unity. Hence $\mu_{A_N} \to \delta_0$, whereas $\mu_{B_N}$ converges to the uniform distribution on the unit circle. However, the limits of the $\ast$-moments are the same for $A_N$ and $B_N$.
(4) For each measure $\mu$ on $\mathbb C$ one has the Stieltjes transform
$$S_\mu(w) = \int_{\mathbb C}\frac{1}{z-w}\,d\mu(z).$$
to selfadjoint matrices. The left hand side for all λ determines µA and the
right hand side is about selfadjoint matrices
$$|A - \lambda 1| = \sqrt{(A-\lambda 1)(A-\lambda 1)^*}.$$
In this analytic approach one still needs to control convergence properties. For
this, estimates of probabilities of small singular values are crucial.
For more details on this one should have a look at the survey of Bordenave–
Chafai, Around the circular law.
12 Several Independent GUEs
and Asymptotic Freeness
Up to now, we have only considered limits N → ∞ of one random matrix AN . But
often one has several matrix ensembles and would like to understand the “joint”
distribution; e.g., in order to use them as building blocks for more complicated
random matrix models.
In the case of two matrices $A_1, A_2$ the notion $\mu_{A_1,A_2}$ has only one meaning, namely the collection of all mixed moments $\operatorname{tr}[A_{i_1}\cdots A_{i_m}]$ with $m\in\mathbb N$ and $i_1,\ldots,i_m\in\{1,2\}$. If $A_1$ and $A_2$ do not commute then there exists no probability measure $\mu$ on $\mathbb R^2$ such that
$$\operatorname{tr}[A_{i_1}\cdots A_{i_m}] = \int t_{i_1}\cdots t_{i_m}\,d\mu(t_1,t_2)$$
are independent sets of Gaussian random variables. Equivalently, this can be charac-
terized by the requirement that all entries of all matrices together form a collection of
independent standard Gaussian variables (real on the diagonal, complex otherwise).
Hence we can express this again in terms of the Wick formula 2.8 as
$$E\left[a^{(i_1)}_{k_1l_1}\cdots a^{(i_m)}_{k_ml_m}\right] = \sum_{\pi\in\mathcal P_2(m)}E_\pi\left[a^{(i_1)}_{k_1l_1},\ldots,a^{(i_m)}_{k_ml_m}\right].$$
Now we can essentially repeat the calculations from Remark 2.14 for our mixed moments:
$$E[\operatorname{tr}(A_{i_1}\cdots A_{i_m})] = \frac{1}{N^{1+\frac m2}}\sum_{k_1,\ldots,k_m=1}^N E\left[a^{(i_1)}_{k_1k_2}a^{(i_2)}_{k_2k_3}\cdots a^{(i_m)}_{k_mk_1}\right]$$
$$= \frac{1}{N^{1+\frac m2}}\sum_{k_1,\ldots,k_m=1}^N\ \sum_{\pi\in\mathcal P_2(m)}E_\pi\left[a^{(i_1)}_{k_1k_2},a^{(i_2)}_{k_2k_3},\ldots,a^{(i_m)}_{k_mk_1}\right]$$
$$= \frac{1}{N^{1+\frac m2}}\sum_{k_1,\ldots,k_m=1}^N\ \sum_{\pi\in\mathcal P_2(m)}\ \prod_{(p,q)\in\pi}E\left[a^{(i_p)}_{k_pk_{p+1}}a^{(i_q)}_{k_qk_{q+1}}\right]$$
$$= \frac{1}{N^{1+\frac m2}}\sum_{k_1,\ldots,k_m=1}^N\ \sum_{\pi\in\mathcal P_2(m)}\ \prod_{(p,q)\in\pi}[k_p = k_{q+1}]\,[k_q = k_{p+1}]\,[i_p = i_q]$$
$$= \frac{1}{N^{1+\frac m2}}\sum_{\substack{\pi\in\mathcal P_2(m)\\ i_p = i_q\ \forall (p,q)\in\pi}}\ \sum_{k_1,\ldots,k_m=1}^N\ \prod_p\left[k_p = k_{\gamma\pi(p)}\right] = \frac{1}{N^{1+\frac m2}}\sum_{\substack{\pi\in\mathcal P_2(m)\\ i_p = i_q\ \forall (p,q)\in\pi}}N^{\#(\gamma\pi)}.$$
We say that such a $\pi$ respects $i = (i_1,\ldots,i_m)$, and put
$$\mathcal{NC}_2^{[i]}(m) := \left\{\pi\in\mathcal{NC}_2(m)\mid \pi\text{ respects } i\right\};$$
thus
$$\lim_{N\to\infty}E[\operatorname{tr}(A_{i_1}\cdots A_{i_m})] = \#\mathcal{NC}_2^{[i]}(m).$$
Proof. The genus expansion follows from our computation above. The limit for $N\to\infty$ follows, as for Wigner's semicircle law 2.21, from the fact that
$$\lim_{N\to\infty}N^{\#(\gamma\pi)-\frac m2-1} = \begin{cases}1, & \pi\in\mathcal{NC}_2(m),\\ 0, & \pi\notin\mathcal{NC}_2(m).\end{cases}$$
12.3 The concept of free independence
Remark 12.4. We would like to find some structure in those limiting moments. We
prefer to talk directly about the limit instead of making asymptotic statements. In
the case of one gue, we had the semicircle µW as a limiting analytic object. Now
we do not have an analytic object in the limit, but we can organize our distribution
as the limit of moments in a more algebraic way.
Remark 12.6. (1) Note that if we consider only one of the $s_i$, then its distribution is just the collection of Catalan numbers, hence corresponds to the semicircle, which we understand quite well.
(2) If we consider all s1 , . . . , sr , then their joint distribution is a large collection
of numbers. We claim that the following theorem discovers some important
structure in those.
We say that $s_1,\ldots,s_r$ are free (or freely independent); in terms of the independent gue random matrices, we say that $A_1^{(N)},\ldots,A_r^{(N)}$ are asymptotically free. These notions and the results above are all due to Dan Voiculescu.
Proof. It suffices to prove the statement for polynomials of the form
$$p_k(s_{i_k}) = s_{i_k}^{p_k} - \varphi\left(s_{i_k}^{p_k}\right)$$
for any power $p_k$, since general polynomials can be written as linear combinations of those; the general statement then follows by linearity. So we have to prove that
$$\varphi\left[\left(s_{i_1}^{p_1}-\varphi(s_{i_1}^{p_1})\right)\cdots\left(s_{i_m}^{p_m}-\varphi(s_{i_m}^{p_m})\right)\right] = 0.$$
We have
$$\varphi\left[\left(s_{i_1}^{p_1}-\varphi(s_{i_1}^{p_1})\right)\cdots\left(s_{i_m}^{p_m}-\varphi(s_{i_m}^{p_m})\right)\right] = \sum_{M\subseteq[m]}(-1)^{|M|}\,\varphi\Bigl(\prod_{j\notin M}s_{i_j}^{p_j}\Bigr)\prod_{j\in M}\varphi\bigl(s_{i_j}^{p_j}\bigr),$$
where the product over $j\notin M$ is taken in the original order, with
$$\varphi\bigl(s_{i_j}^{p_j}\bigr) = \varphi(s_{i_j}\cdots s_{i_j}) = \#\mathcal{NC}_2(p_j)$$
and
$$\varphi\Bigl(\prod_{j\notin M}s_{i_j}^{p_j}\Bigr) = \#\mathcal{NC}_2^{[\ldots]}\Bigl(\sum_{j\notin M}p_j\Bigr),$$
where the superscript $[\ldots]$ indicates that only pairings respecting the indices are counted.
Let us put
$$I_1 = \{1,\ldots,p_1\},\quad I_2 = \{p_1+1,\ldots,p_1+p_2\},\quad\ldots,\quad I_m = \{p_1+\cdots+p_{m-1}+1,\ldots,p_1+\cdots+p_m\}$$
and $I = I_1\cup I_2\cup\cdots\cup I_m$, with the corresponding tuple of indices
$$[\ldots] = [i_1,\ldots,i_1,\,i_2,\ldots,i_2,\,\ldots,\,i_m,\ldots,i_m].$$
Then
$$\varphi\Bigl(\prod_{j\notin M}s_{i_j}^{p_j}\Bigr)\prod_{j\in M}\varphi\bigl(s_{i_j}^{p_j}\bigr) = \#\bigl\{\pi\in\mathcal{NC}_2^{[\ldots]}(I)\ \big|\ \text{for all } j\in M\text{ all elements of } I_j\text{ are paired amongst each other}\bigr\}.$$
Let us denote
$$\mathcal{NC}_2^{[\ldots]}(I:j) := \left\{\pi\in\mathcal{NC}_2^{[\ldots]}(I)\mid \text{elements in } I_j\text{ are only paired amongst each other}\right\}.$$
Then, by the inclusion-exclusion formula,
$$\varphi\left[\left(s_{i_1}^{p_1}-\varphi(s_{i_1}^{p_1})\right)\cdots\left(s_{i_m}^{p_m}-\varphi(s_{i_m}^{p_m})\right)\right] = \sum_{M\subseteq[m]}(-1)^{|M|}\cdot\#\bigcap_{j\in M}\mathcal{NC}_2^{[\ldots]}(I:j) = \#\Bigl(\mathcal{NC}_2^{[\ldots]}(I)\setminus\bigcup_j\mathcal{NC}_2^{[\ldots]}(I:j)\Bigr).$$
These are the $\pi\in\mathcal{NC}_2^{[\ldots]}(I)$ such that at least one element of each interval $I_j$ is paired with an element from another interval $I_k$. Since $i_1\ne i_2$, $i_2\ne i_3$, ..., $i_{m-1}\ne i_m$, we cannot connect neighboring intervals, and each interval would have to be connected to another interval in a non-crossing way. But there is no such $\pi$, hence
$$\varphi\left[\left(s_{i_1}^{p_1}-\varphi(s_{i_1}^{p_1})\right)\cdots\left(s_{i_m}^{p_m}-\varphi(s_{i_m}^{p_m})\right)\right] = \#\Bigl(\mathcal{NC}_2^{[\ldots]}(I)\setminus\bigcup_j\mathcal{NC}_2^{[\ldots]}(I:j)\Bigr) = 0,$$
as claimed.
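Since the limiting mixed moments count index-respecting non-crossing pairings, the factorization rules implied by freeness can be checked by brute force. Here is a small Python sketch (an illustration, not part of the notes):

```python
def pairings(elems):
    """All pairings of a list of positions."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for i in range(len(rest)):
        for p in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, rest[i])] + p

def noncrossing(pairs):
    return not any(a < c < b < d or c < a < d < b
                   for (a, b) in pairs for (c, d) in pairs)

def mixed_moment(colors):
    """phi(s_{i_1} ... s_{i_m}) = number of color-respecting non-crossing pairings."""
    m = len(colors)
    if m % 2:
        return 0
    return sum(1 for p in pairings(list(range(m)))
               if all(colors[a] == colors[b] for (a, b) in p) and noncrossing(p))

# phi(s_i^p s_j^q) = phi(s_i^p) phi(s_j^q) for i != j:
assert mixed_moment([1] * 4 + [2] * 2) == mixed_moment([1] * 4) * mixed_moment([2] * 2)
# phi(s1 s2 s1 s2) = 0:
assert mixed_moment([1, 2, 1, 2]) == 0
# phi(s1^2 s2^2 s1^2 s2^2) = 2*1*1 + 1*1*2 - 1 = 3, as predicted by freeness:
assert mixed_moment([1, 1, 2, 2, 1, 1, 2, 2]) == 3
print("freeness factorization rules verified on small cases")
```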
Remark 12.8. (1) Note that in Theorem 12.7 we have traded the explicit descrip-
tion of our moments for implicit relations between the moments.
(2) For example, the simplest relations from Theorem 12.7 are
$$\varphi\left[(s_i^p - \varphi(s_i^p)1)(s_j^q - \varphi(s_j^q)1)\right] = 0$$
for $i\ne j$, which can be reformulated to
$$\varphi(s_i^ps_j^q) - \varphi(s_i^p)\varphi(s_j^q) - \varphi(s_i^p)\varphi(s_j^q) + \varphi(s_i^p)\varphi(s_j^q)\varphi(1) = 0,$$
i.e.,
$$\varphi(s_i^ps_j^q) = \varphi(s_i^p)\varphi(s_j^q).$$
Those relations are quickly getting more complicated. For example,
ϕ [(sp11 − ϕ(sp11 )1)(sq21 − ϕ(sq21 )1)(sp12 − ϕ(sp12 )1)(sq22 − ϕ(sq22 )1)] = 0
leads to
ϕ (sp11 sq21 sp12 s2q2 ) = ϕ sp11 +p2 ϕ (sq21 ) ϕ (sq22 )
+ ϕ (sp11 ) ϕ (sp12 ) ϕ sq21 +q2
− ϕ (sp11 ) ϕ (sq21 ) ϕ (sp12 ) ϕ (sq22 ) .
These relations are to be considered as non-commutative versions for the fac-
toriziation rules of expectations of independent random variables.
(3) One might ask: what is the point of finding those relations between the moments if we know the moments in a more explicit form anyhow?
Answer: those relations occur in many more situations. For example, independent Wishart matrices satisfy the same relations, even though the explicit form of their mixed moments is quite different from the gue case.
Furthermore, we can control what happens with these relations much better than with the explicit moments if we deform our setting or construct new random matrices out of other ones.
Not to mention that those relations also show up in very different corners of mathematics (like operator algebras).
To make a long story short: the relations from Theorem 12.7 are really worth investigating further, not just in a random matrix context, but also for their own sake. This is the topic of a course on Free Probability Theory, which can, for example, be found here:
rolandspeicher.files.wordpress.com/2019/08/free-probability.pdf
13 Exercises
13.1 Assignment 1
Exercise 1. Make yourself familiar with MATLAB (or any other programming
language which allows you to generate random matrices and calculate eigenvalues).
In particular, you should try to generate random matrices and calculate and plot
their eigenvalues.
Exercise 2. In this exercise we want to derive the explicit formula for the Catalan numbers. We define numbers $c_k$ by the recursion
$$c_k = \sum_{l=0}^{k-1} c_l\,c_{k-l-1} \qquad (13.1)$$
and show that their generating function $f(z) = \sum_{k\ge 0} c_k z^k$ satisfies
$$f(z) = 1 + z\,f(z)^2.$$
(4) Conclude that
$$c_k = C_k = \frac{1}{k+1}\binom{2k}{k}.$$
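Although the assignments suggest MATLAB, the recursion (13.1) is easy to check against the closed form in any language; the following Python sketch is only a numerical sanity check, not part of the derivation.

```python
from math import comb

def catalan_by_recursion(n):
    # c_0 = 1, c_k = sum_{l=0}^{k-1} c_l * c_{k-l-1}, the recursion (13.1)
    c = [1]
    for k in range(1, n + 1):
        c.append(sum(c[l] * c[k - l - 1] for l in range(k)))
    return c

# compare with the closed form C_k = binom(2k, k) / (k + 1)
print(catalan_by_recursion(10) == [comb(2 * k, k) // (k + 1) for k in range(11)])  # True
```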
Exercise 3. Consider the semicircular distribution, given by the density function
$$\frac{1}{2\pi}\sqrt{4-x^2}\,\mathbf{1}_{[-2,2]}(x), \qquad (13.2)$$
where $\mathbf{1}_{[-2,2]}$ denotes the indicator function of the interval $[-2,2]$. Show that (13.2) indeed defines a probability measure, i.e.,
$$\frac{1}{2\pi}\int_{-2}^{2}\sqrt{4-x^2}\,dx = 1.$$
Moreover, show that the even moments of the measure are given by the Catalan numbers and the odd ones vanish, i.e.,
$$\frac{1}{2\pi}\int_{-2}^{2} x^n\sqrt{4-x^2}\,dx = \begin{cases} 0, & n \text{ odd},\\ C_k, & n = 2k.\end{cases}$$
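For a quick numerical cross-check of these moment formulas (a Python sketch using a simple midpoint rule, not a proof):

```python
import math

def semicircle_moment(n, steps=100000):
    # midpoint rule for (1/(2*pi)) * int_{-2}^{2} x^n * sqrt(4 - x^2) dx
    h = 4.0 / steps
    total = 0.0
    for i in range(steps):
        x = -2.0 + (i + 0.5) * h
        total += x ** n * math.sqrt(4.0 - x * x)
    return total * h / (2.0 * math.pi)

print([round(semicircle_moment(2 * k), 3) for k in range(5)])  # close to 1, 1, 2, 5, 14
print(round(semicircle_moment(5), 6))                          # odd moments vanish
```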
13.2 Assignment 2
Exercise 4. Using your favorite programming language or computer algebra system, generate $N\times N$ random matrices for $N = 3, 9, 100$. Produce a plot of the eigenvalue distribution for a single random matrix, as well as a plot for the average over a reasonable number of matrices of the given size. The entries should be independent and identically distributed (i.i.d.) according to
(1) the Bernoulli distribution $\frac{1}{2}(\delta_{-1}+\delta_1)$, where $\delta_x$ denotes the Dirac measure with atom $x$;
(2) the normal distribution.
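In MATLAB one would use `eig` and `hist` directly. As a language-agnostic sketch, the following pure-Python snippet generates a Bernoulli Wigner matrix and checks its first normalized trace moments against the semicircle predictions $C_1 = 1$ and $C_2 = 2$; this is a moment check, not the histogram the exercise asks for.

```python
import random

random.seed(0)

def wigner_bernoulli(N):
    # selfadjoint N x N matrix, iid +/-1 entries up to symmetry, scaled by 1/sqrt(N)
    A = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i, N):
            A[i][j] = A[j][i] = random.choice((-1.0, 1.0)) / N ** 0.5
    return A

N = 100
A = wigner_bernoulli(N)
A2 = [[sum(A[i][k] * A[k][j] for k in range(N)) for j in range(N)] for i in range(N)]
m2 = sum(A2[i][i] for i in range(N)) / N                               # tr(A^2)
m4 = sum(A2[i][j] * A2[j][i] for i in range(N) for j in range(N)) / N  # tr(A^4)
print(round(m2, 2), round(m4, 2))  # close to C_1 = 1 and C_2 = 2
```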
Exercise 5. Prove Proposition 2.2, i.e., compute the moments of a standard Gaussian random variable:
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} t^n e^{-t^2/2}\,dt = \begin{cases} 0, & n \text{ odd},\\ (n-1)!!, & n \text{ even}.\end{cases}$$
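A hedged numerical check of these Gaussian moments (Python, midpoint quadrature on a truncated domain; the tail beyond $|t| = 10$ is negligible):

```python
import math

def gaussian_moment(n, R=10.0, steps=100000):
    # (1/sqrt(2*pi)) * int t^n exp(-t^2/2) dt, truncated to [-R, R]
    h = 2.0 * R / steps
    total = 0.0
    for i in range(steps):
        t = -R + (i + 0.5) * h
        total += t ** n * math.exp(-t * t / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)

def double_factorial(m):
    # (n-1)!! with the convention (-1)!! = 1
    out = 1
    while m > 1:
        out *= m
        m -= 2
    return out

print([round(gaussian_moment(n), 4) for n in range(1, 9)])
# odd moments ~ 0; even moments ~ (n-1)!! = 1, 3, 15, 105
```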
(1) Show that
$$C \exp\Bigl(-N\,\frac{\operatorname{Tr}(A^2)}{2}\Bigr)\,dA,$$
where $C$ is a constant and
$$dA = \prod_{i=1}^{N} dx_{ii} \prod_{i<j} dx_{ij}\,dy_{ij}.$$
13.3 Assignment 3
Exercise 8. Produce histograms for various random matrix ensembles.
(1) Produce histograms for the averaged situation: average over 1000 realizations of the eigenvalue distribution of an $N\times N$ Gaussian random matrix (or alternatively one with $\pm 1$ entries) and compare this with one random realization for $N = 5, 50, 500, 1000$.
(2) Check via histograms that Wigner's semicircle law is insensitive to the common distribution of the entries as long as those are independent; compare typical realisations for $N = 100$ and $N = 3000$ for different distributions of the entries: $\pm 1$, Gaussian, uniform distribution on the interval $[-1,+1]$.
(3) Check what happens when we give up the constraint that the entries are centered; take for example the uniform distribution on $[0,2]$.
(4) Check whether the semicircle law is sensitive to what happens on the diagonal
of the matrix. Choose one distribution (e.g. Gaussian) for the off-diagonal
elements and another distribution for the elements on the diagonal (extreme
case: put the diagonal equal to zero).
(5) Try to see what happens when we take a distribution for the entries which
does not have finite second moment; for example, the Cauchy distribution.
Exercise 9. In the proof of Theorem 3.9 we have seen that the m-th moment of
a Wigner matrix is asymptotically counted by the number of partitions σ ∈ P(m),
for which the corresponding graph Gσ is a tree; then the corresponding walk i1 →
i2 → · · · → im → i1 (where ker i = σ) uses each edge exactly twice, in opposite
directions. Assign to such a σ a pairing by opening/closing a pair when an edge is
used for the first/second time in the corresponding walk.
(1) Show that this map gives a bijection between the σ ∈ P(m) for which Gσ is a
tree and non-crossing pairings π ∈ N C2 (m).
(2) Is there a relation between σ and γπ, under this bijection?
Exercise 10. For a probability measure $\mu$ on $\mathbb{R}$ we define its Stieltjes transform $S_\mu$ by
$$S_\mu(z) := \int_{\mathbb{R}} \frac{1}{t-z}\,d\mu(t)$$
for all $z\in\mathbb{C}^+ := \{z\in\mathbb{C} \mid \operatorname{Im}(z) > 0\}$. Show the following for a Stieltjes transform $S = S_\mu$.
(1) $S : \mathbb{C}^+ \to \mathbb{C}^+$.
(2) $S$ is analytic on $\mathbb{C}^+$.
(3) We have
$$\lim_{y\to\infty} iy\,S(iy) = -1 \qquad\text{and}\qquad \sup_{y>0,\,x\in\mathbb{R}} y\,|S(x+iy)| = 1.$$
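These properties can be observed numerically. As an illustration (a Python sketch with the semicircle distribution as test measure, using a midpoint rule for the integral), one sees $\operatorname{Im} S > 0$ on $\mathbb{C}^+$ and $iy\,S(iy)\to -1$:

```python
import math

def stieltjes_semicircle(z, steps=20000):
    # midpoint rule for S(z) = (1/(2*pi)) * int_{-2}^{2} sqrt(4 - t^2) / (t - z) dt
    h = 4.0 / steps
    s = 0j
    for i in range(steps):
        t = -2.0 + (i + 0.5) * h
        s += math.sqrt(4.0 - t * t) / (t - z)
    return s * h / (2.0 * math.pi)

print(stieltjes_semicircle(0.5 + 1.0j).imag > 0)  # S maps C+ into C+
for y in (10.0, 100.0):
    print(round(abs(1j * y * stieltjes_semicircle(1j * y) + 1), 4))  # -> 0 as y grows
```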
13.4 Assignment 4
Exercise 11. (1) Let $\nu$ be the Cauchy distribution, i.e.,
$$d\nu(t) = \frac{1}{\pi}\,\frac{1}{1+t^2}\,dt.$$
Show that the Stieltjes transform of $\nu$ is given by
$$S(z) = \frac{1}{-i-z} \qquad\text{for } z\in\mathbb{C}^+.$$
(Note that this formula is not valid in $\mathbb{C}^-$.)
Recover from this the Cauchy distribution via the Stieltjes inversion formula.
(2) Let $A$ be a selfadjoint matrix in $M_N(\mathbb{C})$ and consider its spectral distribution $\mu_A = \frac{1}{N}\sum_{i=1}^{N}\delta_{\lambda_i}$, where $\lambda_1,\dots,\lambda_N$ are the eigenvalues (counted with multiplicity) of $A$. Prove that for any $z\in\mathbb{C}^+$ the Stieltjes transform $S_{\mu_A}$ of $\mu_A$ is given by
$$S_{\mu_A}(z) = \operatorname{tr}\bigl[(A-zI)^{-1}\bigr].$$
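For a concrete instance of this identity, a pure-Python check on a $2\times 2$ selfadjoint matrix (an illustration, not a proof; tr is the normalized trace):

```python
import cmath

def resolvent_trace(a, b, d, z):
    # normalized trace of (A - zI)^{-1} for A = [[a, b], [b, d]], via the adjugate
    det = (a - z) * (d - z) - b * b
    return ((a - z) + (d - z)) / det / 2

def spectral_stieltjes(a, b, d, z):
    # S_{mu_A}(z) = (1/2) * sum over eigenvalues of 1/(lambda - z)
    disc = cmath.sqrt((a - d) ** 2 + 4 * b * b)
    lam1, lam2 = ((a + d) + disc) / 2, ((a + d) - disc) / 2
    return (1 / (lam1 - z) + 1 / (lam2 - z)) / 2

z = 0.3 + 0.7j
gap = abs(resolvent_trace(1.0, 2.0, -0.5, z) - spectral_stieltjes(1.0, 2.0, -0.5, z))
print(gap < 1e-12)  # True
```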
Exercise 12. Let $(\mu_N)_{N\in\mathbb{N}}$ be a sequence of probability measures on $\mathbb{R}$ which converges vaguely to $\mu$. Assume that $\mu$ is also a probability measure. Show the following.
(1) The sequence $(\mu_N)_{N\in\mathbb{N}}$ is tight, i.e., for each $\varepsilon > 0$ there is a compact interval $I = [-R,R]$ such that $\mu_N(\mathbb{R}\setminus I)\le\varepsilon$ for all $N\in\mathbb{N}$.
(2) $\mu_N$ converges to $\mu$ also weakly.
Exercise 13. The problems with measures being determined by their moments, and with convergence in moments implying weak convergence, stem mainly from the behaviour of our probability measures around infinity. If we restrict everything to a compact interval, then the main statements follow quite easily by relying on the Weierstrass theorem on approximating continuous functions by polynomials. In the following you should not use Theorem 4.12.
In the following let I = [−R, R] be a fixed compact interval in R.
(1) Assume that µ is a probability measure on R which has its support in I (i.e.,
µ(I) = 1). Show that all moments of µ are finite and that µ is determined by
its moments (among all probability measures on R).
(2) Consider in addition a sequence of probability measures $\mu_N$ such that $\mu_N(I) = 1$ for all $N$. Show that the following are equivalent:
• $\mu_N$ converges weakly to $\mu$;
• the moments of $\mu_N$ converge to the corresponding moments of $\mu$.
13.5 Assignment 5
In this assignment we want to investigate the behaviour of the limiting eigenvalue distribution of matrices under certain perturbations. In order to do so, it is crucial to deal with different kinds of matrix norms. We recall the most important ones for the following exercises. Let $A\in M_N(\mathbb{C})$; then we define the following norms.
• The spectral norm (or operator norm):
$$\|A\| = \max\bigl\{\sqrt{\lambda} : \lambda \text{ is an eigenvalue of } AA^*\bigr\}.$$
Some of its important properties are:
(i) It is submultiplicative, i.e., for $A, B\in M_N(\mathbb{C})$ one has
$$\|AB\| \le \|A\|\cdot\|B\|.$$
(ii) It is also given as the operator norm
$$\|A\| = \sup_{x\in\mathbb{C}^N,\,x\ne 0} \frac{\|Ax\|_2}{\|x\|_2}.$$
Exercise 14. In this exercise we will prove some useful facts about these norms, which you will need in the next exercise when addressing the problem of perturbed random matrices. Prove the following properties of the matrix norms.
(1) For $A, B\in M_N(\mathbb{C})$ we have $|\operatorname{Tr}(AB)| \le \|A\|_2\cdot\|B\|_2$.
(2) Let $A\in M_N(\mathbb{C})$ be positive and $B\in M_N(\mathbb{C})$ arbitrary. Prove that
$$|\operatorname{Tr}(AB)| \le \|B\|\operatorname{Tr}(A).$$
($A\in M_N(\mathbb{C})$ is positive if there is a matrix $C\in M_N(\mathbb{C})$ such that $A = C^*C$; this is equivalent to the fact that $A$ is selfadjoint and all the eigenvalues of $A$ are non-negative.)
(3) Let $A\in M_N(\mathbb{C})$ be normal, i.e., $AA^* = A^*A$, and $B\in M_N(\mathbb{C})$ arbitrary. Prove that
$$\max\{\|AB\|_2, \|BA\|_2\} \le \|B\|_2\cdot\|A\|.$$
Hint: normal matrices are unitarily diagonalizable.
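A numerical spot-check of (1) and a one-sided version of (3) on small random real matrices (pure-Python sketch; the spectral norm is approximated by power iteration on $A^T A$, which is an assumption of this sketch rather than part of the exercise):

```python
import math, random

random.seed(1)
N = 5

def randmat():
    return [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

def frob(A):
    # Frobenius (Hilbert-Schmidt) norm ||A||_2
    return math.sqrt(sum(x * x for row in A for x in row))

def specnorm(A, iters=200):
    # power iteration on A^T A approximates the spectral norm ||A||
    M = matmul([[A[j][i] for j in range(N)] for i in range(N)], A)
    v = [1.0] * N
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(N)) for i in range(N)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return math.sqrt(sum(v[i] * sum(M[i][j] * v[j] for j in range(N)) for i in range(N)))

A, B = randmat(), randmat()
tr_AB = sum(matmul(A, B)[i][i] for i in range(N))
print(abs(tr_AB) <= frob(A) * frob(B))                     # |Tr(AB)| <= ||A||_2 ||B||_2
print(frob(matmul(A, B)) <= specnorm(A) * frob(B) + 1e-8)  # ||AB||_2 <= ||A|| ||B||_2
```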
Exercise 15. In this main exercise we want to investigate the behaviour of the eigenvalue distribution of selfadjoint matrices with respect to certain types of perturbations.
(1) Let $A\in M_N(\mathbb{C})$ be selfadjoint, $z\in\mathbb{C}^+$ and $R_A(z) = (A-zI)^{-1}$. Prove that
$$\|R_A(z)\| \le \frac{1}{\operatorname{Im}(z)}$$
and that $R_A(z)$ is normal.
(2) First we study a general perturbation by a selfadjoint matrix. Let, for any $N\in\mathbb{N}$, $X_N = (X_{ij})_{i,j=1}^N$ and $Y_N = (Y_{ij})_{i,j=1}^N$ be selfadjoint matrices in $M_N(\mathbb{C})$ and define $\tilde X_N = X_N + Y_N$. Show that
$$\Bigl|\operatorname{tr}\bigl(R_{\frac{1}{\sqrt N}X_N}(z)\bigr) - \operatorname{tr}\bigl(R_{\frac{1}{\sqrt N}\tilde X_N}(z)\bigr)\Bigr| \le \frac{1}{(\operatorname{Im}(z))^2}\sqrt{\frac{\operatorname{tr}(Y_N^2)}{N}}.$$
(3) In this part we want to show that the diagonal of a matrix does not contribute to the eigenvalue distribution in the large $N$ limit, if it is not too ill-behaved. As before, consider a selfadjoint matrix $X_N = (X_{ij})_{i,j=1}^N\in M_N(\mathbb{C})$; let $X_N^D = \operatorname{diag}(X_{11},\dots,X_{NN})$ be the diagonal part of $X_N$ and $X_N^{(0)} = X_N - X_N^D$ the part of $X_N$ with zero diagonal. Assume that $\|X_N^D\|_2^2 \le N$ for all $N\in\mathbb{N}$. Show that
$$\operatorname{tr}\bigl(R_{\frac{1}{\sqrt N}X_N}(z)\bigr) - \operatorname{tr}\bigl(R_{\frac{1}{\sqrt N}X_N^{(0)}}(z)\bigr) \to 0, \qquad\text{as } N\to\infty.$$
13.6 Assignment 6
Exercise 16. We will address here concentration estimates for the law of large numbers, and see that control of higher moments allows stronger estimates. Let $X_i$ be a sequence of independent and identically distributed random variables with common mean $\mu = E[X_i]$. We put
$$S_n := \frac{1}{n}\sum_{i=1}^{n} X_i.$$
(1) Assume that the variance $\operatorname{Var}[X_i]$ is finite. Prove that we then have the weak law of large numbers, i.e., convergence in probability of $S_n$ to the mean: for any $\varepsilon > 0$
$$P\bigl(\omega \bigm| |S_n(\omega) - \mu| \ge \varepsilon\bigr) \to 0, \qquad\text{for } n\to\infty.$$
(2) Assume that the fourth moment of the $X_i$ is finite, $E[X_i^4] < \infty$. Show that we then have the strong law of large numbers, i.e.,
$$S_n \to \mu, \qquad\text{almost surely.}$$
One should also note that our assumptions for the weak and strong law of large numbers are far from optimal. Even the existence of the variance is not needed for them, but proofs of such general versions need other tools than our simple consequences of Chebyshev's inequality.
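The weak law in part (1) can also be watched empirically; a small Python simulation (an illustration only, with Bernoulli $\pm 1$ variables, so that $\mu = 0$ and $\operatorname{Var}[X_i] = 1$):

```python
import random

random.seed(2)

def sample_mean(n):
    # S_n for iid Bernoulli +/-1 variables
    return sum(random.choice((-1, 1)) for _ in range(n)) / n

eps, trials = 0.1, 200
freqs = {}
for n in (100, 2500):
    # empirical estimate of P(|S_n - mu| >= eps); Chebyshev bounds it by 1/(n * eps^2)
    freqs[n] = sum(abs(sample_mean(n)) >= eps for _ in range(trials)) / trials
print(freqs)  # the frequency drops as n grows
```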
Exercise 17. Let $X_N = \frac{1}{\sqrt N}(x_{ij})_{i,j=1}^N$, where the $x_{ij}$ are all (without symmetry condition) independent and identically distributed with standard complex Gaussian distribution. We denote the adjoint (i.e., conjugate transpose) of $X_N$ by $X_N^*$.
(1) By following the ideas from our proof of Wigner's semicircle law for the gue in Chapter 3, show the following: the averaged trace of any $*$-moment in $X_N$ and $X_N^*$, i.e.,
$$E\bigl[\operatorname{tr}\bigl(X_N^{p(1)}\cdots X_N^{p(m)}\bigr)\bigr] \qquad\text{where } p(1),\dots,p(m)\in\{1,*\},$$
is for $N\to\infty$ given by the number of non-crossing pairings $\pi$ in $NC_2(m)$ which satisfy the additional requirement that each block of $\pi$ connects an $X$ with an $X^*$.
(2) Use the result from part (1) to show that the asymptotic averaged eigenvalue
distribution of WN := XN XN∗ is the same as the square of the semicircle
distribution, i.e. the distribution of Y 2 if Y has a semicircular distribution.
(3) Calculate the explicit form of the asymptotic averaged eigenvalue distribution
of WN .
(4) Again, the convergence here also holds in probability and almost surely. Produce histograms of samples of the random matrix $W_N$ for large $N$ and compare them with the analytic result from (3).
Exercise 18. We consider now random matrices $W_N = X_N X_N^*$ as in Exercise 17, but now we allow $X_N$ to be a rectangular matrix, i.e., of the form
$$X_N = \frac{1}{\sqrt p}\,(x_{ij})_{1\le i\le N,\,1\le j\le p},$$
where again all $x_{ij}$ are independent and identically distributed. We now allow real or complex entries. (In case the entries are real, $X_N^*$ is of course just the transpose $X_N^T$.) Such matrices are called Wishart matrices. Note that we now cannot multiply $X_N$ and $X_N^*$ in arbitrary order, but alternating products as in $W_N$ make sense.
(1) What is the general relation between the eigenvalues of $X_N X_N^*$ and the eigenvalues of $X_N^* X_N$? Note that the first is an $N\times N$ matrix, whereas the second is a $p\times p$ matrix.
(2) Produce histograms for the eigenvalues of $W_N := X_N X_N^*$ for $N = 50$, $p = 100$ as well as for $N = 500$, $p = 1000$, for different distributions of the $x_{ij}$:
• standard real Gaussian random variables
• standard complex Gaussian random variables
• Bernoulli random variables, i.e., xij takes on values +1 and −1, each with
probability 1/2.
(3) Compare your histograms with the density, for $c = 0.5 = N/p$, of the Marchenko–Pastur distribution, which is given by
$$\frac{\sqrt{(\lambda_+ - x)(x - \lambda_-)}}{2\pi c x}\,\mathbf{1}_{[\lambda_-,\lambda_+]}(x), \qquad\text{where } \lambda_\pm := \bigl(1\pm\sqrt{c}\,\bigr)^2.$$
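As a sanity check on this density (a Python quadrature sketch, no random matrices involved), one can verify that for $c = 0.5$ it integrates to 1 and has mean 1:

```python
import math

def mp_density(x, c):
    # Marchenko-Pastur density with ratio c, supported on [(1-sqrt(c))^2, (1+sqrt(c))^2]
    lam_minus, lam_plus = (1 - math.sqrt(c)) ** 2, (1 + math.sqrt(c)) ** 2
    if not lam_minus < x < lam_plus:
        return 0.0
    return math.sqrt((lam_plus - x) * (x - lam_minus)) / (2 * math.pi * c * x)

def mp_moment(n, c, steps=100000):
    # midpoint rule for int x^n * density over the support
    a, b = (1 - math.sqrt(c)) ** 2, (1 + math.sqrt(c)) ** 2
    h = (b - a) / steps
    return sum((a + (i + 0.5) * h) ** n * mp_density(a + (i + 0.5) * h, c) * h
               for i in range(steps))

print(round(mp_moment(0, 0.5), 3))  # total mass, close to 1
print(round(mp_moment(1, 0.5), 3))  # mean, close to 1
```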
13.7 Assignment 7
Exercise 19. Prove, by adapting the proof for the goe case and parametrizing a unitary matrix in the form $U = e^{-iH}$, where $H$ is a selfadjoint matrix, Theorem 7.6: the joint distribution of the eigenvalues of a gue(n) is given by a density
$$\hat c_N\, e^{-\frac{N}{2}(\lambda_1^2+\cdots+\lambda_N^2)} \prod_{k<l}(\lambda_l - \lambda_k)^2.$$
Exercise 20. In order to get a feeling for the repulsion of the eigenvalues of goe
and gue compare histograms for the following situations:
• the eigenvalues of a gue(n) matrix for one realization
• the eigenvalues of a goe(n) matrix for one realization
• N independently chosen realizations of a random variable with semicircular
distribution
for a few suitable values of N (for example, take N = 50 or N = 500).
Exercise 21. For small values of N (like N = 2, 3, 4, 5, 10) plot the histogram of
averaged versions of gue(n) and of goe(n) and notice the fine structure in the gue
case. In the next assignment we will compare this with the analytic expression for
the gue(n) density from class.
13.8 Assignment 8
Exercise 22. In this exercise we define the Hermite polynomials $H_n$ by
$$H_n(x) = (-1)^n e^{x^2/2}\,\frac{d^n}{dx^n}\, e^{-x^2/2}$$
and want to show that they are the same polynomials we defined in Definition 7.10 and that they satisfy the recursion relation. So, starting from the above definition, show the following.
(1) For any $n\ge 1$,
$$x\,H_n(x) = H_{n+1}(x) + n\,H_{n-1}(x).$$
(2) $H_n$ is a monic polynomial of degree $n$. Furthermore, it is an even function if $n$ is even and an odd function if $n$ is odd.
(3) The $H_n$ are orthogonal with respect to the Gaussian measure
$$d\gamma(x) = (2\pi)^{-1/2} e^{-x^2/2}\,dx.$$
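A numerical illustration of (1) and (3) in Python (not a substitute for the proofs): build $H_n$ from the recursion, check low-degree cases against the derivative definition, and test the orthogonality by quadrature.

```python
import math

def hermite(n, x):
    # H_n via the recursion H_{n+1}(x) = x H_n(x) - n H_{n-1}(x),
    # equivalent to part (1): x H_n = H_{n+1} + n H_{n-1}
    h_prev, h = 1.0, x
    if n == 0:
        return 1.0
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h

# low-degree cases of the derivative definition: H_2 = x^2 - 1, H_3 = x^3 - 3x
for x in (-1.3, 0.4, 2.0):
    assert abs(hermite(2, x) - (x * x - 1)) < 1e-12
    assert abs(hermite(3, x) - (x ** 3 - 3 * x)) < 1e-12

def gauss_inner(m, n, R=10.0, steps=100000):
    # int H_m H_n dgamma, dgamma = (2*pi)^(-1/2) exp(-x^2/2) dx; expect delta_{mn} * n!
    h = 2.0 * R / steps
    total = 0.0
    for i in range(steps):
        x = -R + (i + 0.5) * h
        total += hermite(m, x) * hermite(n, x) * math.exp(-x * x / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)

print(round(gauss_inner(3, 3), 3), round(gauss_inner(2, 3), 6))  # close to 3! = 6 and 0
```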
for the unnormalized gue(n) to the density qN (λ) for the normalized gue(n)
(with second moment normalized to 1).
(2) Then average over sufficiently many normalized gue(n), plot their histograms, and compare this to the analytic density $q_N(\lambda)$. Do this at least for $N = 1, 2, 3, 5, 10, 20, 50$.
(3) Check also numerically that $q_N$ converges, for $N\to\infty$, to the semicircle.
(4) For comparison, also average over goe(n) and over Wigner ensembles with non-Gaussian distribution for the entries, for some small $N$.
Exercise 24. In this exercise we will approximate the Dyson Brownian motions from Section 8.3 by their discretized random walk versions and plot the corresponding walks of the eigenvalues.
(1) Approximate the Dyson Brownian motion by its discretized random walk version
$$A_N(k) := \Delta\cdot\sum_{i=1}^{k} A_N^{(i)}, \qquad\text{for } 1\le k\le K,$$
where $A_N^{(1)},\dots,A_N^{(K)}$ are $K$ independent normalized gue(n) random matrices and $\Delta$ is a time increment. Generate a random realization of such a Dyson random walk $A_N(k)$ and plot the $N$ eigenvalues $\lambda_1(k),\dots,\lambda_N(k)$ of $A_N(k)$ versus $k$ in the same plot to see the time evolution of the $N$ eigenvalues. Produce at least plots for three different values of $N$.
Hint: Start with $N = 15$, $\Delta = 0.01$, $K = 1500$, but also play around with those parameters.
(2) For the same parameters as in part (1), consider the situation where you replace gue by goe and produce corresponding plots. What is the effect of this on the behaviour of the eigenvalues?
(3) For the three considered cases of $N$ in parts (1) and (2), plot also $N$ independent random walks in one plot, i.e.,
$$\tilde\lambda_N(k) := \Delta\cdot\sum_{i=1}^{k} x^{(i)}, \qquad\text{for } 1\le k\le K,$$
where $x^{(1)},\dots,x^{(K)}$ are $K$ independent real standard Gaussian random variables.
You should get some plots like in Section 8.3.
13.9 Assignment 9
Exercise 25. Produce histograms for the Tracy–Widom distribution by plotting $(\lambda_{\max} - 2)N^{2/3}$.
(1) Produce histograms for the largest eigenvalue of gue(n), for N = 50, N = 100,
N = 200, with at least 5000 trials in each case.
(2) Produce histograms for the largest eigenvalue of goe(n), for N = 50, N = 100,
N = 200, with at least 5000 trials in each case.
(3) Consider also real and complex Wigner matrices with non-Gaussian distribu-
tion for the entries.
(4) Check numerically whether putting the diagonal equal to zero (in gue or
Wigner) has an effect on the statistics of the largest eigenvalue.
(5) Bonus: Take a situation where we do not have convergence to semicircle, e.g.,
Wigner matrices with Cauchy distribution for the entries. Is there a reasonable
guess for the asymptotics of the distribution of the largest eigenvalue?
(6) Superbonus: Compare the situation of repelling eigenvalues with "independent" eigenvalues. Produce $N$ independent copies $x_1,\dots,x_N$ of variables distributed according to the semicircle distribution and then take the maximal value $x_{\max}$ of these. Produce a histogram of the statistics of $x_{\max}$. Is there a limit of this for $N\to\infty$; how does one have to scale with $N$?
Exercise 26. Prove the estimate for the Catalan numbers
$$C_k \le \frac{4^k}{k^{3/2}\sqrt{\pi}} \qquad\text{for all } k\in\mathbb{N}.$$
Show that this gives the right asymptotics, i.e., prove that
$$\lim_{k\to\infty} \frac{4^k}{k^{3/2}\,C_k} = \sqrt{\pi}.$$
Exercise 27. Let $H_n(x)$ be the Hermite polynomials. The Christoffel–Darboux identity says that
$$\sum_{k=0}^{n-1} \frac{H_k(x)H_k(y)}{k!} = \frac{H_n(x)H_{n-1}(y) - H_{n-1}(x)H_n(y)}{(x-y)\,(n-1)!}.$$
(1) Check this identity for $n = 1$ and $n = 2$.
(2) Prove the identity for general $n$.
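The identity can be sanity-checked numerically for several $n$ (a Python sketch, with $H_n$ generated by the three-term recursion from Exercise 22):

```python
import math

def hermite(n, x):
    # probabilists' Hermite polynomials via H_{n+1} = x H_n - n H_{n-1}
    h_prev, h = 1.0, x
    if n == 0:
        return 1.0
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h

def christoffel_darboux_gap(n, x, y):
    # |LHS - RHS| of the Christoffel-Darboux identity at the point (x, y)
    lhs = sum(hermite(k, x) * hermite(k, y) / math.factorial(k) for k in range(n))
    rhs = (hermite(n, x) * hermite(n - 1, y) - hermite(n - 1, x) * hermite(n, y)) \
          / ((x - y) * math.factorial(n - 1))
    return abs(lhs - rhs)

print(max(christoffel_darboux_gap(n, 0.8, -0.3) for n in range(1, 8)))  # ~ 0
```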
13.10 Assignment 10
Exercise 28. Work out the details for the “almost sure” part of Corollary 9.6, i.e.,
prove that almost surely the largest eigenvalue of gue(n) converges, for N → ∞,
to 2.
Exercise 29. Consider the rescaled Hermite functions
$$\tilde\Psi_N(x) := N^{1/12}\,\Psi_N\bigl(2\sqrt N + xN^{-1/6}\bigr).$$
(1) Check numerically that the rescaled Hermite functions have a limit for $N\to\infty$ by plotting them for different values of $N$.
(2) Familiarize yourself with the Airy function. Compare the above plots of $\tilde\Psi_N$ for large $N$ with a plot of the Airy function.
Hint: MATLAB has an implementation of the Airy function, see
https://de.mathworks.com/help/symbolic/airy.html
Exercise 30. Prove that the Hermite functions satisfy the following differential equations:
$$\Psi_n'(x) = -\frac{x}{2}\,\Psi_n(x) + \sqrt{n}\,\Psi_{n-1}(x)$$
and
$$\Psi_n''(x) + \Bigl(n + \frac{1}{2} - \frac{x^2}{4}\Bigr)\Psi_n(x) = 0.$$
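The second differential equation can be verified numerically by a central finite difference (a Python sketch; here we take $\Psi_n(x) = H_n(x)e^{-x^2/4}$ up to normalization, which is harmless since the ODE does not see the normalization constant):

```python
import math

def hermite(n, x):
    # probabilists' Hermite polynomials via H_{n+1} = x H_n - n H_{n-1}
    h_prev, h = 1.0, x
    if n == 0:
        return 1.0
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h

def psi(n, x):
    # Hermite function up to normalization (the ODE is insensitive to the constant)
    return hermite(n, x) * math.exp(-x * x / 4.0)

def ode_residual(n, x, h=1e-4):
    # central difference for Psi'' plus (n + 1/2 - x^2/4) * Psi; should be ~ 0
    second = (psi(n, x + h) - 2.0 * psi(n, x) + psi(n, x - h)) / (h * h)
    return abs(second + (n + 0.5 - x * x / 4.0) * psi(n, x))

print(max(ode_residual(n, 0.7) for n in range(6)))  # ~ 0
```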
13.11 Assignment 11
Exercise 31. Read the notes "Random Matrix Theory and its Innovative Applications" by A. Edelman and Y. Wang,
http://math.mit.edu/~edelman/publications/random_matrix_theory_innovative.pdf
and implement its "Code 7" for calculating the Tracy–Widom distribution (via solving the Painlevé II equation); compare the output with the histogram for the rescaled largest eigenvalue of the gue from Exercise 25. You should get a plot like the one after Theorem 9.10.
Exercise 32. For $N = 100, 1000, 5000$ plot in the complex plane the eigenvalues of one $N\times N$ random matrix $\frac{1}{\sqrt N}A_N$, where all entries (without symmetry condition) are independent and identically distributed according to the
(i) standard Gaussian distribution;
(ii) symmetric Bernoulli distribution;
(iii) Cauchy distribution.
14 Literature
Books
(1) Gernot Akemann, Jinho Baik, Philippe Di Francesco: The Oxford Handbook
of Random Matrix Theory, Oxford Handbooks in Mathematics, 2011.
(2) Greg Anderson, Alice Guionnet, Ofer Zeitouni: An Introduction to Random
Matrices, Cambridge University Press, 2010.
(3) Zhidong Bai, Jack Silverstein: Spectral Analysis of Large Dimensional Random Matrices, Springer-Verlag, 2010.
(4) Patrick Billingsley: Probability and Measure, John Wiley & Sons, 3rd edition,
1995.
(5) Stéphane Boucheron, Gábor Lugosi, Pascal Massart: Concentration inequali-
ties: A nonasymptotic theory of independence, Oxford University Press, Ox-
ford, 2013.
(6) Alice Guionnet: Large Random Matrices: Lectures on Macroscopic Asymptotics, Springer-Verlag, 2009.
(7) Madan Lal Mehta: Random Matrices, Elsevier Academic Press, 3rd edition, 2004.
(8) James Mingo, Roland Speicher: Free Probability and Random Matrices, Springer-Verlag, 2017.
(9) Alexandru Nica, Roland Speicher: Lectures on the Combinatorics of Free Probability, Cambridge University Press, 2006.