
Saarland University

Faculty of Mathematics and Computer Science


Department of Mathematics

Random matrices
Lecture notes
Winter 2019/20

(version 2 from May 14, 2020)

Prof. Dr. Roland Speicher


This is an introduction to random matrix theory, giving an impression of some of
the most important aspects of this modern subject. In particular, it covers the basic
combinatorial and analytic theory around Wigner’s semicircle law, featuring also
concentration phenomena, and the Tracy–Widom distribution of the largest eigen-
value. The circular law and a discussion of Voiculescu’s multivariate extension of the
semicircle law, as an appetizer for free probability theory, also make an appearance.
This manuscript here is an updated version of a joint manuscript with Marwa
Banna from an earlier version of this course; it relies substantially on the sources
given in the literature; in particular, the lecture notes of Todd Kemp were inspiring
and very helpful at various places.
The material here was presented in the winter term 2019/20 at Saarland University
in 24 lectures of 90 minutes each. The lectures were recorded and can be found online
at https://www.math.uni-sb.de/ag/speicher/web_video/index.html.

Table of contents
1 Introduction 7
1.1 Brief history of random matrix theory . . . . . . . . . . . . . . . . . . 7
1.2 What are random matrices and what do we want to know about them? 8
1.3 Wigner’s semicircle law . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Universality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 Concentration phenomena . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6 From histograms to moments . . . . . . . . . . . . . . . . . . . . . . 13
1.7 Choice of scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.8 The semicircle and its moments . . . . . . . . . . . . . . . . . . . . . 15
1.9 Types of convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2 Gaussian Random Matrices: Wick Formula and Combinatorial Proof of Wigner's Semicircle 19
2.1 Gaussian random variables and Wick formula . . . . . . . . . . . . . 19
2.2 Gaussian random matrices and genus expansion . . . . . . . . . . . . 23
2.3 Non-crossing pairings . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4 Semicircle law for GUE . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3 Wigner Matrices: Combinatorial Proof of Wigner’s Semicircle Law 33


3.1 Wigner matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2 Combinatorial description of moments of Wigner matrices . . . . . . 34
3.3 Semicircle law for Wigner matrices . . . . . . . . . . . . . . . . . . . 37

4 Analytic Tools: Stieltjes Transform and Convergence of Measures 39


4.1 Stieltjes transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.2 Convergence of probability measures . . . . . . . . . . . . . . . . . . 43
4.3 Probability measures determined by moments . . . . . . . . . . . . . 46
4.4 Description of weak convergence via the Stieltjes transform . . . . . . 47

5 Analytic Proof of Wigner's Semicircle Law for Gaussian Random Matrices 49
5.1 GOE random matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2 Stein's identity for independent Gaussian variables . . . . . . . . . . 51
5.3 Semicircle law for GOE . . . . . . . . . . . . . . . . . . . . . . . . . . 52

6 Concentration Phenomena and Stronger Forms of Convergence for the Semicircle Law 55
6.1 Forms of convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2 Markov’s and Chebyshev’s inequality . . . . . . . . . . . . . . . . . . 56
6.3 Poincaré inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.4 Concentration for tr[RA (z)] via Poincaré inequality . . . . . . . . . . 63
6.5 Logarithmic Sobolev inequalities . . . . . . . . . . . . . . . . . . . . . 65

7 Analytic Description of the Eigenvalue Distribution of Gaussian Random Matrices 67
7.1 Joint eigenvalue distribution for GOE and GUE . . . . . . . . . . . . 67
7.2 Rewriting the Vandermonde . . . . . . . . . . . . . . . . . . . . . . . 72
7.3 Rewriting the GUE density in terms of Hermite kernels . . . . . . . . 73

8 Determinantal Processes and Non-Crossing Paths: Karlin–McGregor and Gessel–Viennot 81
8.1 Stochastic version à la Karlin–McGregor . . . . . . . . . . . . . . . . 81
8.2 Combinatorial version à la Gessel–Viennot . . . . . . . . . . . . . . . 83
8.3 Dyson Brownian motion and non-intersecting paths . . . . . . . . . . 86

9 Statistics of the Largest Eigenvalue and Tracy–Widom Distribution 89


9.1 Some heuristics on single eigenvalues . . . . . . . . . . . . . . . . . . 89
9.2 Tracy–Widom distribution . . . . . . . . . . . . . . . . . . . . . . . . 90
9.3 Convergence of the largest eigenvalue to 2 . . . . . . . . . . . . . . . 91
9.4 Estimate for fluctuations . . . . . . . . . . . . . . . . . . . . . . . . . 95
9.5 Non-rigorous derivation of Tracy–Widom distribution . . . . . . . . . 96
9.6 Proof of the Harer–Zagier recursion . . . . . . . . . . . . . . . . . . . 101

10 Statistics of the Longest Increasing Subsequence 107


10.1 Complete order is impossible . . . . . . . . . . . . . . . . . . . . . . . 107
10.2 Tracy–Widom for the asymptotic distribution of Ln . . . . . . . . . . 109
10.3 Very rough sketch of the proof of the Baik, Deift, Johansson theorem 109
10.3.1 RSK correspondence . . . . . . . . . . . . . . . . . . . . . . . 110
10.3.2 Relation to non-intersecting paths . . . . . . . . . . . . . . . . 112

11 The Circular Law 113
11.1 Circular law for Ginibre ensemble . . . . . . . . . . . . . . . . . . . . 113
11.2 General circular law . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

12 Several Independent GUEs and Asymptotic Freeness 119


12.1 The problem of non-commutativity . . . . . . . . . . . . . . . . . . . 119
12.2 Joint moments of independent GUEs . . . . . . . . . . . . . . . . . . 120
12.3 The concept of free independence . . . . . . . . . . . . . . . . . . . . 122

13 Exercises 127
13.1 Assignment 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
13.2 Assignment 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
13.3 Assignment 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
13.4 Assignment 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
13.5 Assignment 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
13.6 Assignment 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
13.7 Assignment 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
13.8 Assignment 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
13.9 Assignment 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
13.10 Assignment 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
13.11 Assignment 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

14 Literature 141

1 Introduction
We start with giving a brief history of the subject and a feeling for some of the basic
objects, questions, and methods - this is just motivational and should be seen as an
appetizer. Rigorous versions of the statements will come in later chapters.

1.1 Brief history of random matrix theory


• 1897: the paper Über die Erzeugung der Invarianten durch Integration by
Hurwitz is, according to Diaconis and Forrester, the first paper on random
matrices in mathematics;
• 1928: usually, the first appearance of random matrices, for fixed size N , is
attributed to a paper of Wishart in statistics;
• 1955: Wigner introduced random matrices as statistical models for heavy nu-
clei and studied in particular the asymptotics for N → ∞ (the so-called “large
N limit”); this set off a lot of activity around random matrices in physics;
• since 1960's: random matrices have become important tools in physics; in par-
ticular in the context of quantum chaos and universality questions; important
work was done by Mehta and Dyson;
• 1967: the first and influential book on Random Matrices by Mehta appeared;
• 1967: Marchenko and Pastur calculated the asymptotics N → ∞ of Wishart
matrices;
• ∼ 1972: a relation between the statistics of eigenvalues of random matrices
and the zeros of the Riemann ζ-function was conjectured by Montgomery and
Dyson; with substantial evidence given by numerical calculations of Odlyzko;
this made the subject more and more popular in mathematics;
• since 1990’s: random matrices are studied more and more extensively in math-
ematics, in the context of quite different topics, like
◦ Tracy-Widom distribution of largest eigenvalue
◦ free probability theory
◦ universality of fluctuations
◦ “circular law”
◦ and many more

1.2 What are random matrices and what do we
want to know about them?
A random matrix is a matrix A = (a_ij)_{i,j=1}^N where the entries a_ij are chosen randomly,
and we are mainly interested in the eigenvalues of the matrices. Often we require
A to be selfadjoint, which guarantees that its eigenvalues are real.
Example 1.1. Choose aij ∈ {−1, +1} with aij = aji for all i, j. We consider all
such matrices and ask for typical or generic behaviour of the eigenvalues. In a more
probabilistic language we declare all allowed matrices to have the same probability
and we ask for probabilities of properties of the eigenvalues. We can do this for
different sizes N . To get a feeling, let us look at different N .
• For N = 1 we have two matrices.

  matrix    eigenvalues    probability of the matrix
  (1)       +1             1/2
  (−1)      −1             1/2

• For N = 2 we have eight matrices (written row by row, rows separated by semicolons).

  matrix               eigenvalues    probability of the matrix
  ( 1  1 ;  1  1 )     0, 2           1/8
  ( 1  1 ;  1 −1 )     −√2, √2        1/8
  ( 1 −1 ; −1  1 )     0, 2           1/8
  ( −1  1 ;  1  1 )    −√2, √2        1/8
  ( 1 −1 ; −1 −1 )     −√2, √2        1/8
  ( −1  1 ;  1 −1 )    −2, 0          1/8
  ( −1 −1 ; −1  1 )    −√2, √2        1/8
  ( −1 −1 ; −1 −1 )    −2, 0          1/8


• For general N, we have 2^{N(N+1)/2} matrices, each counting with probability
2^{−N(N+1)/2}. Of course, there are always very special matrices such as
A = ( 1 ··· 1 ; ⋮ ⋱ ⋮ ; 1 ··· 1 ),
where all entries are +1. Such a matrix has an atypical eigenvalue behaviour
(namely, only two eigenvalues, N and 0, the latter with multiplicity N − 1);
however, the probability of such atypical behaviours will become small if we
increase N. What we are interested in is the behaviour of most of the matrices
for large N.
Question. What is the “typical” behaviour of the eigenvalues?
• Here are two “randomly generated” (by throwing a coin for each of the entries
on and above the diagonal) 8 × 8 symmetric matrices with ±1 entries ...
−1 −1 1 1 1 −1 1 −1
  1 −1 −1 1 −1 −1 1 −1

−1 −1 1 −1 −1 −1 1 −1 −1 1 −1 1 1 −1 1 1 
1 1 1 1 −1 −1 1 −1 −1 −1 −1 −1 1 1 1 −1
 1 −1 1 −1 −1 −1 1 1  1 1 −1 1 −1 1 1 1 
   
 1 −1 −1 −1 1 1 −1 −1 −1 1 1 −1 −1 −1 −1 −1
−1 −1 −1 −1 1 1 −1 1 
 −1 −1 1 1 −1 1 −1 −1

 
1 1 1 1 −1 −1 −1 −1 1 1 1 1 −1 −1 −1 −1
−1 −1 −1 1 −1 1 −1 −1 −1 1 −1 1 −1 −1 −1 −1

... and their corresponding histograms of the 8 eigenvalues


[Figure: two histograms of the 8 eigenvalues, plotted on the range −10 to 10]

The message of those histograms is not so clear - apart from maybe that
degeneration of eigenvalues is atypical. However, if we increase N further
then there will appear much more structure.
• Here are the eigenvalue histograms for two “random” 100 × 100 matrices ...

[Figure: two eigenvalue histograms on the range −2.5 to 2.5]

• ... and here for two “random” 3000 × 3000 matrices ...

[Figure: two eigenvalue histograms on the range −2.5 to 2.5]

To be clear, no coins were thrown for producing those matrices, but we relied on
the MATLAB procedure for creating random matrices. Note that we also rescaled
our matrices, as we will address in Section 1.7.
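
To get a feeling for these pictures one can redo the experiment. The notes used MATLAB; the following is only a minimal sketch in Python/NumPy of the same experiment (the function name random_sign_matrix and all parameter choices are ours, not part of the notes):

import numpy as np
import matplotlib.pyplot as plt

def random_sign_matrix(N, rng):
    """Symmetric N x N matrix with +-1 entries, rescaled by 1/sqrt(N)."""
    A = rng.choice([-1.0, 1.0], size=(N, N))
    A = np.triu(A)                      # keep the upper triangle (incl. diagonal) ...
    A = A + A.T - np.diag(np.diag(A))   # ... and mirror it to make A symmetric
    return A / np.sqrt(N)

rng = np.random.default_rng(0)
for N in (100, 3000):
    eigs = np.linalg.eigvalsh(random_sign_matrix(N, rng))
    plt.hist(eigs, bins=50, density=True, alpha=0.5, label=f"N = {N}")

# overlay the semicircular density (1/(2*pi)) * sqrt(4 - x^2) on [-2, 2]
x = np.linspace(-2, 2, 400)
plt.plot(x, np.sqrt(4 - x**2) / (2 * np.pi), "k", label="semicircle")
plt.legend(); plt.show()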

1.3 Wigner’s semicircle law
What we see in the above figures is the most basic and important result of random
matrix theory, the so-called Wigner's semicircle law . . .

[Figure: the semicircular density, plotted on the range −2.5 to 2.5]

. . . which says that typically the eigenvalue distribution of such a random matrix
converges to Wigner’s semicircle for N → ∞.

eigenvalue distribution → semicircle


[Figure: an eigenvalue histogram next to the semicircular density, both on the range −2.5 to 2.5]

Note the quite surprising feature that the limit of the random eigenvalue dis-
tribution for N → ∞ is a deterministic object - the semicircle distribution. The
randomness disappears for large N .

1.4 Universality
This statement is valid much more generally. Choose the aij not just from {−1, +1}
but, for example,
• aij ∈ {1, 2, 3, 4, 5, 6},
• aij normally (Gauß) distributed,
• aij distributed according to your favorite distribution,
but still independent (apart from symmetry), then we still have the same result:
The eigenvalue distribution typically converges to a semicircle for N → ∞.

1.5 Concentration phenomena


The (quite amazing) fact that the a priori random eigenvalue distribution is, for
N → ∞, not random anymore, but concentrated on one deterministic distribution
(namely the semicircle) is an example of the general high-dimensional phenomenon
of “measure concentration”.
Example 1.2. To illustrate this let us give an easy but illustrative example of such
a concentration in high dimensions; namely that the volume of a ball is essentially
sitting in the surface.
Denote by B_r(0) the ball of radius r about 0 in R^n and for 0 < ε < 1 consider the
ε-neighborhood of the surface inside the ball, B := {x ∈ R^n | 1 − ε ≤ ‖x‖ ≤ 1}. As
we know the volume of balls,
vol(B_r(0)) = r^n · π^{n/2} / Γ(n/2 + 1),
we can calculate the volume of B as
vol(B) = vol(B_1(0)) − vol(B_{1−ε}(0)) = (π^{n/2} / Γ(n/2 + 1)) · (1 − (1 − ε)^n).
Thus,
vol(B) / vol(B_1(0)) = 1 − (1 − ε)^n → 1   for n → ∞.
This says that in high dimensions the volume of a ball is concentrated in an arbitrar-
ily small neighborhood of the surface. This is, of course, not true in small dimension -
hence from our usual 3-dimensional perspective this appears quite counter-intuitive.
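
A two-line numerical illustration of the ratio vol(B)/vol(B_1(0)) = 1 − (1 − ε)^n (the choice ε = 0.01 and the values of n are arbitrary):

eps = 0.01
for n in (3, 100, 1000, 10000):
    # fraction of the unit ball's volume within distance eps of the surface
    print(n, 1 - (1 - eps) ** n)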

1.6 From histograms to moments
Let A_N = A = (a_ij)_{i,j=1}^N be our selfadjoint matrix with a_ij = ±1 randomly chosen.
Then we typically see for the eigenvalues of A a histogram over an interval [s, t] which,
for N → ∞, approaches the semicircular density over [s, t].

This convergence means
#{eigenvalues in [s, t]} / N  →  ∫_s^t dµ_W = ∫_s^t p_W(x) dx   for N → ∞,
where µ_W is the semicircle distribution, with density p_W.

The left-hand side of this is difficult to calculate directly, but we note that the
above statement is the same as
(1/N) Σ_{i=1}^N 1_{[s,t]}(λ_i)  →  ∫_R 1_{[s,t]}(x) dµ_W(x)   for N → ∞,   (?)
where λ_1, ..., λ_N are the eigenvalues of A counted with multiplicity and 1_{[s,t]} is the
characteristic function of [s, t], i.e.,
1_{[s,t]}(x) = 1 if x ∈ [s, t],   1_{[s,t]}(x) = 0 if x ∉ [s, t].

Hence in (?) we are claiming that
(1/N) Σ_{i=1}^N f(λ_i)  →  ∫_R f(x) dµ_W(x)   for N → ∞
for all f = 1_{[s,t]}. It is easier to calculate this for other functions f, in particular for
f of the form f(x) = x^n, i.e.,
(1/N) Σ_{i=1}^N λ_i^n  →  ∫_R x^n dµ_W(x)   for N → ∞;   (??)
the latter are the moments of µ_W. (Note that µ_W must necessarily be a probability
measure.)
We will see later that in our case the validity of (?) for all s < t is equivalent to
the validity of (??) for all n. Hence we want to show (??) for all n.
Remark 1.3. The above raises of course the question: What is the advantage of (??)
over (?), or of xn over 1[s,t] ?
Note that A = A^* is selfadjoint and hence can be diagonalized, i.e., A = UDU^*,
where U is unitary and D is diagonal with d_ii = λ_i for all i (where λ_1, ..., λ_N are
the eigenvalues of A, counted with multiplicity). Moreover, we have
A^n = (UDU^*)^n = U D^n U^*   with   D^n = diag(λ_1^n, ..., λ_N^n),
hence
Σ_{i=1}^N λ_i^n = Tr(D^n) = Tr(U D^n U^*) = Tr(A^n)
and thus
(1/N) Σ_{i=1}^N λ_i^n = (1/N) Tr(A^n).

Notation 1.4. We denote by tr = (1/N) Tr the normalized trace of matrices, i.e.,
tr((a_ij)_{i,j=1}^N) = (1/N) Σ_{i=1}^N a_ii.

So we are claiming that for our matrices we typically have that
tr(A_N^n)  →  ∫ x^n dµ_W(x)   for N → ∞.

The advantage in this formulation is that the quantity tr(AnN ) can be expressed in
terms of the entries of the matrix, without actually having to calculate the eigen-
values.
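
As a small numerical illustration (our own sketch, not from the notes) that the normalized trace of powers reproduces the eigenvalue sums:

import numpy as np

rng = np.random.default_rng(1)
N = 500
A = rng.choice([-1.0, 1.0], size=(N, N))
A = np.triu(A); A = A + A.T - np.diag(np.diag(A))   # symmetric +-1 matrix
A /= np.sqrt(N)

eigs = np.linalg.eigvalsh(A)
for n in (2, 4, 6):
    via_eigs  = np.mean(eigs ** n)                          # (1/N) sum_i lambda_i^n
    via_trace = np.trace(np.linalg.matrix_power(A, n)) / N  # tr(A^n), no eigenvalues needed
    print(n, via_eigs, via_trace)   # the two numbers agree up to rounding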

1.7 Choice of scaling


Note that we need to choose the right scaling in N for the existence of the limit
N → ∞. For the case a_ij ∈ {±1} with A_N = A_N^* we have
tr(A_N²) = (1/N) Σ_{i,j=1}^N a_ij a_ji = (1/N) Σ_{i,j=1}^N a_ij² = (1/N) · N² = N.
Since this has to converge for N → ∞ we should rescale our matrices,
A_N  ↦  (1/√N) A_N,
i.e., we consider matrices A_N = (a_ij)_{i,j=1}^N, where a_ij = ±1/√N. For this scaling we
claim that we typically have that
tr(A_N^n)  →  ∫ x^n dµ_W(x)   for N → ∞
for a deterministic probability measure µ_W.

1.8 The semicircle and its moments


It’s now probably time to give the precise definition of the semicircle distribution
and, in the light of what we have to prove, also its moments.
Definition 1.5. (1) The (standard) semicircular distribution µ_W is the measure
on [−2, 2] with density
dµ_W(x) = (1/(2π)) √(4 − x²) dx;
its graph is a semicircle over [−2, 2] with maximal height 1/π.

(2) The Catalan numbers (C_k)_{k≥0} are given by
C_k = (1/(k + 1)) · (2k choose k).
They look like this: 1, 1, 2, 5, 14, 42, 132, . . .

Theorem 1.6. (1) The Catalan numbers have the following properties.
(i) The Catalan numbers satisfy the following recursion:
C_k = Σ_{l=0}^{k−1} C_l C_{k−l−1}   (k ≥ 1).
(ii) The Catalan numbers are uniquely determined by this recursion and by
the initial value C_0 = 1.
(2) The semicircular distribution µ_W is a probability measure, i.e.,
(1/(2π)) ∫_{−2}^{2} √(4 − x²) dx = 1,
and its moments are given by
(1/(2π)) ∫_{−2}^{2} x^n √(4 − x²) dx = 0 for n odd,   and   = C_k for n = 2k even.
Exercises 2 and 3 address the proof of Theorem 1.6.
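
Both parts of Theorem 1.6 are easy to check numerically; the following sketch (our own helper names, using scipy for the integral) computes the Catalan numbers by the recursion and compares them with the semicircle moments:

import numpy as np
from scipy.integrate import quad

def catalan(kmax):
    """Catalan numbers via the recursion C_k = sum_l C_l C_{k-l-1}, C_0 = 1."""
    C = [1]
    for k in range(1, kmax + 1):
        C.append(sum(C[l] * C[k - 1 - l] for l in range(k)))
    return C

def semicircle_moment(n):
    """n-th moment of the semicircular distribution on [-2, 2]."""
    val, _ = quad(lambda x: x**n * np.sqrt(4 - x**2) / (2 * np.pi), -2, 2)
    return val

print(catalan(5))                                                  # [1, 1, 2, 5, 14, 42]
print([round(semicircle_moment(2 * k), 6) for k in range(6)])      # the same numbers
print([round(semicircle_moment(2 * k + 1), 6) for k in range(3)])  # odd moments vanish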

1.9 Types of convergence


So we are claiming that typically

tr(A_N²) → 1,   tr(A_N⁴) → 2,   tr(A_N⁶) → 5,   tr(A_N⁸) → 14,   tr(A_N^{10}) → 42,

and so forth. But what do we mean by “typically”? The mathematical expression
for this is “almost surely”, but for now let us look at the more intuitive “convergence
in probability” for tr(A_N^{2k}) → C_k.
Denote by Ω_N the set of our considered matrices, that is
Ω_N = { A_N = (1/√N)(a_ij)_{i,j=1}^N | A_N = A_N^* and a_ij ∈ {±1} }.

Then convergence in probability means that for all ε > 0 we have
#{A_N ∈ Ω_N : |tr(A_N^{2k}) − C_k| > ε} / #Ω_N = P( A_N : |tr(A_N^{2k}) − C_k| > ε )  →  0   for N → ∞.   (?)

How can we show (?)? Usually, one proceeds as follows.

(1) First we show the weaker form of convergence in average, i.e.,
(1/#Ω_N) Σ_{A_N ∈ Ω_N} tr(A_N^{2k}) = E[tr(A_N^{2k})]  →  C_k   for N → ∞.

(2) Then we show that with high probability the deviation from the average will
become small as N → ∞.
We will first consider step (1); (2) is a concentration phenomenon and will be
treated later.
Note that step (1) is giving us the insight into why the semicircle shows up. Step
(2) is more of a theoretical nature, adding nothing to our understanding of the
semicircle, but making the (very interesting!) statement that in high dimensions
the typical behaviour is close to the average behaviour.
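
Here is a small Monte Carlo illustration of this convergence in probability (a sketch with our own choices of N, ε and sample size; random samples stand in for the exact average over Ω_N):

import numpy as np

def tr_power(N, k, rng):
    """tr(A_N^{2k}) for one random sign matrix A_N."""
    A = rng.choice([-1.0, 1.0], size=(N, N))
    A = np.triu(A); A = A + A.T - np.diag(np.diag(A))
    A /= np.sqrt(N)
    return np.trace(np.linalg.matrix_power(A, 2 * k)) / N

rng = np.random.default_rng(2)
C2, eps, samples = 2, 0.1, 200
for N in (10, 50, 200):
    vals = np.array([tr_power(N, 2, rng) for _ in range(samples)])
    # the sample average approaches C_2 = 2 and the fraction of large deviations shrinks
    print(N, vals.mean(), np.mean(np.abs(vals - C2) > eps))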

2 Gaussian Random Matrices:
Wick Formula and
Combinatorial Proof of
Wigner’s Semicircle
We want to prove convergence of our random matrices to the semicircle by showing
E[tr(A_N^{2k})]  →  C_k   for N → ∞,
where the C_k are the Catalan numbers.

Up to now our matrices were of the form A_N = (1/√N)(a_ij)_{i,j=1}^N with a_ij ∈ {−1, 1}.
From an analytic and also combinatorial point of view it is easier to deal with
another choice for the a_ij, namely we will take them as Gaussian (aka normal)
random variables; different a_ij will still be, up to symmetry, independent. If we
want to calculate the expectations E[tr(A^{2k})], then we should of course understand
how to calculate moments of independent Gaussian random variables.

2.1 Gaussian random variables and Wick formula


Definition 2.1. A standard Gaussian (or normal) random variable X is a real-
valued Gaussian random variable with mean 0 and variance 1, i.e., it has distribution
P[t_1 ≤ X ≤ t_2] = (1/√(2π)) ∫_{t_1}^{t_2} e^{−t²/2} dt
and hence its moments are given by
E[X^n] = (1/√(2π)) ∫_R t^n e^{−t²/2} dt.

Proposition 2.2. The moments of a standard Gaussian random variable are of the
form
(1/√(2π)) ∫_{−∞}^{∞} t^n e^{−t²/2} dt = 0 for n odd,   and   = (n − 1)!! for n even,
where the “double factorial” is defined, for m odd, as
m!! = m(m − 2)(m − 4) · · · 5 · 3 · 1.
You are asked to prove this in Exercise 5.
Remark 2.3. From an analytic point of view it is surprising that those integrals
evaluate to natural numbers. They actually count interesting combinatorial objects:
E[X^{2k}] = #{pairings of 2k elements}.
Definition 2.4. (1) For a natural number n ∈ N we put [n] = {1, . . . , n}.
(2) A pairing π of [n] is a decomposition of [n] into disjoint subsets of size 2, i.e.,
π = {V_1, . . . , V_k} such that for all i, j = 1, . . . , k with i ≠ j we have:
• V_i ⊂ [n],
• #V_i = 2,
• V_i ∩ V_j = ∅,
• ⋃_{i=1}^k V_i = [n].
Note that necessarily k = n/2.
(3) The set of all pairings of [n] is denoted by
P_2(n) = {π | π is a pairing of [n]}.
Proposition 2.5. (1) We have
#P_2(n) = 0 for n odd,   and   = (n − 1)!! for n even.
(2) Hence for a standard Gaussian variable X we have
E[X^n] = #P_2(n).
Proof. (1) Count elements in P2 (n) in a recursive way. Choose the pair which
contains the element 1, for this we have n − 1 possibilities. Then we are left
with choosing a pairing of the remaining n − 2 numbers. Hence we have
#P2 (n) = (n − 1) · #P2 (n − 2).
Iterating this and noting that #P2 (1) = 0 and #P2 (2) = 1 gives the desired
result.

(2) Follows from (1) and Proposition 2.2.

Example 2.6. Usually we draw our partitions by connecting the elements in each
pair. Then E[X²] = 1 corresponds to the single partition
{(1, 2)},
and E[X⁴] = 3 corresponds to the three partitions
{(1, 2), (3, 4)},   {(1, 4), (2, 3)},   {(1, 3), (2, 4)}.
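
The recursive count from the proof of Proposition 2.5 can be turned directly into a small enumeration program (a sketch; the generator pairings and helper names are ours):

def pairings(elements):
    """All pairings of a list with an even number of elements."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for i, partner in enumerate(rest):
        for sub in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + sub

def double_factorial(m):
    return 1 if m <= 0 else m * double_factorial(m - 2)

for n in (2, 4, 6, 8):
    count = sum(1 for _ in pairings(list(range(1, n + 1))))
    print(n, count, double_factorial(n - 1))   # the counts agree with (n-1)!!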

Remark 2.7 (Independent Gaussian random variables). We will have several, say
two, Gaussian random variables X, Y and have to calculate their joint moments.
The random variables are independent; this means that their joint distribution is
the product measure of the single distributions,

P [t1 ≤ X ≤ t2 , s1 ≤ Y ≤ s2 ] = P [t1 ≤ X ≤ t2 ] · P [s1 ≤ Y ≤ s2 ] ,

so in particular, for the moments we have

E [X n Y m ] = E [X n ] · E [Y m ] .

This gives then also a combinatorial description for their mixed moments:
E[X^n Y^m] = E[X^n] · E[Y^m]
           = #{pairings of X···X (n times)} · #{pairings of Y···Y (m times)}
           = #{pairings of X···X Y···Y which connect X with X and Y with Y}.

Example. We have E[XXYY] = 1 since the only possible pairing is {(1, 2), (3, 4)},
pairing the two X's and the two Y's. On the other hand, E[XXXYXY] = 3 since
the two Y's (positions 4 and 6) must be paired with each other and the four X's
(positions 1, 2, 3, 5) can then be paired in the three ways
{(1, 2), (3, 5)},   {(1, 3), (2, 5)},   {(1, 5), (2, 3)}.
Consider now x1 , . . . , xn ∈ {X, Y }. Then we still have

E [x1 . . . xn ] = #{pairings which connect X with X and Y with Y }.

Can we decide in a more abstract way whether x_i = x_j or x_i ≠ x_j? Yes, we can read
this from the corresponding second moment, since
E[x_i x_j] = E[x_i²] = 1   if x_i = x_j,   and   E[x_i x_j] = E[x_i] E[x_j] = 0   if x_i ≠ x_j.
Hence we have:
E[x_1 ··· x_n] = Σ_{π∈P_2(n)} Π_{(i,j)∈π} E[x_i x_j].

Theorem 2.8 (Wick 1950, physics; Isserlis 1918, statistics). Let Y_1, . . . , Y_p be inde-
pendent standard Gaussian random variables and consider x_1, . . . , x_n ∈ {Y_1, . . . , Y_p}.
Then we have the Wick formula
E[x_1 ··· x_n] = Σ_{π∈P_2(n)} E_π[x_1, . . . , x_n],
where, for π ∈ P_2(n), we use the notation
E_π[x_1, . . . , x_n] = Π_{(i,j)∈π} E[x_i x_j].

Note that the Wick formula is linear in the xi , hence it remains valid if we replace
the xi by linear combinations of the xj . In particular, we can go over to complex
Gaussian variables.
Definition 2.9. A standard complex Gaussian random variable Z is of the form
Z = (X + iY)/√2,
where X and Y are independent standard real Gaussian variables.
Remark 2.10. Let Z be a standard complex Gaussian, i.e., Z = (X + iY)/√2. Then
Z̄ = (X − iY)/√2 and the first and second moments are given by
• E[Z] = 0 = E[Z̄],
• E[Z²] = E[ZZ] = ½ (E[XX] − E[YY] + i(E[XY] + E[YX])) = 0,
• E[Z̄²] = 0,
• E[|Z|²] = E[ZZ̄] = ½ (E[XX] + E[YY] + i(E[YX] − E[XY])) = 1.
Hence, for z_1, z_2 ∈ {Z, Z̄} and the pairing π = {(1, 2)} we have
E[z_1 z_2] = 1 if π connects Z with Z̄,   and   E[z_1 z_2] = 0 if π connects Z with Z or Z̄ with Z̄.

Theorem 2.11. Let Z_1, . . . , Z_p be independent standard complex Gaussian random
variables and consider z_1, . . . , z_n ∈ {Z_1, Z̄_1, . . . , Z_p, Z̄_p}. Then we have the Wick
formula
E[z_1 ··· z_n] = Σ_{π∈P_2(n)} E_π[z_1, . . . , z_n]
             = #{pairings of [n] which connect Z_i with Z̄_i}.

2.2 Gaussian random matrices and genus expansion

Now we are ready to consider random matrices with such complex Gaussians as
entries.

Definition 2.12. A Gaussian random matrix is of the form A_N = (1/√N)(a_ij)_{i,j=1}^N,
where
• A_N = A_N^*, i.e., a_ij = ā_ji for all i, j,
• {a_ij | i ≥ j} are independent,
• each a_ij is a standard Gaussian random variable, which is complex for i ≠ j
and real for i = j.

Remark 2.13. (1) More precisely, we should address the above as selfadjoint Gaus-
sian random matrices.
(2) Another common name for those random matrices is gue, which stands for
Gaussian unitary ensemble. “Unitary” corresponds here to the fact that the
entries are complex, since such matrices are invariant under unitary transfor-
mations. With gue(N) we denote the gue of size N × N. There are also real
and quaternionic versions, Gaussian orthogonal ensembles goe, and Gaussian
symplectic ensembles gse.

(3) Note that we can also express this definition in terms of the Wick formula 2.11
as
E[a_{i(1)j(1)} ··· a_{i(n)j(n)}] = Σ_{π∈P_2(n)} E_π[a_{i(1)j(1)}, . . . , a_{i(n)j(n)}],
for all n and 1 ≤ i(1), j(1), . . . , i(n), j(n) ≤ N, and where the second moments
are given by
E[a_ij a_kl] = δ_il δ_jk.
So we have for example for the fourth moment
E[a_{i(1)j(1)} a_{i(2)j(2)} a_{i(3)j(3)} a_{i(4)j(4)}] = δ_{i(1)j(2)} δ_{j(1)i(2)} δ_{i(3)j(4)} δ_{j(3)i(4)}
    + δ_{i(1)j(3)} δ_{j(1)i(3)} δ_{i(2)j(4)} δ_{j(2)i(4)}
    + δ_{i(1)j(4)} δ_{j(1)i(4)} δ_{i(2)j(3)} δ_{j(2)i(3)},
and more concretely, E[a_12 a_21 a_11 a_11] = 1 and E[a_12 a_12 a_21 a_21] = 2.
Remark 2.14 (Calculation of E[tr(A_N^m)]). For our Gaussian random matrices we want
to calculate their moments
E[tr(A_N^m)] = (1/N) · (1/N^{m/2}) Σ_{i(1),...,i(m)=1}^N E[a_{i(1)i(2)} a_{i(2)i(3)} ··· a_{i(m)i(1)}].
Let us first consider small examples before we treat the general case:
(1) We have
E[tr(A_N²)] = (1/N²) Σ_{i,j=1}^N E[a_ij a_ji] = (1/N²) · N² = 1,   since E[a_ij a_ji] = 1 for all i, j,
and hence E[tr(A_N²)] = 1 = C_1 for all N.
(2) We consider the partitions
π_1 = {(1, 2), (3, 4)},   π_2 = {(1, 4), (2, 3)},   π_3 = {(1, 3), (2, 4)}.
With this, we have
E[tr(A_N⁴)] = (1/N³) Σ_{i,j,k,l=1}^N E[a_ij a_jk a_kl a_li],
where E[a_ij a_jk a_kl a_li] = E_{π_1}[. . .] + E_{π_2}[. . .] + E_{π_3}[. . .], and calculate
Σ_{i,j,k,l=1}^N E_{π_1}[a_ij, a_jk, a_kl, a_li] = Σ_{i,j,k,l : i=k} 1 = N³,
Σ_{i,j,k,l=1}^N E_{π_2}[a_ij, a_jk, a_kl, a_li] = Σ_{i,j,k,l : j=l} 1 = N³,
Σ_{i,j,k,l=1}^N E_{π_3}[a_ij, a_jk, a_kl, a_li] = Σ_{i,j,k,l : i=l, j=k, j=i, k=l} 1 = Σ_{i=1}^N 1 = N,
hence
E[tr(A_N⁴)] = (1/N³)(N³ + N³ + N) = 2 + 1/N².
So we have
lim_{N→∞} E[tr(A_N⁴)] = 2 = C_2.
(3) Let us now do the general case.
E[a_{i(1)i(2)} a_{i(2)i(3)} ··· a_{i(m)i(1)}] = Σ_{π∈P_2(m)} E_π[a_{i(1)i(2)}, a_{i(2)i(3)}, . . . , a_{i(m)i(1)}]
                                          = Σ_{π∈P_2(m)} Π_{(k,l)∈π} E[a_{i(k)i(k+1)} a_{i(l)i(l+1)}].
We use the notation [i = j] = δ_ij and, by identifying a pairing π with a
permutation π ∈ S_m via
(k, l) ∈ π  ↔  π(k) = l, π(l) = k,
find that
E[tr(A_N^m)] = (1/N^{m/2+1}) Σ_{i(1),...,i(m)=1}^N Σ_{π∈P_2(m)} Π_{(k,l)∈π} E[a_{i(k)i(k+1)} a_{i(l)i(l+1)}]
             = (1/N^{m/2+1}) Σ_{π∈P_2(m)} Σ_{i(1),...,i(m)=1}^N Π_k [i(k) = i(π(k) + 1)],
where π(k) + 1 = γπ(k) and γ = (1, 2, . . . , m) ∈ S_m is the shift by 1 modulo m. The above product
is different from 0 if and only if i : [m] → [N] is constant on the cycles of
γπ ∈ S_m. Thus we get finally
E[tr(A_N^m)] = (1/N^{m/2+1}) Σ_{π∈P_2(m)} N^{#(γπ)},
where #(γπ) is the number of cycles of the permutation γπ.


Hence we have proved the following.

Theorem 2.15. Let A_N be a gue(N) random matrix. Then we have for all m ∈ N,
E[tr(A_N^m)] = Σ_{π∈P_2(m)} N^{#(γπ) − m/2 − 1}.
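
Theorem 2.15 can be checked numerically for small m by brute force; the following sketch (our own code, not from the notes) enumerates all pairings, builds γπ, counts its cycles and sums the powers of N, reproducing the numbers of the following example:

from fractions import Fraction

def pairings(elements):
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for i, partner in enumerate(rest):
        for sub in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + sub

def cycle_count(perm):
    """Number of cycles of a permutation given as a dict k -> perm(k)."""
    seen, count = set(), 0
    for start in perm:
        if start not in seen:
            count += 1
            k = start
            while k not in seen:
                seen.add(k)
                k = perm[k]
    return count

def averaged_moment(m, N):
    """E[tr(A_N^m)] for gue(N) via the genus expansion of Theorem 2.15."""
    gamma = {k: k % m + 1 for k in range(1, m + 1)}        # the cycle (1,2,...,m)
    total = Fraction(0)
    for p in pairings(list(range(1, m + 1))):
        pi = {a: b for (x, y) in p for a, b in ((x, y), (y, x))}
        gp = {k: gamma[pi[k]] for k in range(1, m + 1)}    # the permutation gamma*pi
        total += Fraction(N) ** (cycle_count(gp) - m // 2 - 1)
    return total

print(averaged_moment(4, 10))   # 201/100      = 2 + 1/N^2 for N = 10
print(averaged_moment(6, 10))   # 51/10        = 5 + 10/N^2
print(averaged_moment(8, 10))   # 147021/10000 = 14 + 70/N^2 + 21/N^4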

Example 2.16. (1) This says in particular that all odd moments are zero, since
P_2(2k + 1) = ∅.
(2) Let m = 2, then γ = (1, 2) and we have only one π = (1, 2); then γπ = id =
(1)(2), and thus #(γπ) = 2 and
#(γπ) − m/2 − 1 = 0.
Thus, E[tr(A_N²)] = N⁰ = 1.
(3) Let m = 4 and γ = (1, 2, 3, 4). Then there are three π ∈ P_2(4), with the
following contributions:

π             γπ             #(γπ) − 3    contribution
(1,2)(3,4)    (1,3)(2)(4)     0           N⁰ = 1
(1,3)(2,4)    (1,4,3,2)      −2           N⁻² = 1/N²
(1,4)(2,3)    (1)(2,4)(3)     0           N⁰ = 1

so that
E[tr(A_N⁴)] = 2 + 1/N².
(4) In the same way one can calculate that
E[tr(A_N⁶)] = 5 + 10/N²,    E[tr(A_N⁸)] = 14 + 70/N² + 21/N⁴.
(5) For m = 6 the following 5 pairings give contribution N⁰:
{(1,2),(3,4),(5,6)},  {(1,2),(3,6),(4,5)},  {(1,4),(2,3),(5,6)},  {(1,6),(2,3),(4,5)},  {(1,6),(2,5),(3,4)}.
Those are the non-crossing pairings; all other pairings π ∈ P_2(6) have a crossing,
e.g. {(1,3),(2,4),(5,6)}.
2.3 Non-crossing pairings


Definition 2.17. A pairing π ∈ P_2(m) is non-crossing (NC) if there are no pairs
(i, k) and (j, l) in π with i < j < k < l, i.e., we don't have a crossing in π: two pairs
whose arcs intersect are not allowed.

We put
NC_2(m) = {π ∈ P_2(m) | π is non-crossing}.
Example 2.18. (1) NC_2(2) = P_2(2) = { {(1,2)} }.
(2) NC_2(4) = { {(1,2),(3,4)}, {(1,4),(2,3)} } and P_2(4)\NC_2(4) = { {(1,3),(2,4)} }.

(3) The 5 elements of N C 2 (6) are given in Example 2.16 (v), P2 (6) contains 15
elements; thus there are 15 − 5 = 10 more elements in P2 (6) with crossings.
Remark 2.19. Note that NC-pairings have a recursive structure, which usually is
crucial for dealing with them.
(1) The first pair of π ∈ NC_2(2k) must necessarily be of the form (1, 2l), and the
remaining pairs can only pair within {2, . . . , 2l − 1} or within {2l + 1, . . . , 2k}.
(2) Iterating this shows that we must find in any π ∈ N C 2 (2k) at least one pair of
the form (i, i + 1) with 1 ≤ i ≤ 2k − 1. Removing this pair gives a NC-pairing
of 2k − 2 points. This characterizes the NC-pairings as those pairings, which
can be reduced to the empty set by iterated removal of pairs, which consist of
neighbors.
An example for the reduction of a non-crossing pairing is the following: for
π = {(1, 8), (2, 5), (3, 4), (6, 7)} we first remove the neighboring pairs (3, 4) and (6, 7),
then (2, 5), then (1, 8), and arrive at ∅.
In the case of a crossing pairing, some reductions might be possible, but eventually
one arrives at a point where no further reduction can be done; e.g., for
{(1, 5), (2, 3), (4, 6)} we can remove (2, 3), but after that no further reduction is possible.

Proposition 2.20. Consider m even and let π ∈ P_2(m), which we identify with a
permutation π ∈ S_m. As before, γ = (1, 2, . . . , m) ∈ S_m. Then we have:
(1) #(γπ) − m/2 − 1 ≤ 0 for all π ∈ P_2(m).
(2) #(γπ) − m/2 − 1 = 0 if and only if π ∈ NC_2(m).

Proof. First we note that a pair (i, i + 1) in π corresponds to a fixed point of γπ.
More precisely, in such a situation π maps i + 1 to i and γ maps i back to i + 1, while
π maps i to i + 1 and γ maps i + 1 to i + 2.
Hence γπ contains the cycles (i + 1) and (. . . , i, i + 2, . . . ).
This implication also goes in the other direction: if γπ(i + 1) = i + 1, then
π(i + 1) = γ⁻¹(i + 1) = i. Since π is a pairing we must then also have π(i) = i + 1
and hence we have the pair (i, i + 1) in π.

If we have (i, i + 1) in π, we can remove the points i and i + 1, yielding another
pairing π̃. By doing so, we remove in γπ the cycle (i + 1) and we remove in the cycle
(. . . , i, i + 2, . . . ) the point i, yielding γπ̃. We reduce thus m by 2 and #(γπ) by 1.
If π is NC we can iterate this until we arrive at π̃ with m = 2. Then we have
π̃ = (1, 2) and γ = (1, 2), such that γπ̃ = (1)(2) and #(γπ̃) = 2. If m = 2k we did
k − 1 reductions, where we reduced in each step the number of cycles by 1, and at
the end we remain with 2 cycles, hence
#(γπ) = (k − 1) · 1 + 2 = k + 1 = m/2 + 1.
Here is an example for this: for π = {(1, 8), (2, 5), (3, 4), (6, 7)} we have
γπ = (1)(2, 6, 8)(3, 5)(4)(7);
removing (3, 4) and (6, 7) leaves the pairing {(1, 8), (2, 5)} on {1, 2, 5, 8} with (1)(2, 8)(5),
and removing (2, 5) then leaves {(1, 8)} with (1)(8).

For a general π ∈ P_2(m) we remove pairs (i, i + 1) as long as possible. If π is
crossing we arrive at a pairing π̃ where this is not possible anymore. It suffices to
show that such a π̃ ∈ P_2(m) satisfies #(γπ̃) − m/2 − 1 < 0. But since π̃ has no pair
(i, i + 1), γπ̃ has no fixed point. Hence each cycle has at least 2 elements, thus
#(γπ̃) ≤ m/2 < m/2 + 1.
Note that in the above arguments, with (i, i + 1) we actually mean (i, γ(i)); thus
also (1, m) counts as a pair of neighbors for a π ∈ P_2(m), in order to have the
characterization of fixed points right. Hence, when reducing a general pairing to
one without fixed points we also have to remove such cyclic neighbors as long as
possible.
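
As a sanity check of Proposition 2.20 (only an illustration, not part of the proof), a short script can verify for small m that #(γπ) = m/2 + 1 holds exactly for the non-crossing pairings; it reuses the same kind of pairing enumeration as the sketch after Theorem 2.15:

def pairings(elements):
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for i, partner in enumerate(rest):
        for sub in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + sub

def has_crossing(p):
    """True if some pairs (i,k), (j,l) of p satisfy i < j < k < l."""
    return any(a < c < b < d for (a, b) in p for (c, d) in p)

def cycle_count(perm):
    seen, count = set(), 0
    for start in perm:
        if start not in seen:
            count += 1
            k = start
            while k not in seen:
                seen.add(k)
                k = perm[k]
    return count

m = 8
gamma = {k: k % m + 1 for k in range(1, m + 1)}
for p in pairings(list(range(1, m + 1))):
    pi = {a: b for (x, y) in p for a, b in ((x, y), (y, x))}
    gp = {k: gamma[pi[k]] for k in range(1, m + 1)}
    # Proposition 2.20: maximal cycle number exactly for non-crossing pairings
    assert (cycle_count(gp) == m // 2 + 1) == (not has_crossing(p))
print("checked all pairings of [8]")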

2.4 Semicircle law for GUE


Theorem 2.21 (Wigner's semicircle law for GUE, averaged version). Let A_N be a
Gaussian (gue) N × N random matrix. Then we have for all m ∈ N:
lim_{N→∞} E[tr(A_N^m)] = (1/(2π)) ∫_{−2}^{2} x^m √(4 − x²) dx.

Proof. This is true for m odd, since then both sides are equal to zero. Consider
m = 2k even. Then Theorem 2.15 and Proposition 2.20 show that
lim_{N→∞} E[tr(A_N^m)] = Σ_{π∈P_2(m)} lim_{N→∞} N^{#(γπ) − m/2 − 1} = Σ_{π∈NC_2(m)} 1 = #NC_2(m).
Since the moments of the semicircle are given by the Catalan numbers, it remains
to see that #NC_2(2k) is equal to the Catalan number C_k. To see this, we now count
d_k := #NC_2(2k) according to the recursive structure of NC-pairings as in 2.19 (1).
Namely, we can identify π ∈ NC_2(2k) with {(1, 2l)} ∪ π_0 ∪ π_1, where l ∈ {1, . . . , k},
π_0 ∈ NC_2(2(l − 1)) and π_1 ∈ NC_2(2(k − l)). Hence we have
d_k = Σ_{l=1}^{k} d_{l−1} d_{k−l},   where d_0 = 1.
This is the recursion for the Catalan numbers, whence d_k = C_k for all k ∈ N.
Remark 2.22. (1) One can refine
#(γπ) − m/2 − 1 ≤ 0   to   #(γπ) − m/2 − 1 = −2 g(π)
for g(π) ∈ N_0. This g has the meaning that it is the minimal genus of a
surface on which π can be drawn without crossings. NC pairings are also
called planar; they correspond to g = 0. Theorem 2.15 is usually addressed as
genus expansion,
E[tr(A_N^m)] = Σ_{π∈P_2(m)} N^{−2g(π)}.

(2) For example, (1, 2)(3, 4) ∈ NC_2(4) has g = 0, but the crossing pairing (1, 3)(2, 4) ∈
P_2(4) has genus g = 1: it has a crossing in the plane, but this can be avoided
on a torus.

(3) If we denote
ε_g(k) = #{π ∈ P_2(2k) | π has genus g},
then the genus expansion 2.15 can be written as
E[tr(A_N^{2k})] = Σ_{g≥0} ε_g(k) N^{−2g}.
We know that
ε_0(k) = C_k = (1/(k + 1)) · (2k choose k),
but what about the ε_g(k) for g > 0? There does not exist an explicit formula
for them, but Harer and Zagier have shown in 1986 that
ε_g(k) = (2k)! / ((k + 1)! (k − 2g)!) · λ_g(k),
where λ_g(k) is the coefficient of x^{2g} in
( (x/2) / tanh(x/2) )^{k+1}.
We will come back later to this statement of Harer and Zagier; see Theorem
9.2
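
For small k the Harer–Zagier formula is easy to check with a computer algebra system; the following sympy sketch (our own code, not from the notes) extracts λ_g(k) as a series coefficient and reproduces the numbers from Example 2.16:

import sympy as sp

x = sp.symbols("x")

def eps_g(k, g):
    """epsilon_g(k) via the Harer-Zagier formula quoted above (a numerical check, not a proof)."""
    f = ((x / 2) / sp.tanh(x / 2)) ** (k + 1)
    lam = sp.series(f, x, 0, 2 * g + 2).removeO().coeff(x, 2 * g)   # coefficient of x^{2g}
    return sp.factorial(2 * k) / (sp.factorial(k + 1) * sp.factorial(k - 2 * g)) * lam

print([eps_g(2, g) for g in (0, 1)])      # [2, 1]       ->  E[tr(A_N^4)] = 2 + 1/N^2
print([eps_g(4, g) for g in (0, 1, 2)])   # [14, 70, 21] ->  matches Example 2.16 (4)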

3 Wigner Matrices:
Combinatorial Proof of
Wigner’s Semicircle Law
Wigner's semicircle law does not only hold for Gaussian random matrices, but more
generally for so-called Wigner matrices; there we keep the independence and identical
distribution of the entries, but allow arbitrary distributions instead of the Gaussian one.
As there is no Wick formula any more, the complex case has no clear advantage over
the real one any more; hence in the following we will consider the real case.

3.1 Wigner matrices


Definition 3.1. Let µ be a probability distribution on R. A corresponding Wigner
random matrix is of the form A_N = (1/√N)(a_ij)_{i,j=1}^N, where
• A_N = A_N^t, i.e., a_ij = a_ji for all i, j,
• {a_ij | i ≥ j} are independent,
• each a_ij has distribution µ.

Remark 3.2. (1) In our combinatorial setting we will assume that all moments of
µ exist; that the first moment is 0; and the second moment will be normalized
to 1. In an analytic setting one can deal with more general situations: usually
only the existence of the second moment is needed; and one can also allow
non-vanishing mean.
(2) Often one also allows different distributions for the diagonal and the off-
diagonal entries.
(3) Even more generally, one can give up the assumption of identical distribution
of all entries and replace this by uniform bounds on their moments.
(4) We will now try to imitate our combinatorial proof from the Gaussian case
also in this more general situation. Without a precise Wick formula for the
higher moments of the entries, we will not aim at a precise genus expansion;

it suffices to see that the leading contributions are still given by the Catalan
numbers.

3.2 Combinatorial description of moments of Wigner matrices

Consider a Wigner matrix A_N = (1/√N)(a_ij)_{i,j=1}^N, where µ has all moments and
∫_R x dµ(x) = 0,   ∫_R x² dµ(x) = 1.
Then
E[tr(A_N^m)] = (1/N^{1+m/2}) Σ_{i_1,...,i_m=1}^N E[a_{i_1 i_2} a_{i_2 i_3} ··· a_{i_m i_1}]
             = (1/N^{1+m/2}) Σ_{σ∈P(m)} Σ_{i : [m]→[N], ker i = σ} E[σ],
where we group the appearing indices (i_1, . . . , i_m) according to their “kernel”, which
is a partition σ of {1, . . . , m}.

Definition 3.3. (1) A partition σ of [n] is a decomposition of [n] into disjoint,
non-empty subsets (of arbitrary size), i.e., σ = {V_1, . . . , V_k}, where
• V_i ⊂ [n] for all i,
• V_i ≠ ∅ for all i,
• V_i ∩ V_j = ∅ for all i ≠ j,
• ⋃_{i=1}^k V_i = [n].
The V_i are called blocks of σ. The set of all partitions of [n] is denoted by
P(n) := {σ | σ is a partition of [n]}.

(2) For a multi-index i = (i_1, . . . , i_m) we denote by ker i its kernel; this is the
partition σ ∈ P(m) such that we have i_k = i_l if and only if k and l are in the
same block of σ. If we identify i with a function i : [m] → [N] via i(k) = i_k,
then we can also write
ker i = {i⁻¹(1), i⁻¹(2), . . . , i⁻¹(N)},
where we discard all empty sets.

Example 3.4. For i = (1, 2, 1, 3, 2, 4, 2) we have

ik 1 2 1 3 2 4 2
k 1 2 3 4 5 6 7

such that
ker i = {(1, 3), (2, 5, 7), (4), (6)} ∈ P(7).
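
For illustration, the kernel of a multi-index is easy to compute; a possible sketch in Python (the function name kernel is our choice):

from collections import defaultdict

def kernel(i):
    """ker i: group the positions 1,...,m of the multi-index i by equal values."""
    blocks = defaultdict(list)
    for pos, value in enumerate(i, start=1):
        blocks[value].append(pos)
    return sorted(blocks.values())

print(kernel((1, 2, 1, 3, 2, 4, 2)))                              # [[1, 3], [2, 5, 7], [4], [6]], as in Example 3.4
print(kernel((1, 1, 2, 1, 1, 2)) == kernel((2, 2, 7, 2, 2, 7)))   # True: same kernel (cf. Remark 3.5)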
Remark 3.5. The relevance of this kernel in our setting is the following:
For i = (i1 , . . . , im ) and j = (j1 , . . . , jm ) with ker i = ker j we have

E [ai1 i2 ai2 i3 · · · aim i1 ] = E [aj1 j2 aj2 j3 · · · ajm j1 ] .

For example, for i = (1, 1, 2, 1, 1, 2) and j = (2, 2, 7, 2, 2, 7) we have
ker i = {(1, 2, 4, 5), (3, 6)} = ker j
and
E[a_11 a_12 a_21 a_11 a_12 a_21] = E[a_11²] E[a_12⁴] = E[a_22²] E[a_27⁴] = E[a_22 a_27 a_72 a_22 a_27 a_72].

We denote this common value by
E[σ] := E[a_{i_1 i_2} a_{i_2 i_3} ··· a_{i_m i_1}]   if ker i = σ.

Thus we get:
E[tr(A_N^m)] = (1/N^{1+m/2}) Σ_{σ∈P(m)} E[σ] · #{i : [m] → [N] | ker i = σ}.

To understand the contribution corresponding to a σ ∈ P(m) we associate to σ a
graph G_σ.
Definition 3.6. For σ = {V_1, . . . , V_k} ∈ P(m) we define a corresponding graph G_σ
as follows. The vertices of G_σ are given by the blocks V_p of σ, and there is an edge
between V_p and V_q if there is an r ∈ [m] such that r ∈ V_p and r + 1 (mod m) ∈ V_q.
Another way of saying this is that we start with a graph with vertices 1, 2, . . . , m
and edges (1, 2), (2, 3), (3, 4), . . . , (m − 1, m), (m, 1) and then identify vertices accord-
ing to the blocks of σ. We keep loops, but erase multiple edges.

Example 3.7. (1) For σ = {(1, 3), (2, 5), (4)} the graph G_σ has the three vertices
{1, 3}, {2, 5}, {4} and the edges {1,3}–{2,5}, {1,3}–{4} and {2,5}–{4}; it is a triangle.
(2) For σ = {(1, 5), (2, 4), (3)} the graph G_σ has the three vertices {1, 5}, {2, 4}, {3},
the edges {1,5}–{2,4} and {2,4}–{3}, and a loop at {1, 5}.
(3) For σ = {(1, 3), (2), (4)} the graph G_σ has the three vertices {1, 3}, {2}, {4} and
the edges {1,3}–{2} and {1,3}–{4}.
The term E [ai1 i2 ai2 i3 · · · aim i1 ] corresponds now to a walk in Gσ , with σ = ker i,
along the edges with steps

i1 → i2 → i3 → · · · → im → i1 .

Hence we are using each edge in Gσ at least once. Note that different edges in Gσ
correspond to independent random variables. Hence, if we use an edge only once in

our walk, then E[σ] = 0, because the expectation factorizes into a product with one
factor being the first moment of a_ij, which is assumed to be zero. Thus, every edge
must be used at least twice, but this implies
#edges in G_σ ≤ (#steps in the walk)/2 = m/2.
Since the number of i with the same kernel is
#{i : [m] → [N] | ker i = σ} = N(N − 1)(N − 2) ··· (N − #σ + 1),
where #σ is the number of blocks in σ, we finally get
E[tr(A_N^m)] = (1/N^{1+m/2}) Σ_{σ∈P(m)} E[σ] · N(N − 1)(N − 2) ··· (N − #σ + 1),   (?)
where E[σ] ≠ 0 only if #edges(G_σ) ≤ m/2, and where N(N − 1) ··· (N − #σ + 1) ∼ N^{#σ} for N → ∞.

We have to understand what the constraint on the number of edges in Gσ gives


us for the number of vertices in Gσ (which is the same as #σ). For this, we will now
use the following well-known basic result from graph theory.
Proposition 3.8. Let G = (V, E) be a connected finite graph with vertices V and
edges E. (We allow loops and multi-edges.) Then we have that

#V ≤ #E + 1

and we have equality if and only if G is a tree, i.e., a connected graph without cycles.

3.3 Semicircle law for Wigner matrices


Theorem 3.9 (Wigner's semicircle law for Wigner matrices, averaged version). Let
A_N be a Wigner matrix corresponding to µ, which has all moments, with mean 0
and second moment 1. Then we have for all m ∈ N:
lim_{N→∞} E[tr(A_N^m)] = (1/(2π)) ∫_{−2}^{2} x^m √(4 − x²) dx.

Proof. From (?) we get
lim_{N→∞} E[tr(A_N^m)] = Σ_{σ∈P(m)} E[σ] lim_{N→∞} N^{#V(G_σ) − m/2 − 1}.
In order to have E[σ] ≠ 0, we can restrict to σ with #E(G_σ) ≤ m/2, which by
Proposition 3.8 implies that
#V(G_σ) ≤ #E(G_σ) + 1 ≤ m/2 + 1.
Hence all terms converge and the only contribution in the limit N → ∞ comes from
those σ where we have equality, i.e.,
#V(G_σ) = #E(G_σ) + 1 = m/2 + 1.
Thus, G_σ must be a tree and in our walk we use each edge exactly twice (necessarily
in opposite directions). For such a σ we have E[σ] = 1; thus
lim_{N→∞} E[tr(A_N^m)] = #{σ ∈ P(m) | G_σ is a tree}.
N →∞

We will check in Exercise 9 that the latter number is also counted by the Catalan
numbers.
Remark 3.10. Note that our Gσ are not just abstract trees, but they are coming with
the walks, which encode
• a starting point, i.e., the Gσ are rooted trees
• a cyclic order of the outgoing edges at a vertex, which gives a planar drawing
of our graph
Hence what we have to count are rooted planar trees.
Note also that a rooted planar tree determines uniquely the corresponding walk.

4 Analytic Tools: Stieltjes
Transform and Convergence of
Measures
Let us recall our setting and goal. We have, for each N ∈ N, selfadjoint N × N
random matrices, which are given by a probability measure PN on the entries of the
matrices; the prescription of PN should be kind of uniform in N .
For example, for the gue(N) we have A = (a_ij)_{i,j=1}^N with the complex variables
a_ij = x_ij + √−1 y_ij having real part x_ij and imaginary part y_ij. Since a_ij = ā_ji,
we have y_ii = 0 for all i and we remain with the N² many “free variables” x_ii
(i = 1, . . . , N) and x_ij, y_ij (1 ≤ i < j ≤ N). All those are independent and Gaussian
distributed, which can be written in the compact form
dP(A) = c_N exp(−N Tr(A²)/2) dA,
where dA is the product of all differentials of the N² variables and c_N is a normal-
ization constant, to make P_N a probability measure.
We want now statements about our matrices with respect to this measure P_N,
either in average or in probability. Let us be a bit more specific on this.
Denote by Ω_N the space of our selfadjoint N × N matrices, i.e.,
Ω_N := {A = (x_ij + √−1 y_ij)_{i,j=1}^N | x_ii ∈ R (i = 1, . . . , N), x_ij, y_ij ∈ R (i < j)} ≅ R^{N²};
then, for each N ∈ N, P_N is a probability measure on Ω_N.

For A ∈ Ω_N we consider its N eigenvalues λ_1, . . . , λ_N, counted with multiplicity.
We encode those eigenvalues in a probability measure µ_A on R:
µ_A := (1/N)(δ_{λ_1} + ··· + δ_{λ_N}),
which we call the eigenvalue distribution of A. Our claim is now that µ_A converges
under P_N, for N → ∞, to the semicircle distribution µ_W,
• in average, i.e.,
µ_N := ∫_{Ω_N} µ_A dP_N(A) = E[µ_A]  →  µ_W   for N → ∞,

• and, stronger, in probability or almost surely.


So what we have to understand now is:
• What kind of convergence µN → µ do we have here, for probability measures
on R?
• How can we describe probability measures (on R) and their convergence with
analytic tools?
The relevant notions of convergence are the “vague” and ”weak” convergence and
our analytic tool will be the Stieltjes transform. We start with describing the latter.

4.1 Stieltjes transform


Definition 4.1. Let µ be a Borel measure on R.
(1) µ is finite if µ(R) < ∞.
(2) µ is a probability measure if µ(R) = 1.
(3) For a finite measure µ on R we define its Stieltjes transform Sµ on C\R by
S_µ(z) = ∫_R 1/(t − z) dµ(t)   (z ∈ C\R).

(4) −Sµ = Gµ is also called the Cauchy transform.


Theorem 4.2. The Stieltjes transform has the following properties.
(1) Let µ be a finite measure on R and S = Sµ its Stieltjes transform. Then one
has:
(i) S : C+ → C+ , where C+ := {z ∈ C | Im(z) > 0};
(ii) S is analytic on C+ ;
(iii) limy→∞ iyS(iy) = −µ(R).
(2) µ can be recovered from S_µ via the Stieltjes inversion formula: for a < b we
have
lim_{ε↘0} (1/π) ∫_a^b Im S_µ(x + iε) dx = µ((a, b)) + ½ µ({a, b}).
(3) In particular, we have for two finite measures µ and ν: Sµ = Sν implies that
µ = ν.
Proof. (1) This is Exercise 10.

(2) We have
Im S_µ(x + iε) = Im ∫_R 1/(t − x − iε) dµ(t) = ∫_R ε/((t − x)² + ε²) dµ(t)
and thus
∫_a^b Im S_µ(x + iε) dx = ∫_R ∫_a^b ε/((t − x)² + ε²) dx dµ(t).
For the inner integral we have
∫_a^b ε/((t − x)² + ε²) dx = ∫_{(a−t)/ε}^{(b−t)/ε} 1/(x² + 1) dx = tan⁻¹((b − t)/ε) − tan⁻¹((a − t)/ε),
which converges, for ε ↘ 0, to 0 if t ∉ [a, b], to π/2 if t ∈ {a, b}, and to π if t ∈ (a, b).
From this the assertion follows.
(3) Now assume that S_µ = S_ν. By the Stieltjes inversion formula it follows then
that µ((a, b)) = ν((a, b)) for all open intervals such that a and b are atoms
neither of µ nor of ν. Since there can only be countably many atoms we can
write any interval as
(a, b) = ⋃_{n=1}^∞ (a + ε_n, b − ε_n),
where the sequence ε_n ↘ 0 is chosen such that all a + ε_n and b − ε_n are atoms
of neither µ nor ν. By monotone convergence for measures we then get
µ((a, b)) = lim_{n→∞} µ((a + ε_n, b − ε_n)) = lim_{n→∞} ν((a + ε_n, b − ε_n)) = ν((a, b)).

Remark 4.3. If we put µ_ε = p_ε λ (where λ is Lebesgue measure) with density
p_ε(x) := (1/π) Im S_µ(x + iε) = (1/π) ∫_R ε/((t − x)² + ε²) dµ(t),
then µ_ε = γ_ε ∗ µ, where γ_ε is the Cauchy distribution, and we have checked explicitly
in our proof of the Stieltjes inversion formula the well-known fact that γ_ε ∗ µ converges
weakly to µ for ε ↘ 0. We will talk about weak convergence later, see Definition
4.7.

Proposition 4.4. Let µ be a compactly supported probability measure, say µ([−r, r]) =
1 for some r > 0. Then S_µ has a power series expansion (about ∞) as follows:
S_µ(z) = − Σ_{n=0}^∞ m_n / z^{n+1}   for |z| > r,
where m_n := ∫_R t^n dµ(t) are the moments of µ.

Proof. For |z| > r we can expand
1/(t − z) = − 1/(z(1 − t/z)) = − (1/z) Σ_{n=0}^∞ (t/z)^n
for all t ∈ [−r, r]; the convergence on [−r, r] is uniform, hence
S_µ(z) = ∫_{−r}^{r} 1/(t − z) dµ(t) = − Σ_{n=0}^∞ ∫_{−r}^{r} t^n/z^{n+1} dµ(t) = − Σ_{n=0}^∞ m_n/z^{n+1}.

Proposition 4.5. The Stieltjes transform S(z) of the semicircle distribution,
dµ_W(t) = (1/(2π)) √(4 − t²) dt, is, for z ∈ C⁺, uniquely determined by
• S(z) ∈ C⁺;
• S(z) is the solution of the equation S(z)² + zS(z) + 1 = 0.
Explicitly, this means
S(z) = (−z + √(z² − 4))/2   (z ∈ C⁺).
Proof. By Proposition 4.4, we know that for large |z|:
S(z) = − Σ_{k=0}^∞ C_k / z^{2k+1},
where the C_k are the Catalan numbers. By using the recursion for the Catalan numbers
(see Theorem 1.6), this implies that for large |z| we have S(z)² + zS(z) + 1 = 0.
Since we know that S is analytic on C⁺, this equation is, by analytic extension, then
valid for all z ∈ C⁺.
This equation has two solutions, (−z ± √(z² − 4))/2, and only the one with the
+-sign is in C⁺.

Remark 4.6. Proposition 4.5 gave us the Stieltjes transform of µ_W just from the
knowledge of the moments. From S(z) = (−z + √(z² − 4))/2 we can then get the
density of µ_W via the Stieltjes inversion formula:
(1/π) Im S(x + iε) = (1/(2π)) Im √((x + iε)² − 4)  →  (1/(2π)) Im √(x² − 4)
   = 0 for |x| > 2,   and   = (1/(2π)) √(4 − x²) for |x| ≤ 2,   as ε ↘ 0.
Thus this analytic machinery gives an effective way to calculate a distribution from
its moments (without having to know the density in advance).
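
As an illustration one can plot (1/π) Im S(x + iε) for decreasing ε and watch it approach the semicircular density. A possible sketch (our own code; note that for Re z < 0 the principal square root picks the wrong branch, so we flip the root into the upper half-plane):

import numpy as np
import matplotlib.pyplot as plt

def S_semicircle(z):
    """Stieltjes transform of the semicircle, S(z) = (-z + sqrt(z^2 - 4))/2 on C+."""
    s = np.sqrt(z * z - 4 + 0j)
    s = np.where(s.imag < 0, -s, s)   # pick the square root with positive imaginary part
    return (-z + s) / 2

x = np.linspace(-3, 3, 601)
for eps in (0.5, 0.1, 0.01):
    approx = np.imag(S_semicircle(x + 1j * eps)) / np.pi   # (1/pi) Im S(x + i eps)
    plt.plot(x, approx, label=f"eps = {eps}")

true_density = np.where(np.abs(x) <= 2, np.sqrt(np.maximum(4 - x**2, 0)) / (2 * np.pi), 0)
plt.plot(x, true_density, "k--", label="semicircle density")
plt.legend(); plt.show()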

4.2 Convergence of probability measures


Now we want to consider the convergence µ_N → µ. We can consider (probability)
measures from two equivalent perspectives:
(1) measure theoretic perspective: µ gives us the measure (probability) of sets,
i.e., µ(B) for measurable sets B, or just µ(intervals);
(2) functional analytic perspective: µ allows us to integrate continuous functions, i.e.,
it gives us ∫ f dµ for continuous f.
According to this there are two canonical choices for a notion of convergence:
(1) µ_N(B) → µ(B) for all measurable sets B, or maybe for all intervals B;
(2) ∫ f dµ_N → ∫ f dµ for all continuous f.
The first possibility is problematic in this generality, as it treats atoms too
restrictively.
Example: Take µ_N = δ_{1−1/N} and µ = δ_1. Then we surely want that µ_N → µ, but
for B = [1, 2] we have µ_N([1, 2]) = 0 for all N, but µ([1, 2]) = 1.
Thus the second possibility above is the better definition. But we have to be
careful about which class of continuous functions we allow; we need bounded ones,
otherwise ∫ f dµ might not exist in general; and, for compactness reasons, it is some-
times better to ignore the behaviour of the measures at infinity.
Definition 4.7. (1) We use the notations
(i) C_0(R) := {f ∈ C(R) | lim_{|t|→∞} f(t) = 0} are the continuous functions on
R vanishing at infinity;
(ii) C_b(R) := {f ∈ C(R) | ∃ M > 0 : |f(t)| ≤ M ∀ t ∈ R} are the continuous
bounded functions on R.
(2) Let µ and µ_N (N ∈ N) be finite measures. Then we say that
(i) µ_N converges vaguely to µ, denoted by µ_N →_v µ, if
∫ f(t) dµ_N(t) → ∫ f(t) dµ(t) for all f ∈ C_0(R);
(ii) µ_N converges weakly to µ, denoted by µ_N →_w µ, if
∫ f(t) dµ_N(t) → ∫ f(t) dµ(t) for all f ∈ C_b(R).

Remark 4.8. (1) Note that weak convergence includes in particular that
µ_N(R) = ∫ 1 dµ_N(t) → ∫ 1 dµ(t) = µ(R),
and thus the weak limit of probability measures must again be a probability
measure. For the vague convergence this is not true; there we can lose mass
at infinity.
Example: Consider µ_N = ½ δ_1 + ½ δ_N and µ = ½ δ_1; then
∫ f(t) dµ_N(t) = ½ f(1) + ½ f(N) → ½ f(1) = ∫ f(t) dµ(t)
for all f ∈ C_0(R). Thus the sequence of probability measures ½ δ_1 + ½ δ_N
converges, for N → ∞, to the finite measure ½ δ_1 with total mass 1/2.
(2) The relevance of the vague convergence, even if we are only interested in
probability measures, is that the probability measures are precompact in the
vague topology, but not in the weak topology. E.g., in the above example,
µ_N = ½ δ_1 + ½ δ_N has no subsequence which converges weakly (but it has a
subsequence, namely itself, which converges vaguely).

Theorem 4.9. The space of probability measures on R is precompact in the vague


topology: every sequence (µN )N ∈N of probability measures on R has a subsequence
which converges vaguely to a finite measure µ, with µ(R) ≤ 1.

Proof. (1) From a functional analytic perspective this is a special case of the
Banach-Alaoglu theorem; since complex measures on R are the dual space of
the Banach space C0 (R), and its weak∗ topology is exactly the vague topology.
(2) From a measure theory perspective this is known as Helly’s (Selection) Theo-
rem. Here are the main ideas for the proof in this setting.
(i) We describe a finite measure µ by its distribution function Fµ , given by

Fµ : R → R; Fµ (t) := µ((−∞, t]).

Such distribution functions can be characterized as functions F with the


properties:
• t 7→ F (t) is non-decreasing

• F (−∞) := limt→−∞ F (t) = 0 and F (+∞) := limt→∞ F (t) < ∞
• F is continuous on the right
(ii) The vague convergence µ_N →_v µ can also be described in terms of the
distribution functions F_N, F; namely, µ_N →_v µ is equivalent to:
F_N(t) → F(t) for all t ∈ R at which F is continuous.
(iii) Let now a sequence (µN )N of probability measures be given. We con-
sider the corresponding distribution functions (FN )N and want to find a
convergent subsequence (in the sense of (ii)) for those.
For this choose a countable dense subset T = {t1 , t2 , . . . } of R. Then, by
choosing subsequences of subsequences and taking the “diagonal” sub-
sequence, we get convergence for all t ∈ T . More precisely: Choose
subsequence (FN1 (m) )m such that
m→∞
FN1 (m) (t1 ) −→ FT (t1 ),
choose then a subsequence (FN2 (m) )m of this such that
m→∞ m→∞
FN2 (m) (t1 ) −→ FT (t1 ), FN2 (m) (t2 ) −→ FT (t2 );
iterating this gives subsequences (FNk (m) )m such that
m→∞
FNk (m) (ti ) −→ FT (ti ) for all i = 1, . . . , k.
The diagonal subsequence (FNm (m) )m converges then at all t ∈ T to FT (t).
We improve now FT to the wanted F by
F (t) := inf{FT (s) | s ∈ T, s > t}
and show that
• F is a distribution function;
m→∞
• FNm (m) (t) −→ F (t) at all continuity points of F .
m→∞
According to (ii) this gives then the convergence µNm (m) −→ µ, where µ
is the finite measure corresponding to the distribution function F . Note
that FN (+∞) = 1 for all N ∈ N gives F (+∞) ≤ 1, but we cannot
guarantee F (+∞) = 1 in general.

Remark 4.10. If we want compactness in the weak topology, then we must control
the mass at ∞ in a uniform way. This is given by the notion of tightness. A
sequence (µN )N of probability measures is tight if: for all ε > 0 there exists an
interval I = [−R, R] such that µN (I c ) < ε for all N .
Then one has: Any tight sequence of probability measures has a subsequence
which converges weakly; the limit is then necessarily a probability measure.

4.3 Probability measures determined by moments
We can now also relate weak convergence to convergence of moments; which shows
that our combinatorial approach (using moments) and analytic approach (using
Stieltjes transforms) for proving the semicircle law are essentially equivalent. We
want to make this more precise in the following.
Definition 4.11. A probability measure µ on R is determined by its moments if
(i) all moments ∫ t^k dµ(t) (k ∈ N) exist;
(ii) µ is the only probability measure with those moments: if ν is a probability
measure and ∫ t^k dν(t) = ∫ t^k dµ(t) for all k ∈ N, then ν = µ.
Theorem 4.12. Let µ and µ_N (N ∈ N) be probability measures for which all mo-
ments exist. Assume that µ is determined by its moments. Assume furthermore that
we have convergence of moments, i.e.,
lim_{N→∞} ∫ t^k dµ_N(t) = ∫ t^k dµ(t) for all k ∈ N.
Then we have weak convergence: µ_N →_w µ.
Rough idea of proof. One has to note that convergence of moments implies tightness,
which implies the existence of a weakly convergent subsequence, µNm → ν. Further-
more, the assumption that the moments converge implies that they are uniformly
integrable, which implies then that the moments of this subsequence converge to the
moments of ν. (These are kind of standard measure theoretic arguments, though a
bit involved; for details see the book of Billingsley, in particular, his Theorem 25.12
and its Corollary.) However, the moments of the subsequence converge, as the mo-
ments of the whole sequence, by assumption to the moments of µ; this means that
µ and ν have the same moments and hence, by our assumption that µ is determined
by its moments, we have that ν = µ.
In the same way all weakly convergent subsequences of (µ_N)_N must converge to the
same µ, and thus the whole sequence must converge weakly to µ.

Remark 4.13. (0) Note that in the first version of these notes (and also in the
recorded lectures) it was claimed that, under the assumption that the limit
is determined by its moments, convergence in moments is equivalent to weak
convergence. This is clearly not true, as the following simple example shows.
Consider
µ_N = (1 − 1/N) δ_0 + (1/N) δ_N   and   µ = δ_0.
Then it is clear that µ_N →_w µ, and µ is also determined by its moments. But
there is no convergence of moments. For example, the first moment converges,
but to the wrong limit,
∫ t dµ_N(t) = (1/N) · N = 1 → 1 ≠ 0 = ∫ t dµ(t),
and the other moments explode:
∫ t^k dµ_N(t) = (1/N) · N^k = N^{k−1} → ∞ for k ≥ 2.
In order to have convergence of moments one needs a uniform integrability
assumption; see Billingsley, in particular, his Theorem 25.12 and its Corollary.
(1) Note that there exist measures for which all moments exist but which, however,
are not determined by their moments. Weak convergence to them cannot be
checked by just looking at convergence of moments.
Example: The log-normal distribution with density
dµ(x) = (1/√(2π)) (1/x) e^{−(log x)²/2} dx   on [0, ∞)
(which is the distribution of e^X for X Gaussian) is not determined by its
moments.
(2) Compactly supported measures (like the semicircle) or also the Gaussian dis-
tribution are determined by their moments.

4.4 Description of weak convergence via the


Stieltjes transform
Theorem 4.14. Let µ and µN (N ∈ N) be probability measures on R. Then the
following are equivalent.
(i) µN → µ weakly.
(ii) For all z ∈ C+ we have: limN →∞ SµN (z) = Sµ (z).
(iii) There exists a set D ⊂ C+ , which has an accumulation point in C+ , such that:
limN →∞ SµN (z) = Sµ (z) for all z ∈ D.
Proof. • (i) =⇒ (ii): Assume that µN → µ weakly. For z ∈ C+ we consider

fz : R → C   with   fz(t) = 1/(t − z).

Since lim_{|t|→∞} fz(t) = 0, we have fz ∈ C0(R) ⊂ Cb(R) and thus, by definition
of weak convergence:

SµN(z) = ∫ fz(t) dµN(t) → ∫ fz(t) dµ(t) = Sµ(z).

• (ii) =⇒ (iii): clear


• (iii) =⇒ (i): By Theorem 4.9, we know that (µN )N has a subsequence (µN (m) )m
which converges vaguely to some finite measure ν with ν(R) ≤ 1. Then, as
above, we have for all z ∈ D:

Sµ(z) = lim_{m→∞} SµN(m)(z) = Sν(z).

Thus the analytic functions Sµ and Sν agree on D and hence, by the identity
theorem for analytic functions, also on C+ , i.e., Sµ = Sν . But this implies, by
Theorem 4.2, that ν = µ.
Thus the subsequence (µN (m) )m converges vaguely to the probability measure
µ (and thus also weakly, see Exercise 12). In the same way, any weak cluster
point of (µN )N must be equal to µ, and thus the whole sequence must converge
weakly to µ.

Remark 4.15. If we only assume that SµN (z) converges to a limit function S(z), then
S must be the Stieltjes transform of a measure ν with ν(R) ≤ 1 and we have the
vague convergence µN → ν.
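As an illustration of Theorem 4.14, here is a minimal sketch (assuming NumPy; the binomial example is just an arbitrary choice of a weakly convergent sequence): the Stieltjes transforms of standardized Binomial(N, 1/2) laws are evaluated at one point z ∈ C+ and compared with the Stieltjes transform of their standard Gaussian limit.

import numpy as np
from math import comb

z = 0.7 + 0.5j

def stieltjes_binomial(N, z):
    # S_{mu_N}(z) for mu_N = law of (B - N/2)/sqrt(N/4), B ~ Binomial(N, 1/2).
    k = np.arange(N + 1)
    atoms = (k - N / 2) / np.sqrt(N / 4)
    weights = np.array([comb(N, int(j)) * 0.5**N for j in k], dtype=float)
    return np.sum(weights / (atoms - z))

# Stieltjes transform of the standard Gaussian limit, by numerical integration.
t = np.linspace(-12, 12, 400001)
phi = np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
S_gauss = np.sum(phi / (t - z)) * (t[1] - t[0])

for N in (10, 100, 1000):
    print(N, stieltjes_binomial(N, z), S_gauss)   # the values approach each other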

5 Analytic Proof of Wigner’s
Semicircle Law for Gaussian
Random Matrices
Now we are ready to give an analytic proof of Wigner’s semicircle law, relying on the
analytic tools we developed in the last chapter. As for the combinatorial approach,
the Gaussian case is easier compared to the general Wigner case and thus we will
restrict to this. The difference between real and complex is here not really relevant,
instead of gue we will treat the goe case.

5.1 GOE random matrices


Definition 5.1. Real Gaussian random matrices (goe) are of the form AN =
(xij)_{i,j=1}^N , where xij = xji for all i, j and {xij | i ≤ j} are i.i.d. (independent
identically distributed) random variables with Gaussian distribution of mean zero
and variance 1/N. More formally, on the space of symmetric N × N matrices

ΩN = {AN = (xij)_{i,j=1}^N | xij ∈ R, xij = xji for all i, j}

we consider the probability measure

dPN(AN) = cN exp(−(N/4) Tr(AN²)) ∏_{i≤j} dxij ,

with a normalization constant cN such that PN is a probability measure.
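To make this concrete, the following minimal sketch (assuming NumPy) samples from this ensemble, which amounts to independent centered Gaussians of variance 2/N on the diagonal and 1/N off the diagonal (as worked out in the remark below), and compares the eigenvalue histogram with the semicircle density.

import numpy as np

def sample_goe(N, rng):
    # Off-diagonal entries: variance 1/N; diagonal entries: variance 2/N.
    G = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
    return (G + G.T) / np.sqrt(2.0)

rng = np.random.default_rng(0)
N = 1000
eigs = np.linalg.eigvalsh(sample_goe(N, rng))

# Compare the histogram with the semicircle density (1/(2 pi)) sqrt(4 - x^2).
hist, edges = np.histogram(eigs, bins=40, range=(-2.5, 2.5), density=True)
x = (edges[:-1] + edges[1:]) / 2
semicircle = np.where(np.abs(x) <= 2, np.sqrt(np.maximum(4 - x**2, 0)) / (2 * np.pi), 0)
print(np.max(np.abs(hist - semicircle)))   # small for large N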

Remark 5.2. (1) Note that with this choice of PN , which is invariant under orthog-
onal rotations, we have actually different variances on and off the diagonal:
E[xij²] = 1/N  (i ≠ j)   and   E[xii²] = 2/N .

(2) We consider now, for each N ∈ N, the averaged eigenvalue distribution
µN := E[µAN] = ∫_{ΩN} µA dPN(A).

We want to prove that µN → µW weakly. According to Theorem 4.14 we can prove
this by showing limN→∞ SµN(z) = SµW(z) for all z ∈ C+ .
(3) Note that

SµN(z) = ∫_R 1/(t − z) dµN(t) = E[ ∫_R 1/(t − z) dµAN(t) ] = E[ tr[(AN − z1)^{−1}] ] ,

since, by Assignment ...., SµAN(z) = tr[(AN − z1)^{−1}]. So what we have to see
is, for all z ∈ C+ :

lim_{N→∞} E[ tr[(AN − z1)^{−1}] ] = SµW(z).

For this, we want to see that SµN (z) satisfies approximately the quadratic
equation for SµW (z), from 4.5.
(4) Let us use for the resolvents of our matrices A the notation
RA(z) = (A − z1)^{−1},   so that   SµN(z) = E[tr(RAN(z))].
In the following we will usually suppress the index N at our matrices; thus
write just A instead of AN , as long as the N is fixed and clear.
We have then (A − z1)RA (z) = 1, or A · RA (z) − zRA (z) = 1, thus
RA(z) = −(1/z) 1 + (1/z) A RA(z).

Taking the normalized trace and expectation of this yields

E[tr(RA(z))] = −1/z + (1/z) E[tr(ARA(z))] .
The left hand side is our Stieltjes transform, but what about the right hand
side; can we relate this also to the Stieltjes transform? Note that the function
under the expectation is (1/N) Σ_{k,l} xkl [RA(z)]lk ; thus a sum of terms which are the

product of one of our Gaussian variables times a function of all the independent
Gaussian variables. There exists actually a very nice and important formula
to deal with such expectations of independent Gaussian variables. In a sense,
this is the analytic version of the combinatorial Wick formula.

5.2 Stein’s identity for independent Gaussian
variables
Proposition 5.3 (Stein’s identity). Let X1 , . . . , Xk be independent random variables
with Gaussian distribution, with mean zero and variances E[Xi²] = σi² . Let h :
R^k → C be continuously differentiable such that h and all partial derivatives are of
polynomial growth. Then we have for i = 1, . . . , k:

E[Xi h(X1 , . . . , Xk)] = σi² E[ ∂h/∂xi (X1 , . . . , Xk) ].

More explicitly,

∫_{R^k} xi h(x1 , . . . , xk) exp( −x1²/(2σ1²) − · · · − xk²/(2σk²) ) dx1 . . . dxk
   = σi² ∫_{R^k} ∂h/∂xi (x1 , . . . , xk) exp( −x1²/(2σ1²) − · · · − xk²/(2σk²) ) dx1 . . . dxk .
Proof. The main argument happens for k = 1. Since x e^{−x²/(2σ²)} = [−σ² e^{−x²/(2σ²)}]′
we get by partial integration

∫_R x h(x) e^{−x²/(2σ²)} dx = ∫_R h(x) [−σ² e^{−x²/(2σ²)}]′ dx = ∫_R h′(x) σ² e^{−x²/(2σ²)} dx;

our assumptions on h are just such that the boundary terms vanish.
For general k, we just do partial integration for the i-th coordinate.
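As a quick sanity check, here is a minimal Monte Carlo sketch (assuming NumPy; the test function h is just an arbitrary smooth choice of polynomial growth) of Stein's identity in one variable.

import numpy as np

# Check E[X h(X)] = sigma^2 E[h'(X)] for a centered Gaussian X by Monte Carlo.
rng = np.random.default_rng(1)
sigma = 1.5
X = rng.normal(scale=sigma, size=2_000_000)

h = lambda x: np.sin(x) + x**2          # smooth test function
dh = lambda x: np.cos(x) + 2 * x        # its derivative

lhs = np.mean(X * h(X))
rhs = sigma**2 * np.mean(dh(X))
print(lhs, rhs)                         # agree up to Monte Carlo error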

We want to apply this now to our Gaussian random matrices, with Gaussian
random variables xij (1 ≤ i ≤ j ≤ N ) of variance

σij² = 1/N  if i ≠ j,      σij² = 2/N  if i = j,

and for the function


h(xij | i ≤ j) = h(A) = [RA(z)]lk ,   where   RA(z) = (A − z1)^{−1} .
To use Stein’s identity 5.3 in this case we need the partial derivatives of the resol-
vents.

Lemma 5.4. For A = (xij)_{i,j=1}^N with xij = xji for all i, j, we have for all i, j, k, l:

∂/∂xij [RA(z)]lk = −[RA(z)]li · [RA(z)]ik                                     if i = j,
∂/∂xij [RA(z)]lk = −[RA(z)]li · [RA(z)]jk − [RA(z)]lj · [RA(z)]ik             if i ≠ j.

Proof. Note first that

∂A/∂xij = Eii   if i = j,      ∂A/∂xij = Eij + Eji   if i ≠ j,

where Eij is a matrix unit with 1 at position (i, j) and 0 elsewhere.
We have RA(z) · (A − z1) = 1, which yields by differentiating

∂RA(z)/∂xij · (A − z1) + RA(z) · ∂A/∂xij = 0,

and thus

∂RA(z)/∂xij = −RA(z) · ∂A/∂xij · RA(z).

This gives, for i = j,

∂/∂xii [RA(z)]lk = −[RA(z) · Eii · RA(z)]lk = −[RA(z)]li · [RA(z)]ik ,

and, for i ≠ j,

∂/∂xij [RA(z)]lk = −[RA(z) · Eij · RA(z)]lk − [RA(z) · Eji · RA(z)]lk
                 = −[RA(z)]li · [RA(z)]jk − [RA(z)]lj · [RA(z)]ik .
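A minimal numerical sketch (assuming NumPy; the indices and the test matrix are arbitrary choices) comparing the formula of Lemma 5.4 with a finite-difference approximation of the derivative:

import numpy as np

rng = np.random.default_rng(2)
N, z, eps = 6, 0.3 + 1.0j, 1e-6
G = rng.normal(size=(N, N)); A = (G + G.T) / 2            # a symmetric test matrix
R = lambda M: np.linalg.inv(M - z * np.eye(N))

i, j, l, k = 1, 3, 0, 4                                    # arbitrary indices with i != j
E = np.zeros((N, N)); E[i, j] = E[j, i] = 1                # perturbation of x_ij (= x_ji)
num = (R(A + eps * E)[l, k] - R(A)[l, k]) / eps            # finite difference

Rz = R(A)
formula = -Rz[l, i] * Rz[j, k] - Rz[l, j] * Rz[i, k]       # Lemma 5.4, case i != j
print(num, formula)                                        # agree up to O(eps)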

5.3 Semicircle law for GOE


Theorem 5.5. Let AN be goe random matrices as in 5.1. Then its averaged eigen-
value distribution µN := E [µAN ] converges weakly to the semicircle distribution:
µN → µW .
Proof. By Theorem 4.14, it suffices to show limN →∞ SµN (z) = SµW (z) for all z ∈ C+ .
In Remark 5.2(4) we have seen that (where we write A instead of AN )
SµN(z) = E[tr(RA(z))] = −1/z + (1/z) E[tr[ARA(z)]] .

Now we calculate, with A = (xij)_{i,j=1}^N ,

E[tr[ARA(z)]] = (1/N) Σ_{k,l=1}^N E[ xkl · [RA(z)]lk ]
             = (1/N) Σ_{k,l=1}^N σkl² · E[ ∂/∂xkl [RA(z)]lk ]
             = −(1/N) Σ_{k,l=1}^N (1/N) · ( [RA(z)]lk · [RA(z)]lk + [RA(z)]ll · [RA(z)]kk ) .
(Note that the combination of the different covariances and of the different form of
the formula in Lemma 5.4 for on-diagonal and for off-diagonal entries gives in the
end the same result for all pairs of (k, l).)
Now note that (A − z1) is symmetric, hence the same is true for its inverse
RA (z) = (A − z1)−1 and thus: [RA (z)]lk = [RA (z)]kl . Thus we get finally
E[tr[ARA(z)]] = −(1/N) E[tr[RA(z)²]] − E[tr[RA(z)] · tr[RA(z)]] .

To proceed further we need to deal with the two summands on the right hand side;
we expect
• the first term, (1/N) E[tr[RA(z)²]], should go to zero, for N → ∞
• the second term, E[tr[RA(z)] · tr[RA(z)]], should be close to its factorized ver-
sion E[tr[RA(z)]] · E[tr[RA(z)]] = SµN(z)²
Both these ideas are correct; let us try to make them rigorous.
• A as a symmetric matrix can be diagonalized by an orthogonal matrix U,

A = U diag(λ1 , . . . , λN) U*   and thus   RA(z)² = U diag( 1/(λ1 − z)² , . . . , 1/(λN − z)² ) U* ,

which yields

| tr[RA(z)²] | ≤ (1/N) Σ_{i=1}^N | 1/(λi − z)² | .

Note that for all λ ∈ R and all z ∈ C+

| 1/(λ − z) | ≤ 1/Im z

and hence

(1/N) | E[tr(RA(z)²)] | ≤ (1/N) E[ | tr[RA(z)²] | ] ≤ (1/N) · 1/(Im z)² → 0   for N → ∞.

• By definition of the variance we have

Var[X] := E[(X − E[X])²] = E[X²] − E[X]² ,

and thus

E[X²] = E[X]² + Var[X] .

Hence we can replace E[tr[RA(z)] · tr[RA(z)]] by

E[tr[RA(z)]]² + Var[tr[RA(z)]] = SµN(z)² + Var[SµA(z)] .

In the next chapter we will show that we have concentration, i.e., the variance
Var[SµA(z)] goes to zero for N → ∞.
With those two ingredients we have then

SµN(z) = −1/z + (1/z) E[tr[ARA(z)]] = −1/z − (1/z) SµN(z)² + εN ,

where εN → 0 for N → ∞.
Note that, as above, for any Stieltjes transform Sν we have
|Sν(z)| = | ∫ 1/(t − z) dν(t) | ≤ ∫ | 1/(t − z) | dν(t) ≤ 1/Im z ,
and thus (SµN (z))N is a bounded sequence of complex numbers. Hence, by compact-
ness, there exists a convergent subsequence (SµN (m) (z))m , which converges to some
S(z). This S(z) must then satisfy the limit N → ∞ of the above equation, thus
S(z) = −1/z − (1/z) S(z)² .

Since all SµN(z) are in C+ , the limit S(z) must be in C+ , which leaves for S(z) only
the possibility that

S(z) = ( −z + √(z² − 4) ) / 2 = SµW(z)
(as the other solution is in C− ).
In the same way, it follows that any subsequence of (SµN (z))N has a convergent
subsequence which converges to SµW (z); this forces all cluster points of (SµN (z))N
to be SµW(z). Thus the whole sequence converges to SµW(z). This holds for any
z ∈ C+ , and thus implies that µN → µW weakly.
To complete the proof we still have to see the concentration to get the asymptotic
vanishing of the variance. We will address such concentration questions in the next
chapter.
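A minimal numerical sketch (assuming NumPy; sample sizes are arbitrary) of the statement just proved: the averaged trace of the resolvent of a goe matrix is compared with the semicircle Stieltjes transform SµW(z) = (−z + √(z² − 4))/2.

import numpy as np

rng = np.random.default_rng(3)
N, z, samples = 400, 0.5 + 0.8j, 50

def goe(N):
    G = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
    return (G + G.T) / np.sqrt(2.0)

# Monte Carlo average of tr[(A - z)^{-1}] over independent GOE matrices.
vals = []
for _ in range(samples):
    eigs = np.linalg.eigvalsh(goe(N))
    vals.append(np.mean(1.0 / (eigs - z)))
S_N = np.mean(vals)

# Semicircle Stieltjes transform; the branch is chosen so that S(z) lies in C+.
S_W = (-z + np.sqrt(z * z - 4)) / 2
if S_W.imag < 0:
    S_W = (-z - np.sqrt(z * z - 4)) / 2
print(S_N, S_W)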

6 Concentration Phenomena and
Stronger Forms of Convergence
for the Semicircle Law
6.1 Forms of convergence
Remark 6.1. (1) Recall that our random matrix ensemble is given by probability
measures PN on sets ΩN of N × N matrices and we want to see that µAN con-
verges weakly to µW , or, equivalently, that, for all z ∈ C+ , SµAN (z) converges
to SµW (z). There are different levels of this convergence with respect to PN :
(i) convergence in average, i.e.,
E[SµAN(z)] −→ SµW(z)   for N → ∞

(ii) convergence in probability, i.e.,


PN{AN | |SµAN(z) − SµW(z)| ≥ ε} −→ 0   for N → ∞, for all ε > 0

(iii) almost sure convergence, i.e.,

P{(AN )N | SµAN (z) does not converge to SµW (z)} = 0;

instead of making this more precise, let us just point out that this al-
most sure convergence is guaranteed, by the Borel-Cantelli Lemma, if
the convergence in (ii) to zero is sufficiently fast in N , so that for all ε > 0

Σ_{N=1}^∞ PN{AN | |SµAN(z) − SµW(z)| ≥ ε} < ∞.

(2) Note that we have here convergence of probabilistic quantities to a deter-


ministic limit, thus (ii) and (iii) are saying that for large N the eigenvalue
distribution of AN concentrates in a small neighborhood of µW . This is an in-
stance of a quite general “concentration of measure” phenomenon; according

to a dictum of M. Talagrand: “A random variable that depends (in a smooth
way) on the influence of many independent variables (but not too much on
any of them) is essentially constant.”
(3) Note also that many classical results in probability theory (like law of large
numbers) can be seen as instances of this, dealing with linear functions. How-
ever, this principle also applies to non-linear functions - like in our case, to
tr[(A − z1)−1 ], considered as function of the entries of A.
(4) Often control of the variance of the considered function is a good way to get
concentration estimates. We develop in the following some of the basics for
this.

6.2 Markov’s and Chebyshev’s inequality


Notation 6.2. A probability space (Ω, P) consists of a set Ω, equipped with a σ-
algebra of measurable sets, and a probability measure P on the measurable sets of
Ω. A random variable X is a measurable function X : Ω → R; its expectation or
mean is given by
E[X] := ∫_Ω X(ω) dP(ω),

and its variance is given by

Var[X] := E[(X − E[X])²] = E[X²] − E[X]² = ∫_Ω (X(ω) − E[X])² dP(ω).

Theorem 6.3 (Markov’s Inequality). Let X be a random variable taking non-
negative values. Then, for any t > 0,

P{ω | X(ω) ≥ t} ≤ E[X] / t .

E [X] could here also be ∞, but then the statement is not very useful. The
Markov inequality only gives useful information if X has finite mean, and then only
for t > E [X].

Proof. Since X(ω) ≥ 0 for all ω ∈ Ω we can estimate as follows:
E[X] = ∫ X(ω) dP(ω)
     = ∫_{X(ω)≥t} X(ω) dP(ω) + ∫_{X(ω)<t} X(ω) dP(ω)
     ≥ ∫_{X(ω)≥t} X(ω) dP(ω)
     ≥ ∫_{X(ω)≥t} t dP(ω)
     = t · P{X(ω) ≥ t}.

Theorem 6.4 (Chebyshev’s Inequality). Let X be a random variable with finite
mean E[X] and variance Var[X]. Then, for any ε > 0,

P{ω | |X(ω) − E[X]| ≥ ε} ≤ Var[X] / ε² .

Proof. We use Markov’s inequality 6.3 for the positive random variable Y := (X − E[X])².
Note that

E[Y] = E[(X − E[X])²] = Var[X] .

Thus we have for ε > 0

P{ω | |X(ω) − E[X]| ≥ ε} = P{ω | (X(ω) − E[X])² ≥ ε²}
                         = P{ω | Y(ω) ≥ ε²}
                         ≤ E[Y] / ε²
                         = Var[X] / ε² .

Remark 6.5. Our goal will thus be to control the variance of X = f (X1 , . . . , Xn ) for
X1 , . . . , Xn independent random variables. (In our case, the Xi will be the entries
of the GOE matrix A and f will be the function f = tr[(A − z1)−1 ].) A main idea in
this context is to have estimates which go over from separate control of each variable
to control of all variables together; i.e., which are stable under tensorization. There
are two prominent types of such estimates, namely

(i) Poincaré inequality
(ii) LSI=logarithmic Sobolev inequality
We will focus here on (i) and say a few words on (ii) later.

6.3 Poincaré inequality


Definition 6.6. A random variable X = (X1 , . . . , Xn ) : Ω → Rn satisfies a Poincaré
inequality with constant c > 0 if for any differentiable function f : Rn → R with
E [f (X)2 ] < ∞ we have
Var[f(X)] ≤ c · E[ ‖∇f(X)‖₂² ]   where   ‖∇f‖₂² = Σ_{i=1}^n (∂f/∂xi)² .

Remark 6.7. Let us write this also “explicitly” in terms of the distribution µ of the
random variable X : Ω → Rn ; recall that µ is the push-forward of the probability
measure P under the map X to a probability measure on Rn . In terms of µ we have
then

E[f(X)] = ∫_{R^n} f(x1 , . . . , xn) dµ(x1 , . . . , xn)

and the Poincaré inequality asks then for

∫_{R^n} ( f(x1 , . . . , xn) − E[f(X)] )² dµ(x1 , . . . , xn)
   ≤ c · Σ_{i=1}^n ∫_{R^n} ( ∂f/∂xi (x1 , . . . , xn) )² dµ(x1 , . . . , xn).

Theorem 6.8 (Efron-Stein Inequality). Let X1 , . . . , Xn be independent random vari-
ables and let f(X1 , . . . , Xn) be a square-integrable function of X = (X1 , . . . , Xn).
Then we have

Var[f(X)] ≤ Σ_{i=1}^n E[ Var(i)[f(X)] ] ,

where Var(i) denotes taking the variance in the i-th variable, keeping all the other
variables fixed, and the expectation is then integrating over all the other variables.

Proof. We denote the distribution of Xi by µi ; this is, for each i, a probability mea-
sure on R. Since X1 , . . . , Xn are independent, the distribution of X = (X1 , . . . , Xn )
is given by the product measure µ1 × · · · × µn on Rn .

Putting Z = f (X1 , . . . , Xn ), we have
E[Z] = ∫_{R^n} f(x1 , . . . , xn) dµ1(x1) . . . dµn(xn)

and

Var[Z] = ∫_{R^n} ( f(x1 , . . . , xn) − E[Z] )² dµ1 . . . dµn .
We will now do the integration E by integrating one variable at a time and control
each step. For this we write
Z − E [Z] = Z − E1 [Z]
+ E1 [Z] − E1,2 [Z]
+ E1,2 [Z] − E1,2,3 [Z]
..
.
+ E1,2,...,n−1 [Z] − E [Z] ,
where E1,...,k denotes integration over the variables x1 , . . . , xk , leaving a function of
the variables xk+1 , . . . , xn . Thus, with
∆i := E1,...,i−1[Z] − E1,...,i−1,i[Z]

(which is a function of the variables xi , xi+1 , . . . , xn), we have Z − E[Z] = Σ_{i=1}^n ∆i ,
and thus

Var[Z] = E[(Z − E[Z])²]
       = E[ ( Σ_{i=1}^n ∆i )² ]
       = Σ_{i=1}^n E[∆i²] + Σ_{i≠j} E[∆i ∆j] .

Now observe that for all i 6= j we have E [∆i ∆j ] = 0. Indeed, consider, for example,
n = 2 and i = 1, j = 2:
E[∆1 ∆2] = E[ (Z − E1[Z]) · (E1[Z] − E1,2[Z]) ]
         = ∫ ( f(x1 , x2) − ∫ f(x̃1 , x2) dµ1(x̃1) ) ·
             ( ∫ f(x̃1 , x2) dµ1(x̃1) − ∫∫ f(x̃1 , x̃2) dµ1(x̃1) dµ2(x̃2) ) dµ1(x1) dµ2(x2)

Integration with respect to x1 now affects only the first factor and integrating this
gives zero. The general case i 6= j works in the same way. Thus we get
Var[Z] = Σ_{i=1}^n E[∆i²] .

We denote now with E(i) integration with respect to the variable xi , leaving a func-
tion of the other variables x1 , . . . , xi−1 , xi+1 , . . . , xn , and
Var(i) [Z] := E(i) [(Z − E(i) [Z])2 ].
Then we have
∆i = E1,...,i−1 [Z] − E1,...,i [Z] = E1,...,i−1 [Z − E(i) [Z]],
and thus by Jensen’s inequality (which is here just the fact that variances are non-
negative),
∆2i ≤ E1,...,i−1 [(Z − E(i) [Z])2 ].
This gives finally
Var[Z] = Σ_{i=1}^n E[∆i²] ≤ Σ_{i=1}^n E[ E(i)[(Z − E(i)[Z])²] ] ,

which is the assertion.


Theorem 6.9. Let X1 , . . . , Xn be independent random variables in R, such that
each Xi satisfies a Poincaré inequality with constant ci . Then X = (X1 , . . . , Xn )
satisfies a Poincaré inequality in Rn with constant c = max(c1 , . . . , cn ).
Proof. By the Efron-Stein inequality 6.8, we have
Var[f(X)] ≤ Σ_{i=1}^n E[ Var(i)[f(X)] ]
          ≤ Σ_{i=1}^n E[ ci · E(i)[ (∂f/∂xi)² ] ]
          ≤ c · Σ_{i=1}^n E[ (∂f/∂xi)² ]
          = c · E[ ‖∇f(X)‖₂² ] .
In the step from the first to the second line we have used, for fixed i, the Poincaré
inequality for Xi and the function xi 7→ f (x1 , . . . , xi−1 , xi , xi+1 , . . . , xn ), for each
fixed x1 , . . . , xi−1 , xi+1 , . . . , xn .

Theorem 6.10 (Gaussian Poincaré Inequality). Let X1 , . . . , Xn be independent
standard Gaussian random variables, E [Xi ] = 0 and E [Xi2 ] = 1. Then X =
(X1 , . . . , Xn ) satisfies a Poincaré inequality with constant 1; i.e., for each continu-
ously differentiable f : Rn → R we have
Var[f(X)] ≤ E[ ‖∇f(X)‖₂² ] .

Remark 6.11. (1) Note that the constant is independent of n!


(2) By Theorem 6.9 it suffices to prove the statement for n = 1. But even in this
one-dimensional case the statement is not obvious. Let us see what we are
actually claiming in this case: X is a standard Gaussian random variable and
f : R → R, and the Poincaré inequality says
Var[f(X)] ≤ E[f′(X)²] .

We might also assume that E[f(X)] = 0, then this means explicitly:

∫_R f(x)² e^{−x²/2} dx ≤ ∫_R f′(x)² e^{−x²/2} dx.

Proof. As remarked in Remark 6.11, the general case can, by Theorem 6.9, be
reduced to the one-dimensional case and, by shifting our function f by a constant,
we can also assume that f (X) has mean zero. One possible proof is to approximate
X via a central limit theorem by independent Bernoulli variables Yi .
So let Y1 , Y2 , . . . be independent Bernoulli variables, i.e., P [Yi = 1] = 1/2 =
P [Yi = −1] and put
Sn = (Y1 + · · · + Yn) / √n .

Then, by the central limit theorem, the distribution of Sn converges weakly, for
n → ∞, to a standard Gaussian distribution. So we can approximate f (X) by

g(Y1 , . . . , Yn) = f(Sn) = f( (Y1 + · · · + Yn) / √n ) .
By the Efron-Stein inequality 6.8, we have
Var[f(Sn)] = Var[g(Y1 , . . . , Yn)]
           ≤ Σ_{i=1}^n E[ Var(i)[g(Y1 , . . . , Yn)] ]
           = Σ_{i=1}^n E[ Var(i)[f(Sn)] ] .

Put

Sn[i] := Sn − (1/√n) Yi = (1/√n) (Y1 + · · · + Yi−1 + Yi+1 + · · · + Yn).

Then

E(i)[f(Sn)] = (1/2) ( f(Sn[i] + 1/√n) + f(Sn[i] − 1/√n) )

and

Var(i)[f(Sn)] = (1/2) [ ( f(Sn[i] + 1/√n) − E(i)[f(Sn)] )² + ( f(Sn[i] − 1/√n) − E(i)[f(Sn)] )² ]
              = (1/4) ( f(Sn[i] + 1/√n) − f(Sn[i] − 1/√n) )² ,

and thus

Var[f(Sn)] ≤ (1/4) Σ_{i=1}^n E[ ( f(Sn[i] + 1/√n) − f(Sn[i] − 1/√n) )² ] .

By Taylor’s theorem we have now

f(Sn[i] + 1/√n) = f(Sn[i]) + (1/√n) f′(Sn[i]) + (1/(2n)) f″(ξ+)
f(Sn[i] − 1/√n) = f(Sn[i]) − (1/√n) f′(Sn[i]) + (1/(2n)) f″(ξ−)

We assume that f is twice differentiable and f′ and f″ are bounded: |f′(ξ)| ≤ K
and |f″(ξ)| ≤ K for all ξ ∈ R. (The general situation can be approximated by this.)
Then we have

( f(Sn[i] + 1/√n) − f(Sn[i] − 1/√n) )² = ( (2/√n) f′(Sn[i]) + (1/(2n)) (f″(ξ+) − f″(ξ−)) )²
                                       = (4/n) f′(Sn[i])² + (2/n^{3/2}) R1 R2 + (1/(4n²)) R2² ,

where we have put R1 := f′(Sn[i]) and R2 := f″(ξ+) − f″(ξ−). Note that |R1| ≤ K
and |R2| ≤ 2K, and thus

Var[f(Sn)] ≤ (1/4) n ( (4/n) E[f′(Sn[i])²] + (2/n^{3/2}) 2K² + (1/(4n²)) 4K² ) .

Note that the first term containing Sn[i] is actually independent of i. Now we take
the limit n → ∞ in this inequality; since both Sn and Sn[i] converge to our standard
Gaussian variable X we obtain finally the wanted
Var[f(X)] ≤ E[f′(X)²] .

6.4 Concentration for tr[RA(z)] via Poincaré inequality
We apply this Gaussian Poincaré inequality now to our random matrix setting A =
(xij)_{i,j=1}^N , where {xij | i ≤ j} are independent Gaussian random variables with
E[xij] = 0 and E[xii²] = 2/N on the diagonal and E[xij²] = 1/N (i ≠ j) off the
diagonal. Note that by a change of variable the constant in the Poincaré inequality
for these variances is given by max{σij² | i ≤ j} = 2/N. Thus we have in our setting
for nice real-valued f :

Var[f(A)] ≤ (2/N) · E[ ‖∇f(A)‖₂² ] .

We take now

g(A) := tr[(A − z1)^{−1}] = tr[RA(z)]

and want to control Var[g(AN)] for N → ∞. Note that g is complex-valued (since
z ∈ C+), but we can estimate

|Var[g(A)]| = |Var[Re g(A) + √−1 Im g(A)]| ≤ 2 ( Var[Re g(A)] + Var[Im g(A)] ) .

Thus it suffices to estimate the variance of real and imaginary part of g(A).
We have, for i < j,

∂g(A)/∂xij = ∂/∂xij tr[RA(z)]
           = (1/N) Σ_{k=1}^N ∂[RA(z)]kk / ∂xij
           = −(1/N) Σ_{k=1}^N ( [RA(z)]ki · [RA(z)]jk + [RA(z)]kj · [RA(z)]ik )    by Lemma 5.4
           = −(2/N) Σ_{k=1}^N [RA(z)]ik · [RA(z)]kj    since RA(z) is symmetric, see proof of 5.5
           = −(2/N) [RA(z)²]ij ,
and the same for i = j with 2/N replaced by 1/N .
Thus we get for f (A) := Re g(A) = Re tr[RA (z)]:

| ∂f(A)/∂xij | = | Re ∂g(A)/∂xij | ≤ (2/N) |[RA(z)²]ij| ≤ (2/N) ‖RA(z)²‖ ≤ 2 / (N · (Im z)²) ,

where in the last step we used the usual estimate for resolvents as in the proof of
Theorem 5.5. Hence we have

| ∂f(A)/∂xij |² ≤ 4 / (N² · (Im z)⁴) ,

and thus our Gaussian Poincaré inequality 6.10 (with constant 2/N) yields

Var[f(A)] ≤ (2/N) · Σ_{i≤j} | ∂f(A)/∂xij |² ≤ 8 / (N · (Im z)⁴) .

The same estimate holds for the imaginary part and thus, finally, we have for the
variance of the trace of the resolvent:

Var[tr[RA(z)]] ≤ 32 / (N · (Im z)⁴) .

The fact that Var [tr[RA (z)]] goes to zero for N → ∞ closes the gap in our proof
of Theorem 5.5. Furthermore, it also improves the type of convergence in Wigner’s
semicircle law.
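A minimal numerical sketch (assuming NumPy; sample sizes are arbitrary) of this concentration: the sample variance of tr[RA(z)] over independent goe matrices is compared with the bound 32/(N (Im z)⁴), and its decay in N is visible directly.

import numpy as np

rng = np.random.default_rng(4)
z, samples = 0.5 + 1.0j, 200

def trace_resolvent(N):
    G = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
    A = (G + G.T) / np.sqrt(2.0)
    eigs = np.linalg.eigvalsh(A)
    return np.mean(1.0 / (eigs - z))          # normalized trace of the resolvent

for N in (100, 200, 400):
    vals = np.array([trace_resolvent(N) for _ in range(samples)])
    var = np.var(vals.real) + np.var(vals.imag)   # empirical proxy for the variance
    print(N, var, 32 / (N * z.imag**4))           # the bound from above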

Theorem 6.12. Let AN be goe random matrices as in 5.1. Then the eigenvalue
distribution µAN converges in probability to the semicircle distribution. Namely, for
each z ∈ C+ and all ε > 0 we have

lim PN {AN | |SµAN (z) − SµW (z)| ≥ ε} = 0.


N →∞

Proof. By the Chebyshev inequality 6.4, our above estimate for the variance implies
for any ε > 0 that

PN{AN | | tr[RAN(z)] − E[tr[RAN(z)]] | ≥ ε} ≤ Var[tr[RAN(z)]] / ε²
                                            ≤ 32 / (N · (Im z)⁴ · ε²) −→ 0   for N → ∞.

Since we already know, by Theorem 5.5, that limN →∞ E [tr[RAN (z)]] = SµW (z), this
gives the assertion.
Remark 6.13. Note that our estimate Var [...] ∼ 1/N is not strong enough to get
almost sure convergence; one can, however, improve our arguments to get Var [...] ∼
1/N 2 , which implies then also almost sure convergence.

6.5 Logarithmic Sobolev inequalities


One actually has typically even exponential convergence in N. Such stronger con-
centration estimates rely usually on so-called logarithmic Sobolev inequalities.

Definition 6.14. A probability measure µ on Rn satisfies a logarithmic Sobolev in-
equality (LSI) with constant c > 0, if for all nice f :

Entµ(f²) ≤ 2c ∫_{R^n} ‖∇f‖₂² dµ,

where

Entµ(f) := ∫_{R^n} f log f dµ − ∫_{R^n} f dµ · log( ∫_{R^n} f dµ )

is an entropy-like quantity.

Remark 6.15. (1) As for Poincaré inequalities, logarithmic Sobolev inequalities are
stable under tensorization and Gaussian measures satisfy LSI.
(2) From a logarithmic Sobolev inequality one can then derive a concentration
inequality for our random matrices of the form
PN{AN | | tr[RAN(z)] − E[tr[RAN(z)]] | ≥ ε} ≤ const · exp( −(N² ε²/2) · (Im z)⁴ ).

7 Analytic Description of the
Eigenvalue Distribution of
Gaussian Random Matrices

In Exercise 7 we showed that the joint distribution of the entries aij = xij + √−1 yij
of a gue A = (aij)_{i,j=1}^N has density

c · exp(−(N/2) Tr A²) dA.

This clearly shows the invariance of the distribution under unitary transformations:
Let U be a unitary N × N matrix and let B = U*AU = (bij)_{i,j=1}^N . Then we have
Tr B² = Tr A² and the volume element is invariant under unitary transformations,
dB = dA. Therefore, for the joint distributions of entries of A and of B, respectively,
we have

c · exp(−(N/2) Tr B²) dB = c · exp(−(N/2) Tr A²) dA.
Thus the joint distribution of entries of a gue is invariant under unitary trans-
formations, which explains the name Gaussian Unitary Ensemble. What we are
interested in, however, are not the entries but the eigenvalues of our matrices. Thus
we should transform this density from entries to eigenvalues. Instead of gue, we
will mainly consider the real case, i.e., goe.

7.1 Joint eigenvalue distribution for GOE and GUE
Let us recall the definition of goe, see also Definition 5.1.
Definition 7.1. A Gaussian orthogonal random matrix (goe) A = (xij )N i,j=1 is given
by real-valued entries xij with xij = xji for all i, j = 1, . . . , N and joint distribution
cN exp(−(N/4) Tr A²) ∏_{i≥j} dxij .

With goe(n) we denote the goe of size N × N .
Remark 7.2. (1) This is clearly invariant under orthogonal transformation of the
entries.
(2) This is equivalent to independent real Gaussian random variables. Note, how-
ever, that the variance for the diagonal entries has to be chosen differently
from the off-diagonals; see Remark 5.2. Let us check this for N = 2 with
A = [ x11  x12 ; x12  x22 ] .

Then

exp(−(N/4) Tr A²) = exp(−(N/4)(x11² + 2 x12² + x22²))
                  = exp(−(N/4) x11²) exp(−(N/2) x12²) exp(−(N/4) x22²) ;

those give the density of a Gaussian of variance 2/N for x11 and x22 and of
variance 1/N for x12 .
(3) From this one can easily determine the normalization constant cN (as a func-
tion of N ).
Since we are usually interested in functions of the eigenvalues, we will now trans-
form this density to eigenvalues.
Example 7.3. As a warmup, let us consider the goe(2) case,
A = [ x11  x12 ; x12  x22 ]   with density   p(A) = c2 exp(−(N/4) Tr A²).

We parametrize A by its eigenvalues λ1 and λ2 and an angle θ by diagonalization
A = OᵀDO, where

D = [ λ1  0 ; 0  λ2 ]   and   O = [ cos θ  −sin θ ; sin θ  cos θ ] ;

explicitly

x11 = λ1 cos²θ + λ2 sin²θ,
x12 = (λ1 − λ2) cos θ sin θ,
x22 = λ1 sin²θ + λ2 cos²θ.

Note that O and D are not uniquely determined by A. In particular, if λ1 = λ2 then
any orthogonal O works. However, this case has probability zero and thus can be
ignored (see Remark 7.4). If λ1 6= λ2 , then we can choose λ1 < λ2 ; O contains then
the normalized eigenvectors for λ1 and λ2 . Those are unique up to a sign, which
can be fixed by requiring that cos θ ≥ 0. Hence θ is not running from −π to π, but
instead it can be restricted to [−π/2, π/2]. We will now transform

p(x11 , x22 , x12 ) dx11 dx22 dx12 → q(λ1 , λ2 , θ) dλ1 dλ2 dθ

by the change of variable formula q = p |det DF |, where DF is the Jacobian of

F : (x11 , x22 , x12 ) 7→ (λ1 , λ2 , θ).

We calculate

det DF = det [ cos²θ         sin²θ         −2(λ1 − λ2) sin θ cos θ
               cos θ sin θ   −cos θ sin θ   (λ1 − λ2)(cos²θ − sin²θ)
               sin²θ         cos²θ          2(λ1 − λ2) sin θ cos θ ]  = −(λ1 − λ2),

and hence |det DF| = |λ1 − λ2|. Thus,

q(λ1 , λ2 , θ) = c2 e^{−(N/4) Tr(A²)} |λ1 − λ2| = c2 e^{−(N/4)(λ1² + λ2²)} |λ1 − λ2| .

Note that q is independent of θ, i.e., we have a uniform distribution in θ. Consider
a function f = f(λ1 , λ2) of the eigenvalues. Then

E[f(λ1 , λ2)] = ∫∫∫ q(λ1 , λ2 , θ) f(λ1 , λ2) dλ1 dλ2 dθ
              = ∫_{−π/2}^{π/2} ∫∫_{λ1<λ2} f(λ1 , λ2) c2 e^{−(N/4)(λ1² + λ2²)} |λ1 − λ2| dλ1 dλ2 dθ
              = π c2 ∫∫_{λ1<λ2} f(λ1 , λ2) e^{−(N/4)(λ1² + λ2²)} |λ1 − λ2| dλ1 dλ2 .

Thus, the density for the joint distribution of the eigenvalues on {(λ1 , λ2); λ1 < λ2}
is given by

c̃2 · e^{−(N/4)(λ1² + λ2²)} |λ1 − λ2|   with c̃2 = πc2 .

Remark 7.4. Let us check that the probability of λ1 = λ2 is zero.
λ1 , λ2 are the solutions of the characteristic equation

0 = det(λI − A) = (λ − x11)(λ − x22) − x12²
                = λ² − (x11 + x22)λ + (x11 x22 − x12²)
                = λ² − bλ + c.

Then there is only one solution if and only if the discriminant d = b² − 4c is zero.
However,
{(x11 , x22 , x12 ); d(x11 , x22 , x12 ) = 0}
is a two-dimensional surface in R3 , i.e., its Lebesgue measure is zero.
Now we consider general goe(n).
Theorem 7.5. The joint distribution of the eigenvalues of a goe(n) is given by a
density
c̃N e^{−(N/4)(λ1² + ··· + λN²)} ∏_{k<l} (λl − λk)

restricted on λ1 < · · · < λN .


Proof. In terms of the entries of the goe matrix A we have density
p(xkl | k ≥ l) = cN e^{−(N/4) Tr A²} ,

where A = (xkl)_{k,l=1}^N with xkl real and xkl = xlk for all l, k. Again we diagonalize
A = OᵀDO with O orthogonal and D = diag(λ1 , . . . , λN) with λ1 ≤ · · · ≤ λN .
As before, degenerated eigenvalues have probability zero, hence this case can be
neglected and we assume λ1 < · · · < λN . We parametrize O via O = e−H by a
skew-symmetric matrix H, that is, H T = −H, i.e., H = (hij )N i,j=1 with hij ∈ R and
hij = −hji for all i, j. In particular, hii = 0 for all i. We have
Oᵀ = (e^{−H})ᵀ = e^{−Hᵀ} = e^{H}

and thus O is indeed orthogonal:

OᵀO = e^H e^{−H} = e^{H−H} = e^0 = I = OOᵀ .

O = e−H is actually a parametrization of the Lie group SO(N ) by the Lie algebra
so(N ) of skew-symmetric matrices.
Note that our parametrization A = eH De−H has the right number of parameters.
For A we have the variables {xij ; j ≤ i} and for eH De−H we have the N eigenvalues

{λ1 , . . . , λN} and the (N² − N)/2 many parameters {hij ; i > j}. In both cases we
have N(N + 1)/2 many variables. This parametrization is locally bijective; so we
need to compute the Jacobian of the map S : A ↦ e^H D e^{−H} . We have

dA = (de^H) D e^{−H} + e^H (dD) e^{−H} + e^H D (de^{−H})
   = e^H [ e^{−H} (de^H) D + dD + D (de^{−H}) e^H ] e^{−H} .

This transports the calculation of the derivative at any arbitrary point e^H to the
identity element I = e^0 in the Lie group. Since the Jacobian is preserved under this
transformation, it suffices to calculate the Jacobian at H = 0, i.e., for e^H = I and
de^H = dH. Then

dA = dH · D − D · dH + dD,

i.e.,

dxij = (λj − λi) dhij + δij dλi .

This means that we have

∂xij/∂λk = δij δik   and   ∂xij/∂hkl = δik δjl (λl − λk).

Hence the Jacobian is given by

J = det DS = ∏_{k<l} (λl − λk).

Thus,
q(λ1 , . . . , λN , hkl) = p(xij | i ≥ j) J = cN e^{−(N/4) Tr A²} ∏_{k<l} (λl − λk)
                                           = cN e^{−(N/4)(λ1² + ··· + λN²)} ∏_{k<l} (λl − λk).

This is independent of the “angles” hkl , so integrating over those variables just
changes the constant cN into another constant c̃N .
In a similar way, the complex case can be treated; see Exercise 19. One gets the
following.
Theorem 7.6. The joint distribution of the eigenvalues of a gue(n) is given by a
density
ĉN e^{−(N/2)(λ1² + ··· + λN²)} ∏_{k<l} (λl − λk)²

restricted on λ1 < · · · < λN .

7.2 Rewriting the Vandermonde
Definition 7.7. The function
∆(λ1 , . . . , λN) = ∏_{k<l} (λl − λk)

is called the Vandermonde determinant.


Proposition 7.8. For λ1 , . . . , λN ∈ R we have that
∆(λ1 , . . . , λN) = det( λj^{i−1} )_{i,j=1}^N ,

i.e., the determinant of the N × N matrix whose i-th row is (λ1^{i−1}, λ2^{i−1}, . . . , λN^{i−1}).

Proof. det(λj^{i−1}) is a polynomial in λ1 , . . . , λN . If λl = λk for some l, k ∈ {1, . . . , N}
then det(λj^{i−1}) = 0. Thus det(λj^{i−1}) contains a factor λl − λk for each k < l, hence
∆(λ1 , . . . , λN) divides det(λj^{i−1}).
Since det(λj^{i−1}) is a sum of products with one factor from each row, we have that
the degree of det(λj^{i−1}) is equal to

0 + 1 + 2 + · · · + (N − 1) = N(N − 1)/2 ,

which is the same as the degree of ∆(λ1 , . . . , λN). This shows that

∆(λ1 , . . . , λN) = c · det(λj^{i−1})   for some c ∈ R.

By comparing the coefficient of 1 · λ2 · λ3² · · · λN^{N−1} on both sides one can check that
c = 1.

The advantage of being able to write our density in terms of a determinant
comes from the following observation: In det(λj^{i−1}) we can add arbitrary linear
combinations of smaller rows to the k-th row without changing the value of the
determinant, i.e., we can replace λ^k by any arbitrary monic polynomial pk(λ) =
λ^k + αk−1 λ^{k−1} + · · · + α1 λ + α0 of degree k. Hence we have the following statement.
Proposition 7.9. Let p0 , . . . , pN−1 be monic polynomials with deg pk = k. Then we
have

det( pi−1(λj) )_{i,j=1}^N = ∆(λ1 , . . . , λN) = ∏_{k<l} (λl − λk).

7.3 Rewriting the GUE density in terms of
Hermite kernels
In the following, we will make a special choice for the pk . We will choose them
as the Hermite polynomials, which are orthogonal with respect to the Gaussian
distribution (1/c) e^{−λ²/2} .

Definition 7.10. The Hermite polynomials Hn are defined by the following require-
ments.
(i) Hn is a monic polynomial of degree n.
(ii) For all n, m ≥ 0:
∫_R Hn(x) Hm(x) (1/√(2π)) e^{−x²/2} dx = δnm n!
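A minimal numerical sketch (assuming NumPy) of this definition: the Hn are generated via the three-term recurrence H_{n+1}(x) = x Hn(x) − n H_{n−1}(x) stated in the remark below, and the orthogonality relation is checked by numerical integration against the Gaussian weight.

import math
import numpy as np

def hermite(n, x):
    # Monic ("probabilists'") Hermite polynomials via H_{n+1}(x) = x H_n(x) - n H_{n-1}(x).
    h_prev, h = np.ones_like(x), x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h

# Check (1/sqrt(2 pi)) * integral of H_n H_m e^{-x^2/2} dx = delta_{nm} * n! numerically.
x = np.linspace(-12.0, 12.0, 200001)
dx = x[1] - x[0]
w = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
for n in range(5):
    for m in range(5):
        val = np.sum(hermite(n, x) * hermite(m, x) * w) * dx
        expected = float(math.factorial(n)) if n == m else 0.0
        assert abs(val - expected) < 1e-3, (n, m, val)
print("orthogonality relations of H_0, ..., H_4 verified")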

Remark 7.11. (1) One can get the Hn (x) from the monomials 1, x, x2 , . . . via
Gram-Schmidt orthogonalization as follows.
• We define an inner product on the polynomials by
⟨f, g⟩ = (1/√(2π)) ∫_R f(x) g(x) e^{−x²/2} dx.

• We put H0 (x) = 1. This is monic of degree 0 with


⟨H0 , H0⟩ = (1/√(2π)) ∫_R e^{−x²/2} dx = 1 = 0!.

• We put H1 (x) = x. This is monic of degree 1 with


⟨H1 , H0⟩ = (1/√(2π)) ∫_R x e^{−x²/2} dx = 0

and

⟨H1 , H1⟩ = (1/√(2π)) ∫_R x² e^{−x²/2} dx = 1 = 1!.

• For H2 , note that


⟨x² , H1⟩ = (1/√(2π)) ∫_R x³ e^{−x²/2} dx = 0

and

⟨x² , H0⟩ = (1/√(2π)) ∫_R x² e^{−x²/2} dx = 1.

Hence we set H2(x) := x² − H0(x) = x² − 1. Then we have

⟨H2 , H0⟩ = 0 = ⟨H2 , H1⟩

and

⟨H2 , H2⟩ = (1/√(2π)) ∫_R (x² − 1)² e^{−x²/2} dx
          = (1/√(2π)) ∫_R (x⁴ − 2x² + 1) e^{−x²/2} dx = 3 − 2 + 1 = 2!

• Continue in this way.


Note that the Hn are uniquely determined by the requirements that Hn is
monic and that hHm , Hn i = 0 for all m 6= n. That we have hHn , Hn i = n!, is
then a statement which has to be proved.
(2) The Hermite polynomials satisfy many explicit relations; important is the
three-term recurrence relation

xHn (x) = Hn+1 (x) + nHn−1 (x)

for all n ≥ 1; see Exercise 22.


(3) The first few Hn are

H0 (x) = 1,
H1 (x) = x,
H2 (x) = x2 − 1,
H3 (x) = x3 − 3x,
H4 (x) = x4 − 6x2 + 3.

(4) By Proposition 7.9, we can now use the Hn for writing our Vandermonde
determinant as
∆(λ1 , . . . , λN ) = det (Hi−1 (λj ))N
i,j=1 .

We want to use this for our gue(n) density
q(λ1 , . . . , λN) = ĉN e^{−(N/2)(λ1² + ··· + λN²)} ∆(λ1 , . . . , λN)²
                 = ĉN e^{−(1/2)(µ1² + ··· + µN²)} ∆(µ1/√N , . . . , µN/√N)²
                 = ĉN e^{−(1/2)(µ1² + ··· + µN²)} ∆(µ1 , . . . , µN)² (1/√N)^{N(N−1)} ,

where the µi = √N λi are the eigenvalues of the “unnormalized” gue ma-
trix √N AN . It will be easier to deal with those. We now will also go
over from ordered eigenvalues λ1 < λ2 < · · · < λN to unordered eigenval-
ues (µ1 , . . . , µN) ∈ R^N . Since in the latter case each ordered tuple shows up
N! times, this gives an additional factor N! in our density. We collect all these
N-dependent factors in our constant c̃N . So we now have the density

p(µ1 , . . . , µN) = c̃N e^{−(1/2)(µ1² + ··· + µN²)} ∆(µ1 , . . . , µN)²
                 = c̃N e^{−(1/2)(µ1² + ··· + µN²)} [ det( Hi−1(µj) )_{i,j=1}^N ]²
                 = c̃N [ det( e^{−µj²/4} Hi−1(µj) )_{i,j=1}^N ]² .

Definition 7.12. The Hermite functions Ψn are defined by


Ψn(x) = (2π)^{−1/4} (n!)^{−1/2} e^{−x²/4} Hn(x).
Remark 7.13. (1) We have
∫_R Ψn(x) Ψm(x) dx = (1/√(2π)) (1/√(n! m!)) ∫_R e^{−x²/2} Hn(x) Hm(x) dx = δnm ,

i.e., the Ψn are orthonormal with respect to the Lebesgue measure. Actually,
they form an orthonormal Hilbert space basis of L²(R).
(2) Now we can continue the calculation

p(µ1 , . . . , µN) = cN [ det( Ψi−1(µj) )_{i,j=1}^N ]²

with a new constant cN . Denote Vij = Ψi−1(µj). Then we have

(det V)² = det Vᵀ det V = det(VᵀV)

such that

(VᵀV)ij = Σ_{k=1}^N Vki Vkj = Σ_{k=1}^N Ψk−1(µi) Ψk−1(µj).

Definition 7.14. The N -th Hermite kernel KN is defined by
KN(x, y) = Σ_{k=0}^{N−1} Ψk(x) Ψk(y).

Collecting all our notations and calculations we have thus proved the following.
Theorem 7.15. The unordered joint eigenvalue distribution of an unnormalized
gue(n) is given by the density

p(µ1 , . . . , µN) = cN det( KN(µi , µj) )_{i,j=1}^N .

Proposition 7.16. KN is a reproducing kernel, i.e.,


∫_R KN(x, u) KN(u, y) du = KN(x, y).

Proof. We calculate

∫_R KN(x, u) KN(u, y) du = ∫_R ( Σ_{k=0}^{N−1} Ψk(x) Ψk(u) ) ( Σ_{l=0}^{N−1} Ψl(u) Ψl(y) ) du
                         = Σ_{k,l=0}^{N−1} Ψk(x) Ψl(y) ∫_R Ψk(u) Ψl(u) du
                         = Σ_{k,l=0}^{N−1} Ψk(x) Ψl(y) δkl
                         = Σ_{k=0}^{N−1} Ψk(x) Ψk(y)
                         = KN(x, y).

Lemma 7.17. Let K : R2 → R be a reproducing kernel, i.e.,


∫_R K(x, u) K(u, y) du = K(x, y).

Put d = ∫_R K(x, x) dx. Then, for all n ≥ 2,

∫_R det( K(µi , µj) )_{i,j=1}^n dµn = (d − n + 1) · det( K(µi , µj) )_{i,j=1}^{n−1} .

We assume that all those integrals make sense, as it is the case for our Hermite
kernels.

Proof. Consider the case n = 2. Then

∫_R det [ K(µ1 , µ1)  K(µ1 , µ2) ; K(µ2 , µ1)  K(µ2 , µ2) ] dµ2
   = K(µ1 , µ1) ∫_R K(µ2 , µ2) dµ2 − ∫_R K(µ1 , µ2) K(µ2 , µ1) dµ2
   = (d − 1) K(µ1 , µ1)
   = (d − 1) det( K(µ1 , µ1) ).

For n = 3,

det [ K(µ1 , µ1)  K(µ1 , µ2)  K(µ1 , µ3)
      K(µ2 , µ1)  K(µ2 , µ2)  K(µ2 , µ3)
      K(µ3 , µ1)  K(µ3 , µ2)  K(µ3 , µ3) ]

 = det [ K(µ2 , µ1)  K(µ2 , µ2) ; K(µ3 , µ1)  K(µ3 , µ2) ] K(µ1 , µ3)
   − det [ K(µ1 , µ1)  K(µ1 , µ2) ; K(µ3 , µ1)  K(µ3 , µ2) ] K(µ2 , µ3)
   + det [ K(µ1 , µ1)  K(µ1 , µ2) ; K(µ2 , µ1)  K(µ2 , µ2) ] K(µ3 , µ3),

with

∫_R det [ K(µ1 , µ1)  K(µ1 , µ2) ; K(µ2 , µ1)  K(µ2 , µ2) ] K(µ3 , µ3) dµ3
   = det [ K(µ1 , µ1)  K(µ1 , µ2) ; K(µ2 , µ1)  K(µ2 , µ2) ] · d,

and

− ∫_R det [ K(µ1 , µ1)  K(µ1 , µ2) ; K(µ3 , µ1)  K(µ3 , µ2) ] K(µ2 , µ3) dµ3
   = − ∫_R det [ K(µ1 , µ1)               K(µ1 , µ2)
                 K(µ2 , µ3) K(µ3 , µ1)    K(µ2 , µ3) K(µ3 , µ2) ] dµ3
   = − det [ K(µ1 , µ1)  K(µ1 , µ2) ; K(µ2 , µ1)  K(µ2 , µ2) ] ,

and

∫_R det [ K(µ2 , µ1)  K(µ2 , µ2) ; K(µ3 , µ1)  K(µ3 , µ2) ] K(µ1 , µ3) dµ3
   = ∫_R det [ K(µ2 , µ1)               K(µ2 , µ2)
               K(µ1 , µ3) K(µ3 , µ1)    K(µ1 , µ3) K(µ3 , µ2) ] dµ3
   = det [ K(µ2 , µ1)  K(µ2 , µ2) ; K(µ1 , µ1)  K(µ1 , µ2) ]
   = − det [ K(µ1 , µ1)  K(µ1 , µ2) ; K(µ2 , µ1)  K(µ2 , µ2) ] .

Putting all terms together gives

∫_R det( K(µi , µj) )_{i,j=1}^3 dµ3 = (d − 2) det( K(µi , µj) )_{i,j=1}^2 .

The general case works in the same way.


Iteration of Lemma 7.17 gives then the following.
Corollary 7.18. Under the assumptions of Lemma 7.17 we have
∫_R · · · ∫_R det( K(µi , µj) )_{i,j=1}^n dµ1 · · · dµn = (d − n + 1)(d − n + 2) · · · (d − 1) d.

Remark 7.19. We want to apply this to the Hermite kernel K = KN . In this case
we have
d = ∫_R KN(x, x) dx
  = ∫_R Σ_{k=0}^{N−1} Ψk(x) Ψk(x) dx
  = Σ_{k=0}^{N−1} ∫_R Ψk(x) Ψk(x) dx
  = N,

and thus, since now d = N = n,

∫_R · · · ∫_R det( KN(µi , µj) )_{i,j=1}^N dµ1 · · · dµN = N!.

This now allows us to determine the constant cN in the density p(µ1 , . . . , µN) in
Theorem 7.15. Since p is a probability density on R^N , we have

1 = ∫_{R^N} p(µ1 , . . . , µN) dµ1 · · · dµN
  = cN ∫_R · · · ∫_R det( KN(µi , µj) )_{i,j=1}^N dµ1 · · · dµN
  = cN N!,

and thus cN = 1/N! .
Theorem 7.20. The unordered joint eigenvalue distribution of an unnormalized
gue(n) is given by a density
p(µ1 , . . . , µN) = (1/N!) det( KN(µi , µj) )_{i,j=1}^N ,

where KN is the Hermite kernel

KN(x, y) = Σ_{k=0}^{N−1} Ψk(x) Ψk(y).

Theorem 7.21. The averaged eigenvalue density of an unnormalized gue(n) is
given by

pN(µ) = (1/N) KN(µ, µ) = (1/N) Σ_{k=0}^{N−1} Ψk(µ)² = (1/(√(2π) N)) Σ_{k=0}^{N−1} (1/k!) Hk(µ)² e^{−µ²/2} .

Proof. Note that p(µ1 , . . . , µN) is the probability density to have N eigenvalues at
the positions µ1 , . . . , µN . If we are integrating out N − 1 variables we are left with the
probability for one eigenvalue (without caring about the others). With the notation
µN = µ we get

pN(µ) = ∫_{R^{N−1}} p(µ1 , . . . , µN−1 , µ) dµ1 · · · dµN−1
      = (1/N!) ∫_{R^{N−1}} det( KN(µi , µj) )_{i,j=1}^N dµ1 · · · dµN−1
      = (1/N!) (N − 1)! det( KN(µ, µ) )
      = (1/N) KN(µ, µ).

8 Determinantal Processes and
Non-Crossing Paths:
Karlin–McGregor and
Gessel–Viennot
Our probability distributions for the eigenvalues of gue have a determinantal struc-
ture, i.e., are of the form
p(µ1 , . . . , µN) = (1/N!) det( KN(µi , µj) )_{i,j=1}^N .
They describe N eigenvalues which repel each other (via the factor (µi − µj )2 ). If we
consider corresponding processes, then the paths of the eigenvalues should not cross;
for this see also Section 8.3. There is a quite general relation between determinants
as above and non-crossing paths. This appeared in fundamental papers in different
contexts:
• in a paper by Karlin and McGregor, 1958, in the context of Markov chains
and Brownian motion
• in a paper of Lindström, 1973, in the context of matroids
• in a paper of Gessel and Viennot, 1985, in combinatorics

8.1 Stochastic version à la Karlin–McGregor


Consider a random walk on the integers Z:
• Yk : position at time k
• Z: possible positions
• Transition probability (to the two neighbors) might depend on position:
from position i the walk jumps to i − 1 with probability qi and to i + 1 with
probability pi , where qi + pi = 1.
We now consider n copies of such a random walk, which at time k = 0 start at
different positions xi . We are interested in the probability that the paths don’t

cross. Let xi be such that all distances are even, i.e., if two paths cross they have
to meet.
Theorem 8.1 (Karlin–McGregor). Consider n copies of Yk , i.e., (Yk^(1) , . . . , Yk^(n))
with Y0^(i) = xi , where x1 > x2 > · · · > xn . Consider now t ∈ N and y1 > y2 > · · · >
yn . Denote by

Pt(xi , yj) = P[Yt = yj | Y0 = xi]

the probability of one random walk to get from xi to yj in t steps. Then we have

P[ Yt^(i) = yi for all i,  Ys^(1) > Ys^(2) > · · · > Ys^(n) for all 0 ≤ s ≤ t ] = det( Pt(xi , yj) )_{i,j=1}^n .

Example 8.2. For one symmetric random walk Yt we have the following probabilities
to go in two steps from 0 to −2, 0, 2:

P[0 → 2] = p0 p1 = 1/4,    P[0 → 0] = p0 q1 + q0 p−1 = 1/2,    P[0 → −2] = q0 q−1 = 1/4.

Now consider two such symmetric random walks and set x1 = 2 = y1 , x2 = 0 = y2 .


Then

P[ Y2^(1) = 2 = Y0^(1) , Y2^(2) = 0 = Y0^(2) , Y1^(1) > Y1^(2) ] = 3/16 ,

since exactly three of the four pairs of two-step paths (2 → 2 for the first walk,
0 → 0 for the second) keep the first walk strictly above the second one at time 1;
the configuration where both walks meet at time 1 is not allowed.

Theorem 8.1 says that we also obtain this probability from the transition proba-
bilities of one random walk as
det [ 1/2  1/4 ; 1/4  1/2 ] = 1/4 − 1/16 = 3/16 .
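A minimal sketch (plain Python) that checks the Karlin–McGregor formula in this example by brute-force enumeration of all pairs of two-step paths:

from itertools import product

steps = (+1, -1)                      # symmetric walk: up or down with probability 1/2
prob_path = 1 / 4                     # each two-step path has probability (1/2)^2

def positions(start, moves):
    pos = [start]
    for m in moves:
        pos.append(pos[-1] + m)
    return pos

total = 0.0
for moves1, moves2 in product(product(steps, repeat=2), repeat=2):
    w1 = positions(2, moves1)         # first walk: should start and end at 2
    w2 = positions(0, moves2)         # second walk: should start and end at 0
    if w1[-1] == 2 and w2[-1] == 0 and all(a > b for a, b in zip(w1, w2)):
        total += prob_path * prob_path
print(total)                          # 0.1875 = 3/16, as given by the determinant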

Proof of Theorem 8.1. Let Ωij be the set of all possible paths in t steps from xi to
yj . Denote by P [π] the probability for such a path π ∈ Ωij . Then we have
Pt(xi , yj) = Σ_{π∈Ωij} P[π]

and we have to consider the determinant

det( Pt(xi , yj) )_{i,j=1}^n = det( Σ_{π∈Ωij} P[π] )_{i,j=1}^n .

Let us consider the case n = 2:


det [ Σ_{π∈Ω11} P[π]   Σ_{π∈Ω12} P[π] ; Σ_{π∈Ω21} P[π]   Σ_{π∈Ω22} P[π] ]
   = Σ_{π∈Ω11} Σ_{σ∈Ω22} P[π] · P[σ] − Σ_{π∈Ω12} Σ_{σ∈Ω21} P[π] · P[σ] .

Here, the first term counts all pairs of paths x1 → y1 and x2 → y2 ; hence non-
crossing ones, but also crossing ones. However, such a crossing pair of paths is, via
the “reflection principle” (where we exchange the parts of the two paths after their
first crossing), in bijection with a pair of paths from x1 → y2 and x2 → y1 ; this
bijection also preserves the probabilities.
Those paths, x1 → y2 and x2 → y1 , are counted by the second term in the
determinant. Hence the second term cancels out all the crossing terms in the first
term, leaving only the non-crossing paths.
For general n it works in a similar way.

8.2 Combinatorial version à la Gessel–Viennot


Let G be a weighted directed graph without directed cycles, where we have weights
mij = me on each edge e : i → j. This gives weights for directed paths P via

m(P) = ∏_{e∈P} me ,

and then also a weight for connecting two vertices a, b,
m(a, b) = Σ_{P : a→b} m(P),

where we sum over all directed paths from a to b. Note that this is a finite sum,
because we do not have directed cycles in our graph.

Definition 8.3. Consider two n-tuples of vertices A = (a1 , . . . , an ) and B =


(b1 , . . . , bn ). A path system P : A → B is given by a permutation σ ∈ Sn and
paths Pi : ai → bσ(i) for i = 1, . . . , n. We also put σ(P ) = σ and sgn P = sgn σ. A
vertex-disjoint path system is a path system (P1 , . . . , Pn ), where the paths P1 , . . . , Pn
do not have a common vertex.

Lemma 8.4 (Gessel–Viennot). Let G be a finite acyclic weighted directed graph and
let A = (a1 , . . . , an ) and B = (b1 , . . . , bn ) be two n-sets of vertices. Then we have

det( m(ai , bj) )_{i,j=1}^n = Σ_{P : A→B vertex-disjoint} sgn σ(P) ∏_{i=1}^n m(Pi).

Proof. Similar as the proof of Theorem 8.1; the crossing paths cancel each other out
in the determinant.

This lemma can be useful in two directions. Whereas in the stochastic setting
one uses mainly the determinant to count non-crossing paths, one can also count
vertex-disjoint path systems to calculate determinants. The following is an example
of this.
Example 8.5. Let Cn be the Catalan numbers

C0 = 1, C1 = 1, C2 = 2, C3 = 5, C4 = 14, . . .

and consider
Mn = [ C0    C1    · · ·  Cn
       C1    C2    · · ·  Cn+1
       ...   ...   ...    ...
       Cn    Cn+1  · · ·  C2n ] .

Then we have
det M0 = det(1) = 1,

det M1 = det [ 1  1 ; 1  2 ] = 2 − 1 = 1,

det M2 = det [ 1  1  2 ; 1  2  5 ; 2  5  14 ] = 28 + 10 + 10 − 8 − 14 − 25 = 1.
This is actually true for all n: det Mn = 1. This is not obvious directly, but follows
easily from Gessel–Viennot, if one chooses the right setting.
Let us show it for M2 . For this, consider the graph
[Figure: a lattice graph with start vertices a0 = b0, a1, a2 and end vertices b1, b2.]
The possible directions in the graph are up and right, and all weights are chosen as 1.
Paths in this graph correspond to Dyck paths, and thus the weights for connecting
the a’s with the b’s are counted by Catalan numbers; e.g.,
m(a0 , b0 ) = C0 ,
m(a0 , b1 ) = C1 ,
m(a0 , b2 ) = C2 ,
m(a2 , b2 ) = C4 .
Thus

M2 = ( m(ai , bj) )_{i,j=0}^2 = [ m(a0 , b0)  m(a0 , b1)  m(a0 , b2)
                                  m(a1 , b0)  m(a1 , b1)  m(a1 , b2)
                                  m(a2 , b0)  m(a2 , b1)  m(a2 , b2) ]

and hence, by Gessel-Viennot,

det M2 = det( m(ai , bj) )_{i,j=0}^2 = Σ_{P : (a0 ,a1 ,a2)→(b0 ,b1 ,b2) vertex-disjoint} 1 = 1,

since there is only one such vertex-disjoint system of three paths, corresponding to
σ = id. This is given as follows; note that the path from a0 to b0 is actually a path
with 0 steps.

[Figure: the unique vertex-disjoint path system a0 → b0 (0 steps), a1 → b1, a2 → b2
in this graph.]
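A minimal sketch (plain Python, exact arithmetic via the fractions module) checking the claim det Mn = 1 for the first few Hankel matrices of Catalan numbers:

from fractions import Fraction
from math import comb

def catalan(n):
    return comb(2 * n, n) // (n + 1)

def det_exact(M):
    # Gaussian elimination over the rationals.
    A = [[Fraction(x) for x in row] for row in M]
    n, d = len(A), Fraction(1)
    for i in range(n):
        piv = next(r for r in range(i, n) if A[r][i] != 0)
        if piv != i:
            A[i], A[piv] = A[piv], A[i]
            d = -d
        d *= A[i][i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    return d

for n in range(6):
    M = [[catalan(i + j) for j in range(n + 1)] for i in range(n + 1)]
    print(n, det_exact(M))     # always 1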

8.3 Dyson Brownian motion and non-intersecting paths

We have seen that the eigenvalues of random matrices repel each other. This be-
comes even more apparent when we consider process versions of our random matri-
ces, where the eigenvalue processes yield then non-intersecting paths. Those process
versions of our Gaussian ensembles are called Dyson Brownian motions. They are
defined as AN (t) := (aij (t))N i,j=1 (t ≥ 0), where each aij (t) is a classical Brownian mo-
tion (complex or real) and they are independent, apart from the symmetry condition
aij (t) = āji (t) for all t ≥ 0 and all i, j = 1, . . . , N . The eigenvalues λ1 (t), . . . , λN (t)
of AN (t) give then N non-intersecting Brownian motions.

Here are plots for discretized random walk versions of the Dyson Brownian mo-
tion, corresponding to goe(13), gue(13) and, for comparison, also 13 independent
Brownian motions; see also Exercise 24. Guess which is which!

[Three plots of 13 paths each: the non-intersecting eigenvalue paths of goe(13) and
of gue(13), and 13 independent Brownian motions with many crossings.]
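A minimal simulation sketch (assuming NumPy) of such a discretized goe Dyson Brownian motion: a symmetric matrix of random-walk entries is built up step by step and its ordered eigenvalues are recorded; plotting the rows of `paths` reproduces pictures like the ones above.

import numpy as np

rng = np.random.default_rng(5)
N, steps = 13, 1500

A = np.zeros((N, N))
paths = np.zeros((steps, N))
for t in range(steps):
    dG = rng.normal(size=(N, N))
    dA = (dG + dG.T) / np.sqrt(2.0)       # symmetric increment (GOE symmetry)
    A += dA
    paths[t] = np.linalg.eigvalsh(A)      # ordered eigenvalues at time t

# The eigenvalue paths never cross:
gaps = np.diff(paths, axis=1)
print(gaps.min() > 0)                     # True (almost surely)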

9 Statistics of the Largest
Eigenvalue and Tracy–Widom
Distribution
Consider gue(n) or goe(n). For large N, the eigenvalue distribution is close to a
semicircle with density

p(x) = (1/(2π)) √(4 − x²) ,

with the "bulk" in the interior of the interval [−2, 2] and the "edge" at its endpoints.
We will now zoom to a microscopic level and try to understand the behaviour of
a single eigenvalue. The behaviour in the bulk and at the edge is different. We are
particularly interested in the largest eigenvalue. Note that at the moment we do
not even know whether the largest eigenvalue sticks close to 2 with high probability.
Wigner’s semicircle law implies that it cannot go far below 2, but it does not prevent
it from being very large. We will in particular see that this cannot happen.

9.1 Some heuristics on single eigenvalues


Let us first check heuristically what we expect as typical order of fluctuations of the
eigenvalues. For this we assume (without justification) that the semicircle predicts
the behaviour of eigenvalues down to the microscopic level.

Behaviour in the bulk: In [λ, λ+t] there should be ∼ tp(λ)N eigenvalues. This
is of order 1 if we choose t ∼ 1/N . This means that eigenvalues in the bulk have

for their position an interval of size ∼ 1/N , so this is a good guess for the order of
fluctuations for an eigenvalue in the bulk.

Behaviour at the edge: In [2 − t, 2] there should be roughly


N ∫_{2−t}^2 p(x) dx = (N/(2π)) ∫_{2−t}^2 √((2 − x)(2 + x)) dx

many eigenvalues. To have this of order 1, we should choose t as follows:

1 ≈ (N/(2π)) ∫_{2−t}^2 √((2 − x)(2 + x)) dx ≈ (N/π) ∫_{2−t}^2 √(2 − x) dx = (N/π) (2/3) t^{3/2} .

Thus 1 ∼ t^{3/2} N, i.e., t ∼ N^{−2/3} . Hence we expect for the largest eigenvalue an
interval of fluctuations of size N^{−2/3} . Very optimistically, we might expect
λmax ≈ 2 + N −2/3 X,
where X has N -independent distribution.

9.2 Tracy–Widom distribution


This heuristics (at least its implication) is indeed true and one has that the limit
Fβ(x) := lim_{N→∞} P[ N^{2/3} (λmax − 2) ≤ x ]

exists. It is called the Tracy–Widom distribution.


Remark 9.1. (1) Note the parameter β! This corresponds to:
ensemble   β   repulsion
goe        1   (λi − λj)^1
gue        2   (λi − λj)^2
gse        4   (λi − λj)^4
It turns out that the statistics of the largest eigenvalue is different for real,
complex, quaternionic Gaussian random matrices. The behaviour on the mi-
croscopic level is more sensitive to the underlying symmetry than the macro-
scopic behaviour; note that we get the semicircle as the macroscopic limit for
all three ensembles. (In models in physics the choice of β corresponds often to
underlying physical symmetries; e.g., goe is used to describe systems which
have a time-reversal symmetry.)

(2) On the other hand, when β is fixed, there is a large universality class for the
corresponding Tracy–Widom distribution. F2 shows up as limiting fluctuations
for
(a) largest eigenvalue of gue (Tracy, Widom, 1993),
(b) largest eigenvalue of more general Wigner matrices (Soshnikov, 1999),
(c) largest eigenvalue of general unitarily invariant matrix ensembles (Deift
et al., 1994-2000),
(d) length of the longest increasing subsequence of random permutations
(Baik, Deift, Johansson, 1999; Okounkov, 2000),
(e) arctic circle for Aztec diamond (Johansson, 2005),
(f) various growth processes like ASEP (“asymmetric single exclusion pro-
cess”), TASEP (“totally asymmetric ...”).
(3) There is still no uniform explanation for this universality. The feeling is that
the Tracy-Widom distribution is somehow the analogue of the normal distri-
bution for a kind of central limit theorem, where independence is replaced by
some kind of dependence. But no one can make this precise at the moment.
(4) Proving Tracy–Widom for gue is out of reach for us, but we will give some
ideas. In particular, we try to derive rigorous estimates which show that our
N −2/3 -heuristic is of the right order and, in particular, we will prove that the
largest eigenvalue converges almost surely to 2.

9.3 Convergence of the largest eigenvalue to 2


We want to derive an estimate, in the gue case, for the probability P [λmax ≥ 2 + ε],
which is compatible with our heuristic that ε should be of the order N −2/3 . We will
refine our moment method for this. Let AN be our normalized gue(n). We have
for all k ∈ N:
P[λmax ≥ 2 + ε] = P[ λmax^{2k} ≥ (2 + ε)^{2k} ]
                ≤ P[ Σ_{j=1}^N λj^{2k} ≥ (2 + ε)^{2k} ]
                = P[ tr AN^{2k} ≥ (2 + ε)^{2k} / N ]
                ≤ N / (2 + ε)^{2k} · E[ tr AN^{2k} ] .

In the last step we used Markov’s inequality 6.3; note that we have even powers,
and hence the random variable tr(AN^{2k}) is positive.
In Theorem 2.15 we calculated the expectation in terms of a genus expansion as
E[ tr(AN^{2k}) ] = Σ_{π∈P2(2k)} N^{#(γπ)−k−1} = Σ_{g≥0} εg(k) N^{−2g} ,

where
εg (k) = # {π ∈ P2 (2k) | π has genus g} .
The inequality
P[λmax ≥ 2 + ε] ≤ N / (2 + ε)^{2k} · E[ tr AN^{2k} ]
is useless if k is fixed for N → ∞, because then the right hand side goes to ∞. Hence
we also have to scale k with N (we will use k ∼ N 2/3 ), but then the sub-leading
terms in the genus expansion become important. Up to now we only know that
ε0 (k) = Ck , but now we need some information on the other εg (k). This is provided
by a theorem of Harer and Zagier.
Theorem 9.2 (Harer–Zagier, 1986). Let us define bk by
Σ_{g≥0} εg(k) N^{−2g} = Ck bk ,

where Ck are the Catalan numbers. (Note that the bk depend also on N, but we
suppress this dependency in the notation.) Then we have the recursion formula

bk+1 = bk + ( k(k + 1) / (4N²) ) bk−1
for all k ≥ 2.
We will prove this later; see Section 9.6. For now, let us just check it for small
examples.
Example 9.3. From Remark 2.16 we know
C1 b1 = E[tr AN²] = 1,
C2 b2 = E[tr AN⁴] = 2 + 1/N²,
C3 b3 = E[tr AN⁶] = 5 + 10/N²,
C4 b4 = E[tr AN⁸] = 14 + 70/N² + 21/N⁴,

which gives

b1 = 1,   b2 = 1 + 1/(2N²),   b3 = 1 + 2/N²,   b4 = 1 + 5/N² + 3/(2N⁴).

We now check the recursion from Theorem 9.2 for k = 3:

b3 + (k(k + 1)/(4N²)) b2 = 1 + 2/N² + (12/(4N²)) (1 + 1/(2N²))
                         = 1 + 5/N² + 3/(2N⁴)
                         = b4
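A minimal sketch (plain Python, exact arithmetic via the fractions module) iterating the Harer–Zagier recursion and comparing with the moments listed above; a concrete value of N is plugged in.

from fractions import Fraction
from math import comb

def catalan(k):
    return comb(2 * k, k) // (k + 1)

N = Fraction(3)                 # any concrete matrix size; exact rational arithmetic

# Recursion b_{k+1} = b_k + k(k+1)/(4 N^2) b_{k-1}, with b_1 = 1, b_2 = 1 + 1/(2 N^2).
b = {1: Fraction(1), 2: 1 + 1 / (2 * N**2)}
for k in range(2, 4):
    b[k + 1] = b[k] + Fraction(k * (k + 1), 4) / N**2 * b[k - 1]

# Compare C_k * b_k with the moments E[tr A^{2k}] listed above.
moments = {1: Fraction(1), 2: 2 + 1 / N**2, 3: 5 + 10 / N**2, 4: 14 + 70 / N**2 + 21 / N**4}
for k, m in moments.items():
    assert catalan(k) * b[k] == m, k
print("Harer-Zagier recursion consistent with the listed moments")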

Corollary 9.4. For all N, k ∈ N we have for a gue(n) matrix AN that


E[ tr AN^{2k} ] ≤ Ck exp( k³ / (2N²) ).

Proof. Note that, by definition, bk > 0 for all k ∈ N and hence, by Theorem 9.2,
bk+1 > bk . Thus,
bk+1 = bk + (k(k + 1)/(4N²)) bk−1 ≤ bk (1 + k(k + 1)/(4N²)) ≤ bk (1 + k²/(2N²)) ;

iteration of this yields

bk ≤ (1 + (k − 1)²/(2N²)) (1 + (k − 2)²/(2N²)) · · · (1 + 1²/(2N²))
   ≤ (1 + k²/(2N²))^k
   ≤ exp(k²/(2N²))^k        since 1 + x ≤ e^x
   = exp(k³/(2N²))

We can now continue our estimate for the largest eigenvalue.
P[λmax ≥ 2 + ε] ≤ N / (2 + ε)^{2k} · E[ tr AN^{2k} ]
               = N / (2 + ε)^{2k} · Ck bk
               ≤ N / (2 + ε)^{2k} · Ck exp( k³/(2N²) )
               ≤ N / (2 + ε)^{2k} · 4^k / k^{3/2} · exp( k³/(2N²) ) .

For the last estimate, we used (see Exercise 26)

Ck ≤ 4^k / (√π k^{3/2}) ≤ 4^k / k^{3/2} .

Let us record our main estimate in the following proposition.

Proposition 9.5. For a normalized gue(n) matrix AN we have for all N, k ∈ N


and all ε > 0
P[λmax(AN) ≥ 2 + ε] ≤ N / (2 + ε)^{2k} · 4^k / k^{3/2} · exp( k³/(2N²) ) .

This estimate is now strong enough to see that the largest eigenvalue has actually
to converge to 2. For this, let us fix ε > 0 and choose k depending on N as
kN := ⌊N^{2/3}⌋, where ⌊x⌋ denotes the largest integer ≤ x. Then

N / kN^{3/2} −→ 1   and   kN³ / (2N²) −→ 1/2   for N → ∞.

Hence

lim sup_{N→∞} P[λmax ≥ 2 + ε] ≤ lim_{N→∞} (2/(2 + ε))^{2kN} · 1 · e^{1/2} = 0,

and thus for all ε > 0,

lim_{N→∞} P[λmax ≥ 2 + ε] = 0.

This says that the largest eigenvalue λmax of a gue converges in probability to 2.

Corollary 9.6. For a normalized gue(n) matrix AN we have that its largest eigen-
value converges in probability, and also almost surely, to 2, i.e.,
λmax(AN) −→ 2 almost surely for N → ∞.

Proof. The convergence in probability was shown above. For the strengthening to
almost sure convergence one has to use Borel–Cantelli and the fact that

Σ_N (2/(2 + ε))^{2kN} < ∞.

See Exercise 28.

9.4 Estimate for fluctuations


Our estimate from Proposition 9.5 gives also some information about the fluctuations
of λmax about 2, if we choose ε also depending on N . Let us use there now

kN = ⌊N^{2/3} r⌋   and   εN = N^{−2/3} t.

Then

N / kN^{3/2} −→ 1/r^{3/2}   and   kN³ / (2N²) −→ r³/2   for N → ∞,

and

4^{kN} / (2 + εN)^{2kN} = ( 1 / (1 + t/(2N^{2/3})) )^{2⌊N^{2/3} r⌋} −→ e^{−rt}   for N → ∞,

and thus

lim sup_{N→∞} P[ λmax ≥ 2 + tN^{−2/3} ] ≤ (1/r^{3/2}) e^{r³/2} e^{−rt}

for arbitrary r > 0. We optimize this now by choosing r = √t for t > 0 and get

Proposition 9.7. For a normalized gue(n) matrix AN we have for all t > 0

lim sup_{N→∞} P[ λmax(AN) ≥ 2 + tN^{−2/3} ] ≤ t^{−3/4} exp( −(1/2) t^{3/2} ).
Although this estimate does not prove the existence of the limit on the left hand
side, it turns out that the right hand side is quite sharp and captures the tail
behaviour of the Tracy–Widom distribution quite well.
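A minimal Monte Carlo sketch (assuming NumPy; the matrix size and sample number are small, arbitrary choices) comparing the empirical tail of N^{2/3}(λmax − 2) for gue matrices with the bound of Proposition 9.7; this is only meant as a qualitative illustration.

import numpy as np

rng = np.random.default_rng(6)
N, samples = 200, 400

def gue_lambda_max(N):
    # Normalized GUE: Hermitian, entries of variance 1/N.
    X = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    A = (X + X.conj().T) / (2 * np.sqrt(N))
    return np.linalg.eigvalsh(A)[-1]

fluct = np.array([N**(2 / 3) * (gue_lambda_max(N) - 2) for _ in range(samples)])

for t in (0.5, 1.0, 2.0):
    empirical = np.mean(fluct >= t)
    bound = t**(-0.75) * np.exp(-0.5 * t**1.5)
    print(t, empirical, bound)      # the empirical tail stays below the (crude) bound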

9.5 Non-rigorous derivation of Tracy–Widom
distribution
For determining the Tracy–Widom fluctuations in the limit N → ∞ one has to use
the analytic description of the gue joint density. Recall from Theorem 7.20 that
the joint density of the unordered eigenvalues of an unnormalized gue(n) is given
by
p(µ1 , . . . , µN) = (1/N!) det( KN(µi , µj) )_{i,j=1}^N ,

where KN is the Hermite kernel

KN(x, y) = Σ_{k=0}^{N−1} Ψk(x) Ψk(y)

with the Hermite functions Ψk from Definition 7.12. Because KN is a reproducing
kernel, we can integrate out some of the eigenvalues and get a density of the same
form. If we integrate out all but r eigenvalues we get, by Lemma 7.17,

∫_R · · · ∫_R p(µ1 , . . . , µN) dµr+1 · · · dµN = (1/N!) · 1 · 2 · · · (N − r) · det( KN(µi , µj) )_{i,j=1}^r
                                             = ((N − r)!/N!) det( KN(µi , µj) )_{i,j=1}^r
                                             =: pN(µ1 , . . . , µr).

Now consider
P[ µmax^{(N)} ≤ t ] = P[there is no eigenvalue in (t, ∞)]
   = 1 − P[there is an eigenvalue in (t, ∞)]
   = 1 − ( N P[µ1 ∈ (t, ∞)] − (N choose 2) P[µ1 , µ2 ∈ (t, ∞)] + (N choose 3) P[µ1 , µ2 , µ3 ∈ (t, ∞)] − · · · )
   = 1 + Σ_{r=1}^N (−1)^r (N choose r) ∫_t^∞ · · · ∫_t^∞ pN(µ1 , . . . , µr) dµ1 · · · dµr
   = 1 + Σ_{r=1}^N (−1)^r (1/r!) ∫_t^∞ · · · ∫_t^∞ det( KN(µi , µj) )_{i,j=1}^r dµ1 · · · dµr .

Does this have a limit for N → ∞?

96
(N )
√Note that p is the distribution for a gue(n) without normalization, i.e., µmax ≈
2 N . More precisely, we expect fluctutations
√  −2/3
 √
µ(N )
max ≈ N 2 + tN = 2 N + tN −1/6 .

We put  √ √ 
K̃N (x, y) = N −1/6 · KN 2 N + xN −1/6 , 2 N + yN −1/6
so that we have
N ∞ Z∞
(−1)r Z
" ! #
2/3 µ(N )
max − 2
X  r
P N √ −2 ≤t = · · · det K̃(xi , xj ) dx1 · · · dxr .
N r=0 r! i,j=1
t t

We expect that the limit


" ! #
2/3 µ(N )
max − 2
F2 (t) := lim P N √ −2 ≤t
N →∞ N

exists. For this, we need the limit lim K̃N (x, y). Recall that
N →∞

N
X −1
KN (x, y) = Ψk (x)Ψk (y).
k=0

As this involves Ψk for all k = 0, 1, . . . , N −1 this is not amenable to taking the limit
N → ∞. However, by the Christoffel–Darboux identity for the Hermite functions
(see Exercise 27))
n−1
X Hk (x)Hk (y) Hn (x)Hn−1 (y) − Hn−1 (x)Hn (y)
=
k=0 k! (x − y) (n − 1)!

and with
1 2
Ψk (x) = (2π)−1/4 (k!)−1/2 e− 4 x Hk (x),
as defined in Definition 7.12, we can rewrite KN in the form
−1
1 NX 1 − 14 (x2 +y2 )
KN (x, y) = √ e Hk (x)Hk (y)
2π k=0 k!
1 1 2 HN (x)HN −1 (y) − HN −1 (x)HN (y)
= √ e− 4 (x +y )
2

2π (x − y) (N − 1)!
√ ΨN (x)ΨN −1 (y) − ΨN −1 (x)ΨN (y)
= N· .
x−y

97
Note that the ΨN satisfy the differential equation (see Exercise 30)
x √
Ψ0N (x) = − ΨN (x) + N ΨN −1 (x),
2
and thus
h i h i
ΨN (x) Ψ0N (y) + y2 ΨN (y) − Ψ0N (x) + x2 ΨN (x) ΨN (y)
KN (x, y) =
x−y
ΨN (x)Ψ0N (y) − Ψ0N (x)ΨN (y) 1
= − ΨN (x)ΨN (y).
x−y 2
Now put  √ 
e (x) := N 1/12 · Ψ −1/6
ΨN N 2 N + xN ,
thus
 √   √ 
e 0 (x) = N 1/12 · Ψ0 2 N + xN −1/6 · N −1/6 = N −1/12 · Ψ0 2 N + xN −1/6 .
Ψ N N N

Then
Ψ e 0 (y) − Ψ
e (x)Ψ e 0 (x)Ψ
e (y) 1 e0
N N N N e 0 (y).
K̃(x, y) = − Ψ (x)Ψ
x−y 2N 1/3 N N

One can show, by a quite non-trivial steepest descent method, that Ψ


e (x) converges
N
to a limit. Let us call this limit the Airy function

Ai(x) = lim Ψ
e (x).
N
N →∞

The convergence is actually so strong that also

Ai0 (x) = lim Ψ


e 0 (x),
N
N →∞

and hence
Ai(x) Ai0 (y) − Ai0 (x) Ai(y)
lim K̃(x, y) = =: A(x, y).
N →∞ x−y
A is called the Airy kernel.
Let us try, again non-rigorously, to characterize this limit function Ai. For the
Hermite functions we have (see Exercise 30)
!
1 x2
Ψ00N (x) + N+ − ΨN (x) = 0.
2 4

98
For the Ψ
e we have
N
 √ 
e 0 (x) = N −1/12 · Ψ0 2 N + xN −1/6
Ψ N N

and  √ 
e 00 (x) = N −1/4 · Ψ00 2 N + xN −1/6 .
Ψ N N

Thus,
  √ −1/6
2 
1 2 N + xN  √ 
e 00 (x) = −N −1/4 
Ψ N + − 
 ΨN 2 N + xN −1/6
N
24

1 4N + 4 N xN −1/6 + x2 N −1/3 e
" #
−1/3
= −N N+ − ΨN (x)
2 4
1 4xN 1/3 + x2 N −1/3 e
" #
−1/3
= −N − ΨN (x)
2 4
≈ −xΨ
e (x)
N for large N .

Hence we expect that Ai should satisfy the differential equation

Ai00 (x) − x Ai(x) = 0.

This is indeed the case, but the proof is again beyond our tools. Let us just give
the formal definition of the Airy function and formulate the final result.

Definition 9.8. The Airy function Ai : R → R is a solution of the Airy ODE

u00 (x) = xu(x)

determined by the following asymptotics as x → ∞ :

1 2
 
Ai(x) ∼ √ x−1/4 exp − x3/2 .
2 π 3

The Airy kernel is defined by

Ai(x) Ai0 (y) − Ai0 (x) Ai(y)


A(x, y) = .
x−y

99
Theorem 9.9. The random variable N 2/3 (λmax (AN ) − 2) of a normalized gue(n)
has a limiting distribution as N → ∞. Its limiting distribution function is
2
h i
F2 (t) : = lim P N 3 (λmax − 2) ≤ t
N →∞
N ∞ Z∞
(−1)r Z
· · · det (A(xi , xj ))ri,j=1 dx1 · · · dxr .
X
=
r=0 r!
t t

The form of F2 from Theorem 9.9 is more of a theoretical nature and not very
convenient for calculations. A main contribution of Tracy–Widom in this context
was that they were able to derive another, quite astonishing, representation of F2 .
Theorem 9.10 (Tracy–Widom, 1994). The distribution function F2 satisfies
Z∞
 

F2 (t) = exp − (x − t)q(x)2 dx ,


t

where q is a solution of the Painlevé II equation q 00 (x) − xq(x) + 2q(x)3 = 0 with


q(x) ∼ Ai(x) as x → ∞.
Here is a plot of the Tracy–Widom distribution F2 , via solving the Painlevé II
equation from above, and a comparision with the histogram for the rescaled largest
eigenvalue of 5000 gue(200); see also Exercise 31.

0.5

0.45

0.4

0.35

0.3

0.25

0.2

0.15

0.1

0.05

0
-5 -4 -3 -2 -1 0 1 2

100
9.6 Proof of the Harer–Zagier recursion
We still have to prove the recursion of Harer–Zagier, Theorem 9.2.
Let us denote h i
T (k, N ) := E tr A2k εg (k)N −2g .
X
N =
g≥0

The genus expansion shows that T (k, N ) is, for fixed k, a polynomial in N −1 . Ex-
pressing it in terms of integrating over eigenvalues reveals the surprising fact that,
up to a Gaussian factor, it is also a polynomial in k for fixed N . We show this in
the next lemma. This is actually the only place where we need the random matrix
interpretation of this quantity.
Lemma 9.11. The expression
1
Nk T (k, N )
(2k − 1)!!
is a polynomial of degree N − 1 in k.
Proof. First check the easy case N = 1: T (k, 1) = (2k − 1)!! is the 2k-th moment of
a normal variable and
T (k, 1)
=1
(2k − 1)!!
is a polynomial of degree 0 in k.

For general N we have


h i
T (k, N ) = E tr A2k
N
Z 
N
− 2 (λ1 +···+λN )2 2

λ2k 2k
(λi − λj )2 dλ1 · · · dλN
Y
= cN 1 + · · · + λN e
i<j
RN
Z
N
− 2 (λ1 +···+λN )
2 2
λ2k
Y
= N cN 1 e |λi − λj | dλ1 · · · dλN
RN i6=j
Z
N 2
− 2 λ1
= N cN λ2k
1 e pN (λ1 ) dλ1 ,
R

where pN is the result of integrating the Vandermonde over λ2 , . . . , λN . It is an even


polynomial in λ1 of degree 2(N − 1), whose coefficients depend only on N and not
on k. Hence
N −1
αl λ2l
X
pN (λ1 ) = 1
l=0

101
with αl possibly depending on N . Thus,
N −1 Z
N 2
λ2k+2l e− 2 λ1 dλ1
X
T (k, N ) = N cN αl 1
l=0 R
N −1
αl · kN · (2k + 2l − 1)!! · N −(k+l) ,
X
= N cN
l=0

since the integral over λ1 gives the (2k + 2l)-th moment of a Gauss variable of
variance N −1 , where kN contains the N -dependent normalization constants of the
Gaussian measure; hence
N k T (k, N )
(2k − 1)!!
is a linear combination (with N -dependent coefficients) of terms of the form

(2k + 2l − 1)!!
.
(2k − 1)!!

These terms are polynomials in k of degree l.


We now have that
1 1
Nk T (k, N ) = N k N #(γπ)−k−1
X
(2k − 1)!! (2k − 1)!! π∈P2 (2k)
1 1
N #(γπ)
X
=
N (2k − 1)!! π∈P2 (2k)
1 1
= t(k, N ),
N (2k − 1)!!

where the last equality defines t(k, N ).


By Lemma 9.11, t(k, N )/(2k − 1)!! is a polynomial of degree N − 1 in k. We
interpret it as follows:
X
t(k, N ) = # {coloring cycles of γπ with at most N different colors} .
π∈P2 (2k)

Let us introduce
X
t̃(k, L) = # {coloring cycles of γπ with exactly L different colors} ,
π∈P2 (2k)

102
then we have
N
!
X N
t(k, N ) = t̃(k, L) ,
L=1 L

because if we want to use at most N different colors, then we can do this by using
exactly
  L different colors (for any L between 1 and N ), and after fixing L we have
N
L
many possibilities to choose the L colors among the N colors.
This relation can be inverted by

N
!
X
N −L N
t̃(k, N ) = (−1) t(k, L)
L=1 L

and hence t̃(k, N )/(2k − 1)!! is also a polynomial in k of degree N − 1. But now we
have

0 = t̃(0, N ) = t̃(1, N ) = · · · = t̃(N − 2, N ),

since γπ has, by Proposition 2.20, at most k + 1 cycles for π ∈ P2 (2k); and thus
t̃(k + 1, N ) = 0 if k + 1 < N , as we need at least N cycles if we want to use N
different colors.
So, t̃(k, N )/(2k − 1)!! is a polynomial in k of degree N − 1 and we know N − 1
zeros; hence it must be of the form
!
t̃(k, N ) k
= αN k(k − 1) · · · (k − N + 2) = αN (N − 1)!.
(2k − 1)!! N −1

Hence,

N
! !
X N k
t(k, N ) = (L − 1)!αL (2k − 1)!!.
L=1 L L−1

To identify αN we look at
!
N
αN +1 N !(2N − 1)!! = t̃(N, N + 1) = CN (N + 1)!.
N

Note that only the NC pairings can be colored with exactly N + 1 colors, and for

103
each such π there are (N + 1)! ways of doing so. We conclude
CN (N + 1)!
αN +1 =
N !(2N − 1)!!
CN (N + 1)
=
(2N − 1)!!
!
1 2N N +1
=
N + 1 N (2N − 1)!!
(2N )!
=
N !N !(2N − 1)!!
2N
= .
N!
Thus we have
1
T (k, N ) = t(k, N )
N k+1
N
2L−1
! !
1 X N k
= k+1 (L − 1)! (2k − 1)!!
N L=1 L L−1 (L − 1)!
N
! !
1 N k
2L−1 .
X
= (2k − 1)!!
N k+1 L=1 L L−1

To get information from this on how this changes in k we consider a generating


function in k,

T (k, N )
(N s)k+1
X
T (s, N ) = 1 + 2
k=0 (2k − 1)!!
∞ X
N
! !
N k
2L−1 sk+1
X
=1+2
k=0 L=1 L L−1
N ∞
! !
N L X k
sk+1
X
= 2
L=0 L k=L−1 L − 1
N
! L
N L s
X 
= 2
L=0 L 1−s
N L
!
X N 2s
=
L=0 L 1−s

104
N
2s

= 1+
1−s
1+s N
 
= .
1−s
Note that (as in our calculation for the αN )
1 2k
=
(2k − 1)!! k!Ck (k + 1)
and hence T (s, N ) can also be rewritten as a generating function in our main quan-
tity of interest,

(N ) T (k, N )
bk = (we make now the dependence of bk on N explicit)
Ck
as

T (k, N ) k
2 (N s)k+1
X
T (s, N ) = 1 + 2
k=0 (k + 1)!C k

∞ (N )
bk
(2N s)k+1 .
X
=1+
k=0 (k + 1)!
(N )
In order to get a recursion for the bk , we need some functional relation for T (s, N ).
Note that the recursion in Harer–Zagier involves bk , bk+1 , bk−1 for the same N , thus
we need a relation which does not change the N . For this we look on the derivative
with respect to s. From
1+s N
 
T (s, N ) =
1−s
we get
N −1
d 1+s (1 − s) + (1 + s)

T (s, N ) = N
ds 1−s (1 − s)2
N
1+s 1

= 2N
1−s (1 − s)(1 + s)
1
= 2N · T (s, N ) ,
1 − s2
and thus
d
(1 − s2 ) T (s, N ) = 2N · T (s, N ).
ds

105
Note that we have
∞ (N )
d bk
(2N s)k 2N.
X
T (s, N ) =
ds k=0 k!

Thus, by comparing coefficients of sk+1 in our differential equation from above, we


conclude
(N ) (N ) (N )
bk+1 bk−1 b
(2N )k+2 − (2N )k = 2N k+1 (2N )k+1 ,
(k + 1)! (k − 1)! (k + 1)!

and thus, finally,


(N ) (N ) (N ) (k + 1)k
bk+1 = bk + bk−1 .
(2N )2

106
10 Statistics of the Longest
Increasing Subsequence
10.1 Complete order is impossible
Definition 10.1. A permutation σ ∈ Sn is said to have an increasing subsequence
of length k if there exist indices 1 ≤ i1 < · · · < ik ≤ n such that σ(i1 ) < · · · < σ(ik ).
For a decreasing subsequence of length k the above holds with the second set of
inequalities reversed. For a given σ ∈ Sn we denote the length of an increasing
subsequence of maximal length by Ln (σ).

Example 10.2. (1) Maximal length is achieved for the identity permutation
!
1 2 ··· n − 1 n
σ = id = ;
1 2 ··· n − 1 n

this has an increasing subsequence of length n, hence Ln (id) = n. In this case,


all decreasing subsequences have length 1.
(2) Minimal length is achieved for the permutation
!
1 2 ··· n − 1 n
σ= ;
n n − 1 ··· 2 1

in this case all increasing subsequences have length 1, hence Ln (σ) = 1; but
there is a decreasing subsequence of length n.
(3) Consider a more “typical” permutation
!
1 2 3 4 5 6 7
σ= ;
4 2 3 1 6 5 7

this has (2, 3, 5, 7) and (2, 3, 6, 7) as longest increasing subsequences, thus


L7 (σ) = 4. Its longest decreasing subsequences are (4, 2, 1) and (4, 3, 1) with

107
length 3. In the graphical representation
y
7
6
5
4
3
2
1
0
0 1 2 3 4 5 6 7 x

an increasing subsequence corresponds to a path that always goes up.


Remark 10.3. (1) Longest increasing subsequences are relevant for sorting algo-
rithms. Consider a library of n books, labeled bijectively with numbers 1, . . . , n,
arranged somehow on a single long bookshelf. The configuration of the books
corresponds to a permutation σ ∈ Sn . How many operations does one need
to sort the books in a canonical ascending order 1, 2, . . . , n? It turns out
that the minimum number is n − Ln (σ). One can sort around an increasing
subsequence.
Example. Around the longest increasing subsequence (1, 2, 6, 8) we sort

4 1 9 3 2 7 6 8 5
→ 4 1 9 2 3 7 6 8 5
→ 1 9 2 3 4 7 6 8 5
→ 1 9 2 3 4 5 7 6 8
→ 1 9 2 3 4 5 6 7 8
→ 1 2 3 4 5 6 7 8 9

in 9 − 4 = 5 operations.
(2) One has situations with only small increasing subsequences, but then one has
long decreasing subsequences. This is true in general; one cannot avoid both
long decreasing and long increasing subsequences at the same time. According
to the slogan

108
“Complete order is impossible.” (Motzkin)
Theorem 10.4 (Erdős, Szekeres, 1935). Every permutation σ ∈ Sn2 +1 has a mono-
tone subsequence of length more than n.
Proof. Write σ = a1 a2 · · · an2 +1 . Assign labels (xk , yk ), where xk is the length of
a longest increasing subsequence ending at ak ; and yk is the length of a longest
decreasing subsequence ending at ak . Assume now that there is no monotone subse-
quence of length n+1. Hence we have for all k: 1 ≤ xk , yk ≤ n; i.e., there are only n2
possible labels. By the pigeonhole principle there are i < j with (xi , yi ) = (xj , yj ). If
ai < aj we can append aj to a longest increasing subsequence ending at ai , but then
xj > xi . If ai > aj we can append aj to a longest decreasing subsequence ending at
ai , but then yj > yi . In both cases we have a contradiction.

10.2 Tracy–Widom for the asymptotic


distribution of Ln
We are now interested in the distribution of Ln (σ) for n → ∞. This means, we
put the uniform distribution on permutations, i.e., P [σ] = 1/n! for all σ ∈ Sn , and
consider Ln : Sn → R as a random variable. What is the asymptotic distribution of
Ln ? This question is called Ulan’s problem and was raised in the 1960’s. In 1972,
Hammersley showed that the limit
E [Ln ]
Λ = lim √
n→∞ n

exists and that Ln / n converges to Λ in probability. In 1977, Vershik–Kerov and
Logan–Shepp showed independently that Λ = 2. Then in 1998, Baik, Deift and
Johansson proved the asymptotic behaviour of the fluctuations of Ln ; quite surpris-
ingly, this is also captured by the Tracy–Widom distribution:
" √ #
Ln − 2 n
lim P ≤ t = F2 (t).
n→∞ n1/6

10.3 Very rough sketch of the proof of the Baik,


Deift, Johansson theorem
Again, we have no chance of giving a rigorous proof of the BDJ theorem. Let us give
at least a possible route for a proof, which gives also an idea why the statistics of

109
the length of the longest subsequence could be related to the statistics of the largest
eigenvalue.
(1) The RSK correspondence relates permutations to Young diagrams. Ln goes
under this mapping to the length of the first row of the diagram.
(2) These Young diagrams correspond to non-intersecting paths.
(3) Via Gessel–Viennot the relevant quantities in terms of NC paths have a de-
terminantal form.
(4) Then one has to show that the involved kernel, suitably rescaled, converges to
the Airy kernel.
In the following we want to give some idea of the first two items in the above list;
the main (and very hard part of the proof) is to show the convergence to the Airy
kernel.

10.3.1 RSK correspondence


RSK stands for Robinson–Schensted–Knuth after papers from 1938, 1961 and 1973.
It gives a bijection [
Sn ←→ (Tab λ × Tab λ) ,
λ
Young diagram
of size n
where Tab λ is the set of Young tableaux of shape λ.
Definition 10.5. (1) Let n ≥ 1. A partition of n is a sequence of natural numbers
λ = (λ1 , . . . , λr ) such that
r
X
λ1 ≥ λ2 ≥ · · · ≥ λr and λi = n.
i=1

We denote this by λ ` n. Graphically, a partition λ ` n is represented by a


Young diagram with n boxes.

..
.

(2) A Young tableau of shape λ is the Young diagram λ filled with numbers 1, . . . , n
such that in any row the numbers are increasing from left to right and in any
column the numbers are increasing from top to bottom. We denote the set of
all Young tableaux of shape λ by Tab λ.

110
Example 10.6. (1) For n = 1 there is only one Young diagram, , and one
corresponding Young tableau: 1 .
For n = 2, there are two Young diagrams,

and ,
each of them having one corresponding Young tableau
1
1 2 and 2 .

For n = 3, there are three Young diagrams

;
the first and the third have only one tableau, but the middle one has two:
1 2 1 3
3 2

(2) Note that a tableau of shape λ corresponds to a walk from ∅ to λ by adding


one box in each step and only visiting Young diagrams. For example, the
Young tableau

1 2 4 8
3 7
5
6
corresponds to the walk
1 2 4 1 2 4 1 2 4 8
1 2 4 3 3 7 3 7
1 2 1 2 4 3 5 5 5
1 → 1 2 → 3 → 3 → 5 → 6 → 6 → 6

Remark 10.7. Those objects are extremely important since they parametrize the
irreducible representations of Sn :
λ ` n ←→ irreducible representation πλ of Sn .

111
Furthermore, the dimension of such a representation πλ is given by the number of
tableaux of shape λ. If one recalls that for any finite group one has the general
statement that the sum of the squares of the dimensions over all irreducible repre-
sentations of the group gives the number of elements in the group, then one has for
the symmetric group the statement that
(#T abλ)2 = #Sn = n!.
X

λ`n
This shows that there is a bijection between elements in Sn and pairs of tableaux of
the same shape λ ` n. The RSK correspondence is such a concrete bijection, given
by an explicit algorithm. It has the property, that Ln goes under this bijection over
to the length of the first row of the corresponding Young diagram λ.
Example 10.8. For example, under the RSK correspondence, the permutation
!
1 2 3 4 5 6 7
σ=
4 2 3 6 5 1 7
corresponds to the pair of Young tableaux
1 3 5 7 1 3 4 7
2 6 2 5
4 , 6 .
Note that L7 (σ) = 4 is the length of the first row.

10.3.2 Relation to non-intersecting paths


Pairs (Q, P ) ∈ Tab λ × Tab λ can be identified with r = #rows(λ) paths. Q gives
the positions of where to go up and P of where to go down; the conditions on
the Young tableau guarantee that the paths will be non-intersecting. For example,
the pair corresponding to the σ from Example 10.8 above gives the following non-
intersecting paths:

1 2 3 4 5 6 7 7 6 5 4 3 2 1

112
11 The Circular Law
11.1 Circular law for Ginibre ensemble
The non-selfadjoint analogue of gue is given by the Ginibre ensemble, where all
entries are independent and complex Gaussians. A standard complex Gaussian is
of the form
x + iy
z= √ ,
2
where x and y are independent standard real Gaussians, i.e., with joint distribution
1 − x2 − y 2
p(x, y) dx dy = e 2 e 2 dx dy.

If we rewrite this in terms of a density with respect to the Lebesgue measure for
real and imaginary part
x + iy x − iy
z= √ = t1 + it2 , z= √ = t1 − it2 ,
2 2
we get
1 −(t21 +t22 ) 1 2
p(t1 , t2 ) dt1 dt2 = e dt1 dt2 = e−|z| d2 z,
π π
where d2 z = dt1 dt2 .

Definition 11.1. A (complex) unnormalized Ginibre ensemble AN = (aij )N


i,j=1 is
given by complex-valued entries with joint distribution
 
N N
1 1
|aij |2  dA = exp (− Tr(AA∗ )) dA, d2 aij .
X Y
exp − where dA =
πN 2 i,j=1 πN 2 i,j=1

As for the gue case, Theorem 7.6, we can rewrite the density in terms of eigen-
values. Note that the eigenvalues are now complex.

113
Theorem 11.2. The joint distribution of the complex eigenvalues of an N × N
Ginibre ensemble is given by a density
N
!
2
|zi − zj |2 .
X Y
p(z1 , . . . , zN ) = cN exp − |zk |
k=1 1≤i<j≤N

Remark 11.3. (1) Note that typically Ginibre matrices are not normal, i.e., AA∗ 6=
A∗ A. This means that one loses the relation between functions in eigenvalues
and traces of functions in the matrix. The latter is what we can control, the
former is what we want to understand.
(2) As in the selfadjoint case the eigenvalues repel, hence there will almost surely
be no multiple eigenvalues. Thus we can also in the Ginibre case diagonalize
our matrix, i.e., A = V DV −1 , where D = diag(z1 , . . . , zN ) contains the eigen-
values. However, V is now not unitary anymore, i.e., eigenvectors for different
eigenvalues are in general not orthogonal. We can also diagonalize A∗ via A∗ =
(V −1 )∗ D∗ V ∗ , but since V −1 6= V ∗ (if A is not normal) we cannot diagonalize A
and A∗ simultaneously. This means that in general, for example Tr(AA∗ A∗ A)
has no clear relation to N ∗ ∗ ∗ ∗
i z¯i z¯i zi . (Note that Tr(AA A A) 6= Tr(AA AA )
P
i=1 z
if AA∗ 6= A∗ A, but of course N N
P P
i=1 zi z¯i z¯i zi = i=1 zi z¯i z1 z¯i .)
(3) In Theorem 11.2 it seems that we have rewritten the density exp(− Tr(AA∗ ))
2
as exp(− N k=1 |zk | ). However, this is more subtle. On can bring any matrix
P

via a unitary conjugation in a triangular form: A = U T U ∗ , where U is unitary


and  
z1 ? · · · ?
 0 . . . . . . ... 
 
T =
 
 .. . . .. 
. . . ? 
0 · · · 0 zn
contains on the diagonal the eigenvalues z1 , . . . , zn of A (this is usually called
Schur decomposition). Then A∗ = U T ∗ U ∗ with
 
z¯1 0 ··· 0
 . . . . . . .. 

? .
T =
 
.
. .. .. 
. . . 0 
? · · · ? z¯n
and
N
Tr(AA∗ ) = Tr(T T ∗ ) = |zk |2 +
X X
tij t̄ij .
k=1 j>i

114
Integrating out the tij (j > i) gives then the density for the zi .
(4) As for the gue case (Theorem 7.15) we can write the Vandermonde density in a
determinantal form. The only difference is that we have to replace the Hermite
polynomials Hk (x), which orthogonalize the real Gaussian distribution, by
monomials z k , which orthogonalize the complex Gaussian distribution.

Theorem 11.4. The joint eigenvalue distribution of the Ginibre ensemble is of the
determinantal form p(z1 , . . . , zn ) = N1 ! det(KN (zi , zj ))N
i,j=1 with the kernel

N −1
1 1 2 1
ϕk (z) = √ e− 2 |z| √ z k .
X
KN (z, w) = ϕk (z)ϕ̄k (w), where
k=0 π k!

In particular, for the averaged eigenvalue density of an unnormalized Ginibre eigen-


value matrix we have the density

1 1 −|z|2 NX−1
|z|2k
pN (z) = KN (z, z) = e .
N Nπ k=0 k!

Theorem 11.5 (Circular law for the Ginibre ensemble). The averaged eigenvalue
distribution for a normalized Ginibre random matrix √1N AN converges for N → ∞
weakly to the uniform distribution on the unit disc of C with density π1 1{z∈C||z|≤1} .

Proof. The density qN of the normalized Ginibre is given by

√ 1 N −1 2 k
2 X (N |z| )
qN (z) = N · pN ( N z) = e−N |z| .
π k=0 k!

We have to show that this converges to the circular density. For |z| < 1 we have

N |z|2
N
X −1
(N |z|2 )k X∞
(N |z|2 )k
e − =
k=0 k! k=N k!
(N |z|2 )N X∞
(N |z|2 )l
≤ l
N! l=0 (N + 1)

(N |z|2 )N 1
≤ 2 ,
N! 1 − NN|z|
+1

115
√ 1
Furthermore, using the lower bound N ! ≥ 2πN N + 2 e−N for N !, we calculate
2 (N |z|2 )N 2 1 1
e−N |z| ≤ e−N |z| N N |z|2N √ 1 e
N
N! 2π N N + 2

1 1 −N |z|2 N ln|z|2 N
=√ √ e e e
2π N
1 exp[N (− |z|2 + ln |z|2 + 1)] N →∞
= √ √ −→ 0.
2π N
Here, we used that − |z|2 + ln |z|2 + 1 < 0 for |z| < 1. Hence we conclude
−N |z|2
N
X −1
(N |z|2 )k 2 N
−N |z|2 (N |z| ) 1 N →∞
1−e ≤e |z| 2 −→ 0.
k=0 k! N! N
1 − N +1
Similarly, for |z| > 1,
N
X −1
(N |z|2 )k (N |z|2 )N −1 NX
−1
(N − 1)l (N |z|)N −1 1
≤ 2 l ≤ −1 ,
k=0 k! (N − 1)! l=0 (N |z| ) (N − 1)! 1 − NN|z| 2

which shows that


2
N −1
(N |z|2 )k N →∞
E −N |z|
X
−→ 0.
k=0 k!

Remark 11.6. (1) The convergence also holds almost surely. Here is a plot of the
3000 eigenvalues of one realization of a 3000 × 3000 Ginibre matrix.

1.5

0.5

-0.5

-1

-1.5
-1.5 -1 -0.5 0 0.5 1 1.5

116
(2) The circular law also holds for non-Gaussian entries, but proving this is much
harder than the extension for the semicircle law from the Gaussian case to
Wigner matrices.

11.2 General circular law


Theorem 11.7 (General circular law). Consider a complex random matrix AN =
N
√1 (aij )
N i,j=1 , where the aij are independent and identically distributed complex ran-
dom variables with variance 1, i.e., E[|aij |2 ] − E [aij ]2 = 1. Then the eigenvalue
distribution of AN converges weakly almost surely for N → ∞ to the uniform dis-
tribution on the unit disc.

Note that only the existence of the second moment is required, higher moments
don’t need to be finite.

Remark 11.8. (1) It took quite a while to prove this in full generality. Here is a
bit about the history of the proof.
• 60’s, Mehta proved it (see his book) in expectation for Ginibre ensemble;
• 80’s, Silverstein proved almost sure convergence for Ginibre;
• 80’s, 90’s, Girko outlined the main ideas for a proof in the general case;
• 1997, Bai gave the first rigorous proof, under additional assumptions on
the distribution
• papers by Tao–Vu, Götze–Tikhomirov, Pan–Zhou and others, weakening
more and more the assumptions;
• 2010, Tao–Vu gave final version under the assumption of the existence of
the second moment.
(2) For measures on C one can use ∗-moments or the Stieltjes transform to describe
them, but controlling the convergence properties is the main problem.
(3) For a matrix A its ∗-moments are all expressions of the form tr(Aε(1) · · · Aε(m) ),
where m ∈ N and ε(1), . . . , ε(m) ∈ {1, ∗}. The eigenvalue distribution

1
µA = (δz1 + · · · + δzN ) (z1 , . . . , zn are complex eigenvalues of A)
N

of A is uniquely determined by the knowledge of all ∗-moments of A, but con-


vergence of ∗-moments does not necessarily imply convergence of the eigen-
value distribution.

117
Example. Consider

0 1 0 ··· 0 0 1 0 ··· 0
   
. . ..  . .
 .. .
.. .. ... .  .. . . . . . . . ... 
. 
   
. .. .. . .. ..
 .. .
 
AN =  . . 0
 and BN = 
. . . 0 .
.
. ...  
.. 
. 1

0

. 1 
0 ··· ··· ··· 0 1 0 ··· ··· 0

Then µAN = δ0 , but µBN is the uniform distribution on the N -th roots of
unity. Hence µAN → δ0 , whereas µBN converges to the uniform distribution
on the unit circle. However, the limits of the ∗-moments are the same for AN
and BN .
(4) For each measure µ on C one has the Stieltjes transform
Z
1
Sµ (w) = dµ(z).
z−w
C

This is almost surely defined. However, it is analytic in w only outside the


support of µ. In order to recover µ from Sµ one also needs the information
about Sµ inside the support. In order to determine and deal with µA one
reduces it via Girko’s “hermitization method”
Z Zt
log |λ − z| dµA (z) = log t dµ|A−λ1| (t)
C 0

to selfadjoint matrices. The left hand side for all λ determines µA and the
right hand side is about selfadjoint matrices
q
|A − λ1| = (A − λ1)(A − λ1)∗ .

Note that the eigenvalues of |B| are related to those of


!
0 B
.
B∗ 0

In this analytic approach one still needs to control convergence properties. For
this, estimates of probabilities of small singular values are crucial.
For more details on this one should have a look at the survey of Bordenave–
Chafai, Around the circular law.

118
12 Several Independent GUEs
and Asymptotic Freeness
Up to now, we have only considered limits N → ∞ of one random matrix AN . But
often one has several matrix ensembles and would like to understand the “joint”
distribution; e.g., in order to use them as building blocks for more complicated
random matrix models.

12.1 The problem of non-commutativity


(N ) (N )
Remark 12.1. (1) Consider two random matrices A1 and A2 such that their
entries are defined on the same probability space. What is now the “joint”
information about the two matrices which survies in the limit N → ∞? Note
that in general our analytical approach breaks down if A1 and A2 do not
commute, since then we cannot diagonalize them simultaneously. Hence it
makes no sense to talk about a joint eigenvalue distribution of A1 and A2 .
The notion µA1 ,A2 has no clear analytic meaning.
What still makes sense in the multivariate case is the combinatorial approach
via “mixed” moments with respect to the normalized trace tr. Hence we con-
(N ) (N )
sider the collection of all mixed moments tr(Ai1 · · · Aim ) in A1 and A2 , with
m ∈ N, i1 , . . . , im ∈ {1, 2}, as the joint distribution of A1 and A2 and denote
this by µA1 ,A2 . We want to understand, in interesting cases, the behavior of
µA1 ,A2 as N → ∞.
(2) In the case of one selfadjoint matrix A, the notion µA has two meanings:
analytic as µA = N1 (δλ1 + · · · + δλN ), which is a probability measure on R;
combinatorial, where µA is given by all moments tr[Ak ] for all k ≥ 1.
These two points of view are the same (at least when we restrict to cases where
the proabability measure µ is determined by its moments) via
Z
tr(Ak ) = tk dµA (t).

In the case of two matrices A1 , A2 the notion µA1 ,A2 has only one meaning,

119
namely the collection of all mixed moments tr[Ai1 · · · Aim ] with m ∈ N and
i1 , . . . , im ∈ {1, 2}. If A1 and A2 do not commute then there exists no proba-
bility measure µ on R2 such that
Z
tr[Ai1 · · · Aim ] = ti1 · · · tim dµ(t1 , t2 )

for all m ∈ N and i1 , . . . , im ∈ {1, 2}.

12.2 Joint moments of independent GUEs


We will now consider the simplest case of several random matrices, namely r gues
(N )
A1 , . . . , A(N )
r , which we assume to be independent of each other, i.e., we have
(N ) (i) (N )
Ai = √1N (akl )N k,l=1 , where i = 1, . . . , r, each Ai is a gue(n) and
n o n o
(1) (r)
akl ; k, l = 1, . . . , N , . . . , akl ; k, l = 1, . . . , N

are independent sets of Gaussian random variables. Equivalently, this can be charac-
terized by the requirement that all entries of all matrices together form a collection of
independent standard Gaussian variables (real on the diagonal, complex otherwise).
Hence we can express this again in terms of the Wick formula 2.8 as
h i h i
(i ) (i ) X (i ) (i )
E ak11l1 · · · akmmlm = Eπ ak11l1 , . . . , akmmlm
π∈P2 (m)

for all m ∈ N, 1 ≤ k1 , l1 , . . . , km , lm ≤ N and 1 ≤ i1 , . . . , im ≤ r and where the


second moments are given by
h i
(i) (j)
E apq akl = δpl δqk δij .

Now we can essentially repeat the calculations from Remark 2.14 for our mixed
moments:
N
1 X h
(i ) (i ) (i )
i
E [tr(Ai1 · · · Aim )] = m E ak11k2 ak22k3 · · · akmmk1
N 1+ 2 k1 ,...,km =1
N
1 X X h
(i ) (i ) (i )
i
= 1+ m
Eπ ak11k2 , ak22k3 , . . . , akmmk1
N 2
k1 ,...,km =1 π∈P2 (m)
N
1 X X Y h
(i ) (i )
i
= 1+ m
E akppkp+1 akqqkq+1
N 2
k1 ,...,km =1 π∈P2 (m) (p,q)∈π

120
N
1 X X Y
= 1+ m
[kp = kq+1 ] [kq = kp+1 ] [ip = iq ]
N 2
k1 ,...,km =1 π∈P2 (m) (p,q)∈π
N
1 X X Yh i
= 1+ m
kp = kγπ(p)
N 2
π∈P2 (m) k1 ,...,km =1 p
(p,q)∈π
ip =iq
1
N #(γπ) ,
X
= 1+ m
N 2
π∈P2 (m)
(p,q)∈π
ip =iq

where γ = (1 2 . . . m) ∈ Sm is the shift by 1 modulo m. Hence we get the same kind


of genus expansion for several gues as for one gue. The only difference is, that in
our pairings we only allow to connect the same matrices.

Notation 12.2. For a given i = (i1 , . . . , im ) with 1 ≤ i1 , . . . , im ≤ r we say that


π ∈ P2 (m) respects i if we have ip = iq for all (p, q) ∈ π. We put
[i]
P2 (m) := {π ∈ P2 (m) | π respects i}

and also
[i]
N C 2 (m) := {π ∈ N C 2 (m) | π respects i} .

Theorem 12.3 (Genus expansion of independent gues). Let A1 , . . . , Ar be r inde-


pendent gue(n). Then we have for all m ∈ N and all i1 , . . . , im ∈ [r] that
m
N #(γπ)− 2 −1
X
E [tr(Ai1 · · · Aim )] =
[i]
π∈P2 (m)

and thus
[i]
lim E [tr(Ai1 · · · Aim )] = #N C 2 (m).
N →∞

Proof. The genus expansion follows from our computation above. The limit for
N → ∞ follows as for Wigner’s semicircle law 2.21 from the fact that

#(γπ)− m −1
1, π ∈ N C 2 (m),
lim N 2 =
N →∞ 0, π 6∈ N C 2 (m).

The index tuple (i1 , . . . , im ) has no say in this limit.

121
12.3 The concept of free independence
Remark 12.4. We would like to find some structure in those limiting moments. We
prefer to talk directly about the limit instead of making asymptotic statements. In
the case of one gue, we had the semicircle µW as a limiting analytic object. Now
we do not have an analytic object in the limit, but we can organize our distribution
as the limit of moments in a more algebraic way.

Definition 12.5. (1) Let A = Chs1 , . . . , sr i be the algebra of polynomials in non-


commuting variables s1 , . . . , sr ; this means A is the free unital algebra gener-
ated by s1 , . . . , sr (i.e., there are no non-trivial relations between s1 , . . . , sr and
A is the linear span of the monomials si1 · · · sim for m ≥ 0 and i1 , . . . , im ∈ [r];
multiplication for monomials is given by concatenation).
(2) On this algebra A we define a unital linear functional ϕ : A → C by ϕ(1) = 1
and
[i]
ϕ(si1 · · · sim ) = lim E [tr(Ai1 · · · Aim )] = #N C 2 (m).
N →∞

(3) We also address (A, ϕ) as a non-commutative probability space and s1 , . . . , sr ∈


A as (non-commutative) random variables. The moments of s1 , . . . , sr are the
ϕ(si1 · · · sim ) and the collection of those moments is the (joint) distribution of
s1 , . . . , s r .

Remark 12.6. (1) Note that if we consider only one of the si , then its distribution
is just the collection of Catalan numbers, hence correspond to the semicircle,
which we understand quite well.
(2) If we consider all s1 , . . . , sr , then their joint distribution is a large collection
of numbers. We claim that the following theorem discovers some important
structure in those.

Theorem 12.7. Let A = Chs1 , . . . , sr i and let ϕ : A → C be defined by ϕ(si1 · · · sim ) =


[i]
#N C 2 (m) as before. Then for all m ≥ 1, i1 , . . . , im ∈ [r] with i1 6= i2 , i2 6=
i3 , . . . , im−1 6= im and all polynomials p1 , . . . , pm in one variable such that ϕ (pk (sik )) =
0 we have:
ϕ (p1 (si1 )p2 (si2 ) · · · pm (sim )) = 0.
In words: the alternating product of centered variables is centered.

We say that s1 , . . . , sr are free (or freely independent); in terms of the indepen-
(N )
dent gue random matrices, we say that A1 , . . . , A(N r
)
are asymtotially free. Those
notions and the results above are all due to Dan Voiculescu.

122
Proof. It suffices to prove the statement for polynomials of the form
 
pk (sik ) = spikk − ϕ spikk

for any power pk , since general polynomials can be written as linear combinations
of those. The general statement then follows by linearity. So we have to prove that
h     i
ϕ spi11 − ϕ spi11 · · · spimm − ϕ spimm = 0.

We have
 
p p
h     i  
spi11 − ϕ spi11 · · · spimm − ϕ spimm (−1)|M |
X Y Y
ϕ = ϕ sijj ϕ  sijj 
M ⊂[m] j∈M j6∈M

with
p
   
ϕ sijj = ϕ sij · · · sij = #N C 2 (pj )
and
   
Y p [respects indices] X
ϕ sijj  = #N C 2  pj  .
j6∈M j6∈M

Let us put

I1 = {1, . . . , p1 }
I2 = {p1 + 1, . . . , p1 + p2 }
..
.
Im = {p1 + p2 + · · · + pm−1 + 1, . . . , p1 + p2 + · · · + pm }

and I = I1 ∪ I2 ∪ · · · ∪ Im . Denote

[. . . ] = [i1 , . . . , i1 , i2 , . . . , i2 , . . . , im , . . . , im ].

Then
 
p p
 
Y Y [... ]
ϕ sijj ϕ  sijj  = #{π ∈ N C 2 (I) | for all j ∈ M all elements
j∈M j6∈M

in Ij are only paired amongst each other}

Let us denote
[... ] [... ]
N C 2 (I : j) := {π ∈ N C 2 (I) | elements in Ij are only paired amongst each other}.

123
Then, by the inclusion-exclusion formula,
 
h     i
[... ]
spi11 − ϕ spi11 · · · spimm − ϕ spimm (−1)|M | · # 
X \
ϕ = N C 2 (I : j)
M ⊂[m] j∈M
 
[... ] [ [... ]
= # N C 2 (I)\ N C 2 (I : j) .
j

[... ]
These are π ∈ N C 2 (I) such that at least one element of each interval Ij is paired
with an element from another interval Ik . Since i1 6= i2 , i2 6= i3 , . . . , im−1 6= im
we cannot connect neighboring intervals and each interval must be connected to
another interval in a non-crossing way. But there is no such π, hence
 
h     i
[··· ] [··· ]
sip11 spi11 spimm spimm
[
ϕ −ϕ ··· −ϕ = # N C 2 (I)\ N C 2 (I : j) = 0,
j

as claimed.
Remark 12.8. (1) Note that in Theorem 12.7 we have traded the explicit descrip-
tion of our moments for implicit relations between the moments.
(2) For example, the simplest relations from Theorem 12.7 are
 
ϕ [spi − ϕ(spi )1][sqj − ϕ(sqj )1] = 0,
for i 6= j, which can be reformulated to
ϕ(spi sqj ) − ϕ(spi 1)ϕ(sqj ) − ϕ(spi )ϕ(sqj 1) + ϕ(spi )ϕ(sqj )ϕ(1) = 0,
i.e.,
ϕ(spi sqj ) = ϕ(spi )ϕ(sqj ).
Those relations are quickly getting more complicated. For example,
ϕ [(sp11 − ϕ(sp11 )1)(sq21 − ϕ(sq21 )1)(sp12 − ϕ(sp12 )1)(sq22 − ϕ(sq22 )1)] = 0
leads to
 
ϕ (sp11 sq21 sp12 s2q2 ) = ϕ sp11 +p2 ϕ (sq21 ) ϕ (sq22 )
 
+ ϕ (sp11 ) ϕ (sp12 ) ϕ sq21 +q2
− ϕ (sp11 ) ϕ (sq21 ) ϕ (sp12 ) ϕ (sq22 ) .
These relations are to be considered as non-commutative versions for the fac-
toriziation rules of expectations of independent random variables.

124
(3) One might ask: What is it good for to find those relations between the mo-
ments, if we know the moments in a more explicit form anyhow?
Answer: Those relations occur in many more situations. For example, inde-
pendent Wishart matrices satisfy the same relations, even though the explicit
form of their mixed moments is quite different from the gue case.
Furthermore, we can control what happens with these relations much better
than with the explicit moments if we deform our setting or construct new
random matrices out of other ones.
Not to mention that those relations also show up in very different corners of
mathematics (like operator algebras).
To make a long story short: Those relations from Theorem 12.7 are really
worth being investigated further, not just in a random matrix context, but
also for its own sake. This is the topic of a course on Free Probability Theory,
which can, for example, be found here:
rolandspeicher.files.wordpress.com/2019/08/free-probability.pdf

125
13 Exercises
13.1 Assignment 1
Exercise 1. Make yourself familiar with MATLAB (or any other programming
language which allows you to generate random matrices and calculate eigenvalues).
In particular, you should try to generate random matrices and calculate and plot
their eigenvalues.

Exercise 2. In this exercise we want to derive the explicit formula for the Catalan
numbers. We define numbers ck by the recursion
k−1
X
ck = cl ck−l−1 (13.1)
l=0

for k > 0, with the initial data c0 = 1.


(1) Show that the numbers ck are uniquely defined by the recursion (13.1) and its
initial data.
(2) Consider the (generating) function

ck z k
X
f (z) =
k=0

and show that the recursion (13.1) implies the relation

f (z) = 1 + zf (z)2 .

(3) Show hat f is a power series representation for



1 − 1 − 4z
z 7→ .
2z
Note: You may use the fact that the formal power series f , defined in (2), has
a positive radius of convergence.

127
(4) Conclude that
!
1 2k
ck = C k = .
k+1 k
Exercise 3. Consider the semicircular distribution, given by the density function
1√
4 − x2 1[−2,2] , (13.2)

where 1[−2,2] denotes the indicator function of the interval [−2, 2]. Show that (13.2)
indeed defines a probability measure, i.e.
2
1 Z √
4 − x2 dx = 1.

−2

Moreover show that the even moments of the measure are given by the Catalan
numbers and the odd ones vanish, i.e.
2

1 Z n√ 0 n is odd
x 4 − x2 dx = .
2π Ck n = 2k
−2

13.2 Assignment 2
Exercise 4. Using your favorite programing language or computer algebra system,
generate N × N random matrices for N = 3, 9, 100. Produce a plot of the eigenvalue
distribution for a single random matrix and as well as a plot for the average over a
reasonable number of matrices of given size. The entries should be independent and
identically distributed (i.i.d.) according to
(1) the Bernoulli distribution 21 (δ−1 + δ1 ), where δx denotes the Dirac measure
with atom x.
(2) the normal distribution.
Exercise 5. Prove Proposition 2.2, i.e. compute the moments of a standart Gaus-
sian random variable:
Z∞

1 2 0 n odd,
− t2
√ tn e dt =
2π −∞ (n − 1)!! n even.

Exercise 6. Let Z, Z1 , Z2 , . . . , Zn be independent standard complex Gaussian ran-


dom variables with mean 0 and E[|Zi |] = 1 for i = 1, . . . , n.

128
(1) Show that

E[Zi1 , . . . , Zir Z̄j1 . . . Z̄jr ] = #{σ ∈ Sr : ik = jσ(k) for k = 1, . . . , r}.

(2) Show that



0 m 6= n,
E[Z n Z̄ m ] =
n! m = n.

Exercise 7. Let A = (aij )N


i,j=1 be a Gaussian (gue(n)) random matrix with entries

aii = xii and aij = xij + −1yij , i.e. the xij , yij are real i.i.d. Gaussian random
variables, normalized such that E[|a2ij |] = 1/N . Consider the N 2 random vector

(x11 , . . . , xN N , x12 , . . . , x1N , . . . , xN −1N , y12 , . . . , y1N , . . . , yN −1N )

and show that it has the density

Tr(A2 )
C exp(−N )dA,
2
where C is a constant and
N
Y Y
dA = dxii dxij dyij .
i=1 i<y

13.3 Assignment 3
Exercise 8. Produce histograms for various random matrix ensembles.
(1) Produce histograms for the averaged situation: average over 1000 realizations
for the eigenvalue distribution of a an N × N Gaussian random matrix (or
alternatively ±1 entries) and compare this with one random realization for
N = 5, 50, 500, 1000.
(2) Check via histograms that Wigner’s semicircle law is insensitive to the common
distribution of the entries as long as those are independent; compare typical
realisations for N = 100 and N = 3000 for different distributions of the entries:
±1, Gaussian, uniform distribution on the interval [−1, +1].
(3) Check what happens when we give up the constraint that the the entries are
centered; take for example the uniform distribution on [0, 2].

129
(4) Check whether the semicircle law is sensitive to what happens on the diagonal
of the matrix. Choose one distribution (e.g. Gaussian) for the off-diagonal
elements and another distribution for the elements on the diagonal (extreme
case: put the diagonal equal to zero).
(5) Try to see what happens when we take a distribution for the entries which
does not have finite second moment; for example, the Cauchy distribution.
Exercise 9. In the proof of Theorem 3.9 we have seen that the m-th moment of
a Wigner matrix is asymptotically counted by the number of partitions σ ∈ P(m),
for which the corresponding graph Gσ is a tree; then the corresponding walk i1 →
i2 → · · · → im → i1 (where ker i = σ) uses each edge exactly twice, in opposite
directions. Assign to such a σ a pairing by opening/closing a pair when an edge is
used for the first/second time in the corresponding walk.
(1) Show that this map gives a bijection between the σ ∈ P(m) for which Gσ is a
tree and non-crossing pairings π ∈ N C2 (m).
(2) Is there a relation between σ and γπ, under this bijection?
Exercise 10. For a probability measure µ on R we define its Stieltjes transform Sµ
by Z
1
Sµ (z) := dµ(t)
t−z
R
+
for all z ∈ C := {z ∈ C | Im(z) > 0}. Show the following for a Stieltjes transform
S = Sµ .
(1) S : C+ → C + .
(2) S is analytic on C+ .
(3) We have

lim iyS(iy) = −1
y→∞
and sup y|S(x + iy)| = 1.
y>0,x∈R

13.4 Assignment 4
Exercise 11. (1) Let ν be the Cauchy distribution, i.e.,
1 1
dν(t) = dt.
π 1 + t2
Show that the Stieltjes transform of ν is given by
1
S(z) = for z ∈ C+ .
−i − z

130
(Note that this formula is not valid in C− .)
Recover from this the Cauchy distribution via the Stieltjes inversion formula.
(2) Let A be a selfadjoint matrix in MN (C) and consider its spectral distribution
µA = N1 N
P
i=1 δλi , where λ1 , . . . , λN are the eigenvalues (counted with multi-
plicity) of A. Prove that for any z ∈ C+ the Stieltjes transform SµA of µA is
given by
SµA (z) = tr[(A − zI)−1 ].
Exercise 12. Let (µN )N ∈N be a sequence of probability measures on R which con-
verges vaguely to µ. Assume that µ is also a probablity measure. Show the following.
(1) The sequence (µN )N ∈N is tight, i.e., for each ε > 0 there is a compact interval
I = [−R, R] such that µN (R\I) ≤ ε for all N ∈ N.
(2) µN converges to µ also weakly.
Exercise 13. The problems with being determined by moments and whether conver-
gence in moments implies weak convergence are mainly coming from the behaviour
of our probability measures around infinity. If we restrict everything to a compact
interval, then the main statements follow quite easily by relying on the Weierstrass
theorem for approximating continuous functions by polynomials. In the following
you should not use Theorem 4.12.
In the following let I = [−R, R] be a fixed compact interval in R.
(1) Assume that µ is a probability measure on R which has its support in I (i.e.,
µ(I) = 1). Show that all moments of µ are finite and that µ is determined by
its moments (among all probability measures on R).
(2) Consider in addition a sequence of probability measures µN , such that µN (I) =
1 for all N . Show that the following are equivalent:
• µN converges weakly to µ;
• the moments of µN converge to the corresponding moments of µ.

13.5 Assignment 5
In this assignment we want to investigate the behaviour of the limiting eigenvalue
distribution of matrices under certain perturbations. In order to do so, it is crucial
to deal with different kinds of matrix norms. We recall the most important ones for
the following exercises. Let A ∈ MN (C), then we define the following norms.
• The spectral norm (or operator norm):

kAk = max{ λ : λ is an eigenvalue of AA∗ }.
Some of its important properties are:

131
(i) It is submultiplicative, i.e. for A, B ∈ MN (C) one has
kABk ≤ kAk · kBk.
(ii) It is also given as the operator norm
kAxk2
kAk = sup ,
x∈CN kxk2
x6=0

where kxk2 is here the Euclidean 2-norm of the vector x ∈ CN .


• The Frobenius (or Hilbert-Schmidt or L2 ) norm:
 1/2 s X
kAk2 = Tr(A∗ A) = |aij |2
1≤i,j≤N

Exercise 14. In this exercise we will prove some useful facts about these norms,
which you will have to use in the next exercise when adressing the problem of
perturbed random matrices.
Prove the following properties of the matrix norms.
(1) For A, B ∈ MN (C) we have | Tr(AB)| ≤ kAk2 · kBk2 .
(2) Let A ∈ MN (C) be positive and B ∈ MN (C) arbitrary. Prove that
| Tr(AB)| ≤ kBk Tr(A).
(A ∈ MN (C) is positive if there is a matrix C ∈ MN (C) such that A = C ∗ C;
this is equivalent to the fact that A is selfadjoint and all the eigenvalues of A
are positive.)
(3) Let A ∈ MN (C) be normal, i.e. AA∗ = A∗ A, and B ∈ MN (C) arbitrary. Prove
that
max{kABk2 , kBAk2 } ≤ kBk2 · kAk
Hint: normal matrices are unitarily diagonalizable.
Exercise 15. In this main exercise we want to investigate the behaviour of the
eigenvalue distribution of selfadjoint matrices with respect to certain types of per-
turbations.
(1) Let A ∈ MN (C) be selfadjoint, z ∈ C+ and RA (z) = (A − zI)−1 . Prove that
1
kRA (z)k ≤
Im(z)
and that RA (z) is normal.

132
(2) First we study a general perturbation by a selfadjoint matrix.
Let, for any N ∈ N, XN = (Xij )N N
i,j=1 and YN = (Yij )i,j=1 be selfadjoint matrices
in MN (C) and define X̃N = XN + YN . Show that
s
1 tr(YN2 )
tr(R √1 XN (z)) − tr(R √1 X̃N )(z) ≤

N N (Im(z))2 N

(3) In this part we want to show that the diagonal of a matrix does not contribute
to the eigenvalue distribution in the large N limit, if it is not too ill-behaved.
As before, consider a selfadjoint matrix XN = (Xij )N D
i,j=1 ∈ MN (C); let XN =
(0)
diag(X11 , . . . , XN N ) be the diagonal part of XN and XN = XN − XND the part
of XN with zero diagonal. Assume that kXND k2 ≤ N for all N ∈ N. Show that


tr(R √1 XN (z)) − tr(R √1 X (0) )(z) → 0, as N → ∞.

N N N

13.6 Assignment 6
Exercise 16. We will address here concentration estimates for the law of large
numbers, and see that control of higher moments allows stronger estimates. Let
Xi be a sequence of independent and identically distributed random variables with
common mean µ = E[Xi ]. We put
n
1X
Sn := Xi .
n i=1

(1) Assume that the variance Var [Xi ] is finite. Prove that we have then the weak
law of large numbers, i.e., convergence in probability of Sn to the mean: for
any  > 0
P(ω | |Sn (ω) − µ| ≥ ) → 0, for n → ∞.
(2) Assume that the fourth moment of the Xi is finite, E [Xi4 ] < ∞. Show that
we have then the strong law of large numbers, i.e.,

Sn → µ, almost surely.

(Recall that by Borel–Cantelli it suffices for almost sure convergence to show


that ∞ X
P(ω | |Sn (ω) − µ| ≥ ) < ∞.)
n=1

133
One should also note that our assumptions for the weak and strong law of
large numbers are far from optimal. Even the existence of the variance is not
needed for them, but for proofs of such general versions one needs other tools
then our simple consequences of Cheyshev’s inequality.
Exercise 17. Let XN = √1N (xij )N i,j=1 , where the xij are all (without symmetry
condition) independent and identically distributed with standard complex Gaussian
distribution. We denote the adjoint (i.e., congugate transpose) of XN by XN∗ .
(1) By following the ideas from our proof of Wigner’s semicircle law for the gue
in Chapter 3 show the following: the averaged trace of any ∗-moment in XN
and XN∗ , i.e.,
p(1) p(m)
E[tr(XN · · · XN )] where p(1), . . . , p(m) ∈ {1, ∗}
is for N → ∞ given by the number of non-crossing pairings π in N C2 (m)
which satisfy the additional requirement that each block of π connects an X
with an X ∗ .
(2) Use the result from part (1) to show that the asymptotic averaged eigenvalue
distribution of WN := XN XN∗ is the same as the square of the semicircle
distribution, i.e. the distribution of Y 2 if Y has a semicircular distribution.
(3) Calculate the explicit form of the asymptotic averaged eigenvalue distribution
of WN .
(4) Again, the convergence is here also in probability and almost surely. Produce
histograms of samples of the random matrix WN for large N and compare it
with the analytic result from (3).
Exercise 18. We consider now random matrices WN = XN XN∗ as in Exercise 17,
but now we allow the XN to be rectangular matrices, i.e., of the form
1
XN = √ (xij ) 1≤i≤N ,
p 1≤j≤p

where again all xij are independent and identically distributed. We allow now real
or complex entries. (In case the entries are real, XN∗ is of course just the transpose
XNT .) Such matrices are called Wishart matrices. Note that we can now not multiply
XN and XN∗ in arbitrary order, but alternating products as in WN make sense.
(1) What is the general relation between the eigenvalues of XN XN∗ and the eigen-
values of XN∗ XN . Note that the first is an N × N matrix, whereas the second
is a p × p matrix.
(2) Produce histograms for the eigenvalues of WN := XN XN∗ for N = 50, p = 100
as well as for N = 500, p = 1000, for different distributions of the xij ;

134
• standard real Gaussian random variables
• standard complex Gaussian random variables
• Bernoulli random variables, i.e., xij takes on values +1 and −1, each with
probability 1/2.
(3) Compare your histograms with the density, for c = 0.5 = N/p, of the Marchenko–
Pastur distribution which is given by
q
(λ+ − x)(x − λ− ) √ 2
1[λ− ,λ+ ] (x),

where λ± := 1 ± c .
2πcx

13.7 Assignment 7
Exercise 19. Prove – by adapting the proof for the goe case and parametrizing
a unitary matrix in the form U = e−iH , where H is a selfajoint matrix – Theorem
7.6: The joint eigenvalue distribution of the eigenvalues of a gue(n) is given by a
density
N 2 2
ĉN e− 2 (λ1 +···+λN ) (λl − λk )2 ,
Y

k<l

restricted on λ1 < λ2 < · · · < λN ,

Exercise 20. In order to get a feeling for the repulsion of the eigenvalues of goe
and gue compare histograms for the following situations:
• the eigenvalues of a gue(n) matrix for one realization
• the eigenvalues of a goe(n) matrix for one realization
• N independently chosen realizations of a random variable with semicircular
distribution
for a few suitable values of N (for example, take N = 50 or N = 500).

Exercise 21. For small values of N (like N = 2, 3, 4, 5, 10) plot the histogram of
averaged versions of gue(n) and of goe(n) and notice the fine structure in the gue
case. In the next assignment we will compare this with the analytic expression for
the gue(n) density from class.

13.8 Assignment 8
Exercise 22. In this exercise we define the Hermite polynomials Hn by
2 /2 dn −x2 /2
Hn (x) = (−1)n ex e
dxn

135
and want to show that they are the same polynomials we defined in Definition 7.10
and that they satisfy the recursion relation. So, starting from the above definition
show the following.
(1) For any n ≥ 1,
xHn (x) = Hn+1 (x) + nHn−1 (x).
(2) Hn is a monic polynomial of degree n. Furthermore, it is an even function if
n is even and an odd function if n is odd.
(3) The Hn are orthogonal with respect to the Gaussian measure
2 /2
dγ(x) = (2π)−1/2 e−x dx.

More precisely, show the following:


Z
Hn (x)Hm (x)dγ(x) = δnm n!
R

Exercise 23. Produce histograms for the averaged eigenvalue distribution of a


gue(n) and compare this with the exact analytic density from Theorem 7.21.
(1) Rewrite first the averaged eigenvalue density
−1
1 1 1 NX 1 2
pN (µ) = KN (µ, µ) = √ Hk (µ)2 e−µ /2
N 2π N k=0 k!

for the unnormalized gue(n) to the density qN (λ) for the normalized gue(n)
(with second moment normalized to 1).
(2) Then average over sufficiently many normalized gue(n), plot their histograms,
and compare this to the analytic density qN (λ). Do this at least for N =
1, 2, 3, 5, 10, 20, 50.
(3) Check also numerically that qN converges, for N → ∞, to the semicircle.
(4) For comparison, also average over goe(n) and over Wigner ensembles with
non-Gaussian distribution for the entries, for some small N .

Exercise 24. In this exercise we will approximate the Dyson Brownian motions from
Section 8.3 by their discretized random walk versions and plot the corresponding
walks of the eigenvalues.
(1) Approximate the Dyson Brownian motion by its discretized random walk ver-
sion
k
X (i)
AN (k) := ∆ · AN , for 1 ≤ k ≤ K
i=1

136
(1) (K)
where AN , . . . , AN are K independent normalized gue(n) random matrices.
∆ is a time increment. Generate a random realization of such a Dyson random
walk AN (k) and plot the N eigenvalues λ1 (k), . . . , λN (k) of AN (k) versus k in
the same plot to see the time evolution of the N eigenvalues. Produce at least
plots for three different values of N .
Hint: Start with N = 15, ∆ = 0.01, K = 1500, but also play around with
those parameters.
(2) For the same parameters as in part (i) consider the situation where you replace
gue by goe and produce corresponding plots. What is the effect of this on
the behaviour of the eigenvalues?
(3) For the three considered cases of N in parts (1) and 2i), plot also N indepen-
dent random walks in one plot, i.e.,
k
∆ · x(i) ,
X
λ̃N (k) := for 1 ≤ k ≤ K
i=1

where x(1) , . . . , x(K) are K independent real standard Gaussian random vari-
ables.
You should get some plots like in Section 8.3.

13.9 Assignment 9
Exercise 25. Produce histograms for the Tracy–Widom distribution by plotting
(λmax − 2)N 2/3 .
(1) Produce histograms for the largest eigenvalue of gue(n), for N = 50, N = 100,
N = 200, with at least 5000 trials in each case.
(2) Produce histograms for the largest eigenvalue of goe(n), for N = 50, N = 100,
N = 200, with at least 5000 trials in each case.
(3) Consider also real and complex Wigner matrices with non-Gaussian distribu-
tion for the entries.
(4) Check numerically whether putting the diagonal equal to zero (in gue or
Wigner) has an effect on the statistics of the largest eigenvalue.
(5) Bonus: Take a situation where we do not have convergence to semicircle, e.g.,
Wigner matrices with Cauchy distribution for the entries. Is there a reasonable
guess for the asymptotics of the distribution of the largest eigenvalue?
(6) Superbonus: Compare the situation of repelling eigenvalues with “indepen-
dent” eigenvalues. Produce N independent copies x1 , . . . , xN of variables dis-
tributed according to the semicircle distribution and take then the maximal

137
value xmax of these. Produce a histogram of the statistics of xmax . Is there a
limit of this for N → ∞; how does one have to scale with N ?
Exercise 26. Prove the estimate for the Catalan numbers
4k
Ck ≤ 3/2 √ ∀k ∈ N.
k π
Show that this gives the right asymptotics, i.e., prove that
4k √
lim 3/2 = π.
k→∞ k Ck
Exercise 27. Let Hn (x) be the Hermite polynomials. The Christoffel-Darboux
identity says that
n−1
X Hk (x)Hk (y) Hn (x)Hn−1 (y) − Hn−1 (x)Hn (y)
= .
k=0 k! (x − y) (n − 1)!
(1) Check this identity for n = 1 and n = 2.
(2) Prove the identity for general n.

13.10 Assignment 10
Exercise 28. Work out the details for the “almost sure” part of Corollary 9.6, i.e.,
prove that almost surely the largest eigenvalue of gue(n) converges, for N → ∞,
to 2.
Exercise 29. Consider the rescaled Hermite functions

Ψ̃(x) := N 1/12 ΨN (2 N + xN −1/6 ).
(1) Check numerically that the rescaled Hermite functions have a limit for N → ∞
by plotting them for different values of N .
(2) Familarize yourself with the Airy function. Compare the above plots of Ψ̃N
for large N with a plot of the Airy function.
Hint: MATLAB has an implementation of the Airy function, see
https://de.mathworks.com/help/symbolic/airy.html
Exercise 30. Prove that the Hermite functions satisfy the following differential
equations:
x √
Ψ0n (x) = − Ψn (x) + nΨn−1 (x)
2
and
1 x2
Ψ00n (x) + (n + − )Ψn (x) = 0.
2 4

138
13.11 Assignment 11
Exercise 31. Read the notes “Random Matrix Theory and its Innovative Applica-
tions” by A. Edelman and Y. Wang,
http:
//math.mit.edu/~edelman/publications/random_matrix_theory_innovative.pdf

and implement its “Code 7” for calculating the Tracy–Widom distribution (via solv-
ing the Painlevé II equation) and compare the output with the histogram for the
rescaled largest eigenvalue for the gue from Exercise 25. You should get a plot like
after Theorem 9.10.

Exercise 32. For N = 100, 1000, 5000 plot in the complex plane the eigenvalues of
one N × N random matrix √1N AN , where all entries (without symmetry condition)
are independent and identically distributed according to the
(i) standard Gaussian distribution;
(ii) symmetric Bernoulli distribution;
(iii) Cauchy distribution.

139
14 Literature
Books
(1) Gernot Akemann, Jinho Baik, Philippe Di Francesco: The Oxford Handbook
of Random Matrix Theory, Oxford Handbooks in Mathematics, 2011.
(2) Greg Anderson, Alice Guionnet, Ofer Zeitouni: An Introduction to Random
Matrices, Cambridge University Press, 2010.
(3) Zhidong Bai, Jack Silverstein: Spectral Analysis of Large Dimensional Random
Matrices, Springer-Verlag 2010.
(4) Patrick Billingsley: Probability and Measure, John Wiley & Sons, 3rd edition,
1995.
(5) Stéphane Boucheron, Gábor Lugosi, Pascal Massart: Concentration inequali-
ties: A nonasymptotic theory of independence, Oxford University Press, Ox-
ford, 2013.
(6) Alice Guionnet: Large Random Matrices: Lectures on Macroscopic Asymp-
totics, Springer-Verlag 2009.
(7) Madan Lal Mehta: Random Matices, Elsevier Academic Press, 3rd edition,
2004.
(8) James Mingo, Roland Speicher: Free Probability and Random Matrices, Springer-
Verlag, 2017.
(9) Alexandru Nica, Roland Speicher: Lectures on the Combinatorics of Free Prob-
ability, Cambridge University Press 2006.

Lecture Notes and Surveys


(10) Nathanaël Berstycki: Notes on Tracy–Widom Fluctuation Theory, 2007.
(11) Charles Bordenave, Djalil Chafaï: Around the circular law, Probability Surveys
9 (2012) 1-89.
(12) Alan Edelman, Raj Rao: Random matrix theory, Acta Numer. 14 (2005),
233-297.
(13) Alan Edelman, Raj Rao: The polynomial method for random matrices, Found.
Comput. Math. 8 (2008), 649-702.
(14) Todd Kemp: MATH 247A: Introduction to Random Matrix Theory, lecture
notes, UC San Diego, fall 2013.

141

You might also like