Lecture Notes
Combinatorics
Lecture by Maria Axenovich and Torsten Ueckerdt (KIT)
Problem Classes by Jonathan Rollin (KIT)
Contents
0 What is Combinatorics? 4
2 Inclusion-Exclusion Principle and Möbius Inversion 45
2.1 The Inclusion-Exclusion Principle . . . . . . . . . . . . . . . . . . 45
2.1.1 Applications . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.1.2 Stronger Version of PIE . . . . . . . . . . . . . . . . . . . 52
2.2 Möbius Inversion Formula . . . . . . . . . . . . . . . . . . . . . . 53
3 Generating Functions 58
3.1 Newton’s Binomial Theorem . . . . . . . . . . . . . . . . . . . . . 62
3.2 Exponential Generating Functions . . . . . . . . . . . . . . . . . 64
3.3 Recurrence Relations . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.3.1 Special Solution an = xn and the Characteristic Polynomial 70
3.3.2 Advancement Operator . . . . . . . . . . . . . . . . . . . 76
3.3.3 Non-homogeneous Recurrences . . . . . . . . . . . . . . . 80
3.3.4 Solving Recurrences using Generating Functions . . . . . 82
4 Partitions 85
4.1 Partitioning [n] – the set on n elements . . . . . . . . . . . . . . . 85
4.1.1 Non-Crossing Partitions . . . . . . . . . . . . . . . . . . . 86
4.2 Partitioning n – the natural number . . . . . . . . . . . . . . . . 87
4.3 Young Tableau . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.3.1 Counting Tableaux . . . . . . . . . . . . . . . . . . . . . . 99
4.3.2 Counting Tableaux of the Same Shape . . . . . . . . . . . 100
6 Designs 132
6.1 (Non-)Existence of Designs . . . . . . . . . . . . . . . . . . . . . 133
6.2 Construction of Designs . . . . . . . . . . . . . . . . . . . . . . . 135
6.3 Projective Planes . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.4 Steiner Triple Systems . . . . . . . . . . . . . . . . . . . . . . . . 138
6.5 Resolvable Designs . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.6 Latin Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
What is Combinatorics?
Combinatorics is a young field of mathematics, having become an independent
branch only in the 20th century. However, combinatorial methods and problems
have been around for much longer. Many combinatorial problems look entertaining
or aesthetically pleasing, and indeed one can say that the roots of combinatorics lie
in mathematical recreations and games. Nonetheless, this field has grown to be
of great importance in today's world, not least because of its use in other fields
like the physical sciences, social sciences, biological sciences, information theory and
computer science.
Interconnections: Assume a discrete structure has some properties (num-
ber of arrangements, . . . ) that match with another discrete structure.
Can we specify a concrete connection between these structures? If
this other structure is well-known, can we draw conclusions about our
structure at hand?
We will give some life to this abstract list of tasks in the context of the
following example.
Example (Dimer Problem). Consider a generalized chessboard of size m×n (m
rows and n columns). We want to cover it perfectly with dominoes of size 2 × 1
or with generalized dominoes – called polyominoes – of size k × 1. That means
we want to put dominoes (or polyominoes) horizontally or vertically onto the
board such that every square of the board is covered and no two dominoes (or
polyominoes) overlap. A perfect covering is also called tiling. Consider Figure
1 for an example.
Existence
If you look at Figure 1, you may notice that whenever m and n are both odd
(in the Figure they were both 5), then the board has an odd number of squares
and a tiling with dominoes is not possible. If, on the other hand, m is even or
n is even, a tiling can easily be found. We will generalize this observation for
polyominoes:
Claim. An m × n board can be tiled with polyominoes of size 1 × k if and only
if k divides m or n.
Proof. “⇐” If k divides m, it is easy to construct a tiling: Just cover every
column with m/k vertical polyominoes. Similarly, if k divides n, cover
every row using n/k horizontal polyominoes.
“⇒” Assume k divides neither m nor n (but note that k could still divide
the product m · n). We need to show that no tiling is possible. We
write m = s1 k + r1 , n = s2 k + r2 for appropriate s1 , s2 , r1 , r2 ∈ N and
0 < r1 , r2 < k. Without loss of generality, assume r1 ≤ r2 (the argument
is similar if r2 < r1 ). Consider the colouring of the m × n board with k
colours as shown in Figure 2.
[Figure 2: an 8 × 9 board coloured with k colours; the colour is constant along diagonals.]
Formally, the colour of the square (i, j) is defined to be ((i − j) mod k) + 1.
Any polyomino of size k × 1 that is placed on the board will cover exactly
one square of each colour. However, there are more squares of colour 1
than of colour 2, which shows that no tiling with k × 1 polyominoes is possible:
in a tiling every colour would have to appear equally often. Indeed, counting
the squares coloured 1 and 2 (and using r1 ≤ r2) one finds that colour 1 appears
exactly once more than colour 2.
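The counting argument is easy to verify by machine. The following Python sketch (our addition, not part of the notes) tallies the colours ((i − j) mod k) + 1 on an m × n board; whenever k divides neither m nor n and m mod k ≤ n mod k, colour 1 shows up exactly once more than colour 2, so a tiling is impossible.

```python
from collections import Counter

def colour_counts(m, n, k):
    """Count how often each colour ((i - j) mod k) + 1 appears on an m x n board."""
    return Counter(((i - j) % k) + 1 for i in range(1, m + 1) for j in range(1, n + 1))

# k divides neither m nor n, and m % k <= n % k as in the proof:
# colour 1 appears exactly once more than colour 2
for m, n, k in [(5, 5, 2), (5, 7, 4), (8, 9, 6)]:
    counts = colour_counts(m, n, k)
    print(m, n, k, counts[1], counts[2])
```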
Now that the existence of tilings is answered for rectangular boards, we may
be inclined to consider other types of boards as well:
Claim (Mutilated Chessboard). The n×n board with bottom-left and top-right
square removed (see Figure 3) cannot be tiled with (regular) dominoes.
Figure 3: A “mutilated” 6 × 6 board. The missing corners have the same colour.
Proof. If n is odd, then the total number of squares is odd and clearly no tiling
can exist. If n is even, consider the usual chessboard-colouring: In it, the missing
squares are of the same colour, say black. Since there was an equal number of
black and white squares in the non-mutilated board, there are now two more
white squares than black squares. Since dominoes always cover exactly one
black and one white square, no tiling can exist.
Other ways of pruning the board have been studied, but we will not consider
them here.
Enumeration
A general formula to determine the number of ways an m × n board can be tiled
with dominoes is known. For a 2m × 2n board the following formula is due to
Temperley and Fisher [TF61] and independently Kasteleyn [Kas61]:
$$4^{mn} \prod_{i=1}^{m} \prod_{j=1}^{n} \left( \cos^2\frac{i\pi}{2m+1} + \cos^2\frac{j\pi}{2n+1} \right).$$
Theorem (Fisher 1961). There are $2^4 \cdot 17^2 \cdot 53^2 = 12{,}988{,}816$ ways to tile the
8 × 8 board with dominoes.
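The product formula can be evaluated numerically. The following Python sketch (our addition; it uses floating-point arithmetic and rounds the result) reproduces Fisher's count for the 8 × 8 board, i.e. m = n = 4.

```python
import math

def domino_tilings(m, n):
    """Number of domino tilings of a 2m x 2n board via the
    Temperley-Fisher / Kasteleyn product formula (rounded from floats)."""
    prod = 1.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            prod *= math.cos(i * math.pi / (2 * m + 1)) ** 2 \
                  + math.cos(j * math.pi / (2 * n + 1)) ** 2
    return round(4 ** (m * n) * prod)

print(domino_tilings(4, 4))  # 8 x 8 board: 12988816
print(domino_tilings(1, 2))  # 2 x 4 board: 5
```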
Classification
Consider tilings of the 4 × 4 board with dominoes. For some of these tilings
there is a vertical line through the board that does not cut through any domino.
Call such a line a vertical cut. In the same way we define horizontal cuts.
As it turns out, for every tiling of the 4 × 4 board at least one cut exists,
possibly several (try this for yourself!).
Hence the set T of all tilings can be partitioned into those tilings admitting only horizontal cuts, those admitting only vertical cuts, and those admitting both (compare Figure 4).
Figure 4: Some tilings have horizontal cuts, some have vertical cuts and some
have both.
Figure 5: The board B consisting of two 4 × 4 boards and two extra squares
connecting them as shown. The partial covering on the right cannot be extended
to a tiling.
Meta Structure
We say two tilings are adjacent, if one can be transformed into the other by
taking two dominoes lying like this and turning them by 90 degrees so they
are lying like this (or vice versa). Call this a turn operation. If we draw all
tilings of the 4 × 4 board and then draw a line between tilings that are adjacent,
we get the picture on the left of Figure 6.
With this in mind, we can speak about the distance of two tilings, the number
of turn operations required to transform one into the other. We could also look
for a pair of tilings with maximum distance (this would be an optimization
problem).
We can even discover a deeper structure, but for this we need to identify
different types of turn operations. We can turn into , call this an up-turn,
or into , call this a side-turn. The turn can happen on the background
or the background (white squares form a falling or rising diagonal). We call
an operation a flip if it is an up-turn on or a side-turn on . Call it a flop
otherwise.
Quite surprisingly, walking upward in the graph in Figure 6 always corre-
sponds to flips and walking downwards corresponds to flops. This means that
there is a natural partial ordering on the tilings: We can say a tiling B is greater
than a tiling A if we can get from A to B by a sequence of flips. As it turns out,
the order this gives has special properties. It is a so-called distributive lattice,
in particular, there is a greatest tiling.
Optimization
Different tilings have different sets of decreasing free paths. Such a path (see
Figure 7) proceeds monotonically from the top left corner of the board along
the borders of squares to the bottom right corner, and does not "cut through" any
domino.
Some tilings admit more such paths than others. It is conjectured that on an
m × n board the maximum number is always attained by one of the two tilings
where all dominoes are oriented the same way (see right of Figure 7), but no
proof has been found yet.
Figure 6: On the left: The set of all tilings of the 4 × 4 board. Two tilings
are connected by an edge if they can be transformed into one another by a
single flip. On the right: The same picture, but with the chessboard still drawn
beneath the tilings, so you can check that upward edges correspond to flips (as
defined in the text).
Figure 7: The green path does not cut through dominoes. We believe that the
boring tiling on the right admits the maximum number of such paths, but this
has not been proved for large boards.
1.1.3 Subtraction Principle
Let S be a subset of a finite set T. We define $\overline{S} := T \setminus S$, the complement of S
in T. Then the subtraction principle states that
$$|S| = |T| - |\overline{S}|.$$
Example. Let T be the set of students studying at KIT and $\overline{S}$ the set of students
studying neither math nor computer science. If we know |T| = 23905 and
$|\overline{S}|$ = 20178, then we can compute the number |S| of students studying either
math or computer science: $|S| = |T| - |\overline{S}| = 23905 - 20178 = 3727$.
Example. Let S be the set of students attending the lecture and T the set of
homework submissions for the first problem sheet.
If the number of students and the number of submissions coincide, then there
is a bijection between students and submissions¹, and vice versa.
We now consider two other principles that are similarly intuitive and natural
but will frequently and explicitly occur as patterns in proofs.
matter: That would require that you cleanly write your name onto your submission.
Example (Handshaking Lemma). Assume there are n people at a party and
everybody will shake hands with everybody else. What is the total number N of
handshakes that occur? If we just sum up for each guest each of its handshakes,
then we overcount each handshake. In order to control this overcount we shall
instead count the number M of pairs (P, S) where P is a guest and S is a
handshake involving P . We count this number in two ways:
First way: There are n guests and everybody shakes n − 1 hands. So
$$M = \sum_{P \text{ guest}} |\{(P, S) \mid S \text{ is a handshake involving } P\}| = \sum_{P \text{ guest}} (n-1) = n \cdot (n-1).$$
Second way: Every handshake involves exactly two guests, so each handshake S is counted in exactly two pairs (P, S). Hence M = 2N. Combining the two counts gives
$$N = \frac{n \cdot (n-1)}{2}.$$
Example. In the example above we have seen a first way how to count the
number N (already utilizing double counting). Here we count N in a second
way. Label the guests from 1 to n. To avoid counting a handshake twice, we
count for guest i only the handshakes with guests of smaller numbers. Then the
total number of handshakes is
$$\sum_{i=1}^{n} (i-1) = \sum_{i=0}^{n-1} i = \sum_{i=1}^{n-1} i.$$
i 1 2 3 4 5 6 7
s(i)
function: We call [n] the domain of s and s(i) the image of i under s (i ∈ [n]).
The set {x ∈ X | s(i) = x for some i ∈ [n]} is the range of s.
In the example, the domain of s is {1, 2, 3, 4, 5, 6, 7}, the image of 3 is
and the range of s is { , , , }. Note that is not in the range of s.
string: We call s an X-string of length n and write s = s(1)s(2)s(3) · · · s(n).
The i-th position (or character ) in s is denoted by si = s(i). The set X
is an alphabet and its elements are letters. Often s is called a word.
In the example, we would say that s = is a string (or word)
of length n over the five-letter alphabet X. The fourth character of s is
s4 = .
tuple: We can view s as an element of the n-fold Cartesian product X1 ×
. . . × Xn , where Xi = X for i ∈ [n]. We call s a tuple and write it
as (s1 , s2 , . . . , sn ). The element si is called the i-th coordinate (i ∈ [n]).
Viewing arrangements as elements of products makes it easy to restrict the
number of allowed values for a particular coordinate (just choose $X_i \subsetneq X$).
In the example we would write s = ( , , , , , , ). Its first coordinate
is .
In the following we will mostly view arrangements as functions but will freely
switch perspective when appropriate.
1.2.1 Permutations
The most important ordered arrangements are those in which the mapping is
injective, i.e., s(i) ≠ s(j) for i ≠ j.
Definition 1.1 (Permutation). Let X be a finite set. We define
permutation: A permutation of X is a bijective map π : [n] → X. Usually we
choose X = [n] and denote the set of all permutations of [n] by Sn .
We tend to write permutations as strings if n < 10, take for example
π = 2713546 by which we mean the function:
i 1 2 3 4 5 6 7
π(i) 2 7 1 3 5 4 6
Circular k-Permutation: We say that two k-permutations π1, π2 ∈ P(n, k)
are circular equivalents if there exists a shift s ∈ [k] such that
$\pi_1(i) = \pi_2\big(((i + s - 1) \bmod k) + 1\big)$ for all $i \in [k]$, i.e. one is obtained
from the other by a cyclic shift.
(ii) $|P_c(n, k)| = \dfrac{n!}{k \cdot (n-k)!}$.
First way: $|P(n,k)| = \frac{n!}{(n-k)!}$, which we proved in (i).
Second way: $|P(n,k)| = |P_c(n,k)| \cdot k$, because every equivalence class in
$P_c(n,k)$ contains k permutations from P(n, k) (since there are k ways
to rotate a k-permutation).
From this we get $\frac{n!}{(n-k)!} = |P_c(n,k)| \cdot k$, which implies the claim.
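As a sanity check (not part of the notes), one can enumerate all k-permutations of [n] with Python's itertools and group them into rotation classes; for n = 6 and k = 3 both the brute-force count and the formula give 40.

```python
from itertools import permutations
from math import factorial

def count_circular_k_permutations(n, k):
    """Brute force |P_c(n, k)|: k-permutations of [n] up to cyclic rotation."""
    seen = set()
    for p in permutations(range(1, n + 1), k):
        rotations = [p[i:] + p[:i] for i in range(k)]  # all cyclic shifts
        seen.add(min(rotations))                       # canonical representative
    return len(seen)

n, k = 6, 3
print(count_circular_k_permutations(n, k), factorial(n) // (k * factorial(n - k)))  # 40 40
```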
As with ordered arrangements, the most important case for unordered ar-
rangements is that all repetition numbers are 1, i.e. rx = 1 for all x ∈ S. Then
S is simply a subset of X, denoted by S ⊆ X.
Definition 1.3. k-Combination: Let X be a finite set. A k-combination of
X is an unordered arrangement of k distinct elements from X. We prefer
the more standard term subset and use "combination" only when we want
to emphasize the selection process. The set of all k-subsets of X is denoted
by $\binom{X}{k}$, and if |X| = n then we denote $\binom{n}{k} := \left|\binom{X}{k}\right|$.
k-Permutation of a Multiset: Let M be a finite multiset with set of types X.
A k-permutation of M is an ordered arrangement of k elements of M where
different orderings of elements of the same type are not distinguished.
This is an ordered multiset with types in X and repetition numbers
$s_1, \dots, s_{|X|}$ such that $s_i \le r_i$ for $1 \le i \le |X|$ and $\sum_{i=1}^{|X|} s_i = k$.
Note that there might be several elements of the same type compared
to a permutation of a set (where each repetition number equals 1). If for
example M = {2· , 1· , 3· , 1· }, then T = ( , , , ) is a 4-permutation
of the multiset M .
The coefficient in front of $x_1^2 x_2 x_3$ is 12, the coefficient in front of $x_1^2 x_3^2$ is 6 and
the coefficient in front of $x_3^4$ is 1.
According to the last definition this means:
$$\binom{4}{2,1,1} = 12, \qquad \binom{4}{2,0,2} = 6, \qquad \binom{4}{0,0,4} = 1.$$
The monomial $x_{i_1} x_{i_2} \cdots x_{i_n}$ is equal to $x_1^{k_1} \cdots x_r^{k_r}$ if among the indices $i_1, \dots, i_n$
exactly $k_1$ are equal to 1, $k_2$ are equal to 2, and so on.
We count the number of assignments of values to the indices $i_1, \dots, i_n$
satisfying this.
Choose $k_1$ indices to be equal to 1: there are $\binom{n}{k_1}$ ways to do so.
Choose $k_2$ indices to be equal to 2: there are $n - k_1$ indices left to choose
from, so there are $\binom{n-k_1}{k_2}$ ways to choose $k_2$ of them.
Choose $k_j$ indices to be equal to j (j ∈ [r]): there are $n - k_1 - \dots - k_{j-1}$
indices left to choose from, so there are $\binom{n - k_1 - \dots - k_{j-1}}{k_j}$ ways to choose $k_j$
of them.
Hence the multinomial coefficient (i.e. the coefficient of $x_1^{k_1} x_2^{k_2} \cdots x_r^{k_r}$) is
$$\binom{n}{k_1, \dots, k_r} = \binom{n}{k_1} \cdot \binom{n - k_1}{k_2} \cdot \ldots \cdot \binom{n - k_1 - k_2 - \dots - k_{r-1}}{k_r}$$
and the first identity is proved. Now use Theorem 1.4 to rewrite the binomial
coefficients and obtain
$$\frac{n!}{k_1!\,(n-k_1)!} \cdot \frac{(n-k_1)!}{k_2!\,(n-k_1-k_2)!} \cdot \ldots \cdot \frac{(n - k_1 - \dots - k_{r-1})!}{k_r!\,(n - k_1 - \dots - k_r)!}.$$
Conveniently, many of the factorial terms cancel: each factor $(n - k_1 - \dots - k_j)!$
appearing in a denominator cancels against the same factor in the numerator of the
next fraction, leaving
$$\frac{n!}{k_1!\, k_2! \cdots k_r!\, (n - k_1 - \dots - k_r)!} = \frac{n!}{k_1!\, k_2! \cdots k_r!},$$
since $k_1 + \dots + k_r = n$ and hence $(n - k_1 - \dots - k_r)! = 0! = 1$.
Note that the last Theorem establishes that binomial coefficients are special
cases of multinomial coefficients. We have for 0 ≤ k ≤ n:
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \binom{n}{k, n-k}.$$
We extend the definition of binomial and multinomial coefficients by setting
$\binom{n}{k_1,\dots,k_r} = 0$ if $k_i = -1$ for some i, and $\binom{n}{-1} = \binom{n}{n+1} = 0$. This makes stating
the following lemma more convenient.
Lemma 1.7 (Pascal's Formula). If n ≥ 1 and 0 ≤ k ≤ n, we have
$$\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}.$$
Proof. Note first that, in the case of binomial coefficients, the claim can be
rewritten as:
$$\binom{n}{k, n-k} = \binom{n-1}{k, n-k-1} + \binom{n-1}{k-1, n-k}.$$
Note that the condition ki − 1 ≥ 0 is not needed, because for the summands
with ki − 1 = −1 we defined the coefficient under the sum to be 0. We remove
it and swap the summation signs, ending up with:
$$(x_1 + \dots + x_r)^n = \sum_{\substack{k_1,\dots,k_r \ge 0\\ k_1+\dots+k_r = n}} \sum_{i=1}^{r} \binom{n-1}{k_1, \dots, k_i - 1, \dots, k_r}\, x_1^{k_1} \cdots x_r^{k_r}$$
as claimed.
You may already know Pascal’s Triangle. It is a way to arrange binomial
coefficients in the plane as shown in Figure 9. It is possible to do something
similar in the case of multinomial coefficients, however, when drawing all
coefficients of the form $\binom{n}{k_1,\dots,k_r}$, the drawing will be r-dimensional. For r = 3, a
"Pascal Pyramid" is given in Figure 10.
Now that we know what multinomial coefficients are and how to compute
them, it is time to see how they can help us count things.
Example. How many 6-permutations are there of the multiset { , , , , , }?
Trying to list them all (( , , , , , ), ( , , , , , ), ), ( , , , , , ), . . .)
would be tedious. Well, there are 6 possible positions for . Then there are 5
positions left, 2 of which need to contain . The three need to go to the three
remaining positions, so there is just one choice left for them. This means there
are $6 \cdot \binom{5}{2} \cdot 1 = 60$ arrangements in total. This is equal to $\binom{6}{1}\binom{5}{2}\binom{3}{3} = \binom{6}{1,2,3}$,
which is no coincidence:
Theorem 1.8. Let S be a finite multiset with k different types and repetition
numbers r1 , r2 , . . . rk . Let the size of S be n = r1 + r2 + · · · + rk . Then the
number of n-permutations of S equals
$$\binom{n}{r_1, \dots, r_k}.$$
Proof. Choose which $r_1$ of the n positions receive the elements of the first type:
there are $\binom{n}{r_1}$ choices. Then choose $r_2$ of the remaining $n - r_1$ positions for the
second type: there are $\binom{n-r_1}{r_2}$ choices. Continuing like this, the total number of
choices will be:
$$\binom{n}{r_1} \cdot \binom{n-r_1}{r_2} \cdot \ldots \cdot \binom{n - r_1 - r_2 - \dots - r_{k-1}}{r_k} \overset{\text{Thm 1.6}}{=} \binom{n}{r_1, r_2, \dots, r_k}.$$
We have seen before how the coefficient $\binom{n}{k} = \binom{n}{k,n-k}$ counts the number
of k-combinations of [n] (i.e. the number of k-element subsets of [n]). Now we learned that
n = 0:                     1
n = 1:                   1   1
n = 2:                 1   2   1
n = 3:               1   3   3   1
n = 4:             1   4   6   4   1
n = 5:           1   5  10  10   5   1
n = 6:         1   6  15  20  15   6   1

Figure 9: Arrangement of the binomial coefficients $\binom{n}{k}$ (the entries of row n are
$\binom{n}{0}, \binom{n}{1}, \dots, \binom{n}{n}$). Lemma 1.7 shows that the number $\binom{n}{k}$ is obtained as the sum of
the two numbers $\binom{n-1}{k-1}$ and $\binom{n-1}{k}$ directly above it.
Figure 10: Arrangement of the numbers $\binom{n}{k_1,k_2,k_3}$ with $k_1 + k_2 + k_3 = n$ in a
pyramid. To make the picture less messy, the numbers in the back (with large
$k_3$) are faded out. The three flanks of the pyramid are Pascal triangles, one of
which is the black triangle in the front with the numbers $\binom{n}{k_1,k_2,0}$. The first number
not on one of the flanks is $\binom{3}{1,1,1} = 6$.
it also counts the number of n-permutations of a multiset with two types and
repetition numbers k and n − k. How come ordered and unordered things are
counted by the same number? We demystify this by finding a natural bijection:
Consider the multiset $M := \{k \cdot X,\ (n-k) \cdot \overline{X}\}$ with types X and $\overline{X}$ (the
"chosen" type and the "unchosen" type). Now associate with an n-permutation
of M the set of positions that contain X; for instance, with n = 5, k = 2 and
the permutation $(X, \overline{X}, \overline{X}, X, \overline{X})$, the corresponding set would be $\{1, 4\} \subset [5]$
(since the first and fourth position received X). It is easy to see that every
n-permutation of M corresponds to a unique k-element subset of [n] and vice
versa.
Note that so far we only considered n-permutations of multisets of size n.
What about general r-permutations of multisets of size n? For example, the
number of 2 permutations of the multiset { , , , , } is 7, since there is:
, , , , , , .
Note that  and  are not possible, since we have only one copy of
and  at our disposal. The weird number 7 already suggests that general r-
permutations of n element multisets may not be as easy to count. Indeed, there
is no simple formula as in Theorem 1.8 but we will see an answer using the
principle of inclusion and exclusion later.
There is a special case other than r = n that we can handle, though: If
all repetition numbers $r_i$ of a multiset with k types are at least
r, for instance when considering 2-permutations of M := { , , , , , , },
then those repetition numbers do not actually impose a restriction, since we
will never run out of copies of any type. We sometimes sloppily write
M = {∞ · , ∞ · , ∞ · } where the infinity sign indicates that there are "many"
copies of the corresponding elements. The number of r-permutations of M is
then equal to $k^r$: just choose one of the k types for each of the r positions.
After permutations of multisets, we now consider combinations.
Example. Say you are told to bring two pieces of fruit from the supermarket
and they got , and (large quantities of each). How many choices do you
have? Well, there is: { , }, { , }, { , }, { , }, { , }, { , }, so six
combinations. Note that bringing a and an is the same as bringing an
and a (your selection is not ordered), so this option is counted only once.
We now determine the number of combinations for arbitrary number of types
and number of elements to choose.
Theorem 1.9. Let r, k ∈ N and let S be a multiset with k types and large repe-
tition numbers (each r1 , . . . , rk is at least r), then the number of r-combinations
of S equals
$$\binom{k + r - 1}{r}.$$
Proof. For clarity, we do the proof alongside an example. Let the types be
a1 , a2 , . . . , ak , for instance k = 4 and a1 = , a2 = , a3 = , a4 = . Then
imagine the r-combinations laid-out linearly, first all elements of type a1 then
all of type a2 and so on. In our example this could be
Now for each i ∈ [k − 1], draw a delimiter between types ai and ai+1 , in our
example:
.
Note that, since we have no elements of type a2 = , there are two delimiters
directly after one another. Given these delimiters, drawing the elements is
actually redundant, just replacing every element with “•” yields:
• • • • •.
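Theorem 1.9 is easy to cross-check by brute force. The following Python sketch (our addition) enumerates the r-combinations of k types directly and compares the count with the formula; for k = 3 and r = 2 it recovers the six fruit combinations from the example.

```python
from itertools import combinations_with_replacement
from math import comb

def count_r_combinations(k, r):
    """Brute force: number of r-combinations of k types with unlimited supply."""
    return sum(1 for _ in combinations_with_replacement(range(k), r))

# compare with the stars-and-bars formula of Theorem 1.9
for k, r in [(3, 2), (4, 3), (5, 4)]:
    print(count_r_combinations(k, r), comb(k + r - 1, r))
```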
(i) No box may contain more than one ball. Example: When assigning stu-
dents to topics in a seminar, there may be at most one student per topic.
(ii) Each box must contain at least one ball. Example: When you want to get
10 people and 5 cars to Berlin, you have some flexibility in distributing
the people to cars, but a car cannot drive on its own.
(iii) No restriction.
A summary of the results of this section is given in the end in Table 1.
Instead of counting the ways balls can be arranged in boxes, some people
count the ways balls can be put into the boxes or picked from the
boxes, but this does not actually make a difference. We now systematically
examine all 12 cases.
to points (points are not labeled, they are just points) and boxes correspond to
problems (they are labeled: There is problem 1,2,3,4 and 5).
In other words, we search for solutions to the equation
30 = x1 + x2 + x3 + x4 + x5
with non-negative integers x1 , . . . , x5 , for example:
30 = 8 + 4 + 5 + 5 + 7 or 30 = 10 + 5 + 8 + 4 + 3.
The students say that they would like it if some of the problems are “bonus
problems” worth zero points, so the partition 30 = 10 + 10 + 10 + 0 + 0 should
be permissible. Torsten remains unconvinced.
We come back to this later and examine three cases for general n and k in
the balls-and-boxes formulation:
≤ 1 ball per box Of course, this is only possible if there are at most as many
balls as boxes (n ≤ k). For n = 2 and k = 5 one arrangement would be:
1 2 3 4 5
Each of the k boxes can have two states: occupied (one ball in it) or
empty (no ball in it), and exactly n boxes are occupied. The number of
ways to choose these occupied boxes is $\binom{k}{n}$.
≥ 1 ball per box Of course, this is only possible if there are at least as many
balls as boxes (n ≥ k). For example for n = 9 and k = 5:
1 2 3 4 5
To count the number of ways to do this, arrange the balls linearly, like
this:
There is a bijection between the arrangements with no empty box and the
choices of k − 1 gaps for delimiters out of the n − 1 gaps in total. We know
how to count the latter: there are $\binom{n-1}{k-1}$ possibilities.
Going back to the example, we now know there are $\binom{29}{4}$ solutions to the
equation
$$x_1 + x_2 + x_3 + x_4 + x_5 = 30$$
where $x_1, \dots, x_5$ are positive integers.
arbitrary number of balls per box Now boxes are allowed to contain any
number of elements, including 0. One example for n = 7 and k = 5 would
be:
1 2 3 4 5
One way to count: choose the i boxes that stay empty and distribute the
n balls onto the remaining k − i boxes with at least one ball each; summing
over i gives
$$\sum_{i=0}^{k-1} \binom{k}{i} \binom{n-1}{k-i-1}.$$
This looks different from the other two results, which means we “ac-
cidentally” proved the non-trivial identity:
$$\binom{n+k-1}{k-1} = \sum_{i=0}^{k-1} \binom{k}{i} \binom{n-1}{k-i-1} \qquad (n, k \ge 1).$$
Going back to the example, we now know there are $\binom{30+5-1}{5-1} = \binom{34}{4}$ solutions to the
equation:
30 = x1 + x2 + x3 + x4 + x5
with non-negative integers, i.e. that many assignments of points to the five
exercises on exercise sheets such that the total number of points is 30.
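The "accidentally" proved identity can be checked numerically. Here is a small Python sketch (our addition) interpreting the right-hand side as "choose i empty boxes, then fill the rest with at least one ball each".

```python
from math import comb

def lhs(n, k):
    return comb(n + k - 1, k - 1)

def rhs(n, k):
    # choose i boxes to stay empty, distribute n balls onto the remaining
    # k - i boxes with at least one ball each
    return sum(comb(k, i) * comb(n - 1, k - i - 1) for i in range(k))

for n in range(1, 8):
    for k in range(1, 8):
        assert lhs(n, k) == rhs(n, k)
print("identity checked for 1 <= n, k <= 7")
```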
Torsten thinks that no exercise should be worth more than 10 points. In
the balls and boxes setting this limits the capacity ri of a box i. This makes
counting much more difficult but we will see a way to address this in Chapter 2
using the principle of inclusion and exclusion.
(As for homework problems, it turns out that Jonathan put \def\points{6}
into his latex preamble, which settles the issue.)
≥ 1 ball per box Of course, this is only possible if there are at least as many
balls as boxes (n ≥ k).
This is the same as the number of partitions of [n] into k non-empty parts,
which we also call Stirling numbers of the second kind and write as $s^{II}_k(n)$.
Some values are easy to determine:
• $s^{II}_0(0) = 1$: There is one way to partition the empty set into non-empty
parts: $\emptyset = \bigcup_{X \in \emptyset} X$. Each $X \in \emptyset$ is non-empty (because no
such X exists).
• $s^{II}_0(n) = 0$ (for n ≥ 1): There is no way to partition non-empty sets
into zero parts.
• $s^{II}_1(n) = 1$ (for n ≥ 1): Every non-empty set X can be partitioned
into one non-empty set in exactly one way: X = X.
• $s^{II}_2(n) = \frac{2^n - 2}{2} = 2^{n-1} - 1$ (for n ≥ 1): We want to partition [n] into
two non-empty parts. If we consider the parts labeled (there is a first
part and a second part), then choosing the first part fully determines
the second and vice versa. Every subset of [n] is allowed to be the
first part – except for ∅ and [n]. This amounts to $2^n - 2$ possibilities;
however, since the parts are actually unlabeled (there is no "first"
or "second"), every possibility is counted twice, so we need to divide
by 2.
• $s^{II}_n(n) = 1$: There is only one way to partition [n] into n parts: every
number gets its own part.
For n, k ≥ 1 the Stirling numbers satisfy the recursion
$$s^{II}_k(n) = k \, s^{II}_k(n-1) + s^{II}_{k-1}(n-1).$$
To see this, consider the box containing the ball with label n:
• The ball of label n may have its own box (with no other ball in
it). The number of such arrangements is equal to the number of
arrangements of the remaining n − 1 balls in k − 1 boxes such that
none of those k − 1 boxes is empty. There are $s^{II}_{k-1}(n-1)$ of those.
• The box with the ball of label n contains another ball. Then, when
removing ball n, there is still at least one ball per box. So removing
ball n gets us to an arrangement of n − 1 balls in k non-empty boxes.
There are $s^{II}_k(n-1)$ of those, and for each there are k possibilities
where ball n could have been before removal (note that the boxes
are distinguished by the balls that are already in them). So there are
$k \cdot s^{II}_k(n-1)$ arrangements where ball n is not alone in a box.
The recursion formula follows from summing up the values for the two
cases.
Note that the Stirling Numbers fulfill a recursion similar to the recursion of
binomial coefficients, so there is something similar to the Pascal Triangle.
See Figure 11.
n = 0:   1
n = 1:   0   1
n = 2:   0   1   1
n = 3:   0   1   3   1
n = 4:   0   1   7   6   1
n = 5:   0   1  15  25  10   1
n = 6:   0   1  31  90  65  15   1

Figure 11: "Stirling triangle" (entry k of row n is $s^{II}_k(n)$, counting from k = 0). A
number $s^{II}_k(n)$ is obtained as the sum of the number $s^{II}_{k-1}(n-1)$ towards the top left
and k times the number $s^{II}_k(n-1)$ towards the top right. E.g. for n = 6, k = 3:
90 = 15 + 3 · 25.
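The recursion translates directly into a short program. The following Python sketch (our addition) computes $s^{II}_k(n)$ by memoized recursion and reproduces row n = 6 of the triangle.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling numbers of the second kind via the recursion
    s2(n, k) = k * s2(n-1, k) + s2(n-1, k-1)."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

# reproduce the row n = 6 of the "Stirling triangle"
print([stirling2(6, k) for k in range(7)])  # [0, 1, 31, 90, 65, 15, 1]
```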
We will later see a closed form of $s^{II}_k(n)$ in Chapter 2 using the principle
of inclusion and exclusion.
arbitrary number of balls per box Empty boxes are allowed. To count all
arrangements, first choose the number i of boxes that should be non-empty
(0 ≤ i ≤ k), then count the number of arrangements of n balls into the i
boxes such that none of them is empty. This gives a total of:
$$\sum_{i=0}^{k} s^{II}_i(n).$$
[Two arrangements that place the same groups of students into different grade boxes 1.0, 1.3, 1.7, . . . , 5.0,]
so the boxes (grades) are clearly labeled (we could not draw the boxes 2.0 till
4.0 due to lack of space) . Furthermore, Alice (A) and Bob (B) insist that they
would notice the difference between the arrangement
A E CD FG B
1.0 1.3 1.7 3.0
... ...
B E CD FG A
1.0 1.3 1.7 3.0
... ...
so the balls (students) are labeled as well. Such arrangements correspond di-
rectly to functions f : [n] → [k], mapping each of the n balls to one of the k
boxes, or here: Mapping every student to their grade.
As before, we consider three subcases:
≤ 1 ball per box Of course, this is only possible if there are at most as many
balls as boxes (n ≤ k).
Such arrangements correspond to injective functions f : [n] → [k]. We
first choose the image of 1 ∈ [n] (there are k possibilities), then the image
of 2 ∈ [n] (there are k − 1 possibilities left) and so on. Therefore, the
number of injective functions (and therefore arrangements with at most
one ball per box) is:
$$k \cdot (k-1) \cdot \ldots \cdot (k - n + 1) = \frac{k!}{(k-n)!} = \binom{k}{n} \cdot n!.$$
≥ 1 ball per box Of course, this is only possible if there are at least as many
balls as boxes (n ≥ k).
These arrangements correspond to surjective functions from [n] to [k].
They can also be thought of as partitions of [n] into k non-empty distinguishable
(!) parts. So we count the number of ways to partition [n] into
k non-empty indistinguishable parts (there are $s^{II}_k(n)$ of those) and multiply this
by the number of ways, k!, to assign labels to the parts afterwards. So in
total, there are
$$k! \, s^{II}_k(n)$$
such arrangements.
arbitrary number of balls per box There are k choices for each of the n
balls, so $k^n$ arrangements in total.
Balls and boxes are unlabeled. Adria cannot distinguish any two balls and can
also not distinguish boxes with the same number of balls.
Even though boxes have no intrinsic ordering, we need to somehow arrange
them on this two-dimensional paper. In order to not accidentally think that
and are different, we use the convention of drawing boxes
in decreasing order of balls. With this convention, an arrangement will look
different on paper if and only if it is actually different in our sense.
With this in mind we see the number of arrangements of n unlabeled balls
in k unlabeled boxes is equal to the number of ways to partition the integer n
into k non-negative summands. For example:
9 = 4 + 2 + 2 + 1.
Partitions where merely the order of the summands differ are considered the
same, so again we use the convention of writing summands in decreasing order.
≤ 1 ball per box Of course, this is only possible if there are at most as many
balls as boxes (n ≤ k).
In that case, there is only one way to do it: Put every ball in its own box.
Then there are n boxes with a ball and k − n empty boxes.
≥ 1 ball per box Of course, this is only possible if there are at least as many
balls as boxes (n ≥ k).
As discussed before, we count the number of ways in which the integer n
can be partitioned into exactly k positive parts, i.e.
n = a1 + a2 + . . . + ak , where a1 ≥ a2 ≥ . . . ≥ ak ≥ 1.
Table 1: The number of arrangements of n balls in k boxes (U = unlabeled, L = labeled).

n balls | k boxes | ≤ 1 per box        | ≥ 1 per box        | arbitrary
U       | L       | $\binom{k}{n}$     | $\binom{n-1}{k-1}$ | $\binom{n+k-1}{k-1}$
L       | U       | 1                  | $s^{II}_k(n)$      | $\sum_{i=1}^{k} s^{II}_i(n)$
L       | L       | $\binom{k}{n}\,n!$ | $s^{II}_k(n)\,k!$  | $k^n$
U       | U       | 1                  | $p_k(n)$           | $q_k(n) = \sum_{i=1}^{k} p_i(n)$
[Figure: a lattice path from (0, 0) to (7, 4).]
In other words, a lattice path is a sequence of “↑” (upward steps) and “→”
(rightward steps) with a total of n times ↑ and m times →. In yet other words,
the lattice paths correspond to permutations of the multiset {m · →, n · ↑} and
by Theorem 1.8 their number is
$$\binom{m+n}{m}.$$
Example (Cake Number). For positive integers m and n the cake number
c(m, n) is the maximum number of pieces obtained when cutting an m-dimensional
cake by n cuts, i.e. the maximum number of connected components of
$\mathbb{R}^m \setminus \bigcup_{i=1}^{n} H_i$, where each $H_i$ is an (m − 1)-dimensional affine hyperplane. For m = 2, this
simply means: put n lines into the plane and observe into how many pieces the
plane is cut. See Figure 13 for an example.
[Figure 13: four lines in general position cutting the plane into 11 numbered pieces.]
darts onto the plane, the probability that the points will be in general position is 1.
Figure 14: There are n = 5 points in the plane (m = 3). Some subsets can be
captured by a circle (the picture shows: {a, d}, {b, d, e}, {c, d, e}, {c}, ∅), some
cannot, for example {b, c, e} (such a circle would also contain d).
(ii) $\sum_{k=1}^{n} (2k - 1) = n^2$.
Proof. (i) For the first identity, arrange 1 + 2 + . . . + n dots in a triangle, mirror
it, and observe that this fills all positions of a square of size (n+1) × (n+1)
except for the diagonal. Here is a picture for n = 6:
[picture: a triangular arrangement of dots and its mirror image filling a 7 × 7 square except for its diagonal]
Therefore $2 \cdot \sum_{k=1}^{n} k = (n+1)^2 - (n+1) = n(n+1)$. Dividing by 2 proves
the claim.
(ii) Note how tiles with sizes equal to the first n odd numbers can be arranged to
form a square of size n × n; here is a picture for n = 5:
[picture: tiles of sizes 1, 3, 5, 7, 9 nested to form a 5 × 5 square]
Theorem 1.16 (Proofs by Double Counting). We have
(i) $\sum_{k=0}^{n} 2^k \binom{n}{k} = 3^n$,
(ii) $\sum_{i=0}^{m} \binom{n+i}{i} = \binom{n+m+1}{m}$.
Proof. (i) We double count strings over the alphabet {0, 1, 2} of length n, in
other words, the n-permutations of the multiset {∞ · 0, ∞ · 1, ∞ · 2} where
there is an infinite supply of the types 0, 1, 2.
First way: There are three possibilities per character, so $3^n$ possibilities
in total.
Second way: First choose the number of times k that the letter "0"
should be used (0 ≤ k ≤ n). Then choose the positions for those
characters; there are $\binom{n}{k}$ possibilities. Finally, choose for each of the
remaining n − k positions whether it should be 1 or 2; there are $2^{n-k}$
choices. So in total, there is this number of possibilities:
$$\sum_{k=0}^{n} 2^{n-k} \binom{n}{k},$$
which equals $\sum_{k=0}^{n} 2^{k} \binom{n}{k}$ after replacing k by n − k.
This gives the equality $\binom{n+m}{m} = \sum_{k=0}^{m} \binom{k+n-1}{k}$. In the claim, we merely
replaced n by n + 1.
A few other identities can be derived by using the connection of multinomial
coefficients to polynomials.
Figure 15: At some point, a lattice path must cross the dashed line, i.e. use
one of the edges going to the last row. We count the number of paths using the
highlighted edge from (4, 3) to (4, 4): there are $\binom{4+3}{4}$ ways to get from (0, 0) to
(4, 3) and one way to get from (4, 4) to (7, 4), so $\binom{4+3}{4} \cdot 1$ paths in total.
Theorem 1.17 (Proofs by Analysis). We show three equations (i), (ii) and
(iii), see below.
Proof. Start with the binomial formula:
$$(x+y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}.$$
Setting y = 1 yields:
(i) $(x+1)^n = \sum_{i=0}^{n} \binom{n}{i} x^i$.
Differentiating with respect to x gives:
$$n(x+1)^{n-1} = \sum_{i=1}^{n} i \binom{n}{i} x^{i-1}.$$
Now set x = 1 and get:
(ii) $n \cdot 2^{n-1} = \sum_{i=0}^{n} i \binom{n}{i}$.
Taking the second derivative of (i) yields:
$$n(n-1)(x+1)^{n-2} = \sum_{i=2}^{n} i(i-1) \binom{n}{i} x^{i-2}.$$
Again, we set x = 1:
$$n(n-1)2^{n-2} = \sum_{i=0}^{n} i(i-1) \binom{n}{i}.$$
Adding (ii) to this gives:
(iii) $n(n+1)2^{n-2} = \sum_{i=0}^{n} i^2 \binom{n}{i}$.
For such a permutation, the pair (i, j) of two numbers from [n] is called an
inversion if i < j but j appears before i in π, i.e. $\pi^{-1}(j) < \pi^{-1}(i)$.
Take for example the permutation π = 31524. It has the inversions (1, 3), (2, 3),
(2, 5), (4, 5).
For i ∈ [n] define αi := |{j ∈ [n] | (i, j) is inversion}|. The inversion sequence
of π is the sequence α1 α2 . . . αn .
The inversion number or disorder of a permutation π is the number of its
inversions, so α1 +· · ·+αn . For our example the inversion sequence is 1, 2, 0, 1, 0
and its disorder is 4.
Since for any i ∈ [n] there are only n − i numbers bigger than i, any inversion
sequence $\alpha_1, \dots, \alpha_n$ satisfies:
$$0 \le \alpha_i \le n - i \quad (i \in [n]). \qquad (\star)$$
π = _ _ _ _ _ _ _ _  (n positions)
To reconstruct π from its inversion sequence, place the numbers 1, 2, . . . , n one
after another. The number 1 goes into the position with exactly α1 unoccupied
positions to its left; in the example α1 = 5, so:
π = _ _ _ _ _ 1 _ _
Now recall that α2 is the number of elements bigger than 2 that are sorted to
the left of 2, this means, α2 is exactly the number of (currently) unoccupied
positions to the left of 2. In the example we have α2 = 3, so π must look like
this:
π = _ _ _ 2 _ 1 _ _
For general: i ∈ [n], if we have already placed all number from 1 to i − 1, we
can derive that π −1 (i) must be the position of the free slot that has exactly αi
free slots to the left of it.
In the example, since α3 = 4 we put 3 into the unoccupied position that has
four unoccupied positions to the left of it:
π = _ _ _ 2 _ 1 3 _
Continuing like this, we end up with π = 4 7 6 2 5 1 3 8.
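The placement procedure is easy to implement. The following Python sketch (our addition) reconstructs the example permutation from its inversion sequence (5, 3, 4, 0, 2, 1, 0, 0), which we computed from π = 47625138 using the definition.

```python
def from_inversion_sequence(alpha):
    """Reconstruct the permutation from its inversion sequence alpha:
    place 1, 2, ..., n one after another, putting i into the free slot
    that has exactly alpha[i-1] free slots to its left."""
    n = len(alpha)
    slots = [None] * n
    for i in range(1, n + 1):
        free = [p for p in range(n) if slots[p] is None]
        slots[free[alpha[i - 1]]] = i
    return slots

print(from_inversion_sequence([5, 3, 4, 0, 2, 1, 0, 0]))  # [4, 7, 6, 2, 5, 1, 3, 8]
```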
[Figure: all permutations of [4] (and, alongside, of [3]), each entry marked with a left or right arrow; consecutive permutations in the listing differ by swapping two adjacent entries.]
i 1 2 3 4 5 6 7 8
π(i) 7 8 5 4 6 1 3 2
• Another way to specify the map would be to connect i with π(i), maybe
like this:
[two rows labelled 1–8, with a line connecting each i in the top row to π(i) in the bottom row]
• In the last drawing every number occurs twice (once as a preimage and
once as an image), now we only take one copy of every number and give
a direction to the edges (from preimage to image):
[drawing of the directed cycles 1 → 7 → 3 → 5 → 6 → 1, 2 → 8 → 2 and 4 → 4]
In other words, we draw an edge from i → j if π(i) = j. Since π is a map,
every node gets one outgoing edge and since π is bijective, every node gets
exactly one incoming edge. From this it is easy to see that the image we
get is a collection of cycles: When starting at any node i and following the
arrows, i.e. walking along the path i → π(i) → π(π(i)) → π(π(π(i))) →
. . ., we must at some point – since there are only finitely many elements
– come for the first time to an element where we already were. This must
be i, since all other nodes already have an incoming edge. So i was indeed
in a cycle.
Instead of drawing the picture we prefer to just list the cycles like this:
π = (17356)(28)(4)
where cycles are grouped by "( )" and the elements within a cycle are given in
order. The representation at hand is called a disjoint cycle decomposition.
Note that every element occurs in exactly one cycle (some cycles may have
length 1). Note that this decomposition is, strictly speaking, not unique: we
could also write π = (4)(73561)(28), where we changed the order of two
cycles and "rotated" 7 to the beginning of its cycle. In the following, we
will not distinguish disjoint cycle decompositions that differ only in these
ways.
Before we generalize disjoint cycle decompositions to arbitrary cycle decom-
positions, we define the product of two permutations:
If π1 , π2 ∈ Sn are permutations, the product π1 · π2 (or just π1 π2 for short)
is the permutation π with π(i) = π2 (π1 (i)) (i.e. first apply π1 , then π2 ). Note,
that when viewed as maps: π1 · π2 = π2 ◦ π1 .
Now if we write a single cycle, e.g. π ∈ S7 , π = (137), we mean the permu-
tation that permutes the elements in the cycle along the cycle (here π(1) = 3,
π(3) = 7, π(7) = 1) and leaves all other elements in place, in our example π is
the permutation with the disjoint cycle decomposition (137)(2)(4)(5)(6).
We can now talk about cycle decompositions where elements may occur more
than once, e.g. π = (136)(435)(7452)(6)(23). It is the product of the involved
cycles ( (136), (435), . . . ).
If π ∈ Sn is a permutation given as a cycle decomposition, then π can be
evaluated by parsing, where parsing an element i ∈ [n] in a cycle decomposition
means
• Have a current element c, initially c = i.
• Go through cycles from left to right.
– If the current element c is part of a cycle, replace c with the next
element in this cycle.
• In the end of this process π(i) is the current element c.
Take for instance π = (136)(435)(7452)(6)(23) and i = 3. Then we start with
the current element c = 3. The first cycle (136) contains c = 3, so we change
c to 6 and go on. The next cycle (435) does not contain c = 6, and neither
does (7452) so we move past these cycles without changing c. The next cycle
(6) contains c = 6 but does not alter it. We therefore end with π(3) = 6. As
another example, consider how π(1) is evaluated:
c = 1 → (136) → c = 3 → (435) → c = 5 → (7452) → c = 2 → (6) → c = 2 → (23) → c = 3
So π(1) = 3.
Theorem 1.19. If π1 , π2 ∈ Sn are permutations given in cycle decomposition,
then a cycle decomposition of π1 · π2 is given by concatenating the cycle decom-
positions of π1 and π2 .
Proof. Let the cycle decompositions be π1 = C1 C2 . . . Ck , π2 = C10 C20 . . . Cl0 .
When parsing C1 C2 . . . Ck C10 C20 . . . Cl0 with i ∈ [n] then the current element
after C1 C2 . . . Ck will be π1 (i) and therefore, after going through the remaining
cycles C10 . . . Cl0 , the current element will be π2 (π1 (i)) = (π1 · π2 )(i). So the
concatenation is indeed a cycle decomposition of π1 · π2 .
Note that non-disjoint cycle decompositions do not always commute, e.g.
(435)(321) = (13542) 6= (15432) = (321)(435)
where the identities can be verified by parsing. The claims of the following
theorem are easy and we will skip the proofs:
Theorem 1.20. Let π ∈ Sn be given in cycle decomposition.
• Swapping adjacent cycles has no effect if they are disjoint (i.e. no number
occurs in both cycles), e.g. (13)(524) = (524)(13).
• A cycle decomposition of $\pi^{-1}$ is obtained by swapping the order of all
cycles and reversing the elements in each cycle, e.g. $((321)(435))^{-1} = (534)(123)$.
• Cyclic shifts in a cycle have no effect. (534) = (345) = (453).
• Up to cyclic shifts and order of cycles, there is a unique decomposition
into disjoint cycles.
Let id ∈ Sn be the identity permutation (id(i) = i for i ∈ [n]). Define the order
of π ∈ Sn to be the smallest k ≥ 1 such that
$$\pi^k = \underbrace{\pi \cdot \pi \cdot \ldots \cdot \pi}_{k \text{ times}} = \mathrm{id}.$$
Theorem 1.21. The order of π is the least common multiple of the lengths of
the cycles in the disjoint cycle decomposition of π.
Proof. Assume π is composed of the disjoint cycles C1 C2 . . . Cm and an element
i ∈ [n] is contained in a cycle $C_j$ of length l. Then for any k ∈ N, the value
$\pi^k(i)$ is obtained by parsing i through
$$\underbrace{C_1 C_2 \ldots C_m \; C_1 C_2 \ldots C_m \; \ldots \; C_1 C_2 \ldots C_m}_{k \text{ copies of } C_1 \ldots C_m}$$
Only the copies of Cj can affect the current value if we start with i, so the result
is the same as when parsing i through k copies of just Cj . From this we see
that π k (i) = i if and only if k is a multiple of l. This shows that π k = id if and
only if k is a common multiple of the cycle lengths. So, the order of π is the
least common multiple of them.
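Both the disjoint cycle decomposition and the order are easy to compute. The following Python sketch (our addition) extracts the cycles of the permutation π = (17356)(28)(4) considered above and computes its order as the least common multiple of the cycle lengths.

```python
from math import lcm  # math.lcm needs Python 3.9+

def disjoint_cycles(perm):
    """Disjoint cycle decomposition of a permutation given as a dict perm[i] = pi(i)."""
    seen, cycles = set(), []
    for start in perm:
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        cycles.append(tuple(cycle))
    return cycles

pi = {1: 7, 2: 8, 3: 5, 4: 4, 5: 6, 6: 1, 7: 3, 8: 2}  # the permutation 78546132 from the text
cycles = disjoint_cycles(pi)
print(cycles)                          # [(1, 7, 3, 5, 6), (2, 8), (4,)]
print(lcm(*(len(c) for c in cycles)))  # order of pi: lcm(5, 2, 1) = 10
```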
1.7.3 Transpositions
A cycle of length 2 is a transposition.
Define the discriminant of π as N(π) := n − #C, where #C is the number of
cycles in a disjoint cycle decomposition of π. Note that we may not omit single-element
cycles now; they count towards #C:
In $S_5$ we have for example N(id) = N((1)(2)(3)(4)(5)) = n − 5 = 0 and
N((134)(25)) = 5 − 2 = 3.
Theorem 1.22.
(i) Any permutation can be written as the product of transpositions.
(ii) If π is the product of k transpositions (k ∈ N), then N (π) and k have the
same parity (i.e. N (π) ≡ k (mod 2)).
Proof. (i) Let π be any permutation. We already know that we can write π
as a product of cycles. So to show that π can be written as a product
of transpositions, it suffices to show that any cycle can be written as the
product of transpositions.
Let $C = (a_1 a_2 \dots a_l)$ be a cycle of length l (assume l ≥ 2, since cycles of
length 1 correspond to identity permutations and can be omitted from any
cycle decomposition).
Then we can write C as the product of l − 1 transpositions:
$$C = (a_1 a_2 \dots a_l) = (a_{l-1} a_l)(a_{l-2} a_{l-1}) \cdots (a_2 a_3)(a_1 a_2).$$
Verify this by parsing: if i ≠ l, then parsing $a_i$ keeps the current element $c = a_i$
while passing the cycles $(a_{l-1}a_l), (a_{l-2}a_{l-1}), \dots$ until the cycle $(a_i a_{i+1})$ changes c
to $a_{i+1}$; the remaining cycles $(a_{i-1}a_i), \dots, (a_2a_3), (a_1a_2)$ do not contain $a_{i+1}$
and leave c unchanged, so indeed $a_i \mapsto a_{i+1}$.
In both cases, adding a transposition changed the number of cycles by one,
which means that the parity of the number of cycles changed. Therefore,
the claim follows by induction.
We call the number $s^{I}_k(n)$ of permutations of [n] with exactly k cycles in
a disjoint cycle decomposition the unsigned Stirling number of the first kind. More
precisely, $s^{I}_k(n) = |\{\pi \in S_n \mid N(\pi) = n - k\}|$.
Theorem 1.23. For all n, k ≥ 1:
$$s^{I}_k(n) = s^{I}_{k-1}(n-1) + (n-1)\, s^{I}_k(n-1).$$
n = 0:   1
n = 1:   0    1
n = 2:   0    1    1
n = 3:   0    2    3    1
n = 4:   0    6   11    6    1
n = 5:   0   24   50   35   10    1
n = 6:   0  120  274  225   85   15    1

Figure 16: Triangle of the unsigned Stirling numbers of the first kind (entry k of
row n is $s^{I}_k(n)$, counting from k = 0). A number $s^{I}_k(n)$ is obtained as the sum of
the number $s^{I}_{k-1}(n-1)$ towards the top left and (n − 1) times the number $s^{I}_k(n-1)$
towards the top right. E.g. for n = 6, k = 3: 225 = 50 + 5 · 35.
1.7.4 Derangements
A permutation π ∈ Sn is a derangement if it is fixpoint-free, i.e., ∀i ∈ [n] :
π(i) ≠ i. Note that this means that each cycle has length at least 2. The set of
all derangements of [n] is denoted by Dn .
We close this chapter by “counting” derangements in Theorem 1.24. This
is also meant to demonstrate that there are several different ways to count.
Depending on the purpose, different counts may be more or less helpful.
Remark. You might find the notation |Dn | =!n in the literature (sic!). Since we
think this notation is too confusing we won’t use it here.
Before we start observe that |D1 | = 0 and |D2 | = 1. We may also agree
on |D0 | = 1, since the empty permutation has no fixpoint, but we try to avoid
using D0 .
Theorem 1.24. For a natural number n ≥ 1 we have
$|D_n| = (n-1)\,\big(|D_{n-1}| + |D_{n-2}|\big)$, if $n \ge 2$,   (Recursion (i))
$|D_n| = n\,|D_{n-1}| + (-1)^n$,   (Recursion (ii))
$n! = \sum_{k=0}^{n} \binom{n}{k} |D_k|$,  $\quad |D_n| = n! - \sum_{k=0}^{n-1} \binom{n}{k} |D_k|$,   (Recursion (iii))
$|D_n| = \sum_{k=0}^{n} (-1)^{n+k} \binom{n}{k}\, k! = n! \sum_{k=0}^{n} \frac{(-1)^k}{k!}$,   (Summation)
$|D_n| = \left\lfloor \frac{n!}{e} + \frac{1}{2} \right\rfloor$,   (Explicit)
$|D_n| \sim \sqrt{2\pi n}\; \frac{n^n}{e^{n+1}}$  ($\sim c^{\,n \log n}$ for some $c \in \mathbb{R}$),   (Asymptotic)
$\sum_{n \ge 0} \frac{|D_n|}{n!}\, x^n = \frac{e^{-x}}{1-x}$ for all $x \in \mathbb{R} \setminus \{1\}$.   (Generating Function)
Proof.
Recursion (i): We prove this statement using a map φ : Dn → Dn−1 ∪ Dn−2
as follows. Consider a derangement π ∈ Dn and let x, y ∈ [n] be the
unique numbers such that π(x) = n and π(n) = y. Written as a string, π
is of the form π = AnBy where A is a sequence of x − 1 elements and B
is a sequence of n − x − 1 elements.
If (case 1) x ≠ y, then define φ(π) := AyB. Note that σ = φ(π) ∈ D_{n-1},
since σ(x) = y ≠ x. For example, if π = 45123, then x = 2, y = 3 and
φ(π) = 4312.
If (case 2) x = y, then define φ(π) := A*B* where A* and B* are
identical to A and B except that all numbers bigger than x are decreased
by 1. We claim that σ = φ(π) ∈ D_{n-2}. If i < x, then σ(i) = π(i) ≠ i if
π(i) < x, and σ(i) = π(i) − 1 ≥ x > i if π(i) > x. Otherwise i ≥ x and
σ(i) = π(i+1) < x ≤ i if π(i+1) < x, and σ(i) = π(i+1) − 1 ≠ (i+1) − 1 = i
if π(i+1) > x. For example, if π = 45132, then x = y = 2, AB = 413 and
φ(π) = 312.
Claim. Each σ ∈ Dn−1 ∪Dn−2 is the image of exactly (n−1) permutations
π ∈ Dn .
Case 1: σ ∈ Dn−1 . Then there are n − 1 choices to pick a position
x ∈ [n − 1]. Let σ(x) = y, write σ = AyB and define π = AnBy.
Then φ(π) = σ. Note that π ∈ Dn , since π(x) = n ≠ x.
Case 2: σ ∈ Dn−2 . Then there are n − 1 choices to put n on position
x (as the first element, into a gap or after the last element). Then
there is a unique way to increase σ(i) by 1 for each i with σ(i) ≥ x.
Finally put x on position n. We obtain π = AnBx, which is mapped
to σ. Note that π ∈ Dn due to similar arguments as above.
Recursion (ii): We will apply induction on n. An induction basis is given by
|D2 | = 1 and |D1 | = 0 . Suppose n ≥ 3. We rewrite Recursion (i) as
|Dn | − n|Dn−1 | = −(|Dn−1 | − (n − 1)|Dn−2 |). By induction hypothesis,
the right side is equal to −(−1)n−1 = (−1)n which proves the claim.
Remark. For the number |Sn| = P(n) = n! of permutations of [n] we have
a similar recursion $|S_n| = (n-1)(|S_{n-1}| + |S_{n-2}|)$, since $n! = (n-1)((n-1)! + (n-2)!)$,
and $|S_n| = n|S_{n-1}|$.
Recursion (iii): We count all permutations in Sn. On the one hand |Sn| = n!.
On the other hand each π ∈ Sn induces a derangement on [n] \ F(π), where
F(π) is the set of all fixpoints of π. Thus there are $\binom{n}{k} |D_{n-k}|$ different
permutations in Sn with exactly k fixpoints each. Hence
$$n! = |S_n| = \sum_{k=0}^{n} \binom{n}{k} |D_{n-k}|.$$
Summation: Proof by induction on n. If n = 1, then $|D_1| = 0 = 1! \sum_{k=0}^{1} \frac{(-1)^k}{k!} = 1 \cdot (1-1)$.
This gives an induction basis.
Consider n ≥ 2. Then, using the induction hypothesis (IH):
$$|D_n| = n|D_{n-1}| + (-1)^n \overset{\text{IH}}{=} n\,(n-1)! \sum_{k=0}^{n-1} \frac{(-1)^k}{k!} + (-1)^n = n! \sum_{k=0}^{n-1} \frac{(-1)^k}{k!} + (-1)^n \frac{n!}{n!} = n! \sum_{k=0}^{n} \frac{(-1)^k}{k!}.$$
Explicit: Recall that $e^z = \sum_{k \ge 0} \frac{z^k}{k!}$. Then
$$\lim_{n \to \infty} \frac{|D_n|}{|S_n|} = \lim_{n \to \infty} \frac{n! \sum_{k=0}^{n} \frac{(-1)^k}{k!}}{n!} = \lim_{n \to \infty} \sum_{k=0}^{n} \frac{(-1)^k}{k!} = e^{-1} = \frac{1}{e}.$$
Generating Function: Let $F(x) := \sum_{n \ge 0} \frac{|D_n|}{n!} x^n$. Using Recursion (i) (in the
form $|D_{n+1}| = n(|D_n| + |D_{n-1}|)$ for n ≥ 1) we compute
$$(1-x)F'(x) = \sum_{n \ge 1} \frac{|D_n|}{(n-1)!} \left(x^{n-1} - x^{n}\right) = \sum_{n \ge 1} \left( \frac{|D_{n+1}|}{n!} - \frac{|D_n|}{(n-1)!} \right) x^{n} = \sum_{n \ge 1} \frac{|D_{n-1}|}{(n-1)!}\, x^{n} = x\, F(x).$$
This differential equation with F(0) = 1 is solved by $F(x) = \frac{e^{-x}}{1-x}$.
2 Inclusion-Exclusion Principle and Möbius Inversion
In the last theorem of the previous chapter, and in several other places, we have
seen summations with alternating signs. This chapter will deal with such kind
of results. Consider a finite set X and some properties P1 , . . . , Pm such that
for each x ∈ X and each i ∈ [m] it is known whether x has property Pi or
not. We are interested in the number of elements from X satisfying none of the
properties Pi .
Example. Let X be the set of students in the room and let P1 and P2 be the
properties “being male” and “studies math”, respectively. Suppose there are 36
students, 26 of which are male and 32 of which study math. Among the male
students 23 study math. We are interested in the number of students which are
neither male nor study math. Let X1 , X2 denote the set of male students and
the set of math students, respectively. Then
|X \ (X1 ∪ X2 )| = |X| − |X1 | − |X2 | + |X1 ∩ X2 | = 36 − 26 − 32 + 23 = 1.
See Figure 17.
Figure 17: When there are 26 male and 32 math students among 36 students
in the class, and 23 of the male students study math, then there is exactly 1
female student who does not study math.
Proof. Consider any x ∈ X. If x ∈ X has none of the properties, then x ∈ N(∅)
and x ∉ N(S) for any S ≠ ∅. Hence x contributes 1 to the sum (1).
If x ∈ X has exactly k ≥ 1 of the properties, call this set of properties
$T \in \binom{[m]}{k}$. Then x ∈ N(S) iff S ⊆ T. The contribution of x to the sum (1) is
$$\sum_{S \subseteq T} (-1)^{|S|} = \sum_{i=0}^{k} \binom{k}{i}(-1)^i = 0.$$
In the last step we used that for any y ∈ ℝ we have $(1-y)^k = \sum_{i=0}^{k} \binom{k}{i}(-y)^i$,
which implies (for y = 1) that $0 = \sum_{i=0}^{k} \binom{k}{i}(-1)^i$.
The previous theorem can also be proved inductively.
Corollary 2.2. The number of elements of X that have at least one of the
properties P1 , . . . , Pm is given by
$$|X| - \sum_{S \subseteq [m]} (-1)^{|S|} N(S) = \sum_{\emptyset \ne S \subseteq [m]} (-1)^{|S|-1} N(S).$$
[Venn diagram of the three sets P1, P2, P3; each region is marked with the signs of its contributions to the inclusion-exclusion sum.]
is subtracted once from the total number of elements. The number of elements
in (P1 ∩ P2 ) \ P3 is subtracted twice in the beginning (since the elements are
in P1 as well as in P2 ) and then added back once. The number of elements
in P1 ∩ P2 ∩ P3 is subtracted three times in the beginning, then added back
three times and finally subtracted yet again. The same holds for the other
intersections. Altogether one can see that each element in P1 ∪P2 ∪P3 contributes
exactly −1 to the total sum.
2.1.1 Applications
In the following we apply the principle of inclusion and exclusion (PIE) to count
things. The arguments have a fairly rigid pattern:
(i) Define "bad" properties: We identify the things to count as the elements
of some universe X except for those having at least one of a set of properties
P1, . . . , Pm. The corresponding sets are denoted by X1, . . . , Xm (i.e.
Xi is the set of elements having property Pi). Given this reformulation of
our problem we want to count $X \setminus (X_1 \cup \dots \cup X_m)$.
(ii) Count N(S): For each S ⊆ [m], determine N (S), the number of elements
of X having all bad properties Pi for i ∈ S.
(iii) Apply PIE: Apply Theorem 2.1, i.e. the principle of inclusion and ex-
clusion. This yields a closed formula for |X \ (X1 ∪ . . . ∪ Xm )|, typically
with one unresolved summation sign.
Theorem 2.3 (Surjections). The number of surjections from [k] to [n] is:
$$\sum_{i=0}^{n} (-1)^i \binom{n}{i} (n-i)^k.$$
Proof. Define bad properties: Let X be the set of all maps from [k] to [n].
Define the "bad" property Pi for i ∈ [n] as "i is not in the image of f",
i.e.
f : [k] → [n] has property Pi :⟺ ∀j ∈ [k] : f(j) ≠ i.
With this definition, the surjective functions are exactly those functions
that have no bad property, i.e. we need to count $X \setminus (X_1 \cup \dots \cup X_n)$.
Count N(S): We claim $N(S) = (n - |S|)^k$ for any S ⊆ [n]. To see this, observe
that f has all properties with indices from S if and only if $f(i) \notin S$ for all
i ∈ [k]. In other words, f must be a function from [k] to [n] \ S and there
are $(n - |S|)^k$ of those.
Apply PIE: Using Theorem 2.1, the number of surjections is therefore:
$$|X \setminus (X_1 \cup \dots \cup X_n)| \overset{\text{PIE}}{=} \sum_{S \subseteq [n]} (-1)^{|S|} N(S) = \sum_{S \subseteq [n]} (-1)^{|S|} (n - |S|)^k = \sum_{i=0}^{n} (-1)^i \binom{n}{i} (n-i)^k.$$
In the last step we used that $(-1)^{|S|}(n-|S|)^k$ only depends on the size of
S and there are $\binom{n}{i}$ sets S ⊆ [n] of size i.
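The formula is easy to validate against a brute-force count over all maps. The following Python sketch (our addition) does this for small k and n.

```python
from itertools import product
from math import comb

def surjections_pie(k, n):
    """Number of surjections [k] -> [n] via the inclusion-exclusion formula."""
    return sum((-1) ** i * comb(n, i) * (n - i) ** k for i in range(n + 1))

def surjections_brute(k, n):
    """Brute force: check every map [k] -> [n] for surjectivity."""
    return sum(1 for f in product(range(n), repeat=k) if set(f) == set(range(n)))

for k in range(1, 6):
    for n in range(1, 5):
        assert surjections_pie(k, n) == surjections_brute(k, n)
print(surjections_pie(4, 3))  # 36 surjections from [4] onto [3]
```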
Corollary 2.4. (i) Consider the case n = k. A function from [n] to [n] is a
surjection if and only if it is a bijection. Since there are n! bijections on
[n] (all permutations) we obtained the identity:
$$n! = \sum_{i=0}^{n} (-1)^i \binom{n}{i} (n-i)^n.$$
(ii) A surjection from [k] to [n] can be seen as a partition of [k] into n non-empty
distinguishable parts (the map assigns a part to each i ∈ [k]). Since
the partitions of [k] into n non-empty indistinguishable parts are counted by
$s^{II}_n(k)$ and there are n! ways to assign labels to the n parts, we obtain that
the number of surjections is equal to $n!\, s^{II}_n(k)$. This proves the identity:
$$n!\, s^{II}_n(k) = \sum_{i=0}^{n} (-1)^i \binom{n}{i} (n-i)^k.$$
Proof. Define bad properties: Let X be the set of all permutations of [n].
We define the “bad property” Pi that means “π has a fixpoint i”:
π ∈ X has property Pi :⇐⇒ π(i) = i, (i ∈ [n]).
Derangements are exactly permutations that have none of these properties.
Count N(S): We claim N (S) = (n − |S|)! for any S ⊆ [n].
Indeed, π ∈ X has all properties with indices from S if and only if all
i ∈ S are fixed points of π. On the other elements, i.e. on [n] \ S, π may
be an arbitrary bijection so there are (n − |S|)! choices for π.
Apply PIE: Using Theorem 2.1, the number of derangements is therefore:
$$|X \setminus (X_1 \cup \dots \cup X_n)| \overset{\text{PIE}}{=} \sum_{S \subseteq [n]} (-1)^{|S|} N(S) = \sum_{S \subseteq [n]} (-1)^{|S|} (n - |S|)! = \sum_{i=0}^{n} (-1)^i \binom{n}{i} (n-i)!$$
In the last step we used that $(-1)^{|S|}(n-|S|)!$ only depends on the size of
S and there are $\binom{n}{i}$ sets S ⊆ [n] of size i.
Theorem 2.6 (Combinations of multisets). Consider a multiset M with types
1, . . . , m and repetition numbers r1 , . . . , rm . Then the number of k-combinations
of M is:
$$\sum_{S \subseteq [m]} (-1)^{|S|} \binom{m - 1 + k - \sum_{i \in S}(r_i + 1)}{m-1},$$
where we define $\binom{a}{b} := 0$ for a < b.
Proof. Define bad properties: Let X be the set of k-combinations where we
disregard the restrictions the repetition numbers impose; in other words,
X is the set of k-combinations of $\widetilde{M}$, where $\widetilde{M}$ is the multiset with the
same m types as in M but infinite supply of each type (i.e. "$r_i = \infty$" for
each i ∈ [m]).
Recall that by Theorem 1.9, $|X| = \binom{m-1+k}{m-1}$. Define the bad property Pi
as: the combination contains the type i more than $r_i$ times (i.e. at least $r_i + 1$ times).
In the special case where all the repetition numbers are equal, i.e. $r_1 = r_2 = \dots = r_m = r$,
this can be simplified to:
$$\sum_{i=0}^{m} (-1)^i \binom{m}{i} \binom{m - 1 + k - (r+1)\,i}{m-1}.$$
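Both the general formula and this special case can be checked by brute force over bounded compositions. The following Python sketch (our addition) compares the special-case formula with a direct enumeration.

```python
from itertools import product
from math import comb

def k_combinations_bounded(k, m, r):
    """Brute force: k-combinations of a multiset with m types, each available r times."""
    return sum(1 for c in product(range(r + 1), repeat=m) if sum(c) == k)

def k_combinations_pie(k, m, r):
    """The inclusion-exclusion formula in the special case r_1 = ... = r_m = r."""
    def binom(a, b):
        return comb(a, b) if a >= b >= 0 else 0  # convention: "a choose b" = 0 for a < b
    return sum((-1) ** i * comb(m, i) * binom(m - 1 + k - (r + 1) * i, m - 1)
               for i in range(m + 1))

for m, r, k in [(3, 2, 4), (4, 3, 6), (2, 5, 7)]:
    print(k_combinations_bounded(k, m, r), k_combinations_pie(k, m, r))
```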
Before we study a more advanced example, we prove a small result needed there:
Lemma. There are $\frac{2n}{2n-r}\binom{2n-r}{r}$ binary sequences (with letters 0 and 1) such
that:
• The sequence has length 2n and exactly r copies of 1 and 2n − r copies of
0.
• No two copies of 1 are adjacent. Here the first and the last position of the
sequence count as adjacent (the sequence is cyclic).
For example, if n = 3 and r = 2 there are 9 such sequences, namely:
Proof. Since no two copies of 1 may be adjacent, we know that after each 1
there must be a 0. So we can just imagine that one 0 is already “glued” to
every 1 and we are actually building a sequence with 2n − 2r copies of 0 and r
copies of 10.
However, one copy of 10 might “spill” across the border, i.e. the 1 could be
in position 2n and the 0 in position 1. We need to handle this case separately.
Case 1: The last position of the sequence is 1. Then the first position is 0 and the remaining positions contain r − 1 copies of 10 and 2n − 2r copies of 0, now without any further pitfalls. There are $\binom{2n-r-1}{r-1}$ ways to arrange them.
Case 2: The last position of the sequence is 0. Then our special character 10 does not spill across the border and the sequence is any ordered arrangement of r copies of 10 and 2n − 2r copies of 0. There are $\binom{2n-r}{r}$ of them.
In total we have:
$$\binom{2n-r-1}{r-1} + \binom{2n-r}{r} = \frac{r}{2n-r}\binom{2n-r}{r} + \binom{2n-r}{r} = \frac{2n}{2n-r}\binom{2n-r}{r}.$$
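The Lemma can also be confirmed by exhaustive enumeration for small n; here is a small Python sketch (our own code, not from the notes).

from itertools import product
from math import comb

def count_brute_force(n, r):
    good = 0
    for seq in product((0, 1), repeat=2 * n):
        # cyclic adjacency: position 2n-1 is next to position 0
        if sum(seq) == r and all(not (seq[i] and seq[(i + 1) % (2 * n)]) for i in range(2 * n)):
            good += 1
    return good

def count_formula(n, r):
    return 2 * n * comb(2 * n - r, r) // (2 * n - r)

assert all(count_brute_force(n, r) == count_formula(n, r) for n in range(1, 5) for r in range(n + 1))
print(count_formula(3, 2))  # 9, as in the example above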
Define bad properties: Let X be the set of all ways to seat the men without
paying attention to the rule that they must not sit next to their wives.
There are |X| = n! ways to do it.
We define the bad property Pi that captures that the husband Hi sits next
to his wife Wi , i.e. on seat 2i − 1 or on seat 2i + 1 (all seat numbers are
taken modulo 2n, in particular, 2n and 1 are adjacent).
The permitted arrangements are exactly those with none of the bad prop-
erties.
Count N (S): This time, calculating N (S) for arbitrary S ⊆ [n] is tricky. In-
stead we calculate
$$N^*(r) := \sum_{\substack{S\subseteq[n]\\ |S|=r}}N(S), \qquad \text{for } 0\le r\le n,$$
which will be just as helpful. The meaning of this number is a bit subtle:
N ∗ (r) is the number of pairs (π, S) where S ⊆ [n] is of size r and π is a
seating plan such that (wife Wi sits at 2i and) the couples with indices in
S are not separated. To better count these pairs, define a map f :
Under f , the pair (π, S) is mapped to a binary sequence with 2n characters
and exactly |S| copies of 1: For i ∈ S (remember that Hi and Wi will be
assigned adjacent places in π) a one should be put in the position of the
husband Hi – if he sits left of his wife – or in the position of Wi in π, if
she sits left of her husband.
It is time for an example. Consider the pair (π, S) with S = {2, 5} and
the seating
1 2 3 4 5 6 7 8 9 10
π = (H5 W1 H2 W2 H1 W3 H4 W4 H3 W5 ).
Note that the second and fifth couple are indeed not separated (W5 and
H5 are adjacent because the table is circular). Also note that the fourth
couple is also not separated, this is allowed. The mapping f discussed
above would map this pair to the sequence 0010000001 since H2 (in seat
3) and $W_5$ (in seat 10) sit left of their spouses. It is clear that f will never produce sequences with two adjacent ones: that would mean two adjacent people are both sitting left of their spouses, which is impossible (assuming $n\ge 2$). However, any sequence with r copies of 1 and no two adjacent copies of 1 is the image of (n − r)! pairs (π, S): From the r copies of 1, the set S (of size r) can be reconstructed, as can be the position
of the husbands with indices in S. The (n − r) other husbands can be
distributed arbitrarily onto the (n − r) remaining odd-numbered seats,
there are (n − r)! ways to do so. This proves, using the Lemma above to
count the number of binary sequences of length n with r copies of 1 and
no two adjacent copies of 1:
$$N^*(r) = (n-r)!\cdot\frac{2n}{2n-r}\binom{2n-r}{r}.$$
Apply PIE: Using Theorem 2.1, the number of valid ways to arrange the husbands between the wives is:
$$\left|X\setminus(X_1\cup\ldots\cup X_n)\right| \overset{\text{PIE}}{=} \sum_{S\subseteq[n]}(-1)^{|S|}N(S) = \sum_{r=0}^{n}\sum_{\substack{S\subseteq[n]\\ |S|=r}}(-1)^{|S|}N(S) = \sum_{r=0}^{n}(-1)^r N^*(r) = \sum_{r=0}^{n}(-1)^r(n-r)!\cdot\frac{2n}{2n-r}\binom{2n-r}{r}.$$
Multiplying this with $2\cdot n!$, the number of ways to place the wives, the total number of seatings is (for $n\ge 2$):
$$2\cdot n!\sum_{r=0}^{n}(-1)^r(n-r)!\cdot\frac{2n}{2n-r}\binom{2n-r}{r}.$$
To see the second “=”, note that $\sum_{S\subseteq A}f(S)$ counts an element x if and only if the set S of all properties that x does not have is a subset of A. This means x is counted if and only if it has all properties from $[m]\setminus A$.
We now apply Theorem 2.8 to obtain:
$$f([m]) = \sum_{S\subseteq[m]}(-1)^{m-|S|}g(S) = \sum_{S\subseteq[m]}(-1)^{m-|S|}N([m]\setminus S) = \sum_{S\subseteq[m]}(-1)^{|S|}N(S),$$
where in the last step we changed the order of summation, i.e. summed over $[m]\setminus S$ instead of $S$. This concludes the proof of (Thm. 2.8 ⇒ Thm. 2.1).
$$c_T = \sum_{T\subseteq S\subseteq A}(-1)^{|A|-|S|} = \sum_{i=0}^{k}(-1)^{k-i}\binom{k}{i} = 0,$$
where $k = |A\setminus T|$ and in the second step we observed that picking a set between T and A is equivalent to picking a subset of $A\setminus T$. The last step is an identity we already saw.
This proves the claim.
The greatest common divisor (gcd) of m and n, is the number k = gcd(m, n),
with M (k) = M (m)∩M (n). For example gcd(12, 90) = 6 since M (6) = {2, 3} =
{2, 2, 3} ∩ {2, 3, 3, 5} = M (12) ∩ M (90).
Note that M (1) = ∅ (1 is the result of the empty product). If gcd(m, n) = 1,
then m and n are called relatively prime (or coprime).
We define Euler’s φ-function for $n\ge 2$ as $\varphi(n) := \#\{k\in[n]\mid \gcd(k,n)=1\}$.
Then the numbers in [n] that are relatively prime to n are exactly those that have no bad property.
Count N(S): For any $S\subseteq[k]$ we have $N(S) = \frac{n}{\prod_{i\in S}p_i}$.
The last identity is best seen by multiplying out the right side: For each of the k factors we can either choose 1 or we can choose $-\frac{1}{p_i}$. The indices i where $-\frac{1}{p_i}$ was chosen are captured in the set S. For each $S\subseteq[k]$ we get exactly the term under the sum.
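The resulting product formula is convenient for computation. A short Python sketch (ours; the helper names are not from the notes) computes φ(n) from the prime divisors and checks it against the definition by counting coprime residues.

from math import gcd

def prime_divisors(n):
    ps, d = [], 2
    while d * d <= n:
        if n % d == 0:
            ps.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        ps.append(n)
    return ps

def phi(n):
    result = n
    for p in prime_divisors(n):          # phi(n) = n * prod (1 - 1/p), done with integers
        result = result // p * (p - 1)
    return result

assert all(phi(n) == sum(1 for k in range(1, n + 1) if gcd(k, n) == 1) for n in range(2, 300))
print(phi(12), phi(90))  # 4, 24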
We now show a few more number theoretic results leading up to Möbius inversion.
Theorem 2.10. $n = \sum_{d\mid n}\varphi(d)$.
Proof. We claim that for a divisor d of n the sets $\{x\in[n]\mid\gcd(x,n)=d\}$ and $\{y\in[\tfrac{n}{d}]\mid\gcd(y,\tfrac{n}{d})=1\}$ have the same cardinality. Indeed, it is easy to see that $x\mapsto y := \tfrac{x}{d}$ is a bijection.
This means $\#\{x\in[n]\mid\gcd(x,n)=d\} = \varphi(\tfrac{n}{d})$. Summing these identities for all $d\mid n$ yields:
$$n = \sum_{d\mid n}\varphi\!\left(\tfrac{n}{d}\right).$$
This is almost identical to the claim; the only thing left to do is to change the order of summation: substitute d with $d' := \tfrac{n}{d}$ and note that $d'$ divides n iff d divides n.
We now define the Möbius function for $d\ge 1$ as:
$$\mu(d) := \begin{cases}1 & \text{if } d \text{ is the product of an even number of distinct primes,}\\ -1 & \text{if } d \text{ is the product of an odd number of distinct primes,}\\ 0 & \text{otherwise.}\end{cases}$$
For example, µ(15) = 1, µ(7) = µ(30) = −1, µ(12) = 0, since 15 = 3 · 5 is the
product of two distinct primes, 7 = 7 and 30 = 2 · 3 · 5 are the product of an
odd number of primes and 12 = 2 · 2 · 3 is not the product of distinct primes:
We need 2 twice. The numbers n with µ(n) 6= 0 are also called square free since
they do not have a square as a factor (12 is not square-free since it has 4 as a
factor).
Note that µ(1) = 1 since 1 is the product of 0 primes and 0 is even.
Lemma 2.11.
$$\sum_{d\mid n}\mu(d) = \begin{cases}1 & \text{if } n = 1,\\ 0 & \text{if } n\neq 1.\end{cases}$$
In the first step remember that µ(d) = 0 if d is not square-free. This means if
d contains a prime factor more than once, it does not contribute to the sum.
For the second step, realize that a divisor d of p1 . . . pk is just given by choosing
a subset D of those k primes and multiplying them. Then µ(d) will be 1 if an
even number of primes where chosen and −1 otherwise. The last two steps are
identities we already encountered earlier.
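Both the definition of μ and Lemma 2.11 are easy to verify mechanically; here is a small Python sketch (ours, not from the notes).

def mobius(d):
    # mu(d): 0 if a square divides d, otherwise (-1)^(number of distinct prime factors)
    if d == 1:
        return 1
    sign, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0
            sign = -sign
        p += 1
    return -sign if d > 1 else sign

print(mobius(15), mobius(7), mobius(30), mobius(12))  # 1, -1, -1, 0
for n in range(1, 200):
    assert sum(mobius(d) for d in range(1, n + 1) if n % d == 0) == (1 if n == 1 else 0)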
Corollary 2.12.
$$\frac{\varphi(n)}{n} = \sum_{d\mid n}\frac{\mu(d)}{d}.$$
Proof. We use an identity that came up in the proof of Theorem 2.9 and arguments similar to those from the last Theorem:
$$\varphi(n) = n\prod_{i=1}^{k}\left(1-\tfrac{1}{p_i}\right) = n\sum_{S\subseteq[k]}\frac{(-1)^{|S|}}{\prod_{i\in S}p_i} = n\sum_{d\mid p_1\cdots p_k}\frac{\mu(d)}{d} = n\sum_{d\mid n}\frac{\mu(d)}{d}.$$
We now have all ingredients to prove the Möbius Inversion Formula:
Theorem 2.13 (Möbius Inversion). Let $f, g : \mathbb{N}\to\mathbb{R}$ be functions satisfying $g(n) = \sum_{d\mid n}f(d)$. Then:
$$f(n) = \sum_{d\mid n}\mu(d)\,g\!\left(\tfrac{n}{d}\right).$$
Proof. We start by changing the order of summation ($d\to\tfrac{n}{d}$) and using the definition of g:
$$\sum_{d\mid n}\mu(d)\,g\!\left(\tfrac{n}{d}\right) = \sum_{d\mid n}\mu\!\left(\tfrac{n}{d}\right)g(d) = \sum_{d\mid n}\mu\!\left(\tfrac{n}{d}\right)\sum_{d'\mid d}f(d') = \sum_{d'\mid n}c_{d'}\cdot f(d'),$$
where the $c_{d'}$ are numbers (to be determined!) that count how often $f(d')$ occurs as a summand. Note that $c_n = \mu(1) = 1$ since $f(n)$ occurs only for $d' = d = n$. For $d'\mid n$, $d'\neq n$ we have:
$$c_{d'} = \sum_{d'\mid d\mid n}\mu\!\left(\tfrac{n}{d}\right) = \sum_{m\mid\frac{n}{d'}}\mu(m) = 0,$$
where we substituted the summation index $d\mapsto m := \tfrac{n}{d}$ and used Lemma 2.11 in the last step. This proves the claim.
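Möbius inversion is also easy to test numerically; the following sketch (our own code, repeating the μ helper from the earlier sketch so the snippet runs on its own) inverts $g(n) = \sum_{d\mid n}f(d)$ for a randomly chosen f.

import random

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(d):
    if d == 1:
        return 1
    sign, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0
            sign = -sign
        p += 1
    return -sign if d > 1 else sign

random.seed(1)
f = {n: random.randint(-5, 5) for n in range(1, 61)}
g = {n: sum(f[d] for d in divisors(n)) for n in range(1, 61)}
assert all(sum(mobius(d) * g[n // d] for d in divisors(n)) == f[n] for n in range(1, 61))
print("Moebius inversion recovers f from g")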
Example 2.14 (Circular Sequences of 0’s and 1’s). Circular sequences are for
example:
[Figure: three circular sequences A, B and C, each drawn as 0’s and 1’s arranged around a circle.]
Circular sequences are considered equal if they can be transformed into one another by rotation. We would write
trivial, but also not hard to see. Clearly, the length of S′ must divide n. So we found:
$$N_n = \sum_{d\mid n}M(d). \tag{$\star$}$$
Now define $g(n) = 2^n$ and $f(d) = d\cdot M(d)$. These choices of f and g fulfill the requirement of the Möbius Inversion Formula, so we obtain:
$$f(n) \overset{2.13}{=} \sum_{d\mid n}\mu(d)\,g\!\left(\tfrac{n}{d}\right) \;\Leftrightarrow\; n\,M(n) = \sum_{d\mid n}\mu(d)\,2^{\frac{n}{d}},$$
so
$$N_n \overset{(\star)}{=} \sum_{d\mid n}M(d) = \sum_{d\mid n}\frac{1}{d}\sum_{l\mid d}\mu(l)\,2^{\frac{d}{l}} = \sum_{d\mid n}\frac{1}{d}\sum_{l\mid d}\mu\!\left(\tfrac{d}{l}\right)2^{l} = \sum_{l\mid n}\sum_{l\mid d\mid n}\tfrac{1}{d}\,\mu\!\left(\tfrac{d}{l}\right)2^{l} \overset{(d:=k\cdot l)}{=} \sum_{l\mid n}\sum_{k\mid\frac{n}{l}}\tfrac{1}{k\cdot l}\,\mu(k)\,2^{l} = \sum_{l\mid n}\frac{2^l}{l}\sum_{k\mid\frac{n}{l}}\frac{\mu(k)}{k} \overset{2.12}{=} \sum_{l\mid n}\frac{2^l}{l}\cdot\frac{\varphi(\tfrac{n}{l})}{\tfrac{n}{l}} = \sum_{l\mid n}\frac{2^l}{n}\,\varphi\!\left(\tfrac{n}{l}\right).$$
While not overwhelmingly pretty, at least our result is an explicit formula only
involving one sum and Euler’s φ-function. This is as good as it gets.
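The formula can be checked against a brute-force count of circular 0/1-sequences up to rotation; here is a small self-contained Python sketch (ours, not from the notes).

from itertools import product
from math import gcd

def phi(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def circular_count_brute_force(n):
    reps = set()
    for seq in product("01", repeat=n):
        s = "".join(seq)
        reps.add(min(s[i:] + s[:i] for i in range(n)))  # smallest rotation as representative
    return len(reps)

def circular_count_formula(n):
    return sum(2 ** l * phi(n // l) for l in divisors(n)) // n

assert all(circular_count_brute_force(n) == circular_count_formula(n) for n in range(1, 13))
print([circular_count_formula(n) for n in range(1, 9)])  # [2, 3, 4, 6, 8, 14, 20, 36]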
3 Generating Functions
In the following we consider sequences (an )n∈N = a0 , a1 , a2 , . . . of non-negative
numbers. Typically, ak is the number of discrete structures of a certain type
and “size” k.
The generating function for $(a_n)_{n\in\mathbb{N}}$ is given as $F(x) = \sum_{n=0}^{\infty}a_n x^n$. Despite
the name, a generating function should not be thought of as a thing you plug
values into: It is not meaningful to compute something like F (5): In fact, these
values are often not well-defined because the sum would be divergent. We care
about the thing as a whole: You can either think of generating functions as
functions that are well-defined within their radius of convergence (some area
close to zero) or you just think of the “x” in F (x) as an abstract thing (not a
placeholder for a number) which makes F (x) an element of the ring of formal
power series. In the following, we ignore all technicalities and boldly apply an-
alytic methods as though a generating function were just a simple well-behaved
function. And it just works. If you think this is all a bit arcane, try to look at
it this way:
At first, a generating function is just a silly way to write a sequence (instead
of a0 = 1, a1 = 42, a2 = 23, . . . you would write A(x) = 1 + 42x + 23x2 + . . .).
Some operations on the sequence have a natural correspondence: For example,
shifting the sequence (a0 = 0, a1 = 1, a2 = 42, a3 = 23 . . .) corresponds to
multiplying A(x) by x. Some complicated operations on sequences suddenly
become simple and natural in the world of generating functions where analytic
tools are readily available.
When dealing with generating functions $F(x) = \sum_{n=0}^{\infty}a_n x^n$ and $G(x) = \sum_{n=0}^{\infty}b_n x^n$, we freely use the following operations:
• Differentiate F term-wise:
$$F'(x) = \sum_{n=1}^{\infty}n\,a_n x^{n-1} = \sum_{n=0}^{\infty}(n+1)a_{n+1}x^n.$$
• Multiply F and G term-wise (Cauchy product):
$$F(x)\cdot G(x) = \sum_{n=0}^{\infty}\Big(\sum_{k=0}^{n}a_k b_{n-k}\Big)x^n.$$
We will later see that the coefficients of the product sometimes count meaningful things if $a_n$ and $b_n$ did (see Examples 3.2 and 3.3).
Example 3.1 (Maclaurin series).
(i) Consider $(a_n)_{n\in\mathbb{N}}$ with $a_n = 1$ for all $n\in\mathbb{N}$. The corresponding generating function $F(x) = 1 + x + x^2 + x^3 + \ldots$ is called the Maclaurin series.
We claim $F(x) = \frac{1}{1-x}$, which looks a lot like the identity for an infinite geometric series. To verify this, just observe that the product of F(x) and (1 − x) is one:
$$(1-x)F(x) = (1 + x + x^2 + x^3 + \ldots) - (x + x^2 + x^3 + \ldots) = 1.$$
(ii) Differentiating $F(x) = \frac{1}{1-x}$ with respect to x on both sides yields:
$$F'(x) = \sum_{n=0}^{\infty}(n+1)x^n = \frac{1}{(1-x)^2}.$$
So $B^{(1)}(x)$ is just the Maclaurin series from 3.1(i) shifted by one position (i.e. multiplied by x), meaning:
$$B^{(1)}(x) = 0 + x + x^2 + x^3 + \ldots = x\cdot\sum_{n=0}^{\infty}x^n = \frac{x}{1-x}.$$
The key observation we make now is that multiplying two generating functions
has a meaningful correspondence in our balls-in-boxes setting:
For two numbers of boxes s and t, we have the identity:
$$a_n^{(s+t)} = \sum_{l=0}^{n}a_l^{(s)}\,a_{n-l}^{(t)}$$
in the remaining t boxes. Looking at the right side of the equation, note that these numbers are exactly the coefficients of the product $B^{(s)}(x)\cdot B^{(t)}(x)$ (check this!). So we obtained $B^{(s+t)}(x) = B^{(s)}(x)\cdot B^{(t)}(x)$, and therefore
$$B^{(k)}(x) = \left(B^{(1)}(x)\right)^k = \left(\frac{x}{1-x}\right)^k.$$
However, we want to write $B^{(k)}(x)$ in the form $\sum_n a_n^{(k)}x^n$ to see the coefficients $a_n^{(k)}$. To do this, note that differentiating the term $(1-x)^{-1}$ exactly $k-1$ times yields $(k-1)!\,(1-x)^{-k}$. Using this we obtain:
$$B^{(k)}(x) = \frac{x^k}{(1-x)^k} = \frac{x^k}{(k-1)!}\cdot\frac{d^{k-1}}{dx^{k-1}}\,\frac{1}{1-x} = \frac{x^k}{(k-1)!}\,\frac{d^{k-1}}{dx^{k-1}}\sum_{n=0}^{\infty}x^n = \frac{x^k}{(k-1)!}\sum_{n=k-1}^{\infty}n(n-1)\cdots(n-k+2)\,x^{n-k+1}$$
$$= \sum_{n=k-1}^{\infty}\frac{n!}{(n-k+1)!\,(k-1)!}\,x^{n+1} = \sum_{n=k-1}^{\infty}\binom{n}{k-1}x^{n+1} = \sum_{n=0}^{\infty}\binom{n-1}{k-1}x^n,$$
where in the last step we use that $\binom{a}{b} = 0$ if $a < b$ and additionally make an index shift of 1.
So we found a new proof that the number of arrangements of n unlabeled balls in k labeled boxes with at least one ball per box is $a_n^{(k)} = \binom{n-1}{k-1}$.
• Solutions to $c = n$ where $c\in\{0,1,2\}$. Clearly, there is exactly one solution if $n\in\{0,1,2\}$ and no solution otherwise. The corresponding generating function is:
$$C(x) = 1 + x + x^2.$$
With the same argument as in the previous example, multiplying the generating functions yields the generating function of the sequence we are interested in, so we calculate:
$$F(x) = A(x)\cdot B(x)\cdot C(x) = \frac{1+x+x^2}{(1-x^2)(1-x)} = \frac{1+x+x^2}{(1+x)(1-x)^2}.$$
We would like to write this as a linear combination of generating function we
understand well, so we search for real numbers R, S, T with:
$$F(x) = \frac{1+x+x^2}{(1+x)(1-x)^2} = \frac{R}{1+x} + \frac{S}{1-x} + \frac{T}{(1-x)^2}.$$
Multiplying by the denominators yields:
So if F (x) is the generating function belonging to an , then we know:
$$F(x) = \sum_{n=0}^{\infty}a_n x^n = 1 + \sum_{n=1}^{\infty}\Big(\sum_{k=0}^{n-1}a_k a_{n-k-1}\Big)x^n = 1 + \sum_{n=0}^{\infty}\Big(\sum_{k=0}^{n}a_k a_{n-k}\Big)x^{n+1} = 1 + x\cdot\sum_{n=0}^{\infty}\Big(\sum_{k=0}^{n}a_k a_{n-k}\Big)x^n = 1 + x\cdot F(x)^2.$$
$$(1+x)^n = \sum_{k=0}^{n}\binom{n}{k}x^k = \sum_{k=0}^{\infty}\binom{n}{k}x^k, \qquad \forall n\in\mathbb{N},$$
where, as usual, $\binom{n}{k} = 0$ for $k > n$. This shows that $(1+x)^n$ is the generating function for the sequence $(a_k)_{k\in\mathbb{N}}$ with $a_k = \binom{n}{k}$.
We extend this result from natural numbers n ∈ N to any real number n ∈ R.
To this end we first extend the definition of binomial coefficients.
Definition 3.7 (Binomial Coefficients for Real Numbers). Recall that for in-
tegers n, k ∈ N we have:
$$\binom{n}{k} = \frac{|P(n,k)|}{k!} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{k!\,(n-k)!}.$$
It does not make sense to talk about permutations of sets of size $n\in\mathbb{R}$ and it is unclear what n! should be, but the formula in between is well-defined for general $n\in\mathbb{R}$. With this in mind, we define $p(n,k) := n\cdot(n-1)\cdot\ldots\cdot(n-k+1)$ and $\binom{n}{k} := \frac{p(n,k)}{k!}$. Note that the new definition matches the old one if n is an integer.
We can now talk about numbers such as “−7/2 choose 5”, by which we mean:
$$\binom{-7/2}{5} = \frac{\left(-\tfrac{7}{2}\right)\left(-\tfrac{9}{2}\right)\left(-\tfrac{11}{2}\right)\left(-\tfrac{13}{2}\right)\left(-\tfrac{15}{2}\right)}{5!} = -\frac{9009}{256}.$$
Given our extended definition of binomial coefficients, we can state the following
Theorem (but will omit the proof).
Theorem 3.8 (Newton’s Binomial Theorem). For all non-zero $n\in\mathbb{R}$ we have:
$$(1+x)^n = \sum_{k=0}^{\infty}\binom{n}{k}x^k.$$
Setting $n = 1/2$ yields an identity for $\sqrt{1+x}$ that we require to proceed in Example 3.5. But before we can use it, we need to better understand the coefficients of the form $\binom{1/2}{\,\cdot\,}$.
Example 3.5 (Continued). Using the proposition we are now able to find the
coefficients of the generating function from Example 3.5 above, i.e. the number
of well-formed parenthesis expressions.
$$F(x) = \frac{1-\sqrt{1-4x}}{2x} \overset{3.10}{=} \frac{1}{2x}\sum_{n=1}^{\infty}2\,(-1)^n\binom{2n-2}{n-1}\frac{1}{2^{2n}}\frac{1}{n}\,(-4x)^n = \frac{1}{x}\sum_{n=1}^{\infty}\binom{2n-2}{n-1}\frac{1}{n}\,x^n = \sum_{n=0}^{\infty}\binom{2n}{n}\frac{1}{n+1}\,x^n.$$
The numbers $C_n := \binom{2n}{n}\frac{1}{n+1}$ are called Catalan numbers. They do not only count well-formed parenthesis expressions but occur in other situations as well.
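The closed form can be compared with the recursion $C_{n+1} = \sum_{k=0}^{n}C_k C_{n-k}$ used above; a short Python sketch (ours, not from the notes):

from math import comb

def catalan(n):
    return comb(2 * n, n) // (n + 1)

C = [1]
for n in range(10):
    C.append(sum(C[k] * C[n - k] for k in range(n + 1)))   # C_{n+1} = sum_k C_k C_{n-k}

assert C == [catalan(n) for n in range(11)]
print(C[:8])  # [1, 1, 2, 5, 14, 42, 132, 429]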
3.2 Exponential Generating Functions
Until now we mapped sequences to functions like this:
$$(a_n)_{n\in\mathbb{N}} \;\mapsto\; \sum_{n=0}^{\infty}a_n x^n \in \mathbb{R}[[x]].$$
In the last case we get from any sequence $(a_n)_{n\in\mathbb{N}}$ a corresponding Dirichlet series $\sum_{n=1}^{\infty}\frac{a_n}{n^s}$. Such series are important in algebraic number theory (in that setting the variable is typically called s instead of x). As an example, we state the following result (without proof).
Theorem 3.11 (Euler Product). The Dirichlet series for $(\mu(n))_{n\in\mathbb{N}}$ satisfies
$$\sum_{n=1}^{\infty}\frac{\mu(n)}{n^s} = \frac{1}{\zeta(s)} = \prod_{p\text{ prime}}(1-p^{-s}),$$
where $\zeta(s)$ is the Riemann zeta function $\zeta(s) := \sum_{n=1}^{\infty}\frac{1}{n^s}$.
We will not examine Dirichlet series further, dealing with exponential gener-
ating functions instead. In contrast to ordinary generating functions (the kind
n
we considered before) the basis is not {xn }n∈N but { xn! }n∈N and instead of
counting arrangements of unlabeled objects (unlabeled balls in labeled boxes,
solutions to a1 + a2 + . . . + al = n with constraints, indistinguishable parenthe-
sis in an ordered string), exponential generating functions are useful to count
arrangements of labeled objects (permutations, derangements, partitions, . . . )
as we will see shortly.
Before we get started, note that the exponential generating functions A(x) and B(x) of $(a_n)_{n\in\mathbb{N}}$ and $(b_n)_{n\in\mathbb{N}}$ with $a_n = 1$ and $b_n = n!$ ($n\in\mathbb{N}$) are
$$A(x) = \sum_{n=0}^{\infty}\frac{x^n}{n!} = e^x, \qquad B(x) = \sum_{n=0}^{\infty}n!\,\frac{x^n}{n!} = \sum_{n=0}^{\infty}x^n = \frac{1}{1-x}.$$
Assume arrangements of type C with n objects are obtained by a unique
split of the n objects into two sets and then forming an arrangement of type A
with the first set and an arrangement of type B with the second.
Then $c_n$, the number of arrangements of type C and size n, is given as
$$c_n = \sum_{k=0}^{n}\binom{n}{k}a_k\cdot b_{n-k}.$$
Crucially, the exponential generating functions A(x), B(x), C(x) of the three
sequences reflect this relationship as C(x) = A(x) · B(x), which is easy to verify:
$$A(x)\cdot B(x) = \left(\sum_{n=0}^{\infty}a_n\frac{x^n}{n!}\right)\left(\sum_{n=0}^{\infty}b_n\frac{x^n}{n!}\right) = \sum_{n=0}^{\infty}\left(\sum_{k=0}^{n}\frac{a_k}{k!}\cdot\frac{b_{n-k}}{(n-k)!}\right)x^n = \sum_{n=0}^{\infty}\left(\sum_{k=0}^{n}\binom{n}{k}a_k\,b_{n-k}\right)\frac{x^n}{n!} = C(x).$$
• The Stirling numbers of the second kind $s^{II}_k(n)$ fulfill the recursion
$$s^{II}_k(n) = k\,s^{II}_k(n-1) + s^{II}_{k-1}(n-1),$$
and their exponential generating function is
$$F_k(x) = \sum_{n=0}^{\infty}s^{II}_k(n)\,\frac{x^n}{n!} = \frac{1}{k!}(e^x-1)^k.$$
and the claim follows from the identity $\sum_{n=1}^{\infty}\frac{x^n}{n!} = e^x - 1$.
If $k\ge 2$, we use the recursion of Stirling numbers from above:
$$s^{II}_k(n+1) = k\,s^{II}_k(n) + s^{II}_{k-1}(n).$$
Taking the generating functions for each term (remember that derivatives correspond to index shifts!) and then applying the induction hypothesis yields:
$$F_k'(x) = k\cdot F_k(x) + F_{k-1}(x) \overset{\text{IH}}{=} k\,F_k(x) + \frac{1}{(k-1)!}(e^x-1)^{k-1}.$$
(ii) $\sum_{k=0}^{n}\bar{s}^{\,I}_k(n)\,x^k = p(x,n)$.
For (ii), plug −x into the equation above:
$$\sum_{k=0}^{n}(-1)^k s^{I}_k(n)\,x^k = p(-x+n-1,\,n) = \prod_{k=1}^{n}(-x+n-k) = (-1)^n\prod_{k=1}^{n}(x-n+k) = (-1)^n\,p(x,n).$$
Multiplying by $(-1)^n$ gives the desired result:
$$\sum_{k=0}^{n}\underbrace{(-1)^{n-k}s^{I}_k(n)}_{\bar{s}^{\,I}_k(n)}\,x^k = p(x,n).$$
On the other hand, we can also expand (1 + x)z using Newton’s Binomial The-
orem and the last Lemma:
$$(1+x)^z \overset{3.8}{=} \sum_{n=0}^{\infty}\binom{z}{n}x^n = \sum_{n=0}^{\infty}\frac{p(z,n)}{n!}\,x^n \overset{3.15}{=} \sum_{n=0}^{\infty}\frac{x^n}{n!}\sum_{k=0}^{n}\bar{s}^{\,I}_k(n)\,z^k = \sum_{k=0}^{\infty}\left(\sum_{n=0}^{\infty}\bar{s}^{\,I}_k(n)\,\frac{x^n}{n!}\right)z^k,$$
where exchanging the order of summation uses that $\bar{s}^{\,I}_k(n) = 0$ for $k > n$.
We have written (1 + x)z in two ways and the coefficients of z k in both repre-
sentations must match. This proves the claim.
Example 3.17 (Fibonacci Numbers). Rabbits were first brought to Australia
by settlers in 1788. Assume for simplicity, the first pair of young rabbits (male
and female) arrives in January (at time n = 1). A month later (n = 2) this
pair of rabbits reaches adulthood. Another month later (n = 3) they produced
a new young pair of rabbits as offspring. In general, assume that in the course
of a month every pair of young rabbits grows into adulthood and every pair of
adult rabbits produces one pair of young rabbits as offspring. If Fn denotes the
number of rabbits at time n, then we have Fn = Fn−1 + Fn−2 , since exactly
the rabbits that already existed at time n − 2 will be adults at time n − 1 and
produce offspring between time n − 1 and n. This gives sequence:
F0 = 0, F1 = 1, F2 = 1, F3 = 2, F4 = 3, F5 = 5, F6 = 8, F7 = 13, F8 = 21, . . .
The sequence (Fn )n∈N is called the Fibonacci sequence and its elements are
the famous Fibonacci numbers. Rabbits in Australia are now a serious prob-
lem, causing substantial damage to crops and have been combated with ferrets,
fences, poison, firearms and the myxoma virus.
Figure 19: Spiraling patterns can be seen in many plants. The pine cone on the
left has 8 spiral arms spiraling out counterclockwise and 13 spiral arms spiraling
out clockwise. In the sunflower, different spiral patterns stand out depending
on where you look. Close to the center, a counterclockwise spiral with 21 arms
and a clockwise spiral with 34 arms can be seen. Closer to the border another
counterclockwise spiral with 55 arms emerges. Fibonacci numbers everywhere!
We’re not making this up, zoom in and count for yourselves (or better yet, go
outside and look at nature directly), the patterns are really there!
Fibonacci numbers also frequently occur in plants (see Figure 19). In case
you do not know Vi Hart (seriously?!) you should definitely check out her
Youtube Channel and watch her videos on this topic.
Figure 20: A tiling of the 9 × 2 board with dominoes.
Figure 21: The last column can either be covered by a vertical domino or by
horizontal dominoes.
The Fibonacci sequence fulfills a homogeneous linear recurrence relation with
constant coefficients
Fn = Fn−1 + Fn−2
where k = 2, c1 = 1, c2 = 1, g ≡ 0.
The same is true for tn from Example 3.18
tn = 3tn−1 − tn−2
where this time k = 2, c1 = 3, c2 = −1, g ≡ 0.
Proof. Assume first that p(x) is divisible by (x−q)m . Then p(x) = (x−q)m t(x),
where t(x) is a polynomial. We use induction on k to show that (x − q)m−k
divides pk , k = 0, . . . , m − 1. When k = 0, (x − q)(m−0) divides p0 (x) = p(x).
Assume that (x − q)m−k+1 divides pk−1 (x). Let us show that (x − q)m−k divides
pk (x). We have then that pk−1 = (x−q)m−k+1 s(x) for some polynomial s. Then
pk (x) = xp0k−1 (x) = x (m − k + 1)(x − q)m−k s(x) + (x − q)m−k+1 s0 (x) , so pk
is divisible by (x − q)m−k .
Now, assume that each of p0 , . . . , pm−1 is divisible by x − q. We shall show
that pm−k (x) is divisible by (x − q)k , k = 0, . . . , m − 1, so in particular that
p0 (x) is divisible by (x − q)m . We do this by induction on k. When k = 1,
the statement holds by assumption. Now, assume that pm−k (x) is divisible
by (x − q)k , let us prove that pm−k−1 (x) is divisible by (x − q)k+1 . Assume
not, i.e., that pm−k−1 (x) = (x − q)` t(x), t(q) 6= 0, ` ≤ k. Then pm−k (x) =
x(x − q)`−1 (`t(x) + t0 (x)(x − q)), so the highest power of (x − q) dividing pm−k
is ` − 1 < k. We see that pm−k (x) is not divisible by (x − q)k .
Lemma 3.22 (Fundamental Solution). Let f be a linear homogeneous recurrence with constant coefficients. If q is a root of the characteristic polynomial of f, then $(q^n)_{n\in\mathbb{N}}$ is a solution of f for some initial values. Moreover, if q is a root of the characteristic polynomial of multiplicity s, $s > 1$, then $(n^i q^n)_{n\in\mathbb{N}}$, $0\le i\le s-1$, is a solution of f.
Proof. Let an = f (an−1 , . . . , an−k ) = c1 an−1 + · · · + ck an−k and bn = q n , where
q is a root of the characteristic polynomial, so q k − c1 q k−1 − · · · − ck q 0 = 0.
Multiply the last equality by q n−k to obtain q n − c1 q n−1 − · · · − ck q n−k = 0. So
(q n )n∈N is a solution of f .
Now, assume that q has multiplicity s, i.e., the characteristic polynomial
p1 (x) has a form p1 (x) = (x − q)s p(x), where p(x) is a polynomial. Observe
that q is a root of each of the following polynomials: p2 (x) = p1 (x)xn−k , p02 (x),
p3 (x) = xp02 (x), p03 (x), p4 (x) = xp03 (x), etc. till ps+2 (x). Indeed, the (x − q)
term remains after performing the product rule. So, in particular, for i ≤ s
We have
We need to verify that for bn = ni q n , (bn )n∈N gives a solution of the recurrence
an − c1 an−1 − · · · − ck an−k = 0. Let us plug bn for an in the left hand side of
the equation:
$$b_n - c_1 b_{n-1} - \cdots - c_k b_{n-k} = n^i q^n - c_1(n-1)^i q^{n-1} - \cdots - c_k(n-k)^i q^{n-k} = p_{i+2}(q) = 0.$$
Here the last equality holds by (2).
Theorem 3.23 (Existence and Uniqueness). Let f be a linear homogeneous
recursion with constant coefficients and p(x) be its characteristic polynomial
with roots q1 , . . . , qr having multiplicities s1 , . . . , sr , respectively. Then for any
initial values, there is a solution (an )n∈N of f written in the form
$$a_n = (\star q_1^n + \star n q_1^n + \cdots + \star n^{s_1-1}q_1^n) + \cdots + (\star q_r^n + \star n q_r^n + \cdots + \star n^{s_r-1}q_r^n), \tag{3}$$
where the $\star$’s are some constants. In particular, if all roots of the characteristic polynomial are distinct, $a_n = \star q_1^n + \star q_2^n + \cdots + \star q_r^n$.
Proof. From Lemma 3.22, all the listed terms are solutions of the recursion
with some initial values. From Lemma 3.20, any of their linear combinations is
a solution of the recursion with some initial values. It remains to verify that
there are coefficients (?’s) of the linear combination so that it is a solution of the
recursion with the given initial values a0 , a1 , . . . , ak−1 . Denote the consecutive
coefficient in (3) in front of $n^j q_i^n$ by $\gamma_{i,j}$, $i = 1,\ldots,r$, $j = 0,\ldots,s_i-1$. That is, the $\gamma_{i,j}$’s are the unknowns solving equation (3) for all $n\in\mathbb{N}$, which can hence be written as
$$\sum_{i=1}^{r}\sum_{j=0}^{s_i-1}\gamma_{i,j}\,n^j q_i^n = a_n, \qquad n = 0,1,\ldots,k-1. \tag{4}$$
This is a square k × k matrix. We shall show that it is non-singular by proving
that its rows r1 , r2 , . . . , rk are linearly independent. Assume that there are
coefficients α1 , . . . , αk , not all equal to zero, so that α1 r1 +α2 r2 +· · ·+αk r1 = 0.
The components of this vector correspond to respective columns in the matrix
in the following form:
$$\begin{aligned}
\alpha_1 + \alpha_2 q_1 + \alpha_3 q_1^2 + \cdots + \alpha_k q_1^{k-1} &= 0\\
\alpha_1\cdot 0 + \alpha_2 q_1 + \alpha_3\cdot 2\,q_1^2 + \cdots + \alpha_k(k-1)\,q_1^{k-1} &= 0\\
\alpha_1\cdot 0 + \alpha_2 q_1 + \alpha_3\cdot 2^2 q_1^2 + \cdots + \alpha_k(k-1)^2 q_1^{k-1} &= 0\\
&\;\;\vdots\\
\alpha_1 + \alpha_2 q_r + \alpha_3 q_r^2 + \cdots + \alpha_k q_r^{k-1} &= 0\\
\alpha_1\cdot 0 + \alpha_2 q_r + \alpha_3\cdot 2\,q_r^2 + \cdots + \alpha_k(k-1)\,q_r^{k-1} &= 0\\
\alpha_1\cdot 0 + \alpha_2 q_r + \alpha_3\cdot 2^2 q_r^2 + \cdots + \alpha_k(k-1)^2 q_r^{k-1} &= 0\\
&\;\;\vdots
\end{aligned}$$
Let $h(x) = \alpha_1 + \alpha_2 x + \alpha_3 x^2 + \ldots + \alpha_k x^{k-1}$. Then we see that $q_1$ is a root of $h(x)$, as well as a root of $h_2(x) = x\,h_1'(x)$, $h_3(x) = x\,h_2'(x)$, ..., $h_{s_1}$. By Lemma 3.21, $h(x)$ is divisible by $(x-q_1)^{s_1}$. Similarly, $h(x)$ is divisible by $(x-q_2)^{s_2},\ldots,(x-q_r)^{s_r}$. Thus $h(x)$ is divisible by $\prod_{i=1}^{r}(x-q_i)^{s_i}$. Thus the degree of $h(x)$ is at least $k = \sum_{i=1}^{r}s_i$, a contradiction to the definition of $h(x)$. Thus there are no such coefficients $\alpha_i$ and thus our matrix is non-singular. Therefore the system (4) has a solution for any given right-hand side $(a_0, a_1,\ldots,a_{k-1})$.
Theorem 3.23 gives us a tool to solve any linear homogeneous recursion with
constant coefficients. The main steps are the following.
1. Derive the characteristic polynomial p(x).
2. Compute the roots q1 , . . . , qr of p(x) with respective multiplicities s1 , . . . , sr .
3. Solve the system of linear equations (4) (for example by Gaussian Elimi-
nation).
4. Write down the explicit solution $a_n = \sum_{i=1}^{r}\sum_{j=0}^{s_i-1}\gamma_{i,j}\,n^j q_i^n$.
Let us illustrate this approach with a few examples.
Example 3.24. We solve the recursion $a_n = -5a_{n-1} + 6a_{n-2}$ with $a_0 = 3$, $a_1 = 10$. The characteristic polynomial $p(x) = x^2 + 5x - 6$ has the roots $q_1 = 1$ and $q_2 = -6$, both of multiplicity one, so the general solution is $a_n = \gamma_{1,0}\cdot 1^n + \gamma_{2,0}\cdot(-6)^n$. The initial values give
$$a_0 = \gamma_{1,0} + \gamma_{2,0} = 3, \qquad a_1 = \gamma_{1,0} - 6\gamma_{2,0} = 10.$$
Thus $\gamma_{2,0} = -1$, $\gamma_{1,0} = 4$, and the recursion is solved by $(a_n)_{n\in\mathbb{N}}$ with
$$a_n = 4 - (-6)^n.$$
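For distinct roots, the four steps can be automated; the following Python sketch (our own code, using numpy and the recursion reconstructed in Example 3.24 above as a test case) computes the roots of the characteristic polynomial and solves the linear system (4) for the γ’s numerically.

import numpy as np

def solve_recurrence(coeffs, initial):
    # a_n = coeffs[0]*a_{n-1} + ... + coeffs[k-1]*a_{n-k}; assumes the roots are distinct
    k = len(coeffs)
    roots = np.roots([1.0] + [-c for c in coeffs])                  # characteristic polynomial
    A = np.array([[q ** n for q in roots] for n in range(k)], dtype=complex)
    gammas = np.linalg.solve(A, np.array(initial, dtype=complex))   # system (4)
    return lambda n: sum(g * q ** n for g, q in zip(gammas, roots)).real

a = solve_recurrence([-5, 6], [3, 10])        # Example 3.24
assert all(abs(a(n) - (4 - (-6) ** n)) < 1e-6 for n in range(8))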
Example 3.25. We solve the recursion $a_n = -4a_{n-1} - 4a_{n-2}$ with $a_0 = 3$, $a_1 = 10$. The characteristic polynomial $p(x) = x^2 + 4x + 4 = (x+2)^2$ has the single root $q_1 = -2$ of multiplicity 2, so the general solution is $a_n = \gamma_{1,0}(-2)^n + \gamma_{1,1}\,n(-2)^n$. The initial values give
$$a_0 = \gamma_{1,0} = 3, \qquad a_1 = \gamma_{1,0}(-2) + \gamma_{1,1}(-2) = 10.$$
Thus $\gamma_{1,0} = 3$, $\gamma_{1,1} = -8$, and the recursion is solved by $(a_n)_{n\in\mathbb{N}}$ with
$$a_n = 3(-2)^n - 8n(-2)^n.$$
an = −4an−2 , a0 = 6, a1 = 20.
a0 = γ1,0 + γ2,0 = 6,
a1 = γ1,0 (2i) + γ2,0 (−2i) = 20.
Recall from Example 3.17 that the Fibonacci sequence $(F_n)_{n\in\mathbb{N}}$ solves the recursion
$$F_n = F_{n-1} + F_{n-2}, \qquad F_0 = 0,\ F_1 = 1.$$
We shall now derive an explicit formula for $F_n$.
Theorem 3.27 (Binet’s Formula). For the Fibonacci numbers we have:
$$F_n = \frac{1}{\sqrt{5}}\left(\left(\frac{1+\sqrt{5}}{2}\right)^n - \left(\frac{1-\sqrt{5}}{2}\right)^n\right)$$
where $\Phi := \frac{1+\sqrt{5}}{2}$ is called the Golden Ratio.
Figure 22: In a drawing like this one, obtained by squares “spiraling” out of the
center, the ratio of width and height will approach the golden ratio. Of course,
you already know all about spirals, since you watched all Vi Hart Videos, right?
$$\left|\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n\right| = \frac{|1-\Phi|^n}{\sqrt{5}} \le \frac{|1-\Phi|}{\sqrt{5}} = \frac{1}{\Phi\sqrt{5}} < \frac{2}{\sqrt{5}\sqrt{5}} = \frac{2}{5} < \frac{1}{2}.$$
Example 3.29 (Ternary n-string without 20). In Example 3.18 we found start-
ing values and the recurrence but we have yet to determine the values. Recall:
t1 = 3, t2 = 8, tn+2 = 3tn+1 − tn .
So $p(x) = x^2 - 3x + 1$. The characteristic polynomial has the roots $q_{1,2} = \frac{3\pm\sqrt{5}}{2}$. So by Theorem 3.23 the general solution is
$$t(n) = \gamma_{1,0}\left(\frac{3+\sqrt{5}}{2}\right)^n + \gamma_{2,0}\left(\frac{3-\sqrt{5}}{2}\right)^n.$$
Again we use the initial values to determine γ1,0 and γ2,0 . For simplicity,
use t0 = 1 instead of t2 = 8:
An advancement operator polynomial is a polynomial in A, for example 3A2 −
A + 6. This is also an operator which maps f to the function (3A2 − A + 6)f
with:
((3A2 − A + 6)f )(n) = 3f (n + 2) − f (n + 1) + 6f (n).
This notation allows us to write linear recurrence equations with constant coef-
ficients as:
p(A)f = g
where p is some polynomial. For the Fibonacci numbers we would write (A2 −
A − 1)F = 0.
In the following we try to solve equations of this kind. We start with the eas-
iest case of a homogeneous equation and an advancement operator polynomial
of degree one.
Lemma 3.30. For a real number r 6= 0 the solution to (A − r)f = 0 with initial
value f (0) = c is given by f (n) = crn .
Proof.
(A − r)f (n) = 0 ⇔ f (n + 1) − rf (n) = 0
⇔ f (n + 1) = rf (n)
⇔ f (n) = rn · f (0) = rn · c
Raising the difficulty a bit, now consider the advancement operator polyno-
mial p(A) = A2 + A − 6 = (A + 3) · (A − 2) and the corresponding homogeneous
recurrence p(A)f = (A + 3)(A − 2)f = 0. Note that solutions of (A − 2)f = 0
or (A + 3)f are also solutions of (A + 3)(A − 2)f = 0. 4
Actually, all solutions are sums of such solutions, which means that, using
the last Lemma, we know that f is of the form
f (n) = c1 (−3)n + c2 2n .
Here, c1 and c2 are constants that can be derived from two initial values for f .
It is no coincidence that we found a two dimensional space of functions:
Lemma 3.31. If p(A) has degree k and its constant term is non-zero, then the
set of all solutions f to p(A)f = 0 is a k-dimensional subspace of functions from
Z → C, parametrized by c1 , . . . , ck which can be determined by k initial values
for f .
We do not give a formal proof. However, you should not be surprised by
the statement. It is clear that set of all solutions form a subspace: If f, g are
solutions and α, β ∈ C\{0} then αf +βg is also a solution since p(A)(αf +βg) =
αp(A)f + βp(A)g = α · 0 + β · 0 = 0. It is also intuitive that there are k degrees
of freedom: For every choice for values of f on {1, 2, . . . , k} the recurrence gives
a unique way to extend f (upwards and downwards) to other values.
Generalizing our observation for the advancement operator polynomial (A +
3)(A − 2), we state (also without proof):
4 There is some low-level notational magic involved here: We need ((A + 3) · (A − 2))f =
Proposition 3.32. If $p(A) = (A-r_1)\cdot(A-r_2)\cdot\ldots\cdot(A-r_k)$ for distinct $r_1,\ldots,r_k$, then all solutions of $p(A)f = 0$ are of the form
$$f(n) = c_1 r_1^n + c_2 r_2^n + \ldots + c_k r_k^n.$$
Every polynomial of degree k has k complex roots, but those roots are not
necessarily distinct. The following Theorem handles the case of roots with
multiplicity at least two and is therefore the last piece in the puzzle.
From now on, instead of “all solutions are of the form. . . ” we say the “general
solution is . . . ”.
Theorem 3.33. If $r\neq 0$ and $p(A) = (A-r)^k$, then the general solution of the homogeneous system $p(A)f = 0$ is
$$f(n) = c_1 r^n + c_2\,n\,r^n + c_3\,n^2 r^n + \ldots + c_k\,n^{k-1}r^n.$$
If p(A) = q1 (A) · q2 (A) and q1 (A), q2 (A) have no root in common, then the
general solution of p(A)f = 0 is the sum of the general solutions of q1 (A)f = 0
and q2 (A)f = 0.
Example. Let $p(A) = (A-1)^5\cdot(A+1)^2\cdot(A-3)$. Then the general solution to $p(A)f = 0$ is
$$f(n) = (c_1 + c_2 n + c_3 n^2 + c_4 n^3 + c_5 n^4)\cdot 1^n + (c_6 + c_7 n)\cdot(-1)^n + c_8\cdot 3^n.$$
Note that the condition that r (the constant term of p(A)) may not be zero
is no real restriction: An advancement operator polynomial of the form p(A) · A
describes the same recurrence as p(A), just at a different index.
Theorem 3.34 (Binet’s Formula). For the Fibonacci numbers we have:
$$F_n = \frac{1}{\sqrt{5}}\left(\left(\frac{1+\sqrt{5}}{2}\right)^n - \left(\frac{1-\sqrt{5}}{2}\right)^n\right)$$
where $\Phi := \frac{1+\sqrt{5}}{2}$ is called the Golden Ratio.
The roots of p(A) are $\frac{1\pm\sqrt{5}}{2}$, i.e.
$$p(A) = \left(A - \frac{1+\sqrt{5}}{2}\right)\left(A - \frac{1-\sqrt{5}}{2}\right).$$
Figure 23: In a drawing like this one, obtained by squares “spiraling” out of the
center, the ratio of width and height will approach the golden ratio. Of course,
you already know all about spirals, since you watched all Vi Hart Videos, right?
$$\left|\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n\right| = \frac{|1-\Phi|^n}{\sqrt{5}} \le \frac{|1-\Phi|}{\sqrt{5}} = \frac{1}{\Phi\sqrt{5}} < \frac{2}{\sqrt{5}\sqrt{5}} = \frac{2}{5} < \frac{1}{2}.$$
Example 3.36 (Ternary n-string without 20). In Example 3.18 we found start-
ing values and the recurrence but we have yet to determine the values. Recall:
t1 = 3, t2 = 8, tn+2 = 3tn+1 − tn .
Again we use the initial values to determine $c_1$ and $c_2$. For simplicity, use $t_0 = 1$ instead of $t_2 = 8$:
$$1 = t(0) = c_1 + c_2, \qquad 3 = t(1) = c_1\cdot\frac{3+\sqrt{5}}{2} + c_2\cdot\frac{3-\sqrt{5}}{2}.$$
Solving yields $c_1 = \frac{5+3\sqrt{5}}{10}$, $c_2 = \frac{5-3\sqrt{5}}{10}$ and therefore:
$$t_n = \frac{5+3\sqrt{5}}{10}\left(\frac{3+\sqrt{5}}{2}\right)^n + \frac{5-3\sqrt{5}}{10}\left(\frac{3-\sqrt{5}}{2}\right)^n,$$
as desired.
This means that to find all solutions of an = f (an−1 , . . . , an−k ) + g(n) it suffices
to find a single such solution (a0n )n∈N , the particular solution, and all solutions
(a∗n )n∈N of an = f (an−1 , . . . , an−k ), the general solution.
Then the general solution of the non-homogeneous recurrence is given by
an = a∗n + a0n for n ∈ N.
Since there is no framework that always works to find particular solutions,
this is actually the difficult part.
Example 3.38. Recall from Example 1.6 (and the corresponding problem on
the exercise sheet) that when cutting a two dimensional cake into the maximum
number of pieces, the n-th cut yields n additional pieces, i.e.
$$s_n = s_{n-1} + n, \qquad s_0 = 1.$$
1. Find the general solution of the homogeneous recurrence:
$$s_n = s_{n-1} \;\Rightarrow\; p(x) = x-1 \;\Rightarrow\; q_1 = 1,\ s_1 = 1 \;\Rightarrow\; a^*_n = \gamma_{1,0}\,q_1^n = \gamma_{1,0}.$$
2. For a particular solution, guess a polynomial $a^0_n = d_1 n^2 + d_2 n + d_3$; plugging this into the recurrence yields $a^0_n = \binom{n+1}{2}$.
3. Determine $\gamma_{1,0}$ from the initial value:
$$1 = s_0 = \gamma_{1,0} + 0 \;\Rightarrow\; \gamma_{1,0} = 1 \;\Rightarrow\; s_n = \binom{n+1}{2} + 1 = \binom{n}{2} + \binom{n}{1} + \binom{n}{0}.$$
As another example, consider the recurrence
$$a_n = 4a_{n-1} - 4a_{n-2} + 3^n + 2n.$$
Guess something similar to the right hand side. Our attempt is⁶
$$a^0_n = d_1\cdot 3^n + d_2\,n + d_3.$$
⁶ We could have guessed another summand of $d_4\cdot n^2$, but as it turns out, it is not needed.
Which means we need to find $d_1, d_2, d_3$ such that:
$$d_1 3^n + d_2 n + d_3 \overset{!}{=} 4a^0_{n-1} - 4a^0_{n-2} + 3^n + 2n = 4d_1 3^{n-1} + 4d_2(n-1) + 4d_3 - 4d_1 3^{n-2} - 4d_2(n-2) - 4d_3 + 3^n + 2n = 8d_1 3^{n-2} + 4d_2 + 3^n + 2n$$
$$= d_1 3^n + d_2 n + d_3 + \big((9-d_1)3^{n-2} + (2-d_2)n - d_3 + 4d_2\big),$$
and we get $d_1 = 9$, $d_2 = 2$, $d_3 = 8$, i.e.
$$a^0_n = 3^{n+2} + 2n + 8.$$
Luckily, we already know the exponential generating functions for $e^{-x}$ and for $\frac{1}{1-x}$, so we can write:
$$D(x) = e^{-x}\cdot\frac{1}{1-x} = \left(\sum_{n=0}^{\infty}(-1)^n\frac{x^n}{n!}\right)\left(\sum_{n=0}^{\infty}n!\,\frac{x^n}{n!}\right) \overset{3.12}{=} \sum_{n=0}^{\infty}\left(\sum_{k=0}^{n}\binom{n}{k}k!\,(-1)^{n-k}\right)\frac{x^n}{n!}.$$
So we found yet another proof that $d_n = \sum_{k=0}^{n}\binom{n}{k}k!\,(-1)^{n-k}$.
k=0
Let us give two more examples showing how we can solve recurrences using
generating functions.
Example 3.41. Consider the recurrence $a_n = -a_{n-1} + 6a_{n-2}$ (for $n\ge 2$) with $a_0 = 1$, $a_1 = 3$.
Thanks to our approach and using the recurrence for $n\ge 2$, all but a finite number of terms canceled out. In the last step we used the initial values.
Using partial fraction decomposition (not handled here) we can simplify the identity and obtain:
$$F(x) = \frac{1+4x}{1+x-6x^2} = \frac{1+4x}{(1-2x)(1+3x)} = \frac{6}{5}\cdot\frac{1}{1-2x} - \frac{1}{5}\cdot\frac{1}{1+3x} = \frac{6}{5}\sum_{n=0}^{\infty}(2x)^n - \frac{1}{5}\sum_{n=0}^{\infty}(-3x)^n,$$
where in the last step we applied $\frac{1}{1-y} = \sum_{n=0}^{\infty}y^n$ for $y = 2x$ and $y = -3x$. We can now see the coefficients:
$$a_n = \tfrac{6}{5}\cdot 2^n - \tfrac{1}{5}\cdot(-3)^n.$$
Solving for F(x) yields:
$$F(x)(1-x-2x^2) = \frac{1}{1-2x} + 1 - 3x \;\Rightarrow\; F(x) = \frac{2-5x+6x^2}{(1-2x)(1-x-2x^2)}.$$
From here, apply partial fraction decomposition with a method of your choice, to get
$$F(x) = -\frac{1}{9}\cdot\frac{1}{1-2x} + \frac{2}{3}\cdot\frac{1}{(1-2x)^2} + \frac{13}{9}\cdot\frac{1}{1+x}.$$
From this we can obtain the coefficients again using identities we know:
$$F(x) = -\frac{1}{9}\sum_{n=0}^{\infty}2^n x^n + \frac{2}{3}\left(\sum_{n=0}^{\infty}2^n x^n\right)^2 + \frac{13}{9}\sum_{n=0}^{\infty}(-1)^n x^n = -\frac{1}{9}\sum_{n=0}^{\infty}2^n x^n + \frac{2}{3}\sum_{n=0}^{\infty}(n+1)2^n x^n + \frac{13}{9}\sum_{n=0}^{\infty}(-1)^n x^n,$$
so
$$b_n = -\tfrac{1}{9}\,2^n + \tfrac{2}{3}(n+1)2^n + \tfrac{13}{9}(-1)^n.$$
4 Partitions
4.1 Partitioning [n] – the set on n elements
A partition of [n] is given by its parts $A_1,\ldots,A_k$ with
$$A_i\neq\emptyset \ (\text{for } i = 1,\ldots,k), \qquad A_i\cap A_j = \emptyset \ (\text{for } i\neq j), \qquad \bigcup_{i=1}^{k}A_i = [n].$$
[Figure: an example partition of [9], the elements 1, . . . , 9 drawn in groups.]
The parts are unlabeled, i.e. if two partitions use the same parts Ai only
in a different order, we consider them to be identical. If we fix the number of
parts, i.e. if we want to count partitions of [n] into exactly k non-empty parts,
then their number is given by sIIk (n) as we have already seen in Section 1.5.2
where we considered arrangements of n labeled balls in k unlabeled boxes with
at least one ball per box.
We define the Bell number as
$$B_n = \sum_{k=0}^{n}s^{II}_k(n).$$
It counts the total number of partitions of [n] (into an arbitrary number of sets).
Don’t get confused over the specialS case of n = 0: There is exactly one partition
of ∅ into non-empty parts: ∅ = A∈∅ A. Every A ∈ ∅ is non-empty, since no
such A exists. So we also have B0 = sII 0 (0) = 1.
A different way to define the Bell numbers is to consider a square-free number $k\in\mathbb{N}$, meaning $k = p_1\cdot p_2\cdots p_n$ for distinct primes $p_1,\ldots,p_n$. The number of ways to write k as a product of integers bigger than one is exactly $B_n$. Take for instance $k = 2\cdot 3\cdot 5\cdot 7$; then some ways of writing k would be:
{2, 3, 5, 7} = {2, 3}∪{5, 7}, {2, 3, 5, 7} = {2}∪{5}∪{3, 7}, {2, 3, 5, 7} = {2, 3, 5, 7}.
Theorem 4.1.
$$B_n = \sum_{k=0}^{n-1}\binom{n-1}{k}B_k \qquad (n\ge 1).$$
Proof. Every partition of [n] has one part that contains the number n. In ad-
dition to n this part contains k other numbers (for some 0 ≤ k ≤ n − 1). The
remaining n − 1 − k elements are partitioned arbitrarily. From this correspon-
dence we obtain the desired identity:
$$B_n = \sum_{k=0}^{n-1}\binom{n-1}{k}B_{n-1-k} = \sum_{k=0}^{n-1}\binom{n-1}{n-1-k}B_{n-1-k} = \sum_{k=0}^{n-1}\binom{n-1}{k}B_k.$$
Figure 25: A non-crossing partition of [9] on the left and a crossing partition of
[9] on the right.
Theorem 4.3.
$$NC_n = C_n = \frac{1}{n+1}\binom{2n}{n}.$$
Proof. Recall the values C0 = 1, C1 = 1, C2 = 2 (corresponding to the paren-
thesis expressions “”, “()”, “()()”, “(())”) and the recursion
$$C_{n+1} = \sum_{k=0}^{n}C_k\,C_{n-k}.$$
It suffices to prove that the sequence (NCn )n∈N has the same starting values
and satisfies the same recursion.
It is easy to check that NC0 = 1, NC1 = 1, NC2 = 2. Actually those numbers
are just the Bell numbers since partitions of [n] can only be crossing for n ≥ 4.
We now have to prove the recursion
$$NC_{n+1} = \sum_{k=0}^{n}NC_k\cdot NC_{n-k}.$$
To this end, consider any non-crossing partition P of [n + 1]. The last element
n + 1 is in some part S ⊆ [n + 1]. Let k be the biggest number in S other than
n + 1 if such an element exists and k = 0 if S = {n + 1}. Now observe that in
the partition P, every part contains either only numbers that are bigger than k
or only numbers that are at most k: Otherwise, such a part would cross S.
This means that P decomposes into a non-crossing partition of [k] and a
non-crossing partition of {k + 1, . . . , n}. Here we ignored n + 1: It must be in
the same part as k and will never produce a crossing if there has not already
been one. Such a decomposition is unique: Every non-crossing partition of
[n + 1] uniquely decomposes and every pair of non-crossing partitions of [k] and
{k + 1, . . . , n} corresponds to a non-crossing partition of [n + 1].
This proves the claimed recursion and therefore the Theorem.
$$n = 17 = 5 + 5 + 4 + 3$$
is a partition of n = 17 into the (unlabeled) parts 5, 5, 4 and 3.
Alternatively we write n = λ = (5, 5, 4, 3) and say λ is the partition of n
(even though λ is a sorted tuple) hoping this will not be confusing.
We already considered this in Section 1.5.4 where we counted arrangements of n unlabeled balls in k unlabeled boxes with at least one ball per box. The number of such arrangements is given by $p_k(n)$. We established a recursive formula
$$p_k(n) = \begin{cases}0 & \text{if } k > n,\\ 0 & \text{if } n\ge 1,\ k = 0,\\ 1 & \text{if } n = k = 0,\\ p_k(n-k) + p_{k-1}(n-1) & \text{if } 1\le k\le n.\end{cases}$$
We define the total number of partitions of n (into an arbitrary number of parts)
as the partition function
$$p(n) = \sum_{k=0}^{n}p_k(n).$$
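The recursion for $p_k(n)$ is easy to implement with memoization; the sketch below (our own code, not from the notes) also computes the partition function p(n).

from functools import lru_cache

@lru_cache(maxsize=None)
def p_k(n, k):
    # partitions of n into exactly k parts, following the recursion above
    if k > n:
        return 0
    if n == k == 0:
        return 1
    if k == 0:
        return 0
    return p_k(n - k, k) + p_k(n - 1, k - 1)

def p(n):
    return sum(p_k(n, k) for k in range(n + 1))

print([p(n) for n in range(11)])  # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]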
This means the diagram for the partition λ = (7, 5, 3, 3, 3, 2) is mapped to the diagram for the conjugate partition λ* = (6, 6, 5, 2, 2, 1, 1).
Formal Proof. The bijection on the diagrams corresponds to a bijection on the partitions that maps the partition $n = \lambda_1 + \ldots + \lambda_k$ with biggest part l to the conjugate partition $n = \lambda^*_1 + \ldots + \lambda^*_l$ defined by $\lambda^*_i := \#\{j \mid \lambda_j\ge i\}$.
Proof. We boldly rewrite the infinite product as an infinite product of infinite sums using the identity $\frac{1}{1-y} = \sum_{n=0}^{\infty}y^n$:
$$\prod_{k=1}^{\infty}\frac{1}{1-x^k} = \frac{1}{1-x}\cdot\frac{1}{1-x^2}\cdot\frac{1}{1-x^3}\cdots = \left(\sum_{n_1=0}^{\infty}x^{n_1}\right)\left(\sum_{n_2=0}^{\infty}x^{2n_2}\right)\left(\sum_{n_3=0}^{\infty}x^{3n_3}\right)\cdots$$
7 = 3 + 4, 7 = 1 + 2 + 4, 7 = 2 + 5, 7 = 1 + 6, 7 = 7.
This proves podd = pdist , but we will prove it yet again, this time by constructing
a bijection.
Theorem 4.6. podd = pdist .
Proof. Let $n = \lambda_1 + \lambda_2 + \ldots + \lambda_k$ be a partition of n into distinct parts. We separate the powers of 2 from the $\lambda_i$, i.e. we write
$$n = u_1 2^{a_1} + u_2 2^{a_2} + \ldots + u_k 2^{a_k}$$
for odd numbers $u_i$ and $\lambda_i = u_i 2^{a_i}$. Note that the $u_i$ need no longer be distinct; for example if $\lambda_1 = 5$, $\lambda_2 = 10$ we would have $u_1 = u_2 = 5$. We sort the summands according to the $u_i$, which gives
$$n = \mu_1 r_1 + \mu_2 r_2 + \ldots,$$
where the values $\mu_i$ are all distinct and take the roles of the $u_i$; in particular, the multiset $\{u_1,\ldots,u_k\}$ consists exactly of the $\mu_i$, each with the appropriate multiplicity.
Note that the values ri are sums of distinct powers of 2.
For the bijection, we map the original partition into distinct parts to the
following partition into odd parts
$$n = \underbrace{\mu_1 + \mu_1 + \ldots + \mu_1}_{r_1\text{ times}} + \underbrace{\mu_2 + \mu_2 + \ldots + \mu_2}_{r_2\text{ times}} + \ldots$$
$$26 = 3\cdot 2^2 + 3\cdot 2^1 + 1\cdot 2^2 + 3\cdot 2^0 + 1\cdot 2^0 = 3(2^0+2^1+2^2) + 1(2^0+2^2) = 3\cdot 7 + 1\cdot 5,$$
so
$$26 = 3+3+3+3+3+3+3+1+1+1+1+1.$$
All steps are reversible, as there is only one way to write 5 and 7 as sums of
distinct powers of two. In terms of Ferrer diagrams we have mapped:
[Figure: the Ferrers diagram of the partition into distinct parts is mapped to the Ferrers diagram of the corresponding partition into odd parts.]
$$p^{even}_d(7) = \#\{\text{“1+6”},\ \text{“2+5”},\ \text{“3+4”}\} = 3, \qquad p^{odd}_d(7) = \#\{\text{“1+2+4”},\ \text{“7”}\} = 2.$$
Apparently, these numbers can differ, but we now show that they can differ
by at most 1, and characterize when this is the case.
To this end, define for $k\in\mathbb{Z}$ the pentagonal number $w_k := \frac{(3k-1)k}{2}$. Note that with this definition $w_{-k} = \frac{(3k+1)k}{2}$. Some values are:
k ... -3 -2 -1 0 1 2 3 ...
wk ... 15 7 2 0 1 5 12 ...
Lemma 4.7.
$$p^{even}_d(n) - p^{odd}_d(n) = \begin{cases}(-1)^k & \text{if } n = w_k \text{ for some } k\in\mathbb{Z},\\ 0 & \text{otherwise.}\end{cases}$$
Proof. Consider the Ferrer Diagrams for a partition into distinct parts. In it,
two rows may never have the same length. Define the slope S of such a diagram
to be a maximal staircase going diagonally down starting from the top-right
square and define the bottom B as the last row of the diagram. The following
diagram has a slope of length 3 (highlighted in red) and a bottom of size 2
(highlighted in blue):
Note that, in a few special diagrams, B and S may have a single square in
common. We define the set ∆ := B ∩ S, it contains the single common square
if it exists, and is empty otherwise. We will be sloppy in notation using B, S
and ∆ to simultaneously denote the set and the size of the set.
Now distinguish three types partitions corresponding to three types of dia-
grams where:
• Type 1: S ≥ B + ∆
• Type 2: S < B − ∆
• Type 3: B − ∆ ≤ S < B + ∆
Note that the latter case can only occur for ∆ = 1 and S ∈ {B, B − 1}.
Claim. (i) Type 3 partitions can only occur if n = wk for some k ∈ Z.
(ii) Conversely, if n = wk for some k ∈ Z, then there is exactly one Type 3
partition of n.
Proof of (i). Consider the case k := S = B − 1 first, where the diagram looks
like this (for k = 4):
which gives
$$n = k^2 + \sum_{i=1}^{k}i = k^2 + \frac{(k+1)k}{2} = \frac{(3k+1)k}{2} = w_{-k}.$$
The other case is $k := S = B$, meaning the diagram looks like this (for k = 4), which gives
$$n = k^2 + \sum_{i=0}^{k-1}i = k^2 + \frac{(k-1)k}{2} = \frac{(3k-1)k}{2} = w_k.$$
Proof of (ii). The existence is already clear from our proof of (i): We found
for arbitrary k ∈ N a type 3 partition of w−k and wk . For the uniqueness, note
that no two diagrams of type 3 have the same size: We can iterate through all
of them by alternatingly adding a column and a row.
Next we prove the following claim:
Claim (iii). For every k ∈ N there is a bijection between Type 1 partitions on
k rows and Type 2 partitions on k − 1 rows.
Note that from this the Theorem follows, since it guarantees a bijection that
maps every partition to a partition with one row more or one row less, so every
partition into an even number of parts is mapped to a partition into an odd
number of parts and vice versa. This shows that the number of even and odd
partitions must coincide. The only disturbance can be the single type 3 partition
that exists if n = wk and is even if k is even and odd if k is odd.
Proof of (iii). Consider a Type 1 partition, its slope is at least as large as its
bottom, maybe like this:
We take away the bottom and distribute the squares among the first |B| rows
of the diagram, like this:
There is enough room to do this (since the slope was at least as big as the
bottom) and the resulting partition is a partition into distinct parts. The size
of the bottom has increased and the size of the new slope is the size of the old
bottom. Therefore the new diagram is of type 2 or type 3 and, looking more
closely, type 3 can actually not occur as result of our operation, so it really is of
type 2. The inverse operation is to take the current slope and create a new row
from it. After checking that this maps type 2 partitions to type 1 partitions we
are done.
Theorem 4.8 (Euler’s Pentagonal Number Theorem).
$$\prod_{k=1}^{\infty}(1-x^k) = \sum_{k=-\infty}^{\infty}(-1)^k x^{w_k} = 1 + \sum_{k=1}^{\infty}(-1)^k\left(x^{w_k} + x^{w_{-k}}\right).$$
Proof. Multiply out the left hand side. The coefficient for xn counts the number
of partitions of n into distinct parts, however, partitions into an odd number of
parts are counted with negative sign. Therefore
$$\prod_{k=1}^{\infty}(1-x^k) = \sum_{n=0}^{\infty}\left(p^{even}_d(n) - p^{odd}_d(n)\right)x^n \overset{4.7}{=} \sum_{k=-\infty}^{\infty}(-1)^k x^{w_k}.$$
We start with one dot, and say it is “layer 0”, then add layer by layer filling
an area of the respective kind. For pentagonal numbers, there are four dots in
layer 1, seven dots in layer 2 and so on. Generally, in layer i there are 3i + 1
dots since there are three sides with i + 1 dots each, but two dots are shared by
two sides. In a drawing with k layers there are:
$$\sum_{i=0}^{k-1}(3i+1) = k + 3\sum_{i=0}^{k-1}i = k + 3\binom{k}{2} = \frac{(3k-1)k}{2} = w_k.$$
11 26 23 13 19 14 24
8 18 9 15 22 21
5 1 2 3 6
17 4 7
10 12
16 25
20
Young tableau:
1 2 3 6 12 16 25
4 7 8 17 18 21
5 11 13 19 26
9 15 22
10 23
14 24
20
i.e. the set of ordered pairs (T1 , T2 ) where T1 and T2 are standard Young tableaux
of the same shape.
Proof. We will see a geometric construction that builds, given a permutation π,
a corresponding pair of standard Young tableaux.
Let π be a permutation of [n] and X(π) = {(i, π(i)) | i = 1, . . . , n} the corre-
sponding point set in the plane. For instance, for the permutation π = 3271546
of [7] the point set X(π) is:
[Figure: the point set X(π) for π = 3271546, plotted in a 7 × 7 grid with x on the horizontal axis and π(x) on the vertical axis.]
A point p is minimal if there is no other point that is both to the left and below
of p. The set of all minimal points in X is therefore
The shadowline S(X) for a point set X is the weakly decreasing rectilinear
line through all points in min(X) and convex bends exactly in min(X). Here,
by convex we mean and by concave we mean . In the following drawing,
94
min(X) (red) and S(X) (black) are shown.
[Figure: the same point set, with min(X) highlighted in red and the shadowline S(X) drawn in black.]
The algorithm proceeds in phases (counted by the variable i), the total
number of phases m is not known beforehand (but bounded by n). We start
every phase i with a non-empty point set Xi . For Xi we construct a sequence
of shadowlines Si1 , . . . , Sini where the j-th shadowline is taken for the point set
consisting of those points from Xi that were not used for previous shadowlines.
$S_i^1 = S(X_i)$, $S_i^2 = S(X_i\setminus S_i^1)$. In general: $S_i^j = S\!\left(X_i\setminus\bigcup_{k=1}^{j-1}S_i^k\right)$.
The i-th phase ends as soon as all points from Xi were contained in one of the
shadowlines Si1 , Si2 , . . . , Sini .
The shadowlines of phase i determine the i-th row of the tableaux T1 and
T2 we want to construct. Let xji be the x-coordinate of first segment of the
shadowline Sij , i.e. the x-coordinate at which Sij leaves the picture on the top
and yij the y-coordinate of last segment of Sij , i.e. the y-coordinate at which Sij
leaves the picture on the right. Then the i-th row of T1 and T2 consists of the
numbers x1i , . . . , xni i and yi1 , . . . , yini , respectively.
Shadowlines may contain concave bends (they do iff they have two or more
convex bends). The set Xi+1 is defined as the set of all concave bends occurring
in the shadowlines Si1 , . . . , Sini . If Xi+1 is empty, we are done, otherwise, it
serves as point set for phase i + 1. (proof continues later)
Before we verify that the construction yields the bijection we desire, we give
an example.
Example. Consider again π = 3271546. The first phase of the algorithm will
find three shadowlines S11 , S12 , S13 . They leave the diagram on x positions 1, 3
and 7 and on y positions 1, 4 and 6, giving rise to the partial tableaux as shown.
[Figure: phase 1 of the construction for π = 3271546; the three shadowlines leave the picture at the x-positions 1, 3, 7 (top) and the y-positions 1, 4, 6 (right), giving the partial tableaux $T_1^{(1)} = (1\ 3\ 7)$ and $T_2^{(1)} = (1\ 4\ 6)$.]
There are four concave bends on these shadowlines which give the point set
for the next phase: X2 = {(2, 3), (4, 2), (5, 7), (6, 5)}. In phase 2 we get two
shadowlines and add corresponding second lines to the tableaux.
[Figure: phase 2; the two shadowlines for $X_2$ add the second row (2 5) to both tableaux, giving $T_1^{(2)}$ with rows (1 3 7), (2 5) and $T_2^{(2)}$ with rows (1 4 6), (2 5).]
There are still concave bends, so we proceed with phase 3.
[Figure: phase 3 adds the third rows, giving $T_1$ with rows (1 3 7), (2 5), (4 6) and $T_2$ with rows (1 4 6), (2 5), (3 7).]
No concave bends were made this time, our construction is done. The pair
(T1 , T2 ) is the result.
96
We now establish, in a series of claims, that the construction constitutes
a bijection between permutations and pairs of Young tableaux of the same
shape as desired. We use the notion of a chain, which is a subset Y of a
point set X such that Y is increasing, i.e. Y = {(x1 , y1 ), (x2 , y2 ), . . . , (xk , yk )}
with y1 < y2 < . . . < yk and x1 < x2 < . . . < xk .
Claim. The number ni of shadows lines in phase i is the length of the longest
chain in Xi .
So far we proved that our map is well-defined, i.e. it gives rise to a pair of
standard Young tableaux for any permutation π ∈ Sn . The final step is to show
that the map constitutes a bijection.
Claim. There is an inverse map, i.e. from any pair (T1 , T2 ) of standard Young
tableaux of the same shape we can recover a corresponding permutation π ∈ Sn .
Proof. We demonstrate the inverse procedure with an example, hoping the gen-
eral case will be apparent from this. Consider the following pair of Young
tableaux:
T1 :=
1 2 4 8
3 6
5
7
T2 :=
1 3 6 7
2 4
5
8
These tableaux contain eight numbers in four rows, so we try to recover a
four-phased construction and annotate the x and y coordinates of an 8 × 8 grid
with the index of the phases at which we wish a corresponding shadowlines to
leave the picture.
[Figure: the inverse construction carried out step by step on an 8 × 8 grid. The x-coordinates are annotated with the phase numbers taken from $T_1$ and the y-coordinates with those from $T_2$; the shadowlines of each phase are drawn backwards, and reading off the points of the last step recovers the permutation π.]
(iii) The length of the longest increasing subsequence in π is the length of the
first row in T1 .
The point set X(π −1 ) is therefore obtained by flipping the point set X(π)
along the diagonal x = y, in other words, changing the roles of x and y.
The shadowlines we obtain will also be flipped, so every x-coordinate we
would have written into T1 is now a y-coordinate and therefore written
into T2 and vice versa. This just means that the roles of T1 and T2 are
swapped.
(ii) $\pi^2 = \mathrm{id} \;\Leftrightarrow\; \pi = \pi^{-1} \;\overset{(i)}{\Leftrightarrow}\; (T_1, T_2) = (T_2, T_1) \;\Leftrightarrow\; T_1 = T_2$.
(iii) Consider an increasing subsequence, i.e.
$$i_1 = 1,\qquad i_2 = 2,\qquad i_n = i_{n-1} + (n-1)\,i_{n-2}.$$
Case 1: The ball with label n is in its own box. The rest is an arrangement
with n − 1 balls.
Case 2: The ball with label n is in a box together with another ball x. There
are n − 1 choices for x and the rest is an arrangement with n − 2 balls.
In general, let λ = (n1 , n2 , . . . , nm ) where n1 ≥ n2 ≥ . . . ≥ nm are numbers
that sum up to n. With this in mind we define the function $t : \mathbb{Z}^m\to\mathbb{N}$ as
$$t(n_1,\ldots,n_m) = \begin{cases}|T((n_1,\ldots,n_m))| & \text{if } n_1\ge n_2\ge\ldots\ge n_m\ge 0,\\ 0 & \text{otherwise.}\end{cases}$$
For technical reasons, we allow trailing zero-length rows, for instance we could
write T ((3, 2, 0, 0)) = T ((3, 2, 0)) = T ((3, 2)) = 5. We claim the function is
fully characterized by the following identities:
(1) If the numbers $n_i$ fail to be weakly decreasing then $t(n_1,\ldots,n_m) = 0$.
(2) Trailing zeros can be dropped: $t(n_1,\ldots,n_m,0) = t(n_1,\ldots,n_m)$.
(3) If the numbers are weakly decreasing and none is zero then
$$t(n_1,\ldots,n_m) = \sum_{i=1}^{m}t(n_1,\ldots,n_{i-1},n_i-1,n_{i+1},\ldots,n_m) = t(n_1-1,\ldots,n_m) + t(n_1,n_2-1,\ldots,n_m) + \ldots$$
(4) For a single row, $t(n) = 1$.
It is obvious that t fulfills (1), (2) and (4). And (3) is just the observation that
the biggest number n must be the last number in one of the m rows. Consider
for instance λ = (5, 4, 4, 3) then in any Young tableau the number n = 16 will
be in one of the following positions:
[Figure: the possible end-of-row positions of n = 16 in a tableau of shape (5, 4, 4, 3), one for each row.]
For each case, we count the number of Young diagrams of the shape where the
corresponding square was removed, i.e. we consider the shapes
[Figure: the shapes obtained from (5, 4, 4, 3) by removing the last square of one of the rows.]
means there is a unique solution. While we do not try the long and arduous
journey of discovering the solution ourselves, we can, given the solution, verify
that it is indeed correct.
To this end, we need to first examine the Vandermonde determinant which
is defined as
$$\Delta(x_1,\ldots,x_m) := \prod_{1\le i<j\le m}(x_i - x_j)$$
(our definition differs from the usual one in that it may have a different sign).
It has the curious property that swapping two input values changes the sign
of the output. To see this, observe firstly what happens if the adjacent inputs
xi and xi+1 in the argument list of ∆ are swapped for some i ∈ [m − 1]. All
factors remain unchanged with the exception of $(x_i - x_{i+1})$, which is replaced by $(x_{i+1} - x_i) = -(x_i - x_{i+1})$, as claimed. If two non-adjacent values $x_i, x_{i+k}$
are swapped, then this can be simulated by an odd number of swaps of adjacent
elements. Think of a race with n runners where Alice is in place i and Bob in
place i + k. We can make them change places as follows: First Bob overtakes
k people, putting him in front of Alice and then Alice (who is now in position
i+1) must be fall behind k −1 positions. This gives 2k −1 overtaking operations
in total, an odd number.
We remark (without proof) that the Vandermonde is indeed a determinant, namely
$$\Delta(x_1,\ldots,x_m) = (-1)^{\binom{m}{2}}\det\begin{pmatrix}1 & x_1 & x_1^2 & \cdots & x_1^{m-1}\\ 1 & x_2 & x_2^2 & \cdots & x_2^{m-1}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_m & x_m^2 & \cdots & x_m^{m-1}\end{pmatrix}.$$
Before we can get back to counting tableaux, we need to prove a technical
Lemma:
Lemma 4.13. We have the following identity on polynomials over x1 , x2 , . . . , xm , y:
$$\sum_{k=1}^{m}x_k\,\Delta(x_1,\ldots,x_k+y,\ldots,x_m) = \left(x_1 + x_2 + \ldots + x_m + \binom{m}{2}y\right)\Delta(x_1,\ldots,x_m).$$
Proof. Let g be the polynomial given as the sum on the left hand side. Observe
what happens if we swap the roles of xi and xj (with i < j) in g. The summands
for k ∈
/ {i, j} will just change sign (by our previous observation). The summand
with k = i turns into:
So the term for k = i became the negated term with k = j, and this works vice
versa as well — altogether, the value of g changes sign if xi and xj swap roles.
Now consider the case of xi = xj , then swapping xi and xj obviously doesn’t
change anything. The only number that doesn’t change if its sign changes is
zero, so g = 0 whenever two xi and xj coincide (for i 6= j).
Now think of all the variables $x_2,\ldots,x_m, y$ as some (distinct) integer constants and of $x_1$ as the only actual variable; then g is a polynomial of degree n over $x_1$. It has several zeroes, one of which is $x_1 = x_2$. This allows us to
divide g by the degree 1 polynomial x1 − x2 (using polynomial long division)
and we obtain g = p · (x1 − x2 ) for some polynomial p. In the same way we
separate the other zeroes getting g = p0 · (x1 − x2 )(x1 − x3 ) . . . (x1 − xm ) for
some polynomial p0 . Then, looking at p0 , we switch perspective thinking of x2
as the variable and of the other values as (suitable distinct) constants. Looking
at it that way we find the zeroes x2 = x3 , x2 = x4 , . . . , x2 = xm of p0 and can
separate corresponding factors from p0 . We repeat this for the remaining m − 2
variables as well. With this we eventually obtain:
Y
g = p̂ · (xi − xj ). (?)
1≤i<j≤m
for some polynomial p̂. Note that this was the important step. Since this is not
an algebra lecture, we allowed ourselves to be a bit sketchy: Formally we would
have to argue that no funny business is going on when doing the polynomial
long division and switching perspective between thinking of the xi as constants
or variables.
Given (?), everything falls into place: The degree of the polynomial g is $\binom{m}{2}+1$, while the degree of the right hand side is $\binom{m}{2}$ plus the degree of $\hat p$. So the degree of $\hat p$ is one!
This means that g is of the form:
$$g = (a_1 x_1 + a_2 x_2 + \ldots + a_m x_m + by)\cdot\prod_{1\le i<j\le m}(x_i - x_j) = (a_1 x_1 + a_2 x_2 + \ldots + a_m x_m + by)\,\Delta(x_1,\ldots,x_m).$$
Since the equation must hold for the special case of y = 0, we quickly see that
a1 = a2 = . . . = am = 1.
Now to determine b, we multiply out g and collect the sum of all monomials
containing y with multiplicity 1. We start by analyzing the k-th summand of g:
x_k · ∆(x_1, …, x_k + y, …, x_m)
= x_k · ∏_{1≤i<j≤m, i,j≠k} (x_i − x_j) · (x_1 − x_k − y)(x_2 − x_k − y) ⋯ (x_k + y − x_m).
To get monomials with a single y, choose y from one of the factors and choose
the part without y everywhere else. This allows to reassemble ∆ with the
exception of a single missing factor (we write it into the denominator). The
m − 1 summands (one for each occurrence of y) add up to

x_k · ∏_{1≤i<j≤m} (x_i − x_j) · ( (−y)/(x_1 − x_k) + (−y)/(x_2 − x_k) + … + y/(x_k − x_m) )
= y ∆(x_1, …, x_m) ( − ∑_{i=1}^{k−1} x_k/(x_i − x_k) + ∑_{i=k+1}^{m} x_k/(x_k − x_i) )
= y ∆(x_1, …, x_m) ∑_{i≠k} (−x_k)/(x_i − x_k).

Summing over all k and pairing the terms for (i, k) and (k, i), each pair with i < k contributes x_k/(x_k − x_i) + x_i/(x_i − x_k) = 1, so the total coefficient of y ∆(x_1, …, x_m) is \binom{m}{2}. This shows b = \binom{m}{2}, completing the proof.
Theorem 4.14.

t(n_1, …, n_m) = ∆(x_1, …, x_m) n! / (x_1! x_2! ⋯ x_m!),

where n = ∑_{i=1}^{m} n_i, x_i = n_i + m − i for i = 1, …, m and x_1 ≥ … ≥ x_m ≥ 0.
Proof. We show that the right hand side is a solution to the recurrence we found
for t, i.e. if we define t by the right hand side then (1) − (4) from page 101 hold.
First note what happens if xi are not strictly decreasing, i.e. xi = xi+1 for some
i. This gives a value of 0 since the factor 0 = xi − xi+1 occurs in ∆(x1 , . . . , xm ).
We also know:
ni + m − i = ni+1 + m − (i + 1) ⇔ ni = ni+1 − 1,
so the ni are not weakly decreasing which means these values do not correspond
to a valid shape for Young tableaux. Some other invalid shapes were already
excluded by restricting ourselves to weakly decreasing xi . All in all, we are
consistent with:
(1) t(n1 , . . . , nm ) = 0 unless n1 ≥ . . . ≥ nm ≥ 0.
The second thing we need to show is
(2) t(n1 , . . . , nm−1 , 0) = t(n1 , . . . , nm−1 ).
Since n_m = 0 means x_m = 0, we calculate:

∆(x_1, …, x_{m−1}, 0) = ∏_{1≤i<j≤m} (x_i − x_j) = ∏_{1≤i<j≤m−1} (x_i − x_j) · ∏_{i=1}^{m−1} x_i
= ∆(x_1, …, x_{m−1}) ∏_{i=1}^{m−1} x_i = ∆(x_1 − 1, …, x_{m−1} − 1) ∏_{i=1}^{m−1} x_i.
With this it is easy to verify:

t(n_1, …, n_{m−1}, 0) = ∆(x_1, …, x_{m−1}, 0) n! / (x_1! ⋯ x_{m−1}! 0!)
= ∆(x_1 − 1, …, x_{m−1} − 1) n! / ((x_1 − 1)! ⋯ (x_{m−1} − 1)!) = t(n_1, …, n_{m−1}),

where the last step uses that the x-values belonging to the shape (n_1, …, n_{m−1}) are exactly x_i − 1 = n_i + (m − 1) − i.
The hard part is (3), but we did most of the work already in the last Lemma.
(3) t(n_1, …, n_m) = ∑_{k=1}^{m} t(n_1, …, n_k − 1, …, n_m).

First note:

∑_{i=1}^{m} x_i = ∑_{i=1}^{m} (n_i + m − i) = n + ∑_{j=0}^{m−1} j = n + \binom{m}{2}.
Hence:

t(n_1, …, n_m) = ∆(x_1, …, x_m) n! / (x_1! ⋯ x_m!)
= ( ∑_{k=1}^{m} x_k ∆(x_1, …, x_k − 1, …, x_m) ) (n − 1)! / (x_1! ⋯ x_m!)
= ∑_{k=1}^{m} ∆(x_1, …, x_k − 1, …, x_m)(n − 1)! / (x_1! ⋯ (x_k − 1)! ⋯ x_m!)
= ∑_{k=1}^{m} t(n_1, …, n_k − 1, …, n_m),

where the second equality uses Lemma 4.13 with y = −1 together with ∑_{i=1}^{m} x_i = n + \binom{m}{2}, so that n · ∆(x_1, …, x_m) = ∑_{k=1}^{m} x_k ∆(x_1, …, x_k − 1, …, x_m).
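To see the formula of Theorem 4.14 in action, here is a small computational sketch in Python; the helper names delta and t_delta are ours, not from the notes. It evaluates the right hand side directly from a given shape (n_1, …, n_m).

from math import factorial

def delta(xs):
    # ∆(x_1, ..., x_m) = product of (x_i - x_j) over all i < j
    prod = 1
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            prod *= xs[i] - xs[j]
    return prod

def t_delta(shape):
    # t(n_1, ..., n_m) = ∆(x_1, ..., x_m) * n! / (x_1! ... x_m!) with x_i = n_i + m - i
    m = len(shape)
    xs = [shape[i] + m - (i + 1) for i in range(m)]
    denom = 1
    for x in xs:
        denom *= factorial(x)
    return delta(xs) * factorial(sum(shape)) // denom

print(t_delta((2, 1)))        # 2 standard Young tableaux of shape (2, 1)
print(t_delta((4, 3, 2, 2)))  # 1320, matching the hook length example below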
Theorem 4.15 (Hook length formula). Let λ be a partition of n. Then

t(λ) = n! / ∏_{(i,j)∈λ} |h_{i,j}|.
Proof. Let λ = (n1 , . . . , nm ). We multiply the lengths of all the hooks, going
through them row by row. For row i, the first hook hi,1 starts at square (m, 1)
then goes upwards and rightwards and ends at square (i, ni ). It has length
(ni + m − i) = xi . The other hooks in row i also end in (i, ni ) but they start
in other positions. We go through these positions from left to right as shown in
the following figure (for i = 3).
[Figure: the path from (m, 1) to (i, n_i) for i = 3, with dots marking the positions (j, n_j + 1) for j > i.]
Now if there was a hook starting in each integer position this line passes through,
the product of the hook lengths would just be xi !. However, the positions
marked with a dot do not correspond to starts of hooks so for each dot we
have to divide by the length that a hook starting there would have. The dot
in line j (j > i) is in position (j, nj + 1) so a hook going to (i, ni ) has length
(j − i + n_i − n_j) = (x_i − x_j). So the product of all hook lengths in row i is x_i! / ∏_{j>i} (x_i − x_j). Now for the product of all hook lengths of all rows we get:

∏_{(i,j)∈λ} |h_{i,j}| = ∏_{1≤i≤m} ( x_i! / ∏_{j>i} (x_i − x_j) ) = x_1! ⋯ x_m! / ∏_{i<j} (x_i − x_j) = x_1! ⋯ x_m! / ∆(x_1, …, x_m) = n! / t(λ),

where the last equality is Theorem 4.14.
Take for instance the following Ferrers diagram (of shape (4, 3, 2, 2)), which we annotate with the lengths of the hooks rooted at the respective positions:

7 6 3 1
5 4 1
3 2
2 1
The number of Young tableaux of this shape is, by the hook length formula, 11! divided by the product of all hook lengths, so:

11! / (7 · 6 · 3 · 1 · 5 · 4 · 1 · 3 · 2 · 2 · 1) = (11 · 10 · 9 · 8)/(3 · 2) = 11 · 10 · 3 · 4 = 1320.
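The same number can be obtained mechanically. The following Python sketch (with our own helper names hook_lengths and num_tableaux) computes the hook lengths of a shape given by its weakly decreasing row lengths and evaluates Theorem 4.15.

from math import factorial

def hook_lengths(shape):
    # hook length of cell (i, j): cells to the right, cells below, plus the cell itself
    m = len(shape)
    hooks = []
    for i in range(m):
        for j in range(shape[i]):
            arm = shape[i] - j - 1
            leg = sum(1 for r in range(i + 1, m) if shape[r] > j)
            hooks.append(arm + leg + 1)
    return hooks

def num_tableaux(shape):
    # hook length formula: n! divided by the product of all hook lengths
    prod = 1
    for h in hook_lengths(shape):
        prod *= h
    return factorial(sum(shape)) // prod

print(sorted(hook_lengths((4, 3, 2, 2)), reverse=True))  # [7, 6, 5, 4, 3, 3, 2, 2, 1, 1, 1]
print(num_tableaux((4, 3, 2, 2)))                        # 1320, as computed above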
5 Partially Ordered Sets
Example 5.1. As was accurately observed by Randall Munroe, creator of xkcd,
different fruit not only differ in their tastiness, but also in the difficulty of
preparing them. Some types of fruit are clearly superior to others; for instance seedless grapes are both more tasty and easier to prepare than oranges. However, if you compare pineapples with bananas then pineapples are more tasty but harder to prepare, so there is no clear winner.

Figure 27: https://xkcd.com/388/ – "Coconuts are so far down to the left they couldn't be fit on the chart. Ever spent half an hour trying to open a coconut with a rock?" – Randall Munroe
To model these situations, where some things are bigger/better/above/dominating other things but some pairs of things may also be incomparable/on the same level/of equal rank, we introduce the concept of partial orders.
You probably already know some (partial) orders, and recognize a few ex-
amples from Table 3. Note that a total order (where for any x, y ∈ X we have
x ≤ y or y ≤ x) is a special case of a partial order. We now introduce some
notation.
• If x ≤ y and x 6= y then we write x < y.
• If x ≤ y we also write y ≥ x.
X                      ≤                          Example Relations
natural numbers        order by value             23 ≤ 42, 42 ≰ 23
natural numbers        order by divisibility      6 ≰ 7, 7 ≰ 6, 13 ≤ 91
polynomials            order by divisibility      X − 7 ≤ X² − 10X + 21
words over {A, …, Z}   order lexicographically    ELEPHANT ≤ MOUSE
real intervals         order by inclusion         [3, 5] ≤ [2, 7], [2, 3] ≰ [4, 6]
real intervals         "completely left of"       [3, 5] ≰ [2, 7], [2, 3] ≤ [4, 6]
real valued functions  point-wise domination      1 + x ≰ x², sin(x) ≤ 2
subsets of [N]         inclusion                  {1, 3} ≤ {1, 2, 3, 5}

Table 3: Some posets P = (X, ≤) with informal definitions of their order relations and examples.
• If x < y and there is no z with x < z < y, then x < y is a cover relation (or cover for short), denoted by x ⋖ y.

We depict a poset by its Hasse diagram. In it, every x ∈ X corresponds to a point in the plane and for every x, y ∈ X with x ⋖ y the corresponding points are connected by a y-monotone line where x is the lower (with smaller y-coordinate) endpoint. In particular "transitive edges" are omitted, meaning x, y ∈ P are not
directly connected if there is a z such that x < z < y. The poset from Example
5.1 has the Hasse diagram shown in Figure 28.
Figure 28: Hasse diagram for the poset originating from Example 5.1. For instance
lemons and bananas are connected by a line since bananas are both tastier and
easier to prepare. Lemons and cherries are not connected by a direct line since
they are already connected via bananas. Bananas and watermelons are not
connected at all since they are incomparable.
Sometimes it may be easier to define the cover relations than to define the
entire relation.
Example. Let X be the set of states of a Rubik’s Cube. For x ∈ X define r(x)
to be the minimum number of moves needed to solve the cube from state x.
We want x l y if and only if r(x) < r(y) and x and y can be transformed
into one another by one move. This is the cover relation of a poset (without
proof).
A chain is a set of pairwise comparable elements and an antichain a set of pairwise incomparable elements (we write x ∥ y if x and y are incomparable); the height h(P) is the size of a largest chain and the width w(P) the size of a largest antichain. Further define

min(P) := {x ∈ X | ∀y ∈ X : x ≤ y or x ∥ y},
max(P) := {x ∈ X | ∀y ∈ X : x ≥ y or x ∥ y}.
Looking at the poset P from Figure 28 again, an example for a chain would
be {Watermelons, Pears, Strawberries}. There are several longest chains, one
of which is {Grapefruit, Oranges, Bananas, Plums, Pears, Blueberries, Seed-
less Grapes}. The height of P is therefore 7. An example for an antichain
is {Pineapple, Cherries, Plums, Red Apples}. No antichain is larger, so
the width of P is 4. The maximal elements are max(P ) = {Seedless Grapes,
Peaches} and the minimal elements are min(P ) = {Pineapples, Pomegranates,
Grapefruit, Lemons}. We now study partitions of posets into chains and an-
tichains. We start with the easier case.
Theorem 5.4 (Antichain Partitioning). The elements of every poset P =
(X, ≤) can be partitioned into h(P ) antichains (and not less).
Proof. Note first that we cannot partition P into fewer antichains: No antichain
can contain two elements of a chain, since elements of chains are pairwise com-
parable and elements of antichains are pairwise incomparable. Since P contains
a chain Y of size h(P ), at least h(P ) antichains are needed.
To see that h(P ) antichains suffice, first note that min(X) is an antichain:
Two minimal elements are always unrelated, otherwise the “bigger” minimal
element would not be minimal at all. Also, every maximal chain in P contains an
element from min(X): If Y = {y1 , . . . , yk } is a maximal chain with y1 < . . . < yk
and y1 is not minimal, then we would find y0 < y1 and therefore a longer chain.
So we take the first antichain to be min(P ), then we still need to partition
X \ min(P ) into h(P ) − 1 antichains. Since the maximal chains in X \ min(P )
have size at most h(P ) − 1 we can do this by induction.
Figure 29: Partition of the poset into 7 antichains obtained by iteratively putting
the minimal remaining elements into a new antichain.
To get back to our fruit example, Figure 29 shows how the poset can be
partitioned into 7 antichains.
Theorem 5.5 (Dilworth’s Theorem). Every poset P = (X, ≤) can be parti-
tioned into w(P ) chains (and not less).
Proof. Clearly, w(P ) chains are necessary: P contains an antichain of size w(P )
and no two of its elements can be contained in the same chain.
To show w(P ) chains suffice, we do induction on |X|. Consider a maximum
antichain A = {x1 , . . . , xw(P ) }.
The idea is to split P along A into two parts, take for instance our fruit
poset and A = {Seeded Grapes, Cherries, Plums, Red Apples}, then the two
parts are shown in Figure 30.
Formally we define P1 = (X1 , ≤) and P2 = (X2 , ≤) with elements
X1 := {y ∈ X | ∃x ∈ A : x ≤ y}, X2 := {y ∈ X | ∃x ∈ A : y ≤ x},
and the same relations as before. Note that with this definition X1 ∩ X2 = A,
since if y ∈ (X1 ∩ X2 ) then there is x1 ∈ A and x2 ∈ A with x1 ≤ y ≤ x2 . Since
A is an antichain this implies x1 = x2 = y so y ∈ A.
Now if we can partition P_1 into chains C_1^1, …, C_{|A|}^1 and P_2 into chains C_1^2, …, C_{|A|}^2 where C_i^1 and C_i^2 both contain x_i, then we can attach the chains to one another, obtaining a partition of P into chains C_1, …, C_{|A|}. We can find these partitions by induction unless P_1 or P_2 fail to be smaller than P.
Convince yourself that P1 = P ⇔ A = min(P ) and P2 = P ⇔ A = max(P ).
So our only problem is the case where there is no largest antichain except for
min(P ), max(P ) or both.
In that case, let C be any maximal chain. In the same way as in the previous Theorem we argue that min(P) ∩ C ≠ ∅ and max(P) ∩ C ≠ ∅. Then every antichain in P \ C has size at most w(P) − 1, since every largest antichain of P is min(P) or max(P) and C meets both. By induction P \ C can be partitioned into at most w(P) − 1 chains; together with C this yields a partition of P into w(P) chains.
Figure 30: We split the poset P along an antichain into P1 (the upper half) and
P2 (the lower half). The elements of the antichain are contained in both posets
after the split.
[Hasse diagrams for Figure 31: the poset Q on the elements a, b, c, d, e; the poset P on w, x, y, z; and the poset P′.]
Figure 31: With the posets as shown (via their Hasse diagrams) P is a subposet
of Q as witnessed by the map w 7→ a, x 7→ c, y 7→ d, z 7→ e. Actually, there are
four different embeddings, since w could also be mapped to b and x and y can
also be mapped to c and d the other way round. These two maps correspond
to two copies of P in Q, namely {a, c, d, e} and {b, c, d, e}. The poset P′ is not a subposet of Q. However, Q is an extension of P′.
Proof. It is easy to verify that ≤ is an order relation. It is also clear that all pairs
that are related via ≤ are related via ≤1 and ≤2 , so ≤1 and ≤2 are extensions
of ≤.
Two-dimensional posets. Recall the poset P from the fruit example. It has
two natural linear extensions: The first is the total order L_Taste in which the fruit are arranged in order of increasing tastiness. In other words, L_Taste is the order obtained when projecting all elements to the tastiness-axis (note how this is an extension of P and a total order, assuming no two fruit have exactly the same tastiness). The second is the total order L_Ease, in which the fruits are arranged in order of increasing ease of preparing them. Fruit are ordered in P if and only if they are ordered according to both linear orders, so P = L_Taste ∩ L_Ease.
We want to capture this property in a new notion and define: A poset is 2-dimensional if it has a two-dimensional picture (strictly speaking: a two-dimensional but no one-dimensional picture), i.e. an assignment f : P → R² of positions in the plane to each element such that x < y in P if and only if f(y) is above and to the right of f(x).
Not every poset is 2-dimensional. Consider the spider poset with three legs, whose Hasse diagram is shown on the left of Figure 32. This spider has a head H, three knees K_1, K_2, K_3 considered bigger than the head, and three feet F_1, F_2, F_3, each considered smaller than the corresponding knee. There are no further relations.
We try to find a two dimensional embedding into the plane, i.e. assign
positions in the plane to each element (see right side of Figure 32).
The head H has to go somewhere which partitions the plane into four quad-
rants (above and right of H, below and right of H, above and left of H, below
and left of H). The knees must all go above and right of H since they are bigger
than H. Since knees are incomparable, the ones with bigger x coordinate must
Figure 32: On the left: The spider with three legs. On the right: Sketch for the
proof that the spider is not two-dimensional.
have smaller y coordinate so the knees must lie on a decreasing curve as shown.
Without loss of generality, K2 is the second point on this decreasing curve. Now
observe that there is no suitable point to put F2 : It cannot be in the quadrant
below and left of H nor in the quadrant above and right of H since H ∥ F2. Since it must also be below and left of K2, it must be in the shaded area. But points in that area are below and left of either K1 or K3 (or both), which is not allowed since F2 ∥ K1 and F2 ∥ K3. This completes the proof.
Assume not; then we find elements a, b, c in P that form an unordered copy of the forbidden configuration (a two-element chain b < c together with a further element a), i.e. we have b <_{L1} a <_{L1} c. Since b ≤_P c we also have b ≤_{L2} c, so we are in this situation: [sketch omitted]

We must have put x_i so high for a reason: there must be some x_k (k < i) with x_k <_P x_i, and x_i was therefore assigned a y-coordinate barely above x_k. In the drawing above, x_k is therefore on one of the horizontal lines, either to the left or to the right of x_j. It cannot be on the right (i.e. on the blue line and j < k < i), since that would imply x_j < x_k (otherwise we would have made a mistake already earlier, but i was minimal), and because of x_k < x_i and transitivity we would also have x_j < x_i, contradicting our assumption of x_j ∥ x_i.

So we have that x_k is to the left of x_j (i.e. on the red line and k < j). This means x_k ∥ x_j, and therefore x_j, x_k, x_i form a copy of the same configuration in P. Since in L we have x_k < x_j < x_i, it is not ordered by L, a contradiction.
We say a poset P is d-dimensional if it is a subposet of (Rd , ≤dom ) and not a
subposet of (Rd−1 , ≤dom ).
“⇒” We can assume, without loss of generality, that no two points share a
coordinate (we always break ties without changing the relations in the
poset). Then define Li as the order of the elements on the projection of P
to the i-th coordinate. This is a set of linear extensions with intersection
P.
“⇐” Given linear extensions L1 , . . . , Ld we can take these to assign coordinates.
The coordinate of x ∈ P will be (r1 , . . . , rd ) ∈ Rd where ri is the rank of
x in the i-th linear extension, i.e. ri = |{y ∈ P | y ≤Li x}|. It is easy to
see that this gives an embedding of P into Rd .
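The "⇐" direction is effectively an algorithm: read off, for every element, its rank in each linear extension and use these ranks as coordinates. A small Python sketch (the function name embed and the example poset are ours):

def embed(linear_extensions):
    # assign every element the vector of its ranks in the given linear extensions
    ranks = [{x: i + 1 for i, x in enumerate(L)} for L in linear_extensions]
    return {x: tuple(r[x] for r in ranks) for x in linear_extensions[0]}

# Example: the "diamond" 0 < a, b < 1 (a and b incomparable), realized by two extensions.
L1 = ["0", "a", "b", "1"]
L2 = ["0", "b", "a", "1"]
print(embed([L1, L2]))   # {'0': (1, 1), 'a': (2, 3), 'b': (3, 2), '1': (4, 4)}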
• For single-element sets S = {s} we will simply write D(s) and D[s] instead of D({s}) and D[{s}].
• In the same way define open and closed upsets: U(S) := {y ∈ X | ∃s ∈ S : s < y} and U[S] := {y ∈ X | ∃s ∈ S : s ≤ y}, again writing U(s) and U[s] for single elements.

Theorem 5.10. dim(P) ≤ w(P).
Figure 33: On the left: The open upset of {Cherries, Red Apples, Blueberries}
is shown in red and the closed downset of {Pineapples, Bananas} is shown in
dashed blue.
On the right: Sketch for the claim from Theorem 5.10. Consider the chain
C = {Oranges, Watermelons, Cherries, Pears, Strawberries}. Then L is a linear
extension that puts the elements from C as late as possible, for instance like this
(elements from C highlighted): L : Grapefruit, Lemons, Tomatoes, Pineapples,
Pomegranates, Oranges, Bananas, Red Apples, Watermelons, Plums, Green
Apples, Seeded Grapes, Cherries, Pears, Peaches, Blueberries, Strawberries,
Seedless Grapes
Claim. For every chain C in P there is a linear extension L of P such that y <_L x whenever x ∈ C and x ∥ y.

Note first that once we have proved this claim, we have proved the theorem, since the linear extensions L_1, …, L_w we get for the chains C_1, …, C_w of a chain partition (Dilworth's Theorem) form a realizer of P, meaning P = L_1 ∩ … ∩ L_w. It is clear that the right side is an extension of the left side. Now consider x ∥ y. Then x ∈ C_i for some i ∈ [w] and y ∈ C_j for some j ≠ i. Then we have y <_{L_i} x and x <_{L_j} y, so neither relation occurs in L_i ∩ L_j.
Think of Li as a linear extension of P that puts the elements from Ci as late
as possible. A sketch is given on the right of Figure 33.
Proof of claim. Denote the elements of C by x1 < x2 < . . . < xk and consider
their upsets. Note that clearly U(x_1) ⊇ U(x_2) ⊇ … ⊇ U(x_k).
Now define X0 := X \ U [x1 ], Xj := U (xj ) \ U [xj+1 ] (for 1 ≤ j < k) and
Xk := U (xk ) and let Pj := (Xj , ≤) be the poset induced by Xj (0 ≤ j ≤ k).
Given a linear extension Lj for each Pj (0 ≤ j ≤ k) we now define the linear
extension L of P as:
L : L0 x1 L1 x2 L2 . . . xk Lk .
First, convince yourself that L really is a linear extension of P , i.e. every element
occurs exactly once and if x <P y then x occurs before y.
Now assume x ∈ C and x ∥ y. Say x = x_l. Then y ∉ U(x_l), so y is not part of any X_j for j ≥ l. This means y <_L x_l = x, as claimed.
Example 5.11 (Standard Examples). For a positive integer n define the poset S_n = ({a_1, …, a_n, b_1, …, b_n}, ≤) where a_i ≤ b_j ⇔ i ≠ j and there are no (non-reflexive) relations within {a_1, …, a_n} and {b_1, …, b_n} respectively. [Hasse diagrams of S_1, S_2, S_3, S_4: the elements a_1, …, a_n drawn below b_1, …, b_n, with a_i < b_j exactly when i ≠ j.]
Figure 34: On the left we drew five shapes between two horizontal lines. From
this we obtain a poset (shown on the right) by considering a shape less than
another shape if it is left of it.
In the following we ask: What kind of posets can be represented in such a
way and what shapes are required?
Observation 5.12 (Straight Lines). If P is 2-dimensional then P can be rep-
resented by straight segments spanned between the two lines.
To see this, assume P = L_1 ∩ L_2 for two linear orders L_1, L_2. Then place the elements of P on l_1 in the order given by L_1, also place them onto l_2 in the order given by L_2 and connect the two points corresponding to the same element. This gives a line segment s_x for each x ∈ X. Now observe: the segment s_x lies entirely to the left of s_y (in particular the two segments are disjoint) if and only if x comes before y on both lines, i.e. if and only if x <_P y. So the segments represent P.
The reverse holds as well, i.e. any poset represented by straight segments
spanned between l1 and l2 is at most two-dimensional and a realizer is given by
sorting the segments according to their endpoints on l1 and l2 .
The second type of object we consider are axis aligned rectangles. Note that
since those rectangles must be tangential to l1 and l2 , they are already uniquely
determined by their leftmost and rightmost x-coordinate. In fact, the setting is
not “really” two-dimensional and is easily seen to be equivalent to the setting
of interval orders where:
An interval order P = (X, ≤) is given by a set X of open bounded intervals
in R with (a, b) <P (c, d) iff b ≤R c.
We remark that not all interval orders are 2-dimensional but we will not
prove this until later. We first characterize them in terms of a forbidden sub-
poset:
Theorem 5.13 (Fishburn). P = (X, ≤) is an interval order if and only if 2 ⊕ 2 ⊄ P, i.e. there is no copy of the poset 2 ⊕ 2 (two disjoint 2-chains without relations between them) contained in P.
Proof. "⇒" Assume there is a copy of 2 ⊕ 2, labeled such that a < b and c < d (with no relations between {a, b} and {c, d}).
Given the observation, we can order the downsets D = {D(x) | x ∈ X} by strict inclusion, i.e. ∅ = D_0 ⊂ D_1 ⊂ D_2 ⊂ … ⊂ D_µ. For x ∈ X we choose the interval (a_x, b_x) such that D(x) = D_{a_x} and

b_x = min{β | x ∈ D_β} if x ∉ max(P),    b_x = µ + 1 if x ∈ max(P).
Figure 35: Example for the construction in Fishburn's Theorem. The poset on the left contains no 2 ⊕ 2. We determine all downsets D_0 ⊂ … ⊂ D_4. We pick the intervals as the Theorem suggests; for instance, the interval for d is (1, 3) since D_1 is the downset of d and D_3 is the first downset containing d.
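The construction can be carried out mechanically. The following Python sketch (function name interval_representation is ours) assumes the poset is given by a strict comparison function and contains no 2 ⊕ 2, so that its downsets are linearly ordered by inclusion; it returns intervals (a_x, b_x) exactly as chosen above. The example input is an interval order given by actual intervals under "completely left of".

def interval_representation(elements, less):
    # downsets D(x) = {y | y < x}; with no 2 (+) 2 they form a chain under inclusion
    elements = list(elements)
    down = {x: frozenset(y for y in elements if less(y, x)) for x in elements}
    levels = sorted(set(down.values()), key=len)     # D_0 ⊂ D_1 ⊂ ... ⊂ D_mu
    index = {D: i for i, D in enumerate(levels)}
    mu = len(levels) - 1
    result = {}
    for x in elements:
        above = [i for i, D in enumerate(levels) if x in D]
        result[x] = (index[down[x]], min(above) if above else mu + 1)
    return result

I = {'a': (0, 1), 'b': (0, 2), 'c': (1, 3), 'd': (2, 4), 'e': (3, 5)}
less = lambda x, y: I[x][1] <= I[y][0]               # "completely left of"
print(interval_representation(I, less))
# {'a': (0, 1), 'b': (0, 2), 'c': (1, 3), 'd': (2, 4), 'e': (3, 4)}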
We now consider the case where the objects are (open) triangles spanned
between l1 and l2 with the base on l1 and tip on l2 (see pictures below).
Theorem 5.14. P is a triangle order if and only if there is a linear extension
of P that orders all copies of 2 ⊕ 2.
Here we say a 2 ⊕ 2 is ordered by a linear order L if L puts the elements of one chain both before the elements of the other chain. In other words, if the 2 ⊕ 2 is labeled such that a < b and c < d, then we want either a <_L b <_L c <_L d or c <_L d <_L a <_L b.
Proof. “⇒” If P is a triangle order then we claim that taking L as the order
of the tips of the triangles from left to right works. Note that L is a
total order (obviously) and a linear extension of P : If a triangle is left of
another triangle (T1 <P T2 ) , then in particular its tip is left of the other
triangle (T1 <L T2 ).
Now consider a copy of 2 ⊕ 2 with the same labeling as above.
Then a, b, c, d correspond to triangles Ta , Tb , Tc , Td and Ta is left of Tb .
Figure 36: Venn diagram showing the relationships between some types of orders we consider here and in the following (triangle orders ⊆ convex orders ⊆ all orders).
Since x is the most recent element, it comes last in L, and since L orders
every copy of 2 ⊕ 2 we have t <L u <L y <L x and the picture above was
actually misleading. In truth we are in this situation:
[Sketch: the tips appear in the order t, u, y, x from left to right.]
closed intervals with end points in [n] has dimension at least log log n + (1/2 + o(1)) log log log n.
Theorem 5.15. Every poset P can be represented by y-monotone curves spanned
between l1 and l2 .
Proof. Take any realizer of P , i.e. P = L1 ∩ L2 ∩ . . . ∩ Lk for linear orders
L1 , . . . , Lk . Assume without loss of generality k ≥ 2 and introduce k − 2 lines in
between l_1 and l_2. This gives k lines in total. On the i-th line we distribute the elements of P in increasing order according to L_i. We then connect all points belonging to the same element with straight segments.
[Illustration: k = 5 lines between l_1 (bottom, ordered by L_1) and l_2 (top, ordered by L_5), each intermediate line ordered by the corresponding L_i; the copies of each of the elements x, w, y, z are joined into a y-monotone curve.]
Now it is obvious that x <_{L_i} y for each i if and only if the curve for x is left of the curve for y.
Lemma 5.16. There is a 3-dimensional order that is not a convex order.
Proof. Here is one such poset. It is a modified standard example:
[Hasse diagram of the modified standard example on the elements a_1, a_2, a_3, b_1, b_2, b_3, c_1, c_2, c_3, d_1, d_2, d_3.]
Assume it can be represented by convex shapes Ca1 , Ca2 , . . . , Cd3 that are spanned
between l1 and l2 . Note that this implies that each such shape Cx contains a
straight line Sx that is spanned between l1 and l2 .
Since a_i ∥ d_i for i = 1, 2, 3 the shapes C_{a_i} and C_{d_i} intersect. So take a
point xi ∈ Cai ∩ Cdi . Assume without loss of generality that x1 has the middle
y-coordinate. With respect to the straight line from x2 to x3 , we have that x1
is either left of it (red area) or right of it (blue area). [Sketch: the points x_2 and x_3 between l_1 and l_2 and the line through them.]
Assume x1 is on the left. Then since c1 > a2 and c1 > a3 , the shape Cc1 must
be right of x2 and x3 (since x2 ∈ Ca2 , x3 ∈ Ca3 ), but since c1 < d1 it must also
be left of x1 (since x1 ∈ Cd1 ). The same holds for the straight line Sc1 . Clearly
such a straight line does not exist.
If x1 is on the right instead, we run into the same problem with the line Sb1 :
It has to be right of x1 but left of x2 and x3 .
5.3 Sets of Sets and Multisets – Lattices
We now consider posets whose elements are sets ordered by inclusion meaning
A ≤ B iff A ⊆ B (we will prefer the latter notation). Usually the sets we consider
are subsets of [n] and the most important example is the Boolean lattice Bn ,
the family of all subsets of [n]. Take for instance B_3, whose Hasse diagram has {1, 2, 3} at the top, the three 2-element subsets below it, then the three singletons, and ∅ at the bottom. Two sets are incomparable if neither is included in the other. For instance {1} ∥ {2, 3} and {1, 2} ∥ {1, 3}. Note also that {{1, 2}, {2, 3}, {1, 3}} is a largest antichain in B_3.
The next Theorem generalizes this observation.
Theorem 5.17 (Sperner's Theorem). w(B_n) = \binom{n}{⌈n/2⌉}.

Proof. Note that two different sets of the same size are never included in one another. So taking all k-subsets of [n] (i.e. all subsets of size k) yields an antichain of size \binom{n}{k}. Choosing k = ⌈n/2⌉ maximizes its size and proves the lower bound.
To prove the upper bound, we introduce a new notion: For a permutation π
of [n] we say that π meets a set A ⊆ [n] if the elements of A form a prefix of π,
meaning:
A = {π(1), π(2), . . . , π(|A|)}.
For instance, the permutation π = 24513 meets {2, 4} and {1, 2, 4, 5} but does
not meet {4, 5}.
Now consider an antichain F. We count the number of permutations that
meet an element of F. Note that the sets met by a single permutation π form
a chain, so no π can meet several elements of F.
So clearly the sets {π | π meets A} are disjoint for different A ∈ F and we have:

∑_{A∈F} |{π | π meets A}| ≤ n!.
For any given A ⊆ [n] there are |A|!(n − |A|)! permutations that meet A: We
know that the first |A| elements of such a π are given by A and can be arranged
in |A|! ways, and the remaining n − |A| elements can be arranged in (n − |A|)!
ways. So we have:

∑_{A∈F} |A|! (n − |A|)! ≤ n!
⇔ ∑_{A∈F} 1/\binom{n}{|A|} ≤ 1
⇒ ∑_{A∈F} 1/\binom{n}{⌈n/2⌉} ≤ 1
⇔ |F| ≤ \binom{n}{⌈n/2⌉}.

In the first step we divided both sides by n!, then we used that k = ⌈n/2⌉ maximizes \binom{n}{k} and thus made the term under the sum independent of A.
Remark. Note that if F actually is a largest antichain in B_n, then all inequalities used in the above proof must be tight. In particular, each \binom{n}{|A|} must actually have been equal to \binom{n}{⌊n/2⌋} = \binom{n}{⌈n/2⌉}. So only sets of size ⌊n/2⌋ or ⌈n/2⌉ are used in F. From this it is easy to see that:

• If n is even, then F = \binom{[n]}{n/2} is the unique largest antichain.
• If n is odd, then all largest antichains are contained in \binom{[n]}{⌊n/2⌋} ∪ \binom{[n]}{⌈n/2⌉}.
Proof. For any element A ∈ F, the complement [n] \ A is disjoint from A and hence cannot be in F. So taking complements is an injection that maps elements of F to elements not in F. Therefore |F| ≤ 2^n − |F| and thus |F| ≤ 2^{n−1}.
To attain this bound, consider F_x = {A ⊆ [n] | x ∈ A} for some fixed x ∈ [n]. Clearly this is an intersecting family of size 2^{n−1}.
So this problem has a fairly straightforward and boring solution. But what
about the maximum cardinality of an intersecting k-family, i.e. we only allow
sets of size k? If k = 1, then we can clearly only pick one set (any two different
sets of size 1 do not intersect). If k = 2, then it is easy to see that n − 1 is best
possible using the sets {1, 2}, {1, 3}, {1, 4}, . . . , {1, n}. Is the best strategy still
the “obvious” one in general? It turns out the answer is “yes”, but the proof is
a bit more involved.
Theorem 5.19 (Erdős-Ko-Rado). For two integers n and k with 0 < k ≤ n/2, the maximum size of an intersecting k-family of [n] is \binom{n−1}{k−1}.
Proof. The lower bound is easy, just take Fxk = {A ⊆ [n] | x ∈ A, |A| = k}, for
some fixed x ∈ [n]. Clearly, Fxk is an intersecting family of the desired size.
For the upper bound we use a similar approach as in Sperner’s Theorem.
Recall that a circular permutation σ of [n] is an arrangement of the numbers
1, . . . , n on a circle where we do not distinguish between circular shifts, e.g.
53142 ≡ 31425. The number of circular permutations of [n] is (n − 1)!.
We say a circular permutation σ meets a set A ⊆ [n] if the elements of A
appear consecutively on σ. For instance A = {1, 2, 4} is met by σ = 25341 but
B = {2, 3, 4} is not.
Now fix F to be an intersecting k-family.
Claim. For any circular permutation σ, the number of sets A ∈ F met by σ is at most k.

Proof by Picture. Fix one k-set A_0 met by σ; we draw it in red (here k = 5, n = 12). [Picture omitted.]

We now count the pairs S := {(σ, A) | A ∈ F, σ meets A} in two ways.

First way: by the claim, each of the (n − 1)! circular permutations meets at most k sets of F, so |S| ≤ k · (n − 1)!.

Second way:

|S| = ∑_{A∈F} |{σ | σ meets A}| = ∑_{A∈F} k!(n − k)! = |F| · k!(n − k)!.

Comparing the two counts gives |F| ≤ k (n − 1)! / (k!(n − k)!) = \binom{n−1}{k−1}.
5.3.1 Symmetric Chain Partition
We have already seen in Sperner's Theorem that the width of the Boolean lattice B_n is \binom{n}{⌈n/2⌉}. Using Dilworth's Theorem, this means B_n can be partitioned into \binom{n}{⌈n/2⌉} chains. We are going to prove a version of this that is stronger in two ways.
Note that B(1, …, 1) (with k ones) = B_k, since in that case M = {1, …, k}.
Figure 38 depicts B(1, 2, 1), a three dimensional poset that is similar to B3
except it has an additional “plane” since we can have the element 2 not only
zero or one times, but also twice.
There are many other ways to explain what these posets are, maybe you
prefer one of the following perspectives:
• The poset B(m_1, …, m_k) is isomorphic to the poset of all divisors of p_1^{m_1} · p_2^{m_2} · … · p_k^{m_k} ordered by divisibility, where p_1, …, p_k are distinct primes. This is illustrated for B(1, 2, 1) and p_1 = 2, p_2 = 3, p_3 = 5 on the right of Figure 38.
• B(m1 , . . . , mk ) is the product of chains:
B(m1 , . . . , mk ) = C1 × . . . × Ck
• Using particularly easy chains we can view B(m1 , . . . , mk ) as the set [m1 +
1] × [m_2 + 1] × … × [m_k + 1] with the dominance order on R^k.
Note that B(m1 , . . . , mk ) is ranked. A set A = {r1 · 1, . . . , rk · k} has rank
rank(A) = r1 +. . .+rk +1 and the rank of the entire poset is rank(B(m1 , . . . , mk )) =
m1 + . . . + mk + 1.
Theorem 5.21. B(m1 , . . . , mk ) has a symmetric chain decomposition.
Proof. We do induction on k. If k = 1, then B(m1 ) is a (symmetric) chain of
length m1 + 1. So the trivial partition works.
For k ≥ 2, consider the subposet P = B(m1 , . . . , mk−1 , 0) of B(m1 , . . . , mk ),
consisting of those multisets with repetition number 0 for type k. Clearly P is
isomorphic to B(m1 , . . . , mk−1 ) and has, by induction hypothesis, a symmetric
chain decomposition.
[Figure 37: Symmetric chain partitions of the Boolean lattices B_1, B_2, B_3 and B_4.]
Figure 38: Left: The Hasse Diagram of the poset B(1, 2, 1).
Right: Divisors of 90 = 21 · 32 · 51 ordered by divisibility.
Now the idea is really simple, see Figure 39. Given a symmetric chain
decomposition of P, we first partition B(m_1, …, m_k) into “curtains” that run
along the chains in P and then we partition the curtains (which are essentially
two dimensional grids) into symmetric chains. Now for the formal argument.
If P = C1 ∪ . . . ∪ CR is a partition into symmetric chains, then B(m1 , . . . , mk ) =
Cur1 ∪ . . . ∪ CurR is a partition into R sets where
Curi := {A ∪ {j · k} | A ∈ Ci , 0 ≤ j ≤ mk }.
Write C = {A_1 ⊂ A_2 ⊂ … ⊂ A_l} for one of the chains C_i and consider the corresponding curtain, which we partition into "hooks" as indicated in Figure 40; the first hook is A_1, A_2, …, A_l, A_l ∪ {k}, A_l ∪ {2·k}, …, A_l ∪ {m_k · k}. Its minimum has rank |A_1| + 1 and its maximum has rank |A_l| + m_k + 1. Remember that C was symmetric in P, so we had (|A_1| + 1) + (|A_l| + 1) = rank(P) + 1 and get:

(|A_1| + 1) + (|A_l| + m_k + 1) = rank(P) + 1 + m_k = rank(B(m_1, …, m_k)) + 1.

This proves that the first hook is a symmetric chain. Subsequent hooks have their minima at higher ranks but the ranks of the maxima are correspondingly lower, so it is easy to see that they are symmetric as well.
Figure 37 shows the results of this construction for the Boolean lattices B1 ,
B2 , B3 and B4 . You may want to verify your understanding by constructing
these symmetric chain partitions yourself. Note that in the case of Boolean
lattices all curtains have height 2 and will therefore be partitioned into only one
or two hooks.
Figure 39: On the left: Partition of B(2, 3) into three symmetric chains. On
the right: The corresponding partition of B(2, 3, 5) into three “curtains”.
To describe the chain containing a given set A ⊆ [n] explicitly, view its characteristic vector as a bracket sequence, reading every 0 (a position not in A) as an opening and every 1 (a position in A) as a closing bracket, and match brackets in the usual way. For example, for n = 12 and A = {1, 2, 7, 8, 10, 11}:

1 1 0 0 0 0 1 1 0 1  1  0
1 2 3 4 5 6 7 8 9 10 11 12

The matched positions are those elements that do not vary within the chain. The matched 1s form the minimum m of the chain, here m = {7, 8, 10, 11}; the matched 0s are the elements missing from the maximum of the chain, so here M = [12] \ {4, 5, 6, 9} = {1, 2, 3, 7, 8, 10, 11, 12}. The unmatched positions, here {1, 2, 3, 12}, will vary within the chain and are added from left to right. So in the symmetric chain partition of B_n the set A will be part of the symmetric chain

{7, 8, 10, 11} ⊂ {1, 7, 8, 10, 11} ⊂ {1, 2, 7, 8, 10, 11} ⊂ {1, 2, 3, 7, 8, 10, 11} ⊂ {1, 2, 3, 7, 8, 10, 11, 12}.
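The matching rule fits in a few lines of Python (the function name symmetric_chain is ours). Given A and n it returns the chain of the partition that contains A, reproducing the example above.

def symmetric_chain(A, n):
    # positions not in A are opening brackets, positions in A are closing brackets
    open_zeros, matched_ones = [], []
    for i in range(1, n + 1):
        if i not in A:
            open_zeros.append(i)
        elif open_zeros:
            matched_ones.append(i)
            open_zeros.pop()            # i is matched with the nearest open position
    unmatched = sorted((set(A) - set(matched_ones)) | set(open_zeros))
    chain = [sorted(matched_ones)]      # the minimum of the chain
    for i in unmatched:                 # unmatched positions are added from left to right
        chain.append(sorted(chain[-1] + [i]))
    return chain

for S in symmetric_chain({1, 2, 7, 8, 10, 11}, 12):
    print(S)
# [7, 8, 10, 11], [1, 7, 8, 10, 11], ..., [1, 2, 3, 7, 8, 10, 11, 12]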
[Figure 40 shows a curtain drawn as a grid: its elements are the sets A_i ∪ {j · k} for 1 ≤ i ≤ l and 0 ≤ j ≤ m_k. The first hook consists of A_1, A_2, …, A_l, A_l ∪ {k}, …, A_l ∪ {m_k · k}; the subsequent hooks are the analogous L-shaped paths through the remaining grid.]
Figure 40: The elements of a curtain can be partitioned into symmetric chains
as shown.
The following poset P is not a lattice although it has a maximum and mini-
mum. Note that a and b have no join, the upper bounds of a and b are {c, d, 1}
but there is no least upper bound (since c and d are incomparable). For similar
reasons, c and d have no meet. If we modify P by adding an element x as shown
we obtain a lattice L. In it, we have for instance a ∨ b = x, c ∧ d = x.
[Hasse diagrams: P has a bottom element 0, a top element 1, and a, b < c, d in between (with a ∥ b and c ∥ d); L is obtained from P by inserting a new element x with a, b < x < c, d.]
Note that for two elements x, y of a lattice with x ≤ y we always have x ∧ y = x
and x ∨ y = y. This also implies that 1 is the neutral element of the ∧-operation
and 0 is the neutral element of the ∨ operation. The Boolean lattices Bn really
are lattices where ∧ = ∩ and ∨ = ∪, since A∪B really is the least set containing
A and B and A ∩ B is the largest set contained in A and B.
We now consider the Young lattice Y(m, n). Its elements are Ferrers diagrams with at most m rows and at most n columns, ordered by inclusion. Figure 41 shows Y(2, 3).
6 Designs
Assume you are the leader of a local brewery and have just invented seven
kinds of new beers. You are pretty sure all of them are awesome but are curious
whether other people share your opinion. There are some experts, each of which
can judge 3 beers (beyond that point they become tipsy and you do not trust
their judgment).
You want to make sure that each beer is evaluated in contrast to each other
beer, i.e. for each pair of beers there should be one expert that tries both beers.
Actually, make sure that there is exactly one expert for each pair, since experts
are expensive and you don’t have any money to waste.
So can you assign beers to the experts and meet this requirement? It turns
out your problem has a solution with seven experts as shown in Figure 42.
Figure 42: The beers b1 , . . . , b7 are represented by points. Each set of beers that
one expert tries is represented by a straight line or circle. Note that any pair of
points uniquely determines a line or circle containing that pair.
Note that any pair of points x ≠ y uniquely determines a third point z such that their sum is zero, namely z = x + y (note that x = −x in F_2^4, so x ≠ −y and z = x + y ≠ 0). So B is a 2-(v = 15, k = 3, λ = 1) design.
S := {(T, B) | B ∈ B, T ⊆ B, |T| = t}.

Firstly, each set T ⊆ V of size t (and there are \binom{v}{t} of those) is contained in exactly λ blocks, since that is what a design requires. So |S| = \binom{v}{t} λ. Secondly, each block contains \binom{k}{t} subsets of size t, so |S| = |B| · \binom{k}{t}. Together we get |B| · \binom{k}{t} = \binom{v}{t} λ, from which the claim follows.
(ii) Fix an i-set I and double count the set

S := {(T, B) | B ∈ B, I ⊆ T ⊆ B, |T| = t}.

Firstly, there are \binom{v−i}{t−i} sets T of size t containing I, and for each such T there are λ blocks B containing T. This gives |S| = \binom{v−i}{t−i} · λ. Secondly, for each of the r_I blocks B with I ⊆ B there are \binom{k−i}{t−i} sets T of size t with I ⊆ T ⊆ B. This gives |S| = r_I · \binom{k−i}{t−i}. Together we obtain \binom{v−i}{t−i} · λ = r_I · \binom{k−i}{t−i}. From this we see that r_I actually only depends on |I| = i, and the claim follows.
Remark. We consider t = 2 to be the default case. Sometimes a 2-(v, k, λ)-design is just called a (v, k, λ)-design. In the other notation the parameter λ = 1 can be omitted, so an S_1(t, k, v)-design is simply an S(t, k, v)-design. In that case we also call it a Steiner system (hence the "S").
Corollary 6.4. If B is a (v, k, λ)-design (so t = 2) the results of the last Theorem become

(i) |B| = λ v(v − 1)/(k(k − 1)) = r · v/k, where r is given by

(ii) r = r_1 = λ (v − 1)/(k − 1).
Remark 6.5. For any t-(v, k, λ)-design all the numbers we derived above, such as |B| and r_i (1 ≤ i ≤ t), are integers. In particular, if some choice of parameters t, k, v, λ does not yield integers, no design with those parameters exists.
For example, in any (v, k = 3, λ = 1)-design we have r_1 = (v − 1)/2, which is an integer only if v is odd, and |B| = (v − 1)v/6, which is an integer only if v ≡ 0, 1, 3, 4 (mod 6); so together we need v ≡ 1, 3 (mod 6). Note that for v = 3 we get the trivial design and for v = 7 we get the Fano plane, so things seem to fit so far.
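These divisibility conditions are easy to check mechanically. A small Python sketch (the function name admissible is ours) tests, for given t, v, k, λ, whether all the numbers λ·\binom{v−i}{t−i}/\binom{k−i}{t−i} for i = 0, …, t are integers:

from math import comb

def admissible(t, v, k, lam):
    # r_i = lam * C(v-i, t-i) / C(k-i, t-i) must be an integer for i = 0, ..., t
    return all((lam * comb(v - i, t - i)) % comb(k - i, t - i) == 0 for i in range(t + 1))

# Steiner triple systems (2-(v, 3, 1) designs) can only exist for v ≡ 1, 3 (mod 6):
print([v for v in range(3, 40) if admissible(2, v, 3, 1)])
# [3, 7, 9, 13, 15, 19, 21, 25, 27, 31, 33, 37, 39]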
However, the necessary conditions for the existence of designs are not yet
sufficient. In the following Theorem we derive another necessary condition, this
time a lower bound on v.
Theorem 6.6. In any non-trivial t-(v, k, λ = 1)-design we have v ≥ (t + 1)(k −
t + 1).
[Illustration: a (t + 1)-set S = {a, b, c, d} (here t = 3) together with the blocks B_T for the four t-subsets T of S, e.g. B_0 = B_{a,b,c}.]
This means for each of the blocks B_T (and there are t + 1 of them) that B_T contains k − t elements outside of S that are not contained in any other B_{T′}. Together with the elements from S this gives v ≥ (t + 1) + (t + 1)(k − t) = (t + 1)(k − t + 1).
A · A^T is the v × v matrix with r in every diagonal entry and λ in every off-diagonal entry:

A · A^T =
( r  λ  ⋯  λ )
( λ  r  ⋯  λ )
( ⋮      ⋱  ⋮ )
( λ  ⋯  λ  r )
• A symmetric (v, k, 1)-design is called a projective plane. The Fano plane from Figure 42 is such a projective plane with 7 points and 7 blocks ("lines").
B := {D, 1 + D, 2 + D, . . . , v − 1 + D}
Example. If v is a prime power with v ≡ 3 (mod 4) then

{a² | a ∈ Z_v, a² ≠ 0}

is a (v, k, λ)-difference set with k = (v − 1)/2 and λ = (v − 3)/4. We do not prove this.
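Although we do not prove the statement, small cases are easy to check. A Python sketch (the helper name difference_counts is ours), here for v = 11:

from collections import Counter

def difference_counts(D, v):
    # how often does each non-zero difference occur as x - y (mod v) with x, y in D?
    return Counter((x - y) % v for x in D for y in D if x != y)

v = 11
D = sorted({(a * a) % v for a in range(1, v)})   # quadratic residues modulo 11
print(D)                                          # [1, 3, 4, 5, 9], so k = 5
print(difference_counts(D, v))                    # every non-zero residue occurs λ = 2 times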
Note the same point can be described in different ways, for instance [x_0, x_1, x_2] = [2x_0, 2x_1, 2x_2] (for q ≠ 2). Since |X| = q³ − 1 and q − 1 elements represent the same point (are in the same class), we have (q³ − 1)/(q − 1) = q² + q + 1 points in total, as desired.
For (a_0, a_1, a_2) ∈ F_q^3 \ {0} we define the line L([a_0, a_1, a_2]) as the set of all points [x_0, x_1, x_2] with a_0 x_0 + a_1 x_1 + a_2 x_2 = 0.
Note that lines are well defined, that is, the definition respects the equivalence classes (either all elements representing a point satisfy the equation of the line or none of them).
The number of solutions to a_0 x_0 + a_1 x_1 + a_2 x_2 = 0 is q², since, without loss of generality, a_2 ≠ 0 and thus x_0 and x_1 can be chosen arbitrarily and x_2 is then uniquely determined. Disregarding the solution x_0 = x_1 = x_2 = 0 (which does not represent any point) we have q² − 1 solutions in X, meaning (q² − 1)/(q − 1) = q + 1 points are part of the line, as desired.
Now consider two points [x_0, x_1, x_2] ≠ [y_0, y_1, y_2]. We show that they are contained in a unique line. Indeed, if L([a_0, a_1, a_2]) contains both, then we have a_0 x_0 + a_1 x_1 + a_2 x_2 = 0 and a_0 y_0 + a_1 y_1 + a_2 y_2 = 0. This is a homogeneous system with two (linearly independent) equations and three variables a_0, a_1, a_2, so there exists a solution (a_0, a_1, a_2) ≠ 0 and all solutions are of the form c · (a_0, a_1, a_2). All those solution triples define the same line, so L([a_0, a_1, a_2]) is uniquely determined.
This concludes the construction (and the verification thereof).
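For a prime q the construction can be carried out directly. The following Python sketch (the function name projective_plane is ours) builds the points and lines of PG(2, q) for q = 3 and checks the design properties; it assumes q is prime so that inverses modulo q exist.

from itertools import combinations

def projective_plane(q):
    def normalize(v):
        # canonical representative of the class [v]: scale so the first non-zero entry is 1
        pivot = next(x for x in v if x != 0)
        inv = pow(pivot, -1, q)
        return tuple((x * inv) % q for x in v)

    nonzero = [(x, y, z) for x in range(q) for y in range(q) for z in range(q)][1:]
    points = sorted(set(normalize(v) for v in nonzero))
    lines = [frozenset(p for p in points
                       if sum(a * c for a, c in zip(A, p)) % q == 0)
             for A in points]                      # lines are also indexed by classes [a0, a1, a2]
    return points, lines

points, lines = projective_plane(3)
print(len(points), len(lines), {len(L) for L in lines})    # 13 13 {4}
print(all(sum(1 for L in lines if x in L and y in L) == 1
          for x, y in combinations(points, 2)))            # True: every pair on a unique line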
Remark. It is conjectured that the order of every projective plane is a prime
power, but no proof is known.
type 1: {(x, 0), (x, 1), (x, 2)} for each x ∈ Z_{2n+1},
type 2: {(x, i), (y, i), ((x + y)/2, i + 1)} for all x ≠ y and each i ∈ Z_3.
Case 3: x ≠ y, i ≠ j: Assume without loss of generality that j = i + 1 (since i, j ∈ Z_3 we always have this or j = i − 1). Then the pair is contained in the block {(x, i), (y′, i), ((x + y′)/2, i + 1)} where y′ = 2y − x. The pair is contained in no other block.
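The case analysis above can also be checked mechanically; here is a Python sketch of this v ≡ 3 (mod 6) construction (the function name bose_sts is ours), run for n = 2, i.e. v = 15:

from itertools import combinations

def bose_sts(n):
    # point set Z_{2n+1} x Z_3, so v = 3(2n+1); blocks of type 1 and type 2 as above
    m = 2 * n + 1
    half = pow(2, -1, m)                      # 2 is invertible since m is odd
    blocks = set()
    for x in range(m):                        # type 1
        blocks.add(frozenset({(x, 0), (x, 1), (x, 2)}))
    for i in range(3):                        # type 2
        for x in range(m):
            for y in range(x + 1, m):
                z = ((x + y) * half) % m
                blocks.add(frozenset({(x, i), (y, i), (z, (i + 1) % 3)}))
    return blocks

B = bose_sts(2)
print(len(B), 15 * 14 // 6)                   # both 35: the right number of blocks
points = [(x, i) for x in range(5) for i in range(3)]
assert all(sum(1 for b in B if p in b and q in b) == 1
           for p, q in combinations(points, 2))   # every pair covered exactly once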
This concludes the case of v ≡ 3 (mod 6), we proceed with v ≡ 1 (mod 6).
The construction and verification is more complicated.
As point set, choose V = (Z_{2n} × Z_3) ∪ {∞}. To simplify notation we write x_i to denote the pair (x, i) ∈ Z_{2n} × Z_3. The element ∞ is special and we assume x + ∞ = ∞ for any x ∈ V. Before we define the blocks of the Triple System, we define four types of base blocks first:

• {0_0, 0_1, 0_2},
• {∞, 0_0, n_1}, {∞, 0_1, n_2}, {∞, 0_2, n_0},
Note that projective planes are examples for non-resolvable designs since no
disjoint blocks exist. An example for a resolvable (v = 4, k = 2, λ = 1)-design
with the corresponding parallel classes is shown in the next picture, where the
points are A, B, C, D and the blocks are depicted by edges.
[Picture: the point set {A, B, C, D}; the six blocks (drawn as edges) split into the three parallel classes {AB, CD}, {AC, BD}, {AD, BC}.]
This is a special case of a (v = q 2 , k = q, λ = 1)-design. Such designs are called
affine planes and are, as we show now, always resolvable.
Theorem 6.15. Any (v = q 2 , k = q, λ = 1)-design is resolvable.
Proof. The main observation is the following:

Claim. For each block B and each point x ∉ B there is a unique block B′ with x ∈ B′ and B ∩ B′ = ∅.
Proof of Claim. For each y ∈ B there is a unique block B_y such that {x, y} ⊆ B_y. These blocks are distinct (since |B_y ∩ B| ≤ 1) and all contain x. Because of r = λ(v − 1)/(k − 1) = (q² − 1)/(q − 1) = q + 1 (Corollary 6.4), there is exactly one block left that contains x and is different from each B_y. It is disjoint from B, since any point it shared with B would be some y ∈ B, making it equal to B_y.
With the claim proved, everything falls into place. Start with a maximal set
of pairwise disjoint blocks. This set forms a parallel class (this is not obvious, but
easy to prove). Then, in the next phase, take some other set of disjoint blocks
not yet considered. This forms a parallel class in the same way. Continue like
this until all blocks are handled (it is easy to verify that this works).
and A(0) for instance). At least the latter operation seems uninteresting, as renaming numbers yields the same partition, just with different labels. To capture the idea of "entirely different" Latin squares, we propose the notion of orthogonality defined in the following.
Let A, B be Latin squares of order n with entries A = (a_ij)_{i,j}, B = (b_ij)_{i,j}. The juxtaposition A ⊗ B of A and B is the n × n array where each position simply contains the corresponding numbers of A and B, i.e. the entry of A ⊗ B in position (i, j) is the pair (a_ij, b_ij). We call A and B orthogonal if every pair of numbers from Z_n × Z_n occurs in A ⊗ B (necessarily exactly once).
Note that this notion of orthogonality has little to do with the geometric notion of orthogonality you may be familiar with. Observe the following:
• If A is orthogonal to B, then B is orthogonal to A.
• If A is orthogonal to B, then “renaming” numbers in A or B (for instance
replace every 0 with a 1 and vice versa) preserves orthogonality.
Remark. For n ∈ {2, 6} there are no two orthogonal Latin squares of order n. For n = 2 this is easy to see, since the only Latin squares of order 2 are

L_1 = 0 1        L_2 = 1 0
      1 0              0 1

They are not orthogonal to each other nor to themselves (only for n = 1 can a Latin square be orthogonal to itself). For n = 6 the argument is non-trivial.
For n ∈ N \ {2, 6}, a pair of orthogonal Latin squares of order n exists. The
proof of this is highly non-trivial, in fact it took over a hundred years to show
that there is a pair of orthogonal Latin squares of order 10.
Our goal in the following is to construct a large set A1 , . . . , Ak of Latin
squares of order n that are MOLS (mutually orthogonal Latin squares), meaning A_i is orthogonal to A_j for i ≠ j.
Theorem 6.16. Let n be a positive integer and r ∈ [n − 1] non-zero and co-prime to n, i.e. gcd(n, r) = 1. Then L^r_n = (r · i + j (mod n))_{i,j} is a Latin square.
To clarify how L^r_n looks, consider the example n = 5 and r = 2. "Going right" corresponds to +1 and going down corresponds to +2, which gives

        3 4 0 1 2
        0 1 2 3 4
L^2_5 = 2 3 4 0 1
        4 0 1 2 3
        1 2 3 4 0.
Proof. We have to show that each row and column contains each number exactly once. For rows this is clear, since the i-th row contains r·i + 1, r·i + 2, …, r·i + n, which traverses all numbers modulo n. The column j contains r + j, 2r + j, …, n·r + j, all modulo n. Assume two of those numbers are identical, say i_1·r + j ≡ i_2·r + j (mod n); then (i_1 − i_2)·r ≡ 0 (mod n). Since gcd(n, r) = 1 (meaning r has an inverse modulo n) we actually have i_1 = i_2. So no column contains a number twice and L^r_n really is a Latin square of order n.
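A quick Python sketch of Theorem 6.16 (the helper names L and is_latin are ours); it reproduces the square L^2_5 from the example and shows that the co-primality condition is needed:

def L(n, r):
    # the square L^r_n = (r*i + j mod n), indexed by i, j = 1, ..., n as in the example
    return [[(r * i + j) % n for j in range(1, n + 1)] for i in range(1, n + 1)]

def is_latin(square):
    n = len(square)
    rows_ok = all(len(set(row)) == n for row in square)
    cols_ok = all(len(set(col)) == n for col in zip(*square))
    return rows_ok and cols_ok

for row in L(5, 2):
    print(row)                                   # reproduces L^2_5 from above
print(is_latin(L(5, 2)), is_latin(L(6, 2)))      # True False (gcd(6, 2) != 1)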
Theorem 6.17. If n is prime, then L^1_n, …, L^{n−1}_n are n − 1 MOLS of order n.

Proof. Since n is prime, gcd(n, i) = 1 for all i ∈ [n − 1], so by Theorem 6.16 L^1_n, …, L^{n−1}_n are Latin squares of order n.
Now consider two of those squares L^r_n and L^s_n with r ≠ s. We need to show that they are orthogonal, so suppose some pair of numbers from Z_n × Z_n appears in L^r_n ⊗ L^s_n in positions (i, j) and (k, l). By definition of L^r_n and L^s_n this gives the identity

(r·i + j, s·i + j) = (r·k + l, s·k + l).

So r·(i − k) = l − j = s·(i − k), which implies (r − s)(i − k) = 0. The numbers modulo n form a field and a product can only be zero if one of the factors is zero. Since r ≠ s we obtain i = k and therefore also l = j. In particular we showed that the same pair cannot appear in distinct positions of L^r_n ⊗ L^s_n, so L^r_n is orthogonal to L^s_n as desired.
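Reusing the helper L(n, r) from the previous sketch, orthogonality is also easy to test; for a prime n we indeed get n − 1 MOLS (the function name orthogonal is ours):

def orthogonal(A, B):
    # A and B are orthogonal iff all pairs (A[i][j], B[i][j]) are distinct
    n = len(A)
    return len({(A[i][j], B[i][j]) for i in range(n) for j in range(n)}) == n * n

n = 7
squares = [L(n, r) for r in range(1, n)]          # L^1_7, ..., L^6_7
print(all(orthogonal(squares[a], squares[b])
          for a in range(len(squares)) for b in range(a + 1, len(squares))))   # True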
Remark. For a prime power n = p^k there is a field F_n = {α_0 = 0, α_1, …, α_{n−1}} of order n. For it we can define L^{α_r}_n = (α_r · α_i + α_j)_{i,j} and generalize Theorem 6.17.
(ii) There exists a finite field of order n.
(iii) n is a prime power, i.e. n = p^k.
We need to show that each pair of distinct points (i, j) and (k, l) appears
in exactly one block. We first show that they are contained in at most one
block.
Case 1: i = k. Both points are contained in the i-th row so both are
in R(i). They cannot be contained in the same column (otherwise
they would not be distinct) and they are not both contained in any
Ar (s) since that would mean that the Latin square Ar contains the
number s twice in the i-th row. In particular, no block other than
R(i) contains both points.
Case 2: j = l. Similar to Case 1, the points are contained in C(j) and
in no other block.
Case 3: i ≠ k, j ≠ l. The points are not in the same row or column, so in no R(i) or C(j). But assume (for contradiction) that we have (i, j), (k, l) ∈ A_r(s) ∩ A_t(u) for (r, s) ≠ (t, u). Since the blocks originating from the same Latin square are disjoint (since those blocks form a partition), we have r ≠ t. So in A_r there is the number s
in both positions (i.e. at (i, j) and (k, l)) and in At there is u in
both positions. This means Ar ⊗ At has (s, u) in both positions, con-
tradicting the fact that Ar and At are orthogonal. So each pair of
positions is contained in at most one block.
S = {((i, j), (k, l), B) | B ∈ B, (i, j) ≠ (k, l), B contains (i, j) and (k, l)}.
Firstly,

|S| = ∑_{(i,j)≠(k,l)} |{B | B contains (i, j) and (k, l)}| ≤ ∑_{(i,j)≠(k,l)} 1 = \binom{n²}{2},

where the inequality uses "at most one". Secondly,

|S| = ∑_{B} |{((i, j), (k, l)) | (i, j) ≠ (k, l), B contains (i, j) and (k, l)}| = ∑_{B} \binom{|B|}{2} = (n² + n) \binom{n}{2} = \binom{n²}{2}.
Figure 44: The two different parallel classes B1 and B2 intersect like this when
arranging the points accordingly.
we can interpret the blocks R(i) as rows and C(j) as columns (i, j ∈ [n]). To simplify notation, identify the point p_ij of the design with the position (i, j) ∈ [n] × [n] in Latin squares. Then for each l ∈ {3, …, n + 1} we interpret the n blocks B_l(0), B_l(1), …, B_l(n − 1) of the parallel class B_l as a Latin square where the positions containing a number r ∈ Z_n are given by B_l(r). This really is a valid Latin square, since each B_l(r) intersects each R(i) and C(j) in exactly one point, so each number occurs exactly once in each row and column.
It is also easy to verify that the n − 1 Latin squares we get from B3 to
Bn+1 are mutually orthogonal, since for any distinct s, t ∈ {3, . . . , n + 1}
and r1 , r2 ∈ Zn we know that |Bs (r1 ) ∩ Bt (r2 )| = 1 which just means that
the pair (r1 , r2 ) ∈ Zn × Zn occurs exactly once in the juxtaposition of the
Latin square for Bs and Bt .