An Introduction To Mathematical Methods in Combinatorics
= T \ {a_1}, which contains n − 1 elements. Consequently, we have:

    U_n = 1 + U_{n−1}

This is a recurrence relation, that is, an expression relating the value of U_n with other values of U_k having k < n. It is clear that if some value, e.g., U_0 or U_1, is known, then it is possible to find the value of U_n, for every n ∈ N. In our case, U_0 is the number of comparisons necessary to find out that an element s does not belong to a table containing no element. Hence we have the initial condition U_0 = 0 and we can unfold the preceding recurrence relation:

    U_n = 1 + U_{n−1} = 1 + 1 + U_{n−2} = ... =
        = 1 + 1 + ... + 1 (n times) + U_0 = n
Recurrence relations are the other mathematical device arising in algorithm analysis. In our example the recurrence is easily transformed into a sum, but as we shall see this is not always the case. In general we have the problem of solving a recurrence, i.e., of finding an explicit expression for U_n, starting with the recurrence relation and the initial conditions. So, another large part of our efforts will be dedicated to the solution of recurrences.
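The unfolded solution U_n = n is easy to check with a few lines of code; the following Python sketch (the function name is ours, introduced only for illustration) counts the comparisons performed by sequential search:

```python
def sequential_search_comparisons(table, s):
    """Count the comparisons made by sequential search for s in table."""
    comparisons = 0
    for x in table:
        comparisons += 1          # one comparison per examined element
        if x == s:
            return comparisons    # successful search
    return comparisons            # unsuccessful search: n comparisons
```

An unsuccessful search in a table of n elements performs exactly n comparisons, in accordance with U_n = 1 + U_{n−1}, U_0 = 0.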
1.3 Binary Searching
Another simple example of analysis can be performed with the binary search algorithm. Let S be a given ordered set. The ordering must be total, as the numerical order in N, Z or R, or the lexicographical order in A*. If T = (a_1, a_2, . . . , a_n) is a finite ordered subset of S, i.e., a table, we can always imagine that a_1 < a_2 < ... < a_n and consider the following algorithm, called binary searching, to search for an element s ∈ S in T. Let a_i be the median element in T, i.e., i = ⌈(n + 1)/2⌉, and compare it with s. If s = a_i then the search is successful; otherwise, if s < a_i, perform the same algorithm on the subtable T' = (a_1, a_2, . . . , a_{i−1}); if instead s > a_i, perform the same algorithm on the subtable T'' = (a_{i+1}, a_{i+2}, . . . , a_n). If at any moment the table on which we perform the search is reduced to the empty set ∅, then the search is unsuccessful.
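The algorithm just described can be realized, for instance, in Python; the sketch below (names are ours) also counts the comparisons performed:

```python
def binary_search_comparisons(table, s):
    """Binary search for s in a sorted list; return (found, comparisons)."""
    lo, hi = 0, len(table) - 1
    comparisons = 0
    while lo <= hi:
        i = (lo + hi) // 2        # index of the median element
        comparisons += 1
        if table[i] == s:
            return True, comparisons
        if s < table[i]:
            hi = i - 1            # continue on the subtable T'
        else:
            lo = i + 1            # continue on the subtable T''
    return False, comparisons     # table reduced to the empty set
```

For a table with n = 2^k − 1 elements, no successful search takes more than k comparisons.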
Let us consider first the Worst Case analysis of this algorithm. For a successful search, the element s is only found at the last step of the algorithm, i.e., when the subtable on which we search is reduced to a single element. If B_n is the number of comparisons necessary to find s in a table T with n elements, we have the recurrence:

    B_n = 1 + B_{⌊n/2⌋}

In fact, we observe that every step reduces the table to ⌊n/2⌋ or to ⌊(n − 1)/2⌋ elements. Since we are performing a Worst Case analysis, we consider the worse situation. The initial condition is B_1 = 1, relative to the table (s), to which we should always reduce. The recurrence is not so simple as in the case of sequential searching, but we can simplify everything by considering a value of n of the form 2^k − 1. In fact, in such a case, we have ⌊n/2⌋ = (n − 1)/2 = 2^{k−1} − 1, and the recurrence takes on the form:

    B_{2^k−1} = 1 + B_{2^{k−1}−1}    or    β_k = 1 + β_{k−1}

if we write β_k for B_{2^k−1}. As before, unfolding yields β_k = k, and returning to the B's we find:

    B_n = log_2(n + 1)

by our definition n = 2^k − 1. This is valid for every n of the form 2^k − 1, and for the other values it is an approximation, a rather good approximation, indeed, because of the very slow growth of logarithms.
We observe explicitly that for n = 1,000,000, a sequential search requires about 500,000 comparisons on the average for a successful search, whereas binary searching only requires log_2(1,000,000) ≈ 20 comparisons. This accounts for the dramatic improvement that binary searching operates on sequential searching, and the analysis of algorithms provides a mathematical proof of such a fact.
The Average Case analysis for successful searches can be accomplished in the following way. There is only one element that can be found with a single comparison: the median element in T. There are two elements that can be found with two comparisons: the median elements in T' and in T''. Continuing in the same way, we find the average number A_n of comparisons as:

    A_n = (1/n) (1 + 2 + 2 + 3 + 3 + 3 + 3 + 4 + ... + (1 + ⌊log_2(n)⌋))

The value of this sum can be found explicitly, but the method is rather difficult and we delay it until later (see Section 4.7). When n = 2^k − 1 the expression simplifies:

    A_{2^k−1} = (1/(2^k − 1)) Σ_{j=1}^{k} j·2^{j−1} = (k·2^k − 2^k + 1)/(2^k − 1)

This sum also is not immediate, but the reader can check it by using mathematical induction. If we now write k·2^k − 2^k + 1 = k(2^k − 1) + k − (2^k − 1), we find:

    A_n = k + k/n − 1 = log_2(n + 1) − 1 + log_2(n + 1)/n

which is only a little better than the worst case.

For unsuccessful searches, the analysis is now very simple, since we have to proceed as in the Worst Case analysis and at the last comparison we have a failure instead of a success. Consequently, U_n = B_n.
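The closed form for A_n can be checked numerically; in the following Python sketch (names are ours) the average is computed directly from the observation that 2^{j−1} elements of the table are found with exactly j comparisons:

```python
def avg_successful_comparisons(n):
    """Average number of comparisons of binary search over the n
    successful searches: 2^(j-1) elements need exactly j comparisons."""
    total, found, j = 0, 0, 1
    while found < n:
        at_level = min(2 ** (j - 1), n - found)   # elements found at level j
        total += j * at_level
        found += at_level
        j += 1
    return total / n
```

For n = 2^k − 1 the result coincides with log_2(n + 1) − 1 + log_2(n + 1)/n.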
1.4 Closed Forms
The sign = between two numerical expressions denotes their numerical equivalence, as for example:

    Σ_{k=0}^{n} k = n(n + 1)/2

Although algebraically or numerically equivalent, two expressions can be computationally quite different. In the example, the left-hand expression requires n sums to be evaluated, whereas the right-hand expression only requires a sum, a multiplication and a halving. For n even moderately large (say n ≥ 5) nobody would prefer computing the left-hand expression rather than the right-hand one. A computer evaluates this latter expression in a few nanoseconds, but can require some milliseconds to compute the former, if only n is greater than 10,000. The important point is that the evaluation of the right-hand expression is independent of n, whilst the left-hand expression requires a number of operations growing linearly with n.
A closed form expression is an expression, depending on some parameter n, the evaluation of which does not require a number of operations depending on n. Another example we have already found is:

    Σ_{k=0}^{n} k·2^{k−1} = n·2^n − 2^n + 1 = (n − 1)·2^n + 1
Again, the left-hand expression is not in closed form, whereas the right-hand one is. We observe that 2^n = 2 · 2 · · · 2 (n times) seems to require n − 1 multiplications. In fact, however, 2^n is a simple shift in a binary computer and, more in general, every power α^n = exp(n ln α) can always be computed with the maximal accuracy allowed by the computer in constant time, i.e., in a time independent of α and n. This is because the two elementary functions exp(x) and ln(x) have the nice property that their evaluation is independent of their argument. The same property holds true for the most common numerical functions, as the trigonometric and hyperbolic functions, the Γ and ψ functions (see below), and so on.
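The difference between the two kinds of expression is easy to see in a program; the following Python sketch (function names are ours) evaluates both sides of the last identity:

```python
def naive_sum(n):
    # left-hand side: the number of operations grows linearly with n
    return sum(k * 2 ** (k - 1) for k in range(1, n + 1))

def closed_form(n):
    # right-hand side: a fixed number of operations, independent of n
    return (n - 1) * 2 ** n + 1
```

The two functions return the same value for every n, but only the second one runs in constant time (up to the cost of the power itself).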
As we shall see, in algorithm analysis there appear many kinds of special numbers. Most of them can be reduced to the computation of some basic quantities, which are considered to be in closed form, although apparently they depend on some parameter n. The three main quantities of this kind are the factorial, the harmonic numbers and the binomial coefficients. In order to justify the previous sentence, let us anticipate some definitions, which will be discussed in the next chapter, and give a more precise presentation of the Γ and ψ functions.
The Γ-function is defined by a definite integral:

    Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt.

By integrating by parts, we obtain:

    Γ(x + 1) = ∫_0^∞ t^x e^{−t} dt = [−t^x e^{−t}]_0^∞ + ∫_0^∞ x t^{x−1} e^{−t} dt = x Γ(x)
which is a basic recurrence property of the Γ-function. It allows us to reduce the computation of Γ(x) to the case 1 ≤ x ≤ 2. In this interval we can use a polynomial approximation:

    Γ(x + 1) = 1 + b_1 x + b_2 x^2 + ... + b_8 x^8 + ε(x)

where:

    b_1 = −0.577191652    b_5 = −0.756704078
    b_2 =  0.988205891    b_6 =  0.482199394
    b_3 = −0.897056937    b_7 = −0.193527818
    b_4 =  0.918206857    b_8 =  0.035868343

The error is |ε(x)| ≤ 3 × 10^{−7}.
Another method is to use Stirling's approximation:

    Γ(x) ≈ e^{−x} x^{x−0.5} √(2π) (1 + 1/(12x) + 1/(288x^2) − 139/(51840x^3) − 571/(2488320x^4) + ...).
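As an illustration, Stirling's approximation can be programmed in a few lines; the following Python sketch (the function name is ours) keeps the first four correction terms:

```python
import math

def gamma_stirling(x):
    """Stirling's series for Gamma(x), first four correction terms."""
    series = (1 + 1 / (12 * x) + 1 / (288 * x ** 2)
              - 139 / (51840 * x ** 3) - 571 / (2488320 * x ** 4))
    return math.exp(-x) * x ** (x - 0.5) * math.sqrt(2 * math.pi) * series
```

Already for moderate x the truncated series gives many correct digits; the accuracy improves as x grows.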
Some special values of the Γ-function are directly obtained from the definition. For example, when x = 1 the integral simplifies and we immediately find Γ(1) = 1. When x = 1/2 the definition implies:

    Γ(1/2) = ∫_0^∞ t^{1/2−1} e^{−t} dt = ∫_0^∞ (e^{−t}/√t) dt.

By performing the substitution y = √t (t = y^2 and dt = 2y dy), we have:

    Γ(1/2) = ∫_0^∞ (e^{−y^2}/y) · 2y dy = 2 ∫_0^∞ e^{−y^2} dy = √π.
The Γ function is defined for every x ∈ C, except when x is a non-positive integer, where the function goes to infinity; the following approximation can be important:

    Γ(−n + ε) ≈ ((−1)^n / n!) (1/ε).
When we unfold the basic recurrence of the Γ-function for x = n an integer, we find Γ(n + 1) = n (n − 1) · · · 2 · 1. The factorial Γ(n + 1) = n! = 1 · 2 · 3 · · · n seems to require n − 2 multiplications. However, for n large it can be computed by means of Stirling's formula, which is obtained from the same formula for the Γ-function:

    n! = Γ(n + 1) = n Γ(n) = √(2πn) (n/e)^n (1 + 1/(12n) + 1/(288n^2) + ...).

This requires only a fixed amount of operations to reach the desired accuracy.
The function ψ(x), called ψ-function or digamma function, is defined as the logarithmic derivative of the Γ-function:

    ψ(x) = (d/dx) ln Γ(x) = Γ'(x)/Γ(x).

Obviously, we have:

    ψ(x + 1) = Γ'(x + 1)/Γ(x + 1) = ((d/dx) x Γ(x))/(x Γ(x)) =
             = (Γ(x) + x Γ'(x))/(x Γ(x)) = 1/x + ψ(x)

and this is a basic property of the digamma function. By this formula we can always reduce the computation of ψ(x) to the case 1 ≤ x ≤ 2, where we can use the approximation:

    ψ(x) ≈ ln x − 1/(2x) − 1/(12x^2) + 1/(120x^4) − 1/(252x^6) + ... .
By the previous recurrence, we see that the digamma function is related to the harmonic numbers H_n = 1 + 1/2 + 1/3 + ... + 1/n. In fact, we have:

    H_n = ψ(n + 1) + γ

where γ = 0.57721566... is the Mascheroni-Euler constant. By using the approximation for ψ(x), we obtain an approximate formula for the harmonic numbers:

    H_n ≈ ln n + γ + 1/(2n) − 1/(12n^2) + ...

which shows that the computation of H_n does not require n − 1 sums and n − 1 inversions, as it could appear from its definition.
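For example, the approximate formula for H_n can be compared with the direct sum; in the Python sketch below (names are ours) the value of γ is the one quoted in the text, to more digits:

```python
import math

GAMMA = 0.5772156649015329      # Mascheroni-Euler constant

def harmonic_direct(n):
    # n - 1 sums and n inversions, as in the definition
    return sum(1 / k for k in range(1, n + 1))

def harmonic_approx(n):
    # ln n + gamma + 1/(2n) - 1/(12 n^2): a fixed number of operations
    return math.log(n) + GAMMA + 1 / (2 * n) - 1 / (12 * n ** 2)
```

Already for n around 10 the two values agree to five or six decimal places.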
Finally, the binomial coefficient:

    \binom{n}{k} = n(n − 1) · · · (n − k + 1)/k! = n!/(k!(n − k)!) = Γ(n + 1)/(Γ(k + 1) Γ(n − k + 1))

can be reduced to the computation of the Γ function or can be approximated by using Stirling's formula for factorials. The two methods are indeed the same. We observe explicitly that the last expression shows that binomial coefficients can be defined for every n, k ∈ C, except that k cannot be a negative integer number.
The reader can, as a very useful exercise, write
computer programs to realize the various functions
mentioned in the present section.
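As one possible realization (the function name is ours; the coefficients b_i are those of the table above, with the alternating signs b_1, b_3, b_5, b_7 < 0), the following Python sketch reduces the argument to the interval [1, 2] by the recurrence Γ(x) = (x − 1)Γ(x − 1) and then applies the polynomial:

```python
B = [-0.577191652, 0.988205891, -0.897056937, 0.918206857,
     -0.756704078, 0.482199394, -0.193527818, 0.035868343]

def gamma_poly(x):
    """Gamma(x) for x >= 1: argument reduction plus the degree-8 polynomial."""
    result = 1.0
    while x > 2.0:                # Gamma(x) = (x - 1) Gamma(x - 1)
        x -= 1.0
        result *= x
    t = x - 1.0                   # now Gamma(x) = Gamma(t + 1), 0 <= t <= 1
    poly = 1.0
    for i, b in enumerate(B, start=1):
        poly += b * t ** i
    return result * poly
```

On [1, 2] the absolute error stays within the bound 3 × 10^{−7} quoted above; for larger arguments it is scaled by the product of the reduction factors.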
1.5 The Landau notation
To the mathematician Edmund Landau is ascribed a special notation to describe the general behavior of a function f(x) when x approaches some definite value. We are mainly interested in the case x → ∞, but this should not be considered a restriction. Landau notation is also known as O-notation (or big-oh notation), because of the use of the letter O to denote the desired behavior.
Let us consider functions f : N → R (i.e., sequences of real numbers); given two functions f(n) and g(n), we say that f(n) is O(g(n)), or that f(n) is in the order of g(n), if and only if:

    lim_{n→∞} f(n)/g(n) < ∞

In formulas we write f(n) = O(g(n)) or also f(n) ≼ g(n). Besides, if we have at the same time:

    lim_{n→∞} g(n)/f(n) < ∞

we say that f(n) is in the same order as g(n) and write f(n) = Θ(g(n)) or f(n) ≍ g(n).
It is easy to see that ≼ is an order relation between functions f : N → R, and that ≍ is an equivalence relation. We observe explicitly that when f(n) is in the same order as g(n), a constant K ≠ 0 exists such that:

    lim_{n→∞} f(n)/(K g(n)) = 1    or    lim_{n→∞} f(n)/g(n) = K;

the constant K is very important and will often be used.
Before making some important comments on Landau notation, we wish to introduce a last definition: we say that f(n) is of smaller order than g(n), and write f(n) = o(g(n)), iff:

    lim_{n→∞} f(n)/g(n) = 0.

Obviously, this is in accordance with the previous definitions, but the notation introduced (the small-oh notation) is used rather frequently and should be known.
If f(n) and g(n) describe the behavior of two algorithms A and B solving the same problem, we will say that A is asymptotically better than B iff f(n) = o(g(n)); instead, the two algorithms are asymptotically equivalent iff f(n) = Θ(g(n)). This is rather clear, because when f(n) = o(g(n)) the number of operations performed by A is substantially less than the number of operations performed by B. However, when f(n) = Θ(g(n)), the number of operations is the same, except for a constant quantity K, which remains the same as n → ∞. The constant K can simply depend on the particular realization of the algorithms A and B, and with two different implementations we may have K < 1 or K > 1. Therefore, in general, when f(n) = Θ(g(n)) we cannot say which algorithm is better, this depending on the particular realization or on the particular computer on which the algorithms are run. Obviously, if A and B are both realized on the same computer and in the best possible way, a value K < 1 tells us that algorithm A is relatively better than B, and vice versa when K > 1.
It is also possible to give an absolute evaluation of the performance of a given algorithm A, whose behavior is described by a sequence of values f(n). This is done by comparing f(n) against an absolute scale of values. The scale most frequently used contains powers of n, logarithms and exponentials:

    O(1) < O(ln n) < O(√n) < O(n^2) < ... < O(n^5) < ... < O(e^n) < ... < O(e^{e^n}) < ...

This scale reflects well-known properties: the logarithm grows more slowly than any power n^ε, however small ε, while e^n grows faster than any power n^k, however large k, independent of n. Note that n^n = e^{n ln n} and therefore O(e^n) < O(n^n). As a matter of fact, the scale is not complete, as we obviously have O(n^{0.4}) < O(√n).

The set B^A of the mappings from a set A, with |A| = n, into a set B, with |B| = m, has cardinality:

    |B^A| = |B|^{|A|} = m^n.
This formula allows us to solve some simple combinatorial problems. For example, if we toss 5 coins, how many different configurations head/tail are possible? The five coins are the domain of our mappings and the set {head, tail} is the codomain. Therefore we have a total of 2^5 = 32 different configurations. Similarly, if we toss three dice, the total number of configurations is 6^3 = 216. In the same way, we can count the number of subsets in a set S having |S| = n. In fact, let us consider, given a subset A ⊆ S, the mapping χ_A : S → {0, 1} defined by:

    χ_A(x) = 1  for x ∈ A          χ_A(x) = 0  for x ∉ A

This is called the characteristic function of the subset A; every two different subsets of S have different characteristic functions, and every mapping f : S → {0, 1} is the characteristic function of some subset A ⊆ S, i.e., the subset {x ∈ S | f(x) = 1}. Therefore, there are as many subsets of S as there are characteristic functions; but these are 2^n by the formula above.

A finite set A is sometimes called an alphabet and its elements symbols or letters. Any sequence of letters is called a word; the empty sequence is the empty word and is denoted by ε. From the previous considerations, if |A| = n, the number of words of length m is n^m. The first part of Chapter 6 is devoted to some basic notions on special sets of words or languages.
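The correspondence between subsets and characteristic functions also suggests a way of generating all the subsets of a finite set; a Python sketch (names are ours):

```python
from itertools import product

def subsets(S):
    """Generate the subsets of S, one for each mapping f : S -> {0, 1}."""
    S = list(S)
    for bits in product((0, 1), repeat=len(S)):   # one characteristic function
        yield {x for x, b in zip(S, bits) if b == 1}
```

The generator produces exactly 2^n subsets, all distinct, as the counting argument predicts.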
2.2 Permutations
In the usual sense, a permutation of a set of objects
is an arrangement of these objects in any order. For
example, three objects, denoted by a, b, c, can be arranged in six different ways:
(a, b, c), (a, c, b), (b, a, c), (b, c, a), (c, a, b), (c, b, a).
A very important problem in Computer Science is
sorting: suppose we have n objects from an or-
dered set, usually some set of numbers or some set of
strings (with the common lexicographic order); the
objects are given in a random order and the problem
consists in sorting them according to the given order.
For example, by sorting (60, 51, 80, 77, 44) we should
obtain (44, 51, 60, 77, 80) and the real problem is to
obtain this ordering in the shortest possible time. In
other words, we start with a random permutation of
the n objects, and wish to arrive to their standard
ordering, the one in accordance with their supposed
order relation (e.g., less than).
In order to abstract from the particular nature of the n objects, we will use the numbers {1, 2, . . . , n} = N_n, and define a permutation as a 1-1 mapping π : N_n → N_n. By identifying a and 1, b and 2, c and 3, the six permutations of 3 objects are written:

    ( 1 2 3 )  ( 1 2 3 )  ( 1 2 3 )  ( 1 2 3 )  ( 1 2 3 )  ( 1 2 3 )
    ( 1 2 3 )  ( 1 3 2 )  ( 2 1 3 )  ( 2 3 1 )  ( 3 1 2 )  ( 3 2 1 ),

where, conventionally, the first line contains the elements in N_n in their proper order, and the second line contains the corresponding images. This is the usual representation for permutations, but since the first line can be understood without ambiguity, the vector representation for permutations is more common. This consists in writing the second line (the images) in the form of a vector. Therefore, the six permutations are (1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1), respectively.
Let us examine the permutation π = (3, 2, 1), for which we have π(1) = 3, π(2) = 2 and π(3) = 1. If we start with the element 1 and successively apply the mapping π, we have π(1) = 3, π(π(1)) = π(3) = 1, π(π(π(1))) = π(1) = 3 and so on. Since the elements in N_n are finite, by starting with any k ∈ N_n we must obtain a finite chain of numbers, which will repeat always in the same order. These numbers are said to form a cycle, and the permutation (3, 2, 1) is formed by two cycles, the first one composed by 1 and 3, the second one only composed by 2. We write (3, 2, 1) = (1 3)(2), where every cycle is written between parentheses and numbers are separated by blanks, to distinguish a cycle from a vector. Conventionally, a cycle is written with its smallest element first and the various cycles are arranged according to their first element. Therefore, in this cycle representation the six permutations are:

    (1)(2)(3), (1)(2 3), (1 2)(3), (1 2 3), (1 3 2), (1 3)(2).
A number k for which π(k) = k is called a fixed point for π. The corresponding cycles, formed by a single element, are conventionally understood, except in the identity (1, 2, . . . , n) = (1)(2) · · · (n), in which all the elements are fixed points; the identity is simply written (1). Consequently, the usual representation of the six permutations is:

    (1)  (2 3)  (1 2)  (1 2 3)  (1 3 2)  (1 3).

A permutation without any fixed point is called a derangement. A cycle with only two elements is called a transposition. The degree of a cycle is the number of its elements, plus one; the degree of a permutation is the sum of the degrees of its cycles. The six permutations have degree 2, 3, 3, 4, 4, 3, respectively. A permutation is even or odd according to whether its degree is even or odd.

The permutation (8, 9, 4, 3, 6, 1, 7, 2, 10, 5), in vector notation, has the cycle representation (1 8 2 9 10 5 6)(3 4), the number 7 being a fixed point. The long cycle (1 8 2 9 10 5 6) has degree 8; therefore the permutation degree is 8 + 3 + 2 = 13 and the permutation is odd.
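The construction of the cycle representation is easily programmed; the following Python sketch (names are ours) takes a permutation in vector form and follows the chains just described:

```python
def cycles(perm):
    """Cycle representation of a permutation given in vector form.

    perm[k-1] is the image of k; each cycle starts at its smallest element
    and the cycles are listed by increasing first element."""
    n = len(perm)
    seen, result = set(), []
    for start in range(1, n + 1):
        if start not in seen:
            cycle, k = [], start
            while k not in seen:        # follow k, pi(k), pi(pi(k)), ...
                seen.add(k)
                cycle.append(k)
                k = perm[k - 1]
            result.append(tuple(cycle))
    return result
```

Fixed points appear as cycles with a single element; they may then be suppressed, as the text's convention prescribes.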
2.3 The group structure
Let n ∈ N; P_n denotes the set of all the permutations of n elements, i.e., according to the previous sections, the set of 1-1 mappings π : N_n → N_n. If π, ρ ∈ P_n, we can perform their composition π ∘ ρ, i.e., a new permutation σ defined as σ(k) = ρ(π(k)) = (π ∘ ρ)(k), the mappings being applied from left to right. An example in P_7 is:

    ( 1 2 3 4 5 6 7 )   ( 1 2 3 4 5 6 7 )   ( 1 2 3 4 5 6 7 )
    ( 1 5 6 7 4 2 3 ) ∘ ( 4 5 2 1 7 6 3 ) = ( 4 7 6 3 1 5 2 ).

In fact, for instance, π(2) = 5 and ρ(5) = 7; therefore σ(2) = ρ(π(2)) = ρ(5) = 7, and so on. The vector representation of permutations is not particularly suited for hand evaluation of composition, although it is very convenient for computer implementation. The opposite situation occurs for cycle representation:

    (2 5 4 7 3 6) ∘ (1 4)(2 5 7 3) = (1 4 3 6 5)(2 7)

Cycles in the left hand member are read from left to right and, by examining one cycle after the other, we find the images of every element by simply remembering the cycle successor of the same element. For example, the image of 4 in the first cycle is 7; the second cycle does not contain 7, but the third cycle tells us that the image of 7 is 3. Therefore, the image of 4 is 3. Fixed points are ignored, and this is in accordance with their meaning. The composition symbol ∘ is usually understood and, in reality, the simple juxtaposition of the cycles can well denote their composition.
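Composition in vector representation is indeed convenient for computer implementation; a Python sketch (names are ours; the left-to-right convention is the one used in the text):

```python
def compose(pi, rho):
    """Left-to-right composition: (pi o rho)(k) = rho(pi(k)).

    Permutations are in vector representation: pi[k-1] is the image of k."""
    return tuple(rho[pi[k - 1] - 1] for k in range(1, len(pi) + 1))
```

Applied to the example above, compose((1, 5, 6, 7, 4, 2, 3), (4, 5, 2, 1, 7, 6, 3)) gives (4, 7, 6, 3, 1, 5, 2).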
The identity mapping acts as an identity for composition, since it is only composed by fixed points. Every permutation has an inverse, which is the permutation obtained by reading the given permutation from below, i.e., by sorting the elements in the second line, which become the elements in the first line, and then rearranging the elements in the first line into the second line. For example, the inverse of the first permutation in the example above is:

    ( 1 2 3 4 5 6 7 )
    ( 1 6 7 5 2 3 4 ).

In fact, in cycle notation, we have:

    (2 5 4 7 3 6)(2 6 3 7 4 5) = (2 6 3 7 4 5)(2 5 4 7 3 6) = (1).

A simple observation is that the inverse of a cycle is obtained by writing its first element followed by all the other elements in reverse order. Hence, the inverse of a transposition is the same transposition.
Since composition is associative, we have proved that (P_n, ∘) is a group. The group is not commutative, because, for example:

    ρ ∘ π = (1 4)(2 5 7 3) ∘ (2 5 4 7 3 6) = (1 7 6 2 4)(3 5) ≠ π ∘ ρ.

An involution is a permutation π such that π^2 = π ∘ π = (1). An involution can only be composed by fixed points and transpositions, because by the definition we have π^{−1} = π, and the above observation on the inversion of cycles shows that a cycle with more than 2 elements has an inverse which cannot coincide with the cycle itself.
Till now, we have supposed that in the cycle representation every number is only considered once. However, if we think of a permutation as the product of cycles, we can imagine that its representation is not unique and that an element k ∈ N_n can appear in more than one cycle. The representations of π ∘ ρ and ρ ∘ π given above are examples of this statement. In particular, we can obtain the transposition representation of a permutation; we observe that we have:

    (2 6)(6 5)(6 4)(6 7)(6 3) = (2 5 4 7 3 6)

We transform the cycle into a product of transpositions by forming a transposition with the first and the last element in the cycle, and then adding other transpositions with the same first element (the last element in the cycle) and the other elements in the cycle, in the same order, as second element. Besides, we note that we can always add a couple of transpositions as (2 5)(2 5), corresponding to the two fixed points (2) and (5), and therefore adding nothing to the permutation. All these remarks show that:

• every permutation can be written as the composition of transpositions;
• this representation is not unique, but every two representations differ by an even number of transpositions;
• the minimal number of transpositions corresponding to a cycle is the degree of the cycle (except possibly for fixed points, which however always correspond to an even number of transpositions).
Therefore, we conclude that an even [odd] permutation can be expressed as the composition of an even [odd] number of transpositions. Since the composition of two even permutations is still an even permutation, the set A_n of even permutations is a subgroup of P_n and is called the alternating subgroup, while the whole group P_n is referred to as the symmetric group.
2.4 Counting permutations
How many permutations are there in P_n? If n = 1, we only have a single permutation (1), and if n = 2 we have two permutations, exactly (1, 2) and (2, 1). We have already seen that |P_3| = 6, and if n = 0 we consider the empty vector () as the only possible permutation, that is |P_0| = 1. In this way we obtain a sequence 1, 1, 2, 6, . . . and we wish to obtain a formula giving us |P_n| for every n ∈ N.
Let π ∈ P_n be a permutation and (a_1, a_2, . . . , a_n) be its vector representation. We can obtain a permutation in P_{n+1} by simply adding the new element n + 1 in any position of the representation of π:

    (n + 1, a_1, a_2, . . . , a_n)    (a_1, n + 1, a_2, . . . , a_n)    ...    (a_1, a_2, . . . , a_n, n + 1)

Therefore, from any permutation in P_n we obtain n + 1 permutations in P_{n+1}, and they are all different. Vice versa, if we start with a permutation in P_{n+1} and eliminate the element n + 1, we obtain one and only one permutation in P_n. Therefore, all permutations in P_{n+1} are obtained in the way just described and are obtained only once. So we find:

    |P_{n+1}| = (n + 1) |P_n|
which is a simple recurrence relation. By unfolding this recurrence, i.e., by substituting for |P_n| the analogous expression, and so on, we obtain:

    |P_{n+1}| = (n + 1)|P_n| = (n + 1) n |P_{n−1}| = ... = (n + 1) n (n − 1) · · · 1 · |P_0|

Since, as we have seen, |P_0| = 1, we have proved that the number of permutations in P_n is given by the product n (n − 1) · · · 2 · 1. Therefore, our sequence is:

    n      0  1  2  3  4   5    6    7     8
    |P_n|  1  1  2  6  24  120  720  5040  40320

As we mentioned in the Introduction, the number n (n − 1) · · · 2 · 1 is called n factorial and is denoted by n!. For example, we have 10! = 10 · 9 · 8 · 7 · 6 · 5 · 4 · 3 · 2 · 1 = 3,628,800. Factorials grow very fast, but they are one of the most important quantities in Mathematics.
When n ≥ 2, we can add to every permutation in P_n one transposition, say (1 2). This transforms every even permutation into an odd permutation, and vice versa. On the other hand, since (1 2)^{−1} = (1 2), the transformation is its own inverse, and therefore defines a 1-1 mapping between even and odd permutations. This proves that the number of even (odd) permutations is n!/2.
Another simple problem is how to determine the number of involutions on n elements. As we have already seen, an involution is only composed by fixed points and transpositions (without repetitions of the elements!). If we denote by I_n the set of involutions of n elements, we can divide I_n into two subsets: I'_n is the set of involutions in which n is a fixed point, and I''_n is the set of involutions in which n belongs to a transposition, say (k n). If we eliminate n from the involutions in I'_n, we obtain an involution of n − 1 elements, and vice versa every involution in I'_n can be obtained by adding the fixed point n to an involution in I_{n−1}. If we eliminate the transposition (k n) from an involution in I''_n, we obtain an involution in I_{n−2}, which contains the element n − 1, but does not contain the element k. In all cases, however, by eliminating (k n) from all involutions containing it, we obtain a set of involutions in a 1-1 correspondence with I_{n−2}. The element k can assume any value 1, 2, . . . , n − 1, and therefore we obtain (n − 1) times |I_{n−2}| involutions.
We now observe that all the involutions in I_n are obtained in this way from involutions in I_{n−1} and I_{n−2}, and therefore we have:

    |I_n| = |I_{n−1}| + (n − 1)|I_{n−2}|

Since |I_0| = 1, |I_1| = 1 and |I_2| = 2, from this recurrence relation we can successively find all the values of |I_n|. This sequence (see Section 4.9) is therefore:

    n      0  1  2  3  4   5   6   7    8
    |I_n|  1  1  2  4  10  26  76  232  764
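The recurrence gives an immediate program for |I_n|; a Python sketch (the function name is ours):

```python
def involutions(n):
    """|I_n| computed by the recurrence |I_n| = |I_{n-1}| + (n-1)|I_{n-2}|."""
    a, b = 1, 1                   # |I_0| and |I_1|
    for m in range(2, n + 1):
        a, b = b, b + (m - 1) * a
    return b if n >= 1 else a
```

The values reproduce the table above.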
We conclude this section by giving the classical computer program for generating a random permutation of the numbers 1, 2, . . . , n. The procedure shuffle receives the address of the vector and the number of its elements; it fills the vector with the numbers from 1 to n and then uses the standard procedure random to produce a random permutation, which is returned in the input vector:

    procedure shuffle( var v : vector; n : integer ) ;
    var i, j, a : integer ;
    begin
      for i := 1 to n do v[ i ] := i ;
      for i := n downto 2 do begin
        j := random( i ) + 1 ;
        a := v[ i ]; v[ i ] := v[ j ]; v[ j ] := a
      end
    end { shuffle } ;
The procedure exchanges the last element in v with
a random element in v, possibly the same last ele-
ment. In this way the last element is chosen at ran-
dom, and the procedure goes on with the last but one
element. In this way, the elements in v are properly
shued and eventually v contains the desired random
permutation. The procedure obviously performs in
linear time.
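The same procedure can be rendered in Python (names are ours; random.randrange plays the role of the procedure random):

```python
import random

def shuffle_permutation(n):
    """Fisher-Yates shuffle: a random permutation of 1, 2, ..., n."""
    v = list(range(1, n + 1))
    for i in range(n - 1, 0, -1):     # positions n, n-1, ..., 2 (0-based)
        j = random.randrange(i + 1)   # 0 <= j <= i, possibly j == i
        v[i], v[j] = v[j], v[i]       # exchange the two elements
    return v
```

Whatever the random choices, the result is always a permutation of 1, . . . , n, and the procedure runs in linear time.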
2.5 Dispositions and Combinations
Permutations are a special case of a more general situation. If we have n objects, we can wonder how many different orderings exist of k among the n objects. For example, if we have 4 objects a, b, c, d, we can make 12 different arrangements with two objects chosen from a, b, c, d. They are:

    (a, b) (a, c) (a, d) (b, a) (b, c) (b, d)
    (c, a) (c, b) (c, d) (d, a) (d, b) (d, c)

These arrangements are called dispositions and, in general, we can use any one of the n objects to be first in the permutation. There remain only n − 1 objects to be used as second element, and n − 2 objects to be used as third element, and so on. Therefore, the k objects can be selected in n(n − 1) · · · (n − k + 1) different ways. If D_{n,k} denotes the number of possible dispositions of n elements in groups of k, we have:

    D_{n,k} = n(n − 1) · · · (n − k + 1) = n^{\underline{k}}

The symbol n^{\underline{k}} is called a falling factorial because it consists of k factors beginning with n and decreasing by one down to (n − k + 1). Obviously, n^{\underline{n}} = n! and, by convention, n^{\underline{0}} = 1. There exists also a rising factorial n^{\overline{k}} = n(n + 1) · · · (n + k − 1), often denoted by (n)_k, the so-called Pochhammer symbol.
When in k-dispositions we do not consider the order of the elements, we obtain what are called k-combinations. Therefore, there are only 6 2-combinations of 4 objects, and they are:

    {a, b}  {a, c}  {a, d}  {b, c}  {b, d}  {c, d}

Combinations are written between braces, because they are simply the subsets with k objects of a set of n objects. If A = {a, b, c}, all the possible combinations of these objects are:

    C_{3,0} = {∅}
    C_{3,1} = {{a}, {b}, {c}}
    C_{3,2} = {{a, b}, {a, c}, {b, c}}
    C_{3,3} = {{a, b, c}}

The number of k-combinations of n objects is denoted by \binom{n}{k}, and we immediately have:

    \binom{n}{0} = 1    \binom{n}{n} = 1    for every n ∈ N

because, given a set of n elements, the empty set is the only subset with 0 elements, and the whole set is the only subset with n elements.
The name binomial coefficients is due to the well-known binomial formula:

    (a + b)^n = Σ_{k=0}^{n} \binom{n}{k} a^k b^{n−k}

which is easily proved. In fact, when expanding the product (a + b)(a + b) · · · (a + b), we choose a term from each factor (a + b); the resulting term a^k b^{n−k} is obtained by summing over all the possible choices of k a's and n − k b's, which, by the definition above, are just \binom{n}{k}.
There exists a simple formula to compute binomial coefficients. As we have seen, there are n^{\underline{k}} different k-dispositions of n objects; by permuting the k objects in a disposition, we obtain all the k! dispositions with the same elements. Therefore, k! dispositions correspond to a single combination and we have:

    \binom{n}{k} = n^{\underline{k}}/k! = n(n − 1) · · · (n − k + 1)/k!

This formula gives a simple way to compute binomial coefficients in a recursive way. In fact we have:

    \binom{n}{k} = n(n − 1) · · · (n − k + 1)/k! = (n/k) · ((n − 1) · · · (n − k + 1)/(k − 1)!) = (n/k) \binom{n − 1}{k − 1}

and the recursion stops when k = 0, since \binom{r}{0} = 1 for every r. For example:

    \binom{7}{3} = (7/3) \binom{6}{2} = (7/3)(6/2) \binom{5}{1} = (7/3)(6/2)(5/1) \binom{4}{0} = 35
It is not difficult to compute a binomial coefficient such as \binom{100}{3}, but the rule above is not convenient for computing a coefficient such as \binom{100}{97}. In this case we can multiply and divide by (n − k)!:

    \binom{n}{k} = n(n − 1) · · · (n − k + 1)/k! = (n · · · (n − k + 1)/k!) · ((n − k) · · · 1)/((n − k) · · · 1) = n!/(k!(n − k)!)

This is a very important formula on its own, and it shows that:

    \binom{n}{n − k} = n!/((n − k)!(n − (n − k))!) = \binom{n}{k}

Therefore, the computation of \binom{100}{97} is reduced to the computation of \binom{100}{3}. The rule does not apply, however, to the central binomial coefficients \binom{2k}{k}, for which symmetry gives no help.
The reader is invited to produce a computer program to evaluate binomial coefficients. He (or she) is warned not to use the formula n!/(k!(n − k)!), which can produce very large numbers, exceeding the capacity of the computer when n, k are not small.
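One possible program along these lines (the function name is ours) computes the product n(n − 1) · · · (n − k + 1)/k! incrementally, dividing at each step so that every intermediate value is an integer (each partial product is itself a binomial coefficient), and uses the symmetry rule to keep k small:

```python
def binomial(n, k):
    """binom(n, k) with exact integer arithmetic, never forming n!."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)                 # symmetry: binom(n,k) = binom(n,n-k)
    result = 1
    for i in range(1, k + 1):
        result = result * (n - k + i) // i   # exact division at every step
    return result
```

In this way the intermediate values never exceed the final result.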
The definition of a binomial coefficient can be easily expanded to any real numerator:

    \binom{r}{k} = r^{\underline{k}}/k! = r(r − 1) · · · (r − k + 1)/k!.

For example we have:

    \binom{1/2}{3} = (1/2)(−1/2)(−3/2)/3! = 1/16

but in this case the symmetry rule does not make sense. We point out that:

    \binom{−n}{k} = (−n)(−n − 1) · · · (−n − k + 1)/k! = (−1)^k (n + k − 1)^{\underline{k}}/k! = \binom{n + k − 1}{k} (−1)^k

which allows us to express a binomial coefficient with negative, integer numerator as a binomial coefficient with positive numerator. This is known as the negation rule and will be used very often.
If in a combination we are allowed to have several copies of the same element, we obtain a combination with repetitions. A useful exercise is to prove that the number of the k by k combinations with repetitions of n elements is:

    R_{n,k} = \binom{n + k − 1}{k}.
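The formula for R_{n,k} can at least be verified by brute force for small values; a Python sketch (names are ours):

```python
from itertools import combinations_with_replacement
from math import comb

def count_multisets(n, k):
    """Brute-force count of the k-combinations with repetitions of n elements."""
    return sum(1 for _ in combinations_with_replacement(range(n), k))
```

The count agrees with \binom{n + k − 1}{k} in every tested case.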
2.6 The Pascal triangle
Binomial coefficients satisfy a very important recurrence relation, which we are now going to prove. As we know, $\binom{n}{k}$ counts the subsets with k elements of a set with n elements, say $\{a_1, a_2, \ldots, a_n\}$. These subsets can be divided into two classes: the class S′ of the subsets containing $a_n$, and the class S″ of the subsets not containing $a_n$. By eliminating $a_n$, every subset in S′ corresponds to a subset with k−1 elements of a set with n−1 elements: their number is therefore $\binom{n-1}{k-1}$. The class S″ can be seen as composed by the subsets with k elements of a set with n−1 elements, i.e., the base set minus $a_n$: their number is therefore $\binom{n-1}{k}$. Consequently:
$$\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$$
which holds together with the initial conditions $\binom{n}{0} = \binom{n}{n} = 1,\ \forall n \in \mathbb{N}$.
For example, we have:
$$\binom{4}{2} = \binom{3}{2} + \binom{3}{1} = \binom{2}{2} + \binom{2}{1} + \binom{2}{1} + \binom{2}{0} = 2 + 2\binom{2}{1} = 2 + 2\left(\binom{1}{1} + \binom{1}{0}\right) = 2 + 2\cdot 2 = 6.$$
This recurrence is not particularly suitable for numerical computation. However, it gives a simple rule to compute successively all the binomial coefficients. Let us dispose them in an infinite array, whose rows represent the number n and whose columns represent the number k. The recurrence tells us that the element in position (n, k) is obtained by summing two elements in the previous row: the element just above position (n, k), i.e., in position (n−1, k), and the element on its left, i.e., in position (n−1, k−1). The array is initially filled by 1s in the first column (corresponding to the various $\binom{n}{0}$) and on the main diagonal (corresponding to the various $\binom{n}{n}$); the array thus obtained is the Pascal triangle.
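The row-by-row construction just described can be sketched in a few lines (a minimal illustration, not part of the original text):

```python
def pascal_triangle(rows):
    """Build the triangle row by row: each inner entry is the sum of
    the entry above it and the entry above-left, as the recurrence
    C(n,k) = C(n-1,k) + C(n-1,k-1) prescribes."""
    triangle = [[1]]
    for n in range(1, rows):
        prev = triangle[-1]
        row = [1]                                 # C(n, 0) = 1
        for k in range(1, n):
            row.append(prev[k] + prev[k - 1])     # C(n-1,k) + C(n-1,k-1)
        row.append(1)                             # C(n, n) = 1
        triangle.append(row)
    return triangle

for row in pascal_triangle(5):
    print(row)
```

Each new row costs only n additions, so the whole triangle up to row n is computed with O(n²) additions and no multiplications at all.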
The coefficients $\binom{2n}{n}$, appearing in the middle of the rows of even index, are called the central binomial coefficients. We can express $\binom{-1/2}{n}$ in terms of the central binomial coefficients:
$$\binom{-1/2}{n} = \frac{(-1/2)(-3/2)\cdots(-(2n-1)/2)}{n!} = (-1)^n\,\frac{1\cdot 3\cdots(2n-1)}{2^n\, n!} = (-1)^n\,\frac{1\cdot 2\cdot 3\cdot 4\cdots(2n-1)(2n)}{2^n\, n!\; 2\cdot 4\cdots(2n)} = (-1)^n\,\frac{(2n)!}{2^n\, n!\; 2^n(1\cdot 2\cdots n)} = \frac{(-1)^n}{4^n}\,\frac{(2n)!}{n!^2} = \frac{(-1)^n}{4^n}\binom{2n}{n}.$$
In a similar way, we can prove the following identities:
$$\binom{1/2}{n} = \frac{(-1)^{n-1}}{4^n(2n-1)}\binom{2n}{n} \qquad \binom{3/2}{n} = \frac{(-1)^n\,3}{4^n(2n-1)(2n-3)}\binom{2n}{n} \qquad \binom{-3/2}{n} = \frac{(-1)^n(2n+1)}{4^n}\binom{2n}{n}.$$
An important point of this generalization is that the binomial formula can be extended to real exponents. Let us consider the function $f(x) = (1+x)^r$; it is continuous and can be differentiated as many times as we wish; in fact we have:
$$f^{(n)}(x) = \frac{d^n}{dx^n}(1+x)^r = r^{\underline{n}}(1+x)^{r-n}$$
as can be shown by mathematical induction. The first two cases are $f^{(0)}(x) = (1+x)^r = r^{\underline{0}}(1+x)^{r-0}$ and $f'(x) = r(1+x)^{r-1}$. Suppose now that the formula holds true for some $n \in \mathbb{N}$ and let us differentiate it once more:
$$f^{(n+1)}(x) = r^{\underline{n}}(r-n)(1+x)^{r-n-1} = r^{\underline{n+1}}(1+x)^{r-(n+1)}$$
and this proves our statement. Because of that, $(1+x)^r$ has a Taylor expansion around the point x = 0 of the form:
$$(1+x)^r = f(0) + \frac{f'(0)}{1!}x + \frac{f''(0)}{2!}x^2 + \cdots + \frac{f^{(n)}(0)}{n!}x^n + \cdots.$$
The coefficient of $x^n$ is therefore $f^{(n)}(0)/n! = r^{\underline{n}}/n! = \binom{r}{n}$, and so:
$$(1+x)^r = \sum_{n=0}^{\infty}\binom{r}{n}x^n.$$
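The extended binomial formula can be tried out numerically: for |x| < 1 the partial sums of the series approach $(1+x)^r$. A small illustration (an assumption-free numerical check, not from the text):

```python
def binom_real(r, k):
    """Generalized binomial coefficient r(r-1)...(r-k+1)/k!, real r."""
    result = 1.0
    for i in range(k):
        result *= (r - i) / (i + 1)
    return result

def power_series(r, x, terms=60):
    """Partial sum of (1+x)^r = sum_k C(r,k) x^k, valid for |x| < 1."""
    return sum(binom_real(r, k) * x ** k for k in range(terms))

# for r = 1/2 and x = 0.2 the series approximates sqrt(1.2)
print(power_series(0.5, 0.2), (1.2) ** 0.5)
```

With r = 1/2 this recovers the square root; note also that `binom_real(0.5, 3)` reproduces the value 1/16 computed above.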
We conclude with the following property, which is called the cross-product rule:
$$\binom{n}{k}\binom{k}{r} = \frac{n!}{k!\,(n-k)!}\cdot\frac{k!}{r!\,(k-r)!} = \frac{n!}{r!\,(n-r)!}\cdot\frac{(n-r)!}{(n-k)!\,(k-r)!} = \binom{n}{r}\binom{n-r}{k-r}.$$
This rule, together with the symmetry and the negation rules, gives the three basic properties of binomial coefficients:
$$\binom{n}{k} = \binom{n}{n-k} \qquad \binom{-n}{k} = \binom{n+k-1}{k}(-1)^k \qquad \binom{n}{k}\binom{k}{r} = \binom{n}{r}\binom{n-r}{k-r}.$$
2.7 Harmonic numbers

As is well known, the harmonic series:
$$\sum_{k=1}^{\infty}\frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots$$
diverges. In fact, if we cumulate the $2^m$ numbers from $1/(2^m+1)$ to $1/2^{m+1}$, we obtain:
$$\frac{1}{2^m+1} + \frac{1}{2^m+2} + \cdots + \frac{1}{2^{m+1}} > \frac{1}{2^{m+1}} + \frac{1}{2^{m+1}} + \cdots + \frac{1}{2^{m+1}} = \frac{2^m}{2^{m+1}} = \frac{1}{2}$$
18 CHAPTER 2. SPECIAL NUMBERS
and therefore the sum cannot be bounded. On the other hand we can define:
$$H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$$
a finite, partial sum of the harmonic series. This number has a well-defined value and is called a harmonic number. Conventionally, we set $H_0 = 0$, and the sequence of harmonic numbers begins:

n    0  1  2    3     4      5       6      7        8
H_n  0  1  3/2  11/6  25/12  137/60  49/20  363/140  761/280
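The table above is easy to reproduce with exact rational arithmetic (a small illustration, not part of the text):

```python
from fractions import Fraction

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact rational, with H_0 = 0."""
    h = Fraction(0)
    for k in range(1, n + 1):
        h += Fraction(1, k)
    return h

print([str(harmonic(n)) for n in range(9)])
# reproduces the table: 0, 1, 3/2, 11/6, 25/12, 137/60, 49/20, 363/140, 761/280
```

Using `Fraction` avoids any rounding, so the denominators 6, 12, 60, ... appear exactly as in the table.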
Harmonic numbers arise in the analysis of many algorithms and it is very useful to know an approximate value for them. Let us consider the series:
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots = \ln 2$$
and let us define:
$$L_n = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{(-1)^{n-1}}{n}.$$
Obviously we have:
$$H_{2n} - L_{2n} = 2\left(\frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2n}\right) = H_n$$
or $H_{2n} = L_{2n} + H_n$, and since the series for ln 2 is alternating in sign, the error committed by truncating it at any place is less than the first discarded element. Therefore:
$$\ln 2 - \frac{1}{2n} < L_{2n} < \ln 2$$
and by adding $H_n$ to all members:
$$H_n + \ln 2 - \frac{1}{2n} < H_{2n} < H_n + \ln 2.$$
Let us now consider the two cases $n = 2^{k-1}$ and $n = 2^{k-2}$:
$$H_{2^{k-1}} + \ln 2 - \frac{1}{2^k} < H_{2^k} < H_{2^{k-1}} + \ln 2$$
$$H_{2^{k-2}} + \ln 2 - \frac{1}{2^{k-1}} < H_{2^{k-1}} < H_{2^{k-2}} + \ln 2.$$
By summing and simplifying these two expressions, we obtain:
$$H_{2^{k-2}} + 2\ln 2 - \frac{1}{2^k} - \frac{1}{2^{k-1}} < H_{2^k} < H_{2^{k-2}} + 2\ln 2.$$
We can now iterate this procedure and eventually find:
$$H_{2^0} + k\ln 2 - \frac{1}{2^k} - \frac{1}{2^{k-1}} - \cdots - \frac{1}{2^1} < H_{2^k} < H_{2^0} + k\ln 2.$$
Since $H_{2^0} = H_1 = 1$, and the subtracted powers of 1/2 sum to less than 1, we have the bounds:
$$k\ln 2 < H_{2^k} < k\ln 2 + 1.$$
These bounds can be extended to every n, and since the values of the $H_n$'s are increasing, this implies that a constant $\gamma$ should exist ($0 < \gamma < 1$) such that:
$$H_n \approx \ln n + \gamma \qquad \text{as } n \to \infty.$$
This constant is called the Euler-Mascheroni constant and, as we have already mentioned, its value is $\gamma \approx 0.5772156649\ldots$. Later we will prove the more accurate approximation of the $H_n$'s we quoted in the Introduction.
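The approximation $H_n \approx \ln n + \gamma$ is easy to observe numerically (a small illustration, not from the text):

```python
import math

GAMMA = 0.5772156649  # Euler-Mascheroni constant, as quoted above

def harmonic_float(n):
    """H_n in floating point, good enough for an asymptotic check."""
    return sum(1.0 / k for k in range(1, n + 1))

# the difference H_n - ln n approaches gamma as n grows
for n in (10, 100, 1000, 10000):
    print(n, harmonic_float(n) - math.log(n))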
The generalized harmonic numbers are defined as:
$$H_n^{(s)} = \frac{1}{1^s} + \frac{1}{2^s} + \frac{1}{3^s} + \cdots + \frac{1}{n^s}$$
and $H_n^{(1)} = H_n$. They are the partial sums of the series defining the Riemann ζ function:
$$\zeta(s) = \frac{1}{1^s} + \frac{1}{2^s} + \frac{1}{3^s} + \cdots$$
which can be defined in such a way that the sum actually converges, except for s = 1 (the harmonic series). In particular we have:
series). In particular we have:
(2) = 1 +
1
4
+
1
9
+
1
16
+
1
25
+ =
2
6
(3) = 1 +
1
8
+
1
27
+
1
64
+
1
125
+
(4) = 1 +
1
16
+
1
81
+
1
256
+
1
625
+ =
4
90
and in general:
$$\zeta(2n) = \frac{(2\pi)^{2n}}{2\,(2n)!}\,|B_{2n}|$$
where the $B_n$'s are the Bernoulli numbers (see below). No explicit formula is known for ζ(2n + 1), but numerically we have:
$$\zeta(3) \approx 1.202056903\ldots$$
Because ζ(s) has a finite value for every s > 1, we can set, for large values of n:
$$H_n^{(s)} \approx \zeta(s).$$
2.8 Fibonacci numbers
At the beginning of the 13th century, Leonardo Fibonacci introduced into Europe the positional notation for numbers, together with the computing algorithms for performing the four basic operations. In fact, Fibonacci was the most important mathematician in western Europe at that time. He posed the following problem: a farmer has a couple of rabbits which generates another couple after two months and, from that moment on, a new couple of rabbits every month. The newly generated couples become fertile after two months, when they begin to generate a new couple of rabbits every month. The problem consists in computing how many couples of rabbits the farmer has after n months.
It is a simple matter to find the initial values: there is one couple at the beginning (1st month) and 1 in the second month. The third month the farmer has 2 couples, and 3 couples the fourth month; in fact, the first couple has generated another pair of rabbits, while the previously generated couple of rabbits has not yet become fertile. The couples become 5 in the fifth month: in fact, there are the 3 couples of the preceding month plus the newly generated couples, which are as many as the fertile couples; but these are just the couples of two months beforehand, i.e., 2 couples. In general, at the nth month, the farmer will have the couples of the previous month plus the new couples, which are generated by the fertile couples, that is, the couples he had two months before. If we denote by $F_n$ the number of couples at the nth month, we have the Fibonacci recurrence:
$$F_n = F_{n-1} + F_{n-2}$$
with the initial conditions $F_1 = F_2 = 1$. By the same rule, we have $F_0 = 0$, and the sequence of Fibonacci numbers begins:

n    0  1  2  3  4  5  6  7   8   9   10
F_n  0  1  1  2  3  5  8  13  21  34  55

every term being obtained by summing the two preceding numbers in the sequence.
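The recurrence translates directly into a loop (a simple illustration; faster methods are discussed later in the text):

```python
def fibonacci(n):
    """F_n by the recurrence F_n = F_(n-1) + F_(n-2), F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b   # slide the window (F_(k-1), F_k) one step
    return a

print([fibonacci(n) for n in range(11)])
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Keeping only the last two values uses constant memory; a naive doubly-recursive version would instead take exponential time.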
Despite the small numbers appearing at the beginning of the sequence, Fibonacci numbers grow very fast; later we will see how fast they grow and how they can be computed in an efficient way. For the moment, we wish to show how Fibonacci numbers appear in combinatorics and in the analysis of algorithms. Suppose we have some bricks of dimensions 1 × 2 dm, and we wish to cover a strip 2 dm wide and n dm long using these bricks. The problem is to know in how many different ways we can perform this covering. In Figure 2.1 we show the five ways to cover a strip 4 dm long.
If $M_n$ is this number, we can observe that a covering counted by $M_n$ can be obtained by adding vertically a brick to a covering counted by $M_{n-1}$, or by adding horizontally two bricks to a covering counted by $M_{n-2}$. These are the only ways of proceeding to build our coverings, and therefore we have the recurrence relation:
$$M_n = M_{n-1} + M_{n-2}$$
which is the same recurrence as for the Fibonacci numbers. This time, however, we have the initial conditions $M_0 = 1$ (the empty covering is just a covering!) and $M_1 = 1$. Therefore we conclude that $M_n = F_{n+1}$.

Figure 2.1: Fibonacci coverings for a strip 4 dm long
Euclid's algorithm for computing the Greatest Common Divisor (gcd) of two positive integer numbers is another instance of the appearance of Fibonacci numbers. The problem is to determine the maximal number of divisions performed by Euclid's algorithm. Obviously, this maximum is attained when every division in the process gives 1 as a quotient, since a greater quotient would drastically cut the number of necessary divisions. Let us consider two consecutive Fibonacci numbers, for example 34 and 55, and let us try to find gcd(34, 55):

55 = 1 · 34 + 21
34 = 1 · 21 + 13
21 = 1 · 13 + 8
13 = 1 · 8 + 5
8 = 1 · 5 + 3
5 = 1 · 3 + 2
3 = 1 · 2 + 1
2 = 2 · 1

We immediately see that the quotients are all 1 (except the last one) and the remainders are decreasing Fibonacci numbers. The process can be inverted to prove that only consecutive Fibonacci numbers enjoy this property. Therefore, we conclude that, given two integer numbers n and m, the maximal number of divisions performed by Euclid's algorithm is attained when n, m are two consecutive Fibonacci numbers, and the actual number of divisions is the order of the smaller number in the Fibonacci sequence, minus 1.
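A small instrumented version of Euclid's algorithm makes the worst case visible (an illustration, not from the text):

```python
def gcd_divisions(m, n):
    """Euclid's algorithm, returning (gcd, number of divisions performed)."""
    count = 0
    while n != 0:
        m, n = n, m % n   # one division step: m = q*n + r
        count += 1
    return m, count

print(gcd_divisions(55, 34))  # (1, 8): 34 = F_9, and 9 - 1 = 8 divisions
```

Running it on non-consecutive inputs of similar size, e.g. `gcd_divisions(55, 32)`, shows fewer steps, in line with the worst-case claim above.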
2.9 Walks, trees and Catalan
numbers
Walks or paths are common combinatorial objects and are defined in the following way. Let $\mathbb{Z}^2$ be the integral lattice, i.e., the set of points in $\mathbb{R}^2$ having integer coordinates. A walk or path is a finite sequence of points in $\mathbb{Z}^2$ with the following properties:

1. the origin (0, 0) belongs to the walk;

2. if (x + 1, y + 1) belongs to the walk, then either (x, y + 1) or (x + 1, y) also belongs to the walk.

Figure 2.2: How a walk is decomposed
A pair of points ((x, y), (x + 1, y)) is called an east step, and a pair ((x, y), (x, y + 1)) a north step. It is a simple matter to show that the number of walks composed by n steps and ending at column k (i.e., whose last point is (n − k, k)) is just $\binom{n}{k}$. In fact, if we number by 1, 2, \ldots, n the n steps, starting from the origin, then we can associate to any walk a subset of $N_n = \{1, 2, \ldots, n\}$, namely the subset of the north-step numbers. Since the other steps must be east steps, a 1-1 correspondence exists between the walks and the subsets of $N_n$ with k elements, which, as we know, are exactly $\binom{n}{k}$ in number.
Particularly important are the walks that never go above the main diagonal; if $b_n$ denotes their number, the decomposition of a walk illustrated in Figure 2.2 yields the recurrence relation:
$$b_n = \sum_{k=1}^{n} b_{k-1}\,b_{n-k} = \sum_{k=0}^{n-1} b_k\,b_{n-k-1}$$
with the initial condition $b_0 = 1$, corresponding to the empty walk, i.e., the walk composed by the only point (0, 0). The sequence $(b_k)_{k\in\mathbb{N}}$ begins:
n    0  1  2  3  4   5   6    7    8
b_n  1  1  2  5  14  42  132  429  1430

and, as we shall see:
$$b_n = \frac{1}{n+1}\binom{2n}{n}.$$
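The recurrence and the closed form are easy to compare numerically (an illustration, not part of the text):

```python
from math import comb

def catalan_by_recurrence(limit):
    """b_n = sum_{k=0}^{n-1} b_k * b_(n-k-1), with b_0 = 1."""
    b = [1]
    for n in range(1, limit):
        b.append(sum(b[k] * b[n - k - 1] for k in range(n)))
    return b

b = catalan_by_recurrence(9)
print(b)  # [1, 1, 2, 5, 14, 42, 132, 429, 1430]

# agreement with the closed form C(2n, n) / (n + 1)
assert all(b[n] == comb(2 * n, n) // (n + 1) for n in range(9))
```

The convolution recurrence costs O(n²) operations for the first n values, while the closed form gives any single value directly.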
The $b_n$'s are called Catalan numbers and they frequently occur in the analysis of algorithms and data structures. For example, if we associate an open parenthesis to every east step and a closed parenthesis to every north step, we obtain the number of possible parenthesizations of an expression. When we have three pairs of parentheses, the 5 possibilities are:

()()()  ()(())  (())()  (()())  ((())).

When we build binary trees from permutations, we do not always obtain different trees from different permutations. There are only 5 trees generated by the six permutations of {1, 2, 3}, as we show in Figure 2.3.
How many different trees exist with n nodes? If we fix our attention on the root, the left subtree has k nodes, for some k = 0, 1, \ldots, n − 1, while the right subtree contains the remaining n − k − 1 nodes. Every tree with k nodes can be combined with every tree with n − k − 1 nodes to form a tree with n nodes, and therefore we have the recurrence relation:
$$b_n = \sum_{k=0}^{n-1} b_k\,b_{n-k-1}$$
which is the same recurrence as before. Since the initial condition is again $b_0 = 1$ (the empty tree), there are as many trees as there are walks.
Another kind of walk is obtained by considering steps of type ((x, y), (x + 1, y + 1)), i.e., north-east steps, and of type ((x, y), (x + 1, y − 1)), i.e., south-east steps. The interesting walks are those starting
Figure 2.4: Rooted Plane Trees
from the origin and never going below the x-axis; they are called Dyck walks. An obvious 1-1 correspondence exists between Dyck walks and the walks considered above, and again we obtain the sequence of Catalan numbers:

n    0  1  2  3  4   5   6    7    8
b_n  1  1  2  5  14  42  132  429  1430
Finally, the concept of a rooted planar tree is as
follows: let us consider a node, which is the root of
the tree; if we recursively add branches to the root or
to the nodes generated by previous insertions, what
we obtain is a rooted plane tree. If n denotes the
number of branches in a rooted plane tree, in Fig. 2.4
we represent all the trees up to n = 3. Again, rooted
plane trees are counted by Catalan numbers.
2.10 Stirling numbers of the first kind
About 1730, the Scottish mathematician James Stirling was looking for a connection between the powers of a number x, say $x^n$, and the falling factorials $x^{\underline{k}} = x(x-1)\cdots(x-k+1)$. He developed the first
n\k  0  1    2    3    4   5   6
0    1
1    0  1
2    0  1    1
3    0  2    3    1
4    0  6    11   6    1
5    0  24   50   35   10  1
6    0  120  274  225  85  15  1

Table 2.2: Stirling numbers of the first kind
instances:
$$x^{\underline{1}} = x$$
$$x^{\underline{2}} = x(x-1) = x^2 - x$$
$$x^{\underline{3}} = x(x-1)(x-2) = x^3 - 3x^2 + 2x$$
$$x^{\underline{4}} = x(x-1)(x-2)(x-3) = x^4 - 6x^3 + 11x^2 - 6x$$
and picking the coefficients in their proper order (from the smallest power to the largest) he obtained a table of integer numbers. We are mostly interested in the absolute values of these numbers, shown in Table 2.2. After him, these numbers are called Stirling numbers of the first kind and are now denoted by ${n \brack k}$; with this notation we can write:
$$x^{\underline{n}} = \sum_{k=0}^{n} {n \brack k}(-1)^{n-k} x^k.$$
Let us now observe that $x^{\underline{n}} = x^{\underline{n-1}}(x - n + 1)$ and therefore we have:
$$x^{\underline{n}} = (x - n + 1)\sum_{k=0}^{n-1}{n-1 \brack k}(-1)^{n-k-1}x^k = \sum_{k=0}^{n-1}{n-1 \brack k}(-1)^{n-k-1}x^{k+1} - \sum_{k=0}^{n-1}(n-1){n-1 \brack k}(-1)^{n-k-1}x^k = \sum_{k=0}^{n}{n-1 \brack k-1}(-1)^{n-k}x^k + \sum_{k=0}^{n}(n-1){n-1 \brack k}(-1)^{n-k}x^k.$$
We performed the change of variable k → k − 1 in the first sum and then extended both sums from 0 to n. This identity is valid for every value of x, and therefore we can equate its coefficients with those of the previous, general Stirling identity, thus obtaining the recurrence relation:
$${n \brack k} = (n-1){n-1 \brack k} + {n-1 \brack k-1}.$$
This recurrence, together with the initial conditions:
$${n \brack n} = 1\ \ \forall n \in \mathbb{N} \qquad \text{and} \qquad {n \brack 0} = 0\ \ \forall n > 0,$$
completely defines the Stirling numbers of the first kind.
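The recurrence with its initial conditions translates directly into a triangle-building routine (an illustration, not part of the text):

```python
def stirling_first(rows):
    """Unsigned Stirling numbers of the first kind via the recurrence
    [n,k] = (n-1)[n-1,k] + [n-1,k-1], with [0,0] = 1 and [n,0] = 0."""
    table = [[1]]
    for n in range(1, rows):
        prev = table[-1]
        row = [0] * (n + 1)        # [n, 0] = 0 for n > 0
        for k in range(1, n + 1):
            above = prev[k] if k < n else 0   # [n-1, k] (0 beyond the row)
            row[k] = (n - 1) * above + prev[k - 1]
        table.append(row)
    return table

for row in stirling_first(7):
    print(row)   # reproduces Table 2.2 row by row
```

As a cross-check, the row sums equal n!, the identity proved just below.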
What is a possible combinatorial interpretation of these numbers? Let us consider the permutations of n elements and count the permutations having exactly k cycles, whose set will be denoted by $S_{n,k}$. If we fix any element, say the last element n, we observe that the permutations in $S_{n,k}$ can have n as a fixed point, or not. When n is a fixed point and we eliminate it, we obtain a permutation of n − 1 elements having exactly k − 1 cycles; vice versa, any such permutation gives a permutation in $S_{n,k}$ with n as a fixed point if we add the cycle (n) to it. Therefore, there are $|S_{n-1,k-1}|$ such permutations in $S_{n,k}$. When n is not a fixed point and we eliminate it from the permutation, we obtain a permutation of n − 1 elements with k cycles. However, the same permutation is obtained several times, exactly n − 1 times, since n can occur after any other element in the standard cycle representation (it can never occur as the first element in a cycle, by our conventions). For example, all the following permutations in $S_{5,2}$ produce the same permutation in $S_{4,2}$:

(1 2 3)(4 5)  (1 2 3 5)(4)  (1 2 5 3)(4)  (1 5 2 3)(4).
The process can be inverted and therefore we have:
$$|S_{n,k}| = (n-1)\,|S_{n-1,k}| + |S_{n-1,k-1}|$$
which is just the recurrence relation for the Stirling numbers of the first kind. If we now prove that the initial conditions are also the same, we conclude that $|S_{n,k}| = {n \brack k}$. An immediate consequence is:
$$\sum_{k=0}^{n} {n \brack k} = n!,$$
i.e., the row sums of the Stirling triangle of the first kind equal n!, because they correspond to the total number of permutations of n objects. We also observe that:
• ${n \brack 1} = (n-1)!$; in fact, $S_{n,1}$ is composed by all the permutations having a single cycle; in the standard representation this cycle begins with 1, followed by any permutation of the n − 1 remaining numbers;

• ${n \brack n-1} = \binom{n}{2}$; in fact, $S_{n,n-1}$ contains the permutations having all fixed points except a single transposition; but this transposition can only be formed by taking two elements among 1, 2, \ldots, n, which can be done in $\binom{n}{2}$ different ways;

• ${n \brack 2} = (n-1)!\,H_{n-1}$; returning to the numerical definition, the coefficient of $x^2$ is a sum of products, in each of which one of the positive integers 1, 2, \ldots, n−1 is missing, that is, ${n \brack 2} = \sum_{j=1}^{n-1}(n-1)!/j = (n-1)!\,H_{n-1}$.
2.11 Stirling numbers of the
second kind
James Stirling also tried to invert the process described in the previous section; that is, he was also interested in expressing ordinary powers in terms of falling factorials. The first instances are:
$$x^1 = x^{\underline{1}}$$
$$x^2 = x^{\underline{1}} + x^{\underline{2}} = x + x(x-1)$$
$$x^3 = x^{\underline{1}} + 3x^{\underline{2}} + x^{\underline{3}} = x + 3x(x-1) + x(x-1)(x-2)$$
$$x^4 = x^{\underline{1}} + 7x^{\underline{2}} + 6x^{\underline{3}} + x^{\underline{4}}$$
The coefficients can be arranged into a triangular array, as shown in Table 2.3, and are called Stirling numbers of the second kind. The usual notation for them is ${n \brace k}$, so that:
$$x^n = \sum_{k=0}^{n} {n \brace k}\, x^{\underline{k}}.$$
We obtain a recurrence relation in the following way. Since $x\,x^{\underline{k}} = x^{\underline{k+1}} + k\,x^{\underline{k}}$, we have:
$$x^n = x\,x^{n-1} = x\sum_{k=0}^{n-1}{n-1 \brace k}x^{\underline{k}} = \sum_{k=0}^{n-1}{n-1 \brace k}(x - k + k)\,x^{\underline{k}} = \sum_{k=0}^{n-1}{n-1 \brace k}x^{\underline{k+1}} + \sum_{k=0}^{n-1}k\,{n-1 \brace k}x^{\underline{k}} = \sum_{k=0}^{n}{n-1 \brace k-1}x^{\underline{k}} + \sum_{k=0}^{n}k\,{n-1 \brace k}x^{\underline{k}}$$
where, as usual, we performed the change of variable k → k − 1 in the first sum and extended the two sums from 0 to n. The identity is valid for every $x \in \mathbb{R}$, and therefore we can equate the coefficients of $x^{\underline{k}}$ in this and the above identity, thus obtaining the recurrence relation:
$${n \brace k} = k\,{n-1 \brace k} + {n-1 \brace k-1}$$
with the initial conditions:
$${n \brace n} = 1\ \ \forall n \in \mathbb{N} \qquad \text{and} \qquad {n \brace 0} = 0\ \ \forall n \geq 1.$$

n\k  0  1  2   3   4   5   6
0    1
1    0  1
2    0  1  1
3    0  1  3   1
4    0  1  7   6   1
5    0  1  15  25  10  1
6    0  1  31  90  65  15  1

Table 2.3: Stirling numbers of the second kind
These relations completely define the Stirling triangle of the second kind. Every row of the triangle determines a polynomial; for example, from row 4 we obtain $S_4(w) = w + 7w^2 + 6w^3 + w^4$, which is called the 4th Stirling polynomial.
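The second-kind triangle is built exactly like the first-kind one, with the multiplier k in place of n − 1 (an illustration, not part of the text):

```python
def stirling_second(rows):
    """Stirling numbers of the second kind via the recurrence
    {n,k} = k{n-1,k} + {n-1,k-1}, with {0,0} = 1 and {n,0} = 0."""
    table = [[1]]
    for n in range(1, rows):
        prev = table[-1]
        row = [0] * (n + 1)        # {n, 0} = 0 for n >= 1
        for k in range(1, n + 1):
            above = prev[k] if k < n else 0   # {n-1, k} (0 beyond the row)
            row[k] = k * above + prev[k - 1]
        table.append(row)
    return table

for row in stirling_second(7):
    print(row)   # reproduces Table 2.3 row by row
```

Comparing this routine with the first-kind one shows how a single changed factor in the recurrence produces a completely different triangle.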
Let us now look for a combinatorial interpretation of these numbers. If $N_n$ is the usual set $\{1, 2, \ldots, n\}$, we can study the partitions of $N_n$ into k disjoint, non-empty subsets. For example, when n = 4 and k = 2, we have the following 7 partitions:

{1}{2,3,4}  {1,2}{3,4}  {1,3}{2,4}  {1,4}{2,3}  {1,2,3}{4}  {1,2,4}{3}  {1,3,4}{2}.
If $P_{n,k}$ is the corresponding set, we now count $|P_{n,k}|$ by fixing an element in $N_n$, say the last element n. The partitions in $P_{n,k}$ can contain n as a singleton (i.e., as a subset with n as its only element) or can contain n as an element of a larger subset. In the former case, by eliminating n we obtain a partition in $P_{n-1,k-1}$ and, obviously, all partitions in $P_{n-1,k-1}$ can be obtained in such a way. When n belongs to a larger set, we can eliminate it, obtaining a partition in $P_{n-1,k}$; however, the same partition is obtained several times, exactly by eliminating n from any of the k subsets containing it in the various partitions. For example, the following three partitions in $P_{5,3}$ all produce the same partition in $P_{4,3}$:

{1,2,5}{3}{4}  {1,2}{3,5}{4}  {1,2}{3}{4,5}.
This proves the recurrence relation:
$$|P_{n,k}| = k\,|P_{n-1,k}| + |P_{n-1,k-1}|$$
which is the same recurrence as for the Stirling numbers of the second kind. As far as the initial conditions are concerned, we observe that there is only one partition of $N_n$ composed by n subsets, i.e., the partition containing n singletons; therefore $|P_{n,n}| = 1,\ \forall n \in \mathbb{N}$ (in the case n = 0, the empty set is the only partition of the empty set). When n ≥ 1, there is no partition of $N_n$ composed by 0 subsets, and therefore $|P_{n,0}| = 0$. We can conclude that $|P_{n,k}|$ coincides with the corresponding Stirling number of the second kind, and use this fact to observe that:
• ${n \brace 1} = 1$; there is a single partition of $N_n$ into one subset, namely $N_n$ itself;

• ${n \brace 2} = 2^{n-1} - 1,\ \forall n \geq 2$. When the partition is composed by only two subsets, the first one uniquely determines the second. Let us suppose that the first subset always contains 1. By eliminating 1, we obtain as first subset any of the $2^{n-1}$ subsets of $N_n \setminus \{1\}$, except this last whole set, which would correspond to an empty second subset. This proves the identity;

• ${n \brace n-1} = \binom{n}{2}$; such a partition consists of n − 2 singletons plus a single subset with two elements, which can be chosen in $\binom{n}{2}$ different ways.
2.12 Bell and Bernoulli numbers

If we sum the rows of the Stirling triangle of the second kind, we find the sequence:

n    0  1  2  3  4   5   6    7    8
B_n  1  1  2  5  15  52  203  877  4140

which represents the total number of partitions of the set $N_n$. For example, the five partitions of a set with three elements are:

{1,2,3}  {1}{2,3}  {1,2}{3}  {1,3}{2}  {1}{2}{3}.

The numbers in this sequence are called Bell numbers and are denoted by $B_n$; by definition we have:
$$B_n = \sum_{k=0}^{n} {n \brace k}.$$
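The definition as row sums gives a direct way to compute Bell numbers (a self-contained illustration, not part of the text):

```python
def bell_numbers(limit):
    """B_n as row sums of the Stirling triangle of the second kind,
    built on the fly with {n,k} = k{n-1,k} + {n-1,k-1}."""
    bells = []
    row = [1]                      # Stirling row for n = 0
    for n in range(limit):
        bells.append(sum(row))
        new = [0] * (n + 2)        # Stirling row for n + 1
        for k in range(1, n + 2):
            above = row[k] if k <= n else 0
            new[k] = k * above + row[k - 1]
        row = new
    return bells

print(bell_numbers(9))  # [1, 1, 2, 5, 15, 52, 203, 877, 4140]
```

Only one Stirling row is kept in memory at a time, so the space cost is linear in n.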
Bell numbers grow very fast. If we also take into account the order in which the k subsets of a partition are listed, every partition counted by ${n \brace k}$ gives rise to k! ordered partitions, and we obtain the ordered Bell numbers:
$$O_n = \sum_{k=0}^{n} {n \brace k}\,k!;$$
this shows that $O_n \geq n!$ and, in fact, $O_n > n!,\ \forall n > 1$.
Another combinatorial interpretation of the ordered Bell numbers is as follows. Let us fix an integer $n \in \mathbb{N}$ and, for every k ≤ n, let $A_k$ be any multiset with n elements containing at least once all the numbers 1, 2, \ldots, k. The number of all the possible orderings of the $A_k$'s is just the nth ordered Bell number. For example, when n = 3, the possible multisets are: {1,1,1}, {1,1,2}, {1,2,2}, {1,2,3}. Their possible orderings are given by the following 7 vectors:

(1,1,1)  (1,1,2)  (1,2,1)  (2,1,1)  (1,2,2)  (2,1,2)  (2,2,1)

plus the six permutations of the set {1,2,3}. These orderings are called preferential arrangements.
We can find a 1-1 correspondence between the orderings of set partitions and preferential arrangements. If $(a_1, a_2, \ldots, a_n)$ is a preferential arrangement, we build the corresponding ordered partition by putting the element 1 in the $a_1$th subset, 2 in the $a_2$th subset, and so on. If k is the largest number in the arrangement, we build exactly k subsets. For example, the partition corresponding to (1, 2, 2, 1) is {1,4}{2,3}, while the partition corresponding to (2, 1, 1, 2) is {2,3}{1,4}, whose ordering is different. This construction can easily be inverted and, since it is injective, we have proved that it is actually a 1-1 correspondence. Because of that, ordered Bell numbers are also called preferential arrangement numbers.
We conclude this section by introducing another important sequence of numbers. These are (positive or negative) rational numbers, and therefore they cannot correspond to any counting problem, i.e., their combinatorial interpretation cannot be direct. However, they arise in many combinatorial problems, and therefore they should be examined here, for the moment only by introducing their definition. The Bernoulli numbers are implicitly defined by the recurrence relation:
$$\sum_{k=0}^{n} \binom{n+1}{k} B_k = \delta_{n,0}$$
where $\delta_{n,0}$ is the Kronecker delta, equal to 1 when n = 0 and to 0 otherwise.
No initial condition is necessary, because for n = 0 the relation reads $\binom{1}{0}B_0 = 1$, i.e., $B_0 = 1$. This is the starting value, and $B_1$ is obtained by setting n = 1 in the recurrence relation:
$$\binom{2}{0}B_0 + \binom{2}{1}B_1 = 0.$$
We obtain $B_1 = -1/2$, and we now have a formula for $B_2$:
$$\binom{3}{0}B_0 + \binom{3}{1}B_1 + \binom{3}{2}B_2 = 0.$$
By performing the necessary computations, we find $B_2 = 1/6$, and we can go on, successively obtaining all the values of the $B_n$'s. The first thirteen values are as follows:

n    0  1     2    3  4      5  6
B_n  1  -1/2  1/6  0  -1/30  0  1/42

n    7  8      9  10    11  12
B_n  0  -1/30  0  5/66  0   -691/2730
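The implicit recurrence is triangular: each equation involves one new unknown, so the $B_n$'s can be computed one after the other with exact rational arithmetic (an illustration, not part of the text):

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(limit):
    """B_0, ..., B_(limit-1) from sum_{k=0}^{n} C(n+1, k) B_k = [n = 0]:
    solve each equation for its single new unknown B_n."""
    B = []
    for n in range(limit):
        rhs = Fraction(1 if n == 0 else 0)       # the Kronecker delta
        acc = sum((comb(n + 1, k) * B[k] for k in range(n)), Fraction(0))
        B.append((rhs - acc) / comb(n + 1, n))   # coefficient of B_n
    return B

print([str(b) for b in bernoulli_numbers(7)])
# ['1', '-1/2', '1/6', '0', '-1/30', '0', '1/42']
```

Exact fractions are essential here: the alternating, fast-growing values (such as −691/2730) would be destroyed by floating-point rounding.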
Except for $B_1$, all the values of $B_n$ for odd n are zero. Initially, Bernoulli numbers seem to be small but, as n grows, they become extremely large in modulus and, apart from the zero values, they are alternately positive and negative. These and other properties of the Bernoulli numbers are not easily proved in a direct way, i.e., from their definition. However, we'll see later how we can arrange things in such a way that everything becomes accessible to us.
Chapter 3
Formal power series
3.1 Definitions for formal power series
Let $\mathbb{R}$ be the field of real numbers and let t be any indeterminate over $\mathbb{R}$, i.e., a symbol different from any element in $\mathbb{R}$. A formal power series (f.p.s.) over $\mathbb{R}$ in the indeterminate t is an expression:
$$f(t) = f_0 + f_1 t + f_2 t^2 + f_3 t^3 + \cdots + f_n t^n + \cdots = \sum_{k=0}^{\infty} f_k t^k$$
where $f_0, f_1, f_2, \ldots$ are all real numbers. The same definition applies to every set of numbers, in particular to the field of rational numbers $\mathbb{Q}$ and to the field of complex numbers $\mathbb{C}$. The developments we are now going to see depend only on the field structure of the numeric set, and can be easily extended to every field $\mathbb{F}$ of characteristic 0. The set of formal power series over $\mathbb{F}$ in the indeterminate t is denoted by $\mathbb{F}[[t]]$. The use of a particular indeterminate t is irrelevant, and there exists an obvious 1-1 correspondence between, say, $\mathbb{F}[[t]]$ and $\mathbb{F}[[y]]$; it is simple to prove that this correspondence is indeed an isomorphism. In order to stress that our results are substantially independent of the particular field $\mathbb{F}$ and of the particular indeterminate t, we denote $\mathbb{F}[[t]]$ by $\mathcal{T}$, but the reader can think of $\mathcal{T}$ as $\mathbb{R}[[t]]$. In fact, in combinatorial analysis and in the analysis of algorithms, the coefficients $f_0, f_1, f_2, \ldots$ of a formal power series are mostly used to count objects, and therefore they are positive integer numbers or, in some cases, positive rational numbers (e.g., when they are the coefficients of an exponential generating function; see below and Section 4.1).
If $f(t) \in \mathcal{T}$, the order of f(t), denoted by $\mathrm{ord}(f(t))$, is the smallest index r for which $f_r \neq 0$. The set of all f.p.s. of order exactly r is denoted by $\mathcal{T}_r$ or by $\mathbb{F}_r[[t]]$. The formal power series $0 = 0 + 0t + 0t^2 + 0t^3 + \cdots$ has infinite order.
If $(f_0, f_1, f_2, \ldots) = (f_k)_{k\in\mathbb{N}}$ is a sequence of (real) numbers, there is no substantial difference between the sequence and the f.p.s. $\sum_{k=0}^{\infty} f_k t^k$, which will be called the (ordinary) generating function of the sequence. The term "ordinary" is used to distinguish these functions from exponential generating functions, which will be introduced in the next chapter. The indeterminate t is used as a place-marker, i.e., a symbol denoting the place of an element in the sequence. For example, in the f.p.s. $1 + t + t^2 + t^3 + \cdots$, corresponding to the sequence $(1, 1, 1, \ldots)$, the term $t^5 = 1\cdot t^5$ simply denotes that the element in position 5 (starting from 0) in the sequence is the number 1. Although our study of f.p.s. is mainly justified by the development of a generating function theory, we dedicate the present chapter to the general theory of f.p.s. and postpone the study of generating functions to the next chapter.
There are two main reasons why f.p.s. are more easily studied than sequences:

1. the algebraic structure of f.p.s. is very well understood and can be developed in a standard way;

2. many f.p.s. can be abbreviated by expressions easily manipulated by elementary algebra.

The present chapter is devoted to these algebraic aspects of f.p.s.. For example, we will prove that the series $1 + t + t^2 + t^3 + \cdots$ can be conveniently abbreviated as $1/(1-t)$, and from this fact we will be able to infer that the series has a f.p.s. inverse, which is $1 - t + 0t^2 + 0t^3 + \cdots$.
We conclude this section by defining the concept of a formal Laurent (power) series (f.L.s.) as an expression:
$$g(t) = g_{-m}t^{-m} + g_{-m+1}t^{-m+1} + \cdots + g_{-1}t^{-1} + g_0 + g_1 t + g_2 t^2 + \cdots = \sum_{k=-m}^{\infty} g_k t^k.$$
The set of f.L.s. strictly contains the set of f.p.s.. For a f.L.s. g(t) the order can be negative; when the order of g(t) is non-negative, then g(t) is actually a f.p.s.. We observe explicitly that an expression such as $\sum_{k=-\infty}^{\infty} f_k t^k$ does not represent a f.L.s..
3.2 The basic algebraic struc-
ture
The set $\mathcal{T}$ of f.p.s. can be embedded into several algebraic structures. We are now going to define the most common one, which is related to the usual concepts of sum and (Cauchy) product of series. Given two f.p.s. $f(t) = \sum_{k=0}^{\infty} f_k t^k$ and $g(t) = \sum_{k=0}^{\infty} g_k t^k$, the sum of f(t) and g(t) is defined as:
$$f(t) + g(t) = \sum_{k=0}^{\infty} f_k t^k + \sum_{k=0}^{\infty} g_k t^k = \sum_{k=0}^{\infty} (f_k + g_k)\,t^k.$$
From this definition, it immediately follows that $\mathcal{T}$ is a commutative group with respect to the sum. The associative and commutative laws follow directly from the analogous properties in the field $\mathbb{F}$; the identity is the f.p.s. $0 = 0 + 0t + 0t^2 + 0t^3 + \cdots$, and the opposite series of $f(t) = \sum_{k=0}^{\infty} f_k t^k$ is the series $-f(t) = \sum_{k=0}^{\infty} (-f_k)\,t^k$.
Let us now define the Cauchy product of f(t) by g(t):
$$f(t)g(t) = \left(\sum_{k=0}^{\infty} f_k t^k\right)\left(\sum_{k=0}^{\infty} g_k t^k\right) = \sum_{k=0}^{\infty}\left(\sum_{j=0}^{k} f_j\,g_{k-j}\right)t^k.$$
Because of the form of the coefficient of $t^k$, this is also called the convolution of f(t) and g(t). It is a good idea to write down explicitly the first terms of the Cauchy product:
$$f(t)g(t) = f_0g_0 + (f_0g_1 + f_1g_0)t + (f_0g_2 + f_1g_1 + f_2g_0)t^2 + (f_0g_3 + f_1g_2 + f_2g_1 + f_3g_0)t^3 + \cdots$$
This clearly shows that the product is commutative, and it is a simple matter to prove that the identity is the f.p.s. $1 = 1 + 0t + 0t^2 + 0t^3 + \cdots$. The distributive law is a consequence of the distributive law valid in $\mathbb{F}$. In fact, we have:
$$(f(t) + g(t))h(t) = \sum_{k=0}^{\infty}\left(\sum_{j=0}^{k}(f_j + g_j)h_{k-j}\right)t^k = \sum_{k=0}^{\infty}\left(\sum_{j=0}^{k}f_j h_{k-j}\right)t^k + \sum_{k=0}^{\infty}\left(\sum_{j=0}^{k}g_j h_{k-j}\right)t^k = f(t)h(t) + g(t)h(t).$$
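On a computer, a f.p.s. is naturally represented by a truncated list of coefficients, and the Cauchy product becomes the convolution of two lists (an illustration, not part of the text):

```python
def cauchy_product(f, g):
    """Convolution of two truncated f.p.s. given as coefficient lists:
    the coefficient of t^k in f(t)g(t) is sum_{j=0}^{k} f_j g_(k-j)."""
    n = min(len(f), len(g))
    return [sum(f[j] * g[k - j] for j in range(k + 1)) for k in range(n)]

# (1 + t + t^2 + ...) * (1 - t) = 1, up to the truncation order
ones = [1] * 6
print(cauchy_product(ones, [1, -1, 0, 0, 0, 0]))  # [1, 0, 0, 0, 0, 0]
```

Truncation is harmless because the coefficient of $t^k$ in the product depends only on the coefficients of degree at most k in the factors.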
Finally, we can prove that $\mathcal{T}$ does not contain any zero divisor. If f(t) and g(t) are two f.p.s. different from zero, then we can suppose that $\mathrm{ord}(f(t)) = k_1$ and $\mathrm{ord}(g(t)) = k_2$, with $0 \leq k_1, k_2 < \infty$. This means $f_{k_1} \neq 0$ and $g_{k_2} \neq 0$; therefore, the product f(t)g(t) has the term of degree $k_1 + k_2$ with coefficient $f_{k_1}g_{k_2} \neq 0$, and so it cannot be zero. We conclude that $(\mathcal{T}, +, \cdot)$ is an integrity domain. The previous reasoning also shows that, in general, we have:
$$\mathrm{ord}(f(t)g(t)) = \mathrm{ord}(f(t)) + \mathrm{ord}(g(t)).$$
The order of the identity 1 is obviously 0; if f(t) is an invertible element of $\mathcal{T}$, we should have $f(t)f(t)^{-1} = 1$ and therefore $\mathrm{ord}(f(t)) = 0$. On the other hand, if $f(t) \in \mathcal{T}_0$, i.e., $f(t) = f_0 + f_1 t + f_2 t^2 + f_3 t^3 + \cdots$ with $f_0 \neq 0$, we can easily prove that f(t) is invertible. In fact, let $g(t) = f(t)^{-1}$, so that f(t)g(t) = 1. From the explicit expression for the Cauchy product, we can determine the coefficients of g(t) by solving the infinite system of linear equations:
$$f_0 g_0 = 1$$
$$f_0 g_1 + f_1 g_0 = 0$$
$$f_0 g_2 + f_1 g_1 + f_2 g_0 = 0$$
$$\cdots$$
The system can be solved in a simple way, starting with the first equation and proceeding one equation after the other. Explicitly, we obtain:
$$g_0 = f_0^{-1} \qquad g_1 = -\frac{f_1}{f_0^2} \qquad g_2 = \frac{f_1^2}{f_0^3} - \frac{f_2}{f_0^2} \qquad \cdots$$
and therefore $g(t) = f(t)^{-1}$ is well defined. We conclude by stating the result just obtained: a f.p.s. is invertible if and only if its order is 0. Because of that, $\mathcal{T}_0$ is also called the set of invertible f.p.s.; according to standard terminology, the elements of $\mathcal{T}_0$ are called the units of the integrity domain.
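The triangular system can be solved mechanically, one coefficient at a time (an illustration, not part of the text):

```python
from fractions import Fraction

def fps_inverse(f, terms):
    """First `terms` coefficients of g(t) = f(t)^(-1): solve
    f_0 g_0 = 1 and sum_{j=0}^{k} f_j g_(k-j) = 0 for k >= 1.
    Requires f_0 != 0, i.e., a f.p.s. of order 0."""
    f = [Fraction(x) for x in f] + [Fraction(0)] * terms  # pad with zeros
    g = [1 / f[0]]
    for k in range(1, terms):
        g.append(-sum(f[j] * g[k - j] for j in range(1, k + 1)) / f[0])
    return g

# the inverse of 1 - t: all coefficients equal 1
print(fps_inverse([1, -1], 6))
```

Each new $g_k$ uses only the previously computed $g_0, \ldots, g_{k-1}$, mirroring the "one equation after the other" procedure in the text.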
As a simple example, let us compute the inverse of the f.p.s. $1 - t = 1 - t + 0t^2 + 0t^3 + \cdots$. Here we have $f_0 = 1$, $f_1 = -1$ and $f_k = 0,\ \forall k > 1$. The system becomes:
$$g_0 = 1 \qquad g_1 - g_0 = 0 \qquad g_2 - g_1 = 0 \qquad \cdots$$
and we easily obtain that all the $g_j$'s ($j = 0, 1, 2, \ldots$) are 1. Therefore, the inverse f.p.s. we are looking for is $1 + t + t^2 + t^3 + \cdots$. The usual notation for this fact is:
$$\frac{1}{1-t} = 1 + t + t^2 + t^3 + \cdots.$$
It is well known that this identity is only valid for $-1 < t < 1$ when t is a variable and f.p.s. are interpreted as functions. In our formal approach, however, these considerations are irrelevant, and the identity is valid from a purely formal point of view.
3.3 Formal Laurent Series
In the first section of this chapter, we introduced the concept of a formal Laurent series as an extension of the concept of a f.p.s.; if $a(t) = \sum_{k=m}^{\infty} a_k t^k$ and $b(t) = \sum_{k=n}^{\infty} b_k t^k$ ($m, n \in \mathbb{Z}$) are two f.L.s., we can define their sum and Cauchy product:
$$a(t) + b(t) = \sum_{k=m}^{\infty} a_k t^k + \sum_{k=n}^{\infty} b_k t^k = \sum_{k=p}^{\infty} (a_k + b_k)\,t^k$$
$$a(t)b(t) = \left(\sum_{k=m}^{\infty} a_k t^k\right)\left(\sum_{k=n}^{\infty} b_k t^k\right) = \sum_{k=q}^{\infty}\left(\sum_{i+j=k} a_i b_j\right)t^k$$
where p = min(m, n) and q = m + n. As we did for
f.p.s., it is not difficult to find out that these operations
enjoy the usual properties of sum and product,
and if we denote by L the set of f.L.s., we have that
(L, +, \cdot) is a field. The only point we should formally
prove is that every f.L.s. a(t) = \sum_{k=m} a_k t^k \ne 0 has
an inverse f.L.s. b(t) = \sum_{k=-m} b_k t^k. However, this
is proved in the same way we proved that every f.p.s.
in T_0 has an inverse. In fact we should have:
    a_m b_{-m} = 1
    a_m b_{-m+1} + a_{m+1} b_{-m} = 0
    a_m b_{-m+2} + a_{m+1} b_{-m+1} + a_{m+2} b_{-m} = 0
    ...

By solving the first equation, we find b_{-m} = a_m^{-1};
then the system can be solved one equation after the
other, by substituting the values obtained up to that
moment. Since a_m b_{-m} is the coefficient of t^0, we have
a(t)b(t) = 1 and the proof is complete.
We can now show that (L, +, \cdot) is the smallest
field containing the integrity domain (T, +, \cdot), thus
characterizing the set of f.L.s. in an algebraic way.
From Algebra we know that, given an integrity domain
(K, +, \cdot), the smallest field (F, +, \cdot) containing
(K, +, \cdot) can be built in the following way: let us define
an equivalence relation \sim on the set K \times K:

    (a, b) \sim (c, d)  \iff  ad = bc;

if we now set F = K \times K / \sim, the set F with the
operations + and \cdot defined as the extension of + and
\cdot in K is the field we are searching for. This is just
the way in which the field Q of rational numbers is
constructed from the integrity domain Z of integer
numbers, and the field of rational functions is built
from the integrity domain of the polynomials.
Our aim is to show that the field (L, +, \cdot) of f.L.s. is
isomorphic with the field constructed in the described
way starting with the integrity domain of f.p.s.. Let
\sum_{k=0} d_k t^k \in T_0, and consequently
let us consider the f.L.s. l(t) = \sum_{k=0} d_k t^{k-m}; by
construction, it is uniquely determined by a(t), b(t)
or also by f(t), g(t). It is now easy to see that l(t)
is the inverse of the f.p.s. b(t)/a(t) in the sense of
f.L.s. as considered above, and our proof is complete.
This shows that the correspondence is a 1-1 correspondence
between L and \tilde{L} preserving the inverse,
so it is now obvious that the correspondence is also
an isomorphism between (L, +, \cdot) and (\tilde{L}, +, \cdot).
Because of this result, we can identify \tilde{L} and L
and assert that (L, +, \cdot) is indeed the smallest field
containing (T, +, \cdot). From now on, the set \tilde{L} will be
ignored and we will always refer to L as the field of
f.L.s..
3.4 Operations on formal power series

Besides the four basic operations (addition, subtraction,
multiplication and division), it is possible to consider
other operations on T, only a few of which can
be extended to L.
The most important operation is surely taking a
power of a f.p.s.; if p \in N we can recursively define:

    f(t)^0 = 1                    if p = 0
    f(t)^p = f(t) f(t)^{p-1}      if p > 0

and observe that ord(f(t)^p) = p \cdot ord(f(t)). Therefore,
f(t)^p \in T_0 if and only if f(t) \in T_0; on the
other hand, if f(t) \notin T_0, then the order of f(t)^p
becomes larger and larger and goes to \infty when p \to \infty.
This property will be important in our future developments,
when we will reduce many operations to
infinite sums involving the powers f(t)^p with p \in N.
If f(t) \notin T_0, i.e., ord(f(t)) > 0, these sums involve
elements of larger and larger order, and therefore for
every index k we can determine the coefficient of t^k
by only a finite number of terms. This assures that
our definitions will be good definitions.
We wish also to observe that taking a positive integer
power can be easily extended to L; in this case,
when ord(f(t)) < 0, ord(f(t)^p) decreases, but remains
always finite. In particular, for g(t) = f(t)^{-1} we have
g(t)^p = f(t)^{-p}, and powers can be extended to all
integers p \in Z.
When the exponent p is a real or complex number
whatsoever, we should restrict f(t)^p to the case
f(t) \in T_0; in fact, if f(t) = t^m g(t), we would have
f(t)^p = (t^m g(t))^p = t^{mp} g(t)^p; however, t^{mp} is an
expression without any mathematical sense. Instead,
if f(t) \in T_0, let us write f(t) = f_0 + v(t), with
ord(v(t)) > 0. For \bar{v}(t) = v(t)/f_0, we have by Newton's
rule:

    f(t)^p = (f_0 + v(t))^p = f_0^p (1 + \bar{v}(t))^p = f_0^p \sum_{k=0}^{\infty} \binom{p}{k} \bar{v}(t)^k,
which can be assumed as a definition. In the last
expression, we can observe that: i) f_0^p \in C; ii) \binom{p}{k} is
defined for every value of p, k being a non-negative
integer; iii) \bar{v}(t)^k is well-defined by the considerations
above and ord(\bar{v}(t)^k) grows indefinitely, so that
the coefficient of every power of t is obtained by a finite
sum. We can conclude that f(t)^p is well-defined.
Particular cases are p = -1 and p = 1/2. In the
former case, f(t)^{-1} is the inverse of the f.p.s. f(t).
We have already seen a method for computing f(t)^{-1},
but now we obtain the following formula:

    f(t)^{-1} = \frac{1}{f_0} \sum_{k=0}^{\infty} \binom{-1}{k} \bar{v}(t)^k = \frac{1}{f_0} \sum_{k=0}^{\infty} (-1)^k \bar{v}(t)^k.
For p = 1/2, we obtain a formula for the square root
of a f.p.s.:

    f(t)^{1/2} = \sqrt{f(t)} = \sqrt{f_0} \sum_{k=0}^{\infty} \binom{1/2}{k} \bar{v}(t)^k =
               = \sqrt{f_0} \sum_{k=0}^{\infty} \frac{(-1)^{k-1}}{4^k (2k-1)} \binom{2k}{k} \bar{v}(t)^k.
In Section 3.12 we will see how f(t)^p can be obtained
computationally without actually performing
the powers \bar{v}(t)^k. We conclude by observing that this
more general operation of taking the power p \in R
cannot be extended to f.L.s.: in fact, we would have
smaller and smaller terms t^k (k \to -\infty), and therefore
the resulting expression cannot be considered an
actual f.L.s., which requires a term with smallest degree.
By applying well-known rules of the exponential
and logarithmic functions, we can easily define the
corresponding operations for f.p.s., which however,
as will be apparent, cannot be extended to f.L.s.. For
the exponentiation we have, for f(t) \in T and f(t) = f_0 + v(t):

    e^{f(t)} = \exp(f_0 + v(t)) = e^{f_0} \sum_{k=0}^{\infty} \frac{v(t)^k}{k!}.
Again, since v(t) \notin T_0, the order of v(t)^k increases
with k and the sums necessary to compute the coefficient
of t^k are always finite. The formula makes
clear that exponentiation can be performed on every
f(t) \in T, and when f(t) \notin T_0 (so that f_0 = 0) the
factor e^{f_0} is not present.
For the logarithm, let us suppose f(t) \in T_0, with
f(t) = f_0 + v(t) and \bar{v}(t) = v(t)/f_0; then we have:

    \ln(f_0 + v(t)) = \ln f_0 + \ln(1 + \bar{v}(t)) = \ln f_0 + \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} \bar{v}(t)^k.

In this case, for f(t) \notin T_0, we cannot define the logarithm,
and this shows an asymmetry between exponential
and logarithm.
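Both series are easy to evaluate with truncated coefficient vectors, again because ord(v(t)^k) \ge k keeps every coefficient a finite sum. A sketch (function names are mine; exact rationals assumed):

```python
from fractions import Fraction
from math import factorial

def mul(a, b, n):
    # Cauchy product truncated at t^n (both inputs of length n + 1)
    return [sum(a[j] * b[i - j] for j in range(i + 1)) for i in range(n + 1)]

def fps_exp(v, n):
    # exp(v(t)) = sum_k v(t)^k / k!, finite sums because v[0] == 0
    v = [Fraction(c) for c in v] + [Fraction(0)] * (n + 1 - len(v))
    res, vk = [Fraction(0)] * (n + 1), [Fraction(1)] + [Fraction(0)] * n
    for k in range(n + 1):
        res = [r + c / factorial(k) for r, c in zip(res, vk)]
        vk = mul(vk, v, n)
    return res

def fps_log1p(v, n):
    # log(1 + v(t)) = sum_{k>=1} (-1)^{k+1} v(t)^k / k, again for v[0] == 0
    v = [Fraction(c) for c in v] + [Fraction(0)] * (n + 1 - len(v))
    res, vk = [Fraction(0)] * (n + 1), v[:]
    for k in range(1, n + 1):
        res = [r + Fraction((-1) ** (k + 1), k) * c for r, c in zip(res, vk)]
        vk = mul(vk, v, n)
    return res

# log(exp(t)) recovers t up to the truncation order
e = fps_exp([0, 1], 5)
w = e[:]; w[0] -= 1                 # exp(t) - 1, which has order 1
print(fps_log1p(w, 5))
```

The round trip log(exp(t)) = t illustrates that, on series of positive order, the two operations are mutually inverse.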
Another important operation is differentiation:

    D f(t) = \frac{d}{dt} f(t) = \sum_{k=1}^{\infty} k f_k t^{k-1} = f'(t).
This operation can be performed on every f(t) \in L,
and a very important observation is the following:

Theorem 3.4.1  For every f(t) \in L, its derivative
f'(t) contains no term in t^{-1}, i.e., [t^{-1}] f'(t) = 0.

The inverse operation of differentiation is integration,
which for a f.p.s. is defined by:

    \int_0^t f(\tau) d\tau = \sum_{k=0}^{\infty} f_k \int_0^t \tau^k d\tau = \sum_{k=0}^{\infty} \frac{f_k}{k+1} t^{k+1}.
Our purely formal approach allows us to exchange
the integration and summation signs; in general, as
we know, this is only possible when the convergence is
uniform. By this definition, \int_0^t f(\tau) d\tau never belongs
to T_0. Integration can be extended to f.L.s. with an
obvious exception: because integration is the inverse
operation of differentiation, we cannot apply integration
to a f.L.s. containing a term in t^{-1}. Formally,
from the definition above, such a term would imply a
division by 0, and this is not allowed. In all the other
cases, integration does not create any problem.
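In the truncated-vector representation both operations are one-liners; note how differentiation loses one significant component while integration gains one, as discussed later in Section 3.10 (a small sketch, with names of my own choosing):

```python
from fractions import Fraction

def fps_diff(f):
    # D f(t) = sum_{k>=1} k f_k t^{k-1}: one significant component is lost
    return [Fraction(k) * f[k] for k in range(1, len(f))]

def fps_int(f):
    # integral of f from 0 to t: sum_k f_k t^{k+1} / (k+1); the result
    # never belongs to T_0, since its constant term is 0
    return [Fraction(0)] + [Fraction(f[k]) / (k + 1) for k in range(len(f))]

g = [Fraction(1), Fraction(2), Fraction(3)]      # 1 + 2t + 3t^2
print(fps_diff(fps_int(g)))                      # differentiation undoes integration
```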
3.5 Composition
A last operation on f.p.s. is so important that we
dedicate a complete section to it. The operation is the
composition of two f.p.s.. Let f(t) \in T and g(t) \notin T_0;
then we define the composition of f(t) by g(t) as
the f.p.s.:

    f(g(t)) = \sum_{k=0}^{\infty} f_k g(t)^k.

This definition justifies the fact that g(t) cannot belong
to T_0; in fact, otherwise, infinite sums would be involved
in the computation of f(g(t)). In connection
with the composition of f.p.s., we will use the following
notation:

    f(g(t)) = [f(y) | y = g(t)].
Since [y | y = g(t)] = g(t) and [f(y) | y = t] = f(t),
t is a left and right identity for composition. As a second
fact, we show that a f.p.s. f(t) has an inverse
with respect to composition if and only if f(t) \in T_1.
Note that g(t) is the inverse of f(t) if and only if
f(g(t)) = t and g(f(t)) = t. From this, we deduce
immediately that f(t) \notin T_0 and g(t) \notin T_0.
On the other hand, it is clear that ord(f(g(t))) =
ord(f(t)) ord(g(t)) by our initial definition, and since
ord(t) = 1 and ord(f(t)) > 0, ord(g(t)) > 0, we must
have ord(f(t)) = ord(g(t)) = 1.
Let us now come to the main part of the proof
and consider the set T_1 with the operation \circ of composition;
composition is always associative, and
therefore (T_1, \circ) is a group if we prove that every
f(t) \in T_1 has a left (or right) inverse, because the
theory assures that the other inverse exists and coincides
with the previously found inverse. Let f(t) =
f_1 t + f_2 t^2 + f_3 t^3 + ... and g(t) = g_1 t + g_2 t^2 + g_3 t^3 + ...;
we have:
we have:
f(g(t)) = f
1
(g
1
t +g
2
t
2
+g
3
t
3
+ ) +
+f
2
(g
2
1
t
2
+ 2g
1
g
2
t
3
+ ) +
+f
3
(g
3
1
t
3
+ ) + =
= f
1
g
1
t + (f
1
g
2
+f
2
g
2
1
)t
2
+
+ (f
1
g
3
+ 2f
2
g
1
g
2
+f
3
g
3
1
)t
3
+
= t
In order to determine g(t) we have to solve the system:

    f_1 g_1 = 1
    f_1 g_2 + f_2 g_1^2 = 0
    f_1 g_3 + 2 f_2 g_1 g_2 + f_3 g_1^3 = 0
    ...
The first equation gives g_1 = 1/f_1; we can substitute
this value in the second equation and obtain a value
for g_2; the two values for g_1 and g_2 can be substituted
in the third equation to obtain a value for g_3.
Continuing in this way, we obtain the values of all the
coefficients of g(t), and therefore g(t) is determined
in a unique way. In fact, we observe that, by construction,
in the kth equation g_k appears in linear
form and its coefficient is always f_1. Being f_1 \ne 0,
g_k is unique even if the other g_r (r < k) appear with
powers.
The f.p.s. g(t) such that f(g(t)) = t, and therefore
such that g(f(t)) = t as well, is called the compositional
inverse of f(t). In the literature, it is usually
denoted by \bar{f}(t) or f^{[-1]}(t); we will adopt the first
notation. Obviously, \bar{\bar{f}}(t) = f(t), and sometimes \bar{f}(t) is
also called the reverse of f(t). Given f(t) \in T_1, the
determination of its compositional inverse is one of
the most interesting problems in the theory of f.p.s.
or f.L.s.; it was solved by Lagrange, and we will discuss
it in the following sections. Note that, in principle,
the g_k's can be computed by solving the system
above; this, however, is too complicated, and nobody
would follow that way except as an exercise.
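Cumbersome by hand, the triangular system is mechanical on a computer: g_k always appears linearly with coefficient f_1. The sketch below (my own naming; truncated rational vectors assumed) exploits exactly this observation:

```python
from fractions import Fraction

def mul(a, b, n):
    return [sum(a[j] * b[i - j] for j in range(i + 1)) for i in range(n + 1)]

def comp(f, g, n):
    # f(g(t)) truncated at t^n; the sum is finite because g[0] == 0
    res = [Fraction(0)] * (n + 1)
    gk = [Fraction(1)] + [Fraction(0)] * n
    for k in range(min(len(f), n + 1)):
        res = [r + Fraction(f[k]) * c for r, c in zip(res, gk)]
        gk = mul(gk, g, n)
    return res

def comp_inverse(f, n):
    # solve f(g(t)) = t coefficient by coefficient; needs f[0] = 0, f[1] != 0
    g = [Fraction(0), Fraction(1) / f[1]]
    for k in range(2, n + 1):
        c = comp(f, g + [Fraction(0)], k)[k]   # coefficient of t^k with g_k = 0
        g.append(-c / f[1])                    # the equation is f_1 g_k + c = 0
    return g

# the compositional inverse of f(t) = t - t^2 starts with the Catalan numbers
print(comp_inverse([Fraction(0), Fraction(1), Fraction(-1)], 5))
```

The example anticipates Section 3.9, where the same coefficients reappear via the Lagrange Inversion Formula.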
3.6 Coefficient extraction

If f(t) \in L, or in particular f(t) \in T, the notation
[t^n] f(t) indicates the extraction of the coefficient of
t^n from f(t), and therefore we have [t^n] f(t) = f_n.
In this sense, [t^n] can be seen as a mapping [t^n] :
L \to R or [t^n] : L \to C, according to the field
underlying the set L or T. Because of that, [t^n]
    (linearity)        [t^n](\alpha f(t) + \beta g(t)) = \alpha [t^n] f(t) + \beta [t^n] g(t)   (K1)
    (shifting)         [t^n] t f(t) = [t^{n-1}] f(t)                                            (K2)
    (differentiation)  [t^n] f'(t) = (n+1) [t^{n+1}] f(t)                                       (K3)
    (convolution)      [t^n] f(t) g(t) = \sum_{k=0}^{n} [t^k] f(t) [t^{n-k}] g(t)               (K4)
    (composition)      [t^n] f(g(t)) = \sum_{k=0}^{\infty} ([y^k] f(y)) [t^n] g(t)^k            (K5)

Table 3.1: The rules for coefficient extraction
is called an operator, and precisely the coefficient of
operator or, more simply, the coefficient operator.
In Table 3.1 we state formally the main properties
of this operator, by collecting what we said in the previous
sections. We observe that \alpha, \beta \in R or \alpha, \beta \in C
are arbitrary constants; the use of the indeterminate y is
only necessary in order not to confuse the action on different
f.p.s.; because g(0) = 0 in composition, the last sum
is actually finite. Some points require more lengthy
comments. The property of shifting can be easily generalized
to [t^n] t^k f(t) = [t^{n-k}] f(t) and also to negative
powers: [t^n] f(t)/t^k = [t^{n+k}] f(t). These rules are
very important and are often applied in the theory of
f.p.s. and f.L.s.. In the former case, some care should
be exercised to see whether the properties remain in
the realm of T or go beyond it, invading the domain
of L, which can be not always correct. The property
of differentiation for n = -1 gives [t^{-1}] f'(t) = 0, a
situation we already noticed. The operator [t^{-1}] is
also called the residue and is denoted by res; so, for
example, people write res f(t) instead of [t^{-1}] f(t).
The rule of differentiation gives [t^n](1+t)^r =
\frac{r}{n} [t^{n-1}](1+t)^{r-1}, and we
can successively apply this form to our case:
    [t^n](1+t)^r = \frac{r}{n} [t^{n-1}](1+t)^{r-1} =
                 = \frac{r}{n} \frac{r-1}{n-1} [t^{n-2}](1+t)^{r-2} = ... =
                 = \frac{r}{n} \frac{r-1}{n-1} ... \frac{r-n+1}{1} [t^0](1+t)^{r-n} =
                 = \binom{r}{n} [t^0](1+t)^{r-n}.
We now observe that [t^0](1+t)^{r-n} = 1 because of
our observations on f.p.s. operations. Therefore, we
conclude with the so-called Newton's rule:

    [t^n](1+t)^r = \binom{r}{n}
n
which is one of the most frequently used results in
coecient extraction. Let us remark explicitly that
when r = 1 (the geometric series) we have:
[t
n
]
1
1 +t
=
1
n
n
=
=
1 +n 1
n
(1)
n
n
= ()
n
.
A simple, but important use of Newton's rule concerns
the extraction of the coefficient of t^n from the
inverse of a trinomial at^2 + bt + c, in the case it is
reducible, i.e., it can be written c(1 + \alpha t)(1 + \beta t); obviously,
we can always reduce the constant term to 1; by
the linearity rule, it can be taken outside the coefficient
of operator. Therefore, our aim is to compute:

    [t^n] \frac{1}{(1 + \alpha t)(1 + \beta t)}

with \alpha \ne \beta, otherwise Newton's rule would be immediately
applicable. The problem can be solved by
using the technique of partial fraction expansion. We
look for two constants A and B such that:
    \frac{1}{(1 + \alpha t)(1 + \beta t)} = \frac{A}{1 + \alpha t} + \frac{B}{1 + \beta t} = \frac{A + A\beta t + B + B\alpha t}{(1 + \alpha t)(1 + \beta t)};

if two such constants exist, the numerator in the first
expression should equal the numerator in the last one,
independently of t or, if one so prefers, for every
value of t. Therefore, the term A + B should be equal
to 1, while the term (A\beta + B\alpha)t should always be 0.
The values for A and B are therefore the solution of
the linear system:

    A + B = 1
    A\beta + B\alpha = 0
The determinant of this system is \alpha - \beta, which is
always different from 0, because of our hypothesis
\alpha \ne \beta. The system therefore has exactly one solution,
which is A = \alpha/(\alpha - \beta) and B = -\beta/(\alpha - \beta). We can
now substitute these values in the expression above:

    [t^n] \frac{1}{(1 + \alpha t)(1 + \beta t)} =
      = \frac{1}{\alpha - \beta} \left( \alpha [t^n] \frac{1}{1 + \alpha t} - \beta [t^n] \frac{1}{1 + \beta t} \right) =
      = \frac{\alpha^{n+1} - \beta^{n+1}}{\alpha - \beta} (-1)^n.
Let us now consider a trinomial 1 + bt + ct^2 for
which \Delta = b^2 - 4c < 0 and b \ne 0. The trinomial is
irreducible over the reals, but we can write:

    [t^n] \frac{1}{1 + bt + ct^2} = [t^n] \frac{1}{\left( 1 - \frac{-b + i\sqrt{|\Delta|}}{2} t \right) \left( 1 - \frac{-b - i\sqrt{|\Delta|}}{2} t \right)}.

Since the two roots are complex numbers, the resulting
expression is not very appealing. We can try to give
it a better form. Let us set \alpha = (-b + i\sqrt{|\Delta|})/2,
so \alpha is always contained in the positive imaginary
halfplane. This implies 0 < arg(\alpha) < \pi, and we have:
    \alpha - \beta = \left( -\frac{b}{2} + i\frac{\sqrt{|\Delta|}}{2} \right) - \left( -\frac{b}{2} - i\frac{\sqrt{|\Delta|}}{2} \right) = i\sqrt{|\Delta|} = i\sqrt{4c - b^2}

(the previous result applies with \alpha and \beta changed in
sign, which cancels the factor (-1)^n). If \theta = arg(\alpha) and:

    \rho = |\alpha| = \sqrt{\frac{b^2}{4} + \frac{4c - b^2}{4}} = \sqrt{c}

we can set \alpha = \rho e^{i\theta} and \beta = \rho e^{-i\theta}.
Consequently:

    \alpha^{n+1} - \beta^{n+1} = \rho^{n+1} \left( e^{i(n+1)\theta} - e^{-i(n+1)\theta} \right) = 2i \rho^{n+1} \sin((n+1)\theta)

and therefore:

    [t^n] \frac{1}{1 + bt + ct^2} = \frac{2 (\sqrt{c})^{n+1} \sin((n+1)\theta)}{\sqrt{4c - b^2}}.
At this point we only have to find the value of \theta.
Obviously:

    \theta = \arctan\left( \frac{\sqrt{|\Delta|}/2}{-b/2} \right) + k\pi = \arctan\left( -\frac{\sqrt{4c - b^2}}{b} \right) + k\pi.

When b < 0, we have 0 < \arctan(-\sqrt{4c - b^2}/b) <
\pi/2, and this is the correct value for \theta. However,
when b > 0, the principal branch of arctan is negative,
and we should set \theta = \pi + \arctan(-\sqrt{4c - b^2}/b). As
a consequence, we have:

    \theta = \arctan\left( -\frac{\sqrt{4c - b^2}}{b} \right) + C

where C = \pi if b > 0 and C = 0 if b < 0.
An interesting and non-trivial example is given by:

    \sigma_n = [t^n] \frac{1}{1 - 3t + 3t^2} =
             = \frac{2 (\sqrt{3})^{n+1} \sin((n+1) \arctan(\sqrt{3}/3))}{\sqrt{3}} =
             = 2 (\sqrt{3})^n \sin\left( (n+1) \frac{\pi}{6} \right).
These coefficients have the following values:

    n = 12k       \sigma_n = (\sqrt{3})^{12k}     = 729^k
    n = 12k + 1   \sigma_n = (\sqrt{3})^{12k+2}   = 3 \cdot 729^k
    n = 12k + 2   \sigma_n = 2(\sqrt{3})^{12k+2}  = 6 \cdot 729^k
    n = 12k + 3   \sigma_n = (\sqrt{3})^{12k+4}   = 9 \cdot 729^k
    n = 12k + 4   \sigma_n = (\sqrt{3})^{12k+4}   = 9 \cdot 729^k
    n = 12k + 5   \sigma_n = 0
    n = 12k + 6   \sigma_n = -(\sqrt{3})^{12k+6}  = -27 \cdot 729^k
    n = 12k + 7   \sigma_n = -(\sqrt{3})^{12k+8}  = -81 \cdot 729^k
    n = 12k + 8   \sigma_n = -2(\sqrt{3})^{12k+8} = -162 \cdot 729^k
    n = 12k + 9   \sigma_n = -(\sqrt{3})^{12k+10} = -243 \cdot 729^k
    n = 12k + 10  \sigma_n = -(\sqrt{3})^{12k+10} = -243 \cdot 729^k
    n = 12k + 11  \sigma_n = 0
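The closed form can be checked against the coefficients computed directly from the trinomial: multiplying both members of (1 + bt + ct^2) \sum \sigma_k t^k = 1 gives the recurrence \sigma_k = -b \sigma_{k-1} - c \sigma_{k-2}. A quick numeric sketch (names are mine; rounding is used only to compare integers with the floating-point sine formula):

```python
import math

def trinomial_coeffs(b, c, n):
    # [t^k] 1/(1 + b t + c t^2) via s_k = -b s_{k-1} - c s_{k-2}
    s = [1, -b]
    for k in range(2, n + 1):
        s.append(-b * s[k - 1] - c * s[k - 2])
    return s[:n + 1]

# closed form for 1/(1 - 3t + 3t^2): sigma_n = 2 (sqrt 3)^n sin((n+1) pi / 6)
direct = trinomial_coeffs(-3, 3, 12)
closed = [round(2 * math.sqrt(3) ** n * math.sin((n + 1) * math.pi / 6))
          for n in range(13)]
print(direct)
print(closed)
```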
3.7 Matrix representation
Let f(t) \in T_0; with the coefficients of f(t) we form
the following infinite lower triangular matrix (or array)
D = (d_{n,k})_{n,k \in N}: column 0 contains the coefficients
f_0, f_1, f_2, ... in this order; column 1 contains
the same coefficients shifted down by one position,
with d_{0,1} = 0; in general, column k contains the coefficients
of f(t) shifted down k positions, so that the
first k positions are 0. This definition can be summarized
in the formula d_{n,k} = f_{n-k}, for all n, k \in N. For
a reason which will be apparent only later, the array
D will be denoted by (f(t), 1):

    D = (f(t), 1) =
        | f_0   0    0    0    0   ... |
        | f_1  f_0   0    0    0   ... |
        | f_2  f_1  f_0   0    0   ... |
        | f_3  f_2  f_1  f_0   0   ... |
        | f_4  f_3  f_2  f_1  f_0  ... |
        | ...  ...  ...  ...  ...  ... |
If (f(t), 1) and (g(t), 1) are the matrices corresponding
to the two f.p.s. f(t) and g(t), we are interested
in finding out what matrix is obtained
by multiplying the two matrices with the usual row-by-column
product. This product will be denoted by
(f(t), 1) \cdot (g(t), 1), and it is immediate to see what
its generic element d_{n,k} is. The row n in (f(t), 1) is,
by definition, f_n, f_{n-1}, f_{n-2}, ..., and column k in
(g(t), 1) is 0, 0, ..., 0, g_0, g_1, g_2, ..., where the number
of leading 0's is just k. Therefore we have:

    d_{n,k} = \sum_{j=0}^{\infty} f_{n-j} g_{j-k}
if we conventionally set g_r = 0 for every r < 0. When
k = 0, we have d_{n,0} = \sum_{j=0}^{\infty} f_{n-j} g_j = \sum_{j=0}^{n} f_{n-j} g_j,
and therefore column 0 contains the coefficients of
the convolution f(t)g(t). When k = 1 we have
d_{n,1} = \sum_{j=0}^{\infty} f_{n-j} g_{j-1} = \sum_{j=0}^{n-1} f_{n-1-j} g_j, and this
is the coefficient of t^{n-1} in the convolution f(t)g(t).
Proceeding in the same way, we see that column k
contains the coefficients of the convolution f(t)g(t)
shifted down k positions. Therefore we conclude:

    (f(t), 1) \cdot (g(t), 1) = (f(t)g(t), 1)
and this shows that there exists a group isomorphism
between (T_0, \cdot) and the set of matrices (f(t), 1) with
the row-by-column product. In particular, (1, 1) is
the identity (in fact, it corresponds to the identity
matrix) and (f(t)^{-1}, 1) is the inverse of (f(t), 1).
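The isomorphism can be observed directly on finite sections of the infinite matrices; truncation is harmless because the arrays are lower triangular. A sketch (helper names are mine):

```python
def toeplitz(f, size):
    # the array (f(t), 1): d[n][k] = f_{n-k}
    return [[f[n - k] if 0 <= n - k < len(f) else 0 for k in range(size)]
            for n in range(size)]

def matmul(A, B):
    m = len(A)
    return [[sum(A[i][j] * B[j][k] for j in range(m)) for k in range(m)]
            for i in range(m)]

f, g = [1, 2, 3, 4], [1, -1, 0, 2]
prod = matmul(toeplitz(f, 4), toeplitz(g, 4))
# column 0 of (f,1).(g,1) holds the Cauchy product f(t) g(t)
print([row[0] for row in prod])
```

In fact the whole product matrix is again of the form (h(t), 1), with h(t) = f(t)g(t), as the identity above asserts.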
Let us now consider a f.p.s. f(t) \in T_1 and let
us build an infinite lower triangular matrix in the
following way: column k contains the coefficients of
f(t)^k in their proper order:

    | 1   0     0                  0             0      ... |
    | 0   f_1   0                  0             0      ... |
    | 0   f_2   f_1^2              0             0      ... |
    | 0   f_3   2 f_1 f_2          f_1^3         0      ... |
    | 0   f_4   2 f_1 f_3 + f_2^2  3 f_1^2 f_2   f_1^4  ... |
    | ... ...   ...                ...           ...    ... |
The matrix will be denoted by (1, f(t)/t), and we are
interested to see how the matrix (1, g(t)/t) \cdot (1, f(t)/t)
is composed when f(t), g(t) \in T_1.
If (\hat{f}_{n,k})_{n,k \in N} = (1, f(t)/t), by definition we have:

    \hat{f}_{n,k} = [t^n] f(t)^k

and therefore the generic element d_{n,k} of the product
is:

    d_{n,k} = \sum_{j=0}^{\infty} \hat{g}_{n,j} \hat{f}_{j,k} = \sum_{j=0}^{\infty} [t^n] g(t)^j [y^j] f(y)^k =
            = [t^n] \sum_{j=0}^{\infty} ([y^j] f(y)^k) g(t)^j = [t^n] f(g(t))^k.
In other words, column k in (1, g(t)/t) \cdot (1, f(t)/t) is
the kth power of the composition f(g(t)), and we can
conclude:

    (1, g(t)/t) \cdot (1, f(t)/t) = (1, f(g(t))/t).

Clearly, the identity t \in T_1 corresponds to the matrix
(1, t/t) = (1, 1), the identity matrix, and this is
sufficient to prove that the correspondence f(t) \leftrightarrow
(1, f(t)/t) is a group isomorphism.
The row-by-column product is surely the basic operation
on matrices, and its extension to infinite,
lower triangular arrays is straightforward, because
the sums involved in the product are actually finite.
We have shown that we can associate every f.p.s.
f(t) \in T_0 to a particular matrix (f(t), 1) (let us
denote by A the set of such arrays) in such a way
that (T_0, \cdot) is isomorphic to (A, \cdot), and the Cauchy
product becomes the row-by-column product. Besides,
we can associate every f.p.s. g(t) \in T_1 to a
matrix (1, g(t)/t) (let us call B the set of such matrices)
in such a way that (T_1, \circ) is isomorphic to (B, \cdot),
and the composition of f.p.s. again becomes the row-by-column
product. This reveals a connection between
the Cauchy product and composition: in the
chapter on Riordan arrays we will explore this connection
more deeply; for the moment, we wish to
see how this observation leads to a computational
method for evaluating the compositional inverse of a
f.p.s. in T_1.
3.8 Lagrange inversion theorem

Given an infinite, lower triangular array of the form
(1, f(t)/t), with f(t) \in T_1, the inverse matrix
(1, g(t)/t) is such that (1, g(t)/t) \cdot (1, f(t)/t) = (1, 1),
and since the product results in (1, f(g(t))/t) we have
f(g(t)) = t. In other words, because of the isomorphism
we have seen, the inverse matrix of (1, f(t)/t)
is just the matrix corresponding to the compositional
inverse of f(t). As we have already said, Lagrange
found a noteworthy formula for the coefficients of
this compositional inverse. We follow the more recent
proof of Stanley, which points out the purely formal
aspects of Lagrange's formula. Indeed, we will prove
something more, by finding the exact form of the matrix
(1, g(t)/t), the inverse of (1, f(t)/t). As a matter of
fact, we state what the form of (1, g(t)/t) should be
and then verify that it is actually so.
Let D = (d_{n,k})_{n,k \in N} be defined as:

    d_{n,k} = \frac{k}{n} [t^{n-k}] \left( \frac{t}{f(t)} \right)^n.
Because f(t)/t \in T_0, the power (t/f(t))^k =
(f(t)/t)^{-k} is well-defined; in order to show that
(d_{n,k})_{n,k \in N} = (1, g(t)/t) we only have to prove that
D \cdot (1, f(t)/t) = (1, 1), because we already know that
the compositional inverse of f(t) is unique. The
generic element v_{n,k} of the row-by-column product
D \cdot (1, f(t)/t) is:

    v_{n,k} = \sum_{j=0}^{\infty} d_{n,j} [y^j] f(y)^k =
            = \sum_{j=0}^{\infty} \frac{j}{n} [t^{n-j}] \left( \frac{t}{f(t)} \right)^n [y^j] f(y)^k.
By the rule of differentiation for the coefficient of
operator, we have:

    j [y^j] f(y)^k = [y^{j-1}] \frac{d}{dy} f(y)^k = k [y^j] y f'(y) f(y)^{k-1}.
Therefore, for v_{n,k} we have:

    v_{n,k} = \frac{k}{n} \sum_{j=0}^{\infty} [t^{n-j}] \left( \frac{t}{f(t)} \right)^n [y^j] y f'(y) f(y)^{k-1} =
            = \frac{k}{n} [t^n] \left( \frac{t}{f(t)} \right)^n t f'(t) f(t)^{k-1}.

In fact, the factor k/n does not depend on j and
can be taken out of the summation sign; the sum
is actually finite and is the term of the convolution
appearing in the last formula. Let us now distinguish
between the cases k = n and k \ne n. When k = n we
have:
    v_{n,n} = [t^n] t^n \frac{t}{f(t)} f'(t) = [t^0] f'(t) \frac{t}{f(t)} = 1;

in fact, f'(t) = f_1 + 2 f_2 t + 3 f_3 t^2 + ... and, being
f(t)/t \in T_0, (f(t)/t)^{-1} = (f_1 + f_2 t + f_3 t^2 + ...)^{-1} =
f_1^{-1} + ...; therefore, the constant term in f'(t)(t/f(t))
is f_1/f_1 = 1. When k \ne n:
    v_{n,k} = \frac{k}{n} [t^n] t^n t f(t)^{k-n-1} f'(t) =
            = \frac{k}{n} [t^{-1}] \frac{1}{k-n} \frac{d}{dt} f(t)^{k-n} = 0;

in fact, f(t)^{k-n} is a f.L.s. and, as we observed, the
residue of its derivative should be zero. This proves
that D \cdot (1, f(t)/t) = (1, 1) and therefore D is the
inverse of (1, f(t)/t).
If \bar{f}(t) is the compositional inverse of f(t), column
1 gives us the value of its coefficients; by the
formula for d_{n,k} we have:

    \bar{f}_n = [t^n] \bar{f}(t) = d_{n,1} = \frac{1}{n} [t^{n-1}] \left( \frac{t}{f(t)} \right)^n

and this is the celebrated Lagrange Inversion Formula
(LIF). The other columns give us the coefficients
of the powers \bar{f}(t)^k, for which we have:

    [t^n] \bar{f}(t)^k = \frac{k}{n} [t^{n-k}] \left( \frac{t}{f(t)} \right)^n.
Many times, there is another way of applying the
LIF. Suppose we have a functional equation w =
t \phi(w), where \phi(t) \in T_0, and we wish to find the
f.p.s. w = w(t) satisfying this functional equation.
Clearly w(t) \in T_1 and, if we set f(y) = y/\phi(y), we
also have f(t) \in T_1. However, the functional equation
can be written f(w(t)) = t, and this shows that
w(t) is the compositional inverse of f(t). We therefore
know that w(t) is uniquely determined, and the
LIF gives us:

    [t^n] w(t) = \frac{1}{n} [t^{n-1}] \left( \frac{t}{f(t)} \right)^n = \frac{1}{n} [t^{n-1}] \phi(t)^n.
The LIF can also give us the coefficients of the
powers w(t)^k, but we can obtain a still more general
result. Let F(t) \in T and let us consider the composition
F(w(t)), where w = w(t) is, as before, the
solution of the functional equation w = t \phi(w), with
\phi(w) \in T_0. For the coefficient of t^n in F(w(t)) we
have:

    [t^n] F(w(t)) = [t^n] \sum_{k=0}^{\infty} F_k w(t)^k =
      = \sum_{k=0}^{\infty} F_k [t^n] w(t)^k = \sum_{k=0}^{\infty} F_k \frac{k}{n} [t^{n-k}] \phi(t)^n =
      = \frac{1}{n} [t^{n-1}] \left( \sum_{k=0}^{\infty} k F_k t^{k-1} \right) \phi(t)^n =
      = \frac{1}{n} [t^{n-1}] F'(t) \phi(t)^n.

Note that [t^0] F(w(t)) = F_0; this formula can be
generalized to every F(t) \in L, except for the coefficient
[t^0] F(w(t)).
3.9 Some examples of the LIF
We found that the number b_n of binary trees
with n nodes (and of other combinatorial objects
as well) satisfies the recurrence relation b_{n+1} =
\sum_{k=0}^{n} b_k b_{n-k}. Let us consider the f.p.s. b(t) =
\sum_{k=0}^{\infty} b_k t^k; if we multiply the recurrence relation by
t^{n+1} and sum for n from 0 to infinity, we find:

    \sum_{n=0}^{\infty} b_{n+1} t^{n+1} = \sum_{n=0}^{\infty} t^{n+1} \sum_{k=0}^{n} b_k b_{n-k}.
Since b_0 = 1, we can add and subtract 1 = b_0 t^0 in
the left-hand member, and can take t outside the summation
sign in the right-hand member:

    \sum_{n=0}^{\infty} b_n t^n - 1 = t \sum_{n=0}^{\infty} \left( \sum_{k=0}^{n} b_k b_{n-k} \right) t^n.

In the r.h.s. we recognize a convolution and, substituting
b(t) for the corresponding f.p.s., we obtain:

    b(t) - 1 = t b(t)^2.
We are interested in evaluating b_n = [t^n] b(t); let us
therefore set w = w(t) = b(t) - 1, so that w(t) \in T_1
and w_n = b_n for every n > 0. The previous relation becomes
w = t(1 + w)^2, and we see that the LIF can be applied
(in the form relative to the functional equation) with
\phi(t) = (1 + t)^2. Therefore we have:

    b_n = [t^n] w(t) = \frac{1}{n} [t^{n-1}] (1 + t)^{2n} = \frac{1}{n} \binom{2n}{n-1} =
        = \frac{1}{n} \frac{(2n)!}{(n-1)! (n+1)!} = \frac{1}{n+1} \frac{(2n)!}{n! \, n!} = \frac{1}{n+1} \binom{2n}{n}.

As we said in the previous chapter, b_n is called the
nth Catalan number and, under this name, it is often
denoted by C_n. Now we have its closed form:

    C_n = \frac{1}{n+1} \binom{2n}{n}.
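The recurrence and the closed form are easy to verify against each other (a minimal sketch):

```python
from math import comb

# b_{n+1} = sum_k b_k b_{n-k} with b_0 = 1, against C_n = binom(2n, n)/(n+1)
b = [1]
for n in range(10):
    b.append(sum(b[k] * b[n - k] for k in range(n + 1)))
catalan = [comb(2 * n, n) // (n + 1) for n in range(11)]
print(b)
print(catalan)
```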
A similar argument applies to p-ary trees: a non-empty
p-ary tree consists of a root together with an ordered
p-tuple of (possibly empty) subtrees T_1, T_2, ..., T_p.

[Figure: a p-ary tree decomposed into its root and its p subtrees T_1, T_2, ..., T_p]

This proves that T_{n+1} = \sum T_{i_1} T_{i_2} ... T_{i_p}, where the
sum is extended to all the p-tuples (i_1, i_2, ..., i_p) such
that i_1 + i_2 + ... + i_p = n. As before, we can multiply
both members of the recurrence relation by t^{n+1}
and sum for n from 0 to infinity. We find:

    T(t) - 1 = t T(t)^p.
This time we have an equation of degree p, which cannot
be solved directly. However, if we set w(t) = T(t) - 1,
so that w(t) \in T_1, we have:

    w = t(1 + w)^p

and the LIF gives:

    T_n = [t^n] w(t) = \frac{1}{n} [t^{n-1}] (1 + t)^{pn} = \frac{1}{n} \binom{pn}{n-1} =
        = \frac{1}{n} \frac{(pn)!}{(n-1)! ((p-1)n + 1)!} = \frac{1}{(p-1)n + 1} \binom{pn}{n}.
As a last example, let us consider the functional equation
w = t e^w, for which \phi(t) = e^t; the LIF gives
[t^n] w(t) = \frac{1}{n} [t^{n-1}] e^{nt} = \frac{n^{n-1}}{n!}, and therefore:

    w(t) = \sum_{n=1}^{\infty} \frac{n^{n-1}}{n!} t^n = t + t^2 + \frac{3}{2} t^3 + \frac{8}{3} t^4 + \frac{125}{24} t^5 + ...

As noticed, w(t) is the compositional inverse of:

    f(t) = \frac{t}{\phi(t)} = t e^{-t} = t - t^2 + \frac{t^3}{2!} - \frac{t^4}{3!} + \frac{t^5}{4!} - ...

It is a useful exercise to perform the necessary computations
to show that f(w(t)) = t, for example up to
the term of degree 5 or 6, and to verify that w(f(t)) = t
as well.
3.10 Formal power series and the computer

When we are dealing with generating functions or,
more generally, with formal power series of any kind,
we often have to perform numerical computations in
order to verify some theoretical result or to experiment
with actual cases. In these and other circumstances
the computer can help very much with its
speed and precision. Nowadays, several Computer
Algebra Systems exist, which offer the possibility of
actually working with formal power series, containing
formal parameters as well. The use of these tools
is recommended because they can solve a doubt in
a few seconds, can clarify difficult theoretical points
and can give useful hints whenever we are faced with
particular problems.
However, a Computer Algebra System is not always
accessible or, in certain circumstances, one may
prefer to use less sophisticated tools. For example,
programmable pocket computers are now available,
which can perform quite easily the basic operations
on formal power series. The aim of the present and
of the following sections is to describe the main algorithms
for dealing with formal power series. They
can be used to program a computer, or simply to understand
how an existing system actually works.
The simplest way to represent a formal power series
is surely by means of a vector, in which the kth
component (starting from 0) is the coefficient of t^k in
the power series. Obviously, the computer memory
can only store a finite number of components, so an
upper bound n_0 is usually given to the length of the
vectors used to represent power series. In other words
we have:

    repr\left( \sum_{k=0}^{\infty} a_k t^k \right) = (a_0, a_1, ..., a_n)    (n \le n_0)
Fortunately, most operations on formal power series
preserve the number of significant components,
so that there is little danger that a number of successive
operations could reduce a finite representation to
a meaningless sequence of numbers. Differentiation
decreases by one the number of useful components;
on the contrary, integration and multiplication by t^r,
say, increase the number of significant elements, at
the cost of introducing some 0 components.
The components a_0, a_1, ..., a_n are usually real
numbers, represented with the precision allowed by
the particular computer. In most combinatorial applications,
however, a_0, a_1, ..., a_n are rational numbers
and, with some extra effort, it is not difficult
to realize rational arithmetic on a computer. It is
sufficient to represent a rational number as a pair
(m, n), whose intended meaning is just m/n. So we
must have m \in Z, n \in N, and it is a good idea to
keep m and n coprime. This can be performed by
a routine reduce computing p = gcd(m, n) using Euclid's
algorithm and then dividing both m and n by
p. The operations on rational numbers are defined in
the following way:
    (m, n) + (m', n') = reduce(m n' + m' n, n n')
    (m, n) \cdot (m', n') = reduce(m m', n n')
    (m, n)^{-1} = (n, m)
    (m, n)^p = (m^p, n^p)    (p \in N)

provided that (m, n) is a reduced rational number.
The size of m and n is limited by the internal
representation of integer numbers.
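These operations are straightforward to realize; a minimal Python sketch of the `reduce` routine and the first two operations follows (in practice Python's own `fractions` module does the same job, with indefinite-precision integers built in):

```python
from math import gcd

def reduce(m, n):
    # keep m in Z, n in N, and m, n coprime
    if n < 0:
        m, n = -m, -n
    g = gcd(m, n)
    return (m // g, n // g)

def radd(a, b):
    (m, n), (m2, n2) = a, b
    return reduce(m * n2 + m2 * n, n * n2)

def rmul(a, b):
    (m, n), (m2, n2) = a, b
    return reduce(m * m2, n * n2)

print(radd((1, 6), (1, 3)), rmul((2, 3), (3, 4)))
```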
In order to avoid this last problem, Computer Algebra
Systems usually realize indefinite-precision
integer arithmetic. An integer number has a variable-length
internal representation, and special routines
are used to perform the basic operations. These
routines can also be realized in a high-level programming
language (such as C or Java), but they can
slow down execution time too much if realized on a
programmable pocket computer.
3.11 The internal representation of expressions

The simple representation of a formal power series by
a vector of real, or rational, components will be used
in the next sections to explain the main algorithms
for formal power series operations. However, it is
surely not the best way to represent power series, and it
becomes completely useless when, for example, the
coefficients depend on some formal parameter. In
other words, our representation can only deal with
purely numerical formal power series.
Because of that, Computer Algebra Systems use a
more sophisticated internal representation. In fact,
power series are simply a particular case of a general
mathematical expression. The aim of the present section
is to give a rough idea of how an expression can
be represented in the computer memory.
In general, an expression consists of operators and
operands.

[Figure 3.1: The tree for a simple expression]
when only numerical operands are attached to them.
Simplification is a rather complicated matter, and it
is not quite clear what a "simple" expression is. For
example, which is simpler between (a+1)(a+2) and
a^2 + 3a + 2? It is easily seen that there are occasions in
which either expression can be considered "simpler".
Therefore, most Computer Algebra Systems provide
a general simplification routine together with
a series of more specific programs performing some
specific simplifying tasks, such as expanding parenthesized
expressions or collecting like factors.
In the computer memory an expression is represented
by its tree, which is called the tree or list representation
of the expression. The representation of the
formal power series \sum_{k=0}^{n} a_k t^k is shown in Figure 3.2.
This representation is very convenient, and when some
coefficients a_1, a_2, ..., a_n depend on a formal parameter
p nothing is changed, at least conceptually. In
fact, where we have drawn a leaf a_j, we simply have
a more complex tree representing the expression for
a_j.
The reader can develop computer programs for
dealing with this representation of formal power series.
It should be clear that another important point
of this approach is that no limitation is imposed on the
length of expressions. A clever and dynamic use of
the storage solves every problem without increasing
the complexity of the corresponding programs.
3.12 Basic operations of formal power series

We are now considering the vector representation of
formal power series:

    repr\left( \sum_{k=0}^{\infty} a_k t^k \right) = (a_0, a_1, ..., a_n)    (n \le n_0).
The sum of the formal power series is dened in an
obvious way:
(a
0
, a
1
, . . . , a
n
) + (b
0
, b
1
, . . . , b
m
) = (c
0
, c
1
, . . . , c
r
)
where c
i
= a
i
+ b
i
for every 0 i r, and r =
min(n, m). In a similar way the Cauchy product is
dened:
$(a_0, a_1, \ldots, a_n) \cdot (b_0, b_1, \ldots, b_m) = (c_0, c_1, \ldots, c_r)$
where $c_k = \sum_{j=0}^{k} a_j b_{k-j}$ for every $0 \le k \le r$. Here
$r$ is defined as $r = \min(n + p_B, m + p_A)$, if $p_A$ is
the first index for which $a_{p_A} \ne 0$ and $p_B$ is the first
index for which $b_{p_B} \ne 0$. We point out that the
time complexity of the sum is $O(r)$ and the time
complexity of the product is $O(r^2)$.
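These two operations are easy to sketch on the vector representation. The following Python fragment (the function names are illustrative choices, not from the text) implements the sum and the Cauchy product with the truncation rule just described:

```python
def fps_add(a, b):
    # c_i = a_i + b_i for 0 <= i <= r, with r = min(n, m)
    r = min(len(a), len(b))
    return [a[i] + b[i] for i in range(r)]

def fps_mul(a, b):
    # c_k = sum_{j=0}^{k} a_j * b_{k-j}; the number of reliable coefficients
    # is r + 1 with r = min(n + p_B, m + p_A), where p_A and p_B are the
    # first indices with a nonzero coefficient
    pa = next((i for i, x in enumerate(a) if x != 0), len(a))
    pb = next((i for i, x in enumerate(b) if x != 0), len(b))
    r = min(len(a) - 1 + pb, len(b) - 1 + pa)
    return [sum(a[j] * b[k - j]
                for j in range(k + 1) if j < len(a) and k - j < len(b))
            for k in range(r + 1)]
```

For example, squaring the truncation of $1/(1-t)$ gives the first coefficients of $1/(1-t)^2$, i.e., $1, 2, 3, 4, \ldots$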
Subtraction is similar to addition and does not require
any particular comment. Before discussing division,
let us consider the operation of raising a formal
power series to a power $\alpha \in \mathbb{R}$. This includes the
inversion of a power series ($\alpha = -1$) and therefore
division as well.
First of all we observe that, whenever $\alpha \in \mathbb{N}$, $f(t)^\alpha$
can be computed for any $f(t)$: if $f(t)$ has order $h$, we can
write $f(t)^\alpha = f_h^\alpha t^{\alpha h} (1 + g(t))^\alpha$, where $g(t) \notin \mathcal{T}_0$. In fact we have:
$f(t) = f_h t^h + f_{h+1} t^{h+1} + f_{h+2} t^{h+2} + \cdots = f_h t^h \left(1 + \frac{f_{h+1}}{f_h} t + \frac{f_{h+2}}{f_h} t^2 + \cdots\right)$
and therefore:
$f(t)^\alpha = f_h^\alpha t^{\alpha h} \left(1 + \frac{f_{h+1}}{f_h} t + \frac{f_{h+2}}{f_h} t^2 + \cdots\right)^{\alpha}.$
On the contrary, when $\alpha \notin \mathbb{N}$, $f(t)^\alpha$ can only be
performed if $f(t) \in \mathcal{T}_0$. In that case we have:
$f(t)^\alpha = \left(f_0 + f_1 t + f_2 t^2 + f_3 t^3 + \cdots\right)^\alpha = f_0^\alpha \left(1 + \frac{f_1}{f_0} t + \frac{f_2}{f_0} t^2 + \frac{f_3}{f_0} t^3 + \cdots\right)^{\alpha};$
note that in this case, if $f_0 \ne 1$, usually $f_0^\alpha$ is not
rational. In any case, we are always reduced to computing
$(1 + g(t))^\alpha$, and since:
$(1 + g(t))^\alpha = \sum_{k=0}^{\infty} \binom{\alpha}{k} g(t)^k \qquad (3.12.1)$
if the coefficients in $g(t)$ are rational numbers, so are
the coefficients in $(1 + g(t))^\alpha$, provided $\alpha \in \mathbb{Q}$.
These considerations are to be remembered if the operation
is realized in some special environment, as
described in the previous section.
Whatever $\alpha \in \mathbb{R}$ is, the exponents involved in
the right-hand member of (3.12.1) are all positive
integers. Therefore, powers can be realized
as successive multiplications or Cauchy products.
This gives a straightforward method for performing
$(1 + g(t))^\alpha$, but its time complexity is in the order of $O(r^3)$.
Figure 3.2: The tree for a formal power series
J. C. P. Miller has devised an algorithm allowing us to
perform $(1 + g(t))^\alpha$ in time $O(r^2)$. In fact, let us
write $h(t) = a(t)^\alpha$; by differentiating we obtain
$h'(t) = \alpha a(t)^{\alpha-1} a'(t)$, i.e., $a(t) h'(t) = \alpha h(t) a'(t)$. We now
extract the coefficient of $t^{n-1}$ from both members:
$\sum_{k=0}^{n} a_k (n-k) h_{n-k} = \alpha \sum_{k=0}^{n-1} (k+1) a_{k+1} h_{n-k-1}$
We now isolate the term with $k = 0$ in the left-hand
member and the term having $k = n-1$ in the right-hand
member ($a_0 = 1$ by hypothesis):
$n h_n + \sum_{k=1}^{n-1} a_k (n-k) h_{n-k} = \alpha n a_n + \alpha \sum_{k=1}^{n-1} k a_k h_{n-k}$
(in the last sum we performed the change of variable
$k \to k-1$, in order to have the same indices as
in the left-hand member). We now have an expression
for $h_n$ only depending on $(a_1, a_2, \ldots, a_n)$ and
$(h_1, h_2, \ldots, h_{n-1})$:
$h_n = \alpha a_n + \frac{1}{n} \sum_{k=1}^{n-1} \left((\alpha+1)k - n\right) a_k h_{n-k} = \alpha a_n + \sum_{k=1}^{n-1} \left(\frac{(\alpha+1)k}{n} - 1\right) a_k h_{n-k}$
The computation is now straightforward. We begin
by setting $h_0 = 1$, and then we successively compute
$h_1, h_2, \ldots, h_r$ ($r = n$, if $n$ is the number of terms
in $(a_1, a_2, \ldots, a_n)$). The evaluation of $h_k$ requires a
number of operations in the order $O(k)$, and therefore
the whole procedure works in time $O(r^2)$, as desired.
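Miller's recurrence turns directly into a short program. The sketch below (names and the use of Python's `Fraction` for exact rational arithmetic are my own choices) computes the first $r+1$ coefficients of $(1 + g(t))^\alpha$:

```python
from fractions import Fraction

def fps_power(a, alpha, r):
    # a = [1, a_1, a_2, ...] are the coefficients of 1 + g(t), with a_0 = 1;
    # h_n = alpha*a_n + (1/n) sum_{k=1}^{n-1} ((alpha+1)k - n) a_k h_{n-k}
    assert a[0] == 1
    alpha = Fraction(alpha)
    h = [Fraction(1)]                          # h_0 = 1
    for n in range(1, r + 1):
        s = alpha * (a[n] if n < len(a) else 0)
        for k in range(1, n):
            ak = a[k] if k < len(a) else 0
            if ak:
                s += ((alpha + 1) * k - n) * ak * h[n - k] / n
        h.append(s)
    return h
```

With $\alpha = -1$ this computes the inverse of a series; with fractional $\alpha$, e.g. $\alpha = 1/2$, it extracts square roots, always in $O(r^2)$ operations.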
The inverse of a series, i.e., $(1 + g(t))^{-1}$, is obtained
by setting $\alpha = -1$. It is worth noting that the previous
formula becomes:
$h_k = -a_k - \sum_{j=1}^{k-1} a_j h_{k-j} = -\sum_{j=1}^{k} a_j h_{k-j} \qquad (h_0 = 1)$
and can be used to prove properties of the inverse of
a power series. As a simple example, the reader can
show that the coefficients in $(1 - t)^{-1}$ are all 1.
3.13 Logarithm and exponential
The idea of Miller can be applied to other operations
on formal power series. In the present section we
wish to use it to compute the (natural) logarithm and
the exponential of a series. Let us begin with the
logarithm and try to compute $\ln(1 + g(t))$. As we
know, there is a direct way to perform this operation,
i.e.:
$\ln(1 + g(t)) = \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k} g(t)^k$
38 CHAPTER 3. FORMAL POWER SERIES
and this formula only requires a series of successive
products. As for the operation of raising to a power,
the procedure needs a time in the order of $O(r^3)$,
and it is worth considering an alternative approach.
In fact, if we set $h(t) = \ln(1 + g(t))$, by differentiating
we obtain $h'(t) = g'(t)/(1 + g(t))$, or $h'(t) = g'(t) - h'(t) g(t)$. By extracting the coefficient of $t^{k-1}$ we find:
$k h_k = k g_k - \sum_{j=0}^{k-1} (k-j) h_{k-j} g_j$
However, $g_0 = 0$ by hypothesis, and therefore we have
an expression relating $h_k$ to $(g_1, g_2, \ldots, g_k)$ and to
$(h_1, h_2, \ldots, h_{k-1})$:
$h_k = g_k - \frac{1}{k} \sum_{j=1}^{k-1} (k-j) h_{k-j} g_j = g_k - \sum_{j=1}^{k-1} \left(1 - \frac{j}{k}\right) h_{k-j} g_j$
A program to perform the logarithm of a formal
power series $1 + g(t)$ begins by setting $h_0 = 0$ and then
proceeds by computing $h_1, h_2, \ldots, h_r$, if $r$ is the number
of significant terms in $g(t)$. The total time is clearly
in the order of $O(r^2)$.
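A minimal implementation of this scheme (illustrative names; exact rationals via `Fraction`) is:

```python
from fractions import Fraction

def fps_log(g, r):
    # h(t) = ln(1 + g(t)) with g_0 = 0:
    # h_k = g_k - sum_{j=1}^{k-1} (1 - j/k) h_{k-j} g_j
    assert g[0] == 0
    h = [Fraction(0)]
    for k in range(1, r + 1):
        s = Fraction(g[k] if k < len(g) else 0)
        for j in range(1, k):
            gj = g[j] if j < len(g) else 0
            if gj:
                s -= (1 - Fraction(j, k)) * h[k - j] * gj
        h.append(s)
    return h
```

For $g(t) = t$ it reproduces the familiar expansion $\ln(1+t) = t - t^2/2 + t^3/3 - \cdots$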
A similar technique can be applied to the computation
of $\exp(g(t))$, provided that $g(t) \notin \mathcal{T}_0$. If
$g(t) \in \mathcal{T}_0$, i.e., $g(t) = g_0 + g_1 t + g_2 t^2 + \cdots$, we have
$\exp(g_0 + g_1 t + g_2 t^2 + \cdots) = e^{g_0} \exp(g_1 t + g_2 t^2 + \cdots)$. In
this way we are reduced to the previous case, but we
no longer have rational coefficients when $g(t) \in \mathbb{Q}[[t]]$.
By differentiating the identity $h(t) = \exp(g(t))$ we
obtain $h'(t) = g'(t) \exp(g(t)) = g'(t) h(t)$. We extract
the coefficient of $t^{k-1}$:
$k h_k = \sum_{j=0}^{k-1} (j+1) g_{j+1} h_{k-j-1} = \sum_{j=1}^{k} j g_j h_{k-j}$
This formula allows us to compute $h_k$ in terms of
$(g_1, g_2, \ldots, g_k)$ and $(h_0 = 1, h_1, h_2, \ldots, h_{k-1})$. A program
performing exponentiation can be easily written
by defining $h_0 = 1$ and successively evaluating
$h_1, h_2, \ldots, h_r$, if $r$ is the number of significant terms
in $g(t)$. Time complexity is obviously $O(r^2)$.
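The corresponding sketch (again with illustrative names and exact rational coefficients) is:

```python
from fractions import Fraction

def fps_exp(g, r):
    # h(t) = exp(g(t)) with g_0 = 0:  k h_k = sum_{j=1}^{k} j g_j h_{k-j}
    assert g[0] == 0
    h = [Fraction(1)]                      # h_0 = 1
    for k in range(1, r + 1):
        s = Fraction(0)
        for j in range(1, k + 1):
            gj = g[j] if j < len(g) else 0
            if gj:
                s += j * gj * h[k - j]
        h.append(s / k)
    return h
```

For $g(t) = t$ this produces the coefficients $1/k!$ of $e^t$, as expected.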
Unfortunately, a similar trick does not work for
series composition. To compute $f(g(t))$, when $g(t) \notin \mathcal{T}_0$, we have to resort to the defining formula:
$f(g(t)) = \sum_{k=0}^{\infty} f_k g(t)^k$
This requires the successive computation of the integer
powers of $g(t)$, which can be performed by repeated
applications of the Cauchy product. The execution
time is in the order of $O(r^3)$, if $r$ is the minimal
number of significant terms in $f(t)$ and $g(t)$.
We conclude this section by sketching the obvious
algorithms to compute the differentiation and integration
of a formal power series $f(t) = f_0 + f_1 t + f_2 t^2 + f_3 t^3 + \cdots$. If $h(t) = f'(t)$, we have:
$h_k = (k+1) f_{k+1}$
and therefore the number of significant terms is reduced
by 1. Conversely, if $h(t) = \int_0^t f(\tau)\,d\tau$, we have:
$h_k = \frac{1}{k} f_{k-1}$
and $h_0 = 0$; consequently, the number of significant
terms is increased by 1.
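On the vector representation, the two algorithms are one-liners (illustrative sketch):

```python
from fractions import Fraction

def fps_diff(f):
    # h_k = (k+1) f_{k+1}: one significant term fewer
    return [(k + 1) * f[k + 1] for k in range(len(f) - 1)]

def fps_int(f):
    # h_0 = 0, h_k = f_{k-1}/k: one significant term more
    return [Fraction(0)] + [Fraction(f[k - 1], k) for k in range(1, len(f) + 1)]
```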
Chapter 4
Generating Functions
4.1 General Rules
Let us consider a sequence of numbers $F = (f_0, f_1, f_2, \ldots) = (f_k)_{k \in \mathbb{N}}$; the (ordinary) generating
function for the sequence $F$ is defined as $f(t) = f_0 + f_1 t + f_2 t^2 + \cdots$, where the indeterminate $t$ is
arbitrary. Given the sequence $(f_k)_{k \in \mathbb{N}}$, we introduce
the generating function operator $\mathcal{G}$, which, applied
to $(f_k)_{k \in \mathbb{N}}$, produces the ordinary generating
function for the sequence, i.e., $\mathcal{G}(f_k)_{k \in \mathbb{N}} = f(t)$. In
this expression $t$ is a bound variable, and a more accurate
notation would be $\mathcal{G}_t(f_k)_{k \in \mathbb{N}} = f(t)$. This
notation is essential when $(f_k)_{k \in \mathbb{N}}$ depends on some
parameter or when we consider multivariate generating
functions. In this latter case, for example, we
should write $\mathcal{G}_{t,w}(f_{n,k})_{n,k \in \mathbb{N}} = f(t, w)$ to indicate the
fact that $f_{n,k}$ in the double sequence becomes the coefficient
of $t^n w^k$ in the function $f(t, w)$. However,
whenever no ambiguity can arise, we will use the notation
$\mathcal{G}(f_k) = f(t)$, understanding also the binding
for the variable $k$. For the sake of completeness, we
also define the exponential generating function of the
sequence $(f_0, f_1, f_2, \ldots)$ as:
$\mathcal{E}(f_k) = \mathcal{G}\!\left(\frac{f_k}{k!}\right) = \sum_{k=0}^{\infty} f_k \frac{t^k}{k!}.$
The operator $\mathcal{G}$ is clearly linear. The function $f(t)$
can be shifted or differentiated. Two functions $f(t)$
and $g(t)$ can be multiplied and composed. This leads
to the properties for the operator $\mathcal{G}$ listed in Table 4.1.
Note that formula (G5) requires $g_0 = 0$. The first
five formulas are easily verified by using the intended
interpretation of the operator $\mathcal{G}$; the last formula can
be proved by means of the LIF, in the form relative
to the composition $F(w(t))$. In fact we have:
$[t^n] F(t) \phi(t)^n = [t^{n-1}] \frac{F(t)}{t} \phi(t)^n = n [t^n] \left[\left.\int \frac{F(y)}{y}\,dy \,\right|\, y = w(t)\right];$
in the last passage we applied backwards the formula:
$[t^n] F(w(t)) = \frac{1}{n} [t^{n-1}] F'(t) \phi(t)^n \qquad (w = t\phi(w))$
and therefore $w = w(t) \in \mathcal{T}_1$ is the unique solution
of the functional equation $w = t\phi(w)$. By now applying
the rule of differentiation for the coefficient-of
operator, we can go on:
$[t^n] F(t) \phi(t)^n = [t^{n-1}] \frac{d}{dt} \left[\left.\int \frac{F(y)}{y}\,dy \,\right|\, y = w(t)\right] = [t^{n-1}] \left[\left.\frac{F(w)}{w} \,\right|\, w = t\phi(w)\right] \frac{dw}{dt}.$
We have applied the chain rule for differentiation, and
from $w = t\phi(w)$ we have:
$\frac{dw}{dt} = \phi(w) + t \left[\left.\frac{d\phi}{dw} \,\right|\, w = t\phi(w)\right] \frac{dw}{dt}.$
We can therefore compute the derivative of $w(t)$:
$\frac{dw}{dt} = \left[\left.\frac{\phi(w)}{1 - t\phi'(w)} \,\right|\, w = t\phi(w)\right]$
and therefore:
$[t^{n-1}] \left[\left.\frac{F(w)}{w} \cdot \frac{\phi(w)}{1 - t\phi'(w)} \,\right|\, w = t\phi(w)\right] = [t^{n-1}] \frac{1}{t} \left[\left.\frac{F(w)}{1 - t\phi'(w)} \,\right|\, w = t\phi(w)\right] = [t^n] \left[\left.\frac{F(w)}{1 - t\phi'(w)} \,\right|\, w = t\phi(w)\right]$
convolution $\quad \mathcal{G}\!\left(\sum_{k=0}^{n} f_k g_{n-k}\right) = \mathcal{G}(f_k) \cdot \mathcal{G}(g_k)$ (G4)
composition $\quad \sum_{n=0}^{\infty} f_n \left(\mathcal{G}(g_k)\right)^n = \mathcal{G}(f_k) \circ \mathcal{G}(g_k)$ (G5)
diagonalisation $\quad \mathcal{G}\!\left([t^n] F(t) \phi(t)^n\right) = \left[\left.\dfrac{F(w)}{1 - t\phi'(w)} \,\right|\, w = t\phi(w)\right]$ (G6)
Table 4.1: The rules for the generating function operator
The coefficients $[t^n] F(t) \phi(t)^n$ are just the elements in the main
diagonal of this array.
The rules (G1)–(G6) can also be assumed as axioms
of a theory of generating functions and used to
derive general theorems as well as specific functions
for particular sequences. In the next sections, we
will prove a number of properties of the generating
function operator. The proofs rely on the following
fundamental principle of identity:
Given two sequences $(f_k)_{k \in \mathbb{N}}$ and $(g_k)_{k \in \mathbb{N}}$, then
$\mathcal{G}(f_k) = \mathcal{G}(g_k)$ if and only if for every $k \in \mathbb{N}$, $f_k = g_k$.
The principle is rather obvious from the very definition
of the concept of generating functions; however,
it is important, because it states the condition under
which we can pass from an identity about elements
to the corresponding identity about generating functions.
It is sufficient that the two sequences disagree
in a single element (e.g., the first one), and we
cannot infer the equality of the generating functions.
4.2 Some Theorems on Generating Functions
We are now going to prove a series of properties of
generating functions.
Theorem 4.2.1 Let $f(t) = \mathcal{G}(f_k)$ be the generating
function of the sequence $(f_k)_{k \in \mathbb{N}}$; then:
$\mathcal{G}(f_{k+2}) = \frac{\mathcal{G}(f_k) - f_0 - f_1 t}{t^2} \qquad (4.2.1)$
Proof: Let $g_k = f_{k+1}$; by (G2), $\mathcal{G}(g_k) = (\mathcal{G}(f_k) - f_0)/t$. Since $g_0 = f_1$, we have:
$\mathcal{G}(f_{k+2}) = \mathcal{G}(g_{k+1}) = \frac{\mathcal{G}(g_k) - g_0}{t} = \frac{\mathcal{G}(f_k) - f_0 - f_1 t}{t^2}$
By mathematical induction this result can be generalized
to:
Theorem 4.2.2 Let $f(t) = \mathcal{G}(f_k)$ be as above; then:
$\mathcal{G}(f_{k+j}) = \frac{\mathcal{G}(f_k) - f_0 - f_1 t - \cdots - f_{j-1} t^{j-1}}{t^j} \qquad (4.2.2)$
If we consider right instead of left shifting, we have
to be more careful:
Theorem 4.2.3 Let $f(t) = \mathcal{G}(f_k)$ be as above; then:
$\mathcal{G}(f_{k-j}) = t^j\, \mathcal{G}(f_k) \qquad (4.2.3)$
Proof: We have $\mathcal{G}(f_k) = \mathcal{G}(f_{(k-1)+1}) = t^{-1}\left(\mathcal{G}(f_{k-1}) - f_{-1}\right)$, where $f_{-1}$ is the coefficient of $t^{-1}$
in $f(t)$. If $f(t) \in \mathcal{T}$, $f_{-1} = 0$ and $\mathcal{G}(f_{k-1}) = t\,\mathcal{G}(f_k)$.
The theorem then follows by mathematical induction.
Property (G3) can be generalized in several ways:
Theorem 4.2.4 Let $f(t) = \mathcal{G}(f_k)$ be as above; then:
$\mathcal{G}((k+1) f_{k+1}) = D\,\mathcal{G}(f_k) \qquad (4.2.4)$
Proof: If we set $g_k = k f_k$, we obtain:
$\mathcal{G}((k+1) f_{k+1}) = \mathcal{G}(g_{k+1}) = t^{-1}\left(\mathcal{G}(k f_k) - 0 \cdot f_0\right) = t^{-1}\, t D\,\mathcal{G}(f_k) = D\,\mathcal{G}(f_k),$
where the second passage uses (G2) and the third (G3).
Theorem 4.2.5 Let $f(t) = \mathcal{G}(f_k)$ be as above; then:
$\mathcal{G}(k^2 f_k) = t D\,\mathcal{G}(f_k) + t^2 D^2\,\mathcal{G}(f_k) \qquad (4.2.5)$
This can be further generalized:
Theorem 4.2.6 Let $f(t) = \mathcal{G}(f_k)$ be as above; then:
$\mathcal{G}(k^j f_k) = S_j(tD)\,\mathcal{G}(f_k) \qquad (4.2.6)$
where $S_j(w) = \sum_{r=1}^{j} \left\{{j \atop r}\right\} w^r$ is the $j$-th Stirling polynomial
of the second kind (see Section 2.11).
Proof: Formula (4.2.6) is to be understood in the
operator sense; so, for example, being $S_3(w) = w + 3w^2 + w^3$, we have:
$\mathcal{G}(k^3 f_k) = t D\,\mathcal{G}(f_k) + 3 t^2 D^2\,\mathcal{G}(f_k) + t^3 D^3\,\mathcal{G}(f_k).$
The proof proceeds by induction, as (G3) and (4.2.5)
are the first two instances. Now:
$\mathcal{G}(k^{j+1} f_k) = \mathcal{G}(k \cdot k^j f_k) = t D\,\mathcal{G}(k^j f_k)$
that is:
$S_{j+1}(tD) = t D\, S_j(tD) = t D \sum_{r=1}^{j} S(j, r)\, t^r D^r = \sum_{r=1}^{j} S(j, r)\, r\, t^r D^r + \sum_{r=1}^{j} S(j, r)\, t^{r+1} D^{r+1}.$
By equating like coefficients we find $S(j+1, r) = r\,S(j, r) + S(j, r-1)$, which is the classical recurrence
for the Stirling numbers of the second kind.
Since the initial conditions also coincide, we can conclude
$S(j, r) = \left\{{j \atop r}\right\}$.
For the falling factorial $k^{\underline{r}} = k(k-1) \cdots (k-r+1)$ we have a simpler formula, the proof of which is
immediate:
Theorem 4.2.7 Let $f(t) = \mathcal{G}(f_k)$ be as above; then:
$\mathcal{G}(k^{\underline{r}} f_k) = t^r D^r\,\mathcal{G}(f_k) \qquad (4.2.7)$
Let us now come to integration:
Theorem 4.2.8 Let $f(t) = \mathcal{G}(f_k)$ be as above and
let us define $g_k = f_k / k$ for $k \ne 0$ and $g_0 = 0$; then:
$\mathcal{G}\!\left(\frac{1}{k} f_k\right) = \mathcal{G}(g_k) = \int_0^t \left(\mathcal{G}(f_k) - f_0\right) \frac{dz}{z} \qquad (4.2.8)$
Proof: Clearly, $k g_k = f_k$, except for $k = 0$. Hence
we have $\mathcal{G}(k g_k) = \mathcal{G}(f_k) - f_0$. By using (G3), we find
$t D\,\mathcal{G}(g_k) = \mathcal{G}(f_k) - f_0$, from which (4.2.8) follows by
integration and the condition $g_0 = 0$.
A more classical formula is:
Theorem 4.2.9 Let $f(t) = \mathcal{G}(f_k)$ be as above or,
equivalently, let $f(t)$ be a f.p.s. but not a f.L.s.; then:
$\mathcal{G}\!\left(\frac{1}{k+1} f_k\right) = \frac{1}{t} \int_0^t \mathcal{G}(f_k)\,dz = \frac{1}{t} \int_0^t f(z)\,dz \qquad (4.2.9)$
Proof: Let us consider the sequence $(g_k)_{k \in \mathbb{N}}$, where
$g_{k+1} = f_k$ and $g_0 = 0$, so that $\mathcal{G}(g_{k+1}) = \mathcal{G}(f_k) = t^{-1}\left(\mathcal{G}(g_k) - g_0\right)$. Finally:
$\mathcal{G}\!\left(\frac{1}{k+1} f_k\right) = \mathcal{G}\!\left(\frac{1}{k+1} g_{k+1}\right) = \frac{1}{t}\,\mathcal{G}\!\left(\frac{1}{k} g_k\right) = \frac{1}{t} \int_0^t \left(\mathcal{G}(g_k) - g_0\right) \frac{dz}{z} = \frac{1}{t} \int_0^t \mathcal{G}(f_k)\,dz$
In the following theorems, $f(t)$ will always denote
the generating function $\mathcal{G}(f_k)$.
Theorem 4.2.10 Let $f(t) = \mathcal{G}(f_k)$ denote the generating
function of the sequence $(f_k)_{k \in \mathbb{N}}$; then:
$\mathcal{G}(p^k f_k) = f(pt) \qquad (4.2.10)$
Proof: By setting $g(t) = pt$ in (G5) we have:
$f(pt) = \sum_{n=0}^{\infty} f_n (pt)^n = \sum_{n=0}^{\infty} p^n f_n t^n = \mathcal{G}(p^k f_k)$
An important particular case is $p = -1$, for which we
obtain $\mathcal{G}((-1)^k f_k) = f(-t)$.
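As a quick numerical check of (4.2.6) and (4.2.7): since $t^r D^r$ multiplies $f_k$ by the falling factorial $k^{\underline{r}}$, formula (4.2.6) for $j = 3$ amounts to the coefficientwise identity $k^3 = k + 3k(k-1) + k(k-1)(k-2)$. An illustrative check (the helper name is my own):

```python
def falling(k, r):
    # falling factorial k(k-1)...(k-r+1)
    p = 1
    for i in range(r):
        p *= k - i
    return p

# k^3 f_k = (tD + 3 t^2 D^2 + t^3 D^3) G(f_k), coefficientwise:
for k in range(20):
    assert k**3 == falling(k, 1) + 3 * falling(k, 2) + falling(k, 3)
```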
4.3 More advanced results
The results obtained till now can be considered as
simple generalizations of the axioms. They are very
useful and will be used in many circumstances. However,
we can also obtain more advanced results, concerning
sequences derived from a given sequence by
manipulating its elements in various ways. For example,
let us begin by proving the well-known bisection
formulas:
Theorem 4.3.1 Let $f(t) = \mathcal{G}(f_k)$ denote the generating
function of the sequence $(f_k)_{k \in \mathbb{N}}$; then:
$\mathcal{G}(f_{2k}) = \frac{f(\sqrt{t}) + f(-\sqrt{t})}{2} \qquad (4.3.1)$
$\mathcal{G}(f_{2k+1}) = \frac{f(\sqrt{t}) - f(-\sqrt{t})}{2\sqrt{t}} \qquad (4.3.2)$
where $(\sqrt{t})^2 = t$.
Proof: By (G5) we have:
$\frac{f(\sqrt{t}) + f(-\sqrt{t})}{2} = \frac{1}{2}\left(\sum_{n=0}^{\infty} f_n (\sqrt{t})^n + \sum_{n=0}^{\infty} f_n (-\sqrt{t})^n\right) = \sum_{n=0}^{\infty} f_n \frac{(\sqrt{t})^n + (-\sqrt{t})^n}{2}$
For $n$ odd, $(\sqrt{t})^n + (-\sqrt{t})^n = 0$; hence, by setting
$n = 2k$, we have:
$\sum_{k=0}^{\infty} f_{2k} (t^k + t^k)/2 = \sum_{k=0}^{\infty} f_{2k} t^k = \mathcal{G}(f_{2k}).$
The proof of the second formula is analogous.
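For a quick sanity check of the bisection formulas, take $f(t) = 1/(1-t)$, i.e. $f_k = 1$ for all $k$, so that both $\mathcal{G}(f_{2k})$ and $\mathcal{G}(f_{2k+1})$ equal $1/(1-t)$ again; evaluating the truncated series at a small point (an illustrative numerical test, not from the text):

```python
import math

t = 0.1
s = math.sqrt(t)
f = lambda x: 1.0 / (1.0 - x)
series = sum(t**k for k in range(60))   # G(f_{2k}) = G(f_{2k+1}) = 1/(1-t)
assert abs((f(s) + f(-s)) / 2 - series) < 1e-12
assert abs((f(s) - f(-s)) / (2 * s) - series) < 1e-12
```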
The following proof is typical and introduces the
use of ordinary differential equations in the calculus
of generating functions:
Theorem 4.3.2 Let $f(t) = \mathcal{G}(f_k)$ denote the generating
function of the sequence $(f_k)_{k \in \mathbb{N}}$; then:
$\mathcal{G}\!\left(\frac{f_k}{2k+1}\right) = \frac{1}{2\sqrt{t}} \int_0^t \frac{f(z)}{\sqrt{z}}\,dz \qquad (4.3.3)$
Proof: Let us set $g_k = f_k/(2k+1)$, or $2k g_k + g_k = f_k$.
If $g(t) = \mathcal{G}(g_k)$, by applying (G3) we have the differential
equation $2t g'(t) + g(t) = f(t)$, whose solution with
the initial condition $g(0) = f_0$ is exactly formula (4.3.3).
The next theorem gives the generating function for
the sequence of partial sums:
Theorem 4.3.3 Let $f(t) = \mathcal{G}(f_k)$ be as above; then:
$\mathcal{G}\!\left(\sum_{k=0}^{n} f_k\right) = \frac{1}{1-t}\,\mathcal{G}(f_k) \qquad (4.3.4)$
Proof: If we set $s_n = \sum_{k=0}^{n} f_k$, then we have $s_{n+1} = s_n + f_{n+1}$ for every $n \in \mathbb{N}$, and we can apply the operator
$\mathcal{G}$ to both members: $\mathcal{G}(s_{n+1}) = \mathcal{G}(s_n) + \mathcal{G}(f_{n+1})$,
i.e.:
$\frac{\mathcal{G}(s_n) - s_0}{t} = \mathcal{G}(s_n) + \frac{\mathcal{G}(f_n) - f_0}{t}$
Since $s_0 = f_0$, we find $\mathcal{G}(s_n) = t\,\mathcal{G}(s_n) + \mathcal{G}(f_n)$, and
from this (4.3.4) follows directly.
The following result is known as the Euler transformation:
Theorem 4.3.4 Let $f(t) = \mathcal{G}(f_k)$ denote the generating
function of the sequence $(f_k)_{k \in \mathbb{N}}$; then:
$\mathcal{G}\!\left(\sum_{k=0}^{n} \binom{n}{k} f_k\right) = \frac{1}{1-t}\, f\!\left(\frac{t}{1-t}\right) \qquad (4.3.5)$
Proof: By well-known properties of binomial coefficients
we have:
$\binom{n}{k} = \binom{n}{n-k} = \binom{-n+(n-k)-1}{n-k} (-1)^{n-k} = \binom{-k-1}{n-k} (-1)^{n-k}$
and this is the coefficient of $t^{n-k}$ in $(1-t)^{-k-1}$. We
now observe that the sum in (4.3.5) can be extended
to infinity, and by (G5) we have:
$\sum_{k=0}^{n} \binom{n}{k} f_k = \sum_{k=0}^{n} \binom{-k-1}{n-k} (-1)^{n-k} f_k = \sum_{k=0}^{\infty} [t^{n-k}] (1-t)^{-k-1}\, [y^k] f(y) =$
$= [t^n] \frac{1}{1-t} \sum_{k=0}^{\infty} [y^k] f(y) \left(\frac{t}{1-t}\right)^k = [t^n] \frac{1}{1-t}\, f\!\left(\frac{t}{1-t}\right).$
Since the last expression does not depend on $n$, it
represents the generating function of the sum.
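The Euler transformation can be checked mechanically with truncated series arithmetic. The fragment below (an illustrative sketch, with $f_k = k^2$ as test sequence) builds $\frac{1}{1-t} f\!\left(\frac{t}{1-t}\right)$ by Cauchy products and compares its coefficients with the binomial sums:

```python
from math import comb

N = 10

def mul(a, b):                       # truncated Cauchy product
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(N + 1)]

inv = [1] * (N + 1)                  # 1/(1-t)
y = [0] + [1] * N                    # t/(1-t)
f = [k * k for k in range(N + 1)]    # f_k = k^2

total = [f[0]] + [0] * N             # running sum of f_k * y^k
yk = [1] + [0] * N                   # y^k
for k in range(1, N + 1):
    yk = mul(yk, y)
    total = [u + f[k] * v for u, v in zip(total, yk)]
rhs = mul(inv, total)                # coefficients of (1/(1-t)) f(t/(1-t))

lhs = [sum(comb(n, k) * f[k] for k in range(n + 1)) for n in range(N + 1)]
assert rhs == lhs
```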
We observe explicitly that by (K4) we have:
$\sum_{k=0}^{n} \binom{n}{k} f_k = [t^n] (1+t)^n f(t)$, but this expression
does not represent a generating function because it
depends on $n$. The Euler transformation can be generalized
in several ways, as we shall see when dealing
with Riordan arrays.
4.4 Common Generating Functions
The aim of the present section is to derive the most
common generating functions by using the apparatus
of the previous sections. As a first example, let us
consider the constant sequence $F = (1, 1, 1, \ldots)$, for
which we have $f_{k+1} = f_k$ for every $k \in \mathbb{N}$. By applying
the principle of identity, we find $\mathcal{G}(f_{k+1}) = \mathcal{G}(f_k)$, that is, by (G2): $\mathcal{G}(f_k) - f_0 = t\,\mathcal{G}(f_k)$. Since
$f_0 = 1$, we have immediately:
$\mathcal{G}(1) = \frac{1}{1-t}$
For any constant sequence $F = (c, c, c, \ldots)$, by (G1)
we find that $\mathcal{G}(c) = c(1-t)^{-1}$. Similarly, by using the
basic rules and the theorems of the previous sections
we have:
$\mathcal{G}(n) = \mathcal{G}(n \cdot 1) = tD\, \frac{1}{1-t} = \frac{t}{(1-t)^2}$
$\mathcal{G}(n^2) = tD\,\mathcal{G}(n) = tD\, \frac{t}{(1-t)^2} = \frac{t + t^2}{(1-t)^3}$
$\mathcal{G}((-1)^n) = \mathcal{G}(1)\big|_{t \to -t} = \frac{1}{1+t}$
$\mathcal{G}\!\left(\frac{1}{n}\right) = \int_0^t \left(\frac{1}{1-z} - 1\right) \frac{dz}{z} = \int_0^t \frac{dz}{1-z} = \ln \frac{1}{1-t}$
$\mathcal{G}(H_n) = \mathcal{G}\!\left(\sum_{k=1}^{n} \frac{1}{k}\right) = \frac{1}{1-t}\,\mathcal{G}\!\left(\frac{1}{n}\right) = \frac{1}{1-t} \ln \frac{1}{1-t}$
where $H_n$ is the $n$-th harmonic number. Other generating
functions can be obtained from the previous
formulas:
$\mathcal{G}(n H_n) = tD \left(\frac{1}{1-t} \ln \frac{1}{1-t}\right) = \frac{t}{(1-t)^2} \left(\ln \frac{1}{1-t} + 1\right)$
$\mathcal{G}\!\left(\frac{1}{n+1} H_n\right) = \frac{1}{t} \int_0^t \frac{1}{1-z} \ln \frac{1}{1-z}\,dz = \frac{1}{2t} \left(\ln \frac{1}{1-t}\right)^2$
$\mathcal{G}(\delta_{0,n}) = \mathcal{G}(1) - t\,\mathcal{G}(1) = \frac{1-t}{1-t} = 1$
where $\delta_{n,m}$ is Kronecker's delta. This last relation
can be readily generalized to $\mathcal{G}(\delta_{n,m}) = t^m$.
An interesting example is given by $\mathcal{G}\!\left(\frac{1}{n(n+1)}\right)$. Since
$\frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1}$, it is tempting to apply the operator
$\mathcal{G}$ to both members. However, this relation is
not valid for $n = 0$. In order to apply the principle
of identity, we must define:
$\frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1} + \delta_{n,0}$
in accordance with the fact that the first element of
the sequence is zero. We thus arrive at the correct
generating function:
$\mathcal{G}\!\left(\frac{1}{n(n+1)}\right) = 1 - \frac{1-t}{t} \ln \frac{1}{1-t}$
Let us now come to binomial coefficients. In order
to find $\mathcal{G}\!\left(\binom{p}{k}\right)$, let us observe that:
$\binom{p}{k+1} = \frac{p(p-1) \cdots (p-k+1)(p-k)}{(k+1)!} = \frac{p-k}{k+1} \binom{p}{k}.$
Hence, by denoting $\binom{p}{k}$ as $f_k$, we have:
$\mathcal{G}((k+1) f_{k+1}) = \mathcal{G}((p-k) f_k) = p\,\mathcal{G}(f_k) - \mathcal{G}(k f_k).$
By applying (4.2.4) and (G3) we have
$D\,\mathcal{G}(f_k) = p\,\mathcal{G}(f_k) - tD\,\mathcal{G}(f_k)$, i.e., the differential
equation $f'(t) = p f(t) - t f'(t)$. By separating
the variables and integrating, we find
$\ln f(t) = p \ln(1+t) + c$, or $f(t) = c(1+t)^p$. For $t = 0$
we should have $f(0) = \binom{p}{0} = 1$, and therefore $c = 1$:
$\mathcal{G}\!\left(\binom{p}{k}\right) = (1+t)^p \qquad \forall p \in \mathbb{R}$
We are now in a position to derive the recurrence
relation for binomial coefficients. By using (K1)–(K5)
we easily find:
$\binom{p}{k} = [t^k](1+t)^p = [t^k](1+t)(1+t)^{p-1} = [t^k](1+t)^{p-1} + [t^{k-1}](1+t)^{p-1} = \binom{p-1}{k} + \binom{p-1}{k-1}.$
By (K4), we have the well-known Vandermonde
convolution:
$\binom{m+p}{n} = [t^n](1+t)^{m+p} = [t^n](1+t)^m (1+t)^p = \sum_{k=0}^{n} \binom{m}{k} \binom{p}{n-k}$
In particular, for $m = p = n$ it yields the identity:
$\sum_{k=0}^{n} \binom{n}{k}^2 = \binom{2n}{n}.$
We can also find $\mathcal{G}\!\left(\binom{k}{p}\right)$, where $p$ is a parameter.
The derivation is purely algebraic and makes use of
the generating functions already found and of various
properties considered in the previous section:
$\mathcal{G}\!\left(\binom{k}{p}\right) = \mathcal{G}\!\left(\binom{k}{k-p}\right) = \mathcal{G}\!\left(\binom{-p-1}{k-p} (-1)^{k-p}\right) =$
$= \mathcal{G}\!\left([t^{k-p}]\, \frac{1}{(1-t)^{p+1}}\right) = \mathcal{G}\!\left([t^k]\, \frac{t^p}{(1-t)^{p+1}}\right) = \frac{t^p}{(1-t)^{p+1}}.$
Several generating functions for different forms of binomial
coefficients can be found by means of this
method. They are summarized as follows, where $p$
and $m$ are two parameters and can also be zero:
$\mathcal{G}\!\left(\binom{p}{m+k}\right) = \frac{(1+t)^p}{t^m}$
$\mathcal{G}\!\left(\binom{p+k}{m}\right) = \frac{t^{m-p}}{(1-t)^{m+1}}$
$\mathcal{G}\!\left(\binom{p+k}{m+k}\right) = \frac{1}{t^m (1-t)^{p+1-m}}$
These functions can make sense even when they are
f.L.s. and not simply f.p.s..
Finally, we list the following generating functions:
$\mathcal{G}\!\left(k \binom{p}{k}\right) = tD\,\mathcal{G}\!\left(\binom{p}{k}\right) = tD(1+t)^p = pt(1+t)^{p-1}$
$\mathcal{G}\!\left(k^2 \binom{p}{k}\right) = (tD + t^2 D^2)\,\mathcal{G}\!\left(\binom{p}{k}\right) = pt(1+pt)(1+t)^{p-2}$
$\mathcal{G}\!\left(\frac{1}{k+1} \binom{p}{k}\right) = \frac{1}{t} \int_0^t \mathcal{G}\!\left(\binom{p}{k}\right) dz = \frac{1}{t} \left[\frac{(1+z)^{p+1}}{p+1}\right]_0^t = \frac{(1+t)^{p+1} - 1}{(p+1)\,t}$
$\mathcal{G}\!\left(k \binom{k}{m}\right) = tD\, \frac{t^m}{(1-t)^{m+1}} = \frac{m t^m + t^{m+1}}{(1-t)^{m+2}}$
$\mathcal{G}\!\left(k^2 \binom{k}{m}\right) = (tD + t^2 D^2)\,\mathcal{G}\!\left(\binom{k}{m}\right) = \frac{m^2 t^m + (3m+1) t^{m+1} + t^{m+2}}{(1-t)^{m+3}}$
$\mathcal{G}\!\left(\frac{1}{k} \binom{k}{m}\right) = \int_0^t \left(\frac{z^m}{(1-z)^{m+1}} - \binom{0}{m}\right) \frac{dz}{z} = \int_0^t \frac{z^{m-1}\,dz}{(1-z)^{m+1}} = \frac{t^m}{m(1-t)^m}$
The last integral can be solved by setting $y = (1-z)^{-1}$ and is valid for $m > 0$; for $m = 0$ it reduces to
$\mathcal{G}(1/k) = \ln \frac{1}{1-t}$.
4.5 The Method of Shifting
When the elements of a sequence $F$ are given by an
explicit formula, we can try to find the generating
function for $F$ by using the technique of shifting: we
consider the element $f_{n+1}$ and try to express it in
terms of $f_n$. This can produce a relation to which
we apply the principle of identity, deriving an equation
in $\mathcal{G}(f_n)$, the solution of which is the generating
function. In practice, we find a recurrence for the
elements $f_n \in F$ and then try to solve it by using
the rules (G1)–(G5) and their consequences. It can
happen that the recurrence involves several elements
in $F$ and/or that the resulting equation is indeed a
differential equation. Whatever the case, the method
of shifting allows us to find the generating function
of many sequences.
Let us consider the geometric sequence
$(1, p, p^2, p^3, \ldots)$; we have $p^{k+1} = p \cdot p^k$, $\forall k \in \mathbb{N}$,
or, by applying the operator $\mathcal{G}$: $\mathcal{G}(p^{k+1}) = p\,\mathcal{G}(p^k)$.
By (G2) we have $t^{-1}\left(\mathcal{G}(p^k) - 1\right) = p\,\mathcal{G}(p^k)$, that is:
$\mathcal{G}(p^k) = \frac{1}{1 - pt}$
From this we obtain other generating functions:
$\mathcal{G}(k p^k) = tD\, \frac{1}{1-pt} = \frac{pt}{(1-pt)^2}$
$\mathcal{G}(k^2 p^k) = tD\, \frac{pt}{(1-pt)^2} = \frac{pt + p^2 t^2}{(1-pt)^3}$
$\mathcal{G}\!\left(\frac{1}{k} p^k\right) = \int_0^t \left(\frac{1}{1-pz} - 1\right) \frac{dz}{z} = \int_0^t \frac{p\,dz}{1-pz} = \ln \frac{1}{1-pt}$
$\mathcal{G}\!\left(\sum_{k=0}^{n} p^k\right) = \frac{1}{1-t} \cdot \frac{1}{1-pt} = \frac{1}{(p-1)t} \left(\frac{1}{1-pt} - \frac{1}{1-t}\right).$
The last relation has been obtained by partial fraction
expansion. By using the operator $[t^k]$ we easily find:
$\sum_{k=0}^{n} p^k = [t^n] \frac{1}{1-t} \cdot \frac{1}{1-pt} = \frac{1}{p-1} [t^{n+1}] \left(\frac{1}{1-pt} - \frac{1}{1-t}\right) = \frac{p^{n+1} - 1}{p-1},$
the well-known formula for the sum of a geometric
progression. We observe explicitly that the formulas
above could have been obtained from the formulas of the
previous section and the general formula (4.2.10). In
a similar way we also have:
$\mathcal{G}\!\left(\binom{m}{k} p^k\right) = (1+pt)^m$
$\mathcal{G}\!\left(\binom{k}{m} p^k\right) = \mathcal{G}\!\left(\binom{k}{k-m} p^{k-m} p^m\right) = p^m\, \mathcal{G}\!\left(\binom{-m-1}{k-m} (-p)^{k-m}\right) = \frac{p^m t^m}{(1-pt)^{m+1}}.$
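The geometric-progression formula above can be verified by extracting coefficients from the product $\frac{1}{1-t} \cdot \frac{1}{1-pt}$ directly (a small illustrative check):

```python
p, N = 3, 12
a = [1] * (N + 1)                    # coefficients of 1/(1-t)
b = [p**k for k in range(N + 1)]     # coefficients of 1/(1-pt)
c = [sum(a[j] * b[n - j] for j in range(n + 1)) for n in range(N + 1)]
for n in range(N + 1):
    assert c[n] == sum(p**k for k in range(n + 1)) == (p**(n + 1) - 1) // (p - 1)
```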
As a very simple application of the shifting method,
let us observe that:
$\frac{1}{(n+1)!} = \frac{1}{n+1} \cdot \frac{1}{n!}$
that is, $(n+1) f_{n+1} = f_n$, where $f_n = 1/n!$. By (4.2.4)
we have $D\,\mathcal{G}(f_n) = \mathcal{G}(f_n)$, and therefore:
$\mathcal{G}\!\left(\frac{1}{n!}\right) = e^t \qquad\qquad \mathcal{G}\!\left(\frac{1}{n \cdot n!}\right) = \int_0^t \left(e^z - 1\right) \frac{dz}{z}$
$\mathcal{G}\!\left(\frac{n}{(n+1)!}\right) = tD\,\mathcal{G}\!\left(\frac{1}{(n+1)!}\right) = tD\, \frac{e^t - 1}{t} = \frac{t e^t - e^t + 1}{t}$
By this relation the well-known result follows:
$\sum_{k=0}^{n} \frac{k}{(k+1)!} = [t^n] \frac{1}{1-t} \cdot \frac{t e^t - e^t + 1}{t} = [t^n] \frac{1}{1-t} \left(e^t - \frac{e^t - 1}{t}\right) = 1 - \frac{1}{(n+1)!}.$
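The closed form is easy to confirm with exact arithmetic (illustrative check):

```python
from fractions import Fraction
from math import factorial

for n in range(12):
    s = sum(Fraction(k, factorial(k + 1)) for k in range(n + 1))
    assert s == 1 - Fraction(1, factorial(n + 1))
```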
Let us now observe that $\binom{2n+2}{n+1} = \frac{2(2n+1)}{n+1} \binom{2n}{n}$;
by setting $f_n = \binom{2n}{n}$, this is the recurrence $(n+1) f_{n+1} = (4n+2) f_n$, which translates into the differential
equation $f'(t) = 4t f'(t) + 2 f(t)$; its solution with $f(0) = 1$ gives:
$\mathcal{G}\!\left(\binom{2n}{n}\right) = \frac{1}{\sqrt{1-4t}}$
$\mathcal{G}\!\left(\frac{1}{n+1} \binom{2n}{n}\right) = \frac{1}{t} \int_0^t \frac{dz}{\sqrt{1-4z}} = \frac{1 - \sqrt{1-4t}}{2t}$
$\mathcal{G}\!\left(\frac{1}{n} \binom{2n}{n}\right) = \int_0^t \left(\frac{1}{\sqrt{1-4z}} - 1\right) \frac{dz}{z} = 2 \ln \frac{1 - \sqrt{1-4t}}{2t}$
$\mathcal{G}\!\left(n \binom{2n}{n}\right) = \frac{2t}{(1-4t)\sqrt{1-4t}}$
$\mathcal{G}\!\left(\frac{1}{2n+1} \binom{2n}{n}\right) = \frac{1}{2\sqrt{t}} \int_0^t \frac{dz}{\sqrt{z(1-4z)}} = \frac{1}{\sqrt{4t}} \arctan \sqrt{\frac{4t}{1-4t}}.$
A last group of generating functions is obtained by
considering $f_n = 4^n \binom{2n}{n}^{-1}$. Since:
$4^{n+1} \binom{2n+2}{n+1}^{-1} = \frac{2n+2}{2n+1}\; 4^n \binom{2n}{n}^{-1}$
we have the recurrence: $(2n+1) f_{n+1} = 2(n+1) f_n$.
By using the operator $\mathcal{G}$ and the rules of Section 4.2,
the differential equation $2t(1-t) f'(t) - (1+2t) f(t) + 1 = 0$ is derived. The solution is:
$f(t) = \frac{\sqrt{t}}{\sqrt{(1-t)^3}} \int_0^t \sqrt{\frac{(1-z)^3}{z}}\; \frac{dz}{2z(1-z)}$
and finally we find:
$\mathcal{G}\!\left(\frac{4^n}{\binom{2n}{n}}\right) = \frac{\sqrt{t}}{\sqrt{(1-t)^3}} \arctan \sqrt{\frac{t}{1-t}} + \frac{1}{1-t}.$
Some immediate consequences are:
$\mathcal{G}\!\left(\frac{4^n}{2n+1} \binom{2n}{n}^{-1}\right) = \frac{1}{\sqrt{t(1-t)}} \arctan \sqrt{\frac{t}{1-t}}$
$\mathcal{G}\!\left(\frac{4^n}{2n^2} \binom{2n}{n}^{-1}\right) = \left(\arctan \sqrt{\frac{t}{1-t}}\right)^2$
and finally:
$\mathcal{G}\!\left(\frac{1}{2n} \cdot \frac{4^n}{2n+1} \binom{2n}{n}^{-1}\right) = 1 - \sqrt{\frac{1-t}{t}} \arctan \sqrt{\frac{t}{1-t}}.$
4.6 Diagonalization
The technique of shifting is a rather general method
for obtaining generating functions. It produces first
order recurrence relations, which will be more closely
studied in the next sections. Not every sequence can
be defined by a first order recurrence relation, and
other methods are often necessary to find generating
functions. Sometimes, the rule of diagonalization
can be used very conveniently. One of the simplest
examples is how to determine the generating
function of the central binomial coefficients, without
having to pass through the solution of a differential
equation. In fact we have:
$\binom{2n}{n} = [t^n] (1+t)^{2n}$
and (G6) can be applied with $F(t) = 1$ and $\phi(t) = (1+t)^2$. In this case, the function $w = w(t)$ is easily
determined by solving the functional equation $w = t(1+w)^2$. By expanding, we find $t w^2 - (1-2t) w + t = 0$,
or:
$w = w(t) = \frac{1 - 2t - \sqrt{1-4t}}{2t}.$
Since $w = w(t)$ should belong to $\mathcal{T}_1$, we must eliminate
the solution with the $+$ sign; consequently, we
have:
$\mathcal{G}\!\left(\binom{2n}{n}\right) = \left[\left.\frac{1}{1 - 2t(1+w)} \,\right|\, w = \frac{1 - 2t - \sqrt{1-4t}}{2t}\right] = \frac{1}{\sqrt{1-4t}}$
as we already know.
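The result can be checked by expanding $(1-4t)^{-1/2}$ with the generalized binomial theorem: $[t^n](1-4t)^{-1/2} = \binom{-1/2}{n}(-4)^n$ should equal $\binom{2n}{n}$ (the helper name below is illustrative):

```python
from fractions import Fraction
from math import comb

def gen_binom(alpha, n):
    # generalized binomial coefficient: alpha over n
    p = Fraction(1)
    for i in range(n):
        p = p * (alpha - i) / (i + 1)
    return p

for n in range(12):
    assert gen_binom(Fraction(-1, 2), n) * (-4)**n == comb(2 * n, n)
```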
The function $\phi(t) = (1+t)^2$ gives rise to a second
degree equation. More in general, let us study the
sequence:
$c_n = [t^n] (1 + \alpha t + \beta t^2)^n$
and look for its generating function $C(t) = \mathcal{G}(c_n)$.
In this case again we have $F(t) = 1$ and $\phi(t) = 1 + \alpha t + \beta t^2$, and therefore we should solve the functional
equation $w = t(1 + \alpha w + \beta w^2)$, or $\beta t w^2 - (1 - \alpha t) w + t = 0$. This gives:
$w = w(t) = \frac{1 - \alpha t - \sqrt{(1 - \alpha t)^2 - 4\beta t^2}}{2\beta t}$
and again we have to eliminate the solution with the
$+$ sign. By performing the necessary computations,
we find:
$C(t) = \left[\left.\frac{1}{1 - t(\alpha + 2\beta w)} \,\right|\, w = w(t)\right] = \frac{1}{\sqrt{1 - 2\alpha t + (\alpha^2 - 4\beta) t^2}}$
and for $\alpha = 2$, $\beta = 1$ we obtain again the generating
function for the central binomial coefficients.
function for the central binomial coecients.
The coecients of (1+t +t
2
)
n
are called trinomial
coecients, in analogy with the binomial coecients.
They constitute an innite array in which every row
has two more elements, dierent from 0, with respect
to the previous row (see Table 4.2).
If $T_{n,k} = [t^k](1 + t + t^2)^n$ is a trinomial coefficient,
by the obvious property $(1 + t + t^2)^{n+1} = (1 + t + t^2)(1 + t + t^2)^n$ we immediately deduce the recurrence
relation:
$T_{n+1,k+1} = T_{n,k-1} + T_{n,k} + T_{n,k+1}$
from which the array can be built, once we start from
the initial conditions $T_{n,0} = 1$ and $T_{n,2n} = 1$, for every
$n \in \mathbb{N}$. The elements $T_{n,n}$, marked in the table,
are called the central trinomial coefficients; their sequence
begins:

n     0  1  2  3  4   5   6    7    8
T_n   1  1  3  7  19  51  141  393  1107

and by the formula above their generating function
is:
$\mathcal{G}(T_{n,n}) = \frac{1}{\sqrt{1 - 2t - 3t^2}} = \frac{1}{\sqrt{(1+t)(1-3t)}}.$
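The recurrence gives a direct way to compute the rows of the trinomial array and to check the table of central coefficients above (an illustrative sketch):

```python
def trinomial_row(n):
    # coefficients of (1 + t + t^2)^n, built row by row from the recurrence
    row = [1]
    for _ in range(n):
        new = [0] * (len(row) + 2)
        for k, c in enumerate(row):
            new[k] += c
            new[k + 1] += c
            new[k + 2] += c
        row = new
    return row

central = [trinomial_row(n)[n] for n in range(9)]
assert central == [1, 1, 3, 7, 19, 51, 141, 393, 1107]
```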
4.7 Some special generating functions
We wish to determine the generating function of
the sequence $0, 1, 1, 1, 2, 2, 2, 2, 2, 3, \ldots$, that is, the
sequence whose generic element is $\lfloor\sqrt{k}\rfloor$. We can
think of it as formed by summing an infinite
number of simpler sequences $0, 1, 1, 1, 1, 1, 1, \ldots$,
$0, 0, 0, 0, 1, 1, 1, 1, \ldots$, the next one with the first 1
in position 9, and so on. The generating functions of
these sequences are:
$\frac{t}{1-t} \qquad \frac{t^4}{1-t} \qquad \frac{t^9}{1-t} \qquad \frac{t^{16}}{1-t} \qquad \cdots$
and therefore we obtain:
$\mathcal{G}\!\left(\lfloor\sqrt{k}\rfloor\right) = \frac{1}{1-t} \sum_{k=1}^{\infty} t^{k^2}.$
In the same way we obtain analogous generating
functions:
$\mathcal{G}\!\left(\lfloor\sqrt[r]{k}\rfloor\right) = \frac{1}{1-t} \sum_{k=1}^{\infty} t^{k^r} \qquad\qquad \mathcal{G}(\lfloor\log_r k\rfloor) = \frac{1}{1-t} \sum_{k=1}^{\infty} t^{r^k}$
where $r$ is any integer number, or also any real number,
if we substitute $\lceil k^r \rceil$ and $\lceil r^k \rceil$ for $k^r$ and $r^k$,
respectively.
These generating functions can be used to find the
values of several sums in closed or semi-closed form.
Let us begin with the following case, where we use the
Euler transformation:
$\sum_{k=0}^{n} \binom{n}{k} \lfloor\sqrt{k}\rfloor (-1)^k = (-1)^n \sum_{k=0}^{n} \binom{n}{k} (-1)^{n-k} \lfloor\sqrt{k}\rfloor =$
$= (-1)^n [t^n] \frac{1}{1+t} \left[\left.\frac{1}{1-y} \sum_{k=1}^{\infty} y^{k^2} \,\right|\, y = \frac{t}{1+t}\right] = (-1)^n [t^n] \sum_{k=1}^{\infty} \frac{t^{k^2}}{(1+t)^{k^2}} =$
$= (-1)^n \sum_{k=1}^{\infty} [t^{n-k^2}] \frac{1}{(1+t)^{k^2}} = (-1)^n \sum_{k=1}^{\infty} \binom{-k^2}{n-k^2} = (-1)^n \sum_{k=1}^{\infty} \binom{n-1}{n-k^2} (-1)^{n-k^2} =$
$= \sum_{k=1}^{\infty} \binom{n-1}{n-k^2} (-1)^{k^2}.$
We can think of the last sum as a semi-closed
form, because the number of terms is dramatically
reduced from $n$ to $\sqrt{n}$. In the same way we find:
$\sum_{k=0}^{n} \binom{n}{k} \lfloor\log_2 k\rfloor (-1)^k = \sum_{k=1}^{\lfloor\log_2 n\rfloor} \binom{n-1}{n-2^k}.$
A truly closed formula is found for the following
sum:
$\sum_{k=1}^{n} \lfloor\sqrt{k}\rfloor = [t^n] \frac{1}{1-t} \cdot \frac{1}{1-t} \sum_{k=1}^{\infty} t^{k^2} = \sum_{k=1}^{\infty} [t^{n-k^2}] \frac{1}{(1-t)^2} =$
$= \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} \binom{-2}{n-k^2} (-1)^{n-k^2} = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} \binom{n-k^2+1}{n-k^2} = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} (n - k^2 + 1) = (n+1)\lfloor\sqrt{n}\rfloor - \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} k^2.$
The final value of the sum is therefore:
$\sum_{k=1}^{n} \lfloor\sqrt{k}\rfloor = (n+1)\lfloor\sqrt{n}\rfloor - \frac{\lfloor\sqrt{n}\rfloor^3}{3} - \frac{\lfloor\sqrt{n}\rfloor^2}{2} - \frac{\lfloor\sqrt{n}\rfloor}{6}.$
In the same way we can prove that:
$\sum_{k=1}^{n} \lfloor\log_2 k\rfloor = (n+1)\lfloor\log_2 n\rfloor - 2^{\lfloor\log_2 n\rfloor + 1} + 2.$
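Both closed forms are easy to confirm by brute force, using $\sum_{k=1}^{m} k^2 = m(m+1)(2m+1)/6 = m^3/3 + m^2/2 + m/6$ (an illustrative check):

```python
from math import isqrt

for n in range(1, 300):
    m = isqrt(n)                                 # floor(sqrt(n))
    assert sum(isqrt(k) for k in range(1, n + 1)) \
        == (n + 1) * m - m * (m + 1) * (2 * m + 1) // 6
    l = n.bit_length() - 1                       # floor(log2(n))
    assert sum(k.bit_length() - 1 for k in range(1, n + 1)) \
        == (n + 1) * l - 2**(l + 1) + 2
```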
A somewhat more difficult sum is the following one:
$\sum_{k=0}^{n} \binom{n}{k} \lfloor\sqrt{k}\rfloor = [t^n] \frac{1}{1-t} \left[\left.\frac{1}{1-y} \sum_{k=1}^{\infty} y^{k^2} \,\right|\, y = \frac{t}{1-t}\right] =$
$= [t^n] \sum_{k=1}^{\infty} \frac{t^{k^2}}{(1-t)^{k^2} (1-2t)} = \sum_{k=1}^{\infty} [t^{n-k^2}] \frac{1}{(1-t)^{k^2} (1-2t)}.$
We can now obtain a semi-closed form for this sum by
expanding the generating function into partial fractions:
$\frac{1}{(1-t)^{k^2} (1-2t)} = \frac{A}{1-2t} + \frac{B}{(1-t)^{k^2}} + \frac{C}{(1-t)^{k^2-1}} + \frac{D}{(1-t)^{k^2-2}} + \cdots + \frac{X}{1-t}.$
We can show that $A = 2^{k^2}$, $B = -1$, $C = -2$, $D = -4, \ldots, X = -2^{k^2-1}$; in fact, by substituting these
values in the previous expression we get:
$\frac{2^{k^2}}{1-2t} - \frac{1}{(1-t)^{k^2}} - \frac{2(1-t)}{(1-t)^{k^2}} - \frac{4(1-t)^2}{(1-t)^{k^2}} - \cdots - \frac{2^{k^2-1}(1-t)^{k^2-1}}{(1-t)^{k^2}} =$
$= \frac{2^{k^2}}{1-2t} - \frac{1}{(1-t)^{k^2}} \cdot \frac{2^{k^2}(1-t)^{k^2} - 1}{2(1-t) - 1} =$
$= \frac{2^{k^2}}{1-2t} - \frac{2^{k^2}(1-t)^{k^2} - 1}{(1-t)^{k^2}(1-2t)} = \frac{1}{(1-t)^{k^2}(1-2t)}.$
Therefore, we conclude:
$\sum_{k=0}^{n} \binom{n}{k} \lfloor\sqrt{k}\rfloor = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} 2^{k^2} 2^{n-k^2} - \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} \left[\binom{n-1}{n-k^2} + 2\binom{n-2}{n-k^2} + \cdots + 2^{k^2-1}\binom{n-k^2}{n-k^2}\right] =$
$= \lfloor\sqrt{n}\rfloor\, 2^n - \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} \left[\binom{n-1}{n-k^2} + 2\binom{n-2}{n-k^2} + \cdots + 2^{k^2-1}\binom{n-k^2}{n-k^2}\right].$
We observe that for very large values of $n$ the first
term dominates all the others, and therefore the
asymptotic value of the sum is $\lfloor\sqrt{n}\rfloor\, 2^n$.
4.8 Linear recurrences with constant coefficients
If $(f_k)_{k \in \mathbb{N}}$ is a sequence, it can be defined by means
of a recurrence relation, i.e., a relation relating the
generic element $f_n$ to other elements $f_k$ having $k < n$.
Usually, the first elements of the sequence must be
given explicitly, in order to allow the computation
of the successive values; they constitute the initial
conditions, and the sequence is well-defined if and only
if every element can be computed by starting with
the initial conditions and going on with the other
elements by means of the recurrence relation. For
example, the constant sequence $(1, 1, 1, \ldots)$ can be
defined by the recurrence relation $x_n = x_{n-1}$ and
the initial condition $x_0 = 1$. By changing the initial
conditions, the sequence can radically change; if we
consider the same relation $x_n = x_{n-1}$ together with
the initial condition $x_0 = 2$, we obtain the constant
sequence $2, 2, 2, \ldots$.
In general, a recurrence relation can be written
$f_n = F(f_{n-1}, f_{n-2}, \ldots)$; when $F$ depends on all
the values $f_{n-1}, f_{n-2}, \ldots, f_1, f_0$, the relation is
called a full history recurrence. If $F$ only depends on
a fixed number of elements $f_{n-1}, f_{n-2}, \ldots, f_{n-p}$, then
the relation is called a partial history recurrence, and $p$
is called the order of the relation. Besides, if $F$ is linear,
we have a linear recurrence. Linear recurrences
are surely the most common and important type of
recurrence relations; if all the coefficients appearing
in $F$ are constant, we have a linear recurrence with
constant coefficients, and if the coefficients are polynomials
in $n$, we have a linear recurrence with polynomial
coefficients. As we are now going to see, the
method of generating functions allows us to find the
solution of any linear recurrence with constant coefficients,
in the sense that we find a function $f(t)$ such
that $[t^n] f(t) = f_n$, $\forall n \in \mathbb{N}$. For linear recurrences
with polynomial coefficients, the same method allows
us to find a solution on many occasions, but success
is not assured. On the other hand, no method
is known that solves all the recurrences of this kind,
and surely generating functions are the method giving
the highest number of positive results. We will
discuss this case in the next section.
The Fibonacci recurrence $F_n = F_{n-1} + F_{n-2}$ is an
example of a recurrence relation with constant coefficients.
When we have a recurrence of this kind,
we begin by expressing it in such a way that the relation
is valid for every $n \in \mathbb{N}$. In the example of
Fibonacci numbers, this is not the case, because for
$n = 0$ we have $F_0 = F_{-1} + F_{-2}$, and we do not know
the values of the two elements in the r.h.s., which
have no combinatorial meaning. However, if we write
the recurrence as $F_{n+2} = F_{n+1} + F_n$, we have fulfilled
the requirement. This first step has a great importance,
because it allows us to apply the operator $\mathcal{G}$
to both members of the relation; this was not possible
beforehand because of the principle of identity
for generating functions.
The recurrence being linear with constant coefficients,
we can apply the axiom of linearity to the
recurrence:
$f_{n+p} = \alpha_1 f_{n+p-1} + \alpha_2 f_{n+p-2} + \cdots + \alpha_p f_n$
and obtain the relation:
$\mathcal{G}(f_{n+p}) = \alpha_1\,\mathcal{G}(f_{n+p-1}) + \alpha_2\,\mathcal{G}(f_{n+p-2}) + \cdots + \alpha_p\,\mathcal{G}(f_n).$
By Theorem 4.2.2 we can now express every
$\mathcal{G}(f_{n+p-j})$ in terms of $f(t) = \mathcal{G}(f_n)$ and obtain a linear
relation in $f(t)$, from which an explicit expression
for $f(t)$ is immediately obtained. This is the solution
of the recurrence relation. We observe explicitly that
in writing the expressions for $\mathcal{G}(f_{n+p-j})$ we make use
of the initial conditions for the sequence.
Let us go on with the example of the Fibonacci
sequence $(F_k)_{k \in \mathbb{N}}$. We have:
$\mathcal{G}(F_{n+2}) = \mathcal{G}(F_{n+1}) + \mathcal{G}(F_n)$
and by setting $F(t) = \mathcal{G}(F_n)$ we find:
$\frac{F(t) - F_0 - F_1 t}{t^2} = \frac{F(t) - F_0}{t} + F(t).$
Because we know that $F_0 = 0$, $F_1 = 1$, we have:
$F(t) - t = t F(t) + t^2 F(t)$
and by solving in $F(t)$ we have the explicit expression:
$F(t) = \frac{t}{1 - t - t^2}.$
This is the generating function for the Fibonacci
numbers. We can now find an explicit expression for
$F_n$ in the following way. The denominator of $F(t)$
can be written $1 - t - t^2 = (1 - \phi t)(1 - \hat{\phi} t)$, where:
$\phi = \frac{1 + \sqrt{5}}{2} \approx 1.618033989 \qquad\qquad \hat{\phi} = \frac{1 - \sqrt{5}}{2} \approx -0.618033989.$
The constant $1/\phi \approx 0.618033989$ is known as the
golden ratio. By applying the method of partial fraction
expansion we find:
F(t) =
t
(1 t)(1
t)
=
A
1 t
+
B
1
t
=
=
AA
t +B Bt
1 t t
2
.
We determine the two constants A and B by equat-
ing the coecients in the rst and last expression for
F(t):
A+B = 0
A
B = 1
A = 1/(
) = 1/
5
B = A = 1/
5
The value of F_n is now obtained by extracting the coefficient of t^n:

  F_n = [t^n] F(t) = [t^n] (1/√5)(1/(1 - φt) - 1/(1 - φ̂t)) = (1/√5)([t^n] 1/(1 - φt) - [t^n] 1/(1 - φ̂t)) = (φ^n - φ̂^n)/√5.

This formula allows us to compute F_n in a time independent of n, because φ^n = exp(n ln φ), and shows that F_n grows exponentially. In fact, since |φ̂| < 1, the quantity φ̂^n approaches 0 very rapidly and we have F_n = O(φ^n). In reality, F_n is an integer, and therefore we can compute it by finding the integer number closest to φ^n/√5; consequently:

  F_n = round(φ^n/√5).
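As a numerical check, the rounded closed form and the recurrence can be compared directly; the following is a minimal sketch in Python (floating point limits it to moderate n):

```python
import math

def fib_iter(n):
    """F_n by unfolding the recurrence F_{n+2} = F_{n+1} + F_n."""
    a, b = 0, 1  # F_0, F_1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_round(n):
    """F_n as the integer closest to phi^n / sqrt(5)."""
    phi = (1 + math.sqrt(5)) / 2
    return round(phi ** n / math.sqrt(5))

# the two computations agree while double precision lasts
for n in range(60):
    assert fib_round(n) == fib_iter(n)
```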
4.9 Linear recurrences with polynomial coefficients

When a recurrence relation has polynomial coefficients, the method of generating functions does not assure a solution, but no other method is available to solve those recurrences which cannot be solved by a generating function approach. Usually, the rule of differentiation introduces a derivative in the relation for the generating function, and a differential equation has to be solved. This is the actual problem of this approach, because the main difficulty just consists in dealing with the differential equation. We have already seen some examples when we studied the method of shifting, but here we wish to present a case arising from an actual combinatorial problem, and in the next section we will see a very important example taken from the analysis of algorithms.

When we studied permutations, we introduced the concept of an involution, i.e., a permutation π ∈ P_n such that π^2 = (1), and for the number I_n of involutions in P_n we found the recurrence relation:

  I_n = I_{n-1} + (n-1) I_{n-2}

which has polynomial coefficients. The number of involutions grows very fast and it can be a good idea to consider the quantity i_n = I_n/n!. Therefore, let us begin by changing the recurrence in such a way that the principle of identity can be applied, and then divide everything by (n+2)!:
  I_{n+2} = I_{n+1} + (n+1) I_n

  I_{n+2}/(n+2)! = (1/(n+2)) I_{n+1}/(n+1)! + (1/(n+2)) I_n/n!.

The recurrence relation for i_n is:

  (n+2) i_{n+2} = i_{n+1} + i_n

and we can pass to generating functions. G((n+2) i_{n+2}) can be seen as the shifting of G((n+1) i_{n+1}) = i'(t), and therefore:

  (i'(t) - 1)/t = (i(t) - 1)/t + i(t)

because of the initial conditions i_0 = i_1 = 1, and so:

  i'(t) = (1 + t) i(t).

This is a simple differential equation with separable variables, and by solving it we find:

  ln i(t) = t + t^2/2 + C  or  i(t) = exp(t + t^2/2 + C);

the initial condition i(0) = i_0 = 1 gives C = 0, and therefore i(t) = exp(t + t^2/2).
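The solution can be verified numerically: the coefficients of exp(t + t^2/2) satisfy (k+1) i_{k+1} = i_k + i_{k-1}, and multiplying them by n! must give back the involution numbers I_n. A sketch in Python:

```python
from fractions import Fraction
from math import factorial

N = 15

# involution numbers from the recurrence I_n = I_{n-1} + (n-1) I_{n-2}
I = [1, 1]
for n in range(2, N):
    I.append(I[n-1] + (n-1) * I[n-2])

# coefficients of i(t) = exp(t + t^2/2), read off from i'(t) = (1+t) i(t),
# i.e. (k+1) i_{k+1} = i_k + i_{k-1}
i = [Fraction(1), Fraction(1)]
for k in range(1, N - 1):
    i.append((i[k] + i[k-1]) / (k + 1))

for n in range(N):
    assert i[n] * factorial(n) == I[n]   # I_n = n! [t^n] exp(t + t^2/2)
```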
4.10 The summing factor method

For linear recurrences of the first order a method exists which allows us to obtain an explicit expression for the generic element of the defined sequence. Usually, this expression is in the form of a sum, and a possible closed form can only be found by manipulating this sum; therefore, the method does not guarantee a closed form. Let us suppose we have a recurrence relation:

  a_{n+1} f_{n+1} = b_n f_n + c_n

where a_n, b_n, c_n are any expressions, possibly depending on n. As we remarked in the Introduction, if a_{n+1} = b_n = 1, by unfolding the recurrence we can find an explicit expression for f_{n+1} or f_n:

  f_{n+1} = f_n + c_n = f_{n-1} + c_{n-1} + c_n = ... = f_0 + Σ_{k=0}^{n} c_k

where f_0 is the initial condition relative to the sequence under consideration. Fortunately, we can always change the original recurrence into a relation of this more simple form. In fact, if we multiply everything by the so-called summing factor:

  (a_n a_{n-1} ... a_0)/(b_n b_{n-1} ... b_0)
provided none of a_n, a_{n-1}, ..., a_0, b_n, b_{n-1}, ..., b_0 is zero, we obtain:

  ((a_{n+1} a_n ... a_0)/(b_n b_{n-1} ... b_0)) f_{n+1} = ((a_n a_{n-1} ... a_0)/(b_{n-1} b_{n-2} ... b_0)) f_n + ((a_n a_{n-1} ... a_0)/(b_n b_{n-1} ... b_0)) c_n.
We can now define:

  g_{n+1} = a_{n+1} a_n ... a_0 f_{n+1} / (b_n b_{n-1} ... b_0),

and the relation becomes:

  g_{n+1} = g_n + ((a_n a_{n-1} ... a_0)/(b_n b_{n-1} ... b_0)) c_n    g_0 = a_0 f_0.
Finally, by unfolding this recurrence, the result is:

  f_{n+1} = ((b_n b_{n-1} ... b_0)/(a_{n+1} a_n ... a_0)) (a_0 f_0 + Σ_{k=0}^{n} ((a_k a_{k-1} ... a_0)/(b_k b_{k-1} ... b_0)) c_k).

As a technical remark, we observe that sometimes a_0 and/or b_0 can be 0; in that case, we can unfold the recurrence down to 1, and accordingly change the last index 0.
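The whole procedure can be turned into a small routine; the sketch below (Python, exact rational arithmetic, assuming a_0 ≠ 0 and no b_k vanishes) evaluates the explicit sum and checks it against a direct unfolding of the recurrence on an arbitrary test case:

```python
from fractions import Fraction
from math import prod

def summing_factor(a, b, c, f0, N):
    """f_0 .. f_N for a_{n+1} f_{n+1} = b_n f_n + c_n via the explicit sum."""
    f = [Fraction(f0)]
    for n in range(N):
        num = a(0) * f[0] + sum(
            prod(a(j) for j in range(k + 1)) /
            prod(b(j) for j in range(k + 1)) * c(k)
            for k in range(n + 1))
        f.append(prod(b(j) for j in range(n + 1)) /
                 prod(a(j) for j in range(n + 2)) * num)
    return f

# arbitrary test recurrence: (n+1) f_{n+1} = (2n+1) f_n + 1, f_0 = 1
a = lambda n: Fraction(n + 1)
b = lambda n: Fraction(2 * n + 1)
c = lambda n: Fraction(1)
f = summing_factor(a, b, c, 1, 8)

g = [Fraction(1)]                 # direct unfolding, term by term
for n in range(8):
    g.append((b(n) * g[n] + c(n)) / a(n + 1))
assert f == g
```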
In order to show a non-trivial example, let us discuss the problem of determining the coefficient of t^n in the f.p.s. corresponding to the function:

  f(t) = √(1-t) ln(1/(1-t)) = t - (1/24)t^3 - (1/24)t^4 - (71/1920)t^5 - (31/960)t^6 - ....
A method for finding a recurrence relation for the coefficients f_n of this f.p.s. is to derive a differential equation for f(t). By differentiating:

  f'(t) = -(1/(2√(1-t))) ln(1/(1-t)) + 1/√(1-t)

and therefore we have the differential equation:

  (1-t) f'(t) = -(1/2) f(t) + √(1-t).
By extracting the coefficient of t^n, we have the relation:

  (n+1) f_{n+1} - n f_n = -(1/2) f_n + (1/2 choose n)(-1)^n

which can be written as:

  (n+1) f_{n+1} = ((2n-1)/2) f_n - (1/(4^n (2n-1))) (2n choose n).

This is a recurrence relation of the first order with the initial condition f_0 = 0. Let us apply the summing factor method, for which we have a_n = n, b_n = (2n-1)/2. Since a_0 = 0, we have:
  (a_n a_{n-1} ... a_1)/(b_n b_{n-1} ... b_1) = (n(n-1) ... 2 · 1 · 2^n)/((2n-1)(2n-3) ... 1) = (n! 2^n · 2n(2n-2) ... 2)/(2n(2n-1)(2n-2) ... 1) = 4^n (n!)^2/(2n)!.
By multiplying the recurrence relation by this summing factor, we find:

  (n+1) (4^n (n!)^2/(2n)!) f_{n+1} = ((2n-1)/2) (4^n (n!)^2/(2n)!) f_n - 1/(2n-1).

We are fortunate and c_n simplifies dramatically; besides, we know that the two coefficients of f_{n+1} and f_n are equal, notwithstanding their appearance. Therefore we have:

  f_{n+1} = -(1/((n+1) 4^n)) (2n choose n) Σ_{k=0}^{n} 1/(2k-1)    (a_0 f_0 = 0).
We can somewhat simplify this expression by observing that:

  Σ_{k=0}^{n} 1/(2k-1) = 1/(2n-1) + 1/(2n-3) + ... + 1 - 1 =
  = (1/(2n) + 1/(2n-1) + 1/(2n-2) + ... + 1/2 + 1) - (1/(2n) + 1/(2n-2) + ... + 1/2) - 1 =
  = H_{2n+2} - 1/(2n+1) - 1/(2n+2) - (1/2) H_{n+1} + 1/(2n+2) - 1 =
  = H_{2n+2} - (1/2) H_{n+1} - 2(n+1)/(2n+1).
Furthermore, we have:

  (1/((n+1) 4^n)) (2n choose n) = (4/((n+1) 4^{n+1})) (2n+2 choose n+1) ((n+1)/(2(2n+1))) = (2/(2n+1)) (1/4^{n+1}) (2n+2 choose n+1).
Therefore:

  f_{n+1} = ((1/2) H_{n+1} - H_{2n+2} + 2(n+1)/(2n+1)) (2/(2n+1)) (1/4^{n+1}) (2n+2 choose n+1).

This expression allows us to obtain a formula for f_n:

  f_n = ((1/2) H_n - H_{2n} + 2n/(2n-1)) (2/(2n-1)) (1/4^n) (2n choose n) =
  = (H_n - 2 H_{2n} + 4n/(2n-1)) (1/(2n-1)) (1/4^n) (2n choose n).

The reader can numerically check this expression against the actual values of f_n given above. By using the asymptotic approximation H_n ∼ ln n + γ given in the Introduction, we find:

  H_n - 2 H_{2n} + 4n/(2n-1) ≈ ln n + γ - 2(ln 2 + ln n + γ) + 2 = -ln n - ln 4 - γ + 2.

Besides:

  (1/(2n-1)) (1/4^n) (2n choose n) ≈ 1/(2n √(πn))

and we conclude:

  f_n ≈ -(ln n + γ + ln 4 - 2) (1/(2n √(πn)))

which shows that |f_n| behaves as ln n/n^{3/2}.
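All the steps above can be cross-checked with exact arithmetic; a sketch in Python comparing the recurrence with the closed form:

```python
from fractions import Fraction
from math import comb

# coefficients of f(t) = sqrt(1-t) ln(1/(1-t)) from the recurrence
# (n+1) f_{n+1} = ((2n-1)/2) f_n - binom(2n,n)/(4^n (2n-1))
f = [Fraction(0)]
for n in range(12):
    f.append((Fraction(2*n - 1, 2) * f[n]
              - Fraction(comb(2*n, n), 4**n * (2*n - 1))) / (n + 1))

def H(n):  # harmonic numbers
    return sum(Fraction(1, k) for k in range(1, n + 1))

# closed form: f_n = (H_n - 2 H_{2n} + 4n/(2n-1)) binom(2n,n)/((2n-1) 4^n)
for n in range(1, 13):
    closed = (H(n) - 2*H(2*n) + Fraction(4*n, 2*n - 1)) \
             * Fraction(comb(2*n, n), (2*n - 1) * 4**n)
    assert closed == f[n]

assert f[3] == Fraction(-1, 24) and f[5] == Fraction(-71, 1920)
```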
4.11 The internal path length of binary trees

Binary trees are often used as a data structure to retrieve information. A set D of keys is given, taken from an ordered universe U. Therefore D is a permutation of the ordered sequence d_1 < d_2 < ... < d_n, and as the various elements arrive, they are inserted in a binary tree. As we know, there are n! possible permutations of the keys in D, but there are only (2n choose n)/(n+1) different binary trees with n nodes. If P_n denotes the total internal path length (i.p.l.) of all the binary trees built from the n! permutations, the contribution of the left subtrees having k nodes turns out to be:

  (n-1 choose k) (n-1-k)! (P_k + k! k) = (n-1)! (P_k/k! + k).

In a similar way we find the total contribution of the right subtrees:

  (n-1 choose k) k! (P_{n-1-k} + (n-1-k)! (n-1-k)) = (n-1)! (P_{n-1-k}/(n-1-k)! + (n-1-k)).
It only remains to count the contribution of the roots, which obviously amounts to n!, a single comparison for every tree. We therefore have the following recurrence relation, in which the contributions of the left and right subtrees turn out to be the same:

  P_n = n! + (n-1)! Σ_{k=0}^{n-1} (P_k/k! + k + P_{n-1-k}/(n-1-k)! + (n-1-k)) =
  = n! + 2(n-1)! Σ_{k=0}^{n-1} (P_k/k! + k) = n! + 2(n-1)! Σ_{k=0}^{n-1} P_k/k! + 2(n-1)! (n(n-1)/2).

We used the formula for the sum of the first n-1 integers, and now, by dividing by n!, we have:

  P_n/n! = 1 + (2/n) Σ_{k=0}^{n-1} P_k/k! + n - 1 = n + (2/n) Σ_{k=0}^{n-1} P_k/k!.
Let us now set Q_n = P_n/n!, so that Q_n is the average total i.p.l. relative to a single tree. If we succeed in finding Q_n, the average i.p.l. we are looking for will simply be Q_n/n. We can also reformulate the recurrence for n+1, in order to be able to apply the generating function operator:

  (n+1) Q_{n+1} = (n+1)^2 + 2 Σ_{k=0}^{n} Q_k

  Q'(t) = (1+t)/(1-t)^3 + 2 Q(t)/(1-t)

  Q'(t) - (2/(1-t)) Q(t) = (1+t)/(1-t)^3.
This differential equation can be easily solved:

  Q(t) = (1/(1-t)^2) (∫ (1-t)^2 (1+t)/(1-t)^3 dt + C) = (1/(1-t)^2) (2 ln(1/(1-t)) - t + C).

Since the i.p.l. of the empty tree is 0, we should have Q_0 = Q(0) = 0 and therefore, by setting t = 0, we find C = 0. The final result is:

  Q(t) = (2/(1-t)^2) ln(1/(1-t)) - t/(1-t)^2.
We can now use the formula for G(nH_n) (see the Section 4.4 on Common Generating Functions) to extract the coefficient of t^n:

  Q_n = [t^n] (2 (1/(1-t)^2)(ln(1/(1-t)) + 1) - (2+t)/(1-t)^2) =
  = 2 [t^{n+1}] (t/(1-t)^2)(ln(1/(1-t)) + 1) - [t^n] (2+t)/(1-t)^2 =
  = 2(n+1) H_{n+1} - 2 (-2 choose n)(-1)^n - (-2 choose n-1)(-1)^{n-1} =
  = 2(n+1) H_n + 2 - 2(n+1) - n = 2(n+1) H_n - 3n.

Thus we conclude with the average i.p.l.:

  P_n/(n! n) = Q_n/n = 2(1 + 1/n) H_n - 3.

This formula is asymptotic to 2 ln n + 2γ - 3, and shows that the average number of comparisons necessary to retrieve any key in a binary tree is in the order of O(ln n).
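The recurrence for Q_n and the closed form can be cross-checked exactly; a sketch in Python:

```python
from fractions import Fraction

N = 12
# (n+1) Q_{n+1} = (n+1)^2 + 2 (Q_0 + ... + Q_n), with Q_0 = 0
Q = [Fraction(0)]
s = Fraction(0)       # running sum Q_0 + ... + Q_n
for n in range(N):
    s += Q[n]
    Q.append(((n + 1)**2 + 2*s) / (n + 1))

def H(n):  # harmonic numbers
    return sum(Fraction(1, k) for k in range(1, n + 1))

for n in range(1, N + 1):
    assert Q[n] == 2*(n + 1)*H(n) - 3*n   # Q_n = 2(n+1) H_n - 3n
```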
4.12 Height balanced binary trees

We have been able to show that binary trees are a good retrieving structure, in the sense that if the elements, or keys, of a set a_1, a_2, ..., a_n are stored in random order in a binary (search) tree, then the expected average time for retrieving any key in the tree is in the order of ln n. However, this behavior of binary trees is not always assured; for example, if the keys are stored in the tree in their proper order, the resulting structure degenerates into a linear list and the average retrieving time becomes O(n).

To avoid this drawback, at the beginning of the 1960s, two Russian researchers, Adelson-Velski and Landis, found an algorithm to store keys in a height balanced binary tree, a tree for which the height of the left subtree of every node K differs by at most 1 from the height of the right subtree of the same node K. To understand this concept, let us define the height of a tree as the highest level at which a node in the tree is placed. The height is also the maximal number of comparisons necessary to find any key in the tree. Therefore, if we find a limitation for the height of a class of trees, this is also a limitation for the internal path length of the trees in the same class. Formally, a height balanced binary tree is a tree such that for every node K in it, if h'_K and h''_K are the heights of the two subtrees originating from K, then |h'_K - h''_K| ≤ 1.
The algorithm of Adelson-Velski and Landis is very important because, as we are now going to show, height balanced binary trees assure that the retrieving time for every key in the tree is O(ln n). Because of that, height balanced binary trees are also known as AVL trees, and the algorithm for building AVL-trees from a set of n keys can be found in any book on algorithms and data structures. Here we only wish to perform a worst case analysis to prove that the retrieval time in any AVL tree cannot be larger than O(ln n).

In order to perform our analysis, let us consider the worst possible AVL tree. Since, by definition, the height of the left subtree of any node cannot exceed the height of the corresponding right subtree plus 1, let us consider trees in which the height of the left subtree of every node exceeds exactly by 1 the height of the right subtree of the same node. In Figure 4.1 we have drawn the first cases. These trees are built in a very simple way: every tree T_n, of height n, is built by using the preceding tree T_{n-1} as the left subtree and the tree T_{n-2} as the right subtree of the root. Therefore, the number of nodes in T_n is just the sum of the nodes in T_{n-1} and in T_{n-2}, plus 1 (the root), and the condition on the heights of the subtrees of every node is satisfied. Because of this construction, T_n can be considered as the worst tree of height n, in the sense that every other AVL-tree of height n will have at least as many nodes as T_n. Since the height n is an upper bound for the number of comparisons necessary to retrieve any key in the tree, the average retrieving time for every such tree will be at most n.
If we denote by |T_n| the number of nodes in the tree T_n, we have the simple recurrence relation:

  |T_n| = |T_{n-1}| + |T_{n-2}| + 1.

This resembles the Fibonacci recurrence relation and, in fact, we can easily show that |T_n| = F_{n+1} - 1, as is intuitively apparent from the beginning of the sequence 0, 1, 2, 4, 7, 12, .... The proof is done by mathematical induction. For n = 0 we have |T_0| = F_1 - 1 = 1 - 1 = 0, and this is true; similarly we proceed for n = 1. Therefore, let us suppose that for every k < n we have |T_k| = F_{k+1} - 1; this holds for k = n-1 and k = n-2, and because of the recurrence relation for |T_n| we have:

  |T_n| = |T_{n-1}| + |T_{n-2}| + 1 = F_n - 1 + F_{n-1} - 1 + 1 = F_{n+1} - 1

since F_n + F_{n-1} = F_{n+1} by the Fibonacci recurrence.
[Figure 4.1: the worst-case AVL trees T_0, T_1, T_2, T_3, T_4, T_5.]
By the explicit formula for the Fibonacci numbers, F_{n+1} is the integer closest to φ^{n+1}/√5; therefore we have |T_n| ≥ φ^{n+1}/√5 - 1, or φ^{n+1} ≤ √5(|T_n| + 1). By passing to the logarithms, we have n ≤ log_φ(√5(|T_n| + 1)) - 1, and since all logarithms are proportional, n = O(ln |T_n|). As we observed, every AVL-tree of height n has a number of nodes not less than |T_n|, and this assures that the retrieving time for every AVL-tree with at most |T_n| nodes is bounded from above by log_φ(√5(|T_n| + 1)) - 1.
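A quick numerical confirmation of the worst-case analysis (a sketch in Python; the tree sizes are generated directly from the recurrence):

```python
import math

# |T_n| = |T_{n-1}| + |T_{n-2}| + 1, starting from the sequence 0, 1, 2, 4, 7, 12, ...
T = [0, 1]
for n in range(2, 40):
    T.append(T[n-1] + T[n-2] + 1)

# every |T_n| + 1 is a Fibonacci number, so the height n of the worst
# AVL tree is logarithmic in its number of nodes
fibs = set()
a, b = 0, 1
for _ in range(60):
    fibs.add(a)
    a, b = b, a + b
phi = (1 + math.sqrt(5)) / 2
for n in range(2, 40):
    assert T[n] + 1 in fibs
    assert n <= math.log(math.sqrt(5) * (T[n] + 1), phi)
```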
4.13 Some special recurrences

Not all recurrence relations are linear, and we had occasion to deal with a different sort of relation when we studied the Catalan numbers. They satisfy the recurrence C_n = Σ_{k=0}^{n-1} C_k C_{n-k-1}, which, however, in this particular form, is only valid for n > 0. In order to apply the method of generating functions, we write it for n+1:

  C_{n+1} = Σ_{k=0}^{n} C_k C_{n-k}.

The right hand member is a convolution, and therefore, by the initial condition C_0 = 1, we obtain:

  (C(t) - 1)/t = C(t)^2  or  t C(t)^2 - C(t) + 1 = 0.

This is a second degree equation, which can be directly solved; for t = 0 we should have C(0) = C_0 = 1, and therefore the solution with the + sign before the square root is to be ignored; we thus obtain:

  C(t) = (1 - √(1-4t))/(2t)

which was found in the section on the method of shifting in a completely different way.
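The two characterizations agree coefficient-wise; the following Python sketch expands (1 - √(1-4t))/(2t) via the generalized binomial series and compares it with the convolution recurrence:

```python
from fractions import Fraction
from math import comb

N = 16
# Catalan numbers from the convolution C_{n+1} = sum_k C_k C_{n-k}
C = [1]
for n in range(N):
    C.append(sum(C[k] * C[n - k] for k in range(n + 1)))

def binom_half(n):
    """binom(1/2, n) computed as a falling-factorial product."""
    r = Fraction(1)
    x = Fraction(1, 2)
    for j in range(n):
        r *= (x - j) / (j + 1)
    return r

# [t^n] sqrt(1-4t) = binom(1/2, n) (-4)^n, hence
# [t^n] (1 - sqrt(1-4t))/(2t) = -binom(1/2, n+1) (-4)^{n+1} / 2
for n in range(N):
    coeff = -binom_half(n + 1) * Fraction((-4)**(n + 1), 2)
    assert coeff == C[n] == comb(2*n, n) // (n + 1)
```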
The Bernoulli numbers were introduced by means of the implicit relation:

  Σ_{k=0}^{n} (n+1 choose k) B_k = δ_{n,0}.

We are now in a position to find out their exponential generating function, i.e., the function B(t) = G(B_n/n!), and prove some of their properties. The defining relation can be written as:

  Σ_{k=0}^{n} (n+1 choose n-k) B_{n-k} = Σ_{k=0}^{n} (n+1)n ... (k+2) B_{n-k}/(n-k)! = Σ_{k=0}^{n} ((n+1)!/(k+1)!) B_{n-k}/(n-k)! = δ_{n,0}.

If we divide everything by (n+1)!, we obtain:

  Σ_{k=0}^{n} (1/(k+1)!) B_{n-k}/(n-k)! = δ_{n,0}

and since this relation holds for every n ∈ N, we can pass to the generating functions. The left hand member is a convolution, whose first factor is the shift of the exponential function, and therefore we obtain:

  ((e^t - 1)/t) B(t) = 1  or  B(t) = t/(e^t - 1).
The classical way to see that B_{2n+1} = 0, n > 0, is to show that the function obtained from B(t) by deleting the term of first degree is an even function, and therefore has all its coefficients of odd order equal to zero. In fact we have:

  B(t) + t/2 = t/(e^t - 1) + t/2 = (t/2)(e^t + 1)/(e^t - 1).

In order to see that this function is even, we substitute t → -t and show that the function remains the same:

  (-t/2)(e^{-t} + 1)/(e^{-t} - 1) = (-t/2)(1 + e^t)/(1 - e^t) = (t/2)(e^t + 1)/(e^t - 1).
Chapter 5
Riordan Arrays
5.1 Definitions and basic concepts

A Riordan array is a couple of formal power series D = (d(t), h(t)); if both d(t), h(t) ∈ F_0, then the Riordan array is called proper. The Riordan array can be identified with the infinite, lower triangular array (or triangle) (d_{n,k})_{n,k∈N} defined by:

  d_{n,k} = [t^n] d(t)(t h(t))^k    (5.1.1)

In fact, we are mainly interested in the sequence of functions iteratively defined by:

  d_0(t) = d(t)    d_k(t) = d_{k-1}(t) t h(t) = d(t)(t h(t))^k.

These functions are the column generating functions of the triangle.

Another way of characterizing a Riordan array D = (d(t), h(t)) is to consider the bivariate generating function:

  d(t, z) = Σ_{k=0}^{∞} d(t)(t h(t))^k z^k = d(t)/(1 - t z h(t))    (5.1.2)
A common example of a Riordan array is the Pascal triangle, for which we have d(t) = h(t) = 1/(1-t). In fact we have:

  d_{n,k} = [t^n] (1/(1-t)) (t/(1-t))^k = [t^{n-k}] 1/(1-t)^{k+1} = (-k-1 choose n-k)(-1)^{n-k} = (n choose n-k) = (n choose k).

The fundamental result on Riordan arrays concerns the sums Σ_k d_{n,k} f_k:

Theorem 5.1.1 Let D = (d(t), h(t)) be a Riordan array and let f(t) be the generating function of the sequence (f_k)_{k∈N}; then we have:

  Σ_{k=0}^{n} d_{n,k} f_k = [t^n] d(t) f(t h(t))    (5.1.3)

Proof: The proof consists in a straight-forward computation:

  Σ_{k=0}^{n} d_{n,k} f_k = Σ_{k=0}^{∞} d_{n,k} f_k = Σ_{k=0}^{∞} [t^n] d(t)(t h(t))^k f_k = [t^n] d(t) Σ_{k=0}^{∞} f_k (t h(t))^k = [t^n] d(t) f(t h(t)).

In the case of the Pascal triangle we obtain the Euler transformation:

  Σ_{k=0}^{n} (n choose k) f_k = [t^n] (1/(1-t)) f(t/(1-t)).
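The Euler transformation is easy to verify with a minimal truncated-f.p.s. toolkit; the sketch below (Python; the sequence f_k = k^2 is an arbitrary test case) computes [t^n] (1/(1-t)) f(t/(1-t)) by series composition and compares it with the binomial sum:

```python
from fractions import Fraction
from math import comb

N = 12  # truncated power series of length N

def mul(p, q):
    """Product of two truncated series."""
    r = [Fraction(0)] * N
    for i, pi in enumerate(p):
        if pi:
            for j in range(N - i):
                r[i + j] += pi * q[j]
    return r

def compose(f, g):
    """f(g(t)) for g with zero constant term."""
    r = [Fraction(0)] * N
    power = [Fraction(1)] + [Fraction(0)] * (N - 1)  # g^0
    for k in range(N):
        r = [a + f[k] * b for a, b in zip(r, power)]
        power = mul(power, g)
    return r

inv1mt = [Fraction(1)] * N                         # 1/(1-t)
t_over = [Fraction(0)] + [Fraction(1)] * (N - 1)   # t/(1-t)

f = [Fraction(k * k) for k in range(N)]            # f_k = k^2
rhs = mul(inv1mt, compose(f, t_over))
for n in range(N):
    assert rhs[n] == sum(comb(n, k) * k * k for k in range(n + 1))
```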
By specializing f(t) in formula (5.1.3), we obtain the row sums, the alternating row sums and the weighted row sums of any Riordan array:

  row sums:  Σ_k d_{n,k} = [t^n] d(t)/(1 - t h(t))

  alternating row sums:  Σ_k (-1)^k d_{n,k} = [t^n] d(t)/(1 + t h(t))

  weighted row sums:  Σ_k k d_{n,k} = [t^n] t d(t) h(t)/(1 - t h(t))^2.

Moreover, by observing that D̂ = (d(t), t h(t)) is a Riordan array, whose rows are the diagonals of D,
we have:

  diagonal sums:  Σ_k d_{n-k,k} = [t^n] d(t)/(1 - t^2 h(t)).

Obviously, this observation can be generalized to find the generating function of any sum Σ_k d_{n-sk,k} for every s ≥ 1. We obtain well-known results for the Pascal triangle; for example, diagonal sums give:

  Σ_k (n-k choose k) = [t^n] (1/(1-t)) (1 - t^2 (1-t)^{-1})^{-1} = [t^n] 1/(1 - t - t^2) = F_{n+1}

connecting binomial coefficients and Fibonacci numbers.
Another general result can be obtained by means of two sequences (f_k)_{k∈N} and (g_k)_{k∈N} and their generating functions f(t), g(t). For p = 1, 2, ..., the generic element of the Riordan array (f(t), t^{p-1}) is:

  d_{n,k} = [t^n] f(t)(t^p)^k = [t^{n-pk}] f(t) = f_{n-pk}.

Therefore, by formula (5.1.3), we have:

  Σ_{k=0}^{⌊n/p⌋} f_{n-pk} g_k = [t^n] f(t) [g(y) | y = t^p] = [t^n] f(t) g(t^p).

This can be called the rule of generalized convolution, since it reduces to the usual convolution rule for p = 1. Suppose, for example, that we wish to sum one out of every three powers of 2, starting with 2^n and going down to the lowest integer exponent ≥ 0; we have:

  S_n = Σ_{k=0}^{⌊n/3⌋} 2^{n-3k} = [t^n] (1/(1-2t)) (1/(1-t^3)).

As we will learn studying asymptotics, an approximate value for this sum can be obtained by extracting the coefficient of the first factor and then multiplying it by the second factor, in which t is substituted by 1/2. This gives S_n ≈ 2^{n+3}/7, and in fact we have the exact value S_n = ⌊2^{n+3}/7⌋.
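The exact value is immediate to confirm (Python sketch):

```python
def S(n):
    # one out of every three powers of 2, from 2^n down to exponent >= 0
    return sum(2 ** (n - 3*k) for k in range(n // 3 + 1))

for n in range(60):
    assert S(n) == 2 ** (n + 3) // 7
```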
In a sense, the theorem on the sums involving the Riordan arrays is a characterization for them; in fact, we can prove a sort of inverse property:

Theorem 5.1.2 Let (d_{n,k})_{n,k∈N} be an infinite triangle such that for every sequence (f_k)_{k∈N} we have Σ_k d_{n,k} f_k = [t^n] d(t) f(t h(t)), where f(t) is the generating function of the sequence and d(t), h(t) are two f.p.s. not depending on f(t). Then the triangle defined by the Riordan array (d(t), h(t)) coincides with (d_{n,k})_{n,k∈N}.

Proof: For every k ∈ N take the sequence which is 0 everywhere except in the kth element f_k = 1. The corresponding generating function is f(t) = t^k and we have Σ_{i=0}^{∞} d_{n,i} f_i = d_{n,k}. Hence, according to the theorem's hypotheses, the generating function of column k is d_k(t) = d(t)(t h(t))^k, and this corresponds to the initial definition of the column generating functions for a Riordan array, for every k = 0, 1, 2, ....
5.2 The algebraic structure of Riordan arrays

The most important algebraic property of Riordan arrays is the fact that the usual row-by-column product of two Riordan arrays is a Riordan array. This is proved by considering two Riordan arrays (d(t), h(t)) and (a(t), b(t)) and performing the product, whose generic element is Σ_j d_{n,j} f_{j,k}, if d_{n,j} is the generic element in (d(t), h(t)) and f_{j,k} is the generic element in (a(t), b(t)). In fact we have:

  Σ_{j=0}^{∞} d_{n,j} f_{j,k} = Σ_{j=0}^{∞} [t^n] d(t)(t h(t))^j [y^j] a(y)(y b(y))^k =
  = [t^n] d(t) Σ_{j=0}^{∞} (t h(t))^j [y^j] a(y)(y b(y))^k = [t^n] d(t) a(t h(t)) (t h(t) b(t h(t)))^k.

By definition, the last expression denotes the generic element of the Riordan array (f(t), g(t)) where f(t) = d(t) a(t h(t)) and g(t) = h(t) b(t h(t)). Therefore we have:

  (d(t), h(t)) · (a(t), b(t)) = (d(t) a(t h(t)), h(t) b(t h(t))).    (5.2.1)

This expression is particularly important and is the basis for many developments of the Riordan array theory.
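Formula (5.2.1) can be checked numerically on the Pascal triangle: the product of the Riordan array (1/(1-t), 1/(1-t)) with itself must be (1/(1-2t), 1/(1-2t)), whose generic element is (n choose k) 2^{n-k}. A sketch in Python:

```python
from math import comb

N = 10
# Pascal triangle as a lower triangular matrix: d_{n,k} = binom(n, k)
P = [[comb(n, k) if k <= n else 0 for k in range(N)] for n in range(N)]

# row-by-column product of (1/(1-t), 1/(1-t)) with itself
PP = [[sum(P[n][j] * P[j][k] for j in range(N)) for k in range(N)]
      for n in range(N)]

# formula (5.2.1) predicts P^2 = (1/(1-2t), 1/(1-2t)),
# whose generic element is binom(n, k) 2^{n-k}
for n in range(N):
    for k in range(n + 1):
        assert PP[n][k] == comb(n, k) * 2 ** (n - k)
```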
The product is obviously associative, and we observe that the Riordan array (1, 1) acts as the neutral element or identity. In fact, the array (1, 1) is everywhere 0 except for the elements on the main diagonal, which are 1. Observe that this array is proper.

Let us now suppose that (d(t), h(t)) is a proper Riordan array. By formula (5.2.1), we immediately see that the product of two proper Riordan arrays is proper; therefore, we can look for a proper Riordan array (a(t), b(t)) such that (d(t), h(t)) · (a(t), b(t)) = (1, 1). If this is the case, we should have:

  d(t) a(t h(t)) = 1  and  h(t) b(t h(t)) = 1.
By setting y = t h(t) we have:

  a(y) = [d(t)^{-1} | t = y h(t)^{-1}]    b(y) = [h(t)^{-1} | t = y h(t)^{-1}].

Here we are in the hypotheses of the Lagrange Inversion Formula, and therefore there is a unique function t = t(y) such that t(0) = 0 and t = y h(t)^{-1}. Besides, d(t), h(t) being in F_0, the two f.p.s. a(y) and b(y) are uniquely defined. We have therefore proved:

Theorem 5.2.1 The set A of proper Riordan arrays is a group with the operation of row-by-column product defined functionally by relation (5.2.1).
It is a simple matter to show that some important classes of Riordan arrays are subgroups of A:

- the set of the Riordan arrays (f(t), 1) is an invariant subgroup of A; it is called the Appell subgroup;

- the set of the Riordan arrays (1, g(t)) is a subgroup of A and is called the subgroup of associated operators or the Lagrange subgroup;

- the set of Riordan arrays (f(t), f(t)) is a subgroup of A and is called the Bell subgroup. Its elements are also known as renewal arrays.

The first two subgroups have already been considered in the Chapter on Formal Power Series and show the connection between f.p.s. and Riordan arrays. The notations used in that Chapter are thus explained as particular cases of the most general case of (proper) Riordan arrays.
Let us now return to the formulas for a Riordan array inverse. If h(t) is any fixed invertible f.p.s., let us define:

  d^h(t) = [d(y)^{-1} | y = t h(y)^{-1}]

and consider the bivariate generating function of the inverse array:

  [d^h(t)/(1 - z y) | y = t h^h(t)].

By the formulas above, we have:

  y = t h^h(t) = t h(t h^h(t))^{-1} = t h(y)^{-1}

which is the same as t = y h(y). Therefore we find d^h(t) = d^h(y h(y)) = d(y)^{-1}, and consequently:
  d_{n,k} = [z^k][t^n] [d(y)^{-1}/(1 - z y) | y = t h(y)^{-1}] =
  = [z^k] (1/n) [y^{n-1}] (d/dy)(d(y)^{-1}(1 - z y)^{-1}) (1/h(y)^n) =
  = [z^k] (1/n) [y^{n-1}] (z/(d(y)(1 - z y)^2) - d'(y)/(d(y)^2 (1 - z y))) (1/h(y)^n) =
  = [z^k] (1/n) [y^{n-1}] (Σ_{r=0}^{∞} z^{r+1} y^r (r+1) - (d'(y)/d(y)) Σ_{r=0}^{∞} z^r y^r) (1/(d(y) h(y)^n)) =
  = (1/n) [y^{n-1}] (k y^{k-1} - y^k d'(y)/d(y)) (1/(d(y) h(y)^n)) =
  = (1/n) [y^{n-k}] (k - y d'(y)/d(y)) (1/(d(y) h(y)^n)).

This is the formula we were looking for.
5.3 The A-sequence for proper Riordan arrays

Proper Riordan arrays play a very important role in our approach. Let us consider a Riordan array D = (d(t), h(t)) which is not proper, but d(t) ∈ F_0. Since h(0) = 0, an s > 0 exists such that h(t) = h_s t^s + h_{s+1} t^{s+1} + ... and h_s ≠ 0. If we define ĥ(t) = h_s + h_{s+1} t + ..., then ĥ(t) ∈ F_0. Consequently, the Riordan array D̂ = (d(t), ĥ(t)) is proper, and the elements of D are recovered from it by a simple shift of indices: d_{n,k} = d̂_{n-sk,k}, so that row n of D is made up of the elements (d̂_{n-sk,k})_{k∈N} of D̂. Fortunately, for proper Riordan arrays, Rogers
has found an important characterization: every element d_{n+1,k+1}, n, k ∈ N, can be expressed as a linear combination of the elements in the preceding row, i.e.:

  d_{n+1,k+1} = a_0 d_{n,k} + a_1 d_{n,k+1} + a_2 d_{n,k+2} + ... = Σ_{j=0}^{∞} a_j d_{n,k+j}.    (5.3.1)
The sum is actually finite and the sequence A = (a_k)_{k∈N} is fixed. More precisely, we can prove the following theorem:

Theorem 5.3.1 An infinite lower triangular array D = (d_{n,k})_{n,k∈N} is a Riordan array if and only if a sequence A = (a_0 ≠ 0, a_1, a_2, ...) exists such that for every n, k ∈ N relation (5.3.1) holds.
Proof: Let us suppose that D is the Riordan array (d(t), h(t)) and let us consider the Riordan array (d(t)h(t), h(t)); we define the Riordan array (A(t), B(t)) by the relation:

  (A(t), B(t)) = (d(t), h(t))^{-1} · (d(t)h(t), h(t))

or:

  (d(t), h(t)) · (A(t), B(t)) = (d(t)h(t), h(t)).

By performing the product we find:

  d(t) A(t h(t)) = d(t) h(t)  and  h(t) B(t h(t)) = h(t).

The latter identity gives B(t h(t)) = 1 and this implies B(t) = 1. Therefore we have (d(t), h(t)) · (A(t), 1) = (d(t)h(t), h(t)). The element f_{n,k} of the left hand member is Σ_{j=0}^{∞} d_{n,j} a_{j-k} = Σ_{j=0}^{∞} d_{n,k+j} a_j, if as usual we interpret a_{j-k} as 0 when j < k. The same element in the right hand member is:

  [t^n] d(t) h(t) (t h(t))^k = [t^{n+1}] d(t)(t h(t))^{k+1} = d_{n+1,k+1}.

By equating these two quantities, we have the identity (5.3.1). For the converse, let us observe that (5.3.1) uniquely defines the array D when the elements d_{0,0}, d_{1,0}, d_{2,0}, ... of column 0 are given. Let d(t) be the generating function of this column, A(t) the generating function of the sequence A, and define h(t) as the solution of the functional equation h(t) = A(t h(t)), which is uniquely determined because of our hypothesis a_0 ≠ 0. We can therefore consider the proper Riordan array D̂ = (d(t), h(t)); by the first part of the theorem, D̂ satisfies relation (5.3.1) for every n, k ∈ N and therefore, by our previous observation, it must coincide with D. This completes the proof.
The sequence A = (a_k)_{k∈N} is called the A-sequence of the Riordan array D = (d(t), h(t)), and it only depends on h(t). In fact, as we have shown during the proof of the theorem, we have:

  h(t) = A(t h(t))    (5.3.2)

and this uniquely determines A when h(t) is given and, vice versa, h(t) is uniquely determined when A is given.

The A-sequence for the Pascal triangle is the solution A(y) of the functional equation 1/(1-t) = A(t/(1-t)). The simple substitution y = t/(1-t) gives A(y) = 1 + y, corresponding to the well-known basic recurrence of the Pascal triangle:

  (n+1 choose k+1) = (n choose k) + (n choose k+1).

A second sequence characterizes column 0 of a proper Riordan array: the elements d_{n+1,0} can be expressed as a linear combination of all the elements in the previous row by means of a sequence Z = (z_k)_{k∈N}:

  d_{n+1,0} = z_0 d_{n,0} + z_1 d_{n,1} + z_2 d_{n,2} + ... = Σ_{j=0}^{∞} z_j d_{n,j}.    (5.3.3)

Theorem 5.3.2 Let D = (d(t), h(t)) be a proper Riordan array with d_{0,0} ≠ 0; then a unique sequence Z = (z_k)_{k∈N} exists such that relation (5.3.3) holds for every n ∈ N.
Proof: Let z_0 = d_{1,0}/d_{0,0}. Now we can uniquely determine the value of z_1 by expressing d_{2,0} in terms of the elements in row 1, i.e.:

  d_{2,0} = z_0 d_{1,0} + z_1 d_{1,1}  or  z_1 = (d_{0,0} d_{2,0} - d_{1,0}^2)/(d_{0,0} d_{1,1}).

In the same way, we determine z_2 by expressing d_{3,0} in terms of the elements in row 2, and by substituting the values just obtained for z_0 and z_1. By proceeding in the same way, we determine the sequence Z in a unique way.
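As an illustration of how an A-sequence together with column 0 pins down a proper Riordan array, the sketch below rebuilds the Bell array (C(t), C(t)) of the Catalan numbers, whose A-sequence is (1, 1, 1, ...), purely from relation (5.3.1), and checks that its row sums give the next Catalan number:

```python
N = 12
# Catalan numbers for column 0 of the Bell array (C(t), C(t))
cat = [1]
for n in range(N):
    cat.append(sum(cat[k] * cat[n - k] for k in range(n + 1)))

# rebuild the triangle from Rogers' rule with A-sequence (1, 1, 1, ...):
# d_{n+1,k+1} = d_{n,k} + d_{n,k+1} + d_{n,k+2} + ...
d = [[0] * (N + 1) for _ in range(N)]
for n in range(N):
    d[n][0] = cat[n]
    for k in range(1, n + 1):
        d[n][k] = sum(d[n - 1][k - 1:])

# row sums of this ballot triangle give the next Catalan number
for n in range(N):
    assert sum(d[n]) == cat[n + 1]
```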
The sequence Z is called the Z-sequence for the (Riordan) array; it characterizes column 0, except for the element d_{0,0}. Therefore, we can say that the triple (d_{0,0}, A(t), Z(t)) completely characterizes a proper Riordan array. To see how the Z-sequence is obtained by starting with the usual definition of a Riordan array, let us prove the following:

Theorem 5.3.3 Let (d(t), h(t)) be a proper Riordan array and let Z(t) be the generating function of the array's Z-sequence. We have:

  d(t) = d_{0,0}/(1 - t Z(t h(t)))
Proof: By the preceding theorem, the Z-sequence exists and is unique. Therefore, equation (5.3.3) is valid for every n ∈ N, and we can pass to the generating functions. Since d(t)(t h(t))^k is the generating function for column k, we have:

  (d(t) - d_{0,0})/t = z_0 d(t) + z_1 d(t) t h(t) + z_2 d(t)(t h(t))^2 + ... =
  = d(t)(z_0 + z_1 t h(t) + z_2 (t h(t))^2 + ...) = d(t) Z(t h(t)).

By solving this equation in d(t), we immediately find the relation desired.

The relation can be inverted, and this gives us the formula for the Z-sequence:

  Z(y) = [(d(t) - d_{0,0})/(t d(t)) | t = y h(t)^{-1}].
We conclude this section by giving a theorem which characterizes renewal arrays by means of the A- and Z-sequences:

Theorem 5.3.4 Let d(0) = h(0) ≠ 0. Then d(t) = h(t) if and only if A(y) = d(0) + y Z(y).

Proof: Let us assume that A(y) = d(0) + y Z(y), or Z(y) = (A(y) - d(0))/y. By the previous theorem, we have:

  d(t) = d(0)/(1 - t Z(t h(t))) = d(0)/(1 - (t A(t h(t)) - d(0) t)/(t h(t))) = d(0) t h(t)/(d(0) t) = h(t),

because A(t h(t)) = h(t) by formula (5.3.2). Vice versa, by the formula for Z(y), we obtain from the hypothesis d(t) = h(t):

  d(0) + y Z(y) = [d(0) + y (1/t - d(0)/(t h(t))) | t = y h(t)^{-1}] =
  = [d(0) + t h(t)/t - d(0) t h(t)/(t h(t)) | t = y h(t)^{-1}] = [h(t) | t = y h(t)^{-1}] = A(y).
5.4 Simple binomial coefficients

Let us consider simple binomial coefficients, i.e., binomial coefficients of the form (n+ak choose m+bk), where a, b are two parameters and k is a non-negative integer variable. Depending on whether we consider n a variable and m a parameter, or vice versa, we have two different infinite arrays (d_{n,k}) or (d̂_{m,k}), whose elements depend on the parameters a, b, m or a, b, n, respectively. In either case, if some conditions on a, b hold, we have Riordan arrays and therefore we can apply formula (5.1.3) to find the value of many sums.

Theorem 5.4.1 Let d_{n,k} and d̂_{m,k} be as above. If b > a and b - a is an integer, then D = (d_{n,k}) is a Riordan array. If b < 0 is an integer, then D̂ = (d̂_{m,k}) is a Riordan array. We have:

  D = (t^m/(1-t)^{m+1}, t^{b-a-1}/(1-t)^b)    D̂ = ((1+t)^n, t^{-b-1}(1+t)^a).
Proof: By using well-known properties of binomial coefficients, we find:

  d_{n,k} = (n+ak choose m+bk) = (n+ak choose n-m+ak-bk) =
  = (-m-bk-1 choose (n-m)+(a-b)k) (-1)^{n-m+ak-bk} =
  = [t^{n-m+ak-bk}] 1/(1-t)^{m+1+bk} = [t^n] (t^m/(1-t)^{m+1}) (t^{b-a}/(1-t)^b)^k;

and:

  d̂_{m,k} = [t^{m+bk}] (1+t)^{n+ak} = [t^m] (1+t)^n (t^{-b}(1+t)^a)^k.

The theorem now directly follows from (5.1.1).
For m = a = 0 and b = 1 we again find the Riordan array of the Pascal triangle. The sum (5.1.3) takes on two specific forms, which are worth being stated explicitly:

  Σ_k (n+ak choose m+bk) f_k = [t^n] (t^m/(1-t)^{m+1}) f(t^{b-a}/(1-t)^b)    b > a    (5.4.1)

  Σ_k (n+ak choose m+bk) f_k = [t^m] (1+t)^n f(t^{-b}(1+t)^a)    b < 0.    (5.4.2)
If m and n are independent of each other, these relations can also be stated as generating function identities. The binomial coefficient (n+ak choose m+bk) is so general that a large number of combinatorial sums can be solved by means of the two formulas (5.4.1) and (5.4.2).

Let us begin our set of examples with a simple sum; by the theorem above, the binomial coefficients (n-k choose m) correspond to a Riordan array (the case a = -1, b = 0), and therefore:

  Σ_k (n-k choose m) = [t^n] (t^m/(1-t)^{m+1}) (1/(1-t)) = [t^{n-m}] 1/(1-t)^{m+2} = (n+1 choose m+1).
Another simple example is the sum:

  Σ_k (n choose 2k+1) 5^k = [t^n] (t/(1-t)^2) [1/(1-5y) | y = t^2/(1-t)^2] =
  = (1/2) [t^n] 2t/(1 - 2t - 4t^2) = 2^{n-1} F_n.
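This identity is easy to confirm numerically (Python sketch):

```python
from math import comb

def lhs(n):
    return sum(comb(n, 2*k + 1) * 5**k for k in range(n // 2 + 1))

fib = [0, 1]
for n in range(2, 40):
    fib.append(fib[n-1] + fib[n-2])

for n in range(1, 40):
    assert lhs(n) == 2 ** (n - 1) * fib[n]
```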
The following sum is a more interesting case. From the generating function of the Catalan numbers we immediately find:

  Σ_k (n+k choose m+2k) (2k choose k) ((-1)^k/(k+1)) = [t^n] (t^m/(1-t)^{m+1}) [(√(1+4y) - 1)/(2y) | y = t/(1-t)^2] =
  = [t^{n-m}] (1/(1-t)^{m+1}) (√(1 + 4t/(1-t)^2) - 1) ((1-t)^2/(2t)) =
  = [t^{n-m}] 1/(1-t)^m = (n-1 choose m-1).
In the following sum we use the bisection formulas. Because the generating function for (z+1 choose k) 2^k is (1+2t)^{z+1}, we have:

  Σ_k (z+1 choose 2k+1) 2^{2k+1} t^{2k+1} = (1/2)((1 + 2t)^{z+1} - (1 - 2t)^{z+1}).

By applying formula (5.4.2):

  Σ_k (z+1 choose 2k+1) (z-2k choose n-k) 2^{2k+1} =
  = [t^n] (1+t)^z [((1 + 2√y)^{z+1} - (1 - 2√y)^{z+1})/(2√y) | y = t/(1+t)^2] =
  = [t^n] ((1+t+2√t)^{z+1} - (1+t-2√t)^{z+1})/(2√t) =
  = [t^{2n+1}] (1+t)^{2z+2} = (2z+2 choose 2n+1);

in the last but one passage, we used backwards the bisection rule, since (1+t±2√t)^{z+1} = (1±√t)^{2z+2}.
We solve the following sum by using (5.4.2):

  Σ_k (2n-2k choose m-k) (n choose k) (-2)^k = [t^m] (1+t)^{2n} [(1-2y)^n | y = t/(1+t)^2] =
  = [t^m] (1+t^2)^n = (n choose m/2) if m is even, and 0 if m is odd.
For example we find:

  (2n/(n+k)) (n+k choose 2k) = (2n/(n+k)) (n+k choose n-k) =
  = (n+k choose n-k) + (n+k-1 choose n-k-1) = (n+k choose 2k) + (n-1+k choose 2k).

Hence, by formula (5.4.1), we have:

  Σ_k (2n/(n+k)) (n+k choose n-k) f_k = Σ_k (n+k choose 2k) f_k + Σ_k (n-1+k choose 2k) f_k =
  = [t^n] (1/(1-t)) f(t/(1-t)^2) + [t^{n-1}] (1/(1-t)) f(t/(1-t)^2) =
  = [t^n] ((1+t)/(1-t)) f(t/(1-t)^2).
This proves that the infinite triangle of the elements (2n/(n+k))(n+k choose 2k) is the Riordan array ((1+t)/(1-t), 1/(1-t)^2). For example, from the generating function of the Catalan numbers:

  Σ_k (2n/(n+k)) (n+k choose n-k) (2k choose k) (-1)^k = [t^n] ((1+t)/(1-t)) [1/√(1+4y) | y = t/(1-t)^2] = [t^n] 1 = δ_{n,0},

  Σ_k (2n/(n+k)) (n+k choose n-k) (2k choose k) ((-1)^k/(k+1)) = [t^n] ((1+t)/(1-t)) [(√(1+4y) - 1)/(2y) | y = t/(1-t)^2] = [t^n] (1+t) = δ_{n,0} + δ_{n,1}.
The following is a quite different case. Let f(t) = G(f_k) and:

  G(t) = G(f_k/k) = ∫_0^t (f(τ) - f_0)/τ dτ.

Obviously we have, for k ≥ 1:

  (n/(n-k)) (n-k choose k) = (n/k) (n-k-1 choose k-1)

and therefore:

  Σ_k (n/(n-k)) (n-k choose k) f_k = f_0 + n Σ_{k=1} (n-k-1 choose k-1) (f_k/k) = f_0 + n [t^n] G(t^2/(1-t)).    (5.5.1)
This gives an immediate proof of the following formula, known as Hardy's identity:
$$\sum_k \frac{1}{n-k}\binom{n-k}{k}(-1)^k = \frac1n + [t^n]\,\ln\frac{1-t^2}{1+t^3} = \frac1n + [t^n]\left(\ln\frac{1}{1+t^3} - \ln\frac{1}{1-t^2}\right) =$$
$$= \begin{cases} (-1)^n\,2/n & \text{if } 3 \text{ divides } n,\\[2pt] (-1)^{n-1}/n & \text{otherwise.}\end{cases}$$
We also immediately obtain:
$$\sum_k \frac{1}{n-k}\binom{n-k}{k} = \frac{\phi^n + \hat\phi^n}{n},$$
where $\phi$ is the golden ratio and $\hat\phi = -1/\phi$. The reader can generalize formula (5.5.1) by using the change of variable $t \to pt$ and prove other formulas.
The following one is known as Riordan's old identity:
$$\sum_k \frac{n}{n-k}\binom{n-k}{k}\,(a+b)^{n-2k}(-ab)^k = a^n + b^n,$$
while this is a generalization of Hardy's identity:
$$\sum_k \frac{n}{n-k}\binom{n-k}{k}\,x^{n-2k}(-1)^k = \frac{(x+\sqrt{x^2-4})^n + (x-\sqrt{x^2-4})^n}{2^n}.$$
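Both identities can be checked with exact arithmetic. For the generalization, at $x = 3$ the right-hand side satisfies the recurrence $r_n = 3r_{n-1} - r_{n-2}$ with $r_0 = 2$, $r_1 = 3$, which avoids floating-point square roots (Python sketch; function names are mine):

```python
from fractions import Fraction
from math import comb

def riordan_old(n, a, b):
    # sum_k (n/(n-k)) C(n-k,k) (a+b)^(n-2k) (-ab)^k
    return sum(Fraction(n, n - k) * comb(n - k, k) * (a + b)**(n - 2*k) * (-a*b)**k
               for k in range(n // 2 + 1))

def hardy_gen(n, x):
    # sum_k (n/(n-k)) C(n-k,k) x^(n-2k) (-1)^k
    return sum(Fraction(n, n - k) * comb(n - k, k) * x**(n - 2*k) * (-1)**k
               for k in range(n // 2 + 1))

for n in range(1, 12):
    for a in range(1, 4):
        for b in range(1, 4):
            assert riordan_old(n, a, b) == a**n + b**n

# r_n = ((3+sqrt5)/2)^n + ((3-sqrt5)/2)^n satisfies r_n = 3 r_{n-1} - r_{n-2}
r = [2, 3]
for n in range(2, 12):
    r.append(3 * r[-1] - r[-2])
for n in range(1, 12):
    assert hardy_gen(n, 3) == r[n]
```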
5.6 Binomial coefficients and the LIF

In a few cases only, the formulas of the previous sections give the desired result when the m and n in the numerator and denominator of a binomial coefficient are related to each other. In fact, in that case, we have to extract the coefficient of $t^n$ from a function depending on the same variable n (or m). This requires applying the Lagrange Inversion Formula, according to the diagonalization rule. Let us suppose we have the binomial coefficient $\binom{2n-k}{n-k}$ and we wish to know whether it corresponds to a Riordan array or not. We have:
$$\binom{2n-k}{n-k} = [t^{n-k}]\,(1+t)^{2n-k} = [t^n]\,(1+t)^{2n}\left(\frac{t}{1+t}\right)^k.$$
The function $(1+t)^{2n}$ cannot be assumed as the d(t) function of a Riordan array because it varies as n varies. Therefore, let us suppose that k is fixed; we can apply the diagonalization rule with $F(t) = (t/(1+t))^k$ and $\phi(t) = (1+t)^2$, and try to find a true generating function. We have to solve the equation:
$$w = t\phi(w)\qquad\text{or}\qquad w = t(1+w)^2.$$
This equation is $tw^2 - (1-2t)w + t = 0$ and we are looking for the unique solution $w = w(t)$ such that $w(0) = 0$. This is:
$$w(t) = \frac{1-2t-\sqrt{1-4t}}{2t}.$$
We now perform the necessary computations:
$$F(w) = \left(\frac{w}{1+w}\right)^k = \left(\frac{1-2t-\sqrt{1-4t}}{1-\sqrt{1-4t}}\right)^k = \left(\frac{1-\sqrt{1-4t}}{2}\right)^k;$$
furthermore:
$$\frac{1}{1-t\phi'(w)} = \frac{1}{1-2t(1+w)} = \frac{1}{\sqrt{1-4t}}.$$
Therefore, the diagonalization gives:
$$\binom{2n-k}{n-k} = [t^n]\,\frac{1}{\sqrt{1-4t}}\left(\frac{1-\sqrt{1-4t}}{2}\right)^k.$$
This shows that the binomial coefficient is the generic element of the Riordan array:
$$D = \left(\frac{1}{\sqrt{1-4t}},\;\frac{1-\sqrt{1-4t}}{2t}\right).$$
As a check, we observe that column 0 contains all the elements with k = 0, i.e., $\binom{2n}{n} = [t^n]\,1/\sqrt{1-4t}$.
A simple example is:
$$\sum_{k=0}^n \binom{2n-k}{n-k} 2^k = [t^n]\,\frac{1}{\sqrt{1-4t}}\cdot\frac{1}{1-2y}\;\bigg|\;y=\frac{1-\sqrt{1-4t}}{2} =$$
$$= [t^n]\,\frac{1}{\sqrt{1-4t}}\cdot\frac{1}{\sqrt{1-4t}} = [t^n]\,\frac{1}{1-4t} = 4^n.$$
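The evaluation is immediate to verify (Python sketch; `power_sum` is my name):

```python
from math import comb

def power_sum(n):
    # sum_k C(2n-k, n-k) 2^k for k = 0..n
    return sum(comb(2*n - k, n - k) * 2**k for k in range(n + 1))

# the sum collapses to 4^n
for n in range(15):
    assert power_sum(n) == 4**n
```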
By using the diagonalization rule as above, we can show that:
$$\left(\binom{2n+ak}{n-ck}\right)_{n,k\in\mathbb N} = \left(\frac{1}{\sqrt{1-4t}},\;t^{c-1}\left(\frac{1-\sqrt{1-4t}}{2t}\right)^{a+2c}\right).$$
An interesting example is given by the following alternating sum:
$$\sum_k \binom{2n}{n-3k}(-1)^k = [t^n]\,\frac{1}{\sqrt{1-4t}}\cdot\frac{1}{1+y}\;\bigg|\;y=t^3\left(\frac{1-\sqrt{1-4t}}{2t}\right)^6 =$$
$$= [t^n]\left(\frac{1}{2\sqrt{1-4t}} + \frac{1-t}{2(1-3t)}\right) = \frac12\binom{2n}{n} + 3^{n-1} + \frac{\delta_{n,0}}{6}$$
(with the convention that $3^{n-1}$ is read as $1/3$ for $n = 0$). The reader is invited to solve, in a similar way, the corresponding non-alternating sum.
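The closed form can be tested exactly, including the $n = 0$ correction term (Python sketch; function names are mine):

```python
from fractions import Fraction
from math import comb

def alt_sum(n):
    # sum_{k>=0} (-1)^k C(2n, n-3k)
    return sum((-1)**k * comb(2*n, n - 3*k) for k in range(n // 3 + 1))

def closed_form(n):
    # (1/2) C(2n,n) + 3^(n-1) + delta_{n,0}/6, with 3^(-1) read as 1/3
    return (Fraction(comb(2*n, n), 2) + Fraction(3**n, 3)
            + (Fraction(1, 6) if n == 0 else 0))

for n in range(12):
    assert alt_sum(n) == closed_form(n)
```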
In the same way we can deal with binomial coefficients of the form $\binom{pn+ak}{n-ck}$.

When c = 0 the generic element reduces to
$$d_{n,k} = \binom nk a^k b^{n-k}$$
and so we end up with the Pascal triangle. Consequently, we assume $c \ne 0$. For any given triple (a, b, c) we obtain one type of array from complete walks and another from underdiagonal walks. However, the function h(t), which only depends on the A-sequence, is the same in both cases, and we can find it by means of formula (5.3.2). In fact, $A(t) = a + bt + ct^2$ and h(t) is the solution of the functional equation $h(t) = a + bt\,h(t) + ct^2 h(t)^2$ having $h(0) = a$:
$$h(t) = \frac{1 - bt - \sqrt{1-2bt+(b^2-4ac)t^2}}{2ct^2}.\qquad(5.7.1)$$
The radicand factors as $1-2bt+(b^2-4ac)t^2 = (1-(b+2\sqrt{ac})t)(1-(b-2\sqrt{ac})t)$, and the two functions
$$\frac{1-bt-\sqrt{1-2bt+(b^2-4ac)t^2}}{2act^2},\qquad \frac{1-bt-\sqrt{1-2bt+(b^2-4ac)t^2}}{2ct^2}$$
are the d(t) and h(t) functions of the Riordan array corresponding to underdiagonal walks.
In current literature, major importance is usually given to the following three quantities:
1. the number of walks returning to the main diagonal; this is $d_n = [t^n]\,d(t)$, for every n;
2. the total number of walks of length n; this is $T_n = \sum_{k=0}^n d_{n,k}$, i.e., the value of the row sums of the Riordan array;
3. the average distance from the main diagonal of all the walks of length n; this is $\delta_n = \sum_{k=0}^n k\,d_{n,k}$, the weighted row sum of the Riordan array, divided by $T_n$.
In Chapter 7 we will learn how to find an asymptotic approximation for $d_n$. With regard to the last two points, the formulas for the row sums and the weighted row sums given in the first section allow us to find the generating function T(t) of the total number $T_n$ of underdiagonal walks of length n, and D(t) of the total distance of these walks from the main diagonal:
$$T(t) = \frac{1}{2at}\cdot\frac{1-(b+2a)t-\sqrt{1-2bt+(b^2-4ac)t^2}}{(a+b+c)t-1}$$
$$D(t) = \frac{1}{4at}\left(\frac{1-(b+2a)t-\sqrt{1-2bt+(b^2-4ac)t^2}}{(a+b+c)t-1}\right)^2.$$
In the symmetric case (a = c) these formulas simplify as follows:
$$T(t) = \frac{1}{2at}\left(\sqrt{\frac{1-(b-2a)t}{1-(b+2a)t}} - 1\right)$$
$$D(t) = \frac{1}{2at}\left(\frac{1-bt}{1-(b+2a)t} - \sqrt{\frac{1-(b-2a)t}{1-(b+2a)t}}\right).$$
The alternating row sums and the diagonal sums sometimes have some combinatorial significance as well, and so they can be treated in the same way.
The study of complete walks follows the same lines and we only have to derive the form of the corresponding Riordan array, which is:
$$(d_{n,k})_{n,k\in\mathbb N} = \left(\frac{1}{\sqrt{1-2bt+(b^2-4ac)t^2}},\;\frac{1-bt-\sqrt{1-2bt+(b^2-4ac)t^2}}{2ct^2}\right).$$
The proof is as follows. Since a complete walk can go above the main diagonal, the array $(d_{n,k})_{n,k\in\mathbb N}$ is only the right part of an infinite triangle, in which k can also assume negative values. By following the logic of the theorem above, we see that the generating function of the nth row is $((c/w) + b + aw)^n$, and therefore the bivariate generating function of the extended triangle is:
$$d(t,w) = \sum_n \left(\frac cw + b + aw\right)^n t^n = \frac{1}{1-(aw+b+c/w)t}.$$
If we expand this expression by partial fractions, we get (writing $R = 1-2bt+(b^2-4ac)t^2$ for short):
$$d(t,w) = \frac{1}{\sqrt R}\left(\frac{1}{1-\frac{1-bt-\sqrt R}{2ct}\,w} - \frac{1}{1-\frac{1-bt+\sqrt R}{2ct}\,w}\right) =$$
$$= \frac{1}{\sqrt R}\cdot\frac{1}{1-\frac{1-bt-\sqrt R}{2ct}\,w} + \frac{1-bt-\sqrt R}{2at}\cdot\frac1w\cdot\frac{1}{\sqrt R}\cdot\frac{1}{1-\frac{1-bt-\sqrt R}{2at}\cdot\frac1w}.$$
The first term represents the right part of the extended triangle and this corresponds to $k \ge 0$, whereas the second term corresponds to the left part (k < 0). We are interested in the right part, and the expression can be written as:
$$\frac{1}{1-\frac{1-bt-\sqrt{1-2bt+(b^2-4ac)t^2}}{2ct}\,w} = \sum_k \left(\frac{1-bt-\sqrt{1-2bt+(b^2-4ac)t^2}}{2ct}\right)^k w^k,$$
which immediately gives the form of the Riordan array.
5.8 Stirling numbers and Riordan arrays

The connection between Riordan arrays and Stirling numbers is not immediate. If we examine the two infinite triangles of the Stirling numbers of both kinds, we immediately realize that they are not Riordan arrays. It is not difficult to obtain the column generating functions for the Stirling numbers of the second kind; by starting with the recurrence relation:
$${n+1\brace k+1} = (k+1){n\brace k+1} + {n\brace k}$$
we find, for example, ${n\brace 2} = 2^{n-1} - 1$; this also indicates the form of the generating function for column m:
$$S_m(t) = \mathcal G\left({n\brace m}\right)_{n\in\mathbb N} = \frac{t^m}{(1-t)(1-2t)\cdots(1-mt)},$$
which is now proved by induction when we specialize the recurrence relation above to k = m. This is left to the reader as a simple exercise.
The generating functions for the Stirling numbers of the first kind are not so simple. However, let us go on with the Stirling numbers of the second kind, proceeding in the following way: if we multiply the recurrence relation by $(k+1)!/(n+1)!$ we obtain the new relation:
$$\frac{(k+1)!}{(n+1)!}{n+1\brace k+1} = \frac{(k+1)!}{n!}{n\brace k+1}\frac{k+1}{n+1} + \frac{k!}{n!}{n\brace k}\frac{k+1}{n+1}.$$
If we denote by $d_{n,k}$ the quantity $k!{n\brace k}/n!$, this is a recurrence relation for $d_{n,k}$, which can be written as:
$$(n+1)\,d_{n+1,k+1} = (k+1)\,d_{n,k+1} + (k+1)\,d_{n,k}.$$
Let us now proceed as above and find the column generating functions for the new array $(d_{n,k})_{n,k\in\mathbb N}$. Obviously, $d_0(t) = 1$; by setting k = 0 in the new recurrence:
$$(n+1)\,d_{n+1,1} = d_{n,1} + d_{n,0}$$
and passing to generating functions: $d_1'(t) = d_1(t) + 1$. The solution of this simple differential equation is $d_1(t) = e^t - 1$ (the reader can simply check this solution, if he or she prefers). We can now go on by setting k = 1 in the recurrence; we obtain $(n+1)d_{n+1,2} = 2d_{n,2} + 2d_{n,1}$, or $d_2'(t) = 2d_2(t) + 2(e^t-1)$, whose solution is $d_2(t) = (e^t-1)^2$. In general, the recurrence gives $d_{k+1}'(t) = (k+1)d_{k+1}(t) + (k+1)d_k(t)$. By the induction hypothesis, we can substitute $d_k(t) = (e^t-1)^k$ and solve the differential equation thus obtained. In practice, we can simply verify that $d_{k+1}(t) = (e^t-1)^{k+1}$; by substituting, we have:
$$(k+1)\,e^t(e^t-1)^k = (k+1)(e^t-1)^{k+1} + (k+1)(e^t-1)^k$$
and this equality is obviously true.
The form of this generating function:
$$d_k(t) = \mathcal G\left(\frac{k!}{n!}{n\brace k}\right)_{n\in\mathbb N} = (e^t-1)^k$$
proves that $(d_{n,k})_{n,k\in\mathbb N}$ is a Riordan array having $d(t) = 1$ and $t\,h(t) = e^t - 1$. This fact allows us to prove algebraically a lot of identities concerning the Stirling numbers of the second kind, as we shall see in the next section.
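The column generating functions $(e^t-1)^k$ can be checked directly: expanding $(e^t-1)^k$ by the binomial theorem gives $n!/k!\,[t^n](e^t-1)^k = \frac{1}{k!}\sum_j (-1)^{k-j}\binom kj j^n$, which must agree with the recurrence for ${n\brace k}$ (Python sketch; function names are mine):

```python
from math import comb, factorial

def stirling2(n, k):
    # recurrence {n+1, k+1} = (k+1){n, k+1} + {n, k}
    S = [[0] * (n + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            S[i][j] = j * S[i - 1][j] + S[i - 1][j - 1]
    return S[n][k]

def via_egf(n, k):
    # n!/k! * [t^n] (e^t - 1)^k, expanded by the binomial theorem
    num = sum((-1)**(k - j) * comb(k, j) * j**n for j in range(k + 1))
    return num // factorial(k)

for n in range(10):
    for k in range(n + 1):
        assert stirling2(n, k) == via_egf(n, k)
```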
For the Stirling numbers of the first kind we proceed in an analogous way. We multiply the basic recurrence:
$${n+1\brack k+1} = n{n\brack k+1} + {n\brack k}$$
by $(k+1)!/(n+1)!$ and denote by $f_{n,k}$ the quantity $k!{n\brack k}/n!$:
$$\frac{(k+1)!}{(n+1)!}{n+1\brack k+1} = \frac{(k+1)!}{n!}{n\brack k+1}\frac{n}{n+1} + \frac{k!}{n!}{n\brack k}\frac{k+1}{n+1},$$
that is:
$$(n+1)\,f_{n+1,k+1} = n\,f_{n,k+1} + (k+1)\,f_{n,k}.$$
In this case also we have $f_0(t) = 1$ and, by specializing the last relation to the case k = 0, we obtain:
$$f_1'(t) = t\,f_1'(t) + f_0(t).$$
This is equivalent to $f_1'(t) = 1/(1-t)$ and, because $f_1(0) = 0$, we have:
$$f_1(t) = \ln\frac{1}{1-t}.$$
By setting k = 1, we find the simple differential equation $f_2'(t) = t f_2'(t) + 2f_1(t)$, whose solution is:
$$f_2(t) = \left(\ln\frac{1}{1-t}\right)^2.$$
This suggests the general formula:
$$f_k(t) = \mathcal G\left(\frac{k!}{n!}{n\brack k}\right)_{n\in\mathbb N} = \left(\ln\frac{1}{1-t}\right)^k$$
and again this can be proved by induction. In this case, $(f_{n,k})_{n,k\in\mathbb N}$ is the Riordan array having $d(t) = 1$ and $t\,h(t) = \ln(1/(1-t))$.
5.9 Identities involving the Stirling numbers

The two recurrence relations for $d_{n,k}$ and $f_{n,k}$ do not give immediate evidence that the two triangles are indeed Riordan arrays, because they do not correspond to A-sequences. However, the A-sequences for the two arrays can easily be found, once we know their h(t) function. For the Stirling numbers of the first kind we have to solve the functional equation:
$$\ln\frac{1}{1-t} = t\,A\!\left(\ln\frac{1}{1-t}\right).$$
By setting $y = \ln(1/(1-t))$, i.e., $t = (e^y-1)/e^y$, we have $A(y) = ye^y/(e^y-1)$, and this is the generating function for the A-sequence we were looking for. In a similar way, we find that the A-sequence for the triangle related to the Stirling numbers of the second kind is:
$$A(t) = \frac{t}{\ln(1+t)}.$$
A first result we obtain by using the correspondence between Stirling numbers and Riordan arrays concerns the row sums of the two triangles. For the Stirling numbers of the first kind we have:
$$\sum_{k=0}^n {n\brack k} = n!\sum_{k=0}^n \frac{k!}{n!}{n\brack k}\frac{1}{k!} = n!\,[t^n]\,e^y\;\Big|\;y=\ln\frac{1}{1-t} = n!\,[t^n]\,\frac{1}{1-t} = n!$$
as we observed and proved in a combinatorial way.
The row sums of the Stirling numbers of the second kind give, as we know, the Bell numbers; thus we can obtain the (exponential) generating function for these numbers:
$$\sum_{k=0}^n {n\brace k} = n!\sum_{k=0}^n \frac{k!}{n!}{n\brace k}\frac{1}{k!} = n!\,[t^n]\,e^y\;\Big|\;y=e^t-1 = n!\,[t^n]\exp(e^t-1);$$
therefore we have:
$$\mathcal G\left(\frac{B_n}{n!}\right) = \exp(e^t-1).$$
We also defined the ordered Bell numbers as $O_n = \sum_{k=0}^n k!{n\brace k}$; hence:
$$O_n = n!\sum_{k=0}^n \frac{k!}{n!}{n\brace k} = n!\,[t^n]\,\frac{1}{1-y}\;\Big|\;y=e^t-1 = n!\,[t^n]\,\frac{1}{2-e^t}.$$
We have thus obtained the exponential generating function:
$$\mathcal G\left(\frac{O_n}{n!}\right) = \frac{1}{2-e^t}.$$
Stirling numbers of the two kinds are related to each other in various ways. For example, we have:
$$\sum_k {n\brack k}{k\brace m} = \frac{n!}{m!}\sum_k \frac{k!}{n!}{n\brack k}\,\frac{m!}{k!}{k\brace m} = \frac{n!}{m!}\,[t^n]\,(e^y-1)^m\;\Big|\;y=\ln\frac{1}{1-t} =$$
$$= \frac{n!}{m!}\,[t^n]\,\frac{t^m}{(1-t)^m} = \frac{n!}{m!}\binom{n-1}{m-1}.$$
Besides, two orthogonality relations exist between Stirling numbers. The first one is proved in this way:
$$\sum_k {n\brack k}{k\brace m}(-1)^{n-k} = (-1)^n\,\frac{n!}{m!}\sum_k \frac{k!}{n!}{n\brack k}\,\frac{m!}{k!}{k\brace m}(-1)^k =$$
$$= (-1)^n\,\frac{n!}{m!}\,[t^n]\,(e^{-y}-1)^m\;\Big|\;y=\ln\frac{1}{1-t} = (-1)^n\,\frac{n!}{m!}\,[t^n]\,(-t)^m = \delta_{n,m}.$$
The second orthogonality relation is proved in a similar way and reads:
$$\sum_k {n\brace k}{k\brack m}(-1)^{n-k} = \delta_{n,m}.$$
We introduced Stirling numbers by means of the Stirling identities relating powers and falling factorials. We can now prove these identities by using a Riordan array approach. In fact:
$$\sum_{k=0}^n {n\brack k}(-1)^{n-k}\,x^k = (-1)^n\,n!\sum_{k=0}^n \frac{k!}{n!}{n\brack k}\,\frac{(-x)^k}{k!} =$$
$$= (-1)^n\,n!\,[t^n]\,e^{-xy}\;\Big|\;y=\ln\frac{1}{1-t} = (-1)^n\,n!\,[t^n]\,(1-t)^x = n!\binom xn = x^{\underline n}$$
and:
$$\sum_{k=0}^n {n\brace k}\,x^{\underline k} = n!\sum_{k=0}^n \frac{k!}{n!}{n\brace k}\binom xk = n!\,[t^n]\,(1+y)^x\;\Big|\;y=e^t-1 = n!\,[t^n]\,e^{tx} = x^n.$$
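Since both sides are polynomials in x, checking the identities at enough integer points proves them degree-by-degree (Python sketch; function names are mine):

```python
def stirling1(n, k):
    # unsigned first kind: [n+1,k] = n[n,k] + [n,k-1]
    if k > n:
        return 0
    S = [[0] * (n + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            S[i][j] = (i - 1) * S[i - 1][j] + S[i - 1][j - 1]
    return S[n][k]

def stirling2(n, k):
    # second kind: {n+1,k} = k{n,k} + {n,k-1}
    if k > n:
        return 0
    S = [[0] * (n + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            S[i][j] = j * S[i - 1][j] + S[i - 1][j - 1]
    return S[n][k]

def falling(x, n):
    # falling factorial x(x-1)...(x-n+1)
    p = 1
    for i in range(n):
        p *= x - i
    return p

for n in range(8):
    for x in range(-5, 6):
        assert sum(stirling2(n, k) * falling(x, k) for k in range(n + 1)) == x**n
        assert sum((-1)**(n - k) * stirling1(n, k) * x**k for k in range(n + 1)) == falling(x, n)
```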
We conclude this section by showing two possible connections between Stirling numbers and Bernoulli numbers. First we have:
$$\sum_{k=0}^n {n\brace k}\frac{(-1)^k\,k!}{k+1} = n!\sum_{k=0}^n \frac{k!}{n!}{n\brace k}\frac{(-1)^k}{k+1} = n!\,[t^n]\,\frac{\ln(1+y)}{y}\;\Big|\;y=e^t-1 =$$
$$= n!\,[t^n]\,\frac{t}{e^t-1} = B_n,$$
which proves that the Bernoulli numbers can be defined in terms of the Stirling numbers of the second kind. For the Stirling numbers of the first kind we have the identity:
$$\sum_{k=0}^n {n\brack k} B_k = n!\sum_{k=0}^n \frac{k!}{n!}{n\brack k}\frac{B_k}{k!} = n!\,[t^n]\,\frac{y}{e^y-1}\;\Big|\;y=\ln\frac{1}{1-t} =$$
$$= n!\,[t^n]\,\frac{1-t}{t}\ln\frac{1}{1-t} = n!\left(\frac{1}{n+1}-\frac1n\right) = -\frac{(n-1)!}{n+1}.$$
Clearly, this holds for n > 0. For n = 0 we have:
$$\sum_{k=0}^n {n\brack k} B_k = B_0 = 1.$$
Chapter 6

Formal methods

6.1 Formal languages

During the 1950s, the linguist Noam Chomsky introduced the concept of a formal language. Several definitions have to be provided before a precise statement of the concept can be given. Therefore, let us proceed in the following way.
First, we recall the definitions given in Section 2.1. An alphabet is a finite set $A = \{a_1, a_2, \ldots, a_n\}$, whose elements are called symbols or letters. A word on A is a finite sequence of symbols in A; the sequence is written by juxtaposing the symbols, so a word w is denoted by $w = a_{i_1}a_{i_2}\cdots a_{i_r}$, and $r = |w|$ is the length of the word. The empty sequence is called the empty word and is conventionally denoted by $\epsilon$; its length is obviously 0, and $\epsilon$ is the only word of length 0.
The set of all the words on A, the empty word included, is denoted by $A^*$, and by $A^+$ if the empty word is excluded. Algebraically, $(A^*, \cdot)$, if $\cdot$ denotes juxtaposition, is called the free monoid generated by A. Observe that a monoid is an algebraic structure more general than a group: in a group, every element must also have an inverse.
The basic definition concerning formal languages is the following: a grammar is a 4-tuple $G = (T, N, \sigma, P)$, where:
$T = \{a_1, a_2, \ldots, a_n\}$ is the alphabet of terminal symbols;
$N = \{\sigma_1, \sigma_2, \ldots, \sigma_m\}$ is the alphabet of non-terminal symbols;
$\sigma \in N$ is the initial symbol;
$P$ is a finite set of productions.
Usually, the symbols in T are denoted by lower-case Latin letters, and the symbols in N by Greek letters or by upper-case Latin letters. A production is a pair $(z_1, z_2)$ of words in $(T\cup N)^*$, such that $z_1$ contains at least one symbol in N; the production is often indicated by $z_1 \to z_2$. If $w \in (T\cup N)^*$, we can apply a production $z_1 \to z_2 \in P$ to w whenever w can be decomposed as $w = w_1 z_1 w_2$, and the result is the new word $w_1 z_2 w_2 \in (T\cup N)^*$; we will write $w = w_1 z_1 w_2 \Rightarrow w_1 z_2 w_2$ when $w_1 z_1 w_2$ is the decomposition of w in which $z_1$ is the leftmost occurrence of $z_1$ in w; in other words, if we also have $w = \hat w_1 z_1 \hat w_2$, then $|w_1| < |\hat w_1|$.
Given a grammar $G = (T, N, \sigma, P)$, we define the relation $w \Rightarrow w'$ between words $w, w' \in (T\cup N)^*$: the relation holds if and only if a production $z_1 \to z_2 \in P$ exists such that $z_1$ occurs in w, $w = w_1 z_1 w_2$ is the leftmost occurrence of $z_1$ in w, and $w' = w_1 z_2 w_2$. We also write $w \Rightarrow^* \hat w$ if and only if a sequence $(w = w_1, w_2, \ldots, w_s = \hat w)$ exists such that $w_1 \Rightarrow w_2$, $w_2 \Rightarrow w_3$, \ldots, $w_{s-1} \Rightarrow w_s$.
We observe explicitly that, by our condition that in every production $z_1 \to z_2$ the word $z_1$ should contain at least one symbol in N, if a word $w_i \in T^*$ is produced during a generation, it is terminal, i.e., the generation should stop. By collecting all these definitions, we finally define the language generated by the grammar G as the set:
$$L(G) = \{\,w \in T^* \mid \sigma \Rightarrow^* w\,\}.$$
i.e., a word $w \in T^*$ belongs to the Dyck language D if and only if:
i) the number of a's in w equals the number of b's;
ii) in every prefix z of w the number of a's is not less than the number of b's.
Proof: Let $w \in D$; if $w = \epsilon$ nothing has to be proved. Otherwise, w is generated by the second production and $w = a w_1 b w_2$ with $w_1, w_2 \in D$; therefore, if we suppose that i) holds for $w_1$ and $w_2$, it also holds for w. For ii), any prefix z of w must have one of the forms: a; $az_1$, where $z_1$ is a prefix of $w_1$; $aw_1b$; or $aw_1bz_2$, where $z_2$ is a prefix of $w_2$. By the induction hypothesis, ii) holds for $z_1$ and $z_2$, and therefore it is easily proved for w. Vice versa, let us suppose that i) and ii) hold for $w \in T^*$. If $w \ne \epsilon$, then by ii) w should begin with a. Let us scan w until we find the first occurrence of the symbol b such that $w = a w_1 b w_2$ and in $w_1$ the number of b's equals the number of a's. By i) such an occurrence of b must exist and, consequently, $w_1$ and $w_2$ must satisfy condition i). Besides, if $w_1$ and $w_2$ are not empty, then they satisfy condition ii), by the very construction of $w_1$ and the fact that w satisfies condition ii) by hypothesis. We have thus obtained a decomposition of w showing that the second production has been used. This completes the proof.
If we substitute the letter a with the symbol '(' and the letter b with the symbol ')', the theorem shows that the words in the Dyck language are the possible parenthetizations of an expression. Therefore, the number of Dyck words with n pairs of parentheses is the Catalan number $\frac{1}{n+1}\binom{2n}{n}$.
Figure 6.1: The generation of some Dyck words
Instead, the Dyck grammar is non-ambiguous; in fact, as we have shown in the proof of the previous theorem, given any word $w \in D$, $w \ne \epsilon$, there is only one decomposition $w = a w_1 b w_2$ having $w_1, w_2 \in D$; therefore, w can only be generated in a single way. In general, if we show that any word in a context-free language L(G), generated by some grammar G, has a unique decomposition according to the productions in G, then the grammar cannot be ambiguous. Because of the connection between the Schützenberger methodology and non-ambiguous context-free grammars, we are mainly interested in this kind of grammar. For the sake of completeness, a context-free language is called intrinsically ambiguous if and only if every context-free grammar generating it is ambiguous. This definition stresses the fact that, if a language is generated by an ambiguous grammar, it can also be generated by some non-ambiguous grammar, unless it is intrinsically ambiguous. It is possible to show that intrinsically ambiguous languages actually exist; fortunately, they are not very frequent. For example, the language generated by the previous ambiguous grammar is $1^+$, i.e., the set of all the words composed of any sequence of 1's, except the empty word. Actually, it is not an ambiguous language, and a non-ambiguous grammar generating it is given by the same T, N, $\sigma$ and the two productions:
$$\sigma \to 1 \;\mid\; 1\sigma.$$
It is a simple matter to show that every word $11\ldots1$ can be uniquely decomposed according to these productions.
6.3 Formal languages and programming languages

In 1960, the formal definition of the programming language ALGOL60 was published. ALGOL60 has surely been the most influential programming language ever created, although it was actually used only by a very limited number of programmers. Most of the concepts we now find in programming languages were introduced by ALGOL60, of which, for example, PASCAL and C are direct derivations. Here, we are not interested in these aspects of ALGOL60, but we wish to spend some words on how ALGOL60 used context-free grammars to define its syntax in a formal and precise way. In practice, a program in ALGOL60 is a word generated by a (rather complex) context-free grammar, whose initial symbol is ⟨program⟩.
The ALGOL60 grammar used, as its terminal symbol alphabet, the characters available on the standard keyboard of a computer; actually, they were the characters punchable on a card, the input medium used at that time to introduce a program into the computer. The non-terminal symbol notation was one of the most appealing inventions of ALGOL60: the symbols were composed of entire English sentences enclosed by the two special parentheses ⟨ and ⟩. This made it possible to clearly express the intended meaning of the non-terminal symbols. The previous example concerning ⟨program⟩ surely makes sense. Another technical device used by ALGOL60 was the compaction of productions; if there were several productions with the same left-hand symbol, say $\alpha \to w_1,\ \alpha \to w_2,\ \ldots,\ \alpha \to w_k$, they were written as a single rule:
$$\alpha ::= w_1 \mid w_2 \mid \cdots \mid w_k$$
where ::= was a metasymbol denoting definition and | was read "or", to denote alternatives. This notation is usually called Backus Normal Form (BNF).
As a very simple example, in Table 6.1 (lines 1 through 6) we show how integer numbers were defined. This definition avoids leading 0's in numbers, but allows both +0 and −0. The productions can easily be changed to avoid +0 or −0 or both. In the same table, line 7 shows the definition of the conditional statements.
This kind of definition gives a precise formulation of all the clauses in the programming language. Besides, since the program has a single generation according to the grammar, it is possible to find this derivation starting from the actual program and therefore give its exact structure. This makes it possible to pass precise information to the compiler, which, in a sense, is directed by the formal syntax of the language (syntax-directed compilation).
A very interesting aspect is how this context-free grammar definition can avoid ambiguities in the interpretation of a program. Let us consider an expression like $a + b \times c$; according to the rules of Algebra, the multiplication should be executed before the addition, and the computer must follow this convention in order to create no confusion. This is done by the simplified productions given by lines 8 through 11 in Table 6.1. The derivation of the simple expression $a + b \times c$, or of a more complicated expression, reveals that it is decomposed into the sum of a and $b \times c$; this information is passed to the compiler and the multiplication is actually performed before the addition. If powers are also present, they are executed before products.
This ability of context-free grammars in describing the syntax of programming languages is very important, and after ALGOL60 the syntax of every programming language has always been defined by context-free grammars. We conclude by remembering that a more sophisticated approach to the definition of programming languages was tried with ALGOL68 by means of van Wijngaarden's grammars, but the method proved too complex and was abandoned.
6.4 The symbolic method

The Schützenberger method allows us to obtain the counting generating function for every non-ambiguous language, starting with the corresponding non-ambiguous grammar and proceeding in a mechanical way. Let us begin with a simple example; Fibonacci words are the words on the alphabet $\{0, 1\}$ beginning and ending with the symbol 1 and never containing two consecutive 0's. For small values of n, Fibonacci words of length n are easily displayed:

n = 1   1
n = 2   11
n = 3   111, 101
n = 4   1111, 1011, 1101
n = 5   11111, 10111, 11011, 11101, 10101

If we count them by their length, we obtain the sequence 0, 1, 1, 2, 3, 5, 8, ..., which is easily recognized as the Fibonacci sequence. In fact, a word of length n is obtained by adding a trailing 1 to a word of length n−1, or by adding a trailing 01 to a word of length n−2. This immediately shows, in a combinatorial way, that Fibonacci words are counted by Fibonacci numbers. Besides, we get the productions of a non-ambiguous context-free grammar $G = (T, N, \sigma, P)$, where $T = \{0, 1\}$, $N = \{\sigma\}$ and P contains:
$$\sigma \to 1 \;\mid\; \sigma 1 \;\mid\; \sigma 01$$
(these productions could have been written $\sigma ::= 1 \mid \sigma 1 \mid \sigma 01$ by using the ALGOL60 notation).
We are now going to obtain the counting generating function for Fibonacci words by applying the Schützenberger method. This consists in the following steps:
1. every non-terminal symbol $\sigma \in N$ is transformed into the name of its counting generating function $\sigma(t)$;
2. every terminal symbol is transformed into t;
3. the empty word is transformed into 1;
4. every | sign is transformed into a + sign, and ::= (or $\to$) is transformed into an equals sign.
After having performed these transformations, we obtain a system of equations, which can be solved in the unknown generating functions introduced in the first step. They are the counting generating functions for the languages generated by the corresponding non-terminal symbols, when we consider them as the initial symbols.
The definition of the Fibonacci words produces:
$$\sigma(t) = t + t\sigma(t) + t^2\sigma(t)$$
the solution of which is:
$$\sigma(t) = \frac{t}{1-t-t^2};$$
this is obviously the generating function for the Fibonacci numbers. Therefore, we have shown that the
1  ⟨digit⟩ ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
2  ⟨non zero digit⟩ ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
3  ⟨sequence of digits⟩ ::= ⟨digit⟩ | ⟨digit⟩ ⟨sequence of digits⟩
4  ⟨unsigned number⟩ ::= ⟨digit⟩ | ⟨non zero digit⟩ ⟨sequence of digits⟩
5  ⟨signed number⟩ ::= +⟨unsigned number⟩ | −⟨unsigned number⟩
6  ⟨integer number⟩ ::= ⟨unsigned number⟩ | ⟨signed number⟩
7  ⟨conditional clause⟩ ::= if ⟨condition⟩ then ⟨instruction⟩ |
        if ⟨condition⟩ then ⟨instruction⟩ else ⟨instruction⟩
8  ⟨expression⟩ ::= ⟨term⟩ | ⟨term⟩ + ⟨expression⟩
9  ⟨term⟩ ::= ⟨factor⟩ | ⟨term⟩ × ⟨factor⟩
10 ⟨factor⟩ ::= ⟨element⟩ | ⟨factor⟩ ↑ ⟨element⟩
11 ⟨element⟩ ::= ⟨constant⟩ | ⟨variable⟩ | (⟨expression⟩)

Table 6.1: Context-free languages and programming languages
number of Fibonacci words of length n is $F_n$, as we have already proved by combinatorial arguments.
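The grammar can be run directly: generating the words of length n from the two productions and counting them reproduces the Fibonacci numbers, and the absence of duplicates reflects the non-ambiguity of the grammar (Python sketch; `fib_words` is my name):

```python
def fib_words(n):
    # words generated by sigma -> 1 | sigma 1 | sigma 01, by length
    if n == 0:
        return []
    if n == 1:
        return ['1']
    return ([w + '1' for w in fib_words(n - 1)] +
            [w + '01' for w in fib_words(n - 2)])

fib = [0, 1]
for _ in range(12):
    fib.append(fib[-1] + fib[-2])

for n in range(1, 12):
    ws = fib_words(n)
    assert len(ws) == fib[n]
    assert len(set(ws)) == len(ws)  # unique generation: non-ambiguous
    assert all('00' not in w and w[0] == w[-1] == '1' for w in ws)
```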
In the case of the Dyck language, the definition yields:
$$\sigma(t) = 1 + t^2\sigma(t)^2$$
and therefore:
$$\sigma(t) = \frac{1-\sqrt{1-4t^2}}{2t^2}.$$
Since every word in the Dyck language has an even length, the number of Dyck words with 2n symbols is just the nth Catalan number, and this also we knew by combinatorial means.
Another example is given by the Motzkin words; these are words on the alphabet $\{a, b, c\}$ in which a, b act as parentheses, as in the Dyck language, while c is free and can appear everywhere. Therefore, the definition of the language is:
$$\sigma ::= \epsilon \mid c\sigma \mid a\sigma b\sigma$$
if $\sigma$ is the only non-terminal symbol. The Schützenberger method gives the equation:
$$\sigma(t) = 1 + t\sigma(t) + t^2\sigma(t)^2$$
whose solution is easily found:
$$\sigma(t) = \frac{1-t-\sqrt{1-2t-3t^2}}{2t^2}.$$
By expanding this function we find the sequence of Motzkin numbers, beginning:

n    0  1  2  3  4  5   6   7    8    9
M_n  1  1  2  4  9  21  51  127  323  835

These numbers count the so-called unary-binary trees, i.e., trees whose nodes have arity 1 or 2. They can be defined in a pictorial way by means of an object grammar.
An object grammar defines combinatorial objects instead of simple letters or words; however, most times it is rather easy to pass from an object grammar to an equivalent context-free grammar, and therefore obtain counting generating functions by means of the Schützenberger method. For example, the object grammar in Figure 6.2 is obviously equivalent to the context-free grammar for Motzkin words.
6.5 The bivariate case

In the Schützenberger method, the role of the indeterminate t is to count the number of letters or symbols occurring in the generated words; because of that, every symbol appearing in a production is transformed into a t. However, we may wish to count other parameters instead of, or in conjunction with, the number of symbols. This is accomplished by modifying the intended meaning of the indeterminate t and/or introducing some other indeterminate to take the other parameters into account.
For example, in the Dyck language we may wish to count the number of pairs a, b occurring in the words; this means that t no longer counts the single letters, but counts the pairs. Therefore, the Schützenberger method gives the equation $\sigma(t) = 1 + t\sigma(t)^2$, whose solution is just the generating function of the Catalan numbers.
An interesting application is as follows. Let us suppose we wish to know how many Fibonacci words of length n contain k zeroes. Besides the indeterminate t counting the total number of symbols, we introduce a new indeterminate z counting the number of zeroes. From the productions of the Fibonacci grammar, we derive an equation for the bivariate generating function $\sigma(t, z)$, in which the coefficient of $t^n z^k$ is just the number of such words.
Figure 6.2: The object grammar for unary-binary trees
From the productions we obtain $\sigma(t,z) = t + t\sigma(t,z) + zt^2\sigma(t,z)$, i.e., $\sigma(t,z) = t/(1-t-zt^2)$; denoting by $F_{n,k}$ the number of Fibonacci words of length n with k zeroes:
$$F_{n,k} = [t^n][z^k]\,\frac{t}{1-t-zt^2} = [t^n][z^k]\,\frac{t}{1-t}\cdot\frac{1}{1-z\frac{t^2}{1-t}} =$$
$$= [t^n][z^k]\,\frac{t}{1-t}\sum_{j=0}^\infty z^j\left(\frac{t^2}{1-t}\right)^j = [t^n]\,\frac{t^{2k+1}}{(1-t)^{k+1}} = [t^{n-2k-1}]\,\frac{1}{(1-t)^{k+1}} = \binom{n-k-1}{k}.$$
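The binomial formula can be confirmed by brute force over all words of moderate length (Python sketch; `count` is my name):

```python
from itertools import product
from math import comb

def count(n, k):
    # Fibonacci words of length n with exactly k zeroes
    c = 0
    for w in product('01', repeat=n):
        s = ''.join(w)
        if s[0] == s[-1] == '1' and '00' not in s and s.count('0') == k:
            c += 1
    return c

for n in range(1, 12):
    for k in range(n):
        assert count(n, k) == comb(n - k - 1, k)
```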
Therefore, the number of Fibonacci words of length n containing k zeroes is a binomial coefficient. The second expression in the derivation shows that the array of these numbers, $(F_{n,k})_{n,k\in\mathbb N}$, is indeed the Riordan array $(t/(1-t),\,t/(1-t))$, which is the Pascal triangle stretched vertically, i.e., column k is shifted down by k positions (k + 1, in reality). The general formula we know for the row sums of a Riordan array gives:
$$\sum_k F_{n,k} = \sum_k \binom{n-k-1}{k} = [t^n]\,\frac{t}{1-t}\cdot\frac{1}{1-y}\;\bigg|\;y=\frac{t^2}{1-t} = [t^n]\,\frac{t}{1-t-t^2} = F_n$$
as we were expecting. A more interesting problem is to find the average number of zeroes in all the Fibonacci words with n letters. First, we count the total number of zeroes in all the words of length n:
$$\sum_k k\,F_{n,k} = \sum_k \binom{n-k-1}{k}\,k = [t^n]\,\frac{t}{1-t}\cdot\frac{y}{(1-y)^2}\;\bigg|\;y=\frac{t^2}{1-t} = [t^n]\,\frac{t^3}{(1-t-t^2)^2}.$$
We extract the coefficient by expanding into partial fractions with respect to the two factors of $1-t-t^2 = (1-\phi t)(1-\hat\phi t)$:
$$[t^n]\,\frac{t^3}{(1-t-t^2)^2} = \frac15\,[t^{n-1}]\,\frac{1}{(1-\phi t)^2} - \frac25\,[t^n]\,\frac{t}{(1-\phi t)(1-\hat\phi t)} + \frac15\,[t^{n-1}]\,\frac{1}{(1-\hat\phi t)^2} =$$
$$= \frac15\,[t^{n-1}]\,\frac{1}{(1-\phi t)^2} - \frac{2\sqrt5}{25}\,[t^n]\,\frac{1}{1-\phi t} + \frac{2\sqrt5}{25}\,[t^n]\,\frac{1}{1-\hat\phi t} + \frac15\,[t^{n-1}]\,\frac{1}{(1-\hat\phi t)^2}.$$
The last two terms are negligible because they rapidly tend to 0; therefore we have:
$$\sum_k k\,F_{n,k} \approx \frac n5\,\phi^{n-1} - \frac{2\sqrt5}{25}\,\phi^n.$$
To obtain the average number $Z_n$ of zeroes, we need to divide this quantity by $F_n \approx \phi^n/\sqrt5$, the total number of Fibonacci words of length n:
$$Z_n = \frac{\sum_k k\,F_{n,k}}{F_n} \approx \frac{\sqrt5}{5\phi}\,n - \frac25 = \frac{5-\sqrt5}{10}\,n - \frac25.$$
This shows that the average number of zeroes grows linearly with the length of the words and tends to become 27.64% of this length, because $(5-\sqrt5)/10 \approx 0.2763932022\ldots$.
6.6 The Shift Operator

In the usual mathematical terminology, an operator is a mapping from some set $\mathcal F_1$ of functions into some other set of functions $\mathcal F_2$. We have already encountered the operator $\mathcal G$, acting from the set of sequences (which are properly functions from N into R or C) into the set of formal power series (analytic functions). Other usual examples are D, the operator of differentiation, and E, the shift operator, defined by $Ef(x) = f(x+1)$.
6.7 The Difference Operator

The difference operator is defined as $\Delta f(x) = f(x+1) - f(x) = (E-1)f(x)$. For example:
$$\Delta\frac1x = \frac{1}{x+1} - \frac1x = -\frac{1}{x(x+1)},\qquad \Delta x^{\underline m} = (x+1)^{\underline m} - x^{\underline m} = m\,x^{\underline{m-1}}.$$
For a quotient we have:
$$\Delta\frac{f(x)}{g(x)} = \frac{f(x+1)}{g(x+1)} - \frac{f(x)}{g(x)} = \frac{f(x+1)g(x) - f(x)g(x)}{g(x)g(x+1)} + \frac{f(x)g(x) - f(x)g(x+1)}{g(x)g(x+1)} = \frac{(\Delta f(x))\,g(x) - f(x)\,\Delta g(x)}{g(x)\,Eg(x)}.$$
The difference operator can be iterated:
$$\Delta^2 f(x) = \Delta(\Delta f(x)) = \Delta(f(x+1) - f(x)) = f(x+2) - 2f(x+1) + f(x).$$
From a formal point of view, we have:
$$\Delta^2 = (E-1)^2 = E^2 - 2E + 1$$
and in general:
$$\Delta^n = (E-1)^n = \sum_{k=0}^n \binom nk (-1)^{n-k} E^k = (-1)^n \sum_{k=0}^n \binom nk (-E)^k.$$
This is a very important formula, and it is the first example of the interest of combinatorics and generating functions in the theory of finite operators. In fact, let us iterate on f(x) = 1/x:
$$\Delta^2\frac1x = -\frac{1}{(x+1)(x+2)} + \frac{1}{x(x+1)} = \frac{-x+x+2}{x(x+1)(x+2)} = \frac{2}{x(x+1)(x+2)}$$
and in general:
$$\Delta^n\frac1x = \frac{(-1)^n\,n!}{x(x+1)\cdots(x+n)}$$
as we can easily show by mathematical induction. In fact:
$$\Delta^{n+1}\frac1x = \frac{(-1)^n\,n!}{(x+1)\cdots(x+n+1)} - \frac{(-1)^n\,n!}{x(x+1)\cdots(x+n)} = \frac{(-1)^{n+1}(n+1)!}{x(x+1)\cdots(x+n+1)}.$$
The formula for $\Delta^n$ now gives the following identity:
$$\Delta^n\frac1x = \sum_{k=0}^n \binom nk (-1)^{n-k} E^k\,\frac1x = \sum_{k=0}^n \binom nk \frac{(-1)^{n-k}}{x+k}.$$
By multiplying everything by $(-1)^n$, this identity can be written as:
$$\frac{n!}{x(x+1)\cdots(x+n)} = \frac{1}{x\binom{x+n}{n}} = \sum_{k=0}^n \binom nk \frac{(-1)^k}{x+k},$$
and therefore we have both a way to express the inverse of a binomial coefficient as a sum and an expression for the partial fraction expansion of the inverse of the polynomial $x(x+1)\cdots(x+n)$.
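The inverse-binomial identity is easy to confirm with exact rationals (Python sketch; function names are mine):

```python
from fractions import Fraction
from math import comb, factorial

def lhs(n, x):
    # n! / (x (x+1) ... (x+n))
    v = Fraction(factorial(n))
    for i in range(n + 1):
        v /= x + i
    return v

def rhs(n, x):
    # sum_k C(n,k) (-1)^k / (x+k)
    return sum(Fraction(comb(n, k) * (-1)**k, x + k) for k in range(n + 1))

for n in range(8):
    for x in range(1, 8):
        assert lhs(n, x) == rhs(n, x) == Fraction(1, x * comb(x + n, n))
```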
6.8 Shift and Difference Operators - Example I

As the difference operator Δ can be expressed in terms of the shift operator E, so E can be expressed in terms of Δ:
$$E = \Delta + 1.$$
This rule can be iterated, giving the summation formula:
$$E^n = (\Delta+1)^n = \sum_{k=0}^n \binom nk \Delta^k,$$
which can be seen as the dual of the formula already considered:
$$\Delta^n = \sum_{k=0}^n \binom nk (-1)^{n-k} E^k.$$
The evaluation of the successive differences of any function f(x) allows us to state and prove two identities, which may have combinatorial significance. Here we record some typical examples; we mark with an asterisk the cases in which the general formula for $\Delta^n f(x)$ does not reduce to $\Delta^0 f(x) = If(x)$ when n = 0.
1) The function f(x) = 1/x has already been developed, at least partially:
$$\Delta\frac1x = -\frac{1}{x(x+1)},\qquad \Delta^n\frac1x = \frac{(-1)^n\,n!}{x(x+1)\cdots(x+n)} = \frac{(-1)^n}{x}\binom{x+n}{n}^{-1}$$
$$\sum_k \binom nk \frac{(-1)^k}{x+k} = \frac{n!}{x(x+1)\cdots(x+n)} = \frac1x\binom{x+n}{n}^{-1}$$
$$\sum_k \binom nk (-1)^k \binom{x+k}{k}^{-1} = \frac{x}{x+n}.$$
2*) The function f(x) = (p+x)/(m+x):
\[
\Delta\frac{p+x}{m+x} = \frac{m-p}{(m+x)(m+x+1)}
\qquad
\Delta^n\frac{p+x}{m+x} = \frac{m-p}{m+x}\,(-1)^{n-1}\binom{m+x+n}{n}^{-1}\quad(n>0)
\]
\[
\sum_{k=0}^{n}\binom{n}{k}(-1)^k\,\frac{p+k}{m+k} = \frac{p-m}{m}\binom{m+n}{n}^{-1}\quad(n>0)
\qquad
\sum_{k=0}^{n}\binom{n}{k}\binom{m+k}{k}^{-1}(-1)^k = \frac{m}{m+n}\quad\text{(see above)}.
\]
3) Another version of the first example:
\[
\Delta\frac{1}{px+m} = -\frac{p}{(px+m)(px+p+m)}
\qquad
\Delta^n\frac{1}{px+m} = \frac{(-1)^n\,n!\,p^n}{(px+m)(px+p+m)\cdots(px+np+m)}
\]
According to this rule, we should have Δ⁰(p+x)/(m+x) = (p−m)/(m+x); in the second of the preceding sums, however, we have to set Δ⁰ = I, and therefore we also have to subtract 1 from both members in order to obtain a true identity; a similar situation arises whenever we have Δ⁰f(x) ≠ If(x).
\[
\sum_{k=0}^{n}\binom{n}{k}\frac{(-1)^k}{pk+m} = \frac{n!\,p^n}{m(m+p)\cdots(m+np)}
\qquad
\sum_{k=0}^{n}\binom{n}{k}\frac{(-1)^k\,k!\,p^k}{m(m+p)\cdots(m+pk)} = \frac{1}{pn+m}
\]
4) Harmonic numbers:
\[
\Delta^n H_x = \frac{(-1)^{n-1}}{n}\binom{x+n}{n}^{-1}\quad(n>0)
\qquad
\sum_{k=0}^{n}\binom{n}{k}(-1)^k H_{x+k} = -\frac{1}{n}\binom{x+n}{n}^{-1}\quad(n>0)
\]
\[
\sum_{k=1}^{n}\binom{n}{k}\frac{(-1)^{k-1}}{k}\binom{x+k}{k}^{-1} = H_{x+n} - H_x
\]
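The last identity can be confirmed with exact rational arithmetic; this sketch (our addition, not part of the original text) checks it for small x and n:

```python
from fractions import Fraction
from math import comb

def H(n):
    # harmonic number H_n as an exact fraction
    return sum(Fraction(1, j) for j in range(1, n + 1))

def lhs(x, n):
    # sum_{k=1}^n binom(n,k) (-1)^(k-1) / (k * binom(x+k,k))
    return sum(Fraction(comb(n, k) * (-1) ** (k - 1), k * comb(x + k, k))
               for k in range(1, n + 1))

for x in range(0, 5):
    for n in range(1, 7):
        assert lhs(x, n) == H(x + n) - H(x)
print("harmonic-number identity verified")
```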
where the case x = 0 is worth noting.
5) The function xH_x:
\[
\Delta^n (xH_x) = \frac{(-1)^n}{n-1}\binom{x+n-1}{n-1}^{-1}\quad(n\ge 2)
\qquad
\sum_{k=0}^{n}\binom{n}{k}(-1)^k (x+k)H_{x+k} = \frac{1}{n-1}\binom{x+n-1}{n-1}^{-1}\quad(n\ge 2)
\]
\[
\sum_{k=2}^{n}\binom{n}{k}\frac{(-1)^k}{k-1}\binom{x+k-1}{k-1}^{-1} = (x+n)\left(H_{x+n} - H_x\right) - n
\]
6) Harmonic numbers and binomial coefficients:
\[
\Delta\left(\binom{x}{m}H_x\right) = \binom{x}{m-1}\left(H_x + \frac{1}{m}\right)
\qquad
\Delta^n\left(\binom{x}{m}H_x\right) = \binom{x}{m-n}\left(H_x + H_m - H_{m-n}\right)
\]
\[
\sum_{k=0}^{n}\binom{n}{k}(-1)^k\binom{x+k}{m}H_{x+k} = (-1)^n\binom{x}{m-n}\left(H_x + H_m - H_{m-n}\right)
\]
\[
\sum_{k=0}^{n}\binom{n}{k}\binom{x}{m-k}\left(H_x + H_m - H_{m-k}\right) = \binom{x+n}{m}H_{x+n}
\]
and by performing the sums on the left containing H_x and H_m:
\[
\sum_{k=0}^{n}\binom{n}{k}\binom{x}{m-k}H_{m-k} = \binom{x+n}{m}\left(H_x + H_m - H_{x+n}\right)
\]
7) The function ln(x) can be inserted in this group:
\[
\Delta\ln x = \ln\frac{x+1}{x}
\qquad
\Delta^n \ln x = (-1)^n\sum_{k=0}^{n}\binom{n}{k}(-1)^k\ln(x+k)
\]
\[
\sum_{k=0}^{n}\sum_{j=0}^{k}(-1)^{k+j}\binom{n}{k}\binom{k}{j}\ln(x+j) = \ln(x+n)
\]
1) The exponential function p^x gives the partial sums of the binomial theorem:
\[
\Delta^n p^x = (p-1)^n p^x
\qquad
\sum_{k=0}^{n}\binom{n}{k}(-1)^k p^k = (1-p)^n
\qquad
\sum_{k=0}^{n}\binom{n}{k}(p-1)^k = p^n
\]
2) Two sums involving Fibonacci numbers:
\[
\Delta F_x = F_{x-1}
\qquad
\Delta^n F_x = F_{x-n}
\]
\[
\sum_{k=0}^{n}\binom{n}{k}(-1)^k F_{x+k} = (-1)^n F_{x-n}
\qquad
\sum_{k=0}^{n}\binom{n}{k}F_{x-k} = F_{x+n}
\]
3) Falling factorials are an introduction to binomial coefficients:
\[
\Delta x^{\underline{m}} = m\,x^{\underline{m-1}}
\qquad
\Delta^n x^{\underline{m}} = m^{\underline{n}}\,x^{\underline{m-n}}
\]
\[
\sum_{k=0}^{n}\binom{n}{k}(-1)^k (x+k)^{\underline{m}} = (-1)^n\, m^{\underline{n}}\, x^{\underline{m-n}}
\qquad
\sum_{k=0}^{n}\binom{n}{k}\, m^{\underline{k}}\, x^{\underline{m-k}} = (x+n)^{\underline{m}}
\]
4) Similar sums hold for raising factorials:
\[
\Delta x^{\overline{m}} = m\,(x+1)^{\overline{m-1}}
\qquad
\Delta^n x^{\overline{m}} = m^{\underline{n}}\,(x+n)^{\overline{m-n}}
\]
\[
\sum_{k=0}^{n}\binom{n}{k}(-1)^k (x+k)^{\overline{m}} = (-1)^n\, m^{\underline{n}}\,(x+n)^{\overline{m-n}}
\qquad
\sum_{k=0}^{n}\binom{n}{k}\, m^{\underline{k}}\,(x+k)^{\overline{m-k}} = (x+n)^{\overline{m}}
\]
5) Two sums involving the binomial coefficients:
\[
\Delta\binom{x}{m} = \binom{x}{m-1}
\qquad
\Delta^n\binom{x}{m} = \binom{x}{m-n}
\]
\[
\sum_{k=0}^{n}\binom{n}{k}(-1)^k\binom{x+k}{m} = (-1)^n\binom{x}{m-n}
\qquad
\sum_{k=0}^{n}\binom{n}{k}\binom{x}{m-k} = \binom{x+n}{m}
\]
6) Analogous formulas hold for binomial coefficients of the form \(\binom{p+x}{m+x}\):
\[
\Delta\binom{p+x}{m+x} = \binom{p+x}{m+x+1}
\qquad
\Delta^n\binom{p+x}{m+x} = \binom{p+x}{m+x+n}
\]
\[
\sum_{k=0}^{n}\binom{n}{k}(-1)^k\binom{p+k}{m+k} = (-1)^n\binom{p}{m+n}
\qquad
\sum_{k=0}^{n}\binom{n}{k}\binom{p}{m+k} = \binom{p+n}{m+n}
\]
7) Inverse binomial coefficients:
\[
\Delta\binom{x}{m}^{-1} = -\frac{m}{m+1}\binom{x+1}{m+1}^{-1}
\qquad
\Delta^n\binom{x}{m}^{-1} = (-1)^n\,\frac{m}{m+n}\binom{x+n}{m+n}^{-1}
\]
\[
\sum_{k=0}^{n}\binom{n}{k}(-1)^k\binom{x+k}{m}^{-1} = \frac{m}{m+n}\binom{x+n}{m+n}^{-1}
\qquad
\sum_{k=0}^{n}\binom{n}{k}(-1)^k\,\frac{m}{m+k}\binom{x+k}{m+k}^{-1} = \binom{x+n}{m}^{-1}
\]
8) Two sums with the central binomial coefficients:
\[
\Delta\left(\frac{1}{4^x}\binom{2x}{x}\right) = -\frac{1}{2(x+1)}\,\frac{1}{4^x}\binom{2x}{x}
\qquad
\Delta^n\left(\frac{1}{4^x}\binom{2x}{x}\right) = \frac{(-1)^n\,(2n)!}{n!\,4^n\,(x+1)\cdots(x+n)}\,\frac{1}{4^x}\binom{2x}{x}
\]
\[
\sum_{k=0}^{n}\binom{n}{k}(-1)^k\binom{2x+2k}{x+k}\frac{1}{4^k} = \frac{(2n)!}{n!\,4^n\,(x+1)\cdots(x+n)}\binom{2x}{x} = \frac{1}{4^n}\binom{2n}{n}\binom{x+n}{n}^{-1}\binom{2x}{x}
\]
\[
\sum_{k=0}^{n}\binom{n}{k}(-1)^k\,\frac{(2k)!}{k!\,4^k\,(x+1)\cdots(x+k)}\binom{2x}{x} = \frac{1}{4^n}\binom{2x+2n}{x+n}
\]
\[
\Delta\left(4^x\binom{2x}{x}^{-1}\right) = \frac{4^x}{2x+1}\binom{2x}{x}^{-1}
\qquad
\Delta^n\left(4^x\binom{2x}{x}^{-1}\right) = \frac{(-1)^{n-1}}{2n-1}\,\frac{(2n)!\,4^x}{2^n\,n!\,(2x+1)(2x+3)\cdots(2x+2n-1)}\binom{2x}{x}^{-1}
\]
\[
\sum_{k=0}^{n}\binom{n}{k}(-1)^k\,4^k\binom{2x+2k}{x+k}^{-1} = -\frac{1}{2n-1}\,\frac{(2n)!}{2^n\,n!\,(2x+1)(2x+3)\cdots(2x+2n-1)}\binom{2x}{x}^{-1}
\]
\[
\sum_{k=0}^{n}\binom{n}{k}\frac{(-1)^{k-1}}{2k-1}\,\frac{(2k)!}{2^k\,k!\,(2x+1)(2x+3)\cdots(2x+2k-1)} = 4^n\binom{2x+2n}{x+n}^{-1}\binom{2x}{x}
\]
6.10 The Addition Operator

The addition operator S is analogous to the difference operator:
\[
S = E + 1
\]
and in fact a simple connection exists between the two operators:
\[
S(-1)^x f(x) = (-1)^{x+1}f(x+1) + (-1)^x f(x) = (-1)^{x+1}\left(f(x+1) - f(x)\right) = (-1)^{x+1}\,\Delta f(x)
\]
Because of this connection, the addition operator has not been widely considered in the literature, and the symbol S is used here only for convenience. Like the difference operator, the addition operator can be iterated and often produces interesting combinatorial sums according to the rules:
\[
S^n = (E+1)^n = \sum_{k=0}^{n}\binom{n}{k}E^k
\qquad
E^n = (S-1)^n = \sum_{k=0}^{n}\binom{n}{k}(-1)^{n-k}S^k
\]
Some examples are in order here:
1) Fibonacci numbers are typical:
\[
SF_m = F_{m+1} + F_m = F_{m+2}
\qquad
S^n F_m = F_{m+2n}
\]
\[
\sum_{k=0}^{n}\binom{n}{k}F_{m+k} = F_{m+2n}
\qquad
\sum_{k=0}^{n}\binom{n}{k}(-1)^k F_{m+2k} = (-1)^n F_{m+n}
\]
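Both Fibonacci sums are easy to test by machine; the following sketch (our addition) verifies them for a range of m and n:

```python
from math import comb

# Fibonacci numbers F_0..F_61 (F_0 = 0, F_1 = 1)
F = [0, 1]
for _ in range(60):
    F.append(F[-1] + F[-2])

def s1(m, n):
    # sum_k binom(n,k) F_{m+k}  -- should equal F_{m+2n}
    return sum(comb(n, k) * F[m + k] for k in range(n + 1))

def s2(m, n):
    # sum_k binom(n,k) (-1)^k F_{m+2k}  -- should equal (-1)^n F_{m+n}
    return sum(comb(n, k) * (-1) ** k * F[m + 2 * k] for k in range(n + 1))

for m in range(0, 10):
    for n in range(0, 10):
        assert s1(m, n) == F[m + 2 * n]
        assert s2(m, n) == (-1) ** n * F[m + n]
print("addition-operator Fibonacci sums verified")
```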
2) Here are the binomial coefficients:
\[
S\binom{m}{x} = \binom{m}{x} + \binom{m}{x+1} = \binom{m+1}{x+1}
\qquad
S^n\binom{m}{x} = \binom{m+n}{x+n}
\]
\[
\sum_{k=0}^{n}\binom{n}{k}\binom{m}{x+k} = \binom{m+n}{x+n}
\qquad
\sum_{k=0}^{n}\binom{n}{k}(-1)^k\binom{m+k}{x+k} = (-1)^n\binom{m}{x+n}
\]
\[
S\binom{m}{x}^{-1} = \frac{m+1}{m}\binom{m-1}{x}^{-1}
\qquad
S^n\binom{m}{x}^{-1} = \frac{m+1}{m-n+1}\binom{m-n}{x}^{-1}
\]
\[
\sum_{k=0}^{n}\binom{n}{k}\binom{m}{x+k}^{-1} = \frac{m+1}{m-n+1}\binom{m-n}{x}^{-1}
\qquad
\sum_{k=0}^{n}\binom{n}{k}\frac{(-1)^k}{m-k+1}\binom{m-k}{x}^{-1} = \frac{(-1)^n}{m+1}\binom{m}{x+n}^{-1}.
\]
We can obviously invent as many expressions as we desire and, correspondingly, may obtain some summation formulas of combinatorial interest. For example:
\[
S\Delta = (E+1)(E-1) = E^2 - 1 = (E-1)(E+1) = \Delta S
\]
This derivation shows that the two operators S and Δ commute. We can directly verify this property:
\[
\Delta S f(x) = \Delta(f(x+1) + f(x)) = f(x+2) - f(x) = (E^2 - 1)f(x)
\]
\[
S\Delta f(x) = S(f(x+1) - f(x)) = f(x+2) - f(x) = (E^2 - 1)f(x)
\]
Consequently, we have the two summation formulas:
\[
\Delta^n S^n = (E^2-1)^n = \sum_{k=0}^{n}\binom{n}{k}(-1)^{n-k}E^{2k}
\qquad
E^{2n} = (\Delta S + 1)^n = \sum_{k=0}^{n}\binom{n}{k}\Delta^k S^k
\]
A simple example is offered by the Fibonacci numbers:
\[
\Delta S F_m = F_{m+2} - F_m = F_{m+1}
\qquad
(\Delta S)^n F_m = \Delta^n S^n F_m = F_{m+n}
\]
\[
\sum_{k=0}^{n}\binom{n}{k}(-1)^k F_{m+2k} = (-1)^n F_{m+n}
\qquad
\sum_{k=0}^{n}\binom{n}{k}F_{m+k} = F_{m+2n}
\]
but these identities have already been proved using the addition operator S.
6.11 Definite and Indefinite Summation

The following result is one of the most important rules connecting the finite operator method and combinatorial sums:
\[
\sum_{k=0}^{n}E^k = [z^n]\,\frac{1}{1-z}\,\frac{1}{1-Ez}
= [z^n]\,\frac{1}{(E-1)z}\left(\frac{1}{1-Ez} - \frac{1}{1-z}\right)
= \frac{1}{E-1}\,[z^{n+1}]\left(\frac{1}{1-Ez} - \frac{1}{1-z}\right)
= \frac{E^{n+1}-1}{E-1} = (E^{n+1}-1)\,\Delta^{-1}
\]
We observe that the operator E commutes with the indeterminate z, which is constant with respect to the variable x, on which E operates. The rule above is called the rule of definite summation; the operator Δ⁻¹ is called indefinite summation and is often denoted by Σ. In order to make this point clear, let us consider any function f(x) and suppose that a function g(x) exists such that Δg(x) = f(x). Hence we have Δ⁻¹f(x) = g(x), and the rule of definite summation immediately gives:
\[
\sum_{k=0}^{n} f(x+k) = g(x+n+1) - g(x)
\]
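As a concrete illustration (our addition, not in the original text), take f(x) = binom(x, m): Pascal's rule gives Δ binom(x, m+1) = binom(x, m), so g(x) = binom(x, m+1) and the rule yields a closed form for the definite sum:

```python
from math import comb

# f(x) = binom(x, m); g(x) = binom(x, m+1) satisfies
# Delta g(x) = g(x+1) - g(x) = binom(x, m) = f(x)  (Pascal's rule),
# hence  sum_{k=0}^{n} f(x+k) = g(x+n+1) - g(x).

def definite_sum(x, n, m):
    return sum(comb(x + k, m) for k in range(n + 1))

for x in range(0, 8):
    for n in range(0, 8):
        for m in range(0, 5):
            assert definite_sum(x, n, m) == comb(x + n + 1, m + 1) - comb(x, m + 1)
print("definite summation rule verified")
```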
This is analogous to the rule of definite integration. In fact, the operator of indefinite integration ∫·dx is the inverse of the differentiation operator D, and if f(x) is any function, a primitive function for f(x) is any function g(x) such that D g(x) = f(x), or D⁻¹f(x) = g(x); then:
\[
\int_a^b f(x)\,dx = g(b) - g(a)
\]
The formula for definite summation can be written in a similar way, if we consider the integer variable k and set a = x and b = x + n + 1:
\[
\sum_{k=a}^{b-1} f(k) = g(b) - g(a)
\]
These facts create an analogy between Δ⁻¹ and D⁻¹, or Σ and ∫.
m
= x
x
m+ 1
=
= x
x
m+ 1
(x)
x + 1
m+ 1
=
= x
x
m+ 1
x + 1
m+ 1
=
= x
x
m+ 1
x + 1
m+ 2
k=a
k
k
m
= (b + 1)
b + 1
m+ 1
a
m+ 1
b + 2
m+ 2
a + 1
m+ 2
k=0
k
k
m
= (n + 1)
n + 1
m+ 1
n + 2
m+ 2
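The closed form just obtained can be checked directly (our addition, not in the original text):

```python
from math import comb

def lhs(n, m):
    # sum_{k=0}^n k * binom(k, m)
    return sum(k * comb(k, m) for k in range(n + 1))

def rhs(n, m):
    return (n + 1) * comb(n + 1, m + 1) - comb(n + 2, m + 2)

for n in range(0, 12):
    for m in range(0, 6):
        assert lhs(n, m) == rhs(n, m)
print("sum of k*binom(k,m) verified")
```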
6.12 Definite Summation

The rule:
\[
\sum_{k=0}^{n}E^k = (E^{n+1}-1)\,\Delta^{-1} = (E^{n+1}-1)\,\Sigma
\]
is the most important result of the operator method. In fact, it reduces the sum of the successive elements in a sequence to the computation of the indefinite sum, and this is just the operator inverse of the difference. Unfortunately, Δ⁻¹ is not easy to compute and, apart from a restricted number of cases, there is no general rule allowing us to guess what Δ⁻¹f(x) = Σf(x) might be. In this rather pessimistic sense, the rule is very fine, very general and completely useless.
However, from a more positive point of view, we can say that whenever we know, in some way or another, an expression for Δ⁻¹f(x) = Σf(x), we have solved the problem of finding Σ_{k=0}^n f(x+k). For example, we can look at the differences computed in the previous sections and, for each of them, obtain the Σ of some function; in this way we immediately have a number of sums. The negative point is that, sometimes, we do not have a simple function and, therefore, the sum may not have any combinatorial interest.
Here are a number of identities obtained from our previous computations.
1) We have again the partial sums of the geometric series:
\[
\Delta^{-1} p^x = \frac{p^x}{p-1} = \Sigma\,p^x
\]
\[
\sum_{k=0}^{n} p^{x+k} = \frac{p^{x+n+1} - p^x}{p-1}
\qquad
\sum_{k=0}^{n} p^{k} = \frac{p^{n+1} - 1}{p-1}\quad(x=0)
\]
2) The sum of consecutive Fibonacci numbers:
\[
\Delta^{-1} F_x = F_{x+1} = \Sigma\,F_x
\]
\[
\sum_{k=0}^{n} F_{x+k} = F_{x+n+2} - F_{x+1}
\qquad
\sum_{k=0}^{n} F_{k} = F_{n+2} - 1\quad(x=0)
\]
3) The sum of consecutive binomial coefficients with constant denominator:
\[
\Delta^{-1}\binom{x}{m} = \binom{x}{m+1} = \Sigma\binom{x}{m}
\]
\[
\sum_{k=0}^{n}\binom{x+k}{m} = \binom{x+n+1}{m+1} - \binom{x}{m+1}
\qquad
\sum_{k=0}^{n}\binom{k}{m} = \binom{n+1}{m+1}\quad(x=0)
\]
4) The sum of consecutive binomial coefficients:
\[
\Delta^{-1}\binom{p+x}{m+x} = \binom{p+x}{m+x-1} = \Sigma\binom{p+x}{m+x}
\]
\[
\sum_{k=0}^{n}\binom{p+k}{m+k} = \binom{p+n+1}{m+n} - \binom{p}{m-1}
\]
5) The sum of falling factorials:
\[
\Delta^{-1} x^{\underline{m}} = \frac{x^{\underline{m+1}}}{m+1} = \Sigma\, x^{\underline{m}}
\]
\[
\sum_{k=0}^{n}(x+k)^{\underline{m}} = \frac{(x+n+1)^{\underline{m+1}} - x^{\underline{m+1}}}{m+1}
\qquad
\sum_{k=0}^{n}k^{\underline{m}} = \frac{(n+1)^{\underline{m+1}}}{m+1}\quad(x=0).
\]
6) The sum of raising factorials:
\[
\Delta^{-1} x^{\overline{m}} = \frac{(x-1)^{\overline{m+1}}}{m+1} = \Sigma\, x^{\overline{m}}
\]
\[
\sum_{k=0}^{n}(x+k)^{\overline{m}} = \frac{(x+n)^{\overline{m+1}} - (x-1)^{\overline{m+1}}}{m+1}
\qquad
\sum_{k=0}^{n}k^{\overline{m}} = \frac{n^{\overline{m+1}}}{m+1}\quad(x=0).
\]
7) The sum of inverse binomial coefficients:
\[
\Delta^{-1}\binom{x}{m}^{-1} = -\frac{m}{m-1}\binom{x-1}{m-1}^{-1} = \Sigma\binom{x}{m}^{-1}
\]
\[
\sum_{k=0}^{n}\binom{x+k}{m}^{-1} = \frac{m}{m-1}\left(\binom{x-1}{m-1}^{-1} - \binom{x+n}{m-1}^{-1}\right).
\]
8) The sum of harmonic numbers. Since Δ⁻¹1 = x, we have:
\[
\Delta^{-1} H_x = xH_x - x = \Sigma\, H_x
\]
\[
\sum_{k=0}^{n}H_{x+k} = (x+n+1)H_{x+n+1} - xH_x - (n+1)
\]
\[
\sum_{k=0}^{n}H_{k} = (n+1)H_{n+1} - (n+1) = (n+1)H_n - n\quad(x=0).
\]
6.13 The Euler-McLaurin Summation Formula

One of the most striking applications of the finite operator method is the formal proof of the Euler-McLaurin summation formula. The starting point is the Taylor theorem for the series expansion of a function f(x) ∈ C^∞:
\[
f(x+h) = f(x) + \frac{h}{1!}f'(x) + \frac{h^2}{2!}f''(x) + \cdots + \frac{h^n}{n!}f^{(n)}(x) + \cdots
\]
This can be interpreted in the sense of operators as a result connecting the shift and the differentiation operators. In fact, for h = 1, it can be written as:
\[
Ef(x) = If(x) + \frac{Df(x)}{1!} + \frac{D^2 f(x)}{2!} + \cdots
\]
and therefore as a relation between operators:
\[
E = 1 + \frac{D}{1!} + \frac{D^2}{2!} + \cdots + \frac{D^n}{n!} + \cdots = e^D
\]
This formal identity relates the finite operator E and the infinitesimal operator D, and subtracting 1 from both sides it can be formulated as:
\[
\Delta = e^D - 1
\]
By inverting, we have a formula for the Σ operator:
\[
\Sigma = \frac{1}{e^D - 1} = \frac{1}{D}\,\frac{D}{e^D - 1} = \frac{1}{D}\left(B_0 + \frac{B_1}{1!}D + \frac{B_2}{2!}D^2 + \cdots\right) =
\]
\[
= D^{-1} - \frac{1}{2}I + \frac{1}{12}D - \frac{1}{720}D^3 + \frac{1}{30240}D^5 - \frac{1}{1209600}D^7 + \cdots.
\]
This is not a series development since, as we know, the Bernoulli numbers diverge to infinity. We have a case of asymptotic development, which is only defined when we consider a limited number of terms, but in general diverges if we let the number of terms go to infinity. The number of terms for which the sum approaches its true value depends on the function f(x) and on the argument x.
From the indefinite sum we can pass to the definite sum by applying the general rule of Section 6.12. Since D⁻¹ = ∫, we obtain:
\[
\sum_{k=0}^{n-1} f(k) = \int_0^n f(x)\,dx - \frac{1}{2}\Big[f(x)\Big]_0^n + \frac{1}{12}\Big[f'(x)\Big]_0^n - \frac{1}{720}\Big[f'''(x)\Big]_0^n + \cdots
\]
and this is the celebrated Euler-McLaurin summation formula. It expresses a sum as a function of the integral and the successive derivatives of the function f(x). In this sense, the formula can be seen as a method for approximating a sum by means of an integral or, vice versa, for approximating an integral by means of a sum, and this was just the point of view of the mathematicians who first developed it.
As a simple but very important example, let us find an asymptotic development for the harmonic numbers H_n. Since H_n = H_{n−1} + 1/n, the Euler-McLaurin formula applies to H_{n−1} and to the function f(x) = 1/x, giving:
\[
H_{n-1} = \int_1^n \frac{dx}{x} - \frac{1}{2}\left[\frac{1}{x}\right]_1^n - \frac{1}{12}\left[\frac{1}{x^2}\right]_1^n + \frac{1}{720}\left[\frac{6}{x^4}\right]_1^n - \frac{1}{30240}\left[\frac{120}{x^6}\right]_1^n + \cdots
\]
\[
= \ln n - \frac{1}{2n} + \frac{1}{2} - \frac{1}{12n^2} + \frac{1}{12} + \frac{1}{120n^4} - \frac{1}{120} - \frac{1}{252n^6} + \frac{1}{252} + \cdots
\]
In this expression a number of constants appears, and they can be summed together to form a constant γ, provided that the sum actually converges. However, we observe that as n → ∞ this constant is the Euler-Mascheroni constant:
\[
\lim_{n\to\infty}\left(H_{n-1} - \ln n\right) = \gamma = 0.577215664902\ldots
\]
By adding 1/n to both sides of the previous relation, we eventually find:
\[
H_n = \ln n + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} - \frac{1}{252n^6} + \cdots
\]
and this is the asymptotic expansion we were looking for.
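The quality of the truncated expansion is easy to observe numerically; this sketch (our addition) compares it against a direct summation of H_n:

```python
from math import log

def H(n):
    # harmonic number by direct summation
    return sum(1.0 / k for k in range(1, n + 1))

gamma = 0.5772156649015329  # Euler-Mascheroni constant

def H_asym(n):
    # ln n + gamma + 1/(2n) - 1/(12 n^2) + 1/(120 n^4) - 1/(252 n^6)
    return (log(n) + gamma + 1 / (2 * n) - 1 / (12 * n ** 2)
            + 1 / (120 * n ** 4) - 1 / (252 * n ** 6))

for n in (10, 100, 1000):
    assert abs(H(n) - H_asym(n)) < 1e-9
print("Euler-McLaurin expansion of H_n verified")
```

Already at n = 10 the truncated expansion agrees with the exact value to about ten decimal digits, which illustrates the remark above: a few terms of the (divergent) asymptotic series give excellent accuracy.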
6.14 Applications of the Euler-McLaurin Formula

As another application of the Euler-McLaurin summation formula, we now show the derivation of the Stirling approximation for n!. The first step consists in taking the logarithm of that quantity:
\[
\ln n! = \ln 1 + \ln 2 + \ln 3 + \cdots + \ln n
\]
so that we are reduced to computing a sum and hence to applying the Euler-McLaurin formula:
\[
\ln(n-1)! = \sum_{k=1}^{n-1}\ln k = \int_1^n \ln x\,dx - \frac{1}{2}\Big[\ln x\Big]_1^n + \frac{1}{12}\left[\frac{1}{x}\right]_1^n - \frac{1}{720}\left[\frac{2}{x^3}\right]_1^n + \cdots =
\]
\[
= n\ln n - n + 1 - \frac{1}{2}\ln n + \frac{1}{12n} - \frac{1}{12} - \frac{1}{360n^3} + \frac{1}{360} + \cdots.
\]
Here we have used the fact that ∫ln x dx = x ln x − x. At this point we can add ln n to both sides and introduce a constant σ = 1 − 1/12 + 1/360 − ···. It is by no means easy to determine directly the value of σ, but by other approaches to the same problem it is known that σ = ln √(2π). Numerically, we can observe that:
\[
1 - \frac{1}{12} + \frac{1}{360} = 0.919\overline{4}
\qquad\text{and}\qquad
\ln\sqrt{2\pi} \approx 0.9189388.
\]
We can now go on with our sum:
\[
\ln n! = n\ln n - n + \frac{1}{2}\ln n + \ln\sqrt{2\pi} + \frac{1}{12n} - \frac{1}{360n^3} + \cdots
\]
To obtain the value of n! we only have to take exponentials:
\[
n! = \frac{n^n}{e^n}\sqrt{2\pi n}\,\exp\!\left(\frac{1}{12n}\right)\exp\!\left(-\frac{1}{360n^3}\right)\cdots
\approx \sqrt{2\pi n}\,\frac{n^n}{e^n}\left(1 + \frac{1}{12n} + \frac{1}{288n^2} + \cdots\right)\left(1 - \frac{1}{360n^3} + \cdots\right) =
\]
\[
= \sqrt{2\pi n}\,\frac{n^n}{e^n}\left(1 + \frac{1}{12n} + \frac{1}{288n^2} + \cdots\right).
\]
This is the well-known Stirling approximation for n!. By means of this approximation, we can also find the approximation for another important quantity:
\[
\binom{2n}{n} = \frac{(2n)!}{(n!)^2} \approx
\frac{\dfrac{4^n n^{2n}}{e^{2n}}\sqrt{4\pi n}\left(1 + \dfrac{1}{24n} + \dfrac{1}{1152n^2}\right)}
{\dfrac{n^{2n}}{e^{2n}}\,2\pi n\left(1 + \dfrac{1}{12n} + \dfrac{1}{288n^2}\right)^{2}} =
\frac{4^n}{\sqrt{\pi n}}\left(1 - \frac{1}{8n} + \frac{1}{128n^2} + \cdots\right).
\]
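The central binomial approximation can be tested directly; the following sketch (our addition) checks that the relative error decays like the first omitted term:

```python
from math import comb, pi, sqrt

def approx(n):
    # 4^n / sqrt(pi n) * (1 - 1/(8n) + 1/(128 n^2))
    return 4 ** n / sqrt(pi * n) * (1 - 1 / (8 * n) + 1 / (128 * n ** 2))

for n in (10, 50, 200):
    rel_err = abs(comb(2 * n, n) - approx(n)) / comb(2 * n, n)
    # the next term of the expansion is O(1/n^3)
    assert rel_err < 1.0 / n ** 3
print("central binomial approximation verified")
```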
Another application of the Euler-McLaurin summation formula is given by the sum Σ_{k=1}^n k^p, where p is any constant different from −1, which is the case of the harmonic numbers:
\[
\sum_{k=0}^{n-1} k^p = \int_0^n x^p\,dx - \frac{1}{2}\Big[x^p\Big]_0^n + \frac{1}{12}\Big[p\,x^{p-1}\Big]_0^n - \cdots
= \frac{n^{p+1}}{p+1} - \frac{n^p}{2} + \frac{p\,n^{p-1}}{12} - \cdots
\]
\[
\sum_{k=0}^{n} k^p = \frac{n^{p+1}}{p+1} + \frac{n^p}{2} + \frac{p\,n^{p-1}}{12} - \cdots
\]
When p is not a positive integer, it is convenient to start the sum at k = 1:
\[
\sum_{k=1}^{n-1} k^p = \int_1^n x^p\,dx - \frac{1}{2}\Big[x^p\Big]_1^n + \frac{1}{12}\Big[p\,x^{p-1}\Big]_1^n - \frac{1}{720}\Big[p(p-1)(p-2)\,x^{p-3}\Big]_1^n + \cdots =
\]
\[
= \frac{n^{p+1}}{p+1} - \frac{1}{p+1} - \frac{n^p}{2} + \frac{1}{2} + \frac{p\,n^{p-1}}{12} - \frac{p}{12} - \frac{p(p-1)(p-2)\,n^{p-3}}{720} +
\]
\[
+ \frac{p(p-1)(p-2)}{720} + \cdots
\]
By collecting the constant terms into a single constant K_p and adding n^p to both sides, we get:
\[
\sum_{k=1}^{n} k^p = \frac{n^{p+1}}{p+1} + \frac{n^p}{2} + \frac{p\,n^{p-1}}{12} + \cdots + K_p
\]
For p = 1/2 we find:
\[
\sum_{k=1}^{n}\sqrt{k} = \frac{2}{3}\,n\sqrt{n} + \frac{\sqrt{n}}{2} + K_{1/2} + \frac{1}{24\sqrt{n}} + \cdots
\qquad
K_{1/2} \approx -0.2078862\ldots
\]
For p = −1/2 we have instead:
\[
\sum_{k=1}^{n}\frac{1}{\sqrt{k}} = 2\sqrt{n} + K_{-1/2} + \frac{1}{2\sqrt{n}} - \frac{1}{24\,n\sqrt{n}} + \cdots
\qquad
K_{-1/2} \approx -1.4603545\ldots
\]
For p = −2 we find:
\[
\sum_{k=1}^{n}\frac{1}{k^2} = K_2 - \frac{1}{n} + \frac{1}{2n^2} - \frac{1}{6n^3} + \frac{1}{30n^5} - \cdots
\]
It is possible to show that K_2 = π²/6, and therefore we have a way to approximate the sum (see Section 2.7).
Chapter 7
Asymptotics
7.1 The convergence of power series

On many occasions, we have pointed out that our approach to power series was purely formal. Because of that, we always spoke of "formal power series", and never considered convergence problems. As we have seen, a lot of things can be said about formal power series, but now the moment has arrived to talk about the convergence of power series. We will see that this allows us to evaluate the asymptotic behavior of the coefficients f_n of a power series Σ_n f_n tⁿ, thus solving many problems in which the exact value of f_n cannot be found. In fact, many times, the asymptotic evaluation of f_n can be made more precise and an actual approximation of f_n can be found.
The natural setting for talking about convergence is the field C of the complex numbers and therefore, from now on, we will think of the indeterminate t as a variable taking its values from C. Obviously, a power series f(t) = Σ_n f_n tⁿ converges for some t₀ ∈ C iff |Σ_n f_n t₀ⁿ| < ∞, and diverges iff lim_{t→t₀} f(t) = ∞. There are cases in which a series neither converges nor diverges; for example, when t = 1, the series Σ_n (−1)ⁿtⁿ does not tend to any limit, finite or infinite. Therefore, when we say that a series does not converge (to a finite value) for a given value t₀ ∈ C, we mean that the series in t₀ diverges or does not tend to any limit.
A basic result on convergence is given by the following:

Theorem 7.1.1 Let f(t) = Σ_n f_n tⁿ be a power series such that f(t₀) converges for the value t₀ ∈ C. Then f(t) converges for every t₁ ∈ C such that |t₁| < |t₀|.
Proof: If f(t₀) < ∞ then an index N ∈ N exists such that for every n > N we have |f_n t₀ⁿ| = |f_n||t₀|ⁿ < M, for some finite M ∈ R. This means |f_n| < M/|t₀|ⁿ and therefore:
\[
\sum_{n=N}^{\infty}\left|f_n t_1^n\right| \le \sum_{n=N}^{\infty}|f_n|\,|t_1|^n \le \sum_{n=N}^{\infty}\frac{M}{|t_0|^n}\,|t_1|^n = M\sum_{n=N}^{\infty}\left(\frac{|t_1|}{|t_0|}\right)^{n} < \infty
\]
because the last sum is a geometric series with |t₁|/|t₀| < 1 by the hypothesis |t₁| < |t₀|. Since the first N terms obviously amount to a finite quantity, the theorem follows.
In a similar way, we can prove that if the series diverges for some value t₀ ∈ C, then it diverges for every value t₁ such that |t₁| > |t₀|. Obviously, a series can converge for the single value t₀ = 0, as happens for Σ_n n! tⁿ, or can converge for every value t ∈ C, as for Σ_n tⁿ/n! = e^t. In all the other cases, the previous theorem implies:
the previous theorem implies:
Theorem 7.1.2 Let f(t) =
n
f
n
t
n
be a power se-
ries; then there exists a non-negative number R R
or R = such that:
1. for every complex number t
0
such that [t
0
[ < R
the series (absolutely) converges and, in fact, the
convergence is uniform in every circle of radius
< R;
2. for every complex number t
0
such that [t
0
[ > R
the series does not converge.
The uniform convergence derives from the previous
proof: the constant M can be made unique by choos-
ing the largest value for all the t
0
such that [t
0
[ .
The value of R is uniquely determined and is called the radius of convergence of the series. From the proof of the theorem, for r < R we have |f_n|rⁿ ≤ M, so that limsup_n ⁿ√|f_n| ≤ 1/R. Besides, for r > R we have |f_n|rⁿ ≥ 1 for infinitely many n; this implies limsup_n ⁿ√|f_n| ≥ 1/R, and therefore we have the following formula for the radius of convergence:
\[
\frac{1}{R} = \limsup_{n\to\infty}\sqrt[n]{|f_n|}.
\]
This result is the basis for our considerations on the asymptotics of power-series coefficients. In fact, it implies that, as a first approximation, |f_n| grows as 1/Rⁿ. However, this is a rough estimate, because the coefficient can also grow as n/Rⁿ or 1/(nRⁿ), and many possibilities arise which can make the basic approximation more precise; the next sections will be dedicated to this problem. We conclude by noticing that if:
\[
\lim_{n\to\infty}\frac{f_{n+1}}{f_n} = S
\]
then R = 1/S is the radius of convergence of the series.
7.2 The method of Darboux

Newton's rule is the basis for many considerations on asymptotics. In practice, we used it to prove that F_n ∼ φⁿ/√5, and to study the Motzkin numbers, whose generating function is μ(t) = (1 − t − √(1−2t−3t²))/(2t²). For n ≥ 2 we obviously have:
\[
\mu_n = [t^n]\,\frac{1 - t - \sqrt{1-2t-3t^2}}{2t^2} = -\frac{1}{2}\,[t^{n+2}]\,\sqrt{1+t}\,(1-3t)^{1/2}.
\]
We now observe that the radius of convergence of μ(t) is R = 1/3, which is the same as the radius of g(t) = (1−3t)^{1/2}, while h(t) = √(1+t) has 1 as radius of convergence; therefore we have μ_n/μ_{n+1} → 1/3 as n → ∞. By Bender's theorem we find:
\[
\mu_n \approx -\frac{1}{2}\sqrt{\frac{4}{3}}\;[t^{n+2}]\,(1-3t)^{1/2} = -\frac{\sqrt{3}}{3}\binom{1/2}{n+2}(-3)^{n+2} = \frac{\sqrt{3}}{3(2n+3)}\binom{2n+4}{n+2}\left(\frac{3}{4}\right)^{n+2}.
\]
This is a particular case of a more general result due to Darboux and known as Darboux' method. First of all, let us show how it is possible to obtain an approximation for the binomial coefficient binom(γ, n), when γ ∈ C is a fixed number and n is large. We begin by proving the following formula for the ratio of two large values of the Γ function (a, b are two small parameters with respect to n):
\[
\frac{\Gamma(n+a)}{\Gamma(n+b)} = n^{a-b}\left(1 + \frac{(a-b)(a+b-1)}{2n} + O\!\left(\frac{1}{n^2}\right)\right).
\]
Let us apply the Stirling formula for the Γ function:
\[
\frac{\Gamma(n+a)}{\Gamma(n+b)} \approx
\frac{\sqrt{\dfrac{2\pi}{n+a}}\left(\dfrac{n+a}{e}\right)^{n+a}\left(1+\dfrac{1}{12(n+a)}\right)}
{\sqrt{\dfrac{2\pi}{n+b}}\left(\dfrac{n+b}{e}\right)^{n+b}\left(1+\dfrac{1}{12(n+b)}\right)}.
\]
If we limit ourselves to the term in 1/n, the two corrections cancel each other and therefore we find:
\[
\frac{\Gamma(n+a)}{\Gamma(n+b)} \approx \sqrt{\frac{n+b}{n+a}}\;e^{b-a}\,\frac{(n+a)^{n+a}}{(n+b)^{n+b}} = \sqrt{\frac{n+b}{n+a}}\;e^{b-a}\,n^{a-b}\,\frac{(1+a/n)^{n+a}}{(1+b/n)^{n+b}}.
\]
We now obtain asymptotic approximations in the following way:
\[
\sqrt{\frac{n+b}{n+a}} = \sqrt{\frac{1+b/n}{1+a/n}} \approx \left(1+\frac{b}{2n}\right)\left(1-\frac{a}{2n}\right) \approx 1 + \frac{b-a}{2n}
\]
\[
\left(1+\frac{x}{n}\right)^{n+x} = \exp\!\left((n+x)\ln\!\left(1+\frac{x}{n}\right)\right) = \exp\!\left((n+x)\left(\frac{x}{n} - \frac{x^2}{2n^2} + \cdots\right)\right) = \exp\!\left(x + \frac{x^2}{n} - \frac{x^2}{2n} + \cdots\right) = e^x\left(1 + \frac{x^2}{2n} + \cdots\right).
\]
Therefore, for our expression we have:
\[
\frac{\Gamma(n+a)}{\Gamma(n+b)} \approx n^{a-b}\,e^{b-a}\,\frac{e^a}{e^b}\left(1+\frac{a^2}{2n}\right)\left(1-\frac{b^2}{2n}\right)\left(1+\frac{b-a}{2n}\right)
= n^{a-b}\left(1 + \frac{a^2 - b^2 - a + b}{2n} + O\!\left(\frac{1}{n^2}\right)\right).
\]
We are now in a position to prove the following:
Theorem 7.2.2 Let f(t) = h(t)(1 − αt)^γ, for some γ which is not a positive integer, and h(t) having a radius of convergence larger than 1/α. Then we have:
\[
f_n = [t^n]f(t) \approx h\!\left(\frac{1}{\alpha}\right)\binom{\gamma}{n}(-\alpha)^n = \frac{h(1/\alpha)\,\alpha^n}{\Gamma(-\gamma)\,n^{1+\gamma}}.
\]
Proof: We simply apply Bender's theorem and the formula for approximating the binomial coefficient:
\[
\binom{\gamma}{n} = \frac{\gamma(\gamma-1)\cdots(\gamma-n+1)}{n!} = \frac{(-1)^n\,(n-\gamma-1)(n-\gamma-2)\cdots(1-\gamma)(-\gamma)}{\Gamma(n+1)}.
\]
By repeated applications of the recurrence formula for the Γ function, Γ(x+1) = xΓ(x), we find:
\[
\Gamma(n-\gamma) = (n-\gamma-1)(n-\gamma-2)\cdots(1-\gamma)(-\gamma)\,\Gamma(-\gamma)
\]
and therefore:
\[
\binom{\gamma}{n} = \frac{(-1)^n\,\Gamma(n-\gamma)}{\Gamma(n+1)\,\Gamma(-\gamma)} = \frac{(-1)^n}{\Gamma(-\gamma)}\,n^{-1-\gamma}\left(1 + \frac{\gamma(\gamma+1)}{2n} + \cdots\right)
\]
7.3 Singularities: poles

Let us consider the geometric series f̂(t) = 1 + t + t² + t³ + ···. The series f̂(t) represents the function f(t) = 1/(1−t) inside the circle of convergence, in the sense that for every t' with |t'| < 1 we have f(t') = f̂(t'). Therefore, t₀ = 1 is a singularity for f(t) = 1/(1−t). Because of our previous considerations, the singularities of f(t) determine its radius of convergence; on the other hand, no singularity can be contained in the circle of convergence, and therefore the radius of convergence is determined by the singularity or singularities of smallest modulus. These will be called dominating singularities and we observe explicitly that a function can have more than one dominating singularity. For example, f(t) = 1/(1−t²) has t = 1 and t = −1 as dominating singularities, because |1| = |−1|. The radius of convergence is always a non-negative real number and we have R = |t₀|, if t₀ is any one of the dominating singularities of f(t).
An isolated point t₀ for which f(t₀) = ∞ is therefore a singularity for f(t); as we shall see, not every singularity of f(t) is such that f(t₀) = ∞, but, for the moment, let us limit ourselves to this case. The following situation is very important: if f(t₀) = ∞ and we set α = 1/t₀, we will say that t₀ is a pole for f(t) iff there exists a positive integer m such that:
\[
\lim_{t\to t_0}(1-\alpha t)^m f(t) = K < \infty \quad\text{and}\quad K \ne 0.
\]
The integer m is called the order of the pole. By this definition, the function f(t) = 1/(1−t) has a pole of order 1 at t₀ = 1, while 1/(1−t)² has a pole of order 2 at t₀ = 1 and 1/(1−2t)⁵ has a pole of order 5 at t₀ = 1/2. A more interesting case is f(t) = (e^t − e)/(1−t)², which, notwithstanding the (1−t)², has a pole of order 1 at t₀ = 1; in fact:
\[
\lim_{t\to1}(1-t)\,\frac{e^t - e}{(1-t)^2} = \lim_{t\to1}\frac{e^t - e}{1-t} = \lim_{t\to1}\frac{e^t}{-1} = -e.
\]
The generating function of the Bernoulli numbers, f(t) = t/(e^t − 1), has infinitely many poles. Observe first that t = 0 is not a pole because:
\[
\lim_{t\to0}\frac{t}{e^t-1} = \lim_{t\to0}\frac{1}{e^t} = 1.
\]
The denominator becomes 0 when e^t = 1, and this happens when t = 2kπi; in fact, e^{2kπi} = cos 2kπ + i sin 2kπ = 1. In that case, the dominating singularities are t₀ = 2πi and t₁ = −2πi. Finally, the generating function of the ordered Bell numbers f(t) = 1/(2 − e^t) has again an infinite number of poles t = ln 2 + 2kπi; in this case the dominating singularity is t₀ = ln 2.
We conclude this section by observing that if f(t₀) = ∞, t₀ is not necessarily a pole for f(t). In fact, let us consider the generating function for the central binomial coefficients f(t) = 1/√(1−4t). For t₀ = 1/4 we have f(1/4) = ∞, but t₀ is not a pole of order 1 because:
\[
\lim_{t\to1/4}\frac{1-4t}{\sqrt{1-4t}} = \lim_{t\to1/4}\sqrt{1-4t} = 0
\]
and the same happens if we try with (1−4t)^m for m > 1. As we shall see, this kind of singularity is called algebraic. Finally, let us consider the function f(t) = exp(1/(1−t)), which goes to ∞ as t → 1. In this case we have:
\[
\lim_{t\to1}(1-t)^m\exp\!\left(\frac{1}{1-t}\right) = \lim_{t\to1}(1-t)^m\left(1 + \frac{1}{1-t} + \frac{1}{2(1-t)^2} + \cdots\right) = \infty.
\]
In fact, the terms with index k < m tend to 0, the term with k = m tends to 1/m!, but all the other terms go to ∞. Therefore, t₀ = 1 is not a pole of any order. Whenever we have a function f(t) for which a point t₀ ∈ C exists such that, for every m > 0, lim_{t→t₀}(1 − t/t₀)^m f(t) = ∞, we say that t₀ is an essential singularity for f(t). Essential singularities are points at which f(t) goes to ∞ too fast; these singularities cannot be treated by Darboux' method and their study will be delayed until we study Hayman's method.
7.4 Poles and asymptotics

Darboux' method can be easily used to deal with functions whose dominating singularities are poles. Actually, a direct application of Bender's theorem is sufficient, and this is the way we will use in the following examples.
Fibonacci numbers are easily approximated:
\[
F_n = [t^n]\,\frac{t}{1-t-t^2} = [t^n]\,\frac{1}{\sqrt5}\left(\frac{1}{1-\phi t} - \frac{1}{1-\hat\phi t}\right) \approx \frac{1}{\sqrt5}\,[t^n]\,\frac{1}{1-\phi t} = \frac{\phi^n}{\sqrt5}.
\]
Our second example concerns a particular kind of permutations, called derangements (see Section 2.2). A derangement is a permutation without any fixed point. For n = 0 the empty permutation is considered a derangement, since no fixed point exists. For n = 1, there is no derangement, but for n = 2 the permutation (1 2), written in cycle notation, is actually a derangement. For n = 3 we have the two derangements (1 2 3) and (1 3 2), and for n = 4 we have a total of 9 derangements.
Let D_n be the number of derangements in P_n; we can count them in the following way: we begin by subtracting from n!, the total number of permutations, the number of permutations having at least a fixed point: if the fixed point is 1, we have (n−1)! possible permutations; if the fixed point is 2, we have again (n−1)! permutations of the other elements. Therefore, we have a total of n(n−1)! cases, giving the approximation:
\[
D_n = n! - n(n-1)!.
\]
This quantity is clearly 0, and this happens because we have subtracted twice every permutation with at least 2 fixed points: in fact, we subtracted it when we considered the first and the second fixed point. Therefore, we now have to add back permutations with at least two fixed points. These are obtained by choosing the two fixed points in all the binom(n, 2) possible ways and then permuting the n − 2 remaining elements. Thus we have the new approximation:
\[
D_n = n! - n(n-1)! + \binom{n}{2}(n-2)!.
\]
In this way, however, we added twice every permutation with at least three fixed points, which has to be subtracted again. We thus obtain:
\[
D_n = n! - n(n-1)! + \binom{n}{2}(n-2)! - \binom{n}{3}(n-3)!.
\]
We can now go on with the same method, which is called the inclusion-exclusion principle, and eventually arrive at the final value:
\[
D_n = n! - n(n-1)! + \binom{n}{2}(n-2)! - \binom{n}{3}(n-3)! + \cdots
= \frac{n!}{0!} - \frac{n!}{1!} + \frac{n!}{2!} - \frac{n!}{3!} + \cdots
= n!\sum_{k=0}^{n}\frac{(-1)^k}{k!}.
\]
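The inclusion-exclusion formula is easy to evaluate exactly; this sketch (our addition) checks it against the small values quoted above and against the nearest-integer-to-n!/e rule that the text establishes shortly:

```python
from math import e, factorial

def D(n):
    # inclusion-exclusion: n! * sum_k (-1)^k / k!, computed with exact integers
    return sum((-1) ** k * factorial(n) // factorial(k) for k in range(n + 1))

# the first values quoted in the text: D_0..D_4
assert [D(n) for n in range(5)] == [1, 0, 1, 2, 9]

# D_n is the integer nearest to n!/e
for n in range(1, 15):
    assert D(n) == round(factorial(n) / e)
print("derangement counts verified")
```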
This formula checks with the previously found values. We obtain the exponential generating function G(D_n/n!) by observing that the generic element in the sum is the coefficient [tⁿ]e^{−t}, and therefore by the theorem on the generating function for the partial sums we have:
\[
G\!\left(\frac{D_n}{n!}\right) = \frac{e^{-t}}{1-t}.
\]
In order to find the asymptotic value for D_n, we observe that the radius of convergence of 1/(1−t) is 1, while e^{−t} converges for every value of t. By Bender's theorem we have:
\[
\frac{D_n}{n!} \approx e^{-1}
\qquad\text{or}\qquad
D_n \approx \frac{n!}{e}.
\]
This value is indeed a very good approximation for D_n, which can actually be computed as the integer nearest to n!/e.
Let us now see how Bender's theorem is applied to the exponential generating function of the ordered Bell numbers. We have shown that the dominating singularity is a pole at t = ln 2, which has order 1:
\[
\lim_{t\to\ln2}\frac{1 - t/\ln2}{2 - e^t} = \lim_{t\to\ln2}\frac{-1/\ln2}{-e^t} = \frac{1}{2\ln2}.
\]
At this point we have:
\[
[t^n]\,\frac{1}{2-e^t} = [t^n]\,\frac{1}{1-t/\ln2}\cdot\frac{1-t/\ln2}{2-e^t} \approx \frac{1}{2\ln2}\,[t^n]\,\frac{1}{1-t/\ln2} = \frac{1}{2}\,\frac{1}{(\ln2)^{n+1}}
\]
and we conclude with the very good approximation O_n ≈ n!/(2(ln 2)^{n+1}).
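How good this approximation is can be seen by computing the ordered Bell numbers from their standard recurrence (our addition, not in the original text):

```python
from math import comb, factorial, log

# ordered Bell numbers via O_n = sum_{k>=1} binom(n,k) O_{n-k}, O_0 = 1
O = [1]
for n in range(1, 16):
    O.append(sum(comb(n, k) * O[n - k] for k in range(1, n + 1)))

def approx(n):
    # n! / (2 (ln 2)^(n+1))
    return factorial(n) / (2 * log(2) ** (n + 1))

for n in range(5, 16):
    rel_err = abs(O[n] - approx(n)) / O[n]
    assert rel_err < 1e-3
print("ordered Bell asymptotics verified")
```

The relative error is already below 0.1% for quite small n, because the next singularities of 1/(2 − e^t), at ln 2 ± 2πi, are much farther from the origin than ln 2.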
Finally, we find the asymptotic approximation for the Bernoulli numbers. The following statement is very important when we have functions with several dominating singularities:
Principle: If t₁, t₂, ..., t_k are all the dominating singularities of a function f(t), then [tⁿ]f(t) can be found by summing all the contributions obtained by independently considering the k singularities.
We already observed that ±2πi are the two dominating singularities for the generating function of the Bernoulli numbers; they are both poles of order 1:
\[
\lim_{t\to2\pi i}\frac{t\,(1 - t/2\pi i)}{e^t - 1} = \lim_{t\to2\pi i}\frac{1 - t/\pi i}{e^t} = -1
\qquad
\lim_{t\to-2\pi i}\frac{t\,(1 + t/2\pi i)}{e^t - 1} = \lim_{t\to-2\pi i}\frac{1 + t/\pi i}{e^t} = -1.
\]
Therefore we have:
\[
[t^n]\,\frac{t}{e^t-1} = [t^n]\,\frac{1}{1-t/2\pi i}\cdot\frac{t(1-t/2\pi i)}{e^t-1} \approx -\left(\frac{1}{2\pi i}\right)^{n}.
\]
A similar result is obtained for the other pole; thus we have:
\[
\frac{B_n}{n!} \approx -\left(\frac{1}{2\pi i}\right)^{n} - \left(\frac{1}{-2\pi i}\right)^{n}.
\]
When n is odd, these two values are opposite in sign and the result is 0; this confirms that the Bernoulli numbers of odd index are 0, except for n = 1. When n is even, say n = 2k, we have (2πi)^{2k} = (−2πi)^{2k} = (−1)^k(2π)^{2k}; therefore:
\[
B_{2k} \approx \frac{2\,(-1)^{k-1}\,(2k)!}{(2\pi)^{2k}}.
\]
This formula is a good approximation, even for small values of n, and shows that Bernoulli numbers become, in modulus, larger and larger as n increases.
come, in modulus, larger and larger as n increases.
7.5 Algebraic and logarithmic
singularities
Let us consider the generating function for the Cata-
lan numbers f(t) = (1
n
k=0
C
k
C
nk
dening the Catalan numbers. This
is due to the fact that, when the argument is a pos-
itive real number, we can choose the positive value
as the result of a square root. In other words, we
consider the arithmetic square root instead of the al-
gebraic square root. This allows us to identify the
power series
f(t) with the function f(t), but when we
pass to complex numbers this is no longer possible.
Actually, in the complex eld, a function containing
a square root is a two-valued function, and there are
two branches dened by the same expression. Only
one of these two branches coincides with the func-
tion dened by the power series, which is obviously a
one-valued function.
The points at which a square root becomes 0 are special points; at them the function is one-valued, but in every neighborhood the function is two-valued. For the smallest in modulus among these points, say t₀, we must have the following situation: for t such that |t| < |t₀|, f̂(t) should coincide with a branch of f(t), while for t such that |t| > |t₀|, f̂(t) cannot converge. In fact, consider a t ∈ R, t > |t₀|; the expression under the square root should be a negative real number and therefore f(t) ∈ C∖R; but f̂(t) can only be a real number, or f̂(t) does not converge. Because we know that when f̂(t) converges we must have f̂(t) = f(t), we conclude that f̂(t) cannot converge. This shows that t₀ is a singularity for f(t).
Every kth root originates the same problem and the function is actually a k-valued function; every value for which the argument of the root is 0 is a singularity, called an algebraic singularity. These can be treated by Darboux' method or, directly, by means of Bender's theorem, which relies on Newton's rule. Actually, we already used this method to find the asymptotic evaluation of the Motzkin numbers.
The same considerations hold when a function contains a logarithm. In fact, a logarithm is an infinite-valued function, because it is the inverse of the exponential, which, in the complex field C, is a periodic function:
\[
e^{t+2k\pi i} = e^t e^{2k\pi i} = e^t(\cos 2k\pi + i\sin 2k\pi) = e^t.
\]
The period of e^t is therefore 2πi, and ln t is actually ln t + 2kπi, for k ∈ Z. A point t₀ for which the argument of a logarithm is 0 is a singularity for the corresponding function. In every neighborhood of t₀, the function has an infinite number of branches; this is the only fact distinguishing a logarithmic singularity from an algebraic one.
Let us suppose we have the sum:
\[
S_n = 1 + \frac{2}{2} + \frac{4}{3} + \frac{8}{4} + \cdots + \frac{2^{n-1}}{n} = \frac{1}{2}\sum_{k=1}^{n}\frac{2^k}{k}
\]
and we wish to compute an approximate value. The generating function is:
\[
G\!\left(\frac{1}{2}\sum_{k=1}^{n}\frac{2^k}{k}\right) = \frac{1}{2}\,\frac{1}{1-t}\,\ln\frac{1}{1-2t}.
\]
There are two singularities: t = 1 is a pole, while t = 1/2 is a logarithmic singularity. Since the latter has smaller modulus, it is dominating and R = 1/2 is the radius of convergence of the function. By Bender's theorem we have:
\[
S_n = \frac{1}{2}\,[t^n]\,\frac{1}{1-t}\ln\frac{1}{1-2t} \approx \frac{1}{2}\,\frac{1}{1-1/2}\,[t^n]\,\ln\frac{1}{1-2t} = \frac{2^n}{n}.
\]
This is not a very good approximation. In the next section we will see how it can be improved.
7.6 Subtracted singularities

The methods presented in the preceding sections only give the expression describing the general behavior of the coefficients f_n in the expansion f(t) = Σ_{k=0}^∞ f_k t^k, i.e., what is called the principal value of f_n. Sometimes, this behavior is only achieved for very large values of n, while for smaller values it is just a rough approximation of the true value. Because of that, we speak of "asymptotic evaluation" or "asymptotic approximation". When we need a true approximation, we should introduce some corrections, which slightly modify the general behavior and more accurately evaluate the true value of f_n.
Many times, the following observation solves the problem: once the function responsible for the principal value is known, we can subtract it from f(t), and the coefficients of the remaining function are significantly smaller. Let us take again the sum S_n = Σ_{k=1}^n 2^{k−1}/k, introduced in the previous section. We found the principal value S_n ≈ 2ⁿ/n by studying the generating function:
\[
\frac{1}{2}\,\frac{1}{1-t}\ln\frac{1}{1-2t}\;,\qquad\text{whose dominating behavior is that of}\quad \ln\frac{1}{1-2t}.
\]
Let us therefore consider the new function:
\[
h(t) = \frac{1}{2}\,\frac{1}{1-t}\ln\frac{1}{1-2t} - \ln\frac{1}{1-2t} = -\frac{1-2t}{2(1-t)}\,\ln\frac{1}{1-2t}.
\]
The generic term h_n should be significantly less than f_n; the factor (1−2t) actually reduces the order of growth of the logarithm:
\[
h_n = -[t^n]\,\frac{1-2t}{2(1-t)}\ln\frac{1}{1-2t} \approx -\frac{1}{2(1-1/2)}\,[t^n]\,(1-2t)\ln\frac{1}{1-2t} = -\left(\frac{2^n}{n} - 2\,\frac{2^{n-1}}{n-1}\right) = \frac{2^n}{n(n-1)}.
\]
Therefore, a better approximation for S_n is:

S_n ≈ 2^n/n + 2^n/(n(n - 1)) = 2^n/(n - 1).
The reader can easily verify that this correction greatly reduces the error in the evaluation of S_n.
A further correction can now be obtained by considering:

k(t) = (1/2) (1/(1 - t)) ln(1/(1 - 2t)) - ln(1/(1 - 2t)) + (1 - 2t) ln(1/(1 - 2t)) = ((1 - 2t)^2/(2(1 - t))) ln(1/(1 - 2t))
which gives:
k_n ≈ (1/(2(1 - 1/2))) [t^n] (1 - 2t)^2 ln(1/(1 - 2t)) =
= 2^n/n - 4·2^{n-1}/(n - 1) + 4·2^{n-2}/(n - 2) = 2^{n+1}/(n(n - 1)(n - 2)).
This correction is still smaller, and we can write:

S_n ≈ (2^n/(n - 1)) (1 + 2/(n(n - 2))).
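The improvement brought by the two corrections is easy to check numerically; the following Python sketch (our own code, with S_n summed exactly) prints the relative error of each approximation:

```python
from fractions import Fraction

def S(n):
    # exact S_n = sum_{k=1}^{n} 2^(k-1)/k
    return sum(Fraction(2**(k - 1), k) for k in range(1, n + 1))

n = 20
exact = float(S(n))
approximations = [
    2**n / n,                                  # principal value
    2**n / (n - 1),                            # first correction
    2**n / (n - 1) * (1 + 2 / (n * (n - 2))),  # second correction
]
for a in approximations:
    print(a, abs(a - exact) / exact)
```

Each successive correction gains roughly a factor n in accuracy.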
In general, we can obtain the same results if, writing f(t) = g(t)h(t) with h(t) having a radius of convergence larger than that of f(t), we expand h(t) around the dominating singularity of f(t). In our example this is done in the following way:
1/(2(1 - t)) = 1/(1 + (1 - 2t)) = 1 - (1 - 2t) + (1 - 2t)^2 - (1 - 2t)^3 + ···.
This implies:

(1/(2(1 - t))) ln(1/(1 - 2t)) = ln(1/(1 - 2t)) - (1 - 2t) ln(1/(1 - 2t)) + (1 - 2t)^2 ln(1/(1 - 2t)) - ···
and the result is the same as the one previously obtained by the method of subtracted singularities.
7.7 The asymptotic behavior of
a trinomial square root
In many problems we arrive at a generating function of the form:
f(t) = ( p(t) - sqrt((1 - αt)(1 - βt)) ) / (r t^m)

or:

g(t) = q(t) / sqrt((1 - αt)(1 - βt)).
In the former case, p(t) is a correcting polynomial, which has no effect on f_n for n sufficiently large, and therefore we have:

f_n = -(1/r) [t^{n+m}] sqrt((1 - αt)(1 - βt))

where m is a small integer. In the second case, g_n is the sum of various terms, as many as there are terms in the polynomial q(t), each one of the form:
q_k [t^{n-k}] ( 1 / sqrt((1 - αt)(1 - βt)) ).
It is therefore interesting to compute, once and for all, the asymptotic value of [t^n] ((1 - αt)(1 - βt))^s, where s = 1/2 or s = -1/2.
Let us suppose that |α| > |β|, since the case α = β has no interest and the case α = -β should be approached in another way. This hypothesis means that t = 1/α is the radius of convergence of the function, and we can develop everything around this singularity. In most combinatorial problems we have α > 0, because the coefficients of f(t) are positive numbers, but this is not a limiting factor.
Let us consider s = 1/2; in this case, a minus sign should precede the square root. The evaluation is shown in Table 7.1. The formula so obtained can be considered sufficient for obtaining both the asymptotic evaluation of f_n and a suitable numerical approximation. However, we can use the following developments:
binom(2n, n) = (4^n/sqrt(πn)) ( 1 - 1/(8n) + 1/(128n^2) + O(1/n^3) )

1/(2n - 1) = (1/(2n)) ( 1 + 1/(2n) + 1/(4n^2) + O(1/n^3) )

1/(2n - 3) = (1/(2n)) ( 1 + 3/(2n) + O(1/n^2) )
and get:

f_n = sqrt((α - β)/α) (α^n/(2n sqrt(πn))) ( 1 - (6β - 3(α - β))/(8(α - β)n) + 25/(128n^2) - 9β/(8(α - β)n^2) - (9αβ + 6β^2)/(32(α - β)^2 n^2) + O(1/n^3) ).
The reader is invited to find a similar formula for the case s = -1/2.
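The leading term of this development is easy to test numerically. In the Python sketch below (our own code, not from the text) the exact coefficient of -sqrt((1 - αt)(1 - βt)) is obtained by convolving the power series of the two square roots, and it is compared with the leading factor sqrt((α - β)/α) α^n binom(2n, n)/(4^n (2n - 1)) of the development in Table 7.1:

```python
from math import comb, sqrt

def sqrt_series(a, N):
    # coefficients of (1 - a t)^(1/2): c_k = binom(1/2, k) (-a)^k
    c = [1.0]
    for k in range(1, N + 1):
        c.append(c[-1] * (0.5 - (k - 1)) / k * (-a))
    return c

def exact_coeff(alpha, beta, n):
    # [t^n] of -sqrt((1 - alpha t)(1 - beta t)) by convolution
    ca, cb = sqrt_series(alpha, n), sqrt_series(beta, n)
    return -sum(ca[j] * cb[n - j] for j in range(n + 1))

alpha, beta, n = 2.0, 1.0, 40
exact = exact_coeff(alpha, beta, n)
approx = sqrt((alpha - beta) / alpha) * alpha**n * comb(2 * n, n) / (4**n * (2 * n - 1))
print(exact, approx, approx / exact)
```

The ratio differs from 1 by roughly 3β/(2(α - β)(2n - 3)), in agreement with the correction terms above.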
7.8 Hayman's method

The method for coefficient evaluation which uses the function's singularities (Darboux' method) can be improved and made more accurate, as we have seen, by the technique of subtracted singularities. Unfortunately, these methods become useless when the function f(t) has no singularity (entire functions) or when the dominating singularity is essential. In fact, in the former case we do not have any singularity to operate on, and in the latter the development around the singularity gives rise to a series with an infinite number of terms of negative degree.
[t^n] ( -sqrt((1 - αt)(1 - βt)) ) = -[t^n] (1 - αt)^{1/2} (1 - βt)^{1/2} =
= -sqrt((α - β)/α) [t^n] (1 - αt)^{1/2} ( 1 + (β/(α - β))(1 - αt) )^{1/2} =
= -sqrt((α - β)/α) [t^n] (1 - αt)^{1/2} ( 1 + (β/(2(α - β)))(1 - αt) - (β^2/(8(α - β)^2))(1 - αt)^2 + ··· ) =
= -sqrt((α - β)/α) [t^n] ( (1 - αt)^{1/2} + (β/(2(α - β)))(1 - αt)^{3/2} - (β^2/(8(α - β)^2))(1 - αt)^{5/2} + ··· ) =
= -sqrt((α - β)/α) ( binom(1/2, n) + (β/(2(α - β))) binom(3/2, n) - (β^2/(8(α - β)^2)) binom(5/2, n) + ··· ) (-α)^n =
= -sqrt((α - β)/α) ((-1)^{n-1}/(4^n (2n - 1))) binom(2n, n) (-α)^n ( 1 - 3β/(2(α - β)(2n - 3)) - 15β^2/(8(α - β)^2 (2n - 3)(2n - 5)) + ··· ) =
= sqrt((α - β)/α) (α^n/(4^n (2n - 1))) binom(2n, n) ( 1 - 3β/(2(α - β)(2n - 3)) - 15β^2/(8(α - β)^2 (2n - 3)(2n - 5)) + O(1/n^3) ).

Table 7.1: The case s = 1/2
In these cases, the only method seems to be Cauchy's theorem, which allows us to evaluate [t^n] f(t) by means of an integral:

f_n = (1/(2πi)) ∮_γ f(t)/t^{n+1} dt

where γ is a suitable path enclosing the origin. We do
not intend to develop this method here; we will limit ourselves to sketching a method, derived from Cauchy's theorem, which allows us to find an asymptotic evaluation of f_n in many practical situations. The method
can be implemented on a computer in the following sense: given a function f(t), we can check in an algorithmic way whether f(t) belongs to the class of functions for which the method is applicable (the class of H-admissible functions) and, if that is the case, we can evaluate the principal value of the asymptotic estimate for f_n. The system ΛΥΩ, by Flajolet, Salvy and Zimmermann, realizes this method. The development of the method was mainly performed by Hayman, and therefore it is known as Hayman's method; this also justifies the use of the letter H in the definition of H-admissibility.
A function is called H-admissible if and only if it belongs to one of the following classes or can be obtained, in a finite number of steps according to the following rules, from other H-admissible functions:

1. if f(t) and g(t) are H-admissible functions and p(t) is a polynomial with real coefficients and positive leading term, then:

exp(f(t))    f(t) + g(t)    f(t) + p(t)    p(f(t))    p(t)f(t)

are all H-admissible functions;
2. if p(t) is a polynomial with positive coefficients which cannot be written as q(t^k) for a polynomial q(t) and an integer k > 1, then the function exp(p(t)) is H-admissible;
3. if α, β are positive real numbers and γ, δ are real numbers, then the function:

f(t) = exp( β/(1 - t)^α ( (1/t) ln(1/(1 - t)) )^γ ( (1/t) ln((1/t) ln(1/(1 - t))) )^δ )

is H-admissible.
For example, the following functions are all H-admissible:

e^t    exp(t + t^2/2)    exp(t/(1 - t))    exp( (1/(t(1 - t)^2)) ln(1/(1 - t)) ).
In particular, for the third function we have:

exp(t/(1 - t)) = exp(1/(1 - t) - 1) = (1/e) exp(1/(1 - t)).

For H-admissible functions the following theorem, due to Hayman, holds: if f(t) is H-admissible, then:

f_n ≈ f(r) / ( r^n sqrt(2π b(r)) )    as n → ∞

where b(t) = t (d/dt)( t f'(t)/f(t) ) and r = r(n) is the least positive solution of the equation t f'(t)/f(t) = n.
As we said before, the proof of this theorem is based on Cauchy's theorem and is beyond the scope of these notes. Instead, let us show some examples to clarify the application of Hayman's method.
7.9 Examples of Hayman's Theorem
The first example can be easily verified. Let f(t) = e^t be the exponential function, so that we know f_n = 1/n!. To apply Hayman's theorem, we have to solve the equation t e^t / e^t = n, which gives r = n. The function b(t) is simply t, and therefore we have:
[t^n] e^t ≈ e^n / (n^n sqrt(2πn))

and in this formula we immediately recognize Stirling's approximation for factorials.
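This can be checked directly (a small Python sketch, not from the text; we work with logarithms to avoid overflow):

```python
from math import lgamma, log, pi

n = 50
log_exact = -lgamma(n + 1)                           # log(1/n!)
log_approx = n - n * log(n) - 0.5 * log(2 * pi * n)  # log of Hayman's estimate
print(log_exact, log_approx, log_approx - log_exact)
```

The difference of the two logarithms is about 1/(12n), the first neglected term of Stirling's series.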
Examples soon become rather complex and require a large amount of computation. Let us consider the following sum:
sum_{k=0}^{n-1} binom(n-1, k) (1/(k+1)!) = sum_{k=0}^{n} binom(n-1, k-1) (1/k!) =
= sum_{k=0}^{n} ( binom(n, k) - binom(n-1, k) ) (1/k!) =
= sum_{k=0}^{n} binom(n, k) (1/k!) - sum_{k=0}^{n-1} binom(n-1, k) (1/k!) =
= [t^n] (1/(1 - t)) exp(t/(1 - t)) - [t^{n-1}] (1/(1 - t)) exp(t/(1 - t)) =
= [t^n] ((1 - t)/(1 - t)) exp(t/(1 - t)) = [t^n] exp(t/(1 - t)).
We have already seen that this function is H-admissible, and therefore we can try to evaluate the asymptotic behavior of the sum. Let us define the function:

g(t) = t f'(t)/f(t) = t/(1 - t)^2.
The value of r is therefore given by the minimal positive solution of:

t/(1 - t)^2 = n    or    n t^2 - (2n + 1) t + n = 0.
Because Δ = (2n + 1)^2 - 4n^2 = 4n + 1, we have the two solutions:

r = (2n + 1 ± sqrt(4n + 1)) / (2n)
and we must accept the one with the '-' sign, which is positive and less than the other. It is surely positive, because sqrt(4n + 1) < 2n + 1; moreover:

sqrt(4n + 1) = 2 sqrt(n) sqrt(1 + 1/(4n)) = 2 sqrt(n) ( 1 + 1/(8n) + O(1/n^2) ).
From this formula we immediately obtain:

r = 1 - 1/sqrt(n) + 1/(2n) - 1/(8n sqrt(n)) + O(1/n^2)

1 - r = 1/sqrt(n) - 1/(2n) + 1/(8n sqrt(n)) + O(1/n^2) = (1/sqrt(n)) ( 1 - 1/(2 sqrt(n)) + 1/(8n) + O(1/n) )

and from these expansions
we immediately obtain:
r/(1 - r) = ( 1 - 1/sqrt(n) + 1/(2n) + O(1/(n sqrt(n))) ) sqrt(n) ( 1 + 1/(2 sqrt(n)) + 1/(8n) + O(1/(n sqrt(n))) ) =
= sqrt(n) ( 1 - 1/(2 sqrt(n)) + 1/(8n) + O(1/(n sqrt(n))) ) = sqrt(n) - 1/2 + 1/(8 sqrt(n)) + O(1/n).
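Both expansions can be verified numerically; in this Python sketch (ours, not from the text) r is computed exactly from the quadratic formula, with 1 - r taken in the cancellation-free form (sqrt(4n + 1) - 1)/(2n):

```python
from math import sqrt

n = 10**6
r = (2 * n + 1 - sqrt(4 * n + 1)) / (2 * n)    # least positive root
one_minus_r = (sqrt(4 * n + 1) - 1) / (2 * n)  # 1 - r, no cancellation
r_series = 1 - 1 / sqrt(n) + 1 / (2 * n) - 1 / (8 * n * sqrt(n))
ratio_series = sqrt(n) - 0.5 + 1 / (8 * sqrt(n))
print(r - r_series, r / one_minus_r - ratio_series)
```

The first difference is O(1/n^2) and the second O(1/n), exactly as the error terms predict.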
Finally, the exponential gives:
exp(r/(1 - r)) = (e^{sqrt(n)}/sqrt(e)) exp( 1/(8 sqrt(n)) + O(1/n) ) = (e^{sqrt(n)}/sqrt(e)) ( 1 + 1/(8 sqrt(n)) + O(1/n) ).
Because Hayman's method only gives the principal value of the result, the correction can be ignored (it may not be precise) and we get:

exp(r/(1 - r)) ≈ e^{sqrt(n)}/sqrt(e).
The second part we have to develop is 1/r^n, which can be computed by writing it as exp(n ln(1/r)), that is:

1/r^n = exp( -n ln( 1 - 1/sqrt(n) + 1/(2n) + O(1/(n sqrt(n))) ) ) =
= exp( n ( 1/sqrt(n) - 1/(2n) + 1/(2n) + O(1/(n sqrt(n))) ) ) = exp( sqrt(n) + O(1/sqrt(n)) ).
Again, the correction is ignored and we only consider the principal value. We now observe that f(r)/r^n ≈ e^{2 sqrt(n)}/sqrt(e).
Only b(r) remains to be computed; we have b(t) = t g'(t) = t(1 + t)/(1 - t)^3, and therefore:

r(1 + r) = ( 1 - 1/sqrt(n) + 1/(2n) + O(1/(n sqrt(n))) ) ( 2 - 1/sqrt(n) + 1/(2n) + O(1/(n sqrt(n))) ) =
= 2 - 3/sqrt(n) + 5/(2n) + O(1/(n sqrt(n))) = 2 ( 1 - 3/(2 sqrt(n)) + 5/(4n) + O(1/(n sqrt(n))) ).
By using the expression already found for 1 - r we then have:

(1 - r)^3 = (1/(n sqrt(n))) ( 1 - 1/(2 sqrt(n)) + 1/(8n) + O(1/n) )^3 =
= (1/(n sqrt(n))) ( 1 - 1/sqrt(n) + 1/(2n) + O(1/(n sqrt(n))) ) ( 1 - 1/(2 sqrt(n)) + 1/(8n) + O(1/n) ) =
= (1/(n sqrt(n))) ( 1 - 3/(2 sqrt(n)) + 9/(8n) + O(1/(n sqrt(n))) ).
By inverting this quantity, we eventually get:

b(r) = r(1 + r)/(1 - r)^3 =
= 2n sqrt(n) ( 1 + 3/(2 sqrt(n)) + 9/(8n) + O(1/(n sqrt(n))) ) ( 1 - 3/(2 sqrt(n)) + 5/(4n) + O(1/(n sqrt(n))) ) =
= 2n sqrt(n) ( 1 + 1/(8n) + O(1/(n sqrt(n))) ).
The principal value is 2n sqrt(n), and therefore:

sqrt(2π b(r)) ≈ sqrt(4πn sqrt(n)) = 2 sqrt(π) n^{3/4};
the final result is:

f_n = [t^n] exp(t/(1 - t)) ≈ e^{2 sqrt(n)} / (2 sqrt(πe) n^{3/4}).
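As a closing numerical check (a Python sketch, not from the text; the exact coefficients come from the sum at the beginning of this section, everything else is the principal value just derived):

```python
from math import comb, factorial, sqrt, pi, e

def f_exact(n):
    # [t^n] exp(t/(1-t)) = sum_{k=0}^{n-1} binom(n-1, k)/(k+1)!
    return sum(comb(n - 1, k) / factorial(k + 1) for k in range(n))

n = 400
exact = f_exact(n)
approx = e**(2 * sqrt(n)) / (2 * sqrt(pi * e) * n**0.75)
print(exact, approx, approx / exact)
```

Since Hayman's method only gives the principal value, the ratio approaches 1 rather slowly, with an error of order 1/sqrt(n).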
It is a simple matter to compute the original sum sum_{k=0}^{n-1} binom(n-1, k)/(k+1)! for small values of n and compare it with this asymptotic value.
The methods of Wilf and Zeilberger apply when the terms of the identity sum_k L(k, n) = R(n) have a special form:

L(k+1, n)/L(k, n)    L(k, n+1)/L(k, n)    R(n+1)/R(n)

are all rational functions of n and k. This actually means that L(k, n) and R(n) are composed of factorials, powers and binomial coefficients. In this sense, Riordan arrays are less powerful, but they can also be used when non-hypergeometric terms are involved, as for example in the case of harmonic numbers and Stirling numbers of both kinds.
* * *
Chapter 6: Formal Methods
In this chapter we have considered two important methods: the symbolic method for deducing counting generating functions from the syntactic definition of combinatorial objects, and the method of operators for obtaining combinatorial identities from relations between transformations of sequences defined by operators.
The symbolic method was started by Schützenberger and Viennot, who devised a technique to automatically generate counting generating functions from a non-ambiguous context-free grammar. When the grammar defines a class of combinatorial objects, this method gives a direct way to obtain monovariate or multivariate generating functions, which allow us to solve many problems relative to the given objects. Since context-free languages only define algebraic generating functions (the subset of regular grammars is limited to rational functions), the method is not very general, but it is very effective whenever it can be applied. The method was extended by Flajolet to some classes of exponential generating functions and implemented in Maple.
Marcel-Paul Schützenberger: Context-free languages and pushdown automata, Information and Control 6 (1963) 246-264.

Maylis Delest, Xavier Viennot: Algebraic languages and polyominoes enumeration, 10th Colloquium on Automata, Languages and Programming, Lecture Notes in Computer Science (1983) 173-181.

Philippe Flajolet: Symbolic enumerative combinatorics and complex asymptotic analysis, Algorithms Seminar (2001). Available at: http://algo.inria.fr/seminars/sem00-01/flajolet.html
The method of operators is very old and was developed in the XIX century by English mathematicians, especially George Boole. A classical book in this direction is:

Charles Jordan: Calculus of Finite Differences, Chelsea Publ. (1965).
Actually, the method is used in Numerical Analysis, but it has a clear connection with Combinatorial Analysis, as our numerous examples show. The important concepts of indefinite and definite summation are used by Wilf and Zeilberger in the quoted texts. The Euler-McLaurin summation formula is the first connection between finite methods (considered up to this moment) and asymptotics.
* * *
Chapter 7: Asymptotics
The methods treated in the previous chapters are
exact, in the sense that every time they give the solution to a problem, this solution is a precise formula. This, however, is not always possible, and many times we are not able to find a solution of this kind. In these
cases, we would also be content with an approximate
solution, provided we can give an upper bound to the
error committed. The purpose of asymptotic methods is just that.
The natural settings for these problems are Complex Analysis and the theory of series. We have used a rather descriptive approach, limiting our considerations to elementary cases. These situations are covered by the quoted texts, especially Knuth, Wilf and Henrici. Hayman's method is also treated in:

Micha Hofri: Probabilistic Analysis of Algorithms, Springer (1987).
Index
Γ function, 8, 84
ΛΥΩ, 90
ψ function, 8
1-1 correspondence, 11
1-1 mapping, 11
A-sequence, 58
absolute scale, 9
addition operator, 77
Adelson-Velskii, 52
adicity, 35
algebraic singularity, 86, 87
algebraic square root, 87
Algol60, 69
Algol68, 70
algorithm, 5
alphabet, 12, 67
alternating subgroup, 14
ambiguous grammar, 68
Appell subgroup, 57
arithmetic square root, 87
arrangement, 11
asymptotic approximation, 88
asymptotic development, 80
asymptotic evaluation, 88
average case analysis, 5
AVL tree, 52
Backus Normal Form, 70
Bell number, 24
Bell subgroup, 57
Benders theorem, 84
Bernoulli number, 18, 24, 53, 80, 85
big-oh notation, 9
bijection, 11
binary searching, 6
binary tree, 20
binomial coefficient, 8, 15
binomial formula, 15
bisection formula, 41
bivariate generating function, 55
BNF, 70
Boole, George, 73
branch, 87
C language, 69
cardinality, 11
Cartesian product, 11
Catalan number, 20, 34
Catalan triangle, 63
Cauchy product, 26
Cauchy theorem, 90
central binomial coefficient, 16, 17
central trinomial coefficient, 46
characteristic function, 12
Chomsky, Noam, 67
choose, 15
closed form expression, 7
codomain, 11
coefficient extraction rules, 30
coefficient of operator, 30
coefficient operator, 30
colored walk, 62
colored walk problem, 62
column, 11
column index, 11
combination, 15
combination with repetitions, 16
complete colored walk, 62
composition of f.p.s., 29
composition of permutations, 13
composition rule for coefficient of, 30
composition rule for generating functions, 40
compositional inverse of a f.p.s., 29
Computer Algebra System, 34
context free grammar, 68
context free language, 68
convergence, 83
convolution, 26
convolution rule for coefficient of, 30
convolution rule for generating functions, 40
cross product rule, 17
cycle, 12, 21
cycle degree, 12
cycle representation, 12
Darboux method, 84
definite integration of a f.p.s., 28
definite summation, 78
degree of a permutation, 12
delta series, 29
derangement, 12, 86
derivation, 67
diagonal step, 62
diagonalisation rule for generating functions, 40
difference operator, 73
differentiation of a f.p.s., 28
differentiation rule for coefficient of, 30
differentiation rule for generating functions, 40
digamma function, 8
disposition, 15
divergence, 83
domain, 11
dominating singularity, 85
double sequence, 11
Dyck grammar, 68
Dyck language, 68
Dyck walk, 21
Dyck word, 69
east step, 20, 62
empty word, 12, 67
entire function, 89
essential singularity, 86
Euclid's algorithm, 19
Euler constant, 8, 18
Euler transformation, 42, 55
Euler-McLaurin summation formula, 80
even permutation, 13
exponential algorithm, 10
exponential generating function, 25, 39
exponentiation of f.p.s., 28
extensional denition, 11
extraction of the coecient, 29
f.p.s., 25
factorial, 8, 14
falling factorial, 15
Fibonacci number, 19
Fibonacci problem, 19
Fibonacci word, 70
Fibonacci, Leonardo, 18
finite operator, 73
fixed point, 12
Flajolet, Philippe, 90
formal grammar, 67
formal language, 67
formal Laurent series, 25, 27
formal power series, 25
free monoid, 67
full history recurrence, 47
function, 11
Gauss integral, 8
generalized convolution rule, 56
generalized harmonic number, 18
generating function, 25
generating function rules, 40
generation, 67
geometric series, 30
grammar, 67
group, 67
H-admissible function, 90
Hardy's identity, 61
harmonic number, 8, 18
harmonic series, 17
Hayman's method, 90
head, 67
height balanced binary tree, 52
height of a tree, 52
i.p.l., 51
identity, 12
identity for composition, 29
identity operator, 73
identity permutation, 13
image, 11
inclusion-exclusion principle, 86
indefinite precision, 35
indefinite summation, 78
indeterminate, 25
index, 11
infinitesimal operator, 73
initial condition, 6, 47
injective function, 11
input, 5
integral lattice, 20
integration of a f.p.s., 28
intensional denition, 11
internal path length, 51
intractable algorithm, 10
intrinsically ambiguous grammar, 69
invertible f.p.s., 26
involution, 13, 49
juxtaposition, 67
k-combination, 15
key, 5
Kronecker's delta, 43
Lagrange, 32
Lagrange inversion formula, 33
Lagrange subgroup, 57
Landau, Edmund, 9
Landis, 52
language, 12, 67
language generated by the grammar, 68
leftmost occurrence, 67
length of a walk, 62
length of a word, 67
letter, 12, 67
LIF, 33
linear algorithm, 10
linear recurrence, 47
linear recurrence with constant coecients, 48
linear recurrence with polynomial coecients, 48
linearity rule for coefficient of, 30
linearity rule for generating functions, 40
list representation, 36
logarithm of a f.p.s., 28
logarithmic algorithm, 10
logarithmic singularity, 88
mapping, 11
Mascheroni constant, 8, 18
metasymbol, 70
method of shifting, 44
Miller, J. C. P., 37
monoid, 67
Motzkin number, 71
Motzkin triangle, 63
Motzkin word, 71
multiset, 24
negation rule, 16
Newton's rule, 28, 30, 84
non convergence, 83
non-ambiguous grammar, 68
north step, 20, 62
north-east step, 20
number of involutions, 14
number of mappings, 11
Numerical Analysis, 73
O-notation, 9
object grammar, 71
occurrence, 67
occurs in, 67
odd permutation, 13
operand, 35
operations on rational numbers, 35
operator, 35, 72
order of a f.p.s., 25
order of a pole, 85
order of a recurrence, 47
ordered Bell number, 24, 85
ordered partition, 24
ordinary generating function, 25
output, 5
p-ary tree, 34
parenthetization, 20, 68
partial fraction expansion, 30, 48
partial history recurrence, 47
partially recursive set, 68
Pascal language, 69
Pascal triangle, 16
path, 20
permutation, 12
place marker, 25
Pochhammer symbol, 15
pole, 85
polynomial algorithm, 10
power of a f.p.s., 27
preferential arrangement number, 24
preferential arrangements, 24
prefix, 67
principal value, 88
principle of identity, 40
problem of searching, 5
product of f.L.s., 27
product of f.p.s., 26
production, 67
program, 5
proper Riordan array, 55
quadratic algorithm, 10
quasi-unit, 29
rabbit problem, 19
radius of convergence, 83
random permutation, 14
range, 11
recurrence relation, 6, 47
renewal array, 57
residue of a f.L.s., 30
reverse of a f.p.s., 29
Riemann zeta function, 18
Riordan array, 55
Riordan's old identity, 61
rising factorial, 15
Rogers, Douglas, 57
root, 21
rooted planar tree, 21
row, 11
row index, 11
row-by-column product, 32
Salvy, Bruno, 90
Schützenberger methodology, 68
semi-closed form, 46
sequence, 11
sequential searching, 5
set, 11
set partition, 23
shift operator, 73
shifting rule for coefficient of, 30
shifting rule for generating functions, 40
shuffling, 14
simple binomial coecients, 59
singularity, 85
small-oh notation, 9
solving a recurrence, 6
sorting, 12
south-east step, 20
square root of a f.p.s., 28
Stanley, 33
Stirling number of the first kind, 21
Stirling number of the second kind, 22
Stirling polynomial, 23
Stirling, James, 21, 22
subgroup of associated operators, 57
subtracted singularity, 88
subword, 67
successful search, 5
suffix, 67
sum of a geometric progression, 44
sum of f.L.s, 27
sum of f.p.s., 26
summation by parts, 78
summing factor method, 49
surjective function, 11
symbol, 12, 67
symbolic method, 68
symmetric colored walk problem, 62
symmetric group, 14
symmetry formula, 16
syntax directed compilation, 70
table, 5
tail, 67
Tartaglia triangle, 16
Taylor theorem, 80
terminal word, 68
tractable algorithm, 10
transposition, 12
transposition representation, 13
tree representation, 36
triangle, 55
trinomial coecient, 46
unary-binary tree, 71
underdiagonal colored walk, 62
underdiagonal walk, 20
unfold a recurrence, 6, 49
uniform convergence, 83
unit, 26
unsuccessful search, 5
van Wijngaarden grammar, 70
Vandermonde convolution, 43
vector notation, 11
vector representation, 12
walk, 20
word, 12, 67
worst AVL tree, 52
worst case analysis, 5
Z-sequence, 58
zeta function, 18
Zimmermann, Paul, 90