
Rhymes in primes
Andrei Okounkov
Abstract
While the author is a professional mathematician, he is by no means an expert in the sub-
ject area of these notes. The goal of these notes is to share the author’s personal excitement
about some results of James Maynard with mathematics enthusiasts of all ages, using
maximally accessible, yet precise mathematical language. No attempt has been made to
present an overview of the current state of the field or its history, or to place this narrative in any
kind of broader scientific or social context. See the references in Section 11 for both pro-
fessional surveys and popular science accounts that will certainly give the reader a broader
and deeper understanding of the material.
© 2022 International Mathematical Union

Preliminary version, to appear in Proc. Int. Cong. Math. 2022, Vol. 1.
DOI 10.4171/ICM2022/202
1. The ancient sieve
It is hard to imagine a more fundamental arithmetic object than the multiplication
table
(1)
where the dots indicate that we imagine this table has infinitely many rows and columns. The
numbers 𝑛 that appear in the shaded area are called composite numbers. They can be written
in the form 𝑛 = 𝑎𝑏 where both 𝑎 ≠ 1 and 𝑏 ≠ 1 are positive integers.
Numbers that are not 1 and not composite are called prime. For instance, 2, 3, 5,
and 7 are prime, as one sees from (1). Indeed, every composite number 𝑎𝑏 appears in the
multiplication table in the column 𝑎 and row 𝑏, which are both less than the number 𝑎𝑏. So,
2, 3, 5, 7 will never appear in the shaded part.
It is a fundamental arithmetic fact that every positive integer 𝑛 > 1 can be factored as
a product of primes, and this factorization is unique up to the order of the prime factors. One
can compare and contrast factorization into primes with how molecules are built from atoms.
One clear difference is that the order of prime factors does not matter, unlike the positions
of the atoms in a molecule.
Primes form an infinite sequence which has mesmerized and puzzled mathematicians
for millennia. Many mathematicians were first attracted to mathematics by the magic of
prime numbers and remained true to their first mathematical love — number theory.
“It is the fact that primes are so fundamental (being the building blocks of whole
numbers), but still so mysterious and poorly understood which makes them so fascinating to
me”, says James Maynard, the hero of these notes. Kannan Soundararajan, the presenter of
Maynard’s Fields Medal laudatio at ICM 2022, agrees: “Like many others, I was drawn in by
the extreme simplicity of problems involving primes, and the remarkable difficulty of proving
anything about them. Twin primes and Goldbach in particular were especially fascinating
problems. It’s been amazing to witness such spectacular progress as the Green–Tao theorem
and bounded gaps between primes over the last twenty years.”
The following method for tabulating the primes goes back at least as far as Eratosthenes
(276 – 195/194 BC). To remove the composite numbers from the list of all numbers, we can
successively cross out or punch through all numbers from the grey columns in the multiplication
table (1), that is, remove all nontrivial multiples of 2, of 3, of 5, et cetera. For instance,
the list of natural numbers with 1 and multiples of 2 and 3 removed will look like this:
 ⃝   2   3   4   5   6   7   8   9  10
11  12  13  14  15  16  17  18  19  20
21  22  23  24  25  26  27  28  29  30
31  32  33  34  35  36  37  38  39  40
41  42  43  44  45  46  47  48  49  50
51  52  53  54  55  56  57  58  59  60
61  62  63  64  65  66  67  68  69  70
71  72  73  74  75  76  77  78  79  80
81  82  83  84  85  86  87  88  89  90
91  92  93  94  95  96  97  98  99 100
 ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮        (2)
where dots indicate that this table has infinitely many rows. The reader may notice there is
no need to worry about multiples of 4, 6, or any other composite number.
Once we remove all composite numbers from the numbers up to 100, the result will
look like this (the colors will be explained momentarily):
 ⃝   2   3   ⃝   5   ⃝   7   ⃝   ⃝   ⃝
11   ⃝  13   ⃝   ⃝   ⃝  17   ⃝  19   ⃝
 ⃝   ⃝  23   ⃝   ⃝   ⃝   ⃝   ⃝  29   ⃝
31   ⃝   ⃝   ⃝   ⃝   ⃝  37   ⃝   ⃝   ⃝
41   ⃝  43   ⃝   ⃝   ⃝  47   ⃝   ⃝   ⃝
 ⃝   ⃝  53   ⃝   ⃝   ⃝   ⃝   ⃝  59   ⃝
61   ⃝   ⃝   ⃝   ⃝   ⃝  67   ⃝   ⃝   ⃝
71   ⃝  73   ⃝   ⃝   ⃝   ⃝   ⃝  79   ⃝
 ⃝   ⃝  83   ⃝   ⃝   ⃝   ⃝   ⃝  89   ⃝
 ⃝   ⃝   ⃝   ⃝   ⃝   ⃝  97   ⃝   ⃝   ⃝
 ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮        (3)
This table has a lot of holes, just like a sieve. For this reason, the methods that produce an
interesting set (e.g. primes) from a less interesting set (e.g. integers) by successively sifting
out the unwanted elements are referred to as sieve methods.
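The crossing-out procedure described above is easy to try on a computer. The following Python sketch is only an illustration of the idea; the function name `primes_up_to` is our own choice and not part of the original notes. It removes the nontrivial multiples of each surviving number, just as in tables (2) and (3).

```python
def primes_up_to(limit):
    """Return the list of primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False      # 0 and 1 are not prime
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # cross out the nontrivial multiples of p; smaller multiples
            # were already removed while sifting by smaller primes
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n in range(2, limit + 1) if is_prime[n]]


print(primes_up_to(100))
```

Note that the loop only sifts by numbers that are still marked prime, which reflects the remark that multiples of 4, 6, or any other composite number need no separate attention.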
The primes shown in green are the twin primes, that is, primes 𝑝 such that 𝑝 + 2 or
𝑝 − 2 is also prime¹. Twin primes are the simplest rhymes in the mysterious poem of primes.
While it is very easy to see that there are infinitely many primes², the infinitude of twin primes
is a very old conjecture, still open today. However, recent years have seen incredible progress
in our understanding of various patterns in primes, recognized, in particular, by the Fields
Medal, the highest honor in mathematics, awarded in 2022 to James Maynard.
1 Can you prove that 𝑝 + 2 and 𝑝 − 2 cannot both be prime, except for 𝑝 = 5? Questions like
this will be clarified when we talk about admissible patterns.
2 Every prime divisor of the number 𝑛! + 1, where 𝑛! = 1 · 2 · 3 · · · 𝑛, has to be larger than 𝑛.
Since 𝑛 is arbitrary, there are infinitely many primes.
In these notes, we will try to give a very basic introduction to this area of number
theory and some of the results of Maynard and his predecessors. A more experienced reader
can probably skip many sections of this narrative. We wish all newcomers some patience in
working through these notes, and very much hope this patience will be rewarded by the
sense of awe that this mathematics inspires.
2. Last digits of primes

It is very noticeable in (3) that some columns have very few (in fact zero or one)
prime numbers in them. Given a number 𝑛, its column number in (3) is determined by the
last digit of 𝑛 in its decimal notation or, equivalently, by the remainder in the division of 𝑛
by 10. Mathematicians have a special notation for the remainder, namely
89 mod 10 = 9 .
One also says that the residue of 89 modulo 10 is 9. More generally, we write
$$a_1 = a_2 \bmod b$$
to mean that $a_1 - a_2$ is divisible by 𝑏. We say that $a_1$ and $a_2$ are equal mod 𝑏, or that they
are in the same residue class modulo 𝑏.
If 𝑛 mod 10 = 8 then 𝑛 is even and not equal to 2, hence 𝑛 cannot possibly be prime.
Therefore, the 8th column in (3) is empty. Similar reasoning applies to the 2nd, 4th, 5th,
6th, and 10th columns. In due time we will see that prime numbers are approximately evenly
distributed among the remaining 4 columns of table (3). Whether the column corresponding
to a residue 𝑎 modulo 10 has many or very few primes is determined by the greatest common
divisor gcd(𝑎, 10). The columns with gcd(𝑎, 10) > 1 contain at most one prime.
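Readers who wish to see this column count for themselves may try the following Python sketch. It is an illustration only: the helper names `primes_up_to` and `primes_by_residue` are hypothetical, and the modulus 𝑏 is left as a parameter so that bases other than 10 can be tried as well.

```python
from collections import Counter
from math import gcd


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of primes <= limit."""
    flags = [True] * (limit + 1)
    flags[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = False
    return [n for n in range(2, limit + 1) if flags[n]]


def primes_by_residue(limit, b):
    """Count the primes p <= limit in each residue class p mod b."""
    return Counter(p % b for p in primes_up_to(limit))


counts = primes_by_residue(10**6, 10)
for a in range(10):
    # classes with gcd(a, 10) > 1 hold at most one prime
    print(a, gcd(a, 10), counts[a])
```

For the limit 100 the counts reproduce what one reads off from table (3); for larger limits the four classes coprime to 10 come out close to one another.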
The base 10 of the decimal expansion can be replaced by any other base 𝑏 > 1. For
instance, 𝑏 = 2 means binary expansions, as exemplified by
$$23 = 10111_{\text{binary}} = 1 \cdot 2^4 + 0 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 .$$ (4)
Clearly, for all primes 𝑝 ≠ 2 we have 𝑝 = 1 mod 2.

Generalizing what we have seen for 𝑏 = 10 and 𝑏 = 2, for any base 𝑏, primes are
approximately evenly distributed among residue classes 𝑎 modulo 𝑏 such that gcd(𝑎, 𝑏) = 1.
The residue classes with gcd(𝑎, 𝑏) > 1 contain at most one prime each.
For example, if we replace the base 𝑏 = 10 in (3) by 𝑏 = 211, which is a prime number,
we will get the following distribution of primes $p \le 211^2$ (shown by blue or green squares;
the colors mean the same as in table (3)).
(5)
Primes indeed seem to be roughly evenly distributed among all columns³, except the very
last one, which contains the multiples of 211. Of course, what catches the eye in this picture
are the diagonal stripes. We invite the reader to explain them using the equality
211𝑖 + 𝑗 = 𝑖 + 𝑗 mod 210
and the factorization 210 = 2 · 3 · 5 · 7.
3. The Chinese remainder theorem

One can add and multiply residue classes modulo 𝑏 in the same way that one can
tell the last digit of a sum 𝑛1 + 𝑛2 or a product 𝑛1 𝑛2 from the last digits of 𝑛1 and 𝑛2 . Such
considerations are both very basic and very central to number theory. They can be simplified
using the Chinese remainder theorem (CRT), which is a result nearly as ancient as the
sieve of Eratosthenes, appearing in the Sunzi Suanjing treatise from the 3rd century CE.
CRT applies to residues modulo $b = b_1 b_2$, where $b_1$ and $b_2$ are coprime, meaning
that $\gcd(b_1, b_2) = 1$. For example, 10 = 2 · 5 and gcd(2, 5) = 1. Given a residue 𝑎 modulo
𝑏, we can associate to it two numbers
$$a \longmapsto (a_1, a_2) = (a \bmod b_1 ,\ a \bmod b_2) .$$ (6)
3 Actually, the number of primes in any given column in (5) varies between 14 and 31, but
it all evens out as we go further and further down the list of primes. It is a fact of life that it
takes a while for primes 𝑝 to equidistribute mod any fixed prime like 𝑞 = 211. It is a very
subtle business to find out how long exactly this while can be, either for some fixed 𝑞 or
averaged over 𝑞. This is, in fact, one of the key technical questions in this part of number
theory.
For instance, for 𝑏 = 10 = 2 · 5, consider the following table. The rows and the columns of
this table are indexed by residues mod 2 and 5, respectively, and we place each residue mod
10 in the corresponding row and column:
            1    2    3    4    0    mod 5
1 mod 2     1    7    3    9    5
0 mod 2     6    2    8    4    0              (7)
We observe the remarkable fact that each residue 𝑎 = 0, 1, . . . , 9 mod 10 finds a unique
place in this table, filling the table completely. In general, the CRT says that the map (6) gives a
one-to-one correspondence
$$\{\text{residues mod } b_1 b_2\} = \{\text{residues mod } b_1\} \times \{\text{residues mod } b_2\}$$ (8)
that preserves arithmetic operations. We invite the reader to prove the CRT and to generalize
its statement to the case $b = b_1 b_2 \cdots b_r$.
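Before attempting a proof, it may help to check the statement by brute force. The Python sketch below is an illustration under our own naming (`crt_table` and `crt_inverse` are hypothetical helpers): it builds the map (6), confirms that it is one-to-one, and inverts it by simple search rather than by any clever formula.

```python
from math import gcd


def crt_table(b1, b2):
    """Map each residue a mod b1*b2 to the pair (a mod b1, a mod b2), as in (6)."""
    if gcd(b1, b2) != 1:
        raise ValueError("b1 and b2 must be coprime")
    return {a: (a % b1, a % b2) for a in range(b1 * b2)}


def crt_inverse(a1, a2, b1, b2):
    """Find the unique residue a mod b1*b2 with a = a1 mod b1 and a = a2 mod b2."""
    for a, pair in crt_table(b1, b2).items():
        if pair == (a1 % b1, a2 % b2):
            return a
    raise AssertionError("unreachable when b1 and b2 are coprime")


table = crt_table(2, 5)
print(table)                      # each pair occurs exactly once, as in table (7)
print(crt_inverse(1, 2, 2, 5))    # the residue in row 1 mod 2, column 2 mod 5
```

The search in `crt_inverse` takes time proportional to 𝑏; it is meant to mirror table (7), not to be efficient.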
Let us revisit table (3) from the point of view of CRT. Shading the residue classes
that contain at most one prime (the shaded entries are shown here in square brackets), we get
            1    2    3    4    0    mod 5
1 mod 2     1    7    3    9   [5]
0 mod 2    [6]  [2]  [8]  [4]  [0]             (9)
which illustrates two key points:
• 𝑎 is coprime to 10 if and only if 𝑎 is coprime to 2 and 5,
• being coprime to 2 and 5 are independent events.
Here we think of residue classes 𝑎 modulo 10 as all equally likely and we call two events $E_1$
and $E_2$ independent if
$$\operatorname{Prob}(E_1 \,\&\, E_2) = \operatorname{Prob}(E_1) \operatorname{Prob}(E_2) .$$
While primes are truly special and not random at all, after centuries of looking into patterns
in primes most mathematicians would probably agree that primes behave as if they were
completely random, subject to, first, all possible constraints imposed by the considerations of
residues and, second, density constraints imposed by the unique factorization of integers into
primes. It is therefore very useful to inject, following Cramér, some probabilistic terminology
and intuition into our discussion.
4. Infinity and limits

There is mystery and challenge in primes because there are infinitely many of them.
Any list or plot of primes that we can examine, however long, contains only 0% of all primes,
hence at best provides a warm-up for the real question, which is: what happens
for all sufficiently large primes?
In mathematics, there are a lot of questions for which one is free to discard an arbitrary
finite part of some infinite data set. As an example, let’s take the concept of a limit, which is
very important when talking about primes. In the discussion that follows, we will very often
have a sequence of real numbers
$$(a_n) = (a_1, a_2, a_3, \dots) ,$$
that tends to a limit
$$a = \lim_{n \to \infty} a_n$$ (10)
as 𝑛 goes to infinity. Slightly incorrectly, this means that every digit in the decimal expansion
of $a_n$ equals the corresponding digit of 𝑎, except for finitely many values of 𝑛. Any person trained in calculus
will be quick to point out some problems with this definition, namely
$$a_n = 10^n \not\to 0 ,$$
even though every digit of $a_n$ is zero except for one value of 𝑛, while
$$a_n = 0.\underbrace{999 \dots 9}_{n \text{ times}} \to 1.0000 \dots ,$$
despite the fact that all digits after the decimal point are different. Readers who are not sure
how to fix these issues and feel they could use a more rigorous discussion can find it in
Appendix A.
With the notion of a limit one can define infinite sums and products by
$$\sum_{n=1}^{\infty} a_n = \lim_{N \to \infty} \sum_{n=1}^{N} a_n , \qquad \prod_{n=1}^{\infty} a_n = \lim_{N \to \infty} \prod_{n=1}^{N} a_n ,$$
when these limits exist. For example, for any number |𝑥| < 1, we have
$$x^{\infty} = \lim_{n \to \infty} x^n = 0 ,$$ (11)
and also
$$\sum_{n=0}^{\infty} x^n = \frac{1}{1-x} ,$$ (12)
which we invite the reader to deduce from (11).
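For readers who would like to compare with their own argument, here is one possible route from (11) to (12). It is only a sketch, and the notation $S_N$ for the partial sum is our own. Multiplying the finite sum by 1 − 𝑥 makes all intermediate terms cancel, and then (11) takes care of the remaining power of 𝑥:

```latex
S_N = \sum_{n=0}^{N} x^n ,
\qquad
(1 - x)\, S_N = 1 - x^{N+1}
\quad \Longrightarrow \quad
S_N = \frac{1 - x^{N+1}}{1 - x}
\;\longrightarrow\; \frac{1}{1 - x}
\quad \text{as } N \to \infty , \ |x| < 1 .
```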
Limits are needed not only for talking about infinite sets, but also as a way to define
some very important functions⁴
$$e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{2 \cdot 3} + \frac{x^4}{2 \cdot 3 \cdot 4} + \dots = \sum_{n=0}^{\infty} \frac{x^n}{n!} ,$$ (13)
4 The primary reason the exponential $e^x$ and the natural logarithm ln 𝑦 are so important in
mathematics is that they solve the simplest differential equations, namely $(e^x)' = e^x$
and $(\ln y)' = 1/y$. The reader can check this using the series (13), (15), and the rule $(x^n)' = n x^{n-1}$.
where 𝑒 = 2.71828 . . . is a famous transcendental number that can be computed by substituting
𝑥 = 1 in the above series. Another important constant that we will meet below is the
Euler constant
$$\gamma = - \lim_{N \to \infty} \Bigl( \ln N - \sum_{n=1}^{N} \frac{1}{n} \Bigr) = 0.57721 \dots .$$ (14)
Here and below ln 𝑦 denotes the function inverse to (13), which means that by definition
$$\ln e^x = x .$$
It is called the natural logarithm, and for arguments in (0, 2) it can be computed using the
series
$$\ln(1 + y) = y - \frac{y^2}{2} + \frac{y^3}{3} - \frac{y^4}{4} + \dots = \sum_{n=1}^{\infty} (-1)^{n-1} \frac{y^n}{n} , \qquad |y| < 1 .$$ (15)
Readers unfamiliar with these functions will discover that the exponential 𝑒 𝑥 grows very
quickly with 𝑥, making the inverse function ln 𝑦 grow very slowly. Notice that the sum in
(14), with its minus sign, is the partial sum for 𝑦 = −1 in (15). No wonder it goes to ln 0 = −∞
as 𝑁 grows.
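The constants and series of this section can be explored numerically. The following Python sketch is an illustration with hypothetical function names: it sums the series (13) at 𝑥 = 1 and evaluates the difference between the harmonic sum 1 + 1/2 + · · · + 1/𝑁 and ln 𝑁, which by (22) below approaches 𝛾.

```python
import math


def exp_series(x, terms=30):
    """Partial sum of the series (13): sum of x**n / n! for n < terms."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)        # turns x**n / n! into x**(n+1) / (n+1)!
    return total


def harmonic(N):
    """The harmonic sum 1 + 1/2 + ... + 1/N."""
    return math.fsum(1.0 / n for n in range(1, N + 1))


def euler_gamma_estimate(N):
    """Harmonic sum minus ln N; this tends to Euler's constant as N grows."""
    return harmonic(N) - math.log(N)


print(exp_series(1.0))                     # compare with e = 2.71828...
for N in (10, 1000, 10**6):
    print(N, euler_gamma_estimate(N))      # slowly approaches 0.57721...
```

The convergence to 𝛾 is slow: the estimate differs from the limit by roughly 1/(2𝑁), in contrast with the very fast convergence of the exponential series.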
While Zeno of Elea (c. 495 – c. 430 BC) made a career out of being confused by the
𝑥 = 1/2 case of (12), we want to stress there are no logical problems whatsoever in thinking
about the infinity of primes and about limits. We encourage the reader to embrace these
notions as something more true and fundamental than any finite approximations to them.
5. The density of primes

If ℕ = {1, 2, . . . } is the set of natural numbers and $\mathcal{A} \subset \mathbb{N}$ is a subset of it, we define
$$\operatorname{density}(\mathcal{A}) = \lim_{N \to \infty} \frac{|\mathcal{A} \cap \{1, \dots, N\}|}{N} ,$$ (16)
assuming this limit exists. When the limit (16) exists, we will also say that this is the proba-
bility that a random natural number is in A .
From table (9) it is clear that
$$\operatorname{density}(\{\text{coprime to } 10\}) = \frac{4}{10} = \frac{1}{2} \times \frac{4}{5} .$$ (17)
Similarly, if $p_1, p_2, \dots, p_r$ are distinct primes then
$$\operatorname{density}(\{\text{coprime to } p_1 p_2 \cdots p_r\}) = \prod_{i=1}^{r} \Bigl(1 - \frac{1}{p_i}\Bigr) .$$ (18)
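Formula (18) can be checked exactly for small sets of primes. In the Python sketch below (an illustration; the function names are ours), the density is computed by counting residues over one full period $p_1 p_2 \cdots p_r$, which gives the same value as the limit (16) because coprimality to the modulus only depends on the residue class.

```python
from fractions import Fraction
from math import gcd, prod


def coprime_density(primes):
    """Exact fraction of residues mod p_1*...*p_r that are coprime to the modulus."""
    modulus = prod(primes)
    count = sum(1 for a in range(modulus) if gcd(a, modulus) == 1)
    return Fraction(count, modulus)


def product_formula(primes):
    """Right-hand side of (18): the product of (1 - 1/p) over the given primes."""
    return prod((1 - Fraction(1, p) for p in primes), start=Fraction(1))


for primes in ([2, 5], [2, 3, 5], [2, 3, 5, 7]):
    print(primes, coprime_density(primes), product_formula(primes))
```

Exact rational arithmetic is used so that both sides of (18) can be compared without rounding.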
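The equality (18) makes one wonder whether
$$\operatorname{density}(\{\text{primes}\}) \overset{?}{=} \prod_{\text{all primes } p} \Bigl(1 - \frac{1}{p}\Bigr) .$$ (19)
This is indeed true, but with the clarification that
$$\prod_{\text{all primes } p} \Bigl(1 - \frac{1}{p}\Bigr) \overset{!}{=} 0 ,$$ (20)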
as we will see momentarily. Let us look at the reciprocal of the product (18). We have the
𝑥 = 1/𝑝 special case of (12)
$$\frac{1}{1 - \frac{1}{p}} = 1 + \frac{1}{p} + \frac{1}{p^2} + \frac{1}{p^3} + \dots = \sum_{m \ge 0} \frac{1}{p^m} ,$$
and multiplying those out for different primes $p_i$, we get
$$\prod_{i=1}^{r} \Bigl(1 - \frac{1}{p_i}\Bigr)^{-1} = \sum_{m_1, \dots, m_r \ge 0} \frac{1}{p_1^{m_1} p_2^{m_2} \cdots p_r^{m_r}} .$$ (21)
If the set $\{p_i\}$ contains all primes that are ≤ 𝑁, then the sum on the right in (21) contains,
in particular, the reciprocals of all natural numbers ≤ 𝑁. Therefore, by the existence of the
prime factorization, we conclude
$$\begin{aligned}
\prod_{\text{all primes } p \le N} \Bigl(1 - \frac{1}{p}\Bigr)^{-1} &= 1 + \frac{1}{2} + \dots + \frac{1}{N} + \text{more terms} \\
&\ge 1 + \frac{1}{2} + \dots + \frac{1}{N} \\
&= \ln N + \gamma + o(1) ,
\end{aligned}$$ (22)
where 𝛾 is the Euler constant from (14) and 𝑜(1) denotes a quantity that goes to 0 as 𝑁 → ∞.
This shows that the rightmost term in
$$0 \le \operatorname{density}(\{\text{primes}\}) \le \operatorname{density}(\{\text{coprime to } N!\}) = \prod_{\text{all primes } p \le N} \Bigl(1 - \frac{1}{p}\Bigr)$$ (23)
goes to 0 as 𝑁 → ∞ and completes the proof of (19).
It is curious to notice that taking logarithms in (22) and using that (15) says that
$-\ln(1 - p^{-1}) \approx p^{-1}$ for large 𝑝, we get
$$\sum_{\text{primes } p} \frac{1}{p} = +\infty .$$ (24)
This means that the same computation (22) proves that the density of primes is zero and yet
there are sufficiently many primes for the series (24) to diverge, as first noted by Euler.
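The inequality in (22) and the slow divergence in (24) can both be observed numerically. The Python sketch below is an illustration only, with hypothetical function names; it compares the reciprocal of the product over primes 𝑝 ≤ 𝑁 with the harmonic sum, and prints the partial sums of 1/𝑝.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of primes <= limit."""
    flags = [True] * (limit + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = False
    return [n for n in range(2, limit + 1) if flags[n]]


def inverse_euler_product(N):
    """Product over primes p <= N of 1 / (1 - 1/p), the left side of (22)."""
    result = 1.0
    for p in primes_up_to(N):
        result *= 1.0 / (1.0 - 1.0 / p)
    return result


def harmonic_sum(N):
    return math.fsum(1.0 / n for n in range(1, N + 1))


def prime_reciprocal_sum(N):
    """Partial sum of the series (24) over primes p <= N."""
    return math.fsum(1.0 / p for p in primes_up_to(N))


for N in (10, 1000, 100000):
    print(N, inverse_euler_product(N), harmonic_sum(N), prime_reciprocal_sum(N))
```

The last column grows extremely slowly, which is one way to appreciate how sparse the primes are even though the series (24) diverges.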
While we may be disappointed in the fact that the number (19) vanishes, very similar
considerations often lead to positive results. For instance, let us consider square-free numbers
𝑛, that is, numbers not divisible by $m^2$ for any 𝑚 > 1. This means
$$n \bmod p^2 \ne 0 ,$$
for any prime 𝑝. Referring back to (4), this means that the last two digits of 𝑛 in the expansion
base 𝑝 do not vanish simultaneously. Since this pair of digits is free to take any of the $p^2$
possible values, one can conclude
$$\operatorname{density}(\{\text{squarefree}\}) = \prod_{\text{primes } p} \Bigl(1 - \frac{1}{p^2}\Bigr) = \zeta(2)^{-1} = \frac{6}{\pi^2} \approx 0.6 .$$ (25)
Here we meet the infinitely famous Riemann 𝜁-function
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{\text{primes } p} \Bigl(1 - \frac{1}{p^s}\Bigr)^{-1} , \qquad s > 1 ,$$ (26)
and its value 𝜁 (2) first computed by Euler in 1735. Our earlier computation (20) means that
𝜁 (1) = ∞.
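The density (25) of square-free numbers is easy to test by direct counting. The following Python sketch is an illustration (the function name `squarefree_count` is hypothetical); it sifts out the multiples of every square $m^2$ with 𝑚 > 1 and compares the surviving fraction with $6/\pi^2$.

```python
import math


def squarefree_count(N):
    """Count the square-free integers n with 1 <= n <= N."""
    is_squarefree = [True] * (N + 1)
    m = 2
    while m * m <= N:
        # remove every multiple of m**2 (sifting by prime m would suffice)
        for k in range(m * m, N + 1, m * m):
            is_squarefree[k] = False
        m += 1
    return sum(is_squarefree[1:])


for N in (100, 10**4, 10**6):
    print(N, squarefree_count(N) / N, 6 / math.pi**2)
```

Unlike the primes, the square-free numbers have positive density, so the ratio stabilizes quickly.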
6. The prime number theorem

For a set A of zero density, the numbers (16) go to 0 as 𝑁 → ∞. A finer measurement
of the density is then the rate at which the limit 0 is approached. For prime numbers, the
answer is given by the prime number theorem, which says that the density of primes around
some large number 𝑁 is about 1/ln(𝑁).
A mathematically precise way to phrase it uses the function
$$\pi(x) = \text{number of primes } p \text{ such that } p \le x$$ (27)
and states that⁵
$$\pi(x) \sim \operatorname{Li}(x) \overset{\text{def}}{=} \int_2^x \frac{dy}{\ln y} \sim \frac{x}{\ln(x)} ,$$ (28)
where $f_1(x) \sim f_2(x)$ means that $f_1(x)/f_2(x) \to 1$ as 𝑥 → ∞. The reader may find the following
data, taken from the On-Line Encyclopedia of Integer Sequences, convincing:
x        π(x)                       Li(x)/π(x) − 1
10       4                          .25
10^2     25                         .16
10^3     168                        .054
10^4     1229                       .013
10^5     9592                       .0039
10^6     78498                      .0016
10^7     664579                     .00051
10^8     5761455                    .00013
10^9     50847534                   .000033
10^10    455052511                  .0000068
10^11    4118054813                 .0000028
10^12    37607912018                .0000010
10^13    346065536839               .00000031
10^14    3204941750802              .000000098
10^15    29844570422669             .000000035
10^16    279238341033925            .000000012
10^17    2623557157654233           .0000000030
10^18    24739954287740860          .00000000089
10^19    234057667276344607         .00000000043
10^20    2220819602560918840        .00000000010
10^21    21127269486018731928       .000000000028
10^22    201467286689315906290      .0000000000096        (29)
5 A limit procedure is part of the definition of such everyday notions as areas and volumes.
The integral of a univariate or multivariate function 𝑓 is the signed area or volume between
the graph of 𝑓 and the graph of the zero function. It is a continuous limit of summing the
values of 𝑓 over a finer and finer mesh.
Lest the reader conclude that the last column is always positive, it is known that, in fact, the
function Li(𝑥) − 𝜋(𝑥) changes sign infinitely many times. Also, while all three functions in (28)
grow at the same rate, the logarithmic integral Li(𝑥) gives a much better approximation to
𝜋(𝑥) than the ratio 𝑥/ln(𝑥).
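The first few rows of such a comparison can be recomputed at home. The Python sketch below is an illustration under stated assumptions: the function names are ours, the integral in (28) is approximated by Simpson's rule with a fixed number of steps, and no rounding of Li(𝑥) is applied, so the printed values need not match the tabulated data digit for digit.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of primes <= limit."""
    flags = [True] * (limit + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = False
    return [n for n in range(2, limit + 1) if flags[n]]


def prime_count(x):
    """pi(x): the number of primes p <= x."""
    return len(primes_up_to(x))


def Li(x, steps=100_000):
    """Approximate the integral of dy / ln(y) from 2 to x by Simpson's rule."""
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even step count
    h = (x - 2) / steps
    total = 1 / math.log(2) + 1 / math.log(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) / math.log(2 + k * h)
    return total * h / 3


for k in range(1, 7):
    x = 10**k
    pi_x = prime_count(x)
    print(x, pi_x, Li(x) / pi_x - 1, x / math.log(x) / pi_x - 1)
```

The last two printed columns show the relative errors of Li(𝑥) and of 𝑥/ln(𝑥) side by side, which makes the difference in quality between the two approximations visible.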
The prime number theorem was first shown by Hadamard and de la Vallée Poussin
in 1896, so more than 2000 years after Eratosthenes. Certainly, many additional ideas were
required, and are still required today to prove (28). Therefore, we will say very little about
the proof. The reader interested in a heuristic derivation of the 1/ln(𝑁) density from unique
factorization can find it here (requires familiarity with integrals).
To extract the distribution of primes from (93), Hadamard and de la Vallée Poussin
had to use some properties of 𝜁(𝑠) for complex values of 𝑠. What happens with 𝜁(𝑠) for
complex 𝑠 involves some of the deepest problems in all of mathematics, including the infinitely
famous Riemann hypothesis (RH), still completely open today. The RH says that all solutions
of 𝜁(𝑠) = 0 are either the so-called trivial zeros 𝑠 = −2, −4, −6, . . . or have real part ℜ𝑠 = 1/2.
The remarkable 1/2 from the Riemann Hypothesis can in fact be seen in the table (29)
if one notices that the number of 0’s in the last column is about half the number of digits
of 𝜋(𝑥), meaning that the difference 𝜋(𝑥) − Li(𝑥) is of the order $x^{1/2}$, give or take some
logarithmic factors. If there were a zero with ℜ𝑠 = 𝑐 > 1/2, the error 𝜋(𝑥) − Li(𝑥) would be
at least of size $x^c$, and the argument of Hadamard and de la Vallée Poussin was really about
proving that ℜ𝑠 < 1 for all zeros of the 𝜁-function.
While this is an incredibly interesting topic, the plot of our narrative follows a dif-
ferent path. Asked about the RH, James Maynard says: “The Riemann Hypothesis suggests
that there is a deep hidden structure within the prime numbers. This must occur for a good
reason - we just do not know what the reason is yet.”
7. Inclusion–exclusion
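Let $\mathcal{A}$ be a set of integers, or even of objects of arbitrary nature. A very, very abstract
formulation of a sieve involves some subsets $\mathcal{A}_p \subset \mathcal{A}$, labelled by 𝑝 in some index set $\mathcal{P}$,
which we wish to remove or sift out from the set $\mathcal{A}$. In other words, our goal is to understand
the complement $\mathcal{A} \setminus \bigcup_{p \in \mathcal{P}} \mathcal{A}_p$ of the union of all sets $\mathcal{A}_p$ in $\mathcal{A}$.
In its most basic form, the principle of inclusion–exclusion refers to the following
elementary observation. Assuming the number of elements $|\mathcal{A}|$ is finite, we have
$$\begin{aligned}
\Bigl| \mathcal{A} \setminus \bigcup_{p \in \mathcal{P}} \mathcal{A}_p \Bigr| = {} & |\mathcal{A}| && \text{count all elements of } \mathcal{A} \\
& - \sum_{p} |\mathcal{A}_p| && \text{subtract } |\mathcal{A}_p| \text{ for each } p \\
& + \sum_{p_1 < p_2} |\mathcal{A}_{p_1} \cap \mathcal{A}_{p_2}| && \text{correct for subtracting twice} \\
& - \sum_{p_1 < p_2 < p_3} |\mathcal{A}_{p_1} \cap \mathcal{A}_{p_2} \cap \mathcal{A}_{p_3}| + \dots && \text{et cetera.}
\end{aligned}$$ (30)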
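For example, referring back to table (9) we may take
$$\mathcal{A} = \{\text{residues modulo } 10\} , \qquad \mathcal{A}_p = \{\text{multiples of } p\} , \quad p \in \mathcal{P} = \{2, 5\} ,$$
in which case (30) gives
$$\Bigl| \mathcal{A} \setminus \bigcup_{p = 2, 5} \mathcal{A}_p \Bigr| = |\{\text{residues coprime to } 10\}| = 10 - 5 - 2 + 1 .$$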
In other words, subtracting 5 multiples of 2 and 2 multiples of 5, we subtract the zero residue
twice, as the shading in table (9) illustrates. Hence we have to put it back.
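If the subsets $\mathcal{A}_p \subset \mathcal{A}$ correspond to independent events, meaning that
$$\frac{|\mathcal{A}_{p_1} \cap \mathcal{A}_{p_2} \cap \dots \cap \mathcal{A}_{p_r}|}{|\mathcal{A}|} = \prod_{i=1}^{r} \frac{|\mathcal{A}_{p_i}|}{|\mathcal{A}|} ,$$ (31)
then formula (30) factors very nicely
$$\frac{\bigl| \mathcal{A} \setminus \bigcup_{p \in \mathcal{P}} \mathcal{A}_p \bigr|}{|\mathcal{A}|} = \prod_{p \in \mathcal{P}} \Bigl(1 - \frac{|\mathcal{A}_p|}{|\mathcal{A}|}\Bigr) ,$$ (32)
special instances of which we have observed in (17), (18), and (25).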

For us, $\mathcal{A}$ will always be some set of integers or residue classes and $\mathcal{A}_d \subset \mathcal{A}$ will
denote those divisible by some number 𝑑. In this case, all possible intersections in (30) can
be described very concretely
$$\mathcal{A}_{p_1} \cap \mathcal{A}_{p_2} \cap \dots \cap \mathcal{A}_{p_r} = \mathcal{A}_{p_1 p_2 \cdots p_r} ,$$ (33)
as is illustrated for P = {2, 3, 5} in the following diagram. In (34), we visualize a composite

number as a kind of molecule formed by its factors. The primes in P are assigned three
different colors.
(34)
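If (33) is the case, the terms in formula (30) correspond to square-free integers 𝑛 all prime
factors of which belong to $\mathcal{P}$. Thus (30) may be written more compactly
$$\Bigl| \mathcal{A} \setminus \bigcup_{p \in \mathcal{P}} \mathcal{A}_p \Bigr| = \sum_{d=1}^{\infty} \mu_{\mathcal{P}}(d)\, |\mathcal{A}_d| ,$$ (35)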
using a variant of the Möbius function
$$\mu_{\mathcal{P}}(d) = \begin{cases} (-1)^r , & d \text{ is a product of } r \text{ distinct primes in } \mathcal{P} , \\ 0 , & \text{otherwise.} \end{cases}$$ (36)
A more flexible language for the inclusion-exclusion principle uses the notion of characteristic
functions. For any subset $S \subset \mathcal{A}$, we define its characteristic function $\delta_S$ by
$$\delta_S(n) = \begin{cases} 1 , & n \in S , \\ 0 , & n \notin S . \end{cases}$$ (37)
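Then (35) can be refined to
$$\delta_{\mathcal{A} \setminus \bigcup_{p \in \mathcal{P}} \mathcal{A}_p} = \sum_{d=1}^{\infty} \mu_{\mathcal{P}}(d)\, \delta_{\mathcal{A}_d} .$$ (38)
Since
$$|S| = \sum_{n} \delta_S(n) ,$$ (39)
summing the values in (38) gives (35).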
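Formula (35) can be tested directly in the simplest situation. In the Python sketch below, which is an illustration with hypothetical function names, we assume that the set being sifted is {1, . . . , 𝑁} and that the subset labelled by 𝑑 consists of the multiples of 𝑑, so that it has ⌊𝑁/𝑑⌋ elements and the infinite sum stops at 𝑑 = 𝑁.

```python
def mobius_P(d, primes):
    """(-1)**r if d is a product of r distinct primes from `primes`, else 0; see (36)."""
    r = 0
    for p in primes:
        if d % p == 0:
            d //= p
            if d % p == 0:          # p**2 divides the original d
                return 0
            r += 1
    # a leftover factor means d has a prime factor outside the given set
    return (-1) ** r if d == 1 else 0


def sifted_count(N, primes):
    """Right-hand side of (35) for the integers 1..N sifted by the given primes."""
    return sum(mobius_P(d, primes) * (N // d) for d in range(1, N + 1))


def brute_force_count(N, primes):
    """Direct count of n in 1..N not divisible by any of the given primes."""
    return sum(1 for n in range(1, N + 1) if all(n % p for p in primes))


print(sifted_count(10, [2, 5]), brute_force_count(10, [2, 5]))
print(sifted_count(1000, [2, 3, 5]), brute_force_count(1000, [2, 3, 5]))
```

With 𝑁 = 10 and the primes 2 and 5 the only nonzero terms are 𝑑 = 1, 2, 5, 10, which reproduces the count 10 − 5 − 2 + 1.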

Formulas (35) and (39) require no assumption of independence like (31). This is very
good because (31) is satisfied only approximately in the vast majority of sieve problems.
Independence being only approximate is, in fact, a serious problem, to which we will come
back below.
Another difficulty one encounters in real number-theoretic applications is that the
set $\mathcal{A}$ is typically infinite. For example, we can have $\mathcal{A} = \mathbb{N}$, where ℕ = {1, 2, . . . } is the
set of natural numbers. The solution to this problem is to count elements $n \in \mathcal{A}$ not with
weight 1 as in (39), but with some weight 𝜌(𝑛) such that the count converges. Schematically
$$|\mathcal{A}| = \sum_{n \in \mathcal{A}} 1 \quad \xrightarrow{\ \text{generalize}\ } \quad \rho(\mathcal{A}) = \sum_{n \in \mathcal{A}} \rho(n) .$$
An example of such a weight function is
$$\rho_{\zeta}(n) = n^{-s} , \qquad s > 1 ,$$ (40)
used in the construction of the 𝜁-function. Multiplicativity of 𝜌
$$\rho(n_1 n_2) = \rho(n_1)\, \rho(n_2) ,$$ (41)
satisfied by (40) and some other choices of 𝜌, implies an analog of independence (31) for
weighted counts. For example, for $\mathcal{A} = \mathbb{N}$, $\mathcal{A}_p = p\mathbb{N}$, and a function 𝜌 satisfying (41), formula
(32) transforms into
$$\frac{\sum_{n \text{ coprime to } \mathcal{P}} \rho(n)}{\sum_{n \in \mathbb{N}} \rho(n)} = \prod_{p \in \mathcal{P}} \bigl(1 - \rho(p)\bigr) .$$ (42)
We invite the reader to generalize formula (42) for functions 𝜌 satisfying a weaker property
gcd(𝑛1 , 𝑛2 ) = 1 ⇒ 𝜌(𝑛1 𝑛2 ) = 𝜌(𝑛1 ) 𝜌(𝑛2 ) . (43)
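Formula (42) can be probed numerically for the weight (40). The Python sketch below is an illustration only: the function names are hypothetical, and both infinite sums are truncated at a finite bound, so the two sides agree only approximately.

```python
def weighted_ratio(primes, s, N=200_000):
    """Left side of (42) for rho(n) = n**(-s), with both sums truncated at N."""
    numerator = denominator = 0.0
    for n in range(1, N + 1):
        weight = n ** (-s)
        denominator += weight
        if all(n % p for p in primes):      # n is coprime to every prime in the set
            numerator += weight
    return numerator / denominator


def euler_factor_product(primes, s):
    """Right side of (42): the product of (1 - p**(-s)) over the given primes."""
    result = 1.0
    for p in primes:
        result *= 1.0 - p ** (-s)
    return result


print(weighted_ratio([2, 3], 2.0), euler_factor_product([2, 3], 2.0))
print(weighted_ratio([2, 3, 5], 1.5), euler_factor_product([2, 3, 5], 1.5))
```

The agreement is better for larger 𝑠, since the truncated tails of the sums decay faster.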
Other than (40), what other interesting functions satisfy (41)? For every 𝑁, the set
$$(\mathbb{Z}/N\mathbb{Z})^{\times} = \{\text{residue classes } a \bmod N \text{ such that } \gcd(a, N) = 1\}$$ (44)
is a finite abelian group with respect to multiplication. We take a character 𝜒 of the group
(44), that is, a complex-valued multiplicative function with 𝜒(1) = 1, and extend it by zero
to all residues mod 𝑁. Examples of such functions are
$$\chi_3(n) = \begin{cases} 1 , & n = 1 \bmod 3 , \\ -1 , & n = -1 \bmod 3 , \\ 0 , & n = 0 \bmod 3 , \end{cases} \qquad \chi_5(n) = \begin{cases} i^m , & n = 2^m \bmod 5 , \\ 0 , & n = 0 \bmod 5 , \end{cases}$$ (45)
where the complex number $i = \sqrt{-1} \in \mathbb{C}$ is the imaginary unit. The weight
$$\rho_{N, \chi, s}(n) = \frac{\chi(n \bmod N)}{n^s} , \qquad s > 1 ,$$ (46)
satisfies (41) and the corresponding analog of the 𝜁-function
$$L(\chi, s) = \sum_{n=1}^{\infty} \frac{\chi(n \bmod N)}{n^s} , \qquad s > 1 ,$$ (47)
is called the Dirichlet L-function. Its properties are entirely parallel to the 𝜁-function with
one crucial difference. Namely, if 𝜒 is nontrivial, that is, takes values other than 0 and 1, then,
in contrast to the 𝜁 function having a singularity at 𝑠 = 1 as in (93), the L-function has a finite
nonzero value at 𝑠 = 1. This allowed Dirichlet to show that primes are equally distributed
among the residue classes (44).
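The two characters in (45) are small enough to experiment with. The following Python sketch is an illustration with hypothetical function names: it implements both characters, checks multiplicativity numerically, and evaluates partial sums of (47) at 𝑠 = 1. For the character mod 3 these partial sums settle near $\pi/(3\sqrt{3}) \approx 0.6046$, a classical closed form that we quote here only for comparison.

```python
import math

POWERS_OF_I = [1, 1j, -1, -1j]      # i**m for m = 0, 1, 2, 3


def chi3(n):
    """The character chi_3 from (45)."""
    return {0: 0, 1: 1, 2: -1}[n % 3]


def chi5(n):
    """The character chi_5 from (45): i**m when n = 2**m mod 5, zero when 5 divides n."""
    r = n % 5
    if r == 0:
        return 0
    m = next(m for m in range(4) if pow(2, m, 5) == r)
    return POWERS_OF_I[m]


def L_partial(chi, s, N):
    """Partial sum of the series (47) over n = 1..N."""
    return sum(chi(n) / n ** s for n in range(1, N + 1))


print(all(chi5(a * b) == chi5(a) * chi5(b) for a in range(1, 30) for b in range(1, 30)))
print(L_partial(chi3, 1, 10**5), math.pi / (3 * math.sqrt(3)))
print(L_partial(chi5, 1, 10**5))
```

In contrast, the same partial sums with all coefficients equal to 1 are the harmonic sums, which grow without bound.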
8. The first challenge for sieves

As already emphasized above, the main difficulty with sieves is the fact that the
independence (31) is only approximate and not exact. Here is an example. Take some large
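number 𝑥 and consider the sets
$$\mathcal{A} = \{\text{integers } n \text{ such that } \sqrt{x} < n \le x\} , \qquad \mathcal{P} = \{\text{primes } p \text{ such that } p \le \sqrt{x}\} .$$ (48)
After sifting out $\mathcal{P}$, we get precisely the primes in the range $(\sqrt{x}, x]$, hence
$$\Bigl| \mathcal{A} \setminus \bigcup_{p \in \mathcal{P}} \mathcal{A}_p \Bigr| = \pi(x) - \pi(\sqrt{x}) \sim \frac{x}{\ln x} ,$$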
by the prime number theorem. Let’s see if, conversely, we can recover the prime number
theorem from the sieve (48).
For fixed $p_1, \dots, p_r$, the equality (31) is satisfied in the limit 𝑥 → ∞. However,
the error terms present for finite 𝑥 render the following reasoning incorrect. To warn the
reader, we will use $\overset{???}{=}$ to denote an incorrect equality. If we could just apply (32) to the 𝑥 → ∞
asymptotics, we would get
$$\frac{\pi(x) - \pi(\sqrt{x})}{x - \sqrt{x}} \sim \frac{\pi(x)}{x} \overset{???}{\sim} \prod_{\text{primes } p \le \sqrt{x}} \Bigl(1 - \frac{1}{p}\Bigr) , \qquad x \to \infty .$$ (49)
Having seen products of this general shape before, the reader should not be surprised by the
following exact result of F. Mertens
$$\prod_{\text{primes } p \le \sqrt{x}} \Bigl(1 - \frac{1}{p}\Bigr) \sim \frac{2 e^{-\gamma}}{\ln x} ,$$ (50)
where 𝛾 is the number from (22) and (93). Since $2e^{-\gamma} \approx 1.123$, this is somewhat close to the
right answer and, in particular, gives the correct logarithmic dependence on 𝑥, but little else
can be said in defence of a wrong formula.
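The mismatch between the naive sieve prediction and the truth can be seen numerically. The Python sketch below is an illustration under our own naming; the numerical value of 𝛾 is entered by hand as an assumption of the sketch. It multiplies both the product in (49) and the true density 𝜋(𝑥)/𝑥 by ln 𝑥, so that the two can be compared with $2e^{-\gamma}$ and with 1, respectively.

```python
import math

EULER_GAMMA = 0.5772156649015329    # numerical value of the constant in (14)


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of primes <= limit."""
    flags = [True] * (limit + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = False
    return [n for n in range(2, limit + 1) if flags[n]]


def naive_sieve_density(x):
    """The product of (1 - 1/p) over primes p <= sqrt(x), as in (49)."""
    result = 1.0
    for p in primes_up_to(math.isqrt(x)):
        result *= 1.0 - 1.0 / p
    return result


def true_prime_density(x):
    """pi(x) / x."""
    return len(primes_up_to(x)) / x


x = 10**6
print(naive_sieve_density(x) * math.log(x), 2 * math.exp(-EULER_GAMMA))
print(true_prime_density(x) * math.log(x))
```

Both rescaled quantities are of order 1, but they visibly converge to different constants, which is the point of the warning above.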
This example is meant to illustrate that it is not easy to construct a good sieve, and
not to discourage the reader from reading on! See also the references in Section 11, and in
particular [7].
9. Patterns in primes
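So far, we have looked at primes individually, meaning that we studied expressions
like
$$\pi(x) = \sum_{\text{primes } p} \delta_{[1,x]}(p) , \qquad \text{where } \delta_{[1,x]}(y) = \begin{cases} 1 , & y \in [1, x] , \\ 0 , & \text{otherwise} , \end{cases}$$
$$\ln \zeta(s) = - \sum_{\text{primes } p} \ln \Bigl(1 - \frac{1}{p^s}\Bigr) ,$$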
given by summing some natural function 𝑓 ( 𝑝) over the set of all primes. To a general science
audience, we can say that we have been learning about 1-point correlations in the set of
primes.
Recall we expect the primes to be as “random” as the constraints imposed by residues
and density allow. To really put these ideas to the test, one should study multi-point correla-
tions, that is, events or patterns that involve pairs, triples, etc. of primes.
To start with a concrete example, what is the probability that 𝑛 and 𝑛 + 1 are both
prime? The answer is clearly 0 because one of these numbers will have to be even, and so
𝑛 = 2 is the only solution. What about 𝑛 and 𝑛 + 2 being simultaneously prime? Such pairs
are called twin primes and we saw many such pairs (green) in the Eratosthenes’ sieve (3).
Similarly, in the plot (5), twin primes are shown in green, all other primes in blue.
Twin primes provide an excellent test of our probabilistic intuition based on density
and mod 𝑝 considerations. From density alone, we should expect that the density of twin
primes around 𝑁 should be about $(\ln N)^{-2}$. However, this needs to be corrected from mod 𝑝
considerations. Indeed, if 𝑛 and 𝑛 + 2 were truly independent, the probability of both of them
being coprime to 𝑝 would be $(1 - 1/p)^2$, while in reality it is 1/2 for 𝑝 = 2 and (1 − 2/𝑝) for
𝑝 > 2. Whence the following constant in the 1923 conjecture of Hardy and Littlewood
$$\pi_2(x) = |\{p \le x \text{ such that } p + 2 \text{ is prime}\}| \overset{?}{\sim} C_2 \int_2^x \frac{dy}{(\ln y)^2} , \qquad x \to \infty ,$$ (51)
where
$$C_2 = 2 \prod_{\text{primes } p > 2} \frac{1 - \frac{2}{p}}{\bigl(1 - \frac{1}{p}\bigr)^2} = 1.32 \dots .$$ (52)
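The conjecture (51) invites a numerical experiment. The Python sketch below is an illustration with hypothetical function names and two stated approximations: the infinite product (52) is truncated at a finite bound, and the integral in (51) is evaluated by Simpson's rule.

```python
import math


def sieve_flags(limit):
    """Boolean list with flags[n] == True exactly when n is prime, for 0 <= n <= limit."""
    flags = [True] * (limit + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = False
    return flags


def twin_prime_count(x):
    """pi_2(x): the number of primes p <= x such that p + 2 is also prime."""
    flags = sieve_flags(x + 2)
    return sum(1 for p in range(2, x + 1) if flags[p] and flags[p + 2])


def twin_prime_constant(limit=10**5):
    """The product (52), truncated to the primes 2 < p <= limit."""
    flags = sieve_flags(limit)
    result = 2.0
    for p in range(3, limit + 1):
        if flags[p]:
            result *= p * (p - 2) / (p - 1) ** 2    # equals (1 - 2/p) / (1 - 1/p)**2
    return result


def hardy_littlewood_integral(x, steps=100_000):
    """Simpson's rule for the integral of dy / (ln y)**2 from 2 to x."""
    h = (x - 2) / steps
    total = 1 / math.log(2) ** 2 + 1 / math.log(x) ** 2
    for k in range(1, steps):
        total += (4 if k % 2 else 2) / math.log(2 + k * h) ** 2
    return total * h / 3


C2 = twin_prime_constant()
for x in (10**3, 10**4, 10**5):
    print(x, twin_prime_count(x), C2 * hardy_littlewood_integral(x))
```

The default number of Simpson steps is even, as the rule requires; callers who change it should keep it even.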
In exactly the same fashion, the probability that 𝑛 and 𝑛 + 2𝑚 are both coprime to 𝑝 equals
(1 − 1/𝑝) if 𝑝 divides 2𝑚 and (1 − 2/𝑝) otherwise. Therefore, for any fixed 𝑚 one can
conjecture that
$$|\{p \le x \text{ such that } p + 2m \text{ is prime}\}| \overset{?}{\sim} C_{2m} \int_2^x \frac{dy}{(\ln y)^2} , \qquad x \to \infty ,$$ (53)
where
$$\frac{C_{2m}}{C_2} = \prod_{p \mid m ,\ p \ne 2} \frac{p - 1}{p - 2} \ge 1 .$$ (54)
From this, it is clear that products of consecutive odd primes like 1155 = 3 · 5 · 7 · 11 should be
particularly likely to occur as values of $m = (p_2 - p_1)/2$ for pairs of odd primes, while powers of two are the
least likely values. In (55) the function (54) is plotted in the ranges 𝑚 ∈ [1 . . . 105]
and 𝑚 ∈ [1 . . . 1155], respectively⁶.
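The function (54) is simple to tabulate exactly. The Python sketch below is an illustration; the function names `odd_prime_divisors` and `relative_frequency` are our own. It uses rational arithmetic so that the peaks at products of small odd primes can be read off without rounding.

```python
from fractions import Fraction


def odd_prime_divisors(m):
    """List of the distinct odd primes dividing m (for m >= 1), by trial division."""
    primes = []
    while m % 2 == 0:
        m //= 2                      # the prime 2 is excluded in (54)
    p = 3
    while p * p <= m:
        if m % p == 0:
            primes.append(p)
            while m % p == 0:
                m //= p
        p += 2
    if m > 1:
        primes.append(m)             # whatever is left is itself an odd prime
    return primes


def relative_frequency(m):
    """The ratio C_{2m} / C_2 from (54): product of (p - 1)/(p - 2) over odd primes p | m."""
    ratio = Fraction(1)
    for p in odd_prime_divisors(m):
        ratio *= Fraction(p - 1, p - 2)
    return ratio


for m in (1, 2, 3, 8, 15, 105, 1155):
    print(m, relative_frequency(m), float(relative_frequency(m)))
```

Powers of two give the minimal value 1, while each new odd prime factor multiplies the ratio by a number greater than 1.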
(55)
The conjecture (53) is in excellent agreement with data, especially if one considers the relative
frequencies of distances. The following plot (56) compares the function $C_{2m}$ with the actual
distribution of the distances among the first $10^6$ odd primes:
(56)
6 The reader may have to adjust the size/resolution of the graph to see the peak at 1155
In (56) we have plotted the relative frequencies, normalized to exactly 1 for 𝑚 = 1. The numer-
ical data is in light blue and the theoretical prediction is in dark blue. The latter overshoots
(with the exception of 𝑚 = 18) the former by less than 1%, so it is just barely visible in the
plot. Had we gone any deeper in the list of primes, the difference between the graphs would have become
undetectable.
We note that the above discussion is for distances between primes, while a prime gap
of length 2𝑚 means there are no other primes between 𝑝 and 𝑝 + 2𝑚. However, since primes
become sparser and sparser, finding another prime in an interval of fixed length becomes less
and less probable as 𝑝 → ∞.
The exact same heuristic can be applied to any finite set of jumps
$$J = \{ j_1 < j_2 < \dots < j_l \} \subset \mathbb{N}$$ (57)
that we would like to find between primes. We denote by $n + J = \{n + j_1 < \dots < n + j_l\}$ the
shift of 𝐽 by 𝑛 ∈ ℕ and by $n + J \subset \mathbb{P}$ the event that all numbers $n + j_i$ are prime. In parallel
to (53), it is natural to expect that
$$|\{n \le x \text{ such that } n + J \subset \mathbb{P}\}| \overset{?}{\sim} C_J \int_2^x \frac{dy}{(\ln y)^{|J|}} , \qquad x \to \infty ,$$ (58)
where
$$C_J = \prod_{p} \frac{1 - \frac{|J \bmod p|}{p}}{\bigl(1 - \frac{1}{p}\bigr)^{|J|}} .$$ (59)
Here |𝐽 mod 𝑝| is the number of distinct residue classes mod 𝑝 in 𝐽. Since, for fixed 𝐽, this
equals |𝐽| for all sufficiently large 𝑝, the contribution of all such 𝑝 to (59) is $1 + O(1/p^2)$.
Therefore, the product (59) converges.
It is clear from (59) that patterns in primes favor those 𝐽 that contain a small fraction of residues
modulo some prime 𝑝 and prohibit those 𝐽 for which |𝐽 mod 𝑝| = 𝑝. It is also clear from the definitions
that it suffices to consider the case $j_1 = 0$. The graph of the function
$C_{\{0, 2i, 2(i+j)\}} / C_{\{0, 2, 6\}}$ is plotted on the left. It vanishes unless 𝑖𝑗(𝑖 + 𝑗) = 0 mod 3,
which explains the missing columns in the plot.
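Whether the constant (59) vanishes can be decided by a finite check, since a set 𝐽 can cover all residues mod 𝑝 only if 𝑝 ≤ |𝐽|. The Python sketch below is an illustration with hypothetical function names; it tests this condition and also evaluates the product (59) truncated to primes below a chosen bound, which is an approximation of the infinite product.

```python
def small_primes(limit):
    """Primes <= limit by trial division (adequate for the small limits used here)."""
    return [p for p in range(2, limit + 1)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]


def residues_covered(J, p):
    """|J mod p|: the number of distinct residue classes mod p occupied by J."""
    return len({j % p for j in J})


def has_nonzero_constant(J):
    """True when no prime p has all of its residue classes occupied by J."""
    J = set(J)
    return all(residues_covered(J, p) < p for p in small_primes(len(J)))


def singular_series(J, limit=10_000):
    """The product (59) truncated to primes p <= limit."""
    J = set(J)
    result = 1.0
    for p in small_primes(limit):
        result *= (1 - residues_covered(J, p) / p) / (1 - 1 / p) ** len(J)
    return result


for J in ([0, 2], [0, 2, 4], [0, 2, 6], [0, 4, 6], [0, 2, 6, 8]):
    print(J, has_nonzero_constant(J), singular_series(J))
```

The set {0, 2, 4} fails the check because of the prime 3, which is the observation behind footnote 1, while for 𝐽 = {0, 2} the truncated product reproduces the constant (52).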
10. Closing the gap
Let us call a pattern 𝐽 as in (57) admissible if 𝐶𝐽 ≠ 0, that is, if it has a nonzero chance
to occur in prime numbers. As a very, very special case of the above heuristic reasoning,
one expects that any admissible pattern 𝐽 will occur as a sequence of prime gaps infinitely
many times⁷. In particular, one expects the set of twin primes to be infinite. This is known
as the twin prime conjecture, and it is still open today. However, in contrast to the Riemann
Hypothesis, there has been truly dramatic progress in recent years on such infinitude
questions. This progress has been so dramatic that it inspires us to say that these conjectures
are “almost” proven. It is quite incredible to see humans actually reach for the stars.
James Maynard does not quite agree with the narrator here. He says: “Despite all
the recent progress, it seems we are still missing an important idea to prove the Twin Prime
Conjecture. But perhaps it is only one big idea.”
Of course, the actual mathematics involved in proofs compares to what we have
discussed so far like a modern airplane compares to a paper airplane. But if the reader tried
to think about the issues discussed in Section 8, then she or he may begin to appreciate the
amazing creativity and technical mastery required to design sieving arguments leading to the
proofs of the breakthrough results below.
It is clear from the prime number theorem that for any constant 𝑐 > 1 there are
infinitely many pairs of primes 𝑝 1 and 𝑝 2 such that
𝑝 1 < 𝑝 2 < 𝑝 1 + 𝑐 ln 𝑝 1 . (60)
Proving the same statement for some value 𝑐 < 1 is not easy. Many brilliant mathematicians
worked on this, finding proofs for smaller and smaller values of 𝑐, until Goldston, Pintz and
Yıldırım showed that for any constant 𝑐 > 0 there are infinitely many pairs of primes
satisfying (60).
The new important ideas introduced by Goldston, Pintz and Yıldırım opened the
race to replace 𝑐 ln 𝑝 1 in (60) by some fixed constant 𝐵, that is to prove the infinitude of pairs
of primes that are within a fixed finite distance
𝑝1 < 𝑝2 ≤ 𝑝1 + 𝐵 (61)
from each other. This race was won in a very dramatic fashion in April 2013 by Yitang Zhang.
Even much more modest results in mathematics today require finding a new way
through a real maze of possible ideas, techniques, and logical constructions, and hence
moments of extraordinary concentration and clarity of mind. This is not unlike the need
to be in a really, really top form for an athlete to set a world record. Research mathematicians
(who do have time to do research as part of their job description, in addition to teaching,
advising, and other professional duties) cherish these precious moments. Most athletes and
mathematicians will surely agree that these special moments tend to be spaced further than
ln 𝑁 apart once we are past our prime. Zhang’s proof is therefore particularly incredible
and inspiring, since he had to find his way not just through the mathematical maze, but also
through the many turns of his difficult career outside of academia, not giving up despite the
big success finally coming to him only at the age of 55. His achievement was widely
celebrated by the community, earning him a number of prestigious prizes including the 2013
Ostrowski Prize, the 2014 Cole Prize in Number Theory, and the 2014 Rolf Schock Prize. In
the same year 2014, the Cole Prize in Number Theory was also awarded to Goldston, Pintz
and Yıldırım for their influential work mentioned above.
7 This specific statement is known as the Dickson conjecture, made in 1904.
We hope the reader will turn to [8, 13, 17, 19, 30, 31] to learn more about these
developments. We now turn to the main hero of these popular notes, the winner of many awards
including the 2022 Fields Medal. In the same eventful year 2013, James Maynard realized
he could make the sieve a lot more effective, eclipsing Zhang’s result in two key dimensions:
getting a much stronger result by an easier method.
Speaking about the influences and inspirations that have led to this result, James
Maynard says: “I was trying to understand the sieve intuition behind the groundbreaking
work of Goldston-Pintz-Yıldırım, but in studying this I realised that it might be possible to
modify their ideas to go further.”
It is commonly said that great minds think alike, and the same sometimes happens to
the greatest minds, also. In the suspenseful race to close the prime gap, Terry Tao arrived at the
same results independently at the same time as James Maynard. “I was a bit shocked when I
first heard the news, but fortunately Tao was very generous and understanding. Simultaneous
discovery happens more often than you’d imagine!”, says James Maynard.
To explain Maynard’s and Tao’s main result on small gaps in primes, it is important
to make a certain change of perspective. In Section 9, we were interested in the event when
all numbers
𝑛 + 𝐽 = (𝑛 + 𝑗 1 , 𝑛 + 𝑗 2 , . . . , 𝑛 + 𝑗𝑙 ) (62)
are prime. But if one asks for less one can prove more! Let’s instead fix some 𝑚 < 𝑙 and
ask that at least 𝑚 of the numbers (62) are prime for infinitely many values of 𝑛. We will
not know which ones among (62) are prime, but we will know, for instance, that there are
infinitely many primes within distance 𝑗𝑙 − 𝑗 1 from each other.
The following is a special case of the spectacular main result of [20], which Kannan
Soundararajan compares with “sun amidst the stars” in his Fields Medal laudatio.
Theorem 1. For any 𝑚, for all sufficiently long admissible patterns 𝐽, at least 𝑚 of the
numbers (62) are prime for infinitely many 𝑛.
In fact, for any given 𝑚, the required size of 𝐽 in Theorem 1 can be made explicit.
For 𝑚 = 2, |𝐽 | = 50 suffices, and the following set being admissible
𝐽 = {0, 4, 6, 16, 30, 34, 36, 46, 48, 58, 60, 64, 70, 78, 84, 88, 90, 94, 100, 106,
108, 114, 118, 126, 130, 136, 144, 148, 150, 156, 160, 168, 174, 178, 184,
190, 196, 198, 204, 210, 214, 216, 220, 226, 228, 234, 238, 240, 244, 246} , (63)
shows there are infinitely many primes at most 246 apart.
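Admissibility of a concrete pattern is a finite check: a set of 𝑙 numbers cannot cover all residue classes modulo a prime 𝑝 > 𝑙, so only primes 𝑝 ≤ 𝑙 need to be examined. The sketch below (the helper name `is_admissible` is ours) checks the set (63) in this way.

```python
def is_admissible(J):
    """True if, for every prime p, the set J misses some residue class mod p.

    Only p <= len(J) can possibly be fully covered, so the loop is finite.
    """
    for p in range(2, len(J) + 1):
        is_prime = all(p % q for q in range(2, int(p ** 0.5) + 1))
        if is_prime and len({j % p for j in J}) == p:
            return False
    return True


# The set (63), copied from the text.
J63 = [0, 4, 6, 16, 30, 34, 36, 46, 48, 58, 60, 64, 70, 78, 84, 88, 90, 94, 100, 106,
       108, 114, 118, 126, 130, 136, 144, 148, 150, 156, 160, 168, 174, 178, 184,
       190, 196, 198, 204, 210, 214, 216, 220, 226, 228, 234, 238, 240, 244, 246]

print(len(J63), J63[-1] - J63[0], is_admissible(J63))
print(is_admissible([0, 2, 4]))  # covers all classes mod 3, hence not admissible
```

The same function can be used to experiment with other patterns, for instance tuples of primes that are all larger than the length of the tuple.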
For 𝑚 = 3, |𝐽| = 35410 suffices, and one can take⁸, for instance, the first 35410
primes larger than 35410
𝐽 = {35419, 35423, . . . , 469397, 469411} .
Therefore, there are infinitely many triples of primes within 433992 of each other. In general,
the best estimate for the required length of 𝐽 currently stands at 𝑐𝑒^{3.815𝑚}, see [1].
The more general result proven in [20] guarantees there are at least 𝑚 primes among
the numbers 𝑎 1 𝑛 + 𝑗 1 , . . . , 𝑎 𝑙 𝑛 + 𝑗𝑙 provided these are distinct and admissible. This stronger
version of Theorem 1 leads to many further interesting conclusions about patterns in primes.
For example, one can deduce that there are arbitrarily large sets of primes where any pair in
the set differs in only 2 decimal places! Indeed, if we take
$$ a_i = l!\, 10^{l+2}, \qquad j_i = 10^{i+1} + 1, \tag{64} $$
then all digits of 𝑎 𝑖 𝑛 + 𝑗 𝑖 , 𝑖 = 1, . . . , 𝑙 are the same, except the position of the 1 in the (𝑖 + 1)st
decimal place, which is changing its position within the string of 𝑙 zeros.
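To see the digit pattern behind (64) concretely, here is a small sketch that lists the numbers 𝑎𝑖 𝑛 + 𝑗𝑖 for given 𝑙 and 𝑛 and counts the decimal places in which two of them differ. It does not test primality; the theorem only asserts that many of these numbers are simultaneously prime for infinitely many 𝑛. The function names are illustrative.

```python
from math import factorial


def digit_family(l, n):
    """The numbers a_i * n + j_i from (64), for i = 1, ..., l."""
    a = factorial(l) * 10 ** (l + 2)
    return [a * n + 10 ** (i + 1) + 1 for i in range(1, l + 1)]


def differing_places(x, y):
    """Number of decimal positions in which x and y differ (padded to equal width)."""
    width = max(len(str(x)), len(str(y)))
    return sum(c1 != c2 for c1, c2 in zip(str(x).zfill(width), str(y).zfill(width)))


family = digit_family(4, 7)
print(family)  # [168000101, 168001001, 168010001, 168100001]
print({differing_places(x, y) for x in family for y in family if x != y})  # {2}
```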
I hope the readers share the narrator’s sense of awe at this absolutely amazing math-
ematics and join me in warmest congratulations on it being recognized by the Fields Medal.
I also hope the readers got the sense that today’s mathematics is not just extraordinarily pow-
erful, but also concrete, understandable, and fun, once one finds the right idea and the right
point of view. While finding that right point of view is not at all easy, my biggest hope is to
have inspired my youngest readers to believe that mathematics can be beautiful and reward-
ing, both as a subject and as a profession. Maybe this is also a good place for me to thank
James Maynard and Kannan Soundararajan for this special opportunity to be introduced to
their wonderful subject.
11. Further reading

The Quanta Magazine has published several popular accounts of these and related
developments, see [11, 13–15, 19].
Among surveys written by top experts in the field, one should mention [5, 8, 17, 26],
including expositions by James Maynard himself [21–23].
Among textbooks of different level, the reader will surely find something which suits
her or his level and style among [3, 9, 16, 27, 28] or the more advanced [4, 12]. There is even
a graphic detective novel [10]!
I hope the reader has a lot of fun studying these sources as well as the original articles
[6, 20, 24, 25, 31].
8 As an exercise, the reader may check that any 𝑙-tuple of primes larger than 𝑙 is admissible.
12. A glimpse into the argument
To help the reader make the transition to further popular and research reading, we will
indicate some initial logical steps in the argument leading to the proof of Theorem 1. There
is a certain distance that we can fly even on our paper airplane.
12.1. Being prime on average

We need to prove that at least 𝑚 of the numbers (62) are prime for infinitely many 𝑛.
It suffices to show that for any given integer 𝑁 this is true for some 𝑛 ≥ 𝑁. Let P denote the
set of all primes. Instead of trying to find a specific 𝑛 for which the intersection {𝑛 + 𝐽} ∩ P
has at least 𝑚 elements, we can ask about the average size of the intersection {𝑛 + 𝐽} ∩ P
with respect to some density 𝜌(𝑛) ≥ 0 on [𝑁, . . . , 2𝑁]. This density 𝜌 is something we are
bringing into the argument, not something given to us in advance.
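Clearly,
$$ \operatorname{average} \big|\{n + J\} \cap \mathcal{P}\big| = \frac{\sum \rho(n)\, \big|\{n + J\} \cap \mathcal{P}\big|}{\sum \rho(n)} \le \max \big|\{n + J\} \cap \mathcal{P}\big|, \tag{65} $$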
and so if we can bound the average in (65) below by 𝑚 then we win. Now, since the numbers
𝑗 𝑘 ∈ 𝐽 are all distinct, we have
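$$ \frac{1}{\sum \rho(n)} \sum_{n=N}^{2N} \rho(n)\, \big|\{n + J\} \cap \mathcal{P}\big| = \sum_{k=1}^{l} \frac{\displaystyle\sum_{n + j_k \text{ is prime}} \rho(n)}{\displaystyle\sum_{N \le n \le 2N} \rho(n)}. \tag{66} $$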
Hence, our strategy is to invent a function 𝜌(𝑛) for which each of the 𝑙 ratios in the right-hand
side of (66) can be shown to be large.
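The bookkeeping in (65) and (66) can be checked on a small example. In the sketch below the pattern 𝐽 = (0, 2, 6), the range 𝑁 = 1000, and the density `rho` are arbitrary choices made only for illustration; exact fractions are used so that the two sides of (66) can be compared without rounding.

```python
from fractions import Fraction


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit (limit >= 2)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [p for p in range(2, limit + 1) if sieve[p]]


def average_and_ratios(J, N, rho):
    """Weighted average of |{n+J} ∩ P| over N <= n <= 2N and the l ratios in (66)."""
    prime_set = set(primes_up_to(2 * N + max(J)))
    ns = range(N, 2 * N + 1)
    total = sum(rho(n) for n in ns)
    hits = {n: sum((n + j) in prime_set for j in J) for n in ns}
    average = Fraction(sum(rho(n) * hits[n] for n in ns), total)
    ratios = [Fraction(sum(rho(n) for n in ns if (n + j) in prime_set), total) for j in J]
    return average, ratios, max(hits.values())


def rho(n):
    """A hypothetical positive integer-valued density, chosen only for this demo."""
    return n % 7 + 1


average, ratios, best = average_and_ratios((0, 2, 6), 1000, rho)
print(float(average), [float(r) for r in ratios], best)
```

For such a naive density every ratio is small, which is exactly why the choice of 𝜌 is the heart of the matter.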
12.2. Looking for 𝝆, part I

A naive strategy would be to take
1,

 𝑛+𝐽 ⊂ P,
𝜌0 (𝑛) = (67)
 0 , otherwise .

This makes the numerator and denominator in (66) equal, and so naively each fraction equals
1. What this overlooks is that 0/0 is no good in (66), and that our original goal is precisely
equivalent to showing that 𝜌0 takes some nonzero values.
This underscores the point that we haven’t really advanced on the problem yet, just
put it in a slightly more flexible framework by introducing the density 𝜌. Those who can design
a good 𝜌 are the great masters of the sieve.
Functions that only take values 0 or 1 are called characteristic functions as we recall
from (37). These are also the functions that are equal to their own square. From the definitions,
$$ \rho_0(n) = \delta_{[N,\dots,2N]}(n) \prod_{k=1}^{l} \delta_{\mathcal{P}}(n + j_k). \tag{68} $$
The next natural idea is to find a working replacement 𝛿̃ for 𝛿P and get 𝜌 by multiplying
them together.
Plots of the function 𝛿P look like barcodes, and here is an example
(69)
in which 𝑛 takes odd values from 10⁶ + 1 to 10⁶ + 599. In principle, (38) gives a formula for
𝛿P, and we can approach the goal of finding a replacement 𝛿̃ for 𝛿P by tinkering with the formula
(38). For instance, we can just truncate the summation over 𝑑 to some maximal value 𝐷. That is, we
define
$$ \tilde\delta_0(n) = \Bigg( \sum_{d \mid n,\ d \le D} \mu(d) \Bigg)^{2}, \tag{70} $$
where we square the sum to make the result nonnegative. Since this equals 1 if 𝑛 has no
nontrivial divisors 𝑑 ≤ 𝐷, it is natural to compare this function to the characteristic function
𝛿 ≤𝐷 of numbers without prime factors 𝑝 ≤ 𝐷.
It is easy to plot the function 𝛿̃0 − 𝛿≤𝐷 and the result
(71)
for 𝐷 = 100 is not really satisfying. The two peaks in the graph correspond to the numbers
1000109 = 11 · 23 · 59 · 67, 1000545 = 3 · 5 · 7 · 13 · 733,
and, in general, the function (70) becomes large not because 𝑛 is prime, but because there
is a significant imbalance between its divisors 𝑑 ≤ 𝐷 with different parity of the number of
prime factors. In other words, 𝛿̃0(𝑛) is much more sensitive to the artificial cutoff introduced
by us at 𝑑 ≤ 𝐷 than to what we set out to measure in the first place.
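The two peaks can be reproduced by hand or by a few lines of code. The following sketch evaluates the truncated sum (70) directly, with a Möbius function computed by trial division; the helper names are ours. For 𝑛 = 1000109 the divisors 𝑑 ≤ 100 are 1, 11, 23, 59, 67, giving 1 − 4 = −3, while for 𝑛 = 1000545 the divisors 𝑑 ≤ 100 consist of 1, four primes, and six products of two primes, giving 1 − 4 + 6 = 3.

```python
def mobius(d):
    """Möbius function mu(d), computed by trial division."""
    result = 1
    p = 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:      # p^2 divides the original number
                return 0
            result = -result
        p += 1
    if d > 1:                   # one remaining prime factor
        result = -result
    return result


def delta0(n, D):
    """The truncated and squared sum (70)."""
    s = sum(mobius(d) for d in range(1, D + 1) if n % d == 0)
    return s * s


def delta_le(n, D):
    """Characteristic function of numbers without prime factors p <= D."""
    return int(all(n % d for d in range(2, D + 1)))


for n in (1000109, 1000545, 101):
    print(n, delta0(n, 100), delta_le(n, 100))
```

Both composite numbers receive the value 9, although neither of them is anywhere close to being prime.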
To get rid of this effect, it makes sense to replace the hard cutoff at 𝑑 ≤ 𝐷 by a more
gentle one, through some weight function of 𝑑 that gives 1 for prime numbers and vanishes
at 𝑑 = 𝐷. Let us try
$$ \tilde\delta_k(n) = \frac{1}{(\ln D)^{2k}} \Bigg( \sum_{d \mid n,\ d \le D} \mu(d) \Big( \ln \frac{D}{d} \Big)^{k} \Bigg)^{2}, \tag{72} $$
and this works much, much better for 𝑘 ≥ 1. For 𝐷 = 100, the function 𝛿̃1 − 𝛿≤𝐷 looks like
this:
(73)
Not only does it take values in [0, 1) in this plot, it also peaks at numbers with prime factors 𝑝
of size close to 𝐷. Since the weight ln(𝐷/𝑝) gets small for such 𝑝, we certainly expect such
numbers to contribute on par with the prime numbers.
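The effect of the smooth cutoff can be seen on the same two numbers that caused trouble for the hard cutoff. This is a minimal sketch of (72), again with illustrative helper names; setting 𝑘 = 0 recovers the hard cutoff (70).

```python
from math import log


def mobius(d):
    """Möbius function mu(d), computed by trial division."""
    result = 1
    p = 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0
            result = -result
        p += 1
    if d > 1:
        result = -result
    return result


def delta_k(n, D, k=1):
    """The smoothly truncated, squared, and normalized sum (72)."""
    s = sum(mobius(d) * log(D / d) ** k for d in range(1, D + 1) if n % d == 0)
    return s * s / log(D) ** (2 * k)


for n in (101, 1000109, 1000545):
    print(n, delta_k(n, 100, k=0), delta_k(n, 100, k=1))
```

With 𝑘 = 1 the two former peaks drop from 9 to values well below 1, while a prime such as 101 still gets the value 1.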
12.3. Looking for 𝝆, part II

Functions (72) played an important role in the work of Goldston, Pintz and Yıldırım.
However magical, by themselves they are not enough to get to the Maynard-Tao theorem. If
we just multiply them as in (68), then we lose the crucial synergy between different elements
of the list 𝐽. Recall that the logic of Theorem 1 is such that the longer the list 𝐽 gets, the easier
it is to find many prime numbers in it. For this, there should be some nontrivial interaction
between different 𝑗 𝑘 .
One key new ingredient in the Maynard-Tao method is to consider functions of the
form
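$$ \rho(n) = \delta_{[N,\dots,2N]}(n) \Bigg( \sum_{\substack{d_1 \mid n + j_1,\ \dots,\ d_l \mid n + j_l, \\ d_1 d_2 \cdots d_l \le D}} \mu(d_1 d_2 \cdots d_l)\, F\!\left( \frac{\ln d_1}{\ln D}, \dots, \frac{\ln d_l}{\ln D} \right) \Bigg)^{2}, \tag{74} $$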
where 𝐹 is a multivariate function to be specified later. As before, we want 𝐹 to be small if
the arguments sum to 1 (meaning that 𝑑1 𝑑2 · · · 𝑑𝑙 = 𝐷) to soften the effect of the summation
cutoff introduced in (74).
By allowing 𝐹 to depend on each divisor 𝑑𝑖 , the Maynard-Tao method activates
a very powerful principle of measure concentration in high-dimensional geometry. At the
risk of being repetitive, one may note that there is really a lot of space in a space of a large
dimension 𝑁. There is so much space that no probability distribution can cover all of it evenly
as 𝑁 → ∞, and one could put this vague principle in a mathematically precise form, see for
instance [18].
To make a negative statement positive, one can say that any high-dimensional prob-
ability density has to concentrate on some small portion of the whole space. For example, a
probability measure 𝜈 on the line R is another name for a random variable 𝑥, and a product
measure 𝜈 ⊗ 𝑁 = 𝜈 × · · · × 𝜈 on R 𝑁 is another name for a sequence of independent, identically
distributed (i.i.d.) random variables 𝑥 1 , . . . , 𝑥 𝑁 . We know from basic probability theory that,
with minimal assumptions about 𝜈, the average (1/𝑁) Σ 𝑥𝑖, and many other functions of i.i.d.
random variables 𝑥1, . . . , 𝑥𝑁, will sharply peak, or concentrate, around their expected value
as 𝑁 → ∞.
A reader not familiar with these notions may experiment by working out the example
in which 𝜈 is the uniform density on [0, 1] and 𝜈^{⊗𝑁} is a uniform density on an 𝑁-dimensional
cube [0, 1]^𝑁. Taking the sum Σ 𝑥𝑖 means projecting the cube onto the (1, 1, . . . , 1) axis, and
the reader may enjoy actually plotting these densities for different values of 𝑁. It is also fun
to compute the projection of a uniform measure on a high-dimensional sphere onto any axis.
It is by harnessing these concentration of measure phenomena that the density (74)
can significantly improve upon (72).
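The experiment with the cube suggested above can also be done by simulation. The sketch below samples points uniformly from [0, 1]^𝑁 and measures the spread of their coordinate average; the sample size and the seed are arbitrary choices. For uniform variables the standard deviation of the average is 1/√(12𝑁), so it shrinks as the dimension grows.

```python
import random
import statistics


def spread_of_average(N, samples=2000, seed=0):
    """Standard deviation of (x_1 + ... + x_N) / N for uniform points of the cube [0, 1]^N."""
    rng = random.Random(seed)
    averages = [sum(rng.random() for _ in range(N)) / N for _ in range(samples)]
    return statistics.pstdev(averages)


for N in (1, 10, 100):
    print(N, round(spread_of_average(N), 4), round((12 * N) ** -0.5, 4))
```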
12.4. Primes in arithmetic progressions, on average

Now let’s plug the formula (74) into the numerator in (66), expand out the square,
and do summation over the variable 𝑛 first. We get a sum of the form
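$$ \sum_{n + j_k \text{ is prime}} \rho(n) = \sum_{\vec d,\ \vec d'} \mu\mu FF \sum_{\text{certain } n} 1 \tag{75} $$
where the outer sum is over two sets of integers
$$ \vec d = (d_1, \dots, d_l) \quad \text{and} \quad \vec d' = (d'_1, \dots, d'_l), $$
there is a weight of the form
$$ \mu\mu FF = \mu\big(\textstyle\prod d_i\big)\, \mu\big(\textstyle\prod d'_i\big)\, F\!\left( \frac{\ln \vec d}{\ln D} \right) F\!\left( \frac{\ln \vec d'}{\ln D} \right) $$
and the inner sum runs over 𝑛 such that
$$ n + j_i = 0 \bmod \operatorname{lcm}(d_i, d'_i), \qquad i = 1, \dots, l, \tag{76} $$
$$ n + j_k \text{ is prime}, \tag{77} $$
where lcm(𝑑𝑖, 𝑑′𝑖) denotes the least common multiple.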

It is clear from this that we must have 𝑑 𝑘 = 𝑑 ′𝑘 = 1. Since the remaining congru-
ence conditions can be put into a single congruence condition using the Chinese Remainder
Theorem, the sum over 𝑛 thus counts primes in an arithmetic progression.
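As an illustration of how several congruences collapse into one, here is a small sketch of the Chinese Remainder Theorem. It assumes that the moduli are pairwise coprime, which is an assumption of this sketch and is not spelled out in the text; the function name `crt` and the sample moduli are ours.

```python
def crt(residues, moduli):
    """Combine x = r_i mod m_i (pairwise coprime m_i) into a single x = r mod M."""
    r, M = 0, 1
    for r_i, m_i in zip(residues, moduli):
        if m_i == 1:            # a trivial condition, as for d_k = d'_k = 1
            continue
        # Choose t so that r + M * t = r_i mod m_i; this needs gcd(M, m_i) = 1.
        t = ((r_i - r) * pow(M, -1, m_i)) % m_i
        r += M * t
        M *= m_i
    return r % M, M


# Conditions n + j_i = 0 mod m_i for J = (0, 2, 6) and hypothetical moduli (1, 5, 7).
J = (0, 2, 6)
moduli = (1, 5, 7)
r, M = crt([(-j) % m for j, m in zip(J, moduli)], moduli)
print(r, M)  # 8 35: the conditions say n = 8 mod 35
```

The numbers 𝑛 satisfying all conditions then form the single arithmetic progression 𝑟, 𝑟 + 𝑀, 𝑟 + 2𝑀, . . . , and one counts those 𝑛 in it for which 𝑛 + 𝑗𝑘 is prime.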
Time and time again in these notes we have stressed the technical importance of
being able to accurately count primes in arithmetic progressions in analytic number theory,
also stressing that this may be very delicate if the progression is not much longer than its
common difference.
The counting function (27) may be refined to count primes in a given residue class
modulo 𝑏
𝜋(𝑥, 𝑏, 𝑎) = number of primes 𝑝 such that 𝑝 ≤ 𝑥 and 𝑝 = 𝑎 mod 𝑏 . (78)
The Dirichlet theorem mentioned in Section 7 says that
$$ \frac{\pi(x, b, a)}{\pi(x)} \to \begin{cases} \phi(b)^{-1}, & \gcd(a, b) = 1, \\ 0, & \text{otherwise}, \end{cases} \tag{79} $$
as 𝑥 → ∞, where 𝜙(𝑏) is the number of residue classes coprime to 𝑏. For fixed 𝑥, however,
the function
$$ (b, a) \mapsto \phi(b)\, \frac{\pi(x, b, a)}{\pi(x)} - 1 \tag{80} $$
behaves in a very irregular manner. This is illustrated in the following plot for 𝑎 < 𝑏 ≤ 100
and the first 5000 primes, which means 𝑥^{1/2} ≈ 220.
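The data behind such a plot is easy to generate. The sketch below counts the first 5000 primes in residue classes and evaluates the normalized deviation 𝜙(𝑏) 𝜋(𝑥, 𝑏, 𝑎)/𝜋(𝑥) − 1, which is how we read (80); the function names and the sieve bound 50000 are our own choices.

```python
from math import gcd


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit (limit >= 2)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [p for p in range(2, limit + 1) if sieve[p]]


primes = primes_up_to(50_000)[:5000]   # the first 5000 primes
x = primes[-1]                         # so that pi(x) = 5000


def pi_progression(b, a):
    """pi(x, b, a) from (78): the number of primes p <= x with p = a mod b."""
    return sum(1 for p in primes if p % b == a)


def phi(b):
    """Number of residue classes mod b coprime to b."""
    return sum(1 for a in range(1, b + 1) if gcd(a, b) == 1)


def deviation(b, a):
    """Normalized deviation from the equidistribution predicted by (79)."""
    return phi(b) * pi_progression(b, a) / len(primes) - 1


print(x, [pi_progression(10, a) for a in (1, 3, 7, 9)])
print(max(abs(deviation(b, a)) for b in (97, 98, 99, 100)
          for a in range(1, b) if gcd(a, b) == 1))
```

For a small modulus such as 10 the classes are almost perfectly balanced, while for 𝑏 close to 100 each class contains only a few dozen primes and the deviations become noticeable.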

Very fortunately, in (75), we don’t have to face the full complexity of this function.
Since there is an outside summation over 𝑑® and 𝑑®′ , we only need to know its average over 𝑏.
Recall that the Riemann hypothesis implies an error of size about 𝑥^{1/2} in the prime number
theorem. The conjectural extension of the Riemann hypothesis to Dirichlet L-functions
(47) would give a similar error bound for 𝜋(𝑥, 𝑏, 𝑎). If one sums these errors for 𝑏 < 𝑥^{1/2}, one
thus expects to get something of order 𝑥. Remarkably, a slight weakening of this statement,
known as the Bombieri-Vinogradov theorem, has been proven [2, 29]. In other words, the
Riemann hypothesis for L-functions is a complete mystery, but its main consequence for the
distribution of primes in arithmetic progressions can be rigorously proven on average. The
actual estimate one needs here has the form
$$ \sum_{b < x^{1/2 - \varepsilon}} \ \max_{a,\ \gcd(a, b) = 1} \left| \pi(x, b, a) - \frac{\pi(x)}{\phi(b)} \right| \le C(A, \varepsilon)\, \frac{x}{(\ln x)^{A}}, \tag{81} $$
which holds for any 𝐴 > 0 and 𝜀 > 0 with some positive constant 𝐶 ( 𝐴, 𝜀) that depends on
𝐴 and 𝜀. In our example, the maxima over 𝑎 in (81) and their running average over 𝑏 can be
seen in the following plot
Averaging really does make the behavior a lot more regular and, hence, manageable.
We have discussed some of the key ingredients that go into the proof of the amazing
result of Maynard and Tao. Perhaps, this discussion has given the reader the motivation and
confidence to open more advanced literature written by the experts in the field, including the
papers listed in Section 11. In any case, we hope to have communicated to the reader our
own sense of awe at the beauty of mathematics.
A. Limits
Limits are defined not just for numerical sequences (𝑎 1 , 𝑎 2 , . . . ) but for objects of
arbitrary nature for which there is a notion of neighborhoods. Namely, 𝑎 is the limit of the
above sequence, if every neighborhood of 𝑎 contains all elements 𝑎 𝑛 except maybe finitely
many. The reader may find it useful to picture this as follows:
(82)
where the bin represents a neighborhood of 𝑎 and spheres represent the elements 𝑎 𝑛 . Of
course, since the sequence is infinite, any neighborhood of the limit point contains not just
many, but infinitely many of the 𝑎 𝑛 ’s.
For real numbers, or any other set with the notion of distance, we may take the open
balls of arbitrary positive radius 𝑟 > 0
𝐵(𝑎, 𝑟) = {all 𝑥 such that distance(𝑥, 𝑎) < 𝑟}
as standard neighborhoods. The reader may check her or his understanding of the definition
by proving (11) and (12), constructing a sequence of real numbers that does not have a limit,
and proving that the limit of a sequence of real numbers is unique when it exists.
The slight issue with defining the limits digit by digit is that the set of all real numbers
whose decimal expansion is fixed up to a certain point is a half-open interval, for instance
{all 𝑥 such that 𝑥 = 2.71 . . . } = [2.71, 2.72) .
To define limits for real numbers correctly, one should take open intervals, that is, those
without both endpoints as neighborhoods. Back to the main text.
B. Mellin transform and the density of primes

Consider a simplified model, in which we forget about integrality and talk about real
numbers 𝑥 > 1. Let 𝜌1 (𝑥) be a certain density function on [1, ∞). It will model the density
of prime numbers. What should then correspond to the density 𝜌𝑟 (𝑦) of the numbers 𝑦 that
have exactly 𝑟 prime factors?
We have, by definition, 𝑦 = 𝑥 1 𝑥2 . . . 𝑥𝑟 , where 𝑥𝑖 are distributed in the set
{1 ≤ 𝑥1 ≤ 𝑥 2 ≤ · · · ≤ 𝑥𝑟 }
with density 𝜌1 (𝑥 1 ) · · · 𝜌1 (𝑥𝑟 ). Thus for any function 𝑓 (𝑦) we have

$$ \int f(y)\, \rho_r(y)\, dy = \int_{1 \le x_1 \le x_2 \le \dots \le x_r} f(x_1 \cdots x_r) \prod \rho_1(x_i)\, dx_i. \tag{83} $$
Which functions 𝑓 (𝑦) should we consider?
In mathematics, the success often depends on choosing the right point of view. If
one has the right point of view, then one is able to see clearly where one is going.
A very nice choice here is to take 𝑓(𝑦) = 𝑦^{−𝑠}, where 𝑠 > 1 is a parameter. This is called
the Mellin transform, and it is a transform because it takes a function 𝜌𝑟(𝑦) of one variable 𝑦 to
another function 𝜌𝑟^Mellin(𝑠), of the parameter 𝑠. Thus one trades a function of one variable
𝜌𝑟 (𝑦) for another function of one variable 𝜌𝑟Mellin (𝑠), which seems like a fair exchange. In
fact, one can reconstruct 𝜌𝑟 (𝑦) from 𝜌𝑟Mellin (𝑠), so no information is lost.
The Mellin transform is a close relative of the Fourier transform and what makes
the following computation work is the basic identity
$$ (x_1 x_2)^{s} = x_1^{s}\, x_2^{s}. $$
Because of this, the function 𝑓 (𝑥1 · · · 𝑥𝑟 ) in (83) factors as 𝑓 (𝑥1 ) · · · 𝑓 (𝑥𝑟 ) and we can even-
tually reduce an 𝑟-fold integral in (83) to a product of 𝑟 integrals.
We compute
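$$ \rho_r^{\mathrm{Mellin}}(s) \overset{\mathrm{def}}{=} \int_1^\infty y^{-s} \rho_r(y)\, dy \tag{84} $$
$$ = \int_{1 \le x_1 \le x_2 \le \dots \le x_r} (x_1 \cdots x_r)^{-s} \prod \rho_1(x_i)\, dx_i \tag{85} $$
$$ = \frac{1}{r!} \int_{[1, \infty)^r} \prod x_i^{-s} \rho_1(x_i)\, dx_i \tag{86} $$
$$ = \frac{1}{r!}\, \rho_1^{\mathrm{Mellin}}(s)^{r}, \tag{87} $$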
where in going from (85) to (86) we used the fact that
$$ [1, \infty)^r = \bigcup_{\substack{\text{permutations} \\ w : \{1, \dots, r\} \to \{1, \dots, r\}}} \{ 1 \le x_{w(1)} \le x_{w(2)} \le \dots \le x_{w(r)} \} \tag{88} $$
and that the integration over any of the 𝑟! sets in the right-hand side of (88) gives the same
result as (85).
If 𝜌• is the density of numbers 𝑦 having an arbitrary number of factors 𝑟, including
the case when 𝑟 = 0 and 𝑦 = 1, then summing (87) over 𝑟 = 0, 1, 2, . . . gives

$$ \rho_\bullet^{\mathrm{Mellin}}(s) = \exp\left( \rho_1^{\mathrm{Mellin}}(s) \right), \tag{89} $$
where exp(𝑥) is another notation for the function 𝑒 𝑥 from (13). The appearance of the expo-
nential function here is typical in many inclusion-exclusion situations.
To model unique factorization we want to take 𝜌• = 1 on [1, ∞) which means
$$ \rho_\bullet^{\mathrm{Mellin}}(s) = \int_1^\infty x^{-s}\, dx = \frac{1}{s - 1}, \qquad s > 1. \tag{90} $$
Thus, we expect
$$ \int_1^\infty x^{-s} \rho_1(x)\, dx \overset{?}{=} \ln \frac{1}{s - 1}, \qquad s > 1, \tag{91} $$
which is both good and bad news for the following reasons.
On the one hand, ln(1/(𝑠 − 1)) is not a Mellin transform of any density 𝜌1 on [1, ∞) simply
because it does not have a limit as 𝑠 → +∞. The 𝑠 → +∞ limit in (91) probes 𝜌1 (𝑥) for 𝑥
very close to 1 because 𝑥 −𝑠 becomes very small on the whole interval (1 + 𝛿, ∞) as 𝑠 → ∞,
for any fixed 𝛿 > 0. In particular, the Mellin transform of a bounded density function 𝜌1 (𝑥)
on [1, ∞) has to go to zero as 𝑠 → +∞.
This means that we cannot accurately model prime numbers with real numbers and
continuous densities. Of course, it was certainly silly to be asking for the density of small
primes to begin with. However, our interest is precisely the opposite, as we want to know the
behavior of 𝜌1(𝑥) for large 𝑥. This region is probed by the 𝑠 → 1 limit of the Mellin transform.
In fact
$$ f(x) = f_0 + O(x^{-c}) \ \Rightarrow \ \int_1^\infty f(x)\, x^{-s}\, dx = \frac{f_0}{s - 1} + \dots, \tag{92} $$
where 𝑂(𝑥^{−𝑐}) means that (𝑓(𝑥) − 𝑓0)/𝑥^{−𝑐} remains bounded as 𝑥 → ∞, the double arrow ⇒ denotes
implication, and dots stand for a function which is analytic for 𝑠 > 1 − 𝑐. (And also analytic
for complex values of 𝑠 such that ℜ𝑠 > 1 − 𝑐.) In the 𝑠 → 1 limit, we may write
$$ \int_1^\infty x^{-s} \rho_1(x) \ln(x)\, dx = - \frac{d}{ds} \int_1^\infty x^{-s} \rho_1(x)\, dx \sim - \frac{d}{ds} \ln \frac{1}{s - 1} = \frac{1}{s - 1}, $$
which strongly suggests 𝜌1 (𝑥) ∼ 1/ln(𝑥) for 𝑥 → ∞.
In place of continuous approximations, the proof of Hadamard and de la Vallée
Poussin uses properties of the 𝜁-function (26), which, in the spirit of (84) , can be interpreted
as the averaged value of 𝑛 −𝑠 with respect to the measure that gives every positive integer 𝑛
weight 1. The equality between the sum and the product in (26) is the correct discrete version
of the relation (89). It looks different because in the discrete situation we need to account for
the nonzero chance of having two equal prime factors, the possibility of which was ignored
in going from (85) to (86). The exact analog of (90) is the following description
$$ \zeta(s) = \frac{1}{s - 1} + \gamma + o(1), \qquad s \to 1, \tag{93} $$
of the 𝑠 → 1 behavior of the 𝜁-function, where 𝛾 is the constant from (14) and (22) . Back
to the main text.
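The behavior (93) can be observed numerically. The sketch below approximates 𝜁(𝑠) for real 𝑠 > 1 by a partial sum with a standard Euler-Maclaurin tail correction; the cutoff 𝑁 = 1000 and the decimal value used for 𝛾 are choices made for this illustration only.

```python
from math import pi

GAMMA = 0.5772156649  # the Euler constant gamma, to ten decimal places


def zeta(s, N=1000):
    """zeta(s) for real s > 1: partial sum up to N plus an integral estimate of the tail."""
    partial = sum(n ** -s for n in range(1, N + 1))
    return partial - 0.5 * N ** -s + N ** (1 - s) / (s - 1)


print(zeta(2), pi ** 2 / 6)
for s in (1.1, 1.01, 1.001):
    print(s, zeta(s) - 1 / (s - 1) - GAMMA)   # tends to 0 as s -> 1
```

The printed differences shrink roughly in proportion to 𝑠 − 1, in agreement with the 𝑜(1) term in (93).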
References
[1] R. C. Baker and A. J. Irving, Bounded intervals containing many primes, Math. Z. 286 (2017), no. 3-4, 821–841.
[2] E. Bombieri, On the large sieve, Mathematika 12 (1965), 201–225.
[3] Harold Davenport, Multiplicative number theory, 3rd ed., Graduate Texts in Mathematics, vol. 74, Springer-Verlag, New York, 2000. Revised and with a preface by Hugh L. Montgomery.
[4] John Friedlander and Henryk Iwaniec, Opera de cribro, American Mathematical Society Colloquium Publications, vol. 57, American Mathematical Society, Providence, RI, 2010.
[5] D. A. Goldston, J. Pintz, and C. Y. Yıldırım, Small gaps between primes, Proceedings of the International Congress of Mathematicians—Seoul 2014. Vol. II, Kyung Moon Sa, Seoul, 2014, pp. 419–441.
[6] Daniel A. Goldston, János Pintz, and Cem Y. Yıldırım, Primes in tuples. I, Ann. of Math. (2) 170 (2009), no. 2, 819–862.
[7] Andrew Granville, Unexpected irregularities in the distribution of prime numbers, Proceedings of the International Congress of Mathematicians, Vol. 1, 2 (Zürich, 1994), Birkhäuser, Basel, 1995, pp. 388–399.
[8] Andrew Granville, Primes in intervals of bounded length, Bull. Amer. Math. Soc. (N.S.) 52 (2015), no. 2, 171–222.
[9] Andrew Granville, Number theory revealed: a masterclass, American Mathematical Society, Providence, RI, 2019.
[10] Andrew Granville and Jennifer Granville, Prime suspects, Princeton University Press, Princeton, NJ, 2019. The anatomy of integers and permutations; Illustrated by Robert J. Lewis.
[11] Kevin Hartnett, New Proof Settles How to Approximate Numbers Like Pi, Quanta Magazine (August 14, 2019). https://www.quantamagazine.org/new-proof-settles-how-to-approximate-numbers-like-pi-20190814/.
[12] Henryk Iwaniec and Emmanuel Kowalski, Analytic number theory, American Mathematical Society Colloquium Publications, vol. 53, American Mathematical Society, Providence, RI, 2004.
[13] Erica Klarreich, Unheralded Mathematician Bridges the Prime Gap, Quanta Magazine (May 19, 2013). www.quantamagazine.org/yitang-zhang-proves-landmark-theorem-in-distribution-of-prime-numbers-20130519.
[14] Erica Klarreich, Together and Alone, Closing the Prime Gap, Quanta Magazine (November 19, 2013). www.quantamagazine.org/mathematicians-team-up-on-twin-primes-conjecture-20131119/.
[15] Erica Klarreich, Prime Gap Grows After Decades-Long Lull, Quanta Magazine (December 10, 2014). www.quantamagazine.org/mathematicians-prove-conjecture-on-big-prime-number-gaps-20141210/.
[16] Dimitris Koukoulopoulos, The distribution of prime numbers, Graduate Studies in Mathematics, vol. 203, American Mathematical Society, Providence, RI, 2019.
[17] Emmanuel Kowalski, Gaps between prime numbers and primes in arithmetic progressions [after Y. Zhang and J. Maynard], Astérisque 367-368 (2015), Exp. No. 1084, ix, 327–366.
[18] Michel Ledoux, The concentration of measure phenomenon, Mathematical Surveys and Monographs, vol. 89, American Mathematical Society, Providence, RI, 2001.
[19] Thomas Lin, After Prime Proof, an Unlikely Star Rises, Quanta Magazine (April 2, 2015). https://www.quantamagazine.org/yitang-zhang-and-the-mystery-of-numbers-20150402.
[20] James Maynard, Small gaps between primes, Ann. of Math. (2) 181 (2015), no. 1, 383–413.
[21] James Maynard, Digits of primes, European Congress of Mathematics, Eur. Math. Soc., Zürich, 2018, pp. 641–661.
[22] James Maynard, Gaps between primes, Proceedings of the International Congress of Mathematicians—Rio de Janeiro 2018. Vol. II. Invited lectures, World Sci. Publ., Hackensack, NJ, 2018, pp. 345–361.
[23] James Maynard, The twin prime conjecture, Jpn. J. Math. 14 (2019), no. 2, 175–206.
[24] D. H. J. Polymath, New equidistribution estimates of Zhang type, Algebra Number Theory 8 (2014), no. 9, 2067–2199.
[25] D. H. J. Polymath, Variants of the Selberg sieve, and bounded intervals containing many primes, Res. Math. Sci. 1 (2014), Art. 12, 83.
[26] K. Soundararajan, Small gaps between prime numbers: the work of Goldston-Pintz-Yıldırım, Bull. Amer. Math. Soc. (N.S.) 44 (2007), no. 1, 1–18.
[27] Gérald Tenenbaum and Michel Mendès France, The prime numbers and their distribution, Student Mathematical Library, vol. 6, American Mathematical Society, Providence, RI, 2000. Translated from the 1997 French original by Philip G. Spain.
[28] Gérald Tenenbaum, Introduction to analytic and probabilistic number theory, 3rd ed., Graduate Studies in Mathematics, vol. 163, American Mathematical Society, Providence, RI, 2015. Translated from the 2008 French edition by Patrick D. F. Ion.
[29] A. I. Vinogradov, The density hypothesis for Dirichlet 𝐿-series, Izv. Akad. Nauk SSSR Ser. Mat. 29 (1965), 903–934 (Russian).
[30] Yitang Zhang, Small gaps between primes and primes in arithmetic progressions to large moduli, Proceedings of the International Congress of Mathematicians—Seoul 2014. Vol. II, Kyung Moon Sa, Seoul, 2014, pp. 557–567.
[31] Yitang Zhang, Bounded gaps between primes, Ann. of Math. (2) 179 (2014), no. 3, 1121–1174.
Andrei Okounkov
Andrei Okounkov, Department of Mathematics, University of California, Berkeley, 970
Evans Hall Berkeley, CA 94720–3840, okounkov@math.columbia.edu