A Tale of Two Sieves
Carl Pomerance
Carl Pomerance is research professor of mathematics at the University of Georgia, Athens, GA. His e-mail address is carl@ada.math.uga.edu. Supported in part by National Science Foundation grant number DMS-9206784.

It is the best of times for the game of factoring large numbers into their prime factors. In 1970 it was barely possible to factor “hard” 20-digit numbers. In 1980, in the heyday of the Brillhart-Morrison continued fraction factoring algorithm, factoring of 50-digit numbers was becoming commonplace. In 1990 my own quadratic sieve factoring algorithm had doubled the length of the numbers that could be factored, the record having 116 digits.

By 1994 the quadratic sieve had factored the famous 129-digit RSA challenge number that had been estimated in Martin Gardner’s 1976 Scientific American column to be safe for 40 quadrillion years (though other estimates around then were more modest). But the quadratic sieve is no longer the champion. It was replaced by Pollard’s number field sieve in the spring of 1996, when that method successfully split a 130-digit RSA challenge number in about 15% of the time the quadratic sieve would have taken.

In this article we shall briefly meet these factorization algorithms, these two sieves, and some of the many people who helped to develop them.

In the middle part of this century, computational issues seemed to be out of fashion. In most books the problem of factoring big numbers was largely ignored, since it was considered trivial. After all, it was doable in principle, so what else was there to discuss? A few researchers ignored the fashions of the time and continued to try to find fast ways to factor. To these few it was a basic and fundamental problem, one that should not be shunted to the side.

But times change. In the last few decades we have seen the advent of accessible and fast computing power, and we have seen the rise of cryptographic systems that base their security on our supposed inability to factor quickly (and on other number theoretic problems). Today there are many people interested in factoring, recognizing it not only as a benchmark for the security of cryptographic systems, but for computing itself. In 1984 the Association for Computing Machinery presented a plaque to the Institute for Electrical and Electronics Engineers (IEEE) on the occasion of the IEEE centennial. It was inscribed with the prime factorization of the number $2^{251} - 1$ that was completed that year with the quadratic sieve. The president of the ACM made the following remarks:

    About 300 years ago the French mathematician Mersenne speculated that $2^{251} - 1$ was a composite, that is, a factorable number. About 100 years ago it was proved to be factorable, but even 20 years ago the computational load to factor the number was considered insurmountable. Indeed, using conventional machines and traditional search algorithms, the search time
    was estimated to be about $10^{20}$ years. The number was factored in February of this year at Sandia on a Cray computer in 32 hours, a world record. We’ve come a long way in computing, and to commemorate IEEE’s contribution to computing we have inscribed the five factors of the Mersenne composite on a plaque. Happy Birthday, IEEE.

Factoring big numbers is a strange kind of mathematics that closely resembles the experimental sciences, where nature has the last and definitive word. If some method to factor $n$ runs for a while and ends with the statement “$d$ is a factor of $n$”, then this assertion may be easily checked; that is, the integers have the last and definitive word. One can thus get by quite nicely without proving a theorem that a method works in general. But, as with the experimental sciences, both rigorous and heuristic analyses can be valuable in understanding the subject and moving it forward. And, as with the experimental sciences, there is sometimes a tension between pure and applied practitioners. It is held by some that the theoretical study of factoring is a freeloader at the table (or as Hendrik Lenstra once colorfully put it, paraphrasing Siegel, “a pig in the rose garden”), enjoying undeserved attention by vapidly giving various algorithms labels, be they “polynomial”, “exponential”, “random”, etc., and offering little or nothing in return to those hard workers who would seriously compute. There is an element of truth to this view. But as we shall see, theory played a significant role in the development of the title’s two sieves.

A Contest Problem
But let us begin at the beginning, at least my beginning. When I give talks on factoring, I often repeat an incident that happened to me long ago in high school. I was involved in a math contest, and one of the problems was to factor the number 8051. A time limit of five minutes was given. It is not that we were not allowed to use pocket calculators; they did not exist in 1960, around when this event occurred! Well, I was fairly good at arithmetic, and I was sure I could trial divide up to the square root of 8051 (about 90) in the time allowed. But on any test, especially a contest, many students try to get into the mind of the person who made it up. Surely they would not give a problem where the only reasonable approach was to try possible divisors frantically until one was found. There must be a clever alternate route to the answer. So I spent a couple of minutes looking for the clever way, but grew worried that I was wasting too much time. I then belatedly started trial division, but I had wasted too much time, and I missed the problem.

So can you find the clever way? If you wish to think about this for a moment, delay reading the next paragraph.

Fermat and Kraitchik
The trick is to write 8051 as 8100 − 49, which is $90^2 - 7^2$, so we may use algebra, namely, factoring a difference of squares, to factor 8051. It is 83 × 97.

Does this always work? In fact, every odd composite can be factored as a difference of squares: just use the identity $ab = \left(\tfrac{1}{2}(a + b)\right)^2 - \left(\tfrac{1}{2}(a - b)\right)^2$. Trying to find a pair of squares which work is, in fact, a factorization method of Fermat. Just like trial division, which has some very easy cases (such as when there is a small prime factor), so too does the difference-of-squares method have easy cases. For example, if $n = ab$ where $a$ and $b$ are very close to $\sqrt{n}$, as in the case of $n = 8051$, it is easy to find the two squares. But in its worst cases, the difference-of-squares method can be far worse than trial division. It is worse in another way too. With trial division, most numbers fall into the easy case; namely, most numbers have a small factor. But with the difference-of-squares method, only a small fraction of numbers have a divisor near their square root, so the method works well on only a small fraction of possible inputs. (Though trial division allows one to begin a factorization for most inputs, finishing with a complete factorization is usually far more difficult. Most numbers resist this, even when a combination of trial division and difference-of-squares is used.)

In the 1920s Maurice Kraitchik came up with an interesting enhancement of Fermat’s difference-of-squares technique, and it is this enhancement that is at the basis of most modern factoring algorithms. (The idea had roots in the work of Gauss and Seelhoff, but it was Kraitchik who brought it out of the shadows, introducing it to a new generation in a new century. For more on the early history of factoring, see [23].) Instead of trying to find integers $u$ and $v$ with $u^2 - v^2$ equal to $n$, Kraitchik reasoned that it might suffice to find $u$ and $v$ with $u^2 - v^2$ equal to a multiple of $n$, that is, $u^2 \equiv v^2 \bmod n$. Such a congruence can have uninteresting solutions, those where $u \equiv \pm v \bmod n$, and interesting solutions, where $u \not\equiv \pm v \bmod n$. In fact, if $n$ is odd and divisible by at least two different primes, then at least half of the solutions to $u^2 \equiv v^2 \bmod n$, with $uv$ coprime to $n$, are of the interesting variety. And for an interesting solution $u, v$, the greatest common factor of $u - v$ and $n$, denoted $(u - v, n)$, must be a nontrivial factor of $n$. Indeed, $n$ divides $u^2 - v^2 =$
$(u - v)(u + v)$ but divides neither factor. So $n$ must be somehow split between $u - v$ and $u + v$.

As an aside, it should be remarked that finding the greatest common divisor $(a, b)$ of two given numbers $a$ and $b$ is a very easy task. If $0 < a \le b$ and if $a$ divides $b$, then $(a, b) = a$. If $a$ does not divide $b$, with $b$ leaving a remainder $r$ when divided by $a$, then $(a, b) = (a, r)$. This neat idea of replacing a larger problem with a smaller one is over two thousand years old and is due to Euclid. It is very fast: it takes about as much time for a computer to find the greatest common divisor of $a$ and $b$ as it would take to multiply them together.

Let us see how Kraitchik might have factored $n = 2041$. The first square above $n$ is $46^2 = 2116$. Consider the sequence of numbers $Q(x) = x^2 - n$ for $x = 46, 47, \ldots$. We get

\[ 75,\ 168,\ 263,\ 360,\ 459,\ 560,\ \ldots. \]

So far no squares have appeared, so Fermat might still be searching. But Kraitchik has another option: namely, he tries to find several numbers $x$ with the product of the corresponding numbers $Q(x)$ equal to a square. For if $Q(x_1) \cdots Q(x_k) = v^2$ and $x_1 \cdots x_k = u$, then

\[ u^2 = x_1^2 \cdots x_k^2 \equiv (x_1^2 - n) \cdots (x_k^2 - n) = Q(x_1) \cdots Q(x_k) = v^2 \bmod n; \]

that is, we have found a solution to $u^2 \equiv v^2 \bmod n$. But how to find the set $x_1, \ldots, x_k$? Kraitchik notices that some of the numbers $Q(x)$ factor very easily:

\[ 75 = 3 \times 5^2, \quad 168 = 2^3 \times 3 \times 7, \quad 360 = 2^3 \times 3^2 \times 5, \quad 560 = 2^4 \times 5 \times 7. \]

From these factorizations he can tell that the product of these four numbers is $2^{10} \times 3^4 \times 5^4 \times 7^2$, a square! Thus he has $u^2 \equiv v^2 \bmod n$, where

\[ u = 46 \cdot 47 \cdot 49 \cdot 51 \equiv 311 \bmod 2041, \qquad v = 2^5 \cdot 3^2 \cdot 5^2 \cdot 7 \equiv 1416 \bmod 2041. \]

He is now nearly done, since $311 \not\equiv \pm 1416 \bmod 2041$. Using Euclid’s algorithm to compute the greatest common factor $(1416 - 311, 2041)$, he finds that this is 13, and so 2041 = 13 × 157.

Continued Fractions
The essence of Kraitchik’s method is to “play” with the sequence $x^2 - n$ as $x$ runs through integers near $\sqrt{n}$ to find a subsequence with product a square. If the square root of this square is $v$ and the product of the corresponding $x$ values is $u$, then $u^2 \equiv v^2 \bmod n$, and there is now a hope that this congruence is “interesting”, namely, that $u \not\equiv \pm v \bmod n$. In 1931 D. H. Lehmer and R. E. Powers suggested replacing Kraitchik’s function $Q(x) = x^2 - n$ with another that is derived from the continued-fraction expansion of $\sqrt{n}$.

If $a_i/b_i$ is the $i$-th continued fraction convergent to $\sqrt{n}$, let $Q_i = a_i^2 - b_i^2 n$. Then $Q_i \equiv a_i^2 \bmod n$. Thus, instead of playing with the numbers $Q(x)$, we may play with the numbers $Q_i$, since in both cases they are congruent modulo $n$ to known squares. Although continued fractions can be ornery beasts as far as computation goes, the case for quadratic irrationals is quite pleasant. In fact, there is a simple iterative procedure (see [16]) going back to Gauss and perhaps earlier for computing what is needed here, namely, the sequence of integers $Q_i$ and the residues $a_i \bmod n$.

But why mess up a perfectly simple quadratic polynomial with something as exotic as continued fractions? It is because of the inequality $|Q_i| < 2\sqrt{n}$. The numbers $Q_i$ are smaller in absolute value than the numbers $Q(x)$. (As $x$ moves away from $\sqrt{n}$, the numbers $Q(x)$ grow approximately linearly, with a slope of $2\sqrt{n}$.) If one wishes to “play” with numbers to find some of them with product a square, it is presumably easier to do this with smaller numbers than with larger numbers. So the continued fraction of Lehmer and Powers has an apparent advantage over the quadratic polynomial of Kraitchik.

How to “Play” with Numbers
It is certainly odd to have an instruction in an algorithm asking you to play with some numbers to find a subset with product a square. I am reminded of the cartoon with two white-coated scientists standing at a blackboard filled with arcane notation, and one points to a particularly delicate juncture and says to the other that at this point a miracle occurs. Is it a miracle that we were able to find the numbers 75, 168, 360, and 560 in Kraitchik’s sequence with product a square? Why should we expect to find such a subsequence, and, if it exists, how can we find it efficiently?

A systematic strategy for finding a subsequence of a given sequence with product a square was found by John Brillhart and Michael Morrison, and, surprisingly, it is at heart only linear algebra (see [16]). Every positive integer $m$ has an exponent vector $v(m)$ that is based on the prime factorization of $m$. Let $p_i$ denote the $i$-th prime, and say $m = \prod p_i^{v_i}$. (The product is over all primes, but only finitely many of the exponents $v_i$ are nonzero.) Then $v(m)$ is the vector $(v_1, v_2, \ldots)$. For example, leaving off the infinite string of zeros after the fourth place, we have
\[
\begin{aligned}
v(75) &= (0, 1, 2, 0),\\
v(168) &= (3, 1, 0, 1),\\
v(360) &= (3, 2, 1, 0),\\
v(560) &= (4, 0, 1, 1).
\end{aligned}
\]

For our purposes the exponent vectors give too much information. We are interested only in squares, and since a positive integer $m$ is a square if and only if every entry of $v(m)$ is even, we should be reducing exponents modulo 2. Since $v$ takes products to sums, we are looking for numbers such that the sum of their exponent vectors is the zero vector mod 2. The mod 2 reductions of the above exponent vectors are

\[
\begin{aligned}
v(75) &\equiv (0, 1, 0, 0) \bmod 2,\\
v(168) &\equiv (1, 1, 0, 1) \bmod 2,\\
v(360) &\equiv (1, 0, 1, 0) \bmod 2,\\
v(560) &\equiv (0, 0, 1, 1) \bmod 2.
\end{aligned}
\]

Note that their sum is the zero vector! Thus the product of 75, 168, 360, and 560 is a square.

To systematize this procedure, Brillhart and Morrison suggest that we choose some number $B$ and look only at those numbers in the sequence that completely factor over the first $B$ primes. So in the case above, we have $B = 4$. As soon as $B + 1$ such numbers have been assembled, we have $B + 1$ vectors in the $B$-dimensional vector space $\mathbb{F}_2^B$. By linear algebra they must be linearly dependent. But what is a linear dependence relation over the field $\mathbb{F}_2$? Since the only scalars are 0 and 1, a linear dependence relation is simply a subset sum equaling the 0-vector. And we have many algorithms from linear algebra that can help us find this dependency.

Note that we were a little lucky here, since we were able to find the dependency with just four vectors rather than the five vectors needed in the worst case.

Brillhart and Morrison call the primes $p_1, p_2, \ldots, p_B$ the “factor base”. (To be more precise, they discard those primes $p_j$ for which $n$ is not congruent to a square, since such primes will never divide a number $Q_i$ in the continued-fraction method nor a number $Q(x)$ in Kraitchik’s method.) How is $B$ to be chosen? If it is chosen small, then we do not need to assemble too many numbers before we can stop. But if it is chosen too small, the likelihood of finding a number in the sequence that completely factors over the first $B$ primes will be so minuscule that it will be difficult to find even one number. Thus somewhere there is a happy balance, and with factoring 2041 via Kraitchik’s method, the happy balance turned out to be $B = 4$.

Some of the auxiliary numbers may be negative. How do we handle their exponent vectors? Clearly we cannot ignore the sign, since squares are not negative. However, we can put an extra coordinate in each exponent vector, one that is 0 for positive numbers and 1 for negative numbers. (It is as if we are including the “prime” −1 in the factor base.) So allowing the auxiliary numbers to be negative just increases the dimension of the problem by 1.

For example, let us consider again the number 2041 and try to factor it via Kraitchik’s polynomial, but now allowing negative values. So with $Q(x) = x^2 - 2041$ and the factor base 2, 3, and 5, we have

\[
\begin{aligned}
Q(43) &= -192 = -2^6 \cdot 3 &&\leftrightarrow (1, 0, 1, 0)\\
Q(44) &= -105\\
Q(45) &= -16 = -2^4 &&\leftrightarrow (1, 0, 0, 0)\\
Q(46) &= 75 = 3 \cdot 5^2 &&\leftrightarrow (0, 0, 1, 0),
\end{aligned}
\]

where the first coordinates correspond to the exponent on −1. So, using the smaller factor base of 2, 3, and 5 but allowing also negatives, we are especially lucky, since the three vectors assembled so far are dependent. This leads to the congruence $(43 \cdot 45 \cdot 46)^2 \equiv (-192)(-16)(75) \bmod 2041$, or $1247^2 \equiv 480^2 \bmod 2041$. This again gives us the divisor 13 of 2041, since $(1247 - 480, 2041) = 13$.

Does the final greatest common divisor step always lead to a nontrivial factorization? No, it does not. The reader is invited to try still another assemblage of a square in connection with 2041. This one involves $Q(x)$ for $x = 41$, 45, and 49 and gives rise to the congruence $601^2 \equiv 1440^2 \bmod 2041$. In our earlier terminology, this congruence is uninteresting, since $601 \equiv -1440 \bmod 2041$. And sure enough, the greatest common divisor $(601 - 1440, 2041)$ is the quite uninteresting divisor 1.

Smooth Numbers and the Stirrings of Complexity Theory
With the advent of the RSA public key cryptosystem in the late 1970s, it became particularly important to try to predict just how hard factoring is. Not only should we know the state of the art at present, we would like to predict just what it would take to factor larger numbers beyond what is feasible now. In particular, it seems empirically that dollar for dollar computers double their speed and capacity every one and a half to two years. Assuming this and no new factoring algorithms, what will be the state of the art in ten years?

It is to this type of question that complexity theory is well suited. So how then might one analyze factoring via Kraitchik’s polynomial or the Lehmer-Powers continued fraction? Richard Schroeppel, in unpublished correspondence in the late 1970s, suggested a way. Essentially, he begins by thinking of the numbers $Q_i$ in the continued-fraction method or the numbers $Q(x)$ in
Kraitchik’s method as “random”. If you are presented with a stream of truly random numbers below a particular bound $X$, how long should you expect to wait before you find some subset with product a square?

Call a number $Y$-smooth if it has no prime factor exceeding $Y$. (Thus a number which completely factors over the primes up to $p_B$ is $p_B$-smooth.) What is the probability that a random positive integer up to $X$ is $Y$-smooth? It is $\psi(X, Y)/[X] \approx \psi(X, Y)/X$, where $\psi(X, Y)$ is the number of $Y$-smooth numbers in the interval $[1, X]$. Thus the expected number of random numbers that must be examined to find just one that is $Y$-smooth is the reciprocal of this probability, namely, $X/\psi(X, Y)$. But we must find about $\pi(Y)$ such $Y$-smooth numbers, where $\pi(Y)$ denotes the number of primes up to $Y$. So the expected number of random numbers that must be examined is about $\pi(Y) X/\psi(X, Y)$. And how much work does it take to examine a number to see if it is $Y$-smooth? If one uses trial division for this task, it takes about $\pi(Y)$ steps. So the expected number of steps is $\pi(Y)^2 X/\psi(X, Y)$.

It is now a job for analytic number theory to choose $Y$ as a function of $X$ so as to minimize the expression $\pi(Y)^2 X/\psi(X, Y)$. In fact, in the late 1970s the tools did not quite exist to make this estimation accurately.

This was remedied in a paper in 1983 (see [4]), though preprints of this paper were around for several years before then. So what is the minimum? It occurs when $Y$ is about $\exp\left(\tfrac{1}{2}\sqrt{\log X \log\log X}\right)$ and the minimum value is about $\exp\left(2\sqrt{\log X \log\log X}\right)$. But what are “$X$” and “$Y$” anyway?¹ The number $X$ is an estimate for the typical auxiliary number the algorithm produces. In the continued-fraction method, $X$ can be taken as $2\sqrt{n}$. With Kraitchik’s polynomial, $X$ is a little larger: it is $n^{1/2+\varepsilon}$. And the number $Y$ is an estimate for $p_B$, the largest prime in the factor base.

¹ Actually, this is a question that has perplexed many a student in elementary algebra, not to mention many a philosopher of mathematics.

Thus factoring $n$, either via the Lehmer-Powers continued fraction or via the Kraitchik polynomial, should take about $\exp\left(\sqrt{2 \log n \log\log n}\right)$ steps. This is not a theorem; it is a conjecture. The conjecture is supported by the above heuristic argument which assumes that the auxiliary numbers generated by the continued fraction of $\sqrt{n}$ or by Kraitchik’s quadratic polynomial are “random” as far as the property of being $Y$-smooth goes. This has not been proved. In addition, getting many auxiliary numbers that are $Y$-smooth may not be sufficient for factoring $n$, since each time we use linear algebra over $\mathbb{F}_2$ to assemble the congruent squares we may be very unlucky and only come up with uninteresting solutions which do not help in the factorization. Again assuming randomness, we do not expect inordinately long strings of bad luck, and this heuristic again supports the conjecture.

As mentioned, this complexity argument was first made by Richard Schroeppel in unpublished work in the late 1970s. (He assumed the result mentioned above from [4], even though at that time it was not a theorem or even really a conjecture.) Armed with the tools to study complexity, he used them during this time to come up with a new method that came to be known as the linear sieve. It was the forerunner of the quadratic sieve and also its inspiration.

Using Complexity to Come Up With a Better Algorithm: The Quadratic Sieve
The above complexity sketch shows a place where we might gain some improvement. It is the time we are taking to recognize auxiliary numbers that factor completely with the primes up to $Y = p_B$, that is, the $Y$-smooth numbers. In the argument we assumed this is about $\pi(Y)$ steps, where $\pi(Y)$ is the number of primes up to $Y$. The probability that a number is $Y$-smooth is, according to the notation above, $\psi(X, Y)/[X]$. As you might expect and as is easily checked in practice, when $Y$ is a reasonable size and $X$ is very large, this probability is very, very small. So one after the other, the auxiliary numbers pop up, and we have to invest all this time in each one, only to find out almost always that the number is not $Y$-smooth and is thus a number that we will discard.

It occurred to me early in 1981 that one might use something akin to the sieve of Eratosthenes to quickly recognize the smooth values of Kraitchik’s quadratic polynomial $Q(x) = x^2 - n$. The sieve of Eratosthenes is the well-known device for finding all the primes in an initial interval of the natural numbers. One circles the first prime 2 and then crosses off every second number, namely, 4, 6, 8, etc. The next unmarked number is 3. It is circled, and we then cross off every third number. And so on. After reaching the square root of the upper limit of the sieve, one can stop the procedure and circle every remaining unmarked number. The circled numbers are the primes; the crossed-off numbers the composites.

It should be noted that the sieve of Eratosthenes does more than find primes. Some crossed-off numbers are crossed off many times. For example, 30 is crossed off three times, as is 42, since these numbers have three prime factors. Thus we can quickly scan the array looking for numbers that are crossed off a lot and so quickly find the numbers which have many
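
As a rough illustration of the remark that the sieve of Eratosthenes records more than primality, here is a minimal Python sketch. The function name `sieve_with_cross_counts` and the limit of 100 are choices made for this illustration and do not come from the article. It circles each prime up to the square root of the limit, crosses off that prime's larger multiples, and counts how often every number gets crossed off.

```python
import math


def sieve_with_cross_counts(limit):
    """Sieve of Eratosthenes on 2..limit that also counts crossings.

    Returns (primes, crossed), where crossed[k] is the number of times k
    was crossed off, i.e. the number of distinct primes p <= sqrt(limit)
    with p < k that divide k.
    """
    crossed = [0] * (limit + 1)
    for p in range(2, math.isqrt(limit) + 1):
        if crossed[p] == 0:
            # p was never crossed off, so it is prime: "circle" it and
            # cross off every larger multiple of p.
            for multiple in range(2 * p, limit + 1, p):
                crossed[multiple] += 1
    # After sieving up to sqrt(limit), every unmarked number is prime.
    primes = [k for k in range(2, limit + 1) if crossed[k] == 0]
    return primes, crossed


primes, crossed = sieve_with_cross_counts(100)
print(primes[:10])
# 30 = 2 * 3 * 5 and 42 = 2 * 3 * 7 are each crossed off by three primes.
print(crossed[30], crossed[42])
# Numbers crossed off at least three times have many small prime factors.
print([k for k in range(2, 101) if crossed[k] >= 3])
```

In this sketch, scanning `crossed` for large entries costs one pass over the array, whereas trial-dividing each number separately would repeat the same divisions for every candidate.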
due to the exponential spread of low-cost, high-quality computers.

The Dawn of the Number Field Sieve
Taking his inspiration from a discrete logarithm algorithm of Don Coppersmith, Andrew Odlyzko, and Richard Schroeppel [6] that used quadratic number fields, John Pollard in 1988 circulated a letter to several people outlining an idea of his for factoring certain big numbers via algebraic

² This number had already been suggested in Pollard’s original note as a worthy goal. It was known to be composite (in fact, we already knew a 7-digit prime factor), but the remaining 148-digit cofactor was still composite, with no factor known.

their eyes on a real prize, $2^{2^9} + 1$, the ninth Fermat number.² Clearly it was beyond the range of the quadratic sieve. Hendrik Lenstra’s own elliptic curve method, which he discovered early
in 1985 and which is especially good at splitting numbers which have a relatively small prime factor (say, “only” 30 or so digits) had so far not been of help in factoring it. The Lenstras and Manasse succeeded in getting the prime factorization of $2^{2^9} + 1$ in the spring of 1990. This sensational achievement announced to the world that Pollard’s number field sieve had arrived.

But what of general numbers? In the summer of 1989 I was to give a talk at the meeting of the Canadian Number Theory Association in Vancouver. It was to be a survey talk on factoring, and I figured it would be a good idea to mention Pollard’s new method. On the plane on the way to the meeting I did a complexity analysis of the method as to how it would work for general numbers, assuming myriad technical difficulties did not exist and that it was possible to run it for general numbers. I was astounded. The complexity for this algorithm-that-did-not-yet-exist was of the shape $\exp\left(c (\log n)^{1/3} (\log\log n)^{2/3}\right)$. The key difference over the complexity of the quadratic sieve was that the most important quantity in the exponent, the power of $\log n$, had its exponent reduced from 1/2 to 1/3. If reducing the constant in the exponent had such a profound impact in passing from the continued-fraction method to the quadratic sieve, think what reducing the exponent in the exponent might accomplish. Clearly this method deserved some serious thought!

I do not wish to give the impression that with this complexity analysis I had single-handedly found a way to apply the number field sieve to general composites. Far from it. I merely had a shrouded glimpse of exciting possibilities for the future. That these possibilities were ever realized was mostly due to Joe Buhler, Hendrik Lenstra, and others. In addition, some months earlier Lenstra had done a complexity analysis for Pollard’s method applied to special numbers, and he too arrived at the expression $\exp\left(c (\log n)^{1/3} (\log\log n)^{2/3}\right)$. My own analysis was based on some optimistic algebraic assumptions and on arguments about what might be expected to hold, via averaging arguments, for a general number.

The starting point of Pollard’s method to factor $n$ is to come up with a monic polynomial $f(x)$ over the integers that is irreducible and an integer $m$ such that $f(m) \equiv 0 \bmod n$. The polynomial should have “moderate” degree $d$, meaning that if $n$ has between 100 and 200 digits, then $d$ should be 5 or 6. For a number such as the ninth Fermat number, $n = 2^{2^9} + 1$, it is easy to come up with such a polynomial. Note that $8n = 2^{515} + 8$. So let $f(x) = x^5 + 8$, and let $m = 2^{103}$.

Of what possible use could such a polynomial be? Let $\alpha$ be a complex root of $f(x)$, and consider the ring $\mathbb{Z}[\alpha]$ consisting of all polynomial expressions in $\alpha$ with integer coefficients. Since $f(\alpha) = 0$ and $f(m) \equiv 0 \bmod n$, by substituting the residue $m \bmod n$ for each occurrence of $\alpha$ we have a natural map $\phi$ from $\mathbb{Z}[\alpha]$ to $\mathbb{Z}/(n\mathbb{Z})$. Our conditions on $f$, $\alpha$, and $m$ ensure that $\phi$ is well defined. And not only this, $\phi$ is a ring homomorphism.

Suppose now that $S$ is a finite set of coprime integer pairs $\langle a, b\rangle$ with two properties. The first is that the product of the algebraic integers $a - \alpha b$ for all pairs $\langle a, b\rangle$ in $S$ is a square in $\mathbb{Z}[\alpha]$, say, $\gamma^2$. The second property for $S$ is that the product of all the numbers $a - mb$ for pairs $\langle a, b\rangle$ in $S$ is a square in $\mathbb{Z}$, say, $v^2$. Since $\gamma$ may be written as a polynomial expression in $\alpha$, we may replace each occurrence of $\alpha$ with the integer $m$, coming up with an integer $u$ with $\phi(\gamma) \equiv u \bmod n$. Then

\[
u^2 \equiv \phi(\gamma)^2 = \phi(\gamma^2) = \phi\Bigl(\prod_{\langle a,b\rangle \in S} (a - \alpha b)\Bigr) = \prod_{\langle a,b\rangle \in S} \phi(a - \alpha b) \equiv \prod_{\langle a,b\rangle \in S} (a - mb) = v^2 \bmod n.
\]

And we know what to do with $u$ and $v$. Just as Kraitchik showed us seventy years ago, we hope that we have an interesting congruence, that is, $u \not\equiv \pm v \bmod n$, and if so, we take the greatest common divisor $(u - v, n)$ to get a nontrivial factor of $n$.

Where is the set $S$ of pairs $\langle a, b\rangle$ supposed to come from? For at least the second property $S$ is supposed to have, namely, that the product of the numbers $a - mb$ is a square, it is clear we might again use exponent vectors and a sieve. Here there are two variables $a$ and $b$ instead of just the one variable in $Q(x)$ in the quadratic sieve. So we view this as a parametrized family of linear polynomials. We can fix $b$ and let $a$ run over an interval, then change to the next $b$ and repeat.

But $S$ is to have a second property too: for the same pairs $\langle a, b\rangle$, the product of $a - \alpha b$ is a square in $\mathbb{Z}[\alpha]$. It was Pollard’s thought that if we were in the nice situation that $\mathbb{Z}[\alpha]$ is the full ring of algebraic integers in $\mathbb{Q}(\alpha)$, if the ring is a unique factorization domain, and if we know a basis for the units, then we could equally well create exponent vectors for the algebraic integers $a - \alpha b$ and essentially repeat the same algorithm. To arrange for both properties of $S$ to hold simultaneously, well, this would just involve longer exponent vectors having coordinates for all the small prime numbers, for the sign of $a - \alpha b$, for all the “small” primes in $\mathbb{Z}[\alpha]$, and for each unit in the unit basis.

But how are we supposed to do this for a general number $n$? In fact, how do we even
achieve the first step of finding the polynomial $f(x)$ and the integer $m$ with $f(m) \equiv 0 \bmod n$? And if we could find it, why should we expect that $\mathbb{Z}[\alpha]$ has all of the nice properties to make Pollard’s plan work?

The Number Field Sieve Evolves
There is at the least a very simple device to get started, that is, to find $f(x)$ and $m$. The trick is to find $f(x)$ last. First, one decides on the degree $d$ of $f$. Next, one lets $m$ be the integer part of $n^{1/d}$. Now write $n$ in the base $m$, so that $n = m^d + c_{d-1} m^{d-1} + \cdots + c_0$, where the base $m$ “digits” $c_i$ satisfy $0 \le c_i < m$. (If $n > (2d)^d$, then the leading “digit” $c_d$ is 1.) The polynomial $f(x)$ is now staring us in the face; it is $x^d + c_{d-1} x^{d-1} + \cdots + c_0$. So we have a monic polynomial $f(x)$, but is it irreducible?

There are many strategies for factoring primitive polynomials over $\mathbb{Z}$ into irreducible factors. In fact, we have the celebrated polynomial-time algorithm of Arjen Lenstra, Hendrik Lenstra, and László Lovász for factoring primitive polynomials in $\mathbb{Z}[x]$ (the running time is bounded by a fixed power of the sum of the degree and the number of digits in the coefficients). So suppose we are unlucky and the above procedure leads to a reducible polynomial $f(x)$, say, $f(x) = g(x)h(x)$. Then $n = f(m) = g(m)h(m)$, and from a result of John Brillhart, Michael Filaseta, and Andrew Odlyzko this factorization of $n$ is nontrivial. But our goal is to find a nontrivial factorization of $n$, so this is hardly unlucky at all! Since almost all polynomials are irreducible, it is much more likely that the construction will let us get started with the number field sieve, and we will not be able to factor $n$ immediately.

There was still the main problem of how one might get around the fact that there is no reason to expect the ring $\mathbb{Z}[\alpha]$ to have any nice properties at all. By 1990 Joe Buhler, Hendrik Lenstra, and I had worked out the remaining difficulties and, incorporating a very practical idea of Len Adleman [1], which simplified some of our constructions,³ published a description of the general number field sieve in [11].

Here is a brief summary of what we did. The norm $N(a - \alpha b)$ (over $\mathbb{Q}$) of $a - \alpha b$ is easily worked out to be $b^d f(a/b)$. This is the homogenized version of $f$. We define $a - \alpha b$ to be $Y$-smooth if $N(a - \alpha b)$ is $Y$-smooth. Since the […] product of various algebraic integers $a - \alpha b$ is a square of an algebraic integer, then so too is the corresponding product of norms a square of an integer. Note too that we know how to find a set of pairs $\langle a, b\rangle$ with the product of $N(a - \alpha b)$ a square. This could be done by using a sieve to discover $Y$-smooth values of $N(a - \alpha b)$ and then combine them via exponent vector algebra over $\mathbb{F}_2$.

But having the product of the numbers $N(a - \alpha b)$ be a square, while a necessary condition for the product of the $a - \alpha b$ to be a square, is far from sufficient. The principal reason for this is that the norm map takes various prime ideals to the same thing in $\mathbb{Z}$, and so the norm can easily be a square without the argument being a square. For example, the two degree one primes in $\mathbb{Z}[i]$, $2 + i$ and $2 - i$, have norm 5. Their product is 5, which has norm $25 = 5^2$, but $(2 + i)(2 - i) = 5$ is squarefree. (Note that if we are working in the ring of all algebraic integers in $\mathbb{Q}(\alpha)$, then all of the prime ideal factors of $a - \alpha b$ for coprime integers $a$ and $b$ are degree one; namely, their norms are rational primes.) For each prime $p$ let $R_p$ be the set of solutions to $f(x) \equiv 0 \bmod p$. When we come across a pair $\langle a, b\rangle$ with $p$ dividing $N(a - \alpha b)$, then some prime ideal above $p$ divides $a - \alpha b$. And we can tell which one, since $a/b$ will be congruent modulo $p$ to one of the members of $R_p$, and this will serve to distinguish the various prime ideals above $p$. Thus we can arrange for our exponent vectors to have $\#R_p$ coordinates for each prime $p$ and so keep track of the prime ideal factorization of $a - \alpha b$. Note that $\#R_p \le d$, the degree of $f(x)$.

So we have gotten over the principal hurdle, but there are still many obstructions. We are supposed to be working in the ring $\mathbb{Z}[\alpha]$, and this may not be the full ring of algebraic integers. In fact, this ring may not be a Dedekind domain, so we may not even have factorization into prime ideals. And even if we have factorization into prime ideals, the above paragraph merely assures us that the principal ideal generated by the product of the algebraic integers $a - \alpha b$ is the square of some ideal, not necessarily the square of a principal ideal. And even if it is the square of a principal ideal, it may not be a square of an
norm is multiplicative, it follows that if the prod- algebraic integer, because of units. (For example,
the ideal generated by −9 is the square of an
3Though Adleman’s ideas did not change our theoret- ideal in Z , but −9 is not a square.) And even if
ical complexity estimates for the running time, the sim- the product of the numbers a − αb is a square
plicities they introduced removed most remaining ob-
stacles to making the method competitive in practice 4It is a theorem that if f is a monic irreducible poly-
with the quadratic sieve. It is interesting that Adleman nomial over Z with a complex root α and if γ is in the
himself, like most others around 1990, continued to ring of integers of Q(α) , then f � (α)γ is in Z[α] . So if
think of the general number field sieve as purely a γ 2 is a square in the ring of integers of Q(α) , then
speculative method. f � (α)2 γ 2 is a square in Z[α] .
of an algebraic integer, how do we know it is the square of an element of Z[α]?

The last obstruction is rather easily handled by using $f'(\alpha)^2$ as a multiplier,⁴ but the other obstructions seem difficult. However, there is a simple and ingenious idea of Len Adleman [1] that in one fell swoop overcomes them all. The point is that even though we are being faced with some nasty obstructions, they form, modulo squares, an $\mathbf{F}_2$-vector space of fairly small dimension. So the first thought just might be to ignore the problem. But the dimension is not that small. Adleman suggested randomly choosing some quadratic characters and using their values at the numbers a − αb to augment the exponent vectors. (There is one fixed choice of the random quadratic characters made at the start.) So we are arranging for a product of numbers a − αb to not only be a square up to the “obstruction space” but also highly likely actually to be a square. For example, consider the above problem with −9 not being a square. If somehow we cannot “see” the problem with the sign but it sure looks like a square to us because we know that for each prime p the exponent on p in the prime factorization of −9 is even, we might still detect the problem. Here is how: Consider a quadratic character evaluated at −9, in this case the Legendre symbol (−9/p), which is 1 if −9 is a square mod p and −1 if −9 is not a square mod p. Say we try this with p = 7. It is easy to compute this symbol, and it turns out to be −1. So −9 is not a square mod 7, and so it cannot be a square in Z. If −9 is a square mod some prime p, however, this does not guarantee it is a square in Z. For example, if we had tried this with 5 instead of 7, then −9 would still be looking like a square. Adleman’s idea is to evaluate smooth values of a − αb at the quadratic characters that were chosen and use the linear algebra to create an element with two properties: its (unaugmented) exponent vector has all even entries, and its value at each character is 1. This algebraic integer is highly likely, in a heuristic sense, to be a square. If it is not a square, we can continue to use linear algebra over $\mathbf{F}_2$ to create another candidate.

To be sure, there are still difficulties. One of these is the “square root problem”. If you have the prime factorizations of various rational integers and their product is a square, you can easily find the square root of the square via its prime factorization. But in Z[α] the problem does not seem so transparent. Nevertheless, there are devices for solving this too, though it still remains as a computationally interesting step in the algorithm. The interested reader should consult [15].

Perhaps it is not clear why the number field sieve is a good factoring algorithm. A key quantity in a factorization method such as the quadratic sieve or the number field sieve is what I was calling “X” earlier. It is an estimate for the size of the auxiliary numbers that we are hoping to combine into a square. Knowing X gives you the complexity; it is about $\exp\left(\sqrt{2 \log X \log\log X}\right)$. In the quadratic sieve we have X about $n^{1/2+\varepsilon}$. But in the number field sieve, we may choose the polynomial f(x) and the integer m in such a way that (a − mb)N(a − αb) (the numbers that we hope to find smooth) is bounded by a value of X of the form $\exp\left(c'(\log n)^{2/3}(\log\log n)^{1/3}\right)$. Thus the number of digits of the auxiliary numbers that we sieve over for smooth values is about the 2/3 power of the number of digits of n, as opposed to the quadratic sieve where the auxiliary numbers have more than half the number of digits of n. That is why the number field sieve is asymptotically so fast in comparison.

I mentioned earlier that the heuristic running time for the number field sieve to factor n is of the form $\exp\left(c(\log n)^{1/3}(\log\log n)^{2/3}\right)$, but I did not reveal what “c” is. There are actually three values of c depending on which version of the number field sieve is used. The “special” number field sieve, more akin to Pollard’s original method and well suited to factoring numbers like $2^{2^9} + 1$ which are near high powers, has $c = (32/9)^{1/3} \approx 1.526$. The “general” number field sieve is the method I sketched in this paper and is for use on any odd composite number that is not a power. It has $c = (64/9)^{1/3} \approx 1.923$. Finally, Don Coppersmith [5] proposed a version of the general number field sieve in which many polynomials are used. The value of “c” for this method is $\frac{1}{3}\left(92 + 26\sqrt{13}\right)^{1/3} \approx 1.902$. This stands as the champion worst-case factoring method asymptotically. It had been thought that Coppersmith’s idea was completely impractical, but [8] considers whether the idea of using several polynomials may have some practical merit.

The State of the Art

In April 1996 a large team (see [7]) finished the factorization of a 130-digit RSA challenge number using the general number field sieve. Thus the gauntlet has finally been passed from the quadratic sieve, which had enjoyed champion status since 1983 for the largest “hard” number factored. Though the real time was about the same as with the quadratic sieve factorization of the 129-digit challenge number two years earlier, it was estimated that the new factorization took only about 15% of the computer time. This discrepancy was due to fewer computers being used on the project and some “down time” while code for the final stages of the algorithm was being written.
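The three closed forms quoted above for the constant c in $\exp\left(c(\log n)^{1/3}(\log\log n)^{2/3}\right)$ are easy to evaluate numerically. The following is a minimal Python sketch, not part of the original article; the names `C_SPECIAL`, `C_GENERAL`, `C_COPPERSMITH`, and `nfs_log_cost` are hypothetical, and the helper ignores the o(1) terms hidden in the heuristic estimate, so its output is only a rough guide and not a prediction of running time.

```python
import math

# Closed forms for the constant c in exp(c (log n)^(1/3) (log log n)^(2/3)).
C_SPECIAL = (32 / 9) ** (1 / 3)                            # "special" number field sieve
C_GENERAL = (64 / 9) ** (1 / 3)                            # "general" number field sieve
C_COPPERSMITH = (92 + 26 * math.sqrt(13)) ** (1 / 3) / 3   # many-polynomial variant


def nfs_log_cost(c, digits):
    """Return c (log n)^(1/3) (log log n)^(2/3) for n near 10**digits.

    Assumption: the o(1) terms of the heuristic estimate are dropped,
    so this is the leading part of the exponent only.
    """
    log_n = digits * math.log(10)
    return c * log_n ** (1 / 3) * math.log(log_n) ** (2 / 3)


if __name__ == "__main__":
    variants = [
        ("special", C_SPECIAL),
        ("general", C_GENERAL),
        ("Coppersmith", C_COPPERSMITH),
    ]
    for name, c in variants:
        cost = nfs_log_cost(c, 130)
        print(f"{name:12s} c = {c:.4f}  exponent at 130 digits = {cost:.2f}")
```

Evaluated this way, the special-sieve constant comes out near 1.526, the general one near 1.923, and the many-polynomial one near 1.902, so the ordering special < Coppersmith < general holds.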
In the table, the notation Pk means a prime number of k decimal digits, while the notation Ck means a composite number of k decimal digits for which we know no nontrivial factorization.

The history of the factorization of Fermat numbers is a microcosm of the history of factoring. Fermat himself knew about $F_0$ through $F_4$, and he conjectured that all of the remaining numbers in the sequence $2^{2^m} + 1$ are prime. However, Euler found the factorization of $F_5$. It is not too hard to find this factorization, if one uses the result, essentially due to Fermat, that for p to be a prime factor of $F_m$ it is necessary that $p \equiv 1 \bmod 2^{m+2}$, when m is at least 2. Thus the prime factors of $F_5$ are all 1 mod 128, and the first such prime, which is not a smaller Fermat number, is 641. It is via this idea that $F_6$ was factored (by Landry in 1880) and that “small” prime factors of many other Fermat numbers have been found, including more than 80 beyond this table.

The Fermat number $F_7$ was the first success of the Brillhart-Morrison continued fraction factoring method. Brent and Pollard used an adaptation of Pollard’s “rho” method to factor $F_8$. As discussed in the main article, $F_9$ was factored by the number field sieve. The Fermat numbers $F_{10}$ and $F_{11}$ were factored by Brent using Lenstra’s elliptic curve method.

We know that $F_{14}$, $F_{20}$ and $F_{22}$ are composite, but we do not know any prime factors of these numbers. That they are composite was discovered via Pepin’s criterion: $F_m$ is prime if and only if $3^{(F_m - 1)/2} \equiv -1 \bmod F_m$. The smallest Fermat number for which we do not know if it is prime or composite is $F_{24}$. It is now thought by many number theorists that every Fermat number after $F_4$ is composite.

Fermat numbers are connected with an ancient problem of Euclid: for which n is it possible to construct a regular n-gon with straightedge and compass? Gauss showed that a regular n-gon is constructible if and only if n ≥ 3 and the largest odd factor of n is a product of distinct, prime Fermat numbers. Gauss’s theorem, discovered at the age of 19, followed him to his death: a regular 17-gon is etched on his gravestone.
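Two facts from this sidebar can be tried directly on small cases: Pepin’s criterion, and the search for a factor of $F_m$ among numbers congruent to 1 modulo $2^{m+2}$. The following is a minimal Python sketch; the function names `fermat`, `pepin_is_prime`, and `smallest_congruent_factor` are hypothetical, and the sketch assumes m ≥ 1 for Pepin’s criterion and m ≥ 2 for the congruence search.

```python
def fermat(m):
    """Return the Fermat number F_m = 2^(2^m) + 1."""
    return 2 ** (2 ** m) + 1


def pepin_is_prime(m):
    """Pepin's criterion for m >= 1: F_m is prime iff 3^((F_m-1)/2) = -1 mod F_m."""
    f = fermat(m)
    return pow(3, (f - 1) // 2, f) == f - 1


def smallest_congruent_factor(m, limit):
    """Trial-divide F_m (m >= 2) by candidates p = 1 mod 2^(m+2), p <= limit.

    Every prime factor of F_m has this form, so the first candidate that
    divides F_m is its smallest prime factor. Returns None if no candidate
    up to `limit` divides F_m.
    """
    f = fermat(m)
    step = 2 ** (m + 2)
    p = step + 1
    while p <= limit and p < f:
        if f % p == 0:
            return p
        p += step
    return None


if __name__ == "__main__":
    print([m for m in range(1, 8) if pepin_is_prime(m)])  # indices of prime F_m
    print(smallest_congruent_factor(5, 10**4))            # Euler's factor of F_5
    print(smallest_congruent_factor(6, 10**6))            # Landry's factor of F_6
```

For $F_5$ the candidates are 129, 257, 385, 513, 641, and only the last divides $F_5$, so the search needs just five trial divisions.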
So where is the crossover between the quadratic sieve and the number field sieve? The answer to this depends somewhat on whom you talk to. One thing everyone agrees on: for smaller numbers (say, less than 100 digits) the quadratic sieve is better, and for larger numbers (say, more than 130 digits) the number field sieve is better. One reason a question like this does not have an easy answer is that the issue is highly dependent on fine points in the programming and on the kind of computers used. For example, as reported in [7], the performance of the number field sieve is sensitive to how much memory a computer has. The quadratic sieve is as well, but not to such a large degree.

There is much that was not said in this brief survey. An important omission is a discussion of the algorithms and complexity of the linear algebra part of the quadratic sieve and the number field sieve. At the beginning we used Gaussian elimination, as Brillhart and Morrison did with the continued-fraction method. But the size of the problem has kept increasing. Nowadays a factor base of size one million is in the ballpark for record factorizations. Clearly, a linear algebra problem that is one million by one million is not a trifling matter. There is interesting new work
on this that involves adapting iterative methods for dealing with sparse matrices over the real numbers to sparse matrices over $\mathbf{F}_2$. For a recent reference, see [14].

Several variations on the basic idea of the number field sieve show some promise. One can replace the linear expression a − mb used in the number field sieve with $b^k g(a/b)$, where g(x) is an irreducible polynomial over Z of degree k with g(m) ≡ 0 mod n. That is, we use two polynomials f(x), g(x) with a common root m mod n (the original scenario has us take g(x) = x − m). It is a subject of current research to come up with good strategies for choosing polynomials. Another variation on the usual number field sieve is to replace the polynomial f(x) with a family of polynomials along the lines suggested by Coppersmith. For a description of the number field sieve incorporating both of these ideas, see [8].

The discrete logarithm problem (given a cyclic group with generator g and an element h in the group, find an integer x with $g^x = h$) is also of keen interest in cryptography. As mentioned, Pollard’s original idea for the number field sieve was born out of a discrete logarithm algorithm. We have come full circle, since Dan Gordon, Oliver Schirokauer, and Len Adleman have all given variations of the number field sieve that can be used to compute discrete logarithms in multiplicative groups of finite fields. For a recent survey, see [22].

I have said nothing on the subject of primality testing. It is generally much easier to recognize that a number is composite than to factor it. When we use complicated and time-consuming factorization methods on a number, we already know from other tests that it is an odd composite and it is not a power.

I have given scant mention of Hendrik Lenstra’s elliptic curve factorization method. This algorithm is much superior to both the quadratic sieve and the number field sieve for all but a thin set of composites, the so-called “hard” numbers, for which we reserve the sieve methods.

There is also a rigorous side to factoring, where researchers try to dispense with heuristics and prove theorems about factorization algorithms. So far we have had much more success proving theorems about probabilistic methods than deterministic methods. We do not seem close to proving that various practical methods, such as the quadratic sieve and the number field sieve, actually work as advertised. It is fortunate that the numbers we are trying to factor have not been informed of this lack of proof!

For further reading I suggest several of the references already mentioned and also [10, 13, 17, 18, 19, 20]. In addition, I am currently writing a book with Richard Crandall, PRIMES: A computational perspective, that should be out sometime in 1997.

I hope I have been able to communicate some of the ideas and excitement behind the development of the quadratic sieve and the number field sieve. This development saw an interplay between theoretical complexity estimates and good programming intuition. And neither could have gotten us to where we are now without the other.

Acknowledgments

This article is based on a lecture of the same title given as part of the Pitcher Lecture Series at Lehigh University, April 30–May 2, 1996. I gratefully acknowledge their support and encouragement for the writing of this article. I also thank the Notices editorial staff, especially Susan Landau, for their encouragement. I am grateful to the following individuals for their critical comments: Joe Buhler, Scott Contini, Richard Crandall, Bruce Dodson, Andrew Granville, Hendrik Lenstra, Kevin McCurley, Andrew Odlyzko, David Pomerance, Richard Schroeppel, John Selfridge, and Hugh Williams.

References

[1] L. M. Adleman, Factoring numbers using singular integers, Proc. 23rd Annual ACM Sympos. Theory of Computing (STOC), 1991, pp. 64–71.
[2] W. R. Alford and C. Pomerance, Implementing the self initializing quadratic sieve on a distributed network, Number Theoretic and Algebraic Methods in Computer Science, Proc. Internat. Moscow Conf., June–July 1993 (A. J. van der Poorten, I. Shparlinski, and H. G. Zimmer, eds.), World Scientific, 1995, pp. 163–174.
[3] J. Brillhart, D. H. Lehmer, J. L. Selfridge, B. Tuckerman, and S. S. Wagstaff Jr., Factorizations of $b^n \pm 1$, b = 2, 3, 5, 6, 7, 10, 11, 12, up to high powers, second ed., vol. 22, Contemp. Math., Amer. Math. Soc., Providence, RI, 1988.
[4] E. R. Canfield, P. Erdős, and C. Pomerance, On a problem of Oppenheim concerning “Factorisatio Numerorum”, J. Number Theory 17 (1983), 1–28.
[5] D. Coppersmith, Modifications to the number field sieve, J. Cryptology 6 (1993), 169–180.
[6] D. Coppersmith, A. M. Odlyzko, and R. Schroeppel, Discrete logarithms in GF(p), Algorithmica 1 (1986), 1–15.
[7] J. Cowie, B. Dodson, R. Marije Elkenbracht-Huizing, A. K. Lenstra, P. L. Montgomery, and J. Zayer, A world wide number field sieve factoring record: On to 512 bits, Advances in Cryptology: Asiacrypt '96, to appear.
[8] M. Elkenbracht-Huizing, A multiple polynomial general number field sieve, Algorithmic Number Theory, Second Intern. Sympos., ANTS-II, to appear.