Polychap
Rev. 19/V/14 JM
Each polynomial $f \in \mathbb{R}[x_1,\dots,x_n]$ defines a function $\mathbb{R}^n \to \mathbb{R}$ in an obvious
way. Usually we will use the same letter for the polynomial and for the function.
Exercise 1.1. Prove (before reading further, and carefully!) that if the function
defined by a polynomial $f \in \mathbb{R}[x,y]$ is zero everywhere on $\mathbb{R}^2$, then $f$ is the zero
polynomial; that is, all coefficients are 0. Similarly for $k[x_1,\dots,x_n]$ where $k$
is an infinite field. (Note that if $F$ is a finite field, then $\prod_{a\in F}(x-a)$ is a nonzero
polynomial defining the zero function $F \to F$.)
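The remark about finite fields is easy to check numerically. Here is a minimal sketch, assuming the prime field $\mathbb{F}_p$ represented by the integers modulo $p$; the helper name `vanishing_poly_values` is ours and is not part of the text. The product $\prod_{a}(x-a)$ is monic of degree $p$, hence a nonzero polynomial, yet every value it takes on $\mathbb{F}_p$ is zero.

```python
def vanishing_poly_values(p):
    """Values of prod_{a in F_p} (x - a) at every x in F_p, for a prime p."""
    values = []
    for x in range(p):
        prod = 1
        for a in range(p):
            # One of the factors is (x - x) = 0, so the product vanishes.
            prod = (prod * (x - a)) % p
        values.append(prod)
    return values


print(vanishing_poly_values(5))
```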
Here we measure the size of the zero set of $f$ discretely, by counting the
points of its intersection with the “combinatorial cube” $S^n$. If $k = \mathbb{F}_q$, we can
often simply take $S = \mathbb{F}_q$.
where $k$ is the maximum exponent of $x_1$ in $f$.
We divide the $n$-tuples $(r_1,\dots,r_n)$ with $f(r_1,\dots,r_n) = 0$ into two classes.
The first class, called $V_1$, consists of the $n$-tuples with $f_k(r_2,\dots,r_n) = 0$. Since
the polynomial $f_k(x_2,\dots,x_n)$ is not identically zero and has degree at most
$d-k$, the number of choices for $(r_2,\dots,r_n)$ is at most $(d-k)|S|^{n-2}$ by the
induction hypothesis, and so $|V_1| \le (d-k)|S|^{n-1}$.
The second class $V_2$ consists of the remaining $n$-tuples, that is, those with
$f(r_1,\dots,r_n) = 0$ but $f_k(r_2,\dots,r_n) \ne 0$. Here we count as follows: $r_2$ through $r_n$ can be
chosen in at most $|S|^{n-1}$ ways, and if $r_2,\dots,r_n$ are fixed with $f_k(r_2,\dots,r_n) \ne 0$,
then $r_1$ must be a root of the univariate polynomial $g(x_1) = f(x_1,r_2,\dots,r_n)$.
This polynomial has degree (exactly) $k$, and hence at most $k$ roots. Thus
$|V_2| \le k|S|^{n-1}$, which gives $d|S|^{n-1}$ altogether, finishing the proof.
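The bound $d|S|^{n-1}$ obtained at the end of the proof can be sanity-checked by brute force on small instances. The sketch below is only an illustration: the function `count_zeros`, the sample polynomial $x_1x_2 - 1$, and the set $S = \{-3,\dots,3\}$ are our own choices, not taken from the text.

```python
from itertools import product


def count_zeros(f, S, n):
    """Number of points r in S^n with f(*r) == 0, by exhaustive search."""
    return sum(1 for r in product(S, repeat=n) if f(*r) == 0)


# Hypothetical example: f(x1, x2) = x1*x2 - 1 has degree d = 2 in n = 2 variables.
S = range(-3, 4)                      # |S| = 7
f = lambda x1, x2: x1 * x2 - 1
d, n = 2, 2

zeros = count_zeros(f, S, n)          # integer solutions of x1*x2 = 1 inside S^2
bound = d * len(S) ** (n - 1)         # d * |S|^(n-1)
print(zeros, "<=", bound)
```

Exhaustive search costs $|S|^n$ evaluations, so this is useful only for checking small cases, not as an algorithm.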
Exercise 2.2. Check that the Schwartz–Zippel theorem is tight; i.e., exhibit an
$n$-variate polynomial of degree $d$ whose zero set in $S^n$ has $d|S|^{n-1}$ points (where
$d < |S|$).
Exercise 2.3. Imitate the proof of the Schwartz–Zippel theorem to show that
the zero set of $f$ is Lebesgue null for every nonzero $f \in \mathbb{R}[x_1,\dots,x_n]$. (Fubini’s
theorem helps with a convenient proof, if you are somewhat familiar with measure
theory.)
variables. We introduce a variable $x_{ij}$ for every edge $\{u_i,v_j\} \in E(G)$ (so we
have $m$ variables altogether), and we define an $n \times n$ matrix $A$ by
\[
a_{ij} := \begin{cases} x_{ij} & \text{if } \{u_i,v_j\} \in E(G),\\ 0 & \text{otherwise.} \end{cases}
\]
Finally, it is worth mentioning that although the basic version of the algo-
rithm, as described above, only decides if there is a perfect matching but does
not find one, there are more sophisticated extensions that also find a perfect
matching, and if a perfect matching does not exist, they can find a matching of
maximum cardinality. See [Har09] for recent results and references.
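The basic decision version can be sketched as follows. The description of the algorithm is not part of this excerpt, so the code rests on stated assumptions: it uses the matrix $A$ defined above, the standard fact that $\det A$ is a nonzero polynomial exactly when $G$ has a perfect matching, and random substitutions from a prime field $\mathbb{F}_p$ (our choice, to keep the arithmetic exact). The names `det_mod_p` and `probably_has_perfect_matching` are ours.

```python
import random


def det_mod_p(M, p):
    """Determinant of a square integer matrix modulo a prime p (Gaussian elimination)."""
    M = [[v % p for v in row] for row in M]
    n = len(M)
    det = 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c]), None)
        if pivot is None:
            return 0
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det                      # a row swap flips the sign
        det = det * M[c][c] % p
        inv = pow(M[c][c], -1, p)           # modular inverse (Python 3.8+)
        for r in range(c + 1, n):
            factor = M[r][c] * inv % p
            if factor:
                M[r] = [(a - factor * b) % p for a, b in zip(M[r], M[c])]
    return det % p


def probably_has_perfect_matching(n, edges, p=2**31 - 1, trials=5, rng=random):
    """edges: pairs (i, j) meaning {u_i, v_j} is an edge, with 0 <= i, j < n."""
    for _ in range(trials):
        A = [[0] * n for _ in range(n)]
        for i, j in edges:
            A[i][j] = rng.randrange(1, p)   # random value substituted for x_ij
        if det_mod_p(A, p) != 0:
            return True                     # nonzero determinant certifies a matching
    return False                            # zero every time: probably no matching
```

A `True` answer is always correct; a `False` answer is wrong only if every trial hits a zero of a nonzero polynomial of degree $n$, which by the Schwartz–Zippel theorem has probability at most $n/(p-1)$ per trial.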
Counting compositions. The strategy in the above algebraic algorithm is
very general and can be used for arbitrary polynomial identity testing; that
is, for a polynomial of controlled degree provided by a black box, the Schwartz–
Zippel theorem allows us to test whether the polynomial is identically zero.
Here is another lovely application.
Given a set $P \subseteq S_n$ of permutations, we would like to count $|P \circ P|$, i.e.,
the number of distinct permutations $\rho$ that can be expressed as a composition
$\sigma \circ \tau$ for $\sigma, \tau \in P$. Mainly for notational simplicity, let us assume $|P| = n$.
A straightforward algorithm for computing $|P \circ P|$ takes every pair $(\sigma,\tau) \in P^2$,
computes the composition $\sigma \circ \tau$ in $O(n)$ time, and then counts the number
of distinct permutations in the resulting list. With some care, this can be
implemented in a total of $O(n^3)$ time.
To get an asymptotically faster, algebraic algorithm, we introduce variables
$x_1,\dots,x_n$ and $y_1,\dots,y_n$. Let us observe that, given permutations $\sigma$ and $\tau$, the
(quadratic) polynomial
\[
f_{\sigma\tau} := \sum_{i=1}^{n} x_{\sigma(i)}\, y_{\tau^{-1}(i)}
\]
encodes the composition $\rho := \sigma \circ \tau$, in the sense that
\[
f_{\sigma\tau} = \sum_{i=1}^{n} x_{\rho(i)}\, y_i,
\]
as is easy to check. Consequently, $f_{\sigma\tau}$ and $f_{\sigma'\tau'}$ are equal polynomials iff
$\sigma \circ \tau = \sigma' \circ \tau'$. Hence, $|P \circ P|$ equals the number of distinct polynomials among
$(f_{\sigma\tau} : \sigma, \tau \in P)$.
Next, we observe that all of the $f_{\sigma\tau}$ can be evaluated simultaneously using
a matrix product. Indeed, let us enumerate $P = \{\sigma_1,\dots,\sigma_n\}$, and define the
polynomial matrices $A, B$ with $a_{ij} = x_{\sigma_j(i)}$ and $b_{ij} = y_{\sigma_j^{-1}(i)}$. Setting $C = A^T B$,
we find that $c_{ij} = f_{\sigma_i\sigma_j}$.
The probabilistic algorithm for computing $|P \circ P|$ now goes as follows. We
set $N := 4n^4$, $S := \{1,2,\dots,N\}$, we pick values $s_1,\dots,s_n$ and $t_1,\dots,t_n$ from
$S$ independently and uniformly at random, we make the substitutions $x_i := s_i$
and $y_i := t_i$, $i = 1,2,\dots,n$, and we compute the value of $C$. By fast matrix
multiplication this can be done in $O(n^{2.376})$ time. We return the number of
distinct entries of the resulting matrix as the answer.
Clearly, this answer is never larger than $|P \circ P|$. If it is strictly smaller than
$|P \circ P|$, it means that a nonzero polynomial of the form $f_{\sigma\tau} - f_{\sigma'\tau'}$ evaluates
to 0 at $s_1,\dots,s_n,t_1,\dots,t_n$. For every fixed quadruple $(\sigma,\tau,\sigma',\tau') \in P^4$, this
has probability at most $\frac{1}{2n^4}$ according to the Schwartz–Zippel theorem (with
degree $d = 2$ and $|S| = 4n^4$). The probability that this occurs for at least one of the at most
$n^4$ quadruples is thus at most $\frac12$.
Hence the answer is correct with probability at least $\frac12$. As before, this
probability can be boosted by repetition and/or by choosing larger $S$.
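The whole procedure fits in a few lines. The following is a sketch under stated assumptions: permutations are tuples on $\{0,\dots,n-1\}$, the entries of $C = A^TB$ are computed by a naive triple loop rather than by fast matrix multiplication (so it does not achieve the $O(n^{2.376})$ running time), and we set $N := 4|P|^4$ so that the example also makes sense when $|P| \ne n$. All function names are ours.

```python
import random
from itertools import product


def compose(sigma, tau):
    """(sigma o tau)(i) = sigma(tau(i)); permutations are tuples on {0, ..., n-1}."""
    return tuple(sigma[t] for t in tau)


def inverse(sigma):
    inv = [0] * len(sigma)
    for i, s in enumerate(sigma):
        inv[s] = i
    return inv


def count_exact(P):
    """The straightforward algorithm: list all compositions and count distinct ones."""
    return len({compose(s, t) for s, t in product(P, repeat=2)})


def count_algebraic(P, rng=random):
    """One run of the randomized algorithm; the result never exceeds |P o P|."""
    n, size = len(P[0]), len(P)
    N = 4 * size ** 4
    s = [rng.randint(1, N) for _ in range(n)]      # substituted for x_1, ..., x_n
    t = [rng.randint(1, N) for _ in range(n)]      # substituted for y_1, ..., y_n
    inverses = [inverse(sigma) for sigma in P]
    # c_ij = sum_k s[sigma_i(k)] * t[sigma_j^{-1}(k)], the (i, j) entry of A^T B
    entries = {sum(s[si[k]] * t[tj[k]] for k in range(n))
               for si in P for tj in inverses}
    return len(entries)


P = [(0, 1, 2, 3), (1, 0, 2, 3), (0, 1, 3, 2), (1, 2, 3, 0)]
exact = count_exact(P)
# Boosting by repetition: each run is a lower bound, so we keep the maximum.
estimate = max(count_algebraic(P, random.Random(seed)) for seed in range(10))
print(exact, estimate)
```

Since a single run can only undercount, taking the maximum over independent runs is the natural way to boost the success probability.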
Exercise 4.3. (a) We recall that real numbers $\xi_1,\dots,\xi_n$ are algebraically
independent (over the rationals) if there is no nonzero polynomial $f \in \mathbb{Q}[x_1,\dots,x_n]$
with $f(\xi_1,\dots,\xi_n) = 0$. Prove that for every $n$ there exist $n$ algebraically
independent real numbers. Hint: one can use a cardinality argument or a measure
argument, for example.
(b) Show that if $a_1,\dots,a_N \in \mathbb{R}^n$ are points whose $nN$ coordinates are
algebraically independent, and if $N = \binom{d+n}{n}$, then the only polynomial
$f \in \mathbb{R}[x_1,\dots,x_n]$ of degree at most $d$ vanishing at all the $a_i$ is identically zero.
Exercise 4.4. (a) Given $a_1,\dots,a_N \in k^n$ and values $b_1,\dots,b_N \in k$, prove that
there exists a polynomial $f \in k[x_1,\dots,x_n]$ with $f(a_i) = b_i$ for all $i = 1,\dots,N$,
and with $\deg f \le N-1$.
(b) Show that the bound $\deg f \le N-1$ is optimal in the worst case (i.e.,
find $a_1,\dots,a_N$ and $b_1,\dots,b_N$ for which no $f$ of smaller degree will do). Note
that for $n \ge 2$, this bound is very different from the one in Lemma 4.2.
and it was proved by Guth and Katz [GK10] in 2008, after many years of effort
by a number of people and many intermediate results, that, asymptotically, this
is the most one can get.
Theorem 4.5. The maximum number of joints of $n$ lines in $\mathbb{R}^3$ is $O(n^{3/2})$.
There is a straightforward generalization to $\mathbb{R}^d$: for every fixed $d$, the maximum
number of joints of $n$ lines in $\mathbb{R}^d$ is of order $n^{d/(d-1)}$, where a joint means a
point common to at least $d$ lines whose direction vectors span $\mathbb{R}^d$. For simplicity
we stick to the $d = 3$ case.
On partial derivatives. We recall that, for a polynomial $f \in \mathbb{R}[x_1,\dots,x_n]$,
the partial derivative $\partial f/\partial x_i$ is the usual derivative of a univariate real
function, where $x_i$ is regarded as a variable, while all the other $x_j$ are considered
constant. The gradient of $f$ is the $n$-tuple
\[
\nabla f := \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right).
\]
As a side remark, we note that the derivative can be defined purely formally,
by setting $\partial(x^i)/\partial x := i x^{i-1}$ and extending linearly, and this makes sense over
any field. Many of the usual properties of derivatives can then be checked as
well, and so one need not specialize to real (or complex) numbers. For finite
fields, though, it may be better to work with the Hasse derivative, where the
$m$th Hasse derivative is given by $D_m(x^i) := \binom{i}{m} x^{i-m}$; this avoids troubles with dividing
by zero, e.g., in a Taylor expansion formula.
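The difference between the two notions shows up already for $x^p$ over $\mathbb{F}_p$. Below is a small sketch, assuming univariate polynomials over a prime field stored as coefficient lists with the constant term first; the function names are ours. The formal derivative of $x^3$ over $\mathbb{F}_3$ is $3x^2 = 0$, so iterated formal derivatives lose all information about it, whereas the third Hasse derivative is $\binom{3}{3} = 1$.

```python
from math import comb


def formal_derivative(coeffs, p):
    """d/dx of sum c_i x^i over F_p, using d(x^i)/dx = i * x^(i-1)."""
    return [(i * c) % p for i, c in enumerate(coeffs)][1:]


def hasse_derivative(coeffs, m, p):
    """m-th Hasse derivative over F_p: D_m(x^i) = C(i, m) * x^(i-m)."""
    return [(comb(i, m) * c) % p for i, c in enumerate(coeffs)][m:]


x_cubed = [0, 0, 0, 1]                     # the polynomial x^3
print(formal_derivative(x_cubed, 3))       # coefficients of 3x^2 reduced mod 3
print(hasse_derivative(x_cubed, 3, 3))     # coefficient list of D_3(x^3)
```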
If f ∈ R[x1 , . . . , xn ] is of degree d ≥ 1, then each of the partial derivatives
∂f /∂xi is a polynomial of degree at most d − 1, and at least one of them is
nonzero.
The following observation connects the definition of a joint to algebra.
Observation 4.6. Let $a$ be a joint of lines $\ell_1, \ell_2, \ell_3$ in $\mathbb{R}^3$, and let $f \in \mathbb{R}[x_1,x_2,x_3]$
be a polynomial that vanishes on each of the $\ell_i$. Then $\nabla f(a) = 0$; that is, all of
the partial derivatives of $f$ vanish at $a$.
Proof. This follows easily using the notion and simple properties of directional
derivatives. Here is a more explicit argument.
W.l.o.g. we may assume $a = 0$ (the general case follows by translation).
If we write $f = c_0 + c_1x_1 + c_2x_2 + c_3x_3 + {}$terms of degree $\ge 2$, then we have
$\frac{\partial f}{\partial x_i}(0) = c_i$, and $\nabla f(0) = c := (c_1,c_2,c_3)$. Letting $v_i = (v_{i1},v_{i2},v_{i3})$ be the
directional vector of $\ell_i$, the restriction of $f$ to the line $\ell_i$ can be regarded as the
univariate polynomial
We are almost ready for the proof of the $O(n^{3/2})$ upper bound for joints,
but there is still a simple technical step to be prepared. If we have a set $L$ of
$n$ lines with a large number $m$ of joints, then an average line contains “many”
joints, namely, at least $3m/n$. But for the polynomial argument to work, we want
every line to contain many joints. This is taken care of by a standard “pruning”
argument (if you know the proof of the statement that every graph of average
degree $2\delta$ contains a subgraph with minimum degree at least $\delta$, then you also
know the proof of the next lemma).
Lemma 4.7. Let $L$ be a set of $n$ lines in $\mathbb{R}^3$, let $J$ be the set of all joints of $L$,
and let $b := m/2n$, where $m = |J|$. Then there are subsets $L' \subseteq L$ and $J' \subseteq J$
such that $L' \ne \emptyset$, every point of $J'$ is a joint of the lines of $L'$, and every line
of $L'$ contains more than $b$ points of $J'$.
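The proof is not reproduced in this excerpt; the sketch below follows the standard pruning argument alluded to above, and it is an assumption that this matches the intended proof. We repeatedly delete a line that contains at most $b$ of the surviving joints, together with those joints. At most $n$ lines are deleted and each deletion removes at most $b$ joints, so at most $nb = m/2$ joints disappear and $J'$ stays nonempty; moreover, a surviving joint keeps all of its lines, so it is still a joint of $L'$. The data representation (a dict from lines to the set of joints on them) and the name `prune` are ours.

```python
def prune(lines_to_joints):
    """lines_to_joints: dict mapping each line to the set of joints lying on it.

    Returns (L', J', b), where L' maps each surviving line to its surviving joints.
    Assumes at least one line.
    """
    n = len(lines_to_joints)
    joints = set().union(*lines_to_joints.values())
    b = len(joints) / (2 * n)
    alive = {line: set(js) for line, js in lines_to_joints.items()}
    changed = True
    while changed:
        changed = False
        for line in list(alive):
            if len(alive[line]) <= b:
                removed = alive.pop(line)      # delete the poor line ...
                joints -= removed              # ... and all joints on it
                for other in alive.values():
                    other -= removed
                changed = True
    return alive, joints, b


# Hypothetical test instance: axis-parallel lines through a 3 x 3 x 3 grid,
# plus five stray lines that contain no joint at all.
s = 3
grid = {}
for u in range(s):
    for v in range(s):
        grid[("x", u, v)] = {(t, u, v) for t in range(s)}
        grid[("y", u, v)] = {(u, t, v) for t in range(s)}
        grid[("z", u, v)] = {(u, v, t) for t in range(s)}
for i in range(5):
    grid[("stray", i)] = set()

L_prime, J_prime, b = prune(grid)
print(len(L_prime), len(J_prime), b)
```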
Proof of Theorem 4.5. For contradiction, we suppose that a set $L$ of $n$ lines has
$m \ge 7n^{3/2}$ joints. Let $J$, $b = m/2n$, $J'$, and $L'$ be as in the previous lemma.
We choose a nonzero polynomial $f \in \mathbb{R}[x_1,x_2,x_3]$ that vanishes on all of $J'$
and, subject to this condition, has the smallest possible degree.
First we claim that $\deg f \le b$. Indeed, by Lemma 4.2, $\deg f$ does not exceed
the smallest integer $d$ with $\binom{d+3}{3} > |J'|$, and a simple calculation shows that
$\binom{b+3}{3} > m \ge |J'|$. Namely,
\[
\binom{b+3}{3} > \frac{b^3}{3!} = \frac{(m/2n)^3}{3!} = m\,\frac{m^2}{48n^3} \ge m,
\]
This kind of argument is what Larry Guth calls contagious vanishing: the
vanishing of $f$ spreads like infection from $J'$ to the lines of $L'$. In more
complicated proofs of this kind, this spreading may continue further, to suitable
planes or surfaces, and sometimes ultimately to the whole space.
There are several other beautiful applications of the contagious vanishing
argument. The most significant ones are probably a near-optimal solution to
the Erdős distinct distances problem due to Elekes, Guth, and Katz, and a proof
of the Kakeya conjecture over finite fields due to Dvir. These, and much more,
can be found, e.g., in Guth’s book [Gut13] in preparation or in Tao’s survey
[Tao13]. There are also older arguments in number theory, due to Thue (see
[Gut13]) and, especially, due to Baker (see [Wal79, Sec. 4]), which use some
sort of contagious vanishing.
Exercise 5.2. Prove that the sets $\mathbb{Z} \subseteq \mathbb{R}$ and $[0,1]^2 \subseteq \mathbb{R}^2$ are not algebraic
varieties (over $\mathbb{R}$).
Exercise 5.3. Show that a ring R (commutative and with 1) has only two
ideals, {0} and R, iff it is a field.
Theorem 5.5 (Hilbert basis³ theorem). If $R$ is a Noetherian ring, then the
polynomial ring $R[x]$ is Noetherian as well. Consequently, $k[x_1,\dots,x_n]$ is
Noetherian for every field $k$.
Hilbert’s proof, more than 100 years old, was unusual at that time since
it was nonconstructive: it proved the existence of a finite generating set in
every ideal of k[x1 , . . . , xn ], but did not provide any method for finding one.
This nonconstructive approach was initially criticized, but later on embraced
enthusiastically by the mathematical community. In the last decades, with
renewed emphasis on computations and algorithms, people again put much
effort into finding constructive, and efficient, proofs for important results.
Then $f_{m+1} - g$ has degree strictly smaller than $f_{m+1}$ and lies in $I \setminus I'$ (why?).
But this contradicts our choice of $f_{m+1}$ as a smallest-degree element.
Exercise 5.6. For every n, find an ideal in R[x, y] that needs at least n gener-
ators.
6 The Nullstellensatz
The German word Nullstellensatz, meaning “zero locus theorem,” is commonly
used in English to denote a basic and classical result of algebraic geometry. It
applies to varieties over an algebraically closed field, most notably over C—a
very important assumption.
³ Here “basis” refers to what we call “generating set.” In linear algebra, bases are
inclusion-minimal generating sets and they have a number of neat properties, such as all having the
same size for a given vector space. In contrast, different inclusion-minimal generating sets of
an ideal may have very different sizes and thus, for example, they are unsuitable for defining
“dimension.”
For a field $k$ that is not algebraically closed, one can sometimes obtain useful
information by applying the Nullstellensatz with the algebraic closure $\bar{k}$ of $k$,
which is an inclusion-minimal algebraically closed field extending $k$; as it turns
out, $\bar{k}$ is determined uniquely up to isomorphism.
Exercise 6.1. (a) Prove that for every field k, possibly finite, there are in-
finitely many irreducible polynomials in k[x], none a constant multiple of an-
other. (Recall that a polynomial f is irreducible if it is not a product f = gh
with deg g, deg h ≥ 1.)
(b) Deduce that every algebraically closed field is infinite.
Exercise 6.2. Prove this using suitable theorems from linear algebra.
The weak Nullstellensatz can also be stated in this form: if a system of polynomial
equations $f_1 = f_2 = \dots = f_m = 0$, with $f_1,\dots,f_m \in k[x_1,\dots,x_n]$ and
$k$ algebraically closed, has no solution, then there are polynomials $h_1,\dots,h_m \in
k[x_1,\dots,x_n]$ such that $h_1f_1 + \dots + h_mf_m = 1$. The last equation is an obvious
reason for the unsolvability of the original system, since any common zero of
$f_1,\dots,f_m$ would also be a zero of $h_1f_1 + \dots + h_mf_m$, but the latter is never
zero.
Here is the usual, formally somewhat different, statement.
The usual proofs of the weak Nullstellensatz, including those given here, are
nonconstructive—they do not provide the hi for given fi . Algorithmic methods
exist as well, and we will mention them later on. But it should be said that
although the weak Nullstellensatz provides an “obvious” reason, or proof, of
unsolvability of a given polynomial system, that proof is not necessarily very
compact. Indeed, examples are known in which the smallest possible degree of
the hi has to be exponential in n (see [Kol88] for precise bounds).
The ideal–variety correspondence: the (strong) Nullstellensatz. The
strong Nullstellensatz basically says that, over an algebraically closed field, algebraic
varieties in $k^n$ are in one-to-one correspondence with ideals in $k[x_1,\dots,x_n]$.
Or actually, not with all ideals but radical ones, where an ideal $I$ is radical if
$f^s \in I$ for some natural number $s$ implies $f \in I$.
This extra condition is needed since, e.g., the ideals $\langle x \rangle$ and $\langle x^2 \rangle$ in $\mathbb{C}[x]$
both define the same variety, namely $\{0\}$, but only the first one is radical. For
an arbitrary ideal $I$ in a ring $R$, its radical $\sqrt{I}$ is defined in the expected way,
as $\{f \in R : f^s \in I \text{ for some } s\}$.
Exercise 6.5. Check that $\sqrt{I}$ is an ideal.
For a set $S \subseteq k^n$, let
\[
I(S) := \{ f \in k[x_1,\dots,x_n] : f \text{ vanishes on } S \};
\]
clearly, this is an ideal.
Exercise 6.6. (a) Check that $V(I(X)) = X$ for every variety $X$, over any field.
(b) Verify that $\sqrt{I} \subseteq I(V(I))$ for every ideal $I \subseteq k[x_1,\dots,x_n]$, again over any
field.
Proof of the strong Nullstellensatz from the weak one. This is known as the
Rabinowitsch trick: we add a new variable and a new equation to get an unsatisfiable
system, to which we apply the weak Nullstellensatz.
Namely, let $I = \langle f_1,\dots,f_m \rangle$; then the polynomials $f_1,\dots,f_m$ and
$x_{n+1}g - 1 \in k[x_1,\dots,x_{n+1}]$ have no common zero in $k^{n+1}$. So by the weak
Nullstellensatz we have
\[
h_1f_1 + \dots + h_mf_m + h_{m+1}(x_{n+1}g - 1) = 1 \tag{1}
\]
for some $h_1,\dots,h_{m+1} \in k[x_1,\dots,x_{n+1}]$.
This equality holds for every value of the variables, and in particular, with
$x_{n+1} = 1/g(x_1,\dots,x_n)$ whenever $g(x_1,\dots,x_n) \ne 0$. Hence the rational function
resulting by substituting $x_{n+1} = 1/g(x_1,\dots,x_n)$ into the left-hand side of (1)
equals 1 whenever $g \ne 0$.
We multiply both sides of the resulting equality by $g^s$, where $s$ is the highest
power of $x_{n+1}$ appearing in (1). This yields the following equality of polynomial
functions $k^n \to k$:
\[
h'_1f_1 + \dots + h'_mf_m = g^s, \tag{2}
\]
which holds at all points except possibly at the zeros of $g$ (here $h'_1,\dots,h'_m \in
k[x_1,\dots,x_n]$; also note that the term $h_{m+1}(x_{n+1}g - 1)$ vanishes). Using the
fact that every algebraically closed field is infinite (Exercise 6.1) and, for example,
the Schwartz–Zippel theorem, we get that (2) holds as an equality of
polynomials, and this concludes the proof.
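To see the trick at work, here is a small worked instance of our own choosing (it is not taken from the text): $n = 1$, $I = \langle x^2 \rangle$ and $g = x$, which vanishes on $V(I) = \{0\}$. We write $y$ for the new variable $x_{n+1}$.

```latex
% Certificate (1): the system x^2 = 0, xy - 1 = 0 has no solution, and indeed
\[
  \underbrace{y^2}_{h_1}\cdot x^2 \;+\; \underbrace{-(xy+1)}_{h_2}\cdot(xy-1)
  \;=\; x^2y^2 - (x^2y^2 - 1) \;=\; 1 .
\]
% Substituting y = 1/x (for x \neq 0) kills the second term:
\[
  \frac{1}{x^2}\cdot x^2 \;=\; 1 .
\]
% The highest power of y in (1) is s = 2; multiplying by g^s = x^2 gives (2):
\[
  \underbrace{1}_{h'_1}\cdot x^2 \;=\; x^2 \;=\; g^2 ,
  \qquad\text{so } g^2 \in I = \langle x^2\rangle .
\]
```

Here $s = 2$ is also the smallest exponent that works, since $g = x$ itself does not lie in $\langle x^2 \rangle$.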
The strong Nullstellensatz shows that, with $k$ algebraically closed, an algebraic
variety in $k^n$ and a radical ideal in $k[x_1,\dots,x_n]$ are just two ways of
looking at the same object. Such alternative views of mathematical objects are
often very useful.
Several proofs of the Nullstellensatz are known, usually with numerous vari-
ations. Here we essentially follow a particularly simple proof from [Arr06], in
which we meet a classical tool—resultants.
To see how one can naturally arrive at the resultant, let us consider the
system of two polynomial equations f = 0, g = 0. A possible way of showing
that it is unsolvable, i.e., f and g have no common root, is to find polynomials
a, b ∈ k[x] such that the polynomial af + bg is a nonzero constant, say 1.
First we observe that if such $a$ and $b$ exist, we may as well assume $\deg a < \ell$
and $\deg b < k$. This is because if some $af + bg = 1$, then also $a'f + b'g = 1$,
where $a' = a + pg$ and $b' = b - pf$ for any $p \in k[x]$. Hence we can reduce $a$
modulo $g$ to have degree smaller than $\ell$, and then $b$ must have degree smaller
than $k$, for otherwise, we would have $\deg bg \ge k + \ell > \deg af$, and so $af + bg = 1$
would be impossible.
Let us regard the coefficients of $a$ and $b$ as above as unknowns. The requirement
$af + bg = 1$ is an equality of polynomials of degree at most $k + \ell - 1$. By
comparing the coefficients of each of the relevant powers of $x$ on both sides,
we obtain a system of $k + \ell$ linear equations with $k + \ell$ unknowns. The reader
may want to write this system down and see that its matrix looks as follows
(we show it for the special case $k = 5$ and $\ell = 3$, which makes clear what the
general case is):
\[
\begin{pmatrix}
f_0 & 0 & 0 & g_0 & 0 & 0 & 0 & 0\\
f_1 & f_0 & 0 & g_1 & g_0 & 0 & 0 & 0\\
f_2 & f_1 & f_0 & g_2 & g_1 & g_0 & 0 & 0\\
f_3 & f_2 & f_1 & g_3 & g_2 & g_1 & g_0 & 0\\
f_4 & f_3 & f_2 & 0 & g_3 & g_2 & g_1 & g_0\\
f_5 & f_4 & f_3 & 0 & 0 & g_3 & g_2 & g_1\\
0 & f_5 & f_4 & 0 & 0 & 0 & g_3 & g_2\\
0 & 0 & f_5 & 0 & 0 & 0 & 0 & g_3
\end{pmatrix}.
\]
This is called the Sylvester matrix of $f$ and $g$.
The resultant of $f$ and $g$, denoted by $\operatorname{Res}(f,g,x)$, is the determinant of
the Sylvester matrix, which is an element of $k$. From the above discussion it is
clear that if $\operatorname{Res}(f,g,x) \ne 0$, then the considered linear system has a solution,
and so the desired $a$ and $b$ with $af + bg = 1$ exist, witnessing the nonexistence
of a common root of $f$ and $g$.
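For experimenting with small examples, the Sylvester matrix and the resultant can be computed directly. The sketch below assumes polynomials with rational coefficients given as lists with the constant term first, builds the matrix in the column layout used above (first $\ell$ shifted copies of $f$, then $k$ shifted copies of $g$), and evaluates the determinant by exact Gaussian elimination; the function names are ours.

```python
from fractions import Fraction


def sylvester(f, g):
    """Sylvester matrix of f = [f0, ..., fk] and g = [g0, ..., gl], lowest degree first."""
    k, l = len(f) - 1, len(g) - 1
    size = k + l
    M = [[0] * size for _ in range(size)]
    for c in range(l):                       # l columns: shifted copies of f
        for i, coeff in enumerate(f):
            M[c + i][c] = coeff
    for c in range(k):                       # k columns: shifted copies of g
        for j, coeff in enumerate(g):
            M[c + j][l + c] = coeff
    return M


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return result


def resultant(f, g):
    return det(sylvester(f, g))


print(resultant([-1, 0, 1], [-1, 1]))   # x^2 - 1 and x - 1 share the root 1
print(resultant([1, 0, 1], [-1, 1]))    # x^2 + 1 and x - 1 have no common root
```

With this column layout the sign of the determinant may differ from other conventions for the resultant; only the distinction between zero and nonzero matters for the argument in the text.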
Exercise 6.8. (a) Using Euclid’s algorithm, check that if f, g ∈ k[x] have
no nonconstant common factor, then there are polynomials u, v ∈ k[x] with
uf + vg = 1. (The reverse implication is obvious.)
(b) Using (a), prove that for f, g ∈ k[x], where k need not be algebraically
closed, Res(f, g, x) = 0 implies that f and g have a nonconstant common factor.
Resultant over a ring. We will need a slightly more general setting, where
$f, g \in R[x]$ are polynomials over a ring $R$ (commutative with 1 as usual). The
definition above still makes sense and $\operatorname{Res}(f,g,x)$ is an element of $R$. The
next lemma, which we will need later, provides another way of showing that if
$\operatorname{Res}(f,g,x) \ne 0$, then $f$ and $g$ have no common root.
Lemma 6.9. For every $f, g \in R[x]$, $\deg f = k$, $\deg g = \ell$, there are $a, b \in R[x]$
with $\deg a \le \ell - 1$, $\deg b \le k - 1$, and $\operatorname{Res}(f,g,x) = af + bg$.
Proof. Let $f_d$ denote the sum of all terms of degree $d$ in $f$ (this is called
the homogeneous component of $f$ of degree $d$). Then the coefficient of $x_n^d$
in $f'$ equals $f_d(\lambda_1,\dots,\lambda_{n-1},1)$. Since $k$ is infinite, the nonzero polynomial
$f_d(x_1,\dots,x_{n-1},1)$ cannot vanish everywhere on $k^{n-1}$.
We proceed by induction on n, considering the case n = 1 settled (Exer-
cise 6.4). So let n > 1.
By Lemma 6.10 we can make a change of variables so that $I$ contains a
polynomial $g$ of degree $d \ge 1$ with the term $x_n^d$. Since this substitution is
invertible, if we find a common zero for the ideal obtained from $I$ after the
substitution, we can convert it back to a common zero for the original $I$. So we
assume we have $g \in I$ as above.
Let $I'$ be the set of all polynomials in $I$ that do not contain the variable $x_n$
(that is, there is no term with nonzero coefficient and nonzero power of $x_n$). We
can regard $I'$ as a subset of $k[x_1,\dots,x_{n-1}]$; then it is a proper ideal (right?),
and so by the inductive hypothesis, there is $(a_1,\dots,a_{n-1})$, a common zero of
all polynomials in $I'$.
Now we claim that the set
which is obviously an ideal, is not all of k[xn ]. Once we prove this claim, we
will be done, since by the 1-dimensional weak Nullstellensatz all polynomials
in J have a common zero a ∈ k, and then (a1 , . . . , an−1 , a) is a common zero
for I.
To prove the claim, we need to check that $1 \notin J$, so for contradiction, we
assume that there is $f \in I$ such that $f(a_1,\dots,a_{n-1},x) = 1$ (this is an equality
of univariate polynomials). We fix $f$, as well as $g$ as above, i.e., of degree $d$ and
with term $x_n^d$.
Let us consider $f$ and $g$ as polynomials in $x_n$: $f = \sum_{i=0}^{k} f_i x_n^i$, $g = \sum_{j=0}^{d} g_j x_n^j$,
$f_0,\dots,f_k,g_0,\dots,g_d \in R := k[x_1,\dots,x_{n-1}]$.
By Lemma 6.9, the resultant $\operatorname{Res}(f,g,x_n) \in R$ can be written as $af + bg$
with $a, b \in R[x]$, and hence it belongs to $I'$. To finish the proof, we will show
that $\operatorname{Res}(f,g,x_n)$ is nonzero at $(a_1,\dots,a_{n-1})$, and hence it cannot belong to $I'$.
The equality $f(a_1,\dots,a_{n-1},x) = 1$ means that $f_0(a_1,\dots,a_{n-1}) = 1$ and $f_1$
through $f_k$ vanish at $(a_1,\dots,a_{n-1})$. Also, by the choice of $g$, we have $g_d = 1$
(identically). Looking at the Sylvester matrix of $f$ and $g$, again for notational
simplicity in the particular case $\deg f = 5$, $\deg g = 3$, i.e.,
\[
\begin{pmatrix}
f_0 & 0 & 0 & g_0 & 0 & 0 & 0 & 0\\
f_1 & f_0 & 0 & g_1 & g_0 & 0 & 0 & 0\\
f_2 & f_1 & f_0 & g_2 & g_1 & g_0 & 0 & 0\\
f_3 & f_2 & f_1 & g_3 & g_2 & g_1 & g_0 & 0\\
f_4 & f_3 & f_2 & 0 & g_3 & g_2 & g_1 & g_0\\
f_5 & f_4 & f_3 & 0 & 0 & g_3 & g_2 & g_1\\
0 & f_5 & f_4 & 0 & 0 & 0 & g_3 & g_2\\
0 & 0 & f_5 & 0 & 0 & 0 & 0 & g_3
\end{pmatrix},
\]
7 Bézout’s inequality in the plane
One of the questions that often comes up in applications is, given a system
of polynomial equations f1 = 0,. . . , fm = 0, f1 , . . . , fm ∈ k[x1 , . . . , xn ], what
can be said about the existence and number of solutions? In order to avoid
trivialities, we always assume that di := deg fi ≥ 1 for all i.
In general this is not an easy question, and in this section we will consider
the special case with two equations f (x, y) = 0, g(x, y) = 0 in two variables,
which is considerably simpler than the general setting but still interesting. (We
are leaving aside the case of a single equation f = 0, which has already been
treated to some extent, at least implicitly.)
Here is an example of the zero set of two polynomials $f$ and $g$ of degree
5; each of them has been created by passing the zero set through 25 random
points in $[0,1]^2$ using Lemma 4.2.
[Figure: the zero sets $V(f)$ and $V(g)$ plotted in the unit square $[0,1]^2$.]
a degree-k polynomial, and although this is not really a typical polynomial, it
can serve for a quick sanity check of many things.
The following theorem asserts that, assuming no common factor, we cannot
have more solutions than in the example.
The vector space dimension of R, or of I, in itself is usually not a very good
measure of “size,” since it is most often infinite. Certainly, for I = I(X) and
R = k[X], it does not capture the intuitive geometric notion of dimension of
the variety X. The trick is to consider subspaces consisting of polynomials up
to some given degree d.
For the ideal $I$ this can be done in the obvious manner: we let $I_{\le d}$ consist
of all polynomials in $I$ of degree at most $d$. For $R$ this is slightly more tricky,
since two polynomials representing the same element of $R$ may have different
degrees.
We thus define $R_d$ as the quotient vector space $k[x_1,\dots,x_n]_{\le d}/I_{\le d}$, so the
elements of $R_d$ are represented by polynomials of degree at most $d$, with the
same equivalence as that for $R$.
By a well known fact from linear algebra about quotient spaces, we have
\[
\dim R_d + \dim I_{\le d} = \dim k[x_1,\dots,x_n]_{\le d} = \binom{n+d}{n},
\]
the last equality being Fact 4.1.
In particular, $R_d$ and $I_{\le d}$ have finite dimension for every $d$.
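The count $\binom{n+d}{n}$ quoted from Fact 4.1 is the number of monomials of degree at most $d$ in $n$ variables, and it can be confirmed by direct enumeration for small parameters. This is only a sanity check; the helper name `num_monomials` is ours.

```python
from itertools import product
from math import comb


def num_monomials(n, d):
    """Number of monomials x1^a1 * ... * xn^an with a1 + ... + an <= d."""
    return sum(1 for a in product(range(d + 1), repeat=n) if sum(a) <= d)


for n in range(1, 4):
    for d in range(0, 5):
        # The monomials form a basis of k[x1, ..., xn]_{<= d}.
        print(n, d, num_monomials(n, d), comb(n + d, n))
```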
The vector-space dimension of $k[X]_d$, considered as a function of $d$, carries a
lot of information about the variety $X$, and it has a name, again after Hilbert.
Proof of Theorem 7.1. The plan for proving the planar Bézout inequality is now
this:
(i) We check that if $X \subseteq k^2$ is an $m$-point set, then the Hilbert function of
$X$ is at least $m$ for all sufficiently large $d$.
polynomial $p \in K$ has the form $p = af$, and $p$ determines $a$ uniquely. (Here
we use that $k[x,y]$ is a unique factorization domain; Exercise 7.3 below.) Of
course, we also have $\dim L_{\le d} = \dim k[x,y]_{\le d-\ell}$ for $d \ge \ell$.
What we want to bound is $\dim I_{\le d}$, where $I = \langle f, g \rangle$. We have $I = \{af + bg :
a, b \in k[x,y]\} = \{p + q : p \in K, q \in L\}$. The sum of two polynomials of degree
at most $d$ again has degree at most $d$, and hence $I_{\le d} \supseteq K_{\le d} + L_{\le d}$.
Exercise 7.2. Find an example where this inclusion is proper.
Fortunately, since we need to bound dim Rd from above and thus dim I≤d
from below, the inclusion goes in the right direction. By the well-known formula
for the dimension of a sum of vector spaces, we have
(assuming $d \ge k + \ell$).
of them quickly, but hopefully they will look less frightening next time. Reading
this section should give some first impression and basic vocabulary; for serious
work one should study a proper textbook.
Irreducible varieties. The union of the x-axis and y-axis in the plane
is an algebraic variety, namely, V (xy), which can naturally be decomposed
into two proper subvarieties, V (x) and V (y). Varieties that cannot be further
decomposed are called irreducible:
As we have remarked, some sources even reserve the term variety only for
irreducible varieties, and irreducibility is extremely important. We have already
seen a hint of this in Bézout’s inequality, and many other theorems require ir-
reducibility assumptions. For example, it turns out that an irreducible variety
over an algebraically closed field has the same “local dimension” in the neigh-
borhood of each point (we have not yet defined dimension rigorously, but surely
the reader has some intuitive idea), while a reducible variety may be, e.g., the
union of a plane and a line.
Sketch of proof. Finiteness follows from the Hilbert basis theorem: if we could
keep decomposing indefinitely, we would obtain an infinite descending chain
of varieties $X_1 \supsetneq X_2 \supsetneq X_3 \supsetneq \cdots$, whose corresponding ideals would form an
infinite ascending chain, and this is impossible since $k[x_1,\dots,x_n]$ is Noetherian.
As for uniqueness, assuming two minimal decompositions into irreducibles
$X = X_1 \cup \dots \cup X_k = X'_1 \cup \dots \cup X'_\ell$, we observe that if some $X_i$ were not among
the $X'_j$, then the $X_i \cap X'_j$ would properly decompose $X_i$ or vice versa.
We also stress that the task of finding the irreducible decomposition of a
given variety is highly nontrivial in general, although algorithmically solvable.
The Zariski topology. In the language of algebraic geometry, a set $S \subseteq k^n$ is
called Zariski closed or just closed if it is a variety, and it is (Zariski) open
if its complement is a variety. Readers familiar with the notion of topological
space can check that this defines a topology on $k^n$, although a somewhat peculiar
one. Nonempty open sets are very big (assuming an infinite field): they
are dense in $k^n$ and every two intersect. Thus, the topology is not Hausdorff.
Yet it provides a convenient framework and terminology.
Exercise 8.3. Let $X \subseteq k^m$ and $Y \subseteq k^n$ be irreducible varieties. Prove that
the product $X \times Y \subseteq k^{m+n}$ is irreducible as well.
A useful way of proving irreducibility. Let $X \subseteq k^m$ be an irreducible
variety, and let $f : X \to k^n$ be a regular map. Then it is easy to check that the
image $f(X)$ is irreducible, but the statement has to be understood in the right
way.
Indeed, as we will discuss below in more detail, $f(X)$ need not be a variety!
So we generalize irreducibility to an arbitrary set $S \subseteq k^n$, meaning that we
cannot write $S = (S \cap X_1) \cup (S \cap X_2)$, where $X_1, X_2$ are varieties and
$S \cap X_1 \ne S \ne S \cap X_2$. Then we can see that if $f(X)$ were reducible, then so would be $X$,
because the preimage of a variety under a regular map is always a variety, as is
easy to check.
Thus, in particular, if we can express some variety $Y$ parametrically, as the
image of some $k^m$, or of some other irreducible variety $X$, under a polynomial
map $f$, then $Y$ is irreducible. More generally, it suffices that the image $f(X)$ be
Zariski dense in $Y$, meaning that $Y$ is the smallest variety containing $f(X)$.
As an example, let $m$, $n$ and $r \le \min(m,n)$ be natural numbers, and consider
the determinantal variety $D_r(m,n)$ consisting of all $m \times n$ matrices,
considered as points in $k^{mn}$, that have rank strictly smaller than $r$. This is
indeed a variety since the rank condition can be expressed as vanishing of all
$r \times r$ minors. Since an $m \times n$ matrix $A$ has rank at most $r-1$ iff it can be
expressed as a product $UV$, where $U$ is $m \times (r-1)$ and $V$ is $(r-1) \times n$, we have
a surjective regular map $k^{(r-1)(m+n)} \to D_r(m,n)$, and hence the determinantal
variety is irreducible.
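The parametrization $(U,V) \mapsto UV$ is easy to try out numerically. The sketch below works over the rationals with random integer entries (our choice, purely for illustration) and checks that the product of an $m \times (r-1)$ and an $(r-1) \times n$ matrix never has rank $r$ or more; the helpers `rank` and `matmul` are ours.

```python
import random
from fractions import Fraction


def rank(M):
    """Rank of a matrix over the rationals, by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    rk = 0
    for c in range(cols):
        pivot = next((r for r in range(rk, rows) if M[r][c] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for r in range(rk + 1, rows):
            factor = M[r][c] / M[rk][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[rk])]
        rk += 1
        if rk == rows:
            break
    return rk


def matmul(U, V):
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*V)] for row in U]


rng = random.Random(1)
m, n, r = 4, 5, 3
U = [[rng.randint(-9, 9) for _ in range(r - 1)] for _ in range(m)]   # m x (r-1)
V = [[rng.randint(-9, 9) for _ in range(n)] for _ in range(r - 1)]   # (r-1) x n
A = matmul(U, V)                                                      # a point of D_r(m, n)
print(rank(A), "<=", r - 1)
```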
Projections and images of affine varieties: constructible sets. Let us
consider the variety X := V (xy −1), a hyperbola, and project it onto the x-axis:
The projection π(X) is the x-axis minus 0, certainly not an algebraic variety.
Passing to an algebraically closed setting, complex numbers, does not help—the
0 is still missing. So affine algebraic varieties are not closed under projections,
and under regular maps in general.
One remedy is to add points at infinity and work in the projective space—
see Section 8.5 below. Another approach is to consider a larger class consisting
of all sets obtainable from varieties by finitely many set-theoretical operations;
these are called constructible sets. Using the fact that varieties are closed
under intersections and finite unions, it is not difficult to check that every
constructible set can be written as
\[
(X_1 \setminus Y_1) \cup \dots \cup (X_k \setminus Y_k),
\]
for varieties $X_1, Y_1, \dots, X_k, Y_k$, where we may assume the $X_i$ irreducible and
$Y_i \subsetneq X_i$. Then $Y_i$ can be regarded as a set of “exceptional points” in $X_i$; as we
will discuss in Section 8.3, it has smaller dimension than $X_i$.
We state the following result without proof:
Theorem 8.5 (Chevalley’s theorem). Let $k$ be an algebraically closed field, and
let $\pi : k^{m+n} \to k^n$ denote the projection on the last $n$ coordinates. Then $\pi(Z)$
is a constructible set for every constructible set $Z \subseteq k^{m+n}$ and, in particular,
for every variety $Z$.
This is actually a result about quantifier elimination in the theory of alge-
braically closed fields, and a nice proof can be found in [MO02].
Corollary 8.6. The image of a constructible set $Z \subseteq k^m$ under a regular map
$f : k^m \to k^n$ is a constructible set.
Sketch of proof. This is a generally useful trick: one needs to check that the
graph $G := \{(x, f(x)) \in k^m \times k^n : x \in Z\}$ is a constructible set; then $f(Z) =
\pi(G)$ is constructible by Chevalley’s theorem.
8.3 Dimension and degree
The dimension of algebraic varieties is defined algebraically, and it has several
rather different-looking but equivalent definitions. Here we will mention only
some of them, and we will not prove their equivalence.
In this section we will assume an algebraically closed field unless stated
otherwise. Things are considerably subtler over an arbitrary field, and it is
often preferable to work with schemes there, rather than varieties.
Dimension. Here is a definition which is very simple to state, but rather
difficult to work with. The dimension of a variety X is the largest n such
that there is a chain of properly increasing irreducible varieties
∅ ⊊ X0 ⊊ · · · ⊊ Xn ⊆ X. (In particular, the empty variety ∅ has dimension −1.)
The idea is that a proper subvariety of an irreducible variety must be of lower
dimension; note that the same definition works for finite-dimensional vector
spaces. Since, in the algebraically closed case, irreducible varieties correspond
to prime ideals (Exercise 8.1), the dimension is also the length of the longest
chain of properly nested prime ideals containing I(X) (this notion is called the
Krull dimension of the coordinate ring of X).
With this definition, even dim kn = n is not obvious (but it is true).
A geometric view, and degree. Another, more geometric way is to define
the dimension of a variety X ⊆ kn as the largest dimension k of a linear
subspace H ⊂ kn such that there is a projection π : kn → H with π(X) Zariski
dense in H. Here a projection is a linear map π : kn → kn such that π ◦ π = π,
and H = π(kn ).
Another, but equivalent, geometric definition of the dimension considers
only the usual projections on all k-dimensional coordinate subspaces.
It turns out that the property of π(X) being Zariski dense in H = π(kn ) is
generic, in the sense that the set of the π not having this property is negligible:
if we parameterize all projections π onto k-dimensional subspaces by suitable
coordinates, then those with π(X) not Zariski dense in H satisfy a nontrivial
polynomial equation.
This point of view also brings us to the notion of degree. For a projection
π and a point y ∈ H = π(kn ), let us consider the number of preimages |{x ∈
X : π(x) = y}|. It can be shown that for π and y generic, this number is finite
and depends only on X. It is called the degree of X and denoted by deg X.
There is also a “dual” view: if X is a k-dimensional variety in kn , then a
generic (n − k − 1)-dimensional affine subspace of kn avoids X, while a generic
(n − k)-dimensional affine subspace intersects it in deg X points.
Dimension and regular maps. Regular maps do not increase dimension:
if X and Y are varieties (over an algebraically closed field) and f (X) = Y ,
or more generally, if f (X) is Zariski dense in Y for a regular map f , then
dim Y ≤ dim X. Moreover, if we have dim f −1 (y) = m for all y from a Zariski
dense subset of Y , then dim Y = dim X − m. Proofs can be found in many
introductory textbooks.
Generalized Bézout. If X, Y are varieties (over an algebraically closed field),
then deg(X ∩ Y ) ≤ (deg X)(deg Y ), which can be seen as a generalization of
Bézout’s inequality (see Heintz [Hei83]).
The Hilbert function and the Hilbert polynomial. We recall that the
Hilbert function of a variety X is defined as the Hilbert function HFk[X] of
its coordinate ring, and the value HFk[X] (d) is the dimension of the vector
space k[X]d , which consists of polynomials of degree at most d modulo the
polynomials in I(X) of degree at most d.
It turns out that for all sufficiently large d, the Hilbert function coincides
with a polynomial, called the Hilbert polynomial of X. More precisely, for
every quotient ring R = k[x1 , . . . , xn ]/I there exist d0 and a polynomial, de-
noted by HPR and obviously uniquely determined, such that HPR (d) = HFR (d)
for all d ≥ d0 .
This fact, mysterious as it may look, is not difficult. A short algebraic
proof can be found, e.g., in [Sch03, Lemma 2.3.3], and below we will provide a
geometric picture explaining the polynomial behavior.
The Hilbert polynomial provides a seemingly very different definition of
dimension and degree:
Exercise 8.7. Let us fix a graded monomial ordering, let I be an ideal in
k[x1 , . . . , xn ], let I′ := LM(I), and let R := k[x1 , . . . , xn ]/I and
R′ := k[x1 , . . . , xn ]/I′ be the corresponding quotient rings.
(a) Show that I_{≤d} has a basis (f1 , . . . , fm ) such that LM(f1 ) > · · · >
LM(fm ), and derive dim I_{≤d} ≤ dim I′_{≤d} .
(b) Prove that if the fi constitute a basis of I_{≤d} as in (a), then
LM(f1 ), . . . , LM(fm ) generate I′_{≤d} . Conclude that HF_{R′} = HF_R .
(c) Where does the argument use the assumption that the monomial ordering
is graded?
The proof in the exercise also shows that all monomials in I′ = LM(I) are
linearly independent, and that each I′_{≤d} has a basis consisting of monomials.
Let us consider Z^n_{≥0} , all n-tuples of nonnegative integers, and let us color
the exponent vector of every monomial in the monomial ideal I′ black; here is
a picture for n = 2:
[Figure: exponent vectors in the (α1 , α2 )-plane, with the halfspace
α1 + α2 ≤ d indicated.]
Since I′ is an ideal, the black dots are the union of finitely many “corners”, i.e.,
translations of the nonnegative orthant, one corner for each generator. The
generators are marked by double circles.
The number of black dots in the halfspace α1 + · · · + αn ≤ d is the vector-
space dimension of I′_{≤d} (since the corresponding monomials form a basis), and
hence the value of HF_{R′}(d) is the number of white dots in that halfspace
(because HF_{R′}(d) = \binom{n+d}{d} − dim I′_{≤d} ; we do not claim that the
corresponding monomials form a basis).
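The dot-counting description can be tried out directly. The following is a minimal sketch, not part of the original notes: it represents a monomial ideal I′ by the exponent vectors of its generators and counts the white dots in the halfspace α1 + · · · + αn ≤ d by brute force. The function names and the sample ideal ⟨x1^2 x2 , x2^3⟩ are hypothetical choices made only for illustration.

```python
from itertools import product
from math import comb


def hilbert_function(generators, n, d):
    """Count the 'white dots': exponent vectors alpha in Z^n_{>=0} with
    sum(alpha) <= d that are not divisible by any generator of I'."""
    white = 0
    for alpha in product(range(d + 1), repeat=n):
        if sum(alpha) > d:
            continue
        # alpha is black iff it lies in the "corner" of some generator
        black = any(all(a >= g for a, g in zip(alpha, gen)) for gen in generators)
        if not black:
            white += 1
    return white


def ideal_dimension(generators, n, d):
    """Number of black dots, i.e. binom(n+d, d) minus the white dots."""
    return comb(n + d, d) - hilbert_function(generators, n, d)


# Hypothetical example: I' = <x1^2 x2, x2^3> in k[x1, x2]
gens = [(2, 1), (0, 3)]
values = [hilbert_function(gens, 2, d) for d in range(8)]
print(values)
```

For this sample ideal the counts agree with the polynomial d + 5 from d = 3 on, while the value at d = 2 is still "irregular".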
From this interpretation one can see why the Hilbert function eventually
becomes a polynomial: the key observation is that if we ignore a finite num-
ber of “irregular” white dots near the origin, the remaining white dots can be
organized into finitely many disjoint axes-parallel “orthants” of various dimen-
sions (semiinfinite rays, quadrants of planes, octants of 3-dimensional subspaces,
etc.); this is not quite a proof but almost. The following 3-dimensional picture
illustrates how the halfspace ∑ αi ≤ d sweeps the set of white dots, after it has
already passed the irregular part:
Finally, let us see why the growth of the Hilbert polynomial is related to
the geometric dimension of V (I′), at least for a monomial ideal I′.
Some thought reveals that HP_{R′} grows at least linearly iff at least one of
the coordinate axes has no black dots. Assuming, e.g., that all dots on the
α1 -axis are white, this means that every generator in the monomial ideal I′ is
a multiple of one of x2 , . . . , xn , and hence the x1 -axis is contained in V (I′).
Similarly, deg HP_{R′} ≥ 2 iff there is a two-dimensional coordinate plane
without a black point. Assuming it is the α1 α2 -plane, we can see that the x1 x2 -
plane is contained in V (I′), and so on; in general, the degree of the Hilbert
polynomial is the largest dimension of a coordinate subspace contained in V (I′).
(And since I′ is a monomial ideal, V (I′) is the union of coordinate subspaces.)
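The criterion above is easy to mechanize for a monomial ideal given by generators. The sketch below is an illustration under our own conventions (exponent tuples for generators, a hypothetical function name): a coordinate subspace spanned by the axes in a set S lies in V (I′) exactly when every generator involves some variable outside S, and we search for the largest such S.

```python
from itertools import combinations


def monomial_variety_dimension(generators, n):
    """Largest dimension of a coordinate subspace contained in V(I'),
    where I' is generated by the monomials with the given exponent vectors.
    Returns -1 if V(I') is empty (some generator is the constant 1)."""
    for size in range(n, -1, -1):
        for S in combinations(range(n), size):
            # On the subspace spanned by the axes in S, a monomial vanishes
            # iff it contains a variable whose index is outside S.
            if all(any(g[i] > 0 for i in range(n) if i not in S)
                   for g in generators):
                return size
    return -1


# Hypothetical examples
dim_a = monomial_variety_dimension([(2, 1), (0, 3)], 2)   # <x1^2 x2, x2^3>
dim_b = monomial_variety_dimension([(1, 1, 0)], 3)        # <x1 x2> in 3 variables
dim_c = monomial_variety_dimension([(1, 0), (0, 1)], 2)   # <x1, x2>
print(dim_a, dim_b, dim_c)
```

In the first example V (I′) is the x1 -axis, in the second it is the union of two coordinate planes, and in the third it is the origin alone.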
The proofs relating the Hilbert polynomial to the other definitions of di-
mension and degree mentioned earlier are not too difficult, but here we do not
treat them.
does it mean to reduce them “modulo f1 , . . . , fm ”? We would like to write
g = a1 f1 + · · · + am fm + r, for suitable polynomials a1 , . . . , am and r, where r
should be a “remainder” after the division of g by the fi .
A good way of doing this is to fix a monomial ordering ≤, as introduced in
the previous section (but this time it need not be graded), and always try to get
rid of the leading monomial of the current g by subtracting the right multiple
of some fi .
Here is the division algorithm. It receives g as input, and successively
reduces it by subtracting suitable multiples of the fi , while simultaneously
building the remainder r.
1. Set r := 0.
3. At this point none of the LM(fi ) divides LM(g). Subtract the leading
term of g (i.e., LM(g) with the coefficient it has in g) from g and add it
to r. If g = 0, finish, and otherwise, go back to the previous step.
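Here is a minimal sketch of such a division procedure, given for illustration only. Step 2 is not present in this copy of the text, so the sketch assumes the usual rule: as long as some LM(fi ) divides LM(g), subtract from g the multiple of the first such fi that cancels the leading term. Polynomials are stored as dictionaries from exponent tuples to coefficients, and the lexicographic ordering (Python's tuple comparison) is our own choice of monomial ordering.

```python
from fractions import Fraction


def leading_monomial(p):
    # Lexicographic ordering on exponent tuples (an assumed choice)
    return max(p)


def subtract_multiple(g, f, coeff, shift):
    """Return g - coeff * x^shift * f, dropping zero coefficients."""
    result = dict(g)
    for mono, c in f.items():
        m = tuple(a + b for a, b in zip(mono, shift))
        new = result.get(m, Fraction(0)) - coeff * c
        if new == 0:
            result.pop(m, None)
        else:
            result[m] = new
    return result


def divide(g, fs):
    """Return (quotients, r) with g = sum(a_i * f_i) + r."""
    g = {m: Fraction(c) for m, c in g.items() if c != 0}
    fs = [{m: Fraction(c) for m, c in f.items()} for f in fs]
    quotients = [dict() for _ in fs]
    r = {}
    while g:
        lm = leading_monomial(g)
        lc = g[lm]
        for i, f in enumerate(fs):
            lmf = leading_monomial(f)
            if all(a >= b for a, b in zip(lm, lmf)):   # LM(f_i) divides LM(g)
                shift = tuple(a - b for a, b in zip(lm, lmf))
                coeff = lc / f[lmf]
                g = subtract_multiple(g, f, coeff, shift)
                quotients[i][shift] = quotients[i].get(shift, Fraction(0)) + coeff
                break
        else:
            # No LM(f_i) divides LM(g): move the leading term of g to r
            r[lm] = lc
            del g[lm]
    return quotients, r


# Hypothetical input: g = x^2 y + x y^2 + y^2, f1 = x y - 1, f2 = y^2 - 1
g = {(2, 1): 1, (1, 2): 1, (0, 2): 1}
f1 = {(1, 1): 1, (0, 0): -1}
f2 = {(0, 2): 1, (0, 0): -1}
quotients, r = divide(g, [f1, f2])
print(quotients, r)
```

On this input the sketch yields a1 = x + y, a2 = 1, and r = x + y + 1. The result depends in general on the order in which the fi are tried, unless the fi form a Gröbner basis.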
A Gröbner basis f1 , . . . , fm is called reduced if it satisfies a certain natural
minimality condition, namely, the leading monomials of the fi have coefficient
1, and no monomial in any fi is in the ideal generated by the LM(fj ) for j ≠ i.
For a given I and monomial order, it can be shown that a reduced Gröbner
basis is unique.
There are algorithms that, given an arbitrary set of generators of I, compute
a Gröbner basis, usually a reduced one, w.r.t. a given monomial order. This
algorithmic task has been investigated a lot, since it is very significant both
in theory and in practice. In the worst case, the computational complexity, as
well as the size of the resulting Gröbner basis, are at least exponential in n, the
number of variables.
Once a Gröbner basis is available, we can solve the ideal membership prob-
lem by the division algorithm. Many other tasks can be solved as well: comput-
ing the sum, intersection, or quotient of two ideals; computing the dimension,
Hilbert polynomial, and Hilbert function of a given variety; solving a system of
polynomial equations; etc. The worst-case computational complexity of these
problems is again very high, but the existing implementations can sometimes
handle impressively large instances.
A nice mathematical application of these algorithms is for automatic theo-
rem proving: with Gröbner bases and some cleverness one can make a computer
program routinely prove many theorems in high-school geometry or even be-
yond it, for example, the Pappus theorem. The method is sketched in [CLO07].
Here we finish our very brief excursion to algorithms, referring to [CLO07]
for a thorough introduction.
Such an equivalence class can be viewed as a line through the origin in kn+1 .
The (n + 1)-tuple (a0 : · · · : an ) is called the homogeneous coordinates of a;
these are defined only up to a scalar multiple.
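Since homogeneous coordinates are determined only up to a nonzero scalar, comparing two points requires a little care. The following sketch, with hypothetical helper names and rational coordinates chosen purely for illustration, tests whether two coordinate tuples represent the same point and picks one representative by scaling the first nonzero coordinate to 1.

```python
from fractions import Fraction


def same_projective_point(a, b):
    """True iff the nonzero tuples a and b are scalar multiples of each other,
    i.e. all 2x2 minors a[i]*b[j] - a[j]*b[i] vanish."""
    if len(a) != len(b) or not any(a) or not any(b):
        raise ValueError("need nonzero tuples of equal length")
    n = len(a)
    return all(a[i] * b[j] == a[j] * b[i]
               for i in range(n) for j in range(i + 1, n))


def normalize(a):
    """Scale a so that its first nonzero coordinate equals 1."""
    if not any(a):
        raise ValueError("the zero tuple does not represent a point")
    pivot = next(x for x in a if x != 0)
    return tuple(Fraction(x) / pivot for x in a)


print(same_projective_point((1, 2, 3), (2, 4, 6)))
print(normalize((0, 2, 4)))
```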
The following picture illustrates, for the case n = 2, the geometric meaning
of this construction.
[Figure: the labels ℓ1 , ℓ2 , ℓ3 , the origin 0, and the planes x0 = 1 and x0 = 0.]
varieties may have nonisomorphic projective completions). The meaning of
I(X) for X ⊆ Pn is also modified appropriately.
Many of the concepts and results from the affine setting transfer to pro-
jective varieties without change (irreducible decomposition, Zariski open and
closed sets) or with only minor modifications.
For the weak Nullstellensatz, V (I) = ∅ not only for I = ⟨1⟩, but also
when the radical of I is ⟨x0 , x1 , . . . , xn ⟩. This irrelevant ideal also has to be
excluded in the strong Nullstellensatz; after that, over an algebraically closed
field, we have a bijective correspondence between homogeneous radical ideals
and projective varieties.
A morphism f : X → Y of projective varieties X ⊆ Pm and Y ⊆ Pn
needs to be defined locally: for each x0 ∈ X ⊆ Pm there is a Zariski open
neighborhood U and homogeneous polynomials f0 , . . . , fn ∈ k[x0 , . . . , xm ] of
the same degree such that f (x) = (f0 (x) : · · · : fn (x)) for all x ∈ U (and in
particular, at least one fi (x) must be nonzero for each x).
As for the Hilbert function, in the projective case one needs to take the
dimension of k[x0 , . . . , xn ]/I=d , where I=d is the vector subspace spanned by
homogeneous polynomials of degree exactly d in the homogeneous ideal I.
Cutting with a polynomial. If X is a k-dimensional projective variety
over an algebraically closed field and f is a polynomial, then k − 1 ≤ dim(X ∩
V (f )) ≤ k. If, moreover, X is irreducible and f does not vanish on it, then
dim(X ∩ V (f )) = k − 1.
Exercise 8.8. Show that this fails for affine varieties; dim(X ∩ V (f )) can be
smaller than dim X − 1.
Let us prove at least something in this long section.
Here, crucially, since all the fi,a are homogeneous of degree k, we may
assume that the hi are homogeneous of degree s − k, because monomials of any
other degree can be discarded from them without changing the validity of (3).
Therefore, for every g, (3) can be rewritten as a system of linear equations
for the unknown coefficients of the hi . The matrix of this system, call it A, does
not depend on g, and its entries are homogeneous polynomials in a0 , . . . , an , the
homogeneous coordinates of a. The number of equations is t, the number of
monomials of degree s in n + 1 variables; it equals \binom{s+n}{n}, but we do
not need that.
The solvability of (3) for every g means that the linear system is solvable
for every right-hand side, which means exactly that A has rank t. Hence the
negation of this condition can be expressed as vanishing of all the t × t minors
of A.
Let Ys be the set of all a ∈ P^n such that the matrix A as above has rank
less than t. Each Ys is a variety, and we have π(Z) = ⋂_{s=0}^{∞} Ys . Therefore,
π(Z) is a projective variety as claimed.
Warning example. Unlike in the planar case, over an arbitrary field, having
finitely many solutions does not guarantee that the bound d1 d2 · · · dn for their
number is correct.
Indeed, the system of three equations
(x − 1)^2 (x − 2)^2 · · · (x − k)^2 + (y − 1)^2 (y − 2)^2 · · · (y − k)^2 = 0, z = 0, z = 0
has k^2 solutions in R^3 , but the degrees are 2k, 1, 1. We note that the solution
set in C^3 is infinite.
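A quick numerical check of this example, offered as an illustration only: over R, a sum of two squares vanishes only if both squares vanish, so the real solutions are exactly the points (i, j, 0) with i, j ∈ {1, . . . , k}. The sketch below confirms this on an integer grid (the function names are ours) and compares the count with the product of the degrees.

```python
def first_equation(x, y, k):
    """Left-hand side of the first equation of the system."""
    px, py = 1, 1
    for i in range(1, k + 1):
        px *= (x - i) ** 2
        py *= (y - i) ** 2
    return px + py


def grid_solutions(k):
    """Solutions with z = 0 among integer points with 0 <= x, y <= k + 1.
    This only scans a grid; that there are no other real solutions follows
    from the sum-of-squares argument, not from the code."""
    pts = range(0, k + 2)
    return [(x, y, 0) for x in pts for y in pts if first_equation(x, y, k) == 0]


k = 3
solutions = grid_solutions(k)
degree_product = (2 * k) * 1 * 1
print(len(solutions), degree_product)
```

For k = 3 there are 9 solutions, exceeding the product 6 of the degrees.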
Another example. In the previous example, the first equation has only
1-dimensional solution set over R, while the two remaining equations are iden-
tical. However, over C the solution set of the first equation is 2-dimensional,
and so at least over algebraically closed fields, one might hope to exclude this
kind of pathology by imposing a suitable condition on the fi . Indeed, drawing
inspiration from the planar case, a natural guess for such a condition can be
that no two of the fi have a common factor.
However, things are not that simple, and the suggested condition is definitely
not the right one. Here is a highly instructive example for n = 3:
f1 = x^3 − yz, f2 = y^2 − xz, f3 = z^2 − x^2 y.
These are irreducible polynomials, as is easy to check, none a multiple of an-
other. But V (f1 , f2 , f3 ) contains the curve C with parametric expression
C = {(t^3 , t^4 , t^5 ) : t ∈ C},
and so surely it is not finite.
This example is also interesting in another respect. In linear algebra, every
k-dimensional vector subspace of kn can be described by n − k linear equations;
for example, a line in R3 is always the intersection of two planes. In contrast,
the curve C cannot be defined by two polynomial equations: It is easy to check
that the common zero set of every two of the fi contains points not belonging to
the zero set of the third; e.g., V (f1 , f2 ) contains the z-axis, where f3 is
nonzero except at the origin.
With more effort, one can show that no two polynomials suffice; this is done
algebraically, by checking that the ideal ⟨f1 , f2 , f3 ⟩ cannot be generated by two
polynomials.
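The easy checks mentioned here can be carried out mechanically. The sketch below is a plain sanity check, not a proof of the harder claim about two generators: it substitutes (t^3 , t^4 , t^5 ) into the three polynomials and tests points on the coordinate axes. Since each substituted polynomial has degree at most 10 in t, vanishing at 11 distinct values of t already shows that it vanishes identically.

```python
def f1(x, y, z):
    return x**3 - y * z


def f2(x, y, z):
    return y**2 - x * z


def f3(x, y, z):
    return z**2 - x**2 * y


# The curve C = {(t^3, t^4, t^5)} lies in V(f1, f2, f3): 11 sample values of t
on_curve = all(f(t**3, t**4, t**5) == 0
               for t in range(-5, 6) for f in (f1, f2, f3))

# Each pair of the f_i vanishes on a coordinate axis where the third does not
samples = range(1, 6)
z_axis_ok = all(f1(0, 0, s) == 0 and f2(0, 0, s) == 0 and f3(0, 0, s) != 0
                for s in samples)
x_axis_ok = all(f2(s, 0, 0) == 0 and f3(s, 0, 0) == 0 and f1(s, 0, 0) != 0
                for s in samples)
y_axis_ok = all(f1(0, s, 0) == 0 and f3(0, s, 0) == 0 and f2(0, s, 0) != 0
                for s in samples)
print(on_curve, z_axis_ok, x_axis_ok, y_axis_ok)
```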
Let us remark that things cannot get completely out of hand with this
kind of example: it is known that every irreducible affine variety in kn , k
algebraically closed, can be given as the zero set of at most n + 1 polynomials
[Hei83, Prop. 3].
Bézout’s inequality assuming finitely many zeros. It seems that there is
no particularly useful general condition for V (f1 , . . . , fn ) to be finite, although
there are algorithms that can decide this question for any given f1 , . . . , fn —but
these are nontrivial and quite demanding computationally.
One way around this is to assume V (f1 , . . . , fn ) finite. Then, for k alge-
braically closed, the expected inequality for the number of zeros does hold.
Theorem 9.1 (Higher-dimensional Bézout’s inequality I). Let k be algebraically
closed, and let f1 , . . . , fn ∈ k[x1 , . . . , xn ] be polynomials of degrees d1 , . . . , dn ≥
1. Assuming that V (f1 , . . . , fn ) ⊂ kn is finite, it has at most d1 d2 · · · dn points.
Actually, one can say a bit more: even if V (f1 , . . . , fn ) contains irreducible
components of positive dimension, the number of one-point irreducible compo-
nents is still at most d1 d2 · · · dn .
We will not prove Theorem 9.1 here. A reasonably accessible algebraic proof
can be found in [Tao12, Sec. 8.4].
Bounding the number of nonsingular zeros. The above formulation of
Bézout’s inequality leaves something to be desired, since, as we have mentioned,
verifying the assumption |V (f1 , . . . , fn )| < ∞ is not easy in general (although
there are various sufficient conditions known; see, e.g., [CLO05, Chap. 3,4] and
[Sch95]).
Another formulation, which is often useful for applications, is to consider
only a suitable kind of “nice” zeros, namely, only those where the hypersurfaces
Xi := V (fi ) intersect transversally.
We will work only over the field R, where one can rely on intuition and
methods from analysis. However, with an appropriate generalization of notions
like gradient, results can also be obtained for other fields—see [CKW11, Sec. 5].
Let X1 , . . . , Xn ⊆ Rn be the hypersurfaces as above and let a be a point
where they all intersect. Transversality means that if we make a tangent hyper-
plane hi to each Xi at a, then these n hyperplanes intersect only in a—they look
like the coordinate hyperplanes, after a suitable affine transformation (this in-
cludes the assumption that each Xi is (n−1)-dimensional in some neighborhood
of a).
We recall that if f : Rn → R is a differentiable function, then the gradient
∇f at a point a is the “fastest ascent” direction for f . Assuming f (a) = 0,
∇f (a) is perpendicular to the zero set of f , and thus it is a normal vector of
the tangent hyperplane of the zero set, assuming ∇f (a) ≠ 0. (Rigorously this
can be derived from the implicit function theorem.)
The transversality of our X1 , . . . , Xn at a thus corresponds to linear in-
dependence of the n gradients ∇f1 (a), . . . , ∇fn (a), or in other words, to the
Jacobian determinant
\[
J_{f_1,\ldots,f_n}(a) := \det
\begin{pmatrix}
\frac{\partial f_1}{\partial x_1}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\
\vdots & \ddots & \vdots \\
\frac{\partial f_n}{\partial x_1}(a) & \cdots & \frac{\partial f_n}{\partial x_n}(a)
\end{pmatrix}
\]
being nonzero. (Apologies to the readers for whom the geometric meaning of
the Jacobian is well known and boring.)
A point a ∈ V (f1 , . . . , fn ) with J_{f1 ,...,fn }(a) ≠ 0 is called a nonsingular
zero.
Theorem 9.2 (Higher-dimensional Bézout’s inequality II). Let f1 , . . . , fn ∈
R[x1 , . . . , xn ]. Then the polynomial system f1 = 0,. . . , fn = 0 has at most
d1 d2 · · · dn nonsingular zeros in Rn , where di = deg fi .
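To make the notion of a nonsingular zero concrete, here is a small planar illustration with a circle and a horizontal line; the specific polynomials and the helper name are our own choices. A secant line meets the circle in two nonsingular zeros, matching the bound d1 d2 = 2, while a tangent line gives a single zero at which the Jacobian determinant vanishes.

```python
def jacobian_det(grad1, grad2, a):
    """2x2 Jacobian determinant built from the gradients of f1 and f2 at a."""
    (a11, a12), (a21, a22) = grad1(*a), grad2(*a)
    return a11 * a22 - a12 * a21


# f1 = x^2 + y^2 - 25 (degree 2), f2 = y - c (degree 1); gradients by hand
def f1(x, y):
    return x * x + y * y - 25


def grad_f1(x, y):
    return (2 * x, 2 * y)


def grad_f2(x, y):
    return (0, 1)


# Secant line y = 4: common zeros (3, 4) and (-3, 4)
secant_zeros = [(3, 4), (-3, 4)]
secant_dets = [jacobian_det(grad_f1, grad_f2, a) for a in secant_zeros]

# Tangent line y = 5: the only common zero is (0, 5)
tangent_det = jacobian_det(grad_f1, grad_f2, (0, 5))
print(secant_dets, tangent_det)
```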
going via a complex version of the theorem, see [BPR03, Sec. 4.7].
So let a1 , . . . , aN ∈ Rn be nonsingular common zeros of f1 , . . . , fn ; we want
to show that N ≤ D = d1 d2 · · · dn .
First we fix a linear polynomial π ∈ R[x1 , . . . , xn ] such that the π(ai ) are all
distinct; we can think of this as choosing a projection on a suitable line. Armed
with the knowledge from the previous sections, the reader will surely supply a
rigorous proof of existence of a suitable π.
The general idea of the proof is to produce a nonzero univariate polynomial
of degree at most D for which all the π(ai ) are roots.
To this end, we would like to have a polynomial h ∈ R[y1 , . . . , yn , z] satisfy-
ing the following conditions:
It follows that if the original system has at least N nonsingular zeros, so
does the perturbed system for δ sufficiently small. Moreover, again for δ small
enough, these N zeros of the perturbed system still yield N distinct values
of the projection π. So if h satisfies (C1) and (C2), then for every δ ∈ Rn
sufficiently small, h(δ1 , . . . , δn , z) vanishes for at least N distinct values of z.
At the same time, since V (h) has zero Lebesgue measure (or, alternatively,
by the Schwartz–Zippel theorem), there are values δ̄1 , . . . , δ̄n ∈ (−δ, δ) and
z̄ ∈ R with h(δ̄1 , . . . , δ̄n , z̄) ≠ 0. It follows that h(δ̄1 , . . . , δ̄n , z) is a nonzero
polynomial in z, of degree at most D by (C2), and hence N ≤ D as claimed. It
remains to prove the lemma.
d1 α1 + · · · + dn αn + αn+1 .
Let us call this expression the weight w(α), and set w(A) := maxα∈A w(α).
Thus, if we fix A, the degree of h̃, the polynomial after the substitution, is
at most w(A). Moreover, the coefficients of h̃ are linear functions of the cα .
We want to force zero coefficient for every monomial that could possibly
appear in h̃; each such requirement yields a linear equation for the cα . Since
deg h̃ ≤ w(A), we thus obtain \binom{w(A)+n}{n} homogeneous linear equations
for |A| unknowns.
Hence the lemma will be proved as soon as we find A such that
|A| > \binom{w(A)+n}{n} and α_{n+1} ≤ D for all α ∈ A.
For an integer W , let
Let r(β) = (β1 mod d1 , . . . , βn mod dn ), and let us partition B into equiva-
lence classes according to the value of r(β); there are d1 d2 · · · dn = D classes.
It is easy to see that the class with r(β) = (0, . . . , 0) is at least as large as any
other class, and so
\[
N(T) \ge \frac{1}{D}\binom{T+n}{n}.
\]
Consequently,
\begin{align*}
|A(W)| &\ge (D+1)\,N(W-D) \ge \frac{D+1}{D}\binom{W-D+n}{n} \\
&= \frac{D+1}{D}\binom{W+n}{n}\cdot
   \frac{(W-D+n)\cdots(W-D+1)}{(W+n)\cdots(W+1)} \\
&\ge \binom{W+n}{n}\,\frac{D+1}{D}\left(1-\frac{D}{W+1}\right)^{n}.
\end{align*}
For D fixed and W → ∞, we have (1 − D/(W+1))^n → 1, while (D+1)/D remains
bounded away from 1. Hence |A(W)| > \binom{W+n}{n} for W sufficiently large as
desired. The lemma, as well as Bézout’s inequality for nonsingular zeros, are
proved.
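The last step of the estimate can be checked numerically. The sketch below does not compute |A(W)| itself (its definition is not reproduced here); it only evaluates the factor ((D+1)/D)(1 − D/(W+1))^n from the final lower bound, in exact rational arithmetic, and searches for the first W at which this factor exceeds 1. The function names are hypothetical.

```python
from fractions import Fraction


def lower_bound_factor(D, n, W):
    """The factor multiplying binom(W+n, n) in the last line of the estimate."""
    return Fraction(D + 1, D) * (1 - Fraction(D, W + 1)) ** n


def smallest_sufficient_W(D, n):
    """Smallest W >= D for which the factor exceeds 1. The factor is
    increasing in W for W >= D, so the search terminates."""
    W = D
    while lower_bound_factor(D, n, W) <= 1:
        W += 1
    return W


W0 = smallest_sufficient_W(2, 2)
print(W0, lower_bound_factor(2, 2, W0))
```

For D = 2 and n = 2 the factor first exceeds 1 at W = 10, where it equals 243/242. Since this is only a lower bound, the actual |A(W)| may exceed \binom{W+n}{n} for smaller W as well.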
The degree is d = mn, and the zero set, a grid of hyperplanes, partitions R^n
into (m + 1)^n ∼ (d/n)^n components (axis-parallel boxes).
Theorem 10.1. For a polynomial f ∈ R[x1 , . . . , xn ] of degree d ≥ 2, R^n \ V (f )
has at most (d + 1)^n components.
The proof below is similar to one in [ST12, Appendix A]. This kind of
argument goes back to Oleinik and Petrovskiı̌ [OP49, Ole51], Milnor [Mil64],
and Thom [Tho65].
In the proof, we will need the following result.
Fact 10.2. Let f : Rn → Rn be a polynomial map (that is, a map for which
each coordinate fi : Rn → R is given by a polynomial; in algebraic geometry,
one usually speaks of regular maps in this context), and let X ⊂ Rn be a
proper algebraic variety (that is, X is contained in the zero set of a nonzero
polynomial). Then the image f (X) does not fill any open ball in Rn .
This result may look obvious, but obvious approaches to proofs have their
caveats.
First, we know that X is “small”; e.g., it does not fill any open ball. But,
for example, the image of a segment under a continuous map may be a unit
square, as is witnessed by the famous Peano curve. So we have to use other
properties of f besides continuity.
Approaching from the side of mathematical analysis, we can use the fact
(which we do not prove here) that the image of a Lebesgue null set under a
smooth map is Lebesgue null, plus Exercise 2.3. In our case, a polynomial map
is not only smooth (infinitely differentiable), but also locally Lipschitz, which
allows for a quite straightforward proof.
A more algebraic approach to Fact 10.2 would be to prove that the image
of a proper subvariety in Rn under a polynomial map is a proper subvariety of
Rn . Unfortunately, this is not literally true, as can be seen by modifying the
hyperbola example from Section 8.2. What can be shown is that such an image
is contained in a proper subvariety of Rn , which is enough for our purposes.
This is not too hard, given the tools covered so far, and it is a special case of a
result stating that a regular map cannot increase dimension, but here we will
not go through the argument.
The condition for nonsingularity of a common zero a of the ∂f/∂xi reads
det Hf (a) ≠ 0, where Hf is the Hessian matrix of f , with
\[
(H_f)_{ij} := \frac{\partial^2 f}{\partial x_i \, \partial x_j}.
\]
First, there is a quantitative improvement, which becomes significant if the
degree d and the dimension n are comparable: the true bound is more like
(d/n)^n (which is the lower bound we got from the simple example) than d^n .
Second, the bound can be extended to the complement of the union of
several zero sets, i.e., Rn \ (V (f1 ) ∪ · · · ∪ V (fm )). In this case a reasonably good
bound can be obtained just by setting f = f1 f2 · · · fm and using the bound for
a single polynomial.
Third, instead of considering just the complement, which is the set where all
of f1 , . . . , fm are nonzero, we can consider sets where some of the fi are required
to be 0, some others positive, and some negative. These three improvements
are all reflected in the next theorem.
Then, for m ≥ n ≥ 2,
\[
\sum_{\sigma \in \{-1,0,+1\}^m} \# S_\sigma \le \left(\frac{50dm}{n}\right)^{n},
\]
The basic ideas of the proof are similar to those in the proof of Theorem 10.1
shown above, but the details are considerably more involved.
In the literature, such results are often stated as bounding the total topo-
logical complexity of the considered sets, more precisely, the sum of the Betti
numbers, instead of just the number of connected components. For still other
strengthenings of the just stated theorem, such as a more refined dependence on
the degrees of the fi , as well as replacing the ground set Rn with a k-dimensional
algebraic variety in Rn , see [Bar13] and references therein.
Bounds on the radius of components and inscribed balls. Another way
of measuring zero sets of polynomials in Rn is, for example, by the radius of the
smallest ball intersecting all connected components. Here, of course, we need
to make some assumptions on the coefficients of the polynomials; typically we
assume them to be integers not exceeding some given bound. Here is a general
result of this kind:
Theorem 10.5. Let f1 , . . . , fm ∈ Z[x1 , . . . , xn ] be polynomials of maximum
degree d whose coefficients are integers bounded by M in absolute value. For
σ ∈ {−1, 0, 1}^m , let Sσ := {x ∈ R^n : sgn fi (x) = σi for all i = 1, 2, . . . , m}.
Then each connected component of Sσ intersects the ball of radius
R = M^{(d+1)^{Cn}} centered at 0, where C is a suitable absolute constant. The
bounded connected components of Sσ are all contained in that ball.
If σi ≠ 0 for all i, or in other words, Sσ is defined only by strict inequalities,
and if Sσ is nonempty, then it contains a rational point with coordinates whose
numerators and denominators are integers not exceeding R in absolute value.
This kind of result goes back to [GV88, Lemma 9] (which deals with more
special sets, namely, the zero set of a single polynomial), and the result as above
about a ball intersecting all connected components is [BPR96, Theorem 4.1.1]
(also see [BPR03, Theorem 13.14]). A statement directly implying the part
with the ball containing all bounded components is [BV07, Theorem 6.2]. For
the part with a rational point see [BPR03, Theorem 13.15].
On applications. Theorem 10.4 and its relatives have probably hundreds of
applications in geometry, combinatorics, computer science and elsewhere. An
old but still very beautiful one is Ben-Or’s lower bound method for algorithms
described as algebraic computation trees [BO83].
Here is a quick application from [AFR85] which uses the more precise bound
in Theorem 10.4. Let the sign pattern of an n×n matrix A be the matrix S with
sij = sgn aij . We claim that there are n×n matrices S with only ±1 entries such
that every A with sign pattern S has rank at least cn, for a positive constant c.
On the one hand, there are 2^{n^2} possible S’s. On the other hand, an A of
rank at most r can be written as U V^T , where U and V are n × r matrices. We
consider the 2nr entries of U and V as variables; then the signs of the entries
of A are signs of quadratic polynomials in these variables. We have m = n^2
polynomials and thus, by Theorem 10.4, there are no more than
O(n^2 /(nr))^{2nr} possible sign patterns of a rank-r matrix A. For r < cn and c
small, this quantity is smaller than 2^{n^2} , and so some patterns force rank at
least cn.
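The comparison of the two counts can be made explicit. The sketch below assumes the bound of Theorem 10.4 in the form (50dm/N)^N for m polynomials of degree at most d in N variables, with m ≥ N ≥ 2; here d = 2, m = n^2 , and N = 2nr. It compares base-2 logarithms to avoid huge numbers. The function names and the sample values of n and r are ours.

```python
from math import log2


def log2_pattern_bound(n, r, d=2):
    """log2 of (50*d*m/N)^N with m = n^2 polynomials and N = 2nr variables
    (assumes n >= 2r, so that m >= N, as required by the assumed bound)."""
    m = n * n
    num_vars = 2 * n * r
    return num_vars * log2(50 * d * m / num_vars)


def some_pattern_forces_larger_rank(n, r):
    """True if the bound on sign patterns of matrices of rank at most r is
    smaller than 2^(n^2), the number of +-1 sign patterns."""
    return log2_pattern_bound(n, r) < n * n


print(some_pattern_forces_larger_rank(1000, 40))
print(some_pattern_forces_larger_rank(1000, 100))
```

With r = cn the condition reads 2c log2(50/c) < 1, which holds for c = 0.04 but fails for c = 0.1; in the latter case the counting argument simply gives no conclusion.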
11 Literature
Textbooks and lecture notes for such a classical subject as algebraic geometry
abound, of course, but not all of them are equally accessible to beginners.
The usual hands-on introduction, with emphasis on computational aspects,
is Cox, Little, and O’Shea [CLO07]. Schenck’s book [Sch03] is very clear, read-
able, and concise; another advantage is that it also treats many related concepts
from algebra and topology. A very good set of lecture notes freely accessible on
the web, including some of the more advanced concepts, such as sheaves and
schemes, is Gathmann [Gat13].
For intersection theory, dealing with generalizations of Bézout’s theorem
and other counting questions for varieties, a remarkable little book is Katz
[Kat06], and an older concise introduction is Fulton [Ful84].
For combinatorial, geometric, and computer science applications of polyno-
mials, we can recommend, for example, Chen, Kayal, and Wigderson [CKW11].
Recent treatments of methods similar to the one used in the joints problem are
Guth [Gut13] and Tao [Tao13].
Acknowledgment. I would like to thank Boris Bukh, Vincent Kusters, Zuzana
Safernová, Adam Sheffer, and Noam Solomon for valuable comments, sugges-
tions, and corrections to earlier versions of this document.
References
[AFR85] N. Alon, P. Frankl, and V. Rödl. Geometrical realization of set
systems and probabilistic communication complexity. In Proc. 26th
IEEE Symposium on Foundations of Computer Science, pages 277–
280, 1985.
[BPR96] S. Basu, R. Pollack, and M.-F. Roy. On the combinatorial and alge-
braic complexity of quantifier elimination. J. ACM, 43(6):1002–1045,
1996.
[GV88] D. Yu. Grigor’ev and N. N. Vorobjov jun. Solving systems of poly-
nomial inequalities in subexponential time. J. Symb. Comput., 5(1-
2):37–64, 1988.
[Wal79] M. Waldschmidt. Transcendence methods. Queen’s University, 1979.
Available at http://www.math.jussieu.fr/~miw/articles/pdf/
QueensPaper52.pdf.