Polynomials

A chapter for the Mathematics++ Lecture Notes


Jiří Matoušek

Rev. 19/V/14 JM

Here we discuss polynomials in several variables. They belong among the
most powerful and most often applied mathematical tools in computer science,
and sometimes their use works like a magic wand.
The set of all solutions of a system of m polynomial equations in n variables
is called an algebraic variety, and it is studied in algebraic geometry, one of the
most classical and deepest areas of mathematics. Here we will make the first
few steps in this fascinating field.

1 Rings, fields, and polynomials


A ring R is an algebraic structure with addition and multiplication; the readers
unsure about the definition might want to check it. Here we will consider only
commutative rings (commutativity concerns multiplication, since addition in
a ring is always commutative) with unit element 1. Actually, unlike in usual
introductory courses of algebra, we will see a large menagerie of rings.
A field is a ring in which we also have division (by each nonzero element,
that is). We will most often consider the fields R, the reals, and C, the complex
numbers, sometimes also a finite field Fq with q elements, where q, as we recall,
must be a prime power, and the rationals Q. An arbitrary field will usually
be denoted by k, partially in agreement with a typical convention in algebraic
geometry where they use k.
Everyone knows univariate polynomials such as $37x^5 - 2x^4 + 12$. The set of
all polynomials in a variable $x$ with coefficients in a ring $R$ is denoted by $R[x]$.
It also forms a ring, with the usual addition and multiplication of polynomials.
We will more often consider multivariate polynomials, such as $13x^5 y^3 z -
6x^2 y^4 z^2 + y^2 - 2z$. We write $R[x_1, \dots, x_n]$ for the ring of all polynomials in
variables $x_1, \dots, x_n$ with coefficients in $R$. A polynomial $f \in R[x_1, \dots, x_n]$ is
a finite sum of terms of the form $c_\alpha x^\alpha$, where $\alpha = (\alpha_1, \dots, \alpha_n) \in \mathbb{Z}_{\ge 0}^n$ is a
vector of nonnegative integers, $c_\alpha \in R$ is a coefficient, and $x^\alpha$ is a convenient
shorthand for the monomial $x_1^{\alpha_1} \cdots x_n^{\alpha_n}$. The degree of such a monomial is
$\alpha_1 + \cdots + \alpha_n$. The degree of $f$, written $\deg f$, is the maximum of the degrees
of its monomials.
The degree of the zero polynomial, which has no monomials, is usually taken
as −∞.

Each polynomial $f \in R[x_1, \dots, x_n]$ defines a function $R^n \to R$ in an obvious
way. Usually we will use the same letter for the polynomial and for the function.
Exercise 1.1. Prove (before reading further, and carefully!) that if the function
defined by a polynomial $f \in \mathbb{R}[x, y]$ is zero everywhere on $\mathbb{R}^2$, then $f$ is the zero
polynomial; that is, all coefficients are 0. Similarly for $k[x_1, \dots, x_n]$ where $k$
is an infinite field. (Note that if $F$ is a finite field, then $\prod_{a \in F} (x - a)$ is a nonzero
polynomial defining the zero function $F \to F$.)
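
The remark about finite fields can be checked by machine for a small prime. The following is a minimal sketch, assuming the finite field is the prime field of integers modulo p; the function names are ours and purely illustrative. It expands the product of the factors (x − a) into coefficients and evaluates it at every field element.

```python
def product_coefficients(p):
    """Coefficients of prod_{a in F_p} (x - a) modulo the prime p.

    The list entry at index i is the coefficient of x^i.
    """
    coeffs = [1]
    for a in range(p):
        # Multiply the current polynomial by (x - a).
        new = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i + 1] = (new[i + 1] + c) % p
            new[i] = (new[i] - a * c) % p
        coeffs = new
    return coeffs


def vanishes_everywhere(coeffs, p):
    """True if the polynomial with these coefficients is 0 at every point of F_p."""
    return all(
        sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p == 0
        for x in range(p)
    )


coeffs = product_coefficients(5)
print(coeffs)                          # some coefficients are nonzero
print(vanishes_everywhere(coeffs, 5))  # yet the function is identically zero
```

So the polynomial has nonzero coefficients although the function it defines is zero, which is why Exercise 1.1 needs an infinite field.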

Rigidity of polynomials. Polynomials constitute one of the most significant
classes of functions, and they have various amazing properties. For computer
science, one of the most important properties is some kind of rigidity, which
can be vaguely expressed as “if two polynomials differ, then they differ a lot.”
Here is a well-known manifestation of rigidity for univariate polynomials.

(Univariate rigidity) A nonzero univariate polynomial $f \in k[x]$ of degree
$d \ge 0$, where $k$ is a field, has at most $d$ roots. Consequently, if $f, g \in k[x]$,
$\deg(f), \deg(g) \le d$, and $f(a) = g(a)$ for at least $d + 1$ distinct points $a$,
then $f = g$.

We recall that this is proved by induction on $d$, by checking that if $f(a) = 0$,
then $f$ is divisible by $x - a$.

2 The Schwartz–Zippel theorem


This is a manifestation of rigidity in the multivariate case, one which is quite
simple to prove and extremely useful, e.g., for randomized algorithms.

Theorem 2.1 (The Schwartz–Zippel theorem). Let $k$ be a field, let $d$
be a natural number, and let $S$ be a finite subset of $k$. Then for every
nonzero polynomial $f \in k[x_1, \dots, x_n]$ of degree $d$, the number of $n$-tuples
$(r_1, r_2, \dots, r_n) \in S^n$ with $f(r_1, \dots, r_n) = 0$ is at most $d|S|^{n-1}$. In other
words, if $r_1, \dots, r_n \in S$ are chosen independently and uniformly at random,
then the probability of $f(r_1, \dots, r_n) = 0$ is at most $d/|S|$.

Here we measure the size of the zero set of $f$ discretely, by counting the
points of its intersection with the “combinatorial cube” $S^n$. If $k = \mathbb{F}_q$, we can
often simply take $S = \mathbb{F}_q$.
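
To get a feeling for the bound, one can count zeros by brute force in a tiny case. The sketch below is only an illustration under our own assumptions: the field is the prime field with 7 elements, S is the whole field, and the polynomial f is an arbitrary example of degree 3 that we picked ourselves.

```python
from itertools import product


def count_zeros(f, S, n, p):
    """Count the n-tuples r in S^n with f(r) = 0 in the prime field F_p."""
    return sum(1 for r in product(S, repeat=n) if f(*r) % p == 0)


# Hypothetical example: a nonzero polynomial of degree d = 3 in n = 3 variables.
p, n, d = 7, 3, 3
S = list(range(p))  # here S is all of F_7


def f(x, y, z):
    return x * y * z + x * x - 3 * y + 1


zeros = count_zeros(f, S, n, p)
bound = d * len(S) ** (n - 1)  # the Schwartz–Zippel bound d * |S|^(n-1)
print(zeros, "<=", bound)
```

The brute-force count takes $|S|^n$ evaluations, so this is only feasible for toy sizes; the point of the theorem is that random sampling makes such enumeration unnecessary.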

Proof of the Schwartz–Zippel theorem. We proceed by induction on $n$. The $n = 1$
case was mentioned earlier, so let $n > 1$.
Let us suppose that $x_1$ occurs in at least one term of $f$ with a nonzero
coefficient (if not, we rename the variables). Let us write $f$ as a polynomial in
$x_1$ with coefficients being polynomials in $x_2, \dots, x_n$:
\[ f = \sum_{i=0}^{k} f_i x_1^i, \qquad f_0, \dots, f_k \in k[x_2, \dots, x_n], \]
where the index $k$ is the maximum exponent of $x_1$ in $f$ (an integer, not to be
confused with the field $k$).
We divide the $n$-tuples $(r_1, \dots, r_n)$ with $f(r_1, \dots, r_n) = 0$ into two classes.
The first class, called $V_1$, consists of the $n$-tuples with $f_k(r_2, \dots, r_n) = 0$. Since
the polynomial $f_k(x_2, \dots, x_n)$ is not identically zero and has degree at most
$d - k$, the number of choices for $(r_2, \dots, r_n)$ is at most $(d - k)|S|^{n-2}$ by the
induction hypothesis, and so $|V_1| \le (d - k)|S|^{n-1}$.
The second class $V_2$ are the remaining $n$-tuples, that is, those with $f(r_1, \dots, r_n) = 0$
but $f_k(r_2, \dots, r_n) \ne 0$. Here we count as follows: $r_2$ through $r_n$ can be chosen
in at most $|S|^{n-1}$ ways, and if $r_2, \dots, r_n$ are fixed with $f_k(r_2, \dots, r_n) \ne 0$,
then $r_1$ must be a root of the univariate polynomial $g(x_1) = f(x_1, r_2, \dots, r_n)$.
This polynomial has degree (exactly) $k$, and hence at most $k$ roots. Thus
$|V_2| \le k|S|^{n-1}$, which gives $d|S|^{n-1}$ altogether, finishing the proof.

Exercise 2.2. Check that the Schwartz–Zippel theorem is tight; i.e., exhibit an
$n$-variate polynomial of degree $d$ whose zero set in $S^n$ has $d|S|^{n-1}$ points (where
$d < |S|$).

A well-known “continuous” counterpart of the Schwartz–Zippel theorem
asserts that the zero set of a nonzero polynomial $f \in \mathbb{R}[x_1, \dots, x_n]$ is a Lebesgue
null set.¹ This follows, e.g., from Sard’s theorem of mathematical analysis, or
it can be proved directly.

Exercise 2.3. Imitate the proof of the Schwartz–Zippel theorem to show that
the zero set of $f$ is Lebesgue null for every nonzero $f \in \mathbb{R}[x_1, \dots, x_n]$. (Fubini’s
theorem helps with a convenient proof, if you are somewhat familiar with measure
theory.)

3 Polynomial identity testing

Testing perfect matchings. We recall that a matching in a graph $G$ is a
set of edges $F \subseteq E(G)$ such that no vertex of $G$ is incident to more than one
edge of $F$. A perfect matching is a matching covering all vertices. One of the
most famous uses of the Schwartz–Zippel theorem is in an algebraic algorithm
for testing the existence of a perfect matching in a given graph.
For simplicity, we will discuss only the bipartite case. So we consider
a bipartite graph, with vertices divided into two classes {u1 , u2 , . . . , un } and
{v1 , v2 , . . . , vn } and the edges going only between the two classes. Both of the
classes have the same size, for otherwise, there is no perfect matching. Let m
stand for the number of edges of G.
Let $S_n$ be the set of all permutations of the set $\{1, 2, \dots, n\}$. Every perfect
matching $F$ of $G$ uniquely corresponds to a permutation $\pi \in S_n$; we can write
$F = \{\{u_1, v_{\pi(1)}\}, \{u_2, v_{\pi(2)}\}, \dots, \{u_n, v_{\pi(n)}\}\}$.
¹ We recall that a set $X \subseteq \mathbb{R}^n$ is Lebesgue null if, for every $\varepsilon > 0$, $X$ can be covered by at
most countably many axis-parallel boxes of total volume at most $\varepsilon$. Instead of Lebesgue null,
one also says that $X$ has (Lebesgue) measure zero.

We express the existence of a perfect matching by a determinant, but not
of an ordinary matrix of numbers, but rather of a matrix whose entries are
variables. We introduce a variable $x_{ij}$ for every edge $\{u_i, v_j\} \in E(G)$ (so we
have $m$ variables altogether), and we define an $n \times n$ matrix $A$ by
\[ a_{ij} := \begin{cases} x_{ij} & \text{if } \{u_i, v_j\} \in E(G), \\ 0 & \text{otherwise.} \end{cases} \]
The determinant of $A$ is a polynomial in the $m$ variables $x_{ij}$. By the definition
of the determinant,
\[ \det A = \sum_{\pi \in S_n} \operatorname{sgn}(\pi) \cdot a_{1,\pi(1)} a_{2,\pi(2)} \cdots a_{n,\pi(n)}
        = \sum_{\substack{\pi \text{ describes a perfect} \\ \text{matching of } G}} \operatorname{sgn}(\pi) \cdot x_{1,\pi(1)} x_{2,\pi(2)} \cdots x_{n,\pi(n)}. \]
Clearly, if $G$ has no perfect matching, then $\det A$ is the zero polynomial.
But the converse also holds: if $G$ does have a perfect matching, then $\det A \ne 0$
as a polynomial. To see this, we fix a permutation $\sigma$ that defines a perfect
matching, and we set $x_{i,\sigma(i)} := 1$ for every $i = 1, 2, \dots, n$, while the remaining
$x_{ij}$ are set to 0. Then all terms in the above expansion of $\det A$ vanish except
for the one corresponding to $\sigma$, which is $\pm 1$.
So testing for a perfect matching in G is equivalent to testing if det A is
the zero polynomial. We cannot afford to compute det A explicitly in the usual
form, as a sum of monomials, since it may have up to n! terms.
But if we substitute specific numbers for the variables xij , we can calculate
the determinant reasonably fast, e.g., by Gaussian elimination. So we can
imagine that det A is available through a black box, from which we can obtain
its value at any specified point.
Since $\deg(\det A) \le n$, the Schwartz–Zippel theorem shows that if $\det A$ is
nonzero and we compute it for values of the $x_{ij}$ chosen independently at random
from $S := \{1, 2, \dots, 2n\}$, then the probability of getting 0 is at most $\frac12$. This
gives a probabilistic algorithm for testing the existence of a perfect matching
in $G$.
The probability of error can be reduced, either by repeating the algorithm
several times, or by choosing from a larger set S.
Computationally, instead of working over the integers, it is better to com-
pute the determinant over a sufficiently large finite field, because then we need
not worry about the intermediate values in the computation getting very large.
(There is a polynomial-time version of the Gaussian elimination over the inte-
gers, but it is not an easy matter.)
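
The randomized test just described can be sketched in a few lines. This is a minimal illustration and not a tuned implementation: we assume the finite field is the integers modulo the Mersenne prime $2^{61} - 1$ and take S to be the whole field, vertices are numbered from 0, and the function names are our own. The determinant is computed by plain Gaussian elimination, not by fast matrix multiplication.

```python
import random

P = (1 << 61) - 1  # the Mersenne prime 2^61 - 1, our choice of field size


def det_mod_p(matrix, p=P):
    """Determinant of a square integer matrix modulo the prime p (Gaussian elimination)."""
    a = [[x % p for x in row] for row in matrix]
    n = len(a)
    det = 1
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as the pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], p - 2, p)  # inverse by Fermat's little theorem
        for r in range(col + 1, n):
            factor = a[r][col] * inv % p
            if factor:
                for c in range(col, n):
                    a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p


def probably_has_perfect_matching(n, edges, rng=random, p=P):
    """One-sided randomized test for a bipartite graph with classes of size n.

    edges is a list of pairs (i, j) meaning {u_i, v_j} is an edge.
    False may be wrong with probability at most n/p; True is always correct.
    """
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = rng.randrange(p)  # random value substituted for x_ij
    return det_mod_p(a, p) != 0
```

For instance, `probably_has_perfect_matching(3, [(0, 0), (0, 1), (1, 1), (2, 2)])` should report a perfect matching except with negligible probability, while a graph in which two vertices $u_i$ have the same single neighbour is always rejected.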
If we compute the determinant by Gaussian elimination, the running time
is $O(n^3)$, which is worse than for some combinatorial algorithms for perfect
matchings. But using fast matrix multiplication, the determinant can be computed
faster; the best asymptotic running time known at the time of writing is
$O(n^{2.376})$. This yields the asymptotically fastest known perfect matching algorithm.
This algorithm also has a fast implementation on a parallel computer, with
polylogarithmic running time. No other known approach yields comparably
fast parallel algorithms.

Finally, it is worth mentioning that although the basic version of the algo-
rithm, as described above, only decides if there is a perfect matching but does
not find one, there are more sophisticated extensions that also find a perfect
matching, and if a perfect matching does not exist, they can find a matching of
maximum cardinality. See [Har09] for recent results and references.
Counting compositions. The strategy in the above algebraic algorithm is
very general and can be used for an arbitrary polynomial identity testing; that
is, for a polynomial of controlled degree provided by a black box, the Schwartz–
Zippel theorem allows us to test whether the polynomial is identically zero.
Here is another lovely application.
Given a set $P \subseteq S_n$ of permutations, we would like to count $|P \circ P|$, i.e.,
the number of distinct permutations $\rho$ that can be expressed as a composition
$\sigma \circ \tau$ for $\sigma, \tau \in P$. Mainly for notational simplicity, let us assume $|P| = n$.
A straightforward algorithm for computing $|P \circ P|$ takes every pair $(\sigma, \tau) \in P^2$,
computes the composition $\sigma \circ \tau$ in $O(n)$ time, and then counts the number
of distinct permutations in the resulting list. With some care, this can be
implemented in a total of $O(n^3)$ time.
To get an asymptotically faster, algebraic algorithm, we introduce variables
$x_1, \dots, x_n$ and $y_1, \dots, y_n$. Let us observe that, given permutations $\sigma$ and $\tau$, the
(quadratic) polynomial
\[ f_{\sigma\tau} := \sum_{i=1}^{n} x_{\sigma(i)} y_{\tau^{-1}(i)} \]
encodes the composition $\rho := \sigma \circ \tau$, in the sense that
\[ f_{\sigma\tau} = \sum_{i=1}^{n} x_{\rho(i)} y_i, \]
as is easy to check. Consequently, $f_{\sigma\tau}$ and $f_{\sigma'\tau'}$ are equal polynomials iff
$\sigma \circ \tau = \sigma' \circ \tau'$. Hence, $|P \circ P|$ equals the number of distinct polynomials among
$(f_{\sigma\tau} : \sigma, \tau \in P)$.
Next, we observe that all of the $f_{\sigma\tau}$ can be evaluated simultaneously using
a matrix product. Indeed, let us enumerate $P = \{\sigma_1, \dots, \sigma_n\}$, and define the
polynomial matrices $A$, $B$ with $a_{ij} = x_{\sigma_j(i)}$ and $b_{ij} = y_{\sigma_j^{-1}(i)}$. Setting $C = A^T B$,
we find that $c_{ij} = f_{\sigma_i \sigma_j}$.
The probabilistic algorithm for computing $|P \circ P|$ now goes as follows. We
set $N := 4n^4$, $S := \{1, 2, \dots, N\}$, we pick values $s_1, \dots, s_n$ and $t_1, \dots, t_n$ from
$S$ independently and uniformly at random, we make the substitutions $x_i := s_i$
and $y_i := t_i$, $i = 1, 2, \dots, n$, and we compute the value of $C$. By fast matrix
multiplication this can be done in $O(n^{2.376})$ time. We return the number of
distinct entries of the resulting matrix as the answer.
Clearly, this answer is never larger than $|P \circ P|$. If it is strictly smaller than
$|P \circ P|$, it means that a nonzero polynomial of the form $f_{\sigma\tau} - f_{\sigma'\tau'}$ evaluates
to 0 at $s_1, \dots, s_n, t_1, \dots, t_n$. For every fixed fourtuple $(\sigma, \tau, \sigma', \tau') \in P^4$, this
has probability at most $\frac{1}{2n^4}$ according to the Schwartz–Zippel theorem (with
degree $d = 2$). The probability that this occurs for at least one of the at most
$n^4$ fourtuples is thus at most $\frac12$.
Hence the answer is correct with probability at least $\frac12$. As before, this
probability can be boosted by repetition and/or by choosing larger $S$.
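
The algorithm can be sketched as follows. This is an illustration under our own assumptions: permutations are 0-indexed tuples, the size of P is not required to equal n, the range bound N is a parameter, and the matrix product is computed naively in cubic time, so the sketch shows the logic but not the speed-up from fast matrix multiplication. All function names are ours.

```python
import random


def compose(sigma, tau):
    """The composition sigma ∘ tau, i.e. i -> sigma(tau(i)), on 0-indexed tuples."""
    return tuple(sigma[t] for t in tau)


def inverse(sigma):
    inv = [0] * len(sigma)
    for i, s in enumerate(sigma):
        inv[s] = i
    return tuple(inv)


def count_compositions_exact(P):
    """The straightforward algorithm: list all compositions and count distinct ones."""
    return len({compose(s, t) for s in P for t in P})


def count_compositions_randomized(P, N, rng=random):
    """Count distinct entries of C = A^T B after a random substitution from {1..N}.

    The result never exceeds |P ∘ P|; it can be smaller if two different
    polynomials f_{sigma tau} happen to agree at the random point.
    """
    n = len(P[0])
    s = [rng.randint(1, N) for _ in range(n)]  # values substituted for x_1..x_n
    t = [rng.randint(1, N) for _ in range(n)]  # values substituted for y_1..y_n
    inverses = [inverse(sigma) for sigma in P]
    # A[k][i] = s[sigma_i(k)] and B[k][j] = t[sigma_j^{-1}(k)]
    A = [[s[sigma[k]] for sigma in P] for k in range(n)]
    B = [[t[inv[k]] for inv in inverses] for k in range(n)]
    values = set()
    for i in range(len(P)):
        for j in range(len(P)):
            values.add(sum(A[k][i] * B[k][j] for k in range(n)))
    return len(values)


P = [(0, 1, 2), (1, 2, 0), (1, 0, 2)]
print(count_compositions_exact(P))
print(count_compositions_randomized(P, N=4 * 3 ** 4))
```

With the bound $N = 4n^4$ from the text the second number equals the first with probability at least $\frac12$; a larger N makes a discrepancy correspondingly less likely.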

4 Interpolation, joints, and contagious vanishing


We begin with a small counting question, whose result appears very often when
dealing with polynomials.

Fact 4.1. The number of monomials of degree at most $d$ in variables $x_1, \dots, x_n$
equals $\binom{d+n}{n}$.

Indeed, the number in question is the number of ordered $n$-tuples $(\alpha_1, \dots, \alpha_n) \in
\mathbb{Z}_{\ge 0}^n$ with $\alpha_1 + \cdots + \alpha_n \le d$, and counting them is basic combinatorics which
we omit here.
Somewhat imprecisely, we can say that $\binom{d+n}{n}$ is the number of “degrees of
freedom” for a general $n$-variate polynomial of degree at most $d$. In a more
sophisticated language, if $P_d \subset k[x_1, \dots, x_n]$ is the vector space (over $k$) consisting
of all polynomials of degree at most $d$, then all monomials of degree at
most $d$ form a basis, and so $\dim P_d = \binom{d+n}{n}$ by Fact 4.1.
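
Fact 4.1 is easy to confirm by enumeration for small parameters. The following sketch lists the exponent vectors directly; the helper name is ours.

```python
from itertools import product
from math import comb


def monomial_exponents(n, d):
    """All exponent vectors (alpha_1, ..., alpha_n) with alpha_1 + ... + alpha_n <= d."""
    return [alpha for alpha in product(range(d + 1), repeat=n) if sum(alpha) <= d]


for n, d in [(1, 5), (2, 3), (3, 4)]:
    print(n, d, len(monomial_exponents(n, d)), comb(d + n, n))
```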
The next simple but surprisingly useful lemma can be regarded as a kind of
counterpart of the Schwartz–Zippel theorem: that theorem says that the zero
set of a polynomial cannot be too big, and the lemma tells us that, nevertheless,
we can cover a significant number of points by such a zero set.

Lemma 4.2. Let $a_1, a_2, \dots, a_N$ be points in $k^n$, where $N < \binom{d+n}{n}$. Then there
exists a nonzero polynomial $f \in k[x_1, \dots, x_n]$ of degree at most $d$ such that
$f(a_i) = 0$ for all $i$.

Proof. Given the $a_i$, we regard the coefficients $c_\alpha$ of the desired polynomial $f$
as unknowns. So we have $\binom{d+n}{n}$ unknowns. A requirement of the form $f(a) = 0$
translates to a homogeneous linear equation for the $c_\alpha$. Since $N < \binom{d+n}{n}$, we
have fewer equations than unknowns, and such a homogeneous system always
has a nonzero solution. So there is a polynomial with at least one nonzero
coefficient.
Expressed differently, we can consider the linear map $P_d \to k^N$ that sends
a polynomial $f$ to the $N$-tuple $(f(a_1), \dots, f(a_N))$. Since $\dim P_d > N$, this map
has a nontrivial kernel.
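
The proof is constructive once we can solve linear systems. Here is a minimal sketch over the rationals, using exact arithmetic with Python's `Fraction`; the function names and the sample points are our own, and the elimination routine is a plain textbook one rather than anything optimized.

```python
from fractions import Fraction
from itertools import product
from math import comb


def monomial_exponents(n, d):
    return [a for a in product(range(d + 1), repeat=n) if sum(a) <= d]


def evaluate_monomial(alpha, point):
    value = Fraction(1)
    for exponent, coordinate in zip(alpha, point):
        value *= Fraction(coordinate) ** exponent
    return value


def nonzero_kernel_vector(rows, num_unknowns):
    """A nonzero solution c of rows * c = 0, assuming fewer rows than unknowns."""
    rows = [list(r) for r in rows]
    pivot_cols, rank = [], 0
    for col in range(num_unknowns):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        pv = rows[rank][col]
        rows[rank] = [x / pv for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        pivot_cols.append(col)
        rank += 1
    # Fewer equations than unknowns guarantees a column without a pivot.
    free_col = next(c for c in range(num_unknowns) if c not in pivot_cols)
    solution = [Fraction(0)] * num_unknowns
    solution[free_col] = Fraction(1)
    for r, col in enumerate(pivot_cols):
        solution[col] = -rows[r][free_col]
    return solution


def vanishing_polynomial(points, d):
    """Coefficients {alpha: c_alpha} of a nonzero f with deg f <= d and f(a_i) = 0."""
    n = len(points[0])
    exps = monomial_exponents(n, d)
    assert len(points) < comb(d + n, n), "Lemma 4.2 needs N < binom(d+n, n)"
    rows = [[evaluate_monomial(a, pt) for a in exps] for pt in points]
    return dict(zip(exps, nonzero_kernel_vector(rows, len(exps))))


def evaluate(poly, point):
    return sum(c * evaluate_monomial(a, point) for a, c in poly.items())


points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]  # N = 5 < binom(4, 2) = 6
f = vanishing_polynomial(points, d=2)
print({a: c for a, c in f.items() if c != 0})
```

Each row of the system is one equation $f(a_i) = 0$, and the returned kernel vector is the coefficient vector $(c_\alpha)$ from the proof.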

Exercise 4.3. (a) We recall that real numbers $\xi_1, \dots, \xi_n$ are algebraically
independent (over the rationals) if there is no nonzero polynomial $f \in \mathbb{Q}[x_1, \dots, x_n]$
with $f(\xi_1, \dots, \xi_n) = 0$. Prove that for every $n$ there exist $n$ algebraically
independent real numbers. Hint: one can use a cardinality argument or a measure
argument, for example.
(b) Show that if $a_1, \dots, a_N \in \mathbb{R}^n$ are points whose $nN$ coordinates are
algebraically independent, and if $N = \binom{d+n}{n}$, then the only polynomial $f \in
\mathbb{R}[x_1, \dots, x_n]$ of degree at most $d$ vanishing at all the $a_i$ is identically zero.

Exercise 4.4. (a) Given $a_1, \dots, a_N \in k^n$ and values $b_1, \dots, b_N \in k$, prove that
there exists a polynomial $f \in k[x_1, \dots, x_n]$ with $f(a_i) = b_i$ for all $i = 1, \dots, N$,
and with $\deg f \le N - 1$.
(b) Show that the bound $\deg f \le N - 1$ is optimal in the worst case (i.e.,
find $a_1, \dots, a_N$ and $b_1, \dots, b_N$ for which no $f$ of smaller degree will do). Note
that for $n \ge 2$, this bound is very different from the one in Lemma 4.2.

The joints problem. We consider a set $L$ of $n$ lines in $\mathbb{R}^3$, and call a point
$a \in \mathbb{R}^3$ a joint if there are at least three lines of $L$, not all coplanar, passing
through $a$. The question is, what is the maximum possible number of joints for
$n$ lines?
There is a lower bound of $\Omega(n^{3/2})$ attained by a grid of lines,
and it was proved by Guth and Katz [GK10] in 2008, after many years of effort
by a number of people and many intermediate results, that, asymptotically, this
is the most one can get.
Theorem 4.5. The maximum number of joints of $n$ lines in $\mathbb{R}^3$ is $O(n^{3/2})$.
There is a straightforward generalization to $\mathbb{R}^d$: for every fixed $d$, the maximum
number of joints of $n$ lines in $\mathbb{R}^d$ is of order $n^{d/(d-1)}$, where a joint means a
point common to at least $d$ lines whose direction vectors span $\mathbb{R}^d$. For simplicity
we stick to the $d = 3$ case.
On partial derivatives. We recall that, for a polynomial $f \in \mathbb{R}[x_1, \dots, x_n]$,
the partial derivative $\partial f/\partial x_i$ is the usual derivative of a univariate real function,
where $x_i$ is regarded as a variable, while all the other $x_j$ are considered
constant. The gradient of $f$ is the $n$-tuple
\[ \nabla f := \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right). \]
As a side remark, we note that the derivative can be defined purely formally,
by setting $\partial(x^i)/\partial x := i x^{i-1}$ and extending linearly, and this makes sense over
any field. Many of the usual properties of derivatives can then be checked as
well, and so one need not specialize to real (or complex) numbers. For finite
fields, though, it may be better to work with the Hasse derivative, where the
$m$th Hasse derivative $D^m(x^i) := \binom{i}{m} x^{i-m}$; this avoids troubles with dividing
by zero, e.g., in a Taylor expansion formula.
If $f \in \mathbb{R}[x_1, \dots, x_n]$ is of degree $d \ge 1$, then each of the partial derivatives
$\partial f/\partial x_i$ is a polynomial of degree at most $d - 1$, and at least one of them is
nonzero.
The following observation connects the definition of a joint to algebra.

Observation 4.6. Let $a$ be a joint of lines $\ell_1, \ell_2, \ell_3$ in $\mathbb{R}^3$, and let $f \in \mathbb{R}[x_1, x_2, x_3]$
be a polynomial that vanishes on each of the $\ell_i$. Then $\nabla f(a) = 0$; that is, all of
the partial derivatives of $f$ vanish at $a$.

Proof. This follows easily using the notion and simple properties of directional
derivatives. Here is a more explicit argument.
W.l.o.g. we may assume $a = 0$ (the general case follows by translation).
If we write $f = c_0 + c_1 x_1 + c_2 x_2 + c_3 x_3 + (\text{terms of degree} \ge 2)$, then we have
$\frac{\partial f}{\partial x_i}(0) = c_i$, and $\nabla f(0) = c := (c_1, c_2, c_3)$. Letting $v_i = (v_{i1}, v_{i2}, v_{i3})$ be the
directional vector of $\ell_i$, the restriction of $f$ to the line $\ell_i$ can be regarded as the
univariate polynomial
\[ f(t v_i) = c_0 + (c_1 v_{i1} + c_2 v_{i2} + c_3 v_{i3})\, t + (\text{terms of degree} \ge 2). \]
Thus, vanishing of this polynomial means, in particular, that $c_1 v_{i1} + c_2 v_{i2} +
c_3 v_{i3} = 0$; that is, the vector $c$ is perpendicular to $v_i$. A vector perpendicular
to three linearly independent vectors in $\mathbb{R}^3$ must be zero.

We are almost ready for the proof of the $O(n^{3/2})$ upper bound for joints,
but there is still a simple technical step to be prepared. If we have a set $L$ of
$n$ lines with a large number $m$ of joints, then an average line contains “many”
joints, namely, at least $3m/n$. But for the polynomial argument to work, we need
every line to contain many joints. This is taken care of by a standard “pruning”
argument (if you know the proof of the statement that every graph of average
degree $2\delta$ contains a subgraph with minimum degree at least $\delta$, then you also
know the proof of the next lemma).

Lemma 4.7. Let $L$ be a set of $n$ lines in $\mathbb{R}^3$, let $J$ be the set of all joints of $L$,
and let $b := m/(2n)$, where $m = |J|$. Then there are subsets $L' \subseteq L$ and $J' \subseteq J$
such that $L' \ne \emptyset$, every point of $J'$ is a joint of the lines of $L'$, and every line
of $L'$ contains more than $b$ points of $J'$.

Proof. We use the following pruning procedure: We set $J_0 = J$, $L_0 = L$, and
for $i = 1, 2, 3, \dots$, if $L_{i-1}$ contains a line $\ell$ with at most $b$ joints of $J_{i-1}$, we set
$L_i := L_{i-1} \setminus \{\ell\}$, and $J_i := J_{i-1} \setminus \ell$ (i.e., all joints in which $\ell$ participated are
removed).
By definition, this procedure finishes with some $L' = L_k$ and $J' = J_k$ such
that each point of $J'$ is a joint of the lines in $L'$ and each line of $L'$ contains
more than $b$ joints of $J'$.
It remains to verify that $L' \ne \emptyset$, for which it suffices to check $J' \ne \emptyset$.
Since we have removed at most $n$ lines and at most $b$ joints per line, we have
$|J'| \ge m - nb = m/2 > 0$.
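
The pruning procedure is purely combinatorial, so it can be sketched on incidence data alone. In the sketch below, which is our own illustration, a configuration is given as a dictionary mapping each line to the set of joints lying on it, and the threshold b is passed in as a parameter. The example is a 3 × 3 × 3 grid of axis-parallel lines plus one hypothetical extra line through the origin that contains no other joint.

```python
def prune(lines_to_joints, b):
    """Repeatedly remove a line with at most b remaining joints, together with those joints.

    Returns the surviving lines L' and the surviving joints J'.
    """
    remaining = {line: set(js) for line, js in lines_to_joints.items()}
    changed = True
    while changed:
        changed = False
        for line, js in list(remaining.items()):
            if len(js) <= b:
                del remaining[line]
                # All joints in which this line participated are removed.
                for other in remaining.values():
                    other -= js
                changed = True
    surviving_joints = set().union(*remaining.values()) if remaining else set()
    return set(remaining), surviving_joints


# Axis-parallel lines of the 3 x 3 x 3 grid: 27 lines, 27 joints, 3 joints per line.
grid = {}
for axis in range(3):
    for u in range(3):
        for v in range(3):
            joints = set()
            for t in range(3):
                coords = [u, v]
                coords.insert(axis, t)
                joints.add(tuple(coords))
            grid[(axis, u, v)] = joints
grid["extra"] = {(0, 0, 0)}  # hypothetical line meeting the grid only at the origin

kept_lines, kept_joints = prune(grid, b=1)
print(len(kept_lines), len(kept_joints))
```

With b = 1 the extra line is removed and takes the origin with it, while every grid line keeps at least two joints and survives.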

Now we can focus on the essence.

Proof of Theorem 4.5. For contradiction, we suppose that a set $L$ of $n$ lines has
$m \ge 7n^{3/2}$ joints. Let $J$, $b = m/(2n)$, $J'$, and $L'$ be as in the previous lemma.
We choose a nonzero polynomial $f \in \mathbb{R}[x_1, x_2, x_3]$ that vanishes on all of $J'$
and, subject to this condition, has the smallest possible degree.
First we claim that $\deg f \le b$. Indeed, by Lemma 4.2, $\deg f$ does not exceed
the smallest integer $d$ with $\binom{d+3}{3} > |J'|$, and a simple calculation shows that
$\binom{b+3}{3} > m \ge |J'|$. Namely,
\[ \binom{b+3}{3} > \frac{b^3}{3!} = \frac{(m/2n)^3}{3!} = m \cdot \frac{m^2}{48 n^3} \ge m, \]
since we assumed $m \ge 7n^{3/2}$.

The restriction of $f$ on each line $\ell \in L'$ is thus a univariate polynomial of
degree at most $\deg f \le b$ that vanishes in at least $b + 1$ points, and hence $f$
vanishes everywhere on $\ell$. By Observation 4.6, we have all the partial derivatives
$\partial f/\partial x_i$ zero on all of $J'$. At the same time, since $\deg f \ge 1$, at least one of
these partial derivatives is a nonzero polynomial.
But then such a nonzero partial derivative is a polynomial of degree strictly
smaller than $f$ vanishing on $J'$, and this contradicts our choice of $f$ and concludes
the proof.

This kind of argument is what Larry Guth calls contagious vanishing: the
vanishing of $f$ spreads like infection from $J'$ to the lines of $L'$. In more complicated
proofs of this kind, this spreading may continue further, to suitable
planes or surfaces, and sometimes ultimately to the whole space.
There are several other beautiful applications of the contagious vanishing
argument. The most significant ones are probably a near-optimal solution to
the Erdős distinct distances problem due to Elekes, Guth, and Katz, and a proof
of the Kakeya conjecture over finite fields due to Dvir. These, and much more,
can be found, e.g., in Guth’s book [Gut13] in preparation or in Tao’s survey
[Tao13]. There are also older arguments in number theory, due to Thue (see
[Gut13]) and, especially, due to Baker (see [Wal79, Sec. 4]), which use some
sort of contagious vanishing.

5 Varieties, ideals, and the Hilbert basis theorem

Varieties. Let $F \subseteq k[x_1, \dots, x_n]$ be a set of polynomials, possibly infinite.
The variety $V(F)$ of $F$ is the set of common zeros of the polynomials in $F$:
\[ V(F) := \{(a_1, \dots, a_n) \in k^n : f(a_1, \dots, a_n) = 0 \text{ for all } f \in F\}. \]
Some sources use $Z$ instead of $V$, $Z$ for “zero set.”
An (algebraic) variety² is any subset of $k^n$ that can be expressed as $V(F)$
for some $F$. More precisely, such a set is called an affine algebraic variety, to
distinguish it from a projective algebraic variety, to be mentioned later.
Exercise 5.1. Show that a finite union, as well as an arbitrary intersection, of
varieties is a variety.
² In some of the sources an algebraic variety is also required to be irreducible (this is a
notion defined later), while an arbitrary $V(F)$ is called an algebraic set.

Exercise 5.2. Prove that the sets $\mathbb{Z} \subseteq \mathbb{R}$ and $[0, 1]^2 \subseteq \mathbb{R}^2$ are not algebraic
varieties (over $\mathbb{R}$).

Algebraic geometry. The study of algebraic varieties is called algebraic
geometry. In the literature, one can encounter (at least) two quite distinct
branches of algebraic geometry, with different flavor and conventions.
Classical algebraic geometry mainly investigates varieties over the complex
numbers and, more generally, over algebraically closed fields (these are fields
in which every nonconstant polynomial has a root). It is an enormous, very im-
portant, highly developed, and sometimes very abstract area of modern mathe-
matics. Actually, since the work of Grothendieck in the 1960s, “true” algebraic
geometers no longer consider algebraic varieties, but rather schemes. A scheme
is a more general and technically convenient notion, for which even the defini-
tion is out of our scope; see, e.g., [Gat13] for an introduction.
Real algebraic geometry considers varieties over R and, more generally,
semialgebraic sets, which are defined not only by conjunctions of polyno-
mial equations, but also by Boolean combinations of polynomial inequalities.
One can say that the results are perhaps less elegant than those about varieties
over algebraically closed fields, but sometimes they are closer to the needs of
computer science and other applications.
We will see a sample of basic results in both of these directions.
Ideals. We recall that an ideal in a (commutative) ring R is a subset
I ⊆ R that contains 0 and is closed under addition and under multiplication
by arbitrary elements of R (in symbols, f, g ∈ I implies f + g ∈ I and f ∈ I,
h ∈ R implies hf ∈ I).

Exercise 5.3. Show that a ring R (commutative and with 1) has only two
ideals, {0} and R, iff it is a field.

For a subset $F \subseteq R$, we let $\langle F \rangle$ be the ideal generated by $F$. By
definition, this is the intersection of all ideals in $R$ that contain $F$, and it is easy
to check that $\langle F \rangle = \{h_1 f_1 + \cdots + h_n f_n : n \ge 0,\ f_1, \dots, f_n \in F,\ h_1, \dots, h_n \in R\}$
(this is similar to linear combinations in linear algebra, but here we multiply
by arbitrary elements of $R$).
Specializing this to the polynomial ring $k[x_1, \dots, x_n]$, it is easy to see that for
every set $F$ of polynomials we have $V(F) = V(\langle F \rangle)$. Therefore, every variety
$X$ is the set of common zeros of some ideal $I$ in $k[x_1, \dots, x_n]$; $X = V(I)$. Ideals
are usually much better to work with than arbitrary sets of polynomials.
Here is the first significant general result about varieties: we can restrict
ourselves to finitely generated ideals. A ring $R$ is called Noetherian if every
ideal in $R$ is generated by a finite set. In particular, every field $k$ is Noetherian,
since the only ideals are $\{0\}$ and $k = \langle 1 \rangle$.
In the literature, the definition is often stated in a different but equivalent
way: $R$ is Noetherian iff there is no infinite sequence of properly nested ideals
$I_1 \subsetneq I_2 \subsetneq \cdots$ in $R$.

Exercise 5.4. Check this equivalence.

Theorem 5.5 (Hilbert basis³ theorem). If $R$ is a Noetherian ring, then the
polynomial ring $R[x]$ is Noetherian as well. Consequently, $k[x_1, \dots, x_n]$ is
Noetherian for every field $k$.

Hilbert’s proof, more than 100 years old, was unusual at that time since
it was nonconstructive: it proved the existence of a finite generating set in
every ideal of k[x1 , . . . , xn ], but did not provide any method for finding one.
This nonconstructive approach was initially criticized, but later on embraced
enthusiastically by the mathematical community. In the last decades, with
renewed emphasis on computations and algorithms, people again put much
effort into finding constructive, and efficient, proofs for important results.

Proof. Let $I \subset R[x]$ be an ideal. We are going to choose a sequence $f_1, f_2, f_3, \dots$
of elements (polynomials) from $I$ inductively: for $i = 1, 2, 3, \dots$, $f_i$ is an element
of the smallest degree in $I \setminus \langle f_1, \dots, f_{i-1} \rangle$. For $i = 1$, in particular, we have
$\langle \emptyset \rangle = \{0\}$, and so $f_1$ is a smallest-degree nonzero element of $I$. We have
$\deg f_1 \le \deg f_2 \le \cdots$.
If we reach some $n$ with $\langle f_1, \dots, f_n \rangle = I$, we are done.
Otherwise, let $a_i \in R$ be the leading coefficient of $f_i$ (the coefficient of
the highest power of $x$), and let us consider the ideal $L = \langle a_1, a_2, \dots \rangle$. Since $R$
is Noetherian, $L$ is generated by $a_1, \dots, a_m$ for some finite $m$.
We claim that $I' := \langle f_1, \dots, f_m \rangle = I$. If not, then $f_{m+1}$ was chosen as a
smallest-degree element in $I \setminus I'$. The leading coefficient $a_{m+1}$ of $f_{m+1}$ belongs
to $L$ and thus it can be written as $a_{m+1} = \sum_{i=1}^{m} h_i a_i$, where $h_1, \dots, h_m \in R$.
Using this, we can construct a polynomial $g \in I'$ that has the same degree
and same leading coefficient as $f_{m+1}$, namely,
\[ g := \sum_{i=1}^{m} h_i f_i\, x^{\deg f_{m+1} - \deg f_i}. \]
Then $f_{m+1} - g$ has degree strictly smaller than $f_{m+1}$ and lies in $I \setminus I'$ (why?).
But this contradicts our choice of $f_{m+1}$ as a smallest-degree element.

Exercise 5.6. For every n, find an ideal in R[x, y] that needs at least n gener-
ators.

6 The Nullstellensatz
The German word Nullstellensatz, meaning “zero locus theorem,” is commonly
used in English to denote a basic and classical result of algebraic geometry. It
applies to varieties over an algebraically closed field, most notably over C—a
very important assumption.
³ Here “basis” refers to what we call “generating set.” In linear algebra, bases are inclusion-
minimal generating sets and they have a number of neat properties, such as all having the
same size for a given vector space. In contrast, different inclusion-minimal generating sets of
an ideal may have very different sizes and thus, for example, they are unsuitable for defining
“dimension.”

For a field $k$ that is not algebraically closed, one can sometimes obtain useful
information by applying the Nullstellensatz with the algebraic closure $\bar{k}$ of $k$,
which is an inclusion-minimal algebraically closed field extending $k$; as it turns
out, $\bar{k}$ is determined uniquely up to isomorphism.

Exercise 6.1. (a) Prove that for every field k, possibly finite, there are in-
finitely many irreducible polynomials in k[x], none a constant multiple of an-
other. (Recall that a polynomial f is irreducible if it is not a product f = gh
with deg g, deg h ≥ 1.)
(b) Deduce that every algebraically closed field is infinite.

The weak Nullstellensatz: a theorem about alternative. There are
several ways of stating the Nullstellensatz. The following one is perhaps the
most intuitive. It is called “weak” but the full version can be derived from it
fairly quickly.
Many areas of mathematics have theorems of alternative, with the following
structure: if something cannot be done, then this impossibility must be caused
by an “obvious” obstacle. In linear algebra, for example, if a system $Ax = b$
of linear equations has no solution, then there is a linear combination of the
equations that has all coefficients on the left-hand side zero and the right-hand
side nonzero. In other words, there exists a vector $y$ such that $y^T A = 0$ and
$y^T b \ne 0$.

Exercise 6.2. Prove this using suitable theorems from linear algebra.

The weak Nullstellensatz can also be stated in this form: if a system of poly-
nomial equations f1 = f2 = · · · = fm = 0, with f1 , . . . , fm ∈ k[x1 , . . . , xn ] and
k algebraically closed, has no solution, then there are polynomials h1 , . . . , hm ∈
k[x1 , . . . , xn ] such that h1 f1 + · · · + hm fm = 1. The last equation is an obvi-
ous reason of unsolvability of the original system, since any common zero of
f1 , . . . , fm would also be a zero of h1 f1 + · · · + hm fm , but the latter is never
zero.
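
As a small illustration of such a certificate (an example of our own choosing), take $f_1 = xy - 1$ and $f_2 = x$ in $\mathbb{C}[x, y]$. A common zero would need $x = 0$ and $xy = 1$ at once, so the system is unsolvable, and one possible choice of multipliers is $h_1 = -1$, $h_2 = y$:

```latex
\[
  h_1 f_1 + h_2 f_2 \;=\; (-1)(xy - 1) + y \cdot x \;=\; -xy + 1 + xy \;=\; 1 .
\]
```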
Here is the usual, formally somewhat different, statement.

Theorem 6.3 (Weak Nullstellensatz). Let $k$ be algebraically closed and let $I$
be an ideal in $k[x_1, \dots, x_n]$ such that $V(I) = \emptyset$; that is, there is no common
zero. Then $I = \langle 1 \rangle = k[x_1, \dots, x_n]$.

Exercise 6.4. (a) Give an example of how this fails over $\mathbb{R}$.
(b) Prove the weak Nullstellensatz for $n = 1$.

The usual proofs of the weak Nullstellensatz, including those given here, are
nonconstructive—they do not provide the hi for given fi . Algorithmic methods
exist as well, and we will mention them later on. But it should be said that
although the weak Nullstellensatz provides an “obvious” reason, or proof, of
unsolvability of a given polynomial system, that proof is not necessarily very
compact. Indeed, examples are known in which the smallest possible degree of
the hi has to be exponential in n (see [Kol88] for precise bounds).

The ideal–variety correspondence: the (strong) Nullstellensatz. The
strong Nullstellensatz basically says that, over an algebraically closed field, algebraic
varieties in k^n are in one-to-one correspondence with ideals in k[x1 , . . . , xn ].
Or actually, not with all ideals but radical ones, where an ideal I is radical if
f^s ∈ I for some natural number s implies f ∈ I.
This extra condition is needed since, e.g., the ideals ⟨x⟩ and ⟨x^2⟩ in C[x]
both define the same variety, namely {0}, but only the first one is radical. For
an arbitrary ideal I in a ring R, its radical √I is defined in the expected way,
as {f ∈ R : f^s ∈ I for some s}.

Exercise 6.5. Check that √I is an ideal.
For a set S ⊆ k^n , let
I(S) := {f ∈ k[x1 , . . . , xn ] : f vanishes on S};
clearly, this is an ideal.
Exercise 6.6. (a) Check that V (I(X)) = X for every variety X, over any field.
(b) Verify that √I ⊆ I(V (I)) for every ideal I ⊆ k[x1 , . . . , xn ], again over any
field.

Theorem 6.7 (Strong Nullstellensatz). Let k be algebraically closed and
let I be an ideal in k[x1 , . . . , xn ]. Then I(V (I)) = √I. Thus, if
f1 , . . . , fm ∈ k[x1 , . . . , xn ] are polynomials and g is a polynomial that vanishes
on V (f1 , . . . , fm ), then there are an integer s and polynomials
h1 , . . . , hm ∈ k[x1 , . . . , xn ] such that g^s = h1 f1 + · · · + hm fm .

Proof of the strong Nullstellensatz from the weak one. This is known as the Ra-
binowitsch trick : we add a new variable and a new equation to get an unsatis-
fiable system, for which we apply the weak Nullstellensatz.
Namely, let I = ⟨f1 , . . . , fm ⟩; then the polynomials f1 , . . . , fm and
x_{n+1} g − 1 ∈ k[x1 , . . . , x_{n+1} ] have no common zero in k^{n+1} . So by the weak
Nullstellensatz we have
h1 f1 + · · · + hm fm + h_{m+1} (x_{n+1} g − 1) = 1    (1)
for some h1 , . . . , h_{m+1} ∈ k[x1 , . . . , x_{n+1} ].
This equality holds for every value of the variables, and in particular, with
x_{n+1} = 1/g(x1 , . . . , xn ) whenever g(x1 , . . . , xn ) ≠ 0. Hence the rational function
resulting by substituting x_{n+1} = 1/g(x1 , . . . , xn ) into the left-hand side of (1)
equals 1 whenever g ≠ 0.
We multiply both sides of the resulting equality by g^s , where s is the highest
power of x_{n+1} appearing in (1). This yields the following equality of polynomial
functions k^n → k
h'1 f1 + · · · + h'm fm = g^s    (2)
which holds at all points except possibly at the zeros of g (here h'1 , . . . , h'm ∈
k[x1 , . . . , xn ]; also note that the term h_{m+1} (x_{n+1} g − 1) vanishes). Using the
fact that every algebraically closed field is infinite (Exercise 6.1) and, for example,
the Schwartz–Zippel theorem, we get that (2) holds as an equality of
polynomials, and this concludes the proof.
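The trick can be followed by hand in a one-variable toy case, chosen here purely for illustration. Let I = ⟨x^2⟩ in k[x] and g = x, which vanishes on V (x^2) = {0}; write t for the new variable. The system x^2 = 0, tx − 1 = 0 has no solution, and a certificate of the form (1), followed by the substitution t = 1/x and multiplication by g^2, reads:

```latex
\[
  t^2 \cdot x^2 \;-\; (tx + 1)\,(tx - 1) \;=\; 1
  \quad\xrightarrow{\;t = 1/x\;}\quad
  \frac{1}{x^2}\cdot x^2 = 1
  \quad\xrightarrow{\;\cdot\, x^2\;}\quad
  1 \cdot x^2 = g^2 .
\]
```

So in this instance s = 2 (the highest power of t in the certificate) and the resulting multiplier of x^2 is the constant 1.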

The strong Nullstellensatz shows that, with k algebraically closed, an algebraic
variety in k^n and a radical ideal in k[x1 , . . . , xn ] are just two ways of
looking at the same object. Such alternative views of mathematical objects are
often very useful.
Several proofs of the Nullstellensatz are known, usually with numerous vari-
ations. Here we essentially follow a particularly simple proof from [Arr06], in
which we meet a classical tool—resultants.

6.1 Intermezzo: Resultants


Resultants are useful for several purposes. They provide a way of detecting
when two polynomials have a nonconstant common factor (or, over an al-
gebraically closed field, a common root), and they are useful for eliminating
variables from a polynomial system or, in geometric terms, for projecting an
algebraic variety on a coordinate subspace. Here we introduce them briefly,
aiming mainly at the properties we will really use.
Excluding common zero of two polynomials. For a while we will be
dealing with univariate polynomials f and g; first let us assume that they are
over a field k. Let us write them as f (x) = Σ_{i=0}^{k} f_i x^i and g(x) = Σ_{j=0}^{ℓ} g_j x^j .

To see how one can naturally arrive at the resultant, let us consider the
system of two polynomial equations f = 0, g = 0. A possible way of showing
that it is unsolvable, i.e., f and g have no common root, is to find polynomials
a, b ∈ k[x] such that the polynomial af + bg is a nonzero constant, say 1.
First we observe that if such a and b exist, we may as well assume deg a < ℓ
and deg b < k. This is because if af + bg = 1, then also a' f + b' g = 1,
where a' = a + pg and b' = b − pf for any p ∈ k[x]. Hence we can reduce a
modulo g to have degree smaller than ℓ, and then b must have degree smaller
than k, for otherwise, we would have deg bg ≥ k + ℓ > deg af , and so af + bg = 1
would be impossible.
Let us regard the coefficients of a and b as above as unknowns. The requirement
af + bg = 1 is an equality of polynomials of degree at most k + ℓ − 1. By
comparing the coefficients of each of the relevant powers of x on both sides,
we obtain a system of k + ℓ linear equations with k + ℓ unknowns. The reader
may want to write this system down and see that its matrix looks as follows
(we show it for the special case k = 5 and ℓ = 3, which makes clear what the
general case is):
 
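\[
\begin{pmatrix}
f_0 & 0   & 0   & g_0 & 0   & 0   & 0   & 0   \\
f_1 & f_0 & 0   & g_1 & g_0 & 0   & 0   & 0   \\
f_2 & f_1 & f_0 & g_2 & g_1 & g_0 & 0   & 0   \\
f_3 & f_2 & f_1 & g_3 & g_2 & g_1 & g_0 & 0   \\
f_4 & f_3 & f_2 & 0   & g_3 & g_2 & g_1 & g_0 \\
f_5 & f_4 & f_3 & 0   & 0   & g_3 & g_2 & g_1 \\
0   & f_5 & f_4 & 0   & 0   & 0   & g_3 & g_2 \\
0   & 0   & f_5 & 0   & 0   & 0   & 0   & g_3
\end{pmatrix}.
\]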
This is called the Sylvester matrix of f and g.

The resultant of f and g, denoted by Res(f, g, x), is the determinant of
the Sylvester matrix, which is an element of k. From the above discussion it is
clear that if Res(f, g, x) ≠ 0, then the considered linear system has a solution,
and so the desired a and b with af + bg = 1 exist, witnessing the nonexistence
of a common root of f and g.
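The construction can be tried out on small examples. The following is a minimal sketch, assuming polynomials over the rationals given as coefficient lists [f_0, f_1, . . . , f_k]; the function names sylvester, det, and resultant are hypothetical helpers written for this illustration. The columns follow the layout shown above: ℓ shifted copies of the coefficients of f, then k shifted copies of the coefficients of g.

```python
from fractions import Fraction

def sylvester(f, g):
    """Sylvester matrix of f and g, given as coefficient lists [c_0, ..., c_deg]."""
    k, l = len(f) - 1, len(g) - 1
    n = k + l
    M = [[Fraction(0)] * n for _ in range(n)]
    for j in range(l):                  # columns 0..l-1: coefficients of x^j * f
        for i, c in enumerate(f):
            M[i + j][j] = Fraction(c)
    for j in range(k):                  # columns l..l+k-1: coefficients of x^j * g
        for i, c in enumerate(g):
            M[i + j][l + j] = Fraction(c)
    return M

def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    A = [row[:] for row in M]
    n = len(A)
    d = Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            return Fraction(0)          # no pivot in this column: singular matrix
        if p != c:
            A[c], A[p] = A[p], A[c]     # a row swap flips the sign
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            m = A[r][c] / A[c][c]
            for j in range(c, n):
                A[r][j] -= m * A[c][j]
    return d

def resultant(f, g):
    return det(sylvester(f, g))

# x^2 - 1 and x - 1 share the root 1; x^2 + 1 and x - 1 share no root.
res_common = resultant([-1, 0, 1], [-1, 1])
res_coprime = resultant([1, 0, 1], [-1, 1])
```

In the second case the nonzero value witnesses the absence of a common root, in line with the discussion above.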

Exercise 6.8. (a) Using Euclid’s algorithm, check that if f, g ∈ k[x] have
no nonconstant common factor, then there are polynomials u, v ∈ k[x] with
uf + vg = 1. (The reverse implication is obvious.)
(b) Using (a), prove that for f, g ∈ k[x], where k need not be algebraically
closed, Res(f, g, x) = 0 implies that f and g have a nonconstant common factor.

Resultant over a ring. We will need a slightly more general setting, where
f, g ∈ R[x] are polynomials over a ring R (commutative with 1 as usual). The
definition above still makes sense and Res(f, g, x) is an element of R. The
next lemma, which we will need later, provides another way of showing that if
Res(f, g, x) ≠ 0, then f and g have no common root.

Lemma 6.9. For every f, g ∈ R[x], deg f = k, deg g = ℓ, there are a, b ∈ R[x]
with deg a ≤ ℓ − 1, deg b ≤ k − 1, and Res(f, g, x) = af + bg.

Proof. We do the following row operations on the Sylvester matrix: for i =
2, 3, . . . , k + ℓ we add the ith row multiplied by x^{i−1} to the first row. After that
the first row is
(f, xf, . . . , x^{ℓ−1} f, g, xg, . . . , x^{k−1} g).
Expanding this determinant, which still equals Res(f, g, x), according to the
first row, we obtain precisely an expression of the desired form af + bg with a, b
as in the lemma.

6.2 Proof of the weak Nullstellensatz


We need a lemma saying that in a polynomial of degree d, we can make the
coefficient of x_n^d nonzero by a suitable invertible linear substitution. This result
is quite simple; it is a special case of a more intricate result known as the
Noether normalization lemma.

Lemma 6.10. If k is an infinite field and f ∈ k[x1 , . . . , xn ] is a polynomial
of degree d ≥ 1, then there are λ1 , . . . , λn−1 ∈ k such that the coefficient of
x_n^d in the polynomial f'(x1 , . . . , xn ) := f (x1 + λ1 xn , . . . , xn−1 + λn−1 xn , xn ) is
nonzero.

Proof. Let fd denote the sum of all terms of degree d in f (this is called
the homogeneous component of f of degree d). Then the coefficient of x_n^d
in f' equals fd (λ1 , . . . , λn−1 , 1). Since fd is nonzero and homogeneous, the
polynomial fd (x1 , . . . , xn−1 , 1) is nonzero, and since k is infinite, it
cannot vanish everywhere on k^{n−1} . Any point (λ1 , . . . , λn−1 ) where it is nonzero
gives the desired substitution.
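A two-variable instance, chosen only for illustration, shows the substitution at work. For f = x_1 x_2 we have d = 2, the top homogeneous component is f_2 = f, and the coefficient of x_2^2 is zero. After the substitution of the lemma:

```latex
\[
  f'(x_1, x_2) \;=\; f(x_1 + \lambda x_2,\; x_2)
  \;=\; (x_1 + \lambda x_2)\,x_2
  \;=\; x_1 x_2 + \lambda\, x_2^2 ,
  \qquad \lambda = f_2(\lambda, 1),
\]
```

so any λ ≠ 0 makes the coefficient of x_2^2 nonzero.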

Proof of the weak Nullstellensatz. We establish the contraposition, so we assume
that I is an ideal properly contained in k[x1 , . . . , xn ], and we want to find
a common zero (a1 , . . . , an ) ∈ k^n of all polynomials in I.

We proceed by induction on n, considering the case n = 1 settled (Exer-
cise 6.4). So let n > 1.
By Lemma 6.10 we can make a change of variables so that I contains a
polynomial g of degree d ≥ 1 with the term x_n^d (with coefficient 1, after
multiplying g by a suitable nonzero constant). Since this substitution is
invertible, if we find a common zero for the ideal obtained from I after the
substitution, we can convert it back to a common zero for the original I. So we
assume we have g ∈ I as above.
Let I' be the set of all polynomials in I that do not contain the variable xn
(that is, there is no term with nonzero coefficient and nonzero power of xn ). We
can regard I' as a subset of k[x1 , . . . , xn−1 ]; then it is a proper ideal (right?),
and so by the inductive hypothesis, there is (a1 , . . . , an−1 ), a common zero of
all polynomials in I' .
Now we claim that the set

J := {f (a1 , . . . , an−1 , xn ) : f ∈ I},

which is obviously an ideal, is not all of k[xn ]. Once we prove this claim, we
will be done, since by the 1-dimensional weak Nullstellensatz all polynomials
in J have a common zero a ∈ k, and then (a1 , . . . , an−1 , a) is a common zero
for I.
To prove the claim, we need to check that 1 ∉ J, so for contradiction, we
assume that there is f ∈ I such that f (a1 , . . . , an−1 , x) = 1 (this is an equality
of univariate polynomials). We fix f , as well as g as above, i.e., of degree d and
with term x_n^d .
Let us consider f and g as polynomials in xn : f = Σ_{i=0}^{k} f_i x_n^i , g = Σ_{j=0}^{d} g_j x_n^j ,
f0 , . . . , fk , g0 , . . . , gd ∈ R := k[x1 , . . . , xn−1 ].
By Lemma 6.9, the resultant Res(f, g, xn ) ∈ R can be written as af + bg
with a, b ∈ R[xn ], and hence it belongs to I' . To finish the proof, we will show
that Res(f, g, xn ) is nonzero at (a1 , . . . , an−1 ), and hence it cannot belong to I' .
The equality f (a1 , . . . , an−1 , x) = 1 means that f0 (a1 , . . . , an−1 ) = 1 and f1
through fk vanish at (a1 , . . . , an−1 ). Also, by the choice of g, we have gd = 1
(identically). Looking at the Sylvester matrix of f and g, again for notational
simplicity in the particular case deg f = 5, deg g = 3, i.e.,
 
\[
\begin{pmatrix}
f_0 & 0   & 0   & g_0 & 0   & 0   & 0   & 0   \\
f_1 & f_0 & 0   & g_1 & g_0 & 0   & 0   & 0   \\
f_2 & f_1 & f_0 & g_2 & g_1 & g_0 & 0   & 0   \\
f_3 & f_2 & f_1 & g_3 & g_2 & g_1 & g_0 & 0   \\
f_4 & f_3 & f_2 & 0   & g_3 & g_2 & g_1 & g_0 \\
f_5 & f_4 & f_3 & 0   & 0   & g_3 & g_2 & g_1 \\
0   & f_5 & f_4 & 0   & 0   & 0   & g_3 & g_2 \\
0   & 0   & f_5 & 0   & 0   & 0   & 0   & g_3
\end{pmatrix},
\]
we see that at (a1 , . . . , an−1 ) it is an upper triangular matrix with 1s on the
main diagonal, and hence Res(f, g, xn )(a1 , . . . , an−1 ) = 1.

7 Bézout’s inequality in the plane
One of the questions that often comes up in applications is, given a system
of polynomial equations f1 = 0,. . . , fm = 0, f1 , . . . , fm ∈ k[x1 , . . . , xn ], what
can be said about the existence and number of solutions? In order to avoid
trivialities, we always assume that di := deg fi ≥ 1 for all i.
In general this is not an easy question, and in this section we will consider
the special case with two equations f (x, y) = 0, g(x, y) = 0 in two variables,
which is considerably simpler than the general setting but still interesting. (We
are leaving aside the case of a single equation f = 0, which has already been
treated to some extent, at least implicitly.)
Here is an example of the zero set of two polynomials f and g of degree
5; each of them has been created by passing the zero set through 25 random
points in [0, 1]^2 using Lemma 4.2.

[Figure: the zero sets of f and g, plotted in the unit square [0, 1]^2 .]

The system f = g = 0 may have infinitely many solutions; this we can
see already in the case of linear equations, where f may be a multiple of g or,
speaking geometrically, the two lines described by the equations may coincide.
For two polynomials, infinitely many solutions may occur if f and g have a
nonconstant common factor, and for k algebraically closed they actually do
occur in such a case.
As we will see, excluding a common factor leads to finitely many solutions.
Let us consider an example with f (x, y) = (x−1)(x−2) · · · (x−k) and g(x, y) =
(y − 1)(y − 2) · · · (y − ℓ). We have deg f = k, deg g = ℓ, and the zero sets look
like this:

[Figure: V (f ) consists of k vertical lines and V (g) of ℓ horizontal lines, forming a grid.]
Thus, there can be as many as kℓ distinct solutions. This example, trivial as it
may look, is actually quite useful: the union of k hyperplanes is the zero set of
a degree-k polynomial, and although this is not really a typical polynomial, it
can serve for a quick sanity check of many things.
The following theorem asserts that, assuming no common factor, we cannot
have more solutions than in the example.

Theorem 7.1 (Bézout’s inequality in the plane). Let f, g ∈ k[x, y] be polynomials
of degrees k, ℓ ≥ 1, respectively, having no nonconstant common
factor. Then |V (f, g)| ≤ kℓ.
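Since the theorem holds over any field, it can be sanity-checked by brute force over a small finite field. The sketch below is an illustration under assumptions of our own: the field F_7, the circle-like curve x^2 + y^2 − 1 of degree 2, and the line x − y of degree 1 are arbitrary choices, and common_zeros is a hypothetical helper. The two polynomials have no common factor, since x − y is irreducible and x^2 + y^2 − 1 does not vanish identically on the line x = y.

```python
def common_zeros(f, g, p):
    """All points (x, y) in F_p^2 where both f and g vanish modulo p."""
    return [(x, y) for x in range(p) for y in range(p)
            if f(x, y) % p == 0 and g(x, y) % p == 0]

# Hypothetical example curves over F_7.
f = lambda x, y: x * x + y * y - 1   # degree k = 2
g = lambda x, y: x - y               # degree l = 1
p = 7

zeros = common_zeros(f, g, p)
bound = 2 * 1                        # the Bezout bound k * l
```

In this instance the bound is attained: the common zeros are the two points on the diagonal with 2x^2 = 1 in F_7.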

In algebraic geometry, Bézout’s theorem is often stated as an equality: under
suitable assumptions, there are exactly kℓ solutions. The assumptions have to
address three issues: (a) the field has to be algebraically closed; (b) we have to
count solutions with appropriately defined multiplicity; and (c) we also have to
count solutions “at infinity.” The next drawing illustrates these issues:

[Figure: three panels, labeled “not algebraically closed,” “multiplicity,” and “at infinity.”]

We will talk about solutions at infinity later. Handling multiplicities properly
takes a substantial amount of work, and we will not consider it here. However,
Bézout’s theorem is usually applied in the inequality form.
Theorem 7.1 can be proved in several ways, for example using resultants.
The proof shown below is ingenious, short, and introduces a general approach
used for handling the concept of dimension in algebraic geometry. We begin
with some general considerations.
Coordinate rings, and measuring them. We recall from algebra that if
I is an ideal in a (commutative) ring S, we can form the quotient ring S/I,
whose elements are equivalence classes of elements of S, with a, b equivalent
if a − b ∈ I. Here we will consider the case where S is the polynomial ring
k[x1 , . . . , xn ] and I is an ideal in S.
In particular, if X ⊆ k^n is an algebraic variety and I = I(X) ⊆ k[x1 , . . . , xn ],
then the quotient ring k[x1 , . . . , xn ]/I is called the coordinate ring of X and
denoted by k[X]. It has an intuitive meaning: its elements can be represented
by polynomials, but two polynomials are considered the same if they coincide
on X (strictly speaking, this is literally true only over infinite fields).
Being determined by the ideal I = I(X), the coordinate ring carries the
same information as I, but some things are more convenient to express in
terms of the coordinate ring. Moreover, k[X] is more suitable for representing
the variety X “up to isomorphism” (which we are going to define later).
Now we want to measure the “size” of coordinate rings. Slightly more
generally, we consider an ideal I and the quotient ring R := k[x1 , . . . , xn ]/I.
They are both closed under addition and under multiplication by elements of
k, and so they are also vector spaces over k.

The vector space dimension of R, or of I, in itself is usually not a very good
measure of “size,” since it is most often infinite. Certainly, for I = I(X) and
R = k[X], it does not capture the intuitive geometric notion of dimension of
the variety X. The trick is to consider subspaces consisting of polynomials up
to some given degree d.
For the ideal I this can be done in the obvious manner: we let I≤d consist
of all polynomials in I of degree at most d. For R this is slightly more tricky,
since two polynomials representing the same element of R may have different
degrees.
We thus define Rd as the quotient vector space k[x1 , . . . , xn ]≤d /I≤d , so the
elements of Rd are represented by polynomials of degree at most d, with the
same equivalence as that for R.
By a well known fact from linear algebra about quotient spaces, we have
\[
  \dim R_d + \dim I_{\le d} \;=\; \dim k[x_1, \dots, x_n]_{\le d} \;=\; \binom{n+d}{n},
\]
the last equality being Fact 4.1.
In particular, Rd and I≤d have finite dimension for every d.
The vector-space dimension of k[X]d , considered as a function of d, carries a
lot of information about the variety X, and it has a name—again after Hilbert.

Let R = k[x1 , . . . , xn ]/I be a quotient of the polynomial ring k[x1 , . . . , xn ],
and let Rd be the vector space defined as above. Then the Hilbert function
of R (or, for I = I(X), also of X) is defined as

HF_R (d) := dim Rd .
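For a finite point set X, the Hilbert function can be computed directly: the kernel of the evaluation map from polynomials of degree at most d to functions on X is exactly I(X)≤d , so HF_X (d) equals the rank of the matrix whose rows are the monomials of degree at most d evaluated at the points of X. The following is a minimal sketch over the rationals under that observation; the helper names and the sample points are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

def monomials(n, d):
    """Exponent vectors of all monomials in n variables of degree at most d."""
    return [e for e in product(range(d + 1), repeat=n) if sum(e) <= d]

def rank(rows):
    """Rank of a rational matrix by Gaussian elimination."""
    A = [[Fraction(v) for v in row] for row in rows]
    ncols = len(A[0]) if A else 0
    r = 0
    for c in range(ncols):
        if r == len(A):
            break
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        for i in range(r + 1, len(A)):
            m = A[i][c] / A[r][c]
            for j in range(c, ncols):
                A[i][j] -= m * A[r][j]
        r += 1
    return r

def hilbert_function(points, d):
    """HF_X(d) for a finite point set X: rank of the evaluation matrix."""
    n = len(points[0])
    rows = []
    for pt in points:
        row = []
        for e in monomials(n, d):
            v = 1
            for x, a in zip(pt, e):
                v *= x ** a
            row.append(v)
        rows.append(row)
    return rank(rows)

# Three collinear points in the plane (a made-up example).
X = [(0, 0), (1, 1), (2, 2)]
values = [hilbert_function(X, d) for d in range(4)]
```

The number of monomials agrees with the binomial coefficient from Fact 4.1, and for the three collinear points the Hilbert function grows by one per degree until it reaches the number of points.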

If I ⊆ I' ⊆ k[x1 , . . . , xn ] are ideals and R, R' are the corresponding quotient
rings, we have HF_R ≥ HF_R' (this follows from HF_R (d) = \binom{n+d}{n} − dim I≤d ). For
varieties this yields HF_X ≤ HF_X' for X ⊆ X' , which we will freely use in the
sequel.

Proof of Theorem 7.1. The plan for proving the planar Bézout inequality is now
this:
(i) We check that if X ⊆ k^2 is an m-point set, then the Hilbert function of
X is at least m for all sufficiently large d.

(ii) We show that if f and g have no nonconstant common factor, then
HF_R (d) ≤ kℓ, where R := k[x, y]/⟨f, g⟩, again for sufficiently large d.
To prove (i), let X = {a1 , . . . , am } ⊂ k^2 , and let us choose a system
ϕ1 , . . . , ϕm of functions X → k that are linearly independent. For example,
we can set ϕi (aj ) := δij , the Kronecker delta, with δii = 1 and δij = 0 for i ≠ j.
According to Exercise 4.4(a), for each ϕi there is a polynomial pi ∈ k[x, y]
whose values on X coincide with ϕi . Then the pi are linearly independent as
elements of the coordinate ring k[X], and this proves dim k[X]d ≥ m for all
d ≥ max deg pi . (This argument works for any number of variables, not only
two.)
As for (ii), let us first consider the ideals K := ⟨f ⟩ and L := ⟨g⟩. We claim
that for d ≥ k, we have dim K≤d = dim k[x, y]≤d−k . This is because every
polynomial p ∈ K has the form p = af , and p determines a uniquely. (Here
we use that k[x, y] is a unique factorization domain; Exercise 7.3 below.) Of
course, we also have dim L≤d = dim k[x, y]≤d−ℓ for d ≥ ℓ.
What we want to bound is dim I≤d , where I = ⟨f, g⟩. We have I = {af +bg :
a, b ∈ k[x, y]} = {p + q : p ∈ K, q ∈ L}. The sum of two polynomials of degree
at most d again has degree at most d, and hence I≤d ⊇ K≤d + L≤d .
Exercise 7.2. Find an example where this inclusion is proper.
Fortunately, since we need to bound dim Rd from above and thus dim I≤d
from below, the inclusion goes in the right direction. By the well-known formula
for the dimension of a sum of vector spaces, we have

dim(K≤d + L≤d ) = dim K≤d + dim L≤d − dim(K ∩ L)≤d .

It remains to note that, since f and g have no common factor, a polynomial
divisible by both f and g must be divisible by f g, and so K ∩ L = ⟨f g⟩. Hence
dim(K ∩ L)≤d = dim k[x, y]≤d−k−ℓ for d ≥ k + ℓ.
The rest is calculation with binomial coefficients:

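\[
\begin{aligned}
\dim R_d &= \dim k[x,y]_{\le d} - \dim I_{\le d} \\
         &\le \dim k[x,y]_{\le d} - \dim (K_{\le d} + L_{\le d}) \\
         &= \binom{d+2}{2} - \binom{d-k+2}{2} - \binom{d-\ell+2}{2} + \binom{d-k-\ell+2}{2} \\
         &= k\ell
\end{aligned}
\]
(assuming d ≥ k + ℓ).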
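The final step is a second-difference identity for the quadratic d ↦ binom(d+2, 2), and it is easy to confirm numerically. A quick check for small parameters (the function name is a hypothetical helper, and the parameter ranges are arbitrary):

```python
from math import comb

def bezout_dimension_bound(d, k, l):
    """Right-hand side of the binomial calculation, valid for d >= k + l."""
    return (comb(d + 2, 2) - comb(d - k + 2, 2)
            - comb(d - l + 2, 2) + comb(d - k - l + 2, 2))

# Try all degrees 1 <= k, l <= 5 and a few values d >= k + l.
checks = [(k, l, d)
          for k in range(1, 6)
          for l in range(1, 6)
          for d in range(k + l, k + l + 5)]
all_ok = all(bezout_dimension_bound(d, k, l) == k * l for k, l, d in checks)
```

In particular, the value does not depend on d once d ≥ k + ℓ.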

Exercise 7.3. We recall that a (commutative) ring R is called an integral
domain if the product of every two nonzero elements is nonzero. An element
a ∈ R is irreducible if it cannot be written as a product a = bc with neither b
nor c invertible.
(a) Let R be an integral domain in which every nonzero element has a unique
factorization into irreducibles (unique up to reordering and multiplication by
invertible elements). The content cont(f ) of a polynomial f ∈ R[x] is defined
as the greatest common divisor of all coefficients of f . Show that cont(f g) =
cont(f ) cont(g).
(b) Prove that every univariate polynomial f ∈ k[x] over a field has a unique
factorization into irreducible polynomials.
(c) Prove by induction on n that every f ∈ k[x1 , . . . , xn ] has a unique fac-
torization into irreducible polynomials.

8 More properties of varieties


In this section we introduce further basic notions and results concerning algebraic
varieties. Building this theory properly with all details requires much
more space, and so we try to present a reasonable selection. We will encounter
many clever and sophisticated notions, and one should not expect to master all
of them quickly, but hopefully they will look less frightening next time. Reading
this section should give some first impression and basic vocabulary; for serious
work one should study a proper textbook.

8.1 Irreducible components

Irreducible varieties. The union of the x-axis and y-axis in the plane
is an algebraic variety, namely, V (xy), which can naturally be decomposed
into two proper subvarieties, V (x) and V (y). Varieties that cannot be further
decomposed are called irreducible:

A variety X ⊆ k^n is irreducible if we cannot express X = X1 ∪ X2 with
X1 and X2 both varieties and proper subsets of X.

As we have remarked, some sources even reserve the term variety only for
irreducible varieties, and irreducibility is extremely important. We have already
seen a hint of this in Bézout’s inequality, and many other theorems require ir-
reducibility assumptions. For example, it turns out that an irreducible variety
over an algebraically closed field has the same “local dimension” in the neigh-
borhood of each point (we have not yet defined dimension rigorously, but surely
the reader has some intuitive idea), while a reducible variety may be, e.g., the
union of a plane and a line.

Exercise 8.1. (a) (Any field) Show that if a variety X ⊆ k^n is irreducible,
then I = I(X) is a prime ideal; that is, f g ∈ I implies f ∈ I or g ∈ I.
(b) (Algebraically closed field) Prove that if X ⊆ k^n is a variety with k
algebraically closed such that I(X) is prime, then X is irreducible.
(c) Check that a prime ideal is radical, but not necessarily the other way
around.

Proposition 8.2. Every variety X can be decomposed as a finite union X =
X1 ∪ · · · ∪ Xk of irreducible varieties. Moreover, assuming that Xi ⊄ Xj for all
i ≠ j, the decomposition is unique up to reordering.

The Xi as in the proposition are called the irreducible components of X.

Sketch of proof. Finiteness follows from the Hilbert basis theorem: if we could
keep decomposing indefinitely, we would obtain an infinite descending chain
of varieties X1 ⊋ X2 ⊋ X3 ⊋ · · · , whose corresponding ideals would form an
infinite ascending chain, and this is impossible since k[x1 , . . . , xn ] is Noetherian.
As for uniqueness, assuming two minimal decompositions into irreducibles
X = X1 ∪ · · · ∪ Xk = X'1 ∪ · · · ∪ X'ℓ , we observe that if some Xi were not among
the X'j , then the Xi ∩ X'j would properly decompose Xi or vice versa. Indeed,
Xi is the union of the varieties Xi ∩ X'j , so by irreducibility Xi ⊆ X'j for
some j; symmetrically, X'j ⊆ Xi' for some i', and Xi ⊆ Xi' forces i' = i and
hence Xi = X'j .

One of the basic sources of difficulties in algebraic geometry is that the
intersection of irreducible varieties need not be irreducible. A simple example is
with two irreducible algebraic curves in k^2 intersecting in at least two points,
but there are more interesting higher-dimensional examples as well, one of them
to be mentioned in Section 9 below.

We also stress that the task of finding the irreducible decomposition of a
given variety is highly nontrivial in general, although algorithmically solvable.
The Zariski topology. In the language of algebraic geometry, a set S ⊆ k^n is
called Zariski closed or just closed if it is a variety, and it is (Zariski) open
if its complement is a variety. Readers familiar with the notion of topological
space can check that this defines a topology on k^n , although a somewhat peculiar
one. Nonempty open sets are very big (assuming an infinite field), they
are dense in k^n and every two intersect. Thus, the topology is not Hausdorff.
Yet it provides a convenient framework and terminology.
Exercise 8.3. Let X ⊆ k^m and Y ⊆ k^n be irreducible varieties. Prove that
the product X × Y ⊆ k^{m+n} is irreducible as well.

8.2 Morphisms of affine varieties


Having defined a class of objects, affine algebraic varieties in our case, one
should ask what is an appropriate notion of morphisms of the objects. Familiar
examples of morphisms include linear maps of vector spaces, homomorphisms
of groups, rings, fields, but also of graphs, and continuous maps of topological
spaces.
For affine algebraic varieties, morphisms are called regular maps. A polynomial
map f : k^m → k^n is a map f = (f1 , . . . , fn ) such that each fi is given
by a polynomial in k[x1 , . . . , xm ]. If X ⊆ k^m and Y ⊆ k^n are varieties, then
a regular map f : X → Y is a map that is a restriction of a polynomial map
f̄ : k^m → k^n to X and satisfies f (X) ⊆ Y .
An isomorphism of affine varieties is a regular map with a regular inverse.
While the affine line R is homeomorphic as a topological space to the “cusp
curve” V (x^2 − y^3 ), it can be shown that they are not isomorphic as affine
varieties.

[Figure: the cusp curve y = x^{2/3} .]

We note that if f : X → Y is a regular map and ϕ : Y → k is a polynomial
function on Y , i.e., an element of the coordinate ring k[Y ], then the composition
ϕ ◦ f : X → k belongs to k[X]. Thus, the composition with f induces a mapping
f^∗ : k[Y ] → k[X] (note the change of direction compared to f !). Moreover,
f^∗ is a k-algebra homomorphism, meaning that it is a ring homomorphism for
which, in addition, f^∗ (α) = α for every α ∈ k.
Conversely, it is not hard to show that every k-algebra homomorphism
k[Y ] → k[X] equals f^∗ for some regular map f : X → Y .
Exercise 8.4. Prove that; start with X = k^m , Y = k^n .
It follows that two varieties are isomorphic exactly if their coordinate rings
are isomorphic as k-algebras. So the coordinate ring provides a “coordinate-
free” representation of a variety, independent of a specific embedding of the
variety in some k^n .

A useful way of proving irreducibility. Let X ⊆ k^m be an irreducible
variety, and let f : X → k^n be a regular map. Then it is easy to check that the
image f (X) is irreducible, but the statement has to be understood in the right
way.
Indeed, as we will discuss below in more detail, f (X) need not be a variety!
So we generalize irreducibility to an arbitrary set S ⊆ k^n , meaning that we
cannot write S = (S ∩ X1 ) ∪ (S ∩ X2 ), where X1 , X2 are varieties and S ∩ X1 ≠
S ≠ S ∩ X2 . Then we can see that if f (X) were reducible, then so would be X,
because the preimage of a variety under a regular map is always a variety, as is
easy to check.
Thus, in particular, if we can express some variety Y parametrically, as the
image of some k^m , or of some other irreducible variety X, under a polynomial
map f , then Y is irreducible. More generally, it suffices that the image f (X) be
Zariski dense in Y , meaning that Y is the smallest variety containing f (X).
As an example, let m, n and r ≤ min(m, n) be natural numbers, and consider
the determinantal variety Dr (m, n) consisting of all m × n matrices,
considered as points in k^{mn} , that have rank strictly smaller than r. This is
indeed a variety since the rank condition can be expressed as vanishing of all
r × r minors. Since an m × n matrix A has rank at most r − 1 iff it can be
expressed as a product U V , where U is m × (r − 1) and V is (r − 1) × n, we have
a surjective regular map k^{(r−1)(m+n)} → Dr (m, n), and hence the determinantal
variety is irreducible.
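The parametrization (U, V ) ↦ U V can be checked on a concrete instance. The sketch below uses made-up integer matrices with m = 3, n = 4, r = 3, and hypothetical helper names; it verifies that all r × r minors of the product vanish, so the product lies in Dr (m, n).

```python
from itertools import combinations

def matmul(U, V):
    """Product of an m x s matrix U and an s x n matrix V (lists of rows)."""
    return [[sum(U[i][t] * V[t][j] for t in range(len(V)))
             for j in range(len(V[0]))]
            for i in range(len(U))]

def det(M):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def minors(A, r):
    """All r x r minors of A."""
    m, n = len(A), len(A[0])
    return [det([[A[i][j] for j in cols] for i in rows])
            for rows in combinations(range(m), r)
            for cols in combinations(range(n), r)]

# Hypothetical example: U is 3 x 2 and V is 2 x 4, so U V has rank at most 2.
U = [[1, 2], [3, 4], [5, 6]]
V = [[1, 0, 2, 1], [0, 1, 1, 3]]
A = matmul(U, V)

in_D3 = all(x == 0 for x in minors(A, 3))       # rank < 3
not_in_D2 = any(x != 0 for x in minors(A, 2))   # but rank is not < 2
```

Integer arithmetic keeps the check exact; the Laplace expansion is exponential in the matrix size and is used here only because the matrices are tiny.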
Projections and images of affine varieties: constructible sets. Let us
consider the variety X := V (xy −1), a hyperbola, and project it onto the x-axis:

The projection π(X) is the x-axis minus 0, certainly not an algebraic variety.
Passing to an algebraically closed setting, complex numbers, does not help—the
0 is still missing. So affine algebraic varieties are not closed under projections,
and under regular maps in general.
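The missing point can also be seen by brute force over a small finite field, used here only as a stand-in that a computer can enumerate (the text itself concerns R and C); the prime 7 and the function name are arbitrary choices made for this sketch.

```python
def projection_of_hyperbola(p):
    """x-coordinates of all points of V(xy - 1) in F_p^2."""
    return sorted({x for x in range(p) for y in range(p)
                   if (x * y - 1) % p == 0})

image = projection_of_hyperbola(7)
```

Every nonzero x has an inverse y, so the image is everything except 0, just as in the picture over the reals.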
One remedy is to add points at infinity and work in the projective space—
see Section 8.5 below. Another approach is to consider a larger class consisting
of all sets obtainable from varieties by finitely many set-theoretical operations;
these are called constructible sets. Using the fact that varieties are closed
under intersections and finite unions, it is not difficult to check that every
constructible set can be written as

(X1 \ Y1 ) ∪ · · · ∪ (Xk \ Yk ),

for varieties X1 , Y1 , . . . , Xk , Yk , where we may assume the Xi irreducible and
Yi ⊊ Xi . Then Yi can be regarded as a set of “exceptional points” in Xi ; as we
will discuss in Section 8.3, it has smaller dimension than Xi .
We state the following result without proof:
Theorem 8.5 (Chevalley’s theorem). Let k be an algebraically closed field, and
let π : k^{m+n} → k^n denote the projection on the last n coordinates. Then π(Z)
is a constructible set for every constructible set Z ⊆ k^{m+n} and, in particular,
for every variety Z.
This is actually a result about quantifier elimination in the theory of algebraically
closed fields, and a nice proof can be found in [MO02].
Corollary 8.6. The image of a constructible set Z ⊆ k^m under a regular map
f : k^m → k^n is a constructible set.
Sketch of proof. This is a generally useful trick: one needs to check that the
graph G := {(x, f (x)) ∈ k^m × k^n : x ∈ Z} is a constructible set; then f (Z) =
π(G) is constructible by Chevalley’s theorem.

Rational maps. A rational map ϕ : k^m → k^n is given by an n-tuple of
rational functions
ϕ = (f1 /g1 , . . . , fn /gn ),
where f1 , g1 , . . . , fn , gn ∈ k[x1 , . . . , xm ] are polynomials, none of the gi identically
zero.
There is a catch: a rational map is not really a map in the usual sense,
because it is undefined on the zero sets of the gi (for this reason, instead of the
usual mapping arrow →, one uses ⇢ for a rational map). Nevertheless, it is
defined on a Zariski open subset of k^m , and it is still useful.
A rational map ϕ : X ⇢ Y of varieties, with X ⊆ k^m irreducible and
Y ⊆ k^n is, similar to regular maps, a restriction of a rational map ϕ : k^m ⇢
k^n to X such that ϕ(X) ⊆ Y , but with the extra condition that none of
the denominators gi (assuming fi and gi having no common factors) vanishes
identically on X.
Two rational maps X ⇢ Y are considered equivalent if they agree on a
nonempty Zariski open subset of X (they may be defined on different Zariski
open subsets of X, though).
We have seen that an algebraic counterpart of regular maps X → Y are
k-algebra homomorphisms k[Y ] → k[X] of the coordinate rings. Similarly, rational
maps ϕ : X ⇢ Y of irreducible varieties correspond to k-algebra homomorphisms
k(Y ) → k(X), where k(X) is the quotient field of the coordinate
ring k[X] (which is an integral domain for X irreducible, so a quotient field
makes sense).
The corresponding notion of isomorphism is called birational equivalence,
and it is more permissive than the isomorphism defined by regular maps. For
example, it is known, and not extremely difficult to prove, that every variety
(over an algebraically closed field) is birationally equivalent to a hypersurface,
i.e., a variety defined by a single polynomial.

8.3 Dimension and degree
The dimension of algebraic varieties is defined algebraically, and it has several
rather different-looking but equivalent definitions. Here we will mention only
some of them, and we will not prove their equivalence.
In this section we will assume an algebraically closed field unless stated
otherwise. Things are considerably subtler over an arbitrary field, and it is
often preferable to work with schemes there, rather than varieties.
Dimension. Here is a definition which is very simple to state, but rather
difficult to work with. The dimension of a variety X is the largest n such
that there is a chain of properly increasing irreducible varieties ∅ ( X0 ( · · · (
Xn ⊆ X. (In particular, the empty variety ∅ has dimension −1.)
The idea is that a proper subvariety of an irreducible variety must be of lower
dimension; note that the same definition works for finite-dimensional vector
spaces. Since, in the algebraically closed case, irreducible varieties correspond
to prime ideals (Exercise 8.1), the dimension is also the length of the longest
chain of properly nested prime ideals containing I(X) (this notion is called the
Krull dimension of the coordinate ring of X).
With this definition, even dim k^n = n is not obvious (but it is true).
A geometric view, and degree. Another, more geometric way is to define
the dimension of a variety X ⊆ kn as the largest dimension k of a linear
subspace H ⊂ kn such that there is a projection π : kn → H with π(X) Zariski
dense in H. Here a projection is a linear map π : kn → kn such that π ◦ π = π,
and H = π(kn ).
Another, but equivalent, geometric definition of the dimension considers
only the usual projections on all k-dimensional coordinate subspaces.
It turns out that the property of π(X) being Zariski dense in H = π(kn ) is
generic, in the sense that the set of the π not having this property is negligible:
if we parameterize all projections π onto k-dimensional subspaces by suitable
coordinates, then those with π(X) not Zariski dense in H satisfy a nontrivial
polynomial equation.
This point of view also brings us to the notion of degree. For a projection
π and a point y ∈ H = π(kn ), let us consider the number of preimages |{x ∈
X : π(x) = y}|. It can be shown that for π and y generic, this number is finite
and depends only on X. It is called the degree of X and denoted by deg X.
There is also a “dual” view: if X is a k-dimensional variety in kn , then a
generic (n − k − 1)-dimensional affine subspace of kn avoids X, while a generic
(n − k)-dimensional affine subspace intersects it in deg X points.
Dimension and regular maps. Regular maps do not increase dimension:
if X and Y are varieties (over an algebraically closed field) and f (X) = Y ,
or more generally, if f (X) is Zariski dense in Y for a regular map f , then
dim Y ≤ dim X. Moreover, if we have dim f −1 (y) = m for all y from a Zariski
dense subset of Y , then dim Y = dim X − m. Proofs can be found in many
introductory textbooks.
Generalized Bézout. If X, Y are varieties (over an algebraically closed field),
then deg(X ∩ Y ) ≤ (deg X)(deg Y ), which can be seen as a generalization of

Bézout’s inequality (see Heintz [Hei83]).
The Hilbert function and the Hilbert polynomial. We recall that the
Hilbert function of a variety X is defined as the Hilbert function HFk[X] of
its coordinate ring, and the value HFk[X] (d) is the dimension of the vector
space k[X]d , which consists of polynomials of degree at most d modulo the
polynomials in I(X) of degree at most d.
It turns out that for all sufficiently large d, the Hilbert function coincides
with a polynomial, called the Hilbert polynomial of X. More precisely, for
every quotient ring R = k[x1, . . . , xn]/I there exist d0 and a polynomial, denoted
by HP_R and obviously uniquely determined, such that HP_R(d) = HF_R(d)
for all d ≥ d0.
This fact, mysterious as it may look, is not difficult. A short algebraic
proof can be found, e.g., in [Sch03, Lemma 2.3.3], and below we will provide a
geometric picture explaining the polynomial behavior.
The Hilbert polynomial provides a seemingly very different definition of
dimension and degree:

The dimension k of an affine algebraic variety X is the degree of its Hilbert
polynomial HP_{k[X]}, and the degree of X is k! times the leading coefficient
of the Hilbert polynomial.

Monomial orderings. For presenting the promised geometric view of the
Hilbert function, we first need to define a linear ordering of the monomials
in k[x1, . . . , xn]; this will also be indispensable later, when we briefly discuss
computational aspects of ideals and varieties.
One particular ordering which works for our purposes is the graded lexicographic
ordering: for two monomials x^α = x1^{α1} · · · xn^{αn} and x^β, we first
compare the degrees, i.e., ‖α‖_1 = ∑_{i=1}^n αi and ‖β‖_1, and if they are equal, we
compare the nonnegative integer vectors α and β lexicographically.
More generally, a monomial ordering is a linear ordering ≤ on Z^n_{≥0} (we
identify monomials with their exponent vectors) that is a well-ordering,⁴ and
such that α < β implies α + γ < β + γ for every γ ∈ Z^n_{≥0}. For the considerations
in this section, we also need the monomial ordering to be graded, meaning that
‖α‖_1 < ‖β‖_1 implies α < β.
So we fix a graded monomial ordering ≤. Then every polynomial f ∈
k[x1 , . . . , xn ] has a uniquely determined leading monomial LM(f ), the one
that is the largest according to the monomial ordering.
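
As a small illustration, here is a sketch of the graded lexicographic ordering and of LM(f) in Python. The representation of a polynomial as a dictionary from exponent vectors to coefficients, and the names grlex_key and leading_monomial, are our own assumptions made for this example; the test polynomial is the one from Section 1.

```python
# A polynomial is a dict mapping exponent tuples to nonzero coefficients
# (an assumed representation, chosen only for this sketch).

def grlex_key(alpha):
    """Sort key for the graded lexicographic ordering:
    compare total degrees first, then the exponent vectors lexicographically."""
    return (sum(alpha), alpha)

def leading_monomial(poly):
    """Exponent vector of the largest monomial of a nonzero polynomial."""
    return max(poly, key=grlex_key)

# 13 x^5 y^3 z - 6 x^2 y^4 z^2 + y^2 - 2 z, with variables ordered (x, y, z)
f = {(5, 3, 1): 13, (2, 4, 2): -6, (0, 2, 0): 1, (0, 0, 1): -2}
print(leading_monomial(f))          # exponent vector of x^5 y^3 z

# Among monomials of the same degree, lexicographic comparison decides:
print(sorted([(1, 1), (0, 2), (2, 0), (0, 3)], key=grlex_key))
```

Since the key is the pair (degree, exponent vector), the ordering is graded by construction, and adding a fixed γ to both sides of a comparison does not change its outcome.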
For an ideal I in k[x1, . . . , xn], we let LM(I) := ⟨LM(f) : f ∈ I⟩; this is
a monomial ideal, meaning that it is generated by monomials (but of course,
being an ideal, it also contains polynomials that are not monomials). We should
also warn that if I is generated by some polynomials f1, . . . , fm, LM(I) may be
larger than ⟨LM(f1), . . . , LM(fm)⟩; the reader may want to find an example.
The next claim, which we leave as an exercise, shows that, as far as the
Hilbert function is concerned, it is enough to deal with monomial ideals.
⁴ That is, every nonempty subset has a minimum element.

Exercise 8.7. Let us fix a graded monomial ordering, let I be an ideal in
k[x1, . . . , xn], let I′ := LM(I), and let R := k[x1, . . . , xn]/I and R′ := k[x1, . . . , xn]/I′
be the corresponding quotient rings.
(a) Show that I_{≤d} has a basis (f1, . . . , fm) such that LM(f1) > · · · >
LM(fm), and derive dim I_{≤d} ≤ dim I′_{≤d}.

(b) Prove that if the fi constitute a basis of I_{≤d} as in (a), then LM(f1), . . . , LM(fm)
generate I′_{≤d}. Conclude that HF_{R′} = HF_R.
(c) Where does the argument use the assumption that the monomial ordering
is graded?

The proof in the exercise also shows that all monomials in I′ = LM(I) are
linearly independent, and that each I′_{≤d} has a basis consisting of monomials.
Let us consider Z^n_{≥0}, all n-tuples of nonnegative integers, and let us color
the exponent vector of every monomial in the monomial ideal I′ black; here is
a picture for n = 2:
[Figure: black and white dots in the (α1, α2)-plane, with the halfspace α1 + α2 ≤ d indicated.]

Since I′ is an ideal, the black dots are the union of finitely many “corners”, i.e.,
translations of the nonnegative orthant, one corner for each generator. The
generators are marked by double circles.
The number of black dots in the halfspace α1 + · · · + αn ≤ d is the vector-space
dimension of I′_{≤d} (since the corresponding monomials form a basis), and
hence the value of HF_{R′}(d) is the number of white dots in that halfspace (because
HF_{R′}(d) = \binom{n+d}{d} − dim I′_{≤d}; we do not claim that the corresponding
monomials form a basis).
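
The dot-counting interpretation is easy to try out. The following Python sketch counts white dots for a monomial ideal given by the exponent vectors of its generators; the ideal I′ = ⟨x^2 y, y^3⟩ in k[x, y] and the function names are hypothetical, chosen only for this example.

```python
from itertools import product
from math import comb

def is_black(alpha, generators):
    """True if x^alpha is divisible by some generator of the monomial ideal."""
    return any(all(a >= g for a, g in zip(alpha, gen)) for gen in generators)

def hilbert_function(generators, n, d):
    """Number of white dots alpha in Z^n_{>=0} with alpha_1 + ... + alpha_n <= d."""
    return sum(1 for alpha in product(range(d + 1), repeat=n)
               if sum(alpha) <= d and not is_black(alpha, generators))

# Hypothetical example: I' = <x^2 y, y^3> in k[x, y].
gens = [(2, 1), (0, 3)]
print([hilbert_function(gens, 2, d) for d in range(7)])

# Black dots in the halfspace alpha_1 + alpha_2 <= 3: all dots minus white dots.
print(comb(2 + 3, 3) - hilbert_function(gens, 2, 3))
```

For this ideal the white dots consist of the whole α1-axis plus four further points, so the values agree with the linear polynomial d + 5 from d = 3 on, while the value at d = 2 still belongs to the irregular part. The Hilbert polynomial has degree 1, in accordance with V(I′) being the x-axis.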
From this interpretation one can see why the Hilbert function eventually
becomes a polynomial: the key observation is that if we ignore a finite num-
ber of “irregular” white dots near the origin, the remaining white dots can be
organized into finitely many disjoint axes-parallel “orthants” of various dimen-
sions (semiinfinite rays, quadrants of planes, octants of 3-dimensional subspaces,
etc.); this is not quite a proof but almost. The following 3-dimensional picture
illustrates how the halfspace ∑ αi ≤ d sweeps the set of white dots, after it has
already passed the irregular part:

[Figure: the halfspace ∑ αi ≤ d sweeping the white dots in Z^3_{≥0}.]
Finally, let us see why the growth of the Hilbert polynomial is related to
the geometric dimension of V(I′), at least for a monomial ideal I′.
Some thought reveals that HP_{R′} grows at least linearly iff at least one of
the coordinate axes has no black dots. Assuming, e.g., that all dots on the
α1-axis are white, this means that every generator in the monomial ideal I′ is
a multiple of one of x2, . . . , xn, and hence the x1-axis is contained in V(I′).
Similarly, deg HP_{R′} ≥ 2 iff there is a two-dimensional coordinate plane
without a black point. Assuming it is the α1α2-plane, we can see that the
x1x2-plane is contained in V(I′), and so on. In general, the degree of the Hilbert
polynomial is the largest dimension of a coordinate subspace contained in V(I′).
(And since I′ is a monomial ideal, V(I′) is the union of coordinate subspaces.)
The proofs relating the Hilbert polynomial to the other definitions of di-
mension and degree mentioned earlier are not too difficult, but here we do not
treat them.

8.4 Computation with ideals and Gröbner bases


Here we briefly consider algorithmic questions concerning varieties and ideals.
A basic question is ideal membership. Given an ideal I ⊆ k[x1 , . . . , xn ],
specified by a list of generators, i.e., I = ⟨f1, . . . , fm⟩, how can we test whether
a given polynomial g belongs to I?
Recalling that g ∈ I means g = ∑_{i=1}^m hi fi for some hi, one way might be to
look for the hi , say by solving a system of linear equations for their coefficients.
But, as we have remarked, the required degrees of the hi may be very high, and
this method is not practical.
If we have n = 1, i.e., univariate polynomials, every ideal can be generated
by a single polynomial f, and testing whether g ∈ ⟨f⟩ is very simple: we just
reduce g modulo f and see if the remainder is 0. This, of course, assumes that
we know a single generator: if I is given by several generators f1 , . . . , fm , then
we first need to compute their greatest common divisor.
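
The univariate test can be sketched in a few lines of Python over the rationals. This is only an illustration under our own conventions: a polynomial is a list of coefficients with p[i] the coefficient of x^i, the function names are invented for this example, and at least one generator is assumed to be nonzero.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (p[i] is the coefficient of x^i)."""
    while p and p[-1] == 0:
        p = p[:-1]
    return p

def poly_rem(g, f):
    """Remainder of g on division by a nonzero polynomial f."""
    g = trim([Fraction(c) for c in g])
    f = trim([Fraction(c) for c in f])
    while len(g) >= len(f):
        shift = len(g) - len(f)
        q = g[-1] / f[-1]                 # cancels the leading term of g
        for i, c in enumerate(f):
            g[shift + i] -= q * c
        g = trim(g)
    return g

def poly_gcd(a, b):
    """Greatest common divisor by the Euclidean algorithm (not normalized)."""
    a = trim([Fraction(c) for c in a])
    b = trim([Fraction(c) for c in b])
    while b:
        a, b = b, poly_rem(a, b)
    return a

def in_ideal(g, generators):
    """Test g in <f_1, ..., f_m> by reducing g modulo the gcd of the generators."""
    f = []
    for gen in generators:
        f = poly_gcd(f, gen)
    return poly_rem(g, f) == []

gens = [[-1, 0, 1], [1, -2, 1]]           # x^2 - 1 and x^2 - 2x + 1, with gcd x - 1
print(in_ideal([-1, 0, 0, 1], gens))      # x^3 - 1 is a multiple of x - 1
print(in_ideal([1, 1], gens))             # x + 1 is not
```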
The division algorithm. Back in the multivariate setting and trying to
proceed analogously, the first question is, given generators f1, . . . , fm, what
does it mean to reduce g “modulo f1, . . . , fm”? We would like to write
g = a1 f1 + · · · + am fm + r, for suitable polynomials a1 , . . . , am and r, where r
should be a “remainder” after the division of g by the fi .
A good way of doing this is to fix a monomial ordering ≤, as introduced in
the previous section (but this time it need not be graded), and always try to get
rid of the leading monomial of the current g by subtracting the right multiple
of some fi .
Here is the division algorithm. It receives g as input, and successively
reduces it by subtracting suitable multiples of the fi , while simultaneously
building the remainder r.

1. Set r := 0.

2. Let µ := LM(g) be the leading monomial of the current g. If there is
some i such that LM(fi) divides µ, choose one (arbitrarily), and subtract
the appropriate multiple of fi from g so that the coefficient of µ after the
subtraction is 0. If the new g is 0, finish; otherwise, repeat this step with
the new g. If there is no such i, go to the next step.

3. At this point none of the LM(fi) divides LM(g). Subtract the leading
term of g (i.e., LM(g) with the coefficient it has in g) from g and add it
to r. If g = 0, finish, and otherwise, go back to the previous step.

This algorithm is finite, since it strictly decreases LM(g), according to the
monomial ordering, in each step, and a monomial ordering is a well-ordering.
But unfortunately, it is not sufficient to test ideal membership unless we
have a very good set of generators. For example, if we run it with f1 = x^2 + y,
f2 = xy + x, g = x^2 − y^2, and the graded lexicographic order as in the preceding
section, we get a nonzero remainder −y^2 − y. Yet g ∈ ⟨f1, f2⟩, since x^2 − y^2 =
−yf1 + xf2.
The problem here is that, in the expression −yf1 + xf2 , the leading terms
cancel out.
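
The division algorithm and the example above can be replayed with the following Python sketch. It is a minimal illustration rather than an efficient implementation: polynomials are dictionaries from exponent vectors to coefficients, the ordering is graded lexicographic with x > y, and all function names are our own.

```python
from fractions import Fraction

def grlex_key(alpha):
    return (sum(alpha), alpha)

def leading(poly):
    """Leading monomial (exponent tuple) of a nonzero polynomial, with its coefficient."""
    mono = max(poly, key=grlex_key)
    return mono, poly[mono]

def divides(a, b):
    """Does the monomial x^a divide x^b?"""
    return all(x <= y for x, y in zip(a, b))

def subtract_multiple(g, coeff, shift, f):
    """Return g - coeff * x^shift * f, dropping terms with zero coefficient."""
    result = dict(g)
    for mono, c in f.items():
        m = tuple(s + e for s, e in zip(shift, mono))
        value = result.get(m, 0) - coeff * c
        if value == 0:
            result.pop(m, None)
        else:
            result[m] = value
    return result

def divide(g, fs):
    """Remainder r of g on division by the list fs, following steps 1-3."""
    g = {m: Fraction(c) for m, c in g.items() if c != 0}
    r = {}                                        # step 1
    while g:
        mu, c = leading(g)
        for f in fs:                              # step 2: try to cancel mu
            lm, lc = leading(f)
            if divides(lm, mu):
                shift = tuple(a - b for a, b in zip(mu, lm))
                g = subtract_multiple(g, c / lc, shift, f)
                break
        else:                                     # step 3: move the term to r
            r[mu] = c
            del g[mu]
    return r

f1 = {(2, 0): 1, (0, 1): 1}       # x^2 + y
f2 = {(1, 1): 1, (1, 0): 1}       # xy + x
g = {(2, 0): 1, (0, 2): -1}       # x^2 - y^2
print(divide(g, [f1, f2]))        # the remainder -y^2 - y

# Check the identity g = -y*f1 + x*f2, which shows that g lies in the ideal.
combo = subtract_multiple({}, 1, (0, 1), f1)       # -y * f1
combo = subtract_multiple(combo, -1, (1, 0), f2)   # ... + x * f2
print(combo == g)
```

Here "choose one (arbitrarily)" is resolved by taking the first fi in the list whose leading monomial divides µ; a different choice rule may give a different remainder.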
Gröbner bases. It turns out that, for a given monomial ordering, every
polynomial ideal I has a “very good” set of generators, called a Gröbner⁵ basis,
for which the division algorithm above is guaranteed to test membership in I
correctly: it returns remainder 0 iff g ∈ I.
This can be taken as a definition of a Gröbner basis. An equivalent condi-
tion, and the usual definition, is this:

An m-tuple f1, . . . , fm is a Gröbner basis of an ideal I, w.r.t. a given
monomial order, if I = ⟨f1, . . . , fm⟩ and LM(I) = ⟨LM(f1), . . . , LM(fm)⟩;
in words, the leading monomials of the fi generate the ideal of the leading
monomials of all polynomials in I.

A Gröbner basis of I w.r.t. one monomial order may fail to be a Gröbner
basis for a different monomial order.
⁵ Often spelled Groebner in English texts and software.

A Gröbner basis f1, . . . , fm is called reduced if it satisfies a certain natural
minimality condition, namely, the leading monomials of the fi have coefficient
1, and no monomial in any fi is in the ideal generated by the LM(fj) for j ≠ i.
For a given I and monomial order, it can be shown that a reduced Gröbner
basis is unique.
There are algorithms that, given an arbitrary set of generators of I, compute
a Gröbner basis, usually a reduced one, w.r.t. a given monomial order. This
algorithmic task has been investigated a lot, since it is very significant both
in theory and in practice. In the worst case, the computational complexity, as
well as the size of the resulting Gröbner basis, are at least exponential in n, the
number of variables.
Once a Gröbner basis is available, we can solve the ideal membership prob-
lem by the division algorithm. Many other tasks can be solved as well: comput-
ing the sum, intersection, or quotient of two ideals; computing the dimension,
Hilbert polynomial, and Hilbert function of a given variety; solving a system of
polynomial equations; etc. The worst-case computational complexity of these
problems is again very high, but the existing implementations can sometimes
handle impressively large instances.
A nice mathematical application of these algorithms is for automatic theo-
rem proving: with Gröbner bases and some cleverness one can make a computer
program routinely prove many theorems in high-school geometry or even be-
yond it, for example, the Pappus theorem. The method is sketched in [CLO07].
Here we finish our very brief excursion to algorithms, referring to [CLO07]
for a thorough introduction.

8.5 Projective varieties


Instead of the affine space kn , algebraic geometry is usually done in the pro-
jective space Pn = Pn (k), which can be thought of as a completion of kn by
adding points at infinity in a suitable way. Then almost everything comes out
more elegantly and algebraic varieties behave much better—for example, over
an algebraically closed field, the projection of a variety is again a variety, unlike
in the affine case.
The projective space. To construct Pn formally, we consider all (n + 1)-
tuples (a0 : a1 : · · · : an ), where a0 , . . . , an ∈ k are not all simultaneously 0.
Each point a of Pn is an equivalence class of such (n + 1)-tuples consisting of
all nonzero multiples of some (n + 1)-tuple:

a = {(λa0 : λa1 : · · · : λan ) : λ ∈ k \ {0}}.

Such an equivalence class can be viewed as a line through the origin in kn+1 .
The (n + 1)-tuple (a0 : · · · : an ) is called the homogeneous coordinates of a;
these are defined only up to a scalar multiple.
The following picture illustrates, for the case n = 2, the geometric meaning
of this construction.

[Figure: the plane x0 = 1 in k^3 and lines ℓ1, ℓ2, ℓ3 through the origin 0; the line ℓ3 lies in the plane x0 = 0.]

Here k^2, to which we want to add points at infinity, is embedded in k^3 as the
gray plane x0 = 1 (where the coordinates in k^3 are x0, x1, x2 and the x0-axis is
drawn vertical). Each point a of this plane corresponds to the line 0a through
the origin in k^3.
Conversely, each line through the origin corresponds to exactly one point of
the gray plane, except for horizontal lines, such as ℓ3. When we start tilting the
line ℓ1 towards the position ℓ2 and further towards the horizontal position ℓ3,
the corresponding point in the gray plane recedes to infinity along the dashed
line. So horizontal lines such as ℓ3 correspond to points at infinity, one point
for each direction of parallel lines in the gray plane.
Algebraically, in this interpretation, a point of P^n with homogeneous coordinates
(x0 : · · · : xn) with x0 ≠ 0 corresponds to the point (x1/x0, . . . , xn/x0) ∈ k^n.
Adding the points at infinity, with x0 = 0, can be thought of as adding a copy
of P^{n−1} to k^n.
On the other hand, the structure of Pn is the same everywhere and locally, in
the neighborhood of each point, it looks like the affine space kn . In our picture,
the plane representing k2 can be rotated around 0, and this yields various ways
of placing k2 in P2 . In algebraic geometry, this allows one to transfer all kinds
of “local” notions from the affine setting to the projective one.
Projective varieties. We would like to say that projective varieties are zero
sets of polynomial systems of equations in Pn , but we have to be a bit careful.
Working in P^n, we have n + 1 coordinates x0, . . . , xn, but it does not make
sense to consider, for example, the equation x1 = x0^2, since the (n + 1)-tuple
(1 : 1 : · · · : 1) satisfies it, but (2 : 2 : · · · : 2), representing the same point of
P^n, does not.
One has to consider only zero sets of homogeneous polynomials f ∈
k[x0, . . . , xn], meaning that all monomials of f have the same degree; then
the zero set can be regarded as a subset of P^n. The counterpart for ideals
is a homogeneous ideal, one generated by homogeneous polynomials (but
necessarily containing non-homogeneous polynomials too); for such an ideal I,
the variety V(I) ⊆ P^n is well defined as the set of common zeros of all f ∈ I.
Every polynomial f ∈ k[x1, . . . , xn] can be homogenized to a homogeneous
polynomial f̃ by multiplying each term by an appropriate power of x0 so that the
resulting polynomial becomes homogeneous (and has the same degree as f). For
instance, from x_1^3 + x_1x_2 + 5 we get x_1^3 + x_0x_1x_2 + 5x_0^3. An ideal I ⊆ k[x1, . . . , xn]
is homogenized to the homogeneous ideal Ĩ = ⟨f̃ : f ∈ I⟩ ⊂ k[x0, . . . , xn].
(Let us mention that the homogenization of a generating set of I need not
generate Ĩ.) From an affine variety V(I) ⊆ k^n we thus obtain the projective
completion V(Ĩ) ⊆ P^n (it is perhaps worth mentioning that isomorphic affine
varieties may have nonisomorphic projective completions). The meaning of
I(X) for X ⊆ P^n is also modified appropriately.
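
Homogenization of a single polynomial is mechanical, as the following Python sketch shows. The dictionary representation and the function names are assumptions made for this illustration; the checks at the end are spot checks at a few integer points, not proofs.

```python
def homogenize(poly):
    """Multiply each term by the power of x0 that raises it to the total degree.

    poly maps exponent tuples (a1, ..., an) to coefficients; the result uses
    exponent tuples (a0, a1, ..., an).
    """
    d = max(sum(m) for m in poly)
    return {(d - sum(m),) + m: c for m, c in poly.items()}

def evaluate(poly, point):
    total = 0
    for mono, c in poly.items():
        term = c
        for x, e in zip(point, mono):
            term *= x ** e
        total += term
    return total

f = {(3, 0): 1, (1, 1): 1, (0, 0): 5}     # x1^3 + x1*x2 + 5
fh = homogenize(f)                        # x1^3 + x0*x1*x2 + 5*x0^3
print(fh)

# Scaling the homogeneous coordinates by 2 scales the value by 2^3,
# so the zero set of fh is a well-defined subset of P^2.
print(evaluate(fh, (2, 4, 6)), 2 ** 3 * evaluate(fh, (1, 2, 3)))

# In contrast, the non-homogeneous x1 - x0^2 vanishes at (1, 1) but not at (2, 2).
bad = {(0, 1): 1, (2, 0): -1}
print(evaluate(bad, (1, 1)), evaluate(bad, (2, 2)))
```

Setting x0 = 1 in the homogenized polynomial recovers the original one, which corresponds to restricting to the affine part of P^n.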
Many of the concepts and results from the affine setting transfer to pro-
jective varieties without change (irreducible decomposition, Zariski open and
closed sets) or with only minor modifications.
For the weak Nullstellensatz, V(I) = ∅ not only for I = ⟨1⟩, but also
when the radical of I is ⟨x0, x1, . . . , xn⟩. This irrelevant ideal also has to be
excluded in the strong Nullstellensatz; after that, over an algebraically closed
field, we have a bijective correspondence between homogeneous radical ideals
and projective varieties.
A morphism f : X → Y of projective varieties X ⊆ Pm and Y ⊆ Pn
needs to be defined locally: for each x0 ∈ X ⊆ Pm there is a Zariski open
neighborhood U and homogeneous polynomials f0 , . . . , fn ∈ k[x0 , . . . , xm ] of
the same degree such that f (x) = (f0 (x) : · · · : fn (x)) for all x ∈ U (and in
particular, at least one fi (x) must be nonzero for each x).
As for the Hilbert function, in the projective case one needs to take the
dimension of k[x0 , . . . , xn ]/I=d , where I=d is the vector subspace spanned by
homogeneous polynomials of degree exactly d in the homogeneous ideal I.
Cutting with a polynomial. If X is a k-dimensional projective variety
over an algebraically closed field and f is a polynomial, then k − 1 ≤ dim(X ∩
V (f )) ≤ k. If, moreover, X is irreducible and f does not vanish on it, then
dim(X ∩ V (f )) = k − 1.
Exercise 8.8. Show that this fails for affine varieties; dim(X ∩ V (f )) can be
smaller than dim X − 1.

Projection. Unlike in the affine case, the projection of a projective variety
is also a projective variety, and so is the image under a morphism.
One has to be slightly careful with what is meant by a projection, since in
P^n we cannot simply omit some of the homogeneous coordinates, because we
might get all 0s.
One way around this is to consider a projection as a map π : Pm × Pn → Pn ,
but strictly speaking, we have not yet defined what a variety in Pm × Pn is.
There are two equivalent ways of doing that.
First, we may embed P^m × P^n as a variety in P^{(m+1)(n+1)−1}; this is called
the Segre embedding, and it sends a pair ((x0 : · · · : xm), (y0 : · · · : yn)) to
(x0y0 : x1y0 : · · · : xiyj : · · · : xmyn). Then varieties in P^m × P^n are just the
intersections of varieties in P^{(m+1)(n+1)−1} with the embedded copy of P^m × P^n.
(We note in passing that the image of the Segre embedding is essentially the
determinantal variety D2(m + 1, n + 1) mentioned in Section 8.2.)
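
To make the Segre embedding concrete, the following Python sketch arranges the products xi yj into an (m + 1) × (n + 1) matrix and evaluates all of its 2 × 2 minors. We assume here that the determinantal variety in question consists of the matrices of rank less than 2, i.e., those with all 2 × 2 minors vanishing; the function names are ours.

```python
from itertools import combinations

def segre(x, y):
    """Segre map: all products x_i * y_j, arranged as an (m+1) x (n+1) matrix."""
    return [[xi * yj for yj in y] for xi in x]

def two_by_two_minors(matrix):
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[i][k] * matrix[j][l] - matrix[i][l] * matrix[j][k]
            for i, j in combinations(range(rows), 2)
            for k, l in combinations(range(cols), 2)]

# A point of P^2 x P^1, mapped into P^5 (six homogeneous coordinates).
image = segre((1, 2, 3), (4, 5))
print(image)
print(two_by_two_minors(image))     # all minors vanish on the image
```

Rescaling x or y by a nonzero scalar rescales all entries of the matrix by the same scalar, so the map is well defined on pairs of projective points.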
Second, and more explicitly, a variety in P^m × P^n is the common zero set
of a set of bihomogeneous polynomials f ∈ k[x0, . . . , xm, y0, . . . , yn], where f is
bihomogeneous if each monomial has degree k in the xi and degree ℓ in the yj,
for some k, ℓ, possibly with k ≠ ℓ. Then the result can be stated as follows:
Theorem 8.9 (Projection theorem). For every projective variety Z ⊆ Pm ×
Pn over an algebraically closed field, π(Z) is also a projective variety, where
π : Pm × Pn → Pn is the projection onto the second factor.

Let us prove at least something in this long section.

Proof. Let f1, . . . , fr be bihomogeneous generators of I(Z). We may assume
that all of them have the same degree k in the xi. (If, in order to achieve that,
we need to raise the degree of f1 by d, say, we replace f1 by the (m + 1)-tuple
of polynomials x_0^d f1, x_1^d f1, . . . , x_m^d f1, which does not change the zero set.)
Let us fix a point a ∈ P^n and write f_{i,a}(x) := fi(x, a). By definition,
a ∉ π(Z) means that the f_{i,a} have no common zero in P^m. By the projective
weak Nullstellensatz, this happens iff the radical of the homogeneous ideal I :=
⟨f_{1,a}, . . . , f_{r,a}⟩ ⊆ k[x0, . . . , xm] contains the irrelevant ideal ⟨x0, . . . , xm⟩. In
other words, there are s0, . . . , sm with x_i^{s_i} ∈ I, i = 0, 1, . . . , m.
For the proof to work, we transform this condition further: setting s :=
s0 + · · · + sm, we can see that I contains all homogeneous polynomials of degree s.
So a ∉ π(Z) if and only if there exists s such that for every homogeneous
g ∈ k[x0, . . . , xm] of degree s we can find h1, . . . , hr ∈ k[x0, . . . , xm] with

\[ g = \sum_{i=1}^{r} h_i f_{i,a}. \tag{3} \]

Here, crucially, since all the f_{i,a} are homogeneous of degree k, we may
assume that the hi are homogeneous of degree s − k, because monomials of any
other degree can be discarded from them without changing the validity of (3).
Therefore, for every g, (3) can be rewritten as a system of linear equations
for the unknown coefficients of the hi. The matrix of this system, call it A, does
not depend on g, and its entries are homogeneous polynomials in a0, . . . , an, the
homogeneous coordinates of a. The number of equations is t, the number of
monomials of degree s in m + 1 variables; it equals \binom{s+m}{m} but we do not need
that.
The solvability of (3) for every g means that the linear system is solvable
for every right-hand side, which means exactly that A has rank t. Hence the
negation of this condition can be expressed as vanishing of all the t × t minors
of A.
Let Ys be the set of all a ∈ P^n such that the matrix A as above has rank
less than t. Each Ys is a variety, and we have π(Z) = ⋂_{s=0}^{∞} Ys. Therefore, π(Z)
is a projective variety as claimed.

9 Bézout’s inequality in higher dimensions


9.1 In search of a proper statement
We again consider the system of polynomial equations f1 = 0,. . . , fm = 0,
f1 , . . . , fm ∈ k[x1 , . . . , xn ], this time for n > 2 variables.
The most important case is m = n. Guided by the example with hyper-
planes, i.e., with fi = (xi − 1) · · · (xi − di ), where di = deg fi ≥ 1, we expect
that if the number of solutions is finite, then it should be at most d1 d2 · · · dn .
Moreover, finitely many solutions should be the typical, “generic” case.

Warning example. Unlike in the planar case, over an arbitrary field, having
finitely many solutions does not guarantee that the bound d1 d2 · · · dn for their
number is correct.
Indeed, the system of three equations
(x − 1)^2 (x − 2)^2 · · · (x − k)^2 + (y − 1)^2 (y − 2)^2 · · · (y − k)^2 = 0, z = 0, z = 0
has k^2 solutions in R^3, but the degrees are 2k, 1, 1. We note that the solution
set in C^3 is infinite.
Another example. In the previous example, the first equation has only a
1-dimensional solution set over R, while the two remaining equations are identical.
However, over C the solution set of the first equation is 2-dimensional,
and so at least over algebraically closed fields, one might hope to exclude this
kind of pathology by imposing a suitable condition on the fi . Indeed, drawing
inspiration from the planar case, a natural guess for such a condition can be
that no two of the fi have a common factor.
However, things are not that simple, and the suggested condition is definitely
not the right one. Here is a highly instructive example for n = 3:
f1 = x^3 − yz, f2 = y^2 − xz, f3 = z^2 − x^2 y.
These are irreducible polynomials, as is easy to check, none a multiple of another.
But V(f1, f2, f3) contains the curve C with parametric expression
C = {(t^3, t^4, t^5) : t ∈ C},
and so surely it is not finite.
This example is also interesting in another respect. In linear algebra, every
k-dimensional vector subspace of kn can be described by n − k linear equations;
for example, a line in R3 is always the intersection of two planes. In contrast,
the curve C cannot be defined by two polynomial equations. It is easy to check
that the common zero set of every two of the fi contains points not belonging to the
zero set of the third; e.g., V(f1, f2) contains the z-axis, where f3 is nonzero
except at the origin.
With more effort, one can show that no two polynomials suffice; this is done
algebraically, by checking that the ideal ⟨f1, f2, f3⟩ cannot be generated by two
polynomials.
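
The two easy claims about this example can be spot-checked numerically. The Python sketch below evaluates the fi at a few integer points only, so it is a sanity check and not a proof.

```python
def f1(x, y, z):
    return x**3 - y * z

def f2(x, y, z):
    return y**2 - x * z

def f3(x, y, z):
    return z**2 - x**2 * y

# All three polynomials vanish at the points (t^3, t^4, t^5) of the curve C.
on_curve = all(f(t**3, t**4, t**5) == 0
               for t in range(-5, 6) for f in (f1, f2, f3))
print(on_curve)

# On the z-axis, f1 and f2 vanish while f3 = z^2 does not (for z != 0).
print([(f1(0, 0, z), f2(0, 0, z), f3(0, 0, z)) for z in (1, 2, 3)])
```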
Let us remark that things cannot get completely out of hand with this
kind of examples: it is known that every irreducible affine variety in k^n, k
algebraically closed, can be given as the zero set of at most n + 1 polynomials
[Hei83, Prop. 3].
Bézout’s inequality assuming finitely many zeros. It seems that there is
no particularly useful general condition for V (f1 , . . . , fn ) to be finite, although
there are algorithms that can decide this question for any given f1 , . . . , fn —but
these are nontrivial and quite demanding computationally.
One way around this is to assume V(f1, . . . , fn) finite. Then, for k
algebraically closed, the expected inequality for the number of zeros does hold.
Theorem 9.1 (Higher-dimensional Bézout’s inequality I). Let k be algebraically
closed, and let f1 , . . . , fn ∈ k[x1 , . . . , xn ] be polynomials of degrees d1 , . . . , dn ≥
1. Assuming that V (f1 , . . . , fn ) ⊂ kn is finite, it has at most d1 d2 · · · dn points.

Actually, one can say a bit more: even if V (f1 , . . . , fn ) contains irreducible
components of positive dimension, the number of one-point irreducible compo-
nents is still at most d1 d2 · · · dn .
We will not prove Theorem 9.1 here. A reasonably accessible algebraic proof
can be found in [Tao12, Sec. 8.4].
Bounding the number of nonsingular zeros. The above formulation of
Bézout’s inequality leaves something to be desired, since, as we have mentioned,
verifying the assumption |V (f1 , . . . , fn )| < ∞ is not easy in general (although
there are various sufficient conditions known; see, e.g., [CLO05, Chap. 3,4] and
[Sch95]).
Another formulation, which is often useful for applications, is to consider
only a suitable kind of “nice” zeros, namely, only those where the hypersurfaces
Xi := V(fi) intersect transversally.
We will work only over the field R, where one can rely on intuition and
methods from analysis. However, with an appropriate generalization of notions
like gradient, results can also be obtained for other fields—see [CKW11, Sec. 5].
Let X1 , . . . , Xn ⊆ Rn be the hypersurfaces as above and let a be a point
where they all intersect. Transversality means that if we make a tangent hyper-
plane hi to each Xi at a, then these n hyperplanes intersect only in a—they look
like the coordinate hyperplanes, after a suitable affine transformation (this in-
cludes the assumption that each Xi is (n−1)-dimensional in some neighborhood
of a).
We recall that if f : Rn → R is a differentiable function, then the gradient
∇f at a point a is the “fastest ascent” direction for f . Assuming f (a) = 0,
∇f (a) is perpendicular to the zero set of f , and thus it is a normal vector of
the tangent hyperplane of the zero set, assuming ∇f(a) ≠ 0. (Rigorously this
can be derived from the implicit function theorem.)
The transversality of our X1, . . . , Xn at a thus corresponds to linear
independence of the n gradients ∇f1(a), . . . , ∇fn(a), or in other words, to the
Jacobian determinant

\[ J_{f_1,\dots,f_n}(a) := \det
\begin{pmatrix}
\frac{\partial f_1}{\partial x_1}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\
\vdots & \ddots & \vdots \\
\frac{\partial f_n}{\partial x_1}(a) & \cdots & \frac{\partial f_n}{\partial x_n}(a)
\end{pmatrix} \]

being nonzero. (Apologies to the readers for whom the geometric meaning of
the Jacobian is well known and boring.)
A point a ∈ V(f1, . . . , fn) with J_{f1,...,fn}(a) ≠ 0 is called a nonsingular
zero.
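
For polynomials the Jacobian determinant can be computed exactly by formal differentiation. The Python sketch below does this for two hypothetical planar systems of our own choosing: a circle cut by a line (two transversal intersections) and a parabola touching a line (a tangential one). The representation and function names are assumptions made for this example.

```python
def partial(poly, j):
    """Formal partial derivative w.r.t. the j-th variable (dict: exponents -> coefficient)."""
    result = {}
    for mono, c in poly.items():
        if mono[j] > 0:
            lowered = mono[:j] + (mono[j] - 1,) + mono[j + 1:]
            result[lowered] = result.get(lowered, 0) + c * mono[j]
    return result

def evaluate(poly, point):
    total = 0
    for mono, c in poly.items():
        term = c
        for x, e in zip(point, mono):
            term *= x ** e
        total += term
    return total

def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def jacobian_det(polys, a):
    """Determinant of the matrix of partial derivatives of the polys at the point a."""
    return det([[evaluate(partial(f, j), a) for j in range(len(a))] for f in polys])

# Circle x^2 + y^2 = 25 and line x - y = 1: common zeros (4, 3) and (-3, -4).
circle = {(2, 0): 1, (0, 2): 1, (0, 0): -25}
line = {(1, 0): 1, (0, 1): -1, (0, 0): -1}
print(jacobian_det([circle, line], (4, 3)), jacobian_det([circle, line], (-3, -4)))

# Parabola y = x^2 and line y = 0 touch at the origin: a singular zero.
parabola = {(0, 1): 1, (2, 0): -1}
axis = {(0, 1): 1}
print(jacobian_det([parabola, axis], (0, 0)))
```

In the first system both common zeros are nonsingular, and their number, 2, equals the product of the degrees; in the second system the only common zero is singular.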
Theorem 9.2 (Higher-dimensional Bézout’s inequality II). Let f1 , . . . , fn ∈
R[x1 , . . . , xn ]. Then the polynomial system f1 = 0,. . . , fn = 0 has at most
d1 d2 · · · dn nonsingular zeros in Rn , where di = deg fi .

9.2 Proof for nonsingular zeros


We present a proof of Theorem 9.2 due to Wooley [Woo96], mostly following a
presentation in [CKW11, Sec. 5]. For another more or less elementary proof,

going via a complex version of the theorem, see [BPR03, Sec. 4.7].
So let a1 , . . . , aN ∈ Rn be nonsingular common zeros of f1 , . . . , fn ; we want
to show that N ≤ D = d1 d2 · · · dn .
First we fix a linear polynomial π ∈ R[x1 , . . . , xn ] such that the π(ai ) are all
distinct; we can think of this as choosing a projection on a suitable line. Armed
with the knowledge from the previous sections, the reader will surely supply a
rigorous proof of existence of a suitable π.
The general idea of the proof is to produce a nonzero univariate polynomial
of degree at most D for which all the π(ai ) are roots.
To this end, we would like to have a polynomial h ∈ R[y1 , . . . , yn , z] satisfy-
ing the following conditions:

(C1) The polynomial h̃ := h(f1, . . . , fn, π) ∈ R[x1, . . . , xn], obtained by
substituting fi(x1, . . . , xn) for yi and π(x1, . . . , xn) for z into h, is the zero
polynomial.

(C2) The highest power of z occurring in h is at most D.

(C3) The univariate polynomial h0 (z) := h(0, 0, . . . , 0, z) is nonzero.

If we had such an h, we would be done: by (C1), h_0(π(a)) = 0 whenever
a is a common zero of the fi, by (C2) we have deg h_0 ≤ D, and together with
(C3) this would show that h_0 has at most D zeros.
However, a suspicious thing is that this plan does not use the nonsingularity
of the considered common zeros of the fi , and indeed, we will have to modify it.
But (C1) and (C2) can be achieved; this is done by linear algebra and counting,
and it works over any field.

Lemma 9.3. Given arbitrary polynomials f1, . . . , fn, π ∈ k[x1, . . . , xn] with
deg fi = di and deg π = 1, there exists a nonzero polynomial h ∈ k[y1, . . . , yn, z]
satisfying (C1) and (C2).

We postpone the proof of the lemma. Having such an h, we cannot guarantee
(C3), unfortunately. But here we use the assumption with nonsingular zeros to
perturb the fi, and for the perturbed version we will be able to get (C3).
Concretely, we perturb by choosing a sufficiently small vector δ = (δ1, . . . , δn) ∈
R^n and considering the perturbed system f1 = δ1, . . . , fn = δn. We claim that
if a is a nonsingular zero of the original system, with zero right-hand sides, then
for every δ sufficiently small, there is a zero a(δ) of the perturbed system, such
that a(δ) → a as ‖δ‖ → 0.
This is a textbook application of the implicit function theorem; after all,
nonzero Jacobian is typically used through that theorem. We just consider the
function F : R^n × R^n → R^n given coordinate-wise by F(x, δ)_i := fi(x) − δi.
Then F (a, 0) = 0, and the implicit function theorem guarantees the existence
of a (continuous) function a(δ) with F (a(δ), δ) = 0 for all δ sufficiently small
(note that the Jacobian in the implicit function theorem is with respect to
the “dependent” variables, which in our case are the xi , and this is exactly
Jf1 ,...,fn (a) as in the definition of nonsingular zero).
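The perturbation claim can be observed numerically. The sketch below is only an illustration under stated assumptions: it uses a hypothetical system f1 = x^2 + y^2 − 5, f2 = xy − 2 with the nonsingular zero a = (1, 2) (the Jacobian determinant there is −6), and it tracks a(δ) by Newton's method started at a. Newton's method is a stand-in chosen for this demonstration; it is not part of the proof, which relies on the implicit function theorem.

```python
def F(x, y, d1, d2):
    # F(x, delta)_i = f_i(x) - delta_i for the hypothetical system.
    return (x * x + y * y - 5 - d1, x * y - 2 - d2)

def newton_zero(d1, d2, x=1.0, y=2.0, steps=20):
    """Approximate the zero a(delta) near a = (1, 2) by Newton's method."""
    for _ in range(steps):
        g1, g2 = F(x, y, d1, d2)
        # Jacobian with respect to the "dependent" variables (x, y).
        a, b, c, d = 2 * x, 2 * y, y, x
        det = a * d - b * c
        # Solve J * (dx, dy) = (g1, g2) by Cramer's rule.
        dx = (g1 * d - b * g2) / det
        dy = (a * g2 - c * g1) / det
        x, y = x - dx, y - dy
    return x, y

for t in (1e-2, 1e-3, 1e-4):
    x, y = newton_zero(t, -t)
    dist = ((x - 1.0) ** 2 + (y - 2.0) ** 2) ** 0.5
    print(t, dist)  # the distance to a shrinks together with t
```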

It follows that if the original system has at least N nonsingular zeros, so
does the perturbed system for δ sufficiently small. Moreover, again for δ small
enough, these N zeros of the perturbed system still yield N distinct values
of the projection π. So if h satisfies (C1) and (C2), then for every δ ∈ R^n
sufficiently small, h(δ1 , . . . , δn , z) vanishes for at least N distinct values of z.
At the same time, since V (h) has zero Lebesgue measure (or, alternatively,
by the Schwartz–Zippel theorem), there are values δ̄1 , . . . , δ̄n ∈ (−δ, δ) and
z̄ ∈ R with h(δ̄1 , . . . , δ̄n , z̄) ≠ 0. It follows that h(δ̄1 , . . . , δ̄n , z) is a nonzero
polynomial in z, of degree at most D by (C2), and hence N ≤ D as claimed. It
remains to prove the lemma.

Proof of Lemma 9.3. We will look for h in the form

\[ h(y_1, \dots, y_n, z) = \sum_{\alpha \in A} c_\alpha \, y_1^{\alpha_1} \cdots y_n^{\alpha_n} z^{\alpha_{n+1}}, \]

where A ⊂ $\mathbb{Z}^{n+1}_{\ge 0}$ is a suitable finite set of (n + 1)-tuples, whose choice we will
discuss later, and where the c_α are regarded as unknowns. So we have |A|
unknowns.
If we make the substitution y1 = f1 , . . . , yn = fn , z = π for a monomial
$y_1^{\alpha_1} \cdots y_n^{\alpha_n} z^{\alpha_{n+1}}$, the degree of the resulting polynomial in x1 , . . . , xn is

\[ d_1 \alpha_1 + \cdots + d_n \alpha_n + \alpha_{n+1}. \]

Let us call this expression the weight w(α), and set w(A) := max_{α∈A} w(α).
Thus, if we fix A, the degree of h̃, the polynomial after the substitution, is
at most w(A). Moreover, the coefficients of h̃ are linear functions of the c_α.
We want to force zero coefficient for every monomial that could possibly
appear in h̃; each such requirement yields a linear equation for the c_α. Since
deg h̃ ≤ w(A), we thus obtain $\binom{w(A)+n}{n}$ homogeneous linear equations for |A|
unknowns.
Hence the lemma will be proved as soon as we find A such that
$|A| > \binom{w(A)+n}{n}$ and α_{n+1} ≤ D for all α ∈ A.
For an integer W , let

\[ A = A(W) := \{\alpha : w(\alpha) \le W,\ \alpha_{n+1} \le D\}. \]

We want to show that $|A(W)| > \binom{W+n}{n}$ holds for all sufficiently large W . The
counting must be quite precise; after all, the proof cannot work with D − 1
instead of D.
For a parameter T , let N (T ) be the number of vectors (α1 , . . . , αn ) ∈ $\mathbb{Z}^{n}_{\ge 0}$ such
that $\sum_{i=1}^{n} d_i \alpha_i \le T$; we have

\[ |A(W)| = \sum_{\alpha_{n+1}=0}^{D} N(W - \alpha_{n+1}) \ge (D+1)\, N(W - D). \]

Let B = B(T ) be the set of all β = (β1 , . . . , βn ) ∈ $\mathbb{Z}^{n}_{\ge 0}$ with β1 + · · · + βn ≤ T ;
we have $|B| = \binom{T+n}{n}$. We can express N (T ) as the number of β ∈ B such that
βi mod di = 0 for all i.

Let r(β) = (β1 mod d1 , . . . , βn mod dn ), and let us partition B into equiva-
lence classes according to the value of r(β); there are d1 d2 · · · dn = D classes.
It is easy to see that the class with r(β) = (0, . . . , 0) is at least as large as any
other class (subtracting r(β) from β maps each class injectively into this one), and so

\[ N(T) \ge \frac{1}{D} \binom{T+n}{n}. \]
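The residue-class argument can be checked by brute force for small parameters. The following Python sketch is an illustration only; the function names and the sample degrees (2, 3) are our own choices. It computes N(T) directly, partitions B(T) by r(β), and compares the zero class with the other classes and with the bound just stated.

```python
from itertools import product
from math import comb, prod

def N(T, degs):
    """Number of alpha in Z_{>=0}^n with sum(d_i * alpha_i) <= T (brute force)."""
    if T < 0:
        return 0
    ranges = [range(T // d + 1) for d in degs]
    return sum(1 for alpha in product(*ranges)
               if sum(d * a for d, a in zip(degs, alpha)) <= T)

def class_sizes(T, degs):
    """Sizes of the classes of B(T) according to r(beta) = (beta_i mod d_i)_i."""
    sizes = {}
    for beta in product(range(T + 1), repeat=len(degs)):
        if sum(beta) <= T:
            r = tuple(b % d for b, d in zip(beta, degs))
            sizes[r] = sizes.get(r, 0) + 1
    return sizes

degs = (2, 3)            # hypothetical degrees d_1, d_2, so D = 6
D, n = prod(degs), len(degs)
for T in range(13):
    sizes = class_sizes(T, degs)
    zero_class = sizes[(0,) * n]
    # The zero class has N(T) elements, it is a largest class,
    # and hence D * N(T) >= |B(T)| = binom(T + n, n).
    print(T, zero_class, N(T, degs), max(sizes.values()), comb(T + n, n))
```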
Consequently,

\begin{align*}
|A(W)| &\ge (D+1)\, N(W-D) \ge \frac{D+1}{D} \binom{W-D+n}{n} \\
&= \frac{D+1}{D} \binom{W+n}{n} \cdot \frac{(W-D+n) \cdots (W-D+1)}{(W+n) \cdots (W+1)} \\
&\ge \binom{W+n}{n} \, \frac{D+1}{D} \left(1 - \frac{D}{W+1}\right)^{n}.
\end{align*}

For D fixed and W → ∞, we have $\left(1 - \frac{D}{W+1}\right)^{n} \to 1$, while $\frac{D+1}{D}$ remains bounded
away from 1. Hence $|A(W)| > \binom{W+n}{n}$ for W sufficiently large as desired. The
lemma, as well as Bézout’s inequality for nonsingular zeros, are proved.
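To see the counting at work, the following Python sketch computes |A(W)| exactly and searches for the first W at which |A(W)| exceeds the number of linear equations. It is an illustration under our own naming; the dynamic-programming table for N(T) is an implementation choice and does not appear in the proof.

```python
from math import comb, prod

def N_table(degs, T_max):
    """table[T] = #{alpha in Z_{>=0}^n : sum d_i alpha_i <= T} for T = 0..T_max."""
    exact = [1] + [0] * T_max           # exact[t]: weighted sum equal to t
    for d in degs:                      # one pass per coordinate alpha_i
        for t in range(d, T_max + 1):
            exact[t] += exact[t - d]
    table, running = [], 0
    for t in range(T_max + 1):
        running += exact[t]
        table.append(running)
    return table

def size_A(W, degs, table):
    """|A(W)| = sum over alpha_{n+1} = 0..D of N(W - alpha_{n+1})."""
    D = prod(degs)
    return sum(table[W - a] for a in range(D + 1) if W - a >= 0)

def first_good_W(degs, W_max=2000):
    """Smallest W <= W_max with |A(W)| > binom(W + n, n), or None."""
    n = len(degs)
    table = N_table(degs, W_max)
    for W in range(W_max + 1):
        if size_A(W, degs, table) > comb(W + n, n):
            return W
    return None

print(first_good_W((2,)))     # 2: here |A(2)| = 4 > 3
print(first_good_W((2, 3)))   # some finite W, as the lemma predicts
```

For a single polynomial of degree 2 in one variable, the first admissible value is W = 2, where there are 4 unknowns and 3 equations.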

10 Bounding the number of connected components


How complicated can the zero set of a polynomial f ∈ R[x1 , . . . , xn ] of degree
d be? The answer depends, of course, on how we measure the complexity, and
there are several sensible ways.
We will first look at the number of connected components of the complement,
i.e., of R^n \ V (f ). In this case a good answer can be given with a
reasonably simple proof.
To see what can be expected, we consider the usual example with hyper-
planes, slightly modified:
\[ f(x_1, \dots, x_n) = \prod_{i=1}^{n} \prod_{j=1}^{m} (x_i - j). \]

The degree is d = mn, and the zero set, a grid of hyperplanes, partitions R^n
into (m + 1)^n ∼ (d/n)^n components (axis-parallel boxes).
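For n = 2 the count of boxes can be confirmed by a crude numerical experiment. The sketch below is a heuristic that works for this specific example only: it samples f on a lattice of points avoiding the grid lines and merges axis-adjacent samples with the same sign of f. Since neighboring samples are separated by at most one line, and crossing a line flips the sign, same sign means same box. The function names and the window size are our own choices.

```python
from collections import deque

def grid_poly(x, y, m):
    """f(x, y) = prod_{j=1..m} (x - j)(y - j), the n = 2 case of the example."""
    value = 1.0
    for j in range(1, m + 1):
        value *= (x - j) * (y - j)
    return value

def count_components(m):
    """Count the components of R^2 minus V(f) that meet a window around the grid."""
    k = 2 * m + 2                                 # samples per axis
    coords = [0.25 + 0.5 * i for i in range(k)]   # never an integer, so f != 0
    sign = [[grid_poly(x, y, m) > 0 for y in coords] for x in coords]
    seen = [[False] * k for _ in range(k)]
    components = 0
    for i in range(k):
        for j in range(k):
            if seen[i][j]:
                continue
            components += 1                       # new component: flood-fill it
            seen[i][j] = True
            queue = deque([(i, j)])
            while queue:
                a, b = queue.popleft()
                for c, d in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if (0 <= c < k and 0 <= d < k and not seen[c][d]
                            and sign[c][d] == sign[a][b]):
                        seen[c][d] = True
                        queue.append((c, d))
    return components

for m in (1, 2, 3):
    d = 2 * m
    print(m, count_components(m), (m + 1) ** 2, (d + 1) ** 2)
```

The second and third printed columns agree, and both stay below (d + 1)^2.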
Theorem 10.1. For a polynomial f ∈ R[x1 , . . . , xn ] of degree d ≥ 2, R^n \ V (f )
has at most (d + 1)^n components.
The proof below is similar to one in [ST12, Appendix A]. This kind of
argument goes back to Oleinik and Petrovskiı̌ [OP49, Ole51], Milnor [Mil64],
and Thom [Tho65].
In the proof, we will need the following result.
Fact 10.2. Let f : R^n → R^n be a polynomial map (that is, a map for which
each coordinate fi : R^n → R is given by a polynomial; in algebraic geometry,
one usually speaks of regular maps in this context), and let X ⊂ R^n be a
proper algebraic variety (that is, X is contained in the zero set of a nonzero
polynomial). Then the image f (X) does not fill any open ball in R^n .

This result may look obvious, but obvious approaches to proofs have their
caveats.
First, we know that X is “small”; e.g., it does not fill any open ball. But,
for example, the image of a segment under a continuous map may be a unit
square, as is witnessed by the famous Peano curve. So we have to use other
properties of f besides continuity.
Approaching from the side of mathematical analysis, we can use the fact
(which we do not prove here) that the image of a Lebesgue null set under a smooth
map is Lebesgue null, plus Exercise 2.3. In our case, a polynomial map is not
only smooth (infinitely differentiable), but also locally Lipschitz, which allows
for a quite straightforward proof.

Exercise 10.3. (a) Verify that a polynomial map f : R^n → R^n is locally Lipschitz,
meaning that for every x0 ∈ R^n there exist ε > 0 and L such that f is
L-Lipschitz on the ε-ball around x0 , i.e., ‖f (x) − f (y)‖ ≤ L‖x − y‖ for every
choice of x, y in that ball. (Unlike in most uses of the letter ε in analysis, here
one can actually take ε as large as desired.)
(b) Prove that the image of a Lebesgue null set under a locally Lipschitz
map R^n → R^n is Lebesgue null.

A more algebraic approach to Fact 10.2 would be to prove that the image
of a proper subvariety in R^n under a polynomial map is a proper subvariety of
R^n . Unfortunately, this is not literally true, as can be seen by modifying the
hyperbola example from Section 8.2. What can be shown is that such an image
is contained in a proper subvariety of R^n , which is enough for our purposes.
This is not too hard, given the tools covered so far, and it is a special case of a
result stating that a regular map cannot increase dimension, but here we will
not go through the argument.

Proof of Theorem 10.1. First we count only the bounded components of
R^n \ V (f ).
We do not know a priori that there are only finitely many components, but
for some of the arguments below it will be important that we work with finitely
many. So we fix any collection C of finitely many bounded components of
R^n \ V (f ) and work only with these. We will show that |C| ≤ (d − 1)^n , which will
imply, in particular, that there are only finitely many components altogether.
For each component C ∈ C, we have either f > 0 or f < 0 on C; let
us assume the former. We claim that f attains at least one maximum on
C. Indeed, f attains some positive value ε > 0 at some point of C, the set
{x ∈ C : f (x) ≥ ε/2} is compact and nonempty, and so f attains a maximum at
some xC there.
Since xC lies inside the open set C and f is differentiable, the gradient ∇f
vanishes at xC , and hence xC ∈ V (∇f ), where V (∇f ) is a shorthand for the
set of common zeros of ∂f/∂xi , i = 1, 2, . . . , n.
We note that deg ∂f/∂xi ≤ d − 1. The idea is to apply Bézout’s inequality, in
the form with nonsingular zeros, to bound |V (∇f )|, and hence the number of
bounded components, by (d − 1)^n .

The condition for nonsingularity of a common zero a of the ∂f/∂xi reads
det H_f (a) ≠ 0, where H_f is the Hessian matrix of f , with

\[ (H_f)_{ij} := \frac{\partial^2 f}{\partial x_i \, \partial x_j}. \]

However, we cannot guarantee that det H_f is not identically 0 (even some of
the partial derivatives may be identically 0; for example, if f does not depend
on some of the variables).
The next trick is to perturb the function whose maxima we seek. Indeed,
if the maximum of f over a bounded component C is at least ε, then another
function f̃ differing from f by at most ε/3, say, also has to attain a maximum
in C. (Note that C ∈ C is still one of the original components of R^n \ V (f ),
even though we maximize the perturbed function f̃ over it.)
We actually make two perturbations. First, for δ sufficiently small, we set
f̃ := f − δ(x_1^2 + · · · + x_n^2). This is the simplest kind of perturbation that may
make the Hessian determinant nonzero (if we were willing to use Theorem 9.1
instead of Theorem 9.2, we could skip this perturbation).
It is easy to see that H_f̃ = H_f − 2δI, where I is the identity matrix, and
hence det H_f̃ = 0 exactly if 2δ is an eigenvalue of H_f . Thus, no matter what
H_f looks like, det H_f̃ is a nonzero polynomial for all but finitely many δ. We
fix some sufficiently small δ for which det H_f̃ is not identically zero; then f̃ is
fixed too.
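The effect of the first perturbation can be seen on a tiny example. The Python sketch below is an illustration under our own assumptions: it takes the hypothetical polynomial f(x, y) = x^3 − 3x, which does not depend on y, represents polynomials as dictionaries from exponent pairs to coefficients, and computes the Hessian determinant exactly before and after subtracting δ(x^2 + y^2) with δ = 1/100.

```python
from fractions import Fraction

def diff(poly, var):
    """Partial derivative of a polynomial {exponent tuple: coefficient}."""
    out = {}
    for exps, c in poly.items():
        if exps[var] > 0:
            e = list(exps)
            e[var] -= 1
            key = tuple(e)
            out[key] = out.get(key, 0) + c * exps[var]
    return out

def mul(p, q):
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            key = tuple(a + b for a, b in zip(e1, e2))
            out[key] = out.get(key, 0) + c1 * c2
    return out

def sub(p, q):
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) - c
    return {e: c for e, c in out.items() if c != 0}   # drop zero terms

def hessian_det_2d(poly):
    """det of the 2 x 2 Hessian: f_xx * f_yy - f_xy^2, as a polynomial."""
    fxx = diff(diff(poly, 0), 0)
    fxy = diff(diff(poly, 0), 1)
    fyy = diff(diff(poly, 1), 1)
    return sub(mul(fxx, fyy), mul(fxy, fxy))

f = {(3, 0): 1, (1, 0): -3}            # x^3 - 3x, independent of y
delta = Fraction(1, 100)
f_tilde = dict(f)
f_tilde[(2, 0)] = -delta               # subtract delta * x^2
f_tilde[(0, 2)] = -delta               # subtract delta * y^2

print(hessian_det_2d(f))        # {}  (identically zero)
print(hessian_det_2d(f_tilde))  # -12*delta*x + 4*delta^2, a nonzero polynomial
```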
Next, we let f̃_η := f̃ − η1 x1 − · · · − ηn xn , where η = (η1 , . . . , ηn ) is a vector
of parameters. Then ∇f̃_η = ∇f̃ − η, and so instead of counting the points
in V (∇f̃) = (∇f̃)^{−1}(0), we now need to count the number of preimages of η
under the polynomial map ∇f̃ : R^n → R^n . (Geometrically, replacing f̃ with f̃_η
corresponds to slightly tilting the originally vertical direction in which we seek
maxima or minima of f̃.)
We want to choose η sufficiently small (so that f̃_η and f are sufficiently
close) and such that the Hessian determinant det H_{f̃_η} = det H_f̃ does not vanish at
the points of the preimage (∇f̃)^{−1}(η).
The variety of the Hessian determinant, Y := V (det H_f̃), is the zero set of
a nonzero polynomial, and ∇f̃ is a polynomial mapping R^n → R^n . Hence by
Fact 10.2, there are arbitrarily small η avoiding the image ∇f̃(Y ).
For such η, all the maxima and minima of f̃_η are nonsingular common zeros
of the polynomials in ∇f̃_η , and so we can bound their number by (d − 1)^n as
desired.
It remains to account for the unbounded components. For that, we replace
f with g := f · (x_1^2 + · · · + x_n^2 − R^2), where R is a sufficiently large number; that is,
to the zero set of f we add a large sphere. Then every component of R^n \ V (f )
appears as a bounded component of R^n \ V (g). Since deg g = deg f + 2, the
bound (deg g − 1)^n = (d + 1)^n claimed in the theorem follows.

A stronger version. The theorem just proved can be strengthened in several
respects.

First, there is a quantitative improvement, which becomes significant if the
degree d and the dimension n are comparable: the true bound is more like
(d/n)^n (which is the lower bound we got from the simple example) than d^n .
Second, the bound can be extended to the complement of the union of
several zero sets, i.e., R^n \ (V (f1 ) ∪ · · · ∪ V (fm )). In this case a reasonably good
bound can be obtained just by setting f = f1 f2 · · · fm and using the bound for
a single polynomial.
Third, instead of considering just the complement, which is the set where all
of f1 , . . . , fm are nonzero, we can consider sets where some of the fi are required
to be 0, some others positive, and some negative. These three improvements
are all reflected in the next theorem.

Theorem 10.4. Let f1 , . . . , fm ∈ R[x1 , . . . , xn ] be polynomials of degree at
most d, and for every sign vector σ ∈ {−1, 0, +1}^m let Sσ ⊆ R^n be defined
as

\[ \bigl\{ x \in \mathbb{R}^n : \operatorname{sgn} f_i(x) = \sigma_i \text{ for all } i = 1, 2, \dots, m \bigr\}. \]

Then, for m ≥ n ≥ 2,

\[ \sum_{\sigma \in \{-1,0,+1\}^m} \# S_\sigma \le \left( \frac{50dm}{n} \right)^{n}, \]

where #Sσ denotes the number of connected components of Sσ .

The basic ideas of the proof are similar to those in the proof of Theorem 10.1
shown above, but the details are considerably more involved.
In the literature, such results are often stated as bounding the total topo-
logical complexity of the considered sets, more precisely, the sum of the Betti
numbers, instead of just the number of connected components. For still other
strengthenings of the just stated theorem, such as a more refined dependence on
the degrees of the fi , as well as replacing the ground set R^n with a k-dimensional
algebraic variety in R^n , see [Bar13] and references therein.
Bounds on the radius of components and inscribed balls. Another way
of measuring zero sets of polynomials in R^n is, for example, by the radius of the
smallest ball intersecting all connected components. Here, of course, we need
to make some assumptions on the coefficients of the polynomials; typically we
assume them to be integers not exceeding some given bound. Here is a general
result of this kind:
Theorem 10.5. Let f1 , . . . , fm ∈ Z[x1 , . . . , xn ] be polynomials of maximum
degree d whose coefficients are integers bounded by M in absolute value. For
σ ∈ {−1, 0, 1}^m , let Sσ := {x ∈ R^n : sgn fi (x) = σi for all i = 1, 2, . . . , m}.
Then each connected component of Sσ intersects the ball of radius $R = M^{(d+1)^{Cn}}$
centered at 0, where C is a suitable absolute constant. The bounded connected
components of Sσ are all contained in that ball.
If σi ≠ 0 for all i, or in other words, Sσ is defined only by strict inequalities,
and if Sσ is nonempty, then it contains a rational point with coordinates whose
numerators and denominators are integers not exceeding R in absolute value.

This kind of result goes back to [GV88, Lemma 9] (which deals with more
special sets, namely, the zero set of a single polynomial), and the result as above
about a ball intersecting all connected components is [BPR96, Theorem 4.1.1]
(also see [BPR03, Theorem 13.14]). A statement directly implying the part
with the ball containing all bounded components is [BV07, Theorem 6.2]. For
the part with a rational point see [BPR03, Theorem 13.15].
On applications. Theorem 10.4 and its relatives have probably hundreds of
applications in geometry, combinatorics, computer science and elsewhere. An
old but still very beautiful one is Ben-Or’s lower bound method for algorithms
described as algebraic computation trees [BO83].
Here is a quick application from [AFR85] which uses the more precise bound
in Theorem 10.4. Let the sign pattern of an n×n matrix A be the matrix S with
sij = sgn aij . We claim that there are n×n matrices S with only ±1 entries such
that every A with sign pattern S has rank at least cn, for a positive constant c.
On the one hand, there are $2^{n^2}$ possible S’s. On the other hand, an A of
rank at most r can be written as U V^T , where U and V are n × r matrices. We
consider the 2nr entries of U and V as variables; then the signs of the entries
of A are signs of quadratic polynomials in these variables. We have m = n^2
polynomials and thus, by Theorem 10.4, there are no more than $O(n^2/(nr))^{2nr}$
possible sign patterns of a rank-r matrix A. For r < cn and c small, this
quantity is smaller than $2^{n^2}$, and so some patterns force rank at least cn.
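The comparison of the two counts is easy to evaluate numerically. The Python sketch below is an illustration with our own function names: it plugs d = 2, m = n^2 polynomials, and 2nr variables into the bound of Theorem 10.4 and compares base-2 logarithms with n^2. The theorem requires at least as many polynomials as variables, which here means r ≤ n/2.

```python
from math import log2

def log2_pattern_bound(n, r):
    """log2 of (50*d*m/N)^N with d = 2, m = n^2 polynomials, N = 2*n*r variables."""
    d, m, num_vars = 2, n * n, 2 * n * r
    assert m >= num_vars >= 2          # hypothesis m >= n >= 2 of Theorem 10.4
    return num_vars * log2(50 * d * m / num_vars)

def forces_high_rank(n, r):
    """True if the bound for rank <= r is below the number 2^(n^2) of sign matrices."""
    return log2_pattern_bound(n, r) < n * n

print(forces_high_rank(1000, 40))    # True:  c = 0.04 is small enough
print(forces_high_rank(1000, 100))   # False: the bound is too weak for c = 0.1
```

With r = cn the condition reads 2c · log2(50/c) < 1, independently of n, so this particular bound yields the claim for c up to roughly 0.05.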

11 Literature
Textbooks and lecture notes for such a classical subject as algebraic geometry
abound, of course, but not all of them are equally accessible to beginners.
The usual hands-on introduction, with emphasis on computational aspects,
is Cox, Little, and O’Shea [CLO07]. Schenck’s book [Sch03] is very clear, read-
able, and concise; another advantage is that it also treats many related concepts
from algebra and topology. A very good set of lecture notes freely accessible on
the web, including some of the more advanced concepts, such as sheaves and
schemes, is Gathmann [Gat13].
For intersection theory, dealing with generalizations of Bézout’s theorem
and other counting questions for varieties, a remarkable little book is Katz
[Kat06], and an older concise introduction is Fulton [Ful84].
For combinatorial, geometric, and computer science applications of polyno-
mials, we can recommend, for example, Chen, Kayal, and Wigderson [CKW11].
Recent treatments of methods similar to the one used in the joints problem are
Guth [Gut13] and Tao [Tao13].
Acknowledgment. I would like to thank Boris Bukh, Vincent Kusters, Zuzana
Safernová, Adam Sheffer, and Noam Solomon for valuable comments, sugges-
tions, and corrections to earlier versions of this document.

References
[AFR85] N. Alon, P. Frankl, and V. Rödl. Geometrical realization of set
systems and probabilistic communication complexity. In Proc. 26th
IEEE Symposium on Foundations of Computer Science, pages 277–
280, 1985.

[Arr06] E. Arrondo. Another elementary proof of the Nullstellensatz. Amer.
Math. Monthly, 113(2):169–171, 2006.

[Bar13] S. Barone. Some quantitative results in real algebraic geometry.
Preprint, arXiv:1307.8353, 2013.

[BO83] M. Ben-Or. Lower bounds for algebraic computation trees. In Proc.
15th Annu. ACM Sympos. Theory Comput., pages 80–86, 1983.

[BPR96] S. Basu, R. Pollack, and M.-F. Roy. On the combinatorial and algebraic
complexity of quantifier elimination. J. ACM, 43(6):1002–1045, 1996.

[BPR03] S. Basu, R. Pollack, and M.-F. Roy. Algorithms in real algebraic
geometry. Algorithms and Computation in Mathematics 10. Springer,
Berlin, 2003.

[BV07] S. Basu and N. N. Vorobjov. On the number of homotopy types of
fibres of a definable map. J. Lond. Math. Soc., II. Ser., 76(3):757–776, 2007.

[CKW11] Xi Chen, N. Kayal, and A. Wigderson. Partial derivatives in arithmetic
complexity and beyond. Found. Trends Theor. Comput. Sci., 6(1-2):1–138, 2011.

[CLO05] D. A. Cox, J. Little, and D. O’Shea. Using algebraic geometry.
Springer, New York, 2005.

[CLO07] D. Cox, J. Little, and D. O’Shea. Ideals, varieties, and algorithms.
Undergraduate Texts in Mathematics. Springer, New York, third edition, 2007.

[Ful84] W. Fulton. Introduction to intersection theory in algebraic geometry,
volume 54 of CBMS Regional Conference Series in Mathematics.
Published for the Conference Board of the Mathematical Sciences,
Washington, DC, 1984.

[Gat13] A. Gathmann. Algebraic geometry. Lecture Notes, TU Kaiserslautern,
http://www.mathematik.uni-kl.de/agag/mitglieder/professoren/gathmann/notes/alggeom/, 2013.

[GK10] L. Guth and N. H. Katz. Algebraic methods in discrete analogs of
the Kakeya problem. Adv. Math., 225(5):2828–2839, 2010.

[Gut13] L. Guth. The polynomial method, 2013. Book in preparation.

[GV88] D. Yu. Grigor’ev and N. N. Vorobjov jun. Solving systems of poly-
nomial inequalities in subexponential time. J. Symb. Comput., 5(1-
2):37–64, 1988.

[Har09] N. J. A. Harvey. Algebraic algorithms for matching and matroid
problems. SIAM J. Comput., 39(2):679–702, 2009.

[Hei83] J. Heintz. Definability and fast quantifier elimination in algebraically
closed fields. Theoret. Comput. Sci., 24(3):239–277, 1983. Corrigendum
ibid. 39,1983: 2–3.

[Kat06] S. Katz. Enumerative geometry and string theory, volume 32 of Student
Mathematical Library. American Mathematical Society, Providence,
RI, 2006. IAS/Park City Mathematical Subseries.

[Kol88] J. Kollár. Sharp effective Nullstellensatz. J. Amer. Math. Soc.,
1(4):963–975, 1988.

[Mil64] J. W. Milnor. On the Betti numbers of real algebraic varieties. Proc.
Amer. Math. Soc., 15:275–280, 1964.

[MO02] C. Michaux and A. Ozturk. Quantifier elimination following Muchnik.
Univ. de Mons-Hainaut Preprint Series (#10), 2002.

[Ole51] O. A. Oleinik. Estimates of the Betti numbers of real algebraic
hypersurfaces (in Russian). Mat. Sbornik (N. S.), 28(70):635–640, 1951.

[OP49] O. A. Oleinik and I. B. Petrovskiı̌. On the topology of real algebraic
surfaces (in Russian). Izv. Akad. Nauk SSSR, 13:389–402, 1949.

[Sch95] J. Schmid. On the affine Bézout inequality. Manuscripta Math.,
88(2):225–232, 1995.

[Sch03] H. Schenck. Computational algebraic geometry, volume 58 of London
Mathematical Society Student Texts. Cambridge University Press,
Cambridge, 2003.

[ST12] J. Solymosi and T. Tao. An incidence theorem in higher dimensions.
Discrete Comput. Geom., 48(2):255–280, 2012.

[Tao12] T. Tao. Spending symmetry. Book in preparation, draft
available at http://terrytao.wordpress.com/books/spending-symmetry/, 2012.

[Tao13] T. Tao. Algebraic combinatorial geometry: the polynomial method
in arithmetic combinatorics, incidence combinatorics, and number
theory. Preprint, arXiv:1310.6482, 2013.

[Tho65] R. Thom. On the homology of real algebraic varieties (in French). In
S. S. Cairns, editor, Differential and Combinatorial Topology. Princeton
Univ. Press, 1965.

[Wal79] M. Waldschmidt. Transcendence methods. Queen’s University, 1979.
Available at http://www.math.jussieu.fr/~miw/articles/pdf/
QueensPaper52.pdf.

[Woo96] T. D. Wooley. A note on simultaneous congruences. J. Number
Theory, 58(2):288–297, 1996.
