
Polynomials and Definitions


Chapter

Polynomials
The only way to learn mathematics is to do mathematics.
Paul Halmos
Let's start our journey with polynomials. In studying polynomials,
we'll reveal some of the implicit assumptions behind mathematical definitions, work carefully through two nontrivial proofs, and learn
how to share secrets using something called polynomial interpolation.

Polynomials and definitions


Mathematical discourse starts with the definition of some mathematical object. Our first definition is that of a polynomial, and there are many
possible definitions. I will start with what I'll call the "data definition"
of a polynomial. In this definition you can safely think of real numbers as floating point numbers (minus the headaches of floating point
roundoff).
Definition. A polynomial with real coefficients is a finite list of real
numbers $(a_0, a_1, \dots, a_n)$.
If you already have a good feel for polynomials, then this definition
is somehow unsatisfying. Or at least, that's what a mathematician
would tell you. Why? It's a perfectly good definition, and it's technically a correct definition. In fact, it's probably the definition one uses
for a polynomial when writing a program! The problem is that this
definition doesn't tell you what polynomials are all about. It doesn't
communicate much to the reader. This theme, that most mathematics
is written for the purpose of human-to-human communication, will
follow us throughout the book.
A second theme is the difference between a mathematical object and
its representation, say, as a struct or class in a computer program. You
may have experienced this with some of the fancier programming
language features, for example those that allow you to work with
infinite lists by writing them as Python generators or C++ iterators.
That's why I gave the data definition of a polynomial first, to convey
that a mathematical definition is supposed to be more like an interface
or API than a piece of data. A good mathematical definition goes
beyond simply explaining what an object is; it highlights how that
object relates to other things.
This distinction is important because in mathematics it's implicit,
but in programming it's usually explicit. In Java you have to separate
an interface from its definition, and in C++ templates are distinct from
their usage. If you're familiar with Python, math is close in spirit to
duck-typing. The point is that as a programmer you already understand the conceptual difference, and so you can already understand
how and why it is useful in math. The difficulty for a programmer is
acknowledging the parallel and understanding when it matters.
Here's a better definition of a polynomial. A quick syntactic note:
the subscript notation $a_1, a_2, a_3$ is just a way to give an ordering on
some variables. In a programming language you'd define a list or array
a and index it with a[i], and this is the same thing. In fact, in math
we'll often implicitly associate the bold-faced letter $\mathbf{x}$, which is an array
of numbers, with the list $(x_1, \dots, x_n)$, so that $x_i$ = x[i] (I'll remind
you of this later).
Definition. A polynomial with real coefficients is a function f that
takes as input real numbers, produces real numbers as output, and has
the form
$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n,$$
where all the $a_i$ are real numbers. The $a_i$ are called the coefficients of f.
The degree of the polynomial is the integer n.
Let's break this definition down a bit. When I say $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$, I am implying a few things. First, that x is the input
of the function and not part of the data that makes up the polynomial.

Jeremy Kun

A Programmer's Introduction to Mathematics

Second, the ellipses imply a repeated pattern all the way up to n. You
as the reader are supposed to be able to pick up on the pattern, but the
author usually doesn't make it particularly complicated to do so. In
this case, it's supposed to be clear that the next term is $a_3 x^3$ and that the
index and exponent increase by 1 at every step until n.
The key here is that, even though polynomials are really defined by
the numbers that make up their coefficients (the choice of n and the
numbers $a_i$), polynomials at their heart are functions with a specific
structure. The input is multiplied and added together to produce
an output. The class of all polynomials as defined above consists of
every possible function of a single input that can be expressed using
finitely many multiplications and additions of the input and real constants. Further, if you were to allow more than
one input (multivariable polynomials) this would represent all possible
"multiply and/or add" functions on any number of inputs.
This example may seem frivolous as an illustration of the difference between an object and its definition, but the same pattern often lurks
behind more complicated definitions. First the author will start with a
definition that seems (to them, with the hindsight of years of study) to
be the most useful way to communicate the idea behind the concept.
For us that's the second definition above, a polynomial as a function. Often these definitions seem totally
useless from a programming perspective.
Then ten pages down the line the author introduces a second definition which turns out to be equivalent to the first. Any properties
defined in the first definition automatically hold in the second and vice
versa; they're different perspectives on identical things. The second
definition is the one that allows for nice programs. You might think
the author was crazy not to start with the second definition, but then
later down the line it's the first definition that really sticks, generalizes, or
guides you through proofs.
As a sneak peek, we'll see this happen when we introduce linear
algebra in Chapter ??. You find out that linear maps are equivalent to
matrices in some formal sense. And while linear maps are easy to conceptualize, the corresponding operations on matrices are complicated
and best suited for a computer. In this sense, matrices are analogous to
the data definition of a polynomial. And, as a mathematician would
argue, you can't see the elegance or truly grok linear algebra if you
only ever see the data definition.
In any case, the point is that we can convert between these two
ways of thinking about polynomials: as expressions defined abstractly
by picking a list of numbers, or as functions with a special structure.
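To make the conversion concrete, here is a minimal Python sketch that turns the list of coefficients into the corresponding function. The name as_function is hypothetical and chosen only for this illustration; it is not taken from the book.

```python
from typing import Callable, Sequence


def as_function(coefficients: Sequence[float]) -> Callable[[float], float]:
    """Turn the list (a_0, ..., a_n) into the function x -> a_0 + a_1 x + ... + a_n x^n."""

    def f(x: float) -> float:
        # Direct translation of the definition: add up a_i * x^i for every index i.
        return sum(a * x**i for i, a in enumerate(coefficients))

    return f


h = as_function([1, 1, 1, 1])  # the data for h(x) = 1 + x + x^2 + x^3
print(h(2))                    # 1 + 2 + 4 + 8 = 15
```

The list is the data; the returned function is the behavior. Going the other way (recovering the coefficients from the function) is less obvious, and is closely related to the interpolation theorem later in the chapter.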
Another important bit of culture that every mathematician knows
is that the second you see a definition in a text you must immediately write down examples. Generous authors provide examples
of genuinely new concepts, but an author is never obligated to do so.
Instead, the unspoken rule is that the reader may not continue unless
the reader understands what the definition is saying. That is to say,
you aren't expected to master the concept, most certainly not at the
same speed you read it. But you should have some idea going forward
of what the defined words refer to.
The best way to think of this is like testing in software (or better,
test-driven development). You start with the simplest possible stupid
tests, usually setting as many values as you can to zero or one, then
work your way up to more complicated examples. Later when you get
stuck on some theorem or proof (everyone does, and you will too)
you return to those examples you wrote down and test them in the
new context. This is how you build intuition.
So let's write down some examples of polynomials according to
the function definition above, starting from literally the simplest possible thing. To make
you pay attention, I'll slip in some things that are not polynomials, and
your job is to run them against the definition. The slower you do it
the better, and you can check your answers in the Chapter Notes.
$f(x) = 0$
$g(x) = 12$
$h(x) = 1 + x + x^2 + x^3$
$i(x) = \sqrt{x}$
$j(x) = \frac{1}{2} + x^2 - 2x^4 + 8x^8$
$k(x) = 4.5 + \frac{1}{x} - \frac{5}{x^2}$
$l(x) = \pi - \frac{1}{e} x^5 + e \pi^3 x^{10}$
$m(x) = x + x^2 - x^{\pi} + x^e$


Like testing, part of the point is to weed out those little edge cases,
such as the fact that the exponents need to be nonnegative integers,
though I only stated it implicitly in the definition. What seems like a
strange edge case can just be a matter of taste, sometimes signified by
the phrase "by convention," or it can be a key ingredient in the concept
itself. Either way, having the examples in your pocket as you continue
to read is important, and coming up with the examples yourself is what
helps you internalize the concept.
Before we move on, let's spend a second clarifying the syntax of
definitions. Every definition emphasizes the thing being defined in
some way. Either the entire text of the definition is in italics and the
defined term is not, or vice versa. When written on a chalkboard, one
underlines the new term. When the thing being defined is a function,
one usually uses the compact arrow notation $f : A \to B$ to describe the
allowed inputs and outputs. All possible inputs are collectively called
the domain, and all the outputs that f actually produces are called the range. However, if
we call $f : A \to B$ the function, then B is not called the range, but
rather the codomain. The reason we make a distinction is essentially
syntax versus semantics. Think of it as a type signature or function
header like
float f(int x) {
...
}

We know that f outputs floats, but it might not output every floating point number if you try all possible int inputs. So the codomain
would be all floats, but the range would depend on the semantics of
the function. It's the same for a mathematical function $f : A \to B$.
All of this is essentially to say that function application is syntactically
the same as in most programming languages, and the only differences
are in terminology and the notation $f : A \to B$.
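As a small illustration of codomain versus range, here is a Python sketch. The particular function is an arbitrary choice made for this example, and the range is only observed over a finite sample of inputs.

```python
def f(x: int) -> float:
    # The signature promises "some float": that is the codomain.
    return (x % 4) / 2


# The outputs actually produced (the range) are far fewer than "all floats".
observed_range = {f(x) for x in range(-1000, 1000)}
print(sorted(observed_range))  # [0.0, 0.5, 1.0, 1.5]
```

The type annotation plays the role of $f : A \to B$, while the set of observed outputs depends on what the function body does.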
Because mathematicians were not originally constrained by ASCII,
they developed other symbols for types. The one for the reals is $\mathbb{R}$.
(The font is called blackboard bold.) So
the arrow notation we'll use for polynomials is $f : \mathbb{R} \to \mathbb{R}$. Moreover,
saying "over the reals" doesn't have to do with the domain or codomain,
but rather the coefficients. So if I called a polynomial "over the integers"
it would have integer coefficients. By the way, the symbol for integers is
$\mathbb{Z}$, and the positive integers are denoted by $\mathbb{N}$, often called the natural
numbers. So sometimes you'd say a polynomial "over $\mathbb{Z}$" instead of a
polynomial "over the integers." Finally, I'll use this strange $\in$ symbol,
read "in," to assert or assume membership in some set. For example,
$q \in \mathbb{N}$ is the claim that q is a natural number. It is literally shorthand
for the phrase "q is in the natural numbers," since it can represent a
question (preceded by "if") or a claim (preceded by "suppose").

Polynomials as curves, pairs of points, and syntactic objects
Here are the other common perspectives on polynomials in mathematics.
As pictures: Pictures in math are rarely meant to be precise. Again,
in mathematics pictures are communicative tools, often sketched on
a napkin or whiteboard only to be erased minutes later. But still we
often think of mathematical objects as being defined or represented
by pictures.
The same is true of polynomials. We could define a certain polynomial by a picture of its graph, perhaps for the sake of an argument
that will proceed by general reasoning illustrated by the picture, even
when the actual coefficients of the polynomial are not clearly defined.
Unfortunately, as enlightening as such arguments can often be, they
are almost never in books. The medium is usually not right for it, and
since one wants to be more rigorous on paper anyway, the drawing
is a step often crucial to learning, but ignored in the final writeup.
Just know that it's usually happening behind the scenes, and you are
encouraged to draw your own pictures to help you understand an idea.
As pairs of points: In the same vein, the graph of a polynomial
(or any function $\mathbb{R} \to \mathbb{R}$) has a rigorous definition you can write down.
It's the set of all pairs $(x, f(x))$ where $x \in \mathbb{R}$. Pairs here are ordered,
so you can tell which entry is in the domain and which is in the codomain. In
the context of polynomials, the pairs are called points, implying at
least that they have some geometric interpretation.

The $\mathbb{Z}$ stands for Zahlen, the German word for numbers.


This set of all points is a ridiculous set you could never ever write
down explicitly, but it's just as valid a representation of a polynomial
as any other for the purpose of reasoning.
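The full graph is an infinite set, so a program can only ever hold a finite sample of it. The sketch below, with the hypothetical helper name sample_graph, collects a few of the pairs $(x, f(x))$ for the polynomial $h(x) = 1 + x + x^2 + x^3$ from the examples earlier.

```python
def sample_graph(f, xs):
    """Return the finite set {(x, f(x)) for x in xs}, a sample of the graph of f."""
    return {(x, f(x)) for x in xs}


def h(x):
    return 1 + x + x**2 + x**3


points = sample_graph(h, range(-2, 3))
print(sorted(points))  # [(-2, -5), (-1, 0), (0, 1), (1, 4), (2, 15)]
```

Each pair is ordered, input first and output second, which is what lets you read the set back as a function.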

[Figure: one of many possible representations of a polynomial.]


As geometric objects: While we won't get to this at all in this
book, polynomials can be thought of as a special kind of geometric
object, in much the same way that triangles and circles have a special
place in geometry. The difficulty with this view is that one must quickly
dive deep into advanced mathematical machinery to appreciate
it. The name of the field is algebraic geometry, and it's one of the
deepest areas in all of mathematics. Perhaps to whet your appetite,
mastery of even elementary algebraic geometry allows one to do things
like robot motion planning and automated theorem proving. That
being said, most of the field involves the purest of pure math concerns.
As formal syntactic objects: The last perspective is that a polynomial is a formal syntactic object. By analogy, you might abstractly
represent an arithmetic expression given as input to a compiler using
an abstract syntax tree, like
If[Eq[7,6],Set[Var[x], ], Set[Var[x], Plus[,Var[y]]]] .
Likewise, you can think of a polynomial as a purely syntactic object
with syntactically defined operations like Plus and EvaluateAt.
We won't dwell on this perspective, but it can be useful in certain
situations.
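For readers who want to see the syntactic view in code, here is a minimal sketch in Python. Only the names Plus and EvaluateAt come from the text; the classes Const, Var and Times, and the function name evaluate_at, are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Plus:
    left: object
    right: object


@dataclass(frozen=True)
class Times:
    left: object
    right: object


def evaluate_at(expr, x):
    """Evaluate a syntax tree in a single variable at the number x."""
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Var):
        return x
    if isinstance(expr, Plus):
        return evaluate_at(expr.left, x) + evaluate_at(expr.right, x)
    if isinstance(expr, Times):
        return evaluate_at(expr.left, x) * evaluate_at(expr.right, x)
    raise TypeError(f"not a polynomial expression: {expr!r}")


# The tree for 1 + 2*x, built purely as syntax; nothing is computed yet.
tree = Plus(Const(1), Times(Const(2), Var("x")))
print(evaluate_at(tree, 3))  # 1 + 2*3 = 7
```

The tree itself carries no notion of "input"; meaning only appears once an operation such as evaluate_at interprets it.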

Existence & Uniqueness


Now that we've played around with definitions, we're ready to do
something substantive. We're going to see two proofs relating to the
existence and uniqueness of certain polynomials.
First, a word about existence and uniqueness. Existence proofs are
classic in mathematics, and they come in all shapes and sizes. Basically,
mathematicians like to take interesting properties they see on small
objects, write down the property in general, and then ask things like,
"Are there arbitrarily large objects with this property?" or, "Are there
infinitely many objects with this property?" It's like in physics: when
you come up with some equations that govern the internal workings of
a star you might ask: would these equations support arbitrarily large
stars?
One simple example of this in mathematics is quite famous: whether
there are infinitely many pairs of primes of the form $p, p + 2$. For
example, and work, but is not part of such a pair. Perhaps
surprisingly, it is an open question whether there are infinitely many
such pairs. This is called the Twin Prime Conjecture.
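If you want to generate small examples of such pairs yourself, the following Python sketch lists them below a bound. The function names are hypothetical, and trial division is used only because it is the simplest primality check to read, not because it is efficient.

```python
def is_prime(n: int) -> bool:
    """Trial division: fine for small n, far too slow for large ones."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def twin_primes_below(bound: int):
    """All pairs (p, p + 2) with both entries prime and p + 2 < bound."""
    return [(p, p + 2) for p in range(2, bound - 2)
            if is_prime(p) and is_prime(p + 2)]


print(twin_primes_below(50))
# [(3, 5), (5, 7), (11, 13), (17, 19), (29, 31), (41, 43)]
```

Of course, no finite search like this can settle the conjecture; it only supplies examples to think with.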
In some cases you get lucky, and the property you defined is specific
enough to single out a unique mathematical object or a unique behavior. This is what will happen to us with polynomials. Other times,
the property (or list of properties) you defined is too restrictive, and
there are no mathematical objects that can satisfy it. For example,
Kleinberg's Impossibility Theorem for Clustering lays out three natural properties for a community detection algorithm (an algorithm
that finds communities in, say, a social network) and proves that no
algorithm can satisfy all three simultaneously. Though such theorems
are often heralded as genius, more often than not mathematicians
avoid them by turning small examples into broad conjectures.
We're going to follow this path now. But because our focus in this
book is on reading and understanding written mathematics, and not
so much on generating original conjectures and proofs, we'll start by
stating the theorem in its final, meticulous form. Don't worry, we'll
go carefully through every bit of it, but try to read it now and identify
what you understand and what you don't understand.

See how I immediately wrote down examples?


Theorem. For any integer $n \geq 0$ and any list of $n + 1$ points
$(x_0, y_0), (x_1, y_1), \dots, (x_n, y_n)$ in $\mathbb{R}^2$ with $x_0 < x_1 < \cdots < x_n$, there
exists a unique polynomial $p(x)$ of degree at most n such that $p(x_i) = y_i$ for all i.
The one piece of new notation is the exponent on $\mathbb{R}^2$. This just
means pairs of numbers, each of which is in $\mathbb{R}$. Likewise, $\mathbb{Z}^3$ would
be triples of integers, and $\mathbb{N}^{10}$ tuples of size 10, each entry of which is
a natural number.
A briefer, more informal way to state the theorem is to say that
there is a unique polynomial of degree at most n passing through any given n + 1
points with distinct x-values. Now just like with definitions, the first thing we need to do
when we see a new theorem is write down the simplest possible examples. In addition to simplifying the theorem, it will give us examples
to work with while going through the proof. Go write down some
examples now.
Back already? I'll show you examples I'd write down, and you can
compare your process to mine. Okay, so the simplest example is n = 0,
so that n + 1 = 1 and we're working with a single point. Let's pick
one at random, say (7, 4). The theorem claims that there is a unique
degree zero polynomial passing through this point. What's a degree
zero polynomial? Looking back at the definition of a polynomial as a function, it's a function like
$a_0 + a_1 x + a_2 x^2 + \cdots + a_d x^d$ (I'm using d for the degree here because
n is already taken), where we've chosen to set d = 0. Setting d = 0
means that f is the function $f(x) = a_0$. So what's such a function
with f(7) = 4? Well, it has to be the constant function f(x) = 4. It
should be pretty clear that it's the only degree zero polynomial that
does this. Indeed, the datum of a degree-zero polynomial is a single
number, and the constraint of passing through the point (7, 4) forces
that one piece of data to a specific value.
Let's move on to a slightly larger example which I'll allow you to
work out for yourself before going through the details. When n = 1
and we have n + 1 = 2 points (2, 3), (7, 4), the claim is that there is a
unique degree 1 polynomial with f(2) = 3 and f(7) = 4. Find it by
writing down the definition for a polynomial in this special case and
solving the two resulting equations.

(Footnote: To say a function f(x) "passes through" a point (a, b) means that f(a) = b. When
we say this we're thinking of f as a geometric curve. It's "passing" through the point
because we imagine ourselves as a dot on the curve moving along it. That perspective
likely comes from calculus, but at least it allows for suggestive language in place of
notation.)
Alright. The definition of a degree 1 polynomial is a function of the
form
$$f(x) = a_0 + a_1 x$$
And so writing down the two equations f(2) = 3, f(7) = 4, we are
trying to simultaneously solve:
$$a_0 + a_1 \cdot 2 = 3$$
$$a_0 + a_1 \cdot 7 = 4$$

If we solve for $a_0$ in the first equation, we get $a_0 = 3 - 2a_1$. Substituting that into the second equation we get $(3 - 2a_1) + a_1 \cdot 7 = 4$.
If we solve this we get $a_1 = 1/5$. Plugging this back into the first
equation gives $a_0 = 3 - 2/5 = 13/5$. This has forced the polynomial to be
exactly $f(x) = 13/5 + (1/5)x$, as desired. As a check, $f(2) = 13/5 + 2/5 = 3$
and $f(7) = 13/5 + 7/5 = 4$. If we want to write
this in the usual notation for a line, it's
$$y = \frac{13}{5} + \frac{1}{5} x$$
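The same elimination can be checked with exact arithmetic. The sketch below uses Python's fractions module so that no floating point roundoff creeps in; the function name line_through is hypothetical, and the inputs are assumed to be integers with distinct x-values.

```python
from fractions import Fraction


def line_through(p, q):
    """Coefficients (a0, a1) of f(x) = a0 + a1*x through the points p and q."""
    (x0, y0), (x1, y1) = p, q
    # Subtracting a0 + a1*x0 = y0 from a0 + a1*x1 = y1 eliminates a0.
    a1 = Fraction(y1 - y0, x1 - x0)
    # Substitute a1 back into the first equation to recover a0.
    a0 = y0 - a1 * x0
    return a0, a1


a0, a1 = line_through((2, 3), (7, 4))
print(a0, a1)                    # 13/5 1/5
print(a0 + a1 * 2, a0 + a1 * 7)  # 3 4
```

Plugging the solution back into both equations, as the last line does, is the quickest way to catch an arithmetic slip in a hand computation.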

In this case we're finding a strange way to state a fact we already
know, that there is a unique line between any two points. Well, it's
not quite the same fact. What is different about this scenario? The
statement of the theorem said $x_0 < x_1 < \cdots < x_n$. In our example,
this means we require $x_0 < x_1$. So this is where we run a sanity check.
What happens if $x_0 = x_1$? Think about it, and if you can't tell then
you should try to prove it wrong: try to find a degree 1 polynomial
passing through the points (2, 3), (2, 5).
The problem could be that there is no degree 1 polynomial passing
through those points, violating existence. Or, the problem might be
that there are many degree 1 polynomials passing through these two
points, violating uniqueness. It's your job to determine what the
problem is. And despite it being pedantic, you should work straight
from the definition of a polynomial! Don't use any mnemonics or
heuristics you may remember; we're practicing reading from precise
definitions.


Just in case you're still stuck, let's follow our pattern from before.
Saying a degree 1 polynomial passes through these two points is equivalent to saying, if we call $a_0 + a_1 x$ our polynomial, that there is a
simultaneous solution to the following two equations f(2) = 3 and
f(2) = 5.
$$a_0 + a_1 \cdot 2 = 3$$
$$a_0 + a_1 \cdot 2 = 5$$

What happens when you try to solve these equations like we did
before?
What about for three points or more? Well, that's the point at which
it might start to get difficult to compute. You can try by setting up
equations like those I wrote above, and with some elbow grease you'll
probably make it work. Such things are best done in private so you
can make plentiful mistakes without being judged for it.
Now that we've worked out two examples of the theorem in action,
let's move on to the proof. The proof will have two parts, the existence
part and the uniqueness part. That is, first we'll show that a polynomial
satisfying the requirements exists, and then we'll show that if two
polynomials both satisfied the requirements, they'd have to be the
same. In other words, there can only be one polynomial with that
property. In this chapter we'll spell out the details of the proof from
the start. In later chapters we'll gradually become more terse, with the
goal of training you to spot and fill in the details.
Existence. We will show existence by direct construction. What I
mean by that is we'll be clever and find a general way to write down
a polynomial that works. Being clever sounds scary, but the process is
actually quite natural, and it follows the same pattern as we did for
reading and understanding definitions: you start with the simplest
possible example (but this time the example will be generic) and then
you work up to more complicated examples. By the time we get to
n = 2 we will notice a pattern (this is the clever part), and that pattern
will suggest a big formula for the general solution, which we will prove
is correct. In fact, once we understand how to build the big formula,
the proof that it works will be trivial.
Let's start with a single point $(x_1, y_1)$ and n = 0. I'm not specifying
the values of $x_1$ or $y_1$ because I don't want the construction to depend
on my arbitrary specific choices. I want to ensure that $f(x_1) = y_1$ and
that the polynomial has degree zero. So of course I need to make the
constant term of the polynomial $y_1$ and the rest zero.
$$f(x) = y_1$$
On to two points. Call them $(x_1, y_1), (x_2, y_2)$ (note the variable is
just plain x). Now here's an interesting idea: I can write the polynomial
in this strange way:
$$f(x) = y_1 \frac{x - x_2}{x_1 - x_2} + y_2 \frac{x - x_1}{x_2 - x_1}$$
First let's verify that this works. If I evaluate f at $x_1$, the second
term gets $x_1 - x_1 = 0$ in the numerator and so the second term is zero.
The first term, however, becomes $y_1 \frac{x_1 - x_2}{x_1 - x_2} = y_1 \cdot 1$, which is what we
wanted: we gave $x_1$ as input and the output was $y_1$. Also note that we
have explicitly disallowed $x_1 = x_2$ by the conditions in the theorem,
so we can't get something nonsensical like 0/0.
Likewise, if you evaluate $f(x_2)$ the first term is zero and the second
term evaluates to $y_2$. So we have both $f(x_1) = y_1$ and $f(x_2) = y_2$, and
the expression is a degree 1 polynomial. How do I know it's degree
one when I wrote f in that strange way? (It will not be so strange in a
moment.) Well, I could instead write it like this
$$f(x) = \frac{y_1}{x_1 - x_2} (x - x_2) + \frac{y_2}{x_2 - x_1} (x - x_1),$$
and simplify with typical algebra to get the form required by the
definition:
$$f(x) = \frac{x_1 y_2 - x_2 y_1}{x_1 - x_2} + \frac{y_1 - y_2}{x_1 - x_2} x$$
Indeed, instead of doing all that algebra I could have just recognized
that no powers of x appear in the formula for f that are larger than 1,
and I'm never multiplying two x's together. Since these are the only
ways to get degree bigger than 1, if I want I can skip the algebra and
be confident that the degree is 1.
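A quick way to test the two-point formula is to evaluate it on concrete numbers. The sketch below reuses the points (2, 3) and (7, 4) from the earlier example; the function name two_point is hypothetical, and exact fractions are used so the comparisons are not blurred by roundoff.

```python
from fractions import Fraction


def two_point(x1, y1, x2, y2):
    """Return f with f(x) = y1*(x - x2)/(x1 - x2) + y2*(x - x1)/(x2 - x1)."""

    def f(x):
        return (y1 * Fraction(x - x2, x1 - x2)
                + y2 * Fraction(x - x1, x2 - x1))

    return f


f = two_point(2, 3, 7, 4)
print(f(2), f(7))         # 3 4

# Read off the form a0 + a1*x: a0 = f(0), and a1 = f(1) - f(0) for a line.
print(f(0), f(1) - f(0))  # 13/5 1/5
```

The first line confirms that each constraint is isolated in its own term; the second recovers the same line one gets by solving the two equations directly.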
The key to the above idea, and the reason we wrote it down in
that strange way, is so that each constraint (e.g. $f(x_1) = y_1$) could be
isolated in its own term, while all the other terms evaluate to zero. For
three points $(x_1, y_1), (x_2, y_2), (x_3, y_3)$ we just have to beef up the terms
to maintain the same property: when you plug in $x_1$, all terms except
the first evaluate to zero and the fraction in the first term evaluates
to 1. When you plug in $x_2$, the second term is the only one that stays
nonzero, and likewise for the third. Here is the generalization that
does the trick.
$$f(x) = y_1 \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)} + y_2 \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)} + y_3 \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}$$

Now you come in. Evaluate f at $x_1$ and verify that the second
and third terms are zero, and that the first term simplifies to $y_1$. The
symmetry in the formula should convince you that the same holds
true for $x_2, x_3$ without having to go through all the steps two more
times.
Again, it's clear that the polynomial we defined has degree at most 2, because
each term consists of a product of two factors $(x - x_i)$ and taking their
product gives degree two. This has saved me the effort of multiplying
all that nonsense out to get something in the form of the definition.
So now the general form for n + 1 points $(x_0, y_0), \dots, (x_n, y_n)$ should
follow the same pattern. Add up a bunch of terms, and for the i-th
term you multiply $y_i$ by a fraction you construct according to the rule:
the numerator is the product of $x - x_j$ for every j except i, and the
denominator is a product of all the $(x_i - x_j)$ for the same j's as the
numerator. It works for the same reason that our formula works for
three terms above. In fact, the process is clear enough that you could
write a program to build these polynomials quite easily, and we'll walk
through such a program together at the end of the chapter.
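As a preview, here is a minimal Python sketch of the rule just described. It is not the program the chapter builds later; the name interpolate is hypothetical, the inputs are assumed to be integers or fractions, and the function only evaluates the polynomial rather than computing its coefficients.

```python
from fractions import Fraction


def interpolate(points):
    """Return f(x) = sum over i of y_i * prod over j != i of (x - x_j)/(x_i - x_j)."""
    xs = [x for x, _ in points]
    if len(set(xs)) != len(xs):
        raise ValueError("x-values must be distinct")

    def f(x):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(points):
            term = Fraction(yi)
            for j, (xj, _) in enumerate(points):
                if j != i:  # every j except i
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total

    return f


# Three points taken from the parabola y = x^2.
f = interpolate([(1, 1), (2, 4), (3, 9)])
print(f(1), f(2), f(3), f(4))  # 1 4 9 16
```

The value at x = 4 was never supplied as a constraint, yet the construction returns 16, which is what you would expect if the constructed polynomial is $x^2$ itself.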
Here is the notation version of the process we just described in
words. It's a mess, but we'll break it down.
$$f(x) = \sum_{i=0}^{n} y_i \left( \prod_{j \neq i} \frac{x - x_j}{x_i - x_j} \right)$$
What a mouthful! I'll assume the $\sum$, $\prod$ symbols are new to you.
They are read semantically as "sum" and "product," or typographically
as "sigma" and "pi." They essentially represent loops of arithmetic.
That is, if I have a statement like $\sum_{i=0}^{n} (\text{expr})$, it is equivalent to the
following code snippet.

int i;
sometype theSum = defaultValue;
for (i = 0; i <= n; i++) {
theSum += expr;
}
return theSum;

I wrote it this way because defaultValue is whatever the understood "zero object" is in that setting, and there are lots of different
settings where addition is defined. For numbers the default is zero,
but if we're summing over vectors it would be the zero vector, or if
we're on this exotic thing called an elliptic curve it would be a base
point. The point is that the $\sum$ notation does not imply a specific type
of the thing being summed. Moreover, writing it this way allows me
to define the $\prod$ symbol by analogy: you just replace += with *= and
reinterpret the default value (for numbers it would be 1). Functional
programmers will know this pattern well, because it's just a "fold"
operation.
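To see the fold explicitly, here is a short Python sketch using functools.reduce from the standard library. The wrapper names big_sum and big_product, and their zero and one parameters, are assumptions for this illustration.

```python
from functools import reduce
import operator


def big_sum(terms, zero=0):
    """Sigma: fold the terms together with +, starting from the zero object."""
    return reduce(operator.add, terms, zero)


def big_product(terms, one=1):
    """Pi: the same fold with * and the multiplicative default value."""
    return reduce(operator.mul, terms, one)


print(big_sum(i * i for i in range(0, 4)))   # 0 + 1 + 4 + 9 = 14
print(big_product(range(1, 5)))              # 1 * 2 * 3 * 4 = 24
print(big_sum([]), big_product([]))          # 0 1
```

The last line shows why the default value matters: an empty sum or product returns it unchanged, and passing a different zero (say, a zero vector type that supports +) changes the setting without changing the loop.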
Finally, when I say something like $\prod_{j \neq i}$ there are three extra things.
First, recall that in this context i is fixed by the outer loop, so j is the
looping variable. Unfortunately the reader is required to keep track
of scope when it comes to indices in sums and products. Second, the
bounds on j are not stated; we have to infer them from the context.
There are two hints: we're comparing j to i, so it should probably have
the same range as i unless otherwise stated, and we can see where in
the expression we're using j. We're using it as an index to the x-values
of the points $x_j$. Since the x-values go from 0 to n, we'd expect j to
have that range. It might seem totally unrigorous to a programmer, but
if mathematicians consider it "easy" to infer the intent of a notation,
then it is considered rigorous enough.
Though it sometimes makes me cringe to say it, you have to give the
author the benefit of the doubt. When things are ambiguous, you have
to pick the option that doesn't break the math. In this respect, you have
to act as the tester, the compiler, and the bug fixer all at once when you're
reading math. The best default assumption is that the author is far
smarter than we are, and if we the reader don't understand something,
it's likely a user error and not a bug.

(Footnote: Another reason is that mathematicians get tired of writing these obvious details out
over and over again.)
Finally, the $j \neq i$ part is an implied filter on the range of j. Inside
the for loop you add an extra if statement to skip that iteration if
j = i. Read out loud, $\prod_{j \neq i}$ would be "the product over j not equal
to i." So the formula we gave above is really just a compact way of
describing the pattern we said in words.
If we wanted to write out the product-nested-in-a-sum as a nested
loop, it would look like this:
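int i, j;
sometype theSum = defaultSumValue;
for (i = 0; i <= n; i++) {
    // Inner loop: the product over all j not equal to i.
    othertype innerProduct = defaultProductValue;
    for (j = 0; j <= n; j++) {
        if (j == i) {
            continue;  // the j != i filter
        }
        innerProduct *= foo(i, j);
    }
    // Accumulate the i-th term while still inside the outer loop.
    theSum += bar(i) * innerProduct;
}
return theSum;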

Compare this with the notational version above and make sure you
can connect the structural pieces, setting aside the fact that we aren't
using $x_j$ or $y_i$ at all in the code: foo(i, j) stands in for the fraction
$(x - x_j)/(x_i - x_j)$ and bar(i) stands in for $y_i$.
One difficulty of reading mathematics is that the author will almost
never go through these details for the reader. It's a rather subtle point
to be making so early, but it's probably the first thing you notice when
you read math books. Instead of doing the details, a typical proof of
the existence of these polynomials looks like this.
Proof. Let $(x_0, y_0), \dots, (x_n, y_n)$ be a list of points with no two $x_i$ the
same. To show existence, construct f(x) as
$$f(x) = \sum_{i=0}^{n} y_i \left( \prod_{j \neq i} \frac{x - x_j}{x_i - x_j} \right)$$
Clearly the constructed polynomial f(x) has degree at most n because each
term has degree n. For each i, plugging in $x_i$ kills all but the i-th term
in the sum, and the i-th term clearly evaluates to $y_i$, as desired.
. . . Uniqueness part . . . □
The square □ is called a tombstone and marks the end of a proof.
So the proof writer just gives a brief overview and you are expected
to fill in the details to your satisfaction. It sucks, but if you do what's
expected of you (that is, write down examples of the construction
before reading on) then the explanation is as clear as it needs to be. If
not, then your job is to evaluate the statements made in the proof on
your examples. The more you practice this the better you get at judging
how much work you need to put into understanding a construction or
definition before continuing. And, more importantly, you understand
it more thoroughly for all your testing. It's the same process you go
through when learning a new programming language feature.
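One way to evaluate the proof's claims on an example is to tabulate the fraction from each term at each of the given x-values. The Python sketch below does this for three arbitrarily chosen x-values; the name basis_term is hypothetical.

```python
from fractions import Fraction


def basis_term(xs, i, x):
    """The fraction prod over j != i of (x - x_j)/(x_i - x_j) from the i-th term."""
    result = Fraction(1)
    for j, xj in enumerate(xs):
        if j != i:
            result *= Fraction(x - xj, xs[i] - xj)
    return result


xs = [2, 5, 7]
# Row k lists the fractions of terms 0, 1, 2 evaluated at x = xs[k].
table = [[basis_term(xs, i, xk) for i in range(len(xs))] for xk in xs]
for row in table:
    print([int(v) for v in row])
# [1, 0, 0]
# [0, 1, 0]
# [0, 0, 1]
```

Each row has a single 1 and zeros elsewhere, which is the statement that plugging in $x_i$ kills all but the i-th term and leaves that term equal to $y_i$.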
Uniqueness. Now for the uniqueness part. This is a straightforward
proof, but it relies on a special fact about polynomials.
We'll state the fact as a theorem that we won't prove. Some terminology: a root of a polynomial $f : \mathbb{R} \to \mathbb{R}$ is a value z for which f(z) = 0.
Also, the zero polynomial is the polynomial whose coefficients are all
zero. By convention, the degree of the zero polynomial is declared to
be -1.
Theorem. The only polynomial over $\mathbb{R}$ of degree at most n which has
more than n distinct roots is the zero polynomial.
On to the proof. It works by supposing we actually have two polynomials f and g, both of degree at most n, passing through the desired set of
points. We don't assume we know anything else about the polynomials ahead of time. They could be different, or they could be the same.
If you wrote down two different looking polynomials with the two
properties, they might just look different (maybe one is in factored
form). So the proof operates by making no other assumptions, and
showing that actually f and g have to be the same.

"Kills" is a standard term for "evaluates to zero."


So suppose f, g are two such polynomials. Let's look at the polynomial (f − g)(x). I hope it's clear what I mean by f − g. If we use the
list definition (Definition ) then it's just the entrywise difference
of the coefficients. If we use the structural definition then we can
define it as (f − g)(x) = f(x) − g(x). Either way, it's important that
f − g is a polynomial.
Now what do we know about f − g? Well, it's certainly got degree at
most n, because you can't magically produce a coefficient of x^7 if you
subtract two polynomials whose highest-degree terms are x^5. Moreover, you know that (f − g)(x_i) = 0 for all i. This is by assumption:
for every i, f(x_i) = g(x_i) = y_i, so subtracting them gives zero.
Now we apply the fact about polynomials. If we call d the degree of
f − g, we know that d ≤ n, and hence that f − g can have no more
than d roots unless it's the zero polynomial. But there are n + 1 many
points x_i where f − g is zero, and n + 1 > n ≥ d. The conclusion is
that f − g must be the zero polynomial, meaning f and g have the
same coefficients.
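To see the subtraction argument on a small case, here is a minimal sketch that works directly with the list definition (plain Python lists of coefficients, constant term first). The helper names `subtract`, `multiply`, and `evaluate` are made up for this illustration and are not the book's polynomial class.

```python
from itertools import zip_longest


def subtract(f, g):
    """Entrywise difference of two coefficient lists (constant term first)."""
    return [a - b for a, b in zip_longest(f, g, fillvalue=0)]


def multiply(f, g):
    """Product of two nonempty coefficient lists."""
    result = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            result[i + j] += a * b
    return result


def evaluate(f, x):
    """Evaluate a coefficient list at x using Horner's rule."""
    total = 0
    for coefficient in reversed(f):
        total = total * x + coefficient
    return total


# Two "different looking" degree 2 polynomials through (1, 0), (2, 0), (3, 2):
f = [2, -3, 1]                   # expanded form: 2 - 3x + x^2
g = multiply([-1, 1], [-2, 1])   # factored form: (x - 1)(x - 2)

difference = subtract(f, g)
print(difference)                                     # [0, 0, 0]
print([evaluate(difference, x) for x in (1, 2, 3)])   # [0, 0, 0]
```

The difference has three roots but degree at most 2, so it has to be the zero polynomial, and indeed every coefficient comes out zero.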
Just for completeness, I'll write the above argument more briefly
and put the whole proof of the theorem together as it would show up
in a standard textbook. That is, extremely tersely.
Proof. Let (x_0, y_0), . . . , (x_n, y_n) be a list of n + 1 points with no
two x_i the same. To show existence, construct f(x) as

\[ f(x) = \sum_{i=0}^{n} y_i \left( \prod_{j \neq i} \frac{x - x_j}{x_i - x_j} \right) \]

Clearly the constructed polynomial f(x) has degree at most n because
each term has degree at most n. For each i, plugging in x_i kills all but
the i-th term in the sum, and the i-th term clearly evaluates to y_i, as
desired.
To show uniqueness, let g(x) be another such polynomial. Then
f − g is a polynomial with degree at most n which has all of the n + 1
values x_i as roots. This implies that f − g is the zero polynomial, or
equivalently that f = g.
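The "kills all but the i-th term" step is easy to check by hand or by machine. The following is a small sketch, using exact fractions and a made-up helper `basis_term`, that evaluates each product from the construction at every x value of a sample list of points. It assumes integer x values that are all distinct.

```python
from fractions import Fraction


def basis_term(xs, i, x):
    """Evaluate the product over j != i of (x - x_j) / (x_i - x_j) at x."""
    value = Fraction(1)
    for j, xj in enumerate(xs):
        if j != i:
            value *= Fraction(x - xj, xs[i] - xj)
    return value


xs = [1, 2, 7]
# Row i lists the i-th product evaluated at each of the x values in xs.
table = [[basis_term(xs, i, x) for x in xs] for i in range(len(xs))]
for row in table:
    print([int(v) for v in row])
# [1, 0, 0]
# [0, 1, 0]
# [0, 0, 1]
```

Each product is 1 at its own x value and 0 at the others, so multiplying by y_i and summing gives a polynomial that takes the value y_i at x_i.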
So basically we spent quite a few pages on expanding the details
of a ten-line proof. This is par for the course. When you encounter
a mysterious or overly brief theorem or proof it becomes your job to
expand and clarify it as needed. Much like with reading programs
written by others, as your mathematical background and experience
grows you'll need less work to fill in the details.
Now that we've shown the existence and uniqueness of a degree n
polynomial passing through a given list of n + 1 points, we're allowed
to give it a name. It's called the interpolating polynomial of the given
points. To take a list of points and find the polynomial passing through
them is to interpolate those points.

. Realizing it in code
For the sake of concreteness, let's write a Python program that interpolates points. I'm going to assume the existence of a polynomial
class that accepts as input a list of coefficients (in the same order as
Definition , starting from the constant term) and has methods for
adding, multiplying, and evaluating at a given value. All of this code,
including my own version of the polynomial class, is available at this
book's GitHub repository. Note the polynomial class is not intended
to be perfect. I'm certainly leaving the code open to floating point
rounding errors and other such things. The point of the code is not
to be industry-strength, but to help you understand the constructions
we've seen in the chapter. On to the code.
Here are some examples of constructing polynomials.
zero = Polynomial([]) # zero polynomial
f = Polynomial([1,2,3]) # 1 + 2 x + 3 x^2
g = Polynomial([-8, 17, 0, 5]) # -8 + 17 x + 5 x^3
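If you want something to run these examples against before looking at the repository, here is one minimal sketch of a class with that interface. It is an assumption about what such a class could look like, not the author's version: the attribute name `coefficients`, the choice to strip trailing zeros, and the exact printed format are all made up for this illustration.

```python
class Polynomial:
    """A polynomial stored as a list of coefficients, constant term first."""

    def __init__(self, coefficients):
        # Strip trailing zeros so the zero polynomial is the empty list.
        coefficients = list(coefficients)
        while coefficients and coefficients[-1] == 0:
            coefficients.pop()
        self.coefficients = coefficients

    def __add__(self, other):
        n = max(len(self.coefficients), len(other.coefficients))
        a = self.coefficients + [0] * (n - len(self.coefficients))
        b = other.coefficients + [0] * (n - len(other.coefficients))
        return Polynomial([x + y for x, y in zip(a, b)])

    def __mul__(self, other):
        if not self.coefficients or not other.coefficients:
            return Polynomial([])
        size = len(self.coefficients) + len(other.coefficients) - 1
        product = [0] * size
        for i, a in enumerate(self.coefficients):
            for j, b in enumerate(other.coefficients):
                product[i + j] += a * b
        return Polynomial(product)

    def __call__(self, x):
        # Horner's rule, starting from the highest-degree coefficient.
        total = 0
        for c in reversed(self.coefficients):
            total = total * x + c
        return total

    def __repr__(self):
        if not self.coefficients:
            return "0"
        terms = [str(self.coefficients[0])]
        terms += ["%s x^%d" % (c, i)
                  for i, c in enumerate(self.coefficients) if i > 0]
        return " + ".join(terms)


f = Polynomial([1, 2, 3])        # 1 + 2 x + 3 x^2
g = Polynomial([-8, 17, 0, 5])   # -8 + 17 x + 5 x^3
print(f + g)   # -7 + 19 x^1 + 3 x^2 + 5 x^3
print(f(2))    # 17
```

Defining `__add__` is what lets Python's built-in `sum` add up a list of polynomials when it is given a polynomial as its starting value.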

Now first we'll write the main interpolate function. It uses the
yet-to-be-defined function singleTerm.
# pts is a list of (float, float) of length n+1.
# Return the unique degree n polynomial that passes through pts.
def interpolate(pts):
    if len(pts) == 0:
        raise Exception("Must provide at least one point.")

    xValues = [p[0] for p in pts]
    if len(set(xValues)) < len(xValues):
        raise Exception("Not all x values are distinct.")

    terms = [singleTerm(pts, i) for i in range(0, len(pts))]
    return sum(terms, Polynomial([]))

http://github.com/j2kun/PICM

The first two blocks check for the edge cases, zero points or repeating
x-values. Finally, the last block creates a list of terms, each one being
a term of the sum from the construction. The return statement sums
all the terms, with the second argument being the starting value for
the sum, in this case the zero polynomial.
Now for the singleTerm function.
# pts is a list of (float, float)
# i is an integer index of pts.
# Return a term from the sum in the construction.
def singleTerm(pts, i):
    theTerm = Polynomial([1.])
    xi, yi = pts[i]
    for j, p in enumerate(pts):
        if j == i:
            continue
        xj = p[0]
        theTerm = theTerm * Polynomial([-xj / (xi - xj), 1.0 / (xi - xj)])
    return theTerm * Polynomial([yi])

We had to break up the linear polynomial (x − x_j)/(x_i − x_j) into
its coefficients, which gives a_0 = −x_j/(x_i − x_j) and a_1 = 1/(x_i − x_j).
The rest is just taking a product over all the corresponding terms.
And some examples:
>>> pts1 = [(1,1)]
>>> pts2 = [(1,1), (2,0)]
>>> pts3 = [(1,1), (2,4), (7,9)]
>>> interpolate(pts1)
1.0
>>> interpolate(pts2)
2.0 + -1.0 x^1
>>> f = interpolate(pts3)
>>> f
-2.666666666666666 + 3.9999999999999996 x^1 + -0.3333333333333334 x^2
>>> [f(xi) for xi,yi in pts3]
[1.0, 3.999999999999999, 8.999999999999993]

Ignoring the rounding errors, we can see the interpolation is correct.
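If the rounding errors bother you, one option is to redo the same construction with exact rational arithmetic. The following is a sketch under that assumption, not part of the book's code: `interpolate_exact` is a made-up name, it returns a plain list of coefficients (constant term first) rather than a polynomial object, and it expects integer coordinates.

```python
from fractions import Fraction


def interpolate_exact(pts):
    """Coefficients (constant term first) of the interpolating polynomial,
    computed with exact fractions. Assumes integer, distinct x values."""
    n = len(pts)
    result = [Fraction(0)] * n
    for i, (xi, yi) in enumerate(pts):
        term = [Fraction(yi)]  # running product, starts as the constant y_i
        for j, (xj, _) in enumerate(pts):
            if j == i:
                continue
            scale = Fraction(1, xi - xj)
            # Multiply term by (x - xj) / (xi - xj).
            shifted = [Fraction(0)] + term                      # term * x
            lowered = [-xj * c for c in term] + [Fraction(0)]   # term * (-xj)
            term = [(a + b) * scale for a, b in zip(shifted, lowered)]
        for power, c in enumerate(term):
            result[power] += c
    return result


print(interpolate_exact([(1, 1), (2, 4), (7, 9)]))
# [Fraction(-8, 3), Fraction(4, 1), Fraction(-1, 3)]
```

The exact coefficients −8/3, 4, and −1/3 are what the floating point output above is approximating.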

. Application: sharing secrets


And now, as promised, we'll see how to use polynomial interpolation
to share secrets in a secure way. Here's the scenario. Say I have five
daughters, and I want to share a secret with them, represented as a
binary string and interpreted as an integer. Perhaps the secret is the
key code for a safe which contains my will. The problem is that my
daughters are greedy, and I know that if I just give them the secret
they'll do something nefarious, like forge a modified will that leaves
them all my riches and destroy the original.
Moreover, I'm afraid to even give them part of the key code. They
might be able to guess the rest and gain access. Even worse, three
of the daughters might get together with their pieces of the key code
and then they'd really have a good chance of guessing the rest and
excluding the other two daughters. So what I really want is a scheme
that has the following properties.
1. Each daughter gets a share, i.e. some string unique to them.
2. If any four of the daughters get together, they cannot use their
shares to reconstruct the secret.
3. If all five of the daughters get together, they can reconstruct the
secret.
In fact, I'd be happier if I could prove, not only that any four out of
the five daughters couldn't pool their shares to determine the secret,
but that they'd provably have no information at all about the secret.
They can't even determine a single bit of information about the secret,
and they'd have an easier time brute forcing the key code on the safe.
The magical fact is that there is such a scheme. Not only is it possible,
but it's possible no matter how many daughters I have (say, n), and no
matter what minimum size group I want to allow to reconstruct the
secret (say, k). So I might have daughters , and I may want any

My family clearly has issues.


I've been busy.


of them to be able to reconstruct the secret, but prevent any group of


or fewer from doing so.
Polynomial interpolation gives us all of these guarantees. Here is the
scheme. First represent your secret s as an integer. Now construct a
random polynomial f(x) so that f(0) = s. We'll say in a moment what
degree d to use for f(x). But if we know what d we want, generating f
is easy to do, since f(0) = s precisely when the first coefficient a_0 of f
is s. So you just randomly pick the other coefficients. The shares you
distribute are values of f(x) at various points. Say, f(1), f(2), . . . , f(n)
if you have n people in your scenario. To person i you give the point
(i, f(i)) as their share.
What do we know about subsets of points? Well, if any k people get
together and share their points, they can construct the unique degree
k − 1 polynomial g(x) passing through all those points. The question
is, will g(x) be the same as f(x)? If so, they can compute g(0) to get
the secret!
This is where we pick d, to control how many shares are needed. If
we want k to be the minimum number of shares needed to reconstruct
the secret, we make our polynomial be degree d = k − 1. Then if k
people get together and reconstruct g(x), they can appeal to Theorem
to be sure that g(x) = f (x). For example, a degree polynomial would
prevent any trio of people from reconstructing f (x), and a degree
polynomial would stop any group of size 17 from obtaining f (x).
Let's be more explicit and write down an example. Say we have
n = 5 daughters, and we want any k = 3 of them to be able to
reconstruct the secret. Then we pick a polynomial f(x) of degree
d = k − 1 = 2, and we make sure f(0) is the secret. Say the secret is
109, then we would generate f as

f(x) = 109 + random · x + random · x^2
Note that if you're going to actually use this to distribute secrets
that matter, you need to be a bit more careful about the range of these
random numbers. For the sake of this example let's say they're random
-bit integers, but in reality you'd want to do everything with modular
arithmetic.
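As a rough sketch of the share generation step over plain integers, here is one way it could look. The function name `make_shares` and the `coefficient_bits` parameter (with its default of 32) are assumptions made for this illustration and say nothing about what range is actually safe. The random leading coefficient could also come out zero, in which case the polynomial has degree less than k − 1.

```python
import secrets


def make_shares(secret, k, n, coefficient_bits=32):
    """Split an integer secret into n shares so that k of them determine it.

    coefficient_bits is an illustrative choice, not a security recommendation.
    """
    if not 1 <= k <= n:
        raise ValueError("Need 1 <= k <= n.")
    # f(0) is the secret; the other k - 1 coefficients are random.
    coefficients = [secret]
    coefficients += [secrets.randbits(coefficient_bits) for _ in range(k - 1)]

    def f(x):
        # Horner's rule, starting from the highest-degree coefficient.
        total = 0
        for c in reversed(coefficients):
            total = total * x + c
        return total

    # Person i receives the point (i, f(i)).
    return [(i, f(i)) for i in range(1, n + 1)]


shares = make_shares(109, k=3, n=5)
print(shares)  # five (i, f(i)) pairs; the second entries change on every run
```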
Then we distribute one point to each daughter as their share.

(1, f (1)), (2, f (2)), (3, f (3)), (4, f (4)), (5, f (5))
To give concrete numbers to the examples, if

f(x) = 109 − 55x + 271x^2

then the secret is f(0) = 109 and the shares are

(1, 325), (2, 1083), (3, 2383), (4, 4225), (5, 6609).
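These numbers are quick to double check. The snippet below is just that arithmetic written out, with the example polynomial hard-coded as an ordinary Python function.

```python
def f(x):
    """The example polynomial f(x) = 109 - 55x + 271x^2."""
    return 109 - 55 * x + 271 * x ** 2


shares = [(i, f(i)) for i in range(1, 6)]
print(f(0))    # 109
print(shares)  # [(1, 325), (2, 1083), (3, 2383), (4, 4225), (5, 6609)]
```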
The polynomial interpolation theorem tells us that with any three
points we can completely reconstruct f(x), and then plug in zero to
get the secret.
For example, using our polynomial interpolation algorithm, if we
feed in the first, third, and fifth shares we reconstruct the polynomial
exactly:
>>> pts = [(1, 325), (3, 2383), (5, 6609)]
>>> interpolate(pts)
109.0 + -55.0 x^1 + 271.0 x^2
>>> f = interpolate(pts); int(f(0))
109
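Nothing is special about the first, third, and fifth shares. As a sketch, the following tries every group of three shares and evaluates the interpolating polynomial at zero directly, using exact fractions to avoid rounding. The helper name `secret_from_shares` is made up for this illustration.

```python
from fractions import Fraction
from itertools import combinations


def secret_from_shares(shares):
    """Evaluate the interpolating polynomial of the shares at x = 0."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(shares):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(shares):
            if j != i:
                term *= Fraction(0 - xj, xi - xj)
        total += term
    return total


all_shares = [(1, 325), (2, 1083), (3, 2383), (4, 4225), (5, 6609)]
recovered = {secret_from_shares(trio) for trio in combinations(all_shares, 3)}
print(recovered)  # {Fraction(109, 1)}
```

All ten possible trios produce the same value, 109.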

At this point you should be asking the question: how do I know
there's not some other way to get f(x) (or even just f(0)!) if you
have fewer than d + 1 = k points? You should clearly understand
the claim being made. It's not just that we can reconstruct f(0) when
given enough points on f, but also we're claiming impossibility, that
no algorithm can reconstruct f(0) with fewer than k points.
Indeed it's true, and I'll make two little claims to show why. Say f is
degree d and you have even d points (just one fewer than the theorem
requires to get uniqueness). The first claim is that there are infinitely
many different degree d polynomials g passing through those d points.
Indeed, if you pick any new x value, say x = 0, and any y value and
you add (x, y) to your list of points, then you get an interpolating
polynomial for that list. Moreover, for each choice of y you get a
different interpolating polynomial (this is trivial: the polynomials will
pass through different points at x = 0).
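Here is a small sketch of that first claim, using two of the shares from the example above as the d = 2 known points. The helper `lagrange_eval` is a made-up name; it evaluates the interpolating polynomial exactly with fractions. Each choice of y at x = 0 gives a polynomial that still passes through both known points.

```python
from fractions import Fraction


def lagrange_eval(pts, x):
    """Evaluate the interpolating polynomial of pts at x, exactly."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(pts):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(pts):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


known = [(2, 1083), (5, 6609)]  # only d = 2 points of a degree 2 polynomial
for y in (0, 109, 533, 10**6):
    pts = known + [(0, y)]
    values = [lagrange_eval(pts, x) for x in (0, 2, 5)]
    print(y, [int(v) for v in values])
# 0 [0, 1083, 6609]
# 109 [109, 1083, 6609]
# 533 [533, 1083, 6609]
# 1000000 [1000000, 1083, 6609]
```

The two known points alone cannot tell these candidates apart.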
The second claim is a consequence of the first. If you only have d
points, then for any string x that you think might be the secret, there
is a valid (d + 1)-th point that you could add to the list to make x
the correct decoded value f(0).
Let's think about this last point. Say your secret is an English
sentence s = "Hello, world!" and you encode it with a degree 10
polynomial f(x) so that f(0) is a binary representation of s. If y
is the complete text of The Art of Computer Programming, Volume I,
and you give me 10 points f(1), f(2), . . . , f(10), I could have just as
easily chosen my secret to be y instead of s, and made a polynomial
for which the same 10 points occur as f(1) through f(10)! In other
words, your knowledge of the 10 points gives you no information to
distinguish between whether the secret is s or y. If you try decoding
it, you might get sensible English, but you'll never be able to tell it's
right because any sensible English text could be the right answer, as
well as all possible junk text!
To drive this point home, let's go back to our small example secret
109 and encoded polynomial

f(x) = 109 − 55x + 271x^2

and say I now give you just two points: (2, 1083), (5, 6609) and I
give you a desired fake decrypted message, say 533. Then my claim
is that I can come up with a polynomial that has f(2) = 1083 and
f(5) = 6609, and also f(0) = 533. Indeed, we already wrote the code
to do this!
>>> pts = [(2, 1083), (5, 6609)]
>>> interpolate(pts + [(0, 533)])
533.0 + -351.7999999999999 x^1 + 313.4 x^2
>>> f = interpolate(pts + [(0, 533)]); int(f(0))
533

You should notice that the coefficients of the fake secret polynomial are no longer integers, but this problem is fixed when you do
everything with modular arithmetic instead of floating point numbers.
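To give a feel for what "everything with modular arithmetic" could look like, here is a minimal sketch, not a vetted implementation. The prime 2^127 − 1 and the names `make_shares_mod` and `reconstruct_mod` are assumptions chosen for this illustration. Division is replaced by multiplying with a modular inverse, computed here with Fermat's little theorem.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime, chosen here only for illustration


def make_shares_mod(secret, k, n, p=P):
    """Shares of a random polynomial of degree at most k - 1, taken mod p."""
    coefficients = [secret % p]
    coefficients += [secrets.randbelow(p) for _ in range(k - 1)]

    def f(x):
        total = 0
        for c in reversed(coefficients):
            total = (total * x + c) % p
        return total

    return [(i, f(i)) for i in range(1, n + 1)]


def reconstruct_mod(shares, p=P):
    """Evaluate the interpolating polynomial of the shares at 0, mod p."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        numerator, denominator = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                numerator = numerator * (-xj) % p
                denominator = denominator * (xi - xj) % p
        # Dividing mod p means multiplying by the inverse: d^(p-2) mod p.
        inverse = pow(denominator, p - 2, p)
        secret = (secret + yi * numerator * inverse) % p
    return secret


shares = make_shares_mod(109, k=3, n=5)
print(reconstruct_mod(shares[:3]))                          # 109
print(reconstruct_mod([shares[0], shares[2], shares[4]]))   # 109
```

Every share and every coefficient is now an integer between 0 and p − 1, so there are no fractions or rounding errors to worry about.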
This property raises some interesting security questions. For example, if the secret is, say, the text of a document instead of the key code
to a safe, and if one of the greedy daughters sees the shares of two
others before revealing her own, she could conceivably come up with
a share that produces whatever decoded message she wants, such
as a will giving her the entire inheritance!

This property of being able to decode any possible plaintext given
an encrypted text is called perfect secrecy, and it's an early topic on a
long journey through mathematical cryptography.

. Cultural Review
Here were the main cultural points introduced in this chapter:
1. In math we have many definitions for an object; we prefer to
work with the definition that is easiest to keep neat in our head,
and we often don't say when we switch between two views.
2. Whenever you see a definition you must immediately write
down examples. They are your test cases for the rest of the text.
3. In mathematics we place emphasis on elegance over utility.

. Exercises
1. Look up a proof of Theorem . There are many different proofs.
Either read one and understand it using the techniques we described in this chapter (writing down examples and tests), or if
you cannot then write down the words in the proofs that you
don't understand and look for them later in this book.
2. Write down examples for the following definitions: Two integers
a, b are said to be relatively prime if their only common divisor
is 1. Let n be a positive integer, and define by φ(n) the number
of positive integers less than n that are relatively prime to n.
3. Verify the following theorem using the examples from the previous exercise. If a, n are relatively prime integers, then a^φ(n) has
remainder 1 when dividing by n. This result is known as Euler's
theorem (pronounced "OY-lurr"), and it is the keystone of the
RSA cryptosystem.
4. Write a web app that implements the distribution and reconstruction of the secret sharing protocol using the polynomial
interpolation algorithm presented in this chapter, and beef it up
by fixing a -bit prime p and doing modular arithmetic modulo
p.

See this blog post for more: http://jeremykun.com/2011/07/29/encryption-rsa/

. Chapter Notes
Which are polynomials?
The polynomials were f(x), g(x), h(x), j(x), and l(x). The reason i(x)
is not a polynomial is because √x = x^(1/2) does not have an integer
power. Similarly, k(x) is not a polynomial because its terms have
negative integer powers. Finally, m(x) is not because its powers, π, e,
in addition to being very scary, are not integers. Of course, if you were
to define π and e to be particular variables that happened to be integers,
then the result would be a polynomial. But without any indication
you're supposed to infer that they're the famous constants.
