Polynomials
The only way to learn mathematics is to do mathematics.
Paul Halmos
Let's start our journey with polynomials. In studying polynomials,
we'll reveal some of the implicit assumptions behind mathematical definitions, work carefully through two nontrivial proofs, and learn
how to share secrets using something called polynomial interpolation.
Definition. A polynomial with real coefficients is a function f that takes a real number as input, produces a real number as output, and has the form

f(x) = a_0 + a_1 x + a_2 x^2 + ... + a_n x^n,

where all the a_i are real numbers. The a_i are called the coefficients of f.
The degree of the polynomial is the integer n.
Let's break this definition down a bit. When I say f(x) = a_0 + a_1 x +
a_2 x^2 + ... + a_n x^n, I am implying a few things. First, that x is the input
of the function and not part of the data that makes up the polynomial.
Second, the ellipses imply a repeated pattern all the way up to n. You
as the reader are supposed to be able to pick up on the pattern, but the
author usually doesn't make it particularly complicated to do so. In
this case, it's supposed to be clear that the next term is a_3 x^3 and that the
index and exponent increase by one at every step until n.
The key here is that, even though polynomials are really defined by
the numbers that make up their coefficients (the choice of n and the
numbers a_i), polynomials at their heart are functions with a specific
structure. The input is multiplied and added together to produce
an output. The class of all polynomials as defined above consists of
every possible function of a single input that can be expressed using
multiplication and addition. Further, if you were to allow more than
one input (multivariable polynomials), this would represent all possible
multiply-and/or-add functions on any number of inputs.
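To make this concrete, here is a small Python sketch (my own illustration, not code we will reuse later) showing the two views side by side: the polynomial as data, a list of coefficients, and the polynomial as a function that multiplies and adds its input.

# A polynomial as data: a list of coefficients [a_0, a_1, ..., a_n].
coefficients = [2, 0, 3]  # represents f(x) = 2 + 3x^2

# The same polynomial as a function: multiply and add the input.
def evaluate(coeffs, x):
    total = 0
    for power, a in enumerate(coeffs):
        total += a * x ** power
    return total

print(evaluate(coefficients, 2))  # 2 + 3*4 = 14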
This polynomial example may seem frivolous as an illustration of the difference between an object and its definition, but the same pattern often lurks
behind more complicated definitions. First the author will start with a
definition that seems (to them, with the hindsight of years of study) to
be the most useful way to communicate the idea behind the concept.
For us that's the definition above. Often these definitions seem totally
useless from a programming perspective.
Then ten pages down the line the author introduces a second definition which turns out to be equivalent to the first. Any properties
defined in the first definition automatically hold in the second and vice
versa; they're different perspectives on identical things. The second
definition is the one that allows for nice programs. You might think
the author was crazy not to start with the second definition, but then
later down the line it's the first definition that really sticks, generalizes, or
guides you through proofs.
As a sneak peek, we'll see this happen when we introduce linear
algebra in Chapter ??. You find out that linear maps are equivalent to
matrices in some formal sense. And while linear maps are easy to conceptualize, the corresponding operations on matrices are complicated
and best suited for a computer. In this sense, matrices are analogous to
the data definition of a polynomial. And, as a mathematician would
argue, you can't see the elegance or truly grok linear algebra if you
only ever see the data definition.
In any case, the point is that we can convert between these two
representations of a polynomial whenever we need to.
Like testing, part of the point of writing down examples is to weed out those little edge cases,
such as the fact that the exponents need to be nonnegative integers,
though I only stated it implicitly in the definition. What seems like a
strange edge case can just be a matter of taste, sometimes signified by
the phrase "by convention," or it can be a key ingredient in the concept
itself. Either way, having the examples in your pocket as you continue
to read is important, and coming up with the examples yourself is what
helps you internalize the concept.
Before we move on, let's spend a second clarifying the syntax of
a definition. Every definition emphasizes the thing being defined in
some way. Either the entire text of the definition is in italics and the
defined term is not, or vice versa. When written on a chalkboard, one
underlines the new term. When the thing being defined is a function,
one usually uses the compact arrow notation f : A → B to describe the
allowed inputs and outputs. All possible inputs are collectively called
the domain, and all possible outputs are called the range. However, if
we call f : A → B the function, then B is not called the range, but
rather the codomain. The reason we make a distinction is essentially
syntax versus semantics. Think of it as a type signature or function
header like
float f(int x) {
...
}
We know that f outputs floats, but it might not output every floating point number if you try all possible int inputs. So the codomain
would be all floats, but the range would depend on the semantics of
the function. It's the same for a mathematical function f : A → B.
All of this is essentially to say that function application is syntactically
the same as in most programming languages, and the only differences
are in terminology and the notation f : A → B.
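In Python terms, a hypothetical sketch of the same distinction (the function here is made up purely for illustration) would be:

def halve(x: int) -> float:
    # Codomain (declared output type): all floats.
    # Range (values actually produced): only the floats n / 2 for integers n.
    return x / 2.0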
Because mathematicians were not originally constrained by ASCII,
they developed other symbols for types. The one for the reals is R. So
the arrow notation we'll use for polynomials is f : R → R. Moreover,
saying a polynomial is defined "over the reals" doesn't have to do with the domain or codomain;
it refers to the coefficients being real numbers.
This set of all points is a ridiculous set you could never ever write
down explicitly, but it's just as valid a representation of a polynomial
as any other for the purpose of reasoning.
To say a function f(x) passes through a point (a, b) means that f(a) = b. When
we say this we're thinking of f as a geometric curve. It's passing through the point
because we imagine ourselves as a dot on the curve moving along it. That perspective
likely comes from calculus, but at least it allows for suggestive language in place of
notation.
We can find it by writing down the definition for a polynomial in this special case and
solving the two resulting equations.
Alright. The definition of a degree 1 polynomial is a function of the
form

f(x) = a_0 + a_1 x.

And so, writing down the two equations f(2) = 3 and f(7) = 4, we are
trying to simultaneously solve:

a_0 + 2 a_1 = 3
a_0 + 7 a_1 = 4

If we solve for a_0 in the first equation, we get a_0 = 3 - 2a_1. Substituting that into the second equation, we get (3 - 2a_1) + 7a_1 = 4.
If we solve this we get a_1 = 1/5. Plugging this back into the first
equation gives a_0 = 3 - 2/5 = 13/5. This has forced the polynomial to be
exactly f(x) = 13/5 + (1/5)x, as desired. If we want to write
this in the usual notation as a line, it's

y = 13/5 + (1/5) x
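If you want to check the arithmetic, here is a throwaway Python sketch (not part of the book's code) that solves the same two equations by elimination and confirms the line passes through both points.

# Solve a_0 + 2*a_1 = 3 and a_0 + 7*a_1 = 4 by elimination.
a1 = (4 - 3) / (7 - 2)      # subtract the equations: 5*a_1 = 1
a0 = 3 - 2 * a1             # back-substitute into the first equation

f = lambda x: a0 + a1 * x
print(a0, a1)               # 2.6 0.2, i.e. 13/5 and 1/5
print(f(2), f(7))           # 3.0 4.0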
Just in case you're still stuck, let's follow our pattern from before.
Saying a degree 1 polynomial passes through these two points is equivalent to saying, if we call a_0 + a_1 x our polynomial, that there is a
simultaneous solution to the following two equations, f(2) = 3 and
f(2) = 5:

a_0 + 2 a_1 = 3
a_0 + 2 a_1 = 5

What happens when you try to solve these equations like we did
before?
What about three points or more? Well, that's the point at which
it might start to get difficult to compute. You can try by setting up
equations like those I wrote above, and with some elbow grease you'll
probably make it work. Such things are best done in private, so you
can make plentiful mistakes without being judged for them.
Now that we've worked out two examples of the theorem in action,
let's move on to the proof. The proof will have two parts, the existence
part and the uniqueness part. That is, first we'll show that a polynomial
satisfying the requirements exists, and then we'll show that if two
polynomials both satisfied the requirements, they'd have to be the
same. In other words, there can only be one polynomial with that
property. In this chapter we'll spell out the details of the proof from
the start. In later chapters we'll gradually become more terse, with the
goal of training you to spot and fill in the details.
Existence. We will show existence by direct construction. What I
mean by that is we'll be clever and find a general way to write down
a polynomial that works. Being clever sounds scary, but the process is
actually quite natural, and it follows the same pattern we used for
reading and understanding definitions: you start with the simplest
possible example (but this time the example will be generic) and then
you work up to more complicated examples. By the time we get to
n = 2 we will notice a pattern (this is the clever part), and that pattern
will suggest a big formula for the general solution, which we will prove
is correct. In fact, once we understand how to build the big formula,
the proof that it works will be trivial.

Let's start with a single point (x_1, y_1) and n = 0. I'm not specifying
the values of x_1 or y_1 because I don't want the construction to depend
on their particular values.
For a single point, the constant polynomial f(x) = y_1 clearly works. Moving up to two points (x_1, y_1), (x_2, y_2) with x_1 ≠ x_2, we can write

f(x) = y_1 (x - x_2)/(x_1 - x_2) + y_2 (x - x_1)/(x_2 - x_1),
and simplify with typical algebra to get the form required by the
definition:

f(x) = (x_1 y_2 - x_2 y_1)/(x_1 - x_2) + ((y_1 - y_2)/(x_1 - x_2)) x
Indeed, instead of doing all that algebra I could have just recognized
that no powers of x appear in the formula for f that are larger than 1,
and I'm never multiplying two x's together. Since these are the only
ways to get degree bigger than 1, if I want I can skip the algebra and
be confident that the degree is at most 1.
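As another quick sanity check (again a throwaway sketch), plugging our earlier points (2, 3) and (7, 4) into the two-point formula recovers the same line y = 13/5 + (1/5)x.

x1, y1 = 2, 3
x2, y2 = 7, 4

# The two-point construction, written exactly as in the formula above.
def f(x):
    return y1 * (x - x2) / (x1 - x2) + y2 * (x - x1) / (x2 - x1)

print(f(2), f(7))   # 3.0 4.0
print(f(0))         # 2.6, i.e. the intercept 13/5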
The key to the above idea, and the reason we wrote it down in
that strange way, is so that each constraint (i.e., f(x_1) = y_1) could be
isolated in its own term, while all the other terms evaluate to zero. For
three points (x_1, y_1), (x_2, y_2), (x_3, y_3) we just have to beef up the terms
to maintain the same property: when you plug in x_1, all terms except
the first evaluate to zero and the fraction in the first term evaluates
to 1. When you plug in x_2, the second term is the only one that stays
nonzero, and likewise for the third. Here is the generalization that
does the trick.
f(x) = y_1 (x - x_2)(x - x_3) / ((x_1 - x_2)(x_1 - x_3))
     + y_2 (x - x_1)(x - x_3) / ((x_2 - x_1)(x_2 - x_3))
     + y_3 (x - x_1)(x - x_2) / ((x_3 - x_1)(x_3 - x_2))
Now you come in. Evaluate f at x_1 and verify that the second
and third terms are zero, and that the first term simplifies to y_1. The
symmetry in the formula should convince you that the same holds
true for x_2, x_3 without having to go through all the steps two more
times.
Again, it's clear that the polynomial we defined has degree at most 2, because
each term consists of a product of two factors of the form (x - x_i), and taking their
product gives degree two. This has saved me the effort of multiplying
all that nonsense out to get something in the form required by the definition.
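Here is a quick numeric spot-check of the three-point formula, with made-up points, confirming that plugging in each x_i returns the corresponding y_i.

pts = [(1, 2), (2, 5), (4, 1)]   # three points with distinct x values
(x1, y1), (x2, y2), (x3, y3) = pts

def f(x):
    return (y1 * (x - x2) * (x - x3) / ((x1 - x2) * (x1 - x3))
          + y2 * (x - x1) * (x - x3) / ((x2 - x1) * (x2 - x3))
          + y3 * (x - x1) * (x - x2) / ((x3 - x1) * (x3 - x2)))

print(f(1), f(2), f(4))   # 2.0 5.0 1.0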
So now the general form for points (x_0, y_0), . . . , (x_n, y_n) should
follow the same pattern. Add up a bunch of terms, and for the i-th
term you multiply y_i by a fraction you construct according to the rule:
the numerator is the product of the (x - x_j) for every j except i, and the
denominator is the product of the (x_i - x_j) for the same j's as the
numerator. It works for the same reason that our formula works for
three terms above. In fact, the process is clear enough that you could
write a program to build these polynomials quite easily, and we'll walk
through such a program together at the end of the chapter.
Here is the notation version of the process we just described in
words. It's a mess, but we'll break it down.

f(x) = Σ_{i=0}^{n} y_i ( Π_{j ≠ i} (x - x_j)/(x_i - x_j) )
What a mouthful! I'll assume the Σ, Π symbols are new to you.
They are read semantically as "sum" and "product," or typographically
as "sigma" and "pi." They essentially represent loops of arithmetic.
That is, if I have a statement like Σ_{i=0}^{n} (expr), it is equivalent to the
following code snippet.
int i;
sometype theSum = defaultValue;
for (i = 0; i <= n; i++) {
  theSum += expr;
}
return theSum;
I wrote it this way because defaultValue is whatever the understood "zero" object is in that setting, and there are lots of different
settings where addition is defined. For numbers the default is zero,
but if we're summing over vectors it would be the zero vector, or if
we're on this exotic thing called an elliptic curve it would be a base
point. The point is that the Σ notation does not imply a specific type
for the thing being summed. Moreover, writing it this way allows me
to define the Π symbol by analogy: you just replace += with *= and
reinterpret the default value (for numbers it would be 1). Functional
programmers will know this pattern well, because it's just a "fold"
operation.
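For instance, a minimal Python sketch of this idea (my own illustration, not code from the book's repository) expresses both a sum and a product as folds that differ only in the combining operation and the default value.

from functools import reduce

values = [1, 2, 3, 4]

# Sum: fold with +, starting from the additive default 0.
the_sum = reduce(lambda acc, v: acc + v, values, 0)

# Product: fold with *, starting from the multiplicative default 1.
the_product = reduce(lambda acc, v: acc * v, values, 1)

print(the_sum, the_product)   # 10 24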
Finally, when I say something like Π_{j ≠ i}, there are a few extra things going on.
First, recall that in this context i is fixed by the outer loop, so j is the
looping variable. Unfortunately, the reader is required to keep track
of scope when it comes to indices in sums and products. Second, the
bounds on j are not stated; we have to infer them from the context.
There are two hints: we're comparing j to i, so it should probably have
the same range as i unless otherwise stated, and we can see where in
the expression we're using j. We're using it as an index to the x-values
of the points, x_j. Since the x-values go from 0 to n, we'd expect j to
have that range. It might seem totally unrigorous to a programmer, but
if mathematicians consider it "easy to infer" the intent of a notation,
then it is considered rigorous enough.
Though it sometimes makes me cringe to say it, you have to give the
author the benefit of the doubt. When things are ambiguous, you have
to pick the option that doesn't break the math. In this respect, you have
to act as the tester, the compiler, and the bug fixer when you're
reading math. The best default assumption is that the author is far
more experienced than you, and that any apparent error is more likely a misreading on your part.
Another reason is that mathematicians get tired of writing these obvious details out
over and over again.
When the Σ and Π are nested, as they are in the construction above, the translation is a nested pair of loops, with the inner loop skipping the index j == i:

int i, j;
sometype theSum = defaultSumValue;
for (i = 0; i <= n; i++) {
  sometype theProduct = defaultProductValue;
  for (j = 0; j <= n; j++) {
    if (j != i) {
      theProduct *= expr;
    }
  }
  theSum += theProduct;
}
return theSum;

Compare this with the notational version above and make sure you
can connect the structural pieces, i.e., excluding the fact that we aren't
using x_j or y_i at all in the code.
One difficulty of reading mathematics is that the author will almost
never go through these details for the reader. It's a rather subtle point
to be making so early, but it's probably the first thing you'll notice when
you read math books. Instead of spelling out the details, a typical proof of
the existence of these polynomials looks like this.
Proof. Let (x_0, y_0), . . . , (x_n, y_n) be a list of points with no two x_i the
same. To show existence, construct f(x) as

f(x) = Σ_{i=0}^{n} y_i ( Π_{j ≠ i} (x - x_j)/(x_i - x_j) )
So suppose f, g are two such polynomials. Let's look at the polynomial (f - g)(x). I hope it's clear what I mean by f - g. If we use the
list definition, then it's just the entrywise difference
of the coefficients. If we use the structural definition, then we can
define it as (f - g)(x) = f(x) - g(x). Either way, it's important that
f - g is a polynomial.

Now what do we know about f - g? Well, it certainly has degree at
most n, because you can't magically produce a coefficient of x^7 if you
subtract two polynomials whose highest-degree terms are x^5. Moreover, you know that (f - g)(x_i) = 0 for all i. This is by assumption:
for every i, f(x_i) = g(x_i) = y_i, so subtracting them gives zero.

Now we apply the fact about polynomials. If we call d the degree of
f - g, we know that d ≤ n, and hence that f - g can have no more
than d roots unless it's the zero polynomial. But there are n + 1 many
points x_i where f - g is zero, and n + 1 > n ≥ d. The conclusion is
that f - g must be the zero polynomial, meaning f and g have the
same coefficients.
Just for completeness, I'll write the above argument more briefly
and put the whole proof of the theorem together as it would show up
in a standard textbook. That is, extremely tersely.
Proof. Let (x_0, y_0), . . . , (x_n, y_n) be a list of points with no two x_i the
same. To show existence, construct f(x) as

f(x) = Σ_{i=0}^{n} y_i ( Π_{j ≠ i} (x - x_j)/(x_i - x_j) )
Realizing it in code
For the sake of concreteness, let's write a Python program that interpolates points. I'm going to assume the existence of a polynomial
class that accepts as input a list of coefficients (in the same order as
the definition above, starting from the constant term) and has methods for
adding, multiplying, and evaluating at a given value. All of this code,
including my own version of the polynomial class, is available at this
book's Github repository. Note the polynomial class is not intended
to be perfect. I'm certainly leaving the code open to floating point
rounding errors and other such things. The point of the code is not
to be industry-strength, but to help you understand the constructions
we've seen in the chapter. On to the code.
Here are some examples of constructing polynomials.
zero = Polynomial([]) # zero polynomial
f = Polynomial([1,2,3]) # 1 + 2 x + 3 x^2
g = Polynomial([-8, 17, 0, 5]) # -8 + 17 x + 5 x^3
Now first we'll write the main interpolate function. It uses the
yet-to-be-defined function singleTerm.
# pts is a list of (float, float) of length n+1.
# Return the unique degree n polynomial that passes through pts.
def interpolate(pts):
    if len(pts) == 0:
        raise Exception('Must provide at least one point.')

    xValues = [p[0] for p in pts]
    if len(set(xValues)) < len(xValues):
        raise Exception('Not all x values are distinct.')

    terms = [singleTerm(pts, i) for i in range(len(pts))]
    return sum(terms, Polynomial([]))
http://github.com/j2kun/PICM
The first two blocks check for the edge cases, zero points or repeating
x-values. Finally, the last block creates a list of terms, each one being
a term of the sum from the construction. The return statement sums
all the terms, with the second argument being the starting value for
the sum, in this case the zero polynomial.
Now for the singleTerm function.
# pts is a list of (float, float).
# i is an integer index of pts.
# Return the i-th term from the sum in the construction.
def singleTerm(pts, i):
    theTerm = Polynomial([1.])
    xi, yi = pts[i]

    for j, p in enumerate(pts):
        if j == i:
            continue

        xj = p[0]
        # Multiply by (x - xj) / (xi - xj), written in coefficient form.
        theTerm = theTerm * Polynomial([-xj / (xi - xj), 1.0 / (xi - xj)])

    return theTerm * Polynomial([yi])
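Assuming the Polynomial class supports evaluation with ordinary call syntax (as in the interactive session later in the chapter), a quick usage check might look like this; the points are made up for illustration.

pts = [(1, 1), (2, 4), (3, 9)]     # three points lying on y = x^2
f = interpolate(pts)

print(f(5))                        # approximately 25.0
print([f(x) for x, _ in pts])      # approximately [1.0, 4.0, 9.0]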
(1, f(1)), (2, f(2)), (3, f(3)), (4, f(4)), (5, f(5))

To give concrete numbers to the examples, if

f(x) = 109 - 55x + 271x^2
is a valid (d + 1)-th point that you could add to the list to make x
the correct decoded value f(0).
Let's think about this last point. Say your secret is an English
sentence s = "Hello, world!" and you encode it with a degree 10
polynomial f(x) so that f(0) is a binary representation of s. If y
is the complete text of The Art of Computer Programming, Volume I,
and you give me 10 points f(1), f(2), . . . , f(10), I could have just as
easily chosen my secret to be y instead of s, and made a polynomial
for which the same 10 points occur as f(1) through f(10)! In other
words, your knowledge of the 10 points gives you no information to
distinguish whether the secret is s or y. If you try decoding
it, you might get sensible English, but you'll never be able to tell it's
right, because any sensible English text could be the right answer, as
well as all possible junk text!
To drive this point home, let's go back to our small example: the secret
109 and the encoded polynomial

f(x) = 109 - 55x + 271x^2,
and say I now give you just two points, (2, 1083) and (5, 6609), along with
a desired fake decrypted message, say 533. Then my claim
is that I can come up with a polynomial that has f(2) = 1083 and
f(5) = 6609, and also f(0) = 533. Indeed, we already wrote the code
to do this!
>>> pts = [(2, 1083), (5, 6609)]
>>> interpolate(pts + [(0, 533)])
533.0 + -351.7999999999999 x^1 + 313.4 x^2
>>> f = interpolate(pts + [(0, 533)]); int(f(0))
533
You should notice that the coefficients of the fake secret polynomial are no longer integers, but this problem is fixed when you do
everything with modular arithmetic instead of floating point numbers.
This property raises some interesting security questions. For example, if the secret is, say, the text of a document instead of the key-code
to a safe, and if one of the greedy daughters sees the shares of two
others before revealing her own, she could conceivably come up with
a share that produces whatever decoded message she wants, such
as a will giving her the entire inheritance!
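To make the protocol concrete, here is a rough sketch of the distribution and reconstruction steps built on the interpolate function above; the helper names, the small random integer coefficients, and the rounding step are my own simplifications for illustration, not a hardened implementation.

import random

def makeShares(secret, degree, numShares):
    # Hide the secret in the constant term; the other coefficients are random.
    coeffs = [secret] + [random.randint(-50, 50) for _ in range(degree)]
    f = Polynomial(coeffs)
    return [(x, f(x)) for x in range(1, numShares + 1)]

def reconstruct(shares):
    # Any degree+1 shares determine the polynomial, hence the secret f(0).
    g = interpolate(shares)
    return round(g(0))

shares = makeShares(109, degree=2, numShares=5)
print(reconstruct(shares[:3]))   # 109, using any three of the five shares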
Cultural Review

Here were the main cultural points introduced in this chapter:

1. In math we have many definitions for an object; we prefer to
   work with the definition that is easiest to keep neat in our head,
   and we often don't say when we switch between two views.
2. Whenever you see a definition, you must immediately write
   down examples. They are your test cases for the rest of the text.
3. In mathematics we place emphasis on elegance over utility.
Exercises

1. Look up a proof of the theorem. There are many different proofs.
   Either read one and understand it using the techniques we described in this chapter (writing down examples and tests), or, if
   you cannot, write down the words in the proof that you
   don't understand and look for them later in this book.
2. Write down examples for the following definitions: Two integers
   a, b are said to be relatively prime if their only common divisor
   is 1. Let n be a positive integer, and define by φ(n) the number
   of positive integers less than n that are relatively prime to n.
3. Verify the following theorem using the examples from the previous exercise: if a, n are relatively prime integers, then a^φ(n) has
   remainder 1 when dividing by n. This result is known as Euler's
   theorem (pronounced "OY-lurr"), and it is the keystone of the
   RSA cryptosystem.
4. Write a web app that implements the distribution and reconstruction steps of the secret sharing protocol using the polynomial
   interpolation code from this chapter.
Chapter Notes

Which are polynomials?

The polynomials were f(x), g(x), h(x), j(x), and l(x). The reason i
is not a polynomial is because √x = x^(1/2) does not have an integer
power. Similarly, k(x) is not a polynomial because its terms have
negative integer powers. Finally, m(x) is not because its powers, π and e,
in addition to being very scary, are not integers. Of course, if you were
to define π and e to be particular variables that happened to be integers,
then the result would be a polynomial. But without any indication,
you're supposed to infer that they're the famous constants.