An Introduction to Higher Mathematics
Although mathematical ability and opinions about mathematics vary widely, even among educated
people, there is certainly widespread agreement that mathematics is logical. Indeed, properly
conceived, this may be one of the most important defining properties of mathematics.
Logical thought and logical arguments are not easy to come by (ponder some of the discussions
you encounter on current topics such as abortion, climate change, evolution, gun control, or same
sex marriage to appreciate this statement), nor is it always clear whether a given argument is
logical (that is, logically correct). Logic itself deserves study; the right tools and concepts can
make logical arguments easier to discover and to discern. In fact, logic is a major and active area
of mathematics; for our purposes, a brief introduction will give us the means to investigate more
traditional mathematics with confidence.
Logical Operations
Mathematics typically involves combining true (or hypothetically true) statements in various ways
to produce (or prove) new true statements. We begin by clarifying some of these fundamental ideas.
By a sentence, we mean a statement that has a definite truth value of either true (T) or
false (F). For example,
In terms of area, Pennsylvania is larger than Iowa. (F)
The integer 289 is a perfect square. (T)
Because we insist that our sentences have a truth value, we are not allowing sentences such as
Chocolate ice cream is the best.
This statement is false.
Chapter 1 Logic
since the first is a matter of opinion and the second leads to a logical dilemma. More generally, by
a formula we mean a statement, possibly involving some variables, which is either true or false
whenever we assign particular values to each of the variables. (Formulas are sometimes referred to
as open sentences.) We typically use capital letters such as P , Q, and R to designate formulas.
If the truth of a formula P depends on the values of, say, x, y, and z, we use notation like P (x, y, z)
to denote the formula.
EXAMPLE 1.1 If $P(x, y)$ is "$x^2 + y = 12$", then $P(2, 8)$ and $P(3, 3)$ are true, while $P(1, 4)$
and $P(0, 6)$ are false. If $Q(x, y, z)$ is "$x + y < z$", then $Q(1, 2, 4)$ is true and $Q(2, 3, 4)$ is false. If
$R(f(x))$ is "$f(x)$ is differentiable at 0", then $R(x^2 + 2x)$ is true and $R(|x|)$ is false.
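The first two formulas of Example 1.1 can be checked mechanically. The following is a minimal sketch, assuming the exponent lost in the printed formula is a square (so that P(x, y) reads x^2 + y = 12); the Python function names simply mirror the letters used in the text and are not part of the original.

```python
def P(x, y):
    """P(x, y): x^2 + y = 12 (assumed reading of the printed formula)."""
    return x**2 + y == 12

def Q(x, y, z):
    """Q(x, y, z): x + y < z."""
    return x + y < z

# Each call substitutes particular values for the variables,
# turning the formula into a sentence with a definite truth value.
print(P(2, 8), P(3, 3))        # the two instances the example calls true
print(P(1, 4), P(0, 6))        # the two instances the example calls false
print(Q(1, 2, 4), Q(2, 3, 4))  # one true instance, one false instance
```

The formula R (differentiability at 0) is left out of the sketch because it is not decidable by a finite substitution of values.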
Whether a sentence is true or false usually depends on what we are talking about: exactly
the same sentence may be true or false depending on the context. As an example, consider the
statement "the equation $x^2 + 1 = 0$ has no solutions." In the context of the real numbers, this
statement is true; there is no real number $x$ with the property that $x^2 + 1 = 0$. However, if we allow
complex numbers, then both $i$ and $-i$ are solutions to the equation. In this case, the statement
"the equation $x^2 + 1 = 0$ has no solutions" is false. Examples such as this one emphasize how
important it is to be perfectly clear about the context in which a statement is made.
The universe of discourse for a particular branch of mathematics is a set that contains all
of the elements that are of interest for that subject. When we are studying mathematical formulas
such as "$x$ divides $y$" or "$f$ is differentiable at each point," the variables are assumed to take values
in whatever universe of discourse is appropriate for the particular subject (the set of integers for
the first example and the set of continuous functions for the second). The universe of discourse is
frequently clear from the discussion, but occasionally we need to identify it explicitly for clarity.
For general purposes, the universe of discourse is usually denoted by U .
Complicated sentences and formulas are put together from simpler ones using a small number
of logical operations. Just a handful of these operations allow us to represent everything we need
to say in mathematics. These operations and their notation are presented below.
The denial (or negation) of a formula $P$ is the formula "not $P$", which is written symbolically
as $\lnot P$. The statement $\lnot P$ is false if $P$ is true and vice versa. (This fact follows from the types
of statements we are willing to accept as sentences.) For example, the denial of the false sentence
"6 is a prime number" is the true sentence "6 is not a prime number" and the denial of the true
sentence "343 is a perfect cube" is the false sentence "343 is not a perfect cube."
The conjunction of the formulas $P$ and $Q$ is the formula "$P$ and $Q$", which is written symbolically as $P \land Q$. For $P \land Q$ to be true, both $P$ and $Q$ must be true; otherwise it is false. For
example (the reader can easily identify $P$ and $Q$),
5 > 6 and 7 = 8. (F)
17 is prime and 324 is a perfect square. (T)
$\{1/k\}$ converges to 0 and $\{(1 + 1/k)^k\}$ converges to 1. (F)
The disjunction of the formulas $P$ and $Q$ is the formula "$P$ or $Q$", which is written symbolically
as $P \lor Q$. It is important to note that this is an inclusive or, that is, "either or both". In other
words, if $P$, $Q$, or both $P$ and $Q$ are true, then so is $P \lor Q$. The only way $P \lor Q$ can be false is if
both $P$ and $Q$ are false. For example (once again, the reader can easily identify $P$ and $Q$),
5 < 7 or 8 < 10. (T)
19 is prime or 4 divides 15. (T)
k 1/2 converges or
2 converges. (F)
Suppose that $P$ and $Q$ are formulas. The sentence "if $P$, then $Q$" or "$P$ implies $Q$" is written
$P \Rightarrow Q$, using the conditional symbol, $\Rightarrow$. It is not obvious (at least not to most people) under
what circumstances $P \Rightarrow Q$ should be true. In part this is because "if . . . , then . . ." is used in
more than one way in ordinary English, yet we need to fix a rule that will let us know precisely
when $P \Rightarrow Q$ is true. Certainly, if $P$ is true and $Q$ is false, $P$ cannot imply $Q$, so $P \Rightarrow Q$ is false
in this case. To help us with the other cases, consider the following statement:
If $x$ is less than 2, then $x$ is less than 4.
This statement should be true regardless of the value of $x$ (assuming that the universe of discourse
is something familiar, like the integers). If $x$ is 1, it evaluates to T $\Rightarrow$ T; if $x$ is 3, it becomes
F $\Rightarrow$ T; and if $x$ is 5, it becomes F $\Rightarrow$ F. So it appears that $P \Rightarrow Q$ is true unless $P$ is true and $Q$
is false. This is the rule that we adopt.
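The rule just adopted for the conditional can be written as a one-line function. The sketch below is illustrative only: the name implies and the sample range of integers are assumptions made for the example, and a finite sample can support, but not prove, a claim about all integers.

```python
def implies(p, q):
    """Truth value of 'if p, then q': false only when p is true and q is false."""
    return (not p) or q

# The statement "if x is less than 2, then x is less than 4",
# evaluated over a sample of integers (an assumed finite stand-in
# for the universe of discourse).
results = {x: implies(x < 2, x < 4) for x in range(-10, 11)}

print(results[1], results[3], results[5])  # the three cases discussed in the text
print(all(results.values()))               # whether every sampled x satisfies it
```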
Finally, the biconditional involving the formulas $P$ and $Q$ is the sentence "$P$ if and only if
$Q$", written as $P \Leftrightarrow Q$. Sometimes the phrase "if and only if" is abbreviated as "iff", but we will
not use this shorthand here. It should be clear that $P \Leftrightarrow Q$ is true when $P$ and $Q$ have the same
truth value; otherwise it is false.
EXAMPLE 1.2 Suppose $P(x, y)$ is "$x + y = 2$" and $Q(x, y)$ is "$xy > 1$". Then when $x = 1$ and
$y = 1$, the sentences
$\lnot P(x, y)$, $P(x, y) \land Q(x, y)$, $P(x, y) \lor Q(x, y)$, $P(x, y) \Rightarrow Q(x, y)$, $P(x, y) \Leftrightarrow Q(x, y)$
have truth values F, F, T, F, F, respectively, and when $x = 2$ and $y = 3$, they have truth values T,
F, T, T, F, respectively.
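The truth values listed in Example 1.2 can be recomputed directly. This is a small sketch in which the five sentences are taken, in order, to be the denial of P, the conjunction, the disjunction, the conditional, and the biconditional of P and Q; the helper names are hypothetical.

```python
def implies(p, q):
    """Conditional: false only when p is true and q is false."""
    return (not p) or q

def example_1_2(x, y):
    """Truth values of not-P, P and Q, P or Q, P implies Q, P iff Q."""
    p = (x + y == 2)   # P(x, y)
    q = (x * y > 1)    # Q(x, y)
    return [not p, p and q, p or q, implies(p, q), p == q]

print(example_1_2(1, 1))  # [False, False, True, False, False]
print(example_1_2(2, 3))  # [True, False, True, True, False]
```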
Using the operations $\lnot$, $\land$, $\lor$, $\Rightarrow$, and $\Leftrightarrow$, we can construct compound expressions such as
$(P \land (\lnot Q)) \Rightarrow ((\lnot R) \lor ((\lnot P) \land Q))$.
As this example illustrates, it is sometimes necessary to include many parentheses to make the
grouping of terms in a formula clear. Just as in algebra, where multiplication takes precedence
over addition, we can eliminate some parentheses by agreeing on a particular order in which logical
operations are performed. We will apply the operations in this order, from first to last: $\lnot$, $\land$, $\lor$,
and $\Rightarrow$. Thus
$A \Rightarrow B \lor C \land \lnot D$
is short for
$A \Rightarrow (B \lor (C \land (\lnot D)))$.
It is generally a good idea to include some extra parentheses to make certain the intended meaning
is clear.
Much of the information we have discussed can be summarized in truth tables. For example,
the truth table for $\lnot P$ is:

  $P$   $\lnot P$
   T       F
   F       T

This table has two rows because there are only two possibilities for the truth value of $P$. The other
logical operations involve two formulas, so they require four rows in their truth tables.

  $P$   $Q$   $P \land Q$   $P \lor Q$   $P \Rightarrow Q$   $P \Leftrightarrow Q$
   T     T        T             T               T                      T
   T     F        F             T               F                      F
   F     T        F             T               T                      F
   F     F        F             F               T                      T

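Any compound expression has a truth table. If there are $n$ simple (that is, not compound) formulas
in the expression, then there will be $2^n$ rows in the table because there are this many different ways
to assign Ts and Fs to the $n$ simple formulas in the compound expression. The truth table for
$(P \land Q) \lor \lnot R$ is

  $P$   $Q$   $R$   $P \land Q$   $\lnot R$   $(P \land Q) \lor \lnot R$
   T     T     T        T             F                  T
   T     T     F        T             T                  T
   T     F     T        F             F                  F
   T     F     F        F             T                  T
   F     T     T        F             F                  F
   F     T     F        F             T                  T
   F     F     T        F             F                  F
   F     F     F        F             T                  T
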
Observe how the inclusion of intermediate steps makes the table easier to calculate and read.
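Truth tables with 2^n rows can also be generated mechanically. The sketch below is one possible approach, not a prescribed method: the function truth_table is hypothetical, and the three-variable expression "(P and Q) or not R" is chosen purely as an illustration.

```python
from itertools import product

def truth_table(expr, n):
    """Return one row per assignment of truth values to n simple formulas.

    Each row holds the n assigned values followed by the value of expr,
    so the table has 2**n rows.
    """
    rows = []
    for values in product([True, False], repeat=n):
        rows.append(values + (expr(*values),))
    return rows

# Illustrative expression (an assumption for this sketch): (P and Q) or not R.
table = truth_table(lambda p, q, r: (p and q) or (not r), 3)

for row in table:
    print(" ".join("T" if value else "F" for value in row))
```

Intermediate columns can be added in the same way, by appending the value of each subexpression to the row before the final value.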
A tautology is a logical expression that always evaluates to T, that is, the last column of
its truth table consists of nothing but Ts. A tautology is sometimes said to be valid. (Although
valid is used in other contexts as well, this should cause no confusion.) For example, the statement
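$(P \land Q) \lor P \Leftrightarrow P$ is a tautology, since its truth table is:

  $P$   $Q$   $P \land Q$   $(P \land Q) \lor P$   $(P \land Q) \lor P \Leftrightarrow P$
   T     T        T                 T                           T
   T     F        F                 T                           T
   F     T        F                 F                           T
   F     F        F                 F                           T
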
We list a few important tautologies in the following theorem, including the names by which
some of the tautologies are referred to in the literature.
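THEOREM 1.3 The following are tautologies:
a) $P \lor \lnot P$ (excluded middle)
b) $P \Leftrightarrow \lnot(\lnot P)$ (double negation)
c) $P \lor Q \Leftrightarrow Q \lor P$
d) $P \land Q \Leftrightarrow Q \land P$
e) $(P \lor Q) \lor R \Leftrightarrow P \lor (Q \lor R)$
f) $(P \land Q) \land R \Leftrightarrow P \land (Q \land R)$
g) $P \land (Q \lor R) \Leftrightarrow (P \land Q) \lor (P \land R)$
h) $P \lor (Q \land R) \Leftrightarrow (P \lor Q) \land (P \lor R)$
i) $(P \Rightarrow Q) \Leftrightarrow (\lnot P \lor Q)$ (conditional disjunction)
j) $(P \Rightarrow Q) \Leftrightarrow (\lnot Q \Rightarrow \lnot P)$ (contraposition)
k) $(P \land (P \Rightarrow Q)) \Rightarrow Q$ (modus ponens)
l) $P \Rightarrow (P \lor Q)$
m) $(P \land Q) \Rightarrow P$
n) $((P \lor Q) \land \lnot P) \Rightarrow Q$ (disjunctive syllogism)
o) $(P \Leftrightarrow Q) \Leftrightarrow ((P \Rightarrow Q) \land (Q \Rightarrow P))$ (logical biconditional)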
Proof. The proofs are left as exercises. However, we note in passing that it is not always necessary
to use a truth table to verify a tautology. For example, a proof of (j) can be written as
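$(P \Rightarrow Q) \Leftrightarrow (\lnot P \lor Q)$    by part (i)
$\Leftrightarrow (Q \lor \lnot P)$    by part (c)
$\Leftrightarrow (\lnot(\lnot Q) \lor \lnot P)$    by part (b)
$\Leftrightarrow (\lnot Q \Rightarrow \lnot P)$    by part (i)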
In other words, previous results can sometimes be used to prove other results.
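When a truth table is the tool of choice, the check can be automated by trying every assignment of truth values. The following sketch tests a few of the named tautologies; the helper names implies and is_tautology are hypothetical, and the formulas are written out here as they are usually stated.

```python
from itertools import product

def implies(p, q):
    """Conditional: false only when p is true and q is false."""
    return (not p) or q

def is_tautology(expr, n):
    """True when expr evaluates to True under all 2**n truth assignments."""
    return all(expr(*values) for values in product([True, False], repeat=n))

checks = {
    "excluded middle": is_tautology(lambda p: p or (not p), 1),
    "contraposition": is_tautology(
        lambda p, q: implies(p, q) == implies(not q, not p), 2),
    "modus ponens": is_tautology(
        lambda p, q: implies(p and implies(p, q), q), 2),
    "disjunctive syllogism": is_tautology(
        lambda p, q: implies((p or q) and (not p), q), 2),
    # For contrast: a conditional is NOT always equivalent to its converse.
    "conditional iff converse": is_tautology(
        lambda p, q: implies(p, q) == implies(q, p), 2),
}

for name, result in checks.items():
    print(name, result)
```

The last entry is included deliberately as a non-example; it fails when P is true and Q is false.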
In reading through Theorem 1.3, you may have noticed that $\land$ and $\lor$ satisfy many similar
properties. These are called "dual" notions: for any property of one, there is a nearly identical
property that the other satisfies, with the instances of the two operations interchanged. This often
means that when we prove a result involving one notion, we get the corresponding result for its
dual with no additional work.
Observe that (c) and (d) are commutative laws, (e) and (f) are associative laws, and (g) and
(h) show that $\land$ and $\lor$ distribute over each other. This suggests that there is a form of algebra for
logical expressions similar to the algebra for numerical expressions. This subject is called Boolean
Algebra and has many uses, particularly in computer science.
If two formulas always take on the same truth value no matter what elements from the universe
of discourse we substitute for the various variables, then we say they are equivalent. The advantage
of equivalent formulas is that they say the same thing but in a different way. For example, algebraic
manipulations such as replacing $x^2 - 2x = 12$ with $(x - 1)^2 = 13$ fit into this category. It is always
a valid step in a proof to replace some formula by an equivalent one. In addition, many tautologies
contain important ideas for constructing proofs. For example, (o) says that if you wish to show
that $P \Leftrightarrow Q$, it is possible (and often advisable) to break the proof into two parts, one proving the
implication $P \Rightarrow Q$ and the second proving the converse, $Q \Rightarrow P$.
Since we just mentioned the term converse, this is probably a good place to refresh your
memory of some familiar terminology. In the conditional sentence $P \Rightarrow Q$, the sentence $P$ is usually
referred to as the hypothesis and the sentence $Q$ is called the conclusion. By rearranging and/or
negating $P$ and $Q$, we can form various other conditionals related to $P \Rightarrow Q$; you may remember
doing this in a high school geometry class. Beginning with the conditional $P \Rightarrow Q$, the converse
is $Q \Rightarrow P$, the contrapositive is $\lnot Q \Rightarrow \lnot P$, and the inverse is $\lnot P \Rightarrow \lnot Q$. As an illustration,
consider the following important theorem from differential calculus.
As indicated in part (j) of Theorem 1.3, a conditional and its contrapositive always have the same
truth value. It is very important to note that the converse may or may not have the same truth
value as the given conditional; the previous illustration provides one example where the truth values
of a statement and its converse are not the same. (Be certain that you can give an example to
show that the converse in this case is false.) It is a common mistake for students to turn theorems
around without thinking much about it. Be aware of this potential pitfall and think carefully before
drawing conclusions. The inverse, which is the contrapositive of the converse, is not referred to
very often in mathematics.
George Boole. Boole (1815-1864) had only a common school education, though he learned
Greek and Latin on his own. He began his career as an elementary school teacher, but decided
that he needed to know more about mathematics, so he began studying mathematics, as well as
the languages he needed to read contemporary literature in mathematics. In 1847, he published
a short book, The Mathematical Analysis of Logic, which may fairly be said to have founded the
study of mathematical logic. The key contribution of the work was in redefining mathematics
to mean not simply the study of number and magnitude, but the study of symbols and their
manipulation according to certain rules. The importance of this level of abstraction for the future
of mathematics would be difficult to overstate. Probably on the strength of this work, he moved
into a position at Queen's College in Cork.
In Investigation of the Laws of Thought, published in 1854, Boole established a real formal
logic, developing what today is called Boolean Algebra, or sometimes the algebra of sets. He
used the symbols for addition and multiplication as operators, but in a wholly abstract sense.
Today these symbols are still sometimes used in Boolean algebra, though the symbols $\land$ and $\lor$,
and $\cap$ and $\cup$, are also used. Boole applied algebraic manipulation to the process of reasoning.
Here's a simple example of the sort of manipulation he did. The equation $xy = x$ (which today
might be written $x \land y = x$ or $x \cap y = x$) means that "all things that satisfy $x$ satisfy $y$," or in
our terms, $x \Rightarrow y$. If also $yz = y$ (that is, $y \Rightarrow z$), then substituting $y = yz$ into $xy = x$ gives
$x(yz) = x$ or $(xy)z = x$. Replacing $xy$ by $x$, we get $xz = x$, or $x \Rightarrow z$. This simple example of
logical reasoning (essentially the transitive property) is used over and over in mathematics.
In 1859, Boole wrote Treatise on Differential Equations, in which he introduced the algebra
of differential operators. Using $D$ to stand for "the derivative of", the second order differential
equation $ay'' + by' + cy = 0$ may be written as $aD^2(y) + bD(y) + cy = 0$, or in the more compact
form $(aD^2 + bD + c)y = 0$. Remarkably, the solutions to $aD^2 + bD + c = 0$, treating $D$ as a number,
provide information about the solutions to the differential equation. (If you have taken differential
equations, you should be familiar with this approach to solving linear differential equations.)
The information here is taken from A History of Mathematics, by Carl B. Boyer, New York:
John Wiley and Sons, 1968. For more information, see Lectures on Ten British Mathematicians,
by Alexander Macfarlane, New York: John Wiley & Sons, 1916.
Exercises 1.1.
1. Determine the truth value of each of the following statements.
a) 51 is a prime or 128 is a square.
b) 211 is a prime and 441 is a square.
converges to 1 or
2k 1
and =
d) e =
converges or
f ) The graph of y =
is concave down on R and
1 + x2
dx = .
1 + x2
b) P (Q P )
c) (P Q) (P R)
d) P (Q R)
b) x = 1, y = 2
c) x = 3, y = 1
d) x = 2, y = 1
5. Write the converse and contrapositive of each of the following conditionals. Use your knowledge of
mathematics to determine if the statements are true or false. (Note that there are three statements
to consider for each part; the original statement and the two new ones derived from it.) For parts (a)
and (e), consider different universes and see if the truth values of the statements change.
a) If $x = y$, then $x^3 = y^3$.
b) If f and g are differentiable on R, then f + g is differentiable on R.
c) If $\sum a_n$ converges, then $\lim a_n = 0$.
Recall that a formula (or open sentence) is a statement whose truth value may depend on the
values of some variables. For example, the formula $(x \le 5) \land (x > 3)$ is true for $x = 4$ and
false for $x = 6$. Compare this with the statement "For every $x$, $(x \le 5) \land (x > 3)$," which is false,
and the statement "There exists an $x$ such that $(x \le 5) \land (x > 3)$," which is true. The phrase
"for every $x$" (sometimes "for all $x$") is called a universal quantifier and is denoted by $\forall x$. The
phrase "there exists an $x$ such that" is called an existential quantifier and is denoted by $\exists x$. A
formula that contains variables is not simply true or false unless each of the variables is bound by
a quantifier. If a variable is not bound, the truth of the formula is contingent on the value assigned
to the variable from the universe of discourse.
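Over a finite universe, the two quantifiers correspond to Python's built-in all and any. The sketch below assumes the formula under discussion is "(x <= 5) and (x > 3)", which matches the truth values stated for x = 4 and x = 6, and it uses a small range of integers as a stand-in universe; both choices are assumptions made for illustration.

```python
# Assumed finite universe of discourse for the illustration.
universe = range(-10, 11)

def formula(x):
    """The open sentence (x <= 5) and (x > 3); its truth depends on x."""
    return (x <= 5) and (x > 3)

# Unbound variable: the truth value is contingent on the value substituted.
print(formula(4), formula(6))

# Bound by a universal quantifier: "for every x, formula(x)".
for_every = all(formula(x) for x in universe)

# Bound by an existential quantifier: "there exists an x such that formula(x)".
there_exists = any(formula(x) for x in universe)

print(for_every, there_exists)
```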
We were careful in Section 1.1 to define the truth values of compound statements precisely. We
do the same for $\forall x\, P(x)$ and $\exists x\, P(x)$, though the intended meanings of these are clear.
Let R be the universe of discourse.
If we say, "if $x$ is negative, so is its cube," we usually mean "every negative $x$ has a negative
cube." This should be written symbolically as $\forall x\, ((x < 0) \Rightarrow (x^3 < 0))$.
"If two numbers have the same square, then they have the same absolute value" should be
written as $\forall x\, \forall y\, ((x^2 = y^2) \Rightarrow (|x| = |y|))$.
"If $x = y$, then $x + z = y + z$" should be written as $\forall x\, \forall y\, \forall z\, ((x = y) \Rightarrow (x + z = y + z))$.
If $S$ is a set, the sentence "every $x$ in $S$ satisfies $P(x)$" is written formally as
$\forall x\, ((x \in S) \Rightarrow P(x))$.
(We assume that the reader has some familiarity with sets; a set is a collection of objects and the
notation $x \in S$ means that the element $x$ belongs to the set $S$. Sets will be discussed in Section
1.5.) For clarity and brevity, this is usually written $\forall x \in S\, (P(x))$ or $(\forall x \in S)(P(x))$ if there is any
chance of confusion. To understand and manipulate the formula $\forall x \in S\, (P(x))$ properly, you will
sometimes need to "unabbreviate" it, rewriting it as $\forall x\, ((x \in S) \Rightarrow P(x))$. With R as the universe
of discourse, we use
sentence some positive integers are rational numbers between 10 and 0 is certainly false, but
$\exists x\, (P(x) \Rightarrow Q(x))$
is true. To see this, suppose $x_0 = 7$. Then the implication $P(x_0) \Rightarrow Q(x_0)$ is true (since the
hypothesis is false) and the existential quantifier is satisfied.
We use abbreviations of the "some" form much like those for the "all" form.
EXAMPLE 1.5 Let R be the universe of discourse.
$\exists x < 0\, (x^2 = 1)$ stands for $\exists x\, ((x < 0) \land (x^2 = 1))$.
$\exists x \in [0, 1]\, (2x^2 + x = 1)$ stands for $\exists x\, ((x \in [0, 1]) \land (2x^2 + x = 1))$.
If $\forall$ corresponds to "all" and $\exists$ corresponds to "some", do we need a third quantifier to correspond to "none"? As the following examples show, this is not necessary:
"No perfect squares are prime," can be written $\forall x$ ($x$ is a perfect square $\Rightarrow$ $x$ is not prime);
"No triangles are rectangles," can be written $\forall x$ ($x$ is a triangle $\Rightarrow$ $x$ is not a rectangle);
"No unbounded sequences are convergent," can be written $\forall x$ ($x$ is an unbounded sequence
$\Rightarrow$ $x$ is not convergent).
In general, the statement no x satisfying P (x) satisfies Q(x) can be written as
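$\forall x\, (P(x) \Rightarrow \lnot Q(x))$
or, equivalently, as
$\forall x\, (Q(x) \Rightarrow \lnot P(x))$.
(You may wonder why we do not use $\lnot \exists x\, (P(x) \land Q(x))$. In fact, we could; it is equivalent to
$\forall x\, (P(x) \Rightarrow \lnot Q(x))$; such statements will be considered in the next section.)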
Exercises 1.2.
Except for problems 2 and 3, assume that the universe of discourse is the set of real numbers.
1. Express the following as formulas involving quantifiers.
a) Any number raised to the fourth power is nonnegative.
b) Some number raised to the third power is negative.
c) The sine of a number is always between $-1$ and 1, inclusive.
d) 10 raised to any negative power is strictly between 0 and 1.
2. Let U represent the set of all living people, let T (x) be the statement x is tall, and let B(x) be the
statement x plays basketball. Express the following as formulas involving quantifiers.
a) All basketball players are tall.
3. Suppose X and Y are sets. Express the following as formulas involving quantifiers.
a) Every element of X is an element of Y .
d) No element of X is an element of Y .
b) f is constant on R
De Morgan's Laws
If $P$ is some sentence or formula, then (as we have seen) $\lnot P$ is called the denial or negation
of $P$. The ability to manipulate the denial of a formula accurately is critical to understanding
mathematical arguments. The following tautologies are referred to as De Morgan's Laws:
$\lnot(P \lor Q) \Leftrightarrow (\lnot P \land \lnot Q)$
$\lnot(P \land Q) \Leftrightarrow (\lnot P \lor \lnot Q)$.
These are easy to verify using truth tables, but with a little thought, they are not hard to understand
directly. The first says that the only way that $P \lor Q$ can fail to be true is if both $P$ and $Q$ fail to
be true. For example, the statements "$x$ is neither positive nor negative" and "$x$ is not positive
and $x$ is not negative" clearly express the same thought. For an example of the second tautology,
consider "$x$ is not between 2 and 3." This can be written symbolically as $\lnot((2 < x) \land (x < 3))$, and
clearly is equivalent to $\lnot(2 < x) \lor \lnot(x < 3)$, that is, $(x \le 2) \lor (x \ge 3)$.
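We can also use De Morgan's Laws to simplify the denial of $P \Rightarrow Q$:
$\lnot(P \Rightarrow Q) \Leftrightarrow \lnot(\lnot P \lor Q)$
$\Leftrightarrow (\lnot\lnot P) \land (\lnot Q)$
$\Leftrightarrow P \land \lnot Q$
so the denial of $P \Rightarrow Q$ is $P \land \lnot Q$. (What is the justification for the first step?) In other words, it
is not the case that $P$ implies $Q$ if and only if $P$ is true and $Q$ is false. Of course, this agrees with
the truth table for $P \Rightarrow Q$ that we have already seen.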
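The agreement with the truth table can be confirmed row by row. This is a minimal sketch, assuming the simplified denial is "P and not Q"; the helper implies is a hypothetical name for the conditional as defined in Section 1.1.

```python
from itertools import product

def implies(p, q):
    """Conditional: false only when p is true and q is false."""
    return (not p) or q

rows = []
for p, q in product([True, False], repeat=2):
    denial = not implies(p, q)   # it is not the case that P implies Q
    simplified = p and (not q)   # P is true and Q is false
    rows.append((p, q, denial, simplified))
    print(p, q, denial, simplified)

# The two right-hand columns agree in every row.
print(all(denial == simplified for _, _, denial, simplified in rows))
```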
There are versions of De Morgan's Laws for quantifiers:
$\lnot \forall x\, P(x) \Leftrightarrow \exists x\, \lnot P(x)$;
$\lnot \exists x\, P(x) \Leftrightarrow \forall x\, \lnot P(x)$.
You may be able to see that these are true immediately. If not, here is an explanation for the
statement $\lnot \forall x\, P(x) \Rightarrow \exists x\, \lnot P(x)$ that should be convincing. If $\lnot \forall x\, P(x)$ is true, then $P(x)$ is not
true for every value of $x$, which is to say that for some value $a$, $P(a)$ is not true. This means that
$\lnot P(a)$ is true. Since $\lnot P(a)$ is true, it is certainly the case that there is some value of $x$ that makes
$\lnot P(x)$ true and hence $\exists x\, \lnot P(x)$ is true. The other three implications may be explained similarly.
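Here is another way to think of the quantifier versions of De Morgan's Laws. The statement
$\forall x\, P(x)$ is very much like a conjunction of many statements. If the universe of discourse is the set
of positive integers, for example, then
$\forall x\, P(x) \Leftrightarrow P(1) \land P(2) \land P(3) \land \cdots$
and its negation would be
$\lnot \forall x\, P(x) \Leftrightarrow \lnot(P(1) \land P(2) \land P(3) \land \cdots)$
$\Leftrightarrow \lnot P(1) \lor \lnot P(2) \lor \lnot P(3) \lor \cdots$
$\Leftrightarrow \exists x\, \lnot P(x)$.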
Similar reasoning shows that the second quantifier law can also be interpreted this way.
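Reading a universal quantifier as a long conjunction and an existential quantifier as a long disjunction suggests a direct check over a finite universe. The sketch below is illustrative only: the universe 1 through 20 and the three sample predicates are assumptions, and a finite check illustrates the quantifier laws rather than proving them.

```python
# Assumed finite universe standing in for the positive integers.
universe = range(1, 21)

# Hypothetical sample predicates P(x).
predicates = {
    "x is even": lambda x: x % 2 == 0,       # true for some x, false for others
    "x is positive": lambda x: x > 0,        # true for every x in the universe
    "x exceeds 100": lambda x: x > 100,      # false for every x in the universe
}

def quantifier_laws_hold(P):
    """Check both quantifier versions of De Morgan's Laws for predicate P."""
    first = (not all(P(x) for x in universe)) == any(not P(x) for x in universe)
    second = (not any(P(x) for x in universe)) == all(not P(x) for x in universe)
    return first and second

results = {name: quantifier_laws_hold(P) for name, P in predicates.items()}
for name, ok in results.items():
    print(name, ok)
```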
Finally, general understanding is usually aided by specific examples. Suppose the universe is
the set of cars. If $P(x)$ is "$x$ has four wheel drive," then the denial of "every car has four wheel
drive" is "there exists a car which does not have four wheel drive." This is an example of the first
law. If $P(x)$ is "$x$ has three wheels," then the denial of "there is a car with three wheels" is "every
car does not have three wheels." This fits the pattern of the second law. In a more mathematical
vein, a denial of the sentence "for every $x$, $x^2$ is positive" is "there is an $x$ such that $x^2$ fails to be
positive." A denial of "there is an $x$ such that $x^2 = 1$" is "for every $x$, $x^2 \ne 1$."
It is easy to confuse the denial of a sentence with something stronger. If the universe is the
set of all people, the denial of the sentence "All people are tall" is not the sentence "No people are
tall." This might be called the opposite of the original sentence; it says more than simply that "All
people are tall" is untrue. The correct denial of this sentence is "there is someone who is not tall,"
which is a considerably weaker statement. In symbols, the denial of $\forall x\, P(x)$ is $\exists x\, \lnot P(x)$, whereas
the opposite is $\forall x\, \lnot P(x)$. ("Denial" is an "official" term in wide use; "opposite," as used here, is
not widely used.)
De Morgan's Laws can be used to simplify negations of the "some" form and the "all" form;
the negations themselves turn out to have the same forms, but "reversed," that is, the negation of
an "all" form is a "some" form, and vice versa. Suppose $P(x)$ and $Q(x)$ are formulas. We then have
$\lnot \forall x\, (P(x) \Rightarrow Q(x)) \Leftrightarrow \exists x\, (P(x) \land \lnot Q(x))$;
$\lnot \exists x\, (P(x) \land Q(x)) \Leftrightarrow \forall x\, (P(x) \Rightarrow \lnot Q(x))$.
To illustrate the first form, the denial of the sentence "all lawn mowers run on gasoline" is the
sentence "some lawn mower does not run on gasoline" (not "no lawn mowers run on gasoline," the
opposite). We will verify the first statement and leave a verification of the second as an exercise.
We begin by noting that a formula is usually easier to understand when $\lnot$ does not appear in front
of any compound expression, that is, it appears only in front of simple statements such as $P(x)$.
Using this idea, we find that
$\lnot \forall x\, (P(x) \Rightarrow Q(x)) \Leftrightarrow \exists x\, \lnot(P(x) \Rightarrow Q(x)) \Leftrightarrow \exists x\, (P(x) \land \lnot Q(x))$,
where the last step uses a tautology presented earlier in this section.
Denials of formulas are extremely useful. In a later section we will see that the techniques
called proof by contradiction and proof by contraposition use them extensively. Denials can also be
a helpful study device. When you read a theorem or a definition in mathematics, it is frequently
helpful to form the denial of that sentence to see what it means for the condition to fail. The
more ways you think about a concept in mathematics, the clearer it should become. To illustrate
this point, we note that definitions in mathematics are biconditional in nature even though they
are not always written in this form. In other words, definitions fit into the form $P \Leftrightarrow Q$. To
negate a definition means to write out $\lnot P \Leftrightarrow \lnot Q$. (This is not the same as forming the denial of
a biconditional!) Since definitions often involve quantifiers, some care must be taken when doing
this. Consider the following definition:
A function $f$ defined on $\mathbb{R}$ is even if $f(-x) = f(x)$ for all $x \in \mathbb{R}$.
As just mentioned, this definition is actually a biconditional even though it is not written explicitly
in that form. In symbols, we can express this definition as
$f$ is even $\Leftrightarrow$ $\forall x \in \mathbb{R}\, (f(-x) = f(x))$,
which is equivalent to
$f$ is not even $\Leftrightarrow$ $\exists x \in \mathbb{R}\, (f(-x) \ne f(x))$.
In words, the negation of the definition is the following:
A function $f$ defined on $\mathbb{R}$ is not even if there exists an $x \in \mathbb{R}$ such that $f(-x) \ne f(x)$.
To illustrate these ideas, note that the functions $f(x) = x^2$ and $g(x) = \cos x$ are even. To
show that the function $h$ defined by $h(x) = 2x^4 - 3x$ is not even, it is sufficient to note that
$h(-1) = 5 \ne -1 = h(1)$.
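The negated definition is existential, so a single witness settles the matter, and a witness can be searched for. The sketch below assumes the function in the text reads h(x) = 2x^4 - 3x (the minus sign appears to have been lost in print); the helper find_witness is hypothetical. Note the asymmetry: finding a witness shows a function is not even, while finding none over a finite list proves nothing.

```python
def h(x):
    """Assumed reading of the function in the text: h(x) = 2x^4 - 3x."""
    return 2 * x**4 - 3 * x

def find_witness(f, candidates):
    """Return some x among the candidates with f(-x) != f(x), or None."""
    for x in candidates:
        if f(-x) != f(x):
            return x
    return None

print(h(-1), h(1))                               # the two values compared in the text
print(find_witness(h, range(-3, 4)))             # some witness that h is not even
print(find_witness(lambda x: x**2, range(-3, 4)))  # no witness among the candidates
```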
It takes some practice to learn how to express negated definitions in clear words; read your
definitions several times to ensure that they represent the correct mathematical idea in a way that
others will understand. For the record, it is not always necessary to run through all of the symbols
to negate a definition, but it can be helpful in many cases.
Augustus De Morgan. ($y$-1871; De Morgan himself noted that he turned $x$ years old in the
year $x^2$.) De Morgan's father died when he was ten, after which he was raised by his mother, a
devout member of the Church of England, who wanted him to be a minister. Far from becoming a
minister, De Morgan developed a pronounced antipathy toward the Church, which would profoundly
influence the course of his career.
De Morgan's interest in and talent for mathematics did not become evident until he was
fourteen, but already at sixteen he entered Trinity College at Cambridge, where he studied algebra
under George Peacock and logic under William Whewell. He was also an excellent flute player, and
became prominent in musical clubs at Cambridge.
On graduation, De Morgan was unable to secure a position at Oxford or Cambridge, as he
refused to sign the required religious test (a test not abolished until 1875). Instead, at the age of
22, he became Professor of Mathematics at London University, a new institution founded on the
principle of religious neutrality.
De Morgan wrote prolifically on the subjects of algebra and logic. Peacock and Gregory had
already focused attention on the fundamental importance to algebra of symbol manipulation; that
is, they established that the fundamental operations of algebra need not depend on the interpretation of the variables. De Morgan went one (big) step further: he recognized that the operations (+,
, etc.) also need have no fixed meaning (though he made an exception for equality). Despite this
view, De Morgan does seem to have thought that the only appropriate interpretations for algebra
were familiar numerical domains, primarily the real and complex numbers. Indeed, he thought that
the complex numbers formed the most general possible algebra, because he could not bring himself
to abandon the familiar algebraic properties of the real and complex numbers, like commutativity.
One of De Morgan's most widely known books was A Budget of Paradoxes. He used the
word paradox to mean anything outside the accepted wisdom of a subject. Though this need
not be interpreted pejoratively, his examples were in fact of the "mathematical crank" variety:
mathematically naive people who insisted that they could trisect the angle or square the circle, for
example.
De Morgan's son George was himself a distinguished mathematician. With a friend, George
founded the London Mathematical Society and served as its first secretary; De Morgan was the
first president.
In 1866, De Morgan resigned his position to protest an appointment that was made on religious
grounds, which De Morgan thought abused the principle of religious neutrality on which London
University was founded. Two years later his son George died, and shortly thereafter a daughter died.
His own death perhaps hastened by these events, De Morgan died in 1871 of nervous prostration.
The information for this biography is taken from Lectures on Ten British Mathematicians, by
Alexander Macfarlane, New York: John Wiley & Sons, 1916.
Exercises 1.3.
1. Use truth tables to verify De Morgan's Laws.
2. Let U be the collection of all quadrilaterals. Suppose R(x) is the statement x is a rectangle, and
S(x) is the statement x is a square. Write the following symbolically and decide which pairs of
statements are denials of each other:
a) All rectangles are squares.
c) x y (xy = y 2 x = y)
d) x y (x > y y > x)
4. Verify $\lnot \exists x\, (P(x) \land Q(x)) \Leftrightarrow \forall x\, (P(x) \Rightarrow \lnot Q(x))$. Be certain to include all of the steps.
5. Observe that
$P \lor Q \Leftrightarrow \lnot\lnot(P \lor Q) \Leftrightarrow \lnot(\lnot P \land \lnot Q)$,
which shows that $\lor$ can be expressed in terms of $\land$ and $\lnot$.
a) Show how to express in terms of and .
b) Show how to express in terms of and .
c) Show how to express in terms of and .
6. Express the universal quantifier $\forall$ in terms of $\exists$ and $\lnot$. Express $\exists$ in terms of $\forall$ and $\lnot$.
7. Write (in words) negations for each definition; be careful with your wording. With the exception of
part (c), give examples to illustrate both the definition and the negated definition.
a) A function f has a zero if and only if there exists a real number r such that f (r) = 0.
b) A positive integer n > 1 is square-free if and only if it is not divisible by any perfect square greater
than 1.
c) A metric space X is complete if and only if every Cauchy sequence in X converges.
d) Let $A$ be a set of real numbers and let $z \in \mathbb{R}$. The point $z$ is a limit point of $A$ if and only if for
each positive number $r$ the interval $(z - r, z + r)$ contains a point of $A$ other than $z$.
e) A function f is increasing on R if and only if f (x) < f (y) for all real numbers x and y that satisfy
x < y.
Mixed Quantifiers
In many of the more interesting mathematical formulas, some variables are universally quantified
and others are existentially quantified. You should be very careful when this is the case; in particular, the order of the quantifiers is extremely important. Except as noted, the universe in the
following examples is the set of real numbers.
$\forall y\, \exists x\, (x + y = 0)$.
In the first we require that $x$ be a fixed value that satisfies the equation regardless of the value of
$y$; clearly $x = 0$ will do. In the second formula, however, $x$ depends on $y$; if $y = 3$, $x = -3$, if $y = 0$,
$x = 0$. Note that for any given value of $y$, we must choose $x$ to be $-y$; this shows the explicit
dependence of x on y.
$\exists y\, \forall x\, (xy^3 = x)$.
The first is valid because given any $x$ we can set $y$ equal to the cube root of $x$. (That is, every
real number has a cube root.) So as $x$ varies, $y$ also varies, that is, $y$ depends upon $x$. The second
is valid because there is a single fixed value $y = 1$ which makes the equation $xy^3 = x$ valid,
regardless of the value of x.
$\exists y\, \forall x\, (x < y)$.
The first sentence is true and states that given any number there is a strictly larger number, that
is, there is no largest number. (A simple choice is y = x + 1.) The second sentence is false; it says
that there is a single number that is strictly larger than all real numbers.
In general, if you compare ∃y ∀x P(x, y) with ∀x ∃y P(x, y), it is clear that the first statement
implies the second. If there is a fixed value y_0 which makes P(x, y) true for all x, then no matter
what x we are given, we can find a y (the fixed value y_0) which makes P(x, y) true. So the first
is a stronger statement because one value of y will work for all values of x rather than needing
a different value of y for each x. As in Example 1.8, it is usually the case that this implication
cannot be reversed.
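The effect of quantifier order can be checked mechanically on a small finite universe. The following Python sketch is only an illustration under stated assumptions: the universe is the finite set {−3, ..., 3} rather than the set of real numbers, and the helper names exists_forall and forall_exists are our own.

```python
# Finite stand-in for the universe (an assumption; the text uses all real numbers).
U = range(-3, 4)

def exists_forall(P, universe):
    """True when one fixed y works for every x: the sentence ∃y ∀x P(x, y)."""
    return any(all(P(x, y) for x in universe) for y in universe)

def forall_exists(P, universe):
    """True when each x has its own y: the sentence ∀x ∃y P(x, y)."""
    return all(any(P(x, y) for y in universe) for x in universe)

def add_to_zero(x, y):
    return x + y == 0      # each x needs its own y, namely y = -x

def product_zero(x, y):
    return x * y == 0      # the single value y = 0 works for every x

print(forall_exists(add_to_zero, U), exists_forall(add_to_zero, U))    # True False
print(forall_exists(product_zero, U), exists_forall(product_zero, U))  # True True
```

Whenever exists_forall returns True, forall_exists must also return True, which mirrors the implication described above. A finite check of this kind illustrates the logic of quantifier order only; a sentence such as ∀x ∃y (x < y) is true for the real numbers but fails on any finite universe.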
We now consider some examples that use more than two variables. The sentence "between any
two distinct real numbers is another real number" can be written as
∀x ∀y ∃z ((x < y) ⇒ (x < z < y)).
Observe that z depends in an essential way on both variables to its left, namely, x and y. (The
most common choice is to use z = (x + y)/2.) Neither of the following is true:
∀x ∃z ∀y ((x < y) ⇒ (x < z < y)),
Be certain that you can explain why these statements are false.
Now suppose that the universe of discourse is the set of integers. The following two sentences
are valid:
∀x ∃y ∃z (x = 7y + 5z),
∀x ∃y ∀z (z > x ⇔ z ≥ y).
Consider the first sentence. In words, this sentence says "for each integer x, there exist integers y
and z such that x = 7y + 5z." If we know the value of x, we can choose y = −2x and z = 3x, so
7y + 5z = −14x + 15x = x. Notice that y and z depend on x in an essential way. Turning to the
second, if we know x, we can choose y to be the next integer, x + 1. Any z is strictly larger than x
if and only if it is at least as large as y.
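The witnesses chosen above can be spot-checked numerically. This is a minimal sketch, assuming a finite sample of integers stands in for the whole universe; the function names are hypothetical and the check is evidence, not a proof.

```python
def witnesses(x):
    """The choice made in the text: y = -2x and z = 3x."""
    return -2 * x, 3 * x

def first_sentence_holds(x):
    """Check x = 7y + 5z for the witnesses that depend on x."""
    y, z = witnesses(x)
    return x == 7 * y + 5 * z

def second_sentence_holds(x, zs):
    """With y = x + 1, check that z > x exactly when z >= y, for each sampled z."""
    y = x + 1
    return all((z > x) == (z >= y) for z in zs)

sample = range(-50, 51)
print(all(first_sentence_holds(x) for x in sample))           # True
print(all(second_sentence_holds(x, sample) for x in sample))  # True
```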
We often need to form denials of sentences with mixed quantifiers. These are handled with De
Morgan's Laws, just as in Section 1.3. For example, using the set of real numbers as the universe,
the sentence ∃x ∀y (x + y ≠ π) is false because its denial, the sentence ∀x ∃y (x + y = π), is valid.
(For any number x, let y = π − x.) Similarly, with the universe being the set of integers, the
sentence ∀x ∃y ∃z (x = 4y + 6z) is false because its denial ∃x ∀y ∀z (x ≠ 4y + 6z) is valid. To see
this, note that 4y + 6z is even for any values of y and z so this expression cannot give any odd
integer x.
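The parity argument can be illustrated with a bounded search. In this sketch the search bound and the name representable are assumptions made for illustration; a failed bounded search does not by itself prove that no representation exists, which is why the parity argument is still needed.

```python
def representable(x, bound):
    """Search for integers y, z in [-bound, bound] with x = 4y + 6z."""
    return any(x == 4 * y + 6 * z
               for y in range(-bound, bound + 1)
               for z in range(-bound, bound + 1))

def always_even(bound):
    """Confirm that 4y + 6z is even for every sampled pair (y, z)."""
    return all((4 * y + 6 * z) % 2 == 0
               for y in range(-bound, bound + 1)
               for z in range(-bound, bound + 1))

print(always_even(30))        # True
print(representable(1, 30))   # False: 1 is odd
print(representable(2, 30))   # True: for instance y = -1, z = 1
```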
Exercises 1.4.
1. Using the set of real numbers as the universe of discourse, describe why the following are valid:
a) x y (xy = x2 )
b) x y (x2 + 6xy + 9y 2 = 0)
c) y x (x + y > xy)
d) y x (y x = xy 2 + 1)
e) x y (y x = xy + 2)
f ) x y (2 sin2 y + cos(2y) = x)
2. Using the integers as the universe of discourse, describe why the following are valid:
a) x y z (z < x z y)
b) x y z (x = 8y + 3z)
c) x y z (x = yz y = z)
3. Form the denials of the following statements and simplify using De Morgan's Laws. Which of the
statements are true?
a) x y ((x+y = 1)(xy 6= 0))
b) y x ((x2 = y) (x = y + 1))
4. Write (in words) negations for each definition; do pay attention to your wording. Except for part (a),
try to find examples to illustrate the definitions.
a) A group G is abelian if and only if x y = y x for all x and y in G.
b) A sequence {x_n} is bounded if and only if there is a number M such that |x_n| ≤ M for all n.
c) The sequence {x_n} converges to the number L if and only if for each ε > 0 there exists a positive
integer N such that |x_n − L| < ε for all n ≥ N.
d) A sequence {x_n} is a Cauchy sequence if and only if for each ε > 0 there exists a positive integer
N such that |x_m − x_n| < ε for all m, n ≥ N.
e) A function f : R → R is continuous at c if and only if for each ε > 0 there exists δ > 0 such that
|f(x) − f(c)| < ε for all x that satisfy |x − c| < δ.
f) A function f is Lipschitz on R if and only if there exists a positive constant K such that the
inequality |f(x) − f(y)| ≤ K|x − y| is valid for all x and y in R.
g) A set A of real numbers is bounded if and only if there exists a positive number M such that
|x| ≤ M for all x ∈ A.
h) A set A of real numbers is open if and only if for each x ∈ A there exists a positive number r such
that (x − r, x + r) ⊆ A.
5. Using quantifiers, define what it means for a function f defined on R to be periodic (for example,
recall that sin(x) is periodic). What does it mean for f to fail to be periodic?
Sets
Like logic, the subject of sets is rich and interesting for its own sake. However, we will be content
to list a few facts about sets and discuss some techniques for dealing with them. Further properties
of sets are considered in Chapter 4.
It is necessary for some terms in mathematics to be left undefined; the concept of set is one
such term. When a term is left undefined, some attempt must be made to explain what is meant
by the term. Since everyone has some experience with sets (a set of dishes, a collection of stamps,
a herd of buffalo, a pocket full of change), it is not difficult to get across the basic idea of a set.
Hence, we say that a set is a collection of objects. The objects in a set usually have some features
in common, such as the set of real numbers or a set of continuous functions, but a set can also
be any random collection of objects. (Actually, there are some restrictions on the types of objects
that can be considered, but this restriction will not be important here.) Any one of the objects in
A ∩ B = {x : x ∈ A ∧ x ∈ B} and
A ∪ B = {x : x ∈ A ∨ x ∈ B},
which are called the intersection of A and B and the union of A and B, respectively. It is
sometimes useful to consider the complement of B relative to A. This set, which is denoted by
A \ B, is the set of all elements that belong to A but do not belong to B. This operation is referred
to as set difference since the elements of B are removed from A.
EXAMPLE 1.9 Suppose the universe U of discourse consists of the set {1, 2, 3, . . . , 10} and
consider the sets A = {1, 3, 4, 5, 7} and B = {1, 2, 4, 7, 8, 9}. Then
A^c = {2, 6, 8, 9, 10},
A \ B = {3, 5},
A ∩ B = {1, 4, 7},
A ∪ B = {1, 2, 3, 4, 5, 7, 8, 9}.
Note that the complement of a set depends on the universe U , while the union, intersection, and
set difference of two sets do not.
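The computations in Example 1.9 can be reproduced with Python's built-in set type. This is a small sketch; the variable names are our own, and the complement is written as a difference from U because Python sets carry no universe of their own.

```python
U = set(range(1, 11))          # the universe {1, 2, ..., 10}
A = {1, 3, 4, 5, 7}
B = {1, 2, 4, 7, 8, 9}

complement_A = U - A           # the complement depends on the universe U
difference = A - B             # A \ B
intersection = A & B
union = A | B

print(sorted(complement_A))    # [2, 6, 8, 9, 10]
print(sorted(difference))      # [3, 5]
print(sorted(intersection))    # [1, 4, 7]
print(sorted(union))           # [1, 2, 3, 4, 5, 7, 8, 9]
```

Changing U changes complement_A but leaves the other three results untouched, which is the point of the remark that closes the example.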
THEOREM 1.10 Suppose A, B, and C are sets. Then
a) (A^c)^c = A
b) A ∩ B ⊆ A
c) A ⊆ A ∪ B
d) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
e) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
f) (A ∩ B)^c = A^c ∪ B^c
g) (A ∪ B)^c = A^c ∩ B^c
Proof. We first give a set theoretic proof (or a "chasing points" proof) of part (d). Suppose
that x ∈ A ∩ (B ∪ C). This means that x belongs to A and to either B or C. It follows that x
belongs to either A and B or to A and C, that is, x ∈ (A ∩ B) ∪ (A ∩ C). We have thus shown
that A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C). Now suppose that x ∈ (A ∩ B) ∪ (A ∩ C). This means that
x belongs to either A ∩ B or to A ∩ C, which in turn implies that x belongs to A and to either B
or C, that is, x ∈ A ∩ (B ∪ C). Therefore, (A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C). We conclude that
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C). The other parts of the theorem can be proved in a similar way.
However, it is important to also realize that all of these facts are consequences of logical
statements considered earlier. To illustrate this, define statements P(x) = "x ∈ A", Q(x) = "x ∈ B",
and R(x) = "x ∈ C". Then part (d) is simply the tautology (part (g) of Theorem 1.3)
P(x) ∧ (Q(x) ∨ R(x)) ⇔ (P(x) ∧ Q(x)) ∨ (P(x) ∧ R(x)).
The other statements in the theorem are also related to tautologies.
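As a sanity check (not a substitute for the proofs), the identities of the theorem, namely the double complement, the two inclusions, the two distributive laws, and the two complement rules, can be tested by brute force on every triple of subsets of a small universe. The three-element universe and the helper names below are assumptions made for illustration.

```python
from itertools import combinations

U = frozenset({1, 2, 3})       # a small universe chosen for illustration

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def comp(X):
    """Complement relative to the universe U."""
    return U - X

def identities_hold(A, B, C):
    return all([
        comp(comp(A)) == A,                      # double complement
        A & B <= A,                              # intersection is a subset
        A <= A | B,                              # union is a superset
        A & (B | C) == (A & B) | (A & C),        # distributive law
        A | (B & C) == (A | B) & (A | C),        # distributive law
        comp(A & B) == comp(A) | comp(B),        # complement of an intersection
        comp(A | B) == comp(A) & comp(B),        # complement of a union
    ])

all_sets = subsets(U)
print(all(identities_hold(A, B, C)
          for A in all_sets for B in all_sets for C in all_sets))  # True
```

An exhaustive check over one finite universe shows only that no counterexample exists there; the point-chasing proof and the tautology argument are what establish the identities for arbitrary sets.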
As in the case of logic, parts (f) and (g) of Theorem 1.10 are called De Morgan's Laws.
Theorem 1.10 certainly is not an exhaustive list of set identities (note that obvious facts such as
commutative and associative properties are not included); it merely illustrates a few of the more
important ones.
Suppose that A and B are nonempty sets. If a ∈ A and b ∈ B, then we can form the ordered
pair (a, b); the pair is said to be ordered since the first element must come from the set A and the
second from the set B. The fundamental property of ordered pairs is that (a_1, b_1) = (a_2, b_2) if and
only if a_1 = a_2 and b_1 = b_2, that is, two ordered pairs are the same when both the first elements
and the second elements are the same. If A and B are sets, the set
A × B = {(a, b) : a ∈ A ∧ b ∈ B}
is called the Cartesian product of A and B. (Note carefully that a Cartesian product does not
involve the multiplication of elements.) For example, if A = {r, s, t} and B = {$, %}, then
A × B = {(r, $), (r, %), (s, $), (s, %), (t, $), (t, %)}.
The sets R × R and R × R × R are usually abbreviated as R² and R³, respectively, and represent the
plane and 3-dimensional space. It is in this latter context that you are most familiar with ordered
pairs. As a reminder (via a particular example), the graph of the equation y = x² + 2x + 2 is the
subset of R² defined by {(x, y) : x ∈ R and y = x² + 2x + 2}.
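The Cartesian product of two finite sets can be generated with itertools.product from the Python standard library. In this sketch the elements of A and B are represented as one-character strings, which is our own encoding of the example above.

```python
from itertools import product

A = ["r", "s", "t"]
B = ["$", "%"]

AxB = set(product(A, B))       # all ordered pairs (a, b) with a in A and b in B
print(len(AxB))                # 6
print(("r", "$") in AxB)       # True
print(("$", "r") in AxB)       # False: order matters in an ordered pair

# Ordered pairs are equal exactly when both coordinates agree.
print(("r", "$") == ("r", "$"), ("r", "$") == ("r", "%"))  # True False
```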
René Descartes. Descartes (1596-1650) was perhaps the most able mathematician of his time
(though he may have to share top billing with Pierre de Fermat, a busy lawyer who did mathematics
on the side for fun). Despite his ability and his impact on mathematics, Descartes was really a
scientist and philosopher at heart. He made one great contribution to mathematics, La géométrie,
and then concentrated his energies elsewhere.
La géométrie did not even appear on its own, but as an appendix to his most famous work, Discours de la méthode pour bien conduire sa raison et chercher la vérité dans les sciences (Discourse
on the method of reasoning well and seeking truth in the sciences). Descartes is remembered as
the father of coordinate or analytic geometry, but his uses of the method were much closer in spirit
to the great Greek geometers of antiquity than to modern usage. That is, his interest really lay in
geometry; he viewed the introduction of algebra as a powerful tool for solving geometrical problems. Confirming his view that geometry is central, he went to some lengths to show how algebraic
operations (for example, finding roots of quadratic equations) could be interpreted geometrically.
In contrast to modern practice, Descartes had no interest in graphing an arbitrary relation
in two variables; in the whole of La géométrie, he did not plot any new curve from its equation.
Further, ordered pairs do not play any role in the work; rectangular coordinates play no special
role (Descartes used oblique coordinates freely, that is, his axes were not constrained to meet
at a right angle); familiar formulas for distance, slope, angle between lines, and so on, make no
appearance; and negative coordinates, especially negative abscissas, are little used and poorly
understood. Ironically, then, there is little about the modern notion of Cartesian coordinates that
Descartes would recognize.
Despite all these differences in emphasis and approach, Descartes' work ultimately made a great
contribution to the theory of functions. The Cartesian product may be misnamed, but Descartes
surely deserves the tribute.
Exercises 1.5.
1. For the given universe U and the given sets A and B, find A^c, A ∩ B, and A ∪ B.
a) U = {1, 2, 3, 4, 5, 6, 7, 8}, A = {1, 3, 5, 8}, B = {2, 3, 5, 6}
b) U = R, A = (−∞, 2], B = (1, ∞)
c) U = Z, A = {n : n is even}, B = {n : n is odd}
d) U = Q, A = ∅, B = {q : q > 0}
e) U = N, A = N, B = {n : n is even}
f) U = R, A = (−∞, 0], B = [2, 3)
g) U = N, A = {n : n 6}, B = {1, 2, 4, 5, 7, 8}
h) U = R R, A = {(x, y) : x2 + y 2 1}, B = {(x, y) : x 0, y 0}.
2. For the sets in Exercise 1a, 1b, and 1e, find A \ B and B \ A.
3. Prove that A \ B = A ∩ B^c.
4. Prove the parts of Theorem 1.10 not proved in the text. Be certain you understand both approaches
to these proofs.
5. Use Exercise 3 and Theorem 1.10 to prove that (A \ B) ∪ (B \ A) = (A ∪ B) \ (A ∩ B).
6. Suppose U is some universe of discourse. Find {x : x = x} and {x : x ≠ x}.
7. Prove carefully from the definition of ⊆ that for any set A, ∅ ⊆ A.
8. a) For A = {1, 2, 3, 4} and B = {x, y}, write out A × B, A × A, and B × B.
b) If A has m elements and B has n elements, how many elements are in A × B?
c) Describe A × ∅. Justify your answer.
d) What name do we give the set (0, ∞) × (0, ∞) in the universe R²?
e) What kind of geometric figure is [1, 2] × [1, 2] × [1, 2] in the universe R³?
9. If A and B are sets, show that A ⊆ B, A ∩ B^c = ∅, and A^c ∪ B = U are equivalent statements, that
is, each pair is related by the biconditional. What are the corresponding logical statements?
10. Suppose A, B, C, and D are sets.
a) Prove that (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D).
b) Does (a) hold with ∩ replaced by ∪? Prove any set inclusion that is true and give an example of
any result that fails.
c) Illustrate the results in parts (a) and (b) graphically in R².
11. Suppose we say a set S is normal if S ∉ S. (You probably have encountered only normal sets. For
example, the set of real numbers is not a real number. However, consider the set of all abstract
ideas. Most people would agree that this set is not normal.) Consider N = {S : S is a normal set}.
Is N a normal set? (This is called Russell's Paradox. Examples like this helped make set theory a
mathematical subject in its own right. Although the concept of a set at first seems straightforward,
even trivial, it emphatically is not.)
Families of Sets
Suppose I is a set and with each i ∈ I, associate a set A_i. The set I is referred to as the index
set and we call {A_i : i ∈ I} an indexed family of sets. Sometimes this is denoted by {A_i}_{i∈I}.
Consider the following examples of this concept.
Suppose I is the days of the year, and for each i ∈ I, let B_i be the set of people whose birthday
is i. So, for example, Beethoven ∈ B_(December 16).
Suppose I is the set of integers and for each i ∈ I, let C_i be the set of multiples of i, that is,
C_i = {ni : n ∈ Z}. For example, C_7 = {. . . , −21, −14, −7, 0, 7, 14, 21, . . .}.
For each real number x, let D_x = {x − 1, x, x + 1}. In this case, the index set I is the set of
real numbers. For example, D_π = {π − 1, π, π + 1}.
Given an indexed family {A_i : i ∈ I}, we can define the intersection and union of the sets A_i
using the universal and existential quantifiers:
⋂_{i∈I} A_i = {x : ∀i ∈ I (x ∈ A_i)} and
⋃_{i∈I} A_i = {x : ∃i ∈ I (x ∈ A_i)}.
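For a finite index set, these two definitions translate almost word for word into Python, with all playing the role of the universal quantifier and any the role of the existential quantifier. The sketch below stores an indexed family as a dictionary from indices to sets; that representation, the function names, and the window 1, ..., 24 used to keep the sets of multiples finite are all assumptions made for illustration.

```python
def family_intersection(family):
    """{x : x is in A_i for all i in I}; assumes the index set is nonempty."""
    sets = list(family.values())
    return {x for x in sets[0] if all(x in A for A in sets)}

def family_union(family):
    """{x : x is in A_i for some i in I}."""
    candidates = [x for A in family.values() for x in A]
    return {x for x in candidates if any(x in A for A in family.values())}

# Multiples of i inside the window 1..24, for the index set I = {2, 3, 4}.
family = {i: {n for n in range(1, 25) if n % i == 0} for i in (2, 3, 4)}

print(sorted(family_intersection(family)))  # [12, 24]
print(sorted(family_union(family)))
# [2, 3, 4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20, 21, 22, 24]
```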
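THEOREM 1.11 If {A_i : i ∈ I} is an indexed family of sets, then
a) (⋂_{i∈I} A_i)^c = ⋃_{i∈I} A_i^c;
b) (⋃_{i∈I} A_i)^c = ⋂_{i∈I} A_i^c.
Proof. For part (a), the equivalences
x ∈ (⋂_{i∈I} A_i)^c ⇔ x ∉ ⋂_{i∈I} A_i
⇔ ∃i ∈ I (x ∉ A_i)
⇔ ∃i ∈ I (x ∈ A_i^c)
⇔ x ∈ ⋃_{i∈I} A_i^c
show that the sets are equal. The reader should make certain each step is clear.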
You may be puzzled by the inclusion of this theorem as it seems to be a simple consequence
of the latter part of Theorem 1.10. However, parts (f) and (g) of Theorem 1.10 concern the
intersection or union of two sets only. This can be extended easily to the intersection or union of
a finite number of sets, though even this modest extension does require separate proof (see Section
2.6). The real problem is with intersections or unions of an infinite number of sets. Though in this
case the extension to infinite operations has an easy proof, it is not always the case that what is
true for a finite number of operations is true for an infinite number of operations, and even when
true, the proof in the infinite case may be more difficult. (For example, a finite sum of differentiable
functions is differentiable, but an infinite sum of differentiable functions may not be differentiable.)
The relationships in the following theorem are simple but useful; they illustrate the dual nature
of the union and intersection of families of sets.
THEOREM 1.12 If {A_i : i ∈ I} is an indexed family of sets and B is any set, then
a) ⋂_{i∈I} A_i ⊆ A_j for each j ∈ I;
b) A_j ⊆ ⋃_{i∈I} A_i for each j ∈ I;
c) if B ⊆ A_i for all i ∈ I, then B ⊆ ⋂_{i∈I} A_i;
d) if A_i ⊆ B for all i ∈ I, then ⋃_{i∈I} A_i ⊆ B.
Proof. Part (a) is a case of specialization. Suppose that x ∈ ⋂_{i∈I} A_i. This means that x ∈ A_i
for all i ∈ I. In particular, x ∈ A_j for any choice of j ∈ I. We have thus shown that ⋂_{i∈I} A_i ⊆ A_j
for each j ∈ I. Part (d) follows in much the same way. Suppose that x ∈ ⋃_{i∈I} A_i. It follows that
x ∈ A_i for some i ∈ I. Since A_i ⊆ B, we see that x ∈ B. We have thus shown that ⋃_{i∈I} A_i ⊆ B.
Proofs for parts (b) and (c) are left as exercises.
An indexed family {A_i : i ∈ I} is pairwise disjoint if A_i ∩ A_j = ∅ whenever i and j are distinct
elements of I. For example, the indexed family {B_i} involving birthdays is pairwise disjoint, but
the family {C_i} involving multiples is not. If S is a set, then an indexed family {A_i : i ∈ I} of
nonempty subsets of S is a partition of S if it is pairwise disjoint and S = ⋃_{i∈I} A_i. Partitions
appear frequently in mathematics; one important way to generate partitions appears in the next
section. Two simple examples are the following.
Let I = {e, o}, let A_e be the set of even integers, and let A_o be the set of odd integers. Then
{A_i : i ∈ I} is a partition of S = Z. (Technically, this fact requires proof; see the next chapter.)
Let I = R, let S = R², and for each i ∈ I, let A_i = {(x, i) : x ∈ R}. Each A_i is the graph of a
horizontal line and the indexed family partitions the plane S.
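The three requirements in the definition of a partition (nonempty subsets of S, pairwise disjoint, union equal to S) can each be tested directly when S is finite. The following sketch assumes a finite window of integers in place of Z and a dictionary from indices to sets as the indexed family; is_partition is a hypothetical helper name.

```python
from itertools import combinations

def is_partition(family, S):
    """Check whether the indexed family (a dict: index -> set) partitions S."""
    blocks = list(family.values())
    nonempty_subsets = all(len(A) > 0 and A <= S for A in blocks)
    # combinations pairs up the sets attached to two distinct indices
    pairwise_disjoint = all(A.isdisjoint(B) for A, B in combinations(blocks, 2))
    covers = set().union(*blocks) == S
    return nonempty_subsets and pairwise_disjoint and covers

S = set(range(-10, 11))        # finite stand-in for Z
parity = {"e": {n for n in S if n % 2 == 0},
          "o": {n for n in S if n % 2 != 0}}
multiples = {2: {n for n in S if n % 2 == 0},
             3: {n for n in S if n % 3 == 0}}

print(is_partition(parity, S))     # True
print(is_partition(multiples, S))  # False: 6 lies in both sets and 1 lies in neither
```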
Sometimes we want to discuss a collection of sets (that is, a set of sets) even though there is no
natural index present. In this case we can use the collection itself as the index. For example, if S is
{{1, 3, 4}, {2, 3, 4, 6}, {3, 4, 5, 7}}, then we have ⋂_{A∈S} A = {3, 4} and ⋃_{A∈S} A = {1, 2, 3, 4, 5, 6, 7}.
An especially useful collection of sets is the power set of a set. If X is any set, the power set
P(X) of X is the set that contains all of the subsets of X, that is, P(X) = {A : A ⊆ X}. Note
that each element of a power set is itself a set. If X = {1, 2}, then P(X) = {∅, {1}, {2}, {1, 2}}.
For the record, P(∅) = {∅}, that is, the power set of the empty set is nonempty.
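For a finite set X, the power set can be listed by collecting the subsets of each possible size. This sketch uses frozenset for the elements of P(X), since ordinary Python sets cannot be members of other sets; the function name power_set is our own.

```python
from itertools import combinations

def power_set(X):
    """Return the set of all subsets of the finite set X, each as a frozenset."""
    items = list(X)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

P = power_set({1, 2})
print(len(P))                                   # 4
print(frozenset() in P, frozenset({1, 2}) in P) # True True
print(power_set(set()) == {frozenset()})        # True: P(∅) = {∅} has one element
```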
Exercises 1.6.
1. Let I = {1, 2, 3}, A_1 = {1, 3, 4, 6, 7}, A_2 = {1, 4, 5, 7, 8, 9}, and A_3 = {2, 4, 7, 10}. Find
⋂_{i∈I} A_i and ⋃_{i∈I} A_i.
2. Let I = N and for each positive integer i, define the intervals A_i = [0, 1/i], B_i = (i, i + 1), and
C_i = [i, ∞). Find each of the following.
a) ⋂_{i∈I} A_i and ⋃_{i∈I} A_i b) ⋂_{i∈I} B_i and ⋃_{i∈I} B_i c) ⋂_{i∈I} C_i and ⋃_{i∈I} C_i
3. For each x ∈ [0, 1], let A_x be the interval (x − 1, x + 1). Find ⋂_{x∈[0,1]} A_x and ⋃_{x∈[0,1]} A_x.
4. Use part (a) of Theorem 1.11 (and some logic or set properties) to prove part (b) of the theorem.
5. Prove parts (b) and (c) of Theorem 1.12.
6. Let P be the set of prime numbers that are less than 20. Give an example of a partition of P that
consists of four sets.
7. Let I = [0, ∞) and for each i ∈ I, let A_i = {(x, y) : x² + y² = i²}. Show that {A_i : i ∈ I} is a partition
of R².
8. Let {A_i}_{i∈I} be a partition of a set S and let T ⊆ S. Prove that the nonempty sets in the collection
{A_i ∩ T}_{i∈I} form a partition of T.
9. Suppose S is a collection of sets and B is some other set. Show that if B is disjoint from every A ∈ S
then B is disjoint from ⋃_{A∈S} A.
10. Write out the power set for the set {a, b, c}.
Equivalence Relations
We might arguably say that mathematics is the study of how various entities are related; in any
case, the relationships between mathematical objects are a large part of what we study. You are
already familiar with many such relationships: If f (x) = y, then x and y are related in a special way
by the function f; if we say x < y or x = y or x ≥ y, we are highlighting a particular relationship
between the numbers x and y; the symbols A ⊆ B also indicate a relationship, this time involving
the sets A and B.
Certain kinds of relationships appear over and over in mathematics, and therefore deserve
careful treatment and study. We use the notation x ∼ y to mean that x and y are related in some
special way; ∼ is called a relation. The meaning of ∼ changes with context; it is not a fixed
relation. In some cases, of course, we can use other symbols that have come to be associated with
particular relations, like <, ⊆, or =. A very important type of relation is given in the next
definition.
DEFINITION 1.13 A relation ∼ on a nonempty set A is an equivalence relation on A if it
satisfies the following three properties:
a) reflexivity: for all a ∈ A, a ∼ a.
b) symmetry: for all a ∈ A and all b ∈ A, if a ∼ b, then b ∼ a.
c) transitivity: for all a ∈ A, all b ∈ A, and all c ∈ A, if a ∼ b and b ∼ c, then a ∼ c.
Equality (=) is certainly an equivalence relation. It is, of course, enormously important, but it
is not a very interesting example of an equivalence relation since no two distinct objects are related
by equality. Less than or equal to (≤) is not an equivalence relation since it fails to be symmetric.
The following examples indicate that equivalence relations can be more interesting than equality.
For the first example, recall that a is a multiple of n if there exists an integer j such that a = jn.
EXAMPLE 1.14 Suppose A = Z and let n be a fixed positive integer. Let a ∼ b mean that
a − b is a multiple of n. For each integer a, it is clear that a − a = 0 is a multiple of n. This shows
that ∼ is reflexive. If a and b are any integers for which a − b is a multiple of n, it follows easily
that b − a is also a multiple of n. In other words, a ∼ b implies b ∼ a for all a and b in the set Z,
and we conclude that ∼ is symmetric. Finally, suppose that a, b, and c are any integers for which
a ∼ b and b ∼ c. This means that there exist integers j and k such that a − b = jn and b − c = kn.
Since
a − c = (a − b) + (b − c) = jn + kn = (j + k)n,
we see that a − c is a multiple of n. It follows that a ∼ c, revealing that ∼ is transitive. Since the
relation ∼ is reflexive, symmetric, and transitive, it is an equivalence relation on Z.
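The three properties in Definition 1.13 can be tested exhaustively on a finite sample of a set. The sketch below is a check on a window of integers, not a proof for all of Z; the function name is_equivalence, the choice n = 6, and the window are assumptions made for illustration.

```python
def is_equivalence(related, A):
    """Test reflexivity, symmetry, and transitivity of `related` on the finite set A."""
    A = list(A)
    reflexive = all(related(a, a) for a in A)
    symmetric = all(related(b, a) for a in A for b in A if related(a, b))
    transitive = all(related(a, c)
                     for a in A for b in A for c in A
                     if related(a, b) and related(b, c))
    return reflexive and symmetric and transitive

n = 6

def congruent(a, b):
    """a ~ b when a - b is a multiple of n."""
    return (a - b) % n == 0

def less_or_equal(a, b):
    return a <= b

sample = range(-12, 13)
print(is_equivalence(congruent, sample))      # True
print(is_equivalence(less_or_equal, sample))  # False: symmetry fails
```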
EXAMPLE 1.15 Let A be the set of all words. If a ∈ A and b ∈ A, define a ∼ b to mean that a
and b have the same number of letters. It is easy to verify that ∼ is an equivalence relation on A.
EXAMPLE 1.16 Let A be the set R². If a ∈ A and b ∈ A, with a = (x_1, y_1) and b = (x_2, y_2),
define a ∼ b to mean that (x_1)² + (y_1)² = (x_2)² + (y_2)². We leave as an exercise a proof that ∼ is an equivalence
relation on R².
If ∼ is an equivalence relation defined on a set A and a ∈ A, let [a] = {x ∈ A : x ∼ a}. This set
is called the equivalence class corresponding to a. Observe that reflexivity implies that a ∈ [a].
Referring to our earlier examples, we obtain the following:
Letting n = 6 in Example 1.14, we find that
[2] = {6n + 2 : n ∈ Z} = {. . . , −10, −4, 2, 8, . . .};
[5] = {6n + 5 : n ∈ Z} = {. . . , −7, −1, 5, 11, . . .}.
Note that [2] and [5] are disjoint, that [2] = [8], and that [5] = [29].
Using the relation of Example 1.15, [math] is the set consisting of all four letter words.
Using the relation of Example 1.16, [(1, 0)] is the boundary of the unit circle.
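The equivalence classes for n = 6 can be listed inside a finite window of integers. This is a sketch only: the true classes are infinite, so the window −12, ..., 12 is an assumption that truncates them, and the function names are our own.

```python
def equivalence_class(a, A, related):
    """The part of [a] that lies in the finite set A: {x in A : x ~ a}."""
    return {x for x in A if related(x, a)}

def congruent_mod_6(a, b):
    return (a - b) % 6 == 0

window = range(-12, 13)
class_2 = equivalence_class(2, window, congruent_mod_6)
class_5 = equivalence_class(5, window, congruent_mod_6)

print(sorted(class_2))            # [-10, -4, 2, 8]
print(sorted(class_5))            # [-7, -1, 5, 11]
print(class_2.isdisjoint(class_5))                                # True
print(class_2 == equivalence_class(8, window, congruent_mod_6))   # True
print(class_5 == equivalence_class(29, window, congruent_mod_6))  # True
```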
The words "the following are equivalent" followed by a list of statements such as P, Q, and
R mean that each of the biconditionals P ⇔ Q, P ⇔ R, and Q ⇔ R is valid. Although there
are six conditional statements here, the reader should verify that it is sufficient to prove the three
conditionals P ⇒ Q, Q ⇒ R, and R ⇒ P. This is the plan of action for the following proof.
THEOREM 1.17 Suppose ∼ is an equivalence relation on a set A. Then for any two elements
a and b in A, the following are equivalent:
1) a ∼ b;
2) [a] ∩ [b] ≠ ∅;
3) [a] = [b].
Proof. We first prove that (1) ⇒ (2). Suppose a ∼ b. By definition, we find that a is an element
of [b]. Since a is also in [a], we know that a ∈ [a] ∩ [b]. This shows that [a] ∩ [b] ≠ ∅.
We next prove that (2) ⇒ (3). Suppose that [a] ∩ [b] ≠ ∅. Since [a] ∩ [b] is not empty, we can
choose y ∈ A such that y is in both [a] and [b]. This means that y ∼ a and y ∼ b. Using both the
symmetric and transitive properties of ∼, it follows that a ∼ b. We need to show that the two sets
[a] and [b] are equal. To do so, note that (be certain you can verify each step)
x ∈ [a] ⇒ x ∼ a ⇒ x ∼ b ⇒ x ∈ [b];
x ∈ [b] ⇒ x ∼ b ⇒ x ∼ a ⇒ x ∈ [a].
The first line shows that [a] ⊆ [b] and the second line shows that [b] ⊆ [a]. We conclude that
[a] = [b].
To prove that (3) ⇒ (1), assume that [a] = [b]. Since a ∈ [a] and [a] = [b], we find that a ∈ [b].
It follows that a ∼ b. This completes the proof.
Suppose that ∼ is an equivalence relation on a set A and let A/∼ denote the collection of all
the corresponding equivalence classes. By the previous theorem, we see that A/∼ is a partition of
A. The expression A/∼ is usually pronounced "A mod twiddle."
EXAMPLE 1.19 Using the relation of Example 1.15,
A/∼ = {{one letter words}, {two letter words}, {three letter words}, . . .}.
It is easy to see that the nonempty sets in this collection form a partition of the set of all words.
EXAMPLE 1.20 Using the relation of Example 1.16, A/∼ = {C_r : r ≥ 0}, where for each
positive real number r, C_r is the circle of radius r centered at the origin (just the circumference)
and C_0 = {(0, 0)}. Note that the resulting partition of R² has a simple geometric description.
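For a finite set, the collection A/∼ can be built by comparing each element with one representative of every class found so far; Theorem 1.17 is what justifies testing against a single representative. The sketch below uses a short, made-up list of words with the same-length relation of Example 1.15; the function name quotient and the word list are assumptions for illustration.

```python
def quotient(A, related):
    """Group the elements of the finite collection A into equivalence classes."""
    classes = []
    for a in A:
        for cls in classes:
            representative = next(iter(cls))
            if related(a, representative):   # a belongs to an existing class
                cls.add(a)
                break
        else:                                # no class matched: start a new one
            classes.append({a})
    return classes

def same_length(a, b):
    return len(a) == len(b)

words = ["math", "set", "logic", "proof", "map", "ring"]
classes = quotient(words, same_length)
print([sorted(c) for c in classes])
# [['math', 'ring'], ['map', 'set'], ['logic', 'proof']]
```

The classes returned are nonempty, pairwise disjoint, and together contain every word in the list, so they form a partition of the list, as the discussion above predicts.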
Exercises 1.7.
1. Let A = R³. Let a ∼ b mean that a and b have the same z coordinate. Show ∼ is an equivalence
relation and describe [a] geometrically.
2. Show that the relation defined in Example 1.16 is an equivalence relation.
3. Find examples (more than one if possible) of relations with the given property; indicate the set and
the relation clearly.
a) The relation is reflexive and symmetric but not transitive.
b) The relation is symmetric and transitive but not reflexive.
c) The relation is reflexive and transitive but not symmetric.
4. Suppose ∼ is a relation on A. The following purports to prove that the reflexivity condition is
unnecessary, that is, it can be derived from symmetry and transitivity:
Suppose a ∼ b. By symmetry, b ∼ a. Since a ∼ b and b ∼ a, by transitivity, a ∼ a.
Therefore, ∼ is reflexive.
What is wrong with this argument?
5. Suppose ∼ is a relation on A that is reflexive and has the property that for all elements a, b, and c in
A, if a ∼ b and a ∼ c, then b ∼ c. Prove that ∼ is an equivalence relation on A.
6. Let f : R → R be a function and define a relation ∼ on R by a ∼ b if f(a) = f(b).
a) Prove that ∼ is an equivalence relation on R.
b) For f(x) = x² + 2x, find [5].
c) For f(x) = sin x, find [π/2] and [π/6].
7. Define a relation ∼ on R by x ∼ y if there exist integers a, b, c, and d with |ad − bc| = 1 such that
y = (ax + b)/(cx + d).
Prove that ∼ is an equivalence relation on R. Can you identify the equivalence class [0]?
8. Let Z* be the set of all nonzero integers. Define a relation ∼ on Z × Z* by (a, b) ∼ (c, d) if and only if
ad = bc. Prove that ∼ is an equivalence relation on Z × Z*. (Remember to use only integers in your
proof; fractions should not appear.)
9. Let S be the collection of all sequences of real numbers and define a relation ∼ on S by {x_n} ∼ {y_n} if
and only if {x_n − y_n} converges to 0.
a) Prove that ∼ is an equivalence relation on S.
b) What happens if ∼ is defined by {x_n} ∼ {y_n} if and only if {x_n + y_n} converges to 0?
10. Let C be the collection of all continuous functions defined on R and define a relation ∼ on C by f ∼ g
if and only if f − g is differentiable on R. Prove that ∼ is an equivalence relation on C.
Proof may be what best distinguishes mathematics from other disciplines. The notion of proof
even distinguishes mathematics from the sciences, which (according to most people) are logical,
rigorous, and to a greater or lesser degree (depending on the discipline) based on mathematics.
By using rigorous, logically correct reasoning, we aim to prove mathematical theorems, that is,
to demonstrate that something is true beyond all doubt (assuming, of course, that the axioms we
choose to accept are valid).
It is impossible to give a formula or algorithm for proving any and all mathematical statements,
yet certain approaches or strategies appear over and over in successful proofs, so studying proof
itself is worthwhile. Of course, even if the subject is proof itself, we need to prove something,
so in this chapter we begin our study of number theory, that is, the properties of the integers
(often, but not always, the non-negative integers). A mathematical theory such as number theory
or geometry is a collection of related statements that are known or accepted to be true. The theory
consists of definitions, axioms, and derived results. The derived results are usually called theorems,
but other names (such as propositions) are sometimes used as well. Before proceeding with our
study of some aspects of elementary number theory, we present a general discussion of definitions,
axioms, and theorems.
Definitions represent a mathematical shorthand. A word or short phrase is used to represent
some concept. For example, a prime number is a positive integer p such that p > 1 and the only
positive divisors of p are p and 1. The term "prime number" replaces the longer phrase. It is much
easier to write or say "prime number" than it is to write or say "a positive integer greater than 1
whose only positive divisors are itself and 1." The tradeoff, of course, is that you must learn what is
meant by the term "prime number." Although the longer version is not written, it must be known.
Notice that the definition of a prime number requires knowledge of positive integers and the notion
of divisibility of integers. New terms are defined using previously defined terms and concepts. This
process cannot go on indefinitely. In order to avoid circular definitions, some terms must remain
Chapter 2 Proofs
undefined. In geometry, points and lines are undefined terms. Other objects, such as triangles and
squares, are defined in terms of points and lines. Although most people are comfortable with the
concepts of points and lines, it is not possible to give them a definition in terms of simpler concepts.
As we have seen, another undefined term in mathematics is the term set. Attempts to define a set
result in a list of synonyms (such as collection, group, or aggregate) that do not define the term.
In summary, certain terms in a mathematical theory must remain undefined. New terms may be
defined using the undefined terms or previously defined terms.
A mathematical theory cannot get off the ground with definitions only. It is necessary to know
something about the terms and/or how they are related to each other. Basic information about
the terms and their relationships is provided by axioms. An axiom is a statement that is assumed
to be true. Most axioms are statements that are easy to believe. Turning to geometry once again,
one axiom states that two distinct points determine exactly one line. This statement certainly
makes sense. The important point, however, is that this statement cannot be proved. It is simply
a statement that is assumed to be true. Although the axioms are generally chosen by intuition, the
only real requirement for a list of axioms is that they be consistent. This means that the axioms
do not lead to contradictions.
For clarity, for aesthetics, and for ease of checking for consistency, the number of undefined
terms and axioms is kept to a bare minimum. A short list of undefined terms and axioms lies
at the foundation of every branch of mathematics. In fact, most branches of mathematics share
a common foundation. This common base involves properties of sets and properties of positive
integers. However, most mathematics courses do not start at this level. A typical mathematics
course generally assumes knowledge of other aspects of mathematics. For instance, the set of
positive integers can be used to define the set of real numbers. However, for a course in real
analysis, it is assumed that the reader already has a working knowledge of the set of real numbers,
that is, it is taken for granted that a rigorous definition of the set of real numbers using more basic
concepts exists. At a different level, a graduate course in number theory would assume a working
knowledge of the ideas presented in Chapter 3 of this book.
A theorem is a true statement that follows from the axioms, definitions, and previously derived
results. An example from calculus is the following theorem:
If f is differentiable at c, then f is continuous at c.
This result follows from the definitions of continuity and differentiability, and from previous results
on limits. The bulk of a mathematical theory is made up of theorems. Most of this book is made up
of theorems and their corresponding proofs. Some authors refer to derived results as propositions,
but the use of the word theorem is much more common.
One other comment on terminology is worth mentioning. A common sequence of derived results
is lemma, theorem, corollary. A lemma is a derived result whose primary purpose is as an aid
in the proof of a theorem. The lemma is usually only referred to in the proof of the associated
theorem; it is not of interest in and of itself. A lemma is often used to shorten a proof or to make a
proof read more easily. If part of the proof of a theorem involves some technical details that divert
the reader's attention from the main points, then this result is pulled out and called a lemma. The
technical details in the proof of the theorem are replaced by a phrase such as "by the lemma."
A proof that requires a number of fairly long steps is sometimes split into parts, each of which
becomes a lemma. A corollary is a result that follows almost immediately from a theorem. It is a
simple consequence of the result recorded in the theorem. None of these labels (lemma, theorem,
proposition, corollary) has an exact meaning and their use may vary from author to author. The
common theme is that each represents a derived result.
Another important aspect of a mathematical theory is its examples. Examples are objects that
illustrate definitions and other concepts. Examples give the mind some specific content to ponder
when thinking about a definition or a concept. For instance, after defining a prime number, it
is helpful to note that 7 is prime and $6 = 2 \cdot 3$ is not. Abstract mathematics is brought to life
by examples. It is possible to create all kinds of definitions, but unless there are some examples
that satisfy a given definition, the definition is not very useful. Consider the following artificial
definition.
A positive integer n is called a century prime if both n and n + 100 are prime numbers
and there are no prime numbers between n and n + 100.
Before proving theorems about century primes, an example of a century prime should be found.
If there are no century primes, there is no need to study the concept. For the calculus theorem
stated earlier in this chapter introduction, an example of a function f and a point c such that f is
continuous at c but f is not differentiable at c would be interesting and enlightening. The study
of and search for examples can lead to conjectures about possible theorems and/or indicate that
proposed theorems are false. With every new definition and concept, you should always generate a
number of specific examples.
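The advice to look for an example before proving theorems can be tried directly on the century prime definition. The following is a minimal sketch of a brute-force search; the function names (is_prime, is_century_prime, first_century_prime) and the search limit are illustrative assumptions, not part of the text.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    """Trial division; adequate for the modest ranges searched here."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def is_century_prime(n: int) -> bool:
    """True when n and n + 100 are prime and no prime lies strictly between them."""
    if not (is_prime(n) and is_prime(n + 100)):
        return False
    return not any(is_prime(m) for m in range(n + 1, n + 100))


def first_century_prime(limit: int):
    """Smallest century prime below `limit`, or None if the search finds none."""
    for n in range(2, limit):
        if is_century_prime(n):
            return n
    return None
```

A result of None for a small limit only says that the search range was too short; it is not evidence that century primes fail to exist, so the limit should be increased before drawing any conclusion.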
After the axioms and definitions have been recorded, how are derived results generated? The
discovery of a derived result involves hard work, intuition, and, on occasion, creative insight. The
new result must then be proved. The validity of the axioms and previous results must be used
to establish the validity of the new result. This is where logic enters the picture. The rules of
logic make it possible to move from one true statement to another. To understand a mathematical
theory, it is necessary to understand the logic that establishes the validity of derived results; this
was the purpose of Chapter 1.
In this textbook, we will primarily focus on proofs involving the integers (number theory) for
two reasons. First, it is a very good subject in which to learn to write proofs. The proofs in number
theory are typically very clean and clear; there is little in the way of abstraction to cloud one's
understanding of the essential points of an argument. Secondly, the integers have a central position
in mathematics and are used extensively in other fields such as computer science. Although the
great twentieth century mathematician G. H. Hardy boasted that he did number theory because
there was no chance that it could be construed as applied mathematics, it has in fact become
enormously useful and important in the study of computation and particularly in cryptography.
Many people also find number theory intrinsically interesting, one of the most beautiful subjects
in modern mathematics, and all the more interesting because of its roots in antiquity. Unless
otherwise specified, then, the universe of discourse is the set of integers, Z.
Direct Proofs
A proof is a sequence of statements. These statements come in two forms: givens and deductions.
The following are the most important types of givens.
Hypotheses: Usually the theorem we are trying to prove is of the form $P_1 \land \cdots \land P_n \Rightarrow Q$. The
$P_i$'s are the hypotheses of the theorem. We can assume that the hypotheses are true, because if
one of the $P_i$'s is false, then the implication is automatically true.
Known results: In addition to any stated hypotheses, it is always valid in a proof to write down a
theorem that has already been established, or an unstated hypothesis (which is usually understood
from context). In an introductory course such as this, it is sometimes difficult to decide what you
can assume and what you must prove. This should become clearer as we go.
Definitions: If a term is defined by some formula, it is always legitimate in a proof to replace the
term by the formula or the formula by the term.
We turn now to the most important ways a statement can appear as a consequence of (or
deduction from) other statements:
Tautology: If P is a statement in a proof and Q is logically equivalent to P , we can then write
down Q.
Modus Ponens: If P has occurred in a proof and $P \Rightarrow Q$ is a theorem or an earlier statement in
the proof, we can write down Q. Modus ponens is used frequently, though sometimes in a disguised
form; for example, most algebraic manipulations are examples of modus ponens.
Specialization: If we know $\forall x\, P(x)$, then we can write down $P(x_0)$ whenever $x_0$ is a particular value. Similarly, if $P(x_0)$ has appeared in a proof, it is valid to continue with $\exists x\, P(x)$.
Frequently, choosing a useful special case of a general proposition is the key step in an argument.
When you read or write a proof you should always be very clear exactly why each statement is
valid. You should always be able to identify how it follows from earlier statements.
A direct proof is a sequence of statements which are either givens or deductions from previous
statements, and whose last statement is the conclusion to be proved. The statements generally come
in three forms: premises, added assumptions, and deductions. The deductions follow from the rules
of logic, primarily the tautologies discussed in the first chapter. We introduce the notion of proof
with two-column logic proofs (a statement in the left column and its corresponding justification in
the right column) of purely symbolic statements. The point of these proofs is to focus on the logic
rather than on any particular content.
Each step requires justification. Here the steps are either premises or known tautologies. As
in this example, it is helpful to give line numbers to indicate precisely which information is being
used. This format should be followed in the exercises.
Why are we doing proofs of this type? In principle, every mathematical proof can be reduced
to steps like this. For better or worse (depending upon your perspective), this is seldom done in
practice. However, when one is trying to sort out a difficult proof it is sometimes necessary to do
a partial breakdown of the proof to see what is going on. Although proofs found in the literature
are most often in words, these words reflect the sort of steps written out above. The order, logic,
and transitions should all be apparent in the proof. There is room within the framework of written
proofs to develop a style of your own, but certain conventions must be followed and the logic must
be valid.
The next example shows how to give a direct proof of a conditional statement of the form
$P \Rightarrow Q$. Since $P \Rightarrow Q$ is automatically true in the case in which P is false, all we need to prove is
that Q is true when P is true. In a proof of this type, we can use P as an added premise. (You
may find it helpful to look again at the list of tautologies given in Theorem 1.3.)
A (B C)
added premise
contraposition (2)
modus ponens (1) (3)
modus ponens (1) (5)
disjunctive syllogism (4) (6)
contraposition (8)
modus ponens (7) (9)
conditional proof (1) (10)
Although the previous examples strip a proof to its bare essentials, they are misleading in one
important regard. When given a statement to prove, you are seldom given all of the information
that you need to write the proof. Determining what other premises are known and useful can be
difficult and may require some creative leaps. It is this aspect of mathematics that is both exciting
and frustrating.
We continue now with our list of givens that appear (or need to be determined) in proofs.
Many theorems in mathematics involve variables. For example, the familiar calculus theorem that
says "If f is differentiable at c, then f is continuous at c" involves two variables, a function f and
a point c.
Variables: The proper use of variables in an argument is critical. Their improper use results in
unclear and even incorrect arguments. Every variable in a proof has a quantifier associated with
it, so there are two types of variables: those that are universally quantified and those that are
existentially quantified. We may fail to mention explicitly how a variable is quantified when this
information is clear from the context, but every variable has an associated quantifier.
A universally quantified variable is introduced when trying to prove a statement of the form
$\forall x\,(P(x) \Rightarrow Q(x))$. The language typically employed is "Suppose x satisfies P(x)," "Assume P(x),"
or "Let P(x)." The variable x represents a fixed but arbitrary element chosen from some universe.
It is important to be certain to not use any special properties of x that do not apply to the entire
universe. For example, if x represents a positive real number, you cannot assume that $x^2 \geq x$ in
the proof because this statement is not true for all positive real numbers.
When we introduce an existentially quantified variable, it is usually defined in terms of other
things that have been introduced earlier in the argument. In other words, it depends on previously
mentioned quantities. Note how the integer k appears in the following familiar definition; it depends
on the integer n.
DEFINITION 2.3 An integer n is even if and only if there is an integer k such that n = 2k.
An integer n is odd if and only if there is an integer k such that n = 2k + 1.
We assume that every integer is either even or odd. Although this seems like an obvious
statement, it does require proof. We postpone the proof to a later section (see, for instance, the
Division Algorithm in Section 2.7).
EXAMPLE 2.4 If n is an even integer, then $n^2$ is an even integer.
Proof. Suppose that n is an even integer (n is a universally quantified variable which appears in
the statement we are trying to prove). By definition, there exists an integer k such that n = 2k (k
is existentially quantified, defined in terms of n, which appears previously). It follows easily that
$n^2 = 4k^2 = 2(2k^2)$. Letting $j = 2k^2$ (j is existentially quantified, defined in terms of k), we find
that j is an integer and that $n^2 = 2j$. Therefore, the integer $n^2$ is even (by definition).
The parenthetical remarks are not part of the actual proof; they are included at this stage to
help explain what is going on. We will soon be omitting such remarks. Note how both directions
of the biconditional definition have been used in the proof; one direction to obtain the integer k
given a value for n and the other to verify that $n^2$ is even. By the way, what is the contrapositive
of the statement proved in this example? (You might find this fact useful in the exercises.)
The next example is not presented in the standard "if ..., then ..." form; the reader should
write the theorem in this form before proceeding to read the proof.
EXAMPLE 2.5 The sum of two odd integers is even.
Proof. Suppose that m and n are odd integers (introducing two universally quantified variables
to stand for the quantities implicitly mentioned in the statement). By definition, there exist integers
j and k such that m = 2j + 1 and n = 2k + 1 (introducing existentially quantified variables, defined
in terms of quantities already mentioned). We then have m + n = (2j + 1) + (2k + 1) = 2(j + k + 1).
Letting i = j + k + 1 (existentially quantified), we find that i is an integer and that m + n = 2i. It
follows that m + n is even (by definition).
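The two proofs above each produce an existentially quantified witness from witnesses supplied by the definition. As a rough illustration, the sketch below mirrors those steps in code; the helper names (even_witness, odd_witness, and so on) are hypothetical and chosen only for this example.

```python
def even_witness(n: int) -> int:
    """Return the integer k with n == 2*k; raise if n is not even."""
    k, r = divmod(n, 2)
    if r != 0:
        raise ValueError(f"{n} is not even")
    return k


def odd_witness(n: int) -> int:
    """Return the integer k with n == 2*k + 1; raise if n is not odd."""
    k, r = divmod(n, 2)
    if r != 1:
        raise ValueError(f"{n} is not odd")
    return k


def square_of_even_witness(n: int) -> int:
    """Given even n, return j with n*n == 2*j, following the first proof."""
    k = even_witness(n)          # n = 2k
    j = 2 * k * k                # j is defined in terms of k
    assert n * n == 2 * j        # n^2 = 4k^2 = 2(2k^2)
    return j


def sum_of_odds_witness(m: int, n: int) -> int:
    """Given odd m and n, return i with m + n == 2*i, following the second proof."""
    j = odd_witness(m)           # m = 2j + 1
    k = odd_witness(n)           # n = 2k + 1
    i = j + k + 1                # i is defined in terms of j and k
    assert m + n == 2 * i
    return i
```

Note that each returned witness is computed from earlier witnesses, which is exactly the dependence of existentially quantified variables on previously introduced quantities described above. Running the functions on examples checks instances only; it does not replace the proofs.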
Exercises 2.1.
For problems 1–5, give a two-column logic proof. Use the style of Examples 2.1 and 2.2.
1. Prove T , given R T , S, and S R.
2. Prove Q, given T , R, and R (T Q).
3. Prove N , given R S, R, S Q, and N Q.
4. Prove R P , given P Q, and R Q.
5. Prove D C, given A (B C), D A, and B.
For problems 6–9, write proofs for the given statements, inserting parenthetic remarks to explain the
rationale behind each step (as in Examples 2.4 and 2.5).
6. The sum of two even numbers is even.
7. The sum of an even number and an odd number is odd.
8. The product of two odd numbers is odd (and thus the square of an odd number is odd).
9. The product of an even number and any other number is even.
There are several options for proofs of the statements in problems 10–11. Try to find proofs that take
advantage of results already proved in this section.
10. Prove that x is odd if and only if |x| is odd.
11. Suppose that x and y are integers and that $x^2 + y^2$ is even. Prove that x + y is even.
solve a problem. These surprising connections between different parts of mathematics enhance the
whole mathematical enterprise.)
In spite of their simplicity, the following results will be very useful.
a) If $n \neq 0$, then n|0 and n|n.
b) 1|n for any integer n.
c) If n|a, then n|ab for any integer b.
d) If n|a and a|b, then n|b.
e) If m|a and n|b, then mn|ab.
f) If n|a and n|b, then n|(ax + by) for any $x, y \in \mathbb{Z}$.
Proof. The equations $0 = n \cdot 0$ and $n = n \cdot 1$ prove part (a), while the equation $n = 1 \cdot n$ establishes
part (b). To prove part (c), suppose that n|a and let b be any integer. By definition, there exists
an integer k such that a = nk. It follows that ab = n(bk), revealing that n|ab. Turning to part (f),
suppose that n, a, and b are integers such that n|a and n|b. By definition, there exist integers i
and j such that a = ni and b = nj. If x and y are any integers, then
ax + by = (ni)x + (nj)y = n(ix + jy).
Since ix + jy is an integer, this equation shows that n|(ax + by). This proves part (f). Proofs for
parts (d) and (e) will be left as exercises.
Proof. Suppose that a and b are integers such that a|b and $b \neq 0$. Since a|b, there exists an
integer c such that b = ac. Note that $c \neq 0$ since $b \neq 0$. Since $c \neq 0$ and c is an integer, we know
that $|c| \geq 1$. It follows that $|b| = |ac| = |a|\,|c| \geq |a|$.
If a and b are positive integers such that a|b and b|a, then a = b.
DEFINITION 2.10 An integer p > 0 is called prime if it has exactly two positive divisors,
namely, 1 and p. If a > 0 has more than two positive divisors, we say it is composite.
It is important to remember that 1 is neither prime nor composite. A prime has exactly two
positive divisors, but 1 has only one (1 itself). Observe that if a > 1 is composite, then there exist
integers n and m such that a = nm, 1 < n < a, and 1 < m < a (just let n be any positive divisor
of a other than 1 or a).
There are many theorems about primes that are truly amazing and some of these are amazingly
difficult to prove. There are also many questions involving primes which, though they are easy to
state, have resisted all attempts at proof. A simple question of this type concerns so-called twin
primes, pairs of primes of the form p and p + 2. For example, 5 and 7 are twin primes as are 59
and 61. No one knows whether there are an infinite number of such pairs, though they occur as
far out as anyone has checked (by computer). There also are some arguments that make it appear
likely that the number of twin primes is infinite. But the twin primes conjecture (as well as several
other related questions) remains an unsolved mystery.
Exercises 2.2.
1. For the given integers n and a, show n|a by finding an integer m with a = nm.
a) 7|119
b) $5 \mid -65$
c) 3|51
d) $9 \mid -252$
e) 1|12
f) 6|0
Existence proofs
Many interesting and important theorems have the form $\exists x\, P(x)$, that is, that there exists an
object x satisfying some formula P . In such existence proofs, try to be as specific as possible.
The most satisfying and useful existence proofs often give a concrete example or describe explicitly
how to produce the object x.
To prove the statement, "there is a prime number p such that p + 2 and p + 6 are also prime
numbers," note that p = 5 works because 5 + 2 = 7 and 5 + 6 = 11 are also primes.
Suppose that U is the collection of all differentiable functions defined on R. To prove the
statement, "there is a function f such that $f' = 2f$," note that $f(x) = e^{2x}$ works (as does any
constant multiple of $e^{2x}$).
In the first example, 5 is not the only number that works (for example, 11 works as well). In
fact, it is a famous unsolved problem whether there are infinitely many primes that work. Proving
that there are infinitely many primes with this property would be a more interesting result (and
would give the author of the proof some notoriety), but the point remains: when doing an existence
proof, be as concrete as possible. In each of the above examples, an explicit object satisfying the
desired properties is produced.
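The remark that 5 is not the only prime that works can be explored with a short search. This is a sketch only; the names is_prime and prime_triples and the bound 100 are assumptions made for the illustration.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    """Trial division primality test for small n."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))


def prime_triples(limit: int) -> list:
    """All primes p < limit for which p + 2 and p + 6 are also prime."""
    return [p for p in range(2, limit)
            if is_prime(p) and is_prime(p + 2) and is_prime(p + 6)]


print(prime_triples(100))  # prints [5, 11, 17, 41]
```

Any single entry of the list settles the existence statement. A longer list suggests, but does not prove, that there are infinitely many such primes.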
A slight variation on the existence proof is the counterexample. Suppose you look at a sentence
of the form $\forall x\, P(x)$ and you come to the conclusion that it is false. To demonstrate this, you
need to prove $\neg \forall x\, P(x)$, which by one of De Morgan's Laws is equivalent to $\exists x\, \neg P(x)$. A specific x
satisfying $\neg P(x)$ is called a counterexample to the assertion $\forall x\, P(x)$.
To disprove the sentence "for every integer n, the integer $5n^2 + 1$ is not a perfect square," we need
to find an integer n such that $5n^2 + 1$ is a perfect square. Note that 4 provides a counterexample
to the sentence, that is, 4 is an integer but $5(4)^2 + 1 = 81$ is a perfect square.
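A counterexample of this kind can often be found by a mechanical search. The sketch below looks through the positive integers for the first n at which the claim fails; the helper names (is_perfect_square, first_counterexample, claim) are hypothetical.

```python
from math import isqrt


def is_perfect_square(m: int) -> bool:
    """True when m is the square of a nonnegative integer."""
    return m >= 0 and isqrt(m) ** 2 == m


def first_counterexample(statement, candidates):
    """Return the first candidate for which `statement` is false, or None."""
    for n in candidates:
        if not statement(n):
            return n
    return None


def claim(n: int) -> bool:
    """The sentence being tested: 5n^2 + 1 is not a perfect square."""
    return not is_perfect_square(5 * n * n + 1)


print(first_counterexample(claim, range(1, 100)))  # prints 4
```

A return value of None would not show that the sentence is true; it would only mean that no counterexample lies in the range searched.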
Suppose U is the collection of all continuous functions defined on R. To disprove the sentence
"for every function f, if f is continuous at 0 then it is differentiable at 0," note that f(x) = |x|
is a counterexample.
Once again, the most satisfying way to prove something false is to come up with a specific
counterexample. Note well that it is never sufficient simply to find an error in the proof of some
sentence to conclude that it is false; it is easy to come up with erroneous proofs of correct facts. If
you have trouble proving a statement of the form $\forall x\, P(x)$, try looking at some particular cases of
the result. You may find a counterexample, or you may get a hint about why the statement really
is true.
There are occasions when it is impossible, or very difficult, to find a specific example. An
existence proof sometimes can be constructed by indirect means, or by using other existence results.
EXAMPLE 2.11 To show there is a real number x such that $x^7 + 3x - 2 = 0$, let f be the
function defined by $f(x) = x^7 + 3x - 2$. Then f is a continuous function (since it is a polynomial)
such that $f(0) = -2$ and $f(1) = 2$. By the Intermediate Value Theorem, there is a number c in the
interval (0, 1) for which f (c) = 0.
In this example, notice that a formula or method to actually determine the point c is not given.
There are various ways to approximate the point c, but the actual existence of this point depends
on an axiom concerning the set of real numbers. However, even though we do not have a method
for determining the exact value of c, we are guaranteed that such a point exists.
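One of the ways to approximate the point c is bisection, which repeatedly halves an interval on which f changes sign. The following is a minimal sketch; the function name bisect and the tolerance are illustrative choices, and the method yields only an approximation whose existence is still guaranteed by the Intermediate Value Theorem rather than by the computation.

```python
def f(x: float) -> float:
    """The polynomial from the example: x^7 + 3x - 2."""
    return x**7 + 3 * x - 2


def bisect(func, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Approximate a zero of func in [lo, hi], assuming a sign change on the interval."""
    if func(lo) * func(hi) > 0:
        raise ValueError("func must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Keep the half-interval on which the sign change persists.
        if func(lo) * func(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


c = bisect(f, 0.0, 1.0)
print(c, f(c))
```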
All calculus books include a proof of the Mean Value Theorem. However, if you trace the proof,
you will find that the Mean Value Theorem (which is an existence result), is proved by referring
to Rolle's Theorem (another existence result), which is proved by referring to the Extreme Value
Theorem (yet a third existence result, sometimes called the Maximum Value Theorem), which is
proved indirectly, (if it is proved at all; you may need to consult a text in real analysis) without
ever exhibiting the object that is claimed to exist. At no point are we given a formula for the
quantity we seek, and the result is perhaps not as satisfying as we would like. In general, then, try
to be specific when doing an existence proof, but if you cannot, it may still be possible to show
that an example exists using some other existence result or another technique of proof.
Trying to prove a statement of the form $\forall x\, \exists y\, P(x, y)$ is rather like trying to do many existence
arguments at the same time. For any given value of x, we would like to construct or describe a
value for y that makes P (x, y) true.
Suppose we want to prove that there is no largest integer. This statement can be expressed as
$\forall n\, \exists m\, (n < m)$. To prove this well-known fact, suppose $n \in \mathbb{Z}$ is given and let m = n + 1. Then
m is an integer and n < n + 1 = m.
We claim that there are arbitrarily long gaps in the sequence of prime numbers. In other words,
we are asserting that for every positive integer n there is a positive integer m such that m + 1,
m + 2, . . ., m + n are all composite. Given a positive integer n, let
$$m = (n + 1)! + 1 = (n + 1)\,n\,(n - 1) \cdots 3 \cdot 2 \cdot 1 + 1.$$
We note in passing that $m \geq 3$. If $1 \leq k \leq n$, then m + k = (n + 1)! + (k + 1). Since both
(n + 1)! and k + 1 are divisible by k + 1, it follows that m + k is divisible by k + 1. Since
1 < k + 1 < k + m, the number m + k is composite. We have thus constructed n consecutive
composite numbers. The size of the numbers that appear here is rather difficult to imagine. For
instance, this result shows that there exist 10 billion consecutive composite numbers. However,
the universe itself is not a large enough canvas on which to actually write out all of the digits
in the number 10 billion factorial.
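For small n the construction can be checked directly. The sketch below verifies, for a given n, that each of m + 1, ..., m + n has the proper divisor k + 1 named in the argument; the function names gap_start and check_gap are assumptions for this illustration.

```python
from math import factorial


def gap_start(n: int) -> int:
    """The integer m = (n + 1)! + 1 from the construction."""
    return factorial(n + 1) + 1


def check_gap(n: int) -> bool:
    """True when k + 1 is a proper divisor of m + k for every k = 1, ..., n."""
    m = gap_start(n)
    return all((m + k) % (k + 1) == 0 and 1 < k + 1 < m + k
               for k in range(1, n + 1))


# Example: n = 3 gives m = 25, and 26, 27, 28 are divisible by 2, 3, 4.
print(gap_start(3), check_gap(3))  # prints 25 True
```

The check confirms the argument for particular values of n only; the proof above is what covers every positive integer n.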
Exercises 2.3.
As you work through these exercises, don't simply find one example and move on to the next problem; try
to find other examples, look for patterns, and make note of your thought process.
1. Show that there is a prime number p such that p + 4 and p + 6 are also prime numbers.
2. Show that there is a two-digit prime number p such that p + 8 is a prime number and there are no
prime numbers between p and p + 8.
3. Show that there are prime numbers p and q such that p + q = 128. (This is a case of the famous
Goldbach Conjecture, which says that every even integer $n \geq 6$ can be written as the sum of two
odd primes. It seems highly probable from work with computers that the Goldbach Conjecture is true,
but no one has discovered a proof.)
4. Show that there is a nonzero differentiable function f such that $x f'(x) = 5 f(x)$.
5. Show that there is a positive real number x such that x = 2 sin x.
6. Show that every odd integer is the sum of two consecutive integers.
7. Show that every odd integer is the difference between two consecutive perfect squares.
8. Show that for each positive integer n > 1 there exists a positive integer q such that $n^2 < 4q < (n + 1)^2$.
9. Find counterexamples for each of the following statements; use N as the universe of discourse.
a) If $12 \mid n^2$, then $12 \mid n$.
b) If n|ab, then n|a or n|b.
c) If $n^2 \mid m^3$, then $n \mid m$.
d) $n^2 - n + 11$ is a prime number for every n. (Find the smallest counterexample.)
e) $7n^2 + 4$ is not a perfect square for every n.
10. Suppose U is the collection of all continuous functions defined on R. Disprove the following sentence:
"for every $f \in U$, either f is differentiable at 4 or f is differentiable at 7."
11. Find a positive integer n > 25 such that $(n + 1)^2 - n^2$ is a perfect square.
12. Find distinct positive integers a, b, and c such that $a^2 - ac + c^2 = b^2$.
Mathematical Induction
The set Z of integers and its properties are at the root of all mathematical disciplines. The algebraic
and order properties of the integers, whether used formally or informally, are the properties that
are most relevant when doing mathematics. However, the set of integers has another property that
is independent of its algebraic and order properties. This additional property of the integers is
quite important and is the topic of discussion for the next few sections.
Statements of the form, "for each positive integer n, something is true," occur in all branches
of mathematics. Three simple examples are
1. For each positive integer n, the number $92n^2 + 1$ is not a perfect square.
2. For each positive integer n, $1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6}$.
3. For each positive integer n, $(\cos x + i \sin x)^n = \cos(nx) + i \sin(nx)$.
To prove that statements such as these are false, it is only necessary to find one positive integer
n for which the statement is false. For instance, statement (1) is false; it is possible (but requires
some patience) to find a positive integer n for which $92n^2 + 1$ is a perfect square. However, it is
not possible to prove such statements are true by showing that they are true for several values of
n (or even a whole lot of values of n); the formulas or statements must somehow be verified for
every positive integer n. Since it is not possible to actually prove individually an infinite number
of statements, some other method of proof is needed. The Principle of Mathematical Induction is
a useful tool for proving some statements of this type. This important property, which we accept
as an axiom, is stated below.
Principle of Mathematical Induction: If S is a set of positive integers that contains 1 and
satisfies the condition "if $k \in S$, then $k + 1 \in S$," then $S = \mathbb{Z}^+$.
The Principle of Mathematical Induction can be compared to a chain reaction. If we know that
each event (the quantifier is implicitly used here) will set off the next (the condition in quotes)
and if the first event occurs ($1 \in S$), then the entire chain reaction will occur. Perhaps you have
seen one of those amazing domino exhibits where thousands of dominoes fall over in interesting
patterns. The dominoes must be set up in such a way that each domino knocks over the next, and
someone must begin the process by pushing over the first domino.
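Returning to statement (1) above, a short search makes concrete why checking many values of n proves nothing: the statement holds for a long run of values before it fails. This is a sketch under the assumption that a brute-force search over a small range is acceptable; the function names are hypothetical.

```python
from math import isqrt


def is_perfect_square(m: int) -> bool:
    """True when m is the square of a nonnegative integer."""
    return m >= 0 and isqrt(m) ** 2 == m


def smallest_failure(limit: int):
    """Smallest positive n <= limit with 92n^2 + 1 a perfect square, or None."""
    for n in range(1, limit + 1):
        if is_perfect_square(92 * n * n + 1):
            return n
    return None


n = smallest_failure(200)
print(n, 92 * n * n + 1, isqrt(92 * n * n + 1))  # prints 120 1324801 1151
```

Someone who tested only the first hundred values of n would have found no exception and might wrongly have concluded that statement (1) is true.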
Given a statement of the form "for each positive integer n, something is true," let S be the
set of all positive integers n for which the statement is true. In order to prove that the statement
is true for all positive integers, we must show that $S = \mathbb{Z}^+$. By the Principle of Mathematical
Induction, it is sufficient to prove that S contains 1 and satisfies the condition "if $k \in S$, then
$k + 1 \in S$." In almost every situation of this type, it is easy to prove that $1 \in S$. However, a proof
that $k + 1 \in S$ under the assumption that $k \in S$ requires more effort. (Make careful note of what
is assumed and what is to be proved. We assume that the statement is true for some fixed value
k and then try to use this fact to prove that it is true for the next value k + 1.) Two examples
of such proofs are given below. We provide two proofs of the first result; one in the formal style
indicated by the statement of the Principle of Mathematical Induction and a second in the more
common informal style. You may use whichever style of proof you prefer, but it may be a good
idea to use the longer form until you become proficient with this type of proof.
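The formula
$$1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6}$$
is valid for each positive integer n.
Proof. Let S be the set of all positive integers n for which
$$1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6}.$$
When n = 1, the formula reduces to $1^2 = (1 \cdot 2 \cdot 3)/6$. Since this statement is true, it follows that
$1 \in S$. Suppose that $k \in S$ for some positive integer k. This means that
$$1^2 + 2^2 + 3^2 + \cdots + k^2 = \frac{k(k + 1)(2k + 1)}{6}.$$
We must show that $k + 1 \in S$, that is,
$$1^2 + 2^2 + 3^2 + \cdots + k^2 + (k + 1)^2 = \frac{(k + 1)(k + 2)(2(k + 1) + 1)}{6}.$$
To show that the two expressions are equal, we begin with one side of the equation and manipulate
it using algebra and known results to obtain the other side. In this case, we have
$$\begin{aligned}
1^2 + 2^2 + 3^2 + \cdots + k^2 + (k + 1)^2 &= \frac{k(k + 1)(2k + 1)}{6} + (k + 1)^2 \\
&= \frac{(k + 1)(2k^2 + k + 6k + 6)}{6} \\
&= \frac{(k + 1)(k + 2)(2k + 3)}{6}.
\end{aligned}$$
It follows that $k + 1 \in S$. By the Principle of Mathematical Induction, $S = \mathbb{Z}^+$, that is,
$$1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6}$$
for each positive integer n.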
(second version) The formula is easily verified for n = 1. Suppose that the formula is valid for
some positive integer k. Then
$$\begin{aligned}
1^2 + 2^2 + 3^2 + \cdots + k^2 + (k + 1)^2 &= \frac{k(k + 1)(2k + 1)}{6} + (k + 1)^2 \\
&= \frac{(k + 1)(2k^2 + k + 6k + 6)}{6} \\
&= \frac{(k + 1)(k + 2)(2k + 3)}{6},
\end{aligned}$$
showing that the formula is valid for k + 1 as well. The result now follows by the Principle of
Mathematical Induction.
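Before or after writing such a proof, it can be reassuring to test both the formula and the algebra of the inductive step on specific values. The sketch below does this with exact integer arithmetic; the function names are illustrative assumptions, and a finite check of this kind supports the proof without replacing it.

```python
def sum_of_squares(n: int) -> int:
    """Compute 1^2 + 2^2 + ... + n^2 by direct summation."""
    return sum(i * i for i in range(1, n + 1))


def closed_form(n: int) -> int:
    """The right-hand side n(n + 1)(2n + 1)/6, which is always an integer."""
    return n * (n + 1) * (2 * n + 1) // 6


def inductive_step_holds(k: int) -> bool:
    """Check k(k+1)(2k+1)/6 + (k+1)^2 == (k+1)(k+2)(2k+3)/6 after clearing the 6."""
    left = k * (k + 1) * (2 * k + 1) + 6 * (k + 1) ** 2
    right = (k + 1) * (k + 2) * (2 * k + 3)
    return left == right


print(all(sum_of_squares(n) == closed_form(n) for n in range(1, 200)))
print(all(inductive_step_holds(k) for k in range(1, 200)))
```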
Proof. We will use the Principle of Mathematical Induction. Since 0 is divisible by 64, the
statement is valid when n = 1. Suppose that $9^k - 8k - 1$ is divisible by 64 for some positive integer
k. By definition, there exists an integer j such that $64j = 9^k - 8k - 1$. We then have
$$\begin{aligned}
9^{k+1} - 8(k + 1) - 1 &= 9(9^k - 1) - 8k \\
&= 9(64j + 8k) - 8k \\
&= 64(9j + k).
\end{aligned}$$
Since 9j + k is an integer, we find that $9^{k+1} - 8(k + 1) - 1$ is divisible by 64. By the Principle of
Mathematical Induction, for each positive integer n, the integer $9^n - 8n - 1$ is divisible by 64.
It is not necessary that 1 be the starting point in the Principle of Mathematical Induction; any
integer a will do. If S is a set of integers that contains a and satisfies the condition "if $k \geq a$ and
$k \in S$, then $k + 1 \in S$," then $S = \{n \in \mathbb{Z} : n \geq a\}$. The fact that this statement is equivalent to
the Principle of Mathematical Induction follows by making a simple change of variables; the details
will be left to the reader. There are situations for which this slight modification to the Principle of
Mathematical Induction is helpful. An example appears below.
Proof. Using a calculator, it is easy to verify that $8 < (1.3)^8$. Suppose that $k < (1.3)^k$ for some
positive integer $k \geq 8$. Then
$$k + 1 < k + 0.3k = 1.3k < 1.3(1.3)^k = (1.3)^{k+1},$$
so the inequality is valid for k + 1 as well. By the Principle of Mathematical Induction, the inequality
$n < (1.3)^n$ is valid for all $n \geq 8$.
It will be helpful to make several comments concerning terminology. The hypothesis "if $k \in S$"
or "suppose the result is valid when n = k" is known as the induction hypothesis. The part of
the argument that uses this assumption (remember it is an assumption that something is true for
this one particular value of k) to prove that $k + 1 \in S$ or that the result holds for n = k + 1 is
called the inductive step. A proof that uses the Principle of Mathematical Induction is called a
proof by induction. In those cases in which the inductive step is easy, the proof is usually left
out. A phrase such as "the result follows by induction" means that the induction argument is easy
and is left to the reader.
The Principle of Mathematical Induction requires the validation of two hypotheses. The first
involves checking that the result is valid for some starting value of n. Even though this step is
usually very easy, it is still necessary. Suppose that someone claims that $n^2 + 7n - 3$ is an even
number for each positive integer n. If $k^2 + 7k - 3$ is even for some positive integer k, then
$$(k + 1)^2 + 7(k + 1) - 3 = (k^2 + 7k - 3) + 2(k + 4)$$
is the sum of two even numbers and thus an even number. This establishes the condition "if $k \in S$,
then $k + 1 \in S$," where S is the set of all positive integers n for which $n^2 + 7n - 3$ is even. However,
the result is false for n = 1 (and also false for every other positive integer n). It is generally a good
idea to check a formula for several values of n before trying to find a general proof. Not only does
this give you more evidence of the validity of the statement, it can sometimes give you a good idea
of the steps that are needed for a proof of the induction hypothesis.
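The false claim above can be examined numerically to see both halves of the story: the algebra of the inductive step is sound, yet the claim fails everywhere because the starting value was never checked. This is a minimal sketch with hypothetical function names.

```python
def value(n: int) -> int:
    """The expression n^2 + 7n - 3 from the false claim."""
    return n * n + 7 * n - 3


def step_identity_holds(k: int) -> bool:
    """Check value(k + 1) == value(k) + 2(k + 4), the identity used in the inductive step."""
    return value(k + 1) == value(k) + 2 * (k + 4)


# The inductive step is fine: consecutive values differ by an even number.
print(all(step_identity_holds(k) for k in range(1, 1000)))   # prints True

# The base case fails, and so does every other case.
print(value(1))                                              # prints 5
print(any(value(n) % 2 == 0 for n in range(1, 1000)))        # prints False
```

Because consecutive values differ by an even number, all values share the parity of value(1), which is odd. This is exactly why the first hypothesis of the Principle of Mathematical Induction cannot be skipped.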
As a final comment, it is important to realize that not every statement that involves positive
integers requires the Principle of Mathematical Induction in its proof. There may be better or
easier methods to prove the result; the following result provides one simple example.
n3 n2 > 2n 1.
Exercises 2.4.
1. Prove that $1 + 3 + 5 + \cdots + (2n - 1) = n^2$ for each positive integer n.
2. Prove that $1^3 + 2^3 + 3^3 + \cdots + n^3 = \frac{n^2 (n + 1)^2}{4}$ for each positive integer n.
3. Prove that
+ +
n (n + 1)
for each positive integer n.
8. Prove that the product of any four consecutive positive integers is one less than a perfect square.
9. Prove that $n + 1 < 2^{n-1}$ for all integers n > 3.
10. Let a and b be real numbers. Prove that
$$a^n - b^n = (a - b)\left(a^{n-1} + a^{n-2} b + a^{n-3} b^2 + \cdots + b^{n-1}\right)$$
for each positive integer n > 1.
11. Suppose that $a_1 = 1/3$ and let $a_{n+1} = (1 + 2a_n)/3$ for each n > 1. Find and prove a simple formula
for $a_n$.
12. Suppose that $x > -1$ and that $x \neq 0$. Prove that $(1 + x)^n > 1 + nx$ for each positive integer $n \geq 2$.
This result is known as Bernoulli's Inequality.
13. Let A be a set with n elements, where n is a positive integer. Prove that P(A) has $2^n$ elements.
14. A polygon in the plane is convex if the segment connecting any two vertices of the polygon is contained
entirely inside the polygon. (Since you are most familiar with convex polygons, you might find it helpful
to draw a polygon that is not convex.) Prove that the sum of the n angles of a convex polygon with
n vertices is $(n - 2)\pi$.
In this section, we present two well-known and useful results. Although the results extend beyond
the realm of the integers, the proofs of each of them involve mathematical induction. The results
are included here because they provide practice in reading more elaborate induction proofs and
they present some important mathematical ideas.
The first goal is to state and prove the Binomial Theorem, the familiar theorem that gives a
formula for expanding $(a + b)^n$. We begin with some notation. For each positive integer n, define
n! (read "n factorial") by $n! = n(n - 1)(n - 2) \cdots 3 \cdot 2 \cdot 1$. Even for small values of n, factorials can
be very large; for example, $70! > 10^{100}$. For sound mathematical reasons (see Exercise 12 below),
0! is defined to be 1. For a positive integer n and a nonnegative integer k such that $0 \le k \le n$,
define the binomial coefficient $\binom{n}{k}$ by
$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!} = \frac{n(n - 1) \cdots (n - k + 1)}{k!}.$$
It is easy to verify that … . To illustrate binomial coefficients, note that
$$\binom{3}{0} = \frac{3!}{0!\,3!} = 1; \qquad \binom{3}{1} = \frac{3!}{1!\,2!} = 3; \qquad \binom{3}{2} = \frac{3!}{2!\,1!} = 3; \qquad \binom{3}{3} = \frac{3!}{3!\,0!} = 1.$$
The reader should recognize these numbers as the coefficients that appear in the expansion of
$(a + b)^3$:
$$(a + b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3.$$
This is no coincidence and provides one example of a general formula for $(a + b)^n$, where n is a
positive integer. Our next goal is to derive this formula, which is known as the Binomial Theorem.
We begin with a simple lemma.
LEMMA 2.16
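Binomial Theorem
$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}$$
Proof. We first consider the special case $(1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k$. When n = 1, this formula reads
$$\binom{1}{0} x^0 + \binom{1}{1} x^1 = 1 + x,$$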
which is clearly a true statement. Suppose that the formula holds for some positive integer n.
Using the previous lemma, we find that
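$$\begin{aligned}
(1 + x)^{n+1} &= (1 + x)(1 + x)^n \\
&= (1 + x) \sum_{k=0}^{n} \binom{n}{k} x^k \\
&= \sum_{k=0}^{n} \binom{n}{k} x^k + \sum_{k=0}^{n} \binom{n}{k} x^{k+1} \\
&= \binom{n}{0} x^0 + \sum_{k=1}^{n} \binom{n}{k} x^k + \sum_{k=1}^{n} \binom{n}{k-1} x^k + x^{n+1} \\
&= \binom{n+1}{0} x^0 + \sum_{k=1}^{n} \binom{n+1}{k} x^k + \binom{n+1}{n+1} x^{n+1} \\
&= \sum_{k=0}^{n+1} \binom{n+1}{k} x^k.
\end{aligned}$$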
Hence, the formula is valid for n + 1 as well. The result now follows by the Principle of Mathematical
Induction.
We now consider the general case. If b = 0, then the only nonzero term in the sum occurs when
k = n; this yields the equation $a^n = a^n$. Assume that $b \neq 0$. Using the special case proved above,
we obtain
$$(a + b)^n = b^n \left(1 + \frac{a}{b}\right)^n = b^n \sum_{k=0}^{n} \binom{n}{k} \left(\frac{a}{b}\right)^k = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}.$$
$$\sum_{k=0}^{n} \binom{n}{k} = 2^n \qquad\text{and}\qquad \sum_{k=0}^{n} (-1)^k \binom{n}{k} = 0.$$
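The identities used in this section can be spot-checked numerically. The Python sketch below is an illustration only. It assumes that the lemma used in the induction step is Pascal's rule, $\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$ (the statement of the lemma is not legible in the source), and the helper names are hypothetical.

```python
from math import comb


def binomial_expansion(a: int, b: int, n: int) -> int:
    """Right-hand side of the Binomial Theorem: sum of C(n, k) a^k b^(n-k)."""
    return sum(comb(n, k) * a**k * b**(n - k) for k in range(n + 1))


def pascal_rule_holds(n: int) -> bool:
    """Check C(n, k-1) + C(n, k) == C(n+1, k) for 1 <= k <= n (assumed lemma)."""
    return all(comb(n, k - 1) + comb(n, k) == comb(n + 1, k)
               for k in range(1, n + 1))


# Coefficients of (a + b)^3.
row3 = [comb(3, k) for k in range(4)]

# Setting a = b = 1 and a = -1, b = 1 in the theorem.
sum_of_row = binomial_expansion(1, 1, 10)
alternating_sum = binomial_expansion(-1, 1, 10)

print(row3, sum_of_row, alternating_sum)
```

Substituting a = b = 1 and a = -1, b = 1 into the theorem is one way to see why the coefficients in a row add to $2^n$ and why their alternating sum vanishes.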
The arithmetic mean of two numbers represents the number that is halfway between the two numbers. For two
positive real numbers x and y, their geometric mean $\sqrt{xy}$ represents the length of the side of a
square whose area is the same as the area of a rectangle with sides of lengths x and y. It is easy
to verify that the geometric mean of two positive numbers is less than or equal to the arithmetic
mean of the numbers. It turns out that this result is true for every set of n nonnegative numbers,
but a proof for values of n > 2 is more difficult. As with the previous result, the proof presented
here begins with a lemma.
LEMMA 2.19 Let $n \ge 2$ be an integer. Suppose that $b_1, b_2, \ldots, b_n$ are positive real numbers
that are not all equal. If $b_1 b_2 \cdots b_n = 1$, then $b_1 + b_2 + \cdots + b_n > n$.
Proof. We will use the Principle of Mathematical Induction. For the case n = 2, we know that
$b_1 \neq b_2$ and $b_1 b_2 = 1$. It follows that
$$0 < \left(\sqrt{b_1} - \sqrt{b_2}\right)^2 = b_1 - 2\sqrt{b_1 b_2} + b_2 = b_1 - 2 + b_2 \quad\text{and thus}\quad b_1 + b_2 > 2,$$
showing that the result is true when n = 2. Now suppose the result is valid for some positive
integer $p \ge 2$. Let $b_1, b_2, \ldots, b_p, b_{p+1}$ be positive real numbers that are not all equal and satisfy
$b_1 b_2 \cdots b_p b_{p+1} = 1$. Without loss of generality, we may assume that the numbers are in increasing
order, that is, $b_1 \le b_2 \le \cdots \le b_p \le b_{p+1}$. By the assumptions on these numbers, we must have
$b_1 < 1 < b_{p+1}$. Since the conclusion of the lemma is assumed to be true when n = p, we consider
the product $(b_1 b_{p+1}) b_2 \cdots b_p = 1$, which is a product of p numbers. If all of these numbers are equal
(and thus all equal 1), then
$$b_2 + b_3 + \cdots + b_p = p - 1 \quad\text{and}\quad b_1 + b_{p+1} > 2$$
(the inequality follows from the first part of the proof) and it follows that
$$b_1 + b_2 + \cdots + b_p + b_{p+1} > p + 1.$$
If the numbers $b_1 b_{p+1}, b_2, \ldots, b_p$ are not all equal, then
$$b_1 b_{p+1} + b_2 + \cdots + b_p > p$$
by the induction hypothesis. Since the quantity $(b_{p+1} - 1)(1 - b_1)$ is positive, we find that
$$\begin{aligned}
b_1 + b_2 + \cdots + b_{p+1} &= (b_1 b_{p+1} + b_2 + \cdots + b_p) + 1 + (b_{p+1} - 1)(1 - b_1) \\
&> p + 1 + (b_{p+1} - 1)(1 - b_1) \\
&> p + 1.
\end{aligned}$$
This shows that the result holds when n = p + 1. By the Principle of Mathematical Induction, the
conditional statement given in the lemma is valid for all integers $n \ge 2$.
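THEOREM 2.20 Arithmetic Mean/Geometric Mean Inequality Let n be a positive
integer. If $a_1, a_2, \ldots, a_n$ are nonnegative real numbers, then
$$\sqrt[n]{a_1 a_2 \cdots a_n} \le \frac{a_1 + a_2 + \cdots + a_n}{n}.$$
Equality occurs if and only if $a_1 = a_2 = \cdots = a_n$.
Proof. If the numbers are all equal, then equality holds, and if some $a_i$ is 0 while the numbers are not all 0, then the inequality is strict. Suppose then that the numbers are positive and not all equal, and let $r = \sqrt[n]{a_1 a_2 \cdots a_n}$. Since
$$\frac{a_1}{r} \cdot \frac{a_2}{r} \cdots \frac{a_n}{r} = 1,$$
the previous lemma yields
$$\frac{a_1}{r} + \frac{a_2}{r} + \cdots + \frac{a_n}{r} > n,$$
which is equivalent to
$$\frac{a_1 + a_2 + \cdots + a_n}{n} > r = \sqrt[n]{a_1 a_2 \cdots a_n}.$$
(Note the reduction of the general case to a special case.) This completes the proof.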
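The inequality can be checked exactly on rational data without computing any roots, because for nonnegative numbers the statement $\sqrt[n]{a_1 \cdots a_n} \le (a_1 + \cdots + a_n)/n$ is equivalent to $a_1 \cdots a_n \le \left((a_1 + \cdots + a_n)/n\right)^n$. The following Python sketch uses that reformulation; the function names are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction
from math import prod


def am_gm_holds(values: list[Fraction]) -> bool:
    """Check product <= (arithmetic mean)^n, which is AM/GM without roots."""
    n = len(values)
    arithmetic_mean = sum(values, Fraction(0)) / n
    return arithmetic_mean**n >= prod(values)


def is_equality(values: list[Fraction]) -> bool:
    """True exactly when the two sides of the inequality agree."""
    n = len(values)
    arithmetic_mean = sum(values, Fraction(0)) / n
    return arithmetic_mean**n == prod(values)


sample = [Fraction(1), Fraction(2), Fraction(4)]
print(am_gm_holds(sample), is_equality(sample))
print(is_equality([Fraction(3)] * 4))
```

Working with `Fraction` avoids floating-point rounding, so the equality case can be tested literally.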
To see one application of this inequality, consider the following fairly traditional optimization
problem from calculus: find the minimum surface area of an open top rectangular box having a
square base and a fixed volume of 4000 cubic feet. To solve this problem, let x be the length
and width (in feet) of the base of the box and let h be the height (in feet) of the box. Then the
volume V and surface area S of the box are given by $V = x^2 h$ and $S = x^2 + 4xh$. The Arithmetic
Mean/Geometric Mean Inequality, along with some simple algebra, yields
$$\begin{aligned}
S &= x^2 + 4xh \\
&= x^2 + 2xh + 2xh \\
&\ge 3 \sqrt[3]{x^2 \cdot 2xh \cdot 2xh} \\
&= 3 \sqrt[3]{4V^2} = 1200.
\end{aligned}$$
Hence, the surface area of the box is always at least as large as 1200 square feet. We know
that this minimum is attainable when equality occurs in the AM/GM inequality, that is, when
$x^2 = 2xh = 2xh$. Since the sum of these equal numbers must be 1200, we find that each of them
must be 400. It follows that x = 20 and h = 10. Note that the minimum surface area occurs when
the area of the base of the box is the same as the area of two opposite sides of the box. The crucial
step in this particular solution to the problem is writing the expression for S in such a way that the
product of all the terms gives an expression for V. In practice, it may take some trial and error to
find the right combination. By the way, make sure you see why writing $S = x^2 + xh + 3xh$, which
does provide a lower bound for the surface area, is not useful for finding the minimum value of S.
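The bound of 1200 square feet can be sanity-checked by brute force. The Python sketch below eliminates h using $h = V/x^2$ and evaluates S on a grid of candidate base lengths; the grid, the constant `VOLUME`, and the function name are assumptions made for this illustration.

```python
from fractions import Fraction

VOLUME = 4000  # cubic feet, as in the problem above


def surface_area(x: Fraction) -> Fraction:
    """Surface area S = x^2 + 4xh of the open-top box, with h = V / x^2."""
    h = Fraction(VOLUME) / (x * x)
    return x * x + 4 * x * h


# Candidate base lengths 0.1, 0.2, ..., 100.0 feet.
candidates = [Fraction(n, 10) for n in range(1, 1001)]
best_x = min(candidates, key=surface_area)
best_h = Fraction(VOLUME) / (best_x * best_x)

print(best_x, best_h, surface_area(best_x))
```

A grid search only suggests the minimum; the AM/GM argument is what proves that no other dimensions can do better.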
Exercises 2.5.
1. Use the Binomial Theorem to expand each of the following.
a) (1 + x)6
b) (a + b)5
c) (2x + y)7
6. Let A be a set with n elements, where n is a positive integer. Use the Binomial Theorem to prove
that $\mathcal{P}(A)$ has $2^n$ elements.
7. Use the Binomial Theorem to give an alternate proof of Theorem 2.13.
8. Let n be a positive integer and let $a_1, a_2, \ldots, a_n$ be nonnegative real numbers. Prove that the arithmetic
mean and the geometric mean of this set of numbers lie in the closed interval [m, M], where m and
M are defined by $m = \min\{a_1, a_2, \ldots, a_n\}$ and $M = \max\{a_1, a_2, \ldots, a_n\}$.
9. Let x and y be positive numbers. For each of the following conditions on x and y, find the maximum
value for the product xy and the values of x and y that generate this product.
a) $4x + 9y = 36$
b) $4x^2 + 9y^2 = 36$
c) $4x^2 + 9y = 36$
10. Find the minimum value for 4x + 9y, subject to the conditions x > 0, y > 0, and $x^2 y^3 = 100$.
11. Let n be a positive integer and let $a_1, a_2, \ldots, a_n$ be positive numbers. The harmonic mean of these
numbers is the reciprocal of the arithmetic mean of the reciprocals of the numbers. Prove that the
harmonic mean of a set of positive numbers is less than or equal to the geometric mean. When does
equality occur?
12. The following set of results provides a different way to think of factorials.
a) Use mathematical induction and L'Hôpital's Rule to prove that $\lim_{x \to \infty} x^n e^{-x} = 0$ for all $n \in \mathbb{Z}^+$.
b) Use part (a) and mathematical induction to prove that $\int_0^{\infty} x^n e^{-x}\,dx = n!$ for all $n \in \mathbb{Z}^+$.
c) Use the result from part (b) to explain why 0! is defined to be 1.
Strong Induction
The Principle of Mathematical Induction is equivalent to the following statement; a proof of this
fact will be given in the next section.
Principle of Strong Induction: If S is a set of positive integers that contains 1 and satisfies the
condition "if $1, 2, \ldots, k \in S$, then $k + 1 \in S$," then $S = \mathbb{Z}^+$.
This stronger form of induction (the statement is assumed to be true for all of the positive integers
up to k, not just for k) is needed in some cases. (Strong induction is sometimes referred to as
complete induction.) An example of a proof that uses this stronger form of induction follows.
EXAMPLE 2.21 Suppose that $a_1 = 1$, $a_2 = -1/2$, and $a_{n+1} = (a_n + a_{n-1})/2$ for each positive
integer n > 1. Then $a_n = (-1/2)^{n-1}$ for each positive integer n.
Proof. We will use the Principle of Strong Induction. Let S be the set of all positive integers n
such that $a_n = (-1/2)^{n-1}$. It is easy to see both 1 and 2 belong to S. (We need to check both of
these cases since these numbers do not fit the general pattern for the generation of terms.) Suppose
that all the integers 1, 2, . . . , k belong to S for some positive integer $k \ge 2$. To prove that $k + 1 \in S$,
we must show that $a_{k+1} = (-1/2)^k$. Using the assumption that all of the $a_i$ terms for i from 1 to
k satisfy the pattern (all we really need to know is that the pattern is valid for the terms k and
$k - 1$, but the crucial point is that we need more than just k),
$$a_{k+1} = \frac{a_k + a_{k-1}}{2} = \frac{\left(-\frac{1}{2}\right)^{k-1} + \left(-\frac{1}{2}\right)^{k-2}}{2} = \left(-\frac{1}{2}\right)^k \cdot \frac{-2 + 4}{2} = \left(-\frac{1}{2}\right)^k.$$
This shows that $k + 1 \in S$. By the Principle of Strong Induction, it follows that $S = \mathbb{Z}^+$. Therefore,
the formula $a_n = (-1/2)^{n-1}$ is valid for all positive integers n.
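Before attempting a proof like this one, it is worth generating a few terms. The Python sketch below builds the sequence from its recurrence, taking $a_1 = 1$ and $a_2 = -1/2$ (the sign that the closed form requires), and compares each term with $(-1/2)^{n-1}$. The function names are hypothetical.

```python
from fractions import Fraction


def sequence_terms(count: int) -> list[Fraction]:
    """First `count` terms of a_1 = 1, a_2 = -1/2, a_{n+1} = (a_n + a_{n-1}) / 2."""
    terms = [Fraction(1), Fraction(-1, 2)]
    while len(terms) < count:
        # Each new term needs the two previous terms, not just the last one.
        terms.append((terms[-1] + terms[-2]) / 2)
    return terms[:count]


def closed_form(n: int) -> Fraction:
    """Conjectured formula a_n = (-1/2)^(n-1)."""
    return Fraction(-1, 2) ** (n - 1)


first_terms = sequence_terms(6)
print(first_terms)
```

The loop mirrors the structure of the proof: computing the next term uses two earlier terms, which is why knowing the formula for k alone is not enough.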
What is the main difference between the Principle of Mathematical Induction and the Principle
of Strong Induction? The condition "if $k \in S$, then $k + 1 \in S$" is replaced by the condition "if
$1, 2, \ldots, k \in S$, then $k + 1 \in S$"; the hypothesis of the condition is stronger (more results are
assumed to be true) for the Principle of Strong Induction. The assumption that all of the integers
1, 2, . . . , k belong to S gives more information to use in the proof that $k + 1 \in S$. Here is another
way to compare the two forms of induction. Suppose we want to prove that the formula Q(n)
is true for every positive integer n. For the Principle of Mathematical Induction, the key step is
proving the conditional
$$Q(k) \Rightarrow Q(k + 1),$$
whereas for the Principle of Strong Induction, the key step is proving the conditional
$$Q(1) \wedge Q(2) \wedge \cdots \wedge Q(k) \Rightarrow Q(k + 1).$$
In some cases (as in the above example), the stronger hypothesis is needed to prove that Q(k + 1)
is true. In the preceding proof, all we really needed was for the formula to be valid for both k and
$k - 1$ to prove that the formula was valid for k + 1; the other hypotheses were simply ignored. The
important point is that knowing the formula is valid only for k is not sufficient to prove that the
formula is valid for k + 1; thus we need the stronger form of induction.
By the way, the sequence $\{a_n\}$ defined in the statement of Example 2.21 is known as a recursively defined sequence; the sequence is generated by the first few terms and a rule to determine
successive terms from previous ones. The fact that $a_n$ is defined for every positive integer n is a
simple consequence of the Principle of Strong Induction. In this case, the numbers $a_1$ and $a_2$ are
defined. Assuming that the numbers $a_1, a_2, \ldots, a_k$ are defined, the formula shows how to define
$a_{k+1}$. It follows that $a_n$ is defined for every positive integer n. The Principle of Strong Induction is
generally not mentioned in such cases; instead, a comment such as "continue this process" is made.
As long as the first couple of terms are defined and it is evident how to determine the next term
from previous terms, the Principle of Strong Induction guarantees that there is a term defined for
each positive integer.
As a second illustration of the Principle of Strong Induction, we prove the existence portion
of the Fundamental Theorem of Arithmetic. This is an important result in number theory, and it
will be used many times in this book, often without special recognition.
THEOREM 2.22 Fundamental Theorem of Arithmetic
Every positive integer n > 1 is
either a prime number or can be factored into a product of prime numbers.
Proof. It is clear that 2 is a prime number. Suppose that for some positive integer k, each of the
integers 2, 3, . . . , k is either a prime number or can be factored into a product of prime numbers.
Consider the integer k + 1. If k + 1 is a prime number, there is nothing further to prove. If k + 1 is
not a prime number, then k + 1 = ab, where a and b are integers between 2 and k, inclusively. By
the induction hypothesis, each of the integers a and b is either a prime number or can be factored
into a product of prime numbers. It follows that k + 1 = ab can be factored into a product of prime
numbers. By the Principle of Strong Induction, every integer $n \ge 2$ is either a prime number or
can be factored into a product of prime numbers.
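The structure of this proof translates directly into a recursive procedure: if n has no divisor between 2 and n - 1, it is prime; otherwise n = ab with both factors smaller than n, and each factor is handled the same way. The Python sketch below follows that outline. The function name is hypothetical, and the trial-division search for a divisor is one simple choice among many.

```python
from math import isqrt


def prime_factors(n: int) -> list[int]:
    """Return a list of primes whose product is n, for an integer n > 1."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    for a in range(2, isqrt(n) + 1):
        if n % a == 0:
            # n = a * b with 2 <= a, b < n, so the "induction hypothesis"
            # (the recursive call) applies to both factors.
            return prime_factors(a) + prime_factors(n // a)
    # No divisor found: n is prime, and there is nothing further to do.
    return [n]


print(prime_factors(360))
print(prime_factors(561))
```

The recursion terminates because each call is made on a strictly smaller integer, which is exactly the role played by the strong induction hypothesis.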
The rest of the Fundamental Theorem of Arithmetic states that the factorization of positive
integers into products of primes is unique except for the order in which the factors are written.
The proof of the uniqueness part of this theorem is not difficult, but it does involve some simple
facts about prime numbers that we have not yet discussed. The proof will therefore be postponed
until the next chapter (see Theorem 3.30 in Section 3.6).
The first part of the Fundamental Theorem of Arithmetic is often written as
Every integer n > 1 can be factored into a product of primes.
This statement is shorter and more concise than the one stated above and the proof requires fewer
words. However, the reader must make a mental adjustment by considering a single prime number,
such as 2 or 13, as a product. A number by itself is not normally considered to be a product; a
product requires two or more numbers. Writing $2 = 2 \cdot 1$ does not solve the problem here since
1 is not a prime number. In this instance, the single number 2 must be thought of as a product.
Simplifications and generalizations such as this occur frequently in mathematics; it is therefore
necessary to learn how to make the appropriate mental adjustments.
Proof. Suppose that n > 1 is an integer. If n is a prime, then n is certainly divisible by a prime,
namely itself. If n is not a prime, then it can be written as a product of two or more (not necessarily
distinct) primes. Any one of the primes in this product divides n.
There is a common situation in which the Principle of Mathematical Induction occurs in disguised form (typically in the regular form, not the strong form) or is only mentioned as an aside.
One such example from calculus is the following. After a proof of the familiar fact $(f + g)' = f' + g'$,
an example such as
$$\frac{d}{dx}\left(x^3 + 2x^2 + 3x + 2\right) = \frac{d}{dx}\,x^3 + \frac{d}{dx}\,2x^2 + \frac{d}{dx}\,3x + \frac{d}{dx}\,2 = 3x^2 + 4x + 3$$
is given. What is the problem? The theorem is stated for the sum of two functions and has been
applied to a sum of four functions. Some calculus texts make no mention of this; others say that
it is possible to extend the result to n functions using induction. However, the proof is seldom
given because it is so boring. Since it is important to see such proofs at least once, we will prove
this property of derivatives. (It is assumed that the reader has some familiarity with limits and derivatives.)
THEOREM 2.24 For each positive integer n, the derivative of the sum of n differentiable
functions is the sum of the derivatives of the functions.
Proof. There is nothing to prove if n = 1, so we first establish the result for the sum of two
differentiable functions. Let $f_1$ and $f_2$ be differentiable functions and use the definition of the
derivative to compute
$$\begin{aligned}
(f_1 + f_2)'(x) &= \lim_{h \to 0} \frac{(f_1 + f_2)(x + h) - (f_1 + f_2)(x)}{h} \\
&= \lim_{h \to 0} \left( \frac{f_1(x + h) - f_1(x)}{h} + \frac{f_2(x + h) - f_2(x)}{h} \right) \\
&= f_1'(x) + f_2'(x).
\end{aligned}$$
Hence, the derivative of $f_1 + f_2$ is $f_1' + f_2'$. Now suppose that for some $k \ge 2$, the derivative of the
sum of k differentiable functions is the sum of the derivatives of the functions and let $f_1, \ldots, f_{k+1}$
be differentiable functions. Using the induction hypothesis and the fact that the result has already
been proved for two functions, we obtain
$$\begin{aligned}
\left(f_1 + \cdots + f_k + f_{k+1}\right)' &= \left((f_1 + \cdots + f_k) + f_{k+1}\right)' \\
&= (f_1 + \cdots + f_k)' + f_{k+1}' \\
&= (f_1' + \cdots + f_k') + f_{k+1}' \\
&= f_1' + \cdots + f_k' + f_{k+1}',
\end{aligned}$$
the desired result. By the Principle of Mathematical Induction, for each positive integer n, the
derivative of the sum of n differentiable functions is the sum of the derivatives of the functions.
Note that the first step in the preceding proof requires the definition, but that the inductive
step uses only previous results and assumptions. Since the latter part of the proof is rather routine
and requires more words than thinking, it is often left out. However, it is important to know what
goes on in such situations. As you read through the main body of this textbook, look for situations
such as this where a result for two is extended to a result for more than two.
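Programmers meet the same pattern when a two-argument operation is folded over a list. The Python sketch below represents polynomials as coefficient lists (lowest degree first), defines addition for two polynomials only, and lets `functools.reduce` extend it to any finite number of terms. The representation and the names `add` and `derivative` are assumptions made for this illustration.

```python
from functools import reduce

Poly = list[int]  # coefficients, lowest degree first: [2, 3] means 2 + 3x


def add(p: Poly, q: Poly) -> Poly:
    """Sum of exactly two polynomials."""
    size = max(len(p), len(q))
    p = p + [0] * (size - len(p))
    q = q + [0] * (size - len(q))
    return [x + y for x, y in zip(p, q)]


def derivative(p: Poly) -> Poly:
    """Term-by-term derivative: c * x^k becomes k * c * x^(k-1)."""
    return [k * c for k, c in enumerate(p)][1:]


# The four terms x^3, 2x^2, 3x, and 2 from the calculus example.
terms = [[0, 0, 0, 1], [0, 0, 2], [0, 3], [2]]

derivative_of_sum = derivative(reduce(add, terms))
sum_of_derivatives = reduce(add, map(derivative, terms))

print(derivative_of_sum, sum_of_derivatives)
```

Both computations give the coefficients of $3 + 4x + 3x^2$. The call to `reduce` plays the role of the inductive step: it only ever adds two things at a time.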
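A cautionary word is appropriate at this point. Theorem 2.24 states that
$$\frac{d}{dx} \sum_{k=1}^{n} f_k(x) = \sum_{k=1}^{n} f_k'(x),$$
where n is any positive integer. It is not possible to conclude from this that
$$\frac{d}{dx} \sum_{k=1}^{\infty} f_k(x) = \sum_{k=1}^{\infty} f_k'(x).$$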
In fact, this result is not true in general. An induction argument only shows that a result is valid
for finite sums of any size; it does not say anything about infinite sums.
Exercises 2.6.
1. Let $b_1 = 1$, $b_2 = 2$, and $b_n = 3b_{n-1} - 2b_{n-2}$ for each positive integer n > 2. Find and prove a formula
for $b_n$. You should begin by finding a few more terms of the sequence.
2. Prove that any nonnegative integer n can be expressed as $n = 3q + r$, where $0 \le r < 3$.
3. Prove that any integer $n \ge 8$ can be expressed as $n = 3x + 5y$, where $x \ge 0$ and $0 \le y < 3$.
b) 561
c) 825
d) 3042
e) 1938
f ) 1955
g) 2079
h) 111111
5. This exercise refers to the Fibonacci numbers 1, 1, 2, 3, 5, 8, 13, 21, 34, . . .. These numbers are defined
by $f_1 = 1$, $f_2 = 1$, and $f_{n+1} = f_n + f_{n-1}$ for each $n \ge 2$. (You may find it helpful to first write the
formulas in (a) through (d) using summation notation.)
a) Prove that $f_1 + f_2 + \cdots + f_n = f_{n+2} - 1$ for each positive integer n.
b) Prove that $f_1^2 + f_2^2 + \cdots + f_n^2 = f_n f_{n+1}$ for each positive integer n.
c) Prove that $f_1 + f_3 + f_5 + \cdots + f_{2n-1} = f_{2n}$ for each positive integer n.
d) Prove that $f_1 f_2 + f_2 f_3 + f_3 f_4 + \cdots + f_{2n-1} f_{2n} = f_{2n}^2$
for each positive integer n.
e) Prove that $f_{n+1} f_{n-1} = f_n^2 + (-1)^n$ for each positive integer n > 1.
f) Find and prove a formula for $f_1 + f_4 + f_7 + \cdots + f_{3n-2}$.
g) Find and prove a formula for
$$\sum_{k=1}^{n} (-1)^{k+1} f_k.$$
i) Find and prove a formula (it should be a simple and interesting one) for $f_n^2 + f_{n+1}^2$.
6. Suppose that n distinct lines are drawn in the plane in such a way that no two lines are parallel and
no three lines share a common point. Into how many regions do these n lines divide the plane? Of
course, you must provide a proof of your conjecture.
7. Assume that it has been proved that det(AB) = det A det B for square matrices of a given size. Use
this fact and induction to prove that
$$\det(A_1 A_2 \cdots A_n) = \det A_1 \, \det A_2 \cdots \det A_n$$
for each positive integer n.
Well-Ordering Property
A set $A \subseteq \mathbb{Z}$ contains a least element if there exists an integer $q \in A$ such that $q \le a$ for all $a \in A$.
For example, the set of prime numbers contains a least element, namely, the integer 2. The set
of all even integers does not contain a least element. Now consider the following statement about
sets of positive integers.
Well-Ordering Property: Every nonempty set of positive integers contains a least element.
The Well-Ordering Property should make intuitive sense. Given a list of positive integers, it
is always possible to find the smallest number in the list. This is not true for sets of positive real
numbers. In particular, there is a smallest positive integer, but there is no smallest positive real
number.
Although the Well-Ordering Property and the two forms of the Principle of Mathematical
Induction are plausible, none of them can be proved from the algebraic or order properties of the
integers. In fact, these statements must be accepted as axioms. This may come as a surprise since
they seem so obvious, but it is sometimes the case that what seems obvious cannot be proved. In
such situations, it is necessary to introduce an axiom. It is an interesting fact that all three of these
statements effectively say the same thing, that is, that they are logically equivalent. Consequently,
any one of these three statements can be taken as an axiom and the other two can be derived from
it as theorems. A proof of the equivalence of these statements is given below.
1. Well-Ordering Property;
2. Principle of Mathematical Induction;
3. Principle of Strong Induction.
Proof. We prove (1) $\Rightarrow$ (2) and (3) $\Rightarrow$ (1), leaving a proof of (2) $\Rightarrow$ (3) as an exercise. It then
follows that all three statements are equivalent.
Suppose first that the Well-Ordering Property is true. Let S be a set of positive integers that
contains 1 and satisfies the condition "if $k \in S$, then $k + 1 \in S$," and let $A = \mathbb{Z}^+ \setminus S$. To prove
(2), it is sufficient to show that $A = \emptyset$. We give a proof by contradiction (see the next section for
further details on this proof technique). Suppose that $A \neq \emptyset$. Since A is a nonempty set of positive
integers, the Well-Ordering Property guarantees the existence of an integer $q \in A$ such that $q \le a$
for all $a \in A$. Since $q \in A$, we know that $q \notin S$. It follows that $q \neq 1$, so $q - 1$ is a positive integer.
Note that $q - 1 \in S$ since q is the smallest integer in A. By the properties of the set S, the integer
$q = (q - 1) + 1$ belongs to the set S. This is a contradiction to the fact that $q \notin S$. Hence, the
set A is empty. Therefore, the Principle of Mathematical Induction follows from the Well-Ordering
Property.
Now suppose that the Principle of Strong Induction is true. Let S be the set of all positive
integers n with the following property:
Any set of positive integers that contains an integer less than or equal to n has a
least element.
It is clear that $1 \in S$. Suppose that $1, 2, \ldots, k \in S$ for some positive integer k. Let A be a set
of positive integers that contains an integer less than or equal to k + 1. If A contains no integer
less than k + 1, then k + 1 is the least element in A. If A contains an integer a < k + 1, then A
is a set of positive integers that contains an integer less than or equal to a. Since $a \in S$, the set
A has a least element. This shows that every set of positive integers that contains an integer less
than or equal to k + 1 has a least element. It follows that $k + 1 \in S$. By the Principle of Strong
Induction, $S = \mathbb{Z}^+$. Therefore, every nonempty set of positive integers has a least element, that is,
the Well-Ordering Property holds.
As mentioned prior to the theorem, one of these statements is accepted as an axiom. Hence, all
three statements are valid. We have yet to mention a proof that uses the Well-Ordering Property.
Since it is equivalent to the Principle of Mathematical Induction, any proof that uses the Principle
of Mathematical Induction could also be done using the Well-Ordering Property. The only change
is the format of the proof.
THEOREM 2.26 For each positive integer n, the formula
$$1^3 + 2^3 + 3^3 + \cdots + n^3 = \left(\frac{n(n + 1)}{2}\right)^2$$
is valid.
Proof. Let B be the set of all positive integers n for which the formula is false. We need to show
that B is the empty set. Suppose (to give a proof by contradiction) that B is nonempty. By the
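Well-Ordering Property, the set B contains a least element, call it q. It is clear that $q \neq 1$ since
the formula is easily seen to be true for n = 1. Since $q - 1$ is a positive integer that is not in B,
the formula is valid for $q - 1$. That is,
$$1^3 + 2^3 + 3^3 + \cdots + (q - 1)^3 = \left(\frac{(q - 1)q}{2}\right)^2.$$
It follows that
$$\begin{aligned}
1^3 + 2^3 + 3^3 + \cdots + (q - 1)^3 + q^3 &= \left(\frac{(q - 1)q}{2}\right)^2 + q^3 \\
&= \frac{q^2}{4}\left((q - 1)^2 + 4q\right) \\
&= \left(\frac{q(q + 1)}{2}\right)^2,
\end{aligned}$$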
which indicates that q is not in B, a contradiction. We conclude that B is empty. Hence, the
formula is valid for all positive integers.
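The set B of "bad" integers in this proof has a computational counterpart: search upward from 1 and report the first integer at which a claim fails. The Python sketch below does this up to a chosen bound. It is only an illustration, since a finite search can find a least counterexample but cannot show that none exists; the function names and the bound are assumptions.

```python
from typing import Callable, Optional


def least_counterexample(holds: Callable[[int], bool], bound: int) -> Optional[int]:
    """Least n in 1..bound for which holds(n) is false, or None if there is none."""
    for n in range(1, bound + 1):
        if not holds(n):
            return n
    return None


def sum_of_cubes_formula(n: int) -> bool:
    """Does 1^3 + ... + n^3 equal (n(n + 1)/2)^2 ?"""
    return sum(j**3 for j in range(1, n + 1)) == (n * (n + 1) // 2) ** 2


print(least_counterexample(sum_of_cubes_formula, 200))
print(least_counterexample(lambda n: n < 7, 20))
```

Scanning in increasing order guarantees that whatever the search returns is the least element of the set of failures within the bound, which is the object the Well-Ordering Property provides in general.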
It is probably more natural to use the Principle of Mathematical Induction rather than the
Well-Ordering Property to prove the formula in Theorem 2.26, but the proof does indicate how
this property can be used. However, there are some situations in which it is easier to use the
Well-Ordering Property. The proof of the next result, known as the Division Algorithm, is such a
case. Its conclusion includes the rather obvious statement that when one positive integer is divided
by another the result is a quotient and a remainder that is smaller than the divisor; this simple
statement is the basis for many results in number theory. The proof also provides a good example
of an existence/uniqueness proof; a proof that establishes the existence of some object and
shows that there is only one object with the given property.
THEOREM 2.27 Division Algorithm
If a and b are integers with $b \ge 1$, then there exist
unique integers q and r such that $a = bq + r$ and $0 \le r < b$.
Proof. For the existence part, we restrict ourselves to the case in which $a \ge 0$; the case in which
a < 0 is left as an exercise. Suppose that $a \ge 0$. There are two easy situations that can be handled
directly:
i) If a < b, then $a = b \cdot 0 + a$ has the desired form.
ii) If b = 1, then $a = 1 \cdot a + 0$ has the desired form.
Suppose that $a \ge b > 1$. The set $C = \{k \in \mathbb{Z}^+ : bk \ge a\}$ is a nonempty set of positive integers. By
the Well-Ordering Property, the set C contains a least element p. This means that $b(p - 1) < a \le bp$.
We now consider two options:
… To prove uniqueness, suppose that $a = bq_1 + r_1$ and $a = bq_2 + r_2$,
where $r_1$ and $r_2$ are nonnegative integers less than b. By relabeling the integers if necessary, we
may assume that $r_1 \ge r_2$. It follows that $0 \le r_1 - r_2 < b$ and thus $0 \le b(q_2 - q_1) < b$. Since
$0 \le q_2 - q_1 < 1$ and $q_2 - q_1$ is an integer, we find that $q_1 = q_2$ and thus $r_1 = r_2$. Therefore, there
is only one representation of the desired form for a. This completes the proof.
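The existence argument is constructive enough to run. The Python sketch below follows the proof for $a \ge 0$: it handles the two easy situations, finds the least positive integer p with $bp \ge a$, and then splits into two cases. The two cases used here (a = bp and a < bp) are an assumption about the options considered in the proof, and the function name is hypothetical.

```python
def divide(a: int, b: int) -> tuple[int, int]:
    """Return (q, r) with a == b*q + r and 0 <= r < b, for a >= 0 and b >= 1."""
    if a < 0 or b < 1:
        raise ValueError("this sketch covers only a >= 0 and b >= 1")
    if a < b:           # easy situation (i): a = b*0 + a
        return 0, a
    if b == 1:          # easy situation (ii): a = 1*a + 0
        return a, 0
    # Least element p of C = {k in Z+ : b*k >= a}, so b*(p-1) < a <= b*p.
    p = 1
    while b * p < a:
        p += 1
    if b * p == a:      # assumed option 1: a is a multiple of b
        return p, 0
    # Assumed option 2: b*(p-1) < a < b*p, so the remainder is below b.
    return p - 1, a - b * (p - 1)


print(divide(81, 6))
print(divide(728, 7))
```

The linear search for p is far slower than ordinary long division; it is written this way only to stay close to the use of the Well-Ordering Property in the proof.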
You may find some of the steps in the above proof less than obvious. (Why is the set C
nonempty?) You may find some sentences require you to take out a piece of paper and do some
writing. (Why do q and r have the desired properties?) You may have to think a while and/or
ask for some help. The important point here is the emphasis on you. It is important that you
understand each and every step in a proof; do not be a passive reader.
You may object that this is an awful lot of work for such an obvious result. But this result
almost certainly seems obvious because it is so familiar: you know by experience that you can
always get a quotient and remainder. Yet pressed to explain how you know that no matter how
you do it, you always get the same remainder, you would probably find yourself at something of a
loss. As we set out to establish a body of true mathematical facts, it is important to have complete
confidence that the foundation is solid. Many a convincing proof has turned out to be wrong; the
first place to look for a mistake is always the line that says "now, it is obvious that . . ."
As we have just seen, some of the more useful and interesting existence theorems are existence and uniqueness statements: they say that there is one and only one object with a specified
property. The symbol $\exists!\,x\,P(x)$ stands for "there exists a unique x satisfying P(x)," or "there is
exactly one x such that P(x)," or any equivalent formulation. The following examples illustrate
this notation and show that the $\exists!$ quantifier can be combined with other quantifiers.
If the universe is $\mathbb{R}$, then the statement $\exists!\,x\,(x^2 + 1 = 2x)$ is true since x = 1 is not only a
solution, but the only solution. (Can you prove this?)
If the universe is $\mathbb{Z}$, then $\forall x\,\exists!\,y\,(x < y < x + 2)$ is true since only y = x + 1 satisfies the
inequalities.
The quantifier $\exists!$ can be broken down into the existence part and the uniqueness part. In
other words, $\exists!\,x\,P(x)$ says the same thing as
$$\exists x\,P(x) \;\wedge\; \forall x\,\forall y\,\big(P(x) \wedge P(y) \Rightarrow x = y\big).$$
The second part of this formula is the uniqueness part; it says that any two elements that satisfy
P (x) must, in fact, be the same. More often than not, we must prove existence and uniqueness
separately; often one of the proofs is easier than the other. (Quite frequently, the uniqueness part
is the easier of the two.)
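Over a finite universe, the two descriptions of the uniqueness quantifier can be compared directly. The Python sketch below implements "exactly one witness" by counting and, separately, as existence together with the uniqueness condition on pairs. The function names and the finite ranges standing in for the universe are assumptions of this illustration.

```python
from typing import Callable, Iterable


def exists_unique(domain: Iterable[int], predicate: Callable[[int], bool]) -> bool:
    """Exactly one element of the domain satisfies the predicate."""
    witnesses = [x for x in domain if predicate(x)]
    return len(witnesses) == 1


def exists_and_unique(domain: list[int], predicate: Callable[[int], bool]) -> bool:
    """Existence part AND uniqueness part, checked separately."""
    existence = any(predicate(x) for x in domain)
    uniqueness = all(x == y
                     for x in domain for y in domain
                     if predicate(x) and predicate(y))
    return existence and uniqueness


domain = list(range(-20, 21))
print(exists_unique(domain, lambda x: x * x + 1 == 2 * x))  # only x = 1 works
print(exists_unique(domain, lambda x: x * x == 4))          # x = 2 and x = -2
```

The second function is slower but matches the way such statements are usually proved: one argument for existence, another showing that any two witnesses coincide.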
Proof. Let f be the function defined by $f(x) = x^3 + 6x^2 + 13x - 100$. Since f is a polynomial,
it is both continuous and differentiable. Note that
$$f(0) = -100 < 0 < 20 = f(3).$$
By the Intermediate Value Theorem, there exists a point $c \in (0, 3)$ such that f(c) = 0. This
shows that the equation has at least one solution. Now suppose that both a and b are solutions
to the equation and assume that a < b. The function f is differentiable on the interval [a, b] and
f(a) = 0 = f(b). By Rolle's Theorem, there exists a point $z \in (a, b)$ such that $f'(z) = 0$. However,
$$f'(x) = 3x^2 + 12x + 13 = 3(x + 2)^2 + 1$$
is easily seen to be positive for every value of x. This is a contradiction so a and b must be equal.
It follows that the equation $x^3 + 6x^2 + 13x - 100 = 0$ has a unique solution.
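The existence half of this argument suggests a way to approximate the solution: since f changes sign on [0, 3] and is increasing, repeated halving of the interval closes in on the root. The Python sketch below applies bisection; the tolerance and the function names are choices made for this illustration.

```python
def f(x: float) -> float:
    return x**3 + 6 * x**2 + 13 * x - 100


def bisect_root(lo: float, hi: float, tol: float = 1e-12) -> float:
    """Approximate a root of f in [lo, hi], assuming f(lo) < 0 < f(hi)."""
    if not (f(lo) < 0 < f(hi)):
        raise ValueError("f must change sign from negative to positive on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid   # the sign change lies in the right half
        else:
            hi = mid   # the sign change lies in the left half
    return (lo + hi) / 2


root = bisect_root(0.0, 3.0)
print(root, f(root))
```

Bisection relies only on continuity and the sign change, just as the Intermediate Value Theorem does; the uniqueness argument via the derivative is what guarantees that this is the only root to find.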
There is a unique function f such that $f'(x) = 2x$ for all x and f(0) = 3.
Proof. The function f defined by $f(x) = x^2 + 3$ clearly works. If $f_0(x)$ and $f_1(x)$ both satisfy
these conditions, then $f_0'(x) = 2x = f_1'(x)$, so (by the Mean Value Theorem) the two functions
differ by a constant, that is, there is a constant C such that $f_0(x) = f_1(x) + C$ for all $x \in \mathbb{R}$.
Letting x = 0 yields $3 = f_0(0) = f_1(0) + C = 3 + C$, which shows that C = 0. It follows that
$f_0 = f_1$.
Exercises 2.7.
1. Finish the proof of Theorem 2.25 by proving that (2) implies (3). Begin by letting S be a set that
satisfies the hypotheses of the Principle of Strong Induction, then define a new set T by
$$T = \{n \in \mathbb{Z}^+ : 1, 2, \ldots, n \in S\}.$$
Show that T satisfies the hypotheses of the Principle of Mathematical Induction.
2. Use the Well-Ordering Property to prove $1 + 3 + 5 + \cdots + (2n - 1) = n^2$ for each positive integer n.
3. Finish the proof of Theorem 2.27 by establishing the existence portion for a < 0. You should use the
result already proved for a > 0 rather than adapting the given proof.
4. For the following values of a and b, find q and r such that $0 \le r < b$ and $a = qb + r$.
a) a = 81, b = 6
b) a = 728, b = 7
c) a = 11, b = 8
d) a = 57, b = 9
e) a = 375, b = 1
f ) a = 7, b = 11
For the next two exercises, identify the existence part and the uniqueness part of your proof clearly.
5. There is a unique solution to $2x - 3 = 7$.
6. For every x there is a unique y such that $(x + 1)^3 - x^3 = 3y + 1$.
For Exercises 7 and 8, assume the universe of discourse is the collection of differentiable functions on $\mathbb{R}$.
7. Prove that there is a unique function f such that $f'(x) = \sin x$ for all values of x and $f(\pi/2) = 0$.
8. Prove that there is a unique function f such that $f'(x) = f(x)$ for all values of x and f(0) = 1.
(To show uniqueness, let $f_0$ be the obvious solution and let $f_1$ be any other solution. What is the
derivative of $f_1/f_0$?)
9. Find a positive integer a and integers c and d so that a = 5c + 1 = 7d + 3. Is a unique?
10. Prove the following modification of the Division Algorithm: If a and b are positive integers, then there
exist integers q and r such that $a = bq + r$ and $|r| \le b/2$. Can you conclude that the integers q and r
are unique?
11. Let f be a differentiable function on $\mathbb{R}$ and suppose that $|f'(x)| < 1$ for all $x \in \mathbb{R}$. Prove that there
exists at most one real number c such that f(c) = c.
Indirect Proof
There are times when it is difficult (or impossible) to prove something directly, but easier (at least
possible) to prove it indirectly. The essence of the idea is simple: for example, suppose you are
inside your house and want to know whether it is overcast or sunny, but you can't see the sky
through your window. You usually can tell, indirectly, by the quality of light that you can see.
Without formalizing the process, you make use of something like the following: "If it is sunny I will
be able to see areas of bright light and areas of shadow in the garden; I don't, so it must be (at
least partially) overcast." What logical fact (that is, which tautology) is being used here?
There are two methods of indirect proof: proof of the contrapositive and proof by contradiction.
They are closely related, even interchangeable in some circumstances, though proof by contradiction
is more powerful. What unites them is that they both start by assuming the denial of the conclusion.
Proof. We will prove the contrapositive: if $n \ge 1$ is not prime, then $2^n - 1$ is not prime. The
case in which n = 1 is trivial so suppose that n > 1 is composite. This means that there exist
integers a and b such that 1 < a < n, 1 < b < n, and n = ab. The equation
$x^b - 1 = (x - 1)(x^{b-1} + x^{b-2} + \cdots + 1)$
is valid for all real numbers x (see Exercise 10 in Section 2.4). Applying this formula, we find that
$2^n - 1 = (2^a)^b - 1 = (2^a - 1)\left((2^a)^{b-1} + (2^a)^{b-2} + \cdots + 1\right)$.
This shows that $2^a - 1$ is a divisor of $2^n - 1$. Since $1 < 2^a - 1 < 2^n - 1$, it follows that $2^n - 1$ is
not prime.
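The factorization used in this proof can be checked numerically. The following is a minimal sketch; the function name mersenne_factor is chosen here for illustration and is not part of the text.

```python
def mersenne_factor(a: int, b: int) -> tuple[int, int]:
    """Return (divisor, quotient) with 2**(a*b) - 1 == divisor * quotient.

    Hypothetical helper: divisor is 2**a - 1 and quotient is the sum
    (2**a)**(b-1) + (2**a)**(b-2) + ... + 1 from the displayed identity.
    """
    n = a * b
    divisor = 2**a - 1
    quotient = sum((2**a) ** k for k in range(b))
    # The identity x**b - 1 = (x - 1)(x**(b-1) + ... + 1) with x = 2**a.
    assert divisor * quotient == 2**n - 1
    return divisor, quotient


# n = 15 = 3 * 5 is composite, so 2**15 - 1 = 32767 has the divisor 2**3 - 1 = 7.
print(mersenne_factor(3, 5))
```

Any composite n = ab with 1 < a, b < n can be substituted; the internal assertion confirms the identity for that choice.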
Proof by Contradiction
To prove a sentence P by contradiction, we assume $\neg P$ and derive a statement that is known
to be false. Since mathematics is consistent (at least we hope so), this means P must be true.
Several examples of such proofs can be found in the previous section.
In the case that the sentence we are trying to prove is of the form $P \Rightarrow Q$, we assume that P is
true and Q is false (because $P \wedge \neg Q$ is the negation of $P \Rightarrow Q$), and try to derive a statement known
to be false. Note that this statement need not be $\neg P$; this is the principal difference between proof
by contradiction and proof of the contrapositive. In logical symbols, a proof by contradiction of
$P \Rightarrow Q$ can often be expressed as
$((P \wedge \neg Q) \Rightarrow (R \wedge \neg R)) \Rightarrow (P \Rightarrow Q)$.
In a proof of the contrapositive, we assume that Q is false and try to prove that P is false.
Proof. Suppose that the number $\log_2 5$ is not irrational, that is, suppose that $\log_2 5$ is rational.
It follows that $\log_2 5 = a/b$, where a and b are positive integers. We then have $2^{a/b} = 5$ or $2^a = 5^b$.
However, the integer $2^a$ is even and the integer $5^b$ (a product of odd numbers) is odd. Since an
integer cannot be both even and odd, we have a contradiction. We conclude that $\log_2 5$ is an
irrational number.
When should a proof by contraposition or a proof by contradiction be attempted? There is
no foolproof method for deciding when such a proof will be helpful; this is the sort of knowledge
that comes with practice. However, when the hypothesis provides very little useful information, a
proof by contraposition or a proof by contradiction may be helpful or necessary. The hypothesis in
Example 2.31 involves a prime number. The definition of a prime number is essentially a negative
definition: a prime number does not have any positive divisors except itself and 1. By starting
with the negation of the conclusion, we can work with a composite number, a number that can be
factored. This factorization provides the key to the proof. The statement to be proved in Example
2.32 is a negative statement; we want to prove that a given number is not rational. By assuming
that the number is rational, we can express it as a ratio of two integers; our assumption that the
conclusion is false gave us some information that we could use.
These two types of proofs, which are sometimes referred to as indirect proofs, should be
kept in mind as possible options when presented with a theorem or result to prove. However, it
is important to think carefully about the logic behind your proof. Students sometimes write a
proof in the style of proof by contradiction when the logic they have used is actually proof by
contraposition. In addition, although there are plenty of exceptions, a direct proof, or at least a
proof by contraposition, is generally preferred over a proof by contradiction. A direct proof often
provides a better indication as to why a theorem or result is valid.
Proof by contradiction makes some people uneasy; it seems a little like magic, perhaps because
throughout the proof we appear to be proving false statements. A direct proof, or even a proof
of the contrapositive, may seem more satisfying. Still, there seems to be no way to avoid proof by
contradiction. (Attempts to do so have led to the strange world of constructive mathematics.)
We close this section with the following simple but wonderful indirect proof that there are an
infinite number of primes. This proof is at least as old as Euclid's book The Elements. (This result
is Proposition 20 in Book IX. Euclid's proof is very interesting as it illustrates the difficulty of the
idea of representing a generic number of things with symbols.)
Proof. Suppose that there are only a finite number of primes. It is then possible to write all
of the primes in a list: $p_1, p_2, \ldots, p_n$. Consider the integer $m = p_1 p_2 \cdots p_n + 1$. By Corollary
2.23, the integer m must be divisible by some prime in our list, say $p_j$. Since $p_j$ clearly divides
the product $p_1 p_2 \cdots p_n$, part (f) of Theorem 2.7 guarantees that $p_j$ divides $m - p_1 p_2 \cdots p_n$. Since
$m - p_1 p_2 \cdots p_n = 1$, we have reached the contradiction that $p_j > 1$ and $p_j$ divides 1. Hence, there
are an infinite number of primes.
It is important to realize that the proof of this theorem does not give us a formula or method
for constructing an infinite list of prime numbers. In particular, the integer m that appears in the
proof is not necessarily prime. Although many people have tried over the centuries, to date no one
has devised a prime-generating formula.
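The key step of the proof, that every prime factor of m lies outside the list, can be explored with a short computation. This is only a sketch: the names smallest_prime_factor and euclid_witness are hypothetical, and trial division is used purely for simplicity.

```python
from math import prod


def smallest_prime_factor(m: int) -> int:
    """Return the smallest prime factor of an integer m > 1 by trial division."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m  # no divisor up to sqrt(m), so m itself is prime


def euclid_witness(primes: list[int]) -> tuple[int, int]:
    """Return (m, q) where m = (product of primes) + 1 and q is a prime factor of m."""
    m = prod(primes) + 1
    return m, smallest_prime_factor(m)


m, q = euclid_witness([2, 3, 5])
print(m, q, q in [2, 3, 5])  # q is a prime that is not in the list
```

Calling euclid_witness on any finite list of primes produces a prime q that is missing from the list, although m itself need not be prime.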
Euclid of Alexandria. Euclid, who flourished around 300 BC, is known to most high school
students as the father of geometry. Surprisingly little is known of his life, not even his dates or
birthplace. Shortly before 300 BC, Ptolemy I founded the great university at Alexandria, the first
institution of its kind, and not unlike the universities of today. Euclid was recruited, probably from
Athens, to head the mathematics department.
Euclid appears to have been primarily a teacher, not a great originator of new material. His
Elements, unquestionably the most successful textbook of all time, often is thought to be an
encyclopedia of all geometrical knowledge at the time. In fact, it is an elementary textbook covering
geometry, arithmetic and algebra; Euclid himself knew and wrote about more advanced topics in
mathematics. The perception that the Elements is only about geometry presumably is due to two
facts: his name is most closely associated with geometry in modern elementary mathematics; and
the mathematicians of antiquity, lacking modern algebraic notation, did all arithmetic and algebra
in the language of geometry; for example, numbers were not thought of in the abstract, but as the
lengths of line segments, or measures of areas or volumes.
The Elements consists of thirteen books containing much that is still familiar to students: most
of elementary geometry, of course, including the Pythagorean Theorem; the theorem on the number
of primes and the Fundamental Theorem of Arithmetic; and the Euclidean Algorithm, which we
will see in Section 3.3.
Two famous stories are told about Euclid. It is said that Ptolemy asked him if geometry
could be learned without reading the Elements, to which Euclid replied, "There is no royal road to
geometry." (This story is also told about Menaechmus and Alexander the Great, which perhaps
diminishes its credibility somewhat.) In response to a student who questioned the use of geometry,
Euclid reportedly ordered that the student be given three pence, "since he must needs make gain
of what he learns."
For more information, see A History of Mathematics, by Carl B. Boyer, New York: John Wiley
and Sons, 1968; or An Introduction to the History of Mathematics, by Howard Eves, New York:
Holt, Rinehart and Winston, 1976.
Exercises 2.8.
1. Suppose that a and b are integers for which a + b is odd. Prove that either a or b is odd. Give an
indirect proof.
2. Suppose that a and b are real numbers for which a + b > 100. Prove that either a > 50 or b > 50.
Give both a direct proof and an indirect proof.
3. An integer n is said to be square-free if it has no divisors that are perfect squares (other than 1).
Show that any divisor of a square-free integer is square-free.
7. Prove that $\log_2 25$ and $\log_2 \sqrt[3]{5}$ are irrational numbers. Give a direct proof by using previous results.
8. Prove that $\log_2 7$ is irrational.
9. Let x be a real number. Prove that either $\log_2 5 - x$ or $\log_2 5 + x$ is irrational.
10. Let $p_1, p_2, p_3, \ldots$ be a listing of the prime numbers in increasing order. Find a value of n for which
$p_1 p_2 \cdots p_n + 1$ is not a prime number.
11. Show that for every integer n > 2 there is a prime between n and n!.
12. Prove that the sum of a rational number and an irrational number is irrational. You may use the fact
that the sum of two rational numbers is rational.
13. Suppose that the function f is differentiable and that the function g is not differentiable. Prove that
the function $f - g$ is not differentiable.
14. Fill in the details of the following proof that $\sqrt{2}$ is irrational using the Well-Ordering Property.
Suppose that $\sqrt{2}$ is rational. Then there exist positive integers a and b such that $(a/b)^2 = 2$. Let
$P = \{n \in \mathbb{N} : n(a/b) \text{ is an integer}\}$. By the Well-Ordering Property, the set P contains a least
integer p. Then $p(a/b) - p$ is a positive integer that belongs to P and is less than p. This is a
contradiction so $\sqrt{2}$ is irrational.
Number Theory
In this chapter, we begin our study of number theory in earnest. We discuss and prove a number
of well-known and useful theorems in this interesting area of mathematics. The section headings
for the chapter should give readers a good indication of the range of topics to be considered. Along
the way, we will have occasion to take a brief look at the structure of the abstract spaces $\mathbb{Z}_n$ and
$U_n$. The chapter concludes with a discussion of quadratic residues and a characterization of those
positive integers that can be represented as a sum of two squares.
We begin our study of number theory by defining the notion of congruence. As with so many
concepts we will see, congruence is simple (and perhaps familiar to you) yet enormously useful
and powerful in the study of number theory.
DEFINITION 3.1 Let n be a positive integer. The integers a and b are congruent modulo n,
denoted by $a \equiv b \pmod{n}$, if and only if $n \mid (a - b)$.
The notation for congruence, as well as much of the elementary theory of congruence, is due to
the famous German mathematician, Carl Friedrich Gauss, certainly the outstanding mathematician of his time, and perhaps the greatest mathematician of all time. A short biography of Gauss
can be found at the end of this section.
From the definition, it is easy to see that $36 \equiv 1 \pmod{7}$. In fact, any two numbers in the
set $\{\ldots, -13, -6, 1, 8, 15, \ldots\}$ are congruent modulo 7 because their differences are multiples of 7.
Note that each of these numbers leaves a remainder of 1 when divided by 7. Any two numbers in
the set $\{\ldots, -4, 4, 12, 20, \ldots\}$ are congruent modulo 8. What are the remainders of these numbers
when divided by 8? Finally, we note that $-42 \equiv 2 \pmod{11}$ and $-71 \equiv 6 \pmod{11}$.
The examples in the previous paragraph make the next result plausible. Although it is quite
simple, it is a wonderfully useful result. For the record, by "the remainder on division by n," we
mean the unique number r, guaranteed by the Division Algorithm, for which $0 \le r < n$.
THEOREM 3.2 Let n be a positive integer. Then $a \equiv b \pmod{n}$ if and only if a and b have
the same remainder when divided by n. Consequently, for each integer a, there exists a unique
integer r such that $0 \le r < n$ and $a \equiv r \pmod{n}$.
Proof. Since this is a biconditional statement, we break the proof into two parts. Suppose first
that $a \equiv b \pmod{n}$. By definition, there exists an integer x such that $a - b = xn$. By the Division
Algorithm, there exist unique integers q and r such that $b = nq + r$ and $0 \le r < n$. We then have
$a = b + xn = (nq + r) + xn = n(q + x) + r$.
It follows that the remainder is r when a is divided by n. Therefore, the integers a and b have the
same remainder when divided by n.
Now suppose that a and b have the same remainder when divided by n. By the Division
Algorithm, there exist unique integers $q_1$, $q_2$, and r such that $a = nq_1 + r$, $b = nq_2 + r$, and
$0 \le r < n$. Since
$a - b = (nq_1 + r) - (nq_2 + r) = n(q_1 - q_2)$,
we see that n divides $a - b$. It follows that $a \equiv b \pmod{n}$.
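The equivalence in Theorem 3.2 is easy to test by machine. The sketch below assumes nothing beyond the definition; the function names congruent and same_remainder are illustrative only.

```python
def congruent(a: int, b: int, n: int) -> bool:
    """Definition 3.1: a is congruent to b modulo n when n divides a - b."""
    return (a - b) % n == 0


def same_remainder(a: int, b: int, n: int) -> bool:
    """Theorem 3.2: compare the remainders r with 0 <= r < n."""
    # For n > 0, Python's % operator returns a value in range(n),
    # even when the left operand is negative.
    return a % n == b % n


n = 7
# The two tests agree on every pair in a window of positive and negative integers.
assert all(
    congruent(a, b, n) == same_remainder(a, b, n)
    for a in range(-30, 31)
    for b in range(-30, 31)
)
print(36 % 7, 8 % 7, 15 % 7)  # each remainder is 1
```

Note that remainder conventions for negative numbers differ between programming languages; the comment in same_remainder describes Python's behavior specifically.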
Whenever we use the notation $a \equiv b \pmod{n}$, it is assumed that a and b are integers and that
n is a positive integer. If the value of n is clear from the context, we often write simply $a \equiv b$.
Congruence of integers shares many properties with equality (due to the fact that equal remainders
are involved); we list a few of these properties in the next theorem.
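1. $a \equiv a$ for any a.
2. If $a \equiv b$, then $b \equiv a$.
3. If $a \equiv b$ and $b \equiv c$, then $a \equiv c$.
4. $a \equiv 0$ if and only if $n \mid a$.
5. If $a \equiv b$ and $c \equiv d$, then $a + c \equiv b + d$.
6. If $a \equiv b$ and $c \equiv d$, then $a - c \equiv b - d$.
7. If $a \equiv b$ and $c \equiv d$, then $ac \equiv bd$.
8. If $a \equiv b$ and j is any positive integer, then $a^j \equiv b^j$.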
Proof. Parts (1) through (4) follow easily from the definition of congruence and the properties
of divisibility. Alternatively, each of these four results is a simple consequence of Theorem 3.2. In
what follows, we prove parts (6) and (8); proofs for parts (5) and (7) are left as exercises.
To prove part (6), suppose that $a \equiv b$ and $c \equiv d$. From the definition of congruence, we know
that $n \mid (a - b)$ and that $n \mid (c - d)$. It follows that n divides the quantity (see Theorem 2.7)
$(a - b) - (c - d) = (a - c) - (b - d)$.
Using the definition (remember that definitions are biconditional), we see that $a - c \equiv b - d$.
Part (8) follows from part (7) and the simple form of mathematical induction discussed in
Section 2.6. However, this result also follows from the equality
$a^j - b^j = (a - b)(a^{j-1} + a^{j-2} b + \cdots + a b^{j-2} + b^{j-1})$
(see Exercise 10 in Section 2.4). Suppose that $a \equiv b$ and that j is a positive integer. By definition,
we know that $n \mid (a - b)$. It then follows from the above equation that $n \mid (a^j - b^j)$. We conclude that
$a^j \equiv b^j$.
Note that parts (1), (2), and (3) of this theorem show that congruence modulo n defines an
equivalence relation on $\mathbb{Z}$. Parts (5) through (8) can be summarized by saying that in any expression
involving $+$, $-$, $\cdot$, and positive integer exponents (that is, any polynomial), if individual terms are
replaced by other terms that are congruent to them modulo n, the resulting expression is congruent
to the original. (Do you notice anything related to induction in this comment?) To illustrate this,
consider the polynomial Q defined by $Q(x) = 3x^3 + 22x^2 - 6x + 73$. Suppose that we want to find
the remainder when Q(17) is divided by 5. Using the results from Theorem 3.3, we have (modulo 5)
$Q(17) \equiv 3 \cdot 17^3 + 22 \cdot 17^2 - 6 \cdot 17 + 73$
$\equiv 3 \cdot 2^3 + 2 \cdot 2^2 - 1 \cdot 2 + 3$
$\equiv -1 + 3 - 2 + 3 \equiv 3$.
This is certainly much easier than evaluating Q(17) and then dividing by 5.
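The two routes to the remainder can be compared directly. In the sketch below, Q is the polynomial from the text, while Q_mod is a hypothetical helper that reduces modulo n after every step; it uses Horner's rule, which is a convenience chosen here and not the method of the text.

```python
def Q(x: int) -> int:
    """The polynomial from the text: Q(x) = 3x^3 + 22x^2 - 6x + 73."""
    return 3 * x**3 + 22 * x**2 - 6 * x + 73


def Q_mod(x: int, n: int) -> int:
    """Evaluate Q(x) modulo n, keeping every intermediate value in range(n)."""
    x %= n
    coefficients = [3, 22, -6, 73]  # highest degree first
    result = 0
    for c in coefficients:
        # Horner's rule: multiply by x, add the next coefficient, reduce.
        result = (result * x + c) % n
    return result


print(Q(17) % 5, Q_mod(17, 5))  # both remainders are 3
```

The second function never handles a number larger than a few multiples of n, which is the practical benefit the text describes.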
The following simple result will play an important role later in this chapter.
Proof. Restating the theorem in the language of congruences yields the following: for each
positive integer m, either $m^2 \equiv 0 \pmod{4}$ or $m^2 \equiv 1 \pmod{4}$. To see this, suppose that m is any
positive integer and note that m is congruent modulo 4 to exactly one of 0, 1, 2, or 3. It follows
that $m^2$ is congruent to $0^2 \equiv 0$, $1^2 \equiv 1$, $2^2 \equiv 0$, or $3^2 \equiv 1$.
It is also possible to solve equations involving congruences. To illustrate this, suppose we want
to find all integers x such that $3x - 5$ is divisible by 11. Putting this statement into the language of
congruence, we are trying to solve the congruence equation $3x \equiv 5 \pmod{11}$ for x. Let's assume
$3x \equiv 5$ and see what this tells us about x. Working modulo 11, we find that
$3x \equiv 5 \Rightarrow 12x \equiv 20 \Rightarrow x \equiv 9$.
We have thus shown that if $3x \equiv 5$, then $x \equiv 9$. We also need to verify the converse, that is, we
need to show that if $x \equiv 9$, then $3x \equiv 5$:
$x \equiv 9 \Rightarrow 3x \equiv 27 \Rightarrow 3x \equiv 5$.
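(modulo 103)
$11x \equiv 1 \Rightarrow 99x \equiv 9 \Rightarrow -4x \equiv 9 \Rightarrow 104x \equiv -234 \Rightarrow x \equiv 75$;
$x \equiv 75 \Rightarrow 11x \equiv 825 \Rightarrow 11x \equiv 1$.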
Therefore, the solution set of the equation $11x \equiv 1 \pmod{103}$ is $\{103n + 75 : n \in \mathbb{Z}\}$.
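Both congruence equations discussed here can be confirmed by exhaustive search over one representative of each remainder. This is a brute-force sketch rather than the algebraic method of the text, and the function name solve_linear_congruence is hypothetical.

```python
def solve_linear_congruence(a: int, c: int, n: int) -> list[int]:
    """Return every x in {0, ..., n-1} for which a*x is congruent to c modulo n."""
    return [x for x in range(n) if (a * x - c) % n == 0]


print(solve_linear_congruence(3, 5, 11))    # [9]
print(solve_linear_congruence(11, 1, 103))  # [75]
```

Each returned value x stands for the whole set of integers of the form nk + x. The search takes n steps, so it is only sensible for small moduli.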
We present one final illustration of the power of simple congruences.
EXAMPLE 3.5 You may be familiar with the old rule, known as "casting out nines," that an
integer is divisible by 9 if and only if the sum of its digits is divisible by 9. To prove this fact,
suppose that x is a positive integer. When we write x in decimal form, it looks like $d_k d_{k-1} \ldots d_1 d_0$,
where each digit $d_i$ is an integer between 0 and 9. This means that
$x = d_k 10^k + d_{k-1} 10^{k-1} + \cdots + d_1 10 + d_0$.
Since $10 \equiv 1 \pmod{9}$, we find that $10^i \equiv 1^i \equiv 1 \pmod{9}$ for every nonnegative integer i and thus
$x \equiv d_k + d_{k-1} + \cdots + d_1 + d_0 \pmod{9}$.
Note that we have actually proved that an integer and the sum of its digits are congruent modulo 9,
that is, the remainder when a number is divided by 9 is the same as the remainder that is obtained
when the sum of its digits is divided by 9. For example, the numbers 4357 and 19 = 4 + 3 + 5 + 7
both have a remainder of 1 when divided by 9. In particular, a number is divisible by 9 if and only
if the sum of its digits is divisible by 9.
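The congruence between an integer and its digit sum can be spot-checked over a range of values. A minimal sketch, with digit_sum as an illustrative name:

```python
def digit_sum(x: int) -> int:
    """Sum of the decimal digits of a nonnegative integer x."""
    return sum(int(d) for d in str(x))


# An integer and the sum of its digits leave the same remainder on division by 9.
assert all(x % 9 == digit_sum(x) % 9 for x in range(1, 10000))

print(digit_sum(4357), 4357 % 9, digit_sum(4357) % 9)  # 19 1 1
```

A finite check of this kind is evidence, not a proof; the argument in Example 3.5 is what covers every positive integer.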
Carl Friedrich Gauss. Gauss (1777-1855) was an infant prodigy and arguably the greatest
mathematician of all time (if such rankings mean anything; certainly he would be in almost everyone's list of the top five mathematicians, as measured by talent, accomplishment, and influence).
Perhaps the most famous story about Gauss relates his triumph over busywork. As Carl Boyer
tells the story: "One day, in order to keep the class occupied, the teacher had the students add
up all the numbers from one to a hundred, with instructions that each should place his slate on
a table as soon as he had completed the task. Almost immediately Carl placed his slate on the
table, saying, 'There it is;' the teacher looked at him scornfully while the others worked diligently.
When the instructor finally looked at the results, the slate of Gauss was the only one to have the
correct answer, 5050, with no further calculation. The ten-year-old boy evidently had computed
mentally the sum of the arithmetic progression $1 + 2 + \cdots + 100$, presumably through the formula
$m(m + 1)/2$."
By the time Gauss was about 17, he had devised and justified the method of least squares, but
had not decided whether to become a mathematician or a philologist. Just short of his nineteenth
birthday, he chose mathematics, when he succeeded in constructing (under the ancient restriction
to compass and straightedge) a seventeen-sided regular polygon, the first polygon with a prime
number of sides to be constructed in over 2000 years; previously, only the equilateral triangle and
the regular pentagon had been constructed. Gauss later proved precisely which regular polygons
can be constructed. (The answer is somewhat unsatisfying, however. He proved that the regular
polygons that can be constructed have $2^m p_1 p_2 \cdots p_r$ sides, for any $m \ge 0$ and distinct Fermat
primes $p_i$, that is, prime numbers having the form $2^{2^n} + 1$ for some n. Unfortunately, it is not
known whether there are an infinite number of Fermat primes.)
Gauss published relatively little of his work, but from 1796 to 1814 kept a small diary, just
nineteen pages long and containing 146 brief statements. This diary remained unknown until
1898. It establishes in large part the breadth of his genius and his priority in many discoveries.
Quoting Boyer again: "The unpublished memoranda of Gauss hung like a sword of Damocles over
mathematics of the first half of the nineteenth century. When an important new development
was announced by others, it frequently turned out that Gauss had had the idea earlier, but had
permitted it to go unpublished."
The range of Gauss's contributions is truly stunning, including some deep and still standard
results such as the Quadratic Reciprocity Theorem and the Fundamental Theorem of Algebra. He
devoted much of his later life to astronomy and statistics, and made significant contributions in
many other fields as well. His name is attached to many mathematical objects, methods, and
theorems; students of physics may know him best as the namesake of the standard unit of magnetic
intensity, the gauss.
The information here is taken from A History of Mathematics, by Carl Boyer, New York: John
Wiley & Sons, 1968.
Exercises 3.1.
1. For the given values of n and a, find the number b that belongs to the set $\{0, 1, \ldots, n - 1\}$ for which
$a \equiv b \pmod{n}$. Do your computations without the aid of a calculator.
a) n = 7, a = 30
b) n = 9, a = 69
c) n = 2, a = 7461
d) n = 6, a = 60
e) n = 11, a = 63
f ) n = 17, a = 38
8. Solve each of the following by finding all values of x that satisfy the congruence. As in the examples
in the text, your solution requires two parts. (Try to solve these without a calculator.)
a) $7x \equiv 6 \pmod{9}$
b) $5x \equiv 7 \pmod{13}$
b) 3721
c) 11519
d) 822237
The spaces $\mathbb{Z}_n$
One interpretation of Theorem 3.3 is that doing arithmetic modulo a positive integer n can be
simplified by replacing some numbers with numbers that are equivalent modulo n. In other words,
the result of a computation doesn't depend on which numbers we compute with, only that they are
the same modulo n. For example, to compute $38 \cdot 96 \pmod{11}$, we can more easily compute the
product $5 \cdot 8 \pmod{11}$, since $38 \equiv 5$ and $96 \equiv 8$. This suggests that we can go further, devising
some universe in which there really is no difference between 38 and 5 or between 96 and 8 (assuming
that we want to work modulo 11).
Let n be a fixed positive integer. Since congruence modulo n is an equivalence relation on $\mathbb{Z}$, we
can consider the equivalence classes associated with each integer. In particular, for every integer a,
the symbol [a] denotes the set $\{b \in \mathbb{Z} : b \equiv a \pmod{n}\}$. Note that [a] represents a set that contains
an infinite number of integers. However, we treat [a] as a single entity; we are entering the realm
of the abstract. Theorem 1.17 shows that the sets $[a_1]$ and $[a_2]$ are either disjoint (when $a_1 \not\equiv a_2$)
or identical (when $a_1 \equiv a_2$). Recall that if r is the remainder on dividing n into a, then $a \equiv r$ or,
in our new language, [a] = [r]. This means that every [a] is equal to some [r] for $0 \le r < n$. In
other words, the sets [0], [1], . . ., $[n - 1]$ are the distinct equivalence classes that partition $\mathbb{Z}$. In
particular, $[r_1] \cap [r_2] = \emptyset$ when $0 \le r_1 < r_2 < n$.
For example, $\mathbb{Z}_4 = \{[0], [1], [2], [3]\}$ and $\mathbb{Z}_7 = \{[0], [1], [2], [3], [4], [5], [6]\}$. Note that [3] has a
different meaning when interpreted as an element of $\mathbb{Z}_7$ rather than an element of $\mathbb{Z}_4$; the context
should make this clear. For the record, we could write $\mathbb{Z}_4 = \{[80], [25], [102], [-13]\}$, but only to
make a point; this is not done in practice.
The set $\mathbb{Z}_n$ thus consists of n elements. Each of its elements is a set that contains an infinite
number of integers. This is a new universe in which we can investigate arithmetic. We begin
by presenting definitions for the familiar operations that are defined on integers, namely, addition,
subtraction, and multiplication.
DEFINITION 3.7 For elements [a] and [b] of $\mathbb{Z}_n$, define the operations of addition, subtraction,
and multiplication by $[a] + [b] = [a + b]$, $[a] - [b] = [a - b]$, and $[a] \cdot [b] = [ab]$, respectively.
Most mathematicians would agree that these definitions are quite natural. To illustrate these
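operations, we present the complete addition and multiplication tables for $\mathbb{Z}_4$.
  +  | [0] [1] [2] [3]
 [0] | [0] [1] [2] [3]
 [1] | [1] [2] [3] [0]
 [2] | [2] [3] [0] [1]
 [3] | [3] [0] [1] [2]

  ·  | [0] [1] [2] [3]
 [0] | [0] [0] [0] [0]
 [1] | [0] [1] [2] [3]
 [2] | [0] [2] [0] [2]
 [3] | [0] [3] [2] [1]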
For example, to find [3] + [2], locate [3] in the left column of the + table, then follow across its
corresponding row until you are under the [2] located in the top row of the table. This entry ([1])
is the desired sum.
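Tables of this kind are mechanical to produce for any modulus. The sketch below represents each class [r] by the integer r with 0 <= r < n; the names table and show are hypothetical helpers, not notation from the text.

```python
def table(n: int, op) -> list[list[int]]:
    """Entry [a][b] is the representative r in {0, ..., n-1} of [a] op [b]."""
    return [[op(a, b) % n for b in range(n)] for a in range(n)]


def show(n: int, symbol: str, op) -> None:
    """Print the table with bracketed class names, one row per left operand."""
    print(symbol.rjust(4) + " | " + " ".join(f"[{b}]" for b in range(n)))
    for a, row in enumerate(table(n, op)):
        print(f"[{a}]".rjust(4) + " | " + " ".join(f"[{r}]" for r in row))


show(4, "+", lambda a, b: a + b)
show(4, "*", lambda a, b: a * b)
```

Reading across row [3] to column [2] of the first table gives [1], the sum found in the text.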
Although we have characterized the definitions of addition, subtraction, and multiplication as
natural, the situation is not as straightforward as it may first appear. For example, the definition
[a] + [b] = [a + b] depends on the manipulation of specific integers a and b, but we know that there
are other integers c and d with [a] = [c] and [b] = [d]. What if we compute [c + d]? The value of
[c + d] must be the same as [a + b] or the definition of addition doesn't make sense. Fortunately,
Theorem 3.3 comes to the rescue. Since [a] = [c] and [b] = [d], we know that a and c are congruent
modulo n, as are b and d. Thus their sums a + b and c + d are congruent modulo n, which means
that [a + b] = [c + d]. The operations of subtraction and multiplication can be justified in the same
way. What we have shown is that the definitions of addition, subtraction, and multiplication are
well-defined.
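The independence from the choice of representatives can also be observed experimentally. The sketch below picks random representatives c and d of [a] and [b] and compares the results; the function name check_well_defined, the fixed seed, and the sampling ranges are all arbitrary choices made for illustration.

```python
import random


def check_well_defined(n: int, trials: int = 1000) -> bool:
    """Test that +, -, and * give congruent results for different representatives."""
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for _ in range(trials):
        a, b = rng.randrange(-50, 50), rng.randrange(-50, 50)
        c = a + n * rng.randrange(-5, 6)  # c is another representative of [a]
        d = b + n * rng.randrange(-5, 6)  # d is another representative of [b]
        if (a + b - (c + d)) % n != 0:
            return False
        if (a - b - (c - d)) % n != 0:
            return False
        if (a * b - c * d) % n != 0:
            return False
    return True


print(check_well_defined(11))
```

A passing run does not replace the argument based on Theorem 3.3; it only illustrates what that argument guarantees.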
Many of the basic algebraic properties of integers carry over to $\mathbb{Z}_n$. The following theorem lists
a few of the more familiar properties.
Proof. We prove two parts and leave the rest as exercises. Although the proofs are quite easy,
it is important that you think about the steps that are involved.
To prove part (a), suppose that [a] and [b] are elements of $\mathbb{Z}_n$. We then have
$[a] + [b] = [a + b] = [b + a] = [b] + [a]$.
Note that the definition of addition in $\mathbb{Z}_n$ involves addition in $\mathbb{Z}$. Since we know the properties of
addition in $\mathbb{Z}$, we can apply them to determine the properties of addition in $\mathbb{Z}_n$. Similarly, part (f)
follows from the simple equation $[0] + [a] = [0 + a] = [a]$.
Parts (a) and (c) are commutative laws, parts (b) and (d) are associative laws, and part (e)
says that multiplication distributes over addition. Parts (f), (g), (h), and (i) show that [0] and [1]
act in $\mathbb{Z}_n$ in much the same way that 0 and 1 act in $\mathbb{Z}$.
Though many properties of the integers are shared by $\mathbb{Z}_n$, there are some exceptions. Consider
the following statements that are true in $\mathbb{Z}$.
If $ab = 0$, then either $a = 0$ or $b = 0$.
If $a^2 = 1$, then either $a = 1$ or $a = -1$.
If $ab = ac$ and $a \ne 0$, then $b = c$.
(For the record, the second and third statements are consequences of the first statement.) These
statements are not necessarily true in $\mathbb{Z}_n$. Working in $\mathbb{Z}_{24}$, we find that
$[3] \cdot [8] = [24] = [0]$, but $[3] \ne [0]$ and $[8] \ne [0]$;
$[5] \cdot [5] = [25] = [1]$, but $[5] \ne [1]$ and $[5] \ne [-1]$;
$[2] \cdot [4] = [8] = [32] = [2] \cdot [16]$, but $[2] \ne [0]$ and $[4] \ne [16]$.
Examples such as these should serve as a reminder that there are differences between the arithmetic
operations in $\mathbb{Z}$ and $\mathbb{Z}_n$.
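All three failures can be located by a direct search, again representing each class by its remainder. This is a sketch under that representation; the variable names are illustrative.

```python
n = 24

# Pairs of nonzero classes whose product is [0].
zero_divisor_pairs = [
    (a, b) for a in range(1, n) for b in range(a, n) if (a * b) % n == 0
]

# Classes whose square is [1].
square_roots_of_one = [a for a in range(n) if (a * a) % n == 1]

# Cancellation fails: [2][4] = [2][16] although [4] and [16] differ.
cancellation_fails = (2 * 4) % n == (2 * 16) % n and 4 % n != 16 % n

print((3, 8) in zero_divisor_pairs)
print(square_roots_of_one)
print(cancellation_fails)
```

The second list contains [5] along with several other classes besides [1] and [23] = [-1].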
It is important to remember that [a] is not an integer; it represents an infinite collection of
integers. Hence, the set $\mathbb{Z}_n$ is not a subset of $\mathbb{Z}$. It is sometimes tempting to confuse the set
$\mathbb{Z}_n = \{[0], [1], [2], \ldots, [n - 1]\}$ with the set $\{0, 1, 2, \ldots, n - 1\} \subset \mathbb{Z}$. The brackets make all the
difference in the world: in $\mathbb{Z}_5$, the elements [2] and [7] are the same, but of course 2 and 7 are
different integers. The set $\mathbb{Z}_n$, along with the operations of addition and multiplication, is an
example of an abstract space. The general properties of such spaces are studied in detail in higher
mathematics. As this may be your first exposure to abstract spaces, you will need to spend some
time and energy thinking about these objects.
Exercises 3.2.
1. Construct addition and multiplication tables for
a) $\mathbb{Z}_2$
b) $\mathbb{Z}_5$
c) $\mathbb{Z}_6$
2. In $\mathbb{Z}_{23}$, find each of the following. Give your answer in the form [r], where $0 \le r < 23$.
a) [10] + [17]
b) [8] + [22]
c) [6] [16]
d) [14] [20]
e) [8] [12]
f ) [13] [19]
5. Use the table from Exercise 1(c) to verify the following statements.
a) There is a unique $[x] \in \mathbb{Z}_6$ such that $[5] \cdot [x] = [2]$.
b) There is no $[x] \in \mathbb{Z}_6$ such that $[3] \cdot [x] = [4]$.
c) There is an $[x] \in \mathbb{Z}_6$ such that $[4] \cdot [x] = [2]$, but it is not unique.
6. Give examples in $\mathbb{Z}_{14}$ to illustrate each of the following.
a) $[a] \cdot [b] = [0]$ but $[a] \ne [0]$ and $[b] \ne [0]$.
b) [a] and [b] so that $[a] \cdot [b] = [1]$ but neither [a] nor [b] is [1] or $[-1]$.
c) [a], [b], and [c] so that $[a] \cdot [b] = [a] \cdot [c]$ but neither $[a] = [0]$ nor $[b] = [c]$.
7. Find all the elements [x] of $\mathbb{Z}_{15}$ such that [x] = [p] for some prime number p (p < 15 is not required).
8. Find (with proof) the sum of all the elements of $\mathbb{Z}_n$. (Consider the even and odd cases separately.)
9. In $\mathbb{Z}_{360}$, find all of the elements [x] such that $[x]^n = [0]$ for some positive integer n. Of course, the
symbols $[x]^n$ mean to multiply [x] times itself n times.
10. Let [a] be an element in $\mathbb{Z}_n$.
a) Suppose that $[a] + [x] = [a] + [y]$. Prove that [x] = [y].
b) Suppose that $[a] \cdot [x] = [a] \cdot [y]$. Give an example in $\mathbb{Z}_{35}$ to show that [x] may not equal [y].
Consider the numbers 12 and 18. We know (from elementary school and also our formal Definition
2.6) that the divisors of 12 are 1, 2, 3, 4, 6, and 12 and that the divisors of 18 are 1,
2, 3, 6, 9, and 18. If we want the common divisors of 12 and 18, we take the intersection of
these two lists. Thus the divisors of both 12 and 18 are 1, 2, 3, and 6. The largest number
in this common list, namely 6, is called the greatest common divisor of 12 and 18.
DEFINITION 3.9 Suppose a and b are integers, not both zero. The greatest common divisor
of a and b, denoted by (a, b) or gcd(a, b), is the largest positive integer that divides both a and b.
We will be concerned almost exclusively with the case in which a and b are nonnegative, but
since (a, b) = (|a|, |b|) the theory goes through with essentially no change in case a or b or both
are negative. (Note that (0, 0) is undefined.) The notation (a, b) might be somewhat confusing
since it is also used to denote ordered pairs and open intervals. The meaning is usually clear from
the context; if there is a chance for confusion, we use gcd(a, b). In the discussion preceding the
definition, we showed that (18, 12) = 6. For further examples, it should be clear that (21, 7) = 7,
(28, 18) = 2, and (31, 13) = 1.
For the record, whenever the notation (a, b) is used, it is assumed that at least one of a or b is
nonzero. The next theorem lists some simple but important observations concerning (a, b).
$a = q_1 b + r_1$, $0 < r_1 < b$; ( $a \equiv r_1 \pmod{b}$ )
$b = q_2 r_1 + r_2$, $0 < r_2 < r_1$; ( $b \equiv r_2 \pmod{r_1}$ )
$r_1 = q_3 r_2 + r_3$, $0 < r_3 < r_2$; ( $r_1 \equiv r_3 \pmod{r_2}$ )
$r_2 = q_4 r_3 + r_4$, $0 < r_4 < r_3$; ( $r_2 \equiv r_4 \pmod{r_3}$ )
Since a decreasing list of nonnegative integers cannot continue indefinitely, eventually one
of the remainders is 0. It follows that the last two steps in the list would look like
$r_{k-2} = q_k r_{k-1} + r_k$,
$r_{k-1} = q_{k+1} r_k + 0$. ( $r_{k-1} \equiv 0 \pmod{r_k}$ )
an important role in many applications. For example, to find the greatest common divisor of 198
and 168 with this method, we would compute
$198 = 1 \cdot 168 + 30$
$168 = 5 \cdot 30 + 18$
$30 = 1 \cdot 18 + 12$
$18 = 1 \cdot 12 + 6$
$12 = 2 \cdot 6$
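The chain of divisions in this computation can be reproduced step by step. The following sketch assumes positive inputs; the function name euclid_steps is chosen here for illustration.

```python
def euclid_steps(a: int, b: int) -> list[tuple[int, int, int, int]]:
    """Return the steps (dividend, quotient, divisor, remainder) for positive a, b."""
    steps = []
    while b != 0:
        q, r = divmod(a, b)       # a = q * b + r with 0 <= r < b
        steps.append((a, q, b, r))
        a, b = b, r               # the divisor and remainder form the next step
    return steps


steps = euclid_steps(198, 168)
for dividend, q, divisor, r in steps:
    print(f"{dividend} = {q} * {divisor} + {r}")

# The greatest common divisor is the divisor in the final step (remainder 0).
print("gcd:", steps[-1][2])
```

The printed lines match the five divisions above, and the final divisor is 6.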
Furthermore, with a little extra bookkeeping, we can use the Euclidean Algorithm to show that
(a, b) is actually a linear combination of a and b. Referring to the previous example with a = 198
and b = 168, we find that
$30 = 198 - 168 = a - b$,
$18 = 168 - 5 \cdot 30 = b - 5(a - b) = -5a + 6b$,
$12 = 30 - 18 = (a - b) - (-5a + 6b) = 6a - 7b$,
$6 = 18 - 12 = (-5a + 6b) - (6a - 7b) = -11a + 13b$.
Notice that the numbers in the left column are precisely the remainders computed by the Euclidean
Algorithm. This example leads to the following general result, known as the Extended Euclidean
Algorithm. In spite of its simplicity, this is an extremely important theorem, one of the most crucial
results in this book.
THEOREM 3.11 If a and b are integers, not both zero, then there exist integers x and y such
that (a, b) = ax + by.
that $g \mid d$ and thus $g \le d$. To complete the proof, we need to prove that $d \le g$ and to do this, it is
sufficient to prove that $d \mid a$ and $d \mid b$. By the Division Algorithm, there exist integers q and r such
that $a = dq + r$ and $0 \le r < d$. It follows that
$r = a - dq = a - (ai + bj)q = a(1 - iq) + b(-jq)$
is a linear combination of a and b. If r > 0, then r is an element of D that is smaller than d, a
contradiction to the fact that d is the least element of D. We conclude that r = 0 and thus that
d|a. In a similar way, it follows that d|b. This completes the proof.
It is rather remarkable that the greatest common divisor of a and b can be written as a linear
combination of a and b. In fact, the second of the proofs given above reveals that (a, b) is the
smallest positive integer with this property.
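The bookkeeping that expresses each remainder as a linear combination of a and b can be carried along inside the division loop. This is one common way to organize the computation, sketched here for nonnegative inputs; the function name extended_gcd and its variable names are not from the text.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with g = gcd(a, b) = a*x + b*y, for a, b >= 0 not both zero."""
    # Invariant: r0 == a*x0 + b*y0 and r1 == a*x1 + b*y1 at every step.
    r0, x0, y0 = a, 1, 0
    r1, x1, y1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return r0, x0, y0


g, x, y = extended_gcd(198, 168)
print(g, x, y, 198 * x + 168 * y)  # 6 -11 13 6
```

For a = 198 and b = 168 the coefficients agree with the hand computation that expressed 6 in terms of a and b. Other coefficient pairs also work, so a different implementation may return different values of x and y.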
The following definition introduces an important concept in number theory.
It is easy to verify that 6 and 5 are relatively prime as are 21 and 10, but the integers 12
and 20 are not relatively prime. Part (a) of Theorem 3.10 can be rephrased to say that 1 and a
are relatively prime for any integer a. Finally, if p is a prime and a is any integer that satisfies
$1 \le a < p$, then a and p are relatively prime. An extremely useful characterization of relatively
prime integers is given below; its proof is left as an exercise.
THEOREM 3.13 Suppose that a and b are integers, not both zero. Then a and b are relatively
prime if and only if there exist integers x and y such that ax + by = 1.
Exercises 3.3.
1. For each pair of integers a and b, find (a, b) and integers x and y satisfying (a, b) = ax + by. Which
pairs of integers are relatively prime?
a) a = 32, b = 13
b) a = 148, b = 40
c) a = 300, b = 55
d) a = 58, b = 17
e) a = 147, b = 105
f ) a = 338, b = 225
2. Let p be a prime and let a be a positive integer. What are the possible values for (a, p)?
3. Let a and b be integers, not both 0. Prove that (a, b) = (|a|, |b|).
4. Prove parts (d) and (e) of Theorem 3.10.
5. Prove Theorem 3.13.
6. Let a and b be positive integers. Suppose that there exist integers x and y such that ax + by = 6.
What are the possible values for (a, b)?
7. Suppose that g = (a, b). Prove that $g^2 \mid ab$.
8. Suppose that g is a positive integer and let x be a multiple of $g^2$. Show that there exist integers a and
b such that (a, b) = g and ab = x.
9. Show that there are an infinite number of ways of expressing (a, b) as a linear combination of a and b.
10. Let a and b be integers, not both zero, and let g = (a, b). Prove that an integer can be expressed as a
linear combination of a and b if and only if it is a multiple of g.
11. Prove that a and b are relatively prime if and only if $a^2$ and $b^2$ are relatively prime.
12. The Euclidean Algorithm works so well that it is difficult to find pairs of numbers that make it take a
long time. Find two numbers whose greatest common divisor is 1 for which the Euclidean Algorithm
takes 10 steps.
The spaces Un
As the last part of Section 3.2 indicates, some of the arithmetic properties of Zn are different
from those of Z. In particular, it is possible for two nonzero numbers to have a product of zero
($[3] \cdot [8] = [0]$ in $\mathbb{Z}_{24}$) and for the square of a number other than $[1]$ or $[-1]$ to be $[1]$ ($[5]^2 = [1]$
in $\mathbb{Z}_{24}$). Notice also that $[2] \cdot [3] = [1]$ in $\mathbb{Z}_5$ but that neither $[2]$ nor $[3]$ is $[1]$ or $[-1]$. This last
observation shows that there can be multiplicative inverses for elements in $\mathbb{Z}_n$ other than $[1]$ and
$[-1]$. We are thus led to consider a notion of division (that is, multiplication by multiplicative
inverses) in Zn , but in order to do so, we must eliminate some of the problem elements, namely
those that do not have multiplicative inverses. The next theorem and its three corollaries are quite
useful in this regard and for our later work. The proofs of the corollaries are left as exercises.
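THEOREM 3.14 Suppose that n, a, and b are integers. If n and a are relatively prime and n|ab,
then n|b.
Proof. Suppose that n, a, and b are integers such that n and a are relatively prime and n|ab.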
By the definition of divisibility, there exists an integer k such that ab = nk. Since n and a are
relatively prime, Theorem 3.13 shows that there exist integers x and y such that nx + ay = 1. It
follows that
b = b(nx + ay) = nbx + aby = nbx + nky = n(bx + ky),
revealing that n|b.
COROLLARY 3.15 If p is a prime and p|ab, then p|a or p|b.
COROLLARY 3.16 If p is a prime and p divides the product $a_1 a_2 \cdots a_n$, then $p \mid a_i$ for some
index i that satisfies $1 \le i \le n$.
COROLLARY 3.17 If $p, p_1, p_2, \ldots, p_n$ are primes and p divides the product $p_1 p_2 \cdots p_n$, then
$p = p_i$ for some index i that satisfies $1 \le i \le n$.
Given a positive integer n > 1, the set Zn can be best represented as
$$\{[0], [1], [2], \ldots, [n-2], [n-1]\} = \{[i] : 0 \le i \le n-1\}.$$
For some elements [i] of Zn , we find that (i, n) = 1, while (i, n) > 1 for other values of i. It turns
out that the subset Un of Zn that consists of those elements [u] of Zn for which (u, n) = 1 has
some interesting properties. In particular (see Corollary 3.20 below), each element of Un has a
multiplicative inverse.
DEFINITION 3.18 Let n > 1 be a positive integer. The set $U_n \subseteq \mathbb{Z}_n$ is defined to be the set
of all $[u] \in \mathbb{Z}_n$ such that (u, n) = 1.
As indicated by the definition, the set U1 is not defined. The set U2 consists of a single element,
namely, {[1]}. It is easy to verify the following examples of sets Un :
U6 = {[1], [5]};
U7 = {[1], [2], [3], [4], [5], [6]};
U8 = {[1], [3], [5], [7]};
U21 = {[1], [2], [4], [5], [8], [10], [11], [13], [16], [17], [19], [20]}.
Note that [0] is never an element of $U_n$ while [1] and $[n-1]$ are elements of $U_n$ for n > 2. The
most important property of elements of Un is given in the following result.
THEOREM 3.19 Suppose that u and n are integers with n > 1. Then u and n are relatively
prime if and only if there exists an integer v such that $uv \equiv 1 \pmod{n}$.
Proof. Let u and n be integers with n > 1. We first suppose that u and n are relatively prime.
By Theorem 3.13, there exist integers v and w such that uv + nw = 1. Since $uv - 1$ is a multiple of
n, it follows that $uv \equiv 1 \pmod{n}$. Now suppose that $uv \equiv 1 \pmod{n}$. By definition, there exists
an integer k such that $uv - 1 = kn$. This equation can be written as $uv + n(-k) = 1$. Applying
Theorem 3.13 once again, we find that u and n are relatively prime. (For the record, since the
integers u and v are interchangeable in the proof, the integers v and n are relatively prime.)
COROLLARY 3.20 If n > 1 is an integer, then for each $[u] \in U_n$ there exists a $[v] \in U_n$ such
that $[u] \cdot [v] = [1]$.
Proof. Let n > 1 be an integer and suppose that $[u] \in U_n$. Since u and n are relatively prime
by the definition of $U_n$, the theorem guarantees the existence of an integer v such that
$uv \equiv 1 \pmod{n}$. Referring to the theorem once again, we find that v and n are relatively prime and thus
$[v] \in U_n$. The equation $uv \equiv 1 \pmod{n}$ is equivalent to $[u] \cdot [v] = [1]$ in $U_n$.
Therefore, the set $U_n$ consists of those $[u] \in \mathbb{Z}_n$ such that for some $[v] \in \mathbb{Z}_n$ we have $[u] \cdot [v] = [1]$.
In other words, the set Un is the set of all elements of Zn that have multiplicative inverses. The
invertible elements of Zn are sometimes called units, hence the use of the symbol Un for this set.
We say [v] is a multiplicative inverse (or reciprocal ) of [u]. Note that [u] is an inverse of [v] when [v]
is an inverse of [u]. For some specific examples, for U5 = {[1], [2], [3], [4]}, we see that [2] and [3] are
inverses of each other, while [1] and [4] are their own inverses. In U14 = {[1], [3], [5], [9], [11], [13]},
we find that [3] and [5] are inverses, as are [9] and [11]; and that [1] and [13] are their own inverses.
For these examples it is easy to find an inverse by inspection. In general, this can be done by
the Extended Euclidean Algorithm or by solving a congruence equation.
EXAMPLE 3.21 Find an inverse for [17] in the set $U_{37}$. We apply the Extended Euclidean
Algorithm (details omitted) to find that
$$(-13) \cdot 17 + 6 \cdot 37 = 1 \quad\Longrightarrow\quad (-13) \cdot 17 \equiv 1 \pmod{37}.$$
It follows that $[-13] = [24]$ is an inverse for [17]. Alternatively, we can solve the following congruence
modulo 37:
$$17x \equiv 1 \iff 34x \equiv 2 \iff -3x \equiv 2 \iff 36x \equiv -24 \iff x \equiv 24,$$
showing again that [24] is an inverse for [17]. In either case, we can check that when the product
$17 \cdot 24$ is divided by 37, the remainder is 1.
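The Extended Euclidean Algorithm computation whose details are omitted in Example 3.21 can be carried out mechanically. The following Python sketch is one possible version; the function name `inverse_mod` is hypothetical (it is not notation from the text), and the code assumes n > 1. It tracks only the coefficient of u, since the coefficient of n disappears modulo n.

```python
def inverse_mod(u, n):
    """Return v with 0 <= v < n and u*v congruent to 1 modulo n.

    Raises ValueError when u and n are not relatively prime,
    that is, when [u] is not an element of U_n. Assumes n > 1.
    """
    # Invariants: old_r is congruent to u*old_x and r to u*x (mod n).
    old_r, r = n, u % n
    old_x, x = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
    if old_r != 1:
        raise ValueError(f"{u} has no inverse modulo {n}")
    return old_x % n


v = inverse_mod(17, 37)
print(v, (17 * v) % 37)  # prints "24 1"
```

Reducing the final coefficient modulo n picks out the representative between 0 and n - 1, which is why the sketch reports 24 rather than -13.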
Notice that both methods in the above example produced the same inverse for [17]. This is no
accident; each element in Un has exactly one inverse.
Proof. By Corollary 3.20, we know that [u] has at least one inverse in $U_n$. Suppose that $[v_1]$
and $[v_2]$ are both inverses of [u]. This means that $[u] \cdot [v_1] = [1] = [u] \cdot [v_2]$ or, in the language of
congruence, $uv_1 \equiv uv_2 \pmod{n}$. It follows that n divides the product $u(v_1 - v_2)$. Since u and n
are relatively prime, we find that n divides $v_1 - v_2$ (see Theorem 3.14) and thus $[v_1] = [v_2]$.
We denote the multiplicative inverse of [u] by $[u]^{-1}$. Note well that this notation only makes
sense if $[u] \in U_n$; an arbitrary element of $\mathbb{Z}_n$ may not have a multiplicative inverse. For elements [u]
of $U_n$ and positive integers k, we define $[u]^{-k}$ to be $([u]^{-1})^k$; thus (adopting the usual convention
that $[u]^0 = [1]$) for elements [u] of $U_n$, the expression $[u]^k$ is defined for all integers k.
THEOREM 3.23 The product of any two elements of Un is an element of Un . Hence, the
product of any number of elements of Un is an element of Un .
Proof. Suppose that $[u_1]$ and $[u_2]$ are in $U_n$. Since $(u_1, n) = 1 = (u_2, n)$, by Theorem 3.19, there
exist integers $v_1$ and $v_2$ such that $u_1 v_1 \equiv 1$ and $u_2 v_2 \equiv 1 \pmod{n}$. It follows that $(u_1 u_2)(v_1 v_2) \equiv 1 \pmod{n}$. This
last equation, along with Theorem 3.19 once again, shows that $(u_1 u_2, n) = 1$ and thus $[u_1 u_2] \in U_n$.
Since $[u_1 u_2] = [u_1] \cdot [u_2]$, we find that the product $[u_1] \cdot [u_2]$ belongs to $U_n$. Since
$$[u_1 u_2] \cdot [v_1 v_2] = [1]$$
$$[u_1] \cdot [u_2] \cdot [v_1] \cdot [v_2] = [1]$$
$$[u_1] \cdot [u_2] \cdot [u_1]^{-1} \cdot [u_2]^{-1} = [1],$$
we see that $([u_1] \cdot [u_2])^{-1} = [u_1]^{-1} \cdot [u_2]^{-1}$.
Notice that every row in this multiplication table is a list of all of the elements of U9 . In particular,
each row contains [1] exactly once, as it must, allowing us to read off inverses: $[1]^{-1} = [1]$, $[2]^{-1} = [5]$,
$[4]^{-1} = [7]$, and $[8]^{-1} = [8]$. The fact that $U_n$ appears in each row is true in general and will be
useful to us later. We record it below and leave the proof as an exercise.
THEOREM 3.24 Let n > 1 be an integer and let $[a_1], [a_2], \ldots, [a_k]$ be a list of all the elements
of $U_n$. If $[u] \in U_n$, then $[u] \cdot [a_1], [u] \cdot [a_2], \ldots, [u] \cdot [a_k]$ is also a list of all the elements of $U_n$.
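For small n, the set Un and its multiplication table can be generated directly, which makes it easy to observe the row property recorded in Theorem 3.24. The Python sketch below is only an illustration; the helper names `units` and `multiplication_table` are our own, and each class [u] is represented by the integer u between 1 and n - 1.

```python
from math import gcd


def units(n):
    """Representatives u in 1..n-1 with (u, n) = 1, that is, the set U_n."""
    return [u for u in range(1, n) if gcd(u, n) == 1]


def multiplication_table(n):
    """Map each u in U_n to the row [u*a mod n for a in U_n]."""
    us = units(n)
    return {u: [(u * a) % n for a in us] for u in us}


table = multiplication_table(9)
for u, row in table.items():
    print(u, row)

# Each row should be a rearrangement of U_9 (compare Theorem 3.24).
print(all(sorted(row) == units(9) for row in table.values()))
```

A computation of this kind checks the statement for one value of n only; it is not a substitute for the proof requested in the exercises.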
In Zn we can add, subtract, and multiply, but, as in Z, we cannot divide. However, since
division is defined as multiplication by the multiplicative inverse, we can do division in Un . Thus,
if p is a prime, algebra in Zp is much like algebra in Q.
In this section, we present some further properties of the greatest common divisor and introduce the
related notion of least common multiple. These two concepts are first introduced in middle school
as an aid to adding and subtracting fractions. However, as is to be expected, we are going to look
at some deeper properties of these concepts. The first result provides a complete characterization
of the greatest common divisor.
THEOREM 3.25 Suppose that a and b are integers, not both 0. Then g = (a, b) if and only if
g > 0, g|a, g|b, and d|g for every common divisor d of a and b.
Proof. Suppose that g is an integer that satisfies g > 0, g|a, g|b, and d|g for every common
divisor d of a and b. We must prove that g is the greatest common divisor of a and b. It is clear
that g is a common divisor of a and b. By hypothesis, every common divisor d of a and b divides g
and thus satisfies $|d| \le g$ (see Theorem 2.8). It follows that g is in fact the greatest common divisor
of a and b. A proof of the converse is left as an exercise.
As a consequence of this theorem, we see that the entire collection of common divisors of a and
b can be found by listing all of the divisors of (a, b). In other words, (a, b) is actually a multiple of
every other common divisor of a and b. This is a somewhat surprising result since it is not obvious
from the definition of the greatest common divisor. After all, from the definition all that we know
about the greatest common divisor is that it is larger than all the other common divisors; we now
see that it is not only larger, it is actually a multiple of every other common divisor.
Speaking of multiples, we now define another number which is dual to the greatest common
divisor. Consider once again the numbers 12 and 18. If we list the positive multiples of 12, we
obtain 12, 24, 36, 48, 60, 72, and so on. Similarly, the positive multiples of 18 are 18, 36, 54, 72,
90, 108, and so on. If we want the common multiples of 12 and 18, we take the intersection of these
two lists and obtain 36, 72, 108, and so on. The smallest number in this common list, namely 36,
is called the least common multiple of 12 and 18.
DEFINITION 3.26 Suppose a and b are positive integers. The least common multiple of a and
b, denoted by [a, b] or lcm(a, b), is the smallest positive integer that is a multiple of both a and b.
Returning to the discussion prior to the definition, we see that [12, 18] = 36. Omitting the
simple details, it should be clear that [2, 5] = 10, [4, 14] = 28, and [33, 77] = 231. The next
theorem, which is analogous to Theorem 3.10, records some simple facts related to [a, b].
Just as every common divisor of a and b is a divisor of (a, b), we now see that every common
multiple of a and b is a multiple of [a, b]. As shown in the next theorem, there is also an interesting
algebraic relationship between the greatest common divisor and the least common multiple.
THEOREM 3.29 If a and b are positive integers, then $(a, b) \cdot [a, b] = ab$.
Proof. To make the notation easier, let g = (a, b). By Exercise 11 in Section 3.4, there exist
integers A and B such that a = Ag, b = Bg, and (A, B) = 1. Let $\ell$ be the integer ABg and note
that $aB = \ell = Ab$. We will show that the number $\ell$ satisfies the hypotheses of Theorem 3.28. It
is easy to verify that $\ell > 0$, that $a \mid \ell$, and that $b \mid \ell$. Now suppose that m is any positive common
multiple of a and b and choose integers x and y such that ax = m = by. Since ax = by, we find
that Agx = Bgy and consequently (since $g \ne 0$) that Ax = By. This last equation reveals that
A|By and thus A|y since A and B are relatively prime (see Theorem 3.14). Writing y = Av then gives
$$m = by = Abv = \ell v,$$
which shows that $\ell \mid m$. Since all of the hypotheses of Theorem 3.28 are satisfied, the number $\ell$ is the
least common multiple of a and b. The proof is complete once we note that the product $(a, b) \cdot [a, b]$
is the same as $g\ell = g(ABg) = (Ag)(Bg) = ab$.
Illustrating this theorem with earlier examples, we find that $(12, 18) \cdot [12, 18] = 6 \cdot 36 = 12 \cdot 18$.
The numbers used in this example are quite small. If we are given larger numbers as a and b, it
may be more difficult to find the least common multiple [a, b]. However, the Euclidean Algorithm
provides a systematic approach for finding (a, b) and once this is known, it is easy to use the
preceding theorem to determine [a, b].
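The procedure just described (find (a, b) with the Euclidean Algorithm, then divide it into the product ab) is short enough to write out. The Python sketch below uses the standard library function `math.gcd` for the greatest common divisor; the function name `lcm` is our own choice, and the code assumes that a and b are positive, as in Definition 3.26.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple [a, b] of positive integers a and b.

    Uses the relationship (a, b) * [a, b] = a * b.
    """
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive integers")
    # Divide before multiplying to keep the intermediate value small.
    return a // gcd(a, b) * b


for a, b in [(12, 18), (2, 5), (4, 14), (33, 77)]:
    print(a, b, lcm(a, b))
```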
We are now ready to prove the uniqueness portion of the Fundamental Theorem of Arithmetic,
that is, to prove that every positive integer greater than 1 can be factored as a product of primes
in only one way (ignoring the order in which the factors are written). This fact seems completely
obvious to most students; they are thus left wondering why a proof is even required. Therefore,
before presenting the proof, we offer two examples of situations where unique factorization does
not hold and thus show that there actually is something to prove.
Consider the set $T = \{3n - 2 : n \in \mathbb{Z}^+\} = \{1, 4, 7, 10, \ldots\}$ and define multiplication in the usual
way. It is easy to verify that the set T is closed under multiplication, that is, the product of any
two elements of T is another element of T . (Note, however, that T is not closed under addition.)
Prime numbers have the same meaning as before; an integer in T is prime if its only factors in T
are itself and 1. So, for instance, the numbers 4, 7, and 10 are prime in T, but $28 = 4 \cdot 7$ is not. In
this set, the number 100 has two different prime factorizations: $4 \cdot 25 = 100 = 10 \cdot 10$. Therefore,
prime factorization is not unique in the set T .
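The failure of unique factorization in T can be explored by brute force. The following Python sketch lists every way of writing an element of T as a product of elements that are prime in T; the function names `in_T`, `is_T_prime`, and `T_factorizations` are hypothetical and are introduced only for this illustration.

```python
def in_T(m):
    """True if m belongs to T = {1, 4, 7, 10, ...}."""
    return m >= 1 and m % 3 == 1


def is_T_prime(m):
    """True if m > 1 is in T and its only factors in T are 1 and m."""
    if m == 1 or not in_T(m):
        return False
    # range(4, m, 3) runs through the elements d of T with 1 < d < m.
    return not any(m % d == 0 for d in range(4, m, 3))


def T_factorizations(m, smallest=4):
    """All nondecreasing lists of T-primes (each >= smallest) with product m.

    Assumes m is in T and smallest is in T.
    """
    if m == 1:
        return [[]]
    results = []
    for d in range(smallest, m + 1, 3):
        if m % d == 0 and is_T_prime(d):
            # The quotient m // d is again in T because m and d both are.
            for rest in T_factorizations(m // d, d):
                results.append([d] + rest)
    return results


print(T_factorizations(100))  # prints [[4, 25], [10, 10]]
```

Requiring the factors in each list to be nondecreasing ensures that two factorizations differing only in the order of the factors are not counted twice.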
For a different and more complicated example, consider the set $C = \{a + b\sqrt{-5} : a, b \in \mathbb{Z}\}$ with
addition and multiplication defined in the usual way. The reader may verify that C is closed under
both addition and multiplication. Without including any details, we note that it can be shown
that the numbers 3, 7, $1 + 2\sqrt{-5}$, and $1 - 2\sqrt{-5}$ are all prime in C. It then follows that 21 has
two distinct prime factorizations:
$$3 \cdot 7 = 21 = \left(1 + 2\sqrt{-5}\right)\left(1 - 2\sqrt{-5}\right).$$
So once again we have an example of a set that does not have unique factorizations. Since C
apparently shares all of the algebraic properties of Z, we thus realize that there is some other
crucial property that Z satisfies in order to guarantee unique factorization. We encourage the
reader to trace the origins of the facts needed in the following proof and thus determine what key
results lie behind this property of Z.
THEOREM 3.30 Fundamental Theorem of Arithmetic
Every positive integer n > 1
is either a prime number or can be factored into a product of prime numbers. Furthermore, the
factorization is unique except for the order in which the factors are written, that is, in any two
factorizations of n into primes, every prime p occurs the same number of times in each factorization.
Proof. We already have seen that n can be factored in at least one way (see the proof of Theorem
2.22), so we need only prove uniqueness. Suppose that a positive integer n > 1 can be represented
as a product of primes in two different ways as
$$p_1 p_2 \cdots p_j = n = q_1 q_2 \cdots q_k,$$
where, without loss of generality, we may assume that
$$1 \le j \le k, \qquad p_1 \le p_2 \le \cdots \le p_j, \qquad q_1 \le q_2 \le \cdots \le q_k.$$
b) $(a, b) = p_1^{\min\{e_1, f_1\}} p_2^{\min\{e_2, f_2\}} \cdots p_k^{\min\{e_k, f_k\}}$
Proof. Although the notation is admittedly rather formidable, this result is a simple consequence
of Theorem 3.31, which states that one number divides another if and only if the primes in the
factorization of the first are present to lower powers than those in the second. Consequently, if
d is a common divisor of a and b, then any prime in its factorization must occur less often (or
equally) than it occurs in either the factorization of a or the factorization of b. This proves part
(a). To determine the largest possible common factor, we clearly should choose the largest exponent
possible for each prime; this is exactly what part (b) says. Finally, part (c) follows immediately
from part (a), part (b), and Theorem 3.31.
THEOREM 3.33 Suppose integers a and b have prime factorizations $a = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ and
$b = p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k}$, where the $p_i$'s are distinct and the exponents are nonnegative. Then
a) A positive integer $m = p_1^{t_1} p_2^{t_2} \cdots p_k^{t_k}$ is a common multiple of a and b if and only if the
inequality $t_i \ge \max\{e_i, f_i\}$ is valid for $1 \le i \le k$.
b) $[a, b] = p_1^{\max\{e_1, f_1\}} p_2^{\max\{e_2, f_2\}} \cdots p_k^{\max\{e_k, f_k\}}$
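The exponent rules for (a, b) and [a, b] translate directly into a computation: factor both numbers, then take the smaller exponent of each prime for the greatest common divisor and the larger exponent for the least common multiple. The Python sketch below does this with simple trial division; the function names are our own, the code assumes that a and b are positive, and trial division is practical only for small numbers.

```python
def factorize(n):
    """Return {prime: exponent} for a positive integer n, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # Whatever remains after the loop is a single prime factor.
        factors[n] = factors.get(n, 0) + 1
    return factors


def gcd_lcm_from_factorizations(a, b):
    """Return ((a, b), [a, b]) using min and max of prime exponents."""
    fa, fb = factorize(a), factorize(b)
    g, l = 1, 1
    for p in set(fa) | set(fb):
        e, f = fa.get(p, 0), fb.get(p, 0)  # a missing prime has exponent 0
        g *= p ** min(e, f)
        l *= p ** max(e, f)
    return g, l


print(gcd_lcm_from_factorizations(12, 18))  # prints (6, 36)
```

For large integers the Euclidean Algorithm is far more efficient, since it avoids factoring altogether.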
16. Suppose that n is a positive integer that is not a perfect square. Prove that $\sqrt{n}$ is an irrational number.
In this section, we want to look more carefully at Un . To aid in this investigation, we introduce
a new quantity, the Euler phi function, written $\phi(n)$, for positive integers n. This is a rather
remarkable function, but we only need a few of its more basic properties for our purposes.
DEFINITION 3.34 Let n be a positive integer. The Euler phi function, denoted by the symbol
$\phi(n)$, represents the number of positive integers less than or equal to n that are relatively prime to
n. In other words, for each n > 1 the value of $\phi(n)$ is the number of elements in the set $U_n$.
It is easy to verify that $\phi(1) = 1$, $\phi(2) = 1$, $\phi(4) = 2$, $\phi(12) = 4$, $\phi(15) = 8$, and $\phi(17) = 16$.
(By the way, what makes $\phi(1)$ somewhat unusual?) In general, if p is a prime, then $\phi(p) = p - 1$
because $1, 2, \ldots, p - 1$ are all relatively prime to p but p is not. The number $\phi(n)$ turns out to
have a remarkably simple form; that is, there is a simple formula that gives the value of $\phi(n)$ for
any positive integer n. Stating and proving this formula is the goal of the next few results.
THEOREM 3.35 If p is a prime and e is a positive integer, then $\phi(p^e) = p^e - p^{e-1}$.
Proof. Let p be a prime and let e be a positive integer. To find $\phi(p^e)$, we need to calculate the
number of positive integers less than or equal to $p^e$ that are relatively prime to $p^e$. As is often
the case, it turns out to be easier to calculate the number that are not relatively prime to $p^e$, and
subtract from the total. The positive integers less than or equal to $p^e$ are $1, 2, \ldots, p^e$; there are $p^e$
of these integers. The numbers that are not relatively prime to $p^e$ must be multiples of p, namely
any number in the set $\{kp : 1 \le k \le p^{e-1}\}$. Since there are $p^{e-1}$ numbers in this set, we find that
$\phi(p^e) = p^e - p^{e-1}$.
For example, we see that $\phi(32) = 32 - 16 = 16$ and that $\phi(125) = 125 - 25 = 100$. As we will
prove below, it turns out that $\phi(ab) = \phi(a)\phi(b)$ when a and b are relatively prime. Using this fact,
we find that
$$\phi(4000) = \phi(2^5 \cdot 5^3) = \phi(2^5)\,\phi(5^3) = 16 \cdot 100 = 1600.$$
It is certainly evident how much easier it is to compute $\phi(4000)$ using these results than to use a
direct approach. Note that once again, we are reducing a problem about positive integers to one
for prime numbers.
LEMMA 3.36 Suppose that b and e are positive integers and that p is a prime. If b and p are
relatively prime, then $\phi(bp^e) = \phi(b)\phi(p^e)$.
Proof. Let p be a prime and let e be a positive integer. There is nothing to prove when b = 1
so assume that b > 1 and (b, p) = 1, and let $n = bp^e$. We proceed to count the number of positive
integers less than or equal to n that are not relatively prime to n. As a start, note that for
$1 \le k \le n$,
$$(n, k) \ne 1 \iff \begin{cases} (b, k) \ne 1; \text{ or} \\ (b, k) = 1 \text{ and } (p, k) = p; \end{cases}$$
where the two options are mutually exclusive. To count these values, we list the numbers $1, 2, \ldots, n$
in $p^e$ rows of b numbers each:
$$\begin{array}{l}
1,\ 2,\ \ldots,\ b; \\
b + 1,\ b + 2,\ \ldots,\ 2b; \\
2b + 1,\ 2b + 2,\ \ldots,\ 3b; \\
\qquad \vdots \\
(p^e - 1)b + 1,\ (p^e - 1)b + 2,\ \ldots,\ p^e b.
\end{array}$$
To determine those values of k for which $(b, k) \ne 1$, we note that each row can be used to form a
representation of $\mathbb{Z}_b$ (by writing i as [i]) and thus contains $b - \phi(b)$ values of k for which $(b, k) \ne 1$.
It follows that there are a total of $p^e(b - \phi(b))$ values for integers k with $(b, k) \ne 1$. Now suppose
that (b, k) = 1 and (p, k) = p. Then k = xp, where (x, b) = 1 and $1 \le x \le bp^{e-1}$. Referring
once again to the listing given above, each row contains $\phi(b)$ integers that are relatively prime to
b (those values of i for which $[i] \in U_b$). Multiplying each of these numbers that appear in the first
$p^{e-1}$ rows by p generates a value of k for which $1 \le k \le n$, (b, k) = 1, and (p, k) = p. We thus find
that there are $p^{e-1}\phi(b)$ values of k with this property. Putting this information together yields
$$\begin{aligned}
\phi(bp^e) = \phi(n) &= n - p^e\bigl(b - \phi(b)\bigr) - p^{e-1}\phi(b) \\
&= n - bp^e + p^e\phi(b) - p^{e-1}\phi(b) \\
&= \phi(b)\left(p^e - p^{e-1}\right) = \phi(b)\phi(p^e),
\end{aligned}$$
where the last step uses Theorem 3.35.
THEOREM 3.37 If $a = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$, where the $p_i$'s are distinct primes and the $e_i$'s are
positive, then
$$\phi(a) = \prod_{i=1}^{k} \left(p_i^{e_i} - p_i^{e_i - 1}\right).$$
Proof. This result is a simple consequence of the Principle of Mathematical Induction, where
the value of n represents the number of distinct primes in the product. When n = 1, the equation
follows immediately from Theorem 3.35. Now suppose that the result holds for a product involving
n distinct primes, where n is some positive integer. Consider the integer $a = \prod_{i=1}^{n+1} p_i^{e_i}$, where the $p_i$'s
are distinct primes and the $e_i$'s are positive. Note that the integers $\prod_{i=1}^{n} p_i^{e_i}$ and $p_{n+1}$ are relatively prime.
Using the lemma, the inductive hypothesis, and Theorem 3.35, we obtain
$$\begin{aligned}
\phi(a) &= \phi\Bigl(\prod_{i=1}^{n} p_i^{e_i}\Bigr)\,\phi\bigl(p_{n+1}^{e_{n+1}}\bigr) \\
&= \prod_{i=1}^{n} \bigl(p_i^{e_i} - p_i^{e_i - 1}\bigr) \cdot \bigl(p_{n+1}^{e_{n+1}} - p_{n+1}^{e_{n+1} - 1}\bigr) \\
&= \prod_{i=1}^{n+1} \bigl(p_i^{e_i} - p_i^{e_i - 1}\bigr).
\end{aligned}$$
This shows that the equation is valid for a product involving n + 1 distinct primes. The result now
follows by the Principle of Mathematical Induction.
Since every positive integer n > 1 can be written as a product of primes, Theorem 3.37 shows
how to determine $\phi(n)$ for any positive integer n > 1. For example,
$$\phi(600) = \phi(2^3 \cdot 3 \cdot 5^2) = (2^3 - 2^2)(3 - 1)(5^2 - 5) = 160.$$
This is another illustration of how the prime numbers can be viewed as the building blocks for the
positive integers; once we know how the Euler phi function behaves for prime numbers, we know
how it behaves for all positive integers. Of course, there is the difficult computational problem of
determining the prime factorizations of large integers, but that is an entirely different matter.
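The product formula can be compared with a direct count taken straight from the definition of the Euler phi function. The Python sketch below implements both; the function names `phi_by_counting` and `phi_by_formula` are our own, and the second function finds the prime powers by trial division, which is reasonable only for small n.

```python
from math import gcd


def phi_by_counting(n):
    """Count the integers k with 1 <= k <= n that are relatively prime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def phi_by_formula(n):
    """Multiply (p^e - p^(e-1)) over the prime powers p^e exactly dividing n."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            prime_power = 1
            while n % p == 0:
                prime_power *= p
                n //= p
            result *= prime_power - prime_power // p
        p += 1
    if n > 1:
        # The remaining factor is a prime to the first power.
        result *= n - 1
    return result


for n in (600, 4000):
    print(n, phi_by_counting(n), phi_by_formula(n))
```

Both functions return 1 when n = 1, since the product over an empty list of primes is 1.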
The defining characteristic of Un is that every element has a unique multiplicative inverse (see
Theorem 3.22). It is quite possible for an element of Un to be its own inverse; for example, in U12 ,
each of the elements [1], [5], [7], and [11] is its own inverse. This stands in contrast to arithmetic
in Z or R, where the only solutions to $x^2 = 1$ are $\pm 1$. If n is prime, then this familiar fact is true
in Un as well.
THEOREM 3.39 If p is a prime, then the only elements of Up which are their own inverses are
$[1]$ and $[p - 1] = [-1]$.
Proof. Let p be a prime. It is certainly clear that $[1]$ and $[-1]$ are their own inverses in $U_p$.
Suppose that $[u] \in U_p$ is its own inverse. The fact that $[u] \cdot [u] = [1]$ in $U_p$ is equivalent to the
statement $u^2 \equiv 1 \pmod{p}$. This means that $p \mid (u - 1)(u + 1)$. By Corollary 3.15, we know that
either $p \mid (u - 1)$ or $p \mid (u + 1)$, which implies that either $[u] = [1]$ or $[u] = [-1]$, respectively.
If p > 2 is prime, then $U_p = \{[1], [2], \ldots, [p - 1]\}$. Note that $U_p$ contains an even number of
elements since p is odd. The elements $[2], [3], \ldots, [p - 2]$ all have unique inverses different from
themselves, so it must be possible to pair up each element in this list with its inverse from the list.
This means that if we multiply all of $[2], [3], \ldots, [p - 2]$ together, we must get [1]. Illustrating this
fact for p = 11, the pairing would be
$$\begin{aligned}
[2] \cdot [3] \cdot [4] \cdot [5] \cdot [6] \cdot [7] \cdot [8] \cdot [9] &= ([2] \cdot [6]) \cdot ([3] \cdot [4]) \cdot ([5] \cdot [9]) \cdot ([7] \cdot [8]) \\
&= [1] \cdot [1] \cdot [1] \cdot [1] = [1].
\end{aligned}$$
This observation suggests the following result, called Wilson's Theorem.
THEOREM (Wilson's Theorem) If p is a prime, then $(p - 1)! \equiv -1 \pmod{p}$.
Proof. The result is trivial for p = 2 and p = 3. Suppose that $p \ge 5$ is a prime and use the
observation preceding the theorem to obtain
$$[(p - 1)!] = [1] \cdot [2] \cdot [3] \cdots [p - 2] \cdot [p - 1] = [1] \cdot [1] \cdot [-1] = [-1].$$
It follows that $(p - 1)! \equiv -1 \pmod{p}$.
To illustrate Wilson's Theorem with some simple examples, note that 4! + 1 = 25 is a multiple
of 5, 6! + 1 = 721 is a multiple of 7, and that 10! + 1 = 3628801 is divisible by 11. (How can you
check this last divisibility result quickly?)
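Examples of this kind are easy to check by machine without ever forming the full factorial, provided we reduce modulo p after each multiplication. The Python sketch below does exactly that; the function names `factorial_mod` and `satisfies_wilson` are hypothetical and are used only for this illustration.

```python
def factorial_mod(m, n):
    """Return m! reduced modulo n, reducing after every multiplication."""
    result = 1 % n
    for k in range(2, m + 1):
        result = (result * k) % n
    return result


def satisfies_wilson(p):
    """True if (p - 1)! is congruent to -1 modulo p."""
    # -1 and p - 1 lie in the same congruence class modulo p.
    return factorial_mod(p - 1, p) == (p - 1) % p


for p in (5, 7, 11, 13):
    print(p, factorial_mod(p - 1, p), satisfies_wilson(p))
```

Keeping every intermediate product below p squared is the same idea used in the hand computations of this section: work with small representatives instead of large integers.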
Similar in spirit to Wilson's Theorem, and very useful, is Euler's Theorem and its special
case known as Fermat's Little Theorem. To motivate these results, let n > 1 be a positive
integer and let $[u] \in U_n$. Since the set $\{[u]^i : i \in \mathbb{Z}^+\}$ is a subset of $U_n$ and since $U_n$ contains at
most $n - 1$ elements, there must be distinct positive integers i and j such that $[u]^i = [u]^j$. Assuming
that j > i, we find that $[u]^{j-i} = [1]$. In other words, for each $[u] \in U_n$, there exists a positive
integer k such that $[u]^k = [1]$. Euler's Theorem specifies a value of k with this property.
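THEOREM (Euler's Theorem) If n is a positive integer and u is an integer that is relatively prime to n, then
$u^{\phi(n)} \equiv 1 \pmod{n}$.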
Proof. The result is trivial when n = 1 so suppose that n > 1 and let $k = \phi(n)$. Since u and n
are relatively prime, we know that $[u] \in U_n$. If $[a_1], \ldots, [a_k]$ is a list of the elements of $U_n$, then
by Theorem 3.24 in Section 3.4, the collection $[u] \cdot [a_1], [u] \cdot [a_2], \ldots, [u] \cdot [a_k]$ is also a list of the
elements of $U_n$. Multiplying together the terms in each of these two lists gives
$$[a_1] \cdot [a_2] \cdots [a_k] = ([u] \cdot [a_1]) \cdot ([u] \cdot [a_2]) \cdots ([u] \cdot [a_k]) = [u]^k \cdot [a_1] \cdot [a_2] \cdots [a_k].$$
Let $b = a_1 a_2 \cdots a_k$. Then $[b] \in U_n$ and the displayed equation can be written as $[b] = [u]^k \cdot [b]$.
Multiplying both sides of this equation by $[b]^{-1}$, we find that $[u]^k = [1]$. Given the value of k, it
follows that $u^{\phi(n)} \equiv 1 \pmod{n}$.
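The congruence just proved is easy to test numerically for small moduli. The Python sketch below uses the built-in three-argument `pow(u, k, n)`, which computes the remainder of u to the power k on division by n without forming the large power; the function names `phi` and `check_euler` are our own, and `phi` simply counts from the definition.

```python
from math import gcd


def phi(n):
    """Euler phi function, computed directly from the definition."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def check_euler(n):
    """True if u^phi(n) is congruent to 1 (mod n) for every u in 1..n with (u, n) = 1."""
    k = phi(n)
    # 1 % n handles n = 1, where every integer is congruent to 0.
    return all(pow(u, k, n) == 1 % n
               for u in range(1, n + 1) if gcd(u, n) == 1)


print(all(check_euler(n) for n in range(1, 200)))
```

When n is a prime p, the exponent is p - 1, so the same check also covers the special case known as Fermat's Little Theorem.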
integer. Then
(mod 14).
The second example indicates how modular arithmetic can simplify computations. Using similar
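ideas to avoid large numbers, we can verify Fermat's Little Theorem with p = 17:
$$2^{16} \equiv (2^4)^4 \equiv (-1)^4 \equiv 1 \pmod{17};$$
$$5^{16} \equiv (5^2)^8 \equiv 8^8 \equiv (2^4)^6 \equiv (-1)^6 \equiv 1 \pmod{17};$$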
(mod 7);
3 2
It is good practice to do these computations without a calculator and to look for shortcuts.
Leonhard Euler. Euler (pronounced "oiler") was born in Basel in 1707 and died in 1783, following
a life of stunningly prolific mathematical work. His complete bibliography runs to nearly 900 entries;
his research amounted to some 800 pages a year over the whole of his career. He continued doing
research right up until his sudden death while relaxing with a cup of tea. For almost all of the last
17 years of his life he was totally blind.
The breadth of Euler's knowledge may be as impressive as the depth of his mathematical work.
He had a great facility with languages, and studied theology, medicine, astronomy, and physics.
His first appointment was in medicine at the recently established St. Petersburg Academy. On the
day that he arrived in Russia, the academy's patron, Catherine I, died, and the academy itself just
managed to survive the transfer of power to the new regime. In the process, Euler ended up in the
chair of natural philosophy instead of medicine.
Euler is best remembered for his contributions to analysis and number theory, especially for
his use of infinite processes of various kinds (infinite sums and products, continued fractions), and
for establishing much of the modern notation of mathematics. Euler originated the use of e for the
base of the natural logarithms and i for $\sqrt{-1}$; the symbol $\pi$ has been found in a book published
in 1706, but it was Euler's adoption of the symbol, in 1737, that made it standard. He was also
responsible for the use of $\Sigma$ to represent a sum, and for the modern notation for a function, f(x).
Euler's greatest contribution to mathematics was the development of techniques for dealing
with infinite operations. In the process, he established what has ever since been called the field of
analysis, which includes and extends the differential and integral calculus of Newton and Leibniz.
For example, by treating the familiar functions sin x, cos x, and $e^x$ analytically (as infinite series),
Euler could easily establish identities that became fundamental tools in analysis. One such is the
well-known $e^{ix} = \cos x + i \sin x$; substituting $x = \pi$ gives $e^{i\pi} = -1$ or $e^{i\pi} + 1 = 0$, a remarkable
equation containing perhaps the five most important constants in analysis.
Euler used infinite series to establish and exploit some remarkable connections between analysis
and number theory. Many talented mathematicians before Euler had failed to discover the value
of the sum of the reciprocals of the squares: $1^{-2} + 2^{-2} + 3^{-2} + \cdots$. Using the infinite series for
sin x, and assuming that it behaved like a finite polynomial, Euler showed that the sum is $\pi^2/6$.
Euler's uncritical application of ordinary algebra to infinite series occasionally led him into trouble,
but his results were overwhelmingly correct, and were later justified by more careful techniques as
the need for increased rigor in mathematical arguments became apparent. We'll see Euler's name
more than once in the remainder of the chapter.
The information here is taken from A History of Mathematics, by Carl Boyer, New York: John
Wiley & Sons, 1968.
Quadratic Residues
The prime numbers, their properties, and their relation to the composite numbers have fascinated
mathematicians for thousands of years. A list of these results would fill volumes and new facts are
continuing to be discovered. In this section and the next, we provide a glimpse into some of the
problems that have been considered.
Most everyone is familiar with perfect squares. The list of perfect squares goes on indefinitely:
1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, . . .
Is it possible to extend the notion of perfect square to Zn ? To determine the nature of this problem,
we begin by considering the 1's digit of perfect squares. By one of the exercises in Section 3.1, any
perfect square must end in 0, 1, 4, 5, 6, or 9. To verify this, we simply need to determine what
happens in Z10 :
$$\begin{array}{lllll}
[0]^2 = [0], & [1]^2 = [1], & [2]^2 = [4], & [3]^2 = [9], & [4]^2 = [6], \\
[5]^2 = [5], & [6]^2 = [6], & [7]^2 = [9], & [8]^2 = [4], & [9]^2 = [1].
\end{array}$$
As an illustration of this result, we immediately know that 56847 is not a perfect square. Another
way to phrase what we have just done is to say that [0], [1], [4], [5], [6], and [9] are squares in Z10
and that [2], [3], [7], and [8] are not squares in Z10 .
For a simpler, but slightly more abstract example, we can look at squares in Z4 :
$$[0]^2 = [0], \qquad [1]^2 = [1], \qquad [2]^2 = [0], \qquad [3]^2 = [1].$$
It follows that any perfect square has a remainder of 0 or 1 when divided by 4. (See Theorem 3.4
in Section 3.1.) In other words, [0] and [1] are squares in Z4 , and [2] and [3] are not squares in Z4 .
The notion of quadratic residues extends this idea, but it does so with a focus on prime numbers.
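The lists of squares in Z10 and Z4 worked out above can be produced for any modulus by squaring each representative and reducing. The Python sketch below does this; the function name `squares_mod` is our own, and each class [i] is represented by the integer i between 0 and n - 1.

```python
def squares_mod(n):
    """Sorted list of the representatives i*i mod n for 0 <= i < n."""
    return sorted({(i * i) % n for i in range(n)})


def could_be_perfect_square(m, n=10):
    """False if the remainder of m modulo n rules out m being a perfect square."""
    return m % n in squares_mod(n)


print(squares_mod(10))
print(squares_mod(4))
print(could_be_perfect_square(56847))
```

A result of True from `could_be_perfect_square` decides nothing, since a number can end in a permissible digit without being a perfect square; only a result of False is conclusive.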
DEFINITION 3.43 Suppose that p is an odd prime and that b and p are relatively prime. Then
b is a quadratic residue modulo p if and only if the equation $x^2 \equiv b \pmod{p}$ has a solution. If
the equation has no solution, then we say that b is a quadratic nonresidue modulo p.
This definition merely says that b is a quadratic residue modulo p if some perfect square (a
quadratic) has a remainder (a residue) of b when divided by p. To phrase it another way, a
quadratic residue is a perfect square in the world of modular arithmetic. However, by insisting
that gcd(b, p) = 1, we are excluding the trivial case of 0. An equivalent way of stating this is to say
that b is a quadratic residue modulo p if the equation $[x]^2 = [b]$ has a solution in $U_p$.
Given the equation $x^2 \equiv b \pmod{p}$, we may, without loss of generality, assume that $1 \le b < p$
and seek solutions x that satisfy $1 \le x < p$. Hence, to find the quadratic residues of a prime
number p, we simply need to check the squares of the numbers $1, 2, \ldots, p - 1$. Since p is an odd
prime, this is equivalent to checking the squares of the numbers $1, 2, \ldots, (p - 1)/2$. Using
this idea, it is easy to check that the quadratic residues of 7 are 1, 2, and 4, and that the quadratic
residues of 11 are 1, 3, 4, 5, and 9. For a more complicated illustration of the notion of quadratic
residue, the equation $13^2 = 169 = 5 \cdot 31 + 14$ reveals that 14 is a quadratic residue modulo 31. The
following result indicates how many quadratic residues a prime number may have.
THEOREM 3.44 If p is an odd prime, then exactly half of the numbers $1, 2, 3, \ldots, p - 1$ are
quadratic residues modulo p. In other words, half of the elements of $U_p$ are perfect squares in $U_p$.
Proof. Let p be an odd prime. To determine the quadratic residues of p, we need to compute the
numbers $x^2 \pmod{p}$ as x runs through the integers 1 to $p - 1$. Since $x^2 \equiv (p - x)^2 \pmod{p}$ for any integer
x, at most half of the elements that belong to the set $\{1, 2, 3, \ldots, p - 1\}$ are quadratic residues
modulo p. The result then follows from the fact that $x^2 \equiv y^2$ implies $y \equiv x$ or $y \equiv p - x$. The
details are left as an exercise.
For small primes, it is not difficult to find all of the quadratic residues by computing squares.
However, to determine whether or not, say, 111 is a quadratic residue of the prime 947 would be
rather challenging to attempt by brute force (that is, to write out all of the possible options for
squares in U947 ). It is thus convenient to have an easy and arithmetic way to identify whether
a given integer b is a quadratic residue of a prime p. It turns out to be most useful to define a
function $QR(b, p)$ that gives the quadratic character of $b$ modulo $p$. We write $QR(b, p) = 1$ when $b$
is a quadratic residue modulo $p$ and $QR(b, p) = -1$ when it is not. The standard notation for this
function is the Legendre symbol:
$$\left(\frac{b}{p}\right) = QR(b, p) = \begin{cases} 1, & \text{if } x^2 \equiv b \pmod{p} \text{ has a solution;} \\ -1, & \text{if } x^2 \equiv b \pmod{p} \text{ does not have a solution.} \end{cases}$$
We should emphasize here that the notation QR is completely nonstandard, introduced in the
hope that first expressing this concept as a function makes the idea easier to understand. The
Legendre symbol is not an ideal choice, since it looks exactly like a fraction, but it is the standard
notation used in number theory. Whenever this notation is used, it is implicitly assumed that p is
an odd prime and that b and p are relatively prime.
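The following result is known as Euler's Criterion. It states that
$$\left(\frac{b}{p}\right) \equiv b^{(p-1)/2} \pmod{p}$$
when $b$ and $p$ are relatively prime. To illustrate the ideas used in the proof for a particular case,
let $b = 7$ and $p = 13$. We then list all of the solutions to the equation $xy \equiv 7 \pmod{13}$ for values
of $x$ and $y$ between 1 and 12:
$$1 \cdot 7 \equiv 7; \quad 2 \cdot 10 \equiv 7; \quad 3 \cdot 11 \equiv 7; \quad 4 \cdot 5 \equiv 7; \quad 6 \cdot 12 \equiv 7; \quad 8 \cdot 9 \equiv 7.$$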
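Multiplying all these equations together gives $12! \equiv 7^6 \pmod{13}$ and thus $-1 \equiv 7^{(13-1)/2} \pmod{13}$
by Wilson's Theorem. Since $x \ne y$ for each pair of products, we see that 7 is a quadratic nonresidue
of 13, that is, $-1 = \left(\frac{7}{13}\right) \equiv 7^{(13-1)/2} \pmod{13}$. Now let $b = 3$ and $p = 13$. The solutions to the
equation $xy \equiv 3 \pmod{13}$ in this case are:
$$4 \cdot 4 \equiv 3; \quad 9 \cdot 9 \equiv 3; \quad 1 \cdot 3 \equiv 3; \quad 2 \cdot 8 \equiv 3; \quad 5 \cdot 11 \equiv 3; \quad 6 \cdot 7 \equiv 3; \quad 10 \cdot 12 \equiv 3.$$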
It is clear that 3 is a quadratic residue of 13 since there are two solutions to $x^2 \equiv 3 \pmod{13}$.
Putting in the proper numbers and using Wilson's Theorem once again, we find that
$$-1 \equiv 12! \equiv 4 \cdot 9 \cdot 3^5 \equiv 4 \cdot (-4) \cdot 3^5 \equiv (-1) 3 \cdot 3^5 \equiv (-1) 3^6 \pmod{13}.$$
It follows that $1 = \left(\frac{3}{13}\right) \equiv 3^{(13-1)/2} \pmod{13}$. The proof of Euler's Criterion merely extends these
ideas to the general case. As indicated by the above examples, for each integer $x \in \{1, 2, 3, \ldots, p-1\}$
there is a unique integer $y \in \{1, 2, 3, \ldots, p-1\}$ such that $xy \equiv b \pmod{p}$. If $b$ is a quadratic residue
modulo $p$, then $y$ may be equal to $x$, but if $b$ is not a quadratic residue modulo $p$, then $x$ and $y$ are
always distinct. We leave a proof of this fact as an exercise.
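The pairing used in these two examples can be reproduced mechanically. The following Python sketch is only an illustration: the function name `pairings` is hypothetical, and it relies on the built-in `pow(x, -1, p)` (available in Python 3.8 and later) to compute the inverse of $x$ modulo $p$.

```python
def pairings(b, p):
    """Pair each x in 1..p-1 with the unique y in 1..p-1 satisfying x*y = b (mod p).

    Returns (pairs, fixed): the unordered pairs {x, y} with x != y, and the
    values x that are paired with themselves (the square roots of b).
    """
    pairs, fixed = set(), []
    for x in range(1, p):
        y = (b * pow(x, -1, p)) % p      # y = b * x^(-1) modulo p
        if x == y:
            fixed.append(x)
        else:
            pairs.add((min(x, y), max(x, y)))
    return sorted(pairs), fixed


print(pairings(7, 13))   # six pairs, no fixed values
print(pairings(3, 13))   # five pairs, fixed values 4 and 9
```

For a nonresidue every element lands in a pair, while for a residue exactly two elements are paired with themselves, which is the distinction the proof of Euler's Criterion exploits.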
THEOREM 3.45 Euler's Criterion Suppose that $p$ is an odd prime and that $b$ is an integer.
If $b$ and $p$ are relatively prime, then
$$\left(\frac{b}{p}\right) \equiv b^{(p-1)/2} \pmod{p}.$$
Proof. Although it is not necessary, we can, without loss of generality, assume that $1 \le b \le p-1$.
We first note that $(p-1)/2$ is an integer since $p$ is odd and that $b^{p-1} \equiv 1 \pmod{p}$ by Fermat's
Little Theorem. It follows that $p$ divides the product
$$\left(b^{(p-1)/2} - 1\right)\left(b^{(p-1)/2} + 1\right),$$
which means that either $b^{(p-1)/2} \equiv 1 \pmod{p}$ or $b^{(p-1)/2} \equiv -1 \pmod{p}$. This shows that the
conclusion of the theorem makes sense; both sides assume the values $\pm 1$.
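Suppose that $b$ is a quadratic nonresidue modulo $p$. Then the numbers $1, 2, \ldots, p-1$ can be
grouped into $(p-1)/2$ pairs $\{x_i, y_i\}$ with $x_i y_i \equiv b \pmod{p}$ and it follows that
$$(p-1)! = \prod_{i=1}^{(p-1)/2} x_i y_i \equiv b^{(p-1)/2} \pmod{p}.$$
By Wilson's Theorem, $(p-1)! \equiv -1 \pmod{p}$, so $b^{(p-1)/2} \equiv -1 \pmod{p}$, which agrees with the
value of the Legendre symbol.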
Now suppose that $b$ is a quadratic residue modulo $p$. There are precisely two numbers in
$\{1, 2, 3, \ldots, p-1\}$, say $c$ and $p - c$, such that $c^2 \equiv (p-c)^2 \equiv b \pmod{p}$. The remaining $p - 3$ numbers can
be paired up as before. Since $c(p-c) \equiv -c^2 \equiv -b \pmod{p}$, we find that
$$(p-1)! = c(p-c) \prod_{i=1}^{(p-3)/2} x_i y_i \equiv (-b)\, b^{(p-3)/2} = -b^{(p-1)/2} \pmod{p}.$$
Using Wilson's Theorem once again, we find that $b^{(p-1)/2} \equiv 1 \pmod{p}$, which agrees with the
value of the Legendre symbol. This completes the proof.
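Euler's Criterion translates directly into a short computation. The sketch below is an illustration only: the names `legendre_euler` and `legendre_brute` are our own, $p$ is assumed to be an odd prime not dividing $b$, and Python's three-argument `pow` performs the modular exponentiation.

```python
def legendre_euler(b, p):
    """Legendre symbol (b/p) via Euler's Criterion.

    Assumes p is an odd prime and gcd(b, p) = 1.
    """
    value = pow(b, (p - 1) // 2, p)     # equals 1 or p - 1 under the assumptions
    return 1 if value == 1 else -1


def legendre_brute(b, p):
    """Legendre symbol (b/p) by searching for a square root of b modulo p."""
    return 1 if any((x * x - b) % p == 0 for x in range(1, p)) else -1


print(legendre_euler(7, 13), legendre_euler(3, 13))
print(legendre_euler(111, 947) == legendre_brute(111, 947))
```

The brute-force search is included only as a cross-check; for a prime such as 947 the exponentiation needs far fewer multiplications than listing all of the squares.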
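COROLLARY Suppose that $p$ is an odd prime and that $n = \prod_{i=1}^{k} b_i$. If $n$ and $p$ are
relatively prime, then
$$\left(\frac{n}{p}\right) = \prod_{i=1}^{k} \left(\frac{b_i}{p}\right).$$
Proof. We first note that the hypotheses imply that $p$ does not divide any of the $b_i$'s. By the
theorem, it follows that
$$\left(\frac{n}{p}\right) \equiv n^{(p-1)/2} = \prod_{i=1}^{k} b_i^{(p-1)/2} \equiv \prod_{i=1}^{k} \left(\frac{b_i}{p}\right) \pmod{p}.$$
Now the number represented by
$$\left(\frac{n}{p}\right) - \prod_{i=1}^{k} \left(\frac{b_i}{p}\right)$$
is a multiple of $p$ and can only assume the values 0, 2, or $-2$. Since $p$ is an odd prime, the value
must be 0.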
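To illustrate Euler's Criterion, note that
$$\left(\frac{5}{19}\right) \equiv 5^9 \equiv 5 \cdot (5^2)^4 \equiv 5 \cdot 6^4 \equiv 5 \cdot (-2)^2 \equiv 1 \pmod{19};$$
$$\left(\frac{17}{29}\right) \equiv 17^{14} \equiv 289^7 \equiv (-1)^7 \equiv -1 \pmod{29}.$$
Thus 5 is a quadratic residue modulo 19 (with a little patience, we find that $9^2 \equiv 5 \pmod{19}$)
and 17 is a quadratic nonresidue modulo 29. The corollary once again reduces a problem about
integers to a problem concerning primes. As an example, the second problem above could be solved
as follows:
$$\left(\frac{17}{29}\right) = \left(\frac{-12}{29}\right) = \left(\frac{-1}{29}\right)\left(\frac{2}{29}\right)^2\left(\frac{3}{29}\right) \equiv (-1)^{14} (1) (3^{14}) \equiv (3^3)^4\, 3^2 \equiv (-2)^4\, 3^2 \equiv 144 \equiv -1 \pmod{29}.$$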
As you can imagine, as the numbers become larger, the computations become more challenging.
The following remarkable result shows how the quadratic character of larger primes can be
computed quite easily. This deep result, known as the Quadratic Reciprocity Theorem, was discovered (but not proved) by Leonhard Euler. It was proved first by Gauss in the early 1800s and
reproved many times thereafter (at least eight different ways by Gauss alone). The beautiful proof
of this result given below is due to the brilliant young mathematician Gotthold Eisenstein, who
died tragically young, at 29, of tuberculosis. The proof is similar to one by Gauss, but it replaces
a complicated lemma by an ingenious geometrical argument. For each real number $x$, the symbols
$\lfloor x \rfloor$ represent the greatest integer less than or equal to $x$. For example, $\lfloor 58/7 \rfloor = 8$ and $\lfloor \pi \rfloor = 3$.
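THEOREM (Quadratic Reciprocity Theorem) If $p$ and $q$ are distinct odd primes, then
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{1}{2}(p-1) \cdot \frac{1}{2}(q-1)}.$$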
Proof. Let $p$ and $q$ be distinct odd primes and let $E = \{2, 4, 6, \ldots, p-1\}$. For each $e \in E$, use
the Division Algorithm to write $eq = p n_e + r_e$, where $n_e = \lfloor eq/p \rfloor$ and $1 \le r_e \le p-1$. It is easy
to verify that $r_e = r_f$ if and only if $e = f$. For each $e \in E$, define
$$s_e = \begin{cases} r_e, & \text{if } r_e \text{ is even;} \\ p - r_e, & \text{if } r_e \text{ is odd;} \end{cases}$$
and note that $\{s_e : e \in E\} \subseteq E$. We claim that these two sets are actually equal. The only way
for them not to be equal would be if $r_e = p - r_f$ for distinct integers $e$ and $f$ in $E$. But then
$$0 \equiv r_e + r_f \equiv q(e + f) \pmod{p},$$
which implies that $p$ divides $e + f$, a contradiction to the fact that $2 < e + f < 2p$ and $e + f$ is
even. As we will use this fact momentarily, note that $s_e \equiv (-1)^{r_e} r_e \pmod{p}$ for each $e \in E$.
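Let $x = \sum_{e \in E} r_e$ and $y = \sum_{e \in E} n_e$. We claim that $(-1)^x \equiv q^{(p-1)/2} \equiv (-1)^y \pmod{p}$. To see this,
first note that
$$\prod_{e \in E} r_e \equiv \prod_{e \in E} eq = q^{(p-1)/2} \prod_{e \in E} e = q^{(p-1)/2} \prod_{e \in E} s_e \equiv q^{(p-1)/2} (-1)^x \prod_{e \in E} r_e \pmod{p}.$$
Since $p$ does not divide the product of the $r_e$, it follows that $q^{(p-1)/2} \equiv (-1)^x \pmod{p}$. Next, observe that
$$\sum_{e \in E} eq = \sum_{e \in E} (p n_e + r_e) = py + x.$$
Since the sum is even and $p$ is odd, we find that the integers $x$ and $y$ have the same parity (that
is, $x$ and $y$ are either both even or both odd)
and thus $(-1)^x = (-1)^y$. This establishes the claim.
By Euler's Criterion, we conclude that $\left(\frac{q}{p}\right) = (-1)^y$. Note that thus far in the proof, the only
property of $q$ that we have used is that $q$ is a positive integer that is relatively prime to $p$. In
particular, the results thus far are valid if $q = 2$.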
We have thus reduced the problem of determining $\left(\frac{q}{p}\right)$ to that of determining whether $y$ is
even or odd, where $y = \sum_{e \in E} \lfloor eq/p \rfloor$. For each $e \in E$, we need to count the number of integers $k$ that
satisfy $1 \le k < eq/p$. Interpreting an allowed value of $e$ and $k$ as the ordered pair $(e, k)$, we can
view the problem as counting lattice points (points with integer coordinates) in a certain region
of the plane. In particular, we are interested in lattice points with even abscissas and lying below
the line through the origin with slope $q/p$, that is, lattice points that lie completely inside triangle
ABD in the figure.
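Note that (excluding the endpoints) there are no integer lattice points on the line segment AB.
The number of integer lattice points inside rectangle ADBF with a given integer abscissa is even
(namely, $q - 1$), so the number of these points above line AB has the same parity as the number
below AB. Suppose $e \in E$ and $e > p/2$. The number of integer lattice points with abscissa $e$ above
line AB is the same as the number of integer lattice points with abscissa $p - e$ below AB (via the
correspondence $(e, k) \leftrightarrow (p - e, q - k)$). Since $p - e$ is odd,
$$\sum_{e \text{ even}} \left\lfloor \frac{qe}{p} \right\rfloor \equiv \sum_{e \text{ even},\ e < p/2} \left\lfloor \frac{qe}{p} \right\rfloor + \sum_{e \text{ odd},\ e < p/2} \left\lfloor \frac{qe}{p} \right\rfloor = \mu \pmod{2},$$
where $\mu$ is the number of lattice points that lie completely inside triangle AKH. Since $y$ and $\mu$ have the same
parity, we find that $\left(\frac{q}{p}\right) = (-1)^{\mu}$. By an analogous argument, it can be shown that $\left(\frac{p}{q}\right) = (-1)^{\nu}$,
where $\nu$ is the number of lattice points that lie completely inside triangle ALH. Since there are
$\frac{1}{2}(p-1) \cdot \frac{1}{2}(q-1)$ lattice points inside the rectangle AKHL, we have
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\mu + \nu} = (-1)^{\frac{1}{2}(p-1) \cdot \frac{1}{2}(q-1)}.$$
This completes the proof.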
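The parity count at the heart of the proof is easy to test numerically. The Python sketch below is an illustration only: the name `legendre_by_counting` is hypothetical, and it computes $(-1)^y$ with $y = \sum_{e \in E} \lfloor eq/p \rfloor$ as in the first part of the proof, assuming $p$ is an odd prime and $q$ is a positive integer relatively prime to $p$.

```python
def legendre_by_counting(q, p):
    """Return (-1)^y, where y is the sum of floor(e*q/p) over even e in 2..p-1.

    According to the proof above, this equals the Legendre symbol (q/p) when
    p is an odd prime and q is a positive integer relatively prime to p.
    """
    y = sum((e * q) // p for e in range(2, p, 2))
    return 1 if y % 2 == 0 else -1


# Compare with Euler's Criterion and with the reciprocity formula.
for p, q in [(13, 3), (13, 7), (19, 5), (29, 17)]:
    euler = 1 if pow(q, (p - 1) // 2, p) == 1 else -1
    product = legendre_by_counting(p, q) * legendre_by_counting(q, p)
    expected = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
    print(p, q, legendre_by_counting(q, p) == euler, product == expected)
```

Each line of output should end in two `True` values if the count agrees with Euler's Criterion and with the statement of the theorem.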
Since the Quadratic Reciprocity Theorem does not include the lone even prime 2, its quadratic
character is stated in a separate theorem.
THEOREM If $p$ is an odd prime, then
$$\left(\frac{2}{p}\right) = (-1)^{(p^2-1)/8}.$$
Proof. The proof depends on the observation made at the end of the second paragraph in the
proof of the Quadratic Reciprocity Theorem. The details are left as an exercise.
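The value of $\left(\frac{q}{p}\right)$ is $\pm 1$. Multiplying both sides of the equation in the general theorem by this
value gives
$$\left(\frac{p}{q}\right) = (-1)^{\frac{1}{2}(p-1) \cdot \frac{1}{2}(q-1)} \left(\frac{q}{p}\right).$$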
It may not be immediately apparent how much this equation simplifies the problem of determining
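quadratic residues. Repeating the examples from above with this new result, we find that
$$\left(\frac{5}{19}\right) = \left(\frac{19}{5}\right) = \left(\frac{4}{5}\right) = 1;$$
$$\left(\frac{17}{29}\right) = \left(\frac{29}{17}\right) = \left(\frac{12}{17}\right) = \left(\frac{4}{17}\right)\left(\frac{3}{17}\right) = \left(\frac{17}{3}\right) = \left(\frac{2}{3}\right) = -1.$$
To show how to handle much larger numbers, suppose we want to determine whether or not 73 is
a quadratic residue of 419. Using results in this section, we obtain
$$\left(\frac{73}{419}\right) = \left(\frac{419}{73}\right) = \left(\frac{54}{73}\right) = \left(\frac{2}{73}\right)\left(\frac{3}{73}\right)^3 = \left(\frac{3}{73}\right) = \left(\frac{73}{3}\right) = \left(\frac{1}{3}\right) = 1.$$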
(Be certain you follow each of the steps that appear in this equation.) In other words, the equation
$x^2 \equiv 73 \pmod{419}$ has a solution. (Can you find the value of $x$ that solves this equation?) Although
these computations take some care, they are certainly much easier than using Euler's Criterion and
attempting to compute $73^{209}$.
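The reduction strategy of this section (factor the numerator, flip each odd prime factor with the Quadratic Reciprocity Theorem, and reduce) can be written as a short recursive routine. The Python sketch below is an illustration under stated assumptions: the names `prime_factors` and `legendre` are our own, trial division is used for factoring, and the quadratic character of 2 is taken from the standard fact that 2 is a quadratic residue modulo an odd prime $p$ exactly when $p \equiv \pm 1 \pmod{8}$.

```python
def prime_factors(n):
    """Prime factors of n >= 2, with multiplicity, found by trial division."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def legendre(b, p):
    """Legendre symbol (b/p) for an odd prime p that does not divide b."""
    b %= p
    if b == 1:
        return 1
    result = 1
    for r in prime_factors(b):                       # the symbol is multiplicative
        if r == 2:
            result *= 1 if p % 8 in (1, 7) else -1   # quadratic character of 2
        else:
            flip = -1 if (p % 4 == 3 and r % 4 == 3) else 1
            result *= flip * legendre(p, r)          # quadratic reciprocity
    return result


print(legendre(5, 19), legendre(17, 29), legendre(73, 419))
```

The recursion terminates because every prime factor $r$ of the reduced numerator is smaller than $p$, so each flipped symbol has a smaller prime on the bottom.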
We have certainly not exhausted the topic of quadratic residues and the ramifications of the
Quadratic Reciprocity Theorem, but we have sufficient information to solve an interesting problem.
This is the content of the next section.
Ferdinand Gotthold Max Eisenstein. Eisenstein (1823-1852) was born to parents of limited
means and remained near poverty throughout his life. He had five younger siblings, all of whom
died in childhood, most of meningitis, which also afflicted Eisenstein. He suffered from poor health
and depression for most of his life.
Eisenstein first became interested in mathematics when he was six, thanks to a family acquaintance. In his autobiography, Eisenstein wrote, "As a boy of six I could understand the proof of a
mathematical theorem more readily than that meat had to be cut with one's knife, not one's fork."
He also had a lifelong interest in music; he played the piano and composed.
Eisenstein had some excellent and encouraging teachers in mathematics, and began reading
the work of Euler, Lagrange and Gauss at an early age. In 1843 he passed his secondary school
examinations, though he already knew far more mathematics than the standard secondary fare.
He enrolled at the University of Berlin and submitted his first paper in January of 1844. In
that year, volumes 27 and 28 of Crelle's mathematical journal contained twenty-five works by
Eisenstein, making him an overnight sensation in mathematical circles. Gauss was very impressed
by Eisenstein's early work, and wrote the preface for an 1847 collection of work by Eisenstein.
Through Crelle, Eisenstein met Alexander von Humboldt, who became his mentor, champion
and financial lifeline. Humboldt secured a series of small grants for Eisenstein, and sometimes
contributed his own funds to help Eisenstein through times between grants.
Eisenstein was minimally involved in the political unrest of 1848. He was arrested and detained
overnight, suffering severe mistreatment that hurt his already poor health. The incident also made
it even more difficult for him to find financial support; Humboldt was just barely able to find some
funding for him. Eisenstein's health deteriorated and his depression increased, so that he was often
unable to deliver his lectures, but he continued to publish papers.
In 1851 Eisenstein was elected to the Göttingen Society, and in 1852 to the Berlin Academy.
In July of 1852, his health declined precipitously when he suffered a hemorrhage. Humboldt raised
enough money to send him to recuperate in Italy for a year, but it came too late. Eisenstein died
in October of tuberculosis.
Our exposition of Eisenstein's proof is taken from "Eisenstein's Misunderstood Geometric Proof
of the Quadratic Reciprocity Theorem," by Reinhard Laubenbacher and David Pengelley, in The
College Mathematics Journal, volume 25, number 1, January 1994. Biographical information
is from the same paper, and from the article on Eisenstein, by Kurt-R. Biermann, in Biographical
Dictionary of Mathematicians, New York: Charles Scribner's Sons, 1991.
The Pythagorean Theorem states that $a^2 + b^2 = c^2$ for a right triangle with legs $a$ and $b$ and
hypotenuse $c$. A search for integer solutions to this equation can be traced back more than two
thousand years; some general solutions appear in The Elements. Simple examples include
$$5^2 = 3^2 + 4^2, \quad 13^2 = 5^2 + 12^2, \quad 17^2 = 8^2 + 15^2,$$
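$$\begin{array}{lll}
1 = 0^2 + 1^2 & 17 = 1^2 + 4^2 & 33 = \\
2 = 1^2 + 1^2 & 18 = 3^2 + 3^2 & 34 = 3^2 + 5^2 \\
3 = & 19 = & 35 = \\
4 = 0^2 + 2^2 & 20 = 2^2 + 4^2 & 36 = 0^2 + 6^2 \\
5 = 1^2 + 2^2 & 21 = & 37 = 1^2 + 6^2 \\
6 = & 22 = & 38 = \\
7 = & 23 = & 39 = \\
8 = 2^2 + 2^2 & 24 = & 40 = 2^2 + 6^2 \\
9 = 0^2 + 3^2 & 25 = 3^2 + 4^2 & 41 = 4^2 + 5^2 \\
10 = 1^2 + 3^2 & 26 = 1^2 + 5^2 & 42 = \\
11 = & 27 = & 43 = \\
12 = & 28 = & 44 = \\
13 = 2^2 + 3^2 & 29 = 2^2 + 5^2 & 45 = 3^2 + 6^2 \\
14 = & 30 = & 46 = \\
15 = & 31 = & 47 = \\
16 = 0^2 + 4^2 & 32 = 4^2 + 4^2 & 48 =
\end{array}$$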
We mentioned in the last section that a perfect square is congruent to either 0 or 1 modulo 4. It
follows that the sum of two perfect squares must be congruent to either 0, 1, or 2 modulo 4. In
other words, any integer that is congruent to 3 modulo 4 cannot be represented as a sum of two
squares. This accounts for the fact that the numbers 3, 7, 11, 15, and so on, have blank equations
in the table. However, there are quite a few other integers that also have blank equations. Our
goal is to characterize all positive integers that can be represented as a sum of two squares.
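A table like the one above can be generated by a direct search. The following Python sketch is an illustration only: the function name `two_square_reps` is our own, and it lists every pair $(a, b)$ with $0 \le a \le b$ and $a^2 + b^2 = n$.

```python
from math import isqrt


def two_square_reps(n):
    """All pairs (a, b) with 0 <= a <= b and a*a + b*b == n."""
    reps = []
    for a in range(isqrt(n // 2) + 1):     # a <= b forces a*a <= n/2
        remainder = n - a * a
        b = isqrt(remainder)
        if b * b == remainder:
            reps.append((a, b))
    return reps


for n in range(1, 49):
    print(n, two_square_reps(n))

# No integer congruent to 3 modulo 4 has a representation.
print(all(two_square_reps(n) == [] for n in range(3, 200, 4)))
```

An empty list corresponds to a blank equation in the table. Some integers, such as 25, have more than one representation, and the table records only one of them.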
As we have done several times before, we break the problem down into simpler parts. We first
make note of the following pair of algebraic identities:
$$(a^2 + b^2)(c^2 + d^2) = (ac + bd)^2 + (ad - bc)^2;$$
$$(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2.$$
These equalities show that the product of two integers that can be represented as a sum of two
squares can also be represented as a sum of two squares (and often the product has two such
representations). For example, using these identities and results from the table, we find that
$$1189 = 29 \cdot 41 = (2^2 + 5^2)(4^2 + 5^2) = 33^2 + 10^2 = 17^2 + 30^2.$$
Notice that determining whether or not 1189 can be represented as a sum of two squares reduces to
determining whether or not its prime factors 29 and 41 can be represented as sums of two squares.
In general, if the factors of a number can be represented as a sum of two squares, then the number
itself can be represented as a sum of two squares.
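The two product identities can be applied mechanically. In the Python sketch below, which is only an illustration, the function name `combine` is hypothetical; it takes representations $(a, b)$ and $(c, d)$ of two integers and returns the two representations of their product given by the identities.

```python
def combine(rep1, rep2):
    """Apply both product identities to representations (a, b) and (c, d)."""
    (a, b), (c, d) = rep1, rep2
    first = (abs(a * c + b * d), abs(a * d - b * c))
    second = (abs(a * c - b * d), abs(a * d + b * c))
    return first, second


first, second = combine((2, 5), (4, 5))     # 29 = 2^2 + 5^2 and 41 = 4^2 + 5^2
print(first, first[0] ** 2 + first[1] ** 2)
print(second, second[0] ** 2 + second[1] ** 2)
```

Both pairs produced for 29 and 41 have squares that sum to 1189.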
What can we say about the integers that cannot be represented as a sum of two squares?
The product equation given above, along with the Fundamental Theorem of Arithmetic, makes it
possible to focus on prime numbers and their powers. Using $0^2$ as one of the perfect squares, it is
easy to see that any even power of a prime number can be represented as a sum of squares. Let's
look at a few of the numbers that have blank equations in our table (and are not of the form $4k + 3$
since these have already been ruled out) and consider their prime factorizations:
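$$\begin{array}{lll}
6 = 2 \cdot 3 & 24 = 2^3 \cdot 3 & 42 = 2 \cdot 3 \cdot 7 \\
12 = 2^2 \cdot 3 & 28 = 2^2 \cdot 7 & 44 = 2^2 \cdot 11 \\
14 = 2 \cdot 7 & 30 = 2 \cdot 3 \cdot 5 & 46 = 2 \cdot 23 \\
21 = 3 \cdot 7 & 33 = 3 \cdot 11 & 48 = 2^4 \cdot 3 \\
22 = 2 \cdot 11 & 38 = 2 \cdot 19 & 54 = 2 \cdot 3^3
\end{array}$$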
We notice that each of these nonrepresentable numbers contains an odd power of a prime of the
form 4k + 3. It turns out that this property completely characterizes those integers that can be
represented as a sum of two squares.
THEOREM 3.49 A positive integer n can be represented as a sum of two squares if and only
if every prime divisor of n of the form 4k + 3 appears in the canonical representation of n with an
even exponent.
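Before reading the proof, the statement of the theorem can be checked numerically over a range of integers. The Python sketch below is an illustration only: the function names are our own, and the check covers $1 \le n \le 2000$, which of course proves nothing beyond that range.

```python
from math import isqrt


def is_sum_of_two_squares(n):
    """True when n = a*a + b*b for some integers a, b >= 0 (direct search)."""
    return any(isqrt(n - a * a) ** 2 == n - a * a for a in range(isqrt(n) + 1))


def satisfies_criterion(n):
    """True when every prime divisor of n of the form 4k + 3 has an even exponent."""
    d = 2
    while d * d <= n:
        exponent = 0
        while n % d == 0:
            n //= d
            exponent += 1
        if d % 4 == 3 and exponent % 2 == 1:
            return False
        d += 1
    # Whatever remains (if larger than 1) is a prime appearing to the first power.
    return not (n > 1 and n % 4 == 3)


print(all(is_sum_of_two_squares(n) == satisfies_criterion(n) for n in range(1, 2001)))
```

The two functions are computed independently (one by searching for squares, the other from the prime factorization), so agreement on every tested integer is a meaningful sanity check.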
Proof. Suppose first that $n \ge 1$ can be represented as a sum of two squares and, to exclude the
trivial case, assume that $n$ is not a perfect square. Let $n = a^2 + b^2$, where $a$ and $b$ are positive
integers, and let $g = (a, b)$. It follows easily that $g^2$ divides $n$ so we may write
$$n = N g^2, \quad a = Ag, \quad b = Bg, \quad N = A^2 + B^2.$$
Note that $(A, B) = 1$ (see Exercise 11 in Section 3.4). Assume that a prime number $p$ of the form
$4k + 3$ appears with an odd exponent in the canonical representation of $n$. Since $g^2$ contains only
even powers of primes that divide $n$, we know (by the Fundamental Theorem of Arithmetic) that
$p$ must divide $N$, and hence $p$ divides $A^2 + B^2$. If $p$ divides either of the integers $A$ or $B$, then
$p$ also divides the other and we would have $(A, B) \ge p$, a contradiction. Thus $p$ does not divide
either $A$ or $B$. Since $(-B^2, p) = 1$ and the equation $x^2 \equiv -B^2 \pmod{p}$ has a solution (namely, $A$),
the number $-B^2$ is a quadratic residue of $p$. By Euler's Criterion and Fermat's Little Theorem,
we find that (using modulo $p$)
$$1 \equiv (-B^2)^{(p-1)/2} \equiv (-1)^{2k+1} B^{p-1} \equiv -1,$$
a contradiction. It follows that $p$ must appear in the canonical representation of $n$ with an even
exponent.
Now suppose that every prime divisor of $n$ of the form $4k + 3$ appears in the canonical representation of $n$ with an even exponent. It is thus possible to write $n$ as
$$n = N^2 p_1 p_2 \cdots p_m,$$
where $N \ge 1$ and the $p_i$'s are distinct primes of the form $4k + 1$ with the possible exception that
one of them might be 2. As noted earlier, the product of two positive integers each of which can be
represented as a sum of two squares also can be represented as a sum of two squares. Since both
$N^2$ and 2 can be represented as a sum of two squares, it is sufficient to prove that each prime of
the form $4k + 1$ can be represented as a sum of two squares.
Let $p$ be a prime of the form $4k + 1$. By Euler's Criterion, we find that $-1$ is a quadratic residue
of $p$. Consequently, there exist positive integers $z$ and $s$ such that $2 \le z \le (p-1)/2$ and $sp = z^2 + 1$.
In other words, a multiple of $p$ can be written as a sum of two squares and the multiplier $s$ satisfies
$1 \le s < p$ since
$$sp = z^2 + 1 \le \left(\frac{p-1}{2}\right)^2 + 1 < \left(\frac{p+1}{2}\right)^2 < p^2.$$
Since the collection of all positive multiples of $p$ that can be written as a sum of two squares is
nonempty, it contains a least element (by the Well-Ordering Property), call it $s_1 p$. If $s_1 = 1$, then
we are finished. Suppose then that $s_1 > 1$ and choose positive integers $x$ and $y$ so that $s_1 p = x^2 + y^2$.
By a modification of the Division Algorithm (see Exercise 10 in Section 2.7), there exist integers
$q_1$, $r_1$, $q_2$, and $r_2$ such that
$$x = q_1 s_1 + r_1 \quad \text{and} \quad y = q_2 s_1 + r_2,$$
where $0 \le |r_i| \le s_1/2$ for each $i$. Note that $r_1$ and $r_2$ cannot both be 0 since then $s_1$ would divide
$p$, an impossibility since $1 < s_1 \le s < p$. We then have
$$s_1 p = x^2 + y^2 = s_1^2 (q_1^2 + q_2^2) + 2 s_1 (q_1 r_1 + q_2 r_2) + (r_1^2 + r_2^2),$$
Using ideas similar to those discussed in this section, it can be shown that every positive integer
can be represented as the sum of at most four squares. From here, you can branch off in several
directions. You can ask which integers can be represented as a sum of three squares or how many
different ways an integer can be written as a sum of four squares. You can then look at sums of
cubes, sums of fourth powers, and so on. Many such problems have been studied over the years
and continue to be studied today.
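Returning to primes of the form $4k + 1$, the setup used in the proof above (a multiple $sp$ written as a sum of two squares, together with remainders of absolute value at most $s/2$) suggests an algorithm that shrinks the multiplier until it equals 1. The Python sketch below follows the standard descent argument; it is offered as an illustration under that assumption rather than as a transcription of the proof, and the name `sum_of_two_squares` is our own.

```python
def sum_of_two_squares(p):
    """Write a prime p of the form 4k + 1 as x*x + y*y by descent."""
    # A quadratic nonresidue a gives z = a^((p-1)/4) with z*z = -1 modulo p.
    a = next(a for a in range(2, p) if pow(a, (p - 1) // 2, p) == p - 1)
    z = pow(a, (p - 1) // 4, p)
    x, y = z, 1
    s = (x * x + y * y) // p            # x*x + y*y = s*p with 1 <= s < p
    while s > 1:
        # Remainders of x and y modulo s with absolute value at most s/2.
        r1 = x % s
        if 2 * r1 > s:
            r1 -= s
        r2 = y % s
        if 2 * r2 > s:
            r2 -= s
        # By the product identity,
        # (x^2 + y^2)(r1^2 + r2^2) = (x*r1 + y*r2)^2 + (x*r2 - y*r1)^2,
        # and both terms on the right are divisible by s.
        x, y = abs(x * r1 + y * r2) // s, abs(x * r2 - y * r1) // s
        s = (x * x + y * y) // p        # the new multiplier is at most s/2
    return tuple(sorted((x, y)))


for p in [5, 13, 17, 29, 37, 41]:
    x, y = sum_of_two_squares(p)
    print(p, (x, y), x * x + y * y == p)
```

Each pass through the loop at least halves the multiplier, so only a few passes are needed even for fairly large primes.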
The topics considered in the last three sections of this chapter provide a glimpse at the wonderful
but difficult and subtle areas of the field of number theory. We hope these ideas make you want to
explore number theory further. You can use the bibliography for a list of some books to get you
started but there are many other sources of information on these topics.
The reader has certainly dealt with functions before, primarily in calculus, where functions from
$\mathbb{R}$ to $\mathbb{R}$ or from $\mathbb{R}^2$ to $\mathbb{R}$ are studied extensively. Most students think of functions as formulas
such as $f(x) = x^2 \sin x$ or $g(x, y) = x^2 + 2xy + y^3$, but there is much more to the concept than
these simple formulas might indicate. Perhaps you have encountered functions in a more abstract
setting as well; this is our focus. We consider the general notion of a function and examine some
of its properties. In the last few sections of the chapter, we use functions to study some interesting
topics in set theory. In particular, we explore the notion of infinity and determine ways in which
to compare the sizes of infinite sets.
Chapter 4 Functions
To see why this definition is not really a definition, note that the words "assignment" and
"rule" are synonyms for "function." As mentioned above, this problem can be resolved by defining
a function using the undefined term "set"; a function from $A$ to $B$ is a subset of the Cartesian
product $A \times B$ that satisfies certain properties. For our purposes, all that is needed is an intuitive
understanding of the concept and a way of showing two functions are equal.
We often write $f\colon A \to B$ to indicate that $f$ is a function from $A$ to $B$. For the record, whenever
we write $f\colon A \to B$, it is always assumed that $A$ and $B$ are nonempty sets. Sometimes the word
"map" or "mapping" is used instead of "function." If $f\colon A \to B$ and $f(a) = b$, then we say $b$ is the
image of $a$ under $f$ and $a$ is a preimage of $b$ under $f$. When the function is clear from the
context, the phrase "under $f$" may be dropped. The elements of $A$ are sometimes referred to as the
inputs for the function $f$ and the values $f(a)$ are the outputs of the function $f$.
It is important to note that a function consists of three parts: a domain, a codomain, and a
rule of correspondence; that is, a function is not just a rule of correspondence or a formula. In
calculus, it is common to see something like "consider the function $f(x) = \sqrt{5 - x}$." For situations
such as this, the codomain is assumed to be $\mathbb{R}$ and the domain is assumed to be the set of all real
numbers for which the formula for $f(x)$ is defined. In this case, we see that the domain is the
interval $(-\infty, 5]$. Using our new notation, we would write "consider the function $f\colon (-\infty, 5] \to \mathbb{R}$
defined by $f(x) = \sqrt{5 - x}$." When a domain is defined implicitly like this, it is often referred to as
the natural domain of the function. To emphasize the first sentence of this paragraph, the function
$g\colon [0, 5] \to [0, \infty)$ defined by $g(x) = \sqrt{5 - x}$ is not the same as the function $f$; the rule is the same
but both the domain and the codomain are different. In practice, however, the sets $A$ and $B$ are
often clear from the context and we refer to "the function $f$" as opposed to always writing "the
function $f\colon A \to B$."
Let $A$ and $B$ be nonempty sets. A rule of correspondence that attempts to define a function
$f\colon A \to B$ is well-defined if for each $a \in A$ there is exactly one value for $f(a)$. To illustrate what
is meant by this, consider the following attempt to define a function: "for each real number $x$,
let $f(x)$ be a real number whose square is $x$." There are two problems with this definition. First
of all, if $x < 0$, then there is no value for $f(x)$. This problem can be eliminated by writing "for
each nonnegative real number $x$, let $f(x)$ be a real number whose square is $x$." However, this does
not remove the second problem; for each $x > 0$, there are two real numbers whose square is $x$.
Hence, all positive inputs generate two outputs, something that is not allowed in the definition of a
function. The bottom line is that this rule of correspondence does not define a function. However,
the function $f\colon [0, \infty) \to \mathbb{R}$ defined by $f(x) = \sqrt{x}$ is a valid function. (By convention, the symbol
$\sqrt{x}$ denotes the nonnegative square root of $x$.)
The reader should be familiar with many functions of the form $f\colon \mathbb{R} \to \mathbb{R}$: polynomial functions,
trigonometric functions, exponential functions, and so on. Usually these functions have codomain $\mathbb{R}$
and their domain is some subset of $\mathbb{R}$. For example, $f(x) = \sqrt{x}$ has domain $[0, \infty)$ and $f(x) = 1/x$
has domain $\{x \in \mathbb{R} : x \ne 0\}$. (The domain of these functions is the natural domain discussed
earlier.) It is easy to see that a subset of the plane is the graph of a function $f\colon \mathbb{R} \to \mathbb{R}$ if and only
if every vertical line intersects the graph at exactly one point. If this point is $(a, b)$, then $f(a) = b$.
Functions on finite sets can be defined by listing all the assignments. If A = {1, 2, 3, 4} and
B = {r, s, t, u, v}, then f (1) = t, f (2) = s, f (3) = u, f (4) = t defines a function from A to B.
The assignment can be done quite arbitrarily, without recourse to any particular formula. Note
that the images of both 1 and 4 are t. This is consistent with the definition of a function. The
definition insists that each input have exactly one output, but different inputs may have the same
output; this is an important distinction to remember.
In calculus and analysis, the rule of correspondence for a function is often given by an explicit
formula. For example, we can define a function $h\colon \mathbb{R} \to \mathbb{R}$ by $h(x) = x^2$. This function assigns the
real number $x^2$ to the real number $x$. However, any rule of correspondence that assigns to each
element of $A$ a unique element of $B$ is a function, even if it does not involve a formula. As an
example of this situation from a calculus perspective, for each real number $x$, let $u(x)$ be the real
number for which
$$(u(x))^7 + (u(x))^5 + (u(x))^3 + u(x) + 1 = x.$$
It can be shown that this defines a function $u\colon \mathbb{R} \to \mathbb{R}$. There is no explicit formula that gives the
values of this function, but it still satisfies the definition of a function; for each real number $x$ there
is exactly one real number $u(x)$. (What theorems from calculus are necessary to prove this?)
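Although $u$ has no explicit formula, its values can be approximated numerically. The Python sketch below is an illustration only: it uses bisection, which relies on the polynomial $t^7 + t^5 + t^3 + t + 1$ being continuous and strictly increasing, and the helper name `poly` and the fixed iteration count are our own choices.

```python
def poly(t):
    """The left-hand side of the defining equation, as a function of t = u(x)."""
    return t**7 + t**5 + t**3 + t + 1


def u(x, iterations=200):
    """Approximate the unique real number t with poly(t) == x by bisection."""
    lo, hi = -1.0, 1.0
    while poly(lo) > x:          # widen the bracket until poly(lo) <= x
        lo *= 2
    while poly(hi) < x:          # widen the bracket until poly(hi) >= x
        hi *= 2
    for _ in range(iterations):  # halve the bracket a fixed number of times
        mid = (lo + hi) / 2
        if poly(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(u(1), u(5), u(-3))
```

Since `poly(0)`, `poly(1)`, and `poly(-1)` equal 1, 5, and $-3$ respectively, the three printed values should be close to 0, 1, and $-1$.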
EXAMPLE 4.2 For $A = \{1, 2, 3, 4, 5\}$ and $B = \{r, s, t, u\}$, consider the following correspondences $f$, $g$, and $h$:
$$f(1) = t; \quad f(2) = s; \quad f(3) = r; \quad f(3) = u; \quad f(4) = u; \quad f(5) = r;$$
$$g(1) = u; \quad g(2) = r; \quad g(4) = s; \quad g(5) = t;$$
$$h(1) = r; \quad h(2) = r; \quad h(3) = s; \quad h(4) = s; \quad h(5) = s.$$
The correspondences $f$ and $g$ are not functions from $A$ to $B$. The problem is that $f$ maps 3 to two
values and $g$ doesn't map 3 to any values. When listing the assignments for a function the elements
of the domain must appear exactly once. Elements of the codomain may appear more than once
or not at all; the correspondence $h$ is a function from $A$ to $B$ even though the element $s$ of the
codomain has three preimages and $t$ has none. We discuss this situation at length in later sections.
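The test applied in this example (every element of the domain appears exactly once as an input) is easy to automate for finite sets. In the Python sketch below, which is only an illustration, a correspondence is stored as a list of (input, output) pairs and the function name `is_function` is hypothetical.

```python
def is_function(pairs, domain, codomain):
    """True when the pairs define a function from domain to codomain.

    Each element of the domain must appear exactly once as an input, and
    every pair must consist of a domain element and a codomain element.
    """
    inputs = [a for a, _ in pairs]
    return (all(inputs.count(a) == 1 for a in domain)
            and all(a in domain and b in codomain for a, b in pairs))


A = {1, 2, 3, 4, 5}
B = {"r", "s", "t", "u"}
f = [(1, "t"), (2, "s"), (3, "r"), (3, "u"), (4, "u"), (5, "r")]
g = [(1, "u"), (2, "r"), (4, "s"), (5, "t")]
h = [(1, "r"), (2, "r"), (3, "s"), (4, "s"), (5, "s")]

print(is_function(f, A, B), is_function(g, A, B), is_function(h, A, B))
```

Only the third correspondence passes the test, in agreement with the discussion above.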
Some functions are common enough to be given special names. Suppose that $A$ and $B$ are
nonempty sets. We define the identity function $i_A\colon A \to A$ by the rule $i_A(a) = a$ for all $a \in A$.
In other words, the identity function maps every element to itself. Though this seems like a rather
trivial concept, it is useful and important. As we will see, identity functions behave in much the
same way that 0 does with respect to addition or 1 does with respect to multiplication. If $b_0$ is a
fixed element of $B$, we can define a constant function $f\colon A \to B$ by the formula $f(a) = b_0$ for all
$a \in A$. There are as many constant functions from $A$ to $B$ as there are elements of $B$. Finally, if
$A \subseteq B$, define the inclusion function $f\colon A \to B$ by $f(a) = a$ for every $a \in A$. This is very similar
to $i_A$; the only difference is the codomain.
DEFINITION 4.3 If $f\colon A \to B$ and $g\colon B \to C$ are functions, define $g \circ f\colon A \to C$ by the rule
$(g \circ f)(a) = g(f(a))$ for all $a \in A$. This is called the composition of the two functions.
Note that the domain of $g \circ f$ is the same as the domain of $f$ and that the codomain of $g \circ f$ is
the same as the codomain of $g$. Observe that $f$ is the first function that is applied to an element
$a$ though it is listed on the right. This violation of the usual left-to-right convention sometimes
causes confusion so be careful. Composite functions appear frequently in calculus. If $f\colon [0, \infty) \to \mathbb{R}$
is defined by $f(x) = \sqrt{x}$ and $g\colon \mathbb{R} \to \mathbb{R}$ is defined by $g(x) = \sin x$, then
$(g \circ f)(x) = \sin \sqrt{x}$. Note that the function $f \circ g$, which is defined by the formula $(f \circ g)(x) = \sqrt{\sin x}$,
makes sense only for those values of $x$ such that $\sin x \ge 0$. In general, the functions $f \circ g$ and $g \circ f$
are not equal, and (as in this case) they need not be defined at the same points. Thus the operation
of composition is not commutative.
If $A$, $B$, and $C$ are nonempty sets for which $A \subseteq B$, $f\colon A \to B$ is the inclusion function, and
$g\colon B \to C$ is a function, then $g \circ f\colon A \to C$ is called the restriction of $g$ to $A$ and is usually written
$g|_A$. For all $a \in A$,
$$g|_A(a) = g(f(a)) = g(a),$$
that is, the rule for $g|_A$ is the same as it is for the function $g$ but the domain of $g|_A$ is a smaller
set. In particular, the functions $g$ and $g|_A$ are not the same unless $A = B$.
EXAMPLE 4.4 Let $A = \{1, 2, 3, 4\}$, $B = \{r, s, t, u\}$, and $C = \{\$, \%, \#, \&\}$, then for the
functions $f\colon A \to B$ and $g\colon B \to C$ defined by
$$f(1) = u; \quad f(2) = r; \quad f(3) = s; \quad f(4) = u;$$
$$g(r) = \%; \quad g(s) = \#; \quad g(t) = \$; \quad g(u) = \$;$$
we have
$$(g \circ f)(1) = \$; \quad (g \circ f)(2) = \%; \quad (g \circ f)(3) = \#; \quad (g \circ f)(4) = \$.$$
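For functions on finite sets, composition amounts to chaining table lookups. The Python sketch below is only an illustration: the functions of this example are stored as dictionaries and the helper name `compose` is our own.

```python
def compose(g, f):
    """Return the composition g after f, as a dictionary on the domain of f."""
    return {a: g[f[a]] for a in f}


f = {1: "u", 2: "r", 3: "s", 4: "u"}
g = {"r": "%", "s": "#", "t": "$", "u": "$"}

print(compose(g, f))
```

Note that `compose(f, g)` would fail with a `KeyError`, since the outputs of `g` are not inputs of `f`; this mirrors the fact that $f \circ g$ is not defined here.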
THEOREM If $f\colon A \to B$, then $f \circ i_A = f$ and $i_B \circ f = f$.
Proof. All three functions $f$, $f \circ i_A$, and $i_B \circ f$ have domain $A$ and codomain $B$; these sets are
implicitly assumed for the composite functions. For every $a \in A$,
$$(f \circ i_A)(a) = f(i_A(a)) = f(a) \quad \text{and} \quad (i_B \circ f)(a) = i_B(f(a)) = f(a).$$
Since the values of these functions are the same for each $a$ in the domain $A$, the functions are
equal.
As we will see throughout this chapter, sets and functions are intimately related. In this section,
we begin to explore some basic connections between them. Suppose $f\colon A \to B$ is a function. If
$X \subseteq A$, define a set $f(X) \subseteq B$ by
$$f(X) = \{b \in B : \exists a \in X\,(b = f(a))\}$$
Context should always make it clear what is meant by the function f , but you should be aware of
the potential misinterpretation.
EXAMPLE 4.7 Suppose $A = \{1, 2, 3, 4, 5, 6\}$ and $B = \{r, s, t, u, v, w\}$ and define a function
$f\colon A \to B$ by $f(1) = r$, $f(2) = s$, $f(3) = v$, $f(4) = t$, $f(5) = r$, $f(6) = v$. Then
$$f(\{1, 3, 5\}) = \{r, v\}; \quad f(\{4, 5, 6\}) = \{r, t, v\};$$
$$f^{-1}(\{r, t, u\}) = f^{-1}(\{r, t\}) = \{1, 4, 5\}; \quad f^{-1}(\{u, w\}) = \emptyset.$$
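The sets in this example can be computed directly from the definitions. In the Python sketch below, which is only an illustration, the function is stored as a dictionary and the helper names `image` and `preimage` are our own.

```python
def image(f, X):
    """The set f(X) of all outputs f(a) with a in X."""
    return {f[a] for a in X}


def preimage(f, Y):
    """The set of all inputs a whose output f(a) lies in Y."""
    return {a for a in f if f[a] in Y}


f = {1: "r", 2: "s", 3: "v", 4: "t", 5: "r", 6: "v"}

print(image(f, {1, 3, 5}), image(f, {4, 5, 6}))
print(preimage(f, {"r", "t", "u"}), preimage(f, {"u", "w"}))
```

The preimage is defined for any subset of the codomain, including sets such as $\{u, w\}$ whose elements have no preimages at all.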
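THEOREM Suppose that $f\colon A \to B$ is a function and that $Y$ and $Z$ are subsets of $B$. Then
a) $f^{-1}(Y \cup Z) = f^{-1}(Y) \cup f^{-1}(Z)$,
b) $f^{-1}(Y \cap Z) = f^{-1}(Y) \cap f^{-1}(Z)$.
Proof. We prove part (b) and leave a proof of part (a) as an exercise. Note that the three sets
that appear in part (b) are all subsets of $A$. Suppose that $a \in A$. We then have
$$\begin{array}{rcll}
a \in f^{-1}(Y \cap Z) & \Leftrightarrow & f(a) \in Y \cap Z & \text{definition of } f^{-1} \\
& \Leftrightarrow & f(a) \in Y \text{ and } f(a) \in Z & \text{definition of } \cap \\
& \Leftrightarrow & a \in f^{-1}(Y) \text{ and } a \in f^{-1}(Z) & \text{definition of } f^{-1} \\
& \Leftrightarrow & a \in f^{-1}(Y) \cap f^{-1}(Z), & \text{definition of } \cap
\end{array}$$
which shows that the two sets are equal.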
THEOREM Suppose that $f\colon A \to B$ is a function and that $W$ and $X$ are subsets of $A$. Then
a) $f(W \cup X) = f(W) \cup f(X)$,
b) $f(W \cap X) \subseteq f(W) \cap f(X)$.
Proof. Once again, we prove part (b) and leave part (a) as an exercise. The three sets that
appear in part (b) are all subsets of $B$. If $f(W \cap X)$ is empty, we are done. Otherwise, suppose
that $b \in f(W \cap X)$. This means that $b = f(a)$ for some $a \in W \cap X$. Since $a \in W \cap X$, it follows that
$a$ is in both $W$ and $X$. Thus $b = f(a)$ belongs to both $f(W)$ and $f(X)$, that is, $b \in f(W) \cap f(X)$.
Since every $b$ that belongs to the set $f(W \cap X)$ also belongs to the set $f(W) \cap f(X)$, we find that
$f(W \cap X) \subseteq f(W) \cap f(X)$.
It is perhaps surprising to compare these two theorems and observe that of the two induced
set functions, it is $f^{-1}$ that is better behaved with respect to the usual set operations.
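A small example shows that the image of an intersection can be a proper subset of the intersection of the images. The Python sketch below is an illustration only; the function `f` and the sets `W` and `X` are hypothetical choices made for this purpose.

```python
def image(f, X):
    """The set f(X) of all outputs f(a) with a in X."""
    return {f[a] for a in X}


f = {1: "r", 2: "r"}              # hypothetical function sending 1 and 2 to r
W, X = {1}, {2}

left = image(f, W & X)            # image of the (empty) intersection
right = image(f, W) & image(f, X) # intersection of the images

print(left, right, left < right)
```

Here `left` is empty while `right` contains `r`, so the containment is strict. The example depends on two different inputs sharing an output, which is exactly the behavior ruled out for injective functions.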
Two simple properties that functions may have turn out to be exceptionally useful. If the codomain
of a function is also its range, then the function is onto or surjective. If a function does not map
two different elements in the domain to the same element in the range, it is one-to-one or injective.
In this section, we define these concepts officially in terms of preimages, and explore some easy
examples and consequences. Recall that a function is a rule of correspondence along with two sets,
a domain and a codomain; it is not just a rule. This distinction is very important to remember.
DEFINITION 4.11 Let $A$ and $B$ be nonempty sets. A function $f\colon A \to B$ is injective if each
$b \in B$ has at most one preimage in $A$.
An injective function is called an injection. An injection may also be referred to as a one-to-one (or 1-1) function; some people consider this term to be less formal than injection. Note that
the definition has several equivalent formulations:
$$f \text{ is injective} \quad \Leftrightarrow \quad \forall b \in f(A)\ \exists!\, a \in A\ (f(a) = b)$$
g(1) = r;
g(2) = t;
f (3) = r;
g(3) = r.
The function f is injective since r, s, and t each have one preimage and u and v each have no
preimages. On the other hand, the function g fails to be injective since r has more than one
preimage. In general, if $A \subseteq B$, then the inclusion map from $A$ to $B$ is injective. In particular, the
identity function is injective.
To illustrate injective functions with functions from calculus, define functions $f : \mathbb{R} \to \mathbb{R}$ and
$g : \mathbb{R} \to \mathbb{R}$ by $f(x) = x^2$ and $g(x) = 2^x$, respectively. The function f fails to be injective because any
positive number has two preimages (its positive and negative square roots). On the other hand,
the function g is injective. To see this, note that $g(x) = b$ has one solution when $b > 0$ (namely,
$\log_2 b$) and no solution when $b \le 0$. The reader might find it helpful to formulate a "horizontal line
test" to determine if a function of the form $h : \mathbb{R} \to \mathbb{R}$ is injective or not.
Referring to the list of equivalent formulations for the definition of an injection, the third one
shows that a proof that a function is injective is essentially a uniqueness proof. Hence, a common
way to prove that a function $f : A \to B$ is injective is to assume that $f(a_1) = f(a_2)$ for two elements
$a_1$ and $a_2$ of A, then prove that $a_1 = a_2$. It follows that each element of the codomain has at most
one preimage. This method of proof is illustrated in the following example.
EXAMPLE 4.12 Consider the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^3 + 4x + 7$. To prove that
f is injective, suppose that there exist two real numbers x and y such that $f(x) = f(y)$. Then
$$x^3 + 4x + 7 = y^3 + 4y + 7;$$
$$(x^3 - y^3) + 4(x - y) = 0;$$
$$(x - y)(x^2 + xy + y^2 + 4) = 0;$$
$$(x - y) \cdot \frac{x^2 + y^2 + (x + y)^2 + 8}{2} = 0.$$
Since the second term in the last product is clearly positive, we find that x = y. It follows that f
is injective. (We have presented a proof that involves algebra only. It is possible to prove that f
is injective using some theorems from calculus, but calculus results are deeper than algebra results
so some people might view such a proof as cheating.)
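The algebra in Example 4.12 can be spot-checked numerically. The following is a minimal sketch, not a proof: it assumes integer inputs only, and the helper name `factored_difference` is introduced here for illustration. It compares $f(x) - f(y)$ with the final factored form on a grid of integers.

```python
def f(x):
    """The function of Example 4.12."""
    return x**3 + 4 * x + 7


def factored_difference(x, y):
    """Evaluate (x - y) * (x^2 + y^2 + (x + y)^2 + 8) / 2 for integers x and y.

    The second factor equals 2x^2 + 2xy + 2y^2 + 8, which is even,
    so the integer division below is exact.
    """
    return (x - y) * (x * x + y * y + (x + y) ** 2 + 8) // 2


grid = range(-20, 21)
# The factored form agrees with f(x) - f(y) at every grid point.
identity_holds = all(
    f(x) - f(y) == factored_difference(x, y) for x in grid for y in grid
)
# No two distinct integers in the sample share a value of f.
values_distinct = len({f(x) for x in range(-50, 51)}) == 101
print(identity_holds, values_distinct)
```

A check of this kind only samples finitely many points; the argument in the example is what establishes injectivity on all of the real numbers.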
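The next result shows how injections behave under composition.
THEOREM 4.13 If $f : A \to B$ and $g : B \to C$ are injective functions, then $g \circ f : A \to C$ is an
injective function.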
Proof. Suppose there exist elements u and v in A for which $g(f(u)) = g(f(v))$. Since g is
injective, we know that $f(u) = f(v)$. Since f is injective, it follows that $u = v$. Hence, the function
$g \circ f : A \to C$ is injective.
We now turn to the other property of functions that we mentioned in the introduction of this
section. The notion of a surjective function is dual to that of an injective function.
DEFINITION 4.14 Let A and B be nonempty sets. A function $f : A \to B$ is surjective if each
$b \in B$ has at least one preimage in A.
A surjective function is called a surjection. A surjection may also be called an onto function;
some people consider this term to be less formal than "surjection." As with injective functions, the
definition has several equivalent formulations:
f is surjective
$\iff \forall b \in B\ \exists\, a \in A\ (f(a) = b)$
$\iff B = f(A)$
$\iff$ the range of f is B.
As the last form indicates, a function $f : A \to B$ is a surjection if its range is the same as its
codomain. For example, let A = {1, 2, 3, 4, 5} and B = {r, s, t} be sets and define functions f and
g mapping A into B by
f(1) = s; f(2) = r; f(3) = s; f(4) = t; f(5) = r;
g(1) = t; g(2) = r; g(3) = r; g(4) = t; g(5) = t.
For the function f , the elements r, s, and t have 2, 2, and 1 preimages, respectively, so f is
surjective. For the function g, the element s has no preimages. It follows that g is not surjective.
For any nonempty set A, the identity map $i_A : A \to A$ is both injective and surjective.
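For functions between finite sets, the preimage counts used in the definitions can be tabulated directly. The sketch below is one possible way to do this, assuming a function is stored as a Python dictionary; the helper names `preimage_counts`, `is_injective`, and `is_surjective` are illustrative choices, not notation from the text.

```python
from collections import Counter


def preimage_counts(f, codomain):
    """Return {b: number of preimages of b under f} for every b in the codomain."""
    counts = Counter(f.values())
    return {b: counts.get(b, 0) for b in codomain}


def is_injective(f, codomain):
    """Each element of the codomain has at most one preimage."""
    return all(c <= 1 for c in preimage_counts(f, codomain).values())


def is_surjective(f, codomain):
    """Each element of the codomain has at least one preimage."""
    return all(c >= 1 for c in preimage_counts(f, codomain).values())


# The functions f and g from the example with A = {1, 2, 3, 4, 5} and B = {r, s, t}.
B = ["r", "s", "t"]
f = {1: "s", 2: "r", 3: "s", 4: "t", 5: "r"}
g = {1: "t", 2: "r", 3: "r", 4: "t", 5: "t"}

f_counts = preimage_counts(f, B)
g_counts = preimage_counts(g, B)
print(f_counts, is_surjective(f, B))
print(g_counts, is_surjective(g, B))
```

Note that the codomain is passed explicitly: the dictionary alone records only the rule and the domain, which is exactly the distinction the text stresses.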
To illustrate surjective functions with functions from calculus, define functions $f : \mathbb{R} \to \mathbb{R}$ and
$g : \mathbb{R} \to \mathbb{R}$ by $f(x) = 3^x$ and $g(x) = x^3$, respectively. Since $3^x$ is always positive, the function f
is not surjective (any $b \le 0$ has no preimages). On the other hand, for any $b \in \mathbb{R}$, the equation
$b = g(x)$ has a solution (namely $x = \sqrt[3]{b}$) so b has a preimage under g. Therefore, the function g is
surjective. As with injective functions, the reader might find it helpful to formulate a "horizontal
line test" to determine if a function of the form $h : \mathbb{R} \to \mathbb{R}$ is surjective or not.
As we have mentioned before, a function consists of two nonempty sets and a rule of correspondence. Since the definitions of an injection and a surjection depend on the domain and codomain,
it is important to be clear what these sets are. For example, we cannot use the formula $f(x) = x^2$
to decide if f is injective or surjective; we need to know the domain and codomain. To be specific,
1. the function $f_1 : \mathbb{R} \to \mathbb{R}$ defined by $f_1(x) = x^2$ is neither injective nor surjective;
2. the function $f_2 : [0, \infty) \to \mathbb{R}$ defined by $f_2(x) = x^2$ is injective but not surjective;
3. the function $f_3 : [-1, 1] \to [0, 1]$ defined by $f_3(x) = x^2$ is surjective but not injective;
4. the function $f_4 : [0, \infty) \to [0, \infty)$ defined by $f_4(x) = x^2$ is both injective and surjective.
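The same phenomenon can be seen with finite sets. The sketch below is only a finite analogue of the four cases above, not the real-valued functions themselves: the small integer domains and codomains are assumptions chosen for illustration, and `classify` is a hypothetical helper name. The rule is always squaring; only the two sets change.

```python
def classify(domain, codomain, rule):
    """Return (injective, surjective) for the function domain -> codomain given by rule."""
    images = [rule(x) for x in domain]
    if not all(y in codomain for y in images):
        raise ValueError("the rule must map the domain into the codomain")
    injective = len(set(images)) == len(images)  # no value is hit twice
    surjective = set(images) == set(codomain)    # every element is hit
    return injective, surjective


def square(x):
    return x * x


cases = {
    "f1": classify(range(-2, 3), range(0, 5), square),  # neither
    "f2": classify(range(0, 3), range(0, 5), square),   # injective only
    "f3": classify(range(-2, 3), [0, 1, 4], square),    # surjective only
    "f4": classify(range(0, 3), [0, 1, 4], square),     # both
}
for name, (inj, sur) in cases.items():
    print(name, "injective:", inj, "surjective:", sur)
```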
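The following result is the analogue of Theorem 4.13. Its proof is left as an exercise.
THEOREM 4.15 If $f : A \to B$ and $g : B \to C$ are surjective functions, then $g \circ f : A \to C$ is a
surjective function.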
11. Suppose that A and B are nonempty sets. The function $p : A \times B \to B$ defined by $p((a, b)) = b$ is
called the projection onto B. Prove that p is surjective. Under what conditions is p injective?
Injections and surjections are alike but different, much as intersection and union are alike but
different. This is another example of duality.
Since $g \circ f$ is injective, we find that $a_1 = a_2$. Hence, the function f is injective. We leave a proof
of part (b) as an exercise.
Let A be a nonempty set and let F represent the collection of all functions mapping A into A.
Composition defines an operation on F that sends the pair (f, g) to $f \circ g$. As we have seen, this operation is associative but it
may not be commutative. For this operation, we can ask questions such as
if $f \circ g_1 = f \circ g_2$, does it follow that $g_1 = g_2$?
if $f_1 \circ g = f_2 \circ g$, does it follow that $f_1 = f_2$?
The next result shows that such cancellations are valid for injective and surjective functions in
certain circumstances. The results are given in a more general setting than in this brief introduction
to function spaces. As in Theorem 4.16, the result in the two cases is the same, but different.
THEOREM 4.17 Suppose that $f_1$ and $f_2$ are functions mapping A into B, that g is a function
mapping B into C, and that $h_1$ and $h_2$ are functions mapping C into D.
a) If g is injective and $g \circ f_1 = g \circ f_2$, then $f_1 = f_2$.
b) If g is surjective and $h_1 \circ g = h_2 \circ g$, then $h_1 = h_2$.
Proof. To prove part (b), assume that g is surjective and that $h_1 \circ g = h_2 \circ g$. We must show
that $h_1(c) = h_2(c)$ for each $c \in C$. Let $c \in C$. Since the function g is surjective, there exists $b \in B$
such that $g(b) = c$. It then follows that
$$h_1(c) = h_1(g(b)) = h_2(g(b)) = h_2(c).$$
Since this equality is valid for every $c \in C$, the functions $h_1$ and $h_2$ are equal. We leave a proof of
part (a) as an exercise.
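On small finite sets, both cancellation laws can be tested by brute force. The following sketch assumes functions are stored as dictionaries; the sets and the helper names `all_functions`, `compose`, `left_cancels`, and `right_cancels` are illustrative assumptions. It shows cancellation succeeding when the hypothesis of Theorem 4.17 holds and failing in two cases where it does not.

```python
from itertools import product


def all_functions(domain, codomain):
    """Yield every function domain -> codomain as a dictionary."""
    for values in product(codomain, repeat=len(domain)):
        yield dict(zip(domain, values))


def compose(outer, inner):
    """Return the composition x -> outer(inner(x)) as a dictionary."""
    return {x: outer[inner[x]] for x in inner}


def left_cancels(g, A, B):
    """Check part (a): g o f1 == g o f2 forces f1 == f2, over all f1, f2 : A -> B."""
    fs = list(all_functions(A, B))
    return all(
        f1 == f2 for f1 in fs for f2 in fs if compose(g, f1) == compose(g, f2)
    )


def right_cancels(g, C, D):
    """Check part (b): h1 o g == h2 o g forces h1 == h2, over all h1, h2 : C -> D."""
    hs = list(all_functions(C, D))
    return all(
        h1 == h2 for h1 in hs for h2 in hs if compose(h1, g) == compose(h2, g)
    )


small, medium, large, two = [1, 2], ["r", "s"], ["x", "y", "z"], [0, 1]
g_injective = {"r": "x", "s": "z"}             # medium -> large, not surjective
g_surjective = {"x": "r", "y": "s", "z": "s"}  # large -> medium, not injective

print(left_cancels(g_injective, small, medium))   # hypothesis of (a) holds
print(right_cancels(g_surjective, medium, two))   # hypothesis of (b) holds
print(right_cancels(g_injective, large, two))     # g not surjective
print(left_cancels(g_surjective, small, large))   # g not injective
```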
f (2) = t,
f (3) = t,
f (4) = r,
g(s) = 3,
g(t) = 2,
is one of several pseudo-inverses of f . The important point is that g must map r to either 1 or 4,
and t to either 2 or 3.
As this example illustrates, any $f : A \to B$ has a pseudo-inverse. We are usually interested in a
pseudo-inverse when f is either injective or surjective. The next result indicates some of the useful
information that can be obtained in these cases.
of f, we know that g(b) must belong to the set $f^{-1}(\{b\})$. However, since f is injective, this set
contains only a, so we must have $g(b) = a$. It follows that g is surjective. Note that $g \circ f = i_A$; we
say g is a left inverse of f.
Now suppose that $f : A \to B$ is surjective and let $g : B \to A$ be any pseudo-inverse of f. To prove
that g is injective, suppose that $g(b_1) = g(b_2)$ for two elements $b_1$ and $b_2$ in B. By the definition of
a function, we know that $f(g(b_1)) = f(g(b_2))$. Since f is surjective, it follows from the definition
of a pseudo-inverse that $g(b_1)$ is a preimage under f of $b_1$ and $g(b_2)$ is a preimage under f of
$b_2$. This means that $b_1 = f(g(b_1)) = f(g(b_2)) = b_2$ and we conclude that g is injective. Note that
$f \circ g = i_B$; we say g is a right inverse of f.
and $g : B \to A$ by
f(1) = s, f(2) = v, f(3) = w, f(4) = r;
g(r) = 4, g(s) = 1, g(t) = 2, g(u) = 4, g(v) = 2, g(w) = 3.
For the sets A = {1, 2, 3, 4, 5} and B = {r, s, t}, define two functions $f : A \to B$ and $g : B \to A$ by
f(1) = r, f(2) = t, f(3) = t, f(4) = r, f(5) = s;
g(r) = 4, g(s) = 5, g(t) = 2.
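The two pairs of functions just listed can be checked mechanically. The sketch below assumes the working definition suggested by the proof above, namely that g is a pseudo-inverse of f when g(b) is a preimage of b for every b in the range of f; the names `f_inj`, `g_inj`, `f_sur`, `g_sur`, and `is_pseudo_inverse` are introduced here for illustration.

```python
def is_pseudo_inverse(f, g):
    """g(b) must be a preimage of b under f whenever b has any preimage at all."""
    image = set(f.values())
    return all(f[g[b]] == b for b in g if b in image)


# First pair: f is injective, so g turns out to be a left inverse.
f_inj = {1: "s", 2: "v", 3: "w", 4: "r"}
g_inj = {"r": 4, "s": 1, "t": 2, "u": 4, "v": 2, "w": 3}

# Second pair: f is surjective, so g turns out to be a right inverse.
f_sur = {1: "r", 2: "t", 3: "t", 4: "r", 5: "s"}
g_sur = {"r": 4, "s": 5, "t": 2}

g_after_f = {a: g_inj[f_inj[a]] for a in f_inj}  # the composition g o f on A
f_after_g = {b: f_sur[g_sur[b]] for b in g_sur}  # the composition f o g on B

print(is_pseudo_inverse(f_inj, g_inj), g_after_f)
print(is_pseudo_inverse(f_sur, g_sur), f_after_g)
```

In the first pair g is surjective but not injective (both r and u are sent to 4), and in the second pair g is injective but not surjective (1 and 3 are never values of g), matching the two halves of the proof.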
As we have seen, functions that are injections or surjections have special properties that a general
function does not have. What happens if a function is both injective and surjective?
DEFINITION 4.23 A function $f : A \to B$ is bijective if each $b \in B$ has exactly one preimage.
A bijective function is called a bijection.
Since "at least one" combined with "at most one" gives "exactly one," a function f is a bijection
if and only if it is both an injection and a surjection. Consider the following examples of bijections.
If A = {1, 2, 3, 4} and B = {r, s, t, u}, then the function $f : A \to B$ defined by
f(1) = u, f(2) = r, f(3) = t, f(4) = s,
is a bijection.
The functions $f : \mathbb{R} \to \mathbb{R}$ and $F : \mathbb{R} \to \mathbb{R}^+$ (where $\mathbb{R}^+$ denotes the set of positive real numbers)
given by $f(x) = x^5$ and $F(x) = 5^x$ are bijections.
For a nonempty set A, the identity function $i_A : A \to A$ is a bijection.
It should be clear why a bijection is also called a one-to-one correspondence.
DEFINITION 4.24 If $f : A \to B$ and $g : B \to A$ are functions, we say g is an inverse of f (and
f is an inverse of g) if and only if $f \circ g = i_B$ and $g \circ f = i_A$.
The idea behind an inverse is that f sends an element a in A to an element b in B and then g
sends it right back. Referring to the examples given above, we have the following:
For the function $f : A \to B$, a function $g : B \to A$ defined by
g(r) = 2, g(s) = 4, g(t) = 3, g(u) = 1,
is an inverse of f.
$(\sqrt[5]{x})^5 = x$, $\sqrt[5]{x^5} = x$, $\log_5 5^x = x$, $5^{\log_5 x} = x$,
respectively. Note carefully the domains and codomains of the functions f and g and the
functions F and G and the corresponding values of x for which the above equations are valid.
Proof. Suppose first that f has an inverse and let g be an inverse of f. Since $g \circ f = i_A$ is
injective, the function f is injective (see part (a) of Theorem 4.16). Since $f \circ g = i_B$ is surjective,
the function f is surjective (see part (b) of Theorem 4.16). Since f is injective and surjective, it is
bijective.
Conversely, suppose f is bijective. For each $b \in B$ there exists (the surjective part) a unique
(the injective part) $a \in A$ such that $f(a) = b$. Let $g(b) = a$; this defines a function $g : B \to A$. It is
easy to verify that $f \circ g = i_B$ and $g \circ f = i_A$, showing that g is an inverse of f.
We have talked about an inverse of f , but really there is only one.
Consider the seemingly innocuous statement "The set of real numbers is larger than the set of
rational numbers." On the surface, this appears to be a simplistic observation. Since every rational
number is a real number and there are real numbers that are not rational numbers, it seems clear
that the set of real numbers contains more elements than the set of rational numbers. However,
consider the statement "there are more irrational numbers than there are rational numbers." Since
these two sets are disjoint, this statement is not as easy to dismiss; for that matter, since the sets are
both infinite, it is not all that clear what it even means. How can one set be "more infinite" than
another set? To answer this question, we must first agree on a definition of size for infinite sets.
For the usual sorts of sets we encounter every day, the question "When are two sets A and B
the same size?" has a simple answer; the sets A and B have the same size when A and B have
the same number of elements. When entering the realm of infinite sets, we need to be much more
careful than this. A good way to motivate the definition of size for infinite sets is with a thought
experiment. Consider a large auditorium that is filled with people. The fire code states that each
person must have a seat, while the management wants every seat full in order to maximize revenue.
How can you determine if both conditions are met? One method is to actually count the number
of people and to count the number of seats. If there are 9852 people and 9852 seats, then everyone
is satisfied. This would be a tedious task, and errors in counting could easily occur. A much more
efficient method is to have everyone take a seat. If each person has a seat and if no seat is empty,
then there are the same number of people as seats. This is a true statement even if you do not
know either the number of people or the number of seats.
It is thus possible to show that two sets have the same size without knowing the number of
elements in each set: simply pair off the elements of each set. This method easily extends to
infinite sets. Two infinite sets have the same size if their elements can be put into a one-to-one
correspondence. We thus make the following definition.
DEFINITION 4.28 The sets A and B have the same size or cardinality if there is a bijection
$f : A \to B$. When A and B have the same cardinality, we write $A \approx B$.
The difficulty, if one can call it that, is that this definition leads to intuitively bizarre results.
For example, the set of positive integers and the set of even positive integers have the same size.
This follows from the pairing (top to bottom)
1  2  3  4   5   6   7   8   9  10 . . .
2  4  6  8  10  12  14  16  18  20 . . .
which establishes a one-to-one correspondence between the two sets. The fact that this pairing is a
one-to-one correspondence is clear, but it seems just as clear that there are more positive integers
than there are even positive integers. If a definition leads to contradictions (in the logical sense),
it must be discarded; if it leads to results that seem to violate common sense, then the definition
can either be left aside or intuition can rise to the occasion. In this case, it is intuition that must
find a way to grapple with these strange properties of infinite sets. In other words, we will accept
this definition and see where it leads.
Due to the fact that counterintuitive results appear to occur when working with the definition
of cardinality, especially when dealing with infinite sets, we must proceed very carefully. For this
reason, even obvious results require careful proofs using the definition of this new concept. The
following theorem presents one of these obvious results.
Suppose that $\{1, 2, \ldots, n\} \approx \mathbb{N}$ for some positive integer n and let $f : \{1, 2, \ldots, n\} \to \mathbb{N}$
be a bijection. The positive integer p defined by $p = 1 + \sum_{i=1}^{n} f(i)$ is larger than each of the
values f(i), so it is not in the
range of f. This is a contradiction to the fact that f is a bijection and the theorem follows. (Do
you see how mathematical induction has implicitly entered the proof?)
We now record some simple but important properties of cardinality. Taken together, they
reveal that the notion of cardinality determines an equivalence relation on a collection of sets.
a) $A \approx A$;
b) if $A \approx B$, then $B \approx A$;
c) if $A \approx B$ and $B \approx C$, then $A \approx C$.
Proof. Let A and B be sets. Since $i_A : A \to A$ is a bijection, part (a) follows. Suppose that $A \approx B$
and let $f : A \to B$ be a bijection. By Theorem 4.27, the function $f^{-1} : B \to A$ is also a bijection. It
follows that $B \approx A$, proving part (b). Part (c) follows from the fact that the composition of two
bijections is a bijection; the details are left to the reader.
The next definition formalizes the introductory discussion concerning the relative sizes of arbitrary sets. It also introduces some adjectives that describe the various sizes of sets that are to be
considered in these last two sections of the text.
a) The set A is finite if it is empty or if its elements can be put in a one-to-one correspondence
with the set {1, 2, . . . , n} for some positive integer n.
b) The set A is infinite if it is not finite.
c) The set A is countably infinite if its elements can be put in a one-to-one correspondence
with the set of positive integers.
d) The set A is countable if it is either finite or countably infinite.
e) The set A is uncountable if it is not countable.
The distinction between finite sets and infinite sets is generally easy to grasp: a finite set is
eventually exhausted when you start listing out its elements, whereas an infinite set is not. For
instance, the set of license plates using three letters (from the alphabet) and three single-digit numbers is finite. There are many of them, but in theory if you start writing down all of the
possibilities, eventually the list would end. Intuitively, the set of positive integers is infinite because
a list of positive integers never ends; Theorem 4.29 provides a rigorous proof of this fact. It then
follows (see part (3) of Theorem 4.33) that the set of rational numbers is an infinite set, as is the
set of real numbers.
According to the definition, a set A is countably infinite if $\mathbb{N} \approx A$, that is, A has the same
cardinality as the natural numbers. If $f : \mathbb{N} \to A$ is a bijection, then
$A = \{f(1), f(2), f(3), \ldots\}$.
In other words, a set is countably infinite if and only if it can be arranged as an infinite sequence
of distinct terms.
As indicated earlier, the set of even positive integers is countably infinite. Letting $E^+$ be the
set of even positive integers, the function $f : \mathbb{N} \to E^+$ defined by $f(n) = 2n$ is a bijection. The
set of all integers greater than 1000 is also countably infinite; the function g defined on $\mathbb{N}$ by
$g(n) = n + 1000$ provides a one-to-one correspondence between these two sets. Since it is often
difficult to express a correspondence between two sets as a function, a pairing of the elements of two
sets is sometimes just written down as a pattern, with the assumption that the pattern continues.
For example, the pairing
1  2   3  4   5  6   7  8   9  10 . . .
0  1  -1  2  -2  3  -3  4  -4   5 . . .
shows that the set Z is countably infinite. This pairing is probably easier to grasp than defining a
function $f : \mathbb{N} \to \mathbb{Z}$ by
$$f(n) = \begin{cases} n/2, & \text{if } n \text{ is even}; \\ (1 - n)/2, & \text{if } n \text{ is odd}; \end{cases}$$
and showing that it is a bijection.
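The piecewise formula can be evaluated to reproduce the pairing. The sketch below also defines a candidate inverse, named `f_inverse` here as an illustrative assumption; checking that the two compositions are identities on a finite range is evidence for, though of course not a proof of, bijectivity.

```python
def f(n):
    """Map a positive integer n to an integer: n/2 if n is even, (1 - n)/2 if n is odd."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return n // 2 if n % 2 == 0 else (1 - n) // 2


def f_inverse(z):
    """Candidate inverse: positive z comes from 2z, every other z from 1 - 2z."""
    return 2 * z if z > 0 else 1 - 2 * z


first_ten = [f(n) for n in range(1, 11)]
print(first_ten)  # second row of the pairing

# Round trips on finite ranges, in both directions.
round_trip_n = all(f_inverse(f(n)) == n for n in range(1, 1001))
round_trip_z = all(f(f_inverse(z)) == z for z in range(-500, 501))
print(round_trip_n, round_trip_z)
```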
The above examples show that a proper subset of an infinite set can have the same cardinality
as the entire set. In fact, this is sometimes taken as the definition of an infinite set. In other words,
this seeming paradox is actually part of the nature of an infinite set. This fact is stated precisely
in the following theorem; be aware that understanding the proof requires some effort.
A set is infinite if and only if it has the same cardinality as one of its proper subsets.
Proof. Suppose first that X is an infinite set. Let $x_1$ be an arbitrary element of X, let $x_2$ be
an arbitrary element of $X \setminus \{x_1\}$, let $x_3$ be an arbitrary element of $X \setminus \{x_1, x_2\}$, and so on. Since
the set X is infinite, this process can be repeated for each positive integer n. Hence (by the strong
form of the Principle of Mathematical Induction), the set X contains a countably infinite subset
$\{x_n : n \in \mathbb{Z}^+\}$. Let $X_1 = X \setminus \{x_n : n \in \mathbb{Z}^+\}$ and let $Y = X_1 \cup \{x_{2n} : n \in \mathbb{Z}^+\}$. (Note that the set
$X_1$ may be empty.) Then Y is a proper subset of X and the function $f : X \to Y$ defined by
$$f(x) = \begin{cases} x, & \text{if } x \in X_1; \\ x_{2n}, & \text{if } x = x_n; \end{cases}$$
is a bijection. It follows that X has the same cardinality as one of its proper subsets.
For the converse, we must prove that a finite set cannot have the same cardinality as any of its
proper subsets. Since cardinality is an equivalence relation, we need only consider sets of positive
integers. For each positive integer n, let $S_n = \{1, 2, \ldots, n\}$. We must show that for each n, the set
$S_n$ does not have the same cardinality as any of its proper subsets. To prove this, we apply the
Principle of Mathematical Induction. It is obvious that the statement is true for both $S_1$ and $S_2$.
Suppose that for some positive integer k, the set $S_k$ does not have the same cardinality as any of
its proper subsets. Let A be a proper subset of $S_{k+1}$ and suppose that $f : S_{k+1} \to A$ is a bijection.
There are two cases to consider.
i) Suppose that $k + 1 \notin A$. Then $A \subseteq S_k$ and $f(k + 1) \in S_k$. The function $g : S_k \to A \setminus \{f(k + 1)\}$
defined by $g(i) = f(i)$ for each $i \in S_k$ is a bijection between $S_k$ and one of its proper subsets,
namely, the set $A \setminus \{f(k + 1)\}$.
ii) Suppose that $k + 1 \in A$. Without loss of generality, we may assume that $f(k + 1) = k + 1$.
For if $f(p) = k + 1$ for some $1 \le p < k + 1$, the function $f_1 : S_{k+1} \to A$ defined by
$$f_1(i) = \begin{cases} f(i), & \text{if } i \notin \{p, k + 1\}; \\ f(p), & \text{if } i = k + 1; \\ f(k + 1), & \text{if } i = p; \end{cases}$$
is a bijection that satisfies $f_1(k + 1) = k + 1$. Now the function $g : S_k \to A \setminus \{k + 1\}$ defined
by $g(i) = f(i)$ for each $i \in S_k$ is a bijection between $S_k$ and one of its proper subsets.
In either case the induction hypothesis is contradicted. Hence, the set $S_{k+1}$ does not have the
same cardinality as any of its proper subsets. By the Principle of Mathematical Induction, for each
positive integer n, the set $S_n$ does not have the same cardinality as any of its proper subsets. This
completes the proof.
The next theorem lists some results that are easy to believe based upon the definitions of the
concepts and intuition. However, careful proofs are once again required to establish the results.
of the function f , we find that a is the smallest integer in the set A \ {f (1), f (2), . . . , f (p)}. Then
f (p + 1) = a, and it follows that f is onto. Therefore, the function f establishes a one-to-one
correspondence between N and A. This shows that A is countably infinite.
The assertion that there are more irrational numbers than there are rational numbers can
now be stated precisely as follows: the set of rational numbers is countably infinite and the set of
irrational numbers is uncountable. This fact, which was first published by Georg Cantor (see the
next section for his biography), came as a surprise to mathematicians of the time. As a first step
toward a proof, we prove that the union of a countable number of countable sets is a countable set.
(The formation of a set of this type is explained in the proof of the theorem.)
Proof. It is sufficient to prove that a countably infinite union of disjoint countably infinite sets
is countably infinite (see the exercises). In order to have a countably infinite number of sets, there
must be one set corresponding to each positive integer n. Let $\{A_n : n \in \mathbb{N}\}$ be a countably infinite
collection of sets. Suppose that each $A_n$ is a countably infinite set and that none of the sets have
any elements in common. We must prove that the set $A = \bigcup_{n=1}^{\infty} A_n$ is countably infinite. Since each
$A_n$ is countably infinite, its elements can be put into a one-to-one correspondence with the set of
positive integers. For each n, let $A_n = \{x_{n,k} : k = 1, 2, \ldots\}$, that is,
$A_1 = \{x_{1,1}, x_{1,2}, x_{1,3}, x_{1,4}, \ldots\}$,
$A_2 = \{x_{2,1}, x_{2,2}, x_{2,3}, x_{2,4}, \ldots\}$,
$A_3 = \{x_{3,1}, x_{3,2}, x_{3,3}, x_{3,4}, \ldots\}$,
and so on. By the Fundamental Theorem of Arithmetic, the pairing $x_{n,k} \leftrightarrow 2^n 3^k$ is a one-to-one
correspondence between A and an infinite subset of the positive integers. By part (7) of Theorem
4.33, the set A is countably infinite.
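The pairing used in this proof can be made concrete by working with the index pairs (n, k) in place of the elements themselves. In the sketch below, `encode` and `decode` are illustrative names; `decode` recovers (n, k) by dividing out the factors of 2 and 3, which is where unique factorization enters.

```python
def encode(n, k):
    """Send the index pair (n, k), with n, k >= 1, to the positive integer 2^n * 3^k."""
    if n < 1 or k < 1:
        raise ValueError("both indices must be positive integers")
    return 2**n * 3**k


def decode(m):
    """Recover (n, k) from m = 2^n * 3^k; reject integers not of that form."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    n = k = 0
    while m % 2 == 0:
        m //= 2
        n += 1
    while m % 3 == 0:
        m //= 3
        k += 1
    if m != 1 or n == 0 or k == 0:
        raise ValueError("not of the form 2^n * 3^k with n, k >= 1")
    return n, k


pairs = [(n, k) for n in range(1, 21) for k in range(1, 21)]
codes = {encode(n, k) for n, k in pairs}
print(len(codes) == len(pairs))  # distinct pairs receive distinct codes
print(decode(encode(3, 5)))
```

The codes skip most positive integers (5, 7, and 10 are never used, for example), which is why the proof appeals to the result on infinite subsets of the positive integers instead of claiming a bijection with all of them.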
The previous theorem is often used to prove that an infinite set is countably infinite. If it is
possible to decompose the set into a countably infinite number of subsets, each of which is countable,
then the set is countably infinite. The advantage of this method for proving that a set is countably
infinite is that there is no need to find a formula of correspondence or even to illustrate how the
elements of the set can be paired with the positive integers. To illustrate the use of Theorem 4.34,
we use it to prove that the set of rational numbers is countably infinite. (For the record, there are
other ways to prove that Q is countably infinite.) The method of proof, which is summarized in the
next few sentences, is quite typical. Let A be a set. For each positive integer n, define a subset $A_n$
of A. The sets $A_n$ must be defined in such a way that $A = \bigcup_{n=1}^{\infty} A_n$ and that a method for proving
that each $A_n$ is a countable set is apparent. It is this step of the proof that may require some
creativity. The conclusion that A is countable then follows from Theorem 4.34. For the record, the
ratio p/q of two integers is said to be in simplest form if p and q are relatively prime.
Proof. Let A1 be the set of all rational numbers that in simplest form have a denominator of 1.
The set A1 is actually the set of integers and is thus countably infinite. Let A2 be the set of all
rational numbers that in simplest form have a denominator of 2. Since the only possible choices for
the numerators are odd integers, the set A2 is also countably infinite. In general, for each positive
integer n, let An be the set of all rational numbers that in simplest form have a denominator of n.
For instance,
$$A_3 = \left\{ \ldots, -\tfrac{4}{3}, -\tfrac{2}{3}, -\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{2}{3}, \tfrac{4}{3}, \ldots \right\}, \qquad A_4 = \left\{ \ldots, -\tfrac{5}{4}, -\tfrac{3}{4}, -\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{3}{4}, \tfrac{5}{4}, \ldots \right\}.$$
It should be clear that $\mathbb{Q} = \bigcup_{n=1}^{\infty} A_n$. Since the numerators that appear in the elements of $A_n$ are
integers that are relatively prime to n, each $A_n$ can be put into a one-to-one correspondence with
an infinite subset of Z, that is, each of the sets $A_n$ is countably infinite. By Theorem 4.34, the set
Q is countably infinite.
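A finite window of each set $A_n$ is easy to list. The sketch below assumes a cutoff `bound` on the size of the numerator, which is an artifact of the illustration and not part of the proof; the function name `A` mirrors the notation of the text.

```python
from fractions import Fraction
from math import gcd


def A(n, bound):
    """List the elements p/n of A_n with |p| <= bound, in increasing order.

    A numerator p is kept exactly when gcd(p, n) == 1, that is, when p/n is
    already in simplest form. Since gcd(0, n) == n, zero appears only in A_1.
    """
    return [Fraction(p, n) for p in range(-bound, bound + 1) if gcd(p, n) == 1]


print([str(q) for q in A(3, 4)])
print([str(q) for q in A(4, 5)])
print([str(q) for q in A(1, 2)])
```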
Since a countably infinite union of countably infinite sets is still countably infinite, it is difficult to
imagine a set that is uncountable. However, such sets do exist. To prove that a set is uncountable,
it is necessary to verify that there is no one-to-one correspondence between the given set and the
set of positive integers. The following list of equivalent statements provides a common way to prove
that an infinite set is uncountable:
A is an uncountable set
Hence, to prove that a set A is uncountable, it is sufficient to prove that every injection mapping N
into A is not a surjection. This is the approach taken in the proof of the next theorem. In addition,
the proof uses a famous technique known as the Cantor diagonalization process, named after the
mathematician Georg Cantor.
Proof. It is sufficient to prove that the open interval (0, 1) is uncountable. Suppose that a
function $f : \mathbb{N} \to (0, 1)$ is an injection. In other words, we can express the range of f as an infinite
sequence f(1), f(2), f(3), . . . of distinct real numbers. To show that f is not a surjection, we show
that this sequence cannot be a listing of all the real numbers in (0, 1) by finding a real number
that is not in the list. We begin by writing the numbers f (i) in decimal form; the list might start
something like this:
f (1) = 0.23454167 . . . ,
f (2) = 0.15367843 . . . ,
f (3) = 0.86954367 . . . ,
f (4) = 0.19919423 . . . ,
f (5) = 0.22453665 . . . ,
Let r be the real number with decimal expansion $0.d_1 d_2 d_3 d_4 d_5 \ldots$, where $d_i = 1$ unless the decimal
expansion of f(i) has a 1 in the ith place to the right of the decimal point, in which case $d_i = 5$. (For
the list above, the diagonal entries are 2, 5, 9, 1, and 6, so the expansion would begin 0.11151. However,
note that our method is completely general and not dependent on any particular listing.) This
decimal expansion is different from every expansion in the list and it corresponds to a real number
between 0 and 1. Therefore, the real number r determined in this way is not on the list; that is, the
function f is not surjective. Since $f : \mathbb{N} \to (0, 1)$ was an arbitrary injective function, we conclude
that the interval (0, 1) is uncountable.
For the record, the fact that every real number has a decimal expansion requires proof. Since
the proof of this fact requires properties of the real numbers that we have not discussed, it is not
included here. Furthermore, as the reader may recall, some real numbers have two different decimal
expansions; for example,
0.274999999999 . . . = 0.275000000000 . . .
In the proof of Theorem 4.36, we may insist that the decimal expansion of each f(i) does not end
in all 9s. Our method for choosing a number not in the list (which is just one of many ways to
accomplish this) guarantees that no number whose decimal expansion ends in all 0s or all 9s can
appear. This means that there is no way our generated number could appear in the list but in
a different form. It should be clear how the term "Cantor diagonalization process" arises. The
listing of the decimal expansions can be interpreted as a very large (infinite in fact) matrix and
the key step is moving along the diagonal and writing down a number that is different from the
diagonal entry. It is a clever technique, one that is both simple and subtle.
Proof. Suppose, by way of contradiction, that the set K of irrational numbers is countably
infinite. Since the rational numbers Q are countably infinite and $\mathbb{R} = \mathbb{Q} \cup K$, it follows from
Theorem 4.34 that the set of real numbers is countably infinite. As this is a contradiction to
Theorem 4.36, the set of irrational numbers is uncountable.
Since the set of irrational numbers is uncountable and the set of rational numbers is countably
infinite, it is certainly clear that there are more irrational numbers than there are rational numbers.
It is not difficult to prove that between any two distinct rational numbers there is an irrational
number and between any two distinct irrational numbers there is a rational number. Yet the set
of irrational numbers is, in some sense, much larger than the set of rational numbers. It is difficult
to make sense of these two statements at the same time. Nevertheless, both statements are valid
and both statements are consequences of properties of the real number system.
We have seen that many infinite sets that might seem to have different sizes are in fact the
same size and we have just seen that there are infinite sets that are not the same size. It turns
out that there are infinitely many different sizes of infinite sets. In order to talk about the size
of an infinite set, in much the same way that we talk about the size of a finite set (as in, "The set
{a, b, c, d, e} has size 5."), with every set A we associate a symbol $\overline{A}$, called the cardinal number
of A, and we say that $\overline{A} = \overline{B}$ if and only if $A \approx B$.
Some cardinal numbers occur so frequently that they have been given special names: $\overline{\mathbb{N}} = \aleph_0$
("aleph-naught") and $\overline{\mathbb{R}} = c$ (the size of the "continuum"). In this language, we can say that the
size of Q or of Z is $\aleph_0$, and that the size of the open interval (0, 1) is c.
One familiar feature of finite sizes is that they come in a particular order, that is, if two sizes
are different, then one is bigger than the other. When can we say that one infinite cardinal number
is bigger than another? Here is a natural way: If $\overline{A}$ and $\overline{B}$ are cardinal numbers, define $\overline{A} \le \overline{B}$ to
mean that there is an injection $f : A \to B$. There is a potential, but somewhat subtle, problem with
this definition. We are defining a relationship between sizes by referring to particular sets that
have those sizes. What if we were to choose different sets, say $A_1$ and $B_1$, with the same sizes?
The following lemma shows that there is no cause for concern.
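a) $\overline{A} \le \overline{A}$;
b) if $\overline{A} \le \overline{B}$ and $\overline{B} \le \overline{C}$, then $\overline{A} \le \overline{C}$.
Proof. Part (a) follows from the simple observation that the identity map $i_A : A \to A$ is an
injection. For part (b), the hypotheses imply that there exist injective functions $f : A \to B$ and
$g : B \to C$. By Theorem 4.13, the function $g \circ f : A \to C$ is an injection. It follows that $\overline{A} \le \overline{C}$.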
Theorem 4.39 shows that $\le$, as applied to infinite cardinal numbers, shares some properties
with $\le$ in more familiar settings such as the integers or the real numbers. Another property that
we rely on when dealing with real numbers is anti-symmetry: if $x \le y$ and $y \le x$, then $x = y$.
We state without proof the following result. For the record, the proof is not exceptionally difficult,
but it is rather abstract and hard to grasp.
THEOREM 4.40 (Schröder-Bernstein Theorem)
If $\overline{A} \le \overline{B}$ and $\overline{B} \le \overline{A}$, then $\overline{A} = \overline{B}$.
It is sometimes tempting to react to a result like this with, "Of course! How could it be
otherwise?" This may be due in part to the use of the familiar symbol $\le$, but just using the
symbol hardly guarantees that it acts like $\le$ in more familiar contexts. Even paying attention to
the new meaning, this theorem may seem obvious. Perhaps the best way to see that it might not
be so obvious is to look at a special case, one in which the injections f and g are easy to find, but
there does not seem to be any obvious bijection. See Exercise 5 for a similar example.
EXAMPLE 4.41 Suppose $D = \{(x, y) : x^2 + y^2 \le 1\}$ is the unit disk in the plane and S is the
square $\{(x, y) : x \in [-1, 1],\ y \in [-1, 1]\}$. Since $D \subseteq S$, it is clear that $\overline{D} \le \overline{S}$. Since the function
$f : S \to D$ defined by $f(x, y) = (x/2, y/2)$ is an injection, we see that $\overline{S} \le \overline{D}$. By the Schröder-Bernstein
Theorem, $\overline{S} = \overline{D}$. It is an interesting exercise to seek an explicit bijection mapping D
onto S; such a search provides an appreciation for the power of Theorem 4.40.
Thus far, we have seen infinite sets of two different sizes, $\aleph_0$ and c. Are there others? Is there a
largest infinite size, that is, a largest cardinal number? Recall that for any set A, the power set of
A, written $\mathcal{P}(A)$, is the collection of all subsets of A. For example, $\mathcal{P}(\{1, 2\}) = \{\emptyset, \{1\}, \{2\}, \{1, 2\}\}$.
For finite sets, the power set is not just larger than the original set, it is much larger (see Exercise
7). This makes it natural to think that perhaps the power set of an infinite set will be larger than
the base set, that is, $\overline{A} < \overline{\mathcal{P}(A)}$. To be clear what this last statement is saying, let $\overline{A} < \overline{B}$ mean
that $\overline{A} \le \overline{B}$, but that A and B do not have the same cardinality. The next theorem answers both
questions posed at the beginning of this paragraph.
THEOREM (Cantor's Theorem)
If $A$ is any set, then $\overline{A} < \overline{\mathcal{P}(A)}$.
Proof. Since the function $f : A \to \mathcal{P}(A)$ defined by $f(a) = \{a\}$ is an injection, we find that
$\overline{A} \le \overline{\mathcal{P}(A)}$. To prove that $\overline{A} < \overline{\mathcal{P}(A)}$, we need to show that there is no
bijection $g : A \to \mathcal{P}(A)$. To obtain a contradiction, suppose that $g$ is such a bijection. Let
$S = \{a \in A : a \notin g(a)\}$ and note that $S \subseteq A$. Since $g$ is a surjection and
$S \in \mathcal{P}(A)$, there exists some $x \in A$ such that $S = g(x)$.
There are two possibilities to consider: $x \in S$ and $x \notin S$.
1. If $x \in S$, then $x \notin g(x)$, that is, $x \notin S$, a contradiction.
2. If $x \notin S$, then $x \in g(x)$, that is, $x \in S$, a contradiction.
Therefore, no such bijection is possible. (Compare the ideas in this proof with the concept of a
normal set introduced in Exercise 11 in Section 1.5.)
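For a small finite set, the construction in this proof can be carried out by brute force. The Python sketch below is an illustration only; the three-element set and the helper names power_set and diagonal_set are choices made for the example. It runs through every function $g$ from $A$ to $\mathcal{P}(A)$, builds the set $S$ of elements $a$ with $a \notin g(a)$, and records whether $S$ is ever a value of $g$.

```python
from itertools import combinations, product


def power_set(elements):
    """Return all subsets of a finite collection, as frozensets."""
    items = sorted(elements)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def diagonal_set(g):
    """Build S = {a in A : a not in g(a)} for g given as a dict from A to subsets of A."""
    return frozenset(a for a in g if a not in g[a])


A = [0, 1, 2]
subsets = power_set(A)   # the 8 subsets of A

# Try all 8**3 = 512 functions g from A to P(A).
missed_every_time = True
for images in product(subsets, repeat=len(A)):
    g = dict(zip(A, images))
    if diagonal_set(g) in g.values():
        missed_every_time = False
```

After the loop, missed_every_time is still True: no function $g$ has $S$ in its range, so none is a surjection. For a finite set this is unsurprising, since $A$ has 3 elements and $\mathcal{P}(A)$ has 8, but the point of the proof is that the same set $S$ works when $A$ is infinite and counting is unavailable.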
Cantor's Theorem implies that there are infinitely many infinite cardinal numbers, and that
there is no largest cardinal number. It also has the following interesting consequence:
There is no such thing as the "set of all sets."
Suppose $A$ were the set of all sets. Since every element of $\mathcal{P}(A)$ is a set, we would have
$\mathcal{P}(A) \subseteq A$, which then implies that
$\overline{\mathcal{P}(A)} \le \overline{A} \le \overline{\mathcal{P}(A)}$.
By the Schröder-Bernstein Theorem, $\overline{\mathcal{P}(A)} = \overline{A}$, but this contradicts Cantor's Theorem.
Many questions about the cardinal numbers remain. Since we know that $\mathbb{Z}$ and $\mathbb{Q}$ are the same
size, and that $\mathbb{R}$ is larger, one very natural question is whether there are any sets "between" $\mathbb{Z}$ and
$\mathbb{R}$, that is, strictly bigger than $\mathbb{Z}$ (and $\mathbb{Q}$) but strictly smaller than $\mathbb{R}$. The continuum hypothesis
says:
There is no set $A$ with $\aleph_0 < \overline{A} < c$.
That is, the continuum hypothesis asserts that $c$ is the first cardinal number larger than $\aleph_0$; in
symbols, this means that $\aleph_1 = c$. Assuming the usual axioms for our number systems, it is a
remarkable fact that the continuum hypothesis cannot be proved to be true and cannot be proved to
be false. In the late 1930s, Kurt Gödel showed that the continuum hypothesis cannot be disproved, and
in the early 1960s, Paul Cohen showed that it cannot be proved either. Hence, mathematicians
can choose to add the continuum hypothesis as an axiom or to add the denial of the continuum
hypothesis as an axiom. In each case, different results can be proved. With this conundrum, we
bring to a close our introduction to higher mathematics.
Georg Cantor. Cantor (1845-1918) was born in St. Petersburg and grew up in Germany. He took
an early interest in theological arguments about continuity and the infinite, and as a result studied
philosophy, mathematics, and physics at universities in Zurich, Göttingen, and Berlin, though his
father encouraged him to pursue engineering. He did his doctorate in number theory and then
worked in analysis before doing his pioneering work in the theory of sets.
The prevailing opinion in the nineteenth century was that completed infinities could not be
studied rigorously; only potential infinity made sense. For example, the process of repeatedly
adding one, starting at 1, would never finish and was therefore infinite, but most mathematicians
viewed the completed set of positive integers (or any other infinite set) as a dubious concept at
best. An infinite set can be placed in one-to-one correspondence with a proper subset of itself;
most mathematicians saw this as a paradox, and solved the problem by declaring that infinite
sets simply make no sense.
A few mathematicians went against the grain; Dedekind realized that the paradoxical correspondence between a set and one of its proper subsets could be taken as the definition of an infinite
set. Cantor took this notion much further, showing that infinite sets come in an infinite number of
sizes. Cantor knew most of what we have seen in this chapter: he showed that the rational numbers
are countable, that $\mathbb{R}$ is not countable, and that $\mathcal{P}(A)$ is always bigger than $A$. The algebraic
numbers are those real numbers that are roots of polynomials with rational coefficients; for
example, $\sqrt{2}$ is a solution of $x^2 - 2 = 0$, and is therefore both irrational and algebraic (see Exercise
10 for further results concerning algebraic numbers). There are more algebraic numbers than
rational numbers, in the sense that the algebraic numbers form a proper superset of the rationals,
but Cantor showed that the set of algebraic numbers is countable. This means that the transcendental
numbers (that is, the non-algebraic numbers, like $\pi$ and $e$) form an uncountable set, so in
fact almost all real numbers are transcendental.
In addition to the arithmetic of infinite cardinal numbers, Cantor developed the theory of
infinite ordinal numbers. The two concepts are practically the same for finite numbers, so the idea
that infinite ordinals and infinite cardinals are different takes some getting used to. Since there is
essentially only one way to make a total order out of four objects (namely, pick a first, a second, a
third and a fourth), the cardinal number 4 ("how many") and the ordinal number 4 ("what order")
are easily confused. For infinite sets the situation is radically different. The ordinal number of
the positive integers, called $\omega$, is simply the usual total ordering of the positive integers. Addition
of ordinals is accomplished by placing the orders side by side: $1 + \omega$ looks like one item followed
by a countable number of items in the same order as the positive integers, so it looks just like the
positive integers. On the other hand, $\omega + 1$ looks like the positive integers followed by a single item,
and is quite different from the usual ordering of the positive integers, even though the size of the
two ordered sets is the same. (The easiest way to see that there is a crucial difference between the
two orderings is to note that one element of $\omega + 1$ has an infinite number of predecessors, while all
of the elements of $1 + \omega$ have a finite number of predecessors.)
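The difference between the two orderings can be made concrete with a small Python sketch. Everything named here (the label NEW for the single extra item, and the helper functions) is a hypothetical device for illustration, and since a program can only examine finitely many items, the predecessor counts are taken within finite samples.

```python
from itertools import count, islice

NEW = "new"  # hypothetical label for the single extra item


def one_plus_omega():
    """List 1 + omega in order: the extra item first, then 1, 2, 3, ..."""
    yield NEW
    yield from count(1)


def relabel(item):
    """Order-preserving bijection from 1 + omega onto the positive integers."""
    return 1 if item == NEW else item + 1


def omega_plus_one_key(item):
    """Sort key for omega + 1: every positive integer precedes the extra item."""
    return (1, 0) if item == NEW else (0, item)


def predecessors_in_sample(item, n):
    """Count the predecessors of item in omega + 1 among 1, ..., n and the extra item."""
    sample = list(range(1, n + 1)) + [NEW]
    return sum(omega_plus_one_key(other) < omega_plus_one_key(item)
               for other in sample)


# Relabelling the first six items of 1 + omega gives the first six positive integers.
first_six = [relabel(x) for x in islice(one_plus_omega(), 6)]
```

In this sketch, predecessors_in_sample(5, n) stays at 4 no matter how large n is, while predecessors_in_sample(NEW, n) equals n and so grows without bound. That is the finite shadow of the statement that the last element of $\omega + 1$ has infinitely many predecessors.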
Cantor was unable to secure a position at a major university, including Berlin, where he most
desired to be. This failure was due in large part to the influence of Kronecker, a mathematician at
Berlin, who ridiculed all talk of completed infinities, convinced that only finite processes could be
justified. (As a result, he didn't believe in irrational numbers, since they could not be produced
by a finite process.) Beginning in 1884, Cantor suffered a series of nervous breakdowns, presumably
related to the refusal of so many mathematicians to accept his work; Cantor himself had occasional
doubts about his results: the proofs were clear and rigorous, but the results still seemed paradoxical.
Cantor died in a mental institution in 1918, though he did get some positive recognition for his
work before his death. Writing a few years after Cantor's death, the great mathematician David
Hilbert called Cantor's work "the most astonishing product of mathematical thought, one of the
most beautiful realizations of human activity in the domain of the purely intelligible." The years
since have more than justified this assessment of Cantor's work.
The information here is taken from A History of Mathematics, by Carl Boyer, New York: John
Wiley & Sons, 1968. For a more detailed account of Cantor's life and work, see Georg Cantor: His
Mathematics and Philosophy of the Infinite, by Joseph Dauben, Harvard University Press, 1979.
