Discrete Structures
Rafael Pass
Wei-Lung Dustin Tseng
Machine Translated by Google
Preface
Discrete mathematics deals with objects that come in discrete bundles, e.g., 1 or 2
babies. In contrast, continuous mathematics deals with objects that vary
continuously, e.g., 3.42 inches from a wall. Think of digital watches versus analog
watches (ones where the second hand loops around continuously without
stopping).
Why study discrete mathematics in computer science? It does not directly
help us write programs. At the same time, it is the mathematics underlying almost
all of computer science. Here are a few examples:
a) Message Routing
b) Social networks
8. Finite automata and regular languages
a) Compilers
Contents
4 Counting 61
4.1 The Product and Sum Rules 61
4.2 Permutations and Combinations 63
4.3 Combinatorial Identities 65
4.4 Inclusion-Exclusion Principle 69
4.5 Pigeonhole Principle 72
5 Probability 73
6 Logic 95
6.1 Propositional Logic 95
6.2 Logical Inference 100
6.3 First Order Logic 105
6.4 Applications 108
7 Graphs 109
7.1 Graph Isomorphism 112
7.2 Paths and Cycles 115
7.3 Graph Coloring 120
7.4 Random Graphs [Optional] 122
Chapter 1
1.1 Sets
A set is one of the most fundamental objects in mathematics: informally, it is an unordered collection of objects.
Example 1.2. The following notations all refer to the same set:
The last example reads as "the set of all x such that x is an integer between 1
and 2 (inclusive)".
We will encounter the following sets and notations throughout the course:
Given a collection of objects (a set), we may want to know how large the
collection is:
Given two collections of objects (two sets), we may want to know if they are
equal, or if one collection contains the other. These notions are formalized as
set equality and subsets:
Definition 1.5 (Set equality). Two sets S and T are equal, written as S = T, if S
and T contain exactly the same elements, i.e., for every x, x ∈ S ↔ x ∈ T.
Example 1.7.
Definition 1.8 (Set operations). Given sets S and T, we define the following
operations:
Example 1.9. Let S = {1, 2, 3}, T = {3, 4}, V = {a, b}. Then:
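These operations can be tried out directly in Python, whose built-in set type implements them; a quick sketch using the sets from Example 1.9 (with a and b modeled as strings, which is my assumption):

```python
# Sets from Example 1.9; a and b are modeled as strings (my assumption)
S = {1, 2, 3}
T = {3, 4}
V = {"a", "b"}

assert S | T == {1, 2, 3, 4}          # union
assert S & T == {3}                   # intersection
assert S - T == {1, 2}                # set difference
assert S | V == {1, 2, 3, "a", "b"}   # union of sets with different kinds of elements
print("set operations behave as in Example 1.9")
```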
Some set operations can be visualized using Venn diagrams. See Figure
1.1. To give an example of working with these set operations, consider the
following set identity.
Proof. We can visualize the set identity using Venn diagrams (see Figures 1.1b
and 1.1c). To formally prove the identity, we will show both of the following:

S ⊆ (S ∩ T) ∪ (S \ T) (1.1)
(S ∩ T) ∪ (S \ T) ⊆ S (1.2)

To prove (1.1), consider any element x ∈ S. Either x ∈ T or x ∉ T. If x ∈ T,
then x ∈ S ∩ T; if x ∉ T, then x ∈ S \ T. In either case, x ∈ (S ∩ T) ∪ (S \ T).
To prove (1.2), consider any element x ∈ (S ∩ T) ∪ (S \ T):

• If x ∈ S ∩ T, then x ∈ S.
[Figure 1.1: Venn diagrams of operations on sets S and T inside a universe U: panels (a)–(d) illustrate intersection, union, difference, and complement.]
• If x ∈ S \ T, then x ∈ S.
Commonly seen sets include {0, 1}ⁿ, the set of n-bit strings, and {0, 1}*,
the set of finite-length bit strings. Also observe that |[n]| = n.
Before we end this section, let us revisit our informal definition of sets: an
unordered "collection" of objects. In 1901, Russell came up with the following
"set", known as Russell's paradox:

S = {x | x ∉ x}

That is, S is the set of all sets that don't contain themselves as an element.
This might seem like a natural "collection", but is S ∈ S? It's not hard to see that
S ∈ S ↔ S ∉ S. The conclusion today is that S is not a good "collection" of
objects; it is not a set.
So how do we know whether {x | x satisfies some condition} is a set? Formally,
sets can be defined axiomatically, where only collections constructed from a
careful list of rules are considered sets. This is outside the scope of this course.
We will take a shortcut, and restrict our attention to a well-behaved universe.
Let E be all the objects that we are interested in (numbers, letters, etc.), and let
U = E ∪ P(E) ∪ P(P(E)), i.e., E, subsets of E, and subsets of subsets of E. In fact,
we can extend U with three power set operations, or indeed any finite number
of power set operations. Then S = {x | x ∈ U and some condition holds} is
always a set.
1.2 Relations
Example 1.15.
• R is transitive iff in its graph, for any three nodes x, y and z such that there
is an edge from x to y and from y to z, there exists an edge from x to z.
• More naturally, R is transitive iff in its graph, whenever there is a path from
node x to node y, there is also a direct edge from x to y.
Proof. The proofs of the first three parts follow directly from the definitions.
The proof of the last bullet relies on induction; we will revisit it later.
Example 1.19. Let R = {(1, 2), (2, 3), (1, 4)} be a relation (say on the set Z).
Then (1, 3) ∈ R⁺ (since (1, 2), (2, 3) ∈ R), but (2, 4) ∉ R⁺, where R⁺ is the transitive closure of R. See Figure 1.2.
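The transitive closure in this example can also be computed mechanically, by adding "shortcut" edges until none are missing; a small sketch (the function name transitive_closure is mine, not from the text):

```python
def transitive_closure(R):
    """Repeatedly add (x, z) whenever (x, y) and (y, z) are both present."""
    closure = set(R)
    while True:
        shortcuts = {(x, z)
                     for (x, y) in closure
                     for (y2, z) in closure
                     if y == y2}
        if shortcuts <= closure:   # nothing new to add: done
            return closure
        closure |= shortcuts

R = {(1, 2), (2, 3), (1, 4)}
print(transitive_closure(R))  # adds (1, 3), as in Example 1.19; (2, 4) is not added
```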
Equivalence relations capture the everyday notion of "being the same" or
"equal".
[Figure 1.2: (a) the relation R = {(1, 2), (2, 3), (1, 4)}, drawn as a graph on nodes 1–4; (b) the transitive closure R⁺ of R.]
1.3 Functions
Definition 1.23. A function f : S → T is a "mapping" from elements in set S to elements in set
T. Formally, f is a relation on S and T such that for each s ∈ S, there exists a unique t ∈ T
such that (s, t) ∈ f. S is the domain of f, and T is the range of f. {y | y = f(x) for some x ∈ S}
is the image of f.
Example 1.25.
• f : N → N, f(x) = 2x is injective.
• f : R⁺ → R⁺, f(x) = x² is injective.
• f : R → R, f(x) = x² is not injective since (−x)² = x².
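On finite domains, injectivity can be tested exhaustively; a sketch (the helper name is mine), restricting the infinite domains of Example 1.25 to small finite slices:

```python
def is_injective(f, domain):
    """f is injective on domain iff no two inputs share an image."""
    images = [f(x) for x in domain]
    return len(set(images)) == len(images)

nats = range(10)       # a finite slice of N
ints = range(-5, 6)    # a finite slice of Z, standing in for R

assert is_injective(lambda x: 2 * x, nats)       # f(x) = 2x is injective
assert not is_injective(lambda x: x * x, ints)   # f(x) = x^2 is not: (-x)^2 = x^2
print("injectivity checks match Example 1.25")
```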
Example 1.27.
Definition 1.30 (Set cardinality). Let S and T be two potentially infinite sets. S
and T have the same cardinality, written as |S| = |T|, if there exists a bijection f :
S → T (equivalently, if there exists a bijection f : T → S). T has cardinality larger
than or equal to S, written as |S| ≤ |T|, if there exists an injection g : S → T (equivalently,
if there exists a surjection g : T → S).
To “intuitively justify” Definition 1.30, see Figure 1.3. The next theorem
shows that this definition of cardinality corresponds well with our intuition for
size: if both sets are at least as large as the other, then they have the same
cardinality.
[Figure 1.3: (a) an injective function from X to Y; (b) a surjective function from X to Y; (c) a bijective function from X to Y.]
Proof. Q⁺ is clearly not finite, so we need a way to count Q⁺. Note that double
counting, triple counting, even counting some elements infinitely many times is
okay, as long as we eventually count all of Q⁺. I.e., we implicitly construct a
surjection f : N⁺ → Q⁺.
Let us count in the following way. We first order the rational numbers p/q by
the value of p + q; then we break ties by ordering according to p. The ordering
then looks like this:

1/1, 1/2, 2/1, 1/3, 2/2, 3/1, 1/4, 2/3, 3/2, 4/1, . . .
Implicitly, we have f(1) = 1/1, f(2) = 1/2, f(3) = 2/1, etc. Clearly, f is a surjection.
See Figure 1.4 for an illustration of f.
Figure 1.4: An infinite table containing all positive rational numbers (with
repetition). The red arrow represents how f traverses this table—how we
count the rationals.
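The traversal of this table can be written as a short generator: enumerate pairs (p, q) by the value of p + q, breaking ties by p (a sketch; repeats are allowed, exactly as in the proof):

```python
from itertools import count, islice

def rationals():
    """Yield pairs (p, q) representing p/q, ordered by p + q, ties broken by p."""
    for total in count(2):          # total = p + q, starting at 1 + 1
        for p in range(1, total):   # ties broken by increasing p
            q = total - p
            yield (p, q)            # repeats such as 2/2 are fine

print(list(islice(rationals(), 6)))
# first terms: 1/1, 1/2, 2/1, 1/3, 2/2, 3/1
```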
Proof. Here we use Cantor's diagonalization argument. Let S be the set of infinite
sequences (d1, d2, . . .) over digits {0, 1}. Clearly S is infinite. To show that S is
uncountable, assume for contradiction that there is a bijection f : N⁺ → S, i.e., we
can list the sequences of S in an infinite table, with f(n) in row n. Now construct a
new sequence s* by taking the diagonal of the above table and flipping all the digits.
Then for any n, s* is different from f(n) in the nth digit. This contradicts
the fact that f is a bijection.
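The diagonal construction can be made concrete: given any finite prefix of such a table of 0/1 sequences, flipping the diagonal yields a sequence that differs from the nth row in the nth digit (a sketch; the function name is mine):

```python
def diagonal_flip(rows):
    """Given n sequences of length >= n, return a sequence that differs
    from row n in digit n (0-indexed)."""
    return [1 - rows[n][n] for n in range(len(rows))]

table = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
]
s = diagonal_flip(table)
print(s)  # [1, 0, 1, 0]
assert all(s[n] != table[n][n] for n in range(len(table)))
```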
Theorem 1.36. The real interval [0, 1] (the set of real numbers between 0 and 1,
inclusive) is uncountable.
Proof. We will show that |[0, 1]| ≥ |S|, where S is the same set as in the proof of
Theorem 1.35. Treat each s = (d1, d2, . . .) ∈ S as the real number between 0
and 1 with the binary expansion 0.d1d2 · · · . Note that this does not establish a
bijection; some real numbers have two binary expansions, e.g., 0.1 = 0.0111 · · ·
in binary (similarly, in decimal expansion, we have 0.1 = 0.0999 · · · ).
We can overcome this “annoyance” in two ways:
• Since each real number can have at most two binary representations
(most have only one), we can easily extend the above argument to show
that |S| ≤ |[0, 2]| (i.e., map one representation into [0, 1] and the other
into [1, 2]). It remains to show that |[0, 1]| = |[0, 2]| (can you think of a bijection
here?).
The Continuum Hypothesis states that no such set exists. Gödel and
Cohen together showed (in 1940 and 1963) that this can neither be proven
nor disproven using the standard axioms underlying mathematics (we will
talk more about axioms when we get to logic).
Chapter 2
There are many forms of mathematical proofs. In this chapter we introduce several basic
types of proofs, with special emphasis on a technique called induction that is invaluable
to the study of discrete mathematics.
If n is even, then n = 2k for some integer k, and n² = (2k)² = 4k² = 2 · (2k²), which is even.
If n is odd, then n = 2k + 1 for some integer k, and n² = (2k + 1)² = 4k² + 4k + 1 = 2 · (2k² + 2k) + 1, which is odd.
There are also several forms of indirect proofs. A proof by contrapositive starts by
assuming that the conclusion Y is false, and deduces that the premise X must also be false
through a series of logical steps. See Claim 2.2 for an example.
Claim 2.2. Let n be an integer. If n² is even, then n is even.

Proof by contrapositive. Suppose that n is not even. Then by Claim 2.1, n² is not even
as well. (Yes, the proof ends here.)
A proof by contradiction, on the other hand, assumes both that the premise X is
true and that the conclusion Y is false, and reaches a logical contradiction.
We give another proof of Claim 2.2 as an example.
Proof by contradiction. Suppose that n² is even, but n is odd. Applying Claim
2.1, we see that n² must be odd. But n² cannot be both odd and even!
In their simplest forms, it may seem that a direct proof, a proof by contrapositive,
and a proof by contradiction are just restatements of each other; indeed, one can
always rephrase a direct proof or a proof by contrapositive as a proof by contradiction
(can you see how?). In more complicated proofs, however, choosing the "right" proof
technique sometimes simplifies or improves the aesthetics of a proof. Below is an
interesting use of proof by contradiction.

Theorem 2.3. √2 is irrational.
Proof by contradiction. Assume for contradiction that √2 is rational. Then there exist
integers p and q, with no common divisors, such that √2 = p/q (i.e., the reduced
fraction). Squaring both sides, we have:

2 = p²/q² ⟹ p² = 2q²

This means p² is even, and by Claim 2.2, p is even as well. Let us replace p by
2k. The equation becomes:

(2k)² = 2q² ⟹ 4k² = 2q² ⟹ q² = 2k²

This time, we conclude that q² is even, and so q is even as well. But this leads to
a contradiction, since p and q now share a common factor of 2.
We end the section with the (simplest form of the) AM-GM inequality.
Theorem 2.4. For all non-negative reals x and y, (x + y)/2 ≥ √(xy).

Proof by contradiction. Assume instead that (x + y)/2 < √(xy). Since both sides are
non-negative, we may square them:

(x + y)²/4 < xy
x² + 2xy + y² < 4xy
x² − 2xy + y² < 0
(x − y)² < 0

But a square is never negative; contradiction.
Note that the proof of Theorem 2.4 can be easily turned into a direct proof;
the proof of Theorem 2.3, on the other hand, cannot.
Proof by cases. There are only 6 different values of n. Let's try them all:
n   (n + 1)²   2ⁿ
0   1          1
1   4          2
2   9          4
3   16         8
4   25         16
5   36         32
Claim 2.6. For all real x, |x²| = |x|².
When presenting a proof by cases, make sure that all cases are covered! For
some theorems, we only need to construct one example that satisfies the theorem
statement.
Claim 2.7. There exists some n such that (n + 1)² ≤ 2ⁿ.
Proof by example. n = 6.
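Both the case table above and this example can be verified in a couple of lines (a sketch):

```python
# Cases n = 0..5: (n + 1)^2 >= 2^n, matching the table above
for n in range(6):
    assert (n + 1) ** 2 >= 2 ** n

# Claim 2.7's example: n = 6 gives (n + 1)^2 = 49 <= 64 = 2^n
assert (6 + 1) ** 2 <= 2 ** 6
print("both checks pass")
```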
The next proof does not explicitly construct the example asked for by the
theorem, but proves that such an example exists anyway. These types of proofs
(among others) are non-constructive.
Theorem 2.9. There exist irrational numbers x and y such that x^y is rational.

Proof. Consider the number z = √2^√2. If z is rational, we are done: take x = y = √2
(recall Theorem 2.3: √2 is irrational). Otherwise, if z is irrational, take x = z and y = √2; then

x^y = (√2^√2)^√2 = √2^(√2·√2) = (√2)² = 2

which is rational. In either case, the required x and y exist.
Theorem 2.10. Suppose the game of Chomp is played with a rectangular grid
strictly larger than 1 × 1. Player 1 (the first player) has a winning strategy.

Proof. Consider the following first move for player 1: eat the lower-right-most block.
We have two cases¹:
¹Here we use a well-known fact about 2-player, deterministic, finite-move games
without ties: any move is either a winning move (i.e., there is a strategy following this move
that forces a win), or allows the opponent to follow up with a winning move. See Theorem
2.14 later for a proof of this fact.
• Case 1: There is a winning strategy for player 1 starting with this move.
In this case we are done.
• Case 2: There is no winning strategy for player 1 starting with this move.
In this case there is a winning strategy for player 2 following this move.
But this winning strategy for player 2 is also a valid winning strategy for
player 1, since the next move made by player 2 can be mimicked by
player 1 (here we need the fact that the game is symmetric between the
players).
While we have just shown that Player 1 can always win in a game of Chomp,
no constructive strategy for Player 1 has been found for general rectangular
grids (i.e., you cannot buy a strategy guide in a store that tells you how to win
Chomp). For a few specific cases, though, we do know good strategies for
Player 1. E.g., given an n × n square grid, Player 1 starts by removing the (unique)
(n − 1) × (n − 1) block, leaving an L-shaped piece of chocolate with two "arms";
thereafter, Player 1 simply mirrors Player 2's moves, i.e., whenever Player 2 takes
a bite from one of the arms, Player 1 takes the same bite on the other arm.
2.3 Induction
We start with the most basic form of induction: induction over the natural
numbers. Suppose we want to show that a statement is true for all natural
numbers, e.g., for all n, 1 + 2 + · · · + n = n(n + 1)/2. The basic idea is to approach
the proof in two steps:

1. First prove that the statement is true for n = 1. This is called the base
case.

2. Next prove that whenever the statement is true for case n, then it is also
true for case n + 1. This is called the inductive step.
The base case shows that the statement is true for n = 1. Then, by repeatedly
applying the inductive step, we see that the statement is true for n = 2, and then
n = 3, and then n = 4, 5, . . . ; we just covered all the natural numbers!
Think of pushing over a long line of dominoes. The induction step is just like
setting up the dominoes; we make sure that if a domino falls, so will the next one.
The base case is then analogous to pushing down the first domino. The result?
All the dominoes fall.
Follow these steps to write an inductive proof:
1. Start by formulating the inductive hypothesis (i.e., what you want to prove). It
should be parametrized by a natural number. E.g., P(n): 1 + 2 + · · · + n =
n(n + 1)/2.

2. Show that P(base) is true for some appropriate base case. Usually the base
is 0 or 1.
3. Show that the inductive step is true, i.e., assume P(n) holds and prove that
P(n + 1) holds as well.

Voilà, we have just shown that P(n) holds for all n ≥ base. Note that the base
case does not always have to be 0 or 1; we can start by showing that P(n) is
true for n = 5; this, combined with the inductive step, shows that P(n) is
true for all n ≥ 5. Let's put our newfound power of inductive proofs to the test!
Claim 2.11. For all positive integers n, 1 + 2 + · · · + n = n(n + 1)/2.

Proof. Let the induction hypothesis P(n) be the statement "1 + 2 + · · · + n = n(n + 1)/2".

Base case: P(1) holds since 1 = 1 · 2/2.

Inductive Step: Assume P(n) is true; we wish to show that P(n + 1) is true as well:

1 + 2 + · · · + (n + 1) = (1 + 2 + · · · + n) + (n + 1)
= n(n + 1)/2 + (n + 1)                      [using P(n)]
= (n(n + 1) + 2(n + 1))/2 = (n + 1)(n + 2)/2
Claim 2.12. For all finite sets S, |P(S)| = 2^|S|.

Proof. Define our induction hypothesis P(n) to be true if for every finite set S of cardinality
|S| = n, |P(S)| = 2ⁿ.

Base case: P(0) is true since the only finite set of size 0 is the empty set
∅, and the power set of the empty set, P(∅) = {∅}, has cardinality 1 = 2⁰.

Inductive Step: Assume P(n) is true; we wish to show that P(n + 1) is true as well.
Consider a finite set S of cardinality n + 1. Pick an element e ∈ S, and consider S′ = S \
{e}. By the induction hypothesis, |P(S′)| = 2ⁿ.

Now consider P(S). Observe that a set in P(S) either contains e or it does not; furthermore,
there is a one-to-one correspondence between the sets containing e and the sets not
containing e (can you think of the bijection?). We have just partitioned P(S) into two equal-cardinality
subsets, one of which is P(S′). Therefore |P(S)| = 2|P(S′)| = 2ⁿ⁺¹.
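The count |P(S)| = 2^|S| is easy to check computationally with the standard itertools powerset recipe (a sketch; the function name is mine):

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of s, as tuples."""
    items = list(s)
    return list(chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)))

# |P(S)| = 2^|S| for a few small sets
for n in range(8):
    assert len(power_set(set(range(n)))) == 2 ** n
print("|P(S)| = 2^|S| verified for |S| = 0..7")
```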
Claim 2.13. The following two properties of graphs are equivalent (recall that these are
the definitions of transitivity on the graph of a relation):
1. For any three nodes x, y and z such that there is an edge from x to y
and from y to z, there exists an edge from x to z.
2. Whenever there is a path from node x to node y, there is also a direct
edge from x to y.
Proof. Clearly property 2 implies property 1. We use induction to show that property 1
implies property 2 as well. Let G be a graph on which property 1 holds. Define our
induction hypothesis P(n) to be true if for every path of length n in G from node x to node
y, there exists a direct edge from x to y.
Base case: P(1) is trivially true (a path of length 1 is already a direct edge).
Inductive Step: Assume P(n) is true; we wish to show that P(n + 1) is true as well.
Consider a path of length n + 1 from node x to node y, and let
z be the first node after x on the path. We now have a path of length n from
node z to y, and by the induction hypothesis, a direct edge from z to y. Now
that we have a direct edge from x to z and from z to y, property 1 implies
that there is a direct edge from x to y.
Theorem 2.14. In a deterministic, finite 2-player game of perfect information without ties,
either player 1 or player 2 has a winning strategy, i.e., a
strategy that guarantees a win.²,³
• If all of these games have a winning strategy for player 2⁴, then no matter
what move player 1 makes, player 2 has a winning strategy.

• If one of these games has a winning strategy for player 1, then player 1
has a winning strategy (by making the corresponding first move).
In the next example, induction is used to prove only a subset of the theorem to give
us a jump start; the theorem can then be completed using other
techniques.
³By deterministic, we mean the game has no randomness and depends only on the players'
moves (e.g., not backgammon). By finite, we mean the game always ends in some predetermined,
fixed number of moves; in chess, even though there are infinite sequences of moves
that avoid both checkmates and stalemates, various draw rules (e.g., one cannot have more than
100 consecutive moves without captures or pawn moves) ensure that chess is a finite game.
By perfect information, we mean that both players know each other's past moves (e.g., no
fog of war).

⁴By this we mean that player 1 of the n-move game (the next player to move) has a
winning strategy.
Theorem 2.15 (AM-GM inequality). For any sequence X = (x1, . . . , xn) of non-negative reals,

AM(X) = (x1 + · · · + xn)/n ≥ (x1 · x2 · · · xn)^(1/n) = GM(X)

Proof. Let us first prove the AM-GM inequality for values of n = 2ᵏ. Define our
induction hypothesis P(k) to be true if AM-GM holds for n = 2ᵏ.

Base case: P(0) (i.e., n = 1) trivially holds, and P(1) (i.e., n = 2) was shown in Theorem
2.4.
Inductive Step: Assume P(k) is true; we wish to show that P(k + 1) is
true as well. Given a sequence X = (x1, . . . , x_{2^(k+1)}) of length 2^(k+1), we split it into
two sequences of length 2ᵏ: X1 = (x1, . . . , x_{2^k}) and X2 = (x_{2^k + 1}, . . . , x_{2^(k+1)}). Then:
AM(X) = (1/2)(AM(X1) + AM(X2))
≥ (1/2)(GM(X1) + GM(X2))          [by the induction hypothesis P(k)]
≥ (GM(X1) · GM(X2))^(1/2)          [by P(1), i.e., AM-GM on the two values GM(X1) and GM(X2)]
= ((x1 · · · x_{2^k})^(1/2^k) · (x_{2^k + 1} · · · x_{2^(k+1)})^(1/2^k))^(1/2)
= (x1 · x2 · · · x_{2^(k+1)})^(1/2^(k+1)) = GM(X)
We are now ready to show the AM-GM inequality for sequences of all lengths. Given
a sequence X = (x1, . . . , xn) where n is not a power of 2, find the smallest k such that 2ᵏ >
n. Let α = AM(X), and consider a new sequence X′ = (x1, . . . , xn, α, . . . , α) of length 2ᵏ,
padded with 2ᵏ − n copies of α; verify that AM(X′) = AM(X) = α. Applying P(k) (the AM-GM
inequality for sequences of length 2ᵏ), we have:
α = AM(X′) ≥ GM(X′) = (x1 · · · xn · α^(2^k − n))^(1/2^k)

Raising both sides to the power 2ᵏ gives:

α^(2^k) ≥ x1 · · · xn · α^(2^k − n)
αⁿ ≥ x1 · · · xn
α = AM(X) ≥ (x1 · · · xn)^(1/n) = GM(X)
Note that for the inductive proof in Theorem 2.15, we needed to show both base cases
P(0) and P(1) to avoid circular arguments, since the inductive step relies on P(1) to be true.
Theorem 2.16. The first player has a winning strategy in the game of “coins on the table”.
Proof. Consider the following strategy for player 1 (the first player). Start first by putting a
penny centered on the table, and in all subsequent moves, simply mirror player 2's last
move (ie, place a penny diagonally opposite of player 2's last penny). We prove by induction
that player 1 can always put down a coin, and therefore will win eventually (when the table
runs out of space).
Define the induction hypothesis P(n) to be true if on the nth move of player
1, player 1 can put down a penny according to his strategy and leave the table symmetric
about the center (i.e., it looks the same if rotated 180 degrees).
Base case: P(1) holds since player 1 can always start by putting one
penny at the center of the table, leaving the table symmetric.
Inductive Step: Assume P(n) is true; we wish to show that P(n + 1) is
true as well. By the induction hypothesis, after player 1's nth move, the table
is symmetric. Therefore, if player 2 now puts down a penny, the diagonally opposite spot must be free of
pennies, allowing player 1 to set down a penny as well. Moreover, after player 1's move, the
table is back to being symmetric.
The Towers of Hanoi is a puzzle game in which there are three poles, and a number of
increasingly larger rings that are originally all stacked in order of size on the first pole, largest
at the bottom. The goal of the puzzle is to move all the rings to another pole (pole 2 or pole
3), with the rules that:

• You can only move one ring at a time, and it must be the topmost ring in one of the
three potential stacks.

• You can never place a ring on top of a smaller ring.

Theorem 2.17. The Towers of Hanoi with n rings can be solved in 2ⁿ − 1 moves.
Proof. Define the induction hypothesis P(n) to be true if the theorem statement is true for n
rings.
Base case: P(1) is clearly true. Just move the ring.
Inductive Step: Assume P(n) is true; we wish to show that P(n + 1) is true as well.
Number the rings 1 to n + 1, from smallest to largest (top to bottom on the original stack).
First move rings 1 to n from pole 1 to pole 2; this takes 2ⁿ − 1 steps by the induction
hypothesis P(n). Now move ring n + 1 from pole 1 to pole 3. Finally, move rings 1 to n from
pole 2 to pole 3; again, this takes 2ⁿ − 1 steps by the induction hypothesis P(n). In total we
have used (2ⁿ − 1) + 1 + (2ⁿ − 1) = 2ⁿ⁺¹ − 1 moves. (Convince yourself that this recursive
definition of moves never violates the rule that no ring can be placed on top of a smaller
ring.)
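The recursive solution in this proof translates directly into code; counting the moves confirms the 2ⁿ − 1 total, and we can even check the no-larger-ring-on-smaller rule mechanically (a sketch):

```python
def hanoi(n, source, target, spare, moves):
    """Move rings 1..n from source to target, recording each move,
    exactly as in the inductive proof."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # rings 1..n-1 out of the way
    moves.append((n, source, target))           # move ring n directly
    hanoi(n - 1, spare, target, source, moves)  # rings 1..n-1 back on top

moves = []
hanoi(4, 1, 3, 2, moves)
print(len(moves))  # 2^4 - 1 = 15

# Check that no ring is ever placed on a smaller one
poles = {1: list(range(4, 0, -1)), 2: [], 3: []}
for ring, src, dst in moves:
    assert poles[src][-1] == ring              # only the topmost ring moves
    assert not poles[dst] or poles[dst][-1] > ring
    poles[dst].append(poles[src].pop())
```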
Legends say that such a puzzle was found in a temple with n = 64 rings, left for the priests to
solve. With our solution, that would require 2⁶⁴ − 1 ≈ 1.8 × 10¹⁹ moves. Is our solution just silly,
taking too many moves?

Theorem 2.18. The Towers of Hanoi with n rings requires at least 2ⁿ − 1 moves to solve.
Good luck, priests!
Proof. Define the induction hypothesis P(n) to be true if the theorem statement
is true for n rings.
Base case: P(1) is clearly true. You need to move the ring.
Inductive Step: Assume P(n) is true; we wish to show that P(n + 1) is true as
well. Again we number the rings 1 to n + 1, from smallest to largest (top to bottom
on the original stack). Consider ring n + 1. It needs to be moved at some point.
Without loss of generality, assume its final destination
is pole 3. Let the kth move be the first move where ring n + 1 is moved away
from pole 1 (to pole 2 or 3), and let the k′th move be the last move where ring
n + 1 is moved to pole 3 (away from pole 1 or pole 2).

Before performing move k, all n other rings must first be moved to the
remaining free pole (pole 3 or 2); by the induction hypothesis P(n), 2ⁿ − 1 steps
are required before move k. Similarly, after performing move k′, all n other rings
must be on the remaining free pole (pole 2 or 1); by the induction hypothesis P(n),
2ⁿ − 1 steps are required after move k′ to complete the puzzle.
Even in the best case where k = k′ (i.e., they are the same move), we still need at least
(2ⁿ − 1) + 1 + (2ⁿ − 1) = 2ⁿ⁺¹ − 1 moves.
Strong Induction
Taking the dominoes analogy one step further, a large domino may require the
combined weight of all the previous dominoes toppling over before it topples over as well.
The mathematical equivalent of this idea is strong induction. To prove that a
statement P(n) is true for (a subset of) positive integers, the basic idea
is:
1. First prove that P(n) is true for some base values of n (e.g., n = 1).
These are the base cases.

2. Next prove that if P(k) is true for all k ≤ n, then P(n + 1) is true as well.
This is the inductive step.
How many base cases do we need? It roughly depends on the following factors:
• What is the theorem? Just like basic induction, if we only need P(n) to
be true for n ≥ 5, then we don't need base cases n < 5.

• What does the induction hypothesis need? Often, to show P(n + 1), instead
of requiring that P(k) be true for all 1 ≤ k ≤ n, we actually need only, say, P(n) and
P(n − 1) to be true. Then having the base case P(1) isn't enough for the
inductive step to prove P(3); P(2) is another required base case.
Claim 2.19. Suppose we have an unlimited supply of 3 cent and 5 cent coins.
Then we can pay any amount ≥ 8 cents.

Proof. Let P(n) be true if we can indeed form n cents with 3 cent and 5
cent coins.
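The strong-induction idea, build n + 1 cents by adding one 3 cent coin to a representation of n − 2 cents, with base cases 8, 9, and 10 cents, becomes a short recursive function (a sketch; the name pay and the particular base representations are mine):

```python
def pay(n):
    """Return (threes, fives) with 3*threes + 5*fives == n, for any n >= 8."""
    base = {8: (1, 1), 9: (3, 0), 10: (0, 2)}   # the three base cases
    if n in base:
        return base[n]
    threes, fives = pay(n - 3)                   # strong induction: reuse case n - 3
    return (threes + 1, fives)

for n in range(8, 200):
    threes, fives = pay(n)
    assert 3 * threes + 5 * fives == n
print("every amount from 8 to 199 cents is payable")
```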
Induction step: Assume P(k) is true for all 1 ≤ k ≤ n; we wish to show P(n + 1) is
true as well. Given a set W of n + 1 women in which x ∈ W is blonde, take
any two strict subsets A, B ⊊ W (in particular |A|, |B| < n + 1) such that they
both contain the blonde (x ∈ A, x ∈ B), and A ∪ B = W (no one is left out).
Applying the induction hypothesis to A and B, we conclude that all the
women in A and B are blonde, and so everyone in W is blonde.
⁶Another example is to revisit Claim 2.19. If we use the same proof to show that P(n) is
true for all n ≥ 3, without the additional base cases, the proof will be "seemingly correct".
What is the obvious contradiction?

⁷Hint: Can you trace the argument when n = 2?
Recurrence Relations
When an inductive definition generates a sequence (e.g., the factorial sequence
is 1, 1, 2, 6, 24, . . .), we call the definition a recurrence relation. We can generalize
inductive definitions and recurrence relations in much the same way that we
generalized inductive proofs with strong induction. For example, consider a
sequence defined by:
a0 = 1; a1 = 2; an = 4an−1 − 4an−2
According to the definition, the next few terms in the sequence will be
a2 = 4; a3 = 8
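The terms computed so far continue as powers of two; a quick script confirms the pattern an = 2ⁿ for many terms (and indeed 4 · 2ⁿ⁻¹ − 4 · 2ⁿ⁻² = 2ⁿ⁺¹ − 2ⁿ = 2ⁿ, so the closed form satisfies the recurrence). A sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def a(n):
    """a0 = 1, a1 = 2, an = 4*a(n-1) - 4*a(n-2)."""
    if n == 0:
        return 1
    if n == 1:
        return 2
    return 4 * a(n - 1) - 4 * a(n - 2)

print([a(n) for n in range(6)])  # [1, 2, 4, 8, 16, 32]
assert all(a(n) == 2 ** n for n in range(30))
```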
Remember that it is very important to check all the base cases (especially
since this proof uses strong induction). Let us consider another example:

b0 = 1; b1 = 1; bn = 4bn−1 − 3bn−2

From the recurrence part of the definition, it looks like the sequence (bn)n will
eventually outgrow the sequence (an)n. Based only on this intuition, let us
conjecture that bn = 3ⁿ.
Wow! Was that a lucky guess or what. Let us actually compute a few terms
of (bn)n to make sure. . .
b2 = 4b1 − 3b0 = 4 − 3 = 1,
b3 = 4b2 − 3b1 = 4 − 3 = 1,
. . .
Looks like, in fact, bn = 1 for all n (as an exercise, prove this by induction).
What went wrong with our earlier "proof"? Note that P(n − 1) is only well defined
if n ≥ 1, so the inductive step does not work when we try to show P(1) (when n =
0). As a result we need an extra base case to handle P(1); a simple check shows
that it is just not true: b1 = 1 ≠ 3¹ = 3. (On the other hand, if we define b′0 = 1
and b′1 = 3, with b′n = 4b′n−1 − 3b′n−2, then we can recycle our "faulty
proof" and show that b′n = 3ⁿ.)
In the examples so far, we guessed a closed-form formula for the sequences
(an)n and (bn)n, and then proved that our guesses were correct using
induction. For certain recurrence relations, there are direct methods for
computing a closed-form formula of the sequence.
Proof. The polynomial f(x) = x² − (c1x + c2) is called the characteristic
polynomial for the recurrence relation an = c1an−1 + c2an−2. Its significance
can be explained by the sequence (r⁰, r¹, . . .), where r is a root of f(x); we
claim that this sequence satisfies the recurrence relation (with base cases set
as r⁰ and r¹). Let P(n) be true if an = rⁿ.
Inductive Step: Assume P(k) is true for 0 ≤ k ≤ n; we wish to show
that P(n + 1) is true as well. Observe that:

c1an + c2an−1 = c1rⁿ + c2rⁿ⁻¹ = rⁿ⁻¹(c1r + c2) = rⁿ⁻¹ · r² = rⁿ⁺¹

where the second-to-last equality uses the fact that r is a root of f(x), i.e., r² = c1r + c2. So an+1 = c1an + c2an−1 = rⁿ⁺¹, as required.
Recall that there are two distinct roots, r1 and r2, so we actually have two
sequences that satisfy the recurrence relation (under proper base cases). In
fact, because the recurrence relation is linear (an depends linearly on an−1 and
an−2) and homogeneous (there is no constant term in the recurrence relation),
any sequence of the form an = αr1ⁿ + βr2ⁿ will satisfy the recurrence relation
(this can be shown using a similar inductive step as above).
Finally, do sequences of the form an = αr1ⁿ + βr2ⁿ cover all possible base
cases? The answer is yes. Given any base case a0 = a0*, a1 = a1*, we can solve
for the unique values of α and β using the linear system:

a0* = αr1⁰ + βr2⁰ = α + β
a1* = αr1¹ + βr2¹ = αr1 + βr2

The studious reader should check that this linear system always has a unique
solution (say, by checking that the determinant of the system is non-zero).
In the case that f(x) has duplicate roots, say when a root r has multiplicity m, in order to still
have a total of k distinct sequences, we associate the following m sequences with r:

(r⁰, r¹, r², . . . , rⁿ, . . .)
(0 · r⁰, 1 · r¹, 2 · r², . . . , n · rⁿ, . . .)
(0² · r⁰, 1² · r¹, 2² · r², . . . , n² · rⁿ, . . .)
. . .
(0ᵐ⁻¹ · r⁰, 1ᵐ⁻¹ · r¹, 2ᵐ⁻¹ · r², . . . , nᵐ⁻¹ · rⁿ, . . .)
For example, if f(x) has degree 2 and has a unique root r with multiplicity 2, then the general
form solution to the recurrence is
an = αrⁿ + βnrⁿ
We omit the proof of this general construction. Interestingly, the same technique is used in
many other branches of mathematics (for example, to solve linear ordinary differential
equations).
As an example, let us derive a closed-form expression for the famous Fibonacci numbers.

f0 = 0; f1 = 1; fn = fn−1 + fn−2

Then

fn = (1/√5) · ((1 + √5)/2)^n − (1/√5) · ((1 − √5)/2)^n        (2.1)
Proof. It is probably hard to guess (2.1); we will derive it from scratch. The characteristic
polynomial here is f(x) = x² − (x + 1), which has roots

(1 + √5)/2 and (1 − √5)/2

so the general-form solution is

fn = α · ((1 + √5)/2)^n + β · ((1 − √5)/2)^n
Figure 2.1: Approximating the golden ratio with rectangles whose side lengths are
consecutive elements of the Fibonacci sequence. Do the larger rectangles look more
pleasing than the smaller rectangles to you?
Plugging in the base cases f0 = 0 and f1 = 1:

0 = α + β
1 = α · (1 + √5)/2 + β · (1 − √5)/2
For large n,

fn ≈ (1/√5) · ((1 + √5)/2)^n

because the other term approaches zero. This in turn implies that

lim_{n→∞} fn+1/fn = (1 + √5)/2

which is the golden ratio. It is widely believed that a rectangle whose ratio (length divided by
width) is golden is pleasing to the eye; as a result, the golden ratio can be found in many
artworks and architecture throughout history.
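Formula (2.1) and the golden-ratio limit can both be checked numerically (a sketch):

```python
from math import sqrt, isclose

phi = (1 + sqrt(5)) / 2   # (1 + √5)/2
psi = (1 - sqrt(5)) / 2   # (1 − √5)/2

def fib_closed(n):
    """Formula (2.1)."""
    return (phi ** n - psi ** n) / sqrt(5)

fib = [0, 1]
for _ in range(30):
    fib.append(fib[-1] + fib[-2])

assert all(isclose(fib_closed(n), fib[n], abs_tol=1e-6) for n in range(30))
assert isclose(fib[21] / fib[20], phi, abs_tol=1e-6)   # ratio approaches the golden ratio
print("formula (2.1) and the golden-ratio limit check out")
```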
We then conclude that a googol (10¹⁰⁰) grains of sand is not a heap of sand
(this is more than the number of atoms in the observable universe by some
estimates). What went wrong? The base case and the inductive step are perfectly
valid! There are many "solutions" to this paradox, one of which is to blame
it on the vagueness of the word "heap"; the notion of vagueness is itself a topic
of interest in philosophy.
Alice and Bob to quote is $2. Would you quote $2 (and do you think you are
“rational”)?
Claim 2.22. All the muddy children say yes in the nth round of questioning
(and not earlier) if and only if there are n muddy children.
Inductive Step: Assume P(k) is true for 0 ≤ k ≤ n; we wish to show P(n + 1).
Suppose there are exactly n + 1 muddy children. Since there are more than n
muddy children, it follows by the induction hypothesis that no one will speak
before round n + 1. From the view of the muddy children, they see n other
muddy kids, and know from the start that there are either n or n + 1 muddy
children in total (depending on whether they themselves are muddy). But, by
the induction hypothesis, they know that if there were n muddy children, then
someone would have said yes in round n; since no one has said anything yet,
each muddy child deduces that he/she is indeed muddy and says yes in round
n + 1. Now suppose there are strictly more than n + 1 muddy children. In this
case, everyone sees at least n + 1 muddy children already. By the induction
hypothesis, every child knows from the
beginning that no one will speak up in the first n rounds. Thus in the (n + 1)st
round, they have no more information about who is muddy than when the father
first asked the question, and thus they cannot say yes.
Looking carefully at the proof, we are not making the same mistakes as before
in our examples for strong induction: to show P(n), we rely only on P(n/2), which
always satisfies 0 ≤ n/2 < n, so we are not simply missing base cases.
The only conclusion is that induction just “does not make sense” for the rational
numbers.
C programs. On the other hand, we can inductively define and reason about C
programs. Let us focus on a simpler example: the set of (very limited) arithmetic
expressions defined by the following context free grammar:
Notice that this inductive definition does not give us a sequence of arithmetic
expressions! We can also define the value of an arithmetic expression inductively:
Base Case: The arithmetic expression “0” has value 0, and the expression
“1” has value 1.
Inductive Definition: An arithmetic expression of the form “(expr1+expr2)” has
value equal to the sum of the values of expr1 and expr2. Similarly, an
arithmetic expression of the form “(expr1 × expr2)” has value equal to the
product of the values of expr1 and expr2.
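The base case and inductive definition above can be mirrored directly in code. The following is a small sketch of my own (not from the text), using `*` in place of × and finding the top-level operator by tracking parenthesis depth:

```python
def value(expr: str) -> int:
    # Base cases: the expressions "0" and "1".
    if expr == "0":
        return 0
    if expr == "1":
        return 1
    # Inductive case: expr has the form "(e1+e2)" or "(e1*e2)".
    assert expr[0] == "(" and expr[-1] == ")"
    depth = 0
    for i, ch in enumerate(expr[1:-1], start=1):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and ch in "+*":
            e1, e2 = expr[1:i], expr[i + 1:-1]
            return value(e1) + value(e2) if ch == "+" else value(e1) * value(e2)
    raise ValueError("not a valid arithmetic expression")

print(value("(1+(0*1))"))        # → 1
print(value("((1+1)*(1+1))"))    # → 4
```

Note how the recursion terminates in finite time: each recursive call is on a strictly shorter sub-expression.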
We can even use induction to prove, for example, that any expression of length
n must have value ≤ 2^n.
that “(1 + (0×1))” and “(1 + 1)” are valid arithmetic expressions. Again, this
recursive verification will end in finite time.
Finally, let us consider our faulty example with rational numbers. To show
that the number 2/3 in reduced form is an even number over an odd number,
we need to check the claim for the number 1/3, and for that we need to check
1/6, and 1/12, and so on; this never ends, so we never have a complete proof of
the desired (faulty) fact.
Chapter 3
Number Theory
Number theory is the study of numbers (in particular the integers), and is one of the
purest branches of mathematics. Regardless, it has many applications in computer
science, particularly in cryptography, the underlying tool behind modern services
such as secure e-commerce. In this chapter, we will touch on the very basics of
number theory, and put an emphasis on its applications to cryptography.
3.1 Divisibility
A fundamental relationship between two numbers is whether or not one divides another.
Proof. We show only item 1; the other proofs are similar (HW). By definition,
Corollary 3.4. Let a, b, c ∈ Z. If a|b and a|c, then a|(mb + nc) for any m, n ∈ Z.
We learn in elementary school that even when integers don't divide evenly,
we can compute the quotient and the remainder.
Theorem 3.5 (Division Algorithm). For any a ∈ Z and d ∈ N+, there exist unique
q, r ∈ Z such that a = dq + r and 0 ≤ r < d.
q is called the quotient and denoted by q = a div d. r
is called the remainder and denoted by r = a mod d.
Proof. Given a ∈ Z and d ∈ N+, let q = ⌊a/d⌋ (the greatest integer ≤ a/d), and let
r = a − dq. By choice of q and r, we have a = dq + r. We also have 0 ≤ r < d,
because q is the largest integer such that dq ≤ a. It remains to show uniqueness.
Suppose a = dq + r = dq′ + r′, with both pairs satisfying the constraints. Then

    dq + r = dq′ + r′  ⇒  d · (q − q′) = r′ − r.

This implies that d|(r′ − r). But −(d−1) ≤ r′ − r ≤ d−1 (because 0 ≤ r, r′ < d), and the
only number divisible by d between −(d−1) and d−1 is 0. Therefore we must
have r = r′, which in turn implies that q = q′.
Example 3.7.
Euclid designed one of the first known algorithms in history (for any problem)
to compute the greatest common divisor:
Example 3.8. Let's trace Euclid's algorithm on inputs 414 and 662.
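The algorithm can be sketched as follows (a hedged rendering of the text's EuclidAlg; only the recursive structure is taken from the surrounding discussion):

```python
def euclid_alg(a: int, b: int) -> int:
    """gcd(a, b), using the fact that the common divisors of (a, b)
    and of (b, a mod b) coincide (Lemma 3.9)."""
    if b == 0:
        return a
    return euclid_alg(b, a % b)

# Tracing the recursive calls on inputs 414 and 662:
# (414, 662) -> (662, 414) -> (414, 248) -> (248, 166)
#            -> (166, 82) -> (82, 2) -> (2, 0) -> 2
print(euclid_alg(414, 662))  # → 2
```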
We now prove that Euclid's algorithm is correct in two steps. First, we show
that if the algorithm terminates, then it does output the correct greatest common
divisor. Next, we show that Euclid's algorithm always terminates (and does so
rather quickly).
Proof. It is enough to show that the common divisors of a and b are the same
as the common divisors of b and (a mod b). If so, then the two pairs of numbers
must also share the same greatest common divisor.
Proof. This can be shown by induction, using Lemma 3.9 as the inductive step.
(What would be the base case?)
Claim 3.11. For every two recursive calls made by EuclidAlg, the first argument
a is halved.
Theorem 3.13 shows that we can give a certificate for the greatest common
divisor. From Corollary 3.4, we already know that any common divisor of a and
b also divides sa + tb. Thus, if we can identify a common divisor d of a and b,
and show that d = sa + tb for some s and t, this demonstrates d is in fact the
greatest common divisor (d = gcd(a, b)). And there is more good news! This
certificate can be produced by slightly modifying Euclid's algorithm (often called
the extended Euclid's algorithm); this also constitutes
as a constructive proof of Theorem 3.13. We omit the proof here and give an
example instead.
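A minimal sketch of the extended Euclid's algorithm (the function name and recursion are my own; the text only describes the idea): alongside g = gcd(a, b) it returns the certificate coefficients s, t with sa + tb = g.

```python
def extended_euclid(a: int, b: int):
    """Return (g, s, t) with g = gcd(a, b) = s*a + t*b."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_euclid(b, a % b)
    # g = s*b + t*(a mod b) = s*b + t*(a - (a//b)*b)
    #   = t*a + (s - (a//b)*t)*b
    return g, t, s - (a // b) * t

g, s, t = extended_euclid(414, 662)
print(g, s, t)  # → 2 8 -5, and indeed 8*414 - 5*662 = 2
```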
Claim 3.16. a ≡ b (mod m) if and only if a and b have the same remainder
when divided by m, i.e., a mod m = b mod m.
Proof. We start with the if direction. Assume a and b have the same remainder
when divided by m. That is, a = q1m + r and b = q2m + r. Then we have

    a − b = (q1 − q2)m

so m|(a − b).
For the only if direction, we start by assuming m|(a − b). Using the division algorithm,
let a = q1m + r1, b = q2m + r2 with 0 ≤ r1, r2 < m. Then a − b = (q1 − q2)m + (r1 − r2).
Since m divides a − b and clearly divides (q1 − q2)m, it follows by Corollary 3.4 that

    m|(r1 − r2)

But −(m−1) ≤ r1 − r2 ≤ m−1, so we must have r1 = r2.
The next theorem shows that addition and multiplication “carry over” to the modular
world (specifically, addition and multiplication can be computed before or after computing
the remainder).
Theorem 3.17. Let a, b, c, d ∈ Z and m ∈ N+. If a ≡ b (mod m) and c ≡ d (mod m), then:

1. a + c ≡ b + d (mod m)
2. ac ≡ bd (mod m)
For item 2, using Claim 3.16, we have unique integers r and r′ such that

    a = q1m + r,    b = q2m + r
    c = q1′m + r′,  d = q2′m + r′
Clever usage of Theorem 3.17 can simplify many modular arithmetic calculations.
Example 3.18.
Note that exponentiation was not included in Theorem 3.17. Because multiplication
does carry over, we have a^e ≡ (a mod n)^e (mod n); we have already used this
fact in the example. However, in general we cannot perform modular operations
on the exponent first, i.e., a^e ≢ a^(e mod n) (mod n).
Hashing. The age-old setting that calls for hashing is simple. How do we efficiently
retrieve (store/delete) a large number of records? Take for example student records,
where each record has a unique 10-digit student ID. We cannot afford (or do not want) a
table of size 10^10 to index all the student records by their ID. The solution?
Store the records in an array of size N where N is a bit bigger than the expected
number of students. The record for student ID is then stored in position h(ID) where h
is a hash function that maps IDs to {1, . . . , N}. One very simple hash function
would be
h(k) = k mod N
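A toy sketch of this idea (my own illustration, not from the text; collision handling by chaining is an added assumption, and the code indexes buckets {0, . . . , N−1} rather than {1, . . . , N}):

```python
N = 1009  # array size, a bit bigger than the expected number of students

def h(k: int) -> int:
    # The simple hash function above: h(k) = k mod N.
    return k % N

table = [[] for _ in range(N)]  # N buckets; chaining handles collisions

def store(student_id: int, record: str) -> None:
    table[h(student_id)].append((student_id, record))

def lookup(student_id: int):
    for sid, record in table[h(student_id)]:
        if sid == student_id:
            return record
    return None

store(4031952719, "Alice")
print(lookup(4031952719))  # → Alice
```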
ISBN. Most published books today have a 10 or 13-digit ISBN number; we will focus
on the 10-digit version here. The ISBN identifies the country of publication, the
publisher, and other useful data, but all this information is stored in the first 9 digits;
the 10th digit is a redundancy check for errors.
The actual technical implementation is done using modular arithmetic. Let
a1, . . . , a10 be the digits of an ISBN number. In order to be a valid ISBN number,
it must pass the check:

    a1 + 2a2 + 3a3 + · · · + 10a10 ≡ 0 (mod 11)
• a transposition occurred, ie, two digits were swapped (this is why in the
check, we multiply ai by i).
If 2 or more errors occur, the errors may cancel out and the check may still
pass; fortunately, more robust solutions exist in the study of error correcting
codes.
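The weighted check above, as I read it, can be sketched as follows (the digit 'X', standing for the value 10, is a convention assumed here, not stated in the text):

```python
def isbn10_valid(isbn: str) -> bool:
    """Check sum of i * a_i ≡ 0 (mod 11) over the ten digits."""
    digits = [10 if c == "X" else int(c) for c in isbn if c != "-"]
    if len(digits) != 10:
        return False
    return sum(i * a for i, a in enumerate(digits, start=1)) % 11 == 0

print(isbn10_valid("0-306-40615-2"))  # → True
print(isbn10_valid("0-306-40615-3"))  # → False (one digit was altered)
```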
Cast out 9s. Observe that a number is congruent to the sum of its digits modulo
9. (Can you show this? Hint: start by showing 10^n ≡ 1 (mod 9) for any n ∈ N+.)
The same fact also holds modulo 3. This allows us to check if a computation
is correct by quickly performing the same computation modulo 9. (Note that
incorrect computations might still pass, so this check only increases our
confidence that the computation is correct.)
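A quick sketch of casting out 9s in action (the numbers here are my own example):

```python
def digit_sum(n: int) -> int:
    return sum(int(d) for d in str(n))

a, b = 3141, 2718
claimed = 8537298   # a deliberately wrong value for a * b

# If the multiplication were correct, the digit sums would agree mod 9,
# since every number is congruent to its digit sum modulo 9.
lhs = digit_sum(a) * digit_sum(b) % 9
rhs = digit_sum(claimed) % 9
print(lhs == rhs)   # → False, so the claimed product must be wrong
```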
• Choose a modulus m ∈ N+,
• a multiplier a ∈ {2, 3, . . . , m − 1}, and
• an increment c ∈ Zm = {0, 1, . . . , m − 1}
The goal is to design a system (an encryption scheme) so that Alice may “scramble” her messages to Bob (an
encryption algorithm) in a way that no one except Bob may “unscramble” it (a
decryption algorithm).
4. The scheme is correct; that is, decrypting a valid cipher-text should output
the original plain text. Formally we require that for all m ÿ M, k ÿ K,
Deck(Enck(m)) = m.
To use a private encryption scheme, Alice and Bob first meet in advance
and run k ÿ Gen together to agree on the secret key k. The next time Alice has
a private message m for Bob, she sends c = Enck(m) over the insecure channel.
Once Bob receives the cipher-text c, he decrypts it by running m = Deck(c) to
read the original message.
For example, if k = 3, then we substitute each letter in the plain-text according to the
following table:
plain-text: ABCDEFGHIJKLMNOPQRSTUVWXYZ
cipher-text: DEFGHIJKLMNOPQRSTUVWXYZABC
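The substitution table above is just a shift by k modulo 26, which can be sketched as follows (my own rendering of the Caesar cipher, restricted to upper-case letters):

```python
import string

ALPHABET = string.ascii_uppercase  # A..Z, viewed as Z_26

def enc(k: int, plaintext: str) -> str:
    # Shift each letter forward by k positions, wrapping around.
    return "".join(ALPHABET[(ALPHABET.index(c) + k) % 26] for c in plaintext)

def dec(k: int, ciphertext: str) -> str:
    # Decryption is just a shift by -k.
    return enc(-k, ciphertext)

print(enc(3, "ATTACK"))          # → DWWDFN
print(dec(3, enc(3, "ATTACK")))  # → ATTACK
```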
Proof. Correctness is trivial, since for every letter m and every key k, Deck(Enck(m)) = ((m + k) mod 26 − k) mod 26 = m.
Nowadays, we know the Caesar Cipher is not a very good encryption scheme. There
are numerous freely available programs or applets on-line that can crack the Caesar Cipher.
(In fact, you can do it too! After all, there are only 26 keys to try.) The next example is on
the other extreme; it is a perfectly secure private-key encryption scheme. We wait until a
later chapter to formalize the notion of perfect secrecy; for now, we simply point out that at
least the key length of the one-time pad grows with the message length (i.e., there are not just
26 keys).
Example 3.22 (One-Time Pad). In the one-time pad encryption scheme, the key is required
to be as long as the message. During encryption, the entire key is used to mask the plain-
text, and therefore “perfectly hides” the plain-text.
Formally, let M = {0, 1}^n, K = {0, 1}^n, and Enck(m) = m ⊕ k (bit-wise exclusive-or).
plain-text:    0100000100101011
key:         ⊕ 1010101001010101
cipher-text:   1110101101111110
Proof. Again correctness is trivial, since for all mi ∈ {0, 1} and all ki ∈ {0, 1}, mi = ((mi + ki)
mod 2 + ki) mod 2 (equivalently, mi = mi ⊕ ki ⊕ ki).
Private-key encryption is limited by the precondition that Alice and Bob must meet
in advance to (securely) exchange a private key. Is this an inherent cost for achieving
secure communication?
First let us ask: can parties communicate securely without having secrets?
Unfortunately, the answer is no. Alice must encrypt her message based on
some secret key known only to Bob; otherwise, everyone can run the same decryption
procedure as Bob to view the private message. Does this mean Alice has to meet with
Bob in advance?
Fortunately, the answer this time around is no. The crucial observation is that maybe
we don't need the whole key to encrypt a message. Public-key cryptography, first
proposed by Diffie and Hellman in 1976, splits the key into two parts: an encryption key,
called public-key, and a decryption key, called the secret-key. In our example, this
allows Bob to generate his own public and private key-pair without meeting with Alice.
Bob can then publish his public-key for anyone to find, including Alice, while keeping
his secret-key to himself. Now when Alice has a private message for Bob, she can
encrypt it using Bob's public-key, and be safely assured that only Bob can decipher her
message.
To learn more about public-key encryption, we need more number theory;
in particular, we need the notion of prime numbers.
3.3 Primes
Primes are numbers that have the absolute minimum number of divisors; they are only
divisible by themselves and 1. Composite numbers are just numbers that are not prime.
Formally:
Definition 3.24 (Primes and Composites). Let p ∈ N and p ≥ 2. p is prime if its only
positive divisors are 1 and p. Otherwise p is composite (i.e., there exists some a such
that 1 < a < p and a|p).
Note that the definition of primes and composites excludes the numbers 0 and 1.
Also note that, if n is composite, we may assume that there exists some a such that 1 <
a ≤ √n and a|n. This is because given a divisor d|n with 1 < d < n, then 1 < n/d < n is
also a divisor of n; moreover, one of d or n/d must be at most √n.
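The observation above justifies trial division up to √n as a (slow but correct) primality test; a sketch:

```python
def is_prime(n: int) -> bool:
    if n < 2:
        return False          # 0 and 1 are neither prime nor composite
    a = 2
    while a * a <= n:         # only divisors a <= sqrt(n) need testing
        if n % a == 0:
            return False      # n is composite, with witness a
        a += 1
    return True

print([p for p in range(2, 30) if is_prime(p)])
# → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```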
Distribution of Primes
How many primes are there? Euclid first showed that there are infinitely many
primes.
Proof. Assume the contrary, that there exists a finite number of primes p1, . . . , pn.
Consider q = p1p2 · · · pn + 1. By assumption, q is not prime. Let a > 1 be the
smallest number that divides q. Then a must be prime (or else it could not be the
smallest, by transitivity of divisibility), i.e., a = pi for some i (since p1, . . . , pn are
all the primes).
We have pi |q. Since q = p1p2 · · · pn + 1 and pi clearly divides p1p2 · · · pn,
we conclude by Corollary 3.4 that pi |1, a contradiction.
Not only are there infinitely many primes, primes are actually common
(enough).
Theorem 3.27 (Prime Number Theorem). Let π(N) be the number of primes ≤ N.
Then

    lim_{N→∞} π(N) / (N / ln N) = 1
We omit the proof, as it is out of the scope of this course. We can interpret
the theorem as follows: there exist (small) constants c1 and c2 such that

    c1 · N/ln N ≤ π(N) ≤ c2 · N/ln N

If we consider n-digit numbers, i.e., 0 ≤ x < 10^n, roughly 10^n / log 10^n = 10^n / n
numbers are prime. In other words, roughly a 1/n fraction of n-digit numbers are
prime.
Given that prime numbers are dense (enough), here is a method for finding
a random n-digit prime:
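A sketch of the method (my own rendering of the idea; since roughly a 1/n fraction of n-digit numbers are prime, repeated sampling succeeds quickly in expectation):

```python
import random

def is_prime(n: int) -> bool:
    # Trial division up to sqrt(n); fine for modest n, though real
    # implementations would use a probabilistic test instead.
    if n < 2:
        return False
    a = 2
    while a * a <= n:
        if n % a == 0:
            return False
        a += 1
    return True

def random_ndigit_prime(n: int) -> int:
    while True:  # O(n) expected attempts, by the Prime Number Theorem
        x = random.randrange(10 ** (n - 1), 10 ** n)
        if is_prime(x):
            return x

p = random_ndigit_prime(6)
print(p)  # some random 6-digit prime
```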
Relative Primality
Primes are numbers that lack divisors. A related notion is relative primality, where a pair of
numbers lacks common divisors.
Definition 3.28 (Relative Primality). Two positive integers a, b ∈ N+ are relatively prime if
gcd(a, b) = 1.
Clearly, a (standalone) prime is relatively prime to any other number except a multiple
of itself. From Theorem 3.13 (i.e., from Euclid's algorithm), we have an alternative
characterization of relatively prime numbers:
Corollary 3.29. Two positive integers a, b ∈ N+ are relatively prime if and only if there exist
s, t ∈ Z such that sa + tb = 1.
Theorem 3.30. Let a, b ∈ N+. There exists an element a^(−1) such that a · a^(−1) ≡ 1 (mod b)
if and only if a and b are relatively prime.
Proof of Theorem 3.30. If direction. If a and b are relatively prime, then there exist s, t such
that sa + tb = 1 (Corollary 3.29). Rearranging terms,

    sa = 1 − tb ≡ 1 (mod b)

therefore s = a^(−1).
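When the gcd is 1, the inverse can be found from the coefficients of Theorem 3.13; Python's built-in `pow` computes it directly (internally via the extended Euclidean algorithm), which gives a quick check:

```python
a, b = 5, 21            # gcd(5, 21) = 1, so an inverse exists
inv = pow(a, -1, b)     # a^(-1) mod b
print(inv)              # → 17
print(a * inv % b)      # → 1, confirming 5 * 17 ≡ 1 (mod 21)
```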
Lemma 3.31. If a and b are relatively prime and a|bc, then a|c.
Proof. Because a and b are relatively prime, there exist s and t such that sa +
tb = 1. Multiplying both sides by c gives sac + tbc = c. Since a divides the left
hand side (a divides sac, and a divides tbc because a|bc), a must also divide the right hand side (i.e., c).
Lemma 3.32. If p is prime and p | a1a2 · · · an, then there exists some 1 ≤ j ≤ n
such that p|aj.
Proof. We proceed by induction. Let P(n) be the statement: for every prime p,
if p | a1a2 · · · an then there exists some 1 ≤ j ≤ n such that p|aj. In the inductive
step, write a1 · · · an = (a1 · · · an−1) · an. If p|an we are done; otherwise p and an are
relatively prime, so by Lemma 3.31, p | a1 · · · an−1. We can then use the induction hypothesis to show that
there exists 1 ≤ j ≤ n such that p|aj.
Inductive Step: Assume P(k) holds for all 2 ≤ k ≤ n − 1. We will show P(n) (for n ≥ 3). If
n is prime, then we have the factorization of n = n, and this is unique (anything else would
contradict that n is prime). If n is composite, we show existence and uniqueness of the
factorization separately:
Existence. If n is composite, then there exist a, b such that 2 ≤ a, b ≤ n−1 and n = ab.
Apply the induction hypothesis to P(a) and P(b) to get their respective factorizations,
and “merge them” for a factorization of n.
Since 1 < n′ = n/p1 < n, the induction hypothesis shows that the two factorizations of
n′ are actually the same, and so the two factorizations of n are also the same (adding
back the term p1 = qj0).
Open Problems
Number theory is a field of study that is rife with (very hard) open problems.
Here is a small sample of open problems regarding primes.
Goldbach's Conjecture, first formulated way back in the 1700's, states that any positive
even integer other than 2 can be expressed as the sum of two primes. For example 4 = 2
+ 2, 6 = 3 + 3, 8 = 3 + 5 and 22 = 5 + 17.
With modern computing power, the conjecture has been verified for all even integers up to
≈ 10^17.
The Twin Prime Conjecture states that there are infinitely many pairs of primes that differ by
2 (called twins). For example, 3 and 5, 5 and 7, 11 and 13, or 41 and 43 are all twin primes.
A similar conjecture states that there are infinitely many safe primes or Sophie-Germain
primes — pairs of primes of the form p and 2p + 1 (p is called the Sophie-Germain prime,
and 2p + 1 is called the safe prime). For example, consider 3 and 7, 11 and 23, or 23 and
47. In cryptographic applications, the use of safe primes sometimes provides more security
guarantees.
For example, φ(6) = 2 (the relatively prime numbers are 1 and 5), and φ(7) = 6 (the
relatively prime numbers are 1, 2, 3, 4, 5, and 6). By definition φ(1) = 1 (although this is
rather uninteresting). The Euler φ function can be computed easily on any integer for
which we know the unique prime factorization (computing the unique prime factorization
itself may be difficult). In fact, if the prime factorization of n is n = p1^k1 · p2^k2 · · · pm^km, then

    φ(n) = n · ∏_i (1 − 1/pi) = ∏_i (pi^ki − pi^(ki−1))        (3.5)
While we won't prove (3.5) here (it is an interesting counting exercise), we do state and
show the following special cases.
Claim 3.35. If p is a prime, then φ(p) = p − 1. If n = pq where p ≠ q are both primes, then
φ(n) = (p − 1)(q − 1).

    φ(n) = n − p − q + 1 = pq − p − q + 1 = (p − 1)(q − 1)
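Formula (3.5) translates directly into code once a factorization is known; here is a sketch (the trial-division factorizer is my own addition, for illustration only):

```python
def factorize(n: int) -> dict:
    """Prime factorization of n as {prime: exponent}, by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def phi(n: int) -> int:
    result = 1
    for p, k in factorize(n).items():
        result *= p ** k - p ** (k - 1)   # the product in (3.5)
    return result

print(phi(6), phi(7), phi(21))  # → 2 6 12
```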
By Theorem 3.30, X is the set of all numbers that have multiplicative inverses
modulo n; this is useful in the rest of the proof.
We first claim that X = aX (this does indeed hold in the example). We
prove this by showing X ⊆ aX and aX ⊆ X.
X ⊆ aX. Given x ∈ X, we will show that x ∈ aX. Consider the number a^(−1)·x
mod n (recall that a^(−1) is the multiplicative inverse of a, and exists
since gcd(a, n) = 1). We claim that a^(−1)·x mod n ∈ X, since it has the
multiplicative inverse x^(−1)·a. Consequently, a·(a^(−1)·x) ≡ x (mod n), so x ∈ aX.
Knowing aX = X, we have

    ∏_{x∈X} x ≡ ∏_{y∈aX} y ≡ ∏_{x∈X} ax ≡ a^φ(n) · ∏_{x∈X} x (mod n)

Since each x ∈ X has a multiplicative inverse modulo n, we can multiply both sides
by the inverse of ∏_{x∈X} x to conclude that

    1 ≡ a^φ(n) (mod n)

Corollary 3.38. If gcd(a, n) = 1, then a^x ≡ a^(x mod φ(n)) (mod n).
In class we showed this differently. Observe that |aX| ≤ |X| (since elements in aX are
“spawned” from X). Knowing that X ⊆ aX, |aX| ≤ |X|, and the fact that these are finite sets
allows us to conclude that aX ⊆ X (in fact we can directly conclude that aX = X).
Proof. Let x = qφ(n) + r be the result of dividing x by φ(n) with remainder (recall that r is exactly
x mod φ(n)). Then

    a^x = a^(qφ(n)+r) = (a^φ(n))^q · a^r ≡ 1^q · a^r = a^(x mod φ(n)) (mod n)
Example 3.39. Euler's function can speed up modular exponentiation by a lot. Let n = 21 = 3 ×
7. We have φ(n) = 2 × 6 = 12. Then

    2^999 ≡ 2^(999 mod 12) = 2^3 = 8 (mod 21)
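Example 3.39 is easy to verify with Python's three-argument `pow` (a quick check, not part of the text):

```python
n, phi_n = 21, 12
full = pow(2, 999, n)              # direct modular exponentiation
reduced = pow(2, 999 % phi_n, n)   # exponent reduced via Corollary 3.38
print(full, reduced)               # → 8 8
```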
• If n is not prime, then with some probability, say 1/2, the algorithm may still output YES
incorrectly.
Looking at it from another point of view, if the algorithm ever says n is not prime, then n is
definitely not prime. With such an algorithm, we can ensure that n is prime with very high
probability: run the algorithm 200 times, and believe n is prime if the output is always YES. If n
is prime, we always correctly conclude that it is indeed prime. If n is composite, then we would
only incorrectly view it as a prime with probability (1/2)^200 (which is so small that it is more likely
that we encounter some sort of hardware error).
How might we design such an algorithm? A first approach, on input n, is to pick a random
number 1 < a < n and output YES if and only if gcd(a, n) = 1.
Certainly if n is prime, the algorithm will always output YES. But if n is not prime, this algorithm
may output YES with much too high probability; in fact, it outputs YES with probability ≈ φ(n)/n
(this can be too large if, say, n = pq and φ(n) = (p − 1)(q − 1)).
We can design a similar test relying on Euler's Theorem. On input n, pick a random 1 < a
< n and output YES if and only if a^(n−1) ≡ 1 (mod n). Again, if n is prime, this test will always output
YES. What if n is composite? For most composite numbers, the test does indeed output YES
with sufficiently small probability. However there are some composites, called Carmichael
numbers or pseudo-primes, on which this test always outputs YES incorrectly (i.e., a Carmichael
number n has the property that for all 1 < a < n with gcd(a, n) = 1, a^(n−1) ≡ 1 (mod n), yet n is not prime).
By adding a few tweaks to the above algorithm, we would arrive at the Miller-Rabin
primality test that performs well on all numbers (this is out of the scope of this course).
For now let us focus on computing a^(n−1) mod n. The naive way of computing a^(n−1) requires
n−1 multiplications — in that case we might as well just divide n by all numbers less than n.
A more clever algorithm is to do repeated squaring:
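The algorithm the text calls ExpMod can be sketched as follows (my own Python rendering of the repeated-squaring idea, using x^e = (x^(e div 2))^2 · x^(e mod 2)):

```python
def exp_mod(x: int, e: int, m: int) -> int:
    """Compute x^e mod m by repeated squaring."""
    if e == 0:
        return 1 % m
    half = exp_mod(x, e // 2, m)   # recursion depth <= log2(e)
    result = half * half % m       # (x^(e div 2))^2 mod m
    if e % 2 == 1:
        result = result * x % m    # multiply in x^(e mod 2)
    return result

print(exp_mod(4, 19, 13))  # → 4
print(pow(4, 19, 13))      # → 4 (Python's built-in, for comparison)
```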
The correctness of ExpMod is based on the fact that the exponent e can be expressed
as (e div 2) · 2 + (e mod 2) by the division algorithm, and therefore

    x^e = (x^(e div 2))^2 · x^(e mod 2)
To analyze the efficiency of ExpMod, observe that x^(e mod 2) is easy to compute
(it is either x^0 = 1 or x^1 = x), and that the recursion has depth ≤ log2 e since the exponent e is
halved in each recursive call. The intuition behind ExpMod is simple. By repeated
squaring, it is much faster to compute exponents that are powers of two, e.g., computing
x^16 requires squaring four times: x → x^2 → x^4 → x^8 → x^16. Exponents that are not powers
of two can first be split into sums of powers of two; this is the same concept
as binary representations for those who are familiar with it. As an example, suppose we
want to compute 4^19 mod 13. First observe that 19 = 16 + 2 + 1 (the binary representation of
19 is 10011₂). By repeated squaring, we first compute:

    4^2 ≡ 3, 4^4 ≡ 3^2 ≡ 9, 4^8 ≡ 9^2 ≡ 3, 4^16 ≡ 3^2 ≡ 9 (mod 13)

and then 4^19 = 4^16 · 4^2 · 4^1 ≡ 9 · 3 · 4 = 108 ≡ 4 (mod 13).
The takeaway of this section is that primality testing can be done efficiently, in time
polynomial in the length (number of digits) of the input number n (i.e., in time polynomial
in log n).
4. The scheme is correct; that is, decrypting a valid cipher-text with the
correct pair of keys should output the original plain text. Formally we
require that for all (pk, sk) ÿ K, m ÿ Mpk, Decsk(Encpk(m)) = m.
• Gen(n) picks two random n-bit primes p and q, sets N = pq, and picks a
random e such that 1 < e < φ(N) and gcd(e, φ(N)) = 1. The public key is
pk = (N, e), while the secret key is sk = (p, q) (the factorization of N).
N is called the modulus of the scheme.
Correctness of RSA
First and foremost we should verify that all three algorithms, Gen, Enc and Dec,
can be efficiently computed. Gen involves picking two n-bit primes, p and q,
and an exponent e relatively prime to φ(N) = (p − 1)(q − 1); we covered generating random primes
in Section 3.3, and choosing e is simple: pick a random e and check that gcd(e, φ(N)) = 1
with Euclid's algorithm (a random e works with very high probability).
Enc and Dec are both modular exponentiations; we covered that in Section
3.4. Dec additionally requires us to compute d = e^(−1) mod φ(N); knowing the
secret-key, which contains the factorization of N, it is easy to compute φ(N) =
(p − 1)(q − 1), and then compute d using the extended GCD algorithm and
Theorem 3.30.
Next, let us verify that decryption is able to recover encrypted messages.
Given a message m satisfying 0 < m < N and gcd(m, N) = 1, we have:
    Dec(Enc(m)) = (m^e mod N)^d mod N
                = m^(ed) mod N
                = m^(ed mod φ(N)) mod N     (by Corollary 3.38)
                = m^1 mod N                 (since d = e^(−1) mod φ(N))
                = m mod N = m
This calculation also shows why the message space is restricted to {m | 0 < m
< N, gcd(m, N) = 1}: A message m must satisfy gcd(m, N) = 1 so that we can
apply Euler's Theorem, and m must be in the range 0 < m < N so that when we
recover m mod N, it is actually equal to the original message m.
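The whole Gen/Enc/Dec cycle can be sketched with a toy, completely insecure instance (the tiny primes here are my own illustration; real moduli are thousands of bits):

```python
# Gen: pick primes, form the modulus, choose e, derive d.
p, q = 61, 53
N = p * q                   # modulus N = 3233
phi_N = (p - 1) * (q - 1)   # φ(N) = 3120
e = 17                      # gcd(17, 3120) = 1
d = pow(e, -1, phi_N)       # d = e^(-1) mod φ(N) = 2753

def enc(m: int) -> int:     # Enc_pk(m) = m^e mod N
    return pow(m, e, N)

def dec(c: int) -> int:     # Dec_sk(c) = c^d mod N
    return pow(c, d, N)

m = 65                      # 0 < m < N and gcd(m, N) = 1
c = enc(m)
print(c)        # → 2790, the cipher-text
print(dec(c))   # → 65, the original message
```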
Security of RSA
Let us informally discuss the security of RSA encryption. What stops Eve from
decrypting Alice's messages? The assumption we make is that without
knowing the secret key, it is hard for Eve to compute d = e^(−1) mod φ(N). In
particular, we need to assume the factoring conjecture: there is no efficient
algorithm that factors numbers N that are products of two equal-length primes p
and q (formally, efficient algorithm means any probabilistic algorithm that runs
in time polynomial in the length of N, i.e., the number of digits of N).
Otherwise Eve would be able to recover the secret-key and decrypt in the same
way as Bob would.
There is another glaring security hole in our description of the RSA
scheme: the encryption function is deterministic. What this means is that once
the public-key is fixed, the encryption of each message is unique! For example,
there is only one encryption for the word “YES”, and one encryption for the word
“NO”, and anyone (including Eve) can compute these encryptions (it is a public-
key scheme after all). If Alice ever sends an encrypted YES or NO answer to
Bob, Eve can now completely compromise the message.
One solution to this problem is for Alice to pad each of her messages m with
a (fairly long) random string; she then encrypts the resulting padded message
m′ as before, outputting the cipher-text (m′)^e mod N (now the whole encryption
procedure is randomized). This type of “padded RSA” is implemented in practice.
x0 = random seed
xi = (a · xi−1 + c) mod M

In C++, we have a = 22695477, c = 1, and M = 2^32. Never mind the fact that
the sequence (x0, x1, x2, . . .) has a pattern (that is not very random at all).
Because there are only 2^32 starting values for x0, we can simply try them all,
and obtain the secret-key of any RSA key-pair generated using LCG and C++.²
Padded RSA
We have already discussed why a padded scheme for RSA is necessary for
security. A padded scheme also has another useful feature; it allows us to define
² In Java M = 2^48, so the situation improves a little bit.
a message space that does not depend on the choice of the public-key (eg, it
would be tragic if Alice could not express her love for Bob simply because Bob
chose the wrong key). In real world implementations, designing the padding
scheme is an engineering problem with many practical considerations; here we
give a sample scheme just to illustrate how padding can be done. Given a security
parameter n, a padded RSA public-key encryption scheme can proceed
as follows:
• Gen(n) picks two random n-bit primes p, q > 2^(n−1), sets N = pq, and picks a
random e such that 1 < e < φ(N) and gcd(e, φ(N)) = 1. The public key is pk = (N,
e), while the secret key is sk = (p, q) (the factorization of N). N is called
the modulus of the scheme.
RSA signatures
We end the section with another cryptographic application of RSA: digital
signatures. Suppose Alice wants to send Bob a message expressing her love,
“I love you, Bob,” and Alice is so bold and confident that she is not afraid of
eavesdroppers. However Eve is not just eavesdropping this time, but out to
sabotage the relationship between Alice and Bob. She sees Alice's message,
and changes it to “I hate you, Bob” before it reaches Bob. How can cryptog-
raphy help with this sticky situation? A digital signature allows the sender of a
message to “sign” it with a signature; when a receiver verifies the signature, he
or she can be sure that the message came from the sender and has not been
tampered.
In the RSA signature scheme, the signer generates keys similar to the RSA encryption
scheme; as usual, the signer keeps the secret-key, and publishes the public-key. To sign
a message m, the signer computes:
σm = m^d mod N
Anyone that receives a message m along with a signature ÿ can perform the following
check using the public-key:
σ^e mod N ?= m
The correctness and basic security guarantees of the RSA signature scheme are the
same as for the RSA encryption scheme. Just as before though, there are a few security
concerns with the scheme as described.
Consider this attack. By picking the signature σ first, and computing m = σ^e mod N,
anyone can forge a signature, although the message m is most likely meaningless
(but what if the attacker gets lucky?). Or suppose Eve collects two signatures, (m1, σ1) and
(m2, σ2); now she can construct a new signature (m = m1 · m2 mod N, σ = σ1 · σ2 mod N)
(and it is very possible that the new message m is meaningful). To prevent these two
attacks, we modify the signature scheme to first transform the message using a
“crazy” function H (i.e., σm = H(m)^d mod N).
Another important consideration is how do we sign large messages (eg, lengthy
documents)? Certainly we do not want to increase the size of N. If we apply the same
solution as we did for encryption — break the message into chunks and sign each chunk
individually — then we run into another security hole. Suppose Alice signed the sentences
“I love you, Bob” and “I hate freezing rain” by signing the individual words; then Eve can
collect and rearrange these signatures to produce a signed copy of “I hate you, Bob”. The
solution again relies on the crazy hash function H: we require H to accept arbitrary large
messages as input, and still output a hash < N. A property that H must have is collision
resistance: it should be hard to find two messages, m1 and m2, that hash to the same
thing H(m1) = H(m2) (we wouldn't want “I love you, Bob” and “I hate you, Bob” to share the
same signature).
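A sketch of sign-then-verify with a hash applied first (my own toy illustration; SHA-256 stands in for the “crazy” collision-resistant function H, and the tiny primes are for demonstration only):

```python
import hashlib

# Toy RSA key material; real signing keys are thousands of bits.
p, q = 61, 53
N, e = p * q, 17
d = pow(e, -1, (p - 1) * (q - 1))

def H(message: bytes) -> int:
    # Hash to an integer < N so it can be exponentiated mod N.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    return pow(H(message), d, N)          # σ = H(m)^d mod N

def verify(message: bytes, sigma: int) -> bool:
    return pow(sigma, e, N) == H(message)  # check σ^e ?= H(m) mod N

sigma = sign(b"I love you, Bob")
print(verify(b"I love you, Bob", sigma))   # → True
print(verify(b"I hate you, Bob", sigma))   # fails unless H collides mod N
```

With this tiny modulus a collision of H modulo N is not out of the question; with a real-sized N it is astronomically unlikely.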
Chapter 4
Counting
Counting is a basic mathematical tool that has uses in the most diverse
circumstances. How much RAM can a 64-bit register address? How many poker
hands form full houses compared to flushes? How many ways can ten coin
tosses end up with four heads? To count, we can always take the time to
enumerate all the possibilities; but even just enumerating all poker hands is
already daunting, let alone all 64-bit addresses. This chapter covers several
techniques that serve as useful short cuts for counting.
4.1 The Product and Sum Rules

The product and sum rules represent the most intuitive notions of counting.
Suppose there are n(A) ways to perform task A, and regardless of how task A
is performed, there are n(B) ways to perform task B. Then, there are n(A) · n(B)
ways to perform both task A and task B; this is the product rule. This can
generalize to multiple tasks, eg, n(A) · n(B) · n(C) ways to perform task A, B,
and C, as long as the independence condition holds, eg, the number of ways to
perform task C does not depend on how task A and B are done.
Example 4.1. On an 8 × 8 chess board, how many ways can I place a pawn and
a rook? First I can place the pawn anywhere on the board; there are 64 ways.
Then I can place the rook anywhere except where the pawn is; there are 63
ways. In total, there are 64 × 63 = 4032 ways.
Example 4.2. On an 8 × 8 chess board, how many ways can I place a pawn
and a rook so that the rook does not threaten the pawn? First I can place the
rook anywhere on the board; there are 64 ways. At this point, the rook takes up
one square, and threatens 14 others (7 in its row and 7 in its column).
Therefore I can then place the pawn on any of the 64 − 14 − 1 = 49 remaining
squares. In total, there are 64 × 49 = 3136 ways.
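The two chessboard counts can be double-checked by brute force; the short Python sketch below (illustrative, not part of the original text) enumerates all rook-and-pawn placements with the rook and pawn in different rows and columns:

```python
from itertools import product

# Brute-force check of Example 4.2: place a rook and a pawn on an 8x8
# board so that the rook does not threaten the pawn, i.e., they share
# neither a row nor a column.
count = 0
for rook in product(range(8), repeat=2):
    for pawn in product(range(8), repeat=2):
        if rook[0] != pawn[0] and rook[1] != pawn[1]:
            count += 1

print(count)  # 64 * 49 = 3136
```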
Example 4.3. If a finite set S has n elements, then |P(S)| = 2^n. We have seen
a proof of this by induction; now we will see a proof using the product rule. P(S)
is the set of all subsets of S. To form a subset of S, each of the n elements can
either be in the subset or not (2 ways). Therefore there are 2^n possible ways to
form unique subsets, and so |P(S)| = 2^n.
Example 4.4. How many legal configurations are there in the towers of Hanoi?
Each of the n rings can be on one of three poles, giving us 3^n configurations.
Normally we would also need to count the height of a ring relative to other rings
on the same pole, but in the case of the towers of Hanoi, the rings sharing the
same pole must be ordered in a unique fashion: from small at the top to large
at the bottom.
The sum rule is probably even more intuitive than the product rule.
Suppose there are n(A) ways to perform task A, and distinct from these, there
are n(B) ways to perform task B. Then, there are n(A) + n(B) ways to perform
task A or task B. This can generalize to multiple tasks, e.g., n(A) + n(B) + n(C)
ways to perform task A, B, or C, as long as the distinctness condition holds, e.g., the
ways to perform task C are different from the ways to perform task A or B.
Example 4.5. To fly from Ithaca to Miami you must fly through New York or
Philadelphia. There are 5 such flights a day through New York, and 3 such
flights a day through Philadelphia. How many different flights are there in a day
that can take you from Ithaca to Miami? The answer is 5 + 3 = 8.
Example 4.6. How many 4 to 6 digit pin codes are there? By the product rule,
the number of distinct n-digit pin codes is 10^n (each digit has 10 possibilities).
By the sum rule, we have 10^4 + 10^5 + 10^6 many 4 to 6 digit pin codes (to
state the obvious, we have implicitly used the fact that every 4-digit pin code is
different from every 5-digit pin code).
Permutations
Example 4.9. How many one-to-one functions are there from a set A with m
elements to a set B with n elements? If m > n we know there are no such one-
to-one functions. If m ≤ n, then each one-to-one function f from A to B is an
m-permutation of the elements of B: we choose m elements from B in an ordered
manner (e.g., the first chosen element is the value of f on the first element of A).
Therefore there are P(n, m) such functions.
Combinations
Let us turn to unordered selections.
1Recall that 0! = 1.
For example, how many ways are there to put two pawns on an 8 × 8 chess board?
We can select 64 possible squares for the first pawn, and 63 possible remaining squares
for the second pawn. But now we are overcounting, e.g., choosing squares (b5, c8) is the
same as choosing (c8, b5) since the two pawns are identical. Therefore we divide by 2 to
get the correct count: 64 × 63/2 = 2016. More generally,
Theorem 4.11.

C(n, r) = n! / ((n − r)! r!)
Proof. Let us express P(n, r) in terms of C(n, r). It must be that P(n, r) = C(n, r)P(r, r),
because to select an r-permutation from n elements, we can first select an unordered
set of r elements, and then select an ordering of the r elements. Rearranging the
expression gives:

C(n, r) = P(n, r) / P(r, r) = (n!/(n − r)!) / r! = n! / ((n − r)! r!)
Example 4.12. How many poker hands (i.e., sets of 5 cards) can be dealt from a standard
deck of 52 cards? Exactly C(52, 5) = 52!/(47! 5!).
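As an aside, this number is easy to evaluate with a short Python snippet (an illustrative check, not part of the original text); `math.comb` computes C(n, r) directly:

```python
import math

# C(52, 5): the number of 5-card poker hands (Example 4.12).
hands = math.comb(52, 5)
print(hands)  # 2598960

# The same value via the formula n! / ((n - r)! r!) from Theorem 4.11.
assert hands == math.factorial(52) // (math.factorial(47) * math.factorial(5))
```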
Example 4.13. How many full houses (3 of a kind and 2 of another) can be dealt from a
standard deck of 52 cards? Recall that we have 13 denominations (ace to king), and 4
suits (spades, hearts, diamonds and clubs). To count the number of full houses, we may:

• First pick a denomination for the “3 of a kind”: there are 13 choices.

• Pick 3 cards from this denomination (out of 4 suits): there are C(4, 3) =
4 choices.

• Next pick a denomination for the “2 of a kind”: there are 12 choices left (different
from the “3 of a kind”).

• Pick 2 cards from this second denomination (out of 4 suits): there are C(4, 2) = 6 choices.

By the product rule, there are 13 × 4 × 12 × 6 = 3744 full houses.
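The product-rule count of full houses can be written out directly (an illustrative sketch, not part of the original text):

```python
import math

# Full-house count following the product rule: pick the "3 of a kind"
# denomination (13 ways) and its suits (C(4,3) ways), then the
# "2 of a kind" denomination (12 ways) and its suits (C(4,2) ways).
full_houses = 13 * math.comb(4, 3) * 12 * math.comb(4, 2)
print(full_houses)  # 3744
```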
Figure 4.1: Suppose there are 5 balls and 3 urns. Using the delimiter idea, the
first row represents the configuration (1, 3, 1) (1 ball in the first urn, 3 balls in
the second, and 1 ball in the third). The second row represents the configuration
(4, 0, 1) (4 balls in the first urn, none in the second, and 1 ball in the third). In
general, we need to choose 2 positions out of 7 as delimiters (the rest of the
positions are the 5 balls).
Example 4.14. How many solutions are there to the equation x + y + z = 100, if x,
y, z ∈ N? This is just like having 3 distinguishable urns (x, y and z) and 100
indistinguishable balls, so there are C(102, 2) solutions.
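The stars-and-bars answer can be compared against a direct enumeration (an illustrative check, not part of the original text); since z is determined by x and y, it suffices to loop over the first two variables:

```python
import math

# Example 4.14: solutions to x + y + z = 100 with x, y, z natural
# numbers (including 0). Stars-and-bars count: C(102, 2).
formula = math.comb(102, 2)

# Direct enumeration over x and y; z = 100 - x - y is then determined,
# and it is a valid natural number exactly when x + y <= 100.
direct = sum(1 for x in range(101) for y in range(101) if x + y <= 100)

print(formula, direct)  # 5151 5151
```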
One can also prove these identities by churning out the algebra, but that is boring. We
start with a few simple identities.
An algebraic proof of the same fact (without much insight) goes as follows:

C(n, k) = n! / ((n − k)! k!) = n! / ((n − (n − k))! (n − k)!) = C(n, n − k)
Lemma 4.16 (Pascal's Identity). If 0 < k ≤ n, then C(n + 1, k) = C(n, k − 1) + C(n, k).
Proof. To choose k elements out of n + 1 elements, consider whether the last element is chosen or not:

• If it is, then it remains to choose k − 1 elements from the first n elements.

• If it isn't, then we need to choose all k elements from the first n elements.

These two cases are disjoint, so by the sum rule C(n + 1, k) = C(n, k − 1) + C(n, k).
Pascal's identity, along with the initial conditions C(n, 0) = C(n, n) = 1, gives a recursive
way of computing the binomial coefficients C(n, k). The recursion table is often written as
a triangle, called Pascal's Triangle; see Figure 4.2.
Proof. Let us once again count the number of possible subsets of a set of n elements. We
have already seen by induction and by the product rule that there are 2^n such subsets;
this is the RHS.
Another way to count is to use the sum rule: splitting the subsets by their size k, there
are C(n, k) subsets of size k, so the total number of subsets is

C(n, 0) + C(n, 1) + · · · + C(n, n);

this is the LHS.

1             C(0, 0)
1 1           C(1, 0) C(1, 1)
1 2 1         C(2, 0) C(2, 1) C(2, 2)
1 3 3 1       C(3, 0) C(3, 1) C(3, 2) C(3, 3)
1 4 6 4 1     C(4, 0) C(4, 1) C(4, 2) C(4, 3) C(4, 4)
Figure 4.2: Pascal's triangle contains the binomial coefficients C(n, k) ordered
as shown in the figure. Each entry in the figure is the sum of the two entries
on top of it (except the entries on the side which are always 1).
Proof. The identity actually gives two ways to count the following problem: given n people,
how many ways are there to pick a committee of any size, and then pick a chairperson of
the committee? The first way to count is:

• For committees of size k, there are C(n, k) ways of choosing the committee, and
independently, k ways of choosing a chairperson from the committee.

• Summing over the possible sizes k gives a total of Σ_k k · C(n, k) possibilities; this is the LHS.

The second way to count is:

• First pick the chairperson: there are n ways.

• For the remaining n − 1 people, each person can either be part of the committee or
not; there are 2^(n−1) possibilities.

• This gives a total of n · 2^(n−1) possibilities; this is the RHS.
Proof. Let M be a set with m elements and N be a set with n elements. Then the LHS
represents the number of possible ways to pick r elements from M and N together.
Equivalently, we can count the same process by splitting into r + 1 cases (the sum rule): let k
range from 0 to r, and consider picking r − k elements from M and k elements from N.
The next theorem explains the name “binomial coefficients”: the values of the combination
function C(n, k) are also the coefficients of powers of the simplest binomial, (x + y).

(x + y)^n = Σ_{k=0}^{n} C(n, k) x^(n−k) y^k
Example 4.21. What is the coefficient of x^13 y^7 in the expansion of (x − 3y)^20? We
write (x − 3y)^20 as (x + (−3y))^20 and apply the binomial theorem, which gives us the
term

C(20, 7) x^13 (−3y)^7 = −3^7 C(20, 7) x^13 y^7.
If we substitute specific values for x and y, the binomial theorem gives us more
combinatorial identities as corollaries.
Corollary. Σ_{k=0}^{n} C(n, k) = 2^n.

Proof. Simply write 2^n = (1 + 1)^n and expand using the binomial theorem.
Similarly, writing 0 = (1 − 1)^n and expanding gives

0 = Σ_{k=0}^{n} (−1)^k C(n, k) = C(n, 0) + Σ_{k=1}^{n} (−1)^k C(n, k)
This can be observed using a Venn diagram. The counting argument goes as
follows: to count the number of ways to perform A or B (|X ∪ Y|), we start by
adding the number of ways to perform A (i.e., |X|) and the number of ways to
perform B (i.e., |Y|). But if some of the ways to perform A and B are the same
(|X ∩ Y|), they have been counted twice, so we need to subtract those.
Example 4.24. How many positive integers ≤ 100 are multiples of either 2 or 5?
Let A be the set of multiples of 2 and B be the set of multiples of 5. Then |A| =
50, |B| = 20, and |A ∩ B| = 10 (since this is the number of multiples of 10). By the
inclusion-exclusion principle, we have 50 + 20 − 10 = 60 multiples of either 2 or
5.
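A one-line enumeration confirms the inclusion-exclusion count (an illustrative check, not part of the original text):

```python
# Example 4.24: positive integers <= 100 that are multiples of 2 or 5.
by_formula = 50 + 20 - 10  # |A| + |B| - |A intersect B|
direct = sum(1 for n in range(1, 101) if n % 2 == 0 or n % 5 == 0)
print(by_formula, direct)  # 60 60
```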
What if there are more tasks? For three sets, we can still glean from the
Venn diagram that

|X ∪ Y ∪ Z| = |X| + |Y| + |Z| − |X ∩ Y| − |X ∩ Z| − |Y ∩ Z| + |X ∩ Y ∩ Z|
More generally,
Proof. Consider some x ∈ ∪_i A_i. We need to show that it gets counted exactly
once on the RHS. Suppose that x is contained in exactly m of the starting sets
(A1 to An), 1 ≤ m ≤ n. Then for each k ≤ m, x appears in C(m, k) many k-way
intersections (that is, if we look at ∩_{i∈I} A_i for all I with |I| = k, x appears in
C(m, k) many terms). Therefore, the number of times x gets counted by the
inclusion-exclusion formula is exactly

Σ_{k=1}^{m} (−1)^(k+1) C(m, k) = C(m, 0) = 1

using the identity Σ_{k=0}^{m} (−1)^k C(m, k) = 0 derived above.
Example 4.26. How many onto functions are there from a set A with n elements
to a set B with m ≤ n elements? We start by computing the number of functions
that are not onto. Let A_i be the set of functions that miss the i-th element of B
(i.e., do not have the i-th element of B in their range). Then ∪_{i=1}^{m} A_i is
the set of functions that are not onto. By the inclusion-exclusion principle, we
have:

|∪_{i=1}^{m} A_i| = Σ_{k=1}^{m} (−1)^(k+1) Σ_{I⊆{1,...,m}, |I|=k} |∩_{i∈I} A_i|

For any k and I with |I| = k, observe that ∩_{i∈I} A_i is the set of functions that miss a
particular set of k elements, therefore

|∩_{i∈I} A_i| = (m − k)^n
Also observe that there are exactly C(m, k) many different I's of size k. Using these two
facts, we have

|∪_{i=1}^{m} A_i| = Σ_{k=1}^{m} (−1)^(k+1) Σ_{I⊆{1,...,m}, |I|=k} |∩_{i∈I} A_i|
                  = Σ_{k=1}^{m} (−1)^(k+1) C(m, k) (m − k)^n
Finally, to count the number of onto functions, we take all possible functions (m^n many)
and subtract the functions that are not onto:

m^n − Σ_{k=1}^{m} (−1)^(k+1) C(m, k) (m − k)^n = Σ_{k=0}^{m} (−1)^k C(m, k) (m − k)^n

where the last step relies on the fact that (−1)^k C(m, k)(m − k)^n = m^n when k = 0. This
final expression is closely related to the Stirling numbers of the second kind, another
counting function (similar to C(n, k)) that is out of the scope of this course.
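The inclusion-exclusion formula for onto functions can be validated against a brute-force enumeration for small sets (an illustrative sketch, not part of the original text; the function names are ours):

```python
import math
from itertools import product

# Number of onto functions from an n-element set to an m-element set,
# via the inclusion-exclusion formula derived above.
def onto_count(n, m):
    return sum((-1) ** k * math.comb(m, k) * (m - k) ** n for k in range(m + 1))

# Brute-force check for small n, m: enumerate all functions [n] -> [m]
# (as tuples of values) and keep those whose range is all of [m].
def onto_brute(n, m):
    return sum(1 for f in product(range(m), repeat=n) if set(f) == set(range(m)))

assert onto_count(4, 3) == onto_brute(4, 3)
print(onto_count(4, 3))  # 36
```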
Counting the complement follows the same philosophy as the inclusion-exclusion principle:
sometimes it is easier to over-count first, and subtract some later.
This is best explained by examples.
Example 4.27. How many standard poker hands (5 cards from a 52-card deck) contain at
least a pair?2 We could count the number of hands that are (strict) pairs, two-pairs,
three-of-a-kinds, full houses and four-of-a-kinds, and sum up the counts. It is easier,
however, to count all possible hands, and subtract the number of hands that do not
contain at least a pair, i.e., hands where all 5 cards have different ranks: this gives
C(52, 5) − C(13, 5) · 4^5 hands (a no-pair hand chooses 5 of the 13 ranks, and one of
4 suits for each rank).
2 By this we mean hands where at least two cards share the same rank. A slightly
more difficult question (but perhaps more interesting in a casino) is: how many hands
are better than or equal to a pair? (E.g., a straight does not contain a pair, but is better
than a pair.)
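Counting the complement as in Example 4.27 works out as follows (an illustrative check, not part of the original text):

```python
import math

# Example 4.27: count hands with at least a pair by subtracting the
# hands in which all five cards have different ranks. Such a hand
# picks 5 of the 13 ranks, then one of 4 suits for each chosen rank.
all_hands = math.comb(52, 5)
no_pair = math.comb(13, 5) * 4 ** 5
print(all_hands - no_pair)  # 1281072
```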
Proof. Assume for contradiction that every pigeonhole contains at most ⌈n/k⌉ − 1 < n/k
many pigeons. Then the total number of pigeons among the k pigeonholes would be
strictly less than k(n/k) = n, a contradiction.
Example 4.29. In a group of 800 people, there are at least ⌈800/366⌉ = 3 people
with the same birthday.
Chapter 5
Probability
What is probability? What does it mean that a fair coin toss comes up heads with
probability 50%? One interpretation is Bayesian: “50%” is a statement of our beliefs,
and how much we are willing to bet on one coin toss. Another interpretation is more
experimental: “50%” means that if we toss the coin 10 million times, it will come up
heads in roughly 5 million tosses. Regardless of how we view probability, this chapter
introduces the mathematical formalization of probability, accompanied with useful
analytical tools to go with the formalization.
1 Without formally defining this term, we refer to random processes whose outcomes are
discrete, such as dice rolls, as opposed to picking a uniformly random real number from
zero to one.
Definition 5.1 (Probability Space). A probability space is a pair (S, f), where S is a
countable set called the sample space, and f : S → [0, 1]2 is
called the probability mass function. Additionally, f satisfies the property
Σ_{x∈S} f(x) = 1.
Intuitively, the sample space S corresponds to the set of possible states that
the world could be in, and the probability mass function f assigns a probability
from 0 to 1 to each of these states. To model our conventional notion of
probability, we require that the total probability assigned by f to all possible
states should sum up to 1.
Definition 5.2 (Event). Given a probability space (S, f), an event is simply a
subset of S. The probability of an event E, denoted by Pr(S,f)[E] = Pr[E], is
defined to be Σ_{x∈E} f(x). In particular, the event that includes “everything”,
E = S, has probability Pr[S] = 1.
Even though events and probabilities are not well-defined without a probability
space (e.g., see the quote of the chapter), by convention, we often omit S
and f in our statements when they are clear from context.
Example 5.3. Consider rolling a regular 6-sided die. The sample space is S = {1,
2, 3, 4, 5, 6}, and the probability mass function is constant: f(x) = 1/6 for all x ∈
S. The event of an even roll is E = {2, 4, 6}, and this occurs with probability

Pr[E] = Σ_{x∈{2,4,6}} f(x) = 1/2
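The die example can be modelled literally as a pair (S, f) in a few lines of code (an illustrative sketch, not part of the original text; the helper name `prob` is ours):

```python
from fractions import Fraction

# Example 5.3 as a tiny probability space: sample space S and a
# probability mass function f, with Pr[E] = sum of f(x) over x in E.
S = {1, 2, 3, 4, 5, 6}
f = {x: Fraction(1, 6) for x in S}

def prob(event):
    return sum(f[x] for x in event)

assert sum(f.values()) == 1  # f is a valid mass function
print(prob({2, 4, 6}))       # 1/2
```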
The probability mass function used in the above example has a (popular)
property: it assigns equal probability to all the elements in the sample space.
2 By [0, 1] we mean the real interval {x | 0 ≤ x ≤ 1}.
Proof. Let f take on the constant value λ. First note that λ ≠ 0, because that would
force Σ_{x∈S} f(x) = 0, violating the definition of mass functions. Next note that S
cannot be infinite, because that would force Σ_{x∈S} f(x) = Σ_{x∈S} λ = ∞, again
violating the definition of mass functions. We then have

1 = Σ_{x∈S} f(x) = Σ_{x∈S} λ = |S|λ  ⇒  λ = 1/|S|

and so for any event E,

Pr[E] = Σ_{x∈E} f(x) = Σ_{x∈E} 1/|S| = |E|/|S|
Example 5.6. What is the probability that a random hand of five cards in poker is a full
house? We have previously counted the number of possible five-card hands and the
number of possible full houses (Examples 4.12 and 4.13).
Since each hand is equally likely (i.e., we are dealing with an equiprobable probability
space), the probability of a full house is:

3744 / C(52, 5) = 3744/2598960 ≈ 0.14%
Example 5.7. We may have the probability space (N+, f) where f(n) = 1/2^n. This
corresponds with the following experiment: how many coin tosses does it take for a head
to come up?3 We expect this to be a well-defined probability space since it corresponds
to a natural random process. But to make sure, we verify that Σ_{n∈N+} 1/2^n = 1.4
Example 5.8. Perhaps at a whim, we want to pick the positive integer n with probability
proportional to 1/n^2. In this case we need to normalize the probability. Knowing that
Σ_{n∈N+} 1/n^2 = π^2/6,5 we can assign f(n) = (6/π^2)(1/n^2), so that Σ_{n∈N+} f(n) = 1.
Example 5.9. Suppose now we wish to pick the positive integer n with probability
proportional to 1/n. This time we are bound to fail, since the series 1 + 1/2 + 1/3 + · · ·
diverges (approaches ∞), and cannot be normalized.
Probabilities
Now that probability spaces are defined, we give a few basic properties of probability:
Claim 5.10. If A and B are disjoint events (A ∩ B = ∅) then Pr[A ∪ B] = Pr[A] + Pr[B].
Proof. By definition,

Pr[A ∪ B] = Σ_{x∈A∪B} f(x) = Σ_{x∈A} f(x) + Σ_{x∈B} f(x) = Pr[A] + Pr[B]

where the second equality uses the fact that A and B are disjoint.
4 One way to compute the sum is to observe that it is a converging geometric series.
More directly, let S = 1/2 + 1/4 + · · ·, and observe that S = 2S − S =
(1 + 1/2 + 1/4 + · · ·) − (1/2 + 1/4 + · · ·) = 1.
5 This is the Basel problem, first solved by Euler.
When events are not disjoint, we instead have the following generalization
of the inclusion-exclusion principle.
Claim 5.12. Given events A and B, Pr[A ∪ B] = Pr[A] + Pr[B] − Pr[A ∩ B].
Proof. First observe that A ∪ B = (A − B) ∪ (B − A) ∪ (A ∩ B) and that all the terms on the
RHS are disjoint. Therefore

Pr[A ∪ B] = Pr[A − B] + Pr[B − A] + Pr[A ∩ B]    (5.1)

Similarly, we have

Pr[A] = Pr[A − B] + Pr[A ∩ B]    (5.2)
Pr[B] = Pr[B − A] + Pr[A ∩ B]    (5.3)

because, say, A is the disjoint union of A − B and A ∩ B. Substituting (5.2) and (5.3) into
(5.1) gives the claim.
Corollary 5.13 (Union Bound). Given events A and B, Pr[A ∪ B] ≤ Pr[A] + Pr[B]. In general,
given events A1, . . . , An,

Pr[∪_i A_i] ≤ Σ_i Pr[A_i]
Conditional Probability
Suppose after receiving a random 5-card hand dealt from a standard 52-card deck, we
are told that the hand contains “at least a pair” (that is, at least two of the cards have the
same rank). How do we calculate the probability of a full-house given this extra
information? Consider the following thought
process:
• Start with the original probability space containing all 5-card hands, pair or no
pair.
• Re-normalize the probability among the remaining hands (that contain at least a
pair).
Definition 5.14. Let A and B be events, and let Pr[B] ≠ 0. The conditional probability of
A, conditioned on B, denoted by Pr[A | B], is defined as

Pr[A | B] = Pr[A ∩ B] / Pr[B]

In an equiprobable probability space, this simplifies to Pr[A | B] = |A ∩ B|/|B|.
Example 5.15 (Second Ace Puzzle). Suppose we have a deck of four cards: {A♠, 2♠, A♥,
2♥}. After being dealt two random cards, face down, the dealer tells us that we have at
least one ace in our hand. What is the probability that our hand has both aces? That is,
what is Pr[ two aces | at least one ace ]?

Because we do not care about the order in which the cards were dealt, we have an
equiprobable space with 6 outcomes:

{A♠, 2♠}, {A♠, A♥}, {A♠, 2♥}, {2♠, A♥}, {2♠, 2♥}, {A♥, 2♥}

If we look closely, five of the outcomes contain at least one ace, while only one outcome
has both aces. Therefore Pr[ two aces | at least one ace ] = 1/5.
Now what if the dealer tells us that we have the ace of spades (A♠) in our
hand? Now Pr[ two aces | has ace of spades ] = 1/3. It might seem strange that
the probability of two aces has gone up; why should finding out the suit of the
ace we have increase our chances? The intuition is that by finding out the suit
of our ace and knowing that the suit is spades, we can eliminate many more
hands that are not two aces.
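The puzzle is small enough to enumerate completely (an illustrative sketch, not part of the original text; the card labels are ours, with "AS" for A♠ and so on):

```python
from itertools import combinations

# Example 5.15: deck {A-spades, 2-spades, A-hearts, 2-hearts}; two cards
# are dealt unordered, so the 6 two-card hands are equiprobable.
deck = ["AS", "2S", "AH", "2H"]
hands = list(combinations(deck, 2))
assert len(hands) == 6

aces = {"AS", "AH"}
two_aces = [h for h in hands if aces <= set(h)]
at_least_one = [h for h in hands if aces & set(h)]
has_spade_ace = [h for h in hands if "AS" in h]

print(len(two_aces), len(at_least_one), len(has_spade_ace))  # 1 5 3
# Pr[two aces | at least one ace] = 1/5;  Pr[two aces | has A-spades] = 1/3.
```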
Independence
By defining conditional probability, we model how the occurrence of one event can affect
the probability of another event. An equally interesting concept is independence, where
a set of events do not affect each other.

If there are just two events, A and B, then they are independent if and only if Pr[A ∩
B] = Pr[A] Pr[B]. The following claim gives justification to the definition of independence.
Claim 5.18. If A and B are independent events and Pr[B] ≠ 0, then Pr[A | B] = Pr[A]. In
other words, conditioning on B does not change the probability of A.

Proof.

Pr[A | B] = Pr[A ∩ B] / Pr[B] = Pr[A] Pr[B] / Pr[B] = Pr[A]
The following claim should also hold according to our intuition of independence:

Claim 5.19. If A and B are independent events, then A and B̄ (the complement of B) are
also independent events. In other words, if A is independent of the occurrence of B, then
it is also independent of the “non-occurrence” of B.

Proof.

Pr[A ∩ B̄] = Pr[A] − Pr[A ∩ B] = Pr[A] − Pr[A] Pr[B] = Pr[A](1 − Pr[B]) = Pr[A] Pr[B̄]
Proof. If we denote success by S and failure by F, then our probability space is the
set of n-character strings containing the letters S and F (e.g., SF F · · · F denotes the
outcome that the first Bernoulli trial is successful, while all the rest failed). Using
our counting tools, we know that the number of such strings with exactly k occurrences
of S (success) is C(n, k). Each of those strings occurs with probability p^k (1 − p)^(n−k)
due to independence.
Bayes' Rule
Suppose that we have a test against a rare disease that affects only 0.3% of the
population, and that the test is 99% effective (ie, if a person has the disease the
test says YES with probability 0.99, and otherwise it says NO with probability 0.99).
If a random person in the population tested positive, what is the probability that he
has the disease? The answer is not 0.99. Indeed, this is an exercise in conditional
probability: what are the chances that a random person has the rare disease, given
the occurrence of the event that he tested positive?
Claim 5.21. Let A1, . . . , An be disjoint events with non-zero probability such that
∪_i A_i = S (i.e., the events are exhaustive; the events partition the sample space S).
Let B be an event. Then Pr[B] = Σ_{i=1}^{n} Pr[B | A_i] Pr[A_i].

Proof. Since A1, . . . , An are disjoint, it follows that the events B ∩ A1, . . . , B ∩ An are
also disjoint, and their union is B. Therefore

Pr[B] = Σ_{i=1}^{n} Pr[B ∩ A_i] = Σ_{i=1}^{n} Pr[B | A_i] Pr[A_i]

where the last step uses the definition of conditional probability.
Theorem 5.22 (Bayes' Rule). Let A and B be events with non-zero probabilities. Then:
Pr[A | B] Pr[B]
Pr[B | A] =
Pr[A]
Corollary 5.23 (Bayes' Rule Expanded). Let A and B be events with non-zero probability.
Then:

Pr[B | A] = Pr[A | B] Pr[B] / (Pr[B] Pr[A | B] + Pr[B̄] Pr[A | B̄])

Proof. We apply Claim 5.21, using that B and B̄ are disjoint and B ∪ B̄ = S.
We return to our original question of testing for rare diseases. Let's consider the sample
space S = {(t, d) | t ∈ {0, 1}, d ∈ {0, 1}}, where t represents
the outcome of the test on a random person in the population, and d represents
whether the same person carries the disease or not. Let D be the event that a
randomly drawn person has the disease (d = 1), and T be the event that a
randomly drawn person tests positive (t = 1).

We know that Pr[D] = 0.003 (because 0.3% of the population has the disease). We
also know that Pr[T | D] = 0.99 and Pr[T | D̄] = 0.01 (because the
test is 99% effective). Using Bayes' rule, we can now calculate the probability
that a random person, who tested positive, actually has the disease:

Pr[D | T] = Pr[T | D] Pr[D] / (Pr[D] Pr[T | D] + Pr[D̄] Pr[T | D̄])
          = (.99 × .003) / (.003 × .99 + .997 × .01) ≈ 0.23
Notice that 23%, while significant, is a far cry from 99% (the effectiveness
of the test). This final probability can vary if we have a different prior (initial
belief). For example, if a random patient has other medical conditions that raise the
probability of contracting the disease to 10%, then the final probability of having the
disease, given a positive test, rises to 92%.
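Both posterior probabilities can be reproduced with a small helper (an illustrative sketch, not part of the original text; the function and parameter names are ours, and we take "99% effective" to fix both the true-positive and true-negative rates, as in the text):

```python
# The rare-disease computation via Bayes' rule (expanded form).
def posterior(prior, sensitivity=0.99, specificity=0.99):
    # Pr[D | T] = Pr[T|D] Pr[D] / (Pr[D] Pr[T|D] + Pr[not D] Pr[T|not D])
    false_pos = 1 - specificity
    return (sensitivity * prior) / (prior * sensitivity + (1 - prior) * false_pos)

print(round(posterior(0.003), 2))  # 0.23  (the base-rate case in the text)
print(round(posterior(0.10), 2))   # 0.92  (the 10% prior mentioned above)
```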
Conditional Independence
Bayes' rule shows us how to update our beliefs when we receive new information. What
if we receive multiple signals at once? How do we compute Pr[A | B1 ÿ B2]? First we need
the notion of conditional independence.
In other words, given that the event A has occurred, the events B1, . . . , Bn are
independent.

When there are only two events, B1 and B2, they are conditionally independent given
event A if and only if Pr[B1 ∩ B2 | A] = Pr[B1 | A] Pr[B2 | A].
The notion of conditional independence is somewhat fickle, illustrated by the following
examples:
Independence does not imply conditional independence. Suppose we toss a fair coin
twice; let H1 and H2 be the event that the first and second coin tosses come up
heads, respectively. Then H1 and H2 are independent:
Pr[H1] Pr[H2] = 1/2 · 1/2 = 1/4 = Pr[H1 ∩ H2]
However, if we are told that at least one of the coin tosses came up tails (call
this event T), then H1 and H2 are no longer independent given T:

Pr[H1 | T] Pr[H2 | T] = 1/3 · 1/3 ≠ 0 = Pr[H1 ∩ H2 | T]
Conditional independence does not imply independence. Suppose we have two coins,
one is heavily biased towards heads, and the other one is heavily biased towards
tails (say with probability 0.99). First we choose
a coin at random; let BH be the event that we choose the coin that is biased
towards heads. Next we toss the chosen coin twice; let H1 and H2 be the event
that the first and second coin tosses come up heads, respectively. Then, given that
we chose the coin biased towards heads (the event BH), H1 and H2 are independent:

Pr[H1 | BH] Pr[H2 | BH] = 0.99 · 0.99 = Pr[H1 ∩ H2 | BH]

However, H1 and H2 are not independent, since if the first toss came up heads, it
is most likely that we chose the coin that is biased towards heads, and so the
second toss will come up heads as well. The actual probabilities are:

Pr[H1] Pr[H2] = 1/2 · 1/2 = 1/4 ≠ Pr[H1 ∩ H2] = 0.5(0.99^2) + 0.5(0.01^2) ≈ 0.5
Let us return to the question of computing Pr[A | B1 ∩ B2]. If we assume that the signals
B1 and B2 are independent when conditioned on A, and also independent when
conditioned on Ā, then:

Pr[A | B1 ∩ B2]
  = Pr[B1 ∩ B2 | A] Pr[A] / (Pr[A] Pr[B1 ∩ B2 | A] + Pr[Ā] Pr[B1 ∩ B2 | Ā])
  = Pr[B1 | A] Pr[B2 | A] Pr[A] / (Pr[A] Pr[B1 | A] Pr[B2 | A] + Pr[Ā] Pr[B1 | Ā] Pr[B2 | Ā])
In general, given signals B1, . . . , Bn that are conditionally independent given A and
conditionally independent given A, we have
Wi = i Pr [Wi | spam]
Pr spam |
i i Pr [Wi | spam] + i Pr [Wi | not spam]
Back to the example of 100 coin tosses: given any outcome of the
experiment s ∈ S, we would define X(s) to be the number of heads that occurred
in that outcome.
Definition 5.26. Given a random variable X on probability space (S, f), we can
consider a new probability space (S′, f_X) where the sample space S′ is the range
of X, S′ = {X(s) | s ∈ S}, and the probability mass function is extended from f:
f_X(x) = Pr(S,f)[{s | X(s) = x}]. We call f_X the probability distribution or the
probability density function of the random variable X. Similarly defined, the
cumulative distribution or the cumulative density function of the random variable
X is F_X(x) = Pr(S,f)[{s | X(s) ≤ x}].
Example 5.27. Suppose we toss two 6-sided dice. The sample space would be
pairs of outcomes, S = {(i, j) | i, j ∈ {1, . . . , 6}}, and the probability mass function
is constant: f((i, j)) = 1/36.
Definition 5.28. A sequence of random variables X1, . . . , Xn are (mutually)
independent if for every subset Xi1, . . . , Xik and for any real numbers x1, x2, . . . ,
xk, the events Xi1 = x1, Xi2 = x2, . . . , Xik = xk are (mutually) independent.

In the case of two random variables X and Y, they are independent if and
only if for all real values x and y, Pr[X = x ∩ Y = y] = Pr[X = x] Pr[Y = y].
As mentioned before, a common use of independence is to model the outcome
of consecutive coin tosses. This time we model it as the sum of
independent random variables. Consider a biased coin that comes up heads with
probability p. Define X = 1 if the coin comes up heads and X = 0 if the coin
comes up tails; then X is called the Bernoulli random variable (with probability
p). Suppose now we toss this biased coin n times, and let Y be the
random variable that denotes the total number of occurrences of heads.8 We
can view Y as a sum of independent random variables, Y = Σ_{i=1}^{n} X_i, where
X_i is a Bernoulli random variable with probability p that represents the outcome
of the i-th toss. We leave it as an exercise to show that the random variables
X1, . . . , Xn are indeed independent.
5.4 Expectation
Given a random variable defined on a probability space, what is its “average”
value? Naturally, we need to weigh things according to the probability that
the random variable takes on each value.
E[X] = Σ_{s∈S} f(s)X(s)
8 Just for fun, we can calculate the density function and cumulative density function of
Y. By Theorem 5.20, f_Y(k) = C(n, k) p^k (1 − p)^(n−k), and
F_Y(k) = Σ_{i=0}^{k} C(n, i) p^i (1 − p)^(n−i).
E[X] = Σ_{x ∈ range of X} Pr[X = x] · x

and more generally, for any function g, E[g(X)] = Σ_{x ∈ range of X} Pr[X = x] g(x).

Proof.

Σ_{x ∈ range of X} Pr[X = x] g(x)
  = Σ_{x ∈ range of X} Σ_{s : X(s)=x} f(s) g(x)
  = Σ_{x ∈ range of X} Σ_{s : X(s)=x} f(s) g(X(s))
  = Σ_{s∈S} f(s) g(X(s)) = E[g(X)]
Example 5.31. Suppose in a game, with probability 1/10 we are paid $10, and with
probability 9/10 we are paid $2. What is our expected payment?
The answer is

1/10 · $10 + 9/10 · $2 = $2.80
Example 5.32. Given a biased coin that ends up heads with probability p, how many
tosses does it take for the coin to show heads, in expectation?
We can consider the state space S = {H, TH, TTH, TTTH, . . . }; these are possible
results of a sequence of coin tosses that ends when we see the first
head. Because each coin toss is independent, we define the probability mass function to
be

f(T^i H) = (1 − p)^i p

(the first head appears after i tails). Letting X be the number of tosses, we have

E[X] = Σ_{i=0}^{∞} (i + 1) p (1 − p)^i = p Σ_{i=0}^{∞} (i + 1)(1 − p)^i = p · (1/p^2) = 1/p

where the middle sum is computed in footnote 10.
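The series for E[X] converges quickly, so the value 1/p can be checked numerically by summing enough terms (an illustrative sketch, not part of the original text; the function name is ours):

```python
# Example 5.32: expected number of tosses until the first head is 1/p.
# Partial sums of sum_{i>=0} (i+1) p (1-p)^i converge to 1/p.
def expected_tosses(p, terms=10_000):
    return sum((i + 1) * p * (1 - p) ** i for i in range(terms))

p = 0.25
print(expected_tosses(p))  # close to 1/p = 4.0
assert abs(expected_tosses(p) - 1 / p) < 1e-9
```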
with probability 1/10 we get paid $10 and get utility 8; with
probability 9/10 we get paid $2 and get utility 0.

This gives a positive expected utility of 0.1 · 8 + 0.9 · 0 = 0.8, so we should play the game!
This reasoning of utility does not always explain human behavior though.
Suppose there is a game that costs a thousand dollars to play. With one chance in a
million, the reward is two billion dollars (!), but otherwise there is no reward. The expected
utility is
(1/10^6)(2 × 10^9 − 1000) + (1 − 1/10^6)(0 − 1000) ≈ 1000
One expects to earn a thousand dollars from the game on average. Would you play it?
Turns out many people are risk-averse and would turn down
9 Recall that an infinite geometric series with ratio |x| < 1 converges:
Σ_{i=0}^{∞} x^i = 1/(1 − x).

10 To see this, let S = Σ_{i=0}^{∞} (i + 1)x^i, and observe that if |x| < 1, then S − xS is a
converging geometric series: S(1 − x) = S − xS = (1 + 2x + 3x^2 + · · ·) − (x + 2x^2 + · · ·)
= 1 + x + x^2 + · · · = 1/(1 − x). Therefore S = 1/(1 − x)^2.
the game. After all, except with one chance in a million, you simply lose a
thousand dollars. This example shows how expectation does not capture all the
important features of a random variable, such as how likely the random variable
is to end up close to its expectation (in this case, the utility is either −1000 or
two billion, not close to the expectation of 1000 at all).
In other instances, people are risk-seeking. Take yet another game that takes
a dollar to play. This time, with one chance in a billion, the reward is a million
dollars; otherwise there is no reward. The expected utility is
(1/10^9)(10^6 − 1) + (1 − 1/10^9)(0 − 1) ≈ −0.999
Essentially, to play the game is to throw a dollar away. Would you play the
game? Turns out many people do; this is called a lottery. Many people think
losing a dollar will not change their life at all, but the chance of winning a million
dollars is worth it, even if the chance is tiny. One way to explain this behavior
within the utility framework is to say that perhaps earning or losing just a dollar is
not worth 1 point in utility.
Linearity of Expectation
One nice property of expectation is that the expectation of a sum of random
variables is the sum of expectations. This can often simplify the calculation of
expectations (or in applications, the estimation of expectations). More generally,

E[Σ_{i=1}^{n} a_i X_i] = Σ_{i=1}^{n} a_i E[X_i]
Proof.

E[Σ_{i=1}^{n} a_i X_i]
  = Σ_{s∈S} f(s) Σ_{i=1}^{n} a_i X_i(s)
  = Σ_{i=1}^{n} Σ_{s∈S} a_i f(s) X_i(s)
  = Σ_{i=1}^{n} a_i Σ_{s∈S} f(s) X_i(s)
  = Σ_{i=1}^{n} a_i E[X_i]
Example 5.34. If we make n tosses of a biased coin that ends up heads with
probability p, what is the expected number of heads? Let X_i = 1 if the i-th toss is
heads, and X_i = 0 otherwise. Then X_i is a Bernoulli random variable with probability
p, and has expectation

E[X_i] = p · 1 + (1 − p) · 0 = p

The number of heads is Σ_{i=1}^{n} X_i, so by linearity of expectation,

E[Σ_{i=1}^{n} X_i] = Σ_{i=1}^{n} E[X_i] = np
Thus if the coin were fair, we would expect (1/2)n, i.e., half of the tosses, to be
heads.
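Linearity's answer np also matches the direct weighted sum over the binomial distribution from Theorem 5.20 (an illustrative check, not part of the original text; the function name is ours):

```python
import math

# Example 5.34: expected number of heads in n tosses of a p-biased coin.
# Direct computation sum_k k * Pr[Y = k], with Pr[Y = k] the binomial
# probability C(n, k) p^k (1-p)^(n-k), agrees with linearity's answer np.
def expected_heads(n, p):
    return sum(k * math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))

n, p = 10, 0.3
print(expected_heads(n, p))  # close to n * p = 3.0
assert abs(expected_heads(n, p) - n * p) < 1e-9
```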
Markov's Inequality
The expectation of a non-negative random variable X gives us a (relatively weak) bound
on the probability of X growing too large:

Theorem 5.35 (Markov's Inequality). Let X be a non-negative random variable and a > 0.
Then Pr[X ≥ a] ≤ E[X]/a.

Proof. The expectation of X is the weighted sum of the possible values of X, where the
weights are the probabilities. Consider the random variable Y defined by

Y = a if X ≥ a, and Y = 0 if a > X ≥ 0
Clearly Y ≤ X at all times, and so E[Y] ≤ E[X] (easy to verify by the definition
of expectation). Now observe that

E[X] ≥ E[Y] = a · Pr[Y = a] + 0 · Pr[Y = 0] = a · Pr[X ≥ a]

Rearranging the terms gives us Markov's inequality.
Example 5.36. Let X be a non-negative random variable. Most people are comfortable
with the assumption that X would not exceed one thousand times its expectation,
because

Pr[X ≥ 1000 E[X]] ≤ E[X] / (1000 E[X]) = 1/1000
5.5 Variance
Consider the following two random variables:
1.
2. Y = 0 with probability 1 ÿ 10ÿ6 , and Y = 106 with probability 10ÿ6 .
Both X and Y has expectation 1, but they have very different distributions.
To capture their differences, the variance of a random variable is introduced to
capture how “spread out” the random variable is away from its expectation.

Definition 5.37. The variance of a random variable X is

Var[X] = E[(X − E[X])^2]
Intuitively, the term (X − E[X])^2 measures the distance of X to its
expectation. The term is squared to ensure that the distance is always
non-negative (perhaps we could use the absolute value instead, but it turns out
defining variance with a square gives it much nicer properties).
Example 5.38. Going back to the start of the section, the random variable X (that takes the constant value 1) has E[X] = 1 and Var[X] = 0 (it is never different from its mean). The random variable

Y = { 0     with probability 1 − 10⁻⁶
    { 10⁶   with probability 10⁻⁶

also has expectation E[Y] = 1, but variance

Var[Y] = E[(Y − E[Y])²] = (1 − 10⁻⁶) · (0 − 1)² + 10⁻⁶ · (10⁶ − 1)² ≈ 10⁶
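Computing expectation and variance of a finite distribution is mechanical; the sketch below (a Python illustration of ours, with distributions given as value/probability pairs) reproduces the numbers for X and Y:

```python
def expectation(dist):
    # dist: list of (value, probability) pairs.
    return sum(v * p for v, p in dist)

def variance(dist):
    # Var[X] = E[(X - E[X])^2]
    mu = expectation(dist)
    return sum(p * (v - mu) ** 2 for v, p in dist)

X = [(1, 1.0)]                      # the constant random variable
Y = [(0, 1 - 1e-6), (10**6, 1e-6)]  # the heavy-tailed random variable
print(expectation(X), variance(X))  # 1.0 0.0
print(expectation(Y), variance(Y))  # approximately 1.0 and 10^6
```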
Chebyshev's Inequality
Knowing the variance of a random variable X allows us to bound how far X strays from its expectation. Chebyshev's inequality states that for any a > 0,

Pr[|X − E[X]| ≥ a] ≤ Var[X]/a²
Example 5.43. The variance (more specifically the square root of the variance) can be used as a "ruler" to measure how far a random variable strays from its expectation. By convention, let σ = √Var[X], so that Var[X] = σ². Then by Chebyshev's inequality,

Pr[|X − E[X]| ≥ nσ] ≤ Var[X]/(n²σ²) = 1/n²
Chapter 6
Logic
“Logic will get you from A to B. Imagination will take you everywhere.”
–Albert Einstein
(P ∧ Q) → R

If P is the atom "it is raining in Ithaca", Q is the atom "I have an umbrella", and R is the atom "I open my umbrella", then the formula reads as: "if it is raining in Ithaca and I have an umbrella, then I open my umbrella".
φ  ψ | ¬φ | φ ∧ ψ | φ ∨ ψ | φ → ψ | φ ↔ ψ
T  T |  F |   T   |   T   |   T   |   T
T  F |  F |   F   |   T   |   F   |   F
F  T |  T |   F   |   T   |   T   |   F
F  F |  T |   F   |   F   |   T   |   T

Figure 6.1: The truth table definition of the connectives NOT (¬), AND (∧), OR (∨), implication (→), and equivalence (↔).
Most of the definitions are straightforward. NOT flips a truth value; AND outputs true iff both inputs are true; OR outputs true iff at least one of the inputs is true; equivalence outputs true iff both inputs have the same truth value. Implication (→) may seem strange at first: φ → ψ is false only if φ is true, yet ψ is false. In particular, φ → ψ is true whenever φ is false, regardless of what ψ is. An example of this in English might be "if pigs fly, then I am the president of the United States"; this seems like a correct statement regardless of who says it, since pigs don't fly in our world.1
Finally, we denote the truth value of a formula φ, evaluated on an interpretation I, by φ[I]. We define φ[I] inductively:

• If φ is an atom P, then φ[I] is the truth value that I assigns to P.

• If φ = ¬ψ, then φ[I] = ¬ψ[I].

• If φ = φ1 ∧ φ2, then φ[I] = φ1[I] ∧ φ2[I] (using Figure 6.1). The value of φ[I] is similarly defined if φ = φ1 ∨ φ2, φ1 → φ2 or φ1 ↔ φ2.

Given a formula φ, we call the mapping from interpretations to the truth value of φ (i.e., the mapping that takes I to φ[I]) the truth table of φ.
At this point, for convenience, we add the symbols T and F as special atoms that are always true or false, respectively. This does not add anything of real substance to propositional logic, since we can always replace T by "P ∨ ¬P" (which always evaluates to true), and F by "P ∧ ¬P" (which always evaluates to false).
Equivalence of Formulas
We say that two formulas φ and ψ are equivalent (denoted φ ⇔ ψ) if for all interpretations I, they evaluate to the same truth value (equivalently, if φ and ψ have the same truth table). How many possible truth tables are there over n atoms? Because each atom is either true or false, we have 2^n interpretations. A formula can evaluate to true or false on each of these interpretations, resulting in 2^(2^n) possible truth tables.
1
A related notion, counterfactuals, is not captured by propositional implication. In the sentence "if pigs were to fly then they would have wings", the speaker knows that pigs do not fly, but wishes to make a logical conclusion in an imaginary world where pigs do. Formalizing counterfactuals is still a topic of research in logic.
P  Q | φ (= P → Q)
T  T |      T
T  F |      F
F  T |      T
F  F |      T

We find the rows where φ is true; for each such row we create an AND formula that is true iff P and Q take on the values of that row, and then we OR these formulas together. That is:

φ ⇔ (P ∧ Q) ∨ (¬P ∧ Q) ∨ (¬P ∧ ¬Q)
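The construction above can be sketched mechanically. The following Python illustration (ours, not from the text; the string encoding of formulas is an assumption) builds a DNF formula from any truth table given as a function:

```python
from itertools import product

def truth_table_to_dnf(atoms, func):
    """Build a DNF formula (as a string) equivalent to `func`,
    which maps truth values for `atoms` to True/False."""
    clauses = []
    for values in product([True, False], repeat=len(atoms)):
        if func(*values):
            # AND together literals matching this true row.
            lits = [a if v else "~" + a for a, v in zip(atoms, values)]
            clauses.append("(" + " & ".join(lits) + ")")
    # OR the row-formulas together.
    return " | ".join(clauses)

# phi = P -> Q is true on rows (T,T), (F,T), (F,F):
dnf = truth_table_to_dnf(["P", "Q"], lambda p, q: (not p) or q)
print(dnf)  # (P & Q) | (~P & Q) | (~P & ~Q)
```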
The equivalence

P → Q ⇔ ¬P ∨ Q    (6.1)

is a very useful way to think about implication (and a very useful formula for manipulating logic expressions).
Finally, we remark that we do not need both OR and AND (∨ and ∧) to capture all truth tables. This follows from De Morgan's laws:

¬(φ ∧ ψ) ⇔ ¬φ ∨ ¬ψ
¬(φ ∨ ψ) ⇔ ¬φ ∧ ¬ψ    (6.2)

Coupled with the (simple) equivalence ¬¬φ ⇔ φ, we can eliminate AND (∧), for example, using

φ ∧ ψ ⇔ ¬(¬φ ∨ ¬ψ)
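Since equivalence over n atoms only involves 2^n interpretations, these identities can be checked by brute force. A small Python sketch (ours; `equivalent` is an assumed helper name):

```python
from itertools import product

def equivalent(f, g, n):
    # Two formulas are equivalent iff they agree on all 2^n interpretations.
    return all(f(*vals) == g(*vals) for vals in product([True, False], repeat=n))

# De Morgan's laws (6.2):
assert equivalent(lambda p, q: not (p and q), lambda p, q: (not p) or (not q), 2)
assert equivalent(lambda p, q: not (p or q), lambda p, q: (not p) and (not q), 2)
# Eliminating AND using only OR and NOT:
assert equivalent(lambda p, q: p and q, lambda p, q: not ((not p) or (not q)), 2)
print("all equivalences verified")
```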
Definition 6.2 (Validity). A formula φ is valid (or a tautology) if for all truth assignments I, I |= φ.
• P ∧ ¬P is unsatisfiable.

• P ∨ ¬P is valid.
(P → Q) ∨ (Q → P)

(¬P ∨ Q) ∨ (¬Q ∨ P)
Proof. The claim essentially follows from the definitions. If φ is valid, then φ[I] = T for every interpretation I. This means (¬φ)[I] = F for every interpretation I, and so ¬φ is unsatisfiable. The other direction follows similarly.
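The claim translates directly into code. A minimal Python sketch (ours; the helper names are assumptions), checking validity of φ via unsatisfiability of ¬φ:

```python
from itertools import product

def is_valid(f, n):
    # Valid: true on all 2^n interpretations.
    return all(f(*v) for v in product([True, False], repeat=n))

def is_satisfiable(f, n):
    # Satisfiable: true on at least one interpretation.
    return any(f(*v) for v in product([True, False], repeat=n))

phi = lambda p: p or not p          # P v ~P
assert is_valid(phi, 1)
assert not is_satisfiable(lambda p: not phi(p), 1)  # ~phi is unsatisfiable
assert not is_satisfiable(lambda p: p and not p, 1)
print("validity <=> negation unsatisfiable, verified on examples")
```

The brute force takes 2^n steps, consistent with the footnotes: no efficient algorithm is known for satisfiability in general.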
2
In complexity jargon, checking if a formula is satisfiable is “NP-complete”, and finding
an efficient algorithm to determine satisfiability would show that P=NP.
3
In complexity jargon, the unsatisfiability problem is co-NP complete. The major open
problem here is whether or not NP=coNP; that is, whether there exists an efficient way of
convincing someone that a formula is unsatisfiable.
“Bob carries an umbrella if it is cloudy and the forecast calls for rain.”
Next, suppose we know that it is not cloudy.
Can we conclude that Bob is not carrying an umbrella? The answer is no.
Bob may always carry an umbrella around to feel secure (say in Ithaca).
To make sure that we make correct logical deductions in more complex
settings, let us cast the example in the language of propositional logic. Let P be
the atom “it is cloudy”, Q be the atom “the forecast calls for rain”, and R be the
atom “Bob carries an umbrella”. Then we are given two premises:
(P ∧ Q) → R,  ¬P
Can we make the conclusion that ¬R is true? The answer is no, because the
truth assignment P = Q = F, R = T satisfies the premises, but does not satisfy
the conclusion.4 The next definition formalizes proper logical deductions.
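Such entailment questions can be settled by exhaustively checking all interpretations. A short Python sketch (ours; the encoding of premises as boolean functions is an assumption) confirms the counterexample above:

```python
from itertools import product

def entails(premises, conclusion, n):
    # premises entail conclusion iff every interpretation satisfying
    # all premises also satisfies the conclusion.
    return all(
        conclusion(*v)
        for v in product([True, False], repeat=n)
        if all(p(*v) for p in premises)
    )

premise1 = lambda p, q, r: (not (p and q)) or r  # (P & Q) -> R
premise2 = lambda p, q, r: not p                 # ~P
conclusion = lambda p, q, r: not r               # ~R
# False: the assignment P = Q = F, R = T satisfies both premises
# but not the conclusion.
print(entails([premise1, premise2], conclusion, 3))  # False
```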
If direction. Assume φ = (φ1 ∧ · · · ∧ φn) → ψ is valid. For any truth assignment I that satisfies all of φ1, . . . , φn, we have (φ1 ∧ · · · ∧ φn)[I] = T. We also have φ[I] = ((φ1 ∧ · · · ∧ φn) → ψ)[I] = T due to validity. Together this means ψ[I] must be true, by observing the truth table for implication (→). This shows that φ1, . . . , φn entails ψ.
Theorem 6.6 gives us further evidence that we have defined implication (→) correctly. We allow arguments to be valid even if the premises are false.
Axiom Systems
Checking the validity of a formula is difficult (as we discussed, it has been a long-standing open question). On the other hand, we perform logical reasoning every day, in mathematical proofs and in English. An axiom system formalizes the reasoning tools we use in a syntactic way (i.e., pattern matching and string manipulation of formulas) so that we can study and eventually automate the reasoning process.
Definition 6.7. An axiom system H consists of a set of formulas, called the axioms, and a set of rules of inference. A rule of inference is a way of producing a new formula (think of it as a new logical conclusion), given several established formulas (think of them as known facts). A rule of inference has the form:

φ1
φ2
 ⋮
φn
————
ψ

This means "from the formulas φ1, . . . , φn we may infer ψ". We also use the notation φ1, . . . , φn ⊢ ψ (note that this is different from the symbol for satisfiability, |=).
For example, an axiom system that contains an invalid axiom is not sound,
while a trivial axiom system that contains no axioms or no rules of inference is
trivially incomplete.
Rules of inference. Here are well-known (and sound) rules of inference for
propositional logic:
It is easy to see that all of the above inference rules preserve validity, i.e., the antecedents (premises) entail the conclusion. Therefore an axiom system using these rules will at least be sound.
5We have left out the (rather tedious) formal definitions of “matching” against axioms
or inference rules. This is best explained through examples later in the section.
1. ¬C                    an axiom
2. ¬C → (A → C)          an axiom
3. A → C                 Modus Ponens, from lines 1 and 2
4. ¬A                    Modus Tollens, from lines 3 and 1

1. A ∨ B                 an axiom
2. ¬A                    an axiom
3. B                     Disjunctive Syllogism, from lines 1 and 2
4. ¬B ∨ (C ∧ ¬C)         an axiom
5. C ∧ ¬C                Disjunctive Syllogism, from lines 3 and 4
φ → (ψ → φ)

By this we mean that any formula that "matches" against the axiom is assumed to be true. For example, let P and Q be atoms; then both of the following are instances of the axiom:

P → (Q → P)
(P ∨ Q) → ((Q → P) → (P ∨ Q))
φ → (ψ → φ)                                  (A1)
(φ → (ψ → χ)) → ((φ → ψ) → (φ → χ))          (A2)
(¬ψ → ¬φ) → (φ → ψ)                          (A3)
Theorem 6.12. The axioms (A1), (A2) and (A3), together with the inference rule Modus Ponens, form a sound and complete axiom system for propositional logic (restricted to the connectives → and ¬).
The proof of Theorem 6.12 is out of the scope of this course (although keep in mind that soundness follows from the fact that our axioms are tautologies and Modus Ponens preserves validity). We remark that the derivations guaranteed by Theorem 6.12 (for valid formulas) are by and large so long and tedious that they are better suited to be generated and checked by computers.
Simplification:
φ ∧ ψ
—————
φ

Addition:
φ
—————
φ ∨ ψ
Most of the time we also add rules of "replacement" which allow us to rewrite formulas into equivalent (and simpler) forms, e.g., using De Morgan's laws (6.2) or the implication equivalence (6.1).
6.3 First Order Logic

The classical Socrates syllogism can be written in first order logic:

∀x (Man(x) → Mortal(x))
Man(Socrates)
—————————————
Mortal(Socrates)
Several syntax features of first order logic can be seen above: ∀ is one of the two quantifiers introduced in first order logic; x is a variable; Socrates is a constant (a particular person); Mortal(x) and Man(x) are predicates.
Formally, an atomic expression is a predicate symbol (e.g., Man(x), LessThan(x, y)) with the appropriate number of arguments; the arguments can either be constants (e.g., the number 0, Socrates) or variables (e.g., x, y and z). A first order formula, similar to propositional logic, consists of multiple atomic expressions connected by connectives. The formal recursive definition goes as follows:
• [New to first order logic.] If φ is a formula and x is a variable, then ∀xφ (for all x the formula φ holds) and ∃xφ (for some x the formula φ holds) are also formulas.
Example 6.13. The following formula says that the binary predicate P is
transitive:
∀x∀y∀z ((P(x, y) ∧ P(y, z)) → P(x, z))
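Over a finite domain, a quantified formula like this can be checked by exhaustive enumeration. A small Python sketch (ours; the domain and the example predicates are assumptions for illustration):

```python
from itertools import product

def is_transitive(domain, P):
    # Checks the formula "for all x, y, z: (P(x,y) and P(y,z)) -> P(x,z)"
    # over a finite domain, by enumerating all triples.
    return all(
        (not (P(x, y) and P(y, z))) or P(x, z)
        for x, y, z in product(domain, repeat=3)
    )

domain = range(5)
print(is_transitive(domain, lambda x, y: x < y))         # True: "<" is transitive
print(is_transitive(domain, lambda x, y: x == y + 1))    # False: "successor" is not
```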
Example 6.14. The following formula says that the constant "1" is a multiplicative identity (the ternary predicate Mult(x, y, z) is defined to be true if xy = z):

∀x (Mult(1, x, x) ∧ Mult(x, 1, x))
Can you extend the formula to enforce that “1” is the unique multiplicative
identity?
Example 6.15. The following formula says that every number except 0 has a multiplicative inverse:

∀x (¬(x = 0) → ∃y Mult(x, y, 1))
Example 6.17. In the following formula (which is not a sentence), the first occurrence of x is free, and the second one is bound:

∀y P(x, y) ∧ ∀x R(x)

The next formula is a sentence (note that in this case, ∀x captures both occurrences of x):

∀y∀x (P(x, y) ∧ R(x))
For example, in the Socrates example, we could have D be the set of all people (or the set of all living creatures, or the set of all Greeks). An interpretation I would need to single out Socrates in D, and also specify for each a ∈ D whether Man(a) and Mortal(a) hold.
Given a first-order sentence φ, a domain D and an interpretation I = I_D (together (D, I) is called a model), we can define the truth value of φ, denoted by φ[I], recursively:

• If φ has the form ¬ψ, ψ1 ∧ ψ2, ψ1 ∨ ψ2, ψ1 → ψ2 or ψ1 ↔ ψ2, then φ[I] = ¬ψ[I], ψ1[I] ∧ ψ2[I], ψ1[I] ∨ ψ2[I], ψ1[I] → ψ2[I] or ψ1[I] ↔ ψ2[I], respectively (following the truth tables for ¬, ∧, ∨, →, and ↔).
• If φ has the form ∀xψ, then φ[I] is true if and only if for every element a ∈ D, ψ, with free occurrences of x replaced by a, evaluates to true.

• If φ has the form ∃xψ, then φ[I] is true if and only if there exists some element a ∈ D such that ψ, with free occurrences of x replaced by a, evaluates to true.
A note on the truth value of first order formulas [optional]. We have cheated in our definition above, in the case that φ = ∀xψ or ∃xψ. When we replace free occurrences of x in ψ by a, we no longer have a formula (because strictly speaking, "a", an element, is not part of the language). One workaround is to extend the language with constants for each element in the domain (this has to be done after the domain D is fixed). A more common approach (but slightly more complicated) is to define truth values for all formulas, including those that are not sentences. In this case, the interpretation
6.4 Applications
Logic has a wide range of applications in computer science, including program
verification for correctness, process verification for security policies, information
access control, formal proofs of cryptographic protocols, etc.
In a typical application, we start by specifying a "model", a desired property expressed in logic, e.g., we want to check that a piece of code does not create deadlocks. We next describe the "system" in logic, e.g., the piece of code and the logic behind code execution. It then remains to show that our system satisfies our desired model, using tools in logic; this process is called model checking. Edmund Clarke received the Turing Award in 2007 for his work on hardware verification using model checking. He graduated with his Ph.D. from Cornell in 1976 with Bob Constable as advisor.
Chapter 7
Graphs
Graphs are simple but extremely useful mathematical objects; they are
ubiquitous in practical applications of computer science. For example:
• In a digital map, nodes are intersections (or cities), and edges are roads (or highways). We may have directed edges to capture one-way streets, and weighted edges to capture distance. This graph is then used for generating directions (e.g., in GPS units).
• On the internet, nodes are web pages, and (directed) edges are links from one
web page to another. This graph can be used to rank the importance of each
web page for search results (eg, the importance of a web page can be
determined by how many other web pages are pointing to it, and recursively
how important those web pages are).
• In a social network, nodes are people, and edges are friendships. Understanding social networks is a very hot topic of research. For example, how does a network achieve "six degrees of separation", where everyone is approximately 6 friendships away from anyone else? This is also known as the small-world phenomenon; Watts and Strogatz (from Cornell) published the first models of social graphs with this property in 1998.
Graph Representations
The way a graph is represented by a computer can affect the efficiency of various graph algorithms. Since graph algorithms are not a focus of this course, we instead examine the space efficiency of the different common representations. Given a graph G = (V, E):
Adjacency Matrix. We can number the vertices v1 to vn, and represent the edges in an n by n matrix A. The entry in row i and column j of the matrix, a_ij, is 1 if and only if there is an edge from vi to vj. If the graph is undirected, then a_ij = a_ji and the matrix A is symmetric about the diagonal; in this case we can just store the upper right triangle of the matrix.
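The common representations can be sketched concretely. A Python illustration (ours; the example graph and variable names are assumptions) builds an adjacency matrix and adjacency lists from the same edge list:

```python
# Three views of the same undirected 4-vertex graph
# with edges {0,1}, {0,2}, {1,2}, {2,3}.
n = 4
edge_list = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: n x n, symmetric for an undirected graph.
matrix = [[0] * n for _ in range(n)]
for u, v in edge_list:
    matrix[u][v] = matrix[v][u] = 1

# Adjacency lists: for each vertex, the list of its neighbors.
adj = {u: [] for u in range(n)}
for u, v in edge_list:
    adj[u].append(v)
    adj[v].append(u)

print(matrix[0][1], matrix[1][0])  # 1 1: edge lookup is O(1) in the matrix
print(adj[2])                      # [0, 1, 3]: neighbors of vertex 2
```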
Edge Lists. We may simply have a list of all the edges in E, which implicitly
defines a set of “interesting” vertices (vertices that have at least one edge
entering or leaving).
If the graph is dense (i.e., has lots of edges), then consider the adjacency matrix representation. The matrix requires storing O(n²) entries, which is comparable to the space required by adjacency lists or edge lists if the graph is dense. In return, the matrix allows very efficient lookups of whether an edge (u, v) exists (by comparison, if adjacency lists are used, we would need to traverse the whole adjacency list for the vertex u). For sparse graphs, using adjacency lists or edge lists can result in large savings in the size of the representation.1
Vertex Degree
The degree of a vertex corresponds to the number of edges coming out of or going into the vertex. This is defined slightly differently for directed and undirected graphs.
Theorem 7.3. For any graph G = (V, E), ∑_{v∈V} deg(v) = 2|E|.

Proof. In a directed graph, each edge contributes once to the in-degree of some vertex and the out-degree of some, possibly the same, vertex. In an undirected
1
Since the advent of the internet, we now have graphs of unprecedented sizes (e.g., the graph of social networks such as Facebook, or the graph of web pages). Storing and working with these graphs is an entirely different science and a hot topic of research backed by both academic and commercial interests.
graph, each non-looping edge contributes once to the degree of exactly two vertices, and each self-loop contributes twice to the degree of one vertex. In both cases we conclude that 2|E| = ∑_{v∈V} deg(v).
Corollary 7.4. In a graph, the number of vertices with an odd degree is even.
Proof. Let A be the set of vertices of even degree, and B = V \ A be the set of vertices of odd degree. Then by Theorem 7.3,

2|E| = ∑_{v∈A} deg(v) + ∑_{v∈B} deg(v)

Since the LHS and the first term of the RHS are even, we have that ∑_{v∈B} deg(v) is even. In order for a sum of odd numbers to be even, there must be an even number of terms.
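Both facts hold for every graph, so they can be spot-checked on random instances. A Python sketch (ours; the random-graph construction is an assumption for illustration):

```python
import random

def degrees(n, edges):
    # Undirected degree count; a self-loop would contribute 2 (none here).
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

# Check Theorem 7.3 and Corollary 7.4 on a random simple graph.
rng = random.Random(1)
n = 10
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.5]
deg = degrees(n, edges)
assert sum(deg) == 2 * len(edges)                      # handshake: sum of degrees = 2|E|
assert sum(1 for d in deg if d % 2 == 1) % 2 == 0      # odd-degree vertices come in pairs
print("verified on", len(edges), "edges")
```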
The only difference between G1 and G2 is the names of the vertices; they are clearly the same graph! On the other hand, the graphs H1 and H2 are clearly different (e.g., in H1, there is a node without any incoming or outgoing edges). What about the undirected graphs shown in Figure 7.1c? One can argue that K1 and K2 are also the same graph. One way to get K2 from K1 is to rename/permute the nodes a, b and c to b, c and a, respectively. (Can you name another renaming scheme?)
2The name stems from the anecdote that, at any party, the number of people who shake hands with an odd number of people is even.
[Figure 7.1: three pairs of example graphs. (a) G1 and G2 are clearly isomorphic. (b) H1 and H2 are clearly not isomorphic. (c) Are K1 and K2 isomorphic?]
Definition 7.5. Two graphs G1 = (V1, E1) and G2 = (V2, E2) are isomorphic if there exists a bijection f : V1 → V2 such that (u, v) ∈ E1 if and only if (f(u), f(v)) ∈ E2. The bijection f is called the isomorphism from G1 to G2, and we use the notation G2 = f(G1).
As we would expect, the definition of isomorphism, through the use of the bijection f, ensures, at the very least, that the two graphs have the same number of vertices and edges. Another observation is that given an isomorphism f from G1 to G2, the inverse function f⁻¹ is an isomorphism from G2 to G1 (we leave it to the readers to verify the details); this makes sense since we would expect isomorphism to be symmetric.
Given two graphs, how do we check if they are isomorphic? This is a hard problem for which no efficient algorithm is known. However, if an isomorphism f is found, it can be efficiently stored and validated (as a proper isomorphism) by anyone. In other words, f serves as a short and efficient proof that two graphs are isomorphic.
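A brute-force check makes both points concrete: finding f takes exponential time, while validating a given f is fast. A Python sketch (ours; the encoding of graphs as edge sets over vertices 0..n−1 is an assumption):

```python
from itertools import permutations

def isomorphic(n, edges1, edges2):
    """Brute-force isomorphism test for two undirected graphs on
    vertices 0..n-1. Tries all n! bijections, matching the text:
    no efficient algorithm is known in general."""
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    if len(e1) != len(e2):
        return None
    for perm in permutations(range(n)):
        # Validating a candidate bijection is cheap.
        if {frozenset((perm[u], perm[v])) for u, v in e1} == e2:
            return perm  # the bijection f: a short certificate
    return None

# Triangle 0-1-2 plus isolated vertex 3, vs. triangle 1-2-3 plus isolated 0:
print(isomorphic(4, [(0, 1), (1, 2), (0, 2)], [(1, 2), (2, 3), (1, 3)]) is not None)  # True
# Triangle vs. path: same edge count, but not isomorphic:
print(isomorphic(4, [(0, 1), (1, 2), (0, 2)], [(0, 1), (1, 2), (2, 3)]))  # None
```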
Can we prove that two graphs are not isomorphic in an efficient way? Sure, if the graphs have a different number of vertices or edges. Or we may be able to find some structure present in one graph G1 that can be checked to not be in the other graph G2, e.g., G1 contains a "triangle" (three nodes that are all connected to each other) but G2 doesn't, or G1 has a vertex of degree 10 but G2 doesn't. Unfortunately, no general and efficient method is known for proving that two graphs are not isomorphic. This is analogous to the task of
Figure 7.2: An interactive protocol for graph non-isomorphism (the verifier should
accept when G1 and G2 are not isomorphic).
Interactive Proofs
In 1985, Goldwasser, Micali and Rackoff, and independently Babai, found a workaround to prove that two graphs are not isomorphic. The magic is to add interaction to proofs. Consider a proof system that consists of two players, a prover P and a verifier V, where the players can communicate interactively with each other, instead of the prover writing down a single proof. In general the prover (who comes up with the proof) may not be efficient, but the verifier (who checks the proof) must be. As with any proof system, we desire completeness: on input non-isomorphic graphs, the prover P should be able to convince V of this fact. We also require soundness, but with a slight relaxation: on input isomorphic graphs, no matter what the prover says to V, V should reject with very high probability. We present an interactive proof for graph non-isomorphism in Figure 7.2.
Let us check that the interactive proof in Figure 7.2 is complete and sound.
Completeness: If the graphs are not isomorphic, then H is isomorphic to G_b, but not to the other input graph G_{1−b}. This allows P to determine b′ = b every time.
As of now, given isomorphic graphs, the verifier accepts or rejects with probability 1/2; this may not fit the description "reject with very high probability". Fortunately, we can amplify this probability by repeating the protocol (say) 100 times, and let the verifier accept if and only if b′ = b in all 100 repetitions. Then by independence, the verifier would accept in the end with probability at most 1/2¹⁰⁰, and reject with probability at least 1 − 1/2¹⁰⁰. Note that the completeness of the protocol is unchanged even after the repetitions.
• A walk can always be "trimmed" in such a way that every vertex is visited at most once, while keeping the same starting and ending vertices.
Example 7.7. The Bacon number of an actor or actress is the distance (the length of the shortest path) from the actor or actress to Kevin Bacon on the following graph: the nodes are actors and actresses, and edges connect people who star together in a movie. The Erdős number is similarly defined to be the distance of a mathematician to Paul Erdős on the co-authorship graph.
Figure 7.3: The difference between strong and weak connectivity in a directed
graph.
Connectivity
Definition 7.8. An undirected graph is connected if there exists a path between any two nodes u, v ∈ V (note that a graph containing a single node v is considered connected via the length-0 path (v)).
3We have avoided describing implementation details of BFS. It suffices to say that BFS
is very efficient, and can be implemented to run in linear time with respect to the size of the
graph. Let us also mention here that an alternative graph search algorithm, depth first search
(DFS), can also be used here (and is more efficient than BFS at computing strongly
To see why, first note that if there is no path between v and u, then of course the search algorithm will never reach u. On the other hand, assume that there exists a path between v and u, but, for the sake of contradiction, that the BFS algorithm does not visit u after all the reachable vertices are visited. Let w be the first node on the path from v to u that is not visited by BFS (such a node must exist because u is not visited). We know w ≠ v since v is visited right away. Let w′ be the vertex before w on the path from v to u, which must be visited because w is the first unvisited vertex on the path. But this gives a contradiction; after BFS visits w′, it must also visit w since w is an unvisited neighbor of w′.4
Now let us use BFS to find the connected components of a graph. Simply start BFS from any node v; when the graph search ends, all visited vertices form one connected component. Repeat the BFS on the remaining unvisited nodes to recover additional connected components, until all nodes are visited.
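The repeated-BFS idea can be sketched directly (a Python illustration of ours; the adjacency-list encoding and the small example graph are assumptions):

```python
from collections import deque

def connected_components(n, adj):
    """Find the connected components of an undirected graph via
    repeated BFS. adj[u] is the list of neighbors of u."""
    unvisited = set(range(n))
    components = []
    while unvisited:
        # Start a fresh BFS from any unvisited node.
        start = next(iter(unvisited))
        unvisited.discard(start)
        queue = deque([start])
        comp = [start]
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w in unvisited:
                    unvisited.discard(w)
                    comp.append(w)
                    queue.append(w)
        components.append(sorted(comp))
    return components

# Two components: the path 0-1-2 and the single edge 3-4.
adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
print(connected_components(5, adj))  # [[0, 1, 2], [3, 4]]
```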
Theorem 7.11. An undirected graph G = (V, E) has an Euler cycle if and only if G is connected and every v ∈ V has even degree. Similarly, a directed graph G = (V, E) has an Euler cycle if and only if G is strongly connected and every v ∈ V has equal in-degree and out-degree.
Proof. We prove the theorem for the case of undirected graphs; it generalizes easily to directed graphs. First observe that if G has an Euler cycle, then of course G is connected by the cycle. Because every edge is in the cycle, and each time the cycle visits a vertex it must enter and leave, the degree of each vertex is even.
To show the converse, we describe an algorithm that builds the Euler cycle assuming connectivity and that each node has even degree. The algorithm grows the Euler cycle in iterations. Starting from any node v, follow any path in the graph without reusing edges (at each node, pick some unused edge to continue the path). We claim the path must eventually return to v; this is because the path cannot go on forever, and cannot terminate on any other vertex u ≠ v due to the even-degree constraint: if there is an available edge into u, there is also an available edge out of u. That is, we now have a cycle (from v to v). If the cycle uses all edges in G then we are done.
Otherwise, find the first node on the cycle, w, that still has an unused edge; w must exist since otherwise the cycle would be disconnected from the part of G that still has unused edges. We repeat the algorithm starting from vertex w, resulting in a cycle from w to w that does not repeat edges, and does not use edges in the cycle from v to v. We can then "stitch" these two cycles together into a larger cycle: follow the original cycle from v up to w, traverse the new cycle from w back to w, and then continue the original cycle back to v.
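This walk-and-stitch procedure is commonly implemented as Hierholzer's algorithm. A compact Python sketch (ours; the stack-based formulation and the "bowtie" example graph are assumptions):

```python
def euler_cycle(adj):
    """Hierholzer's algorithm, the stitching idea from the proof: walk
    until stuck (back at the start), then splice in detours from nodes
    that still have unused edges. Assumes the graph is connected and
    all degrees are even. adj maps each vertex to a neighbor list."""
    remaining = {u: list(vs) for u, vs in adj.items()}  # copy, so we can consume edges
    start = next(iter(remaining))
    stack, cycle = [start], []
    while stack:
        u = stack[-1]
        if remaining[u]:
            v = remaining[u].pop()
            remaining[v].remove(u)  # use up the edge {u, v}
            stack.append(v)
        else:
            cycle.append(stack.pop())  # backtrack: u is finished
    return cycle

# A "bowtie": two triangles sharing vertex 2 (all degrees even).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3]}
cycle = euler_cycle(adj)
print(cycle[0] == cycle[-1], len(cycle))  # True 7: 6 edges, so 7 vertices with repeats
```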
We can relax the notion of Euler cycles into Euler paths — a path that uses
every edge in the graph exactly once.
Corollary 7.12. An undirected graph G = (V, E) has an Euler path, but not an Euler cycle, if and only if the graph is connected and exactly two nodes have an odd degree.
Proof. Again it is easy to see that if G has an Euler path that is not a cycle, then
the graph is connected. Moreover, the starting and ending nodes of the path,
and only these two nodes, have an odd degree.
To prove the converse, we reduce the problem to finding an Euler cycle. Let u, v ∈ V be the two nodes that have an odd degree. Consider introducing an extra node w and the edges {u, w}, {v, w}. This modified graph satisfies the requirements for having an Euler cycle! Once we find the cycle in the modified graph, simply break the cycle at node w to get an Euler path from u to v in the original graph.
Definition 7.14. A graph is k-colorable if it can be colored with k colors, i.e., there exists a coloring c satisfying ∀v ∈ V, 0 ≤ c(v) < k. The chromatic number χ(G) of a graph G is the smallest number k such that G is k-colorable.
Here are some easy observations and special cases of graph coloring:
• A fully connected graph with n nodes (i.e., every two distinct nodes share an edge) has chromatic number n; every node must have a unique color, and giving every node a unique color works.
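These definitions translate directly into a brute-force checker. A Python sketch (ours; the k^n enumeration reflects that no efficient algorithm is known in general):

```python
from itertools import product

def is_k_colorable(n, edges, k):
    # Try all k^n colorings c : V -> {0, ..., k-1}; exponential time,
    # consistent with 3-colorability being NP-complete in general.
    for c in product(range(k), repeat=n):
        if all(c[u] != c[v] for u, v in edges):
            return True
    return False

def chromatic_number(n, edges):
    k = 1
    while not is_k_colorable(n, edges, k):
        k += 1
    return k

# A triangle needs 3 colors; an even cycle needs only 2.
print(chromatic_number(3, [(0, 1), (1, 2), (0, 2)]))          # 3
print(chromatic_number(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # 2
```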
Coloring Planar Graphs. A graph is planar if all the edges can be drawn on a plane (e.g., a piece of paper) without any edges crossing. A well-known result in mathematics states that all planar graphs are 4-colorable. E.g., the complete graph with 5 nodes cannot be planar, since it requires 5 colors! Also known as the "4-color map theorem", this allows any map to be colored with only four colors for all the countries (or states, provinces) without ambiguity (no neighboring countries will be colored the same). In general, checking whether a planar graph is 3-colorable is still NP-complete.
Step 3: P removes the cups covering u and v to reveal their colors, π(c(u)) and π(c(v)).
Figure 7.5: An interactive protocol for graph 3-coloring (the verifier should accept when the prover knows a valid 3-coloring of G).
prover will always convince the verifier by following the protocol. What if the
graph is not 3-colorable? With what probability can the prover cheat? The coloring π(c(·)) must be wrong for at least one edge. Since the verifier V asks the prover P to reveal the colors along a random edge, P will be caught with probability at least 1/|E|.
As before, even though the prover may cheat with a seemingly large probability, 1 − 1/|E|, we can amplify the probabilities by repeating the protocol (say) 100|E| times. Due to independence, the probability that the prover successfully cheats in all 100|E| repetitions is bounded by

(1 − 1/|E|)^(100|E|) = ((1 − 1/|E|)^|E|)^100 ≤ (e⁻¹)^100 = e⁻¹⁰⁰
The zero-knowledge property. It is easy to "prove" that a graph is 3-colorable: simply write down the coloring! Why do we bother with the interactive proof in Figure 7.5? The answer is that it has the zero-knowledge property. Intuitively, in a zero-knowledge interactive proof, the verifier should not learn anything from the interaction other than the fact that the statement proved is true. E.g., after the interaction, the verifier cannot better compute a 3-coloring for the graph, or better predict the weather for tomorrow. Zero-knowledge is roughly formalized by requiring that the prover only tells the verifier things that it already knows; that is, the prover's messages could have been generated by the verifier itself. Our 3-coloring interactive proof is zero-knowledge because the prover's messages consist only of two random colors (and anyone can pick out two random colors from {0, 1, 2}).
Implementing electronic “cups”. To implement a cup, the prover P can pick an RSA public-key (N,
e) and encrypt the color of each node using Padded RSA. To reveal a cup, the prover simply
provides the color and the padding (and the verifier can check the encryption). We use Padded
RSA instead of plain RSA because without the padding, the encryption of the same color would
always be the same; essentially, the encryptions themselves would give the coloring away.
Consider the following random process to generate an n-vertex graph: for each pair of vertices, randomly create an edge between them with independent probability 1/2 (we will not have self-loops). What is the probability that two nodes u and v are connected by a path of length at most 2? (This is a simple version of "six degrees of separation".) Taking any third node w, the probability that the path u–w–v does not exist is 3/4 (by independence). Again by independence, ranging over all possible third nodes w, the probability that the path u–w–v does not exist for all w ≠ u, w ≠ v is (3/4)^(n−2). Therefore, the probability that u and v are more than distance 2 apart is at most (3/4)^(n−2).
What if we look at all pairs of nodes? By the union bound (Corollary 5.13), the probability that some pair of nodes is more than distance 2 apart is

Pr[⋃_{u≠v} {u, v have distance > 2}] ≤ ∑_{u≠v} Pr[u, v have distance > 2] ≤ (n(n − 1)/2) · (3/4)^(n−2)
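The bound shrinks very quickly, which can be observed empirically. A Python sketch (ours; the sample size and n = 50 are arbitrary choices for illustration):

```python
import random
from itertools import combinations

def random_graph(n, rng):
    # Each pair gets an edge independently with probability 1/2.
    return {frozenset(p) for p in combinations(range(n), 2) if rng.random() < 0.5}

def diameter_at_most_2(n, edges):
    # Every pair must be adjacent or share a common neighbor.
    def adj(u, v):
        return frozenset((u, v)) in edges
    return all(
        adj(u, v) or any(adj(u, w) and adj(w, v) for w in range(n))
        for u, v in combinations(range(n), 2)
    )

# For n = 50 the bound (n(n-1)/2) * (3/4)^(n-2) is already about 0.001,
# so essentially every sample should have diameter <= 2.
rng = random.Random(0)
hits = sum(diameter_at_most_2(50, random_graph(50, rng)) for _ in range(20))
print(hits, "of 20 samples had diameter at most 2")
```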
Chapter 8
Finite Automata
• f is a transition function f : S × Σ → S.
Here is how a DFA operates on an input string x. The DFA starts in state s0 (the start state). It reads the input string x one character at a time, and transitions into a new state by applying the transition function f to the current
state and the character read. For example, if x = x1x2 · · · xn, the DFA would start by transitioning through the following states:

s0,  s1 = f(s0, x1),  s2 = f(s1, x2),  . . . ,  sn = f(s_{n−1}, xn)

After reading the whole input x, if the DFA ends in an accepting state sn ∈ F, then x is accepted. Otherwise x is rejected.
Definition 8.2. Given an alphabet Σ, a language L is just a set of strings over the alphabet Σ, i.e., L ⊆ Σ*. We say a language L is accepted or recognized by a DFA M if M accepts an input string x ∈ Σ* if and only if x ∈ L.
We can illustrate a DFA with a graph: each state s ∈ S becomes a node, and each mapping (s, σ) ↦ t in the transition function becomes an edge from s to t labeled by the character σ. The start state is usually represented by an extra edge pointing to it (from empty space), while the final states are marked with double circles.
Example 8.3. Consider the alphabet Σ = {0, 1} and the DFA M = (S, Σ, f, s0, F) defined by

S = {s0, s1},  F = {s0},  f(s0, 0) = s0,  f(s0, 1) = s1,  f(s1, 0) = s1,  f(s1, 1) = s0

The DFA M accepts all strings that have an even number of 1s. Intuitively, state s0 corresponds to "we have seen an even number of 1s", and state s1 corresponds to "we have seen an odd number of 1s". A graph of M looks like:
[State diagram: states s0 and s1, each with a self-loop labeled 0; edges labeled 1 from s0 to s1 and from s1 back to s0; s0 is the start state and the only accepting state.]
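Simulating a DFA is a direct transcription of the definition. A Python sketch (ours; encoding the transition function as a dictionary is an assumption) runs the even-number-of-1s DFA of Example 8.3:

```python
def run_dfa(states, transition, start, accepting, x):
    """Simulate a DFA on input string x;
    transition maps (state, character) -> state."""
    s = start
    for ch in x:
        s = transition[(s, ch)]
    return s in accepting  # accept iff we end in an accepting state

# The even-number-of-1s DFA of Example 8.3.
f = {
    ("s0", "0"): "s0", ("s0", "1"): "s1",
    ("s1", "0"): "s1", ("s1", "1"): "s0",
}
even_ones = lambda x: run_dfa({"s0", "s1"}, f, "s0", {"s0"}, x)
print(even_ones("0110"))  # True: two 1s
print(even_ones("100"))   # False: one 1
print(even_ones(""))      # True: zero 1s is even
```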
Figure 8.1: Illustration for Lemma 8.4. If a DFA M with < c states accepts the string 1^c, then it must also accept infinitely many other strings.
Lemma 8.4. Let c be a constant and L = {1^c} (the singleton language containing the string
of c many 1s). Then no DFA with < c states can accept L.
Proof. Assume the contrary, that some DFA M with < c states accepts L. Let s0, . . . , sc be the
states traversed by M to accept the string 1^c (so sc ∈ F is an accept state). By the
pigeonhole principle, some state is repeated, say s* = si = sj with 0 ≤ i < j ≤ c. But then M
also accepts the strings 1^(c−(j−i)), 1^(c+(j−i)), 1^(c+2(j−i)), and so on.
This gives a contradiction, since M accepts (infinitely) more strings than the language L.
On the other hand, see Figure 8.2 for a DFA with c + 2 states that accepts the language
{1^c}. The technique of Lemma 8.4 can be generalized to show the pumping lemma:
[Figure: states s0, s1, s2 (accepting) connected by edges labeled 1, plus a dead state with a self-loop labeled 0,1.]
Figure 8.2: A DFA with 4 states that accepts the language {1^2}. This can be easily
generalized to construct a DFA with c + 2 states that accepts the language {1^c}.
Lemma 8.5 (Pumping Lemma). If M is a DFA with k states and M accepts some string
x with |x| > k, then there exist strings u, v and w such that x = uvw, |uv| ≤ k, |v| ≥ 1, and
uv^i w is accepted by M for all i ∈ ℕ.
Proof sketch. Again let s0, . . . , s|x| be the states that M travels through to accept the
string x. By the pigeonhole principle, some state must be repeated among s0, . . . ,
sk, say s* = si = sj with 0 ≤ i < j ≤ k. We can now set u to be the first i characters of x, v
to be the next j − i > 0 characters of x, and w to be the rest of x.
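The proof sketch is constructive: tracing the DFA's states on x and stopping at the first repeated state yields the split x = uvw. A sketch in Python (the dictionary encoding of the DFA is a hypothetical convention of ours, not the text's):

```python
def pump_split(delta, start, x):
    """Trace the states visited while reading x; the first repeated state gives
    x = uvw with |v| >= 1, exactly as in the proof sketch of Lemma 8.5."""
    seen = {start: 0}
    s = start
    for idx, ch in enumerate(x, start=1):
        s = delta[(s, ch)]
        if s in seen:                    # a state repeated: pigeonhole principle
            i, j = seen[s], idx
            return x[:i], x[i:j], x[j:]
        seen[s] = idx
    return None                          # no repeat: happens only if |x| < #states

# The even-parity DFA of Example 8.3
delta = {('s0', '0'): 's0', ('s0', '1'): 's1',
         ('s1', '0'): 's1', ('s1', '1'): 's0'}
u, v, w = pump_split(delta, 's0', '0110')
# v brings the DFA back to the same state, so u + v*i + w is accepted for every i
```

Since the first repeat occurs within the first k + 1 visited states of a k-state DFA, the returned split automatically satisfies |uv| ≤ k.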
Example 8.6. No DFA can accept the language L = {0^n 1^n | n ∈ ℕ} (intuitively, this is another
counting exercise). If we take any DFA with N states, and assume that it accepts the string
0^N 1^N, then the pumping lemma says that the same DFA must accept the
strings 0^(N+t) 1^N, 0^(N+2t) 1^N, etc., for some 0 < t ≤ N.
Example 8.7. No DFA can accept the language L = {0^(n²) | n ∈ ℕ}. If we take any DFA
with N states, and assume that it accepts the string 0^(N²), then the pumping lemma
says that the same DFA must accept the strings 0^(N²+t), 0^(N²+2t), etc., for some 0 < t ≤ N. (In
particular, 0^(N²+t) ∉ L because N² < N² + t ≤ N² + N < (N + 1)².)
A game theory perspective. Having a computing model that does not count may be a
good thing. Consider the repeated prisoner's dilemma from game theory. We have two
prisoners under suspicion for robbery. Each prisoner may either cooperate (C) or
defect (D) (ie, they may keep their mouths shut, or rat each other out). The utilities of
the players (given both players' choices) are as follows (they are symmetric between
the players):
        C          D
C   (3, 3)    (−5, 5)
D   (5, −5)   (−3, −3)
Roughly, the utilities say the following. Both players cooperating is fine (both
prisoners get out of jail). But if one prisoner cooperates, the other should defect
(not only does the defector get out of jail, he also gets to keep the loot all to
himself, while his accomplice stays in jail for a long time). If both players
defect, then they both stay in jail.
In game theory we look for a stable state called a Nash equilibrium; we look at
a pair of strategies for the prisoners such that neither player has any incentive to
deviate. It is unfortunate (although realistic) that the only Nash equilibrium here is
for both prisoners to defect.
Now suppose we repeat this game 100 times. The total utility of a player
is Σ_{i=1}^{100} δ^i u^(i), where u^(i) is the utility of the player in round i, and 0 < δ < 1 is a discount
factor (for inflation and interest over time, etc.). Instead of prisoners, we now
have competing stores on the same street. To cooperate is to continue business
as usual, while to defect means to burn the other store down for the day.
Clearly cooperating all the way seems best. But knowing that the first store
would cooperate all the time, the second store should defect in that last (100th)
round. Knowing this, the first store would defect the round before (99th round).
Continuing this argument, the only Nash equilibrium is again for both prisoners
to always defect.
What happens in real life? Tit-for-tat seems to be the most popular strategy:
cooperate or defect according to the action of the other player in the previous
round (eg, cooperate if the other player cooperated). How can we change our
game theoretical model to predict the use of tit-for-tat?
Suppose players use a DFA (with output) to compute their decisions; the input
is the decision of the other player in the previous round. Also assume that players
need to pay for the number of states in their DFA (intuitively, having many states
is cognitively expensive). Then tit-for-tat is a simple DFA with just 1 state s, and
the identity transition function: f(s, C) = (s, C), f(s, D) = (s, D). Facing a player that
follows tit-for-tat, the best strategy would be to cooperate until round 99 and then
defect in round 100. But we have seen that counting with DFA requires many
states (and therefore bears a heavy cost)! This is especially true if the game has
more rounds, or if the
discount factor δ is harsh (ie, δ ≪ 1). If we restrict ourselves to 1-state DFAs, then both
players following tit-for-tat is a Nash equilibrium.
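A small simulation of the discounted repeated game makes the comparison concrete. This is a sketch, not part of the text: the payoff table is the one above, and the assumption that both machines open with C is ours.

```python
PAYOFF = {('C', 'C'): 3, ('C', 'D'): -5, ('D', 'C'): 5, ('D', 'D'): -3}

def play(strat1, strat2, rounds=100, discount=0.9):
    """Total discounted utility of each player. A strategy is a function from
    the opponent's previous move to the player's next move."""
    u1 = u2 = 0.0
    prev1 = prev2 = 'C'              # assumed opening move for both players
    for i in range(rounds):
        a, b = strat1(prev2), strat2(prev1)
        u1 += discount ** i * PAYOFF[(a, b)]
        u2 += discount ** i * PAYOFF[(b, a)]
        prev1, prev2 = a, b
    return u1, u2

tit_for_tat = lambda opp: opp        # the 1-state DFA: echo the opponent's move
always_defect = lambda opp: 'D'
```

Two tit-for-tat players cooperate throughout and each earns more than two always-defectors, which is the stable outcome described above for 1-state machines.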
An NFA M accepts (or recognizes) a language L if for all inputs x, M accepts x if and only if x ∈ L.
Note that just as it is possible for a state to have multiple possible transitions after
reading a character, a state may have no possible transitions. An input that simply
does not have a sequence of valid state transitions (ignoring final states altogether)
is of course rejected.
Note that an NFA is not a realistic “physical” model of computation.
At any point in the computation where there are multiple possible states to transition
into, it is hard to find locally the “correct transition”. An alternative model of
computation is a randomized finite automaton (RFA). An RFA is much like an NFA,
with the additional property that whenever there is a choice of transitions, the RFA
specifies the probability with which the automaton transitions to each of the
allowed states. Correspondingly, an RFA does not simply accept or reject an input x, but
instead accepts each input with some probability. Compared to an NFA, an RFA is a
more realistic “physical” model of computation.
Figure 8.3: An NFA with 5 states that accepts the language L3. Intuitively, given a
string x ∈ L3, the NFA would choose to remain in state s0 until it reads the third-to-last
character; it would then (magically decide to) transition to state s1, read the
final two characters (transitioning to s2 and s3), and accept.
The converse, that any x accepted by the NFA must be in the language L3, is
easy to see. This can be easily generalized to construct an NFA with n + 2 states
that accepts the language Ln.
Example 8.9. Consider the language Ln = {x ∈ {0, 1}* | |x| ≥ n, and the n-th character of x
counting from the end is a 1}. Ln can be
recognized by an O(n)-state NFA, as illustrated in Figure 8.3.
On the other hand, any DFA that recognizes Ln must have at least 2^n states.
(Can you construct a DFA with 2^n states for recognizing this language?) Let M be a DFA with
fewer than 2^n states. By the pigeonhole principle, there exist two n-bit strings x and
x′ such that M reaches the same state s after reading x or x′ as input (because
there are a total of 2^n n-bit strings). Let x and x′ differ in position i (1 ≤ i ≤ n), and
without loss of generality assume that xi = 1 and x′i = 0. Now consider the strings
x̂ = x 1^(i−1) and x̂′ = x′ 1^(i−1) (ie, append i − 1 many 1s, so that position i becomes the
n-th position counting from the end). M reaches the same state after reading x̂ or x̂′ (since it
reached the same state after reading x or x′), and so M must either accept both
strings or reject both strings. Yet x̂ ∈ Ln and x̂′ ∉ Ln, ie, M does not recognize
the language Ln.
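For the question above, a DFA with exactly 2^n states does exist: remember the last n characters read. A sketch (the padding-with-0s convention and all names are our own):

```python
from itertools import product

def build_ln_dfa(n):
    """DFA for L_n with 2^n states: each state is the window of the last n
    characters read, initially padded with 0s. Accept iff the window starts
    with 1, ie, the n-th character from the end of the input is a 1."""
    states = [''.join(bits) for bits in product('01', repeat=n)]
    delta = {(s, c): s[1:] + c for s in states for c in '01'}
    start = '0' * n
    accept = {s for s in states if s[0] == '1'}
    return delta, start, accept

def run(delta, start, accept, x):
    s = start
    for ch in x:
        s = delta[(s, ch)]
    return s in accept
```

Inputs shorter than n are correctly rejected, because a padding 0 then occupies the n-th position from the end of the window.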
• Upon reading the character σ ∈ Σ, we transition from state t ∈ P(S) to the state
corresponding to the union of all the possible states that M could have transitioned
into, if M is currently in any state s ∈ t. More formally, let

f′(t, σ) = ∪_{s ∈ t} f(s, σ)

• The start state of the DFA M′ is the singleton state containing the start state of
M, ie, t0 = {s0}.

• The final states of M′ are the states that contain a final state of M, ie, F′ = {t ∈ T = P(S) |
t ∩ F ≠ ∅}.

Intuitively, after reading any (partial) string, the DFA M′ tries to keep track of all possible states
that M may be in.
We now show that the DFA M′ accepts an input x if and only if the NFA M accepts x.
Assume that M accepts x; that is, there exists some computation path s0, s1, . . . , s|x|
such that s|x| ∈ F and s_{i+1} ∈ f(s_i, x_{i+1}). Consider the (deterministic) computation
path t0, t1, . . . , t|x| of M′ on input x. It can be shown inductively that s_i ∈ t_i for all
0 ≤ i ≤ |x|:

Base case. s0 ∈ t0 since t0 = {s0} by definition.

Inductive step. If s_i ∈ t_i, then because s_{i+1} ∈ f(s_i, x_{i+1}), we also have

s_{i+1} ∈ ∪_{s ∈ t_i} f(s, x_{i+1}) = f′(t_i, x_{i+1}) = t_{i+1}

We conclude that s|x| ∈ t|x|. Since s|x| ∈ F, we have t|x| ∈ F′ and so M′ would
accept x.
For the converse direction, assume the DFA M′ accepts x. Let t0, t1, . . . , t|x| be the
deterministic computation path of M′, with t|x| ∈ F′. From this we can
inductively define an accepting sequence of state transitions for M on input x, starting from
the final state and working backwards.

Base case. Because t|x| ∈ F′, there exists some s|x| ∈ t|x| such that s|x| ∈ F.

Inductive step. Given some s_{i+1} ∈ t_{i+1}, there must exist some s_i ∈ t_i such that
s_{i+1} ∈ f(s_i, x_{i+1}) (in order for t_i to transition to t_{i+1}).

It is easy to see that the sequence s0, . . . , s|x| inductively defined above is a valid, accepting
sequence of state transitions for M on input x: s0 is the start state of M since t0 = {s0},
s|x| ∈ F by the base case of the definition, and the
transitions are valid by the inductive step of the definition. Therefore M accepts x.
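The construction above can be sketched directly in Python. DFA states are frozensets of NFA states; as an optimization the proof does not need, we only build the subsets actually reachable from {s0}. The encoding conventions are our own.

```python
from collections import deque

def nfa_to_dfa(nfa_delta, start, accept, alphabet):
    """Subset construction. nfa_delta maps (state, char) to a set of states;
    a missing key means the NFA has no transition there."""
    t0 = frozenset([start])
    dfa_delta, finals = {}, set()
    queue, seen = deque([t0]), {t0}
    while queue:
        t = queue.popleft()
        if t & accept:                       # t contains a final state of the NFA
            finals.add(t)
        for c in alphabet:
            u = frozenset(s2 for s in t for s2 in nfa_delta.get((s, c), ()))
            dfa_delta[(t, c)] = u
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return dfa_delta, t0, finals

def run_dfa(dfa_delta, t0, finals, x):
    t = t0
    for ch in x:
        t = dfa_delta[(t, ch)]
    return t in finals

# A 3-state NFA for L_2 (second character from the end is a 1):
nfa = {('q0', '0'): {'q0'}, ('q0', '1'): {'q0', 'q1'},
       ('q1', '0'): {'q2'}, ('q1', '1'): {'q2'}}
dd, t0, F = nfa_to_dfa(nfa, 'q0', {'q2'}, '01')
```

On this example the reachable DFA has four states, matching the 2^n lower bound of Example 8.9 for n = 2.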
Definition 8.11. The set of regular expressions over alphabet Σ is defined
inductively as follows:

• ∅, ε, and x (for each character x ∈ Σ) are regular expressions.
• If A and B are regular expressions, then so are the concatenation AB, the alternation A|B, and the Kleene star A*.

Usually the Kleene star takes precedence over concatenation, which takes
precedence over alternation. In more complex expressions, we use parentheses
to disambiguate the order of operations between concatenations, alternations
and Kleene stars. Examples of regular expressions over the lower-case letters
include: ab|c*, (a|b)(c|ε), and ∅. A common extension of regular expressions is
the “+” operator; A+ is interpreted as syntactic sugar (a shortcut) for AA*.
As of now a regular expression is just a syntactic object — it is just a
sequence of symbols. Next we describe how to interpret these symbols to
specify a language.
• L(ε) = {ε} (ie, the set consisting only of the empty string).
• L(x) = {x} (ie, the singleton set consisting only of the one-character
string “x”).
Example 8.13. The parity language consisting of all strings with an even number
of 1s can be specified by the regular expression 0*(10*10*)*. The language
consisting of all finite strings over {0, 1} can be specified either by (0|1)* or
by (0*1*)*.
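We can sanity-check the parity expression with Python's re module (whose syntax matches the regular expressions used here, with \Z anchoring the end of the string), comparing against a direct count of 1s:

```python
import re
from itertools import product

parity = re.compile(r'0*(10*10*)*\Z')   # even number of 1s, as in Example 8.13

# Compare the regular expression against brute force on all short strings.
for n in range(9):
    for bits in product('01', repeat=n):
        s = ''.join(bits)
        assert bool(parity.match(s)) == (s.count('1') % 2 == 0)
print('parity expression agrees on all binary strings of length < 9')
```

Every even-parity string decomposes as leading 0s followed by blocks of the form 10*10*, one block per pair of 1s, which is exactly what the expression says.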
We can prove Kleene's Theorem constructively. That is, given any DFA, we
can generate an equivalent regular expression describing the language
recognized by the DFA, and vice versa. We omit the formal proof of Kleene's
Theorem; in the rest of this section, we give an outline of how a regular expression
can be transformed into an NFA (which can then be transformed into a DFA).
We sketch how any regular language can be recognized by an NFA; since NFAs
are equivalent to DFAs, this means any regular language can be recognized by
a DFA as well. The proof proceeds by induction over regular expressions.
Base case: It is easy to show that the languages specified by the regular
expressions ∅, ε, and x for x ∈ Σ can each be recognized by an NFA (also see Figure
8.4):
• L(ε) is recognized by an NFA where the start state is also a final state, and
has no outgoing transitions.
Case AB: Let the languages L(A) and L(B) be recognized by NFAs MA and MB,
respectively. Recall that L(AB) contains strings that can be divided into two parts
such that the first part is in L(A), recognized by MA, and the second part is in L(B),
recognized by MB. Hence intuitively, we need a combined NFA MAB that contains
the NFA MA followed by the NFA MB, sequentially. To do so, let us “link” the final
states of MA to the starting state of MB. One way to proceed is to modify all the
final states in MA, by adding to them all outgoing transitions leaving the start state
of MB (ie, each final state in MA can now function as the start state of MB and
transition to appropriate states in MB.)
The start state of MAB is the start state of MA. The final states of MAB are the final
states of MB; furthermore, if the start state of MB is final, then all of the final states
in MA are also final in MAB (because we want the final states of MA to be “linked” to
the start state of MB).
We leave it to the readers to check that this combined NFA MAB does indeed
accept strings in L(AB), and only strings in L(AB).
Before we proceed to the other cases, let us abstract the notion of a “link” from above. The
goal of a “link” from state s to t is to allow the NFA to (nondeterministically) transition
from state s to state t without reading any input.
We have implemented this “link” above by adding the outgoing transitions of t to s (ie, this
simulates the case when the NFA nondeterministically transitions to state t and then
follows one of t's outgoing transitions). We also make s a final state if t is a final state (ie,
this simulates the case when the NFA nondeterministically transitions to state t and then
halts and accepts).
Case A|B: Again, let the languages L(A) and L(B) be recognized by NFAs MA and MB,
respectively. This time, we construct a machine MA|B that contains MA, MB and a
brand new start state s0, and add “links”
from s0 to the start states of MA and MB. Intuitively, at state s0, the
machine MA|B must nondeterministically decide whether to accept
the input string as a member of L(A) or as a member of L(B). The
start state of MA|B is the new state s0, and the final states of MA|B
are all the final states of MA and MB.
Case A*: Let the language L(A) be recognized by NFA MA. Consider the
NFA MA+ that is simply the NFA MA but with “links” from its final
states back to its start state (note that we have not constructed the
machine MA* yet; in order for a string to be accepted by MA+ as
described above, the string must be accepted by the machine MA at
least once). We can then construct the machine MA* by using the
fact that the regular expressions ε|A+ and A* are equivalent.
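The three inductive cases can be sketched in code. This sketch uses explicit ε-transitions (edges labeled None) instead of the text's “links”; the two formulations are equivalent, and all function names, the fragment representation, and the integer state-numbering are our own:

```python
def lit(c, n):
    """NFA fragment accepting the single character c.
    Fragments are tuples (start, finals, edges, next_free_state)."""
    return (n, {n + 1}, {(n, c, n + 1)}, n + 2)

def concat(A, B):                     # case AB: link A's finals to B's start
    sa, fa, ea, _ = A
    sb, fb, eb, nxt = B
    return (sa, fb, ea | eb | {(f, None, sb) for f in fa}, nxt)

def union(A, B, n):                   # case A|B: a new start linked to both
    sa, fa, ea, _ = A
    sb, fb, eb, _ = B
    return (n, fa | fb, ea | eb | {(n, None, sa), (n, None, sb)}, n + 1)

def star(A, n):                       # case A*: accept eps; link finals back
    sa, fa, ea, _ = A
    return (n, fa | {n}, ea | {(n, None, sa)} | {(f, None, sa) for f in fa}, n + 1)

def eps_closure(states, edges):
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for (a, c, b) in edges:
            if a == s and c is None and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def accepts(nfa, x):
    start, finals, edges, _ = nfa
    cur = eps_closure({start}, edges)
    for ch in x:
        cur = eps_closure({b for (a, c, b) in edges if a in cur and c == ch}, edges)
    return bool(cur & finals)

# (ab)* : the star of the concatenation of two literals
ab_star = star(concat(lit('a', 0), lit('b', 2)), 4)
```

The `star` function mirrors the case above: the back-links build MA+, and the extra accepting start state supplies the ε alternative of ε|A+.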
Appendix A
Problem Sets
Problem 2 [5 points]
Compute (3^8002 · 7^201) mod 55. Show your work.
Problem 3 [2 + 4 = 6 points]
Let p ≥ 3 be any prime number. Let c, a ∈ {1, . . . , p − 1} such that a is a solution to
the equation x² ≡ c (mod p), ie, a² ≡ c (mod p).
(a) Show that p − a is also a solution to the equation x² ≡ c (mod p), ie,
(p − a)² ≡ c (mod p).
(b) Show that a and p − a are the only solutions to the equation x² ≡ c (mod p)
modulo p, ie, if b ∈ ℤ satisfies b² ≡ c (mod p), then b ≡ a (mod p) or b ≡ p − a
(mod p).
Problem 4 [4 points]
How many solutions are there to the equation a + b + c + d = 30, if a, b, c, d ∈ ℕ?
(ℕ includes the number 0. You do not need to simplify your answer.)
Problem 5 [2 + 2 + 4 = 8 points]
Let n be a positive even integer.
(a) How many functions f : {0, 1}^n → {0, 1}^n are there that do not map an element to itself
(ie, f satisfies f(x) ≠ x for all x ∈ {0, 1}^n)?
(c) How many functions f : {0, 1}^n → {0, 1}^n are there that satisfy f(x) ≠ x and f(x) ≠ x^rev
for all x ∈ {0, 1}^n? Justify your answer.
Problem 6 [6 points]
Let n, r, k ∈ ℕ+ such that k ≤ r ≤ n. Show that

C(n, r) · C(r, k) = C(n, k) · C(n − k, r − k)

by using a combinatorial argument (ie, show
that both sides of the equation count the same thing). Here C(n, r) denotes the
binomial coefficient “n choose r”.
Problem 7 [3 + 3 = 6 points]
A certain candy similar to Skittles is manufactured with the following properties: 30% of
the manufactured candy pieces are sweet, while 70% of the pieces are sour. Each candy
piece is colored either red or blue (but not both). If a candy piece is sweet, then it is
colored blue with 80% probability (and colored red with 20% probability), and if a piece is
sour, then it is colored red with 80% probability. The candy pieces are mixed together
randomly before they are sold. You bought a jar containing such candy pieces.
(a) If you choose a piece at random from the jar, what is the probability that you choose
a blue piece? Show your work. (You do not need to simplify your answer.)
(b) Given that the piece you chose is blue, what is the probability that the piece is sour?
Show your work. (You do not need to simplify your answer.)
Problem 8 [3 + 3 = 6 points]
A literal is an atom (ie, an atomic proposition) or the negation of an atom (eg, if
P is an atom, then P is a literal, and so is ¬P). A clause is a formula of the form
l_i ∨ l_j ∨ l_k, where l_i, l_j, l_k are literals and no atom occurs in l_i ∨ l_j ∨ l_k more than once
(eg, P ∨ ¬Q ∨ ¬P is not allowed, since the atom P occurs in P ∨ ¬Q ∨ ¬P more
than once).
Examples: P1 ∨ ¬P2 ∨ ¬P4 is a clause, and so is P2 ∨ ¬P4 ∨ ¬P5.
(b) Let {C1, C2, . . . , Cn} be a collection of n clauses. If we choose a uniformly random
interpretation as in part (a), what is the expected number of clauses in
{C1, C2, . . . , Cn} that evaluate to True under the chosen interpretation?
Justify your answer.
Problem 9 [3 + 3 = 6 points]
Consider the formula (P ∧ ¬Q) → (¬P ∨ Q), where P and Q are atoms.
Problem 10 [6 points]
Let G be an undirected graph, possibly with self-loops. Suppose we have the
predicate symbols Equals(·, ·), IsVertex(·), and Edge(·, ·).
Let D be some domain that contains the set of vertices of G (D might contain
other elements as well). Let I be some interpretation that specifies functions for
Equals(·, ·), IsVertex(·), and Edge(·, ·), so that for all u, v ∈ D, we have Equals[I]
(u, v) = T (True) if and only if u = v, IsVertex[I](u) = T if and only if u is a vertex
of G, and if u and v are both vertices of G, then Edge[I](u, v) = T if and only if
{u, v} is an edge of G (we do not know the value of Edge[I](u, v) when u or v is
not a vertex of G).
Write a formula in first order logic that captures the statement “the graph G
does not contain a triangle”, ie, the formula is T (True) under (D, I) if and only if
the graph G does not contain a triangle (a triangle is 3 distinct vertices all of
which are connected to one another via an edge).
Problem 11 [6 points]
Suppose we have an undirected graph G such that the degree of each vertex is
a multiple of 10 or 15. Show that the number of edges in G must be a multiple
of 5.
Problem 12 [3 + 2 + 5 = 10 points]
Consider the following non-deterministic finite automaton (NFA):
[Figure: an NFA over the alphabet {a, b} with states u, v, w, x, y, z; the transition diagram did not survive extraction.]
(a) Write a regular expression that defines the language recognized by the
above NFA.
(b) Construct (by drawing a state diagram) the smallest deterministic finite
automaton (DFA) that recognizes the same language as the above NFA
(smallest in terms of the number of states).
(c) Prove that your DFA for part (b) is indeed the smallest DFA that rec-ognizes
the same language as the above NFA (smallest in terms of the number of
states).
Problem 13 [6 points]
Let S1, S2, S3, S4, . . . be an infinite sequence of countable sets. Show that ∪_{n=1}^∞ Sn
is countable. (∪_{n=1}^∞ Sn is defined by ∪_{n=1}^∞ Sn := {x | x ∈ Sn for
some n ∈ ℕ+}.)
Appendix B
Problem 1 [6 points]
= 2^(n+1) − 2, where the second equality follows from the induction hypothesis.
Problem 2 [5 points]
Compute (3^8002 · 7^201) mod 55. Show your work.
Solution:
Note that 55 = 5 · 11, so we have φ(55) = (4)(10) = 40. Thus, we have
(3^8002 · 7^201) mod 55 = ((3^8002 mod 55) · (7^201 mod 55)) mod 55
= ((3^(8002 mod φ(55)) mod 55) · (7^(201 mod φ(55)) mod 55)) mod 55
= ((3^(8002 mod 40) mod 55) · (7^(201 mod 40) mod 55)) mod 55
= ((3^2 mod 55) · (7^1 mod 55)) mod 55 = (9 · 7) mod 55 = 63 mod
55 = 8, where we have used Euler's theorem in the second equality.
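The computation can be checked with Python's built-in three-argument modular exponentiation:

```python
# Direct check of the answer, and of the Euler's-theorem shortcut phi(55) = 40.
assert pow(3, 8002, 55) * pow(7, 201, 55) % 55 == 8
assert pow(3, 8002, 55) == pow(3, 8002 % 40, 55)   # 3^40 = 1 (mod 55)
assert pow(7, 201, 55) == pow(7, 201 % 40, 55)
print('answer confirmed: 8')
```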
Problem 3 [2 + 4 = 6 points]
Let p ≥ 3 be any prime number. Let c, a ∈ {1, . . . , p − 1} such that a is a
solution to the equation x² ≡ c (mod p), ie, a² ≡ c (mod p).
(a) Show that p − a is also a solution to the equation x² ≡ c (mod p), ie,
(p − a)² ≡ c (mod p).
(b) Show that a and p − a are the only solutions to the equation x² ≡ c
(mod p) modulo p, ie, if b ∈ ℤ satisfies b² ≡ c (mod p), then b ≡ a
(mod p) or b ≡ p − a (mod p).
Solution:
(a) Observe that (p − a)² = p² − 2pa + a² ≡ a² ≡ c (mod p).
(b) Assume b ∈ ℤ satisfies b² ≡ c (mod p). Then, b² ≡ a² (mod p), so
b² − a² ≡ 0 (mod p), so (b − a)(b + a) ≡ 0 (mod p), so p | (b − a)(b + a). Since
p is prime, we must have p | (b − a) or p | (b + a). The former case implies that
b ≡ a (mod p), and the latter case implies that b ≡ −a (mod p), so b ≡ p − a
(mod p), as required.
Problem 4 [4 points]
How many solutions are there to the equation a + b + c + d = 30, if a, b, c, d ∈ ℕ?
(ℕ includes the number 0. You do not need to simplify your answer.)
Solution:
The answer is C(33, 3) = C(30 + 4 − 1, 4 − 1), since there are 4 distinguishable urns
(a, b, c, and d) and 30 indistinguishable balls.
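A brute-force count confirms the stars-and-bars answer:

```python
from math import comb

# Count solutions to a + b + c + d = 30 over the nonnegative integers directly:
# once a, b, c are chosen with a + b + c <= 30, d = 30 - a - b - c is forced.
count = sum(1 for a in range(31) for b in range(31) for c in range(31)
            if a + b + c <= 30)
assert count == comb(33, 3)          # C(30 + 4 - 1, 4 - 1)
print(count)                         # 5456
```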
Problem 5 [2 + 2 + 4 = 8 points]
Let n be a positive even integer.
(a) How many functions f : {0, 1}^n → {0, 1}^n are there that do not map an element to itself
(ie, f satisfies f(x) ≠ x for all x ∈ {0, 1}^n)?
(c) How many functions f : {0, 1}^n → {0, 1}^n are there that satisfy f(x) ≠ x and f(x) ≠ x^rev
for all x ∈ {0, 1}^n? Justify your answer.
Solution:
(a) (2^n − 1)^(2^n), since there are 2^n elements in the domain {0, 1}^n, and for each of these
elements, f can map the element to any element in {0, 1}^n except for itself, so there
are 2^n − 1 choices for the image of the element.

(b) 2^(n/2), since to construct a string x ∈ {0, 1}^n such that x = x^rev, there are 2 choices
(either 0 or 1) for each of the first n/2 bits (the first half of
the n-bit string), and then the second half is fully determined by the first half.

(c) Consider constructing a function f : {0, 1}^n → {0, 1}^n such that f(x) ≠ x and f(x) ≠ x^rev
for all x ∈ {0, 1}^n. For each x ∈ {0, 1}^n such that x = x^rev, the two constraints coincide,
so there are 2^n − 1 choices for f(x). For each x ∈ {0, 1}^n such that x ≠ x^rev, there are
2^n − 2 choices for f(x). Since there are 2^(n/2) strings x ∈ {0, 1}^n such that x = x^rev, and
since there are 2^n − 2^(n/2) strings x ∈ {0, 1}^n such that x ≠ x^rev, the number of functions
f : {0, 1}^n → {0, 1}^n such that f(x) ≠ x and f(x) ≠ x^rev is
(2^n − 1)^(2^(n/2)) · (2^n − 2)^(2^n − 2^(n/2)).
Problem 6 [6 points]
Let n, r, k ∈ ℕ+ such that k ≤ r ≤ n. Show that

C(n, r) · C(r, k) = C(n, k) · C(n − k, r − k)

by using a combinatorial argument.
Solution:
Suppose there are n people, and we want to choose r of them to serve on a committee,
and out of the r people on the committee, we want to choose k of them to be responsible
for task A (where task A is some task ). The LHS of the equation counts precisely the
number of possible ways to do the choosing above.
Another way to count this is the following: Out of the n people, choose k of them to
be part of the committee and the ones responsible for task A; however, we want exactly r
people to serve on the committee, so we need to choose r − k more people out of the
remaining n − k people left to choose from.
Thus, there are C(n, k) · C(n − k, r − k) possible ways to do the choosing, which is the RHS.
Problem 7 [3 + 3 = 6 points]
A certain candy similar to Skittles is manufactured with the following properties: 30% of
the manufactured candy pieces are sweet, while 70% of the pieces are sour. Each candy
piece is colored either red or blue (but not both). If a candy piece is sweet, then it is
colored blue with 80% probability (and colored red with 20% probability), and if a piece is
sour, then it is colored red with 80% probability. The candy pieces are mixed together
randomly before they are sold. You bought a jar containing such candy pieces.
(a) If you choose a piece at random from the jar, what is the probability that you choose
a blue piece? Show your work. (You do not need to simplify your answer.)
(b) Given that the piece you chose is blue, what is the probability that the piece is sour?
Show your work. (You do not need to simplify your answer.)
Solution:
Let B be the event that the piece you choose is blue, and let D be the event that the piece
you choose is sweet.
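The two parts can be checked numerically; this is a sketch of the standard total-probability and Bayes'-rule computations, using the percentages stated in the problem:

```python
p_sweet, p_sour = 0.30, 0.70
p_blue_given_sweet = 0.80        # so red given sweet is 0.20
p_blue_given_sour = 0.20         # red given sour is 0.80

# (a) Law of total probability: Pr[B] = Pr[B|sweet]Pr[sweet] + Pr[B|sour]Pr[sour]
p_blue = p_sweet * p_blue_given_sweet + p_sour * p_blue_given_sour

# (b) Bayes' rule: Pr[sour|B] = Pr[B|sour]Pr[sour] / Pr[B]
p_sour_given_blue = p_sour * p_blue_given_sour / p_blue

print(p_blue, p_sour_given_blue)
```

Part (a) comes out to 0.24 + 0.14 = 0.38, and part (b) to 0.14/0.38.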
Problem 8 [3 + 3 = 6 points]
A literal is an atom (ie, an atomic proposition) or the negation of an atom (eg, if P is an
atom, then P is a literal, and so is ¬P). A clause is a formula of the form l_i ∨ l_j ∨ l_k, where
l_i, l_j, l_k are literals and no atom occurs in l_i ∨ l_j ∨ l_k more than once (eg, P ∨ ¬Q ∨ ¬P is not
allowed, since the atom P occurs in P ∨ ¬Q ∨ ¬P more than once).
(a) Let C be a clause. If we choose a uniformly random interpretation (assigning each atom
True or False with probability 1/2, independently), what is the probability that C evaluates to True under the chosen
interpretation? Justify your answer.
(b) Let {C1, C2, . . . , Cn} be a collection of n clauses. If we choose a uniformly random interpretation as in part (a), what is
the expected number of clauses in {C1, C2, . . . , Cn} that evaluate to True under the chosen interpretation?
Solution:
(a) The probability that C evaluates to True is equal to 1 minus the probability that C
evaluates to False. Now, C evaluates to False if and only if each of the three literals in
C evaluates to False. A literal in C evaluates to False with probability 1/2, and since no
atom occurs in C more than once, the probability that all three literals in C
evaluate to False is (1/2)³ (by independence). Thus, the probability that C evaluates
to True is 1 − (1/2)³ = 7/8.
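Part (a) can be verified by brute force over all interpretations of a sample clause; the clause P ∨ ¬Q ∨ R below is an arbitrary choice with three distinct atoms. Part (b) then follows by linearity of expectation, giving 7n/8.

```python
from itertools import product
from fractions import Fraction

# All 8 interpretations of the atoms in the sample clause P or (not Q) or R.
true_count = sum(1 for p, q, r in product([False, True], repeat=3)
                 if p or (not q) or r)
prob = Fraction(true_count, 8)
assert prob == Fraction(7, 8)
print(prob)   # 7/8
```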
Problem 9 [3 + 3 = 6 points]
Consider the formula (P ∧ ¬Q) → (¬P ∨ Q), where P and Q are atoms.
Solution:
(a) No, since the interpretation that assigns True to P and False to Q would
make the formula evaluate to False: (P ∧ ¬Q) evaluates to True while
(¬P ∨ Q) evaluates to False.
(b) Yes, since any interpretation that assigns False to P would make (P ∧ ¬Q)
evaluate to False, and so (P ∧ ¬Q) → (¬P ∨ Q) would evaluate to True.
Problem 10 [6 points]
Let G be an undirected graph, possibly with self-loops. Suppose we have the
predicate symbols Equals(·, ·), IsVertex(·), and Edge(·, ·).
Let D be some domain that contains the set of vertices of G (D might contain
other elements as well). Let I be some interpretation that specifies functions for
Equals(·, ·), IsVertex(·), and Edge(·, ·), so that for all u, v ∈ D, we have Equals[I]
(u, v) = T (True) if and only if u = v, IsVertex[I](u) = T if and only if u is a vertex
of G, and if u and v are both vertices of G, then Edge[I](u, v) = T if and only if {u,
v} is an edge of G (we do not know the value of Edge[I](u, v) when u or v is not a
vertex of G).
Write a formula in first order logic that captures the statement “the graph G
does not contain a triangle”, ie, the formula is T (True) under (D, I) if and only if
the graph G does not contain a triangle (a triangle is 3 distinct vertices all of
which are connected to one another via an edge).
Solution:
Problem 11 [6 points]
Suppose we have an undirected graph G such that the degree of each vertex is
a multiple of 10 or 15. Show that the number of edges in G must be a multiple of
5.
Solution:
The sum of the degrees of the vertices of G equals 2|E|. Each degree is a multiple of
10 or of 15, hence a multiple of 5, so 5 divides the sum of the degrees, ie, 5 | 2|E|.
Since 5 is prime, 5 must divide 2 or |E|. 5 clearly does not divide 2, so 5
divides |E|, as required.
Problem 12 [3 + 2 + 5 = 10 points]
Consider the following non-deterministic finite automaton (NFA):
[Figure: an NFA over the alphabet {a, b} with states u, v, w, x, y, z; the transition diagram did not survive extraction.]
(a) Write a regular expression that defines the language recognized by the
above NFA.
(b) Construct (by drawing a state diagram) the smallest deterministic finite automaton
(DFA) that recognizes the same language as the above NFA (smallest in
terms of the number of states).
(c) Prove that your DFA for part (b) is indeed the smallest DFA that rec-ognizes the
same language as the above NFA (smallest in terms of the number of states).
Solution:
(a) (ab)*
(b)
[Figure: a 3-state DFA with accepting start state u, transitions u →a v and v →b u, and a dead state w entered via u →b w and v →a w, with self-loops at w on both a and b.]
(c) We will show that any DFA that recognizes (ab)* must have at least 3
states. 1 state is clearly not enough, since a DFA with only 1 state recognizes
either the empty language or the language {a, b}*. Now, consider any DFA
with 2 states. We note that the start state must be an accepting state, since
the empty string needs to be accepted by the DFA. Since the string a is not
accepted by the DFA, the a-transition (the arrow labeled a) out of the start
state cannot be a self-loop, so the a-transition must lead to the other state.
Similarly, since the string b is not accepted by the DFA, the b-transition out
of the start state must lead to the other state. However, this means that bb is
accepted by the DFA, since ab is accepted by the DFA and the first letter of
the string does not affect whether the string is accepted, since both a and b
result in transitioning to the non-start state. This means that the DFA does
not recognize the language defined by (ab)*. Thus, any DFA that recognizes
(ab)* must have at least 3 states.
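The 2-state impossibility argument can also be checked exhaustively: enumerate every 2-state DFA over {a, b} (all 16 transition functions and all 4 sets of accepting states) and confirm that each one disagrees with (ab)* on some string of length at most 4. The encoding below is a sketch of ours, not the text's.

```python
from itertools import product

def dfa_accepts(delta, finals, x):
    s = 0                                  # state 0 is always the start state
    for ch in x:
        s = delta[(s, ch)]
    return s in finals

def in_ab_star(x):
    return len(x) % 2 == 0 and x == 'ab' * (len(x) // 2)

words = [''.join(w) for n in range(5) for w in product('ab', repeat=n)]

for t in product([0, 1], repeat=4):        # all 2-state transition functions
    delta = {(0, 'a'): t[0], (0, 'b'): t[1], (1, 'a'): t[2], (1, 'b'): t[3]}
    for finals in [set(), {0}, {1}, {0, 1}]:
        # every candidate fails on at least one short word
        assert any(dfa_accepts(delta, finals, w) != in_ab_star(w) for w in words)
print('no 2-state DFA recognizes (ab)* even on strings of length <= 4')
```

This mirrors the proof of part (c): the witnesses found are always among the strings the argument uses (the empty string, a, b, ab, bb).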