AN INTENSIVE INTRODUCTION TO CRYPTOGRAPHY
If you can just get your mind together
Then come on across to me
We’ll hold hands, and then we’ll watch the sunrise
From the bottom of the sea
Contents
Mathematical Background
1 Introduction
2 Computational Secrecy
3 Pseudorandomness
Bibliography
Contents (detailed)
Mathematical Background
0.3 A quick overview of mathematical prerequisites
0.4 Mathematical Proofs
0.4.1 Example: The existence of infinitely many primes.
0.5 Probability and Sample spaces
0.5.1 Random variables
0.5.2 Distributions over strings
0.5.3 More general sample spaces.
0.6 Correlations and independence
0.6.1 Independent random variables
0.6.2 Collections of independent random variables.
0.7 Concentration
0.7.1 Chebyshev's Inequality
0.7.2 The Chernoff bound
0.8 Exercises
1 Introduction
1.1 Defining encryptions
1.1.1 Generating randomness in actual cryptographic systems
1.2 Defining the secrecy requirement.
1.3 Perfect Secrecy
1.4 Necessity of long keys
2 Computational Secrecy
2.0.1 Proof by reduction
2.1 The asymptotic approach
2.1.1 Counting number of operations.
2.2 Our first conjecture
2.3 Why care about the cipher conjecture?
2.4 Prelude: Computational Indistinguishability
2.5 The Length Extension Theorem
2.5.1 Appendix: The computational model
3 Pseudorandomness
3.1 Stream ciphers
3.2 What do pseudorandom generators actually look like?
3.2.1 Attempt 0: The counter generator
3.2.2 Attempt 1: The linear checksum / linear feedback shift register (LFSR)
3.2.3 From insecurity to security
3.2.4 Attempt 2: Linear Congruential Generators with dropped bits
3.3 Successful examples
3.3.1 Case Study 1: Subset Sum Generator
3.3.2 Case Study 2: RC4
3.4 Non-constructive existence of pseudorandom generators
13.7 Client to client key exchange for secure text messaging - ZRTP, OTR, TextSecure
13.8 Heartbleed and logjam attacks
Bibliography
Foreword and Syllabus
0.1 Syllabus
In this fast-paced course, I plan to start from the very basic notions of cryptography and by the end of the term reach some of the exciting advances that happened in the last few years, such as the construction of fully homomorphic encryption, a notion that Brian Hayes called "one of the most amazing magic tricks in all of computer science", and indistinguishability obfuscators, which are even more amazing. To achieve this, our focus will be on ideas rather than implementations, and so we will present cryptographic notions in their pedagogically simplest form: the one that best illustrates the underlying concepts, rather than the one that is most efficient, widely deployed, or conforms to Internet standards. We will discuss some examples of practical systems and attacks, but only when these serve to illustrate a conceptual point.
• Part I: Introduction
security by reductions.
0.1.1 Prerequisites
The main prerequisite is the ability to read, write (and even enjoy!) mathematical proofs. In addition, familiarity with algorithms, basic probability theory, and basic linear algebra will be helpful. We'll only use fairly basic concepts from all these areas: e.g., O-notation (such as O(n) running time) from algorithms; notions such as events, random variables, and expectation from probability theory; and notions such as matrices, vectors, and eigenvectors from linear algebra. Mathematically mature students should be able to pick up the needed notions on their own. See the "mathematical background" handout for more details.
The main notions we will use in this course are the following:
• Proofs: First and foremost, this course will involve a heavy dose
of formal mathematical reasoning, which includes mathematical
definitions, statements, and proofs.
(In this class, the particular humans you are trying to convince are
me and the teaching fellows.)
Like any good piece of writing, a proof should be concise and not
be overly formal or cumbersome. In fact, overuse of formalism can
often be detrimental to the argument since it can mask weaknesses
in the argument from both the writer and the reader. Sometimes
students try to “throw the kitchen sink” at an answer trying to list
all possibly relevant facts in the hope of getting partial credit. But
a proof is a piece of writing, and a badly written proof will not get
credit even if it contains some correct elements. It is better to write
a clear proof of a partial statement. In particular, if you haven’t been
able to convince yourself that the statement is true, you should be
honest about it and explain which parts of the statement you have
been able to verify and which parts you haven’t.
In the spirit of "do what I say and not what I do", I will now demonstrate the importance of conciseness by belaboring the point and spending several paragraphs on a simple proof, written by Euclid around 300 BC. Recall that a prime number is an integer p > 1 whose only divisors are p and 1. Euclid's Theorem is the following:
From these two lemmas it follows that there exist infinitely many primes: otherwise, if we let p_1, . . . , p_k be the set of all primes, then combining Lemma 0.2 and Lemma 0.3 would give us a number n with a prime factor outside this set, a contradiction. We now prove the lemmas:
Proof of Lemma 0.2. Let n > 1 be a number, and let p be the smallest divisor of n that is larger than 1 (there exists such a number p since n divides itself). We claim that p is a prime. Indeed, suppose otherwise that there was some 1 < q < p that divides p. Then since n = pc for some integer c and p = qc′ for some integer c′, we get that n = qcc′ and hence q divides n, in contradiction to the choice of p as the smallest divisor of n. □
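To make the two lemmas concrete, here is a small Python sanity check (our illustration, not part of Euclid's proof): it computes the smallest nontrivial divisor of the product of a finite list of primes plus one, and confirms that this divisor is a prime outside the list.

    def smallest_divisor(n):
        # Return the smallest divisor of n that is larger than 1;
        # by Lemma 0.2 this number is always prime.
        d = 2
        while d * d <= n:
            if n % d == 0:
                return d
            d += 1
        return n  # no divisor up to sqrt(n), so n itself is prime

    primes = [2, 3, 5, 7, 11, 13]
    n = 1
    for p in primes:
        n *= p
    n += 1  # n = 30031; by Lemma 0.3, no prime in the list divides n

    q = smallest_divisor(n)
    assert q not in primes  # q = 59: a prime outside the supposedly complete list
    print(n, q)             # 30031 59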
These are all important questions that have been studied and de-
bated by scientists, mathematicians, statisticians and philosophers.
Fortunately, we will not need to deal directly with these questions
here. We will be mostly interested in the setting of tossing n random,
unbiased and independent coins. Below we define the basic proba-
bilistic objects of events and random variables when restricted to this
setting. These can be defined for much more general probabilistic
experiments or sample spaces, and later on we will briefly discuss how
this can be done. However, the n-coin case is sufficient for almost
everything we’ll need in this course.
We can also use the intersection (∩) and union (∪) operators to talk about the probability of both event A and event B happening, or the probability of event A or event B happening. For example, the probability p that x has an even number of ones and x_0 = 1 is the same as P[A ∩ B] where A = { x ∈ {0,1}^n : ∑_{i=0}^{n−1} x_i = 0 (mod 2) } and B = { x ∈ {0,1}^n : x_0 = 1 }. This probability is equal to 1/4. (It is a great exercise for you to pause here and verify that you understand why this is the case.)
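Since the sample space is finite, claims like this can also be checked by brute force. The following sketch (ours, not from the text) enumerates {0,1}^4 and confirms that P[A ∩ B] = 1/4:

    from itertools import product

    n = 4
    space = list(product([0, 1], repeat=n))   # the sample space {0,1}^n
    in_A = lambda x: sum(x) % 2 == 0          # even number of ones
    in_B = lambda x: x[0] == 1                # first coordinate is 1
    count = sum(1 for x in space if in_A(x) and in_B(x))
    print(count / len(space))                 # 0.25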
This makes sense: since the complement event A̅ happens if and only if A does not happen, the probability of A̅ should be one minus the probability of A.
E[X] = ∑_{x∈{0,1}^n} 2^{−n} X(x) .   (4)

E[X + Y] = E[X] + E[Y]   (5)

Proof.

E[X + Y] = ∑_{x∈{0,1}^n} 2^{−n} (X(x) + Y(x)) = ∑_{x∈{0,1}^n} 2^{−n} X(x) + ∑_{x∈{0,1}^n} 2^{−n} Y(x) = E[X] + E[Y] .   (6)
Figure 2: The union bound tells us that the probability of A or B happening is at most
the sum of the individual probabilities. We can see it by noting that for every two sets
| A ∪ B| ≤ | A| + | B| (with equality only if A and B have no intersection).
P[x_0 = 1] = 1/2

P[x_0 + x_1 + x_2 ≥ 2] = P[{011, 101, 110, 111}] = 4/8 = 1/2   (7)

but

P[x_0 = 1 ∧ x_0 + x_1 + x_2 ≥ 2] = P[{101, 110, 111}] = 3/8 > (1/2) · (1/2)   (8)

P[∧_{i∈I} A_i] = ∏_{i∈I} P[A_i] .   (9)
Figure 4: Consider the sample space {0,1}^n and the events A, B, C, D, E corresponding to A: x_0 = 1, B: x_1 = 1, C: x_0 + x_1 + x_2 ≥ 2, D: x_0 + x_1 + x_2 = 0 (mod 2) and E: x_0 + x_1 = 0 (mod 2). We can see that A and B are independent, C is positively correlated with A and positively correlated with B, the three events A, B, D are mutually independent, and while every pair out of A, B, E is independent, the three events A, B, E are not mutually independent since their intersection has probability 2/8 = 1/4 instead of (1/2) · (1/2) · (1/2) = 1/8.
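The pairwise-but-not-mutual independence of A, B, E in Figure 4 can likewise be verified by enumeration; here is a short sketch (our addition):

    from itertools import product

    space = list(product([0, 1], repeat=3))
    def pr(event):
        return sum(1 for x in space if event(x)) / len(space)

    A = lambda x: x[0] == 1
    B = lambda x: x[1] == 1
    E = lambda x: (x[0] + x[1]) % 2 == 0

    for S, T in [(A, B), (A, E), (B, E)]:        # every pair is independent
        assert pr(lambda x: S(x) and T(x)) == pr(S) * pr(T)

    print(pr(lambda x: A(x) and B(x) and E(x)))  # 0.25  (= 2/8)
    print(pr(A) * pr(B) * pr(E))                 # 0.125 (= 1/8)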
We say that two random variables X : {0,1}^n → R and Y : {0,1}^n → R are independent if for every u, v ∈ R, the events {X = u} and {Y = v} are independent.⁵ In other words, X and Y are independent if P[X = u ∧ Y = v] = P[X = u] P[Y = v] for every u, v ∈ R. For example, if two random variables depend on the result of tossing different coins then they are independent:

⁵ We use {X = u} as shorthand for { x | X(x) = u }.
Lemma 0.7 Suppose that S = {s_0, . . . , s_{k−1}} and T = {t_0, . . . , t_{m−1}} are disjoint subsets of {0, . . . , n − 1} and let X, Y : {0,1}^n → R be random variables such that X = F(x_{s_0}, . . . , x_{s_{k−1}}) and Y = G(x_{t_0}, . . . , x_{t_{m−1}}) for some functions F : {0,1}^k → R and G : {0,1}^m → R. Then X and Y are independent.
|C| / 2^n = (|A| · |B| · 2^{n−k−m}) / (2^k · 2^m · 2^{n−k−m}) = P[X = a] P[Y = b] .   (10)
E[XY] = ∑_{a∈S_X, b∈S_Y} P[X = a ∧ Y = b] · ab =^{(1)} ∑_{a∈S_X, b∈S_Y} P[X = a] P[Y = b] · ab =^{(2)} ( ∑_{a∈S_X} P[X = a] · a ) ( ∑_{b∈S_Y} P[Y = b] · b ) =^{(3)} E[X] E[Y]   (11)
P[F(X) = a ∧ G(Y) = b] = ∑_{x s.t. F(x)=a, y s.t. G(y)=b} P[X = x ∧ Y = y] = ∑_{x s.t. F(x)=a, y s.t. G(y)=b} P[X = x] P[Y = y] = ( ∑_{x s.t. F(x)=a} P[X = x] ) · ( ∑_{y s.t. G(y)=b} P[Y = y] ) = P[F(X) = a] P[G(Y) = b] .   (12)
P[X_0 = a_0 ∧ · · · ∧ X_{n−1} = a_{n−1}] = P[X_0 = a_0] · · · P[X_{n−1} = a_{n−1}] .   (13)
And similarly, we have that
Lemma 0.8 — Expectation of product of independent random variables. If X_0, . . . , X_{n−1} are mutually independent then

E[ ∏_{i=0}^{n−1} X_i ] = ∏_{i=0}^{n−1} E[X_i] .   (14)
0.7 Concentration
For every random variable X_i in [0,1], Var[X_i] ≤ 1 (if the variable is always in [0,1], it can't be more than 1 away from its expectation), and hence Eq. (16) implies that Var[X] ≤ n and hence σ[X] ≤ √n. For large n, √n ≪ 0.001n, and in particular if √n ≤ 0.001n/k, we can use Chebyshev's inequality to bound the probability that X is not in [0.499n, 0.501n] by 1/k².
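As an empirical illustration (our sketch, not from the text), one can sample X = ∑ X_i for fair coins and observe that for large n the sum almost always lands in [0.499n, 0.501n]:

    import random

    def coin_sum(n):
        # Sum of n fair coins, computed as the popcount of n random bits.
        return bin(random.getrandbits(n)).count("1")

    n, runs = 10**6, 200
    hits = sum(1 for _ in range(runs)
               if 0.499 * n <= coin_sum(n) <= 0.501 * n)
    print(hits / runs)  # around 0.95: here 0.001n is about 2 standard deviations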
Figure 7: In the normal distribution or the Bell curve, the probability of deviating k standard deviations from the expectation shrinks exponentially in k², and specifically with probability at least 1 − 2e^{−k²/2}, a random variable X of expectation µ and standard deviation σ satisfies µ − kσ ≤ X ≤ µ + kσ. This figure gives more precise bounds for k = 1, 2, 3, 4, 5, 6. (Image credit: Imran Baghirov)
0.8 Exercises
The following exercises will be part of the first problem set in the
course, so you can get a head start by working on them now.
(e) Give an example of a random variable X such that its standard deviation is not equal to E[|X − E[X]|].
(c) Prove that if m > 1000 · n2 then the probability that H is one to
one is at least 0.9.
(e) Prove that if m < n2 /1000 then the probability that H is one to
one is at most 0.1.
√(2πn) (n/e)^n ≤ n! ≤ 2√(2πn) (n/e)^n   (18)
Do the following:
(a) Prove that for every n, P_{x←_R {0,1}^n} [ ∑ x_i ≥ 0.6n ] < 2^{−n/1000}.
The above shows that if you were given a coin of bias at least 0.6,
you should only need some constant number of samples to be able
to reject the “null hypothesis” that the coin is completely unbiased
with extremely high confidence. In the following somewhat more
challenging questions (which can be considered as bonus exercise)
we try to show a converse to this:
(a) Let P be the uniform distribution over {0,1}^n and Q be the (1/2 + ϵ)-biased distribution corresponding to tossing n coins in which each one has a probability of 1/2 + ϵ of equalling 1 and probability 1/2 − ϵ of equalling 0. Namely the probability of x ∈ {0,1}^n according to Q is equal to ∏_{i=1}^{n} (1/2 − ϵ + 2ϵx_i).
ii. Prove that for every function F mapping {0,1}^n to {0,1}, if n < 1/(100ϵ)² then the probabilities that F(x) = 1 under P and Q respectively differ by at most 0.1. Therefore, if the number of samples is smaller than a constant times 1/ϵ² then there is simply no test that can reliably distinguish between these two possibilities.
1 Introduction
In 1587, Mary, Queen of Scots and heir to the throne of England, wanted to arrange the assassination of her cousin, Queen Elizabeth I of England, so that she could ascend to the throne and finally escape the house arrest under which she had been for the last 18 years. As part of this complicated plot, she sent a coded letter to Sir Anthony Babington. It is what's known as a substitution cipher
Figure 1.1: Snippet from encrypted communication between queen Mary and Sir
Babington
can be done via frequency analysis (can you see why?). Confederate generals used Vigenère regularly during the Civil War, and their messages were routinely cryptanalyzed by Union officers.
Figure 1.3: Confederate Cipher Disk for implementing the Vigenère cipher
The story of the Enigma cipher has been told many times (see for example Kahn's book as well as Andrew Hodges' biography of Alan Turing). This was a mechanical cipher (looking like a typewriter) where each letter typed would get mapped into a different letter depending on the (rather complicated) key and current state of the machine, which had several rotors that rotated at different paces.
Figure 1.4: Confederate encryption of the message “Gen’l Pemberton: You can expect
no help from this side of the river. Let Gen’l Johnston know, if possible, when you
can attack the same point on the enemy’s lines. Inform me also and I will endeavor to
make a diversion. I have sent some caps. I subjoin a despatch from General Johnston.”
m = D_k(E_k(m)) .   (1.1)
Definition 1.1 says nothing about security and does not rule out trivial "encryption" schemes such as the scheme E_k(m) = m that simply outputs the plaintext as is. Defining security is tricky, and we'll take it one step at a time, but let's start by pondering what is secret and what is not. A priori we are thinking of an attacker Eve that simply sees the ciphertext y = E_k(x) and does not know anything about how it was generated. So, she does not know the details of E and D, and certainly does not know the secret key k. However, many of the troubles past cryptosystems went through were caused by their reliance on "security through obscurity": trusting that the fact that their methods are not known to their enemy will protect them from being broken. This is a faulty assumption: if you reuse a method again and again (even with a different key each time) then eventually your adversaries will figure out what you are doing. And if Alice and Bob meet frequently in a secure location to decide on a new method, they might as well take the opportunity to exchange their secrets. These considerations led Kerckhoffs to state the following principle:
(The actual quote is “Il faut qu’il n’exige pas le secret, et qu’il
puisse sans inconvénient tomber entre les mains de l’ennemi” loosely
translated as “The system must not require secrecy and can be stolen
by the enemy without causing trouble”. According to Steve Bellovin
the NSA version is “assume that the first copy of any device we make
is shipped to the Kremlin”.)
• For every fixed string x ∈ {0, 1}n , if you toss a coin n times, the
probability that the heads/tails pattern will be exactly x is 2−n .
Definition 1.2 — Security of encryption: first attempt. An encryption scheme (E, D) is n-secure if no matter what method Eve employs, the probability that she can recover the true key k from the ciphertext c is at most 2^{−n}.
You might wonder if Definition 1.2 is not too strong. After all, how are we ever going to prove that Eve cannot recover the secret key no matter what she does? Edgar Allan Poe would say that there can always be a method that we overlooked. However, in fact this definition is too weak! Consider the following encryption: the secret key k is chosen at random in {0,1}^n but our encryption scheme simply ignores it and outputs the plaintext in the clear.
The math behind the above argument is very simple, yet I urge
you to read and re-read the last two paragraphs until you are sure
that you completely understand why this encryption is in fact secure
according to the above definition. This is a “toy example” of the kind
of reasoning that we will be employing constantly throughout this
course, and you want to make sure that you follow it.
So, Lemma 1.3 is true, but one might question its meaning. Clearly this silly example was not what we meant when stating this definition. However, as mentioned above, we are not willing to ignore even silly examples and must amend the definition to rule them out. One obvious objection is that we don't care about hiding the key; it is the message that we are trying to keep secret. This suggests the next attempt:
Now this seems like it captures our intended meaning. But re-
member that we are being anal, and truly insist that the definition
holds as stated, namely that for every plaintext message x and every
function Eve : {0, 1} L → {0, 1}ℓ , the probability over the choice of k
that Eve( Ek ( x )) = x is at most 2−n . But now we see that this is clearly
impossible. After all, this is supposed to work for every message x
and every function Eve, but clearly if x is the all-zeroes message 0ℓ
and Eve is the function that ignores its input and simply outputs 0ℓ ,
then it will hold that Eve( Ek ( x )) = x with probability one.
So, if before the definition was too weak, the new definition is too
strong and is impossible to achieve. The problem is that of course
we could guess a fixed message with probability one, so perhaps we
could try to consider a definition with a random message. That is:
1/2, then Eve won’t be able to guess which one it is with probability
better than half. In fact, that turns out to be the heart of the matter:
Let’s fix the message x0 to be the all zeroes message and pick x1 at
random in M. Under our assumption, it holds that for random key k
and message x1 ∈ M,
(Can you see why? It is worthwhile to stop here and read this again.) But this can be turned into an attacker Eve′ such that for b ←_R {0,1}, the probability that Eve′(E_k(x_b)) = x_b is larger than 1/2. Indeed, we can define Eve′(y) to output x_1 if Eve(y) = x_1 and otherwise output a random message in {x_0, x_1}. The probability that Eve′(y) equals x_1 is higher when y = E_k(x_1) than when y = E_k(x_0), and since Eve′ outputs either x_0 or x_1, this means that the probability that Eve′(E_k(x_b)) = x_b is larger than 1/2. (Can you see why?) □
Figure 1.6: A perfectly secret encryption scheme for two-bit keys and messages. The blue vertices represent plaintexts and the red vertices represent ciphertexts, each edge mapping a plaintext x to a ciphertext y = E_k(x) is labeled with the corresponding key k. Since there are four possible keys, the degree of the graph is four and it is in fact a complete bipartite graph. The encryption scheme is valid in the sense that for every k ∈ {0,1}², the map x ↦ E_k(x) is one-to-one, which in other words means that the set of edges labeled with k is a matching.
Proof Idea: Our scheme is the one-time pad, also known as the "Vernam Cipher"; see Fig. 1.8. The encryption is exceedingly simple: to encrypt a message x ∈ {0,1}^n with a key k ∈ {0,1}^n we simply output x ⊕ k where ⊕ is the bitwise XOR operation that outputs the string corresponding to XORing each coordinate of x and k.
Proof of Theorem 1.8. For two binary strings a and b of the same length n, we define a ⊕ b to be the string c ∈ {0,1}^n such that c_i = a_i + b_i (mod 2).
Figure 1.7: For any key length n, we can visualize an encryption scheme (E, D) as a graph with a vertex for every one of the 2^{L(n)} possible plaintexts and for every one of the ciphertexts in {0,1}^* of the form E_k(x) for k ∈ {0,1}^n and x ∈ {0,1}^{L(n)}. For every plaintext x and key k, we add an edge labeled k between x and E_k(x). By the validity condition, if we pick any fixed key k, the map x ↦ E_k(x) must be one-to-one. The condition of perfect secrecy simply corresponds to requiring that every two plaintexts x and x′ have exactly the same set of neighbors (or multi-set, if there are parallel edges).
Figure 1.8: In the one time pad encryption scheme we encrypt a plaintext x ∈ {0, 1}n
with a key k ∈ {0, 1}n by the ciphertext x ⊕ k where ⊕ denotes the bitwise XOR
operation.
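For concreteness, here is a minimal sketch of the one-time pad in Python, treating n-bit strings as Python integers (the names are ours, for illustration):

    import secrets

    def keygen(n):
        return secrets.randbits(n)   # k chosen uniformly in {0,1}^n

    def E(k, x):
        return x ^ k                 # ciphertext is x XOR k

    def D(k, y):
        return y ^ k                 # XORing with k again recovers x

    n = 128
    k = keygen(n)
    x = secrets.randbits(n)          # some plaintext
    assert D(k, E(k, x)) == x        # validity: x = D_k(E_k(x))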
So, does Theorem 1.8 give the final word on cryptography, and mean that we can all communicate with perfect secrecy and live happily ever after? No, it doesn't. While the one-time pad is efficient, and gives perfect secrecy, it has one glaring disadvantage: to
cient, and gives perfect secrecy, it has one glaring disadvantage: to
communicate n bits you need to store a key of length n. In contrast,
practically used cryptosystems such as AES-128 have a short key of
128 bits (i.e., 16 bytes) that can be used to protect terabytes or more of
communication! Imagine that we all needed to use the one time pad.
If that was the case, then if you had to communicate with m people,
you would have to maintain (securely!) m huge files that are each as
long as the length of the maximum total communication you expect
with that person. Imagine that every time you opened an account
with Amazon, Google, or any other service, they would need to send
you in the mail (ideally with a secure courier) a DVD full of random
numbers, and every time you suspected a virus, you’d need to ask all
these services for a fresh DVD. This doesn’t sound so appealing.
This is not just a theoretical issue. The Soviets had used the one-time pad for their confidential communication since before the 1940s.
In fact, even before Shannon’s work, the U.S. intelligence already
knew in 1941 that the one-time pad is in principle “unbreakable” (see
page 32 in the Venona document). However, it turned out that the
hassle of manufacturing so many keys for all the communication
took its toll on the Soviets and they ended up reusing the same keys
for more than one message. They did try to use them for completely
different receivers in the (false) hope that this wouldn’t be detected.
The Venona Project of the U.S. Army was founded in February 1943 by Gene Grabeel (see Fig. 1.9), a former home economics teacher from Madison Heights, Virginia, and Lt. Leonard Zubko. In October 1943, they had their breakthrough when it was discovered that the Russians were reusing their keys.⁵ In the 37 years of its existence, the project resulted in a treasure chest of intelligence, exposing hundreds of KGB agents and Russian spies in the U.S. and other countries, including Julius Rosenberg, Harry Gold, Klaus Fuchs, Alger Hiss, Harry Dexter White, and many others.

⁵ Credit for this discovery is shared by Lt. Richard Hallock, Carrie Berry, Frank Lewis, and Lt. Karl Elmquist, and there are others that have made important contributions to this project. See pages 27 and 28 in the document.
Figure 1.9: Gene Grabeel, who founded the U.S. Russian SigInt program on 1 Feb 1943.
Photo taken in 1942, see Page 7 in the Venona historical study.
Figure 1.10: An encryption scheme where the number of keys is smaller than the
number of plaintexts corresponds to a bipartite graph where the degree is smaller
than the number of vertices on the left side. Together with the validity condition this
implies that there will be two left vertices x, x ′ with non-identical neighborhoods, and
hence the scheme does not satisfy perfect secrecy.
Proof Idea: The idea behind the proof is illustrated in Fig. 1.10. If the number of keys is smaller than the number of messages then the neighborhoods of all vertices in the corresponding graph cannot be identical.
Claim I: There exists some x1 ∈ {0, 1} L and k ∈ {0, 1}n such that
Ek ( x1 ) ̸∈ S0 .
So, why can’t we use the above Python program to break all
encryptions in the Internet and win infamy and fortune? We can
in fact, but we’ll have to wait a really long time, since the loop in
Distinguish will run 2128 times, which will take much more than
the lifetime of the universe to complete, even if we used all the
computers on the planet.
This in fact does seem to be the case, but as we’ve seen, defining
security is a subtle task, and will take some care. As before, the way
we avoid (at least some of) the pitfalls of so many cryptosystems in
history is that we insist on very precisely defining what it means for a
scheme to be secure.
Definition 2.1 seems very natural, but is in fact impossible to achieve if the key is shorter than the message.¹

¹ It is important to keep track of what is known and unknown to the adversary Eve. The adversary knows the set {m_0, m_1} of potential messages, and the ciphertext y = E_k(m_b). The only things she doesn't know are whether b = 0 or b = 1, and the value of the secret key k. In particular, because m_0 and m_1 are known to Eve, it does not matter whether we define Eve's goal in this "security game" as outputting m_b or as outputting b.

Before reading further, you might want to stop and think if you can prove that there is no, say, √n-secure encryption scheme satisfying Definition 2.1 with ℓ = n + 1 and where the time to compute the encryption is polynomial.
Definition 2.2 — Computational secrecy (concrete). An encryption scheme (E, D) has t bits of computational secrecy² if for every two distinct plaintexts {m_0, m_1} ⊆ {0,1}^ℓ and every strategy of Eve using at most 2^t computational steps, if we choose at random b ∈ {0,1} and a random key k ∈ {0,1}^n, then the probability that Eve guesses m_b after seeing E_k(m_b) is at most 1/2 + 2^{−t}.

² This is a slight simplification of the typical notion of "t bits of security". In the more standard definition we'd say that a scheme has t bits of security if for every t_1 + t_2 ≤ t, an attacker running in 2^{t_1} time can't get success probability advantage more than 2^{−t_2}. However these two definitions only differ from one another by at most a factor of two. This may be important for practical applications (where the difference between 64 and 32 bits of security could be crucial) but won't matter for our concerns.

Having learned our lesson, let's try to see that this strategy does give us the kind of conditions we desired. In particular, let's verify that this definition implies the analogous condition to perfect secrecy.

Theorem 2.3 — Guessing game for computational secrecy. If (E, D) has t bits of computational secrecy as per Definition 2.2 then for every subset M ⊆ {0,1}^ℓ and every strategy of Eve using at most 2^t − (100ℓ + 100) computational steps, if we choose at random m ∈ M and a random key k ∈ {0,1}^n, then the probability that Eve guesses m after seeing E_k(m) is at most 1/|M| + 2^{−t+1}.
This will imply that if Eve ran in polynomial time and had polynomial advantage over 1/|M| in guessing a plaintext chosen from M, then Eve′ would run in polynomial time and have polynomial advantage over 1/2 in guessing a plaintext chosen from {m_0, m_1}.

The first item can be shown by simply doing the same proof more carefully, keeping track of how the advantage over 1/|M| for Eve translates into an advantage over 1/2 for Eve′. As the world's most annoying saying goes, doing this is an excellent exercise for the reader. The second item is obtained by looking at the definition of
Figure 2.1: We show that the security of S′ implies the security of S by transforming an
adversary Eve breaking S into an adversary Eve′ breaking S′
construct the scheme S based on the scheme S′ , but then prove that
we can transform an algorithm breaking S into an algorithm breaking
S′ . Just like in computational complexity, it can sometimes be hard
to keep track of the direction of the reduction. In fact, cryptographic
reductions can be even subtler, since they involve an interplay of
several entities (for example, sender, receiver, and adversary) and
probabilistic choices (e.g., over the message to be sent and the key).
the length of the key and ϵ > 0 is some absolute constant such as
ϵ = 1/3.
These are not all the theoretically possible running times. One can have intermediate functions such as n^{log n} though we will generally not encounter those. To make things clean (and to correspond to
standard terminology), we will say that an algorithm A is efficient
if it runs in time poly(n) when n is its input length (which will
always be the same, up to polynomial factors, as the key length).
If µ(n) is some probability that depends on the input/key length parameter n, then we say that µ(n) is negligible if it's smaller than the inverse of every polynomial. That is, for every c, d there is some N, such that if n > N then µ(n) < 1/(c·n^d). (For example, µ(n) = 2^{−n} is negligible, while µ(n) = n^{−100} is not.) Note that for every non-constant polynomials p, q, µ(n) is negligible if and only if the function µ′(n) = p(µ(q(n))) is negligible.
Definition 2.4 — Computational secrecy (asymptotic). An encryption scheme (E, D) is computationally secret if for every two distinct plaintexts {m_0, m_1} ⊆ {0,1}^ℓ and every efficient (i.e., polynomial time) strategy of Eve, if we choose at random b ∈ {0,1} and a random key k ∈ {0,1}^n, then the probability that Eve guesses m_b after seeing E_k(m_b) is at most 1/2 + µ(n) for some negligible function µ(·).
One more detail that we've so far ignored is what it means exactly for a function to be computable using at most T operations. Fortunately, when we don't really care about the difference between T and, say, T², then essentially every reasonable definition gives the same answer. Formally, we can use the notions of Turing machines, Boolean circuits, or straightline programs to define complexity. For concreteness, let's define that a function F : {0,1}^n → {0,1}^m has complexity at most T if there is a Boolean circuit that computes F using at most T NAND gates (or equivalently, there is a NAND program computing F in at most T lines). (There is nothing special about NAND, and we can use any other universal gate set.) We will often also consider probabilistic functions in which case we allow the circuit a RAND gate that outputs a single random bit (though this in general does not give extra power). The fact that we only care about asymptotics means you don't really need to think of gates, etc. when arguing in cryptography. However, it is comforting to know that this notion has a precise mathematical formulation.
Proof. We just sketch the proof, as this is not the focus of this course. If P = NP then whenever we have a loop that searches through some domain to find some string that satisfies a particular property (like the loop in the Distinguish subroutine above that searches over all keys) then this loop can be sped up exponentially. □
• Concrete candidates: As we will see in the next lecture, there are sev-
eral concrete candidate ciphers using keys shorter than messages
for which despite tons of effort, no one knows how to break them.
Some of them are widely used and hence governments and other
benign or not so benign organizations have every reason to invest
huge resources in trying to break them. Despite that as far as we
know (and we know a little more after Edward Snowden’s reve-
lations) there is no significant break known for the most popular
ciphers. Moreover, there are other ciphers that can be based on
canonical mathematical problems such as factoring large integers
or decoding random linear codes that are immensely interesting in
their own right, independently of their cryptographic applications.
Figure 2.2: Web of reductions between notions equivalent to ciphers with messages larger than keys
We will soon see the first of the many reductions we’ll learn in this
course. Together this “web of reductions” forms the scientific core
of cryptography, connecting many of the core concepts and enabling
us to construct increasingly sophisticated tools based on relatively
simple “axioms” such as the cipher conjecture.
Definition 2.6 — Computational Indistinguishability. Let X and Y be two distributions over {0,1}^o. We say that X and Y are (T, ϵ)-computationally indistinguishable, denoted by X ≈_{T,ϵ} Y, if for every function Eve computable with at most T operations,
Theorem 2.7 — Computational Indistinguishability phrasing of security. Let (E, D) be a valid encryption scheme. Then (E, D) is computationally secret if and only if for every two messages m_0, m_1 ∈ {0,1}^ℓ,
Working out the proof is an excellent way to make sure you understand both the definition of computational secrecy and computational indistinguishability.
Write

P[Eve(X_1) = 1] − P[Eve(X_m) = 1] = ∑_{i=1}^{m−1} ( P[Eve(X_i) = 1] − P[Eve(X_{i+1}) = 1] ) .   (2.5)

Thus,

∑_{i=1}^{m−1} | P[Eve(X_i) = 1] − P[Eve(X_{i+1}) = 1] | > (m − 1)ϵ   (2.6)

| E[Eve′(H_i)] − E[Eve′(H_{i+1})] | > ϵ .   (2.8)

In other words

| E_{X_1,...,X_{i−1},Y_i,...,Y_ℓ} [ Eve′(X_1, . . . , X_{i−1}, Y_i, . . . , Y_ℓ) ] − E_{X_1,...,X_i,Y_{i+1},...,Y_ℓ} [ Eve′(X_1, . . . , X_i, Y_{i+1}, . . . , Y_ℓ) ] | > ϵ .   (2.9)

By the averaging principle⁸ this means that there exist some values x_1, . . . , x_{i−1}, y_{i+1}, . . . , y_ℓ such that

| E_{X_i,Y_i} [ Eve′(x_1, . . . , x_{i−1}, Y_i, y_{i+1}, . . . , y_ℓ) − Eve′(x_1, . . . , x_{i−1}, X_i, y_{i+1}, . . . , y_ℓ) ] | > ϵ   (2.11)

⁸ This is the principle that if the average grade in an exam was at least α then someone must have gotten at least α, or in other words that if a real-valued random variable Z satisfies E[Z] ≥ α then P[Z ≥ α] > 0.

Now X_i and Y_i are simply independent draws from the distributions X and Y respectively, and so if we define Eve(z) = Eve′(x_1, . . . , x_{i−1}, z, y_{i+1}, . . . , y_ℓ) then Eve runs in time at most the running time of Eve′ plus 2ℓn and it satisfies

| E_{X_i} [ Eve(X_i) ] − E_{Y_i} [ Eve(Y_i) ] | > ϵ   (2.12)
We can now prove the full length extension theorem. Before doing so, we will need to generalize the notion of an encryption scheme to allow a randomized encryption scheme. That is, we will consider encryption schemes where the encryption algorithm can "toss coins" in its computation. There is a crucial difference between key material and such "ad hoc" randomness. Keys need to be not only chosen at random, but also shared in advance between the sender and receiver, and stored securely throughout their lifetime. The "coin tosses" used by a randomized encryption scheme are generated "on the fly" and are not known to the receiver, nor do they need to be stored long term by the sender. So, allowing such randomized encryption does not make a difference for most applications of encryption schemes. In fact, as we will see later in this course, randomized encryption is necessary for security against more sophisticated attacks such as chosen plaintext and chosen ciphertext attacks, as well as for obtaining secure public key encryptions. We will use the notation E_k(m; r) to denote the output of the encryption algorithm on key k, message m, and randomness r.
Figure 2.3: Constructing a cipher with t-bit long messages from one with (n + 1)-bit long messages
Note that Ê is not a valid encryption scheme since it's not at all clear there is a decryption algorithm for it. It is just a hypothetical tool we use for the proof. Since both E and Ê are randomized encryption schemes (with E using (t − 1)n bits of randomness for the ephemeral keys k_1, . . . , k_{t−1} and Ê using (2t − 1)n bits of randomness for the ephemeral keys k_1, . . . , k_t, k′_2, . . . , k′_t), we can also write Eq. (2.14) as

E_{U_n}(m; U_{(t−1)n}) ≈ Ê_{U_n}(m; U′_{(2t−1)n})   (2.15)
Once we prove the claim then we are done, since we know that for every pair of messages m, m′, E_{U_n}(m) ≈ Ê_{U_n}(m) and E_{U_n}(m′) ≈ Ê_{U_n}(m′), but Ê_{U_n}(m) ≈ Ê_{U_n}(m′) since Ê is essentially the same as the t-times repetition scheme we analyzed above. Thus by the triangle inequality we can conclude that E_{U_n}(m) ≈ E_{U_n}(m′) as we desired.
| E[Eve′(H_j)] − E[Eve′(H_{j+1})] | ≥ ϵ   (∗)   (2.16)

| E_{k′_{j−1}} [ Eve′(α, E_{k′_{j−1}}(k_j, m_j), β) − Eve′(α, E_{k′_{j−1}}(k′_j, m_j), β) ] | ≥ ϵ   (∗∗)   (2.17)
for every n ∈ N
Note that the requirement that ℓ > n is crucial to make this notion
non-trivial, as for ℓ = n the function G ( x ) = x clearly satisfies
that G (Un ) is identical to (and hence in particular indistinguishable
from) the distribution Un . (Make sure that you understand this last
statement!) However, for ℓ > n this is no longer trivial at all, and in
particular if we didn’t restrict the running time of Eve then no such
pseudo-random generator would exist:
Lemma 3.2 Suppose that G : {0, 1}n → {0, 1}n+1 . Then there ex-
ists an (inefficient) algorithm Eve : {0, 1}n+1 → {0, 1} such that
E [ Eve( G (Un ))] = 1 but E [ Eve(Un+1 )] ≤ 1/2.
Proof. The proof of this theorem is very similar to the length exten-
sion theorem for ciphers, and in fact this theorem can be used to give
an alternative proof for the former theorem.
use G′ to map s_{i−1} to the (n + 1)-long bit string (s_i, y_i), output y_i and keep s_i as our new state. To prove the security of this construction we need to show that the distribution G(U_n) = (y_1, . . . , y_t) is computationally indistinguishable from the uniform distribution U_t. As usual, we will use the hybrid argument. For i ∈ {0, . . . , t} we define H_i to be the distribution where the first i bits are chosen uniformly at random, whereas the last t − i bits are computed as above. Namely, we choose s_i at random in {0,1}^n and continue the computation of y_{i+1}, . . . , y_t from the state s_i. Clearly H_0 = G(U_n) and H_t = U_t and hence by the triangle inequality it suffices to prove that H_i ≈ H_{i+1} for all i ∈ {0, . . . , t − 1}. We illustrate these two hybrids in Fig. 3.2.
Figure 3.2: Hybrids H_i and H_{i+1} — dotted boxes refer to values that are chosen independently and uniformly at random
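The construction itself is easy to express in code. In the sketch below (ours, not from the text), the length-extending generator G′ is a SHA-256 stand-in used purely as a placeholder; the construction's security of course rests on G′ actually being a pseudorandom generator, which this stand-in is not proven to be.

    import hashlib

    N = 128  # the state length n, in bits

    def G_prime(s):
        # Placeholder for a PRG mapping n bits to n+1 bits: returns (s_i, y_i).
        h = int.from_bytes(hashlib.sha256(s.to_bytes(N // 8, "big")).digest(), "big")
        return h % (1 << N), (h >> N) & 1   # n-bit new state, one output bit

    def stream(s, t):
        # Output t bits y_1, ..., y_t, keeping only the n-bit state in between.
        out = []
        for _ in range(t):
            s, y = G_prime(s)
            out.append(y)
        return out

    print(stream(2**64 + 3, 16))   # 16 pseudorandom-looking bits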
Now suppose otherwise, that there exists some adversary Eve such
that |E [ Eve( Hi )] − E [ Eve( Hi+1 )]| ≥ ϵ for some non-negligible ϵ. We
will build from Eve an adversary Eve′ breaking the security of the
pseudorandom generator G ′ (see Fig. 3.3).
Figure 3.3: Building an adversary Eve′ for G′ from an adversary Eve distinguishing H_i and H_{i+1}. The boxes marked with question marks are those that are random or pseudorandom depending on whether we are in H_i or H_{i+1}. Everything inside the dashed red lines is simulated by Eve′ that gets as input the (n + 1)-bit string (s_{i+1}, y_{i+1}).
It turns out that the converse direction is also true, and hence
these two conjectures are equivalent, though we will probably not
show the (quite non-trivial) proof of this fact in this course. (We
might show some weaker version of this harder direction.)
Proof. The construction is actually quite simple: recall that the one-time pad is a perfectly secure cipher, but its only problem was that to encrypt an (n + 1)-long message it needed an (n + 1)-bit long key. Now, using a pseudorandom generator, we can map an n-bit long key into an (n + 1)-bit long string that looks random enough that we could use it as a key for the one-time pad. That is, our cipher will look as follows:
E_k(m) = G(k) ⊕ m   (3.3)

and

D_k(c) = G(k) ⊕ c   (3.4)
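In code, the scheme is just the one-time pad with a stretched key. In this sketch (ours), G is again a SHA-256 stand-in, shown only to illustrate the syntax; it is an assumption, not a proven PRG.

    import hashlib

    def G(k, n):
        # Placeholder PRG stretching an n-bit key to n+1 bits.
        h = hashlib.sha256(k.to_bytes((n + 7) // 8, "big")).digest()
        return int.from_bytes(h, "big") % (1 << (n + 1))

    def E(k, m, n):
        return G(k, n) ^ m   # E_k(m) = G(k) XOR m, for m in {0,1}^{n+1}

    def D(k, c, n):
        return G(k, n) ^ c   # D_k(c) = G(k) XOR c

    n = 128
    k, m = 12345, (1 << 128) + 99   # an n-bit key and an (n+1)-bit message
    assert D(k, E(k, m, n), n) == m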
The claim implies the security of the scheme, since it means that
EUn (m) is indistinguishable from the one-time-pad encryption of m,
which is identically distributed to the one-time pad encryption of
m′ which (by another application of the claim) is indistinguishable
from EUn (m′ ) and so the theorem follows from the triangle inequality.
Thus all that’s left is to prove the claim:
| E[Eve′(G(U_n) ⊕ m)] − E[Eve′(U_{n+1} ⊕ m)] | ≥ ϵ   (3.5)
So far we have made the conjectures that objects such as ciphers and pseudorandom generators exist, without giving any hint as to what they would actually look like. (Though we have examples such as the Caesar cipher, Vigenère, and Enigma of what secure ciphers don't look like.) As mentioned above, we do not know how to prove that any particular function is a pseudorandom generator. However, there are quite simple candidates (i.e., functions that are conjectured to be secure pseudorandom generators), though care must be taken in
LFSR can be thought of as the "mother" (or maybe more like the sick great-uncle) of all pseudorandom generators. One of the simplest ways to generate a "randomish" extra digit given an n digit number is to use a checksum: some linear combination of the digits, with a canonical example being the cyclic redundancy check or CRC.⁵ This motivates the notion of a linear feedback shift register generator (LFSR): if the current state is s ∈ {0,1}^n then the output is f(s) where f is a linear function (modulo 2) and the new state is obtained by right shifting the previous state and putting f(s) at the leftmost location. That is, s′_1 = f(s) and s′_i = s_{i−1} for i ∈ {2, . . . , n}.

⁵ CRCs are often used to generate a "control digit" to detect mistypes of credit card or social security card numbers. This has very different goals than its use for pseudorandom generators, though there are some common intuitions behind the two usages.
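Here is a direct transcription of this rule into Python (our sketch; the tap positions defining f are arbitrary illustrative choices, not a recommended configuration):

    TAPS = [0, 2, 3, 5]   # f(s) = s[0] + s[2] + s[3] + s[5] (mod 2)

    def lfsr_step(s):
        # One LFSR step: output f(s), then right-shift and put f(s) leftmost,
        # i.e. s'_1 = f(s) and s'_i = s_{i-1}.
        f = sum(s[i] for i in TAPS) % 2
        return f, [f] + s[:-1]

    s = [1, 0, 0, 1, 0, 1, 1, 0]   # an 8-bit initial state
    bits = []
    for _ in range(16):
        b, s = lfsr_step(s)
        bits.append(b)
    print(bits)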
exponential number of steps until the state repeats itself), though that
also holds for the simple “counter” generator we saw above. They
also have the property that every individual bit is equal to 0 or 1
with probability exactly half (the counter generator also shares this
property).
Easy vs. Hard
Continuous vs. Discrete
Convex vs. Non-convex
Linear vs. Non-linear (degree ≥ 2)
Noiseless vs. Noisy
Local vs. Global
Shallow vs. Deep
Low degree vs. High degree
Let’s now describe some successful (at least per current knowledge)
pseudorandom generators:
Given the known constants and known output, figuring out the set of potential seeds can be thought of as solving a single equation in 40 variables. However, this equation is clearly underdetermined, and will have a solution regardless of whether the observed value is indeed an output of the generator, or it is chosen uniformly at random.
def RC4(P,i,j):
    # One step of RC4: P is a 256-byte array (a permutation of 0..255),
    # and i, j are the two index registers of the generator's state.
    i = (i + 1) % 256
    j = (j + P[i]) % 256
    P[i], P[j] = P[j], P[i]  # swap two entries of the permutation
    # Return the new state together with a single output byte.
    return (P,i,j,P[(P[i]+P[j]) % 256])
The function RC4 takes as input the current state P,i,j of the generator and returns the new state together with a single output byte. The state of the generator consists of an array P of 256 bytes representing a permutation of {0, . . . , 255}, together with the two indices i and j.
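The text above does not show how P, i, j are initialized from a key; for completeness, here is a sketch of the standard RC4 key scheduling algorithm (KSA) together with a usage example. (RC4 is badly broken and is shown for historical interest only.)

    def RC4_init(key):
        # Standard RC4 key scheduling: derive the initial permutation P from key.
        P = list(range(256))
        j = 0
        for i in range(256):
            j = (j + P[i] + key[i % len(key)]) % 256
            P[i], P[j] = P[j], P[i]
        return P, 0, 0

    P, i, j = RC4_init(b"an example key")
    out = []
    for _ in range(8):
        P, i, j, b = RC4(P, i, j)
        out.append(b)
    print(out)   # eight output bytes of the stream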
Proof Idea: The proof uses an extremely useful technique known as the "probabilistic method" which is not too hard mathematically but can be confusing at first.¹¹ The idea is to give a "non-constructive" proof

¹¹ There is a whole (highly recommended) book by Alon and Spencer devoted to this method.
(We’ve replaced here the probability statements in Eq. (3.2) with the
equivalent sums so as to reduce confusion as to what is the sample
space that BP is defined over.)
To understand this proof it is crucial that you pause here and see
how the definition of BP above corresponds to Eq. (3.7). This may
well take re-reading the above text once or twice, but it is a good
exercise at parsing probabilistic statements and learning how to
identify the sample space that these statements correspond to.
2^{O(T log T)} · 2^{−T²} < 0.1 for sufficiently large T. What is important for us about the number 0.1 is that it is smaller than 1. In particular this means that there exists a single G* ∈ F_ℓ^m such that G* does not violate Eq. (3.2) with respect to any NAND program of at most T lines, but that precisely means that G* is a (T, ϵ) pseudorandom generator.
is at most 2^{−T²}. Eq. (3.8) follows directly from the Chernoff bound. If we let for every i ∈ [L] the random variable X_i denote P(y_i), then since y_0, . . . , y_{L−1} is chosen independently at random, these are independently and identically distributed random variables with mean E_{y∼{0,1}^m}[P(y)] = P_{y∼{0,1}^m}[P(y) = 1], and hence the probability that they deviate from their expectation by ϵ is at most 2 · 2^{−ϵ²L/2}. □
4 Pseudorandom functions
Definition 4.1 — Pseudorandom Function Generator. An efficiently computable function F taking two inputs s ∈ {0,1}^n and i ∈ {0, . . . , 2^n − 1} and outputting a single bit F(s, i) is a pseudorandom function (PRF) generator if for every polynomial time adversary A outputting a single bit and polynomial p(n), if n is large enough then:

| E_{s∈{0,1}^n} [ A^{F(s,·)}(1^n) ] − E_{H←_R [2^n]→{0,1}} [ A^H(1^n) ] | < 1/p(n) .   (4.1)
In the next lecture we will see the proof of the following theorem (due to Goldreich, Goldwasser, and Micali):
But before we see the proof of Theorem 4.2, let us see why pseudo-
random functions could be useful.
Figure 4.1: In a pseudorandom function, an adversary cannot tell whether they are given a black box that computes the function i ↦ F(s, i) for some secret s that was chosen at random and fixed, or whether the black box computes a completely random function that tosses a fresh random coin whenever it's given a new input i.
Let’s start with a very simple scenario which I’ll call the login
problem. Alice and Bob share a key as before, but now Alice wants
to simply prove her identity to Bob. What makes it challenging is
that this time they need to tackle not the passive eavesdropping
Eve but the active adversary Mallory who completely controls the
communication channel between them and can modify (or mall) any
message that they send out. Specifically for the identity proving case,
we think of the following scenario. Each instance of such an identification protocol consists of some interaction between Alice and Bob that ends with Bob deciding whether to accept it as authentic or not.
The most basic way to try to solve the login problem is simply
using a password. That is, if we assume that Alice and Bob can share
a key, we can treat this key as some secret password p that was
selected at random from {0, 1}n (and hence can only be guessed with
probability 2−n ). Why doesn’t Alice simply send p to Bob to prove
to him her identity? A moment’s thought shows that this would be a
very bad idea. Since Mallory is controlling the communication line,
she would learn p after the first identification attempt and then could
impersonate Alice in future interactions. However, we seem to have
just the tool to protect the secrecy of p— encryption. Suppose that
Alice and Bob share a secret key k and an additional secret password
p. Wouldn’t a simple way to solve the login problem be for Alice to
send to Bob an encryption of the password p? After all, the security
of the encryption should guarantee that Mallory can’t learn p, right?
The problem is that Mallory does not have to learn the password
p in order to impersonate Alice. For example, she can simply record the message c_1 that Alice sends to Bob in the first session and then replay
it to Bob in the next session. Since the message is a valid encryption
of p, then Bob would accept it from Mallory! (This is known as a
replay attack and is a common concern one needs to protect against in
cryptographic protocols.) One can try to put in countermeasures to
defend against this particular attack, but its existence demonstrates
that secrecy of the password does not guarantee security of the login
protocol.
The idea is that they create what's known as a one time password. Alice and Bob will share an index s ∈ {0,1}^n for the pseudorandom function generator { f_s }. When Alice wants to prove to Bob her identity, Bob will choose a random i ←_R {0,1}^n and send i to Alice, and then Alice will send y_1 = f_s(i), y_2 = f_s(i + 1), . . . , y_ℓ = f_s(i + ℓ − 1) to Bob, where ℓ is some parameter (you can think of ℓ = n for simplicity). Bob will check that indeed y_j = f_s(i + j − 1) for every j, and if so accept the session as authentic.
Protocol PRF-Login:
• Shared input: s ∈ {0, 1}n . Alice and Bob treat it as a seed for a
pseudorandom function generator { f s }.
As we will see it’s not really crucial that the input i (which is
known in crypto parlance as a nonce) is random. What is crucial
is that it never repeats itself, to foil a replay attack. For this reason
in many applications Alice and Bob compute i as a function of the
current time (for example, the index of the current minute based on
some agreed-upon starting point), and hence we can make it into
a one message protocol. Also the parameter ℓ is sometimes chosen
to be deliberately short so that it will be easy for people to type the
values y1 , . . . , yℓ .
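A sketch of the protocol in Python (our illustration): here the PRF f_s is instantiated with HMAC-SHA256 truncated to one byte, which is an assumption made for concreteness; any pseudorandom function family would do.

    import hmac, hashlib, secrets

    def f(s, i):
        # f_s(i): one byte of HMAC-SHA256 of the index i under seed s.
        return hmac.new(s, i.to_bytes(16, "big"), hashlib.sha256).digest()[0]

    def alice_respond(s, i, ell=8):
        return [f(s, i + j) for j in range(ell)]   # f_s(i), ..., f_s(i+ell-1)

    def bob_verify(s, i, ys):
        return all(y == f(s, i + j) for j, y in enumerate(ys))

    s = secrets.token_bytes(16)   # shared seed
    i = secrets.randbits(64)      # Bob's random challenge (the nonce)
    assert bob_verify(s, i, alice_respond(s, i))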
Figure 4.2: The Google Authenticator app is one popular example of a one-time
password scheme using pseudorandom functions. Another example is RSA’s SecurID
token.
values f (0), . . . , f (2n − 1). Now consider the case where we don’t
actually toss the ith coin until we need it. The crucial point is that if
we have queried the function in T ≪ 2n places, when Bob chooses
a random i ∈ [2n ] then it is extremely unlikely that any one of the set
{i, i + 1, . . . , i + ℓ − 1} will be one of those locations that we previously
queried. Thus, if the function was truly random, Mallory has no infor-
mation on the value of the function in these coordinates, and would
be able to predict it in all these locations with probability at most 2−ℓ .
F(·) is the function i ↦ f_s(i) for some fixed and random s, then this probability is at least 2^{−ℓ} + ϵ. Thus A will distinguish between the two cases with bias at least ϵ/2. We now turn to the formal proof:
One time passwords are a tool allowing you to prove your identity to, say, your email server. But even after you did so, how can the server trust that future communication comes from you and not from some attacker that can interfere with the communication channel between you and the server (a so-called "man in the middle" attack)? Similarly, one time passwords may allow a software company to prove their identity before they send you a software update, but how do you know that an attacker does not change some bits of this software update en route between their servers and your device?
Alice has a message m she wants to send to Bob, but now we are not
concerned with Mallory learning the contents of the message. Rather,
we want to make sure that Bob gets precisely the message m sent by
Alice. Actually this is too much to ask for, since Mallory can always
decide to block all communication, but we can ask that either Bob
gets precisely m or he detects failure and accepts no message at all.
Since we are in the private key setting, we assume that Alice and Bob
share a key k that is unknown to Mallory.
Theorem 4.5 — MAC Theorem. Under the PRF Conjecture, there exists
a secure MAC.
results" for proving results such as P ≠ NP.⁴ Specifically, the Natural Proofs barrier for proving circuit lower bounds says that if strong enough pseudorandom functions exist, then certain types of arguments are bound to fail. These are arguments which come up with a property EASY of a Boolean function f : {0,1}^n → {0,1} such that:

⁴ This discussion has more to do with computational complexity than cryptography, and so can be safely skipped without harming understanding of future material in this course.

• The property EASY fails to hold for a random function with high probability.
Theorem 5.1 — The PRF Theorem. Suppose that the PRG Conjecture is
true, then there exists a secure PRF collection { f s }s∈{0,1}∗ such that
for every s ∈ {0, 1}n , f s maps {0, 1}n to {0, 1}n .
maps n bits into 2n bits. Let’s denote G (s) = G0 (s) ◦ G1 (s) where
◦ denotes concatenation. That is, G0 (s) denotes the first n bits and
G1 (s) denotes the last n bits of G (s).
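The resulting tree construction (the GGM construction, analyzed below) is short enough to write out; in this sketch (ours) the length-doubling generator G is a SHA-256 stand-in, an assumption for illustration rather than a proven PRG.

    import hashlib

    N = 16   # state length in bytes (n = 128 bits); inputs i also have n bits

    def G0(s):
        return hashlib.sha256(b"0" + s).digest()[:N]   # first n bits of G(s)

    def G1(s):
        return hashlib.sha256(b"1" + s).digest()[:N]   # last n bits of G(s)

    def f(s, i):
        # Walk down the binary tree: at each level apply G_0 or G_1
        # according to the next bit of i.
        for bit in format(i, "0128b"):
            s = G1(s) if bit == "1" else G0(s)
        return s

    print(f(b"\x00" * N, 5).hex())   # f_s(5) for the all-zero seed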
Figure 5.2: In the “lazy evaluation” implementation of the black box to the adversary,
we label every node in the tree only when we need it. In this figure check marks
correspond to nodes that have been labeled and question marks to nodes that are still
unlabeled.
Figure 5.3: When the adversary queries i, the oracle takes the path from i to the root
and computes the generator on the minimum number of internal nodes that is needed
to obtain the label of the ith leaf
Note that the 0th hybrid corresponds to the case where the oracle implements the function i ↦ f_s(i), while in the T′th hybrid all labels are random and hence the oracle implements a random function. By the hybrid argument, if A can distinguish between the 0th hybrid and the T′th hybrid with bias ϵ then there must exist some j such that it distinguishes between the jth hybrid and the (j + 1)st hybrid with bias at least ϵ/T′.
encryption of m1 .
Definition 5.2 — Chosen Plaintext Attack (CPA) secure encryption. An encryption scheme (E, D) is secure against chosen plaintext attack (CPA secure) if for every polynomial time Eve, Eve wins with probability at most 1/2 + negl(n) in the game defined below:
Figure 5.4: In the CPA game, Eve interacts with the encryption oracle and at the end
chooses m0 , m1 , gets an encryption c∗ = Ek (mb ) and outputs b′ . She wins if b′ = b
Proof. The proof is very simple: Eve will only use a single round of interacting with E where she will ask for the encryption c_1 of 0^ℓ. In the second round, Eve will choose m_0 = 0^ℓ and m_1 = 1^ℓ, and upon receiving c* = E_k(m_b) she will then output 0 if and only if c* = c_1. □
This shows that if f_s(·) was a random function then Eve would win the game with probability at most 1/2. Now if we have some efficient Eve that wins the game with probability at least 1/2 + ϵ then we can build an adversary A for the PRF that will run this entire game with black box access to f_s(·) and will output 1 if and only if Eve wins. By the argument above, there would be a difference of at least ϵ in the probability it outputs 1 when f_s(·) is random vs. when it is pseudorandom, hence contradicting the security property of the PRF. □
We will not show the proof of this theorem here, but Fig. 5.6 illustrates what the construction of a pseudorandom permutation from a pseudorandom function looks like. The construction (known as the Luby-Rackoff construction) uses several rounds of what is known as the Feistel Transformation that maps a function f : {0,1}^n → {0,1}^n into a permutation g : {0,1}^{2n} → {0,1}^{2n} using the map (x, y) ↦ (x, f(x) ⊕ y). For an overview of the proof see Section 4.5 in Boneh-Shoup or Section 7.6 in Katz-Lindell.
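The Feistel map is easy to check in code: whatever f is, applying f(x) by XOR a second time cancels it, so g is a permutation. A small sketch (ours, with an arbitrary example f):

    def feistel(f):
        # The Feistel transformation: g(x, y) = (x, f(x) XOR y).
        def g(x, y):
            return x, f(x) ^ y
        def g_inv(x, z):
            return x, f(x) ^ z   # XORing f(x) again inverts the map
        return g, g_inv

    f = lambda x: (x * 2654435761) % (1 << 32)   # an arbitrary example function
    g, g_inv = feistel(f)
    x, y = 12345, 67890
    assert g_inv(*g(x, y)) == (x, y)             # g is indeed a permutation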
One of the first modern block ciphers was the Data Encryption Standard (DES) constructed by IBM in the 1970's. It is a fairly good cipher: to this day, as far as we know, it provides a pretty good number of security bits compared to its key. The trouble is that its
The actual construction of AES (or DES for that matter) is not extremely illuminating, but let us say a few words about the general principle behind many block ciphers. They are typically constructed by repeating one after the other a number of very simple permutations (see Fig. 5.7). Each such iteration is called a round. If there are t rounds, then the key k is typically expanded into a longer string, which we think of as a t-tuple of strings (k_1, . . . , k_t) via some pseudorandom generator known as the key scheduling algorithm. The i-th string in the tuple is known as the round key and is used in the ith round. Each round is typically composed of several components: there is a "key mixing component" that performs some simple permutation based on the key (often as simple as XORing the key), there is a "mixing component" that mixes the bits of the block so that bits that were initially nearby don't stay close to one another, and then there is some non-linear component (often obtained by applying some simple non-linear functions known as "S-boxes" to each small block of the input) that ensures that the overall cipher will not be an affine function. Each one of these operations is easily reversible, and hence decrypting the cipher simply involves running the rounds backwards.
Figure 5.7: A typical round of a block cipher: k_i is the ith round key, x_i is the block before the ith round and x_{i+1} is the block at the end of this round.
Figure 5.8: In the Electronic Codebook (ECB) mode every message is encrypted
deterministically and independently
Figure 5.9: An encryption of the Linux penguin (left image) using ECB mode (middle
image) vs CBC mode (right image). The ECB encryption is insecure as it reveals much
structure about the original image. Image taken from Wikipedia.
In the output feedback mode (OFB) we encrypt the all-zero string
using CBC mode to get a sequence (y1, y2, . . .) of pseudorandom
outputs that we can use as a stream cipher. Perhaps the simplest
mode is counter mode where we convert a block cipher to a stream
cipher by using the stream pk(IV), pk(IV + 1), pk(IV + 2), . . . where
IV is a random string in {0,1}^n which we identify with [2^n] (and
perform addition modulo 2^n). For a modern block cipher this should
be no less secure than CBC or OFB, and it has the advantage that it
can easily be computed in parallel.
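As a concrete illustration, here is a short Python sketch of counter mode with AES (via the third-party cryptography package) playing the role of the permutation pk; the helper names and simplified IV handling are ours, not a standard-conformant implementation:

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def ctr_stream(key, iv_int, nblocks):
    # Generate the keystream p_k(IV), p_k(IV+1), ... (addition mod 2^128).
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    for j in range(nblocks):
        counter = ((iv_int + j) % 2**128).to_bytes(16, "big")
        yield enc.update(counter)

def ctr_encrypt(key, message):
    iv_int = int.from_bytes(os.urandom(16), "big")    # random IV in {0,1}^n
    stream = b"".join(ctr_stream(key, iv_int, (len(message) + 15) // 16))
    ct = bytes(m ^ s for m, s in zip(message, stream))
    return iv_int.to_bytes(16, "big") + ct            # IV is sent in the clear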
It may seem that we have finally nailed down the security definition
for encryption. After all, what could be stronger than allowing Eve
unfettered access to the encryption function. Clearly an encryption
satisfying this property will hide the contents of the message in all
practical circumstances. Or will it?
our main point. In this protocol Alice (the user) sends to Bob (the
access point) an IP packet that she wants routed somewhere on the
internet.
Figure 6.1: The attack on the WEP protocol allowing the adversary Mallory to read
encrypted messages even when Alice uses a CPA secure encryption.
The point is that often our adversaries can be active and modify
the communication between sender and receiver, which in effect gives
them the ability not just to choose plaintexts of their choice to encrypt,
but even to have some impact on the ciphertexts that get decrypted. This
motivates the following notion of security (see also Fig. 6.2):
What does CCA have to do with WEP? The CCA security game is
somewhat strange, and it might not be immediately clear whether it
has anything to do with the attack we described on the WEP protocol.
However, it turns out that using a CCA secure encryption would have
prevented that attack. The key is the following claim:

Lemma 6.2 Suppose that (E, D) is a CCA secure encryption. Then there
is no efficient algorithm that, given an encryption c of the plaintext
(m1, m2), outputs a ciphertext c′ that decrypts to (m1′, m2) where
m1′ ̸= m1.
The proof is simple and relies on the crucial fact that the CCA
game allows M to query the decryption box on any ciphertext of her
choice, as long as it’s not exactly identical to the challenge ciphertext
c∗. In particular, if M′ is able to morph an encryption c of m to some
encryption c′ of some different m′ that agrees with m on some set
of bits, then M can do the following: in the security game, choose m0
to be some random message and m1 to be this plaintext m. Then,
when receiving c∗, apply M′ to it to obtain a ciphertext c′ (note
that if the plaintext differs then the ciphertext must differ also; can
you see why?), ask the decryption box to decrypt it, and output 1
if the resulting message agrees with m in the corresponding set of
bits (otherwise output a random bit). If M′ was successful with
probability ϵ, then M would win in the CCA game with probability
at least 1/2 + ϵ/10 or so. □
This is a lesson that has been shown time and again: many
protocols have been broken due to the mistaken belief that if
we only care about secrecy, it is enough to use only encryption (and
one that is only CPA secure), with no need for authentication.
Matthew Green writes this more provocatively as 1 :

Nearly all of the symmetric encryption modes you
learned about in school, textbooks, and Wikipedia are
(potentially) insecure.

exactly because these basic modes only ensure security for passive
eavesdropping adversaries and do not ensure chosen ciphertext
security, which is the “gold standard” for online applications. (For
symmetric encryption people often use the name “authenticated
encryption” in practice rather than CCA security; those are not
identical but are extremely related notions.)

1 I also like the part where Green says about a block cipher mode that “if
OCB was your kid, he’d play three sports and be on his way to Harvard.”
We will have an exercise about a simplified version of the GCM mode
(which perhaps only plays a single sport and is on its way to . . . ). You
can read about OCB in Exercise 9.14 in the Boneh-Shoup book; it uses the
notion of a “tweakable block cipher”, which simply means that given a
single key k, you actually get a set {pk,1, . . . , pk,t} of permutations that are
indistinguishable from t independent random permutations (the set {1, . . . , t}
is called the set of “tweaks” and we sometimes index it using strings instead
of numbers).

All of this suggests that Message Authentication Codes might help
us get CCA security. This turns out to be the case. But one needs to
take some care in exactly how to use MACs to get CCA security. At this
point, you might want to stop and think how you would do this. . .
OK, so now that you had a chance to think about this on your own,
we will describe one way that works to achieve CCA security from
MACs. We will explore other approaches that may or may not work
in the exercises.
Theorem 6.3 — CCA from CPA and MAC. Let (E, D) be a CPA-secure
encryption scheme and (S, V) be a CMA-secure MAC with n bit
keys and a canonical verification algorithm. 2 Then the following
encryption (E′, D′) with 2n bit keys is CCA secure:

* E′k1,k2(m) is obtained by computing c = Ek1(m), σ = Sk2(c) and
outputting (c, σ).

* D′k1,k2(c, σ) outputs nothing (e.g., an error message) if Vk2(c, σ) ̸=
1, and otherwise outputs Dk1(c).

2 By a canonical verification algorithm we mean that Vk(m, σ) = 1 iff Sk(m) = σ.
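Here is a minimal Python sketch of this “encrypt-then-MAC” construction; we instantiate the CPA-secure scheme with AES-CTR and the MAC with HMAC-SHA256 purely for illustration (the theorem itself is generic, and this sketch glosses over details such as key lengths):

import hmac, hashlib, os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_then_mac(k1, k2, m):
    # k1: 16-byte AES key, k2: MAC key (illustrative choices).
    nonce = os.urandom(16)
    c = nonce + Cipher(algorithms.AES(k1), modes.CTR(nonce)).encryptor().update(m)
    sigma = hmac.new(k2, c, hashlib.sha256).digest()   # sign the *ciphertext*
    return c, sigma

def decrypt_then_verify(k1, k2, c, sigma):
    # Canonical verification: recompute the tag and compare.
    if not hmac.compare_digest(hmac.new(k2, c, hashlib.sha256).digest(), sigma):
        return None                                    # reject: invalid tag
    nonce, body = c[:16], c[16:]
    return Cipher(algorithms.AES(k1), modes.CTR(nonce)).decryptor().update(body)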
Case II: The event above happens with probability smaller than
ϵ/10.

Let’s start with Case I: When this case holds, we will build an
adversary F (for “forger”) for the MAC (S, V). We can assume the
adversary F has access to both the signing and verification algorithms
as black boxes for some unknown key k2 that is chosen at random
and fixed.3 F will choose k1 on its own, and will also choose at
random a number i0 from 1 to T, where T is the total number of
queries that M′ makes to the decryption box. F will run the entire
CCA game with M′, using k1 and its access to the black boxes to
execute the encryption and decryption boxes, all the way until just
before M′ makes the i0-th query (c, σ) to its decryption box. At that
point, F will output (c, σ). We claim that with probability at least
ϵ/(10T), our forger will succeed in the CMA game in the sense that
(i) the query (c, σ) will pass verification, and (ii) the message c was
not queried before to the signing oracle.

3 Since we use a MAC with canonical verification, access to the signature
algorithm implies access to the verification algorithm.
Now for Case II: In this case, we will build an adversary Eve
for the CPA-game in the original scheme (E, D). As you might expect,
the adversary Eve will choose by herself the key k2 for the MAC
scheme, and attempt to play the CCA security game with M′. When
M′ makes encryption queries this should not be a problem: Eve can
forward the plaintext m to her encryption oracle to get c = Ek1(m) and
then compute σ = Sk2(c) since she knows the signing key k2.
Pr_h[h(x) = y ∧ h(x′) = y′] = 2^{−2n} (6.1)

Universal hash functions have rather efficient constructions, and
in particular if we relax the definition to allow almost universal hash
functions (where we replace the 2^{−2n} factor in the righthand side of
Eq. (6.1) by a slightly bigger, though still negligible, quantity) then
the constructions become extremely efficient and the size of the
description of h is only related to n, no matter how big ℓ is.4

4 In ϵ-almost universal hash functions we require that for every y, y′ ∈ {0,1}^n
and x ̸= x′ ∈ {0,1}^ℓ, the probability that h(x) = h(x′) is at most ϵ. It can be
easily shown that the analysis below extends to ϵ-almost universal hash
functions as long as ϵ is negligible, but we will leave verifying this to the
reader.
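As a sketch of what such a construction can look like, the classic family h_{a,b}(x) = ax + b over Z_p is pairwise independent: for x ̸= x′ the linear system ax + b = y, ax′ + b = y′ has exactly one solution (a, b), giving the analogue of Eq. (6.1) with 2^{−2n} replaced by 1/p². In Python (with a toy prime p standing in for 2^n):

import secrets

P = (1 << 127) - 1   # a Mersenne prime, playing the role of 2^n

def sample_hash():
    # Sample h_{a,b}(x) = a*x + b mod P from the pairwise-independent family.
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return lambda x: (a * x + b) % P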
• Let ci = zi ⊕ mi .
The basic unit in the bitcoin system is a coin. Each coin has a
unique identifier, and a current owner.2 Transactions in the system
have either the form of “mint coin with identifier ID and owner P”
or “transfer the coin ID from P to Q”. All of these transactions are
recorded in a public ledger.

2 This is one of the places where we simplify and deviate from the actual
Bitcoin system. In the actual Bitcoin system, the atomic unit is known as
a satoshi and one bitcoin (abbreviated BTC) is 10^8 satoshis. For reasons of
efficiency, there is no individual identifier per satoshi and transactions can
involve transfer and creation of multiple satoshis. However, conceptually we
can think of atomic coins each of which has a unique identifier.

Since there are no user accounts in bitcoin, the “entities” P and
Q are not identifiers of any person or account. Rather, P and Q are
“computational puzzles”. A computational puzzle can be thought of as
a string α that specifies some “problem” such that it’s easy to verify
whether some other string β is a “solution” for α, but it is hard to
find such a solution on your own. (Students with complexity back-
ground will recognize here the class NP.) So when we say “transfer
the coin ID from P to Q” we mean that whoever holds a solution
for the puzzle Q is now the owner of the coin ID (and to verify the
authenticity of this transfer, you provide a solution to the puzzle P.)
More accurately, a transaction involving the coin ID is self-validating
if it contains a solution to the puzzle that is associated with ID according
to the current ledger.

assume that even one party would behave honestly: if there is no central
authority and it is profitable to cheat, then everyone would
cheat, wouldn’t they?
Figure 7.1: The bitcoin ledger consists of an ordered list of transactions. At any given
point in time there might be several “forks” that continue the ledger, and different
parties do not necessarily have to agree on them. However, the bitcoin architecture
is designed to ensure that the parties corresponding to a majority of the computing
power will reach consensus on a single ledger.
Perhaps the main idea behind bitcoin is that “majority” will correspond
to a “majority of computing power”, or as the original bitcoin
paper says, “one CPU one vote” (or perhaps more accurately, “one
cycle one vote”). It might not be immediately clear how to implement
this, but at least it means that creating fictitious new entities
(sometimes known as a Sybil attack after the movie about multiple-personality
disorder) cannot help. To implement it we turn to a
cryptographic concept known as “proof of work”, which was originally
suggested by Dwork and Naor in 1991 as a way to combat mass
marketing email.4

4 This was a rather visionary paper in that it foresaw this issue before the term
“spam” was introduced and indeed when email itself, let alone spam email,
was hardly widespread.
Consider a pseudorandom function {fk} mapping n bits to ℓ bits.
On average, it will take a party Alice 2^ℓ queries to obtain an input x
such that fk(x) = 0^ℓ. So, if we’re not too careful, we might think of
such an input x as a proof that Alice spent 2^ℓ time.
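In code, the proof-of-work idea looks roughly as follows (a sketch; we stand in for the pseudorandom function fk with SHA-256 keyed by k, which is a heuristic assumption):

import hashlib, itertools

def proof_of_work(k: bytes, ell: int) -> int:
    # Search for x whose hash begins with ell zero bits, i.e. f_k(x) = 0^ell
    # in our truncated model; expected running time is about 2^ell iterations.
    for x in itertools.count():
        h = hashlib.sha256(k + x.to_bytes(8, "big")).digest()
        if int.from_bytes(h, "big") >> (256 - ell) == 0:
            return x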
The question is then how do we get to that happy state given that
many parties might be non-malicious but still selfish and might not
want to volunteer their computing power for the goal of creating a
consensus ledger. Bitcoin achieves this by giving some incentive, in
the form of the ability to mint new coins, to any party that adds to
the ledger. This means that if we are already in the situation where
there is a consensus ledger L, then every party has an interest in
continuing this ledger L, and not any alternative, as they want their
minting transaction to be part of the new consensus ledger. In contrast,
if they “fork” the consensus ledger then their work may well be
in vain. Thus one can hope that the consensus ledger will continue
to grow. (This is a rather hand-wavy and imprecise argument; see
this paper for a more in-depth analysis; this is also related to the
phenomenon known as preferential attachment.)
Figure 7.2: A collision-resistant hash function is a map from a large universe to a
small one that is “practically one to one”, in the sense that collisions for the function do
exist but are hard to find.
The main idea is the following simple result, which can be thought
of as one side of the so called “birthday paradox”:
Lemma 7.1 If H is a random function from some domain S to {0,1}^n,
then the probability that after T queries an attacker finds x ̸= x′ such
that H(x) = H(x′) is at most T²/2^n.
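The other direction of the birthday bound is easy to check empirically; the following sketch (using a truncated SHA-256 as a stand-in for a random function) typically finds a collision in an n-bit hash after around 2^{n/2} queries:

import hashlib

def find_collision(n_bits=32):
    # Query H on 0, 1, 2, ... until two inputs hash to the same n-bit value.
    seen = {}
    x = 0
    while True:
        h = hashlib.sha256(x.to_bytes(8, "big")).digest()
        y = int.from_bytes(h, "big") >> (256 - n_bits)   # truncate to n bits
        if y in seen:
            return seen[y], x      # x != x' with H(x) = H(x')
        seen[y] = x
        x += 1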
Definition 7.2 — Collision resistant hash functions. A collection {hk} of
functions where hk : {0,1}∗ → {0,1}^n for k ∈ {0,1}^n is a collision
resistant hash function (CRH) collection if the map (k, x) ↦ hk(x)
is efficiently computable and for every efficient adversary A,
the probability over k that A(k) = (x, x′) such that x ̸= x′ and
hk(x) = hk(x′) is negligible. 6

6 Note that the other side of the birthday bound shows that you can always
find a collision in hk using roughly 2^{n/2} queries. For this reason we typically
need to double the output length of hash functions compared to the key size
of other cryptographic primitives (e.g., 256 bits as opposed to 128 bits).

Once more we do not know a theorem saying that under the PRG
conjecture there exists a collision resistant hash function collection,
even though this property is considered one of the desiderata for
cryptographic hash functions. However, we do know how to obtain
collections satisfying this condition under various assumptions that
we will see later in the course, such as the learning with error problem
and the factoring and discrete logarithm problems. Furthermore,
if we consider the weaker notion of security under a second preimage
attack (also known as being a “universal one way hash function” or
UOWHF) then it is known how to derive such a function from the
PRG assumption.
Proof. The intuition behind the proof is that if h was invertible then
we could invert H by simply going backwards. Thus in principle if
a collision for H exists then so does a collision for h. Now of course
One fix for this is to use a different IV′ at the end of the encryption.
That is, we define:
There are a few ways we can get “insider views” into the NSA’s
thinking. Some such insights can be obtained from the Snowden
documents.

Figure 8.1: To obtain a key from a password we will typically use a “slow” hash function
to map the password and a unique-to-user public “salt” value to a cryptographic
key. Even with such a procedure, the resulting key cannot be considered as secure and
unpredictable as a key that was chosen truly at random, especially if we are in a
setting where an adversary can launch an offline attack to guess all possibilities.
Suppose that you outsource to the cloud storing your huge data file
x ∈ {0, 1} N . You now need the ith bit of that file and ask the cloud for
xi . How can you tell that you actually received the correct bit?
Alice, who sends x to the cloud Bob, will keep the short block y.
Whenever Alice queries the value i she will ask for a certificate that
xi is indeed the right value. This certificate will consist of the block
that contains i, as well as all of the 2 log t blocks that were used in the
hash from this block to the root. The security of this scheme follows
from the following simple theorem:
Figure 8.2: In the Merkle Tree construction we map a long string x into a block y ∈
{0,1}^n that is a “digest” of the long string x. As in a collision resistant hash, we can
imagine that this map is “one to one” in the sense that it won’t be possible to find
x′ ̸= x with the same digest. Moreover, we can efficiently certify that a certain bit of
x is equal to some value without sending out all of x but rather the log t blocks that
are on the path between i and the root, together with their “siblings” used in the hash
function, for a total of at most 2 log t blocks.
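A compact Python sketch of this construction follows (assuming for simplicity that the number of blocks t is a power of two; the function names are ours). The certificate for block i consists of the sibling hashes along the path from i to the root:

import hashlib

H = lambda b: hashlib.sha256(b).digest()

def merkle_root(blocks):
    # Hash the blocks pairwise, level by level, up to the root digest y.
    level = [H(b) for b in blocks]
    while len(level) > 1:
        level = [H(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
    return level[0]

def certificate(blocks, i):
    # Sibling hashes needed to recompute the root from block i.
    level, path = [H(b) for b in blocks], []
    while len(level) > 1:
        path.append(level[i ^ 1])                  # sibling at this level
        level = [H(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return path

def verify(root, block, i, path):
    h = H(block)
    for sib in path:
        h = H(h + sib) if i % 2 == 0 else H(sib + h)
        i //= 2
    return h == root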
The above provides a way to assure Alice that the value retrieved
from a cloud storage is correct, but how can Alice be sure that the
cloud server still stores the values that she did not ask about?
A priori, you might think that she obviously can’t. If Bob is lazy,
or short on storage, he could decide to store only some small fraction
of x that he thinks Alice is more likely to query for. As long as Bob
wasn’t unlucky and Alice doesn’t ask these queries, then it seems Bob
could get away with this. In a proof of retrievability, first proposed by
Juels and Kaliski in 2007, Alice would be able to get convinced that
Bob does in fact store her data.

First, note that Alice can guarantee that Bob stores at least 99
percent of her data, by periodically asking him to provide answers
(with proofs!) of the value of x at 100 or so random locations. The
idea is that if Bob dropped more than 1 percent of the bits, then he’d
be very likely to be caught “red handed” and get a question from
Alice about a location he did not retain.
Figure 8.3: To obtain pseudorandom bits for cryptographic applications we hash down
measurements which contain some entropy in them to a shorter string that is hopefully
truly uniformly random or at least statistically close to it, and then expand this to get
as many pseudorandom bits as we need using a pseudorandom generator.
How do hash functions figure into this? The idea is that if an input
x has n bits of entropy then h( x ) would still have the same bits of
entropy, as long as its output is larger than n. In practice people use
the notion of “entropy” in a rather loose sense, but we will try to be
more precise below.
Claim: Let Col(Ys) be the probability that two independent samples
from Ys are identical. Then with probability at least 0.99, Col(Ys) <
2^{−n} + 100 · 2^{−2n}.
People have been dreaming about heavier-than-air flight since at
least the days of Leonardo Da Vinci (not to mention Icarus from
Greek mythology). Jules Verne wrote with rather insightful detail
about going to the moon in 1865. But, as far as I know, in all the
thousands of years people have been using secret writing, until about
50 years ago no one had considered the possibility of communicating
securely without first exchanging a shared secret key. However, in the
late 1960’s and early 1970’s, several people started to question this
“common wisdom”.
We only found out much later that in the late 1960’s, a few years
before Merkle, James Ellis of the British Intelligence agency GCHQ
was having similar thoughts. His curiosity was spurred by an old
World-War II manuscript from Bell labs that suggested the following
way that two people could communicate securely over a phone line.
Alice would inject noise to the line, Bob would relay his messages,
and then Alice would subtract the noise to get the signal. The idea is
that an adversary over the line sees only the sum of Alice’s and Bob’s
signals, and doesn’t know what came from what. This got James Ellis
thinking whether it would be possible to achieve something like that
digitally. As he later recollected, in 1970 he realized that in principle
this should be possible, since he could think of a hypothetical
black box B that on input a “handle” α and plaintext p would give a
“ciphertext” c and that there would be a secret key β corresponding
to α, such that feeding β and c to the box would recover p. However,
Ellis had no idea how to actually instantiate this box. He and others
kept giving this question as a puzzle to bright new recruits until one
of them, Clifford Cocks, came up in 1973 with a candidate solution
loosely based on the factoring problem; in 1974 another GCHQ
recruit, Malcolm Williamson, came up with a solution using modular
exponentiation.
One major point we did not talk about in this course was one way
functions. The definition of a one way function is quite simple:
Definition 9.1 — One Way Functions. A function f : {0,1}∗ → {0,1}∗
is a one way function if it is efficiently computable and for every n
and a poly(n) time adversary A, the probability over x ←R {0,1}^n
that A(f(x)) outputs x′ such that f(x′) = f(x) is negligible.
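For intuition, here is a sketch of one popular candidate (an assumption only; no function has been proven one way): the subset-sum function, which is easy to compute but believed hard to invert for suitable parameters:

import secrets

def subset_sum_owf(a, x):
    # a: list of n random n-bit integers (part of the input),
    # x: bit vector in {0,1}^n selecting a subset; output the sum mod 2^n.
    n = len(a)
    return a, sum(ai for ai, xi in zip(a, x) if xi) % (2 ** n)

# f is easy to evaluate...
n = 128
a = [secrets.randbelow(2 ** n) for _ in range(n)]
x = [secrets.randbelow(2) for _ in range(n)]
_, y = subset_sum_owf(a, x)
# ...but recovering some x' with the same image is conjectured to require
# super-polynomial time for appropriately chosen n.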
Theorem 9.2 — One way functions and private key cryptography. The fol-
lowing are equivalent:
The key result in the proof of this theorem is the result of Hastad,
Impagliazzo, Levin and Luby that if one way functions exist then
pseudorandom generators exist. If you are interested in finding out
more, Sections 7.2-7.4 in the KL book cover a special case of this
theorem for the case that the one way function is a permutation on
{0, 1}n for every n. This proof has been considerably simplified and
quantitatively improved in works of Haitner, Holenstein, Reingold,
Vadhan, Wee and Zheng. See this talk of Salil Vadhan for more on
this. See also these lecture notes from a Princeton seminar I gave on
this topic (though the proof has been simplified since then by the
above works).
the “right” definition, but in the interest of time we will skip ahead to
what by now is the standard basic notion (see also Fig. 9.1):
Figure 9.1: In a public key encryption, the receiver Bob generates a pair of keys (e, d).
The encryption key e is used for encryption, and the decryption key d is used for decryption.
We call it a public key system since the security of the scheme does not rely on the
adversary Eve not knowing the encryption key. Hence Bob can publicize the key e to
a great many potential senders, and still ensure confidentiality of the messages he
receives.
• (e, d) ←R G(1^n)
Why would someone imagine that such a magical object could exist?
The writing of both James Ellis as well as Diffie and Hellman sug-
gests that their thought process was roughly as follows. You imagine
a “magic black box” B such that if all parties have access to B then
we could get a public key encryption scheme. Now if public key
encryption was impossible it would mean that for every possible
program P that computes the functionality of B, if we distribute the
code of P to all parties then we don’t get a secure encryption scheme.
That means that no matter what program P the adversary gets, she will
always be able to get some information out of that code that helps
break the encryption, even though she wouldn’t have been able to
break it if P was a black box. Now intuitively understanding arbi-
trary code is a very hard problem, so Diffie and Hellman imagined
that it might be possible to take this ideal B and compile it to some
sufficiently low level assembly language so that it would behave as
a “virtual black box”. In particular, if you took, say, the encoding
procedure m ↦ pk(m) of a block cipher with a particular key k, and
ran it through an optimizing compiler, you might hope that while it
would be possible to perform this map using the resulting executable,
it would be hard to extract k from it, and hence one could treat this code
as a “public key”. This suggests the following approach for getting an
encryption scheme:
Theorem 9.5 — Diffie-Hellman security in Random Oracle Model. Suppose
that the Computational Diffie-Hellman Conjecture for mod prime
groups is true. Then, the Diffie-Hellman public key encryption is
CPA secure in the random oracle model.
Proof. For CPA security we need to prove that (for fixed G of size p
and random oracle H) the following two distributions are computa-
tionally indistinguishable for every two strings m, m′ ∈ {0, 1}ℓ :
(can you see why this implies CPA security? you should pause
here and verify this!)
Now given the claim, we can complete the proof of security via
the following hybrids. Define the following “hybrid” distributions
(where in all cases a, b are chosen uniformly and independently in
Z p ):
• H0 : (g^a, g^b, H(g^{ab}) ⊕ m)
• H1 : (g^a, g^b, Uℓ ⊕ m)
• H2 : (g^a, g^b, Uℓ ⊕ m′)
• H3 : (g^a, g^b, H(g^{ab}) ⊕ m′)
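To make the scheme being analyzed concrete, here is a toy Python sketch of the hashed Diffie-Hellman encryption (the modulus, generator, and use of SHA-256 for H are illustrative assumptions only, not secure parameter choices):

import hashlib, secrets

P = 2**255 - 19                      # toy prime modulus (illustration only)
G = 2                                # toy generator

def H(x: int, ell: int) -> int:
    # Model the random oracle by truncated SHA-256 (a heuristic).
    d = hashlib.sha256(x.to_bytes(32, "big")).digest()
    return int.from_bytes(d, "big") >> (256 - ell)

def keygen():
    a = secrets.randbelow(P - 1)
    return a, pow(G, a, P)           # secret a, public h = g^a

def encrypt(h, m, ell=128):
    # m is an ell-bit message encoded as an integer.
    b = secrets.randbelow(P - 1)
    return pow(G, b, P), H(pow(h, b, P), ell) ^ m   # (g^b, H(g^ab) XOR m)

def decrypt(a, ct, ell=128):
    gb, c = ct
    return H(pow(gb, a, P), ell) ^ c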
Proof. Recall that the least common multiple (LCM) of two or more
numbers a1, . . . , at is the smallest number that is a multiple of all of the ai’s.
One way to compute the LCM of a1, . . . , at is to take the prime factorizations
of all the ai’s; the LCM is then the product of all the
primes that appear in these factorizations, each raised to the highest power
with which it appears. Let k be the number of primes between 1 and N.
The lemma will follow from the following two claims:

The two claims immediately imply the result since they imply
that 2^{N−2} ≤ N^k, and taking logs we get that N − 2 ≤ k log N or
k ≥ (N − 2)/ log N. (We can assume that N is odd without loss of
generality.)
2^{−N+1} ≥ I ≥ 1/LCM(1, . . . , N) (9.1)
The following basic facts are all not too hard to prove and would
be useful exercises:
2. The adversary gets the inputs 1^n, v, and black box access to the
signing algorithm Ss(·).
The first issue is not so significant, since we can always have the
ciphertext be an encryption of x = H (m) where H is some hash
function presumed to behave as a random oracle. (We do not want to
simply run this protocol with x = m. Can you see why?)
Theorem 9.8 — Random-Oracle Model Security of DSA signatures. Suppose
that the discrete logarithm assumption holds for the group G.
Then the DSA signature with G is secure when H, F are modeled
as random oracles.
and

g^{H(m∗)} h^{F(f∗)} = (f∗)^{σ∗} (9.4)

and

H(m∗) + aF(f∗) = bσ∗ (9.6)
or

b = (H(m∗) − H(m))(σ − σ∗)^{−1} mod p (9.7)

Since all of the values H(m∗), H(m), σ, σ∗ are known, this means we
can compute b, and hence also recover the unknown value a.
H1 (m∗ ) + aF ( f ∗ ) = bσ (9.8)
and
H2 (m∗ ) + aF ( f ∗ ) = bσ∗ (9.9)
and
H (m∗ ) + aF2 ( f ∗ ) = bσ∗ (9.11)
where F1 ( f ∗ ) and F2 ( f ∗ ) are our two answers in the first and second
experiment, and now we can use this to learn a = b(σ − σ∗ )( F1 ( f ∗ ) −
F2 ( f ∗ ))−1 .
public key for Amazon, but Apple surely does. So Apple can supply
Amazon with a signed message to the effect of
from the encryption, how the two parties negotiate which crypto-
graphic algorithm to use, and more. All these issues can and have
been used for attacks on this protocol. For two recent discussions see
this blog post and this website.
Figure 9.2: When you connect to a webpage protected by SSL/TLS, the browser
displays information on the certificate’s authenticity.
Figure 9.3: The cipher and certificate used by “Google.com”. Note that Google has a
2048-bit RSA signature key, which it then uses to authenticate an elliptic curve based
Diffie-Hellman key exchange protocol to create session keys for the block cipher AES
with a 128 bit key in Galois Counter Mode.
Figure 9.4: Digital signatures and other forms of electronic signatures are legally
binding in many jurisdictions. This is some material from the website of the electronic
signing company DocuSign
of which is either 0 or 1. Thus,

(2N choose N) ≤ ∏_{P prime, 1 ≤ P ≤ 2N} P^{⌊log 2N / log P⌋} .

Taking logs we get that

N ≤ log (2N choose N) (9.12)

≤ ∑_{P prime ∈ [2N]} ⌊log 2N / log P⌋ · log P (9.13)

≤ ∑_{P prime ∈ [2N]} log 2N (9.14)
(See Shoup’s excellent and freely available book for extensive cover-
age of these and many other topics.)
Lemma 10.2 on its own might not seem very meaningful since it’s
not clear how many pseudoprimes there are. However, it turns out
these pseudoprimes, also known as “Carmichael numbers”, are
much less prevalent than the primes: specifically, there are about
N · 2^{−Θ(log N/ log log N)} pseudoprimes between 1 and N. If we choose a
random number m ∈ [2^n] and output it if and only if the algorithm of
Lemma 10.2 outputs YES (otherwise resampling), then the
probability we make a mistake and output a pseudoprime is equal
to the ratio of the number of pseudoprimes in [2^n] to the number of primes in
[2^n]. Since there are Ω(2^n/n) primes in [2^n], this ratio is 2^{−Ω(n/ log n)},
which is a negligible quantity. Moreover, as mentioned above, there
are better algorithms that succeed for all numbers.
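In code, the sampling procedure just described looks roughly as follows (a sketch; we use a Fermat-style test in the spirit of Lemma 10.2, whereas real implementations use the stronger Miller-Rabin test, which has no analogue of Carmichael numbers):

import secrets

def is_pseudoprime(m, trials=20):
    # Fermat test: check a^(m-1) = 1 (mod m) for random bases a.
    if m < 4:
        return m in (2, 3)
    return all(pow(secrets.randbelow(m - 3) + 2, m - 1, m) == 1
               for _ in range(trials))

def random_prime(n):
    # Resample m in [2^n] until the test outputs YES.
    while True:
        m = secrets.randbelow(2 ** n)
        if is_pseudoprime(m):
            return m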
10.1.2 Fields
Theorem 10.3 — Fundamental Theorem of Algebra, mod p version. If f is a
nonzero polynomial of degree d over Z_p then there are at most d
distinct inputs x such that f(x) = 0.
(If you’re curious why, you can see that the task of, given
x1, . . . , xd+1, finding the coefficients for a polynomial vanishing on
the xi’s amounts to solving a linear system in d + 1 variables with
d + 1 equations that are independent due to the non-singularity of the
Vandermonde matrix.)
not contain all the numbers from 1 to m − 1. Indeed, all the numbers
of the form p, 2p, 3p, . . . , (q − 1)p and q, 2q, . . . , (p − 1)q will have
non-trivial g.c.d. with m. There are exactly q − 1 + p − 1 such numbers
(because p and q are prime, all the numbers of the forms above are
distinct). Hence |Z∗_m| = m − 1 − (p − 1) − (q − 1) = pq − p − q + 1 =
(p − 1)(q − 1).

Note that |Z∗_m| = |Z∗_p| · |Z∗_q|. It turns out this is no accident:
* ϕ1 ( x + y) = ϕ1 ( x ) + ϕ1 (y) (mod p)
* ϕ2 ( x + y) = ϕ2 ( x ) + ϕ2 (y) (mod q)
* ϕ1 ( x · y) = ϕ1 ( x ) · ϕ1 (y) (mod p)
* ϕ2 ( x · y) = ϕ2 ( x ) · ϕ2 (y) (mod q)
Theorem 10.6 — Square root extraction implies factoring. Suppose that
there is an efficient algorithm A such that for every m ∈ N and
a ∈ Z∗_m, A(m, a² (mod m)) = b such that a² = b² (mod m). Then,
there is an efficient algorithm to recover p, q from m.
We are now ready to describe the RSA and Rabin trapdoor functions:
for the RSA function, but there is no better algorithm known to attack
it than proceeding via factorization of m. The RSA function has the
advantage that it is a permutation over Z ∗m :
Lemma 10.9 RSA_{m,e} is one to one over Z∗_m.
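In Python, with toy parameters (completely insecure, purely to illustrate the permutation property and the trapdoor):

p, q, e = 1009, 1013, 17                      # toy primes (insecure!)
m = p * q
d = pow(e, -1, (p - 1) * (q - 1))             # trapdoor: e*d = 1 mod phi(m)

rsa = lambda x: pow(x, e, m)                  # RSA_{m,e}(x) = x^e mod m
rsa_inv = lambda y: pow(y, d, m)              # inverting requires the trapdoor d

assert all(rsa_inv(rsa(x)) == x for x in range(1, 1000))

The map is one to one exactly because e is coprime to (p − 1)(q − 1), so the inverse exponent d exists.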
Theorem 10.11 — Public key encryption from trapdoor permutations. If {pk}
is a secure TDP and H is a random oracle then TDPENC is a CPA
secure public key encryption scheme.
Figure 10.1: In the proof of security of TDPENC, we show that if the assumption of the
claim is violated, the “forgetful experiment” is identical to the real experiment with
probability larger than 1 − ϵ. In such a case, even if all that probability mass was on the
points in the sample space where the adversary in the forgetful experiment will lose
and the adversary of the real experiment will win, the probability of winning in the
latter experiment would still be less than 1/2 + ϵ.
• When A makes the query m to the signature box, then since m was
queried before to H, if m ̸= m∗ then I returns x = p_k^{−1}(H(m))
using its records. If m = m∗ then I halts and outputs “failure”.
To be completed.
11
Lattice based crypto
You may note that I haven’t yet said what lattices are; we will do
so later, but for now if you simply think of questions involving linear
equations modulo some prime q, you will get enough of the intuition
that you need. (The lattice viewpoint is more geometric, and we’ll
discuss it more below; it was first used to attack cryptosystems and in
particular break the Merkle-Hellman knapsack scheme and many of
its variants.)
Currently lattice based cryptography is the only real “game in town” for
potentially quantum-resistant public key encryption schemes.
How does the encrypting algorithm, which does not know x, get
a correct or incorrect equation on demand? One way would be to
simply take two equations ⟨ai, x⟩ = yi and ⟨aj, x⟩ = yj and add them
together to get the equation ⟨ai + aj, x⟩ = yi + yj. This equation
is correct, and so one can use it to encrypt 0, while to encrypt 1 we
simply add some fixed nonzero number α ∈ Z_q to the right hand side
to get the incorrect equation ⟨ai + aj, x⟩ = yi + yj + α. However, even
if it’s hard to solve for x given the equations, an attacker (who also
knows the public key (A, y)) can itself try all pairs of equations and
do the same thing.
Our solution for this is simple: just add more equations! If the
encryptor adds a random subset of equations then there are 2m
possibilities for that, and an attacker can’t guess them all. Thus, at
least intuitively, the following encryption scheme would be “secure”
in the Gaussian-elimination free world of attackers that haven’t taken
freshman linear algebra:
Please stop here and make sure that you see why
this is a valid encryption, and why this description corresponds
to the previous one; as usual all calculations
are done modulo q.
Now, suppose that the equations were noisy, in the sense that
we added to y a vector e ∈ Z_q^m such that |ei| < δq for every i.2
Even ignoring the effect of the scaling step, simply adding the first
equation to the rest of the equations would typically tend to increase
the relative error of equations 2, . . . , m from ≈ δ to ≈ 2δ. Now, when
we repeat the process, we increase the error of equations 3, . . . , m
from ≈ 2δ to ≈ 4δ, and we see that by the time we’re done dealing
with about n/2 variables, the remaining equations have error level
roughly 2^{n/2}δ. So, unless δ was truly tiny (and q truly big, in which
case the difference between working in Z_q and simply working with
integers or rationals disappears), the resulting equations have the
form Ix = y′ + e′ where e′ is so big that we get no information on x.

2 Over Z_q, we can think of q − 1 also as the number −1, and so on. Thus
if a ∈ Z_q, we define |a| to be the minimum of a and q − a. This ensures
the absolute value satisfies the natural property of |a| = |−a|.
Figure 11.1: The search to decision reduction (Theorem 11.1) implies that under the
LWE conjecture, for every m = poly(n), if we choose and fix a random m × n matrix
A over Z_q, the distribution Ax + e is indistinguishable from a random vector in Z_q^m,
where x is a random vector in Z_q^n and e is a random “short” vector in Z_q^m. The two
distributions are indistinguishable even to an adversary that knows A.
Figure 11.2: In the encryption scheme LWEENC, the public key is a matrix A′ = (A|y),
where y = As + e and s is the secret key. To encrypt a bit b we choose a random
w ←R {0,1}^m, and output w⊤A′ + (0, . . . , 0, b⌊q/2⌋). We decrypt c ∈ Z_q^{n+1} to zero with
key s iff |⟨c, (s, −1)⟩| ≤ q/10, where the inner product is done modulo q.
Proof. ⟨w⊤A, x⟩ = ⟨w, Ax⟩. Hence, if y = Ax + e then ⟨w, y⟩ =
⟨w⊤A, x⟩ + ⟨w, e⟩. But since every coordinate of w is either 0 or 1,
|⟨w, e⟩| < δmq < q/10 for our choice of parameters.5 So, we get that
if a = w⊤A and σ = ⟨w, y⟩ + b⌊q/2⌋ then σ − ⟨a, x⟩ = ⟨w, e⟩ + b⌊q/2⌋,
which will be smaller than q/10 iff b = 0. □

5 In fact, due to the fact that the signs of the error vector’s entries are different,
we expect the errors to have significant cancellations and hence we would
expect |⟨w, e⟩| to only be roughly of magnitude √m·δq, but this is not crucial
for our discussions.
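The following toy Python sketch implements LWEENC as described in Fig. 11.2 (the parameters q, m, n, δ below are our own toy choices, picked only so that δmq < q/10; they are far from secure):

import secrets

q, n, m, delta = 2_147_483_647, 32, 2000, 1e-5   # toy parameters (insecure!)

rand_vec = lambda k, bound: [secrets.randbelow(bound) for _ in range(k)]

def keygen():
    A = [rand_vec(n, q) for _ in range(m)]
    s = rand_vec(n, q)
    e = [secrets.randbelow(int(2 * delta * q)) - int(delta * q) for _ in range(m)]
    y = [(sum(A[i][j] * s[j] for j in range(n)) + e[i]) % q for i in range(m)]
    return (A, y), s                               # public key A' = (A|y), secret s

def encrypt(pk, b):
    A, y = pk
    w = [secrets.randbelow(2) for _ in range(m)]   # random w in {0,1}^m
    c = [sum(w[i] * A[i][j] for i in range(m)) % q for j in range(n)]
    c.append((sum(w[i] * y[i] for i in range(m)) + b * (q // 2)) % q)
    return c                                       # c = w^T A' + (0,...,0, b*floor(q/2))

def decrypt(s, c):
    # <c, (s, -1)> mod q; small magnitude means b = 0.
    v = (sum(c[j] * s[j] for j in range(n)) - c[n]) % q
    return 0 if min(v, q - v) <= q // 10 else 1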
For a public key encryption scheme with messages that are just
bits, CPA security means that an encryption of 0 is indistinguish-
able from an encryption of 1, even given the public key. Thus Theo-
rem 11.3 will follow from the following lemma:
Lemma 11.4 Let q, m, δ be set as in LWEENC. Then, assuming the
LWE conjecture, the following distributions are computationally
indistinguishable:
You should stop here and verify that (i) you understand
the statement of Lemma 11.4 and (ii) you
understand why this lemma implies Theorem 11.3.
The idea is that Lemma 11.4 shows that the concatenation
of the public key and encryption of 0 is
indistinguishable from something that is completely
random. You can then use it to show that the concatenation
of the public key and encryption of 1 is
indistinguishable from the same thing, and then
finish using the hybrid argument.
We will not do the whole proof of the claim (which uses the
mod q version of the leftover hash lemma which we mentioned
before and is also “Wikipedia-able”) but the idea is simple. For every
m × (n + 1) matrix A′ over Z_q, define h_{A′} : Z_q^m → Z_q^{n+1} to be the
map h_{A′}(w) = w⊤A′. This collection can be shown to be a “good”
hash function collection in some specific technical sense, which in
particular implies that for every distribution D with much more
than n log q bits of min-entropy, with all but negligible probability
over the choice of A′, h_{A′}(D) is statistically indistinguishable from
the uniform distribution. Now when we choose w at random in
{0,1}^m, it is coming from a distribution with m bits of entropy. If
m ≫ (n + 1) log q, then because the output of this function is so much
smaller than m, we expect it to be completely uniform, and this is
what’s shown by the leftover hash lemma. □
One of the biggest issues with lattice based cryptosystems is the key
size. In particular, the scheme above uses an m × n matrix where each
entry takes log q bits to describe. (It also encrypts a single bit using
a whole vector, but more efficient “multi-bit” variants are known.)
Schemes using ideal lattices are an attempt to get more practical
variants. These have very similar structure except that the matrix
A chosen is not completely random but rather can be described
by a single vector. One common variant is the following: we fix
some polynomial p over Z_q with degree n and then treat vectors
in Z_q^n as the coefficients of degree-(n − 1) polynomials, and always
work modulo this polynomial p(). (By this I mean that for every
polynomial t of degree at least n we write t as ps + r where p is the
polynomial above, s is some polynomial, and r is the “remainder”
polynomial of degree < n; then t (mod p) = r.) Now for every fixed
polynomial t, the operation At which is defined as s ↦ ts (mod p)
is a linear operation mapping polynomials of degree at most n − 1
to polynomials of degree at most n − 1, or put another way, it is a
linear map over Z_q^n. However, the map At can be described using the
n coefficients of t, as opposed to the n² entries of a matrix. It also
turns out that by using the Fast Fourier Transform we can evaluate
this operation in roughly n steps as opposed to n². Ideal lattice
based cryptosystems use matrices of this form to save on key size and
computation time. It is still unclear if this structure can be used for
attacks; recent papers attacking principal ideal lattices have shown
that one needs to be careful about this.
To be completed
We’ve now compiled all the tools that are needed for the basic goal
of cryptography (which is still being subverted quite often): allowing
Alice and Bob to exchange messages assuring their integrity and
confidentiality over a channel that is observed or controlled by an
adversary. Our tools for achieving this goal are:
The main issue with this key exchange protocol is of course that
adversaries often are not passive. In particular, an active Eve could
agree on her own key with Alice and Bob separately and then be
able to see and modify all future communication. She might also
be able to create weird correlations (with some potential security
implications) by, say, modifying the message A to be A² etc.
• The adversary then starts many connections with the server with
ciphertexts related to c, and observes whether they succeed or fail
(and in what way they fail, if they do). It turns out that based on
this information, the adversary would be able to recover the key k.
• The keys (e, d) are generated via G(1^n), and Mallory gets the
public encryption key e and 1^n.
of the note written there with some arbitrary string. Indeed, several
practical attacks, including Bleichenbacher’s attack above, exploited
exactly this gap between the physical metaphor and the digital
realization. For more on this, please see Victor Shoup’s survey where
he also describes the Cramer-Shoup encryption scheme which was
the first practical public key system to be shown CCA secure without
resorting to the random oracle heuristic. (The first definition of CCA
security, as well as the first polynomial-time construction, was given
in a seminal 1991 work of Dolev, Dwork and Naor.)
CCA-ROM-ENC Scheme:

• Ingredients: A public key encryption
scheme (G′, E′, D′) and two hash functions
H, H′ : {0,1}∗ → {0,1}^n (which we model as
independent random oracles 2 ).

• Key generation: We generate keys (e, d) =
G′(1^n) for the underlying encryption scheme.

• Encryption: To encrypt a message m ∈ {0,1}^ℓ,
we select randomness r ←R {0,1}^ℓ for the
underlying encryption algorithm E′ and output
E′e(r; H(m∥r))∥(r ⊕ m)∥H′(m∥r), where by
E′e(m′; r′) we denote the result of encrypting
m′ using the key e and the randomness r′ (we
assume the scheme takes n bits of randomness
as input; otherwise modify the output length of
H accordingly).

• Decryption: To decrypt a ciphertext c∥y∥z, first
let r = Dd(c) and m = r ⊕ y, and then check that
c = E′e(r; H(m∥r)) and z = H′(m∥r). If any of
the checks fail we output error; otherwise we
output m.

2 Recall that it’s easy to obtain two independent random oracles H, H′
from a single oracle H′′, for example by letting H(x) = H′′(0∥x) and
H′(x) = H′′(1∥x).

The above CCA-ROM-ENC scheme is CCA secure.
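A short Python sketch of this transform follows, with the random oracles instantiated heuristically by SHA-256 and the underlying CPA-secure scheme left abstract as the hypothetical callbacks enc_prime/dec_prime:

import hashlib, os

def H(x):  return hashlib.sha256(b"\x00" + x).digest()   # H  = H''(0 || x)
def Hp(x): return hashlib.sha256(b"\x01" + x).digest()   # H' = H''(1 || x)

def encrypt(e, m, enc_prime):
    r = os.urandom(len(m))                 # randomness r, same length as m
    c = enc_prime(e, r, H(m + r))          # E'_e(r; H(m || r))
    return c, bytes(a ^ b for a, b in zip(r, m)), Hp(m + r)

def decrypt(d, e, ct, enc_prime, dec_prime):
    c, y, z = ct
    r = dec_prime(d, c)
    m = bytes(a ^ b for a, b in zip(r, y))
    # Re-encrypt and recheck the tag; reject any malformed ciphertext.
    if c != enc_prime(e, r, H(m + r)) or z != Hp(m + r):
        return None
    return m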
Figure 13.1: How the NSA feels about breaking encrypted communication
14
Zero knowledge proofs
The United States and Russia have reached a dangerous and expensive
equilibrium by which each has about 7000 nuclear warheads,2
much more than is needed to decimate each other’s population (and
the population of much of the rest of the world). Having so many
weapons increases the chance of “leakage” of weapons, or of an accidental
launch (which can result in an all out war) through a fault
in communications or rogue commanders. This also threatens the
delicate balance of the Non-Proliferation Treaty, which at its core is
a bargain where non-weapons states agree not to pursue nuclear
weapons and the five nuclear weapon states agree to make progress
on nuclear disarmament. These huge quantities of nuclear weapons
are not only dangerous, as they increase the chance of a leak or of an
individual failure or rogue commander causing a world catastrophe,
but also extremely expensive to maintain.

2 To be fair, “only” about 170 million Americans live in the 50 largest
metropolitan areas and so arguably many people will survive at least
the initial impact of a nuclear war, though it has been estimated that
even a “small” nuclear war involving detonation of 100 not too large warheads
could have devastating global consequences.
14.1.2 Voting
Electronic voting has been of great interest for many reasons. One
potential advantage is that it could allow completely transparent
vote counting, where every citizen could verify that the votes were
counted correctly. For example, Chaum suggested an approach to
do so by publishing an encryption of every vote and then having
the central authority prove that the final outcome corresponds to the
counts of all the plaintexts. But of course to maintain voter privacy,
we need to prove this without actually revealing those plaintexts. Can
we do so?
I chose these two examples above precisely because they are hardly
the first that come to mind when thinking about zero knowledge.
Zero knowledge has been used for many cryptographic applications.
One such application (originating from work of Fiat and Shamir) is
the use for identification protocols. Here Alice knows a solution x to a
puzzle P, and proves her identity to Bob by, for example, providing
an encryption c of x and proving in zero knowledge that c is indeed
an encryption of a solution for P.3 Bob can verify the proof, but
because it is zero knowledge, learns nothing about the solution of
the puzzle and will not be able to impersonate Alice. An alternative
approach to such identification protocols is through using digital
signatures; this connection goes both ways and zero knowledge
proofs have been used by Schnorr and others as a basis for signature
schemes.

3 As we’ll see, technically what Alice needs to do in such a scenario is use
a zero knowledge proof of knowledge of a solution for P.
their secret inputs, hence violating security, but zero knowledge pre-
cisely guarantees that we can verify correct behaviour without access
to these inputs.
So, zero knowledge proofs are wonderful objects, but how do we get
them? In fact, we haven’t answered the even more basic question of
how do we define zero knowledge? We have to start by the most basic
task of defining what we mean by a proof.
All these proof systems have the property that the verifying algorithm
V is efficient. Indeed, that’s the whole point of a proof π: it’s a
sequence of symbols that makes it easy to verify that the statement is
true.
Proof of tetrachromacy:
Suppose that Alice is a tetrachromat and can dis-
tinguish between the colors of two pieces of plastic
that would be identical to a trichromat. She wants
to prove to a trichromat Bob that the two pieces are
not identical. She can do this as follows:
Alice and Bob will repeat the following experiment
n times: Alice turns her back and Bob tosses a
coin and with probability 1/2 leaves the pieces as
they are, and with probability 1/2 switches the
right piece with the left piece. Alice needs to guess
whether Bob switched the pieces or not.
If Alice is successful in all of the n repetitions then
Bob will have 1 − 2^{−n} confidence that the pieces are
truly different.
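A few lines of Python make the soundness amplification concrete (a sketch; a prover without the claimed ability is reduced to guessing and survives all n rounds with probability 2^{−n}):

import random

def run_protocol(prover_can_distinguish, n=20):
    for _ in range(n):
        switched = random.random() < 0.5          # Bob's secret coin toss
        guess = switched if prover_can_distinguish else random.random() < 0.5
        if guess != switched:
            return False       # Bob catches a wrong guess and rejects
    return True                # Bob accepts with confidence 1 - 2^-n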
• We have two parties: Alice and Bob. The common input is (m, x )
and Alice wants to convince Bob that NQR(m, x ) = 1. (That is,
that x is not a quadratic residue modulo m).
To see that Bob will indeed accept the proof, note that if x is a
non-residue then xs² will have to be a non-residue as well (since if it
had a root s′ then s′s^{−1} would be a root of x). Hence it will always be
the case that b′ = b.
strategy.5 We say that a proof system has an efficient prover if there
is an NP-type proof system Π for L (that is, some efficient algorithm
Π such that there exists π with Π(x, π) = 1 iff x ∈ L, and such that
Π(x, π) = 1 implies that |π| ≤ poly(|x|)), such that the strategy for P
can be implemented efficiently given any static proof π for x in this
system.

5 People have considered the notion of zero knowledge systems where
soundness holds only with respect to efficient provers; these are known as
argument systems.
1. Alice will pick a random s′ and send to Bob x′ = xs′² (mod m).

4. Bob checks that the value s′′ revealed by Alice is indeed a root of
x′x^{−b}; if so then it “accepts” the proof.
That is, we can show the verifier does not gain anything from the
interaction, because no matter what algorithm V∗ he uses, whatever
he learned as a result of interacting with the prover, he could equally
well have learned by simply running the standalone algorithm
S∗ on the same input.
4. Output V2 ( x, m, x ′ , s′′ ).
We now show a proof for another language. Suppose that Alice and
Bob know an n-vertex graph G and Alice knows a Hamiltonian cycle
C in this graph (i.e., a length-n simple cycle, one that traverses all
vertices exactly once). Here is how Alice can prove that such a cycle
exists without revealing any information about it:
Protocol ZK-Ham:
5. If b = 0 then Alice sends out π and the strings {xi,j} for all i, j; if
b = 1 then Alice sends out the n strings xπ(C1),π(C2), . . . , xπ(Cn),π(C1)
together with their indices.
Theorem 14.4 — Zero Knowledge proof for Hamiltonian Cycle. Protocol
ZK-Ham is a zero knowledge proof system for the language of
Hamiltonian graphs. 6

6 Goldreich, Micali and Wigderson were the first to come up with a zero
knowledge proof for an NP complete problem, though the Hamiltonicity
protocol here is from a later work by Blum. We use Naor’s commitment
scheme.

Proof. We need to prove completeness, soundness, and zero knowledge.
Completeness can be easily verified, and so we leave this to the
reader.
For soundness, we recall that (as we’ve seen before) with extremely
high probability the sets S0 = {G(x) : x ∈ {0,1}^n} and
S1 = {G(x) ⊕ z : x ∈ {0,1}^n} will be disjoint (this probability is over
the choice of z that is done by the verifier). Now, assuming this is
the case, given the messages {yi,j} sent by the prover in the first step,
define an n × n matrix M′ with entries in {0, 1, ?} as follows: M′i,j = 0
if yi,j ∈ S0, M′i,j = 1 if yi,j ∈ S1, and M′i,j = ? otherwise.
We split into two cases. The first case is that there exists some
permutation π such that (i) M′ is a π-permuted version of the input
graph G and (ii) M′ contains a Hamiltonian cycle. Clearly in this case
G contains a Hamiltonian cycle as well, and hence we don’t need to
consider it when analyzing soundness. In the other case we claim
that with probability at least 1/2 the verifier will reject the proof.
Indeed, if (i) is violated then the proof will be rejected if Bob chooses
b = 0 and if (ii) is violated then the proof will be rejected if Bob
chooses b = 1.
4. Let b be the output of V ∗ when given the input H and the first
message {yi,j } computed as above. If b ̸= b′ then go back to step 0.
We will simply sketch here the proofs (again see Goldreich’s book
for full proofs):
Figure 14.1: Using a zero knowledge protocol for Hamiltonicity we can obtain a zero
knowledge protocol for any language L in NP. For example, if the public input is a
SAT formula ϕ and the Prover’s secret input is a satisfying assignment x for ϕ, then
the verifier can run the reduction on ϕ to obtain a graph H and the prover can run
the same reduction to obtain from x a Hamiltonian cycle C in H. They can then run
the ZK-Ham protocol to prove that indeed H is Hamiltonian (and hence the original
formula was satisfiable) without revealing any information the verifier could not have
obtained on his own.
• Proof of knowledge
• Deniability / non-transferability
15
Fully homomorphic encryption: Introduction and bootstrapping
Figure 15.1: A fully homomorphic encryption can be used to store data on the cloud in
encrypted form, but still have the cloud provider be able to evaluate functions on the
data in encrypted form (without ever learning either the inputs or the outputs of the
function they evaluate).
Definition 15.1 — Partially Homomorphic Encryption. Let F = ∪Fℓ be a
class of functions where every f ∈ Fℓ maps {0,1}^ℓ to {0,1}.

• Dd(c) = f(x1, . . . , xℓ).
Figure 15.2: In a valid encryption scheme E, the set of ciphertexts c such that Dd (c) = b
is a superset of the set of ciphertexts c such that c = Ee (b; r ) for some r ∈ {0, 1}t where
t is the number of random bits used by the encryption algorithm. Our definition of
partially homomorphic encryption scheme requires that for every f : {0, 1}ℓ → {0, 1}
in our family and x ∈ {0, 1}ℓ , if ci ∈ Ee ( xi ; {0, 1}t ) for i = 1..ℓ then EVAL( f , c1 , . . . , cℓ )
is in the superset {c | Dd (c) = f ( x )} of Ee ( f ( x ); {0, 1}t ). For example if we apply
EVAL to the OR function and ciphertexts c, c′ that were obtained as encryptions of
1 and 0 respectively, then the output is a ciphertext c′′ that would be decrypted to
OR(1, 0) = 1, even if c′′ is not in the smaller set of possible outputs of the encryption
algorithm on 1. This distinction between the smaller and larger set is the reason why
we cannot automatically apply the EVAL function to ciphertexts that are obtained from
the outputs of previous EVAL operations.
We claim that if the server cheats then the client will detect this
with probability 1/2 − negl (n). Working this out is a great exercise.
The probability of detection can be amplified to 1 − negl (n) using
appropriate repetition, see the paper for details.
Definition 15.2 — LWE (simplified decision variant). Let q = q(n) be
some function mapping the natural numbers to primes. The q(n)-
decision learning with error (q(n)-dLWE) conjecture is the following:
for every m = poly(n) there is a distribution LWEq over pairs
(A, s) such that:

• A is an m × n matrix over Z_q and s ∈ Z_q^n satisfies s1 = ⌊q/2⌋ and
|As|i ≤ √q for every i ∈ {1, . . . , m}.

The LWE conjecture is that q(n)-dLWE holds for every q(n) that
is at most poly(n). This is not exactly the same phrasing we used
before, but can be shown to be essentially equivalent to it.
The reason the two conjectures are equivalent is the following.
Before, we phrased the conjecture as recovering s′ from a pair (A′, y)
where y = A′s′ + e and |ei| ≤ δq for every i. We then showed a
search to decision reduction (Theorem 11.1) demonstrating that this
is equivalent to the task of distinguishing between this case and the
case that y is a random vector. If we now let α = ⌊q/2⌋ and β = α^{−1}
(mod q), and consider the matrix A = (−βy|A′) and the column
vector s = (α, s′), we see that As = e. Note that if y is a random vector in
Z_q^m then so is −βy, and so the current form of the conjecture follows
from the previous one. (To reduce the number of free parameters, we
fixed δ to equal 1/√q; in this form the conjecture becomes stronger
as q grows.)
Proof of Lemma 15.3. The proof is quite simple. EVAL will simply add
the ciphertexts as vectors in Z q . If c = ∑ ci then
Figure 15.3: In a trapdoor generator, we have two ways to generate randomized algorithms.
That is, we have some algorithms GEN and GEN′ such that GEN outputs
a pair (Gs, s) and GEN′ outputs G′ with Gs, G′ being themselves algorithms (e.g.,
randomized circuits). The conditions we require are that (1) the descriptions of the
circuits Gs and G′ (considering them as distributions over strings) are computationally
indistinguishable, (2) the distribution G′(1^n) is statistically indistinguishable from the
uniform distribution, and (3) there is an efficient algorithm that given the secret “trapdoor”
s can distinguish the output of Gs from the uniform distribution. In particular (1), (2),
and (3) together imply that it is not feasible to extract s from the description of Gs.
• The distributions GEN(1^n)1 (i.e., the first output of GEN(1^n))
and GEN′(1^n) are computationally indistinguishable
Figure 15.4: The “Bootstrapping Theorem” shows that once a partially homomorphic
encryption scheme is homomorphic with respect to a rich enough family of functions,
and specifically a family that contains its own decryption algorithm, then it can be
converted to a fully homomorphic encryption scheme that can be used to evaluate any
function.
Figure 15.5: To build a castle from radioactive Lego bricks, which can be kept safe in
a special ziploc bag for 10 seconds, we can: 1) Place the bricks in a bag, and place the
bag inside an outer bag. 2) Manipulate the inner bag through the outer bag to remove
the bricks from it in 9 seconds, and spend 1 second putting one brick in place 3) Now
the outer bag has 9 seconds of life left, and we can put it inside a new bag and repeat
the process.
continue this process by putting the (i + 1)-st bag inside the (i + 2)-nd bag,
and so on and so forth.
Proof. The idea behind the proof is simple but ingenious. Recall
that the NAND gate b, b′ ↦ ¬(b ∧ b′) is a universal gate that allows
us to compute any function f : {0,1}^n → {0,1} that can
be efficiently computed. Thus, to obtain a fully homomorphic
encryption it suffices to obtain a function NANDEVAL such that
Dd(NANDEVAL(c, c′)) = Dd(c) NAND Dd(c′). (Note that this is
stronger than the typical notion of homomorphic evaluation since we
require that NANDEVAL outputs an encryption of b NAND b′ when
given any pair of ciphertexts that decrypt to b and b′ respectively,
regardless of whether these ciphertexts were produced by the encryption
algorithm or by some other method, including the NANDEVAL
procedure itself.)
Figure 16.1: In the “naive” version of the GSW encryption, to encrypt a bit b we
output an n × n matrix C such that Cs = bs where s ∈ Z nq is the secret key. In this
scheme we can transform encryptions C, C ′ of b, b′ respectively to an encryption C ′′ of
N AND (b, b′ ) by letting C ′′ = I − CC ′ .
succeed since

(C + C′)s = (b + b′)s (16.1)

and

(CC′)s = C(b′s) = b′(Cs) = bb′s . (16.2)

b ∈ {0,1} and “short” e satisfying |ei| ≤ √q for all i. This yields a
natural candidate for an encryption scheme where we encrypt b by a
matrix C satisfying Cs = bs + e where e is a “short” vector.3

3 We deliberately leave some flexibility in the definition of “short”. While
initially “short” might mean that |ei| < √q for every i, decryption will
succeed as long as |ei| is, say, at most q/100.

We can now try to check what adding and multiplying two matrices
does to the noise. If Cs = bs + e and C′s = b′s + e′ then

(C + C′)s = (b + b′)s + (e + e′) (16.3)

and

(CC′)s = C(b′s + e′) = bb′s + (b′e + Ce′) . (16.4)

The first noise term b′e is small, but since the entries of C can be
arbitrary elements of Z_q, the term Ce′ need not be short at all.
If you think about it hard enough, it turns out that there is something
known as the “binary basis” that allows us to encode a number
x ∈ Z_q as a vector x̂ ∈ {0,1}^{log q}.4 What’s even more surprising is
that this seemingly trivial trick turns out to be immensely useful.
We will define the binary encoding of a vector or matrix x over Z_q by
x̂. That is, x̂ is obtained by replacing every coordinate xi with log q
coordinates xi,0, . . . , xi,log q−1 such that

xi = ∑_{j=0}^{log q−1} 2^j xi,j . (16.5)

4 If we were being pedantic the length of the vector (and other constants below)
should be the integer ⌈log q⌉ but I omit the ceiling symbols for simplicity of
notation.
(C ⊕ C ′ )v = (C + C ′ )v = (b + b′ )v + (e + e′ ) (16.6)
Figure 16.2: We can encode a vector s ∈ Z_qⁿ as a vector ŝ ∈ Z_q^{n log q} that has only entries in {0, 1} by using the binary encoding, replacing every coordinate of s with a log q-sized block in ŝ. The decoding operation is linear and so we can write s = Qŝ for a specific (simple) n × (n log q) matrix Q.
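Concretely, the binary encoding and the matrix Q of Figure 16.2 look as follows (a small sketch with illustrative parameters):

```python
# hat(s) lists the bits of each coordinate of s (LSB first), and Q is
# the block matrix of powers of two, so that s = Q @ hat(s) (mod q).
import numpy as np

q, n, logq = 256, 3, 8
s = np.array([13, 200, 77])

s_hat = np.array([(int(si) >> j) & 1 for si in s for j in range(logq)])

Q = np.zeros((n, n * logq), dtype=int)
for i in range(n):
    Q[i, i * logq:(i + 1) * logq] = [2**j for j in range(logq)]

assert np.array_equal(Q @ s_hat % q, s % q)    # decoding is linear
```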
and

$(C \otimes C')v = \widehat{CQ^\top}\, C'v = \widehat{CQ^\top}(b'v + e').$    (16.7)

$\widehat{CQ^\top}(b' Q^\top s + e') = b'\, CQ^\top s + \widehat{CQ^\top} e' = b' Cv + \widehat{CQ^\top} e'$    (16.8)

$C \wedge C' = (I - C \otimes C')$    (16.9)
FHEENC:

• Key generation: As in the scheme of the last lecture, the secret key is s ∈ Z_qⁿ and the public key is a generator G_s such that samples from G_s(1ⁿ) are indistinguishable from independent random samples from Z_qⁿ, but if c is output by G_s then |⟨c, s⟩| < √q, where the inner product (as all other computations) is done modulo q, and for every x ∈ Z_q = {0, . . . , q − 1} we define |x| = min{x, q − x}. As before, we can assume that s₁ = ⌊q/2⌋, which implies that (Q⊤s)₁ is in the interval (0.499q, 0.51q) (see Figure 16.5).
Figure 16.4: In our fully homomorphic encryption, the public key is a trapdoor generator G_s. To encrypt a bit b, we output $C = \widehat{(bQ^\top + D)}$ where D is an (n log q) × n matrix whose rows are generated using G_s.
Figure 16.5: We decrypt a ciphertext $C = \widehat{(bQ^\top + D)}$ by looking at the first coordinate of CQ⊤s (or equivalently, CQ⊤Qŝ). If b = 0 then this equals the first coordinate of Ds, which is at most √q in magnitude. If b = 1 then we get an extra term of (Q⊤s)₁, which we set to be in the interval (0.499q, 0.51q). We can think of either s or ŝ as our secret key.
Once we obtain 1-4 above, we can plug FHEENC into the Boot-
strapping Theorem (Theorem 15.5) and thus complete the proof
of existence of a fully homomorphic encryption scheme. We now
address those points one by one.
16.5.1 Correctness
Proof. For starters, let us see that the dimensions make sense: the encryption of b is computed by $C = \widehat{(bQ^\top + D)}$ where D is an (n log q) × n matrix satisfying |(Ds)ᵢ| ≤ √q for every i, and I is the (n log q) × (n log q) identity matrix.

Cv = (bQ⊤ + D)s = bv + Ds    (16.11)

but by construction |(Ds)ᵢ| ≤ √q for every i. ∎

matrix yields a random matrix) and hence the matrix bQ⊤ + D (and so also the matrix $\widehat{(bQ^\top + D)}$) contains no information about b. This
16.5.3 Homomorphism

$\widehat{CQ^\top}\, C'v = \widehat{CQ^\top}(b'v + e') = b'\,\widehat{CQ^\top} Q^\top s + \widehat{CQ^\top} e' = b'(Cv) + \widehat{CQ^\top} e' = bb'v + b'e + \widehat{CQ^\top} e'$    (16.13)

But since $\widehat{CQ^\top}$ is a 0/1 matrix with every row of length n log q, for every i we have $(\widehat{CQ^\top} e')_i \le (n \log q)\max_j |e'_j|$. We see that the noise vector in the product has magnitude at most µ(C) + n log q · µ(C′). Adding the identity for the NAND operation adds at most µ(C) + µ(C′) to the noise, and so the total noise magnitude is bounded by the righthand side of Eq. (16.12). ∎
In our case we can think of the secret key as the binary string ŝ, which describes our vector s as a bit string of length n log q. Given a ciphertext C, the decryption algorithm takes the dot product modulo q of s with the first row of CQ⊤ (or, equivalently, the dot product of ŝ with the first row of CQ⊤Q) and outputs 0 (respectively 1) if the resulting number is small (respectively large).
1. For every ŝ ∈ {0, 1}n such that |⟨ŝ, c⟩| < 0.1q, f (ŝ) = 0
2. For every ŝ ∈ {0, 1}n such that 0.4q < |⟨ŝ, c⟩| < 0.6q, f (ŝ) = 1
Note that |∑ ŝᵢc̃ᵢ − ∑ ŝᵢcᵢ| < mq/m¹⁰ = q/m⁹, so now we want to show that the effect of taking modulo q̃ is not much different from taking modulo q. Indeed, note that this sum (before a modular reduction) is an integer between 0 and qm. If x is such an integer and we divide x by q to write x = kq + r for r < q, then since x < qm, k < m, and so we can write x = kq̃ + k(q − q̃) + r, so the difference between x mod q and x mod q̃ will be (in our standard modular metric) at most mq/m¹⁰ = q/m⁹. Overall we get that if ∑ ŝᵢcᵢ mod q is in the interval [0.4q, 0.6q] then ∑ ŝᵢc̃ᵢ (mod q̃) will be in the interval [0.4q − 100q/m⁹, 0.6q + 100q/m⁹], which is contained in [0.3q̃, 0.7q̃]. ∎
This completes the proof that our scheme can fit into the boot-
strapping theorem (i.e., of Theorem 16.1), hence completing the
description of the fully homomorphic encryption scheme.
To be completed
17
Multiparty secure computation I: Definition and Honest-But-Curious to Malicious compiler
But this paradigm goes well beyond this. For example, second
price (or Vickrey) auctions are known as a way to incentivize bidders
to bid their true value. In these auctions, every potential buyer sends
a sealed bid, and the item goes to the highest bidder, who only needs
to pay the price of the second-highest bid. We could imagine a digital
version, where buyers send encrypted versions of their bids. The auc-
tioneer could announce who the winner is and what was the second
largest bid, but could we really trust him to do so faithfully? Perhaps
we would want an auction where even the auctioneer doesn’t learn
anything about the bids beyond the identity of the winner and the
value of the second highest bid? Wouldn't it be great if there was a trusted party with whom all bidders could share their private values, and it would announce the results of the auction but nothing more than that? This could be useful not just in second price auctions but to implement many other mechanisms, especially if you are a Danish sugar beet farmer.
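To see why the paradigm is attractive here, note that the ideal functionality for such an auction is a one-liner; all the difficulty lies in realizing it without the trusted party. A sketch:

```python
# The ideal second-price (Vickrey) functionality: the trusted party
# sees all bids but announces only the winner and the second-highest bid.
def vickrey(bids):
    """bids: dict mapping bidder to bid; returns (winner, price paid)."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    return ranked[0], bids[ranked[1]]

assert vickrey({"alice": 10, "bob": 7, "carol": 3}) == ("alice", 7)
```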
S such that for every set of inputs { xi }i∈[k]\T the following two
distributions are computationally indistinguishable:
Here are some good exercises to make sure you follow the defini-
tion:
cycle and otherwise outputs (0, 0). Prove that a protocol for computing F is a zero knowledge proof⁵ system for the language of Hamiltonicity.⁶

• Let F be the k-party functionality that on inputs x₁, . . . , x_k ∈ {0, 1} outputs to all parties the majority value of the xᵢ's. Then, in any protocol that securely computes F, for any adversary that controls less than half of the parties, if at least n/2 + 1 of the other parties' inputs equal 0, then the adversary will not be able to cause an honest party to output 1.

⁵ Actually, if we want to be pedantic, this is what's known as a zero knowledge argument system, since soundness is only guaranteed against efficient provers. However, this distinction is not important in almost all applications.

⁶ Our treatment of the input graph H is an instance of a general case. While the definition of a functionality only talks about private inputs, it's very easy to include public inputs as well. If we want to include some public input Z we can simply have Z concatenated to all the private inputs (and have the functionality check that they are all the same, otherwise outputting an error or some similar result).

P  It is an excellent idea for you to pause here and try to work out at least informally these exercises.
There is in fact not a single theorem but rather many variants of this fundamental theorem, obtained by a great many people, depending
on the different security properties desired, as well as the different
cryptographic and setup assumptions. Some of the issues studied in
the literature include the following:
• The fact that in the ideal model the adversary needs to choose its
queries independently means that the adversary cannot get any
information about the honest parties’ bids before deciding on its
bid.
• Despite all parties using their signing keys as inputs to the pro-
tocol, we are guaranteed that no one will learn anything about
another party’s signing key except the single signature that will be
produced.
• Note that if i is the highest bidder and j is the second highest, then at the end of the protocol we get a valid signature using sᵢ on a transaction transferring xⱼ bitcoins to v₁, despite i not knowing the value xⱼ (and in fact never learning the identity of j). Nonetheless, i is guaranteed that the signature produced will be on an amount not larger than its own bid, and an amount that one of the other bidders actually bid for.
• On the other side, a company might wish to split its own key between several servers residing in different countries, to ensure that no single one of them is completely under one jurisdiction. Or it might do such splitting for technical reasons, so that if there is a break-in at a single site, the key is not compromised.
There are several other such examples. One problem with this
approach is that splitting a cryptographic key is not the same as
cutting a 100 dollar bill in half. If you simply give half of the bits to
each party, you could significantly harm security. (For example, it is
possible to recover the full RSA key from only 27% of its bits).
Secret sharing solves the problem of protecting the key “at rest”
but if we actually want to use the secret key in order to sign or de-
crypt some message, then it seems we need to collect all the pieces
together into one place, which is exactly what we wanted to avoid
doing. This is where multiparty secure computation comes into play: we can define a functionality F taking public input m and secret inputs s₁, . . . , s_k and producing a signature or decryption of m. In fact, we can go beyond that and even have the parties sign or decrypt a message without them knowing what this message is, except that it satisfies some conditions.
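For contrast with cutting the 100 dollar bill in half, here is a sketch of simple additive secret sharing mod q: any proper subset of the shares is uniformly distributed, so (unlike handing out half the bits of an RSA key) it reveals nothing at all about the key.

```python
# Additive secret sharing: shares are random subject to summing to the
# key mod Q; reconstruction needs all k shares.
import secrets

Q = 2**128

def share(key, k):
    shares = [secrets.randbelow(Q) for _ in range(k - 1)]
    shares.append((key - sum(shares)) % Q)     # forces the sum to be key
    return shares

def reconstruct(shares):
    return sum(shares) % Q

key = secrets.randbelow(Q)
assert reconstruct(share(key, 5)) == key
```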
1. A protocol for the “honest but curious” case using fully homomor-
phic encryption.
2. A reduction of the general case into the “honest but curious” case
where the adversary follows the protocol precisely but merely
attempts to learn some information on top of the output that it
is “entitled to” learn. (This reduction is based on zero knowledge
proofs and is due to Goldreich, Micali and Wigderson)
We will focus on the case of two parties. The same ideas extend to
k > 2 parties but with some additional complications.
The problem is that at every step Alice proves that there exists some input x₁ that can explain her message, but she doesn't prove that it's the same input for all messages. If Alice were being truly honest, she would pick her input once and use it throughout the protocol, and she could not compute the first message according to the input x₁ and then the third message according to some input x₁′ ≠ x₁. Of course we can't have Alice reveal the input, as this would violate security. The solution is for Alice to commit in advance to the input. We have seen commitments before, but let us now formally define the notion:
We will not prove security but will only sketch it here; see Section 7.3.2 in Goldreich's survey for a more detailed proof:
• To argue that we maintain security for Alice we use the zero knowl-
edge property: we claim that Bob could not learn anything from
the zero knowledge proofs precisely because he could have sim-
ulated them by himself. We also use the hiding property of the
commitment scheme. To prove security formally we need to show
that whatever Bob learns in the modified protocol, he could have
learned in the original protocol as well. We do this by simulating
Bob by replacing the commitment scheme with commitment to
some random junk instead of x1 and the zero knowledge proofs
with their simulated version. The proof of security requires a hybrid argument, and it is again a good exercise to try on your own.
We can repeat this transformation for Bob (or Charlie, David, etc. in the k > 2 party case) to transform a protocol secure in the honest
but curious setting into a protocol secure (allowing for aborts) in the
malicious setting.
Note that Alice knows r. Bob doesn't know r, but because he chose r″ after Alice committed to r′ he knows that it must be fully random regardless of Alice's choice of r′. It can be shown that if we use this coin tossing protocol at the beginning and then modify the zero knowledge proofs to show that mᵢ = f(x₁, r, m₁, . . . , mᵢ₋₁), where r is the string that is consistent with the transcript of the coin tossing
That is, S, which only gets the input xt and output yt , can sim-
ulate all the information that an honest-but-curious adversary
controlling party t will view.
Let F be a two party functionality. Let's start with the case that F is deterministic and that only Alice receives an output. We'll later show an
easy reduction from the general case to this one. Here is a suggested
protocol for Alice and Bob to run on inputs x, y respectively so that
Alice will learn F ( x, y) but nothing more about y, and Bob will learn
nothing about x that he didn’t know before.
Figure 18.1: An honest but curious protocol for two party computation using a fully
homomorphic encryption scheme with circuit privacy.
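In pseudocode, the protocol of Figure 18.1 is very short. The sketch below is written against a hypothetical FHE interface (gen/enc/eval/dec are placeholder names, not a real library):

```python
# Honest-but-curious two-party computation from FHE: Alice learns
# F(x, y); Bob only ever sees an encryption of x under Alice's key.
def two_party_hbc(F, x, y, fhe):
    d, e = fhe.gen()                    # Alice: secret key d, public key e
    c = fhe.enc(e, x)                   # Alice -> Bob: e and E_e(x)
    c_out = fhe.eval(e, lambda x_: F(x_, y), c)   # Bob: homomorphically
                                        # evaluates x -> F(x, y), y hardwired
    return fhe.dec(d, c_out)            # Bob -> Alice: c_out; Alice decrypts
```

As the discussion below makes clear, the eval step must satisfy circuit privacy for this to be secure for Bob.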
First, note that if Alice and Bob both follow the protocol, then
indeed at the end of the protocol Alice will compute F ( x, y). We now
claim that Bob does not learn anything about Alice’s input:
(In fact, Claim B holds even against a malicious strategy of Bob- can
you see why?)
We would now hope that we can prove the same regarding Alice's security. That is, prove the following:
So, it turns out that Claim A is not generically true. The reason is the following: the definition of fully homomorphic encryption only requires that EVAL(f, E(x)) decrypts to f(x), but it does not require that it hides the contents of f. For example, for every FHE, if we modify EVAL(f, c) to append to the ciphertext the first 100 bits of the description of f (and have the decryption algorithm ignore this extra information) then this would still be a secure FHE.² Now we didn't exactly specify how we describe the function f(x) defined as x ↦ F(x, y), but there are clearly representations in which the first 100 bits of the description would reveal the first few bits of the hardwired constant y, hence meaning that Alice will learn those bits from Bob's message.

² It's true that strictly speaking, we allowed EVAL's output to have length at most n, while this would make the output be n + 100, but this is just a technicality that can be easily bypassed, for example by having a new scheme that on security parameter n runs the original scheme with parameter n/2 (and hence will have a lot of "room" to pad the output of EVAL with extra bits).

Thus we need to get a stronger property, known as circuit privacy. This is a property that's useful in other contexts where we use FHE. Let us now define it:
| P [ A(d, EVAL( f , Ee ( x1 ), . . . , Ee ( xℓ ))) = 1] − P [ A(d, EVAL( f ′ , Ee ( x1 ), . . . , Ee ( xℓ ))) = 1]| < negl (n).
(18.1)
The algorithm A above gets the secret key as input, but still cannot
distinguish whether the EVAL algorithm used f or f ′ . In fact, the
expression on the lefthand side of Eq. (18.1) is equal to zero when the
scheme satisfies perfect circuit privacy.
That is,
where once again, these probabilities are taken only over the
coins of the algorithms EVAL and E.
If you find Definition 18.5 hard to parse, the most important points
you need to remember about it are the following:
(The third point, which goes without saying, is that you can al-
ways ask clarifying questions in class, Piazza, sections, or office
hours. . . )
We will not provide the full details, but together these lemmas show that EVAL can use bootstrapping to reduce the magnitude of the noise to roughly $2^{n^{0.1}}$ and then add an additional random noise of roughly, say, $2^{n^{0.2}}$, which would make it statistically indistinguishable from the actual encryption. Here are some hints on how to make this work: the idea is that in order to "re-randomize" a ciphertext C we need a very noisy encryption of zero and add it to C. The normal encryption will use noise of magnitude $2^{n^{0.2}}$, but we will provide an encryption of the secret key with smaller noise magnitude $2^{n^{0.1}}/\mathrm{polylog}(n)$, so we can use bootstrapping to reduce the noise. The main idea that allows us to add noise is that at the end of the day, our scheme boils down to LWE instances that have the form (c, σ) where c is a random vector in $\mathbb{Z}_q^{n-1}$ and σ = ⟨c, s⟩ + a where a ∈ [−η, +η] is a small noise addition. If we take any such input and add to σ some a′ ∈ [−η′, +η′] then we create the effect of completely re-randomizing the noise. However, completely analyzing this requires a non-trivial amount of care and work.
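A back-of-the-envelope sketch of this "noise flooding" step, with purely illustrative magnitudes:

```python
# "Noise flooding": adding fresh noise a' with |a'| <= eta_big >> eta
# makes sigma + a' nearly independent of the original noise a. For two
# noise values a0, a1 the flooded distributions are uniform on intervals
# of width 2*eta_big + 1 shifted by |a0 - a1| <= 2*eta, so their
# statistical distance is at most 2*eta / (2*eta_big + 1).
import random

eta, eta_big = 10, 10**6
rng = random.Random(0)

def flood(a):
    return a + rng.randint(-eta_big, eta_big)   # re-randomized noise
```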
18.3 Bottom line: An honest but curious two party secure computation protocol
For much of the history of mankind, people believed that the ul-
timate “theory of everything” would be of the “billiard ball” type.
That is, at the end of the day, everything is composed of some ele-
mentary particles and adjacent particles interact with one another
according to some well specified laws. The types of particles and
laws might differ, but not the general shape of the theory. Note that
this in particular means that a system of N particles can be simulated
by a computer with poly( N ) memory and time.
Figure 19.2: In the double slit experiment, opening two slits can actually cause some
positions to receive fewer electrons than before.
quantum computing.
So, what is this Bell’s Inequality? Suppose that Alice and Bob try
to convince you they have telepathic ability, and they aim to prove
it via the following experiment. Alice and Bob will be in separate 9
If you are extremely paranoid about
closed rooms.9 You will interrogate Alice and your associate will in- Alice and Bob communicating with one
terrogate Bob. You choose a random bit x ∈ {0, 1} and your associate another, you can coordinate with your
assistant to perform the experiment
chooses a random y ∈ {0, 1}. We let a be Alice’s response and b be exactly at the same time, and make sure
Bob’s response. We say that Alice and Bob win this experiment if that the rooms are so that Alice and Bob
a ⊕ b = x ∧ y. couldn’t communicate to each other in
time the results of the coin even if they
do so at the speed of light.
Now if Alice and Bob are not telepathic, then they need to agree in advance on some strategy. The most general form of such a strategy is that Alice and Bob agree on some distribution over a pair of functions f, g : {0, 1} → {0, 1}, such that Alice will set a = f(x) and Bob will set b = g(y). Therefore, the following claim, which is basically Bell's Inequality,¹⁰ implies that Alice and Bob cannot succeed in this game with probability higher than 3/4:

¹⁰ This form of Bell's game was shown by CHSH.
Proof: The main idea is for Alice and Bob to first prepare a 2-qubit quantum system in the state (up to normalization) |00⟩ + |11⟩ (this is known as an EPR pair). Alice takes the first qubit in this system to her room, and Bob takes the second qubit to his room. Now, when Alice receives x, if x = 0 she does nothing and if x = 1 she applies the unitary map R_{π/8} to her qubit, where

$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$

When Bob receives y, if y = 0 he does nothing and if y = 1 he applies the unitary map R_{−π/8} to his qubit. Then each one of them measures their qubit and sends this as their response. Recall that to win the game Bob and Alice want their outputs to be more likely to differ if x = y = 1 and to be more likely to agree otherwise.

If x = y = 0 then the state does not change and Alice and Bob always output either both 0 or both 1, and hence in both cases a ⊕ b = x ∧ y. If x = 0 and y = 1 then after Alice measures her bit, if she gets 0 then Bob's state is equal to cos(π/8)|0⟩ − sin(π/8)|1⟩, which will equal 0 with probability cos²(π/8). The case that Alice gets 1, or that x = 1 and y = 0, is symmetric, and so in all the cases where x ≠ y (and hence x ∧ y = 0) the probability that a = b will be cos²(π/8) ≥ 0.85. For the case that x = 1 and y = 1, direct calculation via trigonometric identities yields that all four options for (a, b) are equally likely, and hence in this case a = b with probability 0.5. The overall probability of winning the game is at least (1/4)·1 + (1/2)·0.85 + (1/4)·0.5 = 0.8. QED
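Both numbers are easy to check mechanically: brute force over all 16 deterministic classical strategies confirms the 3/4 bound of the claim, and plugging cos²(π/8) into the case analysis above recovers the 0.8 bound. A sketch:

```python
# CHSH game: classically a XOR b must equal x AND y; shared randomness
# cannot beat the best deterministic strategy.
import itertools, math

def win_prob(f, g):   # f, g: tuples (value on 0, value on 1)
    return sum((f[x] ^ g[y]) == (x & y) for x in (0, 1) for y in (0, 1)) / 4

best = max(win_prob(f, g)
           for f in itertools.product((0, 1), repeat=2)
           for g in itertools.product((0, 1), repeat=2))
assert best == 0.75   # Bell's inequality for classical strategies

quantum = 1/4 * 1 + 1/2 * math.cos(math.pi / 8)**2 + 1/4 * 0.5
assert quantum > 0.8  # the lower bound computed in the proof above
```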
Proof sketch: The proof is not hard but we only sketch it here. The
general idea can be illustrated in the case that there exists a single x ∗
satisfying f ( x ∗ ) = 1. (There is a classical reduction from the general
case to this problem.) As in Simon’s algorithm, we can efficiently ini-
tialize an n-qubit system to the uniform state u = 2−n/2 ∑ x∈{0,1}n | x ⟩
which has 2−n/2 dot product with | x ∗ ⟩. Of course if we measure u,
we only have probability (2−n/2 )2 = 2−n of obtaining the value x ∗ .
Our goal would be to use O(2n/2 ) calls to the oracle to transform the
system to a state v with dot product at least some constant ϵ > 0 with
the state | x ∗ ⟩.
Now, let θ be the angle between u and x*⊥. These vectors are very close to each other and so θ is very small but not zero: it is equal to sin⁻¹(2^{−n/2}), which is roughly 2^{−n/2}. Now if our state v has angle α ≥ 0 with u, then as long as α is not too large (say α < π/8), this means that v has angle α + θ with x*⊥. That means that U∗v

time.
The order finding problem allows us not just to factor integers in polynomial time, but also to solve the discrete logarithm over arbitrary groups.
Figure 20.1: If f is a periodic function then when we represent it in the Fourier trans-
form, we expect the coefficients corresponding to wavelengths that do not evenly
divide the period to be very small, as they would tend to “cancel out”.
Shor carried out this approach for the group H = Z ∗q for some
q, but we will start by seeing this for the group H = {0, 1}ⁿ with
the XOR operation. This case is known as Simon’s algorithm (given
by Dan Simon in 1994) and actually preceded (and inspired) Shor’s
algorithm:
Note that given O(n) such samples, we can recover h∗ with high
probability by solving the corresponding linear equations.
one of these products and look at all 2ⁿ choices y ∈ {0, 1}ⁿ (with yᵢ = 0 corresponding to picking |0⟩ and yᵢ = 1 corresponding to picking |1⟩ in the i-th product) we get $2^{-n} \sum_{x \in \{0,1\}^n} \sum_{y \in \{0,1\}^n} (-1)^{\langle x, y \rangle} |y\rangle |f(x)\rangle$. Now under our assumptions, for every particular z in the image of f, there exist exactly two preimages x and x ⊕ h* such that f(x) = f(x ⊕ h*) = z. So, if ⟨y, h*⟩ = 0 (mod 2), we get that $(-1)^{\langle x, y \rangle} + (-1)^{\langle x \oplus h^*, y \rangle} = \pm 2$, and otherwise we get $(-1)^{\langle x, y \rangle} + (-1)^{\langle x \oplus h^*, y \rangle} = 0$. Therefore, if we measure the state we will get a pair (y, z) such that ⟨y, h*⟩ = 0 (mod 2). QED ∎
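The "solving the corresponding linear equations" step mentioned above is ordinary Gaussian elimination over GF(2). Here is a sketch (assuming the measured ys span the full (n−1)-dimensional space orthogonal to h*, so the solution is unique):

```python
# Classical post-processing for Simon's algorithm: given samples y with
# <y, h*> = 0 (mod 2), recover the nonzero h orthogonal to all of them.
import numpy as np

def solve_for_h(ys, n):
    A = np.array(ys) % 2
    pivots, row = [], 0
    for col in range(n):                       # row-reduce A over GF(2)
        hits = [r for r in range(row, len(A)) if A[r, col]]
        if not hits:
            continue
        A[[row, hits[0]]] = A[[hits[0], row]]
        for r in range(len(A)):
            if r != row and A[r, col]:
                A[r] = (A[r] + A[row]) % 2
        pivots.append(col)
        row += 1
    free = [c for c in range(n) if c not in pivots][0]  # one free column
    h = np.zeros(n, dtype=int)
    h[free] = 1
    for r, col in enumerate(pivots):
        h[col] = A[r, free]                    # back-substitution mod 2
    return h

assert list(solve_for_h([[0, 0, 1], [1, 1, 0], [1, 1, 1]], 3)) == [1, 1, 0]
```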
where ω = e^{2πi/m}.

$\langle \chi_x, \chi_z \rangle = \frac{1}{m} \sum_{y \in \mathbb{Z}_m} \omega^{xy} \omega^{-zy} = \frac{1}{m} \sum_{y \in \mathbb{Z}_m} \omega^{(x-z)y}.$    (20.1)

Note that

$\hat{f}(x) = \frac{1}{\sqrt{m}} \sum_{y \in \mathbb{Z}_m} f(y)\, \omega^{xy}.$
The crux of the algorithm is the FFT equations, which allow the
problem of computing FTm , the problem of size m, to be split into
two identical subproblems of size m/2 involving computation of
FTm/2 , which can be carried out recursively using the same elemen-
tary operations. (Aside: Not every divide-and-conquer classical
algorithm can be implemented as a fast quantum algorithm; we are
really using the structure of the problem here.)
5. Move the LSB to the most significant position (state: |0⟩(FT_{m/2} f_even + W·FT_{m/2} f_odd) + |1⟩(FT_{m/2} f_even − W·FT_{m/2} f_odd) = f̂).

The final state is equal to f̂ by the FFT equations (we leave this as an exercise).
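For comparison, here is the classical recursion implementing the same even/odd split (normalization 1/√m omitted, m a power of two); the quantum circuit performs these steps directly on the amplitudes:

```python
# Radix-2 FFT with the omega = e^{2*pi*i/m} convention used above:
# FT_m(f) splits into FT_{m/2} of the even and odd entries, combined
# with "twiddle" factors w^x.
import cmath

def ft(f):
    m = len(f)
    if m == 1:
        return list(f)
    even, odd = ft(f[0::2]), ft(f[1::2])
    w = [cmath.exp(2j * cmath.pi * x / m) for x in range(m // 2)]
    return ([even[x] + w[x] * odd[x] for x in range(m // 2)] +
            [even[x] - w[x] * odd[x] for x in range(m // 2)])
```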
Theorem 20.6 — Order finding algorithm, restated. There is a polynomial-time quantum algorithm that on input A, N (represented in binary) finds the smallest r such that A^r = 1 (mod N).
We now describe the algorithm and the state, this time including
normalizing factors.
The claim concludes the proof since it implies that x/m = a/r where a is a random integer less than r. Now for every r, at least Ω(r/log r) of the numbers in [r − 1] are co-prime to r. Indeed, the prime number theorem says that there are at least this many primes in this interval, and since r has at most log r prime factors, all but log r of these primes are co-prime to r. Thus, when the algorithm computes a rational approximation for x/m, the denominator it will find will indeed be r.
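The rational approximation step is classical and cheap; in fact Python's Fraction.limit_denominator implements exactly the continued-fraction computation discussed below, so recovering r looks like this (parameters illustrative):

```python
# Recover the order r from a measurement outcome x ~ am/r: find the
# best rational approximation to x/m with denominator below N.
from fractions import Fraction

def recover_r(x, m, N):
    return Fraction(x, m).limit_denominator(N - 1).denominator

m = 2**20                                  # number of amplitudes
assert recover_r(round(5 * m / 12), m, 100) == 12   # e.g. r = 12, a = 5
```

This succeeds exactly when a is coprime to r, which by the argument above happens for an Ω(1/log r) fraction of the outcomes.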
If c does not divide x then ω^{rx} is a c-th root of unity different from 1, so $\sum_{\ell=0}^{c-1} \omega^{r \ell x} = 0$ by the formula for sums of geometric progressions. Thus, such a number x would be measured with zero probability. But if x = cj then ω^{rℓx} = ω^{rcℓj} = ω^{mℓj} = 1, and hence the amplitudes of all such x's are equal for all j ∈ {0, 1, . . . , r − 1}.
Thus all that is left is to prove the next two lemmas. The first
shows that there are Ω(r/ log r ) values of x that satisfy the above
two conditions, and the second shows that each is measured with probability Ω((1/√r)²) = Ω(1/r).
Proof of Lemma 1 We prove the lemma for the case that r is co-
prime to m, leaving the general case to the reader. In this case, the
map x 3→ rx (mod m) is a permutation of Z ∗m . There are at least
Ω(r/ log r ) numbers in [1..r/10] that are coprime to r (take primes
in this range that are not one of r’s at most log r prime factors) and
hence Ω(r/ log r ) numbers x such that rx (mod m) = xr − ⌊ xr/m⌋m
is in [1..r/10] and coprime to r. But this means that ⌊rx/m⌋ cannot have a nontrivial shared factor with r, as otherwise this factor would be shared with rx (mod m) as well.
integers.

$\alpha = \lfloor \alpha \rfloor + \frac{1}{R}.$    (20.3)

If we continue this process for n steps, we get a rational number, denoted by [a₀, a₁, . . . , aₙ], which can be represented as pₙ/qₙ with pₙ, qₙ coprime. The following facts can be proven using induction:
the solution of Alice sending to Bob a Brink’s truck with the shared
secret key. People have proposed some other ways to use the inter-
esting properties of quantum mechanics for cryptographic purposes
including quantum money and quantum software protection.
21
Software Obfuscation
• Public key encryption and digital signatures that enable Alice and Bob
to set up such a virtually secure channel without sharing a prior key.
This enables our “information economy” and protects virtually
every financial transaction over the web. Moreover, it is the crucial mechanism for supplying "over the air" software updates to smart devices, whether they are phones, cars, thermostats, or anything else. Some had predicted that this invention would change the nature of our form of government to crypto anarchy, and while this may be hyperbole, governments everywhere are worried about this invention.
(BTW all of the above points are notions that you should be familiar with and be able to explain the security guarantees of, if you ever need to use them, for example, in the unlikely event that you ever find yourself needing to take a cryptography final exam. . . )
It turns out that the answer is yes. Here are some scenarios that are
still not covered by the above tools:
This will allow us to give the key d1 to the manager of the sales
department (and not worry about her taking the key with her if she
leaves the company), or more generally give every employee a key
that corresponds to his or her role. Furthermore, if the company re-
ceives a subpoena for all emails relating to a particular topic, it could
give out a cryptographic key that reveals precisely these emails and
nothing else. It could also run a spam filter on encrypted messages
without needing to give the server performing this filter access to the
full contents of the messages (and so perhaps even outsource spam
filtering to a different company).
• For every function f : {0, 1}ℓ → {0, 1}, if (d, e) = G (1n ) and
d f = KeyDist(d, f ), then for every message m, Dd f ( Ee (m)) =
f ( m ).
5. Eve wins if b′ = b.
It’s not only exotic forms of encryption that we’re missing. Here is
another application that is not yet solved by the above tools. From
time to time software companies discover a vulnerability in their
products. For example, they might discover that feeding an input x of some particular form (e.g., satisfying a regular expression R) to a server running their software could give an adversary unlimited access to it. In such a case, you might want to release a patch that modifies the software to check if R(x) = 1 and if so rejects the input. However, the fear is that hackers who didn't know about the vulnerability before could discover it by examining the patch and then use it to attack the customers who are slow to update their software. Could we come up, for a regular expression R, with a program P such that P(x) = 1 if and only if R(x) = 1, but examining the code of P doesn't make it any easier to find some x satisfying R?
• A(O(C ))
(Note that the distributions above are of a single bit, and so being indistinguishable simply means that the probability of outputting 1 is equal in both cases up to a negligible additive factor.)
The writings of Diffie and Hellman, James Ellis, and others who thought of public key encryption show that one of the first approaches they considered was to use obfuscation to transform a private-key encryption scheme into a public key one. That is, given a private key encryption scheme (E, D) we can transform it to a public key encryption scheme (G, E′, D) by having the key generation algorithm select a private key k ←_R {0, 1}ⁿ that will serve as the decryption key, and let the encryption key e be the circuit O(C) where O is an obfuscator and C is a circuit mapping c to E_k(c). The new encryption algorithm E′ takes e and c and simply outputs e(c).
We will now show the proof of Theorem 21.3. For starters, note
that obfuscation is trivial for learnable functions. That is, if F is a
function such that given black-box access to F we can recover a
circuit that computes it, then we can obfuscate it. Given a circuit C,
the obfuscator O will simply use it as a black box to learn a circuit
C ′ that computes the same function and output it. Since O itself only
uses black box access to C, it can be trivially simulated perfectly.
(Verifying that this is indeed the case is a good way to make sure you
followed the definition.)
However, this is not so useful, since it’s not hard to see that all
the examples above where we wanted to use obfuscation involved
functions that were unlearnable. But it already suggests that we
should use an unlearnable function for our negative result. Here is
an extremely simple unlearnable function. For every α, β ∈ {0, 1}n ,
we define Fα,β : {0, 1}n → {0, 1}n to be the function that on input x
outputs β if x = α and otherwise outputs 0n .
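Written as code the tension is immediate: F_{α,β} is unlearnable from black-box queries, yet any natural program computing it carries α and β in plain view, and this is what the proof below exploits.

```python
# The point function F_{alpha,beta}: black-box queries almost never hit
# alpha, but anyone handed this source code reads off alpha and beta.
import secrets

n = 128
alpha, beta = secrets.randbits(n), secrets.randbits(n)

def F(x):
    return beta if x == alpha else 0       # 0^n everywhere except alpha
```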
Given black box access to this function for random α, β, it's extremely unlikely that we would hit α with a polynomial number of queries, and hence we will not be able to recover β, and so in particular will not be able to learn a circuit that computes F_{α,β}.²

² Pseudorandom functions can be used to construct examples of functions that are unlearnable in the much stronger sense that we cannot achieve the machine learning goal of outputting some circuit that approximately predicts the function.

This function already yields a counterexample for a stronger version of the VBB definition. We define a strong VBB obfuscator to be a compiler O that satisfies the above definition for adversaries that can output not just one bit but an arbitrarily long string. We can now prove the following:
Lemma 21.4 There does not exist a strong VBB obfuscator.
$\left| \Pr[D_{\alpha,\beta}(A(O(F_{\alpha,\beta}))) = 1] - \Pr[D_{\alpha,\beta}(S^{F_{\alpha,\beta}}(1^{10n})) = 1] \right| > 0.9$    (∗)

Clearly (∗) implies that these two distributions are not indistinguishable, and so proving (∗) will finish the proof. The algorithm D_{α,β} on input a circuit C′ will simply output 1 iff C′(α) = β. By the definition of a compiler and the algorithm A, for every α, β, Pr[D_{α,β}(A(O(F_{α,β}))) = 1] = 1.
The adversary in the proof of Lemma 21.4 does not seem very
impressive. After all, it merely printed out its input. Indeed, the
definition of strong VBB security might simply be overkill, and "plain" VBB is enough for almost all applications. However, as men-
tioned above, plain VBB is impossible to achieve as well. We’ll prove
a slightly weaker version of Theorem 21.3:
(To get the original theorem from this, note that if VBB obfuscation exists then we can transform any private key encryption into a fully homomorphic public key encryption.)
output γ iff Dd (c′ ) = β and otherwise output 0n . And for the input
1n , it will output c. For all other inputs it will output 0n .
We will use this function family where d, e are the keys of the FHE,
and c = E_e(α). We now define our adversary A. On input some circuit C′, A will compute c = C′(1ⁿ) and let C″ be the circuit that on input x outputs C′(00x). It will then let c″ = EVAL_e(C″, c). Note that if c is an encryption of α and C′ computes F = F_{d,e,c,α,β,γ} then c″ will be an encryption of F(00α) = β. The adversary A will then compute γ′ = C′(01c″) and output 1 iff γ′ = γ.
$\left| \Pr[D(A(O(F_{d,e,c,\alpha,\beta,\gamma}))) = 1] - \Pr[D(S^{F_{d,e,c,\alpha,\beta,\gamma}}(1^{|F_{d,e,c,\alpha,\beta,\gamma}|})) = 1] \right| \ge 0.1$    (21.1)
Definition 21.6 — Indistinguishability Obfuscation. We say a compiler O is an indistinguishability obfuscator (IO) if for every two circuits C, C′ that have the same size and compute the same function, the random variables O(C) and O(C′) are computationally indistinguishable.
1. IO is impossible to achieve.
However, it turns out that this guess is (most likely) wrong. New
results have shown that IO is extremely useful for many applications,
including those outlined above. They also gave some evidence that it
might be possible to achieve. We’ll talk about those works in the next
lecture.
22
More obfuscation, exotic encryptions
More generally, even in the private key setting, people have stud-
ied encryption schemes such as
h_id = H(id) and let b = log_g h_id. Then an encryption of m has the form (h′ = g^c, H′(id‖φ(g^a, h_id)^c) ⊕ m), and so the second term is equal to H′(id‖ĝ^{abc}) ⊕ m. However, since d_id = h_id^a = g^{ab}, we get that φ(h′, d_id) = ĝ^{abc} and hence decryption will recover the message. QED
• The keys are generated and Eve gets the master public key.
Proof: Suppose for the sake of contradiction that there exists some
time T = poly(n) adversary A that succeeds in the IBE-CPA with
probability at least 1/2 + ϵ for some non-negligible ϵ. We assume
without loss of generality that whenever A makes a query to the key
distribution function with id id or a query to H ′ with prefix id, it had
already previously made the query id to H. (A can be easily modified
to have this behavior)
• When A makes a query to H with id, then for all but the i₀-th query, B will choose a random b_id ∈ {0, . . . , |G|} (as usual we'll assume |G| is prime), choose e_id = g^{b_id} and define H(id) = e_id.
Let id₀ be the i₀-th query A made to the oracle. We define H(id₀) = g^b (where g^b is the input to B; recall that B does not know b).
• When A makes a query to the H′ oracle with input id′‖ĥ, then for all but the j₀-th query B answers with a random string in {0, 1}ℓ. In the j₀-th query, if id′ ≠ id₀ then B stops and fails. Otherwise, it outputs ĥ.
Proof: If A does not make this query then the message in the
challenge is XOR’ed by a completely random string and A cannot
distinguish between m0 and m1 in this case with probability better
than 1/2. QED
We will now show how using such a multilinear map we can get
a construction for a witness encryption scheme. We will only show
the construction, without talking about the security definition, the
assumption, or security reductions.
• Anonymous routing is about ensuring that Alice and Bob can com-
municate without that fact being revealed.
23.1 Steganography
In the public key setting, suppose that Bob publishes a public key e
for an encryption scheme that has pseudorandom ciphertexts. That is,
23.3 Tor
23.4 Telex
23.5 Riposte
24
Ethical, moral, and policy dimensions to cryptography
All that said, significant changes often pose non trivial dangers,
and it is important to have an informed and reasoned discussion of
the ways cryptography can help or harm the general and private
good.
• Are we less or more secure today than in the past? In what ways
did the balance between government and individuals shift in the
last few decades? Do governments have more or less data and
tools for monitoring individuals at their disposal? Do individuals
and non-governmental groups have more or less ability to inflict
harm (and hence need to be protected against)?
The impetus for the current iteration of the security vs privacy debate was the Snowden revelations on the massive scale of surveillance by the NSA on citizens in the U.S. and around the globe. Concurrently, in plain sight, companies such as Apple, Google, Facebook, and others are also collecting massive amounts of information on their users. Some of the backlash to the Snowden revelations was increased pressure on companies to support stronger "end-to-end" encryption, so that some data does not reside on companies' servers, which have become suspect. We're now seeing some "backlash to the backlash" with law enforcement and government officials around the globe trying to ban such encryption technology or mandate government backdoors.
We’ve mentioned this case in the past. (I also blogged about it.)
The short summary is that an iPhone belonging to one of the San
Bernardino terrorists was found by the FBI. The iPhone’s memory
was encrypted by a key k that is obtained as H (uid∥ passcode) where
passcode is the six digit passcode of the user and uid is a secret 128
bit key that is hardwired into the processor. The processor will
only allow ten attempts at guessing the passcode before erasing all
memory. The FBI wanted Apple's help in creating a digitally signed software update that essentially runs a brute force search over the 10⁶ passcodes and outputs the key k. The software update could be restricted to run only on that particular iPhone. Eventually, the
FBI managed to extract the information out of the iPhone without
Apple’s help. The method they used is unknown, but it may be
possible to physically extract the uid from the processor. It might
also be possible to prevent erasure of the memory by disconnecting
it from the processor, or rewriting it after erasure. Would such cases
change your position on this question?
• Given that the FBI had a legal warrant for the information on
the iPhone, was it wrong of Apple to refuse to provide the help
required?
• Was it wrong for Apple to have designed their iPhone so that they
are unable to easily extract information out of it? Should they be
required to make sure that such devices can be searched as a result
of a legal warrant?
• If the only way for the FBI to get the information was to get Ap-
ple's master signature key (which allows one to completely break into any iPhone, and even turn it into a recording/surveillance device),
would it have been OK for them to do it? Should Apple design
their device in a way that even their master signature key cannot
break them? Is that even possible, given that software updates are
crucial for proper functioning of such devices? (It was recently
claimed that Canadian police has had access to the master decryp-
tion key of Blackberry since 2010.)
In the San Bernardino case, the utility of breaking into the phone
was questioned, given that both perpetrators were killed and there
was no evidence of them receiving any assistance. But there are cases
where things are more complicated. Brittney Mills was 29 years old
and 8 months pregnant when she was shot and killed in April 2015
in Baton Rouge, Louisiana. Her baby was delivered via emergency
C section but also died a week later. There was no sign of forced
entry and so it is quite likely she knew her assailant. Her family
believes that the clues to her murderer’s identity could be found in
her iPhone, but since it is locked they have no way of extracting this
information. One can imagine other cases as well. Recently a mother
found her kidnapped daughter using the Find my iPhone procedure.
It is not hard to conceive of a case where unlocking a phone is the
key to saving someone’s life. Would such cases change your view of
the above questions?
We've also mentioned the Juniper backdoor case. This was a break-in to the firewalls of Juniper Networks by an unknown party that was crucially enabled by a backdoor allegedly inserted by the NSA into the Dual EC pseudorandom generator (see also here and here for more).
• While we talked about bitcoin, the TLS protocol, two factor au-
thentication systems, and some aspects of pretty good privacy, we
restricted ourselves to abstractions of these systems and did not
attempt a full "end to end" analysis of a complete system. I do hope you have learned the tools that would let you understand the full operation of such a system if you need to.
I did not intend this course to teach you how to implement crypto-
graphic algorithms, but I do hope that if you need to use cryptogra-
phy at any point, you now have the skills to read up what’s needed,
and be able to argue intelligently about the security of real-world
systems. I also hope that you now have sufficient background to not be scared by the technical jargon and the abundance of adjectives in cryptography research papers, and to be able to read up on what you need to follow any paper that is interesting to you.
Mostly, I just hope you enjoyed this last term and felt like this
course was a good use of your time. I certainly did.
Bibliography