April 19, 2015: 10:58am © 2015 Avinash Kak, Purdue University
Goals:
To introduce the rudiments of encryption/decryption vocabulary.
To trace the history of some early approaches to cryptography
and to show through this history a common failing of humans to
get carried away by the technological and scientific hubris of the
moment.
Python scripts that give you pretty good security for confidential
communications. Only good for fun, though.
CONTENTS

Section   Title
2.1
2.2
2.3       Caesar Cipher
2.4
2.5       Monoalphabetic Ciphers
2.5.1
2.6
2.6.1
2.7       Multiple-Character Encryption to Mask Plaintext Structure: The Playfair Cipher
2.7.1
2.7.2
2.7.3
2.7.4
2.8
2.9
2.9.1
2.10      Transposition Techniques
2.11
2.12      Homework Problems
Lecture 2
Each character of a message is replaced by a character three positions down in the alphabet.
plaintext:
ciphertext:
In these formulas, k would be the secret key. As mentioned earlier, E() stands for encryption. By the same token, D() stands
for decryption.
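The roles of k, E() and D() can be made concrete with a minimal sketch. It assumes that the letters are numbered a=0 through z=25 and that the shift wraps around modulo 26; the function names encrypt and decrypt are chosen here for illustration and are not from the code archive for this lecture.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed numbering: a=0, b=1, ..., z=25

def encrypt(plaintext, k):
    """E(): shift each letter k positions down the alphabet, wrapping at z.
    Non-letters (spaces, punctuation) are dropped."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + k) % 26].upper()
        for ch in plaintext.lower() if ch in ALPHABET
    )

def decrypt(ciphertext, k):
    """D(): undo the shift by moving each letter k positions back up."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - k) % 26]
        for ch in ciphertext.lower() if ch in ALPHABET
    )

if __name__ == "__main__":
    secret = encrypt("are you ready", 3)   # k = 3 gives the classical Caesar cipher
    print(secret)
    print(decrypt(secret, 3))
```

Since k can take only 25 useful values, anyone who suspects a shift cipher can simply try every k.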
If you did not know anything about the underlying plaintext and
it was encrypted by a Base64 sort of an algorithm, it might not
be as trivial a cryptographic system as it might seem. But, of
course, if the word ever got out that your plaintext was in Swahili,
you'd be hosed.
Finally, here is more regarding the slogan "All internet communications are character based" in the red-and-blue note on the
previous page: As you will see in Lecture 16, internet communications are governed by the TCP/IP protocol. That protocol
itself does not care whether you put on the wire a purely character based file, an audio file, a video file, etc. The protocol would
work equally well with all sorts of files. So, strictly speaking, the
slogan is technically wrong. Nonetheless, the slogan is of great
practical importance because the software that is charged with
the task of making your data file available to the TCP/IP engine
in your computer could corrupt your data if it is not based on
just printable characters.
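To see what "based on just printable characters" buys you, here is a small sketch that uses Python's standard base64 module to turn arbitrary bytes into printable characters and back. The helper is_printable_ascii is a name made up for this illustration.

```python
import base64

# Arbitrary binary data, including bytes that are not printable characters.
raw = bytes(range(256))

# Base64 output uses only A-Z, a-z, 0-9, '+', '/' and the padding character '='.
encoded = base64.b64encode(raw)
decoded = base64.b64decode(encoded)

def is_printable_ascii(data):
    """True if every byte lies in the printable ASCII range 32..126."""
    return all(32 <= b < 127 for b in data)

if __name__ == "__main__":
    print(is_printable_ascii(raw), is_printable_ascii(encoded))
    print(decoded == raw)
```

The price is size: every 3 bytes of input become 4 characters of output.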
The Caesar cipher you just saw is an example of a monoalphabetic cipher. Basically, in a monoalphabetic cipher, you have
a substitution rule that gives you a replacement ciphertext letter
for each letter of the alphabet used in the plaintext message.
Let's now consider what one would think would be a very strong
monoalphabetic cipher. We will make our substitution letters a
random permutation of the 26 letters of the alphabet:
plaintext letters:       f .....
substitution letters:    b .....
Wouldn't such a large key space make this cipher extremely difficult to break? Not really, as we explain next!
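To get a feel for the numbers, the following sketch counts the possible keys (every permutation of the 26 letters is a key, so there are 26! of them) and builds one random substitution key. The names random_key, encrypt and decrypt are illustrative, and Python's random module is used only for convenience; it is not a cryptographically secure source of randomness.

```python
import math
import random
import string

LETTERS = string.ascii_lowercase

# Every permutation of the 26 letters is a valid key.
keyspace_size = math.factorial(26)

def random_key(rng):
    """Return a dict mapping each plaintext letter to its substitution letter."""
    shuffled = list(LETTERS)
    rng.shuffle(shuffled)
    return dict(zip(LETTERS, shuffled))

def encrypt(plaintext, key):
    return "".join(key[ch].upper() for ch in plaintext.lower() if ch in key)

def decrypt(ciphertext, key):
    inverse = {sub: plain for plain, sub in key.items()}
    return "".join(inverse[ch] for ch in ciphertext.lower() if ch in inverse)

if __name__ == "__main__":
    key = random_key(random.Random())
    print(keyspace_size)
    print(encrypt("attack at dawn", key))
```

The count lies between 2^88 and 2^89, so trying every key is out of the question. The weakness lies elsewhere.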
So this would seem to be the answer to our prayers for an unbreakable code for symmetric encryption.
If you know the nature of plaintext, any substitution cipher, regardless of the size of the key space, can be broken easily with a
statistical attack.
Figure 1 shows the relative frequencies for the letters of the English alphabet in a sample of English text. Obviously, by comparing this distribution with a histogram for the letters occurring
in a piece of ciphertext, you may be able to establish the true
identities of the ciphertext letters.
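A histogram of this sort takes only a few lines to compute. The sketch below is an illustration only: the function names are made up here, and it treats everything other than the letters a to z as noise.

```python
from collections import Counter
import string

def letter_frequencies(text):
    """Relative frequency, in percent, of each letter a-z in text."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    if total == 0:
        return {}
    return {ch: 100.0 * counts[ch] / total for ch in string.ascii_lowercase}

def rank_letters(text):
    """Letters ordered from most to least frequent (ties broken alphabetically)."""
    freqs = letter_frequencies(text)
    return sorted(freqs, key=lambda ch: (-freqs[ch], ch))

if __name__ == "__main__":
    sample = "it was the best of times it was the worst of times"
    print(rank_letters(sample)[:5])
```

Lining up the ranking obtained from a long ciphertext with the ranking for ordinary English gives a first guess at the substitution. For short ciphertexts the histogram is noisy and the guess needs manual correction.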
Shown in Table 1 are the digram frequencies. The table does not
include digrams whose relative frequencies are below 0.47. (A
complete table of frequencies for all possible digrams would have
676 entries in it.)
If we have available to us the relative frequencies for all possible digrams, we can represent this table by the joint probability
p(x, y) where x denotes the first letter of a digram and y the
second letter. Such joint probabilities can be used to compare
the digram-based statistics of ciphertext and plaintext.
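The joint probability p(x, y) can be estimated from any sample of text by counting overlapping letter pairs. The sketch below makes one simplifying assumption that should be kept in mind: it discards spaces and punctuation before pairing, so some counted digrams straddle word boundaries. The function name is chosen for this illustration.

```python
from collections import Counter
import string

def digram_probabilities(text):
    """Estimate p(x, y): the probability that letter x is followed by letter y."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    pairs = Counter(zip(letters, letters[1:]))   # overlapping pairs
    total = sum(pairs.values())
    return {x + y: n / total for (x, y), n in pairs.items()}

if __name__ == "__main__":
    p = digram_probabilities("the then")
    for digram, prob in sorted(p.items(), key=lambda item: -item[1]):
        print(digram, round(prob, 3))
```

Running the same function on a ciphertext and on a reference plaintext gives two tables whose largest entries can be matched against each other.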
digram   frequency        digram   frequency
th       3.15             to       1.11
he       2.51             nt       1.10
an       1.72             ed       1.07
in       1.69             is       1.06
er       1.54             ar       1.01
re       1.48             ou       0.96
es       1.45             te       0.94
on       1.45             of       0.94
ea       1.31             it       0.88
ti       1.28             ha       0.84
at       1.24             se       0.84
st       1.21             et       0.80
en       1.20             al       0.77
nd       1.18             ri       0.77
or       1.13             ng       0.75

trigrams:   and   ent   ion   tio   for   nde   .....
2.7: MULTIPLE-CHARACTER ENCRYPTION TO MASK PLAINTEXT STRUCTURE: THE PLAYFAIR CIPHER
So how about destroying some of that structure by mapping multiple characters at a time to ciphertext characters?
One of the best known approaches in classical encryption that carries out multiple-character substitution is known as the Playfair
cipher, which is described in the next subsection.
[5 × 5 Playfair matrix; the letters I and J share one cell]
1. Two plaintext letters that fall in the same row of the 5 × 5 matrix are replaced by letters to the right of each in the row. The
rightness property is to be interpreted circularly in each row,
meaning that the first entry in each row is to the right of the
last entry. Therefore, the pair of letters bf in plaintext will get
replaced by CA in ciphertext.
2. Two plaintext letters that fall in the same column are replaced
by the letters just below them in the column. The belowness
property is to be considered circular, in the sense that the topmost
entry in a column is below the bottom-most entry. Therefore, the
pair ol of plaintext will get replaced by CV in ciphertext.
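The two rules can be sketched in code. The keyword "smythework" used below is an assumption: it is chosen because the matrix it produces is consistent with both worked examples in the rules (bf becomes CA, ol becomes CV). The function names are illustrative. The final branch, for letters that share neither a row nor a column, is the standard Playfair rectangle rule and is not taken from the two rules stated above.

```python
import string

def build_matrix(keyword):
    """Return the 5 x 5 matrix as five row strings; 'j' is folded into 'i'."""
    seen = []
    for ch in keyword.lower() + string.ascii_lowercase:
        ch = "i" if ch == "j" else ch
        if ch in string.ascii_lowercase and ch not in seen:
            seen.append(ch)
    return ["".join(seen[r * 5:(r + 1) * 5]) for r in range(5)]

def locate(matrix, ch):
    """Return (row, column) of a letter in the matrix."""
    ch = "i" if ch == "j" else ch
    for r, row in enumerate(matrix):
        if ch in row:
            return r, row.index(ch)
    raise ValueError("not a letter: %r" % ch)

def encrypt_pair(matrix, pair):
    a, b = pair.lower()
    if a == b:
        raise ValueError("a pair must consist of two different letters")
    (r1, c1), (r2, c2) = locate(matrix, a), locate(matrix, b)
    if r1 == r2:      # rule 1: same row, take the letter to the right (circular)
        out = matrix[r1][(c1 + 1) % 5] + matrix[r2][(c2 + 1) % 5]
    elif c1 == c2:    # rule 2: same column, take the letter below (circular)
        out = matrix[(r1 + 1) % 5][c1] + matrix[(r2 + 1) % 5][c2]
    else:             # standard rectangle rule: own row, other letter's column
        out = matrix[r1][c2] + matrix[r2][c1]
    return out.upper()

MATRIX = build_matrix("smythework")   # assumed keyword, see the note above

if __name__ == "__main__":
    for row in MATRIX:
        print(" ".join(row.upper()))
    print(encrypt_pair(MATRIX, "bf"), encrypt_pair(MATRIX, "ol"))
```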
Before the substitution rules are applied, you must insert a chosen
filler letter (let's say it is "x") between any repeating letters in
the plaintext. So a plaintext word such as "hurray" becomes
"hurxray".
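The filler step is easy to automate. The sketch below inserts the filler between any two identical adjacent letters, exactly as stated above; some descriptions of Playfair insert the filler only when the repeated letters would land in the same pair, so treat this as one reading of the rule. The function name insert_filler is made up for this illustration.

```python
def insert_filler(plaintext, filler="x"):
    """Insert the filler letter between any two identical adjacent letters."""
    out = []
    for ch in plaintext:
        if out and out[-1] == ch:
            out.append(filler)
        out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    print(insert_filler("hurray"))
    print(insert_filler("balloon"))
```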
As expected, the cipher does alter the relative frequencies associated with the individual letters and with digrams and with
trigrams, but not sufficiently.
The Hill cipher takes a very different (more mathematical) approach to multi-letter substitution, as we describe in what follows.
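Now we can transform three letters at a time from the plaintext, the letters being represented by the numbers p1, p2, and
p3, into three ciphertext letters c1, c2, and c3 in their numerical
representations by

$$\vec{C} = [K]\,\vec{P} \bmod 26$$

where $\vec{P} = (p_1, p_2, p_3)^T$ and $\vec{C} = (c_1, c_2, c_3)^T$. Decryption uses the inverse of [K] modulo 26:

$$[K]^{-1}\,\vec{C} \bmod 26 = [K]^{-1}[K]\,\vec{P} \bmod 26 = \vec{P}$$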
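Here is a small sketch of the matrix arithmetic. The 3 × 3 key matrix K and its inverse modulo 26, K_INV, are assumptions: they are a widely used textbook example, not values taken from this lecture. Letters are numbered a=0 through z=25, and the function names are illustrative.

```python
# Assumed example key (a common textbook choice) and its inverse modulo 26.
K = [[17, 17, 5],
     [21, 18, 21],
     [2, 2, 19]]
K_INV = [[4, 9, 15],
         [15, 17, 6],
         [24, 0, 17]]

def mat_vec_mod26(matrix, vec):
    """Multiply a 3 x 3 matrix by a length-3 vector, reducing modulo 26."""
    return [sum(m * v for m, v in zip(row, vec)) % 26 for row in matrix]

def transform(text, matrix):
    """Apply the matrix to the text three letters at a time."""
    nums = [ord(ch) - ord("a") for ch in text.lower()]
    if len(nums) % 3:
        raise ValueError("text length must be a multiple of 3")
    out = []
    for i in range(0, len(nums), 3):
        out.extend(mat_vec_mod26(matrix, nums[i:i + 3]))
    return "".join(chr(n + ord("a")) for n in out)

def encrypt(plaintext):
    return transform(plaintext, K).upper()

def decrypt(ciphertext):
    return transform(ciphertext, K_INV)

if __name__ == "__main__":
    c = encrypt("pay")
    print(c, decrypt(c))
```

Not every matrix will do as a key: [K] must have an inverse modulo 26, which requires its determinant to share no factor with 26.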
In the Vigenere cipher, you first align the encryption key with
the plaintext message. [If the plaintext message is longer than the encryption
key, you can repeat the encryption key, as we show below where the encryption key
is abracadabra.]
key:          abracadabraabracadabraabracadabraab
plaintext:    canyoumeetmeatmidnightihavethegoods
ciphertext:   CBEYQUPEFKMEBK.....................
Since, in general, the encryption key will be shorter than the message to be encrypted, for the Vigenere cipher the key is repeated,
as mentioned previously and as illustrated in the above example
where the key is the string abracadabra.
encryption key      plaintext letters
letter              a b c d ............

                    substitution letters
a                   A B C D ............
b                   B C D E ............
c                   C D E F ............
d                   D E F G ............
e                   E F G H ............
.                   . . . .
.                   . . . .
z                   Z A B C ............
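Looking up a row of this table for a key letter amounts to a Caesar shift by the position of that letter in the alphabet (a=0, b=1, and so on). The sketch below repeats the key over the message with itertools.cycle. The function names are illustrative, and the code assumes that both the key and the message contain letters only.

```python
from itertools import cycle
import string

A = string.ascii_lowercase

def encrypt(plaintext, key):
    """Shift each plaintext letter by the alphabet position of its key letter."""
    return "".join(
        A[(A.index(p) + A.index(k)) % 26].upper()
        for p, k in zip(plaintext.lower(), cycle(key.lower()))
    )

def decrypt(ciphertext, key):
    """Undo the shift applied by encrypt()."""
    return "".join(
        A[(A.index(c) - A.index(k)) % 26]
        for c, k in zip(ciphertext.lower(), cycle(key.lower()))
    )

if __name__ == "__main__":
    c = encrypt("canyoumeetmeatmidnightihavethegoods", "abracadabra")
    print(c)
    print(decrypt(c, "abracadabra"))
```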
Since there exist in the output multiple ciphertext letters for each
plaintext letter, you would expect that the relative frequency distribution would be effectively destroyed. But as can be seen in
the plots in Figure 2, a great deal of the input statistical distribution still shows up in the output. [The plot shown for Vigenere cipher is for an
encryption key that is just 9 letters long.]
Obviously, the longer the encryption key, the greater the masking
of the structure of the plaintext. The best possible key is as long
as the plaintext message and consists of a purely random sequence of letters drawn from the 26 letters of the alphabet. This would yield the
ideal plot shown in Figure 2. The ideal plot is labeled "Random
polyalphabetic" in that figure.
We will now talk about a different notion in classical cryptography: permuting the plaintext.
This is how a pure permutation cipher could work: You write
your plaintext message along the rows of a matrix of some size.
You generate ciphertext by reading along the columns. The order
in which you read the columns is determined by the encryption
key:
key:           4 1 3 6 2 5

plaintext:     m e e t m e
               a t m i d n
               i g h t f o
               r t h e g o
               d i e s x y

ciphertext:    ETGTIMDFGXEMHHEMAIRDENOOYTITES
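The example can be reproduced with a short sketch. It interprets the key as giving, for each column, its position in the read-out order (the column labeled 1 is read first), which is the reading that reproduces the ciphertext shown. The function names are illustrative, and the plaintext is assumed to be already padded to fill the matrix.

```python
def encrypt(plaintext, key):
    """Write the plaintext along rows, then read the columns in key order."""
    ncols = len(key)
    if len(plaintext) % ncols:
        raise ValueError("pad the plaintext to a multiple of the key length")
    rows = [plaintext[i:i + ncols] for i in range(0, len(plaintext), ncols)]
    # Column whose key entry is 1 is read first, then the one labeled 2, etc.
    order = sorted(range(ncols), key=lambda c: key[c])
    return "".join("".join(row[c] for row in rows) for c in order).upper()

def decrypt(ciphertext, key):
    """Refill the columns in key order, then read along the rows."""
    ncols = len(key)
    nrows = len(ciphertext) // ncols
    order = sorted(range(ncols), key=lambda c: key[c])
    columns = {}
    for n, c in enumerate(order):
        columns[c] = ciphertext[n * nrows:(n + 1) * nrows]
    return "".join(columns[c][r] for r in range(nrows) for c in range(ncols)).lower()

if __name__ == "__main__":
    key = [4, 1, 3, 6, 2, 5]
    c = encrypt("meetmeatmidnightforthegodiesxy", key)
    print(c)
    print(decrypt(c, key))
```

Note that a pure permutation leaves the single-letter frequencies of the plaintext untouched; only the positions of the letters change.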
If your goal is to establish a medium-strength secure communication link, you may be able to get by without having to resort to
the full-strength crypto systems that we will be studying in later
lectures.
$$[A \oplus B] \oplus C = A \oplus [B \oplus C]$$

$$A \oplus A = 0$$

$$A \oplus 0 = A$$
Differential XORing destroys any repetitive patterns in the messages to be encrypted and makes it more difficult to break encryption by statistical analysis. Differential XORing needs an
Initialization Vector that, as already mentioned, is derived from
a pass phrase in the script shown below.
#!/usr/bin/env python

###  EncryptForFun.py
###  Avi Kak  (kak@purdue.edu)
###  January 21, 2014

###  Call syntax:
###
###       EncryptForFun.py  message_file.txt  output.txt

#(A)
if len(sys.argv) != 3:
    sys.exit("Needs two command-line arguments, one for "
             "the message file and the other for the "
             "encrypted output file")
#(B)
BLOCKSIZE = 64
numbytes = BLOCKSIZE // 8
#(C)
#(D)
#(E)
#(F)
#(G)
#(H)
#(I)
#(J)
#(K)
#(L)
#(M)
#(N)
#(O)
#(P)
#(Q)
#(R)
#(S)
#(T)
#(U)
    bv_read = bv.read_bits_from_file(BLOCKSIZE)                      #(V)
    if len(bv_read) < BLOCKSIZE:                                     #(W)
        bv_read += BitVector(size = (BLOCKSIZE - len(bv_read)))      #(X)
    bv_read ^= key_bv                                                #(Y)
    bv_read ^= previous_block                                        #(Z)
    previous_block = bv_read.deep_copy()                             #(a)
    msg_encrypted_bv += bv_read                                      #(b)
outputhex = msg_encrypted_bv.getHexStringFromBitVector()             #(c)
#(d)
#(e)
#(f)
In the script shown above, if the size (in terms of the number of
bits) of the message file is not an integral multiple of BLOCKSIZE,
the script appends a sequence of null bytes (that is, bytes made
up of all zeros) at the end so that this condition is satisfied. This
is done in lines (W) and (X) of the script.
The reader may wish to compare the decryption logic in the loop
in lines (U) through (b) of the script shown below with the encryption logic shown in lines (S) through (b) of the script above.
#!/usr/bin/env python

###  DecryptForFun.py
###  Avi Kak  (kak@purdue.edu)
###  January 21, 2014

###  Call syntax:
###
###       DecryptForFun.py  encrypted_file.txt  recover.txt

#(A)
if len(sys.argv) != 3:
    sys.exit("Needs two command-line arguments, one for "
             "the encrypted file and the other for the "
             "decrypted output file")
#(B)
BLOCKSIZE = 64
numbytes = BLOCKSIZE // 8
#(C)
#(D)
#(E)
#(F)
#(G)
#(H)
#(I)
#(J)
#(K)
#(L)
#(M)
#(N)
#(O)
#(P)
#(Q)
#(R)
#(S)
#(U)
#(V)
#(W)
#(X)
#(Y)
#(Z)
#(a)
#(b)
outputtext = msg_decrypted_bv.getTextFromBitVector()
#(c)
#(d)
#(e)
#(f)
To exercise these scripts, enter some text in a file and let's call
this file message.txt. Now you can call the encrypt script by

      EncryptForFun.py  message.txt  output.txt

The script will place the encrypted output, in the form of a hex
string, in the file output.txt. Subsequently, you can call

      DecryptForFun.py  output.txt  recover.txt

to recover the original message from the encrypted output produced by the first script.
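If you would like to experiment with the differential XORing idea without the BitVector module or any files, the following self-contained sketch works on Python bytes objects. It is a simplified stand-in, not the lecture's scripts: it assumes the key and the initialization vector are already exactly one block (8 bytes) long and leaves out the steps that derive them from a key string and a pass phrase. The function names are made up for this illustration.

```python
BLOCKSIZE = 64                  # bits, the same value the scripts use
NUMBYTES = BLOCKSIZE // 8

def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def check_block(name, value):
    if len(value) != NUMBYTES:
        raise ValueError("%s must be exactly %d bytes" % (name, NUMBYTES))

def encrypt(message, key_block, iv):
    """XOR each block with the key and with the previous ciphertext block."""
    check_block("key_block", key_block)
    check_block("iv", iv)
    if len(message) % NUMBYTES:               # pad with null bytes
        message += b"\x00" * (NUMBYTES - len(message) % NUMBYTES)
    previous, out = iv, bytearray()
    for i in range(0, len(message), NUMBYTES):
        block = xor_bytes(message[i:i + NUMBYTES], key_block)
        block = xor_bytes(block, previous)
        previous = block
        out += block
    return bytes(out).hex()                   # hex string, like output.txt

def decrypt(hex_ciphertext, key_block, iv):
    """Reverse the two XORs, using the previous ciphertext block each time."""
    check_block("key_block", key_block)
    check_block("iv", iv)
    data = bytes.fromhex(hex_ciphertext)
    previous, out = iv, bytearray()
    for i in range(0, len(data), NUMBYTES):
        block = data[i:i + NUMBYTES]
        out += xor_bytes(xor_bytes(block, previous), key_block)
        previous = block
    return bytes(out)

if __name__ == "__main__":
    key, iv = b"8bytekey", b"initvect"        # hypothetical one-block values
    c = encrypt(b"hello hello hello", key, iv)
    print(c)
    print(decrypt(c, key, iv))
```

The recovered message carries the null-byte padding that was added during encryption, so a 17-byte message comes back as 24 bytes.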
1. Use the ASCII codes available at http://www.asciitable.com to manually construct a Base64 encoded version of the string "hello\njello".
Your answer should be "aGVsbG8KamVsbG8=". What do you think the
character "=" at the end of the Base64 representation is for? [If
you wish you can also use interactive Python for this. Enter the following sequence of commands: "import
base64" followed by "base64.b64encode('hello\njello')". If you are using Python 3, make sure you
prefix the argument to the b64encode() function by the character "b" to indicate that it is of type bytes as
opposed to of type str. Several string processing functions in Python 3 require bytes type arguments and
often return results of the same type. Educate yourself on the difference between the string str type and bytes
type in Python 3.]
2. A text file named myfile.txt that you created with a run-of-the-mill editor contains just the following word:

      hello

If you examine this file with a command like

      hexdump -C myfile.txt

you are likely to see the following bytes (in hex) in the file:

      68 65 6C 6C 6F 0A

Looks like there are six bytes in the file whereas the word hello
has only five characters. What do you think is going on? Do you
know why your editor might want to place that extra byte in the
file and how to prevent that from happening?
3. All classical ciphers are based on symmetric key encryption. What
does that mean?
4. What are the two building blocks of all classical ciphers?
5. True or false: The larger the size of the key space, the more secure
a cipher? Justify your answer.
6. Give an example of a cipher that has an extremely large key space
size, an extremely simple encryption algorithm, and extremely
poor security.
7. What is the difference between monoalphabetic substitution ciphers and polyalphabetic substitution ciphers?
8. What is the main security flaw in the Hill cipher?
9. What makes the Vigenere cipher more secure than, say, the Playfair
cipher?
all in one line. (You can copy-and-paste this hex ciphertext into
your own script. However, make sure that you delete the backslash at the end of the first line. You can also see the same
output in the file named output5.txt in the code archive for Lecture 2.) Your job is to both recover the original quote and the
encryption key used by mounting a brute-force attack on the encryption/decryption algorithms. (HINT: The logic used in the
scripts implies that the effective key size is only 16 bits when the
BLOCKSIZE variable is set to 16. So your brute-force attack need
search through a keyspace of size only 2^16.)
CREDITS