Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Base64 Decoding Explained

Download as pdf or txt
Download as pdf or txt
You are on page 1of 12

Rene Nyffeneggers base64

Decoding Algorithm Explained


Michael Horii

Intro
Were using Rene Nyffeneggers code to
decode base64 strings.
Code is available at:
http://www.adpgmbh.ch/cpp/common/base64.html

The following slides explain how the algorithm


works.

Encoded Symbols
Suppose we have four encoded symbols to
decode, where each symbol is represented by
a 6-bit vector. For ease of illustration, well
choose the encoded symbols as follows:
Encoded Symbol 1: ES1 = 111111
Encoded Symbol 2: ES2 = 000000
Encoded Symbol 3: ES3 = 111111
Encoded Symbol 4: ES4 = 000000

Decoding the String


To decode the base64 string, we concatenate the 6bit vectors, and group them in 8-bit vectors.
ES=Encoded Symbol

ES1

ES2

ES3

ES4

111111000000111111000000
DS=Decoded Symbol

DS1

DS2

DS3

111111000000111111000000
C

Note we start with a


4-symbol encoded
string, and end up
with a 3-symbol
decoded string

We match each 8-bit vector to a symbol using a


look-up table.
11111100 C
00001111
11000000

A
T

The Catch
This decoding process is straightforward
conceptually, but the catch is that bytes are 8
bits, while the encoded symbols are
represented by 6-bit vectors.
That means well need to use three operators
to accomplish our goal of assembling the bits:
Bit shifting
Masking
ORing

Prepended Encoded Symbols


When we assign a 6-bit vector to a (1-byte) char
variable, two zeroes are prepended to the front
of the 6-bit vector:
111111 becomes 00111111
000000 becomes 00000000
Well refer to a prepended encoded symbol as PES:

PES1 = 00111111
PES2 = 00000000
PES3 = 00111111
PES4 = 00000000

(ES1 = 111111)
(ES2 = 000000)
(ES3 = 111111)
(ES4 = 000000)

Aligning Bit Vectors


Because were fitting 6-bit vectors into 8-bit
containers, we need to align the 6-bit vectors
using bit shifting.
ES1

ES2

ES3

ES4

111111000000111111000000
unsigned char PES1 = 0b111111;
unsigned char PES2 = 0b000000;
DS1
PES1 shifted left by 2
to align the bits

DS2

00111111
00000000

DS3

111111000000111111000000
00111111
00000000 Apply mask to PES2 to get the two digits

(underlined) needed for DS1, and then shift


right by 4 to align the bits

Plan of Action
The following slides show in detail how the
bytes are manipulated, from start to finish, to
achieve our goal.
Steps:
Populate Byte 1 (DS1) using ES1 and ES2
Populate Byte 2 (DS2) using ES2 and ES3
Populate Byte 3 (DS3) using ES3 and ES4
ES=Encoded Symbol

ES1

ES2

ES3

ES4

111111000000111111000000
DS=Decoded Symbol

DS1

DS2

DS3

111111000000111111000000

Populating Byte 1
Byte 1
Prepended ES1 = PES1

111111000000111111000000
00111111

Shift left by 2

11111100
11111100
Prepended ES2 = PES2
00000000
Mask 00110000
Apply Mask

00000000

Blue: Digits
from ES1
Orange: Digits
from ES2

PES1 << 2
To populate Byte 1, we only want the first two nonprepended digits of ES2, so we use a mask to get these
digits

PES2 & 0x30

Shift right by 4

00000000
00000000

(PES2 & 0x30) >> 4

11111100

(PES1 << 2) | ((PES2 & 0x30) >> 4)

Populating Byte 2
Byte 2
PES2

111111000000111111000000
00000000 To populate Byte 2, we only want the
Mask 00001111
last 4 digits of ES2, so we use a mask to
Apply Mask

get these digits

00000000 PES2 & 0xf


Shift left by 4

00000000
00000000
00111111
Mask 00111100

PES3

Apply Mask

00111100
Shift right by 2

(PES2 & 0xf) << 4


To populate Byte 2, we only want the first 4 nonprepended digits of ES3, so we use a mask to get
these digits

PES3 & 0x3c

00001111
00001111 (PES3 & 0x3c) >> 2
00001111 ((PES2 << 0xf) << 4) | ((PES3 & 0x3c) >> 2)

Populating Byte 3
Byte 3
PES3

111111000000111111000000
00111111
Mask 00000011
Apply Mask

00000011

PES3 & 0x3

11000000
00000000
11000000

(PES3 & 0x3) << 6

Shift left by 6

PES4

PES4
((PES3 & 0x3) << 6) |PES4

Putting It All Together


Now we have three bytes, and concatenated
we get the desired byte array.
Byte 1

Byte 2

Byte 3

111111000000111111000000
C

11111100
00001111
11000000

C
A
T

Decoding just requires using the lookup table


to find out what symbol goes with Byte 1, Byte
2, and Byte 3. Using the look-up table shown
earlier, well get CAT as the decoded string.

You might also like