CS252 Graduate Computer Architecture Error Correction Codes April 20, 2011

CS252
Graduate Computer Architecture

Lecture 23

Error Correction Codes
April 20th, 2011
John Kubiatowicz
Electrical Engineering and Computer Sciences
University of California, Berkeley

http://www.eecs.berkeley.edu/~kubitron/cs252
Recall: ECC Approach: Redundancy
Approach: Redundancy
Add extra information so that we can recover from errors
Can we do better than just creating complete copies?
Block Codes: Data coded in blocks
k data bits coded into n encoded bits
Measure of overhead: Rate of Code: k/n
Often called an (n,k) code
Consider data as vectors in GF(2) [i.e. vectors of bits]
Code space is the set of all $2^n$ vectors; data space is the set of $2^k$ vectors
Encoding function: $C = f(d)$
Decoding function: $d = f^{-1}(C)$
Not all possible code vectors, C, are valid!

4/20/2011 cs252-S11, Lecture 23 2
General Idea: Code Vector Space
[Figure: Code Space, showing a data value $v_0$ mapped to code word $C_0 = f(v_0)$ and the Code Distance (Hamming Distance) between code words]
Not every vector in the code space is valid
Hamming Distance (d):
Minimum number of bit flips to turn one code word into another
Number of errors that we can detect: $d-1$
Number of errors that we can fix: $\lfloor (d-1)/2 \rfloor$
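The two capability formulas can be checked numerically. The Python sketch below finds the distance of a small code by comparing every pair of code words; the function names `hamming_distance`, `code_distance`, and `capability` are illustrative, not from the lecture.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("code words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def code_distance(codewords: list[str]) -> int:
    """Code distance d: minimum Hamming distance over all pairs of code words."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))


def capability(d: int) -> tuple[int, int]:
    """(errors detectable, errors correctable) for a code of distance d."""
    return d - 1, (d - 1) // 2


# Example: the 3-bit repetition code {000, 111}
d = code_distance(["000", "111"])
print(d, capability(d))  # 3 (2, 1)
```

Pairwise comparison grows quadratically with the number of code words, so it only suits tiny codes; for a linear code the distance equals the minimum weight of a non-zero code word.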
Some Code Types
Linear Codes: $C = G\,d$, $S = H\,C$
Code is generated by G and lies in the null space of H (i.e. $S = H\,C = 0$ for every valid code word)
(n,k) code: Data space $2^k$, Code space $2^n$
(n,k,d) code: specify distance d as well
Random code:
Need to both identify errors and correct them
Distance d ⇒ correct $\lfloor (d-1)/2 \rfloor$ errors
Erasure code:
Can correct errors if we know which bits/symbols are bad
Example: RAID codes, where symbols are blocks of disk
Distance d ⇒ correct $d-1$ erasures
Error detection code:
Distance d ⇒ detect $d-1$ errors
Hamming Codes
d = 3 ⇒ Columns nonzero, Distinct
d = 4 ⇒ Columns nonzero, Distinct, Odd-weight
Binary Golay code: based on quadratic residues mod 23
Binary code: [24, 12, 8] and [23, 12, 7].
Often used in space-based schemes, can correct 3 errors
Hamming Bound, symbols in GF(2)
Consider an (n,k) code with distance d
How do n, k, and d relate to one another?
First question: How big are spheres?
For distance d, spheres are of radius $\lfloor (d-1)/2 \rfloor$,
i.e. all errors with weight $\lfloor (d-1)/2 \rfloor$ or less must fit within the sphere
Thus, size of sphere is at least:
1 + Num(1-bit err) + Num(2-bit err) + … + Num($\lfloor (d-1)/2 \rfloor$-bit err)
$$\text{Size} = \sum_{e=0}^{\lfloor (d-1)/2 \rfloor} \binom{n}{e}$$
Hamming bound reflects bin-packing of spheres:
need $2^k$ of these spheres within code space
$$2^k \sum_{e=0}^{\lfloor (d-1)/2 \rfloor} \binom{n}{e} \le 2^n \quad\Rightarrow\quad 2^k\,(1+n) \le 2^n \ \text{ for } d = 3$$
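A direct transcription of the bound into Python, as a sketch with illustrative names. The bound is only a necessary condition: passing it does not prove that a code with those parameters exists.

```python
from math import comb


def sphere_size(n: int, d: int) -> int:
    """Number of n-bit words within radius floor((d-1)/2) of a code word."""
    t = (d - 1) // 2
    return sum(comb(n, e) for e in range(t + 1))


def meets_hamming_bound(n: int, k: int, d: int) -> bool:
    """True if 2^k non-overlapping spheres fit in the 2^n-word code space."""
    return 2**k * sphere_size(n, d) <= 2**n


print(meets_hamming_bound(7, 4, 3))    # True  (16 * 8 = 128 = 2^7, an exact fit)
print(meets_hamming_bound(7, 5, 3))    # False (32 * 8 = 256 > 128)
print(meets_hamming_bound(23, 12, 7))  # True  (4096 * 2048 = 2^23, an exact fit)
```

Codes that meet the bound with equality, such as the (7,4) Hamming code and the [23, 12, 7] Golay code, leave no word of the code space outside a sphere.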

How to Generate code words?
Consider a linear code. Need a Generator Matrix.
Let $v_i$ be the data value (k bits), $C_i$ be the resulting code (n bits):
$$C_i = G\,v_i$$
G must be an $n \times k$ matrix
Are there $2^k$ unique code values?
Only if the k columns of G are linearly independent!
Of course, need some way of decoding as well:
$$v_i = f_d(C_i')$$
Is this linear??? Why or why not?
A code is systematic if the data is directly encoded within the code words.
Means Generator has form:
$$G = \begin{pmatrix} I \\ P \end{pmatrix}$$
Can always turn non-systematic code into a systematic one (row ops)
But what is the distance of the code? Not obvious!
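A minimal Python sketch of systematic encoding with $G = \begin{pmatrix} I \\ P \end{pmatrix}$ over GF(2). The particular 3×4 matrix P is an arbitrary choice for illustration, which makes this a (7,4) code.

```python
from itertools import product

K = 4  # data bits
# Parity part P, (n-k) x k. This particular P is an arbitrary example choice.
P = [[0, 1, 1, 1],
     [1, 0, 1, 1],
     [1, 1, 0, 1]]

# Systematic generator G = [I; P] (n x k): identity on top, parity rows below.
G = [[int(r == c) for c in range(K)] for r in range(K)] + P


def encode(G, v):
    """C = G v over GF(2): each code bit is the XOR of the data bits its row selects."""
    return tuple(sum(g * x for g, x in zip(row, v)) % 2 for row in G)


codewords = {encode(G, v) for v in product((0, 1), repeat=K)}
print(len(codewords))           # 16 = 2^k distinct code words
print(encode(G, (1, 0, 1, 1)))  # (1, 0, 1, 1, 0, 1, 0): data bits appear first
```

Because the first k code bits are the data itself, reading the data back out of an error-free word is just truncation. The 16 distinct outputs confirm that the k columns of this G are independent.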
Implicitly Defining Codes by Check Matrix
Consider a parity-check matrix H ($(n-k) \times n$)
Define valid code words $C_i$ as those that give $S_i = 0$ (null space of H):
$$S_i = H\,C_i = 0$$
Size of null space?
nullity(H) = k if H has (n−k) linearly independent columns (i.e. rank n−k)
Suppose we transmit code word C with error:
Model this as vector E which flips selected bits of C to get R (received):
$$R = C \oplus E$$
Consider what happens when we multiply by H:
$$S = H\,R = H\,(C \oplus E) = H\,C \oplus H\,E = H\,E$$
What is distance of code?
Code has distance d if no non-empty sum of d−1 or fewer columns yields 0
I.e. no error vectors, E, of weight < d have zero syndromes
So code design is designing the H matrix
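A quick numerical check of $S = H\,R = H\,E$, written as a Python sketch that uses the 3×7 check matrix from this lecture's Hamming-code example. The particular code word C and error vector E are arbitrary choices.

```python
# H from the (7,4) Hamming example in this lecture.
H = [[1, 0, 0, 0, 1, 1, 1],
     [0, 1, 0, 1, 0, 1, 1],
     [0, 0, 1, 1, 1, 0, 1]]


def syndrome(H, r):
    """S = H r over GF(2)."""
    return tuple(sum(h * x for h, x in zip(row, r)) % 2 for row in H)


def xor(a, b):
    """Bitwise XOR of two equal-length bit vectors (R = C xor E)."""
    return tuple(x ^ y for x, y in zip(a, b))


C = (0, 1, 1, 1, 0, 0, 0)   # a valid code word: H C = 0
E = (0, 0, 0, 0, 0, 1, 0)   # error vector flipping bit 5
R = xor(C, E)               # received word

print(syndrome(H, C))                  # (0, 0, 0)
print(syndrome(H, R), syndrome(H, E))  # (1, 1, 0) (1, 1, 0): S depends only on E
```

Note that a single-bit error in position i yields exactly column i of H as its syndrome, which is what makes correction possible.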
How to relate G and H (Binary Codes)
Defining H makes it easy to understand distance of code,
but hard to generate code (H defines code implicitly!)
However, let H be of following form:
$$H = (P \mid I)$$
P is $(n-k) \times k$, I is $(n-k) \times (n-k)$
Result: H is $(n-k) \times n$
Then, G can be of following form (maximal code size):
$$G = \begin{pmatrix} I \\ P \end{pmatrix}$$
P is $(n-k) \times k$, I is $k \times k$
Result: G is $n \times k$
Notice: G generates values in null-space of H and has k independent columns so generates $2^k$ unique values:
$$S_i = H\,G\,v_i = (P \mid I)\begin{pmatrix} I \\ P \end{pmatrix} v_i = (P + P)\,v_i = 0$$
since $P + P = 0$ over GF(2)
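The identity $H\,G = P + P = 0$ can be checked mechanically. In this Python sketch the sizes (k = 4, n − k = 3) and the 0/1 matrix P are arbitrary examples; any P gives the same result.

```python
K, R = 4, 3  # k data bits and n-k check bits (example sizes)
# Example parity part P ((n-k) x k); any 0/1 matrix gives H G = 0.
P = [[1, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 1, 1, 1]]


def identity(m):
    return [[int(i == j) for j in range(m)] for i in range(m)]


H = [p_row + i_row for p_row, i_row in zip(P, identity(R))]  # (P | I): (n-k) x n
G = identity(K) + P                                          # [I; P]:  n x k


def matmul_gf2(A, B):
    """Matrix product over GF(2); zip(*B) iterates over the columns of B."""
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)] for row in A]


print(matmul_gf2(H, G))  # [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Each column of G is itself a code word, so $H\,G = 0$ says that every generated code word has a zero syndrome.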
Simple example (Parity, d=2)
Parity code (8-bits):
$$H = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}$$
$$G = \begin{pmatrix} 1&1&1&1&1&1&1&1 \\ 1&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&1&0 \\ 0&0&0&0&0&0&0&1 \end{pmatrix}$$
[Figure: encoder XORs data bits $v_7 \ldots v_0$ to form parity bit $c_8$; checker XORs code bits $C_8 \ldots C_0$ to form syndrome $s_0$]
Note: Complexity of logic depends on number of 1s in row!
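In software the parity code reduces to a single XOR fold. A Python sketch follows; placing the data bits $v_0 \ldots v_7$ first and the parity bit $c_8$ last is an assumed ordering.

```python
from functools import reduce
from operator import xor


def parity_encode(v):
    """Append c8 = v0 ^ v1 ^ ... ^ v7 (even parity) to the data bits."""
    return list(v) + [reduce(xor, v, 0)]


def parity_check(c):
    """s0 = XOR of all code bits (H is the all-ones row); 1 means an error was detected."""
    return reduce(xor, c, 0)


code = parity_encode([1, 0, 1, 1, 0, 0, 1, 0])
print(code[-1], parity_check(code))  # 0 0
code[3] ^= 1                         # single-bit error
print(parity_check(code))            # 1: detected, but position unknown
code[5] ^= 1                         # second error
print(parity_check(code))            # 0: double error is missed (d = 2)
```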

Simple example: Repetition (voting, d=3)
Repetition code (1-bit):
$$H = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \qquad G = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$
[Figure: encoder copying $v_0$ to $C_0$, $C_1$, $C_2$; decoder voting on $C_0$, $C_1$, $C_2$ and flagging Error]
Positives: simple
Negatives:
Expensive: only 33% of code word is data
Not packed in Hamming-bound sense (only d=3). Could get much more efficient coding by encoding multiple bits at a time
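A Python sketch of the repetition code, covering the encoder G, the syndrome from H, and majority-vote decoding. The function names are illustrative.

```python
def rep3_encode(v0):
    """G = (1, 1, 1)^T: copy the data bit into C0, C1, C2."""
    return [v0, v0, v0]


def rep3_syndrome(c):
    """H = [[1, 0, 1], [0, 1, 1]]: s0 = C0 ^ C2, s1 = C1 ^ C2."""
    return (c[0] ^ c[2], c[1] ^ c[2])


def rep3_decode(c):
    """Majority vote over the three copies; corrects any single flipped bit."""
    return int(sum(c) >= 2)


for flip in range(3):
    c = rep3_encode(1)
    c[flip] ^= 1
    print(flip, rep3_syndrome(c), rep3_decode(c))
# 0 (1, 0) 1
# 1 (0, 1) 1
# 2 (1, 1) 1
```

Each syndrome equals the column of H for the flipped bit, so the syndrome alone also locates the error.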

Binary Hamming code meets Hamming bound
Recall bound for d=3:
$$2^k\,(1+n) \le 2^n$$
So, rearranging, with $c = n - k$ check bits:
$$1 + n = 1 + c + k \le 2^{n-k} = 2^c \quad\Rightarrow\quad k \le 2^c - c - 1$$
Thus, for:
c=2 check bits, k ≤ 1 (Repetition code)
c=3 check bits, k ≤ 4
c=4 check bits, k ≤ 11, use k=8?
H matrix consists of all unique, non-zero vectors
There are $2^c - 1$ vectors, c used for parity, so remaining $2^c - c - 1$
Example: Hamming Code (d=3)
$$H = \begin{pmatrix} 1&0&0&0&1&1&1 \\ 0&1&0&1&0&1&1 \\ 0&0&1&1&1&0&1 \end{pmatrix}, \qquad G = \begin{pmatrix} 0&1&1&1 \\ 1&0&1&1 \\ 1&1&0&1 \\ 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{pmatrix}$$
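A Python sketch of single-error correction with the (7,4) H and G above. Because the columns of H are all seven distinct non-zero 3-bit vectors, a non-zero syndrome names the bad bit directly. The helper names are illustrative.

```python
from itertools import product

H = [[1, 0, 0, 0, 1, 1, 1],
     [0, 1, 0, 1, 0, 1, 1],
     [0, 0, 1, 1, 1, 0, 1]]
G = [[0, 1, 1, 1],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
COLUMNS = [list(col) for col in zip(*H)]  # all 7 distinct non-zero 3-bit vectors


def mat_vec(M, v):
    """M v over GF(2)."""
    return [sum(m * x for m, x in zip(row, v)) % 2 for row in M]


def decode(r):
    """Correct up to one flipped bit, then return the 4 data bits (C3..C6)."""
    s = mat_vec(H, r)
    r = list(r)
    if any(s):
        r[COLUMNS.index(s)] ^= 1  # the syndrome names the column of the bad bit
    return r[3:]


# Every data word survives every single-bit error.
ok = all(
    decode([c ^ (i == j) for j, c in enumerate(mat_vec(G, v))]) == list(v)
    for v in product((0, 1), repeat=4)
    for i in range(7)
)
print(ok)  # True
```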
Example, d=4 code (SEC-DED)
Design H with:
All columns non-zero, odd-weight, distinct
Note that odd-weight refers to Hamming weight, i.e. the number of ones
Why does this generate d=4?
Any single bit error will generate a distinct, non-zero, odd-weight value
Any double error will generate a non-zero, even-weight value
Why? Adding two distinct columns gives a non-zero result, and the sum of two odd-weight vectors has even weight, so it can never be mistaken for a single-bit error
Any triple error will generate a non-zero value
Why? Add together three odd-weight values, get an odd-weight value
So: need four errors before indistinguishable from code word
Because d=4:
Can correct 1 error (Single Error Correction, i.e. SEC)
Can detect 2 errors (Double Error Detection, i.e. DED)
Example:
$$\begin{pmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{pmatrix} = \begin{pmatrix} 1&0&0&0&1&1&1&0 \\ 0&1&0&0&1&1&0&1 \\ 0&0&1&0&1&0&1&1 \\ 0&0&0&1&0&1&1&1 \end{pmatrix} \begin{pmatrix} C_0 \\ C_1 \\ C_2 \\ C_3 \\ C_4 \\ C_5 \\ C_6 \\ C_7 \end{pmatrix}$$
Note: log size of null space will be (columns − rank) = 4, so:
Rank = 4, since the rows are independent (4 independent columns)
Clearly, 8 bits in code word
Thus: (8,4) code
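A Python sketch of a SEC-DED decoder for the 4×8 H above. It encodes systematically as (check bits, data bits), which is an assumed layout since $H = (I \mid Q)$. An odd-weight syndrome is treated as a single error and an even, non-zero one as a detected double error. Every odd-weight 4-bit vector is a column of this H, so a triple error would be silently miscorrected, as expected for d = 4.

```python
from itertools import combinations, product

H = [[1, 0, 0, 0, 1, 1, 1, 0],
     [0, 1, 0, 0, 1, 1, 0, 1],
     [0, 0, 1, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1, 1, 1]]
COLUMNS = [tuple(col) for col in zip(*H)]
Q = [row[4:] for row in H]  # H = (I | Q), so C = (Q d, d) satisfies H C = 0


def encode(d):
    """Systematic encoding: 4 check bits followed by the 4 data bits."""
    return [sum(q * x for q, x in zip(row, d)) % 2 for row in Q] + list(d)


def syndrome(r):
    return tuple(sum(h * x for h, x in zip(row, r)) % 2 for row in H)


def secded(r):
    """Return (status, word): 'ok', 'corrected' (single error), or 'double' (detected)."""
    s = syndrome(r)
    if not any(s):
        return "ok", list(r)
    if sum(s) % 2 == 1:  # odd weight: treat as a single-bit error
        fixed = list(r)
        fixed[COLUMNS.index(s)] ^= 1  # every odd-weight 4-bit vector is a column here
        return "corrected", fixed
    return "double", list(r)  # non-zero, even weight: two bits flipped


def flip(word, *positions):
    return [b ^ (i in positions) for i, b in enumerate(word)]


codewords = [encode(d) for d in product((0, 1), repeat=4)]
print(all(secded(flip(c, i)) == ("corrected", c) for c in codewords for i in range(8)))
print(all(secded(flip(c, i, j))[0] == "double"
          for c in codewords for i, j in combinations(range(8), 2)))
```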
Tweaks:
No reason we cannot make a code shorter than required
Suppose n−k = 8 bits of parity. What is max code size (n) for d=4?
Maximum number of unique, odd-weight columns: $2^7 = 128$
So, n = 128. But then k = n − (n − k) = 120. Weird!
Just throw out columns of high weight and make a (72, 64) code!
Circuit optimization: if throwing out column vectors, pick ones of highest weight (# bits=1) to simplify circuit
But shortened codes like this might have d > 4 in some special directions
Example: Kaneda paper, catches failures of groups of 4 bits
Good for catching chip failures when DRAM has groups of 4 bits
What about EVENODD code?
Can be used to handle two erasures
What about two dead DRAMs? Yes, if you can really know they are dead
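The column-count arithmetic and the "throw out the heaviest columns" heuristic can be reproduced in a few lines. This Python sketch keeps the 72 lightest odd-weight 8-bit columns: all 8 of weight 1, all 56 of weight 3, and the first 8 of weight 5. Which weight-5 columns survive is an arbitrary choice here, and a real design would often also balance the number of 1s per row.

```python
from itertools import combinations

R = 8  # n - k check bits


def odd_weight_columns(r):
    """All r-bit column vectors of odd Hamming weight, lightest first."""
    cols = []
    for w in range(1, r + 1, 2):
        for ones in combinations(range(r), w):
            cols.append(tuple(int(i in ones) for i in range(r)))
    return cols


cols = odd_weight_columns(R)
print(len(cols))              # 128 -> a (128, 120) SEC-DED code
chosen = cols[:72]            # keep the 72 lightest -> (72, 64) code
print(sum(map(sum, chosen)))  # 216 ones in H (8x1 + 56x3 + 8x5): fewer XOR inputs
```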

How to correct errors?
Consider a parity-check matrix H ($(n-k) \times n$)
Compute the following syndrome $S_i$ given received code element $C_i$:
$$S_i = H\,C_i = H\,E$$
Suppose that two distinct correctable error vectors $E_1$ and $E_2$ produce the same syndrome:
$$H\,E_1 = H\,E_2 \;\Rightarrow\; H\,(E_1 + E_2) = 0 \;\Rightarrow\; E_1 + E_2 \text{ has } d \text{ or more bits set}$$
But, since both $E_1$ and $E_2$ have $\le \lfloor (d-1)/2 \rfloor$ bits set, $E_1 + E_2$ has $\le d-1$ bits set, so this conclusion cannot be true!
So, syndrome is unique indicator of correctable error vectors
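The uniqueness argument can be confirmed by brute force for a small code. This Python sketch uses the (7,4) Hamming check matrix from this lecture (d = 3). It enumerates every error vector up to a given weight and tests whether their syndromes are all distinct.

```python
from itertools import combinations

# Check matrix of the (7,4) Hamming code from this lecture (d = 3).
H = [[1, 0, 0, 0, 1, 1, 1],
     [0, 1, 0, 1, 0, 1, 1],
     [0, 0, 1, 1, 1, 0, 1]]
N = len(H[0])


def syndrome(e):
    return tuple(sum(h * x for h, x in zip(row, e)) % 2 for row in H)


def error_vectors(max_weight):
    """Every n-bit error vector with at most max_weight bits set."""
    for w in range(max_weight + 1):
        for pos in combinations(range(N), w):
            yield tuple(int(i in pos) for i in range(N))


def syndromes_unique(max_weight):
    seen = [syndrome(e) for e in error_vectors(max_weight)]
    return len(seen) == len(set(seen))


print(syndromes_unique(1))  # True:  weight <= (d-1)/2 = 1 is correctable
print(syndromes_unique(2))  # False: 29 error vectors share only 8 syndromes
```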
Galois Field
Definition: Field: a complete group of elements with:
Addition, subtraction, multiplication, division
Completely closed under these operations
Every element has an additive inverse
Every element except zero has a multiplicative inverse
Examples:
Real numbers
Binary, called GF(2): Galois Field with base 2
Values 0, 1. Addition/subtraction: use xor. Multiplicative inverse of 1 is 1
Prime field, GF(p): Galois Field with base p
Values 0 … p−1
Addition/subtraction/multiplication: modulo p
Multiplicative Inverse: every value except 0 has inverse
Example: GF(5): 1×1 ≡ 1 mod 5, 2×3 ≡ 1 mod 5, 4×4 ≡ 1 mod 5
General Galois Field: $GF(p^m)$: base p (prime!), dimension m
Values are vectors of elements of GF(p) of dimension m
Add/subtract: vector addition/subtraction
Multiply/divide: more complex
Just like real numbers but finite!
Common for computer algorithms: $GF(2^m)$
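A small Python sketch of GF(p) inverses. It uses Fermat's little theorem ($a^{p-2} \equiv a^{-1} \bmod p$ for prime p) through Python's three-argument `pow`, and then confirms that in GF(2) addition and subtraction are both XOR. The function name is illustrative.

```python
def gf_p_inverse(a: int, p: int) -> int:
    """Multiplicative inverse in GF(p), p prime, via Fermat's little theorem: a^(p-2) mod p."""
    if a % p == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return pow(a, p - 2, p)


print({a: gf_p_inverse(a, 5) for a in range(1, 5)})  # {1: 1, 2: 3, 3: 2, 4: 4}

# In GF(2), addition and subtraction are both XOR.
print(all((a + b) % 2 == (a - b) % 2 == a ^ b for a in (0, 1) for b in (0, 1)))  # True
```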

Specific Example: Galois Fields $GF(2^n)$
Consider polynomials whose coefficients come from GF(2).
Each term of the form $x^n$ is either present or absent.
Examples: 0, 1, x, $x^2$, and $x^7 + x^6 + 1 = 1\cdot x^7 + 1\cdot x^6 + 0\cdot x^5 + 0\cdot x^4 + 0\cdot x^3 + 0\cdot x^2 + 0\cdot x^1 + 1\cdot x^0$
With addition and multiplication these form a ring (not quite a field: still missing division):
Add: XOR each element individually with no carry:
$$(x^4 + x^3 + x + 1) + (x^4 + x^2 + x) = x^3 + x^2 + 1$$
Multiply: multiplying by x is like shifting to the left.
$$(x^2 + x + 1)(x + 1) = (x^3 + x^2 + x) + (x^2 + x + 1) = x^3 + 1$$
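Storing a GF(2) polynomial as an integer bit mask (bit i holds the coefficient of $x^i$, an assumed encoding) turns addition into XOR and multiplication into a carry-less shift-and-XOR. The Python sketch below reproduces both examples above.

```python
def poly_add(a: int, b: int) -> int:
    """Add GF(2) polynomials stored as bit masks (bit i = coefficient of x^i): XOR, no carries."""
    return a ^ b


def poly_mul(a: int, b: int) -> int:
    """Carry-less multiply: for each set bit of b, XOR in a copy of a shifted into place."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1  # multiply a by x
        b >>= 1
    return result


def to_str(p: int) -> str:
    """Render a bit mask as a polynomial, highest power first."""
    terms = []
    for i in range(p.bit_length() - 1, -1, -1):
        if p >> i & 1:
            terms.append("1" if i == 0 else "x" if i == 1 else f"x^{i}")
    return " + ".join(terms) or "0"


print(to_str(poly_add(0b11011, 0b10110)))  # x^3 + x^2 + 1
print(to_str(poly_mul(0b111, 0b11)))       # x^3 + 1
```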

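So what about division (mod)
$$\frac{x^4 + x^2}{x} = x^3 + x \quad\text{with remainder } 0$$
$$\frac{x^4 + x^2 + 1}{x + 1} = x^3 + x^2 \quad\text{with remainder } 1$$
Long division of the second example (subtraction is XOR):
$$\begin{array}{rl}
 & x^4 + 0x^3 + x^2 + 0x + 1 \\
- & (x^4 + x^3) \qquad [\,x^3 (x+1)\,] \\ \hline
 & x^3 + x^2 + 0x + 1 \\
- & (x^3 + x^2) \qquad [\,x^2 (x+1)\,] \\ \hline
 & 0x^2 + 0x + 1 \\
- & (0x^2 + 0x) \qquad [\,0x (x+1)\,] \\ \hline
 & 0x + 1 \\
- & (0x + 0) \qquad [\,0 (x+1)\,] \\ \hline
 & 1 \qquad \text{Remainder } 1
\end{array}$$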
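The long division above maps onto a short loop over bit masks, again assuming bit i holds the coefficient of $x^i$. This Python sketch reproduces both results. It skips the explicit "0x" quotient terms that the hand division writes out.

```python
def poly_divmod(a: int, b: int) -> tuple[int, int]:
    """Long division of GF(2) polynomials (bit i = coefficient of x^i); returns (q, r)."""
    if b == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    db = b.bit_length() - 1          # degree of the divisor
    q = 0
    while a and a.bit_length() - 1 >= db:
        shift = a.bit_length() - 1 - db
        q |= 1 << shift              # next quotient term: x^shift
        a ^= b << shift              # subtract (XOR) x^shift * b
    return q, a                      # what is left of a is the remainder


print(poly_divmod(0b10100, 0b10))  # (10, 0): x^3 + x, remainder 0
print(poly_divmod(0b10101, 0b11))  # (12, 1): x^3 + x^2, remainder 1
```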
Producing Galois Fields
These polynomials form a Galois (finite) field if we take the results of this multiplication modulo a prime polynomial p(x)
A prime polynomial cannot be written as product of two non-trivial polynomials q(x)r(x)
For any degree, there exists at least one prime polynomial.
With it we can form $GF(2^n)$
Every Galois field has a primitive element, α, such that all non-zero elements of the field can be expressed as a power of α
Certain choices of p(x) make the simple polynomial x the primitive element. These polynomials are called primitive
For example, $x^4 + x + 1$ is primitive. So α = x is a primitive element and successive powers of α will generate all non-zero elements of GF(16).
Example on next slide.
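Whether a polynomial is prime (irreducible) can be tested for small degrees by trial division by every polynomial of degree up to half its own. Below is a brute-force Python sketch using bit-mask polynomials, with illustrative names. Being prime is necessary but not sufficient for being primitive: $x^4 + x^3 + x^2 + x + 1$ passes this test, yet x has order 5 modulo it, not 15.

```python
def poly_mod(a: int, b: int) -> int:
    """Remainder of a / b for GF(2) polynomials (bit i = coefficient of x^i)."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a


def is_prime_poly(p: int) -> bool:
    """True if p has no factor of degree 1..deg(p)//2 (brute-force trial division)."""
    n = p.bit_length() - 1
    if n < 1:
        return False
    # Integers 2 .. 2^(n//2 + 1) - 1 encode every polynomial of degree 1 .. n//2.
    return all(poly_mod(p, d) != 0 for d in range(2, 1 << (n // 2 + 1)))


print([bin(p) for p in range(16, 32) if is_prime_poly(p)])
# ['0b10011', '0b11001', '0b11111']  i.e. x^4+x+1, x^4+x^3+1, x^4+x^3+x^2+x+1
```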

Galois Fields with primitive $x^4 + x + 1$
$\alpha^0 = 1$
$\alpha^1 = x$
$\alpha^2 = x^2$
$\alpha^3 = x^3$
$\alpha^4 = x + 1$
$\alpha^5 = x^2 + x$
$\alpha^6 = x^3 + x^2$
$\alpha^7 = x^3 + x + 1$
$\alpha^8 = x^2 + 1$
$\alpha^9 = x^3 + x$
$\alpha^{10} = x^2 + x + 1$
$\alpha^{11} = x^3 + x^2 + x$
$\alpha^{12} = x^3 + x^2 + x + 1$
$\alpha^{13} = x^3 + x^2 + 1$
$\alpha^{14} = x^3 + 1$
$\alpha^{15} = 1$
Primitive element α = x in $GF(2^n)$
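The table of powers can be regenerated mechanically. To get each next power, multiply by x (shift left) and, if an $x^4$ term appears, XOR in $x^4 + x + 1$. The Python sketch below uses bit masks with bit i holding the coefficient of $x^i$, an assumed encoding.

```python
POLY = 0b10011  # x^4 + x + 1 (bit i = coefficient of x^i)
N = 4


def powers_of_x(poly: int, n: int) -> list[int]:
    """alpha^0 .. alpha^(2^n - 1) for alpha = x: shift left, XOR in poly on overflow."""
    table, a = [], 1
    for _ in range(2**n):
        table.append(a)
        a <<= 1              # multiply by x
        if a >> n & 1:       # an x^n term appeared: reduce mod p(x)
            a ^= poly
    return table


table = powers_of_x(POLY, N)
for k, a in enumerate(table):
    print(f"alpha^{k:<2} = {a:04b}")  # bits printed as x^3 x^2 x 1
```

The first 15 entries are all distinct and the 16th wraps back to 1, which is exactly what makes x primitive for this p(x).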

$\alpha^4 = x^4 \bmod (x^4 + x + 1) = x^4 \oplus (x^4 + x + 1) = x + 1$
In general finding primitive polynomials is difficult. Most people just look them up in a table, such as:
Primitive Polynomials
$x^2 + x + 1$
$x^3 + x + 1$
$x^4 + x + 1$
$x^5 + x^2 + 1$
$x^6 + x + 1$
$x^7 + x^3 + 1$
$x^8 + x^4 + x^3 + x^2 + 1$
$x^9 + x^4 + 1$
$x^{10} + x^3 + 1$
$x^{11} + x^2 + 1$
$x^{12} + x^6 + x^4 + x + 1$
$x^{13} + x^4 + x^3 + x + 1$
$x^{14} + x^{10} + x^6 + x + 1$
$x^{15} + x + 1$
$x^{16} + x^{12} + x^3 + x + 1$
$x^{17} + x^3 + 1$
$x^{18} + x^7 + 1$
$x^{19} + x^5 + x^2 + x + 1$
$x^{20} + x^3 + 1$
$x^{21} + x^2 + 1$
$x^{22} + x + 1$
$x^{23} + x^5 + 1$
$x^{24} + x^7 + x^2 + x + 1$
$x^{25} + x^3 + 1$
$x^{26} + x^6 + x^2 + x + 1$
$x^{27} + x^5 + x^2 + x + 1$
$x^{28} + x^3 + 1$
$x^{29} + x^2 + 1$
$x^{30} + x^6 + x^4 + x + 1$
$x^{31} + x^3 + 1$
$x^{32} + x^7 + x^6 + x^2 + 1$
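The smaller entries of this table can be verified by brute force. A polynomial of degree n is primitive exactly when x has multiplicative order $2^n - 1$ modulo it. The Python sketch below checks degrees 2 through 16 transcribed from the table above; larger degrees work too but take longer. The function names are illustrative.

```python
# Exponents of the non-zero terms, transcribed from the table above (degrees 2..16).
TABLE = [
    [2, 1, 0], [3, 1, 0], [4, 1, 0], [5, 2, 0], [6, 1, 0], [7, 3, 0],
    [8, 4, 3, 2, 0], [9, 4, 0], [10, 3, 0], [11, 2, 0], [12, 6, 4, 1, 0],
    [13, 4, 3, 1, 0], [14, 10, 6, 1, 0], [15, 1, 0], [16, 12, 3, 1, 0],
]


def from_exponents(exps):
    return sum(1 << e for e in exps)


def order_of_x(poly: int) -> int:
    """Smallest k >= 1 with x^k = 1 (mod poly), or 0 if none within 2^n steps."""
    n = poly.bit_length() - 1
    a = 1
    for k in range(1, 2**n + 1):
        a <<= 1
        if a >> n & 1:
            a ^= poly
        if a == 1:
            return k
    return 0


def is_primitive(poly: int) -> bool:
    """x generates all 2^n - 1 non-zero elements iff its order is exactly 2^n - 1."""
    return order_of_x(poly) == 2 ** (poly.bit_length() - 1) - 1


print(all(is_primitive(from_exponents(e)) for e in TABLE))  # True
print(order_of_x(0b11111))  # 5: x^4+x^3+x^2+x+1 is prime but not primitive
```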

Galois Field Hardware
Multiplication by x ⇒ shift left
Taking the result mod p(x) ⇒ XOR-ing with the coefficients of p(x) when the most significant coefficient is 1.
Obtaining all $2^n - 1$ non-zero elements by evaluating $x^k$ for $k = 1, \ldots, 2^n - 1$ ⇒ shifting and XOR-ing $2^n - 1$ times.
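Combining the shift and conditional-XOR steps with the carry-less multiply gives a general $GF(2^n)$ multiplier that uses only shifts and XORs. This Python sketch defaults to $x^4 + x + 1$ (GF(16)). The function name and default are illustrative choices.

```python
def gf_mul(a: int, b: int, poly: int = 0b10011) -> int:
    """Multiply a * b in GF(2^n) = GF(2)[x]/poly with shifts and XORs only.

    poly defaults to x^4 + x + 1 (GF(16)); n = deg(poly). Inputs must be < 2^n.
    """
    n = poly.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a      # add the current a * x^i term
        b >>= 1
        a <<= 1              # a *= x (shift left)
        if a >> n & 1:       # MSB set: subtract (XOR) p(x)
            a ^= poly
    return result


# alpha^4 * alpha^5 = alpha^9: (x + 1)(x^2 + x) = x^3 + x in GF(16)
print(gf_mul(0b0011, 0b0110) == 0b1010)  # True
```

Because the reduction happens inside the loop, every intermediate value stays below $2^n$, which mirrors an n-bit hardware register.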

Reed-Solomon Codes
Galois field codes: code words consist of symbols rather than bits
Reed-Solomon codes:
Based on polynomials in $GF(2^k)$ (i.e. k-bit symbols)
Data as coefficients, code space as values of polynomial:
$$P(x) = a_0 + a_1 x^1 + \cdots + a_{k-1} x^{k-1}$$
Coded: P(0), P(1), P(2), …, P(n−1)
Can recover polynomial as long as we get any k of n
Properties: can choose number of check symbols
Reed-Solomon codes are maximum distance separable (MDS)
Can add d symbols for distance d+1 code
Often used in erasure code mode: as long as no more than n−k coded symbols are erased, can recover data
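The erasure-mode claim can be made concrete with non-systematic Reed-Solomon encoding followed by recovery through Lagrange interpolation. For brevity this Python sketch swaps in the prime field GF(257) for the lecture's $GF(2^k)$, so field arithmetic is ordinary arithmetic mod 257. The interpolation logic is the same, but coded symbols can take the value 256, which does not fit in a byte; that is one reason practical codes use $GF(2^8)$. All names are illustrative.

```python
PRIME = 257  # assumption: a prime field GF(257) stands in for GF(2^8)


def rs_encode(data, n):
    """Evaluate P(x) = a0 + a1 x + ... + a_{k-1} x^{k-1} at x = 0, 1, ..., n-1."""
    return [sum(a * pow(x, i, PRIME) for i, a in enumerate(data)) % PRIME for x in range(n)]


def mul_linear(poly, root):
    """(poly) * (x - root) mod PRIME; coefficient lists are lowest degree first."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - root * c) % PRIME
        out[i + 1] = (out[i + 1] + c) % PRIME
    return out


def rs_recover(points, k):
    """Lagrange interpolation: rebuild the k coefficients from any k (x, P(x)) pairs."""
    pts = points[:k]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(pts):
        basis, denom = [1], 1
        for m, (xm, _) in enumerate(pts):
            if m != j:
                basis = mul_linear(basis, xm)        # prod (x - xm)
                denom = denom * (xj - xm) % PRIME    # prod (xj - xm)
        scale = yj * pow(denom, PRIME - 2, PRIME) % PRIME  # yj / denom (Fermat inverse)
        for i in range(k):
            coeffs[i] = (coeffs[i] + scale * basis[i]) % PRIME
    return coeffs


data = [42, 7, 199, 3]                 # k = 4 data symbols
code = rs_encode(data, 7)              # n = 7 coded symbols
survivors = [(x, y) for x, y in enumerate(code) if x not in (0, 2, 5)]  # 3 erasures
print(rs_recover(survivors, k=4) == data)  # True
```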
Side note: Multiplication by a constant in $GF(2^k)$ can be represented by a $k \times k$ matrix: $a \cdot x$
Decompose unknown vector into k bits: $x = x_0 + 2x_1 + \cdots + 2^{k-1} x_{k-1}$
Each column is the result of multiplying a by $2^i$
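A Python sketch of that construction in GF(16), assuming $x^4 + x + 1$ as the field polynomial. Column i of the matrix holds the bits of $a \cdot 2^i$ (where 2 is the element x). Multiplying the matrix by the bit vector of any x over GF(2) then gives the same result as the field multiplication, because multiplication by a constant is linear over GF(2).

```python
def gf_mul(a: int, b: int, poly: int = 0b10011) -> int:
    """Shift-and-XOR multiply in GF(2^k) = GF(2)[x]/poly (default GF(16))."""
    k = poly.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> k & 1:
            a ^= poly
    return result


def const_mul_matrix(a: int, k: int = 4, poly: int = 0b10011):
    """k x k GF(2) matrix for x -> a*x; column i holds the bits of a * 2^i."""
    cols = [gf_mul(a, 1 << i, poly) for i in range(k)]
    return [[(cols[i] >> r) & 1 for i in range(k)] for r in range(k)]


def apply(M, x: int) -> int:
    """Multiply matrix M by the bit vector (x_0, ..., x_{k-1}) over GF(2)."""
    k = len(M)
    out = 0
    for r in range(k):
        bit = sum(M[r][i] & (x >> i) for i in range(k)) % 2
        out |= bit << r
    return out


M = const_mul_matrix(0b0011)  # multiply-by-(x + 1) in GF(16)
print(all(apply(M, x) == gf_mul(0b0011, x) for x in range(16)))  # True
```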

Reed-Solomon Codes (cont)
$$G = \begin{pmatrix} 1^0 & 1^1 & 1^2 & 1^3 & 1^4 \\ 2^0 & 2^1 & 2^2 & 2^3 & 2^4 \\ 3^0 & 3^1 & 3^2 & 3^3 & 3^4 \\ 4^0 & 4^1 & 4^2 & 4^3 & 4^4 \\ 5^0 & 5^1 & 5^2 & 5^3 & 5^4 \\ 6^0 & 6^1 & 6^2 & 6^3 & 6^4 \\ 7^0 & 7^1 & 7^2 & 7^3 & 7^4 \end{pmatrix}, \qquad C = G \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix}$$
$$H' = \begin{pmatrix} 1^0 & 2^0 & 3^0 & 4^0 & 5^0 & 6^0 & 7^0 \\ 1^1 & 2^1 & 3^1 & 4^1 & 5^1 & 6^1 & 7^1 \end{pmatrix}$$
Reed-Solomon codes (Non-systematic):
Data as coefficients, code space as values of polynomial:
$$P(x) = a_0 + a_1 x^1 + \cdots + a_4 x^4$$
Coded: P(1), P(2), …, P(7)
Called Vandermonde Matrix: maximum rank
Different representation (This H′ and G are not related)
Clear that all combinations of two or fewer columns are independent ⇒ d=3
Very easy to pick whatever d you happen to want: add more rows
Fast, Systematic version of Reed-Solomon: Cauchy Reed-Solomon, others
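For a code this small, the distance can simply be brute-forced. The slide does not name a field, but seven distinct non-zero evaluation points 1..7 need at least GF(8). This Python sketch therefore assumes GF(8) generated by $x^3 + x + 1$. It builds the 7×5 Vandermonde G and finds the minimum weight over all $8^5 - 1$ non-zero code words.

```python
from itertools import product

POLY, Q = 0b1011, 8  # assumption: GF(8) built from x^3 + x + 1


def gf_mul(a, b):
    """Shift-and-XOR multiply in GF(8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & Q:          # an x^3 term appeared: reduce mod x^3 + x + 1
            a ^= POLY
    return result


MUL = [[gf_mul(a, b) for b in range(Q)] for a in range(Q)]  # lookup table for speed


def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = MUL[r][a]
    return r


K, POINTS = 5, range(1, Q)                              # n = 7 evaluation points
G = [[gf_pow(p, j) for j in range(K)] for p in POINTS]  # 7 x 5 Vandermonde


def encode(coeffs):
    """C = G a: evaluate a0 + a1 x + ... + a4 x^4 at x = 1..7 (field addition is XOR)."""
    out = []
    for row in G:
        s = 0
        for g, c in zip(row, coeffs):
            s ^= MUL[g][c]
        out.append(s)
    return out


min_weight = min(sum(s != 0 for s in encode(a))
                 for a in product(range(Q), repeat=K) if any(a))
print(min_weight)  # 3 = n - k + 1, the same d the slide derives from H'
```

A non-zero polynomial of degree at most 4 has at most 4 roots among the 7 points, so every non-zero code word has weight at least 3. That is the MDS property in miniature.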

Aside: Why erasure coding?
High Durability/overhead ratio!
Exploit law of large numbers for durability!
6 month repair, FBLPY:
Replication: 0.03
Fragmentation: $10^{-35}$
[Figure: Fraction Blocks Lost Per Year (FBLPY)]
Statistical Advantage of Fragments
Latency and standard deviation reduced:
Memory-less latency model
Rate ½ code with 32 total fragments
[Figure: Time to Coalesce vs. Fragments Requested (TI5000); x-axis: Objects Requested (16 to 31); y-axis: Latency]
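Under a memoryless model the trend can be computed in closed form. The Python sketch below illustrates the idea under stated assumptions; it is not the model behind the plotted TI5000 data. It assumes each requested fragment arrives after an independent exponential delay with mean 1, and that any 16 of the 32 fragments rebuild the block (rate ½). It relies on the standard result that the gaps between successive exponential arrivals are independent exponentials.

```python
def coalesce_stats(requested: int, needed: int = 16, mean: float = 1.0):
    """Mean and std. dev. of the time until `needed` of `requested` fragments arrive.

    Assumes i.i.d. exponential (memoryless) fragment latencies with the given mean.
    While `requested - i` fragments are outstanding, the wait for the next arrival
    is exponential with mean `mean / (requested - i)`; the waits are independent.
    """
    if needed > requested:
        raise ValueError("must request at least as many fragments as are needed")
    gaps = [mean / (requested - i) for i in range(needed)]
    expected = sum(gaps)
    std = sum(g * g for g in gaps) ** 0.5
    return expected, std


for r in (16, 20, 24, 28, 31):
    m, s = coalesce_stats(r)
    print(f"request {r:2d} of 32: mean {m:.3f}, std {s:.3f}")
```

Requesting more fragments than the minimum shrinks both the mean and the spread, because the last few slow arrivals no longer gate reconstruction.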
Conclusion
ECC: add redundancy to correct for errors
(n,k,d) ⇒ n code bits, k data bits, distance d
Linear codes: code vectors computed by linear transformation
Erasure code: after identifying "erasures", can correct
Reed-Solomon codes
Based on $GF(p^n)$, often $GF(2^n)$
Easy to get distance d+1 code with d extra symbols
Often used in erasure mode
