
ELG 5372 Error Control Coding

Claude D’Amours
Lecture 1: Introduction to Coding
Introduction

• Shannon demonstrated that, by proper encoding of the information, errors introduced by a noisy channel can be reduced to any desired level without sacrificing the transmission rate, as long as the transmission rate is below the capacity of the channel.
• Since Shannon's work, much research has been done to find efficient encoding and decoding methods.

Introduction (2)

• Transmission and storage of digital information are two processes that transfer data from an information source to a destination.
Digital Source

  Information source -> Source coder --m--> Channel coder --c--> Modulator/writer
                                                                       |
                                                                 Channel/memory
                                                                       |
  Destination (sink) <- Source decoder <--m'-- Channel decoder <--c'-- Demod./reader

  (Modulator/writer, channel/memory and demodulator/reader together form an
  equivalent channel between the channel coder and channel decoder.)
Types of Codes

• Two structurally different types of codes are typically used:
• Block codes
  – Hamming
  – BCH, RS
  – LDPC
• Convolutional codes
  – Turbo codes typically use convolutional codes as constituent codes
  – TCM is based on convolutional codes

Block Codes

• A block of k digital symbols is input to the encoder, which outputs n digital symbols (typically n > k).

  k symbols --> [Block encoder] --> n symbols

• Each k-symbol message sequence is mapped to one of M^k possible codewords. Since there are M^n possible M-ary sequences of length n, errors can be detected if an invalid codeword is received.

Block Codes (2)

• Code rate R = k/n.
• The message sequence carries k symbols of information.
• The codeword, which carries those k symbols of information, is made up of n symbols.
• There are (n-k) redundant symbols.

Convolutional Codes

• A convolutional code also produces n symbols for k input symbols.
• However, the output depends not only on the current k inputs but also on the km previous inputs, where m is the encoder memory.
• The encoder is implemented by a sequential logic circuit.
Modulation and Coding

• Symbol rate = R_s, signaling interval T = 1/R_s.
• For each symbol, the modulator selects a waveform of duration T to represent the symbol to be transmitted.
• Example, BPSK:

  s_0(t) = \sqrt{2E_b/T} \cos(2\pi f_c t + \pi),  0 \le t \le T
  s_1(t) = \sqrt{2E_b/T} \cos(2\pi f_c t),        0 \le t \le T

Modulation and Coding (2)

• The transmitted signal is

  s(t) = \sum_{n=0}^{N} s_i(t - nT)

  where i = 0, 1, ..., M-1 and is random (i = 0 or 1 in the binary case).
• The received signal is

  r(t) = a(t) s(t - \tau(t)) + n(t)

  where a(t) is the time-varying channel gain, \tau(t) is the delay introduced by the channel and n(t) is additive noise.

Modulation and Coding (3)

• AWGN channel
  – a(t) = a and \tau(t) = \tau (constants).
• Flat Rayleigh fading
  – a(t) is a time-varying Rayleigh-distributed envelope.
  – \tau(t) introduces a time-varying phase shift.
• Noise introduces detection errors at the receiver.
• The error rate is a function of E_s/N_0.

Modulation and Coding (4)

• The BER for BPSK in AWGN is

  P_b = Q\left(\sqrt{2E_b/N_0}\right)

• The BER for BPSK in slow, flat Rayleigh fading with ideal channel phase compensation is

  P_b = \frac{1}{2}\left[1 - \sqrt{\frac{\bar{\gamma}_b}{1 + \bar{\gamma}_b}}\right]

  where \bar{\gamma}_b is the average received E_b/N_0.
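As a quick numerical check, here is a minimal sketch (Python; Q is computed from the complementary error function, and \bar{\gamma}_b is taken equal to the AWGN E_b/N_0 for comparison):

```python
# Sketch: BPSK BER in AWGN and in slow flat Rayleigh fading.
import math

def Q(x):
    """Gaussian tail probability Q(x) = 0.5*erfc(x/sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

for ebno_db in (0, 5, 10):
    g = 10 ** (ebno_db / 10)                    # Eb/No as a linear ratio
    p_awgn = Q(math.sqrt(2 * g))                # Q(sqrt(2 Eb/No))
    p_ray = 0.5 * (1 - math.sqrt(g / (1 + g)))  # (1/2)[1 - sqrt(g/(1+g))]
    print(f"Eb/No = {ebno_db:2d} dB: AWGN Pb = {p_awgn:.2e}, Rayleigh Pb = {p_ray:.2e}")
```

Note how much more slowly the Rayleigh BER falls with increasing E_b/N_0, which is one motivation for coding on fading channels.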
Modulation and Coding (5)

• Coding increases the symbol rate (k info bits without coding become n code bits after coding).
• For the same transmitted power, the code bit energy is less than the uncoded bit energy: E_c = R E_b = (k/n) E_b.
• Therefore, the probability that a code bit is incorrectly detected is higher than the probability that an uncoded bit is incorrectly detected.

Modulation and Coding (6)

• Coded data streams nevertheless provide improved bit error rates after decoding, due to the error correction capabilities of the code.

Example Hamming (7,4)

  m     c         m     c
  0000  0000000   1000  1000110
  0001  0001101   1001  1001011
  0010  0010111   1010  1010001
  0011  0011010   1011  1011100
  0100  0100011   1100  1100101
  0101  0101110   1101  1101000
  0110  0110100   1110  1110010
  0111  0111001   1111  1111111

Example Hamming (7,4)

• Assume that we transmit 0000 in the uncoded case.
  – If the first bit is incorrectly detected, we receive 1000, which is a valid message: the error goes unnoticed.
• Assume that we transmit 0000000 in the coded case.
  – If the first bit is detected in error, we receive 1000000, which is not a valid codeword.
  – The error has been detected.
  – Codeword 0000000 differs from 1000000 in only one bit position. All other codewords differ from 1000000 in at least two positions.
Example Hamming (7,4)

• Assuming independent errors:
  – P(uncoded word error) = 1 - (1-P_u)^4.
  – P(coded word error) = 1 - (1-P_c)^7 - 7P_c(1-P_c)^6.
• In the AWGN channel:

  P_u = Q\left(\sqrt{2E_b/N_0}\right),   P_c = Q\left(\sqrt{2(4/7)E_b/N_0}\right)
Example Hamming (7,4) WER

[Figure: word error rate vs E_b/N_0, 0-9 dB, log scale 10^0 down to 10^-4, for the uncoded and coded cases.]
Example Hamming (7,4) BER

• BER
  – Uncoded: P_b = P_u.
  – Coded:

  P_b = 9P_c^2(1-P_c)^5 + 19P_c^3(1-P_c)^4 + 16P_c^4(1-P_c)^3 + 12P_c^5(1-P_c)^2 + 7P_c^6(1-P_c) + P_c^7
Example Hamming (7,4) BER

[Figure: bit error rate vs E_b/N_0, 0-9 dB, log scale 10^0 down to 10^-4, for the uncoded and coded cases.]
ELG 5372 Error Control Coding

Claude D'Amours
Lecture 2: Introduction to Coding 2

Decoding Techniques

• Hard decision
  – The receiver detects the data before decoding.
• Soft decision
  – The receiver quantizes the received data and the decoder uses likelihood information to decode.
  – The magnitude of the decision variable usually indicates the likelihood that the detection is correct.
  – Maximum likelihood decoding (MLD).
Hard Decision vs Soft Decision

[Figure: demodulator output alphabets. Hard decision: binary outputs {0, 1}. Soft decision with 2-bit quantization: outputs {0', 0, 1, 1'}, where 0' and 1' denote strong (high-confidence) decisions and 0 and 1 weak ones.]
Maximum Likelihood Decoding

• Transmit codeword c.
• Receive word r.
• Decode c'.
• A decoding error occurs if c' ≠ c.

Maximum Likelihood Decoding (2)

  P(E | r) = P(c' ≠ c | r)

  P(E) = \sum_r P(c' ≠ c | r) P(r)

The optimum decoding rule is one that minimizes P(E). P(r) is independent of the decoding rule, therefore we must minimize P(c' ≠ c | r), which is equivalent to maximizing P(c' = c | r).
Maximum Likelihood Decoding (3)

  P(c | r) = P(r | c) P(c) / P(r)

Assuming all codewords are equally likely, maximizing P(c|r) is the same as maximizing P(r|c). Assuming hard decisions with a discrete memoryless channel (DMC):

  P(r | c) = \prod_i P(r_i | c_i) = p^{d(r,c)} (1-p)^{n-d(r,c)}      (1)

  \log P(r | c) = d(r,c) \log p + (n - d(r,c)) \log(1-p)             (2)

Since p < (1-p) for p < 1/2, P(r|c) is maximized by the codeword for which d(r,c) is minimized. This is known as a minimum distance decoding rule.
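A minimal sketch of the minimum-distance rule (Python), exhaustively searching the Hamming (7,4) codeword list from Lecture 1:

```python
# Sketch: hard-decision ML decoding = minimum Hamming distance search.
CODEWORDS = [
    "0000000", "0001101", "0010111", "0011010",
    "0100011", "0101110", "0110100", "0111001",
    "1000110", "1001011", "1010001", "1011100",
    "1100101", "1101000", "1110010", "1111111",
]

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def decode(r):
    """Return the codeword closest to the received word r."""
    return min(CODEWORDS, key=lambda c: hamming_distance(r, c))

print(decode("1000000"))   # -> 0000000 (the single error is corrected)
```

Exhaustive search is only feasible for tiny codes; syndrome decoding (Lecture 9) achieves the same result with a table of size q^{n-k}.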
Hamming Distance vs Euclidean Distance

• Hamming distance = the number of positions in which two vectors differ.
• Hard decision decoding uses the minimum Hamming distance rule shown previously.
• The Euclidean distance between r and c is ||r - c||.
• Soft decision decoding uses a minimum Euclidean distance rule (approximately).

Decision Variables

[Figure: conditional pdfs f_{r_i}(r_i | c_i = -1) and f_{r_i}(r_i | c_i = +1), centered at -1 and +1 on the r_i axis; the likelihood ratio for symbol i is f_{r_i}(r_i | c_i = 1) / f_{r_i}(r_i | c_i = -1).]
Example: HD vs SD

• Consider the Hamming (7,4) code used with the following channels:

  (a) HD:  P(0|0) = P(1|1) = 0.9
           P(1|0) = P(0|1) = 0.1

  (b) SD:  P(0'|0) = P(1'|1) = 0.6
           P(0|0)  = P(1|1)  = 0.3
           P(1|0)  = P(0|1)  = 0.099
           P(1'|0) = P(0'|1) = 0.001
Example: HD vs SD

• Suppose we receive r = 1' 0 0' 0 0' 0' 0'.
• For HD, there is no quantization, so r = 1000000 and will be decoded as 0000000 using the minimum Hamming distance rule.
• In the SD case, using (1) with c = 0000000, we get P(r|c) = 0.001 × 0.3^2 × 0.6^4 ≈ 1.17 × 10^-5.
• However, for c = 1101000, we get P(r|c) = 0.6^5 × 0.099^2 ≈ 7.62 × 10^-4.
• This means that for the given r, it is roughly 65 times more probable that 1101000 was transmitted than 0000000.
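A minimal sketch (Python) of this likelihood computation; the transition table is the soft-decision channel defined above:

```python
# Sketch: soft-decision likelihoods P(r|c) for the 2-bit-quantized channel.
# P[input bit][output symbol]; outputs 0'/1' are strong decisions, 0/1 weak.
P = {
    "0": {"0'": 0.6, "0": 0.3, "1": 0.099, "1'": 0.001},
    "1": {"1'": 0.6, "1": 0.3, "0": 0.099, "0'": 0.001},
}

def likelihood(r, c):
    """Memoryless channel: P(r|c) is the product of per-symbol probabilities."""
    prob = 1.0
    for out, bit in zip(r, c):
        prob *= P[bit][out]
    return prob

r = ["1'", "0", "0'", "0", "0'", "0'", "0'"]
print(likelihood(r, "0000000"))   # ~1.17e-05
print(likelihood(r, "1101000"))   # ~7.62e-04
```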
Errors and Channel Models

• Memoryless channels: noise affects each transmitted symbol independently.
  – Each transmitted symbol has probability p of being received incorrectly and probability 1-p of being received correctly.
  – Transmission errors occur randomly in the received sequence.
  – Memoryless channels are often referred to as random-error channels.

Errors and Channel Models (2)

• Examples
  – AWGN: r_i = s_i + n_i, with E[n_i] = 0, E[n_i^2] = \sigma_n^2 and E[n_i n_j] = 0 for i ≠ j.
  – DMC: a binary channel described by the transition probabilities P[0|0], P[1|0], P[0|1] and P[1|1].

Errors and Channel Models (3)

• Channels with memory
  – Errors do not occur randomly.
  – Either the noise is not independent from transmission to transmission (coloured noise),
  – or a slowly time-varying signal-to-noise ratio causes time-dependent error rates (fading channels).
Errors and Channel Models (4)

• Gilbert and Fritchman model
  – A two-state Markov chain with a Good state and a Bad state.
  – P(Good -> Bad) = q_1 and P(Bad -> Good) = q_2; the chain remains in the Good state with probability 1-q_1 and in the Bad state with probability 1-q_2.
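A minimal simulation sketch (Python); the transition probabilities and per-state error rates below are illustrative assumptions, not values from the lecture:

```python
# Sketch: two-state burst-error channel (Gilbert-style). All parameter
# values here are illustrative assumptions.
import random

random.seed(1)
q1, q2 = 0.05, 0.3                 # P(Good->Bad), P(Bad->Good)
p_err = {"G": 0.001, "B": 0.2}     # bit error probability in each state

state, errors = "G", []
for _ in range(60):
    errors.append(1 if random.random() < p_err[state] else 0)
    if state == "G" and random.random() < q1:
        state = "B"
    elif state == "B" and random.random() < q2:
        state = "G"

print("".join(map(str, errors)))   # 1s cluster into bursts during Bad-state visits
```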
Errors and Channel Models (5)

• Channels with memory lead to error bursts. Two countermeasures:
  – Burst-error correcting codes.
  – Random error correcting codes with interleaving-deinterleaving to randomize the errors.

Performance Measures

• Probability of decoding error P(E)
  – Probability that the codeword at the output of the decoder is not the transmitted one.
  – Also referred to as word error rate (WER) or block error rate (BLER).
• Bit error rate (BER) P_b
  – Probability that a message bit at the output of the decoder is incorrect.
• Coding gain (measured in dB)
  – The savings in transmitted power needed to achieve a specific BER using coding, compared to the uncoded case.

Performance Measures 2

• Asymptotic coding gain
  – The coding gain as E_b/N_0 -> ∞.
Performance Measures 3

[Figure: BER vs E_b/N_0, 6-13 dB, log scale 10^-2 down to 10^-10, for uncoded and coded transmission; the horizontal separation between the curves at a given BER is the coding gain.]
Coded Modulation

• Use of ECC creates bandwidth expansion due to the redundant symbols.
• Combining ECC and modulation allows the redundancy to be contained in the modulation. For example, for the message 1011:
  – binary modulation: s_1(t) + s_0(t-T) + s_1(t-2T) + s_1(t-3T)
  – coded modulation: s_1(t) + s_2(t-T) + s_1(t-2T) + s_3(t-3T)
• Memory is created without adding redundant bits by using a higher-order modulation scheme and using a bit in two successive symbols.

Trellis Coded Modulation

• A state machine adds redundant bits and creates memory.
• Each state change is encoded by selecting a symbol from a larger-than-needed constellation; thus no bandwidth expansion occurs and significant coding gains are achieved.
ELG 5372 Error Control Coding

Claude D'Amours
Lecture 3: Algebra (1): Groups, Subgroups and Cosets
Groups

• Let G be a set of elements and * a binary operation defined on G such that for all elements a, b ∈ G, c = a*b.
  – If c ∈ G for all a and b, then G is closed under the operation *.
  – For example, if G is the set of all real numbers, then G is closed under the real addition (+) operation.
  – Also, the operation is said to be associative if for all a, b, c ∈ G, (a*b)*c = a*(b*c).

Definition of a Group

• The set G on which the binary operation * is defined is referred to as a group if the following conditions are met:
  – * is associative.
  – G contains an identity element: e ∈ G is an identity element if a*e = a for all a ∈ G.
  – For any element a ∈ G, there exists an inverse element a' ∈ G such that a*a' = e.
• The group is commutative if for any a, b ∈ G, a*b = b*a.

Examples

• G is the set of all nonzero real numbers under multiplication.
  – Multiplication is associative.
  – a × 1 = a for all a ∈ G, and 1 ∈ G.
  – a × (1/a) = 1, and 1/a ∈ G.
  – Therefore G is a group. (Zero must be excluded: 0 has no multiplicative inverse.)

Example 2

• H is the set of all positive integers plus 0 under addition.
  – Addition is associative.
  – a + 0 = a, and 0 ∈ H.
  – The inverse of a would satisfy a + (-a) = 0, but -a ∉ H for a > 0.
  – Therefore H is not a group under addition.

Theorem 1

• The identity element of any group is unique.

Theorem 2

• The inverse of a group element is unique.
  – For any element a, there exists only one inverse, a', such that a*a' = e.
Subgroups

• Let G be a group under the binary operation *. Let H be a nonempty subset of G. H is a subgroup of G if the following conditions are met:
  – H is closed under *. (property 1)
  – For any element a ∈ H, the inverse of a, a' ∈ H. (property 2)

Subgroups

• If H is a subgroup of G, then H is also a group on its own.
  – Since a and a' are elements of H, e = a*a' must be an element of H (property 1).
  – Since H is made up of elements of G, for which the associative property holds, associativity must also hold in H.

Example

• G is the set of all integers under addition.
  – G is a commutative group under addition.
• H is the set of all even integers under addition.
  – All elements of H are in G, and the sum of two even integers is even (H is closed).
  – If a is in H, -a is also in H.
  – Therefore H is a subgroup of G.

Example 2

• Let H2 be the set of all odd integers under addition.
• Is H2 a subgroup of G? (No: it is not even closed, since the sum of two odd integers is even.)
Cosets

• Let H be a subgroup of a group G under the binary operation *. Let a be any element of G.
• Then the set of elements a*H, defined as {a*h : h ∈ H}, is called a left coset of H, and
• the set of elements H*a, defined as {h*a : h ∈ H}, is called a right coset of H.

Example

• G = {0, 1, 2, 3, 4, 5} under modulo-6 addition is a group.
• Let H = {0, 2, 4}.
• Let a = 1.
• (a+H) mod 6 = {1, 3, 5} is a left coset of H; (H+a) mod 6 = {1, 3, 5} is a right coset of H.
• Since G is a commutative group, the left and right cosets are equal for any given a. In this case, we don't refer to cosets as being left or right cosets; they are simply referred to as cosets of H.
• (Setting a = 2 or 4 produces H.)
• (Setting a = 3 or 5 produces (1+H) mod 6.)

Subgroup and its cosets

• A subgroup and its cosets are disjoint, and the union of a subgroup and all of its cosets forms G.

Theorem 3

• Let H be a subgroup of G under *. No two elements in a coset of H are identical.

Theorem 4

• No two elements in different cosets of the subgroup H of a group G are identical.
ELG 5372 Error Control Coding

Lecture 4: Algebra 2: Fields and Polynomials
Fields

• A field is a set of elements on which we can perform addition, subtraction, multiplication and division without leaving the set.

Formal Definition of a Field

• Let F be a set of elements on which two binary operations called addition '+' and multiplication '×' are defined. The set is a field under these two operations if the following conditions are satisfied:
  1. F is a commutative group under addition. The identity element with respect to addition is called the zero element of F and is denoted by 0.
  2. The nonzero elements of F (i.e., F \ {0}) form a commutative group under multiplication. The multiplicative identity is termed the unity element of F and is denoted by 1.
  3. Multiplication is distributive over addition. In other words, for a, b, c in F, a×(b+c) = a×b + a×c.

Some Notation

• For a in F, -a is the additive inverse of a.
  – Example: in GF(3), if a = 1, -a = 2.
• For a in F, 1/a is the multiplicative inverse of a.
  – Example: in GF(3), if a = 2, 1/a = 2.
• This will become evident as we progress through the lecture.

Properties of Fields

1. For every element a in F, a×0 = 0×a = 0.
2. For every two non-zero elements a, b in F, a×b ≠ 0.
3. For a, b in F, a×b = 0 with a ≠ 0 implies b = 0.
4. For any two elements in a field, -(a×b) = (-a)×b = a×(-b).
5. For a ≠ 0, a×b = a×c implies that b = c.
Galois Field 2 (GF(2)): The Binary Field

• A binary field can be constructed under modulo-2 addition and modulo-2 multiplication.

  Modulo-2 addition    Modulo-2 multiplication
   + | 0 1              × | 0 1
   0 | 0 1              0 | 0 0
   1 | 1 0              1 | 0 1

GF(p)

• Using the same idea as GF(2), we can generate any Galois field with a prime number, p, of elements under modulo-p addition and multiplication.

Example GF(3)

  Modulo-3 addition    Modulo-3 multiplication
   + | 0 1 2            × | 0 1 2
   0 | 0 1 2            0 | 0 0 0
   1 | 1 2 0            1 | 0 1 2
   2 | 2 0 1            2 | 0 2 1
Extension Fields GF(p^m)

• We cannot construct every finite field simply by using modulo arithmetic.
• For example, GF(4) is not {0, 1, 2, 3} under modulo-4 addition and multiplication (2 would have no multiplicative inverse, since 2×2 = 0 mod 4).
• GF(4) can be constructed by considering it as 2-dimensional GF(2):
• GF(4) = {(0,0), (0,1), (1,0), (1,1)}.
• We say that GF(4) is an extension field of GF(2).

Characteristic of a Field

• Consider a finite field of q elements, GF(q).
• Let t_k = \sum_{i=1}^{k} 1.
• Let λ be the smallest value of k for which t_k = 0.
• Then λ is called the characteristic of the field GF(q).
• For example, in GF(2), λ = 2 (since 1+1 = 0). In GF(3), 1+1+1 = 0, thus λ = 3.
Theorem 5

• The characteristic of a field is always a prime number.

Order of an element in GF(q)

• Suppose α is a nonzero element of GF(q).
• Since the non-zero elements of a field form a closed set under multiplication, α², α³, α⁴, ... are also elements of GF(q).
• The order of element α in GF(q), ord(α), is the smallest integer for which α^{ord(α)} = 1.

Example GF(3)

• GF(3) = {0, 1, 2}
• 1: 1^1 = 1, therefore ord(1) = 1.
• 2: 2^1 = 2, 2^2 = 4 mod 3 = 1, therefore ord(2) = 2.

Theorem 6

• Let α be a non-zero element of GF(q). Then α^{q-1} = 1.

Theorem 7

• Let α be a non-zero element of GF(q). Then ord(α) divides q-1. (ord(α) | q-1)

Primitive Elements

• Any element of GF(q) whose order is q-1 is a primitive element of GF(q).
  – For example, in GF(3), element 2 has order 2. Thus 2 is a primitive element of GF(3).
• Let α be a primitive element of GF(q); then the series α¹, α², ..., α^{q-1} produces q-1 distinct non-zero elements of GF(q).
• In other words, the q-1 successive powers of a primitive element α produce all of the non-zero elements of GF(q). Thus GF(q) = {0, α, α², ..., α^{q-1}}.
Example GF(4)

• 0 = (0,0), 1 = (0,1), α = (1,0) and α² = (1,1).
• In other words, α² = α + 1. (*)
• If α is primitive, then ord(α) = 3.
• α³ = α² · α = (α+1)α = α² + α = (α+1) + α = (1+1)α + 1 = (0,1) = 1.
• The primitive element is defined by (*).
• How do we find such a defining relation for a field? Through a special type of polynomial: the primitive polynomial.
Polynomials over GF(q)

• The polynomial f(X) = f_0 + f_1 X + f_2 X^2 + ... + f_n X^n is a polynomial of degree n over GF(q) if the coefficients f_i come from GF(q) and obey GF(q) arithmetic.
• Suppose f(X) and g(X) are two polynomials over GF(q) given by (assume m < n):

  f(X) = f_0 + f_1 X + ... + f_n X^n
  g(X) = g_0 + g_1 X + ... + g_m X^m

Addition of polynomials

  f(X) + g(X) = (f_0+g_0) + (f_1+g_1)X + ... + (f_m+g_m)X^m + f_{m+1}X^{m+1} + ... + f_n X^n

where all additions are performed as defined in GF(q).

Multiplication of polynomials

• f(X)g(X) = c_0 + c_1 X + ... + c_{n+m} X^{n+m}, where

  c_0 = f_0 g_0
  c_1 = f_0 g_1 + f_1 g_0
  c_2 = f_0 g_2 + f_1 g_1 + f_2 g_0
  ...
  c_{n+m} = f_n g_m
Examples

• Polynomials in GF(2):

  f(X) = 1 + X + X^3
  g(X) = 1 + X^2

• f(X) + g(X) = (1+1) + (1+0)X + (0+1)X^2 + (1+0)X^3 = X + X^2 + X^3
• f(X)g(X) = (1 + X + X^3)(1 + X^2) = 1 + X^2 + X + X^3 + X^3 + X^5 = 1 + X + X^2 + (1+1)X^3 + X^5 = 1 + X + X^2 + X^5
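A minimal sketch (Python) of GF(2) polynomial arithmetic, storing a polynomial as an integer bit mask (bit i = coefficient of X^i), so that XOR is coefficient-wise addition:

```python
# Sketch: GF(2) polynomial addition (XOR) and multiplication (shift-and-XOR).
def gf2_mul(f, g):
    prod, i = 0, 0
    while g >> i:
        if (g >> i) & 1:
            prod ^= f << i       # add f(X) * X^i
        i += 1
    return prod

f = 0b1011   # 1 + X + X^3
g = 0b0101   # 1 + X^2
print(bin(f ^ g))          # 0b1110   = X + X^2 + X^3
print(bin(gf2_mul(f, g)))  # 0b100111 = 1 + X + X^2 + X^5
```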
Examples

• Polynomials in GF(4):

  f(X) = 1 + αX + αX^2
  g(X) = 1 + α²X

Properties of Polynomials over GF(q)

• Commutative:
  a(X) + b(X) = b(X) + a(X)
  a(X)b(X) = b(X)a(X)
• Associative:
  a(X) + [b(X) + c(X)] = [a(X) + b(X)] + c(X)
  a(X)[b(X)c(X)] = [a(X)b(X)]c(X)
• Distributive:
  a(X)[b(X) + c(X)] = a(X)b(X) + a(X)c(X)
Polynomial Division

• When we divide f(X) by g(X), we get two new polynomials: the quotient q(X) and the remainder r(X).
• The degree of the remainder r(X) is smaller than the degree of g(X).
• Example in GF(2), dividing f(X) = X^5 + X^3 + 1 by g(X) = X^3 + 1:

              X^2 + 1
  X^3 + 1  |  X^5 + X^3       + 1
              X^5       + X^2
              -------------------
                    X^3 + X^2 + 1
                    X^3       + 1
                    -------------
                          X^2

  so q(X) = X^2 + 1 and r(X) = X^2.
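A minimal sketch (Python, same bit-mask representation as above) of GF(2) polynomial long division, reproducing this example:

```python
# Sketch: GF(2) polynomial long division (bit i = coefficient of X^i).
def gf2_divmod(f, g):
    q = 0
    while f and f.bit_length() >= g.bit_length():
        shift = f.bit_length() - g.bit_length()
        q ^= 1 << shift          # next quotient term is X^shift
        f ^= g << shift          # subtract (= XOR) the shifted divisor
    return q, f                  # (quotient, remainder)

q, r = gf2_divmod(0b101001, 0b1001)   # (X^5 + X^3 + 1) / (X^3 + 1)
print(bin(q), bin(r))                 # 0b101 (X^2 + 1), 0b100 (X^2)
```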
ELG 5372 Error Control Coding

Lecture 5: Algebra 3: Irreducible, Primitive and Minimal Polynomials
Irreducible Polynomials

• When f(X) is divided by g(X) and r(X) = 0, then g(X) is a factor of f(X) and we say that f(X) is divisible by g(X). (g(X)|f(X))
• If a polynomial f(X) has no factors other than 1 and itself, then we say that the polynomial is irreducible.
• Furthermore, any reducible polynomial can be expressed as a product of irreducible polynomials, much like any integer can be factored into a product of primes.

Factorization of Polynomials

• For f(X) over GF(q) and β an element of GF(q): if f(β) = 0, then β is a root of f(X) and f(X) is divisible by X - β.
• Examples
  – Over GF(2), if f_0 = 0 for any polynomial, then it is divisible by X.
    • f(X) = X + X^2 has 0 as a root, therefore f(X) = X(1+X). (As we can see, it also has 1 as a root.)
  – Over GF(2), if f(X) has an even number of terms, then f(1) = 0, and therefore (X+1) is a factor of f(X).
    • f(X) = 1 + X + X^3 + X^4: f(1) = 1 + 1 + 1^3 + 1^4 = 1+1+1+1 = 0.
    • f(X) = (1+X^3)(1+X). Furthermore, we can show that 1+X^3 = (1+X)(1+X+X^2).
  – 1+X+X^2 is a polynomial of degree 2. It is irreducible in GF(2).
Factorization of Polynomials (2)

• Suppose we define f(X) = 1 + X + X^2 over GF(4).
• Then f(0) = 1, f(1) = 1, f(α) = 1 + α + α² = α² + α² = 0 and f(α²) = 1 + α² + (α²)² = 1 + α² + α = 0.
• Thus α and α² are roots of 1+X+X^2 in GF(4), and 1+X+X^2 = (X-α)(X-α²) = (X+α)(X+α²).
• The conclusion here is that a polynomial that is irreducible in GF(p) is not necessarily irreducible in GF(p^m).

Theorem 8

• An irreducible polynomial of degree m over GF(p) divides X^{p^m - 1} - 1.
• For a proof of Theorem 8, see R.J. McEliece, Finite Fields for Computer Scientists and Engineers, Boston: Kluwer Academic Publishers, 1988.
• It will also become apparent when we discuss minimal polynomials.
Example of Theorem 8

• We have seen that 1+X+X^2 is irreducible in GF(2). Therefore, according to Theorem 8, it must divide X^3 + 1:

                  X + 1
  X^2 + X + 1  |  X^3           + 1
                  X^3 + X^2 + X
                  ---------------
                        X^2 + X + 1
                        X^2 + X + 1
                        -----------
                                  0

Primitive Polynomials

• An irreducible polynomial f(X) of degree m over GF(p) is said to be primitive if the smallest value of n for which it divides X^n - 1 is n = p^m - 1.
• In other words, although all irreducible polynomials of degree m divide X^n - 1 with n = p^m - 1, some polynomials also divide X^n - 1 for some n < p^m - 1. These polynomials are not primitive.
• 1+X+X^2 is a primitive polynomial over GF(2), as it divides X^3+1 but does not divide X^n+1 for n < 3.
• 1+X+X^4 is an irreducible polynomial over GF(2). It divides X^15+1 but does not divide X^n+1 for n < 15. Therefore it is primitive.
• X^4+X^3+X^2+X+1 is irreducible over GF(2). It divides X^15+1, but it also divides X^5+1. It is, therefore, not primitive.
Theorem 9

• An irreducible polynomial of degree m over GF(p) has roots in GF(p^m) that all have the same order. In other words, if f(X) is a polynomial of degree m that is irreducible over GF(p), and if f(α) = f(β) = 0 in GF(p^m), then ord(α) = ord(β).
  – This will become evident when we discuss conjugacy classes and minimal polynomials.

Theorem 10

• Primitive polynomials of degree m over GF(p) have roots in GF(p^m) of order p^m - 1. In other words, if f(X) is primitive over GF(p) and f(α) = 0 in GF(p^m), then α has order p^m - 1.
  – Proof: combine Theorems 8 and 9.

Consequence of Theorem 10

• If f(X) is a primitive polynomial of degree m over GF(p) and α is a root of f(X) in GF(p^m), then α has order p^m - 1 in GF(p^m) and is therefore a primitive element of GF(p^m).
Example

• GF(4) as an extension field of GF(2).
  – f(X) = X^2 + X + 1 is a primitive polynomial of degree 2 over GF(2).
  – m = 2.
  – A root of f(X) in GF(2²) is a primitive element of GF(2²).
  – Element α is a root of f(X) in GF(4) if α² + α + 1 = 0, i.e., α² = α + 1.
  – Then α¹ = α, α² = α+1 and α³ = α² · α = α² + α = α + 1 + α = (1+1)α + 1 = 1.
Example 2

• GF(8) as an extension field of GF(2).
  – We need a primitive polynomial of degree 3.
  – X^3 + X + 1 is irreducible and divides X^7+1 but does not divide X^n+1 for n < 7. Therefore X^3 + X + 1 is primitive.
  – The element α is a root if α³ = α + 1.
  – GF(8) = {0, α¹ = α, α², α³ = α+1, α⁴ = α²+α, α⁵ = α³+α² = α²+α+1, α⁶ = α³+α²+α = α²+1, α⁷ = α³+α = 1}.
  – Vectorially, GF(8) = {(0,0,0), (0,0,1), (0,1,0), (1,0,0), (0,1,1), (1,1,0), (1,1,1), (1,0,1)}.
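A minimal sketch (Python) generating the nonzero elements of GF(8) as powers of α, using the reduction α³ = α + 1; each element is a 3-bit mask (bit i = coefficient of α^i), matching the vector forms above:

```python
# Sketch: powers of a root alpha of the primitive polynomial X^3 + X + 1.
def times_alpha(x):
    x <<= 1                  # multiply by alpha
    if x & 0b1000:           # a degree-3 term appeared: use alpha^3 = alpha + 1
        x ^= 0b1011          # subtract alpha^3 + alpha + 1 (XOR in GF(2))
    return x

x = 0b010                    # alpha
for k in range(1, 8):
    print(f"alpha^{k} = {x:03b}")
    x = times_alpha(x)
# alpha^1 = 010, alpha^2 = 100, alpha^3 = 011, alpha^4 = 110,
# alpha^5 = 111, alpha^6 = 101, alpha^7 = 001 (= 1)
```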

Minimal Polynomials and Conjugate Elements

• A minimal polynomial is defined as follows:
  – Let α be an element of the field GF(q^m). The minimal polynomial of α with respect to GF(q) is the smallest-degree non-zero polynomial p(X) over GF(q) such that p(α) = 0 in GF(q^m).

Properties of Minimal Polynomials

• For each element α in GF(q^m) there exists a unique, non-zero polynomial p(X) of minimal degree over GF(q) such that the following are true:
  1. p(α) = 0 in GF(q^m).
  2. The degree of p(X) is less than or equal to m.
  3. f(α) = 0 implies that f(X) is a multiple of p(X).
  4. p(X) is irreducible over GF(q).
Conjugates of field elements

• Let β be an element of GF(q^m).
• β^{q^i} is a conjugate of β, where i is an integer.
• Theorem 11
  – The conjugacy class of β is made up of the sequence β, β^q, β^{q^2}, β^{q^3}, ..., β^{q^{d-1}}.
  – If we continue the sequence, β^{q^d} = β, and this is the first element of the sequence to be repeated.
  – d divides m.

See S.B. Wicker, Error Control Systems for Digital Communication and Storage, Upper Saddle River, NJ: Prentice Hall, 1995, pages 55-56 for proof.
Example

• Conjugacy classes of elements of GF(8) wrt GF(2):
  – {1}
  – {α, α², α⁴}
  – {α³, α⁶, α⁵}
• Conjugacy classes of elements of GF(16) wrt GF(4):
  – {1}
  – {α, α⁴}, {α², α⁸}, {α³, α¹²}, {α⁵}
  – {α⁶, α⁹}, {α⁷, α¹³}, {α¹⁰}, {α¹¹, α¹⁴}

Theorem 12

• Let β, an element of GF(q^m), have minimal polynomial p(X) with respect to GF(q).
• The roots of p(X) in GF(q^m) are the conjugates of β with respect to GF(q).
• From Theorem 12 we find that if p(X) is the minimal polynomial of β in GF(q^m) wrt GF(q), then

  p(X) = \prod_{i=0}^{d-1} (X - β^{q^i})
Example

• Minimal polynomials of GF(4) wrt GF(2):
  – {1} -> X + 1
  – {α, α²} -> (X+α)(X+α²) = X^2 + (α+α²)X + α³ = X^2 + X + 1
• Minimal polynomials of GF(8) wrt GF(2):
  – {1} -> X + 1
  – {α, α², α⁴} -> (X+α)(X+α²)(X+α⁴) = X^3 + (α+α²+α⁴)X^2 + (α³+α⁵+α⁶)X + α⁷ = X^3 + X + 1.
  – {α³, α⁵, α⁶} -> (X+α³)(X+α⁵)(X+α⁶) = X^3 + (α³+α⁵+α⁶)X^2 + (α+α²+α⁴)X + α⁷ = X^3 + X^2 + 1.
ELG 5372 Error Control Coding

Lecture 6: (a) Factoring X^n - 1 and (b) Introduction to Linear Block Codes: Vector Spaces
Factoring X^n - 1

• In GF(p^m), the expression X^n - 1 has n roots, β_1, β_2, ..., β_n.
• The order of these roots, ord(β_i), must divide n, and n must divide p^m - 1.
• If we wish to factor X^n - 1 over GF(p), we need to find the minimal polynomials of the elements of GF(p^m) wrt GF(p).
• Consider X^7 + 1 in GF(2).
• We need to determine the extension field of GF(2) in which there are at least 7 roots of this equation with order dividing 7: GF(8).

Factoring X^n - 1

• Since all nonzero elements of GF(8) have order 1 or 7, β⁷ - 1 = 0 for every non-zero element β of GF(8).
• The minimal polynomials of GF(8) wrt GF(2) are polynomials over GF(2) that have the nonzero elements of GF(8) as their roots:
  – {1} -> X + 1
  – {α, α², α⁴} -> (X+α)(X+α²)(X+α⁴) = X^3 + X + 1
  – {α³, α⁵, α⁶} -> (X+α³)(X+α⁵)(X+α⁶) = X^3 + X^2 + 1
  – (X^3+X^2+1)(X^3+X+1)(X+1) = X^7 + 1.

Factoring X^n - 1

• Factoring X^n - 1 when n = p^m - 1 is simple, as we only need to find the minimal polynomials of GF(p^m) wrt GF(p).
  – For example, X^15 + 1 equals the product of all minimal polynomials of GF(16) wrt GF(2).
• However, it is a little more complicated to factor X^n - 1 when n ≠ p^m - 1.
• For example, to factor X^5 + 1 in GF(2), we need to find an extension field in which there are nonzero elements of order 5, or of orders that divide 5.
  – In GF(16), elements must have order 15, 5, 3, or 1.
  – Therefore we choose this field in which to find the roots of X^5 + 1.
Factoring X^n - 1

• We find an element of GF(16) that has order 5.
  – The element α³ has order 5.
  – The conjugacy class of α³ is {α³, α⁶, α¹², α⁹}.
  – The minimal polynomial corresponding to this class is X^4 + X^3 + X^2 + X + 1.
  – There are no other elements of GF(16) that have order 5.
  – The element 1 has order 1, which divides 5. Therefore X+1 must also divide X^5+1.
  – X^5 + 1 = (X^4+X^3+X^2+X+1)(X+1).
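A quick sketch (Python, reusing the bit-mask GF(2) multiplication from Lecture 4) confirming this factorization:

```python
# Sketch: verify (X^4+X^3+X^2+X+1)(X+1) = X^5 + 1 over GF(2).
def gf2_mul(f, g):
    prod, i = 0, 0
    while g >> i:
        if (g >> i) & 1:
            prod ^= f << i
        i += 1
    return prod

print(bin(gf2_mul(0b11111, 0b11)))   # 0b100001 = X^5 + 1
```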
Squaring Polynomials in GF(2)

• Let p(X) = a_0 + a_1 X + a_2 X^2 + ... + a_m X^m. What is p^2(X)?
• In GF(2), a^2 = a. Furthermore, (a+b)^2 = a^2 + b^2 + ab + ab = a^2 + b^2.
• Applying this repeatedly:

  [a_0 + (a_1 X + a_2 X^2 + ... + a_m X^m)]^2 = a_0 + (a_1 X + a_2 X^2 + ... + a_m X^m)^2
  [a_1 X + (a_2 X^2 + ... + a_m X^m)]^2 = a_1 X^2 + (a_2 X^2 + ... + a_m X^m)^2
  ...

  until we find (a_0 + a_1 X + a_2 X^2 + ... + a_m X^m)^2 = a_0 + a_1 X^2 + a_2 X^4 + ... + a_m X^{2m}.
• For example, (X^5 + 1)^2 = X^10 + X^5 + X^5 + 1 = X^10 + 1.
List of Primitive Polynomials of Degree m

• m=2
– X2+X+1
• m=3
– X3+X+1, X3+X2+1
• m=4
– X4+X+1, X4+X3+1
• m=5
– X5+X2+1, X5+X3+1, X5+X3+X2+X+1, X5+X4+X2+X+1
– X5+X4+X3+X+1, X5+X4+X3+X2+1
• m=6
– X6+X+1, X6+X4+X3+X+1, X6+X5+1, X6+X5+X2+X+1
– X6+X5+X3+X2+1, X6+X5+X4+X+1
Introduction to Linear Block Codes: Vector Spaces

• Let V be a set of elements called vectors and let F be a field of elements called scalars. An addition operation + is defined between vectors. A scalar multiplication operation · is defined such that for any a in F and v in V, a·v is also in V. We say that V is a vector space over F if + and · satisfy the following conditions:
  1. V forms a commutative group under +.
  2. For any a in F and v in V, a·v is also in V (hence a·v + b·w is also in V if a, b are in F and v and w are in V, from 1 and 2).
  3. + and · distribute: (a+b)·v = a·v + b·v and a·(v+w) = a·v + a·w.
  4. The operation · is associative: (a·b)·v = a·(b·v).
Example

• Let V be the set of all length-2 vectors over GF(4).
• F = {0, 1, α, α²}
• V = {(0,0), (0,1), (0,α), (0,α²), (1,0), (1,1), (1,α), (1,α²), (α,0), (α,1), (α,α), (α,α²), (α²,0), (α²,1), (α²,α), (α²,α²)}
• Vector addition is done according to GF(4) addition: (a,b) + (c,d) = (a+c, b+d).
• Scalar multiplication is done as a·(b,c) = (a·b, a·c), according to GF(4) multiplication.
• It is then easy to show that V forms a commutative group under addition, that a·v is also in V, that addition and multiplication distribute, and that multiplication is associative.

Linear Combinations

• Let v_1, v_2, ..., v_k be vectors in vector space V over field F.
• Let a_1, a_2, ..., a_k be scalars in field F.
• a_1 v_1 + a_2 v_2 + ... + a_k v_k is a linear combination of the vectors. In matrix form,

  a_1 v_1 + a_2 v_2 + ... + a_k v_k = mG,  where m = [a_1 a_2 ... a_k]

  and G is the matrix whose rows are v_1, v_2, ..., v_k.
Spanning Sets

• Let V be a vector space and let G = {v_1, v_2, ..., v_k}, each in V.
• G is a spanning set of V if all vectors in V can be written as a linear combination of the vectors in G.
  – The set of all vectors obtained from linear combinations of G is called the span of G: span(G).

Example of Spanning Sets

• W = {v_1 = (0,0,0), v_2 = (0,1,1), v_3 = (1,0,0), v_4 = (1,1,1)} over GF(2)
• G = {(0,1,1), (1,0,0)} = {v_2, v_3}
• 0v_2 + 0v_3 = v_1
• 0v_2 + 1v_3 = v_3
• 1v_2 + 0v_3 = v_2
• 1v_2 + 1v_3 = v_4
• Therefore G is a spanning set of W, since span(G) = W.

Example 2

• Let G_2 = {v_2, v_3, v_4}. This is also a spanning set of W because span(G_2) = W.
• However, there are multiple ways to express a vector in W as a linear combination of vectors in G_2.
  – For example, v_4 = 0v_2 + 0v_3 + 1v_4 or v_4 = 1v_2 + 1v_3 + 0v_4.
• This is because G_2 contains vectors that are linearly dependent.
• Definition:
  – The vectors v_1, v_2, ..., v_k are linearly dependent if there exists a set of scalars {a_1, a_2, ..., a_k}, not all 0, for which a_1 v_1 + a_2 v_2 + ... + a_k v_k = 0.
Basis

• A spanning set for vector space V that has the smallest possible number of vectors in it is called a basis for V.
  – A basis is formed by using only linearly independent vectors.
  – If one vector in G is linearly dependent on others in G, it can be removed from the set and the set still spans V.
• Example: V is the set of all binary vectors of length 3.
• V = {(0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1)}.
• Let G_1 = {(0,0,1), (0,1,0), (1,0,0)}.
• Let G_2 = {(0,1,1), (1,1,0), (1,1,1)}.
• G_1 and G_2 each form a basis for V.

Dimension of a Vector Space

• Dim(V) = number of vectors that form a basis for V.
• In the previous example, Dim(V) = 3.
• In the first example, Dim(W) = 2.

Vector Subspace

• Let V be a vector space over F and let W be a subset of V.
• If W forms a vector space, then W is a vector subspace of V.
• In the previous examples, W is a subspace of V.
ELG 5372 Error Control Coding

Lecture 7: Fundamentals of Linear Block Codes

Basic Definitions

• m = (m_0, m_1, ..., m_{k-1}) is the q-ary k-tuple information vector.
• c = (c_0, c_1, ..., c_{n-1}) is the q-ary n-tuple codeword vector.
• We say that c is an element of code C (c ∈ C).
Definition 1

• An (n,k) block code C over an alphabet of q symbols is a set of q^k n-tuples called codewords. Associated with the code is an encoder that maps a message m_i, a q-ary k-tuple, to its associated codeword c_i.

  m --> [Block coder] --> c

Definition 2

• The vector space of all n-tuples over the field F_q is denoted F_q^n.
  – Since F_q^n is the set of all possible n-tuples, the dimension of F_q^n is n.
  – Let W be a k-dimensional vector subspace of F_q^n.
  – Let W' be the set of all vectors in F_q^n that are orthogonal to all vectors in W (w'·w = 0).
  – W' is called the dual space of W, and it can be shown that it has dimension n-k (see text, pages 79-80).

Definition 3

• The (n,k) block code C is a linear block code if and only if its q^k codewords form a k-dimensional vector subspace of F_q^n. The rate of the code is R = k/n.
  – This means that C is a closed set: the sum of any two codewords in C produces another codeword in C.
Definition 4

• The Hamming weight of a codeword c is equal to the number of non-zero elements in the codeword.
  – Example: Hamming (7,4) code

  codeword  HW(c)   codeword  HW(c)
  0000000   0       1000110   3
  0001101   3       1001011   4
  0010111   4       1010001   3
  0011010   3       1011100   4
  0100011   3       1100101   4
  0101110   4       1101000   3
  0110100   3       1110010   4
  0111001   4       1111111   7

Definition 5: Hamming Distance

• The Hamming distance between two codewords in C is the number of positions in which the two codewords differ.
• HD(c_i, c_j) = HW(c_i - c_j)
• For codes that form vector spaces over GF(2^m), c_i - c_j = c_i + c_j.

Definition 6: Minimum Hamming Distance

• The minimum Hamming distance of code C is the smallest Hamming distance between two distinct codewords in the code.
  – Since HD(c_i, c_j) = HW(c_i - c_j), for a linear block code c_i - c_j is another non-zero codeword. Therefore, the minimum Hamming distance of the code is the minimum non-zero Hamming weight of the code.
  – For the Hamming (7,4) example, d_min = 3.
Generator Matrix Description of Linear Block Codes

• Since a linear block code C is a k-dimensional vector space, there exist k linearly independent vectors which form a basis for C.
  – {g_0, g_1, ..., g_{k-1}} form a basis for C.
  – All q^k codewords in C can be expressed as a linear combination of these basis vectors:
    c = m_0 g_0 + m_1 g_1 + ... + m_{k-1} g_{k-1}, where the m_i are elements of GF(q).
• Let m = [m_0 m_1 ... m_{k-1}] and let G be the matrix whose rows are g_0, g_1, ..., g_{k-1}; then c = mG.

Generator Matrix Description of Linear Block Codes (2)

• There are q^k distinct vectors m, therefore there are q^k distinct codewords.
• There are q^k distinct information sequences; therefore m is the information vector (or message).
• G provides the transformation from information to codeword, thus G is referred to as the code generator matrix.
Example

  G1 = [1 1 0 1 1 0]
       [0 1 1 0 1 1]
       [1 1 1 1 1 1]

  m    c        m    c
  000  000000   100  110110
  001  111111   101  001001
  010  011011   110  101101
  011  100100   111  010010

Example 2

  G2 = [1 1 0 1 1 0]
       [0 1 1 0 1 1]
       [1 0 0 1 0 0]

  m    c        m    c
  000  000000   100  110110
  001  100100   101  010010
  010  011011   110  101101
  011  111111   111  001001
Systematic Codes

• A code C is said to be systematic if the original message appears explicitly in the codeword, e.g.

  c = [ m | p ]   or   c = [ p | m ]

  where m is the message and p holds the parity symbols.
• For a systematic linear block code, the generator matrix is called a systematic generator.
Systematic Generators

• G_syst takes the form [I_k | P] or [P | I_k], where I_k is a k × k identity matrix and P is a k × (n-k) matrix which generates the parity symbols.
• For any given G, we can find G_syst by linear combinations of rows.

Example

  G1 = [1 1 0 1 1 0]
       [0 1 1 0 1 1]
       [1 1 1 1 1 1]

  row3 := row1 + row3:        row2 := row2 + row3:        row1 := row1 + row2:

  G2 = [1 1 0 1 1 0]     G3 = [1 1 0 1 1 0]     G_syst = [1 0 0 1 0 0]
       [0 1 1 0 1 1]          [0 1 0 0 1 0]              [0 1 0 0 1 0]
       [0 0 1 0 0 1]          [0 0 1 0 0 1]              [0 0 1 0 0 1]
Example

  m    c        m    c
  000  000000   100  100100
  001  001001   101  101101
  010  010010   110  110110
  011  011011   111  111111
Generator of Dual Code

• Let C be an (n,k) linear block code with generator G.
• Let C' be the dual of C. In other words, C' is made up of all n-tuples that are orthogonal to all n-tuples in C.
• The basis vectors of C' are orthogonal to the basis vectors of C.
• C' is an (n, n-k) linear block code.

How to find the generator of Dual Code

• Let H be the (n-k) × n generator matrix of C'.
• GH^T = the k × (n-k) all-zero matrix.
• Recall that G_syst produces the same code as G.
• G_syst = [I_k | P]
• If H = [P^T | I_{n-k}], then G_syst H^T = P + P = 0 (in the binary case; in general, H = [-P^T | I_{n-k}]).
• This means GH^T = 0.
Example

  G = [1 1 1 0 1]     G_syst = [1 0 1 1 0]     P = [1 1 0]
      [0 1 0 1 1]              [0 1 0 1 1]         [0 1 1]

  H = [1 0 1 0 0]
      [1 1 0 1 0]
      [0 1 0 0 1]

  GH^T = [0 0 0]
         [0 0 0]
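A minimal sketch (Python) that builds H = [P^T | I_{n-k}] from G_syst for this binary example and verifies GH^T = 0 (mod 2):

```python
# Sketch: parity-check matrix from a binary systematic generator.
k, n = 2, 5
G = [[1, 1, 1, 0, 1],
     [0, 1, 0, 1, 1]]
G_syst = [[1, 0, 1, 1, 0],        # row1 + row2 of G (mod 2)
          [0, 1, 0, 1, 1]]

P = [row[k:] for row in G_syst]   # parity part of G_syst = [I_k | P]
H = [[P[j][i] for j in range(k)] + [1 if c == i else 0 for c in range(n - k)]
     for i in range(n - k)]       # H = [P^T | I_{n-k}]

print(H)   # [[1, 0, 1, 0, 0], [1, 1, 0, 1, 0], [0, 1, 0, 0, 1]]
for g in G:
    print([sum(a * b for a, b in zip(g, h)) % 2 for h in H])   # [0, 0, 0] twice
```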
ELG 5372 Error Control Coding

Lecture 8: Parity Check Matrices and Decoding of Linear Block Codes

Parity Check Matrix

• Let C be an (n,k) linear block code over F_q.
• Let G be the generator matrix of C.
• Let H be the generator matrix of C', the (n, n-k) dual code of C.
• Let c be a codeword of C.
• Since c = mG, cH^T = mGH^T = 0_{1,(n-k)}, where 0_{i,j} is an i×j all-zero matrix.
• The H matrix can be used to check that c is a valid codeword; hence it is called the parity check matrix of C.
Example

  G = [1 0 0 0 1 1 0]        H = [1 0 1 1 1 0 0]
      [0 1 0 0 0 1 1]            [1 1 1 0 0 1 0]
      [0 0 1 0 1 1 1]            [0 1 1 1 0 0 1]
      [0 0 0 1 1 0 1]

  [1 1 0 0 1 0 1] H^T = [0 0 0]   (a valid codeword)

  [1 1 0 0 0 0 0] H^T = [1 0 1]   (non-zero syndrome: not a codeword)
Parity check equations

  H = [1 0 1 1 1 0 0]
      [1 1 1 0 0 1 0]
      [0 1 1 1 0 0 1]

• The parity check matrix gives rise to a set of parity check equations:
  c_0+c_2+c_3+c_4 = 0,  c_0+c_1+c_2+c_5 = 0,  c_1+c_2+c_3+c_6 = 0
• Or: c_4 = c_0+c_2+c_3,  c_5 = c_0+c_1+c_2,  c_6 = c_1+c_2+c_3.
Linear Block Code Theorem 1

• Let linear block code C have parity check matrix H. The minimum distance of the code is equal to the smallest positive number of columns of H which are linearly dependent.

• Proof
  Let the column vectors of H be designated h_0^T, h_1^T, ..., h_{n-1}^T, where each h_i is a 1×(n-k) vector. Let c = [c_0 c_1 ... c_{n-1}] be a codeword of C. Then cH^T = c_0 h_0 + c_1 h_1 + ... + c_{n-1} h_{n-1} = 0_{1,(n-k)}.

  Let c be a codeword of C of minimum weight, so HW(c) = d_min. Further, let c be nonzero at indices i_1, i_2, ..., i_{d_min}. Then

  c_{i_1} h_{i_1} + c_{i_2} h_{i_2} + ... + c_{i_{d_min}} h_{i_{d_min}} = 0_{1,(n-k)}.

  Therefore we can find at least one linearly dependent set of d_min column vectors of H. Conversely, if there were a linearly dependent set of fewer than d_min column vectors, then there would have to be a corresponding codeword of weight less than d_min, a contradiction.
Example of LBC Theorem 1

  H = [1 0 1 1 1 0 0]
      [1 1 1 0 0 1 0]
      [0 1 1 1 0 0 1]

• For example, columns 0, 4 and 5 are linearly dependent (they sum to zero), matching the weight-3 codeword 1000110. No single column or pair of columns sums to zero, so d_min = 3.
Rank of a Matrix

• The rank of a matrix is the maximum number of linearly independent rows or columns of the matrix.
  – The column rank is the maximum number of linearly independent columns.
  – The row rank is the maximum number of linearly independent rows.
  – Row rank = column rank.
• For an (n-k) × n H matrix (with full row rank), the row rank is n-k.
• Therefore column rank = n-k, so we cannot find a set of n-k+1 linearly independent column vectors in H.

Singleton Bound

• We know that d_min is the minimum number of linearly dependent column vectors of H, and from the previous slide we know that the maximum number of linearly independent column vectors of H is n-k.
  – d_min ≤ n-k+1.
• Any code that satisfies the Singleton bound with equality is called a maximum distance separable (MDS) code.
Example (4,2) 4-ary code

  G = [1 0 α² α]
      [0 1 1  α]

  m    c       m     c
  00   0000    α0    α01α²
  01   011α    α1    α101
  0α   0ααα²   αα    ααα²0
  0α²  0α²α²1  αα²   αα²αα
  10   10α²α   α²0   α²0α1
  11   11α0    α²1   α²1α²α²
  1α   1α11    α²α   α²α0α
  1α²  1α²0α²  α²α²  α²α²10

  d_min = 3 = 4 - 2 + 1, so the code is MDS.

Example cont'd

  H = [α² 1 1 0]
      [α  α 0 1]

• Three columns of H are linearly dependent:

  α[α²; α] + [1; α] + [0; 1] = [0; 0]

  consistent with d_min = 3.
Hamming Spheres

• Consider a t-error-correcting code.
• A code can correct t errors if d_min ≥ 2t+1.
• A received word at Hamming distance t or less from a codeword is decoded to that codeword.
• The vectors at Hamming distance t or less from a codeword form a "sphere" of radius t around the codeword. This is called a Hamming sphere.
• There are V_q(n,t) vectors of length n within a Hamming sphere of radius t, where

  V_q(n,t) = \sum_{i=0}^{t} (q-1)^i \binom{n}{i}

  (this number includes the given codeword).
Example

• Returning to the 4-ary example shown previously, consider codeword (0000).
  – Using this codeword as the center of the Hamming sphere, there are 13 vectors in a Hamming sphere of radius 1 around this codeword:
  – 0000, 0001, 0010, 0100, 1000, 000α, 00α0, 0α00, α000, 000α², 00α²0, 0α²00, α²000.
  – The above vectors also fall into a Hamming sphere of radius 2 around 0000. All vectors of weight 2 also fall into this sphere (0011, 0110, 1100, 001α, ...). There are 54 weight-2 length-4 vectors over GF(4). Therefore there are 67 vectors in this sphere.
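A minimal sketch (Python) of the sphere-size formula, reproducing these counts:

```python
# Sketch: V_q(n,t) = sum_{i=0}^{t} (q-1)^i * C(n,i), the number of length-n
# q-ary vectors within Hamming distance t of a given vector.
from math import comb

def V(q, n, t):
    return sum((q - 1) ** i * comb(n, i) for i in range(t + 1))

print(V(4, 4, 1))   # 13, as in the example above
print(V(4, 4, 2))   # 67 = 13 + 54 weight-2 vectors
print(V(2, 7, 1))   # 8: Hamming (7,4) meets the Hamming bound with equality
```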
Hamming Bound

• For hard decision decoding, we can express the received word as r = c + e, where e is called the error pattern.
• The codeword c is an element of C, but e is an element of V_q^n (of which C is a subspace); therefore r is an element of V_q^n.
• V_q^n can be divided into Hamming spheres around the codewords of C.
• For a t-error-correcting code, all error patterns of weight t or less can be corrected as long as d_min ≥ 2t+1.
• We can divide the elements of V_q^n into M = q^k non-overlapping spheres of radius t. However, there may exist some elements of V_q^n whose Hamming distance from every codeword in C is greater than t.

Hamming Bound cont'd

• Therefore M·V_q(n,t) ≤ q^n.
• V_q(n,t) ≤ q^n/M  ->  log_q V_q(n,t) ≤ n - log_q M.
• For linear block codes, M = q^k, therefore n-k ≥ log_q V_q(n,t).
• The Hamming bound states that if we want to design a t-error-correcting code, the amount of redundancy needed is greater than or equal to the log of the number of vectors in a Hamming sphere of radius t.
• Example: Hamming (7,4) is a one-error-correcting code.
  – V_2(7,1) = 1+7 = 8.
  – Then n-k ≥ 3.
  – In the Hamming (7,4) case, n-k = 3.

Hamming Bound example 2

• For our (4,2) 4-ary code, d_min = 3, therefore t = 1.
• For any general 1-error-correcting code of length 4 over GF(4), we need n-k ≥ log_4 V_4(4,1) = log_4(13) = 1.85.
• Therefore we need to choose k = 1 or 2 (k < 2.15).
Perfect Code

• A "perfect" code is a code that satisfies the Hamming bound with equality.
  – This title does not imply that the code is the best possible code.
  – It tells us that all elements of V_q^n fall into some Hamming sphere. Therefore a t-error-correcting perfect code corrects all error patterns of weight t but cannot correct any of weight t+1.
• Most block codes (linear and nonlinear) are not perfect.
• Hamming codes, the Golay (23,12) code (a binary 3-error-correcting code) and odd-length repetition codes are examples of perfect codes. See page 89 of the text for a complete list of perfect codes.

Error Detection and Error Correction with Hard Decisions

• Error detection
  – r = c + e
  – S = rH^T (this is called the syndrome).
  – S = (c+e)H^T = cH^T + eH^T = eH^T.
  – When the error pattern is all zero (no error has occurred), the syndrome is all zero.
  – If the syndrome is not all zero, an error is detected.
  – In automatic repeat request (ARQ) schemes, if the syndrome is non-zero, the receiver requests that the sender resend the codeword.
ELG 5372 Error Control Coding

Lecture 9: Decoding of Linear Block Codes and Performance Measures

Error Correction with Hard Decisions

• Error correction techniques:
  – Standard array
  – Syndrome decoding
Standard Array

• A standard array is a table of all of the possible n-tuples in V_q^n.
  – None are missing and none are repeated.
• In the top row of the table are all of the codewords in C.
• C forms a subspace of V_q^n.
• In the rows beneath are all of the cosets of C.
• The leftmost column contains the coset leader of each row.
• The coset leader can be thought of as the most likely error pattern that, when added to each codeword in the code, produces all of the vectors in the coset.
• The coset leader is thus the lowest-weight element of the coset.
• The coset leader for the code itself is the all-0 codeword (it can be viewed as both a codeword and a zero-weight error pattern).
Example

• Standard array for the (5,2) code with codewords {00000, 01011, 10110, 11101} (coset leaders in the left column):

  00000  01011  10110  11101
  00001  01010  10111  11100
  00010  01001  10100  11111
  00100  01111  10010  11001
  01000  00011  11110  10101
  10000  11011  00110  01101
  01100  00111  11010  10001
  11000  10011  01110  00101
Decoding using the standard array

• For a given received word r, we find it in the standard array and follow its column up to the top; that is our decoded word.
• Note that this code, with d_min = 3, can also correct two error patterns of weight two (the last two coset leaders).
• If the error pattern is not among the coset leaders, then a decoding error will occur.
• Example: r = 00110 will be decoded as 10110.
• Example 2: suppose c = 00000 and e = 10111. This will be decoded as 10110 (the decoder assumes that the coset leader is the most likely error pattern).
• Lookup is implemented using a memory device where the address specified by r contains the decoder output c_dec.

Syndrome Decoding

• S = rH^T = (c+e)H^T = eH^T.
• Let v_1 be a coset leader and v_2 be in v_1's coset. Then if e = v_1, S = v_1 H^T.
• If e = v_2, then S = v_2 H^T = (v_1+c)H^T = v_1 H^T.
• If v_1 is the coset leader, it has lower weight than v_2; therefore it is more likely to occur.
• Decoding algorithm:
  – Compute S = rH^T.
  – For a given S, there is a most likely error pattern e_S.
  – Compute c_dec = r - e_S.
Example

• In our previous example,

  G = [1 0 1 1 0]   =>   H = [1 0 1 0 0]
      [0 1 0 1 1]            [1 1 0 1 0]
                             [0 1 0 0 1]

Example: Syndromes (S = rH^T)

  row 0:  00000  01011  10110  11101   S = 000
  row 1:  00001  01010  10111  11100   S = 001
  row 2:  00010  01001  10100  11111   S = 010
  row 3:  00100  01111  10010  11001   S = 100
  row 4:  01000  00011  11110  10101   S = 011
  row 5:  10000  11011  00110  01101   S = 110
  row 6:  01100  00111  11010  10001   S = 111
  row 7:  11000  10011  01110  00101   S = 101
Example

  S    e
  000  00000
  001  00001
  010  00010
  100  00100
  011  01000
  110  10000
  111  01100 or 10001
  101  11000 or 00101
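A minimal sketch (Python) of syndrome decoding for this (5,2) code; the table picks one coset leader where two are equally likely:

```python
# Sketch: syndrome decoding, c_dec = r - e (subtraction = XOR in GF(2)).
H = [[1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 1, 0, 0, 1]]
LEADER = {
    (0, 0, 0): [0, 0, 0, 0, 0], (0, 0, 1): [0, 0, 0, 0, 1],
    (0, 1, 0): [0, 0, 0, 1, 0], (1, 0, 0): [0, 0, 1, 0, 0],
    (0, 1, 1): [0, 1, 0, 0, 0], (1, 1, 0): [1, 0, 0, 0, 0],
    (1, 1, 1): [0, 1, 1, 0, 0], (1, 0, 1): [1, 1, 0, 0, 0],
}

def decode(r):
    s = tuple(sum(a * b for a, b in zip(r, row)) % 2 for row in H)
    e = LEADER[s]
    return [(a + b) % 2 for a, b in zip(r, e)]

print(decode([0, 0, 1, 1, 0]))   # -> [1, 0, 1, 1, 0], as in the example above
```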
Performance of Linear Block Codes

• P(E) is the probability of decoder error (word error rate, WER).
  – This is the probability that the codeword at the output of the decoder is not the same as the transmitted codeword.
• P_b is the probability of bit error.
  – Probability that the decoded message bits are not the same as the original message bits.
• P_u(E) is the probability of undetected error.
  – Probability that errors occurring in a codeword are not detected.
• P_d(E) is the probability of detected codeword error.

Performance of Linear Block Codes (2)

• P_ub is the probability of message bit error in an undetected codeword error.
• P_db is the probability of message bit error in a codeword with a detected error.
• P(F) is the probability of decoder failure. This is the probability that the decoder cannot decode the received vector (but is able to determine that it cannot decode).
Binary Symmetric Channel (BSC)

• Inputs 0 and 1 (with prior probabilities p_0 and p_1); transition probabilities P(1|0) = P(0|1) = p and P(0|0) = P(1|1) = 1-p.
• The channel is memoryless: events occurring in one signaling interval do not affect events occurring in the following signaling intervals.
Error Detection Performance

• r = c + e.
• S = rH^T.
• An error is detected if S ≠ 0.
• Therefore an error goes undetected if the error pattern e is equal to a non-zero codeword.
• For the BSC, P_u(E) is given by

  P_u(E) = \sum_{i=d_min}^{n} A_i p^i (1-p)^{n-i}

  where A_i is the number of codewords of weight i in code C.

Error Detection Performance 2

• The probability of detected error is the probability that the error pattern has weight > 0 and that the error does not go undetected; therefore

  P_d(E) = 1 - (1-p)^n - P_u(E)
Examples

• Suppose p = 0.1.
  – For Hamming (7,4): P_u(E) = 7p^3(1-p)^4 + 7p^4(1-p)^3 + p^7.
  – Then P_u(E) = 0.0051, or 0.51%.
  – For the (5,2) code given at the beginning of the lecture: P_u(E) = 2p^3(1-p)^2 + p^4(1-p) = 0.0017 = 0.17%.
  – Is this a fair comparison?
• For these two examples, the probability of detected error would be:
  – Hamming (7,4): P_d(E) = 1 - (1-p)^7 - P_u(E) = 0.5166
  – (5,2) code: P_d(E) = 1 - (1-p)^5 - P_u(E) = 0.4078
Weight Distribution of the Code

• The weight distribution of a code tells us how many codewords there are in the code of each weight i.
• It is usually expressed as a polynomial:

  A(x) = 1 + A_{d_min} x^{d_min} + A_{d_min+1} x^{d_min+1} + ... + A_n x^n = \sum_{i=0}^{n} A_i x^i

  where A_i is the number of codewords of weight i.
• For a linear block code A_0 = 1, and for any code A_n = 0 or 1.

Bounds on P_u(E) and P_d(E)

• Calculation of P_u(E) and P_d(E) requires that we know the weight distribution of the code.
• For long block codes, the weight distribution is not always known.
• However, we do know that for error detection, all error patterns of weight d_min - 1 and less can be detected.
• Although most error patterns of weight d_min and more can also be detected, we do not know how many. Therefore

  P_u(E) ≤ \sum_{i=d_min}^{n} \binom{n}{i} p^i (1-p)^{n-i} = 1 - \sum_{i=0}^{d_min-1} \binom{n}{i} p^i (1-p)^{n-i}

  P_d(E) ≥ 1 - (1-p)^n - \sum_{i=d_min}^{n} \binom{n}{i} p^i (1-p)^{n-i} = \sum_{i=1}^{d_min-1} \binom{n}{i} p^i (1-p)^{n-i}
Comparing bounds to actual values for our examples

• Applying looser forms of these bounds (keeping only the first terms) to our examples of Hamming (7,4) and the (5,2) code with p = 0.1, we find:
  – Hamming (7,4): P_u(E) ≤ 1 - (1-p)^7 - 7p(1-p)^6 = 0.15 (actual is 0.0051) and P_d(E) ≥ 7p(1-p)^6 = 0.372 (actual is 0.5166).
  – Our (5,2) code: P_u(E) ≤ 1 - (1-p)^5 - 5p(1-p)^4 = 0.081 (actual is 0.0017) and P_d(E) ≥ 5p(1-p)^4 = 0.328 (actual is 0.4078).
Error Correction Performance

• The probability that the decoder produces an incorrect codeword at its output is the probability that the error pattern is not among the correctable error patterns.
  – For example, in Hamming (7,4) the decoder can correct all weight-1 error patterns, so the probability of decoder error is the probability that the error pattern has weight greater than 1.
  – Thus for Hamming (7,4):

  P(E) = \sum_{i=2}^{7} \binom{7}{i} p^i (1-p)^{7-i} = 1 - \sum_{i=0}^{1} \binom{7}{i} p^i (1-p)^{7-i}

Error Correction Performance

• As another example, for our (5,2) linear block code, the decoder can correct all error patterns of weight 1 and 2 error patterns of weight 2:

  P(E) = 1 - \left[ \sum_{i=0}^{1} \binom{5}{i} p^i (1-p)^{5-i} + 2p^2(1-p)^3 \right]
Error correction performance

• For our examples, if p = 0.1:
  – Hamming (7,4): P(E) = 1 - 0.9^7 - 7(0.1)(0.9)^6 = 0.15
  – (5,2) code: P(E) = 1 - 0.9^5 - 5(0.1)(0.9)^4 - 2(0.1)^2(0.9)^3 = 0.067
• In general, for a decoder that corrects at least all error patterns of weight t or less,

  P(E) ≤ 1 - \sum_{i=0}^{t} \binom{n}{i} p^i (1-p)^{n-i}
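A minimal sketch (Python) evaluating these decoder error probabilities at p = 0.1:

```python
# Sketch: P(E) for the two example codes at p = 0.1.
from math import comb

p = 0.1
# Hamming (7,4): corrects exactly the weight-0 and weight-1 patterns.
pe_74 = 1 - sum(comb(7, i) * p**i * (1 - p) ** (7 - i) for i in range(2))
# (5,2) code: corrects all weight-1 patterns plus 2 weight-2 patterns.
pe_52 = 1 - (sum(comb(5, i) * p**i * (1 - p) ** (5 - i) for i in range(2))
             + 2 * p**2 * (1 - p) ** 3)
print(f"P(E) Hamming (7,4) = {pe_74:.3f}")   # ~0.150
print(f"P(E) (5,2) code    = {pe_52:.3f}")   # ~0.067
```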
ELG 5372 Error Control Coding

Lecture 10: Performance Measures: BER after decoding
Error Correction Performance Review

• The probability of incorrectly decoding a received word is the probability that the error pattern is not one of the coset leaders of the standard array.
  – For the Hamming (7,4) case (and for any Hamming code), this is the probability that the error pattern has a weight of 2 or more.
  – For the (5,2) linear code example, it is the probability that the error pattern is not one of the 8 error patterns in the standard array (one of weight 0, 5 of weight 1 and 2 of weight 2).
• Conversely, the probability that the decoder correctly decodes the received word is the probability that the error pattern is one of the coset leaders.
  – Denote this as P_c; then P(E) = 1 - P_c.

Error Correction Performance Review

• Hamming (7,4): P_c = (1-p)^7 + 7p(1-p)^6.
• (5,2) block code: P_c = (1-p)^5 + 5p(1-p)^4 + 2p^2(1-p)^3 if the decoder is a complete decoder.
  – This means that it corrects all coset leaders in its standard array.
  – For imperfect codes, there are some coset leaders with weight greater than t = ⌊(d_min - 1)/2⌋.
Bounded-Distance Decoder

• A bounded-distance decoder does not correct any error patterns of weight greater than t = ⌊(d_min - 1)/2⌋.
• If the code is not perfect, this means that for coset leaders in the standard array that have weight greater than ⌊(d_min - 1)/2⌋, the decoder declares a decoder failure for any received word that falls in that coset.
• This is equivalent to saying that a decoder failure is declared for any received word that does not fall into any Hamming sphere of radius ⌊(d_min - 1)/2⌋.
Examples

• (5,2) code, t = 1. Codewords in the top row, coset leaders in the left column:

  00000  01011  10110  11101   <- codewords
  00001  01010  10111  11100   \
  00010  01001  10100  11111    |
  00100  01111  10010  11001    | correct error
  01000  00011  11110  10101    |
  10000  11011  00110  01101   /
  01100  00111  11010  10001   \  declare failure
  11000  10011  01110  00101   /
Example 2

• A (6,2) code with codewords {000000, 011011, 101101, 110110} (d_min = 4, t = 1):

  000000  011011  101101  110110   <- codewords
  000001  011010  101100  110111   \
  000010  011001  101111  110100    |
  000100  011111  101001  110010    | correct error
  001000  010011  100101  111110    |
  010000  001011  111101  100110    |
  100000  111011  001101  010110   /
  000011  011000  101110  110101   \
  000101  011110  101000  110011    |
  000110  011101  101011  110000    |
  001001  010010  100100  111111    |
  001010  010001  100111  111100    | declare failure
  001100  010111  100001  111010    |
  010100  001111  111001  100010    |
  000111  011100  101010  110001    |
  001110  010101  100011  111000   /
Probability of Decoder Failure

• A bounded-distance decoder declares failure if the received word lies in a row of the standard array corresponding to a coset that is outside all Hamming spheres of radius t, where t = ⌊(d_min - 1)/2⌋.
• The probability of decoder failure, P(F), is the probability that the error pattern is among the cosets found below the "correct error line" of the standard array:

  P(F) = 1 - \sum_{i=0}^{t} \binom{n}{i} p^i (1-p)^{n-i} - P(E)

  where P(E) is now the probability that the error pattern is not a coset leader among all of the n-tuples above the "correct error line".

Example

• For the (5,2) code, a decoder failure is declared if the error pattern is one of the following:
  – 01100, 00111, 11010, 10001, 11000, 10011, 01110, 00101.
  – Therefore P(F) = 4p^2(1-p)^3 + 4p^3(1-p)^2.
  – P_c = (1-p)^5 + 5p(1-p)^4.
  – Therefore P(E) = 1 - P_c - P(F) = 1 - (1-p)^5 - 5p(1-p)^4 - 4p^2(1-p)^3 - 4p^3(1-p)^2.
Probability of bit error (after decoding)

• When the decoder "corrects" an error, there are two scenarios:
  – The decoder correctly corrects the codeword (ĉ = c).
  – The decoder incorrectly corrects the codeword (ĉ = another codeword = c + c_c).

  r --> [decoder] --> ĉ = c + c_c

Probability of bit error 2

• Consider the all-0 codeword of the Hamming (7,4) systematic code:
  – There are 7 other codewords at distance 3 from this codeword.
  – There are 7 other codewords at distance 4 from this codeword.
  – There is 1 codeword at distance 7 from this codeword.
• Since the code is linear, it is a subspace of V_2^7.
• If we select any codeword and add it to all of the vectors in the code, the result is the code itself.
• This means that all codewords in the code have the same distance profile as the all-0 codeword.
Probability of bit error 3

• Assuming that the all 0 codeword is transmitted, then the


decoder is successful if ĉ = 0000000.
• If a decoder error occurs, then ĉ will be another codeword in the
code.
• If the error pattern has weight 0 or 1, ĉ = 0000000.
• If the error pattern has weight 2 an error will occur
– Since the code is a one error correcting code, it will change
one bit from a 1 to a 0 (will change received word from
weight 2 to weight 1) or from a 0 to a 1 (will change received
word from weight 2 to weight 3).
– ĉ must have weight 3.
Probability of bit error 4

• If the error pattern has weight 3


– If e = a codeword of weight 3, then the decoder will not
change any bits and ĉ will have weight 3.
– If e ≠ a codeword, then the decoder will attempt to correct by
inverting a bit. Since there are no codewords of weight 2,
the result here is that ĉ will have weight 4.
– There are 7 codewords of weight 3 and 7!/(3!4!)-7 = 28
weight 3 error patterns that are not codewords. All error
patterns of weight 3 are equally likely. Therefore if the error
pattern has weight 3, there is a probability of 0.2 that ĉ will
have weight 3 and 0.8 probability that it will have weight 4.
Probability of bit error 5
• If the error pattern has weight 4:
– ĉ will have weight 4 with probability 0.2 and weight 3 with
probability 0.8.
• If error pattern has weight 5
– ĉ will have weight 4.
• If error pattern has weight 6
– ĉ will have weight 7
• If error pattern has weight 7
– ĉ will have weight 7.
• Since the transmitted codeword is the all 0 codeword, then the
number of bit errors in the decoded codeword is the weight of
the decoded codeword.
Probability of bit error 6
• For block codes, how does bit error rate in decoded codeword
translate to bit error rate in decoded message?
• Let us consider systematic codes for ease of illustration.
• Assume that the all 0 codeword is transmitted. The first 4 bits
are the message.
• Let us assume that the decoded codeword has weight 3.
(therefore 3/7 of the code bits are in error).
Probability of bit error 7
codeword  HW(c)   codeword  HW(c)
0000000   0       1000110   3
0001101   3       1001011   4
0010111   4       1010001   3
0011010   3       1011100   4
0100011   3       1100101   4
0101110   4       1101000   3
0110100   3       1110010   4
0111001   4       1111111   7
Probability of bit error 8
• 0001101, 0011010, 0100011, 0110100, 1000110, 1010001, 1101000.
• We can show that if we transmit the all 0 codeword and the decoded codeword has 3 errors in it, it is equally likely that any one of the above is the erroneous codeword.
• Therefore the probability that a message bit is in error when the decoded codeword has 3/7 bits in error is (1/4 + 2/4 + 1/4 + 2/4 + 1/4 + 2/4 + 3/4)/7 = 12/28 = 3/7.
• It is equal to the code bit error rate. We can show the same
thing for when the decoded codeword has 4/7 bits in error or
when the decoded codeword has 7/7 bits in error.
• This is because errors are uniformly distributed among the info
and parity bits.
Probability of bit error 9
• Therefore Pb for Hamming (7,4) is given by:

Pb = (3/7)·C(7,2)p^2(1−p)^5 + (0.2·(3/7) + 0.8·(4/7))·C(7,3)p^3(1−p)^4 + (0.8·(3/7) + 0.2·(4/7))·C(7,4)p^4(1−p)^3 + (4/7)·C(7,5)p^5(1−p)^2 + C(7,6)p^6(1−p) + C(7,7)p^7

Pb = 9p^2(1−p)^5 + 19p^3(1−p)^4 + 16p^4(1−p)^3 + 12p^5(1−p)^2 + 7p^6(1−p) + p^7

• Of course, the above equation is for the case when the all 0 codeword is transmitted. By using distance rather than weight arguments, we can show that this is the probability of bit error for any transmitted codeword in the code.
Probability of bit error 10
• The equation derived for the probability of bit error of the Hamming (7,4) code is long and requires knowledge of the weight distribution of the code.
• In many cases we don't know the exact weight distribution of the code.
• Therefore, Pb is estimated using bounds.
Probability of bit error bounds

• Given the probability of decoder error P(E):
– If a decoder error has occurred, at least one of the message bits must be in error.
– If a decoder error has occurred, at most all of the message bits are in error.
• (1/k)P(E) ≤ Pb ≤ P(E).
• Let us consider the AWGN channel with BPSK modulation:

p = Q(√(2EbR/N0))

[Plot: BER versus Eb/No (dB), comparing p (uncoded), the upper bound P(E), the lower bound (1/k)P(E), and the exact Pb.]
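A numerical sketch of this comparison for the Hamming (7,4) code (Python; Q implemented with erfc, and the exact Pb expression is the one derived above). Since Hamming codes are perfect, P(E) here is simply the probability of more than t channel errors.

```python
import math

def Q(x):                                  # Gaussian tail: Q(x) = erfc(x/sqrt(2))/2
    return 0.5 * math.erfc(x / math.sqrt(2))

n, k, t = 7, 4, 1
R = k / n
for ebno_db in range(0, 13, 2):
    ebno = 10 ** (ebno_db / 10)
    p_unc = Q(math.sqrt(2 * ebno))         # uncoded BPSK BER
    p = Q(math.sqrt(2 * R * ebno))         # coded-bit (channel) error rate
    # Hamming codes are perfect, so P(E) = P(more than t channel errors)
    PE = 1 - sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
                 for i in range(t + 1))
    Pb = (9*p**2*(1-p)**5 + 19*p**3*(1-p)**4 + 16*p**4*(1-p)**3
          + 12*p**5*(1-p)**2 + 7*p**6*(1-p) + p**7)
    print(f"{ebno_db:2d} dB  p={p_unc:.2e}  lower={PE/k:.2e}  "
          f"exact={Pb:.2e}  upper={PE:.2e}")
```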
ELG 5372 Error Control Coding

Lecture 11: Erasure Decoding, Modifications to Linear Codes, and Introduction to Cyclic Codes
Erasure Decoding

• An erasure is a symbol where the probability of error is high.
– For example, in BPSK, if the decision variable is close to 0,
the certainty of the detection is low. Decoder may declare
this symbol to be an erasure.
– In packet transmissions, codewords may be interleaved over
multiple packets. If one packet is not received, then the
symbols contained in that packet are “erased”.
• When the decoder declares an erasure, then we essentially
have a symbol error with a known location.
Erasure decoding (2)

• Consider the all 0 codeword in Hamming (7,4). Assume we receive the following:
– 000X00X, where X is an erasure.
– 0000000, 0001000, 0000001 and 0001001 are the possible filled-in vectors.
– Decoding the first yields 0000000
– Decoding the second yields 0000000
– Decoding the third yields 0000000
– Decoding the fourth yields 0001101
• Comparing all to the non-erased bits of the received word, the fourth has 1 bit different, whereas the other three are the same. Therefore, the decoder outputs 0000000.
Erasure decoding 3

• We don’t need to consider all combinations.
• Only when replacing all erasures with 0 and all erasures with 1
do we get distinct outputs from the decoder.
– Binary erasure decoding algorithm
1. Place 0’s in all erased coordinates and decode as c0.
2. Place 1’s in all erased coordinates and decode as c1
3. Output codeword for which HD(ci,r) is minimum.
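A minimal Python sketch of this algorithm for the Hamming (7,4) code of the running example; brute-force nearest-codeword search stands in for the decoder.

```python
from itertools import product

# Generator rows = the four weight-1-message codewords of this Hamming (7,4) code
G = ["1000110", "0100011", "0010111", "0001101"]

def xor(a, b):
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

codewords = []
for bits in product([0, 1], repeat=4):
    c = "0" * 7
    for b, row in zip(bits, G):
        if b:
            c = xor(c, row)
    codewords.append(c)

def hd(a, b, skip=()):          # Hamming distance, ignoring erased positions
    return sum(x != y for i, (x, y) in enumerate(zip(a, b)) if i not in skip)

def decode(word):               # brute-force minimum-distance decoder
    return min(codewords, key=lambda c: hd(word, c))

def erasure_decode(r):
    erased = [i for i, s in enumerate(r) if s == "X"]
    c0 = decode(r.replace("X", "0"))   # step 1: fill erasures with 0
    c1 = decode(r.replace("X", "1"))   # step 2: fill erasures with 1
    # step 3: keep the candidate closest to r on the non-erased positions
    return min((c0, c1), key=lambda c: hd(r, c, skip=erased))

print(erasure_decode("000X00X"))       # -> 0000000
```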
Erasure capability of code

• Consider a linear block code with minimum distance dmin.
• A single erased symbol leaves a code with minimum distance at
least dmin-1.
• Therefore f erased symbols can be filled provided f < dmin.
– In previous example, assuming no errors in the non erased
bits, only 1 codeword has all zeros in the non-erased bits.
• If there are errors as well as erasures: For a code experiencing
f erasures, then the minimum distance for the code left by the
non-erased symbols is at least dmin-f.
• The number of errors that can be corrected is:

t_f = ⌊(dmin − f − 1)/2⌋

• Therefore 2e + f < dmin.
Why does binary erasure algorithm work?
• Suppose we have f erasures and e errors, such that 2e+f < dmin.
• Replacing all erasures by 0 introduces e0 errors into the
received codeword, therefore we have e+e0 total errors. Also e0
≤ f.
• Replacing all erasures by 1 introduces e1 errors into the
received codeword, therefore we have e+e1 total errors. Also e1
≤ f and e0+e1 = f.
• In the worst case, e0 = e1 = f/2. Therefore both words to be decoded contain e+f/2 errors.
• If e0 ≠ e1, then one of the two words has fewer than e+f/2 errors. Since 2(e+f/2) = 2e+f < dmin, there is always one word whose error count is within the error correcting capability of the code.
Non Binary Erasure Decoding
• For non binary codes, erasure decoding is more complicated and depends on the structure of the code.
• Erasure decoding is popular for decoding of RS codes.
• Erasure decoding of RS codes will be discussed later in the course.
Modifications to Linear Codes

• Extending a code
– An (n,k,d) code is extended by adding an additional redundant coordinate to produce an (n+1,k,d+1) code.
– For example we can use even parity to extend Hamming (7,4) to an (8,4) code with dmin = 4.

G = ⎡1 0 0 0 1 1 0 1⎤    H = ⎡1 0 1 1 1 0 0 0⎤
    ⎢0 1 0 0 0 1 1 1⎥        ⎢1 1 1 0 0 1 0 0⎥
    ⎢0 0 1 0 1 1 1 0⎥        ⎢0 1 1 1 0 0 1 0⎥
    ⎣0 0 0 1 1 0 1 1⎦        ⎣1 1 0 1 0 0 0 1⎦
Modifications to Linear Codes 2
• A code is punctured by deleting one of its parity bits:
– An (n,k) code becomes an (n−1, k) code.
– If the punctured symbol is in a non-zero coordinate of the minimum weight codeword, the minimum distance will also be reduced by 1.
– Puncturing corresponds to removing a column from the generator matrix.
Modifications to Linear Codes 3
• Expurgating a code means to produce a new code by deleting some of its codewords:
– (n,k) → (n, k−1).
– The result may or may not be a linear block code.
– The minimum distance cannot decrease, but it may increase.
• Augmenting a code is achieved by adding codewords:
– (n,k) → (n, k+1)
– The new code may or may not be linear.
– Distance may decrease.
• A code is shortened by deleting a message symbol:
– (n,k) → (n−1, k−1)
• A code is lengthened by adding a message symbol:
– (n,k) → (n+1, k+1)
Introduction to Cyclic Codes
• For linear block codes, the standard array (or the syndrome
lookup) can be used for decoding.
• However, for long codes, the storage and computation time of
this method can be prohibitive.
• There is no mechanism by which we can design a generator
matrix (or parity check matrix) to achieve a given minimum
distance.
• Cyclic codes are based on polynomial operations.
Basic Definitions
• Let c = (c0, c1, …, c_{n−1}) be a codeword.
• Let cR = (c_{n−1}, c0, c1, …, c_{n−2}) be a right cyclic shift of c.
• Let cL = (c1, c2, …, c_{n−1}, c0) be a left cyclic shift of c.
• We can show that cL = cRR…R (n−1 times).
• Definition of a cyclic code:
– Let C be a linear (n,k) block code. C is a cyclic code if for every codeword c in C, cR is also in C.
Example
codeword  HW(c)   codeword  HW(c)
0000000   0       1000110   3
0001101   3       1001011   4
0010111   4       1010001   3
0011010   3       1011100   4
0100011   3       1100101   4
0101110   4       1101000   3
0110100   3       1110010   4
0111001   4       1111111   7

It is easy to see that this code is a cyclic code.
Polynomial representation
• c = (c0, c1, …, c_{n−1}) → c(x) = c0 + c1x + … + c_{n−1}x^{n−1}.
• A shift (not cyclic) is thus xc(x) = c0x + c1x^2 + … + c_{n−1}x^n.
• Vectorially, this is represented as (0, c0, c1, c2, …, c_{n−1}).
• Let p(x) be a polynomial and let d(x) be a divisor. Then p(x) = q(x)d(x) + r(x), where q(x) is the quotient and r(x) is the remainder.
• cR = (c_{n−1}, c0, …, c_{n−2}) → cR(x) = c_{n−1} + c0x + … + c_{n−2}x^{n−1}.
• Therefore xc(x) = cR(x) + c_{n−1}x^n − c_{n−1}.
• Or xc(x) = c_{n−1}(x^n − 1) + cR(x).
• cR(x) is the remainder when we divide xc(x) by x^n − 1.
• cR(x) = xc(x) mod (x^n − 1).
Example

• (0001101) = x^3 + x^4 + x^6.
• xc(x) = x^4 + x^5 + x^7.
• (x^7+x^5+x^4) ÷ (x^7+1) = 1, remainder 1 + x^4 + x^5 = (1000110).
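The same computation in a few lines of Python, with a GF(2) polynomial held as an integer bit mask (bit i ↔ coefficient of x^i); gf2_mod is a hypothetical helper name.

```python
def gf2_mod(a, m):                 # remainder of a(x) / m(x) over GF(2)
    while a and a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a

n = 7
c = 0b1011000                      # c(x) = x^3 + x^4 + x^6  <->  (0001101)
cR = gf2_mod(c << 1, (1 << n) | 1) # x*c(x) mod (x^7 + 1)
print("".join(str((cR >> i) & 1) for i in range(n)))   # -> 1000110
```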
Rings
• A ring R is a set with two binary operations defined on it (+ and
•) such that
1. R is a commutative group over +. The additive identity is
denoted by 0.
2. The • operation (multiplication) is associative (a•b) •c = a•(b•c).
3. The left and right distributive laws apply:
• a•(b+c) = a•b+a•c
• (a+b) •c = a•c+b•c
4. The ring is said to be a commutative ring if a•b = b•a for every a
and b in R.
• The ring is a ring with identity if there exists a multiplicative
identity denoted as 1.
• Multiplication need not form a group and there may not be a
multiplicative inverse
Rings (2)
• Some elements in a ring with identity may have a multiplicative
inverse.
• For an a in R, if there exist another element such that a•a-1 = 1,
then a is referred to as a unit of R.
• Example: Z4
+ | 0 1 2 3      • | 0 1 2 3
0 | 0 1 2 3      0 | 0 0 0 0
1 | 1 2 3 0      1 | 0 1 2 3
2 | 2 3 0 1      2 | 0 2 0 2
3 | 3 0 1 2      3 | 0 3 2 1

• Although Z4 does not form a group over multiplication, it does satisfy the requirements to be a ring over + and •.
Rings (3)
• We can also show that GF(q = p^m) forms a ring over + and •.

+ | 0 1      × | 0 1
0 | 0 1      0 | 0 0
1 | 1 0      1 | 0 1

For example, it is easy to show that GF(2) satisfies the requirements of being a ring.
Rings of Polynomials
• Let R be a ring and let f(x) be a polynomial of degree n with coefficients in R (a_n ≠ 0):

f(x) = Σ_{i=0}^{n} a_i x^i

• The symbol x is called an indeterminate.
• The set of all polynomials with indeterminate x and coefficients in R forms a ring called a polynomial ring (arithmetic is defined as in R).
– We denote this as R[x].
Examples
• Z4[x] contains all polynomials with coefficients from Z4.
– (2+3x) + (1+2x+x^3) = 3+x+x^3.
– (2+3x)(1+2x+x^3) = 2 + (4+3)x + 6x^2 + 2x^3 + 3x^4 = 2 + 3x + 2x^2 + 2x^3 + 3x^4.
• GF(2)[x] is a ring of polynomials whose coefficients are either 0 or 1, with operations in modulo-2 arithmetic.
– (1+x)(1+x) = 1+x^2.
– (1+x+x^3)(1+x^2+x^3)(1+x) = 1+x^7.
– (1+x+x^2)+(x+x^3) = 1+x^2+x^3.
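These GF(2)[x] products are easy to check mechanically; a small sketch using the same integer bit-mask representation (carry-less multiplication):

```python
def gf2_mul(a, b):                 # carry-less product = GF(2)[x] multiplication
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

p = gf2_mul(0b1011, 0b1101)        # (1 + x + x^3)(1 + x^2 + x^3)
p = gf2_mul(p, 0b11)               # ... * (1 + x)
print(bin(p))                      # 0b10000001, i.e. 1 + x^7
```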
Quotient Rings
• Consider the ring of polynomials GF(2)[x].
• Let S0 be the set of all polynomials that are divisible by x^n+1.
– S0 = {0, x^n+1, x^{n+1}+x, x^{n+1}+x^n+x+1, …}
– For simplicity, let n = 3.
– Therefore S0 = {0, x^3+1, x^4+x, x^4+x^3+x+1, …}
• Let S1 be the set of polynomials for which f(x) mod (x^3+1) = 1:
– S1 = {1, x^3, x^4+x+1, x^4+x^3+x, …} = 1+S0.
• Let S2 be the set of polynomials for which f(x) mod (x^3+1) = x:
– S2 = {x, x^3+x+1, x^4, x^4+x^3+1, …} = x+S0.
Quotient Rings (2)

• S3 = all polynomials mod (x^3+1) = x+1 = x+1+S0.
• S4 = all polynomials mod (x^3+1) = x^2 = x^2+S0.
• S5 = all polynomials mod (x^3+1) = x^2+1 = x^2+1+S0.
• S6 = all polynomials mod (x^3+1) = x^2+x = x^2+x+S0.
• S7 = all polynomials mod (x^3+1) = x^2+x+1 = x^2+x+1+S0.
• We can see that S0–S7 form the cosets of GF(2)[x] under addition.
• Had we taken n = 4, we would have found 16 cosets; n = 5, 32 cosets, etc.
Quotient Rings (3)
+  | S0 S1 S2 S3 S4 S5 S6 S7
S0 | S0 S1 S2 S3 S4 S5 S6 S7
S1 | S1 S0 S3 S2 S5 S4 S7 S6
S2 | S2 S3 S0 S1 S6 S7 S4 S5
S3 | S3 S2 S1 S0 S7 S6 S5 S4
S4 | S4 S5 S6 S7 S0 S1 S2 S3
S5 | S5 S4 S7 S6 S1 S0 S3 S2
S6 | S6 S7 S4 S5 S2 S3 S0 S1
S7 | S7 S6 S5 S4 S3 S2 S1 S0
Quotient Rings (4)

•  | S0 S1 S2 S3 S4 S5 S6 S7
S0 | S0 S0 S0 S0 S0 S0 S0 S0
S1 | S0 S1 S2 S3 S4 S5 S6 S7
S2 | S0 S2 S4 S6 S1 S3 S5 S7
S3 | S0 S3 S6 S5 S5 S6 S3 S0
S4 | S0 S4 S1 S5 S2 S6 S3 S7
S5 | S0 S5 S3 S6 S6 S3 S5 S0
S6 | S0 S6 S5 S3 S3 S5 S6 S0
S7 | S0 S7 S7 S0 S7 S0 S0 S7
Quotient Rings (5)

• Let R = {S0, S1, …, S7}.
– R forms a commutative group under + with S0 as identity.
– R−S0 does not form a group under •.
– Therefore R is not a field.
– However, R does form a ring, with S1 as multiplicative identity.
ELG 5372 Error Control Coding

Lecture 12: Ideals in Rings and Algebraic Description of Cyclic Codes
Quotient Ring Example

+  | S0 S1 S2 S3 S4 S5 S6 S7
S0 | S0 S1 S2 S3 S4 S5 S6 S7
S1 | S1 S0 S3 S2 S5 S4 S7 S6
S2 | S2 S3 S0 S1 S6 S7 S4 S5
S3 | S3 S2 S1 S0 S7 S6 S5 S4
S4 | S4 S5 S6 S7 S0 S1 S2 S3
S5 | S5 S4 S7 S6 S1 S0 S3 S2
S6 | S6 S7 S4 S5 S2 S3 S0 S1
S7 | S7 S6 S5 S4 S3 S2 S1 S0
Quotient Ring Example

•  | S0 S1 S2 S3 S4 S5 S6 S7
S0 | S0 S0 S0 S0 S0 S0 S0 S0
S1 | S0 S1 S2 S3 S4 S5 S6 S7
S2 | S0 S2 S4 S6 S1 S3 S5 S7
S3 | S0 S3 S6 S5 S5 S6 S3 S0
S4 | S0 S4 S1 S5 S2 S6 S3 S7
S5 | S0 S5 S3 S6 S6 S3 S5 S0
S6 | S0 S6 S5 S3 S3 S5 S6 S0
S7 | S0 S7 S7 S0 S7 S0 S0 S7
Quotient Ring
• Recall the quotient ring R = {S0, S1, …, S7}, where Si is the set of all polynomials in GF(2)[x] with a given remainder ax^2+bx+c when divided by x^3+1.
• Let's identify each coset by its lowest degree polynomial:
• S0 = 0, S1 = 1, S2 = x, S3 = x+1, S4 = x^2, S5 = x^2+1, S6 = x^2+x and S7 = x^2+x+1.
• Let Rnew = {0, 1, x, x+1, x^2, x^2+1, x^2+x, x^2+x+1}, where addition is defined as conventional polynomial addition and multiplication is conventional polynomial multiplication, both followed by computing the remainder modulo x^3+1.
• Then Rnew is a ring as well.
– We denote this ring as GF(2)[x]/(x^3+1).
Quotient Ring
• For Rnew, (x+1)(x^2+x+1) = 0, and for R, S3S7 = S0.
– These rings are called isomorphic rings.
GF(q)[x]/f(x)
• For a field GF(q), the ring of polynomials can be partitioned by a polynomial f(x) of degree m into q^m different equivalence classes:
– One equivalence class for each remainder modulo f(x).
• This ring is denoted as GF(q)[x]/f(x).
• It can be a field, but only if f(x) is irreducible in GF(q).
• In our example GF(2)[x]/(x^3+1), it is not a field as x^3+1 = (x+1)(x^2+x+1).
Ideals in Rings
• Let R be a ring.
• Let I be a non-empty subset of R.
• I is an ideal if it satisfies the following conditions:
1. I forms a group under the addition operation of R.
2. For any a in I and any r in R, a•r is in I.
Example
• Consider the ring R (GF(2)[x]/(x^3+1)).
• Trivial cases: {0} and {R} are ideals.
• I = {S0, S7}
– This set forms a group under addition.
– S0 • any element in R = S0.
– S7 • any element in R = S0 or S7.
• Consider Rnew.
• I = {0, x^2+x+1}
– Forms a group under addition.
– 0(x^2+x+1) = 0; 1(x^2+x+1) = x^2+x+1; x(x^2+x+1) = x^3+x^2+x mod (x^3+1) = x^2+x+1; (x+1)(x^2+x+1) = x(x^2+x+1)+(x^2+x+1) = 0; x^2(x^2+x+1) = x(x(x^2+x+1)) = x(x^2+x+1) = x^2+x+1; etc.
Examples
• In GF(2)[x]/(x^3+1), {S0, S3, S5, S6} also form an ideal.
• Since GF(2)[x]/(x^3+1) and Rnew are isomorphic, we can see that {0, x+1, x^2+1, x^2+x} form an ideal in Rnew.
• Vectorially, the ideal in Rnew is {(000), (011), (101), (110)}.
• The above ideal satisfies all the conditions of a cyclic code.
Principal Ideal
• An ideal, I, in ring R is said to be principal if there exists some element g in I such that every element a in I can be expressed as a = mg, where m is an element in R. The element g is called the generator element.
– I = {S0, S7}: S0 and S7 can be expressed as multiples of S7. Therefore g = S7 and this ideal is said to be principal.
– I = {S0, S3, S5, S6}: either S3, S5 or S6 can act as the generator g. This ideal is also said to be principal.
Cyclic Code Theorem 1
• Let I be an ideal in GF(q)[x]/(x^n−1). Then:
1. There exists a unique monic polynomial g(x) in I of minimal degree.
2. I is principal with generator g(x).
3. g(x) divides x^n−1 in GF(q)[x].
– A polynomial of degree m, g(x) = g_m x^m + g_{m−1} x^{m−1} + … + g_0, is monic if g_m = 1.
Proof of CCT 1
• There is at least one ideal in any ring (since the entire
ring is an ideal).
• There is a lower bound on the degrees of the
polynomials in the ideal. Hence there is at least one
polynomial in the ideal of minimal degree (which may
have to be normalized to be monic).
• To show uniqueness, suppose there are two monic
polynomials of minimal degree in the ideal, g(x) and
f(x). Then h(x) = g(x)-f(x) is also an element in I. h(x)
would have a smaller degree than g(x) and f(x),
therefore this contradicts the statement that g(x) and
f(x) are of minimal degree.
Proof of CCT 1 cont’d
• To show I is principal (all elements of I are multiples of g(x)), we assume that there exists a polynomial f(x) in I for which f(x) = m(x)g(x) + r(x), where m(x) and r(x) are in R.
• Since r(x) is the remainder, it has degree less than
g(x).
• The definition of an Ideal tells us that m(x)g(x) is in I.
Then r(x) = f(x)-m(x)g(x) must also be in I. But since
r(x) has a smaller degree than g(x), it contradicts the
statement that g(x) is a polynomial of minimal degree
in I. Therefore the only solution is that r(x) = 0 and
f(x) is a multiple of g(x).
Proof of CCT 1 cont’d
• To show that g(x) divides x^n−1, we assume that it doesn't.
• Then x^n−1 = h(x)g(x)+r(x), where r(x) has degree less than the degree of g(x).
• But h(x)g(x) is in I, and r(x) = x^n−1 − h(x)g(x), the additive inverse of h(x)g(x) in the quotient ring (where x^n−1 ≡ 0), is also in I, again contradicting the statement that g(x) is a polynomial of minimal degree in I.
• Therefore r(x) = 0 and g(x) divides x^n−1.
Example
• Consider the ring GF(2)[x]/(x^4+1):
{0, 1, x, x+1, x^2, x^2+1, x^2+x, x^2+x+1, x^3, x^3+1, x^3+x, x^3+x+1, x^3+x^2, x^3+x^2+1, x^3+x^2+x, x^3+x^2+x+1}
• x^4+1 = (x+1)(x^3+x^2+x+1) = (x^2+1)(x^2+1).
• Therefore we have the following principal ideals:
– {0}, {0, x+1, x^2+x, x^2+1, x^3+x^2, x^3+x^2+x+1, x^3+1, x^3+x}, {0, x^3+x^2+x+1}, {0, x^2+1, x^3+x, x^3+x^2+x+1}
Example cont’d
• {0, x+1, x^2+x, x^2+1, x^3+x^2, x^3+x^2+x+1, x^3+1, x^3+x} → {(0000), (1100), (0110), (1010), (0011), (1111), (1001), (0101)}.
• We can see that the above is a cyclic code.
• Cyclic codes of length n form an ideal in GF(q)[x]/(x^n−1).
• They can be described by their generator polynomial g(x), which is a polynomial that divides x^n−1.
Algebraic Description of Cyclic Codes
• cR(x) = xc(x) mod (x^n−1).
• Think of c(x) as an element of GF(q)[x]/(x^n−1).
– Since the arithmetic is done modulo x^n−1, we simply state that cR(x) = xc(x).
• For an (n,k) cyclic code, the generator polynomial, g(x), is the generator of an ideal in GF(q)[x]/(x^n−1).
– The degree of g(x) is n−k.
• c(x) = m(x)g(x), where m(x) is a polynomial representing a message to be encoded, with deg(m(x)) < k.
– If m(x) had degree greater than k−1, then c(x) would have degree greater than n−1 (before dividing by x^n−1 and finding the remainder). After performing the modulo operation, c(x) would not be a distinct codeword.
Example: Binary Cyclic Codes of Length 7
• x^7−1 = (x+1)(x^3+x+1)(x^3+x^2+1).
• (7,6) code: g(x) = x+1
• (7,4) code: g1(x) = x^3+x+1 or g2(x) = x^3+x^2+1.
• (7,3) code: g3(x) = (x+1)(x^3+x+1) = x^4+x^3+x^2+1 or g4(x) = (x+1)(x^3+x^2+1) = x^4+x^2+x+1.
• (7,1) code: g5(x) = (x^3+x+1)(x^3+x^2+1) = x^6+x^5+x^4+x^3+x^2+x+1.
ELG 5372 Error Control Coding

Lecture 13: Nonsystematic and Systematic Encoding of Cyclic Codes
Nonsystematic Encoding of Cyclic Codes
• The message vector m = [m0, m1, …, m_{k−1}].
• Let this correspond to a message polynomial m(x) = m0 + m1x + … + m_{k−1}x^{k−1}.
• The code polynomial is obtained by polynomial multiplication:
c(x) = m(x)g(x) = m0g(x) + m1xg(x) + … + m_{k−1}x^{k−1}g(x).
• In matrix form:

c(x) = [m0 m1 … m_{k−1}] ⎡g(x)       ⎤
                         ⎢xg(x)      ⎥
                         ⎢⋮          ⎥
                         ⎣x^{k−1}g(x)⎦
Nonsystematic Encoding of Cyclic Codes (2)

• Therefore we can write:

c = [m0 m1 … m_{k−1}] ⎡g0 g1 … g_{n−k} 0       0 … 0      ⎤
                      ⎢0  g0 g1 …      g_{n−k} 0 … 0      ⎥
                      ⎢⋮                  ⋱              ⋮ ⎥
                      ⎣0  …  0  g0     g1      …  g_{n−k} ⎦
c = mG
Example
• For a length 7 code, x^7+1 = (x+1)(x^3+x+1)(x^3+x^2+1).
• For a (7,3) code, g(x) = (x+1)(x^3+x^2+1) = x^4+x^2+x+1.

m(x)      m    c(x)              c
0         000  0                 0000000
1         100  x^4+x^2+x+1       1110100
x         010  x^5+x^3+x^2+x     0111010
x+1       110  x^5+x^4+x^3+1     1001110
x^2       001  x^6+x^4+x^3+x^2   0011101
x^2+1     101  x^6+x^3+x+1       1101001
x^2+x     011  x^6+x^5+x^4+x     0100111
x^2+x+1   111  x^6+x^5+x^2+1     1010011
Example cont’d
• The generator matrix for this code is:

G = ⎡1 1 1 0 1 0 0⎤
    ⎢0 1 1 1 0 1 0⎥
    ⎣0 0 1 1 1 0 1⎦

• Note that the diagonals are constant. Such a matrix is called a Toeplitz matrix.
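A sketch that builds this Toeplitz generator matrix from g(x) and reproduces the codeword table above (pure Python, GF(2) arithmetic):

```python
n, k = 7, 3
g = [1, 1, 1, 0, 1]                                        # g(x) = 1 + x + x^2 + x^4

# Toeplitz generator: row i is g(x) shifted right by i positions
G = [[0] * i + g + [0] * (n - len(g) - i) for i in range(k)]

for m in range(2 ** k):
    msg = [(m >> i) & 1 for i in range(k)]                 # (m0, m1, m2)
    c = [sum(msg[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
    print("".join(map(str, msg)), "".join(map(str, c)))
```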
Parity Check Polynomial
• Previously, we saw that for a generator matrix G, there exists a matrix H for which GH^T = 0.
• This is equivalent to having a generator polynomial g(x) for which there is a polynomial h(x) so that g(x)h(x) = x^n−1.
• h(x) has degree k.
Example
• For our (7,3) code of the previous example, h(x) = x^3+x+1.
• Any c(x)h(x) is a multiple of x^7+1. Therefore, when multiplication is done modulo (x^7+1), the result is 0.
• For example, for c(x) = x^5+x^4+x^3+1, c(x)h(x) = x^8+x^7+x+1 = (x+1)(x^7+1) mod (x^7+1) = 0.
• c(x)h(x) = m(x)g(x)h(x) = x^n m(x) − m(x) = 0, since x^n m(x) is n right cyclic shifts of m(x), which equals m(x).
Nonsystematic Parity Check Matrix
• S(x) = c(x)h(x) mod (x^n−1) = 0, with S(x) = s0 + s1x + … + s_{n−1}x^{n−1}:

s0 + s1x + … + s_{n−1}x^{n−1} = (Σ_{i=0}^{n−1} c_i x^i)(Σ_{j=0}^{k} h_j x^j) mod (x^n−1)

= c0h0 + (c0h1+c1h0)x + (c0h2+c1h1+c2h0)x^2 + … + (c_{n−1}h1 + c_{n−2}h2 + … + c_{n−k}h_k)x^n + … + (c_{n−1}h_k)x^{n+k−1} mod (x^n−1)

= (c0h0 + c_{n−1}h1 + c_{n−2}h2 + … + c_{n−k}h_k) + (c0h1 + c1h0 + c_{n−1}h2 + …)x + …

s_t = Σ_{i=0}^{n−1} c_i h_j, where j satisfies (i + j) mod n = t.
Nonsystematic Parity Check Matrix 2
• The last n−k of these syndrome equations can be written in matrix form:

S = [s_k s_{k+1} … s_{n−1}] = [c0h_k + c1h_{k−1} + … + c_kh_0,  c1h_k + … + c_{k+1}h_0,  …,  c_{n−k−1}h_k + … + c_{n−1}h_0]

S = [c0 c1 … c_{n−1}] H^T, where

H = ⎡h_k h_{k−1} …  h_0 0   0 … 0  ⎤
    ⎢0   h_k h_{k−1} …  h_0 0 … 0  ⎥
    ⎢⋮                    ⋱       ⋮⎥
    ⎣0   …   0  h_k h_{k−1} …  h_0 ⎦
Example
• In our example, H is:

H = ⎡1 0 1 1 0 0 0⎤
    ⎢0 1 0 1 1 0 0⎥
    ⎢0 0 1 0 1 1 0⎥
    ⎣0 0 0 1 0 1 1⎦
Example cont’d
GH^T = ⎡1 1 1 0 1 0 0⎤ ⎡1 0 0 0⎤   ⎡0 0 0 0⎤
       ⎢0 1 1 1 0 1 0⎥ ⎢0 1 0 0⎥ = ⎢0 0 0 0⎥
       ⎣0 0 1 1 1 0 1⎦ ⎢1 0 1 0⎥   ⎣0 0 0 0⎦
                        ⎢1 1 0 1⎥
                        ⎢0 1 1 0⎥
                        ⎢0 0 1 1⎥
                        ⎣0 0 0 1⎦
Generator Polynomial of Dual Code
• Recall that H is the generator matrix of the dual of the code generated by G.
• Therefore g_d(x) = g_{d0} + g_{d1}x + … + g_{dk}x^k = h_k + h_{k−1}x + … + h_0x^k.
• We say that g_d(x) is the reciprocal of h(x), written h*(x), where h*(x) = x^{deg(h)} h(x^{−1}).
• If h(x) = x^3+x+1, then h*(x) = x^3(x^{−3}+x^{−1}+1) = x^3+x^2+1.
Systematic Encoding
• Recall that c(x) = q(x)g(x).
• For a codeword to be systematic, the first symbols must be equal to the message symbols.
• Let x^{n−k}m(x) = m0x^{n−k} + m1x^{n−k+1} + … + m_{k−1}x^{n−1}.
• Dividing this by g(x) we get:
– x^{n−k}m(x) = q(x)g(x) + d(x), where deg(d(x)) ≤ n−k−1.
– d(x) = d0 + d1x + … + d_{n−k−1}x^{n−k−1}.
• q(x)g(x) = c(x) = x^{n−k}m(x) − d(x)
Example
• For the binary cyclic (7,3) code, g(x) = x^4+x^2+x+1.
• Let m(x) = x^2 = (001).
• x^4 m(x) = x^6.
• x^6 ÷ (x^4+x^2+x+1) = x^2+1, with remainder x^3+x+1.
• Therefore c(x) = x^6+x^3+x+1 = (1101001).
Example cont’d
m(x)      m    x^4·m(x)      d(x)         c(x)              c
0         000  0             0            0                 0000000
1         100  x^4           x^2+x+1      x^4+x^2+x+1       1110100
x         010  x^5           x^3+x^2+x    x^5+x^3+x^2+x     0111010
x+1       110  x^5+x^4       x^3+1        x^5+x^4+x^3+1     1001110
x^2       001  x^6           x^3+x+1      x^6+x^3+x+1       1101001
x^2+1     101  x^6+x^4       x^3+x^2      x^6+x^4+x^3+x^2   0011101
x^2+x     011  x^6+x^5       x^2+1        x^6+x^5+x^2+1     1010011
x^2+x+1   111  x^6+x^5+x^4   x            x^6+x^5+x^4+x     0100111
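A sketch of systematic encoding by polynomial division, reproducing this table (bit-mask convention as in the earlier sketches; gf2_mod repeated so the block stands alone):

```python
def gf2_mod(a, m):                  # remainder of a(x) / m(x) over GF(2)
    while a and a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a

n, k = 7, 3
g = 0b10111                         # g(x) = x^4 + x^2 + x + 1 (bit i <-> x^i)

def encode_systematic(msg):         # msg = integer bit mask of m(x)
    shifted = msg << (n - k)        # x^(n-k) * m(x)
    d = gf2_mod(shifted, g)         # remainder d(x)
    return shifted ^ d              # c(x) = x^(n-k) m(x) - d(x)

for m in range(2 ** k):
    c = encode_systematic(m)
    print(format(m, "03b")[::-1],   # message bits m0 m1 m2
          "".join(str((c >> i) & 1) for i in range(n)))
```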
Generator Matrix
• To get the systematic generator, we need the codewords corresponding to all weight 1 messages.
• In our example:

G = ⎡1 1 1 0 1 0 0⎤
    ⎢0 1 1 1 0 1 0⎥
    ⎣1 1 0 1 0 0 1⎦

H = ⎡1 0 0 0 1 0 1⎤
    ⎢0 1 0 0 1 1 1⎥
    ⎢0 0 1 0 1 1 0⎥
    ⎣0 0 0 1 0 1 1⎦
Systematic Error Detection
• r(x) = r0 + r1x + … + r_{n−1}x^{n−1} = c(x)+e(x) = (c0+e0) + (c1+e1)x + … + (c_{n−1}+e_{n−1})x^{n−1}.
• Assuming that no errors have occurred, e_i = 0 for all i and r(x) = c(x).
• For a systematic code, we would throw away the parity bits, and x^{n−k}m(x) would remain. This is the message part of the codeword.
• Dividing by g(x), we should get the same remainder as is in the parity part of the codeword.
• If they are different, an error is detected.
Example
• r(x) = x^6+x^3+x+1.
• If r(x) contains no errors, x^4m(x) = x^6 and d(x) = x^3+x+1.
• Dividing x^6 by x^4+x^2+x+1 yields a remainder of x^3+x+1. Therefore, we can conclude that r(x) is a valid codeword and no error is detected.
• Now let r(x) = x^4+x+1.
• Dividing x^4 by x^4+x^2+x+1 yields d(x) = x^2+x+1, which is not equal to x+1. Therefore an error is detected.
Introduction to Shift Registers
• Shift registers are made up of storage devices, adders and multipliers.
• Storage devices are (generally D) flip flops.
• For binary addition, adders are XOR gates.
• In GF(2), multiplication is done by 1 (closed circuit) or 0 (open circuit).
• For GF(2^m) we use combinational circuits for addition and multiplication.

[Figure: basic building blocks — a storage element D, an adder (+) and a multiplier with coefficient g.]
Non binary storage devices

• For GF(2^m), we can place m flip-flops in parallel.

[Figure: m parallel flip-flops holding one GF(2^m) symbol, from MSB to LSB.]
Non binary addition
• Consider GF(4):

+        | 0 (00)  1 (01)  α (10)  α^2 (11)
0   (00) | 0       1       α       α^2
1   (01) | 1       0       α^2     α
α   (10) | α       α^2     0       1
α^2 (11) | α^2     α       1       0
Non binary addition
• (a,b) + (c,d) = (e,f)

[Figure: the two-bit symbols (a,b) and (c,d) are added component-wise with XOR gates.]

e = a XOR c, and it is easy to show that f = b XOR d.
Non binary multiplication
• In GF(4), let a = (a1, a0).
• Suppose we wish to compute b = αa.
• a = 0: b = 0, (0,0) → (0,0)
• a = 1: b = α, (0,1) → (1,0)
• a = α: b = α^2, (1,0) → (1,1)
• a = α^2: b = 1, (1,1) → (0,1)
• b1 = a1 XOR a0, b0 = a1.
ELG 5372 Error Control Coding

Lecture 14: Shift Registers for Encoding and Decoding of Cyclic Codes
Register State and Polynomial
Representation
• State of register is the contents of the storage devices.

[Figure: shift register holding the symbols 1 1 0 0 1, with the rightmost stage feeding the output; state = 1001.]

• A delay of n time units is represented as x^n.
• The polynomial output by the above circuit is 1+x^3+x^4 (first element first representation), or 1+x+x^4 (last element first representation).
Polynomial Multiplication
• Let a(x) = a0+a1x+…+a_mx^m and g(x) = g0+g1x+…+g_nx^n.
• Let b(x) = a(x)g(x) = g0a(x) + xg1a(x) + x^2g2a(x) + … + x^ng_na(x).

[Figure: FIR-style multiplier — a(x) enters a chain of delay elements whose taps, weighted by g0, g1, g2, …, g_n, are summed to form b(x). Last element first implementation.]
Example
• Let g(x) = 1+x+x^4 in GF(2)[x].
• Let a(x) = 1+x+x^3.
• Then b(x) = a(x)g(x) = 1+x^2+x^3+x^5+x^7.
Example cont'd

[Figure sequence: the multiplier register is clocked with a(x) = 1+x+x^3 entered last element first (a3, a2, a1, a0 = 1, 0, 1, 1) followed by zeros; the successive output symbols accumulate to 10110101, i.e. b(x) = 1+x^2+x^3+x^5+x^7.]
Polynomial Multiplication First Element First
• To implement the multiplier for first element first processing, reverse the order of the coefficients of g(x) in the register.

[Figure: same multiplier structure with tap order g_n, g_{n−1}, …, g0 and input a_m, a_{m−1}, …, a0.]
Polynomial Division
• Computing polynomial division, and more importantly, computing the remainder after division, are important tasks in encoding cyclic codes.

[Figure: LFSR divider — remainder registers r0, r1, …, r_{n−1} with feedback taps g0, g1, …, g_{n−1}; a(x) enters last element first and the quotient q(x) emerges from the final stage.]
Example
• Let g(x) = x^5+x^2+1 in GF(2)[x].
• We wish to find a(x) = q(x)g(x)+d(x). Let a(x) = x^8+x^2+1.
• We can see that a(x) = (x^3+1)g(x) + x^3.
Example cont'd

[Figure sequence: the divider register is clocked with a(x) = x^8+x^2+1 entered last element first; the quotient bits emerge one per cycle once the register fills, giving q(x) = 1+x^3, and after the last input symbol the register holds the remainder x^3.]
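A behavioural sketch of such a divider in Python (bit masks again), returning both quotient and remainder:

```python
def gf2_divmod(a, g):
    # long division of a(x) by g(x) over GF(2)
    q = 0
    while a.bit_length() >= g.bit_length():
        shift = a.bit_length() - g.bit_length()
        q |= 1 << shift
        a ^= g << shift
    return q, a

q, r = gf2_divmod(0b100000101, 0b100101)   # a = x^8+x^2+1, g = x^5+x^2+1
print(bin(q), bin(r))                      # 0b1001 (= 1+x^3), 0b1000 (= x^3)
```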
Joint multiplication-division
• Note that a multiplier circuit is essentially an FIR filter and a division circuit is essentially an IIR filter.
• If we wanted a circuit to compute a(x)·(p1(x)/p2(x)), we could cascade a multiplier circuit followed by a division circuit.
• For example, the circuit with response (x^2+1)/(x^3+x+1) is:

[Figure: cascaded multiplier (numerator x^2+1) and divider (denominator x^3+x+1) sharing a three-stage register.]
Non-Systematic Encoding of Cyclic Codes
• Non-systematic encoding of cyclic codes is simply polynomial multiplication.
• The encoder for a (7,4) cyclic code generated by g(x) = x^3+x+1 is:

[Figure: multiplier circuit with taps 1, 1, 0, 1 corresponding to g(x) = 1+x+x^3.]
Systematic Encoding of Cyclic Codes
• Here we will use a switched circuit.
• We need a divider circuit to compute the remainder of x^{n−k}m(x)/g(x).
• There are two parts: 1) message part of codeword, 2) calculation of parity symbols of codeword.

[Figure: divider with feedback taps g0, g1, …, g_{n−k−1}; switches in position x route the message m0, m1, …, m_{k−1} into both the codeword register and the divider, then in position y the parity (remainder) is shifted out of the codeword register.]

Initially all switches are set to x until the message word is completely entered, then all switches are set to y.

0
1 1
1 0 1
0 + 0 +
1
1
101 1

0
1

Message m(x) = x3+x2+1


Example (7,4) code, g(x) = x3+x+1

0
1 1
1 0 1
1 + 1 +
0
1
10 1

0
1 1
Example (7,4) code, g(x) = x3+x+1

0
1 1
1 1 1
1 + 0 +
0
0
1 0

0
11 0
Example (7,4) code, g(x) = x3+x+1

0
1 1
1 0 1
1 + 0 +
0
1
1

0
110 1
Example (7,4) code, g(x) = x3+x+1

0
0
0 0 0
1 + 0 +
1
0

0
1101

Over next three cycles,


the remainder will shift out of the register
Example (7,4) code, g(x) = x3+x+1

0
0
0 0 0
0 + 1 +
0
0

0
11010
Example (7,4) code, g(x) = x3+x+1

0
0
0 1 1
0 + 0 +
0
0

0
110100
Example (7,4) code, g(x) = x3+x+1

0
0
0 0 0
0 + 0 +
0
0

0
1101001
Syndrome decoding
• Let us define the syndrome as the remainder in the following equation:
– r(x) = q(x)g(x)+s(x), where r(x) = c(x)+e(x).
– s(x) = s0 + s1x + … + s_{n−k−1}x^{n−k−1}.
• Let rR(x) be the right cyclic shift of r(x):
– rR(x) = xr(x) mod (x^n−1).
Cyclic Coding Theorem 2
• For r(x) having syndrome s(x), rR(x) has syndrome s'(x) = xs(x) mod g(x).
• Proof
– r(x) = q(x)g(x)+s(x)
– rR(x) = xr(x) − (x^n−1)r_{n−1}
– rR(x) = q'(x)g(x)+s'(x) = x(q(x)g(x)+s(x)) − (x^n−1)r_{n−1}
– x^n−1 = g(x)h(x)
– Therefore q'(x)g(x)+s'(x) = x(q(x)g(x)+s(x)) − g(x)h(x)r_{n−1}
– xs(x) = (q'(x) − xq(x) + h(x)r_{n−1})g(x) + s'(x)
– Therefore, s'(x) is the remainder when we divide xs(x) by g(x).
Syndrome calculation
• Assume that we transmit 0000000 for the cyclic code with generator g(x) = x^3+x+1.
• If we receive 1000000 (r(x) = 1), s(x) = 1
• For r(x) = x, s(x) = x
• For r(x) = x^2, s(x) = x^2
• For r(x) = x^3, s(x) = x+1
• For r(x) = x^4, s(x) = x^2+x
• For r(x) = x^5, s(x) = x^2+x+1
• For r(x) = x^6, s(x) = x^2+1
• For systematic codes, when the error is in the parity bits, the syndrome is equal to the error polynomial e(x).
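The table is quick to reproduce in software; a sketch:

```python
def gf2_mod(a, m):                  # remainder of a(x) / m(x) over GF(2)
    while a and a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a

g = 0b1011                          # g(x) = x^3 + x + 1
for i in range(7):
    s = gf2_mod(1 << i, g)          # syndrome of single error e(x) = x^i
    terms = [f"x^{j}" if j else "1" for j in range(3) if (s >> j) & 1]
    print(f"e(x) = x^{i}:  s(x) = {' + '.join(terms) or '0'}")
```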
ELG 5372 Error Control Coding

Lecture 15: Decoding of Cyclic Codes and Intro to BCH codes
Meggitt Decoder
• Consider the decoder shown below:

[Figure: Meggitt decoder — Gate 1 feeds the received word into a syndrome register (connections determined by g(x)) and into a buffer register; an error pattern detection circuit watches the syndrome, and Gates 2 feed the correction back into both the buffer output and the syndrome register.]
Meggitt Decoder 2
• The Meggitt decoder shifts the received word (and its corresponding syndrome) until a syndrome corresponding to an error in the first bit (i.e., e_{n−1} ≠ 0) is detected.
• Then it corrects that error, adjusts the syndrome, and continues shifting until all errors are corrected.
Example
• Consider the (7,4) single error correcting code for which g(x) = x^3+x+1.
• As we saw, for e(x) = x^6, s(x) = x^2+1: s0 = 1, s1 = 0 and s2 = 1.
• Therefore the decoder shifts the received word until the syndrome x^2+1 is detected (s0·s̄1·s2 = 1).
• Let us assume that the received word is r(x) = x^2.
Example

[Figure sequence: initially Gate 1 is on and Gates 2 are off; the received word r = 0010000 (r(x) = x^2) is shifted into the buffer and syndrome registers, leaving s(x) = x^2. Gate 1 is then switched off and Gates 2 are switched on; the registers are shifted, the syndrome updating as s'(x) = xs(x) mod g(x), until the pattern (s0, s1, s2) = (1, 0, 1), i.e. s(x) = x^2+1, appears just as the erroneous bit reaches the last position. At that point x^6 is added to the shifted received word to correct the error, and the syndrome is adjusted to account for the change, leaving s = 0.]

Since the syndrome is then 0, the decoder just shifts out the remaining bits.
Meggitt Decoder
• The syndrome correction after an error is corrected allows the decoder to search for more errors in the event of a multiple error correcting code.
• The error pattern detection circuit has to be hardwired to search for all error patterns in which the MSB is in error.
BCH and RS codes
• BCH codes are named for Bose, Ray-Chaudhuri and Hocquenghem, who developed a means of designing cyclic codes with a specified design distance.
• RS codes are named for their inventors as well.
• It was later determined that these codes are related and their decoding algorithms are quite similar.
Designing BCH codes
• BCH codes can be specified by a generator polynomial.
• To design a BCH code over GF(q) of length n with dmin ≥ δ+1:
– Determine the smallest m such that GF(q^m) has an nth root of unity β.
– Select a nonnegative integer b (usually b = 1).
– Write down a list of δ consecutive powers of β: β^b, β^{b+1}, β^{b+2}, …, β^{b+δ−1}.
– Find the associated minimal polynomials for each of these elements wrt GF(q). These minimal polynomials may not be distinct.
– The generator polynomial g(x) = LCM of the minimal polynomials found above.
Example
• We wish to design a binary BCH code of length 9 capable of correcting 2 errors (we want dmin ≥ 5).
• In GF(64), β = α^7 has order 9.
• Let us choose b = 1, δ = 4.
• Therefore we need to find the minimal polynomials of α^7, α^14, α^21, α^28.
• The elements α^7, α^14 and α^28 are all in the same conjugacy class, therefore they share the same minimal polynomial → x^6+x^3+1.
• The remaining element has minimal polynomial x^2+x+1.
• Therefore g(x) = LCM(x^6+x^3+1, x^6+x^3+1, x^2+x+1, x^6+x^3+1) = (x^6+x^3+1)(x^2+x+1) = x^8+x^7+x^6+x^5+x^4+x^3+x^2+x+1. (This is the all-ones word of length 9, i.e. the (9,1) repetition code, so dmin = 9.)
Example cont’d
• Suppose we had chosen b = 2.
• Our list of elements becomes:
– (α^7)^2 = α^14, (α^7)^3 = α^21, (α^7)^4 = α^28, (α^7)^5 = α^35.
– The generator polynomial is still g(x) = x^8+x^7+x^6+x^5+x^4+x^3+x^2+x+1.
– Still a (9,1) repetition code.
Example 2
• We want to design a binary BCH code of length 15 with dmin ≥ 5.
• In GF(16), α has order 15.
– The list of 4 elements is: α, α^2, α^3, α^4.
– The minimal polynomial of α, α^2 and α^4 is x^4+x+1; that of α^3 is x^4+x^3+x^2+x+1.
– g(x) = (x^4+x+1)(x^4+x^3+x^2+x+1) = x^8+x^7+x^6+x^4+1.
– Since g(x) is a codeword of weight 5, we know that dmin ≤ 5, and from the BCH bound, dmin ≥ 5; therefore dmin = 5 for this (15,7) code.
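The recipe can be mechanized. The sketch below builds GF(16) from the primitive polynomial x^4+x+1 (the field used throughout these notes), finds the conjugacy classes and minimal polynomials, and multiplies the distinct ones together; helper names are illustrative, not from the text.

```python
# GF(16) via exp/log tables, primitive polynomial x^4 + x + 1
exp = [0] * 30
v = 1
for i in range(15):
    exp[i] = exp[i + 15] = v
    v <<= 1
    if v & 0x10:
        v ^= 0x13                 # reduce by x^4 + x + 1
log = {exp[i]: i for i in range(15)}

def gmul(a, b):
    return 0 if 0 in (a, b) else exp[log[a] + log[b]]

def poly_mul(p, q):               # GF(16)-coefficient polynomials, p[i] <-> x^i
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= gmul(a, b)
    return r

def min_poly(j):                  # minimal polynomial of alpha^j w.r.t. GF(2)
    cls, e = set(), j % 15
    while e not in cls:
        cls.add(e)
        e = (e * 2) % 15          # conjugates: alpha^j, alpha^2j, alpha^4j, ...
    m = [1]
    for e in cls:
        m = poly_mul(m, [exp[e], 1])      # multiply by (x + alpha^e)
    return m, cls

g, done = [1], set()
for j in (1, 2, 3, 4):            # delta = 4 consecutive roots alpha^1..alpha^4
    m, cls = min_poly(j)
    if not cls & done:            # LCM: skip repeated conjugacy classes
        g = poly_mul(g, m)
        done |= cls
print(g)   # -> [1, 0, 0, 0, 1, 0, 1, 1, 1] = 1 + x^4 + x^6 + x^7 + x^8
```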
Definitions
• A BCH code is said to be narrow sense if b = 1.
• A BCH code is said to be primitive if the root of its generator polynomial (β) is a primitive element in GF(q^m). This is only the case when n = q^m−1.
• BCH Bound: for generator polynomial g(x), let δ be the number of consecutive powers of the nth root of unity β that are roots of g(x). Then dmin ≥ δ+1. See proof of this bound on pages 237-239 of text.
Example 3
• Design a binary BCH code of length 7 that corrects one error (dmin ≥ 3):
– Choose α, which has order 7 in GF(8): roots α, α^2 from GF(8).
– g(x) = x^3+x+1
– g(x) is the primitive polynomial used to generate GF(8).
– This is the Hamming (7,4) code.
– All Hamming codes use the primitive polynomial as their generator polynomial. They all have two consecutive powers of α as roots, therefore dmin ≥ 3 (actually dmin = 3 for all Hamming codes).
Non Binary BCH Codes
• Codes are constructed on GF(q) where q ≠ 2.
• For example, suppose we wanted to design a 4-ary code of length 15:
– We need to find the minimal polynomials of GF(16) wrt GF(4).
Example

• {1} → (x+1)
• {α, α^4} → (x^2+x+α^5)
• {α^2, α^8} → (x^2+x+α^10)
• {α^3, α^12} → (x^2+α^10x+1)
• {α^5} → (x+α^5)
• {α^6, α^9} → (x^2+α^5x+1)
• {α^7, α^13} → (x^2+α^5x+α^5)
• {α^10} → (x+α^10)
• {α^11, α^14} → (x^2+α^10x+α^10)

α^5 and α^10 are elements of GF(16) with order 3; they are the α and α^2 of GF(4).
Example
• Design a 4-ary BCH code of length 15 with dmin ≥ 5.
• Choose α, α^2, α^3, α^4.
• g(x) = (x^2+x+α)(x^2+x+α^2)(x^2+α^2x+1) = x^6+x^5+α^2x^4+x^3+αx+α^2.
• The code is a (15,9) code.
• Rate = 9/15.
• For the binary BCH code with dmin ≥ 5, rate = 7/15.
Reed Solomon Codes
• An RS code is a q-ary BCH code of length q−1.
• We need minimal polynomials of GF(q) wrt GF(q).
• The conjugacy class of β is {β, β^q, β^{q^2}, …}, but β^q = β.
• Conjugacy classes therefore contain 1 element, and each minimal polynomial is of the form (x−β).
Example
• Design a 16-ary length 15 RS code with dmin ≥ 5.
• g(x) = (x+α)(x+α^2)(x+α^3)(x+α^4) = x^4+α^13x^3+α^6x^2+α^3x+α^10.
• Since there are no extraneous roots, k = n−δ; therefore δ = n−k and dmin ≥ n−k+1.
• But the Singleton bound states that dmin ≤ n−k+1.
• Therefore dmin = n−k+1.
ELG 5372 Error Control Coding

Lecture 16: Decoding of BCH and RS Codes
Algebraic Decoding of BCH and RS Codes
• The algebraic decoding of BCH and RS codes has the following general steps:
– Computation of the syndrome.
– Determination of an error locator polynomial. The roots of this polynomial provide the locations of the errors. There are many algorithms for finding this polynomial (Peterson's, Berlekamp-Massey, Peterson-Gorenstein-Zierler, etc.).
– Determination of the roots of the error locator polynomial, usually done by the Chien search.
– For non-binary BCH and RS codes, the error values must also be found (usually using Forney's algorithm).
Computation of Syndrome
• For all examples, we will assume narrow-sense BCH or RS codes.
• We know that α, α^2, …, α^{2t} are roots of g(x); therefore they are roots of c(x) as well.
• Therefore c(α) = c(α^2) = … = c(α^{2t}) = 0.
• The received polynomial is r(x) = c(x)+e(x).
• Let Sj = r(α^j) = c(α^j)+e(α^j) = e(α^j) for j = 1, 2, …, 2t.
• The values S1, S2, …, S2t are the syndromes of the received polynomial.
Computation of Syndrome
S_j = Σ_{i=0}^{n−1} e_i (α^j)^i = Σ_{i=0}^{n−1} e_i α^{ij}   (*)

Suppose that r(x) has v errors in it and that they are in positions i1, i2, …, iv. Then (*) becomes:

S_j = Σ_{l=1}^{v} e_{il} (α^{il})^j = Σ_{l=1}^{v} e_{il} X_l^j   (**)

where X_l = α^{il} and j = 1, 2, …, 2t.
Computation of Syndrome for Binary Codes
• For binary codes, e_{il} = 1. Therefore (**) becomes:

S_j = Σ_{l=1}^{v} X_l^j, where X_l = α^{il} and j = 1, 2, …, 2t

• If we know X_l, then we know the location of the error.
– For example, if X1 = α^2, then by definition i1 = 2 and the error is in digit r2.
The Error Locator Polynomial for Binary BCH
Codes
• We obtain the following set of equations:

S1 = X1 + X2 + … + Xv
S2 = X1^2 + X2^2 + … + Xv^2
⋮
S_{2t} = X1^{2t} + X2^{2t} + … + Xv^{2t}

• These equations are said to be power-sum symmetric functions, and they give us a set of 2t equations with v unknowns.
The Error Locator Polynomial for Binary BCH
Codes
• The set of power-sum symmetric functions is a solvable set of functions (for v ≤ t). However, it is computationally complex.
• Therefore a new polynomial is introduced. This is the error locator polynomial:

Λ(x) = Π_{l=1}^{v} (1 − X_l x) = Λ_v x^v + Λ_{v−1}x^{v−1} + … + Λ_1 x + 1

• X_l^{−1} is a root of this polynomial.
Finding the Error Locator Polynomial
Let us consider the case when v = 2:

Λ(x) = (1 − X1x)(1 − X2x) = 1 − (X1+X2)x + X1X2x^2
Λ1 = −(X1+X2) and Λ2 = X1X2.
We can see that S1 + Λ1 = 0.
S2 = X1^2+X2^2, so S2 + 2Λ2 = X1^2+2X1X2+X2^2 = (X1+X2)^2, and thus
S2 + S1Λ1 + 2Λ2 = 0.
Also S3 + Λ1S2 + Λ2S1 = 0,
and S4 + Λ1S3 + Λ2S2 = 0.
Finding the Error Locator Polynomial 2
• We can extend this to arbitrary v:

k=1:    S1 + Λ1 = 0
k=2:    S2 + S1Λ1 + 2Λ2 = 0
⋮
k=v:    Sv + S_{v−1}Λ1 + S_{v−2}Λ2 + … + vΛv = 0
k=v+1:  S_{v+1} + SvΛ1 + S_{v−1}Λ2 + … + S1Λv = 0
k=v+2:  S_{v+2} + S_{v+1}Λ1 + SvΛ2 + … + S2Λv = 0
⋮
k=2t:   S_{2t} + S_{2t−1}Λ1 + S_{2t−2}Λ2 + … + S_{2t−v}Λv = 0

These are Newton's identities.
Finding the Error Locator Polynomial 3
Let v = t:

⎡S1      S2      S3      …  Sv       ⎤⎡Λv     ⎤    ⎡S_{v+1}⎤
⎢S2      S3      S4      …  S_{v+1}  ⎥⎢Λ_{v−1}⎥    ⎢S_{v+2}⎥
⎢S3      S4      S5      …  S_{v+2}  ⎥⎢Λ_{v−2}⎥ = −⎢S_{v+3}⎥
⎢⋮                        ⋱  ⋮       ⎥⎢⋮      ⎥    ⎢⋮      ⎥
⎣Sv      S_{v+1} S_{v+2} …  S_{2v−1} ⎦⎣Λ1     ⎦    ⎣S_{2v} ⎦

MvΛ = −S
Peterson-Gorenstein-Zierler Algorithm
• Set v = t.
• Form Mv and determine if Mv is invertible (compute det(Mv); if det(Mv) = 0, Mv is not invertible):
– If not invertible, it means there are fewer than t errors.
– Set v = v−1 and repeat this step.
• Once Mv is invertible, compute Λ = Mv^{−1}(−S).
Example
• Consider the binary (15,7) BCH code.
• This is a two error correcting code.
• Suppose r(x) = x^7.
• S1 = α^7, S2 = α^14, S3 = α^6, S4 = α^13.
• Assume v = 2:

M2 = ⎡α^7  α^14⎤,  det(M2) = α^7·α^6 − α^14·α^14 = α^13 − α^13 = 0
     ⎣α^14 α^6 ⎦
Example cont’d
• Therefore we assume that v = 1.
• M1 = [α^7].
• Then α^7Λ1 = −α^14.
• Or Λ1 = −α^7 = α^7.
• The error locator polynomial is Λ(x) = 1−α^7x (or 1+α^7x). This has root x = α^8. Therefore X1^{−1} = α^8, or X1 = α^7. The error position is r7 in r(x). Therefore c(x) = r(x)−x^7 = 0.
Example 2
• For the same code, assume that r(x) = x^2+x^5.
• S1 = α^2+α^5 = α, S2 = α^4+α^10 = α^2, S3 = α^6+1 = α^13 and S4 = α^8+α^5 = α^4.

M2 = ⎡α    α^2 ⎤,  det(M2) = α·α^13 − α^2·α^2 = α^14 − α^4 = α^9
     ⎣α^2  α^13⎦

M2^{−1} = (1/α^9)⎡α^13 α^2⎤ = ⎡α^4 α^8⎤
                 ⎣α^2  α  ⎦   ⎣α^8 α^7⎦

Λ = ⎡Λ2⎤ = ⎡α^4 α^8⎤⎡α^13⎤ = ⎡α^7⎤
    ⎣Λ1⎦   ⎣α^8 α^7⎦⎣α^4 ⎦   ⎣α  ⎦
Example 2 cont’d
• Therefore Λ(x) = α^7x^2+αx+1 = (α^2x+1)(α^5x+1).
• The roots are X1^{−1} = α^13 and X2^{−1} = α^10. Therefore X1 = α^2 and X2 = α^5.
• This means r2 and r5 are incorrect.
• c(x) = r(x)+x^2+x^5 = 0.
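A sketch of this two-error PGZ computation in GF(16), using Cramer's rule for the 2×2 system (field tables as in the earlier sketch):

```python
exp = [0] * 30
v = 1
for i in range(15):
    exp[i] = exp[i + 15] = v
    v <<= 1
    if v & 0x10:
        v ^= 0x13                 # reduce by x^4 + x + 1
log = {exp[i]: i for i in range(15)}

def mul(a, b):
    return 0 if 0 in (a, b) else exp[log[a] + log[b]]

def inv(a):
    return exp[(15 - log[a]) % 15]

def r_eval(j):                    # S_j = r(alpha^j) for r(x) = x^2 + x^5
    return exp[(2 * j) % 15] ^ exp[(5 * j) % 15]

S1, S2, S3, S4 = (r_eval(j) for j in range(1, 5))
det = mul(S1, S3) ^ mul(S2, S2)               # det(M2); nonzero, so v = 2
L2 = mul(mul(S3, S3) ^ mul(S2, S4), inv(det)) # Cramer: Lambda_2 = alpha^7
L1 = mul(mul(S1, S4) ^ mul(S2, S3), inv(det)) # Cramer: Lambda_1 = alpha
print(log[det], log[L1], log[L2])             # -> 9 1 7
```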
Simplifications for Binary Codes
• For GF(2^m), (X+Y)^2 = X^2+Y^2.
• Therefore S_{2j} = S_j^2.
• Also, nX = 0 if n is even and nX = X if n is odd.
Newton’s Identities
k=1:    S1 + Λ1 = 0
k=2:    S2 + S1Λ1 + 2Λ2 = 0
⋮
k=v:    Sv + S_{v−1}Λ1 + S_{v−2}Λ2 + … + vΛv = 0
k=v+1:  S_{v+1} + SvΛ1 + S_{v−1}Λ2 + … + S1Λv = 0
k=v+2:  S_{v+2} + S_{v+1}Λ1 + SvΛ2 + … + S2Λv = 0
⋮
k=2t:   S_{2t} + S_{2t−1}Λ1 + S_{2t−2}Λ2 + … + S_{2t−v}Λv = 0

All the even equations are redundant.
Newton's identities minus redundant equations

k=1:     S1 + Λ1 = 0
k=3:     S3 + S2Λ1 + S1Λ2 + Λ3 = 0
⋮
k=2t−1:  S_{2t−1} + S_{2t−2}Λ1 + … + S_{t−1}Λt = 0
Newton’s identities minus redundant
equations in matrix form
• AΛ = −S:

⎡1        0        0        0        …  0       0      ⎤⎡Λ1⎤    ⎡S1      ⎤
⎢S2       S1       1        0        …  0       0      ⎥⎢Λ2⎥    ⎢S3      ⎥
⎢S4       S3       S2       S1       1  …       0      ⎥⎢⋮ ⎥ = −⎢⋮       ⎥
⎢⋮                                      ⋱              ⎥⎢  ⎥    ⎢        ⎥
⎢S_{2t−4} S_{2t−5} S_{2t−6} S_{2t−7} …  S_{t−2} S_{t−3}⎥⎣Λt⎦    ⎣S_{2t−1}⎦
⎣S_{2t−2} S_{2t−3} S_{2t−4} S_{2t−5} …  S_t     S_{t−1}⎦
Peterson Algorithm
• Assume there are t errors. If there are in fact t errors, A is invertible.
– If A is not invertible, delete the last two rows and last two columns and repeat.
• Once A is invertible, Λ = A^{−1}(−S).
Coefficients for Error Locator Polynomial for
small number of errors
• Using Peterson's algorithm, explicit expressions for Λi have been computed for codes that can correct a small number of errors.
• 1 error correcting: Λ1 = S1.
• 2 error correcting: Λ1 = S1 and Λ2 = (S3+S1^3)/S1.
• 3 error correcting: Λ1 = S1, Λ2 = (S1^2S3+S5)/(S1^3+S3), Λ3 = (S1^3+S3)+S1Λ2.
• Others can be found on page 252 of text.
ELG 5372 Error Control Coding

Lecture 17: Berlekamp-Massey Algorithm for Binary BCH Codes
Chien Search
• If Λ(β^i) = 0, then r_{n−i} is in error.
• Since Λ(β^i) = 1 + X(β^i), this is equivalent to X(β^i) = 1, where
X(β^i) = Λ1β^i + Λ2β^{2i} + … + Λvβ^{vi}.
• If X(β^i) = 1, c_{n−i} = r_{n−i}+1; else c_{n−i} = r_{n−i}.
• If the Chien search fails to find v roots of an error locator polynomial of degree v, then the error pattern is an uncorrectable error pattern.
Chien Search 2
[Figure: Chien search circuit — the received word r0, r1, …, r_{n−1} is shifted out while registers holding Λ1, Λ2, …, Λv are repeatedly multiplied by β, β^2, …, β^v; their sum X is compared with 1, and Y = 1 (complement the output bit) when X = 1.]
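A software sketch of the search, run on the Λ(x) = 1+αx+α^7x^2 found in the previous example:

```python
exp = [0] * 30
v = 1
for i in range(15):
    exp[i] = exp[i + 15] = v
    v <<= 1
    if v & 0x10:
        v ^= 0x13

lam = {1: 1, 2: 7}        # Lambda = 1 + a x + a^7 x^2: degree -> log(coefficient)

for i in range(15):
    X = 0                 # X(a^i) = Lambda_1 a^i + Lambda_2 a^(2i)
    for deg, lg in lam.items():
        X ^= exp[(lg + deg * i) % 15]
    if X == 1:            # equivalent to Lambda(a^i) = 0
        print(f"root alpha^{i} -> error position {(15 - i) % 15}")
# prints roots alpha^13 and alpha^10, i.e. errors at positions 2 and 5
```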
Berlekamp-Massey Algorithm
• Peterson's method involves straightforward linear algebra, but it is computationally complex to implement.
• Should A be singular, the last two rows and columns are deleted and the determinant of the new A must be computed again.
• Thus, the Peterson method starts with a big problem and works it down to a small problem (thus if it is a small problem to begin with, the most computationally complex step is done for nothing).
• The Berlekamp-Massey algorithm starts with a small problem and works up to a large problem.
• The complexity of the Peterson algorithm is proportional to v^3, while that of the Berlekamp-Massey algorithm is proportional to v^2.
Berlekamp-Massey Algorithm 2
• It was observed from Newton's identities that

S_j = −Σ_{i=1}^{v} Λ_i S_{j−i},  j = v+1, v+2, …, 2t   (*)

• (*) describes the output of a linear feedback shift register with coefficients Λ1, Λ2, …, Λv.
• Given a sequence S1, S2, …, S2t, we can determine the LFSR coefficients.
Berlekamp-Massey Algorithm 3
• In the Berlekamp-Massey algorithm, we build the LFSR that produces the entire sequence by successively modifying an existing LFSR to produce increasingly longer sequences.
• We start with an LFSR that can produce S1; then we check to see if that LFSR can produce {S1, S2}:
– If so, no modification is necessary.
– If not, then we need to modify the current LFSR to produce a new one that can produce the sequence.
• We repeat until we have an LFSR that produces the sequence {S1, S2, …, S2t}.
Berlekamp-Massey Algorithm 4
• Let k be the iteration index of the algorithm and let Lk be the length of the LFSR on iteration k.
• Let Λ^{(k)}(x) be the error locator polynomial at iteration k:

Λ^{(k)}(x) = 1 + Λ1^{(k)}x + Λ2^{(k)}x^2 + … + Λ_{Lk}^{(k)}x^{Lk}

• At iteration k, we have an LFSR capable of producing the sequence {S1, S2, …, Sk}:

S_j = −Σ_{i=1}^{Lk} Λ_i^{(k)} S_{j−i},  j = Lk+1, …, k
Berlekamp-Massey Algorithm 5
• Suppose after k−1 iterations, we have Λ^{(k−1)}(x). On iteration k, we compute:

Ŝ_k = −Σ_{i=1}^{L_{k−1}} Λ_i^{(k−1)} S_{k−i}   (**)

• If this is equal to Sk, then the error locator polynomial is good to produce the sequence {S1, S2, …, Sk} and no changes are needed. Therefore Λ^{(k)}(x) = Λ^{(k−1)}(x).
• If (**) is not equal to Sk, then the polynomial needs to be modified.
• This discrepancy is d_k = S_k − Ŝ_k = S_k + Σ_{i=1}^{L_{k−1}} Λ_i^{(k−1)} S_{k−i}.
Berlekamp-Massey Algorithm 6
d_k = S_k − Ŝ_k = Σ_{i=0}^{L_{k−1}} Λ_i^{(k−1)} S_{k−i}   (with Λ0^{(k−1)} = 1)

Let us produce a new polynomial Λ^{(k)}(x) = Λ^{(k−1)}(x) + Ax^lΛ^{(m−1)}(x), where A is some element in the field, l is an integer and Λ^{(m−1)}(x) is one of the prior error locator polynomials associated with a non-zero discrepancy d_m.

Let us compute the new discrepancy using this new polynomial:

d'_k = Σ_{i=0}^{L_{k−1}} Λ_i^{(k−1)} S_{k−i} + A Σ_{i=0}^{L_{m−1}} Λ_i^{(m−1)} S_{k−i−l} = d_k + Ad_m if we select l = k−m.

By choosing A = −d_m^{−1}d_k, we get d'_k = 0. Thus, the new polynomial produces {S1, S2, …, Sk}. See the proof in the text that this algorithm produces the shortest LFSR.
Example
• Consider the two error correcting binary (15,7) BCH code. The generator polynomial has roots α, α^2, α^3 and α^4.
• Let r(x) = x^2+x^5.
• S1 = α, S2 = α^2, S3 = α^13 and S4 = α^4.
Example Cont'd

k | Sk   | dk               | c(x) = Λ^(k)(x)                       | Lk | p(x) | dm
0 | –    | 1                | 1                                     | 0  | 1    | 1
1 | α    | α                | 1+αx                                  | 1  | 1    | 1
2 | α^2  | 0                | 1+αx                                  | 1  | 1    | α
3 | α^13 | α^13+α^3 = α^8   | 1+αx+x^(3−1)·α^8·α^14 = 1+αx+α^7x^2   | 2  | 1+αx | α
4 | α^4  | α^4+α^14+α^9 = 0 | 1+αx+α^7x^2                           | 2  | 1+αx | α^8
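A compact Berlekamp-Massey sketch over GF(16) that reproduces the final Λ(x) of this table (this is the standard Massey formulation, so an intermediate polynomial may differ from the table's third row, but the final answer agrees):

```python
exp = [0] * 30
v = 1
for i in range(15):
    exp[i] = exp[i + 15] = v
    v <<= 1
    if v & 0x10:
        v ^= 0x13
log = {exp[i]: i for i in range(15)}

def mul(a, b):
    return 0 if 0 in (a, b) else exp[log[a] + log[b]]

def berlekamp_massey(S):
    lam = [1]                    # current Lambda(x); lam[i] <-> coeff of x^i
    p, L, m, dm = [1], 0, 1, 1   # prior polynomial, LFSR length, gap, prior d
    for k, Sk in enumerate(S, start=1):
        d = Sk                   # discrepancy d_k
        for i in range(1, L + 1):
            d ^= mul(lam[i], S[k - 1 - i])
        if d == 0:
            m += 1
        else:
            t = lam[:]
            scale = mul(d, exp[(15 - log[dm]) % 15])    # d / dm
            upd = [0] * m + [mul(scale, c) for c in p]  # (d/dm) x^m p(x)
            lam = [a ^ b for a, b in
                   zip(lam + [0] * len(upd), upd + [0] * len(lam))
                   ][:max(len(lam), len(upd))]
            if 2 * L <= k - 1:   # length change
                L, p, dm, m = k - L, t, d, 1
            else:
                m += 1
    return lam

S = [exp[1], exp[2], exp[13], exp[4]]        # alpha, alpha^2, alpha^13, alpha^4
print([log.get(c) for c in berlekamp_massey(S)])  # [0, 1, 7] -> 1 + ax + a^7 x^2
```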
Simplification for binary codes
• Since S_{2k} is not independent of S_k, every even iteration of the Berlekamp-Massey algorithm will result in d_k = 0. Thus, we can skip every even iteration.
ELG 5372 Error Control Coding

Lecture 18: Decoding of Nonbinary BCH and RS Codes
Nonbinary BCH and RS decoding
• The solution of the error locator polynomial requires some extra work and, once found, the error values must be determined.

S1 = e_{i1}X1 + e_{i2}X2 + … + e_{iv}Xv
S2 = e_{i1}X1^2 + e_{i2}X2^2 + … + e_{iv}Xv^2
⋮
S_{2t} = e_{i1}X1^{2t} + e_{i2}X2^{2t} + … + e_{iv}Xv^{2t}

• These equations are not power-sum symmetric.
Nonbinary BCH and RS decoding 2
• Let Λ(x) = 1+Λ1x+…+Λvx^v have roots X_l^{−1}, for l = 1, 2, …, v.
• Then Λ(X_l^{−1}) = 0 = 1 + Λ1X_l^{−1} + … + ΛvX_l^{−v}.
• Multiplying by e_{il}X_l^j, we get 0 = e_{il}X_l^j(1+Λ1X_l^{−1}+…+ΛvX_l^{−v}) = e_{il}(X_l^j + Λ1X_l^{j−1} + … + ΛvX_l^{j−v}).
• Summing over all l, we get:

0 = Σ_{l=1}^{v} e_{il}(ΛvX_l^{j−v} + Λ_{v−1}X_l^{j−v+1} + … + Λ1X_l^{j−1} + X_l^j)

0 = Λv Σ_{l=1}^{v} e_{il}X_l^{j−v} + Λ_{v−1} Σ_{l=1}^{v} e_{il}X_l^{j−v+1} + … + Λ1 Σ_{l=1}^{v} e_{il}X_l^{j−1} + Σ_{l=1}^{v} e_{il}X_l^j
Nonbinary BCH and RS decoding 3
• Therefore

0 = ΛvS_{j−v} + Λ_{v−1}S_{j−v+1} + … + Λ1S_{j−1} + S_j   (***)

S_j = −Σ_{i=1}^{v} Λ_i S_{j−i},  j = v+1, v+2, …, 2t   (*)

• We used the Berlekamp-Massey algorithm to solve (*) in the binary case. (*) and (***) are identical. Therefore we can use the Berlekamp-Massey algorithm to solve for the Λi's in (***) as well.
Forney’s Algorithm
• Once the error locations are known, we use Forney's algorithm to find the error values.

S_j = Σ_{l=1}^{v} e_{il}X_l^j,  j = 1, 2, …, 2t   (****)

• Now we know the values of X_l. We can rewrite (****) as:

⎡X1      X2      X3     …  Xv     ⎤⎡e_{i1}⎤   ⎡S1    ⎤
⎢X1^2    X2^2    X3^2   …  Xv^2   ⎥⎢e_{i2}⎥ = ⎢S2    ⎥
⎢⋮                       ⋱  ⋮     ⎥⎢⋮     ⎥   ⎢⋮     ⎥
⎣X1^{2t} X2^{2t} X3^{2t} … Xv^{2t}⎦⎣e_{iv}⎦   ⎣S_{2t}⎦

Xe = S
Forney’s Algorithm 2
• Although we can solve for e by inverting X, X is a Vandermonde matrix.
• We can solve Vandermonde systems in a manner that is less computationally complex than matrix inversion.
• This solution is called Forney's algorithm.
Forney’s Algorithm
• Let S(x) = Σ_{j=0}^{2t−1} S_{j+1}x^j = S1 + S2x + … + S_{2t}x^{2t−1}.
• And let Ω(x) = S(x)Λ(x) mod x^{2t}.
• Then Forney's algorithm states that

e_{ik} = −Ω(X_k^{−1}) / Λ'(X_k^{−1})
Proof of Forney’s Algorithm
1 − x^{2t} = (1−x)(1 + x + x^2 + … + x^{2t−1}) = (1−x) Σ_{i=0}^{2t−1} x^i   (1)

(1 − x^{2t}) mod x^{2t} = 1, therefore [(1−x) Σ_{i=0}^{2t−1} x^i] mod x^{2t} = 1   (2)

Ω(x) = (S(x)Λ(x)) mod x^{2t}
= [(S1 + S2x + … + S_{2t}x^{2t−1})(Π_{i=1}^{v} (1−X_ix))] mod x^{2t}
= [Σ_{j=0}^{2t−1} (Σ_{l=1}^{v} e_{il}X_l^{j+1}) x^j (Π_{i=1}^{v} (1−X_ix))] mod x^{2t}
= [Σ_{l=1}^{v} e_{il}X_l Σ_{j=0}^{2t−1} (X_lx)^j Π_{i=1}^{v} (1−X_ix)] mod x^{2t}   (3)
Proof of Forney’s Algorithm 2
• Equation (3) can be rewritten as:

Ω(x) = [Σ_{l=1}^{v} e_{il}X_l [(1−X_lx) Σ_{j=0}^{2t−1} (X_lx)^j] Π_{i=1, i≠l}^{v} (1−X_ix)] mod x^{2t}

where the bracketed factor equals 1 − (X_lx)^{2t}.

• (1 − (X_lx)^{2t}) mod x^{2t} = 1. Therefore:

Ω(x) = Σ_{l=1}^{v} e_{il}X_l Π_{i=1, i≠l}^{v} (1−X_ix)
Proof of Forney’s Algorithm 3
Ω(X_k^{−1}) = Σ_{l=1}^{v} e_{il}X_l Π_{i=1, i≠l}^{v} (1 − X_iX_k^{−1})   (4)

Equation (4) always has a zero in the product except when l = k. Therefore (4) becomes:

Ω(X_k^{−1}) = e_{ik}X_k Π_{i=1, i≠k}^{v} (1 − X_iX_k^{−1})   (5)
Proof of Forney’s Algorithm 4
Λ(x) = Π_{i=1}^{v} (1 − X_ix)

Λ'(x) = −Σ_{l=1}^{v} X_l Π_{i=1, i≠l}^{v} (1 − X_ix)

Λ'(X_k^{−1}) = −Σ_{l=1}^{v} X_l Π_{i=1, i≠l}^{v} (1 − X_iX_k^{−1}) = −X_k Π_{i=1, i≠k}^{v} (1 − X_iX_k^{−1})
Proof of Forney’s Algorithm 5
−Ω(X_k^{−1}) / Λ'(X_k^{−1}) = [−e_{ik}X_k Π_{i≠k} (1 − X_iX_k^{−1})] / [−X_k Π_{i≠k} (1 − X_iX_k^{−1})] = e_{ik}
Example
• Consider a two error correcting RS code of length 7.
• g(x) = (x+α)(x+α^2)(x+α^3)(x+α^4)
• This is a (7,3) 8-ary code.
• Suppose r(x) = α^3x+α^4x^3.
• S1 = α^4+1 = α^5.
• S2 = α^5+α^3 = α^2.
• S3 = α^6+α^6 = 0.
• S4 = α^7+α^9 = 1+α^2 = α^6.
Example: Berlekamp-Massey to find Λ(x)
Λ^{(−1)}(x) = 1
Λ^{(0)}(x) = 1, d0 = 1
k=1: S1 = α^5, d1 = α^5, Λ^{(1)}(x) = Λ^{(0)}(x) + d0^{−1}d1xΛ^{(−1)}(x) = 1+α^5x
k=2: S2 = α^2, d2 = α^2+α^3 = α^5, Λ^{(2)}(x) = Λ^{(1)}(x) + d1^{−1}d2xΛ^{(0)}(x) = 1+α^5x+x = 1+α^4x
k=3: S3 = 0, d3 = α^6, Λ^{(3)}(x) = Λ^{(2)}(x) + d2^{−1}d3xΛ^{(1)}(x) = 1+α^4x+αx(1+α^5x) = 1+α^2x+α^6x^2
k=4: S4 = α^6, d4 = α^6+α = α^5, Λ^{(4)}(x) = Λ^{(3)}(x) + d3^{−1}d4xΛ^{(2)}(x) = 1+α^2x+α^6x^2+α^6x(1+α^4x) = 1+x+α^4x^2
Λ(x) = 1+x+α^4x^2 = (1+αx)(1+α^3x)
X1 = α, X2 = α^3
Example: Finding ei1 and ei2 using the Forney
Algorithm
• S(x) = α^5+α^2x+α^6x^3.
• And Λ(x) = 1+x+α^4x^2.
• Thus Ω(x) = (α^5 + (α^2+α^5)x + (α^9+α^2)x^2 + (α^6+α^6)x^3 + α^6x^4 + α^3x^5) mod x^4 = α^5 + α^3x.
• Lastly, Λ'(x) = 1.
• e_{i1} = −Ω(X1^{−1}) / Λ'(X1^{−1}) = α^5+α^3α^6 = α^5+α^2 = α^3.
• e_{i2} = −Ω(X2^{−1}) / Λ'(X2^{−1}) = α^5+α^3α^4 = α^5+1 = α^4.
• Therefore e(x) = α^3x+α^4x^3.
• We decode the received word as c(x) = r(x)+e(x) = 0.
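Putting the whole decoding chain together for this (7,3) RS example — syndromes, Berlekamp-Massey, Chien search and Forney — as one self-contained Python sketch over GF(8) (primitive polynomial x^3+x+1):

```python
# GF(8), primitive polynomial x^3 + x + 1
exp = [0] * 14
v = 1
for i in range(7):
    exp[i] = exp[i + 7] = v
    v <<= 1
    if v & 0x8:
        v ^= 0xB
log = {exp[i]: i for i in range(7)}

def mul(a, b): return 0 if 0 in (a, b) else exp[log[a] + log[b]]
def inv(a):    return exp[(7 - log[a]) % 7]

def poly_eval(p, x):                 # p[i] <-> coefficient of x^i
    y = 0
    for c in reversed(p):
        y = mul(y, x) ^ c
    return y

n, t = 7, 2
r = [0, exp[3], 0, exp[4], 0, 0, 0]  # r(x) = a^3 x + a^4 x^3
S = [poly_eval(r, exp[j]) for j in range(1, 2 * t + 1)]

# Berlekamp-Massey (standard Massey form)
lam, prev, L, m, dm = [1], [1], 0, 1, 1
for k in range(1, 2 * t + 1):
    d = S[k - 1]
    for i in range(1, L + 1):
        d ^= mul(lam[i], S[k - 1 - i])
    if d:
        t_ = lam[:]
        upd = [0] * m + [mul(mul(d, inv(dm)), c) for c in prev]
        lam = [a ^ b for a, b in zip(lam + [0] * len(upd),
                                     upd + [0] * len(lam))
               ][:max(len(lam), len(upd))]
        if 2 * L <= k - 1:
            L, prev, dm, m = k - L, t_, d, 1
            continue
    m += 1

# Omega(x) = S(x) Lambda(x) mod x^2t, then Chien search + Forney
omega = [0] * (2 * t)
for i, a in enumerate(S):
    for j, b in enumerate(lam):
        if i + j < 2 * t:
            omega[i + j] ^= mul(a, b)
lam_d = [c if i % 2 else 0 for i, c in enumerate(lam)][1:]  # formal derivative
e = [0] * n
for i in range(n):
    Xinv = exp[(7 - i) % 7]          # candidate root X^-1 = a^-i
    if poly_eval(lam, Xinv) == 0:
        e[i] = mul(poly_eval(omega, Xinv), inv(poly_eval(lam_d, Xinv)))
print([log.get(x, '-') for x in e])  # error values a^3 at x^1, a^4 at x^3
```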
ELG 5372 Error Control Coding

Lecture 19: Introduction to Convolutional Codes
Introduction
• The encoder can be viewed as a filtering or convolution operation.
• The encoder is a set of linear time-invariant digital filters.
• The code sequence is the interleaved filter outputs.
• Whereas block codes take blocks of k message symbols and produce blocks of n code symbols, convolutional codes are viewed as stream codes in that they often operate on a stream of input symbols and produce a stream of output symbols.
Example

[Figure sequence: the rate-1/2 encoder with generators 1+D+D^2 and 1+D^2 is clocked with the message 1011 (m(D) = 1+D^2+D^3, m0 entering first) followed by flushing zeros; each clock cycle produces one output pair (c1, c2).]

1011 → 11, 10, 00, 01, 01, 11
D transform
• The message m = (m0, m1, …, mN) can be represented by the D transform:

m(D) = Σ_{i=0}^{N} m_i D^i = m0 + m1D + … + m_N D^N

• In our example, m(D) = 1+D^2+D^3.
• The generators are also expressed in D transform notation:
g1(D) = 1+D+D^2
g2(D) = 1+D^2
D Transform of Encoder Output
• c_i(D) = m(D)g_i(D).
• c1(D) = (1+D^2+D^3)(1+D+D^2) = 1+D+D^2+D^2+D^3+D^4+D^3+D^4+D^5 = 1+D+D^5 = 110001.
• c2(D) = (1+D^2+D^3)(1+D^2) = 1+D^2+D^2+D^4+D^3+D^5 = 1+D^3+D^4+D^5 = 100111.
• Output = 11, 10, 00, 01, 01, 11.
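In software, the D-transform view reduces encoding to two carry-less polynomial multiplications; a sketch:

```python
def gf2_mul(a, b):                 # carry-less product over GF(2) (bit i <-> D^i)
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

m  = 0b1101                        # m(D) = 1 + D^2 + D^3
g1 = 0b111                         # g1(D) = 1 + D + D^2
g2 = 0b101                         # g2(D) = 1 + D^2

c1, c2 = gf2_mul(m, g1), gf2_mul(m, g2)
pairs = ["%d%d" % ((c1 >> i) & 1, (c2 >> i) & 1) for i in range(6)]
print(pairs)                       # ['11', '10', '00', '01', '01', '11']
```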
Example 2
• The following code is a rate 2/3 convolutional code:

[Figure: two-input, three-output encoder realizing the matrix below.]

G = ⎡1 1+D D⎤
    ⎣0  D  1⎦

c = [c1(D) c2(D) c3(D)]
m = [m1(D) m2(D)]
c = mG
Example 2
• m1(D) = 1+D+D^2, m2(D) = 1+D.
• c1(D) = 1+D+D^2.
• c2(D) = (1+D+D^2)(1+D)+(1+D)D = 1+D+D^2+D^3.
• c3(D) = (1+D+D^2)D+(1+D) = 1+D^2+D^3.
• Output = 111, 110, 111, 011.
Encoder State
• A convolutional encoder is a state machine.
• For both encoding and decoding purposes, it is useful to draw the state diagram.
• A state diagram is a temporal diagram showing current state/next state information as well as the input and output information.
• Current state = current contents of memory.
• Next state = contents of memory following the clock pulse (depends on input).
Example 1
[Figure: rate-1/2 encoder (g1 = 1+D+D^2, g2 = 1+D^2) and its state diagram, with branches labelled input/output:
00: 0/00 → 00, 1/11 → 10
10: 0/10 → 01, 1/01 → 11
01: 0/11 → 00, 1/00 → 10
11: 0/01 → 01, 1/10 → 11]
Example 2
[Figure: rate-2/3 encoder circuit of the earlier example (inputs m1, m2; outputs c1, c2, c3).]
Trellis Diagram
• The trellis diagram shows the state transitions over a number of time intervals.

[Figure: trellis for the rate-1/2 encoder — states 00, 10, 01, 11 repeated over successive time steps, with branches labelled by the encoder outputs 00, 11, 10, 01.]
Systematic Convolutional Codes
• For each input i, there exists one generator g_ij(D) = 1 (so the message symbols appear unchanged among the code symbols).
Recursive Convolutional Code
• Some generators have feedback.

[Figure: recursive encoder with feedback realizing g2(D) = D^3/(1+D^2+D^3).]

m(D) = 1
g1(D) = 1, c1(D) = 1
g2(D) = D^3/(1+D^2+D^3), c2(D) = D^3/(1+D^2+D^3) = D^3+D^5+D^6+D^7+…
ELG 5372 Error Control Coding

Lecture 20: Convolutional Codes: Equivalent Codes and Basic Encoders
Equivalent Codes
• Two convolutional codes are said to be equivalent if the sets of sequences that they produce are identical.
• Consider the following two encoders:
– Ga(D) = [1+D+D^2, 1+D^2]
– Gb(D) = [(1+D+D^2)/(1+D^2), 1]
Encoders for Ga(D) and Gb(D)
[Figures: feedforward encoder realizing Ga(D) and recursive encoder realizing Gb(D).]
State Diagram for Ga(D)
[Figure: state diagram for Ga(D) — 00: 0/00→00, 1/11→10; 10: 0/10→01, 1/01→11; 01: 0/11→00, 1/00→10; 11: 0/01→01, 1/10→11.]
State Diagram for Gb(D)

[Figure: state diagram for Gb(D) — the same branch outputs as Ga(D), but with different input labels on some branches.]
Comparison
• When comparing the state diagram of Ga(D) to that of
Gb(D), we see that some state changes are affected, but
the same outputs occur.
• For encoder a, if ma = 1000, output = 11,10, 11, 00
• For encoder b, if mb = 1010, output = 11, 10, 11, 00.
• This is because Gb(D) = T(D)Ga(D), where T(D) is a k × k
invertible matrix.
• Let ca = maGa(D) and cb = mbGb(D) = mbT(D)Ga(D) = ca if
mbT(D) = ma. If T(D) is invertible, then for every possible
ma, there is a distinct mb such that ma = mbT(D).
• In this example, T(D) = 1/(1+D2)
Example 2
Consider the code

G1(D) = ⎡1 0 D/(1+D^3)  ⎤
        ⎣0 1 D^2/(1+D^3)⎦

[Figure: systematic rational encoder realizing G1(D) (inputs m1, m2; outputs c1, c2, c3).]

• Let G2(D) be given by:

G2(D) = ⎡1 D^2⎤⎡1 0 D/(1+D^3)  ⎤ = ⎡1 D^2 D⎤
        ⎣D 1  ⎦⎣0 1 D^2/(1+D^3)⎦   ⎣D  1  0⎦

[Figure: polynomial encoder realizing G2(D).]

Both implementations use 3 storage devices and 2 adders.
Catastrophic Encoders
Consider the rate ½ encoder G(D) = [1+D^2, 1+D].
Let m = 111111…, i.e. m(D) = 1+D+D^2+D^3+… = 1/(1+D).
Then c(D) = [1+D, 1] → the transmitted sequence 1110000000….
Suppose the first three bits are received in error: r(D) = 0000000…, which will be decoded as m' = 0000…. Therefore a finite number of channel errors causes an infinite number of decoded bit errors. This is called a catastrophic code.
Catastrophic Encoders cont’d
[Figure: encoder for G(D) = [1+D^2, 1+D] and its state diagram. A code is catastrophic if there exists a loop in the state diagram that has zero output weight for a non-zero input weight — here the self-loop 1/00 at state 11.]
Catastrophic Codes cont’d
The code G2(D) = [1+D, 1] is equivalent.

[Figure: single-memory encoder and its two-state diagram — state 0: 0/00→0, 1/11→1; state 1: 0/10→0, 1/01→1.]

This code, which is less complex to implement, is not catastrophic. The output sequence 111000000… is produced by the input 10000….
Catastrophic Codes cont'd

• The main conclusion is that a catastrophic encoder may have equivalent encoders that are not catastrophic.
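For rate-1/n polynomial encoders there is a classical test (due to Massey and Sain): the encoder is non-catastrophic if and only if gcd(g1(D), …, gn(D)) = D^l for some l ≥ 0. A sketch using the int-as-polynomial convention from earlier:

    def gf2_mod(a, b):
        # Remainder of GF(2) polynomial division, a mod b.
        while b and a.bit_length() >= b.bit_length():
            a ^= b << (a.bit_length() - b.bit_length())
        return a

    def gf2_gcd(a, b):
        while b:
            a, b = b, gf2_mod(a, b)
        return a

    def is_catastrophic(gens):
        g = 0
        for p in gens:
            g = gf2_gcd(g, p)
        while g and not (g & 1):     # strip factors of D
            g >>= 1
        return g != 1                # any other common factor => catastrophic

    print(is_catastrophic([0b101, 0b11]))  # [1+D^2, 1+D]: gcd = 1+D -> True
    print(is_catastrophic([0b11, 0b1]))    # [1+D, 1]:     gcd = 1   -> False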
Right Inverse

• Let k < n. A right inverse of a k×n matrix G is an n×k matrix G^-1 such that GG^-1 = Ik.
• For G = [1+D^2, 1+D], G^-1 = [1/(D+D^2), (1+D+D^2)/(D+D^2)]^T satisfies GG^-1 = D, i.e., it recovers the input with a one-step delay; dividing both entries by D gives an exact right inverse.
Polynomial Encoder

• A transfer function matrix with only polynomial entries is said to be a polynomial encoder.
  – It uses FIR filters.
• If the transfer function matrix has rational entries, it is a rational encoder.
  – It uses IIR filters.
Catastrophic Codes Revisited

• Let G(D) be the transfer function matrix of a convolutional code.
• c(D) = m(D)G(D) and c(D)G^-1(D) = m(D).
• If the code is catastrophic, there is a finite-weight c(D) that is produced by an infinite-weight m(D). This can only occur if G^-1(D) is rational (non-polynomial).
• Therefore, if G^-1(D) is a polynomial matrix, the code cannot be catastrophic.
Basic Encoders

• A transfer function matrix G(D) is basic if it is polynomial and it has a polynomial right inverse.
• It is shown on page 464 of the text that all rational encoders have a basic equivalent.
ELG 5372 Error Control Coding

Lecture 21: Polynomial, Rational and Systematic Encoders and Introduction to Decoding of Convolutional Codes
Polynomial and Rational Encoders

• Every rational encoder has an equivalent basic encoder.
  – This implies that it is sufficient to use only feedforward encoders to represent every code.
  – However, there may not be an equivalent basic systematic encoder.
  – If a systematic code is desired (for example, for Turbo codes), it may be necessary to use a rational encoder.
Invariant Factor Decomposition

• Let G(D) be a k×n polynomial matrix.
• G(D) can be written as A(D)Γ(D)B(D), where A(D) is a k×k polynomial matrix and B(D) is an n×n polynomial matrix with det(A(D)) = det(B(D)) = 1, and Γ(D) is the k×n diagonal matrix given below:

    \Gamma(D) = \begin{bmatrix} \gamma_1(D) & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \gamma_2(D) & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots & & \vdots \\ 0 & 0 & \cdots & \gamma_k(D) & 0 & \cdots & 0 \end{bmatrix}
Invariant Factor Decomposition 2
• The nonzero elements of Γ(D) are polynomials called the invariant factors of G(D).
• The invariant factors satisfy the property that γi(D) divides γi+1(D).
• If G(D) is rational, G(D) = A(D)Γ(D)B(D) still holds, only Γ(D) is now rational and takes the form

    \Gamma(D) = \begin{bmatrix} \alpha_1(D)/\beta_1(D) & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \alpha_2(D)/\beta_2(D) & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots & & \vdots \\ 0 & 0 & \cdots & \alpha_k(D)/\beta_k(D) & 0 & \cdots & 0 \end{bmatrix}
Invariant Factor Decomposition 3

• Let us express B(D) as

    B(D) = \begin{bmatrix} G'(D) \\ B_2(D) \end{bmatrix}

• where G'(D) is a k×n polynomial matrix and B2(D) is an (n-k)×n polynomial matrix.
• Since the last (n-k) columns of Γ(D) are zero, Γ(D)B(D) = Γ'(D)G'(D).
Invariant Factor Decomposition 4

• where Γ'(D) is the k×k matrix given by

    \Gamma'(D) = \begin{bmatrix} \alpha_1(D)/\beta_1(D) & 0 & \cdots & 0 \\ 0 & \alpha_2(D)/\beta_2(D) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \alpha_k(D)/\beta_k(D) \end{bmatrix}
Invariant Factor Decomposition 5

• Therefore, invariant factor decomposition states that a rational G(D) can be expressed as G(D) = A(D)Γ'(D)G'(D).
• Since A(D)Γ'(D) is nonsingular, G(D) and G'(D) are equivalent encoders. Since B(D) is polynomial, so is G'(D).
• Also, since det(B(D)) = 1, the right inverse of B(D) is polynomial. Since G'(D) is part of B(D), it must also have a polynomial right inverse. Thus G'(D) is a basic encoder.
• Every rational encoder therefore has an equivalent basic transfer function matrix.
Constraint Length and Minimal Encoders

• Let G(D) be a basic encoder.
• Let vi = maxj deg(gij(D)) denote the maximum degree of the polynomials in row i of G(D).
• The constraint length is v = v1+v2+…+vk. This represents the number of memory elements required by the encoder.
• A minimal basic encoder is a basic encoder that has the smallest constraint length among all equivalent basic encoders.
• We are interested in minimal basic encoders as they require the least amount of hardware and have the smallest number of states (2^v for binary codes).
Encoder Matrix Decomposition

• In general, a basic encoder matrix G(D) can be written as:

    G(D) = \begin{bmatrix} D^{v_1} & 0 & \cdots & 0 \\ 0 & D^{v_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & D^{v_k} \end{bmatrix} G_h + \tilde{G}(D) = \Lambda(D)G_h + \tilde{G}(D)

• where Gh is a binary matrix that contains a 1 in each row at the positions where the highest-degree term D^{vi} occurs, and \tilde{G}(D) collects the remaining lower-degree terms.
Example

    G = \begin{bmatrix} 1 & D^2 & D \\ D & 1 & 0 \end{bmatrix} = \begin{bmatrix} D^2 & 0 \\ 0 & D \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} + \begin{bmatrix} 1 & 0 & D \\ 0 & 1 & 0 \end{bmatrix}
Basic Encoder Theorem 1 (BET1)

• Let G(D) be a k×n basic encoding matrix. Then G(D) is a minimal basic encoding matrix if:
  – the maximum degree of the k×k subdeterminants of G(D) is equal to v; (1)
  – Gh is full rank. (2)
• Statements (1) and (2) are equivalent.
• See the proof on pages 466-467 in the text.
Examples

    G_1 = \begin{bmatrix} 1 & D^2 & D \\ D & 1 & 0 \end{bmatrix} = \begin{bmatrix} D^2 & 0 \\ 0 & D \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} + \begin{bmatrix} 1 & 0 & D \\ 0 & 1 & 0 \end{bmatrix}

    G_2 = \begin{bmatrix} 1 & 1+D^2+D^3 & D+D^2 \\ 0 & D+D^3 & D^2 \end{bmatrix} = \begin{bmatrix} D^3 & 0 \\ 0 & D^3 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 0 & 1 & 0 \end{bmatrix} + \begin{bmatrix} 1 & 1+D^2 & D+D^2 \\ 0 & D & D^2 \end{bmatrix}

For G1, Gh is full rank, so G1 is minimal; for G2, Gh has rank 1, so G2 is not.
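The decomposition and the rank test of BET1 are easy to mechanize. A sketch that reproduces the two examples above (bitmask polynomials as in the earlier sketches; gf2_rank is ordinary Gaussian elimination over GF(2)):

    def row_degrees_and_Gh(G):
        # v[i] = max degree in row i; Gh row i packed as a bitmask, MSB = column 0.
        v = [max(g.bit_length() for g in row) - 1 for row in G]
        Gh = [sum(((g >> v[i]) & 1) << (len(row) - 1 - j)
                  for j, g in enumerate(row))
              for i, row in enumerate(G)]
        return v, Gh

    def gf2_rank(rows):
        rows, rank = list(rows), 0
        for i in range(len(rows)):
            if rows[i]:
                rank += 1
                msb = rows[i].bit_length() - 1
                for j in range(len(rows)):
                    if j != i and (rows[j] >> msb) & 1:
                        rows[j] ^= rows[i]
        return rank

    G1 = [[0b1, 0b100, 0b10], [0b10, 0b1, 0b0]]        # [[1, D^2, D], [D, 1, 0]]
    G2 = [[0b1, 0b1101, 0b110], [0b0, 0b1010, 0b100]]  # [[1, 1+D^2+D^3, D+D^2], [0, D+D^3, D^2]]
    for G in (G1, G2):
        v, Gh = row_degrees_and_Gh(G)
        minimal = gf2_rank(Gh) == len(G)
        print(v, [f"{r:03b}" for r in Gh], "minimal" if minimal else "not minimal")
    # G1: v = [2, 1], Gh = 010/100, full rank -> minimal
    # G2: v = [3, 3], Gh = 010/010, rank 1    -> not minimal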
Producing an Equivalent Basic Encoder of Reduced Constraint Length

• Let G be a basic encoder.
• If Gh is rank deficient, then G(D) is not a minimal basic encoder.
• Let gi be the row of greatest degree vi among a set of rows whose Gh rows are linearly dependent.
  – Replace g_i := g_i + \sum_{j \ne i} D^{v_i - v_j} g_j, summing over the other rows j in the dependent set.
  – Determine the row of maximum degree. If it is still gi, stop. Otherwise, repeat the step above.
• See pages 466-467 for the proof.
Example cont'd

    G_2 = \begin{bmatrix} 1 & 1+D^2+D^3 & D+D^2 \\ 0 & D+D^3 & D^2 \end{bmatrix}

g1 = [1, 1+D^2+D^3, D+D^2] and g2 = [0, D+D^3, D^2] both have degree 3.

Let g1 = [1, 1+D^2+D^3, D+D^2] + [0, D+D^3, D^2] = [1, 1+D+D^2, D], which now has degree 2. Then let g2 = [0, D+D^3, D^2] + D[1, 1+D+D^2, D] = [D, D^2, 0]:

    G_3 = \begin{bmatrix} 1 & 1+D+D^2 & D \\ D & D^2 & 0 \end{bmatrix}

g1 = [1, 1+D+D^2, D] and g2 = [D, D^2, 0] still both have their highest-degree term D^2 in the second column, so Gh remains rank deficient. Repeating the procedure (g1 := g1+g2, then g2 := g2+Dg1) gives

    G_4 = \begin{bmatrix} 1+D & 1+D & D \\ D^2 & D & D^2 \end{bmatrix} = \begin{bmatrix} 1+D & D \\ D^2 & 1+D+D^2 \end{bmatrix} G_2

which has Gh full rank and constraint length v = 1 + 2 = 3.
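The row-operation procedure can be looped until Gh becomes full rank. A sketch for the two-row case above (reuses gf2_mul, row_degrees_and_Gh and gf2_rank from the earlier sketches); note that the minimal form is not unique, and the slide's G4 differs from the result below by one further row operation:

    def reduce_step(G):
        v, Gh = row_degrees_and_Gh(G)
        if gf2_rank(Gh) == len(G):
            return G, True                    # minimal basic encoder reached
        # For k = 2, rank deficiency means the two Gh rows are equal:
        # add a shifted copy of the lower-degree row to the higher-degree row.
        i, j = (0, 1) if v[0] >= v[1] else (1, 0)
        shift = 1 << (v[i] - v[j])            # the polynomial D^(vi-vj)
        G = [row[:] for row in G]
        G[i] = [G[i][t] ^ gf2_mul(G[j][t], shift) for t in range(len(G[i]))]
        return G, False

    G, done = [[0b1, 0b1101, 0b110], [0b0, 0b1010, 0b100]], False   # G2
    while not done:
        G, done = reduce_step(G)
    print(G)   # -> [[0b11, 0b11, 0b10], [0b10, 0b100, 0b0]], i.e.
               # [[1+D, 1+D, D], [D, D^2, 0]]: constraint length 1+2 = 3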
Decoding Convolutional Codes

• Several algorithms exist for the decoding of convolutional codes.
• The most common is the Viterbi algorithm.
• A variation is the soft-output Viterbi algorithm (SOVA), which provides not only the decoded output but also a reliability measure for each decoded symbol.
• Suboptimal decoding algorithms also exist. These are used to reduce complexity, especially when the constraint length is large. The stack and Fano algorithms are of particular interest.
Viterbi Algorithm

• Originally proposed by Andrew Viterbi.
• Only later was it shown to provide the maximum-likelihood code sequence given the received data.
• It is essentially a shortest-path algorithm.
Viterbi Algorithm for Hard-Decision Decoding

• The received data are "hard" (decisions rather than likelihoods are given to the decoder).
• The algorithm finds the path through the trellis whose code sequence is closest to the received sequence in terms of Hamming distance.
• The algorithm uses the trellis diagram introduced in a previous lecture.
Example

• Consider the rate 1/2 code G(D) = [1+D+D^2, 1+D^2].

[Figure: seven-stage trellis for this code, with branches labeled by their output pairs.]
Example

r = 11, 10, 00, 10, 01, 11, 00

[Figures: the received pairs are written above the trellis and the Viterbi algorithm proceeds stage by stage. Each branch is assigned the Hamming distance between its output pair and the received pair, and each state keeps only the incoming path with the smallest accumulated distance (the survivor). For example, in the first stage the branch 00→00 (output 00, received 11) has metric 2, while 00→10 (output 11) has metric 0.]
Example cont'd

[Figure: completed trellis with the final accumulated metrics at each state.]

r = 11, 10, 00, 10, 01, 11, 00

If the trellis is terminated, the surviving path gives c = 11, 10, 00, 01, 01, 11, 00 and m = 1, 0, 1, 1, 0, 0, 0 (a decoder sketch follows).
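The whole example can be reproduced in a few lines. A sketch of a hard-decision Viterbi decoder for this encoder (assumes a terminated trellis ending in state 00; names are mine):

    def viterbi_hard(r_pairs):
        INF = 10**9
        states = [(a, b) for a in (0, 1) for b in (0, 1)]
        cost = {s: (0 if s == (0, 0) else INF) for s in states}
        path = {s: [] for s in states}
        for r1, r2 in r_pairs:
            new_cost = {s: INF for s in states}
            new_path = {s: [] for s in states}
            for (s1, s2) in states:
                for m in (0, 1):
                    c1, c2 = m ^ s1 ^ s2, m ^ s2                  # branch output
                    d = cost[(s1, s2)] + (c1 != r1) + (c2 != r2)  # Hamming metric
                    if d < new_cost[(m, s1)]:
                        new_cost[(m, s1)] = d
                        new_path[(m, s1)] = path[(s1, s2)] + [m]
            cost, path = new_cost, new_path
        return path[(0, 0)]       # survivor ending in the all-zero state

    r = [(1, 1), (1, 0), (0, 0), (1, 0), (0, 1), (1, 1), (0, 0)]
    print(viterbi_hard(r))        # -> [1, 0, 1, 1, 0, 0, 0], as on the slide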
ELG 5372 Error Control Coding

Lecture 22: Soft-Decision Decoding of Convolutional Codes
Likelihoods

• Consider the following channel:

[Figure: binary-input, 8-ary-output channel. The input ci ∈ {0, 1} (with prior p(ci)) maps to one of eight outputs ri ∈ {0''', 0'', 0', 0, 1, 1', 1'', 1'''}. The transition probabilities p(ri|ci) are known; p(ri|ci) viewed as a function of ci is the likelihood.]
AWGN Channel and Likelihoods

[Figure: conditional pdfs fr(ri|ci=0) and fr(ri|ci=1), centered at -1 and +1 on the r axis; the eight quantization regions 0''', …, 1''' partition the real line.]
Maximum Likelihood Sequence Estimation

• Suppose we receive r = (r0, r1, r2) = (1'', 0'', 0).
• In the hard-decision case, this would be given to the decoder as (1, 0, 0).
• In the coded case, suppose that the only possible code sequences are (0,0,0), (0,1,1), (1,0,1) and (1,1,0).
• Assuming that 0 and 1 are transmitted with equal probability, the most likely codeword is the one that maximizes:

    \max_{c} \prod_{i=1}^{3} p(r_i | c_i)
Example

    ri:     0'''    0''    0'     0      1       1'     1''    1'''
    ci = 0: 0.47    0.25   0.139  0.085  0.0455  0.009  0.001  0.0005
    ci = 1: 0.0005  0.001  0.009  0.0455 0.085   0.139  0.25   0.47

Assume that 0 and 1 are transmitted with equal probability.
Example

• Then p(r|000) = 0.001×0.25×0.085 ≈ 0.000021.
• p(r|011) = 0.001×0.001×0.0455 ≈ 4.6×10^-8.
• p(r|101) = 0.25×0.25×0.0455 ≈ 0.0028.
• p(r|110) = 0.25×0.001×0.085 ≈ 0.000021.
• In the hard-decision case, the decoder would have determined that 000, 101 and 110 are all equally likely (each at Hamming distance 1 from the hard-detected (1,0,0)); the soft information singles out 101.
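A brute-force check of the computation above, using the p(ri|ci) table from the previous slide:

    p = {0: {"0'''": 0.47, "0''": 0.25, "0'": 0.139, "0": 0.085,
             "1": 0.0455, "1'": 0.009, "1''": 0.001, "1'''": 0.0005},
         1: {"0'''": 0.0005, "0''": 0.001, "0'": 0.009, "0": 0.0455,
             "1": 0.085, "1'": 0.139, "1''": 0.25, "1'''": 0.47}}

    r = ["1''", "0''", "0"]
    for c in [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]:
        lik = 1.0
        for ri, ci in zip(r, c):
            lik *= p[ci][ri]      # product of per-symbol likelihoods
        print(c, lik)
    # (1, 0, 1) wins: 0.25 * 0.25 * 0.0455 ~= 0.0028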
Log Likelihood Function

• The likelihood function is a product of conditional probabilities.
• For long sequences, the resulting likelihoods will be very small compared to 1.
• To simplify, we use the log likelihood function.
• p(ri|ci) is replaced by log(p(ri|ci)) and

    \prod_{i=1}^{L} p(r_i|c_i)  becomes  \sum_{i=1}^{L} \log(p(r_i|c_i))     (1)

The most likely codeword maximizes (1).
Decoding Metrics
• M(ri|ci) is a function of the log likelihood.
• Since the log of a probability is always negative, we add a constant to all log likelihoods so that they are all positive.
• Log likelihoods may also have many digits after the decimal, so we multiply by another constant to yield metrics that can be approximated by whole numbers (or numbers that don't require much memory).
• M(ri|ci) = a(log(p(ri|ci)) + b).
• The path metric is

    \sum_{i=1}^{L} M(r_i|c_i) = \sum_{i=1}^{L} a(\log(p(r_i|c_i)) + b) = a\sum_{i=1}^{L} \log(p(r_i|c_i)) + abL

so maximizing the path metric is equivalent to maximizing the log likelihood.
    ri:     0'''    0''    0'     0      1       1'     1''    1'''
    ci = 0: 0.47    0.25   0.139  0.085  0.0455  0.009  0.001  0.0005
    ci = 1: 0.0005  0.001  0.009  0.0455 0.085   0.139  0.25   0.47

Taking log10:

    ri:     0'''    0''    0'     0      1       1'     1''    1'''
    ci = 0: -0.328  -0.602 -0.857 -1.07  -1.34   -2.05  -3     -3.3
    ci = 1: -3.3    -3     -2.05  -1.34  -1.07   -0.857 -0.602 -0.328

Add 3.3 to all entries:

    ri:     0'''    0''    0'     0      1       1'     1''    1'''
    ci = 0: 2.972   2.698  2.443  2.23   1.96    1.25   0.3    0
    ci = 1: 0       0.3    1.25   1.96   2.23    2.443  2.698  2.972

Then multiply by a constant and round (here the constant is 20):

    ri:     0'''    0''    0'     0      1       1'     1''    1'''
    ci = 0: 59      54     49     45     39      25     6      0
    ci = 1: 0       6      25     39     45      49     54     59
Example

• Again consider r = (1'', 0'', 0).
• M(r|000) = 6+54+45 = 105
• M(r|011) = 6+6+39 = 51
• M(r|101) = 54+54+39 = 147
• M(r|110) = 54+6+45 = 105
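The integer metric table can be generated exactly as described: take log10, add b = 3.3, scale by a = 20, and round. A sketch:

    import math

    lik0 = [0.47, 0.25, 0.139, 0.085, 0.0455, 0.009, 0.001, 0.0005]  # p(ri|ci=0)
    metrics0 = [round(20 * (math.log10(x) + 3.3)) for x in lik0]
    print(metrics0)   # -> [59, 54, 49, 45, 39, 25, 6, 0]
    # By the symmetry of the channel, the ci = 1 row is the same list reversed.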
Metrics Applied to the Viterbi Algorithm

r = 1'''1, 1'0''', 0''0', 1'''0''', 0'1, 1'1''', 0''0

[Figure: trellis with accumulated integer metrics at each state; the survivor at each state is now the incoming path with the largest metric. For example, the path 00→00→00 (outputs 00, 00) accumulates M(1'''|0)+M(1|0) = 0+39 = 39 after the first stage and 39+M(1'|0)+M(0'''|0) = 39+25+59 = 123 after the second.]
Metrics Applied to the Viterbi Algorithm

• If the encoder is reset to the all-zero state, then the decoded path must end in the all-zero state. Here the decoded path is:
  – 00-10-01-10-01-10-01-00 (state transitions)
  – 11, 10, 00, 10, 00, 10, 11 (code sequence)
  – 1, 0, 1, 0, 1, 0, 0 (message)
Soft-Decision Decoding

• In the previous example, we used quantized values for the received sequence.
• This is not pure soft-decision decoding, but rather a compromise between soft-decision and hard-decision decoding.
• In soft-decision decoding, infinitely fine quantization is used (in other words, we can use the decision variable itself, or perhaps a log likelihood ratio).
• For the AWGN channel, the code sequence that maximizes \prod_{i=1}^{L} p(r_i|c_i) is also the code sequence that has the smallest Euclidean distance from r.
Euclidean Distance

• Let v = (v1, v2, …, vL) and u = (u1, u2, …, uL).
• The Euclidean distance between v and u is:

    ED(v, u) = \sqrt{\sum_{i=1}^{L} (v_i - u_i)^2}

• If c minimizes ED(r, c), then c also minimizes ED^2(r, c) (since distances cannot be negative), so the decoder can work with squared distances and avoid the square root.
Example

r = 2.05 0.3, 0.6 -1.8, -0.75 -0.5, 1.44 -1.3, -0.5 0.21, 0.55 1.8, -0.8 -0.2

Assuming a 1 is received as +1 and a 0 as -1 in the absence of noise.

[Figure: trellis with accumulated squared Euclidean distances. For example, on the first branch the path 00→00 (output 00 ↔ (-1,-1)) has squared distance 3.05^2 + 1.3^2 = 10.99, while 00→10 (output 11 ↔ (+1,+1)) has 1.05^2 + 0.7^2 ≈ 1.59.]
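A sketch of the soft-decision version: the same recursion as the hard-decision decoder given earlier, with the Hamming branch metric replaced by squared Euclidean distance (bit b is mapped to 2b-1, i.e. 0 → -1 and 1 → +1), assuming the same encoder and termination in state 00:

    def viterbi_soft(r_pairs):
        INF = float("inf")
        states = [(a, b) for a in (0, 1) for b in (0, 1)]
        cost = {s: (0.0 if s == (0, 0) else INF) for s in states}
        path = {s: [] for s in states}
        for r1, r2 in r_pairs:
            new_cost = {s: INF for s in states}
            new_path = {s: [] for s in states}
            for (s1, s2) in states:
                for m in (0, 1):
                    c1, c2 = m ^ s1 ^ s2, m ^ s2   # branch output bits
                    d = cost[(s1, s2)] + (r1 - (2*c1 - 1))**2 + (r2 - (2*c2 - 1))**2
                    if d < new_cost[(m, s1)]:
                        new_cost[(m, s1)], new_path[(m, s1)] = d, path[(s1, s2)] + [m]
            cost, path = new_cost, new_path
        return path[(0, 0)]

    r = [(2.05, 0.3), (0.6, -1.8), (-0.75, -0.5), (1.44, -1.3),
         (-0.5, 0.21), (0.55, 1.8), (-0.8, -0.2)]
    print(viterbi_soft(r))   # decoded message bits for the received values above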
Code Transfer Function
Consider the rate 1/2 code G(D) = [1+D+D^2, 1+D^2]. Its state diagram is:

[Figure: the state diagram of Example 1, repeated.]

The code transfer function tells us how many paths there are of each weight d, as well as the weight of the message sequences that produce those paths.
Definition of a Non-Zero Weight Path

Any path that diverges from and then remerges with the all-zero path.
Transfer Function

• Find all the paths that diverge from and then remerge with the all-zero state.

[Figure: modified state diagram with state 00 split into a start node and an end node. Each branch is labeled X^(input weight) Y^(output weight): 00→10: XY^2, 10→01: Y, 10→11: XY, 11→11: XY, 11→01: Y, 01→10: X, 01→00: Y^2.]

This can be seen as a signal flow graph with feedforward and feedback loops. The transfer function T(X,Y) is the "gain" of the graph.
Mason's Rule

• To find the transfer function of a system with multiple feedforward and feedback loops, we use Mason's rule:

    T(X,Y) = \frac{\sum_i F_i \Delta_i}{\Delta}

• Fi is the gain of the ith forward path. A forward path goes from the start state to the end state without passing through any state more than once.
• The graph determinant is Δ. It is given by:

    \Delta = 1 - \sum_{L_l} C_l + \sum_{L_l L_m} C_l C_m - \sum_{L_l L_m L_n} C_l C_m C_n + \ldots
Mason's Rule cont'd

• Cl is the gain of the lth loop. A loop starts in a state and ends in that same state, without going through any intermediate state more than once.
• Ll and Lm run over pairs of non-touching loops; Ll, Lm, Ln over trios of non-touching loops, and so on.
• The cofactor of forward path i is Δi, which is the same as Δ but with all loops that touch the ith forward path eliminated.
Forward Paths

• In our example, we have two forward paths:
• Path 1 = 00-10-01-00: F1 = XY^5
• Path 2 = 00-10-11-01-00: F2 = X^2Y^6
Loops

• In our example, there are three loops:
• L1 = 10-01-10: C1 = XY
• L2 = 11-11: C2 = XY
• L3 = 10-11-01-10: C3 = X^2Y^2
Graph Determinant

• L1 and L2 are non-touching (they have no states in common). This is the only pair of non-touching loops, and there is no set of three mutually non-touching loops.
• The graph determinant is Δ = 1 - (C1+C2+C3) + C1C2 = 1 - 2XY - X^2Y^2 + X^2Y^2 = 1 - 2XY.
Cofactors of Paths 1 and 2

• Path 1 does not touch loop L2. Therefore Δ1 = 1 - C2 = 1 - XY.
• All loops touch path 2, therefore Δ2 = 1.
Transfer Function

    T(X,Y) = \frac{XY^5(1-XY) + X^2Y^6}{1-2XY} = \frac{XY^5}{1-2XY} = XY^5 + 2X^2Y^6 + 4X^3Y^7 + 8X^4Y^8 + \ldots

There is one path of weight 5, produced by a message of weight 1. There are 2 paths of weight 6, both produced by messages of weight 2. There are 4 paths of weight 7, all produced by messages of weight 3, and so on.
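The series can be verified by enumerating, directly on the state diagram, every path that diverges from the all-zero state and later remerges, keeping track of input weight (exponent of X) and output weight (exponent of Y). A sketch using the encoder equations from the earlier sketches:

    from collections import defaultdict

    def spectrum(max_out_weight):
        counts = defaultdict(int)       # (input weight, output weight) -> #paths
        stack = [((1, 0), 1, 2)]        # first branch 00 -> 10: input 1, output 11
        while stack:
            (s1, s2), iw, ow = stack.pop()
            for m in (0, 1):
                c1, c2 = m ^ s1 ^ s2, m ^ s2
                niw, now = iw + m, ow + c1 + c2
                if now > max_out_weight:
                    continue            # prune: every cycle adds output weight
                if (m, s1) == (0, 0):
                    counts[(niw, now)] += 1   # remerged with the all-zero path
                else:
                    stack.append(((m, s1), niw, now))
        return counts

    print(sorted(spectrum(8).items()))
    # -> [((1, 5), 1), ((2, 6), 2), ((3, 7), 4), ((4, 8), 8)]
    # one weight-5 path (message weight 1), two weight-6 (message weight 2), ...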