Lecture Notes On "Source and Channel Coding"
t' = d_min − 1 symbol errors for sure. Although each symbol consists of log2(q) bits, for q > 2 this
does not mean that (d_min − 1) · log2(q) bit errors are detected for sure.
Example
single parity check code: Each SPC code is able to detect one error for sure
Proof of error detection theorem
25
Channel coding
Each sphere of radius d_min − 1 around a code word includes no other code words
If at most d_min − 1 errors occur, the received word is included in the
decoding sphere around the transmitted code word
Since no further code word is contained in this decoding sphere, a confusion with
an allowed code word is not possible
Therefore, an (n, k, d_min)_q block code can detect t' = d_min − 1 symbol errors for
sure
Error correction
Theorem
A block code (n, k, d_min)_q C is able to correct t = ⌊(d_min − 1)/2⌋ symbol errors for
sure. Although each symbol consists of log2(q) bits, for q > 2 this does not mean that
⌊(d_min − 1)/2⌋ · log2(q) bit errors are corrected for sure.
26
Channel coding
Example
Repetition code: A (3, 1, 3)_2 code is able to correct one error for sure.
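As a small illustration (a Python sketch, not part of the original slides), majority-vote decoding of the (3, 1, 3)_2 repetition code corrects every single bit error:

```python
# Majority-vote decoding of the (3, 1, 3)_2 repetition code:
# every pattern with at most t = 1 bit error is decoded correctly.
def rep3_encode(bit):
    return [bit] * 3

def rep3_decode(received):
    # majority vote over the three received bits
    return 1 if sum(received) >= 2 else 0

for info in (0, 1):
    codeword = rep3_encode(info)
    for error_pos in range(3):            # flip exactly one bit
        received = codeword[:]
        received[error_pos] ^= 1
        assert rep3_decode(received) == info
print("all single errors corrected")
```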
Proof of error correction theorem
Now we consider decoding spheres of radius t
Hence, decoding spheres with a minimum distance d_min = 2t + 1 are disjoint
Therefore, an (n, k, d_min)_q block code can correct t symbol errors for sure if
2t + 1 ≤ d_min, i.e., t ≤ ⌊(d_min − 1)/2⌋
27
Channel coding
Hamming codes
Definition
(n, k, d_min)_q = (n, n − r, 3)_q Hamming codes of order r are defined as follows:
n = (q^r − 1)/(q − 1) = 1 + q + q^2 + ... + q^(r−1),   k = n − r
Hamming codes exist for all orders r ≥ 2
All Hamming codes are linear and systematic and have a minimum distance
d_min = 3, i.e. t = 1 symbol error can be corrected for sure. Example (q = 2):

r   n         k             R       d_min
2   3         1             1/3     3
3   7         4             4/7     3
4   15        11            11/15   3
r   2^r − 1   2^r − r − 1   k/n     3
28
Channel coding
Example: The (7, 4, 3)_2 Hamming code consists of 16 code words of length 7:
u x
[0000] [0000000]
[0001] [0001111]
[0010] [0010110]
[0011] [0011001]
[0100] [0100101]
[0101] [0101010]
[0110] [0110011]
[0111] [0111100]
29
Channel coding
u x
[1000] [1000011]
[1001] [1001100]
[1010] [1010101]
[1011] [1011010]
[1100] [1100110]
[1101] [1101001]
[1110] [1110000]
[1111] [1111111]
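The table can be reproduced from the systematic generator matrix G = [I_4 | P] whose rows are the code words of the unit-weight info words listed above. A small Python sketch (not from the original slides):

```python
import itertools

# Rows of G = [I_4 | P], read off the code words of the info words
# [1000], [0100], [0010], [0001] in the table above.
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(u):
    # x = u * G over GF(2)
    return [sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

for u in itertools.product([0, 1], repeat=4):
    x = encode(list(u))
    print("".join(map(str, u)), "->", "".join(map(str, x)))
```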
30
Channel coding
Hard-decision and soft-decision decoding
hard-decision (HD) decoding (hard-input decoding)
Let y := x ⊕ e, where the error components e_i, i = 0, 1, ..., n − 1, are over the
symbol alphabet and where the addition is done modulo q.
For hard-decision decoding, we apply the rule: decide for the code word x̂ with
d_H(y, x̂) ≤ d_H(y, x̃)   ∀ x̃ ∈ C
soft-decision (SD) decoding (soft-input decoding)
Let y := x + n, where n_i ∈ IR, i = 0, 1, ..., n − 1, and the addition is defined
over IR.
For soft-decision decoding, we apply the rule: decide for the code word x̂ with
||y − x̂|| ≤ ||y − x̃||   ∀ x̃ ∈ C
31
Channel coding
Example of hard-decision decoding
Consider the (7, 4, 3)_2 Hamming code with the following assumptions:
Info word: u = [0010]
Code word: x = [0010110]
Error word: e = [0010000]
Received word: y = [0000110]
Decoded code word: x̂ = [0010110] (*)
Decoded info word: û = [0010]
(*) This code word has the smallest Hamming distance with respect to the received
word: d_H(y, x̂) = 1, d_H(y, x̃) > 1 ∀ x̃ ≠ x̂. According to this example, we recognize
that for systematic codes the parity check symbols should not be eliminated before
decoding takes place.
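A Python sketch of this hard-decision (minimum Hamming distance) decoder, reusing the generator matrix given earlier (not part of the original slides):

```python
# Hard-decision decoding of the (7,4,3)_2 Hamming code by exhaustive
# search for the code word with minimum Hamming distance to y.
import itertools

G = [[1,0,0,0,0,1,1], [0,1,0,0,1,0,1], [0,0,1,0,1,1,0], [0,0,0,1,1,1,1]]

def encode(u):
    return tuple(sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7))

codebook = {encode(u): u for u in itertools.product([0, 1], repeat=4)}

def d_H(a, b):
    return sum(ai != bi for ai, bi in zip(a, b))

y = (0, 0, 0, 0, 1, 1, 0)                        # received word from the example
x_hat = min(codebook, key=lambda x: d_H(y, x))   # minimum-distance decoding
print(x_hat, codebook[x_hat])                    # -> (0,0,1,0,1,1,0), (0,0,1,0)
```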
32
Channel coding
Example of soft-decision decoding
Example 1: Consider again the (7, 4, 3)_2 Hamming code and binary antipodal
modulation (0 → +1, 1 → −1)
In soft-decision decoding, usually the squared Euclidean distance is used
Info word: u = [0010]
Transmitted sequence: x = [+1.0, +1.0, −1.0, +1.0, −1.0, −1.0, +1.0]
Noise sequence: n = [+0.2, −0.3, +1.1, +0.1, −0.1, −0.2, +0.3]
Received sequence: y = [+1.2, +0.7, +0.1, +1.1, −1.1, −1.2, +1.3]
Decoded code word: x̂ = [+1.0, +1.0, −1.0, +1.0, −1.0, −1.0, +1.0] (*)
Decoded info word: û = [0010]
(*) This modulated code word has the smallest squared Euclidean distance with
respect to the received sequence. Due to the soft decoding, often more than
t' = d_min − 1 errors can be detected and more than t = ⌊(d_min − 1)/2⌋ errors can
be corrected.
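The corresponding soft-decision decoder as a Python sketch (not from the slides), again using the generator matrix of the example and the mapping 0 → +1, 1 → −1:

```python
# Soft-decision decoding: choose the modulated code word with minimum
# squared Euclidean distance to the received sequence y.
import itertools

G = [[1,0,0,0,0,1,1], [0,1,0,0,1,0,1], [0,0,1,0,1,1,0], [0,0,0,1,1,1,1]]
encode = lambda u: [sum(u[i]*G[i][j] for i in range(4)) % 2 for j in range(7)]
modulate = lambda x: [1.0 if b == 0 else -1.0 for b in x]   # 0 -> +1, 1 -> -1

y = [+1.2, +0.7, +0.1, +1.1, -1.1, -1.2, +1.3]              # received sequence

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

u_hat = min(itertools.product([0, 1], repeat=4),
            key=lambda u: sq_dist(y, modulate(encode(list(u)))))
print(u_hat)   # -> (0, 0, 1, 0)
```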
33
Channel coding
Error probability
Definition I (word error probability):
The word error probability is by definition the probability that the decoded info
word deviates from the transmitted info word in at least one symbol:
P_w := P(û ≠ u) = P(x̂ ≠ x)
Definition II (symbol error probability):
The symbol error probability is by definition the average probability of a symbol
error:
P_s := (1/n) Σ_{i=0}^{n−1} P(û_i ≠ u_i)
Remarks
For binary codes (q = 2), the symbol error probability P_s is equal to the bit
error probability P_b.
Since the number of symbol errors per word error is between 1 and n,
P_w/n ≤ P_s ≤ P_w holds.
34
Channel coding
Decoding principles for block codes
Let x̃ denote hypotheses for the code words of a block code C and let y be the received
word. Furthermore, let P(x̃|y) be the conditional probability of x̃ given y.
Definition I (maximum a posteriori (MAP) decoding):
The MAP decoding rule is as follows: select the code word x̂ such that for a given
received word y
HD & SD : P(x̂|y) ≥ P(x̃|y)   ∀ x̃ ∈ C
Definition II (maximum-likelihood (ML) decoding):
The ML rule is as follows: select the code word x̂ such that for a given received
word y
HD : P(y|x̂) ≥ P(y|x̃)   ∀ x̃ ∈ C
SD : p(y|x̂) ≥ p(y|x̃)   ∀ x̃ ∈ C
In both rules, all q^k possible code words x̃ are compared with the received word
y, at least conceptually. The most likely code word (MAP rule) or the code word
nearest to the received word (ML rule) is finally selected.
35
Channel coding
Theorem:
MAP and ML decoding are identical if all q^k code words are equiprobable
Proof:
According to Bayes' rule, P(x|y) = P(y|x) · P(x)/P(y) or
P(x|y) = p(y|x) · P(x)/p(y). Since the denominator on the right-hand side is
irrelevant with respect to a maximization, both decoding rules differ only in the use
of the a priori information P(x). Therefore, for equiprobable code words both decoding
rules are identical. q.e.d.
Definition III (bounded minimum distance (BMD) decoding)
Consider decoding spheres of radius r ≤ ⌊(d_min − 1)/2⌋. Only those received
words y are considered for decoding which lie within a decoding sphere. In this case,
the code word in the center of the corresponding decoding sphere is selected. For all
received words y outside the decoding spheres, an erasure is declared.
Definition IV (perfect codes)
A code is called perfect if the ML and the BMD decoding principles are identical.
36
Channel coding
Binary symmetric channel, BSC
A so-called binary symmetric channel (BSC) models independent bit errors e_i:
y_i = x_i ⊕ e_i,   i = 0, ..., n − 1
where x_i, e_i, y_i ∈ {0, 1}. The corresponding bit error probability is denoted as
P_BSC.
[Channel diagram: input 0 is received as 0 and input 1 as 1 with probability
1 − P_BSC; the crossovers 0 → 1 and 1 → 0 occur with probability P_BSC.]
37
Channel coding
Word error probability for the BSC
Now we consider a coded transmission system with hard-decision decoding given
the example of a BSC with error probability P_BSC:
Theorem
For linear (n, k, d_min)_q block codes the word error probability for ML decoding can
be approximated by the upper bound
P_w ≤ 1 − Σ_{r=0}^{t} (n choose r) · P_BSC^r · (1 − P_BSC)^(n−r)
For perfect codes (e.g., for binary Hamming codes) this bound is exact.
In this case ML and BMD decoding are identical.
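A short Python sketch (not from the slides) that evaluates this bound:

```python
# Word error probability bound for hard-decision decoding on a BSC:
# P_w <= 1 - sum_{r=0}^{t} C(n, r) p^r (1-p)^(n-r)
# (exact for perfect codes such as the binary Hamming codes).
from math import comb

def word_error_bound(n, t, p_bsc):
    return 1.0 - sum(comb(n, r) * p_bsc**r * (1 - p_bsc)**(n - r)
                     for r in range(t + 1))

# (7, 4, 3)_2 Hamming code: t = 1
print(word_error_bound(7, 1, 1e-2))   # ~2.0e-3
print(word_error_bound(7, 1, 1e-3))   # ~2.1e-5
```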
38
Channel coding
Proof:
P_w = 1 − P(correct decoding)
Assumption: decoding is correct if at most t errors occur:
P_w = 1 − P(w_H(e) ≤ t)
This corresponds to bounded minimum distance (BMD) decoding. Therefore,
P_w ≤ 1 − Σ_{r=0}^{t} P(w_H(e) = r)
The number of errors in a word of length n is binomially distributed:
P(w_H(e) = r) = (n choose r) · P_BSC^r · (1 − P_BSC)^(n−r)
q.e.d.
39
Channel coding
Additive white Gaussian noise (AWGN) channel
In an additive white Gaussian noise (AWGN) channel model, zero-mean white
Gaussian noise is linearly added to the channel input value x_i:
y_i = x_i + n_i,   i = 0, 1, ..., n − 1
where x_i, n_i, y_i ∈ IR. The channel inputs x_i and the noise samples n_i are
statistically independent.
[Block diagram: the noise sample n_i is added to the channel input x_i, giving the
channel output y_i.]
40
Channel coding
The noise n_i (n_i ∈ IR) is:
additive: y_i = x_i + n_i, x_i, y_i ∈ IR, E{x_i^2} = 1, e.g. x_i ∈ {+1, −1}
white: E{n_i n_j} = 0 if i ≠ j
Gaussian distributed: p(n_i) = 1/√(2πσ^2) · exp(−(n_i − µ)^2/(2σ^2))
(i.e., p(y_i|x_i) = 1/√(2πσ^2) · exp(−(y_i − x_i)^2/(2σ^2)))
with zero mean µ = E{n_i} = 0
and variance σ^2 = E{n_i^2} = 1/(2 E_s/N_0)
where E_s is the energy per symbol
and N_0 is the single-sided noise power density: N_0 = k T_eff
(k is here the Boltzmann constant)
E_s/N_0 is the so-called signal-to-noise ratio (SNR)
The signal-to-noise ratio per info bit is E_b/N_0 = R^(−1) · E_s/N_0
(Note that white, Gaussian distributed random variables are statistically
independent)
(R is the code rate)
41
Channel coding
Bit error probability for the AWGN channel
For an uncoded transmission system (R = 1) with binary, identically distributed
symbols x_i ∈ {+1, −1} the bit error probability P_b of the AWGN channel can be
calculated as follows:
P_b = (1/2) ∫_{−∞}^{0} p(y_i | x_i = +1) dy_i + (1/2) ∫_{0}^{∞} p(y_i | x_i = −1) dy_i
    = ∫_{−∞}^{0} p(y_i | x_i = +1) dy_i
With
p(n_i) = 1/√(2πσ^2) · exp(−n_i^2/(2σ^2)),   σ^2 = N_0/(2E_s)
follows
p(y_i | x_i) = 1/√(2πσ^2) · exp(−(y_i − x_i)^2/(2σ^2))
After substitution we obtain the final result
P_b = ∫_{−∞}^{0} 1/√(2πσ^2) · exp(−(y_i − 1)^2/(2σ^2)) dy_i = (1/2) erfc(√(E_s/N_0)),
where E_s = E_b
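The closed-form result is easy to evaluate numerically; a minimal Python sketch (not from the slides):

```python
# Uncoded antipodal transmission over AWGN: P_b = 0.5 * erfc(sqrt(Es/N0)), Es = Eb.
from math import erfc, sqrt

def ber_uncoded(eb_n0_db):
    eb_n0 = 10 ** (eb_n0_db / 10)
    return 0.5 * erfc(sqrt(eb_n0))

for snr_db in (0, 4, 8, 9.6):
    print(f"{snr_db:4.1f} dB : P_b = {ber_uncoded(snr_db):.2e}")
# a bit error probability of about 1e-5 is reached near Eb/N0 = 9.6 dB
```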
42
Channel coding
Word error probability for the AWGN channel
Now we consider a coded transmission system with soft-decision decoding given the
example of an AWGN channel model with signal-to-noise ratio E_s/N_0:
Theorem: For an (n, k, d_min)_q block code and antipodal transmission (x_i ∈ {+1, −1})
with ML decoding, the word error probability can be approximated by a lower and
an upper bound:
(1/2) erfc(√(d_min · E_s/N_0)) ≤ P_w ≤ (1/2) Σ_{d=d_min}^{n} a_d · erfc(√(d · E_s/N_0))
where E_s/N_0 = R · E_b/N_0 and a_d is the number of all code words with Hamming
distance d with respect to the transmitted code word
For linear codes, a_d is equal to the number of code words with Hamming weight d.
For linear codes the inequality holds for all q^k code words
Example: a_{d_min} = a_3 = 7 for the (7, 4, 3)_2 Hamming code
Asymptotically, i.e. for large E_s/N_0, the bounds are exact
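A Python sketch (not from the slides) that evaluates both bounds for the (7, 4, 3)_2 Hamming code; the weight distribution a_d is not assumed but computed by enumerating the 16 code words from the generator matrix given earlier:

```python
# Lower/upper bounds on P_w for soft-decision ML decoding of the
# (7,4,3)_2 Hamming code on the AWGN channel.
import itertools
from math import erfc, sqrt
from collections import Counter

G = [[1,0,0,0,0,1,1], [0,1,0,0,1,0,1], [0,0,1,0,1,1,0], [0,0,0,1,1,1,1]]
encode = lambda u: [sum(u[i]*G[i][j] for i in range(4)) % 2 for j in range(7)]

weights = Counter(sum(encode(list(u)))
                  for u in itertools.product([0, 1], repeat=4))
a = {d: n for d, n in weights.items() if d > 0}   # weight distribution a_d

def bounds(eb_n0_db, rate=4/7, d_min=3):
    es_n0 = rate * 10 ** (eb_n0_db / 10)
    lower = 0.5 * erfc(sqrt(d_min * es_n0))
    upper = 0.5 * sum(a_d * erfc(sqrt(d * es_n0)) for d, a_d in a.items())
    return lower, upper

for snr_db in (2, 4, 6, 8):
    lo, up = bounds(snr_db)
    print(f"Eb/N0 = {snr_db} dB:  {lo:.2e} <= P_w <= {up:.2e}")
```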
43
Channel coding
Coding gain
Definition (coding gain)
The coding gain is the difference (in decibels) between the signal-to-noise ratio for
uncoded transmission and for coded transmission, respectively, given the same bit
error probability (e.g., 10^−4)
The asymptotic coding gain (asymptotic means: for large SNR) for the
AWGN channel model and soft-decision decoding is given by:
G_asy := 10 · log10(d_min · R) dB
With repetition codes, no coding gain is possible on the AWGN channel model:
d_min · R = n · 1/n = 1 and a_{d_min} = 1
A large minimum distance d_min at a high code rate R is desirable, as well as a
small number of code words with Hamming weight d_min
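A two-line Python check of the asymptotic coding gain formula (values computed, not taken from the slides):

```python
# Asymptotic coding gain G_asy = 10*log10(d_min * R) for soft-decision decoding.
from math import log10
codes = {"(7,4,3) Hamming": (3, 4/7), "(3,1,3) repetition": (3, 1/3)}
for name, (d_min, rate) in codes.items():
    print(name, f"{10 * log10(d_min * rate):+.2f} dB")
# Hamming: about +2.3 dB; repetition: 0 dB (no gain), as stated above.
```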
44
Channel coding
[Figure: bit error probability versus E_b/N_0 in dB for the (7,4) Hamming code on the
AWGN channel, comparing uncoded transmission, hard-decision ML decoding, and
soft-decision ML decoding; the annotated gap amounts to 1.8 dB.]
45
Channel coding
Matrix description of block codes
Theorem
To each (n, k, d_min)_q linear block code C a generator (k × n) matrix G exists so
that the q^k code words can be written as follows:
x := u · G,
where u = [u_0, u_1, ..., u_{k−1}] is a (1 × k) info word and x = [x_0, x_1, ..., x_{n−1}] is the
corresponding (1 × n) code word.
For systematic linear block codes the generator matrix can be written as
G = [I_k | P]
where I_k is the (k × k) identity matrix and P a (k × (n − k)) matrix which
represents the parity check symbols
46
Channel coding
Theorem (parity check matrix)
An ((n − k) × n) matrix H exists such that
x · H^T = 0
if x is a code word of C
The all-zero word on the right-hand side consists of (n − k) elements (zeros)
The matrix H is called parity check matrix
For systematic linear block codes the parity check matrix can be written as
H = [P^T | I_{n−k}]
where I_{n−k} is the ((n − k) × (n − k)) identity matrix and P^T an ((n − k) × k)
matrix representing the parity check symbols
47
Channel coding
Definition
The syndrome s of an (n, k, d_min)_q block code is defined as follows:
s := y · H^T
The n − k components of s are zero if y is a code word
Note that s = y · H^T = (x ⊕ e) · H^T = x · H^T ⊕ e · H^T = e · H^T
Syndrome decoding
Construct a syndrome table (with q^(n−k) rows):
ê = argmin_{e: e·H^T = s} w_H(e)
For each received word y compute the syndrome s = y · H^T
Search the syndrome s in the table and hence obtain ê
Compute x̂ = y − ê
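A Python sketch of syndrome decoding for the (7, 4, 3)_2 Hamming code; the parity check matrix H = [P^T | I_3] below is derived from the systematic generator matrix used in the earlier example (the sketch itself is not from the slides):

```python
# Syndrome decoding of the (7,4,3)_2 Hamming code.
import itertools

H = [[0,1,1,1, 1,0,0],
     [1,0,1,1, 0,1,0],
     [1,1,0,1, 0,0,1]]

def syndrome(v):
    return tuple(sum(v[j] * H[i][j] for j in range(7)) % 2 for i in range(3))

# Build the syndrome table: for each syndrome the minimum-weight error word.
table = {}
for e in itertools.product([0, 1], repeat=7):
    s = syndrome(e)
    if s not in table or sum(e) < sum(table[s]):
        table[s] = e

y = (0, 0, 0, 0, 1, 1, 0)                   # received word from the earlier example
e_hat = table[syndrome(y)]                  # estimated error word
x_hat = tuple((yi + ei) % 2 for yi, ei in zip(y, e_hat))
print(e_hat, x_hat)                         # -> (0,0,1,0,0,0,0) (0,0,1,0,1,1,0)
```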
48
Channel coding
Cyclic block codes
Definition
A linear (n, k, d_min)_q block code C is called cyclic if each cyclic shift of a code word
yields a valid code word:
[x_0, x_1, ..., x_{n−1}] ∈ C ⇒ [x_{n−1}, x_0, ..., x_{n−2}] ∈ C
Remark
The (7, 4, 3)_2 Hamming code given above is not a cyclic block code:
[0001111] is a code word, but the word [1111000] obtained by cyclically shifting the
code word [0001111] four times is not a code word.
However, by addition of the rows and columns of the generator matrix one can always
obtain a cyclic Hamming code.
It is not important whether we shift to the left-hand side or to the right-hand side.
49
Channel coding
Info polynomial and code polynomial
Each code word of an arbitrary block code C can be described by a code
polynomial of degree n − 1:
x(D) := x_0 + x_1 D + ... + x_{n−2} D^{n−2} + x_{n−1} D^{n−1}
where [x_0, ..., x_{n−1}] represent the n code symbols,
x_i ∈ {0, ..., q − 1}, i = 0, ..., n − 1
Example
The code word [110100] (n = 6) can be represented by the code polynomial
x(D) = 1 + D + D^3 of degree 3
Accordingly, we define an info polynomial of degree k − 1:
u(D) := u_0 + u_1 D + ... + u_{k−1} D^{k−1}
where [u_0, ..., u_{k−1}] represent the k info symbols,
u_i ∈ {0, 1, ..., q − 1}, i = 0, ..., k − 1
50
Channel coding
Generator polynomial
Theorem
Let u(D) be an info polynomial of degree k − 1 and g(D) be a so-called
generator polynomial of degree n − k:
g(D) := 1 + g_1 D + ... + g_{n−k−1} D^{n−k−1} + 1 · D^{n−k}
where g_i ∈ {0, ..., q − 1}, i = 1, ..., n − k − 1
The product u(D) · g(D) is a polynomial of degree n − 1 and corresponds to a
code word of a linear (n, k, d_min)_q block code C:
x(D) = u(D) · g(D)
The code C is not necessarily cyclic
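A minimal Python sketch of this polynomial encoding over GF(2) (not from the slides; the generator g(D) = 1 + D + D^3 is an assumed example, chosen because it generates a cyclic (7, 4, 3)_2 Hamming code):

```python
# Non-systematic encoding x(D) = u(D) * g(D) over GF(2).
# Polynomials are lists of coefficients, lowest degree first.
def gf2_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out

g = [1, 1, 0, 1]          # g(D) = 1 + D + D^3 (assumed example generator)
u = [1, 0, 0, 1]          # u(D) = 1 + D^3
x = gf2_mul(u, g)         # code polynomial of degree <= n - 1 = 6
print(x)                  # -> [1, 1, 0, 0, 1, 0, 1]
```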
51
Channel coding
Parity check polynomial
Theorem
Let g(D) be the generator polynomial of degree n − k of a linear (n, k, d_min)_q block
code C. Then
C is cyclic ⇔ g(D) is a divisor of D^n − 1
Therefore a polynomial h(D) := h_0 + h_1 D + ... + h_{k−1} D^{k−1} + 1 · D^k of degree k
exists such that g(D) · h(D) = D^n − 1, where
h_i ∈ {0, ..., q − 1}, i = 0, ..., k − 1
The polynomial h(D) is called parity check polynomial
52
Channel coding
Golay code
An example of a linear, cyclic block code is the (23, 12, 7)_2 Golay code
Its generator polynomial is
g(D) = D^11 + D^9 + D^7 + D^6 + D^5 + D + 1
The corresponding parity check polynomial is
h(D) = D^12 + D^10 + D^7 + D^4 + D^3 + D^2 + D + 1
Proof:
(D^23 + 1) : (D^11 + D^9 + D^7 + D^6 + D^5 + D + 1) = D^12 + D^10 + D^7 + D^4 + D^3 + D^2 + D + 1
Remarks
1. For binary codes (q = 2), D^n − 1 = D^n + 1
2. The (23, 12, 7)_2 Golay code is a perfect code
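The stated factorization can be verified by polynomial division over GF(2); a short Python sketch (not from the slides):

```python
# Check that g(D) divides D^23 + 1 and that the quotient is h(D)
# (coefficients lowest degree first).
def gf2_divmod(num, den):
    num = num[:]                       # copy; highest index = highest degree
    q = [0] * (len(num) - len(den) + 1)
    for shift in range(len(q) - 1, -1, -1):
        if num[shift + len(den) - 1]:
            q[shift] = 1
            for j, dj in enumerate(den):
                num[shift + j] ^= dj
    return q, num                      # quotient, remainder

g = [1,1,0,0,0,1,1,1,0,1,0,1]          # D^11+D^9+D^7+D^6+D^5+D+1
dividend = [1] + [0]*22 + [1]          # D^23 + 1
h, rem = gf2_divmod(dividend, g)
print(any(rem))   # False -> g(D) divides D^23 + 1
print(h)          # [1,1,1,1,1,0,0,1,0,0,1,0,1] = D^12+D^10+D^7+D^4+D^3+D^2+D+1
```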
53
Channel coding
Definition of CRC code: A cyclic (2^r − 1, 2^r − r − 2, 4)_2 code is called cyclic
redundancy check (CRC) code if the generator polynomial is of the form
g(D) := (1 + D) · p(D)
where p(D) is a primitive polynomial of degree r ≥ 3
Examples of primitive polynomials:

Degree   primitive polynomial p(D)
1        D + 1
2        D^2 + D + 1
3        D^3 + D + 1
4        D^4 + D + 1
5        D^5 + D^2 + 1
6        D^6 + D + 1
7        D^7 + D + 1
8        D^8 + D^6 + D^5 + D^4 + 1
54
Channel coding
CCITT has standardized the following CRC codes, among others, for applications
in the open systems interconnection (OSI) data security layer
D^16 + D^12 + D^5 + 1 = (D + 1)(D^15 + D^14 + D^13 + D^12 + D^4 + D^3 + D^2 + D + 1)
D^12 + D^11 + D^3 + D^2 + D + 1 = (D + 1)(D^11 + D^2 + 1)
D^8 + D^2 + D + 1 = (D + 1)(D^7 + D^6 + D^5 + D^4 + D^3 + D^2 + 1)
55
Channel coding
Circuit for the generation of systematic, cyclic block codes with
generator polynomial g(D)
[Circuit diagram: a feedback shift register with n − k delay elements D and
multiplier taps g_1, g_2, ..., g_{n−k−1}; the info polynomial u(D) is fed in and the code
polynomial x(D) is read out.]
56
RS Codes and BCH Codes
Reed-Solomon (RS) codes and Bose-Chaudhuri-Hocquenghem (BCH) codes were
invented around 1960
They are among the most powerful block codes
RS codes and BCH codes can be designed analytically
The minimum distance is a design parameter of RS codes
BCH codes can be interpreted as binary RS codes
BCH codes are more suitable for the correction of single errors
RS codes are more suitable for the correction of burst errors
57
Channel coding
Reed-Solomon codes
RS codes are characterized by the following parameters:
n = p^m − 1
n − k = d_min − 1 = 2t
d_min = 2t + 1
q = p^m
where m and t are arbitrary positive integers and p is a prime number
Often, p = 2 and m = 8 are chosen (q = 256), i.e., one symbol corresponds to one byte
Examples
1. (255, 127, 129)_256 RS code
This R ≈ 1/2 code can correct 64 byte errors for sure
The number of code words is q^k ≈ 10^308
58
Channel coding
2. (255, 239, 17)_256 RS code
This R ≈ 0.94 code can correct 8 byte errors for sure
This code is often used as an outer code in concatenated coding systems
59
Block Codes
Bose-Chaudhuri-Hocquenghem Codes
BCH codes are characterized by the following parameters:
n = 2^m − 1
n − k ≤ m · t
d_min ≥ 2t + 1
q = 2,
where m and t are positive integers
65
Channel coding
State diagram of a binary, non-recursive R = 1/2 convolutional
encoder with 4 states
[State diagram with the four states (u_{k−2}, u_{k−1}) ∈ {00, 01, 11, 10}; transitions for
the info bit u_k = 0 and u_k = 1 are drawn with different line styles and are labeled
with the code bits x_{1,k} x_{2,k}.]
66
Channel coding
Trellis segment of a binary, non-recursive R = 1/2 convolutional
encoder with 4 states
[Trellis segment: previous state (u_{k−2}, u_{k−1}) on the left, consecutive state
(u_{k−1}, u_k) on the right; each branch corresponds to an info bit u_k = 0 or u_k = 1
and is labeled with the code bits.]
67
Channel coding
Trellis diagram of a binary, non-recursive R = 1/2 convolutional
encoder with 4 states
[Trellis diagram for k = 0, 1, ..., 8 with the states 00, 01, 10, 11; the branches are
labeled with the code bits; one line style marks u_k = 0, the other u_k = 1.]
68
Channel coding
Terminated trellis diagram of a binary, non-recursive R = 7/8 · 1/2
convolutional encoder with 4 states
[Terminated trellis diagram: the trellis starts in state A and is driven back into the
final state Z by tail bits; branches are labeled with the code bits, one line style for
u_k = 0 and one for u_k = 1.]
69
Channel coding
Decoding of convolutional codes
A trellis diagram is a so-called directed graph
The optimal decoder in the sense of maximum-likelihood decoding searches the
most probable sequence among all possible sequences within the trellis
Hard decision decoding
The path with the smallest Hamming distance with respect to the received sequence
is selected
Soft decision decoding
The path with the smallest squared Euclidean distance with respect to the received
sequence is selected
This is an optimization problem: which path among all possible paths in the trellis
is the best, where the path costs correspond to Hamming or squared Euclidean
distances
⇒ Viterbi algorithm
70
Channel coding
Viterbi algorithm
1. Initialization
Initialize all path metrics Γ_{−1}, where Γ_{−1} = 0 for the initial state and
Γ_{−1} = ∞ for all other states
2. Computation of the branch metrics
Compute the branch metrics γ_k^(j), 0 ≤ k ≤ K − 1, for all paths j
Example: the squared Euclidean branch metric
γ_k^(j) = Σ_i (y_{i,k} − x_{i,k}^(j))^2,   0 ≤ k ≤ K − 1
where x_{i,k}^(j) are the encoded bits corresponding to the j-th path and y_{i,k} ∈ {0, 1}
(hard decision) respectively y_{i,k} ∈ IR (soft decision) are the received values
71
Channel coding
3. Add-compare-select operation
Add branch metrics for the two paths j' and j'' entering a state:
Γ_k^(j') = Γ_{k−1}^(j') + γ_k^(j')   and   Γ_k^(j'') = Γ_{k−1}^(j'') + γ_k^(j'')
Compare the path metrics Γ_k^(j') and Γ_k^(j'')
Select the best path: only the better of the two paths j' or j'' remains (survivor)
4. Back search
If in a terminated trellis diagram the final state is reached, only the ML path
remains. This path is traced backward (back search or trace-back) in order
to obtain the decoded info bits û_k
Channel coding
Theorem
For the AWGN channel model the squared Euclidean branch metric γ_k is optimal
Proof
For the AWGN channel model y_n = x_n + n_n, 0 ≤ n ≤ N − 1, where
p_{Y|X}(y_n|x_n) = 1/√(2πσ^2) · exp(−(y_n − x_n)^2/(2σ^2)),   σ^2 = 1/(2 E_s/N_0)
Therefore
p_{Y|X}(y|x) = 1/(2πσ^2)^{N/2} · Π_{n=0}^{N−1} exp(−(y_n − x_n)^2/(2σ^2))
Hence
−ln(p_{Y|X}(y|x)) = (N/2) · ln(2πσ^2) + Σ_{n=0}^{N−1} (y_n − x_n)^2/(2σ^2)
                  = a + b · Σ_{n=0}^{N−1} (y_n − x_n)^2
where a and b > 0 are constant factors
73
Channel coding
Viterbi algorithm
The optimal sequence in the sense of maximum-likelihood sequence estimation
(MLSE) is
û_MLSE = arg max_{x̃} p_{Y|X̃}(y|x̃)
       = arg max_{x̃} ln(p_{Y|X̃}(y|x̃))
       = arg min_{x̃} [ −ln(p_{Y|X̃}(y|x̃)) ]
       = arg min_{x̃} [ a + b · Σ_{n=0}^{N−1} (y_n − x̃_n)^2 ]        (summand =: λ_n)
       = arg min_{x̃} Σ_{n=0}^{N−1} λ_n
       = arg min_{x̃} Σ_{k=0}^{K−1} Σ_i (y_{i,k} − x̃_{i,k})^2        (inner sum =: γ_k)
       = arg min_{x̃} Σ_{k=0}^{K−1} γ_k
Therefore, γ_k = Σ_i (y_{i,k} − x̃_{i,k})^2 is the wanted branch metric
74
Channel coding
Distance properties
Convolutional codes are linear, since the code bits are obtained by a linear
operation from the info bits. Due to the linearity, without loss of generality the
all-zeros sequence can be assumed to be the transmitted sequence
Definition of an error path
An error path is a path which deviates from the all-zeros sequence for the first time
in the k_1-th trellis segment and merges with the all-zeros sequence again in the
k_2-th trellis segment, where k_2 > k_1
The error correction capability of convolutional codes is related to the
Hamming weight of the error paths, not to the lengths of the error paths
Definition of the free distance
The free distance d_free is equal to the minimum Hamming weight of all error paths
75
Channel coding
Rate-1/2 convolutional codes with maximum free distance
ν   Taps (octal)   d_free   Applications
2   5, 7            5
3   15, 17          6
4   23, 35          7       GSM
5   53, 75          8
6   133, 171       10       DAB, DVB, Satcom
Optimization of convolutional codes is done by computer search
76
Channel coding
Rate-1/3 convolutional codes with maximum free distance
ν   Taps (octal)     d_free   Applications
2   5, 7, 7           8
3   13, 15, 17       10
4   25, 33, 37       12
5   47, 53, 75       13
6   133, 145, 171    14
77
Channel coding
Distance spectrum
The number of error paths with Hamming distance d, d ≥ d_free, with respect to the
all-zeros sequence is denoted as a_d
The corresponding sum of info bits which are equal to one is denoted as c_d
The list of values a_d versus d and c_d versus d is called distance spectrum
Example: rate-1/2 convolutional code with ν = 2
d   a_d   c_d
5    1      1
6    2      4
7    4     12
8    8     32
9   16     80
For d ≥ d_free: a_d = 2^(d − d_free) and c_d = a_d · (d − d_free + 1)
78
Channel coding
Bit error probability for ML decoding
For binary antipodal transmission via an AWGN channel, the bit error probability
for soft-decision ML decoding (or MAP decoding without a priori information) can
be lower and upper bounded as follows:
(1/2) erfc(√(d_free · R · E_b/N_0)) ≤ P_b ≤ (1/2) Σ_{d=d_free}^{∞} c_d · erfc(√(d · R · E_b/N_0))
where R · E_b = E_s
The lower bound takes only the error event with the smallest Hamming distance into
account, whereas the upper bound takes all error events into account (union bound)
For large E_b/N_0 both curves merge
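A Python sketch (not from the slides) evaluating these bounds for the ν = 2, R = 1/2 code, using the distance spectrum formula c_d = 2^(d − d_free)·(d − d_free + 1) stated above; the infinite sum is truncated, which is adequate for the moderate SNR values shown:

```python
# Truncated union bound on the bit error probability of the R = 1/2,
# nu = 2 convolutional code (d_free = 5).
from math import erfc, sqrt

R, D_FREE = 0.5, 5

def c_d(d):
    return 2 ** (d - D_FREE) * (d - D_FREE + 1)

def pb_bounds(eb_n0_db, d_max=30):
    eb_n0 = 10 ** (eb_n0_db / 10)
    lower = 0.5 * erfc(sqrt(D_FREE * R * eb_n0))
    upper = 0.5 * sum(c_d(d) * erfc(sqrt(d * R * eb_n0))
                      for d in range(D_FREE, d_max + 1))   # truncated union bound
    return lower, upper

for snr_db in (4, 6, 8):
    lo, up = pb_bounds(snr_db)
    print(f"Eb/N0 = {snr_db} dB:  {lo:.1e} <= P_b <= {up:.1e}")
```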
79
Channel coding
Bounds on the bit error probability for the rate-1/2 convolutional
code with 64 states
[Figure: bit error probability versus E_b/N_0 in dB for the rate-1/2 convolutional
code with ν = 6 (64 states) on the AWGN channel with Viterbi decoding; curves:
uncoded system, upper bound, lower bound; asymptotic coding gain: 7 dB.]
80
Channel coding
Definition of coding gain for convolutional codes
The coding gain is the difference in decibels between the E_b/N_0 necessary for
uncoded transmission and the E_b/N_0 for coded transmission in order to obtain the
same bit error probability
For convolutional codes, the AWGN channel, and ML decoding, the asymptotic
coding gain (i.e., for E_b/N_0 → ∞) is
G_asy = 10 · log10(d_free · R) dB
Examples
The asymptotic coding gain for R = 1/2 convolutional codes with maximum free
distance is
4 dB for ν = 2
5.44 dB for ν = 4
7 dB for ν = 6
81
Channel coding
Bounds on the bit error probability for the rate-1/2 convolutional
codes with memory lengths ν = 2, ..., 6
[Figure: bit error probability bounds versus E_b/N_0 in dB for R = 1/2 convolutional
codes on the AWGN channel with ML decoding; curves: uncoded, ν = 2, ν = 4, ν = 6.]
82
Channel coding
Bounds on the bit error probability for the rate-1/3 convolutional
codes with memory lengths ν = 2, ..., 6
[Figure: bit error probability bounds versus E_b/N_0 in dB for R = 1/3 convolutional
codes on the AWGN channel with ML decoding; curves: uncoded, ν = 2, ν = 4, ν = 6.]
83
Channel coding
Polynomial representation of convolutional codes
Definition of generator polynomials
Non-recursive R = 1/n convolutional encoders can be described by the generator
polynomials
g_i(D) = Σ_{j=0}^{ν} g_{i,j} D^j,   i = 1, ..., n
where g_{i,j} = 0 if the corresponding modulo-2 addition does not exist and g_{i,j} = 1 if
the corresponding modulo-2 addition exists
Accordingly, the memory length is
ν = max_{1≤i≤n} deg g_i(D)
Definition of info polynomial
The info polynomial (which may be of infinite length) is defined as follows:
u(D) = Σ_{k=0}^{∞} u_k D^k,   u_k ∈ {0, 1}
84
Channel coding
where u_0, u_1, ... are the info bits
Encoding corresponds to a multiplication of the polynomials u(D) and g_i(D):
x_i(D) = u(D) · g_i(D)   for 1 ≤ i ≤ n
or equivalently
[x_1(D), ..., x_n(D)] = u(D) · [g_1(D), ..., g_n(D)] = u(D) · G(D)
Definition of generator matrix
G(D) := [g_1(D), ..., g_n(D)] is often dubbed generator matrix
The set of all code words can be written as
C = { u(D) · G(D) | u(D) = Σ_k u_k D^k, u_k ∈ {0, 1} }
Example: state diagram and modified state diagram of the encoder with 2^ν = 4 states
(generator matrix G(D) = [1 + D^2, 1 + D + D^2])
[Figure: state diagram and modified state diagram; the states are 00, 01, 11, 10, the
branches are labeled with the code bits, one line style for u_k = 0 and one for u_k = 1.]
Computation of the distance spectrum can be done by means of the
modified state diagram
86
Channel coding
Catastrophic convolutional encoders
As an example, we consider the R = 1/2 convolutional code with generator matrix
G(D) = [1 + D, 1 + D^2], i.e., the taps (6, 5)_8
The info sequence u_0 = [0, 0, ..., 0] corresponds to the code sequence
x_0 = [0, 0, 0, 0, 0, ..., 0]
The info sequence u_1 = [1, 1, ..., 1] corresponds to the code sequence
x_1 = [1, 1, 0, 1, 0, ..., 0]
An important observation is that two info sequences with infinite Hamming
distance exist whose corresponding code sequences differ in a finite number of bits
Definition of catastrophic convolutional encoders
Convolutional encoders are called catastrophic if two info sequences with infinite
Hamming distance exist whose corresponding code sequences have a finite
Hamming distance
In the modified state diagram, catastrophic encoders are
characterized by loops without any distance gain.
87
Recursive Convolutional Encoders
For each non-recursive convolutional encoder a corresponding recursive convolutional
encoder generating a code with the same free distance d_free can be constructed.
Example: recursive, systematic R = 1/2 convolutional encoder with 4 states
[Circuit diagram: the info bit u_k is added to the fed-back value a_{k−2}, giving a_k;
two delay elements store a_{k−1} and a_{k−2}; the outputs are x_{1,k} = u_k and
x_{2,k} = a_k ⊕ a_{k−1} ⊕ a_{k−2}.]
Signal representation:
(1) x_{1,k} = u_k
(2) a_k = u_k ⊕ a_{k−2}
(3) x_{2,k} = a_k ⊕ a_{k−1} ⊕ a_{k−2}
Polynomial representation:
(1) x_1(D) = u(D), therefore g_1(D) = 1
(2) a(D) = u(D) + D^2 · a(D)
(3) x_2(D) = a(D) + D · a(D) + D^2 · a(D)
(2) yields a(D) = u(D)/(1 + D^2). Insertion into (3) yields:
x_2(D) = (1 + D + D^2)/(1 + D^2) · u(D),   i.e., g_2(D) = (1 + D + D^2)/(1 + D^2)
88
The Compact Disc
Design of an audio CD
Sampling of the audio signal
Channel coding
Interleaving
Modulation
89
The Compact Disc
The audio CD was introduced to the market in 1982 and is the first mass
product employing channel coding
The high quality is essentially based on error correction and error concealment
Data is stored in the form of
holes (pits) → scattering of the laser beam
plain surfaces (lands) → reflection of the laser beam
Spiral track of about 5 km (!) length
Length of a pit or a land, respectively, about 0.3 µm
Track width about 0.6 µm
Distance between tracks about 1 µm
Scanning speed about 1.2 m/s
The scattered or reflected laser beam is evaluated with a photo diode
90
The Compact Disc
Data rate of the channel bits: 4.3218 Mbit/s
Playtime 74 min ⇒ about 1.9 · 10^10 bits are stored (!)
A channel bit corresponds to a track length of about 0.3 µm
Scratches typically cause burst errors
Material defects typically cause single errors
Hence, the channel decoder must be able to correct single and burst errors
44.1 kHz sampling rate (20 kHz audio bandwidth)
16 bit A/D conversion ⇒ 2 · 16 · 44.1 · 10^3 bit/s = 1.4112 Mbit/s info data rate
6 samples from both stereo channels form an info word
8 bits are combined into one symbol (i.e., q = 2^8)
This results in an info word length of k = 2 · 6 · 16/8 = 24 symbols
91
The Compact Disc
Given a (255, 251, 5)_256 RS code, by means of shortening the following codes result:
(28, 24, 5)_256 RS code C_o (outer code)
(32, 28, 5)_256 RS code C_i (inner code)
(Shortening means to suppress info symbols)
Since RS codes are maximum-distance separable, the minimum distance of the
shortened codes is also d_min = 5
The total rate of the serially concatenated code is R = R_o · R_i = 24/32 = 3/4
The inner decoder is able to correct two single symbol errors (e.g., material defects)
In order to convert burst errors into single errors, a convolutional deinterleaver
(N = 28, J = 4, 8 bits/symbol) is used
The single symbol errors (after deinterleaving) are corrected by the outer decoder
Interleaver and deinterleaver store N(N − 1)J/2 = 1512 symbols (12096 bits) each
92
The Compact Disc
In order to ease clock synchronization, a so-called eight-to-fourteen
modulation (EFM) is used: between two ones, at least two and at most ten zeros
must occur
A one causes a transition from a pit to a land or vice versa
Due to three coupling bits it can be guaranteed that also in a continuous data
stream at least two and at most ten zeros occur between two ones
Additionally, 24 + 3 sync bits are inserted
Altogether, 588 channel bits per 192 info bits are generated
⇒ the effective rate is (only) R_eff = 0.3265, although the code rate is R = 3/4
In contrast to sampling, coding, modulation, and interleaving, the decoder has not
been standardized
Possible quality improvement: joint EFM demodulation and channel decoding.
93
Digital Transmission System
Transmitter: Source → Source encoder → Encryption → Channel encoder → Modulator
→ Physical channel
Receiver: Physical channel → Demodulator → Channel decoder → Decryption
→ Source decoder → Sink
(The info word u enters the channel encoder, the code word x is transmitted, the
word y is received, and the estimate û is delivered.)
94
Shannon's Information Theory
Claude E. Shannon (1948)
Source coding: Data compression
Cryptology: Data encryption
Channel coding: Error detection/correction/concealment
Separation theorem:
Source coding, encryption, and channel coding may be separated without information
loss (note that the separation theorem holds for very long data sequences only)
95
Examples for Source Coding, Encryption, and Channel
Coding
Source coding:
1. Example: characters A-Z encoded with ⌈log2(26)⌉ = 5 bits, no data compression
2. Example: characters A-Z encoded with data compression (e.g., Huffman algorithm)
No data compression: A → [00000]    With data compression: A → [11]
                     B → [00001]                            B → [001]
                     C → [00010]                            C → [0110]
                     ...                                    ...
Encryption:
Example: add a key word to each source code word modulo 2,
e.g. source code word [0110] (= C), key word [1010] (random sequence)
⇒ info word [0110] ⊕ [1010] = [1100]
Channel coding:
Example: (2,1) repetition code with code rate R = 1/2
e.g. info word [1100] → code word [11 11 00 00]
96
Examples for Source Coding Techniques
Application                                 Rate without compr.   Rate with compr.       Technique
Speech coding (8 kHz · 8 bit/sample)        64 kbit/s             13 ... 7 kbit/s        CELP
                                                                  4.8 ... 2.4 kbit/s     vocoder
Audio coding (2 · 48 kHz · 16 bit/sample)   1.536 Mbit/s          256 ... 128 kbit/s     MPEG-1
                                                                  96 kbit/s              MPEG-2 AAC
Image coding                                8 bit/pixel           0.25 ... 1.25 bit/pix. JPEG
Video coding                                625 Mbit/s (HDTV)     24 Mbit/s              MPEG-2
                                            163.9 Mbit/s (SDTV)   6 Mbit/s
Text compression                            factor 3:1 ... 10:1                          Lempel-Ziv
97
Fundamental Questions of Information Theory
Consider a discrete memoryless source. What is the minimum number of
bits/source symbol, R, after lossless source encoding?
Answer: the entropy H
On average, each symbol of a discrete-time source can be represented (and
recovered with an arbitrarily small error) by R bits/source symbol if R ≥ H,
but not if R < H.
What is the maximum number of bits/channel symbol, R, after channel encoding?
Answer: the channel capacity C
On average, R randomly generated bits/channel symbol can be transmitted via a
noisy channel with arbitrarily low error probability if R ≤ C, but not if R > C.
⇒ Symbols should not be transmitted individually. Instead, the channel encoder
should map the info bits onto the coded symbols so that each info bit influences
as many coded symbols as possible.
98
Source coding
Assume that a code word with length W_i is assigned to the source symbol q^(i).
The probability of q^(i) is p_i. The average codeword length W is given by
W(Q) = Σ_{i=1}^{L} p_i · W_i
The lowest average bit rate is achieved with the code that gives the smallest
W(Q). This lower bound is given by Shannon's first source-coding theorem.
Theorem (Shannon's source coding theorem): Given the constraint that
n → ∞, it is necessary and sufficient that lossless source encoding is done on
average with
H(Q) = − Σ_{i=1}^{L} p_Q(q^(i)) · log2 p_Q(q^(i))   bits/source symbol
99
Source coding
Optimal source coding is achieved if:
H(Q) ≤ W(Q) ≤ H(Q) + 1
Remarks:
A code is only useful in a transmission system if every message can be uniquely
decoded.
A sufficient, but not necessary, condition for a code to be uniquely decodable is
the prefix condition
The prefix condition states that for no two codewords C_i and C_j can a binary
sequence S be found such that C_i S = C_j, i.e., no codeword is the prefix of
another codeword
Example:
Codewords 1111, 1110, 110, 10, 00, 010, 0110, and 0111.
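A tiny Python check of the prefix condition for this example code (not from the slides):

```python
# Check the prefix condition: no codeword may be the prefix of another codeword.
codewords = ["1111", "1110", "110", "10", "00", "010", "0110", "0111"]

def is_prefix_free(words):
    return not any(a != b and b.startswith(a) for a in words for b in words)

print(is_prefix_free(codewords))   # -> True, the code is uniquely decodable
```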
100
Source coding
Huffman coding
The Huffman code is an optimal binary prefix-condition code. This means that
there is no other uniquely decodable binary code with a smaller average
codeword length
Huffman codes use a code tree constructed as follows:
1. The source symbols are sorted in order of decreasing probabilities
2. The two active nodes with the smallest probabilities are connected to the same
node. The upper branch of each node is assigned a 1 and the lower branch a 0
3. The resulting probability of each node is obtained by adding the two
probabilities of the two active nodes (see 2.)
4. The resulting probability is considered as a symbol probability for the next
coding step
5. The code tree is complete when the last resulting probability is equal to one
101
Source coding
Example: coding of a source with 8 symbols (A, B, C, ..., H)
[Code tree: the symbols with probabilities A: 0.34, B: 0.24, C: 0.14, D: 0.12, E: 0.07,
F: 0.05, G: 0.03, H: 0.01 are merged pairwise, starting with the two smallest
probabilities (0.01 + 0.03 = 0.04, 0.04 + 0.05 = 0.09, 0.07 + 0.09 = 0.16,
0.12 + 0.14 = 0.26, 0.16 + 0.24 = 0.4, 0.26 + 0.34 = 0.6, 0.4 + 0.6 = 1.0); the upper
branch of each node is labeled 1, the lower branch 0.]

Symbol            A     B     C     D     E     F     G      H
p_i               0.34  0.24  0.14  0.12  0.07  0.05  0.03   0.01
Codeword          11    01    101   100   000   0011  00101  00100
Codeword length   2     2     3     3     3     4     5      5
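A Python sketch (not from the slides) that builds a Huffman code for this source and computes the average codeword length and the entropy; because of arbitrary tie-breaking and branch labeling, the individual codewords may differ from the table above, but the codeword lengths and the average are equivalent:

```python
# Huffman code for the 8-symbol example, plus average length and entropy.
import heapq
from math import log2

probs = {"A": 0.34, "B": 0.24, "C": 0.14, "D": 0.12,
         "E": 0.07, "F": 0.05, "G": 0.03, "H": 0.01}

# Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
heapq.heapify(heap)
count = len(heap)
while len(heap) > 1:
    p1, _, c1 = heapq.heappop(heap)      # two smallest probabilities are merged
    p2, _, c2 = heapq.heappop(heap)
    merged = {s: "0" + w for s, w in c1.items()}
    merged.update({s: "1" + w for s, w in c2.items()})
    count += 1
    heapq.heappush(heap, (p1 + p2, count, merged))

code = heap[0][2]
avg_len = sum(probs[s] * len(w) for s, w in code.items())
entropy = -sum(p * log2(p) for p in probs.values())
print(code)
print(f"average length = {avg_len:.2f} bit, entropy = {entropy:.2f} bit")
# average length ~ 2.55 bit, entropy ~ 2.49 bit for this source
```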
102
Source coding
Calculate the average codeword length W
Calculate the entropy of the source
Is the redundancy of the source completely removed?
Remarks
Huffman coding is often used in data compression
Huffman encoding and decoding are usually done by using lookup tables
For larger sequences of symbols these tables become prohibitively large
Another disadvantage of Huffman coding is that its performance is sensitive to
changes in the signal statistics
If the statistics change and the code is not adapted, the bit rate will increase and may
even exceed log2(L) bits per symbol
103
Source coding
Runlength coding
Runlength coding is useful if long subsequences, or runs, of the same symbol
occur
This is the case, for instance, if the probability density function of the input of a
quantizer shows a sharp peak at zero
Long sequences of zeros can then be expected at the output of the quantizer
The idea of runlength coding is to detect runs of the same symbol and to assign
to each run one codeword that indicates its length
An extensive statistical analysis of runlength coding is difficult
Runlength coding is not used exclusively for sequences of independent symbols;
it can also give good results if the symbols in a sequence are dependent
104
Source coding
Example of runlength coding: Runlength coding is used, for example, in
JPEG for encoding the AC coefficients. After quantization, many symbols in
the sequence of AC coefficients are equal to zero. Towards the end of the
sequence of AC coefficients the number of zeros is large.
For the runlength coding, the following steps are considered (a small sketch follows
after the list):
1. The number of zeros between two non-zero AC coefficients (also called run
length) is transmitted
2. In JPEG the runlength is between 0 and 15
3. Non-zero AC coefficients are divided into categories
4. From the categories of the AC coefficients and the runlengths, new data symbols
are built
105
Source coding
5. Two important symbols are
ZRL: denotes a runlength of 15 followed by a 0 symbol
EOB: denotes the end of block, which follows the last non-zero AC coefficient. All
zeros after the EOB symbol are not transmitted
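A much simplified Python sketch in the spirit of this JPEG-style AC coding (not the actual JPEG procedure: categories, ZRL handling and the entropy coding stage are omitted):

```python
# Simplified run-length encoding: each non-zero coefficient becomes a
# (run-of-zeros, value) pair, trailing zeros are replaced by "EOB".
def run_length_encode(ac_coeffs):
    symbols, run = [], 0
    for c in ac_coeffs:
        if c == 0:
            run += 1
        else:
            symbols.append((run, c))
            run = 0
    symbols.append("EOB")          # zeros after the last non-zero value
    return symbols

print(run_length_encode([5, 0, 0, -3, 0, 0, 0, 2, 0, 0, 0, 0]))
# -> [(0, 5), (2, -3), (3, 2), 'EOB']
```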
106
Source coding
Lempel-Ziv coding
Lempel-Ziv coding is a universal coding scheme
This means that it adapts to the signal statistics and therefore can be used
without measuring statistics and designing codes according to these in advance
It is suitable for sources producing independent symbols as well as for sources
that produce dependent symbols.
It is often employed in data compression algorithms used in computers to store
data on a disk.
Many variations of the Lempel-Ziv algorithm exist
In the following, the Lempel-Ziv (LZ78) algorithm is described:
1. The source sequence is divided into subsequences which are as short as
possible and which have not occurred before
107
Source coding
Three parameters are introduced:
m: the total number of subsequences
Suffix: the last symbol of each subsequence
Prefix: the remaining symbols of each subsequence
2. The position of each prefix is encoded
⌈log2 m⌉ bits are needed in order to encode the position and 1 bit for the suffix,
i.e., m · (1 + ⌈log2 m⌉) bits are needed in total for a source sequence
Example: Let a = [10110100010] be a source sequence of binary symbols
How many bits do we need to encode one position?
What is the encoded sequence?
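A Python sketch of the LZ78 parsing step for this example sequence (not from the slides); each subsequence is represented by the position of its prefix and its suffix bit, as described above:

```python
# LZ78-style parsing: split the sequence into the shortest subsequences
# that have not occurred before; encode each as (prefix position, suffix).
def lz78_parse(bits):
    phrases, output = {"": 0}, []
    current = ""
    for b in bits:
        if current + b in phrases:
            current += b
        else:
            output.append((phrases[current], b))   # (prefix position, suffix)
            phrases[current + b] = len(phrases)    # register the new phrase
            current = ""
    if current:                                    # incomplete last phrase
        output.append((phrases[current[:-1]], current[-1]))
    return output

print(lz78_parse("10110100010"))
# -> [(0,'1'), (0,'0'), (1,'1'), (2,'1'), (2,'0'), (4,'0')], i.e. m = 6 phrases
```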