Information Theory & Coding: Understand
$$H = \sum_{k=0}^{K-1} p_k \log_2\left(\frac{1}{p_k}\right)$$
Entropy (example)
Entropy of a Binary Memoryless Source
A binary memoryless source has symbols 0 and 1, which have probabilities p0 and p1 = 1 - p0.
Compute the entropy as a function of p0.
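A small sketch of the requested function, $H(p_0) = -p_0 \log_2 p_0 - (1 - p_0)\log_2(1 - p_0)$, using the usual convention that a zero-probability symbol contributes nothing. The function name `binary_entropy` is our own, chosen for illustration.

```python
import math


def binary_entropy(p0: float) -> float:
    """Entropy (bits/symbol) of a binary memoryless source with P(0) = p0, P(1) = 1 - p0."""
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must lie in [0, 1]")
    h = 0.0
    for p in (p0, 1.0 - p0):
        if p > 0.0:  # convention: 0 * log2(1/0) = 0
            h += p * math.log2(1.0 / p)
    return h


for p0 in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p0 = {p0:4.2f}   H = {binary_entropy(p0):.4f} bit")
```

The curve is symmetric about p0 = 1/2, where it reaches its maximum of 1 bit, and it falls to 0 when one of the two symbols is certain.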
Source Coding Theorem
Source coding means an efficient representation of the data generated by a discrete source.
The representation is produced by the source encoder.
The statistics of the source must be known (e.g., if coding priorities exist).
Data Compaction
Data compaction (a.k.a. lossless data compression) means that we remove redundant information from the signal prior to transmission.
Basically, this is achieved by assigning short descriptions to the most frequent outcomes of the source output and longer descriptions to the less frequent ones.
Source-coding schemes that are used in data compaction are, e.g., prefix coding, Huffman coding, and Lempel-Ziv coding.
Data Compaction example
(Huffman Coding)
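The example figure is not reproduced here, so the sketch below builds a binary Huffman code for a hypothetical five-symbol source (probabilities 0.4, 0.2, 0.2, 0.1, 0.1, chosen only for illustration) by repeatedly merging the two least probable groups. Ties may be broken differently from a hand-drawn tree, so individual codewords can differ while the average length stays the same.

```python
import heapq
import math
from itertools import count


def huffman_code(probs: dict) -> dict:
    """Return a binary Huffman code {symbol: codeword} for {symbol: probability}."""
    if len(probs) == 1:
        return {sym: "0" for sym in probs}
    tiebreak = count()  # keeps heapq from ever comparing the dict payloads
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_a, _, group_a = heapq.heappop(heap)  # least probable group
        p_b, _, group_b = heapq.heappop(heap)  # second least probable group
        merged = {sym: "0" + code for sym, code in group_a.items()}
        merged.update({sym: "1" + code for sym, code in group_b.items()})
        heapq.heappush(heap, (p_a + p_b, next(tiebreak), merged))
    return heap[0][2]


source = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}  # hypothetical source
code = huffman_code(source)
avg_len = sum(p * len(code[s]) for s, p in source.items())
entropy = sum(p * math.log2(1 / p) for p in source.values())
for s in sorted(code):
    print(s, source[s], code[s])
print(f"average length = {avg_len:.2f} bits, entropy = {entropy:.4f} bits")
```

The frequent symbol s0 gets a short codeword and the rare symbols s3, s4 get longer ones, which is exactly the compaction principle described above; the average length stays just above the entropy.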
Channel Capacity
The capacity of a channel is defined as the intrinsic ability of the channel to convey information.
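Channel Coding Theorem
Channel coding consists of mapping the incoming data sequence into a channel input sequence and, via the inverse mapping, mapping the channel output sequence back into an output data sequence.
The mapping operations are performed by the channel encoder, and the inverse mapping by the channel decoder.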
[Block diagram: Source → Channel encoder → Channel → Channel decoder → Destination]
Channel Coding Theorem
For reliable communication, channel encoding and decoding are needed.
Is there a coding scheme that makes the error probability as small as desired while being efficient enough that the code rate is not too small?
=> Shannon's second theorem (noisy channel coding theorem)
Let a DMS with alphabet X have entropy H(X) and produce symbols once every $T_s$ seconds, and let a DMC have capacity C and be used once every $T_c$ seconds. Then,
i) if $\frac{H(X)}{T_s} \le \frac{C}{T_c}$, there exists a coding scheme with which the source output can be transmitted over the channel with an arbitrarily small probability of error;
ii) if $\frac{H(X)}{T_s} > \frac{C}{T_c}$, it is not possible to transmit with arbitrarily small error.
Shannon's Channel Capacity Theorem / Information Capacity Theorem
For band-limited, power-limited Gaussian channels:
$$C = B \log_2\left(1 + \frac{P}{N_0 B}\right) \quad \text{(bits/s)}$$
This is the capacity of a channel of bandwidth B, perturbed by additive white Gaussian noise of power spectral density $N_0/2$ and limited in bandwidth to B; P is the average transmitted power, and $N = N_0 B$ is the noise power.
- It is not possible to transmit reliably at a rate higher than C by any means.
- It does not say how to find the coding and modulation that achieve the maximum capacity, but it indicates that, to approach this limit, the transmitted signal should have statistical properties approximately like those of Gaussian noise.
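To see how the formula trades bandwidth against SNR, the sketch below evaluates C for an assumed P/N0 = 10^4 Hz (a hypothetical value) over a range of bandwidths. The last line prints the well-known limit $C \to P/(N_0 \ln 2) \approx 1.44\,P/N_0$ as $B \to \infty$, the infinite-bandwidth case listed among the implications below.

```python
import math


def awgn_capacity(bandwidth_hz: float, power_w: float, n0_w_per_hz: float) -> float:
    """Shannon capacity C = B log2(1 + P / (N0 B)) in bit/s."""
    snr = power_w / (n0_w_per_hz * bandwidth_hz)  # P / N with N = N0 * B
    return bandwidth_hz * math.log2(1.0 + snr)


P, N0 = 1.0, 1e-4  # hypothetical values, so P / N0 = 10^4 Hz
for B in (1e2, 1e3, 1e4, 1e5, 1e6, 1e7):
    print(f"B = {B:>12,.0f} Hz   C = {awgn_capacity(B, P, N0):>10,.1f} bit/s")

# Infinite-bandwidth limit: C -> P / (N0 ln 2), about 1.44 P / N0
print(f"limit B -> inf: {P / (N0 * math.log(2)):,.1f} bit/s")
```

Capacity keeps growing with B, but ever more slowly, because the SNR P/(N0 B) shrinks as the noise bandwidth grows.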
Implications of the Information
Capacity Theorem
Set of M-ary examples
Channel capacity for infinite bandwidth
Channel capacity for noiseless channel
Shannon-Fano Coding Technique
Algorithm:
Step 1: Arrange all messages in descending order of probability.
Step 2: Divide the sequence into two groups in such a way that the sums of the probabilities in the two groups are (as nearly as possible) equal.
Step 3: Assign 0 to the upper group and 1 to the lower group.
Step 4: Repeat Steps 2 and 3 for group 1, group 2, and so on, until each group contains a single message.
Example
Message Mi   Probability Pi   Code (from the coding procedure)   No. of bits
M1           1/2              0                                  1
M2           1/8              100                                3
M3           1/8              101                                3
M4           1/16             1100                               4
M5           1/16             1101                               4
M6           1/16             1110                               4
M7           1/32             11110                              5
M8           1/32             11111                              5
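The four steps can be sketched as a short recursive function. Where several cut points balance the two groups equally well, this version takes the first one; that tie-breaking rule is our assumption, and it happens to reproduce the table above. Exact fractions avoid rounding in the balance test.

```python
from fractions import Fraction


def shannon_fano(symbols):
    """symbols: list of (name, probability). Returns {name: codeword}."""
    codes = {name: "" for name, _ in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(p for _, p in group)
        # Step 2: choose the cut that makes the two group sums as equal as possible
        cut = min(range(1, len(group)),
                  key=lambda i: abs(2 * sum(p for _, p in group[:i]) - total))
        upper, lower = group[:cut], group[cut:]
        for name, _ in upper:  # Step 3: 0 to the upper group ...
            codes[name] += "0"
        for name, _ in lower:  # ... and 1 to the lower group
            codes[name] += "1"
        split(upper)           # Step 4: repeat within each group
        split(lower)

    # Step 1: descending order of probability (stable for equal probabilities)
    split(sorted(symbols, key=lambda s: s[1], reverse=True))
    return codes


F = Fraction
messages = [("M1", F(1, 2)), ("M2", F(1, 8)), ("M3", F(1, 8)), ("M4", F(1, 16)),
            ("M5", F(1, 16)), ("M6", F(1, 16)), ("M7", F(1, 32)), ("M8", F(1, 32))]
codes = shannon_fano(messages)
avg_len = sum(p * len(codes[m]) for m, p in messages)
for m, p in messages:
    print(m, p, codes[m], len(codes[m]))
print("average length:", avg_len, "bits/message")
```

Because every probability here is a power of 1/2, the average length 37/16 = 2.3125 bits/message equals the source entropy, so this particular code is 100% efficient.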
Dr. Muqaibel \ EE430 Convolutional Codes
Convolutional Codes
Basic Definitions
k = 1, n = 2: a (2,1) rate-1/2 convolutional code
Two-stage register (M = 2)
Each input bit influences the output for 3 intervals (K = 3)
K = constraint length of the code = M + 1
Generator Polynomial
A convolutional code may be defined by a set of n generating polynomials for each input bit.
For the circuit under consideration:
$$g_1(D) = 1 + D + D^2$$
$$g_2(D) = 1 + D^2$$
The set $\{g_i(D)\}$ defines the code completely. The length of the shift register is equal to the degree of the highest-degree generator polynomial.
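The two generator polynomials translate directly into a shift-register encoder. In the sketch below each generator is a tuple of taps on $(u_t, u_{t-1}, u_{t-2})$; the function name `conv_encode` and this tap ordering are our own conventions. Run on the input of the encoding example later in these notes (010111001010001, starting from the all-zero state), it prints 00 11 10 00 01 10 01 11 11 10 00 10 11 00 11.

```python
G = ((1, 1, 1),   # g1(D) = 1 + D + D^2: taps on (u_t, u_{t-1}, u_{t-2})
     (1, 0, 1))   # g2(D) = 1 + D^2


def conv_encode(bits, gens=G, flush=0):
    """Rate-1/n feedforward convolutional encoder; returns the output bit list.

    Each output bit for input u_t is the GF(2) inner product of a generator's
    taps with (u_t, u_{t-1}, ..., u_{t-M}). `flush` zeros may be appended to
    return the shift register to the all-zero state.
    """
    memory = len(gens[0]) - 1
    register = [0] * memory            # register[0] = u_{t-1}, register[1] = u_{t-2}, ...
    out = []
    for u in list(bits) + [0] * flush:
        window = [u] + register
        for taps in gens:
            out.append(sum(g * w for g, w in zip(taps, window)) % 2)
        register = [u] + register[:-1]  # shift the new bit in
    return out


msg = [int(c) for c in "010111001010001"]
coded = conv_encode(msg)
print(" ".join(f"{coded[i]}{coded[i + 1]}" for i in range(0, len(coded), 2)))
```

A single input 1 followed by two flush zeros produces 11 10 11, a code sequence of weight 5.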
State Diagram Representation
The output depends on the current input and the state of the encoder (i.e., the contents of the shift register).
Trellis Diagram Representation
Expansion of the state diagram in time.
Decoding
A message m is encoded into the code sequence c.
Each code sequence represents a path in the trellis diagram.
Minimum Distance Decoding
Upon receiving the sequence r, search for the path that is closest (in Hamming distance) to r.
The Viterbi Algorithm
Walk through the trellis and compute the Hamming distance between each branch of r and the corresponding branches in the trellis.
At each level, consider the two paths entering the same node; they are identical from this node onwards. Of these two paths, the one that is closer to r at this stage will remain closer at any time in the future. This path is retained, and the other path is discarded.
Proceeding this way, at each stage one path is saved for each node. These paths are called the survivors. The decoded sequence (based on MDD) is guaranteed to be one of these survivors.
The Viterbi Algorithm (contd)
Each survivor is associated with a metric, the accumulated Hamming distance (the Hamming distance up to this stage).
Carry out this process until the received sequence has been considered completely. Choose the survivor with the smallest metric.
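A compact sketch of this procedure for the rate-1/2 code with $g_1(D) = 1 + D + D^2$ and $g_2(D) = 1 + D^2$, assuming hard decisions, a start in state 00, and a trellis terminated by two zero flush bits (so the final survivor is taken at state 00). Each step performs add (branch Hamming distance), compare, and select, and stores one survivor per state for the traceback. The function names are ours.

```python
G = ((1, 1, 1), (1, 0, 1))  # g1 = 1 + D + D^2, g2 = 1 + D^2 (4 states)


def step(state, u):
    """One trellis branch. state = 2*u_{t-1} + u_{t-2}; returns (next_state, output bits)."""
    s1, s2 = state >> 1, state & 1
    window = (u, s1, s2)
    out = tuple(sum(g * w for g, w in zip(taps, window)) % 2 for taps in G)
    return (u << 1) | s1, out


def encode(msg, flush=2):
    state, out = 0, []
    for u in list(msg) + [0] * flush:
        state, bits = step(state, u)
        out.extend(bits)
    return out


def viterbi_hard(received, n_info, flush=2):
    """Minimum-Hamming-distance decoding over a trellis terminated in state 00."""
    INF = float("inf")
    metric = [0, INF, INF, INF]          # start in state 00
    survivors = []                       # per step: {state: (previous state, input bit)}
    for t in range(n_info + flush):
        r = received[2 * t:2 * t + 2]
        new_metric, choice = [INF] * 4, [None] * 4
        inputs = (0, 1) if t < n_info else (0,)  # flush bits are known zeros
        for s in range(4):
            if metric[s] == INF:
                continue
            for u in inputs:
                nxt, out = step(s, u)
                m = metric[s] + sum(a != b for a, b in zip(out, r))  # add
                if m < new_metric[nxt]:                              # compare, select
                    new_metric[nxt], choice[nxt] = m, (s, u)
        metric = new_metric
        survivors.append(choice)
    state, decoded = 0, []               # trace back from the all-zero state
    for choice in reversed(survivors):
        state, u = choice[state]
        decoded.append(u)
    return decoded[::-1][:n_info], metric[0]


msg = [int(c) for c in "010111001010001"]
rx = encode(msg)
rx[5] ^= 1                               # two channel errors (positions chosen arbitrarily)
rx[20] ^= 1
decoded, distance = viterbi_hard(rx, len(msg))
print(decoded == msg, distance)
```

With $d_{free} = 5$, any competing terminated path is at distance at least 5 from the transmitted one, so any two channel errors in a terminated block are corrected and the final metric equals the number of errors.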
6.3 The Viterbi Algorithm:
The Viterbi algorithm is used to decode convolutional codes and any structure or system that can be described by a trellis.
It is a maximum likelihood decoding algorithm: it selects the most probable path, i.e., the path that maximizes the likelihood function.
The algorithm is based on add-compare-select: at each time step and at each state, the best path is selected.
Example: For the convolutional code example in the previous lecture, starting from state zero, decode the following received sequence.
[Figure: decoding trellis for the example, annotated as follows]
- Add the weight of the path at each state.
- Compute the two possible paths at each state and select the one with the smaller cumulative Hamming weight. This is called the survivor path.
- At the end of the trellis, select the path with the minimum cumulative Hamming weight. This is the survivor path in this example.
Decoded sequence: m = [10 1110]
Distance Properties of Conv. Codes
Def: The free distance, $d_{free}$, is the minimum Hamming distance between any two code sequences.
Criteria for good convolutional codes:
Large free distance, $d_{free}$.
Small Hamming distance (i.e., as few differences as possible) between the input information sequences that produce the minimally separated code sequences, $d_{inf}$.
There is no known constructive way of designing a conv. code of given distance properties. However, a given code can be analyzed to find its distance properties.
Distance Prop. of Conv. Codes
(contd)
Convolutional codes are linear. Therefore,
the Hamming distance between any pair of
code sequences corresponds to the
Hamming distance between the all-zero
code sequence and some nonzero code
sequence. Thus for a study of the distance
properties it is possible to focus on the
Hamming distance between the all-zero
code sequence and all nonzero code
sequences.
The nonzero sequence of minimum
Hamming weight diverges from the all-zero
path at some point and remerges with the
all-zero path at some later point.
Distance Properties: Illustration
[Figure: trellis showing the all-zero path and two nonzero code sequences]
sequence 2: Hamming weight = 5, $d_{inf}$ = 1
sequence 3: Hamming weight = 7, $d_{inf}$ = 3
Modified State Diagram
The span of interest to us of a nonzero path starts from the 00 state and ends when the path first returns to the 00 state. Split the 00 state (state a) into two states: $a_0$ and $a_1$.
The branches are labeled with the dummy variables D, L and N, where:
The power of D is the Hamming weight (# of 1s) of the output corresponding to that branch.
The power of N is the Hamming weight (# of 1s) of the information bit(s) corresponding to that branch.
The power of L is the length of the branch (always = 1).
Modified State Diagram (contd)
Properties of the Path
Sequence 2:
code sequence: .. 00 11 10 11 00 ..
state sequence: $a_0$ b c $a_1$
Labeled: $(D^2LN)(DL)(D^2L) = D^5L^3N$
Prop.: w = 5, $d_{inf}$ = 1, diverges from the all-zero path by 3 branches.
Sequence 3:
code sequence: .. 00 11 01 01 00 10 11 00 ..
state sequence: $a_0$ b d c b c $a_1$
Labeled: $(D^2LN)(DLN)(DL)(LN)(DL)(D^2L) = D^7L^6N^3$
Prop.: w = 7, $d_{inf}$ = 3, diverges from the all-zero path by 6 branches.
Transfer Function
Input-Output relations:
$$a_0 = 1$$
$$b = D^2LN\,a_0 + LN\,c$$
$$c = DL\,b + DL\,d$$
$$d = DLN\,b + DLN\,d$$
$$a_1 = D^2L\,c$$
The transfer function $T(D,L,N) = a_1/a_0$:
$$T(D,L,N) = \frac{D^5L^3N}{1 - DNL(1+L)}$$
Transfer Function (contd)
Performing long division:
$$T = D^5L^3N + D^6L^4N^2 + D^6L^5N^2 + D^7L^5N^3 + \cdots$$
If interested in the Hamming distance property of the code only, set N = 1 and L = 1 to get the distance transfer function:
$$T(D) = D^5 + 2D^6 + 4D^7 + \cdots$$
There is one code sequence of weight 5. Therefore $d_{free} = 5$.
There are two code sequences of weight 6, four code sequences of weight 7, and so on.
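The first terms of the expansion can be checked by brute force: enumerate every path that leaves state 00 with an input 1 and first returns to 00, and record its output weight (power of D), length (power of L), and number of input 1s (power of N). This sketch assumes the same code ($g_1 = 1 + D + D^2$, $g_2 = 1 + D^2$); the search terminates because every cycle that avoids state 00 has positive output weight.

```python
from collections import Counter

G = ((1, 1, 1), (1, 0, 1))  # g1 = 1 + D + D^2, g2 = 1 + D^2


def step(state, u):
    """Return (next_state, output Hamming weight); state = 2*u_{t-1} + u_{t-2}."""
    s1, s2 = state >> 1, state & 1
    window = (u, s1, s2)
    weight = sum(sum(g * w for g, w in zip(taps, window)) % 2 for taps in G)
    return (u << 1) | s1, weight


def error_events(max_weight):
    """Count paths 00 -> ... -> 00 by (D power, L power, N power), up to max_weight."""
    terms = Counter()
    first_state, first_weight = step(0, 1)          # diverge with input 1
    stack = [(first_state, first_weight, 1, 1)]     # (state, weight, length, input ones)
    while stack:
        state, w, length, ones = stack.pop()
        if w > max_weight:
            continue                                # weights only grow, so prune
        if state == 0:
            terms[(w, length, ones)] += 1           # path has remerged
            continue
        for u in (0, 1):
            nxt, bw = step(state, u)
            stack.append((nxt, w + bw, length + 1, ones + u))
    return terms


terms = error_events(7)
for (d, l, n), count in sorted(terms.items()):
    print(f"{count} x D^{d} L^{l} N^{n}")
spectrum = Counter()
for (d, _, _), count in terms.items():
    spectrum[d] += count
print("d_free =", min(spectrum), " T(D) coefficients:", dict(sorted(spectrum.items())))
```

The printed terms $D^5L^3N$, $D^6L^4N^2$, $D^6L^5N^2$, $D^7L^5N^3$, ... match the long division, and collapsing L = N = 1 gives 1, 2 and 4 paths of weight 5, 6 and 7.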
296.3 Page35
296.3: Algorithms in the Real World
Convolutional Coding & Viterbi Decoding
And now a word from my father
"First, computer software and hardware are the most complex and rapidly developing intellectual creations of modern man."
-- p. iii, Internet and Computer Law, P. B. Maggs, J. T. Soma, and J. A. Sprowl, 2001
Today's lecture is based on
A Tutorial on Convolutional Coding with Viterbi Decoding
Chip Fleming
Spectrum Applications
http://home.netcom.com/~chip.f/viterbi/tutorial.html
Origin of Viterbi Decoding
Andrew J. Viterbi, "Error Bounds for
Convolutional Codes and an Asymptotically
Optimum Decoding Algorithm," IEEE
Transactions on Information Theory,
Volume IT-13, pp. 260-269, April 1967.
Viterbi is a founder of Qualcomm.
Terminology
k: number of message symbols (as before)
n: number of codeword symbols (as before)
r: rate = k/n
m: number of encoding cycles an input symbol is stored
K: number of input symbols used by the encoder to compute each output symbol (decoding time is exponentially dependent on K)
Convolution Encoder
[Figure: encoder built from two flip-flops, each storing one bit]
k = 15, n = 30, r = 1/2, K = 3, m = 2
Output: upper input followed by lower input
Encoding Example
Input: 010111001010001
Output: 00 11 10 00 01 10 01 11 11 10 00 10 11 00 11
Both flip flops set to 0 initially.
Flush encoder by clocking m = 2 times with 0 inputs.
Viterbi Decoding Applications
decoding trellis-coded modulation in modems
most common FEC technique used in space communications (r = 1/2, K = 7)
usually implemented as serially concatenated block and convolutional coding: first Reed-Solomon, then convolutional
Turbo codes are a new parallel-concatenated convolutional coding technique
State Transition and Output Tables
State transition table:
Current State   Next State, if Input = 0   Next State, if Input = 1
00              00                         10
01              00                         10
10              01                         11
11              01                         11
Output table:
Current State   Output Symbols, if Input = 0   Output Symbols, if Input = 1
00              00                             11
01              11                             00
10              10                             01
11              01                             10
Each table has $2^m$ rows.
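Both tables follow mechanically from the generator taps. The sketch below assumes the taps $g_1 = 111$ and $g_2 = 101$ on (input, most recent stored bit, oldest stored bit) and writes a state as the two stored bits with the most recent input first, which is the convention the tables above use. The helper name `build_tables` is hypothetical.

```python
G = ((1, 1, 1), (1, 0, 1))  # taps on (input, most recent stored bit, oldest stored bit)


def build_tables(gens=G):
    """Return (next_state, output) dicts keyed by (current state, input)."""
    next_state, output = {}, {}
    for state in ("00", "01", "10", "11"):
        s1, s2 = int(state[0]), int(state[1])
        for u in (0, 1):
            window = (u, s1, s2)
            bits = [sum(g * w for g, w in zip(taps, window)) % 2 for taps in gens]
            next_state[state, u] = f"{u}{s1}"  # shift the input into the register
            output[state, u] = "".join(map(str, bits))
    return next_state, output


nxt, out = build_tables()
print("Current  Next(0) Next(1)  Out(0) Out(1)")
for s in ("00", "01", "10", "11"):
    print(f"  {s}      {nxt[s, 0]}      {nxt[s, 1]}      {out[s, 0]}     {out[s, 1]}")
```

Generating the tables from the taps, rather than typing them in, keeps the encoder, trellis, and decoder consistent when the generators change.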
State Transitions
[Figure: state transition diagram; arcs distinguish input symbol 1 from input symbol 0 and are labeled with output symbols]
Trellis
Oh no! Errors in received bits!
Viterbi Decoding - Accumulated Error Metric
(use Hamming distance in our example)
Trying to find the input sequence whose corresponding output matches the received output as closely as possible.
Accumulated Error Metric
Decoder Trellis (successive stages)
Final Decoder Trellis
Accumulated Error Metric over Time
t =         0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
State 00       0  2  3  3  3  3  4  1  3  4  3  3  2  2  4  5  2
State 01          3  1  2  2  3  1  4  4  1  4  2  3  4  4  2
State 10       2  0  2  1  3  3  4  3  1  4  1  4  3  3  2
State 11          3  1  2  1  1  3  4  4  3  4  2  3  4  4
Last two inputs known to be zero.
Surviving Predecessor States
t =         0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
State 00   00 00 00 01 00 01 01 00 01 00 00 01 00 01 00 00 00 01
State 01   00 00 10 10 11 11 10 11 11 10 10 11 10 11 10 10 10 00
State 10   00 00 00 00 01 01 01 00 01 00 00 01 01 00 01 00 00 00
State 11   00 00 10 10 11 10 11 10 11 10 10 11 10 11 10 10 00 00
States Selected when Tracing Back
t =         0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
State      00 00 10 01 10 11 11 01 00 10 01 10 01 00 00 10 01 00
Coding Gain
Transmission voltages (signal-to-noise ratio, SNR, 20 dB). No errors.
Coding Gain
Transmission voltages with Gaussian noise (SNR 6 dB): bit error rate (BER) of about 0.235%.
Coding Gain
Convolutional coding with Viterbi decoding can achieve a BER of less than 1 × 10^-7 at the same SNR, 6 dB (r = 1/2, K = 3).
It uses 5 dB less power to achieve a BER of 1 × 10^-7 than without coding.
Coding uses twice as much bandwidth (3 dB).
Coding gain: 5 dB - 3 dB = 2 dB less energy.
References (from Fleming)
Some Books about Forward Error Correction
S. Lin and D. J. Costello, Error Control Coding. Englewood Cliffs, NJ: Prentice Hall, 1982.
A. M. Michelson and A. H. Levesque, Error Control Techniques for Digital Communication. New York: John Wiley & Sons, 1985.
W. W. Peterson and E. J. Weldon, Jr., Error Correcting Codes, 2nd ed. Cambridge, MA: The MIT Press, 1972.
V. Pless, Introduction to the Theory of Error-Correcting Codes, 3rd ed. New York: John Wiley & Sons, 1998.
C. Schlegel and L. Perez, Trellis Coding. Piscataway, NJ: IEEE Press, 1997.
S. B. Wicker, Error Control Systems for Digital Communication and Storage. Englewood Cliffs, NJ: Prentice Hall, 1995.
More References (from Fleming)
Some Papers about Convolutional Coding with Viterbi Decoding
For those interested in VLSI implementations of the Viterbi algorithm, I recommend the following paper and the papers to which it refers (and so on):
M.-B. Lin, "New Path History Management Circuits for Viterbi Decoders," IEEE Transactions on Communications, vol. 48, October 2000, pp. 1605-1608.
Other papers are:
G. D. Forney, Jr., "Convolutional Codes II: Maximum-Likelihood Decoding," Information Control, vol. 25, June 1974, pp. 222-226.
K. S. Gilhousen et al., "Coding Systems Study for High Data Rate Telemetry Links," Final Contract Report, N71-27786, Contract No. NAS2-6024, Linkabit Corporation, La Jolla, CA, 1971.
J. A. Heller and I. M. Jacobs, "Viterbi Decoding for Satellite and Space Communications," IEEE Transactions on Communication Technology, vol. COM-19, October 1971, pp. 835-848.
K. J. Larsen, "Short Convolutional Codes with Maximal Free Distance for Rates 1/2, 1/3, and 1/4," IEEE Transactions on Information Theory, vol. IT-19, May 1973, pp. 371-372.
J. P. Odenwalder, "Optimum Decoding of Convolutional Codes," Ph.D. Dissertation, Department of Systems Sciences, School of Engineering and Applied Sciences, University of California at Los Angeles, 1970.
A. J. Viterbi, "Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm," IEEE Transactions on Information Theory, vol. IT-13, April 1967, pp. 260-269.
Convolutional codes
Tomashevich Victor
Introduction
Convolutional codes map information to code bits sequentially by convolving a sequence of information bits with generator sequences.
A convolutional encoder encodes K information bits to N > K code bits at one time step.
Convolutional codes can be regarded as block codes for which the encoder has a certain structure such that we can express the encoding operation as a convolution.
Properties of convolutional codes
Consider a convolutional encoder. The input to the encoder is an information bit sequence u (partitioned into blocks of length K):
$$u = (u_0, u_1, \ldots), \quad u_i = (u_i^{(1)}, u_i^{(2)}, \ldots, u_i^{(K)})$$
The encoder outputs the code bit sequence x (partitioned into blocks of length N):
$$x = (x_0, x_1, \ldots), \quad x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(N)})$$
$$R = \frac{K}{N}$$
Example: Consider a rate 1/2 convolutional code with K = 1 and N = 2 defined by the circuit:
[Figure: encoder circuit with input $u_i$ and output $x_i$]
The sequences $(x_0^{(1)}, x_1^{(1)}, \ldots)$ and $(x_0^{(2)}, x_1^{(2)}, \ldots)$ are generated as follows:
$$x_i^{(1)} = u_i \quad \text{and} \quad x_i^{(2)} = u_i + u_{i-1}$$
Multiplexing between $x_i^{(1)}$ and $x_i^{(2)}$ gives the code bit sequence
$$x = \left((x_0^{(1)} x_0^{(2)}), (x_1^{(1)} x_1^{(2)}), \ldots\right) = (x_0, x_1, \ldots)$$
The convolutional code is linear.
The encoding mapping is bijective.
Code bits generated at time step i are affected by information bits up to M time steps $i-1, i-2, \ldots, i-M$ back in time. M is the maximal delay of information bits in the encoder.
Code memory is the (minimal) number of registers needed to construct an encoding circuit for the code.
Constraint length is the overall number of information bits affecting the code bits generated at time step i: constraint length = code memory + K = MK + K = (M + 1)K.
A convolutional code is systematic if the N code bits generated at time step i contain the K information bits.
Example: The rate 1/2 code defined by the circuit
[Figure: encoder circuit with input $u_i$ and output $x_i$]
has delay M = 1, memory 1, constraint length 2, and it is systematic.
Example: The rate 2/3 code defined by the circuit
[Figure: encoder circuit with inputs $u_i^{(1)}, u_i^{(2)}$ and outputs $x_i^{(1)}, x_i^{(2)}, x_i^{(3)}$]
has delay M = 1, memory 2, constraint length 4, and it is not systematic.
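Tree
[Figure: code tree of the rate 1/2 code for inputs $u_0, u_1, u_2$; branches for inputs 0 and 1 are labeled with the code blocks 00, 11, 01, 10, and nodes with the states A and B]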
Trellis: The tree graph can be contracted to a directed graph called the trellis of the convolutional code, having at most S nodes at distance i = 0, 1, ... to the root.
The contents of the (at most) MK encoder registers are assigned the variables $s_i^{(j)} \in GF(2)$, $j = 0, 1, \ldots, MK-1$.
The vector $s_i = (s_i^{(0)}, s_i^{(1)}, \ldots, s_i^{(MK-1)})$ combining all register contents at time step i is called the state of the encoder at time step i.
The code bit block $x_i$ is clearly a function of $s_i$ and $u_i$ only.
Example:
The encoder of the rate 1/2 convolutional code has $S = 2^1 = 2$ different states. The state is given by $s_i = s_i^{(0)}$.
The code bit block $x_i$ at time step i is computed from $s_i$ and $u_i$ by
$$x_i^{(1)} = u_i \quad \text{and} \quad x_i^{(2)} = u_i + s_i$$
[Figure: trellis with states A and B; branches for $u_i = 0$ and $u_i = 1$ labeled 00, 11, 01, 10]
Example: Constructing a trellis section
[Figure: encoder circuit with input $u_i$ and output $x_i$]
$x_i^{(1)} = u_i$ and $x_i^{(2)} = u_i + s_i$
Two equations are required:
(1) How does $s_i$ depend on $u_{i-m}$, $m = 1, 2, \ldots$, and possibly on $s_{i-m}$, $m > 0$? Here: $s_i = u_{i-1}$.
(2) How does $x_i$ depend on $s_i$ and $u_i$? Here: $x_i^{(1)} = u_i$ and $x_i^{(2)} = u_i + s_i$.
The branches are labeled with $u_i \,|\, x_i$, called a state transition, leading from a state $s_i$ to a new state $s_{i+1}$.
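The two equations are all that is needed to generate the trellis section mechanically. This sketch (the function name `trellis_section` is ours) enumerates each state $s_i \in \{0, 1\}$ and input $u_i$, computes $x_i^{(1)} = u_i$, $x_i^{(2)} = u_i + s_i \pmod 2$ and $s_{i+1} = u_i$, and returns the branch labels $u_i|x_i$.

```python
def trellis_section():
    """Branches (s_i, s_next, label 'u|x1x2') of the code x1 = u, x2 = u + s (mod 2), s_next = u."""
    branches = []
    for s in (0, 1):
        for u in (0, 1):
            x1, x2 = u, (u + s) % 2
            branches.append((s, u, f"{u}|{x1}{x2}"))  # s_{i+1} = u_i
    return branches


for s, s_next, label in trellis_section():
    print(f"state {s} -> state {s_next}: {label}")
```

The four branches 0|00, 1|11, 0|01 and 1|10 are exactly the labels that repeat in every section of the trellis.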
Trellis section:
[Figure: trellis section between $s_i$ and $s_{i+1}$ (states 0 and 1) with branches 0|00 (0 → 0), 1|11 (0 → 1), 0|01 (1 → 0) and 1|10 (1 → 1), followed by the full trellis over $s_0, s_1, s_2, \ldots, s_{L-2}, s_{L-1}$]
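State diagram
Example: Trellis of the rate 1/2 convolutional code with $x_i^{(1)} = u_i$ and $x_i^{(2)} = u_i + s_i$
[Figure: trellis section from $s_i$ to $s_{i+1}$ with branches labeled 00, 11, 01, 10 for $u_i = 0$ and $u_i = 1$]
State diagram:
[Figure: two-state diagram (states 0 and 1) with transitions labeled 00, 11, 01, 10 for $u_i = 0$ and $u_i = 1$]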
Description with submatrices
Definition: A convolutional code is a set C of code bit sequences
$$x = (x_0, x_1, \ldots, x_i, \ldots), \quad x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(N)}), \quad x_i^{(j)} \in GF(2),$$
partitioned into length-N blocks.
There exist many encoders mapping information bit sequences
$$u = (u_0, u_1, \ldots), \quad u_i = (u_i^{(1)}, u_i^{(2)}, \ldots, u_i^{(K)}), \quad u_i^{(j)} \in GF(2),$$
(partitioned into length-K blocks, K < N) to code bit sequences x for the same code.
Example: The following two encoding circuits generate the same set of code sequences:
[Figure: two encoding circuits, each with inputs $u_i^{(1)}, u_i^{(2)}$ and outputs $x_i^{(1)}, x_i^{(2)}, x_i^{(3)}$]
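Generator matrix
$$x = uG,$$
where
$$G = \begin{pmatrix} G_0 & G_1 & G_2 & \cdots & G_M & & \\ & G_0 & G_1 & G_2 & \cdots & G_M & \\ & & G_0 & G_1 & G_2 & \cdots & G_M \\ & & & \ddots & & & \ddots \end{pmatrix}, \qquad x_i = \sum_{m=0}^{M} u_{i-m} G_m$$
The generated convolutional code has rate R = K/N, memory K·M, and constraint length K·(M+1).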
Example:
The rate 1/2 code is given by $x_i^{(1)} = u_i$ and $x_i^{(2)} = u_i + s_i$.
$G_0$ governs how $u_i$ affects $x_i = (x_i^{(1)}\, x_i^{(2)})$: $G_0 = (1\ 1)$.
$G_1$ governs how $u_{i-1}$ affects $x_i$: $G_1 = (0\ 1)$.
$$x = \left((x_0^{(1)} x_0^{(2)}), (x_1^{(1)} x_1^{(2)}), (x_2^{(1)} x_2^{(2)}), \ldots\right) = (u_0, u_1, u_2, \ldots)\, G,$$
where
$$G = \begin{pmatrix} 11 & 01 & & \\ & 11 & 01 & \\ & & 11 & 01 \\ & & & \ddots \end{pmatrix}$$
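For a finite input block, $x = uG$ can be evaluated directly with a truncated generator matrix. The sketch below assumes NumPy, uses the example submatrices $G_0 = (1\ 1)$ and $G_1 = (0\ 1)$, and reduces modulo 2. The helper name `generator_matrix` is ours, and truncating G (dropping the tail beyond the last block) is a simplification. For u = (1, 1, 0, 0) it gives 11 10 01 00, the same result as the polynomial example that follows.

```python
import numpy as np


def generator_matrix(submatrices, n_blocks):
    """Truncated semi-infinite generator matrix: row block i holds G_0..G_M from column block i."""
    k, n = submatrices[0].shape
    G = np.zeros((k * n_blocks, n * n_blocks), dtype=int)
    for i in range(n_blocks):
        for m, Gm in enumerate(submatrices):
            if i + m < n_blocks:  # drop the tail that falls outside the block
                G[i * k:(i + 1) * k, (i + m) * n:(i + m + 1) * n] = Gm
    return G


G0 = np.array([[1, 1]])
G1 = np.array([[0, 1]])
G = generator_matrix([G0, G1], n_blocks=4)
u = np.array([1, 1, 0, 0])
x = u @ G % 2
print(G)
print("x =", " ".join(f"{a}{b}" for a, b in x.reshape(-1, 2)))
```

The banded structure of G is visible in the printout: each row is the previous one shifted right by N = 2 columns.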
Description with polynomials
$$G(D) = \begin{pmatrix} g_1^{(1)}(D) & g_1^{(2)}(D) & \cdots & g_1^{(N)}(D) \\ g_2^{(1)}(D) & g_2^{(2)}(D) & \cdots & g_2^{(N)}(D) \\ \vdots & & & \vdots \\ g_K^{(1)}(D) & g_K^{(2)}(D) & \cdots & g_K^{(N)}(D) \end{pmatrix}$$
$$g_i^{(j)}(D) = g_{i,0}^{(j)} + g_{i,1}^{(j)} D + g_{i,2}^{(j)} D^2 + \cdots + g_{i,M}^{(j)} D^M, \quad g_{i,m}^{(j)} \in GF(2)$$
$$u(D) = (u^{(1)}(D), u^{(2)}(D), \ldots, u^{(K)}(D)), \quad \text{where } u^{(j)}(D) = u_0^{(j)} + u_1^{(j)} D + \cdots + u_i^{(j)} D^i + \cdots, \quad j = 1, 2, \ldots, K$$
$$x(D) = (x^{(1)}(D), x^{(2)}(D), \ldots, x^{(N)}(D)), \quad \text{where } x^{(j)}(D) = x_0^{(j)} + x_1^{(j)} D + \cdots + x_i^{(j)} D^i + \cdots, \quad j = 1, 2, \ldots, N$$
$$x(D) = u(D)\, G(D)$$
$$g_{i,m}^{(j)} = G_m(i, j), \quad i = 1, \ldots, K, \; j = 1, \ldots, N, \; m = 0, \ldots, M$$
$$M = \max_{i,j} \deg\left(g_i^{(j)}(D)\right), \quad i = 1, \ldots, K, \; j = 1, \ldots, N$$
Example:
The rate 1/2 code is given by $x_i^{(1)} = u_i$ and $x_i^{(2)} = u_i + s_i$, and $G(D) = \left(g_1^{(1)}(D)\;\; g_1^{(2)}(D)\right)$.
From M = 1 it follows that $\deg\left(g_i^{(j)}(D)\right) \le 1$.
The polynomial $g_1^{(1)}(D)$ governs how $u_{l-m}$, $m = 0, 1$, affects $x_l^{(1)}$: $g_1^{(1)}(D) = 1 + 0 \cdot D = 1$.
The polynomial $g_1^{(2)}(D)$ governs how $u_{l-m}$, $m = 0, 1$, affects $x_l^{(2)}$: $g_1^{(2)}(D) = 1 + 1 \cdot D = 1 + D$.
$u = (1, 1, 0, \ldots)$, i.e., $u(D) = 1 + D$:
$x = uG$ yields $x = (11, 10, 01, 00, \ldots)$;
$x(D) = u(D)\,G(D)$ yields $x(D) = (1 + D,\; 1 + D^2)$.
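Over GF(2), $x(D) = u(D)G(D)$ is carry-less polynomial multiplication. In this sketch a polynomial is stored as a Python int whose bit m is the coefficient of $D^m$ (our convention), and `gf2_mul` / `poly_str` are hypothetical helper names.

```python
def gf2_mul(a: int, b: int) -> int:
    """Carry-less product of two GF(2) polynomials stored as bit masks (bit m <-> D^m)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # add (XOR) the shifted copy of a
        a <<= 1          # multiply a by D
        b >>= 1
    return result


def poly_str(p: int) -> str:
    terms = [("1" if m == 0 else "D" if m == 1 else f"D^{m}")
             for m in range(p.bit_length()) if p >> m & 1]
    return " + ".join(terms) or "0"


u = 0b11              # u(D) = 1 + D, i.e. u = (1, 1, 0, ...)
G = (0b01, 0b11)      # G(D) = (1, 1 + D)
x = tuple(gf2_mul(u, g) for g in G)
print("x(D) = (" + ", ".join(poly_str(p) for p in x) + ")")
```

The cross term 2D in $(1 + D)^2$ vanishes because addition is XOR, which is why $x^{(2)}(D) = 1 + D^2$.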
Punctured convolutional codes
A sequence of code bits is punctured by deleting some of the bits in the sequence according to a fixed rule.
In general, the puncturing of a rate K/N convolutional code is defined using N puncturing tables, one for each code bit $x_i^{(j)}$, $j = 1, \ldots, N$, in a block $x_i$.
Each table contains p bits, where p is the puncturing period. If a bit is 1, the corresponding code bit is part of the punctured code; if the bit is 0, the corresponding code bit is not part of the punctured code.
For a sequence of code bit blocks $x_i$, $i = 0, 1, \ldots$, the puncturing tables are applied periodically. The N puncturing tables are combined in an $N \times p$ puncturing matrix P.
Example:
The encoder circuit of the rate 1/2 convolutional code given by
$$G(D) = \left(1 + D^2,\;\; 1 + D + D^2\right)$$
maps $u = (0, 0, 1, 0, 0)$ to $x_{NP} = (00, 00, 11, 01, 11)$.
The sequence $x_{NP}$ is punctured using two different puncturing matrices:
$$P_1 = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}, \qquad P_2 = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}$$
The puncturing period p is 4. Using $P_1$, 3 out of 4 code bits $x_i^{(1)}$ and 2 out of 4 code bits $x_i^{(2)}$ of the mother code are used; the others are discarded. The rate is
$$R = \frac{1}{2} \cdot \frac{4 + 4}{3 + 2} = \frac{4}{5}$$
and u is encoded to $x = (00, 0X, 1X, X1, 11) = (00, 0, 1, 1, 11)$, where X marks a deleted bit.
Using $P_2$, the rate of the punctured code is
$$R = \frac{1}{2} \cdot \frac{4 + 4}{3 + 3} = \frac{2}{3}$$
and u is encoded to $x = (00, 00, 1X, X1, 11) = (00, 00, 1, 1, 11)$.
Puncturing tables (puncturing period p = 4):
          P1         P2
x^(1):    1 1 1 0    1 1 1 0
x^(2):    1 0 0 1    1 1 0 1
[Figure: encoder of a rate 1/2 code with input $(u_0, \ldots, u_4) = (0, 0, 1, 0, 0)$ and punctured output $x_0^{(1)} x_0^{(2)} x_1^{(1)} x_2^{(1)} x_3^{(2)} x_4^{(1)} x_4^{(2)}$]
Encoder of a rate 1/2 code punctured to a rate 4/5 code (top puncturing tables) or a rate 2/3 code (bottom puncturing tables)
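The puncturing example can be replayed in a few lines. This sketch assumes the mother code $G(D) = (1 + D^2,\ 1 + D + D^2)$, which reproduces $x_{NP} = (00, 00, 11, 01, 11)$ for u = (0, 0, 1, 0, 0). It applies table column i mod p to block $x_i$ and computes $R = K\,p / (\#\text{1s in } P)$. The function names are ours.

```python
from fractions import Fraction


def encode_blocks(u, gens):
    """Rate-1/N feedforward encoder; gens[j] lists the coefficients of g^(j)(D), lowest power first."""
    blocks = []
    for i in range(len(u)):
        block = tuple(sum(g[m] * u[i - m] for m in range(len(g)) if i - m >= 0) % 2
                      for g in gens)
        blocks.append(block)
    return blocks


def puncture(blocks, P):
    """Keep code bit j of block i iff P[j][i mod p] == 1."""
    p = len(P[0])
    return [bit for i, block in enumerate(blocks)
            for j, bit in enumerate(block) if P[j][i % p]]


def punctured_rate(K, P):
    return Fraction(K * len(P[0]), sum(map(sum, P)))


gens = ([1, 0, 1], [1, 1, 1])  # G(D) = (1 + D^2, 1 + D + D^2)
u = [0, 0, 1, 0, 0]
x_np = encode_blocks(u, gens)
P1 = [[1, 1, 1, 0], [1, 0, 0, 1]]
P2 = [[1, 1, 1, 0], [1, 1, 0, 1]]
print("x_NP =", x_np)
for name, P in (("P1", P1), ("P2", P2)):
    print(name, "rate", punctured_rate(1, P), "bits", puncture(x_np, P))
```

Only the transmitted bit stream changes; the decoder can still run on the mother code's trellis, treating deleted positions as erasures.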
The rate R of a punctured code obtained from a rate $R_0 = K/N$ mother code using the $N \times p$ puncturing matrix P is given as
$$R = R_0 \cdot \frac{N \cdot p}{\#\text{ of 1s in } P} = \frac{K \cdot p}{\#\text{ of 1s in } P}$$
With puncturing we can easily construct convolutional codes with arbitrary rational rates. However, punctured codes of rate R = K/N obtained from an optimized good mother code of memory m usually perform worse than unpunctured rate-K/N, memory-m optimized codes given by a $K \times N$ generator matrix $G(D)$.
This performance gap increases with the number of punctured bits. The advantage of puncturing is that the decoding complexity is not altered, since the original trellis of the mother code can be used.
Consider a rate 1/3, memory 4 mother code given by the submatrices $G_0 = (111)$, $G_1 = (010)$, $G_2 = (011)$, $G_3 = (101)$, and $G_4 = (111)$.
code#  rate          punc. table (rows x^(1) / x^(2) / x^(3))   d_f   c_df
9      1/3 = 8/24    1111 1111 / 1111 1111 / 1111 1111          11    8
8      4/11 = 8/22   1111 1111 / 1111 1111 / 1110 1110           9    10
7      4/10 = 8/20   1111 1111 / 1111 1111 / 1100 1100           8    2
6      4/9 = 8/18    1111 1111 / 1111 1111 / 1000 1000           7    2
5      1/2 = 8/16    1111 1111 / 1111 1111 / 0000 0000           7    32
4      4/7 = 8/14    1111 1111 / 1110 1110 / 0000 0000           5    8
3      2/3 = 8/12    1111 1111 / 1010 1010 / 0000 0000           4    4
2      4/5 = 8/10    1111 1111 / 1000 1000 / 0000 0000           3    42
1      8/9           1111 0111 / 1000 1000 / 0000 0000           2    2
Decoding of convolutional codes
The Viterbi algorithm
[Figure: rate 1/2 encoder with input $u_j$, outputs $x_j^1, x_j^2$ and register contents $s_j^1, s_j^2$]
$u = (+1, +1, +1, +1, +1, +1, +1)$, $(s_{10}, s_{20}) = (+1, +1)$
$x = (+1+1, +1+1, +1+1, +1+1, +1+1, +1+1, +1+1)$
[Figure: trellis section from state $(s_j^1, s_j^2)$ to state $(s_{j+1}^1, s_{j+1}^2)$, states +1+1, -1-1, -1+1, +1-1; branches labeled $u_j / x_j^1 x_j^2$: +1/+1+1, -1/-1-1, -1/-1+1, +1/+1-1]
Hard decisions
$$\lambda_j^{(m)} = x_j^{1(m)} y_j^1 + x_j^{2(m)} y_j^2$$
y = (+1+1, -1+1, +1+1, +1+1, +1+1, +1+1, +1+1)
[Figure: trellis for j = 0, ..., 7 with the branch metrics and accumulated path metrics; the decoded path ends with the largest metric, +12]
[Figure: the same trellis with the surviving path traced back]
Decoded $u_j$: +1 +1 +1 +1 +1 +1 +1
- No error
Hard decisions
$$\lambda_j^{(m)} = x_j^{1(m)} y_j^1 + x_j^{2(m)} y_j^2$$
y = (+1+1, -1+1, +1-1, +1+1, +1+1, +1+1, +1+1)
[Figure: trellis for j = 0, ..., 7 with the branch metrics and accumulated path metrics; the decoded path ends with the largest metric, +10]
[Figure: the same trellis with the surviving path traced back]
Decoded $u_j$: +1 +1 +1 +1 +1 +1 +1
- No error
Hard decisions
$$\lambda_j^{(m)} = x_j^{1(m)} y_j^1 + x_j^{2(m)} y_j^2$$
y = (+1+1, -1-1, -1+1, +1+1, +1+1, +1+1, +1+1)
[Figure: trellis for j = 0, ..., 7 with the branch metrics and accumulated path metrics]
[Figure: the same trellis with the surviving path traced back]
Decoded $u_j$: +1 -1 -1 +1 +1 +1 +1
- 2 decoding errors
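Soft decisions
Each received symbol is weighted by channel state information (CSI):
$$l_{ij} = \begin{cases} 2, & \text{GOOD channel} \\ 1/2, & \text{BAD channel} \end{cases} \qquad \Rightarrow \qquad l_{ij}\, x_{ij}^{(m)} y_{ij} = \begin{cases} +2, & x_{ij}^{(m)} = y_{ij}, \text{ GOOD channel} \\ -2, & x_{ij}^{(m)} \ne y_{ij}, \text{ GOOD channel} \\ +1/2, & x_{ij}^{(m)} = y_{ij}, \text{ BAD channel} \\ -1/2, & x_{ij}^{(m)} \ne y_{ij}, \text{ BAD channel} \end{cases}$$
$$\lambda_j^{(m)} = l_j^1 x_j^{1(m)} y_j^1 + l_j^2 x_j^{2(m)} y_j^2$$
CSI values ((G,B),(B,B),(G,G),(G,B),(B,B),(G,G),(G,G))
y = (+1+1, -1-1, -1+1, +1+1, +1+1, +1+1, +1+1)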
[Figure: trellis for j = 0, ..., 7 with CSI-weighted branch metrics and accumulated path metrics; the decoded path ends with metric +13]
[Figure: the same trellis with the surviving path traced back]
Decoded $u_j$: +1 +1 +1 +1 +1 +1 +1
- No error
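The encoder behind these trellis figures cannot be fully recovered from the extracted slides, so this sketch assumes the 4-state code $g_1 = 1 + D + D^2$, $g_2 = 1 + D^2$ instead, the bipolar mapping 0 → +1, 1 → -1, and a trellis terminated in state 00. It maximizes the CSI-weighted correlation $\sum l \cdot x \cdot y$ with l = 2 for GOOD and l = 1/2 for BAD symbols. Setting all weights to 1 gives plain hard-decision correlation decoding. Function names are ours.

```python
G = ((1, 1, 1), (1, 0, 1))  # assumed code: g1 = 1 + D + D^2, g2 = 1 + D^2


def branch(state, u):
    """Return (next_state, bipolar outputs) with 0 -> +1 and 1 -> -1."""
    s1, s2 = state >> 1, state & 1
    window = (u, s1, s2)
    bits = [sum(g * w for g, w in zip(taps, window)) % 2 for taps in G]
    return (u << 1) | s1, [1 - 2 * b for b in bits]


def viterbi_soft(y, weights, n_info, flush=2):
    """Maximize sum(l * x * y) over paths that start and end in state 00."""
    NEG = float("-inf")
    metric = [0.0, NEG, NEG, NEG]
    survivors = []
    for t in range(n_info + flush):
        yt, lt = y[2 * t:2 * t + 2], weights[2 * t:2 * t + 2]
        new_metric, choice = [NEG] * 4, [None] * 4
        for s in range(4):
            if metric[s] == NEG:
                continue
            for u in ((0, 1) if t < n_info else (0,)):
                nxt, x = branch(s, u)
                m = metric[s] + sum(l * xi * yi for l, xi, yi in zip(lt, x, yt))
                if m > new_metric[nxt]:  # keep the larger correlation
                    new_metric[nxt], choice[nxt] = m, (s, u)
        metric = new_metric
        survivors.append(choice)
    state, decoded = 0, []
    for choice in reversed(survivors):
        state, u = choice[state]
        decoded.append(u)
    return decoded[::-1][:n_info]


GOOD, BAD = 2.0, 0.5
n_info = 8
y = [+1.0] * (2 * (n_info + 2))  # all-zero message -> all +1 symbols
weights = [GOOD] * len(y)
for pos in (4, 9, 14):           # three flipped symbols, all flagged BAD by the CSI
    y[pos] = -1.0
    weights[pos] = BAD
print(viterbi_soft(y, weights, n_info))
```

Any competing terminated path differs from the all +1 path in at least $d_{free} = 5$ symbols, so three flips that are flagged BAD (each worth only +1 to the competitor) cannot outweigh the GOOD agreements it loses (−4 each), and the message is recovered.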
The Viterbi algorithm
A.J. Han Vinck
Lecture notes data communications
10.01.2009
Content
Viterbi decoding for convolutional codes
Hidden Markov models
With contributions taken from Dan Jurafsky
Problem formulation
[Figure: information → Finite State Machine → x; noise n is added, giving the observation y = x + n]
What is the best estimate for the information given the observation?
Maximum Likelihood receiver:
$$\max P(Y \mid X) = \max P(X + N \mid X) = \max P(N)$$
For independent transmissions: $\max P(N) = \max \prod_{i=1}^{L} P(N_i)$, i.e., choose the minimum weight noise sequence.
The Noisy Channel Model
Search through space of all possible
sentences.
Pick the one that is most probable given the
waveform.
characteristics
The Viterbi algorithm is a standard component of tens of millions of high-speed modems. It is a key building block of modern information infrastructure.
The symbol "VA" is ubiquitous in the block diagrams of modern receivers.
Essentially: the VA finds a path through any Markov graph, which is a sequence of states governed by a Markov chain.
Many practical applications:
convolutional decoding and channel trellis decoding,
fading communication channels,
partial response channels in recording systems,
optical character recognition,
voice recognition,
DNA sequence analysis,
etc.
Illustration of the algorithm
[Figure: graph with states st 1 to st 4 and weighted branches, showing how the survivor is selected]
Key idea
Best path from A to C = best of
- the path A-F-C
- best path A to B + best path from B to C
- the path via D does not influence the best way from B to C
[Figure: graph with nodes A, B, C, D, E, F]
Application to convolutional code
[Figure: info I → encoder (one delay element, outputs $c_1, c_2$) → channel adding binary noise $n_1, n_2$ → code + noise → VD → estimate]
binary noise sequences: $P(n_1 = 1) = P(n_2 = 1) = p$
VITERBI DECODER: find the sequence I that corresponds to the code sequence $(c_1, c_2)$ at minimum distance from $(r_1, r_2) = (c_1 \oplus n_1, c_2 \oplus n_2)$
Use encoder state space
[Figure: encoder with input I, one delay element and output $c_2$; trellis with State 0 and State 1 over time 0 to 3, branches labeled 00, 01, 11, 10]
Encoder output 00 11 10 00
channel output 00 10 10 00
[Figure: accumulated Hamming distances at each state of the trellis; the path with the smallest total distance is marked "best"]
Viterbi Decoder action
VITERBI DECODER: find the sequence I that corresponds to the code sequence $(c_1, c_2)$ at minimum distance from $(r_1, r_2) = (c_1 \oplus n_1, c_2 \oplus n_2)$.
Maximum Likelihood receiver: find $(c_1, c_2)$ that maximizes
$$\text{Prob}(r_1, r_2 \mid c_1, c_2) = \text{Prob}(c_1 \oplus n_1, c_2 \oplus n_2 \mid c_1, c_2) = \text{Prob}(n_1, n_2),$$
i.e., the minimum number of noise digits equal to 1 (for p < 1/2).
Distance Properties of Conv. Codes
Def: The free distance, $d_{free}$, is the minimum Hamming distance between any two code sequences.
Criteria for good convolutional codes:
1. Large free distance, $d_{free}$.
2. Small number of information bits equal to 1 in sequences with low Hamming weight.
There is no known constructive way of designing a convolutional code of given distance properties.
However, a given code can be analyzed to find its distance properties.
Distance Prop. of Convolutional Codes (contd)
Convolutional codes are linear.
Therefore, the Hamming distance between any pair of code sequences corresponds to
the Hamming distance between the all-zero code sequence and some nonzero code
sequence.
The nonzero sequence of minimum Hamming weight diverges from the all-zero path at
some point and remerges with the all-zero path at some later point.
Modified State Diagram (contd)
A path from (00) to (00) is denoted by $D^i L^j N^k$, where i is the weight, j the length, and k the number of information 1s.
Transfer Function
The transfer function T(D,L,N):
$$T(D,L,N) = \frac{D^5L^3N}{1 - DNL(1+L)}$$
Transfer Function (contd)
Performing long division:
$T(D, L, N) = D^5 L^3 N + D^6 L^4 N^2 + D^6 L^5 N^2 + D^7 L^5 N^3 + \cdots$
If we are interested only in the Hamming distance property of the code, we set N = 1 and L = 1 to get the distance transfer function:
$T(D) = \dfrac{D^5}{1 - 2D} = D^5 + 2D^6 + 4D^7 + \cdots$
There is one code sequence of weight 5. Therefore $d_{free} = 5$.
There are two code sequences of weight 6, four code sequences of weight 7, and so on.
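The long division can be checked by enumerating the paths of the state diagram directly. A minimal Python sketch, assuming the (7, 5) encoder with generators 111 and 101 (an assumption consistent with this transfer function): it lists every path that leaves state 00 and first returns to it with Hamming weight at most 7, recording the exponents (i, j, k) of $D^i L^j N^k$. The enumeration terminates because, for this code, every cycle that avoids state 00 has nonzero weight. Names are hypothetical.

```python
from collections import Counter

# ASSUMPTION: rate-1/2, K = 3 encoder with generators 111 and 101 (octal 7, 5).
ZERO = (0, 0)


def branch(state, bit):
    """state = (u_{k-1}, u_{k-2}); return (output Hamming weight, next state)."""
    s1, s2 = state
    return (bit ^ s1 ^ s2) + (bit ^ s2), (bit, s1)


def path_terms(max_weight):
    """Count paths 00 -> ... -> 00 by (weight i, length j, information 1s k)."""
    terms = Counter()
    w0, first = branch(ZERO, 1)
    stack = [(first, w0, 1, 1)]  # (state, i, j, k) after the diverging branch
    while stack:
        state, i, j, k = stack.pop()
        for bit in (0, 1):
            w, nxt = branch(state, bit)
            ni, nj, nk = i + w, j + 1, k + bit
            if ni > max_weight:
                continue  # weights never decrease along a path, so prune
            if nxt == ZERO:
                terms[(ni, nj, nk)] += 1  # remerged: one term D^ni L^nj N^nk
            else:
                stack.append((nxt, ni, nj, nk))
    return terms


terms = path_terms(7)
spectrum = Counter()
for (i, j, k), count in terms.items():
    spectrum[i] += count  # setting L = N = 1 keeps only the D exponent
print(sorted(terms.items()))
print(sorted(spectrum.items()))  # [(5, 1), (6, 2), (7, 4)]
```

The first four exponent triples printed, (5, 3, 1), (6, 4, 2), (6, 5, 2) and (7, 5, 3), are the leading terms of the long division, and the weight counts 1, 2, 4 are the coefficients of T(D).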
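Performance
The event error probability is defined as the probability that the decoder selects a code sequence that was not transmitted.
For two code sequences at Hamming distance d (d odd), the pairwise error probability on a binary symmetric channel with crossover probability $p < 1/2$ is
$PEP(d) = \sum_{i=(d+1)/2}^{d} \binom{d}{i} p^i (1-p)^{d-i} \le \sum_{i=(d+1)/2}^{d} \binom{d}{i} p^{d/2} (1-p)^{d/2} \le 2^d p^{d/2} (1-p)^{d/2} = \left(4p(1-p)\right)^{d/2}$
The upper bound for the event error probability is given by summing over all incorrect paths that diverge from and remerge with the correct path:
$P_{event} \le \sum_{d = d_{free}}^{\infty} a_d \, PEP(d)$, where $a_d$ is the number of code sequences of weight d.
[Figure: correct path and an incorrect path merging at a node]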
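A minimal Python sketch of the pairwise error probability on a binary symmetric channel and its bound $(4p(1-p))^{d/2}$. For even d the sketch counts a tie as an error with probability 1/2 (an assumption about tie-breaking). The weight counts $a_d$ = 1, 2, 4 are the first coefficients of T(D) from the transfer-function slide, so the sum computed here is only the leading part of the union bound. Names and the value p = 0.01 are illustrative.

```python
from math import comb


def pep(d, p):
    """Exact pairwise error probability for Hamming distance d on a BSC(p).
    ASSUMPTION: for even d, ties are broken by a fair coin."""
    total = sum(comb(d, i) * p**i * (1 - p) ** (d - i)
                for i in range(d // 2 + 1, d + 1))
    if d % 2 == 0:
        total += 0.5 * comb(d, d // 2) * (p * (1 - p)) ** (d // 2)
    return total


def pep_bound(d, p):
    """Upper bound (4p(1-p))^(d/2)."""
    return (4 * p * (1 - p)) ** (d / 2)


# Leading weight counts a_d from T(D) = D^5 + 2D^6 + 4D^7 + ...
spectrum = {5: 1, 6: 2, 7: 4}
p = 0.01
for d in spectrum:
    print(d, pep(d, p), pep_bound(d, p))
leading_union_terms = sum(a * pep(d, p) for d, a in spectrum.items())
print(leading_union_terms)
```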
Performance
Using T(D,L,N), we can formulate this as
$P_{event} \le T(D, L, N)\big|_{L = N = 1,\; D = 2\sqrt{p(1-p)}}$
The bit error rate (not probability) is written as
$P_{bit} \le \dfrac{dT(D, L, N)}{dN}\Big|_{L = 1,\; N = 1,\; D = 2\sqrt{p(1-p)}}$
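For the transfer function $T(D, L, N) = D^5 L^3 N / (1 - DNL(1+L))$, setting L = 1 gives $T(D, N) = D^5 N / (1 - 2DN)$, so $dT/dN = D^5 / (1 - 2DN)^2$. A minimal Python sketch evaluating both bounds at $D = 2\sqrt{p(1-p)}$; it assumes that transfer function and requires $2D < 1$ for the series to converge. The function name is hypothetical.

```python
from math import sqrt


def transfer_function_bounds(p):
    """Return (event bound, bit bound) for T(D, L, N) = D^5 L^3 N / (1 - DNL(1 + L))."""
    D = 2 * sqrt(p * (1 - p))          # value of D for a BSC with crossover p
    if 2 * D >= 1:
        raise ValueError("T(D) diverges: the bounds need 2D < 1")
    p_event = D**5 / (1 - 2 * D)       # T(D, L=1, N=1)
    p_bit = D**5 / (1 - 2 * D) ** 2    # dT/dN at L = 1, N = 1
    return p_event, p_bit


for p in (1e-3, 1e-2, 2e-2):
    print(p, transfer_function_bounds(p))
```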
The constraint length of the convolutional code: K = 1 + # memory elements.
Complexity of Viterbi decoding: proportional to $2^K$ (the trellis has $2^{K-1}$ different states, each with two outgoing branches for a rate-1/n code).
PERFORMANCE:
The theoretical uncoded BER is given by
$P_{uncoded} = Q\left(\sqrt{2E_b/N_0}\right)$
where $E_b$ is the energy per information bit.
For the uncoded channel, $E_s/N_0 = E_b/N_0$, since there is one channel symbol per bit.
For the coded channel with rate k/n, $nE_s = kE_b$ and thus $E_s = E_b\,k/n$.
The loss in the signal-to-noise ratio is thus $-10\log_{10}(k/n)$ dB.
For rate-1/2 codes we thus lose 3 dB in SNR at the receiver.
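A minimal Python sketch of these relations, using $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$ from the standard library. Function names and the example value of $E_b/N_0$ are illustrative; the loss computed is the reduction of $E_s/N_0$ relative to $E_b/N_0$ described above.

```python
from math import erfc, log10, sqrt


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * erfc(x / sqrt(2))


def uncoded_ber(eb_n0_db):
    """P_uncoded = Q(sqrt(2 Eb/N0)), with Eb/N0 given in dB."""
    eb_n0 = 10 ** (eb_n0_db / 10)
    return q_function(sqrt(2 * eb_n0))


def rate_loss_db(k, n):
    """Reduction of Es/N0 relative to Eb/N0 for a rate-k/n code: -10 log10(k/n)."""
    return -10 * log10(k / n)


eb_n0_db = 6.0
print(uncoded_ber(eb_n0_db))
print(rate_loss_db(1, 2))             # about 3.01 dB
print(eb_n0_db - rate_loss_db(1, 2))  # Es/N0 in dB for a rate-1/2 code
```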
Metric
We determine the Hamming distance between the received symbols and the code symbols.
d(x, y) is called a metric.
Properties:
$d(x, y) \ge 0$ (non-negativity)
$d(x, y) = 0$ if and only if $x = y$ (identity)
$d(x, y) = d(y, x)$ (symmetry)
$d(x, z) \le d(x, y) + d(y, z)$ (triangle inequality).
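The four properties can be checked exhaustively for short binary words. A minimal Python sketch; the function name and the word length of 3 are illustrative choices.

```python
from itertools import product


def hamming(x, y):
    """Hamming distance: number of positions where x and y differ."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(x, y))


# Check the four metric properties on all binary words of length 3.
words = list(product((0, 1), repeat=3))
for x, y, z in product(words, repeat=3):
    assert hamming(x, y) >= 0                              # non-negativity
    assert (hamming(x, y) == 0) == (x == y)                # identity
    assert hamming(x, y) == hamming(y, x)                  # symmetry
    assert hamming(x, z) <= hamming(x, y) + hamming(y, z)  # triangle inequality

print(hamming((0, 0, 1, 1), (0, 1, 1, 0)))  # 2
```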
Markov model for Dow Jones
Figure from Huang et al, via
Markov Model for Dow Jones
What is the probability of 5 consecutive up days?
Sequence is up-up-up-up-up,
i.e., state sequence is 1-1-1-1-1
$P(1,1,1,1,1) = \pi_1 \, a_{11} \, a_{11} \, a_{11} \, a_{11} = 0.5 \times (0.6)^4 = 0.0648$
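The same product can be written as a short function over a state sequence. A minimal Python sketch; only $\pi_1 = 0.5$ and $a_{11} = 0.6$ are taken from the slide, and the dictionary layout and function name are hypothetical.

```python
def sequence_probability(states, initial, transition):
    """P(s_1, ..., s_T) = pi_{s_1} * prod_t a_{s_(t-1) s_t} for an observable Markov chain."""
    prob = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= transition[(prev, cur)]
    return prob


initial = {1: 0.5}          # pi_1 (state 1 = "up")
transition = {(1, 1): 0.6}  # a_11
print(sequence_probability([1, 1, 1, 1, 1], initial, transition))  # approximately 0.0648
```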
Application to Hidden Markov Models
Definition:
The HMM is
a finite set of states, each of which is associated with a probability distribution.
Transitions among the states are governed by a set of probabilities called transition probabilities.
In a particular state an outcome or observation can be generated according to the associated probability distribution.
Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the outside, hence the name Hidden Markov Model.
EXAMPLE APPLICATION: speech recognition and synthesis
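Example HMM for Dow Jones (from Huang et al.)
[Figure: three-state HMM (states 1, 2, 3) with the transition probabilities below on its arcs]
Output probabilities, listed as P(up), P(down), P(no-change):
state 1: 0.7, 0.1, 0.2
state 2: 0.1, 0.6, 0.3
state 3: 0.3, 0.3, 0.4
Initial state probabilities: $\pi = (0.5, 0.2, 0.3)$
Transition matrix (row = current state, column = next state):
$A = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.5 & 0.3 & 0.2 \\ 0.4 & 0.1 & 0.5 \end{pmatrix}$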
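To illustrate the definition (a state sequence is generated, but only the outputs are visible), here is a minimal Python sketch that stores the parameters of the example HMM, as read from the figure, and samples from the model. Indices 0, 1, 2 stand for states 1, 2, 3; the function name and the fixed seed are illustrative.

```python
import random

# Parameters of the Dow Jones example; index 0, 1, 2 = state 1, 2, 3.
SYMBOLS = ["up", "down", "no-change"]
PI = [0.5, 0.2, 0.3]
A = [[0.6, 0.2, 0.2],
     [0.5, 0.3, 0.2],
     [0.4, 0.1, 0.5]]
B = [[0.7, 0.1, 0.2],
     [0.1, 0.6, 0.3],
     [0.3, 0.3, 0.4]]


def sample(length, rng):
    """Generate hidden states (numbered 1..3) and the visible observations."""
    states, observations = [], []
    s = rng.choices(range(3), weights=PI)[0]
    for _ in range(length):
        states.append(s + 1)
        observations.append(rng.choices(SYMBOLS, weights=B[s])[0])
        s = rng.choices(range(3), weights=A[s])[0]
    return states, observations


hidden, observed = sample(5, random.Random(0))
print(observed)  # an outside observer sees only this list
```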
Calculate Probability (observation | model)
Trellis for the observation UP, UP, UP, ... (parameters of the example HMM):
Time 1: state 1: 0.5*0.7 = 0.35; state 2: 0.2*0.1 = 0.02; state 3: 0.3*0.3 = 0.09; sum = 0.46
Time 2:
state 1: 0.35*0.6*0.7 + 0.02*0.5*0.7 + 0.09*0.4*0.7 ≈ 0.179
state 2: ≈ 0.008
state 3: 0.35*0.2*0.3 + 0.02*0.2*0.3 + 0.09*0.5*0.3 ≈ 0.036
sum ≈ 0.223
Time 3: state 1: 0.179*0.6*0.7 + 0.008*0.5*0.7 + 0.036*0.4*0.7, ...
Add probabilities!
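This trellis computation is the forward algorithm: at each time step the probabilities flowing into a state are added and then multiplied by that state's output probability. A minimal Python sketch with the example parameters (indices 0, 1, 2 for states 1, 2, 3; names hypothetical).

```python
PI = [0.5, 0.2, 0.3]        # initial state probabilities
A = [[0.6, 0.2, 0.2],       # A[i][j] = P(next state j | current state i)
     [0.5, 0.3, 0.2],
     [0.4, 0.1, 0.5]]
B = [[0.7, 0.1, 0.2],       # B[i] = [P(up), P(down), P(no-change)] in state i
     [0.1, 0.6, 0.3],
     [0.3, 0.3, 0.4]]
SYMBOL = {"up": 0, "down": 1, "no-change": 2}


def forward(observations):
    """Return P(observations | model) and the final column of the trellis."""
    o = SYMBOL[observations[0]]
    alpha = [PI[i] * B[i][o] for i in range(3)]
    for symbol in observations[1:]:
        o = SYMBOL[symbol]
        # add the probabilities of all paths entering state j, then emit
        alpha = [sum(alpha[i] * A[i][j] for i in range(3)) * B[j][o]
                 for j in range(3)]
    return sum(alpha), alpha


print(forward(["up"]))        # total about 0.46
print(forward(["up", "up"]))  # total about 0.2234, column about (0.1792, 0.0085, 0.0357)
```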
Calculate Probability (observation | model)
Note: The given algorithm calculates
$P(up, up, up, up) = \sum_{\text{all state sequences}} P(up, up, up, up, \text{state sequence})$
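The identity can be checked by brute force: enumerate every state sequence, compute the joint probability of the observation and that sequence, and add. A minimal Python sketch with the example parameters (names hypothetical); the number of terms grows as $3^T$, while the trellis work grows only linearly in T.

```python
from itertools import product

PI = [0.5, 0.2, 0.3]
A = [[0.6, 0.2, 0.2],
     [0.5, 0.3, 0.2],
     [0.4, 0.1, 0.5]]
B = [[0.7, 0.1, 0.2],
     [0.1, 0.6, 0.3],
     [0.3, 0.3, 0.4]]
SYMBOL = {"up": 0, "down": 1, "no-change": 2}


def joint_probability(observations, states):
    """P(observations, state sequence) for 0-based state indices."""
    p = PI[states[0]] * B[states[0]][SYMBOL[observations[0]]]
    for t in range(1, len(observations)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][SYMBOL[observations[t]]]
    return p


def brute_force_probability(observations):
    """Sum the joint probability over all 3^T state sequences."""
    return sum(joint_probability(observations, s)
               for s in product(range(3), repeat=len(observations)))


print(brute_force_probability(["up", "up"]))  # about 0.2234
print(brute_force_probability(["up", "up", "up"]))
```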
Calculate $\max_S$ Prob(up, up, up and state sequence S)
Observation is (UP, UP, UP, ...)
Time 1: 0.35 (state 1), 0.02 (state 2), 0.09 (state 3)
Time 2:
state 1: max(0.35*0.6*0.7, 0.02*0.5*0.7, 0.09*0.4*0.7) = 0.147
state 2: 0.007
state 3: max(0.35*0.2*0.3, 0.02*0.2*0.3, 0.09*0.5*0.3) = 0.021
Time 3: state 1: max(0.147*0.6*0.7, 0.007*0.5*0.7, 0.021*0.4*0.7), ...
Select highest probability! (the best path is marked in the figure)
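Replacing the sum of the forward trellis by a maximum, and remembering which predecessor won, gives the Viterbi algorithm for the HMM. A minimal Python sketch with the example parameters; states are reported as 1, 2, 3 as on the slides, and the names are hypothetical.

```python
PI = [0.5, 0.2, 0.3]
A = [[0.6, 0.2, 0.2],
     [0.5, 0.3, 0.2],
     [0.4, 0.1, 0.5]]
B = [[0.7, 0.1, 0.2],
     [0.1, 0.6, 0.3],
     [0.3, 0.3, 0.4]]
SYMBOL = {"up": 0, "down": 1, "no-change": 2}


def viterbi(observations):
    """Return max_S P(observations, S) and the maximizing state sequence (states 1..3)."""
    o = SYMBOL[observations[0]]
    delta = [PI[i] * B[i][o] for i in range(3)]
    backpointers = []
    for symbol in observations[1:]:
        o = SYMBOL[symbol]
        pointers, new_delta = [], []
        for j in range(3):
            # keep only the best predecessor instead of adding all of them
            best_i = max(range(3), key=lambda i: delta[i] * A[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        backpointers.append(pointers)
        delta = new_delta
    last = max(range(3), key=lambda i: delta[i])
    path = [last]
    for pointers in reversed(backpointers):  # trace the survivor backwards
        path.append(pointers[path[-1]])
    path.reverse()
    return delta[last], [s + 1 for s in path]


print(viterbi(["up", "up"]))  # probability about 0.147, path [1, 1]
```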
Calculate $\max_S$ Prob(up, up, up and state sequence S)
Note: The given algorithm calculates
$\max_{\text{state sequence}} P(up, up, up, up \text{ and state sequence}) = \max_{\text{state sequence}} P(\text{state sequence} \mid up, up, up, up) \; P(up, up, up, up)$
Since $P(up, up, up, up)$ does not depend on the state sequence, the same sequence maximizes both sides.
Hence, we find the most likely state sequence given the observation.
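Because $P(\text{observation})$ does not depend on the state sequence, maximizing the joint probability and maximizing the posterior select the same sequence. A minimal Python sketch that checks this by brute force for the observation (up, up) with the example parameters; names are hypothetical.

```python
from itertools import product

PI = [0.5, 0.2, 0.3]
A = [[0.6, 0.2, 0.2],
     [0.5, 0.3, 0.2],
     [0.4, 0.1, 0.5]]
B = [[0.7, 0.1, 0.2],
     [0.1, 0.6, 0.3],
     [0.3, 0.3, 0.4]]
SYMBOL = {"up": 0, "down": 1, "no-change": 2}


def joint_probability(observations, states):
    """P(observations, state sequence) for 0-based state indices."""
    p = PI[states[0]] * B[states[0]][SYMBOL[observations[0]]]
    for t in range(1, len(observations)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][SYMBOL[observations[t]]]
    return p


observations = ["up", "up"]
joint = {s: joint_probability(observations, s) for s in product(range(3), repeat=2)}
evidence = sum(joint.values())                           # P(up, up)
posterior = {s: p / evidence for s, p in joint.items()}  # P(S | up, up)

best_joint = max(joint, key=joint.get)
best_posterior = max(posterior, key=posterior.get)
print([s + 1 for s in best_joint], [s + 1 for s in best_posterior])  # [1, 1] [1, 1]
print(joint[best_joint], posterior[best_posterior])  # about 0.147 and 0.658
```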
06 June 2005 08:00 AM (GMT -05:00)
(From The Institute print edition)
Viterbi Receives Franklin Medal
As a youth, Life Fellow Andrew Viterbi never envisioned that he'd create an algorithm used in every cellphone or that he would cofound Qualcomm, a Fortune 500 company that is a worldwide leader in wireless technology.
Viterbi came up with the idea for that algorithm while he was an engineering professor at the University of California at Los Angeles (UCLA) and then at the University of California at San Diego (UCSD), in the 1960s. Today, the algorithm is used in digital cellphones and satellite receivers to transmit messages so they won't be lost in noise. The result is a clear, undamaged message thanks to a process called error correction coding. This algorithm is currently used in most cellphones.
"The algorithm was originally created for improving communication from space by being able to operate with a weak signal, but today it has a multitude of applications," Viterbi says.
For the algorithm, which carries his name, he was awarded this year's Benjamin Franklin Medal in electrical engineering by the Franklin Institute in Philadelphia, one of the United States' oldest centers of science education and development. The institute serves the public through its museum, outreach programs, and curatorial work. The medal, which Viterbi received in April, recognizes individuals who have benefited humanity, advanced science, and deepened the understanding of the universe. It also honors contributions in life sciences, physics, earth and environmental sciences, and computer and cognitive sciences.
Qualcomm wasn't the first company Viterbi started. In the late 1960s, he and some professors from UCLA and UCSD founded Linkabit, which developed a video scrambling system called Videocipher for the fledgling cable network Home Box Office. The Videocipher encrypts a video signal so hackers who haven't paid for the HBO service can't obtain it.
Viterbi, who immigrated to the United States as a four-year-old refugee from fascist Italy, left Linkabit to help start Qualcomm in 1985. One of the company's first successes was OmniTracs, a two-way satellite communication system used by truckers to communicate from the road with their home offices. The system involves signal processing and an antenna with a directional control that moves as the truck moves so the antenna always faces the satellite. OmniTracs today is the transportation industry's largest satellite-based commercial mobile system.
Another successful venture for the company was the creation of code-division multiple access (CDMA), which was introduced commercially in 1995 in cellphones and is still big today. CDMA is a spread-spectrum technology, which means it allows many users to occupy the same time and frequency allocations in a band or space. It assigns unique codes to each communication to differentiate it from others in the same spectrum.
Although Viterbi retired from Qualcomm as vice chairman and chief technical officer in 2000, he still keeps busy as the president of the Viterbi Group, a private investment company specializing in imaging technologies and biotechnology. He's also professor emeritus of electrical engineering systems at UCSD and distinguished visiting professor at Technion-Israel Institute of Technology in Technion City, Haifa. In March he and his wife donated US $52 million to the University of Southern California in Los Angeles, the largest amount the school ever received from a single donor.
To honor his generosity, USC renamed its engineering school the Andrew and Erna Viterbi School of Engineering. It is one of four in the nation to house two active National Science Foundation-supported engineering research centers: the Integrated Media Systems Center (which focuses on multimedia and Internet research) and the Biomimetic Research Center (which studies the use of technology to mimic biological systems).
Andrew Viterbi