Information Theory and Applications
(17B1NEC735)
Lecture 17: Module No. – 4 Data Transmission
Topics: Discrete memoryless channel (DMC),
Mutual information and Channel capacity of DMCs.
Learning Objectives:
2
Introduction to channel coding (data transmission)
3
Information theory (and coding) provides a system-level approach (and solutions)
• Information theory:
– What are the possibilities (and limitations) in terms of
performance trade-offs?
⇒ analysis problem (Module No. 4 and 5)
• Coding theory :
– How to build practical error-compensation systems ?
⇒ design problem (Module No - 6,7 and 8)
4
Important Remarks
• Analysis problem: solved for most of the channels (but not for
networks)
• Design problem: generally very difficult, solved in a satisfactory way
for a subset of problems only.
• Cost of the system approach: performance trade-off, computational
complexity, loss of real-time characteristics (in some cases).
• Cost of the physical approach: investment and power.
5
Communication System
6
Discrete Memoryless Channels
• Channel Representations:
– A communication channel is the path or medium through
which the symbols flow to the receiver.
– A discrete memoryless channel (DMC) is a statistical model
with an input X and output Y.
– During each use of the channel, it accepts an input
symbol from X and, in response, generates an output symbol
from Y.
– The channel is “discrete” in nature when the alphabets of X
and Y are both finite.
– It is “memoryless” when the current output depends on only
the current input and not on any of the previous inputs.
7
[Figure: DMC transition diagram, with each input $x_i$ connected to each output $y_j$ by the transition probability $p(y_j \mid x_i)$.]
8
Channel Matrix, P[Y|X]
• A channel matrix can be completely specified by the complete set
of transition probabilities.
$$P[Y|X] = \begin{bmatrix}
p(y_1|x_1) & p(y_2|x_1) & p(y_3|x_1) & \cdots & p(y_n|x_1) \\
p(y_1|x_2) & p(y_2|x_2) & p(y_3|x_2) & \cdots & p(y_n|x_2) \\
p(y_1|x_3) & p(y_2|x_3) & p(y_3|x_3) & \cdots & p(y_n|x_3) \\
\vdots & & & & \vdots \\
p(y_1|x_m) & p(y_2|x_m) & p(y_3|x_m) & \cdots & p(y_n|x_m)
\end{bmatrix}$$
• The matrix P[Y|X] is called the channel matrix. Since each input to
the channel results in some output, each row of the channel
matrix must sum to unity, i.e.,
$$\sum_{j=1}^{n} p(y_j \mid x_i) = 1 \qquad \text{for all } i$$
9
• Input probability matrix: $P[X] = [\,p(x_1)\ \ p(x_2)\ \cdots\ p(x_m)\,]$
• Output probability matrix: $P[Y] = [\,p(y_1)\ \ p(y_2)\ \cdots\ p(y_n)\,]$
and $P[Y] = P[X]\,P[Y|X]$.
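To illustrate the relation P[Y] = P[X] P[Y|X], here is a minimal Python/NumPy sketch (the array names are illustrative, not from the lecture):

```python
import numpy as np

# Input distribution P[X] (row vector) and channel matrix P[Y|X];
# each row of the channel matrix sums to 1.
p_x = np.array([0.5, 0.5])
p_y_given_x = np.array([[0.9, 0.1],
                        [0.1, 0.9]])

# Output distribution: P[Y] = P[X] @ P[Y|X]
p_y = p_x @ p_y_given_x
print(p_y)        # [0.5 0.5]
print(p_y.sum())  # 1.0 (sanity check)
```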
10
Mutual Information and Channel Capacity
11
Classification of Channels (or Special channels)
1. Lossless channel – the channel matrix has only one non-zero element in each column, e.g.,
$$P[Y|X] = \begin{bmatrix} 3/4 & 1/4 & 0 & 0 & 0 \\ 0 & 0 & 1/3 & 2/3 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$
12
Classification of Channels (or Special channels)
2. Deterministic channel – the channel matrix has only one non-zero element (equal to 1) in each row, e.g.,
$$P[Y|X] = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
13
Classification of Channels (or Special channels)
3. Noiseless channels = Lossless + Deterministic – only one non-zero element
in each row and column of channel matrix.
$$P[Y|X] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
14
Classification of Channels (or Special channels)
4. Binary symmetric channel (BSC) – the channel matrix is symmetric:
$$P[Y|X] = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$$
15
Symmetric Channels
• Symmetric channels play an important role in communication systems, and many such
systems attempt, by design, to achieve a symmetric
channel.
• Example (By system design): QPSK is a modulation method
that produces a symmetric channel.
• Reason: a symmetric channel frequently has greater channel capacity than an
otherwise equivalent non-symmetric channel.
• Here, we will study
1. Binary Symmetric Channel (BSC)
2. Binary Erasure Channel (BEC)
16
References for Module No. 4
• R.B. ASH: Information Theory, Dover, 1990.
• R.W. YEUNG: Information Theory and Network Coding, Springer,
2010.
• R. BOSE: Information Theory, Coding and Cryptography, Tata
McGraw-Hill Education, 2016.
• T.M. COVER, J.A. THOMAS: Elements of Information Theory,
Wiley, 2006.
• H.P. HSU: Analog and Digital Communication: Schaum’s Outline
Series, McGraw-Hill, 2017.
17
Information Theory and Applications
(17B1NEC735)
Lecture 18: Module No. – 4 Data Transmission
Topics: Channel capacity of BSC and BEC
Binary Symmetric Channel (BSC)
Channel Matrix:
$$P[Y|X] = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$$
[Channel model figure: inputs 0 and 1; each is received correctly with probability 1 − p and flipped with probability p.]
This is a binary channel in which the input symbols are
complemented with probability p.
This is the simplest model of a channel with errors, yet it
captures most of the complexity of the general problem.
When an error occurs, a 0 is received as a 1, and vice
versa.
The bits received do not reveal where the errors have
occurred.
2
Capacity of BSC (1/2)
$$H(Y|X) = p(x_1)\,H(Y \mid X = x_1) + p(x_2)\,H(Y \mid X = x_2)$$
Each conditional entropy is the entropy of a row of the channel matrix:
$$= p(x_1)\,H(1-p,\,p) + p(x_2)\,H(p,\,1-p) = \big(p(x_1)+p(x_2)\big)\,H(p,\,1-p) = H(p,\,1-p), \quad \text{since } \textstyle\sum_i p(x_i) = 1$$
So $H(Y \mid X = x_1) = H(Y \mid X = x_2)$, and $H(Y|X) = H(p)$, or
$$H(Y|X) = -p\log p - (1-p)\log(1-p)$$
3
Capacity of BSC (2/2)
Substituting H(Y|X) in C:
$$C = \max_{p(x_i)} I(X;Y) = \max_{p(x_i)}\big[\,H(Y) + p\log p + (1-p)\log(1-p)\,\big]$$
Since max H(Y) = 1 bit (attained for equiprobable inputs), the capacity of the BSC is C = 1 − H(p).
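A minimal numerical sketch of these expressions in Python (helper names are illustrative); it reproduces the values used in the examples that follow:

```python
import numpy as np

def H(probs):
    """Entropy in bits of a probability vector (0 log 0 taken as 0)."""
    p = np.array([q for q in probs if q > 0.0])
    return float(-(p * np.log2(p)).sum())

def bsc_mutual_information(p, alpha):
    """I(X;Y) for a BSC with crossover probability p and P(X = 0) = alpha."""
    p_y0 = alpha * (1 - p) + (1 - alpha) * p    # P(Y = 0)
    return H([p_y0, 1 - p_y0]) - H([p, 1 - p])  # H(Y) - H(Y|X)

def bsc_capacity(p):
    """Capacity of the BSC, achieved for equiprobable inputs."""
    return 1.0 - H([p, 1 - p])

print(bsc_mutual_information(0.1, 0.3))  # ~0.456 bit (case a)
print(bsc_capacity(0.1))                 # ~0.531 bit
print(bsc_capacity(0.5))                 # 0.0 bit (useless channel)
```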
5
Example (cont.):
Calculate Mutual Information and Channel Capacity of BSC:
a) p = 0.1, α = 0.3 c) p = 0.5, α = 0.3
b) p = 0.1, α = 0.5 d) p = 0.5, α = 0.5
where p is probability of error and α is the input probability of first symbol.
Solution:
b) Given: p = 0.1, α = 0.5
Therefore, here, $P[X] = [\,0.5 \quad 1-0.5 = 0.5\,]$ and $P[Y|X] = \begin{bmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{bmatrix}$
For BSC, H(Y|X) = H(p) = H(0.1, 0.9) = 0.469 bit, and P[Y] = P[X] P[Y|X] = [0.5  0.5], so H(Y) = 1 bit.
Hence I(X;Y) = H(Y) − H(Y|X) = 1 − 0.469 = 0.531 bit, and since the input is equiprobable, C = 1 − H(p) = 0.531 bit.
6
Example (cont.):
Calculate Mutual Information and Channel Capacity of BSC:
a) p = 0.1, α = 0.3 c) p = 0.5, α = 0.3
b) p = 0.1, α = 0.5 d) p = 0.5, α = 0.5
where p is probability of error and α is the input probability of first symbol.
Solution:
c) Given: p = 0.5, α = 0.3
Therefore, here, $P[X] = [\,0.3 \quad 1-0.3 = 0.7\,]$ and $P[Y|X] = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}$
For BSC, H(Y|X) = H(p), therefore H(Y|X) = H(0.5, 0.5) = 1 bit
$$P[Y] = P[X]\,P[Y|X] = [\,0.3 \ \ 0.7\,]\begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix} = [\,0.5 \ \ 0.5\,]$$
H(Y) = H(0.5, 0.5) = 1 bit
Substituting H(Y) and H(Y|X) in I(X;Y), and H(Y|X) in C of the BSC:
I(X;Y) = H(Y) − H(Y|X) = 1 − 1 = 0
and also C = 1 − H(Y|X) = 1 − 1 = 0 bit. (Useless channel)
7
Example (cont.):
Calculate Mutual Information and Channel Capacity of BSC:
a) p = 0.1, α = 0.3 c) p = 0.5, α = 0.3
b) p = 0.1, α = 0.5 d) p = 0.5, α = 0.5
where p is probability of error and α is the input probability of first symbol.
Solution:
d) Given: p = 0.5, α = 0.5
Therefore, here, $P[X] = [\,0.5 \quad 1-0.5 = 0.5\,]$ and $P[Y|X] = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}$
For BSC, H(Y|X) = H(p), therefore H(Y|X) = H(0.5, 0.5) = 1 bit
$$P[Y] = P[X]\,P[Y|X] = [\,0.5 \ \ 0.5\,]\begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix} = [\,0.5 \ \ 0.5\,]$$
H(Y) = 1 bit
H(Y|X) = H(Y) = 1 bit
Therefore, C = I(X;Y) = 0 bit. (Useless Channel)
8
Example (contd.)
Summary of Results
Case   p     α     I(X;Y)   C
1      0.1   0.3   0.456    0.531
2      0.1   0.5   0.531    0.531
3      0.5   0.3   0        0
4      0.5   0.5   0        0
(Row 2 follows from I(X;Y) = C = 1 − H(0.1) = 0.531 at α = 0.5.)
9
Binary Erasure Channel (BEC)
Channel Matrix:
$$P[Y|X] = \begin{bmatrix} 1-p & p & 0 \\ 0 & p & 1-p \end{bmatrix}$$
[Channel model figure: input $x_1$ (probability $p(x_1)$) and input $x_2$ (probability $p(x_2)$); each input is received correctly with probability $1-p$ and erased with probability $p$.]
Channel Model
• The analog of the BSC in which some bits are lost (rather than
corrupted) is the BEC.
• In this channel, a fraction p of the bits are erased. The receiver
knows which bits have been erased.
• The BEC has two inputs and three outputs.
10
Capacity of BEC (1/3)
Use I(X;Y) = H(Y) − H(Y|X) for the capacity of the BEC. First find
$$H(Y|X) = \sum_i p(x_i)\,H(Y \mid X = x_i) = p(x_1)\,H(Y \mid X = x_1) + p(x_2)\,H(Y \mid X = x_2)$$
$$= p(x_1)\,H(1-p,\,p,\,0) + p(x_2)\,H(0,\,p,\,1-p) = \big(p(x_1)+p(x_2)\big)\,H(1-p,\,p) = H(1-p,\,p)$$
So $H(Y|X) = H(Y \mid X = x_1) = H(Y \mid X = x_2) = H(p) = -p\log p - (1-p)\log(1-p)$, the same as for the BSC.
12
Capacity of BEC (3/3)
$$C = \max_{p(x_i)} I(X;Y) = \max_{p(x_i)}\big[\,H(Y) - H(Y|X)\,\big] = \max_{p(x_i)} H(Y) - H(Y|X)$$
since H(Y|X) = H(p) does not depend on the input distribution. Using H(Y) = (1 − p)H(X) + H(Y|X),
$$C = \max_{p(x_i)}\big[(1-p)H(X) + H(Y|X)\big] - H(Y|X) = (1-p)\max_{p(x_i)} H(X) = 1-p$$
• The expression for the capacity has some intuitive meaning:
– Since a proportion p of the bits are lost in the channel, we
can recover (at most) a proportion 1 − p of the bits. Hence
the capacity is at most 1 − p.
– It is not immediately obvious that it is possible to achieve
this rate. This will follow from Shannon’s second theorem.
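A minimal numerical check of C = 1 − p (a Python sketch, illustrative names), using the fact from the derivation above that I(X;Y) = (1 − p)H(X) for the BEC:

```python
import numpy as np

def H2(a):
    """Binary entropy function in bits."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * np.log2(a) - (1 - a) * np.log2(1 - a)

def bec_mutual_information(p, alpha):
    """I(X;Y) = (1 - p) * H(alpha) for a BEC with erasure probability p."""
    return (1 - p) * H2(alpha)

p = 0.5
alphas = np.linspace(0.0, 1.0, 1001)
capacity = max(bec_mutual_information(p, a) for a in alphas)
print(capacity)  # ~0.5, i.e. 1 - p, attained at alpha = 0.5
```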
13
Example: Calculate Mutual Information and Channel Capacity of BEC:
a) p = 0.1, α = 0.3 c) p = 0.5, α = 0.3
b) p = 0.1, α = 0.5 d) p = 0.5, α = 0.5
where p is probability of error and α is the input probability of first symbol
Solution:
a) Given: p = 0.1, α = 0.3
Therefore, here, $P[X] = [\,0.3 \quad 1-0.3 = 0.7\,]$ and $P[Y|X] = \begin{bmatrix} 0.9 & 0.1 & 0 \\ 0 & 0.1 & 0.9 \end{bmatrix}$
For BEC, H(Y|X) = H(p), therefore H(Y|X) = H(0.1, 0.9) = 0.469 bit
$$P[Y] = P[X]\,P[Y|X] = [\,0.3 \ \ 0.7\,]\begin{bmatrix} 0.9 & 0.1 & 0 \\ 0 & 0.1 & 0.9 \end{bmatrix} = [\,0.27 \ \ 0.1 \ \ 0.63\,]$$
14
Example (cont.):
Calculate Mutual Information and Channel Capacity of BEC:
a) p = 0.1, α = 0.3 c) p = 0.5, α = 0.3
b) p = 0.1, α = 0.5 d) p = 0.5, α = 0.5
where p is probability of error and α is the input probability of first symbol.
Solution:
b) Given: p = 0.1, α = 0.5
Therefore, here, $P[X] = [\,0.5 \quad 1-0.5 = 0.5\,]$ and $P[Y|X] = \begin{bmatrix} 0.9 & 0.1 & 0 \\ 0 & 0.1 & 0.9 \end{bmatrix}$
15
Example (cont.):
Calculate Mutual Information and Channel Capacity of BEC:
a) p = 0.1, α = 0.3 c) p = 0.5, α = 0.3
b) p = 0.1, α = 0.5 d) p = 0.5, α = 0.5
where p is probability of error and α is the input probability of first symbol.
Solution:
c) Given: p = 0.5, α = 0.3
Therefore, here, $P[X] = [\,0.3 \quad 1-0.3 = 0.7\,]$ and $P[Y|X] = \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0 & 0.5 & 0.5 \end{bmatrix}$
For BEC, H(Y|X) = H(p), therefore H(Y|X) = H(0.5, 0.5) = 1 bit
$$P[Y] = P[X]\,P[Y|X] = [\,0.3 \ \ 0.7\,]\begin{bmatrix} 0.5 & 0.5 & 0 \\ 0 & 0.5 & 0.5 \end{bmatrix} = [\,0.15 \ \ 0.5 \ \ 0.35\,]$$
16
Example (cont.):
Calculate Mutual Information and Channel Capacity of BEC:
a) p = 0.1, α = 0.3 c) p = 0.5, α = 0.3
b) p = 0.1, α = 0.5 d) p = 0.5, α = 0.5
where p is probability of error and α is the input probability of first symbol.
Solution:
d) Given: p = 0.5, α = 0.5
Therefore, here, $P[X] = [\,0.5 \quad 1-0.5 = 0.5\,]$ and $P[Y|X] = \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0 & 0.5 & 0.5 \end{bmatrix}$
For BEC, H(Y|X) = H(p), therefore H(Y|X) = H(0.5, 0.5) = 1 bit
$$P[Y] = P[X]\,P[Y|X] = [\,0.5 \ \ 0.5\,]\begin{bmatrix} 0.5 & 0.5 & 0 \\ 0 & 0.5 & 0.5 \end{bmatrix} = [\,0.25 \ \ 0.5 \ \ 0.25\,]$$
17
Example (contd.)
Summary of Results
Case   p     α     I(X;Y)   C
1      0.1   0.3   0.793    0.9
2      0.1   0.5   0.9      0.9
3      0.5   0.3   0.441    0.5
4      0.5   0.5   0.5      0.5
(Rows 2–4 follow from I(X;Y) = (1 − p)H(α) and C = 1 − p.)
18
References for Module No. 4
• R.B. ASH: Information Theory, Dover, 1990.
• R.W. YEUNG: Information Theory and Network Coding,
Springer, 2010.
• R. BOSE: Information Theory, Coding and Cryptography,
Tata McGraw-Hill Education, 2016.
• T.M. COVER, J.A. THOMAS: Elements of Information
Theory, Wiley, 2006.
19
Information Theory and Applications
(17B1NEC735)
Lecture 19: Module No. – 4 Data Transmission
Topics: Channel Capacity Examples, Channel Coding Theorem
Example: For the binary-input channel with the channel matrix shown below and input probabilities $p(x_1) = \alpha$, $p(x_2) = 1-\alpha$, find the mutual information and channel capacity. Start from
$$H(Y|X) = p(x_1)\,H(Y \mid X = x_1) + p(x_2)\,H(Y \mid X = x_2)$$
2
Example (cont.):
Assume $p(x_1) = \alpha$, so $p(x_2) = 1-\alpha$.
$$H(Y|X) = p(x_1)\,H(1/2,\,1/2,\,0) + p(x_2)\,H(0,\,0,\,1) = p(x_1)(\log 2) + p(x_2)(0) = \alpha\log 2$$
For H(Y), find
$$P[Y] = P[X]\,P[Y|X] = [\,\alpha \ \ 1-\alpha\,]\begin{bmatrix} 1/2 & 1/2 & 0 \\ 0 & 0 & 1 \end{bmatrix} = [\,\alpha/2 \ \ \alpha/2 \ \ (1-\alpha)\,] = [\,p(y_1) \ \ p(y_2) \ \ p(y_3)\,]$$
$$H(Y) = H(\alpha/2,\,\alpha/2,\,1-\alpha) = -\alpha\log(\alpha/2) - (1-\alpha)\log(1-\alpha) = \alpha\log 2 - \alpha\log\alpha - (1-\alpha)\log(1-\alpha)$$
Substituting H(Y) and H(Y|X) in I(X;Y):
$$I(X;Y) = \alpha\log 2 - \alpha\log\alpha - (1-\alpha)\log(1-\alpha) - \alpha\log 2 = -\alpha\log\alpha - (1-\alpha)\log(1-\alpha) = H(X)$$
3
Cascade Interconnection of BSCs
[Figure: cascade of two BSCs, X → BSC 1 → W → BSC 2 → Y, with intermediate symbols w₁ = 0 and w₂ = 1.]
$$P[W|X] = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix} \qquad P[Y|W] = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$$
Channel Matrix 1 and Channel Matrix 2
$$P[Y] = P[W]\,P[Y|W] = P[X]\,P[W|X]\,P[Y|W]$$
Also $P[Y] = P[X]\,P[Y|X]$.
Therefore, $P[Y|X] = P[W|X]\,P[Y|W]$.
Example: For p = 0.1, find the overall channel matrix.
$$P[Y|X] = P[W|X]\,P[Y|W] = \begin{bmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{bmatrix}\begin{bmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{bmatrix} = \begin{bmatrix} 0.82 & 0.18 \\ 0.18 & 0.82 \end{bmatrix}$$
So the overall crossover probability is p = 0.18 when two BSCs are interconnected in cascade.
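The same computation as a short Python/NumPy sketch (the helper name bsc is illustrative):

```python
import numpy as np

def bsc(p):
    """Channel matrix of a BSC with crossover probability p."""
    return np.array([[1 - p, p],
                     [p, 1 - p]])

# Two identical BSCs in cascade: P[Y|X] = P[W|X] @ P[Y|W]
p = 0.1
overall = bsc(p) @ bsc(p)
print(overall)          # [[0.82 0.18] [0.18 0.82]]

# The overall crossover probability is 2p(1-p)
print(2 * p * (1 - p))  # 0.18
```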
4
Example: Two BSCs are connected in cascade:
Find overall channel matrix, P[Z] and Channel Capacity C. Assume p(x1)=0.6.
5
Properties of Channel Capacity
• C ≥ 0, since I(X;Y) ≥ 0 (non-negative).
• C ≤ log m, since C = max I(X;Y) ≤ max H(X) = log m, where m is the number of input symbols.
• Also C ≤ log n, where n is the number of output symbols.
• I(X;Y) is a continuous function of p(x).
• I(X;Y) is a concave function of p(x). Since I(X;Y) is a concave function over a closed convex set, a local maximum is a global maximum.
• In general, there is no closed-form solution for the
capacity. But for many simple channels it is possible to
calculate the capacity using properties such as symmetry.
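Since there is in general no closed-form solution, the capacity can also be estimated numerically. Below is a minimal brute-force sketch in Python for a binary-input channel (illustrative only; a general-purpose method would be the Blahut–Arimoto algorithm):

```python
import numpy as np

def mutual_information(p_x, P):
    """I(X;Y) in bits for input distribution p_x and channel matrix P (rows sum to 1)."""
    p_y = p_x @ P
    I = 0.0
    for i, px in enumerate(p_x):
        for j, pyx in enumerate(P[i]):
            if px > 0 and pyx > 0:
                I += px * pyx * np.log2(pyx / p_y[j])
    return I

# Example: binary erasure channel with p = 0.2; capacity should be 1 - p = 0.8
P = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.2, 0.8]])
alphas = np.linspace(0.0, 1.0, 2001)
C = max(mutual_information(np.array([a, 1 - a]), P) for a in alphas)
print(C)  # ~0.8 bit per channel use
```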
6
Channel Coding Theorem (1/6)
(or Shannon’s fundamental (or Second )Theorem)
• The noise in a channel causes errors between the input and output data sequences of a digital communication system.
• In most applications, this raw level of reliability (error probabilities as high as 0.1 for a wireless communication channel) is unacceptable. To achieve high performance, channel coding is required.
7
Channel Coding Theorem (2/6)
• The design goal of channel coding is to increase the resistance of
digital communication systems to channel noise.
• There is a channel encoder in the transmitter and a channel decoder in the receiver.
• The channel encoding and decoding blocks are under the designer's control and are chosen to optimize the overall reliability of the communication system.
• This involves adding redundancy in the channel encoder so that the original sequence can be reconstructed as accurately as possible.
• The accurate reconstruction of the original source sequence at the
destination requires that the average probability of symbol error be
arbitrarily low.
8
Channel Coding Theorem (3/6)
• The two important questions are:
1) Does there exist a channel coding scheme such that the probability that a message bit is in error is less than any given positive number?
2) Can such a channel coding scheme still be efficient, in the sense that the code rate r (= number of message bits / number of bits in the coded block) does not have to be small?
9
Channel Coding Theorem (4/6)
• Assumptions:
• The DMS has source alphabet S and entropy H(S), and emits one symbol every Ts seconds. Hence, the average information rate of this source is H(S)/Ts bits per second.
• The DMC has a channel capacity equal to C bits per use of channel.
• Further, the channel is capable of being used once every Tc seconds.
• Hence, the channel capacity per unit time is C/Tc bits per second,
which represents the maximum rate of information transfer over the
channel.
10
Channel Coding Theorem (5/6)
i) If H(S)/Ts ≤ C/Tc, there exists a coding scheme for which the source output can be transmitted over the channel and reconstructed with an arbitrarily small probability of error.
11
Channel Coding Theorem (6/6)
ii) Conversely, if H(S)/Ts > C/Tc, it is not possible to transmit information over the channel and reconstruct it with an arbitrarily small probability of error.
12
Application of Channel Coding Theorem to BSCs
13
Example: Repetition Codes
Consider a transmission error probability p = 1/4 and code length n = 3.
[Figure: encoder maps 0 → 000 and 1 → 111; the decoder maps the received word Y to 0 if two or more of its bits are 0, and to 1 if two or more are 1.]
• Encoder- Repetition Codes (repeat each source bit n times, n is odd)
• Decoder -Majority Selector (Declare 0 if greater than n/2 of the received
bits are 0, otherwise decode 1)
• Channel - BSC
• Now, the average probability of error (m + 1 or more bit errors) for a repetition code of
length n = 2m + 1 = 3 will be
$$P_e = \sum_{k=m+1}^{n} \binom{n}{k}\, p^k (1-p)^{\,n-k}$$
14
• Similarly, the average probability of error for a repetition code of length n = 5 (m = 2) will be
$$P_e = \sum_{k=m+1}^{n} \binom{n}{k}\, p^k (1-p)^{\,n-k}$$
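A short Python sketch of this computation (the function name is illustrative), evaluated at p = 1/4:

```python
from math import comb

def repetition_error_prob(n, p):
    """Probability that majority-vote decoding fails for an n = 2m+1 repetition code on a BSC(p)."""
    m = (n - 1) // 2
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m + 1, n + 1))

p = 0.25
for n in (1, 3, 5, 7):
    print(n, repetition_error_prob(n, p))
# Pe drops as n grows (~0.156 for n = 3, ~0.104 for n = 5),
# but the code rate r = 1/n drops as well.
```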
15
Average probability of error Pe for repetition codes
Code used   Pe   Code rate r   Remarks
16
References
• R.B. ASH: Information Theory, Dover, 1990.
• R.W. YEUNG: Information Theory and Network Coding,
Springer, 2010.
• R. BOSE: Information Theory, Coding and Cryptography,
Tata McGraw-Hill Education, 2016.
• T.M. COVER, J.A. THOMAS: Elements of Information
Theory, Wiley, 2006.
• H.P. HSU: Analog and Digital Communication: Schaum’s
Outline Series, McGraw-Hill, 2017.
• S. HAYKIN: Digital communication system, Wiley &
sons, 2014.
17
Information Theory and Applications
(17B1NEC735)
Lecture 20: Module No. – 4 Data Transmission
Topics: Capacity of a band-limited AWGN channel
2
Capacity of a band-limited AWGN channel
– Information Capacity Theorem (2/10)
Since $I(X;Y) = h(Y) - h(Y|X)$, the channel capacity is
$$C = \max_{p(x)} I(X;Y)$$
[Figure: band-limited AWGN channel, input waveform X(t), samples $X_k$, output waveform Y(t).]
3
Capacity of a band-limited AWGN channel
– Information Capacity Theorem (3/10)
4
Capacity of a band-limited AWGN channel
– Information Capacity Theorem (4/10)
The channel output samples are $Y_k = X_k + N_k$, $k = 1, 2, \ldots, K$.
The noise sample $N_k$ is Gaussian with zero mean and variance $\sigma^2 = N_0 B$.
Assume that the samples $Y_k$, $k = 1, 2, \ldots, K$, are statistically independent.
The channel used here is the discrete-time memoryless Gaussian channel.
5
Capacity of a band-limited AWGN channel
– Information Capacity Theorem (5/10)
$$E[X_k^2] = P, \qquad k = 1, 2, \ldots, K \quad \text{(transmit power constraint)}$$
6
Capacity of a band-limited AWGN channel
– Information Capacity Theorem (6/10)
7
Capacity of a band-limited AWGN channel –
Information Capacity Theorem (7/10)
8
Capacity of a band-limited AWGN channel –
Information Capacity Theorem (8/10)
$$h(N_k) = \tfrac{1}{2}\log_2\!\big(2\pi e\,\sigma^2\big)$$
9
Capacity of a band-limited AWGN channel –
Information Capacity Theorem (9/10)
10
Capacity of a band-limited AWGN channel –
Information Capacity Theorem (10/10)
$$C = B\log_2\!\Big(1 + \frac{P}{N_0 B}\Big) \ \text{bits per second}$$
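A small sketch evaluating this formula in Python (the function name awgn_capacity is illustrative, not from the lecture):

```python
import math

def awgn_capacity(bandwidth_hz, signal_power_w, noise_psd_n0):
    """C = B log2(1 + P / (N0 B)) in bits per second for a band-limited AWGN channel."""
    snr = signal_power_w / (noise_psd_n0 * bandwidth_hz)
    return bandwidth_hz * math.log2(1 + snr)

# Telephone-channel example below: B = 4 kHz, SNR = 25 dB
B = 4000.0
snr = 10 ** (25 / 10)             # ~316.23
print(B * math.log2(1 + snr))     # ~33,240 bps

# AWGN example below: B = 12 kHz, N0/2 = 1e-10 W/Hz, P = 0.5 mW
print(awgn_capacity(12e3, 0.5e-3, 2e-10))  # ~92,500 bps
```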
11
Example: Consider telephone channel with bandwidth 4 kHz
and SNR= 25dB. Let us model it as a band-limited and power-
limited Gaussian channel and find its channel capacity.
Solution:
Note: P/N0B also commonly known as Signal-to-Noise ratio (SNR).
Given: SNR in dB = 25 dB.
Using 25 = 10log10(SNR), SNR = 316.22.
Channel Bandwidth B = 4 kHz or 4000 Hz
The channel capacity is
$$C = B\log_2(1 + \mathrm{SNR}) = 4\times10^{3}\,\log_2(1 + 316.22) \approx 33{,}238 \ \text{bps}$$
• The ITU V.90 modem, that operates on the telephone line, provides
a data rate of 33,600 bps. Thus, the ITU V.90 modem operates very
close to the capacity of the channel. If the channel SNR is poor, a
lower data rate is used by the modem.
12
Example: Consider an AWGN channel with 12 kHz
bandwidth and noise power spectral density N0/2 = 10-10
W/Hz. The transmitted power is 0.50 mW. Calculate
capacity of this channel.
Solution:
Given: B = 12 kHz
N₀/2 = 10⁻¹⁰ W/Hz
P = 0.50 mW
N₀B = 2 × 10⁻¹⁰ × 12 × 10³ = 24 × 10⁻⁷ W
Substituting P, B, and N₀B in C:
$$C = B\log_2\!\Big(1 + \frac{P}{N_0 B}\Big) = 12000\,\log_2\!\Big(1 + \frac{0.5\times10^{-3}}{24\times10^{-7}}\Big) \approx 92{,}516 \ \text{bps}$$
13
Information Capacity (or Shannon-Hartley ) Theorem
14
Information Capacity (or Shannon-Hartley ) Theorem
15
References
16
Information Theory and Applications
(17B1NEC735)
Lecture 21: Module No. – 4 Data Transmission
Topics: Implications of Information Capacity Theorem
2
Implications of Information Capacity Theorem (1/6)
3
Implications of Information Capacity Theorem (2/6)
• The average transmitted power can be expressed as
P = EbC where Eb is the transmitted energy per bit.
Applying P = Eb C, the information capacity theorem for this ideal system is defined by
$$\frac{C}{B} = \log_2\!\Big(1 + \frac{E_b}{N_0}\cdot\frac{C}{B}\Big)$$
4
Implications of Information Capacity Theorem (3/6)
• A plot of bandwidth efficiency Rb/B versus Eb/N0 is called
bandwidth efficiency diagram.
• The ideal system is represented by the line Rb= C.
For the ideal system (Rb = C), solving for Eb/N0 gives
$$\frac{E_b}{N_0} = \frac{2^{C/B} - 1}{C/B}$$
In the limit as B → ∞ (so that the spectral efficiency C/B → 0), with x = C/B,
$$\lim_{C/B \to 0}\frac{E_b}{N_0} = \lim_{x\to 0}\frac{2^{x} - 1}{x} = \ln 2 \approx 0.693 \ (-1.6\ \text{dB}),$$
which is the Shannon limit.
6
Implications of Information Capacity Theorem (5/6)
In the limit as B → ∞, the channel capacity approaches
$$\lim_{B\to\infty} C = \lim_{x\to 0}\frac{\log_2\!\big(1 + \frac{P}{N_0}\,x\big)}{x} = \frac{P}{N_0}\log_2 e, \qquad x = 1/B$$
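These limits are easy to check numerically; a small Python sketch (symbols follow the slides, function name illustrative):

```python
import math

def ebn0_ideal(spectral_efficiency):
    """Eb/N0 (linear) required by the ideal system at spectral efficiency x = C/B."""
    x = spectral_efficiency
    return (2**x - 1) / x

for x in (2.0, 1.0, 0.1, 0.001):
    print(x, 10 * math.log10(ebn0_ideal(x)), "dB")

# As C/B -> 0 the required Eb/N0 approaches ln 2 ~ 0.693, i.e. about -1.59 dB
print(10 * math.log10(math.log(2)))
```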
7
Implications of Information Capacity Theorem (6/6)
2. The capacity boundary, defined by the curve for the critical bit rate Rb = C, separates combinations of system parameters that have the potential for supporting error-free transmission (Rb < C) from those for which error-free transmission is not possible (Rb > C).
8
Example: Show that, according to the Shannon–Hartley law, if the transmitted power is equal to the noise power, the channel capacity C in bps is equal to the bandwidth B in Hz.
Solution:
Given
Given P = N₀B. Substituting P in C:
$$C = B\log_2\!\Big(1 + \frac{P}{N_0 B}\Big) = B\log_2(1 + 1) = B$$
9
Example: Consider an AWGN channel with 12 kHz bandwidth and noise
spectral density N0/2 = 10-10 W/Hz. The received power is 0.55 mW.
Calculate capacity of this channel.
Solution:
Given: B = 12 kHz
N₀/2 = 10⁻¹⁰ W/Hz
P + N₀B = 0.55 mW (the received power includes the noise power)
N₀B = 2 × 10⁻¹⁰ × 12 × 10³ = 24 × 10⁻⁷ W
$$C = B\log_2\!\Big(1 + \frac{P}{N_0 B}\Big) = B\log_2\!\Big(\frac{P + N_0 B}{N_0 B}\Big) = 12000\,\log_2\!\Big(\frac{0.55\times10^{-3}}{24\times10^{-7}}\Big) \approx 94{,}080 \ \text{bps}$$
10
Example: Consider transmission over a telephone line with a bandwidth B = 4
kHz. This is an analog channel which can be considered as perturbed by
AWGN, and for which power signal-to-noise ratio is at least 25 dB. Calculate
(a) Capacity of this channel, and (b) required signal-to-noise ratio to transmit
an M-ary signal able to carry 16000 bps?
a) From the previous example, C = B log₂(1 + SNR) = 4000 log₂(1 + 316.22) ≈ 33,238 bps.
b) Rb = 16000 bps. Since Rb < C, error-free transmission is possible; the required SNR satisfies 16000 ≤ 4000 log₂(1 + SNR), i.e., SNR ≥ 2⁴ − 1 = 15 (about 11.8 dB).
11
Example: An analog signal having 4 kHz bandwidth is sampled at
1.25 times the Nyquist rate, and each sample is quantized into one
of 256 equally likely levels.
Assume that the successive samples are statistically independent.
12
Example (cont.):
a) The sampling rate is 1.25 × (2 × 4000) = 10,000 samples/s, and each sample carries log₂ 256 = 8 bits, so the source information rate is Rb = 80,000 bps.
b) C = B log₂(1 + SNR) = 10000 log₂(1 + 100) ≈ 66,589 bps
Since Rb = 80,000 bps > C, error-free transmission is not possible.
13
References
• R. BOSE: Information Theory, Coding and Cryptography,
Tata McGraw-Hill Education, 2016.
• T.M. COVER, J. A. THOMAS: Elements of Information
Theory, Wiley, 2006.
• S. HAYKIN: Communication Systems, 4th edition, Wiley, 2006.
• R.W. YEUNG: Information Theory and Network Coding,
Springer, 2010.
14
Information Theory and Applications
(17B1NEC735)
1
Coding for reliable digital
transmission and storage (1/7)
• Task for the designer of a digital communication system:
• Provide a cost-effective system for transmitting information
from one end of the system to the other at a rate and a level of reliability
and quality that are acceptable to the user.
• Key parameters
1. Transmitted signal power
2. Channel bandwidth
• These two parameters, together with the power spectral density
(psd) of the receiver noise, determine the signal energy per bit to
noise psd ratio, Eb/N0.
• This ratio Eb/N0 uniquely determines the bit error rate (BER) for
a particular modulation technique. Practical considerations
usually limit the value of Eb/N0 that can be used.
2
Coding for reliable digital
transmission and storage (2/7)
• In practice, modulation alone often fails to achieve the required
data quality (error performance).
• For a fixed Eb/N0, the only practical option available for
improving data quality from problematic to acceptable is
to use error control coding.
• Another practical motivation for the use of error control coding is
to reduce the required Eb/N0 for a fixed BER. The reduced
Eb/N0 requirement in turn reduces the required transmitter
power, or reduces hardware costs by reducing the size of the required
antenna in radio communications.
• Error control for improving data quality – by means of forward
error correction (FEC).
• Error correcting codes, as the name suggests, are used for
correcting errors when messages are transmitted over a noisy
channel or stored data is retrieved.
3
Coding for reliable digital
transmission and storage (3/7)
4
Coding for reliable digital
transmission and storage (4/7)
5
Coding for reliable digital
transmission and storage (5/7)
• The channel is the medium over which the information is conveyed.
• One type of channel conveys information between two distinct places.
• Examples of these channels are telephone lines, internet cables,
fiber-optic lines, microwave radio channels, high frequency channels,
cell phone channels, etc.
• Another type of channel conveys information between two distinct times.
• For example, by writing information onto a computer disk, then
retrieving it at a later time. Hard drives, diskettes, CD-ROMs, DVDs,
and solid state memory are also examples of channels.
6
Coding for reliable digital
transmission and storage (6/7)
• Receiver
• The demodulator/equalizer receives the signal from the channel and
converts it into a sequence of symbols. This typically involves many
functions, such as filtering, demodulation, carrier synchronization,
symbol timing estimation, frame synchronization, and matched
filtering, followed by a detection step in which decisions about the
transmitted symbols are made.
• The channel decoder in the receiver exploits the redundancy to
decide which message bits were actually transmitted.
• The decrypter removes any encryption.
• The source decoder provides an uncompressed representation of
the data.
• The sink is the ultimate destination of the data.
• The combined goal of the channel encoder and decoder block is to
minimize effect of channel noise (minimizing error between channel
encoder input and channel decoder output ).
7
Coding for reliable digital
transmission and storage (7/7)
8
Introduction to Error Control Codes (1/3)
• On the basis of memory, error control codes can be broadly classified into:
1. Block Codes – no memory
o To generate an (n, k) block code, the channel encoder accepts
information in successive k-bit blocks;
o For each block, n − k redundant bits, which are related algebraically to the k message bits, are added;
o an overall encoded block of n bits is produced, where n > k. The n-bit block is called a codeword, and n is called the block length of the code.
[Codeword structure: n bits = k message bits followed by n − k redundant bits]
9
Introduction to Error Control Codes (2/3)
1. Block Codes – no memory (cont.)
o The code rate is dimensionless, but the source data rate and the channel data rate are both measured in bits per second.
10
Introduction to Error Control Codes (3/3)
2. Convolutional Codes - memory
o The encoding operation – discrete-time convolution of the
input sequence with the impulse response of the encoder.
o The duration of the impulse response = memory of the
encoder.
o Therefore, the convolutional encoder operates on the incoming
message sequence using a sliding window equal in
duration to its own memory.
o Unlike a block code, a convolutional channel encoder accepts
the message bits as a continuous sequence and thereby
generates a continuous sequence of encoded bits at a higher
rate.
The operations of channel coding and modulation are
performed separately in the transmitter, and likewise the operations
of detection and decoding at the receiver; a major concern in doing so is
bandwidth efficiency.
11
Error Control Methods (1/1)
The two different approaches are:
1. Forward error correction (FEC) – depends on the controlled use
of redundancy in the transmitted codeword for both the
detection and correction of errors incurred during the
course of transmission over a noisy channel.
No further processing is performed even in the case of a transmission failure.
Here, only a one-way link is required between transmitter
and receiver.
12
Discrete Memoryless Channels (DMCs) (1/3)
• The waveform channel is said to be memoryless if the detector
output in a given interval depends only on the signal transmitted in
that interval, and not on any previous transmission.
• Under this condition, DMC = combination of the modulator,
waveform channel, and detector (or demodulator).
• Such a DMC is completely characterized by the set of channel
transition probabilities (or transmission probabilities) p(yj|xi)
where xi denotes a modulator input symbol, yj denotes a
demodulator output symbol, and p(yj|xi) denotes the probability of
receiving symbol j given that symbol i was sent.
• The simplest DMC – binary symmetric channel (BSC)
• Only binary inputs and outputs symbols (0 and 1).
• To maintain binary output symbols, a hard decision is made on the demodulator output as to which symbol was actually transmitted.
13
Discrete Memoryless Channels (DMCs) (2/3)
• The BSC, assuming the channel noise is modeled as additive white Gaussian noise (AWGN), is completely described by the transition probability p, as shown in the figure below.
[Figure: BSC transition diagram with crossover probability p]
14
Discrete Memoryless Channels (DMCs) (3/3)
• However, hard decisions also cause an irreversible loss of information.
15
References
16
Information Theory and Applications
(17B1NEC735)
Lecture 23: Module No. 5 Error Control Coding
Topics - ML decoding and Performance measures
1
Minimum distance decoding and
Maximum Likelihood (ML) decoding (1/8)
2
Minimum distance decoding and
Maximum Likelihood (ML) decoding (2/8)
• Importance of Hamming distance and Hamming weight in
algebraic error correction (cont.)
3
Minimum distance decoding and
Maximum Likelihood (ML) decoding (3/8)
4
Minimum distance decoding and
Maximum Likelihood (ML) decoding (4/8)
• Decoding procedure: From a probabilistic (Bayesian) point of view
5
Minimum distance decoding and
Maximum Likelihood (ML) decoding (5/8)
• Decoding procedure (cont.) :
$$\max_{c\in C} P(X = c \mid Y = r) = \max_{c\in C} \frac{P(Y = r \mid X = c)\,P(X = c)}{P(Y = r)}$$
• All the codewords are equally likely (maximum entropy assumption), so
$$\max_{c\in C} P(X = c \mid Y = r) = \max_{c\in C} P(Y = r \mid X = c)$$
6
Minimum distance decoding and
Maximum Likelihood (ML) decoding (6/8)
• In the case of a DMC, P(Y = r | X = c) factors into the product of the
transmission probabilities of the individual symbols.
• For BSC, where all symbols have the same error probability p, this
probability simply becomes:
$$P(Y = r \mid X = c) = p^{\,d(c,r)}\,(1-p)^{\,n-d(c,r)}$$
where $d(c,r)$ is the number of places in which $c$ and $r$ differ, and $n - d(c,r)$ is the number of places in which they are the same.
• Codeword that maximizes P(Y = r| X = c) is the one that minimizes d(c, r).
• Assuming p<½, this corresponds to choosing the value for which d(c, r) is
the smallest, i.e., the codeword c nearest to r in hamming distance.
• This proves that for BSC, a minimum distance decoding and maximum
likelihood decoding are equivalent.
7
Minimum distance decoding and
Maximum Likelihood (ML) decoding (7/8)
• Example: Given the message set M = {00, 01, 10, 11} and its code C = {0000, 0101, 1011, 1110}, the word r = 1000 is received at the output of a BSC with transition probability p = 0.1. Find the transmitted codeword.
8
Minimum distance decoding and
Maximum Likelihood (ML) decoding (8/8)
• Example (cont.):
d(1000, 0000) = 1
d(1000, 0101) = 3
d(1000, 1011) = 2
d(1000, 1110) = 2
The transmitted codeword is taken to be c = 0000, which is at minimum distance from 1000.
9
Soft vs Hard decisions
• In hard-decision decoding, the received word is
compared with all possible codewords, and the
codeword that gives the minimum Hamming distance is
selected.
• In soft-decision decoding, the received word is
compared with all possible codewords, and the
codeword that gives the minimum Euclidean distance is
selected.
• Thus soft-decision decoding improves the decision-making
process by supplying additional reliability
information (the calculated Euclidean distance or the calculated
log-likelihood ratio).
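The difference can be sketched in a few lines of Python. The received samples and the bit-to-level mapping below are illustrative assumptions, not values from the lecture: hard decisions compare Hamming distances on thresholded bits, soft decisions compare Euclidean distances on the raw received values.

```python
import numpy as np

codebook = ["000", "011", "101", "110"]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def euclidean(received, codeword):
    # Map bits to antipodal levels: 0 -> -1, 1 -> +1 (illustrative convention)
    ref = np.array([1.0 if b == "1" else -1.0 for b in codeword])
    return float(np.sum((received - ref) ** 2))

received = np.array([-0.2, -0.1, 0.7])                       # noisy samples (illustrative)
hard_bits = "".join("1" if x > 0 else "0" for x in received)  # "001", not a codeword

# Hard decision: three codewords are equidistant, so the choice is ambiguous
hard_choice = min(codebook, key=lambda c: hamming(c, hard_bits))
# Soft decision: the Euclidean metric singles out "011"
soft_choice = min(codebook, key=lambda c: euclidean(received, c))
print(hard_bits, hard_choice, soft_choice)
```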
10
Hard Decision Decoding (1/2)
11
Hard Decision Decoding (2/2)
All possible codewords   Hard-decision output   Hamming distance
000                      001                    1
011                      001                    1
101                      001                    1
110                      001                    3
The decoder may choose any of the three equidistant codewords, so the
probability of selecting the correct codeword ("011", which is what
was transmitted) is only 1/3.
12
Soft-Decision Decoding (1/3)
13
Soft-Decision Decoding (2/3)
14
Soft-Decision Decoding (3/3)
The minimum Euclidean distance is “0.49” corresponding
to “011” codeword (which is what we transmitted).
15
Performance Measures of Linear Codes (1/2)
• There are several different ways that we can characterize the error
detecting and correcting capabilities of codes at the output of the
channel decoder.
1. P(E) is the probability of decoder error, also known as the word
error rate. This is the probability that the codeword at the output
of the decoder is not the same as the codeword at the output of the
encoder (i.e., the transmitted codeword).
2. Pb(E) or Pb is the probability of bit error, also known as the bit
error rate. This is the probability that the decoded message bits
(extracted from a decoded codeword of a binary code) are not the
same as the encoded message bits. Note that when a decoder
error occurs, there may be anywhere from one to k message bits
in error, depending on what codeword is sent, what codeword was
decoded, and the mapping from message bits to codewords.
3. Pu (E) is the probability of undetected codeword error, the
probability that errors occurring in a codeword are not detected.
16
Performance Measures of Linear Codes (2/2)
17
Objectives of Good Error Control Coding
• An Error Control Code for a channel, represented by the channel
transition probability matrix P[Y|X], consists of:
(i) A message set.
(ii) An encoding function which maps each message to a unique
codeword. The set of codewords is called a codebook.
(iii) A decoding function which makes a guess based on a decoding
strategy in order to map back the received vector to one of the
possible messages.
• The objectives of a good error control coding scheme are as
follows:
(i) High error correcting capability in terms of the number of errors
that it can rectify,
(ii) Fast encoding of the message, i.e., an efficient coding strategy,
(iii) Fast and efficient decoding of the received message,
(iv) Maximum transfer of information per unit time (i.e., fewer
overheads in terms of the redundancy).
18
References
19
Information Theory and Applications
(17B1NEC735)
1
Introduction
• Linear block codes form a group and a vector space.
• Hence, the study of the properties of this class of codes benefits
from a formal introduction to these concepts.
• The codes, in turn, reinforce the concepts of groups and subgroups
that are valuable in the study of block codes.
• This study of groups leads us to cyclic groups, subgroups, cosets, and
factor groups.
2
Groups (1/8)
• A group formalizes some of the basic rules of arithmetic necessary
for cancellation and solution of some simple algebraic equations.
• Definition 1: A binary operation ∗ on a set is a rule that assigns to
each ordered pair of elements of the set (a , b) some element of
the set. (Since the operation returns an element in the set, this is
actually defined as closed binary operation. We assume that all
binary operations are closed.)
• Example 1: On the set of positive integers, we can define a binary
operation ∗ by a ∗ b = min(a, b).
• Example 2: On the set of real numbers, we can define a binary
operation ∗ by a ∗ b = a (i.e., the first argument).
• Example 3: On the set of real numbers, we can define a binary
operation ∗ by a ∗ b = a + b. That is, the binary operation is regular
addition.
3
Groups (2/8)
• Definition 2: A group 〈G, ∗〉 is a set G together with a binary
operation ∗ on G such that:
G1 The operator is associative: for any a , b, c ϵ G , (a ∗ b) ∗ c =
a ∗ (b ∗ c).
G2 There is an element e ϵ G called the identity element such
that a ∗ e = e ∗ a = a for all a ϵ G.
G3 For every a ϵ G, there is an element b ϵ G known as the
inverse of a such that a ∗ b = e . The inverse of a is sometimes
denoted as a⁻¹ (when the operator ∗ is multiplication-like) or as
−a (when the operator ∗ is addition-like).
4
Groups (3/8)
• Definition 3: If G has a finite number of elements, it is said to be a
finite group. The order of a finite group G, denoted |G|, is the
number of elements in G.
• This definition of order (of a group) is to be distinguished from
the order of an element (see cyclic group).
• Definition 4: A group 〈G, ∗〉 is commutative if a ∗ b = b ∗ a for
every a, b ϵ G.
• Example 4 The set 〈ℤ, +〉, which is the set of integers under
addition, forms a group. The identity element is 0, since 0 + a = a
+ 0 = a for any a ϵ ℤ. The inverse of any a ϵ ℤ is -a.
• This is a commutative group.
• As a matter of convention, a group that is commutative with an
additive-like operator is said to be an Abelian group (by
mathematician N.H. Abel).
5
Groups (4/8)
• Example 5 The set 〈ℤ, · 〉, the set of integers under multiplication,
does not form a group. There is a multiplicative identity, 1, but
there is not a multiplicative inverse for every element in ℤ.
• Example 6 The set 〈ℚ \ {0}, ·〉, the set of rational numbers
excluding 0, is a group with identity element I. The inverse of an
element a is a -l = l/a.
• The requirements on a group are strong enough to introduce the
idea of cancellation. In a group G, if a ∗ b = a ∗ c, then b = c (this is
left cancellation). To see this, let a-l be the inverse of a in G. Then
a-1 ∗ (a ∗ b ) = a-I ∗ (a ∗ c ) = (a-1 ∗ a ) ∗ c = e ∗ c = c and a-1∗ (a ∗ b)
= (a-1∗ a ) ∗ b = e ∗ b = b, by the properties of associativity and
identity .
• Under group requirements, we can also verify that solutions to
linear equations of the form a ∗ x = b are unique. Using the group
properties we get immediately that x = a⁻¹ ∗ b. If x1 and x2 are two
solutions, such that a ∗ x1 = b = a ∗ x2, then by cancellation we get
immediately that x1 = x2.
6
Groups (5/8)
• Example 7 Let 〈ℤ5, +〉 denote addition on the numbers {0, 1,2, 3,4}
modulo 5. The operation is demonstrated in tabular form in the
table below:
+ 0 1 2 3 4
0 0 1 2 3 4
1 1 2 3 4 0
2 2 3 4 0 1
3 3 4 0 1 2
4 4 0 1 2 3
• Example 8 Consider multiplication modulo 6 on the set {1, 2, 3, 4, 5}, shown in tabular form below:
·   1   2   3   4   5
1   1   2   3   4   5
2   2   4   0   2   4
3   3   0   3   0   3
4   4   2   0   4   2
5   5   4   3   2   1
• The number 1 acts as an identity, but this does not form a group,
since not every element has a multiplicative inverse. In fact, the
only elements that have a multiplicative inverse are those that
are relatively prime to 6, that is, those numbers that don’t share
a divisor with 6 other than one.
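A short Python sketch that builds these two operation tables and checks which elements have inverses (the helper names are illustrative):

```python
def cayley_table(elements, op):
    """Return the operation table as a dict {(a, b): a op b}."""
    return {(a, b): op(a, b) for a in elements for b in elements}

# <Z5, +>: every element has an additive inverse, so this is a group.
z5 = range(5)
add5 = cayley_table(z5, lambda a, b: (a + b) % 5)
print(all(any(add5[(a, b)] == 0 for b in z5) for a in z5))   # True

# Multiplication modulo 6 on {1,...,5}: only elements coprime to 6 have inverses.
elems = range(1, 6)
mul6 = cayley_table(elems, lambda a, b: (a * b) % 6)
invertible = [a for a in elems if any(mul6[(a, b)] == 1 for b in elems)]
print(invertible)   # [1, 5]
```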
8
Groups (7/8)
• One way to construct groups is to take the Cartesian, or direct,
product of groups. Given groups 〈GI, ∗〉, 〈G2, ∗〉, . . . , 〈Gr, ∗〉, the
direct product group G1 x G2 x . . . x Gr has elements (a1,a2 ,. . . , ar),
where each ai ϵ Gi.
• The operation in G is defined element-by-element. That is, if
(a1, a2, ..., ar) ∈ G and (b1, b2, ..., br) ∈ G,
then (a1, a2, ..., ar) ∗ (b1, b2, ..., br) = (a1 ∗ b1, a2 ∗ b2, ..., ar ∗ br).
• Example 9 The group 〈 ℤ2 x ℤ2, +〉 consists of two-tuples with
addition defined element-by-element modulo two. An addition table for the group is shown here:
• This group is called the Klein 4-group.
+       (0,0)  (0,1)  (1,0)  (1,1)
(0,0)   (0,0)  (0,1)  (1,0)  (1,1)
(0,1)   (0,1)  (0,0)  (1,1)  (1,0)
(1,0)   (1,0)  (1,1)  (0,0)  (0,1)
(1,1)   (1,1)  (1,0)  (0,1)  (0,0)
9
Groups (8/8)
• Example 10 This example introduces the idea of permutations as
elements in a group. It is interesting because it introduces a group
operation that is function composition, as opposed to the mostly
arithmetic group operations presented to this point.
• A permutation of a set A is a one-to-one, onto function (a
bijection) of a set A onto itself.
• Let A be a set of n integers. For example,
A = {1,2,3,4}
• A permutation p can be written in the notation
$$p = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 1 & 2 \end{pmatrix}$$
which means that 1 → 3, 2 → 4, 3 → 1, 4 → 2.
• There are n! different permutations on n distinct elements.
• The set of all n ! permutations on n elements forms a group,
where the group operation is function composition. This group is
referred to as the symmetric group on n letters. The group is
commonly denoted by Sn .
10
Subgroups (1/2)
• Definition 5: A subgroup 〈H , ∗〉 of a group 〈G, ∗〉 is a group formed
from a subset of elements in a group G with the same operation ∗.
• Notationally, we may write H < G to indicate that H is a subgroup of
G.
11
Subgroups (2/2)
• Example 11 Let G = 〈ℤ6, +〉, the set of numbers {0,1,2,3,4,5} using
addition modulo 6.
• Let H = 〈{0, 2, 4}, +〉, with addition taken modulo 6. As a set, H ⊂ G.
It can be shown that H forms a group.
• Let K = 〈{0, 3}, + 〉, with addition taken modulo 6. Then K is a
subgroup of G.
Addition modulo 6 on G = {0, 1, 2, 3, 4, 5}:
+   0   1   2   3   4   5
0   0   1   2   3   4   5
1   1   2   3   4   5   0
2   2   3   4   5   0   1
3   3   4   5   0   1   2
4   4   5   0   1   2   3
5   5   0   1   2   3   4

Subgroup H = {0, 2, 4}:        Subgroup K = {0, 3}:
+   0   2   4                  +   0   3
0   0   2   4                  0   0   3
2   2   4   0                  3   3   0
4   4   0   2
12
Cyclic Groups (1/3)
• In a group G with operation ∗ (or a multiplicative operation), use the notation $a^n$ to indicate a ∗ a ∗ ··· ∗ a, with the operand a appearing n times. Thus $a^1 = a$, $a^2 = a ∗ a$, etc.
• Take $a^0$ to be the identity element in the group G.
• Use $a^{-2}$ to indicate $(a^{-1})(a^{-1})$, and $a^{-n}$ to indicate $(a^{-1})^n$.
13
Cyclic Groups (2/3)
• Definition 6: For any a ∈ G, the set $\{a^n \mid n \in \mathbb{Z}\}$ generates a subgroup of G called the cyclic subgroup. The element a is said to be the generator of the subgroup. The cyclic subgroup generated by a is denoted as 〈a〉.
• Definition 7: If every element of a group can be generated by a
single element, the group is said to be cyclic.
• Example 13 The group 〈ℤ5,+〉, is cyclic, since every element in the
set can be generated by a = 2 (under the appropriate addition law):
2,
2+2=4,
2+2+2=1,
2+2+2+2=3,
2 + 2 + 2 + 2 + 2 = 0. In this case, ℤ5 = 〈2〉
Observe that there are several generators for ℤ5 .
14
Cyclic Groups (3/3)
• In a commutative group, the left and right cosets are the same.
16
References
17
Information Theory and Applications
(17B1NEC735)
1
Fields (1/4)
• This concept is introduced here since fields are used in defining vector spaces and simple linear block codes.
2
Fields (2/4)
• F4 Associativity: (a + b) + c = a + (b + c ) for every a , b, c ϵ F.
• F5 Commutativity: a + b = b + a for every a, b ϵ F.
• The first four requirements mean that the elements of F form a
group under addition; with the fifth requirement, a commutative
group is obtained.
3
Fields (3/4)
• F9 Associativity: (a · b) · c = a · (b · c) for every a, b, c ϵ F.
• F10 Commutativity: a · b = b · a for every a, b ϵ F.
• Thus the non-zero elements of F form a commutative group under
multiplication.
• F11 Multiplication distributes over addition:
a · (b + c) = a · b + a · c.
4
Fields (4/4)
• Example 15 The field with two elements in it, F2 = ℤ2 = GF(2) has the
following addition and multiplication tables.
• Ex-Or AND
+ 0 1 · 0 1
0 0 1 0 0 0
1 1 0 1 0 1
• The field GF(2) is very important, since it is the field where the operations involved
in binary codes work.
• Example 16 The field F5 = ℤ5 = GF(5) has the following addition and
multiplication tables:
+ 0 1 2 3 4 · 0 1 2 3 4
0 0 1 2 3 4 0 0 0 0 0 0
1 1 2 3 4 0 1 0 1 2 3 4
2 2 3 4 0 1 2 0 2 4 1 3
3 3 4 0 1 2 3 0 3 1 4 2
4 4 0 1 2 3 4 0 4 3 2 1
• There is a similarly constructed field for every prime p, denoted by either GF(p) or ℤp.
5
Vector Spaces (1/10)
• Linear block codes are based on concepts from linear algebra.
• Definition 11: Let V be a set of elements called vectors and let F be a
field of elements called scalars. An addition operation + is defined
between vectors. A scalar multiplication operation · (or juxtaposition) is
defined such that for a scalar a ϵ F and a vector v ϵ V, a · v ϵ V. Then V is
a vector space over F if + and · satisfy the following:
• V1 V forms a commutative group under +.
• V2 For any element a ϵ F and v ϵ V, a · v ϵ V .
• Combining V1 and V2 , we must have a · v + b · w ϵ V for every v, w ϵ V
and a, b ϵ F.
• V3 The operations + and · distribute:
( a + b ) · v = a · v + b · v and a · ( u + v ) = a · u + a · v
for all scalars a, b ϵ F and vectors v, u ϵ V.
• V4 The operation · is associative: (a · b) · v = a · (b · v) for all a, b ϵ F and
v ϵ V.
• F is called the scalar field of the vector space V.
6
Vector Spaces (2/10)
7
Vector Spaces (3/10)
$$G = [\,v_1 \ \ v_2 \ \cdots \ v_k\,]$$
$$a_1 v_1 + a_2 v_2 + \cdots + a_k v_k = [\,v_1 \ \ v_2 \ \cdots \ v_k\,]\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_k \end{bmatrix}$$
8
Vector Spaces (4/10)
$$a_1 v_1 + a_2 v_2 + \cdots + a_k v_k = [\,a_1 \ \ a_2 \ \cdots \ a_k\,]\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_k \end{bmatrix}$$
9
Vector Spaces (5/10)
10
Vector Spaces (6/10)
11
Vector Spaces (7/10)
• The set G is a spanning set for W, but it is not a spanning set for V.
However, G is not a basis for W; the set G has some redundancy in it,
since the third vector is a linear combination of the first two. The vectors
in G are not linearly independent. The third vector in G can be removed,
resulting in the set G′, which consists of the first two (linearly independent) vectors of G.
• No spanning set for W has fewer vectors in it than does G’, so dim(W) =
2.
• Theorem : Let V be a k-dimensional vector space defined over a scalar
field with finite number of elements q in it. Then the number of
elements in V is IVI = qk.
• Definition 16: Let V be a vector space over a scalar field Fq and let W c V
be a vector space. That is, for any w1 and w2 ϵ W , awl + bw2 ϵ W for any
a, b ϵ F. Then W is called a vector subspace (or simply a subspace) of F.
12
Vector Spaces (8/10)
Inner (dot) product:
$$u \cdot v = \langle u, v \rangle = \sum_{i=0}^{n-1} u_i v_i$$
13
Vector Spaces (9/10)
• Definition 19: Let W be a k-dimensional subspace of a vector space V.
The set of all vectors u ϵ V which are orthogonal to all the vectors of W
is called the dual space of W (sometimes called the orthogonal
complement of W or nullspace), denoted as W⊥. (This symbol
sometimes pronounced "W perp," for "perpendicular.")
That is, W⊥ = {u ∈ V : u · w = 0 for all w ∈ W}.
14
Vector Spaces (10/10)
Also, W⊥ is spanned by two vectors, so dim(W⊥) = 2.
15
References
16
Information Theory and Applications
(17B1NEC735)
1
Linear Block Codes – Basic Definitions (1/4)
2
Linear Block Codes – Basic Definitions (2/4)
• This binary information sequence is divided into message
block of fixed length k, denoted by m.
• Each message block consists of k information bits - total of
2k distinct messages.
Channel Encoder
3
Linear Block Codes – Basic Definitions (3/4)
• For error correction purposes, block code must have one-
to-one correspondence between a message m and its
codeword c.
• However, for a given code C, there may be more than one
possible way of mapping messages to codewords.
4
Linear Block Codes – Basic Definitions (4/4)
• Definition: A block code C over a field F, of q symbols of length n
and qk codewords is a q-ary linear (n, k) code if and only if its qk
codewords form a k-dimensional vector subspace of the vector
space of all the n-tuples Fqn .
• The number n is said to be the length of the code and the
number k is the dimension of the code. The rate of the code is R
= k/n.
• For a linear code, the sum of any two codewords is also a
codeword. More generally, any linear combination of codewords
is a codeword.
For example, for a (256, 200) binary linear block code, only k = 200 basis codewords (n-tuples) need to be stored, instead of the 2^k codewords required for a non-linear block code.
[Figure: message m (k bits) → (n, k) linear block encoder → codeword c (n bits); 2^k distinct messages map to 2^k distinct n-tuple codewords.]
5
Error Detection and Correction Capability of Code (1/4)
8
Error Detection and Correction Capability of code (4/4)
9
Linear Block Codes – Examples (1/2)
• Example 1: Binary Repetition Code of length n = 3, k = 1,
C = {000, 111}
[Encoder mapping: 0 → 000, 1 → 111 — a (3, 1, 3) repetition linear block coder.]
10
Linear Block Codes – Examples (2/2)
11
Generation of Linear Block Codes (1/8)
$$c = m_0 g_0 + m_1 g_1 + \cdots + m_{k-1} g_{k-1}$$
$$G = \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{k-1} \end{bmatrix} = \begin{bmatrix} g_{0,0} & g_{0,1} & \cdots & g_{0,n-1} \\ g_{1,0} & g_{1,1} & \cdots & g_{1,n-1} \\ \vdots & & & \vdots \\ g_{k-1,0} & g_{k-1,1} & \cdots & g_{k-1,n-1} \end{bmatrix}_{k\times n}$$
12
Generation of Linear Block Codes (2/8)
13
Generation of Linear Block Codes (3/8)
• The representation of the code provided by G is not unique.
• Definition: Two binary linear codes which are the same except
for a permutation of the components of the code are said to be
equivalent codes.
• Let G and G‘ be generator matrices of equivalent codes. Then G
and G’ are related by the following operations:
1. Column permutations
2. Elementary row operations
15
Generation of Linear Block Codes (5/8)
Example 4: C = {0000, 1010, 0101, 1111 }
• Let us choose two non-null, linearly independent codewords to create G:
$$G = \begin{bmatrix} g_0 \\ g_1 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}_{k\times n}$$
• Since m is a 2-bit message, the number of distinct messages is 2² = 4, i.e., M = {00, 01, 10, 11}. (For binary codes, + is modulo-2 addition.)
Find C:
$$c_0 = mG = [0\ 0]\begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} = [0\ 0\ 0\ 0] \qquad c_1 = mG = [0\ 1]\begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} = [1\ 0\ 1\ 0]$$
$$c_2 = mG = [1\ 0]\begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} = [0\ 1\ 0\ 1] \qquad c_3 = mG = [1\ 1]\begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} = [1\ 1\ 1\ 1]$$
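As a quick check, the codewords of Example 4 can be reproduced with a short Python/NumPy sketch (illustrative; modulo-2 arithmetic is done with % 2):

```python
import numpy as np
from itertools import product

G = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0]])   # generator matrix (k x n) over GF(2)

# Encode every k-bit message: c = m G (mod 2)
for m in product([0, 1], repeat=G.shape[0]):
    c = (np.array(m) @ G) % 2
    print(m, c)
# (0,0)->0000, (0,1)->1010, (1,0)->0101, (1,1)->1111
```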
16
Generation of Linear Block Codes (6/8)
• Now, find G′ by elementary row operations (row₂ ← row₂ + row₁):
$$G = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}_{k\times n} \;\longrightarrow\; G' = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$
$$c_0 = [0\ 0]\,G' = [0\ 0\ 0\ 0] \qquad c_1 = [0\ 1]\,G' = [1\ 1\ 1\ 1]$$
$$c_2 = [1\ 0]\,G' = [0\ 1\ 0\ 1] \qquad c_3 = [1\ 1]\,G' = [1\ 0\ 1\ 0]$$
• Observe that C = {0000, 1010, 0101, 1111} and C′ = {0000, 1111, 0101, 1010}; the two are equivalent codes.
17
Generation of Linear Block Codes (7/8)
An alternative method to find linear block code
Suppose message m = [mo m1], and codeword c = [co c1 c2 c3] = [mo m1]G
$$G = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}_{k\times n}$$
c = mG
m0 m1 | c0 = m1   c1 = m0   c2 = m1   c3 = m0 | w
0  0  |   0         0         0         0     | 0
0  1  |   1         0         1         0     | 2
1  0  |   0         1         0         1     | 2
1  1  |   1         1         1         1     | 4
c′ = mG′
m0 m1 | c0 = m1   c1 = m0+m1   c2 = m1   c3 = m0+m1 | w
0  0  |   0           0           0           0     | 0
0  1  |   1           1           1           1     | 4
1  0  |   0           1           0           1     | 2
1  1  |   1           0           1           0     | 2
$$G = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}_{k\times n} \quad\text{and}\quad c = mG = [\,m_0 \ \ m_1 \ \ m_0{+}m_1\,]$$
c = mG
m0 m1 | c0 = m0   c1 = m1   c2 = m0+m1 | w
0  0  |   0         0          0       | 0
0  1  |   0         1          1       | 2
1  0  |   1         0          1       | 2
1  1  |   1         1          0       | 2
m0 m1 m2 | c0 c1 c2 c3 c4 c5 | w
0  0  0  | 0  0  0  0  0  0  | 0
0  0  1  | 0  1  1  1  0  0  | 3
0  1  0  | 0  1  0  0  1  1  | 3
0  1  1  | 0  0  1  1  1  1  | 4
1  0  0  | 1  0  0  1  1  0  | 3
1  0  1  | 1  1  1  0  1  0  | 4
1  1  0  | 1  1  0  1  0  1  | 4
1  1  1  | 1  0  1  0  0  1  | 3
$$G = \begin{bmatrix} 1&1&0&1&0&0&0 \\ 0&1&1&0&1&0&0 \\ 0&0&1&1&0&1&0 \\ 0&0&0&1&1&0&1 \end{bmatrix} \;\longrightarrow\; G' = \begin{bmatrix} 1&0&1&1&1&0&0 \\ 0&1&1&0&1&0&0 \\ 0&0&1&1&0&1&0 \\ 0&0&0&1&1&0&1 \end{bmatrix} \quad (\text{row}_1 \leftarrow \text{row}_1 + \text{row}_2)$$
• Encode the message m = [1 0 1 0]:
• c = mG = [1 1 1 0 0 1 0]
• And c′ = mG′ = [1 0 0 0 1 1 0]
22
Hamming bound for binary (n, k) code (1/2)
For a binary (n, k) code
Total number of valid message words = 2k
Total number of valid codewords (1 for each message) = 2k
Total number of available words or vectors = 2n
• For n = 3, if 000 is a valid codeword and a single error occurs, then any one of the
following three invalid combinations may be generated:
• 100, 010, 001 (at Hamming distance 1), i.e., 3 invalid combinations = ³C₁.
• Similarly, for any n-bit codeword, if j errors occur (received word at Hamming distance j),
the number of invalid patterns (error patterns) is ⁿCⱼ.
To correct up to t errors, every codeword together with its surrounding error patterns must be distinct, so
$$2^n \ge 2^k + 2^k\sum_{j=1}^{t}\binom{n}{j}, \quad \text{i.e.,} \quad 2^n \ge 2^k\sum_{j=0}^{t}\binom{n}{j} \quad \text{or} \quad 2^{\,n-k} \ge \sum_{j=0}^{t}\binom{n}{j}$$
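A quick sketch of the Hamming bound as a feasibility check in Python (the example parameters are illustrative; for instance, a binary (7, 4) code can correct t = 1 error, while a (10, 7) code cannot correct t = 2):

```python
from math import comb

def hamming_bound_ok(n, k, t):
    """True if 2^(n-k) >= sum_{j=0}^{t} C(n, j), a necessary condition for t-error correction."""
    return 2 ** (n - k) >= sum(comb(n, j) for j in range(t + 1))

print(hamming_bound_ok(7, 4, 1))    # True  (the (7,4) Hamming code even meets it with equality)
print(hamming_bound_ok(10, 7, 2))   # False (2^3 = 8 < 1 + 10 + 45)
print(hamming_bound_ok(3, 1, 1))    # True  (the (3,1) repetition code)
```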
24
Systematic Linear Block Codes
• In non-systematic (or asystematic) codes, the message bits and redundant (parity) bits are intermingled, i.e., there are no separate boundaries between the data and parity (redundant) bits.
• In a systematic code, the message bits and parity bits occupy separate, fixed positions in the codeword:
[Codeword structure: n bits = k message bits | n − k parity bits]
25
Generation of Systematic Linear Block Codes (1/5)
26
Generation of Systematic Linear Block Codes(2/5)
$$G = \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{k-1} \end{bmatrix} = \begin{bmatrix}
p_{0,0} & p_{0,1} & \cdots & p_{0,n-k-1} & 1 & 0 & \cdots & 0 \\
p_{1,0} & p_{1,1} & \cdots & p_{1,n-k-1} & 0 & 1 & \cdots & 0 \\
\vdots & & & \vdots & & & \ddots & \\
p_{k-1,0} & p_{k-1,1} & \cdots & p_{k-1,n-k-1} & 0 & 0 & \cdots & 1
\end{bmatrix}_{k\times n} = [\,P_{k\times(n-k)} \mid I_k\,]$$
(parity matrix P followed by the identity matrix $I_k$)
27
Generation of Systematic Linear Block Codes (3/5)
• Example 8: Find systematic G’ by elementary operations and systematic
code.
$$G = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}_{k\times n} \;\longrightarrow\; G' = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix} \qquad (\text{row}_1 \leftrightarrow \text{row}_2)$$
$$c_0 = [0\ 0]\,G' = [0\ 0\ 0\ 0] \qquad c_1 = [0\ 1]\,G' = [0\ 1\ 0\ 1]$$
$$c_2 = [1\ 0]\,G' = [1\ 0\ 1\ 0] \qquad c_3 = [1\ 1]\,G' = [1\ 1\ 1\ 1]$$
• Observe that C = {0000, 1010, 0101, 1111} and the systematic C′ = {0000, 0101, 1010, 1111}; the two are equivalent codes.
28
Generation of Systematic Linear Block Codes (4/5)
• Example 9: Consider a systematic (6, 3) linear block code defined by the
generator matrix and, find systematic code.
$$G = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}$$
c = mG
m0 m1 m2 | c0=m0+m2  c1=m0+m1  c2=m1+m2  c3=m0  c4=m1  c5=m2 | w
0  0  0  |    0         0         0       0      0      0    | 0
0  0  1  |    1         0         1       0      0      1    | 3
0  1  0  |    0         1         1       0      1      0    | 3
0  1  1  |    1         1         0       0      1      1    | 4
1  0  0  |    1         1         0       1      0      0    | 3
1  0  1  |    0         1         1       1      0      1    | 4
1  1  0  |    1         0         1       1      1      0    | 4
1  1  1  |    0         0         0       1      1      1    | 3
$$G = \begin{bmatrix} 1&1&0&1&0&0&0 \\ 0&1&1&0&1&0&0 \\ 0&0&1&1&0&1&0 \\ 0&0&0&1&1&0&1 \end{bmatrix} \;\longrightarrow\; G'' = \begin{bmatrix} 1&1&0&1&0&0&0 \\ 0&1&1&0&1&0&0 \\ 1&1&1&0&0&1&0 \\ 1&0&1&0&0&0&1 \end{bmatrix} = [\,P \mid I_4\,]$$
• Encode message m = [1 0 1 0]
• c’’ = mG’’ =[ 0 0 1 1 0 1 0].
30
References
31