CHAPTER 10 Coding and Error Control
LEARNING OBJECTIVES
After studying this chapter, you should be able to:
• Describe and compare error recovery processes for error detection, retransmis-
sion/ARQ, and error correction.
• Create and decode cyclic block codes.
• Create and decode convolutional codes.
• Describe the capabilities and bandwidth efficiency of codes in terms of their
coding rate, Hamming distance, and coding gain.
• Describe the operation of LDPC and turbo codes.
• Explain the operation of H-ARQ and the options for how it could be
implemented.
In earlier chapters, we talked about transmission impairments and the effect of data
rate and signal-to-noise ratio on bit error rate. Regardless of the design of the trans-
mission system, there will be errors, resulting in the change of one or more bits in a
transmitted frame.
Three approaches are in common use for coping with data transmission errors:
• Error detection codes
• Error correction codes, also called forward error correction (FEC) codes
• Automatic repeat request (ARQ) protocols
An error detection code simply detects the presence of an error. Typically,
such codes are used in conjunction with a protocol at the data link or transport level
that uses an ARQ scheme. With an ARQ scheme, a receiver discards a block of
data in which an error is detected and the transmitter retransmits that block of data.
FEC codes are designed not just to detect but correct errors, avoiding the need for
retransmission. FEC schemes are frequently used in wireless transmission, where
retransmission schemes are highly inefficient and error rates may be high. Some
wireless protocols use Hybrid ARQ, which is a combination of FEC and ARQ.
This chapter looks at all three approaches in turn.
10.1 Error Detection

In what follows, we assume that data are transmitted as one or more contiguous
sequences of bits, called frames. Let us define these probabilities with respect to
errors in transmitted frames:
Pb: Probability of a single-bit error; also known as the bit error rate (BER)
P1: Probability that a frame arrives with no bit errors
P2: Probability that, with an error detection algorithm in use, a frame
arrives with one or more undetected errors
P3: Probability that, with an error detection algorithm in use, a frame
arrives with one or more detected bit errors but no undetected bit errors
First consider the case when no means are taken to detect errors. Then the
probability of detected errors (P3) is zero. To express the remaining probabilities,
assume the probability that any bit is in error (Pb) is constant and independent for
each bit. Then we have,
P1 = (1 - Pb)^F
P2 = 1 - P1
where F is the number of bits per frame. In words, the probability that a frame arrives
with no bit errors decreases when the probability of a single-bit error increases,
as you would expect. Also, the probability that a frame arrives with no bit errors
decreases with increasing frame length; the longer the frame, the more bits it has
and the higher the probability that one of these is in error.
Example 10.1 A system has a defined objective for connections that the BER should be
less than 10^-6 on at least 90% of observed 1-minute intervals. Suppose now that we have
the rather modest user requirement that on average one frame with an undetected bit error
should occur per day on a continuously used 1-Mbps channel, and let us assume a frame
length of 1000 bits. The number of frames that can be transmitted in a day comes out to
8.64 × 10^7, which yields a required frame error rate of P2 = 1/(8.64 × 10^7) = 1.16 × 10^-8.
But if we assume a value of Pb of 10^-6, then P1 = (0.999999)^1000 = 0.999 and therefore
P2 = 10^-3, which is about five orders of magnitude too large to meet our requirement.
This means that (8.64 × 10^7) × P2 = 86,400 frames with undetected bit errors would occur
per day for Pb of 10^-6.
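The arithmetic above is easy to check numerically. The following short Python sketch (not part of the original text; the function name is ours) reproduces the figures of Example 10.1 under its stated assumption of independent bit errors with a constant BER:

```python
# A minimal sketch (not from the text); reproduces the arithmetic of Example 10.1.

def frame_error_probs(bit_error_rate: float, frame_bits: int):
    """Return (P1, P2): probability of an error-free frame and of a frame with
    at least one bit error, when no error detection is in use."""
    p1 = (1.0 - bit_error_rate) ** frame_bits
    return p1, 1.0 - p1

p1, p2 = frame_error_probs(bit_error_rate=1e-6, frame_bits=1000)
frames_per_day = (1_000_000 // 1000) * 86_400          # 1-Mbps channel, 1000-bit frames
print(f"P1 = {p1:.6f}, P2 = {p2:.2e}")                 # P2 is roughly 1e-3
print(f"Required P2 = {1 / frames_per_day:.2e}")       # roughly 1.16e-8
print(f"Frames with errors per day = {frames_per_day * p2:.0f}")   # close to 86,400
```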
This is the kind of result that motivates the use of error detection techniques.
All of these techniques operate on the following principle (Figure 10.1). For a given
frame of bits, the transmitter adds additional bits that constitute an error detection
code. This code is calculated as a function of the other transmitted bits. Typically,
for a data block of k bits, the error detection algorithm yields an error detection
code of n - k bits, where (n - k) < k. The error detection code, also referred to
as the check bits, is appended to the data block to produce a frame of n bits, which
is then transmitted. The receiver separates the incoming frame into the k bits of
data and (n - k) bits of the error detection code. The receiver performs the same
error detection calculation on the data bits and compares this value with the value
of the incoming error detection code. A detected error occurs if and only if there is a
mismatch. Thus, P3 is the probability that a frame contains errors and that the error
detection scheme will detect that fact. P2 is known as the residual error rate and is
the probability that an error will be undetected despite the use of an error detection
scheme.
Parity Check
The simplest error detection scheme is to append a parity bit to the end of a block
of data. A typical example is character transmission, in which a parity bit is attached
to each 7-bit character. The value of this bit is selected so that the character has an
even number of 1s (even parity) or an odd number of 1s (odd parity).
Figure 10.1 Error Detection Process (the transmitter computes an (n - k)-bit error detection code as a function of the k data bits and appends it to form the n-bit frame; the receiver performs the same calculation on the received data bits and compares the result with the received check bits)
Example 10.2 If the transmitter is transmitting 1110001 and using odd parity, it will
append a 1 and transmit 11110001. The receiver examines the received character to con-
duct a parity check and, if the total number of 1s is odd, assumes that no error has
occurred. If one bit (or any odd number of bits) is erroneously inverted during transmis-
sion (e.g., 11100001), then the receiver will detect an error.
Note, however, that if two (or any even number) of bits are inverted due to
error, an undetected error occurs. Typically, even parity is used for synchronous
transmission and odd parity for asynchronous transmission.
The use of the parity bit is not foolproof, as noise impulses are often long
enough to destroy more than one bit, especially at high data rates.
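As a concrete illustration, here is a minimal Python sketch of the odd-parity scheme of Example 10.2; the helper names are ours, and only the bit patterns come from the example:

```python
# A minimal sketch (not from the text) of the odd-parity scheme in Example 10.2.

def odd_parity_bit(character: str) -> str:
    """Return the parity bit that gives the character an odd number of 1s."""
    return "1" if character.count("1") % 2 == 0 else "0"

def passes_odd_parity(block: str) -> bool:
    """A received block passes the check if its total number of 1s is odd."""
    return block.count("1") % 2 == 1

assert odd_parity_bit("1110001") == "1"       # transmitter adds a 1 (Example 10.2)
assert passes_odd_parity("11110001")          # transmitted block: no error detected
assert not passes_odd_parity("11100001")      # one inverted bit: error detected
assert passes_odd_parity("11000001")          # two inverted bits slip through undetected
```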
[Footnote 1] This procedure is slightly different from that in Figure 10.1. As shall be seen, the CRC process could be implemented as follows: the receiver could perform a division operation on the incoming k data bits and compare the result to the incoming (n - k) check bits.
Cyclic Redundancy Check (CRC)

One of the most common and powerful error detection codes is the cyclic redundancy check (CRC). For a k-bit block of data, the transmitter generates an (n - k)-bit frame check sequence (FCS) such that the resulting n-bit frame is exactly divisible by some predetermined number, using modulo 2 arithmetic; the receiver divides the incoming frame by that number and, if there is no remainder, assumes there was no error. To clarify this, we present the procedure in three ways: modulo 2 arithmetic, polynomials, and digital logic.
Example 10.3
1. Given
   Message D = 1010001101 (10 bits)
   Pattern P = 110101 (6 bits)
   FCS R = to be calculated (5 bits)
2. The message is shifted left by 5 bits, giving 2^(n-k)D = 101000110100000.
3. This product is divided by P using modulo 2 arithmetic:

   101000110100000 ÷ 110101 = 1101010110 (quotient Q), remainder R = 01110

4. The remainder is added to 2^(n-k)D to give T = 101000110101110, which is transmitted.
5. If there are no errors, the receiver receives T intact. Dividing the received frame by P,

   101000110101110 ÷ 110101 = 1101010110 (quotient Q), remainder 00000 (R)

   Because there is no remainder, the receiver assumes there have been no errors.
The pattern P is chosen to be one bit longer than the desired FCS, and the
exact bit pattern chosen depends on the type of errors expected. At minimum, both
the high- and low-order bits of P must be 1.
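The modulo 2 division in Example 10.3 can be expressed in a few lines of code. The sketch below is ours, not the book's; it divides 2^(n-k)D by P, reproduces R = 01110, and verifies that the transmitted frame T leaves no remainder:

```python
# A sketch (not the book's code) of CRC generation by modulo 2 long division,
# using the values of Example 10.3: D = 1010001101, P = 110101.

def mod2_remainder(dividend: str, divisor: str) -> str:
    """Modulo 2 long division; return the remainder as len(divisor) - 1 bits."""
    rem = list(dividend)
    for i in range(len(dividend) - len(divisor) + 1):
        if rem[i] == "1":                       # subtract (XOR) the divisor here
            for j, d in enumerate(divisor):
                rem[i + j] = str(int(rem[i + j]) ^ int(d))
    return "".join(rem[-(len(divisor) - 1):])

D, P = "1010001101", "110101"
R = mod2_remainder(D + "0" * (len(P) - 1), P)   # divide 2^(n-k) D by P
print(R)                                        # '01110', the FCS
T = D + R                                       # transmitted frame 101000110101110
print(mod2_remainder(T, P))                     # '00000': no remainder, no error detected
```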
There is a concise method for specifying the occurrence of one or more errors.
An error results in the reversal of a bit. This is equivalent to taking the XOR of the
bit and 1 (modulo 2 addition of 1 to the bit): 0 + 1 = 1; 1 + 1 = 0. Thus, the errors
in an n-bit frame can be represented by an n-bit field with 1s in each error position.
The resulting frame Tr can be expressed as
Tr = T ⊕ E
where
T = transmitted frame
E = error pattern with 1s in positions where errors occur
Tr = received frame
If there is an error (E ≠ 0), the receiver will fail to detect the error if and only if Tr
is divisible by P, which is equivalent to E divisible by P. Intuitively, this seems an
unlikely occurrence.
Polynomials A second way of viewing the CRC process is to express all values
as polynomials in a dummy variable X, with binary coefficients. The coefficients cor-
respond to the bits in the binary number. Arithmetic operations are again modulo 2.
The CRC process can now be described as
    X^(n-k)D(X)/P(X) = Q(X) + R(X)/P(X)
Example 10.4 Continuing with Example 10.3, for D = 1010001101, we have D(X) = X^9 +
X^7 + X^3 + X^2 + 1, and for P = 110101, we have P(X) = X^5 + X^4 + X^2 + 1. We should
end up with R = 01110, which corresponds to R(X) = X^3 + X^2 + X. Figure 10.2 shows the
polynomial division that corresponds to the binary division in the preceding example.
Figure 10.2 Example of Polynomial Division: dividing X^5 D(X) = X^14 + X^12 + X^8 + X^7 + X^5 by P(X) = X^5 + X^4 + X^2 + 1 gives the quotient Q(X) = X^9 + X^8 + X^6 + X^4 + X^2 + X and the remainder R(X) = X^3 + X^2 + X
Digital Logic The CRC process can be represented by, and indeed implemented
as, a dividing circuit consisting of XOR gates and a shift register. The shift register is a
string of 1-bit storage devices. Each device has an output line, which indicates the val-
ue currently stored, and an input line. At discrete time instants, known as clock times,
the value in the storage device is replaced by the value indicated by its input line. The
entire register is clocked simultaneously, causing a 1-bit shift along the entire register.
The circuit is implemented as follows:
1. The register contains n - k bits, equal to the length of the FCS.
2. There are up to n - k XOR gates.
[Footnote 2] A burst error of length B is a contiguous sequence of B bits in which the first and last bits and any number of intermediate bits are received in error.
3. The presence or absence of a gate corresponds to the presence or absence of a
term in the divisor polynomial, P(X), excluding the terms 1 and X^(n-k).
Example 10.5 The architecture of this circuit is best explained by first considering an
example, which is illustrated in Figure 10.3. In this example, we use the same data block
and divisor as in Example 10.3: D = 1010001101 and P = 110101, corresponding to
P(X) = X^5 + X^4 + X^2 + 1.
Figure 10.3 Shift Register Implementation of Division by P = 110101 (a 5-bit register C4 C3 C2 C1 C0 with XOR gates corresponding to the nonzero terms of P(X), a 10-bit input, a 15-bit output, and two switches. The accompanying table of register contents, with columns C4 C3 C2 C1 C0, C4⊕C3⊕I, C4⊕C1⊕I, C4⊕I, and I, where I is the input bit, shows that after the ten data bits are processed (step 10) the register holds 01110, the FCS of Example 10.3.)
The process begins with the shift register cleared to all zeros. The ten
input data bits are fed in with both switches in the A position. As a result, for the first 10
steps, the input bits are fed into the shift register and also used as output bits. After the
last data bit is processed, the shift register contains the remainder (FCS) (shown shaded).
As soon as the last data bit is provided to the shift register, both switches are set to the
B position. This has two effects: (1) all of the XOR gates become simple pass-throughs;
no bits are changed, and (2) as the shifting process continues, the 5 CRC bits are output.
At the receiver, the same logic is used. As each bit of M arrives, it is inserted into the
shift register. If there have been no errors, the shift register should contain the bit pattern
for R at the conclusion of M. The transmitted bits of R now begin to arrive, and the effect
is to zero out the register so that, at the conclusion of reception, the register contains all 0s.
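The shift register behavior just described can be simulated directly. The following Python sketch is ours (it assumes the left-shifting register orientation of Figure 10.3); it clocks the ten data bits of Example 10.3 through a 5-bit register with XOR taps at the positions where P(X) = X^5 + X^4 + X^2 + 1 has nonzero coefficients below X^5, and ends with the FCS in the register:

```python
# A sketch (not the book's code) of the shift-register CRC circuit of Figure 10.3
# for P = 110101, processing D = 1010001101 most significant bit first.

def crc_shift_register(data: str, poly_taps=(0, 2, 4)) -> str:
    """poly_taps are the exponents of P(X) below X^5 (1 + X^2 + X^4)."""
    reg = [0] * 5                              # C0..C4, all zeros initially
    for bit in data:
        feedback = reg[4] ^ int(bit)           # C4 XOR incoming bit
        reg = [0] + reg[:4]                    # shift toward C4
        for t in poly_taps:
            reg[t] ^= feedback                 # XOR gates where A_i = 1
    return "".join(str(b) for b in reversed(reg))   # shown as C4 C3 C2 C1 C0

print(crc_shift_register("1010001101"))        # '01110', the FCS of Example 10.3
```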
Figure 10.4 indicates the general architecture of the shift register implementa-
tion of a CRC for the polynomial P(X) = ΣA_iX^i (summed from i = 0 to n - k), where
A_0 = A_(n-k) = 1 and all other A_i equal either 0 or 1.
Figure 10.4 General CRC Architecture to Implement Divisor 1 + A_1X + A_2X^2 + ... + A_(n-k-1)X^(n-k-1) + X^(n-k)

[Footnote 3] It is common for the CRC register to be shown shifting to the right, which is the reverse of the analogy to binary division. Because binary numbers are usually shown with the most significant bit on the left, a left-shifting register is more appropriate.

10.2 Block Error Correction Codes

Error detection is a useful technique, found in data link control protocols, such
as HDLC, and in transport protocols, such as TCP. However, correction of errors
using an error detection code requires that the block of data be retransmitted, using the
ARQ discipline explained in Section 10.4. For wireless applications, this approach is
inadequate for two reasons:
1. The bit error rate on a wireless link can be quite high, which would result in a
large number of retransmissions.
2. In some cases, especially satellite links, the propagation delay is very long com-
pared to the transmission time of a single frame. The result is a very inefficient
system. As is discussed in Section 10.4, the common approach to retransmis-
sion is to retransmit the frame in error plus all subsequent frames. With a long
data link, an error in a single frame necessitates retransmitting many frames.
Instead, it would be desirable to enable the receiver to correct errors in an incoming trans-
mission on the basis of the bits in that transmission. Figure 10.5 shows in general how this
is done. On the transmission end, each k-bit block of data is mapped into an n-bit block
(n > k) called a codeword, using a forward error correction (FEC) encoder. The code-
word is then transmitted; in the case of wireless transmission, a modulator produces an
analog signal for transmission. During transmission, the signal is subject to noise, which
may produce bit errors in the signal. At the receiver, the incoming signal is demodulated
to produce a bit string that is similar to the original codeword but may contain errors.

Figure 10.5 Forward Error Correction Process (transmitter: k-bit data block → FEC encoder → n-bit codeword; receiver: received codeword → FEC decoder → either the data block, for no error or a correctable error, or an error indication, for a detectable but not correctable error)

This block is passed through an FEC decoder, with one of five possible outcomes:
1. If there are no bit errors, the input to the FEC decoder is identical to the
original codeword, and the decoder produces the original data block as output.
2. For certain error patterns, it is possible for the decoder to detect and cor-
rect those errors. Thus, even though the incoming data block differs from the
transmitted codeword, the FEC decoder is able to map this block into the
original data block.
3. For certain error patterns, the decoder can detect but not correct the errors. In
this case, the decoder simply reports an uncorrectable error.
4. For certain, typically rare, error patterns, the decoder detects an error but
does not correct it properly. It assumes a certain block of data was sent when
in reality a different one was sent.
5. For certain even more rare error patterns, the decoder does not detect that any
errors have occurred and maps the incoming n-bit data block into a k-bit block
that differs from the original k-bit block.
How is it possible for the decoder to correct bit errors? In essence, error
correction works by adding redundancy to the transmitted message. Consider an
example where a binary 0 or 1 were to be sent, but instead the codewords that were
sent were either 0000 or 1111. The redundancy makes it possible for the receiver to
deduce what the original message was, even in the face of a certain level of error
rate. If a 0010 were received, we could assume that a 0000 was sent corresponding
to the original binary 0, because only one bit change would have occurred to make
this happen. There is, however, a much more unlikely yet possible scenario where
a 1111 was sent. The decoder would then make a mistake by assuming a 0 was sent.
Consider if another received codeword were 0011. In this case, the decoder would
not be able to decide because it would be equally likely that 0000 or 1111 was sent.
In this section we look at a widely used form of error correction code known as
a block error correction code. We begin with a discussion of general principles and
then look at some specific codes. Before proceeding, we note that in many cases, the
error correction code follows the same general layout as shown for error detection
codes in Figure 10.1. That is, the FEC algorithm takes as input a k-bit block and
adds (n - k) check bits to that block to produce an n-bit block; all of the bits in the
original k-bit block show up in the n-bit block. For some FEC algorithms, such as
the convolutional code discussed in Section 10.3, the FEC algorithm maps the k-bit
input into an n-bit codeword in such a way that the original k bits do not appear in
the codeword.
Block Code Principles

In what follows, we need the concept of Hamming distance: the Hamming distance
d(v1, v2) between two n-bit binary sequences v1 and v2 is the number of bit positions in
which the two sequences disagree. Consider, as a running example, a (5, 2) code in which
each 2-bit data block is mapped into one of the 5-bit codewords 00000, 00111, 11001, and
11110, with the data block 00 mapped into 00000.

Example 10.6 Now, suppose that a codeword block is received with the bit pattern 00100. This is not a
valid codeword and so the receiver has detected an error. Can the error be corrected? We
cannot be sure which data block was sent because 1, 2, 3, 4, or even all 5 of the bits that were
transmitted may have been corrupted by noise. However, notice that it would r equire only
a single bit change to transform the valid codeword 00000 into 00100. It would take two bit
changes to transform 00111 to 00100, three bit changes to transform 11110 to 00100, and
it would take four bit changes to transform 11001 into 00100. Thus, we can deduce that the
most likely codeword that was sent was 00000 and that therefore the desired data block is
00. This is error correction. In terms of Hamming distances, we have
d(00000, 00100) = 1; d(00111, 00100) = 2; d(11001, 00100) = 4;
d(11110, 00100) = 3
So the rule we would like to impose is that if an invalid codeword is received, then
the valid codeword that is closest to it (minimum distance) is selected. This will only work
if there is a unique valid codeword at a minimum distance from each invalid codeword.
For our example, it is not true that for every invalid codeword there is one and only
one valid codeword at a minimum distance. There are 2^5 = 32 possible codewords, of which
4 are valid, leaving 28 invalid codewords. Tabulating the distances from the 28 invalid codewords
to the four valid codewords shows the following: there are eight cases in which an invalid
codeword is at a distance 2 from two different valid codewords. Thus, if one such invalid
codeword is received, an error in 2 bits could
have caused it and the receiver has no way to choose between the two alternatives. An
error is detected but cannot be corrected. The only remedy is retransmission. However, in
every case in which a single-bit error occurs, the resulting codeword is of distance 1 from
only one valid codeword and the decision can be made. This code is therefore capable of
correcting all single-bit errors but cannot correct double-bit errors. Another way to see
this is to look at the pairwise distances between valid codewords:
d(00000, 00111) = 3; d(00000, 11001) = 3; d(00000, 11110) = 4;
d(00111, 11001) = 4; d(00111, 11110) = 3; d(11001, 11110) = 3;
The minimum distance between valid codewords is 3. Therefore, a single-bit error
will result in an invalid codeword that is a distance 1 from the original valid codeword but
a distance at least 2 from all other valid codewords. As a result, the code can always cor-
rect a single-bit error. Note that the code also will always detect a double-bit error.
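A short sketch makes the minimum-distance rule concrete. The code below is ours; it computes Hamming distances for the four valid codewords of this example, confirms that dmin = 3, and decodes two received blocks, one correctable and one that is equidistant from two valid codewords:

```python
# A small sketch (not from the text) of minimum-distance decoding for the
# example code above, whose valid codewords are 00000, 00111, 11001, and 11110.

CODEWORDS = ["00000", "00111", "11001", "11110"]

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def decode(received: str):
    """Return the unique nearest valid codeword, or None on a tie."""
    ranked = sorted((hamming(received, c), c) for c in CODEWORDS)
    return ranked[0][1] if ranked[0][0] < ranked[1][0] else None

d_min = min(hamming(a, b) for a in CODEWORDS for b in CODEWORDS if a != b)
print(d_min)               # 3, so every single-bit error is correctable
print(decode("00100"))     # '00000': corrected as in the example
print(decode("10101"))     # None: distance 2 from both 00111 and 11001, not correctable
```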
Thus, the design of a block code is equivalent to the design of a function of the form
vc = f(vd), where vd is a vector of k data bits and vc is a vector of n codeword bits.
With an (n, k) block code, there are 2^k valid codewords out of a total of 2^n
possible codewords. The ratio of redundant bits to data bits, (n - k)/k, is called the
redundancy of the code, and the ratio of data bits to total bits, k/n, is called the code
rate. The code rate is a measure of how much additional bandwidth is required to
carry data at the same data rate as without the code. For example, a code rate of 1/2
requires double the bandwidth of an uncoded system to maintain the same data rate.
Our example has a code rate of 2/5 and so requires a bandwidth 2.5 times the band-
width for an uncoded system. For example, if the data rate input to the encoder is 1
Mbps, then the output from the encoder must be at a rate of 2.5 Mbps to keep up.
For a code consisting of the codewords w1, w2, ..., ws, where s = 2^k, the mini-
mum distance dmin of the code is defined as

    dmin = min over i ≠ j of d(wi, wj)

It can be shown that the following conditions hold. For a given positive integer t,
if a code satisfies dmin ≥ 2t + 1, then the code can correct all bit errors up to and
including errors of t bits. If dmin ≥ 2t, then all errors of ≤ t - 1 bits can be corrected
and errors of t bits can be detected but not, in general, corrected. Conversely, any
code for which all errors of magnitude ≤ t are corrected must satisfy dmin ≥ 2t + 1,
and any code for which all errors of magnitude ≤ t - 1 are corrected and all errors
of magnitude t are detected must satisfy dmin ≥ 2t.
Another way of putting the relationship between dmin and t is to say that the
maximum number of guaranteed correctable errors per codeword satisfies

    t = ⌊(dmin - 1)/2⌋

where ⌊x⌋ means the largest integer not to exceed x (e.g., ⌊6.3⌋ = 6). Furthermore,
if we are concerned only with error detection and not error correction, then the
number of errors, t, that can be detected satisfies

    t = dmin - 1
To see this, consider that if dmin errors occur, this could change one valid codeword
into another. Any number of errors less than dmin cannot result in another valid
codeword.
The design of a block code involves a number of considerations.
1. For given values of n and k, we would like the largest possible value of dmin.
2. The code should be relatively easy to encode and decode, requiring minimal
memory and processing time.
3. We would like the number of extra bits, (n - k), to be small, to reduce
bandwidth.
4. We would like the number of extra bits, (n - k), to be large, to reduce error
rate.
Clearly, the last two objectives are in conflict, and trade-offs must be made.
Before looking at specific codes, it will be useful to examine Figure 10.6. The
literature on error correction codes frequently includes graphs of this sort to dem-
onstrate the effectiveness of various encoding schemes. Recall from Chapter 7
that coding can be used to reduce the required Eb /N0 value to achieve a given
bit error rate.4 The coding discussed in Chapter 7 has to do with the definition of
signal elements to represent bits. The coding discussed in this chapter also has an
effect on Eb /N0. In Figure 10.6, the curve on the right is for an uncoded modula-
tion system; the shaded region represents the area in which potential improve-
ment can be achieved. In this region, a smaller BER is achieved for a given Eb /N0,
and conversely, for a given BER, a smaller Eb /N0 is required. The other curve is
a typical result of a code rate of one-half (equal number of data and check bits).
Note that at an error rate of 10-6, the use of coding allows a reduction in Eb /N0
of 2.77 dB. This reduction is referred to as the coding gain, which is defined as
the reduction, in decibels, in the required Eb /N0 to achieve a specified BER of
an error correction coded system compared to an uncoded system using the same
modulation.
It is important to realize that the BER for the second rate 1/2 curve refers to the
rate of uncorrected errors and that the Eb value refers to the energy per data bit.
Because the rate is 1/2, there are two bits on the channel for each data bit, effectively
reducing the data throughput by 1/2 (or requiring twice the raw data rate) as well. The
energy per coded bit is half that of the energy per data bit, or a reduction of 3 dB. If
we look at the energy per coded bit for this system, then we see that the channel bit
error rate is about 2.4 × 10^-2, or 0.024.
Finally, note that below a certain threshold of Eb /N0, the coding scheme actu-
ally degrades performance. In our example of Figure 10.6, the threshold occurs at
about 5.4 dB. Below the threshold, the extra check bits add overhead to the system
that reduces the energy per data bit causing increased errors. Above the threshold,
the error-correcting power of the code more than compensates for the reduced Eb,
resulting in a coding gain.
We now turn to a look at some specific block error correction codes.
[Footnote 4] Eb/N0 is the ratio of signal energy per bit to noise power density per hertz; it was defined and discussed in Chapter 6.
Figure 10.6 How Coding Improves System Performance (probability of bit error, BER, versus Eb/N0 in dB for transmission without coding and with rate 1/2 coding; the region between the curves is the region of potential coding gain. At BER = 10^-6 the curves are 2.77 dB apart, and the 3 dB offset marks the channel bit error probability at the energy per coded bit.)
Hamming Code
Hamming codes are a family of (n, k) block error correction codes that have the
following parameters:
Block length: n = 2^m - 1
Number of data bits: k = 2^m - m - 1
Number of check bits: n - k = m
Minimum distance: dmin = 3
where m ≥ 3. Hamming codes are straightforward and easy to analyze but are
rarely used. We begin with these codes because they illustrate some of the funda-
mental principles of block codes.
Hamming codes are designed to correct single-bit errors. To start, let us deter-
mine how long the code must be. The Hamming code process has the same struc-
ture as the error detection logic shown in Figure 10.1; that is, the encoding process
preserves the k data bits and adds (n - k) check bits. For decoding, the comparison
Table 10.1 Hamming Code Requirements (number of check bits required, for various data lengths, for single-error correction and for single-error correction/double-error detection)
logic receives as input two (n - k)-bit values, one from the incoming codeword and
one from a calculation performed on the incoming data bits. A bit-by-bit compari-
son is done by taking the XOR of the two inputs. The result is called the syndrome
word. Thus, each bit of the syndrome is 0 or 1 according to whether there is or is not
a match in that bit position for the two inputs.
The syndrome word is therefore (n - k) bits wide and has a range between 0
and 2^(n-k) - 1. The value 0 indicates that no error was detected, leaving 2^(n-k) - 1
values to indicate, if there is an error, which bit was in error. Now, because an error
could occur on any of the k data bits or (n - k) check bits, we must have

    2^(n-k) - 1 ≥ k + (n - k) = n
This equation gives the number of bits needed to correct a single-bit error in a word
containing k data bits. Table 10.1 lists the number of check bits required for various
data lengths.
For convenience, we would like to generate a syndrome with the following
characteristics:
• If the syndrome contains all 0s, no error has been detected.
• If the syndrome contains one and only one bit set to 1, then an error has
occurred in one of the check bits. No correction is needed.
• If the syndrome contains more than one bit set to 1, then the numerical value
of the syndrome indicates the position of the data bit in error. This data bit is
inverted for correction.
To achieve these characteristics, the data and check bits are arranged into an
n-bit block as follows. Counting from the least-significant (rightmost) position, the
Hamming check bits are inserted at positions that are a power of 2 [i.e., positions
1, 2, 4, . . . , 2^(n-k)]. The remaining bits are data bits. To calculate the check bits, each
data position that has a value 1 is represented by a binary value equal to its position;
thus, if the 9th bit is 1, the corresponding value is 1001. All of the position values are
then XORed together to produce the bits of the Hamming code. At the receiver,
all bit position values where there is 1 are XORed. In this case, the XOR includes
both data bits and check bits. Because the check bits occur at bit positions that are
a power of 2, we can simply XOR all data bit positions with a value of 1, plus the
Hamming code formed by the check bits. If the result of the XOR is zero, no error
is detected. If the result is nonzero, then the result is the syndrome, and its value
equals the bit position that is in error.
Example 10.7 A (12, 8) Hamming code has the assignment shown in Table 10.2. The
8-bit data block is 00111001. Four of the data bits have a value 1 (shaded in the table), and
their bit position values are XORed to produce the Hamming code 0111, which forms the
four check digits. The entire block that is transmitted is 001101001111. Suppose now that
data bit 3, in bit position 6, sustains an error and is changed from 0 to 1. Then the received
block is 001101101111. The received Hamming code is still 0111. The receiver performs an
XOR of the Hamming code and all of the bit position values for nonzero data bits, with
a result of 0110. The nonzero result detects an error and indicates that the error is in bit
position 6.
Table 10.2 Example of a (12, 8) Hamming Code (see Example 10.7)

Transmitted block:
Bit Position        12    11    10    9     8     7     6     5     4     3     2     1
Position Number     1100  1011  1010  1001  1000  0111  0110  0101  0100  0011  0010  0001
Data Bit            D8    D7    D6    D5          D4    D3    D2          D1
Check Bit                                   C8                      C4          C2    C1
Transmitted Block   0     0     1     1     0     1     0     0     1     1     1     1
Codes                           1010  1001        0111                    0011

Check-bit calculation at the transmitter (XOR of the position numbers of all data bits equal to 1):
Position   Code
10         1010
9          1001
7          0111
3          0011
XOR = C8 C4 C2 C1 = 0111

Received block (data bit D3, in bit position 6, changed from 0 to 1):
Bit Position        12    11    10    9     8     7     6     5     4     3     2     1
Position Number     1100  1011  1010  1001  1000  0111  0110  0101  0100  0011  0010  0001
Data Bit            D8    D7    D6    D5          D4    D3    D2          D1
Check Bit                                   C8                      C4          C2    C1
Received Block      0     0     1     1     0     1     1     0     1     1     1     1
Codes                           1010  1001        0111  0110              0011

Syndrome calculation at the receiver (XOR of the received Hamming code with the position numbers of all data bits equal to 1):
Position   Code
Hamming    0111
10         1010
9          1001
7          0111
6          0110
3          0011
XOR = syndrome 0110
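The check-bit and syndrome calculations of Example 10.7 can be reproduced with a few lines of Python. This sketch is ours; the only inputs taken from the example are the data block, the bit-position assignments, and the error position:

```python
# A sketch (not the book's code) of the (12, 8) Hamming code of Example 10.7.
# Data bits sit in positions 12, 11, 10, 9, 7, 6, 5, 3; check bits in 8, 4, 2, 1.

DATA_POSITIONS = [12, 11, 10, 9, 7, 6, 5, 3]      # D8 down to D1

def check_bits(data: str) -> int:
    """XOR together the position numbers of all data bits equal to 1."""
    code = 0
    for bit, pos in zip(data, DATA_POSITIONS):
        if bit == "1":
            code ^= pos
    return code

def syndrome(received_data: str, received_check: int) -> int:
    return check_bits(received_data) ^ received_check

code = check_bits("00111001")
print(f"{code:04b}")                              # 0111, the four check bits

received = "00111101"                             # data bit D3 (bit position 6) flipped
s = syndrome(received, code)
print(s, f"{s:04b}")                              # 6 0110: the error is in bit position 6
```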
Cyclic Codes
Most of the error correction block codes that are in use are in a category called
cyclic codes. For such codes, if the n-bit sequence c = (c0, c1, ..., c_(n-1)) is a valid
codeword, then (c_(n-1), c0, c1, ..., c_(n-2)), which is formed by cyclically shifting c one
place to the right, is also a valid codeword. This class of codes can be easily encoded
and decoded using linear feedback shift registers (LFSRs). Examples of cyclic codes
include the Bose–Chaudhuri–Hocquenghem (BCH) and Reed–Solomon (RS)
codes.
The LFSR implementation of a cyclic error correction encoder is the same as
that of the CRC error detection code, illustrated in Figure 10.4. The key difference
is that the CRC code takes an input of arbitrary length and produces a fixed-length
CRC check code, while a cyclic error correction code takes a fixed-length input
(k bits) and produces a fixed-length check code (n - k bits).
Figure 10.7 shows the LFSR implementation of the decoder for a cyclic block
code. Compare this to the encoder logic in Figure 10.4. Note that for the encoder,
the k data bits are treated as input to produce an (n - k) code of check bits in the
shift register. For the decoder, the input is the received bit stream of n bits, consist-
ing of k data bits followed by (n - k) check bits. If there have been no errors, after
the first k steps, the shift register contains the pattern of check bits that were trans-
mitted. After the remaining (n - k) steps, the shift register contains a syndrome
code.
For decoding of a cyclic code, the following procedure is used:
1. Process received bits to compute the syndrome code in exactly the same fash-
ion as the encoder processes the data bits to produce the check code.
2. If the syndrome bits are all zero, no error has been detected.
3. If the syndrome is nonzero, perform additional processing on the syndrome
for error correction.
Figure 10.7 Block Syndrome Generator for Divisor 1 + A_1X + A_2X^2 + ... + A_(n-k-1)X^(n-k-1) + X^(n-k) (the received n bits are shifted through registers S_(n-k-1), ..., S_1, S_0 with XOR gates at the positions where A_i = 1)
To understand the significance of the syndrome, let us examine the block code
using polynomials. As in the case of the CRC, a particular cyclic code can be repre-
sented by a polynomial divisor, called the generator polynomial. For an (n, k) code,
the generator polynomial has the form
    P(X) = 1 + A_1X + A_2X^2 + ... + A_(n-k-1)X^(n-k-1) + X^(n-k)

As with the CRC, the (n - k) check bits C(X) are the remainder of dividing X^(n-k)D(X)
by P(X), and the transmitted codeword is T(X) = X^(n-k)D(X) + C(X). Dividing T(X) by
P(X) gives

    T(X)/P(X) = X^(n-k)D(X)/P(X) + C(X)/P(X)
              = (Q(X) + C(X)/P(X)) + C(X)/P(X) = Q(X)        (10.4)
The last equality is valid because of the rules of modulo 2 arithmetic (a + a = 0,
whether a = 1 or a = 0). Thus, if there are no errors, the division of T(X) by P(X)
produces no remainder.
If one or more bit errors occur, then the received block Z(X) will be of the form
Z(X) = T(X) + E(X)
where E(X) is an n-bit error polynomial with a value of 1 in each bit position that is
in error in Z(X). If we pass Z(X) through the LFSR of Figure 10.7, we are perform-
ing the division Z(X)/P(X), which produces the (n - k) bit syndrome S(X):
    Z(X)/P(X) = B(X) + S(X)/P(X)        (10.5)
where B(X) is the quotient and S(X) is the remainder. Thus, S(X) is a function of
Z(X). But how does this help us perform error correction? To see this, let us expand
Equation (10.5).
    Z(X)/P(X) = B(X) + S(X)/P(X)
    [T(X) + E(X)]/P(X) = B(X) + S(X)/P(X)
    Q(X) + E(X)/P(X) = B(X) + S(X)/P(X)
    E(X)/P(X) = [Q(X) + B(X)] + S(X)/P(X)        (10.6)
What we see is that E(X)/P(X) produces the same remainder as Z(X)/P(X).
Therefore, regardless of the initial pattern of bits (transmitted value of T(X)), the
syndrome value S(X) depends only on the error bits. If we can recover the error bits,
E(X), from S(X), then we can correct the errors in Z(X) by simple addition:
Z(X) + E(X) = T(X) + E(X) + E(X) = T(X)
Because S(X) depends only on E(X), we can easily determine the power of
a cyclic block code. The syndrome pattern consists of n - k bits and therefore
takes on 2^(n-k) possible values. A value of all zeros indicates no errors. Therefore,
a total of 2^(n-k) - 1 different error patterns can be corrected. To be able to cor-
rect all possible single-bit errors with an (n, k) code, we must have n ≤ 2^(n-k) - 1.
To be able to correct all single- and double-bit errors, the relationship is

    n + n(n - 1)/2 ≤ 2^(n-k) - 1
The way in which E(X) is recovered from S(X) may depend on the specific
code involved. The most straightforward approach is to develop a table of all
possible values of E(X) with the corresponding value of S(X) of each. Then a simple
table lookup is required.
BCH Codes
BCH codes are among the most powerful cyclic block codes and are widely used in
wireless applications. For any positive pair of integers m and t, there is a binary (n, k)
BCH code with the following parameters:
Block length: n = 2^m - 1
Number of check bits: n - k ≤ mt
Minimum distance: dmin ≥ 2t + 1
Example 10.8 [Footnote 5] Consider a (7, 4) code with the generator polynomial P(X) = X^3 + X^2 + 1.
We have 7 = 2^3 - 1, so this code is capable of correcting all single-bit errors. Table 10.3a
lists all of the valid codewords; note that dmin is 3, confirming that this is a single-error-
correcting code. For example, for the data block 1010, we have D(X) = X^3 + X and
X^(n-k)D(X) = X^6 + X^4. Dividing as in Equation (10.4) gives the quotient Q(X) = X^3 + X^2 + 1
and the check code C(X) = 1, so the transmitted codeword is 1010001.

Now consider a single-bit error in the leftmost bit, so that E(X) = X^6. Dividing E(X) by
P(X) gives B(X) = X^3 + X^2 + X and the syndrome S(X) = X^2 + X; therefore, S = 110.
The remaining entries in Table 10.3b are calculated similarly. Now suppose the received
block is 1101101, or Z(X) = X^6 + X^5 + X^3 + X^2 + 1. Using Equation (10.5), the division
yields B(X) = X^3 and S(X) = X^2 + 1, that is, S = 101. From Table 10.3b, this syndrome
corresponds to the single-bit error pattern E = 0001000 (E(X) = X^3), so the corrected
codeword is 1101101 ⊕ 0001000 = 1100101.
[Footnote 5] This example is taken from [LATH98].
Table 10.3 A Single-Error-Correcting (7, 4) Cyclic Code
(a) Table of valid codewords (b) Table of syndromes for single-bit errors
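The entries of Table 10.3 can be regenerated mechanically. The sketch below is ours; it uses the generator P(X) = X^3 + X^2 + 1 from Example 10.8 to list the check bits for every 4-bit data block and the syndrome for every single-bit error pattern:

```python
# A sketch (not from the text) that regenerates the contents of Table 10.3 for the
# (7, 4) cyclic code with generator polynomial P(X) = X^3 + X^2 + 1 (binary 1101).

def mod2_remainder(dividend: str, divisor: str = "1101") -> str:
    rem = list(dividend)
    for i in range(len(dividend) - len(divisor) + 1):
        if rem[i] == "1":
            for j, d in enumerate(divisor):
                rem[i + j] = str(int(rem[i + j]) ^ int(d))
    return "".join(rem[-(len(divisor) - 1):])

# (a) Valid codewords: the 4 data bits followed by the 3 check bits
for value in range(16):
    data = f"{value:04b}"
    print(data, data + mod2_remainder(data + "000"))   # e.g., 1010 -> 1010001

# (b) Syndromes for the seven single-bit error patterns E(X) = X^i
for i in range(7):
    error = format(1 << i, "07b")
    print(error, mod2_remainder(error))                 # e.g., 1000000 (X^6) -> 110
```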
This code can correct all combinations of t or fewer errors. The generator polyno-
mial for this code can be constructed from the factors of (X^(2^m - 1) + 1). The BCH
codes provide flexibility in the choice of parameters (block length, code rate) for
a given number of errors to be corrected. Table 10.4 lists the BCH parameters for
code lengths up to 2^8 - 1. Table 10.5 lists some of the BCH generator polynomials.
A number of techniques have been designed for BCH decoding that require
less memory than a simple table lookup. One of the simplest was proposed by
Berlekamp [BERL80]. The central idea is to compute an error-locator polynomial
and solve for its roots. The complexity of the algorithm increases only as the square
of the number of errors to be corrected.
Table 10.4 BCH Code Parameters for Code Lengths up to 2^8 - 1 (each n with its available combinations of k and t)

n = 7:    (k, t) = (4, 1)
n = 15:   (11, 1), (7, 2), (5, 3)
n = 31:   (26, 1), (21, 2), (16, 3), (11, 5), (6, 7)
n = 63:   (57, 1), (51, 2), (45, 3), (39, 4), (36, 5), (30, 6), (24, 7), (18, 10), (16, 11), (10, 13), (7, 15)
n = 127:  (120, 1), (113, 2), (106, 3), (99, 4), (92, 5), (85, 6), (78, 7), (71, 9), (64, 10), (57, 11), (50, 13), (43, 14), (36, 15), (29, 21), (22, 23), (15, 27), (8, 31)
n = 255:  (247, 1), (239, 2), (231, 3), (223, 4), (215, 5), (207, 6), (199, 7), (191, 8), (187, 9), (179, 10), (171, 11), (163, 12), (155, 13), (147, 14), (139, 15), (131, 18), (123, 19), (115, 21), (107, 22), (99, 23), (91, 25), (87, 26), (79, 27), (71, 29), (63, 30), (55, 31), (47, 42), (45, 43), (37, 45), (29, 47), (21, 55), (13, 59), (9, 63)

Table 10.5 Some BCH Generator Polynomials

n     k     t     P(X)
7     4     1     X^3 + X + 1
15    11    1     X^4 + X + 1
15    7     2     X^8 + X^7 + X^6 + X^4 + 1
15    5     3     X^10 + X^8 + X^5 + X^4 + X^2 + X + 1
31    26    1     X^5 + X^2 + 1
31    21    2     X^10 + X^9 + X^8 + X^6 + X^5 + X^3 + 1

Reed–Solomon Codes

Reed–Solomon (RS) codes are a widely used subclass of nonbinary BCH codes. With RS
codes, data are processed in chunks of m bits, called symbols. An (n, k) RS code has
the following parameters:

Symbol length: m bits per symbol
Block length: n = 2^m - 1 symbols = m(2^m - 1) bits
Data length: k symbols
Size of check code: n - k = 2t symbols = m(2t) bits
Minimum distance: dmin = 2t + 1 symbols
RS codes are well suited for burst error correction. They make highly efficient
use of redundancy, and block lengths and symbol sizes can be easily adjusted to
accommodate a wide range of message sizes. In addition, efficient coding techniques
are available for RS codes.
Low-Density Parity-Check (LDPC) Codes

Low-density parity-check (LDPC) codes are block codes defined by a sparse set of
parity-check equations. The code can be pictured as a graph connecting variable nodes,
representing the bits of the block, to constraint nodes, representing the parity-check
equations (Figure 10.8). Decoding proceeds iteratively:

Figure 10.8 LDPC Code Represented as Variable Nodes V1–V6 (data bits) Connected to Constraint Nodes

1. Each variable node sends its current estimate of its bit value to the constraint
nodes to which it is connected.
2. Each constraint node checks its parity equation against the incoming estimates
and determines which of the connected bits are most likely to be different than their
estimates. This corresponds to the most likely bit changes that are needed to satisfy
the equations.
3. The estimates from the constraint nodes are sent to the variable nodes. Because
variable nodes are connected to multiple constraint nodes, the variable nodes
combine the newly acquired information to update their estimates of their bit
values and probabilities.
4. These are sent again to the constraint nodes. If the equations are now satisfied,
then stop. Otherwise, continue the decoding process.
This decoding procedure is known as message passing or belief propagation. The
performance of LDPC codes can be impressive, approaching Shannon capacity
within a fraction of a dB when using long codes.
Block Interleaving
Block interleaving is a common technique used with block codes in wireless systems;
we saw an example of this in Figure 6.17. The advantage of interleaving is that a
burst error that affects a sequence of bits is spread out over a number of separate
blocks at the receiver so that error correction is possible. Interleaving is accom-
plished by reading and writing data from memory in different orders. Figure 10.9
illustrates a simple and common interleaving technique. In this case, m blocks of
data to be transmitted are stored in a rectangular array in which each row consists
of n bits, equal to the block size. Data are then read out one column at a time. The
result is that the k data bits and their corresponding (n - k) check bits, which form a
single n-bit block, are spread out and interspersed with bits from other blocks. There
are m - 1 bits from other blocks in between. At the receiver, the data are deinter-
leaved to recover the original order. If, during transmission, a burst of noise affects
a consecutive sequence of bits, those bits belong to different blocks and hence only
Coded bits from the encoder are read into an m × n array by rows: bits 1, 2, ..., n fill the first row; bits n + 1, ..., 2n the second; and so on, through bits mn - n + 1, ..., mn in the last row.
Note: The numbers in the matrix indicate the order in which bits are read in.
Interleaver output sequence: 1, n + 1, 2n + 1, ...
Figure 10.9 Block Interleaving
a few of the bits in error are part of a particular block that needs to be corrected by
any one set of check bits. Specifically, a burst of length l = mb is broken up into m
bursts of length b. Some thought should convince you of the following assertion:
Suppose we have an (n, k) code that can correct all combinations of t or fewer errors,
where t = ⌊(n - k)/2⌋. Then if we use an interleaver of degree m, the result is
an (mn, mk) code that can correct burst errors of up to mt bits.
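A block interleaver of this kind is only a few lines of code. The sketch below is ours and uses made-up block contents; it writes m = 4 codewords into rows, reads them out by columns, and shows that a 4-bit channel burst touches at most one bit per codeword after deinterleaving:

```python
# A minimal sketch (not from the text) of the block interleaver of Figure 10.9:
# m codewords of n bits are written into the rows of an m x n array and read out
# by columns, so an m*b-bit channel burst hits at most b bits per codeword.

def interleave(blocks):
    """blocks: m equal-length bit strings (the rows); read the array out by columns."""
    return "".join(row[col] for col in range(len(blocks[0])) for row in blocks)

def deinterleave(stream: str, m: int):
    n = len(stream) // m
    return ["".join(stream[col * m + row] for col in range(n)) for row in range(m)]

blocks = ["0000000", "1111111", "0101010", "1100110"]   # m = 4 hypothetical codewords
channel = list(interleave(blocks))
channel[8:12] = "1010"                                  # a 4-bit burst error on the channel
for sent, received in zip(blocks, deinterleave("".join(channel), m=4)):
    print(sent, received)        # each codeword sees at most one bit in error
```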
10.3 Convolutional Codes

Block codes are one of the two widely used categories of error correction codes
for wireless transmission; the other is convolutional codes. An (n, k) block code
processes data in blocks of k bits at a time, producing a block of n bits (n > k) as
output for every block of k bits as input. If data are transmitted and received in a
more or less continuous stream, a block code, particularly one with a large value of
n, may not be as convenient as a code that generates redundant bits continuously so
that error checking and correcting are carried out continuously. This is the function
of convolutional codes.
A convolutional code is defined by three parameters: n, k, and K. An (n, k, K)
code processes input data k bits at a time and produces an output of n bits for each
incoming k bits. So far this is the same as the block code. In the case of a convolu-
tional code, n and k are generally quite small numbers. The difference is that con-
volutional codes have memory, which is characterized by the constraint factor K. In
essence, the current n-bit output of an (n, k, K) code depends not only on the value
of the current block of k input bits but also on the previous K - 1 blocks of k input
bits. Hence, the current output of n bits is a function of the last K * k input bits.
Convolutional codes are best understood by looking at a specific example. We
use the example shown in Figure 10.10. There are two alternative representations
of the code shown in the figure. Figure 10.10a is a shift register, which is most con-
venient for describing and implementing the encoding process. Figure 10.10b is an
equivalent representation that is useful in discussing the decoding process.
For an (n, k, K) code, the shift register contains the most recent K * k input
bits; the register is initialized to all zeros.6 The encoder produces n output bits, after
which the oldest k bits from the register are discarded and k new bits are shifted in.
Thus, although the output of n bits depends on K * k input bits, the rate of encod-
ing is n output bits per k input bits. As in a block code, the code rate is therefore
k/n. The most commonly used binary encoders have k = 1 and hence a shift register
length of K. Our example is of a (2, 1, 3) code (Figure 10.10a). The shift register
holds K × k = 3 × 1 bits u_n, u_(n-1), and u_(n-2). For each new input bit u_n, two output
bits v_n1 and v_n2 are produced using the three most recent bits. The first output bit
produced here is from the upper logic circuit (v_n1 = u_n ⊕ u_(n-1) ⊕ u_(n-2)), and the
second output bit is from the lower logic circuit (v_n2 = u_n ⊕ u_(n-2)).
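The encoder just described is easily expressed in code. The sketch below is ours; it implements the two output equations and reproduces the output sequence 11 10 00 01 01 11 00 for the input 1011000 that appears later in this section:

```python
# A sketch (not the book's code) of the (2, 1, 3) encoder described above:
# v_n1 = u_n XOR u_n-1 XOR u_n-2 and v_n2 = u_n XOR u_n-2, register initially all zeros.

def conv_encode(bits):
    u1 = u2 = 0                               # u_{n-1}, u_{n-2}
    pairs = []
    for u in bits:
        pairs.append((u ^ u1 ^ u2, u ^ u2))   # (v_n1, v_n2)
        u1, u2 = u, u1                        # shift the register
    return pairs

output = conv_encode([1, 0, 1, 1, 0, 0, 0])
print(" ".join(f"{a}{b}" for a, b in output))   # 11 10 00 01 01 11 00
```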
For any given input of k bits, there are 2^(k(K-1)) different functions that map
the k input bits into n output bits. Which function is used depends on the history
of the previous (K - 1) blocks of k input bits.
[Footnote 6] In some of the literature, the shift register is shown with one less storage cell and with the input bits feeding the XOR circuits as well as a storage cell; the depictions are equivalent.
Figure 10.10 Convolutional Encoder with n = 2, k = 1, and K = 3: (a) the shift register encoder described above; (b) the equivalent state diagram, with states a = 00, b = 10, c = 01, and d = 11 and branches labeled with the output bits
Decoding
To understand the decoding process, it simplifies matters to expand the state dia-
gram to show the time sequence of the encoder. If the state diagram is laid out
vertically, as shown in Figure 10.10b, then the expanded diagram, called a trellis, is
constructed by reproducing the states horizontally and showing the state transitions
going from left to right corresponding to time, or data input (Figure 10.11). If the
constraint length K is large, then the trellis becomes unwieldy because there would
be a large number of rows. In that case, 2^(K-2) simplified trellis fragments can be used
to depict the transitions. Figure 10.12 demonstrates this for a (2, 1, 7) code. Each of
the states of the encoder is shown, along with the branch definition.

Figure 10.11 Trellis Diagram for the Encoder of Figure 10.10 (encoder states a = 00, b = 10, c = 01, and d = 11 are repeated at times t0 through t7; each branch is labeled input/output: from a, 0/00 to a and 1/11 to b; from b, 0/10 to c and 1/01 to d; from c, 0/11 to a and 1/00 to b; from d, 0/01 to c and 1/10 to d)
Any valid output is defined by a path through the trellis. In our example, the
path a-b-c-b-d-c-a-a produces the output 11 10 00 01 01 11 00 and was generated by
the input 1011000. If an invalid path occurs, such as a-c, then the decoder attempts
error correction. In essence, the decoder must determine what data input was most
likely to have produced the invalid output.
A number of error correction algorithms have been developed for convolu-
tional codes. Perhaps the most important is the Viterbi algorithm. In essence, the
Viterbi technique compares the received sequence with all possible transmitted
sequences. The algorithm chooses a path through the trellis whose coded sequence
differs from the received sequence in the fewest number of places. Once a valid path
is selected as the correct path, the decoder can recover the input data bits from the
most likely output code bits.
There are several variations on the Viterbi algorithm, depending on which
metric is used to measure differences between received sequences and valid
sequences. To give an idea of the operation of the algorithm, we use the common
metric of Hamming distance. We represent a received coded sequence as the word
w = w0w1w2 . . . , and attempt to find the most likely valid path through the trellis.
At each time i and for each state we list the active path (or paths) through the trellis to
the state. An active path is a valid path through the trellis whose Hamming distance
from the received word up to time i is minimal. We label each state at time i by the
distance of its active path from the received word. The following relationship is used:
    (distance of a path) = (distance of the last edge) + (distance of the last-but-one state)        (10.7)
Example 10.10 [Footnote 7] Using the encoder defined in Figures 10.10 and 10.11, Figure 10.13 shows
the application of the Viterbi algorithm to the sequence 10010100101100 . . . , with a
decoding window of length b = 7. Figure 10.13 illustrates the following procedure. The
lines in the figure represent valid paths through the trellis. The label for each line shows
the input data bit that caused that path and the output bits. The bold lines indicate the
current active paths that were chosen at each step that had the minimal
sequences are 00 for edge a-a and 11 for edge a-b as seen in Figure 10.10. For both of
these sequences, there is a distance of 1 from the received sequence. Two active paths
are defined, each with a state label of 1. For the next step, we have w2w3 = 01. Using
Equation (10.7), we compute the total cumulative differences to the four possible valid
states (from top to bottom) as 2, 2, 3, and 1. So far, all possible valid paths are included
as active paths. In Step 3, we see that some valid paths do not survive because they are
not chosen as active paths. This is because each such path terminates on a state for which
there is another valid path that has a smaller distance. For example, the state sequence
a-a-a-a has a discrepancy of 3, while the state sequence a-b-c-a has the discrepancy of 4
and is excluded. At the conclusion of step 7, all active paths pass through the first edge
a-b, which has the output 11. The algorithm corrects w0w1 to 11 and continues with the
next block w2w3. Working backward through the trellis also decodes the original data by
looking at the input bits that caused that path. Note that if the window size b were 5, the
error would not have been correctable because the first edges a-a and a-b are both active.
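A compact implementation helps make the algorithm concrete. The following Python sketch is ours, not the book's listing; it performs hard-decision Viterbi decoding of the (2, 1, 3) code above, keeping one survivor path per state and choosing the final path with the smallest cumulative Hamming distance, and recovers the data block 1011000 after one channel bit error:

```python
# A compact sketch (not the book's code) of hard-decision Viterbi decoding for
# the (2, 1, 3) code of Figure 10.10, using Hamming distance as the path metric.

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def encoder_step(state, bit):
    """Return (next_state, output pair); state is (u_{n-1}, u_{n-2})."""
    u1, u2 = state
    return (bit, u1), (bit ^ u1 ^ u2, bit ^ u2)

def viterbi_decode(received_pairs):
    survivors = {(0, 0): (0, [])}             # state -> (cumulative distance, input bits)
    for received in received_pairs:
        best = {}
        for state, (dist, bits) in survivors.items():
            for bit in (0, 1):
                nxt, out = encoder_step(state, bit)
                cand = (dist + hamming(out, received), bits + [bit])
                if nxt not in best or cand[0] < best[nxt][0]:
                    best[nxt] = cand
        survivors = best
    return min(survivors.values())[1]         # the active path with minimal distance

sent = [(1, 1), (1, 0), (0, 0), (0, 1), (0, 1), (1, 1), (0, 0)]   # encoding of 1011000
received = [(1, 0)] + sent[1:]                                     # one channel bit in error
print(viterbi_decode(received))               # [1, 0, 1, 1, 0, 0, 0]
```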
Turbo Coding
As higher and higher speeds are used in wireless applications, error correction con-
tinues to pose a major design challenge. Recently, a new class of codes, called turbo
codes, has emerged as a popular choice for third and fourth-generation wireless sys-
tems. Turbo codes exhibit performance, in terms of bit error probability, that is very
close to the Shannon limit and can be efficiently implemented for high-speed use.
A number of different turbo encoders and decoders have been introduced, most of
which are based on convolutional encoding. In this subsection, we give a general
overview.
Figure 10.14a depicts a turbo encoder. In this scheme, the encoder is rep-
licated. One copy of the encoder receives a stream of input bits and produces a
single output check bit C1 for each input bit. The input to the other encoder is an
interleaved version of the input bit stream, producing a sequence of C2 check bits.
The initial input bit plus the two check bits are then multiplexed to produce the
sequence I1 C11 C21 I2 C12 C22 . . . , that is, the first input bit followed by the first
check bit from encoder 1, followed by the first check bit from encoder 2, and so on.
The resulting sequence has a code rate of 1/3. A code rate of 1/2 can be achieved by
taking only half of the check bits, alternating between outputs from the two encoders;
this process is known as puncturing.
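The multiplexing and puncturing step can be sketched as follows. The code below is ours and uses made-up check-bit values; a real turbo encoder would produce C1 and C2 with two constituent convolutional encoders, the second fed through an interleaver:

```python
# A sketch (ours, with hypothetical check-bit values) of turbo multiplexing and puncturing.

def turbo_mux(data, c1, c2, puncture=False):
    """Rate 1/3: I1 C11 C21 I2 C12 C22 ...  Rate 1/2: keep alternate check bits."""
    out = []
    for k, (i, a, b) in enumerate(zip(data, c1, c2)):
        out += [i, a if k % 2 == 0 else b] if puncture else [i, a, b]
    return out

data = [1, 0, 1, 1]
c1 = [1, 1, 0, 1]      # hypothetical check bits from encoder 1
c2 = [0, 1, 1, 0]      # hypothetical check bits from encoder 2 (interleaved input)
print(turbo_mux(data, c1, c2))                  # rate-1/3 stream
print(turbo_mux(data, c1, c2, puncture=True))   # rate-1/2 (punctured) stream
```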
[Footnote 7] This example is based on one in [ADAM91].
Figure 10.13 Viterbi Algorithm Applied to the Received Sequence of Example 10.10 (each state label is the cumulative Hamming distance of its active path from the received word):

Step    a    b    c    d
0       0    -    -    -
1       1    1    -    -
2       2    2    3    1
3       3    3    1    2
4       3    1    3    3
5       4    4    1    3
6       1    3    4    4
7       1    3    4    4
Figure 10.14 Turbo Encoding and Decoding ((a) encoder: input bits feed encoder 1 directly and encoder 2 through an interleaver; the input bits and the two check-bit streams C1 and C2 are multiplexed, with optional puncturing, into the output I C1 C2 I C1 C2 . . . ; (b) decoder: the received stream is demultiplexed and depunctured, decoder 1 produces correction values X1 from I′ and C′1, and decoder 2 uses interleaved I′ and X1 together with C′2 to produce correction values X2, which are deinterleaved and fed back to decoder 1)
In the decoder (Figure 10.14b), rather than making an immediate hard decision on each
received bit, the demodulator passes along a measure of confidence in each bit value; this
approach is called soft decision decoding. Decoder 1 then produces correction (X1) values.
The I′ and X1 values are fed into Decoder 2, together with the C′2 values. Interleaving
must be performed to align bits properly. Decoder 2 uses all of its input to pro-
duce correction values X2. These are fed back to Decoder 1 for a second iteration
of the decoding algorithm, being first deinterleaved for alignment. After sufficient
iterations to produce a high level of confidence, an output bit is generated. This
may take several iterations to produce a good result, which could cause significant
delay. Turbo coding’s use of interleaving, parallel encoding, puncturing, soft deci-
sion decoding, and feedback gives it high performance.
10.4 Automatic Repeat Request

Automatic repeat request (ARQ) is a mechanism used in data link control and
transport protocols and relies on the use of an error detection code, such as the
cyclic redundancy check (CRC) described in Section 10.1. The ARQ error control
mechanism is closely related to a flow control mechanism that is also a part of these
protocols. We first examine flow control and then go on to look at ARQ. In what
follows, we refer to the block of data that is transmitted from one protocol entity to
another as a protocol data unit (PDU); this term was introduced in Chapter 4.
Flow Control
Flow control is a technique for assuring that a transmitting entity does not over-
whelm a receiving entity with data. The receiving entity typically allocates a data
buffer of some maximum length for a transfer. When data are received, the receiver
must do a certain amount of processing (e.g., examine the header and strip it off the
PDU) before passing the data to the higher-level software. In the absence of flow
control, the receiver's buffer may fill up and overflow while it is processing old data.
To begin, we examine mechanisms for flow control in the absence of errors. The
model we will use is depicted in Figure 10.17a, which is a vertical-time sequence diagram.
It has the advantages of showing time dependencies and illustrating the correct send-
receive relationship. Each arrow represents a single PDU traversing a data link between
two stations. The data are sent in a sequence of PDUs, with each PDU containing a por-
tion of the data and some control information. For now, we assume that all PDUs that
are transmitted are successfully received; no PDUs are lost and none arrive with errors.
Furthermore, PDUs arrive in the same order in which they are sent. However, each
transmitted PDU suffers an arbitrary and variable amount of delay before reception.
Typically, when a source has a block or stream of data to transmit, the source
will break up the block of data into smaller blocks and transmit the data in many
PDUs. This is done for the following reasons:
• The buffer size of the receiver may be limited.
• The longer the transmission, the more likely that there will be an error, neces-
sitating retransmission of the entire PDU. With smaller PDUs, errors are
detected sooner, and a smaller amount of data needs to be retransmitted.
• On a shared medium, such as a LAN, it is usually desirable not to permit one
station to occupy the medium for an extended period, thus causing long delays
at the other sending stations.
Figure 10.17 Model of PDU Transfer: (a) error-free transmission, in which every PDU sent by the source arrives at the destination; (b) transmission with losses and errors, in which a PDU may arrive garbled or not arrive at all
Typically, protocols that have a flow control mechanism allow multiple PDUs to
be in transit at the same time. Let us examine how this might work for two stations, A
and B, connected via a full-duplex link. Station B allocates buffer space for W PDUs.
Thus B can accept W PDUs, and A is allowed to send W PDUs without waiting for
any acknowledgments. To keep track of which PDUs have been acknowledged, each is
labeled with a sequence number. B acknowledges a PDU by sending an acknowledg-
ment that includes the sequence number of the next PDU expected. This acknowledg-
ment also implicitly announces that B is prepared to receive the next W PDUs, beginning
with the number specified. This scheme can also be used to acknowledge multiple PDUs
to save network overhead. For example, B could receive PDUs 2, 3, and 4 but withhold
acknowledgment until PDU 4 has arrived. By then returning an acknowledgment with
sequence number 5, B acknowledges PDUs 2, 3, and 4 at one time. A maintains a list of
sequence numbers that it is allowed to send, and B maintains a list of sequence numbers
that it is prepared to receive. Each of these lists can be thought of as a window of PDUs.
The operation is referred to as sliding-window flow control.
Several additional comments need to be made. Because the sequence number
to be used occupies a field in the PDU, it is clearly of bounded size. For example,
for a 3-bit field, the sequence number can range from 0 to 7. Accordingly, PDUs are
numbered modulo 8; that is, after sequence number 7, the next number is 0. In gen-
eral, for a k-bit field the range of sequence numbers is 0 through 2^k - 1, and PDUs
are numbered modulo 2^k.
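The modulo arithmetic behind the window is easy to capture in code. The sketch below is ours; it checks whether a sequence number falls inside the current send window for a 3-bit sequence-number field:

```python
# A small sketch (not from the text) of modulo-2^k sequence-number bookkeeping
# for a 3-bit sequence-number field (numbers 0 through 7).

MODULUS = 2 ** 3

def in_window(seq: int, window_start: int, window_size: int) -> bool:
    """True if seq lies in [window_start, window_start + window_size) modulo 8."""
    return (seq - window_start) % MODULUS < window_size

# After B sends RR 3, A may transmit seven PDUs beginning with number 3:
print([s for s in range(MODULUS) if in_window(s, window_start=3, window_size=7)])
# -> [0, 1, 3, 4, 5, 6, 7], i.e., PDUs 3, 4, 5, 6, 7, 0, and 1 (but not 2)
```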
Figure 10.18 is a useful way of depicting the sliding-window process. It assumes
the use of a 3-bit sequence number, so that PDUs are numbered sequentially from 0
through 7, and then the same numbers are reused for subsequent PDUs. The shaded
rectangle indicates the PDUs that may be sent; in this figure, the sender may transmit
five PDUs, beginning with PDU 0. Each time a PDU is sent, the shaded window shrinks;
each time an acknowledgment is received, the shaded window grows. PDUs between
the vertical bar and the shaded window have been sent but not yet acknowledged. As
we shall see, the sender must buffer these PDUs in case they need to be retransmitted.
The window size need not be the maximum possible size for a given
sequence number length. For example, using a 3-bit sequence number, a window
size of 4 could be configured for the stations using the sliding-window flow con-
trol protocol.
The mechanism so far described does indeed provide a form of flow control:
The receiver must only be able to accommodate seven PDUs beyond the one it
has last acknowledged. Most protocols also allow a station to cut off the flow of
PDUs from the other side by sending a Receive Not Ready (RNR) message, which
acknowledges former PDUs but forbids transfer of future PDUs. Thus, RNR 5
means “I have received all PDUs up through number 4 but am unable to accept any
more.” At some subsequent point, the station must send a normal acknowledgment
to reopen the window.
So far, we have discussed transmission in one direction only. If two stations
exchange data, each needs to maintain two windows, one to transmit and one to
receive, and each side needs to send the data and acknowledgments to the other. To
provide efficient support for this requirement, a feature known as piggybacking is
typically provided. Each data PDU includes a field that holds the sequence number
of that PDU plus a field that holds the sequence number used for acknowledgment.
[Figure 10.18 Sliding-Window Depiction: the sender's view shows PDUs already transmitted (buffered until acknowledged) and the window of PDUs that may still be transmitted; the receiver's view shows PDUs already received and the window of PDUs that may be accepted. Sequence numbers cycle 0 through 7 and then repeat.]
Thus, if a station has data to send and an acknowledgment to send, it sends both
together in one PDU, saving communication capacity. Of course, if a station has an
acknowledgment but no data to send, it sends a separate acknowledgment PDU. If
a station has data to send but no new acknowledgment to send, it must repeat the
last acknowledgment that it sent. This is because the data PDU includes a field for
the acknowledgment number, and some value must be put into that field. When a
station receives a duplicate acknowledgment, it simply ignores it.
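As a small illustration of piggybacking (the field names here are assumptions for this sketch, not taken from any specific protocol), each outgoing data PDU can carry an acknowledgment field that is either updated with a new acknowledgment or simply repeats the last one sent:

# Sketch of piggybacking: every data PDU carries both its own sequence number and
# an acknowledgment. If nothing new has arrived to acknowledge, the last
# acknowledgment is repeated; the receiver ignores such duplicates.

from dataclasses import dataclass

@dataclass
class Pdu:
    seq: int      # sequence number of this data PDU
    ack: int      # sequence number of the next PDU expected from the other side
    data: bytes   # payload (empty for a pure acknowledgment PDU)

last_ack_sent = 0  # remembered so it can be repeated when nothing new needs acknowledging

def build_pdu(seq, data, new_ack=None):
    """Attach a fresh acknowledgment if one is pending, else repeat the previous one."""
    global last_ack_sent
    if new_ack is not None:
        last_ack_sent = new_ack
    return Pdu(seq=seq, ack=last_ack_sent, data=data)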
Error Control
Error control refers to mechanisms to detect and correct errors that occur in the
transmission of PDUs. The model that we will use, which covers the typical case, is
illustrated in Figure 10.17b. As before, data are sent as a sequence of PDUs; PDUs
arrive in the same order in which they are sent; and each transmitted PDU suffers
an arbitrary and variable amount of delay before reception. In addition, we admit
the possibility of two types of errors:
• Lost PDU: A PDU fails to arrive at the other side. For example, a noise burst
may damage a PDU to the extent that the receiver is not aware that a PDU
has been transmitted.
Example 10.11 An example is shown in Figure 10.19 and its animation. The example assumes a
3-bit sequence number field and a maximum window size of seven PDUs. Initially, A and B have
windows indicating that A may transmit seven PDUs, beginning with PDU 0 (P0). After transmit-
ting three PDUs (P0, P1, P2) without acknowledgment, A has shrunk its window to four PDUs and
maintains a copy of the three transmitted PDUs. The window indicates that A may transmit four
PDUs, beginning with PDU number 3. B then transmits an RR (Receive Ready) 3, which means
“I have received all PDUs up through PDU number 2 and am ready to receive PDU number 3; in
fact, I am prepared to receive seven PDUs, beginning with PDU number 3.” With this acknowledg-
ment, A once again has permission to transmit seven PDUs, still beginning with PDU 3; also A
may discard the buffered PDUs that have now been acknowledged. A proceeds to transmit PDUs
3, 4, 5, and 6. B returns RR 4, which acknowledges P3, and allows transmission of P4 through the
next instance of P2. By the time this RR reaches A, it has already transmitted P4, P5, and P6, and
therefore A may only open its window to permit sending four PDUs beginning with P7.
[Figure 10.19 Example of a Sliding-Window Protocol: A sends P0, P1, P2; B returns RR 3; A sends P3, P4, P5, P6; B returns RR 4; the send and receive windows at A and B shrink and grow accordingly.]
• Damaged PDU: A recognizable PDU does arrive, but some of the bits are in
error (have been altered during transmission) and cannot be corrected.
The most common techniques for error control are based on some or all of the
following ingredients:
• Error detection: The receiver detects errors and discards PDUs that are in
error.
• Positive acknowledgment: The destination returns a positive acknowledgment
for one or more successfully received, error-free frames.
• Retransmission after timeout: The source retransmits a PDU that has not been
acknowledged after a predetermined amount of time.
• Negative acknowledgment and retransmission: The destination returns a
negative acknowledgment to PDUs in which an error is detected. The source
retransmits such PDUs.
Collectively, these mechanisms are all referred to as automatic repeat request
(ARQ); the effect of ARQ is to turn an unreliable data link into a reliable one. The
most commonly used version of ARQ is known as go-back-N ARQ. Go-back-N
ARQ is based on the sliding-window flow control mechanism just discussed.
In go-back-N ARQ, a station may send a series of PDUs sequentially numbered
modulo some maximum value. The number of unacknowledged PDUs outstand-
ing is determined by window size, using the sliding-window flow control technique.
While no errors occur, the destination will acknowledge incoming PDUs as usual
(RR = receive ready, or piggybacked acknowledgment). If the destination station
detects an error in a PDU, it sends a negative acknowledgment (REJ = reject) for
that PDU. The destination station will discard that PDU and all future incoming PDUs
until the PDU in error is correctly received. Thus the source station, when it receives a
REJ, must retransmit the PDU in error plus all succeeding PDUs that had been trans-
mitted in the interim. Hence the name go-back-N: the source must go back and retransmit these PDUs.
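A minimal sketch of the sender's reaction to a REJ, assuming it keeps its unacknowledged PDUs buffered as described above (the function and parameter names are illustrative):

# Illustrative go-back-N sender logic: on REJ i, retransmit PDU i and every
# later PDU that is still sitting in the retransmission buffer, in order.

def handle_rej(rej_seq, unacked, transmit):
    """unacked: list of (seq, payload) still buffered, oldest first.
    transmit(seq, payload): sends one PDU on the link."""
    go_back = False
    for seq, payload in unacked:
        if seq == rej_seq:
            go_back = True           # everything from here on was discarded by the receiver
        if go_back:
            transmit(seq, payload)

# Example: PDUs 4-7 are outstanding and REJ 5 arrives -> 5, 6, and 7 go out again.
outstanding = [(4, b"d4"), (5, b"d5"), (6, b"d6"), (7, b"d7")]
handle_rej(5, outstanding, lambda s, p: print("retransmit PDU", s))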
Consider that station A is sending PDUs to station B. After each transmission,
A sets an acknowledgment timer for the PDU just transmitted. Suppose that B has
previously successfully received PDU (i - 1) and A has just transmitted PDU i. The
go-back-N technique takes into account the following contingencies:
1. Damaged PDU. If the received PDU is invalid (i.e., B detects an error),
B discards the PDU and takes no further action as the result of that PDU. There
are two subcases:
a. Within a reasonable period of time, A subsequently sends PDU (i + 1).
B receives PDU (i + 1) out of order because it is expecting PDU (i) and
sends a REJ i. A must retransmit PDU i and all subsequent PDUs.
b. A does not soon send additional PDUs. B receives nothing and returns
neither an RR nor a REJ. When A’s timer expires, it transmits an RR PDU
that includes a bit known as the P bit, which is set to 1. B interprets the RR
PDU with a P bit of 1 as a command that must be acknowledged by send-
ing an RR indicating the next PDU that it expects, which is PDU i. When
A receives the RR, it retransmits PDU i.
2. Damaged RR. There are two subcases:
a. B receives PDU i and sends RR (i + 1), which suffers an error in tran-
sit. Because acknowledgments are cumulative (e.g., RR 6 means that all
PDUs through 5 are acknowledged), it may be that A will receive a sub-
sequent RR to a subsequent PDU and that it will arrive before the timer
associated with PDU i expires.
b. If A’s timer expires, it transmits an RR command as in Case 1b. It sets an-
other timer, called the P-bit timer. If B fails to respond to the RR command,
or if its response suffers an error in transit, then A’s P-bit timer will expire.
At this point, A will try again by issuing a new RR command and restarting
the P-bit timer. This procedure is tried for a number of iterations. If A fails
to obtain an acknowledgment after some maximum number of attempts, it
initiates a reset procedure.
3. Damaged REJ. If a REJ is lost, this is equivalent to Case 1b.
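The receiver's side of these contingencies reduces to a simple rule, sketched below under the assumptions of this section (this is an illustration, not a complete protocol): accept only the PDU expected next, discard everything else, and issue at most one REJ per go-back event.

# Illustrative go-back-N receiver: in-order acceptance only. A damaged PDU is
# silently discarded; the first out-of-order PDU triggers a single REJ, and
# further out-of-order arrivals are discarded until the expected PDU shows up.

MOD = 8                 # 3-bit sequence numbers, as in the examples of this section
expected = 0            # next sequence number the receiver will accept
rej_outstanding = False

def receive(seq, damaged, send):
    """Process one arriving PDU; send(msg) returns a control PDU to the sender."""
    global expected, rej_outstanding
    if damaged:
        return                              # Case 1: discard, take no further action
    if seq == expected:
        expected = (expected + 1) % MOD
        rej_outstanding = False
        send(f"RR {expected}")              # cumulative ack (could also be batched or piggybacked)
    elif not rej_outstanding:
        send(f"REJ {expected}")             # ask the sender to go back to 'expected'
        rej_outstanding = True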
Figure 10.20 is an example of the PDU flow for go-back-N ARQ. In this example, the
receiver returns RRs only for even-numbered PDUs. Because of the propagation delay on the line,
by the time that an acknowledgment (positive or negative) arrives back at the send-
ing station, it has already sent two additional PDUs beyond the one being acknowl-
edged. Thus, when a REJ is received to PDU 5, not only PDU 5 but PDUs 6 and
7 must be retransmitted. Thus, the transmitter must keep a copy of all unacknowl-
edged PDUs.
[Figure 10.20 Go-back-N ARQ: A sends PDUs 0 through 4 while B returns RR 2 and RR 4; PDU 5 arrives in error, so B discards PDUs 5-7 and returns REJ 5, and A retransmits PDUs 5, 6, and 7. Later an RR from B is lost in transit; when A's timer expires it sends an RR with the P bit set to 1, B responds with RR 2, and A resumes with PDU 2.]
Example 10.12 Consider an FEC coder that produces a rate-1/3 code that is punctured to
become a rate-1/2 code. Say there are 100 bits of data that become a 300-bit FEC codeword.
To become a rate-1/2 FEC codeword, there need to be 2 bits of codeword for every 1 bit
of data, hence a 200-bit codeword. This means 100 bits, 1 of every 3 bits of the original
FEC code, need to be punctured. At the receiver, the missing 100 bits would be replaced
before decoding. These could simply be replaced with random values, in which case roughly
50 of them would coincidentally be correct and the other 50 incorrect. The original FEC
code may still be effective enough to correct those errors if the received signal-to-noise
ratio is relatively good. If not, a later retransmission might use less puncturing or
puncture different bits.
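A minimal sketch of the operation described in Example 10.12, assuming the simplest possible puncturing pattern (drop every third coded bit); real systems define specific patterns, and a real receiver would more likely mark the missing positions as erasures rather than guess them:

# Sketch of puncturing a rate-1/3 codeword to rate 1/2 and restoring the punctured
# positions at the receiver. The pattern (drop every third bit) is an assumption
# for illustration only.

import random

def puncture(codeword):
    """Keep 2 of every 3 coded bits: a 300-bit codeword becomes 200 bits."""
    return [b for i, b in enumerate(codeword) if i % 3 != 2]

def depuncture(received):
    """Rebuild a full-length word before FEC decoding by filling each punctured
    position with a random guess, so roughly half of the fills are correct."""
    restored, it = [], iter(received)
    for i in range(len(received) * 3 // 2):
        restored.append(next(it) if i % 3 != 2 else random.randint(0, 1))
    return restored

codeword = [random.randint(0, 1) for _ in range(300)]  # stand-in for a rate-1/3 codeword
sent = puncture(codeword)                               # 200 bits transmitted: rate 1/2
guess = depuncture(sent)                                # 300 bits handed to the FEC decoder
print(len(sent), len(guess))                            # 200 300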
rates allows the decoder structure to remain the same, instead of having multiple
decoders for different code rates. The benefits of this reduction in complexity can
outweigh the reduction in performance from puncturing. Used with HARQ incre-
mental redundancy, puncturing will take a single output from an FEC coder and
remove more or different bits each time.
• Adaptive modulation and coding: Systems will use channel quality information
(CQI) to estimate the best modulation and coding to work with HARQ. For
example, fourth-generation cellular LTE uses the CQI to determine the high-
est modulation and coding rate that would provide a 10% block error rate for
the first HARQ transmission. Also, if the CQI changes in the middle of an
HARQ process, the modulation and coding might be adapted.
• Parallel HARQ processes: Some systems wait until the HARQ process finishes
for one frame before sending the next frame; this is known as a stop-and-wait
protocol. Waiting for an ACK or NACK, possibly followed by multiple retransmissions,
can be time consuming, however. Therefore, some HARQ implementations allow multiple
HARQ operations to be open at the same time. This is known as an N-channel
Stop-and-Wait protocol.
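The sketch below illustrates the idea of N parallel stop-and-wait HARQ processes (the process count, retry limit, and class names are assumptions for this example): while one process waits for its ACK or NACK, a new frame can start on any idle process, so the link does not sit idle.

# Illustrative N-channel stop-and-wait HARQ: each process handles one frame at a
# time, but several processes run in parallel so the channel stays busy while any
# one of them is waiting for ACK/NACK feedback.

N_PROCESSES = 4     # number of parallel HARQ processes (assumed for this sketch)
MAX_ATTEMPTS = 3    # transmissions allowed per frame before giving up

class HarqProcess:
    def __init__(self, pid):
        self.pid = pid
        self.frame = None        # frame in flight, or None if the process is idle
        self.attempts = 0

    def start(self, frame):
        """Begin handling a new frame: first transmission."""
        self.frame, self.attempts = frame, 1

    def on_feedback(self, ack):
        """ACK frees the process; NACK triggers a retransmission (possibly with
        different redundancy, as in incremental redundancy) up to the retry limit."""
        if ack or self.attempts >= MAX_ATTEMPTS:
            self.frame, self.attempts = None, 0   # delivered, or abandoned to higher layers
        else:
            self.attempts += 1                    # retransmit on this same process

processes = [HarqProcess(i) for i in range(N_PROCESSES)]

def next_idle_process():
    """A new frame may start on any idle process while the others await feedback."""
    return next((p for p in processes if p.frame is None), None)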
Recommended Reading
The classic treatment of error detection codes and CRC is [PETE61]. [RAMA88] is an excellent tutorial on CRC.
[ADAM91] provides comprehensive treatment of error correction codes. [SKLA01]
contains a clear, well-written section on the subject. Two useful survey articles are [BERL87]
and [BHAR83]. A quite readable theoretical and mathematical treatment of error correction
codes is [ASH90].
Two good treatments of turbo codes are [SKLA97] and [BERR96]. A more detailed analy-
sis is [VUCE00]. Hybrid ARQ is discussed in [GHOS11] in several places when describing LTE.
BERR96 Berrou, C., and Glavieux, A. “Near Optimum Error Correcting Codes and
Decoding: Turbo Codes.” IEEE Transactions on Communications, October 1996.
BHAR83 Bhargava, V. “Forward Error Correction Schemes for Digital Communica-
tions.” IEEE Communications Magazine, January 1983.
GHOS11 Ghosh, A.; Zhang, J.; Andrews, J.; and Muhamed, R. Fundamentals of LTE.
Upper Saddle River, NJ: Prentice Hall, 2011.
PETE61 Peterson, W., and Brown, D. “Cyclic Codes for Error Detection.” Proceedings
of the IEEE, January 1961.
RAMA88 Ramabadran, T., and Gaitonde, S. “A Tutorial on CRC Computations.”
IEEE Micro, August 1988.
SKLA97 Sklar, B. “A Primer on Turbo Code Concepts.” IEEE Communications
Magazine, December 1997.
SKLA01 Sklar, B. Digital Communications: Fundamentals and Applications. Upper
Saddle River, NJ: Prentice Hall, 2001.
VUCE00 Vucetic, B., and Yuan, J. Turbo Codes: Principles and Applications. Boston:
Kluwer Academic Publishers, 2000.
Key Terms
Review Questions
10.1 What is a parity bit?
10.2 What is the CRC?
10.3 Why would you expect a CRC to detect more errors than a parity bit?
10.4 List three different ways in which the CRC algorithm can be described.
10.5 Is it possible to design an FEC that will correct some double-bit errors but not all
double-bit errors? Why or why not?
10.6 In an (n, k) block FEC, what do n and k represent?
10.7 In an (n, k, K) convolutional code, what do n, k, and K represent?
Problems
10.1 What is the purpose of using modulo 2 arithmetic rather than binary arithmetic in
computing an FCS?
10.2 Consider a frame consisting of two characters of four bits each. Assume that the prob-
ability of bit error is 10-3 and that it is independent for each bit.
a. What is the probability that the received frame contains at least one error?
b. Now add a parity bit to each character. What is the probability?
10.3 Using the CRC-CCITT polynomial, generate the 16-bit CRC code for a message con-
sisting of a 1 followed by 15 0s.
a. Use long division.
b. Use the shift register mechanism shown in Figure 10.4.
10.4 Explain in words why the shift register implementation of CRC will result in all 0s at
the receiver if there are no errors. Demonstrate by example.
10.5 For P = 110011 and M = 11100011, find the CRC.
10.6 A CRC is constructed to generate a 4-bit FCS for an 11-bit message. The generator
polynomial is X^4 + X^3 + 1.
a. Draw the shift register circuit that would perform this task (see Figure 10.4).
b. Encode the data bit sequence 10011011100 (leftmost bit is the least significant)
using the generator polynomial and give the codeword.
c. Now assume that bit 7 (counting from the least significant bit) in the codeword is
in error and show that the detection algorithm detects the error.
10.7 A modified CRC procedure is commonly used in communications standards such
as HDLC. It is defined as follows:
[Figure 10.21 Another CRC Architecture to Implement Divisor 1 + A1X + A2X^2 + ... + A(n-k-1)X^(n-k-1) + X^(n-k): a shift register of cells C(n-k-1), ..., C1, C0 with XOR feedback taps A(n-k-1), ..., A2, A1, a delay of n - k shifts applied to the k-bit input, and two two-position switches (Switch 1 and Switch 2).]
is that it shows the correspondence with the long division process more clearly.
Explain.
d. What is a disadvantage of the structure in Figure 10.21?
10.9 Calculate the Hamming pairwise distances among the following codewords:
a. 00000, 10101, 01010
b. 000000, 010101, 101010, 110110
10.10 Section 10.2 discusses block error correction codes that make a decision on the basis
of minimum distance. That is, given a code consisting of s equally likely codewords of
length n, for each received sequence v, the receiver selects the codeword w for which
the distance d(w, v) is a minimum. We would like to prove that this scheme is “ideal”
in the sense that the receiver always selects the codeword for which the probability of
w given v, p(w|v), is a maximum. Because all codewords are assumed equally likely,
the codeword that maximizes p(w|v) is the same as the codeword that maximizes p(v|w).
a. In order that w be received as v, there must be exactly d(w, v) errors in transmis-
sion, and these errors must occur in those bits where w and v disagree. Let b be
the probability that a given bit is transmitted incorrectly and n be the length of
a codeword. Write an expression for p(v|w) as a function of b, d(w, v), and n.
Hint: the number of bits in error is d(w, v) and the number of bits not in error is
n - d(w, v).
b. Now compare p(v|w1) and p(v|w2) for two different codewords w1 and w2 by
calculating p(v|w1)/p(v|w2).
c. Assume that 0 < b < 0.5 and show that p(v|w1) > p(v|w2) if and only if
d(v, w1) < d(v, w2). This proves that the codeword w that gives the largest value
of p(v|w) is that word whose distance from v is a minimum.
10.11 Section 10.2 states that for a given positive integer t, if a code satisfies dmin ≥ 2t + 1,
then the code can correct all bit errors up to and including errors of t bits. Prove this
assertion. Hint: Start by observing that for a codeword w to be decoded as another
codeword w′, the received sequence must be at least as close to w′ as to w.
10.12 For the Hamming code shown in Table 10.2, show the formulas used to calculate the
check bits as functions of the data bits.
10.13 For the Hamming code shown in Table 10.2, show what happens when a check bit
rather than a data bit is received in error.
10.14 Suppose an 8-bit data word stored in memory is 11000010. Using the Hamming algo-
rithm, determine what check bits would be stored in memory with the data word.
Show how you got your answer.
10.15 For the 8-bit word 00111001, the check bits stored with it would be 0111. Suppose
when the word is read from memory, the check bits are calculated to be 1101. What is
the data word that was read from memory?
10.16 How many check bits are needed if the Hamming error correction code is used to
detect single-bit errors in a 1024-bit data word?
10.17 Divide f(X) = X^6 + 1 by g(X) = X^4 + X^3 + X + 1. Verify the result by multiply-
ing the quotient by g(X) to recover f(X).
10.18 For the example related to Table 10.3:
a. Draw the LFSR.
b. Using a layout similar to Figure 10.3b, show that the check bits for the data block
1010 are 001.
10.19 Using an interleaving structure of n = 4, m = 6 (Figure 10.9), demonstrate each of
the following block interleaving characteristics:
a. Any burst of m contiguous channel bit errors results in isolated errors at the dein-
terleaver output that are separated from each other by at least n bits.
b. Any burst of bm errors (b > 1) results in output bursts from the deinterleaver of
no more than ⌈b⌉ errors. Each output burst is separated from the other bursts by
no less than ⌊b⌋ bits. The notation ⌈b⌉ means the smallest integer no less than b,
and ⌊b⌋ means the largest integer no greater than b.
c. A periodic sequence of single-bit errors spaced m bits apart results in a single
burst of errors of length n at the deinterleaver output.
d. Not including channel propagation delay, the interleaver end-to-end delay is
2(n(m - 1) + 1). Only (n(m - 1) + 1) cells need to be filled before transmission
can begin and a corresponding number needs to be filled at the receiver before
deinterleaving can begin.
10.20 Consider a convolutional encoder defined by (vn1 = un ⊕ un-2) and (vn2 = un-1 ⊕ un-2).
a. Draw a shift register implementation for this encoder similar to Figure 10.10a.
b. Draw a state diagram for this encoder similar to Figure 10.10b.
c. Draw a trellis diagram for this encoder similar to Figure 10.11.
10.21 For the encoder of Problem 10.20, assume that the shift register is initialized to all
zeros and that after the transmission of the last information bit, two zero bits are
transmitted.
a. Why are the two extra bits needed?
b. What is the encoded sequence corresponding to the information sequence
1101011, where the leftmost bit is the first bit presented to the encoder?
10.22 The simplest form of flow control, known as stop-and-wait flow control, works
as follows. A source entity transmits a frame. After the destination entity receives
the frame, it indicates its willingness to accept another frame by sending back an
acknowledgment to the frame just received. The source must wait until it receives the
acknowledgment before sending the next frame. The destination can thus stop the
flow of data simply by withholding acknowledgment. Consider a half-duplex point-
to-point link using a stop-and-wait scheme, in which a series of messages is sent, with
each message segmented into a number of frames. Ignore errors and frame overhead.
a. What is the effect on line utilization of increasing the message size so that fewer
messages will be required? Other factors remain constant.
[Figure 10.22: node A connects to node B over a 4000-km line, and node B connects to node C over a 1000-km line.]
b. What is the effect on line utilization of increasing the number of frames for a
constant message size?
c. What is the effect on line utilization of increasing frame size?
10.23 In Figure 10.22, frames are generated at node A and sent to node C through
node B. Determine the minimum data rate required between nodes B and C so that
the buffers of node B are not flooded, based on the following:
• The data rate between A and B is 100 kbps.
• The propagation delay is 5 ms/km for both lines.
• There are full duplex lines between the nodes.
• All data frames are 1000 bits long; ACK frames are separate frames of negligible
length.
• Between A and B, a sliding-window protocol with a window size of 3 is used.
• Between B and C, stop-and-wait is used.
• There are no errors.
Hint: In order not to flood the buffers of B, the average number of frames entering
and leaving B must be the same over a long interval.
10.24 A channel has a data rate of R bps and a propagation delay of t seconds per kilome-
ter. The distance between the sending and receiving nodes is L kilometers. Nodes
exchange fixed-size frames of B bits. Find a formula that gives the minimum sequence
field size of the frame as a function of R, t, B, and L (considering maximum utiliza-
tion). Assume that ACK frames are negligible in size and the processing at the nodes
is instantaneous.
10.25 Two neighboring nodes (A and B) use a sliding-window protocol with a 3-bit
sequence number. As the ARQ mechanism, go-back-N is used with a window size of
4. Assuming A is transmitting and B is receiving, show the window positions for the
following succession of events:
a. Before A sends any frames
b. After A sends frames 0, 1, 2 and receives acknowledgment from B for 0 and 1
c. After A sends frames 3, 4, and 5 and B acknowledges 4 and the ACK is received
by A
10.26 Two stations communicate via a 1-Mbps satellite link with a propagation delay of
270 ms. The satellite serves merely to retransmit data received from one station to
another, with negligible switching delay. Using frames of 1024 bits with 3-bit sequence
numbers, what is the maximum possible data throughput; that is, what is the through-
put of data bits carried in frames?