Viterbi Decoding Techniques For The TMS320C55x DSP Generation
In practice, this state's metric is only slightly lower than the others, because all paths with
drastically lower metrics have already been eliminated. Some more advanced forms of the VA look at
two or more states with the lowest accumulated distances and pick the actual path based on other
criteria.
SPRA776A
2.8 Soft Versus Hard Decisions
Local distances are calculated for each possible path state in the metric update, producing a
probability measure that the received data was sent as that symbol. The method used to
calculate these local distances depends on the representation of the received data. If the data is
represented by a single bit, it is referred to as hard-decision data and Hamming distance
measures are used. When the data is represented by multiple bits, it is referred to as
soft-decision data and Euclidean distance measures are used.
The use of soft-decision inputs can provide up to about 2.2 dB more $E_b/N_0$ at the same bit-error
level (for 4-bit data). This is because the received data contains some information on the
reliability of the data. Table 1 lists values and their significance for 3-bit quantized inputs.
Table 1. Soft-Decision Values

Value     Significance
011       Most confident positive value
010
001
000       Least confident positive value
(none)    Null value
111       Least confident negative value
110
101
100       Most confident negative value
These soft-decision values typically come from a Viterbi equalizer, which reduces intersymbol
interference. This produces confidence values based on differences between received and
expected data.
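As a small illustration of Table 1, the sketch below reads a 3-bit soft-decision code as a signed number. It assumes the codes are two's-complement values, which matches the ordering in the table but is not stated explicitly in the text; the function name `soft_value` is hypothetical.

```python
def soft_value(code):
    """Interpret a 3-bit soft-decision code as a signed value in [-4, 3].

    Assumption: the codes in Table 1 are 3-bit two's-complement numbers.
    """
    if not 0 <= code <= 0b111:
        raise ValueError("expected a 3-bit code")
    # Bit 2 is the sign bit; subtract 8 to get the negative value.
    return code - 8 if code & 0b100 else code


for code in (0b011, 0b000, 0b111, 0b100):
    print(f"{code:03b} -> {soft_value(code):+d}")
```

Under this reading, the magnitude of the value tracks confidence: 000 and 111 sit closest to the (uncoded) null point, and 011 and 100 sit farthest from it.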
2.9 Local-Distance Calculation
With hard-decision inputs, the local distance used is the Hamming distance. This is calculated
by summing the individual bit differences between received and expected data. With
soft-decision inputs, the Euclidean distance is typically used. This is defined (for rate 1/C) by:
$$\text{local\_distance}(j) = \sum_{n=0}^{C-1} \left[ SD_n - G_n(j) \right]^2 \qquad (3)$$

where $SD_n$ are the soft-decision inputs, $G_n(j)$ are the expected inputs for each path state, $j$ is an
indicator of the path, and $C$ is the inverse of the coding rate. This distance measure is the
(squared) C-dimensional vector length from the received data to the expected data.
Expanding equation 3:

$$\text{local\_distance}(j) = \sum_{n=0}^{C-1} \left[ SD_n^2 - 2\,SD_n\,G_n(j) + G_n^2(j) \right] \qquad (4)$$
To minimize the accumulated distance, we are concerned with the portions of the equation that
are different for each path. The terms $\sum_{n=0}^{C-1} SD_n^2$ and $\sum_{n=0}^{C-1} G_n^2(j)$ are the same for all
paths, thus they can be eliminated, reducing the equation to:

$$\text{local\_distance}(j) = -2 \sum_{n=0}^{C-1} SD_n\,G_n(j) \qquad (5)$$
Because of the leading negative sign, the local distance is at its minimum when the sum of
products is at its maximum. The leading -2 scalar is therefore removed, and the maximums are
searched for in the metric update procedure. This equation is a sum of the products of the received
and expected values on a bit-by-bit basis. Table 2 expands this equation for several coding rates.
Table 2. Local-Distance Values

RATE    LOCAL_DISTANCE(J)
1/2     $SD_0 G_0(j) + SD_1 G_1(j)$
1/3     $SD_0 G_0(j) + SD_1 G_1(j) + SD_2 G_2(j)$
1/4     $SD_0 G_0(j) + SD_1 G_1(j) + SD_2 G_2(j) + SD_3 G_3(j)$
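The reduction from the squared Euclidean distance to a sum of products can be checked numerically. The following is a minimal sketch, assuming antipodal expected values of +1 and -1 and a made-up soft-decision pair; the function names are illustrative only.

```python
from itertools import product


def squared_distance(sd, g):
    """Squared Euclidean distance between received and expected symbols."""
    return sum((s - e) ** 2 for s, e in zip(sd, g))


def correlation(sd, g):
    """Reduced metric: the sum of products SD_n * G_n(j)."""
    return sum(s * e for s, e in zip(sd, g))


sd = (3, -2)  # hypothetical soft-decision inputs for one rate 1/2 symbol
paths = list(product((+1, -1), repeat=len(sd)))  # every antipodal expected pair

# The path with the smallest distance is the path with the largest correlation.
best_by_distance = min(paths, key=lambda g: squared_distance(sd, g))
best_by_correlation = max(paths, key=lambda g: correlation(sd, g))
print(best_by_distance, best_by_correlation)
```

For antipodal expected values, every $G_n^2(j)$ equals 1, so the distance differs from $-2$ times the correlation only by a constant that is identical for all paths.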
The dependence of $G_n$ on the path is due to the mapping of specific path states to the trellis
structure, as determined by the encoder polynomials. Conversely, the $SD_n$ values represent the
received data and have no dependence on the current state. The local-distance calculation
differs depending on which new state is being evaluated.
The $G_n(j)$s are coded as signed antipodal values, meaning that 0 corresponds to +1 and 1
corresponds to -1. This representation allows the equations to be even further reduced to
simple sums and differences in the received data. For a rate 1/n system, there are only $2^n$
unique local distances at each symbol interval. Since half of these local distances are simply the
negatives of the other half, only $2^{n-1}$ values must be calculated and/or stored.
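The storage saving described above can be sketched as follows. This is an illustration under the stated antipodal mapping (0 to +1, 1 to -1), not the application note's implementation; the names `unique_local_distances` and `local_distance` are hypothetical.

```python
from itertools import product


def unique_local_distances(sd):
    """Compute the 2**(n-1) local distances whose first expected bit is 0.

    Keys are expected-bit tuples; bit 0 maps to +1 and bit 1 maps to -1.
    """
    n = len(sd)
    table = {}
    for tail in product((0, 1), repeat=n - 1):
        bits = (0,) + tail
        table[bits] = sum(s * (1 - 2 * b) for s, b in zip(sd, bits))
    return table


def local_distance(table, bits):
    """Look up any of the 2**n distances, negating for complementary bits."""
    if bits[0] == 0:
        return table[bits]
    complement = tuple(1 - b for b in bits)
    return -table[complement]


table = unique_local_distances((3, -2))  # hypothetical rate 1/2 inputs
print(table, local_distance(table, (1, 0)))
```

For a rate 1/2 system the two stored values are simply the sum and the difference of the two soft-decision inputs.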
2.10 Puncturing
Puncturing is a method of raising the coding rate (that is, reducing the number of transmitted
bits) by deleting symbols from the encoded data. The decoder determines which symbols were
deleted and replaces them, a process called depuncturing. While this has the effect of introducing
errors, the magnitude of the errors is reduced by the use of soft-decision data and null symbols,
which are halfway between a positive and negative value. These null symbols add very little bias
to the accumulated metrics. In some coding schemes, no null value exists, requiring the
depuncturing to use alternately the smallest positive and negative values. Using the coding
scheme in Table 1, the punctured symbols are replaced by 000, then 111, etc. As expected, the
performance of punctured codes is not equal to that of their nonpunctured counterparts, but the
increased coding rate is worth the decreased performance.
For example, consider a 1/2-rate system punctured by deleting every 4th bit, a puncturing rate of
3/4. This means that the coding rate increases to

$$\frac{1/2}{3/4} = \frac{2}{3}$$
The input sequence I(0) I(1) I(2) I(3) . . . is coded as:

$G_0(0)$ $G_1(0)$ $G_0(1)$ $G_1(1)$ $G_0(2)$ $G_1(2)$ $G_0(3)$ $G_1(3)$ ...

then is punctured and becomes:

$G_0(0)$ $G_1(0)$ $G_0(1)$ X $G_0(2)$ $G_1(2)$ X $G_1(3)$ ...

Usually, the deleted bit, represented as X, alternates between $G_0$ and $G_1$.
The bits are recombined and transmitted as:

$G_0(0)$ $G_1(0)$ $G_0(1)$ $G_0(2)$ $G_1(2)$ $G_1(3)$ ...

Assuming the receiver is using 3-bit soft-decision inputs as shown in Table 1, the depunctured
data appears as:

$G_0(0)$ $G_1(0)$ $G_0(1)$ 000 $G_0(2)$ $G_1(2)$ 111 $G_1(3)$

and the normal Viterbi decoding process then is performed.
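The puncturing example above can be sketched in a few lines. The keep-pattern below is read off the example sequence (the first X replaces $G_1(1)$, the second replaces $G_0(3)$); the fill values 0 and -1 assume that the codes 000 and 111 are two's-complement soft values. The names `KEEP`, `FILL`, `puncture`, and `depuncture` are illustrative.

```python
KEEP = (1, 1, 1, 0, 1, 1, 0, 1)  # 1 = transmit, 0 = delete (period of 8 coded bits)
FILL = (0, -1)                   # codes 000 and 111, read as signed soft values


def puncture(symbols, keep=KEEP):
    """Drop the symbols whose position in the pattern is marked 0."""
    return [s for i, s in enumerate(symbols) if keep[i % len(keep)]]


def depuncture(received, total, keep=KEEP, fill=FILL):
    """Re-insert alternating smallest positive/negative values at deleted slots."""
    out, source, deleted = [], iter(received), 0
    for i in range(total):
        if keep[i % len(keep)]:
            out.append(next(source))
        else:
            out.append(fill[deleted % len(fill)])
            deleted += 1
    return out


coded = ["G0(0)", "G1(0)", "G0(1)", "G1(1)", "G0(2)", "G1(2)", "G0(3)", "G1(3)"]
sent = puncture(coded)
restored = depuncture(sent, len(coded))
print(sent)
print(restored)
```

The symbolic labels stand in for received soft values; in a real receiver the list would hold the quantized inputs themselves.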
3 TMS320C55x Code for Viterbi Decoding
The TMS320C55x code for Viterbi decoding can be divided into three parts: initialization, metric
update, and traceback. These same code segments, with slight modifications, are used on
systems with different constraint lengths, frame sizes, and code rates.
3.1 Initialization
Before Viterbi decoding begins, a number of events must occur:
- The processing mode is configured with:
    - Sign extension mode on (SXMD = 1)
    - Dual 16-bit accumulator mode on (C16 = 1), to enable simultaneous metric update of two
      trellis paths
- The required buffers and pointers are set:
    - Input buffer
    - Output buffer
    - Transition table
    - Metric storage (circular buffers must be set up and enabled)
- Metric values are initialized.
- The block-repeat counter is loaded with (number of output bits - 1) for the metric update.
The input-data buffer is a linear buffer of size FS/CR words, where FS is the original frame size
in bits, and CR is the overall coding rate including puncturing. This buffer is larger than the frame
size because each transmitted bit is received as a multibit word (for soft-decision data). Since
these values are typically four bits or less, they can be packed to save space.
The output buffer contains single-bit values for each symbol period. These bits are packed, so
that they require a linear buffer of size FS/16 words.
The transition table size in words is determined by the constraint length and the frame size
($2^{K-5} \times FS$ = number of states/16 $\times$ the frame size).
Metric storage requires two buffers, each with a size equal to the number of states ($2^{K-1}$). To
minimize pointer manipulation, these buffers are usually configured as a single circular buffer.
All states, except state 0, are set to the same initial metric value. State 0 is the starting state and
requires an initial bias. State 0 is usually set to a value of 0, while all other states are set to the
minimum possible value (0x8000), providing room for growth as the metrics are updated.
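The buffer sizes and initial metric values described in this section can be tabulated with a short sketch. It assumes unpacked soft-decision inputs (one word per received symbol), at least one transition word per symbol interval when K is less than 5, and that 0x8000 is read as the 16-bit signed value -32768; the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil


def buffer_sizes(K, FS, CR):
    """Return buffer sizes in 16-bit words for constraint length K."""
    states = 2 ** (K - 1)
    return {
        "input_words": ceil(FS / CR),                    # FS/CR, unpacked
        "output_words": ceil(FS / 16),                   # packed decoded bits
        "transition_words": max(1, states // 16) * FS,   # 2**(K-5) * FS
        "metric_words": 2 * states,                      # old + new metrics
    }


def initial_metrics(K):
    """State 0 starts at 0; every other state starts at the 16-bit minimum."""
    return [0] + [-0x8000] * (2 ** (K - 1) - 1)


sizes = buffer_sizes(K=5, FS=189, CR=Fraction(1, 2))
print(sizes, initial_metrics(5)[:3])
```

The example call uses the K = 5, rate 1/2, 189-bit frame that appears later in the benchmark table.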
3.2 Metric Update
Most of the calculation time is spent on the metric update, since all of the states must be
updated at each symbol interval. The calculations involved in the four steps of the metric update
for one state follow.
1. Calculate local distance of input to each possible path.
The local distance can be described as a sum of products; for example, $SD_0 G_0(j) + SD_1 G_1(j)$ for
a rate 1/2 system. This is a straightforward add/subtract/accumulate procedure. Only $2^{n-1}$ local
distances must be calculated for a rate 1/n system, since one half are the negatives of the other
half. The negated local distances are accommodated via subtraction in the total distance
accumulation.
2. Accumulate total distance for each path.
Due to its splittable ALU, dual accumulators, and specialized instructions, the C55x can
accumulate metrics for two paths in a single cycle if the local distance is stored in a Tx register.
The dual add/subtract instruction, ADDSUB, adds a Tx register to a value from memory, stores
the total in the lower half of the accumulator, subtracts the Tx register from the next memory
location and stores the result in the upper half of the accumulator.
3. Select and save minimum distance.
4. Save indication of path taken.
These previous two steps are accomplished in a single cycle, due to another specialized C55x
instruction. The MAXDIFF instruction:
- Compares four 16-bit signed values in the upper and lower halves of two accumulators
- Stores the maximum values to an accumulator
- Shifts decision bits into TRN0 and TRN1, from the MSBs to the LSBs, according to the
  extrema found
Because the metric is the sign-inverted distance (the leading -2 scalar was dropped), the maximum
selected here corresponds to the minimum accumulated distance, and the decision bits indicate the
path associated with this value. The previous state values are not stored; they are reconstructed
in the traceback routine from the transition register.
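The four steps above amount to an add-compare-select operation on a pair of old states. The following is a behavioral sketch in Python, not a model of the C55x instructions; the decision-bit polarity (0 when the even old state survives) is an assumption made for illustration, and the name `butterfly` is hypothetical.

```python
def butterfly(old_even, old_odd, dist):
    """Add-compare-select for one butterfly.

    Returns (new_low, bit_low, new_high, bit_high), where the low state is
    new state i and the high state is new state i + 2**(K-2).
    """
    # Step 2: dual add/subtract forms both candidates for each new state.
    low_from_even, low_from_odd = old_even + dist, old_odd - dist
    high_from_even, high_from_odd = old_even - dist, old_odd + dist

    # Steps 3 and 4: keep the larger metric and record which path supplied it.
    if low_from_even >= low_from_odd:
        new_low, bit_low = low_from_even, 0
    else:
        new_low, bit_low = low_from_odd, 1
    if high_from_even >= high_from_odd:
        new_high, bit_high = high_from_even, 0
    else:
        new_high, bit_high = high_from_odd, 1
    return new_low, bit_low, new_high, bit_high


print(butterfly(10, 4, 5))
```

On the DSP the two comparisons and both decision-bit shifts happen in one instruction; the sketch only spells out the arithmetic.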
3.3 Symmetry for Simplification
For rate 1/n systems, some inherent symmetry in the trellis structure is used to simplify these
calculations.
C55x is a trademark of Texas Instruments.
The path states associated with the two paths leading to a delay state are complementary. If one
path has $G_0 G_1 = 00$, the other path has $G_0 G_1 = 11$. This symmetry is a function of the encoder
polynomials, so it is true in most systems, but not all.
Two starting and ending states are paired in a butterfly structure including all paths between
them. The four path states in a butterfly also have symmetry as previously described (see
Figure 4).
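[Figure 4: butterfly diagram. Old_Metric(0) and Old_Metric(1) (delay states 00 and 01) each
connect to New_Metric(0) and New_Metric(2) (delay states 00 and 10); the four paths are labeled
with the path states 00, 11, 11, and 00.]
Figure 4. Butterfly Structure for K = 3, Rate 1/2 Convolutional Encoder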
These symmetries provide methods to simplify the metric update procedure:
- Only one local-distance measure is needed for each butterfly; it is alternately added and
  subtracted for each new state.
- The prior accumulated metrics (old metric values) are the same for the updates of both new
  states, minimizing address manipulations.
For these reasons and to satisfy pipeline latencies, the metric update is usually performed on
butterflies.
Since rate 1/n systems have $2^{n-1}$ absolute local distances for each symbol interval, many
butterflies share the same local distances. The local distances are calculated and stored before
the rest of the metric update. The following is the C55x code for a single butterfly in steady state:
; AR5: pointer to the old metrics table
; AR4: pointer to the new metrics table
; T2 = SD(2*j) - SD(2*j+1)
                                      ;Compute New_metric(i)&(i+8)
    hi(AC0) = *AR5+ - T2,             ;AC0=Old_Met(2*j) -T2
    lo(AC0) = *AR5+ + T2              ;AC0=Old_met(2*j+1)+T2
    hi(AC1) = *AR5+ + T2,             ;AC1=Old_Met(2*j) +T2
    lo(AC1) = *AR5+ - T2              ;AC1=Old_met(2*j+1)-T2
    max_diff(AC0, AC1, AC2, AC1)      ;Compare AC0, AC1
    ||*AR4(T0) = lo(AC2),             ;Store New_metric(i-1)&(i-1+8)
    *AR4+ = hi(AC2)
Three instructions are required to update two states. The states are updated in consecutive
order to simplify pointer manipulation. In many systems, the same local distance is used in
consecutive butterflies.
3.4 Use of Buffers
Two buffers are used in the metric update: one for the old accumulated metrics and one for the
new metrics. Each array is $2^{K-1}$ words, equal to the number of delay states. The old metrics are
accessed in consecutive order, requiring one pointer. The new metrics are updated in the order
0, $2^{K-2}$, 1, $2^{K-2}+1$, 2, $2^{K-2}+2$, etc., and require two pointers for addressing. At the end of the
metric update, these buffers are swapped, so that the recently updated metrics become the old
metrics for the next symbol interval.
In addition to the metrics buffers, the transition registers must be stored. Since only one bit per
state is required to indicate the survivor path, one word of memory is required for each group of
16 states. Transition register (TRNx) storage therefore requires $2^{K-5}$ words of memory per
symbol interval.
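The interleaved write order for the new metrics follows directly from the butterfly pairing. The helper below, with the hypothetical name `new_metric_order`, is only a sketch that lists that order for a given constraint length.

```python
def new_metric_order(K):
    """List new-state indices in the order the butterflies write them."""
    half = 2 ** (K - 2)
    order = []
    for i in range(half):
        # Butterfly i produces new state i and new state i + 2**(K-2).
        order += [i, i + half]
    return order


print(new_metric_order(5))
```

Each butterfly writes one state in the lower half of the buffer and one in the upper half, which is why two write pointers (or one pointer plus a fixed offset) are needed.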
3.5 Example Metric Update
Table 3 provides an example of the metric-update procedure for a K = 5, 1/2-rate encoder as
used in the Global System for Mobile Communications (GSM) system for speech full-rate traffic
(TCH/FS). In Table 3, sum and diff refer to the local distances. New() and Old() refer to the
current and previous metrics for a given state. The TRN data indicates the state associated with
each bit or an unknown, x.
Table 3. Metric-Update Operations for GSM Viterbi Decoding

OPERATION                       CALCULATION
Calculate local distances       Temp(0) = SD0 - SD1 = diff
                                Temp(1) = SD0 + SD1 = sum
Load Tx registers               T3 = Temp(1), T2 = Temp(0)
BFLY_DIR                        New(0) = max[ Old(0)+sum, Old(1)-sum ]
                                New(8) = max[ Old(0)-sum, Old(1)+sum ]
                                TRN0 = 0xxx xxxx xxxx xxxx
                                TRN1 = 1xxx xxxx xxxx xxxx
BFLY_REV                        New(1) = max[ Old(2)-sum, Old(3)+sum ]
                                New(9) = max[ Old(2)+sum, Old(3)-sum ]
                                TRN0 = 10xx xxxx xxxx xxxx
                                TRN1 = 11xx xxxx xxxx xxxx
BFLY_DIR                        New(2) = max[ Old(4)+sum, Old(5)-sum ]
                                New(10) = max[ Old(4)-sum, Old(5)+sum ]
                                TRN0 = 110x xxxx xxxx xxxx
                                TRN1 = 011x xxxx xxxx xxxx
BFLY_REV                        New(3) = max[ Old(6)-sum, Old(7)+sum ]
                                New(11) = max[ Old(6)+sum, Old(7)-sum ]
                                TRN0 = 0110 xxxx xxxx xxxx
                                TRN1 = 1011 xxxx xxxx xxxx
BFLY_DIR                        New(4) = max[ Old(8)+diff, Old(9)-diff ]
                                New(12) = max[ Old(8)-diff, Old(9)+diff ]
                                TRN0 = 0011 0xxx xxxx xxxx
                                TRN1 = 1101 1xxx xxxx xxxx
BFLY_REV                        New(5) = max[ Old(10)-diff, Old(11)+diff ]
                                New(13) = max[ Old(10)+diff, Old(11)-diff ]
                                TRN0 = 1001 10xx xxxx xxxx
                                TRN1 = 0110 11xx xxxx xxxx
BFLY_DIR                        New(6) = max[ Old(12)+diff, Old(13)-diff ]
                                New(14) = max[ Old(12)-diff, Old(13)+diff ]
                                TRN0 = 1100 110x xxxx xxxx
                                TRN1 = 1011 011x xxxx xxxx
BFLY_REV                        New(7) = max[ Old(14)-diff, Old(15)+diff ]
                                New(15) = max[ Old(14)+diff, Old(15)-diff ]
                                TRN0 = 1110 0110 xxxx xxxx
                                TRN1 = 0101 1011 xxxx xxxx
Combine transition registers    ACx = TRN1 | TRN0 << 8
                                TRN0 = 0110 0110 xxxx xxxx
                                TRN1 = 0101 1011 xxxx xxxx
Store transition register       Trans(i) = TRN
After the metrics in one symbol interval are updated, the metrics-buffer pointers are updated for
the next iteration. Since the metrics buffers are set up as a circular buffer, this is accomplished
without overhead. The transition-data-buffer pointer is incremented by one.
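The sequence of operations in Table 3 can be modeled at a behavioral level as shown below. This is a sketch that follows the table's butterfly order and sign pattern for the K = 5, rate 1/2 case; it does not model TRN0/TRN1 bit packing, and the decision-bit polarity (0 when the even old state survives) and the name `gsm_metric_update` are assumptions made for illustration.

```python
def gsm_metric_update(old, sd0, sd1):
    """Update all 16 state metrics for one symbol interval (K = 5, rate 1/2)."""
    total, diff = sd0 + sd1, sd0 - sd1
    new = [0] * 16
    decisions = [0] * 16
    for i in range(8):
        dist = total if i < 4 else diff   # butterflies 0-3 use sum, 4-7 use diff
        if i % 2:                         # odd butterflies are BFLY_REV
            dist = -dist
        even, odd = old[2 * i], old[2 * i + 1]
        candidates = (
            (i, even + dist, odd - dist),       # new state i
            (i + 8, even - dist, odd + dist),   # new state i + 8
        )
        for state, from_even, from_odd in candidates:
            new[state] = max(from_even, from_odd)
            decisions[state] = 0 if from_even >= from_odd else 1
    return new, decisions


old = [0] + [-1000] * 15   # hypothetical start: state 0 favored
new, decisions = gsm_metric_update(old, sd0=3, sd1=1)
print(new[0], new[8], new[1], decisions[1])
```

Negating the distance for odd butterflies reproduces the BFLY_REV rows of the table, for example New(1) = max[Old(2) - sum, Old(3) + sum].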
Example 1 shows the implementation of the main loop of metric-update operations for the
GSM Viterbi decoding shown in Table 3.
Example 1. Main Loop of the GSM Viterbi Implementation
blockrepeat {
    T3 = hi(AC0)                      ;T3 = SD(2*j) + SD(2*j+1)
    ||TRN1 = *AR0                     ;Clear TRN1
; BFLY_DIR                            ;Compute New_metric(0)&(8)
    hi(AC0) = *AR5+ + T3,             ;AC0=Old_met(2*j) +T3
    lo(AC0) = *AR5+ - T3              ;AC0=Old_met(2*j+1)-T3
    ||T2 = lo(AC0)                    ;T2 = SD(2*j) - SD(2*j+1)
    hi(AC1) = *AR5+ - T3,             ;AC1=Old_met(2*j) -T3
    lo(AC1) = *AR5+ + T3              ;AC1=Old_met(2*j+1)+T3
    ||*AR1+ = AC1                     ;Store hard decisions from
                                      ;previous iteration
    max_diff(AC0, AC1, AC2, AC1)      ;Compare AC0, AC1
    ||AC3 = #0                        ;Clear AC3
; BFLY_REV                            ;Compute New_metric(1)&(9)
    hi(AC0) = *AR5+ - T3,             ;AC0=Old_Met(2*j) -T3
    lo(AC0) = *AR5+ + T3              ;AC0=Old_met(2*j+1)+T3
    ||mar(AR4 + #1)
    hi(AC1) = *AR5+ + T3,             ;AC1=Old_Met(2*j) +T3
    lo(AC1) = *AR5+ - T3              ;AC1=Old_met(2*j+1)-T3
    max_diff(AC0, AC1, AC2, AC1)      ;Compare AC0, AC1
    ||*AR4(T0) = lo(AC2),             ;Store New_metric(0)&(8)
    *AR4+ = hi(AC2)
; BFLY_DIR                            ;Compute New_metric(2)&(10)
    hi(AC0) = *AR5+ + T3,             ;AC0=Old_Met(2*j) +T3
    lo(AC0) = *AR5+ - T3              ;AC0=Old_met(2*j+1)-T3
    hi(AC1) = *AR5+ - T3,             ;AC1=Old_Met(2*j) -T3
    lo(AC1) = *AR5+ + T3              ;AC1=Old_met(2*j+1)+T3
    max_diff(AC0, AC1, AC2, AC1)      ;Compare AC0, AC1
    ||*AR4(T0) = lo(AC2),             ;Store New_metric(1)&(9)
    *AR4+ = hi(AC2)
; BFLY_REV                            ;Compute New_metric(3)&(11)
    hi(AC0) = *AR5+ - T3,             ;AC0=Old_Met(2*j) -T3
    lo(AC0) = *AR5+ + T3              ;AC0=Old_met(2*j+1)+T3
    hi(AC1) = *AR5+ + T3,             ;AC1=Old_Met(2*j) +T3
    lo(AC1) = *AR5+ - T3              ;AC1=Old_met(2*j+1)-T3
    max_diff(AC0, AC1, AC2, AC1)      ;Compare AC0, AC1
    ||*AR4(T0) = lo(AC2),             ;Store New_metric(2)&(10)
    *AR4+ = hi(AC2)
; BFLY_DIR                            ;Compute New_metric(4)&(12)
    hi(AC0) = *AR5+ + T2,             ;AC0=Old_Met(2*j) +T2
    lo(AC0) = *AR5+ - T2              ;AC0=Old_met(2*j+1)-T2
    hi(AC1) = *AR5+ - T2,             ;AC1=Old_Met(2*j) -T2
    lo(AC1) = *AR5+ + T2              ;AC1=Old_met(2*j+1)+T2
    max_diff(AC0, AC1, AC2, AC1)      ;Compare AC0, AC1
    ||*AR4(T0) = lo(AC2),             ;Store New_metric(3)&(11)
    *AR4+ = hi(AC2)
; BFLY_REV                            ;Compute New_metric(5)&(13)
    hi(AC0) = *AR5+ - T2,             ;AC0=Old_Met(2*j) -T2
    lo(AC0) = *AR5+ + T2              ;AC0=Old_met(2*j+1)+T2
    hi(AC1) = *AR5+ + T2,             ;AC1=Old_Met(2*j) +T2
    lo(AC1) = *AR5+ - T2              ;AC1=Old_met(2*j+1)-T2
    max_diff(AC0, AC1, AC2, AC1)      ;Compare AC0, AC1
    ||*AR4(T0) = lo(AC2),             ;Store New_metric(4)&(12)
    *AR4+ = hi(AC2)
; BFLY_DIR                            ;Compute New_metric(6)&(14)
    hi(AC0) = *AR5+ + T2,             ;AC0=Old_Met(2*j) +T2
    lo(AC0) = *AR5+ - T2              ;AC0=Old_met(2*j+1)-T2
    hi(AC1) = *AR5+ - T2,             ;AC1=Old_Met(2*j) -T2
    lo(AC1) = *AR5+ + T2              ;AC1=Old_met(2*j+1)+T2
    max_diff(AC0, AC1, AC2, AC1)      ;Compare AC0, AC1
    ||*AR4(T0) = lo(AC2),             ;Store New_metric(5)&(13)
    *AR4+ = hi(AC2)
; BFLY_REV                            ;Compute New_metric(7)&(15)
    hi(AC0) = *AR5+ - T2,             ;AC0=Old_Met(2*j) -T2
    lo(AC0) = *AR5+ + T2              ;AC0=Old_met(2*j+1)+T2
    hi(AC1) = *AR5+ + T2,             ;AC1=Old_Met(2*j) +T2
    lo(AC1) = *AR5+ - T2              ;AC1=Old_met(2*j+1)-T2
    max_diff(AC0, AC1, AC2, AC1)      ;Compare AC0, AC1
    ||*AR4(T0) = lo(AC2),             ;Store New_metric(6)&(14)
    *AR4+ = hi(AC2)
    *AR6 = TRN1                       ;Store TRN1 in AC1 (MMR access)
    ||T3 = *AR2                       ;T3=SD(2*j+1)
    hi(AC0) = *(AR2+T1) + T3,         ;AC0_H = SD(2*j) + SD(2*j+1)
    lo(AC0) = *(AR2+T1) - T3          ;AC0_L = SD(2*j) - SD(2*j+1)
    ||*AR7 = TRN0                     ;Store TRN0 in AC3 (MMR access)
    *AR4(T0) = lo(AC2),               ;Store New_metric(7)&(15)
    *(AR4+T0) = hi(AC2)
    ||AC1 = AC1 | (AC3 <<< #8)        ;Combine hard decisions
}                                     ;End of blockrepeat
3.6 Traceback Function
Traceback requires much less processing than the metric update, since only one bit per symbol
interval is output for hard-output Viterbi. The calculations and code follow:
1. Calculate position in transition data of the current state.
The metric update stores one bit per delay state indicating the survivor path. Although each
transition decision table entry has information from $2^{K-1}$ delay states, only one state is used for
each iteration. The main function of the traceback algorithm is to extract the correct bit from the
transition data for each symbol interval. If the butterflies are updated in consecutive order, the
transition data for one symbol interval is stored as shown in Table 4. The state values are in
hexadecimal to make the structure visible.
Table 4. State Ordering in Transition Data for One Symbol Interval

              Bit Number in Transition Word
TRN Word#     15           14           13           12           11           10           9            8
0             2^(K-2)+7    2^(K-2)+6    2^(K-2)+5    2^(K-2)+4    2^(K-2)+3    2^(K-2)+2    2^(K-2)+1    2^(K-2)
1             2^(K-2)+F    2^(K-2)+E    2^(K-2)+D    2^(K-2)+C    2^(K-2)+B    2^(K-2)+A    2^(K-2)+9    2^(K-2)+8
2             2^(K-2)+17   2^(K-2)+16   2^(K-2)+15   2^(K-2)+14   2^(K-2)+13   2^(K-2)+12   2^(K-2)+11   2^(K-2)+10
...
2^(K-5)-1     2^(K-1)-1    2^(K-1)-2    2^(K-1)-3    2^(K-1)-4    2^(K-1)-5    2^(K-1)-6    2^(K-1)-7    2^(K-1)-8

              Bit Number in Transition Word
TRN Word#     7            6            5            4            3            2            1            0
0             7            6            5            4            3            2            1            0
1             F            E            D            C            B            A            9            8
2             17           16           15           14           13           12           11           10
...
2^(K-5)-1     2^(K-2)-1    2^(K-2)-2    2^(K-2)-3    2^(K-2)-4    2^(K-2)-5    2^(K-2)-6    2^(K-2)-7    2^(K-2)-8
A clearer example for a K = 6 system is shown in Table 5. There are 32 states and two transition
words.
Table 5. State Ordering in Transition Table for K = 6, Rate 1/2 System

          Bit Number in Transition Word
Word #    15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0
0         17  16  15  14  13  12  11  10  7   6   5   4   3   2   1   0
1         1F  1E  1D  1C  1B  1A  19  18  F   E   D   C   B   A   9   8
Relatively simple algorithms find the correct transition word number and the correct bit number
within that transition word. Table 5 shows that each 16-bit data word contains eight pairs of
transition bits that differ only in the MSB. The three LSBs and the MSB determine the bit position
in the word, while the remaining bits determine the word number (see Figure 5).
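[Figure 5: the state variable, bits K-2 (MSB) down to 0. Bit K-2 together with bits 2, 1, and 0
gives the bit number in the transition word; the bits in between give the word number in the
transition table.]
Figure 5. State Variable Representation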
The algorithms extract information from a state variable indicating the current delay state. The
state is updated in the algorithm reflecting the new value read from the transition table. The
correct transition word is determined by masking off the bits between the three LSBs and the
MSB, then adding this value to the transition table start address. This can be expressed as:
$$\text{Word\#} = (\text{State} \gg 3) \;\&\; \text{MASK}, \quad \text{where MASK} = 2^{K-5} - 1 \qquad (6)$$

This value is added to the current table address, which is updated each iteration. For systems
with $K \le 5$, this part of the algorithm can be eliminated, since the transition data requires only
one word per symbol interval.
Finding the correct bit number within the selected transition word requires consideration of the
C55x bit extraction method. The Bit Test instruction, BTST src, Smem, TCx performs a bit
manipulation in the A-unit ALU. The instruction tests a single bit, as defined by the content of the
source (src) operand of a memory (Smem) location. The tested bit is copied into the selected
TCx status bit.
The generated bit address must be within 0-15 (only the 4 LSBs of the register are used to
determine the bit position).
2. Read selected bit corresponding to the state.
The BTST instruction copies the selected bit into the TCx bit. Simultaneously, the address is set
back to the start of the transition table entry, to position it for the next iteration.
3. Update state value with new bit.
The Rotate Left instruction, ROL CARRY, src, TC2, dst, performs bitwise rotation to the MSBs.
In this algorithm, CARRY is used to shift in one bit, and TC2 is used to store the shifted out bit.
dst = TC2 \\ src \\ CARRY    (7)
The CARRY is inserted at position 0, and then TC2 is extracted at the position according to M40
bit in ST1_55. This value becomes the new state, used in the next iteration.
The traceback algorithm extracts the output bits in a loop of 16, allowing the single bit from each
iteration to be combined into 16-bit words. The algorithm fills the area past the last set of
transition decisions with zeros to start on a 16-word boundary. The same number (X) of tail bits
that are added at the transmitter must be added before padding, since the output bits represent
the actual outputs for X number of prior iterations.
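The word and bit lookup described in step 1, and the state update in step 3, can be sketched as follows for K of 5 or more. The word number uses the mask from the text; the bit number combines the MSB with the three LSBs, as laid out in Tables 4 and 5. Masking the rotated state back to K-1 bits, and the convention that the decision bit becomes the LSB of the previous state, are assumptions of this sketch; the function names are hypothetical.

```python
def transition_location(state, K):
    """Return (word number, bit number) of a state's decision bit, K >= 5."""
    mask = 2 ** (K - 5) - 1
    word = (state >> 3) & mask            # bits between the 3 LSBs and the MSB
    msb = (state >> (K - 2)) & 1          # top bit of the (K-1)-bit state
    bit = (msb << 3) | (state & 0x7)      # MSB selects upper or lower byte
    return word, bit


def previous_state(state, decision_bit, K):
    """Shift the decision bit in at position 0 and drop the old MSB."""
    return ((state << 1) | decision_bit) & (2 ** (K - 1) - 1)


for s in (0x17, 0x18, 0x0F, 0x00):
    print(hex(s), transition_location(s, K=6))
```

With K = 6, state 0x17 lands in word 0 at bit 15 and state 0x18 in word 1 at bit 8, matching Table 5.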
3.7 Benchmarks
Based on the previous code examples, generic benchmarks can be developed for systems of
rate 1/n (before puncturing) and any constraint length. The benchmarks use the following
symbols:
R = original coding rate = 1/n = input bits/transmitted bits
PR = puncturing rate = p/q = bits retained/total bits
FS = original frame size (# bits) before coding
FR = number of data frames per second
A method of comparison of the various frame sizes and rates is shown in Figure 6. The
benchmark numbers, in cycles per frame, include all processing except minor
processor-initialization tasks. The equivalent MIPS are found by multiplying by the frame rate,
FR.
[Figure 6: block diagram. Input data (FS bits) -> Encoder (rate = R) -> FS/R bits -> Puncturing
(rate = PR) -> PR x FS/R bits -> Depuncturing -> FS/R bits -> Decoding -> output data (FS bits).
Overall rate = R/PR.]
Figure 6. Data Rates for Overall System
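Metric update:

$$\text{Cycles/frame} = (\#\text{States}/2\ \text{butterflies} \times \text{butterfly calculation} + \text{TRN store} + \text{local dist. calculation}) \times \#\text{bits}$$
$$= (2^{K-2} \times 5 + 2^{K-5} + 1 + n \times 2^{n-1}) \times FS \qquad (8)$$

Traceback:

$$\text{Cycles/frame} = (\text{loop overhead and data storage} + \text{loop} \times 16) \times \#\text{bits}/16$$
$$= (9 + 12 \times 16) \times FS/16 = 201 \times FS/16 \qquad (9)$$

Data reversal:

$$\text{Cycles/frame} = 43 \times FS/16 \qquad (10)$$

Total:

$$\text{Total MIPS} = \text{Frame rate} \times (\text{metric update} + \text{traceback} + \text{data reversal})\ \text{cycles/frame}$$
$$= FR \times [(2^{K-2} \times 5 + 2^{K-5} + 1 + n \times 2^{n-1}) \times FS + (201/16) \times FS + (43/16) \times FS]$$
$$= FR \times FS \times (2^{K-2} \times 5 + 2^{K-5} + 1 + n \times 2^{n-1} + (201 + 43)/16)$$
$$= FR \times FS \times (2^{K-2} \times 5 + 2^{K-5} + n \times 2^{n-1} + 16.25) \qquad (11)$$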
This total does not include processor setup or depuncturing time. If necessary, depuncturing
requires (data copy time x # bits) = (1 cycle/bit x n x FS bits) cycles/frame. With a frame of 200
bits, a rate 1/2 system requires 400 cycles/frame, which is only 0.02 MIPS at a 50-Hz frame rate.
The processor setup time for other functions is even smaller, so neither is included in the overall
benchmarks. Table 6 summarizes benchmarks for some specific systems.
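The benchmark formulas can be evaluated directly. The sketch below transcribes the metric-update, traceback, and data-reversal cycle counts and divides by one million to express the result in MIPS; the function name `viterbi_mips` is hypothetical, and the parameter sets are taken from the GSM and IS136 rows of Table 6.

```python
def viterbi_mips(K, n, FS, FR):
    """Estimate decoder load in MIPS for a rate 1/n, constraint-length K code."""
    metric_update = (2 ** (K - 2) * 5 + 2 ** (K - 5) + 1 + n * 2 ** (n - 1)) * FS
    traceback = (9 + 12 * 16) * FS / 16
    data_reversal = 43 * FS / 16
    cycles_per_second = FR * (metric_update + traceback + data_reversal)
    return cycles_per_second / 1e6


systems = {
    "GSM voice": dict(K=5, n=2, FS=189, FR=50),
    "GSM data 9.6": dict(K=5, n=2, FS=244, FR=50),
    "GSM data 4.8": dict(K=5, n=3, FS=152, FR=50),
    "IS136 voice": dict(K=6, n=2, FS=89, FR=50),
    "IS136 FACCH": dict(K=6, n=4, FS=65, FR=50),
}
for name, params in systems.items():
    print(f"{name}: {viterbi_mips(**params):.2f} MIPS")
```

As in the text, depuncturing and processor setup are left out of the estimate.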
Table 6. Viterbi Decoding Benchmarks for Various Wireless Standards

Standard   Data Type   Coding     Puncture    Constraint   Frame Size   Frame Rate   Benchmark
                       Rate (R)   Rate (PR)   Length (K)   (FS)         (FR)         (MIPS)
GSM        Voice       1/2        -           5            189 bits     50 Hz        0.58
           Data 9.6    1/2        57/61       5            244 bits     50 Hz        0.75
           Data 4.8    1/3        -           5            152 bits     50 Hz        0.53
IS136      Voice       1/2        -           6            89 bits      50 Hz        0.46
           FACCH       1/4        -           6            65 bits      50 Hz        0.42
WLL