ECE-VII-DSP ALGORITHMS & ARCHITECTURE Part A
University Syllabus
PART - A
UNIT - 1
INTRODUCTION TO DIGITAL SIGNAL PROCESSING: Introduction, A Digital Signal-
Processing System, The Sampling Process, Discrete Time Sequences, Discrete Fourier Transform
(DFT) and Fast Fourier Transform (FFT), Linear Time-Invariant Systems, Digital Filters, Decimation
and Interpolation. 5 Hours
UNIT - 2
ARCHITECTURES FOR PROGRAMMABLE DIGITAL SIGNAL-PROCESSORS:
Introduction, Basic Architectural Features, DSP Computational Building Blocks, Bus Architecture and
Memory, Data Addressing Capabilities, Address Generation Unit, Programmability and Program
Execution, Features for External Interfacing. 8 Hours
UNIT - 3
PROGRAMMABLE DIGITAL SIGNAL PROCESSORS: Introduction, Commercial Digital
Signal-Processing Devices, Data Addressing Modes of TMS320C54xx, Memory Space of
TMS320C54xx Processors, Program Control. 6 Hours
UNIT - 4
Detailed Study of TMS320C54X & 54xx Instructions and Programming, On-Chip Peripherals, Interrupts
of TMS320C54xx Processors, Pipeline Operation of TMS320C54xx Processors. 6 Hours
PART - B
UNIT - 5
IMPLEMENTATION OF BASIC DSP ALGORITHMS: Introduction, The Q-notation, FIR Filters,
IIR Filters, Interpolation and Decimation Filters (one example in each case). 6 Hours
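UNIT - 6
IMPLEMENTATION OF FFT ALGORITHMS: Introduction, An FFT Algorithm for DFT Computation,
Overflow and Scaling, Bit-Reversed Index Generation, Implementation on the TMS320C54xx. 6 Hours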
UNIT - 7
INTERFACING MEMORY AND PARALLEL I/O PERIPHERALS TO DSP DEVICES:
Introduction, Memory Space Organization, External Bus Interfacing Signals. Memory Interface,
Parallel I/O Interface, Programmed I/O, Interrupts and I / O Direct Memory Access (DMA).
8 Hours
UNIT - 8
INTERFACING AND APPLICATIONS OF DSP PROCESSOR: Introduction, Synchronous
Serial Interface, A CODEC Interface Circuit. DSP Based Bio-telemetry Receiver, A Speech
Processing System, An Image Processing System.
6 Hours
TEXT BOOK:
1. “Digital Signal Processing”, Avtar Singh and S. Srinivasan, Thomson Learning, 2004.
REFERENCE BOOKS:
1. “Digital Signal Processing: A Practical Approach”, Ifeachor E. C. and Jervis B. W., Pearson
Education, PHI, 2002.
2. “Digital Signal Processors”, B. Venkataramani and M. Bhaskar, TMH, 2002.
3. “Architectures for Digital Signal Processing”, Peter Pirsch, John Wiley, 2007.
INDEX SHEET
Sl. No. Unit & Topic of Discussion
PART-A:
UNIT-1: INTRODUCTION TO DIGITAL SIGNAL
PROCESSING:
1 Introduction, A Digital Signal-Processing System,
2 The Sampling Process, Discrete Time Sequences
3 Discrete Fourier Transform (DFT) and Fast Fourier
Transform (FFT),
4 Linear Time-Invariant Systems, Digital Filters,
5 Decimation and Interpolation
UNIT-2 : ARCHITECTURES FOR
PROGRAMMABLE DIGITAL SIGNAL-
PROCESSORS:
6 Introduction, Basic Architectural Features
7 DSP Computational Building Blocks
8 Explanations of functional blocks
9 Bus Architecture
10 Memory, Data Addressing Capabilities
11 Address Generation Unit,
12 Programmability and Program Execution
13 Features for External Interfacing
UNIT-3 : PROGRAMMABLE DIGITAL SIGNAL
PROCESSORS
14 Introduction, Commercial Digital Signal-processing
Devices,
15 Data Addressing Modes of TMS320C54xx-1
16 Data Addressing Modes of TMS320C54xx-2
17 Special addressing modes
18 Memory Space of TMS320C54xx Processors
19 Program Control, Programming
UNIT-4 : INSTRUCTIONS AND PROGRAMMING
20 Detail Study of TMS320C54X
21 Instructions
22 Programming
23 On-Chip peripherals,
24 Interrupts of TMS320C54xx Processors
25 Pipeline Operation of TMS320C54xx Processor
PART-B
UNIT-5 : IMPLEMENTATION OF BASIC DSP
ALGORITHMS
26 Introduction, The Q-notation
27 PROBLEMS on Q- notation
28 FIR Filters
29 IIR Filters,
30 Interpolation Filters
31 Decimation Filters
UNIT-6 : IMPLEMENTATION OF FFT
ALGORITHMS
32 Introduction, An FFT Algorithm for DFT Computation
33 Overflow and Scaling
34 Bit-Reversed Index Generation
35 Routine for bit reversed index
36 Implementation on the TMS320C54xx-1
37 Implementation on the TMS320C54xx-2
UNIT-7 : INTERFACING MEMORY AND
PARALLEL I/O PERIPHERALS TO DSP DEVICES
38 Introduction, Memory Space Organization,
39 External Bus Interfacing Signals
40 Timing Diagram of interfacing
41 Memory Interface
42 Problems on memory interface
43 Parallel I/O Interface
44 Programmed I/O
45 Interrupts and I / O Direct Memory Access (DMA).
UNIT-8 : INTERFACING AND APPLICATIONS
OF DSP PROCESSOR
46 Introduction, Synchronous Serial Interface
47 Block diagram of CODEC
48 A CODEC Interface Circuit
49 ADC interface
50 DSP Based Bio-telemetry Receiver
51 A Speech Processing System
52 An Image Processing System
UNIT-1
A computer or a processor is used for digital signal processing. The anti-aliasing filter is a low-pass
filter (LPF) that passes only signal components with frequency less than or equal to half the sampling
frequency, in order to avoid aliasing. Similarly, at the other end, a reconstruction filter is used to
recover a smooth analog signal from the staircase output of the DAC (Figure 1.2).
The ADC process involves sampling the signal and then quantizing each sample to a digital value. In
order to avoid aliasing, the signal has to be sampled at a rate at least equal to the Nyquist rate,
$f_s \geq 2 f_m$
where $f_s$ is the sampling frequency and $f_m$ is the maximum frequency component in the message
signal. If the signal is sampled at a rate less than the Nyquist rate, the higher
frequency components of the signal cannot be reconstructed properly. The plots of the reconstructed
outputs for various conditions are shown in figure 1.4.
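The effect of sampling below the Nyquist rate can be checked numerically. The following is a minimal sketch and not part of the original notes; the function name alias_frequency and the example frequencies are assumptions chosen for illustration. It computes the apparent frequency of a real sinusoid after sampling.

```python
def alias_frequency(f, fs):
    """Apparent frequency (Hz) of a real sinusoid of frequency f sampled at fs."""
    # Sampling cannot distinguish f from f +/- k*fs, so reduce f modulo fs
    f_mod = f % fs
    # A real sinusoid above fs/2 folds back into the range [0, fs/2]
    return min(f_mod, fs - f_mod)


# With fs = 8 kHz, a 3 kHz tone satisfies fs >= 2*fm and is preserved,
# while a 5 kHz tone violates it and appears at 3 kHz.
print(alias_frequency(3000, 8000))
print(alias_frequency(5000, 8000))
```

A component above fs/2 becomes indistinguishable from a lower frequency, which is why the anti-aliasing filter has to remove it before sampling.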
A sequence that repeats itself after every N samples is called a periodic sequence.
A periodic sequence x(n) with period N satisfies $x(n) = x(n+N)$ for $n = \ldots, -1, 0, 1, 2, \ldots$
The frequency response gives the frequency-domain equivalent of a discrete time sequence. It is denoted
as $X(e^{j\theta}) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-jn\theta}$
The frequency response of a discrete sequence involves both a magnitude response and a phase response.
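We have, for the DFT of an N-point sequence,
$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi nk/N}, \quad k = 0, 1, \ldots, N-1$
From the above expression it is clear that we can use the DFT to find the frequency response of a
discrete sequence at the frequencies $\theta = 2\pi k/N$.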
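As an illustration of using the DFT to obtain samples of the frequency response, the sketch below evaluates the standard N-point DFT sum directly. It is a hedged example rather than the method of the notes: the function name dft and the test sequence are assumptions, and a practical implementation would use an FFT.

```python
import cmath


def dft(x):
    """Direct N-point DFT: X(k) = sum over n of x(n) * exp(-j*2*pi*n*k/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
        for k in range(N)
    ]


X = dft([1, 1, 1, 1])                    # constant sequence: energy only at k = 0
magnitude = [abs(v) for v in X]          # magnitude response samples
phase = [cmath.phase(v) for v in X]      # phase response samples (radians)
```

Each X(k) is the frequency response evaluated at θ = 2πk/N, so its magnitude and angle give the magnitude and phase responses at that frequency.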
A system which satisfies the superposition principle is called a linear system, and a system that
has the same input-output relation at all times is called a time-invariant system. Systems which satisfy
both properties are called LTI systems.
An LTI system is characterized by its impulse response (unit sample response) in the time domain and
by its system function in the frequency domain.
1.7.1 Convolution
Convolution is the operation that relates the input and output of an LTI system through its unit sample
response. The output of the system y(n) for the input x(n) and the impulse response h(n) is given as
$y(n) = x(n) * h(n) = \sum_{k=-\infty}^{\infty} h(k)\, x(n-k)$
where x(n) is the input of the system, h(n) is the impulse response of the system, and y(n) is the
output of the system.
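For finite-length sequences the convolution sum can be evaluated directly. The following is a minimal sketch under the assumption that both x(n) and h(n) are zero outside their listed samples; the function name convolve is illustrative.

```python
def convolve(x, h):
    """Linear convolution y(n) = sum over k of h(k) * x(n - k) for finite sequences."""
    y = [0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            # x(n - k) is taken as zero outside the range 0 .. len(x) - 1
            if 0 <= n - k < len(x):
                y[n] += h[k] * x[n - k]
    return y


print(convolve([1, 2, 3], [1, 1]))  # prints [1, 3, 5, 3]
```

Each output sample needs one multiply-accumulate per coefficient of h(n), which is why the MAC operation is central to DSP architectures.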
1.7.2 Z Transformation
Z transforms are used to find the frequency response of the system. The Z transform of
a discrete sequence x(n) is given by $X(Z) = \sum_{n=-\infty}^{\infty} x(n)\, Z^{-n}$
Values of the filter coefficients vary with the type of the filter. Design of a digital filter
involves determining the filter coefficients. Based on the length of the impulse response, digital filters
are classified into two categories, viz. Finite Impulse Response (FIR) filters and Infinite Impulse
Response (IIR) filters.
The major drawback of FIR filters is that they require a larger number of filter coefficients to realize a
desired response as compared to IIR filters. Thus the computation time required is also greater.
Stability of IIR filters depends on the number and the values of the filter coefficients. The major
advantage of IIR filters over FIR filters is that they require fewer coefficients for the
same desired response, thus requiring less computation time.
Design procedure of an FIR filter involves the determination of the filter coefficients bk.
Direct IIR filter design methods are based on least squares fit to a desired frequency response. These
methods allow arbitrary frequency response specifications.
1.9.1 Decimation
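Decimation is the process of reducing the sampling rate by dropping samples without violating the
sampling theorem. The factor by which the signal is decimated is called the decimation factor and is
denoted by M. The decimated output is given by,
$y(m) = w(mM)$, where $w(n) = \sum_{k} b_k\, x(n-k)$ is the input after low-pass filtering with coefficients $b_k$.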
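Decimation by a factor M can be sketched as a low-pass filter followed by keeping every M-th sample. The example below is illustrative only: the function name decimate, the two-tap averaging filter and the input ramp are assumptions, not values from the notes.

```python
def decimate(x, b, M):
    """Low-pass filter x with FIR coefficients b, then keep every M-th sample."""
    w = []
    for n in range(len(x)):
        acc = 0
        for k, bk in enumerate(b):
            if n - k >= 0:              # samples before n = 0 are taken as zero
                acc += bk * x[n - k]
        w.append(acc)
    return w[::M]                       # y(m) = w(m*M)


# Hypothetical example: ramp input, two-tap averaging filter, M = 2
print(decimate(list(range(8)), [0.5, 0.5], 2))  # prints [0.0, 1.5, 3.5, 5.5]
```

The filter limits the bandwidth to what the lower output rate can represent, so that dropping samples does not introduce aliasing.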
1.9.2 Interpolation
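Interpolation is the process of increasing the sampling rate by inserting new samples in between
the existing ones. The input-output relation for interpolation, where the sampling rate is increased by
a factor L, is given as,
$y(m) = \sum_{k} b_k\, w(m-k)$, where $w(m) = x(m/L)$ for $m = 0, \pm L, \pm 2L, \ldots$ and $w(m) = 0$ otherwise.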
Problems:
1. Obtain the transfer function of the IIR filter whose difference equation is given by
$y(n) = 0.9\, y(n-1) + 0.1\, x(n)$
Taking the Z transform of both sides,
$Y(Z) = 0.9\, Z^{-1} Y(Z) + 0.1\, X(Z)$
$Y(Z)\,[1 - 0.9\, Z^{-1}] = 0.1\, X(Z)$
The transfer function of the system is given by the expression,
$H(Z) = Y(Z)/X(Z) = 0.1 / [1 - 0.9\, Z^{-1}]$
Realization of the IIR filter with the above difference equation is shown in the figure.
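The difference equation of this problem can be simulated directly to see the behaviour implied by the transfer function. This is a minimal sketch assuming zero initial conditions; the function name iir_first_order is illustrative.

```python
def iir_first_order(x, a1=0.9, b0=0.1):
    """Simulate y(n) = a1*y(n-1) + b0*x(n) with y(-1) = 0."""
    y = []
    prev = 0.0                      # y(n-1), zero initial condition
    for sample in x:
        prev = a1 * prev + b0 * sample
        y.append(prev)
    return y


impulse_response = iir_first_order([1] + [0] * 9)   # h(n) = 0.1 * 0.9**n
step_response = iir_first_order([1] * 200)          # settles towards H(1) = 1
```

The impulse response decays geometrically but never becomes exactly zero, which is what makes the filter an infinite impulse response filter. The step response settles at the DC gain 0.1/(1 - 0.9) = 1.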
2. Let x(n) = [0 3 6 9 12] be interpolated with L = 3. If the coefficients of the interpolation
filter are bk = [1/3 2/3 1 2/3 1/3], obtain the interpolated sequence.
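One way to work this problem is to insert L-1 zeros after each input sample and then filter the result with bk. The sketch below follows that approach under stated assumptions: the function name interpolate is illustrative, the full convolution (including the filter delay and tail) is returned, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def interpolate(x, b, L):
    """Zero-insert by L, then filter with b: y(m) = sum over k of b(k) * w(m - k)."""
    w = []
    for sample in x:
        w.append(sample)
        w.extend([0] * (L - 1))         # L-1 zeros after every input sample
    y = []
    for m in range(len(w) + len(b) - 1):
        acc = 0
        for k, bk in enumerate(b):
            if 0 <= m - k < len(w):
                acc += bk * w[m - k]
        y.append(acc)
    return y


b = [Fraction(1, 3), Fraction(2, 3), Fraction(1), Fraction(2, 3), Fraction(1, 3)]
y = interpolate([0, 3, 6, 9, 12], b, 3)
print([int(v) for v in y])
# prints [0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 8, 4, 0, 0]
```

Apart from the two-sample filter delay and the tail at the end, the output is the linear ramp 0, 1, 2, ..., 12, i.e. the original samples with two linearly interpolated values between each pair.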
Recommended Questions
1. Explain with the help of mathematical equations how signed numbers can be
multiplied. The sequence x(n) = [3,2,-2,0,7].It is interpolated using interpolation
sequence bk=[0.5,1,0.5] and the interpolation factor of 2. Find the interpolated
sequence y(m).
2. An analog signal is sampled at the rate of 8 kHz. If 512 samples of this signal are used
to compute the DFT X(k), determine the analog and digital frequency spacing between
adjacent X(k) elements. Also, determine the analog and digital frequencies corresponding
to k=60.
3. With a neat diagram explain the scheme of the DSP system.
4. What is DSP? What are the important issues to be considered in designing and
implementing a DSP system? Explain in detail.
5. Why signal sampling is required? Explain the sampling process.
6. Define decimation and interpolation process. Explain them using block diagrams and
equations. With a neat diagram explain the scheme of a DSP system.
7. With an example explain the need for the low pass filter in decimation process.
8. For the FIR filter y(n)=(x(n)+x(n-1)+x(n-2))/3. Determine i) System Function ii)
Magnitude and phase function iii) Step response iv) Group Delay.
9. List the major architectural features used in DSP system to achieve high speed program
execution.
10. Explain how to simulate the impulse responses of FIR and IIR filters.
11. Explain the two method of sampling rate conversions used in DSP system, with suitable
block diagrams and examples. Draw the corresponding spectrum.
12. Assuming X(K) as a complex sequence, determine the number of complex and real
multiplies for computing the IDFT using direct and Radix-2 FFT algorithms.
13. With a neat diagram explain the scheme of a DSP system. (June.12, 8m)
14. With an example explain the need for the low pass filter in decimation process.
(June.12, 4m)
15. For the FIR filter y(n)=(x(n)+x(n-1)+x(n-2))/3. Determine i) System Function ii)
Magnitude and phase function iii) Step response iv) Group Delay. (June.12, 8m)
16. List the major architectural features used in DSP system to achieve high speed program
execution. (Dec.11, 6m).
17. Explain how to simulate the impulse responses of FIR and IIR filters. (Dec.11, 6m).
18. Explain the two method of sampling rate conversions used in DSP system, with suitable
block diagrams and examples. Draw the corresponding spectrum. (Dec.11, 8m).
19. Explain with the help of mathematical equations how signed numbers can be
multiplied. (July.11, 8m).
20. With a neat diagram explain the scheme of the DSP system. (Dec.10-Jan.11, 8m)
(July.11, 8m).
UNIT-2
Architectures for Programmable Digital Signal Processing
Devices
2.1 Basic Architectural Features
A programmable DSP device should provide instructions similar to a conventional
microprocessor. The instruction set of a typical DSP device should include the following,
a. Arithmetic operations such as ADD, SUBTRACT, MULTIPLY etc
b. Logical operations such as AND, OR, NOT, XOR etc
c. Multiply and Accumulate (MAC) operation
d. Signal scaling operation
In addition to the above provisions, the architecture should also include,
a. On chip registers to store intermediate results
b. On chip memories to store signal samples (RAM)
c. On chip memories to store filter coefficients (ROM)
2.2 DSP Computational Building Blocks
Each computational block of the DSP should be optimized for functionality and speed, and at
the same time the design should be sufficiently general that it can be easily integrated with other
blocks to implement overall DSP systems.
2.2.1 Multipliers
The advent of single-chip multipliers paved the way for implementing DSP functions on a
VLSI chip. Parallel multipliers have now replaced the traditional shift-and-add multipliers. Parallel
multipliers take a single processor cycle to fetch and execute the instruction and to store the result.
They are also called array multipliers. The key features to be considered for a multiplier are:
a. Accuracy
b. Dynamic range
c. Speed
The number of bits used to represent the operands decides the accuracy and the dynamic range
of the multiplier, whereas the speed is decided by the architecture employed. If the multipliers are
implemented in hardware, the speed of execution will be very high but the circuit complexity will
also increase considerably. Thus there is a tradeoff between the speed of execution and the
circuit complexity, and the choice of architecture normally depends on the application.
Multiplication can be implemented in parallel using the Braun multiplier, whose hardware structure is
shown in figure 2.1.
In the Braun multiplier the signs of the numbers are not taken into account. In order to
implement a multiplier for signed numbers, additional hardware is required to modify the Braun
multiplier. The modified multiplier is called the Baugh-Wooley multiplier.
2.2.4 Speed
The conventional shift-and-add technique of multiplication requires n cycles to perform the
multiplication of two n-bit numbers, whereas in parallel multipliers the time required is the
longest path delay in the combinational circuit used. As DSP applications generally require very high
speed, it is desirable to have multipliers operating at the highest possible speed by using a parallel
implementation.
2.2.6 Shifters
Shifters are used to scale operands or results either down or up. The following
scenarios show the necessity of a shifter:
a. While adding N numbers, each n bits long, the sum can grow up to n+log2 N
bits long. If the accumulator is n bits long, an overflow error will occur. This can be overcome
by using a shifter to scale down each operand by log2 N bits.
b. Similarly, while calculating the product of two n-bit numbers, the product can grow up to 2n bits
long. Generally the lower n bits are neglected and the sign bit is shifted to preserve the sign of the product.
c. Finally, in the addition of two floating-point numbers, one of the operands has to be shifted
appropriately to make the exponents of the two numbers equal.
From the above cases it is clear that a shifter is required in the architecture of a DSP.
In conventional microprocessors, normal shift registers are used for the shift operation. As a shift
register requires one clock cycle for each bit shifted, it is not desirable for DSP applications, which
generally involve many shifts. As speed is the crucial issue in DSP applications, several shifts
have to be accomplished in a single execution cycle. This can be done using a barrel shifter,
which connects the input lines representing a word to a group of output lines with the required shift
determined by its control inputs. For an input of length n, log2 n control lines are required, and an
additional control line is required to indicate the direction of the shift.
The block diagram of a typical barrel shifter is as shown in figure 2.3.
Figure 2.4 depicts the implementation of a 4 bit shift right barrel shifter. Shift to right by 0, 1, 2 or 3
bit positions can be controlled by setting the control inputs appropriately.
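The behaviour of a shift-right barrel shifter can be modelled in software as a cascade of stages, one per control line, where stage i shifts by 2^i positions when its control bit is set. This is only a behavioural sketch: the function name barrel_shift_right is an assumption, and the loop stands in for combinational hardware that produces the result in a single cycle.

```python
def barrel_shift_right(word, shift, width=4):
    """Logical right shift of a width-bit word using log2(width) controlled stages."""
    word &= (1 << width) - 1            # keep only width bits
    stage = 0
    while (1 << stage) < width:
        if (shift >> stage) & 1:        # control input for this stage
            word >>= (1 << stage)       # this stage shifts by 2**stage positions
        stage += 1
    return word


for s in range(4):
    print(s, format(barrel_shift_right(0b1011, s), "04b"))
```

For a 4-bit input, the two control bits select a shift of 0, 1, 2 or 3 positions, matching the log2 n control lines mentioned above.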
Although addition and multiplication are two different operations, they can be performed in parallel.
While the multiplier is computing a product, the accumulator can accumulate the product of the
previous multiplication. Thus if N products are to be accumulated, N-1 multiplications can overlap
with N-1 additions. During the very first multiplication the accumulator is idle, and during the last
accumulation the multiplier is idle. Thus N+1 clock cycles are required to compute the sum of N
products.
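The N+1 cycle count can be confirmed with a cycle-by-cycle model of a two-stage multiply/accumulate pipeline. The sketch below is illustrative; the function name pipelined_mac and the register name product_reg are assumptions.

```python
def pipelined_mac(a, b):
    """Return (sum of products, clock cycles) for a two-stage pipelined MAC."""
    acc = 0
    product_reg = None                  # register between multiplier and adder
    cycles = 0
    i = 0
    while i < len(a) or product_reg is not None:
        # Adder stage: accumulate the product computed in the previous cycle
        if product_reg is not None:
            acc += product_reg
        # Multiplier stage: compute the next product in the same cycle
        if i < len(a):
            product_reg = a[i] * b[i]
            i += 1
        else:
            product_reg = None          # multiplier idle during the last accumulation
        cycles += 1
    return acc, cycles


print(pipelined_mac([1, 2, 3], [4, 5, 6]))  # prints (32, 4)
```

With N = 3 products the model takes 4 cycles: the adder is idle in the first cycle and the multiplier is idle in the last.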
Shifters
Shifters can be provided at the input of the MAC to normalize the data and at the output to
denormalize it.
Guard bits
As the normalization process does not yield an accurate result, it is not desirable for some
applications. In such cases an alternative is to provide additional bits, called guard bits, in
the accumulator so that no overflow error occurs. The add/subtract unit also has to be
modified appropriately to manage the additional bits of the accumulator.
Saturation Logic
Overflow or underflow occurs if the result goes beyond the most positive number or below
the most negative number the accumulator can handle. The overflow/underflow error can be
resolved by loading the accumulator with the most positive number it can handle at the time of
overflow and the most negative number it can handle at the time of underflow. This method is
called saturation logic. A schematic diagram of saturation logic is shown in figure 2.7. In
saturation logic, as soon as an overflow or underflow condition is detected, the accumulator is
loaded with the most positive or most negative number, overriding the result computed by the MAC
unit.
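Saturation can be sketched for a two's complement accumulator as follows. This is a behavioural illustration assuming a 16-bit accumulator by default; the function name saturating_add is hypothetical.

```python
def saturating_add(a, b, bits=16):
    """Add two signed values and clamp the result to the two's complement range."""
    most_positive = (1 << (bits - 1)) - 1    # 32767 for 16 bits
    most_negative = -(1 << (bits - 1))       # -32768 for 16 bits
    total = a + b
    if total > most_positive:                # overflow: load most positive value
        return most_positive
    if total < most_negative:                # underflow: load most negative value
        return most_negative
    return total


print(saturating_add(30000, 10000))    # prints 32767
print(saturating_add(-30000, -10000))  # prints -32768
```

Without saturation, a 16-bit two's complement sum of 30000 and 10000 would wrap around to a negative value, which is usually far more harmful to a signal than clipping.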
Status Flags
ALU includes circuitry to generate status flags after arithmetic and logic operations. These flags
include sign, zero, carry and overflow.
Overflow Management
Depending on the status of overflow and sign flags, the saturation logic can be used to limit the
accumulator content.
Register File
Instead of moving data in and out of the memory during the operation, for better speed, a large set of
general purpose registers are provided to store the intermediate results.
In order to increase the speed of operation, separate memories are used to store program and
data, and each memory is given its own set of data and address buses. This architecture is
called the Harvard architecture. It is shown in figure 2.10.
Although the use of separate memories for data and instructions speeds up the processing,
it does not completely solve the problem. As many DSP instructions require more than one
operand, use of a single data memory means the operands are fetched one after the other, thus
increasing the processing delay. This problem can be overcome by using two separate data
memories for storing the operands, so that both operands can be fetched together in a single clock
cycle (Figure 2.11).
Although the above architecture improves the speed of operation, it requires more hardware
and interconnections, thus increasing the cost and complexity of the system. Therefore there should be
a trade off between the cost and speed while selecting memory architecture for a DSP.
a. As many DSP algorithms require instructions to be executed repeatedly, an instruction
stored in the external memory can, once fetched, reside in the instruction cache.
b. The access times for on-chip memories should be sufficiently small that they can be accessed more
than once in every execution cycle.
c. On-chip memories can be configured dynamically so that they can serve different purposes at
different times.
There are four special cases in this addressing mode. They are
The block diagram of a typical address generation unit is as shown in figure 2.13.
Problems:
1). Investigate the basic features that should be provided in the DSP architecture to be used to
implement the following Nth order FIR filter:
$y(n) = \sum_{i=0}^{N-1} h(i)\, x(n-i)$
Solution:-
2). It is required to find the sum of 64, 16 bit numbers. How many bits should the
accumulator have so that the sum can be computed without the occurrence of
overflow error or loss of accuracy?
The sum of 64, 16 bit numbers can grow up to (16+ log2 64 )=22 bits long. Hence
the accumulator should be 22 bits long in order to avoid overflow error from occurring.
3. If a sum of 256 products is to be computed using a pipelined MAC unit, and if the MAC
execution time of the unit is 100nsec, what will be the total time required to complete the
operation?
As N=256 in this case, the MAC unit requires N+1=257 execution cycles. As the single MAC
execution time is 100 nsec, the total time required will be (257*100 nsec)=25.7 usec.
4. Consider a MAC unit whose inputs are 16 bit numbers. If 256 products are to be
summed up in this MAC, how many guard bits should be provided for the
accumulator to prevent overflow condition from occurring?
As it is required to calculate the sum of 256 16-bit numbers, the sum can be as
long as (16+ log2 256)=24 bits. Hence the accumulator should be capable of handling
these 24 bits. Thus the guard bits required will be (24-16)= 8 bits.
The block diagram of the modified MAC after considering the guard or extension bits is as shown in
the figure
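The accumulator sizing used in problems 2 and 4 follows the n + log2 N rule and can be expressed as a small helper. This is a sketch; the function names guard_bits and accumulator_bits are assumptions, and the guard bit count is rounded up when the number of terms is not a power of two.

```python
def guard_bits(num_terms):
    """Smallest g with 2**g >= num_terms, i.e. ceil(log2(num_terms))."""
    return (num_terms - 1).bit_length()


def accumulator_bits(operand_bits, num_terms):
    """Accumulator width needed to add num_terms values of operand_bits each."""
    return operand_bits + guard_bits(num_terms)


print(accumulator_bits(16, 64))   # prints 22
print(guard_bits(256))            # prints 8
print(accumulator_bits(16, 256))  # prints 24
```

Using the integer bit length instead of a floating-point logarithm keeps the result exact for any number of terms.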
5. What are the memory addresses of the operands in each of the following cases of indirect
addressing modes? In each case, what will be the content of the addreg after the memory
access? Assume that the initial contents of the addreg and the offsetreg are 0200h and 0010h,
respectively.
a. ADD *addreg
b.ADD +*addreg
c. ADD offsetreg+,*addreg
d. ADD *addreg,offsetreg-
6. A DSP has a circular buffer with the start and the end addresses as 0200h and 020Fh
respectively. What would be the new values of the address pointer of the buffer if, in the course
of address computation, it gets updated to
a. 0212h
b. 01FCh
Buffer Length= (EAR-SAR+1) = 020Fh-0200h+1=10h
a. New Address Pointer= Updated Pointer-buffer length = 0212h-10h=0202h
b. New Address Pointer= Updated Pointer+ buffer length = 01FCh+10h=020Ch
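The wrap-around rule used in this solution can be written as a short function. The sketch below is illustrative and assumes the pointer overshoots the buffer by less than one buffer length; the function name wrap_pointer is hypothetical.

```python
def wrap_pointer(ptr, start, end):
    """Bring an updated address pointer back inside the circular buffer."""
    length = end - start + 1            # buffer length = EAR - SAR + 1
    if ptr > end:                       # ran past the end address
        return ptr - length
    if ptr < start:                     # ran before the start address
        return ptr + length
    return ptr


print(hex(wrap_pointer(0x0212, 0x0200, 0x020F)))  # prints 0x202
print(hex(wrap_pointer(0x01FC, 0x0200, 0x020F)))  # prints 0x20c
```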
9. Compute the indices for an 8-point FFT using Bit reversed Addressing Mode
Start with index 0. Therefore the first index would be (000)
Next index can be calculated by adding half the FFT length, in this case it is (100)
to the previous index. i.e. Present Index= (000)+B (100)= (100)
Similarly the next index can be calculated as
Present Index= (100)+B (100)= (010)
The process continues till all the indices are calculated. The following table summarizes
the calculation.
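The index sequence can be generated by repeating the reverse-carry addition described above. The following sketch models that addition by bit-reversing the operands, adding normally, and reversing the result; the function names reverse_carry_add and bit_reversed_indices are assumptions for illustration.

```python
def reverse_carry_add(a, b, bits):
    """Add a and b with the carry propagating from the MSB towards the LSB."""
    def rev(v):
        r = 0
        for _ in range(bits):
            r = (r << 1) | (v & 1)      # move the LSB of v into r
            v >>= 1
        return r

    mask = (1 << bits) - 1
    # A normal add on bit-reversed operands equals a reverse-carry add
    return rev((rev(a) + rev(b)) & mask)


def bit_reversed_indices(n_points):
    """Indices for an n_points FFT (n_points must be a power of two)."""
    bits = n_points.bit_length() - 1
    half = n_points // 2                # half the FFT length, e.g. (100) for 8 points
    index, out = 0, []
    for _ in range(n_points):
        out.append(index)
        index = reverse_carry_add(index, half, bits)
    return out


print(bit_reversed_indices(8))  # prints [0, 4, 2, 6, 1, 5, 3, 7]
```

In binary the sequence is 000, 100, 010, 110, 001, 101, 011, 111, which agrees with the first steps worked out above.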
Recommended Questions:
1. Explain implementation of 8- tap FIR filter, (i) pipelined using MAC units and (ii) parallel
using two MAC units. Draw block diagrams.
2. What is the role of a shifter in DSP? Explain the implementation of 4-bit shift right barrel
shifter, with a diagram.
3. Identify the addressing modes of the operands in each of the following instructions & their
operations
i)ADD B ii) ADD #1234h iii) ADD 5678h iv) ADD +*addreg
4. Draw the schematic diagram of the saturation logic and explain the same.
5. Explain how the circular addressing mode and bit reversal addressing mode are implemented in
a DSP.
6. Explain the purpose of program sequencer.
7. Give the structure of a 4X4 Braun multiplier, Explain its concept. What modification is
required to carry out multiplication of signed numbers? Comment on the speed of the
multiplier.
8. Explain guard bits in a MAC unit of DSP. Consider a MAC unit whose inputs are 24-bit
numbers. How many guard bits should be provided if 512 products have to be added in the
accumulator to prevent overflow condition? What is the overall size of the accumulator
required?
9. With a neat block diagram explain ALU of DSP system.
10. Explain i) Circular buffer addressing mode ii) Parallelism iii) Guard bits.
11. The 256 unsigned numbers, 16 bit each are to be summed up in a processor. How many guard
bits are needed to prevent overflow.
12. How will you implement an 8X8 multiplier using 4X4 multipliers as the building blocks.
13. Describe the basic features that should be provided in the DSP architecture to be used to
implement the Nth order FIR filter, where x(n) denotes the input sample, y(n) the output
sample and h(i) denotes ith filter coefficient.(Dec.09-Jan.10, 8m)
14. Explain the issues to be considered in designing and implementing a DSP system, with the help
of a neat block diagram. (May/June10 , 6m)
15. Briefly explain the major features of programmable DSPs. (May/June10, 8m)
16. Explain the operation used in DSP to increase the sampling rate. The sequence x(n)=[0,2,4,6,8]
is interpolated using interpolation sequence bk =[1/2,1,1/2] and the interpolation factor is 2.find
the interpolated sequence y(m). (May/June10, 8m)
17. Explain with the help of mathematical equations how signed numbers can be multiplied.
(Dec.10-Jan.11, 8m)
18. The sequence x(n) = [3,2,-2,0,7].It is interpolated using interpolation sequence bk=[0.5,1,0.5]
and the interpolation factor of 2. Find the interpolated sequence y(m).(Dec.10-Jan.11, 6m)
19. Why signal sampling is required? Explain the sampling process. (Dec.12, 5m)
20. Define decimation and interpolation process. Explain them using block diagrams and
equations. (Dec.12, 6m).
UNIT-3
3.1 Introduction:
Leading manufacturers of integrated circuits such as Texas Instruments (TI), Analog Devices and
Motorola manufacture digital signal processor (DSP) chips. These manufacturers have developed a
range of DSP chips of varied complexity.
The TMS320 family consists of two types of single-chip DSPs: 16-bit fixed-point and 32-bit
floating-point. These DSPs possess the operational flexibility of high-speed controllers and the
numerical capability of array processors.
Accumulators A and B store the output from the ALU or the multiplier/adder block and provide a
second input to the ALU. Each accumulator is divided into three parts: guard bits (bits 39-32),
high-order word (bits 31-16), and low-order word (bits 15-0), which can be stored and retrieved
individually. Each accumulator is memory-mapped and partitioned, and can be configured as a
destination register. The guard bits are used as head margin for computations.
Barrel shifter: provides the capability to scale the data during an operand read or write.
No overhead is required to implement the shift needed for the scaling operations. The '54xx barrel
shifter can produce a left shift of 0 to 31 bits or a right shift of 0 to 16 bits on the input data. The shift
count is specified in the shift count field of status register ST1 or in the temporary
register T. Figure 3.3 shows the functional diagram of the barrel shifter of TMS320C54xx processors.
The barrel shifter and the exponent encoder normalize the values in an accumulator in a single cycle.
The LSBs of the output are filled with 0s, and the MSBs can be either zero filled or sign extended,
depending on the state of the sign-extension mode bit in the status register ST1. An additional shift
capability enables the processor to perform numerical scaling, bit extraction, extended arithmetic, and
overflow prevention operations.
Multiplier/adder unit: The kernel of the DSP device architecture is multiplier/adder unit. The
multiplier/adder unit of TMS320C54xx devices performs 17 x 17 2’s complement multiplication with
a 40-bit addition effectively in a single instruction cycle.
In addition to the multiplier and adder, the unit consists of control logic for integer and
fractional computations and a 16-bit temporary storage register, T. Figure 3.4 shows the functional
diagram of the multiplier/adder unit of TMS320C54xx processors. The compare, select, and store unit
(CSSU) is a hardware unit specifically incorporated to accelerate the add/compare/select operation.
This operation is essential to implement the Viterbi algorithm used in many signal-processing
applications. The exponent encoder unit supports the EXP instruction, which stores in the T register
the number of leading redundant bits of the accumulator content. This information is useful while
shifting the accumulator content for the purpose of scaling.
memory. Figure 3.5(a) and (b) show the internal CPU registers and peripheral registers with their
addresses. The processor mode status (PMST) register
is used to configure the processor. It is a memory-mapped register located at address 1Dh on page
0 of the RAM. A part of the on-chip ROM may contain a boot loader and look-up tables for functions
such as sine, cosine, μ-law, and A-law.
HM: Hold mode, indicates whether the processor continues internal execution when acknowledging an
active HOLD signal from the external interface.
IPTR: Interrupt vector pointer, points to the 128-word program page where the interrupt vectors
reside.
MP/MC: Microprocessor/Microcomputer mode,
MP/MC=0, the on-chip ROM is enabled.
MP/MC=1, the on-chip ROM is disabled.
OVLY: RAM overlay, OVLY enables on-chip dual-access data RAM blocks to be mapped into
program space.
AVIS: Address visibility, enables/disables the internal program address to be visible at the address pins.
DROM: Data ROM, DROM enables on-chip ROM to be mapped into data space.
CLKOFF: CLKOUT off.
Data addressing modes provide various ways to access operands to execute instructions and place
results in the memory or the registers. The 54xx devices offer seven basic addressing modes:
1. Immediate addressing.
2. Absolute addressing.
3. Accumulator addressing.
4. Direct addressing.
5. Indirect addressing.
6. Memory mapped addressing
7. Stack addressing.
Figure 3.7 Block diagram of the direct addressing mode for TMS320C54xx Processors.
The TMS320C54xx has eight 16-bit auxiliary registers (AR0 – AR7) and two auxiliary register
arithmetic units (ARAU0 & ARAU1).
The auxiliary registers are used to access memory locations in fixed step sizes. The AR0 register is
used for indexed and bit-reversed addressing modes.
– operand addressing
MOD: type of indirect addressing
ARF: auxiliary register used for addressing
The use of ARP depends on the CMPT bit in ST1:
CMPT = 0: standard mode, ARP is set to zero
CMPT = 1: compatibility mode, the auxiliary register is selected by ARP
Table 3.2 Indirect addressing options with a single data –memory operand.
Circular Addressing:
Bit-Reversed Addressing:
o Used for FFT algorithms.
o AR0 specifies one half of the size of the FFT.
o The value of AR0 = 2^(N-1), where N is an integer and the FFT size is 2^N.
o AR0 is added to the selected auxiliary register with reverse carry propagation.
o The carry bit propagates from left to right.
Dual-Operand Addressing:
Dual data-memory operand addressing is used for instructions that simultaneously
perform two reads (32-bit read) or a single read (16-bit read) and a parallel store (16-bit
store) indicated by two vertical bars, ||. These instructions access operands using indirect addressing
mode.
If, in an instruction with a parallel store, the source operand and the destination operand point to the
same location, the source is read before the destination is written. Only 2 bits are available in the
instruction code for selecting each auxiliary register in this mode. Thus, just four of the auxiliary
registers, AR2-AR5, can be used. The ARAUs, together with these registers, provide the capability to
access two operands in a single cycle. Figure 3.11 shows how an address is generated using dual
data-memory operand addressing.
Problems:
1. Assuming the current content of AR3 to be 200h, what will be its contents after
each of the following TMS320C54xx addressing modes is used? Assume that the
contents of AR0 are 20h.
a. *AR3+0
b. *AR3-0
c. *AR3+
d. *AR3-
e. *AR3
f. *+AR3 (40h)
g. *+AR3 (-40h)
Solution:
a. AR3 ← AR3 + AR0;
AR3 = 200h + 20h = 220h
b. AR3← AR3 - AR0;
AR3 = 200h - 20h = 1E0h
c. AR3 ← AR3 + 1;
AR3 = 200h + 1 = 201h
d. AR3 ← AR3 - 1;
AR3 = 200h - 1 = 1FFh
e. AR3 is not modified.
AR3 = 200h
f. AR3 ← AR3 + 40h;
AR3 = 200h + 40h = 240h
g. AR3 ← AR3 - 40h;
AR3 = 200h - 40h = 1C0h
2. Assuming the current contents of AR3 to be 200h, what will be its contents after
each of the following TMS320C54xx addressing modes is used? Assume that the contents of AR0 are
20h.
a. *AR3+0B
b. *AR3-0B
Solution:
a. AR3 ← AR3 + AR0 with reverse carry propagation;
AR3 = 200h + 20h (with reverse carry propagation) = 220h.
b. AR3 ← AR3 - AR0 with reverse carry propagation;
AR3 = 200h - 20h (with reverse carry propagation) = 23Fh.
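One way to check both results is to treat reverse carry propagation as "reverse the bits, do ordinary arithmetic, reverse the bits again". The Python sketch below takes that view. The helper names `bit_reverse`, `add_bitrev` and `sub_bitrev` are hypothetical, a 16-bit register is assumed, and applying the same rule to subtraction is an interpretation that happens to reproduce the answer given here.

```python
def bit_reverse(value, width=16):
    """Mirror the bit pattern of value within the given width."""
    bits = format(value & ((1 << width) - 1), "0{}b".format(width))
    return int(bits[::-1], 2)


def add_bitrev(ar, ar0, width=16):
    """AR + AR0 with the carry propagating from MSB to LSB."""
    mask = (1 << width) - 1
    total = (bit_reverse(ar, width) + bit_reverse(ar0, width)) & mask
    return bit_reverse(total, width)


def sub_bitrev(ar, ar0, width=16):
    """AR - AR0 with the borrow propagating from MSB to LSB."""
    mask = (1 << width) - 1
    diff = (bit_reverse(ar, width) - bit_reverse(ar0, width)) & mask
    return bit_reverse(diff, width)


print(format(add_bitrev(0x200, 0x20), "X") + "h")  # 220h
print(format(sub_bitrev(0x200, 0x20), "X") + "h")  # 23Fh
```

In case (a) the set bits of 200h and 20h do not overlap, so no carry is generated and the result equals the ordinary sum. In case (b) the borrow ripples towards the least significant bit, which is why the low six bits of the result are all ones.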
Recommended Questions:
1. Compare architectural features of TMS320C25 and DSP6000 fixed point digital signal
processors. (Dec.09-Jan.10, 6m)
2. Write an explanatory note on direct addressing mode of TMS320C54XX processors. Give
example. (Dec.09-Jan.10, 6m)
3. Describe the operation of the following instructions of TMS320C54XX processors.
(i) MPY *AR2-, *AR4+0B (ii) MAC *AR5+, #1234h, A (iii) STH A, 1, *AR2 (iv) SSBX
SXM (Dec.09-Jan.10, 8m)
4. With a block diagram explain the indirect addressing mode of TMS320C54XX processor using
dual data memory operand. (June.12, 6m)
5. What is the function of an address generation unit? Explain with the help of a block diagram.
(Dec.12, 6m)
6. Why are circular buffers required in a DSP processor? How are they implemented? (Dec.12, 2m)
7. Explain the direct addressing mode of the TMS320C54XX processor with the help of a block
diagram. (Dec.12, 2m)
8. Describe the multiplier/adder unit of the TMS320C54xx processor with a neat block diagram.
(May/June2010, 6m)
9. Describe any four data addressing modes of the TMS320C54xx processor. (May/June2010, 8m)
10. Assume that the current content of AR3 is 400h, what will be its contents after each of the
following. Assume that the content of AR0 is 40h. (May/June2010, 8m)
UNIT-4
Branch Instructions
NOP: No Operation
The timer register (TIM) is a 16-bit memory-mapped register that decrements at every pulse from the
prescaler block (PSC).
The timer period register (PRD) is a 16-bit memory-mapped register whose contents are loaded into
the TIM whenever the TIM decrements to zero or the device is reset (SRESET).
The timer can also be independently reset using the TRB signal. The timer control register
(TCR) is a 16-bit memory-mapped register that contains status and control bits. Table shows the
functions of the various bits in the TCR.
The prescaler block is also an on-chip counter. Whenever the prescaler bits count down to 0, a
clock pulse is given to the TIM register, which decrements it by 1. The TDDR bits contain
the divide-down ratio, which is loaded into the prescaler block each time the prescaler bits count
down to 0.
That is, the 4-bit value of TDDR determines the divide-by ratio of the timer clock
with respect to the system clock. In other words, the TIM decrements either at the rate of the system
clock or at a slower rate, as decided by the value of the TDDR bits. TOUT and TINT are the
output signals generated as the TIM register decrements to 0. TOUT can trigger the
start-of-conversion signal in an ADC interfaced to the DSP.
The sampling frequency of the ADC determines how frequently it receives the TOUT signal.
TINT is used to generate interrupts, which are required to service a peripheral such as a DRAM
controller periodically. The timer can also be stopped, restarted, reset, or disabled by specific status
bits.
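The interaction of the prescaler, TIM and PRD described above can be sketched as a simplified behavioural model in Python. This is not cycle-accurate: the function name `timer_interrupt_cycles` is hypothetical, the counters are assumed to start at their reload values, and the exact clock edge on which the hardware asserts TINT is an assumption.

```python
def timer_interrupt_cycles(tddr, prd, n_clocks):
    """Return the clock counts at which TINT fires in this simplified model."""
    if not (0 <= tddr <= 0xF and 0 <= prd <= 0xFFFF):
        raise ValueError("TDDR is 4 bits wide and PRD is 16 bits wide")
    psc, tim = tddr, prd                 # assume both counters start reloaded
    fired = []
    for clock in range(1, n_clocks + 1):
        if psc == 0:
            psc = tddr                   # prescaler reloads from TDDR
            if tim == 0:
                tim = prd                # TIM reloads from PRD
                fired.append(clock)      # TINT (and TOUT) generated here
            else:
                tim -= 1                 # one timer tick
        else:
            psc -= 1                     # prescaler counts system clocks
    return fired


print(timer_interrupt_cycles(tddr=3, prd=9, n_clocks=100))  # [40, 80]
```

In this model the interrupt period is (TDDR + 1) x (PRD + 1) system clock cycles, so TDDR = 3 and PRD = 9 give one TINT every 40 clocks. With TDDR = 0 the TIM decrements at the full system clock rate.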
The synchronous serial ports are high-speed, full-duplex ports that provide direct
communication with serial devices, such as codecs and analog-to-digital (A/D) converters. A buffered
serial port (BSP) is a synchronous serial port that is provided with
an autobuffering unit and is clocked at the full clock rate. The autobuffering unit reduces the
overhead of servicing interrupts. A time-division multiplexed (TDM) serial port is a synchronous
serial port that is provided to allow time-division multiplexing of the data. The functioning of each
of these on-chip peripherals is controlled by
memory-mapped registers assigned to the respective peripheral.
5. In the read phase, the data operand(s), if any, are read from the data buses, DB and CB. This phase
completes the two-phase read process and starts the two-phase write process. The data address of the
write operand, if any, is loaded onto the data write address bus, EAB.
6. The execute phase writes the data using the data write bus, EB, and completes the operand write
sequence. The instruction is executed in this phase.
Recommended Questions: