
ECE-VII-DSP ALGORITHMS & ARCHITECTURE Part A


Smartworld.asia

DSP Algorithm and Architecture 10EC751

University Syllabus

Subject Code : 10EC751 IA Marks : 25


No. of Lecture Hrs/Week : 04 Exam Hours : 03
Total no. of Lecture Hrs. : 52 Exam Marks : 100

PART - A

UNIT - 1
INTRODUCTION TO DIGITAL SIGNAL PROCESSING: Introduction, A Digital Signal-
Processing System, The Sampling Process, Discrete Time Sequences, Discrete Fourier Transform
(DFT) and Fast Fourier Transform (FFT), Linear Time-Invariant Systems, Digital Filters, Decimation
and Interpolation. 5 Hours

UNIT - 2
ARCHITECTURES FOR PROGRAMMABLE DIGITAL SIGNAL-PROCESSORS:
Introduction, Basic Architectural Features, DSP Computational Building Blocks, Bus Architecture and
Memory, Data Addressing Capabilities, Address Generation Unit, Programmability and Program
Execution, Features for External Interfacing. 8 Hours

UNIT - 3
PROGRAMMABLE DIGITAL SIGNAL PROCESSORS: Introduction, Commercial Digital
Signal-processing Devices, Data Addressing Modes of TMS320C54xx, Memory Space of
TMS320C54xx Processors, Program Control. 6 Hours

UNIT - 4
Detail Study of TMS320C54X & 54xx Instructions and Programming, On-Chip Peripherals, Interrupts
of TMS320C54xx Processors, Pipeline Operation of TMS320C54xx Processor. 6 Hours

PART - B
UNIT - 5
IMPLEMENTATION OF BASIC DSP ALGORITHMS: Introduction, The Q-notation, FIR Filters,
IIR Filters, Interpolation and Decimation Filters (one example in each case). 6 Hours

UNIT - 6

Dept.ECE, SJBIT Page 1



IMPLEMENTATION OF FFT ALGORITHMS: Introduction, An FFT Algorithm for DFT
Computation, Overflow and Scaling, Bit-Reversed Index Generation & Implementation on the
TMS320C54xx. 6 Hours

UNIT - 7
INTERFACING MEMORY AND PARALLEL I/O PERIPHERALS TO DSP DEVICES:
Introduction, Memory Space Organization, External Bus Interfacing Signals, Memory Interface,
Parallel I/O Interface, Programmed I/O, Interrupts and I/O, Direct Memory Access (DMA). 8 Hours

UNIT - 8
INTERFACING AND APPLICATIONS OF DSP PROCESSOR: Introduction, Synchronous
Serial Interface, A CODEC Interface Circuit, DSP Based Bio-telemetry Receiver, A Speech
Processing System, An Image Processing System. 6 Hours

TEXT BOOK:
1. “Digital Signal Processing”, Avtar Singh and S. Srinivasan, Thomson Learning, 2004.

REFERENCE BOOKS:
1. “Digital Signal Processing: A Practical Approach”, Ifeachor E. C. and Jervis B. W., Pearson
Education, PHI, 2002.
2. “Digital Signal Processors”, B. Venkataramani and M. Bhaskar, TMH, 2002.
3. “Architectures for Digital Signal Processing”, Peter Pirsch, John Wiley, 2007.


INDEX SHEET

Sl. No.   Unit & Topic of Discussion

PART-A
UNIT-1: INTRODUCTION TO DIGITAL SIGNAL PROCESSING
1    Introduction, A Digital Signal-Processing System
2    The Sampling Process, Discrete Time Sequences
3    Discrete Fourier Transform (DFT) and Fast Fourier Transform (FFT)
4    Linear Time-Invariant Systems, Digital Filters
5    Decimation and Interpolation
UNIT-2: ARCHITECTURES FOR PROGRAMMABLE DIGITAL SIGNAL-PROCESSORS
6    Introduction, Basic Architectural Features
7    DSP Computational Building Blocks
8    Explanations of functional blocks
9    Bus Architecture
10   Memory, Data Addressing Capabilities
11   Address Generation Unit
12   Programmability and Program Execution
13   Features for External Interfacing
UNIT-3: PROGRAMMABLE DIGITAL SIGNAL PROCESSORS
14   Introduction, Commercial Digital Signal-processing Devices
15   Data Addressing Modes of TMS320C54xx-1
16   Data Addressing Modes of TMS320C54xx-2
17   Special addressing modes
18   Memory Space of TMS320C54xx Processors
19   Program Control, Programming
UNIT-4: INSTRUCTIONS AND PROGRAMMING
20   Detail Study of TMS320C54X
21   Instructions
22   Programming
23   On-Chip peripherals
24   Interrupts of TMS320C54xx Processors
25   Pipeline Operation of TMS320C54xx Processor
PART-B
UNIT-5: IMPLEMENTATION OF BASIC DSP ALGORITHMS
26   Introduction, The Q-notation
27   Problems on Q-notation
28   FIR Filters
29   IIR Filters
30   Interpolation Filters
31   Decimation Filters
UNIT-6: IMPLEMENTATION OF FFT ALGORITHMS
32   Introduction, An FFT Algorithm for DFT Computation
33   Overflow and Scaling
34   Bit-Reversed Index Generation
35   Routine for bit reversed index
36   Implementation on the TMS320C54xx-1
37   Implementation on the TMS320C54xx-2
UNIT-7: INTERFACING MEMORY AND PARALLEL I/O PERIPHERALS TO DSP DEVICES
38   Introduction, Memory Space Organization
39   External Bus Interfacing Signals
40   Timing Diagram of interfacing
41   Memory Interface
42   Problems on memory interface
43   Parallel I/O Interface
44   Programmed I/O
45   Interrupts and I/O, Direct Memory Access (DMA)
UNIT-8: INTERFACING AND APPLICATIONS OF DSP PROCESSOR
46   Introduction, Synchronous Serial Interface
47   Block diagram of CODEC
48   A CODEC Interface Circuit
49   ADC interface
50   DSP Based Bio-telemetry Receiver
51   A Speech Processing System
52   An Image Processing System


UNIT-1

Introduction to Digital Signal Processing

1.1 What is DSP?


DSP is a technique of performing mathematical operations on signals in the digital domain.
As real-time signals are analog in nature, the analog signal must first be converted to digital form,
processed in the digital domain and then converted back to the analog domain. Thus an ADC is
required at the input side and a DAC at the output end. A typical DSP system is
shown in figure 1.1.

1.2 Need for DSP

Analog signal processing (ASP) has the following drawbacks:

• Sensitivity to environmental changes
• Aging of components
• Uncertain performance in production units
• Variation in performance between units
• High system cost
• Limited scalability
Using digital signal processing overcomes the above shortcomings of ASP.

1.3 A Digital Signal Processing System

A computer or a processor is used for digital signal processing. The anti-aliasing filter is a
low-pass filter (LPF) that passes signal components with frequencies less than or equal to half the
sampling frequency, in order to avoid aliasing. Similarly, at the other end, a reconstruction filter is
used to reconstruct the analog signal from the staircase output of the DAC (Figure 1.2).


1.4 The Sampling Process

The ADC process involves sampling the signal and then quantizing each sample to a digital value. To
avoid aliasing, the signal has to be sampled at a rate at least equal to the Nyquist rate:

$$f_s \ge 2 f_m$$

where fs is the sampling frequency and fm is the maximum frequency component in the message
signal. If the signal is sampled at a rate less than the Nyquist rate, the higher
frequency components of the signal cannot be reconstructed properly. The plots of the reconstructed
outputs for various conditions are shown in figure 1.4.
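
The effect of sampling below the Nyquist rate can be illustrated with a small sketch. The function name `apparent_frequency` and the 8000 Hz sampling rate are assumptions chosen for illustration; the sketch only computes the baseband frequency at which a sampled tone appears.

```python
def apparent_frequency(f, fs):
    """Return the frequency (Hz) at which a tone of f Hz appears after sampling at fs Hz."""
    f_mod = f % fs                  # sampled tones at f and f + k*fs are indistinguishable
    return min(f_mod, fs - f_mod)   # fold into the baseband range 0 .. fs/2

fs = 8000.0                         # assumed sampling frequency in Hz
for f in (1000.0, 3000.0, 5000.0, 7000.0):
    meets_nyquist = fs >= 2 * f     # Nyquist condition fs >= 2*fm
    print(f, apparent_frequency(f, fs), "ok" if meets_nyquist else "aliased")
```

With these assumed values, the 5000 Hz and 7000 Hz tones violate the Nyquist condition and fold back to 3000 Hz and 1000 Hz respectively, so they cannot be told apart from genuine tones at those frequencies.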


1.5 Discrete Time Sequences

Consider the analog cosine signal $x(t) = A\cos(2\pi f t)$. If the sampling interval is T, replacing t by nT gives

$$x(nT) = A\cos(2\pi f nT), \quad n = 0, 1, 2, \ldots$$

For simplicity, denote x(nT) as x(n). Since $f_s = 1/T$, defining $\theta = 2\pi f T$ gives

$$x(n) = A\cos(2\pi f nT) = A\cos(2\pi f n / f_s) = A\cos(\theta n)$$

θ is called the digital frequency:

$$\theta = 2\pi f T = 2\pi f / f_s \ \text{radians}$$

Fig 1.5 A Cosine Waveform

A sequence that repeats itself after every N samples is called a periodic sequence with period N:

$$x(n) = x(n+N), \quad n = \ldots, -1, 0, 1, 2, \ldots$$

The frequency response gives the frequency-domain equivalent of a discrete-time sequence. It is denoted as

$$X(e^{j\theta}) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-jn\theta}$$

The frequency response of a discrete sequence involves both a magnitude response and a phase response.

1.6 Discrete Fourier Transform and Fast Fourier Transform

1.6.1 DFT Pair:


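The DFT transforms a time-domain sequence x(n) into a frequency-domain sequence X(k). The
equations that relate x(n) and X(k) are called the DFT pair. For a sequence of N samples they are

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N}, \quad k = 0, 1, \ldots, N-1$$

$$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\, e^{j2\pi nk/N}, \quad n = 0, 1, \ldots, N-1$$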
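
As a minimal sketch, the standard N-point DFT pair can be evaluated directly from its defining sums. The function names `dft` and `idft` and the four-sample test sequence are assumptions for illustration; the code favours clarity over speed.

```python
import cmath

def dft(x):
    """Forward DFT: X(k) = sum over n of x(n) * exp(-j*2*pi*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT: x(n) = (1/N) * sum over k of X(k) * exp(+j*2*pi*n*k/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]

x = [1.0, 2.0, 3.0, 4.0]          # assumed example sequence
X = dft(x)
x_back = idft(X)                  # recovers x up to rounding error
print([complex(round(v.real, 6), round(v.imag, 6)) for v in X])
```

Each of the N outputs needs a sum of N terms, which is the source of the N-squared operation count that motivates FFT algorithms.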


1.6.2 The Relationship between DFT and Frequency Response:

We have,

$$X(e^{j\theta})\big|_{\theta = 2\pi k/N} = X(k)$$

From the above expression it is clear that the DFT can be used to find the frequency response of a
discrete signal. The spacing between adjacent elements of X(k) is

$$\Delta f = \frac{f_s}{N} = \frac{1}{NT} = \frac{1}{T_0}$$

where T0 = NT is the signal record length.

It is clear from the expression for Δf that, to reduce the spacing between the frequency samples, the
number of samples N has to be large. Although the DFT is an effective technique for obtaining the
frequency response of a sequence, it requires a large number of complex additions and
multiplications.
Thus many improvements over the direct DFT were proposed. One such technique uses the
periodicity property of the twiddle factor $W_N = e^{-j2\pi/N}$. The resulting algorithms are called Fast Fourier
Transform (FFT) algorithms. The following table depicts the complexity involved in the computation using
DFT algorithms.
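
The difference in complexity can be sketched numerically. The counts used here are the usual textbook figures (N² complex multiplications and N(N-1) complex additions for direct evaluation; (N/2)·log2 N multiplications and N·log2 N additions for a radix-2 FFT, counting trivial twiddle factors). The function names are assumptions for illustration.

```python
def direct_dft_ops(N):
    """Complex operation counts for evaluating all N outputs of the DFT sum directly."""
    return {"multiplies": N * N, "adds": N * (N - 1)}

def radix2_fft_ops(N):
    """Complex operation counts for a radix-2 FFT; N is assumed to be a power of two."""
    stages = N.bit_length() - 1          # log2(N) butterfly stages
    return {"multiplies": (N // 2) * stages, "adds": N * stages}

for N in (8, 64, 1024):
    print(N, direct_dft_ops(N), radix2_fft_ops(N))
```

For N = 1024 the direct method needs 1,048,576 complex multiplications, against 5,120 for the radix-2 FFT under these counting assumptions.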


FFT algorithms are classified into two categories, viz.


1. Decimation in Time FFT
2. Decimation in Frequency FFT
In decimation in time FFT, the time-domain sequence x(n) is divided successively until
sequences of length 2 are reached, whereas in decimation in frequency FFT the frequency-domain
sequence X(k) is divided successively. FFT algorithms reduce the computational complexity
considerably compared with direct evaluation of the DFT.
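
A recursive radix-2 decimation in time FFT can be sketched as follows. This is an illustrative sketch, not the implementation used later in these notes: the name `fft_dit` is an assumption, the input length is assumed to be a power of two, and the recursion is carried down to length 1 rather than stopping at length 2 for brevity.

```python
import cmath

def fft_dit(x):
    """Radix-2 decimation in time FFT; len(x) is assumed to be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft_dit(x[0::2])              # DFT of the even-indexed samples
    odd = fft_dit(x[1::2])               # DFT of the odd-indexed samples
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle factor times odd part
        X[k] = even[k] + t               # butterfly: upper output
        X[k + N // 2] = even[k] - t      # butterfly: lower output
    return X

print(fft_dit([1, 2, 3, 4]))
```

Each level of recursion splits the sequence in time (even and odd samples) and combines the two half-length transforms with N/2 butterflies.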

1.7 Linear Time Invariant Systems

A system that satisfies the superposition theorem is called a linear system, and a system whose
input-output relation is the same at all times is called a time-invariant system. Systems that satisfy
both properties are called LTI systems.

An LTI system is characterized by its impulse response (unit sample response) in the time domain
and by its system function in the frequency domain.

1.7.1 Convolution
Convolution is the operation that relates the input and output of an LTI system through its unit sample
response. For an input x(n) and impulse response h(n), the output y(n) of the system is given by

$$y(n) = x(n) * h(n) = \sum_{k=-\infty}^{\infty} h(k)\, x(n-k)$$

where x(n) is the input of the system, h(n) is the impulse response of the system and y(n) is the
output of the system.
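
For finite-length sequences the convolution sum can be evaluated directly. The sketch below assumes both sequences start at index 0 and are zero elsewhere; the function name `convolve` and the example values are chosen for illustration.

```python
def convolve(x, h):
    """Linear convolution y(n) = sum over k of h(k) * x(n-k) for finite sequences."""
    y = [0] * (len(x) + len(h) - 1)      # output length is len(x) + len(h) - 1
    for n in range(len(y)):
        for k in range(len(h)):
            if 0 <= n - k < len(x):      # x(n-k) is zero outside its range
                y[n] += h[k] * x[n - k]
    return y

print(convolve([1, 2, 3], [1, 1]))       # prints [1, 3, 5, 3]
```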

1.7.2 Z Transformation
The Z transform is used to find the frequency response of the system. The Z transform of
a discrete sequence x(n) is given by

$$X(z) = \sum_{n=-\infty}^{\infty} x(n)\, z^{-n}$$

1.7.3 The System Function


An LTI system is characterized by its system function, or transfer function. The system
function is the ratio of the Z transform of the output to that of the input. It is denoted
H(z) and is given by H(z) = Y(z)/X(z).
The magnitude and phase of the transfer function H(z), evaluated on the unit circle z = e^{jθ}, give
the frequency response of the system. The zeros and poles of the system are obtained by finding the
roots of the numerator and denominator of H(z) respectively.

1.8 Digital Filters


Filters are used to remove unwanted components from a sequence. They are characterized
by the impulse response h(n). The general difference equation for an Nth order filter is given by

$$y(n) = \sum_{k=1}^{N} a_k\, y(n-k) + \sum_{k=0}^{N} b_k\, x(n-k)$$

A typical digital filter structure is shown in figure 1.7.

Fig 1.7 Structure of a Digital Filter

The values of the filter coefficients vary with the type of filter. Designing a digital filter
involves determining the filter coefficients. Based on the length of the impulse response, digital filters
are classified into two categories, viz. Finite Impulse Response (FIR) filters and Infinite Impulse
Response (IIR) filters.


1.8.1 FIR Filters


FIR filters have impulse responses of finite length. In an FIR filter the present output depends
only on the present and past values of the input sequence, not on previous outputs.
FIR filters are therefore non-recursive and hence inherently stable. They can also be designed to have
an exactly linear phase response, which makes them well suited to applications that require linear phase.
The difference equation of an FIR filter is

$$y(n) = \sum_{k=0}^{N} b_k\, x(n-k)$$

The frequency response of an FIR filter is

$$H(e^{j\theta}) = \sum_{k=0}^{N} b_k\, e^{-jk\theta}$$

The major drawback of FIR filters is that they require more filter coefficients than IIR filters to
realize a desired response. Thus the computation time required is also higher.
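
A direct-form FIR filter, in which each output is a weighted sum of the present and past inputs only, can be sketched as follows. The function name `fir_filter` is an assumption; the coefficients are those of the three-point averaging filter y(n) = (x(n) + x(n-1) + x(n-2))/3 that appears in the recommended questions, driven by an assumed unit step input.

```python
def fir_filter(b, x):
    """Non-recursive filter: y(n) = sum over k of b[k] * x(n-k), with x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:               # samples before n = 0 are taken as zero
                acc += bk * x[n - k]
        y.append(acc)
    return y

b = [1 / 3, 1 / 3, 1 / 3]                # three-point moving average
step = [1, 1, 1, 1, 1]                   # assumed unit step input
print(fir_filter(b, step))
```

The step response settles after three samples, which reflects the finite length of the impulse response.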

1.8.2 IIR Filters


Unlike FIR filters, IIR filters have an infinite number of impulse response samples. They are
recursive filters, as the output depends not only on the present and past inputs but also on past
outputs. They generally do not have linear phase characteristics. A typical system function of such
a filter is

$$H(z) = \frac{\sum_{k=0}^{N} b_k\, z^{-k}}{1 - \sum_{k=1}^{N} a_k\, z^{-k}}$$

The stability of an IIR filter depends on the number and the values of the filter coefficients. The major
advantage of IIR filters over FIR filters is that they require fewer coefficients for the
same desired response, and thus less computation time.
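
The recursive nature of an IIR filter can be sketched with a direct-form loop that feeds past outputs back into the sum. The function name `iir_filter` and the convention that `a` holds the feedback coefficients a1, a2, ... are assumptions for illustration; the example uses the first-order filter y(n) = 0.9 y(n-1) + 0.1 x(n) from the Problems section.

```python
def iir_filter(b, a, x):
    """Recursive filter: y(n) = sum b[k]*x(n-k) + sum a[k-1]*y(n-k) for k >= 1."""
    y = []
    for n in range(len(x)):
        acc = sum(bk * x[n - k] for k, bk in enumerate(b) if n - k >= 0)
        acc += sum(ak * y[n - k] for k, ak in enumerate(a, start=1) if n - k >= 0)
        y.append(acc)
    return y

impulse = [1, 0, 0, 0, 0, 0]
h = iir_filter([0.1], [0.9], impulse)    # first samples of the impulse response
print(h)
```

A single feedback coefficient already produces an impulse response that never becomes exactly zero (it decays as 0.1 times 0.9 to the power n), whereas an FIR filter would need one coefficient per nonzero sample.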

1.8.3 FIR Filter Design


The frequency response of an FIR filter is given by the following expression,

$$H(e^{j\theta}) = \sum_{k=0}^{N} b_k\, e^{-jk\theta}$$

The design procedure for an FIR filter involves determining the filter coefficients bk.

1.8.4 IIR Filter Design


IIR filters can be designed using two methods, viz. the analog prototype method and the direct method.
In the analog prototype approach, a digital filter is designed based on its equivalent analog filter. An
analog filter is first designed to the analog specifications that correspond to the given digital
specifications. Then, using appropriate frequency transformations, the digital filter is obtained. The
filter specifications consist of the passband and stopband ripples in dB and the passband and stopband
frequencies in rad/sec.


Fig 1.11 Lowpass Filter Specifications

Direct IIR filter design methods are based on least squares fit to a desired frequency response. These
methods allow arbitrary frequency response specifications.

1.9 Decimation and Interpolation


Decimation and Interpolation are two techniques used to alter the sampling rate of a sequence.
Decimation involves decreasing the sampling rate without violating the sampling theorem whereas
interpolation increases the sampling rate of a sequence appropriately by considering its neighboring
samples.

1.9.1 Decimation
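Decimation is the process of dropping samples without violating the sampling theorem. The
factor by which the signal is decimated is called the decimation factor and is denoted by M. The
signal is first low-pass filtered to prevent aliasing and then every Mth sample is retained. The
decimated output is given by

$$y(m) = w(mM) = \sum_{k} b_k\, x(mM - k)$$

where $w(n) = \sum_{k} b_k\, x(n-k)$ is the output of the low-pass filter with coefficients bk.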

Fig 1.12 Decimation Process
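
Decimation can be sketched as low-pass filtering followed by keeping every Mth sample. The function name `decimate`, the two-tap averaging filter and the ramp input are assumptions for illustration; a practical anti-aliasing filter would be designed for the new, lower sampling rate.

```python
def decimate(x, M, b):
    """Filter x with FIR coefficients b, then keep every M-th output sample."""
    w = [sum(bk * x[n - k] for k, bk in enumerate(b) if n - k >= 0)
         for n in range(len(x))]         # w(n) = sum over k of b[k] * x(n-k)
    return w[::M]                        # y(m) = w(m*M)

x = list(range(8))                       # assumed input: 0, 1, ..., 7
print(decimate(x, 2, [0.5, 0.5]))        # prints [0.0, 1.5, 3.5, 5.5]
```

The output has one sample for every M input samples, so the sampling rate is reduced by the factor M.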


1.9.2 Interpolation
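Interpolation is the process of increasing the sampling rate by inserting new samples between the
existing ones. The input-output relation for interpolation, where the sampling rate is increased by a
factor L, is given as

$$y(m) = \sum_{k} b_k\, w(m-k)$$

where w(m) = x(m/L) for m = 0, ±L, ±2L, ... and w(m) = 0 otherwise, and bk are the coefficients of
the interpolation filter.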

Fig 1.13 Interpolation Process

Problems:

1. Obtain the transfer function of the IIR filter whose difference equation is given by
y(n) = 0.9 y(n-1) + 0.1 x(n)
Taking the Z transform of both sides,
Y(z) = 0.9 z^{-1} Y(z) + 0.1 X(z)
Y(z) [1 - 0.9 z^{-1}] = 0.1 X(z)
The transfer function of the system is given by the expression,
H(z) = Y(z)/X(z)
= 0.1 / [1 - 0.9 z^{-1}]
Realization of the IIR filter with the above difference equation is as shown in figure.
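
As a quick check on this result, the transfer function can be evaluated on the unit circle, z = e^{jθ}, to see how the filter treats different frequencies. The function name `H` and the chosen test frequencies are assumptions for illustration.

```python
import cmath

def H(theta):
    """Evaluate H(z) = 0.1 / (1 - 0.9 z^-1) at z = exp(j*theta)."""
    z_inv = cmath.exp(-1j * theta)
    return 0.1 / (1 - 0.9 * z_inv)

for theta in (0.0, cmath.pi / 2, cmath.pi):
    print(round(theta, 4), round(abs(H(theta)), 4))
```

The magnitude is 1 at θ = 0 and falls to 0.1/1.9 (about 0.053) at θ = π, so this filter behaves as a low-pass filter. Its single pole at z = 0.9 lies inside the unit circle, which is consistent with a stable filter.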


2. Let x(n) = [0 3 6 9 12] be interpolated with L = 3. If the filter coefficients of the
interpolation filter are bk = [1/3 2/3 1 2/3 1/3], obtain the interpolated sequence.

After inserting zeros,

w(m) = [0 0 0 3 0 0 6 0 0 9 0 0 12]
bk = [1/3 2/3 1 2/3 1/3] for k = -2, -1, 0, 1, 2
We have,
$$y(m) = \sum_{k=-2}^{2} b_k\, w(m-k) = b_{-2} w(m+2) + b_{-1} w(m+1) + b_0 w(m) + b_1 w(m-1) + b_2 w(m-2)$$
Substituting the values of m (with w(m) = 0 outside the range shown), we get
y(0) = b_{-2} w(2) + b_{-1} w(1) + b_0 w(0) + b_1 w(-1) + b_2 w(-2) = 0
y(1) = b_{-2} w(3) + b_{-1} w(2) + b_0 w(1) + b_1 w(0) + b_2 w(-1) = 1
y(2) = b_{-2} w(4) + b_{-1} w(3) + b_0 w(2) + b_1 w(1) + b_2 w(0) = 2
Similarly we get the remaining samples as,
y(m) = [0 1 2 3 4 5 6 7 8 9 10 11 12]
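
The hand calculation in Problem 2 can be checked with a short sketch. The function name `interpolate` is an assumption; exact fractions are used so that the coefficients 1/3 and 2/3 introduce no rounding error, and the coefficient list is taken to hold b_k for k = -2 to 2 as in the worked solution.

```python
from fractions import Fraction

def interpolate(x, L, b):
    """Zero-insert by L, then apply y(m) = sum over k of b_k * w(m-k), b centred on k = 0."""
    w = []
    for i, sample in enumerate(x):
        w.append(sample)
        if i < len(x) - 1:
            w.extend([0] * (L - 1))      # L-1 zeros between neighbouring samples
    half = len(b) // 2                   # b holds b_k for k = -half .. +half
    y = []
    for m in range(len(w)):
        acc = 0
        for k in range(-half, half + 1):
            if 0 <= m - k < len(w):      # w(m) is taken as 0 outside its range
                acc += b[k + half] * w[m - k]
        y.append(acc)
    return w, y

b = [Fraction(1, 3), Fraction(2, 3), Fraction(1), Fraction(2, 3), Fraction(1, 3)]
w, y = interpolate([0, 3, 6, 9, 12], 3, b)
print(w)
print([int(v) for v in y])
```

The triangular coefficient set performs linear interpolation, which is why the output is the straight ramp 0, 1, 2, ..., 12 passing through the original samples.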

Recommended Questions
1. Explain with the help of mathematical equations how signed numbers can be
multiplied. (July.11, 8m)
2. The sequence x(n) = [3, 2, -2, 0, 7] is interpolated using the interpolation
sequence bk = [0.5, 1, 0.5] and an interpolation factor of 2. Find the interpolated
sequence y(m).
3. An analog signal is sampled at the rate of 8 kHz. If 512 samples of this signal are used
to compute the DFT X(k), determine the analog and digital frequency spacing between
adjacent X(k) elements. Also determine the analog and digital frequencies corresponding
to k = 60.
4. With a neat diagram explain the scheme of the DSP system. (Dec.10-Jan.11, 8m)
(July.11, 8m) (June.12, 8m)
5. What is DSP? What are the important issues to be considered in designing and
implementing a DSP system? Explain in detail.
6. Why is signal sampling required? Explain the sampling process.
7. Define the decimation and interpolation processes. Explain them using block diagrams and
equations.
8. With an example explain the need for the low pass filter in the decimation process.
(June.12, 4m)
9. For the FIR filter y(n) = (x(n) + x(n-1) + x(n-2))/3, determine i) System Function ii)
Magnitude and phase function iii) Step response iv) Group Delay. (June.12, 8m)
10. List the major architectural features used in a DSP system to achieve high speed program
execution. (Dec.11, 6m)
11. Explain how to simulate the impulse responses of FIR and IIR filters. (Dec.11, 6m)
12. Explain the two methods of sampling rate conversion used in DSP systems, with suitable
block diagrams and examples. Draw the corresponding spectrum. (Dec.11, 8m)
13. Assuming X(k) is a complex sequence, determine the number of complex and real
multiplies for computing the IDFT using the direct and radix-2 FFT algorithms.


UNIT-2
Architectures for Programmable Digital Signal Processing Devices
2.1 Basic Architectural Features
A programmable DSP device should provide instructions similar to those of a conventional
microprocessor. The instruction set of a typical DSP device should include the following:
a. Arithmetic operations such as ADD, SUBTRACT, MULTIPLY etc.
b. Logical operations such as AND, OR, NOT, XOR etc.
c. Multiply and Accumulate (MAC) operation
d. Signal scaling operation
In addition to the above provisions, the architecture should also include:
a. On-chip registers to store intermediate results
b. On-chip memories to store signal samples (RAM)
c. On-chip memories to store filter coefficients (ROM)
2.2 DSP Computational Building Blocks
Each computational block of the DSP should be optimized for functionality and speed, and at
the same time the design should be sufficiently general that it can be easily integrated with other
blocks to implement overall DSP systems.

2.2.1 Multipliers
The advent of single-chip multipliers paved the way for implementing DSP functions on a
VLSI chip. Parallel multipliers have now replaced the traditional shift-and-add multipliers. A parallel
multiplier takes a single processor cycle to fetch and execute the instruction and to store the result.
Parallel multipliers are also called array multipliers. The key features to be considered for a multiplier are:
a. Accuracy
b. Dynamic range
c. Speed

The number of bits used to represent the operands decides the accuracy and the dynamic range
of the multiplier, whereas the speed is decided by the architecture employed. If the multiplier is
implemented in hardware, the speed of execution will be very high, but the circuit complexity
also increases considerably. Thus there is a tradeoff between the speed of execution and the
circuit complexity, and the choice of architecture normally depends on the application.

2.2.2 Parallel Multipliers


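Consider the multiplication of two unsigned numbers A and B. Let A be represented using m
bits as (A_{m-1} A_{m-2} ... A_1 A_0) and B be represented using n bits as (B_{n-1} B_{n-2} ... B_1 B_0), so that

$$A = \sum_{i=0}^{m-1} A_i\, 2^i, \qquad B = \sum_{j=0}^{n-1} B_j\, 2^j$$

Then the product of these two numbers is given by,

$$P = A \cdot B = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} A_i B_j\, 2^{i+j}$$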


This operation can be implemented in parallel using a Braun multiplier, whose hardware structure is
shown in figure 2.1.

Fig 2.1 Braun Multiplier for a 4X4 Multiplication
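
The arithmetic performed by such an array multiplier can be sketched in software: every bit of A is ANDed with every bit of B, and each partial product bit is added in with weight 2^(i+j). The function names `bits` and `array_multiply` are assumptions for illustration; the sketch models the arithmetic only, not the adder arrangement or timing of the hardware.

```python
def bits(value, width):
    """Return the bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(width)]

def array_multiply(a, b, m, n):
    """Unsigned product formed from the m*n partial product bits A_i AND B_j."""
    A, B = bits(a, m), bits(b, n)
    product = 0
    for i in range(m):
        for j in range(n):
            product += (A[i] & B[j]) << (i + j)   # AND gate output weighted by 2^(i+j)
    return product

print(array_multiply(13, 11, 4, 4))      # prints 143
```

In hardware all the AND terms are generated at the same time and summed by an array of adders, which is what allows the product to be formed without a cycle per bit.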


2.2.3 Multipliers for Signed Numbers

The Braun multiplier does not take the signs of the numbers into account. To
implement a multiplier for signed numbers, additional hardware is required to modify the Braun
multiplier. The modified multiplier is called the Baugh-Wooley multiplier.

Consider two signed numbers A and B in two's complement form,

$$A = -A_{m-1}\, 2^{m-1} + \sum_{i=0}^{m-2} A_i\, 2^i, \qquad B = -B_{n-1}\, 2^{n-1} + \sum_{j=0}^{n-2} B_j\, 2^j$$
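
Assuming two's complement operands, where the most significant bit carries a negative weight, multiplying the two representations gives two groups of positive terms and two groups of negative terms. The sketch below evaluates that expansion directly; the function name `signed_multiply` is an assumption, and the code shows only the arithmetic identity. An actual Baugh-Wooley multiplier goes further and rewrites the negative terms so that the array needs adders only.

```python
def signed_multiply(a, b, m, n):
    """Multiply an m-bit and an n-bit two's complement number from their bits."""
    A = [(a >> i) & 1 for i in range(m)]     # bit m-1 is the sign bit (weight -2^(m-1))
    B = [(b >> j) & 1 for j in range(n)]     # bit n-1 is the sign bit (weight -2^(n-1))
    positive = (A[m - 1] & B[n - 1]) << (m + n - 2)          # sign bit times sign bit
    positive += sum((A[i] & B[j]) << (i + j)
                    for i in range(m - 1) for j in range(n - 1))
    negative = sum((A[m - 1] & B[j]) << (m - 1 + j) for j in range(n - 1))
    negative += sum((A[i] & B[n - 1]) << (n - 1 + i) for i in range(m - 1))
    return positive - negative

print(signed_multiply(-3, 5, 4, 4))      # prints -15
```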

2.2.4 Speed
The conventional shift-and-add technique of multiplication requires n cycles to multiply
two n-bit numbers, whereas in a parallel multiplier the time required is the
longest path delay of the combinational circuit used. As DSP applications generally require very high
speed, it is desirable to have multipliers operating at the highest possible speed by using a parallel
implementation.

2.2.5 Bus Widths


Consider the multiplication of two n-bit numbers X and Y. The product Z can be at most 2n
bits long. To perform the whole operation in a single execution cycle, we require two buses of
width n bits each to fetch the operands X and Y, and a bus of width 2n bits to store the result Z to
memory. Although this performs the operation quickly, it is an expensive implementation. Many
alternatives have been proposed. One such method is to use
the program bus itself to fetch one of the operands after fetching the instruction, thus requiring only
one data bus to fetch the operands. The result Z can then be stored back to memory using the same
operand bus. The problem is that Z is 2n bits long whereas the operand bus is only n
bits wide. There are two ways to solve this problem:
a. Use the n-bit operand bus and save Z at two successive memory locations. Although this stores the
exact value of Z in memory, it takes two cycles to store the result.
b. Discard the lower n bits of Z and store only the higher-order n bits in memory. This is
not suitable for applications where an accurate result is required.
Another alternative can be used for applications where speed is not a major concern: latches are used
for the inputs and outputs, so that a single bus suffices to fetch the operands and to store the result
(Fig 2.2).
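
The two ways of storing a 2n-bit product over an n-bit bus can be compared with a small sketch. The function name `multiply_and_split`, the word length n = 8 and the operand values are assumptions for illustration, and the operands are taken as unsigned.

```python
def multiply_and_split(x, y, n):
    """Return the 2n-bit product of two unsigned n-bit numbers and its high and low words."""
    z = x * y
    mask = (1 << n) - 1
    return z, (z >> n) & mask, z & mask

z, high, low = multiply_and_split(200, 100, 8)
exact = (high << 8) | low                # option a: both words stored, two store cycles
truncated = high << 8                    # option b: only the high word is kept
print(z, high, low, exact, truncated, z - truncated)
```

For these values the product is 20000. Storing both words recovers it exactly, while keeping only the high word gives 19968, an error of 32. The error from discarding the low word is always less than 2^n.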


Fig 2.2: A Multiplier with Input and Output Latches

2.2.6 Shifters

Shifters are used to scale operands or results down or up. The following
scenarios show the need for a shifter:
a. While adding N numbers, each n bits long, the sum can grow up to n + log2 N
bits. If the accumulator is n bits long, an overflow error will occur. This can be overcome
by using a shifter to scale down each operand by log2 N bits.
b. Similarly, while calculating the product of two n-bit numbers, the product can grow up to 2n bits
long. Generally the lower n bits are discarded and the sign bit is shifted to preserve the sign of the product.
c. Finally, when adding two floating-point numbers, one of the operands has to be shifted
appropriately to make the exponents of the two numbers equal.
From the above cases it is clear that a shifter is required in the architecture of a DSP.
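
Case (a) can be illustrated numerically. The sketch assumes unsigned 8-bit operands and an 8-bit accumulator; the function name `scaled_sum` and the data values are chosen for illustration.

```python
def scaled_sum(values, n):
    """Scale each n-bit operand down by ceil(log2 N) bits before summing N of them."""
    shift = (len(values) - 1).bit_length()    # ceil(log2 N) for N >= 1
    total = sum(v >> shift for v in values)   # guaranteed to fit in n bits
    return total, shift

values = [255] * 8                            # N = 8 worst-case 8-bit operands
print(sum(values).bit_length())               # unscaled sum needs 8 + log2(8) = 11 bits
print(scaled_sum(values, 8))                  # prints (248, 3)
```

The unscaled sum 2040 needs 11 bits, while the scaled sum 248 fits in 8 bits. The cost is the loss of the three least significant bits of every operand.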

2.2.7 Barrel Shifters

In conventional microprocessors, ordinary shift registers are used for shift operations. As a shift
register requires one clock cycle for each bit position shifted, it is not desirable for DSP applications,
which generally involve many shifts. Since speed is crucial in DSP applications, several bit positions
have to be shifted in a single execution cycle. This can be accomplished using a barrel shifter,
which connects the input lines representing a word to a group of output lines, with the required shift
determined by its control inputs. For an input of length n, log2 n control lines are required, and an
additional control line is required to indicate the direction of the shift.
The block diagram of a typical barrel shifter is shown in figure 2.3.

Fig 2.3 A Barrel Shifter


Fig 2.4 Implementation of a 4 bit Shift Right Barrel Shifter

Figure 2.4 depicts the implementation of a 4 bit shift right barrel shifter. Shift to right by 0, 1, 2 or 3
bit positions can be controlled by setting the control inputs appropriately.
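
The behaviour of a right-shifting barrel shifter can be modelled in a few lines. The sketch assumes one common organization, a logarithmic shifter in which control bit i selects a shift by 2^i positions; this is not necessarily the switch arrangement of Figure 2.4, and the function name `barrel_shift_right` is an assumption.

```python
def barrel_shift_right(word, shift, n=4):
    """Shift an n-bit word right by 0 .. n-1 positions using log2(n) selectable stages."""
    stages = (n - 1).bit_length()        # log2(n) control lines for n a power of two
    for i in range(stages):
        if (shift >> i) & 1:             # control bit i enables a shift by 2^i positions
            word >>= (1 << i)
    return word & ((1 << n) - 1)

for s in range(4):
    print(s, format(barrel_shift_right(0b1011, s), "04b"))
```

In hardware all stages are combinational, so any shift amount is completed in one pass through the circuit rather than one clock cycle per bit position.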

2.3 Multiply and Accumulate Unit


Most DSP applications require the computation of a sum of products of a series of
successive multiplications. To implement such functions, a special unit called a Multiply and
Accumulate (MAC) unit is required. A MAC unit consists of a multiplier and a special register called
the accumulator. MACs are used to implement functions of the type A + BC. A typical MAC unit is
shown in figure 2.5.


Fig 2.5 A MAC Unit

Although multiplication and addition are two different operations, they can be performed in parallel.
While the multiplier is computing a product, the accumulator can accumulate the product of the
previous multiplication. Thus if N products are to be accumulated, N-1 multiplications can overlap
with N-1 additions. During the very first multiplication the accumulator is idle, and during the last
accumulation the multiplier is idle. Thus N+1 clock cycles are required to compute the sum of N
products.
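
The N+1 cycle count can be reproduced with a simple cycle-by-cycle model. The sketch assumes a two-stage arrangement in which the multiplier writes a product register in one cycle and the adder accumulates it in the next; the names `pipelined_mac` and `product_reg` are assumptions for illustration.

```python
def pipelined_mac(b, c):
    """Return (sum of b[i]*c[i], clock cycles) for overlapped multiply and accumulate."""
    acc = 0
    product_reg = None                   # product computed in the previous cycle
    cycles = 0
    for i in range(len(b) + 1):          # N multiply cycles plus one final accumulate
        if product_reg is not None:
            acc += product_reg           # adder accumulates the previous product
        # the multiplier works on the next pair, and idles in the last cycle
        product_reg = b[i] * c[i] if i < len(b) else None
        cycles += 1
    return acc, cycles

print(pipelined_mac([1, 2, 3], [4, 5, 6]))   # prints (32, 4)
```

Three products are accumulated in four cycles: the accumulator is idle in the first cycle and the multiplier is idle in the last.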

2.3.1 Overflow and Underflow


While designing a MAC unit, attention has to be paid to the word sizes at the
input of the multiplier and to the sizes of the add/subtract unit and the accumulator, as
overflow and underflow are possible. Overflow/underflow can be avoided by using any of the
following methods, viz.
a. Using shifters at the input and the output of the MAC
b. Providing guard bits in the accumulator
c. Using saturation logic

Shifters
Shifters can be provided at the input of the MAC to normalize the data and at the output to
denormalize it.

Guard bits
As the normalization process loses precision, it is not desirable for some
applications. In such cases additional bits, called guard bits, can be provided in
the accumulator so that no overflow occurs. The add/subtract unit also has to be
widened accordingly to handle the additional bits of the accumulator.


Saturation Logic
Overflow/underflow occurs if the result goes above the most positive number or below
the most negative number the accumulator can handle. The overflow/underflow error can thus be
limited by loading the accumulator with the most positive number it can handle on
overflow and the most negative number it can handle on underflow. This method is
called saturation logic. A schematic diagram of saturation logic is shown in figure 2.7. In
saturation logic, as soon as an overflow or underflow condition is detected, the accumulator is
loaded with the most positive or most negative number, overriding the result computed by the MAC
unit.

Fig 2.7: Schematic Diagram of the Saturation Logic
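
Saturation can be sketched as a clamp applied to the result of each accumulation. The function name `saturating_add` and the 16-bit two's complement accumulator width are assumptions for illustration.

```python
def saturating_add(acc, value, bits=16):
    """Add value to acc, clamping the result to the signed range of a bits-wide accumulator."""
    max_pos = (1 << (bits - 1)) - 1      # most positive value, 32767 for 16 bits
    min_neg = -(1 << (bits - 1))         # most negative value, -32768 for 16 bits
    result = acc + value
    if result > max_pos:
        return max_pos                   # overflow: load the most positive number
    if result < min_neg:
        return min_neg                   # underflow: load the most negative number
    return result

print(saturating_add(30000, 5000))       # prints 32767
print(saturating_add(-30000, -5000))     # prints -32768
print(saturating_add(100, 23))           # prints 123
```

Without saturation a two's complement accumulator would wrap around, turning a large positive sum into a negative one. Clamping still loses information, but the error is far smaller.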

2.4 Arithmetic and Logic Unit


A typical DSP device should be capable of handling arithmetic instructions like ADD, SUB,
INC, DEC etc and logical operations like AND, OR , NOT, XOR etc. The block diagram of a typical
ALU for a DSP is as shown in the figure 2.8.
It consists of status flag register, register file and multiplexers.


Fig 2.8 Arithmetic Logic Unit of a DSP

Status Flags
ALU includes circuitry to generate status flags after arithmetic and logic operations. These flags
include sign, zero, carry and overflow.

Overflow Management
Depending on the status of overflow and sign flags, the saturation logic can be used to limit the
accumulator content.

Register File
Instead of moving data in and out of the memory during the operation, for better speed, a large set of
general purpose registers are provided to store the intermediate results.

2.5 Bus Architecture and Memory


Conventional microprocessors use the Von Neumann architecture for memory management,
in which the same memory is used to store both program and data (Fig 2.9). Although this
architecture is simple, it takes more processor cycles to execute a single
instruction, as the same bus is used for both data and program.


Fig 2.9 Von Neumann Architecture

To increase the speed of operation, separate memories are used to store program and
data, and each memory is given its own set of data and address buses. This architecture is
called the Harvard architecture. It is shown in figure 2.10.

Fig 2.10 Harvard Architecture

Although the use of separate memories for data and instructions speeds up processing,
it does not completely solve the problem. As many DSP instructions require more than one
operand, a single data memory forces the operands to be fetched one after the other, thus
increasing the processing delay. This problem can be overcome by using two separate data
memories to store the operands, so that both operands can be fetched together
in a single clock cycle (Figure 2.11).


Fig 2.11 Harvard Architecture with Dual Data Memory

Although the above architecture improves the speed of operation, it requires more hardware
and interconnections, thus increasing the cost and complexity of the system. Therefore there should be
a trade off between the cost and speed while selecting memory architecture for a DSP.

2.5.1 On-chip Memories


For faster execution of DSP functions, it is desirable to have some memory
located on chip. As dedicated buses are used to access it, on-chip memory is faster.
Speed and size are the two key parameters to be considered for on-chip memories.
Speed
On-chip memories should match the speed of the ALU operations in order to maintain
single-cycle instruction execution in the DSP.
Size
In a given area of the DSP chip, it is desirable to implement as many DSP functions as possible. Thus
the area occupied by the on-chip memory should be kept to a minimum so that there is scope for
implementing more DSP functions on chip.

2.5.2 Organization of On-chip Memories


Ideally, the whole memory required for the implementation of a DSP algorithm would reside
on chip, so that every memory access completes in a single execution cycle. Although this looks like
a better solution, it consumes more chip area, reducing the scope for implementing other functional
blocks on chip, which in turn reduces the speed of execution. Hence other alternatives have to be
considered. The following are some ways in which the on-chip memory can be organized.


a. As many DSP algorithms require instructions to be executed repeatedly, the instructions can be
stored in external memory and, once fetched, kept in an on-chip instruction cache.
b. The access times of on-chip memories should be small enough that they can be accessed more
than once in every execution cycle.
c. On-chip memories can be configured dynamically so that they serve different purposes at
different times.

2.6 Data Addressing Capabilities

The data accessing capability of a programmable DSP device is determined by its
addressing modes. A summary of the addressing modes used in DSPs is shown in the table below.

2.6.1 Immediate Addressing Mode


In this addressing mode, data is included in the instruction itself.

2.6.2 Register Addressing Mode


In this mode, one of the registers will be holding the data and the register has to be specified in
the instruction.

2.6.3 Direct Addressing Mode


In this addressing mode, instruction holds the memory location of the operand.

2.6.4 Indirect Addressing Mode


In this addressing mode, the operand is accessed using a pointer. A pointer is generally a
register that holds the address of the location where the operand resides. Indirect addressing
can be extended with automatic increment or decrement capabilities, which has led to the
following addressing modes.


2.7 Special Addressing Modes


For the implementation of some real time applications in DSP, normal addressing modes will
not completely serve the purpose. Thus some special addressing modes are required for such
applications.

2.7.1 Circular Addressing Mode


Circular buffers are used while processing data samples that arrive continuously in a
sequential manner. In a circular buffer the data samples are stored sequentially from the initial
location till the buffer gets filled up. Once the buffer is full, the next data samples are stored
again from the initial location. This process can go on forever as long as the data samples are
processed at a rate faster than the incoming data rate.
Circular addressing mode requires three registers, viz.
a. Pointer register to hold the current location (PNTR)
b. Start Address Register to hold the starting address of the buffer (SAR)
c. End Address Register to hold the ending address of the buffer (EAR)

There are four special cases in this addressing mode. They are


a. SAR < EAR and updated PNTR > EAR
b. SAR < EAR and updated PNTR < SAR
c. SAR > EAR and updated PNTR > SAR
d. SAR > EAR and updated PNTR < EAR
The buffer length in the first two cases is (EAR-SAR+1), whereas in the next two cases it is
(SAR-EAR+1).
The pointer updating algorithm for the circular addressing mode is as shown below.


Fig 2.12 Special Cases in Circular Addressing Mode
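
The four special cases can be sketched in a few lines of Python. This is only an illustrative model of the pointer update, not the hardware algorithm of any particular device; the function name wrap_pointer is hypothetical, and the sketch assumes the pointer overshoots the buffer by less than one buffer length.

```python
def wrap_pointer(pntr, sar, ear):
    """Bring an updated pointer back inside the circular buffer.

    sar and ear are the start and end addresses; either may be the larger one.
    Assumes the pointer is at most one buffer length outside the buffer.
    """
    low, high = min(sar, ear), max(sar, ear)
    length = high - low + 1          # (EAR-SAR+1) or (SAR-EAR+1)
    if pntr > high:                  # cases a and c: ran past the upper end
        return pntr - length
    if pntr < low:                   # cases b and d: ran past the lower end
        return pntr + length
    return pntr                      # still inside the buffer, no wrap


# Buffer from 0200h to 020Fh, pointer updated to 0212h and to 01FCh
print(hex(wrap_pointer(0x0212, 0x0200, 0x020F)))
print(hex(wrap_pointer(0x01FC, 0x0200, 0x020F)))
```

Taking the minimum and maximum of SAR and EAR lets one routine cover both the SAR < EAR and the SAR > EAR arrangements.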


2.7.2 Bit Reversed Addressing Mode


To implement FFT algorithms, the data has to be accessed in a bit reversed manner. Hence a
special addressing mode, called bit reversed addressing mode, is used to calculate the index of the
next data item to be fetched. It works as follows. Start with index 0. Each subsequent index is
calculated by adding half the FFT length to the previous index in a bit reversed manner, that is,
with the carry propagating from MSB to LSB.
Current index = Previous index +B (FFT size / 2), where +B denotes addition with reversed carry.
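
As a rough illustration, the reverse-carry addition can be modelled in Python as below. The function names reverse_carry_add and bit_reversed_indices are hypothetical, and the sketch assumes the FFT size is a power of two.

```python
def reverse_carry_add(index, increment, nbits):
    """Add two nbits-wide numbers with the carry moving from MSB towards LSB."""
    result = 0
    carry = 0
    for bit in range(nbits - 1, -1, -1):      # start at the MSB
        total = ((index >> bit) & 1) + ((increment >> bit) & 1) + carry
        result |= (total & 1) << bit
        carry = total >> 1                    # carry goes to the next lower bit
    return result


def bit_reversed_indices(fft_size):
    """Generate the access order for an FFT whose size is a power of two."""
    nbits = fft_size.bit_length() - 1
    half = fft_size // 2
    indices = [0]
    for _ in range(fft_size - 1):
        indices.append(reverse_carry_add(indices[-1], half, nbits))
    return indices


print(bit_reversed_indices(8))   # [0, 4, 2, 6, 1, 5, 3, 7]
```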

2.8 Address Generation Unit


The main job of the Address Generation Unit is to generate the addresses of the operands
required to carry out an operation. It has to work fast in order to satisfy the timing constraints. As
the address generation unit has to perform arithmetic operations in order to calculate the
operand address, it is provided with a separate ALU.
Address generation typically involves one of the following operations.
a. Getting value from immediate operand, register or a memory location
b. Incrementing/ decrementing the current address
c. Adding/subtracting the offset from the current address
d. Adding/subtracting the offset from the current address and generating new address according to
circular addressing mode
e. Generating new address using bit reversed addressing mode

The block diagram of a typical address generation unit is as shown in figure 2.13.

Fig 2.13 Address generation unit


2.9 Programmability and Program Execution


A programmable DSP device should provide the programming capability involving branching,
looping and subroutines. The implementation of repeat capability should be hardware based so that it
can be programmed with minimal or zero overhead. A dedicated register can be used as a counter. In a
normal subroutine call, return address has to be stored in a stack thus requiring memory access for
storing and retrieving the return address, which in turn reduces the speed of operation. Hence a LIFO
memory can be directly interfaced with the program counter.

2.9.1 Program Control


Like microprocessors, a DSP also requires a control unit to provide the control and timing
signals needed for proper execution of the instructions. In microprocessors, control is
microcode based: each instruction is divided into microinstructions stored in a micro memory. As
this mechanism is slower, it is not suitable for DSP applications. Hence in a DSP the control is
hardwired, with the control unit designed as a single, comprehensive hardware unit.
Although it is more complex, it is faster.

2.9.2 Program Sequencer


It is a part of the control unit used to generate instruction addresses in sequence needed to
access instructions. It calculates the address of the next instruction to be fetched. The next address can
be from one of the following sources.
a. Program Counter
b. Instruction register in case of branching, looping and subroutine calls
c. Interrupt Vector table
d. Stack which holds the return address
The block diagram of a program sequencer is as shown in figure 2.14.

Fig 2.14 Program Sequencer



The program sequencer should have the following circuitry:
a. Logic to update the PC after every fetch
b. A counter to hold the count in case of looping
c. A logic block to check conditions for conditional jump instructions
d. Condition logic and status flags

Problems:
1). Investigate the basic features that should be provided in the DSP architecture to be used to
implement the following Nth order FIR filter.

Solution:-

y(n)= ∑h(i) x(n-i) n=0,1,2…


In order to implement the above operation in a DSP, the architecture requires the
following features

i. A RAM to store the signal samples x(n)
ii. A ROM to store the filter coefficients h(n)
iii. A MAC unit to perform the multiply and accumulate operation
iv. An accumulator to store the result immediately
v. A signal pointer to point to the signal sample in the memory
vi. A coefficient pointer to point to the filter coefficient in the memory
vii. A counter to keep track of the count
viii. A shifter to shift the input samples appropriately
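
The role of these features can be seen in a small software model of the FIR computation. This is a sketch only: the function name fir_filter is hypothetical, Python lists stand in for the sample RAM and coefficient ROM, and samples before x(0) are assumed to be zero.

```python
def fir_filter(x, h):
    """Direct-form FIR: y(n) = sum over i of h(i) * x(n - i)."""
    y = []
    for n in range(len(x)):              # one output per input sample
        acc = 0                          # accumulator, cleared for each output
        for i in range(len(h)):          # counter stepping through the coefficients
            if n - i >= 0:               # samples before x(0) are taken as zero
                acc += h[i] * x[n - i]   # multiply and accumulate (MAC)
        y.append(acc)
    return y


print(fir_filter([1, 2, 3, 4], [1, 1]))   # [1, 3, 5, 7]
```

Here i plays the part of the coefficient pointer, n - i that of the signal pointer, and acc that of the accumulator; a hardware implementation would perform one MAC per cycle instead of looping in software.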

2). It is required to find the sum of 64, 16 bit numbers. How many bits should the
accumulator have so that the sum can be computed without the occurrence of
overflow error or loss of accuracy?
The sum of 64, 16 bit numbers can grow up to (16+ log2 64 )=22 bits long. Hence
the accumulator should be 22 bits long in order to avoid overflow error from occurring.

1. In the previous problem, it is decided to have an accumulator with only 16 bits
but shift the numbers before the addition to prevent overflow. By how many bits
should each number be shifted?
As the length of the accumulator is fixed, the operands have to be shifted by an
amount of log2 64 = 6 bits prior to addition operation, in order to avoid the condition of
overflow.
2. If all the numbers in the previous problem are fixed point integers, what is the
actual sum of the numbers?
The actual sum can be obtained by shifting the result left by 6 bits after the sum
has been computed. Therefore
Actual Sum = Accumulator content × 2^6
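
The effect of pre-shifting can be checked numerically. The sketch below assumes unsigned 16-bit inputs and uses a hypothetical helper named scaled_sum; it shows that the accumulator stays within 16 bits and that the recovered sum is only an approximation, because the six discarded LSBs of each number are lost.

```python
def scaled_sum(values, shift=6):
    """Add numbers after shifting each one right by `shift` bits."""
    acc = 0
    for v in values:
        acc += v >> shift        # the low `shift` bits of v are discarded
    return acc


values = [0xFFFF] * 64           # worst case: 64 maximum 16-bit values
acc = scaled_sum(values)
print(acc < 2**16)               # True: the accumulator fits in 16 bits
print(acc << 6, sum(values))     # recovered sum vs. exact sum
```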

3. If a sum of 256 products is to be computed using a pipelined MAC unit, and if the MAC
execution time of the unit is 100nsec, what will be the total time required to complete the
operation?

As N = 256 in this case, the MAC unit requires N+1 = 257 execution cycles. As a single MAC
execution takes 100 nsec, the total time required will be 257 × 100 nsec = 25.7 µsec.

4. Consider a MAC unit whose inputs are 16 bit numbers. If 256 products are to be
summed up in this MAC, how many guard bits should be provided for the
accumulator to prevent overflow condition from occurring?
As it is required to calculate the sum of 256 16-bit numbers, the sum can be as
long as (16 + log2 256) = 24 bits. Hence the accumulator should be capable of handling
these 24 bits. Thus the guard bits required will be (24-16) = 8 bits.
The block diagram of the modified MAC after considering the guard or extension bits is as shown in
the figure.
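
The same rule, log2 of the number of terms rounded up, can be written as a small helper. The function name guard_bits is hypothetical; the integer bit_length method is used instead of a floating-point logarithm so that the result is exact.

```python
def guard_bits(num_terms):
    """Extra accumulator bits needed to add num_terms values without overflow.

    Equal to ceil(log2(num_terms)), computed with integer arithmetic.
    """
    return (num_terms - 1).bit_length()


for n in (64, 256, 512):
    print(n, guard_bits(n))
```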

5. What are the memory addresses of the operands in each of the following cases of indirect
addressing modes? In each case, what will be the content of the addreg after the memory
access? Assume that the initial contents of the addreg and the offsetreg are 0200h and 0010h,
respectively.
a. ADD *addreg
b. ADD +*addreg
c. ADD offsetreg+,*addreg
d. ADD *addreg,offsetreg-

6. A DSP has a circular buffer with the start and the end addresses as 0200h and 020Fh
respectively. What would be the new values of the address pointer of the buffer if, in the course
of address computation, it gets updated to

a. 0212h
b. 01FCh
Buffer Length= (EAR-SAR+1) = 020F-0200+1=10h
a. New Address Pointer= Updated Pointer-buffer length = 0212-10=0202h
b. New Address Pointer= Updated Pointer+ buffer length = 01FC+10=020Ch

7. Repeat the previous problem for SAR = 0210h and EAR = 0201h
Buffer Length = (SAR-EAR+1) = 0210-0201+1 = 10h
a. New Address Pointer = Updated Pointer - buffer length = 0212-10 = 0202h
b. New Address Pointer = Updated Pointer + buffer length = 01FC+10 = 020Ch

9. Compute the indices for an 8-point FFT using Bit reversed Addressing Mode
Start with index 0. Therefore the first index would be (000)
Next index can be calculated by adding half the FFT length, in this case it is (100)
to the previous index. i.e. Present Index= (000)+B (100)= (100)
Similarly the next index can be calculated as
Present Index= (100)+B (100)= (010)
The process continues till all the indices are calculated, giving the sequence
000, 100, 010, 110, 001, 101, 011, 111 (that is, 0, 4, 2, 6, 1, 5, 3, 7).


Recommended Questions:

1. Explain implementation of 8- tap FIR filter, (i) pipelined using MAC units and (ii) parallel
using two MAC units. Draw block diagrams.
2. What is the role of a shifter in DSP? Explain the implementation of 4-bit shift right barrel
shifter, with a diagram.
3. Identify the addressing modes of the operands in each of the following instructions & their
operations
i)ADD B ii) ADD #1234h iii) ADD 5678h iv) ADD +*addreg
4. Draw the schematic diagram of the saturation logic and explain the same.
5. Explain how the circular addressing mode and bit reversal addressing mode are implemented in
a DSP.
6. Explain the purpose of program sequencer.
7. Give the structure of a 4X4 Braun multiplier, Explain its concept. What modification is
required to carry out multiplication of signed numbers? Comment on the speed of the
multiplier.
8. Explain guard bits in a MAC unit of DSP. Consider a MAC unit whose inputs are 24-bit
numbers. How many guard bits should be provided if 512 products have to be added in the
accumulator to prevent overflow condition? What is the overall size of the accumulator
required?
9. With a neat block diagram explain ALU of DSP system.
10. Explain i) Circular buffer addressing mode ii) Parallelism iii) Guard bits.
11. The 256 unsigned numbers, 16 bit each are to be summed up in a processor. How many guard
bits are needed to prevent overflow.
12. How will you implement an 8X8 multiplier using 4X4 multipliers as the building blocks.
13. Describe the basic features that should be provided in the DSP architecture to be used to
implement the Nth order FIR filter, where x(n) denotes the input sample, y(n) the output
sample and h(i) denotes ith filter coefficient.(Dec.09-Jan.10, 8m)
14. Explain the issues to be considered in designing and implementing a DSP system, with the help
of a neat block diagram. (May/June10 , 6m)
15. Briefly explain the major features of programmable DSPs. (May/June10, 8m)


16. Explain the operation used in DSP to increase the sampling rate. The sequence x(n)=[0,2,4,6,8]
is interpolated using interpolation sequence bk=[1/2,1,1/2] and the interpolation factor is 2. Find
the interpolated sequence y(m). (May/June10, 8m)
17. Explain with the help of mathematical equations how signed numbers can be multiplied.
(Dec.10-Jan.11, 8m)
18. The sequence x(n) = [3,2,-2,0,7] is interpolated using interpolation sequence bk=[0.5,1,0.5]
and an interpolation factor of 2. Find the interpolated sequence y(m). (Dec.10-Jan.11, 6m)
19. Why is signal sampling required? Explain the sampling process. (Dec.12, 5m)
20. Define decimation and interpolation process. Explain them using block diagrams and
equations. (Dec.12, 6m).


UNIT-3

Programmable Digital Signal Processors

3.1 Introduction:
Leading manufacturers of integrated circuits such as Texas Instruments (TI), Analog Devices and
Motorola manufacture digital signal processor (DSP) chips. These manufacturers have developed a
range of DSP chips of varied complexity.
The TMS320 family consists of two types of single-chip DSPs: 16-bit fixed-point and 32-bit
floating-point. These DSPs possess the operational flexibility of high-speed controllers and the
numerical capability of array processors.

3.2 Commercial Digital Signal-Processing Devices:


There are several families of commercial DSP devices. Right from the early eighties, when
these devices began to appear in the market, they have been used in numerous applications, such as
communication, control, computers, Instrumentation, and consumer electronics. The architectural
features and the processing power of these devices have been constantly upgraded based on the
advances in technology and the application needs. However, in their basic versions, most of them have
a Harvard architecture, a single-cycle hardware multiplier, an address generation unit with dedicated
address registers, special addressing modes, and on-chip peripheral interfaces. Of the various families of
programmable DSP devices that are commercially available, the three most popular ones are those
from Texas Instruments, Motorola, and Analog Devices. Texas Instruments was one of the first to
come out with a commercial programmable DSP with the introduction of its TMS32010 in 1982.

Summary of the Architectural Features of Three Fixed-Point DSPs


3.3. The architecture of TMS320C54xx digital signal processors:

TMS320C54xx processors retain the basic Harvard architecture of their predecessor,
TMS320C25, but have several additional features, which improve their performance over it. Figure 3.1
shows a functional block diagram of TMS320C54xx processors. They have one program memory bus and
three data memory buses, which provide simultaneous access to a program instruction
and two data operands and enable a result to be written at the same time. Part of the memory is
implemented on-chip and consists of combinations of ROM, dual-access RAM, and single-access
RAM. Transfers between the memory spaces are also possible.
The central processing unit (CPU) of TMS320C54xx processors consists of a 40-bit arithmetic
logic unit (ALU), two 40-bit accumulators, a barrel shifter, a 17x17 multiplier, a 40-bit adder, data
address generation logic (DAGEN) with its own arithmetic unit, and program address generation logic
(PAGEN). These major functional units are supported by a number of registers and logic in the
architecture. A powerful instruction set with a hardware-supported, single-instruction repeat and block
repeat operations, block memory move instructions, instructions that pack two or three simultaneous
reads, and arithmetic instructions with parallel store and load make these devices very efficient for
running high-speed DSP algorithms.
Several peripherals, such as a clock generator, a hardware timer, a wait state generator, parallel
I/O ports, and serial I/O ports, are also provided on-chip. These peripherals make it convenient to
interface the signal processors to the outside world. In the following sections, we examine in detail
the various architectural features of the TMS320C54xx family of processors.

Figure 3.1. Functional architecture for TMS320C54xx processors.


3.3.1 Bus Structure:


The performance of a processor is enhanced by providing multiple buses that give
simultaneous access to various parts of memory or peripherals. The ‘54xx architecture is built around
four pairs of 16-bit buses, each pair consisting of an address bus and a data bus. As shown in
Figure 3.1, these are:
a. The program bus pair (PAB, PB), which carries the instruction code from the program memory.
b. Three data bus pairs (CAB, CB; DAB, DB; and EAB, EB), which interconnect the various units
within the CPU. The pairs CAB, CB and DAB, DB are used to read from the data memory, while the
pair EAB, EB carries the data to be written to the memory.
The ‘54xx can generate up to two data-memory addresses per cycle using the two auxiliary register
arithmetic units (ARAU0 and ARAU1) in the DAGEN block. This enables two operands to be accessed
simultaneously.

3.3.2 Central Processing Unit (CPU):


The ‘54xx CPU is common to all the ‘54xx devices. The ’54xx CPU contains a 40-bit
arithmetic logic unit (ALU); two 40-bit accumulators (A and B); a barrel shifter; a
17 x 17-bit multiplier; a 40-bit adder; a compare, select and store unit (CSSU); an exponent
encoder(EXP); a data address generation unit (DAGEN); and a program address generation unit
(PAGEN).
The ALU performs 2’s complement arithmetic operations and bit-level Boolean operations on
16, 32, and 40-bit words. It can also function as two separate 16-bit ALUs and perform two 16-bit
operations simultaneously. Figure 3.2 shows the functional diagram of the ALU of the TMS320C54xx
family of devices.

Accumulators A and B store the output from the ALU or the multiplier/adder block and provide a
second input to the ALU. Each accumulator is divided into three parts: guard bits (bits 39-32),
high-order word (bits 31-16), and low-order word (bits 15-0), which can be stored and retrieved
individually. Each accumulator is memory-mapped and partitioned. It can be configured as the
destination register. The guard bits are used as a head margin for computations.
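
The partitioning of a 40-bit accumulator can be illustrated with a few shifts and masks. This is a software sketch only; the function name split_accumulator is hypothetical, and negative values are assumed to be held in 40-bit 2's complement form.

```python
def split_accumulator(acc):
    """Split a 40-bit accumulator value into (guard, high word, low word)."""
    acc &= (1 << 40) - 1             # keep 40 bits (2's complement for negatives)
    guard = (acc >> 32) & 0xFF       # bits 39-32
    high = (acc >> 16) & 0xFFFF      # bits 31-16
    low = acc & 0xFFFF               # bits 15-0
    return guard, high, low


print([hex(part) for part in split_accumulator(0x123456789A)])
print([hex(part) for part in split_accumulator(-1)])
```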


Figure 3.2. Functional diagram of the central processing unit of the TMS320C54xx processors.

Barrel shifter: provides the capability to scale the data during an operand read or write.
No overhead is required to implement the shift needed for the scaling operations. The ’54xx barrel
shifter can produce a left shift of 0 to 31 bits or a right shift of 0 to 16 bits on the input data. The
shift count can be specified in the instruction, in the shift count field of status register ST1, or in
the temporary register T. Figure 3.3 shows the functional diagram of the barrel shifter of
TMS320C54xx processors. The barrel shifter and the exponent encoder normalize the values in an
accumulator in a single cycle. The LSBs of the output are filled with 0s, and the MSBs can be either
zero filled or sign extended, depending on the state of the sign-extension mode bit in the status
register ST1. An additional shift capability enables the processor to perform numerical scaling, bit
extraction, extended arithmetic, and overflow prevention operations.


Figure 3.3. Functional diagram of the barrel shifter

Multiplier/adder unit: The kernel of the DSP device architecture is multiplier/adder unit. The
multiplier/adder unit of TMS320C54xx devices performs 17 x 17 2’s complement multiplication with
a 40-bit addition effectively in a single instruction cycle.
In addition to the multiplier and adder, the unit consists of control logic for integer and
fractional computations and a 16-bit temporary storage register, T. Figure 3.4 shows the functional
diagram of the multiplier/adder unit of TMS320C54xx processors.
The compare, select, and store unit (CSSU) is a hardware unit specifically incorporated to
accelerate the add/compare/select operation. This operation is essential to implement the Viterbi
algorithm used in many signal-processing applications.
The exponent encoder unit supports the EXP instruction, which stores in the T register
the number of leading redundant bits of the accumulator content. This information is useful while
shifting the accumulator content for the purpose of scaling.
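
The fractional computation mentioned above can be illustrated numerically. The sketch below assumes 16-bit fractional operands (Q15) and models the one-bit left shift that the FRCT bit of ST1 applies to the multiplier output; the function names to_q15 and q15_multiply are hypothetical.

```python
def to_q15(value):
    """Convert a fraction in [-1, 1) to a 16-bit Q15 integer."""
    return int(round(value * 32768))


def q15_multiply(a, b, frct=True):
    """Multiply two Q15 numbers.

    The raw product is in Q30 format and carries two sign bits. With frct
    set, the product is shifted left by one bit, giving a Q31 result.
    """
    product = a * b
    if frct:
        product <<= 1
    return product


p = q15_multiply(to_q15(0.5), to_q15(0.5))
print(hex(p), p / 2**31)         # 0x20000000 0.25
```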


Figure 3.4. Functional diagram of the multiplier/adder unit of TMS320C54xx processors.

3.3.3 Internal Memory and Memory-Mapped Registers:


The amount and the types of memory of a processor have direct relevance to the efficiency and
performance obtainable in implementations with the processors. The ‘54xx memory is organized into
three individually selectable spaces: program, data, and I/O spaces. All ‘54xx devices contain both
RAM and ROM. RAM can be either dual-access type (DARAM) or single-access type (SARAM). The
on-chip RAM for these processors is organized in pages having 128 word locations on each page.
The ‘54xx processors have a number of CPU registers to support operand addressing and
computations. The CPU registers and peripheral registers are all located on page 0 of the data
memory. Figure 3.5(a) and (b) show the internal CPU registers and peripheral registers with their
addresses. The processor mode status (PMST) register is used to configure the processor. It is a
memory-mapped register located at address 1Dh on page 0 of the RAM. A part of the on-chip ROM may
contain a boot loader and look-up tables for functions such as sine, cosine, μ-law, and A-law.

Figure 3.5(a) Internal memory-mapped registers of TMS320C54xx processors.


Figure 3.5(b). Peripheral registers for the TMS320C54xx processors

Status registers (ST0,ST1):


ST0: Contains the status of flags (OVA, OVB, C, TC) produced by arithmetic operations
and bit manipulations.
ST1: Contains the status of various conditions and modes. Bits of the ST0 and ST1 registers can be
set or cleared with the SSBX and RSBX instructions.
PMST: Contains memory-setup status and control information.


Figure 3.6(a). ST0 diagram

ARP: Auxiliary register pointer.
TC: Test/control flag.
C: Carry bit.
OVA: Overflow flag for accumulator A.
OVB: Overflow flag for accumulator B.
DP: Data-memory page pointer.

Figure 3.6(b). ST1 diagram


BRAF: Block repeat active flag
BRAF=0, the block repeat is deactivated.
BRAF=1, the block repeat is activated.

CPL: Compiler mode.
CPL=0: the relative direct addressing mode using the data page pointer is selected.
CPL=1: the relative direct addressing mode using the stack pointer is selected.

HM: Hold mode. Indicates whether the processor continues internal execution when it acknowledges
a hold request from the external interface.

INTM: Interrupt mode. It globally masks or enables all interrupts.
INTM=0: all unmasked interrupts are enabled.
INTM=1: all maskable interrupts are disabled.
0: Always read as 0.

OVM: Overflow mode.
OVM=1: on overflow, the destination accumulator is set to either the most positive value or the
most negative value.
OVM=0: the overflowed result is left in the destination accumulator.

SXM: Sign extension mode.


SXM=0: Sign extension is suppressed.
SXM=1: Data is sign extended.

C16: Dual 16-bit/double-precision arithmetic mode.
C16=0: ALU operates in double-precision arithmetic mode.
C16=1: ALU operates in dual 16-bit arithmetic mode.

FRCT: Fractional mode.
FRCT=1: the multiplier output is left-shifted by 1 bit to compensate for an extra sign bit.

CMPT: Compatibility mode.
CMPT=0: ARP is not updated in the indirect addressing mode.
CMPT=1: ARP is updated in the indirect addressing mode.

ASM: Accumulator shift mode.
A 5-bit field that specifies a shift value in the range -16 to 15.
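
A 5-bit field covers the range -16 to 15 when it is read as a 2's complement number. The following sketch shows that interpretation; the function name decode_asm is hypothetical.

```python
def decode_asm(field):
    """Interpret a 5-bit field as a signed (2's complement) shift count."""
    field &= 0x1F                    # keep only the 5 bits of the field
    if field & 0x10:                 # bit 4 set means a negative value
        return field - 32
    return field


for bits in (0b00000, 0b01111, 0b10000, 0b11111):
    print(format(bits, "05b"), decode_asm(bits))
```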

Processor Mode Status Register (PMST):

IPTR: Interrupt vector pointer. Points to the 128-word program page where the interrupt vectors
reside.
MP/MC: Microprocessor/Microcomputer mode.
MP/MC=0: the on-chip ROM is enabled.
MP/MC=1: the on-chip ROM is not available.

OVLY: RAM OVERLAY, OVLY enables on chip dual access data RAM blocks to be mapped into
program space.

AVIS: It enables/disables the internal program address to be visible at the address pins.
DROM: Data ROM, DROM enables on-chip ROM to be mapped into data space.
CLKOFF: CLOCKOUT off.

SMUL: Saturation on multiplication.

SST: Saturation on store.


3.4 Data Addressing Modes of TMS320C54X Processors:

Data addressing modes provide various ways to access operands to execute instructions and place
results in the memory or the registers. The 54XX devices offer seven basic addressing modes
1. Immediate addressing.
2. Absolute addressing.
3. Accumulator addressing.
4. Direct addressing.
5. Indirect addressing.
6. Memory mapped addressing
7. Stack addressing.

3.4.1 Immediate addressing:


The instruction contains the specific value of the operand. The operand can be short (3, 5, 8 or 9
bits in length) or long (16 bits in length). An instruction with a short operand occupies one
memory location, whereas an instruction with a long operand occupies two.
Example: LD #20, DP
RPT #0FFFFh

3.4.2 Absolute Addressing:


The instruction contains a specified address in the operand.
1. Dmad addressing: MVDK Smem, dmad; MVDM dmad, MMR
2. Pmad addressing: MVDP Smem, pmad; MVPD pmad, Smem
3. PA addressing: PORTR PA, Smem
4. *(lk) addressing

3.4.3 Accumulator Addressing:


Accumulator content is used as address to transfer data between Program and Data memory.
Ex: READA *AR2

3.4.4 Direct Addressing:


Base address + 7-bit value contained in the instruction = 16-bit address. A page of 128
locations can be accessed without changing DP or SP. The compiler mode bit (CPL) in the ST1
register selects the base:
CPL = 0 selects DP
CPL = 1 selects SP
It should be remembered that when SP is used instead of DP, the effective address is
computed by adding the 7-bit offset to SP.
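
The address formation can be sketched as follows. The sketch assumes that DP holds a 9-bit page number that forms the upper bits of the address, which is consistent with the 128-word pages described earlier, and that the SP-relative sum wraps at 16 bits; the function name direct_address is hypothetical.

```python
def direct_address(offset, dp=0, sp=0, cpl=0):
    """Form a 16-bit data address from a 7-bit instruction offset.

    cpl = 0: the page number in DP (assumed 9 bits) supplies the upper bits.
    cpl = 1: the offset is added to the stack pointer SP.
    """
    offset &= 0x7F                               # 7-bit field from the instruction
    if cpl == 0:
        return ((dp & 0x1FF) << 7) | offset      # page number followed by offset
    return (sp + offset) & 0xFFFF                # SP-relative, wrapped to 16 bits


print(hex(direct_address(0x05, dp=4)))             # 0x205
print(hex(direct_address(0x05, sp=0x1000, cpl=1))) # 0x1005
```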


Figure 3.7 Block diagram of the direct addressing mode for TMS320C54xx Processors.

3.4.5 Indirect Addressing:

The TMS320C54xx has eight 16-bit auxiliary registers (AR0 – AR7) and two auxiliary register
arithmetic units (ARAU0 and ARAU1). They are used to access memory locations in fixed step sizes.
The AR0 register is used for indexed and bit-reversed addressing modes.
Operand addressing:
MOD: type of indirect addressing
ARF: AR used for addressing
ARP depends on the compatibility mode (CMPT) bit in ST1:
CMPT = 0: standard mode, ARP is set to zero
CMPT = 1: compatibility mode, the particular AR is selected by ARP


Table 3.2 Indirect addressing options with a single data-memory operand.
Circular Addressing:

• Used in convolution, correlation and FIR filters.
• A circular buffer is a sliding window that contains the most recent data. A circular buffer of
size R must start on an N-bit boundary, where 2^N > R.
• Effective base address (EFB): obtained by zeroing the N LSBs of a user-selected AR (ARx).

If 0 ≤ index + step < BK: index = index + step;
else if index + step ≥ BK: index = index + step - BK;
else if index + step < 0: index = index + step + BK
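
The update rule above can be modelled in Python as shown below. This is a sketch under stated assumptions: BK is taken to be the buffer size R, N is chosen as the smallest integer with 2^N > BK, the step magnitude does not exceed BK, and the function name circular_update is hypothetical.

```python
def circular_update(arx, step, bk):
    """Return the new value of ARx after a circular update by `step`."""
    n = bk.bit_length()              # smallest N with 2**N > bk
    mask = (1 << n) - 1
    base = arx & ~mask               # effective base address: N LSBs zeroed
    index = arx & mask               # position inside the buffer
    index += step
    if index >= bk:                  # ran off the top of the buffer
        index -= bk
    elif index < 0:                  # ran off the bottom of the buffer
        index += bk
    return base | index


print(hex(circular_update(0x0206, 3, 8)))    # 0x201
print(hex(circular_update(0x0201, -2, 8)))   # 0x207
```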


Bit-Reversed Addressing:
o Used for FFT algorithms.
o AR0 specifies one half of the size of the FFT.
o The value of AR0 = 2^(N-1), where N is an integer and the FFT size = 2^N.
o AR0 + AR (selected register) = bit-reversed addressing.
o The carry bit propagates from left to right.

Dual-Operand Addressing:
Dual data-memory operand addressing is used for instructions that simultaneously
perform two reads (32-bit read) or a single read (16-bit read) and a parallel store (16-bit
store) indicated by two vertical bars, ||. These instructions access operands using indirect addressing
mode.
If, in an instruction with a parallel store, the source operand and the destination operand point
to the same location, the source is read before the destination is written. Only 2 bits are available
in the instruction code for selecting each auxiliary register in this mode. Thus, just four of the
auxiliary registers, AR2-AR5, can be used. The ARAUs, together with these registers, provide the
capability to access two operands in a single cycle. Figure 3.11 shows how an address is generated
using dual data-memory operand addressing.


3.4.6. Memory-Mapped Register Addressing:


• Used to modify the memory-mapped registers without affecting the current data page
pointer (DP) or stack pointer (SP)
o Overhead for writing to a register is minimal
o Works for direct and indirect addressing
o Scratch-pad RAM located on data page 0 can be modified
• STM #x, DIRECT
• STM #tbl, AR1

3.4.7 Stack Addressing:


• Used to automatically store the program counter during interrupts and subroutines.
• Can be used to store additional items of context or to pass data values.
• Uses a 16-bit memory-mapped register, the stack pointer (SP).
• PSHD X2
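The behaviour of the stack pointer can be sketched with a toy Python model. The class name, the starting SP value, and the pushed values are hypothetical; the model assumes that the stack grows toward lower addresses, that a push decrements SP before writing, and that a pop increments SP after reading.

```python
class Stack54x:
    """Toy model of a stack addressed through a 16-bit stack pointer (SP)."""

    def __init__(self, sp=0x0100):
        self.sp = sp          # hypothetical initial top of stack
        self.memory = {}      # sparse model of data memory

    def push(self, value):
        self.sp = (self.sp - 1) & 0xFFFF      # pre-decrement SP
        self.memory[self.sp] = value & 0xFFFF

    def pop(self):
        value = self.memory[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF      # post-increment SP
        return value


stack = Stack54x(sp=0x0100)
stack.push(0x1234)            # like PSHD: a data-memory value goes on the stack
stack.push(0x2000)            # e.g. a return address saved on a call
return_address = stack.pop()  # last in, first out
saved_value = stack.pop()
```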


3.5. Memory Space of TMS320C54xx Processors


 A total of 128k words extendable up to 8192k words.
 Total memory includes RAM, ROM, EPROM, EEPROM or Memory mapped peripherals.
 mapped
registers.


Figure 3.14 Memory map for the TMS320C5416 Processor.


3.6. Program Control


 The program control unit contains the program counter (PC), the PC-related hardware, the hardware
stack, repeat counters, and status registers.
 The PC can be loaded, and memory thereby addressed, in several ways:
 Branch: The PC is loaded with the immediate value following the branch instruction.
 Subroutine call: The PC is loaded with the immediate value following the call instruction.
 Interrupt: The PC is loaded with the address of the appropriate interrupt vector.
 Instructions such as BACC and CALA: The PC is loaded with the contents of the low word of the
accumulator.
 End of a block-repeat loop: The PC is loaded with the contents of the block-repeat start address
register.
 Return: The PC is loaded from the top of the stack.

Problems:

1. Assuming the current content of AR3 to be 200h, what will be its contents after
each of the following TMS320C54xx addressing modes is used? Assume that the
contents of AR0 are 20h.
a. *AR3+0
b. *AR3-0
c. *AR3+
d. *AR3-
e. *AR3
f. *+AR3 (40h)
g. *+AR3 (-40h)
Solution:
a. AR3 ← AR3 + AR0;
AR3 = 200h + 20h = 220h
b. AR3← AR3 - AR0;
AR3 = 200h - 20h = 1E0h
c. AR3 ← AR3 + 1;
AR3 = 200h + 1 = 201h
d. AR3 ← AR3 - 1;
AR3 = 200h - 1 = 1FFh
e. AR3 is not modified.
AR3 = 200h
f. AR3 ← AR3 + 40h;
AR3 = 200h + 40h = 240h
g. AR3 ← AR3 - 40h;
AR3 = 200h - 40h = 1C0h
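The answers above can be checked with a small Python sketch. The function name and the mode strings used as dictionary keys are hypothetical labels for this example; the arithmetic is simply 16-bit addition and subtraction.

```python
def modify_ar(ar, mode, ar0=0, offset=0):
    """Return the 16-bit auxiliary register value after one indirect access."""
    updates = {
        "*ARx+0": ar + ar0,         # post-increment by AR0
        "*ARx-0": ar - ar0,         # post-decrement by AR0
        "*ARx+": ar + 1,            # post-increment by 1
        "*ARx-": ar - 1,            # post-decrement by 1
        "*ARx": ar,                 # no modification
        "*+ARx(lk)": ar + offset,   # pre-modify by a signed offset
    }
    return updates[mode] & 0xFFFF


cases = [
    ("*ARx+0", {"ar0": 0x20}),
    ("*ARx-0", {"ar0": 0x20}),
    ("*ARx+", {}),
    ("*ARx-", {}),
    ("*ARx", {}),
    ("*+ARx(lk)", {"offset": 0x40}),
    ("*+ARx(lk)", {"offset": -0x40}),
]
results = [modify_ar(0x200, mode, **kwargs) for mode, kwargs in cases]
print([f"{value:X}h" for value in results])
# ['220h', '1E0h', '201h', '1FFh', '200h', '240h', '1C0h']
```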


2. Assuming the current contents of AR3 to be 200h, what will be its contents after
each of the following TMS320C54xx addressing modes is used? Assume that the contents of AR0 are
20h
a. *AR3 + 0B
b. *AR3 – 0B
Solution:
a. AR3 ← AR3 + AR0 with reverse carry propagation;
AR3 = 200h + 20h (with reverse carry propagation) = 220h.
b. AR3 ← AR3 - AR0 with reverse carry propagation;
AR3 = 200h - 20h (with reverse carry propagation) = 23Fh.
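Both results can be reproduced with a Python sketch. It models reverse carry arithmetic as "bit-reverse both 16-bit operands, add or subtract normally, then bit-reverse the result"; this modelling choice and the function names are assumptions of the example.

```python
def reverse_bits16(value):
    """Reverse the bit order of a 16-bit value."""
    return int(f"{value & 0xFFFF:016b}"[::-1], 2)


def reverse_carry_add16(a, b):
    """16-bit addition with the carry propagating from MSB to LSB."""
    return reverse_bits16((reverse_bits16(a) + reverse_bits16(b)) & 0xFFFF)


def reverse_carry_sub16(a, b):
    """16-bit subtraction with the borrow propagating from MSB to LSB."""
    return reverse_bits16((reverse_bits16(a) - reverse_bits16(b)) & 0xFFFF)


after_add = reverse_carry_add16(0x200, 0x20)
after_sub = reverse_carry_sub16(0x200, 0x20)
print(f"{after_add:X}h {after_sub:X}h")   # 220h 23Fh
```

In the subtraction, the borrow generated at bit 5 ripples rightward through bits 4 to 0, which is why all of the low-order bits end up set.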

Recommended Questions:
1. Compare architectural features of TMS320C25 and DSP6000 fixed point digital signal
processors. (Dec.09-Jan.10, 6m)
2. Write an explanatory note on the direct addressing mode of TMS320C54XX processors. Give an
example. (Dec.09-Jan.10, 6m)
3. Describe the operation of the following instructions of TMS320C54XX processors:
(i) MPY *AR2-, *AR4+0B (ii) MAC *AR5+, #1234h, A (iii) STH A, 1, *AR2 (iv) SSBX SXM
(Dec.09-Jan.10, 8m)
4. With a block diagram, explain the indirect addressing mode of the TMS320C54XX processor using
dual data-memory operands. (June.12, 6m)
5. What is the function of an address generation unit? Explain with the help of a block diagram.
(Dec.12, 6m)
6. Why are circular buffers required in DSP processors? How are they implemented? (Dec.12, 2m)
7. Explain the direct addressing mode of the TMS320C54XX processor with the help of a block
diagram. (Dec.12, 2m)
8. Describe the multiplier/adder unit of the TMS320C54xx processor with a neat block diagram.
(May/June2010, 6m)
9. Describe any four data addressing modes of the TMS320C54xx processor. (May/June2010, 8m)
10. Assume that the current content of AR3 is 400h. What will be its contents after each of the
following? Assume that the content of AR0 is 40h. (May/June2010, 8m)


11. Explain the PMST register. (May/June2011, 8m)
12. With an example each, explain the immediate, absolute, and direct addressing
modes. (May/June2011, 12m)
13. Explain the functioning of the barrel shifter in the TMS320C54XX processor. (June.12, 6m)
14. Explain sequential and other types of program control. (June.11, 7m)


UNIT-4

Instruction and programming

4.1 Assembly language instructions can be classified as:


 Arithmetic operations
 Load and store instructions.
 Logical operations
 Program-control operations


4.1.1 Arithmetic Instructions:


MVPD: Move Data From Program Memory to Data Memory

PORTR: Read Data from Port

PORTW: Write Data to Port


READA: Read Program Memory Addressed by Accumulator A and Store in Data Memory

WRITA: Write Data to Program Memory Addressed by Accumulator A

Branch Instructions

B[D]: Branch Unconditionally

BACC[D]: Branch to Location Specified by Accumulator


BANZ[D]: Branch on Auxiliary Register Not Zero

BC [D]: Branch Conditionally

FB [D]: Far Branch Unconditionally

FBACC [D]: Far Branch to Location Specified by Accumulator


CALA [D]: Call Subroutine at Location Specified by Accumulator

CALL[D]: Call Unconditionally

CC [D]: Call Conditionally


FCALA [D]: Far Call Subroutine at Location Specified by Accumulator


4.1.5. Interrupt Instructions:

INTR: Software Interrupt

TRAP: Software Interrupt


4.1.6. Return Instructions

FRET [D]: Far Return

FRETE [D]: Enable Interrupts and Far Return From Interrupt

RC [D]: Return Conditionally


RET [D]: Return

RETF [D]: Enable Interrupts and Fast Return From Interrupt

4.1.7. Repeat Instructions

RPT: Repeat Next Instruction

RPTB [D]: Block Repeat


RPTZ: Repeat Next Instruction and Clear Accumulator

4.1.8. Stack-Manipulating Instructions

FRAME: Stack Pointer Immediate Offset

POPD: Pop Top of Stack to Data Memory


POPM: Pop Top of Stack to Memory-Mapped Register

PSHD: Push Data-Memory Value onto Stack

PSHM: Push Memory-Mapped Register onto Stack

4.1.9. Miscellaneous Program-Control Instructions

SSBX: Set Status Register Bit

RSBX: Reset Status Register Bit


NOP: No Operation

RESET: Software Reset

4.3. On chip peripherals:


The on-chip peripherals facilitate interfacing with external devices. The peripherals are:


 General purpose I/O pins
 A software programmable wait state generator.
 Hardware timer
 Host port interface (HPI)
 Clock generator
 Serial port

4.3.1. General-Purpose I/O Pins:

The device has two general-purpose I/O pins:

 BIO: an input pin used to monitor the status of external devices.
 XF: a software-controlled output pin used to signal external devices.

4.3.2. Software programmable wait state generator:


 Extends external bus cycles by up to seven machine cycles.

4.3.3. Hardware Timer



The timer consists of 3 memory-mapped registers:


 The timer register (TIM)
 Timer period register (PRD)
 Timer control register (TCR)
• Prescaler block (PSC)
• TDDR (timer divide-down ratio)
• TINT and TOUT

The timer register (TIM) is a 16-bit memory-mapped register that decrements at every pulse from the
prescaler block (PSC).
The timer period register (PRD) is a 16-bit memory-mapped register whose contents are loaded into
the TIM whenever the TIM decrements to zero or the device is reset (SRESET).
The timer can also be independently reset using the TRB signal. The timer control register
(TCR) is a 16-bit memory-mapped register that contains status and control bits. Table shows the
functions of the various bits in the TCR.
The prescaler block is also an on-chip counter. Whenever the prescaler bits count down to 0, a
clock pulse is given to the TIM register, which decrements the TIM register by 1. The TDDR bits contain
the divide-down ratio, which is loaded into the prescaler block each time the prescaler bits count
down to 0.
That is to say, the 4-bit value of TDDR determines the divide-by ratio of the timer clock
with respect to the system clock. In other words, the TIM decrements either at the rate of the system
clock or at a slower rate, as decided by the value of the TDDR bits. TOUT and TINT are the
output signals generated as the TIM register decrements to 0. TOUT can trigger the start of the
conversion signal in an ADC interfaced to the DSP.


How often the ADC receives the TOUT signal determines its sampling frequency.
TINT is used to generate interrupts, which are required to service a peripheral such as a DRAM
controller periodically. The timer can also be stopped, restarted, reset, or disabled by specific status
bits.
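The interaction of TIM, PRD, the prescaler, and TDDR can be sketched as a simplified Python model. The function names are hypothetical, and the exact clock edge on which the hardware asserts TINT is an assumption of this sketch; it is written so that the interrupt period comes out as (TDDR + 1) x (PRD + 1) CPU clock cycles.

```python
def timer_interrupt_period(prd, tddr):
    """CPU clock cycles between timer interrupts: (TDDR + 1) * (PRD + 1)."""
    return (tddr + 1) * (prd + 1)


def simulate_timer(prd, tddr, clocks):
    """Step a simplified TIM/PSC model; return the clocks at which TINT fires."""
    tim, psc = prd, tddr               # both counters start fully loaded
    events = []
    for clock in range(1, clocks + 1):
        if psc == 0:
            psc = tddr                 # reload the prescaler from TDDR
            if tim == 0:
                tim = prd              # reload TIM from PRD
                events.append(clock)   # TINT and TOUT are generated
            else:
                tim -= 1               # one prescaler pulse decrements TIM
        else:
            psc -= 1                   # prescaler counts down every clock
    return events


events = simulate_timer(prd=2, tddr=1, clocks=18)
print(events)   # [6, 12, 18]
```

With TDDR = 0 the TIM decrements at the system clock rate; larger TDDR values slow it down, which is how the timer period, and hence an ADC sampling rate derived from TOUT, can be programmed.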


4.3.4. Host port interface (HPI):

• Allows the DSP to interface to 8-bit or 16-bit host devices or a host processor.


• Signals in the HPI are:
• Host interrupt (HINT)
• HRDY
• HCNTL0 and HCNTL1
• HBIL
• HR/W


Important signals in the HPI are as follows:


• The 16-bit data bus and the 18-bit address bus.
• The host interrupt, HINT, for the DSP to signal the host when its attention is required.
• HRDY, a DSP output indicating that the DSP is ready for transfer.
• HCNTL0 and HCNTL1, control signals that indicate the type of transfer to carry out. The
transfer types are data, address, etc.
• HBIL. If this is low, it indicates that the current byte is the first byte; if it is high, it
indicates that it is the second byte.
• HR/W indicates whether the host is carrying out a read operation or a write operation.

4.3.5. Clock Generator:


The clock generator on TMS320C54xx devices has two options: an external clock
and an internal clock. In the case of the external clock option, a clock source is directly connected to
the device. The internal clock source option, on the other hand, uses an internal clock generator and a
phase-locked loop (PLL) circuit. The PLL, in turn, can be hardware configured or software
programmed. Not all devices of the TMS320C54xx family have all these clock options; they vary
from device to device.

4.3.6. Serial I/O Ports:


Three types of serial ports are available:
• Synchronous ports.
• Buffered ports.
• Time-division multiplexed ports.


The synchronous serial ports are high-speed, full-duplex ports that provide direct
communication with serial devices such as codecs and analog-to-digital (A/D) converters. A buffered
serial port (BSP) is a synchronous serial port that is provided with
an autobuffering unit and is clocked at the full clock rate. The autobuffering unit reduces the
overhead of servicing interrupts. A time-division multiplexed (TDM) serial port is a synchronous
serial port that is provided to allow time-division multiplexing of the data. The functioning of each
of these on-chip peripherals is controlled by
memory-mapped registers assigned to the respective peripheral.

4.4. Interrupts of TMS320C54xx Processors:


Often, when the CPU is in the midst of executing a program, a peripheral device may require
service from the CPU. In such a situation, the main program may be interrupted by a signal
generated by the peripheral device. This results in the processor suspending the main program in
order to execute another program, called the interrupt service routine, to service the peripheral device. On
completion of the interrupt service routine, the processor returns to the main program to continue from
where it left off.
An interrupt may be generated by either an internal or an external device. It may also be generated by
software. Not all interrupts are serviced when they occur. Only those interrupts that are called
nonmaskable are serviced whenever they occur. Other interrupts, which are called maskable interrupts,
are serviced only if they are enabled. There is also a priority scheme to determine which interrupt gets
serviced first if more than one interrupt occurs simultaneously.
Almost all the devices of the TMS320C54xx family have 32 interrupts. However, the
types and the number under each type vary from device to device. Some of these interrupts are
reserved for use by the CPU.
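The selection rule described above (nonmaskable interrupts always qualify, maskable ones only when enabled, highest priority first) can be sketched in Python. The function name, the interrupt names, and the priority numbers are illustrative assumptions, not the actual TMS320C54xx interrupt table; a smaller number is taken to mean a higher priority.

```python
def next_interrupt(pending, enabled):
    """Pick the interrupt to service from the pending requests.

    pending: list of (name, priority, maskable) tuples; a smaller priority
             number means a higher priority (an assumption of this sketch).
    enabled: set of names of the maskable interrupts currently enabled.
    """
    serviceable = [
        request for request in pending
        if not request[2] or request[0] in enabled   # nonmaskable always qualifies
    ]
    if not serviceable:
        return None
    return min(serviceable, key=lambda request: request[1])[0]


pending = [("TINT", 6, True), ("NMI", 2, False), ("INT0", 4, True)]
first = next_interrupt(pending, enabled={"TINT"})            # nonmaskable wins
second = next_interrupt(pending[::2], enabled={"TINT"})      # INT0 is masked
none_left = next_interrupt([("INT0", 4, True)], enabled=set())
```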

4.5. Pipeline operation of TMS320C54xx Processors:


The CPU of ‘54xx devices has a six-level-deep instruction pipeline. The six stages of the
pipeline are independent of each other. This allows overlapping execution of instructions. During any
given cycle, up to six different instructions can be active, each at a different stage of processing. The
six levels of the pipeline structure are program prefetch, program fetch, decode, access, read, and
execute.
1. During program prefetch, the program address bus, PAB, is loaded with the address of the next
instruction to be fetched.
2. In the fetch phase, an instruction word is fetched from the program bus, PB, and loaded into the
instruction register, IR. These two phases form the instruction fetch sequence.
3. During the decode stage, the contents of the instruction register, IR, are decoded to determine the
type of memory access operation and the control signals required for the data-address generation unit
and the CPU.
4. The access phase outputs the read operand's address on the data address bus, DAB. If a second operand is
required, the other data address bus, CAB, is also loaded with an appropriate address. Auxiliary
registers in indirect addressing mode and the stack pointer (SP) are also updated.


5. In the read phase, the data operand(s), if any, are read from the data buses, DB and CB. This phase
completes the two-phase read process and starts the two-phase write process. The data address of the
write operand, if any, is loaded into the data write address bus, EAB.
6. The execute phase writes the data using the data write bus, EB, and completes the operand write
sequence. The instruction is executed in this phase.
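The overlap of the six stages can be visualised with a short Python sketch that builds a stage occupancy table. It assumes an ideal pipeline with no stalls or conflicts and one new instruction entering per cycle; the function name and table layout are hypothetical.

```python
STAGES = ["prefetch", "fetch", "decode", "access", "read", "execute"]


def pipeline_table(num_instructions):
    """Return {cycle: {stage: instruction index}} for an ideal pipeline.

    Assumption: no stalls, one new instruction enters prefetch every cycle.
    """
    table = {}
    for instr in range(num_instructions):
        for depth, stage in enumerate(STAGES):
            # Instruction `instr` reaches stage number `depth` at cycle instr + depth.
            table.setdefault(instr + depth, {})[stage] = instr
    return table


table = pipeline_table(8)
total_cycles = len(table)            # 8 instructions need 8 + 5 = 13 cycles
active_in_cycle_5 = len(table[5])    # six instructions overlap once the pipe is full

for cycle in sorted(table):
    row = ", ".join(f"{stage}=I{table[cycle][stage]}"
                    for stage in STAGES if stage in table[cycle])
    print(f"cycle {cycle:2d}: {row}")
```

Once the pipeline is full, one instruction completes every cycle even though each individual instruction still takes six cycles from prefetch to execute.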


Recommended Questions:

1. Describe the Host Port Interface and explain its signals. (Dec 09/Jan 10, 6 marks)
2. Write an assembly language program for TMS320C54XX processors to compute the sum of
three product terms given by the equation y(n) = h(0)x(n) + h(1)x(n-1) + h(2)x(n-2) with usual
notations. Find y(n) for signed 16-bit data samples and 16-bit constants. (May/June 2011,
6m)
3. Describe the pipelining operation of TMS320C54XX processors. (Dec.11, 8m)
4. Explain the operation of the serial I/O ports and hardware timer of the TMS320C54XX on-chip
peripherals. (Dec.11, 8m)
5. Explain the different types of interrupts in TMS320C54xx processors. (May/June 2009, 6m)
6. Describe the operation of the following instructions of the TMS320C54xx processor, with examples.
7. Describe the operation of the hardware timer with a neat diagram.
8. By means of a figure, explain the pipeline operation of the following sequence of instructions if
the initial values of AR1, AR3, and A are 104, 101, and 2 and the values stored in memory locations
101, 102, 103, and 104 are 4, 6, 8, and 12. Also provide the values of registers AR3, AR1, T, and A.
9. Describe the operation of the following instructions of TMS320C54XX processors. (July 12,
8m)
10. Explain the following assembler directives of TMS320C54XX processors: (i) .mmregs (ii)
.global (iii) .include ‘xx’ (iv) .data (v) .end (vi) .bss (Dec 09/Jan 10, 6 marks)
