Ece-Vii-dsp Algorithms & Architecture (10ec751) - Notes
10EC751
University Syllabus
DSP Algorithms and Architecture
Subject Code: 10EC751
IA Marks: 25
Exam Hours: 03
Exam Marks: 100
PART - A
UNIT - 1
INTRODUCTION TO DIGITAL SIGNAL PROCESSING: Introduction, A Digital Signal-Processing System, The Sampling Process, Discrete Time Sequences, Discrete Fourier Transform
(DFT) and Fast Fourier Transform (FFT), Linear Time-Invariant Systems, Digital Filters, Decimation
and Interpolation.
5 Hours
UNIT - 2
ARCHITECTURES FOR PROGRAMMABLE DIGITAL SIGNAL-PROCESSORS:
Introduction, Basic Architectural Features, DSP Computational Building Blocks, Bus Architecture and
Memory, Data Addressing Capabilities, Address Generation Unit, Programmability and Program
Execution, Features for External Interfacing.
8 Hours
UNIT - 3
PROGRAMMABLE DIGITAL SIGNAL PROCESSORS: Introduction, Commercial Digital
Signal-Processing Devices, Data Addressing Modes of TMS320C54xx, Memory Space of
TMS320C54xx Processors, Program Control.
6 Hours
UNIT - 4
Detailed Study of TMS320C54X & 54xx Instructions and Programming, On-Chip Peripherals, Interrupts
of TMS320C54xx Processors, Pipeline Operation of TMS320C54xx Processor.
6 Hours
PART - B
UNIT - 5
IMPLEMENTATION OF BASIC DSP ALGORITHMS: Introduction, The Q-notation, FIR Filters,
IIR Filters, Interpolation and Decimation Filters (one example in each case).
6 Hours
Dept.ECE, SJBIT
UNIT - 6
IMPLEMENTATION OF FFT ALGORITHMS: Introduction, An FFT Algorithm for DFT
Computation, Overflow and Scaling, Bit-Reversed Index Generation & Implementation on the
TMS320C54xx.
6 Hours
UNIT - 7
INTERFACING MEMORY AND PARALLEL I/O PERIPHERALS TO DSP DEVICES:
Introduction, Memory Space Organization, External Bus Interfacing Signals, Memory Interface,
Parallel I/O Interface, Programmed I/O, Interrupts and I/O, Direct Memory Access (DMA).
8 Hours
UNIT - 8
INTERFACING AND APPLICATIONS OF DSP PROCESSOR: Introduction, Synchronous
Serial Interface, A CODEC Interface Circuit. DSP Based Bio-telemetry Receiver, A Speech
Processing System, An Image Processing System.
6 Hours
TEXT BOOK:
1. Digital Signal Processing, Avatar Singh and S. Srinivasan, Thomson Learning, 2004.
REFERENCE BOOKS:
1. Digital Signal Processing: A Practical Approach, Ifeachor E. C., Jervis B. W., Pearson Education, PHI, 2002.
2. Digital Signal Processors, B. Venkataramani and M. Bhaskar, TMH, 2002.
3. Architectures for Digital Signal Processing, Peter Pirsch, John Wiley, 2007.
INDEX SHEET
FIR Filters
IIR Filters,
Interpolation Filters
Decimation Filters
UNIT-6 : IMPLEMENTATION OF FFT
ALGORITHMS
Introduction, An FFT Algorithm for DFT Computation
Overflow and Scaling
Bit-Reversed Index Generation
Routine for bit reversed index
Implementation on the TMS320C54xx - 1
Implementation on the TMS320C54xx - 2
UNIT-7 : INTERFACING MEMORY AND
PARALLEL I/O PERIPHERALS TO DSP DEVICES
Introduction, Memory Space Organization,
External Bus Interfacing Signals
Timing Diagram of interfacing
Memory Interface
Problems on memory interface
Parallel I/O Interface
Programmed I/O
Interrupts and I / O Direct Memory Access (DMA).
UNIT-8 : INTERFACING AND APPLICATIONS
OF DSP PROCESSOR
Introduction, Synchronous Serial Interface
Block diagram of CODEC
A CODEC Interface Circuit
ADC interface
DSP Based Bio-telemetry Receiver
A Speech Processing System
An Image Processing System
UNIT-1
Introduction to Digital Signal Processing
Syllabus: INTRODUCTION TO DIGITAL SIGNAL PROCESSING: Introduction, A Digital Signal-Processing System, The Sampling Process, Discrete Time Sequences, Discrete Fourier Transform
(DFT) and Fast Fourier Transform (FFT), Linear Time-Invariant Systems, Digital Filters, Decimation
and Interpolation.
5 Hours
From the above expression it is clear that the DFT can be used to find the frequency response of a
discrete signal. The spacing between adjacent elements of X(k) is $\Delta f = f_s/N = 1/(NT) = 1/T_0$, where $T_0$ is
the signal record length.
It follows from this expression that N must be large in order to obtain a small spacing between the
frequency samples. Although the DFT gives the frequency response of a sequence directly, it requires a
large number of complex additions and multiplications.
Many improvements over the direct DFT were therefore proposed. One such technique exploits the
periodicity of the twiddle factor $e^{-j2\pi/N}$. The resulting algorithms are called Fast Fourier
Transform (FFT) algorithms. The following table depicts the complexity involved in the computation using
DFT algorithms.
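As a rough illustration of the frequency spacing and of the saving offered by the FFT, the sketch below tabulates the commonly quoted complex operation counts for a direct N-point DFT (N^2 multiplications, N(N-1) additions) and for a radix-2 FFT ((N/2) log2 N multiplications, N log2 N additions). The function names are illustrative only, and N is assumed to be a power of two.

```python
import math


def dft_operation_counts(n):
    """Complex operation counts for an n-point transform (n assumed a power of two)."""
    stages = int(math.log2(n))
    return {
        "direct_mults": n * n,            # direct DFT: N^2 complex multiplications
        "direct_adds": n * (n - 1),       # direct DFT: N(N-1) complex additions
        "fft_mults": (n // 2) * stages,   # radix-2 FFT: (N/2) log2 N multiplications
        "fft_adds": n * stages,           # radix-2 FFT: N log2 N additions
    }


def frequency_spacing(fs, n):
    """Spacing between adjacent X(k) elements: delta_f = fs / N."""
    return fs / n


counts = dft_operation_counts(512)
spacing = frequency_spacing(8000.0, 512)   # 8 kHz sampling, 512-point DFT
print(counts, spacing)
```

For a 512-point transform the direct DFT needs 262144 complex multiplications against 2304 for the radix-2 FFT, and an 8 kHz sampling rate gives a spacing of 15.625 Hz.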
An LTI system is characterized by its impulse response (unit sample response) in the time domain and
by its system function in the frequency domain.
1.7.1 Convolution
Convolution is the operation that relates the input and the output of an LTI system through its unit sample
response. For an input x(n) and an impulse response h(n), the output y(n) of the system is given by
$y(n) = x(n) * h(n) = \sum_{k=-\infty}^{\infty} x(k)\,h(n-k)$
where x(n) is the input of the system, h(n) is the impulse response of the system and y(n) is the
output of the system.
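The convolution sum can be sketched for finite-length sequences as follows. This is a minimal illustration, assuming both sequences start at n = 0 and are zero elsewhere; the function name and the sample values are not from the notes.

```python
def convolve(x, h):
    """Linear convolution y(n) = sum over k of x(k) h(n-k) for finite sequences."""
    y = [0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):      # h(n-k) is zero outside its support
                y[n] += x[k] * h[n - k]
    return y


y = convolve([1, 2, 3], [1, 1])
print(y)
```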
1.7.2 Z Transformation
The Z transform is used to find the frequency response of a system. The Z transform of
a discrete sequence x(n) is given by
$X(Z) = \sum_{n=-\infty}^{\infty} x(n)\,Z^{-n}$
1.7.3 The System Function
An LTI system is characterized by its system function, or transfer function. The system
function of a system is the ratio of the Z transform of its output to that of its input. It is denoted as
H(Z) and is given by H(Z) = Y(Z)/X(Z).
The magnitude and phase of the transfer function H(Z) give the frequency response of the
system. From the transfer function we can also obtain the zeros and poles of the system by finding the roots of its
numerator and denominator respectively.
1.8 Digital Filters
Filters are used to remove unwanted components from a sequence. They are characterized
by the impulse response h(n). The general difference equation for an Nth order filter is given by
$y(n) = \sum_{k=1}^{N} a_k\,y(n-k) + \sum_{k=0}^{N} b_k\,x(n-k)$
A typical digital filter structure is as shown in figure 1.7.
The major drawback of FIR filters is that they require a larger number of filter coefficients than IIR filters to realize a
desired response. The computation time required is therefore also larger.
1.8.2 IIR Filters
Unlike FIR filters, IIR filters have an infinite number of impulse response samples. They are
recursive filters, as the output depends not only on the past and present inputs but also on the past
outputs. They generally do not have linear phase characteristics. A typical system function of such
filters is given by
$H(Z) = \dfrac{\sum_{k=0}^{N} b_k\,Z^{-k}}{1 - \sum_{k=1}^{N} a_k\,Z^{-k}}$
The stability of IIR filters depends on the number and the values of the filter coefficients. The major
advantage of IIR filters over FIR filters is that they require fewer coefficients for the
same desired response, thus requiring less computation time.
1.8.3 FIR Filter Design
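The frequency response of an FIR filter with coefficients $b_k$ is given by the following expression, where the sum runs over the filter taps:
$H(e^{j\omega}) = \sum_{k} b_k\,e^{-j\omega k}$
The design procedure of an FIR filter involves the determination of the filter coefficients $b_k$.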
Direct IIR filter design methods are based on least squares fit to a desired frequency response. These
methods allow arbitrary frequency response specifications.
1.9 Decimation and Interpolation
Decimation and interpolation are two techniques used to alter the sampling rate of a sequence.
Decimation decreases the sampling rate without violating the sampling theorem, whereas
interpolation increases the sampling rate of a sequence by inserting new samples computed from its neighboring
samples.
1.9.1 Decimation
Decimation is the process of dropping samples without violating the sampling theorem. The
factor by which the signal is decimated is called the decimation factor and is denoted by M. It is
given by
M = (input sampling rate) / (output sampling rate)
1.9.2 Interpolation
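Interpolation is the process of increasing the sampling rate by inserting new samples between the existing ones.
The input-output relation for interpolation, where the sampling rate is increased by a factor L, is
given as
$y(m) = \sum_{k} b_k\,w(m-k)$
where w(m) = x(m/L) for m = 0, ±L, ±2L, ... and w(m) = 0 otherwise, and $b_k$ are the coefficients of the interpolation filter.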
Problems:
1. Obtain the transfer function of the IIR filter whose difference equation is given by
y(n) = 0.9 y(n-1) + 0.1 x(n)
Taking the Z transform of both sides,
$Y(Z) = 0.9\,Z^{-1}\,Y(Z) + 0.1\,X(Z)$
$Y(Z)\,[1 - 0.9\,Z^{-1}] = 0.1\,X(Z)$
The transfer function of the system is given by the expression
$H(Z) = \dfrac{Y(Z)}{X(Z)} = \dfrac{0.1}{1 - 0.9\,Z^{-1}}$
Realization of the IIR filter with the above difference equation is as shown in figure.
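The recursion in this problem can be checked numerically. The sketch below runs the difference equation y(n) = 0.9 y(n-1) + 0.1 x(n) on a unit impulse, assuming zero initial conditions; the resulting impulse response should follow h(n) = 0.1 (0.9)^n, which is the inverse Z transform of the transfer function obtained above. The function name is illustrative.

```python
def iir_first_order(x, a1=0.9, b0=0.1):
    """Run y(n) = a1*y(n-1) + b0*x(n) with y(-1) = 0."""
    y = []
    previous = 0.0
    for sample in x:
        previous = a1 * previous + b0 * sample   # recursive (feedback) term
        y.append(previous)
    return y


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
h = iir_first_order(impulse)
print(h)
```

The response never becomes exactly zero, which is why the filter is called an infinite impulse response filter.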
2. Let x(n) = [0 3 6 9 12] be interpolated with L = 3. If the coefficients of the
filter are $b_k$ = [1/3 2/3 1 2/3 1/3], obtain the interpolated sequence.
After inserting zeros,
w(m) = [0 0 0 3 0 0 6 0 0 9 0 0 12]
$b_k$ = [1/3 2/3 1 2/3 1/3], k = -2, ..., 2
We have,
$y(m) = \sum_{k=-2}^{2} b_k\,w(m-k) = b_{-2}\,w(m+2) + b_{-1}\,w(m+1) + b_0\,w(m) + b_1\,w(m-1) + b_2\,w(m-2)$
Substituting the values of m, we get
$y(0) = b_{-2}\,w(2) + b_{-1}\,w(1) + b_0\,w(0) + b_1\,w(-1) + b_2\,w(-2) = 0$
$y(1) = b_{-2}\,w(3) + b_{-1}\,w(2) + b_0\,w(1) + b_1\,w(0) + b_2\,w(-1) = 1$
$y(2) = b_{-2}\,w(4) + b_{-1}\,w(3) + b_0\,w(2) + b_1\,w(1) + b_2\,w(0) = 2$
Similarly we get the remaining samples as,
y(m) = [0 1 2 3 4 5 6 7 8 9 10 11 12]
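The same calculation can be reproduced in a few lines. This is a sketch under the assumptions used in the problem: L-1 zeros are inserted between consecutive input samples, the filter coefficients are centred on k = 0, and samples outside the sequence are taken as zero. Exact fractions are used so that the result can be compared without rounding; the function name is illustrative.

```python
from fractions import Fraction


def interpolate(x, factor, b):
    """Zero-insert x by `factor`, then filter with coefficients b centred on k = 0."""
    w = []
    for i, sample in enumerate(x):
        w.append(sample)
        if i < len(x) - 1:
            w.extend([0] * (factor - 1))     # L-1 zeros between input samples
    half = len(b) // 2                       # index of b_0 inside the list b
    y = []
    for m in range(len(w)):
        acc = 0
        for j, coeff in enumerate(b):
            idx = m - (j - half)             # w(m-k) with k = j - half
            if 0 <= idx < len(w):
                acc += coeff * w[idx]
        y.append(acc)
    return w, y


b = [Fraction(1, 3), Fraction(2, 3), Fraction(1), Fraction(2, 3), Fraction(1, 3)]
w, y = interpolate([0, 3, 6, 9, 12], 3, b)
print(w)
print([int(v) for v in y])
```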
Recommended Questions
1. Explain with the help of mathematical equations how signed numbers can be
multiplied. The sequence x(n) = [3, 2, -2, 0, 7] is interpolated using the interpolation
sequence bk = [0.5, 1, 0.5] and an interpolation factor of 2. Find the interpolated
sequence y(m).
2. An analog signal is sampled at the rate of 8 kHz. If 512 samples of this signal are used
to compute the DFT X(k), determine the analog and digital frequency spacing between
adjacent X(k) elements. Also, determine the analog and digital frequencies corresponding
to k = 60.
3. With a neat diagram explain the scheme of the DSP system.
4. What is DSP? What are the important issues to be considered in designing and
implementing a DSP system? Explain in detail.
5. Why is signal sampling required? Explain the sampling process.
6. Define the decimation and interpolation processes. Explain them using block diagrams and
equations.
7. With an example explain the need for the low pass filter in decimation process.
8. For the FIR filter y(n)=(x(n)+x(n-1)+x(n-2))/3. Determine i) System Function ii)
Magnitude and phase function iii) Step response iv) Group Delay.
9. List the major architectural features used in DSP system to achieve high speed program
execution.
10. Explain how to simulate the impulse responses of FIR and IIR filters.
11. Explain the two methods of sampling rate conversion used in a DSP system, with suitable
block diagrams and examples. Draw the corresponding spectrum.
12. Assuming X(k) as a complex sequence, determine the number of complex real
multiplies for computing the IDFT using direct and radix-2 FFT algorithms.
13. With a neat diagram explain the scheme of a DSP system. (June.12, 8m)
14. With an example explain the need for the low pass filter in decimation process.
(June.12, 4m)
15. For the FIR filter y(n)=(x(n)+x(n-1)+x(n-2))/3. Determine i) System Function ii)
Magnitude and phase function iii) Step response iv) Group Delay. (June.12, 8m)
16. List the major architectural features used in DSP system to achieve high speed program
execution. (Dec.11, 6m).
17. Explain how to simulate the impulse responses of FIR and IIR filters. (Dec.11, 6m).
18. Explain the two methods of sampling rate conversion used in a DSP system, with suitable
block diagrams and examples. Draw the corresponding spectrum. (Dec.11, 8m).
19. Explain with the help of mathematical equations how signed numbers can be
multiplied. (July.11, 8m).
20. With a neat diagram explain the scheme of the DSP system. (Dec.10-Jan.11, 8m)
(July.11, 8m).
UNIT-2
Architectures for Programmable Digital Signal Processing
Devices
multipliers take a single processor cycle to fetch and execute the instruction and to store the result.
They are also called array multipliers. The key features to be considered for a multiplier are:
a. Accuracy
b. Dynamic range
c. Speed
The number of bits used to represent the operands decides the accuracy and the dynamic range
of the multiplier, whereas the speed is decided by the architecture employed. If the multiplier is
implemented in hardware, the speed of execution will be very high, but the circuit complexity
also increases considerably. Thus there is a tradeoff between the speed of execution and the
circuit complexity, and the choice of architecture normally depends on the application.
2.2.2 Parallel Multipliers
Consider the multiplication of two unsigned numbers A and B. Let A be represented using m
bits as ($A_{m-1} A_{m-2} \ldots A_1 A_0$) and B be represented using n bits as ($B_{n-1} B_{n-2} \ldots B_1 B_0$).
Then the product of these two numbers is given by
$P = A \cdot B = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} A_i\,B_j\,2^{i+j}$
This operation can be implemented in parallel using a Braun multiplier, whose hardware structure is as
shown in the figure 2.1.
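The partial-product view behind an array multiplier can be sketched in software. The code below is only a behavioural model, not a description of the hardware in figure 2.1: it forms every single-bit product A_i AND B_j, weights it by 2^(i+j) and sums the results, which is what the array of AND gates and adders computes in parallel. The function name is illustrative.

```python
def braun_multiply(a, b, m, n):
    """Unsigned m-bit by n-bit multiply as a sum of single-bit partial products."""
    product = 0
    for i in range(m):
        a_i = (a >> i) & 1
        for j in range(n):
            b_j = (b >> j) & 1
            product += (a_i & b_j) << (i + j)   # AND gate output weighted by 2^(i+j)
    return product


result = braun_multiply(13, 11, 4, 4)
print(result)
```

A 4-bit by 4-bit product can need up to 8 bits, in line with the m + n bit width of the result.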
2.2.4 Speed
The conventional shift-and-add technique of multiplication requires n cycles to perform the
multiplication of two n-bit numbers, whereas in parallel multipliers the time required is the
longest path delay in the combinational circuit used. As DSP applications generally require very high
speed, it is desirable to have multipliers operating at the highest possible speed by using a parallel
implementation.
2.2.5 Bus Widths
Consider the multiplication of two n-bit numbers X and Y. The product Z can be at most 2n
bits long. In order to perform the whole operation in a single execution cycle, we require two buses of
width n bits each to fetch the operands X and Y, and a bus of width 2n bits to store the result Z to the
memory. Although this performs the operation fastest, it is expensive and therefore not an efficient
implementation. Many alternatives have been proposed. One such method is to use
the program bus itself to fetch one of the operands after fetching the instruction, thus requiring only
one bus to fetch the operands. The result Z can then be stored back to the memory using the same
operand bus. The problem is that the result Z is 2n bits long whereas the operand bus is only n
bits wide. There are two alternatives to solve this problem:
a. Use the n-bit operand bus and save Z at two successive memory locations. Although this stores
the exact value of Z in the memory, it takes two cycles to store the result.
b. Discard the lower n bits of the result Z and store only the higher-order n bits into the memory. This is
not suitable for applications where an accurate result is required.
Another alternative can be used for applications where speed is not a major concern: latches are used
for the inputs and outputs, so that a single bus suffices to fetch the operands and to store the result (Fig 2.2).
a. While adding N numbers, each n bits long, the sum can grow up to n + log2 N
bits. If the accumulator is only n bits long, an overflow error will occur. This can be overcome
by using a shifter to scale down each operand by log2 N bits.
b. Similarly, while calculating the product of two n-bit numbers, the product can grow up to 2n bits.
Generally the lower n bits are neglected and the sign bit is shifted to preserve the sign of the product.
c. Finally, when adding two floating-point numbers, one of the operands has to be shifted
appropriately to make the exponents of the two numbers equal.
From the above cases it is clear that a shifter is required in the architecture of a DSP.
2.2.7 Barrel Shifters
In conventional microprocessors, normal shift registers are used for shift operations. As a shift
register requires one clock cycle for each bit shifted, it is not desirable for DSP applications, which generally
involve many shifts. Since speed is the crucial issue in DSP applications, several shifts
have to be accomplished in a single execution cycle. This can be done using a barrel shifter,
which connects the input lines representing a word to a group of output lines with the required shift
determined by its control inputs. For an input of length n, log2 n control lines are required, and an
additional control line is required to indicate the direction of the shift.
The block diagram of a typical barrel shifter is as shown in figure 2.3.
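One common way to picture a barrel shifter is as log2 n cascaded stages, where stage s shifts by 2^s positions when its control line is set. The sketch below models that behaviour for a logical shift; it is an assumption-laden illustration (word length a power of two, zero fill, illustrative names) and not the circuit of figure 2.3.

```python
def barrel_shift(word, amount, left, n=16):
    """Logical shift of an n-bit word built from log2(n) conditional stages."""
    mask = (1 << n) - 1
    stages = n.bit_length() - 1          # log2(n) control lines for the shift amount
    result = word & mask
    for s in range(stages):
        if (amount >> s) & 1:            # control line s selects a shift by 2^s
            step = 1 << s
            result = (result << step) & mask if left else result >> step
    return result


print(hex(barrel_shift(0x0001, 5, True)))
print(hex(barrel_shift(0x8000, 15, False)))
```

In hardware all stages are combinational, so any shift from 0 to n-1 positions completes within one cycle; the loop here only mirrors the stage structure.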
Saturation Logic
Overflow or underflow occurs if the result goes beyond the most positive number or below
the most negative number the accumulator can handle. The overflow/underflow error can be
resolved by loading the accumulator with the most positive number it can handle at the time of
overflow, and with the most negative number it can handle at the time of underflow. This method is
called saturation logic. A schematic diagram of saturation logic is as shown in figure 2.7. In
saturation logic, as soon as an overflow or underflow condition is detected, the accumulator is
loaded with the most positive or most negative number, overriding the result computed by the MAC
unit.
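The effect of saturation logic on a two's complement accumulator can be sketched as below. The 16-bit width and the function name are assumptions chosen for illustration; the notes do not fix an accumulator width at this point.

```python
def saturating_add(a, b, bits=16):
    """Add two signed values and clamp the result to the two's complement range."""
    most_positive = (1 << (bits - 1)) - 1      # 32767 for 16 bits
    most_negative = -(1 << (bits - 1))         # -32768 for 16 bits
    total = a + b
    if total > most_positive:                  # overflow: load most positive value
        return most_positive
    if total < most_negative:                  # underflow: load most negative value
        return most_negative
    return total


print(saturating_add(30000, 10000))
print(saturating_add(-30000, -10000))
```

Without saturation, a wrapped 16-bit result of 30000 + 10000 would appear as a large negative number, which is usually far more damaging to a signal than clipping.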
a. As many DSP algorithms require instructions to be executed repeatedly, an instruction
stored in the external memory can, once it is fetched, reside in the instruction cache.
b. The access time of the on-chip memories should be sufficiently small that they can be accessed more
than once in every execution cycle.
c. On-chip memories can be configured dynamically so that they can serve different purposes at
different times.
2.6 Data Addressing Capabilities
Data accessing capability of a programmable DSP device is configured by means of its
addressing modes. The summary of the addressing modes used in DSP is as shown in the table below.
Problems:
1). Investigate the basic features that should be provided in the DSP architecture to be used to
implement the following Nth order FIR filter:
$y(n) = \sum_{i=0}^{N-1} h(i)\,x(n-i)$, n = 0, 1, 2, ...
Solution:
In order to implement the above operation in a DSP, the architecture requires the
following features
i. A RAM to store the signal samples x (n)
ii. A ROM to store the filter coefficients h (n)
iii. An MAC unit to perform Multiply and Accumulate operation
iv. An accumulator to store the result immediately
v. A signal pointer to point the signal sample in the memory
vi. A coefficient pointer to point the filter coefficient in the memory
vii. A counter to keep track of the count
viii. A shifter to shift the input samples appropriately
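The way these features cooperate can be sketched as a software multiply-accumulate loop. In this illustration the list `delay_line` plays the role of the RAM holding recent samples, `h` the coefficient ROM, `acc` the accumulator and the loop index the two pointers and the counter; all names are illustrative and samples before n = 0 are assumed to be zero.

```python
def fir_filter(x, h):
    """Compute y(n) = sum over i of h(i) x(n-i) with one MAC per tap."""
    n_taps = len(h)
    delay_line = [0] * n_taps                # RAM holding x(n), x(n-1), ...
    out = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]   # shift in the newest sample
        acc = 0                                   # accumulator cleared per output
        for i in range(n_taps):                   # counter / pointers step together
            acc += h[i] * delay_line[i]           # multiply and accumulate
        out.append(acc)
    return out


y = fir_filter([1, 2, 3, 4], [1, 1, 1])
print(y)
```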
2). It is required to find the sum of 64 16-bit numbers. How many bits should the
accumulator have so that the sum can be computed without the occurrence of
overflow error or loss of accuracy?
The sum of 64 16-bit numbers can grow up to (16 + log2 64) = 22 bits. Hence
the accumulator should be 22 bits long in order to avoid overflow.
1. In the previous problem, it is decided to have an accumulator with only 16 bits
but to shift the numbers before the addition to prevent overflow. By how many bits
should each number be shifted?
As the length of the accumulator is fixed, each operand has to be shifted right by
log2 64 = 6 bits prior to the addition in order to avoid
overflow.
2. If all the numbers in the previous problem are fixed point integers, what is the
actual sum of the numbers?
The actual sum can be obtained by shifting the result left by 6 bits after the sum
has been computed. Therefore
Actual Sum = Accumulator content × 2^6
3. If a sum of 256 products is to be computed using a pipelined MAC unit, and if the MAC
execution time of the unit is 100 ns, what will be the total time required to complete the
operation?
As N = 256 in this case, the MAC unit requires N + 1 = 257 execution cycles. As the single MAC
execution time is 100 ns, the total time required will be 257 × 100 ns = 25.7 µs.
4. Consider a MAC unit whose inputs are 16-bit numbers. If 256 products are to be
summed up in this MAC, how many guard bits should be provided for the
accumulator to prevent overflow?
As it is required to calculate the sum of 256 16-bit numbers, the sum can be as
long as (16 + log2 256) = 24 bits. Hence the accumulator should be capable of handling
these 24 bits. Thus the guard bits required will be (24 - 16) = 8 bits.
The block diagram of the modified MAC after considering the guard or extension bits is as shown in
the figure
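The arithmetic used in the accumulator and MAC timing problems above can be collected into two small helper functions. This is a sketch that follows the rules stated in these notes (log2 N extra bits for N additions, and N + 1 cycles for N products in a pipelined MAC); the function names are illustrative.

```python
import math


def accumulator_bits(word_bits, count):
    """Return (accumulator width, guard bits) needed to add `count` words safely."""
    guard = math.ceil(math.log2(count))       # growth of the sum in bits
    return word_bits + guard, guard


def pipelined_mac_time_ns(n_products, cycle_ns):
    """Total time for n products in a pipelined MAC: N + 1 execution cycles."""
    return (n_products + 1) * cycle_ns


print(accumulator_bits(16, 64))      # sum of 64 16-bit numbers
print(accumulator_bits(16, 256))     # 256 terms, 8 guard bits
print(pipelined_mac_time_ns(256, 100))
```

The guard bit count is also the number of positions each operand would have to be shifted right if the accumulator width were kept at the word length.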
5. What are the memory addresses of the operands in each of the following cases of indirect
addressing modes? In each case, what will be the content of the addreg after the memory
access? Assume that the initial contents of the addreg and the offsetreg are 0200h and 0010h,
respectively.
a. ADD *addreg
b. ADD +*addreg
c. ADD offsetreg+,*addreg
d. ADD *addreg,offsetreg-
6. A DSP has a circular buffer with the start and the end addresses as 0200h and 020Fh
respectively. What would be the new values of the address pointer of the buffer if, in the course
of address computation, it gets updated to
a. 0212h
b. 01FCh
Buffer Length = (EAR - SAR + 1) = 020Fh - 0200h + 1 = 10h
a. New Address Pointer = Updated Pointer - Buffer Length = 0212h - 10h = 0202h
b. New Address Pointer = Updated Pointer + Buffer Length = 01FCh + 10h = 020Ch
7. Repeat the previous problem for SAR = 0210h and EAR = 0201h
Buffer Length = (SAR - EAR + 1) = 0210h - 0201h + 1 = 10h
a. New Address Pointer = Updated Pointer - Buffer Length = 0212h - 10h = 0202h
b. New Address Pointer = Updated Pointer + Buffer Length = 01FCh + 10h = 020Ch
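The wrap-around rule used in the first of these two problems (start address below end address) can be sketched as a small function. It assumes the pointer overshoots the buffer by less than one buffer length, as in the worked cases; the function name is illustrative.

```python
def wrap_pointer(pointer, start, end):
    """Bring an updated address pointer back inside the circular buffer."""
    length = end - start + 1
    if pointer > end:            # ran past the end: wrap to the start region
        return pointer - length
    if pointer < start:          # ran before the start: wrap to the end region
        return pointer + length
    return pointer


print(hex(wrap_pointer(0x0212, 0x0200, 0x020F)))
print(hex(wrap_pointer(0x01FC, 0x0200, 0x020F)))
```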
9. Compute the indices for an 8-point FFT using the bit-reversed addressing mode.
Start with index 0. Therefore the first index is (000).
The next index is calculated by adding half the FFT length, in this case (100),
to the previous index with reverse carry propagation (denoted +B), i.e. Present Index = (000) +B (100) = (100)
Similarly the next index is calculated as
Present Index = (100) +B (100) = (010)
The process continues till all the indices are calculated. The following table summarizes
the calculation.
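The full index sequence can be generated with a short routine. The sketch below implements the reverse carry addition literally, adding bit by bit from the most significant bit and letting the carry move towards the least significant bit; the function names are illustrative and the FFT length is assumed to be a power of two.

```python
def reverse_carry_add(a, b, bits):
    """Add a and b with the carry propagating from the MSB towards the LSB."""
    result = 0
    carry = 0
    for pos in range(bits - 1, -1, -1):      # start at the most significant bit
        total = ((a >> pos) & 1) + ((b >> pos) & 1) + carry
        result |= (total & 1) << pos
        carry = total >> 1                   # carry moves to the next lower bit
    return result


def bit_reversed_indices(n_points):
    """Index order obtained by repeatedly adding N/2 with reverse carry."""
    bits = n_points.bit_length() - 1
    half = n_points // 2
    index = 0
    order = []
    for _ in range(n_points):
        order.append(index)
        index = reverse_carry_add(index, half, bits)
    return order


order = bit_reversed_indices(8)
print([format(i, "03b") for i in order])
```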
Recommended Questions:
1. Explain implementation of 8- tap FIR filter, (i) pipelined using MAC units and (ii)
parallel
4. Draw the schematic diagram of the saturation logic and explain the same.
5. Explain how the circular addressing mode and bit reversal addressing mode are implemented in
a DSP.
6. Explain the purpose of program sequencer.
7. Give the structure of a 4X4 Braun multiplier, Explain its concept. What modification is
required to carry out multiplication of signed numbers? Comment on the speed of the
multiplier.
8. Explain guard bits in a MAC unit of DSP. Consider a MAC unit whose inputs are 24-bit
numbers. How many guard bits should be provided if 512 products have to be added in the
accumulator to prevent overflow condition? What is the overall size of the accumulator
required?
9. With a neat block diagram explain ALU of DSP system.
10. Explain i) Circular buffer addressing mode ii) Parallelism iii) Guard bits.
11. 256 unsigned numbers, 16 bits each, are to be summed up in a processor. How many guard
bits are needed to prevent overflow?
12. How will you implement an 8X8 multiplier using 4X4 multipliers as the building blocks?
13. Describe the basic features that should be provided in the DSP architecture to be used to
implement the Nth order FIR filter, where x(n) denotes the input sample, y(n) the output
sample and h(i) denotes ith filter coefficient.(Dec.09-Jan.10, 8m)
14. Explain the issues to be considered in designing and implementing a DSP system, with the help
of a neat block diagram. (May/June10 , 6m)
15. Briefly explain the major features of programmable DSPs. (May/June10, 8m)
16. Explain the operation used in DSP to increase the sampling rate. The sequence x(n) = [0, 2, 4, 6, 8]
is interpolated using the interpolation sequence bk = [1/2, 1, 1/2] and the interpolation factor is 2. Find
the interpolated sequence y(m). (May/June10, 8m)
17. Explain with the help of mathematical equations how signed numbers can be multiplied.
(Dec.10-Jan.11, 8m)
18. The sequence x(n) = [3, 2, -2, 0, 7] is interpolated using the interpolation sequence bk = [0.5, 1, 0.5]
and an interpolation factor of 2. Find the interpolated sequence y(m). (Dec.10-Jan.11, 6m)
19. Why signal sampling is required? Explain the sampling process. (Dec.12, 5m)
20. Define decimation and interpolation process. Explain them using block diagrams and
equations. (Dec.12, 6m).
UNIT-3
Programmable Digital Signal Processors
3.1 Introduction:
Leading manufacturers of integrated circuits such as Texas Instruments (TI), Analog Devices and
Motorola manufacture digital signal processor (DSP) chips. These manufacturers have developed a
range of DSP chips of varied complexity.
The TMS320 family consists of two types of single-chip DSPs: 16-bit fixed-point and 32-bit floating-point. These DSPs possess the operational flexibility of high-speed controllers and the numerical
capability of array processors.
3.2 Commercial Digital Signal-Processing Devices:
There are several families of commercial DSP devices. Ever since the early eighties, when
these devices began to appear in the market, they have been used in numerous applications such as
communication, control, computers, instrumentation, and consumer electronics. The architectural
features and the processing power of these devices have been constantly upgraded based on
advances in technology and application needs. However, in their basic versions most of them have a
Harvard architecture, a single-cycle hardware multiplier, an address generation unit with dedicated
address registers, special addressing modes, and on-chip peripheral interfaces. Of the various families of
programmable DSP devices that are commercially available, the three most popular ones are those
from Texas Instruments, Motorola, and Analog Devices. Texas Instruments was one of the first to
come out with a commercial programmable DSP with the introduction of its TMS32010 in 1982.
memory. Figure 3.5(a) and (b) show the internal CPU registers and peripheral registers with their
addresses. The processor mode status (PMST) register
is used to configure the processor. It is a memory-mapped register located at address 1Dh on page
0 of the RAM. A part of the on-chip ROM may contain a boot loader and look-up tables for functions such
as sine, cosine, μ-law, and A-law.
IPTR: Interrupt vector pointer; points to the 128-word program page where the interrupt vectors
reside.
MP/MC: Microprocessor/Microcomputer mode.
MP/MC = 0: the on-chip ROM is enabled.
MP/MC = 1: the on-chip ROM is not available.
OVLY: RAM overlay; OVLY enables on-chip dual-access data RAM blocks to be mapped into
program space.
AVIS: Address visibility; enables or disables the internal program address being visible at the address pins.
DROM: Data ROM; DROM enables on-chip ROM to be mapped into data space.
CLKOFF: CLKOUT off.
SMUL: Saturation on multiplication.
SST: Saturation on store.
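As an illustration of how these fields are read out of the 16-bit PMST value, the sketch below decodes a register word. The bit positions are an assumption taken from the usual TMS320C54x register layout (interrupt vector pointer IPTR in bits 15-7, then MP/MC, OVLY, AVIS, DROM, CLKOFF, SMUL and SST in bits 6 down to 0); they are not stated in these notes and should be checked against the device data sheet. The function name is illustrative.

```python
# Assumed bit positions of the single-bit PMST fields (check the data sheet).
PMST_FIELDS = {"MP/MC": 6, "OVLY": 5, "AVIS": 4, "DROM": 3,
               "CLKOFF": 2, "SMUL": 1, "SST": 0}


def decode_pmst(value):
    """Split a 16-bit PMST word into its fields."""
    fields = {name: (value >> bit) & 1 for name, bit in PMST_FIELDS.items()}
    fields["IPTR"] = (value >> 7) & 0x1FF          # 9-bit interrupt vector pointer
    fields["vector_base"] = fields["IPTR"] << 7    # start of the 128-word vector page
    return fields


pmst = decode_pmst(0xFFE0)
print(pmst)
```

Because the pointer selects a 128-word page, the vector table address is simply the pointer value shifted left by 7 bits.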
Figure 3.7 Block diagram of the direct addressing mode for TMS320C54xx Processors.
3.4.5 Indirect Addressing:
Data space is accessed using the address held in an auxiliary register.
The TMS320C54xx has eight 16-bit auxiliary registers (AR0-AR7) and two auxiliary register arithmetic units
(ARAU0 and ARAU1),
which are used to access memory locations in fixed step sizes. The AR0 register is used for indexed and bit-reversed
addressing modes.
For single operand addressing:
MOD: type of indirect addressing
ARF: AR used for addressing
ARP: depends on the CMPT bit in ST1
CMPT = 0: standard mode, ARP is set to zero
CMPT = 1: compatibility mode, the particular AR is selected by ARP
Table 3.2 Indirect addressing options with a single data memory operand.
Circular Addressing:
Used in convolution, correlation and FIR filters.
A circular buffer is a sliding window containing the most recent data. A circular buffer of size R must
start on an N-bit boundary, where $2^N > R$.
The circular buffer size register (BK) specifies the size of the circular buffer.
Effective base address (EFB): obtained by zeroing the N LSBs of a user-selected AR (ARx).
End of buffer address (EOB): obtained by replacing the N LSBs of ARx with the N LSBs of BK.
If 0 ≤ index + step < BK: index = index + step;
else if index + step ≥ BK: index = index + step - BK;
else if index + step < 0: index = index + step + BK
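The index update rule and the base address calculation can be sketched together. This is an illustrative model only: N is taken as the smallest integer with 2^N greater than the buffer size, the index is the N least significant bits of the auxiliary register, and the step is assumed to be smaller in magnitude than BK. The function names are not TI mnemonics.

```python
def circular_update(index, step, bk):
    """Apply the three-case index rule for a circular buffer of size bk."""
    total = index + step
    if 0 <= total < bk:
        return total
    if total >= bk:
        return total - bk        # wrapped past the end of the buffer
    return total + bk            # wrapped before the start of the buffer


def circular_address(arx, step, bk):
    """New auxiliary register value after a circular post-modification."""
    n = bk.bit_length()                      # smallest N with 2**N > bk
    low_mask = (1 << n) - 1
    base = arx & ~low_mask                   # EFB: N LSBs of ARx zeroed
    index = arx & low_mask                   # position inside the buffer
    return base | circular_update(index, step, bk)


print(hex(circular_address(0x0104, 2, 5)))
print(hex(circular_address(0x0100, -1, 5)))
```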
Bit-Reversed Addressing:
o Used for FFT algorithms.
o AR0 specifies one half of the size of the FFT.
o The value of AR0 = $2^{N-1}$, where N is an integer and the FFT size is $2^N$.
o AR0 is added to the selected AR with reverse carry propagation to obtain the bit-reversed address.
o The carry bit propagates from left to right.
Dual-Operand Addressing:
Dual data-memory operand addressing is used for instructions that simultaneously
perform two reads (32-bit read), or a single read (16-bit read) and a parallel store (16-bit
store) indicated by two vertical bars, ||. These instructions access operands using the indirect addressing
mode.
If, in an instruction with a parallel store, the source operand and the destination operand point to the
same location, the source is read before writing to the destination. Only 2 bits are available in the
instruction code for selecting each auxiliary register in this mode. Thus just four of the auxiliary
registers, AR2-AR5, can be used. The ARAUs, together with these registers, provide the capability to
access two operands in a single cycle. Figure 3.11 shows how an address is generated using dual data-memory operand addressing.
2. Assuming the current contents of AR3 to be 200h, what will be its contents after
each of the following TMS320C54xx addressing modes is used? Assume that the contents of AR0 are
20h.
a. *AR3+0B
b. *AR3-0B
Solution:
a. AR3 ← AR3 + AR0 with reverse carry propagation;
AR3 = 200h + 20h (with reverse carry propagation) = 220h.
b. AR3 ← AR3 - AR0 with reverse carry propagation;
AR3 = 200h - 20h (with reverse carry propagation) = 23Fh.
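The two results can be checked with a small Python model. The sketch below relies on the observation that adding or subtracting with the carry moving from MSB to LSB is equivalent to mirroring both operands, doing ordinary arithmetic, and mirroring the result back. The function names and the 16-bit register width are assumptions made for illustration.

```python
def bit_reverse(value, width=16):
    """Mirror the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(ar, ar0, width=16):
    """AR + AR0 with the carry propagating from MSB towards LSB."""
    mask = (1 << width) - 1
    total = (bit_reverse(ar, width) + bit_reverse(ar0, width)) & mask
    return bit_reverse(total, width)


def reverse_carry_sub(ar, ar0, width=16):
    """AR - AR0 with the borrow propagating from MSB towards LSB."""
    mask = (1 << width) - 1
    diff = (bit_reverse(ar, width) - bit_reverse(ar0, width)) & mask
    return bit_reverse(diff, width)


print(hex(reverse_carry_add(0x200, 0x20)))  # 0x220
print(hex(reverse_carry_sub(0x200, 0x20)))  # 0x23f
```

In case (b) the borrow generated at bit 5 ripples to the right and sets bits 4 to 0, which is why the result is 23Fh rather than 1E0h.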
Recommended Questions:
1. Compare architectural features of TMS320C25 and DSP6000 fixed point digital signal
processors. (Dec.09-Jan.10, 6m)
(Dec.09-Jan.10, 6m)
SXM
(Dec.09-Jan.10, 8m)
4. With a block diagram explain the indirect addressing mode of TMS320C54XX processor using
dual data memory operand. (June.12, 6m)
5. What is the function of an address generation unit explain with the help of block diagram.
(Dec.12, 6m)
6. Why circular buffers are required in DSP processor? How they are implemented? (Dec.12, 2m)
7. Explain the direct addressing mode of the TMS320C54XX processor with the help of a block
diagram. (Dec.12, 2m)
8. Describe the multiplier/adder unit of TMS320c54xx processor with a neat block diagram.
(May/June2010, 6m)
9. Describe any four data addressing modes of TMS320c54xx processor(May/June2010, 8m)
10. Assume that the current content of AR3 is 400h, what will be its contents after each of the
following. Assume that the content of AR0 is 40h. (May/June2010, 8m)
12. With an example each, explain immediate, absolute, and direct addressing mode. (May/June2011, 12m)
13. Explain the functioning of barrel shifter in TMS320C54XX processor. (June.12, 6m)
14. Explain sequential and other types of program control(June.11, 7m)
15. With an example each, explain immediate, absolute, and direct addressing mode.
16. Explain the functioning of barrel shifter in TMS320C54XX processor.
17. Explain sequential and other types of program control
18. Assume that the current content of AR3 is 400h, what will be its contents after each of the
following. Assume that the content of AR0 is 40h.
19. Explain PMST register.
20. Compare architectural features of TMS320C25 and DSP6000 fixed point digital signal
processors.
UNIT-4
Instruction and programming
Syllabus:Detail Study of TMS320C54X & 54xx Instructions and Programming, On-Chip peripherals, Interrupts
of TMS32OC54XX Processors, Pipeline Operation of TMS32OC54xx Processor.
6 Hours
TEXT BOOK:
Digital Signal Processing, Avtar Singh and S. Srinivasan, Thomson Learning, 2004.
REFERENCE BOOKS:
Digital Signal Processing: A Practical Approach, Ifeachor E. C. and Jervis B. W., Pearson Education, PHI, 2002.
Digital Signal Processors, B. Venkataramani and M. Bhaskar, TMH, 2002.
Architectures for Digital Signal Processing, Peter Pirsch, John Wiley, 2007.
Branch Instructions
B[D]: Branch Unconditionally
NOP: No Operation
The sampling frequency of the ADC determines how frequently it receives the TOUT signal.
TINT is used to generate interrupts, which are required to service a peripheral such as a DRAM
controller periodically. The timer can also be stopped, restarted, reset, or disabled by specific status
bits.
The synchronous serial ports are high-speed, full-duplex ports that provide direct
communication with serial devices, such as codecs and analog-to-digital (A/D) converters. A buffered
serial port (BSP) is a synchronous serial port that is provided with
an auto-buffering unit and is clocked at the full clock rate. The auto-buffering unit reduces the overhead of servicing interrupts. A time-division multiplexed (TDM) serial port is a synchronous serial port that is provided to allow time-division multiplexing of the data. The functioning of each of these on-chip peripherals is controlled by
memory-mapped registers assigned to the respective peripheral.
4.4. Interrupts of TMS320C54xx Processors:
Many times, when the CPU is in the midst of executing a program, a peripheral device may require
service from the CPU. In such a situation, the main program may be interrupted by a signal
generated by the peripheral device. This results in the processor suspending the main program in
order to execute another program, called an interrupt service routine, to service the peripheral device. On
completion of the interrupt service routine, the processor returns to the main program to continue from
where it left off.
An interrupt may be generated either by an internal or an external device. It may also be generated by
software. Not all interrupts are serviced when they occur. Only those interrupts that are called
nonmaskable are serviced whenever they occur. Other interrupts, which are called maskable interrupts,
are serviced only if they are enabled. There is also a priority scheme to determine which interrupt gets serviced
first if more than one interrupt occurs simultaneously.
Almost all the devices of the TMS320C54xx family have 32 interrupts. However, the
types and the number under each type vary from device to device. Some of these interrupts are
reserved for use by the CPU.
4.5. Pipeline operation of TMS320C54xx Processors:
The CPU of the 54xx devices has a six-level-deep instruction pipeline. The six stages of the
pipeline are independent of each other. This allows overlapping execution of instructions. During any
given cycle, up to six different instructions can be active, each at a different stage of processing. The
six levels of the pipeline structure are program prefetch, program fetch, decode, access, read and
execute.
1. During program prefetch, the program address bus, PAB, is loaded with the address of the next
instruction to be fetched.
2. In the fetch phase, an instruction word is fetched from the program bus, PB, and loaded into the
instruction register, IR. These two phases form the instruction fetch sequence.
3. During the decode stage, the contents of the instruction register, IR, are decoded to determine the
type of memory access operation and the control signals required for the data-address generation unit
and the CPU.
4. The access phase outputs the address of the read operand on the data address bus, DAB. If a second operand is
required, the other data address bus, CAB, is also loaded with an appropriate address. Auxiliary
registers in indirect addressing mode and the stack pointer (SP) are also updated.
5. In the read phase, the data operand(s), if any, are read from the data buses, DB and CB. This phase
completes the two-phase read process and starts the two-phase write process. The data address of the
write operand, if any, is loaded into the data write address bus, EAB.
6. The execute phase writes the data using the data write bus, EB, and completes the operand write
sequence. The instruction is executed in this phase.
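The overlap of the six stages can be visualised with a short Python sketch. It assumes an ideal pipeline with one instruction entering per cycle and no stalls or conflicts; the function name and the stage labels are illustrative only.

```python
STAGES = ["prefetch", "fetch", "decode", "access", "read", "execute"]


def pipeline_schedule(num_instructions):
    """Return one dict per cycle mapping stage name -> instruction number."""
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        active = {}
        for stage_index, stage in enumerate(STAGES):
            instr = cycle - stage_index      # instruction i reaches stage s in cycle i + s
            if 0 <= instr < num_instructions:
                active[stage] = instr
        schedule.append(active)
    return schedule


for cycle, active in enumerate(pipeline_schedule(8)):
    print(cycle, active)
```

From the sixth cycle onwards (cycle index 5) all six stages are busy, each with a different instruction, which is the "up to six different instructions" case described above.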
Recommended Questions:
1. Describe Host Port Interface and explain its signals.
2. Write an assembly language program for TMS320C54XX processors to compute the sum of
three product terms given by the equation y(n)=h(0)x(n)+h(1)x(n-1)+h(2)x(n-2) with usual
notations. Find y (n) for signed 16 bit data samples and 16 bit constants.
3. Describe the pipelining operation of TMS320C54XX processors.
4. Explain the operation of serial I/O ports and hardware timer of TMS320C54XX on chip
peripherals.
5. Explain the different types of interrupts in TMS320C54xx processors.
6. Describe the operation of the following instructions of TMS 320c54xx processor, with example
Describe the operation of hardware timer with neat diagram.
7. By means of a figure explain the pipeline operation of the following sequence of instruction if
the initial values of AR1,AR3,A are 104,101,2 and the values stored in the memory locations
101,102,103,104 are 4,6,8,12. Also provide the values of registers AR3, AR1,T & A.
8. Describe the operation of the following instructions of TMS320C54XX processors.
9. Describe the operation of the following instructions of TMS320C54XX processors. (July 12,
8m)
10. Explain the following assembler directives of TMS320C54XX processors (i) .mmregs (ii)
.global (iii) .include xx (iv) .data ( v) .end (vi) .bss
11. Describe Host Port Interface and explain its signals. (Dec 09/Jan 10 6marks)
12. Write an assembly language program for TMS320C54XX processors to compute the sum of
three product terms given by the equation y(n)=h(0)x(n)+h(1)x(n-1)+h(2)x(n-2) with usual
notations. Find y (n) for signed 16 bit data samples and 16 bit constants. (May/June 2011,
6m)
13. Describe the pipelining operation of TMS320C54XX processors.(Dec.11, 8m)
14. Explain the operation of serial I/O ports and hardware timer of TMS320C54XX on chip
peripherals. (Dec.11, 8m)
15. Explain the different types of interrupts in TMS320C54xx processors. (May/June 2009, 6m)
UNIT-5
Implementation of Basic DSP Algorithms
Syllabus:IMPLEMENTATION OF BASIC DSP ALGORITHMS: Introduction, The Q-notation, FIR Filters,
IIR Filters, Interpolation and Decimation Filters (one example in each case).
6 Hours
5.1 Introduction:
In this unit, we deal with implementations of DSP algorithms & write programs to implement
the core algorithms only. However, these programs can be combined with input/output routines to
create applications that work with a specific hardware.
Q-notation
FIR filters
IIR filters
Interpolation filters
Decimation filters
5.2 The Q-notation:
DSP algorithm implementations deal with signals and coefficients. To use a fixed-point DSP
device efficiently, one must consider representing filter coefficients and signal samples using fixed-point 2's complement representation. Ex: for N = 16, the range is -2^(N-1) to +2^(N-1) - 1 (-32768 to
32767). Typically, filter coefficients are fractional numbers.
To represent such numbers, the Q-notation has been developed. The Q-notation specifies the number
of fractional bits.
A commonly used notation for DSP implementations is Q15. In the Q15 representation, the least
significant 15 bits represent the fractional part of a number. In a processor where 16 bits are used to
represent numbers, the Q15 notation uses the MSB to represent the sign of the number and the rest of
the bits represent the value of the number.
In general, the value of a 16-bit Q15 number N with bits b15 b14 ... b0 is represented as:
$$N = -b_{15} + b_{14}\,2^{-1} + b_{13}\,2^{-2} + \dots + b_{0}\,2^{-15}$$
so that N lies in the range -1 to 1 - 2^-15.
Multiplication of numbers represented using the Q-notation is important for DSP implementations.
Figure 5.1(a) shows typical cases encountered in such implementations.
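One common case, the product of two Q15 numbers, can be sketched in Python as follows. The helper names are hypothetical, and the choice of truncation (rather than rounding) and of saturation on overflow is an assumption made for the sketch. The product of two Q15 numbers has 30 fractional bits (Q30), so 15 of them are dropped to return to Q15.

```python
def float_to_q15(x):
    """Quantize x in [-1, 1) to a signed 16-bit Q15 integer, saturating at the limits."""
    raw = int(round(x * 32768))
    return max(-32768, min(32767, raw))


def q15_to_float(q):
    """Convert a signed Q15 integer back to its fractional value."""
    return q / 32768.0


def q15_mul(a, b):
    """Multiply two Q15 integers and return a Q15 result."""
    product_q30 = a * b              # Q15 x Q15 gives a Q30 product
    result = product_q30 >> 15       # drop 15 fractional bits (truncation)
    return max(-32768, min(32767, result))   # (-1) x (-1) would overflow, so saturate


half = float_to_q15(0.5)
quarter = float_to_q15(0.25)
print(q15_to_float(q15_mul(half, quarter)))  # 0.125
```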
The kind of interpolation carried out in the examples is called linear interpolation because the
convolving sequence h(n) is derived based on linear interpolation of samples. Further, in this case, the
h(n) selected is just a second-order filter and therefore uses just two adjacent samples to interpolate a
sample. A higher-order filter can be used to base the interpolation on more input samples and so come
closer to an ideal interpolation. Figure 5.6 shows how an interpolating filter using a 15-tap FIR filter and an
interpolation factor of 5 can be implemented. In this example, each incoming sample is followed by
four zeros to increase the number of samples by a factor of 5.
The interpolated samples are computed using a program similar to the one used for a FIR filter
implementation. One drawback of using the implementation strategy depicted in Figure 5.6 is that
there are many multiplies in which one of the multiplying elements is zero. Such multiplies need not
be included in the computation if the computation is rearranged to take advantage of this fact. One such
scheme, based on generating what are called poly-phase sub-filters, is available for reducing the
computation. For a case where the number of filter coefficients N is a multiple of the interpolating
factor L, the scheme implements the interpolation filter using the equation
$$y(nL + i) = \sum_{k=0}^{N/L - 1} h(kL + i)\, x(n - k), \qquad i = 0, 1, \ldots, L - 1.$$
Figure 5.7 shows a scheme that uses poly-phase sub-filters to implement the interpolating filter
using the 15-tap FIR filter and an interpolation factor of 5. In this implementation, the 15 filter taps are
arranged as shown and divided into five 3-tap sub-filters. The input samples x(n), x(n-1) and x(n-2) are
used five times to generate the five output samples. This implementation requires 15 multiplies per input
sample (five outputs of three multiplies each), as opposed to 75 in the direct implementation of Figure 5.6.
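The equivalence of the two schemes can be checked with a short Python sketch. The function names and the integer test coefficients are made up for illustration; only the structure (zero insertion followed by a 15-tap FIR, versus five 3-tap sub-filters) follows the text.

```python
def fir(x, h):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n-k], with zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def interpolate_direct(x, h, L):
    """Insert L-1 zeros after each input sample, then filter with all taps."""
    upsampled = []
    for sample in x:
        upsampled.append(sample)
        upsampled.extend([0] * (L - 1))
    return fir(upsampled, h)


def interpolate_polyphase(x, h, L):
    """Produce the same output using L sub-filters of len(h)//L taps each."""
    subfilters = [h[p::L] for p in range(L)]   # sub-filter p holds h[p], h[p+L], h[p+2L], ...
    y = []
    for n in range(len(x)):
        for taps in subfilters:                # one output sample per sub-filter
            y.append(sum(taps[k] * x[n - k]
                         for k in range(len(taps)) if n - k >= 0))
    return y


h = list(range(1, 16))     # 15 illustrative integer taps
x = [1, 2, 3, 4]
print(interpolate_direct(x, h, 5) == interpolate_polyphase(x, h, 5))  # True
```

The polyphase version never multiplies by an inserted zero, which is where the saving from 75 to 15 multiplies per input sample comes from.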
Figure 5.6: Interpolating filter using a 15-tap FIR filter and an interpolation factor of 5
Figure 5.7: A scheme that uses poly-phase sub-filters to implement the interpolating filter using the 15-tap FIR filter and an interpolation factor of 5
Problems:
1. What values are represented by the 16-bit fixed point number N=4000h in
Q15 & Q7 notations?
Solution:
Q15 notation: 0.100 0000 0000 0000 (N=0.5)
Q7 notation: 0100 0000 0.000 0000 (N=+128)
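The same result can be obtained numerically: a 16-bit word read as a Qn number is its 2's complement integer value divided by 2^n. The helper below is a hypothetical sketch written for this check.

```python
def q_value(word, frac_bits, width=16):
    """Interpret an unsigned `width`-bit word as a 2's complement Qn number."""
    if word & (1 << (width - 1)):      # sign bit set, so the value is negative
        word -= 1 << width
    return word / (1 << frac_bits)


print(q_value(0x4000, 15))  # 0.5
print(q_value(0x4000, 7))   # 128.0
```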
Recommended Questions:
1. Describe the importance of Q-notation in DSP algorithm implementation with examples. What
are the values represented by 16- bit fixed point number N=4000h in Q15, Q10, Q7 notations?
Explain how the FIR filter algorithms can be implemented using TMS320c54xx processor.
2. Explain with the help of a block diagram and mathematical equations the implementation of a
second order IIR filter. No program code is required.
3. Write the assembly language program for TMS320C54XX processor to implement an FIR
filter.
4. What is the drawback of using linear interpolation for implementing of an FIR filter in
TMS320C54XX processor? Show the memory organization for the filter implementation.
5. Briefly explain IIR filters
6. Determine the value of each of the following 16-bit numbers represented using the given Q-notations:
7. (i) 4400h as a Q10 number (ii) 4400h as a Q7 number (iii) 0.3125 as a Q15 number (iv) 0.3125 as a Q15 number.
8. Write an assembly language program for TMS320C54XX processors to multiply two Q15
numbers to produce Q15 number result.
9. What is an interpolation filter? Explain the implementation of digital interpolation using FIR
filter and poly phase sub filter.
10. Determine the value of each of the following 16-bit numbers represented using the given Q-notations:
11. (i) 4400h as a Q10 number (ii) 4400h as a Q7 number (iii) 0.3125 as a Q15 number (iv) 0.3125 as a Q15 number.
(MAY-JUNE 10, 6m)
12. Write an assembly language program for TMS320C54XX processors to multiply two Q15
numbers to produce Q15 number result. (Dec 12 , 6 marks)(July 11, 6m) (June/July2012,
4m)
13. What is an interpolation filter? Explain the implementation of digital interpolation using FIR
filter and poly phase sub filter.
(Dec 12 8 marks)
14. Describe the importance of Q-notation in DSP algorithm implementation with examples. What
are the values represented by 16- bit fixed point number N=4000h in Q15, Q10, Q7 notations?
(MAY-JUNE 10, 6m)
15. Explain how the FIR filter algorithms can be implemented using TMS320c54xx processor.
(DEC 2012, 6m) (MAY-JUNE 10, 10 marks)
16. Explain with the help of a block diagram and mathematical equations the implementation of a
second order IIR filter. No program code is required.(June/July2011, 10m)
17. Write the assembly language program for TMS320C54XX processor to implement an FIR
filter. (June/July2012, 12m)
18. What is the drawback of using linear interpolation for implementing of an FIR filter in
TMS320C54XX processor? Show the memory organization for the filter implementation.
(DEC 2012, 6m)
19. Briefly explain IIR filters. (DEC 2011, 4m)
Unit 6
Implementation of FFT algorithms
Syllabus:IMPLEMENTATION OF FFT ALGORITHMS: Introduction, An FFT Algorithm for DFT
Computation, Overflow and Scaling, Bit-Reversed Index Generation & Implementation on the
TMS32OC54xx.
6 Hours
6.1 Introduction: The N-point Discrete Fourier Transform (DFT) of x(n), a discrete
signal of length N, is given by eq (6.1):
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi nk/N}, \qquad k = 0, 1, \ldots, N-1 \tag{6.1}$$
The inverse DFT (IDFT) is given by eq (6.2):
$$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\, e^{j 2\pi nk/N}, \qquad n = 0, 1, \ldots, N-1 \tag{6.2}$$
By referring to eq (6.1) and eq (6.2), the differences between the DFT and the IDFT are seen to be
the sign of the argument of the exponent and the multiplication factor, 1/N. The computational
complexity of computing the DFT and the IDFT is thus the same (except for the additional multiplication factor in
the IDFT). The computational complexity of computing each X(k) and all the X(k) is shown in table 6.1.
In a typical signal processing system, shown in fig. 6.1, the signal is processed using a DSP in the DFT
domain. After processing, the IDFT is taken to get the signal back in its original domain. Although a certain
amount of time is required for the forward and inverse transforms, the signal processing is carried out in the
DFT domain because of the advantages of transformed domain manipulation. The
transformed domain manipulations are sometimes simpler. They are also more useful and powerful
than time domain manipulation. For example, convolution in the time domain requires one of the signals
to be folded, shifted and multiplied by the other signal, cumulatively. Instead, when the signals to be
convolved are transformed to the DFT domain, the two DFTs are multiplied and the inverse transform is taken.
Thus, it simplifies the process of convolution.
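The convolution property can be checked with a direct (slow) DFT written in plain Python. This is a sketch for illustration: the function names and test sequences are made up, and note that multiplying two N-point DFTs corresponds to circular convolution of length N, so linear convolution requires zero padding first.

```python
import cmath


def dft(x):
    """Direct N-point DFT: X(k) = sum of x(n) * exp(-j*2*pi*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Direct N-point IDFT: opposite exponent sign and a 1/N factor."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]


def circular_convolve(x, h):
    """Time-domain circular convolution of two equal-length sequences."""
    N = len(x)
    return [sum(x[m] * h[(n - m) % N] for m in range(N)) for n in range(N)]


x = [1, 2, 3, 4]
h = [1, 0, -1, 0]
via_dft = [v.real for v in idft([a * b for a, b in zip(dft(x), dft(h))])]
print(circular_convolve(x, h))
print([round(v, 6) for v in via_dft])
```

Both lines print the same sequence, up to floating-point rounding in the DFT route.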
6.2 An FFT Algorithm for DFT Computation: As the DFT / IDFT are part of a signal processing system,
there is a need for fast computation of the DFT / IDFT. There are algorithms available for fast
computation of the DFT / IDFT. These are referred to as Fast Fourier Transform (FFT) algorithms. There
are two FFT algorithms: Decimation-In-Time
FFT (DITFFT) and Decimation-In-Frequency FFT (DIFFFT). The computational complexity of both
algorithms is of the order of N log2(N). From the hardware / software implementation viewpoint, the
algorithms have a similar structure throughout the
computation. In-place computation is possible, which reduces the number of memory locations required.
The features of FFT are tabulated in table 6.2.
Consider an example of the computation of a 2-point DFT. The signal flow graph of the 2-point DITFFT
computation is shown in fig. 6.2. The input / output relations are as in eq (6.3), which are arrived at from
eq (6.1).
Similarly, the general butterfly structure for the DITFFT algorithm is shown in fig. 6.3. The signal flow
graph for the N=8 point DITFFT is shown in fig. 6.4. The relation between the input and output of any butterfly
structure is shown in eq (6.4) and eq (6.5).
Separating the real and imaginary parts, the four equations to be realized in implementation of
DITFFT Butterfly structure are as in eq(6.6).
Observe that with N=2^M, the number of stages in the signal flow graph = M, the number of complex
multiplications = (N/2)log2(N) and the number of complex additions = N log2(N). The number of butterfly structures per stage =
N/2. They are identical and hence in-place computation is possible. Reusability of the hardware
designed for implementing the butterfly structure is also
possible. However, in case the FFT is to be computed for an input sequence of length other than 2^M, the
sequence is extended to N=2^M by appending additional zeros. The process does not alter the
information content of the signal. It improves the frequency resolution of the computed DFT, in the sense that
the spectrum is sampled at more closely spaced points. To make the point clear, consider
a sequence whose spectrum is shown in fig. 6.5.
The spectrum is sampled to get the DFT with only N=10. The same is shown in fig. 6.6.
The variations in the spectrum are not traced or caught by the DFT with N=10. For example, the dips in the
spectrum near sample no. 2 and between sample nos. 7 and 8 are not represented in the DFT. By increasing
N to 16, the DFT plot shown in fig. 6.7 is obtained. As depicted in fig. 6.7, the approximation to the spectrum
with N=16 is better than with N=10. Thus, increasing N to a suitable value as required by an algorithm
improves frequency resolution.
Problem P6.1: What minimum size FFT must be used to compute a DFT of 40 points? What
must be done to the samples before the chosen FFT is applied? What is the frequency resolution
achieved?
Solution:
The minimum size FFT for a 40-point sequence is a 64-point FFT. The sequence is extended to 64 points by appending
24 additional zeros. The process improves the frequency resolution from 2π/40 to 2π/64.
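The arithmetic of this problem generalises to any sequence length. The sketch below (function and key names are hypothetical) finds the smallest power-of-two FFT size, the number of zeros to append, and the spacing between DFT samples before and after padding.

```python
import math


def zero_pad_plan(num_samples):
    """Smallest power-of-two FFT size for num_samples points, and the effect of padding."""
    fft_size = 1 << (num_samples - 1).bit_length()   # next power of two >= num_samples
    return {
        "fft_size": fft_size,
        "zeros_appended": fft_size - num_samples,
        "spacing_before": 2 * math.pi / num_samples,  # radians per DFT bin without padding
        "spacing_after": 2 * math.pi / fft_size,      # radians per DFT bin with padding
    }


print(zero_pad_plan(40))
```

For 40 samples this gives a 64-point FFT with 24 appended zeros, matching the solution above.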
6.3 Overflow and Scaling: In any processing system, the number of bits per data word in signal
processing is fixed and is limited by the DSP processor used. A limited number of bits leads to
overflow, which results in an erroneous answer. In Q15 notation, the range of numbers that can be
represented is -1 to just under +1 (1 - 2^-15). If the value of a number exceeds these limits, there will be underflow /
overflow. Data is scaled down to avoid overflow.
However, scaling is an additional multiplication operation. The scaling operation is simplified by
selecting a scaling factor of 2^-n, so that scaling can be achieved by right shifting the data by n bits. The scaling
factor is defined as the reciprocal of the maximum possible number in the operation. All the
numbers are multiplied by the scaling factor at the beginning of the operation so that the maximum number to be
processed is not more than 1. In the case of DITFFT computation, consider for example,
To find the maximum possible value of the LHS term, differentiate it and equate the derivative to zero.
The maximum works out to 2.414; thus the scaling factor is 1/2.414 = 0.414. A scaling factor of 0.25, the nearest
power of two not exceeding 0.414, is used so that it can be implemented by
shifting the data by 2 positions to the right. The symbolic representation
of the butterfly structure is shown in fig. 6.8. The complete signal flow graph with scaling factor is shown
in fig. 6.9.
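The figure of 2.414 can be reproduced numerically. The sketch below assumes the usual form of a DIT butterfly output, real part = A_R + B_R cos θ + B_I sin θ, with every input bounded by 1 in magnitude; this form is an assumption, since the equation itself is not reproduced in these notes. The function names are illustrative.

```python
import math


def max_butterfly_output(steps=100000):
    """Largest value of 1 + cos(theta) + sin(theta) over one full turn (inputs at +1)."""
    return max(1 + math.cos(2 * math.pi * i / steps) + math.sin(2 * math.pi * i / steps)
               for i in range(steps))


def right_shift_for(max_value):
    """Smallest right shift n such that max_value * 2**-n does not exceed 1."""
    n = 0
    while max_value / (1 << n) > 1:
        n += 1
    return n


peak = max_butterfly_output()
print(round(peak, 3), round(1 / peak, 3), right_shift_for(peak))
```

The peak occurs at θ = π/4 and equals 1 + √2, so the ideal scaling factor is about 0.414. A shift by one place (0.5) would still allow overflow, so a shift by two places (0.25) is the smallest shift that is safe.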
6.4 Bit-Reversed Index Generation: As noted in table 6.2, the DITFFT algorithm requires its input in bit
reversed order. The input sequence can be arranged in bit reversed order by a reverse carry add operation.
Add half of the DFT size (= N/2) to the present bit reversed index to get the next bit reversed index, and employ
reverse carry propagation while adding the bits from left to right. The original index and the bit reversed index
for N=8 are listed in table 6.3.
Consider an example of computing the bit reversed index. Let the present bit reversed index be
110. The next bit reversed index is 110 + 100 (N/2) with reverse carry propagation = 001.
There are addressing modes in DSPs that support bit reversed indexing and carry out the computation of the
reversed index.
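The reverse carry addition can be written out bit by bit. The Python sketch below (hypothetical function names) adds N/2 starting at the MSB and lets the carry move one position to the right each time a 1 is met, which generates the whole bit reversed sequence for N=8.

```python
def next_bit_reversed(index, num_bits):
    """Add N/2 to index with the carry moving from the MSB towards the LSB."""
    bit = 1 << (num_bits - 1)          # start at the MSB, whose weight is N/2
    while bit and (index & bit):       # 1 + 1 = 0, carry moves one place to the right
        index ^= bit
        bit >>= 1
    return index | bit                 # the first 0 bit met absorbs the carry


def bit_reversed_sequence(num_bits):
    """List the bit reversed indices for an FFT of size 2**num_bits."""
    index, sequence = 0, []
    for _ in range(1 << num_bits):
        sequence.append(index)
        index = next_bit_reversed(index, num_bits)
    return sequence


print(bit_reversed_sequence(3))              # [0, 4, 2, 6, 1, 5, 3, 7]
print(format(next_bit_reversed(0b110, 3), "03b"))  # 001
```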
6.5 Implementation of FFT on TMS32OC54xx: The main program flow for the implementation of
DITFFT is shown in fig. 6.10. The subroutines used are _clear to clear all the memory locations
reserved for the results. _bitrev stores the data sequence x (n) in bit reverse order. _butterfly computes
the four equations of computing real and imaginary parts of butterfly structure. _spectrum computes
the spectrum of x (n). The Butterfly subroutine is invoked 12 times and the other subroutines are
invoked only once.
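The same program flow (bit reversal, repeated butterflies, then the spectrum) can be mirrored in a compact Python model. This is a floating-point sketch for checking results, not a translation of the C54x routines: it applies no scaling, and the function names are illustrative.

```python
import cmath


def bit_reverse(i, bits):
    """Mirror the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def dit_fft(x):
    """Radix-2 DIT FFT computed in place; returns (X, number of butterflies)."""
    N = len(x)                               # N must be a power of two
    bits = N.bit_length() - 1
    data = [complex(x[bit_reverse(i, bits)]) for i in range(N)]  # bit reversed input
    butterflies = 0
    span = 1
    while span < N:                          # one pass per stage
        for start in range(0, N, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))   # twiddle factor
                a = data[start + j]
                b = data[start + j + span] * w
                data[start + j] = a + b      # results overwrite the inputs
                data[start + j + span] = a - b
                butterflies += 1
        span *= 2
    return data, butterflies


X, count = dit_fft([1, 2, 3, 4, 0, 0, 0, 0])
spectrum = [abs(v) ** 2 for v in X]          # squared magnitude of each X(k)
print(count)                                 # 12 butterflies for N = 8
```

For N=8 there are 3 stages of 4 butterflies each, which accounts for the 12 invocations of the butterfly subroutine.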
Clear subroutine is shown in fig. 6.11. Sixteen locations meant for final results are cleared. AR2 is
used as pointer to the locations. Bit reverse subroutine is shown in fig. 6.12. Here, AR1 is used as
pointer to x(n). AR2 is used as pointer to X(k) locations. AR0 is loaded with 8 and used in bit reverse
addressing. Instead of N/2 =4, it is loaded with N=8 because each X(k) requires two locations, one for
real part and the other for imaginary part. Thus, x(n) is stored in alternate locations, which are meant
for real part of X(k). AR3 is used to keep track of number of transfers.
The butterfly subroutine is invoked 12 times. Part of the subroutine is shown in fig. 6.13. The real and
imaginary parts of the A and B input data of the butterfly structure are divided by 4, which
is the scaling factor. Real part of A data which is divided by 2 is stored in temp location. It is used
further in the computation of eq (3) and eq (4) of the butterfly. Division is carried out by shifting the data to
the right by two places. AR5 points to the real part of the A input data, AR2 points to the real part of the B input data
and AR3 points to the real part of the twiddle factor when
the butterfly subroutine is invoked. After all four equations are computed, the pointers
are in the same position as they were when the subroutine was invoked. Thus, the results
are stored such that in-place computation is achieved. Fig. 6.14 through 6.17 show the
butterfly subroutine for the computation of the 4 equations.
Figure 6.18 depicts the part of the main program that invokes butterfly subroutine by supplying
appropriate inputs, A and B to the subroutine. The associated butterfly structure is also shown for
quick reference. Figures 6.19 and 6.20 depict the main program for the computation of 2nd and 3rd
stage of butterfly.
After the computation of X(k), the spectrum is computed using eq (6.8). The pointer AR1
is made to point to X(k). AR2 is made to point to the locations meant for the spectrum. AR3 is loaded with
the loop count and keeps track of the number of computations to be performed. The initialization of
the pointer registers before invoking the spectrum subroutine is shown in fig. 6.21. The
subroutine is shown in fig. 6.22. In the subroutine, the squares of the real and imaginary parts are computed
and added. The result is converted to Q15 notation and stored.
Problems:
1. Derive equations to implement a Butterfly encountered in a DIFFFT implementation.
Solution:
Butterfly structure for DIFFFT:
The input / output relations are
2. How many add/subtract and multiply operations are needed to implement a general butterfly of
DITFFT?
Solution:
Referring to the four equations required to implement the DITFFT butterfly structure, the number of add/subtract
operations is 6 and the number of multiply operations is 4.
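The operation count can be seen by writing the butterfly in real arithmetic. The sketch below assumes the standard DIT relations A' = A + B·W and B' = A - B·W with W = cos θ - j sin θ; the function and variable names are illustrative.

```python
import math


def dit_butterfly(ar, ai, br, bi, theta):
    """DIT butterfly in real arithmetic: 4 multiplies and 6 add/subtract operations."""
    c, s = math.cos(theta), math.sin(theta)
    tr = br * c + bi * s      # real part of B*W: 2 multiplies, 1 add
    ti = bi * c - br * s      # imaginary part of B*W: 2 multiplies, 1 subtract
    # The remaining 4 add/subtract operations form the two complex outputs.
    return (ar + tr, ai + ti, ar - tr, ai - ti)


print(dit_butterfly(0.5, 0.0, 0.5, 0.0, 0.0))  # (1.0, 0.0, 0.0, 0.0)
```

The two products B_R cos θ + B_I sin θ and B_I cos θ - B_R sin θ are computed once and reused for both outputs, which is why only 4 multiplies are needed.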
3. Derive the optimum scaling factor for the DIFFFT Butterfly structure.
Solution: The four equations of Butterfly structure are
Differentiating 4th relation and setting it to zero, (any equation may be considered)
Thus scaling factor is 0.707. To achieve multiplication by right shift, it is chosen as 0.5.
Recommended Questions:
1. Derive the equation to implement a butterfly structure In DITFFT algorithm.
2. How many add/subtract and multiply operations are needed to compute the butterfly structure?
Write the subroutine for bit reversed address generation. Explain the same.
3. Why zero padding is done before computing the DFT?
4. What do you mean by bit-reversed index generation and how is it implemented in
TMS320C54XX DSP assembly language?
5. Write a subroutine program to find the spectrum of the transformed data using TMS320C54XX
DSP.
6. Explain a general DITFFT butterfly in place computation structure.
7. Determine the number of stages and number of butterflies in each stage and the total number of
butterflies needed for the entire computation of 512 point FFT.
8. Explain how the bit reversed index generation can be done in 8 pt FFT. Also write a
TMS320C54xx program for 8 pt DIT-FFT bit reversed index generation.
9. Determine the following for a 128-point FFT computation: (i) number of stages (ii) number of
butterflies in each stage (iii) number of butterflies needed for the entire computation (iv)
number of butterflies that need no twiddle factors (v) number of butterflies that require real
twiddle factors (vi) number of butterflies that require complex twiddle factors.
10. Explain, how scaling prevents overflow conditions in the butterfly computation.
11. Explain, how scaling prevents overflow conditions in the butterfly computation.(June/July
2012, 6m)
12. With the help of the implementation structure, explain the FFT algorithm for DIT-FFT
computation
18. Write a subroutine program to find the spectrum of the transformed data using TMS320C54XX
DSP. (DEC 2012, 6m)
19. With the help of the implementation structure, explain the FFT algorithm for DIT-FFT
computation
20. Determine the following for a 128-point FFT computation: (i) number of stages (ii) number of
butterflies in each stage (iii) number of butterflies needed for the entire computation (iv)
number of butterflies that need no twiddle factors (v) number of butterflies that require real
twiddle factors (vi) number of butterflies that require complex twiddle factors. (MAY-JUNE
11)
Unit 7
Interfacing Memory & Parallel I/O Peripherals
to DSP Devices
Syllabus:INTERFACING MEMORY AND PARALLEL I/O PERIPHERALS TO DSP DEVICES:
Introduction, Memory Space Organization, External Bus Interfacing Signals. Memory Interface,
Parallel I/O Interface, Programmed I/O, Interrupts and I / O Direct Memory Access (DMA).
8 Hours
7.1 Introduction: A typical DSP system has a DSP with external memory, input devices and output
devices. Since the manufacturers of memory and I/O devices are not the same as the manufacturers of
the DSP, and since there is a variety of memory and I/O devices available, the signals generated by
the DSP may not suit the memory and I/O devices to be connected to it. Thus, there is a need for
interfacing devices, whose purpose is to use the DSP signals to generate the appropriate signals for
setting up communication with the memory. A DSP with interface is shown in fig. 7.1.
7.2 Memory Space Organization: The memory space in the TMS320C54xx has 192K words of 16 bits each.
Memory is divided into Program Memory, Data Memory and I/O Space, each of 64K words. The
actual amount and type of memory depend on the particular DSP device of the family. If the memory
available on a DSP is not sufficient for an application, it can be interfaced to an external memory as
depicted in fig. 7.2. On-chip memory is faster than external memory. There are no interfacing
requirements. Because it is on-chip, power consumption is lower and size is small. The DSP exhibits better
performance because of better data flow within the pipeline. The purpose of such memory is to
hold program / code / instructions, to hold constant data such as filter coefficients / filter order, and also to
hold trigonometric tables / kernels of transforms employed in an algorithm. Such memory holds not only
constants; it is also used to hold variable data and intermediate results so that the
processor need not refer to external memory for the purpose.
External memory is off-chip and slower than on-chip memory. External interfacing is required to
establish communication between the memory and the DSP. External memory can provide a large
memory space. Its purpose is to store variable data and to serve as scratch pad memory. Program
memory can be ROM, Dual Access RAM (DARAM), Single Access RAM (SARAM), or a combination of these.
The program memory can be extended externally to 8192K words, that is, 128 pages of 64K words
each. The arrangement of memory and DSP in the case of Single Access RAM (SARAM) and Dual
Access RAM (DARAM) is shown in fig. 7.3. One set of address and data buses is available in the
case of SARAM, and two sets of address and data buses are available in the case of DARAM. With
DARAM, the DSP can thus access two memory locations simultaneously.
There are 3 bits available in the memory mapped register PMST for the purpose of on-chip
memory mapping. The first is the microprocessor / microcomputer mode bit (MP/MC). If this bit is 0,
the on-chip ROM is enabled and addressable, and if this bit is 1, the on-chip ROM is not available.
The bit can be manipulated by software, and at system reset it is set to the value on the
corresponding pin. The second bit is OVLY, which stands for RAM Overlay. It enables on-chip DARAM
data memory blocks to be mapped into program space. If this bit is 0, on-chip RAM is addressable in
data space but not in program space, and if it is 1, on-chip RAM is mapped into both program and
data space. The third bit is DROM. It enables on-chip DARAM 4-7 to be mapped into data space. If
this bit is 0, on-chip DARAM 4-7 is not mapped into data space, and if this bit is 1, on-chip
DARAM 4-7 is mapped into data space. On-chip data memory is partitioned into several regions as
shown in table 7.1. Data memory can be on-chip or off-chip.
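The effect of the three PMST bits described above can be illustrated with a small sketch. The bit
positions used here (MP/MC at bit 6, OVLY at bit 5, DROM at bit 3) and the function name
decode_pmst are assumptions made for illustration and should be verified against the data manual
of the particular device.

```python
# Assumed PMST bit positions; verify against the device data manual.
MP_MC_BIT = 6
OVLY_BIT = 5
DROM_BIT = 3


def decode_pmst(pmst):
    """Return the on-chip memory mapping implied by a 16-bit PMST value."""
    mp_mc = (pmst >> MP_MC_BIT) & 1
    ovly = (pmst >> OVLY_BIT) & 1
    drom = (pmst >> DROM_BIT) & 1
    return {
        # MP/MC = 0: on-chip ROM enabled and addressable
        "on_chip_rom_enabled": mp_mc == 0,
        # OVLY = 1: on-chip RAM mapped into program space as well as data space
        "ram_in_program_space": ovly == 1,
        # DROM = 1: on-chip DARAM 4-7 mapped into data space
        "daram4_7_in_data_space": drom == 1,
    }


if __name__ == "__main__":
    print(decode_pmst(0x0068))
```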
The on-chip memory of TMS320C54xx can serve as both program and data memory. It enhances the speed
of program execution by using parallelism. That is, multiple data access capability is provided for
concurrent memory operations: up to 3 reads and one write can be performed in a single cycle.
External memory can be interfaced to the DSP with a 16- to 23-bit address bus and a 16-bit data bus.
Interfacing signals are generated by the DSP to refer to external memory. The signals required by the
memory are typically Chip Select, Output Enable and Write Enable. As an example of on-chip memory,
TMS320C5416 has 16K ROM, 64K DARAM and 64K SARAM.
Extended external program memory is interfaced with 23 address lines, i.e., 8192K locations. The
external memory thus interfaced is divided into 128 pages, with 64K words per page.
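The figures quoted for the memory space can be checked with a few lines of arithmetic. The sketch
below is only a worked check of the numbers in the text. The helper name address_lines is
hypothetical.

```python
K = 1024  # 1K words


def address_lines(words):
    """Number of address lines needed to address the given number of words."""
    return (words - 1).bit_length()


words_per_space = 64 * K                 # program, data and I/O space are 64K words each
total_words = 3 * words_per_space        # 192K words in all
extended_program_words = 128 * 64 * K    # 128 pages of 64K words = 8192K words

if __name__ == "__main__":
    print(total_words // K, "K words in total")
    print(address_lines(words_per_space), "address lines for one 64K space")
    print(address_lines(extended_program_words), "address lines for extended program memory")
```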
7.3 External Bus Interfacing Signals: In the DSP there are 16 external bus interfacing signals. A
signal may be a single bit, i.e., a single line, or multiple bits, i.e., multiple lines forming a bus. It
can be synchronous or asynchronous with the clock. The signal can be
active low or active high. It can be an output or an input signal. The line or lines carrying the signal
can be unidirectional or bidirectional. The characteristics of the signal depend on
the purpose it serves. The signals available in TMS320C54xx are listed in table 7.2 (a) & table 7.2 (b).
Among the external bus interfacing signals, the address bus and data bus are multi-line buses. The
address bus is unidirectional and carries the address of the location referred to. The data bus is
bidirectional and carries data to
or from the DSP. When the data lines are not in use, they are tri-stated. Data Space Select, Program
Space Select and I/O Space Select are meant for data space, program space or I/O space selection.
These interfacing signals are all active low. They are active during the entire operation of a data
memory, program memory or I/O space reference. The Read/Write signal determines whether the DSP is
reading the external device or writing to it: it is low when the DSP is writing and high when the DSP
is reading. The strobe interfacing signals, Memory Strobe and I/O Strobe, are both active low. They
remain low during the entire read and write operations of memory and I/O respectively. External bus
interfacing signals 1-8 are all unidirectional except the data bus, which is bidirectional. The address
lines are outgoing signals, and all the other control signals are also outgoing signals.
The Data Ready signal is used when a slow device is to be interfaced. Hold Request and Hold
Acknowledge are used in conjunction with a DMA controller. There are two interrupt related signals:
Interrupt Request and Interrupt Acknowledge. Both are active low. Interrupt Request is typically
used for data exchange, for example with an ADC or another processor. TMS320C5416 has 14 hardware
interrupts for the purpose of user interrupts, McBSP, DMA and timer. The External Flag is an active
high, asynchronous, outgoing control signal. It initiates an action or informs the peripheral device
about the completion of a transaction. Branch Control Input is an active low, asynchronous, incoming
control signal. A low on this signal makes the DSP respond or attend to the peripheral device. It
informs the DSP about the completion of a transaction.
7.4 The Memory Interface: The memory is organized as several locations, each of a certain number of
bits. The number of locations decides the address bus width and the memory capacity. The number of
bits per location decides the data bus width and hence the word length. Each location has a unique
address. An application may demand more memory capacity than is available in a single memory IC,
that is, the memory IC has too few words. Or the word length required may be more than that
available in a memory IC, that is, the word length is insufficient. In both cases, more than one
memory IC is required.
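The two cases above (too few words, too short a word length) can be combined into a simple count of
the memory ICs needed. The sketch below is a minimal illustration. The function name chips_needed is
hypothetical, and the 2K x 8 chip in the example is only an illustrative choice.

```python
def chips_needed(required_words, required_bits, chip_words, chip_bits):
    """Return (banks, chips_per_bank, total_chips) for the required memory.

    banks          : groups of chips stacked to obtain enough words
    chips_per_bank : chips placed side by side to obtain the word length
    """
    banks = -(-required_words // chip_words)        # ceiling division
    chips_per_bank = -(-required_bits // chip_bits)
    return banks, chips_per_bank, banks * chips_per_bank


if __name__ == "__main__":
    # 2K words of 16 bits built from 2K x 8 chips: word length is insufficient
    print(chips_needed(2 * 1024, 16, 2 * 1024, 8))
    # 8K words of 16 bits built from 2K x 8 chips: both words and word length are insufficient
    print(chips_needed(8 * 1024, 16, 2 * 1024, 8))
```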
Typical signals in a memory device are as follows. The address bus carries the address of the
referred memory location. The data bus carries data to or from the referred memory location. The
Chip Select signal selects one or more memory ICs among the many memory ICs in the system. Write
Enable enables writing of the data available on the data bus to a memory location. Output Enable
makes the data from a memory location available on the data bus. The address bus is unidirectional
and carries the address into the memory IC. The data bus is bidirectional. The Chip Select, Write
Enable and Output Enable control signals are active high or active low, and they carry signals into
the memory ICs. The task of the memory interface is to use the DSP signals to generate the
appropriate signals for setting up communication with the memory. The logical placement of the
interface is shown in fig. 7.4.
The timing sequence of memory access is shown in fig. 7.5. There are two read operations, both
referring to program memory. For these, the Read signal is high and Program Memory Select is low.
There is one write operation, referring to external data memory. For this, Data Memory Select is low
and the Write signal is low. Both reads and the write are to a memory device, and hence Memory
Strobe is low. Internal program memory reads take one clock cycle, and external data memory
accesses require two clock cycles.
7.5 Parallel I/O Interface: I/O devices are interfaced to the DSP using unconditional I/O mode,
programmed I/O mode or interrupt I/O mode. Unconditional I/O does not require any handshaking
signals. The DSP assumes that the I/O device is ready and transfers the data at its own speed.
Programmed I/O requires handshaking signals. The DSP waits for the I/O readiness signal, which is
one of the handshaking signals. After the
completion of the transaction, the DSP conveys this to the I/O device through another handshaking
signal. Interrupt I/O also requires handshaking signals. The DSP is interrupted by the I/O device,
indicating the readiness
of the I/O. The DSP acknowledges the interrupt and attends to it. Thus, the DSP need not wait for the
I/O to respond. It can engage itself in execution as long as there is no interrupt.
7.6 Programmed I/O Interface: The timing diagram in the case of programmed I/O is shown in fig.
7.6. I/O Strobe and I/O Space Select are issued by the DSP. Two clock cycles each are required for
I/O read and I/O write operations.
An example of interfacing an ADC to the DSP in programmed I/O mode is shown in fig. 7.7. The ADC has
a start of conversion (SOC) signal which initiates the conversion. In programmed I/O mode, the
External Flag signal is issued by the DSP to start the conversion. The ADC issues end of conversion
(EOC) after completing the conversion, and the DSP receives this on its Branch Control Input. The
DSP then issues the address of the ADC and the I/O Strobe, with the Read/Write signal high, to read
the data. An address decoder translates this information into an active low read signal to the ADC.
The data is placed on the data bus by the ADC and the DSP reads it. After reading,
the DSP issues start of conversion once again after the sample interval has elapsed. Note that
there are no address lines on the ADC itself; the decoded address selects the ADC. During
conversion, the DSP waits, checking the Branch Control Input for zero. The flow chart of the
activities in programmed I/O is shown in fig. 7.8.
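The sequence of activities in programmed I/O can be mimicked in software as a rough sketch. The
SimulatedADC class and its method names below are hypothetical stand-ins for the hardware signals
(External Flag as SOC, Branch Control Input as EOC, I/O read through the address decoder). The wait
for the sample interval is omitted for brevity.

```python
class SimulatedADC:
    """Hypothetical ADC model: EOC goes low a fixed number of polls after SOC."""

    def __init__(self, samples, polls_per_conversion=3):
        self._samples = iter(samples)
        self._polls_per_conversion = polls_per_conversion
        self._remaining = 0
        self._data = None

    def start_conversion(self):
        # Models the DSP pulsing the External Flag, which drives SOC.
        self._data = next(self._samples)
        self._remaining = self._polls_per_conversion

    def branch_input_is_low(self):
        # Models the DSP sampling Branch Control Input, driven by EOC (active low).
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def read(self):
        # Models the I/O read: address decode, I/O Strobe, Read/Write high.
        return self._data


def acquire(adc, count):
    """Programmed I/O loop: start conversion, poll for completion, read the data."""
    samples = []
    for _ in range(count):
        adc.start_conversion()
        while not adc.branch_input_is_low():
            pass  # DSP waits, doing no useful work
        samples.append(adc.read())
    return samples


if __name__ == "__main__":
    print(acquire(SimulatedADC([10, 20, 30]), 3))
```

The busy-wait loop makes the cost of programmed I/O visible: the DSP does nothing else while the
conversion is in progress.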
7.7 Interrupt I/O: This mode of interfacing I/O devices also requires handshaking signals. The DSP is
interrupted by the I/O device whenever the device is ready. The DSP acknowledges the interrupt
after testing certain conditions and then attends to it. The DSP need not wait for the I/O to
respond. It can engage itself in execution. There are a variety of interrupts. One classification is
maskable and non-maskable. A maskable interrupt is ignored by the DSP when it is masked. Another
classification is vectored and non-vectored. For a vectored interrupt, the Interrupt Service
Routine (ISR) is at a specific location. A software interrupt is raised by an instruction written in
the program.
In a hardware interrupt, a hardware pin on the DSP IC receives an interrupt from the external
device. A hardware interrupt is also referred to as an external interrupt, and a software interrupt
is referred to as an internal interrupt. An internal interrupt may also be caused by the execution of
certain instructions. In TMS320C54xx there are a total of 30 interrupts: one each of Reset,
Non-maskable, Timer and HPI, 14 software interrupts, 4 external user interrupts, 6 McBSP related
interrupts and 2 DMA related interrupts. The Host Port Interface (HPI) is an 8-bit parallel port. It
can be used to interface to a host processor. Information exchange is through on-chip memory of
the DSP, which is also accessible to the host processor.
The registers used in managing interrupts are the Interrupt Flag Register (IFR) and the Interrupt
Mask Register (IMR). IFR records pending external and internal interrupts. A one in any bit position
implies a pending interrupt. Once an interrupt is received, the corresponding bit is set. IMR is used
to mask or unmask an interrupt. A one implies that the corresponding interrupt is unmasked. Both
these registers are memory mapped registers. One flag, the global enable bit (INTM) in the ST1
register, is used to enable or disable all interrupts globally. If INTM is zero, all unmasked
interrupts are enabled. If it is one, all maskable interrupts are disabled.
When an interrupt is received by the DSP, it checks whether the interrupt is maskable. If the
interrupt is non-maskable, the DSP issues the interrupt acknowledgement and serves the interrupt. If
the interrupt is a hardware interrupt, the global enable bit is set so that no other interrupts are
entertained by the DSP. If the interrupt is maskable, the status of INTM is checked. If INTM is 1,
the DSP does not respond to the interrupt and continues with program execution. If INTM is 0, the
bit in the IMR register corresponding to the interrupt is checked. If that bit is 0, implying that the
interrupt is masked, the DSP does not respond to the interrupt and continues with its program
execution. If the interrupt is unmasked, the DSP issues the interrupt acknowledgement. Before
branching to the interrupt service routine, the DSP saves the PC onto the stack. The PC is reloaded
after the interrupt has been attended to, so as to return to the program that was interrupted. The
response of the DSP to an interrupt is shown in the flow chart in fig. 7.9.
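The decision sequence described above can be written as a short function. This is only a sketch of
the checks in the order given in the text. The function name and its parameters are hypothetical,
and the saving of the PC and the branch to the service routine are not modelled.

```python
def dsp_responds(maskable, intm, imr, bit):
    """Return True if the DSP acknowledges the interrupt, per the checks above.

    maskable : True for a maskable interrupt, False for a non-maskable one
    intm     : global enable bit in ST1 (0 = unmasked interrupts enabled, 1 = disabled)
    imr      : Interrupt Mask Register value (1 in a bit position = unmasked)
    bit      : bit position in IMR corresponding to this interrupt
    """
    if not maskable:
        return True               # non-maskable: always acknowledged
    if intm == 1:
        return False              # maskable interrupts globally disabled
    return (imr >> bit) & 1 == 1  # acknowledged only if unmasked in IMR


if __name__ == "__main__":
    print(dsp_responds(maskable=True, intm=0, imr=0b0100, bit=2))
```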
7.8 Direct Memory Access (DMA) Operation: In any application, there is data transfer
between the DSP and memory and also between the DSP and I/O devices, as shown in fig. 7.10.
However, there may be a need to transfer a large amount of data between two memory regions or
between memory and I/O. The DSP can be involved in such a transfer, as shown in fig. 7.11. Since the
amount of data is large, the transfer will engage the DSP for a long time. The DSP will then not be
utilized for the purpose it is meant for, i.e., data manipulation. The intervention of the DSP has to
be avoided for two reasons: to utilize the DSP for useful signal processing tasks and to increase the
speed of transfer by direct data transfer between memories or between memory and I/O. The direct
data transfer is referred to as direct memory access (DMA). The arrangement expected is shown in
fig. 7.12. The DMA controller carries out the data transfer instead of the DSP.
In DMA, data transfer can be between memory and peripherals, which are either internal
or external devices. The DMA controller manages the DMA operation. Thus the DSP is relieved of the
task of data transfer. Because of direct transfer, the speed of transfer is high. In TMS320C54xx,
there are up to 6 independent programmable DMA channels. Each channel is between a certain source
and destination. Only one channel at a time can be used for
data transfer, and not all six simultaneously. These channels can be prioritized. The speed of
transfer, measured in terms of the number of clock cycles for one DMA transfer, depends on several
factors such as source and destination location, external interface conditions, number of active
DMA channels, wait states and bank switching time. The time for data transfer between two internal
memories is 4 cycles for each word.
Maintaining a channel requires a source address and a destination address, held separately
for each channel. Data transfer is in the form of blocks, with each block having frames of 16
/ 32 bits. Block size, frame size and data are programmable. Along with these, the mode of transfer
and the assignment of priorities to different channels are also to be maintained for the purpose of
data transfer.
There are five channel context registers for each DMA channel. They are the Source
Address Register (DMSRC), Destination Address Register (DMDST), Element Count Register
(DMCTR), Sync Select & Frame Count Register (DMSFC) and Transfer Mode Control Register
(DMMCR). There are also four reload registers. The context registers DMSRC and DMDST hold the
source and destination addresses. DMCTR holds the number of data elements in a frame. DMSFC
specifies the sync event used to trigger a DMA transfer and the word size for the transfer, and
holds the frame count. DMMCR controls the transfer mode by specifying the source and destination
spaces as program memory, data memory or I/O space. The source address reload and destination
address reload registers are used to reload the source address and destination address. Similarly,
the count reload and frame count reload registers are used to reload the count and frame count.
Additional registers for DMA that are common to all channels are the Source Program Page Address
(DMSRCP), the Destination Program Page Address (DMDSTP), the element index address register and
the frame index address register.
The number of memory mapped registers needed for DMA is 6 x (5 + 4) = 54 channel registers plus
some registers common to all channels, amounting to a total of 62 registers. However, only 3 (+1 for
priority related) are available. They are the DMA Priority & Enable Control Register (DMPREC), the
DMA Sub Bank Address Register (DMSA), the DMA Sub Bank Data Register with auto increment (DMSDI)
and the DMA Sub Bank Data Register (DMSDN). To access each of the DMA registers, a register
sub-addressing technique is employed. The schematic of the arrangement is shown in fig. 7.13. The
set of DMA registers of all channels (62) is made available in a set of memory locations called the
sub bank. This avoids the need for 62 memory mapped registers. The contents of either DMSDI or
DMSDN give the code (1s & 0s) to be written to a DMA register, and the contents of DMSA give the
unique sub address of the DMA register to be accessed. A mux routes either DMSDI or DMSDN to the
sub bank location to be written.
DMSDI is used when an automatic increment of the sub address is required after each access. Thus it
can be used to configure the entire set of registers. DMSDN is used when single DMA register access
is required. The following examples bring out clearly the method of accessing the DMA registers and
transfer of data in DMA mode.
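The register sub-addressing technique can be modelled in a few lines as a rough sketch. The class
DmaSubBank, its method names and the sub-address numbering used here are hypothetical. Only the
behaviour of DMSA, DMSDI (auto increment) and DMSDN (no increment) follows the description above.
The wrap-around of the sub address at the end of the bank is a modelling choice, not a statement
about the device.

```python
class DmaSubBank:
    """Hypothetical model of the DMA sub bank reached through DMSA, DMSDI and DMSDN."""

    def __init__(self, size=62):
        self.registers = [0] * size  # the sub-addressed DMA registers
        self.dmsa = 0                # sub bank address register

    def write_dmsa(self, sub_address):
        """Select the DMA register to be accessed next."""
        self.dmsa = sub_address

    def write_dmsdn(self, value):
        """Write the selected register; the sub address is left unchanged."""
        self.registers[self.dmsa] = value & 0xFFFF

    def write_dmsdi(self, value):
        """Write the selected register, then auto-increment the sub address."""
        self.registers[self.dmsa] = value & 0xFFFF
        self.dmsa = (self.dmsa + 1) % len(self.registers)


if __name__ == "__main__":
    bank = DmaSubBank()
    bank.write_dmsa(0)
    for word in (0x1000, 0x2000, 0x0005):   # configure three consecutive registers
        bank.write_dmsdi(word)
    bank.write_dmsdn(0x00FF)                # single register access at sub address 3
    print(bank.registers[:4], bank.dmsa)
```

Using DMSDI, one write to DMSA followed by a run of data writes configures a whole group of
registers. With DMSDN, every access to a different register needs its own write to DMSA.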
Recommended Questions:
1. Explain an interface between an A/D converter and the TMS320C54xx processor in the
programmed I/O mode. (JUNE 12, 10m)
2. Describe DMA with respect to TMS320C54xx processors. (June/July 11, 10m)
3. Draw the timing diagram for memory interface for read-read-write sequence of operation.
Explain the purpose of each signal involved. (June/July 11, 10m)
4. Explain the memory interface block diagram for the TMS320C54xx processor. (Dec 2010)
5. Draw the I/O interface timing diagram for read-write-read sequence of operation. (Dec 2010)
6. What are interrupts? How are interrupts handled by C54xx DSP processors? (Dec 2010, 12)
7. Design a data memory system with address range 000800h-000FFFh for a C5416 processor
using 2K x 8 SRAM memory chips. (MAY-JUNE 10, 6m)
8. What are interrupts? What are the classes of interrupts available in the TMS320C54xx
processor? (JUNE/July 11, 8m)
Unit 8
Interfacing and Applications of DSP Processor
Syllabus: INTERFACING AND APPLICATIONS OF DSP PROCESSOR: Introduction, Synchronous
Serial Interface, A CODEC Interface Circuit, DSP Based Bio-telemetry Receiver, A Speech
Processing System, An Image Processing System.
6 Hours
TEXT BOOK:
Digital Signal Processing, Avatar Singh and S. Srinivasan, Thomson Learning, 2004.
REFERENCE BOOKS:
Digital Signal Processing: A Practical Approach, Ifeachor E. C. and Jervis B. W., Pearson Education, PHI, 2002.
Digital Signal Processors, B. Venkataramani and M. Bhaskar, TMH, 2002.
Architectures for Digital Signal Processing, Peter Pirsch, John Wiley, 2007.
8.1 Introduction: In a parallel peripheral interface, all the bits of a data word are transferred
together. In addition to the parallel peripheral interface, there is a need for interfacing serial
peripherals. The DSP has provision for interfacing serial devices too.
8.2 Synchronous Serial Interface: There are certain I/O devices which handle the transfer
of one bit at a time. Such devices are referred to as serial I/O devices or peripherals. Communication
with serial peripherals can be synchronous, with the processor clock as reference, or it can be
asynchronous. The synchronous serial interface (SSI) gives fast serial communication, whereas the
asynchronous mode of communication is slower. However, in comparison with the parallel peripheral
interface, the SSI is slow. The time taken depends on the number of bits in the data word.
8.3 CODEC Interface Circuit: A CODEC (coder-decoder) is an example of a synchronous serial I/O
device. It has analog input and output, an ADC and a DAC. The SSI signals on the DSP side are DX:
data transmit to CODEC, DR: data receive from CODEC, CLKX: clock reference for transmitting data,
CLKR: clock reference for receiving data, FSX: frame sync signal for transmit, FSR: frame sync
signal for receive (the first bit, during transmission or reception, is in sync with these frame
sync signals), RRDY: indicator for having received all bits of data and XRDY: indicator for having
transmitted all bits of data. Similarly, on the CODEC side, the signals are FS*: frame sync signal,
DIN: data receive from DSP, DOUT: data transmit to DSP and SCLK: clock reference for transmitting
and receiving data. The block diagram
depicting the interface between TMS320C54xx and the CODEC is shown in fig. 8.1. As only one signal
each is available on the CODEC for clock and frame synchronization, the related DSP side signals are
tied together and connected to the clock and frame sync signals on the CODEC. Fig. 8.2 and fig. 8.3
show the timings for receive and transmit in SSI, respectively.
As shown, the receive or transmit activity is initiated at the rising edge of the clock, CLKR
/ CLKX. Reception / transmission starts after FSR / FSX remains high for one clock cycle. RRDY /
XRDY, initially high, makes a LOW to HIGH transition after the completion of data transfer. Each bit
transferred requires one clock cycle. Thus, the time required to transmit / receive a data word
depends on the number of bits in the data word. An example with a data word of 8 bits is shown in
fig. 8.2 and fig. 8.3.
Fig. 8.4 shows the block diagram of the PCM3002 CODEC. The analog front end samples the signal at a
64x over-sampling rate. This eliminates the need for a sample-and-hold circuit and simplifies the
anti-aliasing filter. The ADC is based on a delta-sigma modulator, which converts the analog signal
to digital form. The decimation filter reduces the sampling rate, so that processing does not need
high speed devices. The DAC is also a delta-sigma modulator and converts the digital signal to an
analog signal. Interpolation increases the sampling rate back to the original value. The LPF
smoothens the reconstructed analog signal by removing high frequency components. The serial
interface monitors serial data transfer. It accepts the built-in ADC output, converts it to serial
data and transmits it on DOUT. It also accepts serial data on DIN and gives the
same to the DAC. The serial interface works in synchronization with BCLKIN & LRCIN. The mode
control initializes the serial data transfer. It sets all the desired modes and the number of bits
using the mode control signals MD, MC and ML. MD carries the mode word. MC is the mode clock signal;
the MD to be loaded is sent with reference to this clock. ML is the mode load signal. It defines the
start and end of latching bits into the CODEC device.
Figure 8.5 shows the interfacing of PCM3002 to the DSP in the DSK. The DSP is connected to PCM3002
through McBSP2. The same port can be connected to the HPI. A mux selects one of these two based on
a CPLD signal. The CPLD in the interface also provides the system clock for the DSP and for the
CODEC, and the mode control signals for the CODEC. The CPLD generates the BCLKIN and LRCIN signals
required for the serial interface.
The PCM3002 CODEC handles a data size of 16 / 20 bits. It has a 64x over-sampling, delta-sigma ADC
& DAC. It has two channels, called left and right. The CODEC is programmable for digital
de-emphasis, digital attenuation, soft mute, digital loop back and power-down mode. The system
clock, SYSCLK, of the CODEC can be 256fs, 384fs or 512fs. The internal clock is always 256fs for the
converters and digital filters. DIN and DOUT are single-line data lines that carry the data into
and out of the CODEC. Another signal, BCLKIN, is the data bit clock, the default value of which is
CODEC SYSCLK / 4. LRCIN is the frame sync signal for the left and right channels. The frequency of
this signal is same as the sampling
frequency. The default divide factor can be 2, 4, 6 or 8. Thus, the sampling rate has a minimum of
6 kHz and a maximum of 48 kHz.
Problem P8.1: A PCM3002 is programmed for the 12 kHz sampling rate. Determine the divisor N
that should be written to the CPLD of the DSK and the various clock frequencies for the set up.
Solution: CPLD input clock = 12.288 MHz (known)
Sampling rate fs = CODEC_SYSCLK / 256 = 12 kHz (given)
CPLD output clock, CODEC_SYSCLK = 12.288 x 10^6 / N
Thus, CODEC_SYSCLK = 256 x 12 kHz = 3.072 MHz
and N = 12.288 x 10^6 / (256 x 12 x 10^3) = 4
Problem P8.3: Frame Sync is generated by dividing the 8.192 MHz clock by 256 for the
serial communication. Determine the sampling rate and the time a 16 bit sample takes when
transmitted on the data line.
Solution: LRCIN, Frame Sync = 8.192 x 10^6 / 256 = 32 kHz
Sampling rate fs = frequency of LRCIN = 32 kHz
BCLKIN, bit clock rate = CODEC_SYSCLK / 4 = 8.192 x 10^6 / 4 = 2.048 MHz
Time for a 16 bit sample = 16 / (2.048 x 10^6) = 7.8125 microseconds
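The clock relations used in problems P8.1 and P8.3 can be checked numerically. The sketch below
only repeats the arithmetic of the two solutions. The function names are hypothetical, and the
factors 256 (SYSCLK = 256fs) and 4 (BCLKIN = SYSCLK / 4) are the values stated in the text.

```python
def cpld_divisor(fs_hz, cpld_clock_hz=12_288_000, sysclk_per_fs=256):
    """Divisor N such that CODEC_SYSCLK = cpld_clock / N = sysclk_per_fs * fs."""
    return cpld_clock_hz / (sysclk_per_fs * fs_hz)


def sampling_rate_hz(sysclk_hz, sysclk_per_fs=256):
    """Frame sync (LRCIN) frequency, which equals the sampling rate."""
    return sysclk_hz / sysclk_per_fs


def sample_time_s(bits, sysclk_hz, bclk_divide=4):
    """Time to shift one sample of the given width at the bit clock BCLKIN."""
    bclk_hz = sysclk_hz / bclk_divide
    return bits / bclk_hz


if __name__ == "__main__":
    print("P8.1: N =", cpld_divisor(12_000))
    print("P8.3: fs =", sampling_rate_hz(8_192_000), "Hz")
    print("P8.3: 16-bit sample time =", sample_time_s(16, 8_192_000), "s")
```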
The CODEC PCM3002 supports four data formats as listed in table 8.1. The formats differ in the
number of bits in the data word, in whether the data is right justified or left justified with
respect to LRCIN, and in whether it is in I2S (Inter-IC Sound) format.
Figure 8.6 and fig. 8.7 depict the data transaction for the CODEC PCM3002. As shown in fig. 8.6,
DIN (or DOUT) carries the data. BCLKIN is the reference for the transfer. When LRCIN is high, the
left channel inputs (or outputs) the data, and when LRCIN is low, the right channel inputs (or
outputs) the data. Data bits placed at the end of the LRCIN period are right justified, and data
bits placed at the beginning are left justified.
Another data format handled by PCM3002 is I2S (Inter-IC Sound). It is used for
transferring PCM between the CD transport & the DAC in a CD player. LRCIN is low for the left
channel and high for the right channel in this mode of transfer. During the first BCLKIN, there is
no transmission by the ADC. From the 2nd BCLKIN onwards, there is transmission with MSB first and
LSB last. Left channel data is handled first, followed by right channel data.
8.4 DSP Based Bio-telemetry Receiver: An example of processing an ECG signal is considered. The
scheme involves modulation of the ECG signal by employing Pulse Position Modulation (PPM). At the
receiving end, it is demodulated. This is followed by determination of the Heart beat Rate (HR). A
PPM signal encodes either a single signal or multiple signals. The principle of the modulation is
that the position of a pulse decides the sample value.
The PPM signal with two ECG signals encoded is shown in fig. 8.9. The transmission requires a sync
signal which has 2 pulses of equal interval to mark the beginning of a cycle.
The sync pulses are followed by a time gap based on the amplitude of the sample of the 1st signal to
be transmitted. At the end of this time interval there is another pulse. This is again followed by a
time gap based on the amplitude of the sample of the 2nd signal to be transmitted. After encoding
all the samples, there is a compensation time gap followed by sync pulses to mark the beginning of
the next set of samples. A third signal may be encoded in either of the intervals of the 1st or 2nd
signal. With two signals encoded and the pulse width as tp, the total time duration is 5tp.
A DSP based PPM signal decoder is shown in fig. 8.11. The PPM signal interface generates the
interrupt for the DSP. The DSP entertains the interrupt and starts a timer. When it receives
another interrupt, it stops the timer, and the count is treated as the digital equivalent of the
sample value. The process repeats. A dual DAC converts the two encoded signals into analog signals.
The heart rate is determined from the ECG obtained by decoding.
Heart Rate (HR) is a measure of the time interval between QRS complexes in the ECG signal. The QRS
complex in ECG is an important segment representing the heart beat. There is periodicity in its
appearance, indicating the heart rate. The algorithm is based on the 1st and 2nd order absolute
derivatives of the ECG signal. Since the absolute value of the derivative is taken, the filtering
is nonlinear.
The mean of half of the peak amplitudes is determined and used as the threshold for detection of
the QRS complex. The QRS interval is then the time interval between two such peaks. The time
interval between two peaks is determined using the internal timer of the DSP. The heart rate, in
heart beats per minute, is computed using the relation
HR = Sampling rate x 60 / QRS interval. The signals at various stages are shown in fig. 8.12.
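The heart rate relation can be tried out on a list of detected QRS peak positions. The sketch below
assumes that the QRS interval is expressed as a number of samples, so that the result comes out in
beats per minute. The function name and the example sampling rate of 250 Hz are illustrative
assumptions, and the peak detection itself is not shown.

```python
def heart_rate_bpm(peak_indices, sampling_rate_hz):
    """HR = sampling rate x 60 / QRS interval, with the interval in samples.

    peak_indices : sample positions of successive QRS peaks (already detected)
    """
    intervals = [b - a for a, b in zip(peak_indices, peak_indices[1:])]
    if not intervals:
        raise ValueError("at least two QRS peaks are needed")
    mean_interval = sum(intervals) / len(intervals)
    return sampling_rate_hz * 60 / mean_interval


if __name__ == "__main__":
    # Peaks 200 samples apart at 250 samples per second: 0.8 s per beat
    print(heart_rate_bpm([100, 300, 500], 250))
```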
8.5 A Speech Processing System: The purpose of speech processing is analysis, transmission or
reception as in the case of radio / TV / phone, denoising, compression and so on. There are various
applications of speech processing, which include identification and verification of the speaker,
speech synthesis, voice to text conversion and vice versa, and so on. A speech processing system
has a vocoder, a voice coding / decoding circuit.
A schematic of speech production is shown in fig. 8.13. The vocal tract has the vocal cords at one
end and the mouth at the other end. The shape of the vocal tract depends on the position of the
lips, jaws, tongue and velum. It decides the sound that is produced. There is another tract, the
nasal tract. Movement of the velum connects or disconnects the nasal tract. The overall voice
produced depends on both the vocal tract and the nasal tract.
The two types of speech are voiced sound and unvoiced sound. Voiced sound results when the vocal
tract is excited with quasi-periodic pulses of air pressure caused by vibration of the vocal cords.
Unvoiced sound is produced by forcing air through a constriction formed somewhere in the vocal
tract, creating turbulence that acts as a noise source to excite the vocal tract.
Based on this understanding of the speech production mechanism, a speech production model is shown
in fig. 8.14. The pulse train generator generates a periodic pulse train and thus represents
voiced speech. The noise generator represents unvoiced speech. The vocal tract system is supplied
with either the periodic pulse train or the noise. The final output is the synthesized speech
signal.
A sequence of peaks occurs periodically in voiced speech, and its rate of occurrence is the
fundamental frequency of speech. The fundamental frequency of speech differs from person to
person, and hence the sound of speech differs from person to person. Speech is a non-stationary
signal. However, it can be considered to be
relatively stationary over intervals of 20 ms. The fundamental frequency of speech can be
determined by the autocorrelation method, through the steps that follow.
The speech signal s(t) is low-pass filtered to retain frequencies up to 900 Hz and sampled using an ADC
to get s(n). The sampled signal is processed by dividing it into segments of 30 ms duration, with a
20 ms overlap between successive windows. This is shown in fig. 8.16.
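The segmentation step can be sketched as follows. The sampling rate `fs` is not stated in the text, so it is passed in as a parameter; the function name is illustrative. With 30 ms windows and 20 ms overlap, each new window starts 10 ms after the previous one.

```python
def split_frames(samples, fs, frame_ms=30, overlap_ms=20):
    """Split samples into overlapping frames.

    fs is the sampling rate in Hz (an assumption supplied by the caller).
    Frames that would run past the end of the signal are dropped.
    """
    frame_len = fs * frame_ms // 1000               # samples per frame
    hop = fs * (frame_ms - overlap_ms) // 1000      # shift between frame starts
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames

# Example with an assumed 10 kHz sampling rate: 300-sample frames, 100-sample hop
frames = split_frames(list(range(1000)), fs=10000)
```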
A threshold CL is set for three-level clipping by computing the average of the absolute values of the
first 100 samples and of the last 100 samples of the segment, and taking the minimum of the two
averages. The scheme is shown in fig. 8.17.
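The threshold computation described above can be sketched as below. The function name and the `edge` parameter are illustrative; the code assumes a non-empty segment, and for segments shorter than `edge` samples the two windows simply overlap.

```python
def clipping_level(frame, edge=100):
    """Return the clipping threshold for one segment.

    The threshold is the smaller of the mean absolute value of the first
    `edge` samples and the mean absolute value of the last `edge` samples.
    """
    first = frame[:edge]
    last = frame[-edge:]
    avg_first = sum(abs(v) for v in first) / len(first)
    avg_last = sum(abs(v) for v in last) / len(last)
    return min(avg_first, avg_last)

# Example: loud start, quiet end -> the quieter end decides the threshold
level = clipping_level([2.0] * 100 + [0.0] * 100 + [-1.0] * 100)
```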
The transfer characteristic of the three-level clipping circuit is shown in fig. 8.18. If the sample
value is greater than +CL, the output y(n) of the clipper is set to 1. If the sample value is more
negative than -CL, the output y(n) of the clipper is set to -1. If the sample value is between -CL and
+CL, the output y(n) of the clipper is set to 0.
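The clipper's transfer characteristic maps directly to a few comparisons. The following is a small sketch with an illustrative function name; samples exactly equal to +CL or -CL are treated as lying inside the band and map to 0.

```python
def three_level_clip(frame, cl):
    """Map each sample to +1, -1 or 0 using the threshold cl (cl >= 0)."""
    out = []
    for v in frame:
        if v > cl:
            out.append(1)       # above +CL
        elif v < -cl:
            out.append(-1)      # below -CL
        else:
            out.append(0)       # between -CL and +CL
    return out

y = three_level_clip([0.5, -0.5, 0.1, -0.1, 0.2], cl=0.2)
```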
The autocorrelation of y(n) is computed. Because y(n) takes only the values 0, 1 or -1, each product
term in the autocorrelation sum is also 0, 1 or -1, as defined by eq (1). The largest peak in the
autocorrelation is found and its value is compared to a fixed threshold. If the peak value is below the
threshold, the segment of s(n) is classified as an unvoiced segment. If the peak value is above the
threshold, the segment of s(n) is classified as a voiced segment. The functioning of autocorrelation is
shown in fig. 8.19.
As shown in fig. 8.19, A is a sample sequence y(n). B is a window of N samples, which is compared with
N samples of y(n). At this position there is maximum match. As the window is moved further, say to
position C, the match reduces. When the window is moved further still, say to position D, there is
again maximum match. Thus, the sequence y(n) is periodic. The period of repetition can be measured by
locating the peaks and finding the time gap between them.
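The autocorrelation and the voiced/unvoiced decision can be sketched as follows. Two points are assumptions rather than statements from the text: the threshold is expressed as a fraction of the zero-lag value r[0], and the lag search range (`min_lag` to `max_lag`, with `max_lag` smaller than the segment length) is chosen by the caller.

```python
def autocorrelation(y, max_lag):
    """r[k] = sum over i of y[i] * y[i + k], for k = 0 .. max_lag."""
    n = len(y)
    return [sum(y[i] * y[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def detect_pitch(y, fs, min_lag, max_lag, threshold=0.3):
    """Return the pitch in Hz for a voiced segment, or None if unvoiced.

    y is the clipped sequence, fs the sampling rate in Hz.
    threshold is a fraction of r[0] (an assumed way of fixing the threshold).
    """
    r = autocorrelation(y, max_lag)
    if r[0] == 0:
        return None                         # all-zero segment: nothing to measure
    # Largest autocorrelation peak away from lag 0
    peak_lag = max(range(min_lag, max_lag + 1), key=lambda k: r[k])
    if r[peak_lag] < threshold * r[0]:
        return None                         # peak too weak: unvoiced
    return fs / peak_lag                    # period of peak_lag samples

# Example: a clipped sequence that repeats every 10 samples, sampled at 1 kHz
pitch = detect_pitch(([1] + [0] * 9) * 10, fs=1000, min_lag=5, max_lag=30)
```

The lag of the largest peak is the period in samples, so dividing the sampling rate by it gives the fundamental frequency.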
8.5 An Image Processing System: In comparison with the ECG or speech signals considered so far, an
image has entirely different requirements. It is a two-dimensional signal. It can be a color or a gray
image. A color image requires three matrices to be maintained, one for each of the primary colors red,
green and blue. A gray image requires only one matrix, holding the gray-level information of each pixel
(picture element). An image is a signal with a large amount of data. Of the many processing operations,
such as enhancement and restoration, image compression is an important one because of the large amount
of data in an image.
To reduce the storage requirement, and also to reduce the time and bandwidth required to transmit the
image, it has to be compressed. Data compression by a factor of the order of 50 is sometimes preferred.
JPEG, a standard for image compression, employs a lossy compression technique. It is based on the
discrete cosine transform (DCT). Transform-domain compression separates the image signal into low
frequency components and high frequency components. Low frequency components are retained
because they represent major variations. High frequency components are ignored because they
represent minute variations, to which the human eye is not sensitive.
The image is divided into blocks of 8 x 8 pixels, and the DCT is applied to each block. Low frequency
coefficients have larger values and hence they are retained. The number of high frequency components
retained is decided by the desired quality of the reconstructed image. The forward DCT is given by
eq (2).
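As an illustration of the block transform, the sketch below computes the 8 x 8 forward DCT directly from the standard two-dimensional DCT-II definition commonly used with JPEG. It is assumed, not confirmed by the text, that eq (2) has this form; the function name is illustrative and the direct four-loop evaluation is chosen for clarity, not speed.

```python
import math

N = 8  # block size

def dct_2d(block):
    """Forward 2-D DCT of an N x N block.

    F(u,v) = (2/N) C(u) C(v) * sum_x sum_y f(x,y)
             * cos((2x+1) u pi / 2N) * cos((2y+1) v pi / 2N),
    with C(0) = 1/sqrt(2) and C(k) = 1 otherwise.
    """
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = (2.0 / N) * c(u) * c(v) * s
    return out

# A flat block has no variation, so only the lowest-frequency (DC) term is non-zero
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

For a block with no variation all the energy ends up in the single coefficient F(0,0), which is the extreme case of the low frequency dominance described above.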
Since the coefficient values may vary over a large range, they are quantized. As already noted, low
frequency coefficients are significant and high frequency coefficients are insignificant, so they are
allotted different numbers of bits. Significant coefficients are quantized precisely, with more bits,
and insignificant coefficients are quantized coarsely, with fewer bits. To achieve this, a quantization
table as shown in fig. 8.20 is employed. The contents of the quantization table indicate the step size
for quantization. A smaller entry implies a smaller step size, leading to more bits for the
coefficient, and vice versa.
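The effect of the step size can be sketched as below. The 2 x 2 table and coefficient values are made up for illustration and are not the entries of fig. 8.20; the function names are also illustrative. Each coefficient is divided by its step size and rounded, and the decoder multiplies back.

```python
def quantize(coeffs, table):
    """Divide each coefficient by its step size and round to an integer level.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]

def dequantize(levels, table):
    """Recover approximate coefficients by multiplying levels by the step sizes."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]

# Hypothetical values: small steps for low frequencies, larger steps for high ones
coeffs = [[800.0, 12.0], [-35.0, 3.0]]
table = [[16, 11], [12, 14]]
levels = quantize(coeffs, table)        # [[50, 1], [-3, 0]]
restored = dequantize(levels, table)    # [[800, 11], [-36, 0]]
```

The small coefficient 3.0 quantizes to 0 and is lost on reconstruction, which is where the lossy nature of the scheme comes from.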
The quantized coefficients are coded using Huffman coding, which is a variable-length coding scheme.
Shorter codes are allotted to frequently occurring long sequences of 1s and 0s. Decoding requires the
Huffman table and the dequantization table. The inverse DCT is taken employing eq (3). The data blocks
so obtained are combined to form the complete image. The schematic of encoding and decoding is shown in
fig. 8.21.
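The idea of variable-length coding can be illustrated with a generic Huffman code construction. This is a textbook sketch, not the table-driven procedure defined by the JPEG standard; the function name and the sample data are illustrative.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a Huffman code (symbol -> bit string) from a sequence of symbols."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}       # single symbol still needs one bit
    # Heap entries: (frequency, tie_breaker, {symbol: code_so_far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)      # two least frequent groups
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

# Quantized data is dominated by zeros, so 0 receives the shortest code
codes = huffman_codes([0] * 8 + [1] * 4 + [2] * 2 + [3] + [-1])
```

In this example the most frequent value gets a 1-bit code while the rarest values get 4-bit codes, so the total bit count is lower than with a fixed-length code.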
Recommended Questions:
1. With the help of a block diagram, explain the image compression and reconstruction using
JPEG encoder and decoder.
2. Write a pseudo algorithm to determine heart rate (HR) using the digital signal processor.
3. Explain briefly the building blocks of a PCM3002 CODEC device. What do you understand by
a DSP based biotelemetry receiver?
4. With the help of a block diagram, explain the JPEG algorithm.
5. Explain with a neat diagram the operation of the pitch detector.
6. Explain with a neat diagram, the synchronous serial interface between the C54xx and a
CODEC device. Explain the operation of pulse position modulation (PPM) to encode two
biomedical signals.
7. Explain with a neat block diagram the operation of the pitch detector.