ECT303 M5 Ktunotes - in
12.1 Introduction
(1) Hardware designed for efficient execution of specific DSP algorithms such as digital filters and the fast Fourier transform. This type of special-purpose hardware is sometimes called an algorithm-specific digital signal processor.
In most cases application-specific digital signal processors execute specific algorithms, such as PCM encoding/decoding, but are also required to perform other application-specific operations. Examples of special-purpose DSP processors are Cirrus's processor for digital audio sampling rate converters (CS8420), Mitel's multi-channel telephony voice echo canceller (MT9300), FFT processor (PDSP16515A) and programmable FIR filter (VPDSP16256).
Both general-purpose and special-purpose processors can be designed with single chips or with individual blocks of multipliers, ALUs, memories, and so on.
First, we will discuss the architectural features of digital signal processors that have made real-time DSP in many areas possible.
Figure 12.2 Basic generic hardware architecture for signal processing. (Blocks shown: I/O devices, optional ALU, multiplier, accumulator, product register, program and data memory; X, Y and P data buses.)
achieved by making the individual units operate faster, but there is a limit on how fast they can be made to operate.
If it is to operate in real time, a DSP processor must have its architecture optimized for executing DSP functions. Figure 12.2 shows a generic hardware architecture suitable for real-time DSP. It is characterized by the following:
Multiple bus structure with separate memory space for data and program instructions. Typically the data memories hold input data, intermediate data values and output samples, as well as fixed coefficients for, for example, digital filters or FFTs. The program instructions are stored in the program memory.
The I/O port provides a means of passing data to and from external devices such as the ADC and DAC or for passing digital data to other processors. Direct memory access (DMA), if available, allows for rapid transfer of blocks of data directly to or from data RAM, typically under external control.
Arithmetic units for logical and arithmetic operations, which include an ALU, a hardware multiplier and shifters (or multiplier-accumulator).
single bus structure for both data and instructions, as shown in Figure 12.3.
Suppose that in a standard microprocessor we wish to read a value op1 at address ADR1 in memory into the accumulator and then store it at two other addresses, ADR2 and ADR3. The instructions could be
LDA ADR1    load the operand op1 into the accumulator from ADR1
STA ADR2    store op1 in address ADR2
STA ADR3    store op1 in address ADR3
instruction fetch;
instruction decode;
instruction execute.
In our case, the instruction fetch involves fetching the next instruction from memory, and instruction execute involves either reading or writing data into memory. In a standard processor, without Harvard architecture, the program instructions (that is, the program code) and the data (operands) are held in one memory space; see Figure 12.3. Thus the fetching of the next instruction while the current one is executing is not allowed, because the fetch and execution phases each require memory access.
In a Harvard architecture (Figure 12.4), since the program instructions and data lie in separate memory spaces, the fetching of the next instruction can overlap the execution of the current instruction; see Figure 12.5. Normally, the program memory holds the program code, while the data memory stores variables such as the input data samples.
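The difference between the two memory organizations can be sketched as a cycle count for the LDA/STA/STA program above. This is an illustrative model only: the function name `total_cycles` is hypothetical, and it assumes one clock cycle per phase (fetch, decode, execute) and that the only overlap is the fetch of the next instruction with the execute of the current one.

```python
def total_cycles(n_instructions, separate_memories):
    """Count clock cycles for a simple fetch/decode/execute machine.

    Assumption: each phase takes exactly one clock cycle.
    separate_memories=True models a Harvard machine, where the next
    fetch (program memory) may coincide with the current execute
    (data memory). False models a single shared memory, where the
    next fetch must wait until the current execute has finished.
    """
    next_fetch = 0  # earliest cycle in which the next fetch may start
    finish = 0      # cycle count after the last execute completes
    for _ in range(n_instructions):
        fetch = next_fetch
        decode = fetch + 1
        execute = decode + 1
        finish = execute + 1
        # Harvard: start the next fetch during this execute cycle.
        # Single memory: wait until this execute is over.
        next_fetch = execute if separate_memories else finish
    return finish


# The three-instruction program: LDA ADR1, STA ADR2, STA ADR3
single_memory = total_cycles(3, separate_memories=False)
harvard = total_cycles(3, separate_memories=True)
print(single_memory, harvard)
```

Under these assumptions the shared-memory machine needs three cycles per instruction, while the Harvard machine starts a new instruction every two cycles after the first.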
Downloaded from Ktunotes.in
Figure 12.4 Basic Harvard architecture with separate data and program memory spaces. Data and program instruction fetches can be overlapped as two independent memories are used.
(Timing diagram: clock and the fetch, decode and execute phases of STA ADR2.)
12.2.2 Pipelining
Pipelining is a technique which allows two or more operations to overlap during execution. In pipelining, a task is broken down into a number of distinct subtasks.
(Figure: (a) three successive instructions, each divided into three pipe stages; (b) overlapped instruction fetch, instruction decode and instruction execute for instructions i-1, i, i+1 and i+2, with the (i-2)th instruction executing at the same time.)
In the ideal case, the speed increase is equal to the number of pipe stages. In practice, the speed increase will be less because of the overheads in setting up the pipeline, delays in the pipeline registers, and so on.
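The gap between the ideal and the practical speed increase can be illustrated with a short calculation. This is a sketch under simplifying assumptions: one clock cycle per pipe stage, no stalls, and the only overhead modelled is the time needed to fill the pipeline (register delays are ignored). The function name `pipeline_speedup` is hypothetical.

```python
def pipeline_speedup(n_instructions, n_stages):
    """Speed increase of a pipelined machine over a non-pipelined one.

    Assumptions: every stage takes one clock cycle and the pipeline
    never stalls. The non-pipelined machine needs n_stages cycles per
    instruction. The pipelined machine needs n_stages cycles for the
    first instruction and one further cycle for each later one.
    """
    unpipelined_cycles = n_instructions * n_stages
    pipelined_cycles = n_stages + (n_instructions - 1)
    return unpipelined_cycles / pipelined_cycles


# A short program gains little; a long one approaches the stage count.
for n in (3, 10, 1000):
    print(n, round(pipeline_speedup(n, 3), 3))
```

For a three-stage pipeline the ratio stays below 3 for any finite number of instructions and approaches 3 only as the instruction count grows.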
In the pipeline machine, each instruction still takes three clock cycles, but at each cycle the processor is executing up to three different instructions. Pipelining increases the system throughput, but not the execution time of each instruction on its own. Typically, there is a slight increase in the execution time of each instruction because of the pipeline overhead.
Pipelining has a major impact on the system memory. The number of memory accesses in a pipeline machine increases, essentially by the number of stages. In DSP the use of Harvard architecture, where data and instructions lie in separate memory spaces, promotes pipelining.
When a slow unit, such as a data memory, and an arithmetic element are connected in series, the arithmetic unit often waits idly for a good deal of the time for data. Pipelining may be used in such cases to allow a better utilization of the arithmetic unit. The next example illustrates the concept.
DSP algorithms are often repetitive but highly structured, making them well suited to multilevel pipelining. For example, FFT requires the continuous calculation of butterflies. Although each butterfly requires different data and coefficients, the basic butterfly arithmetic operations are identical. Thus arithmetic units such as FFT processors can be tailored to take advantage of this. Pipelining ensures a steady flow of instructions to the CPU, and in general leads to a significant increase in system throughput. However, on occasions pipelining may cause problems. For example, in some digital signal processors, pipelining may cause an unwanted instruction to be executed, especially near branch instructions, and the designer should be aware of this possibility.
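The point that every butterfly performs the same arithmetic on different data and coefficients can be seen in a small sketch. It assumes a radix-2 decimation-in-time butterfly, which the notes do not specify; the names `butterfly` and `twiddle` are hypothetical.

```python
import cmath


def twiddle(k, n):
    """Coefficient W_n^k = exp(-j*2*pi*k/n) used by an n-point FFT."""
    return cmath.exp(-2j * cmath.pi * k / n)


def butterfly(a, b, w):
    """One radix-2 decimation-in-time butterfly (assumed form).

    The operations are always one complex multiply, one add and one
    subtract; only the inputs a, b and the coefficient w change.
    """
    t = w * b
    return a + t, a - t


# A 2-point DFT of [1, 2] is a single butterfly with W_2^0 = 1.
upper, lower = butterfly(1, 2, twiddle(0, 2))
print(upper, lower)
```

Because the body of `butterfly` never changes, a hardware unit built for these three operations can be kept busy by streaming new data and coefficients through it.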
(Figure: X register and Y register, 16 bits each, feeding the multiplier; 32-bit P register; R register.)
accepts two 16-bit 2's complement fractional numbers and computes a 32-bit product in a single cycle (25 ns typically). The average MAC instruction time can be
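The multiply step described above, two 16-bit 2's complement fractional operands giving a 32-bit product, can be sketched in software. The sketch assumes the common Q15 convention (one sign bit, 15 fractional bits) for the operands, so the product is in Q30 format. The names `to_q15`, `mac` and `fir_output` are hypothetical, and accumulator overflow is not modelled.

```python
def to_q15(x):
    """Convert a value in [-1, 1) to a 16-bit Q15 integer (saturating)."""
    return max(-32768, min(32767, int(round(x * 32768))))


def mac(acc, x, y):
    """Multiply two Q15 operands and add the Q30 product to acc.

    A single Q15 x Q15 product always fits in 32 bits.
    """
    return acc + x * y


def fir_output(samples, coeffs):
    """One filter output as a sum of products, returned as a float."""
    acc = 0
    for x, h in zip(samples, coeffs):
        acc = mac(acc, to_q15(x), to_q15(h))
    return acc / (1 << 30)  # convert Q30 back to a real number


print(fir_output([0.5, 0.25], [0.5, 0.5]))
```

A hardware multiplier-accumulator performs the body of `mac` in one cycle, which is why sum-of-products loops such as filters map well onto it.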
Internal Memory
The C67x DSP has a 32-bit, byte-addressable address space. Internal memory is organized in separate data and program spaces. When off-chip memory is used, these spaces are unified on most devices into a single memory space via the external memory interface (EMIF).
Memory and peripheral options
A variety of memory and peripheral options are available for the C6000 platform.
Large on-chip RAM, up to 7 Mbits.
Program cache holds the frequently accessed program code, with a 32-bit address and 256-bit data width.
Data memory can also be used as data cache. Data sizes of 8, 16 or 32 bits are supported, with a two-level cache.
The 32-bit external memory interface supports SDRAM, SBSRAM, SRAM and other asynchronous memories, covering a broad range of external memory requirements and maximum system performance.
DMA
The DMA (Direct Memory Access) controller transfers data between address ranges in the memory map without intervention by the CPU. There are four programmable channels and one auxiliary channel, which is non-programmable.
The Extended DMA (EDMA) controller performs the same functions as the DMA controller, but offers 16 programmable channels.
HPI
The Host Port Interface (HPI) is a parallel port through which a host processor can directly access the CPU's memory space. It is a dedicated port for communication between the two processors.
Expansion bus
The expansion bus is a replacement for the HPI, as well as an expansion of the EMIF (External Memory Interface). When the EMIF is used, program memory and data memory are accessed together as one memory space.
There are two modes of operation: asynchronous mode and synchronous mode. In asynchronous mode, it acts as a slave. In synchronous mode, both slave and master modes are possible.
Multichannel Buffered Serial Port (McBSP)
McBSP is based on the standard serial port interface found on the DSP
processor. McBSP allows full duplex communication. The McBSP consists of
a data path and a control path that connect to external devices.
Timers
The C6000 devices have two 32-bit general-purpose timers, used for these functions:
Time events.
Count events.
Generate pulses.
Interrupt the CPU.
Send synchronization events to the DMA/EDMA controllers.
Power-down logic
Power-down logic allows reduced power consumption. When the DSP is operating in a power-saving mode, the power-down logic is active.