Unit 1

What is Signal?
• Anything that carries information can be called a signal. It can also be defined as
a physical quantity that varies with time, temperature, pressure or any other
independent variable. Speech and video signals are examples.
What is Signal Processing?
• Signal processing is any operation that changes the characteristics of a signal,
such as its amplitude, shape, phase or frequency.
Why Processing?
• To obtain the signal in a more desirable form.
• To extract the correct information and remove unwanted information.
• To transform the signal from one domain to another.
What is Digital Signal Processing?
• DSP is a technique for performing mathematical operations on signals in the
digital domain.
• Because real-world signals are analog in nature, the analog signal is first
converted to digital, then processed in the digital domain, and finally
converted back to the analog domain.
• Thus an ADC is required at the input side, whereas a DAC is required at the
output end.
• A typical DSP system is shown in the figure.
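The ADC, digital processing and DAC chain described above can be sketched in a few lines of Python. This is only an illustrative model: the 8-bit word length, the full-scale value of 1.0 and the 2-point moving average used as the "processing" step are assumptions chosen for the example, not details from the slides.

```python
import math

def adc(x, bits=8, full_scale=1.0):
    """Quantize an analog value to a signed integer code (assumed 8-bit ADC)."""
    levels = 2 ** (bits - 1)
    code = round(x / full_scale * levels)
    # Clamp to the range a signed word of this length can hold.
    return max(-levels, min(levels - 1, code))

def dac(code, bits=8, full_scale=1.0):
    """Convert a signed integer code back to an analog value."""
    return code / 2 ** (bits - 1) * full_scale

def process(codes):
    """Digital-domain operation: 2-point moving average on integer codes."""
    out = []
    previous = 0
    for code in codes:
        out.append((code + previous) // 2)
        previous = code
    return out

# One period of a sine wave stands in for the analog input.
analog_in = [math.sin(2 * math.pi * k / 16) for k in range(16)]
digital = [adc(x) for x in analog_in]             # ADC at the input side
analog_out = [dac(c) for c in process(digital)]   # DAC at the output end
```

The quantization step of the ADC limits how closely `dac(adc(x))` can match `x`; with 8 bits and a full scale of 1.0 the error is at most half of 1/128.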
Advantages of Digital Signal Processing
• Ease of Processing
• Freedom from Thermal Drift and Better Reliability
• Repeatability
• Immunity to Noise
• Programmability
• Storage
DSP Applications
• Audio – Coding, decoding, surround sound, equalizer
• Communication – Scrambling, Cellular phones, software radios
• Control System – Robotics, Disk drive control, Motor Control
• Medical – Diagnostic equipment, Hearing aids
• Defence – Radar & Sonar, Missile guidance
Introduction to Programmable DSPs
• Programmable Digital Signal Processors (P-DSPs) are designed with
features that are specifically required for digital signal processing
applications.
• Conventional microprocessors are meant for general-purpose
applications and hence do not have these features.
• General-purpose microprocessors can also execute digital signal processing
algorithms successfully, but dedicated DSPs usually have better power
efficiency, so they are more suitable for portable devices such as mobile
phones, where power consumption is constrained.
• DSPs use special memory architectures that can fetch multiple data words or
instructions at the same time.
Multiplier and Multiplier Accumulator (MAC)
• One of the most common operations required in digital signal processing
applications is array multiplication. For example, convolution and
correlation require array multiplication.
• The array multiplication can be done using a single multiplier and adder.
• An important requirement of these array multipliers is that they
process the signals in real time.
• The array multiplication must be completed before the next sample of the
input signal arrives at the input of the array.
• This requires the multiplication as well as the accumulation to be carried
out using hardware elements.
• There are two approaches to solve this problem.
• In the first approach, a dedicated MAC unit is implemented in hardware; it
combines the multiplier and the accumulator in a single hardware unit.
• This approach is adopted by the Motorola DSP5600X processors.
• In the second approach, the multiplier and the accumulator are separate. The
output of the multiplier is stored in the product register, and the content of
the product register can then be added to the accumulator register.
• This approach is adopted by the Texas Instruments TMS320C5X processors.
Fig. C Implementation of Convolver with Single Multiplier/adder
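The role of the multiply-accumulate operation in convolution can be illustrated with a short Python sketch. It mimics the second approach above (a separate product register feeding an accumulator); the function names `mac` and `convolve` are hypothetical and the code models the arithmetic only, not the timing of any particular processor.

```python
def mac(accumulator, a, b):
    """One multiply-accumulate step: the product is formed, then added."""
    product = a * b               # multiplier output, held in the product register
    return accumulator + product  # adder output, written back to the accumulator

def convolve(h, x):
    """Linear convolution of two sequences using one MAC per term."""
    y = []
    for n in range(len(h) + len(x) - 1):
        acc = 0
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                acc = mac(acc, h[k], x[n - k])
        y.append(acc)
    return y

print(convolve([1, 2, 3], [1, 1]))  # prints [1, 3, 5, 3]
```

Each output sample needs as many MAC steps as there are overlapping terms, which is why a MAC that completes in a single clock cycle matters for real-time operation.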
Von Neumann Architecture
• The Von Neumann architecture consists of three distinct components: a central
processing unit (CPU), a memory unit, and input/output (I/O) interfaces.
• Within the CPU, the ALU carries out all arithmetic and logic operations on
data, whereas the control unit determines the order in which the instructions
of a program are executed by issuing control signals to the hardware.
• The registers are temporary storage locations inside the CPU that hold the
operands, results and addresses currently in use, including the addresses of the
instructions to be executed. The memory unit consists of RAM, which is
the main memory used to store program data and instructions. The I/O interfaces
allow the processor to communicate with the outside world, such as users and
storage devices.
• In a Von Neumann architecture, the same memory and bus are used for both
the data and the instructions of a program.
• Since an instruction and a data word cannot be accessed simultaneously, the
Von Neumann architecture is susceptible to bottlenecks that limit system
performance.
Harvard Architecture
• Harvard architecture uses two separate buses for the program and data memory.
• Hence the content of program memory and data memory can be accessed in
parallel.
• The instruction code can be fed from the program memory to the control unit
while the operand is fed to the processing unit from the data memory.
• The processing unit consists of the registers and processing elements such as
the MAC unit, multiplier, ALU and shifter.
• Computers designed with the Harvard architecture are able to run a program and
access data independently, and therefore simultaneously.
• Harvard architecture has a strict separation between data and code. It is
therefore more complicated, but the separate buses remove the
bottleneck that the Von Neumann architecture creates.
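A rough way to see the bottleneck is to count bus cycles for a stream of instructions. The toy model below is an assumption made for illustration: every instruction needs one instruction fetch plus a fixed number of data accesses, each access takes one cycle, and nothing else (caches, pipelining, wait states) is modelled.

```python
def von_neumann_cycles(instructions, data_accesses=1):
    """Shared bus: instruction fetch and data accesses happen one after another."""
    return instructions * (1 + data_accesses)

def harvard_cycles(instructions, data_accesses=1):
    """Separate buses: the instruction fetch overlaps with the data accesses."""
    return instructions * max(1, data_accesses)

n = 100  # e.g. 100 instructions that each read one operand from data memory
print(von_neumann_cycles(n), harvard_cycles(n))  # prints 200 100
```

Under these assumptions the separate program and data buses halve the bus time when each instruction reads one operand; real processors differ in the details.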
Modified Harvard Architecture
• The P-DSPs follow the modified Harvard architecture.
• In this architecture, one set of buses is used to access a memory that holds
both program and data, and another set is used to access a memory that holds
data alone.
• Data can also be transferred from one memory to the other.
• The modified Harvard architecture is used in several P-DSPs, for example those
from Texas Instruments and Analog Devices.
VLIW Architecture
• Another architecture used for P-DSPs is the Very Long Instruction
Word (VLIW) architecture.
Block Diagram of the VLIW architecture
• An instruction of a VLIW processor consists of multiple independent
operations grouped together.
• The VLIW processor architecture has multiple independent functional
units.
• Each operation in the instruction is assigned to a functional unit.
• The functional units share a common multiported
register file for fetching the operands and storing the results.
• A read/write crossbar gives the functional units parallel random
access to the register file.
• The operations in the functional units are executed
simultaneously with the load/store transfers of data between
RAM and the register file.
Pipelining
• One of the approaches adopted for increasing the efficiency of
advanced microprocessors as well as P-DSPs is instruction
pipelining.
• An instruction cycle, which starts with the fetching of an instruction
and ends with its execution, including the time for
storage of the results, can be split into a number of
microinstructions.
• Execution of each microinstruction is also referred to as
one phase of an instruction. The four phases are as follows:
• Fetch phase – in which the instruction is fetched from the
program memory.
• Decode phase – in which the instruction is decoded.
• Memory read phase – in which the operand required for the
execution of the instruction is read from the data memory.
• Execution phase – in which the instruction is executed and
the results are stored in either one of the registers or memory.
• Each of the above microinstructions may be carried out
by a separate functional unit, four in all.
• Assume that each of the four phases takes equal time
to complete. In a conventional microprocessor with
no pipelining, each functional unit is then busy only 25% of the
time.
• This is because only one instruction is processed in the CPU at a
time.
Fig. (a) Instruction cycle of a processor with no pipelining
Fig. (b) Instruction cycle of a processor with pipelining
• The functional units can be kept busy almost all the time by
processing a number of instructions simultaneously in the CPU.
• For example, in a machine with four functional units, four
instructions I1, I2, I3 and I4 can be processed simultaneously as
shown in Fig. (b).
• When I1 enters the decode phase, I2 can enter the fetch
phase.
• When I1 enters the memory read phase, I2 enters the decode
phase and I3 enters the fetch phase.
• When I1 enters the execute phase, I2 enters the memory read
phase, I3 enters the decode phase and I4 enters the fetch
phase.
• The instructions that follow I4 keep the functional units busy until
the program exits.
• Let T denote the time required for each phase of an instruction;
one clock cycle of the processor corresponds to T. In a period of
12T, a machine without pipelining can execute only three
instructions.
• In the same period, nine instructions can be executed with
pipelining, as shown in Fig. (b).
• Hence the throughput is increased by a factor of 3 in this case.
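The counts quoted above (three instructions versus nine in a period of 12T) can be checked with a small Python sketch of a four-phase pipeline. It assumes an ideal pipeline: every phase takes exactly one clock period T, a new instruction enters the fetch phase every cycle, and there are no stalls. The function names are hypothetical.

```python
PHASES = ["fetch", "decode", "memory read", "execute"]

def phase_at(instruction, cycle):
    """Phase occupied by a 0-based instruction in a 0-based clock cycle, or None."""
    index = cycle - instruction  # instruction i enters the fetch phase in cycle i
    return PHASES[index] if 0 <= index < len(PHASES) else None

def completed_unpipelined(cycles):
    """Each instruction occupies the CPU for all four phases before the next starts."""
    return cycles // len(PHASES)

def completed_pipelined(cycles):
    """The first instruction finishes after four cycles, then one finishes per cycle."""
    return max(0, cycles - len(PHASES) + 1)

print(completed_unpipelined(12), completed_pipelined(12))  # prints 3 9
print([phase_at(i, 3) for i in range(4)])
# prints ['execute', 'memory read', 'decode', 'fetch'] for I1, I2, I3, I4
```

The second line of output corresponds to the moment when I1 is in the execute phase and all four functional units are busy.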
Pipelining Speedup
$$\text{Speedup} = \frac{n\,T_1}{(K + n - 1)\,T_K}$$
where
n = number of tasks
$T_1$ = time taken per task by the non-pipelined system
K = number of pipeline segments
$T_K$ = time taken per segment by the pipelined system
• A non-pipelined system takes 100 ns to process a task. The same
task can be performed in a 5-segment pipeline with 10 ns per segment.
Determine the speedup ratio of the pipeline for 1000 tasks.
• Solution –
Given: n = 1000 tasks, K = 5, $T_1$ = 100 ns, $T_K$ = 10 ns
$$\text{Speedup} = \frac{1000 \times 100}{(5 + 1000 - 1) \times 10} = \frac{100000}{10040} \approx 9.9602$$
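The speedup expression is easy to evaluate for other task counts with a short Python function. The function name `pipeline_speedup` is hypothetical; the parameters follow the symbols n, K, T1 and TK defined above, and the second call reuses the same 5-segment pipeline with 100 tasks purely as an illustration.

```python
def pipeline_speedup(n, k, t1, tk):
    """Speedup = n*T1 / ((K + n - 1) * TK) for n tasks on a K-segment pipeline."""
    return (n * t1) / ((k + n - 1) * tk)

for tasks in (1000, 100):
    print(tasks, round(pipeline_speedup(tasks, k=5, t1=100, tk=10), 3))

# As n grows, the speedup approaches T1 / TK (10 for these values).
```

The speedup stays below T1/TK because the first task still needs K segment times to pass through the pipeline.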
• Exercise: determine the speedup ratio of the pipeline for 1000 tasks.
• Exercise: determine the speedup ratio of the pipeline for 100 tasks.
Pipeline Depth
• The number of instructions that are processed simultaneously in
the CPU, also referred to as the depth of the instruction pipeline,
differs between families of P-DSPs.
• The pipeline depths of some P-DSPs are given in the table
below.
Special Addressing Modes in P-DSPs
• In addition to the addressing modes such as direct, indirect and
immediate supported by conventional microprocessors,
P-DSPs have special addressing modes that permit a single-word
instruction format and thereby speed up execution by
making effective use of the instruction pipeline.
• Further, there are special addressing modes such as circular (cyclic)
addressing and bit reversed addressing that are specifically
tailored for DSP applications.
• The details of these addressing modes are as follows.
Short Immediate Addressing –
• This type of addressing permits the operand to be specified using
a short constant that forms part of a single-word instruction.
• The length of the short constant depends on the instruction type
and the P-DSP.
• For example, in the case of the TI TMS320C5X, an 8-bit constant can be
specified as one of the operands in the single-word instructions for
addition, subtraction, AND, OR, XOR, etc.
Short Direct Addressing –
• This type of addressing permits the lower order address of the
operand of an instruction to be specified in the single-word
instruction.
• In the TI TMS320 DSPs, the higher order 9 bits of the memory address are
stored in the data page pointer and only the lower 7 bits are
specified as part of the instruction.
• Each contiguous block of 128 words is referred to as one page in
the TI DSPs. The argument in the instruction specifies only the
location within the current page.
• In the Motorola DSP5600X, short direct memory addressing
permits a 6-bit address to be specified in the instruction.
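The way the data page pointer and the 7-bit field combine into a full address in the TI scheme can be sketched as follows. The function name `ti_short_direct` is hypothetical, and the code only models the address arithmetic described above (9 page bits followed by 7 offset bits, giving a 16-bit address).

```python
OFFSET_BITS = 7   # lower 7 bits come from the instruction word
PAGE_BITS = 9     # upper 9 bits come from the data page pointer

def ti_short_direct(data_page, offset):
    """Effective address = data page pointer (9 bits) followed by the 7-bit offset."""
    if not 0 <= data_page < (1 << PAGE_BITS):
        raise ValueError("data page pointer must fit in 9 bits")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset must fit in 7 bits")
    return (data_page << OFFSET_BITS) | offset

print(ti_short_direct(4, 0x10))   # prints 528, i.e. word 16 of page 4 (4*128 + 16)
```

Because each page holds 128 words, changing the data page pointer by one moves the addressed window by 128 locations.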
Memory-mapped Addressing –
• The CPU registers and the I/O registers of the P-DSPs are also
accessible as memory locations. This is achieved by mapping them to
either the starting page or the final page of the memory space.
• For example, in the TMS320C5X, page 0 corresponds to the CPU
registers and I/O registers.
• In the case of the Motorola DSP5600X, the last page of the memory
space, containing 64 locations, is used as the memory map for the
CPU and I/O registers.
• When these registers are accessed using memory-mapped
addressing modes, the higher address bits are not taken from the
data page pointer; instead they are forced to 0 in the case of TI DSPs
and to 1 in Motorola DSPs.
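Forcing the upper address bits to 0 or to 1 can be pictured with a small Python sketch. The 16-bit address width and both function names are assumptions made for this example; the 7-bit and 6-bit field widths follow the page sizes of 128 and 64 locations mentioned above.

```python
ADDRESS_BITS = 16  # assumed address width for this illustration

def ti_memory_mapped(offset):
    """TI style: upper bits forced to 0, so the access always lands in page 0."""
    return offset & 0x7F

def motorola_memory_mapped(short_address):
    """Motorola style: upper bits forced to 1, so the access lands in the last 64 locations."""
    upper_ones = ((1 << ADDRESS_BITS) - 1) & ~0x3F
    return upper_ones | (short_address & 0x3F)

print(hex(ti_memory_mapped(0x25)))        # prints 0x25
print(hex(motorola_memory_mapped(0x00)))  # prints 0xffc0
print(hex(motorola_memory_mapped(0x3F)))  # prints 0xffff
```

In both cases the current value of the data page pointer plays no part in forming the address.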
Indirect Addressing –
• In P-DSPs this addressing mode has a number of options that
permit an array of data to be fetched and stored efficiently.
• The address of the operand can be stored in one of the registers
called indirect address registers.
• In the case of TI processors, the indirect address registers are
called auxiliary registers (ARs).
• The ARs may be incremented or decremented either in steps of 1
or in steps specified by the content of an offset register.
• In the case of TI processors, the offset register is called the INDX
register.
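Stepping through an array with an auxiliary register can be modelled in Python as below. The class `AuxRegister` and its method are hypothetical names for illustration; the step argument stands in for either the fixed step of 1 or the content of the offset (INDX) register.

```python
class AuxRegister:
    """Model of an indirect address register with post-modification."""

    def __init__(self, address):
        self.address = address

    def read_post_modify(self, memory, step=1):
        """Read the operand the register points to, then adjust the register by step."""
        value = memory[self.address]
        self.address += step
        return value

memory = list(range(10, 20))   # ten data words: 10, 11, ..., 19
ar = AuxRegister(0)
indx = 2                       # offset register content: visit every second word
print([ar.read_post_modify(memory, indx) for _ in range(3)])  # prints [10, 12, 14]
```

Because the address update happens as part of the access, no separate instruction is needed to advance through the array.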
Bit Reversed Addressing Mode –
• The bit reversed number representation is explained in the above
table.
• The binary pattern corresponding to a particular decimal number
is obtained by writing the natural binary equivalent of the number
in the reverse order, so that the most significant bit of the natural
binary number becomes the least significant bit of the bit reversed
number and vice versa.
• For the computation of the FFT, the data is to be arranged in the
bit reversed order and the 2-point DFT of the resulting sequence is to
be computed first.
• For example, when a 16-point FFT is to be computed, the 2-point DFT
of X(0) and X(8) is found first, followed by the 2-point DFT of X(4)
and X(12), and so on. It may be noted from the table that the values
0, 8, 4, 12 correspond to consecutive numbers in the bit reversed
number representation.
• In the bit reversed addressing mode, the address is
incremented/decremented by the number represented in the bit
reversed form.
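The bit reversed order for a 16-point FFT can be generated with a few lines of Python. The helper `reverse_carry_add` is an assumption about how such an address update is commonly realised (an addition whose carry propagates from the most significant bit towards the least significant bit); both function names are hypothetical.

```python
def bit_reverse(value, bits):
    """Reverse the order of the lowest `bits` bits of value."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def reverse_carry_add(address, step, bits):
    """Add step to address with the carry running from MSB to LSB."""
    total = (bit_reverse(address, bits) + bit_reverse(step, bits)) % (1 << bits)
    return bit_reverse(total, bits)

print([bit_reverse(i, 4) for i in range(16)])
# prints [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]

address, visited = 0, []
for _ in range(4):
    visited.append(address)
    address = reverse_carry_add(address, 8, 4)   # step of N/2 = 8 for N = 16
print(visited)  # prints [0, 8, 4, 12]
```

Repeatedly stepping by N/2 in this way visits X(0), X(8), X(4), X(12) and so on, without any explicit reordering of the data in memory.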
Circular Addressing Mode–
• In real time processing of signals, the input signal is continuously
stored in the memory.
• The processed data is stored continuously in another memory space
and may be written to the output device. In this
case the input and output programs are simple.
• However, since the input as well as output memory space will be
finite in size, the entire memory space would be exhausted after
processing the input signal for some time, if the data is written
into the memory by using linear addressing mode.
• One way to overcome this problem is to keep checking whether
the range of either the input or the output memory space is
exceeded.
• In that case, the new data is to be stored starting from the
beginning of the particular memory space. However, checking this
condition is an overhead that can be overcome using the circular
addressing mode.
• In this mode, the memory can be organised as a circular buffer
with the beginning memory address and the ending memory
address corresponding to this buffer defined by the programmer.
• In the circular addressing mode, when the address pointer is
incremented, the new address is compared with the ending memory
address of the circular buffer. If it exceeds that address, the pointer is
set to the beginning address of the circular buffer.
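The wrap-around behaviour can be modelled with a short Python class. `CircularPointer` is a hypothetical name; the start and end addresses play the role of the programmer-defined buffer limits, and the comparison that software would otherwise have to perform is folded into the pointer update.

```python
class CircularPointer:
    """Address pointer that wraps from the end of a buffer back to its start."""

    def __init__(self, start, end):
        self.start = start
        self.end = end          # last valid address of the circular buffer
        self.address = start

    def post_increment(self):
        """Return the current address, then advance with wrap-around."""
        current = self.address
        self.address += 1
        if self.address > self.end:
            self.address = self.start
        return current

pointer = CircularPointer(start=100, end=103)
print([pointer.post_increment() for _ in range(6)])
# prints [100, 101, 102, 103, 100, 101]
```

On a P-DSP this check is done by the address generation hardware, so it costs no extra instructions in the sample-processing loop.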
MULTIPLE ACCESS MEMORY
• The number of memory accesses per clock period can also be
increased by using a high-speed memory that permits more than
one memory access per clock period.
• For example, dual-access RAM (DARAM) permits two
memory accesses per clock period.
• Multiple-access RAM may be connected to the processing unit of
the P-DSP by using the Harvard architecture.
• For example, DARAM connected to a P-DSP with two independent
data and address buses can be used to achieve four memory
accesses per clock period.
MULTIPORTED MEMORY
• Another technique that is adopted for increasing the number of
accesses/clock period is to use multiported memory.
• For example, a dual-port memory has two independent data and
address buses, as shown in the figure; hence two memory accesses can be
achieved in a clock period.
• Multiported memories dispense with the need for storing the program
and data in two different memory chips in order to permit
simultaneous access to both program and data memory.
• However, a major limitation of dual-ported memory is its higher
cost compared with two single-port memories of the same
total capacity. This is because of the increased number of pins and
the larger chip area required for the dual-ported memory.
• A larger number of I/O pins requires a larger and more expensive package
and a larger die.
• Some P-DSPs combine the modified Harvard architecture with
dual-ported memories.
• For example, the Motorola DSP561XX processors have a single-ported
program memory and a dual-ported data memory. Hence one program
memory access and two data memory accesses can be achieved per
clock period.
On-chip Peripherals
• The P-DSPs have a number of on-chip peripherals that relieve the
CPU from routine functions.
• Some of the on-chip peripherals in the P-DSPs and their functions
are as follows.
1. On-chip Timer –
• Two of the common applications of the timers are generation of
periodic interrupts to the P-DSPs and generation of the sampling
clocks for the A/D converters.
• The timers can generate a single pulse or a periodic train of pulses.
They can also generate a single square wave or a periodic square
wave.
• The period of the timer is also made programmable.
2. Serial Port –
• This enables data communication between the P-DSP and an
external peripheral such as an A/D converter, a D/A converter or an
RS-232C device.
• These ports normally have input and output buffers, so that the
P-DSP writes to or reads from the serial port in parallel form while the
serial port sends and receives data to and from the peripherals in serial
form.
• They also generate interrupts when the serial port output buffer is
empty or the input buffer is full. These devices have parallel-to-serial
and serial-to-parallel converters built into them.
• The serial ports can operate either in the asynchronous mode or in
the synchronous mode.
• In the asynchronous mode, the transmit data and receive data
lines alone are used for communication, and no bit clock is
transmitted from either end.
• In the synchronous mode, a bit clock and a frame sync
signal, which indicates the beginning of the first bit of the data,
are transmitted along with the data, both from the
serial port to the I/O device and from the I/O device to the serial
port.
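The parallel-to-serial and serial-to-parallel conversion performed inside the serial port can be sketched in Python. The 8-bit word length and the most-significant-bit-first order are assumptions made for this example, and the function names are hypothetical; the bit clock and frame sync signals are not modelled.

```python
def parallel_to_serial(word, bits=8):
    """Shift a parallel word out one bit at a time, MSB first (assumed order)."""
    return [(word >> position) & 1 for position in range(bits - 1, -1, -1)]

def serial_to_parallel(bit_stream):
    """Shift received bits into a word, first bit received becomes the MSB."""
    word = 0
    for bit in bit_stream:
        word = (word << 1) | bit
    return word

stream = parallel_to_serial(0xA5)
print(stream)                           # prints [1, 0, 1, 0, 0, 1, 0, 1]
print(hex(serial_to_parallel(stream)))  # prints 0xa5
```

The P-DSP only ever sees the parallel word in the buffer; the bit-by-bit transfer is handled by the port.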
3. Parallel Port –
• Parallel ports enable communication between the P-DSP and other
devices to be faster compared to the serial communication by
using a number of lines in parallel.
• The P-DSPs have two approaches for assigning lines for parallel
port.
• In one approach, used by TI, the data bus itself is used for
parallel ports. This is achieved by allocating a specific address
space for I/O; whenever this address space is addressed using
the I/O instructions, the parallel port signals, including the
handshaking signals, are sent over the data bus.
• In another approach, separate lines are dedicated for parallel
ports including the handshaking signals.
