
DSP Processors Engr. Naveed Khan Baloch


DSP Processors

Engr. Naveed Khan Baloch

Digital Signal Processing


Processing of digitally represented signals
Signals are represented digitally as a sequence of samples
Digital signals are obtained from physical signals via transducers and analog-to-digital converters (ADCs)

Digital Signal Processor


Electronic system that processes digital signals

Definition
A digital signal processor (DSP) is a specialized microprocessor with an architecture optimized for the fast operational needs of digital signal processing.

DSP Applications
Audio
Coding, Decoding, Surround-sound

Communication
Scrambling, Cellular phones, software radios

Control
Robotics, Disk drive control, motor control

Medical
Diagnostics equipment, hearing aids

Defense
Radar and sonar processing, missile guidance

Why DSP Processors ?


Reprogrammable
Cost effective
Fast computation
Energy Efficiency
Fast Multipliers
Multiple Execution Units
Efficient Memory Accesses
Circular Buffering
Data Format
Zero-Overhead Looping
Streamlined I/O
Specialized Instruction Sets
SIMD

Reprogrammable

Cost Effective

There is no need for a separate signal processing unit
Signal processing and control functions can be performed on a single silicon chip

Faster computation
Because of specialized hardware for DSP applications, computation becomes very fast
Separate MAC units and fast multipliers speed up the execution of many DSP algorithms, e.g.
FIR Filter
IIR Filter
DCT
FFT

Fast Multipliers
Originally, microprocessors implemented multiplications by a series of shift and add operations, each of which consumed one or more clock cycles.
Most DSP processors take only one clock cycle for the multiplication operation.
Modern DSP processors include at least one dedicated single-cycle multiplier or combined multiply-accumulate (MAC) unit.
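To see why a single-cycle MAC matters, here is a minimal Python sketch of a direct-form FIR filter written as an explicit multiply-accumulate loop. The function name `fir_mac` and the test signal are illustrative assumptions, not part of the slides; each pass through the inner loop corresponds to one MAC operation.

```python
def fir_mac(samples, coeffs):
    """Direct-form FIR filter written as an explicit multiply-accumulate loop."""
    taps = len(coeffs)
    history = [0] * taps              # delay line, most recent sample first
    output = []
    for x in samples:
        history = [x] + history[:-1]  # shift the new sample into the delay line
        acc = 0                       # accumulator, cleared for each output sample
        for h, s in zip(coeffs, history):
            acc += h * s              # one multiply-accumulate (MAC) per tap
        output.append(acc)
    return output


# Three-tap moving sum of a short ramp.
print(fir_mac([1, 2, 3, 4], [1, 1, 1]))  # [1, 3, 6, 9]
```

An N-tap filter needs N MACs per output sample, so the cost of one MAC largely sets the achievable sample rate.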

Multiple Execution Units


DSP applications typically have very high computational requirements in comparison to other types of computing tasks, since they often must execute DSP algorithms (such as FIR filtering) in real time on lengthy segments of signals sampled at 10-100 kHz or higher.
Hence, DSP processors often include several independent execution units that are capable of operating in parallel; for example, in addition to the MAC unit, they typically contain an arithmetic-logic unit (ALU) and a shifter.

Efficient Memory Accesses

A small bank of RAM near the processor core is used as an instruction cache


Many DSP processors also support circular addressing, which
allows the processor to access a block of data sequentially and
then automatically wrap around to the beginning address


Circular Buffering
The process by which the Data Address Generator (DAG) wraps around or repeatedly steps through a range of registers.
Instructions accommodate three elements:
Buffer Address
Buffer Size
Increment


Data Format
DSP processors are either fixed-point or floating-point.
An accumulator wider than the data word is used to reduce overflow.
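The role of the accumulator can be sketched with Q15 fixed-point arithmetic (16-bit values representing fractions in [-1, 1)). This is an illustrative model only: the helper names are assumptions, and Python's unbounded integers stand in for a wide hardware accumulator. The second function saturates to 16 bits after every addition to show what is lost without the extra headroom.

```python
Q15_ONE = 1 << 15


def saturate16(value):
    """Clamp an integer to the signed 16-bit range."""
    return max(-32768, min(32767, value))


def to_q15(x):
    """Convert a float in [-1, 1) to a Q15 integer, saturating at the limits."""
    return saturate16(int(round(x * Q15_ONE)))


def dot_q15_wide(a, b):
    """Dot product with a wide accumulator; saturate once at the very end."""
    acc = 0                              # models a wide accumulator
    for x, y in zip(a, b):
        acc += x * y                     # Q15 * Q15 gives a Q30 product
    return saturate16(acc >> 15)         # scale back to Q15


def dot_q15_narrow(a, b):
    """Same dot product, but the running sum is held in only 16 bits."""
    acc = 0
    for x, y in zip(a, b):
        acc = saturate16(acc + ((x * y) >> 15))
    return acc


a = [to_q15(v) for v in (0.9, 0.9, -0.9)]
b = [to_q15(v) for v in (0.9, 0.9, 0.9)]
print(dot_q15_wide(a, b) / Q15_ONE)      # close to the exact answer 0.81
print(dot_q15_narrow(a, b) / Q15_ONE)    # far off: the intermediate sum saturated
```

The exact result (0.81) fits in Q15, but the intermediate sum (1.62) does not, so only the wide accumulator recovers it.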


Assignment # 2
Highlight the difference between the architecture of fixed-point and floating-point DSP processors with at least 2 examples from TI and Blackfin processors.
www.TI.com
www.analog.com


Zero-Overhead Looping
A special loop or repeat instruction is provided, which allows
the programmer to implement a for-next loop without
expending any clock cycles on updating and testing the
loop counter or branching back to the top of the loop. This
feature is often referred to as zero-overhead looping.
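The saving can be put in rough numbers with a simple cycle-count model. The figures below (3 cycles of counter update, test, and branch per iteration; 2 cycles to set up a hardware loop) are made-up values for illustration only and do not describe any particular processor.

```python
def loop_cycles(iterations, body_cycles, overhead_per_iteration, setup_cycles=0):
    """Total cycles for a loop under a simple additive cost model."""
    return setup_cycles + iterations * (body_cycles + overhead_per_iteration)


# Hypothetical costs, chosen only to illustrate the idea.
software_loop = loop_cycles(256, body_cycles=1, overhead_per_iteration=3)
hardware_loop = loop_cycles(256, body_cycles=1, overhead_per_iteration=0,
                            setup_cycles=2)

print(software_loop)  # 1024
print(hardware_loop)  # 258
```

The shorter the loop body, the larger the share of time taken by loop overhead, which is why this feature matters most for tight inner loops such as a single-instruction MAC loop.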


Streamlined I/O
To allow low-cost, high-performance input and output,
most DSP processors incorporate one or more specialized
serial or parallel I/O interfaces, and streamlined I/O
handling mechanisms, such as low-overhead interrupts
and direct memory access (DMA), to allow data transfers
to proceed with little or no intervention from the
processor's computational units.


Specialized Instruction Sets


DSP processor instruction sets have traditionally been designed with two goals in mind:
Make maximum use of the processor's underlying hardware
Minimize the amount of memory space required to store DSP programs

Highly specialized
Complicated
Irregular
Use assembly instead of C for maximum benefit

SIMD
SIMD, or single-instruction, multiple-data, is not a class of architecture itself, but is instead an architectural technique that can be used within any of the classes of architectures.
Improves performance on some algorithms by allowing the processor to execute multiple instances of the same operation in parallel on different data.
For example, a SIMD multiplication instruction could
perform two or more multiplications on different sets of
input operands in parallel in a single clock cycle.
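A minimal Python model of the idea, assuming two unsigned 16-bit lanes packed into one 32-bit word. The function names are hypothetical; the software below processes the lanes one after another, whereas SIMD hardware would handle both lanes with one instruction.

```python
MASK16 = 0xFFFF


def pack16(high, low):
    """Pack two 16-bit values into one 32-bit word."""
    return ((high & MASK16) << 16) | (low & MASK16)


def unpack16(word):
    """Split a 32-bit word into its (high, low) 16-bit lanes."""
    return (word >> 16) & MASK16, word & MASK16


def simd_mul16(a, b):
    """Multiply corresponding lanes; returns the two full-width products."""
    a_high, a_low = unpack16(a)
    b_high, b_low = unpack16(b)
    return a_high * b_high, a_low * b_low


def simd_add16(a, b):
    """Add corresponding lanes; each lane wraps modulo 2**16 on its own."""
    a_high, a_low = unpack16(a)
    b_high, b_low = unpack16(b)
    return pack16(a_high + b_high, a_low + b_low)


print(simd_mul16(pack16(3, 4), pack16(5, 6)))            # (15, 24)
print(hex(simd_add16(pack16(1, 0xFFFF), pack16(2, 1))))  # 0x30000
```

In the addition example the low lane overflows and wraps to zero without carrying into the high lane, which is what distinguishes a packed 16-bit add from an ordinary 32-bit add.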

Outline
Blackfin Family Overview
The Blackfin Core
Arithmetic operations
Data fetching
Sequencing
The Blackfin Bus Architecture and Memory
Modified Harvard architecture
Hierarchical memory structure
Flexible memory management
Additional Blackfin Core Features
DMA
Dynamic power management
On-chip debug support

Blackfin Family Overview


The Blackfin family consists of:
A broad range of Blackfin processors
Software development tools
Hardware evaluation and debug tools
Extensive third-party support
Development tools
Operating systems
TCP/IP stacks
Hardware building blocks
Software solutions


Blackfin Processors
All Blackfin processors combine extensive DSP capability with
high end MCU functions on the same core.
Creates a highly efficient and cost-effective solution.
A single software development tool chain
All Blackfin processors are based on the same core architecture.
Once you understand one Blackfin processor, you can easily migrate
from one family member to another.
Code compatible across family members.
Processors vary in clock speed, amount of on-chip memory,
peripheral suite, package types and sizes, power, and price.
Large selection lets you optimize your choice of a Blackfin processor for your application.


Blackfin Family Peripherals


The Blackfin family supports a wide variety of I/O:
EBIU (External Bus Interface Unit)
Parallel peripheral interface (PPI)
Serial ports (SPORTS)
GPIO
Timers
UARTS
SPI
Ethernet
USB
CAN
Two Wire Interface (TWI)
Pixel compositor
Lockbox secure technology
Host DMA
ATAPI
SDIO

Blackfin Processors Perform


Signal Processing and Microcontroller Functions


Blackfin Architecture
What does it mean for the developer?
Combining controller and DSP capabilities into a single core, along with rich I/O, enables development of efficient, low cost embedded media applications.
For example, multimedia over IP, digital cameras, telematics, software radio
From a development perspective, a single core means there is only one tool chain.
An embedded application consisting of both control and signal processing modules is built using the same compiler.
The result is dense control code and high performance DSP code.


Features
Controller
L1 memory space for stack and heap
Dedicated stack and frame pointers
Byte addressability
Simple bit-level manipulation
DSP
Fast, flexible arithmetic computational units
Unconstrained data flow to/from computational units
Extended precision and dynamic range
Efficient sequencing
Efficient I/O processing
The DSP aspect of the Blackfin core is optimized to perform FFTs and convolutions


Blackfin Core (e.g., ADSP-BF54x)


The Blackfin Core
The core consists of:

Arithmetic unit
Supports SIMD operation
Load/store architecture

Addressing unit
Supports dual data fetch

Sequencer
Efficient program flow control

Register files
Data
Addressing


The Arithmetic Unit


Performs arithmetic operations
Dual 40-bit ALU (Arithmetic/Logic Unit)
Performs 16-/32-/40-bit arithmetic and logical operations

Dual 16 x 16 multiplier
Performs dual MACs (multiply-accumulates) when used with ALUs

Barrel shifter
Performs shifts, rotates, bit operations


Data Registers
There are 8x 32-bit registers in the data register file.
Used to hold 32-bit values or packed 16-bit values

There are also 2x 40-bit accumulators.


Typically used for MAC operations


16-Bit ALU Operations: Examples

The algebraic assembly syntax is intuitive and makes it easy to understand what the instruction is doing.


32-Bit ALU Operations: Examples


Dual MAC Operations: Example


Barrel Shifter
Enables shifting or rotating by any number of bits within a 16-/32-/40-bit register in a single cycle
Performs individual bit operations on 32-bit data register contents
BITSET, BITCLR, BITTGL, BITTST
Field Extract and Deposit instructions
Extract or insert a field of bits out of or into a 32-bit data register
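The effect of these operations on a 32-bit value can be sketched in Python. The function names mirror the mnemonics above, but the signatures and the `position`/`length` parameters are assumptions made for illustration; they do not reproduce the actual Blackfin operand encoding.

```python
MASK32 = 0xFFFFFFFF


def bitset(value, n):
    """Set bit n."""
    return (value | (1 << n)) & MASK32


def bitclr(value, n):
    """Clear bit n."""
    return value & ~(1 << n) & MASK32


def bittgl(value, n):
    """Toggle bit n."""
    return (value ^ (1 << n)) & MASK32


def bittst(value, n):
    """Return True if bit n is set."""
    return bool((value >> n) & 1)


def extract(value, position, length):
    """Read `length` bits starting at bit `position`."""
    return (value >> position) & ((1 << length) - 1)


def deposit(value, field, position, length):
    """Write the low `length` bits of `field` into `value` at bit `position`."""
    mask = ((1 << length) - 1) << position
    return (value & ~mask & MASK32) | ((field << position) & mask)


print(hex(extract(0xABCD1234, 8, 8)))      # 0x12
print(hex(deposit(0xFFFFFFFF, 0, 4, 8)))   # 0xfffff00f
```

In software each of these takes several shift and mask steps; the point of the barrel shifter is that the hardware does the equivalent work in one instruction.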


8-Bit ALU Operations


Four 8-bit ALUs provide parallel computational power targeted mainly for video operations.
Quad 8-bit add/subtract
Quad 8-bit average
SAA (Subtract-Absolute-Accumulate) instruction
A quad 8-bit ALU instruction takes one cycle to complete.
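A common use of subtract-absolute-accumulate is the sum of absolute differences between two blocks of pixels, as in video motion estimation. The sketch below is a simplified model that keeps a single running total and handles four bytes per step; the function names are hypothetical and the real instruction's register layout is not modeled.

```python
def saa(acc, a, b):
    """Add |a[i] - b[i]| for four 8-bit values to the running total."""
    assert len(a) == len(b) == 4
    return acc + sum(abs(x - y) for x, y in zip(a, b))


def block_sad(reference, current):
    """Sum of absolute differences over two equal-length pixel rows."""
    acc = 0
    for i in range(0, len(reference), 4):   # four pixels per step
        acc = saa(acc, reference[i:i + 4], current[i:i + 4])
    return acc


ref_row = bytes([10, 20, 30, 40, 50, 60, 70, 80])
cur_row = bytes([12, 18, 30, 45, 50, 50, 75, 80])
print(block_sad(ref_row, cur_row))  # 24
```

A smaller total means the two blocks are more alike, so a motion search repeats this calculation over many candidate positions and keeps the minimum.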


Additional Arithmetic Instructions


There are a number of specialized instructions that are used
to speed up the inner loop on various algorithms.
Bitwise XOR
Enables creating LFSRs (Linear Feedback Shift Registers) for use in CRC calculations or the generation of PRN sequences
Bit stream multiplexing, add on sign, compare select
Convolutional encoder and Viterbi decoder support
Add/Subtract with prescale up/down
IEEE 1180 compliant 2D 8 x 8 DCTs (Discrete Cosine Transforms)
Vector search
Enables searching a vector a pair at a time for the greatest or least value
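The LFSR mentioned above is built from shifts and XORs, which a short Python sketch can show. This is a generic Fibonacci-style LFSR, not Blackfin code; the 4-bit register and the tap mask `0b0011` (polynomial x^4 + x^3 + 1) are assumptions chosen so that the full sequence is short enough to check by hand.

```python
def lfsr_step(state, taps, width):
    """Advance a right-shifting Fibonacci LFSR by one step."""
    feedback = bin(state & taps).count("1") & 1      # XOR of the tapped bits
    return (state >> 1) | (feedback << (width - 1))


def lfsr_period(seed, taps, width):
    """Count the steps until the register returns to its seed."""
    state, steps = lfsr_step(seed, taps, width), 1
    while state != seed:
        state = lfsr_step(state, taps, width)
        steps += 1
    return steps


def lfsr_bits(seed, taps, width, count):
    """Pseudo-random bit stream taken from the least significant bit."""
    bits, state = [], seed
    for _ in range(count):
        bits.append(state & 1)
        state = lfsr_step(state, taps, width)
    return bits


print(lfsr_period(0b0001, 0b0011, 4))   # 15
print(lfsr_bits(0b0001, 0b0011, 4, 15))
```

A period of 15 means the register visits every nonzero 4-bit state before repeating, which is the maximal-length property wanted for PRN sequences.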


The Addressing Unit


The addressing unit generates addresses for data fetches.
Two DAG (Data Address Generator) arithmetic units enable generation of independent 32-bit wide addresses that can reach anywhere within the Blackfin memory space.
Up to two fetches can occur at the same time.


Address Registers
There are 6x general-purpose Pointer Registers.
Used for general-purpose 8-/16-/32-bit fetches from memory
There are four sets of registers used for DSP-style data accesses.
Used for 16-/32-bit DSP data fetches such as dual data fetch, circular buffer addressing, and bit reversal


There are also dedicated stack (SP) and frame (FP) pointers.
These are used for 32-bit accesses to stack frames.


Addressing
Addressing Unit supports:
Addressing only
With specified Pointer or Index Register
Provide address and post modify
Add an offset after the fetch is done
Circular buffering supported with this method
Provide address with an offset
Add an offset before the fetch, but no pointer update
Update address only
Modify address with reverse carry add
All addressing is Register Indirect.
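The "reverse carry add" mode is what produces bit-reversed addressing for FFTs: the carry propagates from the most significant bit toward the least significant bit. A minimal Python sketch follows; the function names and the `width` parameter are assumptions for illustration, and the reverse-carry addition is modeled by reversing the bits, adding normally, and reversing back.

```python
def reverse_bits(value, width):
    """Reverse the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(index, modifier, width):
    """Add with the carry running from MSB to LSB, within `width` bits."""
    total = reverse_bits(index, width) + reverse_bits(modifier, width)
    return reverse_bits(total & ((1 << width) - 1), width)


def bit_reversed_order(n_bits):
    """Index sequence obtained by repeatedly adding N/2 with reverse carry."""
    size = 1 << n_bits
    index, order = 0, []
    for _ in range(size):
        order.append(index)
        index = reverse_carry_add(index, size >> 1, n_bits)
    return order


print(bit_reversed_order(3))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

For an 8-point FFT this is the order in which a radix-2 algorithm needs its input or output samples, so generating it in the address unit avoids a separate reordering pass.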


Addressing
Index Registers I0-I3 (32-/16-bit accesses)
Pointer Registers P0-P5 (32-/16-/8-bit accesses)
Stack and Frame Pointer Registers (32-bit accesses)
All addresses are Byte addresses.
Ordering is Little Endian.
Addresses must be aligned for the word size being fetched.
For example, 32-bit fetches must use addresses that are a multiple of four
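Byte addressing, little-endian ordering, and the alignment rule can be tried out on a small simulated memory. This is only an illustrative model using Python's `struct` module; the `memory` array and the `store32`/`load` helpers are hypothetical, and a misaligned access is modeled here as a Python exception.

```python
import struct

memory = bytearray(16)                   # tiny byte-addressed memory


def is_aligned(address, size_bytes):
    """True if the address is a multiple of the access size."""
    return address % size_bytes == 0


def store32(address, value):
    """Store a 32-bit value in little-endian byte order."""
    if not is_aligned(address, 4):
        raise ValueError("misaligned 32-bit store")
    memory[address:address + 4] = struct.pack("<I", value)


def load(address, size_bytes):
    """Load an unsigned 8-, 16-, or 32-bit little-endian value."""
    if not is_aligned(address, size_bytes):
        raise ValueError("misaligned load")
    fmt = {1: "<B", 2: "<H", 4: "<I"}[size_bytes]
    return struct.unpack_from(fmt, memory, address)[0]


store32(4, 0x11223344)
print(hex(load(4, 1)))   # 0x44: the least significant byte sits at the lowest address
print(hex(load(4, 2)))   # 0x3344
print(hex(load(6, 2)))   # 0x1122
```

Address 6 is valid for a 16-bit load but not for a 32-bit one, since 6 is not a multiple of four.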


Circular Buffer Example


Base address (B) and starting address (I) = 0
Buffer length L = 44 (there are 11 data elements and each data element is 4 bytes)
Modify value M = 16 (4 elements * 4 bytes/element)
Example memory access: R1 = [I0 ++ M2];

The Addressing Unit supports circular buffer pointer addressing.
The process of boundary checking and pointer wrapping to stay in bounds happens in hardware with no overhead.
Buffers can be placed anywhere in memory without restriction due to the Base address registers.
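The sequence of addresses produced by this example can be traced with a short Python model of post-modify addressing with wrap-around. The function names are hypothetical, and the wrap rule (subtract L when the index passes B + L, add L when it falls below B) is the usual circular-buffer rule, assumed here rather than taken from the slides.

```python
def circular_post_modify(index, modify, base, length):
    """Add the modifier, then wrap the index back into [base, base + length)."""
    index += modify
    if index >= base + length:
        index -= length
    elif index < base:
        index += length
    return index


def trace_addresses(start, modify, base, length, count):
    """Addresses used by `count` successive post-modify fetches."""
    addresses, index = [], start
    for _ in range(count):
        addresses.append(index)          # the fetch uses the current index
        index = circular_post_modify(index, modify, base, length)
    return addresses


# Values from the example: B = 0, I = 0, L = 44, M = 16.
print(trace_addresses(0, 16, 0, 44, 12))
# [0, 16, 32, 4, 20, 36, 8, 24, 40, 12, 28, 0]
```

Because the step of 4 elements shares no common factor with the 11-element length, every element is visited once before the index returns to 0 on the twelfth fetch.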


The Sequencer
The sequencer's function is to generate addresses for fetching instructions.
Uses a variety of registers to select the next address

Aligns instructions as they are fetched


Always reads 64 bits from memory
Realigns what is fetched into individual 16-/32-/64-bit opcodes before sending to the execution pipeline

Handles events
Interrupts and exceptions

Conditional execution

