Mapping DSP Algorithms Into FPGAs

This document provides an overview of mapping digital signal processing (DSP) algorithms to field programmable gate arrays (FPGAs). It discusses the history of FPGA usage for algorithm implementation and why FPGAs are well-suited for signal processing. The document then highlights several interesting algorithms for FPGA implementation, including critically sampled channelizers, divide-and-conquer discrete Fourier transforms (DFT), and Winograd fast Fourier transforms (FFT). It also outlines Xilinx's FPGA architecture and DSP tool flow.

Uploaded by

Mayam Ayo

Mapping DSP Algorithms Into FPGAs
Sean Gallagher, Senior DSP Specialist,
Xilinx Inc.
seang@xilinx.com
215-990-4616
Page 2
Agenda
History of Algorithm implementations in FPGAs
Why FPGAs for Signal Processing
Overview of Xilinx FPGA
Interesting Algorithms for FPGA implementation
Critically sampled channelizer
Divide and Conquer DFT
Winograd FFT
The Xilinx DSP tool flow
Page 3
History of FPGAs for Algorithm Implementation
Systolic array processing techniques were established in the 70s
S.Y. Kung, others
FPGA technology invented by Xilinx in 1984
Glue logic integration
Super Computing Research Center (SRC) built Splash I and II
coprocessing boards in early 90s
Board of 32 Xilinx FPGAs slaved to a Sun workstation
Computation speeds of 6-7 times greater than a Cray II computer
Page 4
My History With FPGAs
Visited SRC in early 90s to sell synthesis tools
Had no clue what they were talking about
Pursued MSCE at Villanova focused on algorithms in FPGAs
They had no idea what I was talking about
Master's thesis in '95, "Implementing Algorithms in FPGAs"
Came to Xilinx in 2001 as DSP Specialist
Still learning
Page 5
Emerging Applications Drive Demand for Next Generation FPGAs
[Figure: application examples grouped by device positioning.
Lowest Power and Cost: handheld portable ultrasound, digital SLR lens control module, software defined radio.
Industry's Best Price-Performance: wireless LTE infrastructure, 10G PON OLT line card, LED backlit and 3D video displays, medical imaging, avionics imaging.
Industry's Highest System Performance and Capacity: 100GE line card, 300G bridge, terabit switch fabric, 100G OTN muxponder, RADAR, ASIC emulation, test & measurement.
Market segments: Next Gen Wireless Communications, Next Gen Wired Communications, Automotive Infotainment, Audio Video Broadcast, Aerospace & Defense, Consumer, Medical Imaging, Test & Measurement.]
Page 6
Why FPGA for Signal Processing?
- How much computational power do you need?
[Figure: two processing styles compared. 1 GHz / 256 clock cycles per sample = 4 MSPS; 500 MHz / 1 clock cycle per sample = 500 MSPS.]
Page 7
Flexibility
- How many MACs (multiply-accumulators) do you need?
- For example, in an FIR filter:
$$
\text{Number of MACs required} = \frac{\text{OutputDataRate} \times \text{NumberOfTaps} \times \text{NumberOfChannels}}{\text{InputDataRate} \times \text{ClockRate}}
$$
[Figure: parallel, semi-parallel and serial FIR structures built from adders and registers, arranged along a speed versus area axis.]
FPGAs can meet various throughput requirements
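The sizing question on this slide can be sketched numerically. This is a minimal sketch under a simplifying assumption that is not taken from the slide: a single-rate FIR filter in which every MAC unit completes one multiply-accumulate per clock, so the count is total MAC operations per second divided by the clock rate. The function names and the 64-tap, 500 MHz figures are hypothetical.

```python
import math


def macs_required(sample_rate_hz, num_taps, num_channels, clock_rate_hz):
    """MAC units needed when each unit does one multiply-accumulate per clock."""
    mac_ops_per_second = sample_rate_hz * num_taps * num_channels
    return math.ceil(mac_ops_per_second / clock_rate_hz)


def architecture(num_macs, num_taps):
    """Name the structure implied by the MAC count for a single channel."""
    if num_macs <= 1:
        return "serial"
    if num_macs >= num_taps:
        return "parallel"
    return "semi-parallel"


taps, clock = 64, 500e6  # hypothetical filter length and fabric clock
for rate in (1e6, 50e6, 500e6):
    n = macs_required(rate, taps, 1, clock)
    print(f"{rate / 1e6:6.1f} MSPS: {n:2d} MAC unit(s), {architecture(n, taps)}")
```

The same filter moves from serial through semi-parallel to fully parallel as the sample rate approaches the clock rate, which is the speed versus area trade the slide illustrates.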


Page 8
Multi-Channel Friendly
[Figure: four channels (ch1 to ch4) of 20 MHz samples, each with its own LPF, replaced by a single multi-channel filter running on 80 MHz samples.]
Parallelism enables efficient implementation of multiple channels in a single FPGA
Many low sample rate channels can be multiplexed (e.g. TDM) and processed in the FPGA at a higher rate
Many Xilinx IP cores take advantage of multi-channel implementation, e.g. FIR Compiler, FFT
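One way to see why time-division multiplexing works for filters: interleaving C channels and stretching every tap delay by C gives each channel the same result as its own dedicated filter. The sketch below is a plain software model of that idea, not the Xilinx FIR Compiler implementation; `fir` and `tdm_fir` are hypothetical helper names.

```python
def fir(h, x):
    """Direct-form FIR with zero initial state: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def tdm_fir(h, channels):
    """Filter equal-length channels through one time-shared FIR."""
    c = len(channels)
    # Interleave samples: ch0[0], ch1[0], ..., ch0[1], ch1[1], ...
    stream = [ch[n] for n in range(len(channels[0])) for ch in channels]
    # Stretch each tap delay by c so a sample only meets its own channel.
    # The inserted zeros model extra delay registers, not extra multipliers.
    stretched = [0] * ((len(h) - 1) * c + 1)
    stretched[::c] = h
    y = fir(stretched, stream)
    # De-interleave back into per-channel outputs.
    return [y[i::c] for i in range(c)]


h = [1, 2, 3, 2, 1]
channels = [[(7 * i + 3 * n) % 11 for n in range(8)] for i in range(4)]
print(tdm_fir(h, channels) == [fir(h, ch) for ch in channels])
```

The shared filter runs at C times the channel rate, matching the 4 x 20 MHz to 80 MHz arrangement in the figure.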
Page 9
FPGA + DSP Processor
[Figure: two configurations with a C6416 DSP processor. FPGA as pre-processor (labels: 100 MHz, 100 kHz) and FPGA as co-processor (label: 500 MHz).]
FPGA enables DSP processor acceleration by mapping the speed-critical loops of DSP code to the FPGA
FPGAs enable consolidation of glue logic, memory, interfaces, ASSP
For details on interfaces (EMIF, VLYNQ, LinkPort), see
http://www.xilinx.com/esp/wireless.htm
6 Series Xilinx FPGAs
Virtex-6: Industry leading DSP performance
Spartan-6: Industry leading DSP cost / performance
Page 10

                          Spartan-6               Virtex-6
                          (Industry's Best        (Industry's Highest
                          Price/Performance)      System Performance)
Logic Cells               3.8K to 147K            74K to 567K
DSP Slices                8 to 180                288 to 2016
Max Transceivers          8                       72
Transceiver Performance   3.125 Gbps              6.6 Gbps, 11.18 Gbps
Memory                    4,824 Kbits             38,309 Kbits
Max. SelectIO             576                     1200
SelectIO Voltages         1.2 V to 3.3 V          1 V to 2.5 V

Now Shipping
Introducing the 7 Series FPGAs
Industry's Lowest Power and First Unified Architecture
Spanning low-cost to ultra high-end applications
Three new device families with breakthrough innovations in power efficiency, performance-capacity and price-performance
Page 11
Bridging the DSP Performance Gap with 7-Series
[Figure: DSP performance versus time for the 6-Series (Spartan-6, Virtex-6) and 7-Series (Kintex-7, Virtex-7) compared with multi-core DSP architectures. Values shown on the chart: 33, 90, 770, 2000 and 4752 GMAC.]
* Peak performance for symmetric filters
Page 13
FPGA Resource
[Figure: Virtex-6 overview. Logic fabric (CLB), BRAM and DSP48 blocks connected through a switch matrix, with IOB and DCM blocks. DSP48 detail: inputs A, B, C and D, +/- adders, a 25x18 multiplier and a 48-bit output P.]
Challenge: How do we make the best use of these resources in the most efficient manner?
Page 14
DSP Performance through the DSP48E1 Slice
Virtex-6, Artix-7, Kintex-7, Virtex-7
2 DSP48E1 Slices / Tile
Column Structure to avoid routing delay
Pre-adder, 25x18 bit multiplier, accumulator
Pattern detect, logic operation, convergent/symmetric rounding
638 MHz Fmax
[Figure: DSP48E1 slice. Inputs A, B, C and D feed a +/- pre-adder, a 25x18 multiplier, a 48-bit accumulator with output P, and a pattern detector. A DSP48 tile consists of two DSP48E1 slices and an interconnect column.]
Pre-Adder
Hardened pre-adder leverages filter symmetry to reduce logic, power and routing
No restriction on coefficient table size
[Figure: symmetric FIR structure with coefficients c_n. Filter symmetry is exploited to pre-add tap delay values and reduce multiplies by 50%.]
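The pre-adder saving can be checked with a short software model. The sketch below assumes a symmetric (linear-phase) coefficient set, adds the two delay-line samples that share a coefficient, and multiplies once; it is compared against a direct-form filter. The function names are hypothetical and this is a behavioural model, not a description of the DSP48E1 datapath.

```python
def fir_direct(coeffs, x):
    """Direct-form FIR: one multiply per tap per output sample."""
    return [sum(coeffs[k] * x[n - k] for k in range(len(coeffs)) if n - k >= 0)
            for n in range(len(x))]


def fir_preadd(coeffs, x):
    """Symmetric FIR: pre-add the sample pair sharing a coefficient, then multiply."""
    n_taps = len(coeffs)
    if coeffs != coeffs[::-1]:
        raise ValueError("coefficients must be symmetric")
    half = n_taps // 2

    def tap(i):
        # Delay-line sample, zero before the start of the signal.
        return x[i] if i >= 0 else 0

    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(half):
            acc += coeffs[k] * (tap(n - k) + tap(n - (n_taps - 1 - k)))
        if n_taps % 2:
            acc += coeffs[half] * tap(n - half)  # unpaired centre tap
        y.append(acc)
    return y


def multiplies_per_output(n_taps, use_preadder):
    """Multiplies per output sample with and without the pre-adder."""
    return (n_taps + 1) // 2 if use_preadder else n_taps


coeffs = [1, 3, 5, 3, 1]
x = [2, -1, 4, 0, 7, 3, -2, 5]
print(fir_preadd(coeffs, x) == fir_direct(coeffs, x))
print(multiplies_per_output(len(coeffs), False), multiplies_per_output(len(coeffs), True))
```

For an even number of taps the multiply count is exactly halved; for an odd number the centre tap keeps its own multiply.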
Greater Flexibility with Fully Independent Multipliers
Full, independent access to every multiplier
One accumulator for each multiplier
5 interconnects support up to 50-bit multiplies per tile
[Figure: DSP48 tile made of two DSP48E1 slices and an interconnect column.]
25x18 Multiplier
Single DSP slice supports up to 25x18 multiplies
50% fewer DSP resources required for high-precision multiplies
Efficient FFT implementations
Efficient single-precision floating-point implementations
Single DSP tile supports up to 50x36 multiplies
Delivers higher performance and lower power
Efficient Rounding Modes using Pattern Matching
Only FPGA architecture that supports pattern detection
Pattern can be constant (set by attribute) or C input
Efficient implementation of rounding modes
Symmetric
Convergent
Saturation
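The three modes listed above can be illustrated on integers that drop `shift` low-order bits. The sketch assumes "symmetric" means round half away from zero and "convergent" means round half to even; other variants of both exist. The tie test on the discarded bits loosely mirrors what a pattern detector would match. All function names are hypothetical.

```python
def round_symmetric(value, shift):
    """Drop `shift` LSBs, rounding ties away from zero."""
    half = 1 << (shift - 1)
    if value >= 0:
        return (value + half) >> shift
    return -((-value + half) >> shift)


def round_convergent(value, shift):
    """Drop `shift` LSBs, rounding ties to the nearest even result."""
    half = 1 << (shift - 1)
    mask = (1 << shift) - 1
    q = (value + half) >> shift      # round half up first
    if (value & mask) == half:       # discarded bits are exactly 1000...0: a tie
        q &= ~1                      # force the result to be even
    return q


def saturate(value, bits):
    """Clamp to the range of a signed two's complement word of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


for v in (5, 7, -5, -7):  # 2.5, 3.5, -2.5, -3.5 after dropping one bit
    print(v, round_symmetric(v, 1), round_convergent(v, 1))
print(saturate(200, 8), saturate(-200, 8))
```

Convergent rounding avoids the small bias that a fixed tie direction accumulates over long sums, which is why it is commonly preferred in accumulating datapaths.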
One Accumulator for each Multiplier
DSP48E1 slice provides an accumulator for each multiplier
2X more than competitive architectures
Up to 48 bits of accumulation per DSP slice
25x18 multiply
Up to 96 bits of accumulation per DSP tile
50x36 multiply
DSP IP Portfolio
Comprehensive IP portfolio
Constraint driven
IP can be imported into RTL, System Generator and Platform Studio

Category          IP Blocks
Math              mult, adder, accumulator, divider, trig, CORDIC
Filters           FIR, CIC
Memory            RAM, register, FIFO, shift register
Transforms        FFT, IFFT, LTE FFT
Processors        MicroBlaze
Video             color correction, CFA, pixel correction, image characterization, edge enhancement, noise reduction, statistics, CSC, VFBC, scaler, timing controller
Wireless          DDS, DUC/DDC, MIMO decoder/encoder, RACH preamble det, DPD, CFR
Floating-Point    add/sub, mult, div, sqrt, compare, convert, FFT
FIR Compiler 6.0
11 Tap FIR Filter, Interpolate by 2 (30.22 MHz to 61.44 MHz)

Parameter          Result 1   Result 2   Result 3   Result 4
Channels           2          2          4          4
Clock Frequency    122.7      245.4      245.4      368.1
DSP Slice Count    3          1          3          1

Overclocking automatically used to reduce DSP slice count
Quick estimates provided by IP compiler GUI
Ensures best results for your design requirements
Interesting Algorithms for FPGA Implementation
Critically sampled channelizers
Polyphase with a DFT bank
Divide and conquer DFT
Calculating a 1D FFT as a 2D FFT
Winograd FFT
Least amount of multiplies
Channelization 23
Passband Polyphase Filters
In a FDM digital communication system a common
requirement is, for each channel:
translate the channel to baseband
shape the channel spectrum
reduce the sample rate to match the channel bandwidth
This is the function of a channelizer
When the channel spacings are equal a
computationally efficient structure for performing the
above functions is the carrier centered polyphase
transform
[Figure: spectrum S(f) of an FDM signal with equally spaced channels, shown along the frequency axis f.]
Channelization 24
Baseband Polyphase Filter
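[Figure: polyphase filter bank with input x(n), sub-filters h_0(n), h_1(n), h_2(n), ..., h_{M-1}(n), and output y(Mn).]

The sub-filters are the polyphase partition of the length-N prototype filter:

$$
\begin{aligned}
h_0(n) &= [\,h_0,\; h_M,\; \ldots,\; h_{N-M}\,] \\
h_1(n) &= [\,h_1,\; h_{M+1},\; \ldots,\; h_{N-M+1}\,] \\
&\;\;\vdots \\
h_{M-1}(n) &= [\,h_{M-1},\; h_{2M-1},\; \ldots,\; h_{N-1}\,]
\end{aligned}
$$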
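The polyphase partition can be checked in a few lines. The sketch below splits a prototype filter into M sub-filters, h_r(n) = h(r + nM), computes the decimated output y(Mn) at the low rate, and compares it with filtering at the full rate and discarding samples. The function names and test data are hypothetical.

```python
def fir(h, x):
    """Direct-form FIR with zero initial state: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def polyphase_partition(h, m):
    """Sub-filter r holds every M-th coefficient starting at r: h_r(n) = h(r + nM)."""
    return [h[r::m] for r in range(m)]


def polyphase_decimate(h, x, m):
    """Compute y(Mn) directly, one output per M input samples."""
    branches = polyphase_partition(h, m)
    n_out = (len(x) + m - 1) // m
    y = []
    for n in range(n_out):
        acc = 0
        for r, h_r in enumerate(branches):
            for j, c in enumerate(h_r):
                idx = (n - j) * m - r  # branch r sees the input delayed by r
                if idx >= 0:
                    acc += c * x[idx]
        y.append(acc)
    return y


h = list(range(1, 13))                              # 12-tap prototype
x = [(5 * n * n + 3) % 17 - 8 for n in range(40)]   # arbitrary test signal
print(polyphase_decimate(h, x, 4) == fir(h, x)[::4])
```

Both paths give the same samples, but the polyphase form never computes the outputs that the downsampler would throw away.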
Channelization 25
Passband Polyphase Filters
Express the filter coefficient set in terms of a vernier index $r_1$ and a coarse index $r_2$:

$$
h(n) = h(r_1 + M r_2), \qquad r_1 = 0, \ldots, M-1, \quad r_2 = 0, \ldots, \frac{N}{M} - 1
$$

Invoke the modulation theorem to convert a prototype baseband filter to its equivalent carrier centered, or spectrally shifted, version:

$$
\text{if } h(n) \Leftrightarrow H(\theta) \quad \text{then} \quad h(n)\, e^{j\theta_0 n} \Leftrightarrow H(\theta - \theta_0)
$$
Channelization 26
Passband Polyphase Filters
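The coefficients of the carrier centered filter are

$$
g(n) = h(n)\, e^{j\theta_0 n}
$$

[Figure: magnitude responses |H(θ)| of the baseband prototype and |G(θ)| of the spectrally shifted filter.]

Now perform a polyphase partition on the modulated coefficients:

$$
\begin{aligned}
g_{r_1}(r_2) &= h(r_1 + M r_2)\, e^{j\theta_0 (r_1 + M r_2)} \\
&= h(r_1 + M r_2)\, e^{j\theta_0 r_1}\, e^{j\theta_0 M r_2}
\end{aligned}
$$

Select $\theta_0$ so that a single period of the series $e^{j\theta_0 n}$ is harmonically related to $M$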
Channelization 27
Passband Polyphase Filters
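$$
\theta_0 = k\,\frac{2\pi}{M}
$$

$$
\begin{aligned}
g_{r_1}(r_2) &= h(r_1 + M r_2)\, e^{j\frac{2\pi}{M} k r_1}\, e^{j\frac{2\pi}{M} k M r_2} \\
&= h(r_1 + M r_2)\, e^{j\frac{2\pi}{M} k r_1}
\end{aligned}
$$

The second factor drops out because $e^{j 2\pi k r_2} = 1$ for integer $k$ and $r_2$.

[Figure: carrier centered polyphase filter. x(n) feeds sub-filters h_0(n), h_1(n), ..., h_{M-2}(n), h_{M-1}(n); the output of sub-filter r is multiplied by the phase rotator e^{j(2π/M)kr} and the products are summed to give y(Mn, k).]

Carrier centered polyphase filter: the one structure
basebands the channel
shapes the signal
reduces the sample rate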
Channelization 28
Passband Polyphase Filters
[Figure: two polyphase filter banks driven by x(n), each with sub-filters h_0(n), h_1(n), ..., h_{M-2}(n), h_{M-1}(n). One bank is followed by the phase rotators and produces y(Mn, k); the other has no rotators and produces y(Mn, 0).]

Recovering 2 channels from FDM spectra
The two sets of filters employ identical coefficients
Note: the two sets of filters contain the same data
Channelization 29
Passband Polyphase Filters
[Figure: a single polyphase filter bank h_0(n), h_1(n), ..., h_{M-2}(n), h_{M-1}(n) driven by x(n). The sub-filter outputs are summed directly to give y(Mn, 0) and summed through the phase rotators to give y(Mn, k).]

Only one filter is required because the data is the same in both filters on the previous slide
Baseband and carrier centered polyphase filter, heterodyne and downsample
Channelization 30
Polyphase Transform
[Figure: polyphase transform. x(n) feeds sub-filters h_0(n), h_1(n), ..., h_{M-2}(n), h_{M-1}(n), whose outputs drive an M-point IDFT producing y(Mn, 0), y(Mn, 1), ..., y(Mn, M-2), y(Mn, M-1).]

Recall that the IDFT of an M-point sequence Y(k) is

$$
y(n) = \sum_{k=0}^{M-1} Y(k)\, e^{j 2\pi n k / M}, \qquad n = 0, 1, \ldots, M-1
$$

If the phase rotators are sequenced over all M values of k, we recognize that this is the same as computing an M-point IDFT
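The equivalence between the phase rotators and an IDFT can be checked numerically. The sketch below computes the M sub-filter outputs, applies an unscaled M-point IDFT across them, and compares each channel with a direct reference: band-pass filter g(n) = h(n) e^{j2πkn/M} followed by keeping every M-th sample. The prototype coefficients and test signal are arbitrary, and the function names are hypothetical.

```python
import cmath


def fir(h, x):
    """Direct-form FIR with zero initial state."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def channelize(h, x, m):
    """Return y[n][k] = y(Mn, k) from M sub-filters and an M-point IDFT."""
    n_out = (len(x) + m - 1) // m
    out = []
    for n in range(n_out):
        # Sub-filter r: v_r(n) = sum_j h(r + jM) * x((n - j)M - r)
        v = []
        for r in range(m):
            acc = 0
            for j, c in enumerate(h[r::m]):
                idx = (n - j) * m - r
                if idx >= 0:
                    acc += c * x[idx]
            v.append(acc)
        # Unscaled IDFT across the sub-filter outputs yields all M channels.
        out.append([sum(v[r] * cmath.exp(2j * cmath.pi * k * r / m) for r in range(m))
                    for k in range(m)])
    return out


def direct_channel(h, x, m, k):
    """Reference: carrier centered filter for channel k, then downsample by M."""
    g = [h[i] * cmath.exp(2j * cmath.pi * k * i / m) for i in range(len(h))]
    return fir(g, x)[::m]


M = 4
h = [0.05 * (i + 1) for i in range(12)]
x = [cmath.exp(1j * 0.9 * n) + 0.5 * cmath.exp(-1j * 2.1 * n) for n in range(32)]
y = channelize(h, x, M)
for k in range(M):
    ref = direct_channel(h, x, M, k)
    print(k, max(abs(y[n][k] - ref[n]) for n in range(len(ref))) < 1e-9)
```

One filter bank plus one IDFT delivers all M channels, where the direct approach needs M separate filters.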
Channelization 31
Passband Polyphase Filters
[Figure: two polyphase filter banks with sub-filters h_0(n), h_1(n), ..., h_{M-2}(n), h_{M-1}(n), one preceded by the phase rotators and one without, arranged as upsamplers.]

Carrier centered polyphase filters can also be used for constructing frequency division multiplexed signals
Baseband and carrier centered polyphase filter, heterodyne and upsample
Channelization 32
Passband Polyphase Filters
[Figure: phase rotators feeding a single common polyphase filter bank h_0(n), h_1(n), ..., h_{M-2}(n), h_{M-1}(n).]

Baseband and carrier centered polyphase common filter, heterodyne and upsample
Page 33
Divide and Conquer FFT
It is possible to compute a one dimensional DFT as a two dimensional DFT
Ideal for processing high rate data that has been demuxed to multiple paths at a lower rate

Decompose DFT into two dimensions (with N = LM):

$$
X(p, q) = \sum_{l=0}^{L-1} \sum_{m=0}^{M-1} x(l, m)\, W_N^{(Mp+q)(mL+l)}
$$

However:

$$
W_N^{(Mp+q)(mL+l)} = W_N^{MLmp}\, W_N^{mLq}\, W_N^{Mpl}\, W_N^{lq}
$$

But:

$$
W_N^{Nmp} = 1, \qquad W_N^{mqL} = W_{N/L}^{mq} = W_M^{mq}, \qquad \text{and} \qquad W_N^{Mpl} = W_{N/M}^{pl} = W_L^{pl}
$$

$$
X(p, q) = \sum_{l=0}^{L-1} \left\{ W_N^{lq} \left[ \sum_{m=0}^{M-1} x(l, m)\, W_M^{mq} \right] \right\} W_L^{lp}
$$
Page 34
Divide and Conquer FFT
These simplifications lead to:

$$
X(p, q) = \sum_{l=0}^{L-1} \left\{ W_N^{lq} \left[ \sum_{m=0}^{M-1} x(l, m)\, W_M^{mq} \right] \right\} W_L^{lp}
$$

Process Steps:
1. Store signal column-wise
2. Compute the M-point DFT of each row
3. Multiply the resulting array by the phase factors $W_N^{lq}$
4. Compute the L-point DFT of each column
5. Read the resulting array row-wise
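The five process steps can be followed literally in code. The sketch below uses a plain O(N²) DFT for the row and column transforms so that it stays self-contained, and checks the result against a direct N-point DFT. The function names and the N = 12, L = 3, M = 4 example are hypothetical.

```python
import cmath


def dft(x):
    """Direct DFT: X[k] = sum_n x[n] * exp(-j 2 pi n k / N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def dft_2d_decomposition(x, L, M):
    """N-point DFT (N = L * M) computed as row DFTs, twiddles, column DFTs."""
    N = L * M
    assert len(x) == N
    # 1. Store the signal column-wise: x(l, m) = x[m*L + l]
    a = [[x[m * L + l] for m in range(M)] for l in range(L)]
    # 2. M-point DFT of each row
    f = [dft(row) for row in a]
    # 3. Multiply by the phase factors W_N^{lq}
    g = [[f[l][q] * cmath.exp(-2j * cmath.pi * l * q / N) for q in range(M)]
         for l in range(L)]
    # 4. L-point DFT of each column; cols[q][p] holds X(p, q)
    cols = [dft([g[l][q] for l in range(L)]) for q in range(M)]
    # 5. Read the array row-wise: output index k = M*p + q
    return [cols[q][p] for p in range(L) for q in range(M)]


x = [complex((3 * n) % 7, (5 * n) % 4) for n in range(12)]
ref = dft(x)
got = dft_2d_decomposition(x, 3, 4)
print(max(abs(a - b) for a, b in zip(got, ref)) < 1e-9)
```

Each of the L rows can be transformed independently, which is what makes the structure a natural fit for data already demultiplexed onto L parallel paths.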
Page 35
Winograd FFT
Developed by mathematician Shmuel Winograd in 1976
Goal was to reduce the number of multiplies required
Multiplies minimized but at expense of increased complexity
Memory mappings became very complex too
Due to complexity, cost of doing an FFT did not significantly go down
Problem with algorithm is that multiplies and accumulates were separated
so execution on DSP processor was not efficient
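The flavour of the algorithm shows up even in the smallest case. The sketch below is the standard 3-point Winograd module, which needs two constant multiplications (one by a real constant, one by an imaginary constant) where the direct 3-point sum uses four complex twiddle multiplies. Note how all additions are grouped before and after the multiplies, so there is no multiply-accumulate pattern. The function name is hypothetical.

```python
import cmath
import math


def winograd_dft3(x0, x1, x2):
    """3-point DFT using two constant multiplications."""
    c = math.cos(2 * math.pi / 3) - 1   # real constant, equals -1.5
    s = math.sin(2 * math.pi / 3)       # real constant applied as j*s

    # Pre-additions
    t1 = x1 + x2
    m0 = x0 + t1
    # Multiplications by constants
    m1 = c * t1
    m2 = 1j * s * (x2 - x1)
    # Post-additions
    s1 = m0 + m1
    return m0, s1 + m2, s1 - m2


x = (1 + 2j, -3 + 0.5j, 2 - 1j)
ref = [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / 3) for n in range(3))
       for k in range(3)]
got = winograd_dft3(*x)
print(max(abs(a - b) for a, b in zip(got, ref)) < 1e-12)
```

In an FPGA the adders and the multipliers can be laid out as separate pipeline stages, so the separation that hurt MAC-oriented DSP processors is less of a penalty there.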
