4-Distributed Arithmetic SD
Dr Sumam David S.
Dept. of E&C, NITK Surathkal
Distributed arithmetic
What?
Where?
How?
What is DA?
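[Figure: single-input serial DA block diagram. An N-bit-wide sample X0 passes through a parallel-to-serial converter; each serial bit drives ROM address A0 (address 0 holds 00000...0, address 1 holds C0); the partial product read from the ROM feeds a scaling accumulator (+/- with Z-1 feedback).]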
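The ROM-plus-scaling-accumulator structure follows from a standard rearrangement of the sum of products. The notation below is an assumption made for illustration and does not appear on the slides: K inputs x_k with coefficients C_k, each input an N-bit two's complement word with bits b_{k,n}.

```latex
\begin{align}
y &= \sum_{k=0}^{K-1} C_k x_k, \qquad
x_k = -b_{k,N-1}\,2^{N-1} + \sum_{n=0}^{N-2} b_{k,n}\,2^{n} \\
y &= -2^{N-1} \sum_{k=0}^{K-1} C_k\, b_{k,N-1}
     + \sum_{n=0}^{N-2} 2^{n} \sum_{k=0}^{K-1} C_k\, b_{k,n}
\end{align}
```

Each inner sum over k depends only on one bit from every input, so it can be read from a ROM addressed by those bits. The factor 2^n is the scaling step, and the minus sign on the sign-bit term is the subtract in the +/- accumulator.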
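Worked example (4-bit two's complement, bit weights -2^3 2^2 2^1 2^0)

C0 = 1001 (-7)    C1 = 0110 (6)
X0 = 0111 (7)     X1 = 0101 (5)

Bit   (X0 bit, X1 bit)   ROM output                 Scaled partial product
b0    (1, 1)             1001 + 0110 = 1111 (-1)    x 2^0    =  -1
b1    (1, 0)             1001 + 0000 = 1001 (-7)    x 2^1    = -14
b2    (1, 1)             1001 + 0110 = 1111 (-1)    x 2^2    =  -4
b3    (0, 0)             0000 + 0000 = 0000 (0)     x (-2^3) =   0

Accumulated result: -1 - 14 - 4 + 0 = -19
Check: C0 x X0 + C1 x X1 = 11001111 (-49) + 00011110 (30) = 11101101 (-19)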
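The worked example above can be reproduced in software. The following is a minimal sketch, not the slides' own implementation; the function name `da_dot` and its parameters are hypothetical, and the ROM is modelled as a Python list holding every possible sum of the coefficients.

```python
def da_dot(coeffs, samples, bits=4):
    """Bit-serial distributed arithmetic: sum(c * x) using only a ROM
    lookup and a shifted add/subtract per bit (no multiplier)."""
    k = len(coeffs)
    # ROM contents: address bit i set means coeffs[i] is included in the sum.
    rom = [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
           for addr in range(1 << k)]
    acc = 0
    for n in range(bits):                      # one pass per bit, LSB first
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> n) & 1) << i        # bit n of every sample forms the address
        partial = rom[addr] << n               # scale the partial product by 2**n
        if n == bits - 1:
            acc -= partial                     # sign bit carries weight -2**(bits-1)
        else:
            acc += partial
    return acc


result = da_dot([-7, 6], [7, 5], bits=4)
print(result)
```

With C0 = -7, C1 = 6, X0 = 7 and X1 = 5 the loop visits the same four partial products as the table (-1, -14, -4, 0).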
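[Figure: two-input serial DA block diagram. N-bit-wide samples X0 and X1 are serialized and drive ROM address lines A0 and A1 (1 bit each); the partial product read from the ROM feeds the scaling accumulator (+/- with Z-1 feedback).]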
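[Figure: four-input partial-product ROM and scaling accumulator. For each address line A0..A3, the serial bit of X0..X3 selects either 0000...0 or the coefficient C0..C3; the selected values are summed (+) to form the partial product, which feeds the scaling accumulator (+/- with Z-1 feedback).]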
SDA 8-Tap FIR Filter
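[Figure: serial DA 8-tap FIR filter. N-bit-wide samples X0..X7 are serialized, 1 bit each; X0..X3 and X4..X7 each drive address lines A0..A3 of a partial-product ROM; the two ROM outputs are added (+) and passed to the scaling accumulator (+/- with Z-1 feedback). Block labels in the figure: Pre-Adder, Partial Product ROM (two instances), Scaling Accumulator.]
Each 4-input LUT contains all possible sums of the partial products.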
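The 8-tap structure, with two 4-input LUTs whose outputs are added before the scaling accumulator, can be sketched as follows. This is an illustrative model under stated assumptions, not the Xilinx implementation; `build_lut` and `sda_fir8` are hypothetical names, and the samples are assumed to be signed integers that fit in `bits` bits.

```python
def build_lut(coeffs):
    """Return all 2**len(coeffs) possible sums of the coefficients.
    Address bit i set means coeffs[i] contributes to the stored sum."""
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]


def sda_fir8(coeffs, samples, bits):
    """One output of an 8-tap FIR computed bit-serially with two 4-input LUTs."""
    lut_lo = build_lut(coeffs[:4])             # taps 0..3
    lut_hi = build_lut(coeffs[4:])             # taps 4..7
    acc = 0
    for n in range(bits):                      # one clock cycle per sample bit
        addr_lo = sum(((samples[i] >> n) & 1) << i for i in range(4))
        addr_hi = sum(((samples[4 + i] >> n) & 1) << i for i in range(4))
        partial = lut_lo[addr_lo] + lut_hi[addr_hi]   # adder joining the two LUTs
        scaled = partial << n
        acc += -scaled if n == bits - 1 else scaled   # subtract on the sign bit
    return acc


coeffs = [1, -2, 3, -4, 5, -6, 7, -8]
samples = [10, -20, 30, -40, 50, -60, 70, -80]
y = sda_fir8(coeffs, samples, bits=8)
print(y, len(build_lut(coeffs[:4])))
```

Splitting the taps this way needs two 16-word LUTs, whereas a single LUT addressed by all eight inputs would need 256 words.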
Xilinx DA FIR Performance
[Figure: two plots against Filter Length (Taps), 0 to 250. One plots Sample Rate (MSPS), axis 0 to 60; the other plots Performance (MMACs/s), axis 0 to 6000. Legend entries: Single MAC, Dual MAC, DA FIR B=8, DA FIR B=12, DA FIR B=16.]
Serial-DA to Parallel-DA

[Figure: four panels showing an 8-bit sample (b7..b0) fed to the DA hardware, from fully serial to fully parallel.]

Over-sampling = 8: The sample is serialized and processed 1 bit per clock cycle. 8 clock cycles are thus required to process the whole sample.
Over-sampling = 4: The sample is serialized and processed 2 bits per clock cycle. 4 clock cycles are thus required to process the whole sample.
Over-sampling = 2: The sample is serialized and processed 4 bits per clock cycle.
Over-sampling = 1: The sample is processed in parallel, 8 bits per clock cycle.
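The trade-off between clock cycles and hardware can be illustrated with a small software model. This is a sketch under assumptions: `da_dot_digit_serial` and `bits_per_clock` are hypothetical names, and the inner loop stands in for ROM copies that would operate side by side within one clock cycle in hardware.

```python
def da_dot_digit_serial(coeffs, samples, bits=8, bits_per_clock=1):
    """Distributed arithmetic processing `bits_per_clock` bits of every
    sample per clock cycle. Returns (result, clock_cycles)."""
    if bits % bits_per_clock:
        raise ValueError("bits must be a multiple of bits_per_clock")
    k = len(coeffs)
    rom = [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
           for addr in range(1 << k)]
    acc = 0
    clocks = 0
    for base in range(0, bits, bits_per_clock):
        # These lookups model parallel ROM copies working in the same clock.
        for n in range(base, base + bits_per_clock):
            addr = sum(((x >> n) & 1) << i for i, x in enumerate(samples))
            term = rom[addr] << n
            acc += -term if n == bits - 1 else term   # sign bit is subtracted
        clocks += 1
    return acc, clocks


coeffs = [3, -5, 7, 2]
samples = [100, -128, 127, -1]
runs = {d: da_dot_digit_serial(coeffs, samples, bits=8, bits_per_clock=d)
        for d in (1, 2, 4, 8)}
print(runs)
```

The result is the same in every configuration; only the cycle count (8, 4, 2, 1) and the number of lookups done per cycle change, which mirrors the over-sampling factors listed above.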
Conclusion
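Efficiency of computation: multiplications are replaced by ROM look-ups and shift-and-add accumulation
Slow, as it is bit-serial: one clock cycle per sample bit in the serial form
Memory requirements: a ROM with k address inputs holds 2^k words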
References