FPGA Architecture and Implementation
Filter
DIGITAL LOGIC
Logic Gates
MOORE'S LAW
Transistor Switches
< 40 nm ! $$$
DIGITAL LOGIC
Digital Logic Function
3 Inputs
Black Box
Truth Table
(Look Up Table LUT)
SUM of PRODUCTS
ASIC (Application Specific Integrated Circuit) has a longer design cycle and costlier ECOs (Engineering Change Orders).
Still, ASICs have their own market because of the added benefit of faster performance.
Programmable chips are good for medium to low volume products. If you need
WHAT IS AVAILABLE?
PLA (Programmable Logic Array) is a simple field-programmable chip that
has an AND plane followed by an OR plane. It is based on the fact that any
logical function can be written in SOP (Sum of Products) form, so any
function can be implemented by AND gates that generate the product terms and
feed an OR gate that sums them.
FPGA (Field Programmable Gate Array) has narrower logic choices and
more memory elements. A LUT (Lookup Table) may replace actual logic gates.
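The AND plane / OR plane idea can be sketched in a few lines of Python. This is only a behavioral model, not a description of any real device: the function `pla_output`, the dictionary encoding of product terms, and the example function f(a, b, c) = a.b + (not a).c are all assumptions made for illustration.

```python
from itertools import product

def pla_output(inputs, product_terms):
    """Evaluate one PLA output in sum-of-products form.

    inputs: tuple of 0/1 values.
    product_terms: one dict per AND gate, mapping an input index to the
    value that input must have (1 = true literal, 0 = inverted literal).
    Inputs that are not listed are not connected to that AND gate.
    """
    # AND plane: a product term is true when every connected literal matches.
    products = [
        all(inputs[i] == v for i, v in term.items())
        for term in product_terms
    ]
    # OR plane: the output is the sum (logical OR) of the product terms.
    return int(any(products))

def truth_table(product_terms, n_inputs):
    """List the output for every input combination, counting up in binary."""
    return [
        pla_output(bits, product_terms)
        for bits in product((0, 1), repeat=n_inputs)
    ]

# Hypothetical example: f(a, b, c) = a.b + (not a).c
F_TERMS = [{0: 1, 1: 1}, {0: 0, 2: 1}]
F_TABLE = truth_table(F_TERMS, 3)
```

Reprogramming the "links" here just means changing the contents of `F_TERMS`; the evaluation code stays the same.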
[Figure: programmed PLD. Inputs x pass through (re-)programmable links into planes of ANDs and ORs; the ANDs form product terms, the ORs form sums, and the outputs are logic functions in SUM of PRODUCTS form. Labels: Reconfigurable, GLUE LOGIC.]
COMPLEX PLDS
CPLDs
Programmable PLD Blocks
Programmable Interconnects
Electrically Erasable links
Feedback Outputs
CPLD Architecture
PROGRAMMABILITY: WHERE DO FPGAS FIT?
MultiCore
Intel CPU
TI DSP
GPU
ASSP
FPGA
ManyCore
ASIC
CPU:
Market-agnostic
Accessible to many programmers (C++)
Flexible, portable
FPGA:
Somewhat restricted market
Harder to program (Verilog)
More efficient than SW
More expensive than ASIC
ASIC:
Market-specific
Fewer programmers
Rigid, less programmable
Hard to build (physical)
FPGAs
Off-the-shelf
High performance
Low development cost
Low power
Low cost in high volumes
Re-configurability
[Figure: die-size comparison, 19.68 mm x 19.80 mm versus 2.7 mm x 2.82 mm (51x), with local memory labeled]
WHAT IS FPGA
Field
(Moore's Law)
An FPGA is a device that contains a matrix of reconfigurable gate
FPGAs are truly parallel in nature, i.e. the performance of one part of the
FPGAs use dedicated hardware for processing logic and do not have an operating system.
running
Logic blocks: implement combinational and sequential logic
Interconnect: wires to connect inputs and outputs to logic blocks
I/O blocks: special logic blocks at the periphery of the device for external connections
FPGA families
Vendor | Low cost | High performance
Xilinx | Spartan 3, 3L, 3E | Virtex 4 LX/SX/FX, Virtex 5 LX
Altera | Cyclone II, III | Stratix II, II GX
These wires are connected by the user and therefore must use an
PROGRAMMING TECHNOLOGIES
Fuse and anti-fuse (one-time programming)
The fuse makes or breaks the link between two wires
Typical connection resistance is 50-300 ohm
One-time programmable
SRAM:
Cell Connection Matrix (CCM)
PSM
ANTIFUSE:
EPROM
The EEPROM/FLASH cell in FPGAs can be used in two ways, as a control
Xilinx CLB
.. CONTD
The input lines go into the input and enable of the lookup table.
The output of the lookup table gives the result of the logic
function that it implements. The lookup table is implemented
using SRAM.
LOOKUP TABLE
A LUT (lookup table) is a one-bit-wide memory array
A 4-input AND gate is replaced by a LUT that has four address inputs and
Location 15 would store a logic 1; all other locations would store 0
LUTs can be programmed and reprogrammed to change the logical
function implemented
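The 4-input AND example can be modeled as a small memory whose address is formed from the logic inputs. The sketch below is a behavioral illustration only; the class name `LUT` and its methods `program` and `read` are hypothetical and do not correspond to any vendor API. The first input is assumed to be the most significant address bit.

```python
class LUT:
    """One-bit-wide memory addressed by the logic inputs (illustrative model)."""

    def __init__(self, n_inputs):
        self.n_inputs = n_inputs
        self.mem = [0] * (1 << n_inputs)   # 2**n one-bit locations

    def program(self, contents):
        """Load new memory contents, i.e. change the implemented function."""
        if len(contents) != len(self.mem):
            raise ValueError("contents must have 2**n_inputs entries")
        self.mem = [bit & 1 for bit in contents]

    def read(self, *inputs):
        """Use the inputs as an address; the first input is the MSB."""
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.mem[address]

# 4-input AND: only location 15 (inputs 1,1,1,1) stores a 1.
and4 = LUT(4)
and4.program([0] * 15 + [1])
```

Calling `and4.program([0] + [1] * 15)` would turn the same LUT into a 4-input OR without touching the read logic, which is the sense in which a LUT is reprogrammable.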
LOOK UP TABLES
LUTS
3 to 6 inputs
Multiplexer MUX
CONFIGURING LUT
Truth Table / Required Function / Programmed LUT

First table:
x1 x2 x3 x4   y
0  0  0  0    1
0  0  0  1    1
0  0  1  0    1
0  0  1  1    1
0  1  0  0    1
0  1  0  1    1
0  1  1  0    1
0  1  1  1    1
1  0  0  0    1
1  0  0  1    1
1  0  1  0    1
1  0  1  1    1
1  1  0  0    0
1  1  0  1    0
1  1  1  0    0
1  1  1  1    0

Second table (LUT with inputs x1 x2 x3 x4):
x1 x2 x3 x4   y
0  0  0  0    0
0  0  0  1    1
0  0  1  0    0
0  0  1  1    0
0  1  0  0    0
0  1  0  1    1
0  1  1  0    0
0  1  1  1    1
1  0  0  0    0
1  0  0  1    1
1  0  1  0    0
1  0  1  1    0
1  1  0  0    1
1  1  0  1    1
1  1  1  0    0
1  1  1  1    0

[Figure: block with inputs x1, x2 and output y; LUT with output y]
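Configuring a LUT amounts to evaluating the required function for every input combination and storing the results in address order. As a hedged illustration: one of the y columns above (twelve 1s followed by four 0s) coincides with NOT(x1 AND x2) when x1 is taken as the most significant address bit, so that function is used here. The helper names `lut_contents` and `lut_read` are hypothetical.

```python
from itertools import product

def lut_contents(func, n_inputs=4):
    """Evaluate func on every input combination, in binary counting order.

    The first argument of func is treated as the most significant bit,
    matching the row order of a conventional truth table.
    """
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]

def lut_read(contents, x1, x2, x3, x4):
    """Look up the stored bit at the address formed by x1..x4."""
    return contents[(x1 << 3) | (x2 << 2) | (x3 << 1) | x4]

def nand_x1_x2(x1, x2, x3, x4):
    """Assumed required function: depends only on x1 and x2."""
    return not (x1 and x2)

CONTENTS = lut_contents(nand_x1_x2)
```

The function itself is no longer needed once `CONTENTS` has been computed; the LUT only stores and returns the sixteen bits.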
DISTRIBUTED RAM
[Figure: a LUT configured as distributed RAM. Primitives and pins shown:]
RAM16X1S: D, WE, WCLK, A0-A3
RAM32X1S: D, WE, WCLK, A0-A4
RAM16X2S: D0, D1, WE, WCLK, A0-A3, O0, O1
RAM16X1D: D, WE, WCLK, A0-A3, SPO, DPRA0-DPRA3, DPO
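A rough behavioral model of a 16 x 1 single-port distributed RAM is sketched below. It assumes the usual description of such primitives: a write happens on the rising edge of WCLK when WE is high, and the read is asynchronous (the output follows the address). The class `Ram16x1S` and its methods are hypothetical names chosen to mirror the pin labels.

```python
class Ram16x1S:
    """Illustrative model: 16 locations, 1 bit wide, single port."""

    def __init__(self):
        self.mem = [0] * 16

    def read(self, addr):
        """Asynchronous read: the output simply reflects the addressed bit."""
        return self.mem[addr & 0xF]

    def clock(self, addr, d, we):
        """Model one rising edge of WCLK; writes only when WE is high."""
        if we:
            self.mem[addr & 0xF] = d & 1
        return self.read(addr)

ram = Ram16x1S()
ram.clock(addr=5, d=1, we=1)   # write 1 to location 5
ram.clock(addr=5, d=0, we=0)   # WE low: location 5 keeps its value
```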
SHIFT REGISTER
Each LUT can be configured as a shift register
[Figure: LUT as a shift register. A chain of D flip-flops with clock enable (CE), driven by IN and CLK; DEPTH[3:0] selects which stage drives OUT.]
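The shift-register mode can be modeled as a 16-stage chain with a selectable tap. This is a sketch under assumptions: the class `LutShiftRegister` is hypothetical, and a DEPTH value of 0 is assumed to select the first stage (a one-clock delay).

```python
class LutShiftRegister:
    """Illustrative model of a LUT used as a 16-stage shift register."""

    def __init__(self, length=16):
        self.bits = [0] * length   # bits[0] is the most recently shifted-in bit

    def clock(self, data_in, ce=1):
        """Model one rising clock edge; shifts only when CE is high."""
        if ce:
            self.bits = [data_in & 1] + self.bits[:-1]

    def out(self, depth):
        """Return the stage selected by DEPTH[3:0] (0 = one-clock delay)."""
        return self.bits[depth & 0xF]

sr = LutShiftRegister()
for bit in (1, 0, 1, 1):
    sr.clock(bit)
```

After these four clocks, `sr.out(0)` returns the last bit shifted in and `sr.out(3)` returns the first one, so changing DEPTH changes the delay without any rewiring.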
SLICE
[Figure: slice with two look-up tables (inputs G4-G1 and F4-F1, output O), each followed by carry and control logic and a storage element with D, CK, EC, S and R pins. Other labels shown: COUT, CIN, YB, Y, XB, X, F5IN, BY, SR, CLK, CE, LSB.]
Carry Logic
Routing
Storage element
Latch or flip-flop
Set and reset
True or inverted inputs
Sync. or async. control
BLOCK RAM
[Figure: Spartan-3 dual-port block RAM with Port A and Port B]
18 X 18 MULTIPLIER
Embedded 18-bit x 18-bit multiplier
2's complement signed operation
Multipliers are organized in columns
Fast arithmetic functions
Optimized to implement multiply/accumulate modules
[Figure: 18 x 18 multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and Output (36 bits)]
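The signed 18 x 18 multiply and a multiply/accumulate loop can be checked with a small arithmetic model. The helpers `to_signed`, `mult18x18` and `mac` are hypothetical names; the accumulator here is an unbounded Python integer, which is a simplification, since a hardware accumulator has a fixed width.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value

def mult18x18(data_a, data_b):
    """Signed 18-bit x 18-bit multiply, returned as a 36-bit pattern."""
    a = to_signed(data_a, 18)
    b = to_signed(data_b, 18)
    return (a * b) & ((1 << 36) - 1)

def mac(pairs):
    """Multiply/accumulate over (a, b) pairs of 18-bit patterns."""
    acc = 0
    for a, b in pairs:
        acc += to_signed(mult18x18(a, b), 36)
    return acc

MINUS_TWO = 0x3FFFE  # -2 as an 18-bit 2's complement pattern
```

The product of two 18-bit signed values always fits in 36 bits: the largest magnitude is (-2^17) x (-2^17) = 2^34, which is below the 36-bit signed limit of 2^35 - 1.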
IOB FUNCTIONALITY
The IOB provides the interface between the package pins and the CLBs
Each IOB can work as a uni- or bi-directional I/O
Outputs can be forced into high impedance
Inputs and outputs can be registered (advised for high-performance I/O)
Inputs can be delayed
FPGA DESIGN FLOW
[Figure: a circuit with inputs A and B implemented on the array]
DESIGN PROCESS
Specification (Spec)
RTL (e.g. C = A+B), checked by functional simulation
Synthesis to a netlist (circuit with inputs A, B), checked by post-synthesis simulation
Configuration of the array
On-chip testing
DESIGN PROCESS IN DETAIL
Logic Synthesis
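HDL description (Verilog)
// Ports are inferred from the names used in the body; 1-bit widths are assumed.
module MLU (
    input  wire A, B,
    input  wire NEG_A, NEG_B, NEG_Y,
    input  wire L1, L0,
    output wire Y
);
    // Optional inversion of each input
    wire A1 = (NEG_A == 0) ? A : ~A;
    wire B1 = (NEG_B == 0) ? B : ~B;
    // Candidate logic functions
    wire MUX_0 = A1 & B1;
    wire MUX_1 = A1 | B1;
    wire MUX_2 = A1 ^ B1;
    wire MUX_3 = A1 ^ B1;  // identical to MUX_2 in the source listing
    reg Y1;
    // Select one function with {L1, L0}
    always @(*) begin
        case ({L1, L0})
            2'd0: Y1 = MUX_0;
            2'd1: Y1 = MUX_1;
            2'd2: Y1 = MUX_2;
            2'd3: Y1 = MUX_3;
        endcase
    end
    // Optional inversion of the output
    assign Y = (NEG_Y == 0) ? Y1 : ~Y1;
endmodule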
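For functional simulation it can help to have a software reference model to compare the HDL against. The Python function below is such a sketch for the MLU listing above, under stated assumptions: all signals are single bits, case 1 selects MUX_1, and MUX_2 and MUX_3 are both XOR as written in the listing. The function name `mlu` and its argument order are hypothetical.

```python
def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Reference model of the MLU for 1-bit inputs (all arguments are 0 or 1)."""
    # Optional input inversion: for single bits, (NEG == 0) ? X : ~X is X xor NEG.
    a1 = a ^ neg_a
    b1 = b ^ neg_b
    # Candidate functions, indexed by the select value {L1, L0}.
    mux = [a1 & b1, a1 | b1, a1 ^ b1, a1 ^ b1]
    y1 = mux[(l1 << 1) | l0]
    # Optional output inversion.
    return y1 ^ neg_y

# Example: AND with inverted output behaves as NAND.
NAND_11 = mlu(1, 1, neg_a=0, neg_b=0, neg_y=1, l1=0, l0=0)
```

Sweeping all 128 input combinations through both this model and an HDL simulator and comparing the outputs is one simple way to carry out the functional simulation step.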
Circuit netlist
Mapping: LUT0, LUT1, LUT2, LUT3, LUT4, LUT5, FF1, FF2
Placing: FPGA CLB slices
Routing: programmable connections in the FPGA
The BIT file can be downloaded directly to the FPGA, or can be converted
Programming
Bit File
JTAG Testing
USER CONSTRAINTS
CONSTRAINTS
NET "CLOCK" LOC = "V10" | IOSTANDARD = "LVCMOS33";
NET "SEG<0>"
NET "SEG<1>"
VIRTEX 5 CLB ARCHITECTURE
PIPELINING IDEA
PIPELINE SOLUTION