Synthesis and Simulation Design Guide: UG626 (V 11.4) December 2, 2009
Design Guide
Xilinx is disclosing this user guide, manual, release note, and/or specification (the “Documentation”) to you
solely for use in the development of designs to operate with Xilinx hardware devices. You may not reproduce,
distribute, republish, download, display, post, or transmit the Documentation in any form or by any means
including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without the prior
written consent of Xilinx. Xilinx expressly disclaims any liability arising out of your use of the Documentation.
Xilinx reserves the right, at its sole discretion, to change the Documentation without notice at any time. Xilinx
assumes no obligation to correct any errors contained in the Documentation, or to advise you of any corrections
or updates. Xilinx expressly disclaims any liability in connection with technical support or assistance that may be
provided to you in connection with the Information.
THE DOCUMENTATION IS DISCLOSED TO YOU “AS-IS” WITH NO WARRANTY OF ANY KIND. XILINX
MAKES NO OTHER WARRANTIES, WHETHER EXPRESS, IMPLIED, OR STATUTORY, REGARDING
THE DOCUMENTATION, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE, OR NONINFRINGEMENT OF THIRD-PARTY RIGHTS. IN NO EVENT WILL
XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT, EXEMPLARY, SPECIAL, OR INCIDENTAL
DAMAGES, INCLUDING ANY LOSS OF DATA OR LOST PROFITS, ARISING FROM YOUR USE OF THE
DOCUMENTATION.
© Copyright 2002-2009 Xilinx Inc. All Rights Reserved. XILINX, the Xilinx logo, the Brand Window and other
designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their
respective owners. The PowerPC name and logo are registered trademarks of IBM Corp., and used under license.
Additional Resources
To find additional documentation, see the Xilinx website at:
http://www.xilinx.com/literature.
To search the Answer Database of silicon, software, and IP questions and answers, or to create a technical
support WebCase, see the Xilinx website at:
http://www.xilinx.com/support.
Conventions
This document uses the following conventions. An example illustrates each convention.
Typographical
The following typographical conventions are used in this document:
• Courier font: Messages, prompts, and program files that the system displays.
  Example: speed grade: - 100
• Courier bold: Literal commands that you enter in a syntactical statement.
  Example: ngdbuild design_name
• Helvetica bold:
  – Commands that you select from a menu. Example: File > Open
  – Keyboard shortcuts. Example: Ctrl+C
• Italic font:
  – Variables in a syntax statement for which you must supply values. Example: ngdbuild design_name
  – References to other manuals. Example: See the Command Line Tools User Guide for more information.
  – Emphasis in text. Example: If a wire is drawn so that it overlaps the pin of a symbol, the two nets are
    not connected.
• Square brackets [ ]: An optional entry or parameter. However, in bus specifications, such as bus[7:0],
  they are required.
  Example: ngdbuild [option_name] design_name
• Braces { }: A list of items from which you must choose one or more.
  Example: lowpwr ={on|off}
• Vertical bar |: Separates items in a list of choices.
  Example: lowpwr ={on|off}
• Vertical ellipsis: Repetitive material that has been omitted.
  Example:
  IOB #1: Name = QOUT
  IOB #2: Name = CLKIN
  .
  .
  .
• Horizontal ellipsis . . . : Repetitive material that has been omitted.
  Example: allow block . . . block_name loc1 loc2 ... locn;
Online Document
The following conventions are used in this document:
• Blue text: Cross-reference link.
  Examples:
  – See the section Additional Resources for details.
  – Refer to Title Formats in Chapter 1 for details.
  – See Figure 2-5 in the Virtex®-6 Handbook.
This chapter describes Hardware Description Language (HDL). This chapter includes:
• Advantages of Using a Hardware Description Language (HDL) to Design FPGA Devices
• Designing FPGA Devices With Hardware Description Language (HDL)
Designers use an HDL to describe the behavior and structure of system and circuit designs. Understanding
FPGA architecture allows you to create HDL code that effectively uses FPGA system features. To learn more
about designing FPGA devices with HDL:
• Enroll in training classes offered by Xilinx® and by synthesis tool vendors.
• Review the HDL design examples in this Guide.
• Download design examples from Xilinx Support.
• Take advantage of the many other resources offered by Xilinx, including:
– Documentation
– Tutorials
– Service packs
– Telephone hotline
– Answers database
For more information, see Additional Resources.
Designing Hierarchy
Using a Hardware Description Language (HDL) gives added flexibility in describing the design. Not all HDL
code is optimized the same. How and where the functionality is described can have dramatic effects on end
optimization. For example:
• Certain techniques may unnecessarily increase the design size and power while decreasing performance.
• Other techniques can result in more optimal designs in terms of any or all of those same metrics.
This Guide will help instruct you in techniques for optimal FPGA design methodologies.
Design hierarchy is important in both the implementation of an FPGA and during interactive changes. Some
synthesizers maintain the hierarchical boundaries unless you group modules together. Modules should have
registered outputs so their boundaries are not an impediment to optimization. Otherwise, modules should be as
large as possible within the limitations of your synthesis tool.
The “5,000 gates per module” rule is no longer valid, and can interfere with optimization. Check with your
synthesis vendor for the preferred module size. As a last resort, use the grouping commands of your synthesizer,
if available. The size and content of the modules influence synthesis results and design implementation. This
Guide describes how to create effective design hierarchy.
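As a minimal sketch of a module with registered outputs, consider the following Verilog. The module name, port names, and widths are hypothetical and chosen only for illustration.

```verilog
// Hypothetical leaf module: names and widths are illustrative only.
module adder_stage (
    input  wire       clk,
    input  wire [7:0] a,
    input  wire [7:0] b,
    output reg  [8:0] sum   // registered output at the module boundary
);
    // Combinatorial logic stays inside the module.
    wire [8:0] sum_comb = a + b;

    // The result is registered before it leaves the hierarchy, so no
    // combinatorial path crosses the module boundary.
    always @(posedge clk)
        sum <= sum_comb;
endmodule
```

Because the only signal leaving the module comes directly from a register, a synthesis tool that preserves hierarchy does not need to optimize logic across this boundary.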
Architecture Wizard
Use Architecture Wizard to configure advanced features of Xilinx® devices. Architecture Wizard consists of
several components for configuring specific device features. Each component functions as an independent
wizard. For more information, see Architecture Wizard Components.
Architecture Wizard creates a VHDL, Verilog, or Electronic Data Interchange Format (EDIF) file, depending on
the flow type passed to it. The generated Hardware Description Language (HDL) output is a module consisting
of one or more primitives and the corresponding properties, and not just a code snippet. This allows the output
file to be referenced from the HDL Editor. No User Constraints File (UCF) is output, since the necessary
attributes are embedded inside the HDL file.
Clocking Wizard
The Clocking Wizard enables:
• Digital clock setup
• DCM and clock buffer viewing
• DRC checking
RocketIO Wizard
The RocketIO Wizard enables serial connectivity between devices, backplanes, and subsystems.
The RocketIO Wizard allows you to:
• Specify RocketIO type
• Define Channel Bonding options
• Specify General Transmitter Settings, including encoding, CRC, and clock
• Specify General Receiver Settings, including encoding, CRC, and clock
• Specify Synchronization
• Specify Equalization and signal integrity settings (resistor, termination mode ...)
• View and edit component attributes
• View and edit component constraints
• Automatically place one component in the XAW file
• Save component settings to a VHDL file or Verilog file
ChipSync Wizard
The ChipSync Wizard applies to Virtex-4 devices and Virtex-5 devices only.
The ChipSync Wizard:
• Facilitates the implementation of high-speed source synchronous applications.
• Configures a group of I/O blocks into an interface for use in memory, networking, or any other type
of bus interface.
• Creates Hardware Description Language (HDL) code with these features configured according to your input:
– Width and IO standard of data, address, and clocks for the interface
– Additional pins such as reference clocks and control pins
– Adjustable input delay for data and clock pins
– Clock buffers (BUFIO) for input clocks
– ISERDES/OSERDES or IDDR/ODDR blocks to control the width of data, clock enables, and tristate
signals to the fabric
VHO Files
VHDL template (VHO) files contain code that can be used as a model for instantiating a CORE
Generator software module in a VHDL design. VHO files come with a VHDL (VHD) wrapper file.
VEO Files
Verilog template (VEO) files contain code that can be used as a model for instantiating a CORE Generator
software module in a Verilog design. VEO files come with a Verilog (V) wrapper file.
Some cores may generate actual source code or an additional top level Hardware Description Language (HDL)
wrapper with clocking resource and Input Output Block (IOB) instances to enable you to tailor your clocking
scheme to your own requirements. For more information, see the core-specific documentation.
The V (Verilog) and VHD (VHDL) wrapper files mainly support simulation and are not synthesizable.
Synplify Commands
Function                                          Command
Start a new project                               project -new
Set device options                                set_option -technology virtex
                                                  set_option -part XCV50E
                                                  set_option -package CS144
                                                  set_option -speed_grade -8
Add file options                                  add_file -constraint watch.sdc
                                                  add_file -vhdl -lib work macro1.vhd
                                                  add_file -vhdl -lib work macro2.vhd
                                                  add_file -vhdl -lib work top_level.vhd
Set compilation and mapping options               set_option -default_enum_encoding onehot
                                                  set_option -symbolic_fsm_compiler true
                                                  set_option -resource_sharing true
Set simulation options                            set_option -write_verilog false
                                                  set_option -write_vhdl false
Set automatic Place and Route (vendor) options    set_option -write_apr_cnstrnt true
                                                  set_option -part XCV50E
                                                  set_option -package CS144
                                                  set_option -speed_grade -8
Set result format and file options                project -result_format edif
                                                  project -result_file top_level.edf
                                                  project -run
                                                  project -save "watch.prj"
Exit                                              exit
Reading Cores
The synthesis tools discussed in this section support incorporating the information in the CORE Generator™
software NDF files when performing design timing and area analysis.
Including the IP core NDF files in a design when analyzing a design results in better timing and resource
optimizations for the surrounding logic. The NDF is used to estimate the delay through the logic elements
associated with the IP core. The synthesis tools do not optimize the IP core itself, nor do they integrate the IP
core netlist into the synthesized design output netlist.
Setting Constraints
Setting constraints:
• Allows you to control timing optimization
• Uses synthesis tools and implementation processes more efficiently
• Helps minimize runtime and achieve your design requirements
The Precision RTL Synthesis and Synplify constraint editing tools allow you to apply constraints to your
Hardware Description Language (HDL) design.
These reports are usually accurate because the synthesis tool creates the logic from your code and maps your
design into the FPGA device. These reports are different for the various synthesis tools. Some reports specify
the minimum number of CLBs required, while other reports specify the unpacked number of CLBs to make
an allowance for routing. For an accurate comparison, compare reports from the Xilinx® mapper tool after
implementation.
Any instantiated components, such as CORE Generator™ software modules, Electronic Data Interchange Format
(EDIF) files, or other components that your synthesis tool does not recognize during compilation, are not included
in the report file. If you include these components, you must include the logic area used by these components
when estimating design size. Sections may be trimmed during mapping, resulting in a smaller design.
Use the timing report command of your synthesis tool to obtain a report with estimated data path delays.
For more information, see your synthesis tool documentation.
The timing report is based on the logic level delays from the cell libraries and estimated wire-load models. While
this report estimates how close you are to your timing goals, it is not the actual timing. An accurate timing report
is available only after the design is placed and routed.
A typical design for a Virtex®-4 device or a Virtex-5 device should allow 40% of the delay for logic, and 60%
of the delay for routing. If most of your time is taken by logic, the design will probably not meet timing after
Place and Route.
Each device family has a unique set of system features. For more information about the system features available
for your target device, see the device data sheet.
Timing Simulation
Timing simulation is important in verifying circuit operation after the worst-case placed and routed delays are
calculated. In many cases, you can use the same test bench that you used for functional simulation to perform
a more accurate simulation with less effort. Compare the results from the two simulations to verify that your
design is performing as initially specified. The Xilinx® tools create a VHDL or Verilog simulation netlist of your
placed and routed design, and provide libraries that work with many common Hardware Description Language
(HDL) simulators. For more information, see Simulating Your Design.
Timing-driven PAR is based on TRACE, the Xilinx timing analysis tool. TRACE is an integrated static timing
analysis tool, and does not depend on input stimulus to the circuit. Placement and routing are executed according to
the timing constraints that you specified at the beginning of the design process. TRACE interacts with PAR to
make sure that the timing constraints you imposed are met.
If there are timing constraints, TRACE generates a report based on those constraints. If there are no timing
constraints, TRACE can optionally generate a timing report containing:
• An analysis that enumerates all clocks and the required OFFSETs for each clock
• An analysis of paths having only combinatorial logic, ordered by delay
For more information on TRACE, see the Command Line Tools User Guide. For more information on Timing
Analysis, see the ISE® Design Suite Timing Analyzer Help.
Reserved Names
The following FPGA resource names are reserved. Do not use them to name nets or components.
• Device architecture names (such as CLB, IOB, PAD, and Slice)
• Dedicated pin names (such as CLK and INIT)
• GND and VCC
• UNISIM primitive names such as BUFG, DCM, and RAMB16
• Pin names such as P1 or A4 (do not use these for component names)
For language-specific naming restrictions, see the VHDL and Verilog reference manuals. Xilinx® does not
recommend using escape sequences for illegal characters. If you plan to import schematics, or to use mixed
language synthesis or verification, use the most restrictive character set.
Since Verilog is case sensitive, module and instance names can be made unique by changing their capitalization.
For compatibility with file names, mixed language support, and other tools, Xilinx recommends that you rely on
more than just capitalization to make instances unique.
Naming Identifiers
To make design code easier to debug and reuse:
• Use concise but meaningful identifier names.
• Use meaningful names for wires, regs, signals, variables, types, and any identifier in the code.
Example: CONTROL_reg
• Use underscores to make the identifiers easier to read.
Instantiating Sub-Modules
Xilinx® recommends the following when instantiating sub-modules:
• Use named association. Named association prevents incorrect connections for the ports of instantiated
components.
• Never combine positional and named association in the same statement.
• Use one port mapping per line to:
– Improve readability
– Provide space for a comment
– Allow for easier modification
Instantiating Sub-Modules VHDL Coding Example
FDCPE_inst : FDCPE
generic map (
  INIT => '0') -- Initial value of register ('0' or '1')
port map (
  Q => Q,     -- Data output
  C => C,     -- Clock input
  CE => CE,   -- Clock enable input
  CLR => CLR, -- Asynchronous clear input
  D => D,     -- Data input
  PRE => PRE  -- Asynchronous set input
);
Instantiating Sub-Modules Verilog Coding Example
FDCPE #(
  .INIT(1'b0) // Initial value of register (1'b0 or 1'b1)
) FDCPE_inst (
  .Q(Q),     // Data output
  .C(C),     // Clock input
  .CE(CE),   // Clock enable input
  .CLR(CLR), // Asynchronous clear input
  .D(D),     // Data input
  .PRE(PRE)  // Asynchronous set input
);
Specifying Constants
Use constants in your design to substitute numbers with more meaningful names. Constants make a design
more readable and portable.
Specifying constants can be a form of in-code documentation that allows for easier understanding of code
function.
• For VHDL, Xilinx® recommends not using variables for constants. Define constant numeric values as
constants, and use them by name.
• For Verilog, parameters can be used as constants in the code in a similar manner. This coding convention
allows you to easily determine if several occurrences of the same literal value have the same meaning.
In the following coding examples, the OPCODE values are declared as constants or parameters, and the names
refer to their function. This method produces readable code that is easier to understand and modify.
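As a minimal sketch of the Verilog parameter approach, the following module declares opcode values as named parameters. The module name, the 2-bit opcode width, the operation names, and their encodings are all assumptions made for illustration.

```verilog
// Hypothetical ALU decode: names, widths, and encodings are illustrative only.
module opcode_decode (
    input  wire [1:0] opcode,
    input  wire [3:0] a,
    input  wire [3:0] b,
    output reg  [3:0] result
);
    // Named constants replace bare literals, so each occurrence of a value
    // states what it means.
    parameter [1:0] OP_AND = 2'b00,
                    OP_OR  = 2'b01,
                    OP_ADD = 2'b10,
                    OP_SUB = 2'b11;

    always @(*) begin
        case (opcode)
            OP_AND:  result = a & b;
            OP_OR:   result = a | b;
            OP_ADD:  result = a + b;
            OP_SUB:  result = a - b;
            default: result = 4'b0000;
        endcase
    end
endmodule
```

If an encoding changes, only the parameter declaration needs to be edited. The case items continue to refer to the operation by name.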
Various synthesis and simulation directives can allow the asynchronous FIFO to behave in a known manner
when testing asynchronous conditions.
• In many cases, a timing violation cannot be avoided when designing FIFO flag logic. If a timing violation
occurs during timing simulation, the simulator produces an unknown (X) output to indicate the unknown
state. For this reason, if logic is being driven from a known asynchronous source, and the proper design
precautions were made to ensure proper operation regardless of the violation, Xilinx recommends adding
the ASYNC_REG=TRUE attribute to the associated flag register. This indicates that the register can safely
receive asynchronous input. Timing violations on the register no longer result in an X. Instead, the register
maintains its previous value. This can also prevent the software from replicating the register, or performing other
optimizations that can have a negative effect on the register operation. For more information, see Disabling X
Propagation for Synchronous Elements.
• A memory collision may take place when a read occurs at the same time as a write to the same memory
location. Memory collisions should generally be avoided, since they can corrupt the read data. The memory
collision can be safely ignored only if the read data is disregarded in the logic or design. In those rare cases,
you can disable collision checking with the SIM_COLLISION_CHECK attribute on the RAM model. For more
information, see Disabling BlockRAM Collision Checks for Simulation.
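The ASYNC_REG recommendation above can be sketched in Verilog using a Verilog-2001 attribute on the registers that receive the asynchronous signal. The module and signal names are hypothetical, and whether the attribute is passed through on inferred registers depends on your synthesis tool, so treat this as an illustration rather than a definitive implementation.

```verilog
// Hypothetical two-stage synchronizer for a flag that crosses clock domains.
module async_flag_sync (
    input  wire clk,         // destination clock
    input  wire async_flag,  // flag generated in another clock domain
    output wire flag_out     // flag synchronized to clk
);
    // Mark both stages as able to receive asynchronous input.
    (* ASYNC_REG = "TRUE" *) reg sync_0;
    (* ASYNC_REG = "TRUE" *) reg sync_1;

    always @(posedge clk) begin
        sync_0 <= async_flag;  // may violate setup/hold in timing simulation
        sync_1 <= sync_0;      // second stage gives the first time to settle
    end

    assign flag_out = sync_1;
endmodule
```

Check the synthesis and MAP reports to confirm that the attribute reached the intended registers.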
For more information, see Using Partitions in the ISE® Design Suite Help.
Declaring Ports
Use the std_logic type for all entity port declarations. The std_logic type makes it easier to integrate the
synthesized netlist back into the design hierarchy without requiring conversion functions for the ports. The
following VHDL coding example uses std_logic for port declarations:
Entity alu is
port(
A : in STD_LOGIC_VECTOR(3 downto 0);
B : in STD_LOGIC_VECTOR(3 downto 0);
CLK : in STD_LOGIC;
C : out STD_LOGIC_VECTOR(3 downto 0)
);
end alu;
If a top-level port is specified as a type other than std_logic, software generated simulation models (such as
timing simulation) may no longer bind to the test bench. This is due to the following factors:
• Type information cannot be stored for a particular design port.
• Simulation of FPGA hardware requires the ability to specify the values of std_logic such as high-Z
(tristate), and X (unknown) in order to properly display hardware behavior.
Because signal C is used both internally and as an output port, every level of hierarchy in your design that
connects to port C must be declared as a buffer. Buffer types are not commonly used in VHDL designs because
they can cause errors during synthesis.
Using `timescale
Attention This section applies to Verilog only.
All Verilog test bench and source files should contain a `timescale directive, or reference an include file
containing a `timescale directive. Place the `timescale directive or reference near the beginning of the
source file, and before any module or other design unit definitions in the source file.
Xilinx® recommends that you use a `timescale with a resolution of 1ps. Some Xilinx primitive components
such as DCM require a 1ps resolution in order to work properly in either functional or timing simulation. There is
little or no simulation speed difference for a 1ps resolution as compared to a coarser resolution.
The following directive is a typical default:
`timescale 1ns/1ps
always @ (*)
begin
  case (SEL)
    2'b00: MUX_OUT = A;
    2'b01: MUX_OUT = B;
    2'b10: MUX_OUT = C;
    2'b11: MUX_OUT = D;
    default: MUX_OUT = 0;
  endcase
end
endmodule
In order to avoid these two problems, synthesis may assume that the sensitivity list contains other signals which
were not explicitly listed in the HDL code. As a result, while you will get the hardware you intended, the RTL
and post-synthesis simulation will differ. In this case, some synthesis tools may issue a message warning of an
incomplete sensitivity list. In that event, check the synthesis log file and, if necessary, fix the RTL code.
The following example describes a simple AND function using a process and always block. The sensitivity list is
complete and a single LUT is generated.
The following examples are based on the previous two coding examples, but signal b is omitted from the
sensitivity list. In this case, the synthesis tool assumes the presence of b in the sensitivity list and still generates
the combinatorial logic (AND function).
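The two cases described above can be sketched in Verilog as follows. The module names and the output name c are assumptions made for illustration.

```verilog
// Complete sensitivity list: simulation and synthesized hardware agree.
module and_complete (
    input  wire a,
    input  wire b,
    output reg  c
);
    always @(a or b)
        c = a & b;
endmodule

// Signal b omitted from the sensitivity list. In RTL simulation, c is
// re-evaluated only when a changes. Synthesis typically still builds a
// plain AND gate, so RTL and post-synthesis simulation can differ.
module and_incomplete (
    input  wire a,
    input  wire b,
    output reg  c
);
    always @(a)
        c = a & b;
endmodule
```

Using always @(*) in Verilog-2001 avoids the problem, because the simulator derives the sensitivity list from the signals read in the block.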
Do not use the After XX ns statement in your VHDL code or the Delay assignment in your Verilog code.
XX specifies the number of nanoseconds that must pass before a condition is executed. This statement is
usually ignored by the synthesis tool. In this case, the functionality of the simulated design does not match
the functionality of the synthesized design.
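As an illustration of the kind of statement to avoid, the following hypothetical Verilog register uses an intra-assignment delay. The module name and the 5 ns value are assumptions.

```verilog
`timescale 1ns/1ps

// Not recommended: the delay exists only in simulation.
module delay_mismatch (
    input  wire clk,
    input  wire d,
    output reg  q
);
    always @(posedge clk)
        q <= #5 d;  // simulator updates q 5 ns after the clock edge;
                    // synthesis usually ignores the "#5"
endmodule
```

The simulated q changes 5 ns after each rising clock edge, while the synthesized register has whatever clock-to-out delay the device and routing provide.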
Flip-Flop with Positive Edge Clock and Clock Enable VHDL Coding Example
process (C)
begin
  if (C'event and C='1') then
    if (CE='1') then
      Q <= D;
    end if;
  end if;
end process;
Flip-Flop with Positive Edge Clock and Clock Enable Verilog Coding Example
always @(posedge C)
begin
  if (CE)
    Q <= D;
end
Flip-Flop with Negative Edge Clock and Asynchronous Reset VHDL Coding Example
process (C, CLR)
begin
  if (CLR = '1') then
    Q <= '0';
  elsif (C'event and C='0') then
    Q <= D;
  end if;
end process;
Flip-Flop with Negative Edge Clock and Asynchronous Reset Verilog Coding Example
always @(negedge C or posedge CLR)
begin
  if (CLR)
    Q <= 1'b0;
  else
    Q <= D;
end
Flip-Flop with Positive Edge Clock and Synchronous Set VHDL Coding Example
process (C)
begin
  if (C'event and C='1') then
    if (S='1') then
      Q <= '1';
    else
      Q <= D;
    end if;
  end if;
end process;
Flip-Flop with Positive Edge Clock and Synchronous Set Verilog Coding Example
always @(posedge C)
begin
  if (S)
    Q <= 1'b1;
  else
    Q <= D;
end
Dual-Data Rate (DDR) Input Output Block (IOB) Registers VHDL Coding Example
library ieee;
use ieee.std_logic_1164.all;
entity ddr_input is
port ( clk : in std_logic;
d : in std_logic;
rst : in std_logic;
q1 : out std_logic;
q2 : out std_logic
);
end ddr_input;
Dual-Data Rate (DDR) Input Output Block (IOB) Registers Verilog Coding Example
module ddr_input (
input data_in, clk, rst,
output data_out);
Many times this is done by mistake. The design may still appear to function properly in simulation. This can be
problematic for FPGA designs, since timing for paths containing latches can be difficult to analyze. Synthesis
tools usually report in the log files when a latch is inferred to alert you to this occurrence.
Xilinx® recommends that you avoid using latches in FPGA designs, due to the more difficult timing analyses
that take place when latches are used.
Some synthesis tools can determine the number of latches in your design.
For more information, see your synthesis tool documentation.
You should convert all if statements without corresponding else statements and without a clock edge to
registers or logic gates. Use the recommended coding styles in the synthesis tool documentation to complete this
conversion.
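As a minimal sketch of how a missing else branch infers a latch, and how completing the branch removes it, consider the following Verilog. The module and signal names are hypothetical.

```verilog
// Hypothetical example contrasting an inferred latch with plain logic.
module latch_vs_logic (
    input  wire en,
    input  wire d,
    output reg  q_latch,
    output reg  q_comb
);
    // No else branch and no clock edge: q_latch must hold its value when
    // en is 0, so a latch is inferred.
    always @(*) begin
        if (en)
            q_latch = d;
    end

    // Every path assigns q_comb, so only combinatorial logic is generated.
    always @(*) begin
        if (en)
            q_comb = d;
        else
            q_comb = 1'b0;
    end
endmodule
```

Assigning a default value at the top of the block, before the if statement, is another common way to guarantee that every path drives the output.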
In addition, SRLs have address inputs (LUT A3, A2, A1, A0 inputs for SRL16) determining the length of the
shift register. The shift register may be of a fixed, static length, or it may be dynamically adjusted. In dynamic
mode each time a new address is applied to the address pins, the new bit position value is available on the Q
output after the time delay to access the LUT.
As mentioned before, synchronous and asynchronous set/reset control signals are not available in the SRL
primitives. However, some synthesis tools can still take advantage of dedicated SRL resources and propose
an implementation that allows significant area savings.
For more information, see your synthesis tool documentation.
8-Bit Shift-Left Register Serial In and Serial Out VHDL Coding Example
library ieee;
use ieee.std_logic_1164.all;
entity shift_regs_1 is
port(C, SI : in std_logic;
SO : out std_logic);
end shift_regs_1;
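-- Architecture header and signal declaration reconstructed from the names used below
architecture archi of shift_regs_1 is
  signal tmp : std_logic_vector(7 downto 0);
begin
  process (C)
  begin
    if (C'event and C='1') then
      tmp <= tmp(6 downto 0) & SI;
    end if;
  end process;
  SO <= tmp(7);
end archi;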
8-Bit Shift-Left Register Serial In and Serial Out Verilog Coding Example
module v_shift_regs_1 (C, SI, SO);
input C,SI;
output SO;
reg [7:0] tmp;
always @(posedge C)
begin
  tmp <= {tmp[6:0], SI}; // non-blocking assignment for clocked logic
end
assign SO = tmp[7];
endmodule
16-Bit Dynamic Shift Register With Serial In and Serial Out VHDL Coding Example
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
entity dynamic_shift_regs_1 is
port(CLK : in std_logic;
DATA : in std_logic;
CE : in std_logic;
A : in std_logic_vector(3 downto 0);
Q : out std_logic);
end dynamic_shift_regs_1;
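-- Architecture header and declarations reconstructed from the names used below;
-- the depth of 16 is assumed from the 4-bit address A
architecture rtl of dynamic_shift_regs_1 is
  constant DEPTH_WIDTH : integer := 16;
  signal SRL_SIG : std_logic_vector(0 to DEPTH_WIDTH-1);
begin
  PROC_SRL16 : process (CLK)
  begin
    if (CLK'event and CLK = '1') then
      if (CE = '1') then
        SRL_SIG <= DATA & SRL_SIG(0 to DEPTH_WIDTH-2);
      end if;
    end if;
  end process;
  Q <= SRL_SIG(conv_integer(A));
end rtl;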
16-Bit Dynamic Shift Register With Serial In and Serial Out Verilog Coding Example
module v_dynamic_shift_regs_1 (Q,CE,CLK,D,A);
input CLK, D, CE;
input [3:0] A;
output Q;
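reg [15:0] data;
assign Q = data[A];
// Shift logic reconstructed to match the VHDL example above
always @(posedge CLK)
begin
  if (CE == 1'b1)
    data <= {data[14:0], D};
end
endmodule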
Control Signals
This section discusses Control Signals, and includes:
• Set, Resets, and Synthesis Optimization
• Asynchronous Resets Coding Examples
• Synchronous Resets Coding Examples
• Using Clock Enable Pin Instead of Gated Clocks
• Converting the Gated Clock to a Clock Enable
The synthesis tool now has more flexibility as to how this function can be represented. For a possible
implementation of this code, see the following diagram.
In this implementation, the synthesis tool can identify that any time A is active high, Q is always a logic one.
With the register now configured with the set/reset as a synchronous operation, the set is now free to be used
as part of the synchronous data path. This reduces:
• The amount of logic necessary to implement the function
• The data path delays for the D and E signals
Logic could also have been shifted to the reset side if the code had been written in a way that made that the
more beneficial implementation.
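The code that this discussion refers to is not reproduced in this section. As a hedged sketch of the kind of description being discussed, the following Verilog uses a synchronous reset and a data function in which A alone forces Q high. The module name, the signal names, and the exact logic function are assumptions.

```verilog
// Hypothetical register with synchronous reset; the function is illustrative.
module sync_set_reset_sketch (
    input  wire CLK,
    input  wire RST,
    input  wire A,
    input  wire B,
    input  wire C,
    input  wire D,
    input  wire E,
    output reg  Q
);
    always @(posedge CLK) begin
        if (RST)
            Q <= 1'b0;                 // synchronous reset
        else
            Q <= A | (B & C & D & E);  // A high always yields Q = 1
    end
endmodule
```

Because the reset is synchronous, a synthesis tool is free to map A onto the synchronous set input of the register and RST onto the synchronous reset input, leaving only B, C, D, and E for the LUT in front of the D input.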
Since eight signals now contribute to the logic function, a minimum of three LUTs are needed to implement this
function. For a possible implementation of this code, see the following diagram.
Initial State of the Registers and Latches VHDL Coding Example One
signal register1 : std_logic := '0'; -- specifying register1 to start as a zero
signal register2 : std_logic := '1'; -- specifying register2 to start as a one
signal register3 : std_logic_vector(3 downto 0) := "1011"; -- specifying INIT value for 4-bit register
Initial State of the Registers and Latches Verilog Coding Example One
reg register1 = 1'b0; // specifying register1 to start as a zero
reg register2 = 1'b1; // specifying register2 to start as a one
reg [3:0] register3 = 4'b1011; // specifying INIT value for 4-bit register
Initial State of the Registers and Latches Verilog Coding Example Two
Another possibility in Verilog is to use an initial statement:
reg [3:0] register3;
initial begin
  register3 = 4'b1011;
end
Not all synthesis tools support this initialization. To determine whether it is supported, see your synthesis
tool documentation. If this initialization is not supported, or if it is not specified in the code, the initial value
is determined by the presence or absence of an asynchronous preset in the code. If an asynchronous preset is
present, the register initializes to a one. If an asynchronous preset is not present, the register initializes to a
logic zero.
Multiplexers
You can implement multiplexers on Xilinx® FPGA devices by using:
• Dedicated resources such as MUXF5 and MUXF6
• Carry chains
• LUTs only
The synthesis tool makes the implementation choice automatically, driven by speed or area design
requirements. However, some synthesis tools allow you to control the implementation style of multiplexers.
entity multiplexers_2 is
port (a, b, c, d : in std_logic;
s : in std_logic_vector (1 downto 0);
o : out std_logic);
end multiplexers_2;
always @(a or b or c or d or s)
begin
  case (s)
    2'b00 : o = a;
    2'b01 : o = b;
    2'b10 : o = c;
    default : o = d;
  endcase
end
endmodule
entity multiplexers_1 is
port (a, b, c, d : in std_logic;
s : in std_logic_vector (1 downto 0);
o : out std_logic);
end multiplexers_1;
always @(a or b or c or d or s)
begin
  if (s == 2'b00) o = a;
  else if (s == 2'b01) o = b;
  else if (s == 2'b10) o = c;
  else o = d;
end
endmodule
For a Hardware Description Language (HDL), process (VHDL) and always blocks (Verilog) are the best ways to
describe FSM components. Xilinx® uses process to refer to both VHDL processes and Verilog always blocks.
Your description may contain one, two, or three processes, depending on how you consider and decompose the
different parts of the preceding model.
The following example shows the Moore Machine with an Asynchronous Reset (RESET):
• 4 states: s1, s2, s3, s4
• 5 transitions
• 1 input: "x1"
• 1 output: "outp"
This model is represented by the following Bubble Diagram.
Bubble Diagram
Finite State Machine (FSM) With One Process VHDL Coding Example
---- State Machine with a single process.
--
library IEEE;
use IEEE.std_logic_1164.all;
entity fsm_1 is
port ( clk, reset, x1 : IN std_logic;
outp : OUT std_logic);
end entity;
-- Architecture header and state type reconstructed from the names used below
architecture beh1 of fsm_1 is
  type state_type is (s1, s2, s3, s4);
  signal state : state_type;
begin
  process (clk, reset)
  begin
    if (reset = '1') then
      state <= s1;
      outp <= '1';
    elsif (clk = '1' and clk'event) then
      case state is
        when s1 => if x1 = '1' then
                     state <= s2;
                     outp <= '1';
                   else
                     state <= s3;
                     outp <= '0';
                   end if;
        when s2 => state <= s4; outp <= '0';
        when s3 => state <= s4; outp <= '0';
        when s4 => state <= s1; outp <= '1';
      end case;
    end if;
  end process;
end beh1;
Finite State Machine (FSM) With a Single Always Block Verilog Coding Example
//
// State Machine with a single always block.
//
module v_fsm_1 (clk, reset, x1, outp);
input clk, reset, x1;
output outp;
reg outp;
reg [1:0] state;
initial begin
  state = 2'b00;
end
In VHDL, the type of a state register can be a different type, such as:
• integer
• bit_vector
• std_logic_vector
Xilinx® recommends that you use an enumerated type containing all possible state values, and declare your
state register with that type. This method was used in the previous VHDL Coding Example.
In Verilog, the type of state register can be an integer or a set of defined parameters. Xilinx recommends using a
set of defined parameters for state register definition. This method was used in the previous Verilog coding example.
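As a sketch of a state register defined with a set of parameters, the following Verilog mirrors the single-process Moore machine described above (four states, asynchronous reset, input x1, output outp). The module name and the binary state encodings are assumptions, and a synthesis tool may re-encode the states.

```verilog
// Hypothetical single-always-block FSM; state encodings are illustrative.
module fsm_param_sketch (clk, reset, x1, outp);
    input  clk, reset, x1;
    output outp;
    reg    outp;
    reg [1:0] state;

    // State values declared as named parameters.
    parameter s1 = 2'b00, s2 = 2'b01, s3 = 2'b10, s4 = 2'b11;

    always @(posedge clk or posedge reset) begin
        if (reset) begin
            state <= s1;
            outp  <= 1'b1;
        end else begin
            case (state)
                s1: if (x1) begin state <= s2; outp <= 1'b1; end
                    else    begin state <= s3; outp <= 1'b0; end
                s2: begin state <= s4; outp <= 1'b0; end
                s3: begin state <= s4; outp <= 1'b0; end
                s4: begin state <= s1; outp <= 1'b1; end
            endcase
        end
    end
endmodule
```

Because the transitions refer to states by name, changing the encoding only requires editing the parameter declaration.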
A Finite State Machine (FSM) With One Process can be described with three processes using the FSM
decomposition shown in the following diagram.
Implementing Memory
Xilinx® FPGA devices provide two types of RAM:
• Distributed RAM (SelectRAM)
• Block RAM (Block SelectRAM)
Method                    Advantages                        Disadvantages
CORE Generator software   Gives more control over the RAM   • May complicate design migration from one FPGA family to another
                          creation                          • Slower simulation compared to inference
Instantiation             Offers the most control over      • Limits and complicates design migration from one FPGA family to another
                          the implementation                • Requires multiple instantiations to properly create the right RAM
                                                              configuration
Block and Distributed RAMs offer synchronous write capabilities. Read operation of the Block RAM is
synchronous, while the distributed RAM can be configured for either asynchronous or synchronous reads.
In general, the selection of distributed RAM versus block RAM depends on the size of the RAM. If the RAM
is not very deep, it is generally advantageous to use the distributed RAM. If you require a deeper RAM, it is
generally more advantageous to use the block memory.
If a memory description can be implemented using Block and Distributed RAM resources, the synthesis tool
automatically chooses how to implement RAM. This choice is driven by RAM size, speed, and area design
requirements. If the automatic implementation choice does not meet your requirements, synthesis tools offer
dedicated constraints allowing you to select the RAM type.
For more information, see your synthesis tool documentation.
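The following sketch shows one way such a constraint might be attached directly in Verilog, using the XST RAM_STYLE constraint as a Verilog-2001 attribute on the memory array. The module name and the port name dout are assumptions for illustration. Other synthesis tools use different attribute names, so check the attribute and its legal values against your tool's documentation.

```verilog
// Sketch only: steering RAM implementation with a synthesis attribute.
// ram_style is the XST constraint name; other tools use different names.
module v_ram_style_sketch (clk, we, addr, di, dout);
  input clk, we;
  input [5:0] addr;
  input [15:0] di;
  output [15:0] dout;
  reg [15:0] dout;

  // Request distributed RAM; "block" requests block RAM instead
  (* ram_style = "distributed" *)
  reg [15:0] RAM [63:0];

  always @(posedge clk) begin
    if (we)
      RAM[addr] <= di;
    dout <= RAM[addr];   // synchronous read
  end
endmodule
```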
Since all Xilinx RAMs have the ability to be initialized, the RAMs may also be configured either as a ROM
(Read Only Memory), or as a RAM with pre-defined contents. Initialization of RAMs can be done directly
from HDL code.
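A minimal sketch of initialization from HDL code is shown below, assuming the synthesis tool accepts an initial block on the memory array. The module name, the port names, and the eight data values are purely illustrative.

```verilog
// Sketch only: an 8 x 8 ROM whose contents are set in an initial block.
// Names and data values are illustrative assumptions.
module v_rom_init_sketch (clk, en, addr, data);
  input clk, en;
  input [2:0] addr;
  output [7:0] data;
  reg [7:0] data;
  reg [7:0] ROM [7:0];

  // Pre-defined contents
  initial begin
    ROM[0] = 8'h01; ROM[1] = 8'h02; ROM[2] = 8'h04; ROM[3] = 8'h08;
    ROM[4] = 8'h10; ROM[5] = 8'h20; ROM[6] = 8'h40; ROM[7] = 8'h80;
  end

  // Synchronous read; there is no write port, so the array behaves as a ROM
  always @(posedge clk)
    if (en)
      data <= ROM[addr];
endmodule
```

Adding a write port to the same description would turn it into a RAM with pre-defined contents.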
Some synthesis tools provide additional control over the RAM inference and optimization process, such as
pipelining, automatic Block RAM packing, and automatic Block RAM resource management.
For more information, see your synthesis tool documentation.
For additional information about Implementing Memory, see:
• Block RAM Inference
• Distributed RAM Inference
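library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
entity rams_01 is
  port (clk : in std_logic;
        we : in std_logic;
        en : in std_logic;
        addr : in std_logic_vector(5 downto 0);
        di : in std_logic_vector(15 downto 0);
        do : out std_logic_vector(15 downto 0));
end rams_01;
architecture syn of rams_01 is
  -- 64 words of 16 bits, matching the addr and di port widths
  type ram_type is array (63 downto 0) of std_logic_vector(15 downto 0);
  signal RAM : ram_type;
begin
  process (clk)
  begin
    if clk'event and clk = '1' then
      if en = '1' then
        if we = '1' then
          RAM(conv_integer(addr)) <= di;
        end if;
        -- Read-first: the old contents are read on a write cycle
        do <= RAM(conv_integer(addr));
      end if;
    end if;
  end process;
end syn;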
input clk;
input we;
input en;
input [5:0] addr;
input [15:0] di;
output [15:0] do;
reg [15:0] RAM [63:0];
reg [15:0] do;
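library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
entity rams_02a is
  port (clk : in std_logic;
        we : in std_logic;
        en : in std_logic;
        addr : in std_logic_vector(5 downto 0);
        di : in std_logic_vector(15 downto 0);
        do : out std_logic_vector(15 downto 0));
end rams_02a;
architecture syn of rams_02a is
  type ram_type is array (63 downto 0) of std_logic_vector(15 downto 0);
  signal RAM : ram_type;
begin
  process (clk)
  begin
    if clk'event and clk = '1' then
      if en = '1' then
        if we = '1' then
          RAM(conv_integer(addr)) <= di;
          -- Write-first: the new data appears on the output
          do <= di;
        else
          do <= RAM(conv_integer(addr));
        end if;
      end if;
    end if;
  end process;
end syn;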
input clk;
input we;
input en;
input [5:0] addr;
input [15:0] di;
output [15:0] do;
reg [15:0] RAM [63:0];
reg [15:0] do;
entity rams_02b is
port (clk : in std_logic;
we : in std_logic;
en : in std_logic;
addr : in std_logic_vector(5 downto 0);
di : in std_logic_vector(15 downto 0);
do : out std_logic_vector(15 downto 0));
end rams_02b;
do <= ram(conv_integer(read_addr));
end syn;
input clk;
input we;
input en;
input [5:0] addr;
input [15:0] di;
output [15:0] do;
reg [15:0] RAM [63:0];
reg [5:0] read_addr;
assign do = RAM[read_addr];
endmodule
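library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
entity rams_03 is
  port (clk : in std_logic;
        we : in std_logic;
        en : in std_logic;
        addr : in std_logic_vector(5 downto 0);
        di : in std_logic_vector(15 downto 0);
        do : out std_logic_vector(15 downto 0));
end rams_03;
architecture syn of rams_03 is
  type ram_type is array (63 downto 0) of std_logic_vector(15 downto 0);
  signal RAM : ram_type;
begin
  process (clk)
  begin
    if clk'event and clk = '1' then
      if en = '1' then
        if we = '1' then
          -- The output keeps its previous value during a write
          RAM(conv_integer(addr)) <= di;
        else
          do <= RAM(conv_integer(addr));
        end if;
      end if;
    end if;
  end process;
end syn;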
input clk;
input we;
input en;
input [5:0] addr;
input [15:0] di;
output [15:0] do;
reg [15:0] RAM [63:0];
reg [15:0] do;
endmodule
Dual-Port RAM in Read-First Mode With One Write Port Pin Descriptions
IO Pins Description
clka, clkb Positive-Edge Clock
ena Primary Global Enable (Active High)
enb Dual Global Enable (Active High)
wea Primary Synchronous Write
addra Write Address/Primary Read Address
addrb Dual Read Address
dia Primary Data Input
doa Primary Output Port
dob Dual Output Port
Dual-Port RAM in Read-First Mode with One Write Port VHDL Coding Example
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
entity rams_01_1 is
  port (clka, clkb : in std_logic;
        wea : in std_logic;
        ena, enb : in std_logic;
        addra, addrb : in std_logic_vector(5 downto 0);
        dia : in std_logic_vector(15 downto 0);
        doa, dob : out std_logic_vector(15 downto 0));
end rams_01_1;
architecture syn of rams_01_1 is
  type ram_type is array (63 downto 0) of std_logic_vector(15 downto 0);
  signal RAM : ram_type;
begin
  -- Port A: write port and primary read port
  process (clka)
  begin
    if clka'event and clka = '1' then
      if ena = '1' then
        if wea = '1' then
          RAM(conv_integer(addra)) <= dia;
        end if;
        doa <= RAM(conv_integer(addra));
      end if;
    end if;
  end process;
  -- Port B: read-only port
  process (clkb)
  begin
    if clkb'event and clkb = '1' then
      if enb = '1' then
        dob <= RAM(conv_integer(addrb));
      end if;
    end if;
  end process;
end syn;
Dual-Port RAM in Read-First Mode with One Write Port Verilog Coding Example
module v_rams_01_1 (clka, clkb, ena, enb, wea, addra, addrb, dia, doa, dob);
endmodule
Dual-Port Block RAM in Read-First Mode With Two Write Ports Pin Descriptions
IO Pins Description
clka, clkb Positive-Edge Clock
ena Primary Global Enable (Active High)
enb Dual Global Enable (Active High)
wea, web Primary Synchronous Write Enable (Active High)
addra Write Address/Primary Read Address
addrb Dual Read Address
dia Primary Data Input
dib Dual Data Input
doa Primary Output Port
dob Dual Output Port
Dual-Port Block RAM in Read-First Mode With Two Write Ports VHDL Coding Example
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
entity rams_16 is
  port(clka : in std_logic;
       clkb : in std_logic;
       ena : in std_logic;
       enb : in std_logic;
       wea : in std_logic;
       web : in std_logic;
       addra : in std_logic_vector(5 downto 0);
       addrb : in std_logic_vector(5 downto 0);
       dia : in std_logic_vector(15 downto 0);
       dib : in std_logic_vector(15 downto 0);
       doa : out std_logic_vector(15 downto 0);
       dob : out std_logic_vector(15 downto 0));
end rams_16;
architecture syn of rams_16 is
  type ram_type is array (63 downto 0) of std_logic_vector(15 downto 0);
  -- A shared variable is required because both processes write to RAM with :=
  shared variable RAM : ram_type;
begin
  process (CLKA)
  begin
    if CLKA'event and CLKA = '1' then
      if ENA = '1' then
        -- Read before write gives read-first behavior
        DOA <= RAM(conv_integer(ADDRA));
        if WEA = '1' then
          RAM(conv_integer(ADDRA)) := DIA;
        end if;
      end if;
    end if;
  end process;
  process (CLKB)
  begin
    if CLKB'event and CLKB = '1' then
      if ENB = '1' then
        DOB <= RAM(conv_integer(ADDRB));
        if WEB = '1' then
          RAM(conv_integer(ADDRB)) := DIB;
        end if;
      end if;
    end if;
  end process;
end syn;
Dual-Port Block RAM in Read-First Mode With Two Write Ports Verilog Coding Example
module v_rams_16 (clka,clkb,ena,enb,wea,web,addra,addrb,dia,dib,doa,dob);
input clka,clkb,ena,enb,wea,web;
input [5:0] addra,addrb;
input [15:0] dia,dib;
output [15:0] doa,dob;
reg [15:0] ram [63:0];
reg [15:0] doa,dob;
endmodule
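library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
entity rams_04 is
  port (clk : in std_logic;
        we : in std_logic;
        a : in std_logic_vector(5 downto 0);
        di : in std_logic_vector(15 downto 0);
        do : out std_logic_vector(15 downto 0));
end rams_04;
architecture syn of rams_04 is
  type ram_type is array (63 downto 0) of std_logic_vector(15 downto 0);
  signal RAM : ram_type;
begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (we = '1') then
        RAM(conv_integer(a)) <= di;
      end if;
    end if;
  end process;
  -- Asynchronous read
  do <= RAM(conv_integer(a));
end syn;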
input clk;
input we;
input [5:0] a;
input [15:0] di;
output [15:0] do;
reg [15:0] ram [63:0];
assign do = ram[a];
endmodule
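library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
entity rams_09 is
  port (clk : in std_logic;
        we : in std_logic;
        a : in std_logic_vector(5 downto 0);
        dpra : in std_logic_vector(5 downto 0);
        di : in std_logic_vector(15 downto 0);
        spo : out std_logic_vector(15 downto 0);
        dpo : out std_logic_vector(15 downto 0));
end rams_09;
architecture syn of rams_09 is
  type ram_type is array (63 downto 0) of std_logic_vector(15 downto 0);
  signal RAM : ram_type;
begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (we = '1') then
        RAM(conv_integer(a)) <= di;
      end if;
    end if;
  end process;
  -- Asynchronous reads: primary port at address a, dual port at address dpra
  spo <= RAM(conv_integer(a));
  dpo <= RAM(conv_integer(dpra));
end syn;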
input clk;
input we;
input [5:0] a;
input [5:0] dpra;
input [15:0] di;
output [15:0] spo;
output [15:0] dpo;
reg [15:0] ram [63:0];
endmodule
Arithmetic Support
Xilinx® FPGA devices traditionally contain several hardware resources such as LUTs and Carry Chains. These
hardware resources efficiently implement various arithmetic operations such as adders, subtractors, counters,
accumulators, and comparators.
With the release of the Virtex®-4 device, Xilinx introduced a new primitive called DSP48. This block was
further enhanced in later families such as Virtex-5 devices and Spartan®-3A DSP devices. DSP48 allows you to
create numerous functions, including multipliers, adders, counters, barrel shifters, comparators, accumulators,
multiply accumulate, complex multipliers, and others.
Currently, synthesis tools support the most important and frequently used DSP48 modes for DSP applications
such as multipliers, adders/subtractors, multiply adders/subtractors, and multiply accumulate. The synthesis
tools also take advantage of the internal registers available in DSP48, as well as the dynamic OPMODE port.
DSP48 fast connections allow you to efficiently build fast DSP48 chains as filters. These fast connections are
automatically supported by synthesis tools today.
The level of DSP48 support may differ from one synthesis tool to another.
For more information, see your synthesis tool documentation.
Since there are several ways to implement the same arithmetic operation on the target device, synthesis tools
make automatic choices depending on the operation type, size, context usage, or timing requirements. In some
situations, the automatic choice may not meet your goals. Synthesis tools therefore offer several constraints to
control the implementation process, such as use_dsp48 in Xilinx Synthesis Technology (XST) or syn_dspstyle in
Synplicity.
For more information, see your synthesis tool documentation.
If you migrate a design previously implemented using an older FPGA device family to a newer one with a
DSP48 block, and you want to take advantage of available DSP48 blocks, you must be aware of the following
rules in order to get the best performance.
• DSP48 blocks give you the best performance when fully pipelined. You should add additional pipelining
stages in order to get the best performance.
• Internal DSP48 registers support synchronous set and reset signals. Asynchronous set and reset signals are
not supported. You must replace asynchronous initialization signals by synchronous ones. Some synthesis
tools may automatically make this replacement. This operation renders the generated netlist NOT equivalent
to the initial RTL description.
For more information, see your synthesis tool documentation.
• For DSP applications, use chain structures instead of tree structures in your RTL description in order to take
full advantage of the DSP48 capabilities.
For more information on DSP48 blocks and specific DSP application coding style, see the XtremeDSP™ User
Guide for your target family.
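The first two rules above can be sketched as follows: a multiply-add in which every stage is registered and every register uses a synchronous reset. This is an illustrative sketch rather than a reference coding style. The module name, the signal names, and the operand widths (18 x 18 multiply, 48-bit add) are assumptions, and whether the description actually maps onto a single DSP48 depends on the synthesis tool and target family.

```verilog
// Sketch only: pipelined multiply-add with synchronous reset.
// Names and widths are assumptions chosen for illustration.
module v_dsp_pipe_sketch (clk, rst, A, B, C, RES);
  input clk, rst;
  input signed [17:0] A, B;
  input signed [47:0] C;
  output signed [47:0] RES;

  reg signed [17:0] reg_A, reg_B;   // input registers
  reg signed [35:0] reg_mult;       // register after the multiplier
  reg signed [47:0] reg_C, RES;     // C input register and output register

  always @(posedge clk)
    if (rst) begin
      // Synchronous reset: rst is sampled only on the clock edge
      reg_A <= 0; reg_B <= 0; reg_C <= 0; reg_mult <= 0; RES <= 0;
    end else begin
      reg_A <= A; reg_B <= B; reg_C <= C;
      reg_mult <= reg_A * reg_B;
      RES <= reg_mult + reg_C;
    end
endmodule
```

Note that rst does not appear in the sensitivity list. Adding it there would describe an asynchronous reset, which the internal DSP48 registers do not support.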
entity arith_01 is
port(A,B : in std_logic_vector(7 downto 0);
SUM : out std_logic_vector(7 downto 0));
end arith_01;
SUM <= A + B;
end archi;
assign SUM = A + B;
endmodule
entity arith_02 is
port(A,B : in std_logic_vector(7 downto 0);
SUM : out std_logic_vector(7 downto 0));
end arith_02;
SUM <= A + B;
end archi;
assign SUM = A + B;
endmodule
entity arith_03 is
port(clk : in std_logic;
A,B : in std_logic_vector(7 downto 0);
SUM : out std_logic_vector(7 downto 0));
end arith_03;
end archi;
endmodule
entity arith_04 is
port(A,B : in std_logic_vector(7 downto 0);
OPER: in std_logic;
RES : out std_logic_vector(7 downto 0));
end arith_04;
end archi;
endmodule
entity arith_05 is
port(A,B : in std_logic_vector(7 downto 0);
CMP : out std_logic);
end arith_05;
end archi;
endmodule
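library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity arith_06 is
  port(clk : in std_logic;
       A : in unsigned (16 downto 0);
       B : in unsigned (16 downto 0);
       MULT : out unsigned (33 downto 0));
end arith_06;
architecture beh of arith_06 is
  signal reg_a, reg_b : unsigned (16 downto 0);
begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      -- Register the inputs, then register the product
      reg_a <= A; reg_b <= B;
      MULT <= reg_a * reg_b;
    end if;
  end process;
end beh;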
input clk;
input [16:0] A;
input [16:0] B;
output [33:0] MULT;
entity arith_07 is
port(clk, reset : in std_logic;
Res : out std_logic_vector(7 downto 0));
end arith_07;
end archi;
entity arith_08 is
port(clk, reset : in std_logic;
din : in std_logic_vector(7 downto 0);
Res : out std_logic_vector(7 downto 0));
end arith_08;
end archi;
Multiplier Adder With 2 Register Levels on Multiplier Inputs, 1 Register Level after Multiplier and
1 Register Level after Adder VHDL Coding Example
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity arith_09 is
  generic (p_width: integer:=8);
  port (clk : in std_logic;
        A, B : in std_logic_vector(7 downto 0);
        C : in std_logic_vector(15 downto 0);
        RES : out std_logic_vector(15 downto 0));
end arith_09;
architecture beh of arith_09 is
  signal reg1_A, reg2_A, reg1_B, reg2_B : std_logic_vector(7 downto 0);
  signal reg_C, reg_mult : std_logic_vector(15 downto 0);
begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      reg1_A <= A; reg2_A <= reg1_A;
      reg1_B <= B; reg2_B <= reg1_B;
      reg_C <= C;
      reg_mult <= reg2_A * reg2_B;
      RES <= reg_mult + reg_C;
    end if;
  end process;
end beh;
Multiplier Adder With 2 Register Levels on Multiplier Inputs, 1 Register Level after Multiplier and
1 Register Level after Adder Verilog Coding Example
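module v_arith_09 (clk, A, B, C, RES);
input clk;
input [7:0] A;
input [7:0] B;
input [15:0] C;
output [15:0] RES;
reg [7:0] reg1_A, reg2_A, reg1_B, reg2_B;
reg [15:0] reg_C, reg_mult, RES;
always @(posedge clk)
begin
  reg1_A <= A; reg2_A <= reg1_A;
  reg1_B <= B; reg2_B <= reg1_B;
  reg_C <= C;
  reg_mult <= reg2_A * reg2_B;
  RES <= reg_mult + reg_C;
end
endmodule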
Multiplier Up Accumulator With 2 Register Levels on Multiplier Inputs, 1 Register Level after
Multiplier and 1 Register Level after Accumulator VHDL Coding Example
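library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity arith_10 is
  port (clk : in std_logic;
        A, B : in std_logic_vector(7 downto 0);
        RES : out std_logic_vector(15 downto 0));
end arith_10;
architecture beh of arith_10 is
  -- Registers start at zero so that the accumulator has a defined value in simulation
  signal reg1_A, reg2_A, reg1_B, reg2_B : std_logic_vector(7 downto 0) := (others => '0');
  signal reg_mult, reg_accu : std_logic_vector(15 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      reg1_A <= A; reg2_A <= reg1_A;
      reg1_B <= B; reg2_B <= reg1_B;
      reg_mult <= reg2_A * reg2_B;
      reg_accu <= reg_accu + reg_mult;
    end if;
  end process;
  RES <= reg_accu;
end beh;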
Multiplier Up Accumulator With 2 Register Levels on Multiplier Inputs, 1 Register Level after
Multiplier and 1 Register Level after Accumulator Verilog Coding Example
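module v_arith_10 (clk, A, B, RES);
input clk;
input [7:0] A;
input [7:0] B;
output [15:0] RES;
reg [7:0] reg1_A, reg2_A, reg1_B, reg2_B;
reg [15:0] reg_mult, reg_accu;
wire [15:0] RES;
// Start all registers at zero so that the accumulator is defined in simulation
initial {reg1_A, reg2_A, reg1_B, reg2_B, reg_mult, reg_accu} = 0;
always @(posedge clk)
begin
  reg1_A <= A; reg2_A <= reg1_A;
  reg1_B <= B; reg2_B <= reg1_B;
  reg_mult <= reg2_A * reg2_B;
  reg_accu <= reg_accu + reg_mult;
end
assign RES = reg_accu;
endmodule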
For Verilog, the following two statements are not necessarily equivalent:
ADD = A1 + A2 + A3 + A4;
ADD = (A1 + A2) + (A3 + A4);
The first statement cascades three adders in series. The second statement creates two adders in parallel: A1 + A2
and A3 + A4. In the second statement, the two additions are evaluated in parallel and the results are combined
with a third adder. Register Transfer Level (RTL) simulation results are the same for both statements. The second
statement results in a faster circuit after synthesis (depending on the bit width of the input signals).
Although the second statement generally results in a faster circuit, in some cases, you may want to use the first
statement. For example, if the A4 signal reaches the adder later than the other signals, the first statement produces
a faster implementation because the cascaded structure creates fewer logic levels for A4. This structure allows A4
to catch up to the other signals. In this case, A1 is the fastest signal followed by A2 and A3. A4 is the slowest signal.
Most synthesis tools can balance or restructure the arithmetic operator tree if timing constraints require it.
However, Xilinx® recommends that you code your design for your selected structure.
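The two structures can be placed side by side in a small module, as in the sketch below. The module name, the 8-bit operand width, and the output names ADD_CHAIN and ADD_TREE are assumptions for illustration. The 10-bit outputs are wide enough to hold the sum of four 8-bit operands, so both outputs always carry the same value in RTL simulation.

```verilog
// Sketch only: cascaded versus balanced coding of a four-operand addition.
// Names and widths are illustrative assumptions.
module v_add_struct_sketch (A1, A2, A3, A4, ADD_CHAIN, ADD_TREE);
  input [7:0] A1, A2, A3, A4;
  output [9:0] ADD_CHAIN, ADD_TREE;

  // Cascaded form: ((A1 + A2) + A3) + A4, so A4 passes through only one adder
  assign ADD_CHAIN = A1 + A2 + A3 + A4;

  // Balanced form: two adders in parallel feed a third adder
  assign ADD_TREE = (A1 + A2) + (A3 + A4);
endmodule
```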
Resource Sharing
Resource sharing uses a single functional block (such as an adder or comparator) to implement several operators
in the HDL code. Use resource sharing to improve design performance by reducing the gate count and the
routing congestion. If you do not use resource sharing, each HDL operation is built with separate circuitry. You
may want to disable resource sharing for speed critical paths in your design.
The following operators can be shared either with instances of the same operator or with an operator on the
same line.
• *
• + -
• > >= < <=
For example, a + (plus) operator can be shared with instances of other + (plus) operators or with – (minus)
operators. An * (asterisk) operator can be shared only with other * (asterisk) operators.
You can implement the following arithmetic functions with gates or with your synthesis tool module library.
• +
• –
• magnitude comparators
The library functions use modules that take advantage of the carry logic in the FPGA devices. Carry logic and its
dedicated routing increase the speed of arithmetic functions that are larger than 4 bits. To increase speed, use the
module library if your design contains arithmetic functions that are larger than 4 bits, or if your design contains
only one arithmetic function. Resource sharing of the module library automatically occurs in most synthesis
tools if the arithmetic functions are in the same process.
Resource sharing adds additional logic levels to multiplex the inputs to implement more than one function. You
may not want to use it for arithmetic functions that are part of a time critical path.
Since resource sharing allows you to reduce design resources, the device area required for your design is also
decreased. The area used for a shared resource depends on the type and bit width of the shared operation. You
should create a shared resource to accommodate the largest bit width and to perform all operations.
If you use resource sharing in your designs, you may want to use multiplexers to transfer values from different
sources to a common resource input. In designs that have shared operations with the same output target,
multiplexers are reduced as shown in the following coding examples.
The VHDL example is shown implemented with gates in the following diagram.
If you disable resource sharing, or if you code the design with the adders in separate processes, the design is
implemented using two separate modules as shown in the following diagram.
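As a hedged sketch of the kind of code that is a candidate for resource sharing, the following module computes one of two sums into the same output target, with both additions in the same always block. The module and signal names are assumptions for illustration.

```verilog
// Sketch only: two additions with a common output target.
// Names are illustrative assumptions.
module v_res_share_sketch (A1, B1, C1, D1, COND_1, Z1);
  input [7:0] A1, B1, C1, D1;
  input COND_1;
  output [7:0] Z1;
  reg [7:0] Z1;

  always @(A1 or B1 or C1 or D1 or COND_1)
    if (COND_1)
      Z1 = A1 + B1;
    else
      Z1 = C1 + D1;
endmodule
```

With resource sharing enabled, a synthesis tool can build this as a single adder whose two inputs are each selected by a multiplexer controlled by COND_1. Without sharing, it builds two adders followed by an output multiplexer.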
Some synthesis tools may require you to explicitly include a UNISIM library to the project.
For more information, see your synthesis tool documentation.
Many Xilinx Primitives have a set of associated properties. These constraints can be added to the primitive
through:
• VHDL attribute passing
• Verilog attribute passing
• VHDL generic passing
• Verilog parameter passing
• User Constraints File (UCF)
For more information on how to use these properties, see Attributes and Constraints.
Attributes
An attribute is a property associated with a device architecture primitive component that affects an instantiated
component’s functionality or implementation. Attributes are passed as follows:
• In VHDL, by means of generic maps
• In Verilog, by means of defparams or inline parameter passing
Examples of attributes are:
• The INIT property on a LUT4 component
• The CLKFX_DIVIDE property on a DCM
All attributes are described in the Libraries Guides as a part of the primitive component description.
Synthesis Constraints
Synthesis constraints direct the synthesis tool optimization technique for a particular design or piece of HDL
code. They are either embedded within the VHDL or Verilog code, or within a separate synthesis constraints file.
Examples of synthesis constraints are:
• USE_DSP48 (XST)
• RAM_STYLE (XST)
For more information, see your synthesis tool documentation.
For more information about Xilinx Synthesis Technology (XST) constraints, see the XST User Guide.
Implementation Constraints
Implementation constraints are instructions given to the FPGA implementation tools to direct the mapping,
placement, timing, or other guidelines for the implementation tools to follow while processing an FPGA design.
Implementation constraints are generally placed in the User Constraints File (UCF), but may exist in the HDL
code, or in a synthesis constraints file.
Examples of implementation constraints are:
• LOC (placement)
• PERIOD (timing)
For more information, see the Constraints Guide.
Passing Attributes
Attributes are properties that are attached to Xilinx® primitive instantiations in order to specify their behavior.
They should be passed via the generic (VHDL) or parameter (Verilog) mechanism to ensure that they are
properly passed to both synthesis and simulation.
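The following sketch shows inline parameter passing in Verilog, using the INIT attribute of a LUT4 as the example. The wrapper module name, the instance name, and the port names a, b, c, d, and y are assumptions. Compiling it requires the Xilinx UNISIM Verilog library, which provides the LUT4 primitive.

```verilog
// Sketch only: passing the INIT attribute to a LUT4 by inline parameter.
// Requires the UNISIM Verilog library; wrapper and net names are assumptions.
module v_lut_attr_sketch (a, b, c, d, y);
  input a, b, c, d;
  output y;

  // INIT = 16'h8000 sets only bit 15, so O is high only when
  // I3, I2, I1, and I0 are all high (a 4-input AND)
  LUT4 #(
    .INIT(16'h8000)
  ) lut4_and (
    .O(y),
    .I0(a), .I1(b), .I2(c), .I3(d)
  );
endmodule
```

Because the value travels as a parameter of the instance, the same setting is visible to both the simulation model and the synthesis tool.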
Pipelining
You can use pipelining to:
• Dramatically improve device performance at the cost of added latency (more clock cycles to process the data)
• Increase performance by restructuring long data paths with several levels of logic, and breaking them up
over multiple clock cycles
• Achieve a faster clock cycle, and, as a result, an increased data throughput at the expense of added data
latency
Because Xilinx® FPGA devices are register-rich, the pipeline is created at no cost in device resources. Since data
is now on a multi-cycle path, you must account for the added path latency in the rest of your design. Use care
when defining timing specifications for these paths.
Before Pipelining
In the following Before Pipelining diagram the clock speed is limited by:
• Clock-to-out time of the source flip-flop
• Logic delay through four levels of logic
• Routing associated with the four function generators
• Setup time of the destination register
After Pipelining
The After Pipelining diagram below is an example of the same data path shown in the Before Pipelining Diagram
after pipelining. Because the flip-flop is contained in the same CLB as the function generator, the clock speed is
limited by:
• The clock-to-out time of the source flip-flop
• The logic delay through one level of logic
• One routing delay
• The setup time of the destination register
In this example, the system clock runs much faster after pipelining than before pipelining.
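The same idea can be sketched in Verilog with a four-operand addition. The module and signal names are assumptions, and the adders stand in for the generic levels of logic in the diagrams. The y_flat path places all of the additions between two clock edges, while the y_pipe path performs one level of addition per clock cycle.

```verilog
// Sketch only: the same sum computed without and with a pipeline stage.
// Names and widths are illustrative assumptions.
module v_pipeline_sketch (clk, a, b, c, d, y_flat, y_pipe);
  input clk;
  input [7:0] a, b, c, d;
  output [9:0] y_flat, y_pipe;
  reg [9:0] y_flat, y_pipe;
  reg [8:0] s_ab, s_cd;   // intermediate pipeline registers

  // Before pipelining: all additions sit between two register stages
  always @(posedge clk)
    y_flat <= a + b + c + d;

  // After pipelining: one level of addition per clock cycle
  always @(posedge clk) begin
    s_ab <= a + b;
    s_cd <= c + d;
    y_pipe <= s_ab + s_cd;
  end
endmodule
```

The y_pipe output carries the same values as y_flat, one clock cycle later. That extra cycle is the added latency the rest of the design must account for.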
Retiming
Some synthesis tools can automatically move registers across logic (forward or backward) in order to increase
design speed. This process:
• Is called Retiming or Register Balancing, depending on the synthesis tool
• Allows you to increase design speed without modifying your design
• May significantly increase the number of flip-flops
For more information, see your synthesis tool documentation.
SmartModel Technology
Since Xilinx® SmartModels are simulator-independent models derived from the actual design, they are accurate
evaluation models. To simulate these models, you must use a simulator that supports the SWIFT Interface.
Synopsys Logic Modeling uses the SWIFT Interface to deliver models. The SWIFT Interface is a simulator
and device independent API from Synopsys. It has been adopted by all major simulator vendors, including
Synopsys, Cadence, and Mentor Graphics, as a way of linking simulation models to design tools.
When running a back-annotated simulation, the precompiled SmartModels support:
• Gate-Level Timing
Gate-level timing distributes the delays throughout the design. All internal paths are accurately distributed.
Multiple timing versions can be provided for different speed parts.
• Pin-to-Pin Timing
Pin-to-pin timing is less accurate, but it is faster since only a few top-level delays must be processed.
• Back-Annotation Timing
Back-annotation timing allows the model to accurately process the interconnect delays between the model
and the rest of the design. Back-annotation timing can be used with either gate-level or pin-to-pin timing,
or by itself.
Installing SmartModels
The following software is required to install and run SmartModels:
• The Xilinx® implementation tools
• A Hardware Description Language (HDL) simulator that can simulate either VHDL or Verilog, and the
SWIFT Interface
SmartModels are installed with the Xilinx implementation tools, but they are not immediately ready for use.
There are two ways to use them:
• Installing SmartModels (Method One)
Use the precompiled models if your design does not use any other vendors’ SmartModels.
• Installing SmartModels (Method Two)
Install the SmartModels with additional SmartModels incorporated in the design. Compile all SmartModels
into a common library for the simulator to use.
Simulation Flows
Observe the rules shown in the following table when compiling source files.
Although the Xilinx Hardware Description Language (HDL) Netlister produces IEEE-STD-1076-2000 VHDL code
or IEEE-STD-1364-2001 Verilog code, that does not restrict using newer or older standards for the creation of test
benches or other simulation files. If the simulator supports both older and newer standards, both standards
can generally be used in these simulation files. You must indicate to the simulator during code compilation
which standard was used to create the file.
Xilinx® does not support SystemVerilog. For more information, contact the Xilinx EDA partners listed in the
following sections for their SystemVerilog roadmaps:
• Simulating Xilinx Designs in ModelSim
• Simulating Xilinx Designs in NCSim
• Simulating Xilinx Designs in Synopsys VCS-MX and Synopsys VCS-MXi
Xilinx Libraries
The Xilinx® VHDL libraries are tied to the IEEE-STD-1076.4-2000 VITAL standard for simulation acceleration.
VITAL 2000 is in turn based on the IEEE-STD-1076-93 VHDL language. Because of this, the Xilinx libraries
must be compiled as 1076-93.
VITAL libraries include some additional processing for timing checks and back-annotation styles. The UNISIM
library turns these timing checks off for unit delay functional simulation. The SIMPRIM back-annotation library
keeps these checks on by default to allow accurate timing simulations.
The Post-NGDBuild and Post-Map simulations can be used when debugging synthesis or map optimization
issues.
For more information about SecureIP, see Encryption Methodology Used for SecureIP Models.
Keep the code behavioral for the initial design creation. Do not instantiate specific components unless necessary.
This allows for:
• More readable code
• Faster and simpler simulation
• Code portability (the ability to migrate to different device families)
• Code reuse (the ability to use the same code in future designs)
You may find it necessary to instantiate components if the component is not inferable.
As with the post-NGDBuild simulation, NetGen is used to create the structural simulation. Running the
simulation Netlister tool, NetGen creates a Standard Delay Format (SDF) file. The delays for the design are
stored in the SDF file which contains all block or logic delays. It does not contain any of the routing delays for the
design since the design has not yet been placed and routed. As with all netlists created with NetGen, Global
Set/Reset (GSR) and Global Tristate (GTS) signals must be accounted for. For more information on using the GSR
and GTS signals for post-NGDBuild simulation, see Global Reset and Tristate for Simulation.
Fifth Simulation Point: Timing Simulation Post-Place and Route (Block and Net Delays)
The fifth simulation point is Timing Simulation Post-Place and Route (Block and Net Delays). This simulation
point requires the SIMPRIM and SmartModel/SecureIP Libraries.
Virtex-5
SIMPRIM (All Xilinx Technologies) $XILINX/vhdl/src/simprims
Simulation Libraries
XST supports the following simulation libraries:
• UNISIM Library
  – VHDL UNISIM Library
  – Verilog UNISIM Library
• UniMacro Library
  – VHDL UniMacro Library
  – Verilog UniMacro Library
• CORE Generator™ Software XilinxCoreLib Library
• SIMPRIM Library
• SmartModel Libraries
• SecureIP Libraries
  – VHDL SecureIP Library
UNISIM Library
The UNISIM Library is used for functional simulation and synthesis only. This library includes:
• All Xilinx Unified Library primitives that are inferred by most synthesis tools
• Primitives that are commonly instantiated, such as:
– DCM
– BUFG
– MGT
Xilinx recommends that you infer most design functionality using behavioral Register Transfer Level (RTL)
code unless:
• The desired component is not inferable by your synthesis tool, or
• You want to take manual control of mapping and placement of a function
UniMacro Library
The UniMacro library:
• Is used for functional simulation and synthesis only.
• Provides macros to aid the instantiation of complex Xilinx primitives.
• Is an abstraction of the primitives in the UNISIM library. The synthesis tools automatically expand each
UniMacro to its underlying primitive.
For more information, see the Libraries Guides.
SIMPRIM Library
The SIMPRIM library is used for the following simulations:
• Post-NGDBuild (gate level functional)
• Post-Map (partial timing)
• Post-Place and Route (full timing)
The SIMPRIM library is architecture independent.
SmartModel Libraries
This is the final release of SmartModel libraries. Starting in the next major release, SmartModels will no longer be
used. All users must have a migration plan to transition to the new SecureIP models. This section applies only
if you are using the following simulator versions:
• NCSim 8.1s005 and below
• VCS 2006.06
If you are using any of the simulator versions listed in SmartModel Supported Simulators and Operating
Systems, you do not need to set up SmartModels. For more information, see IP Encryption Methodology.
The SmartModel Libraries are used to model complex functions of modern FPGA devices such as the PowerPC®
processor and the RocketIO™ transceiver. SmartModels are encrypted source files that communicate with
simulators via the SWIFT interface.
The SmartModel Libraries require additional installation steps to properly install on your system. Additional
setup within the simulator may also be required. For more information on how to install and set up the
SmartModel Libraries, see Using SmartModels.
SecureIP Libraries
Hard IP blocks are fully supported in ISim without additional setup. For more information, see the ISim User
Guide. Xilinx leverages the latest encryption methodology as specified in Verilog LRM - IEEE Std 1364–2005.
Virtex®-4 and Virtex-5 device simulation models for hard IP such as PowerPC processors, MGT, and PCIe®
leverage this technology. Everything is automatically handled by means of Compxlib, provided the appropriate
version of the simulator is present on your computer. When running a simulation with this new methodology
in Verilog, you must reference the SecureIP library. For most simulators, this can be done by using the -L
switch as an argument to the simulator, such as -L secureip. For the switch to use with your simulator, see
your simulator documentation.
The table below lists special considerations that need to be arranged with your simulator vendor for using
these libraries.
If you are using pre-compiled libraries, use the correct directive to point to the precompiled libraries. Following
is an example for ModelSim:
-L secureip
Virtex-5 Device BlockRAM Features Not Supported When Using FAST Mode
Feature                      Description
Parameter validity checks    Checks for the generics/parameters to ensure that they are legal for the primitive in use
Cascade feature              Ability to cascade multiple BlockRAMs together
ECC feature                  Error checking and correction
Memory collision checks      Checks to ensure that data is not being written to and read from the same address location
Virtex-5 Device FIFO Features Not Supported When Using FAST Mode
Feature                        Description
Parameter checks               Checks for the generics/parameters to ensure that they are legal for the primitive in use
Design rule checks for reset   When doing a reset, the model will not check for correct number of reset pulses being applied
ECC feature                    Error checking and correction
Virtex-5 Device DSP Block Features Not Supported When Using FAST Mode
Feature                           Description
DRC checks – opmode and alumode   The DSP48 block has various design rule checks for the opmode and alumode settings that have been removed
For a complete simulation, and to ensure that the simulation model functions in hardware as expected, use
SAFE mode.
SIM_MODE applies to UNISIM Register Transfer Level (RTL) simulation models only. SIM_MODE is not
supported for SIMPRIM gate simulation models. For a SIMPRIM based simulation, the model performs every
check at the cost of simulation runtimes.
JTAG Simulation
Simulation of the BSCAN component is supported for the following devices:
• Virtex®-4
• Virtex-5
• Spartan®-3A
The simulation supports the interaction of the JTAG ports and some of the JTAG operation commands. The JTAG
interface, including interface to the scan chain, is not yet fully supported. In order to simulate this interface:
1. Instantiate the BSCAN_VIRTEX4, BSCAN_VIRTEX5, or BSCAN_SPARTAN3A component and connect it to
the design.
2. Instantiate the JTAG_SIM_VIRTEX4, JTAG_SIM_VIRTEX5, or JTAG_SIM_SPARTAN3A component into
the test bench (not the design).
This becomes:
• The interface to the external JTAG signals (such as TDI, TDO, and TCK)
• The communication channel to the BSCAN component
The communication between the components takes place in the VPKG VHDL package file or the glbl
Verilog global module. Accordingly, no explicit connections are necessary between the JTAG_SIM_VIRTEX4,
JTAG_SIM_VIRTEX5, or JTAG_SIM_SPARTAN3A component and the design, or the BSCAN_VIRTEX4,
BSCAN_VIRTEX5, or BSCAN_SPARTAN3A symbol.
Stimulus can be driven and viewed from the JTAG_SIM_VIRTEX4, JTAG_SIM_VIRTEX5, or
JTAG_SIM_SPARTAN3A component within the test bench to understand the operation of the JTAG/BSCAN
function. Instantiation templates for both of these components are available in both the ISE® Design Suite HDL
Templates and the Virtex-4 device and Virtex-5 device Libraries Guides.
SelectMAP Simulation
The configuration simulation model allows supported configuration interfaces to be simulated, ultimately
showing the DONE pin going high. This is a model of how the supported devices will react to stimulus on the
supported configuration interface. For a list of supported interfaces and devices, see the following table. The
model is set up to handle control signal activity as well as bit file downloading. Included are internal register
settings such as the CRC, IDCODE, and Status Registers. The Sync Word can be monitored as it enters the device
and the Start Up Sequence can be monitored as it progresses. The diagram below shows how the system should
map from the hardware to the simulation environment. The configuration process is specifically outlined in the
Configuration User Guides for each device family. These guides contain information on the configuration sequence
as well as the configuration interfaces.
Supported Features
Each device-specific Configuration User Guide outlines the supported methods of interacting with each
configuration interface. This guide lists the items discussed in the Configuration User Guides that are not
supported by the model. The tables Spartan-3A Slave SelectMAP Features Supported by the Model and Virtex-5
Slave SelectMAP Features Supported by the Model list the features discussed in the Configuration User Guides
that the model does not support.
Readback of configuration data is not supported by the model. The model does not store configuration data
provided although a CRC value is calculated. Readback can only be performed on specific registers to ensure a
valid command sequence and signal handling is provided to the device. The model is not intended to allow
readback data files to be produced.
SPI_ACCESS Attributes
Five attributes can be set for the SPI_ACCESS component.
• SPI_ACCESS SIM_DEVICE
• SPI_ACCESS SIM_USER_ID
• SPI_ACCESS SIM_MEM_FILE
• SPI_ACCESS SIM_FACTORY_ID
• SPI_ACCESS SIM_DELAY_TYPE
In simulation, the FACTORY_ID can be written only once. As soon as a value other than one is detected in the
factory ID, no further writing is allowed.
In the hardware, each individual device has a unique factory programmed ID in this field. It cannot be
reprogrammed or erased.
For more information on using the SPI_ACCESS primitive, see the Libraries Guides.
SIM_COLLISION_CHECK Strings
Use the strings shown in the following table to control what happens in the event of a collision.
SIM_COLLISION_CHECK Strings
String            Write Collision Messages   Write Xs on the Output
ALL               Yes                        Yes
WARNING_ONLY      Yes                        No. Applies only at the time of collision.
                                             Subsequent reads of the same address
                                             space may produce Xs on the output.
GENERATE_X_ONLY   No                         Yes
None              No                         No. Applies only at the time of collision.
                                             Subsequent reads of the same address
                                             space may produce Xs on the output.
SIM_COLLISION_CHECK can be applied at an instance level. This enables you to change the setting for each
block RAM instance.
In simulation, the GTS signal is usually not driven. The circuitry for driving GTS is available in the back-end
simulation and can be optionally added for the front end simulation, but the GTS pulse width is set to 0
by default.
To preserve the design hierarchy through implementation with little or no degradation in performance or
increase in design resources:
• Follow stricter design rules.
• Select the design hierarchy so that optimization is not necessary across the design hierarchy.
Use the -mhf switch to produce individual files for each Keep Hierarchy instance in the design. You can also use
the -mhf switch together with the -dir switch to place all associated files in a separate directory.
netgen -sim -ofmt {vhdl|verilog} -mhf -dir directory_name design_name.ncd
When you run NetGen with the -mhf switch, NetGen produces a text file called design_mhf_info.txt. The
design_mhf_info.txt file lists all produced module and entity names, their associated instance names,
Standard Delay Format (SDF) files, and sub modules. The design_mhf_info.txt file is useful for determining
proper simulation compile order, SDF annotation options, and other information when you use one or more of
these files for simulation.
Module : hex2led_1
Instance : msbled
Design File : hex2led_1_sim.vhd
SDF File : hex2led_1_sim.sdf
SubModule : NONE
Module : hex2led
Instance : lsbled
Design File : hex2led_sim.vhd
SDF File : hex2led_sim.sdf
SubModule : NONE
Module : smallcntr_1
Instance : lsbcount
Design File : smallcntr_1_sim.vhd
SDF File : smallcntr_1_sim.sdf
SubModule : NONE
Module : smallcntr
Instance : msbcount
Design File : smallcntr_sim.vhd
SDF File : smallcntr_sim.sdf
SubModule : NONE
Module : cnt60
Instance : sixty
Design File : cnt60_sim.vhd
SDF File : cnt60_sim.sdf
SubModule : smallcntr, smallcntr_1
Module : smallcntr, Instance : msbcount
Module : smallcntr_1, Instance : lsbcount
Module : decode
Instance : decoder
Design File : decode_sim.vhd
SDF File : decode_sim.sdf
SubModule : NONE
Module : dcm1
Instance : Inst_dcm1
Design File : dcm1_sim.vhd
SDF File : dcm1_sim.sdf
SubModule : NONE
Module : statmach
Instance : MACHINE
Design File : statmach_sim.vhd
SDF File : statmach_sim.sdf
SubModule : NONE
Module : stopwatch
Design File : stopwatch_timesim.vhd
SDF File : stopwatch_timesim.sdf
SubModule : statmach, dcm1, decode, cnt60, hex2led, hex2led_1
Module : statmach, Instance : MACHINE
Module : dcm1, Instance : Inst_dcm1
Module : decode, Instance : decoder
Module : cnt60, Instance : sixty
Module : hex2led, Instance : lsbled
Module : hex2led_1, Instance : msbled
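The listing above can be turned into a compile order mechanically. The following Python sketch is illustrative only: it assumes the file uses exactly the "Key : value" layout shown above, and the function names parse_mhf_info and compile_order are hypothetical, not part of NetGen.

```python
SAMPLE = """\
Module : smallcntr_1
Instance : lsbcount
Design File : smallcntr_1_sim.vhd
SDF File : smallcntr_1_sim.sdf
SubModule : NONE
Module : smallcntr
Instance : msbcount
Design File : smallcntr_sim.vhd
SDF File : smallcntr_sim.sdf
SubModule : NONE
Module : cnt60
Instance : sixty
Design File : cnt60_sim.vhd
SDF File : cnt60_sim.sdf
SubModule : smallcntr, smallcntr_1
Module : smallcntr, Instance : msbcount
Module : smallcntr_1, Instance : lsbcount
"""


def parse_mhf_info(text):
    """Return {module: {"file": design_file, "subs": [submodules]}} (assumed layout)."""
    modules = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep:
            continue
        # "Module : x, Instance : y" lines only repeat the SubModule list; skip them.
        if key == "Module" and "Instance" not in value:
            current = modules.setdefault(value, {"file": None, "subs": []})
        elif current is None:
            continue
        elif key == "Design File":
            current["file"] = value
        elif key == "SubModule" and value != "NONE":
            current["subs"] = [name.strip() for name in value.split(",")]
    return modules


def compile_order(modules):
    """List design files so that each submodule precedes the module that uses it."""
    ordered, seen = [], set()

    def visit(name):
        if name in seen or name not in modules:
            return
        seen.add(name)
        for sub in modules[name]["subs"]:
            visit(sub)
        ordered.append(modules[name]["file"])

    for name in modules:
        visit(name)
    return ordered


print(compile_order(parse_mhf_info(SAMPLE)))
```

The same traversal could collect the SDF File entries if per-module SDF annotation is needed.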
Hierarchy created by generate statements may not match the original simulation due to naming differences
between the simulator and synthesis engines for generated instances.
The simulator performs the clk_b <= clk assignment before advancing the simulation time. As a result,
events that should occur in two clock edges will occur instead in one clock edge, causing a race condition.
Recommended ways to introduce causality in simulators for such cases include:
• Do not change clock and data at the same time. Insert a delay at every output.
• Be sure to use the same clock.
• Force a delta delay by using a temporary signal as follows:
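clk_b <= clk;
clk_prcs : process (clk)
begin
  if (clk'event and clk='1') then
    result <= data;  -- 'data' is an assumed name for the register input
  end if;
end process;
-- result_temp lags result by one delta cycle, matching the delta delay on clk_b
result_temp <= result;
clk_b_prcs : process (clk_b)
begin
  if (clk_b'event and clk_b='1') then
    result1 <= result_temp;
  end if;
end process;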
Almost every event-based simulator can display delta cycles. Use this to your advantage when debugging
simulation issues.
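To see why the temporary signal helps, the delta-cycle behavior can be imitated with a toy model. The Python sketch below is a simplification under stated assumptions, not a VHDL simulator: each process is a function that schedules updates for the next delta, and the signal name data and the helper settle are hypothetical.

```python
def settle(processes, state, stimulus):
    """Apply stimulus at one simulation time and run delta cycles until quiet."""
    state = dict(state)
    pending = dict(stimulus)
    while pending:
        # Scheduled values take effect; only real value changes count as events.
        changed = {name for name, value in pending.items() if state[name] != value}
        state.update(pending)
        pending = {}
        # Each process sensitive to an event runs and schedules the next delta.
        for sensitivity, body in processes:
            if changed & sensitivity:
                pending.update(body(state))
    return state


COMMON = [
    ({"clk"}, lambda s: {"clk_b": s["clk"]}),                              # clk_b <= clk;
    ({"clk"}, lambda s: {"result": s["data"]} if s["clk"] == 1 else {}),   # clk_prcs
]
# Without the temporary signal, clk_b_prcs samples result directly.
RACY = COMMON + [
    ({"clk_b"}, lambda s: {"result1": s["result"]} if s["clk_b"] == 1 else {}),
]
# With the temporary signal, the data path gets the same delta delay as clk_b.
FIXED = COMMON + [
    ({"result"}, lambda s: {"result_temp": s["result"]}),                  # result_temp <= result;
    ({"clk_b"}, lambda s: {"result1": s["result_temp"]} if s["clk_b"] == 1 else {}),
]

START = {"clk": 0, "clk_b": 0, "data": 1, "result": 0, "result_temp": 0, "result1": 0}

racy = settle(RACY, START, {"clk": 1})
fixed = settle(FIXED, START, {"clk": 1})
print(racy["result1"], fixed["result1"])  # prints "1 0"
```

In the racy case the new data reaches result1 on the same clock edge. With the temporary signal, result1 still holds the old value after the edge, as two cascaded registers should.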
The registers to which ASYNC_REG is attached retain the previous value during timing simulation, and do
not output an X to simulation.
A timing violation error may still occur. Use care, as the new value may have been clocked in as well.
ASYNC_REG is applicable to CLB and Input Output Block (IOB) registers and latches only. If you cannot avoid
clocking in asynchronous data, Xilinx® recommends that you do so for IOB or CLB registers only. Clocking in
asynchronous signals to RAM, Shift Register LUT (SRL), or other synchronous elements has less deterministic
results, and therefore should be avoided.
Xilinx highly recommends that you first properly synchronize any asynchronous signal in a register, latch, or
FIFO before writing to a RAM, Shift Register LUT (SRL), or any other synchronous element.
For more information, see the Constraints Guide.
MIN/TYP/MAX Simulation
The Standard Delay Format (SDF) file allows you to specify three sets of delay values for simulation:
• Minimum (MIN)
• Typical (TYP)
• Maximum (MAX)
Xilinx® uses these values to allow the simulation of the target architecture under various operating conditions.
By allowing for the simulation across various operating conditions, you can perform more accurate setup and
hold timing verification.
Minimum (MIN)
Minimum (MIN) represents the device under the best case operating conditions. The best case operating
conditions are defined as the minimum operating temperature, the maximum voltage, and the best case process
variations. Under best case conditions, the data paths of the device have the minimum delay possible, while
the clock path delays are the maximum possible relative to the data path delays. This situation is ideal for hold
time verification of the device.
Typical (TYP)
Typical (TYP) represents the typical operating conditions of the device. In this situation, the clock and data path
delays are both the maximum possible. This is different from the Maximum (MAX) field, in which the clock paths
are the minimum possible relative to the maximum data paths. Xilinx generated Standard Delay Format (SDF)
files do not take advantage of this field.
Maximum (MAX)
Maximum (MAX) represents the delays under the worst case operating conditions of the device. The worst case
operating conditions are defined as the maximum operating temperature, the minimum voltage, and the worst
case process variations. Under worst case conditions, the data paths of the device have the maximum delay
possible, while the clock path delays are the minimum possible relative to the data path delays. This situation is
ideal for setup time verification of the device.
Run NetGen
To obtain accurate Standard Delay Format (SDF) numbers, run netgen with -pcf pointing to a valid Physical
Constraints File (PCF). NetGen must be run with -pcf since newer Xilinx® devices take advantage of relative
mins for timing information. Once netgen is called with -pcf the Minimum (MIN) and Maximum (MAX) numbers
in the SDF file will be different for the components.
Once the correct SDF file is created, two types of simulation must be run for complete timing closure:
• Setup Simulation
• Hold Simulation
In order to run the different simulations, the simulator must be called with the appropriate switches.
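As a rough illustration of how the two runs use the SDF values, the Python sketch below parses a min:typ:max delay triple and picks the field each run relies on. The delay values and the function names are hypothetical. The actual selection is made by the simulator switches, not by code like this.

```python
def parse_delay_triple(text):
    """Split an SDF-style '(min:typ:max)' triple into floats; an empty field becomes None."""
    fields = text.strip().strip("()").split(":")
    if len(fields) != 3:
        raise ValueError(f"expected min:typ:max, got {text!r}")
    return tuple(float(field) if field.strip() else None for field in fields)


def delay_for_run(text, run):
    """Setup simulation uses the MAX field; hold simulation uses the MIN field."""
    minimum, _typical, maximum = parse_delay_triple(text)
    return {"setup": maximum, "hold": minimum}[run]


# Hypothetical delay triple, in picoseconds.
print(delay_for_run("(612:700:815)", "setup"))
print(delay_for_run("(612:700:815)", "hold"))
```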
Combining both minimum values overrides prorating, and results in issuing only absolute process MIN values
for the simulation Standard Delay Format (SDF) file.
Prorating is available for certain FPGA devices only. It is not intended for military and industrial ranges. It is
applicable only within commercial operating ranges.
As for simulation, the DLL/DCM simulation model itself attempts to align the input clock to the clock coming
back into the feedback input. Instead of putting the delay in the DLL or DCM itself, the delays are handled by
combining some of them into the feedback path as clock delay on the clock buffer (component) and clock net
(port delay). The remainder is combined with the port delay of the CLKFB pin. While this is different from the
way TRACE or Timing Analyzer reports it, and the way it is implemented in the silicon, the end result is the
same functionality and timing. TRACE and simulation both use a simple delay model rather than an adjustable
delay tap line similar to silicon.
The primary function of the DLL/DCM is to remove the clock delay from the internal clocking circuit as shown in
the following diagram.
Do not confuse this with de-skewing the clock. Clock skew is generally associated with delay variances in the
clock tree, which is a different matter. By removing the clock delay, the input clock to the device pin should be
properly phase aligned with the clock signal as it arrives at each register it is sourcing. Observing signals at the
DLL/DCM pins generally does not give the proper viewpoint to observe the removal of the clock delay.
To determine if the DCM is functioning as intended, compare the input clock (at the input port to the design)
with the clock pins of one of the sourcing registers. If these are aligned (or shifted to the desired amount), then
the DLL/DCM is functioning as intended.
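The comparison described above can be scripted against edge times exported from a waveform. The following Python sketch assumes rising-edge timestamps in picoseconds are already available as lists. The function names and the tolerance value are hypothetical.

```python
def phase_offsets_ps(input_edges_ps, register_edges_ps, period_ps):
    """Offset of each register clock edge from the matching input clock edge, modulo one period."""
    return [(reg - inp) % period_ps
            for inp, reg in zip(input_edges_ps, register_edges_ps)]


def is_aligned(input_edges_ps, register_edges_ps, period_ps,
               desired_shift_ps=0, tolerance_ps=50):
    """True when every edge pair is within tolerance of the desired phase shift."""
    for offset in phase_offsets_ps(input_edges_ps, register_edges_ps, period_ps):
        error = (offset - desired_shift_ps) % period_ps
        error = min(error, period_ps - error)  # distance around the clock period
        if error > tolerance_ps:
            return False
    return True


# Hypothetical 100 MHz clock: edges at the input port and at a register clock pin.
pad_edges = [0, 10000, 20000]
reg_edges = [10, 10010, 20010]
print(is_aligned(pad_edges, reg_edges, period_ps=10000))
```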
Viewer Considerations
Depending on the simulator, the waveform viewer may not depict the delay timing in the expected manner. Some
simulators (including ModelSim) combine interconnect and port delays with the input pins of the component
delays. While the simulation results are correct, the depiction in the waveform viewer may be unexpected.
Since interconnect delays are combined, when you look at a pin using the ModelSim viewer, you do not see the
transition as it happens on the pin. The simulation acts properly, but when attempting to calculate clock delay,
the interconnect delays before the clock pin must be taken into account if the simulator you are using combines
these interconnect delays with component delays.
For more information, search the Xilinx® Answer Database for the following topic: ModelSim Simulations: Input
and Output clocks of the DCM and CLKDLL models do not appear to be de-skewed (VHDL, Verilog).
Functional Simulation
While functional simulation is an important part of the verification process, it should not be the only part.
Functional simulation tests only for the functional capabilities of the Register Transfer Level (RTL) design. It
does not include any timing information, nor does it take into consideration changes made to the original design
due to implementation and optimization.
In-System Testing
Most designers rely on In-System Testing as the ultimate test. If the design works on the board, and passes the
test suites, they view the device as ready for release. While In-System Testing is definitely effective for some
purposes, it may not immediately detect all potential problems. At times the design must be run for a lengthy
period before corner-case issues become apparent. For example, issues such as timing violations may not become
apparent in the same way in all devices. By the time these corner-case issues manifest themselves, the design
may already be in the hands of the end customer. It will mean high costs, downtime, and frustration to try to
resolve the problem. In order to properly complete In-System Testing, all hardware hurdles such as problems
with SSO, Cross-talk, and other board related issues must be overcome. Any external interfaces must also be
connected before beginning the In-System Testing, increasing the time to market.
The traditional methods of verification are not sufficient for a fully verified system. There are compelling
reasons to do dynamic timing analysis.
VHDL Simulation
For VHDL simulation, library components are instantiated by NetGen and proper values are annotated for pulse
rejection in the simulation netlist. The result of these constructs in the simulation netlists is a more true-to-life
simulation model, and therefore a more accurate simulation.
Verilog Simulation
For Verilog simulation, this information is passed by the PATHPULSE construct in the Standard Delay Format
(SDF) file. This construct is used to specify the size of pulses to be rejected or swallowed on components in
the netlist.
Line One points to the line in the simulation model that is in error. In this example, the failing line is line 96
of the Verilog file X_RAMD16.
Line Two gives information about the two signals that caused the error:
• The type of violation, such as $setup, $hold, or $recovery. This example is a $setup violation.
• The name of each signal involved in the violation, followed by the simulation time at which that signal last
changed values. In this example, the failing signals are the negative-going edge of the signal WE, which last
changed at 29138 picoseconds, and the positive-going edge of the signal CLK, which last changed at 29151
picoseconds.
• The allotted amount of time for the setup. In this example, the signal on WE should be stable for 373
picoseconds before the clock transitions. Since WE changed only 13 picoseconds before the clock, the simulator
reported a violation.
Line Three gives the simulation time at which the error was reported, and the instance in the structural design
(time_sim) in which the violation occurred.
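The arithmetic behind this example can be checked with a few lines of Python. This is a simplified sketch of a setup check using the numbers quoted above. The function names are hypothetical, and the model ignores details of the simulator's timing-check implementation.

```python
def setup_margin_ps(data_change_ps, clock_edge_ps):
    """Time between the last data change and the clock edge, in picoseconds."""
    return clock_edge_ps - data_change_ps


def violates_setup(data_change_ps, clock_edge_ps, setup_limit_ps):
    """True when the data changed inside the setup window that ends at the clock edge."""
    return 0 < setup_margin_ps(data_change_ps, clock_edge_ps) < setup_limit_ps


# WE last changed at 29138 ps, CLK rose at 29151 ps, and the required setup is 373 ps.
print(setup_margin_ps(29138, 29151))      # 13
print(violates_setup(29138, 29151, 373))  # True
```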
Asynchronous Clocks
If the design has two or more clock domains, any path that crosses data from one domain to another can cause
timing problems. Although data paths that cross from one clock domain to another are not always asynchronous,
it is always best to be cautious.
Always treat the following as asynchronous:
• Two clocks with unrelated frequencies
• Any clocking signal coming from off-chip
• Any time a register’s clock is gated (unless extreme caution is used)
To see if the path in question crosses asynchronous clock boundaries, check the source code and the Timing
Analysis report. If your design does not allow enough time for the path to be properly clocked into the other
domain, you may need to redesign your clocking scheme. Consider using an asynchronous FIFO as a better
way to pass data from one clock domain to another.
Asynchronous Inputs
Data paths that are not controlled by a clocked element are asynchronous inputs. Because they are not clock
controlled, they can easily violate setup and hold time specifications.
Check the source code to see if the path in question is synchronous to the input register. If synchronization is
not possible, you can use the ASYNC_REG constraint to work around the problem. For more information,
see Using the ASYNC_REG Constraint.
Debugging Tips
When you have a timing violation, ask:
• Was the clock path analyzed by TRACE or Timing Analyzer?
• Did TRACE or Timing Analyzer report that the data path can run at speeds being clocked in simulation?
• Is clock skew being accounted for in this path delay?
• Does subtracting the clock path delay from the data path delay still allow clocking speeds?
• Will slowing down the clock speeds eliminate the $setup or $hold time violations?
• Does this data path cross clock boundaries (from one clock domain to another)? Are the clocks synchronous
to each other? Is there appreciable clock skew or phase difference between these clocks?
• If this path is an input path to the device, does changing the time at which the input stimulus is applied
eliminate the $setup or $hold time violations?
Depending on your answers, you may need to change your design or test bench to accommodate the simulation
conditions. For more information, see Design Considerations.
RAM Considerations
This section discusses RAM Considerations for Setup and Hold Violations, and includes:
• Timing Violations
• Collision Checking
• Hierarchy Considerations
Timing Violations
Xilinx® devices contain two types of memories:
• Block RAM
• Distributed RAM
Since block RAM and distributed RAM are synchronous elements, you must take care to avoid timing violations.
To guarantee proper data storage, the data input, address lines, and enables must all be stable before the
clock signal arrives.
Collision Checking
Block RAMs also perform synchronous read operations. During a read cycle, the addresses and enables must be
stable before the clock signal arrives, or a timing violation may occur.
When you use block RAM in dual-port mode, take special care to avoid memory collisions. A memory collision
occurs when:
1. One port is being written to, and
2. An attempt is made to either read or write to the other port at the same address at the same time (or within
a very short period of time thereafter)
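The collision condition above can be expressed as a small predicate. The Python sketch below is a minimal model that assumes both ports are sampled in the same clock cycle. It ignores the "very short period of time thereafter" window, and the PortAccess type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PortAccess:
    enabled: bool   # port enable is active this cycle
    write: bool     # True for a write, False for a read
    address: int


def is_collision(port_a, port_b):
    """True when one port writes while the other port accesses the same address."""
    if not (port_a.enabled and port_b.enabled):
        return False
    if port_a.address != port_b.address:
        return False
    return port_a.write or port_b.write


# Port A writes address 0x10 while port B reads the same address.
print(is_collision(PortAccess(True, True, 0x10), PortAccess(True, False, 0x10)))
```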
Hierarchy Considerations
It is possible for the top-level signals to switch correctly, keeping the setup and hold times accounted for, while
at the same time, an error is reported at the lowest level primitive. As the signals travel down the hierarchy
to the lowest level primitive, the delays they experience can reduce the differences between them to the point
that they violate the setup time.
To correct this problem:
1. Browse the design hierarchy, and add the signals of the instance reporting the error to the top-level
waveform. Make sure that the setup time is actually being violated at the lower level.
2. Step back through the structural design until a link between a Register Transfer Level (RTL) (pre-synthesis)
design path and this instance reporting the error can be determined.
3. Constrain the Register Transfer Level (RTL) path using timing constraints so that the timing violation no
longer occurs. Usually, most implemented designs have a small percentage of unconstrained paths after
timing constraints have been applied, and these are the ones where $setup and $hold violations usually
occur.
The debugging steps for $hold violations and $setup violations are identical.
Design Considerations
This chapter discusses practices to consider during your design. This chapter includes:
• Understanding the Architecture
• Clocking Resources
• Defining Timing Requirements
• Driving Synthesis
• Choosing Implementation Options
• Evaluating Critical Paths
• Design Preservation With SmartCompile™ Technology
Slice Structure
The slice contains the basic elements for implementing both sequential and combinatorial circuits in an FPGA
device. In order to minimize area and optimize performance of a design, it is important to know if a design is
effectively using the slice features. Some issues to consider are:
• What basic elements are contained within a slice? What are the different configurations for each of those basic
elements? For example, a look-up table (LUT) can also be configured as a distributed RAM or a shift register.
• What are the dedicated interconnects between those basic elements? For example, could the fanout of a LUT
to multiple registers prevent optimal packing of a slice?
• What common inputs do the elements of a slice share such as control signals and clocks that would
potentially limit its packing? Using registers with common set/reset, clock enable, and clocks improves the
packing of the design. With logic replication, the same reset net may have multiple unique names, which
prevents optimal register packing in a slice. Consider turning off Logic Replication for reset nets and clock
enables in the synthesis flow.
• What is the size of the LUT, and how many LUTs are required to implement certain combinatorial functions
of a design?
Hard IP Blocks
If a Hard IP block, such as a BRAM or DSP block, appears repeatedly as the source or destination of your
critical paths, try the following:
• Use Block Features Optimally
• Evaluate the Percentage of BRAMs or DSP Blocks
• Lock Down Block Placement
• Compare Hard-IP Blocks and Slice Logic
• Use SelectRAM™ memory
• Compare Placing Logic Functions in Slice Logic or DSP Block
Clocking Resources
You must determine whether the clocking resources of the target architecture meet design requirements. These
may include:
• Number and type of clock routing resources
• Maximum allowed frequency of each of the clock routing resources
• Number of dedicated clock input pins
• Number and type of resources available for clock manipulation, such as DCMs and PLLs
• Features and restrictions of DCMs and PLLs in terms of frequency, jitter, and flexibility in the manipulation
of clocks
For most Xilinx® FPGA architectures, the devices are divided into clock regions and there are restrictions on the
number of clock routing resources available in each of those regions. Since the number of total clock routing
resources is typically greater than the number of clocks available to a region, many designs exceed the number of
clocks available for one particular region. When this occurs, the software must place the design so that the clocks
can be dispersed among multiple regions. This can be done only if there are no restrictions in place that force it to
place synchronous elements in a way that violates the clock region rules.
Clock Reporting
The Place and Route Report (<design_name>.par) includes a Clock Report that details the clocks it has
detected in the design. For each clock, the report details:
• Whether the resource used was global, regional, or local
• Whether the clock buffer was locked down with a location constraint or not
• Fanout
• Maximum skew
• Maximum delay
Over-Constraining
Although over-constraining can help you understand a design’s potential maximum performance, use it with
caution. Over-constraining can cause excessive replication in synthesis.
The auto relaxation feature in PAR automatically scales back the constraint if the software determines that the
constraint is not achievable. This reduces runtime, and attempts to ensure the best performance for all constraints.
The timing constraints specified for synthesis should try to match the constraints specified for implementation.
Although most synthesis tools can write out timing constraints for implementation, Xilinx® recommends that
you avoid this option. Specify your implementation constraints separately in the User Constraints File (UCF)
(<design_name>.ucf). For a complete description of the supported timing constraints and syntax examples,
see the Constraints Guide.
Constraint Coverage
In your synthesis report, check for any replicated registers, and ensure that timing constraints that might apply to
the original register also cover the replicated registers for implementation. To minimize implementation runtime
and memory usage, write timing constraints by grouping the maximum number of paths with the same timing
requirement first before generating a specific timespec.
Driving Synthesis
To create high-performance circuits, Xilinx® recommends that you:
• Use Proper Coding Techniques
• Analyze Inference of Logic
• Provide a Complete Picture of Your Design
• Use Optimal Software Settings
For a complete listing of attributes and their functionality, see your synthesis tool documentation. For more
information about XST constraints, see the XST User Guide.
SmartXplorer
Use ISE® Design Suite to determine which implementation options provide maximum design performance.
SmartXplorer has two modes of operation:
• Timing Closure Mode
• Best Performance Mode
It is usually best to run SmartXplorer over the weekend since it typically runs more than a single iteration of MAP
and PAR. Once SmartXplorer has selected the optimal tools settings, continue to use these settings for subsequent
design runs. If you have made many design changes since the original SmartXplorer run, and your design is no
longer meeting timing with the options determined by SmartXplorer, consider running SmartXplorer again.
For help with SecureIP simulation, open a WebCase with Xilinx Technical Support at
http://www.xilinx.com/support.
The $XILINX/verilog/src/unisims area contains the Unified Library components for RTL simulation. The
$XILINX/verilog/src/simprims area contains generic simulation primitives.
For timing simulation and post-map simulation, or for post-translate simulation, the SIMPRIM based libraries
are used. Specify the following at the command line:
ncverilog -y $XILINX/verilog/src/simprims $XILINX/verilog/src/glbl.v \
+libext+.v <testfixture>.v <design>.v
For more information about annotating Standard Delay Format (SDF) files, see Back-Annotating Delay Values
from SDF File.
Depending on the makeup of the design (for example, Xilinx instantiated primitives, CORE Generator™
software) for RTL simulation, edit the hdl.var and cds.lib to specify the library mapping as shown in
the following examples.
CDS.LIB Example
# cds.lib
DEFINE worklib worklib
HDL.VAR Example
# hdl.var
DEFINE LIB_MAP ($LIB_MAP, + => worklib)
NCSDFC creates a file called sdf_filename.sdf.X. If a compiled file exists, NCSDFC checks to make sure
that the date of the compiled file is newer than the date of the source file and that the version of the compiled
file matches the version of NCSDFC. If either check fails, the SDF file is recompiled. Otherwise, the compiled
file is read.
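The decision described above reduces to two checks. The Python sketch below only illustrates that logic and is not how NCSDFC is implemented. The function name, the timestamp arguments, and the version strings are hypothetical.

```python
def needs_recompile(source_mtime, compiled_mtime, compiled_version, tool_version):
    """Decide whether the SDF source must be compiled again.

    compiled_mtime is None when no compiled file exists yet.
    """
    if compiled_mtime is None:
        return True
    # The compiled file must be newer than the source file.
    if compiled_mtime <= source_mtime:
        return True
    # The compiled file must have been produced by the same tool version.
    return compiled_version != tool_version


# Compiled file is newer than the source and the versions match: read it as is.
print(needs_recompile(1000, 2000, "8.2", "8.2"))
```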
For Back Annotated simulation, the SIMPRIM based libraries (except for Post Synthesis) are used. Specify the
following at the command line:
ncvlog -messages -update $XILINX/verilog/src/glbl.v <testfixture>.v time_sim.v
ncelab -messages -autosdf <testfixture_name> glbl
ncsim -messages <testfixture_name>
-f $XILINX/secureip/ncsim/ncsim_secureip_cell.list.f
Example
ncverilog \
<design>.v <testbench>.v \
${XILINX}/verilog/src/glbl.v \
-f $XILINX/secureip/ncsim/ncsim_secureip_cell.list.f \
-y ${XILINX}/verilog/src/unisims +libext+.v \
-y ${XILINX}/verilog/src/simprims +libext+.v \
+access+r+w
For help with SecureIP simulation, open a WebCase with Xilinx Technical Support at
http://www.xilinx.com/support.
CDS.LIB Example
# cds.lib
DEFINE worklib worklib
HDL.VAR Example
# hdl.var
DEFINE LIB_MAP ($LIB_MAP, + => worklib)
For timing simulation, the SIMPRIM based libraries are used. Specify the following at the command line:
vcs +compsdf -y $XILINX/verilog/src/simprims $XILINX/verilog/src/glbl.v \
+libext+.v -Mupdate -R <testfixture>.v time_sim.v
For information on back-annotating the Standard Delay Format (SDF) file for timing simulation, see Using
Standard Delay Format (SDF) with VCS.
The -R option automatically simulates the executable after compilation.
The -Mupdate option enables incremental compilation. Modules may be recompiled for one of the following
reasons:
• The target of a hierarchical reference has changed.
• A compile time constant, such as a parameter, has changed.
• The ports of a module instantiated in the module have changed.
• Module inlining, that is, the merging internally in VCS of a group of module definitions into a larger
module definition, which leads to faster simulation. The affected modules are recompiled. This is
performed only once.
For timing simulation or post-NGD2VER, the SIMPRIM based libraries are used. Specify the following at the
command-line:
vcs +compsdf -Mupdate -Mlib=<compiled_lib_dir>/simprims_ver \
-y $XILINX/verilog/src/simprims $XILINX/verilog/src/glbl.v +libext+.v \
-R <testfixture>.v time_sim.v
For information on back-annotating the Standard Delay Format (SDF) file for timing simulation, see Using
Standard Delay Format (SDF) with VCS.
The -R option automatically simulates the executable after compilation.
The -Mlib=<compiled_lib_dir> option provides VCS with a central place to look for the descriptor information
before it compiles a module and a central place to obtain the object files when it links the executables together.
The -Mupdate option enables incremental compilation. Modules may be recompiled for one of the following
reasons:
• The target of a hierarchical reference has changed.
• A compile time constant such as a parameter has changed.
• The ports of a module instantiated in the module have changed.
• Module inlining. For example, merging internally in VCS a group of module definitions into a larger module
definition leads to faster simulation. These affected modules are again recompiled. This is performed
only once.
SecureIP libraries can be used at compile time by leveraging the -f switch in the simulator.
If you are using the SystemVerilog switch with SecureIP, see Xilinx Answer Record 32821.
For help with SecureIP simulation, open a WebCase with Xilinx® Technical Support at
http://www.xilinx.com/support.