FPGA Tutorial
Generator for
DSP
User Guide
This document applies to the following software versions: ISE Design Suite 14.3 through 14.6
Xilinx is disclosing this user guide, manual, release note, and/or specification (the "Documentation") to you solely for use in the development
of designs to operate with Xilinx hardware devices. You may not reproduce, distribute, republish, download, display, post, or transmit the
Documentation in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise,
without the prior written consent of Xilinx. Xilinx expressly disclaims any liability arising out of your use of the Documentation. Xilinx reserves
the right, at its sole discretion, to change the Documentation without notice at any time. Xilinx assumes no obligation to correct any errors
contained in the Documentation, or to advise you of any corrections or updates. Xilinx expressly disclaims any liability in connection with
technical support or assistance that may be provided to you in connection with the Information.
THE DOCUMENTATION IS DISCLOSED TO YOU AS-IS WITH NO WARRANTY OF ANY KIND. XILINX MAKES NO OTHER
WARRANTIES, WHETHER EXPRESS, IMPLIED, OR STATUTORY, REGARDING THE DOCUMENTATION, INCLUDING ANY
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT OF THIRD-PARTY
RIGHTS. IN NO EVENT WILL XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT, EXEMPLARY, SPECIAL, OR INCIDENTAL
DAMAGES, INCLUDING ANY LOSS OF DATA OR LOST PROFITS, ARISING FROM YOUR USE OF THE DOCUMENTATION.
Copyright 2006 - 2012. Xilinx, Inc. XILINX, the Xilinx logo, Artix, ISE, Kintex, Spartan, Virtex, and other designated brands included herein
are trademarks of Xilinx in the United States and other countries. All other trademarks are the property of their respective owners.
www.xilinx.com
Table of Contents
Chapter 1: Hardware Design Using System Generator
A Brief Introduction to FPGAs
Note to the DSP Engineer
Note to the Hardware Engineer
: Hardware/Software Co-Design
Hardware/Software Co-Design in System Generator
Black Box Block
PicoBlaze Block
EDK Processor Block
Introduction
Configuring the HDL Simulator
Co-Simulating Multiple Black Boxes
Index
Chapter 1
System-Level Modeling in
System Generator
AXI Interface
405), and multi-gigabit serial transceivers. The compute and I/O resources are linked
under the control of the bitstream by a programmable interconnect architecture that allows
them to be wired together into systems.
FPGAs are high performance data processing devices. DSP performance is derived from
the FPGA's ability to construct highly parallel architectures for processing data. In contrast
with a microprocessor or DSP processor, where performance is tied to the clock rate at
which the processor can run, FPGA performance is tied to the amount of parallelism that
can be brought to bear in the algorithms that make up a signal processing system. A
combination of increasingly high system clock rates (system frequencies of 100-200
MHz are common today) and a highly-distributed memory architecture gives the system
designer an ability to exploit parallelism in DSP (and other) applications that operate on
data streams. For example, the raw memory bandwidth of a large FPGA running at a clock
rate of 150 MHz can be hundreds of terabytes per second.
There are many DSP applications (e.g., digital up/down converters) that can be
implemented only in custom integrated circuits (ICs) or in an FPGA; a von Neumann
processor lacks both the compute capability and the memory bandwidth required.
Advantages of using an FPGA include significantly lower non-recurring engineering costs
than those associated with a custom IC (FPGAs are commercial off-the-shelf devices),
shorter time to market, and the configurability of an FPGA, which allows a design to be
modified, even after deployment in an end application.
When working in System Generator, it is important to keep in mind that an FPGA has
many degrees of freedom in implementing signal processing functions. You have, for
example, the freedom to define data path widths throughout your system and to employ
many individual data processors (e.g., multiply-accumulate engines), depending on
system requirements. System Generator provides abstractions that allow you to design for
an FPGA largely by thinking about the algorithm you want to implement. However, the
more you know about the underlying FPGA, the more likely you are to exploit the unique
capabilities an FPGA provides in achieving high performance.
The remainder of this topic is a brief introduction to some of the logic resources available in
the FPGA, so that you gain some appreciation for the abstractions provided in System
Generator.
The figure above shows a physical view of a Virtex-4 FPGA. To a DSP engineer, an
FPGA can be thought of as a 2-D array of logic slices striped with columns of hard macro
blocks (block memory and arithmetic blocks) suitable for implementing DSP functions,
embedded within a configurable interconnect mesh. In a Virtex-4 FPGA, the DSP blocks
(shown in the next figure) can run in excess of 450 MHz, and are pitch-matched to dual
port memory blocks (BRAMs) whose ports can be configured to a wide range of word sizes
(18 Kb total per BRAM). The Virtex-4 SX55 device contains 512 such DSP blocks and
BRAMs. In System Generator, you can access all of these resources through arithmetic and
logic abstractions to build very high performance digital filters, FFTs, and other arithmetic
and signal processing functions.
Each logic slice contains two 4-input lookup tables (LUTs), two configurable D flip-flops,
multiplexers, dedicated carry logic, and gates used for creating slice-based multipliers.
Each LUT can implement an arbitrary 4-input Boolean function. Coupled with dedicated
logic for implementing fast carry circuits, the LUTs can also be used to build fast
adder/subtractors and multipliers of essentially any word size. In addition to
implementing Boolean functions, each LUT can also be configured as a 16x1 bit RAM or as
a shift register (SRL16). An SRL16 shift register is a synchronously clocked 16x1 bit delay
line with a dynamically addressable tap point.
In System Generator, these different memory options are represented with higher-level
abstractions. Instead of providing a D flip-flop primitive, System Generator provides a
register of arbitrary size. There are two blocks that provide abstractions of arbitrary
width, arbitrary depth delay lines that map directly onto the SRL16 configuration. The
delay block can be used for pipeline balancing, and can also be used as storage for
time-division multiplexed (TDM) data streams. The addressable shift register (ASR) block,
with a function depicted in the figure below, provides an arbitrary width, arbitrary depth
tapped delay line. This block is of particular interest to the DSP engineer, since it can be
used to implement tapped delay lines as well as sweeping through TDM data streams.
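The behavior of an addressable tapped delay line can be sketched in a few lines of Python. This is only a behavioral illustration, not the block's implementation: the class name, the method names, and the convention that address 0 returns the most recently shifted-in sample are assumptions made for the example and may not match the latency of the actual ASR block or SRL16 primitive.

```python
from collections import deque


class AddressableShiftRegister:
    """Behavioral model of a tapped delay line with a selectable read tap.

    Hypothetical helper for illustration; names and tap numbering are assumed.
    """

    def __init__(self, depth=16, initial=0):
        # Index 0 holds the newest sample, index depth-1 the oldest.
        self._taps = deque([initial] * depth, maxlen=depth)

    def clock(self, data_in):
        """Shift one new sample in; the oldest sample falls off the end."""
        self._taps.appendleft(data_in)

    def read(self, addr):
        """Return the sample shifted in `addr` clocks before the newest one."""
        return self._taps[addr]


asr = AddressableShiftRegister(depth=16)
for sample in range(1, 6):       # shift in 1, 2, 3, 4, 5
    asr.clock(sample)

newest = asr.read(0)             # most recent sample
three_back = asr.read(3)         # sample from three clocks earlier
```

Sweeping `addr` from 0 upward while holding the data input steady is the software analogue of reading successive taps of the delay line.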
Although random access memories can be constructed either out of the BRAM or LUT
(RAM16x1) primitives, doing so can require considerable care to ensure most efficient
mappings, and considerable clerical attention to detail to correctly assemble the primitives
into larger structures. System Generator removes the need for such tasks.
For example, the dual port RAM (DPRAM) block shown in the figure below maps
efficiently onto as many BRAM or RAM16x1 components on the device as are necessary to
implement the desired memory. As can be seen from the mask dialog box for the DPRAM,
the interface allows you to specify a type of memory (BRAM or RAM16x1), depth (data
width is inferred from the Simulink signal driving a particular input port), initial memory
contents, and other characteristics.
In general, System Generator maps abstractions onto device primitives efficiently, freeing
you from worrying about interconnections between the primitives. System Generator
employs libraries of intellectual property (IP) when appropriate to provide efficient
implementations of functions in the block libraries. In this way, you don't always have to
have detailed knowledge of the underlying FPGA. However, when it makes sense
to implement an algorithm using basic functions (e.g., adder, register, memory), System
Generator allows you to exploit your FPGA knowledge while reducing the clerical tasks of
managing all signals explicitly.
System Generator library blocks and the mapping from Simulink to hardware are
described in detail in subsequent topics of this documentation. There is a wealth of
detailed information about FPGAs that can be found online at http://support.xilinx.com,
including data books, application notes, white papers, and technical articles.
Algorithm Exploration
System Generator is particularly useful for algorithm exploration, design prototyping, and
model analysis. When these are the goals, you can use the tool to flesh out an algorithm in
order to get a feel for the design problems that are likely to be faced, and perhaps to
estimate the cost and performance of an implementation in hardware. The work is
preparatory, and there is little need to translate the design into hardware.
In this setting, you assemble key portions of the design without worrying about fine points
or detailed implementation. Simulink blocks and MATLAB M-code provide stimuli for
simulations, and for analyzing results. Resource estimation gives a rough idea of the cost
of the design in hardware. Experiments using hardware generation can suggest the
hardware speeds that are possible.
Once a promising approach has been identified, the design can be fleshed out. System
Generator allows refinements to be done in steps, so some portions of the design can be
made ready for implementation in hardware, while others remain high-level and abstract.
System Generator's facilities for hardware co-simulation are particularly useful when
portions of a design are being refined.
less well suited for sophisticated external interfaces that have strict timing requirements. In
this case, it may be useful to implement parts of the design using System Generator,
implement other parts outside, and then combine the parts into a working whole.
A typical approach to this flow is to create an HDL wrapper that represents the entire
design, and to use the System Generator portion as a component. The non-System
Generator portions of the design can also be components in the wrapper, or can be
instantiated directly in the wrapper.
- A clock wrapper that encloses the design. This clock wrapper produces the clock and
clock enable signals that the design needs.
- An HDL testbench that encloses the clock wrapper. The testbench allows results from
Simulink simulations to be compared against ones produced by a logic simulator.
- Project files and scripts that allow various synthesis tools, such as XST and Synplify
Pro, to operate on System Generator HDL.
- Files that allow the System Generator HDL to be used as a project in Project
Navigator.
For details concerning the files that System Generator writes, see the topic Compilation
Results.
Signal Types
Synchronization Mechanisms
Resource Estimation
System Generator blocks are bit-accurate and cycle-accurate. Bit-accurate blocks produce
values in Simulink that match corresponding values produced in hardware; cycle-accurate
blocks produce corresponding values at corresponding times.
Xilinx Blockset
The Xilinx Blockset is a family of libraries that contain basic System Generator blocks.
Some blocks are low-level, providing access to device-specific hardware. Others are
high-level, implementing (for example) signal processing and advanced communications
algorithms. For convenience, blocks with broad applicability (e.g., the Gateway I/O
blocks) are members of several libraries. Every block is contained in the Index library. The
libraries are described below.
Note: It is important that you don't name your design the same as a Xilinx block. For example, if you
name your design shared_memory.mdl, it may cause System Generator to issue an error message.
Library
Description
AXI4
Basic Elements
Communication
Control Logic
DSP
Data Types
Floating-Point
Index
Math
Memory
Shared Memory
Tools
Note: More information concerning blocks can be found in the topic Xilinx Blockset.
Library
Description
Communication
Control Logic
DSP
Imaging
Math
Each block in this blockset is a composite, i.e., is implemented as a masked subsystem, with
parameters that configure the block.
You can use blocks from the Reference Blockset libraries as is, or as starting points when
constructing designs that have similar characteristics. Each reference block has a
description of its implementation and hardware resource requirements. Individual
documentation for each block is also provided in the topic Xilinx Reference Blockset.
Signal Types
In order to provide bit-accurate simulation of hardware, System Generator blocks operate
on Boolean, floating-point, and arbitrary precision fixed-point values. By contrast, the
fundamental scalar signal type in Simulink is double precision floating point. The
connection between Xilinx blocks and non-Xilinx blocks is provided by gateway blocks. The
gateway in converts a double precision signal into a Xilinx signal, and the gateway out
converts a Xilinx signal into double precision. Simulink continuous time signals must be
sampled by the Gateway In block.
Most Xilinx blocks are polymorphic, i.e., they are able to deduce appropriate output types
based on their input types. When full precision is specified for a block in its parameters
dialog box, System Generator chooses the output type to ensure no precision is lost. Sign
extension and zero padding occur automatically as necessary. User-specified precision is
usually also available. This allows you to set the output type for a block and to specify how
quantization and overflow should be handled. Quantization possibilities include unbiased
rounding towards plus or minus infinity, depending on sign, or truncation. Overflow
options include saturation, truncation, and reporting overflow as an error.
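The quantization and overflow choices can be illustrated with a small Python model. It is a sketch under stated assumptions rather than the tool's arithmetic: the function name `to_fixed` and its parameters are hypothetical, "round" is read here as rounding ties away from zero, truncation as discarding bits below the least significant bit, and the overflow option the text calls truncation is modeled as two's complement wraparound (named `"wrap"` below).

```python
import math


def to_fixed(value, n_bits, bin_pt, signed=True, quant="truncate", overflow="wrap"):
    """Return the real value that results from converting `value` to fixed point.

    Hypothetical helper: n_bits is the total width, bin_pt the number of
    fractional bits.
    """
    scaled = value * (1 << bin_pt)
    if quant == "truncate":
        raw = math.floor(scaled)                 # discard bits below the LSB
    elif quant == "round":
        # Ties go away from zero: toward +inf if positive, -inf if negative.
        raw = int(math.copysign(math.floor(abs(scaled) + 0.5), scaled))
    else:
        raise ValueError(f"unknown quantization mode: {quant}")

    lo = -(1 << (n_bits - 1)) if signed else 0
    hi = (1 << (n_bits - 1)) - 1 if signed else (1 << n_bits) - 1
    if raw < lo or raw > hi:
        if overflow == "saturate":
            raw = max(lo, min(hi, raw))          # clamp to the nearest limit
        elif overflow == "wrap":
            raw &= (1 << n_bits) - 1             # keep only the low n_bits
            if signed and raw > hi:
                raw -= 1 << n_bits               # reinterpret as two's complement
        elif overflow == "error":
            raise OverflowError(f"{value} does not fit in {n_bits} bits")
        else:
            raise ValueError(f"unknown overflow mode: {overflow}")
    return raw / (1 << bin_pt)


truncated = to_fixed(-0.3, 8, 4, quant="truncate")
rounded = to_fixed(-0.3, 8, 4, quant="round")
saturated = to_fixed(10.0, 8, 4, overflow="saturate")
wrapped = to_fixed(10.0, 8, 4, overflow="wrap")
```

With four fractional bits, -0.3 truncates to a more negative value than it rounds to, and 10.0 exceeds the signed 8-bit range, so the two overflow modes give very different results.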
Note: System Generator data types can be displayed by selecting Format > Port Data Types in
Simulink. Displaying data types makes it easy to determine precision throughout a model. If, for
example, the type for a port is Fix_11_9, then the signal is a two's complement signed 11-bit number
having nine fractional bits. Similarly, if the type is Ufix_5_3, then the signal is an unsigned 5-bit
number having three fractional bits.
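As a quick way to read such type names, the following Python sketch parses a name of the form Fix_N_B or UFix_N_B and reports the representable range and resolution. The function `describe_type` is a hypothetical helper written for this illustration; it is not part of System Generator.

```python
import re


def describe_type(name):
    """Parse a name such as 'Fix_11_9' or 'Ufix_5_3' into numeric properties."""
    m = re.fullmatch(r"(U?)Fix_(\d+)_(\d+)", name, flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"not a fixed-point type name: {name}")
    signed = m.group(1) == ""                    # a leading 'U' means unsigned
    n_bits, bin_pt = int(m.group(2)), int(m.group(3))
    step = 2.0 ** -bin_pt                        # weight of the least significant bit
    lo_raw = -(1 << (n_bits - 1)) if signed else 0
    hi_raw = (1 << (n_bits - 1)) - 1 if signed else (1 << n_bits) - 1
    return {
        "signed": signed,
        "n_bits": n_bits,
        "bin_pt": bin_pt,
        "step": step,
        "min": lo_raw * step,
        "max": hi_raw * step,
    }


fix_11_9 = describe_type("Fix_11_9")
ufix_5_3 = describe_type("Ufix_5_3")
```

For Fix_11_9 the two integer bits (one of them the sign) give a range of -2 up to just under 2 in steps of 2^-9; for Ufix_5_3 the range is 0 to 3.875 in steps of 0.125.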
In the System Generator portion of a Simulink model, every signal must be sampled.
Sample times may be inherited using Simulink's propagation rules, or set explicitly in a
block customization dialog box. When there are feedback loops, System Generator is
sometimes unable to deduce sample periods and/or signal types, in which case the tool
issues an error message. Assert blocks must be inserted into loops to address this problem.
It is not necessary to add assert blocks at every point in a loop; usually it suffices to add an
assert block at one point to break the loop.
Note: Simulink can display a model by shading blocks and signals that run at different rates with
different colors (Format > Sample Time Colors in the Simulink pulldown menus). This is often useful
in understanding multirate designs.
[Figure: floating-point data format. Sign bit S, followed by X exponent bits (E0 to EX-1) and Y fraction bits (F0 to FY-1).]
According to the IEEE-754 standard, a floating-point value is represented and stored in the
normalized form. In the normalized form the exponent value E is a biased/normalized
value. The normalized exponent, E, equals the sum of the actual exponent value and the
exponent bias. In the normalized form, Y-1 bits are used to store the fraction value. The F0
fraction bit is always a hidden bit and its value is assumed to be 1.
S represents the value of the sign of the number. If S is 0 then the value is a positive
floating-point number; otherwise it is negative. The X bits that follow are used to store the
normalized exponent value E and the last Y-1 bits are used to store the fraction/mantissa
value in the normalized form.
For the given exponent width, the exponent bias is calculated using the following
equation:
bias = 2^(X-1) - 1, where X is the exponent width in bits
In compliance with the IEEE-754 standard, if Single precision is selected then the total bit
width is assumed to be 32; 8 bits for the exponent and 24 bits for the fraction. Similarly
when Double precision is selected, the total bit width is assumed to be 64 bits; 11 bits for
the exponent and 53 bits for the fraction part. When Custom precision is selected, the
Exponent width and Fraction width fields are activated and you are free to specify values
for these fields (8 and 24 are the default values). The total bit width for Custom precision
data is the summation of the number of exponent bits and the number of fraction bits.
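The bit-width bookkeeping above can be checked with a short Python sketch. It assumes the standard IEEE-754 exponent bias of 2^(X-1) - 1 for an X-bit exponent, and it follows the text in counting the hidden bit as part of the fraction width. The function name `float_format` is hypothetical.

```python
def float_format(exponent_width, fraction_width):
    """Summarize a floating-point format given its exponent and fraction widths.

    fraction_width includes the hidden bit, as in the text above.
    """
    return {
        # The sign bit takes the place of the hidden bit, which is not stored.
        "total_bits": exponent_width + fraction_width,
        "bias": (1 << (exponent_width - 1)) - 1,      # assumed IEEE-754 bias
        "stored_fraction_bits": fraction_width - 1,
    }


single = float_format(8, 24)
double = float_format(11, 53)
custom = float_format(6, 12)      # an arbitrary Custom precision example
```

The same function covers Custom precision: any exponent and fraction width pair yields a total width equal to their sum.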
As with the fraction bit width for the Single precision and Double precision data types, the
fraction bit width for the Custom precision data type must include the hidden bit F0.
2. If the data input (both A and B data inputs, where applicable) and the data output of a
System Generator block are not of the same floating-point data type. The DRC check
is made between the two inputs of a block as well as between an input and an
output of the block.
If a Custom precision floating-point data type is specified, the exponent bit width and
the fraction bit width of the two ports are compared to determine that they are of the
same data type.
Note: The Convert and Relational blocks are excluded from this check. The Convert block
supports Float-to-float data type conversion between two different floating-point data types. The
Relational block output is always the Boolean data type because it gives a true or false result for
a comparison operation.
3. If the data inputs are of the fixed-point data type and the data output is expected to be
floating-point, or vice versa.
Note: The Convert and Relational blocks are excluded from this check. The Convert block
supports Fixed-to-float as well as Float-to-fixed data type conversion. The Relational block
output is always the Boolean data type because it gives a true or false result for a comparison
operation.
4. If User Defined precision is selected for the Output Type of blocks that support the
floating-point data type. For example, for blocks such as AddSub, Mult, CMult, and
MUX, only Full output precision is supported if the data inputs are of the
floating-point data type.
5. If the Carry In port or Carry Out port is used for the AddSub block when the operation
on a floating-point data type is specified.
6. If the Floating-Point Operator IP core gives an error for DRC rules defined for the IP.
boundaries of the design are the points at which System Generator gateway blocks exist.
When a design is translated into hardware, Gateway In (respectively, Gateway Out) blocks
become top-level input (resp., output) ports.
The Gateway In block is configured with a sample period of one second. The Gateway Out
block converts the Xilinx fixed-point signal back to a double (so it can be analyzed in the
Simulink scope), but does not alter sample rates. The scope output below shows the
unaltered and sampled versions of the sine wave.
Multirate Models
System Generator supports multirate designs, i.e., designs having signals running at
several sample rates. System Generator automatically compiles multirate models into
hardware. This allows multirate designs to be implemented in a way that is both natural
and straightforward in Simulink.
Rate-Changing Blocks
System Generator includes blocks that change sample rates. The most basic rate changers
are the Up Sample and Down Sample blocks. As shown in the figure below, these blocks
explicitly change the rate of a signal by a fixed multiple that is specified in the block's
dialog box.
Other blocks (e.g., the Parallel To Serial and Serial To Parallel converters) change rates
implicitly in a way determined by block parameterization.
Consider the simple multirate example below. This model has two sample periods, SP1
and SP2. The Gateway In dialog box defines the sample period SP1. The Down Sample
block causes a rate change in the model, creating a new sample period SP2 that is twice as long as SP1 (half the rate).
Hardware Oversampling
Some System Generator blocks are oversampled, i.e., their internal processing is done at a
rate that is faster than their data rates. In hardware, this means that the block requires more
than one clock cycle to process a data sample. In Simulink such blocks do not have an
observable effect on sample rates.
One block that can be oversampled is the DAFIR FIR filter. An oversampled DAFIR
processes samples serially, thus running at a higher rate, but using less hardware.
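The trade-off between internal rate and hardware can be illustrated with a serial filter loop in Python. Note that this sketch uses a plain multiply-accumulate loop, not distributed arithmetic, so it is not a model of the DAFIR block itself; it only shows how one shared arithmetic unit stepping through the taps needs several internal clock cycles per data sample. The function name and the cycle counter are assumptions made for the example.

```python
def serial_fir(samples, coeffs):
    """Filter `samples` using a single multiply-accumulate step per clock cycle.

    Returns the output samples and the number of internal clock cycles used.
    """
    n_taps = len(coeffs)
    delay_line = [0] * n_taps            # delay_line[0] is the newest sample
    outputs = []
    clock_cycles = 0
    for x in samples:
        delay_line = [x] + delay_line[:-1]
        acc = 0
        for addr in range(n_taps):       # one tap is processed per clock cycle
            acc += coeffs[addr] * delay_line[addr]
            clock_cycles += 1
        outputs.append(acc)
    return outputs, clock_cycles


# An impulse input reproduces the coefficients at the output.
outputs, cycles = serial_fir([1, 0, 0, 0, 0, 0], [1, 2, 3, 4, 5])
```

Six input samples through a 5-tap filter take 30 internal cycles here, so the internal rate is five times the data rate, while only one multiplier and one accumulator are modeled.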
Although blocks that are oversampled do not cause an explicit sample rate change in
Simulink, System Generator considers the internal block rate along with all other sample
rates when generating clocking logic for the hardware implementation. This means that
you must consider the internal processing rates of oversampled blocks when you specify
the Simulink system period value in the System Generator token dialog box.
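One way to reason about a suitable system period is to take the greatest common divisor of every sample period in the design, including the internal periods of oversampled blocks. The Python sketch below does this with exact rational arithmetic. Treating the system period as this greatest common divisor is an assumption made for the illustration; consult the System Generator token documentation for the authoritative rule. The function name `system_period` is hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def system_period(periods):
    """Return the largest period that divides every given sample period exactly.

    Periods may be integers, Fractions, or strings such as "1/5". Floats are
    best avoided because values like 0.2 are not exact in binary.
    """
    def frac_gcd(a, b):
        # gcd(p/q, r/s) = gcd(p*s, r*q) / (q*s)
        return Fraction(
            gcd(a.numerator * b.denominator, b.numerator * a.denominator),
            a.denominator * b.denominator,
        )

    return reduce(frac_gcd, [Fraction(p) for p in periods])


# Data period 1, a downsampled period of 2, and a hypothetical 5-tap
# oversampled filter whose internal period is 1/5 of its data period.
period = system_period(["1", "2", "1/5"])
```

In this example the oversampled block, not the Simulink-visible rates, sets the result, which is the situation the paragraph above warns about.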
Asynchronous Clocking
System Generator focuses on the design of hardware that is synchronous to a single clock.
It can, under some circumstances, be used to design systems that contain more than one
clock. This is possible provided the design can be partitioned into individual clock
domains with the exchange of information between domains being regulated by dual port
memories and FIFOs. System Generator fully supports such multi-clock designs, including
the ability to simulate them in Simulink and to generate complete hardware descriptions.
Details are discussed in the topic Generating Multiple Cycle-True Islands for Distinct
Clocks. The remainder of this topic focuses exclusively on the clock-synchronous aspects
of System Generator. This discussion is relevant to both single-clock and multiple-clock
designs.
Synchronous Clocking
As shown in the figure below, when you use the System Generator token to compile a
design into hardware, there are three clocking options for Multirate implementation: (1)
Clock Enables (the default), (2) Hybrid DCM-CE, and (3) Expose Clock Ports.
CE4 equal 2, 3, and 4 system clock periods, respectively. A timing diagram for the example
clock enable signals is shown below:
Clock Probe
DAFIR
Downsample - when the Sample option First value of the frame is selected
FIR Compiler - when the core rate is not equal to the input sample rate
Upsample - when the Copy samples (otherwise zeros are inserted) option is not
selected.
Clock Probe
DAFIR
Downsample - when the Sample option First value of the frame is selected
FIR Compiler - when the core rate is not equal to the input sample rate
Upsample - when the Copy samples (otherwise zeros are inserted) option is not
selected.
Addressable Shift Register (ASR): used to implement the input delay buffer. The
address port runs n times faster than the data port, where n is the number of the filter
taps (5 for this example)
2.
Double-click on the System Generator token to bring up the following dialog box:
As shown, click on the Clocking tab, select Hybrid DCM-CE, then click Generate.
After a few moments, a sub-directory named hdl_netlist_dcm is created in the current
working directory containing the generated files.
3.
4.
5.
6.
From the Project Navigator Design Sources Hierarchy view, do the following:
a.
b.
Observe that System Generator automatically infers and instantiates the DCM
instance and its parameters according to the required clock outputs.
c.
Next, you are going to examine the clock propagation by examining the ISE timing report.
First, you must generate the report.
7.
Open the following folder: Processes view > Implement Design > Place & Route >
Generate Post-Place & Route Static Timing
8.
Double-click on Analyze Post-Place & Route Static Timing and you should see the
information in the figure below:
This design comprises six clock rates (1, 2, 4, 8, 20, and 40) with respect to the 10 ns
global clock constraint. The timing report validates the correct clock generation and
propagation by System Generator as follows:
- DCM-based clocks: clk_1 (CLK0 -> 10 ns), clk_2 (CLKFX -> 20 ns), and clk_4 (CLKDIV
-> 40 ns), generated by the DCM based on the 10 ns global clock input
- Clock Enable-based clocks: ce_8 (80 ns), ce_20 (200 ns), and ce_40 (400 ns), generated
by clock enables based on the clk_4 clock input
As shown in the following figure, move to the Sources for dialog box in the Sources
window, then select Behavioral Simulation
Note: System Generator automatically creates the top-wrapper VHDL testbench, script file and
input/output stimulus data files. The Processes tab changes and displays according to the
Sources type being selected.
10. Simulate the design, as shown above, by double-clicking on Simulate Behavioral Model
in the Processes window
11. After the simulation is finished, you should be able to observe the simulation
waveforms as shown in the figure below:
All DCM clocks (clk_1, clk_2, and clk_4) are included in the top-level wrapper testbench file
(hybrid_dcm_ce_case1_dcm_mcw_tb.vhd).
Summary
When you select the Hybrid DCM-CE option, System Generator automatically infers and
instantiates a DCM without further manual intervention. In addition, the tool intelligently
generates different clock rates by using a combination of DCM and CE clock generation
algorithms and by assigning appropriate clock rates to either the DCM or CE in order to
obtain optimal Quality of Results and low power consumption. You do not have to set
attributes or specify DCM clock outputs. You should expect minimal clock skew when
selecting the Hybrid DCM-CE option compared to the Clock Enables option alone.
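The clock periods reported in the walkthrough's timing report follow directly from the rates and the 10 ns global clock. The Python sketch below reproduces that arithmetic. The split of rates between DCM outputs and clock enables is copied from the report described above, not computed; the function name, the dictionary layout, and the derived `clk_cycles_per_enable` field (how many clk_4 cycles elapse per enable pulse) are assumptions made for illustration.

```python
GLOBAL_PERIOD_NS = 10          # global clock constraint from the walkthrough
DCM_RATES = (1, 2, 4)          # rates served by DCM outputs in the report
CE_RATES = (8, 20, 40)         # rates served by clock enables in the report


def clock_plan(base_ns=GLOBAL_PERIOD_NS, dcm_rates=DCM_RATES, ce_rates=CE_RATES):
    """Tabulate the period of each clock and how each clock enable is derived."""
    slowest_dcm = max(dcm_rates)           # clock enables are based on clk_4
    plan = {}
    for rate in dcm_rates:
        plan[f"clk_{rate}"] = {"source": "DCM", "period_ns": rate * base_ns}
    for rate in ce_rates:
        plan[f"ce_{rate}"] = {
            "source": "CE",
            "period_ns": rate * base_ns,
            "clk_cycles_per_enable": rate // slowest_dcm,
        }
    return plan


plan = clock_plan()
for name, info in plan.items():
    print(name, info)
```

Under these assumptions ce_8, ce_20, and ce_40 pulse once every 2, 5, and 10 cycles of clk_4, which matches the 80 ns, 200 ns, and 400 ns periods in the report.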
1.
Addressable Shift Register (ASR): used to implement the input delay buffer. The
address port runs n times faster than the data port, where n is the number of the filter
taps (5 for this example)
2.
Double-click on the System Generator token to bring up the following dialog box:
As shown above, click on the Clocking tab, select Expose Clock Ports, then click Generate.
After a few moments, a sub-directory named hdl_netlist is created in the current working
directory containing the generated files.
3.
4.
5.
From the Project Navigator Design Sources Hierarchy view, do the following:
a.
b. Observe that System Generator infers the clocks based on the different rates in the
design and brings the clock ports to the top-level wrapper. Since this design
contains two clock rates, clocks clk_1 and clk_5 are pulled to the top-level
wrapper. This will allow you to directly drive the multiple synchronous clocks
from outside the System Generator design.
c.
As shown below, move to the Sources for dialog box in the Sources window, then
select Behavioral Simulation
Note: System Generator automatically creates the top-wrapper VHDL testbench, script file and
input/output stimulus data files. The Processes tab changes and displays according to the
Sources type being selected.
7.
8.
After the simulation is finished, you should be able to observe the simulation
waveforms as shown in the figure below:
Summary
When you select the Expose Clock Ports option, System Generator automatically infers the
correct clocks from the design rates and exposes the clock ports in the top-level wrapper.
The clock rates are determined by the same methodology as when you use the Clock Enables
option. You can now drive the exposed clock ports from an external synchronous clock
source.
Synchronization Mechanisms
System Generator does not make implicit synchronization mechanisms available. Instead,
synchronization is the responsibility of the designer, and must be done explicitly.
Valid Ports
System Generator provides several blocks (in particular, a FIFO) that can be used for
synchronization. Several blocks provide input (respectively, output) ports that specify
when an input (resp., output) sample is valid. Such ports can be chained, affording a
primitive form of flow control. Blocks with such ports include the FFT, FIR, and Viterbi.
Indeterminate Data
Indeterminate values are common in many hardware simulation environments. Often they
are called "don't cares" or "Xs". In particular, values in System Generator simulations can
be indeterminate. A dual port memory block, for example, can produce indeterminate
results if both ports of the memory attempt to write the same address simultaneously.
What actually happens in hardware depends upon effectively random implementation
details that determine which port sees the clock edge first. Allowing values to become
indeterminate gives the system designer greater flexibility. Continuing the example, there
is nothing wrong with writing to memory in an indeterminate fashion if subsequent
processing does not rely on the indeterminate result.
HDL modules that are brought into the simulation through HDL co-simulation are a
common source for indeterminate data samples. System Generator presents indeterminate
values to the inputs of an HDL co-simulating module as the standard logic vector 'XXX...XX'.
Indeterminate values that drive a Gateway Out become what are called NaNs. (NaN
abbreviates "not a number.") In a Simulink scope, NaN values are not plotted. Conversely,
NaNs that drive a Gateway In become indeterminate values. System Generator provides
an Indeterminate Probe block that allows for the detection of indeterminate values. This
probe cannot be translated into hardware.
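The flow of indeterminate values described above can be mimicked in Python. This is a conceptual sketch only: the `INDETERMINATE` marker, the `DualPortRam` class, and the `gateway_out` and `gateway_in` functions are hypothetical names invented for the example and do not correspond to any System Generator API.

```python
import math

INDETERMINATE = None      # hypothetical marker for an indeterminate sample


class DualPortRam:
    """Toy dual port memory in which a write collision yields indeterminate data."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def write_both(self, addr_a, data_a, addr_b, data_b):
        if addr_a == addr_b:
            # Both ports write the same address in the same cycle, so the
            # stored value is not defined.
            self.mem[addr_a] = INDETERMINATE
        else:
            self.mem[addr_a] = data_a
            self.mem[addr_b] = data_b


def gateway_out(value):
    """Indeterminate values leave the modeled design as NaN."""
    return math.nan if value is INDETERMINATE else float(value)


def gateway_in(value):
    """NaN entering the modeled design becomes an indeterminate value."""
    return INDETERMINATE if math.isnan(value) else value


ram = DualPortRam(depth=8)
ram.write_both(2, 10, 5, 20)      # distinct addresses: both writes land
ram.write_both(3, 7, 3, 9)        # collision on address 3

clean = gateway_out(ram.mem[2])
collided = gateway_out(ram.mem[3])
```

As in the text, the collision is harmless as long as nothing downstream reads address 3 before it is rewritten; a check such as `math.isnan(collided)` plays the role of the Indeterminate Probe in this toy model.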
In System Generator, any arithmetic signal can be indeterminate, but Boolean signals
cannot be. If a simulation reaches a condition that would force a Boolean to become
indeterminate, the simulation is halted and an error is reported. Many Xilinx blocks have
control ports that only allow Boolean signals as inputs. The rule concerning indeterminate
Booleans means that such blocks never see an indeterminate value on a control port.
A UFix_1_0 is a type that is equivalent to Boolean except for the above restriction
concerning indeterminate data.
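The rules above can be illustrated with a small Python model. This is only a sketch and not System Generator code: the names INDETERMINATE, dual_port_write, add, and to_boolean are hypothetical, and a floating-point NaN stands in for an indeterminate sample.

```python
import math

# Assumption: a NaN stands in for an indeterminate ("X") sample.
INDETERMINATE = float("nan")


def dual_port_write(memory, addr_a, data_a, addr_b, data_b):
    """Write through both ports; a same-address collision leaves the cell indeterminate."""
    if addr_a == addr_b:
        memory[addr_a] = INDETERMINATE
    else:
        memory[addr_a] = data_a
        memory[addr_b] = data_b


def add(a, b):
    """Arithmetic signals may carry an indeterminate value, which propagates."""
    return a + b


def to_boolean(value):
    """Boolean signals may not be indeterminate; the simulation halts instead."""
    if math.isnan(value):
        raise RuntimeError("Boolean signal would become indeterminate")
    return value != 0


memory = [0.0] * 4
dual_port_write(memory, 2, 1.0, 2, 5.0)   # both ports write address 2
collided = memory[2]                      # indeterminate
still_bad = add(collided, 1.0)            # stays indeterminate
untouched = memory[0]                     # other addresses are unaffected
```

As in the memory example, the collision is harmless as long as later processing reads only locations such as memory[0] and never relies on memory[2].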
Block Masks
In Simulink, blocks are parameterized through a mechanism called masking. In essence, a
block can be assigned mask variables whose values can be specified by a user through dialog
box prompts or can be calculated in mask initialization commands. Variables are stored in
a mask workspace. A mask workspace is local to the blocks under the mask and cannot be
accessed by external blocks.
Note: It is possible for a mask to access global variables and variables in the base workspace. To
access a base workspace variable, use the MATLAB evalin function. For more information on the
MATLAB and Simulink scoping rules, refer to the manuals titled Using MATLAB and Using Simulink
from The MathWorks, Inc.
Parameter Passing
It is often desirable to pass variables to blocks inside a masked subsystem. Doing so allows
the blocks' configuration to be determined by parameters on the enclosing subsystem. This
technique can be applied to parameters on blocks in the Xilinx blockset whose values are
set using a listbox, radio button, or checkbox. For example, when building a subsystem
that consists of a multiply and accumulate block, you can create a parameter on the
subsystem that allows you to specify whether to truncate or round the result. This
parameter will be called trunc_round as shown in the figure below.
As shown below, in the parameter editing dialog for the accumulator and multiplier
blocks, there are radio buttons that allow either the truncate or round option to be selected.
In order to use a parameter rather than the radio button selection, right click on the radio
button and select: Define With Expression. A MATLAB expression can then be used as
the parameter setting. In the example below, the trunc_round parameter from the
subsystem mask can be used in both the accumulator and multiply blocks so that each
block will use the same setting from the mask variable on the subsystem.
Resource Estimation
System Generator supplies tools that estimate the FPGA hardware resources needed to
implement a design. Estimates include numbers of slices, lookup tables, flip-flops, block
memories, embedded multipliers, I/O blocks and tristate buffers. These estimates make it
easy to determine how design choices affect hardware requirements. To estimate the
resources needed for a subsystem, drag a Resource Estimator block into the subsystem,
double-click on the estimator, and press the Estimate button.
Compilation Results
HDL Testbench
EDK Export Tool - for exporting to the Xilinx Embedded Development Kit
Timing and Power Analysis - a report on the timing and power consumption of the
design.
HDL Netlist is the type used most often. In this case, the result is a collection of HDL and
EDIF files, and a few auxiliary files that simplify downstream processing. The collection is
ready to be processed by a synthesis tool (e.g., XST), and then fed to the Xilinx physical
design tools (i.e., ngdbuild, map, par, and bitgen) to produce a configuration bitstream for
a Xilinx FPGA. The files that are produced are described in more detail in Compilation
Results.
NGC Netlist is similar to HDL Netlist but the resulting files are NGC files instead of HDL
files.
When the type is a variety of hardware co-simulation, then System Generator produces an
FPGA configuration bitstream that is ready to run in a hardware FPGA platform. The
particular platform depends on the variety chosen. For example, when the variety is
Hardware Co-simulation > XtremeDSP Development Kit > PCI and USB, then the
bitstream is suitable for the XtremeDSP board (available for separate purchase from
Xilinx). System Generator also produces a hardware co-simulation block to which the
bitstream is associated. This block is able to participate in Simulink simulations. It is
functionally equivalent to the portion of the design from which it was derived, but is
implemented by its bitstream. In a simulation, the block delivers the same results as those
produced by the portion, but the results are calculated in working hardware.
The remaining compilation parameters are described in the table below. Some are
available only when the compilation type is HDL Netlist. For example, the clock pin
location cannot be chosen for a hardware co-simulation compilation because it is fixed in
each hardware FPGA platform.
Control
Description
Part
Target Directory
Control
Description
Synthesis tool
Hardware description
language
Specifies the language to be used for HDL netlist of the design. The
possibilities are VHDL and Verilog.
Create testbench
Import as
configurable
subsystem
Defines the pin location for the hardware clock. This information is
passed to the Xilinx implementation tools through a constraints file.
Multirate
implementation
Control
Description
This instructs System Generator to provide a ce_clr port on the top-level clock wrapper. The ce_clr signal is used to reset the clock
enable generation logic. The ability to reset the clock enable generation
logic allows designs to have dynamic control for specifying the
beginning of data path sampling. See the topic for details.
Input data types: the input data types for each port are shown
Output data types: the output data types for each port are shown
Hierarchical Controls
The Simulink System Period control (see the topic Simulink System Period above) on the
System Generator token is hierarchical. A hierarchical control on a System Generator token
applies to the portion of the design within the scope of the token, but can be overridden on
other System Generator tokens deeper in the design. For example, suppose Simulink
System Period is set in a System Generator token at the top of the design, but is changed in
a System Generator token within a subsystem S. Then that subsystem will have the second
period, but the rest of the design will use the period set in the top level.
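The override rule can be sketched as a lookup that walks from a block's location up toward the top of the design and stops at the nearest enclosing token. The Python below is an illustration only; effective_period, the path strings, and the period values are hypothetical and are not part of System Generator.

```python
def effective_period(block_path, token_periods):
    """Return the period set by the deepest token whose scope encloses block_path.

    token_periods maps a subsystem path (hypothetical naming, '/'-separated)
    to the Simulink System Period set on the token placed in that subsystem.
    """
    parts = block_path.split("/")
    # Try the innermost enclosing subsystem first, then move outward.
    for depth in range(len(parts), 0, -1):
        scope = "/".join(parts[:depth])
        if scope in token_periods:
            return token_periods[scope]
    return None  # no token encloses this block


# A token at the top of the design and a second token inside subsystem S.
tokens = {"top": 1.0, "top/S": 0.5}

inside_s = effective_period("top/S/fir", tokens)    # uses the token in S
elsewhere = effective_period("top/T/reg", tokens)   # falls back to the top level
```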
Compilation Results
This topic discusses the low-level files System Generator produces when HDL Netlist is
selected on the System Generator token and Generate is clicked. The files consist of HDL,
NGC and EDIF that implement the design. In addition, System Generator produces
auxiliary files that simplify downstream processing, e.g., bringing the design into Project
Navigator, simulating using an HDL simulator, and synthesizing using various synthesis
tools. All files are written to the target directory specified on the System Generator token.
If no testbench is requested, then the key files produced by System Generator are the
following:
File Name or Type
Description
<design>.vhd/.v
<design>_cw.vhd/.v
globals
<design>_cw.ise
This allows the HDL and EDIF to be brought into the Xilinx
project management tool Project Navigator.
hdlFiles
synplify_<design>.prj, or
xst_<design>.pr
vcom.do
If a testbench is requested, then, in addition to the above, System Generator produces files
that allow simulation results to be compared. The comparisons are between Simulink
simulation results and corresponding results from ModelSim. The additional files are the
following:
File Name or Type
Description
<design>_tb.vhd/.v
vsim.do
pn_behavioral.do,
pn_postmap.do,
pn_postpar.do,
pn_posttranslate.do
The speed, with respect to the system clock, at which various portions of the design
must run;
The file format depends on the synthesis tool that is specified in the System Generator
token. When XST is selected, the file is written in the XCF format; for Synplify and Synplify
Pro, the NCF format is used. The file name ends with .xcf or .ncf, as appropriate.
Constraints Example
The figure below shows a small multirate design and the constraints System Generator
produces for it.
The up sampler doubles the rate, and the down sampler divides the rate by three. Assume
the system clock period is 10 ns. Then the clock periods are 10 ns for the FIR, 20 ns for the
input register, and 30 ns for the output register. The following text describes the constraints
that convey this information.
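The period arithmetic can be checked with a few lines of Python. This is a sketch under the assumptions stated in the paragraph above (a 10 ns system clock that corresponds to the fastest rate in the design); the function periods_ns and the dictionary keys are hypothetical names.

```python
from fractions import Fraction


def periods_ns(system_period_ns, relative_rates):
    """Map each block to its clock period, given sample rates relative to each other.

    The fastest rate in the design runs at the system clock period.
    """
    fastest = max(relative_rates.values())
    return {
        name: system_period_ns * fastest / rate
        for name, rate in relative_rates.items()
    }


input_rate = Fraction(1)        # rate at the input register
fir_rate = input_rate * 2       # the up sampler doubles the rate
output_rate = fir_rate / 3      # the down sampler divides the rate by three

periods = periods_ns(
    10,
    {"input register": input_rate, "FIR": fir_rate, "output register": output_rate},
)
```

With these inputs the model yields 20 ns for the input register, 10 ns for the FIR, and 30 ns for the output register, which are the values the constraints below express.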
The lines that indicate that the system clock period is 10 ns are the following:
# Global period constraint
NET "clk" TNM_NET = "clk_392b7670";
TIMESPEC "TS_clk_392b7670" = PERIOD "clk_392b7670" 10.0 ns HIGH 50 %;
To build timing constraints, the blocks in the design are partitioned into timing groups.
Two blocks are in the same timing group if and only if they run at the same sample rate. In
this design there are three timing groups, corresponding to the three rates. The nature of
constraints dictates that no name is needed for the fastest group. The remaining groups are
named ce_2_392b7670_group and ce_3_392b7670_group; they correspond to periods 20 ns
and 30 ns respectively.
The FIR runs at the system (i.e., fastest) rate and therefore is constrained using the global
period constraint shown above. The logic used to generate clocks always runs at the
system rate and is also constrained to the system rate.
The ce_2_392b7670_group consists of the blocks that operate at half the system rate, i.e., the
input register and the up sampler. Every block in the group is driven by the clock enable
net named ce2_sysgen. The constraints that define the group are the following:
# ce_2_392b7670_group and inner group constraint
Net "ce_2_sg_x0*" TNM_NET = "ce_2_392b7670_group";
TIMESPEC "TS_ce_2_392b7670_group_to_ce_2_392b7670_group" = FROM
"ce_2_392b7670_group" TO "ce_2_392b7670_group" 20.0 ns;
Note: A wildcard character is added to the net name to constrain any additional copies of this net
that may be generated when clock enable logic is replicated. The maximum fanout of a clock enable
net can be controlled in the synthesis tool.
The ce_3_392b7670_group operates at one third the system rate. It contains the down
sampler and the output register, and is defined in a similar manner to the ce2_group.
# ce_3_392b7670_group and inner group constraint
Net "ce_3_sg_x0*" TNM_NET = "ce_3_392b7670_group";
Group to group constraints establish relative speeds. Here are the constraints that relate
the speeds of ce_2_392b7670_group and ce_3_392b7670_group:
# Group-to-group constraints
TIMESPEC "TS_ce_2_392b7670_group_to_ce_3_392b7670_group" = FROM
"ce_2_392b7670_group" TO "ce_3_392b7670_group" 20.0 ns;
TIMESPEC "TS_ce_3_392b7670_group_to_ce_2_392b7670_group" = FROM
"ce_3_392b7670_group" TO "ce_2_392b7670_group" 20.0 ns;
Port timing requirements can be set in the parameter dialog boxes for gateways. These
requirements are translated into port constraints such as those shown below. In this
example, the 3-bit din input is constrained to operate at its gateway's sample rate
(corresponding to a period of 20 ns). The "FAST" attributes indicate the ports should be
implemented using hardware that reduces delay. The reduction comes at a cost of
increased noise and power consumption.
# Offset in constraints
NET "din(0)" OFFSET = IN : 20.0 : BEFORE "clk";
NET "din(0)" FAST;
NET "din(1)" OFFSET = IN : 20.0 : BEFORE "clk";
NET "din(1)" FAST;
NET "din(2)" OFFSET = IN : 20.0 : BEFORE "clk";
NET "din(2)" FAST;
Selecting Specify IOB Location Constraints for a gateway allows port locations to be
specified. The locations must be entered as a cell array of strings in the box labeled IOB
Pad Locations. Locations are package-specific; in this example a Virtex-E 2000 in a FG680
package is used. The location constraints for the din bus are provided in the dialog box as
"{'D35', 'B36', 'C35' }". This is translated into constraints in the .xcf (or .ncf) file in the
following way:
# Loc constraints
NET "din(2)" LOC = "D35";
NET "din(1)" LOC = "B36";
NET "din(0)" LOC = "C35";
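The mapping from the cell array to LOC constraints can be sketched in Python. The ordering rule used here (first pad goes to the most significant bit) is inferred from this single example and should be treated as an assumption; loc_constraints is a hypothetical helper, not a System Generator function.

```python
def loc_constraints(port, pads):
    """Build one LOC constraint line per bit of a bus.

    Assumption (inferred from the example above): the first pad in the list
    is assigned to the most significant bit of the port.
    """
    msb = len(pads) - 1
    return [
        f'NET "{port}({msb - index})" LOC = "{pad}";'
        for index, pad in enumerate(pads)
    ]


lines = loc_constraints("din", ["D35", "B36", "C35"])
for line in lines:
    print(line)
```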
Clock Handling in HDL
Note: The clock wrapper exposes a port named ce. The port does nothing except to serve as a
companion to the clk port on the wrapper. The reason for having the port is to allow the clock wrapper
to be used as a black box in System Generator designs.
Core Caching
System Generator uses cores produced by Xilinx CORE Generator (coregen) to
implement parts of designs. Generating cores can be expensive, so System Generator
caches previously generated ones. Before coregen is called, System Generator looks in the
cache, and if the core has already been generated, System Generator reuses it.
By default, the cache is the directory $TEMP/sg_core_cache. And by default, System
Generator caches no more than 2,000 cores. When the limit is reached, System Generator
deletes cached cores to make room for new ones.
Note: Environment variables can be used to change the location of the cache and the cache size
limit. The variables are described below.
Environment Variable
Description
SGCORECACHE
SGCORECACHELIMIT
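The caching behavior described in this topic can be modeled as a bounded lookup table keyed by the core's parameters. The Python below is only a sketch: the class CoreCache and its generate callback are hypothetical, and evicting the oldest entry first is an assumption, since the text does not say which cached cores are deleted when the limit is reached.

```python
from collections import OrderedDict


class CoreCache:
    """Bounded cache of generated cores, keyed by their parameters."""

    def __init__(self, limit=2000):
        self.limit = limit
        self.generated = 0            # counts calls to the (expensive) generator
        self._cores = OrderedDict()

    def get(self, params, generate):
        key = tuple(sorted(params.items()))
        if key in self._cores:
            return self._cores[key]   # reuse the previously generated core
        core = generate(params)       # stands in for a call to coregen
        self.generated += 1
        self._cores[key] = core
        while len(self._cores) > self.limit:
            # Assumption: the oldest cached core is deleted first.
            self._cores.popitem(last=False)
        return core


cache = CoreCache(limit=2)
make = lambda params: "core for " + str(sorted(params.items()))
first = cache.get({"type": "fir", "taps": 8}, make)
again = cache.get({"type": "fir", "taps": 8}, make)   # served from the cache
```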
HDL Testbench
Ordinarily, System Generator designs are bit and cycle-accurate, so Simulink simulation
results exactly match those seen in hardware. There are, however, times when it is useful to
compare Simulink simulation results against those obtained from an HDL simulator. In
particular, this makes sense when the design contains black boxes. The Create Testbench
checkbox in the System Generator token makes this possible.
Suppose the design is named <design>, and a System Generator token is placed at the top
of the design. Suppose also that in the token the Compilation field is set to HDL Netlist,
and the Create Testbench checkbox is selected. When the Generate button is clicked,
System Generator produces the usual files for the design, and in addition writes the
following:
1.
2. Various .dat files that contain test vectors for use in an HDL testbench simulation.
3. Scripts vcom.do and vsim.do that can be used in ModelSim to compile and simulate
the testbench, comparing Simulink test vectors against those produced in HDL.
System Generator generates the .dat files by saving the values that pass through
gateways. In the HDL simulation, input values from the .dat files are stimuli, and output
values are expected results. The testbench is simply a wrapper that feeds the stimuli to the
HDL for the design, then compares HDL results against expected ones.
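The wrapper's job can be summarized in a short Python sketch: apply recorded stimuli, then compare each result with the recorded expectation. This is an illustration of the idea only, not the generated testbench; run_testbench and the stand-in design are hypothetical, and the vectors are made up for the example.

```python
def run_testbench(design, stimuli, expected):
    """Feed stimuli to the design and collect every mismatch with the expected results."""
    mismatches = []
    for cycle, (inputs, want) in enumerate(zip(stimuli, expected)):
        got = design(*inputs)
        if got != want:
            mismatches.append((cycle, want, got))
    return mismatches


# Stand-in for the HDL under test: a two-input maximum.
design = lambda x, y: x if x > y else y

stimuli = [(1, 2), (5, 3), (4, 4)]      # values recorded at the input gateways
expected = [2, 5, 4]                    # values recorded at the output gateway

mismatches = run_testbench(design, stimuli, expected)
```

An empty mismatch list corresponds to an HDL simulation that agrees with the Simulink results on every recorded sample.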
Example 1 Simple Selector shows how to implement a function that returns the
maximum value of its inputs;
Example 5 Passing Parameters into the MCode Block shows how to pass parameters
into a MCode block;
Example 6 Optional Input Ports shows how to implement optional input ports on an
MCode block;
Example 7 Finite State Machines shows how to implement a finite state machine;
Example 9 FIR Example and System Verification shows how to model FIR blocks and
how to do system verification;
Example 10 RPN Calculator shows how to model an RPN calculator, which is a stack machine;
Example 11 Example of disp Function shows how to use the disp function to print
variable values.
Simple Selector
This example is a simple controller for a data path, which assigns the maximum value of
two inputs to the output. The M-function is specified as follows and is saved in an M-file, xlmax.m:
function z = xlmax(x, y)
if x > y
z = x;
else
z = y;
end
The xlmax.m file should be either saved in the same directory of the model file or should
be in the MATLAB path. Once the xlmax.m has been saved to the appropriate place, you
should drag a MCode block into your model, open the block parameter dialog box, and
enter xlmax into the MATLAB Function field. After clicking the OK button, the block has
two input ports x and y, and one output port z.
The following figure shows what the block looks like after the model is compiled. You can
see that the block calculates and sets the necessary fixed-point data type to the output port.
% container types.
%
% You must use a xfix() to specify type, number of bits, and
% binary point position to convert floating point values to
% Xilinx fixed-point constants or variables.
% By default, the xfix call uses xlTruncate
% and xlWrap for quantization and overflow modes.
% const1 is Ufix_8_3
const1 = xfix({xlUnsigned, 8, 3}, 1.53);
% const2 is Fix_10_4
const2 = xfix({xlSigned, 10, 4, xlRound, xlWrap}, 5.687);
z1 = a + const1;
z2 = -b - const2;
z3 = z1 - z2;
% convert z3 to Fix_12_8 with saturation for overflow
z3 = xfix({xlSigned, 12, 8, xlTruncate, xlSaturate}, z3);
% z4 is true if both inputs are positive
z4 = a>const1 & b>-1;
This M-function uses addition and subtraction operators. The MCode block calculates
these operations in full precision, which means the output precision is sufficient to carry
out the operation without losing information.
One thing worth discussing is the xfix function call. The function requires two
arguments: the first for fixed-point data type precision and the second indicating the value.
The precision is specified in a cell array. The first element of the precision cell array is the
type value. It can be one of three different types: xlUnsigned, xlSigned, or xlBoolean.
The second element is the number of bits of the fixed-point number. The third is the binary
point position. If the type is xlBoolean, there is no need to specify the number of bits
and binary point position. The number of bits and binary point position must be specified
as a pair. The fourth element is the quantization mode and the fifth element is the overflow
mode. The quantization mode can be one of xlTruncate, xlRound, or xlRoundBanker.
The overflow mode can be one of xlWrap, xlSaturate, or xlThrowOverflow.
Quantization mode and overflow mode must be specified as a pair. If the quantization-overflow
mode pair is not specified, the xfix function uses xlTruncate and xlWrap for
signed and unsigned numbers. The second argument of the xfix function can be either a
double or a Xilinx fixed-point number. If a constant is an integer number, there is no need
to use the xfix function. The MCode block converts it to the appropriate fixed-point
number automatically.
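The effect of the precision cell array can be explored outside System Generator with a small Python model. This is a sketch and not the xfix implementation: the function quantize and its string mode names are hypothetical, only the truncate, round, wrap, and saturate modes are modeled, and the handling of exact rounding ties is an assumption.

```python
import math


def quantize(signed, nbits, binpt, value, quant="truncate", overflow="wrap"):
    """Return the value representable in the given fixed-point type, as a float."""
    scaled = value * (1 << binpt)
    if quant == "truncate":
        raw = math.floor(scaled)            # discard the bits below the binary point
    elif quant == "round":
        raw = math.floor(scaled + 0.5)      # assumption: ties round upward
    else:
        raise ValueError("unsupported quantization mode: " + quant)

    low = -(1 << (nbits - 1)) if signed else 0
    high = (1 << (nbits - 1)) - 1 if signed else (1 << nbits) - 1
    if overflow == "saturate":
        raw = max(low, min(high, raw))
    elif overflow == "wrap":
        raw = (raw - low) % (1 << nbits) + low
    else:
        raise ValueError("unsupported overflow mode: " + overflow)
    return raw / (1 << binpt)


# The constants and operations of the M-function above, for inputs a = 2, b = 3.
a, b = 2, 3
const1 = quantize(False, 8, 3, 1.53)                      # UFix_8_3
const2 = quantize(True, 10, 4, 5.687, quant="round")      # Fix_10_4
z1 = a + const1
z2 = -b - const2
z3 = quantize(True, 12, 8, z1 - z2, overflow="saturate")  # Fix_12_8, saturating
```

In this model 1.53 truncates to 1.5 and 5.687 rounds to 5.6875. The difference z1 - z2 is 12.1875, which exceeds the largest Fix_12_8 value, so z3 saturates to 7.99609375.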
After setting the dialog box parameter MATLAB Function to xlSimpleArith, the block
shows two input ports a and b, and four output ports z1, z2, z3, and z4.
M-functions using Xilinx data types and functions can be tested in the MATLAB command
window. For example, if you type: [z1, z2, z3, z4] = xlSimpleArith(2, 3) in
the MATLAB command window, you'll get the following lines:
UFix(9, 3): 3.500000
Fix(12, 4): -8.687500
Fix(12, 8): 7.996094
Bool: true
Notice that the two integer arguments (2 and 3) are converted to fixed-point numbers
automatically. If you have a floating-point number as an argument, an xfix call is
required.
Two delay blocks are added after the MCode block. By selecting the option Implement
using behavioral HDL on the Delay blocks, the downstream logic synthesis tool is able to
perform the appropriate optimizations to achieve higher performance.
Shift Operations
This example shows how to implement bit-shift operations using the MCode block. Shift
operations are accomplished with multiplication and division by powers of two. For
example, multiplying by 4 is equivalent to a 2-bit left-shift, and dividing by 8 is equivalent
to a 3-bit right-shift. Shift operations are implemented by moving the binary point position
and if necessary, expanding the bit width. Consequently, multiplying a Fix_8_4 number by
4 results in a Fix_8_2 number, and multiplying a Fix_8_4 number by 64 results in a
Fix_10_0 number.
The following shows the xlsimpleshift.m file which specifies one left-shift and one
right-shift:
function [lsh3, rsh2] = xlsimpleshift(din)
% [lsh3, rsh2] = xlsimpleshift(din) does a left shift
% 3 bits and a right shift 2 bits.
% The shift operation is accomplished by
% multiplication and division of power
% of two constant.
lsh3 = din * 8;
rsh2 = din / 4;
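The type arithmetic described for shifts can be written down as a small Python helper. This is a sketch of the rule stated in the text, not MCode block behavior taken from the tool: shift_type is a hypothetical name, and for right shifts the model only moves the binary point.

```python
def shift_type(nbits, binpt, shift):
    """Return (nbits, binpt) after a shift by a power of two.

    shift > 0 models multiplication by 2**shift (left shift);
    shift < 0 models division by 2**(-shift) (right shift).
    """
    new_binpt = binpt - shift
    if new_binpt < 0:
        # The binary point cannot move past the least significant bit,
        # so the word grows by the missing number of bits.
        nbits += -new_binpt
        new_binpt = 0
    return nbits, new_binpt


times_4 = shift_type(8, 4, 2)     # Fix_8_4 * 4
times_64 = shift_type(8, 4, 6)    # Fix_8_4 * 64
```

The two results correspond to the Fix_8_2 and Fix_10_0 types quoted in the text.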
The following diagram shows a subsystem containing two MCode blocks that use M-function
xl_sconvert. The arguments nbits and binpt of the M-function are specified
differently for each block by passing different parameters to the MCode blocks. The
parameters passed to the MCode block labeled signed convert 1 cause it to convert
the input data from type Fix_16_8 to Fix_10_5 at its output. The parameters passed to
the MCode block labeled signed convert 2 cause it to convert the input data from type
Fix_16_8 to Fix_8_4 at its output.
To pass parameters to each MCode block in the diagram above, you can click the Edit
Interface button on the block GUI then set the values for the M-function arguments. The
mask for MCode block signed convert 1 is shown below:
The above interface window sets the M-function argument nbits to be 10 and binpt to
be 5. The mask for the MCode block signed convert 2 is shown below:
The above interface window sets the M-function argument nbits to be 8 and binpt
to be 4.
The following diagram shows a subsystem containing two MCode blocks that use M-function xl_m_addsub.
The Block Interface Editor of the MCode block labeled add is shown below.
As a result, the add block features two input ports a and b; it performs full precision
addition. Input parameter sub of the MCode block labeled addsub is not bound with any
value. Consequently, the addsub block features three input ports: a, b, and sub; it
performs full precision addition or subtraction based on the value of input port sub.
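The relationship between bound arguments and block ports can be sketched in Python: every function argument that is bound to a value disappears from the port list, and every unbound argument becomes an input port. The code is an illustration only; make_block and this addsub function are hypothetical stand-ins, since the body of xl_m_addsub is not reproduced here.

```python
def addsub(a, b, sub):
    """Stand-in for the M-function: subtract when sub is true, otherwise add."""
    return a - b if sub else a + b


def make_block(func, arg_names, bindings):
    """Return (ports, block); arguments found in bindings do not become ports."""
    ports = [name for name in arg_names if name not in bindings]

    def block(*port_values):
        if len(port_values) != len(ports):
            raise TypeError("expected inputs for ports " + ", ".join(ports))
        args = dict(zip(ports, port_values))
        args.update(bindings)             # bound arguments act as constants
        return func(**args)

    return ports, block


arg_names = ["a", "b", "sub"]
add_ports, add_block = make_block(addsub, arg_names, {"sub": False})
addsub_ports, addsub_block = make_block(addsub, arg_names, {})
```

Here add_ports contains only a and b, while addsub_ports also contains sub, mirroring the two blocks described above.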
The M-function that is used by the MCode block contains a transition function, which
computes the next state based on the current state and the current input. Unlike example 3
though, the M-function in this example defines persistent state variables to store the state
of the finite state machine in the MCode block. The following M-code, which defines
function detect1011_w_state is contained in file detect1011_w_state.m:
function matched = detect1011_w_state(din)
% This is the detect1011 function with states for detecting a
% pattern of 1011.
seen_none = 0; % initial state; no part of the sequence seen
seen_1 = 1; % seen 1
seen_10 = 2; % seen 10
seen_101 = 3; % seen 101
state = seen_10;
end
case seen_10 % seen 10
if din==1
state = seen_101;
else
% no part of sequence seen, go to seen_none
state = seen_none;
end
case seen_101
if din==1
state = seen_1;
matched = true;
else
state = seen_10;
matched = false;
end
end
The following diagram shows a state machine subsystem containing an MCode block after
compilation; the MCode block uses M-function detect1011_w_state.
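For readers who want to trace the state transitions without a simulator, the detector can be modeled in Python. This is a sketch and not a translation of the complete M-file: the transitions for the seen_none and seen_1 states are not visible in the listing above and are filled in here as assumptions, and detect1011 is a hypothetical name.

```python
SEEN_NONE, SEEN_1, SEEN_10, SEEN_101 = range(4)


def detect1011(bits):
    """Return one matched flag per input bit for the pattern 1011."""
    state = SEEN_NONE
    flags = []
    for din in bits:
        matched = False
        if state == SEEN_NONE:            # assumed transition
            state = SEEN_1 if din == 1 else SEEN_NONE
        elif state == SEEN_1:             # assumed transition
            state = SEEN_1 if din == 1 else SEEN_10
        elif state == SEEN_10:
            state = SEEN_101 if din == 1 else SEEN_NONE
        else:                             # SEEN_101
            if din == 1:
                state = SEEN_1            # the final 1 may start a new match
                matched = True
            else:
                state = SEEN_10
        flags.append(matched)
    return flags


flags = detect1011([1, 0, 1, 1, 0, 1, 1])
```

Because the machine returns to seen_1 after a match, overlapping occurrences are detected: the input above produces a match on the fourth and on the seventh bit.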
Parameterizable Accumulator
This example shows how to use the MCode block to build an accumulator using persistent
state variables and parameters to provide implementation flexibility. The following M-code,
which defines function xl_accum, is contained in file xl_accum.m:
function q = xl_accum(b, rst, load, en, nbits, ov, op, ...
feed_back_down_scale)
% q = xl_accum(b, rst, load, en, nbits, ov, op, feed_back_down_scale) is
% equivalent to our Accumulator block.
binpt = xl_binpt(b);
init = 0;
precision = {xlSigned, nbits, binpt, xlTruncate, ov};
persistent s, s = xl_state(init, precision);
q = s;
if rst
if load
% reset from the input port
s = b;
else
% reset from zero
s = init;
end
else
if ~en
% if not enabled, hold the current state
else
% if enabled, update the state
if op==0
s = s/feed_back_down_scale + b;
else
s = s/feed_back_down_scale - b;
end
end
end
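The control flow of the accumulator (output the stored state, then reset, load, hold, or update) can be traced with a plain Python model. This is a sketch only: it ignores the fixed-point precision and overflow handling that xl_state provides, and the class name and keyword arguments are hypothetical.

```python
class AccumulatorModel:
    """Registered accumulator: each step returns the state held before the update."""

    def __init__(self, init=0, feed_back_down_scale=1):
        self.init = init
        self.scale = feed_back_down_scale
        self.state = init

    def step(self, b, rst=False, load=False, en=True, subtract=False):
        q = self.state                       # output is the registered value
        if rst:
            # Reset either from the input port or to the initial value.
            self.state = b if load else self.init
        elif en:
            fed_back = self.state / self.scale
            self.state = fed_back - b if subtract else fed_back + b
        # When neither rst nor en is asserted, the state is held.
        return q


acc = AccumulatorModel()
outputs = [acc.step(1), acc.step(2), acc.step(3)]
before_reset = acc.step(0, rst=True)
after_reset = acc.step(5)
```

The outputs lag the inputs by one step because q is read before the state is updated, matching the order of the statements in the M-function.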
The following diagram shows a subsystem containing the accumulator MCode block using
M-function xl_accum. The MCode block is labeled MCode Accumulator. The
subsystem also contains the Xilinx Accumulator block, labeled Accumulator, for
comparison purposes. The MCode block provides the same functionality as the Xilinx
Accumulator block; however, its mask interface differs in that parameters of the MCode
block are specified with a cell array in the Function Parameter Bindings parameter.
Optional inputs rst and load of block Accum_MCode1 are disabled in the cell array of the
Function Parameter Bindings parameter. The block mask for block MCode Accumulator is
shown below:
The example contains two additional accumulator subsystems with MCode blocks using
the same M-function, but different parameter settings to accomplish different accumulator
implementations.
The model contains two FIR blocks. Both are modeled with the MCode block and both are
synthesizable. The following are the two functions that model those two blocks.
function y = simple_fir(x, lat, coefs, len, c_nbits, c_binpt, o_nbits, ...
o_binpt)
coef_prec = {xlSigned, c_nbits, c_binpt, xlRound, xlWrap};
out_prec = {xlSigned, o_nbits, o_binpt};
coefs_xfix = xfix(coef_prec, coefs);
persistent coef_vec, coef_vec = xl_state(coefs_xfix, coef_prec);
persistent x_line, x_line = xl_state(zeros(1, len-1), x);
persistent p, p = xl_state(zeros(1, lat), out_prec, lat);
sum = x * coef_vec(0);
for idx = 1:len-1
sum = sum + x_line(idx-1) * coef_vec(idx);
sum = xfix(out_prec, sum);
end
y = p.back;
p.push_front_pop_back(sum);
x_line.push_front_pop_back(x);
function y = fir_transpose(x, lat, coefs, len, c_nbits, c_binpt, ...
o_nbits, o_binpt)
coef_prec = {xlSigned, c_nbits, c_binpt, xlRound, xlWrap};
out_prec = {xlSigned, o_nbits, o_binpt};
coefs_xfix = xfix(coef_prec, coefs);
persistent coef_vec, coef_vec = xl_state(coefs_xfix, coef_prec);
persistent reg_line, reg_line = xl_state(zeros(1, len), out_prec);
if lat <= 0
error('latency must be at least 1');
end
lat = lat - 1;
persistent dly,
if lat <= 0
y = reg_line.back;
else
dly = xl_state(zeros(1, lat), out_prec, lat);
y = dly.back;
dly.push_front_pop_back(reg_line.back);
end
for idx = len-1:-1:1
reg_line(idx) = reg_line(idx - 1) + coef_vec(len - idx - 1) * x;
end
reg_line(0) = coef_vec(len - 1) * x;
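The two structures can be compared with a plain Python model before any hardware is involved. This is a sketch rather than a translation of the M-functions: it uses integer arithmetic, ignores fixed-point quantization, fixes the latency at one cycle for both filters, and the function names direct_fir and transposed_fir are hypothetical.

```python
def direct_fir(samples, coefs):
    """Direct form: a delay line of past inputs, summed each cycle, one cycle of latency."""
    line = [0] * (len(coefs) - 1)      # past inputs, newest first
    pending = 0                        # output register
    out = []
    for x in samples:
        out.append(pending)
        acc = x * coefs[0]
        for k, past in enumerate(line, start=1):
            acc += past * coefs[k]
        pending = acc
        line = ([x] + line)[: len(coefs) - 1]
    return out


def transposed_fir(samples, coefs):
    """Transposed form: a chain of partial-sum registers, one cycle of latency."""
    n = len(coefs)
    regs = [0] * n
    out = []
    for x in samples:
        out.append(regs[-1])           # read the last register before updating
        for i in range(n - 1, 0, -1):
            regs[i] = regs[i - 1] + coefs[n - 1 - i] * x
        regs[0] = coefs[n - 1] * x
    return out


coefs = [3, 2, 1]
impulse = [1, 0, 0, 0, 0]
direct = direct_fir(impulse, coefs)
transposed = transposed_fir(impulse, coefs)
```

Both models return the coefficients delayed by one cycle for an impulse input, so the two forms produce identical output streams in this idealized setting.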
To verify that the two blocks are functionally equivalent, another MCode
block compares their outputs. If the two outputs are not equal at any given
time, the error checking block reports the error. The following function does the error
checking:
function eq = error_ne(a, b, report, mod)
persistent cnt, cnt = xl_state(0, {xlUnsigned, 16, 0});
switch mod
case 1
eq = a==b;
case 2
eq = isnan(a) || isnan(b) || a == b;
case 3
eq = ~isnan(a) && ~isnan(b) && a == b;
otherwise
eq = false;
error(['wrong value of mode ', num2str(mod)]);
end
if report
if ~eq
error(['two inputs are not equal at time ', num2str(cnt)]);
end
end
cnt = cnt + 1;
RPN Calculator
This example shows how to use the MCode block to model an RPN calculator, which is a
stack machine. The block is synthesizable:
OP_ADD = 2;
OP_SUB = 3;
OP_MULT = 4;
OP_NEG = 5;
OP_DROP = 6;
q = acc;
active = acc_active;
if rst
acc = 0;
acc_active = false;
stack_pt = 0;
elseif en
if ~is_oper
% enter data, push
if acc_active
stack_pt = xfix(stack_pt_prec, stack_pt + 1);
mem(stack_pt) = acc;
stack_active = true;
else
acc_active = true;
end
acc = din;
else
if op == OP_NEG
% unary op, no stack op
acc = -acc;
elseif stack_active
b = mem(stack_pt);
switch double(op)
case OP_ADD
acc = acc + b;
case OP_SUB
acc = b - acc ;
case OP_MULT
acc = acc * b;
case OP_DROP
acc = b;
end
stack_pt = stack_pt - 1;
elseif acc_active
acc_active = false;
acc = 0;
end
end
end
stack_active = stack_pt ~= 0;
Here are the lines that are displayed on the MATLAB console for the first simulation step.
mcode_block_disp/MCode (Simulink time: 0.000000, FPGA clock: 0)
Hello World!
num2str(dly) is [0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
0.000000, 0.000000, 0.000000]
disp(dly) is
type: Fix_11_7,
maxlen: 8,
length: 8,
0: binary 0000.0000000, double 0.000000,
1: binary 0000.0000000, double 0.000000,
2: binary 0000.0000000, double 0.000000,
3: binary 0000.0000000, double 0.000000,
4: binary 0000.0000000, double 0.000000,
5: binary 0000.0000000, double 0.000000,
6: binary 0000.0000000, double 0.000000,
7: binary 0000.0000000, double 0.000000,
disp(rom) is
type: Fix_11_7,
maxlen: 4,
length: 4,
0: binary 0011.0000000, double 3.0,
1: binary 0010.0000000, double 2.0,
2: binary 0001.0000000, double 1.0,
3: binary 0000.0000000, double 0.0,
a = 0.000000, b = 0.000000, x = 0.000000
1
disp(10) is
type: UFix_4_0, binary: 1010, double: 10.0
disp(-10) is
type: Fix_5_0, binary: 10110, double: -10.0
disp(a) is
type: Fix_11_7, binary: 0000.0000000, double: 0.000000
disp(a == b)
type: Bool, binary: 1, double: 1
Enables you to perform certain design iterations between Project Navigator and the
System Generator design
A Step-by-Step Example
In this example, two HDL netlists from System Generator are integrated into a larger
VHDL design. Design #1 is named SPRAM and design #2 is named MAC_FIR. The top-level VHDL entity combines the two data ports and a control signal from the SPRAM
design to create a bidirectional bus. The top-level VHDL also instantiates the MAC_FIR
design and supplies a separate clock signal named clk2. A block diagram of this design is
shown below.
The files used in this example are located in the System Generator tree at pathname
<ISE_Design_Suite_tree>/sysgen/examples/projnav/mult_diff_designs.
The following files are provided:
1. Open the first design, spram.mdl, in MATLAB. This is a multirate design due to the
down sampling block placed after the output of the Single Port RAM.
2. Double click on the System Generator token; select the HDL Netlist target and press
the Generate button. By pressing the Generate button, the HDL file for this design is
created in the directory
<ISE_Design_Suite_tree>/sysgen/examples/projnav/mult_diff_designs/hdl_netlist1.
3. Repeat steps 1 and 2 for the mac_fir.mdl model. The HDL file for this design is
created in the directory
<ISE_Design_Suite_tree>/sysgen/examples/projnav/mult_diff_designs/hdl_netlist2.
Note: You are now finished generating HDL Netlists from System Generator.
1. Launch ISE and reload the pre-generated top-level design ISE project at
~top_level/top_level.ise.
Note: At this point, your Project Navigator should look like the figure below. Both spram_cw and
mac_fir_cw instances are instantiated at the top_level design. But since they are not located on the
same directory as the top-level design, Project Navigator puts a question mark next to each one of
them to indicate that it can not find these two instances / modules.
2. Add the System Generator source: under the Sources tab, right-click on
u_spram_cw -> Add Source at
<ISE_Design_Suite_tree>/sysgen/examples/projnav/mult_diff_designs/hdl_netlist1/spram_cw.sgp
3.
4. As shown below, make sure the file top_level is selected, then implement the design
by double clicking on Implement Design in the Processes window. Once the
implementation is finished, Project Navigator should look like the figure below.
5. Examine the timing constraints in the Place and Route Report that is located in the
Detailed Reports section of the Design Summary pane.
Note that in the PAR report the multirate constraints were met:
Constraints for each System Generator design were created and translated to a UCF (User
Constraint File). These UCF constraint files were then consolidated and associated during
ISE implementation (NGDBUILD). They are briefly described as follows:
A system sample period of 100 ns was set in the System Generator token for both designs
(1 & 2)
The down sampling block in the SPRAM design performs a down sample by 2. The
ce2_f488215c_group_to_ce2_f488215c_group2 constraint is for all the synchronous
elements after the down sampler and is set to twice the system sample period (4)
With the new integration between System Generator and Project Navigator, these
constraints are automatically associated and consolidated by Project Navigator up to the
top-level design. This flow is only available starting with Release 10.1.
System Generator creates VHDL files and invokes the selected logic synthesis tool to
generate the HDL Netlist. These VHDL files are used when simulating the top-level
design. The VHDL files generated for a design are named <design>_cw.vhd, and
<design>.vhd. Open the custom ModelSim do file named top_level_testbench.do
to see how the VHDL files for both designs are referenced.
Memory initialization (.mif) and coefficient (.coe) files that are used during
simulation must be placed in the same directory as the top-level VHDL file. For this
example, the mif files are copied from both hdl_netlist1 and hdl_netlist2 subfolders by the following statement in the ModelSim do file (top_level_testbench.do):
foreach i [glob ../hdl_netlist1/*.mif] {
    file copy -force $i .
}
In a case where there are also coefficient files, you can add a similar statement to the do
file to copy the files up to the top-level VHDL file.
2.
Change the Design View to Simulation. Select the top_level_testbench - structural (top_level.vhd) source file. This file is imported into the project as a
testbench file, thus allowing you to simulate the design using the Simulator.
3.
In the Processes View, right-click on Simulate Behavioral Model and select Process
Properties. You should see a Simulation Properties dialog box as shown below. Note
that a Custom Do File has been specified.
The previous screen shot shows the ModelSim commands used to compile the VHDL code
generated by System Generator. To simulate the top_level design, double-click on the
Simulate Behavioral Model process. The ModelSim .do file compiles the VHDL code and
runs the simulation for 10000 ns. The resulting waveform is shown below.
Summary
This topic has shown you how to import a System Generator Design into a larger system.
There are a few important things to keep in mind during each phase of the process.
While creating a System Generator design:
IOB constraints should not be specified on Gateways in the System Generator model;
neither should the System Generator token specify a clock pin location.
Use the HDL Netlist compilation target in the System Generator token. The HDL
Netlist file that System Generator produces contains the RTL, EDIF, and
constraint information for your design.
Create a custom ModelSim .do file in order to compile the VHDL files created by
System Generator. Modify the Project Navigator settings to use this custom .do file.
New capabilities:
Add System Generator Source type project file (.sgp) into Project Navigator as a submodule design
Consolidate and associate System Generator constraints into the top-level design
Launch MATLAB and System Generator MDL directly from Project Navigator to
perform certain design iterations
1.
Launch Xilinx System Generator and from the MATLAB console navigate to the
System Generator examples directory at the following path:
<ISE tree>/sysgen/examples/shared_memory/hardware_cosim/conv5X5_video
2.
Open the file conv5X5_video_ex.mdl, then double click on the System Generator
Token.
3.
Note: Both the Synthesis and Implementation strategies are set to their defaults: PlanAhead Defaults
and ISE Defaults, respectively. These strategy options are only available when the Compilation setting
is either HDL Netlist or Bitstream and the Project type is PlanAhead. In other iterations, you can
choose a different strategy from the dropdown lists.
4.
Click Generate to generate the netlist files and the PlanAhead project file (PPR file).
5.
6.
Observe in the PlanAhead Project Summary pane that the Design strategies are the
same as those specified in the System Generator Token.
7.
8.
As shown below, right click on the Strategy name in the Project Settings dialog box and
select Save As.
9.
10. Return to the open conv5X5_video_ex.mdl design in the MATLAB environment and
double-click on the System Generator token.
11. Select PlanAhead in the Project type field and notice that the Synthesis strategy is now
automatically specified as custom_conv5X5_video.
Open a PlanAhead project and add a file named top_level.vhd as the structural
Top-level source file.
2.
3.
After the rgb2gray model is added, right-click on it in the PlanAhead Sources window
and select Open File. This will invoke MATLAB and open the System Generator
model.
4.
Simulate the model to make sure the output simulation results match those provided
as references in the top_level.vhd file.
5.
6.
Now you are ready to generate an HDL netlist from the System Generator model. Right-click on the rgb2grey.mdl file and select Generate Output Products...
Note: It will take approximately 3 to 5 minutes to generate. Once the netlist generation is
finished, you should see a hierarchical tree showing up under the rgb2grey model.
7.
8.
At this point, your Design Sources should look similar to the figure below:
Create a new PlanAhead project by selecting an RTL Project type and targeting the
XC7K325TFFG900-2 device. Click through the Add Sources, Add Existing IP, and Add
Constraints popup windows. You have now created an empty project with no
source files.
2.
Right-click on Design Sources > Add Sources... > Add or Create DSP Sources
3.
In the Add Sources window, select Add Files to add a System Generator
rgb2grey.mdl model located in the System Generator examples folder under
planahead/sysgen_import, then click Finish.
4.
5.
6. Now create a top-wrapper or top-level design for the rgb2gray_cw instance. Right-click on rgb2gray_cw (rgb2gray.mdl) and select Create Top HDL.
You should now see a top-wrapper Verilog file named rgb2gray_cw_stub being
created by PlanAhead. This feature is useful even if you are an HDL coder.
Drag a template block into the library. (Templates can be found in the Simulink library
browser under Simulink/Ports & Subsystems/Configurable Subsystem.)
In the template GUI, turn on each checkbox corresponding to a block that should be
an implementation.
As described above, create the library that defines the configurable subsystem.
Drag a copy of the template block from the library to the appropriate part of the
design.
Right-click on the instance, and under Block choice select the block that should be
used as the underlying implementation for the instance.
Double-click on the template, and turn off the checkbox associated with the block to be
deleted.
If necessary, update the choice for each instance of the configurable subsystem.
Double click on the template, and turn on the checkbox next to the added block.
If necessary, update the choice for each instance of the configurable subsystem.
Select one of the blocks in the library, and double click to open it. (Aside from the
template any block will do, provided the block is itself a subsystem. If there is no such
subsystem in the library, it is not possible to use a configurable subsystem manager.)
Drag a manager block into the subsystem opened above. (The manager block can be
found in Xilinx Blockset/Tools/Configurable Subsystem Manager).
Double click to open the GUI on the manager, then select the block that should be
used for hardware generation in the configurable subsystem.
The more complex IP blocks in a System Generator design like FIR Compiler and FFT
are generated by CORE Generator under the hood. They are provided as highly optimized
NGC netlists to the synthesis tool and the implementation tools, so further
optimization may not be possible.
System Generator netlisting produces HDL code with many instantiated primitives
such as registers, BRAMs, and DSP48s. There is not much a synthesis tool can do to
optimize these elements.
The following tips focus on what you can do in System Generator to increase the
performance of your design before you start the implementation process.
Review the Hardware Notes Included with Each Block Dialog Box
Pay close attention to the Hardware Notes included in the block dialog boxes. Many blocks
in the Xilinx Blockset library have notes that explain how to achieve the most hardware
efficient implementation. For example, the notes point out that the Scale block costs
nothing in hardware. By contrast, the Shift block (which is sometimes used for the same
purpose) can use hardware.
and before Gateway Out blocks. Selecting any of the Register block features adds
hardware.
Double registering the I/Os may also be beneficial. This can be performed by instantiating
two separate Register blocks, or by instantiating two Delay blocks, each having latency 1.
This allows one of the registers to be packed into the IOB and the other to be placed next to
the logic in the FPGA fabric. A Delay block with latency 2 does not give the same result
because the block with a latency of 2 is implemented using an SRL16 and cannot be packed
into an IOB.
As shown below, the Convert block can be pipelined with embedded register stages to
guarantee maximum performance.
To achieve a more efficient implementation on some Xilinx blocks, you can select the
Implement using behavioral HDL option. As shown below, if the delay on a Delay block
is 32 or greater, Xilinx synthesis infers an SRLC32E (32-bit Shift-Register) which maps into a
single LUT.
For BRAMs, use the internal output register. You do this by setting the latency from 1 (the
default) to 2. This enables the BRAM output register.
When you are using DSP48s, use the input, output and internal registers; for FIFOs, use the
embedded registers option. Also, check all the high-level IP blocks for pipelining options.
Register Duplication: on
If you are using the ISE Project Navigator flow, these MAP options are also on by default.
However, if you are using a System Generator flow like Bitstream, you must turn on these
MAP options by modifying the bitstream OPT file or by providing your own OPT file. See
the topic XFLOW Option Files for more information.
Compiling Your IP
Before you can simulate your design, you must compile your IP (cores) libraries with
ModelSim.
ModelSim SE
There are multiple ways to compile your IP libraries. Complete instructions for running
compxlib can be found in the chapter titled COMPXLIB in the Command Line Tools User
Guide.
From the Windows command line you can compile the necessary HDL libraries using the
compxlib program. For example, the following command can be used to compile all the
HDL libraries with ModelSim SE:
compxlib -s mti_se -f all -l all
on your PC in the Model Tech Simulator edit box. You must include the name of the
executable file in this field.
The Project Navigator project is already set up to run simulations at four different stages of
implementation. System Generator creates four different ModelSim .do files when the
Create Testbench option is selected on the System Generator token. The ModelSim do files
created by System Generator are:
pn_postmap.do - to run a simulation after your design has been mapped. This file
also includes a back-annotated simulation on the post-mapped design.
pn_postpar.do - to run a simulation after your design has been placed and routed.
This file also includes a back-annotated simulation step.
In the Project Navigator Design Simulation view, you can use the pull-down menu to
select Behavioral Simulation, Post-Translate Simulation, Post-Map Simulation, or
Post-Route Simulation (corresponding to pn_behavioral.do, pn_posttranslate.do,
pn_postmap.do, and pn_postpar.do respectively).
If you select the <your design>_tb.vhd/.v file in the Project Navigator Design
Simulation view, the ModelSim Simulator will become available in the Processes view.
Expand the ModelSim Simulator process by clicking on the plus button to the left of it. A
simulation process associated with the ModelSim Simulator will appear (in the image
below the process is labeled Simulate Behavioral Model).
The Process Properties dialog box shows that the System Generator .do file is already
associated as a custom file for this process.
Now if you double-click on the simulation process, the ModelSim console opens, and the
associated custom do file is used to compile and run your System Generator testbench. The
testbench uses the same input stimuli that were generated in Simulink, and compares the
HDL simulation results with the Simulink results. Provided that your design is
error-free, ModelSim reports that the simulation finished without errors.
my_project_cw - structural. The Processes window shows the processes that can
be run on the top-level HDL module.
In the Processes window, if you right-click on Generate Programming File and select Run,
you are instructing Project Navigator to run through whatever processes are necessary to
produce a programming file (FPGA bitstream) from the selected HDL source. In the
messages console window, you see that Project Navigator is synthesizing, translating,
mapping, routing, and generating a bitstream for your design.
Now that you have generated a bitstream for your design, you have access to all the files
that were produced on the way to bitstream creation.
The effect of the ce_clr signal cannot be simulated using the original System Generator
design. To model this behavior within Simulink, follow the steps below:
1.
Select Provide clock enable clear pin and NGC Netlist Compilation option on the
System Generator token.
2.
3.
Run the following command from the MATLAB console to produce the post-translate
VHDL netlist. To generate a Verilog netlist instead, use -ofmt verilog with netgen:
>> !netgen -ofmt vhdl ./<target_directory>/<design_name>_cw.ngc
4.
Bring in the post-translate VHDL/Verilog file as a Black Box within Simulink and use
HDL co-simulation to model the effect of asserting the ce_clr signal on your design.
Table 1-1:
                                          Behavior after ce_clr is de-asserted
Block Name                                Synchronized to ce after ce_clr      Synchronized
                                          deasserted and the next ce pulse     to ce_clr
                                          (1 sample cycle delay)
Down Sampler with Last Value of frame     Yes                                  N/A
Down Sampler with First Value of frame    No                                   No
Up Sampler with copy samples              Yes                                  N/A
Up Sampler with zeros inserted            No                                   Yes
Time Division Multiplexer                 No                                   Yes
Time Division Demultiplexer               No                                   Yes
Parallel to Serial                        No                                   Yes
Serial to Parallel                        No                                   Yes
Table 1-1: (continued)
                                          Behavior after ce_clr is de-asserted
Block Name                                Synchronized to ce after ce_clr      Synchronized
                                          deasserted and the next ce pulse     to ce_clr
                                          (1 sample cycle delay)
Addressable Shift Register (ASR)          No                                   No
Polyphase FIR                             Yes                                  No
Based on the above analysis, the ce_clr signal can be used if the following
recommendations are adhered to:
Replace down sampler blocks with first value of frame behavior with an equivalent
circuit using down sampler block with last value of frame selected.
Design for N clock cycles of invalid data after ce_clr is de-asserted, where N is the
slowest ce associated with the block.
Design the model to always use down sampler with last value of frame and up
sampler with copy samples.
If N cycles of invalid data are not acceptable, replace the parallel to serial, serial to parallel, time
division multiplexer and time division demultiplexer blocks with an equivalent circuit
built out of counter, mux and up/down sampler blocks. The equivalent design
circuit should also have a reset port pulled to the top-level and connected to the same
signal driving the ce_clr port.
Always verify the effect of the ce_clr signal on the design by importing and simulating
the post-translate HDL model as a black box.
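The difference between the two down sampler behaviors named in the recommendations above can be illustrated with a purely behavioral Python sketch. It models only which sample of each frame is kept; it does not model ce_clr, latency, or hardware timing, and the function name is an assumption made for illustration.

```python
def down_sample(samples, rate, mode="last"):
    """Keep one sample per frame of `rate` input samples."""
    if mode not in ("first", "last"):
        raise ValueError("mode must be 'first' or 'last'")
    # Position inside each frame of the sample that is kept.
    offset = 0 if mode == "first" else rate - 1
    # Only complete frames produce an output sample.
    n_frames = len(samples) // rate
    return [samples[f * rate + offset] for f in range(n_frames)]


frame_data = [1, 2, 3, 4, 5, 6]
print(down_sample(frame_data, 2, "first"))
print(down_sample(frame_data, 2, "last"))
```

Both modes produce the same output rate; they differ only in which input sample of the frame reaches the output.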
The DSP48 combines an 18-bit by 18-bit signed multiplier with a 48-bit adder and a
programmable mux to select the adder's inputs. It implements the basic operation
"p=a*b+(c+cin);"; however, other operations can be selected dynamically. Optional input and
multiplier pipeline registers are also included and must be used to achieve maximum
speed. Also included with the DSP48 are high performance local interconnects between
adjacent DSP48 blocks (BCIN-BCOUT and PCIN-PCOUT). The DSP48 also includes
support for symmetric rounding. This combination of features enables DSP systems which
use the higher-speed DSP48 devices to be clocked at over 500 MHz.
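The basic operation can be sketched in Python as follows. This is an illustrative integer model only: the function names are hypothetical, pipeline registers and opmode selection are ignored, and wrapping the result to 48 bits is an assumption based on the 48-bit adder width stated above.

```python
def wrap_signed(value, bits):
    """Wrap an integer into the two's-complement range of `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def dsp48_basic(a, b, c, cin=0):
    """Model p = a*b + (c + cin) with 18-bit signed a, b and a 48-bit result."""
    for name, operand in (("a", a), ("b", b)):
        if not -(1 << 17) <= operand < (1 << 17):
            raise ValueError(f"{name} does not fit in 18 signed bits")
    return wrap_signed(a * b + (c + cin), 48)


print(dsp48_basic(3, -4, 10, 1))
```

For example, a = 3, b = -4, c = 10 and cin = 1 give 3 * (-4) + (10 + 1) = -1.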
There are three ways to program a DSP48 in System Generator:
Use Standard Components - Map designs to Mult and AddSub blocks or use higher-level
IP such as the MACFIR filter generator blocks. This approach is useful if the
design uses a lower-speed clock and the mapping to DSP48s is not required.
Use Synthesizable Blocks - Structure the design to map onto the DSP48's internal
architecture and compose the design from synthesizable Mult, AddSub, Mux and
Delay blocks. This approach relies on logic synthesis to infer DSP48 blocks where
appropriate. This approach gives the compiler the most freedom and can often
achieve full-rate performance.
Use DSP48 Blocks - Use System Generator's DSP48 and DSP48 Macro blocks to
directly implement DSP48-based designs. This is the highest performance design
technique. Be aware however that obtaining maximum performance and minimum
area for designs using DSP48s may require careful mapping of the target algorithm to
the DSP48's internal architecture, as well as the physical planning of the design.
To obtain the best possible performance, you should set the multiplier latency to 3 and
include an input register to cover the delay from the DSP48's output to the adder. In
Virtex-4, unlike Spartan-3 devices, the multiply speed is nearly independent of bit
width. For medium speed designs, this approach works fine.
An additional way to use the DSP48 is to use IP blocks optimized for the DSP48 such as the
MACFIR block available from coregen, or to use the architecture wizard to generate a
custom configured DSP48. Both of these approaches require importing the logic
containing the DSP48 as a black box into System Generator. Simulation will require
ModelSim HDL cosim.
If the design is composed of synthesizable blocks, both Synplify Pro and XST have
demonstrated the ability to infer DSP48s and to make use of the DSP48's local interconnect
buses (PCOUT-PCIN and BCOUT-BCIN). In the above example, three blocks have been
built using the MCode blocks which are defined by the following M-functions.
function o = xlsynmux2(i0, i1, sel)
  % 2:1 multiplexer: o is i0 when sel is 0, otherwise i1
  if (sel == 0) o = i0; else o = i1; end

function p = xlsynmult(a, b)
  % Multiplier
  p = a * b;

function s = xlsynadd(a, b)
  % Adder
  s = a + b;
For synthesis to work, the circuit must be mappable to the DSP48 and signal bitwidths
must be less than the equivalent buses in the DSP48.
You should keep in mind that the logic synthesis tools are rapidly evolving and that
inferring DSP48 configurations is more of an art than a science. This means that some
mappable designs may not be mapped efficiently, or that the mapping results may not be
consistent. It will be necessary to inspect the post-synthesis netlist using a tool similar to
Synplify Pro's gate-level technology viewer to determine if the design is being correctly
mapped. If not, it may be possible to recast it to be correctly inferred. A model of a fully
synthesizable FIR filter is located at the following pathname in the System Generator
software tree:
.../sysgen/examples/dsp48/synth_fir/synth_fir_tb.mdl
The DSP48 block is effectively a wrapper for the DSP48 UNISIM primitive. Because of this,
any possible DSP48 design can be implemented. This low-level implementation however
requires an 11-bit binary opmode to be routed to the DSP48's control ports in order to
configure its function. The Constant block has a special mode enabling it to generate a
DSP48 control field. The DSP48's parameters dialog box is used to configure the pipelining
mode of the DSP48 as well as the use of the DSP48's local interconnect buses named
PCOUT-PCIN and BCOUT-BCIN. You can try out the DSP48 block by opening the
Simulink model that is located at the following pathname in the System Generator software
tree:
.../sysgen/examples/dsp48/dsp48_primitive.mdl
method of generating this type of control pattern is to use a mux to select the DSP48
instruction on a clock by clock basis.
The above example illustrates the use of a DSP48 and Constant blocks to implement a 35-bit
by 35-bit multiplier over 4 clock cycles. During synthesis, the mux and constant logic is
reduced by logic optimization. In the example above, the DSP48 block and the 4:1 mux are
reduced to just two 4-LUTs. A Simulink model that illustrates how to implement both
parallel and sequential 35*35-bit multipliers using dynamic operation for the sequential
mode of operation is located at the following pathname in the System Generator software
tree:
.../sysgen/examples/dsp48/mult35x35/mult35x35_tb.mdl
The DSP48 Macro block is a wrapper for the DSP48 block which makes it simple to
implement a sequence of DSP48 instructions (known as dynamic instructions). In addition,
it provides support for specifying input and output types. For example, in the model
above, a DSP48 Macro block is configured to implement a complex multiplier using a
sequence of four different instructions. The instructions are entered in a text window in the
DSP48 Macro's dialog menu. You can try out the DSP48 Macro block by opening the
Simulink model that is located at the following pathname in the System Generator software
tree:
.../sysgen/examples/dsp48/dsp48_macro.mdl
Make sure that input and output pipeline register selections between the old and the
new block are the same. You can do this by examining and comparing the Pipeline
Options settings.
2.
If there is more than one unique input operand required, you must provide MUX
circuits as shown in the figure below.
3.
Ensure that the new design provides the same functional correctness and quality of
results as the old version. This can be accomplished by performing a quick
Simulink simulation and implementing the design.
4.
When configuring and specifying a pre-adder mode using the DSP48 Macro 2.0 block
in System Generator, certain design parameters such as the data width of the input operands are
device dependent. Refer to the LogiCORE IP DSP48 Macro v2.0 Product Specification
for details on all the parameters of this LogiCORE IP.
sel    A inputs    B inputs    Opmode
0      alo         blo         A*B
1      alo         bhi         A*B+P>>17
2      ahi         blo         A*B+P
3      ahi         bhi         A*B+P>>17
You can find the above complete model at the following pathname:
<sysgen_path>/examples/dsp48/mult35x35/dsp48macro_mult35x35.mdl
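The four-instruction sequence tabulated above can be checked with a small integer model in Python. This is a sketch under stated assumptions, not a bit-accurate DSP48 model: each 35-bit signed operand is assumed to be split into a signed upper part (ahi, bhi) and an unsigned lower 17 bits (alo, blo), the shift is an arithmetic shift, and the low result bits are assumed to be captured after the first and third instructions. The function names are hypothetical.

```python
MASK17 = (1 << 17) - 1


def split35(x):
    """Split a 35-bit signed value into (signed high part, unsigned low 17 bits)."""
    if not -(1 << 34) <= x < (1 << 34):
        raise ValueError("operand does not fit in 35 signed bits")
    return x >> 17, x & MASK17


def mult35x35(a, b):
    """Multiply two 35-bit signed values using four multiply-add steps."""
    ahi, alo = split35(a)
    bhi, blo = split35(b)
    p = alo * blo                  # step 1: A*B
    low0 = p & MASK17              # product bits 16..0
    p = alo * bhi + (p >> 17)      # step 2: A*B + P>>17
    p = ahi * blo + p              # step 3: A*B + P
    low1 = p & MASK17              # product bits 33..17
    p = ahi * bhi + (p >> 17)      # step 4: A*B + P>>17
    return (p << 34) + (low1 << 17) + low0


print(mult35x35(123456789, -987654321) == 123456789 * -987654321)
```

The model returns the same value as a direct multiplication for any pair of 35-bit signed operands, which is a convenient way to sanity-check the ordering of the partial products.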
1.
Always use DSP48, BRAM16, FIFO16 with input, mult and output registers
2.
3.
Plan out the usage of the PCOUT-PCIN bus to allow DSP48 chaining
4.
5.
6.
7.
Limit LUTs to 1 level or a 4:1 MUX and ensure a local register for input or output
8.
Use RAMs, SRL16 to clock out control patterns instead of state machines
9.
Use DSP48 to implement counters and adders greater than 8-16 bits
C-Input Sharing
Each pair of DSP48s shares a single C input. You should be aware of this when you do
resource planning. Since the placer will not always find the optimal placement to
share C inputs, designs should avoid using the DSP48 C inputs if possible.
Placement
Most designs will benefit from some placement of DSP48 and BRAMs. Use of area
constraints to constrain LUT fabric logic placement may also be beneficial.
Use the command map -timing with effort level high for both map and place.
Use trce -v 100 to get a good sense of the failing nets and inspect the
xflow/design.twr file to understand the nature of the design's timing.
Synthesis Flow
Use Synplify Pro with retiming and pipelining enabled to avoid having to manually
pipeline every LUT and signal.
Use Synplify Pro with the fanout limit set around 32 to avoid long net delays.
Open compiled projects in Synplify Pro and inspect the generated logic using the
RTL- and Gate-level views to get a good idea of what logic is being generated.
The file syn.pl is available in the examples/dsp48 directory. Place this file in
<ISE_Design_Suite_tree>/sysgen/scripts directory to modify the synthesis
options in System Generator
Only one net can be allowed in a critical path at 450 MHz. This allows a 4:1 mux to a
register, a 4-input LUT to a register, or a net through a LUT directly to a DSP48.
Counters up to 16 bits can be used, but do not use count-limited counters without
additional pipelining.
If accumulators or counters are used, invert the enable line to an active-low condition
to prevent an extra LUT from being inserted in the critical path.
Any adders must have local input registers. It may be necessary to place control
counters in the DSP48 to ensure speed.
Fanout Planning
Avoid fanouts of more than 32 LUTs or 8 DSP48s or BRAMs. Larger fanouts can be avoided by
inserting additional pipeline registers in these signal paths.
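One way to estimate how many register levels a high-fanout signal needs is sketched below in Python. It assumes a balanced tree of pipeline registers in which no register drives more than the stated limit; the tree structure and the function name are assumptions made for illustration, since the guideline above only gives the fanout limits.

```python
def pipeline_levels(loads, max_fanout=32):
    """Register levels needed so that no register drives more than max_fanout loads."""
    if loads < 1 or max_fanout < 2:
        raise ValueError("loads must be >= 1 and max_fanout must be >= 2")
    levels, reach = 1, max_fanout
    # Each extra level multiplies the number of loads that can be reached.
    while reach < loads:
        reach *= max_fanout
        levels += 1
    return levels


print(pipeline_levels(100))                # LUT loads, limit of 32
print(pipeline_levels(20, max_fanout=8))   # DSP48 or BRAM loads, limit of 8
```

A result of 1 means the source register can drive all loads directly; each level beyond the first adds one clock cycle of latency that the rest of the design must absorb.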
Register Retiming
Check retiming on delay blocks to allow them to be used as registers for pipelining. Then
use Synplify Pro or XST with retiming enabled to allow the synthesis tool to move registers
into optimal positions.
Although a single MAC engine FIR filter is used for this example, we strongly recommend
that you look at the DSP Reference Library provided as a part of the Xilinx Reference
Blockset. The DSP Reference Library consists of multi-MAC as well as multi-channel
implementation examples with variations on the type of memory used.
A demo included in the System Generator demos library also shows an efficient way to
implement a MAC-based interpolation filter. To see the demo, type the following in the
MATLAB command window:
>> demo blockset xilinx
then select FIR filtering: Polyphase 1:8 filter using SRL16Es from the list of demo designs.
Design Overview
This design uses the random number source block from the DSP Blockset library to drive
two different implementations of a FIR filter:
The first filter is the one that could be implemented in a Xilinx device. It is a fixed-point FIR filter implemented with a dual-port Block memory and a single multiply-accumulator.
The frequency response of each filter is then plotted in a transfer function scope.
2.
Open the design model by typing mac_df2t from your MATLAB command window.
For the purpose of this tutorial, the variables coef, coef_width, coef_binpt,
data_width, data_binpt and Fs are not defined. You will first use these variables as
mask parameters to the MAC Based FIR block and then design and assign the filter
coefficients using the FDATool. The fully functional model is available in the current
directory and is called mac_df2t_soln.mdl.
1.
Right-click on the MAC-Based FIR block and select Edit Mask as shown in the figure
below.
2.
Double-click on the Parameters tab and add the parameters coef, data_width and
data_binpt as shown below.
Drag and drop the FDATool block into your model from the DSP Xilinx Blockset
Library.
2.
Double-click on the FDATool block and enter the following specifications in the Filter
Design & Analysis Tool for a low-pass filter designed to eliminate high-frequency
noise in audio systems:
Frequency Specifications
- Units: Hz
- Fs: 44100
- Fpass: 6000
- Fstop: 7725
Magnitude Specifications
- Units: dB
- Apass: 1
- Astop: 48
3.
Click on Design Filter at the bottom of the tool window to find out the filter order and
observe the magnitude response.
You can also view the phase response, impulse response, coefficients and more by
selecting the appropriate icon at the top-right of the GUI. Based on the FDATool, a 43-tap
FIR filter (order 0-42) is required in order to meet the design specifications listed
above.
These useful functions help you find the maximum and minimum coefficient value in
order to adequately specify the coefficient width and binary point:
>> max(xlfda_numerator('FDATool'))
>> min(xlfda_numerator('FDATool'))
For this tutorial, the coefficient type has been set to be Fix_12_12, which is a 12-bit
number with the binary point to the left of the twelfth bit. The result of the max()
function above shows that the largest coefficient is 0.3022, which means that the binary
point may be positioned to the left of the most significant bit. How do you reason that?
A Fix_12_12 number has a range of -0.5 to 0.4998, meaning the dynamic range is
maximized by putting the binary point left of the most significant bit. If you moved the
binary point to the right (by using a Fix_12_11 number) you would lose one bit of
dynamic range because a Fix_12_11 number has a range of -1 to 0.9995, which is more
than you require to represent the coefficients.
4.
Click on the Reference Filter block and the MAC Based FIR block and verify the
parameter values for coef, coef_width, coef_binpt, data_width, data_binpt and Fs as
shown below.
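The Fix_12_12 and Fix_12_11 ranges quoted in the coefficient discussion above can be reproduced with a short Python sketch. The helper name is hypothetical; it assumes a signed two's-complement format Fix_<width>_<binpt> where binpt is the number of fractional bits.

```python
def fix_range(width, binpt):
    """Return (lowest, highest, lsb) for a signed Fix_<width>_<binpt> number."""
    lsb = 2.0 ** -binpt
    lowest = -(1 << (width - 1)) * lsb
    highest = ((1 << (width - 1)) - 1) * lsb
    return lowest, highest, lsb


for width, binpt in ((12, 12), (12, 11)):
    lowest, highest, lsb = fix_range(width, binpt)
    print(f"Fix_{width}_{binpt}: {lowest} to {highest:.4f}, step {lsb}")
```

Because the largest coefficient reported in the tutorial (0.3022) is below the Fix_12_12 upper limit, the finer step of Fix_12_12 can be used without overflow.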
At this point, the MAC filter is set up for a 10-bit signed input data (Fix_10_8), a 12-bit
signed coefficient (Fix_12_12), and 43 taps. All these parameters can be modified directly
from the MAC block GUI. The coefficients and data need to be stored in a memory system.
For the tutorial, you choose to use a dual-port memory to store the data and coefficients,
with the data being captured and read out using a circular RAM buffer. The RAM is used
in a mixed-mode configuration: values are written and read from port A (RAM mode), and
the coefficients are only read from port B (ROM mode).
The multiplier is set up to use the embedded multiplier resource available in Xilinx
Virtex devices as well as three levels of latency in order to achieve the fastest performance
possible. The precision required for the multiplier and the accumulator is a function of the
filter taps (coefficients) and the number of taps. Since these are fixed at design time, it is
possible to tailor the hardware resources to the filter specification. The accumulator need
only have sufficient precision to accumulate maximal input against the filter taps, which is
calculated as follows:
acc_nbits = ceil(log2(sum(abs(coef*2^coef_width_bp)))) + data_width + 1;
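The same calculation can be tried outside MATLAB with the Python sketch below. It assumes that coef_width_bp in the expression above denotes the coefficient binary point (number of fractional bits); the coefficient values are made up for illustration and are not the tutorial's filter.

```python
import math


def acc_nbits(coef, coef_binpt, data_width):
    """Accumulator width: bits for the scaled coefficient sum, plus data width, plus 1."""
    scaled_sum = sum(abs(c) * 2 ** coef_binpt for c in coef)
    return math.ceil(math.log2(scaled_sum)) + data_width + 1


# Hypothetical 3-tap filter with 12 fractional coefficient bits and 10-bit data
print(acc_nbits([0.25, 0.5, 0.25], coef_binpt=12, data_width=10))
```

For these example values the scaled coefficient magnitudes sum to 4096, so the result is 12 + 10 + 1 = 23 bits.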
Upon reset, the accumulator re-initializes to its current input value rather than zero, which
allows the MAC engine to stream data without stalling. A capture register is required for
streaming operation since the MAC engine reloads its accumulator with an incoming
sample after computing the last partial product for an output sample.
Finally, a downsampler reduces the capture register sample period to the output sample
period. The block is configured with latency to obtain the most efficient hardware
implementation. The downsampling rate is equal to the coefficient array length.
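The overall behavior of a single-MAC FIR engine with a circular data buffer can be sketched in Python as follows. This is a simplified behavioral model with hypothetical names: it performs one multiply-accumulate per simulated clock and yields one output per len(coef) clocks, but it clears the accumulator between outputs instead of reloading it with the incoming product, and it ignores block latency.

```python
def mac_fir(samples, coef):
    """Single-MAC FIR: len(coef) multiply-accumulates per input sample."""
    n_taps = len(coef)
    buffer = [0] * n_taps            # circular data buffer (RAM contents)
    write_addr = 0
    outputs = []
    for x in samples:
        buffer[write_addr] = x       # newest sample overwrites the oldest
        acc = 0
        for k in range(n_taps):      # one MAC operation per clock cycle
            acc += coef[k] * buffer[(write_addr - k) % n_taps]
        outputs.append(acc)          # captured once per n_taps clocks
        write_addr = (write_addr + 1) % n_taps
    return outputs


print(mac_fir([1, 0, 0, 2], [1, 2, 3]))
```

Feeding an impulse returns the coefficients in order, which is a quick way to confirm the buffer addressing.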
System Generator gets its input sample period from the din Gateway In block which
has 1/Fs specified as the data input sample period. As the MAC-based FIR filter is
over-sampled according to the number of taps, the System Clock Period will always be
equal to 1/(Filter Taps * Fs).
2.
Double click on the System Generator token and change the Simulink system period to
specify the System Clock Period as 5.273427e-007 = 1/(43 * 44100) as shown below.
3.
Run the simulation again and notice that the Xilinx implementation of the MAC-based
FIR filter meets the original filter specifications and that its frequency response is
almost identical to the double precision Simulink models.
The filter passband response as well as the zeros can clearly be seen. You should get
frequency responses similar to those shown in the following figure.
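The System Clock Period entered in the steps above follows directly from the number of taps and the sample rate. A minimal Python sketch of that calculation, using a hypothetical function name:

```python
def system_clock_period(n_taps, fs_hz):
    """System clock period for a MAC FIR that is over-sampled by the number of taps."""
    return 1.0 / (n_taps * fs_hz)


period = system_clock_period(43, 44100)
print(f"{period:.6e} s")
```

With 43 taps and Fs = 44100 Hz this reproduces the 5.273427e-007 value used in the System Generator token.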
It is possible to increase or decrease the precision of the Xilinx Filter in order to reach the
perfect area/performance/quality trade off required by your design specifications.
Stop the simulation and modify the coefficient width to FIX_10_10 and the data width to
FIX_8_6 from the block GUI. Update the model (Ctrl-d) and push into the MAC engine
block. You should now notice that the datapath has been automatically updated to only
eighteen bits on the output of the multiplier and twenty on the output of the accumulator.
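The eighteen-bit multiplier output is consistent with a full-precision product, whose width is the sum of the two operand widths. A minimal Python sketch, assuming full-precision multiplication and using a hypothetical helper name:

```python
def full_precision_product(width_a, bp_a, width_b, bp_b):
    """Return (width, binary_point) of a full-precision fixed-point product."""
    return width_a + width_b, bp_a + bp_b

# FIX_10_10 coefficients multiplied by FIX_8_6 data.
width, binary_point = full_precision_product(10, 10, 8, 6)
print(f"FIX_{width}_{binary_point}")
```

The accumulator width depends on the actual coefficient values as well, so it cannot be derived from the operand widths alone.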
Restart the simulation and observe how the frequency response has been affected. The
attenuation has degraded (to less than 40 dB) due to fixed-word-length effects.
Using hierarchy to partition a System Generator model into two or more clock
domains;
Wiring multiple clock domains together using the Xilinx Multiple Subsystem
Generator block.
A step-by-step example is provided to help clarify the topics listed above. Although the
example uses two clocks, the concepts presented here can be extended so that System
Generator designs requiring any number of clock sources can be constructed using similar
techniques.
Before continuing with the example, you may want to familiarize yourself with standard
System Generator clocking terminology and implementation methodologies. This
information is covered in-depth in the topic Timing and Clocking. In general, System
Generator designs are driven by a single, system clock source. Multirate design portions
are handled using clock enables derived from the system clock source. It is possible,
however, to use System Generator to implement designs that are driven by distinct clock
sources.
Broadly speaking, the approach is the following:
Divide the design into several subsystems, each of which is to be driven by a different
clock. In this example, these subsystems are called asynchronous clock islands. Xilinx shared
memory blocks should be used as bridges that communicate between these clock islands.
Once the design is partitioned, the Xilinx Multiple Subsystem Generator block may be
used to translate the design into hardware that uses multiple distinct clock sources.
requirements and would only be employed if the sample rate were very fast. An
alternative approach is to clock the FIR filter at the sample rate, creating one sample per
cycle. This scenario takes an intermediate amount of hardware and would be used for
intermediate sample rates. If the sample rate is slow, the FIR filter may be clocked at a rate
several times faster than the sample rate, perhaps by means of a DCM that multiplies the
sample-rate clock. In this way the multiplier-accumulator units of the FIR filter may be
reused several times during the calculation of each sample output, requiring the least
amount of hardware. This last method would use a symbol-rate clock domain, a high-speed processing clock domain, and a sample-rate clock domain.
A good FPGA design practice is to have each resource in the FPGA device operating at the
highest possible rate to optimize hardware usage. In general, it is best to use a single clock
domain when possible and to use clock enables to gate slower circuitry, creating multicycle
paths. The drawback to this technique is that it increases power consumption and may
make it difficult to route the high-speed clock enable. As a result, separate domains for
high-speed processing are preferable in some instances. Also, it may not be possible to
avoid dealing with different clock domains when dealing with asynchronous data inputs
and outputs.
The physical clock lines are abstracted away from the block diagram;
Because the domains are well-defined, System Generator can accurately produce
timing constraints for the synchronous islands.
The abstraction level of System Generator reduces the risk that users will make some
of the more common design errors. These include:
Gated Clocks: because the clocks in System Generator are inferred during hardware
generation, it is not possible to connect non-clock lines to clock inputs (i.e., gated
clocks).
Inferred Latches: latches will not be generated from System Generator designs.
Shared Memory
When these shared memory blocks are used to cross clock domains, each set should be
split into a matched pair.
The To FIFO block is put in the domain in which it is to be written. The From FIFO is put
in the domain in which it is to be read. The two blocks are linked by the name of the
Shared memory name parameter. The FIFO is implemented in hardware using the Xilinx
FIFO Generator core. The FIFO pair is the safest and easiest to use of the three block types
that cross domains, and is best suited to high-bandwidth, sequential data transfers.
A pair of Shared Memory blocks is implemented as an embedded Xilinx dual-port block
RAM core. The two blocks are linked by the name of the shared memory object. Each
member of the pair resides in a different domain. Because the RAM is a true dual-port,
each domain may write to the RAM. Care must be taken, by means of semaphores or other
logic, to ensure that two writes or a read and a write to the same address do not happen
simultaneously. For example, if domain A writes to a memory location at the same time
that domain B is reading from it, the data read may not be valid. The shared memory is
implemented using the Xilinx Dual Port Block Memory core to ensure that large memories
are efficiently mapped across multiple BRAMs.
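The access rule described above can be expressed as a small Python check. This is an illustrative sketch only: the function name and the tuple layout are hypothetical, and both ports are treated as if they were sampled at the same instant, which is a simplification when the two clocks are asynchronous.

```python
def accesses_collide(port_a, port_b):
    """Report whether two simultaneous port accesses are unsafe.

    Each port is a tuple (address, is_write), or None when the port is idle.
    Two writes, or a read and a write, to the same address are unsafe.
    Two reads of the same address are fine.
    """
    if port_a is None or port_b is None:
        return False
    addr_a, write_a = port_a
    addr_b, write_b = port_b
    return addr_a == addr_b and (write_a or write_b)

# Domain A writes address 5 while domain B reads address 5: unsafe.
print(accesses_collide((5, True), (5, False)))
```

In hardware, the semaphore or arbitration logic mentioned above has to guarantee that this condition never occurs.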
The To Register is put in the domain in which it is to be written, and the From
Register in the domain from which it is to be read. The two blocks are linked by the
name of the shared memory. The To Register may also be read synchronously in its own
domain. The register may be of variable width and will synthesize as flip-flops. A 1-bit
To/From Register pair will synthesize as a single flop.
Note: Crossing domains in this manner can be unsafe, and requires the use of metastability-reducing synchronization flops and semaphores for multiple-bit transfers. This technique should only
be used when the hardware pitfalls are well-understood.
Note: The Multiple Subsystem Generator block does not support designs that include an EDK
Processor block.
The diagram below illustrates the concept of putting domain-crossing blocks into their
own subsystem. When a multiple-domain design is netlisted, System Generator does the
following:
Creates an HDL file for Domain 0 (on the left), excluding the To FIFO block, and calls
the netlister to create a black-box netlist delivered as an NGC file;
Creates an HDL file for Domain 1 (on the right), excluding the From FIFO block, and
calls the netlister to create a black-box netlist delivered as an NGC file;
Invokes the Xilinx CORE Generator to produce a core for the FIFO block (middle);
Step-by-Step Example
This example shows how design hierarchy can be used to partition a System Generator
design into multiple asynchronous clock islands. The example also demonstrates how
Xilinx Shared Memory blocks may be used to communicate between these islands. Lastly,
the example describes how the Multiple Subsystem Generator block can be used to netlist
the complete multi-clock design.
1.
2.
Open the two_async_clks model from the MATLAB command window, and save it
into a temporary directory of your choosing.
Subsystem hierarchy is used in this example to partition the design into two clock
domains, referred to as domains A and B. Each domain is internally synchronous to a
single clock, but the two domains are asynchronous relative to each other. The design
includes two subsystems named ss_clk_domainA and ss_clk_domainB, which contain the
logic associated with clock domains A and B, respectively. All blocks inside the
ss_clk_domainA subsystem operate in clock domain A, while all blocks inside the
ss_clk_domainB subsystem operate in clock domain B.
The asynchronous islands in the example communicate with one another via a shared
memory interface implemented using a pair of Xilinx Shared Memory blocks. The two
Shared Memory blocks are distributed so that one block resides in domain
ss_clk_domainA and the other resides in domain ss_clk_domainB. Both blocks
specify the same shared memory object name, bram_iface. This allows the Shared
Memory blocks to access a common address space during simulation. Note that in the
diagram there is no physical connection shown between the two shared memory halves.
This is because the connection is implicitly defined by the fact that the two Shared Memory
blocks specify the same shared memory object name and therefore, share an address space.
When the two subsystems are wired together and translated into hardware, the shared
memory blocks are moved from their respective subsystems and merged into a block RAM
core. For more information on how this works, refer to the topic Multiple Subsystem
Generator.
The synchronous islands sample different input sources. Island ss_clk_domainA samples a
sinusoid input, while ss_clk_domainB samples a saw-tooth wave input. Each subsystem
writes its samples into opposite halves of the shared memory. Once an island has filled its
half of memory, it reads samples from the other island's half. You can simulate the design
to visualize the model's behavior.
3.
4.
Also shown in the output scope are the two clocks, clk_A and clk_B. At the default time
scale, it is difficult to distinguish the two. Zoom in to get a more detailed view.
Notice that clk_A and clk_B have different periods and are out of phase with one
another. Earlier, it was claimed that System Generator uses a single clock source per
design. In the scope, you clearly see two different clocks. How is this possible?
The answer is in the hierarchical construction of the design. All blocks are buried in at least
one level of hierarchy using subsystems. Because there is no System Generator token at the
top level, you can consider each subsystem as a completely separate System Generator
design (at least for the time being). In this model, you have effectively defined two clock
domains by giving the ss_clk_domainA and ss_clk_domainB subsystems different
Simulink system periods. This is allowed since you are treating these subsystems as
separate System Generator designs. The clock probes in the ss_clk_domainA and
ss_clk_domainB subsystems use the Simulink system periods in their respective
subsystems to determine their output, hence different system periods yield different
system clocks.
Now consider the clocks defined by the System Generator token in the ss_clk_domainA
and ss_clk_domainB subsystems.
5.
Open the System Generator token parameter dialog boxes inside the
ss_clk_domainA and ss_clk_domainB subsystems.
The System Generator token dialog box in the ss_clk_domainA subsystem defines an
FPGA clock period of 10 ns (i.e., a frequency of 100 MHz). To simplify the sample period
values in the model, the 10 ns clock is normalized to a Simulink system period value of 2
sec. In the ss_clk_domainB subsystem, an FPGA clock period of 15 ns (i.e., a frequency
of 66.7 MHz) is defined. Normalizing this clock period gives a Simulink system period
value of 3 sec.
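One way to arrive at normalized values such as 2 and 3 is to divide each FPGA clock period by the greatest common divisor of all periods. The Python sketch below shows this under that assumption; the tutorial does not state how the values were chosen, and the function name is hypothetical.

```python
from functools import reduce
from math import gcd

def normalize_periods(periods_ns):
    """Divide integer clock periods by their greatest common divisor."""
    common = reduce(gcd, periods_ns)
    return [p // common for p in periods_ns]

# 10 ns for domain A and 15 ns for domain B, as in the text.
normalized = normalize_periods([10, 15])
print(normalized)
```

What matters for simulation is that the ratio between the Simulink system periods matches the ratio between the FPGA clock periods.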
Because the two subsystems in this example implement multiple, synchronous, System
Generator domains, you will use the Multiple Subsystem Generator block to wire the
subsystems together into a single HDL top-level component that exposes two clock ports.
When the Multiple Subsystem Generator translates a design into hardware, it generates
each subsystem individually as an NGC netlist file. It also creates a top-level VHDL
component or Verilog module that instantiates the subsystem netlist files as black boxes,
and wires them together using shared memory cores as clock domain bridges.
You begin by using the Multiple Subsystem Generator block to netlist subsystems
ss_clk_domainA and ss_clk_domainB.
6.
Open the Multiple Subsystem Generator dialog box by double clicking on the Multiple
Subsystem Generator block included in the top-level of the two_async_clks model.
7.
Pick a suitable target directory inside the Multiple Subsystem Generator dialog box.
The default directory is netlist.
8.
Press the Generate button. You may leave the Part, Synthesis Tool, and Hardware
Description Language fields as they are.
Once the Multiple Subsystem Generator block is finished running, it will display a
message box indicating that generation is complete. It is worthwhile to take a look at the
generated results.
9.
There are several interesting things to notice about the port interface. First, the component
exposes two clock ports (shown in bold text). The two clock ports are named after the
subsystems from which they are derived (e.g., ss_clk_domaina), and are wired to their
respective subsystem NGC netlist files. Also note that the top-level ports of each
subsystem (e.g., din_a and dout_a) appear as top-level ports in the port interface.
The Multiple Subsystem Generator block does not generate circuitry (e.g., a DCM) to
generate multiple clock sources. You may modify the top-level HDL component to include
the circuitry, or instantiate the top-level HDL as a component in a separate wrapper that
includes the clocking circuitry.
Instantiate the System Generator top-level component along with other wrapper logic
(e.g., a DCM);
Create a new top-level port map which supersedes that from the System Generator
component.
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;
entity top_wrapper is
port (
clk : in std_logic;
din_a : in std_logic_vector(7 downto 0);
din_b : in std_logic_vector(7 downto 0);
dout_a : out std_logic_vector(7 downto 0);
dout_b : out std_logic_vector(7 downto 0)
);
end top_wrapper;
architecture structural of top_wrapper is
-------------------------------------
-- SysGen Model Component Declaration
-------------------------------------
component two_async_clks
port (
din_a: in std_logic_vector(7 downto 0);
din_b: in std_logic_vector(7 downto 0);
ss_clk_domaina_cw_ce: in std_logic := '1';
ss_clk_domaina_cw_clk: in std_logic;
ss_clk_domainb_cw_ce: in std_logic := '1';
ss_clk_domainb_cw_clk: in std_logic;
dout_a: out std_logic_vector(7 downto 0);
dout_b: out std_logic_vector(7 downto 0)
);
end component;
component bufg
port(i: in std_logic;
o: out std_logic);
end component;
-------------------------------------
-- DCM Component Declaration
-------------------------------------
component dcm
-- synopsys translate_off
generic (clkout_phase_shift : string := "fixed";
dll_frequency_mode : string := "low";
duty_cycle_correction : boolean := true;
clkdv_divide : real := 3.0;
clkfx_multiply : integer := 2;
clkfx_divide : integer := 1);
-- synopsys translate_on
port (clkin : in std_logic;
clkfb : in std_logic;
dssen : in std_logic;
psincdec : in std_logic;
psen : in std_logic;
psclk : in std_logic;
rst : in std_logic;
clk0 : out std_logic;
clk90 : out std_logic;
clk180 : out std_logic;
clk270 : out std_logic;
clk2x : out std_logic;
clk2x180 : out std_logic;
clkdv : out std_logic;
clkfx : out std_logic;
clkfx180 : out std_logic;
signal clk0unbuf : std_logic;
signal clk0buf : std_logic;
signal clkfxbuf : std_logic;
signal clk2xunbuf : std_logic;
signal clkfxunbuf : std_logic;
signal clkdvunbuf : std_logic;
signal clkdvbuf : std_logic;
signal ff1,ff2,ff3,ff4 : std_logic;
signal dcm_rst : std_logic;
signal intlock : std_logic;
-----------------------------------------------------------------------------
-- The top level instantiates the SysGen design, a DCM, and two BUFGs.
-- The DCM generates two clocks of different frequencies.
-- These two clocks are used to drive the two different clock domains
-- in the SysGen block.
-----------------------------------------------------------------------------
begin
dcm0: dcm
-- synopsys translate_off
generic map (dll_frequency_mode => frequency_mode,
clkdv_divide => clkdv_divide_generic,
clkfx_multiply => clkfx_multiply_generic,
clkfx_divide => clkfx_divide_generic)
-- synopsys translate_on
port map (clkin => clk,
clkfb => clk0buf,
dssen => '0',
psincdec => '0',
psen => '0',
psclk => '0',
rst => dcm_rst,
clk0 => clk0unbuf,
clk2x => clk2xunbuf,
2.
Open the chip_ex1.mdl model from the MATLAB console. This model represents a
simple usage model of a DDS Compiler block that will produce sine and cosine output
waveforms. Both sine and cosine output waveforms will later be connected to a
ChipScope block, enabling you to debug and verify the System Generator block by
probing and plotting the waveforms.
3.
The 8-bit Counter is used to trigger ChipScope. The most significant bit is extracted
with a slice block and can be used for a variety of purposes such as driving an LED on
the ML506 Platform for this exercise.
4.
The first plot represents the most significant bit of the 8-bit counter. The MSB
becomes 1 when the counter output is within the range of 128 through 255.
The third and fourth plots show the output sine and cosine, respectively.
5.
Integrate ChipScope into the Simulink model. The ChipScope block can be found in
the Simulink Library Browser in the Xilinx Blockset, under the Tools library. While
holding down the left mouse button, select the ChipScope block and drag it into the
open area in the lower-right corner of the Simulink model.
6.
Double click on the ChipScope block in order to set the following parameters:
Number of trigger ports: Multiple trigger ports allow a larger range of events to
be detected and can reduce the number of values that must be stored. Up to 16
trigger ports can be selected. In this example, only one is used.
Display settings for trigger port: For each trigger port, the number of match units
and the match type need to be set. The pulldown menu displays options for a
particular trigger port. For N ports, the display options for trigger port 0 to N-1
can be shown. In this example, there is one Trigger port named Trig0. This option
should therefore be set to 0.
Number of match units: Using multiple match units per trigger port increases the
flexibility of event detection. One to four match units can be used in conjunction
to test for a trigger event. In this example, this option should be set to 1 since you
are only checking for one condition (i.e., the 8-bit counter value). You will set the
trigger value at run-time in the ChipScope Pro Analyzer.
Match type: This option can be set to one of the following six types:
1. Basic: performs = or <> comparisons
2. Basic With Edges: in addition to the basic operations, high/low and low/high
transitions can also be detected
3. Extended: performs =, <>, >, <, <=, >= comparisons
4. Extended With Edges: in addition to the extended operations, high/low and
low/high transitions can also be detected
5. Range: performs =, <>, >, >=, <, <=, in range, not in range comparisons
6. Range With Edges: in addition to the range operations, high/low and low/high
transitions can also be detected
In this example, set the Match Type to Basic With Edges.
Number of data ports: Up to 256 bits can be captured per sample. This means that
the sum over all ports of the bits used per port must be less than or equal to 256.
System Generator propagates the data width automatically; therefore, only the
number of data ports needs to be specified. In this example, you want to view the
sine and cosine and trig_counter, hence you enter 3.
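The 256-bit capture limit described above can be checked with a few lines of Python before the ChipScope block is configured. The port widths below are hypothetical: the counter is 8 bits wide in this exercise, but the sine and cosine widths are assumed for illustration only.

```python
MAX_CAPTURE_BITS = 256  # limit on captured bits per sample, as stated above

def total_capture_bits(port_widths):
    """Sum the bit widths of all data ports."""
    return sum(port_widths.values())

def fits_capture_limit(port_widths):
    """True when the combined data port width does not exceed the limit."""
    return total_capture_bits(port_widths) <= MAX_CAPTURE_BITS

# Assumed widths for the three data ports used in this example.
example_ports = {"sine": 16, "cosine": 16, "trig_counter": 8}
print(total_capture_bits(example_ports), fits_capture_limit(example_ports))
```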
After parameterization the ChipScope GUI should look like the following:
7.
Note that the names of the ports on the ChipScope block are specified by names given
to the signals connected to the block, e.g. Sine and Cosine.
8.
Location Constraints
Now that the design is fully implemented and simulates correctly, the next step is to
prepare it for connection to the hardware target. Although it can work on any
hardware platform, the process is described for the ML506.
Two pins need to be locked down in this design: the LED pin and the clock pin.
LED Pin: Double click on the Gateway Out1 block, select Specify IOB Location
constraints and type in {'AE24'} (note the need for single quotes).
Clock Pin: Double click on the System Generator token, set the clock period to
10ns and the clock pin location to AH15.
If you are using a different board, the pin locations should be modified appropriately.
9.
Double click on the System Generator token and verify the parameter settings as
follows:
The Core Generator is automatically called to generate the Sine/Cosine table and
Counter netlists. ChipScope generator is called to create an Integrated Logic
Analyzer (ILA) core and an ICON core to communicate with the ChipScope Pro
software via the JTAG port.
Real-Time Debug
The next step is to run the design on the ML506 platform and view the probed outputs with
the ChipScope Pro Analyzer.
1.
Connect one end of the Parallel Cable IV or Platform USB cable to the General JTAG
connector (J1) on the ML506 board. Connect the other end to your computer.
2.
3.
4.
In the New Project window, right click on Device 4 > Configure > Select
New File. At this point, you need to locate the bitstream that was generated
in step 10 of the previous section (./bitstream/chip_cw.bit). After configuration,
you should see the following INFO message at the bottom of the ChipScope Analyzer
window: Found 1 Core Unit in the JTAG chain.
5.
In the New Project window, under Device 4, double click on Trigger Setup to
bring up the setup window but do not set it yet at this step.
In the New Project window, under Device 4 > Unit 0 MyILA0 (ILA), double click
on Bus Plot.
A Bus Plot window appears. Select cosine and sine in the Bus Selection section, and
then arm the trigger by clicking the
button. Since you have not yet set any trigger
conditions, values are captured immediately. Both the sine and cosine appear as
shown below. You can change the display option to represent the waveforms with
points, lines, or both.
6.
Setup Trigger
In the Trigger Setup window, replace the current X value with all 1s. A low-to-high
pulse is used for this trigger and can be generated manually by pushing the center PB
SW as shown below. ChipScope starts capturing data when it detects a low-to-high
pulse. Earlier, you set the buffer size to 1024 so that up to 1024 data points can be
captured and visualized in ChipScope.
This method of triggering is useful if you want full control over when the data is
captured. It is accomplished by connecting one of the PB switches to a single-shot
(Rising Edge Detector) circuit. The center PB switch (AJ6, SW14) is used for this
exercise.
1.
Select the File > Export option from within ChipScope Pro Analyzer. Select ASCII
format and choose Bus Plot Buses to export. Press the Export button and save the
file as sinecos.prn.
2.
Start MATLAB and change the current working directory to the location where you
saved sinecos.prn.
3.
You can plot the values using the MATLAB plot function.
2.
3.
Replace all the Simulink blocks, except for the input and output gateways, with the
JTAG HWCS block that you just generated.
4.
Add a Simulink Slider Gain block to attenuate phase inc/dec changes and your model
should look similar to the figure below:
Benefits
One of the main benefits of this feature is the ability to capture and examine System
Generator data in real time. The data can be captured directly from external IO pins via
non-memory-mapped IO, such as an analog-to-digital converter, without having to capture
the data into a Shared Memory or FIFO and then read it into Simulink. In this particular
example, ChipScope captures data at real-time rates when using Free Running HW
Co-simulation mode.
How to Iterate a Design between System Generator for DSP and ChipScope Pro Analyzer
In Simulink
Press play on the Simulink model to start hardware co-simulation to download the
bitstream
Start the ChipScope Pro Analyzer from the Windows Start menu
2.
From the pulldown menu: File > Open Project... and select
chipscope_ex2_chipscope.cpj
3.
From the pulldown menu: JTAG Chain > Xilinx Platform USB Cable... then press OK
4.
From the toolbar: Press the Trigger Now button (i.e. T! button)
Note: You should be able to observe the following output waveform from the Bus Plot
In Simulink
1.
Press Play and change the Slider Gain setting to change the frequency of the DDS
2.
Press the Trigger Now button again to capture the new Sine and Cosine waves that are
running at a different frequency
AXI Interface
Introduction
AMBA AXI4 (Advanced eXtensible Interface 4) is the fourth generation of the AMBA
interface defined and controlled by ARM, and has been adopted by Xilinx as the next-generation interconnect for FPGA designs. Xilinx and ARM worked closely to ensure that
the AXI4 specification addresses the needs of FPGAs.
AXI is an open interface standard that is widely used by many 3rd-party IP vendors since
it is public, royalty-free and an industry standard.
The AMBA AXI4 interface connections are point-to-point and come in three different
flavors: AXI4, AXI4-Lite and AXI4-Stream.
In the following documentation, AXI4 refers to the AXI4 memory map interface, and AXI4-Lite and AXI4-Stream each refer to their respective flavor of the AMBA AXI4 interface.
When referring to the collection of interfaces, the term AMBA AXI4 shall be used.
The purpose of this section is to provide an introduction to AMBA AXI4 and to draw
attention to AMBA AXI4 details with respect to System Generator. For more detailed
information on the AMBA AXI4 specification please refer to the Xilinx AMBA-AXI4
documents found in http://www.xilinx.com/ipcenter/axi4.htm.
Naming conventions
AXI4-Stream signals are named in the following manner:
<Role>_<ClassName>[_<BusName>]_[<ChannelName>]<SignalName>
For instance:
m_axis_tvalid
Here m denotes the Role (master), axis the ClassName (AXI4-Stream) and tvalid the
SignalName
s_axis_control_tdata
Here s denotes the Role (slave), axis the ClassName, control the BusName which
distinguishes between multiple instances of the same class on a particular IP, and tdata
the SignalName.
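The naming convention above can be illustrated with a small Python parser. This is a simplified sketch: the function name is hypothetical, the optional ChannelName prefix is not separated from the SignalName, and only the two role codes that appear in the examples (m and s) are recognized.

```python
ROLE_NAMES = {"m": "master", "s": "slave"}

def parse_axis_signal(name):
    """Split an AXI4-Stream signal name into role, class, bus, and signal."""
    parts = name.split("_")
    if len(parts) < 3:
        raise ValueError(f"not enough fields in signal name: {name!r}")
    role_code, class_name, *bus_parts, signal = parts
    if role_code not in ROLE_NAMES:
        raise ValueError(f"unknown role code: {role_code!r}")
    return {
        "role": ROLE_NAMES[role_code],
        "class": class_name,
        # The bus name is optional; None means it was omitted.
        "bus": "_".join(bus_parts) if bus_parts else None,
        "signal": signal,
    }

print(parse_axis_signal("m_axis_tvalid"))
print(parse_axis_signal("s_axis_control_tdata"))
```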
A transfer on any given channel occurs when both TREADY and TVALID are high in
the same cycle.
TVALID once asserted, may only be de-asserted after a transfer has completed
(TREADY is sampled high). Transfers may not be retracted or aborted.
Once TVALID is asserted, no other signals in the same channel (except TREADY) may
change value until the transfer completes (the cycle after TREADY is asserted).
TREADY may be asserted before, during or after the cycle in which TVALID is
asserted.
The assertion of TVALID may not be dependent on the value of TREADY. But the
assertion of TREADY may be dependent on the value of TVALID.
There must be no combinatorial paths between input and output signals on both
master and slave interfaces:
Applied to AXI4-Stream IP, this means that the TREADY slave output cannot be
combinatorially generated from the TVALID slave input. A slave that can
immediately accept data qualified by TVALID, should pre-assert its TREADY
signal until data is received. Alternatively TREADY can be registered and driven
the cycle following TVALID assertion.
Note that combinatorial paths between input and output signals are permitted
across separate AXI4-Stream channels. It is however a recommendation that
multiple channels belonging to the same interface (related group of channels that
operate together) should not have any combinatorial paths between input and
output signals.
For any given channel, all signals propagate from the source (typically master) to the
destination (typically slave) except for TREADY. Any other information-carrying or
control signals that need to propagate in the opposite direction must either be part of
a separate channel (back-channel with separate TREADY/TVALID handshake) or
be an out-of-band signal (no handshake). TREADY should not be used as a
mechanism to transfer opposite direction information from a slave to a master.
AXI4-Stream allows TREADY to be omitted which defaults its value to 1. This may
limit interoperability with IP that generates TREADY. It is possible to connect an
AXI4-Stream master with only forward flow control (TVALID only)
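Two of the handshake rules above, when a transfer occurs and when TVALID may be de-asserted, can be checked against recorded signal traces. The Python sketch below is illustrative only; the function names are hypothetical and each list holds one 0/1 value per clock cycle.

```python
def transfer_cycles(tvalid, tready):
    """Cycles in which a transfer occurs (TVALID and TREADY both high)."""
    return [c for c, (v, r) in enumerate(zip(tvalid, tready)) if v and r]

def tvalid_violations(tvalid, tready):
    """Cycles in which TVALID was dropped before its transfer completed."""
    bad = []
    for c in range(1, min(len(tvalid), len(tready))):
        pending = tvalid[c - 1] and not tready[c - 1]  # asserted, not accepted
        if pending and not tvalid[c]:
            bad.append(c)
    return bad

# Example traces: the master retracts TVALID in cycle 4 without a transfer.
tvalid = [1, 1, 0, 1, 0]
tready = [0, 1, 0, 0, 0]
print(transfer_cycles(tvalid, tready))
print(tvalid_violations(tvalid, tready))
```

A check of this kind can be applied to exported simulation data when debugging an AXI4-Stream connection.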
Port Groupings
Blocks that offer AXI4-Stream interfaces have their AXI4-Stream channels grouped together
and color coded. For example, on the DDS Compiler 5.0 block shown above, the top-most
input port, data_tready, and the top two output ports, data_tvalid and data_tdata, belong to
the same AXI4-Stream channel, as do phase_tready, phase_tvalid and phase_tdata.
Signals that are not part of any AXI4-Stream channel are given the same background color
as the block; rst is an example.
The breaking out of multi-channel TDATA does not add additional logic to the design and
is done in System Generator as a convenience to the users. The data in each broken out
TDATA port is also correctly byte-aligned.
Hardware/Software Co-Design
This chapter covers topics related to developing software and hardware in System
Generator.
Hardware/Software Co-Design
in System Generator
EDK Support
PicoBlaze Block
The PicoBlaze block provides the smallest degree of flexibility but is the least complex to
use. The Xilinx PicoBlaze Microcontroller block implements an embedded 8-bit
microcontroller using the PicoBlaze macro, and exposes a fixed interface to System
Generator. Ordinarily, a single block ROM containing 1024 or fewer 8 bit words serves as
the program store. You can program the PicoBlaze using the PicoBlaze Assembler
language. This flow is documented in the topic Designing PicoBlaze Microcontroller
Applications.
The EDK Processor block provides a solution to both these problems through automation.
The EDK Processor block encourages the interface between the processor and the custom
logic to be specified via shared-memories. Shared-memories are used to provide storage
locations that can be referenced by name. This allows a memory map and the associated
software drivers to be generated.
Please refer to the EDK Processor block documentation for information on the use
of the block. The topics that follow describe automatic memory map creation,
hardware generation in different compilation flows, the use of the associated software
drivers, and the two clock wiring schemes provided by the EDK Processor block.
Memory Map Creation
Hardware Generation
Hardware Co-Simulation
Explains how to create a hardware cosimulation model for the EDK Processor block.
Asynchronous Support
Troubleshooting
A System Generator model is shown on the bottom-right of the figure above. The System
Generator model corresponds to custom logic that will be integrated with the
MicroBlaze processor. In the construction of the model, shared-memories are used in
locations where software access is required. For example, the status of the hardware might
be kept in a register. To make that status information visible in the processor, the register is
replaced by a named shared-register. Naming the shared-register "status" gives the name
of the memory context that will be useful later on during software development.
The block GUI of the EDK Processor block allows these shared-memories to be added to
the memory map of the processor (bottom-left of the figure). The block diagram at the top
of the figure above shows the flow of data. When a shared memory is added to the memory
map of the processor, the EDK Processor block creates the corresponding matching shared
memory. This shared memory is attached to the memory map that is generated for that
EDK Processor block. Next, a bus adaptor is used to connect that memory map to the
MicroBlaze processor.
Note: The EDK Processor block does not support Shared Memory blocks with spaces in their
names.
When hardware is generated, each shared memory pair is implemented with a single
physical memory. The implementation for each class of shared memory is documented in
the topic Shared Memory Support, found under the topic Using Hardware Co-Simulation.
Hardware Generation
The EDK Processor block supports two modes of operation: EDK pcore generation and
HDL netlisting. The different modes of operation are illustrated below and can be chosen
from a list-box in the EDK Processor block's GUI.
Hardware Co-Simulation
Currently the EDK Processor block provides hardware-based simulation through
hardware co-simulation. The creation of a Hardware Co-Simulation block follows the
standard co-simulation flow described in the topic Using Hardware Co-Simulation. The
only difference is how top-level ports of the imported XPS project are treated.
When an XPS project is imported into System Generator, the import wizard assumes that
all the ports are well constrained and applies that given constraint on the ports during the
creation of the Hardware Co-Simulation block. That is to say, if the top-level entity of the
XPS system contains ports that connect to pads on the FPGA, when compiling a Hardware
Co-Simulation block, these ports will still connect to the pads on the FPGA and will not
appear as ports on the Hardware Co-Simulation block. Similarly, the bitstream flow
constraints specified on top-level ports in the imported XPS system will be honored.
Should there be top-level ports that do not connect to pads, or are not constrained, these
ports can be made visible in System Generator by exposing the ports using the Processor
Port Interface table in the Advanced tab of the EDK Processor block. See the topic Exposing
Processor Ports to System Generator for details.
You may use the EDK's XPS tool to write and compile your software. However, before
simulation can begin, the Compile and update bitstream button in the co-simulation
block's Software tab must be used to put the compiled C code into the bitstream.
When used in conjunction with a hardware board supported by network-based hardware
co-simulation, it is possible to free up the JTAG port on the FPGA and use that for software
debug with XMD.
along with other SDK export files. By doing so, the generated software driver can be used
in the Xilinx SDK (Software Development Kit).
As shown in the following figure, you can click on the Launch Xilinx SDK button on the
Software tab of a Hardware Co-Simulation block GUI to launch the Xilinx SDK.
When Xilinx SDK is launched from System Generator, the following items are
automatically set up by System Generator.
1. The workspace is set to the one specified in the Hardware Co-Simulation block GUI
shown above.
2.
3.
Note: If you launch Xilinx SDK standalone, rather than through System Generator as mentioned
above, you need to specify the Workspace directory and manually add the folder
<netlist_dir>/SDK_Export/sysgen_repos to the local repositories.
API Documentation
There is API documentation associated with the software driver, which you can find by
clicking the Documentation link shown in the figure above.
In order to utilize these functions, the following two header files need to be included in
your C code.
#include "xparameters.h"
#include "sg_plbiface.h"
The hardware settings of the shared memories inside the System Generator pcore can be
found in the header file xparameters.h. For example, absolute memory-mapped
addresses, data bit widths (n_bits) and binary point positions (bin_pt), depths of the "To
FIFO" shared memories on the processor memory map can be found in this header file. The
header file sg_plbiface.h defines the basic data types and software driver functions for
accessing the shared memories.
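The n_bits and bin_pt settings describe how a raw word read from a shared memory maps to a real number. The following is a minimal sketch of that interpretation, not part of the generated driver: the helper name sg_raw_to_double and its is_signed flag are hypothetical, and the sketch assumes the value occupies the low n_bits bits of the word, in two's complement when signed.

```c
#include <stdint.h>

/* Hypothetical helper (not part of sg_plbiface.h): interpret the low
 * n_bits bits of a raw word as a fixed-point number with bin_pt
 * fractional bits. Assumes 1 <= n_bits <= 32 and bin_pt <= n_bits. */
double sg_raw_to_double(uint32_t raw, unsigned n_bits,
                        unsigned bin_pt, int is_signed)
{
    /* Mask off any bits above the n_bits-wide field. */
    uint32_t mask = (n_bits >= 32) ? 0xFFFFFFFFu : ((1u << n_bits) - 1u);
    uint32_t field = raw & mask;
    double value = (double)field;

    /* Two's complement: a set sign bit means value = field - 2^n_bits. */
    if (is_signed && ((field >> (n_bits - 1)) & 1u)) {
        value -= (double)mask + 1.0;
    }

    /* Scale by 2^-bin_pt to place the binary point. */
    for (unsigned i = 0; i < bin_pt; ++i) {
        value /= 2.0;
    }
    return value;
}
```

For instance, with n_bits = 8 and bin_pt = 4, the raw word 0x18 is 24/16 = 1.5 when unsigned, and 0xF8 is -8/16 = -0.5 when signed.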
There is a Shared Memory Settings section in the API documentation, which lists the settings
of the available shared memories contained in the System Generator peripheral, as shown
in the following figure.
In the API documentation, a number of example code snippets are provided to perform
read/write operations. These code snippets are detailed in the following text.
// obtain the memory location for storing the settings of shared memory "toreg"
xc_get_shmem(iface, "toreg", (void **) &toreg);
// write value to the "din" port of shared memory "toreg"
xc_write(iface, toreg->din, (const unsigned) value);
Single-Word Reads
The following code snippet reads data stored in the "From Register" shared memory
named "fromreg" into "value".
uint32_t value;
xc_iface_t *iface;
xc_from_reg_t *fromreg;
// initialize the software driver, assuming the Pcore device ID is 0
xc_create(&iface, &SG_PLBIFACE_ConfigTable[0]);
// obtain the memory location for storing the settings of shared memory "fromreg"
xc_get_shmem(iface, "fromreg", (void **) &fromreg);
// read data from the "dout" port of shared memory "fromreg" and store at value
xc_read(iface, fromreg->dout, &value);
Single-Word Reads
The following code snippet reads data stored in the "From FIFO" shared memory
named "fromfifo" into "value".
uint32_t empty;
uint32_t value;
xc_iface_t *iface;
xc_from_fifo_t *fromfifo;
// initialize the software driver, assuming the Pcore device ID is 0
xc_create(&iface, &SG_PLBIFACE_ConfigTable[0]);
// obtain the memory location for storing the settings of shared memory "fromfifo"
xc_get_shmem(iface, "fromfifo", (void **) &fromfifo);
// wait until the "empty" port of shared memory "fromfifo" reads 0
do {
    xc_read(iface, fromfifo->empty, &empty);
} while (empty == 1);
// read data from the "dout" port of shared memory "fromfifo" and store at value
xc_read(iface, fromfifo->dout, &value);
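The polling loop above spins indefinitely if the FIFO never receives data. One possible refinement is to bound the number of polls. The sketch below illustrates the idea on a host machine and is not part of the driver API: read_flag_fn, wait_not_empty and fake_empty are hypothetical names, and the callback stands in for a driver read of the "empty" port.

```c
#include <stdint.h>

/* Hypothetical callback type standing in for a driver read of the
 * "empty" port; it returns the current flag value (1 = empty). */
typedef uint32_t (*read_flag_fn)(void *ctx);

/* Poll the "empty" flag at most max_polls times.
 * Returns 1 when data became available, 0 on timeout. */
int wait_not_empty(read_flag_fn read_empty, void *ctx, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; ++i) {
        if (read_empty(ctx) == 0) {
            return 1;   /* FIFO holds at least one word */
        }
    }
    return 0;           /* still empty after max_polls reads */
}

/* Simulated flag for trying the helper off-target: reports "empty"
 * until the counter pointed to by ctx has counted down to zero. */
uint32_t fake_empty(void *ctx)
{
    unsigned *remaining = (unsigned *)ctx;
    if (*remaining == 0) {
        return 0;
    }
    --*remaining;
    return 1;
}
```

On the target, the callback would wrap the xc_read call on fromfifo->empty, and the caller would read fromfifo->dout only when wait_not_empty returns 1.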
Single-Word Reads
The following code snippet reads data stored in the shared memory named "shram2" into
"value".
uint32_t value;
xc_iface_t *iface;
xc_shram_t *shram;
// initialize the software driver, assuming the Pcore device ID is 0
xc_create(&iface, &SG_PLBIFACE_ConfigTable[0]);
// obtain the memory location for storing the settings of shared memory "shram2"
xc_get_shmem(iface, "shram2", (void **) &shram);
// read data from the shared memory "shram2" and store at value
xc_read(iface, xc_get_addr(shram->addr, 2), &value);
Asynchronous Support
Asynchronous support for processors allows the processor and the accelerator hardware
attached to it to be clocked with different clocks. This allows the hardware
accelerator to run at the fastest possible clock rate, or at a clock rate that is necessary for its
correct functioning, for example when it is required to interface with an external
peripheral.
This feature is enabled when the Dual Clock check box is selected in the EDK Processor
block GUI's Implementation tab. The figure below shows how clocks will be connected for
the import and export flow; in the export flow, the MicroBlaze (MB) block is not present.
Basically, the custom logic design in System Generator is driven with the clk clock and the
processor system is driven with the xps_clk clock. The clock source that drives the PLB bus
in the MicroBlaze processor system is extracted to drive the bus adaptor, the memory map,
and halves of the shared memories. Shared memories straddle between these two domains
(e.g. the clk domain and the plb_clock domain) and are driven by both these clocks. In the
import flow where an XPS project is imported into System Generator, the PLB bus on the
processor must be driven with the same clock as the xps_clk signal.
When Dual Clock is enabled and a design is netlisted for hardware co-simulation, a
slightly different clock wiring topology is used. This is shown in the figure below. The
clock source from the board is bifurcated with one branch going into the Hardware
Co-simulation module before being connected to the clk clock (depicted in the figure above).
The other branch is routed through a clock buffer and connected to the xps_clk clock
signal.
This topology allows for the custom logic designed in System Generator to be
single-stepped, while allowing the MicroBlaze processor to continue in free-running mode. This
allows for clock-sensitive peripherals (such as the RS232 UARTs) to work when the
Hardware Co-Simulation token is set to single-step.
In hardware co-simulation, the processor subsystem is driven by the board clock directly.
This means that the processor subsystem must be able to meet the requirements set by this
clock. In hardware co-simulation, it is possible for users to select different ratios of clock
frequencies based on the input board frequency. Note that this hardware co-simulation
clock is generated in the hardware co-simulation module and is not available to the
processor subsystem.
For example, if the input board frequency is 125 MHz and the hardware co-simulation
frequency is set to 33 MHz, only the custom logic portion of the design is constrained
to 33 MHz; the MicroBlaze processor must still run at 125 MHz. If the MicroBlaze processor
cannot meet timing at this speed, you need to instantiate a clock generator peripheral in
your XPS project and slow down the clock that way.
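The timing budget in this example can be worked out with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and any maximum processor frequency passed to meets_clock is an assumed figure, not a value taken from a real timing report.

```c
#include <stdint.h>

/* Clock period in picoseconds for a frequency given in hertz (hz > 0). */
uint64_t clock_period_ps(uint32_t hz)
{
    return 1000000000000ULL / hz;
}

/* Returns 1 if logic that closes timing at max_hz can be driven by a
 * clock of clk_hz, 0 if the clock is too fast for it. */
int meets_clock(uint32_t clk_hz, uint32_t max_hz)
{
    return clk_hz <= max_hz;
}
```

With a 125 MHz board clock the processor subsystem has an 8000 ps period to meet, while the custom logic constrained to 33 MHz has roughly 30303 ps. A processor subsystem that closes timing only up to, say, 100 MHz would fail meets_clock at 125 MHz, which is the case where a clock generator peripheral is needed.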
When participating in hardware co-simulation, the EDK Processor block provides two
clocking schemes to suit different simulation and runtime requirements: dual-clock wiring
and single-clock wiring. In dual-clock wiring, the EDK Processor and the System Generator
design are driven by two asynchronous clocks; in single-clock wiring, the EDK Processor
and the rest of the System Generator design are driven by the same clock.
As a rule of thumb, if you want the processor to free-run at the board rate, you should
choose the dual-clock wiring scheme. In case you want to single-step the processor for
debug or profiling purposes, you should choose the single-clock wiring scheme.
Starting with Release 12.1, the dual-clock wiring scheme is turned on by default. You can
change the wiring scheme to single-clock wiring through the Implementation tab on the
EDK Processor block GUI.
The third advantage is that designs compiled with the dual-clock wiring scheme tend to
meet timing more easily compared with the single-clock wiring scheme. With the
dual-clock wiring scheme, the DCM in the hardware co-simulation clock control module and
the clock generator in the imported XPS project are not cascaded (as is the case when
single-clock wiring is used). This greatly improves the chances of meeting timing when
generating the Hardware Co-Simulation block with the imported XPS project.
1. Find out the frequency of the board input clock used by the System Generator
hardware co-simulation target. For the JTAG ML506 hardware co-simulation
compilation target, you can look at the file
<sysgen>/plugins/compilation/Hardware Co-Simulation/ML506/JTAG/ML506_JTAG.ucf
2. Verify that the board input clock frequency is 200 MHz. Another way to find out the
board input clock source is to run the hardware co-simulation compilation target once
and look at the file <netlist_dir>/jtagcosim_top.ucf. In the following snippet of
the file ML506_JTAG.ucf, you can see that the System Generator hardware
co-simulation uses a 200 MHz LVDS board input clock source.
NET "sys_clk_p" LOC = "L19";
NET "sys_clk_n" LOC = "K19";
NET "sys_clk_p" TNM_NET = "hwcosim_sys_clk";
NET "sys_clk_n" TNM_NET = "hwcosim_sys_clk";
TIMESPEC "TS_hwcosim_sys_clk" = PERIOD "hwcosim_sys_clk" 200 MHz HIGH 50%;
3. In the system.mhs file found in the XPS project, change the input clock frequency from
100 MHz to 200 MHz, which is the frequency of the clock source used by the System
Generator hardware co-simulation compilation target.
PORT fpga_0_clk_1_sys_clk_pin = dcm_clk_s, DIR = I, SIGIS = CLK, CLK_FREQ = 200000000
4. Change the input clock frequency of the clock generator in the imported XPS project.
BEGIN clock_generator
PARAMETER INSTANCE = clock_generator_0
PARAMETER C_CLKIN_FREQ = 200000000
PARAMETER C_CLKOUT0_FREQ = 125000000
PARAMETER C_CLKOUT0_PHASE = 0
PARAMETER C_CLKOUT0_GROUP = NONE
PARAMETER C_CLKOUT0_BUF = TRUE
PARAMETER C_EXT_RESET_HIGH = 0
PARAMETER HW_VER = 4.00.a
PORT CLKIN = dcm_clk_s
PORT CLKOUT0 = clk_125_0000MHz
PORT RST = sys_rst_s
PORT LOCKED = Dcm_all_locked
END
5. After this, you can import the modified XPS project through the EDK Processor block
and generate a Hardware Co-Simulation block for this project.
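Before importing, it can be worth checking that the two edited MHS entries agree with the board clock. The following is a minimal host-side sketch under stated assumptions: mhs_get_value and clock_inputs_match are hypothetical helpers (not part of any Xilinx tool), and each MHS entry is assumed to sit on a single line with the "KEY = value" spacing used in the snippets above.

```c
#include <stdio.h>
#include <string.h>

/* Extract the unsigned integer that follows "<key> = " in one MHS line.
 * Returns 1 and stores the value in *out on success, 0 otherwise. */
int mhs_get_value(const char *line, const char *key, unsigned long *out)
{
    const char *p = strstr(line, key);
    if (p == NULL) {
        return 0;
    }
    return sscanf(p + strlen(key), " = %lu", out) == 1;
}

/* Returns 1 when both the top-level clock port and the clock generator
 * input are declared at board_hz, 0 otherwise. */
int clock_inputs_match(const char *port_line, const char *clkgen_line,
                       unsigned long board_hz)
{
    unsigned long port_hz = 0;
    unsigned long clkin_hz = 0;

    if (!mhs_get_value(port_line, "CLK_FREQ", &port_hz)) {
        return 0;
    }
    if (!mhs_get_value(clkgen_line, "C_CLKIN_FREQ", &clkin_hz)) {
        return 0;
    }
    return port_hz == board_hz && clkin_hz == board_hz;
}
```

Passing the PORT line from step 3, the C_CLKIN_FREQ line from step 4, and 200000000 for the ML506 board clock returns 1 only when both edits were made.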
When a System Generator model contains an XPS project imported through the EDK
Processor block in single clock mode, the XPS project is driven by the clock generated by
the Hardware Co-Simulation module. This allows the processor to be simulated in lockstep with the rest of the DUT and the Simulink simulation. This kind of simulation can be
very helpful when you are debugging transactions over a custom bus or when you are
profiling code.
The original system.mhs file created using BSB is shown below. Observe that the input
board clock fpga_0_clk_1_sys_clk_pin is connected to the CLKIN pin of the clock
generator instance. The output clock pin CLKOUT0 is then used to drive the processor
and the other hardware peripherals.
2. Next, comment out the clock generator and attach the output clock net directly to
the board input clock port. The modified system.mhs file looks like the following:
PORT fpga_0_clk_1_sys_clk_pin = clk_125_0000MHz, DIR = I, SIGIS = CLK, CLK_FREQ = 100000000
# BEGIN clock_generator
# PARAMETER INSTANCE = clock_generator_0
# PARAMETER C_CLKIN_FREQ = 100000000
# PARAMETER C_CLKOUT0_FREQ = 125000000
# PARAMETER C_CLKOUT0_PHASE = 0
# PARAMETER C_CLKOUT0_GROUP = NONE
# PARAMETER C_CLKOUT0_BUF = TRUE
# PARAMETER C_EXT_RESET_HIGH = 0
# PARAMETER HW_VER = 4.00.a
# PORT CLKIN = dcm_clk_s # input clock
# PORT CLKOUT0 = clk_125_0000MHz # output clock
# PORT RST = sys_rst_s
# PORT LOCKED = Dcm_all_locked
# END
3. The Dcm_all_locked pin on the clock generator is used to indicate whether the output
clock signal is locked with the input clock signal. Replace the input pins driven by this
signal with net_vcc. This kind of change can be tricky in some design scenarios. So
far, no abnormality has been observed for the hardware peripherals generated from
BSB.
BEGIN proc_sys_reset
PARAMETER INSTANCE = proc_sys_reset_0
PARAMETER C_EXT_RESET_HIGH = 0
PARAMETER HW_VER = 2.00.a
PORT Slowest_sync_clk = clk_125_0000MHz
PORT Ext_Reset_In = sys_rst_s
PORT MB_Debug_Sys_Rst = Debug_SYS_Rst
PORT Dcm_locked = net_vcc # changed from Dcm_all_locked
PORT MB_Reset = mb_reset
PORT Bus_Struct_Reset = sys_bus_reset
PORT Peripheral_Reset = sys_periph_reset
END
4. Comment out the software driver for the clock generator in the system.mss file:
# BEGIN DRIVER
# PARAMETER DRIVER_NAME = generic
# PARAMETER DRIVER_VER = 1.00.a
# PARAMETER HW_INSTANCE = clock_generator_0
# END
5. After the modification, the clock generator is safely removed from the XPS project. You
can import this modified XPS project into System Generator through the single-clock
wiring scheme.
Troubleshooting
Limitations on the Imported XPS Project
In theory, any XPS project can be imported into System Generator through the EDK
Processor block. However, you may need to modify the XPS project in some situations to
avoid resource conflicts and to allow the EDK Processor block to properly interpret the
project.
Input clock port: XPS uses SIGIS = CLK to tag an external port as a clock port. The
EDK Processor block recognizes only a single input clock when implementing the
single-clock and dual-clock wiring described above. If the XPS project has more than one
input clock port, you need to remove the SIGIS = CLK tag from the other clock ports. The
following XPS project example has two input clock ports, sys_clk_pin and
fpga_0_PCIe_Diff_Clk_IBUF_DS. In order to import this project into System Generator,
you need to ensure that SIGIS = CLK is removed from the PCIe input clock ports.
PORT sys_clk_pin = dcm_clk_s, DIR = I, SIGIS = CLK, CLK_FREQ = 100000000
PORT fpga_0_PCIe_Diff_Clk_IBUF_DS_P_pin = PCIe_Diff_Clk, DIR = I,
DIFFERENTIAL_POLARITY = P # need to remove SIGIS = CLK
PORT fpga_0_PCIe_Diff_Clk_IBUF_DS_N_pin = PCIe_Diff_Clk, DIR = I,
DIFFERENTIAL_POLARITY = N # need to remove SIGIS = CLK
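A quick way to spot this situation before importing is to count the clock-tagged ports in the MHS text. The sketch below is a hypothetical host-side check, not a Xilinx utility: count_clock_ports is an invented name, and it assumes each PORT entry sits on one line, that the tag is spelled exactly "SIGIS = CLK", and that "#" starts a comment.

```c
#include <string.h>

/* Count lines that start with the PORT keyword and carry the
 * "SIGIS = CLK" tag outside of any trailing '#' comment. */
int count_clock_ports(const char *mhs)
{
    const char *tag = "SIGIS = CLK";
    size_t tlen = strlen(tag);
    int count = 0;
    const char *line = mhs;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = (end != NULL) ? (size_t)(end - line) : strlen(line);

        /* Only scan the part of the line before a comment marker. */
        size_t scan = 0;
        while (scan < len && line[scan] != '#') {
            ++scan;
        }

        /* Skip leading blanks, then require the PORT keyword. */
        size_t i = 0;
        while (i < scan && (line[i] == ' ' || line[i] == '\t')) {
            ++i;
        }
        if (scan - i >= 5 && strncmp(line + i, "PORT ", 5) == 0) {
            for (size_t j = i; j + tlen <= scan; ++j) {
                if (strncmp(line + j, tag, tlen) == 0) {
                    ++count;
                    break;
                }
            }
        }

        /* Advance past this line and its newline, if any. */
        line += len;
        if (*line == '\n') {
            ++line;
        }
    }
    return count;
}
```

A result greater than 1 indicates that the SIGIS = CLK tag still needs to be removed from the extra clock ports.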
Resource conflict: You need to ensure that there is no resource conflict between the
imported XPS project and the rest of the System Generator design. For example, if you
use the Point-to-Point Ethernet-Based Hardware Co-Simulation flow and the target
hardware board has only one Ethernet MAC component (e.g., the Xilinx ML506,
SP601, and SP605 evaluation boards), the XPS project cannot contain peripherals that use
the Ethernet MAC (e.g., xps_ethernetlite). You should consider changing to the
JTAG-Based Hardware Co-Simulation flow in this case. Another example is when the target
hardware board has only a single BSCAN module (e.g., the Spartan 3A DSP 1800
Starter board). In that case you have to remove the JTAG-based MDM (Microprocessor
Debug Module) peripheral from the imported XPS project, or switch to
the Point-to-Point Ethernet-Based Hardware Co-Simulation flow and use the Ethernet
for downloading the bitstream.
Constraint handling: The EDK Processor block automatically modifies the UCF (user
constraint file) from the imported XPS project based on the compilation flow that is used.
Upon importing an XPS project, a copy of the modified UCF file is placed under
<xps_project_dir>/data/sg_<xps_project_name>.ucf. A snippet of a modified
UCF file is shown below. Constraints that belong to certain external ports of the
imported XPS project are commented or uncommented depending on whether or not the
port is exposed on the EDK Processor block. The input clock ports are commented out in
the hardware co-simulation flow automatically.
# constraints for pin 'fpga_0_RS232_Uart_1_RX_pin' (not exposed)
Net fpga_0_RS232_Uart_1_RX_pin LOC = AG15 | IOSTANDARD=LVCMOS33;
# constraints for pin 'fpga_0_RS232_Uart_1_TX_pin' (not exposed)
Net fpga_0_RS232_Uart_1_TX_pin LOC = AG20 | IOSTANDARD=LVCMOS33;
#
#
#
#
If you do not want the EDK Processor block to make automatic
modifications, you can put the line #### SYSGEN VERBATIM ### in the original XPS
project UCF file. All the lines after this comment line are left untouched. See the
explanation at the beginning of the modified UCF file, which is also shown in
the code snippet below.
# This file is generated automatically by System Generator for DSP from
# the following file:
#
# C:\dev\trunk\test\edk\edkplbimport\EDKPrj\data\system.ucf
# Do NOT modify this file directly. Instead, change the above original
# file. Synchronize the processor memory map, or re-import the XPS
# project to apply the changes.
#
# In case that the automatic changes by System Generator for DSP are
# undesired, put the following comment in the above original file. All
# the contents after this comment will be copied verbatim.
# #### SYSGEN VERBATIM ###
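The verbatim marker effectively splits the original UCF file into a part that may be rewritten and a tail that is copied as-is. The sketch below only illustrates where that split falls. It is a hypothetical host-side helper (ucf_verbatim_tail is an invented name), and it does not claim to reproduce how System Generator itself parses the file.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the first character after the line containing
 * the verbatim marker, or NULL when the marker is absent. */
const char *ucf_verbatim_tail(const char *ucf)
{
    const char *marker = "#### SYSGEN VERBATIM ###";
    const char *p = strstr(ucf, marker);
    if (p == NULL) {
        return NULL;            /* whole file is subject to modification */
    }

    /* The protected region starts on the line after the marker. */
    const char *eol = strchr(p, '\n');
    return (eol != NULL) ? eol + 1 : p + strlen(p);
}
```

Constraints placed before the marker in the original UCF file remain subject to the automatic commenting described above, while everything in the returned tail corresponds to the lines that are left untouched.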
EDK Support
Importing an EDK Processor
Exporting a pcore
There are two ways to launch the EDK Import Wizard in the EDK Processor block: (1) press
the Import button, or (2) select HDL netlisting when the EDK project field is empty.
Note: When you import the EDK Project into System Generator, there are modifications made to the
EDK project. These modifications are described in the following topic.
Note: The import process will alter your EDK project to work inside System Generator. If you wish
to retain an unadulterated version, please make a copy before importing. System Generator
automatically backs up the hardware platform specification (i.e., the MHS file) and the software
platform specification (the MSS file) of the EDK project to files with the "bak" suffix.
When an EDK project is imported into System Generator, the EDK project is augmented
with a PLB46 interface depending on the options selected on the EDK Processor block. A
pcore (xlsg_plbiface) is also added to provide software drivers for the interface. The MHS
and MSS files in the EDK project will be altered. Following that, the HDL files that describe
the processor will be generated and linked to your System Generator project.
Limitations
Currently the Wizard can only import single processor projects. Only the MicroBlaze
processor is supported. Peripherals added to the processor cannot conflict with the
resources used by other System Generator services. For instance, if network-based
hardware co-simulation is used, the EDK project cannot make use of the peripherals using
the Ethernet MAC.
The top-right box in the figure above shows a snippet from an EDK project in XPS. The
external port list has among other ports, a user-defined port called myExternalPort.
After importing the EDK project, open up the processor's block GUI in System Generator.
Select the Advanced tab to reveal the processor port interface table.
The port list shows all the top-level ports available on the processor. This port list has been
filtered to remove clock ports and also signals used by System Generator to implement the
memory-map interface. In this example, the RS232 ports, sys_rst_pin and myexternalport are
shown to be ports that can be exposed to the top-level of the System Generator block.
Selecting the expose check box will cause the port to be exposed on the EDK Processor
block. As shown in the figure above, the display name of the port can be changed, should
the original name be too long.
This mechanism allows ports from the processor to be directly exposed to the System
Generator design without going through the memory map generated by System
Generator. You may choose to do this to expose the reset ports on the processor, or to
expose interrupt ports directly to the System Generator diagram.
Exporting a pcore
System Generator designs containing an EDK Processor block can be exported as an EDK
pcore using the EDK Export Tool compilation target on the System Generator token.
Before exporting to the EDK as a pcore, the EDK Processor block must be configured for
"EDK pcore generation". This can be done by opening the EDK Processor block GUI and
selecting the relevant drop down option in the "Configure processor for" parameter.
Please refer to the topic EDK Export Tool for more information.
PicoBlaze Overview
The following example uses PicoBlaze 3 (hereafter referred to simply as PicoBlaze), which is
optimized for low resource requirements. A memory block is used as a program store for
up to 1024 instructions.
Signal          Direction   Description
in_port[7:0]    Input
brk             Input
rst             Input       Reset
instr[17:0]     Input       Instruction Input.
out_port[7:0]   Output
port_id[7:0]    Output      Port Address.
rs              Output      Read Strobe.
ws              Output      Write Strobe.
addr[9:0]       Output
ack             Output      Interrupt Acknowledge.
Architecture Highlights
ALU
The Arithmetic Logic Unit (ALU) provides operations such as add, sub, load, and, or, xor,
shift, rotate, compare, and test. The first operand to each instruction is a register to which
the result is stored. Operations requiring a second operand can specify either a second
register or an 8-bit constant value.
Input/Output
There are 256 input ports and 256 output ports. The port being accessed is indicated by an
8-bit address value provided on port_id. The port address can be specified in the
program as an absolute value or indirectly specified as the contents of a register. During an
input operation, the value provided to in_port is transferred into any of the 16 registers.
During an output operation, a value is transferred from a register to out_port.
Interrupt
The processor provides a single interrupt input port, brk. When interrupts are enabled,
setting brk to 1 causes the program counter to be set to memory location 0x3FF, where a
jump vector to the interrupt service routine is stored. At this time, a pulse is generated on
the ack port (two clock cycles after brk is asserted), the control flags are preserved and
further interrupts are disabled. The return instruction ensures that the end of an interrupt
routine restores the status of the control flags and specifies if future interrupts should be
enabled.
For extensive details regarding the feature and instruction set, please refer online to the
topic PicoBlaze User Resources.
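The interrupt behavior described above can be summarized as a small state model. This is a simplified host-side sketch and not a cycle-accurate model of the processor: the type and function names are hypothetical, the flags are treated as an opaque byte, and the resume address is passed in explicitly because the text above does not describe how the processor stores it.

```c
#include <stdint.h>

#define PICO_IRQ_VECTOR 0x3FFu   /* location of the jump vector */

typedef struct {
    uint16_t pc;           /* program counter, 10 bits used */
    uint8_t  flags;        /* control flags, treated as opaque here */
    uint8_t  saved_flags;  /* copy preserved on interrupt entry */
    int      irq_enabled;  /* 1 when interrupts are enabled */
} pico_model_t;

/* Interrupt entry. Returns 1 if the interrupt was taken, 0 if brk is
 * low or interrupts are currently disabled. */
int pico_interrupt(pico_model_t *m, int brk)
{
    if (!brk || !m->irq_enabled) {
        return 0;
    }
    m->saved_flags = m->flags;   /* control flags are preserved */
    m->irq_enabled = 0;          /* further interrupts are disabled */
    m->pc = PICO_IRQ_VECTOR;     /* continue at the jump vector */
    return 1;
}

/* Return from the interrupt routine: restore the flags and choose
 * whether future interrupts are enabled. resume_pc is a model input. */
void pico_return_from_interrupt(pico_model_t *m, uint16_t resume_pc,
                                int enable)
{
    m->flags = m->saved_flags;
    m->irq_enabled = enable;
    m->pc = resume_pc;
}
```

The model also shows why a second pulse on brk is ignored while the service routine runs: irq_enabled stays 0 until the return instruction re-enables it.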
2. Open Pico_dds.mdl.
3.
a. Find the PicoBlaze block in the Xilinx Blockset Library under Index or Control
Logic and add it to the model where indicated. The default settings of the block do
not give the same number of ports as is expected by the model. This will be
corrected in the following step. You may need to resize the block to fit into the
space allocated in the design.
b. Double-click the block and set Version to PicoBlaze 3. Turn off the option to
Display internal state. Connect the ports to the existing lines in the model.
c. Find the PicoBlaze Instruction Display block in the Index or Tools Library and add it
to the model where indicated. Make sure it is connected properly, as shown in the
figure below:
d. Double-click the PicoBlaze Instruction Display block and set the Version to
PicoBlaze 3. Check the Disable Display option. Disabling the display option
allows the simulation to run without the overhead of updating the block display.
e. Find the ROM block in the Memory Library and add it to the model where
indicated. Flip the block by right-clicking on the block and selecting Format > Flip
Block. Attach the ports to the existing lines.
f.
4. Configure the program store. Double-click the ROM to do the following.
With the Basic tab selected:
a. The ROM block is used to store the PicoBlaze instructions. The depth of the ROM
must be set to 1024. This is because the program uses interrupts and setting brk to
1 causes the program counter to be set to 0x3FF.
b. As detailed in step 5, the code is assembled and produces an initialization file for
the memory named fill_pico_code_program_store.m. Hence the ROM
Initial Value Vector should be set to fill_pico_code_program_store.
c. To increase the performance for synchronous designs, the Latency should be set to
The Word Type should be Unsigned and Number of Bits should be set to 18 with
the Binary Point at 0.
5. Open pico_code.psm.
Note: The Xilinx PicoBlaze Assembler is only available with the Windows Operating System. Third-party PicoBlaze Assemblers are available for Linux, but are not shipped by Xilinx.
Notice the sine wave frequency increasing proportionally to the phase increment.
8.
If the program is not working properly, there are several tools that can be utilized to ease
debugging. Deselecting the Disable Display checkbox in the PicoBlaze Instruction
Display block causes the block to be activated, displaying the updated program counter
and instruction each clock cycle. In conjunction with enabling the display, the registers and
control flag values can be viewed by selecting the Display Internal State in the PicoBlaze
Microcontroller block. Change the Single-Step Simulation block to single-step mode by
double clicking on the block. Step through the simulation to debug.
The following exercise illustrates how to create an XPS pcore using System Generator. The
files used in this exercise can be found in:
<ISE_Design_Suite_tree>/sysgen/examples/EDK/rgb2gray, where
<ISE_Design_Suite_tree>/sysgen denotes the System Generator installation
directory.
Prerequisites
The exercise assumes that you have the following items installed on your computer.
edk
C:\export_pcore\edk\system.xmp
This is a pre-configured XPS project.
source
C:\export_pcore\source\rgb2gray.mdl
This is a System Generator design that will be exported as a pcore.
Launch MATLAB and from the MATLAB console, set the current directory to
C:\export_pcore\source.
2.
Notice that the EDK Processor block has already been added to this model.
3. Click on the System Generator token and generate a pcore using the Export as a pcore
to EDK flow:
As shown above, set the Compilation type to be Export as a pcore to EDK. Click on the
Settings... button to open up options for the compilation target. Accept the default settings
so that the pcore is generated and exported into the model's target directory.
Click on the Generate button to initiate the pcore export process.
Note: This should take less than 5 mins.
4.
5.
Select all the default settings from the popped up dialog windows.
Next, add the pcore into the embedded design. You can either use your mouse to drag
and drop the pcore into the System Assembly View or you can right-click on the pcore
instance and select Add IP.
2. Connect a clock to the system pcore. From the System Assembly View, select the
following: Port tab > rgb2gray_axiw_0 > connect to 50 MHz clock, as shown below:
1.
This will take about 15 minutes for this particular design because XPS has to generate
a netlist and implement the entire design.
2.
When you are asked to select a workspace, you can specify the following path:
C:\export_pcore\edk\SDK
From the SDK main menu, select Xilinx Tools > Repositories .
2.
3. Right click on edk_hw_platform and select New > Project. Select Xilinx C Project
and click Next >.
4. Select Hello World and then click Finish. This step creates a Xilinx C project file in
SDK.
Note: After this step, a board package and a default ELF file are automatically created for you
and you should see something similar to the following in SDK:
Set the STDIO Connection. From the main SDK menu, select Run > Run
Configurations
2.
Double-click on the Xilinx C/C++ ELF. Select the STDIO Connection tab.
3.
Click on the Connect STDIO to Console radio button, select an active channel like
COM1 or COM2 and then click Apply.
Note: Both the UART cable and the JTAG cable need to be connected to the SP601.
4.
Find and delete the helloworld.c file that was auto created by the tool as an
example and replace it with the provided C code from
C:\export_pcore\source\rgb2gray.c as shown below:
Note: The easiest way to add a new source file is to drag and drop it from Windows Explorer.
Once added, SDK should auto-compile and, if things go as planned, you should see the following
compiling messages:
5. Compile the C code. If things go as planned, you should see the following
compiling messages:
6.
Now you're ready to configure and download a bit file to your SP601 evaluation board:
from the main menu, select Xilinx Tools > Program FPGA > Program
7. Execute the ELF file. Right-click on the ELF file and select Run As > Launch on
Hardware.
You should see the following messages echoed to the Console through the Uart
peripheral:
2.
You will now configure the EDK Processor block to import the XPS project. The import
process will make changes to the XPS project. Thus, ensure that the XPS project is not
currently opened by Xilinx Platform Studio before importing.
Double click on the processor block to bring up the block dialog box. In the Configure
processor for drop-down menu, select HDL netlisting. The Import button is enabled as
a result of the selection.
Note that the Import button is disabled when the processor is configured for EDK pcore
generation. In EDK pcore generation mode, it is expected that you will create a pcore in
System Generator and export it to be used in another XPS project. In this case, the
processor is not instanced inside the EDK Processor block. In HDL netlisting mode, it is
expected that you import an XPS project into the System Generator model and netlist it
with other System Generator blocks.
If no XPS project is ever imported, configuring the processor for HDL netlisting will
automatically trigger the launching of the XPS Import Wizard. The XPS Import Wizard can
be launched manually by pressing the Import button.
In the pop-up file selection dialog, browse to the XPS project created in earlier steps. The
import process starts once an XPS project file (xmp file) is selected. The import process
copies necessary files into the XPS project and changes the project accordingly to allow the
MicroBlaze processor to communicate with the System Generator model.
Note that if there are any software applications contained in the imported XPS project, they
are not compiled during import.
The above figure shows a portion of the System Assembly View of the XPS project in Xilinx
Platform Studio. An sg_plbiface peripheral is automatically added to an XPS project after it is
successfully imported into System Generator. The sg_plbiface peripheral connects the PLB
bus attached to the imported MicroBlaze processor to the System Generator model
through a memory-mapped interface, and captures information on how to generate the
corresponding device software drivers. Right click on sg_plbiface in the System Assembly
View to see its API documentation.
Follow the instructions in the API documentation to include the following header file and
initialize the software driver in MyProject.c.
#include "sg_plbiface.h"
xc_iface_t *iface;
// initialize the software driver
xc_create(&iface, &SG_PLBIFACE_ConfigTable[0]);
Before reviewing the code to run on the processor, first consider how to write data to the a
register on the model. Look at the DSP48 Co-Processor model. Recall that the a port of the
DSP48 block is driven by the output of a shared register by the same name. You want to
write a value to that shared register from within the MicroBlaze processor code. By referring
to the driver API, you can see that the shared memory called a is a To Register memory
type with the xc_to_reg_t access data type, which contains the following data fields:
typedef struct {
xc_w_addr_t din;
uint32_t n_bits;
uint32_t bin_pt;
} xc_to_reg_t;
Once the software driver is initialized, din stores the memory-mapped address of the din
port of the shared memory a, while n_bits and bin_pt store the number of bits and
binary point information.
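To make the n_bits and bin_pt fields concrete, the following minimal sketch shows how a real value could be converted to a raw fixed-point word before it is written to din. The helper name to_fixed_raw is hypothetical and is not part of the generated driver. The sketch assumes round-half-away-from-zero and two's-complement wrap, neither of which is specified by the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the generated sg_plbiface driver).
 * Converts a real value to a raw fixed-point word with n_bits total bits
 * and bin_pt fractional bits. Assumes 1 <= n_bits <= 32 and bin_pt < n_bits. */
static uint32_t to_fixed_raw(double value, uint32_t n_bits, uint32_t bin_pt)
{
    assert(n_bits >= 1 && n_bits <= 32 && bin_pt < n_bits);

    /* Scale by 2^bin_pt, then round half away from zero (an assumption). */
    double scaled = value * (double)(1ull << bin_pt);
    int64_t rounded = (int64_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);

    /* Keep the low n_bits; negative values wrap as two's complement. */
    uint64_t mask = (n_bits == 32) ? 0xFFFFFFFFull : ((1ull << n_bits) - 1ull);
    return (uint32_t)((uint64_t)rounded & mask);
}
```

For example, with n_bits = 16 and bin_pt = 8, the value 1.5 is scaled by 2^8 and becomes the raw word 384.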
To write a value to the a shared register, first obtain the shared
register settings through xc_get_shmem:
xc_to_reg_t *toreg_a;
xc_get_shmem(iface, "a", (void **) &toreg_a);
Note: Calling xc_get_shmem is expensive. You should cache the returned toreg_a for later use
and avoid calling xc_get_shmem multiple times in a program.
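The caching advice in the note can be sketched as a look-up-once pattern. Everything in this block is a stand-in: demo_reg_t, slow_lookup, and get_reg_a are hypothetical names used only for illustration, and slow_lookup merely imitates an expensive by-name lookup such as xc_get_shmem.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a shared-memory descriptor. The field names mirror
 * xc_to_reg_t, but this type is illustrative only. */
typedef struct {
    unsigned din;
    unsigned n_bits;
    unsigned bin_pt;
} demo_reg_t;

static demo_reg_t demo_reg_a = { 0x100u, 32u, 0u };
static int lookup_calls = 0; /* counts how often the slow lookup runs */

/* Stand-in for an expensive by-name lookup. */
static demo_reg_t *slow_lookup(const char *name)
{
    assert(name != NULL);
    lookup_calls++;
    return (strcmp(name, "a") == 0) ? &demo_reg_a : NULL;
}

/* Performs the lookup on the first call and returns the cached
 * pointer on every later call. */
static demo_reg_t *get_reg_a(void)
{
    static demo_reg_t *cached = NULL;
    if (cached == NULL) {
        cached = slow_lookup("a");
    }
    return cached;
}
```

Calling get_reg_a() any number of times triggers only one slow lookup, which is the behavior the note recommends for the real driver call.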
You can then use the following single-word write access function to write to the a shared
register:
// -- Set the a port register to 2
xc_write(iface, toreg_a->din, 2);
Copy and paste the above code into your source code file, MyProject.c.
A reference copy of the full code of MyProject.c is located at the following pathname:
<ISE_Design_Suite_tree>/sysgen/examples/EDK/DSP48CoProcessor/MyProject.c
Return to the testbench model. Double click on the Processor Subsystem hwcosim block to
bring up the dialog box shown above. To compile the software contained in the XPS project
listed in the Software tab and load it into the hardware co-simulation bitstream, click the
button labeled Compile and update bitstream.
Since Point-to-point Ethernet co-simulation is chosen, you need to configure the Ethernet
interface and also the Configuration interface of the Processor Subsystem hwcosim block.
Select a valid Host interface for your Ethernet communications, and set the Configuration
interface to Point-to-point Ethernet. Refer to the topic Using Hardware Co-Simulation for
more information on using the hardware co-simulation block.
Using XPS
This topic provides a quick tutorial on several aspects of the Xilinx Embedded
Development Kit (EDK). Please refer to the EDK documentation for more in-depth
explanations and tutorials.
1.
2.
When XPS launches, the following dialog should appear. Select Base System Builder
wizard (recommended), then click OK.
3.
Next, specify the XPS project name as system.xmp and the location, as shown below,
then click OK.
4.
Next, tell Base System Builder that you would like to create a new design, then click
Next.
5.
Base System Builder: Select the Board Vendor and Board Name, then click Next.
6.
Base System Builder: Select a Single Processor System, then click Next.
7.
Base System Builder: Configure the Reference Clock Frequency and the Local
Memory, as shown below, then click Next.
8.
Base System Builder: Select RS232_Uart, dlmb_cntlr and ilmb_cntlr. Remove other
peripherals using the Remove button, then click Next.
9.
Base System Builder: Click Next in the Cache configuration dialog box.
10. Base System Builder: Click Next in the Application configuration dialog box.
11. Base System Builder: Click Finish in the Summary screen. At this point, the XPS
project is generated automatically.
To add a new software application to an EDK Project, first open the EDK project in the
EDK.
2.
In the Project Information Area, click on the Applications tab to reveal the Software
Projects page.
3.
The first item on this page is Add Software Application Project. Double click on this item
to bring up the Add Software Application Project dialog box. Type in a project name,
then click OK.
4.
By default, the project is created but not set to be initialized into BRAMs. Make sure to
initialize the project into BRAMs; otherwise, the software code will not be compiled
and added to the bitstream. Also, if you have more than one application, ensure that all
other applications have Marked to Initialize BRAM unchecked.
5.
Next, create source or header files. Double click on the Sources branch of a project tree
to open a File Open dialog. The dialog is rooted at the base location of your
EDK project. It is good practice to create a directory named after your project and keep your
source and header files there; in this case, MyProject. Create the directory in the same
directory as your EDK .xmp file.
Pcores in an EDK project must be in the user repository, or in a directory named pcores,
at the same directory level as the EDK project file.
2.
To ensure that the pcore has been loaded from XPS, select Project > Rescan User
Repositories.
3.
You may use the Configure Co-processor tool. The tool can be launched from XPS by
selecting Hardware > Configure Coprocessor...
Available pcores are listed in the right-hand window. Select the relevant pcore, then click
on the Add button. The Configure Coprocessor tool takes care of connecting the clock and
reset signals; however, you must wire up any user signals yourself.
Pressing the Edit software button will launch the SDK. The SDK workspace can also be
found under the netlist directory of the design. The ELF file field tells the Hardware
Co-simulation block what executable binary to use during simulation, and the "Compile and
update bitstream" button takes the binary file specified in "ELF file" and merges it into the
bitstream.
Please refer to documentation in the SDK regarding the creation of a software platform and
a managed C or C++ project within the SDK.
In the list of Sample Applications, select the sg_plbiface example and click Finish. This
creates a software project with a main() routine that prints "Hello World". The file also
contains example functions that show how to access the memories in the System Generator
design.
This will scroll the report into the section pertaining to the System Generator peripheral. In
the IP Specs table, click on the DRIVER link and that will launch the documentation
generated by System Generator for the peripheral.
Open the System Generator model with the hardware co-sim block (vfbc_hwcs.mdl).
Double-click on the Hardware Co-Sim block and click on the Edit software button to
launch SDK.
Note: You do not have to enter the ELF file at this point.
2.
3.
4.
5.
6.
7.
Enter the Software Platform Project name, select Empty Application with no template
and click Finish.
Your SDK design cockpit should look similar to the figure below:
8.
The last step is to either create a new C-code source file or add an existing one to the
project. In this case, you can just add the existing one from C:\VFBC\C-code\vfbc.c.
The easiest way to add a C-code source file to the VFBC {SysGen_VFBC} application
project is to simply Copy & Paste or Drag & Drop the file into the project. Once the file
is added, the project will be built and compiled automatically.
Software Iteration
1.
SDK -- Modify C-code and make sure the software project is recompiled successfully
2.
Hardware Iteration
1.
2.
3.
4.
5.
Note: Making Hardware changes requires a new design implementation through Place & Route.
Enables you to rapidly import a test MicroBlaze processor subsystem into System
Generator and simulate it through hardware co-simulation in order to debug a DSP
circuit under development.
2.
The majority of this tutorial exercise is focused on the use-case where you import an XPS
design into System Generator. This flow allows you to debug your DSP design in System
Generator with real live data generated from the MicroBlaze. System Generator's
Hardware Co-Simulation technology allows the MicroBlaze to be running in hardware and
for the rest of the DSP design to be simulated (in software) in System Generator. This gives
you visibility into all the signals of the DSP design and is useful for finding hardware and
interface/protocol bugs.
The following are some of the benefits of using Co-Debug between System Generator and
SDK:
Set a breakpoint and debug while the MicroBlaze and hardware are stopped
Signals to probe do not need to be chosen before the bit stream is generated
Find a bug, modify the C code, recompile and update the bitstream in seconds.
No need to rerun synthesis and the implementation flow when the software
changes
The initial software program (ELF file) is automatically updated to the download
bitstream. You no longer need to manually click the Compile and update
bitstream button on the Hardware Co-Simulation block.
Tight integration
The SDK project is automatically set up with the correct hardware platform
Objectives
After completing this tutorial exercise, you will be able to:
Use shared memories in System Generator to interface DSP hardware with the
MicroBlaze embedded processor
SP601 Platform
Design Description
The System Generator design below includes a FIR Compiler 5.0 block with Shared
Memory blocks From / To FIFOs. Also included is an SP601 embedded system with a
MicroBlaze processor, PLB4.6 bus, and a UART Lite peripheral, all created using the
Platform Studio BSB (Base System Builder) Wizard.
FIR Compiler 5.0: a parameterizable FIR filter that accepts input data from the din
pin.
From FIFO din block: accepts input data from the MicroBlaze processor and
feeds it to the din input of the FIR Compiler. This input data is accessible via both Simulink
and MicroBlaze during the co-debugging session.
To FIFO dout block: accepts output data from the FIR Compiler dout port and
feeds it to the MicroBlaze processor. This output data is also accessible via both
Simulink and MicroBlaze during the co-debugging session.
The two design files for this exercise are located at the following pathname:
<ISE_Design_Suite_tree>/sysgen/examples/SDK_CoDebug
PROCEDURE
In this procedure, you will follow four primary steps:
Step 1 Familiarize yourself with the tool flow between System Generator and Platform
Studio and how DSP and Embedded components can be integrated together
Step 2 Create an embedded system that includes a MicroBlaze embedded processor using
Xilinx Platform Studio (XPS)
Step 3 Incorporate an XPS project into a System Generator design and generate a
Hardware Co-Simulation block
Step 4 Create a software application project and co-debug a System Generator design
using an integrated flow between System Generator and SDK
Step 1 Familiarize yourself with tool flow between System Generator and Xilinx
Platform Studio
Note: You can skip this step and start with Step 2 if you are already familiar with Sysgen design
flows. You can revisit these steps later if you like.
The figure above shows a typical Xilinx tool flow between IP, Project Navigator, Platform
Studio, Software Development Kit and System Generator. Depending on your application,
you may want to use different design methodologies when designing an Embedded DSP
application. Sysgen provides two different approaches to integrating a MicroBlaze
processor from Platform Studio with a System Generator design, and they serve different
purposes. It is helpful to understand the basic differences between these two flows.
1.
EDK Export flow: allows you to generate and export a Sysgen design model as a pcore
to the MicroBlaze processor project. This flow works well if you want to integrate a
System Generator design as a sub-level design to the MicroBlaze processor system.
The typical steps to accomplish this design flow are described as follows:
Generate and export the System Generator model to an XPS project. The XPS
project can be created using the Base System Builder Wizard. The System
Generator pcore will appear in the XPS IP catalog.
2.
EDK Import flow: allows you to integrate a MicroBlaze processor system into a
System Generator design as a sub-system. This flow is especially useful if you
want to bring the MicroBlaze processor system into the System Generator design
environment for debugging and simulation purposes. You can take advantage of a
powerful simulation platform that spans Simulink, HDL (including
ModelSim and ISim), and hardware simulation. The typical steps to accomplish this
design flow are as follows:
Import the XPS project into a System Generator design by using the EDK
Processor block
Depending on your needs, you can either perform hardware co-simulation for
debugging or validating your hardware platform or add the netlist into a bigger
design
Note: The main advantage of this flow is the ability to perform hardware co-simulation
on the processor block and its peripherals and to take advantage of the rich Simulink
simulation and debugging capabilities.
3.
System Generator Dual Clock Support for EDK Processors: System Generator
supports dual clock wiring, which means that the imported processor system and the
other portion of a System Generator model are driven by two independent clock
domains. One major benefit with dual clock wiring is that the MicroBlaze processor
system and the System Generator user logic can run at different clock frequencies. For
example, MicroBlaze can comfortably operate at 100 MHz, while a DSP FIR (finite
impulse response) filter in System Generator can run at up to 400 MHz.
Step 2 Generating the BSB System Using the XPS BSB Wizard
General Flow for this Exercise
1.
1.
Select Start > All Programs > Xilinx Tools > ISE Design Suite 14 > EDK > Xilinx
Platform Studio or double-click the Xilinx Platform Studio shortcut on the desktop if
available
2.
3.
4.
Specify an XPS project name, located in the SDK_CoDebug/XPS folder and click OK.
5.
Click Next
6.
7.
8.
9.
Use the Remove button to remove unused IO Devices and Internal Peripherals under
Processor 1 at the right-hand side of the screen and only keep RS232_Uart_1,
dlmb_cntlr, and ilmb_cntlr as shown in the figure below. Click Next.
10. Click Next in the Cache configuration screen and press OK on Timing closure
Warning.
11. Click Next in the Application configuration screen
12. Click Finish in the Summary screen
13. Click OK in The Next Step screen and close the XPS application
Step 3 Import the Embedded System into the Sysgen Design and Generate a
Hardware Co-Simulation Block
General Flow for this Exercise
You have just completed the process of creating and configuring an embedded system for
a Xilinx FPGA. This embedded system is now ready to be incorporated into a Sysgen
design, fir_example_mb.mdl.
1.
Note: This may take up to 3 minutes the first time since Sysgen will call PartGen in order to populate
the devices in the Sysgen token.
Note: Change the Simulink Solver on this model to ode45 or you will get the following warning:
Warning: The model 'fir_example_mb' does not have continuous states, hence Simulink is using the
solver 'VariableStepDiscrete' instead of solver 'ode45'
2.
3.
Open the Simulink Library Browser, then open the Xilinx Blockset/Index folder. Select
and Drag an EDK Processor block into the Subsystem sheet as shown above. Select the
pulldown menu File > Save to save the sheet.
4.
Double-click on the EDK Processor block and click Add to map all available shared
memory blocks and you should see the following shared memories:
5.
Select the Implementation tab and verify that the Dual Clocks option is selected. The
Bus Type is automatically detected by System Generator.
Note: You will be using the Dual Clocks feature for this exercise. It enables System Generator
and the imported XPS processor subsystem to operate in different asynchronous clock domains.
6.
Select HDL netlisting in the Configure Processor for pull down menu. Click the
Import button and use the Import EDK project... dialog box to select the XPS project
that you just created in Step 2. The file is called system.xmp in the XPS directory.
7.
Click Apply and OK to close the dialog box. Save the design with the File > Save
pulldown menu.
8.
You are now ready to generate a hardware co-simulation block for this subsystem.
Double-click on the Sysgen Token on this Subsystem and set its parameters as shown
below.
9.
Click the Generate button to start generating a Hardware Co-Simulation block. This
may take a few minutes.
Step 4 Create a Software Application Project and Co-Debug the Sysgen Design
Using an Integrated Design Flow between Sysgen and SDK
General Flow for this Exercise
Importing an XPS design into System Generator allows you to co-debug your DSP design
in SDK with real live data generated from the MicroBlaze while observing signals on the
Simulink model. System Generator's Hardware Co-Simulation technology allows the
MicroBlaze to be running in hardware and for the rest of the DSP design to be simulated (in
software) in System Generator. This gives you visibility into all the signals of the DSP
design and is useful for finding hardware and interface/protocol bugs.
In this section of the exercise, you will co-debug a System Generator design using SDK and
System Generator. This involves single-stepping the C-code and observing the expected
output signals inside the SDK console as well as on the waveform Scope of the Simulink
model. This co-debug methodology enables you to examine and verify signal values at
different points in the C-code and Simulink signals.
1.
Delete the Subsystem block from the fir_example_mb.mdl model as shown below:
2.
Now you will replace it with the hardware co-simulation block you just generated.
Open the file ...netlist\Subsystem_hwcosim_lib.mdl, then copy and paste the
generated hwcosim block into the model as shown in the figure below.
3.
4.
Double click on the Subsystem hwcosim block and configure the block as follows:
Note: Verify that the Share cable for concurrent access with: checkbox is selected
Software tab:
5.
Click the Launch Xilinx SDK button, as shown in the figure above to launch SDK.
Note: When SDK is launched directly from System Generator, the target hardware platform should
already be associated for you as shown by the figure below:
Note: You can also choose to launch SDK independently and associate the SDK project with a
hardware co-sim design using the following procedure:
a.
c.
From the SDK pulldown menu, select Xilinx Tools > System Generator CoDebug Settings
d. Enter the pathname to the associated Hardware Co-Simulation model. You can use
the default Port specification 4739.
6.
Continuing from step 5, create a new Xilinx C Project: File > New > Xilinx C Project >
Hello World (default project template)
7.
8.
The next task is to develop C-code to interface with the System Generator pcore. The
default file helloworld.c is created for you and it can be used as your starting point.
For your convenience, a complete C-code source file is provided for you named
sg_hello_world.c and is located in the SDK_CoDebug folder.
9.
Under the hello_world_0 C application, within the src folder, delete
helloworld.c and replace it with ...sg_hello_world.c. The easiest way to add a new
C-code file is to simply drag and drop it from Windows Explorer into the src directory.
Note: Notice that the new C-code is automatically detected and compiled. If there is no syntax error
and everything goes well, you should see a screen-shot similar to the one shown below.
10. Before you start debugging your design, make sure that the COM
port setting in SDK STDIO matches the setting on your PC.
Verify the COM port setting on your PC as follows:
a.
b.
c.
View the Silicon Labs CP210x USB to UART Bridge entry. The COM port
assignment appears in parentheses at the end of the line. In the case below, the port
assignment is COM4.
d. If the port assignment is COM1, right-click on Silicon Labs CP210x USB to UART
Bridge (COM1) and select Properties.
e.
f.
In the COM Port Number entry box, change the value from COM1 to any other
unused port such as COM2.
g.
11. There are a couple of ways to get to COM port settings from the SDK GUI. Here is one
way:
a.
b.
Expand the Xilinx C/C++ ELF tree and select hello_world_0 Configuration.
Configure the COM port by clicking on the STDIO Connection tab and select the
COM port and BAUD Rate to match your PC settings as shown by the figure
below, then click Apply.
c.
Note: Keep the following expected behavior in mind when debugging this simple Embedded DSP
application.
MicroBlaze creates an impulse signal that is transferred into the din FIFO shared
memory.
This input impulse is then propagated through the input din of the FIR
Compiler IP, which has filter coefficients 1 through 16.
The FIR Compiler outputs are then captured by the MicroBlaze via the dout
FIFO shared memory. In this case the outputs are simply the filter coefficients,
which are 1, 2, 3, 4, ..., 16.
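The expected behavior described in this note can be checked with a small software model. The following sketch is a plain C direct-form FIR filter, not the FIR Compiler core or the provided sg_hello_world.c; the function names fir_filter and impulse_response are hypothetical. It shows why a unit impulse driven through a filter with coefficients 1 through 16 returns the coefficients themselves.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TAPS 16

/* Direct-form FIR: out[i] = sum over k of coeffs[k] * in[i - k]. */
static void fir_filter(const int *coeffs, size_t num_taps,
                       const int *in, int *out, size_t num_samples)
{
    assert(coeffs != NULL && in != NULL && out != NULL);
    for (size_t i = 0; i < num_samples; i++) {
        int acc = 0;
        for (size_t k = 0; k < num_taps && k <= i; k++) {
            acc += coeffs[k] * in[i - k];
        }
        out[i] = acc;
    }
}

/* Feeds a unit impulse through a filter with coefficients 1..16. */
static void impulse_response(int out[NUM_TAPS])
{
    int coeffs[NUM_TAPS];
    int in[NUM_TAPS] = { 0 };

    for (int k = 0; k < NUM_TAPS; k++) {
        coeffs[k] = k + 1;
    }
    in[0] = 1; /* unit impulse at sample 0 */

    fir_filter(coeffs, NUM_TAPS, in, out, NUM_TAPS);
}
```

Because the input is nonzero only at sample 0, each output sample i picks out exactly one term, coeffs[i], so the output sequence is 1, 2, 3, ..., 16.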
12. Highlight the file hello_world_0.elf under the Debug folder and select the pulldown
menu Run > Debug to initiate a debug session
The tool should start downloading and configuring the bitstream through JTAG.
13. You may need to rearrange the Debug windows to display what you like to observe
during the Debug session as shown in the following figure.
14. When you debug any application in the workspace for the first time, CDT switches to
the Debug Perspective and asks whether this should be the default behavior on debug.
Click Yes to confirm this behavior.
[Figure: SDK Debug perspective, with callouts for the C-code, First Console, and Second Console panes]
Note: Here are some of the nice features in this Debug cockpit that might be useful to you when
debugging a design.
You can hover on most of the variables in the C-code to display the values
You don't have to bring up a separate HyperTerminal console. It is now
integrated inside the SDK GUI.
The First Console is not available the first time you launch SDK but it can be
added by using the New Console View as shown in the figure below.
15. First, just download the bitstream and run the whole program without any
breakpoints by clicking on the play button as shown below.
16. You should see the same results for the dout signal both on the SDK console and
Simulink scope as shown below.
[Figures: SDK Console output and Simulink Scope output]
17. Place a breakpoint on line 55 of your C-code by double clicking on the line number.
This will toggle the breakpoint off/on.
Note: You need to right click on the gray bar and select Show Line Numbers to display line
numbers. When you first start SDK it does not show line numbers.
18. Instead of clicking on the play button, click on the Debug button as shown below.
20. Again, place your mouse cursor into the open console and press any key to continue
running the program.
21. Continue clicking the Resume button and observe both the SDK console and Simulink
Scope. You should see dout bit 0~16 being displayed as you step through the program.
On the Simulink side, multiple signals are wired to the scope.
Note: Click on the Autoscale button to refresh the scope.
22. Next, terminate the current debug session and relaunch another one by right-clicking
on the current application and selecting the Terminate and Relaunch submenu as shown
below.
23. Another useful feature in SDK is the ability to examine and override variables. This is
especially useful if you want to test certain conditions in your C-code as well as in
your System Generator model. For example, if you want to test the While-loop by
overriding the i variable, right-click on the i variable and change its current value
to 16 (the last value).
24. Click the Resume button and you should see Done printed out on the SDK console.
Summary
The following are some of the advantages of using Co-Debug between System Generator and
SDK:
You can develop and debug hardware and software concurrently without having to
recompile the bitstream
The System Generator Co-Debug circuit is automatically inserted into the XPS design
When SDK is launched from System Generator, the SDK project is automatically set up
with the correct hardware platform
Chapter 3
When a compilation target is selected, the fields on the System Generator token dialog box
are automatically configured with settings appropriate for the selected compilation target.
System Generator remembers the dialog box settings for each compilation target. These
settings are saved when a new target is selected, and restored when the target is recalled.
compilation as it is run. It is possible to hide the compilation details by pressing the Hide Details
button on the status dialog box.
The configuration bitstream contains the hardware associated with your model, and also
contains additional interfacing logic that allows System Generator to communicate with
your design using a physical interface between the board and the PC. This logic includes a
memory map interface over which System Generator can read and write values to the
input and output ports on your design. It also includes any board-specific circuitry (e.g.,
DCMs, external component wiring) that is required for the target FPGA board to function
correctly.
out of the library and use it in your System Generator design as you would other Simulink
and System Generator blocks.
The hardware co-simulation block assumes the external interface of the model or
subsystem from which it is derived. The port names on the hardware co-simulation block
match the port names on the original subsystem. The port types and rates also match the
original design.
Hardware co-simulation blocks are used in a Simulink design the same way other blocks
are used. During simulation, a hardware co-simulation block interacts with the underlying
FPGA board, automating tasks such as device configuration, data transfers, and clocking.
A hardware co-simulation block consumes and produces the same types of signals that
other System Generator blocks use. When a value is written to one of the block's input
ports, the block sends the corresponding data to the appropriate location in hardware.
Similarly, the block retrieves data from hardware when there is an event on an output port.
Hardware co-simulation blocks may be driven by Xilinx fixed-point signal types, Simulink
fixed-point signal types, or Simulink doubles. Output ports assume a signal type that is
appropriate for the block they drive. If an output port connects to a System Generator
block, the output port produces a Xilinx fixed-point signal. Alternatively, the port
produces a Simulink data type when the port drives a Simulink block directly.
Note: When Simulink data types are used as the block signal type, quantization of the input data is
handled by rounding, and overflow is handled by saturation.
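The note above can be illustrated with a small sketch of quantization by rounding with overflow handled by saturation. The helper name quantize_round_saturate is hypothetical and is not a System Generator function. The sketch assumes a signed fixed-point format and round-half-away-from-zero; the exact rounding rule used by the tool is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: quantizes a double to a signed fixed-point integer
 * with n_bits total bits and bin_pt fractional bits.
 * Assumes 2 <= n_bits <= 32 and bin_pt < n_bits. */
static int32_t quantize_round_saturate(double value, unsigned n_bits,
                                       unsigned bin_pt)
{
    assert(n_bits >= 2 && n_bits <= 32 && bin_pt < n_bits);

    const int64_t max_raw = ((int64_t)1 << (n_bits - 1)) - 1;
    const int64_t min_raw = -((int64_t)1 << (n_bits - 1));

    double scaled = value * (double)(1ull << bin_pt);

    /* Saturate first so the integer conversion below cannot overflow. */
    if (scaled >= (double)max_raw) {
        return (int32_t)max_raw;
    }
    if (scaled <= (double)min_raw) {
        return (int32_t)min_raw;
    }

    /* Round half away from zero (an assumption for this sketch). */
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}
```

For an 8-bit format with 4 fractional bits, 0.3 scales to 4.8 and rounds to 5, while 100.0 scales to 1600 and saturates at the largest representable raw value, 127.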
Like other System Generator blocks, hardware co-simulation blocks provide parameter
dialog boxes that allow them to be configured with different settings. The parameters that
a hardware co-simulation block provides depend on the FPGA board the block is
implemented for (i.e., different FPGA boards provide their own customized hardware
co-simulation blocks).
Board | Interface | System Clock Frequency | Available Frequencies
(board name missing) | JTAG, Point-to-point Ethernet, Network-based Ethernet | 100 MHz | 100 MHz, 66.7 MHz, 50 MHz, 33.3 MHz
Xilinx ML506 | Point-to-point Ethernet, Network-based Ethernet | 200 MHz | 100 MHz, 66.7 MHz, 50 MHz, 33.3 MHz
As shown below, you set the target clock frequency at compilation time by clicking the
Settings button on the System Generator token dialog box, then selecting the frequency in the
pulldown menu.
Clocking Modes
There are several ways in which a System Generator hardware co-simulation block can be
synchronized with its associated FPGA hardware. In single-step mode, the FPGA is in
effect clocked from Simulink, whereas in free-running clock mode, the FPGA runs off an
internal clock, and is sampled asynchronously when Simulink wakes up the hardware
co-simulation block.
Single-Step Clock
In single-step clock mode, the hardware is kept in lock step with the software simulation.
This is achieved by providing a single clock pulse (or some number of clock pulses if the
FPGA is over-clocked with respect to the input/output rates) to the hardware for each
simulation cycle. In this mode, the hardware co-simulation block is bit-true and cycle-true
to the original model.
Because the hardware co-simulation block is in effect producing the clock signal for the
FPGA hardware only when Simulink awakes it, the overhead associated with the rest of
the Simulink model's simulation, and the communication overhead (e.g. bus latency)
between Simulink and the FPGA board can significantly limit the performance achieved
by the hardware. As a general rule of thumb, as long as the amount of computation inside
the FPGA is significant with respect to the communication overhead (e.g. the amount of
logic is large, or the hardware is significantly over-clocked), the hardware will provide
significant simulation speed-up.
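The rule of thumb above can be explored with a toy cost model. This is an assumption made for illustration, not a formula from this guide: each simulation step in hardware is modeled as a fixed communication overhead plus the time for the clock pulses issued in that step, and the speed-up is the ratio of the software step time to that total. The function names and all numbers used with them are hypothetical.

```c
#include <assert.h>

/* Toy model: time for one co-simulation step in hardware, in seconds.
 * overhead_s is the per-step communication cost (e.g. bus latency). */
static double hw_step_time(double overhead_s, double clocks_per_step,
                           double fpga_clock_hz)
{
    assert(fpga_clock_hz > 0.0);
    return overhead_s + clocks_per_step / fpga_clock_hz;
}

/* Toy model: speed-up of hardware co-simulation over a pure software
 * simulation that needs sw_step_time_s seconds per step. */
static double cosim_speedup(double sw_step_time_s, double overhead_s,
                            double clocks_per_step, double fpga_clock_hz)
{
    return sw_step_time_s /
           hw_step_time(overhead_s, clocks_per_step, fpga_clock_hz);
}
```

With illustrative numbers, a design that takes 1 ms per step in software and 0.1 ms of overhead per step in hardware speeds up by a little under 10x, while a small design that takes only 0.01 ms per step in software would actually run slower in hardware. This matches the guidance that the work done in the FPGA must be significant relative to the communication overhead.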
Free-Running Clock
In free-running clock mode, the hardware runs asynchronously relative to the software
simulation. Unlike the single-step clock mode, where Simulink effectively generates the
FPGA clock, in free-running mode, the hardware clock runs continuously inside the FPGA
itself.
In this mode, simulation is not bit and cycle true to the original model, because Simulink is
only sampling the internal state of the hardware at the times when Simulink awakes the
hardware co-simulation block. The FPGA port I/O is no longer synchronized with events
in Simulink. When an event occurs on a Simulink port, the value is either read from or
written to the corresponding port in hardware at that time. However, since an unknown
number of clock cycles have elapsed in hardware between port events, the current state of
the hardware cannot be reconciled to the original System Generator model. For many
streaming applications, this is in fact highly desirable, as it allows the FPGA to work at full
speed, synchronizing only periodically to Simulink.
In free-running mode, you must build explicit synchronization mechanisms into the
System Generator model. A simple example is a status register, exposed as an output port
on the hardware co-simulation block, which is set in hardware when a condition is met.
The rest of the System Generator model can poll the status register to determine the state of
the hardware.
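The polling pattern described above can be sketched in plain Python. This is only an illustration, not System Generator code: the FreeRunningHardware class, its status property, and the random cycle counts are hypothetical stand-ins for a free-running design whose status register is exposed as an output port.

```python
import random


class FreeRunningHardware:
    """Hypothetical model of a design running on its own clock."""

    def __init__(self, cycles_needed, seed=0):
        self.cycles_needed = cycles_needed
        self.cycles_elapsed = 0
        self._rng = random.Random(seed)

    def run_until_next_port_event(self):
        # An unknown number of hardware cycles elapse between port events.
        self.cycles_elapsed += self._rng.randint(1, 1000)

    @property
    def status(self):
        # Status register: set in "hardware" once the condition is met.
        return 1 if self.cycles_elapsed >= self.cycles_needed else 0


def poll_until_done(hw, max_polls=1_000_000):
    """Poll the status register once per simulated port event."""
    for polls in range(1, max_polls + 1):
        hw.run_until_next_port_event()
        if hw.status == 1:
            return polls
    raise TimeoutError("status register was never set")


hw = FreeRunningHardware(cycles_needed=5000)
polls = poll_until_done(hw)
```

The software side never learns how many cycles elapsed in hardware; it only observes the register value at each poll, which is why the status register is needed for synchronization.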
Note: The clocking options available to a hardware co-simulation block depend on the FPGA board
being used (i.e., some boards may not support a free-running clock source, in which case it is not
available as a dialog box parameter).
Generator compiles the design into hardware, it connects the signals that are associated
with the Gateways to the appropriate external devices they signify in hardware.
Interface Features
The interface supports 10/100/1000 Mbps half/full duplex modes. Jumbo frames are also
supported on Gigabit Ethernet, provided they are enabled by the underlying connection. For
FPGA device configuration, the interface supports either JTAG-based configuration over a
Xilinx Parallel Cable IV or a Xilinx Platform USB cable, or, for selected boards, Ethernet-based
configuration over the same Point-to-point Ethernet connection used for co-simulation.
Note: This co-simulation interface utilizes an evaluation version of the Ethernet MAC core. Because
this is an evaluation version of the core, it will become dysfunctional after continuous, prolonged
operation (e.g., around 7 hours) in the target FPGA. Operation of the core will restart with a new
simulation. For more information about obtaining the full version of the core, please visit the product
page at http://www.xilinx.com/xlnx/xebiz/designResources/ip_product_details.jsp?key=TEMAC.
Use the Basic tab to select the appropriate clock source for the co-simulation.
Select a Clock
2.
Change the Configuration timeout (ms) value only when necessary. The default
value should suffice in most cases. A larger value is needed when it takes a
considerable amount of time to re-establish a network connection with the FPGA
board after device configuration completes.
If there is a Video I/O daughter card attached to the ML402 board, select Video
I/O Daughter Card (VIODC) from the Configuration profile pulldown menu.
3.
From the Host interface panel, use the pulldown list to select the appropriate
network interface for co-simulation.
Note: The pulldown list shows only those Ethernet-compatible network interfaces installed
on the host that support 10/100/1000 Mbps and are currently enabled and attached to an
active Ethernet segment. If the target interface is not listed as expected, examine the
connection and click the Refresh button to update the list.
4.
The information box beneath the pull-down list provides the details about the
selected interface. Examine the information to ensure the appropriate interface is
chosen, and adjust the network settings in the operating system when necessary.
Depending on which configuration method is chosen, the MAC address in the FPGA
interface panel may need to be changed.
a.
Observe the MAC address displayed on the LCD screen of the target board when the
configuration boot-loader is running. Change the FPGA MAC address in the co-simulation
block if the default value does not match the target board. Refer to Optional
Step to Set the Ethernet MAC Address and the IPv4 Address for details about
assigning the MAC address on an ML402 board.
Select an Interface
Note: The MAC address must be specified as six two-digit hexadecimal numbers
separated by colons (for example, 00:0a:35:11:22:33).
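The format in the note above can be checked before typing a value into the dialog box. The following is a minimal sketch in Python; the helper names is_valid_mac and mac_to_bytes are hypothetical and are not part of System Generator.

```python
import re

# Six two-digit hexadecimal numbers separated by colons.
_MAC_PATTERN = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}")


def is_valid_mac(text):
    """Return True if text matches the format described in the note."""
    return _MAC_PATTERN.fullmatch(text) is not None


def mac_to_bytes(text):
    """Convert a validated MAC address string to its six raw bytes."""
    if not is_valid_mac(text):
        raise ValueError(f"not a valid MAC address: {text!r}")
    return bytes(int(pair, 16) for pair in text.split(":"))
```

For example, is_valid_mac("00:0a:35:11:22:33") is true, while a dash-separated address or one with single-digit fields is rejected.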
1.
The final configuration file is first generated based on the input bitstream specified in
the block parameters.
2.
The final configuration file is then transferred to the target board using the selected
download cable, and used to configure the FPGA device. The progress of
configuration is shown in the dialog box when the configuration is performed over a
Point-to-point Ethernet connection.
3.
Known Issues
Setup Procedures
1.
2.
The target FPGA listens on the UDP port 9999. Please ensure the underlying network
does not block the associated traffic.
Known Issues
If the Cable Location is set to Remote CSE Server, you can specify a CSE server, in the form of
a host name or an IP address, followed by an optional port number:
<host name or IP address> [ :<port number> ]
If you omit the port number, the default port number is used by the CSE server.
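As an illustration of the syntax above, the following Python sketch splits a server string into its host and port parts. The function name parse_cse_server is hypothetical. Because this guide does not state the default port number, the sketch returns None for an omitted port rather than guessing a value, and it assumes host names or IPv4 addresses only.

```python
def parse_cse_server(text):
    """Split '<host name or IP address>[:<port number>]' into (host, port).

    port is None when omitted, meaning the CSE server default applies.
    """
    host, separator, port_text = text.strip().partition(":")
    host = host.strip()
    port_text = port_text.strip()
    if not host:
        raise ValueError("missing host name or IP address")
    if not separator:
        return host, None
    if not (port_text.isascii() and port_text.isdigit()):
        raise ValueError(f"invalid port number: {port_text!r}")
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port number out of range: {port}")
    return host, port
```

The host names and port values used with this helper are up to you; nothing here contacts a real server.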
Co-Simulating Unprotected Shared Memories
From Block
Hardware Implementation
Shared Memory
Shared Memory
To FIFO
To FIFO
Fifo Generator
To Register
To Register
synth_reg_w_init.(vhd,v)
There are two ways in which shared memories are compiled for hardware co-simulation.
The type of compilation depends on whether the shared memory name is unique in the
design, or if the shared memory has a partner who shares the same name. The following
topics describe the two types of compilation behavior.
The shared memory hardware and interface logic are completely encapsulated by the
hardware co-simulation block that is generated for the design. By co-simulating a
hardware co-simulation block that contains a shared memory, it is possible for your design
logic and host PC software to share a common address space on the FPGA.
Note: The name of the hardware shared memory is the same as the shared memory name used by
the original shared memory block. For example, if a shared memory block uses "my_memory," the
hardware implementation of the block can be accessed using the "my_memory" name.
All shared memories embedded inside the FPGA are automatically created and initialized
before the start of a simulation by their respective co-simulation blocks. This means that
any other shared memory objects that need to access the hardware shared memory must
set the Ownership and initialization parameter to Owned and initialized elsewhere.
Doing so causes the software-based shared memories to attach automatically to the shared
memories that reside inside the FPGA.
Note that because both sides of the shared memory connect to user design logic, it is not
possible to communicate with these shared memories directly from the host PC.
The shared memory information table describes the type, bit width, and depth of each
shared memory in the design. For Shared Memory blocks, it also specifies the Access
Protection mode. Clicking on the [+] or [-] symbol next to the shared memory icon expands
or collapses the shared memory table, respectively.
The icons associated with each shared memory type are shown in the table below.
Memory Type
Icon
Shared Memory
Shared FIFO
Shared Register
When software shared memory objects read or write data to the shared memory, a proxy
seamlessly handles communication with the hardware memory resource.
The following figure shows an example of unprotected shared memory implemented in
the FPGA that is communicating with three shared memory objects running on the host
PC. In this example, the software shared memory objects access the hardware shared
memory by specifying the same shared memory name, my_mem. From the perspective of
the software shared memories, the implementation of the shared memory resource is
irrelevant; the hardware shared memory is treated as any other shared memory object.
Reads and writes to the shared memory are handled by the shared memory API.
Note: Not all shared memory objects need to be created or executed in the Simulink environment.
The C++ application in the figure below is just one example of an external application that may
communicate with the hardware shared memory data using the shared memory API.
shared memories include additional logic to handle the mutual exclusion. The interaction
between hardware and software lockable shared memories is shown in the figure below.
The red circle in the figure above represents a lock token. This token may be passed to any
shared memory object, regardless of whether it is implemented in hardware or software.
The dashed circles represent lock placeholders and signify that the lock can be passed to the
block each is associated with. The diamond in the figure above represents a modifiable token.
This token illustrates that when hardware holds the lock on the memory, the hardware shared
memory image may be modified. Likewise, when a software shared memory object holds the
lock, the software shared memory image may be modified.
Having two shared memory images requires synchronization between software and
hardware to ensure the images are coherent. This synchronization is accomplished by
transferring the memory image between software and hardware upon lock transfer.
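The lock and image semantics described above can be modeled with a small Python sketch. This is a toy model for illustration, not the shared memory API: the LockableSharedMemory class and its method names are hypothetical. It keeps one memory image per side and copies the image to the new holder whenever the lock is passed.

```python
class LockableSharedMemory:
    """Toy model: one image per side, made coherent at lock hand-over."""

    def __init__(self, depth):
        self.images = {"software": [0] * depth, "hardware": [0] * depth}
        self.lock_holder = "software"

    def _require_lock(self, side):
        if side != self.lock_holder:
            raise PermissionError(f"{side} does not hold the lock")

    def write(self, side, address, value):
        # Only the side holding the lock may modify its image.
        self._require_lock(side)
        self.images[side][address] = value

    def read(self, side, address):
        self._require_lock(side)
        return self.images[side][address]

    def pass_lock(self, new_holder):
        # Transfer the memory image together with the lock token so the
        # new holder starts from a coherent copy.
        if new_holder not in self.images:
            raise ValueError(f"unknown side: {new_holder!r}")
        if new_holder != self.lock_holder:
            self.images[new_holder] = list(self.images[self.lock_holder])
            self.lock_holder = new_holder


mem = LockableSharedMemory(depth=4)
mem.write("software", 0, 42)
mem.pass_lock("hardware")
hw_value = mem.read("hardware", 0)
```

After the hand-over, the hardware side reads the value written by software, and any further software access is refused until the lock is passed back.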
System Generator performs high speed data transfers between the host PC and FPGA. The
semantics associated with these transactions are shown in the figure below.
is possible for the PC to write to the register using System Generator's hardware co-simulation interfaces.
When a To Register block is compiled for hardware co-simulation, as shown in the figure
below, the input ports are wired to user logic while the output port is wired to PC interface
logic. You may access a shared register during hardware co-simulation using the other half
of the shared register (i.e., using a To or From Register block), a C program or executable
(System Generator API), or a MATLAB program.
For designs that use hardware co-simulation, shared register pairs are typically distributed
between software and FPGA hardware. In other words, one half of the pair is implemented
in the FPGA while the other half is simulated in software using a To or From Register
block. When data is written to a software To Register block, the hardware register is
updated with the same data. Similarly, when data is written into the hardware register,
the same data is read by the From Register software block. A software shared register may
connect to a hardware shared register simply by specifying the name of the shared register
as it was compiled for hardware co-simulation.
Note: You may find the names of all shared memories embedded inside an FPGA co-simulation
design by viewing the Shared Memories tab on a hardware co-simulation block.
When a software / hardware shared memory pair is co-simulated, System Generator
transparently manages the interaction between the PC and FPGA hardware. This means
that a shared register pair simulated in software should behave the same as a shared
register pair distributed between the PC and FPGA hardware.
Asynchronous FIFOs are typically used in multi-clock applications to safely cross clock
domain boundaries. When Free-Running Clock mode is used for hardware co-simulation,
the FPGA operates asynchronously relative to the Simulink simulation. That is,
the FPGA is not kept in lockstep with the simulation. Using the Free-Running Clock mode
effectively establishes two clock domains: the Simulink simulation clock domain and the
FPGA free-running clock domain. In these designs, Shared FIFOs provide a reliable and
safe way to transfer data between the host PC and FPGA board.
Shared FIFOs may also be used to support burst transfers during co-simulation. It is
possible to create vectors or frames of data, and transfer the data to the FPGA in a single
transaction with the hardware. These interfaces can be used to further accelerate
simulation speeds beyond what is typically possible with hardware co-simulation. For
more information on how this is accomplished, refer to the topic Frame-Based Acceleration
using Hardware Co-Simulation.
When a shared FIFO pair is generated for co-simulation, a single asynchronous FIFO core
replaces the two software shared FIFO blocks. As shown in the figure below, the read /
write FIFO sides are attached to user design logic (i.e., logic derived from the original
System Generator model) that was attached to the From FIFO and To FIFO blocks. Because both
FIFO sides attach to user logic in hardware, the PC does not share control of the FIFO with
the design. Instead, the FIFO behavior is similar to a System Generator design that
includes a traditional FIFO block.
Note that even though the FIFO exposes independent clock ports, the same co-simulation
clock drives both ports when a FIFO pair is compiled. This is different from compiling a
shared FIFO pair using the Multiple Subsystem Generator block, where the clocks are from
distinct clock domains.
Single shared FIFO blocks are treated differently than shared FIFO pairs. A single To FIFO
or From FIFO block is replaced by an asynchronous FIFO core when it is compiled for
hardware co-simulation. One side of the FIFO (i.e., the unused shared FIFO half in System
Generator) is connected to PC interface logic. The other side is connected to user design
logic that was attached to the original To or From FIFO block. In this manner, control over the
FIFO is distributed between the PC and FPGA design.
As shown in the following figure, when a To FIFO block is compiled for hardware
co-simulation, the write side of the FIFO is connected to the same logic that was attached to
the To FIFO block in the user design. The read side of the FIFO is connected to PC interface
logic that allows the PC to read data from the FIFO during simulation.
In the figure below, the opposite wiring approach is used when a From FIFO block is
compiled for hardware co-simulation. In this case, the write side of the FIFO is connected
to PC interface logic, while the read side is connected to the user design logic. The host PC
writes data into the FIFO and the design logic may read data from the FIFO.
For designs that use hardware co-simulation, shared FIFO pairs are typically distributed
between software and FPGA hardware. In other words, one half of the pair is implemented
in the FPGA while the other half is simulated in software using a To or From FIFO block.
Together, the software and hardware portions form a fully functional asynchronous FIFO.
When a software / hardware shared FIFO pair is co-simulated, System Generator
transparently manages the necessary transactions between the PC and FPGA hardware.
When data is written to a software To FIFO block during simulation, the same data is
written to the FIFO in hardware. The design in hardware may then retrieve this data by
reading from the FIFO. Similarly, when data is written into the hardware FIFO by design
logic, the data may be read by the From FIFO software block. Note that the empty, full, read
and write count ports on the shared FIFO blocks pessimistically reflect the state of the
hardware FIFO counterpart. A software shared FIFO may connect to a hardware shared
FIFO simply by specifying the name of the shared FIFO as it was compiled for hardware
co-simulation.
Note: You may find the names of all shared memories embedded inside an FPGA co-simulation
design by viewing the Shared Memories tab on a hardware co-simulation block.
The access protection mode of a shared memory may not be modified once it has been
compiled for hardware co-simulation.
Shared memory address port widths are limited to 24 bits or fewer, allowing an
address space of up to 16,777,216 words.
Shared memory, register, and FIFO data port widths are currently limited to 32 bits or
fewer.
Shared memories and FIFOs are implemented in hardware using block memories;
neither distributed nor external memory implementations are currently supported.
No more than two shared memories with the same shared memory name may be
compiled for hardware co-simulation.
Two or more hardware co-simulation blocks that have shared memory names in
common may not concurrently be used in the same design.
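Some of the restrictions above are easy to check mechanically. The Python sketch below covers the width limits and the name-count limit; the function names and the (name, address_bits, data_bits) tuple layout are hypothetical choices made for this illustration.

```python
from collections import Counter

MAX_ADDRESS_BITS = 24  # address port width limit stated above
MAX_DATA_BITS = 32     # data port width limit stated above


def address_space_words(address_bits):
    """Number of addressable words for a given address port width."""
    return 1 << address_bits


def check_restrictions(memories):
    """memories: list of (name, address_bits, data_bits) tuples.

    Returns a list of violation messages (empty when there are none).
    """
    problems = []
    for name, address_bits, data_bits in memories:
        if address_bits > MAX_ADDRESS_BITS:
            problems.append(f"{name}: address width {address_bits} exceeds {MAX_ADDRESS_BITS} bits")
        if data_bits > MAX_DATA_BITS:
            problems.append(f"{name}: data width {data_bits} exceeds {MAX_DATA_BITS} bits")
    counts = Counter(name for name, _, _ in memories)
    for name, count in counts.items():
        if count > 2:
            problems.append(f"{name}: {count} shared memories use this name (maximum is 2)")
    return problems
```

With a 24-bit address port, address_space_words(24) gives the 16,777,216 words quoted above.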
The Hardware Co-Simulation Settings dialog box, shown below, allows you to specify
options files other than the default options files provided by the compilation target.
Implementation Flow: Specifies the options file that is used by the implement flow
type. By default, System Generator will use the implement options file that is
specified by the compilation target.
Configuration Flow: Specifies the options file that is used by the config flow type. By
default, System Generator will use the config options file that is specified by the
compilation target.
The Xilinx ISE software includes several example XFLOW options files. From the base
directory of your Xilinx ISE software tree, these files are located under the directory
sysgen/plugins/compilation/Bitstream. Commonly used implementation
options files include:
balanced.opt
bitgen.opt
Note: It is possible to define options files that may cause errors in the System Generator hardware
co-simulation flow. As a result, it is typically a good idea to make backup copies of the default options
files before modifying them. In addition, the configuration options file should be edited with caution, as
most FPGA hardware boards have specific configuration parameter requirements.
Transfer the vector data to a buffer residing on the FPGA using a burst transfer;
Use the FPGA, in free-running clock mode, to sequentially process the entire input
buffer;
Transfer the contents of the output buffer back into Simulink and reconstruct the data
as a Simulink vector.
Shared Memories
Before a System Generator design can support vector transfers, it must be augmented with
appropriate input and output buffers. In hardware, these buffers are implemented using
internal memory (e.g., BRAMs) and are used to store vectors of simulation data that are
written to and read from the FPGA by the PC. This means that the maximum size of the
buffers is limited by the amount of internal memory available on the target device. In
System Generator, shared memory blocks provide interfaces that implement such buffers.
A question that quickly comes to mind is why not use standard FIFO or memory blocks?
The buffers required for hardware co-simulation differ from traditional FIFOs and
memories in that they must be controllable by both the PC and FPGA user design logic.
The standard FIFO and memory blocks provided by System Generator can only interface
with user design logic.
There are two types of shared memories that provide this control: lockable shared
memories and shared FIFOs. These blocks provide different buffering styles, each with
its own handshaking protocols that determine when and how burst transactions with
the FPGA occur. In this tutorial, primary attention is focused on shared FIFO buffers. For
an example on how to use lockable shared memories, please refer to the tutorial entitled
Real-Time Signal Processing using Hardware Co-Simulation. You may find the lockable
shared memory and FIFO blocks in the Shared Memory library of the Xilinx Blockset.
Because shared FIFOs play a central role in enabling vector transfers, it is worth a brief
aside to discuss their behavior. A shared FIFO pair consists of a To FIFO block and a
From FIFO block that specify the same name (e.g., Bar in the figure above). The To FIFO
block provides the "write side" control signals, while the From FIFO block provides the
"read side" control signals. When used together, a shared FIFO pair is conceptually the
same as a single FIFO, except that the control signals for the two sides are graphically
disjoint. This means that the two blocks of a shared FIFO pair share the same FIFO memory
space. For example, if you write data into a To FIFO block, you may retrieve the same data by
reading from the From FIFO block. The connection between these two blocks is implicit; shared
FIFOs are associated with one another by name and not by explicit Simulink wires.
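The name-based association can be pictured with a short Python sketch. It is a conceptual model only: the ToFIFO and FromFIFO classes below are hypothetical and merely mimic how two blocks that specify the same name end up sharing one FIFO storage without any explicit connection between them.

```python
from collections import deque

_FIFO_STORAGE = {}  # shared FIFO name -> underlying queue


def _storage_for(name):
    # Both halves of a pair resolve the same name to the same queue.
    return _FIFO_STORAGE.setdefault(name, deque())


class ToFIFO:
    """Write side of a shared FIFO pair."""

    def __init__(self, name):
        self._queue = _storage_for(name)

    def write(self, value):
        self._queue.append(value)


class FromFIFO:
    """Read side of a shared FIFO pair."""

    def __init__(self, name):
        self._queue = _storage_for(name)

    @property
    def empty(self):
        return not self._queue

    def read(self):
        return self._queue.popleft()


writer = ToFIFO("Bar")
reader = FromFIFO("Bar")
for sample in (1, 2, 3):
    writer.write(sample)
values = [reader.read() for _ in range(3)]
```

The reader retrieves the samples in the order they were written even though the two objects were never wired together; only the shared name links them.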
Shared FIFOs, and shared memories in general, may be compiled for hardware
co-simulation. Note that although this tutorial touches briefly on how shared FIFOs are
co-simulated, it is useful to refer to the topic titled Co-Simulating Shared FIFOs for more
in-depth information. When one half of a shared FIFO pair is compiled for hardware
co-simulation, a full FIFO block is embedded in the FPGA using the FIFO Generator core. One
side of the FIFO connects to user design logic (i.e., the System Generator logic that
connected to the shared FIFO block). The other side connects to interface logic that allows
it to be controlled by the PC. This side of the FIFO may be controlled by other System
Generator software model logic (e.g., the other half of the shared FIFO), by a C program or
software executable, or by a MATLAB program. By compiling shared FIFOs for hardware
co-simulation, you create embedded FIFO-style buffers in the FPGA that can be controlled
directly by a PC.
There are several ways to communicate with a shared FIFO that is embedded inside the
FPGA. The most common approach is to include the other half of the shared FIFO in the
System Generator design. It is also possible to communicate with the shared FIFO using a
C program or MATLAB program. System Generator provides additional blocks that
support vector transfers to and from the FIFO. These blocks will be touched on later in the
tutorial as they play a key role in supporting burst transfers to and from the FPGA.
available in the input FIFO. Conversely, data is written into the output FIFO whenever
valid data is present on the data path.
To gain a better understanding of how the Shared FIFOs are used, you will now take a look
at an example design that uses vector transfers to accelerate a MAC filter design.
1.
2.
The example design implements a 32-tap MAC FIR filter that removes additive white noise
from a sinusoid input source. The amount of white noise can be adjusted interactively by
moving the Slider Gain control bar before or during simulation. An output scope compares
the filtered output data against the unfiltered input data. The MAC filter itself is contained
inside a subsystem named hw_cosim. This subsystem contains all of the logic that will be
compiled into the FPGA for hardware co-simulation. You can consider everything else in the
design (i.e., all blocks in the top-level) as the design test bench.
Pushing into the hw_cosim subsystem, you have an n-tap MAC FIR Filter block that
implements the design data path. Wrapping the filter are From FIFO and To FIFO blocks
that provide the input and output buffers, respectively. The MAC filter in the example
design is a modified version of the n-tap MAC filter available in the System Generator DSP
Reference Blockset library. In the example, the filter is modified to include valid in and
valid out ports in order to support the FIFO flow control scheme.
In total, there are four shared memory blocks in the design that define the CA and VA
shared FIFO pairs. In truth, you only need the shared FIFO blocks contained inside the
hw_cosim subsystem to successfully compile the design for hardware co-simulation.
Because you would like to simulate the complete design, including FPGA hardware, you
include a CA To FIFO block and VA From FIFO block in the test bench logic. These shared
FIFO blocks are responsible for writing and reading test data from the shared FIFOs in the
hw_cosim subsystem.
Unfiltered data from the din Gateway In block is written into the CA To FIFO block. At this
point, the CA From FIFO block in the hw_cosim subsystem reads data from the FIFO and
writes it into the MAC filter. The MAC filter in turn processes the data and writes it into the
output buffer, represented by the VA To FIFO block. Lastly, the VA From FIFO block in the
top-level reads the data and sends it to the Scope block for visualization.
For this example, you have chosen a maximum buffer size of 4K. This parameter is set by
specifying 4K for the Depth parameter on the CA From FIFO and VA To FIFO block dialog
boxes. Note that because shared FIFOs are implemented using asynchronous FIFO
Generator cores, the actual depth of the hardware FIFO is n-1 words, where n is the depth
specified on the dialog box.
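The depth arithmetic above explains the frame size of 4095 words used later in this tutorial. A minimal Python sketch, assuming that "4K" means 4096 words (the function name is hypothetical):

```python
def usable_fifo_depth(dialog_depth):
    """Actual hardware FIFO depth for a depth n set in the dialog box."""
    return dialog_depth - 1


DIALOG_DEPTH = 4 * 1024          # "4K", assumed to mean 4096 words
FRAME_SIZE = usable_fifo_depth(DIALOG_DEPTH)
```

FRAME_SIZE evaluates to 4095, the largest frame that fits in the hardware FIFO in one transfer.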
You will now have a chance to simulate the design to see how fast it runs in software.
3.
4.
Record the time required to simulate the design for 10000 cycles. To get an accurate
measurement, it is preferable to leave the scope block closed since the graphic updates
may affect simulation performance.
You may adjust the Slider Gain bar during simulation to see how the presence of additional
noise affects the filter performance. You may view the filtered and unfiltered data in the
output scope block. The top axis shows the unfiltered input data. The bottom axis shows
the filtered data results.
Double-click on the System Generator token in the hw_cosim subsystem to open the
System Generator dialog box.
6.
7.
Press the Generate button on the System Generator dialog box to generate the design.
A new hardware co-simulation library and block are created once System Generator
finishes compiling the design. Note that the new hardware co-simulation block does not
have any input or output ports. This is because the subsystem that was compiled did not
contain gateway blocks or Simulink ports. Instead, all connections to other Simulink blocks
are handled implicitly through shared memories that were compiled into the FPGA.
Because you left the To FIFO and From FIFO blocks as part of the software testbench, the
software FIFOs will automatically attach to the FIFOs in hardware at the beginning of
simulation.
It is often necessary to examine the type and configuration of a shared memory that was
compiled for hardware co-simulation. The information about each shared memory is
available in a Shared Memories tab on the hardware co-simulation block dialog box. This
tab contains a tree view of information about each shared memory embedded in the
design.
8.
Double-click on the hardware co-simulation block to open the parameters dialog box.
9.
Select the Shared Memories tab in the hardware co-simulation block dialog box.
The tree-view contains information about the CA and VA shared FIFO blocks that were
compiled. If your co-simulation design contains other shared memory blocks, information
about these blocks will also be displayed here. You may expand or collapse shared
memory information by clicking on the (+) or (-) icons located adjacent to the shared
memory icons.
13. Configure the hardware co-simulation block with any settings necessary to co-simulate
using single-step clock mode.
14. Press the Simulink Start button to start the design.
15. Record the amount of time required to simulate the design for 10000 cycles.
___________________________________________________________________
16. Close the design, but leave the hardware co-simulation library open since you will
need it in the next topic.
In the simulation above, hardware co-simulation uses single-word transfers. That is,
whenever there is a new simulation value to be read from or written to the hardware
co-simulation block, the PC initiates a transaction with the FPGA. The next topic describes how
vector transfers may be used to increase simulation speed by making more efficient use of
the available hardware co-simulation bandwidth.
The Shared Memory Write block accepts a Simulink scalar, vector, matrix or frame data
type and writes the data sequentially into a shared memory. The complete contents of the
Simulink signal are written into the shared memory in a single simulation cycle. As is the
case with all shared memory blocks, an association is made between a Shared Memory
Read or Write block and another shared memory by specifying the same shared memory
name.
Matrix types are treated as having a column-major order. That is, when data is written
sequentially into a shared memory, the elements in a column are written first before
advancing to the next column. For example, assume you have the matrix of data shown
below. During simulation, this matrix data is written into the FIFO (or shared memory) in
the following order:
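As a small illustration of column-major ordering, the Python sketch below flattens a matrix column by column. The 2-by-3 example matrix is hypothetical and is not the matrix from the original figure; the function name is also an assumption.

```python
def column_major_order(matrix):
    """Flatten a matrix (given as a list of rows) column by column."""
    if not matrix:
        return []
    n_cols = len(matrix[0])
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("all rows must have the same length")
    # Visit every row of column 0, then every row of column 1, and so on.
    return [row[col] for col in range(n_cols) for row in matrix]


example = [[1, 2, 3],
           [4, 5, 6]]
write_order = column_major_order(example)
```

Here write_order is [1, 4, 2, 5, 3, 6]: both elements of the first column are written before any element of the second column.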
Using these blocks, it is possible to read or write full vector, frame, or matrix signals into
shared memories, provided the following conditions are met:
The input signal driven to a shared memory write block is an 8-bit, 16-bit, or 32-bit
signed or unsigned integer;
The number of elements in the vector or matrix does not exceed the depth of the
shared memory or FIFO.
The data width of the Shared Memory Read or Write block (i.e., the bitwidth of the
scalar, or vector or matrix element) equals the shared memory or FIFO data width.
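The three conditions above can be expressed as a single check. This Python sketch is illustrative only; the function name and argument names are hypothetical and do not correspond to block parameters.

```python
ALLOWED_ELEMENT_BITS = (8, 16, 32)


def can_write_frame(element_bits, num_elements, memory_data_bits, memory_depth):
    """Return True when a signal satisfies all three conditions listed above."""
    return (
        element_bits in ALLOWED_ELEMENT_BITS       # 8-, 16-, or 32-bit integer
        and num_elements <= memory_depth           # fits in the memory or FIFO
        and element_bits == memory_data_bits       # widths match exactly
    )
```

For instance, a frame of 4095 16-bit elements satisfies the conditions for a 16-bit FIFO that holds 4095 words, but not for a 32-bit FIFO of the same depth.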
You can use these blocks in the example design to read and write vectors of data samples
to the MAC filter in a single software / hardware transaction.
Note that the Buffer block introduces a sample rate change in the design. For every 4095
inputs, there is only one output. Thus if the data input sample period is 1, the buffer data
output sample period is 4095. This means that the Shared Memory Write block need only
send a new frame of data to the FPGA on every 4095th simulation cycle (which is
considerably more efficient than initiating a hardware transaction during every simulation
cycle).
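To get a feel for the saving, the following Python sketch counts how many write transactions are needed to carry a given number of input samples to the FPGA. It is a back-of-the-envelope estimate under the assumption of one transaction per frame; it ignores start-up effects and the read side, and the function name is hypothetical.

```python
def write_transactions(num_samples, frame_size):
    """Frames needed to carry num_samples, one transaction per frame."""
    return (num_samples + frame_size - 1) // frame_size


single_word = write_transactions(10_000, 1)
framed = write_transactions(10_000, 4095)
```

For the 10000-sample run used in this tutorial, single-word transfers need 10000 transactions, while frames of 4095 words need only 3.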
Because the Buffer block introduces a rate change, you must adjust the downstream blocks
to accommodate the slower sample period. You begin by telling the Shared Memory Read
block to read a frame of data every 4095th simulation cycle.
18. Double-click on the Shared Memory Read block to open its parameters dialog box.
On the Type field under the Basic tab, you have configured the block to use shared FIFOs.
To ensure a new frame is read at the appropriate time, you configure the Shared Memory
Read block with a Sample time value of 4095.
The Shared Memory Read block allows you to specify the output data type and
dimensions.
19. On the parameters dialog box, switch to the Output Type tab.
There are several things of interest on this tab. First, you set the output data type to int32
to match the filter data path output width of 32 bits. Note that the design will not simulate
unless these widths match. Second, you choose an output dimension that is 4095 words
deep in the Output dimensions field. Finally, you tell the block to generate frame-based
output since frame data types are required by the downstream Unbuffer block.
20. Close the parameters dialog box.
The Simulink Unbuffer block takes the frame data from the Shared Memory Read block
and deserializes it into sequential scalar values. The Simulink Unbuffer block also
introduces a sample rate change in the diagram. Because the input sample period to the
block is 4095, and the frame size is 4095 words, the Unbuffer block output sample period is
1. This works out nicely since you have data moving through the overall system at an
effective sample period of 1.
Because the Shared Memory Write and Read blocks operate on integer values, you must
insert Simulink type conversion blocks into the diagram so that the data is interpreted
correctly in various portions of the model. The in_data_conv subsystem converts the
Simulink doubles into 16-bit integer values that can be interpreted appropriately by the
FPGA hardware. On the output side, the out_data_conv subsystem converts the 32-bit
integers into 32-bit Simulink fixed-precision values.
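The kind of conversion performed on each side can be sketched in Python. This guide does not describe the internals of the in_data_conv and out_data_conv subsystems, so the scaling below is an assumption made purely for illustration: a binary point at 14 fractional bits, round-to-nearest, and saturation on overflow. The function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def double_to_int16(value, frac_bits=14):
    """Scale a double to a 16-bit integer (frac_bits is an assumed format)."""
    scaled = round(value * (1 << frac_bits))
    # Saturate instead of wrapping when the value does not fit in 16 bits.
    return max(INT16_MIN, min(INT16_MAX, scaled))


def int_to_fixed(raw, frac_bits=14):
    """Reinterpret a raw integer as a fixed-point value (assumed format)."""
    return raw / (1 << frac_bits)
```

With these assumptions, 0.5 maps to the integer 8192, and reading 8192 back with the same binary point returns 0.5. The actual binary point positions depend on the filter data path in your design.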
Before simulating the design, you must add the hardware co-simulation block you created
from the previous design.
21. Add the hardware co-simulation block to the design as shown below.
As mentioned before, the Shared Memory Write block writes a new input frame of 4095
words to the FPGA on every 4095th clock cycle. Likewise, the Shared Memory Read block
reads an output frame of 4095 words from the FPGA on every 4095th clock cycle. This
means that the FPGA must process the entire frame in a single Simulink cycle. How exactly is this
accomplished?
The first step is to configure the FPGA in free-running clock mode. In doing so, you allow
the FPGA to process data considerably faster than if it were otherwise kept in lockstep with
the Simulink simulation. Whereas in single-step mode the FPGA can process only one data
sample per Simulink cycle, the FPGA processing speed is limited only by the system clock
frequency when operating in free-running clock mode. Even so, if the buffer is large
enough, the FPGA may not have time to process the complete buffer before the next block
in the design is woken up. You still need a way to stall the rest of the simulation while the
FPGA processes the entire buffer.
The Shared Memory Read block checks the number of FIFO words available in the output
buffer before trying to read a frame. If the number of words in the buffer is insufficient, the
Read block waits for a small amount of time, and then checks again to determine if the
words have become available. It only reads the frame once all of the words are available in
the output buffer, in this case 4095. In this manner, the Shared Memory Read block can stall
the simulation until the complete frame has been processed by the FPGA.
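The polling behavior described above can be sketched in Python as follows. This is an illustration only, not the Shared Memory Read block's implementation. The callables `words_available` and `read_words`, the `ToyBuffer` class and the poll interval are all assumptions made for the example.

```python
import time

def read_frame_when_ready(words_available, read_words, frame_size,
                          poll_interval=0.001):
    """Wait until `frame_size` words are available, then read the frame.

    `words_available` and `read_words` are hypothetical callables standing
    in for the output-buffer status check and the frame read.
    """
    while words_available() < frame_size:
        time.sleep(poll_interval)  # wait a short time, then check again
    return read_words(frame_size)

class ToyBuffer:
    """Toy output buffer that gains 1000 words every time it is polled."""
    def __init__(self):
        self.words = []

    def available(self):
        start = len(self.words)
        self.words.extend(range(start, start + 1000))
        return len(self.words)

    def read(self, n):
        frame, self.words = self.words[:n], self.words[n:]
        return frame

buf = ToyBuffer()
frame = read_frame_when_ready(buf.available, buf.read, frame_size=4095)
print(len(frame))  # 4095
```

The reader never returns a partial frame. The caller is simply held up until the whole frame is present, which is how the stall in the simulation arises.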
The steps necessary to run the simulation using Simulink frame signals are provided
below:
22. Double-click on the hardware co-simulation block to bring up the parameters dialog
box.
23. Select Free running clock mode as shown below.
24. Configure the hardware co-simulation block with any additional settings necessary for
simulation according to the requirements of your co-simulation board.
25. Press the Simulink Start button to start the design.
26. Record the amount of time required to simulate the design for 10000 cycles.
___________________________________________________________________
27. What is the simulation speed increase over the time recorded in step 15?
___________________________________________________________________
2.
The I/O buffering interface allows you to easily buffer and stream data through a System
Generator signal processing data path during hardware co-simulation. The example
design consists of two subsystems that implement input and output buffer storage,
named Input Buffer and Output Buffer, respectively. The turquoise block in the center of
the diagram is a placeholder for the signal processing data path which you will substitute
into the design.
At the heart of each buffering subsystem is a lockable shared memory block that provides
the buffer storage. Each shared memory is wrapped by logic that controls the flow of data
from the host PC, through the interface, and back to the host PC. Operation of the I/O
buffering interface is shown in the flow chart below:
Notice that the buffering interface design includes several data valid ports. These ports are
used for data flow control. A "true" output from the Input Buffer dout_valid port
indicates new data is ready to be processed by the data path. Likewise, when the data path
is finished processing the data, it should drive the Output Buffer subsystem's
din_valid port to "true" to indicate valid output data (the din_valid port is analogous to
a write enable control signal).
The example includes a placeholder that should be replaced by a System Generator data
path. You may insert any data path in the buffer interface provided that it works within the
valid signal semantics described above.
Note: The output buffer shared memory does not release lock until the output buffer is full. To avoid
deadlock, the number of valid assertions by the data path should equal the output memory buffer size
for a given processing cycle.
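The deadlock condition in the note can be illustrated with a small Python model. The `OutputBuffer` class and `run_cycle` function are hypothetical and model only the rule stated above: the lock is released once the number of valid writes equals the buffer size.

```python
class OutputBuffer:
    """Toy output buffer that releases its lock only when it is full."""
    def __init__(self, size):
        self.size = size
        self.words = []
        self.locked = True

    def write(self, value, din_valid):
        # din_valid acts like a write enable: invalid samples are ignored.
        if din_valid and self.locked:
            self.words.append(value)
            if len(self.words) == self.size:
                self.locked = False  # buffer full, lock is released

def run_cycle(buffer_size, valid_flags):
    """Return True if the buffer released its lock after one processing cycle."""
    buf = OutputBuffer(buffer_size)
    for i, valid in enumerate(valid_flags):
        buf.write(i, valid)
    return not buf.locked

print(run_cycle(8, [True] * 8))            # True
print(run_cycle(8, [True] * 7 + [False]))  # False
```

In the second call the data path asserts valid only seven times, so the toy buffer never fills and the lock is never released.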
4.
The data path uses line buffers to properly align data samples in the filter kernel. The size
of these line buffers can be parameterized to accommodate different frame sizes. In this
example, the line buffers are implemented in the Virtex2 5 Line Buffer block in the
conv5x5_video_ex/5x5_filter subsystem, and are pre-configured with a line size of
128. If you decide to process a different size frame, the Line Size parameter should be
updated accordingly.
An addressable shift register block (ASR) is used to delay the valid bit. The offset port is
used to control the address of the ASR block, which in turn controls the amount of latency
the valid bit incurs. By simply delaying the valid bit generated by the input buffer block,
you ensure the number of words written to the output buffer is always equal to the buffer
size. Note that when the design is run in hardware, a change in the offset value will cause
the vertical alignment of the filtered images to change.
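A minimal Python sketch of delaying a valid bit through a shift register is given below. It is not the ASR block itself. The `latency` argument stands in for the delay selected through the offset port, and the exact mapping from offset value to latency is not modeled.

```python
from collections import deque

def delay_valid(valid_bits, latency):
    """Delay a stream of valid bits by `latency` cycles.

    The register starts out filled with False, so the first `latency`
    outputs are False.
    """
    if latency == 0:
        return list(valid_bits)
    shift_reg = deque([False] * latency, maxlen=latency)
    delayed = []
    for bit in valid_bits:
        delayed.append(shift_reg[0])  # oldest entry leaves the register
        shift_reg.append(bit)         # newest entry enters, oldest is dropped
    return delayed

valid_in = [True] * 4 + [False] * 6
valid_out = delay_valid(valid_in, latency=3)
print(valid_out)
print(sum(valid_in) == sum(valid_out))  # True
```

Because the bits are only shifted in time, the number of valid assertions at the output matches the number at the input, provided the stream is followed by at least `latency` idle cycles.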
The coefficient_memory subsystem contains a copy of the most recently loaded filter coefficients,
which are stored in an unprotected shared memory named coef_buffer. During runtime, the
subsystem monitors the shared memory contents, and initiates a reload sequence
if it detects a change. Because the unprotected shared memory is co-simulated, any process on the
host PC may write new kernel coefficients simply by writing to a shared memory object
named coef_buffer. This interface is convenient, as communication with the FPGA
hardware is completely abstracted through the Shared Memory API.
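The monitor-and-reload idea can be sketched in Python as shown below. The function names are hypothetical, and a plain list stands in for the coef_buffer shared memory. The kernel size of 25 words is an assumption based on the 5x5 filter name, and the coefficient values are placeholders.

```python
def coefficients_changed(local_copy, shared_memory):
    """Return True when the shared contents differ from the last loaded copy."""
    return list(local_copy) != list(shared_memory)

def monitor_step(local_copy, shared_memory, reload):
    """One monitoring pass: on a change, reload and refresh the local copy."""
    if coefficients_changed(local_copy, shared_memory):
        reload(shared_memory)
        return list(shared_memory)
    return local_copy

reloads = []                 # records every reload that was triggered
coef_buffer = [1] * 25       # assumed 5x5 kernel, placeholder values
local = monitor_step([0] * 25, coef_buffer, reloads.append)
print(len(reloads))  # 1
local = monitor_step(local, coef_buffer, reloads.append)
print(len(reloads))  # 1
```

The second pass finds no difference between the local copy and the shared contents, so no reload is triggered.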
Double-click on the System Generator token located at the top of the
conv5x5_video_ex model.
6.
7.
Press the Generate button to compile the design for hardware co-simulation.
Hardware co-simulation blocks include information about any shared memories, registers,
or FIFOs that were compiled as part of the design. You may view this information by
double-clicking on the hardware co-simulation block to open the parameters dialog box.
Once the dialog box is open, selecting the Shared Memories tab reveals information
about each shared memory in the compiled design.
Go ahead and leave the hardware co-simulation library open. In the next topic you will
include the hardware co-simulation block in a video processing testbench design.
From the
...sysgen/examples/shared_memory/hardware_cosim/conv5x5_video
directory, open conv5x5_video_testbench.mdl.
The testbench model uses a From Workspace block to produce the looped video sequence.
Each frame of the video sequence is represented as a 128x128 uint8 Simulink matrix (a
pre-load function loads and initializes the video sequence automatically when the model is
opened). Video frames are written into the FPGA Processing subsystem where they are
filtered at the rate of one frame per simulation cycle. The filtered output is then written to
a Matrix Viewer block for analysis.
The FPGA Processing subsystem contains a stub for the hardware co-simulation block,
as well as Shared Memory Read and Write blocks. In this example, the Shared Memory
Read and Write blocks are responsible for managing video frame I/O to and from the
shared memories operating inside the FPGA. The operation of these blocks is described
below:
a. The Shared Memory Write block wakes up and requests lock of the input buffer
lockable shared memory Foo. Once lock is granted, the block writes the video
frame data input into the lockable shared memory and releases lock.
b. The hardware co-simulation block wakes up and requests lock of the input and
output buffer shared memories Foo and Bar. The host PC shared memory images
are transferred to the FPGA and lock is granted. The FPGA processes the input
buffer data and writes the output into the output buffer. Lastly, the FPGA releases
lock of Foo and Bar, causing the FPGA shared memory images to be transferred
back to the host PC.
c. The Shared Memory Read block wakes up and requests lock of the output buffer
lockable shared memory Bar. The block reads a video frame from the output buffer and
drives its output port with the processed video frame data.
Note that the three steps listed above assume a specific sequencing of the hardware
co-simulation and Shared Memory Read and Write blocks. To ensure these blocks are
properly sequenced, you can set block priorities, where a block with a lower priority value
is woken up first during simulation.
9.
Add the hardware co-simulation block to the testbench model in place of the turquoise
placeholder residing in the FPGA Processing subsystem.
The Shared Memory Write block in the testbench is pre-configured with a priority of 1, and
the Shared Memory Read block is pre-configured with a priority of 3. Since you want the
hardware co-simulation block to wake up second in the simulation sequence, you must set
the hardware co-simulation block priority to 2.
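The effect of these priority values on the wake-up order can be illustrated with a few lines of Python. This is only a model of the ordering rule stated above (lower priority value first), not of the Simulink scheduler.

```python
# (block name, priority) pairs taken from the text above.
blocks = [
    ("Shared Memory Read", 3),
    ("hardware co-simulation", 2),
    ("Shared Memory Write", 1),
]

# A lower priority value means the block is woken up earlier.
wake_order = [name for name, priority in sorted(blocks, key=lambda b: b[1])]
print(wake_order)
# ['Shared Memory Write', 'hardware co-simulation', 'Shared Memory Read']
```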
10. Right-click on the hardware co-simulation block, and select Block Properties.
The left image is the original video frame. The image on the right is the same frame that has
been processed using the "smooth" filter kernel. Note that the smoothing filter is just one of
several filters that can be applied to the video source.
Install Xilinx ISE Design Suite software as described in the document: ISE Design
Suite Installation, Licensing, and Release Notes
Install WinPcap version 4.1.1 software, which may be obtained from the website at
http://www.winpcap.org.
As shown below, from the Start menu, select Control Panel, then right-click on Local
Area Connection, then select Properties.
2.
As shown below, select Internet Protocol (TCP/IP), then click on the Properties button
and set the IP Address to 192.168.8.2 and the Subnet mask to 255.255.255.0. (The last number of
the IP Address must be something other than 1, because 192.168.8.1 is the default IP
address for the board.)
3.
Click on the Configure button, select the Advanced tab, select Flow Control, then
select Auto.
4.
Set Speed & Duplex to Auto, then click OK to close the dialog box.
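The address settings from the steps above can be sanity-checked with Python's standard ipaddress module. The helper name `same_subnet` is hypothetical. The sketch only confirms that the PC and the board's default address share a subnet and are not identical.

```python
import ipaddress

def same_subnet(host_ip, board_ip, netmask):
    """Return True if board_ip lies in the subnet defined by host_ip/netmask."""
    network = ipaddress.ip_network(f"{host_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(board_ip) in network

# Values from the steps above: PC at 192.168.8.2, board default 192.168.8.1.
host, board, mask = "192.168.8.2", "192.168.8.1", "255.255.255.0"
print(same_subnet(host, board, mask))  # True
print(host != board)                   # True
```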
The CompactFlash card comes with a series of demo files that you might want to re-load
and exercise later.
a.
Connect the CompactFlash Reader to the PC. This is usually done through a USB
port.
b.
c.
Click on the My Computer icon, then select the Removable Disk drive that
represents the CompactFlash Reader.
d. Create or open a backup folder on the PC and copy the content of the
CompactFlash card to that folder for later use.
Note: For the following steps, 'e:' is assumed to be the drive name associated with the
CompactFlash reader.
2.
The card needs to be re-formatted to a FAT16 file system before the System Generator files
can be transferred. You use the mkdosfs utility to format the card.
a.
b.
c.
Open a Windows shell by selecting Start > Run..., then type cmd in the Run dialog
box and click OK.
Caution! In the following step, make sure the drive name (e.g., 'e:' in this case) is specified
correctly for the Compact Flash Removable Disk. Otherwise, the information on the mistakenly
targeted drive will be erased and the drive will be re-formatted.
e.
Type the following mkdosfs command after the Windows command prompt:
mkdosfs -v -F 16 e:
The contents of the CompactFlash card are erased and the card is re-formatted.
3.
Note: For reference, the Sysgen files to be copied are located at the following pathname:
For ML402:
...<ISE_Design_Suite_tree>/sysgen/plugins/bin/ML402_sysace_cf.zip
For ML506:
...<ISE_Design_Suite_tree>/sysgen/plugins/bin/ML506_sysace_cf.zip
For Spartan-3A DSP 3400:
...<ISE_Design_Suite_tree>/sysgen/plugins/bin/S3ADSP_DB__sysace_cf.zip
Invoke MATLAB on the PC, then enter the following command on the MATLAB
Command Line. The command for ML402 is illustrated:
unzip(fullfile(xilinx.environment.getpath('sysgen'),'plugins/bin/ML402_sysace_cf.zip'),'e:/')
The following files and folder should now be listed on the CompactFlash drive:
Optional Step to Set the Ethernet MAC Address and the IPv4 Address
Note: The following step may be necessary if the default MAC and IP addresses conflict with your
default network settings, or if you wish to co-simulate two or more boards concurrently. If not, proceed
to the next topic.
After writing the data to the card, you will find two files, mac.dat and ip.dat, in the
card root directory. The mac.dat and ip.dat files specify the Ethernet MAC address
and IPv4 address associated with the board, respectively. These addresses are used to
uniquely identify a target board during Ethernet hardware co-simulation.
a. Open mac.dat in a text editor and change the Ethernet MAC address. The MAC
address must be specified as six two-digit hexadecimal values separated by
colons (e.g. 00:0a:35:11:22:33). All-zeros, broadcast, or multicast MAC
addresses are not supported.
b. Open ip.dat in a text editor and change the IP address. The IP address must be
specified in IPv4 dotted decimal notation (e.g. 192.168.8.1). All-zeros,
broadcast, multicast, or loop-back IP addresses are not supported. After changing
the IP address for the board, update the IP address for the network connection on
the PC accordingly as mentioned in topic Setting Up the Local Area Network on
the PC. For a direct connection, the board and the PC must be on the same subnet.
Otherwise, the board IP address should be reachable from the PC and vice versa.
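Before editing mac.dat and ip.dat, candidate addresses can be checked against the rules above. The following Python sketch uses hypothetical function names. It rejects only the limited broadcast address 255.255.255.255 and does not detect subnet-directed broadcast addresses, which depend on the netmask.

```python
import ipaddress
import re

MAC_PATTERN = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}")

def is_valid_board_mac(mac):
    """Six colon-separated hex pairs; not all-zeros, broadcast or multicast."""
    if not MAC_PATTERN.fullmatch(mac):
        return False
    octets = [int(part, 16) for part in mac.split(":")]
    if all(o == 0 for o in octets):
        return False  # all-zeros address
    if octets[0] & 0x01:
        return False  # group bit set: multicast, which includes broadcast
    return True

def is_valid_board_ip(ip):
    """Dotted-decimal IPv4; not all-zeros, broadcast, multicast or loopback."""
    try:
        addr = ipaddress.IPv4Address(ip)
    except ipaddress.AddressValueError:
        return False
    if addr in (ipaddress.IPv4Address("0.0.0.0"),
                ipaddress.IPv4Address("255.255.255.255")):
        return False
    return not (addr.is_multicast or addr.is_loopback)

print(is_valid_board_mac("00:0a:35:11:22:33"))  # True
print(is_valid_board_ip("192.168.8.1"))         # True
```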
2.
3.
Run the shell script called install_pcap_proxy.sh. For example, at the shell command
line type:
./install_pcap_proxy.sh
CompactFlash Card
b.
c.
Setup the PC
1.
Install the related software on the PC as described in the topic Installing Software on
the Host PC.
2.
Set up the Local Area Network as described in the topic Setting Up the Local Area
Network on the PC.
1.
Position the ML402 board so the Virtex-4 and Xilinx logos are oriented near the top
edge of the board.
2.
Make sure the power switch, located in the upper-right corner of the board, is in the
OFF position.
3.
As shown below, eject the CompactFlash card from the CompactFlash Reader.
4.
5.
Locate the CompactFlash card slot (on the back side of the ML402 board), and carefully
insert the CompactFlash card with its front label facing away from the board. The
figure below shows the back side of the board with the CompactFlash card properly
inserted.
Note: The CompactFlash card provided with your board might differ.
Caution! Be careful when inserting or removing the CompactFlash card from the slot. Do not
force it.
6.
Connect the AC power cord to the power supply brick. Plug the power supply adapter
cable into the ML402 board. Plug in the power supply to AC power.
Caution! Make sure you use an appropriate power supply with correct voltage and power
ratings.
7.
Using the RJ45 Male/Male Ethernet Cable, connect the Ethernet connector on the
ML402 board directly to the Ethernet connector on the host PC.
8.
9.
As shown below, set the Configuration Source Selector Switch to SYS ACE
b.
Check the on-board status LEDs to ensure the FPGA is configured. If the
configuration succeeded, the DONE LED should be on and all error LEDs should
be off.
c.
As shown below, check the information displayed on the 16-character 2-line LCD
screen of the board. If no error occurred, the Ethernet MAC address (without
colons) should appear on the first line of the display and the IPv4 address should
appear on the second line.
d. If the LCD display does not show the information correctly, press the System
ACE Reset button to reconfigure the FPGA.
e.
Check the status LEDs again to ensure the configuration sequence completed
successfully.
a.
b.
Check the on-board Ethernet status LEDs to make sure the Ethernet interface is
attached to an active Ethernet segment. The LEDs should reflect the link speed and
the duplex mode at which the interface is operating. The TX and RX LEDs should
flash on and off occasionally depending on the network traffic. If no LED is on,
press the CPU Reset button to reset the FPGA, and also examine whether the
Ethernet segment is active.
c. To ensure the board is reachable by the host, issue an ICMP ping from the host to
check the connectivity. For example, type "ping 192.168.8.1" on a console to test the
connectivity to a board with IP address 192.168.8.1.
d. The target FPGA listens on UDP port 9999. Ensure the underlying
network does not block the associated traffic when network-based Ethernet
configuration is used. This does not affect point-to-point Ethernet configuration.
2.
b.
c.
CompactFlash Card
b.
c.
Power Connector
Power
Ethernet
LCD
1.
Position the ML506 board so the Xilinx logo is oriented near the lower-left corner.
2.
Make sure the power switch, located in the upper-right corner of the board, is in the
OFF position.
3.
As shown below, eject the CompactFlash card from the CompactFlash Reader.
4.
5.
Locate the CompactFlash card slot (on the back side of the ML506 board), and carefully
insert the CompactFlash card with its front label facing away from the board. The
figure below shows the back side of the board with the CompactFlash card properly
inserted.
Note: The CompactFlash card provided with your board might differ.
Caution! Be careful when inserting or removing the CompactFlash card from the slot. Do not
force it.
6.
Connect the AC power cord to the power supply brick. Plug the 5V power supply
adapter cable into the ML506 board. Plug in the power supply to AC power.
Caution! Make sure you use an appropriate power supply with correct voltage and power
ratings.
7.
Using the RJ45 Male/Male Ethernet Cable, connect the Ethernet connector on the
ML506 board directly to the Ethernet connector on the host PC.
8.
b.
Check the on-board status LEDs to ensure the FPGA is configured. If the
configuration succeeded, the DONE LED should be on and all error LEDs should
be off.
c.
As shown below, check the information displayed on the 16-character 2-line LCD
screen of the board. If no error occurred, the Ethernet MAC address (without
colons) should appear on the first line of the display and the IPv4 address should
appear on the second line.
d. If the LCD display does not show the information correctly, press the System
ACE Reset button to reconfigure the FPGA.
e.
Check the status LEDs again to ensure the configuration sequence completed
successfully.
b.
Check the on-board Ethernet status LEDs to make sure the Ethernet interface is
attached to an active Ethernet segment. The LEDs should reflect the link speed and
the duplex mode at which the interface is operating. The TX and RX LEDs should
flash on and off occasionally depending on the network traffic. If no LED is on,
press the CPU Reset button to reset the FPGA, and also examine whether the
Ethernet segment is active.
c. To ensure the board is reachable by the host, issue an ICMP ping from the host to
check the connectivity. For example, type "ping 192.168.8.1" on a console to test the
connectivity to a board with IP address 192.168.8.1.
d. The target FPGA listens on UDP port 9999. Ensure the underlying
network does not block the associated traffic when network-based Ethernet
configuration is used. This does not affect point-to-point Ethernet configuration.
2.
b.
b.
Set up the Local Area Network on your PC as described in the topic Setting Up the
Local Area Network on the PC.
Power Connector
Power Switch
LEDs
Ethernet
1.
Position the ML605 board so the Xilinx logo is oriented near the lower-left corner.
2.
Make sure the power switch, located in the upper-right corner of the board, is in the
OFF position.
3.
Connect the AC power cord to the power supply brick. Plug the 12V power supply
adapter cable into the ML605 board. Plug in the power supply to AC power.
Caution! Make sure you use an appropriate power supply with correct voltage and power
ratings.
4.
Using the RJ45 Male/Male Ethernet Cable, connect the Ethernet connector on the
ML605 board directly to the Ethernet connector on the host PC.
2.
Xilinx Spartan-3A DSP 1800A Starter Board which includes the following:
a.
b.
b.
c.
Xilinx Parallel Cable IV with associated Power Jack splitter cable or a Xilinx
Platform USB Cable and a 14-pin ribbon cable.
Set up the Local Area Network on your PC as described in the topic Setting Up the
Local Area Network on the PC.
Position the Spartan-3A DSP 1800A Starter Board so the Xilinx logo is oriented
rightside up and located in the lower-right quadrant of the board.
2.
Make sure the power switch, located in the upper-right corner of the board, is in the
OFF position.
3.
If you are using a Xilinx Parallel Cable IV, follow steps 3a through 3d.
a.
Connect the DB25 Plug Connector on the Xilinx Parallel Cable IV to the IEEE-1284
compliant PC Parallel (Printer) Port Connector.
b. Using the narrow (14 pin) 6-inch High Performance Ribbon cable, connect the pod end
of the Xilinx Parallel Cable IV to the JTAG Port (J2) on the Starter Board.
c.
Connect the attached Power Jack cable to the Keyboard/Mouse connector on the
PC.
d. If necessary, connect the male end of the Keyboard/Mouse cable to the associated
female connector on the Xilinx Power Jack cable (splitter cable).
4.
If you are using a Xilinx Platform Cable USB, follow steps 4a and 4b.
a.
Connect the Xilinx Platform Cable USB to a USB port on the PC.
b. Using the narrow (14 pin) 6-inch High Performance Ribbon cable, connect the pod end
of the Xilinx Platform Cable USB to the JTAG Port (J2) on the Starter Board.
5.
Connect the AC power cord to the power supply brick. Plug the 5V power supply
adapter cable into the 5V DC ONLY connector (J5) on the Starter Board. Plug the power
supply cord into AC power.
Caution! Make sure you use an appropriate power supply with correct voltage and power
ratings.
6.
Turn the Spartan-3A DSP 1800A Starter Board POWER switch ON.
2.
Xilinx Spartan-3A DSP 3400A Development Board Kit which includes the following:
a.
b.
c.
CompactFlash Card
b.
c.
Figure labels, LYR178-101C (Rev C): Ethernet Port, Configuration Address DIP Switches (S2), System ACE Reset Button, LCD
The figure below illustrates the Spartan-3A 3400A Board (Rev D) components of interest in
this setup procedure:
Figure labels, LYR178-101D (Rev D): Ethernet Mode Select jumper (JP2), Ethernet Port, Configuration Address DIP Switches (S2), System ACE Reset Button, LCD, Compact Flash Card, Power Switch, +5V Power Connector
1.
Position the Spartan-3A 3400A Development Board as shown above with the LCD
display at the bottom.
2.
3.
As shown below, eject the CompactFlash card from the CompactFlash Reader.
4.
5.
Locate the CompactFlash card slot (on the back side of the Spartan-3A 3400A Board)
and carefully insert the CompactFlash card with its front label facing away from the
board. The figure below shows the back side of a board with the CompactFlash card
properly inserted.
Note: The CompactFlash card provided with your board might differ.
Caution! Be careful when inserting or removing the CompactFlash card from the slot. Do not
force it.
6.
If you are using a Rev C 3400A Development Board, plug the +12V power supply
adapter cable into the power connector. Plug the power supply into AC power.
If you are using a Rev D 3400A Development Board, plug the +5V power supply
adapter cable into the power connector. Plug the power supply into AC power.
Caution! Make sure you use an appropriate power supply with the correct voltage and power
ratings.
7.
Using the RJ45 Male/Male Ethernet Cable, connect the Ethernet connector on the
Spartan-3A 3400A board directly to the Ethernet connector on the host PC.
8.
9.
Set the Ethernet Mode Select jumper JP2 to pin 1 and pin 2 (the default GMII).
b. As shown below, check the information displayed on the 16-character 2-line LCD
screen of the board. If no error occurred, the Ethernet MAC address (without
colons) should appear on the first line of the display and the IPv4 address should
appear on the second line.
c.
If the LCD display does not show the information correctly, press the System
ACE Reset button to reconfigure the FPGA.
To ensure the board is reachable by the host, issue an ICMP ping from the host to
check the connectivity. For example, type "ping 192.168.8.1" on a console to test the
connectivity to a board with IP address 192.168.8.1.
b. The target FPGA listens on UDP port 9999. Ensure the underlying
network does not block the associated traffic when network-based Ethernet
configuration is used. This does not affect point-to-point Ethernet configuration.
For in-depth reference information on the Spartan-3A 3400A Development Board, please
refer to the following online manual:
http://www.xilinx.com/support/documentation/boards_and_kits/ug498_s3a_3400_board.pdf
Power
Switch
LEDs
Power
Connector
2.
Make sure the power switch, located on the right edge of the board, is in the OFF
position.
3.
On the SP605 board, connect the small end of the Mini USB cable to the USB
socket closest to the LEDs, as shown below. On the SP601 board, connect the small end
of the USB cable to the socket labeled USB JTAG.
Connect Small
End Here
Connecting Ethernet cable and USB JTAG cable to the SP601 board
USB JTAG
Cable
Ethernet
Cable
USB UART
Cable
4.
Connect the large end of the Mini USB cable to a USB socket on your PC.
5.
Connect one end of the Ethernet cable to the Ethernet socket on the SP601/SP605
board and the other end to the Ethernet socket on the PC.
6.
Connect the AC power cord to the power supply brick. Plug the power supply adapter
cable into the SP601/SP605 board. Plug in the power supply to AC power.
7.
2.
b.
c.
CompactFlash Card
Xilinx Parallel Cable IV with associated Power Jack splitter cable or Xilinx
Platform USB Cable and a 14-pin ribbon cable.
b.
1.
Position the ML402 board so the Virtex-4 and Xilinx logos are oriented near the top
edge of the board.
2.
Make sure the power switch, located in the upper-right corner of the board, is in the
OFF position.
3.
If you are using a Xilinx Parallel Cable IV, follow steps 3a through 3d.
a.
Connect the DB25 Plug Connector on the Xilinx Parallel Cable IV to the IEEE-1284
compliant PC Parallel (Printer) Port Connector.
b. Using the narrow (14 pin) 6-inch High Performance Ribbon cable, connect the pod end
of the Xilinx Parallel Cable IV to the FPGA & CPU Debug Port (shown above) on
the ML402 board.
c.
Connect the attached Power Jack cable to the Keyboard/Mouse connector on the
PC.
d. If necessary, connect the male end of the Keyboard/Mouse cable to the associated
female connector on the Xilinx Power Jack cable (splitter cable).
4.
If you are using a Xilinx Platform Cable USB, follow steps 4a and 4b.
a.
Connect the Xilinx Platform Cable USB to a USB port on the PC.
b. Using the narrow (14 pin) 6-inch High Performance Ribbon cable, connect the pod end
of the Xilinx Platform Cable USB to the FPGA & CPU Debug Port (shown above)
on the ML402 board.
5.
Connect the AC power cord to the power supply brick. Plug the power supply adapter
cable into the ML402 board. Plug in the power supply to AC power.
Caution! Make sure you use an appropriate power supply with correct voltage and power
ratings.
6.
b.
c.
Power Connector
Mini USB
Connector
Power Switch
LEDs
1.
2.
Make sure the power switch, located in the upper-right corner of the board, is in the
OFF position.
3.
As shown below, connect the small end of the Mini USB cable to the USB
socket closest to the LEDs.
Connect Large
End to PC
Connect Small
End Here
4.
Connect the large end of the Mini USB cable to a USB socket on your PC.
As shown below, the LED next to the Mini USB connector turns green when the cable
is connected properly.
LED turns green when
the cable is connected
properly
5.
Connect the AC power cord to the power supply brick. Plug the power supply adapter
cable into the ML605 board. Plug in the power supply to AC power.
Caution! Make sure you use an appropriate power supply with correct voltage and power
ratings.
6.
Power
Switch
LEDs
Power
Connector
1.
2.
Make sure the power switch, located on the right edge of the board, is in the OFF
position.
3.
As shown below, connect the small end of the Mini USB cable to the USB
socket closest to the LEDs.
Connect Small
Connecting Xilinx USB cable and AC power cable to the SP601 board
USB JTAG
Cable
Power
Cable
4.
Connect the large end of the Mini USB cable to a USB socket on your PC.
5.
Connect the AC power cord to the power supply brick. Plug the power supply adapter
cable into the SP601/SP605 board. Plug in the power supply to AC power.
6.
b.
c.
2.
Double click on the install_digilent.exe executable and follow the wizard instructions
to install the plugin.
Power Switch
LEDs
Power Connector
1.
2.
Make sure the power switch, located in the upper-right corner of the board, is in the
OFF position.
3.
Connect the small end of the Micro USB-JTAG cable to the JTAG socket.
4.
Connect the large end of the Micro USB-JTAG cable to a USB socket on your PC.
5.
Connect the AC power cord to the power supply brick. Plug the power supply adapter
cable into the KC705 board. Plug in the power supply to AC power.
6.
Hardware Requirements
An FPGA board can support the JTAG hardware co-simulation interface, provided it
includes the following hardware components:
A Xilinx FPGA part that is available in System Generator as a supported device (i.e., a
device that can be chosen in the Part field of the System Generator token dialog box);
An on-board oscillator that supplies the FPGA with a free-running clock source;
Once the main dialog box is open, you may create a board support package by filling in the
required fields described below:
Board Name: Specifies a descriptive name for the board. This is the name that will be listed in
System Generator when selecting your JTAG hardware co-simulation board for
compilation.
System Clock: JTAG hardware co-simulation requires an on-board clock to drive the
System Generator design. The fields described below specify information about the
board's system clock:
Frequency (MHz): Specifies the frequency of the on-board system clock in MHz.
Note: You should use a clock frequency between 10 MHz and 100 MHz. Depending on the
target FPGA device and your design, the design compiled for hardware co-simulation may
not meet timing constraints at a higher clock frequency after the hardware co-simulation
logic is added.
Pin Location: Specifies the FPGA input pin to which the system clock is connected.
JTAG Options: System Generator needs to know several things about the FPGA board's
JTAG chain to be able to program the FPGA for hardware co-simulation. The topic
Obtaining Platform Information describes how and where to find the information required
for these fields. If you are unsure of the specifications of your board, please refer to the
manufacturer's documentation. The fields specific to JTAG Options are described below:
Boundary Scan Position: Specifies the position of the target FPGA on the JTAG chain.
This value should be indexed from 1. (e.g. the first device in the chain has an index of
1, the second device has an index of 2, etc.)
IR Lengths: Specifies the lengths of the instruction registers for all of the devices on
the JTAG chain. This list may be delimited by spaces, commas, or semicolons.
Detect: This action attempts to identify the IR Lengths automatically by querying the
FPGA board. The board must be powered and connected to a Parallel Cable IV for this
to function properly. Any unknown devices on the JTAG chain will be represented
with a "?" in the list, and must be specified manually.
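The IR Lengths field format described above (values delimited by spaces, commas, or semicolons, with "?" marking an unknown device) can be parsed with a short Python sketch. The function names and the example values are hypothetical and do not describe a real board.

```python
import re

def parse_ir_lengths(text):
    """Split an IR Lengths string on spaces, commas or semicolons.

    Unknown devices, written as "?", are returned as None.
    """
    tokens = [t for t in re.split(r"[\s,;]+", text.strip()) if t]
    return [None if t == "?" else int(t) for t in tokens]

def unresolved_positions(ir_lengths):
    """1-based chain positions whose IR length must be entered manually."""
    return [i for i, length in enumerate(ir_lengths, start=1) if length is None]

lengths = parse_ir_lengths("8, 16; ? 10")   # made-up example values
print(lengths)                        # [8, 16, None, 10]
print(unresolved_positions(lengths))  # [3]
```

Positions are reported starting from 1 to match the indexing used for the Boundary Scan Position field.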
Targetable Devices: This table displays a list of available FPGAs on the board for
programming. This is not a description of all of the devices on the JTAG chain, but rather a
description of the possible devices that may exist at the aforementioned boundary scan
position. For most boards, only one device needs to be specified, but some boards may
have alternates, e.g., a choice between an xcv1000 or an xcv2000 in the same socket. Use the
Add and Delete buttons described below to build the device list:
Add: Brings up a menu to select a new device for the board. As shown in the figure
below, devices are organized by family, then part name, then speed, and finally the
package type.
Non-Memory-Mapped Ports: You can add support for your own board-specific ports
when creating a board support package. Board-specific ports are useful when you have
on-board components (e.g., external memories, DACs, or ADCs) that you would like the
FPGA to interface to during hardware co-simulation. Board-specific ports are also referred
to as non-memory-mapped because when the design is compiled for hardware
co-simulation, these ports will be mapped to their physical locations, rather than creating
Simulink ports. See Specifying Non-Memory Mapped Ports for more information. The
Add, Edit, and Delete buttons provide the controls needed for configuring
non-memory-mapped ports.
Add: Brings up the dialog to enter information about the new port.
The port editor dialog presents the following controls for port configuration:
Port Options: Specifies the options that will affect the entire port.
Port Name: This is the name that will describe the port in System Generator. It should
be a MATLAB-compatible name (begins with a letter, followed by only letters,
numbers, and underscores).
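The naming rule above can be checked before a port is entered. This is a minimal sketch; is_valid_port_name is a hypothetical helper, not an SBDBuilder function.

```matlab
function ok = is_valid_port_name(name)
% IS_VALID_PORT_NAME  Hypothetical helper (not part of SBDBuilder).
% True when NAME begins with a letter and is followed only by letters,
% numbers, and underscores.
ok = ischar(name) && ...
     ~isempty(regexp(name, '^[A-Za-z][A-Za-z0-9_]*$', 'once'));
```

A name such as adc_data_0 passes this check, while 0_adc (leading digit) and dac-out (hyphen) do not.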
New Pin: This is the entry point to add pins to a port. Ports may consist of a single pin for
a Boolean value, or multiple pins for a vector or bus.
Pin LOC: Defines the absolute placement of the pin within the FPGA by specifying a
location constraint. It is necessary to define this for every pin to make sure that the
FPGA programming corresponds to the actual hardware connections.
PULLUP: A constraint that can be applied to each pin. It guarantees a logic High level
so that 3-stated nets do not float when they are not being driven.
FAST: A constraint that can be applied to each pin. It increases the speed of an IOB
output. FAST produces a faster output but may increase noise and power
consumption.
Add Pin: Add a pin to the port. Note that the pin is not part of the port until this
button is selected.
Note: Pressing 'enter' while the cursor is in the Pin LOC field is equivalent to pressing this
button.
Pin List:
Index: (Cannot edit directly) Since a port can be more than one bit, it is represented as
a vector of pins. The index indicates which bit position a particular pin represents in
the port. Zero is the least-significant bit.
Move Up/Down: Move the selected pin up or down in the pin list. This is useful to
correct the vector bit-ordering of the port.
Save and Start New: Save the port to the board support package. The form will then be
cleared so that you may enter a new port.
Save and Close: Save the port to the board support package and return to the main screen.
Cancel: Discard changes to the current port and return to the main screen.
When you are finished entering a port, it will look similar to the dialog box shown below:
At this point, you can save the board support package into a System Generator plugin zip
file or as the raw board support package files described in the topic Board Support Package
Files, plus the additional SBDBuilder files described below:
yourboard_libgen.m: Automates the process of creating the gateways for the
non-memory-mapped ports on this device. Running this script results in the creation of a
library like that shown below:
xltarget.m: Tells System Generator that your FPGA board is a compilation target.
There is a unique xltarget.m file for each compilation target. This function tells the tool
the name of the compilation target (this name is shown in the compilation target field
of the System Generator token dialog box) and also the name of the function where it
can look for information about the particular board.
yourboard.ucf: User Constraints File (UCF) for the FPGA board. Specifies clock pin
location and frequency, and optionally constrains any board-specific ports.
Included in the System Generator software tree are templates for the files listed above. If
you would like to manually support a new board, you may customize each of the four
template files with information that is specific to your board. You must also rename the
files by substituting a suitable name in place of the 'yourboard' prefix.
Each template is fully annotated with step-by-step instructions that indicate which fields
should be modified, and the types of values that should be given to these fields. The fields
that must be modified are underlined using "~~~" notation. The template files can be
found in the sysgen/hwcosim/jtag/templates directory of your System Generator
install tree.
Description
Clock period
Tells the position of the target FPGA in the board's Boundary Scan
Chain. Indexing begins at 1, with device 1 being the first device in
the chain.
Instruction register
lengths
You may obtain the clock pin location and period from any number of possible sources,
including the vendor documentation, existing constraints files, or vendor online
documentation/support.
If you do not know which devices are in your board's boundary scan chain, you may use
iMPACT to assist you in finding this information. iMPACT is a tool that is included with
the Xilinx ISE software that allows you to perform device configuration and file
generation functions. When the tool is invoked, it automatically detects the contents of
your board's boundary scan chain, and displays these contents graphically, as shown
below.
Once you have determined which devices are in the Boundary Scan Chain, you must
determine the instruction register length for each device. You may use the auto detection
capability of SBDBuilder to determine these lengths. If this utility does not work, use the
following table, which lists the instruction register lengths for various Xilinx part
families.
Family
IR Length
System_ACE-CF
Virtex-4 LX, SX
10
Virtex-4 FX 12, 20
10
14
10
10
14
10
16
Add all board-specific ports to the yourboard.ucf template file. Each constraint
should be accompanied by a special comment, <port> contingent, where <port>
is the name of the board specific port. When System Generator compiles a model for
hardware, it creates a custom UCF file. Constraints associated with signals that aren't
used in the model are removed from the custom UCF file.
Name the Gateway with the name of your board specific port (this name must
match the port name used in the post-generation function and UCF file)
You are now ready to use your board-specific gateway in System Generator. When you
include the gateway in your model, you must make sure the signals that drive (or are
driven by) the gateway have widths that match the widths of the ports in hardware. You
can force the width of a signal driving a gateway out by preceding it with a convert block.
Note: A subsystem (as shown below) is a convenient place to store the gateway out and convert
block pairs.
Plugins Directory
The System Generator software provides a special directory in which the board support
package files for new compilation targets can be added. This directory,
plugins/compilation, provides a repository for System Generator compilation target
plugins, and has unique properties that are discussed later in this topic. Your System
Generator software tree should resemble the tree hierarchy shown below.
The board support package files for your board should be saved in a subdirectory, or series
of subdirectories, under the plugins/compilation directory.
Note: All configuration files associated with a board support package must be saved in the same
directory.
System Generator searches this directory (and subdirectories) for compilation targets.
Recall that the xltarget.m file tells System Generator the board should be used as a
compilation target. When the tool searches the plugins/compilation directory, it adds
a compilation target to the System Generator token dialog box for every xltarget.m file
that it encounters.
The System Generator token dialog box Compilation submenus mirror the directory
structure under the plugins/compilation directory. When you create a new directory,
or directory hierarchy, for a board support package, the names of the directories define the
taxonomy of the compilation target submenus.
You can now select the FPGA board from the list of compilation targets in the System
Generator token dialog box.
Note: If you have a System Generator token dialog box open when you enter this command, the
new compilation target will not show up until you close and re-open the dialog box.
Chapter 4
Wizard.
Black Box Configuration M-Function
HDL Co-Simulation
Configuring the HDL Simulator
The entity name must not collide with any other entity name in the design.
Bi-directional ports are supported in HDL black boxes, however they will not be
displayed in the System Generator as ports; they only appear in the generated HDL
after netlisting.
For Verilog black boxes, the module and port names must be lower case and must
follow standard VHDL naming conventions.
Any port that is a clock or clock enable must be of type std_logic. (For Verilog black
boxes, such ports must be non-vector inputs, e.g., input clk.)
Clock and clock enable ports in black box HDL should be expressed as follows: Clock
and clock enables must appear as pairs (i.e., for every clock, there is a corresponding
clock enable, and vice-versa). Although a black box may have more than one clock
port, a single clock source is used to drive each clock port. Only the clock enable rates
differ.
Each clock name (respectively, clock enable name) must contain the substring clk
(respectively, ce), for example my_clk_1 and my_ce_1.
The name of a clock enable must be the same as that for the corresponding clock, but
with ce substituted for clk. For example, if the clock is named src_clk_1, then the
clock enable must be named src_ce_1.
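The pairing rule can be expressed as a one-line substitution. The sketch below uses a hypothetical helper, ce_name_from_clk, that is not part of System Generator. Note that strrep replaces every occurrence of clk, so the sketch assumes the substring appears only once in the clock name.

```matlab
function ce = ce_name_from_clk(clk)
% CE_NAME_FROM_CLK  Hypothetical helper (not a System Generator function).
% Derives the clock enable port name that pairs with a clock port name
% by substituting 'ce' for 'clk'.
if isempty(strfind(clk, 'clk'))
    error('Clock port name "%s" does not contain the substring clk.', clk);
end
ce = strrep(clk, 'clk', 'ce');
```

Calling ce_name_from_clk('src_clk_1') returns 'src_ce_1', matching the example above.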
After searching the model's directory for .vhd and .v files, the Configuration Wizard
opens a new window that lists the possible files that can be imported. An example
screenshot is shown below:
You can select the file you would like to import by selecting the file, and then pressing the
Open button. At this point, the configuration wizard generates a configuration M-function
and associates it with the black box block.
Note: The configuration M-function is saved in the model's directory as <module>_config.m,
where <module> is the name of the module that you are importing.
If your model has a combinational path, you must call the tagAsCombinational
method of the block's SysgenBlockDescriptor object.
The Configuration Wizard only knows about the top-level entity that is being
imported. There are typically other files that go along with this entity. These files must
be added manually in the configuration M-function by invoking the addFile method
for each additional file.
The Configuration Wizard creates a single-rate black box. This means that every port
on the black box runs at the same rate. In most cases, this is acceptable. You may want
to explicitly set port rates, which can result in a faster simulation time.
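The manual edits described above might look like the following fragment of a wizard-generated configuration M-function. This is only a sketch: my_module, helper_pkg.vhd, and my_module.vhd are placeholder names, and the port definitions that the wizard generates are omitted.

```matlab
function my_module_config(this_block)
% Sketch of hand edits to a wizard-generated configuration M-function.
% All entity and file names here are placeholders.
  this_block.setTopLevelLanguage('VHDL');
  this_block.setEntityName('my_module');

  % ... wizard-generated port definitions would appear here ...

  % The module has a combinational path from an input to an output.
  this_block.tagAsCombinational;

  % The wizard only knows about the top-level file; supporting files
  % are added by hand with addFile.
  this_block.addFile('helper_pkg.vhd');
  this_block.addFile('my_module.vhd');
```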
Port descriptions;
The name of the configuration M-function associated with a black box is specified as a
parameter in the black box parameters dialog box (parity_block_config.m in the
example shown below).
Language Selection
The black box can import VHDL and Verilog modules. SysgenBlockDescriptor provides a
method, setTopLevelLanguage, that tells the black box what type of module you are
importing. This method should be invoked once in the configuration M-function. The
following code shows how to select between the VHDL and Verilog languages.
VHDL Module:
this_block.setTopLevelLanguage('VHDL');
Verilog Module:
this_block.setTopLevelLanguage('Verilog');
Note: The Configuration Wizard automatically selects the appropriate language when it generates
a configuration M-function.
this_block.setEntityName('foo');
Note: The Configuration Wizard automatically sets the name of the top-level entity when it
generates a configuration M-function.
Bi-directional ports are supported only during the netlisting of a design and will not
appear on the System Generator diagram; they only appear in the generated HDL. As
such, it is important to only add the bi-directional ports when System Generator is
generating the HDL. The if-end conditional statement is guarding the execution of the
code to add-in the bi-directional port.
It is also possible to define both the input and output ports using a single method call. The
setSimulinkPorts method accepts two parameters. The first parameter is a cell array of
strings that define the input port names for the block. The second parameter is a cell array
of strings that define the output port names for the block.
Note: The Configuration Wizard automatically sets the port names when it generates a
configuration M-function
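A short sketch of the single-call form follows. The port names din, en, and dout are placeholders chosen for illustration.

```matlab
% Define two input ports and one output port in a single call.
% First argument: cell array of input port names.
% Second argument: cell array of output port names.
this_block.setSimulinkPorts({'din', 'en'}, {'dout'});

% The same ports defined one at a time:
% this_block.addSimulinkInport('din');
% this_block.addSimulinkInport('en');
% this_block.addSimulinkOutport('dout');
```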
accessing the port objects that are associated with it. For example, the following method
retrieves the port named din on the this_block descriptor:
Accessing a SysgenPortDescriptor object:
din = this_block.port('din');
In the above code, an object din is created and assigned to the descriptor returned by the
port function call.
SysgenBlockDescriptor also provides methods, inport and outport, that return a port
object given a port index. A port index is the index of the port (in the order shown on the
block interface) and is some value between 1 and the number of input/output ports on the
block. These methods are useful when you need to iterate through the block's ports (e.g.,
for error checking).
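Iterating over the input ports by index could be sketched as follows. The 32-bit limit is an arbitrary value chosen for illustration; the loop relies on the numSimulinkInports, inputTypesKnown, width, and name members and the setError method listed later in this topic.

```matlab
% Check every input port once the input types have been propagated.
if (this_block.inputTypesKnown)
  for k = 1:this_block.numSimulinkInports
    p = this_block.inport(k);          % port object for the k-th input
    if (p.width > 32)                  % illustrative limit only
      this_block.setError(['Input port "', p.name, ...
                           '" must be at most 32 bits wide.']);
    end
  end
end
```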
The first code segment sets the port attributes using individual method calls. The second
code segment defines the signal type by specifying the signal type as a string. Both code
segments are functionally equivalent.
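The two styles can be sketched as follows for an output port named dout. The port name and the signed 12-bit type with 8 fractional bits are assumptions chosen for illustration.

```matlab
dout = this_block.port('dout');

% Style 1: set each attribute with its own method call.
dout.setWidth(12);     % total number of bits
dout.setBinpt(8);      % binary point position
dout.makeSigned;       % signed (two's complement) type

% Style 2: give the same type as a single string.
dout.setType('Fix_12_8');
```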
The black box supports HDL modules with 1-bit ports that are declared using either
single-bit (e.g., std_logic) or vector (e.g., std_logic_vector(0 downto 0)) notation. By default,
System Generator assumes ports to be declared as vectors. You may change the default
behavior using the useHDLVector method of the port descriptor. Passing true to this method
tells System Generator to interpret the port as a vector. Passing false tells System
Generator to interpret the port as a single bit.
dout.useHDLVector(true); % std_logic_vector
dout.useHDLVector(false); % std_logic
Note: The Configuration Wizard automatically sets the port types when it generates a configuration
M-function.
bidi_port.setGatewayFileName('bidi.dat');
end
In the above example, a text file "bidi.dat" is used during simulation to provide stimulus
to the port. The data file should be a text file, where each line represents the signal driven
on the port at each simulation cycle. For example, a 3-bit bi-directional port that is
simulated for 4 cycles might have the following data file:
ZZZ
110
011
XXX
Simulation will return with an error if the specified data file cannot be found.
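Such a data file can be written by hand or generated from MATLAB. The following sketch writes the four-cycle example above to bidi.dat in the current directory, one line per simulation cycle.

```matlab
% Write a stimulus file for a 3-bit bi-directional port.
% Each cell is the value driven on the port for one simulation cycle.
stimulus = {'ZZZ', '110', '011', 'XXX'};

fid = fopen('bidi.dat', 'w');
if (fid == -1)
  error('Could not open bidi.dat for writing.');
end
for k = 1:numel(stimulus)
  fprintf(fid, '%s\n', stimulus{k});
end
fclose(fid);
```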
A rate of 3 means that a new sample is generated on the dout port every 3 Simulink system
periods. Since the Simulink system period is 2 sec, this means the Simulink sample rate of
the port is 3 x 2 = 6 sec.
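A sketch of the call that the paragraph above describes, with the arithmetic spelled out. The 2 sec system period is the value used in the text; the variable names are illustrative.

```matlab
dout = this_block.port('dout');
dout.setRate(3);    % one new sample every 3 Simulink system periods

% Resulting Simulink sample period of the port:
system_period = 2;                      % Simulink system period, in sec
sample_period = 3 * system_period;      % 3 x 2 = 6 sec
```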
Note: If your port is a non-sampled constant, you may define it as such in the configuration M-function
using the setConstant method of SysgenPortDescriptor. You can also define a constant by passing
Inf to the setRate method.
Note: A black box's configuration M-function is invoked at several different times when a model is
compiled. The configuration function may be invoked before the data types and rates have been
propagated to the black box.
Setting dynamic rates works in a similar manner. The code below sets the sample rate of
output port dout to be twice as slow as the sample rate of input port din:
if (this_block.inputRatesKnown)
dout.setRate(this_block.port('din').rate*2);
end
Apart from the substrings clk and ce, the names of the clock port and the clock enable
port must be the same (e.g., my_clk_1 and my_ce_1).
The third parameter defines the rate relationship between the clock and the clock enable
port. The rate parameter should not be thought of as a Simulink sample rate. Instead, this
parameter tells System Generator the relationship between the clock sample period, and
the desired clock enable sample period. The rate parameter is an integer value that defines
the ratio between the clock rate and the corresponding clock enable rate.
For example, assume you have a clock enable port named ce_3 that should have a
period three times larger than the system clock period. The following function call
establishes this clock enable port:
this_block.addClkCEPair('clk_3','ce_3',3);
When System Generator compiles a black box into hardware, it produces the appropriate
clock enable signals for your module, and automatically wires them up to the appropriate
clock enable ports.
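A black box with more than one clock port needs one call per pair. The sketch below assumes a module with a system-rate pair named clk and ce (placeholder names) in addition to the clk_3 and ce_3 pair from the example above.

```matlab
% System-rate pair: the clock enable period equals the clock period.
this_block.addClkCEPair('clk', 'ce', 1);

% Slower pair: the clock enable period is three times the clock period.
this_block.addClkCEPair('clk_3', 'ce_3', 3);
```

Both clock ports are driven by the same clock source; only the clock enable rates differ.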
Combinational Paths
If the module you are importing has at least one combinational path (i.e., a change on any
input can affect an output port without a clock event), you must indicate this in the
configuration M-function. The SysgenBlockDescriptor object provides a
tagAsCombinational method that indicates your module has a combinational path. It
should be invoked as follows in the configuration M-function:
this_block.tagAsCombinational;
It is also possible to set generic values based on propagated input port information
(e.g., a generic specifying the width of a dynamic output port).
Because a black box's configuration M-function is invoked at several different times when
a model is compiled, the configuration function may be invoked before the data types (or
rates) have been propagated to the black box. If you are setting generic values based on
input port types or rates, the addGeneric calls should be nested inside a conditional
statement that checks the value of the inputTypesKnown or inputRatesKnown
variables. For example, the width of the dout port can be set based on the value of din as
follows:
if (this_block.inputTypesKnown)
% set generics that depend on input port types
this_block.addGeneric('dout_width', ...
this_block.port('din').width);
end
Generic values can be configured based on mask parameters associated with a black box.
SysgenBlockDescriptor provides a member variable, blockName, which is a string
representation of the black box's name in Simulink. You may use this variable to gain
access to the black box associated with the particular configuration M-function. For example,
assume a black box defines a parameter named init_value. A generic with name
init_value can be set as follows:
simulink_block = this_block.blockName;
init_value = get_param(simulink_block,'init_value');
this_block.addGeneric('init_value', 'String', init_value);
Note: You can add your own parameters (e.g., values that specify generic values) to the black box
by doing the following:
The VHDL module below is the top-level module that is used to instantiate the previous
modules. This is the module that you need to point to when adding the Black Box into your
System Generator model.
The VHDL is imported by first importing the top-level entity, top_level, using the Black
Box.
Once the file is imported, the associated Black Box Configuration M-file needs to be
modified as follows:
The interface function addFileToLibrary is used to specify a library name other than
work and to instruct the tool to compile the associated HDL source to the specified
library.
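A sketch of what the modified configuration M-function might contain. It assumes that addFileToLibrary takes a file name followed by a library name, and that the sub-module sources are named async_counter.vhd and sync_counter.vhd; neither the argument order nor the file names are confirmed by the text above, so check them against your installation.

```matlab
% Assumed source file names. The library names are the ones that appear
% in the Project Navigator Libraries tab for this example.
this_block.addFileToLibrary('async_counter.vhd', 'async_counter_lib');
this_block.addFileToLibrary('sync_counter.vhd', 'sync_counter_lib');

% The top-level entity is added with addFile and stays in the work library.
this_block.addFile('top_level.vhd');
```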
The System Generator model should look similar to the figure below.
The next step is to double-click on the System Generator token and click on the Generate
button to generate the HDL netlist.
During the generation process, an ISE project file is created and placed in the netlist folder.
To verify that each VHDL sub-module was compiled into its own library, double click on
the ISE project file to bring up Project Navigator. Select the Libraries tab and you will see
that not only is there a work library, but a async_counter_lib library and a
sync_counter_lib library as well.
Error Checking
It is often necessary to perform error checking on the port types, rates, and mask
parameters of a black box. SysgenBlockDescriptor provides a method, setError, which
allows you to specify an error message that is reported to the user. The string parameter
passed to setError is the error message that is seen by the user.
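As a sketch of a mask parameter check, the fragment below reads the init_value parameter used in the earlier generic example and reports an error when it is not numeric. It assumes the parameter is stored as a string, as is usual for values returned by get_param from an edit field.

```matlab
% Read the mask parameter from the Simulink block behind this descriptor.
init_value = get_param(this_block.blockName, 'init_value');

% str2double returns NaN for text that is not a number.
if (isnan(str2double(init_value)))
  this_block.setError('Parameter "init_value" must be a number.');
end
```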
SysgenBlockDescriptor Member Variables
Type                     Member
String                   entityName
String                   blockName
Integer                  numSimulinkInports
Integer                  numSimulinkOutports
Boolean                  inputTypesKnown
Boolean                  inputRatesKnown
Array of Doubles         inputRates
Boolean                  error
Cell Array of Strings    errorMessages
SysgenBlockDescriptor Methods
Method
Description
setTopLevelLanguage(language)
setEntityName(name)
addSimulinkInport(pname)
addSimulinkOutport(pname)
setSimulinkPorts(in,out)
addInoutport(pname)
tagAsCombinational()
addClkCEPair(clkPname, cePname, rate)
port(name)
inport(indx)
outport(indx)
addGeneric(identifier, value)
addFile(fn)
getDeviceFamilyName()
getConfigPhaseString
setSimulatorCompilationScript(script)
setError(message)
SysgenPortDescriptor Member Variables
Type       Member
String     name
Integer    simulinkPortNumber
Boolean    typeKnown
String     type
Boolean    isBool
Boolean    isSigned
Boolean    isConstant
Integer    width
Integer    binpt
Boolean    rateKnown
Double     rate
SysgenPortDescriptor Methods
Method
Description
setName(name)
setSimulinkPortNumber(num)
setType(typeName)
setWidth(w)
setBinpt(bp)
makeBool()
makeSigned()
makeUnsigned()
setConstant()
setGatewayFileName(filename)
setRate(rate)
useHDLVector(s)
HDLTypeIsVector()
HDL Co-Simulation
Introduction
This topic describes how a mixed language/mixed flow design that includes Xilinx blocks,
HDL modules, and a Simulink block diagram can be simulated in its entirety.
System Generator simulates black boxes by automatically launching an HDL simulator,
generating additional HDL as needed (analogous to an HDL testbench), compiling HDL,
scheduling simulation events, and handling the exchange of data between Simulink
and the HDL simulator. This is called HDL co-simulation.
ISE Simulator
To use the ISE Simulator for co-simulating the HDL associated with the black box, select
ISE Simulator as the option for the Simulation mode parameter on the black box as shown
in the following figure. The model is then ready to be simulated and the HDL co-simulation takes place automatically.
ModelSim Simulator
To use the ModelSim simulator by Model Technology, Inc., you must first add the
ModelSim block that appears in the Tools library of the Xilinx Blockset to your Simulink
diagram.
For each black box that you wish to have co-simulated using the ModelSim simulator, you
need to open its block parameterization dialog and set it to use the ModelSim session
represented by the ModelSim block that was just added. You do this by making the following two
settings:
The block parameter dialog for the ModelSim block includes some parameters that you
can use to control various options for the ModelSim session. See the ModelSim block help
pages for details. The model is then ready to be simulated with these options, and the HDL
co-simulation takes place automatically.
how to write a VHDL wrapper to import CORE Generator modules as black boxes. The
flow graph below illustrates the process of importing CORE Generator modules.
Start CORE Generator and open the following CORE Generator project file:
<ISE_Design_Suite_tree>/sysgen/examples/coregen_import/example1/coregen_import_example1.cgp
2. Double click the CORDIC 4.0 icon to launch the customization GUI.
3. Parameterize and generate the CORDIC 4.0 core with component name
cordic_sincos, a Functional Selection of Sin and Cos, and the remaining options set
to the default values, as shown below:
4. Click Generate. CORE Generator produces the following files after generation:
5.
6. Drag and drop the black box from the Xilinx "Basic Elements" library into the model
coregen_import_example1.mdl. Select cordic_sincos.vhd for the top-level
HDL file and click Open.
7. Connect the input and output ports of the black box to the open wires.
8. Open the cordic_sincos_config.m file, and add the EDIF netlist to the black box
file list as shown below. This file is included as part of the System Generator
netlist for the design when it is netlisted.
9. Open the black box parameterization GUI and select ISE Simulator for the simulation
mode.
10. Press the Simulate button to compile and co-simulate the CORDIC core using the ISE
simulator. The simulation results are as shown below.
1. Start CORE Generator and open the following CORE Generator project file:
<ISE_Design_Suite_tree>/sysgen/examples/coregen_import/example2/coregen_import_example2.cgp
2. As shown below, double click the FIR Compiler icon to launch the customization GUI.
3. Customize and generate the FIR Compiler 4.0 core with the following parameters:
Filter Coefficients:
-
4. In the next two frames, leave the options set to the default values.
This example shows you how to import a core that does not have a CE (clock
enable) port. In the next frame, verify that the CE or ACLKEN port option in the
Control Signals field is not selected, then click Generate.
CORE Generator produces the following files:
fir_compiler_8tap.ngc: Implementation netlist
fir_compiler_8tap.vhd: VHDL wrapper for behavioral simulation
fir_compiler_8tap.vho: Core instantiation template
fir_compiler_8tap.xco: Parameters selected for core generation
Multiple .mif files: Memory initialization files for functional simulation
Since this core does not have a ce port and the System Generator black box requires a
clk, ce pair, you need to specify a core wrapper to add a ce port to the top level.
5.
6. Copy the port declaration for the component fir_compiler_8tap and paste it
into the fir_compiler_8tap entity declaration
(after ---- Add Port declaration for entity ----).
7. Add the ce port to the top-level entity declaration, and change the case of the CLK
port to clk.
8. Drag and drop the black box from the "Basic Elements" library into the
coregen_import_example2.mdl model. Select fir_compiler_8tap_wrapper.vhd
for the top-level HDL file.
9.
10. Open the fir_compiler_8tap_wrapper_config.m file, and add the VHDL file,
EDIF netlist and MIF files to the black box file list as shown below. These files get
included as part of the System Generator netlist for the design when it is generated.
Note: The order in which the files are added in the configuration function is the order in which they
get compiled during synthesis and simulation.
11. Open the black box parameterization GUI and select the ISE Simulator for simulation
mode.
12. Press the Simulate button to compile and co-simulate the FIR core using the ISE
simulator. The simulation results are as shown below.
2.
transpose_fir.vhd - Top-level VHDL for a transpose form FIR filter. This file
is the VHDL that is associated with the black box.
mac.vhd - Multiply and add component used to build the transpose FIR filter.
3. Open the subsystem named Transpose FIR Filter Black Box. At this point, the
subsystem contains two inports and one outport. The black box subsystem is shown
below:
4. Go to the Simulink Library Browser and add a black box block to this subsystem. The
black box is located in the Xilinx Blockset's Basic Elements library. The Black Box
Configuration Wizard is automatically invoked when a new black box is added to the
subsystem. A browser window appears that lists the VHDL source files that can be
associated with the black box. From this window, select the top-level VHDL file
transpose_fir.vhd. This is illustrated in the figure below:
Note: The wizard will only run if the black box is added to a model that has been saved to a file. If
the model has not been saved, the wizard does not know where to search for files and System
Generator will instead display a warning that looks like the following:
5. The wizard parses the VHDL to generate a configuration M-function for the black box.
This is a MATLAB script that, among other things, associates the black box with the
VHDL and creates black box ports. Once the function has run, the ports on the black
box match those in the top-level VHDL entity (not including clock and clock enable
ports). This is illustrated below:
6.
A synchronous HDL design that is associated with a black box must have one or
more clock and clock enable ports. These ports must occur in pairs, one clock for
each clock enable, and vice-versa. Each of these ports must be of type std_logic.
The name of the clock port must contain the substring clk. The name of the clock
enable port must be the same as the name of the clock port, but with ce
substituted for clk.
The clock enable port has a specific meaning to System Generator and is not a
general purpose user enable for the block. Refer to the topic Black Box HDL
Requirements and Restrictions for details.
Double click on the black box block. The dialog box shown below appears:
Block configuration M-function - This specifies the name of the configuration M-function for the black box. In this example, the field contains the name of the
function that was generated by the Configuration Wizard. By default, the black
box uses the function the wizard produces. You can, however, substitute one you
produce yourself. For more information on the configuration M-function, refer to
the topic Black Box Configuration M-Function.
Inactive - When the mode is Inactive, the black box participates in the
simulation by ignoring its inputs and producing zeros. This setting is
typically used when a separate simulation model is available for the black
box, and the model is wired in parallel with the black box using a simulation
multiplexer. Black Box Tutorial Example 1: Importing a Core Generator
Module that Satisfies Black Box HDL Requirements shows how this is
accomplished.
ISE Simulator - When the mode is ISE Simulator, simulation results for the
black box are produced using co-simulation on the HDL associated with the
black box.
FPGA Area Estimation - The numbers entered in this field are estimates of how
much of the FPGA is used by the HDL for the black box. These numbers must be
entered by hand. The numbers are only needed if you would like to use the
resource estimating utilities supplied with System Generator. For more
information, see Resource Estimation.
To continue the tutorial, leave the parameters set as they currently are.
7.
8. Run the simulation by clicking the Simulation Play button and then double click on the
scope block. Notice the black box output shown in the Output Signal scope is zero.
This is expected as the black box is configured to be inactive during simulation.
9. Go to the Simulink Library Browser and add a ModelSim block to this subsystem. The
ModelSim block is located in the Xilinx Blockset/Tools library. This block enables the
black box to communicate with a ModelSim simulator. Double click on the ModelSim
block to open the dialog box shown below:
10. Make sure the parameters match those shown in the preceding figure. Close the dialog
box.
11. In the Simulink model window, select Port Data Types from the Format menu to display the
port types for the black box. Compile the model (Ctrl-d) to ensure the port data types
are up to date. Notice that the black box port output type is UFix_26_0. This means it
is unsigned, 26 bits wide and has a binary point 0 positions to the left of the least
significant bit.
12. Open the configuration M-function transpose_fir_config.m and change the
output type from UFix_26_0 to Fix_26_12. The modified line should read:
dout_port.setType('Fix_26_12');
13. Edit the configuration M-function to associate an additional HDL file with the black
box. Locate the line:
this_block.addFile('transpose_fir.vhd');
14. Save the changes to the configuration M-function and recompile the model (Ctrl-d).
Your subsystem should appear as follows:
15. From the black box block parameter dialog box, change the Simulation mode field
from Inactive to External co-simulator. Enter ModelSim in the HDL co-simulator to
use field. The name in this field corresponds to the name of the ModelSim block that
you added to the model. The black box dialog box should appear as follows:
16. Run the simulation. A ModelSim command window and waveform viewer open.
ModelSim simulates the VHDL while Simulink controls the overall simulation. The
resulting waveform looks something like the following:
The undefined values at the start of the waveform are caused by the black box VHDL not
specifying initial values at the start of simulation.
17. Examine the scope output after the simulation has completed. When the Simulation
Mode was set to Inactive, the Output Signal scope displayed constant zero. Notice the
waveform is no longer zero. Instead, Output Signal shows the results from the
ModelSim simulation.
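The type names used in steps 11 and 12 follow a fixed pattern: an optional U for unsigned, then the total width in bits, then the binary point position. As a rough illustration, the plain MATLAB sketch below decodes such a name and reports the range and resolution it implies. It is not part of the tutorial files, the variable names are chosen only for this sketch, and it assumes a two's-complement representation for signed types.

```matlab
% Decode a System Generator fixed-point type name (illustrative sketch).
typeName = 'Fix_26_12';                      % try 'UFix_26_0' as well

isSigned = ~strncmp(typeName, 'UFix_', 5);   % a leading 'U' marks an unsigned type
firstUnderscore = find(typeName == '_', 1);
fields = sscanf(typeName(firstUnderscore+1:end), '%d_%d');
width = fields(1);                           % total number of bits
binPt = fields(2);                           % bits to the right of the binary point

resolution = 2^(-binPt);                     % weight of the least significant bit
if isSigned
    minVal = -2^(width - 1) * resolution;    % two's complement assumed
    maxVal = (2^(width - 1) - 1) * resolution;
else
    minVal = 0;
    maxVal = (2^width - 1) * resolution;
end

fprintf('%s: signed=%d, width=%d, binary point=%d\n', typeName, isSigned, width, binPt);
fprintf('range [%g, %.10g], resolution %g\n', minVal, maxVal, resolution);
```

Changing the output type from UFix_26_0 to Fix_26_12 keeps the same 26 bits but reinterprets them as a signed value with 12 fractional bits.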
black_box_ex4.mdl: A Simulink model with two black boxes, one using VHDL
and the other using Verilog.
shutter.v: The Verilog for a simple synchronous latch. The code has been
parameterized so that the input port din can have arbitrary width.
Navigate into the example4 directory and open the example model.
This is a simple design with two black boxes, one VHDL and the other Verilog. The
VHDL black box computes the parity of each input word, and the Verilog black box
latches the words that have odd parity. No Simulink model is used to compute the
behavior of the black boxes; instead, HDL co-simulation is used. The example model is
shown in the figure below.
You must have a license for mixed-mode ModelSim simulation to run this example. If
you do and you run the simulation, you will see a ModelSim waveform window that
looks like the one captured below. The behavior of both black boxes is shown. You can
browse the design structure in ModelSim to see how System Generator has combined
the two black boxes.
2. Change the input type to an arbitrary type and rerun the simulation. Both black boxes
adjust in the appropriate way to the change.
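The intended behavior of the two black boxes can be cross-checked against a small reference model. The sketch below is plain MATLAB written for that purpose, not code shipped with the example. It assumes unsigned 8-bit input words (the sample values are made up) and treats "odd parity" as an odd number of 1 bits. It does not model the clocking or latency of either HDL block.

```matlab
% Reference model: parity of each word, then keep only the odd-parity words.
words = uint8([0 1 3 7 128 255]);          % made-up sample input words
nbits = 8;                                 % assumed word width

parity = zeros(size(words));
for k = 1:numel(words)
    bits = bitget(words(k), 1:nbits);      % individual bits, LSB first
    parity(k) = mod(sum(double(bits)), 2); % 1 when the number of 1 bits is odd
end

latched = words(parity == 1);              % words the latch would capture
disp(parity);
disp(latched);
```

Comparing these values against the ModelSim waveform is a quick way to confirm that a change of input type has not altered the logical behavior.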
mac.vhd: Multiply and add component used to build the transpose FIR filter.
1.
2. Run the simulation from the top-level model, and view the results displayed in the
scopes.
3. Reduce the number of bits on the gateway Din Gateway In from 16 bits down to 12
and the binary point from 14 to 10, then run the simulation again. Note that both the
input and output widths on the black box adjust automatically. The black box
subsystem and simulation results should look like those shown below.
4. The black box is able to adjust to changes in input width because of its configuration
M-function. To make this work, the M-function must be augmented by hand. Open the
M-function file transpose_fir_parametric.m. The important points are described
below.
For details concerning the black box configuration M-function, see the topic Black Box
Configuration M-Function.
If you examine the black box VHDL file transpose_fir_parametric.vhd, you will see the generics
input_bitwidth and output_bitwidth, which specify the input and output widths. These
are passed to lower-level VHDL components.
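The mechanism can be pictured with a short stand-alone sketch. It is plain MATLAB and is not the contents of transpose_fir_parametric.m. The input width is hard-coded here, whereas a configuration M-function would read it from the block (for example with this_block.inport(1).width once the input types are known). The bit-growth rule is purely hypothetical and serves only to show an output width that tracks the input width.

```matlab
% Stand-in for the width a configuration M-function would read from the block.
input_bitwidth = 12;

% HYPOTHETICAL bit-growth rule, for illustration only; the real rule lives in
% transpose_fir_parametric.m and depends on the filter.
growth_bits = 10;
output_bitwidth = input_bitwidth + growth_bits;

% Names match the generics declared in transpose_fir_parametric.vhd.
generics = struct('input_bitwidth', input_bitwidth, ...
                  'output_bitwidth', output_bitwidth);

% Print the addGeneric calls a configuration M-function would make.
names = fieldnames(generics);
for k = 1:numel(names)
    fprintf('this_block.addGeneric(''%s'', %d);\n', names{k}, generics.(names{k}));
end
```

Because both generics are computed from the input width, changing the gateway from 16 to 12 bits propagates to the HDL without editing the VHDL.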
parity_block.vhd: VHDL for a simple state machine that tracks the running
parity of an 8-bit input word.
the connection between the black boxes and ModelSim. The example model is shown in
the figure below.
If you run the simulation, you will see a Simulink scope and ModelSim waveform window
that look like the figures below. The scope shows that the black boxes produce matching
parity results (as expected), but with one delayed from the other by one clock cycle. The
waveform window shows the same results, but viewed in ModelSim and expressed in
binary. System Generator automatically configures the waveform viewer to display the
input and output signals of each black box. You can also browse the design structure in
ModelSim to see how System Generator has elaborated the design to combine the two
black boxes.
This example also shows a way to view signals coming from a black box. In Simulink,
waveforms are typically viewed with a scope. The Simulink scope block serves this
purpose and the System Generator WaveScope block is available in versions 8.1 and later.
The waveform viewer in the ModelSim simulator may also be used to view waveforms. In
this example, a black box is configured as a specialized ModelSim waveform scope for
Xilinx fixed-point signals. When a model that uses the black box scope is simulated, the
signals that drive the black box are displayed in ModelSim.
The files for this example are contained in the directory
<ISE_Design_Suite_tree>/sysgen/examples/black_box/example5.
The files contained in this directory are:
scope_config.m: The configuration M-function for the black box waveform viewer.
Navigate into the example5 directory and open the example black_box_ex5.mdl
file. The model includes an adder that is driven by two input gateways. The gateways
are configured to produce signed 8-bit values, each with six bits to the right of the
binary point. Sine wave generators drive the gateways. The model also includes a
black box named waveform scope. This is driven by three signals. The first input is
driven by the adder. The other two are driven by the inputs to the adder. The
ModelSim block enables HDL co-simulation. The example model is shown below.
2.
signal is represented in two ways in the ModelSim viewer: binary and analog. The
ModelSim waveforms for the black_box_ex5 simulation are shown below.
3. Double click on the Simulink scope in the model. The output is shown below and
resembles the analog signal in the ModelSim waveform viewer.
The black box in this example is configured using mask parameters. There are many
situations in which this is useful. In this case, the number of black box input ports, i.e.,
the number of scope inputs, is determined by a mask parameter.
4. Double click on the waveform scope black box. Notice a Number of Input Ports field is
included in the block dialog box and is unique to this black box instance. The dialog
box is shown below:
5. Change the number of input ports from 3 to 4 and apply the changes. The black box
now has an additional input port labeled sig4 and should look like the following:
Every black box has a standard list of mask parameters. The black box in this example
has an additional mask parameter nports that stores the number of input ports
selected by the user. To change a black box mask it is necessary to disable the link to the
library. When a black box is changed in this way, it is best to save the black box in a
library. (See the Simulink documentation on libraries for details.) The tutorial library
scope_lib.mdl contains the modified signal scope black box used in this example.
When a black box configuration M-function adds an HDL file, the path to the file can
be relative to the directory in which the library is saved. This eliminates the need to
copy the HDL into the same directory as the model.
The black box's configuration M-function is invoked whenever the block parameter
dialog box is modified. This allows the M-function to check the mask parameters and
configure the black box accordingly. In this example, the M-function adjusts the
number of block input ports based on the nports parameter specified in the mask.
6. Open the file scope_config.m that defines the configuration M-function for the
example black box. Locate the line:
simulink_block = this_block.blockName;
This obtains the Simulink name of the black box and assigns it to the variable
simulink_block. The name is useful because it is the handle that MATLAB
functions need to manipulate the block.
7.
8. Once the number of input ports is determined, the M-function adds the input ports to
the black box. The code that does this is shown below.
for i=1:nports
  this_block.addSimulinkInport(sprintf('sig%d',i));
end
There are four VHDL files, named scope1.vhd, scope2.vhd, scope3.vhd, and
scope4.vhd, which the black box in this example can use. The black box associates
itself to the one that declares an appropriate number of ports.
9. The configuration M-function selects the appropriate VHDL file for the black box.
Locate the following line in scope_config.m:
entityName = sprintf('scope%d',nports);
The HDL entity name for the black box is constructed by appending the value of
nports to scope. The VHDL is associated with the black box in the following line:
this_block.addFile(['vhdl/' entityName '.vhd']);
10. The input port widths for each VHDL entity are assigned using generics. The generic
name identifies the input port to which the width is assigned. For example, the width3
generic specifies the width of the third input. In scope_config.m, the generic names
and values are set as follows:
% -----------------------------
if (this_block.inputTypesKnown)
  for i=1:nports
    width = this_block.inport(i).width;
    this_block.addGeneric(sprintf('width%d',i),width);
  end
end % if(inputTypesKnown)
% -----------------------------
11. You can change the way ModelSim displays the signal waveforms during simulation
by using custom Tcl scripts in the ModelSim block. Double click on the ModelSim block
in the black_box_ex5 model. The following dialog box appears:
Custom scripts are defined by selecting the Add Custom Scripts checkbox. In this
case, a script named waveform.do is specified in the Script to Run after vsim field.
This script contains the ModelSim commands necessary to display the adder output as
an analog waveform.
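The naming scheme used in steps 8 through 10 can be tried outside System Generator. The sketch below is plain MATLAB that only builds the strings involved; it does not call the black box API. The port count and the widths are made-up stand-ins for the nports mask parameter and the widths that this_block.inport(i).width would report.

```matlab
% Build the port, entity, file and generic names derived from nports.
nports = 4;                      % value the user would enter in the mask
widths = [8 8 9 8];              % made-up input widths, one per port

portNames = cell(1, nports);
genericNames = cell(1, nports);
for i = 1:nports
    portNames{i} = sprintf('sig%d', i);        % input port label
    genericNames{i} = sprintf('width%d', i);   % generic carrying that port's width
end

entityName = sprintf('scope%d', nports);       % entity with a matching port count
hdlFile = ['vhdl/' entityName '.vhd'];         % path relative to the library

fprintf('entity %s from %s\n', entityName, hdlFile);
for i = 1:nports
    fprintf('  %s : %s = %d\n', portNames{i}, genericNames{i}, widths(i));
end
```

Since only scope1.vhd through scope4.vhd exist, a configuration M-function written this way would presumably need to reject values of nports outside the range 1 to 4.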
This design imports an encrypted VHDL file generated from the licensed core Color
Correction Matrix v1.0. The input to the core is a 24-bit RGB signal {R, G, B} and the
output is a color-transformed 24-bit signal {Rt, Gt, Bt} such that:
The active_video_in signal is used to mark each video_data_in sample as valid. The
signals hblank_in and vblank_in are ignored in this example design. Refer to the
Color Correction Matrix v1.0 LogiCORE datasheet for more information on this core.
2. Double click on the Black Box in the example design and you will see this config file
specified:
Notice also that the ISE Simulator has been specified as the simulator to use.
3. In order to tell System Generator to netlist the encrypted VHDL file separately, you
must open the file encrypted_hdl_inport_wrapper_config.m and modify the
file by adding the following line:
this_block.addFile('encrypted_hdl_import.vhd','encrypted_hdl_import.vhd');
In the above line, the second parameter in the addFile function instructs System
Generator to netlist the encrypted file as a separate file and not to include the file in the
consolidated VHDL netlist. The following figure shows how this line has already been
added for you in this example:
4.
5. Double click on the System Generator Token and verify that the Compilation option is
set to HDL Netlist. Click Generate.
A folder named hdl is created inside the example7 folder.
6. Open the hdl folder and notice the file named encrypted_hdl_import.vhd. Open
the file to see that this is the encrypted file that was netlisted separately.
Note: The file encrypted_hdl_import.vhd is for simulation purposes only. If you want to netlist
this design for implementation, you will need to include another addFile line in the configuration file that
specifies the NGC file that is created by Core Generator. Refer to the tutorial Black Box Tutorial
Example 2: Importing a Core Generator Module that Needs a VHDL Wrapper to Satisfy
Black Box HDL Requirements for an example of how to do this.
1.
2.
3. Double click on the Subsystem block and change the COUNT_MAX to a different
count value, simulate the design, and verify the count on the WaveScope.
4. Next, take a look at the counter_config.m file and examine the following lines of
M-code that were added to the original machine-generated code by System Generator.
a.
b. Set the appropriate bit width for the count output based on the count_max value
entered by a user.
c.
The following is a screen-shot of the parameters that are declared at the beginning of the
counter.vhd file.
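The lines of counter_config.m themselves are not reproduced here. As a hedged illustration of item b, one common way to size a counter output from its maximum count is to take the number of bits needed to represent every value from 0 to count_max. The plain MATLAB sketch below does that; the variable names are chosen for this sketch and may differ from those in the tutorial file.

```matlab
% Bits needed for an unsigned counter that counts from 0 up to count_max.
count_max = 10;                                   % example mask value
count_width = max(1, nextpow2(count_max + 1));    % at least one bit
fprintf('count_max = %d needs %d bits\n', count_max, count_width);
```

The width grows by one bit each time count_max crosses a power of two, so a count_max of 255 still fits in 8 bits while 256 needs 9.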
Chapter 5
Bitstream Compilation
Hardware Co-Simulation
Compilation
Runs the selected synthesis tool to produce a lower-level netlist. The type of netlist
(e.g., EDIF for Synplify or Synplify Pro, NGC for XST) depends on which synthesis tool
is chosen for compilation.
Note: IO buffers are not inserted in the design during synthesis.
2.
Combines synthesis results, core netlists, black box netlists, and optionally the
constraints files into a single NGC file.
As shown below, you may select the NGC compilation target by left-clicking the
Compilation submenu control on the System Generator token dialog box, and selecting
the NGC Netlist target.
You may access additional compilation settings specific to NGC Netlist compilation by
clicking on the Settings... button when NGC Netlist is selected as the compilation type in
the System Generator token dialog box. Parameters specific to the NGC Netlist Settings
dialog box include:
Include Clock Wrapper: Selecting this checkbox tells System Generator to include the
clock wrapper portion of your design in the NGC netlist file. Refer
to the topic Compilation Results for more information on the clock wrapper.
Note: If you exclude the clock wrapper from multirate designs, you will need to drive the clock
enable ports with appropriate signals from your own top-level design.
Include Constraints File: Selecting this checkbox tells System Generator to include the
constraints file associated with the design in the NGC netlist file.
Note: When the constraints file is excluded, you should supply your own constraints to ensure
the multi-cycle paths in the System Generator design are appropriately constrained.
Bitstream Compilation
The Bitstream compilation type allows you to compile your design into a Xilinx
configuration bitstream file that is suitable for the FPGA part that is selected in the System
Generator dialog box. The bitstream file is named <design>_cw.bit and is placed in the
design's target directory, where <design> is derived from the portion of the design being
compiled.
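After a compilation run, the naming rule above makes it easy to check from the MATLAB command line whether the bitstream was produced. The sketch below is a minimal example; the design name and target directory are hypothetical and should be replaced with the values used in your own model.

```matlab
% Check for the bitstream that Bitstream compilation is expected to produce.
design = 'mydesign';                        % hypothetical design name
targetDir = fullfile('.', 'bitstream');     % hypothetical target directory

bitFile = fullfile(targetDir, [design '_cw.bit']);
if exist(bitFile, 'file') == 2
    fprintf('Found bitstream: %s\n', bitFile);
else
    fprintf('No bitstream at %s yet\n', bitFile);
end
```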
System Generator produces the bitstream file by performing the following steps during
compilation:
1.
2.
Runs the selected synthesis tool to produce a lower-level netlist. The type of netlist
(e.g., EDIF for Synplify Pro, NGC for XST) depends on which synthesis tool is chosen
for compilation.
3.
As shown below, you may select the Bitstream compilation by left-clicking the
Compilation submenu control on the System Generator token dialog box, and selecting
the Bitstream target.
System Generator uses XFLOW to run the tools necessary to produce the configuration
bitstream. Execution of XFLOW is broken into two flows, implementation and configuration.
The implementation flow is responsible for compiling the synthesis tool netlist output
(e.g., EDIF or NGC) into a placed and routed NCD file. In summary, the implementation
flow performs the following tasks:
1. Combines synthesis results, core netlists, black box netlists, and constraints files using
NGDBuild.
2. Runs MAP, PAR, and Trace on the design (in that particular order).
The configuration flow type runs the tools (e.g., BitGen) necessary to create an FPGA BIT
file, using the fully elaborated NCD file as input.
balanced.opt;
fast_runtime.opt;
high_effort.opt.
Note: By default, System Generator uses the balanced.opt file for the implementation flow, and
bitgen.opt file for the configuration flow.
Sometimes you may want to use options files that use settings that differ (e.g., to specify a
higher placer effort level in PAR) from the default options provided by the target. In this
case, you may create your own options files, or edit the default options files to include your
desired settings. The Bitstream settings dialog box allows you to specify options files other
than the default files.
Additional Settings
You may access additional compilation settings specific to Bitstream compilation by
clicking on the Settings... button when Bitstream is selected as the compilation type in the
System Generator token dialog box. Parameters specific to the Bitstream Settings dialog
box include:
Import Top-level Netlist: Allows you to specify your own top-level netlist into which
the System Generator portion of the design is included as a module. You may choose
to import your own top-level netlist if you have a larger design that instantiates the
System Generator clock wrapper level as a component. Refer to the Compilation
Results topic for more information on the clock wrapper level. This top-level netlist is
included in the bitstream file that is generated during compilation. Selecting this
checkbox enables the edit fields Top-level Netlist File (EDIF or NGC) and Search Path
for Additional Netlist and Constraint Files.
Top-level Netlist File (EDIF or NGC): Specifies the name and location of the top-level netlist file to include during compilation. Note that any HDL components
that are used by your top-level (including the top-level itself) must have been
previously synthesized into netlist files.
Search Path for Additional Netlist and Constraint Files: Specifies the directory
where System Generator should look for additional netlist and constraint files
that go along with the top-level netlist file. System Generator copies all netlist
(e.g., .edn, .edf, .ngc) and constraints files (e.g., .ucf, .xcf, .ncf) into the
implementation directory when this directory is specified. If you do not specify a
directory, System Generator will only copy the netlist file specified in the Top-level Netlist File field.
Specify Alternate Clock Wrapper: Allows you to substitute your own clock wrapper
logic in place of the clock wrapper HDL System Generator produces. The clock
wrapper level is the top-level HDL file that is created for a System Generator design,
and is responsible for driving the clock and clock enable signals in that design.
Sometimes you may want to supply your own clock wrapper, for example, if your
design uses multiple clock signals, or if you have board-specific hardware you
would like your design to interface to.
Note: The alternate clock wrapper file must be named <design>_cw.vhd or
<design>_cw.v, or it will not be used during bitstream generation.
XFLOW Option Files: When a design is compiled for System Generator hardware co-simulation, the command line tool, XFLOW, is used to implement and configure your
design for the selected FPGA platform. XFLOW defines various flows that determine
the sequence of programs that should be run on your design during compilation.
There are typically multiple flows that must be run in order to achieve the desired
output results, which in the case of hardware co-simulation targets, is a configuration
bitstream.
Configuration Phase (BitGen): Specifies the options file that is used by the
configuration flow type. By default, System Generator will use the configuration
options file that is specified by the compilation target.
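The naming requirement for an alternate clock wrapper, noted under Specify Alternate Clock Wrapper above, can be checked before starting a long compilation. The sketch below is plain MATLAB; the design name and wrapper path are hypothetical placeholders.

```matlab
% Verify that an alternate clock wrapper follows the <design>_cw naming rule.
design = 'mydesign';                          % hypothetical design name
wrapperFile = 'wrappers/mydesign_cw.vhd';     % hypothetical wrapper path

[~, baseName, ext] = fileparts(wrapperFile);
nameOk = strcmp(baseName, [design '_cw']) && any(strcmp(ext, {'.vhd', '.v'}));

if nameOk
    fprintf('%s is named correctly for design %s\n', wrapperFile, design);
else
    fprintf('%s would not be used during bitstream generation\n', wrapperFile);
end
```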
where
xmp_file is the pathname to the imported EDK project file
bit_file is the pathname to the Sysgen bitstream file
bmm_file is the pathname of the back-annotated BMM file produced by
Sysgen during bitstream compilation
Clicking on the Settings button brings up the EDK export settings dialog.
Pcore options allow you to do the following:
[Figure: numbered callouts for opening the Bus Interface dialog box: 1. Double Click, 2. Select, 3. Click, 4. Select, 5. Click, 6. Enter]
You follow the sequence in the previous figure to bring up the Bus Interface dialog box. In
this dialog box, you define a new Bus Interface called vid_out that is marked as a
myVideoBus Bus Standard and is Bus Type INITIATOR. (Other supported Bus Types
include: Target, Master, Slave, Master-slave, Monitor.) Next, in the Port-Bus Mapping
table, you list all the gateways that you want in the bus, then give each a Bus Interface
Name. You then Netlist the design as a pcore. Remember that you marked this pcore bus as
INITIATOR since it contains outputs.
In another model (shown below), you create corresponding input gateways. You set this
up as a TARGET bus giving the bus interface the same Bus Standard myVideoBus. XPS
will use the Bus Standard name to match different bus interfaces. XPS will then connect the
outputs to the inputs with the same Bus Interface Names.
You export this pcore to the XPS project. When these two pcores are used in the same XPS
project, XPS will detect that they have compatible buses and will allow you to connect
them if you wish.
The following table shows the subdirectory structure of the pcore that is generated by System
Generator:
pcore Subdirectory   Description
data                 The data directory contains four files: BBD, PAO, MPD and TCL.
                     The BBD (black-box definition) file tells the EDK what EDN
                     or NGC files are used in the design.
                     The PAO (peripheral analyze order) file tells the EDK the
                     analyze order of the HDL files.
                     The MPD (Microprocessor Peripheral Description) file tells
                     the EDK how the peripheral will connect to the processor.
                     The TCL file is used by LibGen when elaborating software
                     drivers for this peripheral.
doc
hdl
netlist              The netlist directory contains the EDN and NGC files listed by
                     the BBD file.
src
See Also:
EDK Processor
[Figure: dialog box with numbered callouts: 1. Select, 2. Select Part, 3. Click, 5. Click]
After filling out the dialog box, click the Generate button and System Generator will
perform the following steps:
1. The design is compiled using Simulink, then netlisted by Sysgen into HDL source.
2. If you selected the Power Analysis option Full simulation-based analysis, the ISim
simulator is called to simulate the HDL design. The HDL Synthesis Tool is then called
to turn the HDL into an EDIF (Synplify/Synplify Pro) or NGC (XST) netlist.
3. NGDBuild is then called to turn the netlist into an NGD file. The ISE Mapper
software is then called to map elements of logic together into slices; this creates an
NCD file.
4. The ISE Place & Route software is then called to place the slices and other elements on
the Xilinx die and to route the connections between the slices. This creates another
NCD file.
5. The ISE Trace software is then called to analyze the second NCD file and find the paths
with the worst slack. This creates a trace report. The System Generator Timing
Analyzer tool appears, displaying the data from the trace report.
Note: If timing data is generated using this method and you wish to view it again at a later time, then
you can enter the following command at the MATLAB command line:
>>xlTimingAnalysis('timing')
where 'timing' is the name of the target directory in which a prior analysis was carried out.
6. As shown below, you can click the Power Analysis button on the Timing Analyzer
window to bring up the Xilinx XPower Analysis tool report.
The path shown is from the Q output of the register on the left (register3) to the D input of
the register on the right (parity_reg). The path goes through two LUTs (lookup tables) that
are configured as 4-input XOR gates. This path has two levels of logic. That means that it
goes through two separate combinational elements (the two LUTs).
The requested period for this path is 10ns. This path easily meets timing. The second of the
two red comma-separated numbers above each logic element shows the slack for the path.
The slack is the amount of time by which the path 'meets timing'. In this case the slack is
7.79ns. That means that the path could be 7.79ns slower and still meet the 10ns period
requirement. A negative slack value indicates that the path does not meet timing and has
a setup (or hold) time violation.
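The relationship between period, slack and path delay in this example can be worked through numerically. The sketch below is plain MATLAB using the two figures quoted above. It treats slack simply as the requested period minus the time the path uses, which ignores clock skew and clock uncertainty, so the frequency it prints is only a rough indication.

```matlab
% Slack arithmetic for the example path (simplified: no skew or uncertainty).
period_ns = 10.0;      % requested clock period from the example
slack_ns = 7.79;       % slack reported for the path

path_delay_ns = period_ns - slack_ns;     % time the path actually uses
meets_timing = slack_ns >= 0;             % negative slack means a violation
fmax_mhz = 1000 / path_delay_ns;          % limit if this were the slowest path

fprintf('path delay %.2f ns, meets timing: %d, limit about %.0f MHz\n', ...
        path_delay_ns, meets_timing, fmax_mhz);
```

The design's real maximum frequency is set by its slowest path, so this calculation is meaningful for the whole design only when it is applied to the path with the least slack.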
The top section of the display shows a list of slow paths, while the bottom section of the
display shows details of the path that is selected. The elements of this display are explained
here:
Timing Constraint: You may opt to view the paths from all timing constraints or just a
single constraint. A typical System Generator design has but a single timing
constraint which defines the period of the system clock. This is the constraint shown
in this example. TS_clk_a5c9593d is the name of the constraint; the (sometimes
confusing) suffix is a hash meant to make the identifier unique when multiple System
Generator designs are used as components inside a larger design. The timing group
clk_a5c9593d is a group of synchronous logic, again with a hash suffix. The group in
this case contains all the synchronous elements in the design. The period of the clock
here is 10ns with a 50% duty cycle.
Destination: This is the System Generator block that is the terminus of the path.
Slack: The slack for this particular path. See the topic entitled Period and Slack for
more details.
Delay (Path): The delay of the entire path, including the setup time requirement.
% Route Delay: This is the percentage of the path that is consumed by routing (net)
delay. The remainder portion of the path is consumed by logic delay.
Levels of Logic: The number of levels of combinatorial logic in the path. The
combinatorial logic typically comprises LUTs, F5 muxes, and carry chain muxes.
Path Element: This shows the logic and net elements in the highlighted path.
Delay (Element): This shows the delay through the logic and net elements in the
highlighted path.
Type of Delay: This is the kind of delay incurred by the given path element. These
values are defined in the Xilinx part's data sheet. In the example shown above, Tcko is
the clk-to-out time of a flip-flop; net is a net delay; Tilo is the delay through a LUT, and
Tas is the setup time of a flip-flop.
You may click on the column headings to reorder the paths or elements according to delay,
slack, path name, or other column headings. Failing paths are highlighted in red/pink.
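The Delay (Path), % Route Delay and Levels of Logic columns are all derived from the per-element delays. The sketch below shows the arithmetic in plain MATLAB for a register-to-register path through two LUTs. The delay values are made up for illustration and are not taken from any data sheet or from the example design.

```matlab
% Derive path-level figures from per-element delays (values are made up).
elemType  = {'Tcko', 'net', 'Tilo', 'net', 'Tilo', 'net', 'Tas'};
elemDelay = [0.35 0.50 0.20 0.45 0.20 0.31 0.20];    % ns

isNet = strcmp(elemType, 'net');
pathDelay = sum(elemDelay);                           % includes the setup time
routePct = 100 * sum(elemDelay(isNet)) / pathDelay;   % share spent in routing
levelsOfLogic = sum(strcmp(elemType, 'Tilo'));        % one level per LUT here

fprintf('path delay %.2f ns, %.1f%% routing, %d levels of logic\n', ...
        pathDelay, routePct, levelsOfLogic);
```

A high routing percentage on a failing path suggests that placement, rather than logic depth, is the main problem.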
Cross-Probing
Highlighting a path in the Slow Paths view will highlight the blocks in the path in the
System Generator diagram. The path's source and destination blocks, as well as
combinational blocks through which the path passes, will be highlighted in red. The
diagram below shows how the model appears when the path that has Registerc as its
source and parity_reg as its destination is highlighted. The blocks xor_1b, xor_2a, and
xor_3a are also highlighted because they are part of the path.
Histogram Charts
Clicking on the Charts icon displays a histogram of the slow paths. This histogram is a
useful metric in analyzing the design. You may know that the design will only run at, for
example, 99MHz in your part when you wish it to run at 100MHz. But how close is the
design to meeting timing and how much work is involved in meeting this requirement?
The histogram will quickly give you an estimate of the work involved. For example, look
at the histogram of the results of a simple design below:
This shows that most of the slow paths are concentrated about 1.5ns. The slowest path is
about 2.35ns. The numbers at the tops of the bins show the number of paths in each bin.
There is only one path in the bin which encompasses the time range 2.31ns-2.39ns. The bins
to the right of it are empty. This shows that the slowest path is an outlier and that if your
timing requirement were for a period of, for example, 2ns, you would need only to speed
up this single path to meet your timing requirements.
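The same kind of reading can be reproduced from a list of path delays. The sketch below is plain MATLAB with made-up delays arranged to resemble the situation described above: most paths near 1.5ns and a single outlier at 2.35ns. It uses the hist function, which is available in MATLAB and Octave, to bin the delays and then counts the paths that exceed a 2ns requirement.

```matlab
% Bin made-up path delays and count those that miss a 2 ns requirement.
pathDelay = [1.20 1.30 1.40 1.45 1.50 1.50 1.55 1.60 1.70 2.35];   % ns
required = 2.0;                                                     % ns

[counts, centers] = hist(pathDelay, 8);     % 8 equal-width bins
for k = 1:numel(counts)
    fprintf('%.2f ns: %d path(s)\n', centers(k), counts(k));
end

failing = pathDelay(pathDelay > required);
fprintf('%d path(s) exceed %.1f ns; slowest is %.2f ns\n', ...
        numel(failing), required, max(pathDelay));
```

When the count of failing paths is small and the bins next to them are empty, the effort is likely confined to a few specific paths rather than the whole design.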
Histogram Detail
The slider bar allows you to adjust the width of the bins in the histogram. This allows you
to get more detail about the paths if desired. The display below shows the results of a
different design with a larger number of bins than the diagram above:
This diagram shows the paths grouped into three regions, with each forming a rough bell
curve distribution. These groups are probably from different portions of the circuit or from
different timing constraints that are from different clock regions. If you wish to analyze the
paths from a single timing constraint, you may select a single constraint for viewing from
the Timing constraint pulldown menu at the top of the display.
Note the bins and portions thereof shown in red. These are the paths that have negative
slack; i.e., they do not meet the timing constraint. In this example you can see that some
paths have failed but not by a large margin so it seems reasonable that with some work this
design could be reworked to meet timing.
Statistics
Clicking on the Statistics icon displays several design statistics, including the number of
constraints, paths analyzed, and maximum frequency of the design.
Trace Report
Clicking on the Trace icon shows the raw text report from the Trace program. This file gives
considerable detail about the paths analyzed. Each path analyzed contains information
about every net and logic delay, clock skew, and clock uncertainty. The box at the bottom
left of this display shows the path name of the timing report.
Change the source design. Just about any timing problem can be solved by changing
the source design and this is the easiest way to speed up the circuit. Unfortunately, this
is often the last step taken by designers, who often look for a quick solution such as
using a faster part. The source design may be changed in several ways:
a. Pipelining. This is the surest way to improve speed, but may also be tricky.
Adding pipelining registers increases latency. For designs with feedback, this may
require great care since portions of the design may require pipeline rebalancing.
See the later example for more details on pipelining.
b.
c. Retiming. This involves taking existing registers and moving them to different
points within the combinational logic to rob from Peter to pay Paul, so to speak.
This works if, to stretch the maxim, Paul is bereft of slack, while Peter has a surfeit.
Some synthesis tools can perform a degree of retiming automatically.
Shannon Expansion. This method involves replicating the faster logic in a critical
path in order to remove dependencies on slower logic. This is sometimes done
automatically by the synthesizer.
f. Using Hard Cores. Are you using a ROM that is implemented in distributed RAM
when it would operate much faster in a block memory hard core? Do you have a
wide adder that would benefit from being put in a DSP48 block, which can operate
at 500MHz? Take advantage of the embedded hard cores.
g. New Paradigms. Do you need to create a large delay? Instead of using a counter
with a long carry chain, why not build a delay out of cascaded Johnson rings using
SRL16s? Or how about using an LFSR? Neither requires a carry chain and can
operate much faster. Sometimes you have to rethink certain design elements
completely.
2. Eliminate overconstraints. Ensure that elements of your design that only need to be
operated at a subsampled rate are designed that way by using the downsample and
upsample blocks in System Generator. If these blocks are not used, then the timing
analyzer is not aware that these sections of the circuit are subsampled, and the design
is overconstrained.
3. Change the constraints. Is it possible to run the design at a lower clock speed? If so,
this is an easy way to meet your requirements. Unfortunately, this is rarely possible
due to design requirements.
4. Increase PAR effort levels. The mapper and place & route tools (PAR) in ISE take
effort levels as arguments. When using ISE (from the Project Navigator GUI), try the
timing option in MAP. You may also increase the PAR effort levels, which will increase
the PAR execution time but may also result in a faster design.
5.
6. Floorplanning. This step should be avoided if possible, but can yield huge
improvements. The automatic placer in PAR can be improved upon by human
intervention. Floorplanning places critical elements close to each other on the Xilinx
die, reducing net delays. The PACE tool in ISE may be used for CPLDs. A more
advanced tool, PlanAhead software, is used for FPGAs.
7. Use a faster part. This is often the first solution seized upon, but is also expensive. If
you are using an old Xilinx part, porting your design to a newer, faster Xilinx part may
often save money because the new parts may be cheaper on account of Moore's Law.
However, moving to a faster part in the same family incurs significant extra costs, and
often isn't necessary if the previous steps are followed.
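The LFSR suggestion under New Paradigms above can be illustrated with a small software model. This is only a sketch and not part of System Generator: it assumes a right-shifting Fibonacci LFSR, the helper names are hypothetical, and the tap positions are commonly published maximal-length choices for 4 and 16 bits. In hardware, the delay would expire when the register matches a precomputed terminal state, so the circuit needs only XOR feedback and a comparator rather than a carry chain.

```python
def lfsr_step(state, width, tap_shifts):
    """Advance a right-shifting Fibonacci LFSR by one clock."""
    feedback = 0
    for shift in tap_shifts:
        feedback ^= (state >> shift) & 1
    return (state >> 1) | (feedback << (width - 1))


def lfsr_period(width, tap_shifts, seed=1):
    """Count clocks until the register returns to its nonzero seed."""
    state = lfsr_step(seed, width, tap_shifts)
    clocks = 1
    while state != seed:
        state = lfsr_step(state, width, tap_shifts)
        clocks += 1
    return clocks


def terminal_state(width, tap_shifts, delay, seed=1):
    """State to compare against so a 'done' flag fires after `delay` clocks."""
    state = seed
    for _ in range(delay):
        state = lfsr_step(state, width, tap_shifts)
    return state


TAPS_4 = (0, 1)          # assumed taps for a 4-bit maximal-length sequence
TAPS_16 = (0, 2, 3, 5)   # assumed taps for a 16-bit maximal-length sequence

print(lfsr_period(4, TAPS_4))
print(lfsr_period(16, TAPS_16))
print(hex(terminal_state(16, TAPS_16, delay=1000)))
```

A maximal-length LFSR of width n cycles through 2^n - 1 states, so the longest delay available from a single register is one clock shorter than that of a binary counter of the same width.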
The way in which System Generator compiles a model into hardware depends on the
compilation target that is chosen for the design. The HDL Netlist compilation target is
most common, and generates an HDL netlist of your design plus any cores that go along
with it. New compilation targets can be created that extend the HDL Netlist target so that
additional tools can be applied to the resulting HDL netlist files.
This topic explains how you can create new compilation targets that extend the HDL
Netlist target in order to produce and configure FPGA hardware. More specifically, it
describes how to configure System Generator to produce a bitstream for a model, and how
to invoke various tools once the bitstream is created.
Although an xltarget function can specify multiple targets, it is not uncommon for each
compilation target to have its own xltarget function. The directories these functions are
saved in distinguish the targets. This means that each xltarget.m file must be saved in
its own subdirectory under the plugins/compilation directory.
An xltarget function returns a cell array of target information. Different elements in this
cell array define different compilation targets. The elements in this cell array are MATLAB
structs that define two parameters:
1. The name of the compilation target as it should appear in the Compilation field of the
System Generator parameters dialog box;
2. The name of the MATLAB function it should invoke to find out more information (e.g.,
System Generator dialog box parameters, which post-generation function to use, if
any) about the target.
The following code shows how to define three compilation targets named Standalone
Bitstream, iMPACT, and ChipScope Pro Analyzer:
function s = xltarget
s = {};
target_1.('name') = 'Standalone Bitstream';
target_1.('target_info') = 'xltools_target';
target_2.('name') = 'iMPACT';
target_2.('target_info') = 'xltools_target';
target_3.('name') = 'ChipScope Pro Analyzer';
target_3.('target_info') = 'xltools_target';
s = {target_1, target_2, target_3};
The name field in the code shown above specifies the name of the compilation target, as it
should appear in the Compilation field of the System Generator dialog box:
target_1.('name') = 'Standalone Bitstream';
The target_info field tells System Generator the target info function it should call to find
out more information about the target. This function can have any name provided it is
saved in the same directory as the corresponding xltarget.m file, or it is saved somewhere
in the MATLAB path.
target_1.('target_info') = 'xltools_target';
It defines the available and default settings for the target in the System Generator
token dialog box;
It specifies the functions System Generator should call before and after the standard
code generation process.
Post-generation Functions
One way to extend System Generator compilation is by defining a new variety of
compilation that specifies a post-generation function. A post-generation function is a
MATLAB function that tells System Generator how to process the HDL and netlist files
once they are generated. This function is run after System Generator finishes the normal
code generation steps involved with HDL Netlist compilation (i.e., producing an HDL
description of the design, running CORE Generator, etc.). For example, a hardware co-simulation
target defines a post-generation function that in turn runs the tools necessary to
produce hardware that can be used in the Simulink simulation loop.
Note: Two post-generation functions, xlBitstreamPostGeneration.m and
xltools_postgeneration.m, are included in the examples/comp_targets directory of your
System Generator install tree.
xlBitstreamPostGeneration.m
This example post-generation function compiles your model into a configuration bitstream
that is appropriate for the settings (e.g., FPGA part, clock frequency, clock pin location)
given in the System Generator dialog box of your design.
It then uses an XFLOW-based flow to invoke the Xilinx tools necessary to produce an
FPGA configuration bitstream.
It is possible to configure the tools and configurations for each tool invoked by XFLOW.
For more information on how to do this, refer to the topic in this example entitled Using
XFLOW.
xltools_postgeneration.m
Sometimes you may want to run tools that configure and run the FPGA after a
configuration bitstream has been generated (e.g., iMPACT, ChipScope Pro Analyzer).
The xltools_postgeneration function first calls the xlBitstreamPostGeneration function to
generate the bitstream. It then invokes the appropriate tool (or tools) depending on the
compilation target that is selected.
For example, you may want a compilation target that invokes iMPACT after the bitstream
is generated. This can be done as follows (assuming iMPACT is in your system path):
if (strcmp(params.compilation, 'iMPACT'))
dos('impact');
end;
The first line checks the name of the compilation target. The second line sets up a DOS
command that invokes iMPACT. ChipScope Pro Analyzer can be invoked similarly to the
code above:
if (strcmp(params.compilation, 'ChipScope Pro Analyzer'))
xlCallChipScopeAnalyzer;
end;
2.
3. Add the desired compilation targets (e.g., iMPACT, ChipScope Pro Analyzer) to the
xltarget.m file.
4.
5.
Note: The System Generator Compilation submenus mirror the directory structure under the
plugins/compilation directory. When you create a new directory, or directory hierarchy, for the
compilation target files, the names of the directories define the taxonomy of the compilation target
submenus.
6.
7.
8. You can now access the newly installed compilation target from the System Generator
graphical interface.
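The Note above says that the Compilation submenus mirror the directory structure under plugins/compilation. The sketch below models that relationship only; it is not how System Generator actually discovers targets, and the directory names (Bitstream, Tools, iMPACT) and helper names are hypothetical. It walks a directory tree and reports one submenu path for each directory that contains an xltarget.m file.

```python
import os
import tempfile


def submenu_paths(compilation_root):
    """Return one tuple of submenu names per directory holding an xltarget.m.

    Directory names below `compilation_root` become the submenu hierarchy.
    An xltarget.m placed directly in the root is ignored in this model.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(compilation_root):
        if "xltarget.m" in filenames and dirpath != compilation_root:
            relative = os.path.relpath(dirpath, compilation_root)
            found.append(tuple(relative.split(os.sep)))
    return sorted(found)


def make_demo_tree(root):
    """Create a small, hypothetical plugins/compilation layout for the demo."""
    for parts in (("Bitstream",), ("Tools", "iMPACT")):
        directory = os.path.join(root, *parts)
        os.makedirs(directory)
        # An empty placeholder stands in for the real xltarget.m function.
        open(os.path.join(directory, "xltarget.m"), "w").close()


with tempfile.TemporaryDirectory() as demo_root:
    make_demo_tree(demo_root)
    demo_menus = submenu_paths(demo_root)
    for menu in demo_menus:
        print(" > ".join(menu))
```

In this model, nesting a target directory one level deeper adds one level to its submenu path, which matches the taxonomy described in the Note.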
Using XFLOW
The post-generation scripting included with this example uses XFLOW to produce a
configuration file for your FPGA. XFLOW allows you to automate the process of design
synthesis, implementation, and simulation using a command line interface. XFLOW uses
command files to tell it which tools to run, and how they should be run.
This example contains two XFLOW options files, balanced_xltools.opt and
bitgen_xltools.opt. These files are associated with the implementation and
configuration flows of XFLOW, respectively. The balanced_xltools.opt options file
runs the Xilinx NGDBUILD, MAP, and PAR tools. The settings for each tool are specified in
the options file. The bitgen_xltools.opt file runs BITGEN to produce a
configuration file for your FPGA. You may modify these files as desired (e.g., to run the
timing analyzer after PAR).
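To make the roles of the two options files concrete, the sketch below assembles an XFLOW command line that names one options file for the implementation flow and one for the configuration flow. It only builds the argument list and does not run anything. The part name and netlist file name are placeholders, the helper name is hypothetical, and the exact command issued by the example post-generation scripts may differ.

```python
def xflow_command(part, netlist,
                  implement_opt="balanced_xltools.opt",
                  config_opt="bitgen_xltools.opt"):
    """Build an XFLOW argument list for the implement and config flows.

    -p selects the target part, -implement names the options file for
    NGDBUILD/MAP/PAR, and -config names the options file for BITGEN.
    """
    return [
        "xflow",
        "-p", part,
        "-implement", implement_opt,
        "-config", config_opt,
        netlist,
    ]


# Placeholder part and netlist names, for illustration only.
command = xflow_command("xc6vlx240t-1ff1156", "my_design_cw.ngc")
print(" ".join(command))
```

Keeping the tool settings in the options files rather than on the command line means the same command can be reused while the flow itself is tuned by editing those files.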