HB DSPB Adv
(Advanced Blockset)
Handbook
Contents
12. IP Library
12.1. Channel Filter and Waveform Library
12.1.1. DSP Builder FIR and CIC Filters
12.1.2. DSP Builder FIR Filters
12.1.3. Channel Viewer (ChanView)
12.1.4. Complex Mixer (ComplexMixer)
12.1.5. Decimating CIC
12.1.6. Decimating FIR
12.1.7. Fractional Rate FIR
12.1.8. Interpolating CIC
12.1.9. Interpolating FIR
12.1.10. NCO
12.1.11. Real Mixer (Mixer)
12.1.12. Scale
12.1.13. Single-Rate FIR
12.2. Dependent Delay Library
12.3. FFT IP Library
12.3.1. Bit Reverse Core C (BitReverseCoreC and VariableBitReverse)
12.3.2. FFT (FFT, FFT_Light, VFFT, VFFT_Light)
13. Interfaces Library
13.1. Memory-Mapped Library
13.1.1. Bus Slave (BusSlave)
13.1.2. Bus Stimulus (BusStimulus)
13.1.3. Bus Stimulus File Reader (BusStimulusFileReader)
13.1.4. External Memory, Memory Read, Memory Write
13.1.5. Register Bit (RegBit)
13.1.6. Register Field (RegField)
13.1.7. Register Out (RegOut)
13.1.8. Shared Memory (SharedMem)
13.2. Streaming Library
13.2.1. Avalon-ST Input (AStInput)
13.2.2. Avalon-ST Input FIFO Buffer (AStInputFIFO)
13.2.3. Avalon-ST Output (AStOutput)
14. Primitives Library
14.1. Vector and Complex Type Support
14.1.1. Vector Type Support
14.1.2. Complex Support
14.2. FFT Design Elements Library
14.2.1. About Pruning and Twiddle for FFT Blocks
14.2.2. Bit Vector Combine (BitVectorCombine)
14.2.3. Butterfly Unit (BFU)
14.2.4. Butterfly I C (BFIC) (Deprecated)
14.2.5. Butterfly II C (BFIIC) (Deprecated)
14.2.6. Choose Bits (ChooseBits)
14.2.7. Crossover Switch (XSwitch)
14.2.8. Dual Twiddle Memory (DualTwiddleMemoryC)
14.2.9. Edge Detect (EdgeDetect)
14.2.10. Floating-Point Twiddle Generator (TwiddleGenF) (Deprecated)
14.2.11. Fully-Parallel FFTs (FFT2P, FFT4P, FFT8P, FFT16P, FFT32P, and FFT64P)
HB_DSPB_ADV | 2019.04.01
You can create designs without needing detailed device knowledge and generate
designs that run on a variety of FPGA families with different hardware architectures.
DSP Builder allows you to manually describe algorithmic functions and apply rule-based
methods to generate hardware-optimized code. The advanced blockset is
particularly suited for streaming algorithms characterized by continuous data streams
and occasional control. For example, use DSP Builder to create RF card designs that
comprise long filter chains.
After specifying the desired clock frequency, target device family, number of channels,
and other top-level design constraints, DSP Builder pipelines the generated RTL to
achieve timing closure. By analyzing the system-level constraints, DSP Builder can
optimize folding to balance latency versus resources, with no need for manual RTL
editing.
DSP Builder advanced blockset includes its own timing-driven IP blocks that can
generate high performance FIR, CIC, and NCO models.
Intel Corporation. All rights reserved. Intel, the Intel logo, Altera, Arria, Cyclone, Enpirion, MAX, Nios, Quartus
and Stratix words and logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other
countries. Intel warrants performance of its FPGA and semiconductor products to current specifications in
accordance with Intel's standard warranty, but reserves the right to make changes to any products and services
at any time without notice. Intel assumes no responsibility or liability arising out of the application or use of any
information, product, or service described herein except as expressly agreed to in writing by Intel. Intel
customers are advised to obtain the latest version of device specifications before relying on any published
information and before placing orders for products or services.
*Other names and brands may be claimed as the property of others.
ISO 9001:2015 Registered
1. About DSP Builder for FPGAs
Note: A DSP Builder design can have only one synthesizable top-level design, which can
contain many subsystems (primitive and IP blocks) to help organize your design. Any
primitive blocks must be within a primitive subsystem hierarchy and any IP blocks
must be outside primitive subsystem hierarchies.
Top-Level Design
The top-level design must have a Control block to specify the RTL output directory and
top-level threshold parameters.
Note: Every DSP Builder design must have a Control block to allow you to simulate or compile
your design. Do not place the Device block in the top-level design. DSP Builder
propagates data types from the testbench to the synthesizable top-level design.
Primitive subsystems are scheduled domains for Primitive and IP library blocks. A
primitive subsystem must have:
• A SynthesisInfo block, with synthesis style set to Scheduled, so that DSP
Builder can pipeline and redistribute memories optimally to achieve the desired
clock frequency.
• Boundary blocks that delimit the primitive subsystem:
— ChannelIn (channelized input),
— ChannelOut (channelized output),
— GPIn (general purpose input)
— GPOut (general purpose output).
DSP Builder synchronizes connections that pass through the same boundary block.
Use system interface blocks to delimit the boundaries of scheduled domains within a
subsystem. Within these boundary blocks DSP Builder optimizes the implementation
you specify by the schematic. DSP Builder inserts pipelining registers to achieve the
specified system clock rate. When DSP Builder inserts pipelining registers, it adds
equivalent latency to parallel signals that need to be kept synchronous so that DSP
Builder schedules them together. DSP Builder schedules signals that go through the
same input boundary block (ChannelIn or GPIn) to start at the same point in time;
signals that go through the same output boundary block (ChannelOut or GPOut) to
finish at the same point in time. DSP Builder adds any pipelining latency that you add
to achieve fMAX in balanced cuts through the signals across the design. DSP Builder
applies the correction to the simulation at the boundary blocks to account for this
latency in HDL generation. The primitive subsystem as a whole remains cycle
accurate. You can specify further levels of hierarchy within primitive subsystems
containing primitive blocks, but no further primitive boundary blocks or IP blocks.
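The balancing idea described above can be illustrated with a small Python sketch. It is not DSP Builder code and does not represent the tool's actual scheduling algorithm; the function name `balance_latencies` and the path names are hypothetical. The sketch assumes that each parallel signal path between the same input and output boundary blocks has a known pipeline latency in clock cycles, and it computes the extra delay each path would need so that all signals finish at the same point in time.

```python
def balance_latencies(path_latencies):
    """Return the extra delay (in cycles) each path needs to match the slowest.

    path_latencies: dict mapping a (hypothetical) path name to its pipeline
    latency in clock cycles.
    """
    if not path_latencies:
        return {}
    target = max(path_latencies.values())
    return {name: target - latency for name, latency in path_latencies.items()}


# Example: the multiplier path gains 3 pipeline stages, the bypass path none.
paths = {"multiply": 3, "bypass": 0, "add": 1}
padding = balance_latencies(paths)
print(padding)  # {'multiply': 0, 'bypass': 3, 'add': 2}
```

After padding, every path has the same total latency, which is the property that keeps parallel signals synchronous at the output boundary block.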
Related Information
• Synthesis Information (SynthesisInfo) on page 362
• Primitives Library on page 289
Configuration blocks | Blocks that configure how DSP Builder synthesizes the design or subsystem.
Low-level building blocks (primitives) | Basic operator, logic, and memory primitive blocks for scheduled subsystems delimited by boundary configuration blocks (primitive subsystems).
Common design elements | Common functions for parameterizable subsystems of primitives and within scheduled subsystems delimited by boundary configuration blocks.
IP function-level functions (IP) | Stand-alone IP-level blocks comprising functions such as entire FFTs, FIRs and NCOs. Use these blocks only outside of primitive subsystems.
System interface blocks | Blocks that expose Avalon-ST and Avalon-MM interfaces for interaction with other IP (such as external memories) in Platform Designer.
Non-synthesizable blocks | Blocks that play no part in the synthesized design. For example, blocks that provide testbench stimulus, blocks that provide information, or enable design analysis.
Library | Description
Design Configuration | Blocks that set the design parameters, such as device family, target fMAX and bus interface signal width.
Primitives ➤ Primitive Configuration | Blocks that change how DSP Builder synthesizes primitive subsystems, including boundary delimiters.
Primitives ➤ Primitive Design Elements | Configurable blocks and common design patterns built from primitive blocks.
Primitives ➤ FFT Design Elements | Configurable FFT component blocks built from primitive blocks. Use in primitive subsystems to build custom FFTs.
IP ➤ FFT IP | Full FFT IP functions. These blocks are complete primitive subsystems. Click Look under the Mask to see how DSP Builder builds these blocks from the primitive FFT design elements.
IP ➤ Channel Filter And Waveform | Functions to construct digital up- and down-conversion chains: FIR, CIC, NCO, mixers, complex mixers, channel view, and scale IP.
Interfaces | Blocks that set and use Avalon interfaces. DSP Builder treats design-level ports that do not route via Avalon interface blocks as individual conduits.
Interfaces ➤ Memory Mapped | Blocks that set and use Avalon-MM interfaces, including memory-mapped blocks, memory-mapped stimulus blocks, and external memory blocks.
Utilities ➤ Analyze And Test | Blocks that help with design testing and debugging.
Related Information
• Design Configuration Library on page 234
• IP Library on page 242
• Interfaces Library on page 275
• Primitives Library on page 289
• Utilities Library on page 370
• Scheduled Synthesis on page 363
• Avalon Interface Specification
Avalon interfaces simplify system design by allowing you to easily connect
components in an Intel® FPGA
Related Information
• IP Tutorial
• Primitives Tutorial
Related Information
• The DSP Builder Windows Shortcut Menu
Create the shortcut to set the file paths to DSP Builder and run a batch file with
an argument for the MATLAB executable to use.
• Browsing DSP Builder Libraries and Adding Blocks to a New Model
• Browsing and Opening DSP Builder Design Examples
2. DSP Builder for Intel FPGAs Advanced Blockset Getting Started
Related Information
• Starting DSP Builder in MATLAB
• Browsing and Opening DSP Builder Design Examples
• DSP Builder Advanced Blockset Libraries
• Creating a DSP Builder Design in Simulink
Intel recommends you create new designs with the DSP Builder New Model
Wizard or copy and rename a design example.
Related Information
• Starting DSP Builder in MATLAB on page 17
• Starting DSP Builder in MATLAB
• DSP Builder Advanced Blockset Libraries
• Browsing DSP Builder Libraries and Adding Blocks to a New Model
• Creating a DSP Builder Design in Simulink
Intel recommends you create new designs with the DSP Builder New Model
Wizard or copy and rename a design example.
2.4. Creating a New DSP Builder Design with the DSP Builder New Model Wizard
Intel recommends you create new designs with the DSP Builder New Model Wizard.
Alternatively, you can copy and rename a design example.
Related Information
• Starting DSP Builder in MATLAB on page 17
• Starting DSP Builder in MATLAB
• DSP Builder Advanced Blockset Libraries
• Simulating, Generating, and Compiling Your Design
• DSP Builder Menu Options
Simulink includes a DSP Builder menu on any Simulink model window. Use
this menu to easily start all the common tasks you need to perform on your
DSP Builder model.
• DSP Builder New Model Wizard Setup Script Parameters
Use the setup script to set name-spaced workspace variables that DSP Builder
uses to configure the design.
• DSP Builder Design Rules and Recommendations
Use the design rules and recommendations to ensure your design performs
correctly.
Create new design | New Model Wizard | Create a new model from a simple template.
Create new design | New SIL Wizard | Create a version of the existing design setup for hardware cosimulation.
Verification | Design Checker | Verify your design against basic design rules.
Generated hardware details | Resource Usage … | View resource estimates of the generated hardware.
Run other software tools | Run Quartus Prime Software | Run a Quartus Prime project for the generated hardware.
Floating | The testbench propagates single precision floating-point data into the synthesizable system.
Fixed | The testbench propagates signed fixed-point data into the synthesizable system.
Channelizer | The testbench consists of a Channelizer block, which outputs data from a MATLAB array in the DSP Builder valid-channel-data protocol.
'IP' | The synthesizable system has two IP function-level subsystems (IP library blocks): a FIR and a Scale block.
'Primitive' | The synthesizable system is a scheduled primitive subsystem with ChannelIn and ChannelOut boundary blocks. Use this start point to create your own function using low-level (primitive) building blocks.
Related Information
• Creating a New DSP Builder Design with the DSP Builder New Model Wizard
Intel recommends you create new designs with the DSP Builder New Model
Wizard. Alternatively, you can copy and rename a design example.
• DSP Builder Menu Options
Simulink includes a DSP Builder menu on any Simulink model window. Use
this menu to easily start all the common tasks you need to perform on your
DSP Builder model.
Note: If you turn on Run Quartus Prime Software, the verification script
also compiles the design in the Quartus Prime software. MATLAB reports
the postcompilation resource usage details in the verification window.
MATLAB verifies that the Simulink simulation results match a simulation of the
generated HDL in the ModelSim simulator.
c. Close both verification windows when MATLAB completes the verification.
4. Examine the generated resource summaries:
a. Click Simulation ➤ Start.
b. Click Resource Usage ➤ Design for a top-level design summary.
5. View the Avalon-MM register memory map:
a. Click Simulation ➤ Start.
b. Click Memory Map ➤ Design. DSP Builder highlights in red any memory
conflicts.
Note: DSP Builder also generates the memory map in the <design
name>_mmap.h file.
6. Compile your design in the Quartus Prime software by clicking Run Quartus
Prime. When the Quartus Prime software opens, click Processing ➤ Start
Compilation.
Related Information
• DSP Builder Generated Files on page 62
• Creating a New DSP Builder Design with the DSP Builder New Model Wizard
Intel recommends you create new designs with the DSP Builder New Model
Wizard. Alternatively, you can copy and rename a design example.
• Creating a New Design by Copying a DSP Builder Design Example
• DSP Builder Advanced Blockset Generated Files
DSP Builder generates the files in a directory structure at the location you
specify in the Control block, which defaults to ..\rtl (relative to the
working directory that contains the .mdl file)
• Control
The Control block specifies information about the hardware generation
environment and the top-level memory-mapped bus interface widths.
[Figure: design flow flowchart. Steps: Implement design in DSP Builder advanced blockset; Verify in MATLAB or Simulink; Explore design tradeoffs; Verify in hardware. Decisions, each with an N (no) branch: Functionality correct? Meeting resource requirement? Successful?]
3. DSP Builder Design Flow
7. Integrating Your DSP Builder Advanced Blockset Design into Hardware on page 62
The channel (uint8) signal is a synchronization counter for multiple channel data on
the data signals. Typically, it increments from 0 with the changing channels across the
data signals within a frame of data.
The data signals can be any number of synchronized signals carrying single or
multichannel data.
The valid (ufix(1) or bool) signal indicates whether the concurrent data and
channel signals have valid information (1), are unknown (0), or do not care (0).
Only one set of valid, channel, and data signals can exist in an IP and synthesized
subsystem. But multiple data signals can exist in a customized synthesizable
subsystem.
Data on the data wire is only valid when DSP Builder asserts valid high. During this
clock cycle, channel carries an 8-bit integer channel identifier. DSP Builder preserves
this channel identifier through the datapath, so that you can easily track and decode
data.
This simple protocol is easy to interface with external circuitry. It avoids balancing
delays, and counting cycles, because you can simply decode the valid and channel
signals to determine when to capture the data in any downstream blocks. DSP Builder
distributes the control structures in each block of your design.
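As an illustration of decoding the valid and channel signals, the following Python sketch models a stream of clock cycles as tuples and captures data only when valid is high. It is a behavioral illustration only, not generated HDL or a DSP Builder API; the function name `capture_by_channel` and the tuple layout are assumptions made for this example.

```python
def capture_by_channel(samples):
    """Group data by channel identifier, keeping only cycles where valid is 1.

    samples: iterable of (valid, channel, data) tuples, one per clock cycle.
    """
    captured = {}
    for valid, channel, data in samples:
        if valid:  # data and channel are meaningful only when valid is high
            captured.setdefault(channel, []).append(data)
    return captured


# Two channels, with idle (valid = 0) cycles between frames.
stream = [
    (1, 0, 10), (1, 1, 20), (0, 0, 0), (0, 0, 0),
    (1, 0, 11), (1, 1, 21), (0, 0, 0),
]
print(capture_by_channel(stream))  # {0: [10, 11], 1: [20, 21]}
```

No cycle counting is needed: the downstream logic relies only on the valid flag and the channel identifier that travel with the data.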
The IP library blocks follow the same rules. Therefore, it is easy to connect IP blocks
and Primitive subsystems.
The IP library filters all use the same protocol with an additional simplification—DSP
Builder produces all the channels for a frame in a multichannel filter in adjacent
cycles, which is also a requirement on the filter inputs. If a FIR filter needs to use flow
control, pull down the valid signal between frames of data—just before you transmit
channel 0 data.
The same <data, valid, channel> protocol connects all CIC and FIR filter blocks
and all subsystems with Primitive library blocks. The blocks in the Channel Filter
and Waveform library support separate real and imaginary (or sine and cosine)
signals. The design may require some splitting or combining logic when using the
mixer blocks. Use a Primitive subsystem to implement this logic.
Related Information
• Channel In (ChannelIn) on page 359
• Channel Out (ChannelOut) on page 360
3.1.2.2. Periods
For any data signal in a DSP Builder design, the FPGA clock rate to sample rate ratio
determines the period value of this data signal. In a multirate design, the signal
sample rate can change as the data travels through a decimation or interpolation filter.
Therefore, the period at different stages of your design may be different.
In a multichannel design, period also decides how many channels you can process on
a wire, or on one signal. Where you have more channels than you can process on one
path, or wire, in a conventional design, you need to duplicate the datapath and
hardware to accommodate the channels that do not fit in a single wire. If the
processing for each channel or path is exactly the same, DSP Builder advanced
blockset supports vector or array data and performs the hardware and datapath
duplication for you. You can use a wire with a one-dimensional data type to represent
multiple parallel datapaths. DSP Builder IP and Primitive library blocks, such as
adder, delay and multiplier blocks, all support vector inputs, or fat wires, so that you
can easily connect models using a single bus as if it is a single wire.
Use the following variables to determine the number of wires and the number of
channels each wire carries by parameterization:
• ClockRate is the system clock frequency.
• SampleRate is the data sample rate per channel (MSPS).
• ChanCount is the number of channels.
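The following Python sketch shows one way these three variables could determine the period, the number of wires, and the number of channels per wire. The formulas are assumptions chosen to agree with the WCDMA DUC figures quoted in this section (for example, a period of 64 at 3.84 MSPS with a 245.76 MHz clock); they are not taken from a DSP Builder API, and the function name `channel_layout` is hypothetical. Rates are passed as integers in Hz to avoid floating-point rounding in the division.

```python
def channel_layout(clock_rate_hz, sample_rate_hz, chan_count):
    """Return (period, chan_wire_count, chan_cycle_count).

    Assumed relationships (for sample rate <= clock rate):
      period           = floor(clock_rate / sample_rate)
      chan_wire_count  = ceil(chan_count / period)
      chan_cycle_count = ceil(chan_count / chan_wire_count)
    """
    def ceil_div(a, b):
        return -(-a // b)  # integer ceiling division

    period = max(1, clock_rate_hz // sample_rate_hz)
    chan_wire_count = ceil_div(chan_count, period)
    chan_cycle_count = ceil_div(chan_count, chan_wire_count)
    return period, chan_wire_count, chan_cycle_count


# 16 channels on a 245.76 MHz clock, at two points of an upconversion chain.
print(channel_layout(245_760_000, 3_840_000, 16))    # (64, 1, 16)
print(channel_layout(245_760_000, 122_880_000, 16))  # (2, 8, 2)
```

At the low sample rate all 16 channels fit on one wire; at 122.88 MSPS only two time slots exist per wire, so the channels spread across eight wires.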
Note: The generated Help page for the block shows the input and output data channel
format that the FIR or CIC filter uses after you have run a Simulink simulation.
For more than a single data wire, the channel signal is not equal to the channel count on
data wires, but specifies the synchronous channel data alignment across all the data wires. For
example,
For a single wire, the channel signal is the same as a channel count. However, for
ChanWireCount > 1, the channel signal specifies the channel data separation per
wire, rather than the actual channel number: it counts from 0 to ChanCycleCount –1
rather than 0 to ChanCount –1.
The channel signal remains a single wire, not a wire for each data wire. It counts over
0 to ChanCycleCount –1.
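A short Python sketch can make the counting range concrete. It assumes that ChanCycleCount equals the channel count divided by the wire count, rounded up; that relationship and the function name `channel_signal_frame` are assumptions for illustration, not part of the DSP Builder interface.

```python
def channel_signal_frame(chan_count, chan_wire_count):
    """Return the values the single channel signal takes over one frame.

    The signal counts 0 .. ChanCycleCount - 1, where ChanCycleCount is
    assumed to be ceil(chan_count / chan_wire_count).
    """
    chan_cycle_count = -(-chan_count // chan_wire_count)  # integer ceiling
    return list(range(chan_cycle_count))


# One wire: the channel signal runs over every channel number (0 .. 15).
print(channel_signal_frame(16, 1) == list(range(16)))  # True
# Eight wires: the signal counts only the two cycles used on each wire.
print(channel_signal_frame(16, 8))  # [0, 1]
```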
Figure 17. Simulink and Hardware Representations of a Single Rate FIR Filter
In a typical wideband CDMA macro-cell system, the DUC module in the RF card needs
to process eight inphase (I) and quadrature (Q) data pairs, resulting in 16
independent channels on the datapath. The input sample rate to the DUC is
3.84 MHz, as defined in the 3GPP specification. A high-performance FPGA running
at 245.76 MHz typically maximizes parallel processing power.
Figure 18. 16-channel WCDMA DUC Design
Shows how the distribution of channels on wires changes in a multirate system.
[Figure: FIR1, FIR2, and CIC blocks in a chain, linked by data, valid, and channel signals.]
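Signals | Clock Rate (MHz) | ChanCount | Data Sample Rate (MSPS) | Period | Data Signal Pattern | Interpolation Factor
Input to FIR1 | 245.76 | 16 | 3.84 | 64 | I1, I2, ...I8, Q1, ... Q8, zeros(1, 64–16) | 2
Input to FIR2 | 245.76 | 16 | 7.68 | 32 | I1, I2, ...I8, Q1, ... Q8, zeros(1, 32–16) | 2
Output of CIC | 245.76 | 16 | 122.88 | 2 | I1, I2, I3, I4, I5, I6, I7, I8, Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8 | 8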
In this example, the input data at the low sample rate of 3.84 MSPS can accommodate all
channels on a single wire, so the ChanWireCount is 1. In fact, more time slots are available for
processing, since the period is 64 and only 16 channels are present to occupy the 64 time
slots. Therefore the ChanCycleCount is 16, which is the number of cycles occupied on
a wire. As the data travels down the upconversion chain, its sample rate increases
and in turn the period reduces to a smaller number. At the output of the CIC filter, the data
sample rate increases to 122.88 MSPS, which means only two time slots are available
on a wire. As there are 16 channels, spread them out on 8 wires, where each wire
supports two channels. At this point, the ChanWireCount becomes 8, and
ChanCycleCount becomes 2. The ChanCycleCount does not always equal the period, as
the input data to FIR1 shows.
For most systems, sample rate is less than clock rate, which gives WiresPerChannel = 1.
In this case, ChanWireCount is the same as WireGroups, and it is the number of wires
to accommodate all channels. In a super-sample rate system, a single channel's data
needs to be split onto multiple wires. Use parallel signals at a clock rate to give an
equivalent sample rate that exceeds the clock rate. In this case, WiresPerChannel is
greater than one, and ChanWireCount = WireGroups × WiresPerChannel because one
channel requires multiple wires.
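The relationship ChanWireCount = WireGroups × WiresPerChannel can be sketched in Python as follows. The individual formulas for WiresPerChannel and WireGroups are assumptions made for this illustration (they are not quoted from this section), and the function name `wire_counts` is hypothetical. Rates are integers in Hz.

```python
def wire_counts(clock_rate_hz, sample_rate_hz, chan_count):
    """Return (wires_per_channel, wire_groups, chan_wire_count).

    Assumed relationships:
      wires_per_channel = ceil(sample_rate / clock_rate)
      period            = max(1, floor(clock_rate / sample_rate))
      wire_groups       = ceil(chan_count / period)
      chan_wire_count   = wire_groups * wires_per_channel
    """
    def ceil_div(a, b):
        return -(-a // b)  # integer ceiling division

    wires_per_channel = ceil_div(sample_rate_hz, clock_rate_hz)
    period = max(1, clock_rate_hz // sample_rate_hz)
    wire_groups = ceil_div(chan_count, period)
    return wires_per_channel, wire_groups, wire_groups * wires_per_channel


# Sample rate below the clock rate: one wire per channel.
print(wire_counts(245_760_000, 122_880_000, 16))  # (1, 8, 8)
# Super-sample rate: 2 channels at 1 GSPS on a 250 MHz clock.
print(wire_counts(250_000_000, 1_000_000_000, 2))  # (4, 2, 8)
```

In the super-sample case each channel occupies four parallel wires, so two channels need eight wires even though the channel count is small.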
When connecting two modules in DSP Builder, the output interface of the upstream
module must have the same ChanWireCount and ChanCycleCount parameters as the
input interface of the downstream module.
Related Information
AN 544: Digital Modem Design with the DSP Builder Advanced Blockset.
For more information about channelization in a real design
4. Open the new model file as text and globally replace the parameter structure to
match.
Related Information
• Starting DSP Builder in MATLAB
• DSP Builder Advanced Blockset Libraries
• Simulating, Generating, and Compiling Your Design
• DSP Builder Menu Options
Simulink includes a DSP Builder menu on any Simulink model window. Use
this menu to easily start all the common tasks you need to perform on your
DSP Builder model.
3.1.3.1. Creating a New Design From the DSP Builder FIR Design Example and Changing the Namespaces
1. Open the FIR design example (demo_firi) from the Filters directory, by typing the
following command at the MATLAB command prompt:
demo_firi
2. In the demo_firi window (the schematic), double-click on the EditParams block
to open the setup script setup_demo_firi.m in the MATLAB Editor.
3. In the Editor, click File ➤ Save As and save as setup_mytutorial.m in a
different directory, for example \myexamples.
4. In the demo_firi window, click File ➤ Save As and save as mytutorial.mdl in
the \myexamples directory.
5. In the main MATLAB window, navigate to the \myexamples directory.
6. In the Editor, click Edit ➤ Find And Replace, enter dspb_firi in Find what:
and my_tutorial in Replace with:. Click Replace All. Click Close. This step
ensures all the setup variables do not interfere with any other workspace
variables.
7. Save setup_mytutorial.m.
8. On the Debug menu click Run setup_mytutorial.m to run the script, which
creates the workspace variables to use the schematic design.
9. To ensure MATLAB runs the setup script on opening (so that the design displays
correctly) and just before simulation (so that the parameters are up-to-date and
reflect any edits made since opening), perform the following steps:
a. In the mytutorial window (schematic), on the File menu click Model
Properties.
b. On the Callbacks tab click on PreLoadFcn and replace setup_demo_firi;
with setup_mytutorial;.
c. Repeat for the InitFcn.
d. Click OK.
Unlike traditional methods, you do not need to manually instantiate two IP blocks and
pass a single wire to each in parallel. Each IP block internally vectorizes. DSP Builder
uses the same paradigm on outputs, where it represents high data rates on multiple
wires as vectors.
Each IP block determines the input and output wire counts, based on the clock rate,
sample rate, and number of channels.
Any rate changes in the IP block affect the output wire count. If a rate change exists,
such as interpolating by two, the output aggregate sample rate doubles. DSP Builder
packs the output channels into the fewest number of wires (vector width) that
supports that rate. For example, an interpolate by two FIR filter may have two wires
at the input, but three wires at the output.
The IP block performs any necessary multiplexing and packing. The blocks connected
to the inputs and outputs must have the same vector widths, which Simulink enforces.
Resolve vector width errors by carefully changing the sample rates.
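As a rough illustration of the wire-count rule described above, the following C++ sketch assumes that the vector width is the aggregate sample rate (number of channels times the per-channel sample rate) divided by the clock rate, rounded up. Both this formula and the function name wiresNeeded are assumptions made for illustration; they are not a DSP Builder API or its documented algorithm.

```cpp
#include <cstdint>

// Hypothetical helper (not a DSP Builder API): smallest number of wires that
// can carry `channels` channels at `sampleRate` samples per second each, when
// every wire carries one sample per clock cycle at `clockRate`.
// Assumes clockRate > 0 and that both rates use the same unit (for example MHz).
std::uint64_t wiresNeeded(std::uint64_t channels,
                          std::uint64_t sampleRate,
                          std::uint64_t clockRate)
{
    const std::uint64_t aggregate = channels * sampleRate;
    // Integer ceiling division.
    return (aggregate + clockRate - 1) / clockRate;
}
```

Under this assumption, three channels at 50 Msps on a 100 MHz clock (wiresNeeded(3, 50, 100)) need two wires, and after interpolating by two (wiresNeeded(3, 100, 100)) the same channels need three wires, which matches the two-in, three-out example above.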
1. Verifying your DSP Builder Advanced Blockset Design with a Testbench on page 37
2. Running DSP Builder Advanced Blockset Automatic Testbenches on page 38
3. Using DSP Builder Advanced Blockset References on page 41
4. Setting Up Stimulus in DSP Builder Advanced Blockset on page 41
5. Analyzing your DSP Builder Advanced Blockset Design on page 41
When designing with DSP Builder advanced blockset, use the following visualization
features of MATLAB and Simulink:
• OutScope block. In addition to exporting data to work space for analysis, you can
use the OutScope block to visualize a signal or multiple signals. The OutScope
block probes and displays data on a wire or a bus relative to the time samples,
which is useful when debugging your design.
• OutputSpectrum block. You can also use the OutputSpectrum block, which
displays the signal spectrum in real time, when your design has filtering or FFT.
• Fixed-point toolbox. When dealing with bit growth and quantization, the fixed-
point toolbox can be a valuable tool. You can even visualize the dynamic range of
a signal by looking at the histogram of the signal.
where:
• model = design name (without extension, in single quotes)
• entity = entity to test (the name of a Primitive subsystem or a ModelIP block, in
single quotes)
• rtl_path = optional path to the generated RTL (in single quotes, if not specified
the path is read from the Control block in your model)
For example:
dspba.runModelsimATB('demo_fft16_radix2', 'FFTChip');
The return values are in the format [pass, status, result] where:
• pass = 1 for success, or 0 for failure
• status = should be 0
• result = should be a string such as:
"# ** Note: Arrived at end of stimulus data on clk <clock name>"
DSP Builder writes an output file with the full path to the component under test in the
working directory. DSP Builder creates a new file with an automatically incremented
suffix each time the testbench is run. For example:
demo_fft_radix2_DUT_FFTChip_atb.6.out
This output file includes the ModelSim transcript and is useful for debugging if you
encounter any errors.
where:
• model = design name (without extension, in single quotes)
• runSimulation = optional flag that runs a simulation when specified (if not
specified, a simulation must run previously to generate the required files)
• runFit = optional flag which runs the Quartus Prime Fitter when specified
For example:
run_all_atbs('demo_agc');
run_all_atbs('demo_agc', true);
The return value is 1 if all tests are successful or 0 if any tests fail. The output is
written to the MATLAB command window.
...
...
...
These errors may occur when a ModelSim precompiled model is out of date, but not
automatically recompiled. A similar problem may occur after making design changes
when ModelSim has cached a previously compiled model for a component and does
not detect when it changes. In either of these cases, delete the rtl directory,
resimulate your design and run the dspba.runModelsimATB or run_all_atbs
command again.
If you run the Quartus Prime Fitter, the command also reports whether the design
achieves the target fMAX. For example:
A summary also writes to a file results.txt in the current working directory. For
example:
PASSED
(Directory=../quartus_demo_agc_AGC_Chip_2): PASSED
PASSED
Using fixed-point types preserves the extra information of binary point position
through hardware blocks, so that it is easy to perform rounding and shifting
operations without having to manually track the interpretation of an integer value. A
fixed-point type change propagates through your design, with all downstream
calculations automatically adjusted.
3.3.5. Changing Data Type with Convert Blocks and Specifying Output
Types
1. Preserve the real-world value using a Convert block.
2. Preserve bit pattern by setting the output data type mode on any other Primitive
library block or use a Reinterpretcast block.
Related Information
Convert on page 323
Figure 21. Convert Block Changing Data Type while preserving real-world value
Figure 22. Convert Block Using Same Number of Bits while preserving real-world value
Related Information
Convert on page 323
For example, a Mult block with both input data types specified as sfix16_En15
naturally has an output type of sfix32_En30. If you specify the output data type as
sfix32_En28, the specified type has two fewer fractional bits than the natural output
type. Therefore, the output numerical value is effectively multiplied by four, and a
1*1 input gives an output value of 4.
If you specify an output data type of sfix32_En31, the specified type has one more
fractional bit than the natural output type, so the output numerical value is
effectively divided by two and a 1*1 input gives an output value of 0.5.
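The scaling effect of reinterpreting the same bit pattern with a different binary point can be checked with a small software model. The following C++ sketch is only an arithmetic illustration; the function names fixedToReal and multRaw are hypothetical and are not part of DSP Builder or its generated models.

```cpp
#include <cmath>
#include <cstdint>

// Interprets a raw two's-complement integer as a fixed-point value with
// `fracBits` fractional bits (the N in sfixW_EnN): value = raw * 2^-fracBits.
double fixedToReal(std::int64_t raw, int fracBits)
{
    return std::ldexp(static_cast<double>(raw), -fracBits);
}

// Full-precision raw product of two sfix16_En15 operands.
// The natural type of the result is sfix32_En30.
std::int64_t multRaw(std::int16_t a, std::int16_t b)
{
    return static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b);
}
```

For example, 0.5 in sfix16_En15 is the raw value 16384. The raw product multRaw(16384, 16384) reads as 0.25 when interpreted with 30 fractional bits, as 1.0 (four times larger) with 28 fractional bits, and as 0.125 (half) with 31 fractional bits. The bit pattern never changes; only its interpretation does.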
If you want to change the data type format in a way that preserves the numerical
value, use a Convert block, which adds the corresponding hardware. Adding a
Convert block directly after a Primitive library block allows you to specify the data
type in a way that preserves the numerical value. For example, a Mult block followed
by a Convert block, with input values 1*1 always gives output value 1.
You can reinterpret the bit pattern and also discard bits. If the type you specify with
Output data type is smaller than the natural (inherited) output type, DSP Builder
discards the MSBs (most significant bits).
Never set Specify via dialog to a type bigger than the natural (inherited) bit pattern.
DSP Builder performs no zero-padding or sign extension, and the result may generate
hardware errors due to signal width mismatches. Use the Convert block for any sign
extension or zero padding.
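To see what discarding MSBs does to a value, the following C++ sketch keeps only the low bits of a raw word and reads them back as a signed number. It is a minimal software model of the effect, not DSP Builder's implementation, and the function name keepLowBitsSigned is hypothetical.

```cpp
#include <cstdint>

// Keeps the low `width` bits of `raw`, discards the MSBs above them, and
// interprets the remaining bits as a signed two's-complement number.
// Assumes 1 <= width <= 63.
std::int64_t keepLowBitsSigned(std::int64_t raw, int width)
{
    const std::uint64_t mask = (std::uint64_t{1} << width) - 1;
    const std::uint64_t low = static_cast<std::uint64_t>(raw) & mask;
    const std::uint64_t signBit = std::uint64_t{1} << (width - 1);
    // Sign-extend from bit (width - 1) using the xor-and-subtract identity.
    return static_cast<std::int64_t>(low ^ signBit) -
           static_cast<std::int64_t>(signBit);
}
```

Narrowing 0x12345 to 16 bits gives 0x2345, so the value changes whenever the discarded MSBs carried information. Narrowing 0x18000 to 16 bits gives -32768, because the new top bit is read as the sign bit.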
To reinterpret the bit pattern with sign extension or zero padding, combine these
methods.
To set a specific format so that DSP Builder can resolve types, for example, in
feedback loops, set Specify via dialog on an existing Primitive library block or
insert a zero-cycle sample delay (which generates no hardware and just casts the type
interpretation).
To ensure the data type is equal to some other signal data type, force the data type
propagation with a Simulink data type propagation block.
Related Information
Primitives Library on page 289
3.4. Verifying your DSP Builder Design with C++ Software Models
DSP Builder supports C++ software models for designs that support bit-accurate
simulation.
The software model includes a testbench, which is an executable program to check the
output of the software models matches the output of Simulink simulation. The
generated CMake script creates projects and makefiles (depending on parameters)
that you can use to compile the software model and testbench. The testbench and the
CMake script allow you to verify the model functionality. Also, you can use the
testbench as a starting point for integration of generated models into a larger, system-
level, simulation.
1. In the design’s Control block, turn on Generate software model.
The default language is cpp03 (C++ 2003 standard conformant), and Generate
an ATB (automatic testbench) and CMake build script is turned on by
default.
2. Turn on Bit Accurate Simulation on the SynthesisInfo blocks in all
subsystems.
You must enable bit-accurate simulation for all subsystems; otherwise, DSP Builder
generates incomplete software models.
3. Compile the design.
DSP Builder creates a directory, cmodel, which contains the following files:
• A csl.h header file containing utility functions and implementation details for
the generated models.
• A [model/subsystem name]_CModel(.h/.cpp) pair for each subsystem
and the device level system.
• A [model/subsystem name]_atb.cpp file containing the device level test
bench for the model.
• A CMakeFiles.txt/CMakeLists.txt file containing CMake build scripts
for building the ATB executable and model files.
4. Generate the project or makefiles using CMakeLists.txt.
For example, to generate Visual Studio 2017 projects, run:
cmake -G "Visual Studio 15 2017 Win64"
Or to generate a makefile for the release build with symbols on Linux:
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=RelWithDebInfo
Refer to the CMake documentation for more options.
8. Refer to the testbench to see how you can integrate the generated models into an
existing system.
Subsystems contain structs representing their inputs and outputs. These structs
have a generated constructor that reads values from a stimulus file for the
testbench.
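struct IO_xIn
{
    int64_t v;
    int64_t c;
    int64_t x;
    int64_t y;

    // Default constructor: all ports start at zero.
    IO_xIn()
        : v(0)
        , c(0)
        , x(0)
        , y(0)
    {
    }

    // Testbench constructor: reads one value per port from the stimulus file.
    IO_xIn(csl::StimulusFile& stm)
    {
        stm.Get<1>(v);
        stm.Get<8>(c);
        stm.Get<27>(x);
        stm.Get<27>(y);
    }
};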
When integrating the model, replace the stimulus file constructor by manually
setting the input or output values on the struct before using them to drive the
model using read(), write(), or execute() functions.
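A minimal sketch of setting the values manually follows. So that it compiles without csl.h, it uses a stand-in struct named ExampleIn with the same fields as IO_xIn; the helper names and the range checks are hypothetical. The checks assume that the template arguments in Get<8> and Get<27> are port bit widths, that c is an unsigned 8-bit channel index, and that x and y are signed 27-bit values. None of these assumptions is confirmed here, so check them against your generated model.

```cpp
#include <cstdint>
#include <stdexcept>

// Stand-in with the same fields as the generated IO_xIn struct, so that this
// sketch is self-contained. In a real integration, use the generated struct.
struct ExampleIn
{
    std::int64_t v;
    std::int64_t c;
    std::int64_t x;
    std::int64_t y;
};

// Hypothetical helper: true if `value` fits in a signed field of `bits` bits.
bool fitsSigned(std::int64_t value, int bits)
{
    const std::int64_t lo = -(std::int64_t{1} << (bits - 1));
    const std::int64_t hi = (std::int64_t{1} << (bits - 1)) - 1;
    return value >= lo && value <= hi;
}

// Builds one input sample by assigning the fields directly instead of reading
// them from a stimulus file. The widths used in the checks are assumptions.
ExampleIn makeInput(bool valid, std::int64_t channel,
                    std::int64_t x, std::int64_t y)
{
    if (channel < 0 || channel > 255 || !fitsSigned(x, 27) || !fitsSigned(y, 27))
        throw std::out_of_range("input does not fit the assumed port width");
    ExampleIn in;
    in.v = valid ? 1 : 0;
    in.c = channel;
    in.x = x;
    in.y = y;
    return in;
}
```

After populating the struct this way, pass it to the generated model in the same way the generated testbench does.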
option(USE_MPIR "Include and link against the MPIR library for models that require arbitrary precision" OFF)
option(USE_MPFR "Include and link against the MPFR library for models that require arbitrary precision floating point" OFF)
include("CMakeFiles.txt")
if(USE_MPIR)
    add_definitions(-DCSL_USE_MPIR)
    find_path(MPIR_INC
        NAMES mpir.h
        HINTS ${MPIR_INC_PATH}
    )
    find_library(MPIR_LIB
        NAMES mpir altera_mpir
        HINTS ${MPIR_LIB_PATH}
    )
<name>.vhd The HDL that is generated as part of the design (regardless of automatic testbenches).
<name>_stm.vhd An HDL file that reads in data files of captured Simulink simulation inputs and outputs on <name>.
<input>/<output>.stm The captured Simulink data that the ChannelIn, ChannelOut, GPIn, GPOut, and IP blocks write.
Each block writes a single stimulus file that captures all the signals passing through it,
writing them in columns as doubles with one row for each timestep.
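If you want to inspect captured data outside the testbench, a row-and-column reader along the following lines may be a starting point. This C++ sketch assumes that each row is a line of whitespace-separated decimal numbers. That on-disk encoding is an assumption for illustration and is not confirmed here for .stm files. The function name parseRows is hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Parses rows of whitespace-separated numbers: one row per timestep and one
// column per signal. Blank lines are skipped.
std::vector<std::vector<double>> parseRows(std::istream& in)
{
    std::vector<std::vector<double>> rows;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<double> row;
        double value;
        while (fields >> value)
            row.push_back(value);
        if (!row.empty())
            rows.push_back(row);
    }
    return rows;
}
```

The function accepts any std::istream, so you can pass a std::ifstream opened on a captured file or a std::istringstream for a quick experiment.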
The device-level testbenches use these same stimulus files, following connections
from device-level ports to where the signals are captured. Device-level testbenches
are therefore restricted to cases where the device-level ports are connected to
stimulus capturing blocks.
DSP Builder captures stimulus files on the device level inputs and records Simulink
output data on the device level outputs. It creates a ModelSim testbench that contains
the HDL generated for the device that the captured inputs feed. It compares the
Simulink outputs to the ModelSim simulation outputs in an HDL testbench process,
reports any mismatches, and stops the ModelSim simulation.
This interface provides memory-mapped read and write accesses to your design
running on an FPGA using the System Console system debugging tool.
Method Description
designLoad(path) Loads the design (.sof) file specified by the <path> parameter to the FPGA.
openMaster(index) Creates and returns a master connection to the specified master link. <index> specifies the
index (starting from 1) of the connection in the list that the refreshMasters function returns.
For example, M=SystemConsole.openMaster(1);
Method Description
Note: Always call this method when you finish working with current master connection.
setTimeOutValue(timeout) Overrides the default timeout value of 60 seconds for the master
connection object. Specify the <timeout> value in seconds.
read(type, address, size [, timeout]) Returns a list of <size> values of type <type> read from
memory on the FPGA, starting at address <address>.
For example:
data = masterObj.read('single', 1024, 10)
Reads 10 consecutive 4-byte values (40 bytes overall) starting at address 1,024
and returns the results as a list of 10 'single' typed values.
write(type, address, data [, timeout]) Writes <data> (a list of values of type <type>) to memory
starting at address <address>.
For example:
masterObj.write('uint16', 1024, 1:10);
Writes the values 1 to 10 to memory starting at address 1,024, where each value occupies 2
bytes in memory (20 bytes overall).
<address> The start address for the read operation. You can specify the address as a hexadecimal string.
Note: Specify the address as a byte address.
<size> The number of <type> values to read. Each <type> value occupies 1, 2, 4, or 8 bytes.
<timeout> An optional parameter to override the default timeout value for this operation only.
<type> The type of each element in the specified <data>. Each type occupies 1, 2, 4, or 8 bytes:
• 1 byte: 'char', 'uint8', 'int8'
• 2 bytes: 'uint16', 'int16'
• 4 bytes: 'uint32', 'int32', 'single'
• 8 bytes: 'uint64', 'int64', 'double'
<address> The start address for the write operation. You can specify the address as a hexadecimal string.
Note: Specify the address as a byte address.
<timeout> An optional parameter to override the default timeout value for this operation only.
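Because addresses are byte addresses, it helps to work out how many bytes a transfer covers before you choose the next start address. The following C++ sketch encodes the type sizes from the table above. The function names bytesPerElement and transferBytes are hypothetical helpers for illustration, not part of the MATLAB API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Bytes occupied by one element of the given type name, per the type table.
std::size_t bytesPerElement(const std::string& type)
{
    if (type == "char" || type == "uint8" || type == "int8") return 1;
    if (type == "uint16" || type == "int16") return 2;
    if (type == "uint32" || type == "int32" || type == "single") return 4;
    if (type == "uint64" || type == "int64" || type == "double") return 8;
    throw std::invalid_argument("unknown type: " + type);
}

// Total number of bytes covered by `count` elements of `type`.
std::size_t transferBytes(const std::string& type, std::size_t count)
{
    return bytesPerElement(type) * count;
}
```

With these helpers, reading 10 'single' values covers 40 bytes and writing 10 'uint16' values covers 20 bytes, as in the examples above. A transfer that starts at byte address 1,024 and covers 40 bytes ends just before byte address 1,064.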
1. Set up verification structures around the DUT using on-chip RAMs. If the design
interfaces to off-chip RAM for reading and storing data, the design requires no
additional verification structures.
a. Add buffers to load with test vectors for DUT inputs and logic to drive DUT
inputs with this data.
b. Add buffers to store the DUT results.
• Use a SharedMem block from the Interface library to implement buffers.
DSP Builder automatically generates processor interface to these blocks
that it requires to load and read the buffers from MATLAB (with MATLAB
API).
• Use Counter blocks from the Primitive library or custom logic to
implement a connection between the test buffers and DUT inputs and
outputs.
• Consider using RegField, RegBit, and RegOut blocks from the Interface
library to control the system and poll the results from MATLAB. DSP
Builder automatically generates a processor interface for these blocks.
2. Assemble the high-level system in Platform Designer.
3. Use appropriate Platform Designer library blocks to add debugging interfaces and
data storage.
a. Add PLLs to generate clocks with the required frequency. You can use separate
clocks for the processor interface clock and system clock of the DSP Builder
design, if you generate the DSP Builder design with the Use separate bus clock
option.
b. Add a debug master (JTAG/USB). All memory-mapped read and write requests
go through this IP core. Connect it to the DSP Builder processor interface (Avalon-MM
slave) and any other IP that the host needs to access.
c. Add the DSP Builder top-level design with the source and sink buffers.
d. If you assemble a system with a DSP Builder design that connects to off-chip
memory, add an appropriate block to the Platform Designer system and
connect it to the DSP Builder block interfaces (Avalon-MM master). Also,
connect the debug master to off-chip RAM so the host can access it.
4. Create a Quartus Prime project.
5. Add your high-level Platform Designer system into a top-level module and connect
up all external ports.
6. Provide port placement constraints.
If you are using on-chip RAMs for testing and JTAG-based debugging interface,
you mainly need to place clock and reset ports. If you use off-chip RAM for data
storage, provide more complex port assignments. Other assignments may be
required based on the specific design and external interfaces it uses.
7. Provide timing constraints.
System-in-the-loop:
• Automatically generates HW verification system for DSP Builder designs based on
your configuration.
• Provides a wizard-based interface to configure, generate, and run HW verification
system.
• Provides two separate modes:
— Run Test Vectors loads and runs test vectors with large chunks (based on
test memory size on target verification platform)
— Data Sample Stepping loads one set sample at a time while stepping
through Simulink simulation
Data Sample Stepping generates a copy of the original model and replaces the DSP
Builder block with a special block providing connection to the FPGA to process data.
All block input and output ports should pass through a single DSP Builder ChannelIn
or ChannelOut interface, or be connected to a single IP block. The block may contain
memory-mapped registers and memory blocks (accessible through the autogenerated
Avalon-MM slave interface). Observe the following limitations:
• The design should use the same clock for system and bus interfaces. The design
does not support separate clocks.
• For autogenerated Avalon MM slave interfaces, use the name bus.
• The design does not support any other combination of DSP Builder block interface,
including Avalon-MM master interfaces.
The overall bitwidth of block input and output ports should not exceed 512 bits
(excluding the valid signal).
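A quick way to check a design against the 512-bit limit is to add up the port widths before you run the wizard. The following C++ sketch shows only the arithmetic. The function name withinPortBudget is hypothetical, and which ports you include in the list (inputs, outputs, or both) depends on how the limit applies to your design, which this sketch does not decide.

```cpp
#include <numeric>
#include <vector>

// Returns true if the sum of the given port widths (in bits, excluding the
// valid signal) does not exceed `limitBits`.
bool withinPortBudget(const std::vector<unsigned>& portWidths,
                      unsigned limitBits = 512)
{
    const unsigned long long total =
        std::accumulate(portWidths.begin(), portWidths.end(), 0ULL);
    return total <= limitBits;
}
```

For example, sixteen 32-bit ports use exactly 512 bits and pass the check, while seventeen 32-bit ports use 544 bits and fail it.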
Running hardware verification with Data Sample Stepping loads a new set of test
data to FPGA every simulation step (if the data set is valid), which gives big timing
gaps between two subsequent cycles for DSP Builder blocks running on hardware. If
your DSP Builder block implementation cannot handle such gaps, system-in-the-loop
simulation results may be incorrect.
System-in-the-loop supports the following third-party board support packages that are
available for OpenCL:
• Bittware
• Nallatech
• ProcV
These packages are not available in the system-in-the-loop wizard by default. After
you install these boards, publish the packages to the system-in-the-loop wizard.
Bittware bittware_s5phg
Nallatech nallatech_pcie3x5
This walkthrough uses a DSP Builder design that implements a primitive FIR filter with
memory-mapped registers for storing coefficients.
BSP Settings BSP Select the target BSP you want to run the hardware test on.
BSP Memory Allocation Total Memory Size Specify the total size for test memory to use.
Input Memory Size Specify the amount of memory (from total memory size) for storing input
test data. The remaining memory is for storing output data.
You might require several iterations to load and process all input test
vectors because of memory limitations.
Design Interface Clock Rate Specify the same value as in the DSP Builder block setup file.
Sample Rate Specify the same value as in the DSP Builder block setup file.
Number of Channels The number of channels for the DSP Builder block. Specify the same
value as in the DSP Builder block setup file.
Frame Size This value represents a number of valid data samples that you should
supply to the DSP Builder block without timing gaps in between.
If this value is more than 1, the wizard inserts a specific block in
between test data provider and the DSP Builder block. This block enables
data transmission to the DSP Builder block only when the specified
amount of data is already available.
An example of such a design is a folded multichannel design.
- Destination Directory Specify the directory where DSP Builder should generate the
system-in-the-loop related files.
You should change to this directory to simulate the system-in-the-loop
generated model with an FPGA proxy.
Select SIL Flow Select the system-in-the-loop flow to use. The options are:
• Run Test Vectors runs all test vectors through the hardware verification system. The
test vectors are based on simulation data recorded in DSP Builder .stm format files
during Simulink simulation.
• Step Through Simulation allows processing every different set of valid input data on
hardware separately, while simulating a design from Simulink. The wizard generates a
separate model <model_name>_SIL in the SIL destination directory, which you should
use for hardware verification. The original DSP Builder device level block is replaced with
a specific block providing communication with the FPGA.
You should change to SIL destination directory before you can simulate this model.
If you change the flow, regenerate and recompile the system into a new destination
directory.
Generate Generates the infrastructure, files, and blocks for the hardware verification platform.
Compile Compiles the entire hardware verification system in the Quartus Prime software to the
generation configuration file.
Allow at least 10-15 minutes for this step to run (more time for large DSP Builder
designs). During this time the MATLAB input interface is unavailable.
Select JTAG Cable Press Scan to scan available JTAG connections for programming the board.
Choose the required JTAG cable from the discovered list.
Setting Description
The hardware test automatically detects and executes write requests over the DSP
Builder autogenerated Avalon-MM slave interface. The wizard cannot keep the sequence
of transfers for write requests over Avalon-MM slave interface and the DSP Builder data
interface on hardware exactly the same as during simulation. Therefore, you may see
data mismatches for a few sets of output samples at points where write requests are
issued.
Compare Compare the hardware verification results with simulation outputs. Run Test Vectors
only.
The wizard compares only valid output samples.
Simulate <original_model>_SIL system During simulation, the FPGA proxy block that replaces the
original DSP Builder design in the system-in-the-loop:
• Every time you update inputs, loads data to DSP Builder if valid input is high.
• Every time you request outputs, populates outputs with data read from hardware if the
output memory contains a valid sample.
Step through simulation only.
Because the FPGA proxy updates its output only with valid samples, you see the same
results repeated on the outputs until hardware has a new valid set of data. This
behavior may differ from simulation results, where outputs are populated at every
simulation cycle with available values.
DSP Builder creates a directory structure that mirrors the structure of your design. The root to this directory
can be an absolute path name or a relative path name. For a relative path name (such as ../rtl), DSP Builder
creates the directory structure relative to the MATLAB current directory.
File Description
rtl directory
<model name>.xml An XML file that describes the attributes of your model.
<model name>_entity.xml An XML file that describes the boundaries of the system (for Signal Compiler in
designs that combine blocks from the standard and advanced blocksets).
<model name>_params.xml When you open a model, DSP Builder produces a model_name_params.xml
file that contains settings for the model. You must keep this file with the
model.
<block name>.xml An XML file containing information about each block in the advanced blockset,
which translates into HTML on demand for display in the MATLAB Help viewer
and for use by the DSP Builder menu options.
<model name>.vhd This is the top-level testbench file. It may contain non-synthesizable blocks,
and may also contain empty black boxes for Simulink blocks that are not fully
supported.
<model name>.add.tcl This script loads the VHDL files in this subdirectory and in the subsystem
hierarchy below it into the Quartus Prime project.
<model name>.qip This file contains information about all the files DSP Builder requires to process
your design in the Quartus Prime software. The file includes a reference to
any .qip file in the next level of the subsystem hierarchy.
<model name>_<block name>.vhd DSP Builder generates a VHDL file for each component in your model.
safe_path.vhd Helper function that the .qip and .add.tcl files reference to ensure that
pathnames read correctly in the Quartus Prime software.
safe_path_msim.vhd Helper function that ensures a path name reads correctly in ModelSim.
<subsystem>_atb.do Script that loads the subsystem automatic testbench into ModelSim.
<subsystem>_atb.wav.do Script that loads signals for the subsystem automatic testbench into ModelSim.
<subsystem>/<block>/*.hex Files that initialize the RAM in your design for either simulation or synthesis.
<subsystem>.tcl This Tcl script exists only in the subsystem that contains a Device block. You
can use this script to setup the Quartus Prime project.
<subsystem>_hw.tcl A Tcl script that loads the generated hardware into Platform Designer.
Related Information
• Simulating the Fibonacci Design in Simulink on page 71
• Simulating the IP Design in Simulink on page 75
• Simulating, Verifying, Generating, and Compiling Your DSP Builder Design on page
21
• Control on page 236
The Quartus Prime project file (.qpf), Quartus Prime settings file (.qsf), and .qip
files have the same name as the subsystem in your design that contains the Device
block. For example, DSP Builder creates the files DDCChip.qpf, DDCChip.qsf, and
DDCChip.qip for the demo_ddc design.
These files contain all references to the files in the hardware destination directory that
the Control block specifies. DSP Builder generates these files when you run a
Simulink simulation. The project automatically loads into the Quartus Prime software.
When you compile your design the project compiles with the .tcl scripts in the
hardware destination directory.
The .qip file references all the files that the project requires. Use the Archive
Project command in the Quartus Prime software to use this file to archive the project.
For information about archiving projects, refer to the Quartus Prime Help.
When you integrate your model into a Platform Designer system, Platform
Designer generates a base address for the entire DSP Builder model. Platform
Designer references individual modules within the .mdl design based on the
model base address (autogenerated) and relative base address you assign in
the .mdl file or its setup script.
4. Manage base addresses by specifying the bus data width in the Control block.
5. For IP designs, consider the number of registers each IP core needs and the
number of words each register requires.
6. For Primitive subsystems, treat registers independently.
7. Ensure each IP library block and register or memory in a Primitive subsystem
has a unique base address.
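The addressing scheme in these steps can be sketched in a few lines of C++. The sketch assumes that the address Platform Designer uses for a block is the model base address plus the relative base address you assign, and it checks only that no two relative base addresses are identical. It does not model register sizes or word versus byte addressing. The function names are hypothetical.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Address of a block as seen from the system: the model base address assigned
// by Platform Designer plus the relative base address assigned in the model.
std::uint64_t absoluteAddress(std::uint64_t modelBase,
                              std::uint64_t relativeBase)
{
    return modelBase + relativeBase;
}

// Returns true if no two blocks share the same relative base address.
bool allUnique(const std::vector<std::uint64_t>& relativeBases)
{
    const std::set<std::uint64_t> seen(relativeBases.begin(),
                                       relativeBases.end());
    return seen.size() == relativeBases.size();
}
```

For example, with a model base address of 0x1000, a register at relative base address 0x20 appears at 0x1020. A list of relative base addresses such as 0, 4, 4 fails the uniqueness check.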
The output of a DSP Builder design is a source of Avalon-ST data for downstream
components. It supplies data (and corresponding valid, channel, and start and end of
packet information) and accepts a Boolean flag input from the downstream
components, which indicates the downstream block is ready to accept data.
The input of the DSP Builder design is a sink of Avalon-ST data for upstream
components. It accepts data (and corresponding valid, channel, and start and end of
packet information) and provides a Boolean flag output to the upstream component,
which indicates the DSP Builder component is ready to accept data.
1. Simulate your design with Hardware Generation turned on in the Control block.
DSP Builder generates a <model>_hw.tcl file for the subsystem containing the
Device block. This file marks the boundary of the synthesizable part of your
design and ignores the testbench blocks.
2. Add the synthesizable model to Platform Designer by including <model>_hw.tcl
in the IP search path.
The Platform Designer native streaming data interface is the Avalon Streaming
(Avalon-ST) interface, which DSP Builder advanced blockset does not support. The
DSP Builder advanced blockset native interface <valid, channel, data> ports
are exported to the top level as conduit signals.
3. Add DSP Builder components to Platform Designer by adding a directory that
contains generated hardware to the IP Search Path in the Platform Designer
Options dialog box.
4. Define Avalon-ST interfaces to build system components that Platform Designer
can join together.
Upstream and downstream components are part of the system outside of the DSP
Builder design.
5. Register all paths across the DSP Builder design to avoid algebraic loops.
A design may have multiple Avalon-ST input and output blocks.
6. Generate the Platform Designer system.
In the hw.tcl file, the name of the Avalon-ST masked subsystem block is the
name of the interface.
7. Add FIFO buffers on the output (and if required on the input) to build designs that
support backpressure, and declare the collected signals as an Avalon-ST
interface in the hw.tcl file generated for the device level.
These blocks do not enforce Avalon-ST behavior. They encapsulate the common
Avalon-ST signals into an interface.
Related Information
• Interfaces Library on page 275
• Streaming Library on page 286
3.7.3.2.2. Restrictions for DSP Builder Designs with Avalon-ST Interface Blocks
You can place the Avalon-ST interface blocks in different levels of hierarchy. However,
never place Simulink, IP or Primitive library blocks between the interface and the
device level ports.
The Avalon-ST interface specification only allows a single data port per interface. Thus
you may not add further data ports, or even use a vector through the interface and
device-level port (which creates multiple data ports).
To handle multiple data ports through a single Avalon-ST interface, pack them
together into a single (not vector or bus) signal, then unpack on the other side of the
interface. The maximum width for a data signal is 256 bits.
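The pack and unpack step can be pictured as plain bit concatenation. The following Python sketch is an illustration under assumptions: the helper names `pack` and `unpack` and the field order (first field in the most significant bits) are hypothetical, and only the 256-bit limit comes from the text above.

```python
MAX_DATA_WIDTH = 256  # maximum data signal width stated in the text

def pack(values, widths):
    """Concatenate unsigned fields into one integer; the first field ends up in the top bits."""
    if sum(widths) > MAX_DATA_WIDTH:
        raise ValueError("packed signal exceeds 256 bits")
    word = 0
    for value, width in zip(values, widths):
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit its field")
        word = (word << width) | value
    return word

def unpack(word, widths):
    """Split a packed integer back into its fields, in the same order used by pack."""
    values = []
    for width in reversed(widths):
        values.append(word & ((1 << width) - 1))
        word >>= width
    return list(reversed(values))

widths = [4, 8]                   # hypothetical field widths
word = pack([0x3, 0x1F], widths)  # one 12-bit signal crosses the interface
print(hex(word), unpack(word, widths))
```

Both sides of the interface must agree on the field widths and order, because the packed signal itself carries no structure.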
Related Information
Streaming Library on page 286
The Fibonacci sequence is the sequence of numbers that you can create when you add
1 to 0 then successively add the last two numbers to get the next number: 0, 1, 1, 2,
3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, ...
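As a software reference for the values the hardware model should produce, the recurrence can be written as a short Python function. This is a sketch for comparison only; the function name `fibonacci` is illustrative and is not part of the design example.

```python
def fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting from 0."""
    sequence = []
    previous, current = 0, 1
    for _ in range(count):
        sequence.append(previous)
        # The next number is the sum of the last two.
        previous, current = current, previous + current
    return sequence

print(fibonacci(16))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]
```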
Each Primitive library block in the design example is parameterizable. When you
double-click a block in the model, a dialog box appears where you can enter the
parameters for the block. Click the Help button in these dialog boxes to view help for
a specific block.
You can use the demo_fibonacci.mdl model in the <DSP Builder Advanced
install path>/Examples/Primitive directory or you can create your own
Fibonacci model.
1. Creating a Fibonacci Design from the DSP Builder Primitive Library on page 68
2. Setting the Parameters on the Testbench Source Blocks on page 70
3. Simulating the Fibonacci Design in Simulink on page 71
4. Modifying the DSP Builder Fibonacci Design to Generate Vector Signals on page
72
5. Simulating the RTL of the Fibonacci Design on page 72
Intel Corporation. All rights reserved. Intel, the Intel logo, Altera, Arria, Cyclone, Enpirion, MAX, Nios, Quartus
and Stratix words and logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other
countries. Intel warrants performance of its FPGA and semiconductor products to current specifications in
accordance with Intel's standard warranty, but reserves the right to make changes to any products and services
at any time without notice. Intel assumes no responsibility or liability arising out of the application or use of any
information, product, or service described herein except as expressly agreed to in writing by Intel. Intel
customers are advised to obtain the latest version of device specifications before relying on any published
information and before placing orders for products or services.
*Other names and brands may be claimed as the property of others.
ISO 9001:2015 Registered
4. Primitive Library Blocks Tutorial
9. Select both of the SampleDelay blocks and point to Rotate and Flip on the
popup menu and click Flip Block to reverse the direction of the blocks.
10. Drag and drop Add and Mux blocks into your model.
11. Drag and drop a Const block. Double-click the block and:
a. Select Specify via Dialog for Output data type mode.
b. For Output type enter ufix(1).
c. For Scaling enter 1.
d. For Value enter 1.
e. Click OK.
12. Connect the blocks.
13. Double-click on the second SampleDelay block (SampleDelay1) to display the
Function Block Parameters dialog box and change the Number of delays
parameter to 2.
14. Double-click on the Add block to display the Function Block Parameters dialog
box and set the parameters.
a. For Output data type mode, select Specify via Dialog.
b. For Output type enter ufix(120).
c. For Output scaling value enter 2^-0.
Related Information
• Starting DSP Builder in MATLAB on page 17
• DSP Builder Menu Options on page 20
fixdt(0,fibonacci_param.input_word_length,fibonacci_param.input_fraction_length)
7. Save the Fibonacci model.
Note: You can verify that the fib output continues to increment according to the
Fibonacci sequence by simulating for longer time periods.
The sequence on the fib output starts at 0, and increments to 1 when q_v and
q_c are both high at time 21.0. It then follows the expected Fibonacci sequence
incrementing through 0, 1, 1, 2, 3, 5, 8, 13 and 21 to 34 at time 30.0.
Related Information
• DSP Builder Generated Files on page 62
• Simulating, Verifying, Generating, and Compiling Your DSP Builder Design on page
21
2. Copy the real input block, add a Simulink mux and connect to the Convert block.
3. Edit the timing of the real1 block, for example [0 1 1 1 zeros(1,50)].
Related Information
Simulating, Verifying, Generating, and Compiling Your DSP Builder Design on page 21
5. IP Tutorial
This tutorial demonstrates how to use blocks from the DSP Builder IP library. It shows
how you can double the number of channels through a filter, increase the fMAX, and
target a different device family by editing top-level parameters in Simulink.
Related Information
• Starting DSP Builder in MATLAB on page 17
• DSP Builder Menu Options on page 20
Verify the design in MATLAB. Compile the design in the Quartus Prime software.
Related Information
• Simulating, Verifying, Generating, and Compiling Your DSP Builder Design on page
21
• DSP Builder Generated Files on page 62
Related Information
Simulating, Verifying, Generating, and Compiling Your DSP Builder Design on page 21
The design now closes timing above 480 MHz. At the higher clock rate, the design
shares multiplier resources, and the multiplier count decreases back to 6.
Related Information
• DSP Builder Menu Options
Simulink includes a DSP Builder menu on any Simulink model window. Use
this menu to easily start all the common tasks you need to perform on your
DSP Builder model.
• Creating an IP Design
5.5. Doubling the Target Clock Rate for a DSP Builder IP Design
Create an IP design.
1. Double-click the EditParams block to open my_firi.m in the MATLAB Editor.
Change my_firi_param.ClockRate to 480.0 and click Save.
2. Simulate the design.
3. Click DSP Builder ➤ Verify Design, and click Clear Results to clear the output
pane.
4. Click Run Verification.
Related Information
• DSP Builder Menu Options
Simulink includes a DSP Builder menu on any Simulink model window. Use
this menu to easily start all the common tasks you need to perform on your
DSP Builder model.
• Creating an IP Design
• Simulating the IP Design
All the design examples have the same basic structure: a top-level testbench
containing an instantiated functional subsystem, which represents the hardware
design.
The testbench typically includes Simulink source blocks that generate the stimulus
signals and sink blocks that display simulation results. You can use other Simulink
blocks to define the testbench logic.
The testbench also includes the following blocks from the DSP Builder advanced
blockset:
• The Control block specifies information about the hardware generation
environment, and the top-level memory-mapped bus interface widths.
• The ChanView block in a testbench allows you to visualize the contents of the
<valid, channel, data> time-division multiplex (TDM) protocol. This block
generates synthesizable HDL and can therefore also be useful in a functional
subsystem.
The functional subsystem in each design contains a Device block that marks the top-
level of the FPGA device and controls the target device for the hardware.
6. DSP Builder for Intel FPGAs (Advanced Blockset) Design Examples and Reference Designs
6.1.1. Scale
This design example demonstrates the Scale block.
The testbench allows you to see a vectorized block in action. Displays in the testbench
track the smallest and largest values to be scaled and verify the correct behavior of
the saturation modes.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output bus.
After simulation, in the resource table, you can compare the resources for NCO and
NCO1. NCO1 uses no multipliers at the expense of extra logic. The resource table also
contains resources for the ChannelViewer blocks, which are synthesizable blocks that
the design example uses outside the device system.
6.2.1. FFT
This design example implements a 2,048 point, radix 2^2 FFT. This design example
accepts natural order data at the input and produces natural order data at the output.
The design example includes a BitReverseCoreC block, which converts the input data
stream from natural order to bit-reversed order, and an FFT block, which performs an
FFT on bit-reversed data and produces its output in natural order.
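The reordering that a bit-reversal stage performs can be illustrated in Python. This is a generic sketch of bit-reversed addressing, not the BitReverseCoreC implementation; the function names are illustrative. For the 2,048 point case the index width would be 11 bits; the example uses 8 points to keep the output short.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def bit_reversed_order(samples):
    """Reorder a power-of-two-length list from natural order to bit-reversed order."""
    n = len(samples)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    return [samples[bit_reverse(i, bits)] for i in range(n)]

print(bit_reversed_order(list(range(8))))
# [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying the same permutation twice restores natural order, which is why the same kind of block can convert in either direction.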
Note: The FFT designs do not inherit the width in bits and scaling information. The design
example specifies these values with the Wordlength and FractionLength variables in
the setup script, which are 16 and 19 for this design example. You can also set the
maximum width in bits by setting the MaxOut variable. Most applications do not need
the maximum width in bits. To save resources, set a threshold value for this variable.
The default value of inf allows worst case bit growth.
6.2.3. IFFT
This design example implements a 2,048 point, radix 2^2 iFFT. This design example
accepts natural order data at the input and produces natural order data at the output.
The design example includes a BitReverseCoreC block, which converts the input data
stream from natural order to bit-reversed order, and an FFT block, which performs an
FFT on bit-reversed data and produces its output in natural order.
Note: The FFT designs do not inherit the width in bits and scaling information. The design
example specifies these values with the Wordlength and FractionLength variables in
the setup script, which are 16 and 19 for this design example. To set the maximum
width in bits, set the MaxOut variable. Most applications do not need the maximum
width in bits. To save resources, set a threshold value for this variable. The default
value of inf allows worst case bit growth.
The FFT accepts 16-bit fixed-point inputs. The FFT produces block floating-point
output using an 18-bit mantissa and a shared 6-bit exponent.
This FFT implementation is unusual because the FFT gain is 1. Therefore the sum-of-
squares of the input values is equal (allowing for rounding errors) to the sum-of-
squares of the output values.
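The sum-of-squares property can be checked numerically. The Python sketch below is not the DSP Builder FFT; it assumes that a gain of 1 corresponds to scaling a discrete Fourier transform by 1/sqrt(N), which is one scaling that preserves the sum of squares. The function names and the input vector are illustrative.

```python
import cmath
import math

def unit_gain_dft(x):
    """Direct DFT scaled by 1/sqrt(N), so that input and output energy match."""
    n = len(x)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def sum_of_squares(values):
    """Sum of squared magnitudes; works for real and complex values."""
    return sum(abs(v) ** 2 for v in values)

x = [0.5, -0.25, 0.125, 0.75, -0.5, 0.0, 0.25, -0.125]
X = unit_gain_dft(x)
print(sum_of_squares(x), sum_of_squares(X))
```

The two printed values agree to within floating-point rounding, which mirrors the "allowing for rounding errors" qualification in the text.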
To configure the design example, edit any of the parameters in the setup file.
This design example takes care of the faster sample rate needed by the DSP Builder
FFT. The setup file chooses a sample rate that is fast enough for calculation but not so
fast that it slows down the simulation unnecessarily. The design also adds buffering to
the original MATLAB fft signal path to make the signal processing delays the same in
both paths.
The incoming data is of fixed-point type and arrives in natural order. The number of
radix-2 stages assigned to the serial section of the hybrid FFT is 7.
View the DDCChip subsystem to see the components you require to build a complex,
production-ready system.
The top-level testbench includes Control and Signals blocks, and some Simulink
blocks to generate source signals and visualize the output. The full power of the
Simulink blocksets is available for your design.
The DDCChip subsystem block contains the following blocks that form the lowest level
of the design hierarchy:
• The NCO and mixer
• Decimate by 16 CIC filter
• Two decimate by 4 FIR odd-symmetric filters: one with length 21, the other with
length 63.
The other blocks in this subsystem perform a range of rounding and saturation
functions. They also allow dynamic scaling. The Device block specifies the target
FPGA.
The Signals block allows you to define the relationship between the sample rates and
the system clock, to tell the synthesis engines how much folding or time sharing to
perform. Increasing the system clock permits more folding, and therefore typically
results in more resource sharing, and a smaller design.
You also need a system clock rate so that the synthesis engines know how much to
pipeline the logic. For example, by considering the device and speed grade, the
synthesis tool can calculate the maximum length that an adder can have. If the design
exceeds this length, it pipelines the adder and adjusts the whole pipeline to
compensate. This adjustment typically results in a small increase in logic size, which is
usually more than compensated for by the decrease in logic size through increased
folding.
The Signals block specifies the clock and reset names, with the system clock
frequency. The bus clock or FPGA internal clock for the memory-mapped interfaces
can be run at a lower clock frequency. This lets the design move the low-speed
operations such as coefficient update completely off the critical path.
Note: To specify the clock frequency, clock margin, and bus clock frequency values in this
design, use the MATLAB workspace variables ClockRate and ClockMargin, which you
can edit by double-clicking on the Edit Params block.
The Control block controls the whole DSP Builder advanced blockset environment. It
examines every block in the system, controls the synthesis flow, and writes out all RTL
and scripts. A single control block must be present in every top-level model.
In this design, hardware generation creates RTL. DSP Builder places the RTL and
associated scripts in the directory ../rtl, which is a relative path based on the current
MATLAB directory. DSP Builder creates automatic self-checking testbenches, which
save the data that a Simulink simulation captures to build testbench stimulus for
each block in your design. DSP Builder generates scripts to run these simulations.
The threshold values control the hardware generation. They control the trade-offs
between hardware resources, such as hard DSP blocks or soft LE implementations of
multipliers. You can perform resource balancing for your particular design needs with a
few top-level controls.
The design contains many memory-mapped registers, such as filter coefficients and
control registers for gains. You can access these registers through a memory port that
DSP Builder automatically creates at the top-level of your design. DSP Builder can
create all address decode and data multiplexing logic automatically. DSP Builder
generates a memory map in XML and HTML that you can use to understand the
design.
To access this memory map, after simulation, on the DSP Builder menu, point to
Resource Usage and click Design, Current Subsystem, or Selected block. The
address and data widths are set to 8 and 32 in the design.
The EditParams block allows you to edit the script setup_demo_ddc.m, which sets
up the MATLAB variables that configure your model. Use the MATLAB design example
properties callback mechanism to call this script.
Note: The PreLoadFcn callback uses this script to set up the parameters when your design
example opens, and the InitFcn callback re-initializes your design example to take
account of any changes when the simulation starts.
The script sets up MATLAB workspace variables. The SampleRate variable is set to
61.44 MHz, which is typical of a CDMA system, and represents a quarter of the system
clock rate that the FPGA runs at. You can use this ratio to time-division multiplex
(TDM) four signals onto any given wire.
The Simulink environment enables you to create any required input data for your
design. In the DDC design, use manual switches to select sine wave or random noise
generators. DSP Builder encodes a simple six-cycle sine wave as a table in a
Repeating Sequence Stair block from the Simulink Sources library. This sine wave is
set to a frequency that is close to the carrier frequencies that you specify in the NCOs,
allowing you to see the filter lineup decoding some signals. DSP Builder creates VHDL
for each block as part of the testbench RTL.
Simulink Sink library blocks display the results of the DDC simulation. The Scope
block displays the raw output from the DDC design. The design has TDM outputs and
all the data shows as data, valid and channel signals.
At each clock cycle, the value on the data wire either carries a genuine data output,
or data that you can safely discard. The valid signal differentiates between these two
cases. If the data is valid, the channel wire identifies the channel where the data
belongs. Thus, you can use the valid and channel wires to filter the data. The
ChanView block automates this task and decodes 16 channels of data to output
channels 0 and 15. The block decimates these channels by the same rate as the whole
filter lineup and passes them to a spectrum scope block (OutSpectrum) that examines
the behavior in the frequency domain.
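The filtering that the valid and channel wires make possible can be sketched in Python. This models only the selection logic, not the ChanView block itself; the function name `demux` and the sample values are hypothetical.

```python
def demux(valid, channel, data, wanted):
    """Collect samples per channel from a TDM <valid, channel, data> stream.

    Cycles where valid is low are discarded, as are channels not in `wanted`.
    """
    outputs = {ch: [] for ch in wanted}
    for v, ch, d in zip(valid, channel, data):
        if v and ch in outputs:
            outputs[ch].append(d)
    return outputs

# One entry per clock cycle; 99 marks data that is not valid and can be discarded.
valid   = [1, 1, 0, 1, 1, 0, 1, 1]
channel = [0, 15, 0, 0, 15, 7, 0, 15]
data    = [5, 9, 99, 6, 10, 99, 7, 11]
print(demux(valid, channel, data, wanted=(0, 15)))
# {0: [5, 6, 7], 15: [9, 10, 11]}
```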
Two main blocks exist—the DecimatingCIC and the Scale block. To configure the CIC
Filter, double click on the DecimatingCIC block.
The input sample rate is still the same as the data from the antenna. The
dspb_ddc.SampleRate variable specifies the input sample rate. The number of
channels, dspb_ddc.ChanCount, is a variable set to 16. The CIC filter has 5 stages,
and performs decimation by a factor of 16. 1/16 in the dialog box indicates that the
output rate is 1/16th of the input sample rate. The CIC parameter differential delay
controls how many delays each CIC section uses—nearly always set to 1.
The CIC has no registers to configure, therefore no memory map elements exist.
The input data is a vector of four elements, so DSP Builder builds the decimating CIC
from four separate CICs, each operating on four channels. The decimation behavior
reduces the data rate at the output, therefore all 16 data samples (now at 61.44/16
MSPS each channel) can fit onto 1 wire.
The DecimatingCIC block multiplexes the results from each of the internal CIC filters
onto a single wire. That is, four channels from vector element 1, followed by the four
channels from vector element 2. DSP Builder packs the data onto a single TDM wire.
Data is active for 25% of the cycles because the aggregate sample rate is now 61.44
MSPS × 16 channels/16 decimation = 61.44 MSPS and the clock rate for the system is
245.76 MHz.
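The 25% figure follows directly from the rates quoted above, as the short Python calculation below shows. The function name `tdm_utilization` is illustrative; the numbers are the ones given in the text.

```python
def tdm_utilization(sample_rate_msps, channels, decimation, clock_mhz):
    """Return the aggregate output sample rate and the fraction of clock cycles carrying data."""
    aggregate_msps = sample_rate_msps * channels / decimation
    return aggregate_msps, aggregate_msps / clock_mhz

aggregate, utilization = tdm_utilization(
    sample_rate_msps=61.44,  # input rate per channel
    channels=16,
    decimation=16,
    clock_mhz=245.76,        # system clock rate
)
print(f"{aggregate:.2f} MSPS aggregate, {utilization:.0%} of clock cycles carry data")
```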
Bursts of data occur, with 16 contiguous samples followed by a gap. Each burst is
tagged with the valid signal. Also the channel indicator shows that the channel order is
0..15.
The number of input integrator sections is 4, and the number of output comb sections
is 1. The lower data rate reduces the size of the overall group of 4 CICs. The Help
page also reports the gain for the DCIC to be 1,048,576 (2^20). The
Help page also shows how DSP Builder combines the four channels of input data on a
single output data channel. The comb section utilization (from the DSP Builder menu)
confirms the 25% calculation for the folding factor.
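The reported gain matches the standard DC gain expression for a CIC decimator, (R × M)^N, where R is the rate change, M the differential delay, and N the number of stages. The Python sketch below applies it to the values stated for this design (R = 16, M = 1, N = 5); the function name and the derivation of the shift amount are illustrative rather than taken from the Help page.

```python
def cic_gain(rate_change, differential_delay, stages):
    """DC gain of a CIC decimator: (R * M) ** N."""
    return (rate_change * differential_delay) ** stages

gain = cic_gain(rate_change=16, differential_delay=1, stages=5)

# For a power-of-two gain, the compensating shift is minus the exponent.
shift_left = -(gain.bit_length() - 1)
print(gain, shift_left)
# 1048576 -20
```

A left shift of -20 bits is a right shift of 20 bits, which divides by 2^20 and brings the overall gain back to 1.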
The Scale block reduces the output width in bits of the CIC results.
In this case, the design requires no variable shifting operation, so it uses a Simulink
constant to tie the shift input to 0. However, because the gain through the
DecimatingCIC block is approximately 2^20, the output data requires division by
that factor. To perform this division, enter a scalar value of -20 for the Number
of bits to shift left in the dialog box.
Note: Enter a scalar rather than a vector value to indicate that the scaling is static.
The last part of the DDC datapath comprises two decimating finite impulse response
(FIR) blocks (DecimatingFIR1 and DecimatingFIR2) and their corresponding scale
blocks (Scale1 and Scale2).
These two stages are very similar: the first filter typically compensates for the
undesirable pass band response of the CIC filter, and the second FIR fine tunes the
response that the waveform specification requires.
The input rate per channel is the output sample rate of the decimating CIC, which is
16 times lower than the raw sample rate from the antenna.
Note: You can enter any MATLAB expression, so the value 16 can be extracted as a
variable to provide additional parameterization of the whole design.
This filter performs decimation by a factor of 4 and the calculations reduce the size of
the FIR filter. The filter processes 16 channels and the coefficients are symmetrical.
This simple design uses a low-pass filter. In a real design, more careful generation of
coefficients may be necessary.
The output of the FIR filter fits onto a single wire, but because the data reduces
further, there is a longer gap between frames of data.
Access a report on the generated FIR filter from the Help page.
You can scroll down in the Help page to view the port interface details. These match
the hardware block, although the RTL has additional ports for clock, reset, and the bus
interface.
The report shows that the input data format uses a single channel repeating every
64 clock cycles and the output data is on a single channel repeating every 256 clock
cycles.
Details of the memory map include the addresses DSP Builder requires to set up the
filter parameters with an external microprocessor.
You can show the total estimated resources by clicking on the DSP Builder menu,
pointing to Resources, and clicking Device. Intel estimates this filter to use 338
LUT4s, 1 18×18 multiplier and 7844 bits of RAM.
The Scale1 block that follows the DecimatingFIR1 block performs a similar function
to the DecimatingCIC block.
fi(fir1(62, 0.22),1,16,15)
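The MATLAB expression above designs a 63-tap low-pass filter and quantizes the coefficients to a signed 16-bit format with 15 fraction bits. The Python sketch below approximates that idea with a Hamming-windowed sinc design and round-to-nearest quantization with saturation. It is an assumption-laden illustration, not a bit-exact replacement for fir1 or fi, and the function names are hypothetical.

```python
import math

def lowpass_taps(order, cutoff):
    """Hamming-windowed sinc low-pass; cutoff is normalized so that 1.0 is the Nyquist frequency."""
    middle = order / 2
    taps = []
    for n in range(order + 1):
        t = n - middle
        ideal = cutoff if t == 0 else math.sin(math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / order)
        taps.append(ideal * window)
    total = sum(taps)
    return [tap / total for tap in taps]  # normalize to unity gain at DC

def quantize(value, word_length, fraction_length):
    """Round to a signed fixed-point integer code and saturate to the word length."""
    code = round(value * (1 << fraction_length))
    lowest = -(1 << (word_length - 1))
    highest = (1 << (word_length - 1)) - 1
    return max(lowest, min(highest, code))

codes = [quantize(tap, 16, 15) for tap in lowpass_taps(62, 0.22)]
print(len(codes), max(codes))
```

Each integer code represents the coefficient value code / 2^15; the center tap is the largest, as expected for a symmetric low-pass response.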
The DDCChip subsystem contains a Device block. This block labels this level of
design hierarchy that compiles onto the FPGA. The Device block sets the FPGA family,
device and speed grade. The family and speed grade optimize the hardware. In
combination with the target clock frequency, the device determines the degree of
pipelining.
Note: The inputs, NCO, and mixer stages show with Simulink signal formats turned on.
The primary inputs to the hardware are two parallel data signals (DataInMain and
DataInDiversity), a channel signal (DataChan), and a valid signal (DataValid).
The parallel data signals represent inputs from two antennas. They are of type
sfix14_13 which is a Simulink fixed-point type of total width 14 bits. The type is
signed with 13 bits of fraction, which is a typical number format that an analog-to-
digital converter generates.
The data channel DataChan is always an 8-bit unsigned integer (uint8) and DSP
Builder synthesizes away the top bits if not used. The valid signal DataValid
indicates when real data transmits. The first rising edge of the valid signal starts
operation of the first blocks in the chain. As the first blocks start producing outputs,
their valid outputs start the next blocks in the chain. This mechanism ensures that
filter chain start up is coordinated without having a global controller for the latencies
of each block. The actual latencies of the blocks may change based on the clock
frequency and FPGA selection.
The IP blockset supports vectors on its input and output data wires, which ensures
that a block diagram is scalable when, for example, changing channel counts and
operating frequencies. The merge multiplexer (DDCMerge1) takes two individual
wires and combines them into a vector wire of width 2. This Simulink Mux block does
not perform any multiplexing in hardware; it acts only as a vectorizing block. If you
examine the RTL, it contains just wires.
DDC NCO
The NCO block generates sine and cosine waveforms to a given precision. These
waveforms represent a point in the complex plane rotating around the origin at a
given frequency. DSP Builder multiplies this waveform by the incoming data stream to
obtain the data from the transmitted signal.
Note: Four frequencies exist, because the vector in the Phase Increment and Inversion
field is of length 4.
DSP Builder configures the NCO block to produce a signed 18-bit value with 17 bits of
fraction. The internal accumulator width is set to 24 bits. This internal precision affects
the spurious-free dynamic range (SFDR). DSP Builder specifies the initial frequencies
for the simulation as phase increments. The phase accumulator is 24 bits wide,
thus one complete revolution of the unit circle corresponds to a value of 2^24.
Dividing this number by 5.95 gives a phase increment for which the design requires
5.95 cycles to perform one complete rotation. That is, the period of the sine and
cosine that the design produces is 5.95 cycles. The sample rate is 61.44 MHz,
therefore the frequency is 61.44/5.95, which is approximately 10.33 MHz.
The input frequency in the testbench rotates every 6 cycles for a frequency of
61.44/6=10.24 MHz. Therefore, because DSP Builder mixes these signals, you can
expect to recover the difference of these frequencies (approximately 0.086 MHz or
86 kHz), which falls in the low-pass filters' passbands.
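The phase-increment and frequency arithmetic above can be reproduced with a short Python sketch. The function and variable names are illustrative only and do not come from the design's setup script, and the sketch assumes the phase increment is rounded to the nearest integer.

```python
# Illustrative names; the numeric values are taken from the text above.
ACC_BITS = 24              # phase accumulator width in bits
SAMPLE_RATE_MHZ = 61.44    # NCO sample rate
CYCLES_PER_ROTATION = 5.95

def phase_increment(cycles_per_rotation, acc_bits=ACC_BITS):
    """Integer phase increment that completes one rotation in the given cycles."""
    return round(2 ** acc_bits / cycles_per_rotation)

def nco_frequency_mhz(increment, sample_rate_mhz=SAMPLE_RATE_MHZ, acc_bits=ACC_BITS):
    """Output frequency produced by an integer phase increment."""
    return sample_rate_mhz * increment / 2 ** acc_bits

inc = phase_increment(CYCLES_PER_ROTATION)
f_nco = nco_frequency_mhz(inc)
f_in = SAMPLE_RATE_MHZ / 6            # testbench tone rotates every 6 cycles
difference_khz = (f_nco - f_in) * 1000
print(inc, f_nco, f_in, difference_khz)
```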
The design exposes phase values through a memory-mapped interface at the address
specified by the variable DDC_NCO_PHASE_INCR, which is set to address 0x0000 in
the setup script. After simulation, to view resource usage for the design example, the
subsystem, or a selected block, on the DSP Builder menu, point to Resource Usage
and click Design, Current Subsystem, or Selected block.
For each register, DSP Builder reports the name, width, reset value, and address. This
report collates all the registers from your design into a single location.
You can view the estimated results for this NCO configuration in the Results tab of
the dialog box.
Based on the selected accumulator width and output width, DSP Builder calculates an
estimated SFDR and accumulator precision. To verify this precision in a separate
testbench, use demo_nco.mdl as a start.
DDC Mixer
The Mixer block performs the superheterodyne operation by multiplying each of the
two received signals (DataInMain and DataInDiversity) by each of the four
frequencies. This action produces eight complex signals or 16 scalar signals (the 16
channels in the DDC design).
The mixer requires sufficient multipliers to perform this calculation. The total number
of real × complex multipliers required for each sample is 2 signals × 4 frequencies =
8.
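As a rough illustration of the multiplier count, the following Python sketch mixes two real inputs with four complex tones. The sample values, the tone periods, and the cos - j*sin sign convention are assumptions made for the example; they are not taken from the design.

```python
import cmath
import math

def mix(samples, cycles_per_rotation):
    """Multiply real samples by a complex exponential with the given period."""
    return [x * cmath.exp(-2j * math.pi * n / cycles_per_rotation)
            for n, x in enumerate(samples)]

# Two real inputs and four NCO tone periods (all values are illustrative).
inputs = {
    "DataInMain": [1.0, 0.5, -0.5, -1.0],
    "DataInDiversity": [0.0, 1.0, 0.0, -1.0],
}
periods = [5.95, 6.5, 7.0, 8.0]

# One real x complex multiplication per (input, frequency) pair.
mixed = {(name, p): mix(x, p) for name, x in inputs.items() for p in periods}
num_complex = len(mixed)        # 2 signals x 4 frequencies
num_scalar = 2 * num_complex    # separate I and Q parts
```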
After simulation, to view resource usage for the design example, the subsystem, or a
selected block, on the DSP Builder menu, point to Resource Usage and click Design,
Current Subsystem, or Selected block.
You can list the input and output ports that DSP Builder creates for this block, with the
data width and brief description, by right-clicking on the block and clicking Help. DSP
Builder suffixes the vector inputs with 0 and 1 to implement the vector. This list of
signals corresponds to the signals in the VHDL entity.
DSP Builder provides the results for the mixer as separate in phase and quadrature
outputs—each is a vector of width 2. It performs the remaining operations on both the
I and Q signals, so that DSP Builder can combine them with another Simulink
multiplexer to provide a vector of width 4. This vector carries the 16 signals, with a
folding factor of 4. At this point, the channel count cycles through 0, 1, 2, 3, 0, 1, ....
At this point in the datapath, the data width is 32 bits representing the full precision
output of multiplying a 14-bit data signal with an 18-bit sine or cosine signal. DSP
Builder needs to reduce the data width to a lower precision to pass on to the
remaining filters, which reduces the resource count considerably, and does not cause
significant information loss. The Scale3 block performs a shift-round-saturate
operation to achieve this reduction. The shift is usually a 1 or 2 bit shift that you can
set to adjust the gain in your design at run time.
Typically, a microprocessor determines the setup by writing to a
register to set the shift amount. This design uses a RegField block
(Mixer_Scaling_Register). This block behaves like a constant in the Simulink
simulation, but in hardware the block performs as a processor-writable register that
initializes to the value in your design example.
The register produces a 2-bit output of type ufix(2), an unsigned fixed-point number.
The scaling is 2^-0, so the output is, in effect, a 2-bit unsigned integer. These 2 bits are mapped
into bits 0 and 1 of the word (another register may use other bits of this same
address). The initial value for the register is set to 0. DSP Builder provides a
description of the memory map in the resource usage. Sometimes, Simulink needs an
explicit sample time, but you can use the default value of –1 for this tutorial.
The 2-bit unsigned integer is fed to the Scale3 block. This block has a vector of width
4 as its data input. The Scale3 block builds a vector of 4 internal scale units. These
parameters are not visible through the user interface, but you can see them in the
resource usage.
The block produces four outputs, which DSP Builder presents at the output as a vector
of width 4. DSP Builder preserves the order in the vector. You can create quite a large
block of hardware by passing many channels through an IP block. The exception output
of the scale block provides signals that indicate when saturation occurs. This design
does not require these signals, so it terminates them.
The design sets the output format to 16-bit signed with 15 bits of fraction and uses
the Unbiased rounding method. This method (convergent rounding or round-to-even)
typically avoids introducing a DC bias.
The saturation method is Symmetric, which clips values to within
+0.9999 and –0.9999 (for example) rather than clipping to –1. Again, this avoids
introducing a DC bias.
The number of bits to shift is a vector of values that the scaling register block
(Mixer_Scaling_Register) indexes. The vector has 4 values, therefore DSP Builder
requires a 2-bit input.
An input of 0 uses the 0th value in the vector (address 1 in Simulink), and so on.
Therefore, in this example an input of 0 shifts by 0 and the result at the output has the
same numerical range as the input. An input of 1 shifts left by 1, and so multiplies the input
value by 2, thus increasing the gain.
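The shift-round-saturate behavior described above can be sketched in Python as follows. This is not the Scale3 implementation: the function names and the contents of SHIFT_TABLE are hypothetical, and the 30 input fraction bits are an assumption derived from multiplying a value with 13 fraction bits by one with 17 fraction bits.

```python
def shift_round_saturate(x, shift, in_frac=30, out_bits=16, out_frac=15):
    """Shift left, round half to even, then saturate symmetrically.

    x is a signed integer holding a fixed-point value with in_frac fraction bits.
    """
    value = x << shift
    drop = in_frac - out_frac            # number of LSBs to discard
    q = value >> drop                    # floor division by 2**drop
    rem = value & ((1 << drop) - 1)      # non-negative remainder
    half = 1 << (drop - 1)
    if rem > half or (rem == half and (q & 1)):
        q += 1                           # ties go to the even neighbour
    limit = (1 << (out_bits - 1)) - 1    # symmetric limits, +/-32767 for 16 bits
    return max(-limit, min(limit, q))

SHIFT_TABLE = [0, 1, 2, 3]  # hypothetical contents of the 4-entry shift vector

def scale(vector, select):
    """Apply the shift chosen by the 2-bit register value to every element."""
    return [shift_round_saturate(x, SHIFT_TABLE[select]) for x in vector]
```

Because the limits are symmetric, the most negative 16-bit code (-32768) is never produced, which mirrors the reason the text gives for avoiding a DC bias.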
Note: If you turn on the Generate Hardware option in the parameters for the
Control block, every time the simulation runs, DSP Builder synthesizes the
underlying hardware, and writes out VHDL into the directory you specify.
3. Simulate the generated RTL in the ModelSim simulator.
4. Synthesize and fit the RTL in the Quartus Prime software.
Figure 44. Generated Directory Structure for the DDC Design Example
Note: Separate subdirectories exist corresponding to each hierarchical level in your design.
rtl directory
demo_ddc_entity.xml An XML file that describes the boundaries of the system (for Signal
Compiler in designs that combine blocks from the standard and advanced
blocksets).
rtl\demo_ddc subdirectory
<block name>.xml An XML file containing information about each block in the advanced
blockset which translates into HTML on demand for display in the MATLAB
Help viewer and for use by the DSP Builder menu options.
demo_ddc.add.tcl This script loads the VHDL files in this subdirectory and in the subsystem
hierarchy below it into the Quartus Prime project.
demo_ddc.qip This file contains all the assignments and other information DSP Builder
requires to process the demo_ddc design example in the Quartus Prime
software. The file includes a reference to the .qip file in the DDCChip
subsystem hierarchy.
<block name>.vhd DSP Builder generates a VHDL file for each component in your model.
demo_ddc_DDCChip_entity.xml An XML file that describes the boundaries of the DDCChip subsystem as a
black box (for Signal Compiler in designs that combine blocks from the
standard and advanced blocksets).
DDCChip.xml An XML file that describes the attributes of the DDCChip subsystem.
safe_path.vhd Helper function that ensures a pathname is read correctly in the Quartus
Prime software.
rtl\demo_ddc\<subsystem> subdirectories
Separate subdirectories exist for each hierarchical level in your design. These subdirectories include
additional .xml, .vhd, .qip, and .stm files describing the blocks contained in each level. Additional
.do and .tcl files also exist, which DSP Builder automatically calls from the corresponding files in the
top level of your model.
<subsystem>_atb.do Script that loads the subsystem automatic testbench into ModelSim.
<subsystem>_atb.wav.do Script that loads signals for the subsystem automatic testbench into
ModelSim.
<subsystem>/<block>/*.hex Intel format .hex files that initialize the RAM blocks in your design for
either simulation or synthesis.
<subsystem>_hw.tcl A Tcl script that loads the generated hardware into Platform Designer.
Figure 45. Simulation Results in the ModelSim Wave Window for the DDC Design
Example
13. Interpolating FIR Filter with Multiple Coefficient Banks
14. Interpolating FIR Filter with Updating Coefficient Banks
15. Root-Raised Cosine FIR Filter
16. Single-Rate FIR Filter
17. Super-Sample Decimating FIR Filter
18. Super-Sample Fractional FIR Filter
19. Super-Sample Interpolating FIR Filter
20. Variable-Rate CIC Filter
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output buses. An
Edit Params block allows easy access to the setup variables in the
setup_demo_dcic.m script.
Note: This design example uses the Simulink Signal Processing Blockset.
This design example uses the Decimating FIR block to build a 20-channel decimate
by 5, 49-tap FIR filter with a target system clock frequency of 240 MHz.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output buses. An
Edit Params block allows easy access to the setup variables in the
setup_demo_fird.m script.
The FilterSystem subsystem includes the Device and Decimating FIR blocks.
Note: This design example uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output buses. An
Edit Params block allows easy access to the setup variables in the
setup_demo_filters_flow_control.m script.
Note: This design example uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output buses. An
Edit Params block allows easy access to the setup variables in the
setup_demo_fir_fractional.m script.
Note: This design example uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output buses. An
Edit Params block allows easy access to the setup variables in the
setup_demo_firf.m script.
Note: This design example uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output buses. An
Edit Params block allows easy access to the setup variables in the
setup_demo_firih.m script.
The FilterSystem subsystem includes the Device block and two separate
InterpolatingFIR blocks for the regular and interpolating filters.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output buses. An
Edit Params block allows easy access to the setup variables in the
setup_demo_icic.m script.
Note: This design example uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output buses. An
Edit Params block allows easy access to the setup variables in the
setup_demo_firi.m script.
Note: This design example uses the Simulink Signal Processing Blockset.
Multiple sets of coefficients require storage in memory so that the design can switch
easily from one set, or bank, of coefficients in use to another in a single clock cycle.
You specify the coefficient array as a matrix rather than a vector: (number of banks) rows
by (number of coefficients) columns.
The addressing scheme has address offsets of base address + (bank number *
number of coefficients for each bank).
If the number of rows is greater than one, DSP Builder creates a bank select input
port on the FIR filter. In a design, you can drive this input from either data or bus
interface blocks, allowing either direct or bus control. The data type is unsigned
integer of width ceil(log2(number of banks)).
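The addressing rule and the bank select width can be illustrated with a small Python sketch. The function names, the base address, and the example coefficient values are hypothetical; only the two formulas come from the text above.

```python
def coefficient_address(base, bank, index, coeffs_per_bank):
    """Address of one coefficient: base + (bank * coefficients per bank) + index."""
    return base + bank * coeffs_per_bank + index

def bank_select_width(num_banks):
    """ceil(log2(num_banks)), computed without floating point."""
    return (num_banks - 1).bit_length()

# A coefficient matrix with one row per bank (illustrative values).
coeffs = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [8, 7, 6, 5, 4, 3, 2, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
num_banks = len(coeffs)
coeffs_per_bank = len(coeffs[0])
```

For this three-bank example the bank select input is 2 bits wide, and the first coefficient of bank 2 sits 16 addresses above the base address.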
The bank select is a single signal. For example, for a FIR filter with four input channels
over two timeslots:
<0><1>
<2><3>
<0><1>
Here the design receives more than one channel at a time, but can only choose a
single bank of coefficients. Channels 0 and 2 use one set of coefficients and channels
1 and 3 another. Channel 0 cannot use a different set of coefficients from channel 2 in
the same filter.
For multiple coefficient banks, you enter an array of coefficients sets, rather than a
single coefficient set. For example, for a MATLAB array of 1 row and 8 columns [1 x
8], enter:
Therefore, DSP Builder determines the number of banks from the number of rows, and you
need not specify the number of banks separately. If the number of banks is greater than 1,
DSP Builder adds a bank select input to the block.
Write to the bus interface using the BusStimulus block with a sample rate
proportionate to the bus clock. Generally, DSP Builder does not guarantee bus
interface transactions to be cycle accurate in Simulink simulations. However, in this
design example, DSP Builder updates the coefficient bank while it is not in use.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output buses. An
Edit Params block allows easy access to the setup variables in the
setup_demo_fir_rrc.m script.
The FilterSystem subsystem includes the Device and Decimating FIR blocks.
Note: This design example uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output buses. An
Edit Params block allows easy access to the setup variables in the
setup_demo_firs.m script.
Note: This design example uses the Simulink Signal Processing Blockset.
The input sample rate is six times the clock rate. The filter decimates the input
sample rate by two to three times the clock rate, which is visible in the vector input and
output data connections. The filter receives six samples in parallel at the input and
outputs three samples each cycle.
The input sample rate is two times the clock rate. The filter upconverts the input
sample rate to three times the clock rate, which is visible in the vector input and
output data connections. The filter receives two samples in parallel at the input and
outputs three samples each cycle.
The input sample rate is twice the clock rate, and the filter interpolates it by three
to six times the clock rate, which is visible in the vector input and output data
connections. The filter receives two samples in parallel at the input and outputs six
samples each cycle.
Note: This design example uses the Simulink Signal Processing Blockset.
You can control the rate change with a register field, which is part of the control
interface. The register field controls the generation of a valid signal that feeds into the
differentiators.
The design example also contains a gain compensation block that compensates for the
rate-change-dependent gain of the CIC. It shifts the input up so that the MSB at the
output is always at the same position, regardless of the rate change that you select.
The associated setup file contains parameters for the minimum and maximum
decimation rate, and calculates the required internal data widths and the scaling
number. To change the decimation factor for simulation, set the variable CicDecRate
to the desired current decimation rate.
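One way to picture the gain compensation is the Python sketch below. It assumes a textbook N-stage CIC with a differential delay of 1, whose gain is the decimation rate raised to the number of stages; the setup file may calculate the scaling differently, and the function names are hypothetical.

```python
def cic_bit_growth(rate, stages):
    """ceil(log2(rate**stages)): extra bits needed to hold the CIC gain."""
    return (rate ** stages - 1).bit_length()

def compensation_shift(rate, max_rate, stages):
    """Left shift at the input that aligns the output MSB with the max-rate case."""
    return cic_bit_growth(max_rate, stages) - cic_bit_growth(rate, stages)

# Example: 3 stages, maximum decimation 32, currently decimating by 8.
shift = compensation_shift(rate=8, max_rate=32, stages=3)
```

With these example numbers the gain at the maximum rate needs 15 extra bits and the gain at the current rate needs 9, so the sketch shifts the input up by 6 bits.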
Note: Intel has not tested this design on hardware and Intel does not provide a model of a
motor.
Functional Description
An encoder measures the rotor position in the motor, which the FPGA then reads. An
analog-to-digital converter (ADC) measures current feedback, which the FPGA then
reads.
[Figure: block diagram of the motor control system. Labels: SOPC Builder; Nios II Processor; Ethernet MAC; Industrial Ethernet PHY; Position, Speed, and Current Control for AC Motors Example Design in DSP Builder; IGBT Control Interface; ADC Interface; Encoder Interface; Power Stage; ADC; Position Encoder; AC Motor.]
Each of the FOC, speed, and position feedback loops uses a simple PI controller to
reduce the steady state error to zero. In a real-world PI controller, you may also need
to consider integrator windup and tune the PI gains appropriately. The feedback loops
for the integral portion of the PI controllers are internal to the design.
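The following Python sketch shows one common way to limit integrator windup in a PI controller, by clamping the integrator to the output limits. It is a generic illustration and is not the controller in the design example; the class name, gains, and limits are assumptions.

```python
class PIController:
    """PI controller with output limits and a clamped integrator."""

    def __init__(self, kp, ki, out_min, out_max):
        self.kp, self.ki = kp, ki
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0

    def _clamp(self, value):
        return max(self.out_min, min(self.out_max, value))

    def update(self, setpoint, measured, dt):
        error = setpoint - measured
        # Clamp the integrator so it cannot wind up beyond the output limits.
        self.integral = self._clamp(self.integral + self.ki * error * dt)
        return self._clamp(self.kp * error + self.integral)
```

In a loop such as `u = pi.update(target, feedback, dt)`, the integral term removes the steady-state error, while the clamp keeps the stored integral bounded when the output saturates.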
The example assumes you sample the inputs at a rate of 100 kHz and the FPGA clock
rate is 100 MHz (suitable for Cyclone IV devices). ALU folding reduces the resource
usage by sharing operators such as adders, multipliers, and cosine blocks. The folding
factor is set to 100 to allow each operator to be timeshared up to 100 times, which gives
an input sample rate of 1 Msps, but as the real input sample rate is 100 ksps, only one out
of every ten input timeslots is used. The valid_in signal is 1 during the used timeslots.
Use valid_in to enable the latcher in the PI controller, which stores
data for use in the next valid timeslot. The valid_out signal indicates when the
ChannelOut block has valid output data. You can calculate nine additional channels
on the same design without incurring extra latency (or extra FPGA resources).
You should adjust the folding factor to see the effect it has on hardware resources and
latency. To adjust, change the Sample rate (MHz) parameter in the ChannelIn and
ChannelOut blocks of the design either directly or change the FoldingFactor
parameter in the setup script. For example, a clock frequency of 100 MHz and sample
rate of 10 MHz gives a folding factor of 10. Disabling folding, or setting the factor to 1,
results in no resource sharing and minimal latency. Generally, you should not set the
folding factor greater than the number of shareable operators, that is, for 24 adders
and 50 multipliers, use a maximum folding factor of 50.
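The relationship between clock rate, sample rate, and folding factor can be checked with a few lines of Python. The function names are illustrative and are not parameters of the setup script.

```python
def folding_factor(clock_hz, sample_rate_hz):
    """Number of clock cycles available per input sample."""
    return int(clock_hz // sample_rate_hz)

def timeslots_per_real_sample(clock_hz, folding, real_sample_rate_hz):
    """Input timeslots offered by the folded design for each real input sample."""
    folded_sample_rate = clock_hz / folding
    return folded_sample_rate / real_sample_rate_hz

# 100 MHz clock and 10 MHz sample rate, as in the example in the text.
ff = folding_factor(100e6, 10e6)
# Folding factor 100 at 100 MHz with a real input rate of 100 ksps.
slots = timeslots_per_real_sample(100e6, 100, 100e3)
```

Here `ff` is 10, and `slots` is 10.0, which corresponds to one used timeslot out of every ten.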
Note: The testbench does not support simulations if you adjust the folding factor.
The control algorithm, with the FOC, position, and speed control loops, varies the desired
position across time. The three control loops are parameterized with minimum and
maximum limits, and PI values. These values are not optimized and are for
demonstration only.
Resource Usage
Table 16. Position, Speed, and Current Control for AC Motors Design Example Resource Usage
Folding Factor   Add and Sub Blocks   Mult Blocks   Cos Blocks   Latency
No folding       22                   22            4            170
>22              1                    1             1            279
Hardware Generation
When hardware generation is disabled, the Simulink system simulates the design at
the external sample rate of 100 kHz, so that it outputs new values at a rate of 100
kHz. When hardware generation is enabled, the design simulates at the FPGA clock
rate (100 MHz), which represents real-life latency clock delays, but it still outputs
new values only at a rate of 100 kHz. This mode slows the system simulation speed greatly as the
model is evaluated 1,000 times for every output. The setup script for the design
example automatically detects whether hardware generation is enabled and sets the
sample rates accordingly. The example is configured with hardware generation
disabled, which allows fast simulations. When you enable hardware generation, set a
very small simulation time (for example 0.0001 s) as simulation may be very slow.
6.5.2. Position, Speed, and Current Control for AC Motors (with ALU Folding)
The position, speed, and current control for AC motors (with ALU folding) design
example is a FOC algorithm for AC motors, which is identical to the position, speed,
and current control for AC motors design example. However, this design example uses
ALU folding.
The design example targets a Cyclone V device (speed grade 8). Cyclone V devices
have distributed memory (MLABs). ALU folding uses many distributed memory
components. ALU folding performs better in devices that have distributed memories,
rather than devices with larger block memories.
dspb_psc_ctrl.SampleRateHz = 10000 Sample rate. The default is 10000, which is a 10 kHz sample rate.
dspb_psc_ctrl.ClockRate = 100 FPGA clock frequency. The default is 100, which is a 100 MHz clock.
When you run this design example without folding, the DSP Builder system operates
at the same 10 kHz sample rate. Therefore, the system calculates a new packet of
data for every Simulink sample. Also, the sample times of the testbench are the same
as the sample times for the DSP Builder system.
The Rate Transition blocks translate between the Simulink testbench and the DSP
Builder system. These blocks allow Simulink to manage the different sample times
that the DSP Builder system requires. You need not modify the design example when
you run designs with or without folding.
The Rate Transition blocks produce Simulink samples with a sample time of
dspb_psc_ctrl.SampleTime for the testbench and
dspb_psc_ctrl.DSPBASampleTime for the DSP Builder system. The samples are in
the stimuli system, within the dummy motor. To hold the data consistent at the inputs
to the Rate Transition blocks for the entire length of the output sample
(dspb_psc_ctrl.SampleTime), turn on Register Outputs.
The data valid signal consists of a one Simulink sample pulse that signifies the
beginning of a data packet followed by zero values until the next data sample, as
required by ALU folding. The design example sets the period of this pulsing data valid
signal to the number of Simulink samples for the DSP Builder system (at
dspb_psc_ctrl.DSPBASampleTime) between data packets. This value is
dspb_psc_ctrl.SampleTime/dspb_psc_ctrl.DSPBASampleTime.
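The shape of the pulsing data valid signal can be sketched in Python as below. The function name and the example sample times (a 10 kHz sample rate and a 100 MHz clock) are assumptions for illustration; the design example builds this signal in Simulink.

```python
def data_valid_pattern(sample_time, dspba_sample_time, packets):
    """One-sample pulse at the start of each packet, followed by zeros."""
    period = round(sample_time / dspba_sample_time)
    one_packet = [1] + [0] * (period - 1)
    return one_packet * packets

# Assumed values: 1e-4 s between data packets, 1e-8 s DSP Builder sample time.
valid = data_valid_pattern(1e-4, 1e-8, packets=2)
```

With these assumed values, each pulse is followed by 9,999 zero samples before the next data packet begins.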
The verification script within ALU folding uses the To Workspace blocks. The
verification script searches for To Workspace blocks on the output of systems to fold.
The script uses these blocks to record the outputs from both the design example with
and without folding. The script compares the results with respect to valid outputs. To
run the verification script, enter the following command at the MATLAB prompt:
Folder.Testing.RunTest('psc_ctrl_alu');
The direct current component (0 degrees) is set to zero. The algorithm involves the
following steps:
• Converting the 3-phase feedback current inputs and the rotor position from the
encoder into quadrature and direct current components with the Clarke and Park
transforms.
• Using these current components as the inputs to two proportional and integral (PI)
controllers running in parallel to control the direct current to zero and the
quadrature current to the desired torque.
• Converting the direct and quadrature current outputs from the PI controllers back
to 3-phase currents with inverse Clarke and Park transforms.
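The transform steps listed above can be sketched in Python. This sketch uses the amplitude-invariant form of the Clarke transform and one common sign convention for the Park transform; the design example may use a different scaling or convention, and the function names are illustrative.

```python
import math

def clarke(a, b, c):
    """3-phase currents to stationary alpha/beta components."""
    alpha = (2.0 * a - b - c) / 3.0
    beta = (b - c) / math.sqrt(3.0)
    return alpha, beta

def park(alpha, beta, theta):
    """Stationary alpha/beta to rotating direct/quadrature components."""
    d = alpha * math.cos(theta) + beta * math.sin(theta)
    q = -alpha * math.sin(theta) + beta * math.cos(theta)
    return d, q

def inverse_park(d, q, theta):
    alpha = d * math.cos(theta) - q * math.sin(theta)
    beta = d * math.sin(theta) + q * math.cos(theta)
    return alpha, beta

def inverse_clarke(alpha, beta):
    a = alpha
    b = -0.5 * alpha + (math.sqrt(3.0) / 2.0) * beta
    c = -0.5 * alpha - (math.sqrt(3.0) / 2.0) * beta
    return a, b, c
```

In a full loop, the two PI controllers would sit between `park` and `inverse_park`, acting on the direct and quadrature components.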
For more information about fine Doppler estimators, refer to Fundamentals of Radar
Signal Processing by Mark A. Richards, McGraw-Hill, ISBN 0-07-144474-2, ch. 5.3.4.
A complex number C is in the Mandelbrot set if for the following equation the value
remains finite when repeatedly iterated:
$z_{n+1} = z_n^2 + C$
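A minimal software version of this iteration, written in Python, is shown below. The escape radius of 2 and the iteration limit are conventional choices and are not taken from the design example.

```python
def escape_iterations(c, max_iter=100):
    """Iterate z = z*z + c from z = 0; return the step at which |z| exceeds 2.

    Returns max_iter if the value stays bounded for all iterations.
    """
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter
```

Points for which the function returns `max_iter` are treated as members of the set, for example `escape_iterations(0j)`.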
The system takes longer to perform floating-point calculations than for the
corresponding fixed-point calculations. You cannot wait around for partial results to be
ready, if you want to achieve maximum efficiency. Instead, you must ensure your
algorithm fully uses the floating-point calculation engines. The design contains two
floating-point math subsystems: one for scaling and offsetting pixel indices to give a
point in the complex plane; the other to perform the main square-and-add iteration
operation.
For this design example, the total latency is approximately 19 clock cycles, depending
on target device and clock speed. The latency is not excessive, but it is long enough that it
is inefficient to wait for partial results.
FIFO buffers control the circulation of data through the iterative process. The FIFO
buffers ensure that if a partial result is available for a further iteration in the
$z_{n+1} = z_n^2 + C$ progression, the design works on that point.
Otherwise, the design starts a new point (new value of C). Thus, the design maintains
a full flow of data through the floating-point arithmetic. This main iteration loop can
exert back pressure on the new point calculation engine. If the design does not read
new points off the command queue FIFO buffers quickly enough, such that they fill up,
the loop iteration stalls. The design does not explicitly request the calculation of each
point at the moment it requires the point, which would mean waiting through the latency
cycles before it can use the point. Nor does the design attempt to calculate this latency
exactly in clock cycles and issue generate-point commands the exact number of clock
cycles before it needs them, because you would have to change that timing each time you
retarget a device or change the target clock rate. Instead, the design calculates the points
quickly from the start and catches them in a FIFO buffer. If the FIFO buffer starts to get
full, a sufficient number of cycles ahead of full, the design stops the calculation upstream
without loss of data. This self-regulating flow mitigates latency while remaining flexible.
Avoid inefficiencies by designing the algorithm implementation around the latency and
availability of partial results. Data dependencies in processing can stall processing.
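The self-regulating flow can be modeled with a small cycle-based Python simulation. The model is a simplification with assumed parameters (FIFO depth, read pattern, and an almost-full threshold of depth minus latency); it is not derived from the generated hardware.

```python
from collections import deque

def simulate(depth, latency, cycles, read_every):
    """Producer -> pipeline of `latency` stages -> FIFO. Returns the peak FIFO fill."""
    almost_full = depth - latency          # stop early enough to absorb in-flight data
    pipeline = deque([None] * latency)     # results still inside the arithmetic
    fifo = deque()
    max_fill = 0
    next_point = 0
    for cycle in range(cycles):
        # The consumer drains one entry every `read_every` cycles.
        if fifo and cycle % read_every == 0:
            fifo.popleft()
        # Data leaving the pipeline always lands in the FIFO.
        done = pipeline.popleft()
        if done is not None:
            fifo.append(done)
            assert len(fifo) <= depth, "FIFO overflow"
        # Issue a new point only while the FIFO is below the almost-full mark.
        if len(fifo) < almost_full:
            pipeline.append(next_point)
            next_point += 1
        else:
            pipeline.append(None)
        max_fill = max(max_fill, len(fifo))
    return max_fill

peak = simulate(depth=32, latency=19, cycles=2000, read_every=4)
```

Because the producer stops while there is still room for everything in flight, the FIFO never overflows, and no exact latency count appears anywhere in the control logic beyond the threshold margin.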
The design example uses the FinishedThisPoint signal as the valid signal. Although
the system constantly produces data on the output, it marks the data as valid only
when the design finishes a point. Downstream components can then just process valid
data, just as the enabled subsystem in the testbench captures and plots the valid
points.
In both feedback loops, you must provide sufficient delay for the scheduler to
redistribute as pipelining. In feed-forward paths you can add pipelining without
changing the algorithm—DSP Builder changes only the timing of the algorithm. But in
feedback loops, inserting a delay can alter the meaning of an algorithm. For example,
The FIFO buffers operate in show-ahead mode—they display the next value to be
read. The read signal is a read acknowledgement, which reads the output value,
discards it, and shows the next value. The design uses multiple FIFO buffers with the
same control signals, so they become full and give a valid output at the same time. The
design only needs the output control signals from one of the FIFO buffers and can
ignore the corresponding signals from the other FIFO buffers. As floating-point
simulation is not bit accurate to the hardware, some points in the complex plane take
fewer or more iterations to complete in hardware compared to the Simulink
simulation. The results for finished points may therefore come out in
a different order. You must build a testbench mechanism that is robust to this feature.
Use the testbench override feature in the Run All Testbenches block:
• Set the condition on mismatches to Warning
• Use the Run All Testbenches block to set an import variable, which brings the
ModelSim results back into MATLAB and a custom verification function that sets
the pass or fail criteria.
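The idea behind an order-insensitive pass or fail check can be sketched in Python. The result format (x, y, iterations), the function name, and the tolerance are hypothetical; this is not the interface of the Run All Testbenches block, whose custom verification function runs in MATLAB.

```python
def compare_points(reference, hardware, iteration_tolerance=1):
    """Return True when both result lists cover the same points, in any order,
    and the iteration counts agree to within the tolerance."""
    ref = {(x, y): n for x, y, n in reference}
    hw = {(x, y): n for x, y, n in hardware}
    if ref.keys() != hw.keys():
        return False
    return all(abs(ref[p] - hw[p]) <= iteration_tolerance for p in ref)
```

Keying the results by point rather than by arrival position makes the comparison independent of the order in which points finish.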
6.6.11. Normalizer
The normalizer design example demonstrates the ilogb block and the multifunction
ldexp block. The parameters allow you to select the ilogb or ldexp. The design
example implements a simple floating-point normalization. The magnitude of the
output is always in the range 0.5 to 1.0, irrespective of the (non-zero) input.
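The normalization can be mimicked in Python with the standard math module. The sketch assumes ilogb and ldexp behave like their C library counterparts; Python has no math.ilogb, so the helper below derives it from math.frexp.

```python
import math

def ilogb(x):
    """Unbiased binary exponent, floor(log2(|x|)), for finite non-zero x."""
    return math.frexp(x)[1] - 1

def normalize(x):
    """Scale x by a power of two so that 0.5 <= |result| < 1.0."""
    return math.ldexp(x, -(ilogb(x) + 1))
```

For example, `normalize(8.0)` gives 0.5 and `normalize(-3.0)` gives -0.75; the sign is preserved and only the exponent changes.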
A matrix multiplication computes one row-by-column dot product for each output
element. For 8×8 matrices A and B:
$$AB_{ij} = \sum_{k=1}^{8} A_{ik} B_{kj}$$
You may accumulate the adjacent partial results, or build adder trees, without
considering any latency. However, to implement with a smaller dot product, consider
resource usage folding, which uses a smaller number of multipliers rather than
performing everything in parallel. Also split up the loop over k into smaller chunks.
Then reorder the calculations to avoid adjacent accumulations.
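The chunking and reordering can be sketched in software as follows. This Python model is an assumption-laden illustration, not the design example itself: the function name `matmul_folded` and the `chunk` parameter are hypothetical, and the loop order is only one possible way to keep successive accumulations on different output elements.

```python
def matmul_folded(a, b, chunk=2):
    """Multiply square matrices using a dot product of only `chunk` terms.

    The loop over k is split into chunks. The chunk loop is outermost, so
    consecutive accumulations target different output elements and no
    element is accumulated on adjacent steps.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for k0 in range(0, n, chunk):          # one pass per chunk of k
        k_range = range(k0, min(k0 + chunk, n))
        for i in range(n):
            for j in range(n):
                # Small dot product: at most `chunk` multipliers in use.
                c[i][j] += sum(a[i][k] * b[k][j] for k in k_range)
    return c
```

With `chunk=8` on 8×8 matrices this degenerates to the fully parallel dot product; smaller values trade throughput for fewer multipliers.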
A better implementation is to use FIFO buffers to provide self-timed control. New data
is accumulated when both FIFO buffers have data. This implementation has the
following advantages:
• Runs as fast as possible
• Is not sensitive to latency of dot product on devices or fMAX
• Is not sensitive to matrix size (hardware just stalls for small N)
• Can respond to back pressure, which stops the FIFO buffers emptying and feeds their
full status back to the control logic
reception, information from different elements is combined such that the expected
pattern of radiation is preferentially observed. A number of different algorithms exist.
An efficient scheme combines multiple paths constructively.
The simulation calculates the phases in MATLAB code (as a reference), simulates the
beamformer 2D design to calculate the phases in DSP Builder Advanced Blockset,
compares the reference to the simulation results and plots the beam pattern.
The design example uses vectors of single precision floating-point numbers, with
state-machine control from two for loops.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks.
In this design example, the top level of the FPGA device (marked by the Device
block) and the synthesizable KroneckerSubsystem subsystem (marked by the
SynthesisInfo block) are at different hierarchy levels.
The FirChip subsystem includes the Device block and a lower-level primitive FIR
subsystem.
The primitive FIR subsystem includes ChannelIn, ChannelOut, FIFO, Not, And,
Mux, SampleDelay, Const, Mult, Add, and SynthesisInfo blocks.
In this design example, the top level of the FPGA device (marked by the Device
block) and the synthesizable Primitive FIR subsystem (marked by the SynthesisInfo
block) are at different hierarchy levels.
This design example shows how back pressure from a downstream block can halt
upstream processing. This design example provides three FIR filters. A FIFO buffer
follows each FIR filter and can buffer any data that is flowing through the FIR filter.
If the FIFO buffer becomes half full, the design asserts the ready signal back to the
upstream block. This signal prevents any new input (as flagged by valid) entering the
FIR block. The FIFO buffers always show the next data if it is available and the valid
signal is asserted high. You must AND this FIFO valid signal with the ready signal to
consume the data at the head of the FIFO buffer. If the AND result is high, you can
consume data because it is available and you are ready for it.
You can chain several blocks together in this way, and no ready signal has to feed
back further than one block, which allows you to use modular design techniques with
local control.
The delay in the feedback loop represents the lumped delay that spreads throughout
the FIR filter block. The delay must be at least as big as the delay through the FIR
filter. This delay is not critical. Experiment with some values to find the right one. The
FIFO buffer must be able to hold at least this much data after it asserts full. The full
threshold must be at least this delay amount below the size of the FIFO buffer (64 –
32 in this design example).
The final block uses an external ready signal that comes from a downstream block in
the system.
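The show-ahead behavior and the valid AND ready rule can be modeled in a few lines. The following Python class is a behavioral sketch under stated assumptions, not DSP Builder code: the class name `ShowAheadFifo` and its method names are hypothetical, and the depth of 64 with a full threshold of 32 simply mirrors the figures quoted for this design example.

```python
from collections import deque

class ShowAheadFifo:
    """Behavioral model of a show-ahead FIFO buffer with a full threshold."""

    def __init__(self, depth=64, full_threshold=32):
        self.depth = depth
        self.full_threshold = full_threshold
        self.items = deque()

    @property
    def valid(self):
        # Valid is high whenever data is available at the head.
        return len(self.items) > 0

    @property
    def head(self):
        # Show-ahead: the next value is visible before it is read.
        return self.items[0] if self.items else None

    @property
    def full(self):
        # Asserted early so that data still in flight upstream has room.
        return len(self.items) >= self.full_threshold

    def write(self, value):
        if len(self.items) >= self.depth:
            raise OverflowError("FIFO overflow: threshold margin too small")
        self.items.append(value)

    def step(self, ready):
        """One clock cycle: consume the head only when valid AND ready."""
        if self.valid and ready:
            return self.items.popleft()
        return None
```

In this model, the margin `depth - full_threshold` plays the role of the lumped feedback delay: it is the number of samples the buffer can still absorb after it asserts full.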
The FirChip subsystem includes the Device block and a lower-level Primitive FIR
subsystem.
In this design example, the top level of the FPGA device (marked by the Device
block) and the synthesizable primitive FIR subsystem (marked by the SynthesisInfo
block) are at different hierarchy levels.
The design example has a sequence of three FIR filters that stall when the valid signal
is low, preventing invalid data polluting the datapath. The design example has a
regular filter structure, but with a delay line implemented in single-cycle latches—
effectively an enabled delay line.
You need not enable everything in the filter (multipliers, adders, and so on), just the
blocks with state (the registers). Then observe the output valid signal, which DSP
Builder pipelines with the logic, and observe the valid output data only.
You can also use vectors to implement the constant multipliers and adder tree, which
also speeds up simulation.
You can improve the design example further by using the TappedDelayLine block.
The token-passing structure is typical for a nested-loop structure. The bs port of the
innermost loop (ForLoopB) connects to the bd port of the same loop, so that the next
loop iteration of this loop starts immediately after the previous iteration.
The bs port of the outer loop (ForLoopA) connects to the ls port of the inner loop;
the ld port of the inner loop loops back to the bd port of the outer loop. Each iteration
of the outer loop runs a full activation of the inner loop before continuing on to the
next iteration.
The ls port of the outer loop connects to external logic and the ld port of the outer
loop is unconnected, which is typical of applications where the control token is
generated afresh for each activation of the outermost loop.
The initialization, step, and limit values do not have to be constants. By using the
count value from an outer loop as the limit of an inner loop, the counter effectively
walks through a triangular set of indices.
The token-passing structure for this loop is identical to that for the rectangular loop,
except for the parameterization of the loops.
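The index sequences that the two loop nests walk through can be sketched in Python. This models only the counting, not the token passing, and it rests on assumptions: the helper names are hypothetical, and the loops are taken to start at zero, step by one, and stop before the limit.

```python
def for_loop(init, step, limit):
    """Yield the count values of one loop (assumes step > 0, exclusive limit)."""
    count = init
    while count < limit:
        yield count
        count += step

def rectangular(outer_limit, inner_limit):
    """Constant limits: every outer iteration runs the full inner loop."""
    return [(a, b)
            for a in for_loop(0, 1, outer_limit)
            for b in for_loop(0, 1, inner_limit)]

def triangular(limit):
    """The outer count is the inner limit, giving a triangular index set."""
    return [(a, b)
            for a in for_loop(0, 1, limit)
            for b in for_loop(0, 1, a)]
```

The only difference between the two functions is the limit passed to the inner loop, which matches the statement that the structures are identical apart from parameterization.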
The digital upconverter includes: input memory, upconverter, FIR filter, scaler, mixer
and digital predistortion (DPD).
hdl_import_calc_fir_coefs.m: A script that generates the FIR coefficients using MATLAB's
cfirpm function. DSP Builder prints the coefficients to MATLAB's Command Window and you
can copy and paste them into coefficients.vhd.
calc_dpd_coefs.m: A script that generates the DPD coefficients using a simple polynomial
model of a power amplifier. DSP Builder prints the coefficients to MATLAB's Command
Window and you can copy and paste them into lut_dpd.vhd.
VHDL Components
The design example includes a complex FIR filter in VHDL optimized for Intel Stratix
10 devices. This FIR filter has one valid data sample every eight clock cycles.
See Designing Filters for High Performance.
Top-Level Design
The top-level design contains the device-level subsystem and five downsample and
spectrum analyzer blocks from the MathWorks DSP System Toolbox. These blocks show
the spectral output from the various stages of the up-conversion chain.
Digital Up Converter
This scheduled subsystem contains two SharedMem blocks, which contain the 20
MSPS baseband source: one for the real part of the signal and one for the imaginary
part. You can write to the blocks via the bus or use the preloaded tones.
Mixer
Scale
The scale scheduled subsystem scales the data so that it fits within the DPD's range of
operation by bit-shifting from the mixer's output. You can use the optional multiplier
for increasing the signal level if bit-shifting is insufficient.
FIR Coefficients
DPD
The file lut_dpd.vhd contains the DPD for this design example. The DPD consists of
an address generator that indexes a LUT. The output of the LUT is then multiplied with
the complex input data. The LUT contents are calculated in
hdl_import_calc_dpd_coefs.m. This script uses a simple, real-numbered, third-
order model of an amplifier to calculate predistortion coefficients. DSP Builder uses
these coefficients to calculate the LUT contents.
The first four waveforms are the real and imaginary input and output of the FIR. The FIR smooths the
zero-padded signals.
The next four waveforms are the real and imaginary input and output of the DPD.
The two preloaded memory signals are clearly visible about 0, as are their four aliases because of the
zero-insert upsampling.
The aliased signals are attenuated by 40 dB, as expected from the analysis in calc_fir_coefs.m.
The mixed spectrum shows the baseband signal moving over to be centered on 16 MHz. This view shows the
Simulink clock rate of 1 Hz rather than the FPGA clock rate of 640 MHz, so 16 MHz becomes 25 mHz.
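The frequency conversion between the two clock domains is a simple ratio. The following Python sketch reproduces the arithmetic; the function name `simulink_frequency_hz` is hypothetical and is used here only for illustration.

```python
def simulink_frequency_hz(signal_hz, fpga_clock_hz, simulink_clock_hz=1.0):
    """Map a frequency at the FPGA clock rate to the Simulink clock rate."""
    return signal_hz / fpga_clock_hz * simulink_clock_hz

# 16 MHz at a 640 MHz FPGA clock, viewed at a 1 Hz Simulink clock
carrier = simulink_frequency_hz(16e6, 640e6)  # 0.025 Hz, that is 25 mHz
```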
Scaled looks identical to mixed, except that the signal amplitude is much greater.
The post-DPD output signal is a noisier version of the scaled signal. Observe the two third-order harmonics in
the pass-band.
Related Information
Designing Filters for High Performance
The design example has two HDL entities: the DPD (lut_dpd.vhd) and the FIR
(complex_fir.vhd).
In DSP Builder cosimulation, each HDL Import block represents an HDL instance. You
must instantiate both of these entities in a top-level VHDL file. For this design
example, Intel provides top.vhd.
In addition, the FIR filter uses a signed data type with a generic for the data width.
When DSP Builder instantiates the FIR filter, it uses its own paradigm (i.e.
std_logic_vector and no generics). This design example adds a wrapper entity:
complex_fir_wrapper.vhd. This entity instantiates complex_fir, including setting
the generic to the appropriate value, and converts signed to std_logic_vector.
6. Press the play button or advance through the simulation a cycle at a time.
7. To verify HDL import with the ModelSim simulator, select DSP Builder ➤ Run
ModelSim ➤ Device.
The cosimulation turns any non-high state (e.g. U or X) to a zero.
8. To compile the design in Intel Quartus Prime, select DSP Builder ➤ Run
Quartus Prime Software.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks.
Decimating CIC and FIR filters down convert eight complex carriers (16 real channels)
from 61.44 MHz. The total decimation rate is 64. A real mixer and NCO isolate the
eight carriers. The testbench isolates two channels of data from the TDM signals using
a channel viewer.
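The rate arithmetic for this chain can be checked with a short Python sketch. It assumes that 61.44 MHz is the input sample rate of the chain; the function name `ddc_output` and its parameter names are hypothetical.

```python
def ddc_output(input_rate_hz=61.44e6, total_decimation=64, complex_carriers=8):
    """Return (output sample rate per channel in Hz, number of real channels)."""
    output_rate_hz = input_rate_hz / total_decimation
    real_channels = 2 * complex_carriers  # one I and one Q channel per carrier
    return output_rate_hz, real_channels
```

Under that assumption each of the 16 real channels leaves the chain at 0.96 MHz.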
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output bus. An
Edit Params block allows easy access to the setup variables in the
setup_demo_ddc.m script.
Note: This design example uses the Simulink Signal Processing Blockset.
This design example shows an interpolating filter chain with interpolating CIC and FIR
filters that up convert eight complex channels (16 real channels). The total
interpolation rate is 50. DSP Builder integrates several Primitive subsystems into the
datapath. This design example shows how you can integrate IP blocks with Primitive
subsystems:
• The programmable Gain subsystem, at the start of the datapath, shows how you
can use processor-visible register blocks to control a datapath element.
• The Sync subsystem is a Primitive subsystem that shows how to manage two
data streams coming together and synchronizing. The design writes the data from
the NCOs to a memory with the channel as an address. The data stream uses its
channel signals to read out the NCO signals, which resynchronizes the data
correctly. Alternatively, you can simply delay the NCO value by the correct number
of cycles to ensure that the NCO and channel data arrive at the Mixer on the
same cycle.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output bus. An
Edit Params block allows easy access to the setup variables in the
setup_demo_duc.m script.
The DUCChip subsystem includes a Device block and a lower level DUC16
subsystem.
It also includes lower level Gain, Sync, and CarrierSum subsystems, which use
other Interface and Primitive blocks including AddSLoad, And, BitExtract,
ChannelIn, ChannelOut, CompareEquality, Const, SampleDelay, DualMem,
Mult, Mux, Not, Or, RegBit, RegField, and SynthesisInfo blocks.
Note: This design example uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output bus.
The DUCChip subsystem includes a Device block and a lower level DUC2Antenna
subsystem.
Note: This design example uses the Simulink Signal Processing Blockset.
Interpolating CIC and FIR filters up convert a single complex channel (2 real
channels). An NCO and Mixer subsystem combine the complex input channels into a
single output channel.
This design example shows how quick and easy it is to emulate the contents of an
existing datapath. A Mixer block implements the mixer in this design example because
the data rate is low enough to save resources using a time-shared hardware technique.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output bus. An
Edit Params block allows easy access to the setup variables in the
setup_demo_AD9856.m script.
The AD9856 subsystem includes a Device block and a lower level DUCIQ
subsystem.
Note: This design example uses the Simulink Signal Processing Blockset.
The CornerTurn block makes extensive use of Simulink Goto/From blocks to
reduce the wiring complexity. The top-level testbench includes Control and Signals
blocks. The IDCTChip subsystem includes the Device block and a lower level IDCT
subsystem. The IDCT subsystem includes lower level subsystems that it describes
with the ChannelIn, ChannelOut, Const, BitCombine, Shift, Mult, Add, Sub,
BitExtract, SampleDelay, OR Gate, Not, Sequence, and SynthesisInfo blocks.
This design example shows a complex loop with several subloops that it schedules and
pipelines without inserting registers. The design example spreads a lumped delay
around the circuit to satisfy timing while maintaining correctness. Processor visible
registers control the thresholds and gains.
In complex algorithmic circuits, the zero-latency blocks make it easy to follow a data
value through the circuit and investigate the algorithm without offsetting all the
results by the pipelining delays.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks.
The AGC_Chip subsystem includes the Device block, a RegField block and a lower
level AGC subsystem.
The one input BitCombine block is a special case that concatenates all the
components of the input vector and produces one wide scalar output signal. You can
apply 1-bit reducing operators to vectors of Boolean signals. The BitCombine block
supports multiple input concatenation. When vectors of Boolean signals are input on
multiple ports, corresponding components from each vector are combined so that the
output is a vector of signals.
The BitExtract block converts a scalar signal into a vector of Boolean signals. You use
the initialization parameter to arbitrarily order the components of the vector output by
the BitExtract block. If the input to a BitExtract block is a vector, different bits can
be extracted from each of the components. The output does not always have to be a
vector of Boolean signals. For example, you may split a 16-bit wide signal into four
components, each 4 bits wide.
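The concatenation and splitting can be sketched on plain integers. This Python model is illustrative only: the function names are hypothetical, and placing the first vector component in the least significant position is an assumption, because the ordering in the blocks is configurable.

```python
def bit_combine(bits):
    """Concatenate a vector of Booleans into one wide scalar.

    Assumes bits[0] is the least significant bit.
    """
    value = 0
    for position, bit in enumerate(bits):
        value |= int(bool(bit)) << position
    return value

def bit_extract(value, width, count):
    """Split a scalar into `count` fields of `width` bits each.

    Assumes the first field is the least significant one.
    """
    mask = (1 << width) - 1
    return [(value >> (i * width)) & mask for i in range(count)]
```

For example, `bit_extract(0xABCD, 4, 4)` splits a 16-bit value into the four 4-bit components `[0xD, 0xC, 0xB, 0xA]`, and `bit_extract(v, 1, n)` recovers the Boolean vector that `bit_combine` packed.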
The RGB data arrives as three parallel signals each clock cycle. The model file is
demo_csc.mdl.
Forward paths compensate for nonlinear power amplifiers by applying the inverse of
the distortion that the power amplifier generates, such that the pre-distortion and the
distortion of the power amplifier cancel each other out. The power amplifier's non-
linearity may change over time, therefore such systems are typically adaptive.
This design example is based on "A robust digital baseband pre-distorter constructed
using memory polynomials," L. Ding, G. T. Zhou, D. R. Morgan, et al., IEEE
Transactions on Communications, vol. 52, no. 1, pp. 159-165, 2004.
This design example only implements the forward path, which is representative of
many systems where you implement the forward path in FPGAs, and the feedback
path on external processors. The design example sets the predistortion memory, Q, to
8; the highest nonlinearity order K is 5 in this design example. The file
setup_demo_dpd_fwdpath initializes the complex valued coefficients, which are
stored in registers. During operation, the external processor continuously improves
and adapts these coefficients with a microcontroller interface.
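The forward path of a memory polynomial predistorter computes $z(n) = \sum_{k}\sum_{q} a_{kq}\, x(n-q)\, |x(n-q)|^{k}$. The following Python sketch evaluates that sum directly. It is a reference model under assumptions, not the design example: the function name and coefficient layout are hypothetical, the order index runs from 0 to K-1, the memory index runs over Q taps starting at 0, and samples before the start of the input are treated as zero.

```python
def memory_polynomial(x, coeffs):
    """Apply a memory polynomial to a list of complex samples.

    coeffs[k][q] multiplies x[n - q] * abs(x[n - q]) ** k,
    with k = 0..K-1 (nonlinearity order) and q = 0..Q-1 (memory tap).
    """
    num_orders = len(coeffs)
    num_taps = len(coeffs[0])
    out = []
    for n in range(len(x)):
        acc = 0j
        for q in range(num_taps):
            if n - q < 0:
                continue  # no history yet: treat as zero input
            sample = x[n - q]
            magnitude = abs(sample)
            for k in range(num_orders):
                acc += coeffs[k][q] * sample * magnitude ** k
        out.append(acc)
    return out
```

With K = 5 and Q = 8 as in this design example, `coeffs` is a 5-by-8 table of complex values. Setting only `coeffs[0][0]` to 1 makes the predistorter transparent, which is a convenient starting point before adaptation.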
This design example shows that even for circuitry with tight feedback loops and
120-bit adders, designs can achieve high data rates because of the pipelining
algorithms.
The top-level testbench includes Control, Signals, Run ModelSim, and Run Quartus
Prime blocks. The Chip subsystem includes the Device block and a lower level
FibSystem subsystem. The FibSystem subsystem includes ChannelIn, ChannelOut,
SampleDelay, Add, Mux, and SynthesisInfo blocks.
Note: In this design example, the top-level of the FPGA device (marked by the Device
block) and the synthesizable Primitive subsystem (marked by the SynthesisInfo
block) are at different hierarchy levels.
Folded designs repeatedly use a single dual sort stage. The throughput of the design is
limited in the number of channels, vector width, and data rate. The data passes
through the dual sort stage (vector width)/2 times. The vector sort design example
uses full throughput with (vector width)/2 dual sort stages in sequence.
The design example allows you to generate a valid signal. The design example only
generates output and can only accept input every N cycles, where N depends on the
number of stages, the data output format, and the target fMAX. The valid signal goes
high when the output is ready. You can use this output signal to trigger the next input,
for example, a FIFO buffer read for bursty data.
DSP Builder generates results using the same techniques as in the floating point
functions but at generally reduced resource usage, depending on data bit width.
Outputs are faithfully rounded. If the exact result is between two representable
numbers within the data format, DSP Builder uses either of them. In some instances
you see a difference in output result between simulation and hardware by one LSB. To
get bit-accurate results at the subsystem level, this example uses the Bit Exact
option on the SynthesisInfo block.
You can also specify the seed value for the random sequence using the seed_value
input. The reset input resets the sequence to the initial state defined by the
seed_value. The output is a 32-bit single-precision floating-point number.
An external input enables a counter that addresses a lookup-table (LUT) that contains
some text. The design example writes the result to a MATLAB array. You can examine
the contents with a char(message) command in the MATLAB command window.
This design example does not use any ChannelIn, ChannelOut, GPIn, or GPOut
blocks. The design example uses Simulink ports for simplicity although they prevent
the automatic testbench flow from working.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks.
The Chip subsystem includes Device, Counter, Lut, and SynthesisInfo blocks.
Note: In this design example, the top-level of the FPGA device (marked by the Device
block) and the synthesizable Primitive subsystem (marked by the SynthesisInfo
block) are at the same level.
The testbench reloads the counter with new parameters every 64 cycles. A manual
switch allows you to control whether the counter is permanently enabled, or only
enabled on alternate cycles. You can view the signals input and output from the
counter with the provided scope.
You can use one of the following ways to specify the contents of the Lut block:
• Specify table contents as a single row or column vector. The length of the 1D row or
column vector determines the number of addressable entries in the table. If DSP
Builder reads vector data from the table, all components of a given vector share
the same value.
• When a look-up table contains vector data, you can provide a matrix to specify the
table contents. The number of rows in the matrix determines the number of
addressable entries in the table. Each row specifies the vector contents of the
corresponding table entry. The number of columns must match the vector length,
otherwise DSP Builder issues an error.
Note: The default initialization of the LUT is a row vector round([0:255]/17). This vector
is inconsistent with the default for the DualMem block, which is a column vector
[zeros(16, 1)]. The latter form is consistent with the new matrix initialization form in
which the number of rows determines the addressable size.
You can initialize both the dual memory and LUT Primitive library blocks with matrix
data.
The number of rows in the 2D matrix that you provide for initialization determines the
addressable size of the dual memory. The number of columns must match the width of
the vector data. So the nth column specifies the contents of the nth dual memory.
Within each of these columns the ith row specifies the contents at the (i – 1)th
address (the first row is address zero, second row address 1, and so on).
The exception for this row and column interpretation of the initialization matrix is for
1D data, where the initialization matrix consists of either a single column or single
row. In this case, the interpretation is flexible and maps the vector (row or column)
into the contents of each dual memory. As in the previous behavior, all dual memories
have identical initial contents.
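The row and column interpretation can be sketched as a small Python function. This is a model of the rule as described above, not DSP Builder code: the function name `memory_contents` is hypothetical, and nested Python lists stand in for MATLAB matrices.

```python
def memory_contents(init, vector_width):
    """Return the initial contents of each memory, one list per memory.

    init is a flat list (1D data) or a list of rows (2D matrix).
    """
    # Treat a flat list as a single column so both forms share one path.
    rows = [list(r) if isinstance(r, (list, tuple)) else [r] for r in init]
    n_rows, n_cols = len(rows), len(rows[0])
    if any(len(r) != n_cols for r in rows):
        raise ValueError("initialization matrix must be rectangular")
    if n_rows == 1 or n_cols == 1:
        # 1D exception: every memory receives the same contents.
        flat = [value for r in rows for value in r]
        return [list(flat) for _ in range(vector_width)]
    if n_cols != vector_width:
        raise ValueError("number of columns must match the vector width")
    # Column n holds memory n; row i holds the contents of address i - 1.
    return [[r[n] for r in rows] for n in range(vector_width)]
```

For a 3-by-2 matrix `[[1, 2], [3, 4], [5, 6]]` and a vector width of 2, the first memory holds `[1, 3, 5]` and the second holds `[2, 4, 6]`, each with three addressable entries.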
This design example has many feedback loops. The design example implements all the
pipelined delays in the circuit automatically. The multiple channels provide more
latency around the circuit to ensure a high clock frequency result. Lumped delays
allow you to easily parameterize the design example when changing the channel
counts. For example, masking the subsystem provides the benefits of a black-box IP
block but with visibility.
The top-level testbench includes Control and Signals blocks, plus ChanView blocks
that deserialize the output buses.
The IIRChip subsystem includes the Device block and a masked IIRSubsystem
subsystem. The coefficients for the filter are set from [b, a] = ellip(2, 1, 10, 0.3); in
the callbacks for the masked subsystem. You can look under the mask to see the
implementation details of the IIRSubsystem subsystem which includes ChannelIn,
ChannelOut, SampleDelay, Const, Mult, Add, Sub, Convert, and SynthesisInfo
blocks.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks.
The first datapath reinterprets a single precision complex signal into raw 32-bit
components that separate into real and imaginary parts. A BitCombine block then
merges them into a 64-bit signal. The second datapath uses the BitExtract block to split
a 64-bit wide signal into a two-component vector of 32-bit signals. The
ReinterpretCast block then converts the raw bit pattern into single-precision IEEE
format. The HDL that the design synthesizes is simple wire connections, which
perform no computation.
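The two datapaths can be mimicked in Python with the standard `struct` module. This is a software analogy only: the function names are hypothetical, and placing the real part in the lower 32 bits of the 64-bit word is an assumption about the packing order.

```python
import struct

def complex_to_u64(z):
    """Reinterpret a complex value as two raw 32-bit words and combine them."""
    re_bits, = struct.unpack("<I", struct.pack("<f", z.real))
    im_bits, = struct.unpack("<I", struct.pack("<f", z.imag))
    return (im_bits << 32) | re_bits  # assumed order: real part in low word

def u64_to_complex(word):
    """Split a 64-bit word into two 32-bit fields and reinterpret as floats."""
    re_bits = word & 0xFFFFFFFF
    im_bits = (word >> 32) & 0xFFFFFFFF
    re, = struct.unpack("<f", struct.pack("<I", re_bits))
    im, = struct.unpack("<f", struct.pack("<I", im_bits))
    return complex(re, im)
```

As in the hardware, no arithmetic takes place on the values themselves; only the interpretation of the bits changes. For instance, `complex_to_u64(1 - 2j)` gives `0xC00000003F800000`, the single-precision patterns of -2.0 and 1.0 side by side.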
In decimation mode, the design example accepts a new sample every clock cycle, and
produces a new result every two clock cycles. When interpolating, the design example
accepts a new input every other clock cycle, and produces a new result every clock
cycle. In both cases, the design example fully uses multipliers, making this structure
very efficient compared to parallel instantiations of interpolate and decimate filters, or
compared to a single rate filter with external interpolate and decimate stages.
The design example allows you to generate a valid signal. The design example only
generates output and can only accept input every N cycles, where N depends on the
number of stages, the data output format, and the target fMAX. The valid signal goes
high when the output is ready. You can use this output signal to trigger the next input,
for example, a FIFO buffer read for bursty data.
The Mode input can either rotate the input vector by a specified angle, or rotate the
input vector to the x-axis while recording the angle required to make that rotation.
You can experiment with different input sizes to control the precision of the CORDIC
output.
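The two modes correspond to the classic CORDIC rotation and vectoring iterations. The following floating-point Python sketch shows both; it is a textbook model under assumptions, not the CORDIC block. The function name, the `vectoring` flag, and the fixed count of 24 iterations are hypothetical, and the iteration count stands in for the input size as the precision control.

```python
import math

ITERATIONS = 24
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]

# Constant gain introduced by the micro-rotations, removed at the end.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))

def cordic(x, y, angle, vectoring):
    """Rotate (x, y) by `angle`, or rotate it onto the x-axis.

    Rotation mode (vectoring=False) drives the residual angle z to zero.
    Vectoring mode (vectoring=True) drives y to zero and accumulates the
    angle in z. Converges for angles up to roughly 1.74 rad.
    """
    z = angle
    for i, step_angle in enumerate(ANGLES):
        if vectoring:
            d = -1.0 if y > 0 else 1.0
        else:
            d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i
        x, y, z = x - d * y * shift, y + d * x * shift, z - d * step_angle
    return x / GAIN, y / GAIN, z
```

In rotation mode, starting from (1, 0) returns approximately the cosine and sine of the angle. In vectoring mode, starting with a zero angle returns the magnitude in x and the angle of the input vector in z.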
The SinCos and AGC subsystem includes ChannelIn, ChannelOut, CORDIC, and
SynthesisInfo blocks.
You can specify the seed value for the random sequence using the seed_value input.
The reset input resets the sequence to the initial state defined by the seed_value.
The output is a 32-bit random number, which can be interpreted as a random integer
sampled from the uniform distribution.
For sorting, the sortstages subsystem allows either a comparator and mux based
block, or one based on a minimum and a maximum block. The first is more efficient.
Both use the reconfigurable subsystem to choose between implementations using the
BlockChoice parameter.
The design repeatedly uses a dual sort stage in series. The data passes through the
dual sort stage (vector width)/2 times.
Folded designs repeatedly use a single dual sort stage. The throughput of the design is
limited by the number of channels, vector width, and data rate. The data passes
through the dual sort stage (vector width)/2 times. The vector sort design example
runs at full throughput with (vector width)/2 dual sort stages in sequence.
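The handbook does not name the sorting network, but a dual stage applied (vector width)/2 times is consistent with an odd-even transposition sort. The sketch below assumes that network and uses the minimum/maximum form of the compare-exchange; the function names are hypothetical.

```python
def dual_sort_stage(vector):
    """One even compare-exchange pass followed by one odd pass."""
    v = list(vector)
    for start in (0, 1):                       # even pairs, then odd pairs
        for i in range(start, len(v) - 1, 2):
            lo, hi = min(v[i], v[i + 1]), max(v[i], v[i + 1])
            v[i], v[i + 1] = lo, hi
    return v

def vector_sort(vector):
    """Apply the dual sort stage (vector width)/2 times, rounded up."""
    stages = (len(vector) + 1) // 2
    for _ in range(stages):
        vector = dual_sort_stage(vector)
    return vector

print(vector_sort([7, 3, 8, 1, 6, 2, 5, 4]))
```

A full-throughput design would instantiate the stages in sequence, while a folded design would loop the data through one `dual_sort_stage` the same number of times.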
When the SampleDelay Primitive library block receives vector input, you can
independently specify a different delay for each of the components of the vector.
You may give individual components zero delay resulting in a direct feed through of
only that component. Avoid algebraic loops if you select some components to be zero
delays.
This rule only applies when DSP Builder is reading and outputting vector data. A scalar
specification of delay length still sets all the delays on each vector component to the
same value. You must not specify a vector that is not the same length as the vector on
the input port. A negative delay on any one component is also an error. However, as in
the scalar case, you can specify a zero length delay for one or more of the
components.
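The delay rules above can be illustrated with a small behavioral model. This is only a sketch of the described behavior, not the SampleDelay block itself; the class name `VectorSampleDelay` and its `step` method are hypothetical.

```python
from collections import deque

class VectorSampleDelay:
    """Behavioral model of a per-component sample delay on a vector signal."""

    def __init__(self, width, delays):
        if isinstance(delays, int):
            delays = [delays] * width          # scalar: same delay everywhere
        if len(delays) != width:
            raise ValueError("delay vector length must match the input width")
        if any(d < 0 for d in delays):
            raise ValueError("negative delays are not allowed")
        # One delay line per component, pre-filled with zeros.
        self.lines = [deque([0] * d) for d in delays]

    def step(self, sample):
        """Push one input vector and return the delayed output vector."""
        out = []
        for line, value in zip(self.lines, sample):
            line.append(value)
            out.append(line.popleft())         # zero delay feeds straight through
        return out

delay = VectorSampleDelay(3, [0, 1, 2])
for n in (1, 2, 3):
    print(delay.step([n, n, n]))
```

The first component has zero delay and appears at the output on the same step, which is the direct feedthrough case that can create an algebraic loop if the output feeds back to the input.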
The output type of the adder is propagated from one of the inputs. You must select
the correct input, otherwise the accumulator fails to schedule. You may add a Convert
block to ensure the accumulator also maintains sufficient precision.
The optional use of a two-to-one multiplexer allows the accumulator to load values
according to a Boolean control signal. The inputs differ in precision, so the type with
wider fractional part must be propagated to the output type of the adder, otherwise
the accumulator fails to schedule. Converting both inputs to the same precision
ensures that the single-channel accumulator can always be scheduled even at high
fMAX targets.
If neither input has a fixed-point type that is suitable for the adder to output, use a
Convert block to ensure that the precision of both inputs to the Add block are the
same. Scheduling of this accumulator at high fMAX fails.
This folder accesses groups of reference designs that illustrate the design of DDC and
DUC systems for digital intermediate frequency (IF) processing.
The first group implements IF modem designs compatible with the Worldwide
Interoperability for Microwave Access (WiMAX) standard. Intel provides separate
models for one and two antenna receivers and transmitters.
The second group implements IF modem designs compatible with the wideband Code
Division Multiple Access (W-CDMA) standard.
STAP for radar systems applies temporal and spatial filtering to separate slow moving
targets from clutter and null jammers. Applications have high processing
requirements and need low latency for rapid adaptation. High dynamic ranges demand
floating-point datapaths.
Related Information
AN 544: Digital IF Modem Design with the DSP Builder Advanced Blockset
For more information about these designs
The top-level testbench includes Control, Signals, and Run Quartus Prime blocks.
The design includes an Edit Params block to allow easy access to the setup variables
in the setup_wimax_ddc_1rx.m script.
The FIR filters implement a decimating filter chain that down converts the two channels
from a frequency of 89.6 MSPS to a frequency of 11.2 MSPS (a total decimation rate
of 8). The real mixer, NCO, and Interleaver subsystem isolate the two channels.
The design configures the NCO with a single channel to provide one sine and one
cosine wave at a frequency of 22.4 MHz. The NCO has the same sample rate (89.6
MSPS) as the input data sample rate.
A system clock rate of 179.2 MHz drives the design on the FPGA that the Device block
defines inside the DDCChip subsystem.
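The rates quoted for this design are related by simple ratios, which the following sketch checks with exact arithmetic. The variable names are illustrative only; the numbers are the ones stated above.

```python
import math
from fractions import Fraction

input_rate = Fraction("89.6")     # MSPS, input sample rate
output_rate = Fraction("11.2")    # MSPS, after the decimating filter chain
clock = Fraction("179.2")         # MHz, system clock
nco_freq = Fraction("22.4")       # MHz, NCO output frequency

decimation = input_rate / output_rate            # total decimation rate
cycles_per_input_sample = clock / input_rate     # clock cycles per input sample
samples_per_nco_period = input_rate / nco_freq   # NCO runs at fs/4

# At fs/4 the cosine and sine sequences take only the values 1, 0, and -1.
period = int(samples_per_nco_period)
cos_seq = [round(math.cos(2 * math.pi * n / period)) for n in range(period)]
sin_seq = [round(math.sin(2 * math.pi * n / period)) for n in range(period)]

print(decimation, cycles_per_input_sample, cos_seq, sin_seq)
```

Two clock cycles per input sample is what leaves room to time-share hardware between the two channels.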
Note: This reference design uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, and Run Quartus Prime blocks.
The design includes an Edit Params block to allow easy access to the setup variables
in the setup_wimax_ddc_2rx_iiqq.m script.
The FIR filters implement a decimating filter chain that down converts the two channels
from a frequency of 89.6 MSPS to a frequency of 11.2 MSPS (a total decimation rate
of 8). The real mixer and NCO isolate the two channels. The design configures the
NCO with two channels to provide two sets of sine and cosine waves at the same
frequency of 22.4 MHz. The NCO has the same sample rate (89.6 MSPS) as the
input data sample rate.
A system clock rate of 179.2 MHz drives the design on the FPGA, which the Device
block defines inside the DDCChip subsystem.
Note: This reference design uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, and Run Quartus Prime blocks.
The design includes an Edit Params block to allow easy access to the setup variables
in the setup_wimax_duc_1tx.m script.
The DUCChip subsystem includes a Device block to specify the target FPGA device,
and a DUC2Channel subsystem which contains SingleRateFIR, Scale,
InterpolatingFIR, NCO, and ComplexMixer blocks. The deinterleaver subsystem
contains a series of Primitive blocks including delays and multiplexers that
deinterleave the two I and Q channels.
The FIR filters implement an interpolating filter chain that up converts the two
channels from a frequency of 11.2 MSPS to a frequency of 89.6 MSPS (a total
interpolating rate of 8). The complex mixer and NCO modulate the two input channel
baseband signals to the IF domain. The design configures the NCO with a single
channel to provide one sine and one cosine wave at a frequency of 22.4 MHz. The
NCO has the same sample rate (89.6 MSPS) as the input data sample rate.
A system clock rate of 179.2 MHz drives the design on the FPGA, which the Device
block defines inside the DUCChip subsystem.
Note: This reference design uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, and Run Quartus Prime blocks.
The design includes an Edit Params block to allow easy access to the setup variables
in the setup_wimax_duc_2tx_iiqq.m script.
The DUCChip subsystem includes a Device block to specify the target FPGA device,
and a DUC2Channel subsystem which contains SingleRateFIR, Scale,
InterpolatingFIR, NCO, ComplexMixer, and Const blocks. It also contains a Sync
subsystem, which shows how to manage two data streams coming together and
synchronizing. The design writes the data from the NCOs to a memory with the
channel index as an address. The data stream uses its channel signals to read out the
NCO signals, which resynchronizes the data correctly. (Alternatively, you can simply
delay the NCO value by the correct number of cycles to ensure that the NCO and
channel data arrive at the Mixer on the same cycle). The deinterleaver subsystem
contains a series of Primitive blocks including delays and multiplexers that
deinterleave the four I and Q channels.
The FIR filters implement an interpolating filter chain that up converts the two
channels from a frequency of 11.2 MSPS to a frequency of 89.6 MSPS (a total
interpolating rate of 8).
A complex mixer and NCO modulate the two input channel baseband signals to the IF
domain. The design configures the NCO to provide two sets of sine and cosine waves
at a frequency of 22.4 MHz. The NCO has the same sample rate (89.6 MSPS) as the
input data sample rate.
The Sync subsystem shows how to manage two data streams coming together and
synchronizing. The design writes the data from the NCOs to a memory with the
channel as an address. The data stream uses its channel signals to read out the NCO
signals, which resynchronizes the data correctly.
A system clock rate of 179.2 MHz drives the design on the FPGA, which the Device
block defines inside the DUCChip subsystem.
Note: This reference design uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, and Run Quartus Prime blocks,
plus a ChanView block that isolates two channels of data from the TDM signals.
The CIC and FIR filters implement a decimating filter chain that down converts the
eight complex carriers (16 real channels from two antennas with four pairs of I and Q
inputs from each antenna) from a frequency of 122.88 MSPS to a frequency of 7.68
MSPS (a total decimation rate of 16). The real mixer and NCO isolate the four
channels. The design configures the NCO with four channels to provide four pairs of
sine and cosine waves at frequencies of 12.5 MHz, 17.5 MHz, 22.5 MHz, and 27.5
MHz, respectively. The NCO has the same sample rate (122.88 MSPS) as the input
data sample rate.
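A multichannel NCO of this kind can be modeled as one phase accumulator per carrier, all stepping at the input sample rate. The sketch below is a floating-point illustration only; the 32-bit accumulator width and the function names are assumptions, not values from the reference design.

```python
import math

SAMPLE_RATE_HZ = 122.88e6
CARRIERS_HZ = [12.5e6, 17.5e6, 22.5e6, 27.5e6]
ACC_BITS = 32   # hypothetical accumulator width, not taken from the design

def phase_increment(freq_hz):
    """Accumulator step per sample for the requested output frequency."""
    return round(freq_hz / SAMPLE_RATE_HZ * 2 ** ACC_BITS)

def nco(num_samples):
    """Return num_samples rows; each row holds one (cos, sin) pair per carrier."""
    increments = [phase_increment(f) for f in CARRIERS_HZ]
    phases = [0] * len(increments)
    rows = []
    for _ in range(num_samples):
        rows.append([(math.cos(2 * math.pi * p / 2 ** ACC_BITS),
                      math.sin(2 * math.pi * p / 2 ** ACC_BITS))
                     for p in phases])
        # Each accumulator wraps modulo 2**ACC_BITS, which is one full turn.
        phases = [(p + inc) % 2 ** ACC_BITS
                  for p, inc in zip(phases, increments)]
    return rows

rows = nco(4)
```

With these assumptions the frequency resolution is the sample rate divided by 2^32, or roughly 0.03 Hz.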
The Sync subsystem shows how to manage two data streams that come together and
synchronize. The data from the NCOs writes to a memory with the channel as an
address. The data stream uses its channel signals to read out the NCO signals, which
resynchronizes the data correctly.
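The resynchronization idea can be sketched as a small channel-addressed memory: the NCO stream writes at its channel index, and the data stream reads at its own channel index whenever it arrives. The cycle-by-cycle model below is a simplified illustration with hypothetical names, and it assumes each NCO value is read before the next value for the same channel overwrites it.

```python
def resynchronize(nco_cycles, data_cycles, num_channels):
    """Pair each data sample with the NCO value stored for its channel.

    Each list holds one entry per clock cycle: (channel, value) or None
    when the stream is idle on that cycle.
    """
    memory = [None] * num_channels
    paired = []
    for nco, data in zip(nco_cycles, data_cycles):
        if nco is not None:
            channel, value = nco
            memory[channel] = value                      # write port
        if data is not None:
            channel, sample = data
            paired.append((channel, sample, memory[channel]))  # read port
    return paired

# The data stream lags the NCO stream by two cycles.
nco_cycles = [(0, "n0"), (1, "n1"), (2, "n2"), (3, "n3"), None, None]
data_cycles = [None, None, (0, "d0"), (1, "d1"), (2, "d2"), (3, "d3")]
print(resynchronize(nco_cycles, data_cycles, 4))
```

Because the read address comes from the data stream, the pairing stays correct without knowing the exact latency difference between the two paths.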
A system clock rate of 245.76 MHz drives the design on the FPGA, which the Device
block defines inside the DDCChip subsystem.
Note: This reference design uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, and Run Quartus Prime blocks,
plus a ChanView block that isolates two channels of data from the TDM signals.
The CIC and FIR filters implement a decimating filter chain that down converts the two
complex carriers (4 real channels from two antennas with one pair of I and Q inputs
from each antenna) from a frequency of 122.88 MSPS to a frequency of 7.68 MSPS (a
total decimation rate of 16). The real mixer and NCO isolate the four channels. The
design configures the NCO with a single channel to provide one sine and one cosine
wave at a frequency of 17.5 MHz. The NCO has the same sample rate (122.88 MSPS)
as the input data sample rate.
A system clock rate of 122.88 MHz drives the design on the FPGA, which the Device
block defines inside the DDCChip subsystem.
Note: This reference design uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, and Run Quartus Prime blocks.
A Spectrum Scope block computes and displays the periodogram of the outputs from
the two antennas.
The DUCChip subsystem includes a Device block to specify the target FPGA device,
and a DUC subsystem that contains InterpolatingFIR, InterpolatingCIC, NCO,
ComplexMixer, and Scale blocks.
The FIR and CIC filters implement an interpolating filter chain that up converts the 16-
channel input data from a frequency of 3.84 MSPS to a frequency of 122.88 MSPS (a
total interpolation factor of 32). The complex mixer and NCO modulate the four
channel baseband input signal onto the IF region. The design configures the NCO with
four channels to provide four pairs of sine and cosine waves at frequencies of 12.5
MHz, 17.5 MHz, 22.5 MHz, and 27.5 MHz, respectively. The NCO has the same sample
rate (122.88 MSPS) as the final interpolated output sample rate from the last CIC filter
in the interpolating filter chain.
The Sum and SampSelectr subsystems sum up the correct modulated signals to the
designated antenna.
A system clock rate of 245.76 MHz drives the design on the FPGA, which the Device
block defines inside the DUC subsystem.
Note: This reference design uses the Simulink Signal Processing Blockset.
These DUC and matching DDC designs connect to 4 antennas and can process 4
channels per antenna. With a sample rate of 61.44 MHz and a clock rate of 491.52
MHz, these designs represent up- and downconverters used in LTE.
DUC
The top-level design of the upconverter contains a TEST_BENCH block with signal
sources, the upconverter, and a SINKS block that stores the datastreams coming out
of the upconverter in MATLAB variables. Depending on which simulation you run, the
TEST_BENCH block uses either real LTE sample streams or specialized debugging
patterns. The upconverter consists of the LDUC module, the lower DUC, which
contains a channel filter and two interpolating filters, each interpolating by a factor of
2. The filtered sample stream feeds into the COMPLEX MIXER block, where an NCO
generates separate frequencies for each of the four channels, and multiplies the
generated sinewaves with the filtered sample stream. A delay match block ensures
that the sample stream and the generated frequencies align correctly. After the
COMPLEX MIXER block is an antenna summer block, which adds up the different
channels for each antenna, multiplies each with a different frequency, and outputs
them to the four separate antennas.
DDC
The top-level design of the DDC also contains a TESTBENCH block, which contains
source blocks that read from workspace. It uses the data that DSP Builder generates
during the simulation of the DUC. The SINKS block again traces the outputs of the
design in MATLAB variables, which you can analyze and manipulate in MATLAB. The
DDC consists of a complex mixer that matches the complex mixer of the DUC, and the
LDDC (Lower DownConverter), which contains two decimate-by-2 filters and a channel
filter.
Simulation Scripts
generates the input vectors for the downconverter, then run the downconverter and
analyze the outputs. The design contains no channel model, but you can add your
own channel model and apply it to the output data of the DUC before running the DDC
to simulate more realistic operating conditions.
Run_DUC_DDC_demo.m uses typical LTE waveforms; Test_DUC_DDC_demo.m works with
ramps that help you visualize which data goes into which channel and which antenna it
transmits on. In the test pattern, an impulse is set first, followed by a ramp on
channel 1 on antenna 1. All other channels and antennas are 0. The next section
transmits channel 1 on antenna 1, channel 2 on antenna 2 … channel 4 on antenna 4.
The last section transmits all 4 channels on all 4 antennas, using the full capacity of
the system. Use this debug pattern if you want to modify or extend the design.
To step through a script section by section, type
echodemo Run_DUC_DDC_demo.m at the MATLAB command prompt, and then
click Next several times to step through the simulation script. Alternatively, you
can run the entire script by typing Run_DUC_DDC_demo.m at the MATLAB command
prompt. The last step of the script calls a plot function that generates input versus
output plots for each channel, with overlaid input and output plots. These plots should
match closely, displaying only a small quantization error. The script also produces
channel scopes, which show each channel’s data in time and frequency domains.
The top-level testbench includes Control, Signals, and Run Quartus Prime blocks.
A Spectrum Scope block computes and displays the periodogram of the outputs from
the two antennas.
The DUCChip subsystem includes a Device block to specify the target FPGA device,
and a DUC subsystem that contains InterpolatingFIR, InterpolatingCIC, NCO,
ComplexMixer, and Scale blocks.
The FIR and CIC filters implement an interpolating filter chain that up converts the
four-channel input data from a frequency of 3.84 MSPS to a frequency of 122.88 MSPS (a
total interpolation factor of 32). The complex mixer and NCO modulate the four
channel baseband input signal onto the IF region.
The design example configures the NCO with a single channel to provide one sine and
one cosine wave at a frequency of 17.5 MHz. The NCO has the same sample rate
(122.88 MSPS) as the final interpolated output sample rate from the last CIC filter in
the interpolating filter chain.
A system clock rate of 122.88 MHz drives the design on the FPGA, which the Device
block defines inside the DDC subsystem.
Note: This reference design uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, and Run Quartus Prime blocks.
A Spectrum Scope block computes and displays the periodogram of the outputs from
the two antennas.
The DUCChip subsystem includes a Device block to specify the target FPGA device,
and a DUC subsystem that contains InterpolatingFIR, InterpolatingCIC, NCO,
ComplexMixer, and Scale blocks.
The FIR and CIC filters implement an interpolating filter chain that up converts the 16-
channel input data from a frequency of 3.84 MSPS to a frequency of 122.88 MSPS (a
total interpolation factor of 32). This design example uses dummy signals and carriers
to achieve the desired rate up conversion, because of the unusual FPGA clock
frequency and total rate change combination. The complex mixer and NCO modulate
the four channel baseband input signal onto the IF region. The design example
configures the NCO with four channels to provide four pairs of sine and cosine waves
at frequencies of 12.5 MHz, 17.5 MHz, 22.5 MHz and 27.5 MHz, respectively. The NCO
has the same sample rate (122.88 MSPS) as the final interpolated output sample rate
from the last CIC filter in the interpolating filter chain.
The Sync subsystem shows how to manage two data streams that come together and
synchronize. The data from the NCOs writes to a memory with the channel as an
address. The data stream uses its channel signals to read out the NCO signals, which
resynchronizes the data correctly.
The GenCarrier subsystem manipulates the NCO outputs to generate carrier signals
that can align with the datapath signals.
The CarrierSum and SignalSelector subsystems sum up the right modulated signals
to the designated antenna.
A system clock rate of 368.64 MHz, which is 96 times the input sample rate, drives the
design on the FPGA, which the Device block defines inside the DUC subsystem. The
higher clock rate can potentially allow resource re-use in other modules of a digital
system implemented on an FPGA.
Note: This reference design uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, and Run Quartus Prime blocks.
A Spectrum Scope block computes and displays the periodogram of the outputs from
the two antennas.
The DUCChip subsystem includes a Device block to specify the target FPGA device,
and a DUC subsystem that contains InterpolatingFIR, InterpolatingCIC, NCO,
ComplexMixer, and Scale blocks.
The FIR and CIC filters implement an interpolating filter chain that up converts the 16-
channel input data from a frequency of 3.84 MSPS to a frequency of 184.32 MSPS (a
total interpolation factor of 48).
The complex mixer and NCO modulate the four channel baseband input signal onto
the IF region. The design configures the NCO with four channels to provide four pairs
of sine and cosine waves at frequencies of 12.5 MHz, 17.5 MHz, 22.5 MHz, and 27.5
MHz, respectively. The NCO has the same sample rate (184.32 MSPS) as the final
interpolated output sample rate from the last CIC filter in the interpolating filter chain.
The Sync subsystem shows how to manage two data streams that come together and
synchronize. The data from the NCOs writes to a memory with the channel as an
address. The data stream uses its channel signals to read out the NCO signals, which
resynchronizes the data correctly.
The CarrierSum and SignalSelector subsystems sum up the right modulated signals
to the designated antenna.
A system clock rate of 368.64 MHz, which is 96 times the input sample rate, drives the
design on the FPGA, which the Device block defines inside the DUC subsystem. The
higher clock rate can potentially allow resource re-use in other modules of a digital
system implemented on an FPGA.
Note: This reference design uses the Simulink Signal Processing Blockset.
The top-level testbench includes Control, Signals, and Run Quartus Prime blocks.
A Spectrum Scope block computes and displays the periodogram of the outputs from
the two antennas.
The DUCChip subsystem includes a Device block to specify the target FPGA device,
and a DUC subsystem that contains InterpolatingFIR, InterpolatingCIC, NCO,
ComplexMixer, and Scale blocks.
The FIR and CIC filters implement an interpolating filter chain that up converts the 16-
channel input data from a frequency of 3.84 MSPS to a frequency of 153.6 MSPS (a
total interpolation factor of 40).
The complex mixer and NCO modulate the four channel baseband input signal onto
the IF region. The design configures the NCO with four channels to provide four pairs
of sine and cosine waves at frequencies of 12.5 MHz, 17.5 MHz, 22.5 MHz, and 27.5
MHz, respectively. The NCO has the same sample rate (153.6 MSPS) as the final
interpolated output sample rate from the last CIC filter in the interpolating filter chain.
The Sync subsystem shows how to manage two data streams that come together and
synchronize. The design writes data from the NCOs to a memory with the channel as
an address. The data stream uses its channel signals to read out the NCO signals,
which resynchronizes the data correctly.
The CarrierSum and SignalSelector subsystems sum up the right modulated signals
to the designated antenna.
A system clock rate of 307.2 MHz, which is 80 times the input sample rate, drives the
design on the FPGA, which the Device block defines inside the DUC subsystem. The
higher clock rate can potentially allow resource re-use in other modules of a digital
system implemented on an FPGA.
Note: This reference design uses the Simulink Signal Processing Blockset.
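[Figure: Matrix inversion data flow. Input Matrix (A) → Cholesky Decomposition → Lower Triangle Matrix → Triangular Matrix Inversion → J → Triangular Matrix Mult → A_inverse. The Cholesky Decomposition also passes the Diagonal Reciprocal Values 1/Lkk to the Triangular Matrix Inversion.]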
The Cholesky decomposition calculates the reciprocal values of the diagonal elements
of L, $1/L_{kk}$, which the triangular matrix inversion requires. The design propagates
those values to the output interface of the Cholesky decomposition, reducing resource
usage and latency.
$$A = L \cdot L^{H}$$
The design performs Cholesky decomposition and calculates the inverse of L, $J = L^{-1}$,
through forward substitution. J is a lower triangular matrix. The inverse of the input
matrix requires a triangular matrix multiplication, followed by a Hermitian matrix
multiplication:
$$A^{-1} = J^{H} \cdot J$$
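The three steps above (Cholesky decomposition, triangular inversion by forward substitution, and the product of J with its conjugate transpose) can be sketched in plain Python with complex numbers. This is an illustrative software model, not the pipelined hardware datapath; the function names are hypothetical, and the example assumes a Hermitian positive-definite input.

```python
import math

def cholesky(a):
    """Return (L, inv_diag) with A = L * L^H and inv_diag[k] = 1 / L[k][k]."""
    n = len(a)
    L = [[0j] * n for _ in range(n)]
    inv_diag = [0.0] * n
    for j in range(n):
        s = a[j][j].real - sum(abs(L[j][k]) ** 2 for k in range(j))
        inv_diag[j] = 1.0 / math.sqrt(s)          # reciprocal of the diagonal
        L[j][j] = complex(math.sqrt(s))
        for i in range(j + 1, n):
            acc = a[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            L[i][j] = acc * inv_diag[j]
    return L, inv_diag

def invert_lower(L, inv_diag):
    """Return J = L^-1 by forward substitution, reusing the reciprocals."""
    n = len(L)
    J = [[0j] * n for _ in range(n)]
    for j in range(n):
        J[j][j] = complex(inv_diag[j])
        for i in range(j + 1, n):
            acc = sum(L[i][k] * J[k][j] for k in range(j, i))
            J[i][j] = -acc * inv_diag[i]
    return J

def invert_hermitian(a):
    """Return A^-1 = J^H * J."""
    J = invert_lower(*cholesky(a))
    n = len(a)
    # J is lower triangular, so only rows k >= max(i, j) contribute.
    return [[sum(J[k][i].conjugate() * J[k][j] for k in range(max(i, j), n))
             for j in range(n)] for i in range(n)]

A = [[4, 2j], [-2j, 3]]
A_inv = invert_hermitian(A)
```

Passing `inv_diag` from `cholesky` into `invert_lower` mirrors the way the design forwards the reciprocal diagonal values instead of recomputing them.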
[Figure: Cholesky Decomposition block diagram. Top datapath and bottom datapath with Input Memory, Mux and Vectorization, Circular Memory, Scalar Product and Subtract Operators, 1/√, Multiplier, Data FIFO, and Control Logic. Signal labels: Li,j (18s17 complex), 16s15 (complex), 18s17, and 1/Li,j.]
[Figure: Block diagram with L input and J output (triangular matrix inversion datapath). Blocks: L Input Memory (circular, Rows × Channels), 1/Li,j Input Memory (diagonal, Rows × Channels), Li,j Write Controller, J Output Write Controller, Inv Ljj, multipliers, Σ, Scale, Negate, Mux, FIFO, and Control Logic. Labeled stage latencies: 1, 4, 7, 5, and 0 cycles. Data formats: 18s17 (complex) and 18s12.]
Matrix inversion takes multiple matrices and interleaves the inverse computations for
all matrices. This method hides the latency in computing each element by pipelining
inversion of a completely different channel. Multichannel designs use the idle cycles in
the computation chain to process the next channel. Two buffers at the input and
output of the design create channels for streaming matrices into multichannel
interfaces.
Signal | Direction | Type | Width | Description
Sink_Valid | Input | Boolean | 1 | Avalon streaming sink valid signal for the input matrix interface. Number of valid inputs = (matrix size*(matrix size + 1))/2.
Sink_Channel | Input | Unsigned integer | 8 | Avalon streaming sink channel bus for the input matrix interface.
Sink_Data | Input | Single floating-point complex | 64 bit I/Q | Avalon streaming sink data bus for the input matrix interface. Lower matrix elements are streamed in column-major order.
Source_Valid | Output | Boolean | 1 | Avalon streaming source valid signal for the output interface. This signal is asserted for (size*(size+1))/2 clocks.
Source_Channel | Output | Unsigned integer | 8 | Avalon streaming source channel bus for the output interface.
Source_Data | Output | Single floating-point complex | 64 bit I/Q | Avalon streaming source data bus for the output interface. Lower matrix elements are streamed in column-major order.
Parameters
Parameter | Description
Latency | The period in cycles the module waits before receiving the next set of matrices.
DSP Builder calculates the throughput of the design by setting the latency value and
the system clock:
Figure 71. Input streaming interface for 8x8 Hermitian input matrix
The figure shows the latency configuration parameter in the input interface including data, valid, and channel
signals. In this example of 8x8 matrix inversion, the valid signal remains high for 36 clock cycles (total number
of lower triangle elements of the Hermitian matrix of 8x8) and remains low for (latency – 36) cycles before
inserting the next matrix elements. The minimum duration to remain low and hence the minimum latency
period may vary depending on the matrix size and the pipelining required to meet timing constraints.
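The element count and the valid pattern described for the input interface can be reproduced with a short sketch. The function names are hypothetical, and the latency of 75 cycles is only an example value in the range that Table 21 lists for 8x8 matrices.

```python
def lower_triangle_column_major(n):
    """(row, col) indices of the lower triangle, streamed column by column."""
    return [(row, col) for col in range(n) for row in range(col, n)]

def valid_pattern(n, latency):
    """Valid signal for one latency period: high for n(n+1)/2 cycles, then low."""
    valid_cycles = n * (n + 1) // 2
    if latency < valid_cycles:
        raise ValueError("latency must cover all lower-triangle elements")
    return [True] * valid_cycles + [False] * (latency - valid_cycles)

order = lower_triangle_column_major(8)
pattern = valid_pattern(8, 75)          # 75 is an example latency value
print(len(order), pattern.count(True), pattern.count(False))
```

For an 8x8 matrix this gives 36 valid cycles followed by (latency – 36) idle cycles, matching the figure description.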
Table 21. Recommended Values for the Minimum Latency (maximum throughput)
In Intel Stratix 10 and Intel Arria 10 devices, speed grade –1 and –2, for three different matrix sizes.
4x4 ≥ 30 ≥ 30
8x8 ≥ 75 ≥ 74
Matrix Dimension Number of channels Logic Elements (ALMs) DSP Blocks Memory bits RAM blocks Registers
Table 23. Performance of the floating-point matrix inversion module for different
matrix dimensions
This table shows the fMAX performance of the floating-point design for different matrix sizes with a system clock
of 368.64 MHz and targeting an FPGA device. The maximum throughput is in millions of matrix inversions per
second.
Matrix Dimension Number of channels Target System clock (MHz) fMAX (MHz) ThroughputMAX
The design decomposes A into L*L', therefore L*L'*x = b, or L*y = b, where y = L'*x.
The design solves y with forward substitution and x with backward substitution.
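The decomposition and the two substitution passes can be sketched for a real symmetric positive-definite matrix as follows. This is a plain software illustration with a hypothetical function name; the reference design itself works on streamed, indexed elements.

```python
import math

def cholesky_solve(a, b):
    """Solve A*x = b for symmetric positive-definite A using A = L*L'."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal element first, using the inverse square root.
        inv_sqrt = 1.0 / math.sqrt(a[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        L[j][j] = 1.0 / inv_sqrt
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) * inv_sqrt

    # Forward substitution: L*y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]

    # Backward substitution: L'*x = y, starting from the bottom element.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

print(cholesky_solve([[4.0, 2.0], [2.0, 3.0]], [8.0, 7.0]))
```

Backward substitution produces the bottom element of x first, which is consistent with the output order of the design.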
The design calculates the diagonal element of the matrix first before proceeding to
subsequent rows to hide the processing latency of the inverse square root function as
much as possible. Performance of multiple-bank operations with a vector size of about
30 may increase by up to 6%. Expect no performance gain when the vector size is the
same as the matrix size.
To input the lower triangular elements of matrix A and b with the input bus, specify
the column, row, and channel index of each element. The design transposes and
appends the column vector b to the bottom of A and treats it as an extension of A in
terms of column and row addressing.
The output is column vector x with the bottom element output first.
The design decomposes A into L*L', therefore L*L'*x = b, or L*y = b, where y = L'*x.
The design solves y with forward substitution and x with backward substitution.
This design uses cycle stealing and command FIFO techniques to enhance
performance. Although it targets multiple channels, it also works well with single
channels.
To input the lower triangular elements of matrix A and b with the input bus, specify
the column, row, and channel index of each element. The design transposes and
appends the column vector b to the bottom of A and treats it as an extension of A in
terms of column and row addressing.
The output is column vector x with the bottom element output first.
You can change the simulation length by clicking on the Simulink Length block.
Related Information
Crest factor reduction for wireless systems
The FIR filter length is 2 × (Dmax / Dmin) × N + 1, where Dmax and Dmin are the
maximum and minimum decimation ratios and N is the number of (one-sided) symmetric
coefficients at Dmin.
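As a worked example of the length formula, the sketch below evaluates it with exact arithmetic and also checks the channel constraint stated in the next paragraph (the product of the number of channels and the minimum decimation ratio must be 4 or more). The function names and the example ratios are hypothetical.

```python
from fractions import Fraction

def fir_length(d_max, d_min, n):
    """Filter length 2 * (Dmax / Dmin) * N + 1 for integer decimation ratios."""
    length = 2 * Fraction(d_max, d_min) * n + 1
    if length.denominator != 1:
        raise ValueError("filter length is not an integer for these ratios")
    return int(length)

def channel_constraint_ok(num_channels, d_min):
    """Product of channel count and minimum decimation ratio must be >= 4."""
    return num_channels * d_min >= 4

# Example values only: Dmax = 8, Dmin = 2, N = 10 one-sided coefficients.
print(fir_length(8, 2, 10))            # 2 * 4 * 10 + 1 = 81
print(channel_constraint_ok(2, 2))
```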
All channels must have the same decimation ratio. The product of the number of
channels and the minimum decimation ratio must be 4 or more. The design limits the
wire count to 1 and:
To optimize the overall throughput the solver can interleave multiple data instances at
the same time. The inputs of the design are system matrices A [n × m] and input
vectors.
The reference design uses the Gram-Schmidt method to decompose system matrix A
to Q and R matrices. It calculates the solution of the system by completing backward
substitution.
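A software sketch of the same flow is shown below: decompose A into Q and R, then solve R x = Qᵀ b by backward substitution. It uses the modified Gram-Schmidt variant for numerical stability, which is a choice made here and not necessarily the variant in the reference design; it is real-valued, and the function name is hypothetical.

```python
import math

def qr_solve(a, b):
    """Solve A*x = b (A is n x m, n >= m, full column rank) via QR."""
    n, m = len(a), len(a[0])
    columns = [[a[i][j] for i in range(n)] for j in range(m)]
    q = []                                   # orthonormal columns of Q
    r = [[0.0] * m for _ in range(m)]        # upper triangular R
    for j in range(m):
        v = list(columns[j])
        for k in range(j):
            # Modified Gram-Schmidt: project the running residual v.
            r[k][j] = sum(qk * vi for qk, vi in zip(q[k], v))
            v = [vi - r[k][j] * qk for vi, qk in zip(v, q[k])]
        r[j][j] = math.sqrt(sum(vi * vi for vi in v))
        q.append([vi / r[j][j] for vi in v])

    # Right-hand side Q^T * b.
    qtb = [sum(qi * bi for qi, bi in zip(q[j], b)) for j in range(m)]

    # Backward substitution on R.
    x = [0.0] * m
    for i in reversed(range(m)):
        x[i] = (qtb[i] - sum(r[i][k] * x[k] for k in range(i + 1, m))) / r[i][i]
    return x

print(qr_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```

For a tall matrix (n greater than m) the same code returns the least-squares solution.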
6.12.20. QR Decomposition
This reference design is a complete linear equations system solution that uses QR
decomposition.
The reference design uses the Gram-Schmidt method to decompose system matrix A
to Q and R matrices, and calculates the solution of the system by completing
backward substitution.
This design uses the Run All Testbenches block to access enhanced features of the
automatically-generated testbench. An application-specific m-function verifies the
simulation output, to correctly handle the complex results and the numerical
approximation because of the floating-point format.
You can modify the parameters in the setup_vardownsampler.m file, which you
access from the Edit Params icon.
The top-level testbench includes blocks to access control and signals, and to run the
Quartus Prime software. It also includes an Edit Params block to allow easy access to
the configuration variables in the setup_sc_LTEtxr.m script. A discrete-time scatter
plot scope displays the constellation of the modulated signal in in-phase versus
quadrature components.
The LTE_txr subsystem includes a Device block to specify the target FPGA device,
and 64QAM, 1K_IFFT, ScaleRnd, CP_bReverse, Chg_Data_Format, and DUC
blocks.
The 64QAM subsystem uses a lookup table to convert the source input data into
64QAM symbol-mapped data. The 1K_IFFT subsystem converts the frequency-domain
quadrature amplitude modulation (QAM) symbols to the time domain. The
ScaleRnd subsystem follows this conversion; it scales down the output signals
and converts them to the specified fixed-point type.
The CP_bReverse subsystem adds an extended cyclic prefix (CP), or guard interval,
to each orthogonal frequency-division multiplexing (OFDM) symbol to avoid the
intersymbol interference (ISI) that multipath propagation causes. The CP_bReverse
block also reorders the output samples of the IFFT subsystem, which are in
bit-reversed order, so that they are in the correct order in the time domain. The
design adds the cyclic prefix by copying the last 25% of the data frame and
prepending the copy to the frame.
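The cyclic prefix step can be sketched in MATLAB as follows. The frame length of 1024 is an assumption taken from the name of the 1K_IFFT subsystem, and the frame contents are a stand-in for real IFFT output.

```matlab
frameLen = 1024;                  % assumed from the 1K_IFFT subsystem name
frame    = (1:frameLen).';        % stand-in for one frame of IFFT output
cpLen    = frameLen / 4;          % the last 25% of the data frame

% Copy the tail of the frame and place it in front of the frame.
withCp = [frame(end-cpLen+1:end); frame];

fprintf('Frame: %d samples, with prefix: %d samples\n', ...
        frameLen, numel(withCp));
```

The first cpLen samples of withCp are identical to its last cpLen samples, which is the property a receiver relies on when it discards the prefix.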
A system clock rate of 245.76 MHz drives the design on the FPGA. The Signals block
of the design defines this clock. The input random data for the 64QAM symbol
mapping subsystem has a data rate of 15.36 Msps.
The design applies this linear system of equations to the steering vector in the
following two steps:
• Forward substitution with the lower triangular matrix
• Backward substitution with the lower triangular matrix
This design uses advanced settings from the DSP Builder > Verify Design menu to
access enhanced features of the automatically generated testbench. An
application-specific m-function verifies the simulation output, to correctly compare complex
results and properly handle floating-point errors that arise from the ill-conditioning of
the QRD output.
This design uses the Run All Testbenches block to access enhanced features of the
automatically generated testbench. An application-specific m-function verifies the
simulation output, to correctly handle the complex results and the numerical
approximation due to the floating-point format.
The design includes the following features so you can simulate and verify the transmit
and receive beamforming operations:
• Waveform (chirp) generation
• Target emulation
• Receiver noise emulation
• Aperture tapering
• Pulse compression
The transmitter can produce random data, which is useful for generating a hardware
demo, or you can feed it with data from the MATLAB environment. You can modulate
the data, where the modulation order can be QAM4 or QAM64. The design filters the
signal, and then feeds it into optional crest factor reduction (CFR) and digital
predistortion (DPD) blocks. Intel assumes you have a control processor that configures
the modulation scheme and the CFR and DPD parameters.
The channel model contains a random noise source, and a channel model, which you
can configure through the setup script. This channel model allows you to build a
hardware demonstrator on a standard FPGA development platform, without DA or AD
converters and analogue components. Following the channel model is the model of a
decimating ADC, which emulates the behavior of some existing ADC components that
provide this functionality.
The receiver contains an RRC filter, followed by an equalizer. Intel assumes that a
control processor calculates the equalizer coefficients. The equalizer feeds into an AGC
block, which feeds into a demapper. You can configure the demapper to different
modulation orders.
You can modify the parameters in the setup_vardecimator_rt.m file, which you
access from the Edit Params icon.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output buses. An
Edit Params block allows easy access to the setup variables in the
setup_demo_complex_mixer.m script.
Note: This design example uses the Simulink Signal Processing Blockset.
This design example demonstrates frequency-hopping with the NCO block to generate
four channels of sinusoidal waves that you can switch from one set (bank) of
frequencies to another.
The phase increment values are set directly into the NCO Parameter dialog box as a
2 (rows) × 4 (columns) matrix. The input for the bank index is set up so that it
alternates between the two predefined banks with each one lasting 2000 steps.
A BusStimulus block sets up an Avalon-MM interface that writes into the phase
increment memory registers. It shows how you can use the Avalon-MM interface to
dynamically change the frequencies of the NCO-generated sinusoidal signals at run
time. This design example uses a 16-bit memory interface (as the Control block
specifies) and a 24-bit accumulator in the NCO block. The design example
requires two registers for each phase increment value. With the base address of the
phase increment memory map set to 1000 in this design example, the addresses
[1000 1001 1002 1003 1012 1013 1014 1015] write to the phase increment memory
registers of channels 1 and 2 in bank 1, and to the registers of channels 3 and 4 in
bank 2. The write data is also made up of two parts with each part writing to one of
the registers feeding the selected phase increment accumulators.
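The following MATLAB sketch shows one register layout that reproduces the address list above, and one way to split a 24-bit value across two 16-bit writes. The layout (bank-major, then channel, with two consecutive registers per value and zero-based indices) and the low-word-first ordering are assumptions made for illustration; check the BusStimulus block in the design example for the actual data setup.

```matlab
baseAddr    = 1000;   % base address of the phase increment memory map
numChannels = 4;
regsPerIncr = 2;      % a 24-bit value needs two 16-bit registers

% Assumed layout: zero-based bank and channel indices, bank-major order.
regAddr = @(bank, chan) baseAddr ...
    + (bank * numChannels + chan) * regsPerIncr + (0:regsPerIncr-1);

% First two channels of the first bank, last two channels of the second bank.
addrs = [regAddr(0, 0), regAddr(0, 1), regAddr(1, 2), regAddr(1, 3)];
disp(addrs);          % 1000 1001 1002 1003 1012 1013 1014 1015

% Split a hypothetical 24-bit phase increment into two 16-bit words.
phaseIncr = hex2dec('123456');
lowWord   = bitand(phaseIncr, hex2dec('FFFF'));   % bits 15..0
highWord  = bitshift(phaseIncr, -16);             % bits 23..16
```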
This design example has two banks of frequencies, with each bank processing 2,000
steps before switching to the other. You should write a new value into the phase
increment memory register for each bank to change the NCO output frequencies after
8,000 steps during simulation. To avoid writing new values to the active bank, the
design example configures the write enable signals in the following way:
This configuration ensures that a new phase increment value for bank 0 is written at
7000 steps when the NCO is processing bank 1; and a new phase increment value for
bank 1 is written at 9000 steps when the NCO is processing bank 0.
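You can check the write timing with a short MATLAB calculation. This sketch assumes that the NCO starts on bank 0 at step 0 and switches banks every 2,000 steps; the variable and function names are illustrative only.

```matlab
stepsPerBank = 2000;
numBanks     = 2;

% Bank that the NCO is processing at a given simulation step
% (assumes bank 0 is active at step 0).
activeBank = @(step) mod(floor(step / stepsPerBank), numBanks);

writes = [0 7000;     % [bank written, step of the write]
          1 9000];
for i = 1:size(writes, 1)
    fprintf('Write to bank %d at step %d while bank %d is active\n', ...
            writes(i, 1), writes(i, 2), activeBank(writes(i, 2)));
end
```

Neither write targets the active bank. With numBanks set to 4, the same function gives banks 3, 0, 1, and 2 at steps 15000, 17000, 19000, and 21000.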
Four writes for each bank exist to write new values for channel 1 and 2 into bank 0,
and new values for channel 3 and 4 into bank 1. Each new phase value needs two
registers due to the size of the memory interface.
The Spectrum Scope block shows three peaks for a selected channel with the first
two peaks representing the two banks and the third peak showing the frequency that
you specify through the memory interface. The scope of the selected channel shows the
sinusoidal waves of the channel you select. You can zoom in to see the smooth and
continuous sinusoidal signals at the switching point. You can also see the frequency
changes after 8000 steps where the phase increment value alters through the memory
interface.
Note: This design example uses the Simulink Signal Processing Blockset.
This design example is similar to the Four Channel, Two Banks NCO design, but it has
four banks of frequencies defined for the phase increment values. Each spectrum plot
has five peaks: the fifth peak shows the changes the design example writes through
the memory interface.
The design example uses a 32-bit memory interface with a 24-bit accumulator. Hence,
the design example requires only one phase increment memory register for each
phase increment value—refer to the address and data setup on the BusStimulus
block inside this design example.
This design example has four banks of frequencies with each bank processed for 2,000
steps before switching to the other. You should write a new value into the phase
increment memory register for each bank to change the NCO output frequencies after
16,000 steps during simulation. To avoid writing new values to the active bank, the
design example configures the write enable signals in the following way:
This configuration ensures that a new phase increment value for bank 0 is written at
15000 steps when the NCO is processing bank 3; a new phase increment value for
bank 1 is written at 17000 steps when the NCO is processing bank 0; a new phase
increment value for bank 2 is written at 19000 steps when the NCO is processing
bank 1; and a new phase increment value for bank 3 is written at 21000 steps when
the NCO is processing bank 2.
There is one write for each bank to write a new value for channel 1 into bank 0; a new
value for channel 2 into bank 1; a new value for channel 3 into bank 2; and a new
value for channel 4 into bank 3. Each new phase value needs only one register due to
the size of the memory interface.
Note: This design example uses the Simulink Signal Processing Blockset.
This design example is similar to the Four Channel, 16 Banks NCO design, but has
only eight banks of phase increment values (specified in the setup script for the
workspace variable) feeding into the NCO. Furthermore, the sample time for the NCO
requires two wires to output the four channels of the sinusoidal signals. Two wires
exist for the NCO output, and each wire contains only two channels. Hence, the channel
indicator changes from 0 .. 3 to 0 .. 1.
You can inspect the eight peaks on the spectrum graph for each channel and see the
smooth continuous sinusoidal waves on the scope display.
The design example outputs the data to the workspace and plots it with the
separate demo_mc_nco_extracted_waves.mdl model, which demonstrates that the
output of the bank you select does represent a genuine sinusoidal wave. However,
from the scope display, you can see that the sinusoidal wave is no longer smooth at
the switching point, because the selected banks use different phase
increment values. You can only run the
demo_mc_nco_extracted_waves.mdl model after you run
demo_mc_nco_8banks_2wires.mdl.
Note: This design example uses the Simulink Signal Processing Blockset.
A workspace variable phaseIncr defines the 16 (rows) × 4 (columns) matrix for the
phase increment input with the phase increment values that the setup script
calculates.
The input for the bank index is set up so that it cycles from 0 to 15 with each bank
lasting 1200 steps.
The spectrum display clearly shows 16 peaks for the selected channel, indicating that
the design example generates 16 different frequencies for that channel. The scope of
the selected channel shows the sinusoidal waves of the selected channel. You can
zoom in to see that the design example generates smooth and continuous sinusoidal
signals at the switching point.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus ChanView blocks that deserialize the output buses. An
Edit Params block allows easy access to the setup variables in the
setup_demo_mc_nco_16banks.m script.
Note: This design example uses the Simulink Signal Processing Blockset.
6.13.6. IP
The IP design example describes how you can build an NCO design with the NCO block
from the Waveform Synthesis library.
Note: This design example uses the Simulink Signal Processing Blockset.
6.13.7. NCO
This design example uses the NCO block from the Waveform Synthesis library to
implement an NCO. The design compares the results with a Simulink double-precision
sine or cosine wave.
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus ChanView blocks that deserialize the output buses. An
Edit Params block allows easy access to the setup variables in the
setup_demo_nco.m script.
Note: This design example uses the Simulink Signal Processing Blockset.
Related Information
NCO on page 260
The top-level testbench includes Control, Signals, Run ModelSim, and Run
Quartus Prime blocks, plus a ChanView block that deserializes the output buses. An
Edit Params block allows easy access to the setup variables in the
setup_demo_mix.m script.
Note: This design example uses the Simulink Signal Processing Blockset.
A super-sample NCO uses multiple NCOs that each have an initial phase offset. When
you combine the parallel outputs into a serial stream, they can describe frequencies
up to N times the Nyquist frequency of a single NCO, where N is the total number of
NCOs that the design uses.
The NCO block produces four outputs, which all have the same phase increment but
each has a different, evenly distributed initial phase offset. When you serialize the
four parallel outputs, they describe frequencies up to four times higher than the
Nyquist frequency of an individual NCO.
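The principle can be sketched in floating-point MATLAB as follows. This is an idealized model, not the NCO block: it uses cos() in place of the hardware lookup, and the phase step delta and the sample count M are hypothetical values.

```matlab
N     = 4;             % number of parallel NCOs
delta = 2*pi*0.37;     % hypothetical phase step per serial output sample
M     = 16;            % samples that each NCO generates

n        = 0:M-1;
parallel = zeros(N, M);
for k = 0:N-1
    offset = k * delta;    % evenly distributed initial phase offset
    incr   = N * delta;    % the same phase increment for every NCO
    parallel(k+1, :) = cos(offset + incr * n);
end

% Reading the matrix column by column interleaves NCO 0..N-1.
serial = parallel(:).';

% A single NCO running N times faster produces the same samples.
reference = cos(delta * (0:N*M-1));
maxErr    = max(abs(serial - reference));
```

Here the serial stream carries 0.37 cycles per sample, which is 1.48 cycles per sample of an individual NCO and therefore above that NCO's Nyquist limit of 0.5.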
To change the frequency of the super-sample NCO using the bus, write a new phase
increment and offset to each of the four constituent NCOs and then strobe the
synchronization register. The NCO block includes the phase increment register; a
separate primitive subsystem implements the phase offset and synchronization
registers.
DSP Builder writes the output of the super-sample NCO into a MATLAB workspace
variable and compares it with a MATLAB-generated waveform in the script
test_demo_nco_super_sample.
DSP Builder schedules the bus in HDL but not in Simulink, so bus writes occur at
different clock cycles. Therefore, the verify_demo_nco_super_sample
function verifies the design by checking that the Simulink and ModelSim frequency
distributions match within a tolerance.
The output of the Spectrum Analyser block shows that the simulation initializes to the last
frequency in dspb_super_nco.frequencies and then rotates through the list.
Note: This design example uses the Simulink Signal Processing Blockset.
7. DSP Builder Design Rules, Design Recommendations, and Troubleshooting
Related Information
• Control on page 236
• Avalon-MM Slave Settings (AvalonMMSlaveSettings) on page 234
• External Memory, Memory Read, Memory Write on page 279
• Channel In (ChannelIn) on page 359
• Channel Out (ChannelOut) on page 360
• Synthesis Information (SynthesisInfo) on page 362
• Setting DSP Builder Design Parameters with MATLAB Scripts on page 182
Related Information
DSP Builder Design Rules and Recommendations on page 175
If the pipelining requirements of the functional units around the loop are greater than
the delay specified by the SampleDelay blocks on the loop path, DSP Builder
generates an error message. The message states that distribution of memory failed as
there was insufficient delay to satisfy the fMAX requirement. DSP Builder cannot
simultaneously satisfy the pipelining to achieve the given fMAX and the loop criteria to
re-circulate the data in the number of clock cycles specified by the SampleDelay
blocks.
DSP Builder automatically adjusts the pipeline requirements of every Primitive block
according to these factors:
• The type of block
• The target fMAX
• The device family and speedgrade
• The number of inputs
• The bit width in the data inputs
Note: Multipliers on Cyclone devices take two cycles at all clock rates. On Stratix V, Arria V,
and Cyclone V devices, fixed-point multipliers take two cycles at low clock rates, three
cycles at high clock rates. Very wide fixed-point multipliers incur higher latency when
DSP Builder splits them into smaller multipliers and adders. You cannot count the
multiplier and adder latencies separately because DSP Builder may combine them into
a single DSP block. The latency of some blocks depends on what pipelining you apply
to surrounding blocks. DSP Builder avoids pipelining every block but inserts pipeline
stages after every few blocks in a long sequence of logical components, if fMAX is
sufficiently low that timing closure is still achievable.
In the SynthesisInfo block, you can optionally specify a latency constraint limit that
can be a workspace variable or expression, but must evaluate to a positive integer.
However, only use this feature to add further latency. Never use the feature to reduce
latency to less than the latency required to pipeline the design to achieve the target
fMAX.
After you run a simulation in Simulink, the help page for the SynthesisInfo block
shows the latency, port interface, and estimated resource utilization for the current
Primitive subsystem.
When no loops exist, feed-forward datapaths are balanced to ensure that all the input
data reaches each functional unit in the same cycle. After analysis, DSP Builder inserts
delays on all the non-critical paths to balance out the delays on the critical path.
In designs with loops, DSP Builder advanced blockset must synthesize at least one
cycle of delay in every feedback loop to avoid combinational loops that Simulink
cannot simulate. Typically, one or more lumped delays exist. To preserve the delay
around the loop for correct operation, the functional units that need more pipelining
stages borrow from the lumped delay.
Designs that have a feedback loop containing two adders with only a single sample delay do
not have sufficient delay. In automatically pipelining designs, DSP Builder creates a schedule of
signals through the design. From internal timing models, DSP Builder calculates how
fast certain components, such as wide adders, can run and how many pipelining
stages they require to run at a specific clock frequency. DSP Builder must account for
the required pipelining while not changing the order of the schedule. The single
sample delay is not enough to pipeline the path through the two adders at the specific
clock frequency. DSP Builder is not free to insert more pipelining, as it changes the
algorithm, accumulating every n cycles, rather than every cycle. The scheduler detects
this change and gives an appropriate error indicating how much more latency the loop
requires for it to run at the specific clock rate. In multiple loops, this error may be hit
a few times in a row as DSP Builder balances and resolves each loop.
The folded IIR filter design example (demo_iir_fold2) demonstrates one channel, at
a low data rate. This design example implements a single-channel infinite impulse
response (IIR) filter with a subsystem built from Primitive blocks folded down to a
serial implementation.
The design of the IIR is the same as the IIR in the multichannel example, demo_iir.
As the channel count is one, the lumped delays in the feedback loops are all one. If
you run the design at full speed, there is a scheduling problem. With new data arriving
every clock cycle, the lumped delay of one cycle is not enough to allow for pipelining
around the loops. However, the data arrives at a much slower rate than the clock rate,
in this example 32 times slower (the clock rate in the design is 320 MHz, and the
sample rate is 10 MHz), which gives 32 clock cycles between each sample.
You can set the lumped delays to 32 cycles long—the gap between successive data
samples—which is inefficient both in terms of register use and in underused multipliers
and adders. Instead, use folding to schedule the data through a minimum set of fully
used hardware.
Set the SampleRate on both the ChannelIn and ChannelOut blocks to 10 MHz, to
inform the synthesis for the Primitive subsystem of the schedule of data through the
design. Even though the clock rate is 320 MHz, each data sample per channel is
arriving only at 10 MHz. The RTL is folded down—in multiplier use—at the expense of
extra logic for signal multiplexing and extra latency.
Intel Corporation. All rights reserved. Intel, the Intel logo, Altera, Arria, Cyclone, Enpirion, MAX, Nios, Quartus
and Stratix words and logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other
countries. Intel warrants performance of its FPGA and semiconductor products to current specifications in
accordance with Intel's standard warranty, but reserves the right to make changes to any products and services
at any time without notice. Intel assumes no responsibility or liability arising out of the application or use of any
information, product, or service described herein except as expressly agreed to in writing by Intel. Intel
customers are advised to obtain the latest version of device specifications before relying on any published
information and before placing orders for products or services.
*Other names and brands may be claimed as the property of others.
ISO 9001:2015 Registered
8. Techniques for Experienced DSP Builder Users
For example:
C:\Altera\16.0\quartus\dspba\dsp_builder.bat -m "C:\tools\matlab\R2013a\windows64\bin\matlab.exe"
You can copy the shortcut from the Start menu and paste it to your desktop to create
a desktop shortcut. You can edit the properties to use different installed DSP Builder
releases, different MATLAB releases, or different start directories.
Related Information
Starting DSP Builder in MATLAB
my_design_params.clockrate = 200;
my_design_params.samplerate = 50;
my_design_params.inputChannels = 4;
2. Clear the specific workspace variables you create with a clear-up script that runs
when you close the model. Do not use clear all.
For example, if you use the named structure my_design_params, run clear
my_design_params; in the clear-up script. You may have other temporary workspace
variables to clear too.
For example, in a script that passes the design name (without .mdl extension) as
model you can use:
%% Load the model
load_system(model);
%% Get the Signals block
signals = find_system(model, 'type', 'block', ...
    'MaskType', 'DSP Builder Advanced Blockset Signals Block');
if (isempty(signals))
    error('The design must contain a Signals Block. ');
end;
%% Get the Control block
control = find_system(model, 'type', 'block', ...
    'MaskType', 'DSP Builder Advanced Blockset Control Block');
if (isempty(control))
    error('The design must contain a Control Block. ');
end;
%% Example: set the RTL destination directory
%% (freq must already hold a numeric value, for example the target clock frequency)
rtlDir = ['../rtl' num2str(freq)];
dspba.SetRTLDestDir(model, rtlDir);
Similarly you can get and set other parameters. For example, on the Signals block
you can set the target clock frequency:
fmax_freq = 300.0;
dspba.set_param(signals{1},'freq', fmax_freq);
You can also change the following threshold values that are parameters on the
Control block:
• distRamThresholdBits
• hardMultiplierThresholdLuts
• mlabThresholdBits
• ramThresholdBits
You can loop over changing these values, change the destination directory, run the
Quartus Prime software each time, and perform design space exploration. For
example:
%% Run a simulation; which also does the RTL generation.
t = sim(model);
%% Then run the Quartus Prime compilation flow.
[success, details] = run_hw_compilation(<model>, './')
%% where details is a struct containing resource and timing information
details.Logic,
details.Comb_Aluts,
details.Mem_Aluts,
details.Regs,
details.ALM,
details.DSP_18bit,
details.Mem_Bits,
details.M9K,
details.M144K,
details.IO,
details.FMax,
details.Slack,
details.Required,
details.FMax_unres,
details.timingpath,
details.dir,
details.command,
details.pwd
such that >> disp(details) gives output something like:
Logic: 4915
Comb_Aluts: 3213
Mem_Aluts: 377
Regs: 4725
ALM: 2952
DSP_18bit: 68
Mem_Bits: 719278
M9K: 97
M144K: 0
IO: 116
FMax: 220.1700
Slack: 0.4581
Required: 200
FMax_unres: 220.1700
timingpath: [1x4146 char]
dir: '../quartus_demo_ifft_4096_for_SPR_FFT_4K_n_2'
command: [1x266 char]
pwd: 'D:\test\script'
Note: The Timing Report is in the timingpath variable, which you can display by
disp(details.timingpath). Unused resources may appear as -1, rather than 0.
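A design space exploration loop built from the calls shown above might look like the following sketch. It runs only inside a DSP Builder session and is untested here. It assumes that model and control are set up as in the earlier script, that dspba.set_param accepts the Control block threshold names in the same way as it accepts 'freq' on the Signals block, and that the threshold values and directory names (which are hypothetical) suit your design.

```matlab
% Hypothetical values to try for hardMultiplierThresholdLuts.
thresholds = [100 200 400];

fmax = nan(size(thresholds));
dsp  = nan(size(thresholds));

for i = 1:numel(thresholds)
    % Assumption: Control block thresholds are set like other parameters.
    dspba.set_param(control{1}, 'hardMultiplierThresholdLuts', thresholds(i));

    % Keep the RTL for each run in its own directory (hypothetical names).
    dspba.SetRTLDestDir(model, ['../rtl_thr' num2str(thresholds(i))]);

    % Simulate (which also generates the RTL), then compile.
    sim(model);
    [success, details] = run_hw_compilation(model, './');

    if success
        fmax(i) = details.FMax;
        dsp(i)  = details.DSP_18bit;
    end
end

% Summarize the results; failed runs remain NaN.
disp([thresholds(:), fmax(:), dsp(:)]);
```

Recording only a few fields per run keeps the summary readable; the full details struct for each run is still available inside the loop if you need the timing report.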
A useful set of commands to generate RTL, compile in the Quartus Prime software and
return the details is:
load_system(<model>);
sim(<model>);
[success, details] = run_hw_compilation(<model>, './')
Based on the FPGA clock rate and data sample rates, you can derive how many clock
cycles are available to process unique data samples. This parameter is called Period in
many of the design examples. For example, for a period of three, a new sample for
the same channel appears every three clock cycles. For multiplication, you have three
clock cycles to compute one multiplication for this channel. In a design with multiple
channels, you can accommodate three different channels with just one multiplier. A
resource reuse potential exists when the period is greater than one.
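The period calculation can be sketched in MATLAB as follows. The clock rate, sample rate, and channel count reuse the example values of my_design_params shown earlier in this chapter; multsPerSample is a hypothetical figure for a design that needs one multiplication per sample, and the multiplier estimate ignores any other scheduling constraint.

```matlab
clockRate      = 200;   % FPGA clock rate, MHz
sampleRate     = 50;    % data sample rate per channel, Msps
numChannels    = 4;
multsPerSample = 1;     % hypothetical: multiplications per sample

% Clock cycles available to process each sample of one channel.
period = floor(clockRate / sampleRate);

% Each multiplier can perform 'period' multiplications per sample
% interval, so channels can share multipliers.
multipliers = ceil(numChannels * multsPerSample / period);

fprintf('Period: %d, multipliers needed: %d\n', period, multipliers);
```

With these values the period is four, so a single multiplier serves all four channels.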
1. Define the following parameters:
The Simulink Model Info block displays revision control information about a model as
an annotation block in the model's block diagram. It shows revision control
information embedded in the model and information maintained by an external
revision control or configuration management system.
You can customize some revision control tools to use the Simulink report generator
XML comparison, which allows you to compare two versions of the same file.
Note: You do not need to archive autogenerated files such as Quartus Prime project files or
synthesizable RTL files.
the latency after you complete part of a DSP Builder design, for example on an IP
library block or for a Primitive subsystem. In other cases, you may want to limit the
latency in advance, which allows future changes to other subsystems without causing
undesirable effects upon the overall design.
To accommodate extra latency, insert registers. This feature applies only to Primitive
subsystems. To access this feature, use the SynthesisInfo block.
Latency is the number of delays in the valid signal across the subsystem. The DSP
Builder advanced blockset balances delays in the valid and channel path with
delays that DSP Builder inserts for autopipelining in the datapath.
Note: User-inserted sample delays in the datapath are part of the algorithm, rather than
pipelining, and are not balanced. However, any uniform delays that you insert across
the entire datapath optimize out. If you want to constrain the latency across the entire
datapath, you can specify this latency constraint in the SynthesisInfo block.
If the valid input directly drives the valid output, the delay on the valid signal matches
the latency displayed on the ChannelOut block. The delay does not match if the valid
output is generated in any other way, for example by using a Sequence block.
For example, the 4K FFT design example uses a Sequence block to drive the valid
signal explicitly.
The latency that the ChannelOut block reports is therefore not 4096 + the automatic
pipelining value, but just the pipelining value.
In this example, the Mult block has a direct feed-through simulation model, and the
following SampleDelay block has a delay of 10. The Mult block has zero delay in
simulation, followed by a delay of 10. In the generated hardware, DSP Builder
distributes part of this 10-stage pipelining throughout the multiplier optimally, such
that the Mult block has a delay (in this case, four pipelining stages) and the
SampleDelay block a delay (in this case, six pipelining stages). The overall result is
the same—10 pipelining stages, but if you try to match signals in the primitive
subsystem against hardware, you may find DSP Builder shifts them by several cycles.
Similarly, if you have insufficient user-inserted delay to meet the required fMAX, DSP
Builder automatically pipelines and balances the delays, and then corrects the cycle-
accuracy of the primitive subsystem as a whole, by delaying the output signals in
simulation by the appropriate number of cycles at the ChannelOut block.
If you specify no pipelining, the simulation design example for the multiplier is direct-
feed-through, and the result appears on the output immediately.
To reach the desired fMAX, DSP Builder then inserts four pipelining stages in the
multiplier, and balances these with four registers on the channel and valid paths.
To correct the simulation design example to match hardware, the ChannelOut block
delays the outputs by four cycles in simulation and displays Lat: 4 on the block. Thus,
if you compare the output of the multiplier simulation with the hardware it is now four
cycles early in simulation; but if you compare the primitive subsystem outputs with
hardware they match, because the ChannelOut block provides the simulation
correction for the automatically inserted pipelining.
If you want a consistent 10 cycles of delay across the valid, channel and
datapath, you may need latency constraints.
This example has a consistent line of SampleDelay blocks inserted across the design.
However, the algorithm does not use these delays. DSP Builder recognizes that the
design does not require them and optimizes them away, leaving only the delay that
the design requires. In this case, each block requires a delay of four to balance the four
pipelining stages that the multiplier needs to reach the target fMAX. The delay of
10 in simulation remains from the non-direct-feed-through SampleDelay blocks. In
such cases, you receive the following warning on the MATLAB command line:
DSP Builder optimizes away some user inserted SampleDelays. The latency on
the valid path across primitive subsystem design name in hardware is 4, which
may differ from the simulation model. If you need to preserve extra
SampleDelay blocks in this case, use the Constraint Latency option on the
SynthesisInfo block.
Note: SampleDelay blocks reset to unknown values ('X'), not to zero. Designs that rely on
SampleDelay blocks outputting zero after reset may not behave correctly in hardware. Use
the valid signal to indicate valid data and its propagation through the design.
Generally, problems occur in feedback loops. You can solve these issues by lowering
the fMAX target, or by restructuring the feedback loop to reduce the combinatorial logic
or increasing the delay. You can redesign some control structures that have feedback
loops to make them completely feed forward.
You cannot set a latency constraint that conflicts with the constraint that the fMAX
target implies. For example, a latency constraint of < 2 may conflict with the fMAX
implied pipelining constraint. The multiplier may need four pipelining stages to reach
the target fMAX. The simulation fails and issues an error, highlighting the Primitive
subsystem.
DSP Builder gives this error because you must increase the constraint limit by at least
3 (that is, to < 5) to meet the target fMAX.
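The arithmetic behind this error can be sketched as follows. This is a minimal Python illustration, not DSP Builder code: the function names are hypothetical, and the strict "latency < limit" form of the constraint is taken from the example above.

```python
def min_latency_limit(required_stages: int) -> int:
    """Smallest limit L for which 'latency < L' admits required_stages cycles."""
    return required_stages + 1


def limit_increase_needed(current_limit: int, required_stages: int) -> int:
    """Amount by which 'latency < current_limit' must grow (0 if already feasible)."""
    return max(0, min_latency_limit(required_stages) - current_limit)


# Example from the text: the multiplier needs four pipelining stages,
# but the latency constraint is '< 2'.
print(limit_increase_needed(current_limit=2, required_stages=4))  # 3
print(min_latency_limit(4))                                       # 5
```

A result of zero means the latency constraint does not conflict with the pipelining that the fMAX target implies.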
DSP Builder relocates the sample delay, to save registers, to the Boolean signal that
drives the s-input of the 2-to-1 Mux block. You may see a mismatch in the first cycle
and beyond, depending on the contents of the LUT.
When you design a control unit as an FSM, the locations of SampleDelay blocks
specify where DSP Builder expects zero values during the first cycle. In Figure 76 on
page 192, DSP Builder expects the first sample that the a-input of the CmpGE block
receives to be zero. Therefore, the first output value of that compare block is
high. Delay redistribution changes this initialization. You cannot rely on the reset state
of that block, especially if you embed the Primitive subsystem within a larger design.
Other subsystems may drive the feedback loop whose pipeline depth adapts to fMAX.
The first valid sample may only enter this subsystem after some arbitrary number of
cycles that you cannot predetermine. To avoid this problem, always ensure you anchor
the SampleDelay blocks to the valid signal so that the control unit enters a well-
defined state when valid-in first goes high.
To make a control unit design resistant to automated delay redistribution and to solve
most hardware FSM designs that fail to match simulation, replace every SampleDelay
block with the Anchored Delay block from the Control folder in the Additional
libraries. When the valid-in first goes high, the Anchored Delay block outputs one (or
more) zeros, otherwise it behaves just like an ordinary SampleDelay block.
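The difference between the two blocks can be pictured with a small behavioral model. The Python sketch below is an assumption-laden illustration, not the block's implementation: it models a one-cycle delay only, uses None to stand for an unknown ('X') register value, and the function names are hypothetical.

```python
X = None  # stands in for an unknown ('X') register value


def sample_delay(data, initial=X):
    """Ordinary one-cycle delay: outputs whatever the register holds."""
    out, state = [], initial
    for d in data:
        out.append(state)
        state = d
    return out


def anchored_delay(data, valid, initial=X):
    """One-cycle delay that outputs a defined zero when valid first goes high."""
    out, state, seen_valid = [], initial, False
    for d, v in zip(data, valid):
        if v and not seen_valid:
            out.append(0)       # first valid cycle: force a well-defined zero
            seen_valid = True
        else:
            out.append(state)   # otherwise behave like an ordinary delay
        state = d
    return out


data = [7, 8, 9, 10]    # values before the first valid cycle are arbitrary
valid = [0, 0, 1, 1]
print(sample_delay(data))           # [None, 7, 8, 9]
print(anchored_delay(data, valid))  # [None, 7, 0, 9]
```

In this model, the output on the first valid cycle no longer depends on what the register held before valid went high, which is the property the control unit relies on.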
Synthesizing the example design (fMAX = 250 MHz) on Arria V (speed grade 4) shows
that DSP Builder still redistributes the delays inside the Anchored
Delay block to minimize register utilization. DSP Builder still inserts a register
initialized to zero before the s-input of the 2-to-1 Mux block. However, the hardware
continues to match Simulink simulation because of the anchoring. If you place highly
pipelined subsystems upstream so that the control unit doesn't enter its first state
until several cycles after device initialization, the FSM still provides correct outputs.
Synchronization is maintained because DSP Builder inserts balancing delays on the
valid-in wire that drives the Anchored Delay and forces the control unit to enter its
initial state the correct number of cycles later.
Control units that use this design methodology are also robust to optimizations that
alter the latency of components. For example, when a LUT block grows sufficiently
large, DSP Builder synthesizes a DualMem block in its place that has a latency of at
least one cycle. Automated delay balancing inserts a sufficient number of one bit wide
delays on the valid signal control path inside every Anchored Delay. Hence, even if
the CmpGE block is registered, its reset state has no influence on the initial state of
the control unit when the valid-in first goes high.
Each Anchored Delay introduces a 2-to-1 Mux block in the control path. When
targeting a high fMAX (or slow device) tight feedback loops may fail to schedule or
meet timing. Using Anchored Delay blocks in place of SampleDelay blocks may also
use more registers and can also contribute to routing congestion.
This style uses FIFO buffers to capture valid outputs and provide flow control, and
loops and for loops to build simple and complex nested counter structures. Also add
latches to enable only components with state. This approach minimizes enable line
fan-out, which can otherwise be a bottleneck to performance.
Often designs need to stall or enable signals. Routing an enable signal to all the blocks
in the design can lead to high fan-out nets, which become the critical timing path in
the design. To avoid this situation, enable only blocks with state, while marking output
data as invalid when necessary.
DSP Builder provides the following utility blocks, which are masked subsystems, in
the Additional Blocks Control library:
• Zero-Latency Latch (latch_0L)
• Single-Cycle Latency Latch (latch_1L)
• Reset-Priority Latch (SRlatch_PS)
• Set-Priority Latch (SRlatch)
Some of these blocks use the Simulink Data Type Prop Duplicate block, which takes
the data type of a reference signal ref and back propagates it to another signal prop.
Use this feature to match data types without forcing an explicit type that you can use
in other areas of your design.
You can use FIFO buffers to build flexible, self-timed designs insensitive to latency.
They are an essential component in building parameterizable designs with feedback,
such as those that implement back pressure.
You must acknowledge reading of invalid output data. Consider a FIFO buffer with the
following parameters:
• Depth = 8
• Fill threshold = 2
• Fill period = 7
A three cycle latency exists between the first write and valid going high. The q output
has a similar latency in response to writes. The latency in response to read
acknowledgements is only one cycle for all output ports. The valid out goes low in
response to the first read, even though the design writes two items to the FIFO buffer.
The second write is not older than three cycles when the read occurs.
With the fill threshold set to a low value, the t output can go high even though the v
out is still zero. Also, the q output stays at the last value read when valid goes low in
response to a read.
Problems can occur when you use no feedback on the read line, or if you take the
feedback from the t output instead with fill threshold set to a very low value (< 3). A
situation may arise where a read acknowledgement is received shortly following a
write but before the valid output goes high. In this situation, the internal state of the
FIFO buffer does not recover for many cycles. Instead of attempting to reproduce this
behavior, Simulink issues a warning when a read acknowledgement is received while
valid output is zero. This intermediate state between the first write to an empty FIFO
buffer and the valid going high, highlights that the input to output latency across the
FIFO buffer is different in this case. This situation is the only time when the FIFO
buffer behaves with a latency greater than one cycle. With other primitive blocks,
which have consistent constant latency across each input to output path, you never
have to consider these intermediate states.
You can mitigate this issue by taking care when using the FIFO buffer. The model
needs to ensure that the read is never high when valid is low using the simple
feedback. If you derive the read input from the t output, ensure that you use a
sufficiently high threshold.
You can set fill threshold to a low number (<3) and arrive at a state where output t is
high and output v is low, because of differences in latency across different pairs of
ports—from w to v is three cycles, from r to t is one cycle, from w to t is one cycle. If
this situation arises, do not send a read acknowledgement signal to the FIFO buffer.
Ensure that when the v output is low, the r input is also low. A warning appears in the
MATLAB command window if you ever violate this rule. If you derive the read
acknowledgement signal with a feedback from the t output, ensure that the fill
threshold is set to a sufficiently high number (3 or above). The same applies to the
f output and the fill period.
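The rule above (never assert r while v is low) can be expressed as a simple gate on the read request. The following Python sketch is only an illustration of that rule; the function names and the constant are hypothetical and are not part of DSP Builder.

```python
MIN_SAFE_FILL_THRESHOLD = 3  # from the text: use a fill threshold of 3 or above


def gated_read(want_read: bool, v: bool) -> bool:
    """Read acknowledgement to drive into r: only assert it while v is high."""
    return want_read and v


def threshold_is_safe(fill_threshold: int) -> bool:
    """True if a read derived from the t output cannot precede v going high."""
    return fill_threshold >= MIN_SAFE_FILL_THRESHOLD


# With a low fill threshold, t can be high while v is still low.
# Gating the request with v suppresses the illegal read.
t, v = True, False
print(gated_read(want_read=t, v=v))   # False
print(threshold_is_safe(2))           # False
print(threshold_is_safe(3))           # True
```

In a Simulink model, the equivalent of gated_read is a logical AND between the read request and the v output fed back to the r input.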
If you supply vector data to the d input, you see vector data on the q output. DSP
Builder does not support vector signals on the w or r inputs, as the behavior is
unspecified. The v, t, and f outputs are always scalar.
The enable input and demo_kronecker design example demonstrate flow control
using a loop.
You can use either Loop or ForLoop blocks for building nested loops.
When a stack of nested loops is the appropriate control structure (for example, matrix
multiplication) use a single Loop block. When a more complex control structure is
required, use multiple ForLoop blocks.
Note: The DSP Builder standard blockset is a legacy product and Intel recommends you do
not use it for new designs, except as a wrapper for advanced blockset designs.
The Run ModelSim block from the advanced blockset does not work in a combined
blockset design. However, you can use the TestBench block from the standard
blockset to generate a testbench and compare the simulation results with the
ModelSim simulator.
You can still use the Run Quartus Prime block from the advanced blockset in a
combined blockset design but it only creates a Quartus Prime project for the advanced
blockset subsystem containing the Device block. Use a Signal Compiler block from
the standard blockset to create a Quartus Prime project for the whole combined
blockset design.
Note: DSP Builder generates the advanced blockset design when you simulate the design in
Simulink. Perform this simulation before DSP Builder generates the top-level design
example by running Signal Compiler.
The following settings and parameters must match across the two blocksets in an
integrated design:
• Use forward slash (/) separator characters to specify the hardware destination
directory that you specify in the Control block as an absolute path.
• The device family that you specify in the Device block must match the device
family you specify in the top level Signal Compiler block and the device on your
development board. However, you can set the specific device to Auto, or have
different values. The target device in the generated Quartus Prime project is the
device that you specify in the Signal Compiler block. HIL specifies its own
Quartus Prime project, which can have a different device provided that the device
family is consistent.
• The reset type that you specify in the advanced blockset Signals block must be
active High.
• When you run the TestBench for a combined blockset design, expect mismatches
when the valid signal is low.
• The standard blockset does not support vector signals. To convert any vectors in
the advanced blockset design, use multiplexer and demultiplexer blocks.
Use HDL Input and HDL Output blocks to connect the subsystem that contains the
advanced blockset design. The signal dimensions on the boundary between the
advanced blockset subsystem and the HDL Input/HDL Output blocks must match.
Figure 78. Advanced Blockset Subsystem Enclosed by HDL Input and Output Blocks
Note: The signal types are on either side of the HDL Input/HDL Output blocks after you
simulate the subsystem. If the signal types do not display, check that Port Data
Types is turned on in the Simulink Format menu.
If the signal types do not match, there may be error messages of the form:
In the example, DSP Builder issues an error because the signal type for the HDL
OutputQ block is incorrect. Change it from Signed Fractional [2].[26] to Signed
Fractional [2].[15].
After this change, the signal type shows as SBF_2_15 (representing a signed binary
fractional number with 2 integer bits and 15 fractional bits) in the standard blockset
part of the design (before the HDL Input block). The same signal shows as
sfix17_En15 (representing a Simulink fixed-point type with word length 17 and 15
fractional bits) in the advanced blockset design (after the HDL Input block).
For more information about Simulink fixed-point types, refer to the MATLAB help.
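The relationship between the two names in this example is that the Simulink word length equals the integer bits plus the fractional bits (2 + 15 = 17). The Python sketch below shows that bookkeeping for signed types only; the helper names are hypothetical and the mapping is inferred from the single example above, so treat it as an illustration rather than a full description of either notation.

```python
def sbf_name(int_bits: int, frac_bits: int) -> str:
    """Standard blockset display name for a signed binary fractional type."""
    return f"SBF_{int_bits}_{frac_bits}"


def sfix_name(int_bits: int, frac_bits: int) -> str:
    """Simulink fixed-point name: word length = integer bits + fractional bits."""
    word_length = int_bits + frac_bits
    return f"sfix{word_length}_En{frac_bits}"


# The example from the text: Signed Fractional [2].[15]
print(sbf_name(2, 15))   # SBF_2_15
print(sfix_name(2, 15))  # sfix17_En15
```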
Related Information
Fixed-Point Notation in Volume 2: DSP Builder Standard Blockset of the DSP Builder
Handbook.
3. Add three Input blocks from the DSP Builder IO & Bus library immediately before
the new Subsystem block: Input (with Signed Fractional [1][11] type),
Input1 (with Single Bit type), and Input2 (with Unsigned Integer 8 type).
4. Add an Output block immediately after the Subsystem block with Signed
Fractional [1][15] type.
Note: These steps specify the boundaries between Simulink blocks and DSP
Builder blocks.
5. Open the Subsystem block and select the AD9856, OvScope, and ChanView
blocks inside the subsystem. Click Create Subsystem on the popup menu to
push these blocks down another level. Rename this subsystem (for example, to
DSPBA).
6. Add three HDL Input blocks from the DSP Builder AltLab library between the
Simulink input ports and the DSPBA subsystem.
a. These blocks should have the same types as in Step 2: HDL Input (Signed
Fractional [1][11]), HDL Input1 (Single Bit), and HDL Input2
(Unsigned Integer 8).
b. On the signed fractional HDL Input, set the External Type parameter to
Simulink Fixed-point Type.
7. Add an HDL Output block between the subsystem and the subsystem output port
with the same type as in Step 3 on page 198 (Signed Fractional [1][15]).
Note: Steps 5 on page 199 and 6 on page 199 specify the boundaries between
blocks from the standard and advanced blocksets. The HDL Input and HDL
Output blocks must be in a lower level subsystem than the Input and
Output blocks. If they are at the same level, a NullPointerException
error occurs when you run Signal Compiler.
Figure 81. HDL Input, HDL Output and DSPBA Blocks in Subsystem
9. Move the Device block from the AD9856 subsystem up a level into the DSPBA
subsystem you create by making a copy of the existing Device block and then
deleting the old one.
Note: The Device block detects the presence of a DSP Builder advanced
subsystem and should be in the highest level of the advanced blockset
design that does not contain any blocks from the standard blockset.
10. Open the Control block in the top level of the design and change the Hardware
Destination Directory to an absolute path. For example: C:/rtl
Note: You must use forward slashes (/) in this path.
11. Add Signal Compiler, TestBench and Clock blocks from the DSP Builder AltLab
library to the top-level model.
12. In the Signal Compiler block, set the Family to Stratix II to match the family
specified in the Device block.
13. In the Clock block, set the Real-World Clock Period to 50 ns so that it matches
the Clock Frequency specified in the Signals block. Set the Reset Name to aclr
Active Low.
14. Remove the Run ModelSim and Run Quartus Prime blocks that the combined
blockset design no longer requires.
15. Simulate the design to generate HDL for the advanced subsystem.
16. Compile the system using Signal Compiler. It should compile successfully with
no errors.
You can also use the TestBench block to compare the Simulink simulation with
ModelSim. However, several cycles of delay in the ModelSim output are not
present in Simulink because the advanced blockset simulation is not cycle-
accurate.
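Step 13 sets the Real-World Clock Period to 50 ns so that it matches the clock frequency in the Signals block. The conversion is plain arithmetic; the Python sketch below uses illustrative helper names (not DSP Builder functions) and assumes the frequency is expressed in MHz.

```python
def period_ns_from_mhz(frequency_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    if frequency_mhz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / frequency_mhz


def frequency_mhz_from_ns(period_ns: float) -> float:
    """Clock frequency in MHz for a period given in nanoseconds."""
    if period_ns <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / period_ns


print(frequency_mhz_from_ns(50))  # 20.0
print(period_ns_from_mhz(20))     # 50.0
```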
DSP Builder treats the subsystem containing the HDL Input, HDL Output blocks and
the advanced blockset subsystem as a black box. You can only add additional blocks
from the advanced blockset to the subsystem inside this black-box design. However,
you can add blocks from the standard blockset in the top-level design or in additional
subsystems outside this black-box design.
Note: Ensure that the Hardware Destination Directory specified in the Control
block is specified as an absolute path using forward slash (/) separator
characters.
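If you build the destination path in a script before entering it in the Control block, you can check it first. The following Python sketch is one possible check, assuming Windows-style paths such as the C:/rtl example; the function name is hypothetical and nothing here is part of the DSP Builder API.

```python
from pathlib import PureWindowsPath


def hardware_destination(path: str) -> str:
    """Return an absolute path written with forward slashes, or raise."""
    p = PureWindowsPath(path)
    if not p.is_absolute():
        raise ValueError(f"not an absolute path: {path!r}")
    return p.as_posix()  # converts backslashes to forward slashes


print(hardware_destination(r"C:\rtl"))         # C:/rtl
print(hardware_destination("C:/work/design"))  # C:/work/design
```

PureWindowsPath only manipulates the string, so the check runs on any operating system and does not touch the file system.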
Figure 85. Hardware in the Loop Parameter Settings Page 1 for the AD9856 Example
7. Close the Hardware in the loop dialog box and connect the dataIn, validIn,
channelIn, ChannelOut0, and ChannelOut1 ports to your design example.
Note: HIL simulation does not use the bus interface. Connect the bus_areset
signal to a GND block and the bus_clk port to a VCC block from the
standard blockset IO & Bus library. Connect the bus_clk_out and
bus_clk_reset_out signals to Simulink Terminator blocks.
8. Clean up the HIL design example by removing the blocks that are not related to
the HIL.
13. Click Compile with Quartus Prime to compile the HIL model.
14. Click Scan JTAG to find all the hardware connected to your computer and select
the required JTAG Cable and Device in chain.
15. Click Configure FPGA to download the compiled programming file (.sof) to the
DSP development board.
16. Close the Hardware in the loop dialog box and save the model.
17. Simulate the HIL model. Compare the displays of the OutputSpectrum and
OutScope blocks to the waveforms in the original model, which should be
identical.
18. You can speed up the HIL simulation using burst mode. To use burst mode, open
the Hardware in the loop dialog box and turn on Burst Mode.
19. Then repeat the step to download an updated programming file into the device on
the DSP development board. This action resets the memory and registers to their
starting value. When you simulate the HIL design example again the simulation is
much faster.
The OutputSpectrum display should be identical, but you can observe extra
delays (equal to the burst length) on signals in the OutScope display.
1. Set the following variable in MATLAB to prevent DSP Builder from using virtual pin
assignments (virtual pins don't appear on the HIL block).
DSPBA_Features.VirtualPins = false
Alternatively, add this variable to the setup_<modelname>.m file before loading
the model.
2. Open the demo_scale.mdl design example from the Examples\Baseblocks
directory in the installed design examples for the advanced blockset.
4. Open the Device block inside the ScaleSystem subsystem to verify that the
target device family is compatible with the target DSP development board for the
HIL simulation.
5. Double-click on the Run Quartus Prime block to start the Quartus Prime
software and create a Quartus Prime project with the HDL generated in step .
Note: The Quartus Prime project obtains its name from the subsystem that
contains the Device block, ScaleSystem.qpf.
6. In the Quartus Prime Tools menu, click Start Compilation and verify that the
project compiles successfully.
7. Save a copy of the design example as demo_scale_HIL.mdl and delete the
ScaleSystem subsystem in this model.
8. Replace the ScaleSystem subsystem by a HIL block from the AltLab library in
the DSP Builder standard blockset.
9. Double-click on the HIL block to open the Hardware in the loop dialog box. In
the first page of the dialog box, select the Quartus Prime project
(ScaleSystem.qpf) that you created. Select the clock pin (clk) and reset pin
(areset_n). Set the a0, a1, a2, and a3 input port types as signed [2].[30],
the q0, q1, q2, and q3 output port types as signed [1].[15].
Note: Leave the ac and shift input ports and the qc output port as unsigned.
The reset level must match the level specified in the Signals block for the
original model.
Figure 93. Hardware in the Loop Parameter Settings Page 1 for the Scale Block Example
12. Connect all the inputs and outputs to the HIL block using Input and Output
blocks from the standard blockset. Verify that the data types for all the inputs and
outputs are set correctly, matching those set in the Hardware in the loop dialog
box.
Note: The HIL block expands the channel data input bus into four individual
inputs (a0, a1, a2, and a3) and the channel data output bus into four
separate outputs (q0, q1, q2, and q3).
13. Use Simulink Demux and Mux blocks to separate the inputs into individual inputs
and multiplex the four outputs together into one data bus.
14. Connect a VCC block from the standard blockset IO & Bus library to the signal
input bus_areset_n. Terminate the output signals qe0, qe1, qe2, and qe3
using Simulink Terminator blocks.
15. Connect the DSP development board and ensure that you switch it on.
16. Re-open the Hardware in the loop dialog box and set the reset level as
Active_High.
17. Click on Next to display the second page of the Hardware in the loop dialog
box. Enter a full device name into the FPGA device field. Verify that the device
name matches the device on the DSP development board and is compatible with
the device family set in the original model.
18. Click Compile with Quartus Prime to compile the HIL model.
19. Click Scan JTAG to find all the hardware connected to your computer and select
the required JTAG Cable and Device in chain.
20. Click Configure FPGA to download the compiled programming file (.sof) to the
DSP development board.
21. Close the Hardware in the loop dialog box and save the model.
22. Simulate the HIL model. Compare the waveform of the DataOutScope block to
the results for the original model, which should be identical.
DSP Builder distinguishes control flow from data flow: control flow is the logic you
connect to the ChannelIn and ChannelOut valid signal path. DSP Builder applies
little or no reset minimization to control logic and aggressive minimization to data flow.
By default, DSP Builder chooses reset minimization options for you automatically. It
automatically applies reset minimization if your target device includes the HyperFlex
architecture.
You may override the default automatic reset minimization options, for example as
part of design space optimization.
When you globally apply reset minimization, DSP Builder determines a local reset
minimization setting for each of your synthesizable subsystems. DSP Builder applies
this local reset minimization conditionally, if your subsystem contains ChannelIn or
ChannelOut blocks.
DSP Builder does not apply reset minimization to blocks with innate state, user-
constructed cycles, and enable logic in your design, as that can give undefined initial
values.
Reset minimization only detects local cycles within a subsystem. You should avoid
broader feedback cycles.
Reset minimization may affect the behavior of your design during Simulink simulation
and on hardware.
Simulink Simulation
The DSP Builder simulation engine within Simulink is unaware of the reset
minimization optimization and therefore always simulates your design behavior with
reset present.
In general there is no difference in behavior. Testbench inputs typically default to
zero, and a longer minimum reset pulse width allows these defaults to propagate
through the datapath register stages.
However in some cases mismatches may occur, because data entering a Sample
Delay in your design during reset is non-zero.
If an input does not default to zero or the internal behavior is incompatible with
Sample Delay blocks resetting to zeros (or the minimum reset-pulse width is less
than the design latency), the Simulink simulation might be different than the HDL
simulation.
Implementation on Hardware
Removing a reset on the datapath means that when DSP Builder releases a reset, your
data flow logic may contain values clocked in during reset, which might affect the
initial post-reset behavior of your system.
Related Information
• Control on page 236
• Synthesis Information (SynthesisInfo) on page 362
Additionally, your HDL must conform to DSP Builder design rules and must:
• Have only one clock domain
• Match reset level with DSP Builder
• Use the std_logic data type for clock and reset ports
• Use std_logic_vector for all other ports
• Have no top-level generics
• Contain no bus components
You may need to write a wrapper HDL file that instantiates your HDL, which might
configure generics, convert from other data types to std_logic_vector, or invert the
reset signal.
DSP Builder can import any number of instantiated entities. To import multiple
copies of an entity or multiple distinct entities, instantiate the entities in a top-
level wrapper file.
Simulink does not model all the signal states that ModelSim uses (e.g. ‘U’).
Simulink interprets all non-‘1’ states as a ‘0’.
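The mapping described above can be written out explicitly. This Python sketch simply applies the stated rule to the nine VHDL std_logic values; the function name is hypothetical and the code is an illustration, not part of the cosimulation interface.

```python
STD_LOGIC_STATES = ["U", "X", "0", "1", "Z", "W", "L", "H", "-"]


def to_simulink_bit(state: str) -> int:
    """Apply the rule from the text: only '1' maps to 1, everything else to 0."""
    return 1 if state == "1" else 0


for s in STD_LOGIC_STATES:
    print(s, "->", to_simulink_bit(s))
```

Under this rule, uninitialized ('U') or unknown ('X') values in ModelSim appear as ordinary zeros in Simulink, so a zero on an imported output does not prove that the HDL is driving a defined value.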
Importing HDL uses the HDL Verifier toolbox to communicate with an HDL simulation
running in ModelSim. You can have as many components in your ModelSim simulation
as you like; each component communicates with a separate DSP Builder HDL Import
block. Your top-level design must include an HDL Import Config block.
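Figure: HDL import cosimulation. In Simulink, the Source, Control, DSP Builder
Advanced, and Sink blocks connect to subsystems that contain HDL Import blocks.
Each HDL Import block communicates with a component (Component 0, Component 1)
in ModelSim.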
You cannot place HDL Import blocks inside a primitive scheduled subsystem.
DSP Builder creates the appropriate instantiation of the component represented by the
HDL Import block.
DSP Builder sees imported HDL as a scheduled system. DSP Builder does not try to
schedule your imported HDL. You cannot import HDL into a scheduled subsystem.
Imported HDL acts like other DSP Builder IP blocks (e.g. NCO, FFT). You must
manually delay-balance any parallel datapaths and turn on Generate Hardware in
the Control block.
9. About Folding
Folding optimizes hardware usage for low throughput systems, which have many clock
cycles between data samples. Low throughput systems often inefficiently use
hardware resources. When you map designs that process data as it arrives every clock
cycle to hardware, many hardware resources may be idle for the clock cycles between
data.
Folding allows you to create your design and generate hardware that reuses resources
to create an efficient implementation.
The folding factor is the number of times you reuse a single hardware resource, such
as a multiplier, and it depends on the ratio of the data and clock rates:
Folding factor = clock rate / data rate
DSP Builder offers ALU folding for folding factors greater than 500. With ALU folding,
DSP Builder arranges one of each resource in a central arithmetic logic unit (ALU) with
a program to schedule the data through the shared operation.
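As a rough illustration, assuming the folding factor is the clock rate divided by the data rate, the calculation looks like this in Python. The function names and the example rates are hypothetical; only the threshold of 500 comes from the text.

```python
ALU_FOLDING_MIN_FACTOR = 500  # from the text: offered for factors greater than 500


def folding_factor(clock_rate_hz: float, data_rate_hz: float) -> float:
    """Clock cycles available per data sample (assumed: clock rate / data rate)."""
    if clock_rate_hz <= 0 or data_rate_hz <= 0:
        raise ValueError("rates must be positive")
    return clock_rate_hz / data_rate_hz


def alu_folding_offered(factor: float) -> bool:
    """True if the folding factor exceeds the threshold stated in the text."""
    return factor > ALU_FOLDING_MIN_FACTOR


# Hypothetical system: 250 MHz clock, 100 kHz sample rate.
factor = folding_factor(250_000_000, 100_000)
print(factor)                       # 2500.0
print(alu_folding_offered(factor))  # True
```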
ALU folding reduces the resource consumption of a design by as much as it can while
still meeting the latency constraint. The constraint specifies the maximum number of
clock cycles a system with folding takes to process a packet. If ALU folding cannot
meet this latency constraint, or if ALU folding cannot meet a latency constraint
internal to the DSP Builder system due to a feedback loop, you see an error message
stating it is not possible to schedule the design.
Intel Corporation. All rights reserved. Intel, the Intel logo, Altera, Arria, Cyclone, Enpirion, MAX, Nios, Quartus
and Stratix words and logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other
countries. Intel warrants performance of its FPGA and semiconductor products to current specifications in
accordance with Intel's standard warranty, but reserves the right to make changes to any products and services
at any time without notice. Intel assumes no responsibility or liability arising out of the application or use of any
information, product, or service described herein except as expressly agreed to in writing by Intel. Intel
customers are advised to obtain the latest version of device specifications before relying on any published
information and before placing orders for products or services.
*Other names and brands may be claimed as the property of others.
ISO 9001:2015 Registered
For designs that use more than one data type, a Convert block between two data
types uses more resources if the design requires saturation and rounding. An unbiased
rounding operation uses more resources than a biased rounding mode.
With ALU folding, any blocks that store state have a separate state for each channel.
DSP Builder only updates the state for a channel when the system processes the
channel. Thus, a sample delay delays a signal until the system processes the next data
sample. For example, with 200 clock cycles per data period, DSP Builder delays the signal
for 200 clock cycles. Also, data associated with one channel cannot affect the state associated with
any other channel. Changing the number of channels does not affect the behavior of
the design.
Note: For designs without ALU folding, state is associated with a block, which you can
update in any clock cycle. Data input with channel 0 can affect state that then affects
a computation with data input with channel 1.
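The difference between per-channel state and block-level state can be sketched as follows. This is a behavioral illustration only; the class names are hypothetical and do not correspond to DSP Builder blocks.

```python
class FoldedSampleDelay:
    """One-sample delay that keeps a separate state for each channel."""

    def __init__(self, num_channels, initial=0):
        self.state = [initial] * num_channels

    def step(self, channel, value):
        # Only the state of the channel being processed is read and updated.
        out = self.state[channel]
        self.state[channel] = value
        return out


class SharedSampleDelay:
    """One-sample delay with a single state shared by every channel."""

    def __init__(self, initial=0):
        self.state = initial

    def step(self, channel, value):
        # The channel index is ignored, so channels can affect each other.
        out, self.state = self.state, value
        return out


samples = [(0, "a1"), (1, "b1"), (0, "a2"), (1, "b2")]
folded = FoldedSampleDelay(num_channels=2)
shared = SharedSampleDelay()
folded_out = [folded.step(c, v) for c, v in samples]
shared_out = [shared.step(c, v) for c, v in samples]
print(folded_out)  # each channel sees only its own previous sample
print(shared_out)  # channel 1 sees data that arrived on channel 0
```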
Simulation rate Specify clock rate or data rate to control how Simulink models the system
Data Rate
Figure 97. Single Channel Data Rate Simulation with no Register Outputs
[Timing diagram: Simulink inputs and outputs at the Simulink sample time alongside the corresponding hardware inputs and outputs, showing the valid (v) signal and the data values a1 to a3, b1 to b3, qa1 to qa3, and qb1 to qb3.]
[Timing diagram: sop, v, c, d0, and d1 signals for the inputs and outputs.]
Clock Rate
Figure 99. Single Channel Clock Rate Simulation with no Register Outputs
[Timing diagram: valid (v) and data signals for the inputs and outputs relative to the Simulink sample time.]
Figure 100. Single Channel Clock Rate Simulation with Register Outputs
[Timing diagram: valid (v) and data signals for the inputs and the registered outputs relative to the Simulink sample time.]
[Timing diagram: sop, v, c, d0, and d1 signals for the inputs and outputs.]
[Timing diagram: sop, v, c, d0, and d1 signals for the inputs and outputs.]
Note: In the ChannelIn and ChannelOut blocks, before you use ALU folding, ensure you
turn off Folding enabled.
1. Open the top-level design that contains the primitive subsystem you want to add
ALU folding to.
2. Save a backup of your original design.
3. Replace:
• Constant multiplier blocks with multiplier blocks.
• Reciprocal blocks with Divide blocks.
• Sin(πx) blocks with sin(x) blocks.
4. Avoid low-level bit manipulation.
5. Open the primitive subsystem (which contains the ChannelIn and ChannelOut
blocks) and add an ALU Folding block from the DSP Builder Utilities library.
6. Double click the ALU Folding block to open the Block Parameters window.
7. Enter a value for Sample rate (MHz).
8. Enter a value for Maximum latency (cycles).
9. Turn off Register Outputs to make the output format the same as the input
format. Turn on Register Outputs so that the outputs hold their values until the
next data sample output occurs.
10. Select the Simulation rate.
11. Simulate your design.
DSP Builder generates HDL for the folded implementation of the subsystem and a
testbench. The testbench verifies the sample rate Simulink simulation against a
clock rate ModelSim simulation of the generated HDL.
The testbench uses captured test vectors from the Simulink simulation and plays them
through the clock rate simulation of the generated hardware at the data rate. DSP
Builder checks the order and bit-accuracy of the hardware simulation outputs against
the Simulink simulation.
[Timing diagram: clk, VALID_IN, DATA_IN (D1, D2), VALID_OUT, DATA_OUT (Q1), and READY signals.]
You may use the valid signal instead of the start of packet signal, which does not allow
the folded system to process a non-valid data sample.
Fixed-point designs often cannot support data with a high dynamic range unless the
design explicitly uses a high precision type. Floating-point designs can represent data
over a high dynamic range with limited precision. A compact representation makes
efficient use of memory and minimizes data widths. The lowest precision type that
DSP Builder supports is float16_m10, otherwise known as half-precision float, which
occupies 16 bits of storage. It can represent a range between -2^16 and +2^16 (exclusive)
and non-zero magnitudes as small as 2^(-14).
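The half-precision limits quoted above can be checked with a short sketch. It assumes that float16_m10 uses the same bit layout as IEEE 754 binary16, which Python's struct module exposes as the "e" format; the helper name is hypothetical.

```python
import struct


def from_half_bits(bits: int) -> float:
    """Interpret a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


# Largest finite value: exponent field 11110b, mantissa all ones.
largest_finite = from_half_bits(0x7BFF)
# Smallest normal value: exponent field 00001b, mantissa zero.
smallest_normal = from_half_bits(0x0400)

print(largest_finite, largest_finite < 2 ** 16)
print(smallest_normal, smallest_normal == 2.0 ** -14)
```

The largest finite value stays strictly below 2^16, which is why the range is stated as exclusive.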
Typically, fixed-point designs may include fixed-point types of various bit widths and
precisions. When you create fixed-point designs, keep variations in word growth and
word precision within acceptable limits. When you create floating-point designs, you
must limit rounding error to ensure an accurate result. A floating-point design typically
has only one or two floating-point data types.
10. Floating-Point Data Types
Type Name     Sign Width s   Exponent Width e   Exponent Bias b   Mantissa Width m
float19_m10   1              8                  127               10
float26_m17   1              8                  127               17
float35_m26   1              8                  127               26
float46_m35   1              10                 511               35
float55_m44   1              10                 511               44
DSP Builder represents the special values positive zero, negative zero, subnormals,
and non-numbers in the standard IEEE 754 manner, namely:
Except for the preceding special values, the numerical value of a float type is given in
terms of its bit-wise representation by:
where:
For example, for a 32-bit single precision floating point number with a bit-wise
representation of 0x40300000:
s = 0b = 0
e = 10000000b = 128
m = 01100000000000000000000b = 3145728
then:
= 1 × 2 × (1+0.375)
= 2.75
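The worked example above can be reproduced with a short sketch. The function name and argument names are hypothetical; the decoding handles normal numbers only and ignores the special values (zeros, subnormals, and non-numbers).

```python
import struct


def decode_float(bits, exp_width, mant_width, bias):
    """Split a bit pattern into sign, exponent, and mantissa fields and
    return them with the numerical value (normal numbers only)."""
    sign = bits >> (exp_width + mant_width)
    exponent = (bits >> mant_width) & ((1 << exp_width) - 1)
    mantissa = bits & ((1 << mant_width) - 1)
    fraction = mantissa / (1 << mant_width)  # 3145728 / 2^23 = 0.375
    value = (-1) ** sign * 2.0 ** (exponent - bias) * (1 + fraction)
    return sign, exponent, mantissa, value


s, e, m, value = decode_float(0x40300000, exp_width=8, mant_width=23, bias=127)
print(s, e, m, value)

# Cross-check against the platform's own single-precision decoding.
reference = struct.unpack(">f", bytes.fromhex("40300000"))[0]
print(reference)
```

The same function decodes the other formats by changing the field widths and the bias, for example exp_width=8, mant_width=10, bias=127 for float19_m10.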
For the fundamental operations (add, subtract, multiply, divide) this error is
determined by the rounding mode:
• Correct. A typical relative error is half the magnitude of the LSB in the mantissa.
• Faithful. A typical relative error is equal to the magnitude of the LSB in the
mantissa.
The relative error for float16_m10 is approximately 0.1% for faithful rounding, and
0.05% for correct rounding. The rounding mode is a configurable mask parameter.
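The percentages above follow from the mantissa width. The sketch below takes the magnitude of the mantissa LSB to be 2^(-m) for a mantissa of m bits; the function name is hypothetical.

```python
def typical_relative_error(mantissa_width: int, rounding: str = "faithful") -> float:
    """Typical relative error of one fundamental operation."""
    lsb = 2.0 ** -mantissa_width
    if rounding == "faithful":
        return lsb          # one LSB of the mantissa
    if rounding == "correct":
        return lsb / 2      # half an LSB of the mantissa
    raise ValueError("rounding must be 'faithful' or 'correct'")


for mode in ("faithful", "correct"):
    print(f"float16_m10, {mode}: {typical_relative_error(10, mode):.3%}")
```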
Bit cancellations can occur when subtracting two floating-point numbers that are very
close in value, which can introduce very large relative errors. You need to take the
same precautions with floating-point designs as with numerical software to prevent bit
cancellations.
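A small numerical sketch of bit cancellation follows. It rounds values to half precision with Python's struct module (assuming float16_m10 behaves like IEEE 754 binary16 with round-to-nearest); the operand values are chosen only for illustration.

```python
import struct


def to_half(x: float) -> float:
    """Round a value to the nearest half-precision number."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


a, b = 1.0006, 1.0
exact = a - b                      # about 0.0006

a16, b16 = to_half(a), to_half(b)  # operands as stored in half precision
computed = to_half(a16 - b16)      # subtraction carried out on rounded operands

rel_input_error = abs(a16 - a) / a
rel_result_error = abs(computed - exact) / exact

print(f"input relative error:  {rel_input_error:.4%}")
print(f"result relative error: {rel_result_error:.1%}")
```

The operands are each accurate to well under 0.1%, but the leading bits cancel in the subtraction, so the rounding error of the inputs dominates the small result.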
Each of these changes reduces logic utilization at the expense of accuracy. A design
may use more than one floating-point precision for different sections of the circuit.
However, each additional precision requires more type conversion blocks, and each
Convert block increases logic utilization.
Related Information
DSP Builder Floating Point Design Examples on page 113
This MATLAB structure specifies the floating-point precision similar to how fixdt()
specifies fixed-point precisions.
2. For the top-level Convert block on the input wire, open the parameter dialog box:
a. Set the Output data type mode to Specify via dialog.
b. Delete the Output data type field and type the variable name inputType.
3. Repeat for the Convert block in the primitive subsystem, so that the same data
type propagates to both inputs of the Mult blocks.
4. Change the floating-point precision of the design, by assigning a different type to
the variable inputType. You can also initialize the type variable using the data type
name:
1. Add the DSP Builder custom SameDT block to your design. Do not use the built-in
Simulink same-DT block, which does not propagate data types.
2. Use any of the following blocks to allow you to back propagate DSP Builder's
floating-point data types via their output ports:
• Const
• Lut
• Convert
• ReinterpretCast
3. Set the Output data type mode parameter for these blocks to Inherit via back
propagation. Using this option and the custom SameDT block minimizes
scripting for setting up data types in your design.
The data type propagates via the built-in multiplex to three different wires, and
then back propagates via the respective output ports of the Convert block,
(coefficients) LUT block, and Const block.
The fields ending in _stm are from the stimulus files that Simulink normally writes
out during simulation. You can use these as the golden standard against which to
compare the simulated hardware output. The verification function you specified is
called, with this struct passed as the first parameter.
When you turn on Fused datapath, you can select only the rounding modes Nearest
and Towards zero. Logic utilization is highest when your design uses rounding mode
Nearest.
b. Use the custom SameDT block to avoid any data type propagation problems
in feedback cycles.
y = e^x
y = mx + c
Note: The following design examples show the various stages of the Newton-Raphson root
finding tutorial:
• demo_newton_iteration.mdl
• demo_newton_convergence.mdl
• demo_newton_valid.mdl
• demo_newton_control.mdl
• demo_newton_final.mdl
To force soft floating-point data type, you must perform this task on all applicable
blocks in your design.
1. Type struct('forceSoftFP', 1) in the Advanced Options dialog box.
Related Information
• Add on page 313
• Multiply (Mult) on page 343
• Subtract (Sub) on page 354
• Scalar Product
• Sum of Elements (SumOfElements) on page 355
Note: You can either use this block in your design or view the Avalon-MM slave interface
settings on the DSP Builder ➤ Avalon Interface menu.
Bus interface name
    Specifies the prefix for the address, data, and control signals in the generated control bus.
Address width
    Specifies the width in bits of the memory-mapped address bus (1–32, default=10).
Data width
    Specifies the width in bits of the memory-mapped data bus (16, 32, or 64, default=16). DSP Builder does not support byte enables for the Avalon-MM slave interface. Only connect masters to this interface that have the same or a smaller data width. For example, to attach a JTAG master, set the data width to 32 bits or less.
    When using the SharedMem block, ensure the output data width matches the AvalonMMSlaveSettings bus data width or is exactly twice the bus data width.
Bus is:
    Specifies whether the memory-mapped address bus is Big Endian or Little Endian.
Separate bus clock
    Turn on so any processor-visible control registers are clocked by a separate control bus clock to ease timing closure.
Bus clock frequency (MHz)
    Specifies the frequency of the separate processor interface bus clock (when enabled).
Bus clock synchronous with system clock
    Turn on so the bus clock is synchronous with the system clock.
11. Design Configuration Library
Use word addressing when accessing memory-mapped blocks in DSP Builder from the
design logic or through the DSP Builder processor interface (using the BusStimuli block).
Use byte addressing when you access the same locations through the DSP Builder MATLAB
API. To change the word address to a byte address, multiply it by the number of bytes
in the AvalonMMSlaveSettings block data width. If you use the BusStimuliFileReader
block to drive the BusStimuli block, ensure the values for the Data Width and Address
Width parameters exactly match the address and data width you set in Avalon
Interfaces ➤ Avalon-MM Slave Settings.
Note: Ensure your access permissions are correct when using the RegBit, RegField, and
SharedMem blocks from the Interface library.
Note: If you read from a nonreadable address, the output data is not valid.
When you use the SharedMem block with an output data width that is twice the bus data
width, the block appears in the DSP Builder processor interface to have twice the number
of entries compared with the design view. Also, DSP Builder interprets each element in an
initialization array to be of output data width.
Use the System Console MATLAB API in DSP Builder to access the memory-mapped locations
in DSP Builder designs on the FPGA. Use byte addressing when using this interface:
dspba_design_base_address_in_qsys +
(block_address_in_dspba_design* dspba_bus_data_width_bytes)
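The address calculation above can be sketched as a small helper. The function and parameter names are hypothetical; only the arithmetic comes from the expression above.

```python
def byte_address(base_address: int, word_address: int, data_width_bits: int) -> int:
    """Convert a word address in the DSP Builder design to the byte address
    seen through the system interconnect."""
    if data_width_bits % 8 != 0:
        raise ValueError("data width must be a whole number of bytes")
    bytes_per_word = data_width_bits // 8
    return base_address + word_address * bytes_per_word


# Word address 3 in a design with a 32-bit bus, mapped at base 0x1000.
print(hex(byte_address(0x1000, 3, 32)))
# Word address 5 in a design with a 16-bit bus, mapped at base 0.
print(byte_address(0, 5, 16))
```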
Read and write requests time out in 1 minute if the device shows no response for the
initiated request. For example:
• Read or write requests to an address that is not assigned to any slave in the top-
level system.
• Read requests to a memory-mapped location that does not have read access (i.e.
write only).
If subsequent requests to valid addresses and locations also time out, the
initial request has disabled the bus interconnect. You must then reset the system or
reprogram the board.
Additionally, close all your master connections in MATLAB before switching off or
reprogramming the board; otherwise, the existing connection becomes corrupted. If you
cannot start a new debugging session, restart MATLAB.
Clock Crossing
DSP Builder designs use a separate clock for all processor visible control registers if
you select Separate bus clock in Avalon Interfaces ➤ Avalon-MM Slave
Settings. This clock is asynchronous to a main system clock if you turn off Bus clock
synchronous with system clock.
DSP Builder inserts simple two-stage synchronizers between the bus and system clock
domains. DSP Builder adds the synchronization to the autogenerated bus slave logic if
you use any of the Interface blocks (e.g. RegField) and to NCO IP if you enable
writable access to configuration registers inside the NCO.
The DSP Builder-generated timing constraints set maximum and minimum delays for
paths between two different clocks to a range wide enough that the timing analyzer
does not report an error. This method allows you to override constraints for specific
paths if required. However, specifying a false path constraint takes precedence over
other constraints.
You can use similar constraints for all such paths in DSP Builder blocks in
higher-level projects.
When you add synchronizers to DSP Builder designs, the Quartus Prime timing
analyzer also provides a metastability report.
Related Information
• Register Bit (RegBit) on page 283
• Register Field (RegField) on page 283
• Shared Memory (SharedMem) on page 285
11.2. Control
The Control block specifies information about the hardware generation environment
and the top-level memory-mapped bus interface widths.
Note: DSP Builder applies the options in the Control block globally to your design.
Hardware destination directory
    Specify the root directory in which to write the output files. This location can be an absolute path or a relative path (for example, ../rtl). A directory tree is created under this root directory that reflects the names of your model hierarchy.
Generate a single Avalon Conduit interface for the Platform Designer
    In v18.1 and earlier, DSP Builder designs that you import and generate in Platform Designer have a single Avalon interface for data, valid, and channel signals. In v19.1 or later, if you regenerate an existing design, turn on this option to preserve the single Avalon interface.
Small memory minimum fill
    This threshold controls whether the design uses registers or small memories (MLABs) to implement delay lines. DSP Builder uses a small memory only if it fills it with at least the threshold number of bits. On device families that don't support small memories, DSP Builder ignores this threshold.
Medium memory minimum fill
    This threshold controls when the design uses a medium memory (M9K, M10K or M20K) instead of a small memory or registers. DSP Builder uses the medium memory only if it fills it with at least the threshold number of bits.
Large memory minimum fill
    This threshold controls whether the design uses a large memory (M144K) instead of multiple medium memories. DSP Builder uses the large memory only when it can fill it with at least the threshold number of bits. Default prevents the design using any M144Ks. On device families that don't support large memories, DSP Builder ignores this threshold.
Multiplier: logic and DSP threshold
    Specifies the number of logic elements you want to use to save a multiplier. If the estimated cost of implementing a multiplier in logic is no more than this threshold, DSP Builder implements that multiplier in logic. Otherwise DSP Builder uses a hard multiplier. Default means the design always uses hard multipliers.
Clock signal name
    Specifies the name of the system clock signal that DSP Builder uses in the RTL generation, in the _hw.tcl file, and that you see in Platform Designer.
Clock frequency (MHz)
    Specifies the system clock rate for the system.
Clock margin (MHz)
    Specifies the margin requested to achieve a high system frequency in the fitter. The specified margin does not affect the folding options because the system runs at the rate specified by the Clock frequency parameter setting. Specify a positive clock margin if you need to pipeline your design more aggressively (or specify a negative clock margin to save resources) when you do not want to change the ratio between the clock speed and the bus speed.
Reset signal name
    Specifies the name of the reset signal that DSP Builder uses in the RTL generation, the _hw.tcl file, and that you see in Platform Designer.
Reset active
    Specifies whether the logic generated is reset with an active high or active low reset signal.
Use default minimum reset pulse width
    Turn on to enter a minimum reset value pulse width.
Minimum reset pulse width
    Enter a value for the minimum number of system clock cycles for which you assert the reset signal in your target hardware.
    This setting does not enforce that your design correctly resets in the number of cycles you specify, in particular when you apply reset minimization. You should simulate your design with this value (which DSP Builder applies in the simulation testbench) to confirm that your design works.
    DSP Builder reset minimization uses a longer minimum reset pulse width to remove resets on the control path. Applying a reset value at an earlier register propagates to later registers during the reset period, without them needing an explicit reset.
    When you turn Global enable On, DSP Builder enters a large minimum reset pulse width according to the reset minimization. When you turn Global enable Off, it selects a small minimum reset pulse width as in previous versions of DSP Builder.
    DSP Builder reports the actual minimum reset pulse width value when it generates your design.
Create automatic testbenches
    Turn on to generate additional automatic testbench files. These files capture the input and output of each block in a .stm file. DSP Builder creates a test harness (_atb.vhd) that simulates the generated RTL alongside the captured data. DSP Builder generates a script (<model>_atb.do) that you can use to simulate the design in ModelSim and ensure bit and cycle accuracy between the Simulink model and the generated RTL.
From v14.1, the following parameters are in DSP Builder ➤ Avalon Interfaces ➤
Avalon-MM slave or in the optional AvalonMMSlaveSettings block:
• System address width
• System data width
• System bus is:
Options in the Control block specify whether hardware generates for your design
example and the location of the generated RTL. You can also create automatic RTL
testbenches for each subsystem in your design example and specify the depth of
signals that DSP Builder includes when your design example simulates in the
ModelSim simulator.
You can specify the address and data bus widths that the memory-mapped bus
interface uses and specify whether DSP Builder stores the high-order byte of the
address in memory at the lowest address and the low-order byte at the highest
address (big endian), or the high-order byte at the highest address and the low-order
byte at the lowest address (little endian).
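The two byte orderings can be illustrated with a generic sketch that is independent of DSP Builder; the 32-bit value is arbitrary.

```python
value = 0x12345678

# Big endian: the high-order byte (0x12) is stored at the lowest address.
big = value.to_bytes(4, "big")
# Little endian: the low-order byte (0x78) is stored at the lowest address.
little = value.to_bytes(4, "little")

for address in range(4):
    print(f"address {address}: big=0x{big[address]:02x} little=0x{little[address]:02x}")
```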
Related Information
Reset Minimization on page 211
DSP Builder converts multipliers with a single constant input into balanced adder
trees, which occurs automatically where the depth of the tree is not greater than 2. If
the depth is greater than 2, DSP Builder compares the hard multiplier threshold with
the estimated size of the adder tree, which is generally much lower than the size of a
full soft multiplier. If DSP Builder combines two non-constant multipliers followed by
an adder into a single DSP block, DSP Builder does not convert the multiplier into LEs,
even if a large threshold is present.
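The decision described above for a multiplier with a single constant input can be sketched as follows. This is a simplified model with hypothetical names and cost units; the actual estimator inside DSP Builder is not documented here, and the case of two multipliers merged with an adder into one DSP block is not modeled.

```python
def implement_constant_multiplier(adder_tree_depth: int,
                                  estimated_tree_cost: int,
                                  threshold: int) -> str:
    """Choose an implementation for a multiplier with one constant input."""
    if adder_tree_depth <= 2:
        # Shallow trees are converted automatically.
        return "adder tree"
    if estimated_tree_cost <= threshold:
        # Deeper trees are used only when they cost no more than the threshold.
        return "adder tree"
    return "hard multiplier"


print(implement_constant_multiplier(2, 400, threshold=0))
print(implement_constant_multiplier(4, 120, threshold=200))
print(implement_constant_multiplier(4, 120, threshold=50))
```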
11.3. Device
The Device block indicates a particular Simulink subsystem as the top-level design of
an FPGA device. It also allows you to specify the target device and speed grade.
Note: All blocks in subsystems below this level of hierarchy become part of the RTL design.
All blocks above this level of hierarchy become part of the testbench.
You can hierarchically separate parts of the design into synthesizable systems. You
must use a Device block, which sets the device family, part number, speed grade, and
so on, to indicate the top-level synthesizable system.
You can further hierarchically split the synthesizable system into Primitive
subsystems for Primitive blocks and IP blocks.
DSP Builder generates project files and scripts that relate to this level of hierarchy.
You can insert multiple Device blocks in non-overlapping subsystems to use multiple
FPGAs in the same design. You can mix device families freely.
Family member
    Specify the device member as free-form text or enter AUTO for automatic selection. Click ... to display the Device Selector.
Speed grade
    Select the speed grade for the FPGA target device.
The Edit Params block is available as a functional block in the Simulink library
browser. To view it, open the library by right-clicking the Design Configuration
Blocks library in the Simulink Library Browser and selecting Open Design
Configuration Blocks Library.
11.5. LocalThreshold
The LocalThreshold block allows hierarchical overrides of the global clock margin and
threshold settings set on the Control and Signals blocks.
You can place the LocalThreshold block anywhere in your design to define override
values for the margin and threshold settings for that subsystem and any embedded
subsystems. You can override these values further down in the hierarchy by
implementing more LocalThreshold blocks.
For example, you can specify different clock margins for different regions of your
design.
Clock margin (MHz)
    Specifies the margin to influence the tradeoff between performance and resources. The specified margin does not affect the folding options because the system runs at the rate specified by the Clock frequency parameter setting. Specify a positive clock margin if you need to pipeline your design more aggressively (or specify a negative clock margin to save resources) when you do not want to change the ratio between the clock speed and the bus speed.
Generation Thresholds
Small memory minimum fill
    This threshold controls whether registers or small memories (MLABs) implement delay lines. DSP Builder uses a small memory only if it fills it with at least the threshold number of bits. On device families that don't support small memories, DSP Builder ignores this threshold.
Medium memory minimum fill
    This threshold controls when the design uses a medium memory (M9K, M10K or M20K) in place of a small memory or registers. DSP Builder uses the medium memory only if it fills it with at least the threshold number of bits.
Large memory minimum fill
    This threshold controls whether the design uses a large memory (M144K) instead of multiple medium memories. DSP Builder uses the large memory only when it can fill it with at least the threshold number of bits. Default prevents the design using any M144Ks. On device families that don't support large memories, DSP Builder ignores this threshold.
Multiplier: logic and DSP threshold
    Specifies the number of logic elements you want to use to save a multiplier. If the estimated cost of implementing a multiplier in logic is no more than this threshold, DSP Builder implements that multiplier in logic. Otherwise DSP Builder uses a hard multiplier. Default means the design always uses hard multipliers.
Apply Karatsuba method to complex multiply blocks
    Implements this equation: (a+jb) * (c+jd) = (a-b)*(c+d) - a*d + b*c + j(a*d + b*c). DSP Builder includes internal preadder steps into DSP blocks but you see bit growth in the multipliers.
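The Karatsuba equation above uses three real multiplications instead of four. The following sketch evaluates it directly and compares the result with an ordinary complex multiplication; the function name is hypothetical.

```python
def karatsuba_complex_multiply(a, b, c, d):
    """Compute (a + jb) * (c + jd) with three real multiplications."""
    ad = a * d
    bc = b * c
    shared = (a - b) * (c + d)   # preadder results feed a single multiplier
    real = shared - ad + bc      # equals a*c - b*d
    imag = ad + bc
    return real, imag


real, imag = karatsuba_complex_multiply(3, 4, 5, -2)
print(real, imag)
print(complex(3, 4) * complex(5, -2))
```

The preadder terms (a-b) and (c+d) are one bit wider than their inputs, which corresponds to the bit growth in the multipliers mentioned above.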
Related Information
DSP Builder Memory and Multiplier Trade-Off Options on page 239
12. IP Library
Use the DSP Builder advanced blockset IP library blocks to implement full IP
functions. Only use these blocks outside of primitive subsystems.
Multirate filters are essential to the up and down conversion tasks that modern radio
systems require. Cost effective solutions to many other DSP applications also use
multirate filters to reduce the multiplier count.
FIR filter memory-mapped interfaces allow you to read and write coefficients directly,
easing system integration.
Note: Each channel is an independent data source. In an IF modem design, two channels are
required for the complex pair from each antenna.
Automatic Pipelining
The required system clock frequency, and the device family and speed grade
determine the maximum logic depth permitted in the output RTL. DSP Builder
pipelines functions such as adders by splitting them into multiple sections with a
registered carry between them. This pipelining decreases the logic depth allowing
higher frequency operation.
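The idea of splitting an adder into sections can be modeled in software as follows. This is a behavioral sketch with hypothetical names; in hardware the carry passed between sections is registered, so each section completes in a later clock cycle, whereas the loop below only models the arithmetic.

```python
def sectioned_add(a: int, b: int, width: int, section_bits: int):
    """Add two unsigned width-bit values section by section, passing a
    carry from each section to the next. Returns (sum, carry_out)."""
    result = 0
    carry = 0
    for shift in range(0, width, section_bits):
        bits = min(section_bits, width - shift)   # last section may be narrower
        mask = (1 << bits) - 1
        partial = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (partial & mask) << shift
        carry = partial >> bits                   # carry into the next section
    return result, carry


print(sectioned_add(0x0000FFFF, 0x00000001, width=32, section_bits=8))
print(sectioned_add(0xFFFFFFFF, 0x00000001, width=32, section_bits=8))
```

Each section contains only a short carry chain, which is what reduces the logic depth per clock cycle.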
High-Speed Operation
The DSP Builder filter generator responds to the system clock frequency, which makes
timing closure much easier to achieve. The generator uses heuristics that ensure
the logic can run at the desired system clock frequency on the FPGA. You can help
timing closure by adding more clock margin, which results in additional pipelining that
shortens the critical paths. FPGA structures such as internal multiplier and memory
delays determine the maximum clock frequencies.
Scalability
In some cases, the aggregate sample rate for all channels may be higher than the
system clock rate. In these cases, the filter has multiple input or output buses to carry
the additional data, so DSP Builder implements this requirement in the Simulink block
by increasing the vector width of the data signals.
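A rough way to estimate the vector width is to divide the aggregate sample rate by the system clock rate and round up. The sketch below rests on that assumption; the function name is hypothetical and the actual width is determined by DSP Builder.

```python
import math


def data_vector_width(num_channels: int, sample_rate_mhz: float,
                      clock_mhz: float) -> int:
    """Estimate how many parallel data wires carry all channels."""
    aggregate_rate = num_channels * sample_rate_mhz
    return max(1, math.ceil(aggregate_rate / clock_mhz))


print(data_vector_width(2, 50, 200))    # aggregate rate below the clock rate
print(data_vector_width(8, 100, 200))   # aggregate rate four times the clock rate
```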
Coefficient Generation
You can generate filter coefficients using a MATLAB function and reload them at run
time through the memory-mapped interface registers. For example, use the Simulink
fixed-point object fi(fir1(49, 0.3),1,18,19).
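The fixed-point arguments in the example above (signed, 18-bit word length, 19-bit fraction length) can be illustrated with a small Python sketch. It is not the MATLAB fi implementation; the function name quantize is hypothetical, and round-to-nearest with saturation is assumed because those are the fi defaults.

```python
import math

def quantize(values, signed, word_len, frac_len):
    """Return integer codes for values in a fixed-point format.

    Assumes round-to-nearest (ties toward +infinity) and saturation on
    overflow. The real value of each code is code / 2**frac_len.
    """
    scale = 1 << frac_len
    lo = -(1 << (word_len - 1)) if signed else 0
    hi = (1 << (word_len - 1)) - 1 if signed else (1 << word_len) - 1
    codes = []
    for v in values:
        code = math.floor(v * scale + 0.5)   # round to nearest
        codes.append(max(lo, min(hi, code))) # saturate to the word length
    return codes

# With 18 bits and 19 fractional bits, the representable range is [-0.25, 0.25).
codes = quantize([0.1, 0.3, -0.3], True, 18, 19)
```

Values outside the representable range saturate, so it is worth checking the largest coefficient against the chosen format before generating hardware.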
Channelization
The generated help page for the block shows the input channel data format and
output data channel format that a FIR or CIC filter uses, after you run a Simulink
simulation.
This updated help includes a link back to the help for the general block and the
following information about the generated FIR instance:
• Date and time of generation
• The version number and revision for the FIR
• Number of physical input and output data buses
• Bit width of data output.
• Number of different phases
• Implementation folding. The number of times that the design uses each multiplier
per sample to reduce the implementation size.
• Filter utilization. For some sample rates and some interpolation/decimation
settings, the filter may stall internally one or more cycles. The filter utilization is
the percentage of time that the filter is actively working, assuming that the input
arrives at the specified data rate.
• Tap utilization. When some filters are folded, the design may have extra unused
taps. The extra taps increase the filter length with no hardware resource increase.
• Latency. The depth of pipelining added to the block to meet the target clock
frequency on the chosen target device.
• Parameters table that lists the system clock, clock margin, and all FIR input
parameters.
• Port interface table.
• Input and output data format. An ASCII rendering of the input and output
channelized data ordering.
The updated help includes the following information about the CIC instance:
• Date and time of generation
• The version number and revision for the CIC
• Number of integrators. Depending on the input data rate and interpolation factor
the number of integrator stages DSP Builder needs to process the data may be
more than 1. In these instances, the integrator sections of the filter duplicate
(vectorize) to satisfy the data rate requirement.
• Calculated output bit width. The width in bits of the (vectorized) data output from
the filter.
• Calculated stage bit widths. Each stage in the filter has a precise bit-width
requirement: N comb sections followed by N integrator sections.
• The gain through the CIC filter. CIC filters usually have large gains that you must
scale back.
• Comb section utilization. In the comb section, the data rate is lower, so that you
can perform more resource sharing. This message indicates the efficiency of the
subtractor usage.
• Integrator section utilization. In the integrator section, the data rate is higher, so
that you can perform less resource sharing. This message indicates the efficiency
of the adder usage.
• The latency that this block introduces.
• Parameters table that lists the decimation rate, number of stages, differential
delay, number of channels, clock frequency, and input sample rate parameters.
• Port interface table.
• Input and output data format.
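As a rough illustration of why the CIC gain and output bit width in this list grow quickly, the following Python sketch uses the textbook gain expression for a decimating CIC, (R × M)^N, where R is the rate change, M the differential delay, and N the number of stages. The function names are hypothetical, and the values that DSP Builder reports in the generated help take precedence.

```python
def cic_gain(rate_change, stages, diff_delay=1):
    """Textbook DC gain of a decimating CIC filter: (R * M) ** N."""
    return (rate_change * diff_delay) ** stages

def cic_bit_growth(rate_change, stages, diff_delay=1):
    """Extra output bits needed for full precision: ceil(log2(gain))."""
    return (cic_gain(rate_change, stages, diff_delay) - 1).bit_length()

# A decimate-by-5 filter with 3 stages has a gain of 125,
# so a 16-bit input needs 16 + 7 = 23 bits at full precision.
gain = cic_gain(5, 3)
growth = cic_bit_growth(5, 3)
```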
The generalized form of these filters is L-band Nyquist filters, in which every Lth
coefficient is zero counting out from the center tap. DSP Builder also supports these
structures and can often reduce the number of multipliers required in a filter.
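The L-band property described above can be checked on a coefficient set with a short Python sketch. The function name is_l_band and the odd-length requirement are assumptions made for illustration; they are not part of DSP Builder.

```python
def is_l_band(coeffs, band, tol=0.0):
    """Return True if every band-th coefficient, counting out from the
    center tap, is zero (within tol). Requires an odd number of taps."""
    if len(coeffs) % 2 == 0:
        raise ValueError("an odd-length filter is needed to have a center tap")
    center = len(coeffs) // 2
    return all(
        abs(coeffs[center + k]) <= tol and abs(coeffs[center - k]) <= tol
        for k in range(band, center + 1, band)
    )

# A 7-tap half-band (L = 2) example: taps two away from the center are zero.
half_band = [-1, 0, 9, 16, 9, 0, -1]
```

Every zero coefficient is a multiplication that the hardware does not need to perform, which is where the multiplier savings come from.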
12.1.1.5. Setting and Changing FIR Filter Coefficients at Runtime in DSP Builder
1. Set the base address of the memory-mapped coefficients with the Base address
parameter.
2. Set the filter coefficients by entering a Simulink fixed-point array into the
Coefficients parameter.
3. Generate a vector of coefficients either by entering an array of numbers, or using
one of the many MATLAB functions to build the required coefficients.
4. Update the parameters through a processor interface during run time using the
BusStimulus block. Alternatively, update the parameters from your model by
exposing hidden processor interface ports (turn on Expose Bus Ports).
Note: If the FIR coefficient is wider than the Avalon-MM data width, the design requires
several accesses to write or read a single coefficient.
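A minimal sketch of the note above: the number of bus accesses per coefficient is assumed here to be the coefficient width divided by the data width, rounded up. The function name is hypothetical, and the exact access ordering is not described in this section.

```python
def accesses_per_coefficient(coeff_bits, data_bits):
    """Bus accesses needed for one coefficient (ceiling division)."""
    return -(-coeff_bits // data_bits)

# An 18-bit coefficient fits in one 32-bit access; a 40-bit one needs two.
narrow = accesses_per_coefficient(18, 32)
wide = accesses_per_coefficient(40, 32)
```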
In your higher level system, access FIR coefficients through the slave interface at the
base address you specified on the FIR block.
Note: The FIR base address is now an offset from the base address assigned to the slave
interface in your Platform Designer system.
When you expose bus interface ports in the Simulink design (turn on Expose Bus
Interface), a valid subset of Avalon-MM slave interface ports appears on the block
based on the selected bus mode. You can now make direct connections to these ports
in the Simulink model for accessing the coefficients. The FIR coefficient width sets the
width of the data ports (write and read). DSP Builder places bus slave logic on the
system clock domain.
address Input Address of the request. DSP Builder adds address to your
design when Bus Mode is not set to Constant.
The port width depends on the Bus Address Width in the
Avalon-MM Slave Settings block.
For the first coefficient, use the Base Address you specify
for the block. For the last coefficient, use
Base Address + Number Of Coefficients – 1.
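The address range described for the address port can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes one address per coefficient, that is, each coefficient fits within the Avalon-MM data width.

```python
def coefficient_addresses(base_address, num_coefficients):
    """Addresses from Base Address to Base Address + Number Of Coefficients - 1."""
    return range(base_address, base_address + num_coefficients)

# A 50-tap filter mapped at 0x100 occupies addresses 0x100 through 0x131.
addrs = coefficient_addresses(0x100, 50)
first, last = addrs[0], addrs[-1]
```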
Related Information
Avalon-MM Slave Settings (AvalonMMSlaveSettings) on page 234
The input rate determines the bandwidth of the FIR. If you turn off Reconfigurable
carrier (nonreconfigurable FIR), the IP core allocates this bandwidth equally among
the channels. The reconfigurable FIR feature lets you allocate the
bandwidth manually. You set these allocations during parameterization, and you can
change which allocation the IP core uses at run time using the mode signal. You can
use one channel's bandwidth to process a different channel's data. You specify the
allocation by listing the channels you want the IP core to process in the mode
mapping. For example, a mode mapping of 0,1,2,2 gives channel 2 twice the
bandwidth of channels 0 and 1, at the cost of not processing channel 3.
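The effect of a mode mapping on per-channel bandwidth can be tallied with a short Python sketch. The function name bandwidth_shares is hypothetical; the sketch only counts how many time slots each channel receives in one mode.

```python
from collections import Counter

def bandwidth_shares(mode_mapping, num_channels):
    """Fraction of the FIR bandwidth each channel receives in one mode."""
    counts = Counter(mode_mapping)
    slots = len(mode_mapping)
    return [counts[ch] / slots for ch in range(num_channels)]

# Mode mapping 0,1,2,2: channel 2 gets half the slots, channel 3 gets none.
shares = bandwidth_shares([0, 1, 2, 2], 4)
```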
You can use a ChanView block in a testbench to visualize the contents of the TDM
protocol. It produces synthesizable RTL, so you can use it anywhere in your design.
When a single channel is input, the ChanView block strips out all the non-valid
samples, thus cleaning up the display in the Simulink scope.
The channel outputs are not aligned. For example, if you have input channels c0 and
c1 on a single wire and view both channels, the output is not aligned.
Figure 106. Channel Viewer Output for Two Channels on a Single Wire
Note: You can add delays after ChanView blocks if you want to realign the output channels.
Number of input channels Specifies the number of unique channels the block can process. The design does
not use this parameter unless the data bus is a vector or the folding factor is
greater than the number of channels. If the data bus is a vector, this value
determines which vector element contains the correct channel.
Output channels A vector that controls the input channels to decode and present as outputs. The
number of outputs equals the length of this vector, and each output corresponds
to one channel in order.
q Input The data input to the block. This signal may be a vector. This block does not support
floating-point types.
v Input Indicates validity of data input signals. If v is high, the data on the wire is valid.
c Input Indicates the channel of the data input signals. If v is high, c indicates which channel
the data corresponds to.
cn Output Each output is a deserialized version of the channel contained on the TDM bus. The
output value is updated on each clock cycle that has valid data when the channel
matches the required channel.
ov Output Optional. Pulses 1 at last cycle of a frame (when all held channel output signals have
correct value for the frame) provided valid is high throughout the frame data.
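The deserializing behavior of the cn outputs can be modeled with a small Python sketch. This is a behavioral illustration only, not the generated RTL; the function name chanview, the tuple layout (data, valid, channel), and the initial held value of 0 are assumptions.

```python
def chanview(stream, output_channels):
    """Model the held per-channel outputs of a channel viewer.

    stream is a list of (data, valid, channel) tuples, one per clock cycle.
    Returns one tuple of held output values per cycle.
    """
    held = {ch: 0 for ch in output_channels}  # assumed reset value
    trace = []
    for data, valid, chan in stream:
        if valid and chan in held:
            held[chan] = data  # update only on valid data for a viewed channel
        trace.append(tuple(held[ch] for ch in output_channels))
    return trace

# Two channels on one wire: c0 updates one cycle before c1,
# which is why the outputs are not aligned.
trace = chanview([(10, 1, 0), (20, 1, 1), (99, 0, 0), (11, 1, 0)], [0, 1])
```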
After DSP Builder runs a simulation, it updates the help pages with specific
information about each instance of a block. For resource usage, on the DSP Builder
menu, point to Resources, and click Design.
Written on Tue Feb 19 11:25:27 2008 Date and time when you ran this file.
Port interface table Lists the port interfaces to the ChanView block.
You can use this block in a digital up converter for a radio system or a general purpose
DSP application. The data has fixed-point types, and the output is the implied full
precision fixed-point type.
Note: You can easily replicate the ComplexMixer block with a Multiply block that takes
complex inputs within a Primitive subsystem.
The system specification, including such factors as the channel count and sample
rates, determines the main parameters for this block. The input sample rate of the
block determines the number of channels present on each input wire and the number
of wires:
For example, a sample rate of 60 MSPS and system clock rate of 240 MHz gives four
samples to be TDM on to each input wire.
If a wire has more channels than TDM slots available, the input wire is a vector of
sufficient width to hold all the samples. Similarly, the number of frequencies (the
number of complex numbers) determines the width of the sine and cosine inputs. The
number of results produced by the ComplexMixer is the product of the sample input
vector and the frequency vector. The results are TDM on to the i and q outputs in a
similar manner to the inputs.
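The 60 MSPS and 240 MHz example above can be worked through with a short Python sketch. The function name tdm_layout is hypothetical, and the sketch assumes that the number of TDM slots per wire is the system clock rate divided by the sample rate, rounded down, which matches the example in the text.

```python
import math

def tdm_layout(clock_mhz, rate_msps, channels, frequencies=1):
    """Return (slots per wire, number of wires, number of mixer results)."""
    slots = int(clock_mhz // rate_msps)
    if slots < 1:
        raise ValueError("sample rate exceeds the system clock rate")
    wires = math.ceil(channels / slots)  # vector width of the input
    results = channels * frequencies     # one result per channel/frequency pair
    return slots, wires, results

# 60 MSPS at 240 MHz gives four TDM slots per wire.
layout = tdm_layout(240, 60, 4)
```

With ten channels and two frequencies at the same rates, the same function reports three input wires and twenty results.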
Input Rate Per Channel (MSPS) The data rate per channel measured in millions of samples per second.
i Input The real (in phase) component of the complex data input. If you request more channels than can fit
on a single bus, this signal is a vector. The width in bits inherits from the input wire.
q Input The imaginary (quadrature phase) component of the complex data input. If you request more
channels than can fit on a single bus, this signal is a vector. The width in bits inherits from the input
wire.
v Input Indicates validity of data input signals. If v is high, the data on the wire is valid.
c Input Indicates the channel of the data input signals. If v is high, c indicates which channel the data corresponds to.
sin Input The imaginary part of the complex number. For example, the NCO's sine output.
cos Input The real part of the complex number. For example, the NCO's cosine output.
i Output The in-phase (real) output of the mixer, which is (i × cos – q × sin). If you request more channels
than can fit on a single bus, this signal is a vector. The width in bits is wide enough for the full
precision result.
q Output The quadrature phase (imaginary) output of the mixer, which is (i × sin + q × cos). If you request
more channels than can fit on a single bus, this signal is a vector. The width in bits is wide enough for
the full precision result.
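The i and q output expressions in the table are the real and imaginary parts of a complex multiplication. A minimal Python sketch, with the hypothetical function name complex_mix, makes this explicit:

```python
def complex_mix(i, q, cos, sin):
    """Return (i_out, q_out) = (i*cos - q*sin, i*sin + q*cos)."""
    return i * cos - q * sin, i * sin + q * cos

# Mixing 3 + 4j with a carrier sample of cos = 0, sin = 1 (a 90 degree rotation).
i_out, q_out = complex_mix(3, 4, 0, 1)
product = complex(3, 4) * complex(0, 1)
```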
You can use the DecimatingCIC block in a digital down converter for a radio system
or a general purpose DSP application. The coefficients and input data are fixed-point
types, and the output is the implied full-precision fixed-point type. You can reduce the
precision with a separate Scale block, which can perform rounding and saturation to
provide the required output precision.
The DecimatingCIC has a lower output sample rate than the input sample rate by a
factor D, where D is the decimation factor. Usually, the DecimatingCIC discards
(D–1) out of D output samples, thus lowering the sample rate by a factor D. The
physical implementation avoids performing the additions that lead to these discarded
samples, which reduces the filter cost.
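A behavioral reference model of a decimating CIC can help when checking simulation results. The Python sketch below is not the DSP Builder implementation: it uses the textbook structure (N integrators at the input rate, decimation by D, then N comb sections at the output rate), unbounded Python integers instead of fixed-width registers, and a hypothetical function name.

```python
def decimating_cic(samples, decimation, stages, diff_delay=1):
    """Textbook decimating CIC: integrators, downsample, then combs."""
    if decimation < 1 or stages < 1 or diff_delay < 1:
        raise ValueError("decimation, stages and diff_delay must be >= 1")
    acc = [0] * stages
    integrated = []
    for x in samples:
        for s in range(stages):      # cascade of integrators
            acc[s] += x
            x = acc[s]
        integrated.append(x)
    out = integrated[decimation - 1::decimation]  # keep 1 of every D samples
    for _ in range(stages):          # cascade of combs at the low rate
        delayed = [0] * diff_delay + out[:-diff_delay]
        out = [a - b for a, b in zip(out, delayed)]
    return out

# A constant input of 1 settles to the DC gain (D * M) ** N = 125.
settled = decimating_cic([1] * 40, 5, 3)[-1]
```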
Figure 107. Decimate by 5 Filter Decreasing Sample Rate of a Random Noise Input
Input rate per channel Specifies the sampling frequency of the input data per channel measured in millions of samples
per second (MSPS).
Decimation factor Specifies the decimation factor 1/(integer). (An integer greater than 1 implies interpolation.)
a Input The fixed-point data input to the block. If you request more channels than can fit on a single bus,
this signal is a vector. The width in bits is inherited from the input wire.
v Input Indicates validity of the data input signals. If v is high, the data on the a wire is valid.
c Input Indicates channel of data input signals. If v is high, c indicates which channel the data corresponds
to.
bypass Input When this input asserts, the input data is zero-stuffed and scaled by the gain of the filter, which is
useful during hardware debugging.
q Output The data output from the block. If you request more channels than can fit on a single bus, this
signal is a vector. The width in bits is a function of the input width in bits and the parameterization.
Related Information
DSP Builder FIR and CIC Filters on page 243
Use the Decimating FIR block in a digital down converter for a radio system or a
general purpose DSP application. The coefficients and input data are fixed-point types,
and the output is the implied full precision fixed-point type. You can reduce the
precision by using a separate Scale block, which can perform rounding and saturation
to provide the required output precision.
The Decimating FIR block supports rate changes from two upwards, coefficient width
in bits from 2 to 32 bits, half-band and L-band Nyquist filters, real and complex filters,
symmetry and anti(negative)-symmetry.
Figure 108. Decimating by 5 Filter Decreasing Sample Rate of a Sine Wave Input
The Decimating FIR has a lower output sample rate than the input sample rate by a
factor, D, the decimation factor. The decimating FIR discards D–1 out of D output
samples, thus lowering the sample rate by a factor D.
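The decimation described above can be sketched as a reference model that computes only the outputs that are kept. This Python sketch is illustrative: the function name is hypothetical, and keeping the outputs at input indices 0, D, 2D, and so on is an assumption about the phase.

```python
def decimating_fir(x, h, decimation):
    """Direct-form FIR evaluated only at every D-th output sample."""
    out = []
    for m in range(0, len(x), decimation):
        acc = 0
        for k, coeff in enumerate(h):
            if m - k >= 0:               # samples before the start are zero
                acc += coeff * x[m - k]
        out.append(acc)
    return out

# A two-tap moving sum decimated by 2.
y = decimating_fir([1, 2, 3, 4, 5, 6], [1, 1], 2)
```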
Input rate per channel Specifies the sampling frequency of the input data per channel measured in millions of samples
per second (MSPS).
Symmetry You can select Symmetrical or Anti-Symmetrical coefficients. Symmetrical coefficients can
result in hardware resource savings over the asymmetrical version.
Coefficients You can specify the filter coefficients using a Simulink fixed-point object fi(0). The data type of
the fixed-point object determines the width and format of the coefficients. The length of the array
determines the length of the filter.
For example, fi(fir1(49, 0.3),1,18,19)
Base address You can memory map the filter's coefficients into the address space of the system. This field
determines the starting address for the coefficients. It is specified as a MATLAB double type
(decimal integer) but you can use a MATLAB expression to specify a hexadecimal or octal type if
required.
Read/Write mode You can allow Read, Write, or Read/Write access from the system interface. Turn on
Constant to map coefficients to the system address space.
Filter structure You can select Use All Taps, Half Band, or other specified band (from 3rd Band to 46th
Band).
Expose Avalon-MM Slave in Simulink Allows you to reconfigure coefficients without Platform Designer. Also, it allows you to reprogram
multiple FIR filters simultaneously. Turn on to show the Avalon-MM inputs and outputs as normal
ports in Simulink. The Read/Write mode decides the valid subset of Avalon-MM slave ports that
appear on the block. If you select Constant, the block shows no Avalon-MM ports.
Channel mapping Enter parameters as a MATLAB 2D array for a reconfigurable FIR filter. Each row represents a mode;
each entry in a row represents the channel input on that time slot. For example, [0,0,0,0;0,1,2,3]
gives the first element of the second row as 0, which means DSP Builder processes channel 0 on
the first cycle when the FIR is set to mode 1.
For more information about Simulink fixed-point objects and MATLAB functions, refer
to the MATLAB Help.
a Input The fixed-point data input to the block. If you request more channels than can fit on a single
bus, this signal is a vector. The width in bits is inherited from the input wire.
v Input Indicates validity of the data input signals. If v is high, the data on the a wire is valid.
c Input Indicates the channel of the data input signals. If v is high, then c indicates which channel the
data corresponds to.
b Input Indicates multibank filter. This input appears when you add a second filter definition to the
Coefficients parameter in the parameters dialog box.
q Output The fixed-point filtered data output from the block. If you request more channels than can fit on
a single bus, this signal is a vector. The width in bits is a function of the input width in bits and
the parameterization.
c Output Indicates the channel of the data output signals. The output data can be non-zero when v is low.
Related Information
• DSP Builder FIR and CIC Filters on page 243
• DSP Builder FIR Filters on page 246
You can use the FractionalRateFIR block in a digital down converter for a radio
system or a general purpose DSP application. The coefficients and input data are
fixed-point types, and the output is the implied full precision fixed-point type. You can
reduce the precision by using a separate Scale block, which can perform rounding and
saturation to provide the required output precision.
In the basic filter operation, at each sample time k, the new output y is calculated by
multiplying the coefficients a by the recent past values of the input x and summing
the products.
The FractionalRateFIR has a modified output sample rate that differs from the input
sample rate by a factor I/D, where I is the interpolation rate and D is the decimation
factor. Usually, the fractional rate FIR interpolates by a factor I by inserting (I–1) zeros
for every input sample before performing the filter operation. Then the FIR discards
D–1 out of D output samples, thus lowering the sample rate by a factor D.
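The three steps described above (zero insertion, filtering, and discarding samples) can be written as a naive Python reference model. It is a sketch for checking results, not how the hardware is built; the function name and the choice to keep outputs at indices 0, D, 2D, and so on are assumptions.

```python
def fractional_rate_fir(x, h, interp, decim):
    """Zero-stuff by I, apply the FIR, then keep one of every D outputs."""
    up = []
    for sample in x:
        up.append(sample)
        up.extend([0] * (interp - 1))    # insert I-1 zeros per input sample
    filtered = [
        sum(h[k] * up[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(up))
    ]
    return filtered[::decim]             # discard D-1 of every D samples

# Interpolate by 3 with a 3-tap hold filter, then decimate by 2.
y = fractional_rate_fir([1, 2], [1, 1, 1], 3, 2)
```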
Figure 109. Sample Rate of a Sine Wave Input Interpolated by 3 and Decimated by 2
Input rate per channel Specifies the sampling frequency of the input data per channel measured in millions of samples
per second (MSPS).
Symmetry You can select Symmetrical or Anti-Symmetrical coefficients. Symmetrical coefficients can
result in hardware resource savings over the asymmetrical version.
Coefficients You can specify the filter coefficients using a Simulink fixed-point object fi(0). The data type of
the fixed-point object determines the width and format of the coefficients. The length of the array
determines the length of the filter.
For example, fi(fir1(49, 0.3),1,18,19).
Base address You can memory map the filter's coefficients into the address space of the system. This field
determines the starting address for the coefficients. It is specified as a MATLAB double type
(decimal integer) but you can use a MATLAB expression to specify a hexadecimal or octal type if
required.
Read/Write mode You can allow Read, Write, or Read/Write access from the system interface. Turn on
Constant to map coefficients to the system address space.
Filter structure You can select Use All Taps, Half Band, or a specified band (from 3rd Band to 46th Band).
Expose Avalon-MM Slave in Simulink Allows you to reconfigure coefficients without Platform Designer. Also, it allows you to reprogram
multiple FIR filters simultaneously. Turn on to show the Avalon-MM inputs and outputs as normal
ports in Simulink. The Read/Write mode decides the valid subset of Avalon-MM slave ports that
appear on the block. If you select Constant, the block shows no Avalon-MM ports.
Channel mapping Enter parameters as a MATLAB 2D array for a reconfigurable FIR filter. Each row represents a mode;
each entry in a row represents the channel input on that time slot. For example, [0,0,0,0;0,1,2,3]
gives the first element of the second row as 0, which means DSP Builder processes channel 0 on
the first cycle when the FIR is set to mode 1.
a Input The fixed-point data input to the block. If you request more channels than can fit on a single
bus, this signal is a vector. The width in bits is inherited from the input wire.
v Input Indicates validity of the data input signals. If v is high, the data on the a wire is valid.
c Input Indicates the channel of the data input signals. If v is high, c indicates which channel the data
corresponds to.
b Input Indicates multibank filter. This input appears when you add a second filter definition to the
Coefficients parameter in the parameters dialog box.
q Output The fixed-point filtered data output from the block. If you request more channels than can fit on
a single bus, this signal is a vector. The width in bits is a function of the input width in bits and
the parameterization.
v Output Indicates validity of data output signals. The output data can be non-zero when v is low.
c Output Indicates the channel of the data output signals. The output data can be non-zero when v is low.
Related Information
• DSP Builder FIR and CIC Filters on page 243
• DSP Builder FIR Filters on page 246
You can use the InterpolatingCIC block in a digital up converter for a radio system
or a general purpose DSP application. The coefficients and input data are fixed-point
types, and the output is the implied full precision fixed-point type. You can reduce the
precision by using a separate Scale block, which can perform rounding and saturation
to provide the required output precision.
The InterpolatingCIC has a higher output sample rate than the input sample rate by
a factor I, where I is the interpolation rate. Usually, the InterpolatingCIC inserts
(I–1) zeros for every input sample, thus raising the sample rate by a factor I.
Figure 110. Interpolate by 5 Filter Increasing Sample Rate of a Sine Wave Input
Input rate per channel Specifies the sampling frequency of the input data per channel measured in millions of samples
per second (MSPS).
Final decimation You can optionally specify a final decimation by 2 to allow interpolation rates which are multiples
of 0.5. The decimation works by simply throwing away data values. Only use this option to reduce
the number of unique outputs the CIC generates.
a Input The fixed-point data input to the block. If you request more channels than can fit on a single bus,
this signal is a vector. The width in bits is inherited from the input wire.
v Input Indicates validity of the data input signals. If v is high, the data on the a wire is valid.
c Input Indicates the channel of the data input signals. If v is high, c indicates which channel the data
corresponds to.
bypass Input When this input is asserted, the input data is zero-stuffed and scaled by the gain of the filter. This
option can be useful during hardware debug.
q Output The fixed-point filtered data output from the block. If you request more channels than can fit on a
single bus, this signal is a vector. The width in bits is a function of the input width in bits and the
parameterization.
v Output Indicates validity of data output signals. The output data can be non-zero when v is low.
c Output Indicates the channel of the data output signals. The output data can be non-zero when v is low.
Related Information
DSP Builder FIR and CIC Filters on page 243
You can use the InterpolatingFIR block in a digital up converter for a radio system
or a general purpose DSP application. The coefficients and input data are fixed-point
types, and the output is the implied full precision fixed-point type. You can reduce the
precision by using a separate Scale block, which can perform rounding and saturation
to provide the required output precision.
In the basic equation, at each sample time k, the new output y is calculated by
multiplying the coefficients a by the recent past values of the input x and summing
the products.
The InterpolatingFIR has a higher output sample rate than the input sample rate by
a factor, I, the interpolation factor. Usually, the interpolating FIR inserts I–1 zeroes for
every input sample, thus raising the sample rate by a factor I.
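Because the inserted zeros contribute nothing to the sum, an interpolating FIR can be evaluated without multiplying by them. The Python sketch below shows this polyphase view; it is an illustration under that textbook decomposition, not a description of the generated hardware, and the function name is hypothetical.

```python
def interpolating_fir(x, h, interp):
    """Polyphase interpolating FIR.

    Output n*I + p uses only coefficients h[p], h[p+I], h[p+2I], ...
    which is equivalent to zero-stuffing by I and filtering with h.
    """
    out = []
    for n in range(len(x)):
        for p in range(interp):
            acc = 0
            for j, k in enumerate(range(p, len(h), interp)):
                if n - j >= 0:           # samples before the start are zero
                    acc += h[k] * x[n - j]
            out.append(acc)
    return out

# Interpolate by 2 with a 4-tap filter.
y = interpolating_fir([1, 1], [1, 2, 3, 4], 2)
```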
Figure 111. Interpolate by 2 Filter Increasing Sample Rate of a Sine Wave Input
Input rate per channel Specifies the sampling frequency of the input data per channel measured in millions of samples
per second (MSPS).
Symmetry You can select Symmetrical or Anti-Symmetrical coefficients. Symmetrical coefficients can
result in hardware resource savings over the asymmetrical version.
Coefficients You can specify the filter coefficients using a Simulink fixed-point object fi(0). The data type of
the fixed-point object determines the width and format of the coefficients. The length of the array
determines the length of the filter.
For example, fi(fir1(49, 0.3),1,18,19).
Base address You can memory map the filter's coefficients into the address space of the system. This field
determines the starting address for the coefficients. It is specified as a MATLAB double type
(decimal integer) but you can use a MATLAB expression to specify a hexadecimal or octal type if
required.
Read/Write mode You can allow Read, Write, or Read/Write access from the system interface. Turn on
Constant to map coefficients to the system address space.
Filter structure You can select Use All Taps, Half Band, or a specified band (from 3rd Band to 46th Band).
Expose Avalon-MM Slave in Simulink Allows you to reconfigure coefficients without Platform Designer. Also, it allows you to reprogram
multiple FIR filters simultaneously. Turn on to show the Avalon-MM inputs and outputs as normal
ports in Simulink. The Read/Write mode decides the valid subset of Avalon-MM slave ports that
appear on the block. If you select Constant, the block shows no Avalon-MM ports.
Channel mapping Enter parameters as a MATLAB 2D array for a reconfigurable FIR filter. Each row represents a mode;
each entry in a row represents the channel input on that time slot. For example, [0,0,0,0;0,1,2,3]
gives the first element of the second row as 0, which means DSP Builder processes channel 0 on
the first cycle when the FIR is set to mode 1.
a Input The fixed-point data input to the block. If you request more channels than can fit on a single
bus, this signal is a vector. The width in bits is inherited from the input wire.
v Input Indicates validity of the data input signals. If v is high, the data on the a wire is valid.
c Input Indicates the channel of the data input signals. If v is high, c indicates which channel the data
corresponds to.
b Input Indicates multibank filter. This input appears when you add a second filter definition to the
Coefficients parameter in the parameters dialog box.
q Output The fixed-point filtered data output from the block. If you request more channels than can fit on
a single bus, this signal is a vector. The width in bits is a function of the input width in bits and
the parameterization.
v Output Indicates validity of data output signals. The output data can be non-zero when v is low.
c Output Indicates the channel of the data output signals. The output data can be non-zero when v is low.
Related Information
• DSP Builder FIR and CIC Filters on page 243
• DSP Builder FIR Filters on page 246
12.1.10. NCO
The DSP Builder NCO block uses an octant-based algorithm with trigonometric
interpolation. A numerically controlled oscillator (NCO) or digitally controlled oscillator
(DCO) is an electronic system for synthesizing a range of frequencies from a fixed
time base. Use NCOs when you require a continuous phase sinusoidal signal with
variable frequency, such as when receiving the signal from an NCO-based transmitter
in a communications system.
The NCO accumulates a phase angle in an accumulator. DSP Builder uses this angle as
a lookup into sine and cosine tables to find a coarse sine and cosine approximation.
DSP Builder implements the tables with a ROM. A Taylor series expansion of the small
angle error refines this coarse approximation to produce accurate sine and cosine
values. The NCO block uses folding to produce multiple sine and cosine values if the
sample rate is an integer fraction of the system clock rate.
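The accumulate, look up, and refine sequence described above can be illustrated with a floating-point Python sketch. It is a simplification and not the DSP Builder algorithm: it uses a full-circle table rather than an octant-based one, only the first-order Taylor term, and hypothetical names throughout.

```python
import math

def nco_samples(phase_inc, acc_bits, table_bits, count):
    """Return count (cos, sin) pairs from a phase-accumulator NCO model."""
    if table_bits > acc_bits:
        raise ValueError("table_bits must not exceed acc_bits")
    fine_bits = acc_bits - table_bits
    mask = (1 << acc_bits) - 1
    table = [
        (math.cos(2 * math.pi * k / (1 << table_bits)),
         math.sin(2 * math.pi * k / (1 << table_bits)))
        for k in range(1 << table_bits)
    ]
    acc = 0
    out = []
    for _ in range(count):
        coarse = acc >> fine_bits                  # table address
        fine = acc & ((1 << fine_bits) - 1)        # residual small angle
        err = 2 * math.pi * fine / (1 << acc_bits)
        c, s = table[coarse]
        # First-order Taylor correction of the coarse lookup.
        out.append((c - err * s, s + err * c))
        acc = (acc + phase_inc) & mask             # phase wraps modulo 2**acc_bits
    return out

samples = nco_samples(phase_inc=123457, acc_bits=24, table_bits=8, count=64)
```

With an 8-bit table, the residual angle is below 2π/256, so the first-order correction already brings the error well under 0.001. Higher-order terms or a larger table reduce it further.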
You can use this block in a digital up- or down-converter for a radio system or a
general purpose DSP application. The coefficients and input data are fixed-point types,
and the output is the implied full precision fixed-point type.
An NCO sometimes needs to synchronize its phase to an exact cycle. It uses the
phase and sync inputs for this purpose. The sync input is a write enable for the
channel (address) specified by the chan input when the new phase value (data) is
available on the phase input. You may need some external logic (which you can
implement as a primitive subsystem) to drive these signals. For example, you can
prepare a sequence of new phase values in a shared memory and then write all the
values to the NCO on a synchronization pulse. This option is particularly useful if you
want an initial phase offset in the upper sinusoid. You can also use this option to
implement efficient phase-shift keying (PSK) modulators in which the input to the
phase modulator varies according to a data stream.
The system specification, including such factors as the channel count, sample rates,
and noise floor, determines the main parameters for this block. You can express all the
parameters as MATLAB expressions, making it easy to parameterize a complete
system.
The hardware generation techniques create very efficient NCOs, which are fast enough
to update with every Simulink simulation. The edit-simulation loop time is much
reduced, improving productivity.
Output Rate Per Channel (MSPS) The sine and cosine output rate per channel measured in millions of samples per second.
Output Data Type The output width in bits of the NCO. The bit width controls the internal precision of the NCO. The
spurious-free dynamic range (SFDR) of the waves produced is approximately 6.02 × bit width dB.
The 6.02 factor comes from the definition of decibels: each added bit of precision increases
the SFDR by 20 × log10(2) ≈ 6.02 dB.
Output Scaling Value This value interprets the output data in the Simulink environment. The power of 2 scaling
provided lets you specify the range of the output value.
Accumulator Bit Width Specifies the bit width of the memory-mapped accumulator, which governs the NCO
frequency accuracy that you can control. The width is limited to the range 15–30 for use with a
32-bit memory map (shared by other applications such as a Nios II processor). The top two bits
in the 32-bit width are reserved to control the inversion of the sine and cosine outputs. Select
Constant for the Read/Write Mode to increase the width to 40 bits.
Frequency resolution = clock frequency / 2^(accumulator bit width)
Phase Increment and Inversion A vector that represents the step in phase between each sample. This vector controls the
frequencies generated during simulation. The length of the vector determines how many
channels (frequencies) of data are generated from the NCO. The unit of the vector is one (sine or
cosine) cycle.
Phase Increment and Inversion Memory Map Specifies where in the memory-mapped space the NCO registers are mapped.
Read/Write Mode Specifies whether the NCO phase increment and inversion registers are mapped as Read, Write,
Read/Write, or Constant.
Expose Avalon-MM Slave in Simulink Allows you to reconfigure increments without Platform Designer. It also allows you to
reprogram multiple NCOs simultaneously. When you turn on this parameter, the following three
additional input ports and two output ports appear in Simulink:
• Inputs: data, address, write
• Outputs: readdata, valid
Related Information
• Super-sample NCO
• NCO on page 172
To achieve a desired frequency (in MHz) from the NCO block, you must specify a
phase increment value defined by:
Phase Increment Value = Frequency × 2^(Accumulator Bit Width) / Output Data Rate
This value must fall within the range specified by the Accumulator Bit Width
parameter. For example, for an accumulator bit width of 24 bits, you can specify a
phase increment value less than 2^24.
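The formula above can be sketched in Python as follows. The function name phase_increment and the choice to round to the nearest integer are assumptions made for illustration; the handbook does not state how fractional results are handled.

```python
def phase_increment(freq_mhz: float, output_rate_msps: float, acc_bits: int) -> int:
    """Phase Increment Value = Frequency * 2^(Accumulator Bit Width) / Output Data Rate."""
    # Assumption: round to the nearest representable integer step.
    value = round(freq_mhz * (1 << acc_bits) / output_rate_msps)
    # The value must fit in the accumulator, that is, be less than 2^acc_bits.
    if not 0 <= value < (1 << acc_bits):
        raise ValueError("phase increment does not fit in the accumulator width")
    return value


# 10 MHz tone at an 80 MSPS output rate with a 24-bit accumulator.
print(phase_increment(10.0, 80.0, 24))
```

Because the MATLAB expression for the parameter can be a vector, the same calculation applied to a list of frequencies yields one increment per channel.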
You can specify the phase increment values in a vector format that generates
multichannel sinusoidal signals. The length of the vector determines how many
channels (frequencies) of data are generated from the NCO block. For example, a
length of 4 implies that four channels of data are generated.
When the design uses the NCO for super-rate applications (NCO frequency is higher
than output data rate), for example direct RF DUC, use multiple channels (in evenly
distributed phases). The phase increment value is:
The modulus function limits the phase value to less than 1 and prevents interfering
with the inversion bits.
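The super-rate expression itself is not reproduced in this text, so the following Python sketch is only an assumption based on the description of the modulus function: the ratio of NCO frequency to output data rate is reduced modulo 1 before it is scaled by the accumulator range. The name super_rate_phase_increment is hypothetical.

```python
def super_rate_phase_increment(freq_mhz: float, output_rate_msps: float, acc_bits: int) -> int:
    """Assumed form: mod(Frequency / Output Data Rate, 1) * 2^(Accumulator Bit Width)."""
    # Keep only the fractional number of cycles per output sample (always < 1).
    cycles_per_sample = (freq_mhz / output_rate_msps) % 1.0
    # The final modulo guards against rounding up to exactly 2^acc_bits,
    # which would spill into the inversion bits.
    return round(cycles_per_sample * (1 << acc_bits)) % (1 << acc_bits)


# 250 MHz NCO frequency at a 100 MSPS output rate: 2.5 cycles per sample wraps to 0.5.
print(super_rate_phase_increment(250.0, 100.0, 24))
```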
When the input is in matrix format (with multiple rows of vectors), the design
configures the NCO block as a multi-bank NCO for frequency hopping for multicarrier
designs. The number of rows in the matrix represents the number of banks of
frequencies (of sine and cosine waves) that the NCO generates for a given channel. DSP Builder
automatically adds an additional bank input port and b output port to the NCO block.
Note: No upper limit to the number of rows exists in the matrix and you can specify any
number of frequency banks. However, you should carefully monitor the resource usage
to ensure that the specified design fits into the target device.
You can also use the Phase Increment and Inversion parameter to indicate
whether the generated sinusoidal signals are inverted. For an accumulator bit width
of 24, you can add two bits (the 25th and 26th bits) to the phase increment
value for a given frequency. These bits indicate whether the sine (26th bit) and cosine (25th
bit) are inverted.
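As an illustration of the bit layout described above, the following Python sketch packs the inversion flags on top of the phase increment. The helper name pack_phase_word and its arguments are hypothetical; only the bit positions (cosine directly above the accumulator bits, sine one bit higher) come from the text.

```python
def pack_phase_word(increment: int, acc_bits: int,
                    invert_sin: bool = False, invert_cos: bool = False) -> int:
    """Combine a phase increment with the two inversion bits."""
    if not 0 <= increment < (1 << acc_bits):
        raise ValueError("increment must fit in the accumulator width")
    word = increment
    if invert_cos:
        word |= 1 << acc_bits          # 25th bit when acc_bits == 24
    if invert_sin:
        word |= 1 << (acc_bits + 1)    # 26th bit when acc_bits == 24
    return word


# Invert only the sine output of a channel with increment 0x200000.
print(hex(pack_phase_word(0x200000, 24, invert_sin=True)))
```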
The NCO block only supports one or two registers for each phase increment value. If
one register is required for each phase increment value, the phase increment value for
the first frequency is written into the base address, the second value into the next
address (base address + 1) and so on. If you require two registers, the design uses
the base address and the next address (base address + 1) for the first value with each
address storing part of the value. The next pair of addresses store the next value and
so on.
For example, for a System Data Width of 16, Accumulator Bit Width of 24 and
Phase Increment and Inversion Memory Map base address of 1000, addresses
1000 and 1001 store the phase increment value for the first frequency. Address 1001
stores the lower 16 bits (15 .. 0) and address 1000 stores the remaining 8 bits (23 ..
16). If DSP Builder generates four channels of sinusoidal signals, it uses addresses
1002 and 1003 for the second channel, addresses 1004 and 1005 for the third
channel, addresses 1006 and 1007 for the fourth channel.
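The address layout in this example can be sketched as a list of (address, data) writes. This is a minimal model, assuming that the number of registers per value is the value width divided by the System Data Width, rounded up; the function name register_writes is hypothetical.

```python
def register_writes(values, base_address: int, data_width: int, value_bits: int):
    """Return (address, data) pairs, upper part first, as in the example above."""
    regs_per_value = -(-value_bits // data_width)  # ceiling division
    if regs_per_value not in (1, 2):
        raise ValueError("the NCO block supports only one or two registers per value")
    mask = (1 << data_width) - 1
    writes = []
    for index, value in enumerate(values):
        address = base_address + index * regs_per_value
        if regs_per_value == 1:
            writes.append((address, value & mask))
        else:
            # The lower address holds the remaining upper bits,
            # the next address holds the lower data_width bits.
            writes.append((address, (value >> data_width) & mask))
            writes.append((address + 1, value & mask))
    return writes


# System Data Width 16, Accumulator Bit Width 24, base address 1000, two channels.
for address, data in register_writes([0xABCDEF, 0x123456], 1000, 16, 24):
    print(address, hex(data))
```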
In summary:
When DSP Builder writes to the phase increment and inversion memory map registers
(in write mode), the new value takes effect immediately.
If the application is a super-rate operation (like direct RF DUC) and multiple channels
in the NCO are configured for a new center frequency, first configure the phase
increment value for each channel. DSP Builder then synchronizes the phase offsets of
all channels at the same time by asserting the sync pulse.
To minimize the duration of disruption, you can use two banks of phase increment
registers. Switch to the new bank of phase increment registers first. Then apply
the sync pulse to synchronize the new phase offsets.
You can use the Avalon-MM interface to access (read or write) the phase increment
memory registers in the same way as for a single bank with the register address for
the ith bank frequencies starting from:
You can use the bank input as the index to switch the generated sinusoidal waves to
the specified set (bank) of predefined frequencies.
Note: Ensure you constrain the bank input to the range (0 .. <number of banks> – 1). You
can expect unreliable outputs from the NCO block if the bank input exceeds the
number of banks.
When using an Avalon-MM interface to access (read or write) the phase increment
memory registers, ensure that you write only to the inactive banks (banks that are
not equal to the index specified by the bank input port). The dual-port memory that
the NCO block uses is in DONT_CARE mode when reading and writing to the same
address. The NCO block uses the active bank to read the phase increment value.
Writing to the active bank may cause unreliable values to be read out, so the NCO block
may output unexpected sinusoidal signals.
The read data from the address to which you write new values may also be
unreliable because of the memory type that the NCO block uses. Only use read data
from banks that are not being written at the same time.
Expected SFDR The SFDR in decibels relative to the carrier (dBc): (Output Data Type Width) × 20 × log10(2).
Accumulator precision Accumulator precision in Hz: 10^6 × (output rate) / 2^(accumulator width in bits + 1).
Frequency Frequency in MHz: (output rate) × (phase increment and inversion) / 2^(accumulator width in bits).
# outputs per cycle The number of outputs per cycle is the width of the vector of output signals: physical channels
out = ceil(length(phase increment and inversion) / ((system clock frequency) / (output rate)))
log2 of look-up table The number of address bits in the internal look-up tables.
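The reported values above can be reproduced with a short Python sketch. The function name nco_summary and the dictionary keys are hypothetical; the expressions follow the table rows directly, with rates in MSPS and MHz.

```python
import math


def nco_summary(output_rate_msps: float, clock_mhz: float, acc_bits: int,
                out_bits: int, increments: list) -> dict:
    """Evaluate the expressions listed in the table above."""
    return {
        # Expected SFDR in dBc: output width * 20 * log10(2)
        "sfdr_dbc": out_bits * 20 * math.log10(2),
        # Accumulator precision in Hz: 10^6 * output rate / 2^(acc_bits + 1)
        "precision_hz": 1e6 * output_rate_msps / 2 ** (acc_bits + 1),
        # Frequency in MHz for each phase increment
        "freq_mhz": [output_rate_msps * p / 2 ** acc_bits for p in increments],
        # Number of outputs per cycle (width of the output vector)
        "outputs_per_cycle": math.ceil(len(increments) / (clock_mhz / output_rate_msps)),
    }


summary = nco_summary(80.0, 240.0, 24, 16, [2097152] * 4)
for name, value in summary.items():
    print(name, value)
```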
chan Input Indicates the channel. If v is high, chan indicates which channel the data corresponds to.
phase Input Specifies the phase offset. The size of this port should match the wire count of the NCO. The
number of sines/cosines per cycle is limited to 1–16 outputs. Use multiple NCO blocks if more
outputs are required.
sync Input Specifies the phase synchronization. The size of this port should match the wire count of the
NCO output. When asserted, the phase offsets of all channels synchronize to the phase inputs.
This signal has no effect on the phase increment and inversion registers. When you use this
signal, you may need to initialize the offsets upon system power-up or reset. The number of
sines/cosines per cycle is limited to 1–16 outputs. Use multiple NCO blocks if more outputs are
required.
bank Input This input is available when you specify a matrix of predefined vectors for the phase increment
values. You can use this input to switch to the bank of predefined frequencies.
data Input The data port has unsigned integers with a width equal to the width of the accumulator plus two
for the inversion bits.
address Input Only available when you turn on Expose Avalon-MM Slave in Simulink. The address port is
the same width as the system address width that you configure in the DSP Builder ➤ Avalon
Interfaces ➤ Avalon MM Slave menu. Also the base address is the same.
sin Output The sine data output from the block. If you request more channels than can fit on a single bus,
this signal is a vector. The width in bits is a function of the input width in bits and the
parameterization.
cos Output The cosine data output from the block. If you request more channels than can fit on a single
bus, this signal is a vector. The width in bits is a function of the input width in bits and the
parameterization. The number of sines/cosines per cycle is limited to 1–16 outputs. Use multiple
NCO blocks if more outputs are required.
b Output Indicates the bank that the output signals use. This output is available when you specify a
matrix of predefined vectors for the phase increment values.
readdata Output The data port has unsigned integers with a width equal to the width of the accumulator plus two
for the inversion bits.
12.1.11. Real Mixer (Mixer)
The Mixer block multiplies a real input stream by a synchronized complex data
stream, sample by sample.
You can use the Mixer block in a digital down converter for a radio system or a
general purpose DSP application. The data has fixed-point types, and the output is the
implied full precision fixed-point type.
Note: You can easily replicate the Mixer block with a Multiply block that takes one real and
one complex input within a primitive subsystem.
The system specification, including such factors as the channel count and sample
rates, determines the main parameters for this block. The input sample rate of the
block determines the number of channels present on each input wire and the number
of wires.
For example, a sample rate of 60 MSPS and a system clock rate of 240 MHz allow four
samples (240 / 60 = 4) to be time-division multiplexed (TDM) onto each input wire.
If there are more channels than TDM slots available on a wire, the input wire is a
vector of sufficient width to hold all the samples. Similarly, the number of frequencies
(the number of complex numbers) determines the width of the sine and cosine inputs.
The number of results that the Mixer produces is the product of the number of input
samples and the number of frequencies. The results are time-division multiplexed onto the
i and q outputs in a similar way to the inputs.
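The TDM arithmetic above can be sketched as follows. This is an illustration under stated assumptions: the number of slots per wire is taken as the clock rate divided by the sample rate (rounded down), and the wire count is the channel count divided by the slots (rounded up). The function name tdm_layout is hypothetical, and sample rates above the clock rate are not covered.

```python
import math


def tdm_layout(channels: int, sample_rate_msps: float, clock_mhz: float):
    """Return (TDM slots per wire, number of wires) for a channel count."""
    slots = int(clock_mhz // sample_rate_msps)
    if slots < 1:
        raise ValueError("sample rate above the clock rate is not modelled here")
    # If there are more channels than slots on one wire, the input becomes a vector.
    wires = math.ceil(channels / slots)
    return slots, wires


print(tdm_layout(4, 60.0, 240.0))   # four channels share one wire
print(tdm_layout(6, 60.0, 240.0))   # six channels need a second wire
```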
Input Rate Per Channel (MSPS) The data rate per channel measured in millions of samples per second.
a Input The real data input to the block. If you request more channels than can fit on a single bus, this signal
is a vector. The width in bits is inherited from the input wire.
v Input Indicates the validity of the data input signals. If v is high, the data on the a wire is valid.
c Input Indicates the channel of the data input signals. If v is high, c indicates the data channel.
sin Input The imaginary part of the complex number. For example, the NCO's sine output.
cos Input The real part of the complex number. For example, the NCO’s cosine output.
i Output The in-phase (real) output of the mixer, which is (a × cos). If you request more channels than can fit
on a single bus, this signal is a vector. The width in bits is wide enough for the full precision result.
q Output The quadrature phase (imaginary) output of the mixer, which is (a × sin). If you request more
channels than can fit on a single bus, this signal is a vector. The width in bits is wide enough for the
full precision result.
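The i and q definitions in the port table amount to two multiplications per sample. A minimal floating-point sketch (the function name mix is hypothetical, and fixed-point widths are not modelled):

```python
def mix(a, cos_vals, sin_vals):
    """Return (i, q) pairs with i = a * cos and q = a * sin, sample by sample."""
    return [(x * c, x * s) for x, c, s in zip(a, cos_vals, sin_vals)]


# Two real samples mixed with two (cos, sin) pairs, for example from an NCO.
print(mix([1.0, 0.5], [0.0, 1.0], [1.0, 0.0]))
```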
12.1.12. Scale
The Scale block selects part of a wide input word, performs various types of rounding,
saturation and fixed-point scaling, and produces an output of specified precision.
By default, DSP Builder preserves the binary point so that the fixed-point
interpretation of the result has the same value, subject to rounding, as the fixed-point
interpretation of the input.
You can dynamically perform additional scaling, by specifying a variable number of bits
to shift, allowing you to introduce any power of two gain.
Note: Always use Scale blocks to change data types in preference to Convert blocks,
because they vectorize and automatically balance the delays with the corresponding
valid and channel signals.
The Scale block provides scaling in addition to rounding and saturation to help you
manage bit growth. The basic functional modules of a Scale block are shifts followed
by rounding and saturation. The multiplication factor (default is 1) is a constant scale
to apply to the input.
The number of bits to shift left allows you to select the most meaningful bits of a wide
word, and discard unused MSBs. You can specify the number of shifts as a scalar or a
vector. If you specify the number of shifts as a vector, the block uses the shift input port
to decide which value to use. The shift input signal selects which gain to use cycle-
by-cycle.
In a multichannel design, changing the shift value cycle-by-cycle allows you to use a
different scaling factor for different channels.
A positive value of Number of bits to shift left indicates that the MSBs are
discarded, and the Scale block introduces a gain to the input. A negative value
means that zeros (or copies of the sign bit for signed data) are padded to the MSBs of
the input data signal, and the output signal is attenuated.
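The shift behaviour described above can be approximated with an integer-domain model. This is only a sketch under assumptions: it ignores the rounding stage, treats data as signed integers, and saturates to the full two's complement range because the saturation methods are not listed in this text. The function name scale is hypothetical.

```python
def scale(x: int, shifts, shift_index: int, out_bits: int):
    """Shift x by the selected entry of shifts, then saturate to out_bits (signed)."""
    shift = shifts[shift_index]          # zero-based selection, like the shift input
    # Positive shift: gain of 2^shift. Negative shift: attenuation (sign-extending).
    shifted = x << shift if shift >= 0 else x >> -shift
    low, high = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    saturated = min(max(shifted, low), high)
    return saturated, saturated != shifted   # second value flags saturation


gains = [0, 2, -1]                     # one shift value per selectable gain
print(scale(100, gains, 1, 16))        # gain of 4, fits in 16 bits
print(scale(20000, gains, 1, 16))      # gain of 4, saturates
```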
Output data type The type of the result. For example: sfix(16), uint(8).
Output scaling value The scaling of the result if the result type is fixed-point. For example: 2^-15.
Rounding method Specifies one of the following three rounding methods for discarding the least significant bits
(LSBs):
• Truncate: truncates the least significant bits. Has the lowest hardware usage, but
introduces the worst bias.
• Biased: rounds up if the discarded bits are 0.5 or above.
• Unbiased: rounds up if the discarded bits are greater than 0.5, and rounds to even if the
discarded bits equal 0.5.
Multiplication factor Modify the interpreted value by scaling it by this factor. This factor does not affect the
hardware generated for the Scale block, but merely affects the interpretation of the result. For
example: 1, 2, 3, 4, 8, 0.5.
Saturation method Specifies one of the following three saturation methods for discarding the most significant bits
(MSBs):
Number of bits to shift left A scalar or a vector that determines the gain of the result. A positive number indicates that the
Scale block introduces a gain to the input. A negative number means that the output signal is
attenuated. A vector of gains allows the shift input signal to select which gain to use
cycle-by-cycle. The value of the shift input performs zero-based indexing of the vector.
a Input The fixed-point data input to the block. If you request more channels than can fit onto a single bus,
this signal is a vector. The width in bits is inherited from the input wire.
a_v Input Indicates the validity of the data input signals. If a_v is high, the data on the a wire is valid.
a_chan Input Indicates the channel of the data input signals. If a_v is high, a_chan indicates to which channel
the data corresponds.
shift Input Indicates which element of the zero-based shift vector to use.
q Output The scaled fixed-point data output from the block. If you request more channels than can fit onto a
single bus, this signal is a vector. The width in bits is calculated as a function of the input width in
bits and the parameterization.
q_exp Output Indicates whether the output sample has saturated or overflowed.
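The three rounding methods listed for the Rounding method parameter can be compared with a small integer sketch. It assumes two's complement integers and interprets "0.5" as half of the weight of the least significant retained bit; the function name discard_lsbs is hypothetical.

```python
def discard_lsbs(x: int, n: int, method: str) -> int:
    """Drop the n least significant bits of x using the named rounding method."""
    if n == 0:
        return x
    kept = x >> n                       # floor of x / 2^n
    dropped = x & ((1 << n) - 1)        # value of the discarded bits
    half = 1 << (n - 1)                 # discarded bits equal to exactly 0.5
    if method == "truncate":
        return kept
    if method == "biased":
        return kept + (1 if dropped >= half else 0)
    if method == "unbiased":
        if dropped > half or (dropped == half and kept & 1):
            return kept + 1             # round up, or round a tie to even
        return kept
    raise ValueError("unknown rounding method")


# 10 / 4 = 2.5 and 14 / 4 = 3.5 are both ties.
for value in (10, 14):
    print(value, [discard_lsbs(value, 2, m) for m in ("truncate", "biased", "unbiased")])
```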
After you run a simulation, DSP Builder updates the help pages with specific
information about each instance of a block.
Written on Tue Feb 19 11:25:27 2008 Date and time when this file ran.
Number of physical buses: 4 Depending on the input data rate, the number of data wires needed to carry the
input data may be more than 1.
Calculated bit width of output stage: 16 The width in bits of the (vectorized) data output.
Port interface table Lists the port interfaces to the Scale block.
You can use the SingleRateFIR block in a digital up converter for a radio system or a
general purpose DSP application. The coefficients and input data are fixed-point types,
and the output is the implied full precision fixed-point type. You can reduce the
precision by using a separate Scale block, which can perform rounding and saturation
to provide the required output precision.
The SingleRateFIR block supports sample rates from 1 to 500, coefficient width in
bits from 2 to 32 bits, half-band and L-band Nyquist filters, real and complex filters,
and symmetry and anti(negative)-symmetry.
Input rate per channel Specifies the sampling frequency of the input data per channel measured in millions of samples
per second (MSPS).
Symmetry You can select Symmetrical or Anti-Symmetrical coefficients. Symmetrical coefficients can
result in hardware resource savings over the asymmetrical version.
Coefficients You can specify the filter coefficients using a Simulink fixed-point object fi(0). The data type of
the fixed-point object determines the width and format of the coefficients. The length of the array
determines the length of the filter. For example, fi(fir1(49, 0.3),1,18,19)
Base address You can memory map the filter's coefficients into the address space of the system. This field
determines the starting address for the coefficients. It is specified as a MATLAB double type
(decimal integer) but you can use a MATLAB expression to specify a hexadecimal or octal type if
required.
Read/Write mode You can allow Read, Write, or Read/Write access from the system interface. Turn on
Constant, to map coefficients to the system address space.
Expose Avalon-MM Slave in Simulink Allows you to reconfigure coefficients without Platform Designer. It also allows you to
reprogram multiple FIR filters simultaneously. Turn on to show the Avalon-MM inputs and outputs as normal
ports in Simulink. The Read/Write mode decides the valid subset of Avalon-MM slave ports that
appear on the block. If you select Constant, the block shows no Avalon-MM ports.
Channel mapping Enter parameters as a MATLAB 2D array for a reconfigurable FIR filter. Each row represents a
mode; each entry in a row represents the channel input on that time slot. For example, in
[0,0,0,0;0,1,2,3] the first element of the second row is 0, which means DSP Builder
processes channel 0 on the first cycle when the FIR is set to mode 1.
a Input The fixed-point data input to the block. If you request more channels than can fit on a single
bus, this signal is a vector. The width in bits is inherited from the input wire.
v Input Indicates validity of the data input signals. If v is high, the data on the a wire is valid.
c Input Indicates the channel of the data input signals. If v is high, c indicates the channel to which the
data corresponds.
b Input Indicates multibank filter. This input appears when you add a second filter definition to the
Coefficients parameter in the parameters dialog box.
q Output The fixed-point filtered data output from the block. If you request more channels than can fit on
a single bus, this signal is a vector. The width in bits is a function of the input width in bits and
the parameterization.
v Output Indicates the validity of data output signals. The output data can be non-zero when v is low.
c Output Indicates the channel of the data output signals. The output data can be non-zero when v is low.
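To illustrate how the Channel mapping parameter is read, the following Python sketch looks up the channel for a given mode and time slot. The names CHANNEL_MAP and channel_on_slot are hypothetical, modes and slots are assumed to be zero-based as in the example for that parameter, and wrapping the cycle count with a modulo is an assumption.

```python
# Python equivalent of the MATLAB array [0,0,0,0;0,1,2,3]: one row per mode.
CHANNEL_MAP = [
    [0, 0, 0, 0],   # mode 0: channel 0 on every time slot
    [0, 1, 2, 3],   # mode 1: channels 0 to 3 on successive time slots
]


def channel_on_slot(mapping, mode: int, cycle: int) -> int:
    """Return the channel processed on a given cycle for the selected mode."""
    row = mapping[mode]
    return row[cycle % len(row)]   # assumption: the pattern repeats every row length


print(channel_on_slot(CHANNEL_MAP, 1, 0))   # first cycle in mode 1
print(channel_on_slot(CHANNEL_MAP, 1, 3))   # fourth cycle in mode 1
```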
Related Information
• DSP Builder FIR and CIC Filters on page 243
• DSP Builder FIR Filters on page 246
The value of the sample delay may depend on the latency of a referenced model; refer
to the SynthesisInfo block.
Number of Data Signals Specify the number of input and output (d and q) connections for the block.
DSP Builder passes each input to the corresponding output and delays it by the latency constraint.
Latency Constraint This option allows you to select the type of constraint and to specify its value. The value can be a
workspace variable or an expression but must evaluate to a positive integer.
You can select the following types of constraint:
• >: Greater than
• >=: Greater than or equal to
• =: Equal to
• <=: Less than or equal to
• <: Less than
Select either + or - and type in a reference model in the text field. Specify the reference as a
Simulink path string, for example ‘design/topLevel/model’. DSP Builder then makes the latency
depend on that model; by default, the latency depends on no model.
Local Reset-Minimization Turn on to allow DSP Builder to apply reset minimization to the delays. You must also turn on
Global Reset Minimization.
The values are:
• Off. Default, no reset minimization.
• On. DSP Builder applies no reset to any delay stage.
A single synthesis time parameter specifies the length N of the fast Fourier transform.
The bit reversal that this block applies is appropriate only for transform sizes that are
an integer power of two. The block is single-buffered to support full streaming
operation with minimal overhead.
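For reference, the bit-reversed ordering for a power-of-two transform length N can be sketched in Python as follows. This shows the standard index permutation only; it does not model the block's buffering, and the function names are hypothetical.

```python
def reverse_bits(index: int, bits: int) -> int:
    """Reverse the lowest 'bits' bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reverse_indices(n: int) -> list:
    """Return the bit-reversed index order for a transform of length n."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be an integer power of two")
    bits = n.bit_length() - 1
    return [reverse_bits(i, bits) for i in range(n)]


print(bit_reverse_indices(8))
```

Applying the permutation twice returns the natural order, which is why the same operation converts in either direction.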
size Input Unsigned integer. Logarithm of the current input frame size. VariableBitReverse only.
qsize Output Unsigned integer. Logarithm of the current output frame size. VariableBitReverse only.
The following blocks are in the Primitives FFT Design Elements library:
• FFT_Light
• VFFT_Light
For floating-point FFTs, select either correct or faithful rounding. Correct rounding
corresponds to the normal IEEE semantics; faithful rounding delivers less accurate
results but requires less logic to implement.
The FFT block provides a full radix-2^2 streaming FFT or IFFT. Use the FFT block for
fixed-point or floating-point data. The block is a scheduled subsystem.
The FFT blocks all support block-based flow control. You must supply all the input data
required for a single FFT iteration (one block) on consecutive clock cycles, but an
arbitrarily large (or small) gap can exist between consecutive blocks. The
BitReverseCoreC and Transpose blocks produce data in blocks that respect this
protocol.
You may provide the input data to any of these blocks in either natural or bit-reversed
order; the output result is in bit-reversed or natural order, respectively.
The VFFT block provides a variable-size streaming FFT or IFFT. For these blocks, you
statically specify the largest and smallest FFT that the block handles. You can
dynamically configure the number of points processed in each FFT iteration using the
size signal.
Use the VFFT block for fixed-point or floating-point data. The VFFT block is a
scheduled subsystem and implements v (valid) and c (channel) signals.
The VFFT_Light block is a light-weight variant of the VFFT block. It is not a scheduled
subsystem, and it does not implement the c (channel) signal. Instead, it provides an
output g signal, which pulses high at the start of each output block.
The VFFT blocks all support block-based flow control. You must supply all the input
data required for a single VFFT iteration (one block) on consecutive clock cycles. If
two successive FFT iterations use the same FFT size, the inter-block gap
can be as small (or as large) as you like.
However, if you want to reconfigure the VFFT block between FFT iterations, you must
use the following rules:
• The size input should always be in the range minSize <= size <= maxSize.
• The size input must be kept constant while the VFFT block processes an FFT
iteration.
• When you reconfigure the VFFT, you must completely flush the VFFT pipeline before
changing the value of the size input. You must wait at least 2^oldSize cycles (where
oldSize is the previous value of the size input) before providing valid input to the
VFFT.
Note: The VariableBitReverse block also requires an inter-block gap of 2^oldSize cycles when
you reconfigure its size. If you use both the VariableBitReverse block and the VFFT
block, you need to provide an inter-block gap of 2*(2^oldSize) cycles to allow both blocks
to reconfigure successfully.
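The reconfiguration rules above reduce to a range check and a minimum gap. A minimal Python sketch, with hypothetical function names, where size values are the logarithms described for the size input:

```python
def size_is_valid(size: int, min_size: int, max_size: int) -> bool:
    """The size input must stay in the range minSize <= size <= maxSize."""
    return min_size <= size <= max_size


def min_reconfig_gap(old_size: int, with_variable_bit_reverse: bool = False) -> int:
    """Minimum idle cycles before valid input after changing the size input."""
    gap = 1 << old_size                      # 2^oldSize cycles for the VFFT alone
    # With a VariableBitReverse block as well, the note above asks for 2*(2^oldSize).
    return 2 * gap if with_variable_bit_reverse else gap


# Switching away from a 1024-point FFT (oldSize = 10).
print(min_reconfig_gap(10))
print(min_reconfig_gap(10, with_variable_bit_reverse=True))
```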
Number of interleaved subchannels Enter how many FFTs DSP Builder interleaves in each block.
maxSize The logarithm of the maximum FFT size. VFFT and VFFT_Light only.
minSize The logarithm of the minimum FFT size. VFFT and VFFT_Light only.
Use faithful rounding true if the block uses faithful (rather than correct) rounding for floating-point operations.
Fixed-point FFTs ignore this parameter.
c Input Unsigned 8-bit integer. Channel input signal. FFT and VFFT only.
size Input Unsigned integer. Logarithm of the current FFT size. VFFT and VFFT_Light only.
d Input Any complex fixed-point. Complex data input signal. VFFT and VFFT_Light only.
qc Output Unsigned 8-bit integer. Channel output signal. FFT and VFFT only.
Related Information
About Pruning and Twiddle for FFT Blocks on page 292
This library also provides blocks that you can use to simulate the bus interface in the
Simulink environment.
Note: Do not turn on Bit Accurate Simulation when your design includes Memory-
Mapped library blocks, otherwise the simulation is all zeros.
Note: When you use the BusSlave block in a design, DSP Builder disables all Avalon-MM
interface pipelined reads for the whole design.
13. Interfaces Library
HB_DSPB_ADV | 2019.04.01
You must provide logic to generate any appropriate response connected to the rd and
rv inputs, which then returns over the processor interface. You must also add your
own decoding logic to work with this block.
Note: All signals connected to the BusSlave block are within the bus clock domain. You
must implement appropriate clock-crossing logic (such as a DualMem block).
Memory Name Specifies the memory region. Can be an expression but must evaluate to an integer
address.
Read/Write Mode Specifies the mode of the memory as viewed from the processor:
• Read: processor can only read over specified address range.
• Write: processor can only write over specified address range.
• Read/Write: processor can read or write over specified address range.
• Constant: processor cannot access specified address range. This option continues
to reserve space in the memory map.
Evaluated Address Expression Displays the evaluated value of the Memory Name expression when you click Apply.
The BusStimulus block performs hidden accesses to the registers and SharedMem
blocks in the memory hierarchy of your model. It is an interface that allows another
block to read and write to any address. The address and data ports act as though
an external processor reads and writes to your system.
The BusStimulus block transmits data from its input ports (address, writedata
and write) over the processor interface, and thus modifies the internal state of the
memory-mapped registers and memories as appropriate. Any response from the
simulated processor interface is output on the readdata and readvalid output
ports.
For example, to use the BusStimulus block, connect constants to the address and
data inputs. A pulse on the write port then writes the data to any register mapped to
the specified address. Put a counter on the address input to provide all the data in
every memory location on the readdata port. DSP Builder asserts the
readdatavalid output when valid read data is on the readdata port.
Show read enable Turn on to show read enable port. If you use the BusStimulus with the BusStimulusFileReader blocks
in a design, ensure this parameter is turned on or turned off in both blocks.
The BusStimulusFileReader block reads a stimulus file (.stm) and generates signals
that match the BusStimulus block.
A bus stimulus file describes a sequence of transactions to occur over the processor
interface, together with expected read back values. This block reads such files and
produces outputs for each entry in the file.
Bus stimulus files automatically write to any blocks that have processor mapped
registers when you simulate a design. Any design with useful register files generates a
bus stimulus file that you can use to bring your design out of reset (all registers 0).
You can also write your own bus stimulus files with the following format:
or
where:
MemSpace specifies the memory space (the format supports multiple memory
spaces).
ExpReadData is the expected read data. The value that is read from a location is
checked against this value to allow self checking tests.
Mask specifies which bits are checked when the expected read data is compared: only the
bits in this mask are checked, which allows you to read, write, or check specified bits in a register.
The mask also masks the written data and performs a read-update-write cycle if you
write to only certain bits (that is, you do not overwrite all of them).
During simulation, any mismatch between the expected read data (as the bus
stimulus file describes) and the incoming read data (as the BusStimulus block
provides) is highlighted and DSP Builder issues a warning.
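The mask behaviour described above can be sketched with two bitwise helpers. This assumes that a 1 in the mask marks a bit that takes part in the write or the check; the function names masked_write and read_matches are hypothetical and do not come from the tool.

```python
def masked_write(current: int, data: int, mask: int) -> int:
    """Read-update-write: replace only the masked bits of the current register value."""
    return (current & ~mask) | (data & mask)


def read_matches(read_data: int, expected: int, mask: int) -> bool:
    """Self-check: compare read data with the expected value on the masked bits only."""
    return (read_data & mask) == (expected & mask)


register = masked_write(0xFFFF, 0x0012, 0x00FF)   # update the low byte only
print(hex(register))
print(read_matches(register, 0x0012, 0x00FF))     # upper byte is ignored by the check
```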
Enabled Turn on to enable reading of the bus stimulus file data. You must turn on Has read enable in the
BusStimulusFileReader block if you turn on Show read enable in the BusStimulus block.
Stimulus File Name Specifies the file from which to read bus stimulus data.
Log File Name Specifies the file to store a log of all attempted bus stimulus accesses.
Space Width Specifies the width of the memory space as described in the bus stimulus file—must be the same as
the width specified in the DSP Builder > Avalon Interfaces > Avalon MM Slave menu.
Addr Width Specifies the width of the address space as described in the bus stimulus file—must be the same as
the width specified in the DSP Builder > Avalon Interfaces > Avalon MM Slave menu.
Data Width Specifies the width of the data as described in the bus stimulus file—must be the same as the width
specified in the DSP Builder > Avalon Interfaces > Avalon MM Slave menu.
Has read enable Turn on to show read enable port. If you use the BusStimulusFileReader with the BusStimulus block
in a design, ensure this parameter is turned on or turned off in both blocks.
checkstrobe Output Boolean Indicates when the readexpected and mask signals
should be checked against readdata.
endofstimulus Output Boolean Generated signal to indicate when the end of the bus
stimulus file is reached.
readexpected Output 16-bit or 32-bit unsigned integer Expected read data from file.
Always add the External Memory block to the top-level of your DSP Builder design
(similar to Control or Signals blocks).
Your design can have several instances of these blocks, but you must give them
separate identifiers. DSP Builder creates a separate simulation model for each of these
blocks.
Identifier
Value: Numeric value.
Description: A unique identifier for the External Memory block. Set the same identifier on
the Memory Read or Memory Write blocks that you want to associate with this External
Memory block.
Avalon-MM Interface Data Width
Value: A valid Avalon-MM interface data width value. Should be a power of 2.
Description: The width of the data signal in the generated Avalon-MM master interfaces for
the associated Memory Read and Memory Write blocks. Set the data ports on these blocks
to the same width.
Memory Data Width
Value: Should be less than or equal to Avalon-MM Interface Data Width. The ratio between
these two widths should be a power of 2.
Description: The data width of the actual external memory. DSP Builder uses this value only
to calculate the size of the memory, which affects the width of the address bus. Set this
parameter to a quarter of the Avalon-MM Interface Data Width parameter to define DDR
memory operating at half rate.
Number of Rows, Number of Columns, Number of Banks
Value: Numeric value. Should be a power of 2.
Description: The number of rows, columns, and banks of the actual physical memory that you
connect to the DSP Builder design. Carefully choose access patterns based on these values
to get the best performance from the external memory.
Memory Size
Value: Read-only parameter.
Description: Displays the size of the external memory based on the specified number of
rows, columns, banks, and memory data width. DSP Builder uses the following equation:
memory_size = rows * columns * banks * memory_data_width
Signal Busy for Specified Amount of Simulation Time (%)
Value: off, 12.5, 25, 50, 75, 87.5.
Description: Any value other than off forces the external memory simulation model into a
busy state (the model refuses read or write requests) at random points during simulation.
The value limits the overall busy time as a percentage of the design simulation time.
The model indicates its busy state with a low value on the ready ports of the associated
Memory Read or Memory Write blocks.
If you enable this feature, you may need to increase the overall simulation time to get all
requests to external memory through. Higher limits require longer simulation times.
Show Diagnostic ports
Value: Boolean switch.
Description: Turn on this option to add diagnostic ports to External Memory blocks. These
ports display the state of the simulation model.
Dump Memory Region into File
Value: Boolean switch.
Description: Turn on so the External Memory block dumps its content for the specified
region into a file.
Each Avalon MM Interface Data Width value occupies a line in the file and is printed as a
sequence of 8-bit decimal values.
For External Memory blocks with Avalon MM Interface Data Width set to 16, the lines in
the dump file have the following format:
a[7:0] a1[15:8]
Dump File Name
Value: Valid file name.
Description: The name of the dump file with extension (DSP Builder does not add an
extension). The dump file is created in the current directory.
Dump Region Start Address
Value: Valid word address.
Description: The start address of the region in external memory to dump.
Dump Region Size
Value: Non-negative number.
Description: The number of words to dump, starting with the specified address.
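As a worked example of the Memory Size equation and the power-of-2 constraints listed above, the following Python sketch validates a set of External Memory parameters and computes memory_size. The function and argument names are hypothetical; only the equation and the constraints come from the parameter descriptions.

```python
def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def external_memory_size(rows, columns, banks, memory_data_width, avalon_data_width):
    """Check the documented constraints, then apply
    memory_size = rows * columns * banks * memory_data_width."""
    for name, value in (("rows", rows), ("columns", columns), ("banks", banks),
                        ("avalon_data_width", avalon_data_width)):
        if not is_power_of_two(value):
            raise ValueError(f"{name} should be a power of 2, got {value}")
    if memory_data_width > avalon_data_width:
        raise ValueError("memory data width must not exceed the Avalon-MM data width")
    if (avalon_data_width % memory_data_width
            or not is_power_of_two(avalon_data_width // memory_data_width)):
        raise ValueError("the ratio between the two widths should be a power of 2")
    return rows * columns * banks * memory_data_width


# Half-rate DDR style setup: memory width is a quarter of the Avalon-MM width.
size = external_memory_size(rows=2**14, columns=2**10, banks=8,
                            memory_data_width=16, avalon_data_width=64)
print(size)  # 2147483648
```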
This block is an access point for reading from the associated External Memory block.
It provides a simple interface with ready and valid based handshaking for reading. In
generated HDL, use this block as an adapter between the provided interface and the
Avalon-MM master interface. You can place these blocks at any level of hierarchy
under the DSP Builder device level block. The design can contain several of these
blocks, with each of the blocks accessing the associated External Memory block.
Identifier
Value: One of the identifiers set for External Memory blocks in the design.
Description: Set to match an identifier on one of the External Memory blocks in the design.
Maximum Burst Size
Value: off, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024.
Description: If the value is set to off, DSP Builder does not allow burst requests.
For other values, DSP Builder adds a new port to specify an actual size (less than or equal
to the specified Maximum Burst Size) for each burst request.
read Input Boolean Set this port to high to indicate a new read
request.
address Input Unsigned Integer Sets the address for the request.
The width of this port is: log2(memory_size),
memory_size is the size of associated External
Memory.
burstcount Input Unsigned Integer Optional. DSP Builder adds if Maximum Burst
Count is not off.
Sets the actual number of bursts for the read
request.
If you initiate a burst request, update this port
and the read and address ports once at the
beginning of request.
The width of this port is:
log2(max_burst_count) + 1
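The two port-width formulas above can be checked with a short Python sketch. It assumes that memory_size and the maximum burst count are powers of 2, which is what makes the log2 results whole numbers; the helper names are hypothetical.

```python
def log2_exact(value: int) -> int:
    """Return log2(value) for a power of 2; reject anything else."""
    if value <= 0 or value & (value - 1):
        raise ValueError(f"{value} is not a power of 2")
    return value.bit_length() - 1


def address_port_width(memory_size: int) -> int:
    # Width of the address port: log2(memory_size)
    return log2_exact(memory_size)


def burstcount_port_width(max_burst_count: int) -> int:
    # Width of the burstcount port: log2(max_burst_count) + 1
    return log2_exact(max_burst_count) + 1


print(address_port_width(2**31))   # 31
print(burstcount_port_width(16))   # 5
```

The extra bit on burstcount lets the port carry the maximum value itself (16 in this example needs 5 bits).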
This block is an access point for writing to the associated external memory model. It
provides a simple interface with ready and valid based handshaking for writing. In
generated HDL, this block is an adapter between the provided interface and the actual
Avalon-MM master interface. Place these blocks at any level of hierarchy under DSP
Builder device level block. The design can contain several of these blocks, with each of
the blocks accessing the associated External Memory block.
Identifier
Value: One of the identifiers set for External Memory blocks in the design.
Description: Set to match an identifier on one of the External Memory blocks in the design.
Byte Enables
Value: Boolean width.
Description: Activate this parameter to use byte enables for the write request.
If enabled, DSP Builder adds a separate port to provide byte enable values.
Maximum Burst Size
Value: off, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024.
Description: If the value is set to off, DSP Builder does not allow burst requests.
For any other value, DSP Builder adds a new port to specify an actual size (less than or
equal to the specified Maximum Burst Size) for each burst request.
If you initiate a burst write, External Memory blocks ignore subsequent addresses until
the burst is completed.
When a burst write is in progress, DSP Builder queues the read and write requests from
associated Memory Read and Memory Write blocks until the write burst is completed.
write Input Boolean Set this port to high to indicate new write
request to associated External Memory
blocks.
address Input Unsigned integer Sets the address for write request.
The width of this port is the Avalon MM
Interface Data Width parameter value on the
associated External Memory block.
byteenable Input Unsigned integer Optional. DSP Builder adds byteenable when
the Byte Enables parameter is on.
Sets the byte enables for write data.
The width of this port is:
data_port_width / 8
burstcount Input Unsigned integer Optional. DSP Builder adds burstcount when
the Maximum Burst Count parameter is not
off.
Sets the actual burst count for burst write
requests.
When you initiate a burst request, ensure you
update the address port and this port once at
the beginning of request. Update the write
port every time you update the data port to
supply the next portion of burst data. For
example, if you provide a new portion of data
every cycle, keep the write port high
throughout the burst.
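To illustrate the byteenable port, the following Python sketch computes the port width as data_port_width / 8 and applies a byte-enabled write to a stored word. It assumes the usual Avalon-MM convention that bit i of byteenable controls byte lane i (bits 8i+7 down to 8i); the function names are hypothetical and this is a behavioral model, not generated HDL.

```python
def byteenable_width(data_port_width: int) -> int:
    # Width of the byteenable port: data_port_width / 8
    return data_port_width // 8


def apply_byte_enables(old_word: int, new_word: int, byteenable: int,
                       data_port_width: int) -> int:
    """Overwrite only the byte lanes whose enable bit is set."""
    result = old_word
    for lane in range(byteenable_width(data_port_width)):
        if (byteenable >> lane) & 1:
            lane_mask = 0xFF << (8 * lane)
            result = (result & ~lane_mask) | (new_word & lane_mask)
    return result


print(byteenable_width(32))                                         # 4
print(hex(apply_byte_enables(0x11223344, 0xAABBCCDD, 0b0101, 32)))  # 0x11bb33dd
```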
Register Offset Specifies the address of the register. Must evaluate to an integer address.
Read/Write Mode Specifies the mode of the memory as viewed from the processor:
• Read: processor can only read over specified address range.
• Write: processor can only write over specified address range.
• Read/Write: processor can read or write over specified address range.
• Constant: processor cannot access specified address range. This option continues to reserve
space in the memory map.
Bit Specifies the bit location of the memory-mapped register in a processor word (allows different
registers to share same address).
Description Text describing the register. The description is propagated to the generated memory map.
Related Information
Avalon-MM Slave Settings (AvalonMMSlaveSettings) on page 234
Register Offset Specifies the address of the register. Must evaluate to an integer address.
Read/Write Mode Specifies the mode of the memory as viewed from the processor:
Parameter Description
Most Significant Bit Specifies the MSB of the memory-mapped register in a processor word (allows different registers to
share the same address). When multiple RegBit, RegOut, and RegField blocks specify the same
address, they refer to the same Avalon-MM register. To avoid conflicts, ensure that the ranges that
you specify do not overlap.
Least Significant Bit Specifies the LSB of the memory-mapped register in a processor word (allows different registers to
share the same address). When multiple RegBit, RegOut, and RegField blocks specify the same
address, they refer to the same Avalon-MM register. To avoid conflicts, ensure that the ranges that
you specify do not overlap.
Register Output Type Specifies the width and sign of the data type that the register stores. The size should equal (MSB –
LSB + 1).
Register Output Scale Specifies the scaling of the data type that the register stores. For example, 2^-15 for 15 of the above bits
as fractional bits.
Description Text describing the register. The description is propagated to the generated memory map.
Related Information
Avalon-MM Slave Settings (AvalonMMSlaveSettings) on page 234
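The relationship between the MSB, LSB, type size, and scale parameters can be sketched in Python as follows. The helper names and the example field layout are hypothetical; the sketch only models how bit ranges share one processor word, how to test ranges for overlap, and how a scale of 2^-15 maps a raw field to a real value.

```python
def field_width(msb: int, lsb: int) -> int:
    # The register output type size should equal MSB - LSB + 1.
    return msb - lsb + 1


def fields_overlap(a, b) -> bool:
    """a and b are (msb, lsb) pairs at the same register address."""
    return a[1] <= b[0] and b[1] <= a[0]


def extract_field(word: int, msb: int, lsb: int) -> int:
    return (word >> lsb) & ((1 << field_width(msb, lsb)) - 1)


def to_real(raw: int, width: int, frac_bits: int, signed: bool = True) -> float:
    """Interpret a raw field as fixed point with scaling 2^-frac_bits."""
    if signed and raw >= 1 << (width - 1):
        raw -= 1 << width
    return raw * 2.0 ** -frac_bits


gain, mode = (15, 0), (17, 16)           # two fields sharing one address
word = (0b10 << 16) | 0xC000
print(fields_overlap(gain, mode))        # False
print(extract_field(word, *mode))        # 2
print(to_real(extract_field(word, *gain), field_width(*gain), 15))  # -0.5
```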
Register Offset Specifies the address of the register. Must evaluate to an integer address.
Most Significant Bit Specifies the MSB of the memory-mapped register in a processor word (allows different registers to
share the same address). When multiple RegBit, RegOut, and RegField blocks specify the same
address, they refer to the same Avalon-MM register. To avoid conflicts, ensure that the ranges that
you specify do not overlap.
Least Significant Bit Specifies the LSB of the memory-mapped register in a processor word (allows different registers to
share the same address). When multiple RegBit, RegOut, and RegField blocks specify the same
address, they refer to the same Avalon-MM register. To avoid conflicts, ensure that the ranges that
you specify do not overlap.
Description Text describing the register. The description is propagated to the generated memory map.
The length of the Initial Data parameter, 1-D array, determines the size of the
memory. You can optionally initialize the generated HDL with this data.
Memory-Mapped Address Specifies the address of the memory block. Must evaluate to an integer address.
Enable bit slicing Turn on to allow multiple SharedMem blocks to occupy the same address range and each to take a
slice of the data bus. When you turn on this parameter, enter the most and least significant bits of the
bus that this SharedMem block connects to in the MSB and LSB parameters. When using this
feature, some restrictions apply to the SharedMem block:
• The bit-slice width must be equal to or less than the bus width (i.e. the SharedMem cannot be
asymmetric)
• The bit-slice of one SharedMem block cannot overlap the bit-slice of another
• The bit-slice must match the size of the data type specified in the Memory Output Type
parameter
• If SharedMem blocks share address ranges, their address ranges must overlap exactly
• Only other SharedMem blocks can share an address with a SharedMem block
• The SharedMem block must have an auto-generated address map (i.e. the Memory Mapped
Address parameter must be a scalar value)
Read/Write Mode Specifies the mode of the memory as viewed from the processor:
• Read: processor can only read over specified address range.
• Write: processor can only write over specified address range.
• Read/Write: processor can read or write over specified address range.
• Constant: processor cannot access specified address range. This option continues to reserve
space in the memory map.
Initial Data Specifies the initialization data. The size of the 1-D array determines the memory size.
Initialize Hardware Memory Blocks with Initial Data Contents Turn on when you want to initialize the generated HDL with the specified initial data.
Description Text describing the memory block. The description is propagated to the generated memory map.
Memory Output Type Specifies the data type that the memory block stores.
Memory Output Scale Specifies the scale factor to apply to the data stored in the memory block.
Intel Stratix 10 devices do not support all modes of memory operation and some
modes are performance limited. For more information, refer to the Intel Stratix 10
Embedded Memory User Guide.
In Intel Stratix 10 designs, Intel recommends you use a SharedMem block only for one-way
communication between internal and external Avalon-MM interfaces. Set Read/Write Mode
to either Read or Write; do not select Read/Write. On the internal side, either leave the
rd interface unconnected or drive we to constant zero. Do not both dynamically drive we
and use the rd output.
DSP Builder may duplicate your memory to provide support for up to one write with
two reads on Intel Stratix 10 devices. Reads on the bus and system side are from
separate copies of the memory and any writes are applied to both copies. DSP Builder
offers SharedMem support in true dual port memory configurations depending on the
constraints of the Intel Stratix 10 M20K block. SharedMem blocks have no support
for dual clocks (bus clock must run at system rate) and no support for mixed widths
(SharedMem data width must match bus width).
Related Information
• Avalon-MM Slave Settings (AvalonMMSlaveSettings) on page 234
• Intel Stratix 10 Embedded Memory User Guide
Related Information
• Modifying Avalon-ST Blocks on page 66
• Restrictions for DSP Builder Designs with Avalon-ST Interface Blocks on page 66
sink_data input The data (which may be, or include control data).
sink_ready output Indicates to upstream components that the DSPBA component can accept
sink_data on this rising clock edge.
sink_valid input Indicates that sink_data, sink_channel, sink_sop, and sink_eop are valid.
input_data output The data (which may be, or include control data).
input_ready input Indicates from the output of the DSP Builder component that it can accept
sink_data on this rising clock edge.
input_valid output Indicates that input_data, input_channel, input_sop, and input_eop are
valid.
source_data Output The data to be output (which may be, or include control data).
source_ready Input Indicates from downstream components that they can accept source_data on
this rising clock edge.
output_data input The output data (which may be, or include control data).
output_ready output Indicates from the output of the DSP Builder component that it can accept
sink_data on this rising clock edge.
The downstream system component may not accept data and so may back pressure
this block by forcing the Avalon-ST signal source_ready = 0. However, the design may
still have valid outputs in the pipeline. You must store these outputs in memory. DSP
Builder writes the output data for the design into a data FIFO buffer, and writes the
Avalon-ST channel, sop, and eop signals into their respective FIFO buffers.
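The back-pressure behavior can be illustrated with a small cycle-based Python model. This is a sketch under simplifying assumptions: the separate data, channel, sop, and eop FIFO buffers are collapsed into one queue of tuples, the queue is unbounded, and an entry written in a cycle may also leave in that same cycle. The function name and the tuple layout are hypothetical.

```python
from collections import deque


def run_output_fifo(design_outputs, source_ready):
    """design_outputs[i] is None or a (data, channel, sop, eop) tuple produced in
    cycle i; source_ready[i] is the downstream ready signal in cycle i."""
    fifo = deque()
    sent = []
    for output, ready in zip(design_outputs, source_ready):
        if output is not None:
            fifo.append(output)           # design output is always captured
        if ready and fifo:
            sent.append(fifo.popleft())   # forwarded only when downstream is ready
    return sent, list(fifo)


outputs = [(10, 0, True, False), (11, 0, False, False), (12, 0, False, True), None, None]
ready = [False, False, True, True, True]
sent, pending = run_output_fifo(outputs, ready)
print([beat[0] for beat in sent])  # [10, 11, 12]
print(pending)                     # []
```

No output is lost while source_ready is low; the samples leave later in their original order.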
These modes of operation engage with type propagation, and provide a convenient
automatic method for generating repeated design elements to operate on all the data
elements within vector and complex signals.
Blocks automatically determine whether the data they process is in scalar or vector
format and operate accordingly.
The hardware elements that these processes generate fully incorporate into the
optimization schemes available within DSP Builder advanced blockset.
This mode provides a convenient way to generate a uniform array to handle each
element of data in a vector signal, without having to manually instantiate multiple
blocks.
Internally, DSP Builder generates identical block instantiations, one for each element
in the vector signal. The vector width propagates through the Simulink system.
When you use a scalar value with vectors, DSP Builder uses a copy of the single scalar
value with each data element in the vector signal.
This behavior is analogous to the scalar expansion that occurs with Simulink blocks.
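The vector behavior described above, including scalar expansion, can be modeled in Python as one operation instance per vector element. This is a behavioral sketch only; vector_apply and expand are hypothetical helper names, not DSP Builder functions.

```python
def expand(value, width):
    """Copy a scalar to every element; pass a vector of the right width through."""
    if isinstance(value, (list, tuple)):
        if len(value) != width:
            raise ValueError("vector widths do not match")
        return list(value)
    return [value] * width


def vector_apply(op, *inputs):
    """Apply op once per element, as if one block instance existed per element."""
    widths = {len(v) for v in inputs if isinstance(v, (list, tuple))}
    if len(widths) > 1:
        raise ValueError("vector widths do not match")
    width = widths.pop() if widths else 1
    columns = [expand(v, width) for v in inputs]
    return [op(*args) for args in zip(*columns)]


print(vector_apply(lambda a, b: a + b, [1, 2, 3, 4], 10))  # [11, 12, 13, 14]
print(vector_apply(lambda a, b: a * b, [1, 2], [3, 4]))    # [3, 8]
```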
The outputs of these blocks are potentially a function of any or all of the inputs. Vector
width does not necessarily propagate.
For each complex value, two identical block instantiations generate internally, for the
real and imaginary components.
The complex nature of the data propagates. Strictly real signals expand to provide a
value for the imaginary component with complex data. The exact behavior depends on
the nature of the port associated with the real signal. The real value is duplicated for
control or address signals. The real and imaginary parts of complex data are subject
to identical control signals. For real data signals in a complex data context, DSP Builder
generates a zero imaginary value: real data values, x, expand, when required, to x + 0i.
Not all Primitive library blocks support complex data. Data signals are the only signal
type permitted to be complex. DSP Builder issues an error message if an attempt is
made to drive control or address signals with complex values.
The following elements of the Simulink environment are available for use with the
primitive blocks
• Simulink Complex to Real-Imag and Real-Imag to Complex blocks may
manipulate complex signals within DSP Builder advanced blockset designs.
• Simulink Scope blocks can display signals, but they do not directly support
complex data. Attempting to view complex data generates a type propagation
error.
Simulink automatically converts complex values of form (x + 0i) to real values, which
can cause type propagation errors. The complex() function can resolve this problem.
For more information about the radix-2^2 algorithm, refer to A New Approach to
Pipeline FFT Processor – Shousheng He & Mats Torkelson, Department of Applied
Electronics, Lund University, Sweden.
For example:
dspba.fft.full_wordgrowth(true,false,2,fixdt(1,16,15),fixdt(1,18,17))
An FFT with 2^N points has N radix-2 stages and (conceptually) N–1 twiddle multipliers.
In practice, DSP Builder optimizes away many of the twiddle multipliers. However,
they still need entries in the twiddle specification.
The twiddle and pruning specification for this FFT consists of an (N–1)×3 array (N–1
rows with 3 entries in each row) of strings that specify these types. DSP Builder uses
strings because Simulink does not pass raw types into the Simulink GUI.
DSP Builder provides three utility functions to generate twiddle and pruning
specifications, each of which implements a different pruning strategy:
• dspba.fft.full_wordgrowth(complexFFT,radix2,N,input_type,twiddle_type)
• dspba.fft.mild_pruning(complexFFT,radix2,N,input_type,twiddle_type)
• dspba.fft.prune_to_width(maxWidth,complexFFT,radix2,N,input_type,twiddle_type)
In addition, DSP Builder provides a fourth function for floating-point FFTs (where no
pruning is required)
• dspba.fft.all_float(N, float_type)
This function generates a pruning specification where the input, twiddle and output
types are all float_type.
The dspba.fft.mild_pruning() function grows the datapath by one bit for every two
radix-2 FFT stages.
Intel provides these built-in strategies only for your convenience. If you need a
different pruning strategy, you can define and use your own pruning function (or just
construct the pruning or twiddle array manually).
Each of these utility functions generates an array in the appropriate format (N–1 rows,
each containing three entries).
In each case:
• complexFFT is a Boolean number (usually true) that indicates whether the FFT's
input is complex.
• radix2 is a Boolean number (usually false) that indicates whether the FFT can
have two consecutive twiddle stages.
• N is an integer indicating the number of radix-2 stages in the FFT. For example, 10
for a 1,024-point FFT.
• input_type is the type of the input signal.
• twiddle_type is the type of the twiddle constants.
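To make the array format concrete, the following Python sketch builds an (N–1)×3 array of type strings for a hypothetical custom pruning strategy. It is not a reimplementation of any dspba.fft function. Several points are assumptions made only for illustration: that each row holds an input type, a twiddle type, and an output type in that order; that the strings use fixdt(signed, word_length, fraction_length) notation; and that growth is applied as one extra integer bit after every second row.

```python
def custom_pruning_spec(n_stages: int, input_word: int, input_frac: int,
                        twiddle_type: str):
    """Build an (N-1) x 3 list of type strings (illustrative layout only)."""
    def fixdt(word: int, frac: int) -> str:
        return f"fixdt(1,{word},{frac})"

    rows = []
    for k in range(n_stages - 1):
        in_word = input_word + k // 2          # bits grown before this row
        out_word = input_word + (k + 1) // 2   # one more bit after every second row
        rows.append([fixdt(in_word, input_frac), twiddle_type,
                     fixdt(out_word, input_frac)])
    return rows


for row in custom_pruning_spec(4, 16, 15, "fixdt(1,18,17)"):
    print(row)
```

For a 16-point FFT (N = 4) this produces three rows of three strings each, which matches the (N–1)×3 shape the text describes.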
Use the BitVectorCombine block to recombine scalars that the SplitScalar block
splits.
The BFU_long block corresponds to a classical radix-2^2 butterfly I block plus its
associated feedback path.
The BFU_short block has exactly the same functionality, but it uses only one floating-
point adder. It uses twice as many memory resources as the BFU_long block, but
also uses considerably fewer logic resources.
Each BFU block performs a two-point FFT pass over a block of data of size 2^N (where N
is a compile-time parameter).
During the first 2^(N–1) cycles, the control signal, s, is 0. During this time, the BFU
block stores the first half of the input block.
During the second 2^(N–1) cycles, s is 1. During this time, the BFU block reads the
second half of the input block and produces the first result of each of 2^(N–1) two-point
FFTs on the output.
During the third 2^(N–1) cycles, s is 0 again. During this time, the BFU block produces the
second result of each of the 2^(N–1) two-point FFTs, while simultaneously storing the
first half of the next input block.
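The three phases above match the behavior of a butterfly with a feedback delay line. The following Python sketch models that timing with integer samples. It is a behavioral illustration under the assumption that the feedback path is a delay line of half the block length that starts out holding zeros; it does not model the floating-point datapath or latency of the BFU blocks, and the function name is hypothetical.

```python
from collections import deque


def bfu_pass(samples, n):
    """Two-point FFT pass over blocks of 2**n samples with a feedback delay line."""
    half = 1 << (n - 1)
    delay = deque([0] * half)            # feedback path, assumed to start at zero
    outputs = []
    for cycle, x in enumerate(samples):
        s = (cycle // half) & 1          # 0 for half a block, then 1 for the other half
        stored = delay.popleft()
        if s == 0:
            outputs.append(stored)       # second results of the previous block
            delay.append(x)              # store the first half of this block
        else:
            outputs.append(stored + x)   # first result of each two-point FFT
            delay.append(stored - x)     # keep the second result for later
    return outputs


# One block [1, 2, 3, 4] followed by two flush samples.
print(bfu_pass([1, 2, 3, 4, 0, 0], n=2))  # [0, 0, 4, 6, -2, -2]
```

The sums (1+3, 2+4) appear while s is 1, and the differences (1-3, 2-4) appear during the following s = 0 phase, while the next block is being stored.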
s Input Boolean. Control pin. Drive with external logic. Ensure it is 0 for 2^(N-1)
cycles and 1 for the next 2^(N-1) cycles.
You should parameterize this block with the incoming data type to ensure that DSP
Builder maintains the necessary data precision. At the output, DSP Builder applies an
additional bit of growth.
The s port connects to the control logic. This control logic is the extraction of the
appropriate bit of a modulo N counter. The value of s determines the signal routing of
each sample and the mathematical combination with other samples.
Input scaling exponent Specifies the fixed-point scaling factor of the input.
x2 Input Complex fixed-point data-type determined by Complex data input from previous stage.
parameterization
z1 Output Complex fixed-point data-type determined by Complex data output to next stage.
parameterization
You should parameterize this block with the incoming data type to ensure that DSP
Builder maintains the necessary data precision. At the output, DSP Builder applies an
additional bit of growth.
The s port connects to the control logic. This control logic is the extraction of the
appropriate bit of a modulo N counter. The value of s determines the signal routing of
each sample and the mathematical combination with other samples. The t port also
connects to the control logic, but the extracted bit is different from the s port. The
value of t determines whether an additional multiplication by –j occurs inside the
butterfly unit.
IFFT Specifies that the design uses the BFIIC block in an IFFT.
Input scaling exponent Specifies the exponent part of the input scaling factor (2^-exponent).
Allow output bitwidth growth Specifies that the output is one bit wider than the input.
x2 Input Complex fixed-point data-type determined by Complex input from previous stage.
parameterization
You specify the bits that occur in the output signal by providing a vector of non-
negative integers. Each integer specifies an input bit that appears in the output. The block
numbers the input bits from 0 (least significant bit) and lists the output bits starting
from the least significant bit (little-endian ordering).
The block has no restriction on how many times each input bit may appear in the
output. You can omit, reorder, or duplicate bits.
For example, the vector [0,1,2,4,4,6,5] keeps bits 0, 1, and 2 unchanged, omits bit 3,
duplicates bit 4, and swaps the positions of bits 5 and 6.
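The bit-selection rule can be sketched in Python as follows. The function name select_bits is hypothetical, and the seven-entry vector used here is chosen only to show omission, duplication, and reordering in one call; entry k of the vector names the input bit that drives output bit k.

```python
def select_bits(value: int, bit_vector) -> int:
    """Build the output word: output bit k takes input bit bit_vector[k]."""
    out = 0
    for out_pos, in_pos in enumerate(bit_vector):
        out |= ((value >> in_pos) & 1) << out_pos
    return out


value = 0b0110100                       # input bits 2, 4, and 5 are set
result = select_bits(value, [0, 1, 2, 4, 4, 6, 5])
print(bin(result))                      # 0b1011100
```

Input bit 3 never reaches the output, input bit 4 appears at output positions 3 and 4, and input bits 5 and 6 exchange places.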
This block uses an efficient dual-port architecture to minimize the size of the internal
lookup table while supporting the generation of two complex twiddle factors per clock
cycle. The block provides k1 and k2 at the input and they must be less than or equal
to a synthesis time parameter N. Enter the width in bits and fixed-point scaling of the
twiddle factors.
A cosine/sine wave has a range of [-1:1], so you must provide at least two integer
bits, and as many fractional bits as are appropriate. A good starting point is a twiddle
width in bits of 16 bits (enter 16 as the Precision), and a scaling of 2^-14 (enter 14
as the Scaling exponent). The resulting fixed-point type is sfix16_en14 (2.14 in
fixed-point format).
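The need for two integer bits can be checked numerically. The following Python sketch quantizes twiddle factors to a signed 16-bit type with 14 fractional bits and shows that the value +1.0 does not fit when only one integer bit is available. It assumes the FFT sign convention exp(-2*pi*i*k/N) and round-to-nearest quantization; the function names are hypothetical and do not describe the block's internal lookup table.

```python
import cmath
import math


def quantize(x: float, width: int = 16, frac_bits: int = 14) -> int:
    """Convert x to a signed fixed-point integer, rejecting out-of-range values."""
    scaled = round(x * (1 << frac_bits))
    lowest, highest = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lowest <= scaled <= highest:
        raise OverflowError(f"{x} does not fit in {width} bits with {frac_bits} fractional bits")
    return scaled


def twiddle_table(n_points: int, width: int = 16, frac_bits: int = 14):
    """Quantized (real, imag) pairs for k = 0 .. n_points - 1."""
    table = []
    for k in range(n_points):
        w = cmath.exp(-2j * math.pi * k / n_points)
        table.append((quantize(w.real, width, frac_bits),
                      quantize(w.imag, width, frac_bits)))
    return table


table = twiddle_table(8)
print(table[0])   # (16384, 0)
print(table[2])   # (0, -16384)

try:
    quantize(1.0, width=16, frac_bits=15)   # only one integer bit
except OverflowError as error:
    print("overflow:", error)
```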
Number of points (N) Specifies the number of points on the unit circle.
Twiddle scaling Specifies the fixed-point scaling factor of the complex twiddle factor.
exponent
Related Information
About Pruning and Twiddle for FFT Blocks on page 292
FFT type Specifies whether to generate twiddle factors for an FFT or an IFFT.
Twiddle type Specifies the floating-point type used for the twiddle factors.
Twiddle/pruning specification
Use faithful rounding True if the block uses faithful (rather than correct) rounding for floating-point operations.
Fixed-point FFTs ignore this parameter.
Related Information
About Pruning and Twiddle for FFT Blocks on page 292
Unlike the corresponding P blocks (FFT2P, FFT4P, etc), they implement both FFTs
and iFFTs and offer flexible ordering of the input and output wires.
Each block can also be internally parallelized to process several FFTs at once. For
example, if there are 16 wires, each FFT8P block can calculate two 8-point FFTs (by
specifying the number of spatial bits to be 4). With 32 wires, the same block can
calculate four 8-point FFTs (by specifying the number of spatial bits to be 5).
Twiddle/pruning specification
Use faithful rounding true if the block uses faithful (rather than correct) rounding for floating-point operations.
Fixed-point FFTs ignore this parameter.
Related Information
About Pruning and Twiddle for FFT Blocks on page 292
The GeneralTwiddle block generates its twiddle factors using the TwiddleRom
block; the GeneralMultTwiddle block uses the TwiddleMultRom block. The
GeneralMultTwiddle uses approximately twice as many DSP blocks as the
GeneralTwiddle block, but (for large FFTs) uses far fewer memory blocks.
Each data sample in the input stream has a unique address. The address consists of
the timeslot in which it arrived (tbits) concatenated with the number of the wire on
which it arrived (sbits). The sbits form the least significant part of the address; the
tbits form the most significant part.
Each data sample is multiplied by a twiddle factor. For an FFT, the twiddle factor is:
twiddle = exp(-2*pi*i*angle/K)
For an IFFT, the twiddle factor is:
twiddle = exp(2*pi*i*angle/K)
In both cases:
angle = X*Y
where X and Y depend on the position of that data sample in the input stream.
Obtain the value of X (or Y) by extracting user-specified bits from the address of the
data sample, and concatenating them.
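The address and twiddle calculation can be sketched in Python. This is an illustration under stated assumptions: the listed bit positions are concatenated with the first position as the least significant bit, and the inverse flag selects the positive-exponent form. The function names and the example bit selections are hypothetical.

```python
import cmath
import math


def sample_address(timeslot: int, wire: int, sbits: int) -> int:
    """Timeslot bits form the upper part of the address, wire bits the lower part."""
    return (timeslot << sbits) | wire


def extract_bits(address: int, positions) -> int:
    """Concatenate the selected address bits; positions[0] becomes the LSB."""
    value = 0
    for out_pos, bit in enumerate(positions):
        value |= ((address >> bit) & 1) << out_pos
    return value


def twiddle(address: int, x_bits, y_bits, k: int, inverse: bool = False) -> complex:
    angle = extract_bits(address, x_bits) * extract_bits(address, y_bits)
    sign = 1 if inverse else -1
    return cmath.exp(sign * 2j * math.pi * angle / k)


# 4 wires (sbits = 2); the sample arrives in timeslot 3 on wire 1.
address = sample_address(timeslot=3, wire=1, sbits=2)
print(address)                                   # 13
print(twiddle(address, [0, 1], [2, 3], k=16))    # X = 1, Y = 3, angle = 3
```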
sbits The number of spatial address bits i.e. log2(N) where there are N wires.
Use faithful rounding Use faithful rather than correct rounding. Only for floating-point twiddle types.
Number of spatial bits The number of spatial address bits i.e. log2(N) where there are N wires.
Size of the parallel section The number of radix-2 stages assigned to the parallel section in the surrounding
HybridVFFT (the GeneralVTwiddle links between the serial and parallel sections of the
HybridVFFT)
Use faithful rounding Use faithful rather than correct rounding. Only for floating-point twiddle types.
drop Input Unsigned integer The number of stages to drop. GeneralMultiVTwiddle only.
The hybrid implementation consists of an optional serial section (built using single-
wire streaming FFTs), an associated twiddle block, and a parallel section (implemented
using the PFFT_Pipe block).
You control the length of the serial section with a user-supplied parameter. For an FFT
with 2^N points that processes 2^M points per cycle, this parameter must be no greater
than N–M.
In general, the serial section is more space-efficient; the parallel section is more
multiplier-efficient. So changing the value of this parameter provides a trade-off
between DSP usage and memory usage.
The HybridVFFT serial section absorbs all the variability and the size of the parallel
section is fixed. The variable-size hybrid FFT includes multiple variable-size streaming
FFTs, a variable-size GeneralTwiddle and a parallel FFT.
minsize The minimum FFT size is 2^minsize and is limited by the value of sbits M. It cannot be
smaller than 2^sbits (2^M). HybridVFFT only.
Number of serial stages Length of the serial section (in radix-2 stages).
Twiddle/pruning specification -
Optimize twiddle memory usage true to use GeneralMultTwiddle (rather than GeneralTwiddle) for top-level twiddle.
Use faithful rounding true if the block uses faithful (rather than correct) rounding for floating-point operations.
Fixed-point FFTs ignore this parameter.
Table 116. Port Interface for the Hybrid_FFT and HybridVFFT Blocks
Signal Direction Type Description
size Input Unsigned integer FFT serial section size, which must be at least equal to the
difference between maxsize and minsize.
Related Information
About Pruning and Twiddle for FFT Blocks on page 292
Each element in the block has a logical address, which DSP Builder forms by
concatenating its spatial address (wire number) with its temporal address (slot
number). The spatial address is the least-significant part of the logical address; the
temporal address is the most significant part. The block specifies the reordering as an
arbitrary permutation of the address bits. The block numbers the address bits from 0
(least significant). The block specifies the permutation by listing the address bits in
order, starting with the least significant.
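As a minimal sketch of this addressing scheme, the following Python model forms a logical address from a wire number and a slot number and splits it back again. The function names and the choice of two spatial bits are illustrative assumptions, not part of the block's interface.

```python
def logical_address(wire, slot, sbits):
    """Concatenate the temporal address (slot, MSBs) with the spatial address (wire, LSBs)."""
    assert 0 <= wire < (1 << sbits), "wire number must fit in sbits bits"
    return (slot << sbits) | wire


def split_address(addr, sbits):
    """Return (wire, slot) for a logical address."""
    return addr & ((1 << sbits) - 1), addr >> sbits


# Example with 4 wires (sbits = 2): element on wire 3 in time slot 5.
addr = logical_address(wire=3, slot=5, sbits=2)
print(addr, split_address(addr, 2))
```

With two spatial bits, the two least-significant bits of the logical address select the wire and the remaining bits select the slot.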
Address permutation A vector of integers that describes how to rearrange the block of data.
N The number of spatial address bits. The block has 2^N data wires.
The PFFT_Pipe block uses a pipeline of (small) fully-parallel FFTs, twiddle, and
transpose blocks. This FFT uses only a small number of DSP blocks but has a relatively
high latency (and associated memory usage).
Twiddle/pruning specification -
Use faithful rounding true if the block uses faithful (rather than correct) rounding for floating-point operations.
Fixed-point FFTs ignore this parameter.
Related Information
About Pruning and Twiddle for FFT Blocks on page 292
You specify the reordering as an arbitrary permutation of the address bits. The block
numbers the address bits from 0 (least significant). The block specifies the
permutation by listing the address bits in order, starting with the least significant.
[4 5 2 3 0 1] digit-reverse it (radix 4)
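The radix-4 digit-reverse example can be checked with a short Python model. This is a sketch that assumes bit i of the new address is taken from bit perm[i] of the old address; the function name is hypothetical. For a permutation that is its own inverse, such as this one, the opposite convention gives the same result.

```python
def permute_address(addr, perm):
    """Build a new address whose bit i is bit perm[i] of addr (bit 0 is the LSB)."""
    out = 0
    for i, src in enumerate(perm):
        out |= ((addr >> src) & 1) << i
    return out


digit_reverse_radix4 = [4, 5, 2, 3, 0, 1]

# Address 0b000111 has radix-4 digits (0, 1, 3), most significant first.
# Digit reversal gives digits (3, 1, 0), which is 0b110100.
print(bin(permute_address(0b000111, digit_reverse_radix4)))
```

Each pair of adjacent bits forms one radix-4 digit, so swapping bit pairs (0,1) and (4,5) while leaving (2,3) in place reverses the digit order of a 6-bit address.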
Address permutation A vector of integers that describes how to rearrange the block of data.
Width Width of the scalar (in bits), which is also the width of the output vector.
Table 129. Parameters for the FFT2, FFT4, VFFT2, and VFFT4 Blocks
Parameter Description
iFFT true for an iFFT, otherwise false. FFT4 and VFFT4 only.
Bit reversed input true for bit-reversed inputs. FFT4 and VFFT4 only.
Stages before this The number of stages to the left of this FFT.
Stages after this The number of stages to the right of this FFT.
Input type The type of the input signal. For example: fixdt(1,16,15).
Use faithful rounding true to use faithful rather than correct rounding. Fixed-point FFTs ignore this parameter.
Table 130. Port Interface for the FFT2, FFT4, VFFT2, and VFFT4 Blocks
Signal Direction Type Description
drop Input uint(k) for some k Total number of FFT stages to bypass.
qdrop Output uint(k) for some k Total number of FFT stages to bypass.
The TwiddleAngle block takes the output of the counter and splits it into three parts:
• The channel field (LSBs of the counter)
• The index field
• The pivot field (MSBs of the counter)
The TwiddleAngle block has an additional input: v, which keeps the internal state of
the TwiddleAngle block synchronized with the counter. This input should be identical
to the enable input to the counter.
Feed the input from a modulo-N counter (where N is an integer power of two); the block
generates the appropriate complex sequence at the output.
To parameterize this block, set the Counter bit width parameter to log2(N) and
enter the width in bits and fixed-point scaling of the twiddle factors. A cosine or sine
wave has a range of [-1, 1], therefore you must provide at least two integer bits, and
as many fractional bits as are appropriate. For example, start with a twiddle bit width of 16 bits
(enter 16 as the twiddle bit width) and a scaling of 2^-14 (enter 14 as the Twiddle
scaling exponent). The resulting fixed-point type is sfix16_en14 (2.14 fixed-point
format).
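The need for two integer bits can be illustrated with a small Python sketch that quantizes cosine and sine values to a signed fixed-point type. The function name, the saturating behavior, and the forward-FFT angle mapping are assumptions made for illustration; they are not taken from the block's implementation.

```python
import math


def quantize_twiddle(angle_index, counter_bits, width=16, frac_bits=14):
    """Return (cos, sin) as signed integers with frac_bits fractional bits."""
    n = 1 << counter_bits
    theta = -2.0 * math.pi * angle_index / n  # assumed forward-FFT sign convention
    scale = 1 << frac_bits
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1

    def q(v):
        # Round to nearest and saturate to the representable range (assumed behavior).
        return max(lo, min(hi, round(v * scale)))

    return q(math.cos(theta)), q(math.sin(theta))


# 2.14 format: +1.0 is exactly representable as 16384.
print(quantize_twiddle(0, 3, width=16, frac_bits=14))
# 1.15 format: +1.0 does not fit and saturates to 32767.
print(quantize_twiddle(0, 3, width=16, frac_bits=15))
```

With only one integer bit (the sign bit), the value +1.0 falls just outside the representable range, which is why the text recommends at least two integer bits.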
FFT type Specifies whether to generate twiddle factors for an FFT or an IFFT.
Twiddle scaling exponent value Specifies the fixed-point scaling factor of the complex twiddle factor.
Each twiddle block joins two FFTs (the left constituent FFT and the right constituent
FFT) to form a larger FFT. For variable-size FFTs, use the VTwiddle block. Each of the
constituent FFTs is either a primitive FFT (e.g. an FFT4 block) or is itself built from
multiple FFT and twiddle blocks.
The FFT that a Twiddle block forms may itself be part of an even larger FFT. In fact, the pipeline is
formed by linearizing a binary tree of FFTs (leaf nodes) and twiddle blocks (internal
nodes).
Stages before this The number of stages to the left(*) of this composite FFT.
Stages after this The number of stages to the right(*) of this composite FFT.
Input type The type to which DSP Builder should convert the input. Refers to the type of the input
after you apply an explicit type conversion. It doesn't have to exactly match the actual
input type to the Twiddle block.
Use faithful rounding Use faithful rather than correct rounding. Fixed-point FFTs ignore this parameter.
Note: For bit-reversed FFTs, reverse left and right, so left refers to the number of stages to
the right of the current block.
Table 136. Port Interface for the Twiddle and VTwiddle Blocks
Signal Direction Type Description
drop Input uint(k) for some k Total number of FFT stages to bypass
qdrop Output uint(k) for some k Total number of FFT stages to bypass
The TwiddleRom and TwiddleMultRom blocks construct FFTs. They map an angle
(specified as an unsigned integer) to a complex number (the twiddle factor). For an
FFT, the mapping is:
twiddle = exp(-2*pi*i*angle/N)
For an IFFT, the mapping is:
twiddle = exp(2*pi*i*angle/N)
where N = 2^anglewidth and anglewidth is the width of the angle input signal.
The TwiddleRom and TwiddleMultRom blocks have the same external interface but
different internal implementations. TwiddleRom uses a single large memory;
TwiddleMultRom uses two smaller memories and constructs the twiddle factors
using complex multiplication.
TwiddleMultRom consumes more DSP blocks but generally uses fewer memory
blocks than TwiddleRom. TwiddleMultRom also produces slightly less accurate
results than TwiddleRom.
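The trade-off between one large table and two smaller tables can be sketched in Python. This is an illustrative model only: the way TwiddleMultRom actually partitions the angle is not stated here, so the split into most-significant and least-significant angle bits, the function names, and the low_bits parameter are all assumptions.

```python
import cmath


def twiddle(angle, anglewidth, inverse=False):
    """Direct mapping: exp(-2*pi*i*angle/N) for an FFT, exp(+2*pi*i*angle/N) for an IFFT."""
    n = 1 << anglewidth
    sign = 1.0 if inverse else -1.0
    return cmath.exp(sign * 2j * cmath.pi * angle / n)


def twiddle_two_tables(angle, anglewidth, low_bits):
    """Forward-FFT twiddle built from a coarse table, a fine table, and one complex multiply."""
    n = 1 << anglewidth
    # Coarse table: one entry per value of the upper angle bits.
    coarse = [cmath.exp(-2j * cmath.pi * (k << low_bits) / n)
              for k in range(1 << (anglewidth - low_bits))]
    # Fine table: one entry per value of the lower angle bits.
    fine = [cmath.exp(-2j * cmath.pi * k / n) for k in range(1 << low_bits)]
    hi, lo = angle >> low_bits, angle & ((1 << low_bits) - 1)
    return coarse[hi] * fine[lo]


# An 8-bit angle needs 256 entries in one table, or 16 + 16 entries in two tables.
worst = max(abs(twiddle(a, 8) - twiddle_two_tables(a, 8, 4)) for a in range(256))
print(worst)
```

In floating point the two results agree to rounding error. In fixed point, both tables and the product are quantized, which is one plausible reason the multiplied result is slightly less accurate.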
Angle bit width The width of the angle input signal in bits.
Twiddle type The type of the twiddle output. For example: fixdt(1,18,17).
Table 138. Port Interface for the TwiddleRom and TwiddleMultRom Blocks
Signal Direction Type Description
q = abs(a)
Output data type Determines how the block sets its output data type:
mode • Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Inherit via internal rule with word growth: the number of fractional bits is the maximum of
the number of fractional bits in the input data types. The number of integer bits is the maximum of
the number of integer bits in the input data types plus one. This additional word growth allows for
subtracting the most negative number from 0, which exceeds the maximum positive number that
the number of bits of the input can store.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling Specifies the output scaling value. For example, 2^-15.
value
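The reason for the extra integer bit in the word-growth mode can be seen with a small Python model of an absolute-value operation on signed integers. The helper names are hypothetical, and wrapping on overflow is an assumption made for illustration; the overflow behavior of the actual block is not described here.

```python
def wrap_signed(value, width):
    """Wrap an integer into the signed two's-complement range of the given width."""
    value &= (1 << width) - 1
    return value - (1 << width) if value & (1 << (width - 1)) else value


def abs_model(a, width, word_growth):
    """Absolute value with or without one extra output bit."""
    out_width = width + 1 if word_growth else width
    return wrap_signed(abs(a), out_width)


# The most negative 8-bit value has no positive counterpart in 8 bits.
print(abs_model(-128, 8, word_growth=False))
print(abs_model(-128, 8, word_growth=True))
```

Without word growth, the magnitude of the most negative input does not fit in the output; with one additional integer bit, it does.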
r= acc(x, n)
The acc block allows accumulating data sets of variable lengths. The block indicates a
new data set by setting n high with the first element of the accumulation.
[Timing diagram: clk and the x input carrying two consecutive data sets, x0 x1 x2 followed by y0 y1 y2.]
The acc block has single and double-precision floating-point data inputs and outputs.
LSBA This parameter defines the weight of the accumulator's LSB, and therefore the accuracy of the
accumulation. This value and the maximum number of terms to be accumulated set the accuracy of
the accumulator. Accumulating N terms can invalidate the
log2(N) lower bits of the accumulator. For instance, if an accuracy of 2^(-30) is enough, and you add
1k numbers, LSBA = –30 – log2(1k), which is approximately –40.
MSBA The weight of the MSB of the accumulation result. Adding a few guard bits to the value has little
impact on the implementation size. You can set this parameter in one of the following ways:
• For a stock simulation, to limit the value of any stock to $100k before the simulation is invalid, use
a value of ceil(log_2(100K))~ceil(16.6)=17
• For a simulation where the implemented circuit adds numbers <=1, for one year, at 400MHz, use
ceil(log_2(365*86400*400*10^6))~54
maxMSBX The maximum weight of the inputs. When adding probabilities <=1 set this weight to 0. When adding
data from sensors, set bounds on the input ranges. Alternatively, set MaxMSBX = MSBA. However,
the size of the architecture may increase.
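The worked examples for LSBA and MSBA can be reproduced with a few lines of Python. The helper names are hypothetical, and rounding LSBA down to the next integer is an assumption (the text only says "approximately").

```python
import math


def lsba(target_accuracy_exp, max_terms):
    """LSB weight needed so that max_terms additions still leave 2^target_accuracy_exp accuracy."""
    return math.floor(target_accuracy_exp - math.log2(max_terms))


def msba(max_abs_result):
    """MSB weight needed to hold an accumulation result of the given magnitude."""
    return math.ceil(math.log2(max_abs_result))


print(lsba(-30, 1000))                     # accuracy 2^-30 over about 1k terms
print(msba(100_000))                       # stock value limited to $100k
print(msba(365 * 86400 * 400 * 10**6))     # values <= 1, one year at 400 MHz
```

These calls give -40, 17, and 54, matching the values quoted in the parameter descriptions.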
xO Output Boolean This flag goes high when the input value has a weight larger than the selected value for maxMSBX. The result of the accumulation is then invalid. Yes No
14.3.3. Add
The Add block outputs the sum of the inputs:
q=a+b
For two or more inputs, the Add block outputs the sum of the inputs:
q = a + b + ...
For a single vector input, the Add block outputs the sum of elements:
q = Σ a_n
For a single scalar input, the Add block outputs the input value:
q=a
Output data type Determines how the block sets its output data type:
mode
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Inherit via internal rule with word growth: the number of fractional bits is the maximum of
the number of fractional bits in the input data types. The number of integer bits is the maximum of
the number of integer bits in the input data types plus one. This additional word growth allows for
subtracting the most negative number from 0, which exceeds the maximum positive number that
you can store in the number of bits of the input.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Fused datapath This option affects the floating-point architectures. Turn on this option to save hardware by omitting
normalization stages between adder stages. The output then deviates from the result that IEEE
compliance requires.
Output scaling Specifies the output scaling value. For example, 2^-15.
value
Related Information
Forcing Soft Floating-point Data Types with the Advanced Options on page 232
q = s ? v : (a + b)
If the s input is low, output the sum of the first 2 inputs, a + b, else if s is high, then
output the value v.
Output data type Determines how the block sets its output data type:
mode • Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Inherit via internal rule with word growth: the number of fractional bits is the maximum of
the number of fractional bits in the input data types. The number of integer bits is the maximum of
the number of integer bits in the input data types plus one. This additional word growth allows for
subtracting the most negative number from 0, which exceeds the maximum positive number that
you can store in the number of bits of the input.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling Specifies the output scaling value. For example, 2^-15.
value
14.3.5. AddSub
The AddSub block produces either the sum (a + b) or the difference (a – b)
depending on the input you select (1 for add; 0 for subtract).
Note: For single-precision inputs and designs targeting any device with a floating-point DSP
block, the block uses a mixture of resources including the DSP blocks in floating-point
mode.
14.3.6. AddSubFused
The AddSubFused block produces both the sum and the difference of the IEEE
floating-point signals that arrive on the input ports.
If the number of inputs is set to 1, then the block outputs the logical AND of all the
individual bits of the input word.
Output data type Determines how the block sets its output data type:
mode • Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling Specifies the output scaling value. For example, 2^-15.
value
You can change the number of inputs on the BitCombine block according to your
requirements. When Boolean vectors are input on multiple ports, DSP Builder
combines corresponding components from each vector and outputs a vector of signals.
The widths of all input vectors must match. However, the widths of the signals arriving
on different inputs do not have to be equal. The one input BitCombine block is a
special case that concatenates all the components of the input vector, so that one wide
scalar signal is output. Use with logical operators to apply a 1-bit reducing operator to
Boolean vectors.
Output data type Determines how the block sets its output data type:
mode • Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling Specifies the output scaling value. For example, 2^-15.
value
q = (a >> LSB)
If bit position is a negative number, the bit position is an offset from the MSB instead
of LSB.
If the BitExtract block initialization parameter is a vector of LSB positions, the output
is a vector of matching width, even if the input is a scalar signal. Use this feature to
split a wide data line into a vector of Boolean signals. The components are in the same
order as you specify in the initialization parameter. If the input to the BitExtract
block is a vector, the width of any vector initialization parameter must match, and
then a different bit can be selected from each component in the vector. The output
data type does not always have to be Boolean. For example, setting the output type to uint8
provides a simple way to split one wide signal into a vector of unsigned 8-bit data
lines.
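The splitting behavior described above can be sketched in Python. This is a behavioral illustration with hypothetical function names; it models only non-negative bit positions and an unsigned output type, and it does not attempt to model negative (MSB-relative) positions.

```python
def bit_extract(a, lsb, out_width):
    """q = (a >> lsb), truncated to an unsigned output of out_width bits."""
    return (a >> lsb) & ((1 << out_width) - 1)


def split_bytes(a, n_bytes):
    """Vector of LSB positions [0, 8, 16, ...] with a uint8 output type."""
    return [bit_extract(a, 8 * k, 8) for k in range(n_bytes)]


def split_bits(a, positions):
    """Vector of LSB positions with a Boolean (1-bit) output type."""
    return [bit_extract(a, p, 1) for p in positions]


print([hex(b) for b in split_bytes(0xDEADBEEF, 4)])
print(split_bits(0b1010, [0, 1, 2, 3]))
```

The output components appear in the same order as the positions in the parameter vector, so listing positions from 0 upward yields the least-significant slice first.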
Output data type Determines how the block sets its output data type:
mode • Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling Specifies the output scaling value. For example, 2^-15.
value
Least Significant Specifies the bit position from the input word as the LSB in the output word.
Bit Position from
Input Word
s Comparison Operator
0 Less than.
2 Equal.
4 Greater than.
5 Not equal.
If a = x + yi,
then q = x - yi
q=a
Output data type Determines how the block sets its output data type:
mode • Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Inherit via internal rule with word growth: the number of fractional bits is the maximum of
the number of fractional bits in the input data types. The number of integer bits is the maximum of
the number of integer bits in the input data types plus one. This additional word growth allows for
subtracting the most negative number from 0, which exceeds the maximum positive number that
you can store in the number of bits of the input.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling Specifies the output scaling value. For example, 2^-15.
value
a == b
a >= b
a < b
a ~= b
Output data type Determines how the block sets its output data type:
mode • Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Inherit via back projection: a downstream block that this block drives determines the output
data type. If the driven block does not propagate a data type to the driver, you must use a
Simulink SameDT block to copy the required data type to the output wire.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
• Single: single-precision floating-point data.
• Double: double-precision floating-point data.
• Variable precision floating point: variable precision floating-point output type
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling Specifies the output scaling value. For example, 2^-15.
value
Value Specifies the constant value. This parameter may also be a fi object when specifying data of arbitrarily
high precision.
Warn when value Turn off if you intend the constant to saturate.
is saturated The Constant block generates a warning in the Simulink Diagnostic Viewer if the bit-width is not
sufficient to represent the value. For example:
Warning: Constant block 'constant_saturation_UUT/Const1' has saturated due to
insufficient bit-width. SUGGESTION: Increase the bit-width or disable this
warning in the block parameters.
To configure a Const, DualMem, or LUT with data of precision higher than IEEE
double precision, create a MATLAB fi object of the required precision that contains the
high precision data. Avoid truncation when creating this object. Use the fi object to
specify the Value of the Const, the Initial Contents of the DualMem block, or the
Output value map of the LUT block.
The Value parameter is a floating-point scaling factor that is multiplied by the input
signal. If this parameter is a vector, the output is a vector. If both the input and the
Value parameter are vectors, they must have the same length. If the Value
parameter is complex, the block performs a complex multiply and the output is
complex.
14.3.19. Convert
The Convert block performs a type conversion of the input, and outputs the new data
type.
You can optionally apply truncation, biased rounding, or unbiased rounding if the output data
type is smaller than the input. The LSB must be a bit position within the width in bits of the
input type.
Output data type Determines how the block sets its output data type:
mode • Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Inherit via back projection: a downstream block that this block drives determines the output
data type. If the driven block does not propagate a data type to the driver, you must use a
Simulink SameDT block to copy the required data type to the output wire.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected.
• Boolean: the output type is Boolean.
• Single: single-precision floating-point data.
• Double: double-precision floating-point data.
• Variable precision floating point: variable precision floating-point output type
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling Specifies the output scaling value. For example, 2^-15.
value
Saturation The Convert block allows saturation, which has an optional clip detect output that outputs 1 if any
clipping has occurred. Saturation choices are none, symmetric, or asymmetric.
For example, for an Add or Mult block, you can select the output word length and
fractional part using the dialog.
Specifying the output type is a casting operation, which does not preserve the
numerical value, only the underlying bits. This method never adds hardware to a block;
it just changes the interpretation of the output bits.
For example, a multiplier with both inputs of type sfix16_En15 has output
type sfix32_En30. If you select output format sfix32_En28, the output numerical
value is multiplied by four. For example, 1*1 input gives an output value of 4.
If you select output format sfix32_En31, the output numerical value is divided by
two. For example, 1*1 input gives an output value of 0.5.
If you want to change data-type format in a way that preserves the numerical value,
use a Convert block, which adds the corresponding hardware. Adding a Convert block
directly after a primitive block lets you specify the data type while preserving the
numerical value.
For example, a Mult block followed by a Convert block, with input values 1*1, always
gives output value 1.
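The difference between reinterpreting bits and converting a value can be shown numerically. The following Python sketch uses plain integers to stand for the raw bit patterns; the helper names are hypothetical, and 0.5 is used as the input because it is exactly representable in sfix16_En15.

```python
def to_raw(value, frac_bits):
    """Raw integer bit pattern of a fixed-point value with frac_bits fractional bits."""
    return round(value * (1 << frac_bits))


def from_raw(raw, frac_bits):
    """Numerical value of a raw bit pattern interpreted with frac_bits fractional bits."""
    return raw / (1 << frac_bits)


a = b = 0.5
raw_product = to_raw(a, 15) * to_raw(b, 15)   # bit pattern of the sfix32_En30 product

print(from_raw(raw_product, 30))   # natural type sfix32_En30: 0.25
print(from_raw(raw_product, 28))   # reinterpreted as sfix32_En28: value times four
print(from_raw(raw_product, 31))   # reinterpreted as sfix32_En31: value divided by two

# A value-preserving conversion to En28 changes the bits (drops two LSBs) instead.
converted_raw = raw_product >> 2
print(from_raw(converted_raw, 28))  # still 0.25
```

Reinterpretation leaves the integer bit pattern untouched and only moves the binary point, whereas the conversion shifts the bit pattern so that the value stays the same.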
14.3.20. CORDIC
The CORDIC block performs a coordinate rotation using the coordinate rotation digital
computer algorithm.
The CORDIC algorithm is generally faster than other approaches when you do not
want to use a hardware multiplier, or you want to minimize the number of gates
required. Alternatively, when a hardware multiplier is available, table-lookup and
power series methods are generally faster than CORDIC.
You can calculate this total gain in advance and store it in a table. Additionally:
The CORDIC block implements these iterative steps using a set of shift-add
algorithms to perform a coordinate rotation.
The CORDIC block takes four inputs, where the x and y inputs represent the (x, y)
coordinates of the input vector, the p input represents the angle input, and the v
input represents the mode of the CORDIC block. It supports the following modes:
• The first mode rotates the input vector by a specified angle.
• The second mode rotates the input vector to the x-axis while recording the angle
required to make that rotation.
The x and y inputs must have the same width in bits. The input width in bits of the x
and y inputs determines the number of stages (iterations) inside the CORDIC block,
unless you explicitly specify an output width in bits smaller than the input width in bits
in the block parameters.
The block does not compensate for the CORDIC gain, which saves time and resources. The width in bits of
the x and y inputs automatically grows by two bits inside the CORDIC block to
account for the gain factor of the CORDIC algorithm. Hence the x and y outputs
are two bits wider than the input and you must handle the extra two bits in your
design, if you have not specified the output width in bits explicitly through the block
parameters. You can compensate for the CORDIC gain outside the CORDIC block.
The p input is the angular value and has a range between –π and +π, which requires
at least three integer bits to fully represent the range. The v input determines the
mode. You can trade accuracy for size (and efficiency) by specifying a smaller output
data width to reduce the number of stages inside the CORDIC block.
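The rotation mode and the uncompensated gain can be illustrated with a floating-point Python sketch of the standard CORDIC iteration. This is not the block's implementation: the function names and the iteration count are assumptions, floating-point arithmetic stands in for the fixed-point shift-add hardware, and this basic form only converges for angles of magnitude up to roughly 1.74 radians.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by angle (radians). The result is scaled by the CORDIC gain."""
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # Each step uses only a shift (multiply by 2^-i) and an add or subtract.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return x, y


def cordic_gain(iterations=16):
    """Total gain accumulated over the given number of iterations."""
    g = 1.0
    for i in range(iterations):
        g *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return g


x, y = cordic_rotate(1.0, 0.0, math.pi / 4)
g = cordic_gain()
print(g)             # about 1.647
print(x / g, y / g)  # gain compensated outside the rotation, close to (0.7071, 0.7071)
```

Because the raw outputs are larger than the inputs by the gain factor, the outputs need headroom beyond the input width, and dividing by the precomputed gain afterwards recovers the rotated vector. Fewer iterations give a coarser angle resolution, which corresponds to trading accuracy for size.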
Output data type Determines how the block sets its output data type:
mode • Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields
that are available when this option is selected. This option reinterprets the output bit pattern
from the LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling Specifies the output scaling value. For example, 2^-15.
value
p Output Any fixed-point type Angle through which the coordinates rotate. Yes Yes
14.3.21. Counter
The Counter block maintains a counter and outputs the counter value each cycle.
The input is a counter enable and allows you to implement irregular counters. The
counter initializes to the value that you provide, and counts modulo the value you
provide, using the step size you provide:
count = _pre_initialization_value;
Note: If you create a counter with a preinitialization value of 0 and with a step of 1, it
outputs the value 1 (not 0) on its first enabled cycle. If you want the counter to output
0 on its first valid output, initialize with:
Note: The step size must divide exactly into the modulo value.
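The first-output behavior can be reproduced with a small Python model. This is a sketch under a stated assumption: on each enabled cycle the counter is updated to (count + step) mod modulo and the updated value is output. The class name and the choice of pre-initializing to modulo minus step are illustrative and follow from that assumed rule rather than from the text.

```python
class CounterModel:
    """Behavioral model of a modulo counter with an enable input (assumed update rule)."""

    def __init__(self, pre_init, modulo, step):
        self.count = pre_init
        self.modulo = modulo
        self.step = step

    def tick(self, enable):
        # Update only on enabled cycles, then output the current value.
        if enable:
            self.count = (self.count + self.step) % self.modulo
        return self.count


c = CounterModel(pre_init=0, modulo=8, step=1)
print([c.tick(True) for _ in range(4)])   # starts at 1, not 0

c = CounterModel(pre_init=8 - 1, modulo=8, step=1)
print([c.tick(True) for _ in range(4)])   # starts at 0
```

Under this model, a pre-initialization value of 0 produces 1 on the first enabled cycle, and a pre-initialization value of modulo minus step produces 0.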
Output data Determines how the block sets its output data type:
type mode • Inherit via internal rule: the number of integer and fractional bits is the
maximum of the number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using
additional fields that are available when this option is selected. This option
reinterprets the output bit pattern from the LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data Specifies the output data type. For example, sfix(16), uint(8).
type
Output scaling Specifies the output scaling value. For example, 2^-15.
value
Mode Determines what the block counts:
• Count Leading Zeroes: returns the count of the leading zeros in the input
• Count Leading Ones: returns the count of the leading ones in the input
• Count Leading Digits: returns the count of the leading sign digits in the input
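The three modes can be modeled in Python as follows. This is a sketch with hypothetical function names; in particular, whether the leading sign digit count includes the sign bit itself is an assumption (here it does).

```python
def count_leading(value, width, bit):
    """Count consecutive bits equal to `bit`, starting from the MSB of a width-bit word."""
    count = 0
    for pos in range(width - 1, -1, -1):
        if (value >> pos) & 1 != bit:
            break
        count += 1
    return count


def count_leading_zeros(value, width):
    return count_leading(value, width, 0)


def count_leading_ones(value, width):
    return count_leading(value, width, 1)


def count_leading_digits(value, width):
    """Count leading bits that match the sign bit (the MSB)."""
    sign = (value >> (width - 1)) & 1
    return count_leading(value, width, sign)


print(count_leading_zeros(0b00010110, 8))
print(count_leading_ones(0b11100000, 8))
print(count_leading_digits(0b11100000, 8))
```

Counting sign digits is useful for normalization, because it indicates how far a signed value can be shifted left without changing its sign.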
The behavior of read-during-write cycles of the memories depends on the interface from
which you read:
• Reading from q1 while writing to interface 1 outputs the new data on q1 (write
first behavior).
• Reading from q2 while writing to interface 1 outputs the old data on q2 (read first
behavior).
Turning on DONT_CARE may give a higher fMAX for your design, especially if you
implement the memory as an MLAB. When this option is on, the output is not
double-registered (and therefore, in the case of MLAB implementation, uses fewer external
registers), and you gain an extra half-cycle on the output. The words don't care
overlaid on the block symbol indicate that the current setting is DONT_CARE. The default
is off, which outputs old data for read-during-write.
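The two read-during-write behaviors listed above can be captured in a short Python model. This is a simplified behavioral sketch with hypothetical names: it ignores read latency, models only a write on interface 1, and assumes DONT_CARE is off.

```python
class DualMemModel:
    """Simplified read-during-write model: q1 is write-first, q2 is read-first."""

    def __init__(self, contents):
        self.mem = list(contents)

    def cycle(self, addr1, data1, write1, addr2):
        # q2 sees the data stored before this cycle's write (read first).
        q2 = self.mem[addr2]
        if write1:
            self.mem[addr1] = data1
        # q1 sees the newly written data (write first).
        q1 = self.mem[addr1]
        return q1, q2


m = DualMemModel([10, 20, 30, 40])
print(m.cycle(addr1=1, data1=99, write1=True, addr2=1))   # new data on q1, old data on q2
print(m.cycle(addr1=1, data1=0, write1=False, addr2=1))   # both ports now see the new data
```

When both interfaces address the same location during a write, q1 returns the value being written and q2 returns the previous contents; on the following cycle both return the new value.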
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional
fields that are available when this option is selected. This option reinterprets the output bit
pattern from the LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
Initial contents Specifies the initialization data. The size of the 1-D array determines the memory size. This
parameter may also be a fi object when specifying data of arbitrarily high precision.
Use DONT_CARE when reading from and writing to the same address
Turn this option on to produce faster hardware (a higher fMAX) but with uncertain read data in
hardware if you are simultaneously reading from and writing to the same address. Ensure that
you do not read from and write to the same address at the same time to guarantee valid read
data.
Intel Stratix 10 devices restrict permissible configurations of the DualMem block. You might
need to turn on this option for a valid configuration, if DSP Builder gives a warning. To avoid this
restriction, implement a simple dual-port RAM, with port 1 as write-only (e.g. connect q1 read
on port 1 to a Simulink terminator block), and port 2 as read-only (with separate addressing).
When you turn on this option and you have a write on one port, a read on another port, and
both have the same address, the read data is undefined. Simulink simulations represent these
undefined values as zeros; the ModelSim simulation shows Xs. This difference in representation
may cause simulation mismatches if you allow such undefined values to be generated. To
prevent simulation mismatches, either avoid generating accesses that cause undefined values
or detect the conditions of address equality and a write access at the input and do not propagate
that output.
Allow write on both ports
Before v15.0, you could read and write on the first port but only read on the second port.
From v15.0, you can read and write on both ports when you turn on this option.
You can specify the contents of the DualMem block in one of the following ways:
• Use a single row or column vector to specify table contents. The length of the 1D
row or column vector determines the number of addressable entries in the table.
If DSP Builder reads vector data from the table, all components of a given vector
share the same value.
• When a look-up table contains vector data, you can provide a matrix to specify the
table contents. The number of rows in the matrix determines the number of
addressable entries in the table. Each row specifies the vector contents of the
corresponding table entry. The number of columns must match the vector length,
otherwise DSP Builder issues an error.
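The two ways of specifying table contents can be sketched as a small helper. This is an illustration only: the function name expand_contents and the use of nested Python lists in place of MATLAB vectors and matrices are assumptions.

```python
def expand_contents(contents, vector_length):
    """Return one list of vector components per addressable entry.

    contents is either a flat list (the 1-D case, where every vector
    component of an entry shares the same value) or a list of rows
    (the matrix case, one row per addressable entry).
    """
    entries = []
    for row in contents:
        if isinstance(row, (list, tuple)):
            # Matrix case: the number of columns must match the vector length.
            if len(row) != vector_length:
                raise ValueError("number of columns must match the vector length")
            entries.append(list(row))
        else:
            # 1-D case: replicate the scalar across all vector components.
            entries.append([row] * vector_length)
    return entries


print(expand_contents([1, 2, 3], 2))         # [[1, 1], [2, 2], [3, 3]]
print(expand_contents([[1, 2], [3, 4]], 2))  # [[1, 2], [3, 4]]
```

The number of returned entries corresponds to the number of addressable locations in the table.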
To configure a Const, DualMem, or LUT with data of precision higher than IEEE
double precision, create a MATLAB fi object of the required precision that contains the
high precision data. Avoid truncation when creating this object. Use the fi object to
specify the Value of the Const, the Initial Contents of the DualMem block, or the
Output value map of the LUT block.
d Input Any fixed-point type Data to write for interface 1 Yes Yes
Note:
1. If the address for interface 1 exceeds the memory size, q1 is not defined. If the address for interface 2 exceeds the
memory size, q2 is not defined. To write to the same location as DSP Builder reads from with q2, you must provide the
same address on both interfaces.
Related Information
RAM Megafunction User Guide.
For more information about this option
Number of output channels A vector of the channel numbers that you want to see, for example [0 1 3].
Number of input channels The number of input channels. The block takes valid, channel, and (vector) data inputs. The
channel is the normal channel count, which varies from 0 to
NumberOfChannelsPerWire.
14.3.25. Divide
The Divide block outputs the first input, a, divided by the second input, b.
q = a/b
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: if the input data types are floating-point, the output data type
is the same floating-point data type. Mixing different precisions is not allowed.
If the input data types are fixed-point, the output data type is fixed-point with bitwidth
equal to the sum of the bitwidths of the input data types. The fraction width is equal to
the sum of the fraction width of the a-input data type and the integer bitwidth of the
b-input data type.
• Specify via dialog: you can set the output type of the block explicitly using additional
fields that are available when this option is selected. This option type casts the output to
the chosen fixed-point type. Attempting to type cast floating-point input is disallowed. You
can only use this option to trim bits off the least significant end of the output data type
that is otherwise inherited.
Output data type Specifies the output data type. For example, fixdt(1,16,15)
Output scaling value Specifies the output scaling value. For example, 2^-15.
Float point rounding This option only has an effect for floating-point inputs:
• Correct: the result is the correctly rounded IEEE result.
• Faithful: the result may be rounded either up or down.
Port Data type Integer bits Fraction bits
a sfix16_en10 6 10
b sfix12_en7 5 7
If you specify Specify via dialog for the Output data type mode, the block restricts
the allowed data types to: sfix28_en15, sfix27_en14, sfix26_en13, etc.
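The inherited output type for fixed-point inputs follows from the rule above and can be checked with a short sketch. The helper names are hypothetical; only the sfixW_enF naming and the arithmetic come from this section.

```python
import re


def parse_sfix(name):
    """Split a type name such as 'sfix16_en10' into (bitwidth, fraction width)."""
    match = re.fullmatch(r"sfix(\d+)_en(\d+)", name)
    if match is None:
        raise ValueError("expected a name of the form sfixW_enF")
    return int(match.group(1)), int(match.group(2))


def divide_output_type(a_type, b_type):
    """Apply the 'Inherit via internal rule' arithmetic for fixed-point inputs."""
    a_width, a_frac = parse_sfix(a_type)
    b_width, b_frac = parse_sfix(b_type)
    width = a_width + b_width            # sum of the input bitwidths
    frac = a_frac + (b_width - b_frac)   # a fraction width + b integer bitwidth
    return f"sfix{width}_en{frac}"


def trimmed_types(inherited, count):
    """List types obtained by trimming 0..count-1 bits off the least significant end."""
    width, frac = parse_sfix(inherited)
    return [f"sfix{width - k}_en{frac - k}" for k in range(count)]


inherited = divide_output_type("sfix16_en10", "sfix12_en7")
print(inherited)                    # sfix28_en15
print(trimmed_types(inherited, 3))  # ['sfix28_en15', 'sfix27_en14', 'sfix26_en13']
```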
14.3.26. Fanout
The Fanout block behaves like a wire, connecting its single input to one or more
outputs.
The number of outputs is one of the parameters of the Fanout block. Use a Fanout
block instead of a simple wire to hint to DSP Builder that the wire is
expected to be long. DSP Builder might ignore the hint (which amounts to
implementing the Fanout block as a simple wire), or might insert one or more
additional registers on the wire to improve the physical routability of the design. The
number of registers it inserts depends on the target device, target fMAX, and other
properties of your design.
Inserting a Fanout block does not change the behavior of your design. If DSP Builder
chooses to insert extra registers, it automatically adjusts the latency of any parallel
paths to preserve the original wire-like behavior.
By default, DSP Builder implements all Fanout blocks as simple wires on non-HyperFlex
devices. FFTs and FIRs (which both contain embedded Fanout blocks) retain the same
QoR characteristics as in DSP Builder v15.0 and earlier (which has no Fanout blocks).
To enable DSP Builder to choose different implementations for the Fanout blocks in
your design, specify DSPBA_Features.EnableFanoutBlocks = true; at the
MATLAB command line. This command increases the number of registers your design
uses, but potentially increases its fMAX.
You can specify that DSP Builder doesn't need to initialize any registers that it chooses
to insert. DSP Builder then inserts hyper-registers (instead of ordinary ALM registers)
on devices that support the HyperFlex architecture. Use this option for datapaths where
the initial value is unimportant, but avoid it for control paths.
Uninitialized Check box Specifies whether DSP Builder can use hyper registers.
When you apply automatic reset minimization, turn off Uninitialized,
which allows reset minimization to choose the correct behavior
automatically.
Turning on Uninitialized forces no reset.
14.3.27. FIFO
The FIFO block models a FIFO memory. DSP Builder writes data through the d input
when the write-enable input w is high. After some implementation-specific number of
cycles, DSP Builder presents data at output q and the valid output v goes high. DSP
Builder holds this data at output q until the read acknowledge input r is set high.
The FIFO block wraps the Intel single clock FIFO (SCFIFO) megafunction operating in
show-ahead mode. That is, the read input, r, is a read acknowledgement: it
indicates that the design has read the output data, q, from the FIFO buffer, so the FIFO
can delete it and show the next data output on q. The data presented on q is only valid
if the output valid signal, v, is high.
FIFO Setup A vector of three non-zero integers in the format: [<depth> <fill_threshold> <full_period>]
• depth specifies the maximum number of data values that the FIFO can store.
• fill_threshold specifies a low-threshold for empty-detection. If the number of data items in the
memory is greater than the low-threshold, the t output is 1 (otherwise it is 0).
• full_period specifies a high-threshold for full-detection. If the number of data items is greater than the
high-threshold, output f is 1 (otherwise it is 0).
If the input w or r is a vector, the FIFO setup parameter must be a three column
matrix with the number of rows equal to the number of components in the vector.
Each row in the matrix independently configures the depth, fill_threshold, and
full_period of the FIFO buffer for the corresponding vector component.
You can set fill_threshold to a low number (<3) and arrive at a state such that
output t is high and output v is low, because of differences in latency across different
pairs of ports—from w to v is 3 cycles, from r to t is 1 cycle, from w to t is 1 cycle. If
this situation arises, do not send a read acknowledgement to the FIFO buffer. Ensure
that when the v output is low, the r input is also low, otherwise a warning appears in
the MATLAB command window. If the read acknowledgement is derived from a
feedback from the t output, ensure that the fill_threshold is set to a sufficiently
high number (3 or above). Likewise for the f output and the full_period.
You may supply vector data to the d input, and vector data on the q output is the
result. DSP Builder does not support vector signals on the w or r inputs, and the
behavior is unspecified. The v, t, and f outputs are always scalar.
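The show-ahead behavior and the two thresholds can be sketched with a zero-latency model. This is an idealization: the real block has port-to-port latencies as described above, and the class name, the method names, and the choice to drop writes when the buffer is full are assumptions made for illustration.

```python
from collections import deque


class FifoModel:
    """Zero-latency sketch of a show-ahead FIFO with t and f threshold outputs."""

    def __init__(self, depth, fill_threshold, full_period):
        self.depth = depth
        self.fill_threshold = fill_threshold
        self.full_period = full_period
        self.items = deque()

    def outputs(self):
        """Return (q, v, t, f) for the current contents."""
        count = len(self.items)
        q = self.items[0] if count else None  # show-ahead: oldest word is on q
        v = count > 0
        t = count > self.fill_threshold       # more items than the low threshold
        f = count > self.full_period          # more items than the high threshold
        return q, v, t, f

    def cycle(self, d=None, w=False, r=False):
        """Apply one cycle of write-enable w and read acknowledge r."""
        if r and self.items:
            self.items.popleft()              # acknowledge: drop the word shown on q
        if w and len(self.items) < self.depth:
            self.items.append(d)              # assumption: writes to a full FIFO are dropped
        return self.outputs()


fifo = FifoModel(depth=8, fill_threshold=3, full_period=6)
for word in (10, 20, 30, 40):
    state = fifo.cycle(d=word, w=True)
print(state)               # (10, True, True, False)
print(fifo.cycle(r=True))  # (20, True, False, False)
```

Because the model has no latency, it cannot reproduce the state where t is high while v is low; that state arises only from the latency differences in the hardware.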
x Input Yes — —
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields
that are available when this option is selected. This option reinterprets the output bit pattern
from the LSB up according to the specified type.
• Boolean: the output type is Boolean.
Function Accumulate (fpAcc) or multiply accumulate (fpMultAcc). When you select fpMultAcc, the block
flushes denormalized numbers to zero on inputs and output; when you select fpAcc, the block
flushes subnormal numbers to zero on inputs and output.
x Input — — —
acc Input — — —
q Output — — —
14.3.30. ForLoop
The ForLoop block extends the basic loop, providing a more flexible structure that
implements all common loop structures—for example, triangular loops, parallel loops,
and sequential loops.
Each ForLoop block manages a single counter with a token-passing scheme that
allows you to link these counters in a variety of ways.
Each ForLoop block has a static loop test parameter, which may be <=, <, > or >=.
Loops that count up should use <= or <, depending on whether you consider the limit
value, supplied by the limit signal, to be within the range of the loop. Loops that count
down should use >= or >.
The latency of the ForLoop block is non-zero. When the block detects the end of a loop, some
cycles may be invalid; this overhead is required to build nested loop structures. The
second activation of an inner loop does not necessarily begin immediately after the
end of the first activation.
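The effect of the loop test on the set of counter values can be sketched in software. This sketch ignores latency and token passing entirely; the function name and the guard on the step direction are assumptions made for illustration.

```python
import operator

LOOP_TESTS = {"<=": operator.le, "<": operator.lt, ">=": operator.ge, ">": operator.gt}


def for_loop_counts(initial, step, limit, test):
    """Return the counter values for which the loop test passes."""
    compare = LOOP_TESTS[test]
    counting_up = test in ("<=", "<")
    # Guard added for this sketch so that the software loop always terminates.
    if (counting_up and step <= 0) or (not counting_up and step >= 0):
        raise ValueError("step direction does not match the loop test")
    counts = []
    count = initial
    while compare(count, limit):
        counts.append(count)
        count += step
    return counts


print(for_loop_counts(0, 1, 4, "<"))     # [0, 1, 2, 3]
print(for_loop_counts(0, 1, 4, "<="))    # [0, 1, 2, 3, 4]
print(for_loop_counts(4, -1, 0, ">="))   # [4, 3, 2, 1, 0]
print(for_loop_counts(5, 1, 4, "<"))     # [] (an empty loop)
```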
ls Input Boolean Yes No
bs Output Boolean Yes No
bd Input Boolean Yes No
ld Output Boolean Yes No
Token-passing inputs and outputs. The four signals ls (loop start), bs (body start), bd (body
done), and ld (loop done) pass a control token between different ForLoop blocks, to create a
variety of different control structures. When the ls port receives a token, the ForLoop block
initializes: the loop counter is set to its initial value (that the i signal specifies). When the
bd port receives a token, the step value (s) increments the loop counter. In either case, the new
value of the counter is compared with the limit value (l) using the statically configured loop test.
c Output Derived unsigned fixed-point type The signal c is the count output from the loop. Its
value is reliable only when the valid signal, v, is active. Yes No
e Input Boolean Use the enable input, e, to suspend and resume operation of the ForLoop block.
When you disable the loop, the valid signal, v, goes low but DSP Builder makes no changes to the
internal state of the block. When you re-enable the block, it resumes counting from the state at
which you suspended it. Yes No
el Output Boolean Yes No
fl Output Boolean Yes No
ll Output Boolean Yes No
Auxiliary loop outputs: the signals fl and ll are active on the first loop iteration and last loop
iteration, respectively. The signal el is active when the ForLoop block is processing an empty loop.
ldexp(a,b) outputs the first input, a, scaled by 2 raised to the power of the second input, b.
q = a × 2^b
The Function mask parameter selects either ldexp or ilogb. The number of input
ports on the block changes according to the number of operands.
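As a software reference for the two functions, the sketch below uses the Python standard library. It assumes that the block's ilogb follows the C standard library meaning (the unbiased binary exponent of the input), which this section does not spell out; special-value handling in hardware may differ.

```python
import math


def ldexp(a, b):
    """Scale a by 2 raised to the integer power b."""
    return math.ldexp(a, b)


def ilogb(x):
    """Return the unbiased binary exponent of a finite, non-zero x."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("ilogb sketch handles only finite, non-zero inputs")
    # math.frexp returns (m, e) with x == m * 2**e and 0.5 <= |m| < 1,
    # so the exponent of the 1.xxx form is e - 1.
    return math.frexp(x)[1] - 1


print(ldexp(1.5, 4))   # 24.0
print(ilogb(24.0))     # 4
```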
q = (a << b)
The width of the data type a determines the maximum size of the shift. Shifts of more
than the input word width result in an output of 0.
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
Internal registers hold the value, modulo, and step size of the counter. The values of
these registers on reset are parameters that you can set on the block. Additionally,
you can reload these registers with new values in-circuit by raising the ld signal high.
While ld is high, DSP Builder writes the values of the i, s, and m input signals into the
value, step, and modulo registers, respectively. The value of i passes through to the
counter output. When ld falls low again, the counter resumes its normal operation
starting from these new values.
If the initial or step values exceed the modulo value, the behavior is undefined. Using
signed step values increases logic usage in hardware.
Counter setup A vector that specifies the counter settings on reset in the
following format:
[<initial value> <modulo> <step size>]
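The interaction of the value, modulo, and step registers with the ld signal can be sketched as follows. Several details are assumptions that this section does not confirm: the counter output shows the current value and then advances by the step modulo the modulo value, and the first output after ld falls is the loaded value plus one step. The class name is hypothetical.

```python
class CounterModel:
    """Sketch of a modulo counter configured as [initial value, modulo, step size]."""

    def __init__(self, initial, modulo, step):
        # Values that exceed the modulo give undefined behavior in the block,
        # so this sketch rejects them.
        if not (0 <= initial < modulo and 0 <= step < modulo):
            raise ValueError("initial and step values must not exceed the modulo")
        self.value, self.modulo, self.step = initial, modulo, step

    def cycle(self, ld=False, i=0, m=1, s=0):
        """Advance one clock cycle and return the counter output."""
        if ld:
            # While ld is high the registers reload and i passes through.
            self.modulo, self.step = m, s
            self.value = (i + s) % m   # assumption: counting resumes one step on
            return i
        out = self.value
        self.value = (self.value + self.step) % self.modulo
        return out


counter = CounterModel(initial=0, modulo=5, step=2)
print([counter.cycle() for _ in range(6)])   # [0, 2, 4, 1, 3, 0]
print(counter.cycle(ld=True, i=1, m=4, s=1)) # 1
print([counter.cycle() for _ in range(3)])   # [2, 3, 0]
```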
q = LUT[a]
The size of the table determines the size of the initialization arrays.
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Inherit via back projection: a downstream block that this block drives determines the output
data type. If the driven block does not propagate a data type to the driver, you must use a
Simulink SameDT block to copy the required data type to the output wire.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
Output value map Specifies the location of the output values. For example, round([0;254]/17). This parameter may
also be a fi object when specifying data of arbitrarily high precision.
You can specify the contents of the LUT block in one of the following ways:
• Specify table contents with a single row or column vector. The length of the 1D
row or column vector determines the number of addressable entries in the table.
If DSP Builder reads vector data from the table, all components of a given vector
share the same value.
• When a look-up table contains vector data, you can provide a matrix to specify the
table contents. The number of rows in the matrix determines the number of
addressable entries in the table. Each row specifies the vector contents of the
corresponding table entry. The number of columns must match the vector length,
otherwise DSP Builder issues an error.
Note: The default initialization of the LUT is a row vector round([0:255]/17). This vector
is inconsistent with the default for the DualMem block, which is a column vector
[zeros(16, 1)]. The latter form is consistent with the new matrix initialization form in
which the number of rows determines the addressable size.
To configure a Const, DualMem, or LUT with data of precision higher than IEEE
double precision, create a MATLAB fi object of the required precision that contains the
high precision data. Avoid truncation when creating this object. Use the fi object to
specify the Value of the Const, the Initial Contents of the DualMem block, or the
Output value map of the LUT block.
14.3.35. Loop
The Loop block maintains a set of counters that implement the equivalent of a nested
for loop in software. The counted values range from 0 to limit values provided with an
input signal.
When the go signal is asserted on the g input, limit-values are read into the block with
the c input. The dimension of the vector determines the number of counters (nested
loops). When DSP Builder enables the block with the e input, it presents the counter
values as a vector value at the q output each cycle. The valid output is set to 1 to
indicate that a valid output is present.
There are vectors of flags indicating when first values (output f) and last values
(output l) occur.
A particular element in these vector outputs is set to 1 when the corresponding loop
counter is set at 0 or at count-1 respectively.
Use the Loop block to drive datapaths that operate on regular data either from an
input port or data stored in a memory. The enable input, and corresponding valid
output, facilitate forward flow control.
For a two-dimensional loop, the equivalent C++ code to describe the general loop is:
for (int i = 0; i < c[0]; i++) {
    for (int j = 0; j < c[1]; j++) {
        q[0] = i;
        q[1] = j;
        f[0] = (i==0);
        f[1] = (j==0);
        l[0] = (i==(c[0]-1));
        l[1] = (j==(c[1]-1));
    }
}
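The same behavior generalizes to any number of counters. The sketch below models only the sequence of q, f, and l values for a given limit vector c; it ignores the go, enable, and valid signals, assumes that the first vector element is the outermost counter, and uses a hypothetical function name.

```python
from itertools import product


def loop_outputs(c):
    """Yield (q, f, l) for every step of the nested counters with limits c."""
    # product() varies the last counter fastest, like the innermost loop.
    for counters in product(*(range(limit) for limit in c)):
        q = list(counters)
        f = [int(value == 0) for value in counters]
        l = [int(value == limit - 1) for value, limit in zip(counters, c)]
        yield q, f, l


steps = list(loop_outputs([2, 3]))
print(len(steps))   # 6
print(steps[0])     # ([0, 0], [1, 1], [0, 0])
print(steps[-1])    # ([1, 2], [0, 0], [1, 1])
```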
14.3.36. Math
The Math block applies a mathematical operation to its floating-point inputs and
outputs the floating-point result. A mask parameter popup menu selects the required
elementary mathematical function that DSP Builder applies.
expm1(x) exp(x) – 1.
inverse(x) The reciprocal of x. For Floating-point rounding, select either correct or faithful.
hypot(x,y) Hypotenuse of right-angled triangle with other two sides length x and y.
log1p(x) log(x+1).
Note:
1. For single-precision input and designs targeting any device with a floating-point DSP block, the block uses a mixture of
resources including the DSP blocks in floating-point mode. This implementation uses fewer ALMs at the expense of
more DSP blocks.
The Function mask parameter selects one of five elementary functions. The number
of input ports on the block changes as required by the semantics of the function that
you select:
• One-input functions: exp(x), exp2(x), exp10(x), expm1(x), log(x), log2(x),
log10(x), log1p(x), inverse(x)
• Two-input functions: hypot(x,y), mod(x,y), pow(x,y), powr(x,y)
• Three-input function: hypot3d(x,y,z)
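As a software reference for a few of these functions, the sketch below uses the Python standard library equivalents. It illustrates the general numerical reason that dedicated expm1 and log1p functions are useful for small x; it does not describe how the block implements them, and the variable names are hypothetical.

```python
import math

x = 1e-10

# Composing exp and a subtraction loses most significant digits for small x.
naive_expm1 = math.exp(x) - 1.0
accurate_expm1 = math.expm1(x)

# The same applies to log(x + 1) compared with log1p(x).
naive_log1p = math.log(1.0 + x)
accurate_log1p = math.log1p(x)

# hypot(x, y) and its three-input form (Python 3.8 or later).
side = math.hypot(3.0, 4.0)
diagonal = math.hypot(2.0, 3.0, 6.0)

print(naive_expm1, accurate_expm1)
print(naive_log1p, accurate_log1p)
print(side, diagonal)
```

For x = 1e-10, the dedicated functions agree with x to about 20 significant decimal places of absolute error, whereas the composed forms are off in roughly the eighth significant digit.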
maxmag Floating-point Outputs a if |a| > |b|, b if |b| > |a|, otherwise max(a,b).
minmag Floating-point Outputs a if |a| < |b|, b if |b| < |a|, otherwise min(a,b).
The Function mask parameter selects one of six bounding functions. The number of
input ports on the block changes as required by the semantics of the function that you
select:
• Two-input functions: max, min, maxmag, minmag, dim
• Three-input function: sat
The Output data type mode mask parameter applies only if the input is fixed-point
format.
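The maxmag and minmag definitions translate directly into code. This sketch follows the definitions given for these two functions for ordinary finite inputs; handling of NaN inputs is not covered here.

```python
def maxmag(a, b):
    """Return the input with the larger magnitude; ties fall back to max(a, b)."""
    if abs(a) > abs(b):
        return a
    if abs(b) > abs(a):
        return b
    return max(a, b)


def minmag(a, b):
    """Return the input with the smaller magnitude; ties fall back to min(a, b)."""
    if abs(a) < abs(b):
        return a
    if abs(b) < abs(a):
        return b
    return min(a, b)


print(maxmag(-3.0, 2.0))  # -3.0
print(maxmag(-2.0, 2.0))  # 2.0
print(minmag(-3.0, 2.0))  # 2.0
print(minmag(-2.0, 2.0))  # -2.0
```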
14.3.38. MinMaxCtrl
The MinMaxCtrl block applies a minimum or maximum operator to the inputs
depending on the Boolean signal it receives on the control port.
The Output data type mode mask parameter applies only if the inputs a and b are
fixed-point format.
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of
the number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional
fields that are available. This option reinterprets the output bit pattern from the LSB up
according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
q=a×b
Note: For single-precision inputs and designs targeting any device with a floating-point DSP
block, the block uses a mixture of resources including the DSP blocks in floating-point
mode.
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
• Variable precision floating point: variable precision floating-point output type.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
Related Information
Forcing Soft Floating-point Data Types with the Advanced Options on page 232
Note: You can make a multiple-input multiplexer by combining more than one mux2 block
in a tree or by using a Select block.
Number of data signals The input type for s is an unsigned integer of width log2(number of data signals). Boolean is also
allowed in the case of two data inputs.
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
q = ~(a & b)
If the number of inputs is set to 1, the block outputs the logical NAND of all the individual
bits of the input word.
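The two modes, a bitwise NAND of two words and a reduction NAND over the bits of a single word, can be sketched on unsigned integers. The explicit width argument and the function names are assumptions made for this illustration.

```python
def nand(a, b, width):
    """Bitwise NAND of two unsigned words that are width bits wide."""
    mask = (1 << width) - 1
    return ~(a & b) & mask


def nand_reduce(a, width):
    """NAND of all individual bits of one word: 0 only when every bit is 1."""
    mask = (1 << width) - 1
    return 0 if (a & mask) == mask else 1


print(bin(nand(0b1100, 0b1010, 4)))  # 0b111
print(nand_reduce(0b1111, 4))        # 0
print(nand_reduce(0b1011, 4))        # 1
```

The single-input forms of the NOR and OR blocks reduce over the bits of the word in the same way, with the corresponding operator.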
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
14.3.42. Negate
The Negate block outputs the negation of the input value.
The Output datatype mode determines how the block infers its output data type:
• Inherit via internal rule. The output data type is the same as the input data
type.
• Inherit via internal rule with word growth. The output data type is the same
as the input data type. If the input data type is fixed-point, word growth is applied
to the output data type.
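The reason for word growth can be seen by negating the most negative two's complement value. The sketch below works on signed integers and assumes that word growth adds one bit and that, without word growth, the result wraps; neither detail is stated in this section, and the function name is hypothetical.

```python
def negate(value, width, word_growth=False):
    """Negate a signed integer stored in width bits (two's complement)."""
    lowest = -(1 << (width - 1))
    highest = (1 << (width - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError("value does not fit in the input width")
    out_width = width + 1 if word_growth else width  # assumption: one extra bit
    half = 1 << (out_width - 1)
    # Wrap the exact result into the output width (assumption: wrapping overflow).
    return (-value + half) % (2 * half) - half


print(negate(5, 8))                        # -5
print(negate(-128, 8))                     # -128 (overflow without word growth)
print(negate(-128, 8, word_growth=True))   # 128
```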
q = ~(a | b)
Set the number of inputs to 1, to output the logical NOR of all the individual bits of
the input word.
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
q = ~a
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
q=a|b
Set the number of inputs to 1, to output the logical OR of all the individual bits of the
input word.
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
14.3.46. Polynomial
The Polynomial block takes input x, and provides the result of evaluating a
polynomial of degree n:
a0 + a1·x + a2·x^2 + … + an·x^n
Coefficient banks A vector of (n +1) components. Specify the coefficients in the order a0, a1, a2, …, an.
If input x is driven by a vector signal, then a matrix with (n+1) columns, and one row per vector
component can be specified. Each output component will be the result of evaluating an independently
defined polynomial of degree, n.
If there is more than one coefficient bank, the number of rows in the matrix should be v*u, for v
vector components, and u banks. The coefficients for a given bank are ordered contiguously in the
matrix.
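The coefficient ordering and the bank layout can be sketched in software. Horner's rule is used here only as a convenient way to evaluate the polynomial and is not a statement about the block's architecture; the function names and the row index formula (rows of a bank stored contiguously, one row per vector component) are an interpretation of the description above.

```python
def eval_polynomial(coefficients, x):
    """Evaluate a0 + a1*x + ... + an*x^n with coefficients ordered a0, a1, ..., an."""
    result = 0
    for coefficient in reversed(coefficients):
        result = result * x + coefficient   # Horner's rule
    return result


def bank_row(bank, component, num_components):
    """Row of the coefficient matrix for a given bank and vector component."""
    return bank * num_components + component


# Two vector components (v = 2) and two banks (u = 2): four rows of n + 1 = 3 columns.
matrix = [
    [1, 2, 3],   # bank 0, component 0
    [0, 1, 0],   # bank 0, component 1
    [4, 0, 1],   # bank 1, component 0
    [2, 2, 2],   # bank 1, component 1
]

print(eval_polynomial(matrix[bank_row(0, 0, 2)], 2))   # 17
print(eval_polynomial(matrix[bank_row(1, 0, 2)], 3))   # 13
```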
14.3.47. Ready
Use the Ready block in designs with ALU folding. The Ready block adds a ready
signal to your design.
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Inherit via back projection: a downstream block that this block drives determines the output
data type. If the driven block does not propagate a data type to the driver, you must use a
Simulink SameDT block to copy the required data type to the output wire.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
14.3.49. Round
The Round block applies a rounding operation to the floating-point input. A mask
parameter popup menu selects the required rounding function that you apply.
round(x) Round to nearest integer; halfway cases rounded away from zero.
Note: SampleDelay blocks might not reset to zero. Do not use designs that rely on a
SampleDelay output of zero after reset. Use the valid signal to indicate valid data
and its propagation through the design.
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
• Single: single floating-point data.
• Double: double floating-point data.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
Minimum delay Checks if the delay can grow as needed, so that the specified length becomes the lower bound.
Equivalence group Sample delays that share the same equivalence group string grow by the same increment.
Note: For single-precision inputs and designs targeting devices with floating-point DSP
blocks, the block uses a mixture of resources including the device DSP blocks in
floating-point mode.
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Inherit via internal rule with word growth: the number of fractional bits is the maximum of
the number of fractional bits in the input data types. The number of integer bits is the maximum of
the number of integer bits in the input data types plus one. This additional word growth allows for
subtracting the most negative number from 0, which exceeds the maximum positive number that
you can store in the number of bits of the input.
• Boolean: the output type is Boolean.
• Variable precision floating point: variable precision floating-point output type.
Output data type Specifies the output data type. For example, fixdt(1, 16, 15). Only available for Specify via dialog.
Output scaling value Specifies the output scaling value. For example, 2^-15. Only available for Specify via dialog.
Floating-point precision Specifies a predefined floating-point type. Only available for Variable precision floating point.
Fused datapath This option affects the floating-point architectures. Turn on this option to save hardware by omitting
normalization stages between adder stages. The output deviates from that expected of IEEE
compliance.
14.3.52. Select
The Select block outputs one of the data signals (a, b, ...) if its paired select input (0,
1, ...) has a non-zero value.
q = 0 ? a : (1 ? b : d)
Here 0 and 1 denote the select inputs paired with the data inputs a and b.
If all select inputs are 0, the Select block outputs the default value d. At most one
select input should be high at a time.
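The selection rule above can be sketched as a small software model. This is an illustration only: the function name select_model and the vector-based interface are assumptions, not DSP Builder APIs. The sketch scans the select inputs in order, so if more than one is high it returns the first match, which mirrors the nested conditional expression.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical software model of the Select block.
// sel[i] is the select input paired with data[i]; d is the default value.
// The first non-zero select input wins, as in q = 0 ? a : (1 ? b : d).
template <typename T>
T select_model(const std::vector<int>& sel, const std::vector<T>& data, T d) {
    assert(sel.size() == data.size());
    for (std::size_t i = 0; i < sel.size(); ++i) {
        if (sel[i] != 0) {
            return data[i];
        }
    }
    return d;  // all select inputs are 0, so output the default
}
```

With data inputs {10, 20} and default 99, select inputs {0, 1} return 20 and select inputs {0, 0} return 99.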
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
14.3.53. Sequence
The Sequence block outputs a Boolean pulse of configurable duration and phase.
The input acts as an enable for this sequence. Usually, the block initializes with an
array of Boolean values of length period. The first step_value entries are zero, and the
remaining entries are one.
A counter, initialized to initial_value, steps along this array one entry at a time. The
output value is the array entry that the counter indexes. When the counter reaches
period, it wraps back to zero to index the beginning of the array.
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
Sequence setup A vector that specifies the counter in the format: [<initial_value> <step_value> <period>]
For example, [0 50 100]
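The array-and-counter description of the Sequence block can be sketched as follows. This is a minimal model under stated assumptions: the function name sequence_model is hypothetical, and the enable input is assumed to stay high for every simulated cycle. How the hardware behaves while the enable is low is not modelled here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the Sequence block with the enable held high.
// The pattern has `period` entries: the first `step_value` are false (0),
// the rest are true (1). The counter starts at `initial_value` and wraps
// to zero when it reaches `period`.
std::vector<bool> sequence_model(int initial_value, int step_value,
                                 int period, int cycles) {
    assert(period > 0);
    assert(step_value >= 0 && step_value <= period);
    assert(initial_value >= 0 && cycles >= 0);

    std::vector<bool> pattern(period, true);
    for (int i = 0; i < step_value; ++i) {
        pattern[i] = false;
    }

    std::vector<bool> out;
    int counter = initial_value % period;
    for (int c = 0; c < cycles; ++c) {
        out.push_back(pattern[counter]);   // output is the indexed entry
        counter = (counter + 1) % period;  // wrap back to the array start
    }
    return out;
}
```

With the setup [0 50 100], this model outputs 50 low cycles followed by 50 high cycles, then repeats.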
14.3.54. Shift
The Shift block outputs the logical right shifted version of the input value if unsigned,
or outputs the arithmetic right shifted version of the input value if signed. The shift is
specified by the input b:
q = (a >> b)
The width of the data type b determines the maximum size of the shift.
Shifts of more than the input word width result in an output of 0 for non-negative
numbers and (0 – 2^-F) for negative numbers (where F is the fraction length).
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
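The shift semantics of the Shift block, including the over-shift case, can be illustrated with a small model that operates on the raw stored integer (the real value is raw × 2^-F). The function names and the fixed 32-bit word width are assumptions made for this sketch. A raw value of -1 corresponds to (0 – 2^-F).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the Shift block for a signed 32-bit word.
// Arithmetic right shift; over-shifts give 0 or raw -1 (that is, -2^-F).
inline std::int32_t shift_signed(std::int32_t raw, unsigned b) {
    if (b >= 32) {
        return raw < 0 ? -1 : 0;
    }
    if (raw < 0) {
        // Portable arithmetic shift: ~raw is non-negative, so the shift is
        // well defined, and the complement restores the sign extension.
        return ~((~raw) >> b);
    }
    return raw >> b;
}

// Hypothetical model for an unsigned 32-bit word: logical right shift.
inline std::uint32_t shift_unsigned(std::uint32_t raw, unsigned b) {
    if (b >= 32) {
        return 0;
    }
    return raw >> b;
}
```

The explicit range check is needed in C++ because shifting by the word width or more is undefined behaviour in the language itself.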
14.3.55. Sqrt
The Sqrt block applies a numerical root operation to its input and produces the result.
The mask parameter pop-up menu selects the required root function that you apply.
Advanced Options Blank or struct('method',256) The sqrt(x) function with integer input and output has two
semantics: floor semantics, floor(sqrt(x)), for a logic reduction on wider data types, or the default
round-to-nearest semantics. To select floor semantics, type struct('method',256).
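The difference between the two integer semantics can be seen in a short software sketch. The function names are hypothetical, and the code only illustrates the arithmetic of floor versus round-to-nearest. It says nothing about how DSP Builder implements the block in hardware.

```cpp
#include <cassert>
#include <cstdint>

// floor(sqrt(x)) for a 32-bit unsigned input, found by binary search.
inline std::uint32_t isqrt_floor(std::uint32_t x) {
    std::uint32_t lo = 0;
    std::uint32_t hi = 65535;  // floor(sqrt(2^32 - 1))
    while (lo < hi) {
        std::uint32_t mid = (lo + hi + 1) / 2;
        if (static_cast<std::uint64_t>(mid) * mid <= x) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return lo;
}

// Round-to-nearest integer square root. With r = floor(sqrt(x)), the result
// rounds up exactly when x > r*r + r, because (r + 0.5)^2 = r*r + r + 0.25.
inline std::uint32_t isqrt_nearest(std::uint32_t x) {
    std::uint32_t r = isqrt_floor(x);
    std::uint64_t remainder = x - static_cast<std::uint64_t>(r) * r;
    return remainder > r ? r + 1 : r;
}
```

For x = 15, floor semantics give 3 while round-to-nearest gives 4. For a perfect square such as 16, both give 4.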
q = a – b.
Note: For single-precision inputs and designs targeting any device with a floating-point DSP
block, the block uses a mixture of resources including the DSP blocks in floating-point
mode.
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the
maximum of the number of bits in the input data types.
• Inherit via internal rule with word growth: the number of fractional bits is the
maximum of the number of fractional bits in the input data types. The number of
integer bits is the maximum of the number of integer bits in the input data types
plus one. This additional word growth allows for subtracting the most negative
number from 0, which exceeds the maximum positive number that you can store in
the number of bits of the input.
• Specify via dialog: you can set the output type of the block explicitly using
additional fields that are available when this option is selected. This option
reinterprets the output bit pattern from the LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
Related Information
Forcing Soft Floating-point Data Types with the Advanced Options on page 232
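The reason for the extra integer bit in the Inherit via internal rule with word growth mode can be checked numerically. The helper below is a hypothetical illustration, not a DSP Builder function: it returns the smallest signed two's-complement width that can hold a value.

```cpp
#include <cassert>
#include <cstdint>

// Smallest signed two's-complement width (in bits) that can represent v.
// A width of `bits` covers the range [-2^(bits-1), 2^(bits-1) - 1].
inline int signed_bits_needed(std::int64_t v) {
    for (int bits = 1; bits < 64; ++bits) {
        std::int64_t lo = -(std::int64_t{1} << (bits - 1));
        std::int64_t hi = (std::int64_t{1} << (bits - 1)) - 1;
        if (v >= lo && v <= hi) {
            return bits;
        }
    }
    return 64;
}
```

An 8-bit signed input can hold -128, but 0 - (-128) = 128 needs 9 bits, which is the one-bit growth that the option provides.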
q = Σ a_n
q=a
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Inherit via internal rule with word growth: the number of fractional bits is the maximum of
the number of fractional bits in the input data types. The number of integer bits is the maximum of
the number of integer bits in the input data types plus one. This additional word growth allows for
subtracting the most negative number from 0, which exceeds the maximum positive number that
you can store in the number of bits of the input.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
Related Information
Forcing Soft Floating-point Data Types with the Advanced Options on page 232
14.3.58. Trig
The Trig block applies a trigonometric operation to its floating-point inputs and
produces the floating-point result.
Note: Your design may use up to 50% fewer resources if you use the pi functions.
atan2(y,x) Four quadrant inverse tangent, output angle in interval [-π,+π] radians.
The Function parameter selects one of the 16 trigonometric functions. The number of
input ports and output ports on the block change as required by the semantics of the
function that you select:
• One input and one output: sin, cos, tan, cot, asin, acos, atan
• Two inputs and one output: atan2
• One input and two outputs: sincos
If the inputs to the sin(x) and cos(x) functions are confined to the interval
[-2π, 2π], and you target devices with floating-point DSP blocks, set
struct('rangeReduction',0) in Advanced Options. The design then uses the
floating-point mode of the DSP blocks to build more efficient architectures.
q = ~(a XOR b)
Set the number of inputs to 1, to output the logical XNOR of all the individual bits of
the input word.
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
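The single-input reduction mode can be sketched in software as a parity computation. The function names are hypothetical. This sketch assumes that the XNOR reduction is the complement of the XOR of all bits, consistent with q = ~(a XOR b); the handbook does not spell out the reduction order for wider words, so treat that as an assumption.

```cpp
#include <cassert>
#include <cstdint>

// XOR of the low `width` bits of `word` (true when an odd number are set).
inline bool xor_reduce(std::uint32_t word, int width) {
    assert(width >= 1 && width <= 32);
    bool q = false;
    for (int i = 0; i < width; ++i) {
        q ^= ((word >> i) & 1u) != 0;
    }
    return q;
}

// Assumed XNOR reduction: the complement of the XOR reduction.
inline bool xnor_reduce(std::uint32_t word, int width) {
    return !xor_reduce(word, width);
}
```

For the 4-bit word 1011 (three bits set), xor_reduce returns true and xnor_reduce returns false.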
q = (a XOR b)
Set the number of inputs to 1, to output the logical XOR of all the individual bits of
the input word.
Output data type mode Determines how the block sets its output data type:
• Inherit via internal rule: the number of integer and fractional bits is the maximum of the
number of bits in the input data types.
• Specify via dialog: you can set the output type of the block explicitly using additional fields that
are available when this option is selected. This option reinterprets the output bit pattern from the
LSB up according to the specified type.
• Boolean: the output type is Boolean.
Output data type Specifies the output data type. For example, sfix(16), uint(8).
Output scaling value Specifies the output scaling value. For example, 2^-15.
The ChannelIn block passes its input through to the outputs unchanged, with types
preserved. This block indicates to DSP Builder that these signals arrive synchronized
from their source, so that the synthesis tool can interpret them.
Number of data signals Specifies the number of data signals on this block.
The ChannelOut block passes its input through to the outputs unchanged, with types
preserved. This block indicates to DSP Builder that these signals must synchronize,
which the synthesis tool can ensure.
When you run a simulation in Simulink, DSP Builder adds latency from the balanced
pipelining stages to meet the specified timing constraints for your model. The block
accounts for this additional latency. This latency covers only the pipelining added for
timing closure. It does not include any delay that you add explicitly to your model,
for example with a SampleDelay block.
Note: You can also access the value of the latency parameter by typing a command of the
following form on the MATLAB command line:
get_param(gcb,'latency')
d0, d1, d2, ... Input Any fixed- or floating-point type A number of output data signals Yes Yes
q0, q1, q2, ... Output Any fixed- or floating-point type A number of data signals Yes Yes
If the signal width is greater than one, you can assume the multiple inputs are
synchronized.
Number of data signals Specifies the number of input and output signals.
If the width is greater than one, the block generates multiple outputs, which are synchronized.
Number of data signals Specifies the number of input and output signals.
Note: If no SynthesisInfo block is present, DSP Builder gives error messages if insufficient
delay is present.
The inputs and outputs to this subsystem become the primary inputs and outputs of
the RTL entity that DSP Builder creates. After you run a Simulink simulation, the
online Help page for the SynthesisInfo block updates to show the latency, and port
interface for the current Primitive subsystem.
Note: The SynthesisInfo block can be at the same level as the Device block (if the
synthesizable subsystem is the same as the generated hardware subsystem).
However, it is often convenient to create a separate subsystem level that contains the
Device block. Refer to the design examples for some examples of design hierarchy.
Constrain Latency This option allows you to select the type of constraint and to specify its value. The
value can be a workspace variable or an expression but must evaluate to a positive
integer.
You can select the following types of constraint:
• >: Greater than
• >=: Greater than or equal to
• =: Equal to
• <=: Less than or equal to
• <: Less than
Select either + or - and type in a reference model in the text field. Specify the
reference as a Simulink path string, for example 'design/topLevel/model'. DSP Builder
then makes the latency depend on that model. By default, the latency depends on
no model.
Bit accurate simulation Turn on in floating-point designs to give bit-accurate rather than mathematical
simulations. Fixed-point designs always use bit-accurate simulation.
Local reset minimization Select the reset minimization for the associated synthesizable subsystem. Valid only if
Control block Global Enable is On.
The default is Conditional – On for ChannelIn/Out only.
Select Off to disable reset minimization on this synthesizable subsystem.
Select On – Always (for ChannelIn/Out or GPIn/Out) to apply reset minimization
to a synthesizable subsystem that uses GPIn/Out blocks. In a GPIn/Out subsystem
with reset minimization, the whole subsystem is data flow and has no valid signal to
act as control flow.
Related Information
Reset Minimization on page 211
The Anchored Delay block has a data input and a valid input port. Connect the valid
input port to the valid in of the enclosing Primitive subsystem to allow DSP Builder to
correctly schedule the starting state of your control unit design.
DSP Builder implements the NestedLoop blocks as masked subsystems that use
existing DSP Builder Primitive library blocks. The blocks do not have fixed
implementations. DSP Builder generates a new implementation at runtime whenever
you change any of the loop specifications.
For each loop in a NestedLoop block, you can specify start, increment, and end
expressions. Each of these expressions may have one of the following three forms:
• A constant expression that evaluates (in the MATLAB base environment) to an
integer. For example, if the MATLAB variable N has the value 256, (log2(N)+1) is a
legal expression (and evaluates to 9).
• An instance of the loop variable controlling an enclosing loop. For example, you
can use "i" (the outer loop variable) as the start expression of the "j" or "k" loops.
• A port name, optionally accompanied by a width specification in angle brackets.
For example "p" or "q<4>". If no width is specified, it defaults to 8. This option
generates a new input port (with the user-defined name and width) on the
NestedLoop block.
For a NestedLoop2 block, with user-supplied start, increment, and end expressions
of S1, I1 and E1 (for the outer loop) and S2, I2 and E2 (for the inner loop), the
equivalent C++ code is:
int i = S1;                 // outer loop counter
do {
    int j = S2;             // inner loop counter restarts on each outer pass
    do {
        // loop body
        j += I2;
    } while (j != E2);
    i += I1;
} while (i != E1);
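To see which counter values this loop form produces, the following sketch records the (i, j) pair at each inner iteration. The function name nested_loop2 and the max_steps guard are assumptions added for illustration. The guard exists because a do-while loop with a != test always runs at least once and never terminates if the counter steps past its end value.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper that lists the (i, j) pairs visited by the
// NestedLoop2-style do-while loops. max_steps bounds the run so that
// non-terminating parameter choices still return.
std::vector<std::pair<int, int>> nested_loop2(int S1, int I1, int E1,
                                              int S2, int I2, int E2,
                                              std::size_t max_steps = 1000) {
    std::vector<std::pair<int, int>> visited;
    int i = S1;
    do {
        int j = S2;
        do {
            visited.emplace_back(i, j);
            if (visited.size() >= max_steps) {
                return visited;  // guard against endless loops
            }
            j += I2;
        } while (j != E2);
        i += I1;
    } while (i != E1);
    return visited;
}
```

With S1 = 0, I1 = 1, E1 = 2 and S2 = 0, I2 = 1, E2 = 3, the helper visits six pairs, from (0, 0) to (1, 2). With S1 equal to E1, the outer loop does not stop after one pass, and only the guard ends the run.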
Each NestedLoop block has two fixed input ports (go and en) and a variable number
of additional user-defined input ports. DSP Builder regards each user-defined port as a
signed input.
Each block also has two fixed output ports (qv and ql) and one (NestedLoop1), two
(NestedLoop2) or three (NestedLoop3) output ports for the counter values.
When the input en signal is low (inactive), the output qv (valid) signal is also set low.
The state of the NestedLoop block does not change, even if it receives a go signal.
Normal operation occurs when the en signal is high. The NestedLoop block can be in
the waiting or counting state.
The NestedLoop block resets into the waiting state and remains there until it receives
a go signal. While in the waiting state, the qv signal is low and the values of the other
outputs are undefined.
When the block receives a go signal, the NestedLoop block transitions into the
counting state. The counters start running and the qv output signal is set high. When
all the counters eventually reach their final values, the ql (last cycle) output becomes
high. On the following cycle, the NestedLoop block returns to the waiting state until
it receives another go signal.
If the block receives a go signal while the NestedLoop block is already in the
counting state, it remains in the counting state but all its counters are reset to their
start values.
14.5.7. Pause
The Pause block implements a breakpoint with trigger count to break and single step
through Simulink simulations.
Use the Pause block for debugging designs. For example, run to a breakpoint, then turn
on Show port values when hovering at this point. This option slows simulation, so
turn it on only when stepping through. Using display blocks allows
you to see variables displayed at the paused time (similar to watch variables in a
software debugger). You can change the trigger count, for example by adding 100, to
simulate the next 100 cycles. The block color changes to red when you turn on the
Pause block. Use the valid signal as the input, so that it counts valid steps only.
0 0 Q
1 0 1
0 1 0
1 1 0
0 0 q
1 0 1
0 1 0
1 1 1
These latches work for any data type, and for vector and complex numbers.
Right-click on the block and select Look Under Mask, for the structure.
The e signal is a ufix(1) enable signal. When e is high, the latch_1 block delays
data from input d by one cycle and feeds through to output q. When e is low, the
latch_1 block holds the last output.
A switch in e means the latch_1 block holds the output one cycle later.
For example:
where [1 2 3 4]' means that 1, 2, 3, and 4 arrive in parallel on 4 separate wires. The Phases
parameter specifies the number of parallel data samples. The delay input must be an
unsigned integer less than the number of parallel data samples.
The number of outputs is one of the parameters of the VectorFanout block. Use a
VectorFanout block instead of a simple wire to tell DSP Builder that you expect the
wire to be long. DSP Builder might ignore the block (which amounts to implementing
the VectorFanout block using a simple wire), or might insert one or more additional
registers on the wire to improve the physical routability of the design. The number of
registers it inserts depends on the target device, target fMAX and other properties of
your design. Inserting a VectorFanout block does not change the behaviour of your
design. If DSP Builder chooses to insert extra registers, it automatically adjusts the
latency of any parallel paths to preserve the original wire-like behaviour. By default,
DSP Builder implements all VectorFanout blocks as simple wires on non-HyperFlex
devices. FFTs and FIRs (which both contain embedded Fanout blocks) retain the same
QoR characteristics as in DSP Builder v15.0 and earlier (which has no VectorFanout
blocks). To enable DSP Builder to choose different implementations for the
VectorFanout blocks in your design, specify
DSPBA_Features.EnableFanoutBlocks = true; at the MATLAB command line.
This command increases the number of registers your design uses, but potentially
increases its fMAX. You can specify that DSP Builder doesn't need to initialize any
registers that it chooses to insert. Then DSP Builder inserts hyper-registers (instead of
ordinary, ALM registers) on devices that support the HyperFlex architecture. You
should use this option for datapaths where the initial value is unimportant, but you
should avoid using it for control paths.
Allow use of uninitialized registers Check box Turn on to allow DSP Builder to use hyper-registers. DSP Builder does
not initialize the inserted routing registers on reset.
This block is an autogenerating masked subsystem built from Primitive library blocks.
Internally, it is a demultiplexer and multiplexer, but parameterizable such that you do
not have to manually draw and reconnect the connections between the demultiplexer
and multiplexer if the vector width parameter changes.
The e signal is a ufix(1) enable signal. When e is high, the latch_0 block feeds
data from input d through to output q. When e is low, the latch_0 block holds the last
output.
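The difference between latch_0 and latch_1 can be sketched with two small cycle-level models. These are illustrations only: the class names are hypothetical, the initial held value of zero is an assumption, and latch_1 is interpreted here as an enabled register, so that a change on e or d appears at q one cycle later.

```cpp
#include <cassert>

// Hypothetical model of latch_0: when e is high, d feeds through to q in
// the same cycle; when e is low, q holds the last output.
template <typename T>
class Latch0Model {
public:
    explicit Latch0Model(T initial = T{}) : held_(initial) {}
    T step(T d, bool e) {
        if (e) {
            held_ = d;
        }
        return held_;
    }
private:
    T held_;
};

// Hypothetical model of latch_1: q shows the value captured on the
// previous cycle; the new d is captured only when e is high.
template <typename T>
class Latch1Model {
public:
    explicit Latch1Model(T initial = T{}) : held_(initial) {}
    T step(T d, bool e) {
        T q = held_;  // output lags the capture by one cycle
        if (e) {
            held_ = d;
        }
        return q;
    }
private:
    T held_;
};
```

Calling step once per clock cycle with the same stimulus on both models shows latch_1 trailing latch_0 by one cycle.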
You can add the block anywhere in the Simulink design. The block only supports
the .vcd file format. DSP Builder writes this file in the RTL directory and it derives its
name from the name given to the block. The specific arrangement of .vcd is based on
what ModelSim writes out - i.e. only Boolean wires are used. You can import it into
ModelSim using the vcd2wlf tool. The waveforms should match with those generated
by the HDL simulation, although you might see an offset because of the Simulink
model latency correction.
Intel Corporation. All rights reserved. Intel, the Intel logo, Altera, Arria, Cyclone, Enpirion, MAX, Nios, Quartus
and Stratix words and logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other
countries. Intel warrants performance of its FPGA and semiconductor products to current specifications in
accordance with Intel's standard warranty, but reserves the right to make changes to any products and services
at any time without notice. Intel assumes no responsibility or liability arising out of the application or use of any
information, product, or service described herein except as expressly agreed to in writing by Intel. Intel
customers are advised to obtain the latest version of device specifications before relying on any published
information and before placing orders for products or services.
*Other names and brands may be claimed as the property of others.
ISO 9001:2015 Registered
15. Utilities Library
HB_DSPB_ADV | 2019.04.01
Instance Select from any instance in your imported HDL. Each HDL
Import block must represent a unique instance.
I/O Type DSP Builder determines the IO type based on the name of
the port. You can change any entry to Input, Output,
Clock, or Reset. HDL Import only allows one clock and
one reset.
Data Type Informs Simulink and DSP Builder how they should interpret
the ModelSim data. Set the Data Type of inputs to
Inherit; the Data Type of outputs defaults to Signed. For
Boolean or std_logic data type, select Unsigned with 0
fractional bits.
The HDL import feature needs the time relationship between ModelSim and Simulink.
ModelSim uses the Control block-defined clock rate.
Working Directory DSP Builder creates this working directory for the ModelSim
library and other intermediate files.
Top-level instance Enter the name of the top-level instance. If that instance is
not the HDL you want to import but a wrapper for multiple
instances, turn on Top-level is a wrapper.
Simulink sample time Specify the sample time of the DSP Builder part of your
Simulink model.
Reset cycles Allows you to hold your imported HDL in reset for an
arbitrary number of cycles before the cosimulation begins.
Port The TCP/IP port number that the cosimulation uses for
communication.
15.1.4. Pause
The Pause block implements a breakpoint with trigger count to break and single step
through Simulink simulations.
Use the Pause block for debugging designs. For example, run to a breakpoint, then turn
on Show port values when hovering at this point. This option slows simulation, so
turn it on only when stepping through. Using display blocks allows
you to see variables displayed at the paused time (similar to watch variables in a
software debugger). You can change the trigger count, for example by adding 100, to
simulate the next 100 cycles. The block color changes to red when you turn on the
Pause block. Use the valid signal as the input, so that it counts valid steps only.
Alternatively, use a different control signal, such as a write enable, as the input to
make the system break on that signal instead. You can easily add other logic blocks
to generate a break signal that you use just for debugging.
2018.06.27 18.0 Updated Arria 10 to any device with a floating-point block in floating-point designs
2017.11.06 17.1 • Improved description on NCO block Accumulator Bit Width parameter.
• Corrected parameters on Scalar Product block.
• Added Forcing Soft Floating-point Data Types with the Advanced Options topic
• Added super-sample NCO design example.
• Added support for Intel Cyclone 10 and Intel Stratix 10 devices.
• Removed instances of Signals block.
• Changed input type on GPIn block; changed output type on GPOut block.
• Deleted WYSIWYG option on SynthesisInfo block.
16. Document Revision History for DSP Builder for Intel FPGAs (Advanced Blockset) Handbook
HB_DSPB_ADV | 2019.04.01
• Added note to latency constraints topic: latency constraints only apply between
ChannelIn and ChannelOut blocks.
• Added support for Verilog HDL implementation
• Removed architecture versus implementation information
• Removed SynthesisInfo block WYSIWYG option
• Removed the following design examples:
— 1K floating-point FFT
— Radix-2 streaming FFT
— Radix-4 streaming FFT
December 2014 14.1 • Added step about disabling virtual pins in the Quartus Prime software when
using HIL with advanced blockset designs
• Added information on _mmap.h file, which contains register information on your
design
• Corrected Has read enable and Show read enable descriptions in BusStimulus
and BusStimulusFileReader blocks
• Added BusStimulus and BusStimulusFileReader blocks to memory-mapped
registers design example.
• Added AvalonMMSlaveSettings block and DSP Builder > Avalon Interfaces >
Avalon-MM slave menu option
• Removed bus parameters from Control and Signal blocks
• Removed the following design examples:
— Color Space Converter (Resource Sharing Folding)
— Interpolating FIR Filter with Updating Coefficients
— Primitive FIR Filter (Resource Sharing Folding)
— Single-Stage IIR Filter (Resource Sharing Folding)
— Three-stage IIR Filter (Resource Sharing Folding)
• Added system-in-the-loop support
• Added new blocks:
— Floating-point classifier
— Floating-point multiply accumulate
— Added hypotenuse function to math block
• Added design examples:
— Color space converter
— Complex FIR design example
— CORDIC from Primitive Blocks
— Crest factor reduction
— Folding FIR
— Variable Integer Rate Decimation Filter
— Vector sort - sequential and iterative
• Added reference designs:
— Crest factor reduction
— Direct RF with Synthesizable Testbench
— Dynamic Decimation Filter
— Reconfigurable Decimation Filter
— Variable Integer Rate Decimation Filter
• Changed directory structure
• Added correct floating-point rounding for reciprocal and square root blocks.
• Corrected signal descriptions for LoadableCounter block
• Removed resource sharing folder
• Added new ALU folder information:
— Start of packet signal
— Clock-rate mode