
1

Courses @ NECST
Lorenzo Di Tucci <lorenzo.ditucci@polimi.it>
Emanuele Del Sozzo <emanuele.delsozzo@polimi.it>
Marco D. Santambrogio <marco.santambrogio@polimi.it>
Xilinx Vivado HLS
18/01/2018

2

Agenda
• Introduction to the Hardware Design Flow
• Vivado HLS:
–Design Flow
–Kernel Creation
–Communication Infrastructure
–Kernel Optimizations
• Hands-On Example

3

Installation Party
Use this Google Doc to provide your data
https://goo.gl/FRCG6y
First, install the VPN we have provided you.
(Mac: Tunnelblick - Windows/Linux: OpenVPN)
To SSH to the machine:
ssh <name>.<surname>@nags31.local.necst.it
password: user

4

Installation Party
You can change your password here:
http://changepassword.local.necst.it/
You can also RDP to the instance using
• Microsoft Remote Desktop (Windows/macOS)
• Remmina (Linux)
To connect to the machine or change your password, the VPN must be running.

5

Hardware Design Flow for HPC
• Hardware Design Flow (HDF): the process of realizing a hardware module
• HDF for FPGAs can be seen as a two-step process

6

The Hardware Design Flow

7

The Hardware Design Flow
• CAD tools aim to ease design on FPGAs

8

The Hardware Design Flow
• CAD tools aim to ease design on FPGAs
• Designers still need to manually perform system integration, driver generation, and runtime management

11

Vivado HLS
• High-Level Synthesis (HLS) tool
• given a high-level description of a kernel, it generates an HDL description of the application
• raises the level of abstraction to ease the design process
• provides multiple hardware-optimized libraries and APIs
• directive-driven, architecture-aware synthesis

12

Vivado HLS Flow
The kernel can be specified in high-level code (C/C++)
- the code must be refactored, since only a subset of C/C++ is supported
- no dynamic memory allocation
- multiple APIs are provided (arbitrary precision, mathematical functions, video library, etc.)
- a constraint file can be specified (clock uncertainty, clock period, target board)
Flow: C/C++
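A minimal sketch of the "no dynamic memory allocation" rule. The function name, the size N and the computation are hypothetical and only serve as illustration: the point is that every buffer has a size known at compile time.

```cpp
#include <cstddef>

// Hypothetical size and kernel, chosen only for illustration.
const std::size_t N = 8;

// Computes out[i] = in[i] + in[i-1], treating in[-1] as 0.
void pairwise_sum(const int in[N], int out[N]) {
    // Fixed-size local buffer: its size is known at compile time, so an
    // HLS tool can map it to on-chip memory. Allocating it with
    // malloc/new instead would not be synthesizable.
    int buffer[N];
    for (std::size_t i = 0; i < N; ++i) {
        buffer[i] = in[i];
    }
    for (std::size_t i = 0; i < N; ++i) {
        int prev = (i == 0) ? 0 : buffer[i - 1];
        out[i] = buffer[i] + prev;
    }
}
```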

13

Vivado HLS Flow
Directive-driven design and optimization
- the interconnection with memory must be specified (AXI4, AXI4-Lite, AXI4-Stream)
- the designer must still have a clear idea of the architecture to implement
Flow: C/C++ → IP Core Design and Optimization

14

Vivado HLS Flow
Two levels of emulation
- Software Emulation checks the functional correctness of the application; it does not guarantee correctness on the FPGA
- Hardware Emulation checks the correctness of the logic generated by the synthesizer
Flow: C/C++ → IP Core Design and Optimization → Software Emulation → Hardware Emulation

15

Vivado HLS Flow
- Finally, the IP core can be generated and exported for use in the System Level Design step
- At each step of the flow, it is possible to go back to the optimization phase
Flow: C/C++ → IP Core Design and Optimization → Software Emulation → Hardware Emulation → IP Core Generation

16

Kernel Creation
• Not all the parts of an algorithm are suitable for
hardware acceleration
• First, the application must be profiled to identify its bottlenecks, i.e. the most compute-intensive parts/kernels
• Then, it is possible to start adapting and optimizing the
kernels for the High Level Synthesis process
– define the communication infrastructure
– build the kernel architecture

17

Communication Infrastructure
• Three main types of communication can be implemented using the AXI4 protocol:
– AXI-Lite (for control signals)
– AXI-Stream (to stream data)
– AXI-Master (direct connection with the memory)
• Vivado HLS provides directives to specify which type of
communication to implement
• However, the user has to develop the data transfer in
an efficient way

18

Kernel Optimizations
• Vivado HLS provides directives to optimize the kernel
• The directives may refer to:
– how to carry out the computation
• loop pipelining/unrolling
• function inlining
• dataflow
• resources to use for a certain computation
– how to store the data on FPGA memory
• array partitioning (block, cyclic, complete)

19

Loop Pipelining
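A minimal sketch of a pipelined loop, assuming a hypothetical dot-product kernel (the names and the size N are not from the slides). The PIPELINE directive asks the tool to start a new iteration every II cycles instead of waiting for the previous one to finish; a standard C++ compiler simply ignores the pragma.

```cpp
const int N = 100;  // hypothetical vector length

// Hypothetical kernel used only to show where the directive goes.
int dot_product(const int x[N], const int y[N]) {
    int acc = 0;
dot_loop:  // the label gives the loop a name in the synthesis report
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        acc += x[i] * y[i];
    }
    return acc;
}
```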

20

Loop Unrolling
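A minimal sketch of loop unrolling, assuming a hypothetical scaling kernel (names and sizes are illustrative). The first function carries the UNROLL directive; the second shows, written by hand, what a factor-4 unroll amounts to. Both compute the same result.

```cpp
const int N = 100;  // hypothetical vector length, a multiple of 4

// Rolled loop with an unroll directive: the tool is asked to replicate
// the loop body 4 times, so 4 elements are handled per iteration.
void scale(const int in[N], int out[N], int k) {
scale_loop:
    for (int i = 0; i < N; ++i) {
#pragma HLS UNROLL factor=4
        out[i] = in[i] * k;
    }
}

// The same computation with the factor-4 unroll written out manually.
void scale_unrolled_by_hand(const int in[N], int out[N], int k) {
    for (int i = 0; i < N; i += 4) {
        out[i]     = in[i]     * k;
        out[i + 1] = in[i + 1] * k;
        out[i + 2] = in[i + 2] * k;
        out[i + 3] = in[i + 3] * k;
    }
}
```

Unrolling only pays off if the replicated bodies can access their data in the same cycle, which is where resource contention and array partitioning come in.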

21

Resource Contention

22

Array Partitioning
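A minimal sketch of how block and cyclic partitioning distribute array elements over separate memories, plus a hypothetical kernel using the ARRAY_PARTITION directive. The helper functions and the kernel are illustrative assumptions, not code from the slides.

```cpp
// Partition (bank) that element i falls into with cyclic partitioning:
// consecutive elements are interleaved across the banks.
int cyclic_bank(int i, int factor) {
    return i % factor;
}

// Partition that element i of an n-element array falls into with block
// partitioning: the array is split into consecutive chunks.
int block_bank(int i, int n, int factor) {
    int block_size = (n + factor - 1) / factor;  // ceil(n / factor)
    return i / block_size;
}

const int N = 8;  // hypothetical array size

// Hypothetical kernel: with a cyclic factor of 4, the 4 elements read by
// each unrolled iteration live in 4 different memories.
int sum_all(const int in[N]) {
    int local[N];
#pragma HLS ARRAY_PARTITION variable=local cyclic factor=4 dim=1
    for (int i = 0; i < N; ++i) {
        local[i] = in[i];
    }
    int acc = 0;
    for (int i = 0; i < N; ++i) {
#pragma HLS UNROLL factor=4
        acc += local[i];
    }
    return acc;
}
```

Complete partitioning is the limit case: every element gets its own register, so all of them can be accessed in the same cycle.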

23

Example
• Now we are going to see how to implement a vector
addition of this form:
a = b + c*d
where a, b, c are vectors, and d is a constant
• We will go through all optimization steps with Vivado
HLS, from the project creation to the IP core generation
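As a starting point, a plain software version of a = b + c*d could look like the sketch below. The element type (int), the function name and the vector length of 100 are assumptions; 100 is chosen to match the trip count in the synthesis reports of the first versions.

```cpp
const int N = 100;  // assumed vector length

// Software reference: a[i] = b[i] + c[i] * d for every element.
void vadd_reference(int a[N], const int b[N], const int c[N], int d) {
    for (int i = 0; i < N; ++i) {
        a[i] = b[i] + c[i] * d;
    }
}
```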

24

Launch Vivado HLS
Source settings64.sh from the Vivado installation folder, then run Vivado HLS

25

Vivado HLS GUI

26

Project Configuration

27

Kernel files
• Add/create files
• Top Function Selection

28

Testbench files
Add/create files

29

Solution Configuration
• Clock period
• Part selection

30

Device Selection
• Parts/Boards Selection

31

Solution Configuration

32

Editor Interface
• Project Files
• Files Editor
• Outline / Directive
• Vivado HLS Console

33

Editor Interface
• Project Settings
• Solution Settings
• Run C Simulation
• Run Synthesis

34

Kernel Creation

35

Kernel Creation

36

Kernel function

37

Kernel function
• Let's add the directives for the communication interfaces

38

Directives
• Directives tab

39

Insert Directive for port a

40

Master Axi Directive

41

Insert Directive for port a

42

Axilite Directive

43

Port a directives

44

Ports directives
• Port d uses only the AXI-Lite interface, since it is passed as a control signal

45

Insert Directive for kernel

46

Interface Directive

47

Axilite Directive

48

Kernel Directive

49

Naive Implementation
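A sketch of what a naive kernel with the interface directives from the previous slides might look like. The function name, the element type and the bundle names (gmem, control) are assumptions. Ports a, b and c get an AXI master interface plus an AXI-Lite entry for their addresses; d and the kernel's return (start/done control) use AXI-Lite only.

```cpp
const int N = 100;  // assumed vector length

// Hypothetical naive kernel: every access goes straight to external memory.
void vadd(int *a, const int *b, const int *c, int d) {
#pragma HLS INTERFACE m_axi port=a offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=b offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=c offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=a bundle=control
#pragma HLS INTERFACE s_axilite port=b bundle=control
#pragma HLS INTERFACE s_axilite port=c bundle=control
#pragma HLS INTERFACE s_axilite port=d bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
    for (int i = 0; i < N; ++i) {
        a[i] = b[i] + c[i] * d;
    }
}
```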

50

Project Settings

51

Top Function Selection

52

Top Function Selection

53

Top Function Selection

54

Testbench Creation

55

Testbench Creation

56

Testbench
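A minimal testbench sketch, assuming the kernel is a function vadd(a, b, c, d) with int elements (names, sizes and input values are hypothetical). In a Vivado HLS project this function would be the testbench's main(): the tool treats a return value of 0 as a pass and anything else as a failure.

```cpp
#include <cstdio>

const int N = 100;  // assumed vector length

// Stand-in for the kernel under test (directives omitted for brevity).
void vadd(int *a, const int *b, const int *c, int d) {
    for (int i = 0; i < N; ++i) {
        a[i] = b[i] + c[i] * d;
    }
}

// Would be main() in the actual testbench file.
int run_testbench() {
    int a[N], b[N], c[N];
    const int d = 3;
    for (int i = 0; i < N; ++i) {
        b[i] = i;
        c[i] = 2 * i;
    }

    vadd(a, b, c, d);

    // Compare the kernel output against values computed in software.
    int errors = 0;
    for (int i = 0; i < N; ++i) {
        int expected = b[i] + c[i] * d;
        if (a[i] != expected) {
            std::printf("Mismatch at %d: got %d, expected %d\n", i, a[i], expected);
            ++errors;
        }
    }
    std::printf(errors == 0 ? "Test passed\n" : "Test failed\n");
    return errors == 0 ? 0 : 1;
}
```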

57

Run C Simulation

58

C Simulation Settings

59

C Simulation Result

60

Run Synthesis
Synthesis Log
Console

61

Synthesis Report

62

Naive Implementation Report
Clock   Target  Estimated  Uncertainty
ap_clk  10.00   8.75       1.25
Latency min  Latency max  Interval min  Interval max  Type
812          812          813           813           none
Name         BRAM_18K  DSP48E  FF    LUT
Available    2060      2800    2393  2618
Utilization  ~0%       ~0%     ~0%   ~0%

63

Run Co-Simulation

64

Co-Simulation Settings

65

Co-Simulation Results

66

V1 Implementation
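The V1 loop report lists three memcpy loops around the compute loop, which suggests that V1 copies the inputs into local buffers, computes on them, and copies the result back. The sketch below is a reconstruction under that assumption: a_local is named in later slides, while b_local, c_local, the function name and the element type are hypothetical.

```cpp
#include <cstring>

const int N = 100;  // trip count shown in the V1 loop report

// Hypothetical V1 kernel (interface directives as in the naive version,
// omitted here for brevity).
void vadd_v1(int *a, const int *b, const int *c, int d) {
    int a_local[N];
    int b_local[N];
    int c_local[N];

    // Bulk copies from external memory into on-chip buffers.
    std::memcpy(b_local, b, N * sizeof(int));
    std::memcpy(c_local, c, N * sizeof(int));

loop:
    for (int i = 0; i < N; ++i) {
        a_local[i] = b_local[i] + c_local[i] * d;
    }

    // Bulk copy of the result back to external memory.
    std::memcpy(a, a_local, N * sizeof(int));
}
```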

67

V1 Implementation Log

68

V1 Implementation Report
Clock   Target  Estimated  Uncertainty
ap_clk  10.00   8.75       1.25
Latency min  Latency max  Interval min  Interval max  Type
1227         1227         1228          1228          none
Name         BRAM_18K  DSP48E  FF    LUT
Available    2060      2800    2393  2618
Utilization  ~0%       ~0%     ~0%   ~0%

69

V1 Implementation Loops
Loop latency and initiation interval (II)
Loop Name  Latency min  Latency max  Iteration Latency  II achieved  II target  Trip Count  Pipelined
memcpy b   101          101          3                  1            1          100         yes
memcpy c   101          101          3                  1            1          100         yes
loop       900          900          9                  -            -          100         no
memcpy a   101          101          3                  1            1          100         yes

70

V1 Analysis

71

V1 Analysis

72

V1 Analysis

73

V2 Implementation

74

V2 Implementation Log

75

V2 Implementation Report
Clock   Target  Estimated  Uncertainty
ap_clk  10.00   8.75       1.25
Latency min  Latency max  Interval min  Interval max  Type
435          435          436           436           none
Name         BRAM_18K  DSP48E  FF    LUT
Available    2060      2800    2393  2618
Utilization  ~0%       ~0%     ~0%   ~0%

76

V2 Implementation Loops
Loop latency and initiation interval (II)
Loop Name  Latency min  Latency max  Iteration Latency  II achieved  II target  Trip Count  Pipelined
memcpy b   101          101          3                  1            1          100         yes
memcpy c   101          101          3                  1            1          100         yes
loop       107          107          9                  1            1          100         yes
memcpy a   101          101          3                  1            1          100         yes
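The loop latencies in the V1 and V2 reports are consistent with a simple model, sketched below as an assumption rather than as the tool's documented formula: a non-pipelined loop costs the iteration latency times the trip count, while a pipelined loop costs the iteration latency plus II cycles for each additional iteration, minus one. The function names are hypothetical.

```cpp
// Non-pipelined loop: each iteration waits for the previous one to finish.
int sequential_latency(int iteration_latency, int trip_count) {
    return iteration_latency * trip_count;
}

// Pipelined loop: a new iteration starts every `ii` cycles. The trailing
// -1 is an assumption chosen to match the reported figures.
int pipelined_latency(int iteration_latency, int ii, int trip_count) {
    return iteration_latency + ii * (trip_count - 1) - 1;
}
```

With an iteration latency of 9 and 100 iterations, this gives 900 cycles for the non-pipelined V1 loop and 107 cycles for the pipelined V2 loop; with an iteration latency of 3 it gives the 101 cycles reported for each memcpy.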

77

V3 Implementation

78

V3 Implementation Report
Clock   Target  Estimated  Uncertainty
ap_clk  10.00   8.75       1.25
Latency min  Latency max  Interval min  Interval max  Type
326          326          327           327           none
Name         BRAM_18K  DSP48E  FF    LUT
Available    2060      2800    2393  2618
Utilization  ~0%       ~0%     ~0%   ~0%

79

V3 Implementation Loops
Loop latency and initiation interval (II)
Loop Name  Latency min  Latency max  Iteration Latency  II achieved  II target  Trip Count  Pipelined
loop       107          107          9                  1            1          100         yes
memcpy a   101          101          3                  1            1          100         yes
Module latency and initiation interval
Module    Latency min  Latency max  Interval min  Interval max  Type
myMemcpy  109          109          109           109           None
myMemcpy  109          109          109           109           None

80

V3 Implementation Analysis
• However, our board has only one physical memory port, so the data transfers cannot actually run in parallel

81

V4 Implementation

82

V4 Implementation Report
Clock   Target  Estimated  Uncertainty
ap_clk  10.00   8.75       1.25
Latency min  Latency max  Interval min  Interval max  Type
373          373          374           374           none
Name         BRAM_18K  DSP48E  FF    LUT
Available    2060      2800    2393  2618
Utilization  ~0%       ~0%     ~0%   1%

83

V4 Implementation Loops
Loop latency and initiation interval (II)
Loop Name  Latency min  Latency max  Iteration Latency  II achieved  II target  Trip Count  Pipelined
memcpy b   101          101          3                  1            1          100         yes
memcpy c   101          101          3                  1            1          100         yes
memcpy a   101          101          3                  1            1          100         yes

84

V4 Implementation Analysis

85

Insert Directive for a_local

86

Array Directives

87

Array Partition Directive

88

Array a_local directive

89

V5 Implementation

90

V5 Implementation Report
Clock   Target  Estimated  Uncertainty
ap_clk  10.00   8.75       1.25
Latency min  Latency max  Interval min  Interval max  Type
320          320          321           321           none
Name         BRAM_18K  DSP48E  FF    LUT
Available    2060      2800    2393  2618
Utilization  ~0%       17%     8%    12%

91

V5 Implementation Loops
Loop latency and initiation interval (II)
Loop Name  Latency min  Latency max  Iteration Latency  II achieved  II target  Trip Count  Pipelined
memcpy b   101          101          3                  1            1          100         yes
memcpy c   101          101          3                  1            1          100         yes
memcpy a   101          101          3                  1            1          100         yes

92

V5 Implementation Analysis

93

V5* Implementation

94

V5* Implementation Report
Clock   Target  Estimated  Uncertainty
ap_clk  10.00   8.75       1.25
Latency min  Latency max  Interval min  Interval max  Type
508          508          509           509           none
Name         BRAM_18K  DSP48E  FF    LUT
Available    2060      2800    2393  2618
Utilization  ~0%       28%     13%   19%

95

V5* Implementation Loops
Loop latency and initiation interval (II)
Loop Name  Latency min  Latency max  Iteration Latency  II achieved  II target  Trip Count  Pipelined
memcpy b   160          160          2                  1            1          100         yes
memcpy c   160          160          2                  1            1          100         yes
memcpy a   160          160          2                  1            1          100         yes

96

V6 Implementation

97

V6 Implementation Report
Clock   Target  Estimated  Uncertainty
ap_clk  10.00   8.75       1.25
Latency min  Latency max  Interval min  Interval max  Type
75           75           76            76            none
Name         BRAM_18K  DSP48E  FF    LUT
Available    2060      2800    2393  2618
Utilization  6%        2%      2%    3%

98

V6 Implementation Loops
Loop latency and initiation interval (II)
Loop Name  Latency min  Latency max  Iteration Latency  II achieved  II target  Trip Count  Pipelined
memcpy b   11           11           3                  1            1          10          yes
memcpy c   11           11           3                  1            1          10          yes
loop1      17           17           9                  1            1          10          yes
memcpy a   11           11           3                  1            1          10          yes

99

V7 Implementation

100

Streaming Interface
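A sketch of how a streaming version of the kernel might declare its interfaces, assuming plain array arguments mapped to AXI-Stream ports with the axis interface mode. The function name, the element type and the bundle name are hypothetical; the length of 160 matches the trip count in the V7 loop report. Each stream element is read or written exactly once and in order, which is what a stream requires.

```cpp
const int N = 160;  // trip count shown in the V7 loop report

// Hypothetical streaming kernel: no local buffers and no memcpy, the data
// flows through the pipelined loop one element per cycle.
void vadd_stream(int a[N], const int b[N], const int c[N], int d) {
#pragma HLS INTERFACE axis port=a
#pragma HLS INTERFACE axis port=b
#pragma HLS INTERFACE axis port=c
#pragma HLS INTERFACE s_axilite port=d bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
loop:
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        a[i] = b[i] + c[i] * d;
    }
}
```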

101

V7 Testbench

102

V7 Implementation Report
Clock   Target  Estimated  Uncertainty
ap_clk  10.00   8.46       1.25
Latency min  Latency max  Interval min  Interval max  Type
169          169          170           170           none
Name         BRAM_18K  DSP48E  FF    LUT
Available    2060      2800    2393  2618
Utilization  0%        ~0%     ~0%   ~0%

103

V7 Implementation Loops
Loop latency and initiation interval (II)
Loop Name  Latency min  Latency max  Iteration Latency  II achieved  II target  Trip Count  Pipelined
loop       166          166          8                  1            1          160         yes

104

V8 Implementation

105

V8 Streaming Interface

106

V8 Testbench

107

V8 Implementation Report
Clock   Target  Estimated  Uncertainty
ap_clk  10.00   8.46       1.25
Latency min  Latency max  Interval min  Interval max  Type
19           19           20            20            none
Name         BRAM_18K  DSP48E  FF    LUT
Available    2060      2800    2393  2618
Utilization  0%        2%      1%    1%

108

V8 Implementation Loops
Loop latency and initiation interval (II)
Loop Name  Latency min  Latency max  Iteration Latency  II achieved  II target  Trip Count  Pipelined
loop       16           16           8                  1            1          10          yes

109

Export IP

110

Export IP Settings

111

Export IP Done

112

This is only the beginning!!
For more information, read the Vivado HLS manual:
https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_2/ug902-vivado-high-level-synthesis.pdf

113

Feedback
• We are working on improving this course. Would you share your feedback on this lesson?
https://goo.gl/tLcWQj

114

Thank You for Your Attention!
Lorenzo Di Tucci
lorenzo.ditucci@polimi.it
Emanuele Del Sozzo
emanuele.delsozzo@polimi.it
Marco D. Santambrogio
marco.santambrogio@polimi.it

SDAccel Design Contest: Vivado HLS