DOI: 10.5555/3199700.3199792

HLscope+: fast and accurate performance estimation for FPGA HLS

Published: 13 November 2017

Abstract

High-level synthesis (HLS) tools have vastly increased the productivity of field-programmable gate array (FPGA) programmers through design automation and abstraction. A side effect, however, is that many architectural details are hidden from programmers. As a result, programmers who wish to improve the performance of their designs often have difficulty identifying the performance bottleneck. Current HLS tools do provide a performance estimate when the loop count is fixed, but they often fail to do so for programs with input-dependent execution behavior. Also, their external memory latency model does not accurately fit the actual bus-based shared memory architecture. This work describes a high-level cycle estimation methodology that addresses these problems. To reduce the time overhead, we propose a cycle estimation process that is combined with HLS software simulation. We also present an automatic code instrumentation technique that accurately identifies the cause of stalls in on-board execution. Experimental results show that our framework provides cycle estimates with average error rates of 1.1% and 5.0% for compute- and DRAM-bound modules, respectively, on the ADM-PCIE-7V3 board. The proposed method is about two orders of magnitude faster than FPGA bitstream generation.
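
The idea of pairing cycle estimation with software simulation can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the example loop, and the initiation interval (II) and pipeline depth values are all hypothetical. The sketch assumes the standard latency formula for a pipelined loop, (trip count - 1) * II + depth, and obtains the input-dependent trip count by simply running the loop in software.

```python
def pipelined_loop_cycles(trip_count: int, ii: int, depth: int) -> int:
    """Latency of a pipelined loop: (trip_count - 1) * II + depth.

    ii and depth are assumed to come from a static HLS report;
    trip_count is measured at run time.
    """
    if trip_count <= 0:
        return 0
    return (trip_count - 1) * ii + depth


def count_iterations(data, threshold):
    """Run an input-dependent loop in software and count its iterations.

    The loop exits early on the first element above the threshold, so
    the trip count cannot be known without the actual input.
    """
    trips = 0
    for x in data:
        trips += 1
        if x > threshold:
            break
    return trips


# Hypothetical input and hypothetical pipeline parameters (II = 1, depth = 10).
trips = count_iterations([3, 1, 4, 1, 5, 9, 2, 6], threshold=4)
estimate = pipelined_loop_cycles(trips, ii=1, depth=10)
print(f"trip count = {trips}, estimated cycles = {estimate}")
```

Because the trip count is collected during an ordinary software run, an estimate of this kind needs no bitstream generation. A real flow would also have to account for external memory stalls, which this sketch ignores.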


Published In

ICCAD '17: Proceedings of the 36th International Conference on Computer-Aided Design
November 2017
1077 pages

In-Cooperation

  • IEEE-EDS: Electronic Devices Society

Publisher

IEEE Press

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%
