research-article
Open access

Application-Specific Arithmetic in High-Level Synthesis Tools

Published: 04 March 2020

Abstract

This work studies hardware-specific optimization opportunities currently unexploited by high-level synthesis compilers. Some of these optimizations are specializations of floating-point operations that respect the usual semantics of the input program and do not change the numerical result. Other optimizations, triggered locally by the programmer through a pragma, assume a different semantics, in which floating-point code is interpreted as the specification of a computation on real numbers. The compiler is then responsible for meeting an application-level accuracy constraint expressed in the pragma, and is free to use non-standard arithmetic hardware when it is more efficient. Both classes of optimizations are prototyped in the GeCoS source-to-source compiler and evaluated on the Polybench and EEMBC benchmark suites. Latency is reduced by up to 93%, and resource usage is reduced by up to 58%.
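
As an illustration of why the second class of optimizations requires a change of semantics, the following Python sketch contrasts a standard floating-point accumulation, which rounds after every addition, with an accumulation in a wide fixed-point register that is rounded only once at the end. It is not taken from the article: the function names `naive_sum` and `fixed_point_sum` and the 60-bit fractional width are assumptions chosen for the example, and the code is a software model of the idea rather than the hardware or the pragma the article describes.

```python
from fractions import Fraction


def naive_sum(values):
    """Left-to-right binary64 accumulation: one rounding per addition."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def fixed_point_sum(values, frac_bits=60):
    """Accumulate in an unbounded integer holding `frac_bits` fractional bits.

    Hypothetical model of a wide fixed-point accumulator: each input is
    converted exactly to a rational, scaled, truncated to an integer, and
    added without rounding. The result is rounded to a float only once.
    """
    scale = 1 << frac_bits
    acc = 0
    for v in values:
        acc += int(Fraction(v) * scale)  # bits below 2**-frac_bits are dropped
    return float(Fraction(acc, scale))


values = [1e16, 1.0, -1e16]
print(naive_sum(values))        # 0.0: the 1.0 is absorbed when added to 1e16
print(fixed_point_sum(values))  # 1.0: the real-number result
```

The two results differ, so a compiler that must preserve standard floating-point semantics cannot replace the first loop with the second. Under a real-number interpretation with a stated accuracy target, the second form becomes a legal and potentially cheaper implementation.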

Published In

ACM Transactions on Architecture and Code Optimization, Volume 17, Issue 1
March 2020
206 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3386454
© 2020 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 March 2020
Accepted: 01 December 2019
Revised: 01 December 2019
Received: 01 July 2019
Published in TACO Volume 17, Issue 1

Author Tags

  1. High-level synthesis
  2. computer arithmetic
  3. floating point
  4. operator specialization

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 268
  • Downloads (Last 6 weeks): 40
Reflects downloads up to 21 Sep 2024

Cited By

  • (2024) CuFP: An HLS Library for Customized Floating-Point Operators. Electronics 13:14 (2838). DOI: 10.3390/electronics13142838. Online publication date: 18-Jul-2024
  • (2023) X-Ray Tomography Reconstruction Accelerated on FPGA Through High-Level Synthesis Tools. IEEE Transactions on Biomedical Circuits and Systems 17:2 (375-389). DOI: 10.1109/TBCAS.2023.3258879. Online publication date: Apr-2023
  • (2023) Floating-Point Accumulation and Sum of Products. Application-Specific Arithmetic (623-640). DOI: 10.1007/978-3-031-42808-1_21. Online publication date: 23-Aug-2023
  • (2023) Specialization and Fusion of Floating-Point Operators. Application-Specific Arithmetic (453-474). DOI: 10.1007/978-3-031-42808-1_15. Online publication date: 23-Aug-2023
  • (2022) Templatized Fused Vector Floating-Point Dot Product for High-Level Synthesis. Journal of Low Power Electronics and Applications 12:4 (56). DOI: 10.3390/jlpea12040056. Online publication date: 17-Oct-2022
  • (2022) A single-source C++20 HLS flow for function evaluation on FPGA and beyond. Proceedings of the 12th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies (51-58). DOI: 10.1145/3535044.3535051. Online publication date: 9-Jun-2022
  • (2022) Hardware implementation of SLAM algorithms: a survey on implementation approaches and platforms. Artificial Intelligence Review 56:7 (6187-6239). DOI: 10.1007/s10462-022-10310-5. Online publication date: 23-Nov-2022
