Application-Specific Arithmetic in High-Level Synthesis Tools

Authors:

Florent De Dinechin,

Steven Derrien

ACM Transactions on Architecture and Code Optimization (TACO), Volume 17, Issue 1

Article No.: 5, Pages 1 - 23

https://doi.org/10.1145/3377403

Published: 04 March 2020

Abstract

This work studies hardware-specific optimization opportunities that high-level synthesis compilers currently leave unexploited. Some of these optimizations are specializations of floating-point operations that respect the usual semantics of the input program and do not change the numerical result. Others, triggered locally by the programmer through a pragma, assume a different semantics in which floating-point code is interpreted as the specification of a computation on real numbers. The compiler is then responsible for ensuring an application-level accuracy constraint expressed in the pragma, and it is free to use non-standard arithmetic hardware when this is more efficient. Both classes of optimizations are prototyped in the GeCoS source-to-source compiler and evaluated on the Polybench and EEMBC benchmark suites. Latency is reduced by up to 93%, and resource usage by up to 58%.
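
As a rough illustration of the two classes of optimizations mentioned in the abstract, the following Python sketch contrasts them on double-precision values. It is not taken from the article, which targets code compiled by HLS tools, and all function names are hypothetical. `times_two_by_exponent` stands for a specialization that leaves the numerical result unchanged: multiplying by 2 only increments the exponent. `sequential_sum` and `real_number_sum` suggest why the second class needs a different semantics: a left-to-right floating-point sum rounds after every addition, whereas `math.fsum` returns the correctly rounded value of the real-number sum.

```python
import math


def times_two_by_exponent(x: float) -> float:
    """Multiply by 2 by incrementing the exponent (no multiplier needed).

    Assumption for this sketch: x is finite and 2*x does not overflow.
    """
    return math.ldexp(x, 1)


def sequential_sum(values):
    """Left-to-right accumulation: the usual floating-point semantics,
    with one rounding after every addition."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def real_number_sum(values):
    """Sum interpreted over the reals, rounded once at the end."""
    return math.fsum(values)


if __name__ == "__main__":
    # Class 1: the specialized operation gives the same result as x * 2.0.
    for x in (0.1, -3.75, 1e-300):
        assert times_two_by_exponent(x) == x * 2.0

    # Class 2: the two semantics can disagree on the same input.
    values = [2.0**60, 1.0, -(2.0**60)]
    print("sequential:", sequential_sum(values))
    print("real-number:", real_number_sum(values))
```

On this input the sequential sum loses the `1.0` term because `2.0**60 + 1.0` is not representable in double precision, while the real-number sum keeps it. A compiler that silently switched from one behaviour to the other would change the program's output, which is presumably why the article ties such transformations to an explicit pragma.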

Index Terms

Application-Specific Arithmetic in High-Level Synthesis Tools
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. High-level language architectures
2. Hardware
  1. Electronic design automation
    1. Hardware description languages and compilation
  2. Integrated circuits
    1. Logic circuits
      1. Arithmetic and datapath circuits
    2. Reconfigurable logic and FPGAs
      1. Reconfigurable logic applications

Published In

ACM Transactions on Architecture and Code Optimization Volume 17, Issue 1

March 2020

206 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3386454

Editor:
Koen De Bosschere
Ghent University, Belgium

Copyright © 2020 ACM.

© 2020 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 March 2020

Accepted: 01 December 2019

Revised: 01 December 2019

Received: 01 July 2019

Published in TACO Volume 17, Issue 1

Qualifiers

Research-article
Research
Refereed

Cited By

Hajizadeh F, Ould-Bachir T, David J (2024). CuFP: An HLS Library for Customized Floating-Point Operators. Electronics 13:14 (2838). Online publication date: 18-Jul-2024. https://doi.org/10.3390/electronics13142838

Diakite D, Gac N (2023). X-Ray Tomography Reconstruction Accelerated on FPGA Through High-Level Synthesis Tools. IEEE Transactions on Biomedical Circuits and Systems 17:2 (375-389). Online publication date: Apr-2023. https://doi.org/10.1109/TBCAS.2023.3258879

de Dinechin F, Kumm M (2023). Floating-Point Accumulation and Sum of Products. In Application-Specific Arithmetic (623-640). Online publication date: 23-Aug-2023. https://doi.org/10.1007/978-3-031-42808-1_21

de Dinechin F, Kumm M (2023). Specialization and Fusion of Floating-Point Operators. In Application-Specific Arithmetic (453-474). Online publication date: 23-Aug-2023. https://doi.org/10.1007/978-3-031-42808-1_15

Filippas D, Nicopoulos C, Dimitrakopoulos G (2022). Templatized Fused Vector Floating-Point Dot Product for High-Level Synthesis. Journal of Low Power Electronics and Applications 12:4 (56). Online publication date: 17-Oct-2022. https://doi.org/10.3390/jlpea12040056

Forget L, Harnisch G, Keryell R, de Dinechin F (2022). A single-source C++20 HLS flow for function evaluation on FPGA and beyond. In Proceedings of the 12th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies (51-58). Online publication date: 9-Jun-2022. https://dl.acm.org/doi/10.1145/3535044.3535051

Eyvazpour R, Shoaran M, Karimian G (2022). Hardware implementation of SLAM algorithms: a survey on implementation approaches and platforms. Artificial Intelligence Review 56:7 (6187-6239). Online publication date: 23-Nov-2022. https://dl.acm.org/doi/10.1007/s10462-022-10310-5
