
Design of Synthesis-time Vectorized Arithmetic Hardware for Tapered Floating-point Addition and Subtraction

Published: 22 March 2023
Abstract

Energy efficiency has become the new performance criterion in this era of pervasive embedded computing; thus, accelerator-rich multi-processor systems-on-chip are commonly used in embedded computing hardware. Having gained considerable traction, computationally intensive machine learning applications are now deployed in many application domains owing to abundant and cheaply available computational capacity. In addition, there is a growing trend toward developing hardware accelerators for machine learning on embedded edge devices, where performance and energy efficiency are critical. Although these hardware accelerators frequently use floating-point operations for accuracy, reduced-width floating-point formats are also used to reduce hardware complexity and, thus, power consumption while maintaining accuracy. Vectorization can further improve performance, energy efficiency, and memory bandwidth. In this article, we propose the design of a vectorized floating-point adder/subtractor that supports arbitrary-length floating-point formats with varying exponent and mantissa widths. In comparison to existing designs in the literature, the proposed design is 2.57× more area-efficient and 1.56× more power-efficient, and it supports true vectorization with no restrictions on exponent and mantissa widths.
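
To make the idea of a tapered format concrete, below is a minimal software sketch of decoding and adding Morris-style tapered floating-point words, in which a small pointer field trades mantissa precision for exponent range. The 16-bit word size, the 2-bit pointer field G, the G-to-exponent-width mapping, and the helper names (decode, encode_fields, tapered_add) are illustrative assumptions for this sketch only; they are not the synthesis-time format parameters or the RTL of the proposed hardware, and zero, subnormal, and rounding handling are omitted.

```python
# Minimal sketch of a Morris-style tapered floating-point word (assumed layout,
# for illustration only; not the article's hardware format).
#
# Layout (MSB to LSB): sign (1) | G (2) | exponent (G+1 bits) | mantissa (rest)
# The pointer field G selects how many of the remaining payload bits hold the
# exponent, so mantissa precision "tapers" as the dynamic range grows.

WORD_BITS = 16
G_BITS = 2                               # pointer-field width (assumption)
PAYLOAD_BITS = WORD_BITS - 1 - G_BITS    # bits shared by exponent and mantissa

def decode(word: int) -> float:
    """Decode a tapered word into a Python float (no zero/subnormal handling)."""
    sign = (word >> (WORD_BITS - 1)) & 0x1
    g = (word >> PAYLOAD_BITS) & ((1 << G_BITS) - 1)
    exp_bits = g + 1                      # assumed G-to-exponent-width mapping
    man_bits = PAYLOAD_BITS - exp_bits
    exponent = (word >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = word & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    # Normalized value with an implicit leading 1, as in IEEE-style formats.
    value = (1.0 + mantissa / (1 << man_bits)) * 2.0 ** (exponent - bias)
    return -value if sign else value

def encode_fields(sign: int, g: int, exponent: int, mantissa: int) -> int:
    """Pack fields into a word; caller must respect the widths implied by g."""
    man_bits = PAYLOAD_BITS - (g + 1)
    return (sign << (WORD_BITS - 1)) | (g << PAYLOAD_BITS) | (exponent << man_bits) | mantissa

def tapered_add(a: int, b: int) -> float:
    """Reference addition via decoding; the hardware instead extracts each
    operand's exponent/mantissa split, aligns mantissas, adds, and renormalizes."""
    return decode(a) + decode(b)

if __name__ == "__main__":
    a = encode_fields(0, g=1, exponent=2, mantissa=0b01000000000)  # 1.5 * 2^1 = 3.0
    b = encode_fields(0, g=1, exponent=1, mantissa=0)              # 1.0 * 2^0 = 1.0
    print(decode(a), decode(b), tapered_add(a, b))                 # 3.0 1.0 4.0
```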


      Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 28, Issue 3
May 2023, 456 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/3587887

      Publisher

Association for Computing Machinery, New York, NY, United States

      Publication History

      Published: 22 March 2023
      Online AM: 08 October 2022
      Accepted: 23 September 2022
      Revised: 04 July 2022
      Received: 16 March 2022
      Published in TODAES Volume 28, Issue 3

      Author Tags

      1. Hardware accelerators
      2. floating-point hardware
      3. digital circuits
      4. register transfer level (RTL)
      5. application specific integrated circuits (ASICs)
      6. intellectual property (IP)

