The Unified Accumulator Architecture: A Configurable, Portable, and Extensible Floating-Point Accumulator

Published: 20 May 2016

Abstract

Applications accelerated by field-programmable gate arrays (FPGAs) often require pipelined floating-point accumulators with a variety of different trade-offs. Although previous work has introduced numerous floating-point accumulation architectures, few cores are available for public use, which forces designers to use fixed-point implementations or vendor-provided cores that are not portable and are often not optimized for the desired set of trade-offs. In this article, we combine and extend previous floating-point accumulator architectures into a configurable, open-source core, referred to as the unified accumulator architecture (UAA), which enables designers to choose between different trade-offs for different applications. UAA is portable across FPGAs and allows designers to specialize the underlying adder core to take advantage of device-specific optimizations. By providing an extensible, open-source implementation, we hope for the research community to extend the provided core with new architectures and optimizations.
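The core difficulty motivating architectures like the UAA is that a pipelined floating-point adder with latency L cannot naively accumulate one input per cycle: feeding the adder's own output back creates a read-after-write hazard that stalls the pipeline. A common workaround in the reduction-circuit literature is to keep L independent partial sums in flight and combine them at the end. The sketch below is a purely illustrative software model of that interleaving idea, not the UAA implementation itself; the `latency` parameter and function name are assumptions for the example.

```python
# Illustrative model (not the UAA core): accumulating a stream through a
# pipelined adder of depth `latency`. Each of the `latency` partial sums
# only sees every latency-th input, so a new addition can issue every
# cycle without waiting for the previous result.

def pipelined_accumulate(values, latency=4):
    """Interleaved partial-sum accumulation, modeled in software."""
    partials = [0.0] * latency
    for i, v in enumerate(values):
        # Round-robin across partial sums: no feedback dependency
        # between consecutive cycles.
        partials[i % latency] += v
    # Final reduction of the surviving partial sums (in hardware,
    # a small adder tree or a drain phase).
    total = 0.0
    for p in partials:
        total += p
    return total

print(pipelined_accumulate([1.0, 2.0, 3.0, 4.0, 5.0]))  # 15.0
```

Note that interleaving reorders the additions, so the result can differ slightly from a strict sequential sum in general floating-point arithmetic; managing that trade-off between throughput, area, and accuracy is exactly the design space the UAA exposes as configuration options.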


Cited By

  • (2024) Shedding the Bits: Pushing the Boundaries of Quantization with Minifloats on FPGAs. In 2024 34th International Conference on Field-Programmable Logic and Applications (FPL), 297--303. DOI: 10.1109/FPL64840.2024.00048
  • (2023) Using FPGA Devices to Accelerate Tree-Based Genetic Programming: A Preliminary Exploration with Recent Technologies. In Genetic Programming, 182--197. DOI: 10.1007/978-3-031-29573-7_12
  • (2021) A High-Speed Floating-Point Multiply-Accumulator Based on FPGAs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 29, 10, 1782--1789. DOI: 10.1109/TVLSI.2021.3105268
  • (2020) A Tag Based Random Order Vector Reduction Circuit. IEEE Access 8, 41502--41515. DOI: 10.1109/ACCESS.2020.2976764

Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 9, Issue 3: Special Issue on Reconfigurable Components with Source Code. September 2016, 128 pages.
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/2940351
Editor: Steve Wilton

Publisher

Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 20 May 2016
    Accepted: 01 July 2015
    Revised: 01 April 2015
    Received: 01 October 2014
    Published in TRETS Volume 9, Issue 3

    Author Tags

    1. FPGA
    2. floating-point accumulation
    3. reduction circuits

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Science Foundation
