short-paper

Open access

Multi-input Serial Adders for FPGA-like Computational Fabric

Authors:

Matthew Denton

FPGA '22: Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Pages 35 - 41

https://doi.org/10.1145/3490422.3502352

Published: 11 February 2022

Abstract

In this paper, we present a new functional unit to replace the LUT in an FPGA-like computational fabric designed specifically to accelerate instance-specific sparse integer matrix multiplication. We evaluate this architectural idea using a suite of matrices, the VPR place-and-route tool, and modern architecture representations of the interconnect. The new cell, called the K-ADD, increases density by 2.5x to 4x and increases performance by 8% to 30% by simultaneously increasing the clock rate and reducing the number of cycles needed to compute the product. This benefit magnifies the two-orders-of-magnitude advantage of instance-specific matrix multipliers demonstrated in prior work. We investigate the cluster size, N, across multiple technology nodes and see a sustained benefit from a larger cluster size (N=8). This observation holds for netlists mapped to both a 6-LUT and a 6-ADD, which implies that the behavior has more to do with the peculiar structure of these matrix multiplication netlists than with the choice of functional unit.

Supplementary Material

MP4 File (FPGA22-fpgasp153.mp4)

This presentation summarizes our paper on implementing sparse matrix multiplication on FPGA-like fabrics. We introduce the K-ADD, a functional unit that can replace a LUT in a programmable interconnect. The result is a computational fabric specialized for integer matrix multiplication. We evaluate the proposed device by using VPR and modern interconnect architectures to map a suite of matrices. The baseline device is a 6-LUT-based FPGA; our proposed device replaces the 6-LUTs with 6-ADDs. The 6-ADD-based device is up to 4x smaller than a conventional logic FPGA and has up to 33% higher performance.

Published In

FPGA '22: Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

February 2022

211 pages

ISBN:9781450391498

DOI:10.1145/3490422

General Chair: Michael Adler (Intel, USA)

Program Chair: Paolo Ienne (EPFL, Switzerland)

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

FPGA '22

Sponsor:

SIGDA

FPGA '22: The 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

February 27 - March 1, 2022

Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%
