DOI: 10.1145/3490422.3502352
short-paper
Open access

Multi-input Serial Adders for FPGA-like Computational Fabric

Published: 11 February 2022

Abstract

In this paper, we present a new functional unit to replace the LUT in an FPGA-like computational fabric designed specifically to accelerate instance-specific sparse integer matrix multiplication. We evaluate this architectural idea using a suite of matrices, the VPR place-and-route tool, and modern architecture representations of the interconnect. The new cell, called the K-ADD, increases density by 2.5x to 4x and increases performance by 8% to 30% by simultaneously increasing the clock rate and reducing the number of cycles needed to compute the product. This benefit magnifies the two-orders-of-magnitude advantage of instance-specific matrix multipliers demonstrated in prior work. We investigate the cluster size, N, across multiple technology nodes and see a sustained benefit from a larger cluster size (N=8). This observation holds for netlists mapped to both a 6-LUT and a 6-ADD, which implies that the behavior has more to do with the peculiar structure of these matrix multiplication netlists than with the choice of functional unit.
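
The abstract does not describe the internals of the K-ADD cell, so the following is only a generic sketch of multi-input bit-serial addition, the kind of operation the paper's title refers to. The function name `serial_add`, the LSB-first bit order, and the six-operand example (chosen to echo the "6-ADD" name) are assumptions made for illustration and are not taken from the paper.

```python
def serial_add(operands, width):
    """Add non-negative integers bit-serially, least significant bit first.

    Hypothetical model, not the paper's K-ADD design. Each cycle consumes
    one bit from every operand, adds those bits to the carry held over from
    the previous cycle, emits the low bit of that sum as the output bit,
    and keeps the remaining high bits as the new carry.
    """
    carry = 0
    result = 0
    cycle = 0
    # Run until all input bits are consumed and the carry has drained.
    while cycle < width or carry:
        if cycle < width:
            bits = sum((x >> cycle) & 1 for x in operands)
        else:
            bits = 0  # inputs exhausted; only the carry remains
        total = bits + carry
        result |= (total & 1) << cycle  # one output bit per cycle
        carry = total >> 1              # carry may be wider than one bit
        cycle += 1
    return result, cycle


# Six 3-bit operands summed in one serial pass.
print(serial_add([3, 5, 7, 1, 2, 6], width=3))  # (24, 5)
```

In this model, the stored carry can be wider than one bit once more than two operands are added, and the result takes a few cycles beyond the operand width while that carry drains.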

Supplementary Material

MP4 File (FPGA22-fpgasp153.mp4)
This presentation summarizes our paper on implementing sparse matrix multiplication on FPGA-like fabrics. We introduce the K-ADD, a functional unit that can replace the LUT in a programmable interconnect. The result is a computational fabric specialized for integer matrix multiplication. We evaluate the proposed device by using VPR and modern interconnect architectures to map a suite of matrices. The baseline device is a 6-LUT-based FPGA; the proposed device replaces the 6-LUTs with 6-ADDs. The 6-ADD-based device is up to 4x smaller than a conventional logic FPGA and has up to 33% higher performance.

Published In

FPGA '22: Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2022
211 pages
ISBN:9781450391498
DOI:10.1145/3490422
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bit-serial arithmetic
  2. field-programmable gate arrays
  3. instance-specific acceleration
  4. reservoir computing
  5. sparse matrix multiplication

Qualifiers

  • Short-paper

Conference

FPGA '22

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%
