DOI: 10.1145/3626202.3637576
Research article | Open access

Table-Lookup MAC: Scalable Processing of Quantised Neural Networks in FPGA Soft Logic

Published: 02 April 2024
Abstract

Recent advancements in neural network quantisation have yielded remarkable outcomes, with three-bit networks reaching state-of-the-art full-precision accuracy on complex tasks. These achievements present valuable opportunities to accelerate neural networks by computing in reduced precision. Implementing such reduced-precision computation on FPGAs can take advantage of bit-level reconfigurability, which is not available on conventional CPUs and GPUs. Simultaneously, the high data intensity of neural network processing has inspired computing-in-memory paradigms, including on FPGA platforms. By programming the effects of trained model weights as lookup operations in soft logic, the transfer of weight data from memory units can be avoided, alleviating the memory bottleneck. However, previous methods scale poorly: their high logic utilisation restricts them to small networks or sub-networks of binary models with low accuracy. In this paper, we introduce Table-Lookup Multiply-Accumulate (TLMAC), a framework that compiles and optimises quantised neural networks for scalable lookup-based processing. TLMAC clusters unique groups of weights and maps them to lookup-based processing elements, enabling highly parallel computation while taking advantage of parameter redundancy. We further propose place-and-route algorithms to reduce LUT utilisation and routing congestion. We demonstrate that TLMAC significantly improves the scalability of previous related work: our efficient logic mapping and high degree of reuse enable entire ImageNet-scale quantised models with full-precision accuracy to be implemented via lookup-based computing on a single commercially available FPGA.
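To make the lookup-based MAC idea concrete, the following Python sketch illustrates the concept described in the abstract; it is not the paper's implementation. It assumes a group size of 6 weights (matching a 6-input FPGA LUT), 3-bit activations processed bit-serially, and a toy set of weight groups; all names (GROUP, ACT_BITS, build_lut, tlmac_dot) and values are illustrative. Each group's dot product with every possible pattern of activation bits is precomputed into a table, and identical weight groups share one table, in the spirit of the paper's clustering step.

```python
GROUP = 6     # weights per table, matching a 6-input FPGA LUT (assumed for illustration)
ACT_BITS = 3  # activation precision for the bit-serial loop (assumed)

def build_lut(weights):
    """Precompute, for every pattern of GROUP activation bits, the sum of the
    weights whose bit is set; one table read then replaces GROUP multiply-adds
    for a single activation bit-plane."""
    return tuple(
        sum(w for k, w in enumerate(weights) if (pattern >> k) & 1)
        for pattern in range(1 << len(weights))
    )

def tlmac_dot(activations, weight_groups, luts):
    """Bit-serial dot product: one table read per (group, bit-plane), then a
    shift-add that weights each plane by its binary significance."""
    total = 0
    for plane in range(ACT_BITS):
        plane_sum = 0
        for g, weights in enumerate(weight_groups):
            addr = 0  # pack this plane's activation bits into a table address
            for k in range(len(weights)):
                addr |= ((activations[g * GROUP + k] >> plane) & 1) << k
            plane_sum += luts[g][addr]
        total += plane_sum << plane
    return total

# Clustering in miniature: identical weight groups share one table, so
# repeated parameters cost no additional LUTs.
weight_groups = [(1, -2, 3, 0, 1, -1),
                 (1, -2, 3, 0, 1, -1),  # duplicate of the first group
                 (2, 2, -1, 0, 0, 1)]
tables = {}
luts = []
for grp in weight_groups:
    if grp not in tables:
        tables[grp] = build_lut(grp)
    luts.append(tables[grp])

acts = [5, 3, 0, 7, 2, 1, 4, 6, 1, 0, 3, 5, 7, 2, 0, 1, 6, 4]  # 3-bit activations
flat_w = [w for grp in weight_groups for w in grp]
assert tlmac_dot(acts, weight_groups, luts) == sum(a * w for a, w in zip(acts, flat_w))
print(f"{len(tables)} tables serve {len(weight_groups)} weight groups")
```

Run as written, the assertion confirms the table-lookup result matches a direct multiply-accumulate, and the final line reports two tables serving three groups; that reuse of identical weight groups is what the abstract credits for the approach's scalability.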



          Published In

          FPGA '24: Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays
          April 2024
          300 pages
          ISBN:9798400704185
          DOI:10.1145/3626202

          Publisher

          Association for Computing Machinery

          New York, NY, United States



          Author Tags

          1. clustering
          2. field-programmable gate array
          3. lut-based computing
          4. place & route
          5. quantised neural networks
          6. simulated annealing

          Qualifiers

          • Research-article

          Funding Sources

          • Agency for Science, Technology and Research
          • National Research Foundation Singapore, Quantum Engineering Programme 2.0 (National Quantum Computing Hub)
• Singapore Government's Research, Innovation and Enterprise 2020 Plan (Advanced Manufacturing and Engineering domain)

          Conference

          FPGA '24

          Acceptance Rates

          Overall Acceptance Rate 125 of 627 submissions, 20%
