Let Coarse-Grained Resources Be Shared: Mapping Entire Neural Networks on FPGAs

Published: 09 September 2023

Abstract

Traditional High-Level Synthesis (HLS) enables rapid prototyping of hardware accelerators without coding in Hardware Description Languages (HDLs). However, this approach does not cope well with mapping large applications, such as entire deep neural networks, onto a single Field-Programmable Gate Array (FPGA) device: it produces designs that are inefficient or that do not fit onto the FPGA due to resource constraints.
This work proposes to shrink generated designs through coarse-grained resource control based on function sharing in functional Intermediate Representations (IRs). The proposed compiler passes and rewrite system aim to produce valid design points and to remove redundant hardware. These optimizations make it feasible to fit entire neural networks onto FPGAs and yield performance competitive with running specialized kernels for each layer.
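
The central idea, folding repeated operators in a functional IR into a single, time-multiplexed hardware instance, can be illustrated with a toy rewrite. The following Scala sketch is a hypothetical illustration: the Expr IR, the Shared node, and the share pass are invented for this example and are not the paper's actual implementation. Repeated applications of the same function collapse into one Shared node, modelling one hardware module reused across layers rather than one module per call site.

// Hypothetical toy IR; not the paper's actual representation.
sealed trait Expr
case class Input(name: String) extends Expr
// Apply: each occurrence would normally become its own hardware instance.
case class Apply(fun: String, arg: Expr) extends Expr
// Shared: a single instance of `fun`, time-multiplexed over several inputs.
case class Shared(fun: String, args: Seq[Expr]) extends Expr

object ShareRewrite {
  // Fold repeated applications of the same function into one Shared node,
  // modelling the removal of redundant hardware through function sharing.
  def share(pipeline: Seq[Expr]): Seq[Expr] = {
    val apps = pipeline.collect { case a: Apply => a }
    val rest = pipeline.filterNot(apps.contains)
    val rewritten = apps.groupBy(_.fun).toSeq.map {
      case (_, Seq(only)) => only                       // used once: keep as-is
      case (f, uses)      => Shared(f, uses.map(_.arg)) // used n times: share
    }
    rewritten ++ rest
  }

  def main(args: Array[String]): Unit = {
    // Three layers, two of which reuse the same convolution operator.
    val net: Seq[Expr] = Seq(
      Apply("conv3x3", Input("x0")),
      Apply("conv3x3", Input("x1")),
      Apply("relu",    Input("x2")))
    share(net).foreach(println)
    // Prints Shared(conv3x3,List(Input(x0), Input(x1))) and Apply(relu,Input(x2)).
  }
}

In a real flow, a Shared node would then be lowered to one HLS kernel plus a small scheduler that multiplexes its inputs; automating that kind of coarse-grained control is what the compiler passes described above provide.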


Published In

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 5s (Special Issue: ESWEEK 2023), October 2023, 1394 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3614235
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 September 2023
Accepted: 13 July 2023
Revised: 02 June 2023
Received: 23 March 2023
Published in TECS Volume 22, Issue 5s

Author Tags

  1. High-level synthesis
  2. neural networks
  3. functional IRs
  4. rewrite rules

Qualifiers

  • Research-article
