DOI: 10.1145/3508352.3549424
research-article

An MLIR-based Compiler Flow for System-Level Design and Hardware Acceleration

Published: 22 December 2022

Abstract

The generation of custom hardware accelerators for applications implemented within high-level productive programming frameworks requires considerable manual effort. To automate this process, we introduce SODA-OPT, a compiler tool that extends the MLIR infrastructure. SODA-OPT automatically searches, outlines, tiles, and pre-optimizes relevant code regions to generate high-quality accelerators through high-level synthesis. SODA-OPT can support any high-level programming framework and domain-specific language that interfaces with the MLIR infrastructure. By leveraging MLIR, SODA-OPT solves compiler optimization problems with specialized abstractions. Backend synthesis tools connect to SODA-OPT through progressive intermediate representation lowerings. SODA-OPT interfaces with a design space exploration engine to identify the combination of compiler optimization passes and options that provides high-performance generated designs for different backends and targets. We demonstrate the practical applicability of the compilation flow by exploring the automatic generation of accelerators for deep neural network operators outlined at arbitrary granularity and by combining outlining with tiling on large convolution layers. Experimental results with kernels from the PolyBench benchmark show that our high-level optimizations improve execution delays of synthesized accelerators by up to 60x. We also show that, for the selected kernels, our solution outperforms the current state of the art in more than 70% of the benchmarks and provides better average speedup in 55% of them. SODA-OPT is an open source project available at https://gitlab.pnnl.gov/sodalite/soda-opt.
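
To make the tiling step mentioned in the abstract concrete, the sketch below shows generic loop tiling on a matrix multiplication in plain Python. It is not SODA-OPT code and does not use MLIR; the function names `matmul_naive` and `matmul_tiled` and the tile size are hypothetical, chosen only for illustration. The point is that the tiled version visits the same iterations in blocks, so each block touches a small working set that an accelerator could hold in local memory.

```python
def matmul_naive(a, b):
    """Reference triple loop: c[i][j] = sum over p of a[i][p] * b[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same computation with each loop split into a tile loop and an
    intra-tile loop. `tile` is an illustrative parameter, not a value
    taken from the paper."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    # Outer loops step over tile origins.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Inner loops stay inside one tile; min() handles
                # sizes that are not a multiple of the tile size.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for p in range(p0, min(p0 + tile, k)):
                            c[i][j] += a[i][p] * b[p][j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
    # With integer inputs the two versions produce identical results.
    print(matmul_tiled(a, b, tile=2) == matmul_naive(a, b))
```

In a flow like the one described, the intra-tile loop nest would be the natural candidate for outlining into a separate kernel, while the tile loops remain on the host side; that split is an assumption of this sketch rather than a detail stated in the abstract.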

Information & Contributors

Information

Published In

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
October 2022
1467 pages
ISBN:9781450392174
DOI:10.1145/3508352
© 2022 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

In-Cooperation

  • IEEE-EDS: Electronic Devices Society
  • IEEE CAS
  • IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. HLS
  2. MLIR
  3. compilers
  4. high-level optimizations

Qualifiers

  • Research-article

Conference

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design
October 30 - November 3, 2022
San Diego, California

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 465
  • Downloads (Last 6 weeks): 51
Reflects downloads up to 09 Nov 2024

Citations

Cited By

  • (2024) HIDA: A Hierarchical Dataflow Compiler for High-Level Synthesis. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, 215-230. DOI: 10.1145/3617232.3624850. Online publication date: 27-Apr-2024
  • (2024) Ph.D. Project: A Compiler-Driven Approach to HW/SW Co-Design of Deep-Learning Accelerators. 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 237-238. DOI: 10.1109/FCCM60383.2024.00053. Online publication date: 5-May-2024
  • (2024) AXI4MLIR: User-Driven Automatic Host Code Generation for Custom AXI-Based Accelerators. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization, 143-157. DOI: 10.1109/CGO57630.2024.10444801. Online publication date: 2-Mar-2024
  • (2024) Towards Automated Generation of Chiplet-Based Systems Invited Paper. 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), 771-776. DOI: 10.1109/ASP-DAC58780.2024.10473980. Online publication date: 22-Jan-2024
  • (2024) Modern High-Level Synthesis: Improving Productivity with a Multi-level Approach. Special Topics in Information Technology, 15-25. DOI: 10.1007/978-3-031-51500-2_2. Online publication date: 20-Mar-2024
  • (2024) Designing a Graphics Accelerator with Heterogeneous Architecture. High-Performance Computing Systems and Technologies in Scientific Research, Automation of Control and Production, 29-40. DOI: 10.1007/978-3-031-51057-1_3. Online publication date: 26-Jan-2024
  • (2023) Stencil-HMLS: A multi-layered approach to the automatic optimisation of stencil codes on FPGA. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 556-565. DOI: 10.1145/3624062.3624543. Online publication date: 12-Nov-2023
  • (2023) High-level Synthesis for Domain Specific Computing. Proceedings of the 2023 International Symposium on Physical Design, 211-219. DOI: 10.1145/3569052.3580027. Online publication date: 26-Mar-2023
  • (2023) ML-CGRA: An Integrated Compilation Framework to Enable Efficient Machine Learning Acceleration on CGRAs. 2023 60th ACM/IEEE Design Automation Conference (DAC), 1-6. DOI: 10.1109/DAC56929.2023.10247873. Online publication date: 9-Jul-2023
