ASAP: automatic synthesis of area-efficient and precision-aware CGRAs

Authors:

Jeff (Jun) Zhang,

Antonino Tumeo,

Ganesh Gopalakrishnan,

Ang Li

ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing

Article No.: 4, Pages 1 - 13

https://doi.org/10.1145/3524059.3532359

Published: 28 June 2022

Abstract

Coarse-grained reconfigurable accelerators (CGRAs) are a promising accelerator design choice that strikes a balance between performance and adaptability to different computing patterns across various application domains. Designing a CGRA for a specific application domain involves enormous software/hardware engineering effort. Recent work explores loop transformations, functional unit types, network topology, and memory size to identify optimal CGRA designs given a set of kernels from a specific application domain. Unfortunately, the impact of functional units with different precision support has rarely been investigated. To address this gap, we propose ASAP, a hardware/software co-design framework that automatically identifies and synthesizes an optimal precision-aware CGRA for a set of applications of interest. Our evaluation shows that ASAP generates specialized designs that are 3.2X, 4.21X, and 5.8X more efficient (in terms of performance per unit of energy or area) than non-specialized homogeneous CGRAs for the scientific computing, embedded, and edge machine learning domains, respectively, with limited accuracy loss. Moreover, ASAP provides more efficient designs than other state-of-the-art synthesis frameworks for specialized CGRAs.

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Reconfigurable computing
2. Hardware
  1. Electronic design automation
    1. Methodologies for EDA
      1. Software tools for EDA

Published In

ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing

June 2022

514 pages

ISBN: 9781450392815

DOI: 10.1145/3524059

General Chairs: Lawrence Rauchwerger (University of Illinois at Urbana-Champaign), Kirk Cameron (Virginia Tech)

Program Chairs: Dimitrios S. Nikolopoulos (Virginia Tech), Dionisios Pnevmatikatos (National Technical University of Athens)

Copyright © 2022 ACM.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

U.S. Department of Energy (DOE)

Conference

ICS '22

Sponsor:

SIGARCH

ICS '22: 2022 International Conference on Supercomputing

June 28 - 30, 2022

Virtual Event

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Cited By

Mou D, Wang B, Liu D. 2024. SC-CGRA: An Energy-Efficient CGRA Using Stochastic Computing. IEEE Transactions on Parallel and Distributed Systems 35, 11 (Nov. 2024), 2023--2038. https://doi.org/10.1109/TPDS.2024.3453310

Ebrahimi Z, Kumar A. 2024. GREEN: An Approximate SIMD/MIMD CGRA for Energy-Efficient Processing at the Edge. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43, 10 (Oct. 2024), 2874--2887. https://doi.org/10.1109/TCAD.2024.3383349

Jokai R, Tan C, Zhang J. 2024. Fused Functional Units for Area-Efficient CGRAs. In 2024 25th International Symposium on Quality Electronic Design (ISQED). 1--8. Online publication date: 3 Apr. 2024. https://doi.org/10.1109/ISQED60706.2024.10528780

Wu D, Chen P, Bandara T, Li Z, Mitra T. 2023. Flip: Data-centric Edge CGRA Accelerator. ACM Transactions on Design Automation of Electronic Systems 29, 1 (2023), 1--25. Online publication date: 18 Dec. 2023. https://dl.acm.org/doi/10.1145/3631118

Tan C, Patil D, Tumeo A, Weisz G, Reinhardt S, Zhang J. 2023. VecPAC: A Vectorizable and Precision-Aware CGRA. In 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD). 1--9. Online publication date: 28 Oct. 2023. https://doi.org/10.1109/ICCAD57390.2023.10323910

Liu S, Weng J, Kupsh D, Sohrabizadeh A, Wang Z, Guo L, Liu J, Zhulin M, Mani R, Zhang L, Cong J, Nowatzki T, Hardavellas N, Campanoni S, Grot B, Karpuzcu U. 2022. OverGen: Improving FPGA Usability through Domain-Specific Overlay Generation. In Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture. 35--56. Online publication date: 1 Oct. 2022. https://dl.acm.org/doi/10.1109/MICRO56248.2022.00018
