DOI: 10.1145/3524059.3532359
research-article
Public Access

ASAP: automatic synthesis of area-efficient and precision-aware CGRAs

Published: 28 June 2022

Abstract

Coarse-grained reconfigurable accelerators (CGRAs) are a promising accelerator design choice that balances performance against adaptability to the different computing patterns found across application domains. Designing a CGRA for a specific application domain involves enormous software and hardware engineering effort. Recent work explores loop transformations, functional unit types, network topology, and memory size to identify optimal CGRA designs for a given set of kernels from a specific application domain. Unfortunately, the impact of functional units with different precision support has rarely been investigated. To address this gap, we propose ASAP, a hardware/software co-design framework that automatically identifies and synthesizes an optimal precision-aware CGRA for a set of applications of interest. Our evaluation shows that ASAP generates specialized designs that are 3.2X, 4.21X, and 5.8X more efficient (in terms of performance per unit of energy or area) than non-specialized homogeneous CGRAs for the scientific computing, embedded, and edge machine learning domains, respectively, with limited accuracy loss. Moreover, ASAP provides more efficient designs than other state-of-the-art synthesis frameworks for specialized CGRAs.


Published In
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing
June 2022
514 pages
ISBN: 9781450392815
DOI: 10.1145/3524059
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CGRA
  2. HW/SW co-design
  3. precision-awareness

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Cited By

  • (2024) SC-CGRA: An Energy-Efficient CGRA Using Stochastic Computing. IEEE Transactions on Parallel and Distributed Systems 35:11 (2023-2038). DOI: 10.1109/TPDS.2024.3453310. Online publication date: Nov-2024
  • (2024) GREEN: An Approximate SIMD/MIMD CGRA for Energy-Efficient Processing at the Edge. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:10 (2874-2887). DOI: 10.1109/TCAD.2024.3383349. Online publication date: Oct-2024
  • (2024) Fused Functional Units for Area-Efficient CGRAs. 2024 25th International Symposium on Quality Electronic Design (ISQED) (1-8). DOI: 10.1109/ISQED60706.2024.10528780. Online publication date: 3-Apr-2024
  • (2023) Flip: Data-centric Edge CGRA Accelerator. ACM Transactions on Design Automation of Electronic Systems 29:1 (1-25). DOI: 10.1145/3631118. Online publication date: 18-Dec-2023
  • (2023) VecPAC: A Vectorizable and Precision-Aware CGRA. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD) (1-9). DOI: 10.1109/ICCAD57390.2023.10323910. Online publication date: 28-Oct-2023
  • (2022) OverGen: Improving FPGA Usability through Domain-Specific Overlay Generation. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture (35-56). DOI: 10.1109/MICRO56248.2022.00018. Online publication date: 1-Oct-2022
