research-article

AHA: An Agile Approach to the Design of Coarse-Grained Reconfigurable Accelerators and Compilers

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 2

Article No.: 35, Pages 1 - 34

https://doi.org/10.1145/3534933

Published: 24 January 2023

Abstract

With the slowing of Moore’s law, computer architects have turned to domain-specific hardware specialization to continue improving the performance and efficiency of computing systems. However, specialization typically entails significant modifications to the software stack to properly leverage the updated hardware. The lack of a structured approach for updating the compiler and the accelerator in tandem has impeded many attempts to systematize this procedure. We propose a new approach to enable flexible and evolvable domain-specific hardware specialization based on coarse-grained reconfigurable arrays (CGRAs). Our agile methodology employs a combination of new programming languages and formal methods to automatically generate the accelerator hardware and its compiler from a single source of truth. This enables the creation of design-space exploration frameworks that automatically generate accelerator architectures that approach the efficiencies of hand-designed accelerators, with a significantly lower design effort for both hardware and compiler generation. Our current system accelerates dense linear algebra applications but is modular and can be extended to support other domains. Our methodology has the potential to significantly improve the productivity of hardware-software engineering teams and enable quicker customization and deployment of complex accelerator-rich computing systems.

Published In

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 2

March 2023

560 pages

ISSN: 1539-9087

EISSN: 1558-3465

DOI: 10.1145/3572826

Editor: Tulika Mitra, National University of Singapore, Singapore

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 24 January 2023

Online AM: 07 July 2022

Accepted: 30 April 2022

Revised: 12 March 2022

Received: 18 October 2021

Published in TECS Volume 22, Issue 2

Qualifiers

Research-article
Refereed

Funding Sources

DSSoC DARPA
Stanford AHA Agile Hardware Center and Affiliates Program
Intel’s Science and Technology Center (ISTC)
Stanford SystemX Alliance

Bibliometrics & Citations

Article Metrics

Total Citations: 10
Total Downloads: 3,537
Downloads (Last 12 months): 1,676
Downloads (Last 6 weeks): 117

Reflects downloads up to 18 Aug 2024

Citations

Cited By

de Bruin B, Vadivel K, Wijtvliet M, Jääskeläinen P, Corporaal H. (2024). R-Blocks: an Energy-Efficient, Flexible, and Programmable CGRA. ACM Transactions on Reconfigurable Technology and Systems 17:2 (1-34). Online publication date: 10-May-2024. https://dl.acm.org/doi/10.1145/3656642
Chen S, Cai C, Zheng S, Li J, Zhu G, Li J, Yan Y, Dai Y, Yin W, Wang L. (2024). HierCGRA: A Novel Framework for Large-scale CGRA with Hierarchical Modeling and Automated Design Space Exploration. ACM Transactions on Reconfigurable Technology and Systems 17:2 (1-31). Online publication date: 10-May-2024. https://dl.acm.org/doi/10.1145/3656176
Qiu Y, Mao Y, Gao X, Chen S, Li J, Yin W, Wang L. (2024). FDRA: A Framework for a Dynamically Reconfigurable Accelerator Supporting Multi-Level Parallelism. ACM Transactions on Reconfigurable Technology and Systems 17:1 (1-26). Online publication date: 27-Jan-2024. https://dl.acm.org/doi/10.1145/3614224
Feng K, Kong T, Koul K, Melchert J, Carsello A, Liu Q, Nyengele G, Strange M, Zhang K, Nayak A, Setter J, Thomas J, Sreedhar K, Chen P, Bhagdikar N, Myers Z, D’Agostino B, Joshi P, Richardson S, Torng C, Horowitz M, Raina P. (2024). Amber: A 16-nm System-on-Chip With a Coarse-Grained Reconfigurable Array for Flexible Acceleration of Dense Linear Algebra. IEEE Journal of Solid-State Circuits 59:3 (947-959). Online publication date: Mar-2024. https://doi.org/10.1109/JSSC.2023.3313116
Gao X, Qiu Y, Dai Y, Yin W, Wang L. (2024). A CGRA Front-end Compiler Enabling Extraction of General Control and Dedicated Operators. 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC) (799-804). Online publication date: 22-Jan-2024. https://doi.org/10.1109/ASP-DAC58780.2024.10473891
Torng C. (2023). Building First-Order Energy Modeling Intuition in Computer Architecture Lectures. Proceedings of the Workshop on Computer Architecture Education (26-33). Online publication date: 17-Jun-2023. https://dl.acm.org/doi/10.1145/3605507.3610632
Dai Y, Li J, Zhu Q, Qiu Y, Hu Y, Yin W, Wang L. (2023). HETA: A Heterogeneous Temporal CGRA Modeling and Design Space Exploration via Bayesian Optimization. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 32:3 (505-518). Online publication date: 25-Dec-2023. https://dl.acm.org/doi/10.1109/TVLSI.2023.3344536
Peng B, Sun S, Dai Y, Li J, Qiu Y, Wang K, Yin W, Wang L. (2023). PRAD: A Bayesian Optimization-based DSE Framework for Parameterized Reconfigurable Architecture Design. 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (226-226). Online publication date: May-2023. https://doi.org/10.1109/FCCM57271.2023.00054
Liu S, Weng J, Kupsh D, Sohrabizadeh A, Wang Z, Guo L, Liu J, Zhulin M, Mani R, Zhang L, Cong J, Nowatzki T, Hardavellas N, Campanoni S, Grot B, Karpuzcu U. (2022). OverGen: Improving FPGA Usability through Domain-Specific Overlay Generation. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture (35-56). Online publication date: 1-Oct-2022. https://dl.acm.org/doi/10.1109/MICRO56248.2022.00018
Park D, Xiao Y, DeHon A. (2022). Fast and Flexible FPGA Development using Hierarchical Partial Reconfiguration. 2022 International Conference on Field-Programmable Technology (ICFPT) (1-10). Online publication date: 5-Dec-2022. https://doi.org/10.1109/ICFPT56656.2022.9974201
