Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

AHA: An Agile Approach to the Design of Coarse-Grained Reconfigurable Accelerators and Compilers

Published: 24 January 2023 Publication History

Abstract

With the slowing of Moore’s law, computer architects have turned to domain-specific hardware specialization to continue improving the performance and efficiency of computing systems. However, specialization typically entails significant modifications to the software stack to properly leverage the updated hardware. The lack of a structured approach for updating the compiler and the accelerator in tandem has impeded many attempts to systematize this procedure. We propose a new approach to enable flexible and evolvable domain-specific hardware specialization based on coarse-grained reconfigurable arrays (CGRAs). Our agile methodology employs a combination of new programming languages and formal methods to automatically generate the accelerator hardware and its compiler from a single source of truth. This enables the creation of design-space exploration frameworks that automatically generate accelerator architectures that approach the efficiencies of hand-designed accelerators, with a significantly lower design effort for both hardware and compiler generation. Our current system accelerates dense linear algebra applications but is modular and can be extended to support other domains. Our methodology has the potential to significantly improve the productivity of hardware-software engineering teams and enable quicker customization and deployment of complex accelerator-rich computing systems.

References

[1]
Alon Amid, David Biancolin, Abraham Gonzalez, Daniel Grubb, Sagar Karandikar, Harrison Liew, Albert Magyar, Howard Mao, Albert Ou, Nathan Pemberton, Paul Rigge, Colin Schmidt, John Wright, Jerry Zhao, Yakun Sophia Shao, Krste Asanović, and Borivoje Nikolić. 2020. Chipyard: Integrated design, simulation, and implementation framework for custom SoCs. IEEE Micro 40, 4 (2020), 10–21.
[2]
Rick Bahr, Clark Barrett, Nikhil Bhagdikar, Alex Carsello, Ross Daly, Caleb Donovick, David Durst, Kayvon Fatahalian, Kathleen Feng, Pat Hanrahan, Teguh Hofstee, Mark Horowitz, Dillon Huff, Fredrik Kjolstad, Taeyoung Kong, Qiaoyi Liu, Makai Mann, Jackson Melchert, Ankita Nayak, Aina Niemetz, Gedeon Nyengele, Priyanka Raina, Stephen Richardson, Raj Setaluri, Jeff Setter, Kavya Sreedhar, Maxwell Strange, James Thomas, Christopher Torng, Leonard Truong, Nestan Tsiskaridze, and Keyi Zhang. 2020. Creating an agile hardware design flow. In 57th ACM/IEEE Design Automation Conference (DAC’20). 1–6.
[3]
Clark Barrett, Pascal Fontaine, and Cesare Tinelli. 2016. The Satisfiability Modulo Theories Library (SMT-LIB). www.SMT-LIB.org.
[4]
Andrew Canis, Jongsok Choi, Mark Aldham, Victor Zhang, Ahmed Kammoona, Jason H. Anderson, Stephen Brown, and Tomasz Czajkowski. 2011. LegUp: High-level synthesis for FPGA-based processor/accelerator systems. In Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA). ACM, 33–36.
[5]
Alex Carsello, Kathleen Feng, Taeyoung Kong, Kalhan Koul, Qiaoyi Liu, Jackson Melchert, Gedeon Nyengele, Maxwell Strange, Keyi Zhang, Ankita Nayak, Jeff Setter, James Thomas, Kavya Sreedhar, Po-Han Chen, Nikhil Bhagdikar, Zachary Myers, Brandon D’Agostino, Pranil Joshi, Stephen Richardson, Rick Bahr, Christopher Torng, Mark Horowitz, and Priyanka Raina. 2022. Amber: A 367 GOPS, 538 GOPS/W 16nm SoC with a coarse-grained reconfigurable array for flexible acceleration of dense linear algebra. In 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits). 70–71.
[6]
Alex Carsello, James Thomas, Ankita Nayak, Po-Han Chen, Mark Horowitz, Priyanka Raina, and Christopher Torng. 2022. mflowgen: A modular flow generator and ecosystem for community-driven physical design. In Design Automation Conference (DAC).
[7]
Yu-Chen Chen, Sheng-Yen Chen, and Yao-Wen Chang. 2014. Efficient and effective packing and analytical placement for large-scale heterogeneous FPGAs. In 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD’14). IEEE, 647–654.
[8]
Yuze Chi, Jason Cong, Peng Wei, and Peipei Zhou. 2018. SODA: Stencil with optimized dataflow architecture. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD’18). 1–8.
[9]
S. Alexander Chin, Noriaki Sakamoto, Allan Rui, Jim Zhao, Jin Hee Kim, Yuko Hara-Azumi, and Jason Anderson. 2017. CGRA-ME: A unified framework for CGRA modelling and exploration. In IEEE 28th International Conference on Application-Specific Systems, Architectures and Processors (ASAP’17). 184–189.
[10]
Nitin Chugh, Vinay Vasista, Suresh Purini, and Uday Bondhugula. 2016. A DSL compiler for accelerating image processing pipelines on FPGAs. In Proceedings of the International Conference on Parallel Architectures and Compilation. 327–338.
[11]
Keith D. Cooper, L. Taylor Simpson, and Christopher A. Vick. 2001. Operator strength reduction. ACM Transactions on Programming Languages and Systems 23, 5 (2001), 603–625.
[12]
Ross Daly, Leonard Truong, and Pat Hanrahan. 2018. Invoking and linking generators from multiple hardware languages using CoreIR. In Workshop on Open-Source EDA Technology (WOSET’18). https://woset-workshop.github.io/PDFs/2018/a11.pdf.
[13]
Jules R. Degila and Brunilde Sanso. 2004. A survey of topologies and performance measures for large-scale networks. IEEE Communications Surveys Tutorials 6, 4 (2004), 18–31.
[14]
David Durst, Matthew Feldman, Dillon Huff, David Akeley, Ross Daly, Gilbert Louis Bernstein, Marco Patrignani, Kayvon Fatahalian, and Pat Hanrahan. 2020. Type-directed scheduling of streaming accelerators. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’20). 408–422.
[15]
Venkatraman Govindaraju, Chen-Han Ho, Tony Nowatzki, Jatin Chhugani, Nadathur Satish, Karthikeyan Sankaralingam, and Changkyu Kim. 2012. DySER: Unifying functionality and parallelism specialization for energy-efficient computing. IEEE Micro 32, 5 (2012), 38–51.
[16]
James Hegarty, John Brunhaver, Zachary DeVito, Jonathan Ragan-Kelley, Noy Cohen, Steven Bell, Artem Vasilyev, Mark Horowitz, and Pat Hanrahan. 2014. Darkroom: Compiling high-level image processing code into hardware pipelines. ACM Transactions on Graphics 33, 4, Article 144 (July2014), 11 pages.
[17]
James Hegarty, Ross Daly, Zachary DeVito, Jonathan Ragan-Kelley, Mark Horowitz, and Pat Hanrahan. 2016. Rigel: Flexible multi-rate image processing hardware. ACM Transactions on Graphics 35, 4, Article 85 (2016), 11 pages.
[18]
Ralf Hinze. 2004. An algebra of scans. In International Conference on Mathematics of Program Construction. Springer, 186–210.
[19]
Dillon Huff, Steve Dai, and Pat Hanrahan. 2021. Clockwork: Resource-efficient static scheduling for multi-rate image processing applications on FPGAs. In IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM’21). 186–194.
[21]
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia (Orlando, FL) (MM’14). ACM, New York, NY, 675–678.
[22]
Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon. 2017. In-datacenter performance analysis of a tensor processing unit. SIGARCH Comput. Archit. News 45, 2 (June2017), 1–12.
[23]
Andrew B. Kahng, Sherief Reda, and Qinke Wang. 2005. APlace: A general analytic placement framework. In Proceedings of the 2005 International Symposium on Physical Design (San Francisco, CA) (ISPD’05). ACM, New York, NY, 233–235.
[24]
Khronos® OpenCL Working Group. [n. d.]. The OpenCL™ C Specification. Retrieved July 13, 2022 from https://www.khronos.org/registry/OpenCL/specs/3.0-unified/pdf/OpenCL_C.pdf.
[25]
Fredrik Kjolstad, Peter Ahrens, Shoaib Kamil, and Saman Amarasinghe. 2019. Tensor algebra compilation with workspaces. In 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO’19). 180–192.
[26]
Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe. 2017. The tensor algebra compiler. Proceedings of the ACM on Programming Languages 1, OOPSLA, Article 77 (Oct2017), 29 pages.
[27]
David Koeplinger, Matthew Feldman, Raghu Prabhakar, Yaqi Zhang, Stefan Hadjis, Ruben Fiszel, Tian Zhao, Luigi Nardi, Ardavan Pedram, Christos Kozyrakis, and Kunle Olukotun. 2018. Spatial: A language and compiler for application accelerators. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (Philadelphia, PA) (PLDI’18). ACM, New York, NY, 296–311.
[28]
Hyoukjun Kwon, Ananda Samajdar, and Tushar Krishna. 2018. MAERI: Enabling flexible dataflow mapping over DNN accelerators via reconfigurable interconnects. ACM SIGPLAN Notices 53, 2 (March2018), 461–475.
[29]
Yi-Hsiang Lai, Yuze Chi, Yuwei Hu, Jie Wang, Cody Hao Yu, Yuan Zhou, Jason Cong, and Zhiru Zhang. 2019. HeteroCL: A multi-paradigm programming infrastructure for software-defined reconfigurable computing. In Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA’19) (Seaside, CA). 242–251.
[30]
Jiajie Li, Yuze Chi, and Jason Cong. 2020. HeteroHalide: From image processing DSL to efficient FPGA acceleration. In Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (Seaside, CA) (FPGA’20). ACM, New York, NY, 51–57.
[31]
Qiaoyi Liu, Dillon Huff, Jeff Setter, Maxwell Strange, Kathleen Feng, Kavya Sreedhar, Ziheng Wang, Keyi Zhang, Mark Horowitz, Priyanka Raina, et al. 2021. Compiling halide programs to push-memory accelerators. arXiv preprint arXiv:2105.12858 (2021).
[32]
Muhammad Masud. 2000. FPGA Routing Structures: A Novel Switch Block and Depopulated Interconnect Matrix Architectures. Ph. D. Dissertation. University of British Columbia. https://people.ece.ubc.ca/stevew/papers/pdf/imran_masc.pdf.
[33]
Maxeler Inc. [n. d.]. MaxCompiler. Retrieved July 13, 2022 from https://www.maxeler.com/products/software/maxcompiler.
[34]
Wim Meeus, Kristof Van Beeck, Toon Goedemé, Jan Meel, and Dirk Stroobandt. 2012. An overview of today’s high-level synthesis tools. Design Automation for Embedded Systems 16, 3 (2012), 31–51.
[35]
Baisha Mei, Mladen Berekovic, and J.-Y. Mignolet. 2007. ADRES & DRESC: Architecture and compiler for coarse-grain reconfigurable processors. Springer.
[36]
Mentor Graphics Inc. [n. d.]. Catapult High Level Synthesis. Retrieved July 13, 2022 from https://www.mentor.com/hls-lp/catapult-high-level-synthesis.
[37]
Thierry Moreau, Tianqi Chen, and Luis Ceze. 2018. Leveraging the VTA-TVM hardware-software stack for FPGA acceleration of 8-bit ResNet-18 inference. In Proceedings of the Reproducible Quality-Efficient Systems Tournament on Co-Designing Pareto-Efficient Deep Learning (ReQuEST) (Williamsburg, VA). Article 5.
[38]
Thierry Moreau, Tianqi Chen, Ziheng Jiang, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. VTA: An open hardware-software stack for deep learning. arXiv preprint arXiv:1807.04188 (2018).
[39]
Ravi Mullapudi, Andrew Adams, Dillon Sharlet, Jonathan Ragan-Kelley, and Kayvon Fatahalian. 2016. Automatically scheduling halide image processing pipelines. ACM Transactions on Graphics 35 (72016), 1–11.
[40]
Ankita Nayak, Keyi Zhang, Raj Setaluri, Alex Carsello, Makai Mann, Stephen Richardson, Rick Bahr, Pat Hanrahan, Mark Horowitz, and Priyanka Raina. 2020. A framework for adding low-overhead, fine-grained power domains to CGRAs. In 2020 Design Automation Test in Europe Conference Exhibition (DATE’20). 846–851.
[41]
John O’Donnell. 1988. Hydra: Hardware description in a functional language using recursion equations and high order combining forms. The Fusion of Hardware Design and Verification (1988), 309–328.
[42]
Raghu Prabhakar, Yaqi Zhang, David Koeplinger, Matt Feldman, Tian Zhao, Stefan Hadjis, Ardavan Pedram, Christos Kozyrakis, and Kunle Olukotun. 2017. Plasticine: A reconfigurable architecture for parallel patterns. In ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA’17). 389–402.
[43]
Jing Pu, Steven Bell, Xuan Yang, Jeff Setter, Stephen Richardson, Jonathan Ragan-Kelley, and Mark Horowitz. 2017. Programming heterogeneous systems from an image processing DSL. ACMTransactions on Architecture and Code Optimization 14, 3, Article 26 (Aug.2017), 25 pages.
[44]
Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (Seattle, WA) (PLDI’13). ACM, New York, NY, 519–530.
[45]
Oliver Reiche, Moritz Schmid, Frank Hannig, Richard Membarth, and Jürgen Teich. 2014. Code generation from a domain-specific language for C-based HLS of hardware accelerators. In 2014 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS’14). 1–10.
[46]
Hardik Sharma, Jongse Park, Divya Mahajan, Emmanuel Amaro, Joon Kyung Kim, Chenkai Shao, Asit Mishra, and Hadi Esmaeilzadeh. 2016. From high-level deep neural models to FPGAs. In 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’16). 1–12.
[47]
Jordan S. Swartz, Vaughn Betz, and Jonathan Rose. 1998. A fast routability-driven router for FPGAs. In Proceedings of the ACM/SIGDA 6th International Symposium on Field Programmable Gate Arrays (Monterey, CA) (FPGA’98). ACM, New York, NY, 140–149.
[48]
Christopher Torng, Peitian Pan, Yanghui Ou, Cheng Tan, and Christopher Batten. 2021. Ultra-elastic CGRAs for irregular loop specialization. In IEEE International Symposium on High-Performance Computer Architecture (HPCA’21). 412–425.
[49]
Lenny Truong and Pat Hanrahan. 2019. A golden age of hardware description languages: Applying programming language techniques to improve design productivity. In 3rd Summit on Advances in Programming Languages (SNAPL’19)(Leibniz International Proceedings in Informatics (LIPIcs), Vol. 136), Benjamin S. Lerner, Rastislav Bodík, and Shriram Krishnamurthi (Eds.). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 7:1–7:21.
[50]
Lenny Truong, Steven Herbst, Rajsekhar Setaluri, Makai Mann, Ross Daly, Keyi Zhang, Caleb Donovick, Daniel Stanley, Mark Horowitz, Clark Barrett, and Pat Hanrahan. 2020. fault: A python embedded domain-specific language for metaprogramming portable hardware verification components. In Computer Aided Verification. Springer International Publishing, 403–414.
[51]
Nestan Tsiskaridze, Maxwell Strange, Makai Mann, Kavya Sreedhar, Qiaoyi Liu, Mark Horowitz, and Clark Barrett. 2021. Automating system configuration. In 2021 Formal Methods in Computer Aided Design (FMCAD).
[52]
Peter J. M. Van Laarhoven and Emile H. L. Aarts. 1987. Simulated annealing. In Simulated Annealing: Theory and Applications. Springer, 7–15.
[53]
Artem Vasilyev, Nikhil Bhagdikar, Ardavan Pedram, Stephen Richardson, Shahar Kvatinsky, and Mark Horowitz. 2016. Evaluating programmable architectures for imaging and vision applications. In 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’16). 1–13.
[54]
Rangharajan Venkatesan, Yakun Sophia Shao, Miaorong Wang, Jason Clemons, Steve Dai, Matthew Fojtik, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, Priyanka Raina, Yanqing Zhang, Brian Zimmer, William J. Dally, Joel Emer, Stephen W. Keckler, and Brucek Khailany. 2019. MAGNet: A modular accelerator generator for neural networks. In 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD’19). 1–8.
[55]
Veripool. [n. d.]. Verilator. Retrieved July 13, 2022 from https://www.veripool.org/verilator/.
[56]
Renda Wang, Longjiang Guo, Chunyu Ai, Jinbao Li, Meirui Ren, and Keqin Li. 2013. An efficient graph isomorphism algorithm based on canonical labeling and its parallel implementation on GPU. In IEEE 10th International Conference on High Performance Computing and Communications IEEE International Conference on Embedded and Ubiquitous Computing. 1089–1096.
[57]
Xilinx Inc. [n. d.]. Vivado High Level Synthesis. Retrieved July 13, 2022 from https://www.xilinx.com/products/design-tools/vivado/integration/esl-design.html.
[58]
Pengfei Xu, Xiaofan Zhang, Cong Hao, Yang Zhao, Yongan Zhang, Yue Wang, Chaojian Li, Zetong Guan, Deming Chen, and Yingyan Lin. 2020. AutoDNNchip: An automated DNN chip predictor and builder for both FPGAs and ASICs. In Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (Seaside, CA) (FPGA’20). ACM, New York, NY, 40–50.
[59]
Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. 2015. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (Monterey, CA) (FPGA’15). ACM, New York, NY, 161–170.
[60]
Xiaofan Zhang, Junsong Wang, Chao Zhu, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, and Deming Chen. 2018. DNNBuilder: An automated tool for building high-performance DNN hardware accelerators for FPGAs. In Proceedings of the International Conference on Computer-Aided Design (ICCAD’18) (San Diego, CA). Article 56, 8 pages.
[61]
Wei Zuo, Yun Liang, Peng Li, Kyle Rupnow, Deming Chen, and Jason Cong. 2013. Improving high level synthesis optimization opportunity through polyhedral transformations. In Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (Monterey, CA) (FPGA’13). ACM, New York, NY, 9–18.

Cited By

View all
  • (2024)R-Blocks: an Energy-Efficient, Flexible, and Programmable CGRAACM Transactions on Reconfigurable Technology and Systems10.1145/365664217:2(1-34)Online publication date: 10-May-2024
  • (2024)HierCGRA: A Novel Framework for Large-scale CGRA with Hierarchical Modeling and Automated Design Space ExplorationACM Transactions on Reconfigurable Technology and Systems10.1145/365617617:2(1-31)Online publication date: 10-May-2024
  • (2024)FDRA: A Framework for a Dynamically Reconfigurable Accelerator Supporting Multi-Level ParallelismACM Transactions on Reconfigurable Technology and Systems10.1145/361422417:1(1-26)Online publication date: 27-Jan-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Embedded Computing Systems
ACM Transactions on Embedded Computing Systems  Volume 22, Issue 2
March 2023
560 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/3572826
  • Editor:
  • Tulika Mitra
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

Publication History

Published: 24 January 2023
Online AM: 07 July 2022
Accepted: 30 April 2022
Revised: 12 March 2022
Received: 18 October 2021
Published in TECS Volume 22, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Hardware accelerators
  2. coarse-grained reconfigurable arrays
  3. domain-specific languages
  4. image processing

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • DSSoC DARPA
  • Stanford AHA Agile Hardware Center
  • Affiliates Program, Intel’s Science and Technology Center (ISTC)
  • Stanford SystemX Alliance

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1,676
  • Downloads (Last 6 weeks)117
Reflects downloads up to 18 Aug 2024

Other Metrics

Citations

Cited By

View all
  • (2024)R-Blocks: an Energy-Efficient, Flexible, and Programmable CGRAACM Transactions on Reconfigurable Technology and Systems10.1145/365664217:2(1-34)Online publication date: 10-May-2024
  • (2024)HierCGRA: A Novel Framework for Large-scale CGRA with Hierarchical Modeling and Automated Design Space ExplorationACM Transactions on Reconfigurable Technology and Systems10.1145/365617617:2(1-31)Online publication date: 10-May-2024
  • (2024)FDRA: A Framework for a Dynamically Reconfigurable Accelerator Supporting Multi-Level ParallelismACM Transactions on Reconfigurable Technology and Systems10.1145/361422417:1(1-26)Online publication date: 27-Jan-2024
  • (2024)Amber: A 16-nm System-on-Chip With a Coarse- Grained Reconfigurable Array for Flexible Acceleration of Dense Linear AlgebraIEEE Journal of Solid-State Circuits10.1109/JSSC.2023.331311659:3(947-959)Online publication date: Mar-2024
  • (2024)A CGRA Front-end Compiler Enabling Extraction of General Control and Dedicated Operators2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)10.1109/ASP-DAC58780.2024.10473891(799-804)Online publication date: 22-Jan-2024
  • (2023)Building First-Order Energy Modeling Intuition in Computer Architecture LecturesProceedings of the Workshop on Computer Architecture Education10.1145/3605507.3610632(26-33)Online publication date: 17-Jun-2023
  • (2023)HETA: A Heterogeneous Temporal CGRA Modeling and Design Space Exploration via Bayesian OptimizationIEEE Transactions on Very Large Scale Integration (VLSI) Systems10.1109/TVLSI.2023.334453632:3(505-518)Online publication date: 25-Dec-2023
  • (2023)PRAD: A Bayesian Optimization-based DSE Framework for Parameterized Reconfigurable Architecture Design2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)10.1109/FCCM57271.2023.00054(226-226)Online publication date: May-2023
  • (2022)OverGen: Improving FPGA Usability through Domain-Specific Overlay GenerationProceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture10.1109/MICRO56248.2022.00018(35-56)Online publication date: 1-Oct-2022
  • (2022)Fast and Flexible FPGA Development using Hierarchical Partial Reconfiguration2022 International Conference on Field-Programmable Technology (ICFPT)10.1109/ICFPT56656.2022.9974201(1-10)Online publication date: 5-Dec-2022

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Full Text

View this article in Full Text.

Full Text

HTML Format

View this article in HTML Format.

HTML Format

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media