MC-DeF: Creating Customized CGRAs for Dataflow Applications

Published: 14 April 2021

Abstract

Executing complex scientific applications on Coarse-Grain Reconfigurable Arrays (CGRAs) promises improvements in execution time and/or energy consumption compared to optimized software implementations or even fully customized hardware solutions. Typical CGRA architectures consist of multiple instances of the same compute module, each built from simple, general hardware units such as ALUs or simple processors. However, generality in the cell contents, while convenient for serving a wide variety of applications, penalizes performance and energy efficiency. To that end, a few proposed CGRAs use custom logic in the compute module, tailored to a particular application's specific characteristics. This approach, while much more efficient, restricts the versatility of the array. To date, versatility at hardware speeds is supported only by field-programmable gate arrays (FPGAs), which are reconfigurable at a very fine grain.
This work proposes MC-DeF, a novel Mixed-CGRA Definition Framework targeting a Mixed-CGRA architecture that leverages the advantages of CGRAs by utilizing a customized cell array, and those of FPGAs by incorporating a separate LUT array used for adaptability. The framework aims to develop a complete CGRA architecture. First, a cell structure and functionality definition phase creates highly customized application- or domain-specific CGRA cells. Then, mapping and routing phases define the CGRA connectivity and the cell-LUT array transactions. Finally, an energy and area estimation phase presents the user with area occupancy and energy consumption estimates for the final design. MC-DeF uses novel algorithms and cost functions driven by user-defined metrics, threshold values, and area/energy restrictions. Besides creating fast and efficient CGRA designs, the framework offers the user design space exploration capabilities.
The validity of the presented framework is demonstrated by creating and evaluating CGRA designs for nine applications. Additionally, we provide comparisons of MC-DeF with state-of-the-art related works, and show that MC-DeF offers competitive performance (in terms of internal bandwidth and processing throughput) even compared against much larger designs, while requiring fewer physical resources to achieve this level of performance. Finally, MC-DeF is able to better utilize the underlying FPGA fabric and achieves the best efficiency (measured in LUT/GOPs).
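To make the efficiency metric concrete, the following is a minimal sketch that reads LUT/GOPs as the number of LUTs a design occupies per GOP/s of processing throughput, so that a lower value means better use of the fabric. That reading, the function name `luts_per_gops`, and the two example designs with their numbers are assumptions made for illustration only; none of them are taken from the article.

```python
def luts_per_gops(luts_used: int, throughput_gops: float) -> float:
    """Return LUTs occupied per GOP/s of throughput (lower is better).

    Assumed interpretation of the LUT/GOPs metric; not the article's code.
    """
    if throughput_gops <= 0:
        raise ValueError("throughput must be positive")
    return luts_used / throughput_gops


# Hypothetical designs: (LUTs used, throughput in GOP/s). Illustrative values only.
designs = {
    "design_a": (20_000, 40.0),   # smaller design, lower raw throughput
    "design_b": (90_000, 120.0),  # larger design, higher raw throughput
}

scores = {name: luts_per_gops(luts, gops) for name, (luts, gops) in designs.items()}
best = min(scores, key=scores.get)  # the design with the fewest LUTs per GOP/s

for name, score in scores.items():
    print(f"{name}: {score:.1f} LUT/GOPs")
print("most efficient:", best)
```

In this made-up comparison the larger design delivers more raw throughput, but the smaller one spends fewer LUTs per unit of throughput, which is the sense in which a compact design can be the more efficient one.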

Published In

ACM Transactions on Architecture and Code Optimization  Volume 18, Issue 3
September 2021
370 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3460978
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 April 2021
Accepted: 01 January 2021
Revised: 01 December 2020
Received: 01 July 2020

Author Tags

  1. CGRA
  2. CGRA framework
  3. FPGA
  4. reconfigurable computing

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2023) Loop Subgraph-Level Greedy Mapping Algorithm for Grid Coarse-Grained Reconfigurable Array. Tsinghua Science and Technology 28:2 (330-343). DOI: 10.26599/TST.2022.9010001. Online publication date: Apr-2023
  • (2022) Architectural Implications for Inference of Graph Neural Networks on CGRA-based Accelerators. 2022 17th Conference on Ph.D Research in Microelectronics and Electronics (PRIME) (373-376). DOI: 10.1109/PRIME55000.2022.9816810. Online publication date: 12-Jun-2022
  • (2022) Energy Efficient Design of Coarse-Grained Reconfigurable Architectures: Insights, Trends and Challenges. 2022 International Conference on Field-Programmable Technology (ICFPT) (1-11). DOI: 10.1109/ICFPT56656.2022.9974339. Online publication date: 5-Dec-2022
  • (2021) FastCGRA: A Modeling, Evaluation, and Exploration Platform for Large-Scale Coarse-Grained Reconfigurable Arrays. 2021 International Conference on Field-Programmable Technology (ICFPT) (1-10). DOI: 10.1109/ICFPT52863.2021.9609928. Online publication date: 6-Dec-2021
  • (2021) CGRA-ME: An Open-Source Framework for CGRA Architecture and CAD Research (Invited Paper). 2021 IEEE 32nd International Conference on Application-specific Systems, Architectures and Processors (ASAP) (156-162). DOI: 10.1109/ASAP52443.2021.00030. Online publication date: Jul-2021
