research-article

Improving Energy Efficiency of CGRAs with Low-Overhead Fine-Grained Power Domains

Published: 02 April 2023

Abstract

To effectively minimize static power for a wide range of applications, power domains for coarse-grained reconfigurable array (CGRA) architectures need to be more fine-grained than those found in a typical application-specific integrated circuit. However, the special isolation logic needed to ensure electrical protection between off and on domains makes fine-grained power domains area- and timing-inefficient. We propose a novel design of the CGRA routing fabric that reduces the area overhead of power domain boundary protection from around 9% to less than 1% without incurring any extra timing delay from the isolation cells. The conventional Unified Power Format-based flow for power domain boundary protection does not support this design choice. Therefore, we create our own compiler-like passes that iteratively introduce the needed design changes, and formally verify the transformations using methods based on satisfiability modulo theories. These passes also let us optimize how we handle test and debug signals through the off tiles in the CGRA. Using our framework, we add power domains to a CGRA that we designed and taped out. The CGRA has 32 × 16 processing element and memory tiles and 4-MB secondary memory. We address the implementation challenges encountered due to the introduction of fine-grained power domains, including the addressing of the CGRA tiles, the power grid design, well substrate connections, and distribution of global signals. Our CGRA achieves up to 83% reduction in leakage power and 26% reduction in total power versus an identical CGRA without multiple power domains, for a range of image processing and machine learning applications.
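
As a rough illustration of what verifying a transformation means here, the sketch below checks that a rewritten boundary circuit behaves identically to a reference one for every input. It is not the authors' flow: the abstract describes SMT-based verification, whereas this sketch uses exhaustive enumeration over a small bit-width as a stand-in, and the names isolation_cell, folded_clamp, find_counterexample, and WIDTH are all hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Tuple

# Illustrative bit-width (an assumption); small enough to enumerate fully.
WIDTH = 4

# A boundary model maps (data value, "is the source domain on?") to the
# value seen by the receiving domain.
Boundary = Callable[[int, bool], int]


def isolation_cell(data: int, source_on: bool) -> int:
    """Reference model: a dedicated clamp-to-zero cell on the crossing signal."""
    return data if source_on else 0


def folded_clamp(data: int, source_on: bool) -> int:
    """Hypothetical transformed model: the clamp is folded into masking logic."""
    mask = (1 << WIDTH) - 1 if source_on else 0
    return data & mask


def find_counterexample(
    reference: Boundary, candidate: Boundary, width: int = WIDTH
) -> Optional[Tuple[int, bool]]:
    """Return the first input on which the two models differ, or None.

    This brute-force loop plays the role an SMT query would play in a
    real flow: it searches for an input that distinguishes the designs.
    """
    for data, source_on in product(range(1 << width), (False, True)):
        if reference(data, source_on) != candidate(data, source_on):
            return data, source_on
    return None
```

Calling find_counterexample(isolation_cell, folded_clamp) returns None when the two models agree on every input. A candidate that forgets to clamp, such as lambda d, on: d, yields a concrete failing input instead. An SMT solver answers the same question symbolically, which is what makes the check feasible at realistic bit-widths.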

Cited By

  • (2024) Amber: A 16-nm System-on-Chip With a Coarse-Grained Reconfigurable Array for Flexible Acceleration of Dense Linear Algebra. IEEE Journal of Solid-State Circuits 59:3 (947-959). DOI: 10.1109/JSSC.2023.3313116. Online publication date: Mar-2024.
  • (2024) An Investigation into the Impact of Using Automated Synthesisable Internal Power-Gating on Improved Power Efficiency for ASICs. 2024 International Conference on Optimization Computing and Wireless Communication (ICOCWC) (1-5). DOI: 10.1109/ICOCWC60930.2024.10470629. Online publication date: 29-Jan-2024.
  • (2024) Mapping Enumeration for Multi-Context CGRAs Using Zero-Suppressed Binary Decision Diagrams. 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (151-161). DOI: 10.1109/FCCM60383.2024.00026. Online publication date: 5-May-2024.

        Published In

        ACM Transactions on Reconfigurable Technology and Systems  Volume 16, Issue 2
        June 2023
        451 pages
        ISSN:1936-7406
        EISSN:1936-7414
        DOI:10.1145/3587031
        Editor: Deming Chen

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 02 April 2023
        Online AM: 27 August 2022
        Accepted: 08 August 2022
        Revised: 30 May 2022
        Received: 13 December 2021
        Published in TRETS Volume 16, Issue 2

        Author Tags

        1. Reconfigurable computing
        2. coarse-grained reconfigurable arrays
        3. power domains
        4. hardware generators

        Qualifiers

        • Research-article