research-article

A Software Scheme for Multithreading on CGRAs

Authors:

Reiley Jeyapaul,

Aviral Shrivastava

ACM Transactions on Embedded Computing Systems (TECS), Volume 14, Issue 1

Article No.: 19, Pages 1 - 26

https://doi.org/10.1145/2638558

Published: 21 January 2015

Abstract

Recent industry trends show a sharp rise in the use of hand-held embedded devices, from everyday applications to medical applications (e.g., monitoring devices) and critical defense applications (e.g., sensor nodes). The two key requirements in the design of such devices are processing capability and battery life. There is therefore an urgent need to build high-performance, power-efficient embedded devices, which has inspired researchers to develop novel system designs. The use of a coprocessor (application-specific hardware) to offload power-hungry computations is gaining favor among system designers as a way to meet their power budgets. We propose the use of CGRAs (Coarse-Grained Reconfigurable Arrays) as a power-efficient coprocessor. Though CGRAs have been widely used for streaming applications, the extensive compiler support they require limits their applicability as a general-purpose coprocessor. In addition, a CGRA can efficiently execute only one statically scheduled kernel at a time, which is a serious limitation when it is used as an accelerator for a multithreaded or multitasking processor. In this work, we envision a multithreaded CGRA on which multiple schedules (or kernels) can execute simultaneously. We propose a comprehensive software scheme that transforms the traditionally single-threaded CGRA into a multithreaded coprocessor that serves as a power-efficient accelerator for multithreaded embedded processors. Our software scheme includes (1) a compiler framework that integrates with existing CGRA mapping techniques to prepare kernels for execution on the multithreaded CGRA and (2) a runtime mechanism that dynamically schedules multiple kernels (offloaded from the processor) to execute simultaneously on the CGRA coprocessor. Our multithreaded CGRA coprocessor implementation thus makes more power-efficient computing possible in modern multithreaded embedded systems.
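The abstract does not describe how the runtime mechanism decides which kernels run together, so the following is only an illustrative sketch of the general idea of sharing one array among several offloaded kernels. It assumes, hypothetically, that the array is divided into equal regions and that each compiled kernel declares how many regions its schedule occupies. The names `Kernel`, `CgraRuntime`, `regions_needed`, `offload`, and `finish` are invented for this example and are not taken from the article.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    name: str
    regions_needed: int  # assumption: regions of the array this kernel's schedule occupies


@dataclass
class CgraRuntime:
    total_regions: int                            # assumption: array split into equal regions
    running: dict = field(default_factory=dict)   # kernel name -> regions currently held
    waiting: list = field(default_factory=list)   # FIFO queue of kernels that did not fit

    def free_regions(self) -> int:
        return self.total_regions - sum(self.running.values())

    def offload(self, kernel: Kernel) -> bool:
        """Start the kernel now if it fits and nothing is queued ahead of it; else queue it."""
        if kernel.regions_needed > self.total_regions:
            raise ValueError(f"{kernel.name} can never fit on this array")
        if not self.waiting and kernel.regions_needed <= self.free_regions():
            self.running[kernel.name] = kernel.regions_needed
            return True
        self.waiting.append(kernel)
        return False

    def finish(self, name: str) -> list:
        """Release a finished kernel's regions, then start queued kernels in FIFO order."""
        del self.running[name]
        started = []
        while self.waiting and self.waiting[0].regions_needed <= self.free_regions():
            nxt = self.waiting.pop(0)
            self.running[nxt.name] = nxt.regions_needed
            started.append(nxt.name)
        return started


if __name__ == "__main__":
    rt = CgraRuntime(total_regions=4)
    for k in (Kernel("fir", 2), Kernel("fft", 2), Kernel("dct", 1)):
        print(k.name, "started" if rt.offload(k) else "queued")
    print("started after fir finished:", rt.finish("fir"))
```

The strict FIFO queue is a deliberate simplification: it avoids starving large kernels at the cost of leaving regions idle when a smaller queued kernel would fit. A real runtime would also have to handle configuration loading and data transfer, which this sketch ignores.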


Published In

ACM Transactions on Embedded Computing Systems Volume 14, Issue 1

January 2015

443 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/2724585

Editor:
Sandeep K. Shukla
Virginia Tech, USA

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 21 January 2015

Accepted: 01 June 2014

Revised: 01 April 2014

Received: 01 December 2011

Published in TECS Volume 14, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Science Foundation
