research-article

A Software Scheme for Multithreading on CGRAs

Published: 21 January 2015

Abstract

Recent industry trends show a drastic rise in the use of hand-held embedded devices, from everyday applications to medical (e.g., monitoring devices) and critical defense applications (e.g., sensor nodes). The two key requirements in the design of such devices are processing capability and battery life. There is therefore an urgent need to build high-performance, power-efficient embedded devices, which has inspired researchers to develop novel system designs. The use of a coprocessor (application-specific hardware) to offload power-hungry computations is gaining favor among system designers as a way to meet their power budgets. We propose the use of CGRAs (Coarse-Grained Reconfigurable Arrays) as a power-efficient coprocessor. Though CGRAs have been widely used for streaming applications, the extensive compiler support they require limits their applicability and use as a general-purpose coprocessor. In addition, a CGRA structure can efficiently execute only one statically scheduled kernel at a time, which is a serious limitation when it is used as an accelerator for a multithreaded or multitasking processor. In this work, we envision a multithreaded CGRA where multiple schedules (or kernels) can be executed simultaneously on the CGRA (as a coprocessor). We propose a comprehensive software scheme that transforms the traditionally single-threaded CGRA into a multithreaded coprocessor to be used as a power-efficient accelerator for multithreaded embedded processors. Our software scheme includes (1) a compiler framework that integrates with existing CGRA mapping techniques to prepare kernels for execution on the multithreaded CGRA and (2) a runtime mechanism that dynamically schedules multiple kernels (offloaded from the processor) to execute simultaneously on the CGRA coprocessor. Our multithreaded CGRA coprocessor implementation thus makes it possible to achieve more power-efficient computing in modern multithreaded embedded systems.

Published In

ACM Transactions on Embedded Computing Systems  Volume 14, Issue 1
January 2015
443 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/2724585
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 January 2015
Accepted: 01 June 2014
Revised: 01 April 2014
Received: 01 December 2011
Published in TECS Volume 14, Issue 1

Author Tags

  1. CGRA
  2. compiler framework
  3. embedded system
  4. multithreading
  5. power efficiency
  6. runtime transformation
  7. scheduling

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Science Foundation

Article Metrics

  • Downloads (last 12 months): 26
  • Downloads (last 6 weeks): 2

Reflects downloads up to 12 Nov 2024

Cited By

  • (2022) RipTide: A Programmable, Energy-Minimal Dataflow Compiler and Architecture. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, 546-564. DOI: 10.1109/MICRO56248.2022.00046. Online publication date: 1-Oct-2022
  • (2022) Technical Difficulties and Development Trend. Software Defined Chips, 135-166. DOI: 10.1007/978-981-19-7636-0_3. Online publication date: 15-Nov-2022
  • (2022) Compilation System. Software Defined Chips, 197-311. DOI: 10.1007/978-981-19-6994-2_4. Online publication date: 21-Oct-2022
  • (2022) Hardware Architectures and Circuits. Software Defined Chips, 77-196. DOI: 10.1007/978-981-19-6994-2_3. Online publication date: 21-Oct-2022
  • (2022) Overview of SDC. Software Defined Chips, 27-76. DOI: 10.1007/978-981-19-6994-2_2. Online publication date: 21-Oct-2022
  • (2021) An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable Architectures. IEEE Transactions on Parallel and Distributed Systems 32:12, 3066-3080. DOI: 10.1109/TPDS.2021.3084804. Online publication date: 1-Dec-2021
  • (2020) Pattern-Based Dynamic Compilation System for CGRAs With Online Configuration Transformation. IEEE Transactions on Parallel and Distributed Systems 31:12, 2981-2994. DOI: 10.1109/TPDS.2020.3007492. Online publication date: 1-Dec-2020
  • (2019) A Survey of Coarse-Grained Reconfigurable Architecture and Design. ACM Computing Surveys 52:6, 1-39. DOI: 10.1145/3357375. Online publication date: 16-Oct-2019
  • (2019) A General Pattern-Based Dynamic Compilation Framework for Coarse-Grained Reconfigurable Architectures. Proceedings of the 56th Annual Design Automation Conference 2019, 1-6. DOI: 10.1145/3316781.3317745. Online publication date: 2-Jun-2019
  • (2018) Triggered-Issuance and Triggered-Execution: A Control Paradigm to Minimize Pipeline Stalls in Distributed Controlled Coarse-Grained Reconfigurable Arrays. IEEE Transactions on Parallel and Distributed Systems 29:10, 2360-2372. DOI: 10.1109/TPDS.2018.2822708. Online publication date: 1-Oct-2018
