research-article

An instruction-scheduling-aware data partitioning technique for coarse-grained reconfigurable architectures

Authors:

Soojung Ryu

LCTES '11: Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems

Pages 151 - 160

https://doi.org/10.1145/1967677.1967699

Published: 11 April 2011

Abstract

In this paper, we propose a data partitioning technique for a memory subsystem that consists of a multi-ported scratchpad memory (SPM) unit and a single-ported data cache in a coarse-grained reconfigurable array (CGRA) architecture. To achieve high performance, the embedded reconfigurable processor executes programs by switching between the Non-VLIW and VLIW modes, depending on the type of code region. The VLIW mode exploits code regions with high instruction-level parallelism (ILP) that require high memory bandwidth, and the Non-VLIW mode exploits those with low ILP that require low memory latency. Our data partitioning technique between the SPM and the data cache is based on data interference graph reduction and profiling information. Given an SPM size, it finds the optimal data partitions by taking the VLIW instruction schedule into consideration. We evaluate our data partitioning technique for the CGRA architecture with three representative multimedia applications.
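The partitioning idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not spell out the graph reduction or the cost model, so everything below is an assumption made for illustration. The sketch assumes that each data object has a known size, that profiling together with the VLIW schedule yields a weight for every pair of objects accessed in the same cycle, and that such a pair causes stalls only when both objects stay in the single-ported cache. The names `partition` and `cache_cost` are hypothetical, and the search is a plain exhaustive enumeration that is only practical for a handful of objects.

```python
from itertools import combinations


def cache_cost(spm, conflicts):
    """Sum the weights of conflict edges whose endpoints both stay in the cache.

    Assumption: the multi-ported SPM absorbs any same-cycle access, so an edge
    costs something only when neither object was moved to the SPM.
    """
    return sum(w for (a, b), w in conflicts.items()
               if a not in spm and b not in spm)


def partition(sizes, conflicts, spm_capacity):
    """Exhaustively pick the SPM subset that minimizes the remaining cache cost.

    sizes:        {object name: size in bytes}
    conflicts:    {(name_a, name_b): profiled same-cycle access weight}
    spm_capacity: SPM size in bytes
    Returns (frozenset of objects placed in the SPM, remaining cost).
    """
    names = sorted(sizes)
    best = (frozenset(), cache_cost(frozenset(), conflicts))
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            if sum(sizes[n] for n in subset) > spm_capacity:
                continue  # does not fit in the SPM
            spm = frozenset(subset)
            cost = cache_cost(spm, conflicts)
            if cost < best[1]:
                best = (spm, cost)
    return best


# Hypothetical inputs: three arrays and their pairwise conflict weights.
sizes = {"A": 4, "B": 4, "C": 8}
conflicts = {("A", "B"): 10, ("B", "C"): 5, ("A", "C"): 1}
spm, cost = partition(sizes, conflicts, spm_capacity=8)
print(sorted(spm), cost)  # prints ['A', 'B'] 0
```

With an 8-byte SPM, moving A and B out of the cache leaves only C there, so no conflicting pair remains. A realistic implementation would need a scalable search and a cost model derived from the actual instruction schedule.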

Index Terms

An instruction-scheduling-aware data partitioning technique for coarse-grained reconfigurable architectures
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Published In

LCTES '11: Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems

April 2011

182 pages

ISBN:9781450305556

DOI:10.1145/1967677

General Chair: Jan Vitek, Purdue University, USA
Program Chair: Bjorn De Sutter, Ghent University, Belgium

ACM SIGPLAN Notices Volume 46, Issue 5
LCTES '10
May 2011
170 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2016603

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

LCTES '11

LCTES '11: SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems

April 11 - 14, 2011

Chicago, IL, USA

Acceptance Rates

Overall Acceptance Rate 116 of 438 submissions, 26%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 11
Total Downloads: 499
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 0

Reflects downloads up to 04 Oct 2024

Citations

Cited By

Tükel M, Yurdakul A, Örs B (2018). Customizable embedded processor array for multimedia applications. Integration, the VLSI Journal 60:C (213-223). Online publication date: 1-Jan-2018.
https://dl.acm.org/doi/10.1016/j.vlsi.2017.09.009
Sutter B, Raghavan P, Lambrechts A (2018). Coarse-Grained Reconfigurable Array Architectures. Handbook of Signal Processing Systems (427-472). Online publication date: 14-Oct-2018.
https://doi.org/10.1007/978-3-319-91734-4_12
Theocharis P, Sutter B (2016). A Bimodal Scheduler for Coarse-Grained Reconfigurable Arrays. ACM Transactions on Architecture and Code Optimization 13:2 (1-26). Online publication date: 6-Jun-2016.
https://dl.acm.org/doi/10.1145/2893475
Miniskar N, Patil R, Gadde R, Cho Y, Kim S, Lee S (2016). Intra mode power saving methodology for CGRA-based reconfigurable processor architectures. 2016 IEEE International Symposium on Circuits and Systems (ISCAS) (714-717). Online publication date: May-2016.
https://doi.org/10.1109/ISCAS.2016.7527340
Miniskar N, Kohli S, Park H, Yoo D, Chatha K, Ernst R, Raghunathan A, Iyer R (2014). Retargetable automatic generation of compound instructions for CGRA based reconfigurable processor applications. Proceedings of the 2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (1-9). Online publication date: 12-Oct-2014.
https://dl.acm.org/doi/10.1145/2656106.2656125
Gauthier L, Ueno S, Inoue K, Rabbah R, Raghunathan A (2013). Hybrid compile and run-time memory management for a 3D-stacked reconfigurable accelerator. Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems (1-10). Online publication date: 29-Sep-2013.
https://dl.acm.org/doi/10.5555/2555729.2555739
Atak O, Atalar A (2013). BilRC. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 21:7 (1285-1298). Online publication date: 1-Jul-2013.
https://dl.acm.org/doi/10.1109/TVLSI.2012.2207748
Gauthier L, Ueno S, Inoue K (2013). Hybrid compile and run-time memory management for a 3D-stacked reconfigurable accelerator. 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES) (1-10). Online publication date: Sep-2013.
https://doi.org/10.1109/CASES.2013.6662514
Miniskar N, Gode P, Kohli S, Yoo D, Jerraya A, Carloni L, Mooney V, Rabbah R (2012). Function inlining and loop unrolling for loop acceleration in reconfigurable processors. Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems (101-110). Online publication date: 7-Oct-2012.
https://dl.acm.org/doi/10.1145/2380403.2380426
