Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays

Authors:

Grigorios Dimitroulakos,

Nikos Kostaras,

Michalis D. Galanis,

Costas E. Goutis

The Journal of Supercomputing, Volume 48, Issue 2

Pages 115 - 151

https://doi.org/10.1007/s11227-008-0208-y

Published: 01 May 2009

Abstract

Coarse Grain Reconfigurable Array (CGRA) architectures have been used extensively to accelerate time-consuming loops. Designing such systems requires a good balance between the capabilities of the architecture and the characteristics of the loops; a reliable design is characterized by an optimized cost-performance trade-off. The main aim of this paper is to present an exploration framework that automates the evaluation of CGRA architectures. Specifically, the framework helps the designer identify CGRA architectures tuned toward a specific application domain. The process is assisted by (1) an optimized retargetable compiler based on modulo scheduling and (2) the Synopsys Design Compiler, which provides realization metrics such as area and clock frequency. Both operate on the description of a parametric CGRA architecture template that can instantiate a wide variety of these architectures. Many earlier studies suggest that clock frequency influences performance, but none of them examines the impact of the architecture on both clock frequency and performance. Our work studies, for the first time in a unified way, the area, the clock frequency, the instructions per cycle, and the performance. Hence, architectures with a good compromise between cost and performance can be identified. A second objective of the paper is to present the advances made to the compiler approach used by the exploration framework: a new, more effective priority scheme is proposed, and the modulo scheduler has been equipped with backtracking capability. The experiments show the algorithm's efficiency and scalability on a given set of DSP benchmarks. Moreover, architectures optimized with respect to the cost-performance trade-off have been identified by exploring 72 CGRA architecture alternatives.

Cited By

Wang L, Xue J, Yang X (2018) Optimizing modulo scheduling to achieve reuse and concurrency for stream processors. The Journal of Supercomputing 59(3):1229-1251. doi:10.1007/s11227-010-0522-z. Online publication date: 31-Dec-2018
https://dl.acm.org/doi/10.1007/s11227-010-0522-z

Published In

The Journal of Supercomputing Volume 48, Issue 2

May 2009

112 pages

ISSN:0920-8542

Copyright © 2009 Springer Science+Business Media, LLC.

Publisher

Kluwer Academic Publishers

United States