Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays

Published: 01 May 2009

Abstract

Coarse-Grained Reconfigurable Array (CGRA) architectures have been used extensively to accelerate time-consuming loops. Designing such systems requires a good balance between the capabilities of the architecture and the characteristics of the loops. A reliable design is characterized by an optimized cost-performance trade-off. The main goal of this paper is to present an exploration framework that automates the evaluation of CGRA architectures. Specifically, the framework helps the designer identify CGRA architectures tuned toward a specific application domain. The process is supported by (1) an optimized retargetable compiler based on modulo scheduling and (2) the Synopsys Design Compiler, which provides realization metrics such as area and clock frequency. Both tools operate on the description of a parametric CGRA architecture template that can instantiate a wide variety of such architectures. Many previous studies suggest that clock frequency influences performance; however, none of them examines the impact of the architecture on both clock frequency and performance. Our work is the first to study area, clock frequency, instructions per cycle (IPC), and performance in a unified way, so that architectures offering a good compromise between cost and performance can be identified. Another objective of the paper is to present the advances made to the compiler approach used by the exploration framework: a new, more effective priority scheme is proposed, and the modulo scheduler is equipped with a backtracking capability. The experiments demonstrate the algorithm's efficiency and scalability on a given set of DSP benchmarks. Moreover, an exploration over 72 CGRA architecture alternatives identified architectures that are optimized with respect to the cost-performance trade-off.
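Modulo scheduling overlaps successive loop iterations so that a new iteration starts every II (initiation interval) cycles. A widely used starting point, as in Rau's iterative modulo scheduling, is the lower bound MII = max(ResMII, RecMII): ResMII follows from how many operations compete for each class of functional unit, and RecMII from loop-carried dependence cycles. The Python sketch below computes that bound under simplifying assumptions: the resource classes, unit counts, and function names are hypothetical, each operation is assumed to occupy one unit for one cycle, and the recurrence cycles are assumed to be already enumerated. It is not the paper's compiler, and it ignores the interconnect and register constraints a real CGRA imposes.

```python
import math
from typing import Dict, List, Tuple


def res_mii(op_counts: Dict[str, int], unit_counts: Dict[str, int]) -> int:
    """Resource-constrained lower bound on II.

    op_counts:   operations per resource class in one loop iteration.
    unit_counts: functional units per class in a hypothetical CGRA instance.
    Assumes each operation occupies one unit for one cycle (fully pipelined).
    """
    return max(
        (math.ceil(n / unit_counts[cls]) for cls, n in op_counts.items()),
        default=1,
    )


def rec_mii(recurrences: List[Tuple[int, int]]) -> int:
    """Recurrence-constrained lower bound on II.

    recurrences: (total latency, total iteration distance) of each
    dependence cycle in the data dependence graph, assumed pre-enumerated.
    """
    return max(
        (math.ceil(lat / dist) for lat, dist in recurrences),
        default=1,
    )


def mii(op_counts: Dict[str, int], unit_counts: Dict[str, int],
        recurrences: List[Tuple[int, int]]) -> int:
    # A modulo scheduler typically tries II = MII first and increments II
    # whenever no valid schedule is found at the current value.
    return max(1, res_mii(op_counts, unit_counts), rec_mii(recurrences))


if __name__ == "__main__":
    ops = {"alu": 10, "mul": 3}    # hypothetical loop body
    units = {"alu": 4, "mul": 1}   # hypothetical CGRA instance
    recs = [(4, 1), (6, 2)]        # hypothetical recurrences
    print(mii(ops, units, recs))   # prints 4
```

In this example ResMII is 3 but RecMII is 4, so adding functional units alone would not lower the bound. This is one reason why a larger array does not always yield a proportionally higher IPC.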

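The abstract's central point is that an architectural change affects both the cycle count (through IPC) and the achievable clock frequency, so neither metric alone predicts performance. A minimal sketch of that reasoning follows. It assumes each alternative is summarized by its synthesized area, its clock frequency, and the cycle count reported by the compiler. Execution time is then cycles divided by frequency, and the cost-performance compromises are taken as the Pareto-optimal points in (area, execution time). The `ArchPoint` class, its fields, and the numbers in the usage example are hypothetical illustrations, not results from the paper. Pareto filtering is one common way to express such a trade-off, and the paper's own selection criterion may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ArchPoint:
    """Summary of one hypothetical CGRA architecture alternative."""
    name: str
    area_mm2: float   # area, e.g. as reported by logic synthesis
    clock_mhz: float  # achievable clock frequency in MHz
    cycles: int       # cycles to run the benchmark loops on this architecture

    @property
    def exec_time_us(self) -> float:
        # Execution time = cycles / frequency; 1 MHz is one cycle per microsecond.
        return self.cycles / self.clock_mhz


def dominates(q: ArchPoint, p: ArchPoint) -> bool:
    """True if q is no worse than p in area and time, and strictly better in one."""
    no_worse = q.area_mm2 <= p.area_mm2 and q.exec_time_us <= p.exec_time_us
    better = q.area_mm2 < p.area_mm2 or q.exec_time_us < p.exec_time_us
    return no_worse and better


def pareto_front(points: List[ArchPoint]) -> List[ArchPoint]:
    """Return the non-dominated alternatives, sorted by increasing area.

    A quadratic scan is adequate for a few dozen alternatives.
    """
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front, key=lambda p: p.area_mm2)


if __name__ == "__main__":
    # Illustrative numbers only; not measurements from the paper.
    archs = [
        ArchPoint("A", 1.0, 200.0, 4000),  # 20.0 us
        ArchPoint("B", 2.0, 180.0, 2700),  # 15.0 us
        ArchPoint("C", 2.5, 150.0, 2700),  # 18.0 us: same cycles as B, slower clock
        ArchPoint("D", 3.0, 190.0, 1900),  # 10.0 us
    ]
    print([p.name for p in pareto_front(archs)])  # prints ['A', 'B', 'D']
```

In the example, alternatives B and C need the same number of cycles. C's lower clock frequency makes it slower despite its larger area, so it drops out of the front.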

Cited By

  • (2018) Optimizing modulo scheduling to achieve reuse and concurrency for stream processors. The Journal of Supercomputing 59(3):1229-1251. DOI: 10.1007/s11227-010-0522-z. Online publication date: 31-Dec-2018

    Published In

    The Journal of Supercomputing, Volume 48, Issue 2
    May 2009
    112 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Author Tags

    1. Architectural exploration
    2. Coarse-grained reconfigurable arrays
    3. Compiler techniques
    4. High productivity tools
    5. Modulo scheduling

    Qualifiers

    • Article
