Instruction scheduling for clustered VLIW architectures

Authors: Jesús Sánchez, Antonio González

ISSS '00: Proceedings of the 13th international symposium on System synthesis

Pages 41 - 46

Published: 20 September 2000

Abstract

Clustered VLIW organizations are now a common trend in the design of embedded/DSP processors. In this work we propose a novel modulo scheduling approach for such architectures. The proposed technique performs cluster assignment and instruction scheduling in a single pass, which is more effective than performing the assignment first and the scheduling afterwards. We also show that loop unrolling significantly enhances the performance of the proposed scheduler, especially when the communication channel among clusters is the main performance bottleneck. By selectively unrolling some loops, we can obtain the best performance with the minimum increase in code size. Performance evaluation on the SPECfp95 benchmarks shows that the clustered architecture achieves about the same IPC (Instructions Per Cycle) as a unified architecture with the same resources. Moreover, when the cycle time is taken into account, a 4-cluster configuration is 3.6 times faster than the unified architecture.
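
The last two claims of the abstract combine IPC and cycle time into a single speedup figure. A minimal sketch of that arithmetic follows, assuming both machines execute the same number of instructions (extra inter-cluster copy operations are ignored). The function name `speedup` and all input values are hypothetical and are not taken from the paper.

```python
def speedup(ipc_clustered, ipc_unified, cycle_clustered, cycle_unified):
    """Return how many times faster the clustered machine runs a program.

    Execution time = instruction_count / IPC * cycle_time. Assuming the
    same instruction count on both machines, the speedup is the IPC ratio
    multiplied by the inverse ratio of the cycle times.
    """
    if min(ipc_clustered, ipc_unified, cycle_clustered, cycle_unified) <= 0:
        raise ValueError("IPC and cycle time must be positive")
    return (ipc_clustered / ipc_unified) * (cycle_unified / cycle_clustered)


# Hypothetical inputs, not measurements from the paper: equal IPC and a
# unified cycle time 3.6 times longer than the clustered one.
example = speedup(ipc_clustered=2.0, ipc_unified=2.0,
                  cycle_clustered=1.0, cycle_unified=3.6)
```

Under these assumptions, equal IPC means the whole speedup comes from the shorter cycle time of the clustered design, which is consistent with the way the abstract separates the IPC comparison from the cycle-time comparison.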

Published In

ISSS '00: Proceedings of the 13th international symposium on System synthesis

September 2000, 240 pages

ISBN: 1581132670

General Chair: Fadi Kurdahi, University of California, Irvine, CA, USA

Program Chair: Román Hermida, Complutense University, Madrid, Spain

Sponsors: IEEE Computer Society Technical Committee on Design Automation; SIGDA (ACM Special Interest Group on Design Automation)

Publisher: IEEE Computer Society, United States

Conference: ISSS00, 13th International Symposium on System Synthesis, September 20 - 22, 2000, Madrid, Spain

Overall Acceptance Rate: 38 of 71 submissions, 54%
