Instruction scheduling for clustered VLIW architectures

Published: 20 September 2000

Abstract

Clustered VLIW organizations are now a common trend in the design of embedded/DSP processors. In this work we propose a novel modulo scheduling approach for such architectures. The proposed technique performs cluster assignment and instruction scheduling in a single pass, which is more effective than performing the assignment first and the scheduling later. We also show that loop unrolling significantly enhances the performance of the proposed scheduler, especially when the communication channel among clusters is the main performance bottleneck. By selectively unrolling some loops, we can obtain the best performance with the minimum increase in code size. A performance evaluation on SPECfp95 shows that the clustered architecture achieves about the same IPC (Instructions Per Cycle) as a unified architecture with the same resources. Moreover, when the cycle time is taken into account, a 4-cluster configuration is 3.6 times faster than the unified architecture.
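
The last two claims of the abstract can be related through the standard execution-time identity. This is an illustrative reading rather than a formula taken from the paper: the symbols N (dynamic instruction count), T (cycle time), t (execution time) and the subscripts "uni" and "clu" are assumptions introduced here, and the sketch further assumes that both configurations execute roughly the same number of instructions.

```latex
% Execution time of a program with N instructions (assumed notation):
t = \frac{N}{\mathrm{IPC}} \cdot T

% Speedup of the clustered over the unified configuration, for equal N:
\frac{t_{\mathrm{uni}}}{t_{\mathrm{clu}}}
  = \frac{\mathrm{IPC}_{\mathrm{clu}}}{\mathrm{IPC}_{\mathrm{uni}}}
    \cdot \frac{T_{\mathrm{uni}}}{T_{\mathrm{clu}}}

% With IPC_clu \approx IPC_uni, as the abstract reports:
\frac{t_{\mathrm{uni}}}{t_{\mathrm{clu}}} \approx \frac{T_{\mathrm{uni}}}{T_{\mathrm{clu}}} \approx 3.6
```

Under these assumptions, the reported 3.6x advantage of the 4-cluster configuration would come almost entirely from its shorter cycle time, not from a higher IPC.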

Published In

ISSS '00: Proceedings of the 13th international symposium on System synthesis
September 2000
240 pages
ISBN:1581132670
  • General Chair: Fadi Kurdahi
  • Program Chair: Román Hermida

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

ISSS00

Acceptance Rates

Overall Acceptance Rate 38 of 71 submissions, 54%
