Software pipelining: an effective scheduling technique for VLIW machines

Author:

M. Lam

ACM SIGPLAN Notices, Volume 23, Issue 7

Pages 318 - 328

https://doi.org/10.1145/960116.54022

Published: 01 June 1988

Abstract

This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors. In software pipelining, iterations of a loop in the source program are continuously initiated at constant intervals, before the preceding iterations complete. The advantage of software pipelining is that optimal performance can be achieved with compact object code.
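The idea of initiating a new iteration before the previous one completes can be illustrated with a small sketch. This is not code from the paper: the three-stage loop body (load, multiply, store) and the function names `sequential` and `software_pipelined` are assumptions chosen for illustration, and Python executes the statements one after another, so the "same cycle" grouping is only indicated by comments.

```python
def sequential(a, k):
    """Reference loop: each iteration runs load, multiply, store in order."""
    out = [0] * len(a)
    for i in range(len(a)):
        x = a[i]      # stage 0: load
        y = x * k     # stage 1: multiply
        out[i] = y    # stage 2: store
    return out


def software_pipelined(a, k):
    """Same computation with iterations overlapped (hypothetical schedule).

    A new iteration starts every step; in the steady state one step holds the
    store of iteration i-2, the multiply of iteration i-1, and the load of
    iteration i. On a VLIW machine these three would be issued together.
    """
    n = len(a)
    if n < 2:
        return sequential(a, k)  # too short to overlap anything
    out = [0] * n

    # Prolog: fill the pipeline.
    x = a[0]          # load(0)
    y = x * k         # multiply(0)
    x = a[1]          # load(1)

    # Kernel: steady state, one iteration initiated per step.
    for i in range(2, n):
        out[i - 2] = y    # store(i-2)
        y = x * k         # multiply(i-1)
        x = a[i]          # load(i)

    # Epilog: drain the pipeline.
    out[n - 2] = y        # store(n-2)
    y = x * k             # multiply(n-1)
    out[n - 1] = y        # store(n-1)
    return out
```

The kernel is a single compact loop body, which reflects the abstract's point that the overlap does not require unrolling the whole loop; only the short prolog and epilog are added.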

This paper extends previous results on software pipelining in two ways. First, it shows that, by using an improved algorithm, near-optimal performance can be obtained without specialized hardware. Second, it proposes a hierarchical reduction scheme whereby entire control constructs are reduced to an object similar to an operation in a basic block. With this scheme, all innermost loops, including those containing conditional statements, can be software pipelined. The scheme also diminishes the start-up cost of loops with a small number of iterations. Hierarchical reduction complements the software pipelining technique, permitting a consistent performance improvement to be obtained.
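One way to picture reducing a conditional to a single operation-like object is sketched below. The abstract does not spell out the reduction rule, so this sketch assumes that the reduced node reserves, in each cycle, the larger of the two branches' resource demands and lasts as long as the longer branch. The function name `reduce_conditional` and the table layout (one dictionary of resource name to units per cycle) are hypothetical.

```python
from itertools import zip_longest


def reduce_conditional(then_table, else_table):
    """Merge two branch resource tables into one node's table (assumed rule).

    Each table is a list with one dict per cycle, mapping a resource name to
    the number of units that branch uses in that cycle. The merged table is as
    long as the longer branch, and each entry is the per-resource maximum, so
    a scheduler that places the merged node is safe whichever branch runs.
    """
    reduced = []
    for t, e in zip_longest(then_table, else_table, fillvalue={}):
        resources = sorted(set(t) | set(e))
        reduced.append({r: max(t.get(r, 0), e.get(r, 0)) for r in resources})
    return reduced


# Usage: a two-cycle "then" branch and a one-cycle "else" branch.
then_branch = [{"alu": 1}, {"mem": 1}]
else_branch = [{"alu": 2}]
merged = reduce_conditional(then_branch, else_branch)
```

Once the conditional is represented by `merged`, it can be scheduled alongside ordinary operations, which is what lets a loop containing the conditional be treated like a loop over a basic block.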

The techniques proposed have been validated by an implementation of a compiler for Warp, a systolic array consisting of 10 VLIW processors. This compiler has been used for developing a large number of applications in the areas of image, signal and scientific processing.

Index Terms

Software pipelining: an effective scheduling technique for VLIW machines
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language features
        Modules / packages
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling

Published In

ACM SIGPLAN Notices Volume 23, Issue 7

Proceedings of the SIGPLAN '88 conference on Programming language design and implementation

July 1988

338 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/960116

PLDI '88: Proceedings of the ACM SIGPLAN 1988 conference on Programming language design and implementation
June 1988
338 pages
ISBN:0897912691
DOI:10.1145/53990
Editor:
R. L. Wexelblat
Philips Laboratories, Briarcliff Manor, NY

Copyright © 1988 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
