Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
article
Free access

Software pipelining: an effective scheduling technique for VLIW machines

Published: 01 June 1988 Publication History

Abstract

This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors. In software pipelining, iterations of a loop in the source program are continuously initiated at constant intervals, before the preceding iterations complete. The advantage of software pipelining is that optimal performance can be achieved with compact object code.
This paper extends previous results of software pipelining in two ways: First, this paper shows that by using an improved algorithm, near-optimal performance can be obtained without specialized hardware. Second, we propose a hierarchical reduction scheme whereby entire control constructs are reduced to an object similar to an operation in a basic block. With this scheme, all innermost loops, including those containing conditional statements, can be software pipelined. It also diminishes the start-up cost of loops with small number of iterations. Hierarchical reduction complements the software pipelining technique, permitting a consistent performance improvement be obtained.
The techniques proposed have been validated by an implementation of a compiler for Warp, a systolic array consisting of 10 VLIW processors. This compiler has been used for developing a large number of applications in the areas of image, signal and scientific processing.

References

[1]
Aiken, A. and Nicolau, A. Perfect Pipelining: A New Loop Parallelization Technique. Comell University, Oct., 1987.
[2]
Ammratone, M., Bitz, F., Clune E., Kung H. T., Maulik, P., Ribas, H., Tseng, P., and Webb, J. Applications of Warp. Proc. Compcon Spring 87, San Francisco, Feb., 1987, pp. 272-275.
[3]
Annaratone, M., Bitz, F., Deutch, J., Hamey, L., Kung, H. T., Maulik P. C., Tseng, P., and Webb, J. A. Applications Experience on Warp. Proc. 1987 National Computer Cofference, AFIPS, Chicago, June, 1987, pp. 149-158.
[4]
Annaratone, M., Amould, E., Gross, T., Kung, H. T., Lain, M., Menzilcioglu, O. and Webb, I. A. "The Warp Computer: Architecture, Implementation and Performance". IEEE Transactions on Computers C-36, 12 (December 1987).
[5]
Colwell, R. P., Nix, R. P., O' Donnell, J. I., Papworth, D. B., and Rodman, P. K. A VLIW Architecture for a Trace Scheduling Compiler. Proc. Second Intl. Conf. on Amhitecmral Support for Programming Languages and Operating Systems, Oct., 1987, pp. 180-192.
[6]
Dantzig, G. B., Blatmer, W. O. and Rao, M. R. All Shortest Routes from a Fixed Origin in a Graph. Theory of Graphs, Rome, July, 1967, pp. 85-90.
[7]
E2xfioglu, Kemal. A Compilation Technique for Software Pipelining of Loops with Conditional Jumps. Proc. 20th Annual Workshop on Microprogramming, Dec., 1987.
[8]
Ellis, John R. Bulldog: A Compiler for VLIW Architectures. Ph.D. Th., Yale University, 1985.
[9]
Fisher, J. A. The Optimization of Horizontal Microcode Within and Beyond Basic Blocks: An Application of Processor Scheduling with Resources. Ph.D. Th., New York Univ., OeL 1979.
[10]
Fisher, J. A. "Trace Scheduling: A Technique for Global Microcode Compaction". IEEE Trans. on Computers C-30, 7 (July 1981), 478-490.
[11]
Fisher, J. A., Ellis, J. R., Ruttenberg, J. C. and Nicolau, A. Parallel Processing: A Smart Compiler and a Dumb Machine. Proc. ACM SIGPLAN '84 Syrup. on Compiler Construction, Montreal, Canada, June, 1984, pp. 37-47.
[12]
Fisher, J. A., Landskov, D. and Shriver, B. D. Microcode Compaction: looking Backward and Looking Forward. Proc. 1981 National Computer Conference, 1981, pp. 95-102.
[13]
Floyd, R. W. "Algorithm 97: Shortest Path". Comm. ACM 5, 6 (1962), 345.
[14]
Garey, Michael R. and Johnson, David S. Computers and Intractability A Guide to the Theory of NP-Completeness. Freeman, 1979.
[15]
Gross, T. and Lain, M. Compilation for a High-performance Systolic Array. Proc. ACM SIGPLAN 86 Symposium on Compiler Construction, June, 1986, pp. 27-3 8.
[16]
Hsu, Peter. Highly Concurrent Scalar Processing. Ph.D. Th., University of Hlinois at Urbana-Champaign, 1986.
[17]
Isoda, Sadahiro, Kobayashi, Yoshizumi, and Ishida, Tom. "Global Compaction of Horizontal Microprograms Based on the General/zed Data Dependency Graph". IEEE Trans. on Computers o32, 10 (October 1983), 922-933.
[18]
Kuck, D. J., Kuhn, R. H., Padua, D. A., Leasure, B. and Wolfe, M. l)etxmdence Graphs and Compiler Optimizations. Proc. ACM Symposium on Principles of Programming Languages, January, 1981, pp. 207-218.
[19]
Lah, I. and Atkin, E. Tree Compaction of Microprograms. Proc. 16th Annual Workshop on Microprogramming, Oct., 1982, pp. 23-33.
[20]
Lam, Monica. Compiler Optimizations for Asynchronous Systolic Array Programs. Proc. Fifteenth Annual ACM Symposium on Principles of Programming Languages, Jan., 1988.
[21]
Lain, Monica. A Systolic Array Optimizing Compiler. Ph.D. Th., Carnegie Mellon University, May 1987.
[22]
Linn, Joseph L SRDAG Compaction - A Generalization of Trace Scheduling to Increase the Use of Global Context Information. Proc. 16th Annual Workshop on Microprogramming, 1983, pp. 11-2Z
[23]
McMahon, F. H. Lawrence Livermore National Laboratory FORTRAN Kernels: MFLOPS.
[24]
Patel, Janak H. and Davidson, Edward S. Improving the Throughput of a Pipeline by Insertion of Delays. Proc. 3rd Annual Symposium on Computer Architecture, Jan., 1976, pp. 159-164.
[25]
Rau, B. R. and Glaeser, C. D. Some Scheduling Techniques and an Easily Schedulable Horizontal Architecture for High Performance Scientific Computing. Proc. 14th Annual Workshop on Microprogramming, Oct, 1981, pp. 183-198.
[26]
Su, B., Ding, S. and Jin, L. An Improvement of Trace Scheduling for Global Microcode Compaction. Proc. 17th Annual Workshop in Microprogramming, Dec., 1984, pp. 78-85.
[27]
5u, B., Ding, $., Wang, I. and Xia, J. GURPR- A Method for Global Software Pipelining. Proc. 20th Annual Workshop on Microprogramming, Dec., 1987, pp. 88-96.
[28]
Su, B., Ding, S. and Xia, J. URPR- An Extension of URCR for Software Pipeline. Proc. 19th Annual Workshop on Microprogramruing, OCt., 1986, pp. 104-108.
[29]
Tarjan, R. E. "Depth first search and linear graph algorithms". SlAM J. Computing 1, 2 (1972), 146-160.
[30]
Touzeau, R. F. A Fortran Compiler for the FPS-164 Scientific Computer. Proc. ACM SIGPLAN '84 Syrup. on Compiler Construction, June, 1984, pp. 48-57.
[31]
Weiss, S. and Smith, J. E. A Study of Scalar Compilation Techniques for Pipelined Supercomputers. Proc. Second Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct, 1987, pp. 105-109.
[32]
Wood, Graham. Global Op 'tlmization of Microprograms Through Modular Control Congress. Proe. 12th Annual Workshop in Microprogramming, 1979, pp. 1-6.

Cited By

View all
  • (2024)Efficient isochronous fixed-weight sampling with applications to NTRUIACR Communications in Cryptology10.62056/a6n59qgxqOnline publication date: 8-Jul-2024
  • (2024)Fully Verified Instruction SchedulingProceedings of the ACM on Programming Languages10.1145/36897398:OOPSLA2(791-816)Online publication date: 8-Oct-2024
  • (2024)Optimizing RGB Convolution on Cortex-M7 MCU: Approaches for Performance ImprovementIEEE Access10.1109/ACCESS.2024.343958612(140594-140604)Online publication date: 2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 23, Issue 7
Proceedings of the SIGPLAN '88 conference on Programming language design and implementation
July 1988
338 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/960116
Issue’s Table of Contents
  • cover image ACM Conferences
    PLDI '88: Proceedings of the ACM SIGPLAN 1988 conference on Programming language design and implementation
    June 1988
    338 pages
    ISBN:0897912691
    DOI:10.1145/53990
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1988
Published in SIGPLAN Volume 23, Issue 7

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)435
  • Downloads (Last 6 weeks)41
Reflects downloads up to 09 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Efficient isochronous fixed-weight sampling with applications to NTRUIACR Communications in Cryptology10.62056/a6n59qgxqOnline publication date: 8-Jul-2024
  • (2024)Fully Verified Instruction SchedulingProceedings of the ACM on Programming Languages10.1145/36897398:OOPSLA2(791-816)Online publication date: 8-Oct-2024
  • (2024)Optimizing RGB Convolution on Cortex-M7 MCU: Approaches for Performance ImprovementIEEE Access10.1109/ACCESS.2024.343958612(140594-140604)Online publication date: 2024
  • (2024)nAIxt: A Light-Weight Processor Architecture for Efficient Computation of Neuron ModelsArchitecture of Computing Systems10.1007/978-3-031-66146-4_1(3-17)Online publication date: 13-May-2024
  • (2023)Implementation of Dataflow Software Pipelining for Codelet ModelProceedings of the 2023 ACM/SPEC International Conference on Performance Engineering10.1145/3578244.3583734(161-172)Online publication date: 15-Apr-2023
  • (2023)BibliographyEngineering a Compiler10.1016/B978-0-12-815412-0.00023-1(793-813)Online publication date: 2023
  • (2023)Instruction SchedulingEngineering a Compiler10.1016/B978-0-12-815412-0.00018-8(617-662)Online publication date: 2023
  • (2022)Reducing the configuration overhead of the distributed two-level control systemProceedings of the 2022 Conference & Exhibition on Design, Automation & Test in Europe10.5555/3539845.3539877(104-107)Online publication date: 14-Mar-2022
  • (2022)Reducing the Configuration Overhead of the Distributed Two-level Control System2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)10.23919/DATE54114.2022.9774741(104-107)Online publication date: 14-Mar-2022
  • (2022)Routing brain traffic through the von Neumann bottleneckParallel Computing10.1016/j.parco.2022.102952113:COnline publication date: 1-Oct-2022
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media