Impact of self-scheduling order on performance on multiprocessor systems

Authors:

C.-Q. Zhu

ICS '88: Proceedings of the 2nd international conference on Supercomputing

Pages 593 - 603

https://doi.org/10.1145/55364.55422

Published: 01 June 1988

Abstract

Processor self-scheduling is an efficient dynamic scheduling scheme for multiprocessors. This paper discusses the impact of the self-scheduling order on the performance of multiply-nested parallel loops.

It is shown that, because of the data synchronization required by cross-iteration data dependences, the completion time of a multiply-nested loop is reduced when the nested parallel loops with smaller delays are moved inward. The best performance is achieved when a shortest-delay scheduling order is used. The performance of shortest-delay self-scheduling is compared with that of other self-scheduling orders and with a compile-time static scheduling order proposed elsewhere. The program transformation needed to implement shortest-delay self-scheduling is also described.
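The effect described above can be illustrated with a small simulation. The sketch below is a toy model and not the paper's own analysis: it assumes a doubly-nested loop in which every iteration takes the same time, each iteration may start only a fixed delay after the start of its predecessor along each loop dimension, and idle processors fetch iterations one at a time in lexicographic order. The function name `completion_time`, its parameters, and the numeric values are hypothetical choices made for illustration.

```python
import heapq


def completion_time(n_outer, n_inner, d_outer, d_inner, body_time, n_procs):
    """Toy model of self-scheduling a doubly-nested loop with dependences.

    Assumptions (not taken from the paper): iteration (o, i) may start only
    d_outer time units after (o-1, i) starts and d_inner time units after
    (o, i-1) starts; processors fetch iterations in order, inner index fastest.
    """
    free_at = [0] * n_procs          # time at which each processor becomes idle
    heapq.heapify(free_at)
    start = [[0] * n_inner for _ in range(n_outer)]
    finish = 0
    for o in range(n_outer):
        for i in range(n_inner):
            # The earliest idle processor fetches the next iteration ...
            t = heapq.heappop(free_at)
            # ... then waits until both cross-iteration dependences are met.
            if o > 0:
                t = max(t, start[o - 1][i] + d_outer)
            if i > 0:
                t = max(t, start[o][i - 1] + d_inner)
            start[o][i] = t
            end = t + body_time
            finish = max(finish, end)
            heapq.heappush(free_at, end)
    return finish


# Same 4 x 4 loop nest on 4 processors, scheduled in the two possible orders.
small_inner = completion_time(4, 4, d_outer=8, d_inner=2, body_time=10, n_procs=4)
large_inner = completion_time(4, 4, d_outer=2, d_inner=8, body_time=10, n_procs=4)
print("smaller delay innermost:", small_inner)
print("larger delay innermost: ", large_inner)
```

Under these assumptions, placing the loop with the smaller delay innermost lets consecutive fetched iterations start sooner after one another, so processors spend less time waiting on synchronization. The model ignores scheduling overhead and uneven iteration times, both of which a real system would have to account for.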


Published In

ICS '88: Proceedings of the 2nd international conference on Supercomputing

June 1988

679 pages

ISBN:0897912721

DOI:10.1145/55364

Editor:
J. Lenfant
Rennes

Copyright © 1988 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
