DOI:10.1145/55364.55422

Impact of self-scheduling order on performance on multiprocessor systems

Published: 01 June 1988

Abstract

Processor self-scheduling is an efficient dynamic scheduling scheme for multiprocessors. This paper discusses the impact of the self-scheduling order on the performance of multiply-nested parallel loops.
It is shown that, because data synchronization is needed to enforce cross-iteration data dependencies, the completion time of a multiply-nested loop is reduced when the parallel loops with smaller delays are moved to the inner nesting levels. The best performance is achieved when a shortest-delay scheduling order is used. The performance of shortest-delay self-scheduling is compared with that of other self-scheduling orders and with a compile-time static scheduling order proposed elsewhere. The program transformation needed to implement shortest-delay self-scheduling is also described.
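
The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's model or algorithm; it is a toy under stated assumptions: a two-level loop nest, a fixed body time per iteration, iterations handed out in lexicographic order to whichever processor becomes idle first, and a "delay" meaning the minimum gap between the start of an iteration and the start of its successor along the same loop. The function name `completion_time` and all of its parameters are hypothetical.

```python
import heapq


def completion_time(n_outer, n_inner, d_outer, d_inner, body_time, processors):
    """Toy model of self-scheduling a two-level loop nest (illustrative only).

    Assumptions (not taken from the paper):
      * iterations are dispatched in lexicographic (outer, inner) order;
      * iteration (o, i) may start only d_outer after (o-1, i) started
        and d_inner after (o, i-1) started;
      * a processor that grabs an iteration is held while it waits.
    """
    start = {}                      # start time of each iteration
    idle_at = [0.0] * processors    # min-heap of times processors become idle
    heapq.heapify(idle_at)
    finish = 0.0
    for o in range(n_outer):
        for i in range(n_inner):
            # The earliest idle processor grabs the next iteration.
            t = heapq.heappop(idle_at)
            # It then waits until both cross-iteration dependencies allow a start.
            if o > 0:
                t = max(t, start[(o - 1, i)] + d_outer)
            if i > 0:
                t = max(t, start[(o, i - 1)] + d_inner)
            start[(o, i)] = t
            end = t + body_time
            heapq.heappush(idle_at, end)
            finish = max(finish, end)
    return finish


if __name__ == "__main__":
    # Same two loops, two nesting orders: short delay inside vs. long delay inside.
    short_inside = completion_time(8, 8, d_outer=8, d_inner=1,
                                   body_time=10, processors=4)
    long_inside = completion_time(8, 8, d_outer=1, d_inner=8,
                                  body_time=10, processors=4)
    print("short delay inside:", short_inside)
    print("long delay inside: ", long_inside)
```

In this toy model, consecutive dispatched iterations are neighbours along the inner loop, so a long inner delay keeps processors waiting on each other, while a long outer delay has usually elapsed by the time the dependent iteration is dispatched.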

Published In

ICS '88: Proceedings of the 2nd international conference on Supercomputing
June 1988
679 pages
ISBN:0897912721
DOI:10.1145/55364
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Cited By

  • (2012) Prediction-Based Independent Task Scheduling for Heterogeneous Distributed Computing Systems. Advanced Materials Research 457-458 (1039-1046). DOI:10.4028/www.scientific.net/AMR.457-458.1039. Online publication date: Jan-2012
  • (2003) Chapter 5: Parallel program scheduling with given parallel profile. Electronic Notes in Discrete Mathematics 14 (97-111). DOI:10.1016/S1571-0653(04)80604-8. Online publication date: May-2003
  • (2000) Dynamic Task Scheduling Using Online Optimization. IEEE Transactions on Parallel and Distributed Systems 11:11 (1151-1163). DOI:10.1109/71.888636. Online publication date: 1-Nov-2000
  • (1999) Dynamic Matching and Scheduling of a Class of Independent Tasks onto Heterogeneous Computing Systems. Proceedings of the Eighth Heterogeneous Computing Workshop. DOI:10.5555/795690.797893. Online publication date: 12-Apr-1999
  • (1999) Dynamic matching and scheduling of a class of independent tasks onto heterogeneous computing systems. Proceedings. Eighth Heterogeneous Computing Workshop (HCW'99) (30-44). DOI:10.1109/HCW.1999.765094. Online publication date: 1999
  • (1995) An optimal lower bound on the maximum speedup in multiprocessors with clusters. Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing (640-649). DOI:10.1109/ICAPP.1995.472251. Online publication date: 1995
  • (1994) An optimal upper bound on the minimal completion time in distributed supercomputing. Proceedings of the 8th international conference on Supercomputing (196-203). DOI:10.1145/181181.181339. Online publication date: 16-Jul-1994
  • (1993) Self-scheduling on distributed-memory machines. Proceedings of the 1993 ACM/IEEE conference on Supercomputing (814-823). DOI:10.1145/169627.169841. Online publication date: 1-Dec-1993
  • (1993) Scheduling non-uniform parallel loops on distributed memory machines. [1993] Proceedings of the Twenty-sixth Hawaii International Conference on System Sciences (516-525). DOI:10.1109/HICSS.1993.284074. Online publication date: 1993
  • (1993) Automatic program parallelization. Proceedings of the IEEE 81:2 (211-243). DOI:10.1109/5.214548. Online publication date: Jan-1993
