research-article

Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors

Authors:

R. Cytron

IEEE Transactions on Computers, Volume 38, Issue 8

Pages 1203 - 1211

https://doi.org/10.1109/12.30873

Published: 01 August 1989

Abstract

Parallel execution of nonvectorizable uniform recurrences is considered. When naively scheduled, such recurrences could create unacceptable communication and synchronization on a multiprocessor. The minimum-distance method partitions such recurrences into totally independent computations without increasing redundancy or perturbing numerical stability. The independent computations are well suited for execution on a multiprocessor, but they may not utilize all available processors. How extra processors can be applied to the independent computations is addressed. The methods are especially attractive for multiprocessors comprised of clusters.
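As a rough illustration of what partitioning a recurrence into independent computations can look like, the sketch below handles only a simplified one-dimensional case and is not the paper's algorithm. It assumes iteration i reads the results of iterations i - d for each d in a fixed list of dependence distances, so every dependence stays inside one residue class modulo g, the gcd of the distances. The function names and the additive recurrence body are hypothetical, chosen only for the example.

```python
from functools import reduce
from math import gcd


def partition_iterations(n, distances):
    """Split iterations 0..n-1 into sets that share no dependences.

    Assumption (not from the paper): iteration i depends on iteration
    i - d for every d in `distances`. Each d is a multiple of
    g = gcd(distances), so a dependence never crosses residue classes mod g.
    """
    g = reduce(gcd, distances)
    return [list(range(r, n, g)) for r in range(g)]


def run_sequential(n, distances, b):
    """Reference execution: a[i] = b[i] + sum of a[i - d] over all distances."""
    a = [0] * n
    for i in range(n):
        a[i] = b[i] + sum(a[i - d] for d in distances if i - d >= 0)
    return a


def run_partitioned(n, distances, b):
    """Same recurrence, executed one independent set at a time.

    Each set is visited in increasing index order, so every value it reads
    was produced earlier within the same set. The sets could therefore be
    handed to different processors without synchronization.
    """
    a = [0] * n
    for chain in partition_iterations(n, distances):
        for i in chain:
            a[i] = b[i] + sum(a[i - d] for d in distances if i - d >= 0)
    return a
```

With distances 4 and 6, for instance, g is 2, so the even and odd iterations form two sets that never exchange data, and the partitioned run produces the same values as the sequential one.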




Information & Contributors

Information

Published In

IEEE Transactions on Computers Volume 38, Issue 8

August 1989

168 pages

ISSN:0018-9340

Editor:
Ming T. Lu
Ohio State Univ., Columbus, OH

Copyright © 1989 IEEE. All Rights Reserved.

Publisher

IEEE Computer Society

United States


Bibliometrics & Citations

Bibliometrics

Total Citations: 30
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 15 Oct 2024

Citations

Cited By

Zhao Q, Koh D, Raza S, Bruening D, Wong W, Amarasinghe S (2011). Dynamic cache contention detection in multi-threaded applications. ACM SIGPLAN Notices 46:7 (27-38). DOI: 10.1145/2007477.1952688. Online publication date: 9-Mar-2011
https://dl.acm.org/doi/10.1145/2007477.1952688
Zhao Q, Koh D, Raza S, Bruening D, Wong W, Amarasinghe S, Petrank E, Lea D (2011). Dynamic cache contention detection in multi-threaded applications. Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments (27-38). DOI: 10.1145/1952682.1952688. Online publication date: 9-Mar-2011
https://dl.acm.org/doi/10.1145/1952682.1952688
Djamegni C, Quinton P, Rajopadhye S, Risset T, Tchuenté M (2009). A reindexing based approach towards mapping of DAG with affine schedules onto parallel embedded systems. Journal of Parallel and Distributed Computing 69:1 (1-11). DOI: 10.1016/j.jpdc.2008.08.004. Online publication date: 1-Jan-2009
https://dl.acm.org/doi/10.1016/j.jpdc.2008.08.004
Zhang C, Kurdahi F (2007). Reducing off-chip memory access via stream-conscious tiling on multimedia applications. International Journal of Parallel Programming 35:1 (63-98). DOI: 10.5555/1241828.1241831. Online publication date: 1-Feb-2007
https://dl.acm.org/doi/10.5555/1241828.1241831
Djamegni C (2004). Mapping rectangular mesh algorithms onto asymptotically space-optimal arrays. Journal of Parallel and Distributed Computing 64:3 (345-359). DOI: 10.1016/j.jpdc.2003.04.002. Online publication date: 1-Mar-2004
https://dl.acm.org/doi/10.1016/j.jpdc.2003.04.002
Darte A, Huard G (2002). Complexity of Multi-dimensional Loop Alignment. Proceedings of the 19th Annual Symposium on Theoretical Aspects of Computer Science (179-191). DOI: 10.5555/646516.696302. Online publication date: 14-Mar-2002
https://dl.acm.org/doi/10.5555/646516.696302
Drositis I, Goumas G, Koziris N, Tsanakas P, Papakonstantinou G (2000). Evaluation of Loop Grouping Methods Based on Orthogonal Projection Spaces. Proceedings of the 2000 International Conference on Parallel Processing. DOI: 10.5555/850941.852932. Online publication date: 21-Aug-2000
https://dl.acm.org/doi/10.5555/850941.852932
Tsanakas P, Koziris N, Papakonstantinou G (2000). Chain Grouping. IEEE Transactions on Parallel and Distributed Systems 11:9 (941-955). DOI: 10.1109/71.879777. Online publication date: 1-Sep-2000
https://dl.acm.org/doi/10.1109/71.879777
Kyriakis-Bitzaros E, Goutis C (1999). A Space-Time Representation Method of Iterative Algorithms for the Design of Processor Arrays. Journal of VLSI Signal Processing Systems 22:3 (151-162). DOI: 10.1023/A:1008103504836. Online publication date: 20-Sep-1999
https://dl.acm.org/doi/10.1023/A%3A1008103504836
Jin G, Li Z, Chen F (1998). An Efficient Solution to the Cache Thrashing Problem Caused by True Data Sharing. IEEE Transactions on Computers 47:5 (527-543). DOI: 10.1109/12.677228. Online publication date: 1-May-1998
https://dl.acm.org/doi/10.1109/12.677228