DOI: 10.1145/263764.263773
Space and time efficient execution of parallel irregular computations

Published: 21 June 1997

Abstract

Solving large problems is an important goal for parallel machines with multiple CPU and memory resources. This paper addresses the efficient execution of overhead-sensitive parallel irregular computations under memory constraints. The irregular parallelism is modeled by task dependence graphs with mixed granularities, and the trade-off between time efficiency and space efficiency is investigated. The main difficulty in designing efficient run-time system support is caused by the use of the fast communication primitives available on modern parallel architectures. A run-time active memory management scheme and new scheduling techniques are proposed to improve memory utilization while retaining good time efficiency, and a theoretical analysis of correctness and performance is provided. This work is implemented in the context of the RAPID system [5], which provides run-time support for parallelizing irregular code on distributed memory machines. The effectiveness of the proposed techniques is verified on sparse Cholesky factorization and on sparse LU factorization with partial pivoting. The experimental results on a Cray T3D show that solvable problem sizes can be increased substantially under limited memory capacities, and that the loss of execution efficiency caused by the extra memory management overhead is reasonable.
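The time/space trade-off mentioned in the abstract can be illustrated with a small sketch. This is not the paper's memory management scheme or its scheduling algorithm; it is only a toy model, under the assumption that each task allocates one output buffer that can be released once every task reading it has run. The names `peak_memory`, `DEPS`, and `SIZE` and the five-task graph are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical task dependence graph: DEPS[t] lists the tasks whose output t reads.
DEPS = {"a": [], "b": [], "c": ["a"], "d": ["b"], "e": ["c", "d"]}
# Hypothetical size of each task's output buffer (arbitrary units).
SIZE = {"a": 4, "b": 4, "c": 1, "d": 1, "e": 1}


def peak_memory(order, deps, size):
    """Peak live memory when tasks run one after another in `order`.

    Assumption: a task's output stays allocated until all of its
    consumers have run, and is released immediately afterwards.
    """
    remaining = defaultdict(int)  # consumers still to run, per producer
    for parents in deps.values():
        for p in parents:
            remaining[p] += 1

    done, live, peak = set(), 0, 0
    for task in order:
        parents = deps.get(task, [])
        if any(p not in done for p in parents):
            raise ValueError(f"{task} scheduled before its dependences")
        live += size[task]  # allocate this task's output
        peak = max(peak, live)
        for p in parents:  # release inputs that no later task needs
            remaining[p] -= 1
            if remaining[p] == 0:
                live -= size[p]
        done.add(task)
    return peak


breadth_first = peak_memory(["a", "b", "c", "d", "e"], DEPS, SIZE)
depth_first = peak_memory(["a", "c", "b", "d", "e"], DEPS, SIZE)
print(breadth_first, depth_first)
```

Both orders respect the dependences, but running `c` right after `a` lets the large buffer of `a` be released before `b` is allocated, so the second order needs less memory at its peak. On a parallel machine the first order exposes more independent work at once, which is the kind of tension between time and space that the abstract refers to.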

References

[1]
G. E. Blelloch, P. B. Gibbons, and Y. Matias. Provably Efficient Scheduling for Languages with Fine-Grained Parallelism. In Proceedings of 7th ACM Symposium on Parallel Algorithms and Architectures, pages 1-12, July 1995.
[2]
R. Blumofe, C. Joerg, B. Kuszmaul, C. Leiserson, K. Randall, and Y. Zhou. Cilk: An Efficient Multithreaded Runtime System. In Proceedings of Fifth ACM Symposium on Principles and Practice of Parallel Programming, pages 207-216, July 1995.
[3]
S. Chakrabarti, J. Demmel, and K. Yelick. Modeling the Benefits of Mixed Data and Task Parallelism. In Proceedings of 7th ACM Symposium on Parallel Algorithms and Architectures, pages 74-83, July 1995.
[4]
R. Cytron and J. Ferrante. What's in a Name? The Value of Renaming for Parallelism Detection and Storage Allocation. In Proceedings of International Conference on Parallel Processing, pages 19-27, February 1987.
[5]
C. Fu and T. Yang. Run-time Compilation for Parallel Sparse Matrix Computations. In Proceedings of ACM International Conference on Supercomputing, pages 237-244, Philadelphia, May 1996.
[6]
C. Fu and T. Yang. Sparse LU Factorization with Partial Pivoting on Distributed Memory Machines. In Proceedings of ACM/IEEE Supercomputing'96, Pittsburgh, November 1996.
[7]
C. Fu and T. Yang. Run-time Techniques for Exploiting Irregular Task Parallelism on Distributed Memory Architectures. Journal of Parallel and Distributed Computing, 1997. Accepted for publication. Also as UCSB technical report TRCS97-03.
[8]
A. Gerasoulis, J. Jiao, and T. Yang. Scheduling of Structured and Unstructured Computation. In D. Hsu, A. Rosenberg, and D. Sotteau, editors, Interconnection Networks and Mapping and Scheduling Parallel Computations, pages 139-172. American Math. Society, 1995.
[9]
M. Girkar and C. Polychronopoulos. Automatic Extraction of Functional Parallelism from Ordinary Programs. IEEE Transactions on Parallel and Distributed Systems, 3(2):166-178, 1992.
[10]
M. Ibel, K. E. Schauser, C. J. Scheiman, and M. Weis. Implementing Active Messages and Split-C for SCI Clusters and Some Architectural Implications. In Sixth International Workshop on SCI-based Low-cost/High-performance Computing, September 1996.
[11]
X. Li. Sparse Gaussian Elimination on High Performance Computers. PhD thesis, CS, UC Berkeley, 1996.
[12]
C. D. Polychronopoulos. Parallel Programming and Compilers. Kluwer Academic Publishers, 1988.
[13]
S. Ramaswamy, S. Sapatnekar, and P. Banerjee. A Convex Programming Approach for Exploiting Data and Functional Parallelism. In Proceedings of International Conference on Parallel Processing, pages 116-125, 1994.
[14]
E. Rothberg and R. Schreiber. Improved Load Distribution in Parallel Sparse Cholesky Factorization. In Proceedings of ACM/IEEE Supercomputing, pages 783-792, November 1994.
[15]
J. Saltz, K. Crowley, R. Mirchandaney, and H. Berryman. Run-Time Scheduling and Execution of Loops on Message Passing Machines. Journal of Parallel and Distributed Computing, 8:303-312, 1990.
[16]
V. Sarkar. Partitioning and Scheduling Parallel Programs for Execution on Multiprocessors. MIT Press, 1989.
[17]
R. Schreiber. Scalability of Sparse Direct Solvers. In Alan George, John R. Gilbert, and Joseph W.H. Liu, editors, Graph Theory and Sparse Matrix Computation, volume 56, pages 191-209. Springer-Verlag, New York, 1993.
[18]
T. Stricker, J. Stichnoth, D. O'Hallaron, S. Hinrichs, and T. Gross. Decoupling Synchronization and Data Transfer in Message Passing Systems of Parallel Computers. In Proceedings of ACM International Conference on Supercomputing, pages 1-10, Barcelona, July 1995.
[19]
R. Wolski and J. Feo. Program Partitioning for NUMA Multiprocessor Computer Systems. Journal of Parallel and Distributed Computing, 1993.
[20]
T. Yang and A. Gerasoulis. List Scheduling with and without Communication Delays. Parallel Computing, 19:1321-1344, 1992.
[21]
T. Yang and A. Gerasoulis. DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors. IEEE Transactions on Parallel and Distributed Systems, 5(9):951-967, 1994. A short version is in Proceedings of Supercomputing'91.

Published In

PPOPP '97: Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
June 1997
287 pages
ISBN:0897919068
DOI:10.1145/263764
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

PPoPP97: Principles & Practices of Parallel Programming
June 18 - 21, 1997
Las Vegas, Nevada, USA

Acceptance Rates

PPOPP '97 Paper Acceptance Rate: 26 of 86 submissions, 30%
Overall Acceptance Rate: 230 of 1,014 submissions, 23%

Cited By

  • (2019) Compact DAG Representation and Its Dynamic Scheduling. Journal of Parallel and Distributed Computing, 58:3 (487-514). DOI: 10.1006/jpdc.1999.1566. Online publication date: 4-Jan-2019
  • (2014) Author retrospective for PYRROS. ACM International Conference on Supercomputing 25th Anniversary Volume (18-20). DOI: 10.1145/2591635.2591647. Online publication date: 10-Jun-2014
  • (1999) Compile/run-time support for threaded MPI execution on multiprogrammed shared memory machines. ACM SIGPLAN Notices, 34:8 (107-118). DOI: 10.1145/329366.301114. Online publication date: 1-May-1999
  • (1999) Compile/run-time support for threaded MPI execution on multiprogrammed shared memory machines. Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming (107-118). DOI: 10.1145/301104.301114. Online publication date: 1-May-1999
  • (1998) Elimination forest guided 2D sparse LU factorization. Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures (5-15). DOI: 10.1145/277651.277658. Online publication date: 1-Jun-1998
  • (1998) Low memory cost dynamic scheduling of large coarse grain task graphs. Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing (524-530). DOI: 10.1109/IPPS.1998.669966. Online publication date: 1998
  • (1998) Symbolic partitioning and scheduling of parameterized task graphs. Proceedings 1998 International Conference on Parallel and Distributed Systems (Cat. No.98TB100250) (428-434). DOI: 10.1109/ICPADS.1998.741109. Online publication date: 1998
  • (1998) Efficient Sparse LU Factorization with Partial Pivoting on Distributed Memory Architectures. IEEE Transactions on Parallel and Distributed Systems, 9:2 (109-125). DOI: 10.1109/71.663864. Online publication date: 1-Feb-1998
  • (1997) Global Optimization for Mapping Parallel Image Processing Tasks on Distributed Memory Machines. Journal of Parallel and Distributed Computing, 45:1 (29-45). DOI: 10.1006/jpdc.1997.1360. Online publication date: 25-Aug-1997
