Abstract
In this paper, we present an efficient framework for intraprocedural, performance-based program partitioning of sequential loop nests. Because of the limitations of static dependence analysis, especially across procedure boundaries, many loop nests are classified as sequential even though task parallelism among them could potentially be exploited. Since this available parallelism is quite limited, performance-based program analysis and partitioning, which carefully analyzes the interaction between the loop nests and the underlying architectural characteristics, must be undertaken to exploit it effectively.
We propose a compiler-driven approach that configures the underlying architecture to support a given communication mechanism. We then devise an iterative program partitioning algorithm that generates an efficient partitioning by analyzing the interaction between the effective cost of communication and the corresponding partitions. We model the problem as one of partitioning a directed acyclic task graph (DAG) in which each node corresponds to a sequential loop nest and the edges denote the precedences and the communication between nodes, i.e., the data transferred between loop nests. We introduce the concept of behavioral edges between edges and nodes of the task graph to capture the interactions between computation and communication through parametric functions. We present an efficient iterative partitioning algorithm that uses the behavioral-edge-augmented PDG to incrementally compute and improve the schedule. We demonstrate a significant performance improvement (a factor of 10 in many cases) by applying our framework to applications that exhibit this type of parallelism.
This work is supported by National Science Foundation grant no. CCR-9696129 and by DARPA contract ARMY DABT63-97-C-0029.
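To make the task-graph model more concrete, the sketch below shows one way the behavioral-edge idea could be encoded together with a simple iterative-improvement partitioner. It is our own minimal reconstruction under stated assumptions, not the authors' implementation: the names (`TaskGraph`, `estimated_cost`, `iterative_partition`), the greedy single-move improvement loop, and the additive cost model are illustrative only, and the paper's actual algorithm and parametric cost functions may differ.

```python
# Minimal illustrative sketch (an assumption, not the paper's code):
# a DAG of loop-nest tasks whose cross-processor edges are charged either a
# fixed data volume or a placement-dependent "behavioral" cost function,
# plus a greedy iterative-improvement partitioner over that model.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Placement = Dict[str, int]  # task name -> processor id


@dataclass
class TaskNode:
    """One node of the task DAG: a sequential loop nest with an estimated cost."""
    name: str
    work: float


@dataclass
class TaskGraph:
    nodes: List[TaskNode] = field(default_factory=list)
    # Precedence edges, weighted by the volume of data passed between loop nests.
    edges: Dict[Tuple[str, str], float] = field(default_factory=dict)
    # Behavioral information: a parametric function giving an edge's effective
    # communication cost as a function of the current placement.
    behavior: Dict[Tuple[str, str], Callable[[Placement], float]] = field(default_factory=dict)


def estimated_cost(g: TaskGraph, placement: Placement) -> float:
    """Crude cost model: work on the busiest processor plus the effective cost
    of every edge that crosses processors under this placement."""
    per_proc: Dict[int, float] = {}
    for n in g.nodes:
        per_proc[placement[n.name]] = per_proc.get(placement[n.name], 0.0) + n.work
    comm = 0.0
    for (src, dst), volume in g.edges.items():
        if placement[src] != placement[dst]:
            cost_fn = g.behavior.get((src, dst))
            comm += cost_fn(placement) if cost_fn else volume
    return max(per_proc.values()) + comm


def iterative_partition(g: TaskGraph, n_procs: int, max_rounds: int = 20) -> Placement:
    """Start with all tasks on one processor, then repeatedly apply the single
    task move that most reduces the estimated cost, until no move helps."""
    placement: Placement = {n.name: 0 for n in g.nodes}
    for _ in range(max_rounds):
        best_cost = estimated_cost(g, placement)
        best_move: Optional[Tuple[str, int]] = None
        for n in g.nodes:
            for p in range(n_procs):
                trial = dict(placement)
                trial[n.name] = p
                cost = estimated_cost(g, trial)
                if cost < best_cost:
                    best_cost, best_move = cost, (n.name, p)
        if best_move is None:
            break  # local optimum under this cost model
        placement[best_move[0]] = best_move[1]
    return placement
```

Under these assumptions, a behavioral cost function for an edge might, for example, charge a fixed message startup latency plus a per-byte cost that grows with the number of other cross-processor edges the current placement creates, which is one way partition-dependent communication behavior could be expressed parametrically.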
© 1999 Springer-Verlag
Cite this paper
Subramanian, R., Pande, S. (1999). Efficient program partitioning based on compiler controlled communication. In: Rolim, J., et al. Parallel and Distributed Processing. IPPS 1999. Lecture Notes in Computer Science, vol 1586. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0097884
DOI: https://doi.org/10.1007/BFb0097884
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65831-3
Online ISBN: 978-3-540-48932-0