Summary
For data-parallel languages such as High Performance Fortran to achieve wide acceptance, parallelizing compilers must be able to provide consistently high performance for a broad spectrum of scientific applications. Although compilation of regular data-parallel applications for message-passing systems has been widely studied, current state-of-the-art compilers implement only a small number of key optimizations, and the implementations generally focus on optimizing programs using a “case-based” approach. For these reasons, current compilers are unable to provide consistently high levels of performance. In this paper, we describe techniques developed in the Rice dHPF compiler to address key code generation challenges that arise in achieving high performance for regular applications on message-passing systems. We focus on techniques required to implement advanced optimizations and to achieve consistently high performance with existing optimizations. Many of the core communication analysis and code generation algorithms in dHPF are expressed in terms of abstract equations manipulating integer sets. This approach enables general yet simple implementations of sophisticated optimizations, making it more practical to include a comprehensive set of optimizations in data-parallel compilers. It also enables the compiler to support much more aggressive computation partitioning algorithms than previous compilers do. We therefore believe this approach can provide higher and more consistent levels of performance than are available today.
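To make the idea of expressing communication analysis as equations over integer sets concrete, the following is a minimal sketch and not dHPF's implementation. It assumes a one-dimensional array of n elements with 0-based indices, a BLOCK distribution over nprocs processors, the owner-computes rule, and the hypothetical loop A(i) = B(i-1) + B(i+1) for i = 1..n-2. All function names are illustrative, and the sets are enumerated explicitly, whereas a compiler would have to manipulate them symbolically.

```python
# Illustrative only: explicit Python sets stand in for symbolic integer sets.
# Assumed loop (hypothetical): A(i) = B(i-1) + B(i+1) for i = 1 .. n-2,
# with A and B both BLOCK-distributed and indexed from 0.

def owned(p, n, nprocs):
    """Indices of an n-element BLOCK-distributed array owned by processor p."""
    block = -(-n // nprocs)  # ceiling division gives the block size
    return set(range(p * block, min((p + 1) * block, n)))

def local_iterations(p, n, nprocs):
    """Owner-computes rule: p runs the iterations whose A(i) it owns."""
    loop = set(range(1, n - 1))
    return loop & owned(p, n, nprocs)

def nonlocal_reads(p, n, nprocs):
    """Elements of B read by p's iterations that p does not own."""
    iters = local_iterations(p, n, nprocs)
    accessed = {i + d for i in iters for d in (-1, 1)}
    return accessed - owned(p, n, nprocs)

def recv_sets(p, n, nprocs):
    """Split p's non-local reads by owning processor: one message per entry."""
    need = nonlocal_reads(p, n, nprocs)
    result = {}
    for q in range(nprocs):
        part = need & owned(q, n, nprocs)
        if q != p and part:
            result[q] = sorted(part)
    return result
```

Each step is a single set equation (intersection, image under the subscript functions, difference), which is what keeps the formulation general: changing the distribution or the subscripts changes only the definitions of the sets, not the structure of the computation. For example, under these assumptions `recv_sets(1, 16, 4)` says that processor 1 needs element 3 from processor 0 and element 8 from processor 2.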
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Adve, V., Mellor-Crummey, J. (2001). Advanced Code Generation for High Performance Fortran. In: Pande, S., Agrawal, D.P. (eds) Compiler Optimizations for Scalable Parallel Systems. Lecture Notes in Computer Science, vol 1808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45403-9_16
DOI: https://doi.org/10.1007/3-540-45403-9_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41945-7
Online ISBN: 978-3-540-45403-8
eBook Packages: Springer Book Archive