Advanced Code Generation for High Performance Fortran

Adve, Vikram; Mellor-Crummey, John

doi:10.1007/3-540-45403-9_16


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1808)

Summary

For data-parallel languages such as High Performance Fortran to achieve wide acceptance, parallelizing compilers must provide consistently high performance for a broad spectrum of scientific applications. Although compilation of regular data-parallel applications for message-passing systems has been widely studied, current state-of-the-art compilers implement only a small number of key optimizations, and those implementations generally optimize programs using a “case-based” approach. As a result, current compilers cannot provide consistently high levels of performance. In this paper, we describe techniques developed in the Rice dHPF compiler to address key code generation challenges that arise in achieving high performance for regular applications on message-passing systems. We focus on techniques required to implement advanced optimizations and to achieve consistently high performance with existing optimizations. Many of the core communication analysis and code generation algorithms in dHPF are expressed in terms of abstract equations manipulating integer sets. This approach enables general yet simple implementations of sophisticated optimizations, making it more practical to include a comprehensive set of optimizations in data-parallel compilers. It also enables the compiler to support much more aggressive computation partitioning algorithms than previous compilers did. We therefore believe this approach can provide higher and more consistent levels of performance than are available today.
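As a rough illustration of what “abstract equations manipulating integer sets” can look like, the sketch below computes the iterations a processor executes and the data it must receive for a one-dimensional stencil. Everything in it is an assumption made for illustration, not the dHPF implementation: the statement `A(i) = B(i-1) + B(i+1)`, the BLOCK distribution of both arrays, the owner-computes partitioning rule, and the helper names `block_owned`, `image`, `comm_sets` and `send_set` are all hypothetical. A compiler such as dHPF manipulates these sets symbolically; this sketch simply enumerates small concrete sets with Python `frozenset`s so that each equation (for example, NonLocal = Accessed − Owned) maps onto one set operation.

```python
"""Communication sets for a 1-D stencil, written as integer-set equations.

Illustrative sketch only: sets are enumerated explicitly for small sizes,
whereas a compiler such as dHPF represents them symbolically.
"""


def block_owned(p: int, n: int, nprocs: int) -> frozenset[int]:
    """Indices 1..n owned by processor p (0-based) under a BLOCK distribution."""
    block = -(-n // nprocs)              # ceiling(n / nprocs)
    lo = p * block + 1
    hi = min(n, (p + 1) * block)
    return frozenset(range(lo, hi + 1))  # empty if lo > hi


def image(iters: frozenset[int], offsets: tuple[int, ...]) -> frozenset[int]:
    """Elements referenced by the iterations: { i + c : i in iters, c in offsets }."""
    return frozenset(i + c for i in iters for c in offsets)


def comm_sets(p: int, n: int, nprocs: int) -> tuple[frozenset[int], frozenset[int]]:
    """Iterations executed by p and the B elements p must receive, for the
    hypothetical loop  DO i = 2, n-1:  A(i) = B(i-1) + B(i+1)."""
    loop = frozenset(range(2, n))          # iteration space 2..n-1
    owned = block_owned(p, n, nprocs)      # assumed identical layout for A and B
    my_iters = loop & owned                # owner-computes: p owns A(i)
    accessed = image(my_iters, (-1, +1))   # references to B(i-1) and B(i+1)
    recv = accessed - owned                # NonLocal = Accessed - Owned
    return my_iters, recv


def send_set(p: int, q: int, n: int, nprocs: int) -> frozenset[int]:
    """Elements of B that p owns and q needs: Recv(q) intersected with Owned(p)."""
    _, recv_q = comm_sets(q, n, nprocs)
    return recv_q & block_owned(p, n, nprocs)


if __name__ == "__main__":
    for p in range(4):
        iters, recv = comm_sets(p, 16, 4)
        print(p, sorted(iters), sorted(recv))
```

For `n = 16` on 4 processors, processor 1 owns elements 5..8, executes iterations 5..8, and must receive `B(4)` and `B(9)`; `send_set(0, 1, 16, 4)` yields `{4}`, the element processor 0 sends to processor 1. The point of writing the analysis this way is that changing the partitioning rule or the subscripts only changes the set expressions, not the structure of the code.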



Author information

Authors and Affiliations

Computer Science Department, University of Illinois at Urbana-Champaign, 1304 West Springfield Avenue, Urbana, IL, 61801, USA
Vikram Adve
Department of Computer Science and Center for Research on Parallel Computation, Rice University, Houston, Texas, USA
John Mellor-Crummey


Editor information

Editors and Affiliations

College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA, 30332, USA
Santosh Pande
Department of ECECS, University of Cincinnati, P.O. Box 210030, Cincinnati, OH, 45221-0030, USA
Dharma P. Agrawal


About this chapter

Cite this chapter

Adve, V., Mellor-Crummey, J. (2001). Advanced Code Generation for High Performance Fortran. In: Pande, S., Agrawal, D.P. (eds) Compiler Optimizations for Scalable Parallel Systems. Lecture Notes in Computer Science, vol 1808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45403-9_16


DOI: https://doi.org/10.1007/3-540-45403-9_16
Published: 18 May 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41945-7
Online ISBN: 978-3-540-45403-8
eBook Packages: Springer Book Archive
