Compiling stencils in high performance Fortran

Published: 15 November 1997

Abstract

    For many Fortran90 and HPF programs that perform dense matrix computations, the main computational portion of the program belongs to a class of kernels known as stencils. Stencil computations are commonly used in solving partial differential equations, image processing, and geometric modeling. Handling such stencils efficiently is critical for achieving high performance on distributed-memory machines. Compiling stencils into efficient code is considered so important that some companies have built special-purpose compilers for them, and others have added stencil recognizers to existing compilers.

    In this paper we present a general compilation strategy for stencils written using Fortran90 array constructs. Our strategy can optimize single-statement or multi-statement stencils and applies equally well to stencils specified with shift intrinsics or with array syntax. The strategy eliminates the need for pattern-recognition algorithms by orchestrating a set of optimizations that address the overhead of both intraprocessor and interprocessor data movement that results from the translation of Fortran90 array constructs. Our experimental results show that code produced by this strategy beats or matches the best code produced by the special-purpose compilers or pattern-recognition schemes that are known to us. In addition, our strategy produces highly optimized code in situations where the others fail, yielding performance improvements of several orders of magnitude, and thus provides a stencil compilation strategy that is more robust than its predecessors.
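To make the two stencil notations mentioned in the abstract concrete, here is a minimal sketch of a 5-point stencil written both ways. It is modeled in Python rather than Fortran so that it runs without a Fortran compiler, and it is not the paper's algorithm. The helper `cshift` is a hypothetical stand-in for Fortran 90's CSHIFT intrinsic, and the function names, grid size, and coefficients are all assumptions chosen for illustration. Copying the boundary values through unchanged in the array-section form is also a modeling choice.

```python
# Model of a 5-point stencil on a small grid, written two ways.
# cshift() is a hypothetical helper that mimics Fortran 90's CSHIFT
# (circular shift) along one dimension of a 2-D list-of-lists.

def cshift(a, shift, dim):
    """Return a new grid with result[i][j] = a[i+shift][j] (dim=1)
    or a[i][j+shift] (dim=2); indices wrap around circularly."""
    rows, cols = len(a), len(a[0])
    if dim == 1:
        return [[a[(i + shift) % rows][j] for j in range(cols)]
                for i in range(rows)]
    return [[a[i][(j + shift) % cols] for j in range(cols)]
            for i in range(rows)]


def stencil_shift(a, c_center, c_neighbor):
    """Shift-intrinsic form: builds four whole-grid temporaries first."""
    north = cshift(a, -1, 1)
    south = cshift(a, 1, 1)
    west = cshift(a, -1, 2)
    east = cshift(a, 1, 2)
    rows, cols = len(a), len(a[0])
    return [[c_center * a[i][j]
             + c_neighbor * (north[i][j] + south[i][j]
                             + west[i][j] + east[i][j])
             for j in range(cols)] for i in range(rows)]


def stencil_interior(a, c_center, c_neighbor):
    """Array-section form: one loop nest over interior points only,
    reading neighbors in place with no shifted temporaries.
    Boundary points are copied through unchanged (an assumption)."""
    rows, cols = len(a), len(a[0])
    b = [row[:] for row in a]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            b[i][j] = (c_center * a[i][j]
                       + c_neighbor * (a[i - 1][j] + a[i + 1][j]
                                       + a[i][j - 1] + a[i][j + 1]))
    return b


def interior(g):
    """Strip the outermost ring of points from a grid."""
    return [row[1:-1] for row in g[1:-1]]


grid = [[float(i * 5 + j) for j in range(5)] for i in range(5)]
by_shift = stencil_shift(grid, 0.5, 0.125)
by_section = stencil_interior(grid, 0.5, 0.125)
print(interior(by_shift) == interior(by_section))
```

The two forms agree on interior points and differ only at the boundary, where the circular shift wraps around. The shift form also materializes four full-grid copies before combining them. A naive translation of shift intrinsics incurs this kind of data-movement overhead, which the abstract says the strategy addresses.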

      Published In

      SC '97: Proceedings of the 1997 ACM/IEEE conference on Supercomputing
      November 1997
      921 pages
      ISBN: 0897919858
      DOI: 10.1145/509593

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. communication unioning
      2. high performance Fortran
      3. shift optimization
      4. statement partitioning
      5. stencil compilation

      Conference

      SC '97

      Acceptance Rates

      Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

      Cited By

      • (2023) Fragmentation-induced localization and boundary charges in dimensions two and above. SciPost Physics 14:6. DOI: 10.21468/SciPostPhys.14.6.140. Online publication date: 1-Jun-2023
      • (2023) Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra. IEEE Transactions on Parallel and Distributed Systems 34:12 (3147-3161). DOI: 10.1109/TPDS.2023.3322029. Online publication date: Dec-2023
      • (2023) Spatial dependency analysis to extract information from side-channel mixtures: extended version. Journal of Cryptographic Engineering 13:4 (409-425). DOI: 10.1007/s13389-022-00307-9. Online publication date: 3-Jan-2023
      • (2021) Spatial Dependency Analysis to Extract Information from Side-Channel Mixtures. Proceedings of the 5th Workshop on Attacks and Solutions in Hardware Security (73-84). DOI: 10.1145/3474376.3487280. Online publication date: 19-Nov-2021
      • (2021) Revisiting split tiling for stencil computations in polyhedral compilation. The Journal of Supercomputing. DOI: 10.1007/s11227-021-03835-z. Online publication date: 27-May-2021
      • (2020) Efficient Acceleration of Stencil Applications through In-Memory Computing. Micromachines 11:6 (622). DOI: 10.3390/mi11060622. Online publication date: 26-Jun-2020
      • (2019) Low-Overhead In Situ Visualization Using Halo Replay. 2019 IEEE 9th Symposium on Large Data Analysis and Visualization (LDAV) (16-26). DOI: 10.1109/LDAV48142.2019.8944265. Online publication date: Oct-2019
      • (2018) Data Partitioning Strategies for Stencil Computations on NUMA Systems. Euro-Par 2017: Parallel Processing Workshops (597-609). DOI: 10.1007/978-3-319-75178-8_48. Online publication date: 8-Feb-2018
      • (2017) QUARC. Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC (1-11). DOI: 10.1145/3148173.3148188. Online publication date: 12-Nov-2017
      • (2017) OpenCL-Based FPGA-Platform for Stencil Computation and Its Optimization Methodology. IEEE Transactions on Parallel and Distributed Systems 28:5 (1390-1402). DOI: 10.1109/TPDS.2016.2614981. Online publication date: 1-May-2017
