DOI: 10.5555/509058.509069

High performance Fortran compilation techniques for parallelizing scientific codes

Published: 07 November 1998

    Abstract

    With current compilers for High Performance Fortran (HPF), substantial restructuring and hand-optimization may be required to obtain acceptable performance from an HPF port of an existing Fortran application. A key goal of the Rice dHPF compiler project is to develop optimization techniques that can provide consistently high performance for a broad spectrum of scientific applications with minimal restructuring of existing Fortran 77 or Fortran 90 applications. This paper presents four new optimization techniques we developed to support efficient parallelization of codes with minimal restructuring. These optimizations include computation partition selection for loop nests that use privatizable arrays, along with partial replication of boundary computations to reduce communication overhead; communication-sensitive loop distribution to eliminate inner-loop communications; interprocedural selection of computation partitions; and data availability analysis to eliminate redundant communications. We studied the effectiveness of the dHPF compiler, which incorporates these optimizations, in parallelizing serial versions of the NAS SP and BT application benchmarks. We present experimental results comparing the performance of hand-written MPI code for the benchmarks against code generated from HPF using the dHPF compiler and the Portland Group's pghpf compiler. Using the compilation techniques described in this paper we achieve performance within 15% of hand-written MPI code on 25 processors for BT and within 33% for SP. Furthermore, these results are obtained with HPF versions of the benchmarks that were created with minimal restructuring of the serial code (modifying only approximately 5% of the code).
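
    Two of the ideas named in the abstract, computation partitioning and data availability analysis, can be illustrated with a small sketch. The code below is not the dHPF implementation; it is a toy model under stated assumptions: a one-dimensional array with an HPF-style BLOCK distribution, iterations assigned by the owner-computes rule, and successive loops that read an array `b` without writing it. The function names (`block_bounds`, `nonlocal_reads`, `plan_communication`) are hypothetical and chosen only for this illustration.

```python
def block_bounds(n, p, rank):
    """Half-open range [lo, hi) of 0-based indices owned by `rank`
    when n elements are BLOCK-distributed over p processors."""
    size = -(-n // p)          # ceil(n / p): the BLOCK size
    lo = min(rank * size, n)
    hi = min(lo + size, n)
    return lo, hi


def nonlocal_reads(n, p, rank, offsets):
    """Indices of b that `rank` reads but does not own for a loop of the form
    a(i) = f(b(i + d) for d in offsets), with iteration i executed by the
    owner of a(i) (owner-computes; an assumption of this sketch)."""
    lo, hi = block_bounds(n, p, rank)
    needed = set()
    for i in range(lo, hi):
        for d in offsets:
            j = i + d
            if 0 <= j < n and not (lo <= j < hi):
                needed.add(j)
    return needed


def plan_communication(n, p, rank, loops):
    """For a sequence of loops (each given as a tuple of read offsets into b),
    return the set of elements to fetch before each loop. Elements fetched
    for an earlier loop are treated as still available, because this sketch
    assumes b is not modified between loops."""
    available = set()
    plan = []
    for offsets in loops:
        need = nonlocal_reads(n, p, rank, offsets)
        plan.append(need - available)   # skip data that is already here
        available |= need
    return plan


if __name__ == "__main__":
    # 16 elements over 4 processors; processor 1 owns indices 4..7.
    print(block_bounds(16, 4, 1))
    # The second loop re-reads b(3) and b(8), so only b(9) is newly fetched.
    print(plan_communication(16, 4, 1, [(-1, 1), (-1, 1, 2)]))
```

    In this model the second loop needs three nonlocal elements but fetches only one, since the other two were already received for the first loop. A real compiler must additionally prove that the data has not been overwritten in between, which this sketch simply assumes.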

    Published In

    SC '98: Proceedings of the 1998 ACM/IEEE conference on Supercomputing
    November 1998
    894 pages
    ISBN: 089791984X

    Publisher

    IEEE Computer Society

    United States

    Author Tags

    1. HPF
    2. NAS benchmarks
    3. parallelizing compiler

    Qualifiers

    • Article

    Conference

    SC '98
    Acceptance Rates

    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
