DOI: 10.1145/377792.377807
Article

Eliminating redundancies in sum-of-product array computations

Published: 17 June 2001
  • Abstract

    Array programming languages such as Fortran 90, High Performance Fortran and ZPL are well suited to scientific computing because they free the scientist from the responsibility of managing burdensome low-level details that complicate programming in languages like C and Fortran 77. However, these burdensome details are critical to performance, thus necessitating aggressive compilation techniques for their optimization. In this paper, we present a new compiler optimization called Array Subexpression Elimination (ASE) that lets a programmer take advantage of the expressibility afforded by array languages and achieve enviable portability and performance. We design a set of micro-benchmarks that model an important class of computations known as stencils and we report on our implementation of this optimization in the context of this micro-benchmark suite. Our results include a 125% improvement on one of these benchmarks and a 50% average speedup across the suite. We also show a 32% speedup on the ZPL port of the NAS MG Parallel Benchmark and a 29% speedup over the hand-optimized Fortran version. Further, compilation time is only negligibly affected.
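
To make the kind of redundancy the abstract refers to concrete, here is a minimal Python sketch of a 3x3 sum-of-products stencil evaluated two ways. It is an illustration only and not the paper's ASE algorithm: the function names, the choice of stencil, and the weights `w_side` and `w_mid` are all assumptions made for this example. The point it shows is that the vertical three-point sum under each output is recomputed by three neighbouring outputs, so computing it once and reusing it at shifted positions removes work.

```python
def stencil_naive(a, w_side, w_mid):
    """Evaluate the 3x3 stencil directly at every interior point.

    Hypothetical example stencil: the left and right columns of the
    3x3 window share the weight w_side, the middle column uses w_mid.
    """
    n, m = len(a), len(a[0])
    out = [[0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = (
                w_side * (a[i - 1][j - 1] + a[i][j - 1] + a[i + 1][j - 1])
                + w_mid * (a[i - 1][j] + a[i][j] + a[i + 1][j])
                + w_side * (a[i - 1][j + 1] + a[i][j + 1] + a[i + 1][j + 1])
            )
    return out


def stencil_shared(a, w_side, w_mid):
    """Same stencil, with the repeated vertical sums computed once."""
    n, m = len(a), len(a[0])
    # col[i][j] is the vertical three-point sum centred on (i, j).
    # Each entry is reused by up to three horizontally adjacent outputs.
    col = [[0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(m):
            col[i][j] = a[i - 1][j] + a[i][j] + a[i + 1][j]
    out = [[0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # The two w_side terms are also factored into one multiply.
            out[i][j] = w_side * (col[i][j - 1] + col[i][j + 1]) + w_mid * col[i][j]
    return out
```

In this sketch the direct version performs 8 additions and 3 multiplications per interior point, while the shared version performs about 4 additions and 2 multiplications, at the cost of one temporary array. With floating-point inputs the two versions can differ in the last bits because the additions are reassociated, so an exact comparison is only meaningful for integer data.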

    Published In

    ICS '01: Proceedings of the 15th International Conference on Supercomputing
    June 2001
    510 pages
    ISBN: 158113410X
    DOI: 10.1145/377792
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    ICS '01 Paper Acceptance Rate 45 of 133 submissions, 34%;
    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Cited By

    • (2024) EasyView: Bringing Performance Profiles into Integrated Development Environments. 2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 386-398. DOI: 10.1109/CGO57630.2024.10444840. Online publication date: 2-Mar-2024
    • (2023) Performance Portability Evaluation of Blocked Stencil Computations on GPUs. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1007-1018. DOI: 10.1145/3624062.3624177. Online publication date: 12-Nov-2023
    • (2023) TrivialSpy: Identifying Software Triviality via Fine-grained and Dataflow-based Value Profiling. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-13. DOI: 10.1145/3581784.3607052. Online publication date: 12-Nov-2023
    • (2021) Reducing redundancy in data organization and arithmetic calculation for stencil computations. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.1145/3458817.3476154. Online publication date: 14-Nov-2021
    • (2020) ZeroSpy. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14. DOI: 10.5555/3433701.3433739. Online publication date: 9-Nov-2020
    • (2020) What every scientific programmer should know about compiler optimizations? Proceedings of the 34th ACM International Conference on Supercomputing, pp. 1-12. DOI: 10.1145/3392717.3392754. Online publication date: 29-Jun-2020
    • (2020) Architecture and Performance of Devito, a System for Automated Stencil Computation. ACM Transactions on Mathematical Software 46:1, pp. 1-28. DOI: 10.1145/3374916. Online publication date: 26-Apr-2020
    • (2020) ZeroSpy: Exploring Software Inefficiency with Redundant Zeros. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14. DOI: 10.1109/SC41405.2020.00033. Online publication date: Nov-2020
    • (2020) Exploiting Computation Reuse for Stencil Accelerators. 2020 57th ACM/IEEE Design Automation Conference (DAC), pp. 1-6. DOI: 10.1109/DAC18072.2020.9218680. Online publication date: Jul-2020
    • (2019) Tessellating Star Stencils. Proceedings of the 48th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3337821.3337835. Online publication date: 5-Aug-2019
