research-article

A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction

Published: 14 March 2010

Abstract

Programmers for GPGPU face a rapidly changing substrate of programming abstractions, execution models, and hardware implementations. Numerous demonstrations, each for a particular conjunction of application kernel, programming language, and GPU hardware instance, have established that it is possible to achieve significant improvements in price/performance and energy/performance over general-purpose processors. But each of these demonstrations is the result of significant dedicated programmer labor, which is likely to be duplicated for each new GPU hardware architecture to achieve performance portability.
This paper discusses the implementation, in the R-Stream compiler, of a source-to-source mapping pathway from a high-level, textbook-style algorithm expression method in ANSI C to multi-GPGPU accelerated computers. The compiler performs hierarchical decomposition and parallelization of the algorithm between and across host, multiple GPGPUs, and within-GPU. The semantic transformations are expressed within the polyhedral model, including optimization of integrated parallelization, locality, and contiguity tradeoffs. Hierarchical tiling is performed. Communication and synchronization operations at multiple levels are generated automatically. The resulting mapping is currently emitted in the CUDA programming language.
The GPU backend adds to the range of hardware and accelerator targets for R-Stream and indicates the potential for performance portability of single sources across multiple hardware targets.
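
As a rough illustration of what "textbook-style" ANSI C input and hierarchical tiling can look like, the sketch below writes a matrix multiply as a plain affine loop nest and then as a hand-tiled version with two tiling levels. It is not R-Stream output and makes no claim about the compiler's actual transformations. The names `matmul_reference`, `matmul_tiled`, and `tiled_matches_reference`, along with the sizes `N`, `TILE_OUTER`, and `TILE_INNER`, are assumptions made for this example only.

```c
#include <assert.h>

/* All sizes are hypothetical, chosen only for illustration. */
enum {
    N = 64,          /* matrices are N x N */
    TILE_OUTER = 32, /* coarse tile: work one GPU or thread block might own */
    TILE_INNER = 8   /* fine tile: work that might fit in on-chip memory */
};

/* Textbook-style input: C = A * B as a plain affine loop nest. */
static void matmul_reference(const float *A, const float *B, float *C)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            float acc = 0.0f;
            for (int k = 0; k < N; k++)
                acc += A[i * N + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}

/* The same computation after two levels of tiling, written by hand. */
static void matmul_tiled(const float *A, const float *B, float *C)
{
    for (int idx = 0; idx < N * N; idx++)
        C[idx] = 0.0f;

    /* Level 1: iterate over coarse tiles. */
    for (int io = 0; io < N; io += TILE_OUTER)
    for (int jo = 0; jo < N; jo += TILE_OUTER)
    for (int ko = 0; ko < N; ko += TILE_OUTER)
        /* Level 2: iterate over fine tiles inside one coarse tile. */
        for (int ii = io; ii < io + TILE_OUTER; ii += TILE_INNER)
        for (int ji = jo; ji < jo + TILE_OUTER; ji += TILE_INNER)
        for (int ki = ko; ki < ko + TILE_OUTER; ki += TILE_INNER)
            /* Point loops: the iterations inside one fine tile. */
            for (int i = ii; i < ii + TILE_INNER; i++)
            for (int j = ji; j < ji + TILE_INNER; j++)
            for (int k = ki; k < ki + TILE_INNER; k++)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
}

/* Returns 1 if the tiled version reproduces the reference result exactly. */
int tiled_matches_reference(void)
{
    static float A[N * N], B[N * N], C_ref[N * N], C_tiled[N * N];

    /* The tiled loops above assume the tile sizes divide evenly. */
    assert(N % TILE_OUTER == 0 && TILE_OUTER % TILE_INNER == 0);

    /* Small integer-valued inputs keep every float sum exact. */
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++) {
            A[r * N + c] = (float)((r + c) % 7);
            B[r * N + c] = (float)((2 * r + c) % 5);
        }

    matmul_reference(A, B, C_ref);
    matmul_tiled(A, B, C_tiled);

    for (int idx = 0; idx < N * N; idx++)
        if (C_ref[idx] != C_tiled[idx])
            return 0;
    return 1;
}
```

Calling `tiled_matches_reference()` from a `main` function checks that the reordered loops compute the same matrix. In a GPU mapping, the coarse tile loops are the natural candidates for distribution across devices or thread blocks and the fine tiles for staging in on-chip memory, but how tile sizes are chosen and how communication is generated is left entirely to the compiler described in the abstract.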

    Published In

    GPGPU-3: Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
    March 2010
    124 pages
    ISBN:9781605589350
    DOI:10.1145/1735688
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. CUDA
    2. GPGPU
    3. automatic translation
    4. compiler optimization
    5. parallelization
    6. polyhedral model

    Qualifiers

    • Research-article

    Conference

    GPGPU-3

    Acceptance Rates

    Overall Acceptance Rate 57 of 129 submissions, 44%

    Cited By

    • (2024) MIMD Programs Execution Support on SIMD Machines: A Holistic Survey. IEEE Access 12, 34354-34377. DOI: 10.1109/ACCESS.2024.3372990. Online publication date: 2024
    • (2023) Challenges in GPU-Accelerated Nonlinear Dynamic Analysis for Structural Systems. Journal of Structural Engineering 149:3. DOI: 10.1061/JSENDH.STENG-11311. Online publication date: Mar-2023
    • (2021) On the Impact of Affine Loop Transformations in Qubit Allocation. ACM Transactions on Quantum Computing 2:3, 1-40. DOI: 10.1145/3465409. Online publication date: 30-Sep-2021
    • (2021) Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation. Proceedings of the 35th ACM International Conference on Supercomputing, 13-26. DOI: 10.1145/3447818.3460369. Online publication date: 3-Jun-2021
    • (2021) Automatic Thread Block Size Selection Strategy in GPU Parallel Code Generation. Parallel Architectures, Algorithms and Programming, 390-404. DOI: 10.1007/978-981-16-0010-4_34. Online publication date: 7-Feb-2021
    • (2020) Automatic Mapping and Optimization to Kokkos with Polyhedral Compilation. 2020 IEEE High Performance Extreme Computing Conference (HPEC), 1-7. DOI: 10.1109/HPEC43674.2020.9286233. Online publication date: 22-Sep-2020
    • (2020) Parallel programming models for heterogeneous many-cores: a comprehensive survey. CCF Transactions on High Performance Computing 2:4, 382-400. DOI: 10.1007/s42514-020-00039-4. Online publication date: 31-Jul-2020
    • (2019) Performance evaluation of OpenMP's target construct on GPUs-exploring compiler optimisations. International Journal of High Performance Computing and Networking 13:1, 54-69. DOI: 10.5555/3302714.3302718. Online publication date: 1-Jan-2019
    • (2019) Model-driven transformations for multi- and many-core CPUs. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 469-484. DOI: 10.1145/3314221.3314653. Online publication date: 8-Jun-2019
    • (2019) Parallelization Of Object-oriented Machine Vision Algorithms For Embedded GPUs. 2019 IEEE 9th International Conference on Consumer Electronics (ICCE-Berlin), 392-395. DOI: 10.1109/ICCE-Berlin47944.2019.8966138. Online publication date: Sep-2019
