Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors

Published: 04 June 2011

    Abstract

    MATLAB is an array language that was initially popular for rapid prototyping but is now increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control-flow-dominated scalar regions that affect the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput-oriented accelerators such as graphics processing units (GPUs). Thus, an approach that maps the control-flow-dominated regions to the CPU and the data-parallel regions to the GPU can significantly improve program performance.
    In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data-parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data-parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or the GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. To ensure the required data movement for dependencies across basic blocks, we propose a data flow analysis and an edge-splitting strategy. Thus, our compiler automatically handles composition of kernels, mapping of kernels to the CPU and GPU, scheduling, and insertion of the required data transfers. The proposed compiler was implemented, and experimental evaluation on a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data-parallel benchmarks over native execution of MATLAB.
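
    The headline result above is reported as a geometric mean speedup. As a minimal sketch of how such a figure is typically computed, the following Python snippet takes per-benchmark run times and returns the geometric mean of the speedups. The function name `geometric_mean_speedup` and the sample timings are hypothetical and chosen only for illustration; they are not taken from the paper or its benchmarks.

```python
import math


def geometric_mean_speedup(baseline_times, optimized_times):
    """Geometric mean of per-benchmark speedups (baseline / optimized)."""
    if not baseline_times or len(baseline_times) != len(optimized_times):
        raise ValueError("need two non-empty lists of equal length")
    # Average the logarithms instead of multiplying the ratios directly,
    # which avoids overflow or underflow when there are many benchmarks.
    log_sum = sum(
        math.log(base / opt)
        for base, opt in zip(baseline_times, optimized_times)
    )
    return math.exp(log_sum / len(baseline_times))


# Hypothetical run times in seconds (illustrative only, NOT paper data).
baseline = [8.0, 2.0, 4.0]   # e.g. native execution
optimized = [1.0, 1.0, 2.0]  # e.g. compiled CPU+GPU execution

print(geometric_mean_speedup(baseline, optimized))
```

    With these made-up timings the individual speedups are 8, 2, and 2, so the geometric mean is the cube root of 32 (about 3.17), lower than the arithmetic mean of 4. This is why the geometric mean is commonly preferred for summarizing ratios: a single large speedup pulls it up less than it would an arithmetic mean.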

        Published In

        ACM SIGPLAN Notices, Volume 46, Issue 6
        PLDI '11
        June 2011
        652 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/1993316

        PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
        June 2011
        668 pages
        ISBN: 9781450306638
        DOI: 10.1145/1993498
        General Chair: Mary Hall
        Program Chair: David Padua

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 04 June 2011
        Published in SIGPLAN Volume 46, Issue 6

        Author Tags

        1. compiling for gpus
        2. data parallelism
        3. gpgpu
        4. matlab

        Qualifiers

        • Research-article

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (last 12 months): 10
        • Downloads (last 6 weeks): 3
        Reflects downloads up to 26 Jul 2024

        Citations

        Cited By

        • (2020) Compilation of MATLAB computations to CPU/GPU via C/OpenCL generation. Concurrency and Computation: Practice and Experience 32:22. DOI: 10.1002/cpe.5854. Online publication date: Jul-2020
        • (2019) Design, implementation, and application of GPU-based Java bytecode interpreters. Proceedings of the ACM on Programming Languages 3:OOPSLA (1-28). DOI: 10.1145/3360603. Online publication date: 10-Oct-2019
        • (2017) Compiler Techniques for Efficient MATLAB to OpenCL Code Generation. Proceedings of the 5th International Workshop on OpenCL (1-2). DOI: 10.1145/3078155.3078186. Online publication date: 16-May-2017
        • (2017) Dataflow in MATLAB: Algorithm Acceleration Through Concurrency. IEEE Access 5 (2308-2318). DOI: 10.1109/ACCESS.2017.2672200. Online publication date: 2017
        • (2017) A MATLAB subset to C compiler targeting embedded systems. Software—Practice & Experience 47:2 (249-272). DOI: 10.1002/spe.2408. Online publication date: 1-Feb-2017
        • (2016) SSA-based MATLAB-to-C compilation and optimization. Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (55-62). DOI: 10.1145/2935323.2935330. Online publication date: 2-Jun-2016
        • (2016) Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming. Online publication date: 2-Jun-2016
        • (2015) A Survey of CPU-GPU Heterogeneous Computing Techniques. ACM Computing Surveys 47:4 (1-35). DOI: 10.1145/2788396. Online publication date: 21-Jul-2015
        • (2015) Techniques for efficient MATLAB-to-C compilation. Proceedings of the 2nd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (7-12). DOI: 10.1145/2774959.2774961. Online publication date: 13-Jun-2015
        • (2015) C and OpenCL generation from MATLAB. Proceedings of the 30th Annual ACM Symposium on Applied Computing (1315-1320). DOI: 10.1145/2695664.2695911. Online publication date: 13-Apr-2015
