research-article

Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors

Authors:

Jayvant Anantpur,

R. Govindarajan

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 152 - 163

https://doi.org/10.1145/1993498.1993517

Published: 04 June 2011

Abstract

MATLAB is an array language that was initially popular for rapid prototyping but is now increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control-flow-dominated scalar regions that affect the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput-oriented accelerators such as graphics processing units (GPUs). Thus, an approach that maps the control-flow-dominated regions to the CPU and the data-parallel regions to the GPU can significantly improve program performance.

In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input to identify data-parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data-parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or the GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. To ensure the required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus, our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling, and insertion of required data transfers. The proposed compiler was implemented, and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data-parallel benchmarks over native execution of MATLAB.
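The pipeline described above (compose statements into kernels, map kernels to a device, insert data transfers) can be illustrated with a small Python sketch. This is not MEGHA's algorithm: the abstract formulates kernel composition as a constrained graph clustering problem, whereas the sketch substitutes a much simpler greedy fusion of adjacent statements within one basic block. The names `Stmt`, `Kernel`, `cluster`, `map_and_transfer`, the `max_vars` limit, and the rule "data-parallel kernels go to the GPU" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    name: str
    reads: set
    writes: set
    data_parallel: bool  # True for array-wide operations, False for scalar/control code


@dataclass
class Kernel:
    stmts: list = field(default_factory=list)
    data_parallel: bool = True

    @property
    def variables(self):
        """All variables touched by the statements in this kernel."""
        used = set()
        for s in self.stmts:
            used |= s.reads | s.writes
        return used


def cluster(stmts, max_vars=4):
    """Greedily fuse adjacent statements of the same kind into kernels.

    max_vars is a hypothetical resource constraint (a stand-in for limits
    such as registers or memory); it is not taken from the paper.
    """
    kernels = []
    for s in stmts:
        last = kernels[-1] if kernels else None
        if (last is not None
                and last.data_parallel == s.data_parallel
                and len(last.variables | s.reads | s.writes) <= max_vars):
            last.stmts.append(s)
        else:
            kernels.append(Kernel([s], s.data_parallel))
    return kernels


def map_and_transfer(kernels, inputs):
    """Assign each kernel a device and list the variables it must receive.

    Assumed rule: data-parallel kernels run on the GPU, all others on the CPU.
    Program inputs are assumed to start in CPU memory.
    """
    valid = {v: {"CPU"} for v in inputs}  # devices holding an up-to-date copy
    plan = []
    for k in kernels:
        device = "GPU" if k.data_parallel else "CPU"
        needed, produced = set(), set()
        for s in k.stmts:
            needed |= s.reads - produced  # values produced inside the kernel need no copy
            produced |= s.writes
        transfers = sorted(v for v in needed if device not in valid[v])
        for v in transfers:
            valid[v].add(device)          # the copy makes the value valid on both sides
        for v in produced:
            valid[v] = {device}           # a write invalidates every other copy
        plan.append((device, [s.name for s in k.stmts], transfers))
    return plan


# Toy basic block: two array operations, one scalar update, one array operation.
program = [
    Stmt("s1", {"a", "b"}, {"c"}, True),    # c = a + b
    Stmt("s2", {"c", "a"}, {"d"}, True),    # d = c .* a
    Stmt("s3", {"d", "n"}, {"n"}, False),   # scalar update of n that depends on d
    Stmt("s4", {"d", "n"}, {"e"}, True),    # e = d * n
]
kernels = cluster(program)
plan = map_and_transfer(kernels, inputs={"a", "b", "n"})
for device, names, transfers in plan:
    print(device, names, "copy in:", transfers)
```

For this toy block, `s1` and `s2` are fused into one GPU kernel that receives `a` and `b`, `s3` runs on the CPU after `d` is copied back, and `s4` runs on the GPU after the updated `n` is copied over. The intermediate `c` never leaves the GPU, which is the kind of saving that kernel composition and transfer minimization aim for.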

Index Terms

Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language types
        Very high level languages

Information & Contributors

Information

Published In

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2011

668 pages

ISBN:9781450306638

DOI:10.1145/1993498

General Chair: Mary Hall, University of Utah
Program Chair: David Padua, University of Illinois at Urbana-Champaign

ACM SIGPLAN Notices Volume 46, Issue 6
PLDI '11
June 2011
652 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1993316

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PLDI '11

Sponsor:

SIGPLAN

PLDI '11: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 4 - 8, 2011

San Jose, California, USA

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 35
Total Downloads: 536
Downloads (Last 12 months): 10
Downloads (Last 6 weeks): 3

Reflects downloads up to 26 Jul 2024

Citations

Cited By

Stojanov A, Rompf T, Püschel M, Schaefer I, Reichenbach C, Storm T (2019). A stage-polymorphic IR for compiling MATLAB-style dynamic tensor expressions. Proceedings of the 18th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences. 10.1145/3357765.3359514 (34-47). Online publication date: 21-Oct-2019
https://dl.acm.org/doi/10.1145/3357765.3359514
Latifis I, Parashar K, Dimitroulakos G, Cappelle H, Lezos C, Masselos K, Catthoor F (2017). A MATLAB Vectorizing Compiler Targeting Application-Specific Instruction Set Processors. ACM Transactions on Design Automation of Electronic Systems 22:2. 10.1145/2996182 (1-28). Online publication date: 4-Jan-2017
https://dl.acm.org/doi/10.1145/2996182
Latifis I, Parashar K, Dimitroulakos G, Cappelle H, Lezos C, Masselos K, Catthoor F, Fanucci L, Teich J (2016). Matlab to C compilation targeting application specific instruction set processors. Proceedings of the 2016 Conference on Design, Automation & Test in Europe. 10.5555/2971808.2972146 (1453-1456). Online publication date: 14-Mar-2016
https://dl.acm.org/doi/10.5555/2971808.2972146
Spazier J, Christgau S, Schnor B, Elsman M, Grelck C, Klöckner A, Padua D (2016). Automatic generation of parallel C code for stencil applications written in MATLAB. Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming. 10.1145/2935323.2935329 (47-54). Online publication date: 2-Jun-2016
https://dl.acm.org/doi/10.1145/2935323.2935329
Skalicky S, Lopez S, Lukowiak M, Schmidt A (2015). A Parallelizing Matlab Compiler Framework and Run Time for Heterogeneous Systems. Proceedings of the 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conf on Embedded Software and Systems. 10.1109/HPCC-CSS-ICESS.2015.51 (232-237). Online publication date: 24-Aug-2015
https://dl.acm.org/doi/10.1109/HPCC-CSS-ICESS.2015.51
Kumar V, Hendren L (2014). MIX10. ACM SIGPLAN Notices 49:10. 10.1145/2714064.2660218 (617-636). Online publication date: 15-Oct-2014
https://dl.acm.org/doi/10.1145/2714064.2660218
Kumar V, Hendren L, Black A, Millstein T (2014). MIX10. Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications. 10.1145/2660193.2660218 (617-636). Online publication date: 15-Oct-2014
https://dl.acm.org/doi/10.1145/2660193.2660218
Garg R, Hendren L, Amaral J, Torrellas J (2014). Velociraptor. Proceedings of the 23rd international conference on Parallel architectures and compilation. 10.1145/2628071.2628097 (317-330). Online publication date: 24-Aug-2014
https://dl.acm.org/doi/10.1145/2628071.2628097
Lazarescu M, Cohen A, Guatto A, Lê N, Lavagno L, Pop A, Prieto M, Terechko A, Sutii A, Corporaal H, Stuijk S (2014). Energy-aware parallelization flow and toolset for C code. Proceedings of the 17th International Workshop on Software and Compilers for Embedded Systems. 10.1145/2609248.2609264 (79-88). Online publication date: 10-Jun-2014
https://dl.acm.org/doi/10.1145/2609248.2609264
Pandit P, Govindarajan R (2014). Fluidic Kernels. Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization. 10.1145/2581122.2544163 (273-283). Online publication date: 15-Feb-2014
https://dl.acm.org/doi/10.1145/2581122.2544163
