A domain-specific approach to heterogeneous parallelism

Published: 12 February 2011

Abstract

Exploiting heterogeneous parallel hardware currently requires mapping application code to multiple disparate programming models. Unfortunately, the general-purpose programming models available today can yield high performance but are too low-level to be accessible to the average programmer. We propose leveraging domain-specific languages (DSLs) to map high-level application code to heterogeneous devices. To demonstrate the potential of this approach, we present OptiML, a DSL for machine learning. OptiML programs are implicitly parallel and can achieve high performance on heterogeneous hardware with no modification to the source code. For such a DSL-based approach to be tractable at large scales, DSL authors need better tools that simplify language creation and parallelization. To address this concern, we introduce Delite, a system designed specifically for DSLs that is both a framework for creating an implicitly parallel DSL and a dynamic runtime that provides automated targeting of heterogeneous parallel hardware. We show that OptiML running on Delite achieves single-threaded, parallel, and GPU performance superior to explicitly parallelized MATLAB code in nearly all cases.
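
To make the idea of an implicitly parallel DSL more concrete, the following is a minimal Python sketch. It is not OptiML or Delite code and does not reflect their APIs; the names `Runtime`, `Vec`, and `sum_of_squares` are hypothetical and exist only for illustration. The sketch assumes a toy vector type whose operations are handed to a runtime object, and that runtime decides whether to run them sequentially or on a thread pool. The application function stays the same in both cases.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


class Runtime:
    """Hypothetical runtime that chooses how each DSL operation executes."""

    def __init__(self, workers: int = 1) -> None:
        self.workers = workers

    def map(self, fn: Callable[[float], float], data: List[float]) -> List[float]:
        # Sequential path for a single worker or trivially small inputs.
        if self.workers <= 1 or len(data) < 2:
            return [fn(x) for x in data]
        # Parallel path: Executor.map returns results in input order.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(fn, data))


class Vec:
    """Toy DSL vector: exposes high-level operations, hides scheduling."""

    def __init__(self, values: Iterable[float], runtime: Runtime) -> None:
        self.values = list(values)
        self.runtime = runtime

    def map(self, fn: Callable[[float], float]) -> "Vec":
        return Vec(self.runtime.map(fn, self.values), self.runtime)

    def sum(self) -> float:
        return sum(self.values)


def sum_of_squares(values: Iterable[float], runtime: Runtime) -> float:
    # "Application code": no threads or device-specific calls appear here.
    return Vec(values, runtime).map(lambda x: x * x).sum()


if __name__ == "__main__":
    print(sum_of_squares(range(5), Runtime(workers=1)))
    print(sum_of_squares(range(5), Runtime(workers=4)))
```

Both calls run the same application function and differ only in how the runtime is configured. Threads in CPython generally do not speed up CPU-bound work, so the sketch illustrates only the separation between application code and execution strategy, not a performance result.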

Information & Contributors

Information

Published In

ACM SIGPLAN Notices, Volume 46, Issue 8
PPoPP '11
August 2011
300 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2038037

PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
February 2011
326 pages
ISBN:9781450301190
DOI:10.1145/1941553
General Chair: Calin Cascaval
Program Chair: Pen-Chung Yew

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. domain-specific languages
  2. dynamic optimizations
  3. parallel programming
  4. runtimes

Qualifiers

  • Research-article

Cited By

  • (2024) Model-Driven Engineering for High-Performance Parallel Discrete Event Simulations on Heterogeneous Architectures. Proceedings of the Winter Simulation Conference, pages 2202-2213. DOI: 10.5555/3712729.3712913. Online publication date: 15-Dec-2024
  • (2024) Model-Driven Engineering for High-Performance Parallel Discrete Event Simulations on Heterogeneous Architectures. 2024 Winter Simulation Conference (WSC), pages 2202-2213. DOI: 10.1109/WSC63780.2024.10838978. Online publication date: 15-Dec-2024
  • (2023) Codon: A Compiler for High-Performance Pythonic Applications and DSLs. Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, pages 191-202. DOI: 10.1145/3578360.3580275. Online publication date: 17-Feb-2023
  • (2022) A Multi-target, Multi-paradigm DSL Compiler for Algorithmic Graph Processing. Proceedings of the 15th ACM SIGPLAN International Conference on Software Language Engineering, pages 2-15. DOI: 10.1145/3567512.3567513. Online publication date: 29-Nov-2022
  • (2022) GraphIt to CUDA Compiler in 2021 LOC: A Case for High-Performance DSL Implementation via Staging with BuilDSL. 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pages 53-65. DOI: 10.1109/CGO53902.2022.9741280. Online publication date: 2-Apr-2022
  • (2021) Towards a polyglot framework for factorized ML. Proceedings of the VLDB Endowment, 14(12), pages 2918-2931. DOI: 10.14778/3476311.3476372. Online publication date: 28-Oct-2021
  • (2021) Database technology for the masses. Proceedings of the VLDB Endowment, 14(11), pages 2483-2490. DOI: 10.14778/3476249.3476296. Online publication date: 27-Oct-2021
  • (2021) Towards a domain-extensible compiler. Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization, pages 27-38. DOI: 10.1109/CGO51591.2021.9370337. Online publication date: 27-Feb-2021
  • (2020) A Survey on Parallel Architectures and Programming Models. 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO), pages 999-1005. DOI: 10.23919/MIPRO48935.2020.9245341. Online publication date: 28-Sep-2020
  • (2020) Domain-Specific Language Techniques for Visual Computing: A Comprehensive Study. Archives of Computational Methods in Engineering, 28(4), pages 3113-3134. DOI: 10.1007/s11831-020-09492-4. Online publication date: 27-Oct-2020
