
HPVM: heterogeneous parallel virtual machine

Published: 10 February 2018

Abstract

We propose a parallel program representation for heterogeneous systems, designed to enable performance portability across a wide range of popular parallel hardware, including GPUs, vector instruction sets, multicore CPUs and potentially FPGAs. Our representation, which we call HPVM, is a hierarchical dataflow graph with shared memory and vector instructions. HPVM supports three important capabilities for programming heterogeneous systems: a compiler intermediate representation (IR), a virtual instruction set (ISA), and a basis for runtime scheduling; previous systems focus on only one of these capabilities. As a compiler IR, HPVM aims to enable effective code generation and optimization for heterogeneous systems. As a virtual ISA, it can be used to ship executable programs, in order to achieve both functional portability and performance portability across such systems. At runtime, HPVM enables flexible scheduling policies, both through the graph structure and the ability to compile individual nodes in a program to any of the target devices on a system. We have implemented a prototype HPVM system, defining the HPVM IR as an extension of the LLVM compiler IR, compiler optimizations that operate directly on HPVM graphs, and code generators that translate the virtual ISA to NVIDIA GPUs, Intel's AVX vector units, and to multicore X86-64 processors. Experimental results show that HPVM optimizations achieve significant performance improvements, HPVM translators achieve performance competitive with manually developed OpenCL code for both GPUs and vector hardware, and that runtime scheduling policies can make use of both program and runtime information to exploit the flexible compilation capabilities. Overall, we conclude that the HPVM representation is a promising basis for achieving performance portability and for implementing parallelizing compilers for heterogeneous parallel systems.
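As a rough illustration of the two ideas the abstract combines, a hierarchical dataflow graph and the freedom to map individual nodes to different devices, the following minimal Python sketch may help. It is not the HPVM IR or its API: the `Node` class, the `hint` field, the target names, and `simple_policy` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical dataflow node: a leaf has no children, an internal node wraps a child graph."""
    name: str
    children: List["Node"] = field(default_factory=list)
    # Dataflow edges between children, as (producer name, consumer name).
    edges: List[Tuple[str, str]] = field(default_factory=list)
    # Illustrative hint only; not an HPVM attribute.
    hint: str = "sequential"

    def is_leaf(self) -> bool:
        return not self.children


def topo_order(node: Node) -> List[Node]:
    """Order the children of one node so producers come before consumers (Kahn's algorithm)."""
    by_name: Dict[str, Node] = {c.name: c for c in node.children}
    indegree = {name: 0 for name in by_name}
    for _, dst in node.edges:
        indegree[dst] += 1
    ready = [c for c in node.children if indegree[c.name] == 0]
    order: List[Node] = []
    while ready:
        current = ready.pop(0)
        order.append(current)
        for src, dst in node.edges:
            if src == current.name:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(by_name[dst])
    if len(order) != len(node.children):
        raise ValueError(f"cycle detected in graph '{node.name}'")
    return order


def leaf_schedule(node: Node, pick_target: Callable[[Node], str]) -> List[Tuple[str, str]]:
    """Flatten the hierarchy into (leaf name, target) pairs in a dependency-respecting order."""
    if node.is_leaf():
        return [(node.name, pick_target(node))]
    schedule: List[Tuple[str, str]] = []
    for child in topo_order(node):
        schedule.extend(leaf_schedule(child, pick_target))
    return schedule


def simple_policy(node: Node) -> str:
    """Toy per-node mapping policy (an assumption): data-parallel leaves go to the GPU."""
    return "gpu" if node.hint == "data_parallel" else "cpu"


def build_example() -> Node:
    """Two-level graph: load -> kernel(scale -> reduce) -> store."""
    scale = Node("scale", hint="data_parallel")
    reduce_ = Node("reduce", hint="data_parallel")
    kernel = Node("kernel", children=[scale, reduce_], edges=[("scale", "reduce")])
    return Node(
        "root",
        children=[Node("load"), kernel, Node("store")],
        edges=[("load", "kernel"), ("kernel", "store")],
    )


if __name__ == "__main__":
    for leaf, target in leaf_schedule(build_example(), simple_policy):
        print(f"{leaf} -> {target}")
```

Because the policy is an ordinary function applied per leaf, swapping in a different mapping (for example, sending every leaf to the CPU) changes where nodes run without touching the graph itself. That separation is the property the abstract attributes to HPVM's runtime scheduling, shown here only in toy form.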



Published In

ACM SIGPLAN Notices, Volume 53, Issue 1
PPoPP '18
January 2018
426 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3200691

PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2018
442 pages
ISBN: 9781450349826
DOI: 10.1145/3178487
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPU
  2. compiler
  3. heterogeneous systems
  4. parallel IR
  5. vector SIMD
  6. virtual ISA

Qualifiers

  • Research-article
