
ViperVM: a runtime system for parallel functional high-performance computing on heterogeneous architectures

Author: Sylvain Henry

FHPC '13: Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing

Pages 3 - 12

https://doi.org/10.1145/2502323.2502329

Published: 23 September 2013


Abstract

The current trend in high-performance computing is to use heterogeneous architectures (i.e., multi-core processors with accelerators such as GPUs or Xeon Phi) because they offer very good ratios of performance to energy consumption. Programming these architectures is notoriously hard, so their use is still largely restricted to parallel programming experts. The situation is improving with frameworks that use high-level programming models to generate efficient computation kernels for these new accelerator architectures. However, an orthogonal issue is managing memory and kernel scheduling efficiently, especially on architectures containing multiple accelerators. Task-graph-based runtime systems have been a first step toward automating these tasks efficiently. However, they introduce new challenges of their own, such as task granularity adaptation, that cannot be easily automated.

In this paper, we present a programming model and a preliminary implementation of a runtime system called ViperVM that takes advantage of parallel functional programming to extend task-graph-based runtime systems. The main idea is to replace dynamically created task graphs with pure functional programs that are evaluated in parallel by the runtime system. Programmers can associate kernels (written in OpenCL, CUDA, Fortran, etc.) with identifiers that can then be used as pure functions in programs. During parallel evaluation, the runtime system automatically schedules kernels on available accelerators when it has to reduce one of these identifiers. An extension of this mechanism consists of associating both a kernel and a functional expression with the same identifier and letting the runtime system decide whether to execute the kernel or to evaluate the expression. We show that this mechanism can be used to perform dynamic granularity adaptation.
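The mechanism described above can be illustrated with a small sketch. It is not ViperVM's actual API: the names `Runtime`, `Binding`, `bind`, `reduce`, and the size `threshold` policy are all hypothetical, the "kernels" are plain Python functions standing in for OpenCL or CUDA kernels, and evaluation is sequential rather than parallel. The sketch only shows how an identifier bound to both a kernel and a functional expression lets a runtime choose between running the kernel directly and evaluating the expression, which in turn reduces the same identifier on smaller pieces.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Binding:
    """Hypothetical record tying an identifier to its implementations."""
    kernel: Callable                       # stand-in for an OpenCL/CUDA kernel
    expression: Optional[Callable] = None  # optional functional decomposition


class Runtime:
    """Toy sequential evaluator; a real runtime would schedule in parallel."""

    def __init__(self, threshold: int):
        # Assumed policy: inputs larger than `threshold` are decomposed.
        # Clamped to at least 1 so that splitting always terminates.
        self.threshold = max(1, threshold)
        self.bindings = {}
        self.log = []  # records (identifier, choice, input size)

    def bind(self, name, kernel, expression=None):
        self.bindings[name] = Binding(kernel, expression)

    def reduce(self, name, *args):
        binding = self.bindings[name]
        size = max(len(a) for a in args)
        if binding.expression is not None and size > self.threshold:
            # Evaluate the functional expression (finer granularity).
            self.log.append((name, "expression", size))
            return binding.expression(self, *args)
        # Execute the kernel directly (coarser granularity).
        self.log.append((name, "kernel", size))
        return binding.kernel(*args)


def add_kernel(xs, ys):
    """Element-wise vector addition, standing in for a device kernel."""
    return [x + y for x, y in zip(xs, ys)]


def add_split(rt, xs, ys):
    """Pure expression: add two halves independently, then concatenate."""
    mid = len(xs) // 2
    return (rt.reduce("add", xs[:mid], ys[:mid])
            + rt.reduce("add", xs[mid:], ys[mid:]))


rt = Runtime(threshold=4)
rt.bind("add", add_kernel, add_split)
result = rt.reduce("add", list(range(10)), list(range(10)))
kernel_calls = [entry for entry in rt.log if entry[1] == "kernel"]
```

With these assumed settings, the vector of length 10 is split twice before the pieces fall under the threshold, so the kernel ends up running on four small chunks. Changing `threshold` changes the task granularity without touching the program that calls `add`.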

