Juggler: a dependence-aware task-based execution framework for GPUs

Authors: Mehmet E. Belviranli, Seyong Lee, Jeffrey S. Vetter, Laxmi N. Bhuyan

ACM SIGPLAN Notices, Volume 53, Issue 1

Pages 54 - 67

https://doi.org/10.1145/3200691.3178492

Published: 10 February 2018

Abstract

Scientific applications with single instruction, multiple data (SIMD) computations show considerable performance improvements when run on today's graphics processing units (GPUs). However, data dependences across thread blocks can significantly reduce the speedup, because they require global synchronization across the streaming multiprocessors (SMs) inside the GPU. To run applications with interblock data dependences efficiently, we need fine-grained, task-based execution models that treat the SMs inside a GPU as stand-alone parallel processing units. Such a scheme enables faster execution by utilizing all computation elements inside the GPU and by eliminating unnecessary waits at device-wide global barriers.

In this paper, we propose Juggler, a task-based execution scheme for GPU workloads with data dependences. The Juggler framework takes applications embedding OpenMP 4.5 tasks as input and executes them on the GPU via an efficient in-device runtime, eliminating the need for kernel-wide global synchronization. Juggler requires little or no modification to the source code, and once launched, the runtime runs entirely on the GPU without relying on the host at any point during execution. We evaluated Juggler on an NVIDIA Tesla P100 GPU and obtained up to 31% performance improvement over a global-barrier-based implementation, with minimal runtime overhead.
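To make the contrast between global barriers and dependence-aware tasks concrete, the following is a minimal host-side sketch of the kind of OpenMP 4.5 task code the abstract refers to. It is not taken from the Juggler repository and does not model Juggler's in-device runtime; the wavefront kernel, the names `compute_tile`, `wavefront_barrier`, and `wavefront_tasks`, the `dep` token array, and the sizes `N` and `TILE` are all assumptions chosen for illustration. Each tile depends on the tiles above it and to its left. The barrier version synchronizes all threads after every anti-diagonal, while the task version only declares per-tile dependences with `depend` clauses and lets the runtime order the tiles.

```cpp
#include <cassert>
#include <vector>

// Illustrative sizes (assumptions, not from the paper).
constexpr int N = 8;            // grid is N x N cells
constexpr int TILE = 4;         // each tile is TILE x TILE cells
constexpr int T = N / TILE;     // tiles per dimension

// Fill one tile. Every cell is the sum of its upper and left neighbours,
// so a tile needs the tiles above it and to its left to be finished first.
void compute_tile(std::vector<long>& g, int ti, int tj) {
    assert(ti >= 0 && ti < T && tj >= 0 && tj < T);
    for (int i = ti * TILE; i < (ti + 1) * TILE; ++i) {
        for (int j = tj * TILE; j < (tj + 1) * TILE; ++j) {
            long up = (i > 0) ? g[(i - 1) * N + j] : 0;
            long left = (j > 0) ? g[i * N + (j - 1)] : 0;
            g[i * N + j] = (i == 0 && j == 0) ? 1 : up + left;
        }
    }
}

// Baseline: process one anti-diagonal of tiles at a time. The implicit
// barrier at the end of each parallel loop plays the role of a global barrier.
std::vector<long> wavefront_barrier() {
    std::vector<long> g(N * N, 0);
    for (int d = 0; d < 2 * T - 1; ++d) {
        #pragma omp parallel for
        for (int ti = 0; ti < T; ++ti) {
            int tj = d - ti;
            if (tj >= 0 && tj < T) compute_tile(g, ti, tj);
        }
    }
    return g;
}

// Task version: one task per tile, ordered only by its data dependences.
// dep[ti + 1][tj + 1] is a token standing for tile (ti, tj); row 0 and
// column 0 are padding so boundary tiles can name a (never written) token.
std::vector<long> wavefront_tasks() {
    std::vector<long> g(N * N, 0);
    char dep[T + 1][T + 1] = {};
    (void)dep;  // only referenced by the pragmas below
    #pragma omp parallel
    #pragma omp single
    for (int ti = 0; ti < T; ++ti) {
        for (int tj = 0; tj < T; ++tj) {
            #pragma omp task shared(g) firstprivate(ti, tj) \
                depend(in: dep[ti][tj + 1], dep[ti + 1][tj]) \
                depend(out: dep[ti + 1][tj + 1])
            compute_tile(g, ti, tj);
        }
    }
    return g;  // all tasks have completed at the end of the parallel region
}
```

Compiled with an OpenMP-enabled compiler (for example with `-fopenmp`), both functions return the same grid; without OpenMP support the pragmas are ignored and the code runs serially with the same result. In the task version, a tile can start as soon as its two predecessors finish, rather than waiting for every tile on the previous anti-diagonal.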

Supplementary Material

Artifacts Available (juggler-master-9536993d76c2dc0639ebedd3cf4db8e0440734f2.zip)

This file is a snapshot of the Juggler repository located at the following address:

https://code.ornl.gov/fub/juggler

Please refer to the repository for the most up-to-date version of the project.


