research-article
Open access

A Model-Based Software Solution for Simultaneous Multiple Kernels on GPUs

Published: 04 March 2020

Editorial Notes

    A corrigendum was issued for this article on June 12, 2020. You can download the corrigendum from the supplemental material section of this citation page.

    Abstract

    A GPU, a critical computing resource in multiuser systems such as supercomputers, data centers, and cloud services, contains multiple compute units (CUs). GPU multitasking is an intuitive solution to underutilization in GPGPU computing. Recently proposed GPU multitasking solutions fall into two categories: (1) spatially partitioned sharing (SPS), which co-executes different kernels on disjoint sets of CUs, and (2) simultaneous multikernel (SMK), which runs multiple kernels simultaneously within a CU. Compared to SPS, SMK can improve resource utilization even further by interleaving instructions from kernels with low dynamic resource contention.
    However, SMK is hard to implement on current GPU architectures, because (1) techniques for applying SMK on top of the GPU hardware scheduling policy are scarce and (2) finding an efficient SMK scheme is difficult due to the complex interference among concurrently executing kernels. In this article, we propose a lightweight and effective performance model to evaluate the complex interference of SMK. Based on the probability of independent events, our performance model is built from a new angle and contains few parameters. We then propose a metric, the symbiotic factor, which evaluates an SMK scheme so that kernels with complementary resource utilization can co-run within a CU. We also analyze the advantages and disadvantages of the kernel slicing and kernel stretching techniques and integrate them to realize SMK on real GPUs rather than on simulators. We validate our model on 18 benchmarks. Compared to hardware-based concurrent kernel execution with the optimized kernel launch order (the order yielding the fastest execution time), co-running kernel pairs achieve average speedups of 11%, 18%, and 12% on the AMD R9 290X, RX 480, and Vega 64, respectively. Compared to the Warped-Slicer, the average speedups are 29%, 18%, and 51% on the same three GPUs.
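    The abstract does not define the symbiotic factor; as a hedged illustration of the general idea it names (pairing kernels whose per-CU resource demands are complementary so they can co-reside within one compute unit), one might score candidate pairs as below. The resource names, limits, and scoring rule are illustrative assumptions, not the paper's actual metric:

    ```python
    # Hypothetical sketch: prefer co-running kernels whose combined per-CU
    # footprints both fit and leave the least-contended resource mix.
    # CU_LIMITS values are assumptions, not real GCN hardware limits.

    CU_LIMITS = {"vgprs": 65536, "lds_bytes": 65536, "waves": 40}

    def fits(a, b):
        """Both kernels' per-CU footprints must fit on one CU together."""
        return all(a[r] + b[r] <= CU_LIMITS[r] for r in CU_LIMITS)

    def pairing_score(a, b):
        """Peak combined utilization across resources; lower is better,
        i.e., the pair's demands are more complementary."""
        return max((a[r] + b[r]) / CU_LIMITS[r] for r in CU_LIMITS)

    compute_bound = {"vgprs": 40000, "lds_bytes": 8192, "waves": 16}
    memory_bound  = {"vgprs": 16000, "lds_bytes": 4096, "waves": 20}

    if fits(compute_bound, memory_bound):
        print(pairing_score(compute_bound, memory_bound))  # prints 0.9
    ```

    A scheduler following this idea would enumerate candidate kernel pairs and co-schedule the pair with the lowest score, rather than pairing two kernels that saturate the same resource.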
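    Kernel slicing, one of the two techniques the abstract integrates, generally means splitting a kernel's launch grid into smaller slices so that slices from different kernels can be resident on the GPU at once. A minimal sketch of the bookkeeping, assuming a simple round-robin interleaving (the slice sizes, names, and schedule policy are assumptions for illustration; the actual enqueue mechanism, e.g. launching each slice with a global work offset, is not shown):

    ```python
    # Hedged sketch of kernel slicing: partition each kernel's work-groups
    # into (offset, count) slices, then interleave two kernels' slices.

    def slices(total_wgs, slice_wgs):
        """Yield (offset, count) pairs covering total_wgs work-groups."""
        for off in range(0, total_wgs, slice_wgs):
            yield off, min(slice_wgs, total_wgs - off)

    def interleave(a, b):
        """Round-robin two slice streams, draining the longer one."""
        a, b = list(a), list(b)
        out = []
        for i in range(max(len(a), len(b))):
            if i < len(a):
                out.append(("K1",) + a[i])
            if i < len(b):
                out.append(("K2",) + b[i])
        return out

    # A 1024-work-group kernel sliced by 256, co-run with a 300-work-group
    # kernel sliced by 128 (note the final partial slice of 44).
    schedule = interleave(slices(1024, 256), slices(300, 128))
    ```

    The trade-off the paper analyzes follows from this picture: finer slices give more interleaving opportunities but add launch overhead, which is presumably where kernel stretching complements slicing.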

    Supplementary Material

    a7-wu-corrigendum (a7-wu-corrigendum.pdf)
    Corrigendum to "A Model-Based Software Solution for Simultaneous Multiple Kernels on GPUs" by Wu et al., ACM Transactions on Architecture and Code Optimization, Volume 17, Issue 1 (TACO 17:1).

    References

    [1]
    P. Aguilera, K. Morrow, and N. S. Kim. 2014. Fair share: Allocation of GPU resources for both performance and fairness. In 2014 IEEE 32nd International Conference on Computer Design (ICCD’14). 440--447.
    [2]
    AMD. [n.d.]. CodeXL. http://gpuopen.com/compute-product/codexl/.
    [3]
    Joshua A. Anderson, Chris D. Lorenz, and A. Travesset. 2008. General purpose molecular dynamics simulations fully implemented on graphics processing units. J. Comput. Phys. 227, 10 (2008), 5342--5359.
    [4]
    Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. Patel, William D. Gropp, and Wen-mei W. Hwu. 2010. An adaptive performance modeling tool for GPU architectures. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 105--114.
    [5]
    S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S. Lee, and K. Skadron. 2009. Rodinia: A benchmark suite for heterogeneous computing. In 2009 IEEE International Symposium on Workload Characterization (IISWC’09). 44--54.
    [6]
    H. Dai, Z. Lin, C. Li, C. Zhao, F. Wang, N. Zheng, and H. Zhou. 2018. Accelerate GPU concurrent kernel execution by mitigating memory pipeline stalls. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA’18). 208--220.
    [7]
    S. Grauer-Gray, L. Xu, R. Searles, S. Ayalasomayajula, and J. Cavazos. 2012. Auto-tuning a high-level language targeted to GPU codes. In 2012 Innovative Parallel Computing (InPar’12). 1--10.
    [8]
    Khronos OpenCL Working Group et al. 2008. The OpenCL specification. Version 1, 29 (2008), 8.
    [9]
    Sunpyo Hong and Hyesoon Kim. 2009. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. SIGARCH Comput. Archit. News 37, 3 (June 2009), 152--163.
    [10]
    Q. Hu, J. Shu, J. Fan, and Y. Lu. 2016. Run-time performance estimation and fairness-oriented scheduling policy for concurrent GPGPU applications. In 2016 45th International Conference on Parallel Processing (ICPP’16). 57--66.
    [11]
    Andrew Kerr, Gregory Diamos, and Sudhakar Yalamanchili. 2010. Modeling GPU-CPU workloads and systems. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units. ACM, 31--42.
    [12]
    Teng Li, Vikram K. Narayana, and Tarek El-Ghazawi. 2014. Symbiotic scheduling of concurrent GPU kernels for performance and energy optimizations. In Proceedings of the 11th ACM Conference on Computing Frontiers. ACM, Article 36, 10 pages.
    [13]
    Y. Liang, H. P. Huynh, K. Rupnow, R. S. M. Goh, and D. Chen. 2015. Efficient GPU spatial-temporal multitasking. IEEE Trans. Parallel Distrib. Syst. 26, 3 (March 2015), 748--760.
    [14]
    Zhen Lin, Hongwen Dai, Michael Mantor, and Huiyang Zhou. 2019. Coordinated CTA combination and bandwidth partitioning for GPU concurrent kernel execution. ACM Trans. Archit. Code Optim. 16, 3, Article 23 (June 2019), 27 pages.
    [15]
    Zhen Lin, Michael Mantor, and Huiyang Zhou. 2018. GPU performance vs. thread-level parallelism: Scalability analysis and a novel way to improve TLP. ACM Trans. Archit. Code Optim. 15, 1, Article 15 (March 2018), 21 pages.
    [16]
    Mike Mantor. 2012. AMD Radeon™ HD 7970 with graphics core next (GCN) architecture. In 2012 IEEE Hot Chips 24 Symposium (HCS’12). IEEE, 1--35.
    [17]
    Christos Margiolas and Michael F. P. O’Boyle. 2016. Portable and transparent software managed scheduling on accelerators for fair resource sharing. In Proceedings of the 2016 International Symposium on Code Generation and Optimization. ACM, 82--93.
    [18]
    X. Mei and X. Chu. 2017. Dissecting GPU memory hierarchy through microbenchmarking. IEEE Trans. Parallel Distrib. Syst. 28, 1 (Jan. 2017), 72--86.
    [19]
    Nvidia. [n.d.]. CUDA C Programming Guide. http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html.
    [20]
    Nvidia. [n.d.]. Tuning CUDA Applications for Kepler. https://docs.nvidia.com/cuda/kepler-tuning-guide/index.html.
    [21]
    Sreepathi Pai, Matthew J. Thazhuthaveetil, and R. Govindarajan. 2013. Improving GPGPU concurrency with elastic kernels. In Proceedings of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 407--418.
    [22]
    Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke. 2017. Dynamic resource management for efficient utilization of multitasking GPUs. In International Conference on Architectural Support for Programming Languages and Operating Systems. 527--540.
    [23]
    A. Sethia, D. A. Jamshidi, and S. Mahlke. 2015. Mascar: Speeding up GPU warps by reducing memory pitstops. In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA’15). 174--185.
    [24]
    Mark Silberstein, Bryan Ford, Idit Keidar, and Emmett Witchel. 2013. GPUfs: Integrating a file system with GPUs. SIGARCH Comput. Arch. News 41, 1 (March 2013), 485--498.
    [25]
    Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, and Richard Vuduc. 2012. A performance analysis framework for identifying potential benefits in GPGPU applications. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 11--22.
    [26]
    Jeff A. Stuart and John D. Owens. 2011. Multi-GPU MapReduce on GPU clusters. In Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium. IEEE Computer Society, 1068--1079.
    [27]
    Weibin Sun and Robert Ricci. 2013. Fast and flexible: Parallel packet processing with GPUs and click. In Proceedings of the 9th ACM/IEEE Symposium on Architectures for Networking and Communications Systems. IEEE Press, 25--36.
    [28]
    H. Wang, F. Luo, M. Ibrahim, O. Kayiran, and A. Jog. 2018. Efficient and fair multi-programming in GPUs via effective bandwidth management. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA’18). 247--258.
    [29]
    Z. Wang, J. Yang, R. Melhem, B. Childers, Y. Zhang, and M. Guo. 2016. Simultaneous multikernel GPU: Multi-tasking throughput processors via fine-grained sharing. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA’16). 358--369.
    [30]
    Haicheng Wu, Gregory Diamos, Srihari Cadambi, and Sudhakar Yalamanchili. 2012. Kernel weaver: Automatically fusing database primitives for efficient GPU computation. In Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 107--118.
    [31]
    Q. Xu, H. Jeon, K. Kim, W. W. Ro, and M. Annavaram. 2016. Warped-Slicer: Efficient intra-SM slicing through dynamic resource partitioning for GPU multiprogramming. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA’16). 230--242.
    [32]
    Yao Zhang and John D. Owens. 2011. A quantitative performance analysis model for GPU architectures. In Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture. IEEE Computer Society, 382--393. http://dl.acm.org/citation.cfm?id=2014698.2014875
    [33]
    J. Zhong and B. He. 2014. Kernelet: High-throughput GPU kernel executions with dynamic slicing and scheduling. IEEE Trans. Parallel Distrib. Syst. 25, 6 (June 2014), 1522--1532.

    Cited By

    • (2023) Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU. In Proceedings of the 21st ACM Conference on Embedded Networked Sensor Systems, 97--110. DOI: 10.1145/3625687.3625789. Online publication date: 12-Nov-2023.
    • (2022) A Survey of GPU Multitasking Methods Supported by Hardware Architecture. IEEE Transactions on Parallel and Distributed Systems 33, 6, 1451--1463. DOI: 10.1109/TPDS.2021.3115630. Online publication date: 1-Jun-2022.
    • (2020) Fair and cache blocking aware warp scheduling for concurrent kernel execution on GPU. Future Generation Computer Systems. DOI: 10.1016/j.future.2020.05.023. Online publication date: May-2020.


    Published In

    ACM Transactions on Architecture and Code Optimization, Volume 17, Issue 1
    March 2020, 206 pages
    ISSN: 1544-3566
    EISSN: 1544-3973
    DOI: 10.1145/3386454
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 March 2020
    Accepted: 01 December 2019
    Revised: 01 December 2019
    Received: 01 July 2019
    Published in TACO Volume 17, Issue 1


    Author Tags

    1. GPGPU
    2. concurrent kernel execution

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • The Research Grants Council of Hong Kong

