DOI: 10.1145/3205289.3205311
Research Article

Classification-Driven Search for Effective SM Partitioning in Multitasking GPUs

Published: 12 June 2018

Abstract

Graphics processing units (GPUs) feature an increasing number of streaming multiprocessors (SMs) with each successive generation. At the same time, GPUs are increasingly adopted in cloud services and data centers to accelerate general-purpose workloads. Running multiple applications on a GPU in such environments requires effective multitasking support. Spatial multitasking, in which independent applications co-execute on different sets of SMs, is a promising way to share GPU resources. Unfortunately, how to effectively partition the SMs remains an open problem.
In this paper, we observe that, compared to the widely used even partitioning, dynamic SM partitioning based on the characteristics of the co-executing applications can significantly improve performance and power efficiency. Unfortunately, finding an effective SM partition is challenging because the number of possible combinations increases exponentially with the number of SMs and co-executing applications. Through offline analysis, we find that first classifying workloads and then searching for an effective SM partition based on the workload characteristics significantly reduces the search space, making dynamic SM partitioning tractable.
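To illustrate how quickly the candidate space grows, a simple count of the possible SM partitions, assuming each co-executing application receives at least one SM (the 80-SM GPU and application counts below are illustrative, not taken from the paper):

```python
from math import comb

def num_partitions(num_sms: int, num_apps: int) -> int:
    """Number of ways to split num_sms SMs among num_apps co-executing
    applications, each receiving at least one SM (stars and bars)."""
    return comb(num_sms - 1, num_apps - 1)

# Candidate partitions for a hypothetical 80-SM GPU:
for k in (2, 3, 4):
    print(f"{k} apps: {num_partitions(80, k)} partitions")
```

Even for two applications there are dozens of candidates, and evaluating each one online is impractical, which is why pruning the search space matters.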
Based on these insights, we propose Classification-Driven search (CD-search) for low-overhead dynamic SM partitioning in multitasking GPUs. CD-search first classifies workloads using a novel off-SM bandwidth model, after which it enters either performance mode or power mode depending on the workload's characteristics. Each mode follows a specific search strategy to quickly determine the optimum SM partition. Our evaluation shows that CD-search improves system throughput by 10.4% on average (and up to 62.9%) over even partitioning for workloads classified for performance mode. For workloads classified for power mode, CD-search reduces power consumption by 25% on average (and up to 41.2%). CD-search incurs limited runtime overhead.
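The two-step flow described in the abstract can be sketched as follows. This is a minimal illustration only: a plain bandwidth threshold stands in for the paper's off-SM bandwidth model, and a generic coarse-to-fine scan stands in for the mode-specific search strategies.

```python
# Hypothetical sketch of CD-search's classify-then-search flow for two
# co-executing applications. The threshold test and the coarse-to-fine
# scan are illustrative stand-ins, not the paper's actual algorithm.

def cd_search(total_sms, off_sm_bw_demand, bw_capacity, evaluate, step=4):
    """Choose an SM split (sms_a, sms_b) for two co-running applications.

    evaluate((a, b)) returns a score where higher is better, e.g. system
    throughput in performance mode or negative power in power mode.
    """
    # Step 1: classify the workload with a simple bandwidth test
    # (stand-in for the off-SM bandwidth model).
    mode = "power" if off_sm_bw_demand > bw_capacity else "performance"

    # Step 2: search a reduced set of candidate splits instead of all
    # total_sms - 1 of them: coarse scan first, then refine locally.
    coarse = range(step, total_sms, step)
    best = max(coarse, key=lambda a: evaluate((a, total_sms - a)))
    fine = range(max(1, best - step + 1), min(total_sms, best + step))
    best = max(fine, key=lambda a: evaluate((a, total_sms - a)))
    return mode, (best, total_sms - best)
```

With a step of 4 on an 80-SM GPU, the sketch evaluates roughly 26 candidate splits instead of 79, which conveys the intent of reducing the search space even though the real search rules come from the workload classification.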


Cited By

  • (2024) An Analysis of Collocation on GPUs for Deep Learning Training. In Proceedings of the 4th Workshop on Machine Learning and Systems, pp. 81-90. DOI: 10.1145/3642970.3655827
  • (2024) ElasticRoom: Multi-Tenant DNN Inference Engine via Co-design with Resource-constrained Compilation and Strong Priority Scheduling. In Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, pp. 1-14. DOI: 10.1145/3625549.3658654
  • (2024) D-STACK: High Throughput DNN Inference by Effective Multiplexing and Spatio-Temporal Scheduling of GPUs. IEEE Transactions on Cloud Computing, vol. 12, no. 4, pp. 1344-1358. DOI: 10.1109/TCC.2024.3476210
  • (2023) ISPA: Exploiting Intra-SM Parallelism in GPUs via Fine-Grained Resource Management. IEEE Transactions on Computers, vol. 72, no. 5, pp. 1473-1487. DOI: 10.1109/TC.2022.3214088
  • (2023) KRISP: Enabling Kernel-wise RIght-sizing for Spatial Partitioned GPU Inference Servers. In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA), pp. 624-637. DOI: 10.1109/HPCA56546.2023.10071121
  • (2022) CoGNN. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.5555/3571885.3571936
  • (2022) GPUPool. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 317-332. DOI: 10.1145/3559009.3569650
  • (2022) Online Optimization with Feedback Delay and Nonlinear Switching Cost. Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 6, no. 1, pp. 1-34. DOI: 10.1145/3508037
  • (2022) NURA. Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 6, no. 1, pp. 1-27. DOI: 10.1145/3508036
  • (2022) Metrics and Design of an Instruction Roofline Model for AMD GPUs. ACM Transactions on Parallel Computing, vol. 9, no. 1, pp. 1-14. DOI: 10.1145/3505285


    Published In

    ICS '18: Proceedings of the 2018 International Conference on Supercomputing
    June 2018
    407 pages
    ISBN:9781450357838
    DOI:10.1145/3205289

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. GPU
    2. SM partitioning
    3. multitasking

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICS '18

    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%
