DOI: 10.1145/3412841.3441989
Research article

smCompactor: a workload-aware fine-grained resource management framework for GPGPUs

Published: 22 April 2021

Abstract

Recently, graphics processing unit (GPU) multitasking has become important on many platforms, since an efficient GPU multitasking mechanism enables more GPU-enabled tasks to run on a limited number of physical GPUs. However, current GPU multitasking technologies, such as NVIDIA Multi-Process Service (MPS) and Hyper-Q, may not fully utilize GPU resources because they do not consider the efficient use of intra-GPU resources. In this paper, we present smCompactor, a fine-grained GPU multitasking framework that fully exploits intra-GPU resources for different workloads. smCompactor dispatches the thread blocks (TBs) of different GPU kernels to appropriate streaming multiprocessors (SMs) based on profiled workload characteristics. With smCompactor, GPU resource utilization improves because more workloads can run on a single GPU while their performance is maintained. Evaluation results show that smCompactor improves resource utilization, in terms of the number of active SMs, by up to 33%, and reduces kernel execution time by up to 26% compared with NVIDIA MPS.
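The dispatch idea in the abstract — placing thread blocks of different kernels onto SMs using profiled per-TB resource demands — can be viewed as a resource-aware packing problem. The sketch below is illustrative only, not the paper's implementation: all resource caps, kernel names, and profiled numbers are hypothetical, and it models only a first-fit host-side placement decision, not the actual on-GPU dispatch mechanism.

```python
# Hypothetical sketch of smCompactor-style TB-to-SM placement: pack thread
# blocks (TBs) from several kernels onto as few SMs as possible, subject to
# per-SM resource caps. Numbers below are illustrative, not measured.

from dataclasses import dataclass, field

@dataclass
class SM:
    regs_free: int = 65536      # registers per SM (illustrative cap)
    smem_free: int = 49152      # shared-memory bytes per SM (illustrative cap)
    threads_free: int = 2048    # max resident threads per SM (illustrative cap)
    resident: list = field(default_factory=list)

@dataclass
class Kernel:
    name: str
    num_tbs: int                # thread blocks to place
    threads: int                # threads per TB
    regs: int                   # registers per TB (would come from profiling)
    smem: int                   # shared-memory bytes per TB (from profiling)

def fits(sm: SM, k: Kernel) -> bool:
    return (sm.regs_free >= k.regs and sm.smem_free >= k.smem
            and sm.threads_free >= k.threads)

def place(sm: SM, k: Kernel) -> None:
    sm.regs_free -= k.regs
    sm.smem_free -= k.smem
    sm.threads_free -= k.threads
    sm.resident.append(k.name)

def dispatch(kernels, num_sms=30):
    """First-fit: fill each SM before moving on, so TBs of different
    kernels with complementary demands end up co-located."""
    sms = [SM() for _ in range(num_sms)]
    for k in kernels:
        for _ in range(k.num_tbs):
            for sm in sms:
                if fits(sm, k):
                    place(sm, k)
                    break
            else:
                raise RuntimeError(f"no SM can host a TB of {k.name}")
    return sms

# A register-hungry kernel and a shared-memory-hungry kernel: their TBs can
# share an SM because they stress different resources.
kernels = [
    Kernel("compute_heavy", num_tbs=16, threads=256, regs=12288, smem=0),
    Kernel("smem_heavy",    num_tbs=16, threads=128, regs=4096,  smem=24576),
]
sms = dispatch(kernels)
active = [sm for sm in sms if sm.resident]
print(f"{len(active)} SMs active; first SM hosts {active[0].resident}")
```

With these illustrative numbers, the first SM ends up hosting five `compute_heavy` TBs plus one `smem_heavy` TB, showing the co-location that resource-oblivious round-robin dispatch would miss.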


Cited By

  • (2024) Application-aware Resource Sharing using Software and Hardware Partitioning on Modern GPUs. NOMS 2024 - IEEE Network Operations and Management Symposium, pp. 1-8. DOI: 10.1109/NOMS59830.2024.10574996. Online publication date: 6 May 2024.
  • (2023) ROSGM: A Real-Time GPU Management Framework with Plug-In Policies for ROS 2. 2023 IEEE 29th Real-Time and Embedded Technology and Applications Symposium (RTAS), pp. 93-105. DOI: 10.1109/RTAS58335.2023.00015. Online publication date: May 2023.
  • (2023) Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach. 2023 IEEE International Conference on Cluster Computing (CLUSTER), pp. 185-196. DOI: 10.1109/CLUSTER52292.2023.00023. Online publication date: 31 October 2023.
  • (2022) K-Scheduler: dynamic intra-SM multitasking management with execution profiles on GPUs. Cluster Computing 25(1), pp. 597-617. DOI: 10.1007/s10586-021-03429-7. Online publication date: 1 February 2022.


    Published In

    SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing
    March 2021
    2075 pages
    ISBN:9781450381048
    DOI:10.1145/3412841
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. GPU multitasking
    2. GPU resource management
    3. HPC
    4. OS
    5. parallel computing

    Qualifiers

    • Research-article

    Funding Sources

    • National Research Foundation of Korea (NRF) funded by the Ministry of Science
    • National Research Foundation of the Korea government (MSIT)

    Conference

    SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing
    March 22 - 26, 2021
    Virtual Event, Republic of Korea

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%


