DOI: 10.1145/3412841.3441989
Research article

smCompactor: a workload-aware fine-grained resource management framework for GPGPUs

Published: 22 April 2021

Abstract

Recently, graphics processing unit (GPU) multitasking has become important on many platforms, since an efficient GPU multitasking mechanism enables more GPU-enabled tasks to run on a limited number of physical GPUs. However, current GPU multitasking technologies, such as NVIDIA Multi-Process Service (MPS) and Hyper-Q, may not fully utilize GPU resources because they do not consider the efficient use of intra-GPU resources. In this paper, we present smCompactor, a fine-grained GPU multitasking framework that fully exploits intra-GPU resources for different workloads. smCompactor dispatches the thread blocks (TBs) of different GPU kernels to appropriate streaming multiprocessors (SMs) based on profiled workload characteristics. With smCompactor, GPU resource utilization improves because more workloads can run on a single GPU while their performance is maintained. Evaluation results show that smCompactor improves resource utilization, in terms of the number of active SMs, by up to 33%, and reduces kernel execution time by up to 26% compared with NVIDIA MPS.
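The dispatch idea in the abstract — placing thread blocks of different kernels onto SMs using profiled per-TB resource demands — can be viewed as a resource-aware packing problem. The sketch below is illustrative only, not the paper's implementation: all resource caps, kernel names, and profiled numbers are hypothetical, and it models only a first-fit host-side placement decision, not the actual on-GPU dispatch mechanism.

```python
# Hypothetical sketch of smCompactor-style TB-to-SM placement: pack thread
# blocks (TBs) from several kernels onto as few SMs as possible, subject to
# per-SM resource caps. Numbers below are illustrative, not measured.

from dataclasses import dataclass, field

@dataclass
class SM:
    regs_free: int = 65536      # registers per SM (illustrative cap)
    smem_free: int = 49152      # shared-memory bytes per SM (illustrative cap)
    threads_free: int = 2048    # max resident threads per SM (illustrative cap)
    resident: list = field(default_factory=list)

@dataclass
class Kernel:
    name: str
    num_tbs: int                # thread blocks to place
    threads: int                # threads per TB
    regs: int                   # registers per TB (would come from profiling)
    smem: int                   # shared-memory bytes per TB (from profiling)

def fits(sm: SM, k: Kernel) -> bool:
    return (sm.regs_free >= k.regs and sm.smem_free >= k.smem
            and sm.threads_free >= k.threads)

def place(sm: SM, k: Kernel) -> None:
    sm.regs_free -= k.regs
    sm.smem_free -= k.smem
    sm.threads_free -= k.threads
    sm.resident.append(k.name)

def dispatch(kernels, num_sms=30):
    """First-fit: fill each SM before moving on, so TBs of different
    kernels with complementary demands end up co-located."""
    sms = [SM() for _ in range(num_sms)]
    for k in kernels:
        for _ in range(k.num_tbs):
            for sm in sms:
                if fits(sm, k):
                    place(sm, k)
                    break
            else:
                raise RuntimeError(f"no SM can host a TB of {k.name}")
    return sms

# A register-hungry kernel and a shared-memory-hungry kernel: their TBs can
# share an SM because they stress different resources.
kernels = [
    Kernel("compute_heavy", num_tbs=16, threads=256, regs=12288, smem=0),
    Kernel("smem_heavy",    num_tbs=16, threads=128, regs=4096,  smem=24576),
]
sms = dispatch(kernels)
active = [sm for sm in sms if sm.resident]
print(f"{len(active)} SMs active; first SM hosts {active[0].resident}")
```

With these illustrative numbers, the first SM ends up hosting five `compute_heavy` TBs plus one `smem_heavy` TB, showing the co-location that resource-oblivious round-robin dispatch would miss.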


Cited By

  • (2024) Application-aware Resource Sharing using Software and Hardware Partitioning on Modern GPUs. NOMS 2024 - IEEE Network Operations and Management Symposium, pp. 1-8. DOI: 10.1109/NOMS59830.2024.10574996. Online publication date: 6 May 2024.
  • (2023) ROSGM: A Real-Time GPU Management Framework with Plug-In Policies for ROS 2. 2023 IEEE 29th Real-Time and Embedded Technology and Applications Symposium (RTAS), pp. 93-105. DOI: 10.1109/RTAS58335.2023.00015. Online publication date: May 2023.
  • (2023) Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach. 2023 IEEE International Conference on Cluster Computing (CLUSTER), pp. 185-196. DOI: 10.1109/CLUSTER52292.2023.00023. Online publication date: 31 October 2023.
  • (2022) K-Scheduler: dynamic intra-SM multitasking management with execution profiles on GPUs. Cluster Computing 25(1), pp. 597-617. DOI: 10.1007/s10586-021-03429-7. Online publication date: 1 February 2022.


    Published In

    SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing
    March 2021
    2075 pages
    ISBN:9781450381048
    DOI:10.1145/3412841
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. GPU multitasking
    2. GPU resource management
    3. HPC
    4. OS
    5. parallel computing

    Qualifiers

    • Research-article

    Funding Sources

    • National Research Foundation of Korea (NRF) funded by the Ministry of Science
    • National Research Foundation of the Korea government (MSIT)

    Conference

    SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing
    March 22 - 26, 2021
    Virtual Event, Republic of Korea

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%


