DOI: 10.1145/3037697.3037707 · ASPLOS Conference Proceedings · Research article · Public Access

Dynamic Resource Management for Efficient Utilization of Multitasking GPUs

Published: 04 April 2017

Abstract

As graphics processing units (GPUs) are broadly adopted, running multiple applications on a GPU at the same time is attracting wide attention. Recent proposals on multitasking GPUs have focused on either spatial multitasking, which partitions GPU resources at streaming multiprocessor (SM) granularity, or simultaneous multikernel (SMK), which runs multiple kernels on the same SM. However, multitasking performance varies heavily depending on the resource partition within each scheme and on the application mix. In this paper, we propose GPU Maestro, which performs dynamic resource management for efficient utilization of multitasking GPUs. GPU Maestro can discover the best-performing GPU resource partition, exploiting both spatial multitasking and SMK. Furthermore, dynamism within a kernel and interference between kernels are automatically accounted for, because GPU Maestro finds the best-performing partition through direct measurements. Evaluations show that GPU Maestro can improve average system throughput by 20.2% and 13.9% over baseline spatial multitasking and SMK, respectively.
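The measurement-driven search the abstract describes can be sketched roughly as follows. This is a hypothetical illustration, not the paper's implementation: the partition granularity, the sublinear-scaling throughput model, and all function names (`candidate_partitions`, `measure_throughput`, `best_partition`) are my assumptions, standing in for the direct hardware measurements GPU Maestro would take per epoch.

```python
import math

def candidate_partitions(num_sms=16):
    """Candidate fractions of GPU resources given to kernel A,
    one per SM-granularity step (hypothetical granularity)."""
    return [a / num_sms for a in range(1, num_sms)]

def measure_throughput(frac_a, scale_a=1.0, scale_b=2.0):
    """Toy stand-in for a direct hardware measurement: each kernel's
    throughput grows sublinearly with its resource share, so the
    combined system throughput peaks at some intermediate split."""
    return math.sqrt(scale_a * frac_a) + math.sqrt(scale_b * (1.0 - frac_a))

def best_partition():
    """Keep the partition whose *measured* system throughput is highest,
    rather than predicting it from a static model."""
    return max(candidate_partitions(), key=measure_throughput)
```

In this toy model kernel B scales better with resources, so the search settles on a split that gives B the larger share. A real system would replace `measure_throughput` with actual per-interval hardware measurements, and would search both spatial (whole-SM) and SMK-style (intra-SM) splits.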




Published In

ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
April 2017
856 pages
ISBN: 9781450344654
DOI: 10.1145/3037697

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. graphics processing unit
  2. multitasking
  3. resource management

Acceptance Rates

ASPLOS '17 paper acceptance rate: 53 of 320 submissions, 17%.
Overall acceptance rate: 535 of 2,713 submissions, 20%.


Cited By

  • (2024) POSTER: FineCo: Fine-grained Heterogeneous Resource Management for Concurrent DNN Inferences. In Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pages 451-453. DOI: 10.1145/3627535.3638485. Online publication date: 2-Mar-2024.
  • (2024) ElasticRoom: Multi-Tenant DNN Inference Engine via Co-design with Resource-constrained Compilation and Strong Priority Scheduling. In Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, pages 1-14. DOI: 10.1145/3625549.3658654. Online publication date: 3-Jun-2024.
  • (2024) IRIS: A Performance-Portable Framework for Cross-Platform Heterogeneous Computing. IEEE Transactions on Parallel and Distributed Systems, 35(10):1796-1809. DOI: 10.1109/TPDS.2024.3429010. Online publication date: Oct-2024.
  • (2024) Graft: Efficient Inference Serving for Hybrid Deep Learning With SLO Guarantees via DNN Re-Alignment. IEEE Transactions on Parallel and Distributed Systems, 35(2):280-296. DOI: 10.1109/TPDS.2023.3340518. Online publication date: Feb-2024.
  • (2023) Maximizing the Utilization of GPUs Used by Cloud Gaming through Adaptive Co-location with Combo. In Proceedings of the 2023 ACM Symposium on Cloud Computing, pages 265-280. DOI: 10.1145/3620678.3624660. Online publication date: 30-Oct-2023.
  • (2023) Enabling Efficient Spatio-Temporal GPU Sharing for Network Function Virtualization. IEEE Transactions on Computers, 72(10):2963-2977. DOI: 10.1109/TC.2023.3278541. Online publication date: Oct-2023.
  • (2023) ISPA: Exploiting Intra-SM Parallelism in GPUs via Fine-Grained Resource Management. IEEE Transactions on Computers, 72(5):1473-1487. DOI: 10.1109/TC.2022.3214088. Online publication date: 1-May-2023.
  • (2023) MSHGN: Multi-Scenario Adaptive Hierarchical Spatial Graph Convolution Network for GPU Utilization Prediction in Heterogeneous GPU Clusters. Journal of Parallel and Distributed Computing, 104796. DOI: 10.1016/j.jpdc.2023.104796. Online publication date: Nov-2023.
  • (2022) NaviSim. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pages 333-345. DOI: 10.1145/3559009.3569666. Online publication date: 8-Oct-2022.
  • (2022) GPUPool. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pages 317-332. DOI: 10.1145/3559009.3569650. Online publication date: 8-Oct-2022.
