Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3489048.3522656acmconferencesArticle/Chapter ViewAbstractPublication PagesmetricsConference Proceedingsconference-collections
abstract

NURA: A Framework for Supporting Non-Uniform Resource Accesses in GPUs

Published: 06 June 2022 Publication History

Abstract

Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize GPU resources, is still challenging. Some pieces of prior work (e.g. spatial multitasking) have limited opportunity to improve resource utilization, while others, e.g. simultaneous multi-kernel, provide fine-grained resource sharing at the price of unfair execution. This paper proposes a new multi-application paradigm for GPUs, called NURA, that provides high potential to improve resource utilization and ensure fairness and Quality-of-Service(QoS). The key idea is that each streaming multiprocessor (SM) executes the Cooperative Thread Arrays (CTAs) that belong to only one application (similar to spatial multi-tasking) and shares its unused resources with the SMs running other applications demanding more resources. NURA handles resource sharing process mainly using a software approach to provide simplicity, low hardware overhead, and flexibility.We also perform some hardware modifications as an architectural support for our software-based proposal. Our conservative analysis reveals that the hardware area overhead of our proposal is less than 1.07% with respect to the whole GPU die. Our experimental results over various mixes of GPU workloads show that NURA improves throughput by 26% compared to the state-of-the-art spatial multi-tasking, on average, while meeting QoS targets. In terms of fairness, NURA has almost similar results to spatial multitasking, while it outperforms simultaneous multi-kernel by 76%, on average.

References

[1]
Sina Darabi, Negin Mahani, Hazhir Baxishi, Ehsan Yousefzadeh-Asl-Miandoab, Mohammad Sadrosadati, and Hamid Sarbazi-Azad. 2022. NURA: A Framework for Supporting Non-Uniform Resource Accesses in GPUs. Proceedings of the ACM on Measurement and Analysis of Computing Systems 6, 1 (2022), 1--27.
[2]
Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke. 2017. Dynamic resource management for efficient utilization of multitasking GPUs. ACM SIGOPS Operating Systems Review 51, 2 (2017), 527--540.
[3]
W. Zhao, Q. Chen, H. Lin, J. Zhang, J. Leng, C. Li, W. Zheng, L. Li, and M. Guo. 2019. Themis: Predicting and Reining in Application-Level Slowdown on Spatial Multitasking GPUs. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 653--663. https://doi.org/10.1109/IPDPS.2019.00074

Cited By

View all
  • (2024)Fair Resource Allocation in Virtualized O-RAN PlatformsProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/36390438:1(1-34)Online publication date: 21-Feb-2024
  • (2022)Morpheus: Extending the Last Level Cache Capacity in GPU Systems Using Idle GPU Core ResourcesProceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture10.1109/MICRO56248.2022.00029(228-244)Online publication date: 1-Oct-2022

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMETRICS/PERFORMANCE '22: Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems
June 2022
132 pages
ISBN:9781450391412
DOI:10.1145/3489048
  • cover image ACM SIGMETRICS Performance Evaluation Review
    ACM SIGMETRICS Performance Evaluation Review  Volume 50, Issue 1
    SIGMETRICS '22
    June 2022
    118 pages
    ISSN:0163-5999
    DOI:10.1145/3547353
    Issue’s Table of Contents
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2022

Check for updates

Author Tags

  1. gpu
  2. multitasking
  3. quality of services
  4. system throughput

Qualifiers

  • Abstract

Conference

SIGMETRICS/PERFORMANCE '22
Sponsor:

Acceptance Rates

Overall Acceptance Rate 459 of 2,691 submissions, 17%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)8
  • Downloads (Last 6 weeks)3
Reflects downloads up to 22 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Fair Resource Allocation in Virtualized O-RAN PlatformsProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/36390438:1(1-34)Online publication date: 21-Feb-2024
  • (2022)Morpheus: Extending the Last Level Cache Capacity in GPU Systems Using Idle GPU Core ResourcesProceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture10.1109/MICRO56248.2022.00029(228-244)Online publication date: 1-Oct-2022

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media