Article

TimeGraph: GPU scheduling for real-time multi-tasking environments

Authors: Karthik Lakshmanan, Ragunathan Rajkumar, Yutaka Ishikawa

USENIXATC'11: Proceedings of the 2011 USENIX conference on USENIX annual technical conference

Page 2

Published: 15 June 2011

Abstract

The Graphics Processing Unit (GPU) is now commonly used for graphics and data-parallel computing. As more applications are accelerated on the GPU in multi-tasking environments, where multiple tasks access the GPU concurrently, operating systems must provide prioritization and isolation capabilities in GPU resource management, particularly in real-time settings.

We present TimeGraph, a real-time GPU scheduler at the device-driver level that protects important GPU workloads from performance interference. TimeGraph adopts a new event-driven model that synchronizes the GPU with the CPU to monitor GPU commands issued from user space and to control GPU resource usage responsively. TimeGraph supports two priority-based scheduling policies to address the tradeoff between response time and throughput introduced by the asynchronous and non-preemptive nature of GPU processing. Resource reservation mechanisms account for and enforce GPU resource usage, which prevents misbehaving tasks from exhausting GPU resources. TimeGraph also predicts GPU command execution costs to enhance isolation.

Our experiments using OpenGL graphics benchmarks demonstrate that TimeGraph maintains the frame rates of primary GPU tasks at the desired level even in the face of extreme GPU workloads, whereas these tasks become nearly unresponsive without TimeGraph support. We also find that the performance overhead of TimeGraph can be limited to 4-10%, and that its event-driven scheduler improves throughput by about 30 times over the existing tick-driven scheduler.
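To make the ideas of priority-based dispatch and resource reservation in the abstract more concrete, the following is a minimal sketch in Python. It is not TimeGraph's implementation, which lives in a GPU device driver. The names `Task`, `ReservationScheduler`, `capacity_us`, and `period_us` are hypothetical, and the budget rule shown (dispatch only while a task has budget left, charge the cost afterwards, refill every period) is a simplified assumption rather than either of the paper's policies.

```python
from dataclasses import dataclass, field
import heapq


@dataclass
class Task:
    """A GPU client with a priority and a per-period time budget (hypothetical model)."""
    name: str
    priority: int        # larger value = more important
    capacity_us: int     # GPU time the task may consume per period
    period_us: int       # replenishment period
    budget_us: int = field(init=False)
    next_replenish_us: int = field(init=False)

    def __post_init__(self):
        self.budget_us = self.capacity_us
        self.next_replenish_us = self.period_us


class ReservationScheduler:
    """Dispatches one command group at a time (non-preemptive), highest priority first."""

    def __init__(self):
        self.now_us = 0
        self.pending = []  # heap of (-priority, sequence number, task, cost)
        self.seq = 0

    def submit(self, task, cost_us):
        # The sequence number keeps FIFO order within a priority level and
        # ensures the heap never has to compare Task objects.
        heapq.heappush(self.pending, (-task.priority, self.seq, task, cost_us))
        self.seq += 1

    def _replenish(self, task):
        # Refill the budget once for every period boundary that has passed.
        while self.now_us >= task.next_replenish_us:
            task.budget_us = task.capacity_us
            task.next_replenish_us += task.period_us

    def run_next(self):
        """Run the best eligible command group; return its task name, or None."""
        deferred, chosen = [], None
        while self.pending:
            entry = heapq.heappop(self.pending)
            task = entry[2]
            self._replenish(task)
            if task.budget_us > 0:
                chosen = entry
                break
            deferred.append(entry)  # budget exhausted: hold until the next period
        for entry in deferred:
            heapq.heappush(self.pending, entry)
        if chosen is None:
            return None
        _, _, task, cost_us = chosen
        self.now_us += cost_us       # the group runs to completion
        task.budget_us -= cost_us    # charge the cost after the fact
        return task.name
```

In this sketch a low-priority task that floods the queue is served only until its budget runs out, after which its remaining command groups wait for the next period, while a higher-priority task with remaining budget is always dispatched first.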

Index Terms

TimeGraph: GPU scheduling for real-time multi-tasking environments
1. General and reference
  1. Cross-computing tools and techniques
    1. Performance
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis

Index terms have been assigned to the content through auto-classification.

Published In

USENIXATC'11: Proceedings of the 2011 USENIX conference on USENIX annual technical conference

June 2011

36 pages

Program Chairs: Jason Nieh (Columbia University), Carl Waldspurger

Publisher

USENIX Association

United States

Cited By

Chen B, Zhao H, Cui W, He Y, Zhang S, Chen Q, Li Z, Guo M (2023). Maximizing the Utilization of GPUs Used by Cloud Gaming through Adaptive Co-location with Combo. Proceedings of the 2023 ACM Symposium on Cloud Computing, pp. 265-280. Online publication date: 30-Oct-2023.
https://dl.acm.org/doi/10.1145/3620678.3624660

Ng K, Demoulin H, Liu V, Druschel P, Kaufmann A, Mace J, Flinn J, Seltzer M (2023). Paella: Low-latency Model Serving with Software-defined GPU Scheduling. Proceedings of the 29th Symposium on Operating Systems Principles, pp. 595-610. Online publication date: 23-Oct-2023.
https://dl.acm.org/doi/10.1145/3600006.3613163

Darabi S, Mahani N, Baxishi H, Yousefzadeh-Asl-Miandoab E, Sadrosadati M, Sarbazi-Azad H (2022). NURA. Proceedings of the ACM on Measurement and Analysis of Computing Systems 6:1, pp. 1-27. Online publication date: 28-Feb-2022.
https://dl.acm.org/doi/10.1145/3508036

Di B, Hu D, Xie Z, Sun J, Chen H, Ren J, Li D (2021). TLB-pilot: Mitigating TLB Contention Attack on GPUs with Microarchitecture-Aware Scheduling. ACM Transactions on Architecture and Code Optimization 19:1, pp. 1-23. Online publication date: 6-Dec-2021.
https://dl.acm.org/doi/10.1145/3491218

Hunt T, Jia Z, Miller V, Szekely A, Hu Y, Rossbach C, Witchel E, Bhagwan R, Porter G (2020). Telekine. Proceedings of the 17th Usenix Conference on Networked Systems Design and Implementation, pp. 817-834. Online publication date: 25-Feb-2020.
https://dl.acm.org/doi/10.5555/3388242.3388301

Eran H, Zeno L, Tork M, Malka G, Silberstein M, Dan T, Dahlia M (2019). NICA. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference, pp. 345-361. Online publication date: 10-Jul-2019.
https://dl.acm.org/doi/10.5555/3358807.3358838

Wang G, Li W, Zhang J, Ge Y, Fu Z, Zhang F, Wang Y, Zhang D (2019). sharedCharging. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 3:3, pp. 1-25. Online publication date: 9-Sep-2019.
https://dl.acm.org/doi/10.1145/3351266

Zhang W, Cui W, Fu K, Chen Q, Mawhirter D, Wu B, Li C, Guo M, Eigenmann R, Ding C, McKee S (2019). Laius. Proceedings of the ACM International Conference on Supercomputing, pp. 58-68. Online publication date: 26-Jun-2019.
https://dl.acm.org/doi/10.1145/3330345.3330351

Li Y, Shan C, Chen R, Tang X, Cai W, Tang S, Liu X, Wang G, Gong X, Zhang Y, Weissman J, Butt A, Smirni E (2019). GAugur. Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, pp. 231-242. Online publication date: 17-Jun-2019.
https://dl.acm.org/doi/10.1145/3307681.3325409

Zhang K, He B, Hu J, Wang Z, Hua B, Meng J, Yang L, Seshan S, Banerjee S (2018). G-net. Proceedings of the 15th USENIX Conference on Networked Systems Design and Implementation, pp. 187-200. Online publication date: 9-Apr-2018.
https://dl.acm.org/doi/10.5555/3307441.3307458
