Article

Gdev: first-class GPU resource management in the operating system

Authors:

Michael McThrow,

Carlos Maltzahn,

Scott Brandt

USENIX ATC'12: Proceedings of the 2012 USENIX conference on Annual Technical Conference

Page 37

Published: 13 June 2012

Abstract

Graphics processing units (GPUs) have become a very powerful platform embracing a concept of heterogeneous many-core computing. However, application domains of GPUs are currently limited to specific systems, largely due to a lack of "first-class" GPU resource management for general-purpose multi-tasking systems.

We present Gdev, a new ecosystem of GPU resource management in the operating system (OS). It allows the user space as well as the OS itself to use GPUs as first-class computing resources. Specifically, Gdev's virtual memory manager supports data swapping for excessive memory resource demands, and also provides a shared device memory functionality that allows GPU contexts to communicate with other contexts. Gdev further provides a GPU scheduling scheme to virtualize a physical GPU into multiple logical GPUs, enhancing isolation among working sets of multi-tasking systems.

Our evaluation conducted on Linux and the NVIDIA GPU shows that the basic performance of our prototype implementation is reliable even compared to proprietary software. Further detailed experiments demonstrate that Gdev achieves a 2x speedup for an encrypted file system using the GPU in the OS. Gdev can also improve the makespan of dataflow programs by up to 49% exploiting shared device memory, while an error in the utilization of virtualized GPUs can be limited within only 7%.
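To make the idea of virtualizing one physical GPU into several logical GPUs more concrete, the following is a minimal sketch of share-based, non-preemptive scheduling. It is not Gdev's scheduler, which the abstract does not describe in detail. The `LogicalGPU` class, the fixed kernel length, and the "pick the logical GPU furthest below its share" policy are all assumptions made for illustration. The sketch also shows one plausible way to compute a per-logical-GPU utilization error, the kind of quantity the abstract reports as staying within 7%.

```python
from dataclasses import dataclass


@dataclass
class LogicalGPU:
    """Hypothetical logical GPU: a named share of one physical GPU."""
    name: str
    share: float        # fraction of physical GPU time this logical GPU is entitled to
    used: float = 0.0   # GPU time consumed so far (arbitrary time units)


def pick_next(vgpus):
    """Choose the logical GPU that is furthest below its entitled share."""
    return min(vgpus, key=lambda g: g.used / g.share)


def simulate(vgpus, kernel_time, total_time):
    """Run fixed-length, non-preemptive kernels until total_time is used up."""
    elapsed = 0.0
    while elapsed + kernel_time <= total_time:
        gpu = pick_next(vgpus)
        gpu.used += kernel_time   # the kernel runs to completion on the physical GPU
        elapsed += kernel_time
    return elapsed


def utilization_errors(vgpus, elapsed):
    """Absolute gap between achieved and entitled utilization, per logical GPU."""
    return {g.name: abs(g.used / elapsed - g.share) for g in vgpus}


if __name__ == "__main__":
    gpus = [
        LogicalGPU("vgpu0", 0.5),
        LogicalGPU("vgpu1", 0.3),
        LogicalGPU("vgpu2", 0.2),
    ]
    elapsed = simulate(gpus, kernel_time=1.0, total_time=100.0)
    for name, err in utilization_errors(gpus, elapsed).items():
        print(f"{name}: utilization error {err:.3f}")
```

In this toy model, longer kernels relative to the observation window make the achieved utilization drift further from the entitled share, because a running kernel cannot be preempted.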

Published In

USENIX ATC'12: Proceedings of the 2012 USENIX conference on Annual Technical Conference

June 2012

41 pages

Publisher

USENIX Association

United States
