research-article

Crane: fast and migratable GPU passthrough for OpenCL applications

Authors:

Eyal de Lara

SYSTOR '17: Proceedings of the 10th ACM International Systems and Storage Conference

Article No.: 11, Pages 1 - 13

https://doi.org/10.1145/3078468.3078478

Published: 22 May 2017

Abstract

General-purpose GPU (GPGPU) computing in virtualized environments relies on PCI passthrough to achieve GPU performance comparable to bare-metal execution. However, GPU passthrough prevents service administrators from migrating virtual machines between physical hosts.

Crane is a new technique for virtualizing OpenCL-based GPGPU computing that achieves performance within 5.25% of GPU passthrough while supporting VM migration. Crane interposes a virtualization-aware OpenCL library that makes it possible to reclaim physical GPUs from a VM, and later reassign them, without terminating the guest or its applications. Crane also keeps the GPU available to applications while the VM undergoes live migration by transparently switching between GPU passthrough and API remoting.
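
The core idea in the abstract is that applications keep using the same OpenCL objects while the physical GPU underneath is reclaimed or replaced by API remoting. The Python sketch below illustrates that idea under stated assumptions; it is not Crane's implementation. The `Backend` and `Shim` classes, their methods, and the save-and-recreate strategy in `switch_backend` are hypothetical. The two mock backends stand in for a real OpenCL runtime reached through passthrough and for a remoting channel to a host-side runtime.

```python
from typing import Dict, List


class Backend:
    """Hypothetical device backend. A real shim would wrap an OpenCL runtime
    (passthrough) or a channel to a host-side runtime (API remoting)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.calls = 0                      # operations served by this backend
        self._next = 0
        self._mem: Dict[int, List[float]] = {}

    def alloc(self, size: int) -> int:
        self.calls += 1
        handle = self._next
        self._next += 1
        self._mem[handle] = [0.0] * size
        return handle

    def write(self, handle: int, data: List[float]) -> None:
        self.calls += 1
        self._mem[handle] = list(data)

    def read(self, handle: int) -> List[float]:
        self.calls += 1
        return list(self._mem[handle])

    def scale(self, handle: int, factor: float) -> None:
        # Stands in for enqueuing a kernel on the device.
        self.calls += 1
        self._mem[handle] = [x * factor for x in self._mem[handle]]


class Shim:
    """Interposed-library sketch: the application only ever holds virtual
    handles, so the backend underneath can be swapped at run time."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend
        self._map: Dict[int, int] = {}      # virtual handle -> backend handle
        self._sizes: Dict[int, int] = {}
        self._next = 0

    def create_buffer(self, size: int) -> int:
        vhandle = self._next
        self._next += 1
        self._map[vhandle] = self.backend.alloc(size)
        self._sizes[vhandle] = size
        return vhandle

    def write(self, vhandle: int, data: List[float]) -> None:
        self.backend.write(self._map[vhandle], data)

    def read(self, vhandle: int) -> List[float]:
        return self.backend.read(self._map[vhandle])

    def scale(self, vhandle: int, factor: float) -> None:
        self.backend.scale(self._map[vhandle], factor)

    def switch_backend(self, new: Backend) -> None:
        # Save every buffer through the old backend, recreate it on the new
        # one, and remap the virtual handle; the application sees no change.
        saved = {v: self.backend.read(h) for v, h in self._map.items()}
        for vhandle, data in saved.items():
            handle = new.alloc(self._sizes[vhandle])
            new.write(handle, data)
            self._map[vhandle] = handle
        self.backend = new


if __name__ == "__main__":
    shim = Shim(Backend("passthrough"))
    buf = shim.create_buffer(3)
    shim.write(buf, [1.0, 2.0, 3.0])
    shim.scale(buf, 2.0)                      # served by the passthrough GPU
    shim.switch_backend(Backend("remoting"))  # e.g. GPU reclaimed for migration
    shim.scale(buf, 10.0)                     # same handle, now served remotely
    print(shim.read(buf))                     # [20.0, 40.0, 60.0]
```

The design choice this sketch highlights is the indirection: because the application never sees backend handles, the backend can be swapped without the application's cooperation, at the price of one table lookup per call.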

Cited By

J. Kang and H. Yu. GPGPU Task Scheduling Technique for Reducing the Performance Deviation of Multiple GPGPU Tasks in RPC-Based GPU Virtualization Environments. Symmetry, 13(3):508, 2021. Online publication date: 20 Mar 2021. https://doi.org/10.3390/sym13030508

G. Chen, J. Zhang, Z. Zhu, Q. Jiang, H. Jiang, and C. Pang. CRState: checkpoint/restart of OpenCL program for in-kernel applications. The Journal of Supercomputing, 2020. Online publication date: 6 Nov 2020. https://doi.org/10.1007/s11227-020-03460-2

J. Kang, J. Lim, and H. Yu. Partial migration technique for GPGPU tasks to Prevent GPU Memory Starvation in RPC-based GPU Virtualization. Software: Practice and Experience, 50(6):948--972, 2020. Online publication date: 11 Feb 2020. https://doi.org/10.1002/spe.2801

Published In

SYSTOR '17: Proceedings of the 10th ACM International Systems and Storage Conference

May 2017

195 pages

ISBN:9781450350358

DOI:10.1145/3078468

General Chair: Doron Chen, IBM Research

Program Chairs: Peter Desnoyers, Northeastern University; Eyal de Lara, University of Toronto

Copyright © 2017 ACM.

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

In-Cooperation

TCE: Technion Computer Engineering Center
USENIX: USENIX Association

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Natural Sciences and Engineering Research Council of Canada

Conference

SYSTOR '17: International Systems and Storage Conference

May 22 - 24, 2017

Haifa, Israel

Acceptance Rates

Overall Acceptance Rate 108 of 323 submissions, 33%

Bibliometrics

Total citations: 3
Total downloads: 208
Downloads (last 12 months): 7
Downloads (last 6 weeks): 1

Reflects downloads up to 26 Sep 2024