DOI: 10.1145/3078468.3078478
research-article

Crane: fast and migratable GPU passthrough for OpenCL applications

Published: 22 May 2017

Abstract

General-purpose GPU (GPGPU) computing in virtualized environments leverages PCI passthrough to achieve GPU performance comparable to bare-metal execution. However, GPU passthrough prevents service administrators from migrating virtual machines between physical hosts.
Crane is a new technique for virtualizing OpenCL-based GPGPU computing that achieves within 5.25% of passthrough GPU performance while supporting VM migration. Crane interposes a virtualization-aware OpenCL library that makes it possible to reclaim and subsequently reassign physical GPUs to a VM without terminating the guest or its applications. Crane also enables continued GPU operation while the VM is undergoing live migration by transparently switching between GPU passthrough operation and API remoting.
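The abstract describes an interposed OpenCL library that keeps applications running while the physical GPU behind them is reclaimed, reassigned, or temporarily replaced by API remoting during live migration. The Python sketch below illustrates the general idea of such an interposition layer under simplifying assumptions. It is not Crane's implementation: the classes `Backend` and `Shim`, their method names, and the toy increment "kernel" are hypothetical. Buffers are modeled as byte arrays; compiled programs, command queues, and in-flight kernels, which a real system would also have to handle, are omitted.

```python
"""Hypothetical sketch of a backend-switching shim in the spirit of an
interposed OpenCL library. Backend, Shim, and run_increment are
illustrative names, not Crane's actual API."""


class Backend:
    """Toy device backend: buffers are bytearrays keyed by backend-local ids."""

    def __init__(self, name):
        self.name = name  # e.g. a passthrough GPU or an API-remoting endpoint
        self._buffers = {}
        self._next_id = 0

    def create_buffer(self, size):
        bid = self._next_id
        self._next_id += 1
        self._buffers[bid] = bytearray(size)
        return bid

    def write(self, bid, offset, data):
        self._buffers[bid][offset:offset + len(data)] = data

    def read(self, bid):
        return bytes(self._buffers[bid])


class Shim:
    """Application-facing layer: hands out virtual handles, never backend ids."""

    def __init__(self, backend):
        self.backend = backend
        self._handles = {}  # virtual handle -> backend-local buffer id
        self._next_handle = 0

    def create_buffer(self, size):
        h = self._next_handle
        self._next_handle += 1
        self._handles[h] = self.backend.create_buffer(size)
        return h

    def write(self, h, data, offset=0):
        self.backend.write(self._handles[h], offset, data)

    def read(self, h):
        return self.backend.read(self._handles[h])

    def run_increment(self, h):
        # Toy "kernel" (hypothetical): add 1 mod 256 to every byte of the
        # buffer, using whichever backend is currently active.
        self.write(h, bytes((b + 1) % 256 for b in self.read(h)))

    def switch_to(self, new_backend):
        # Recreate each buffer on the new backend, copy its contents, and
        # remap the virtual handles so the application's handles stay valid.
        remapped = {}
        for h, bid in self._handles.items():
            contents = self.backend.read(bid)
            new_id = new_backend.create_buffer(len(contents))
            new_backend.write(new_id, 0, contents)
            remapped[h] = new_id
        self._handles = remapped
        self.backend = new_backend


# Usage: passthrough GPU -> API remoting (during migration) -> new passthrough GPU.
shim = Shim(Backend("passthrough-gpu-A"))
buf = shim.create_buffer(4)
shim.write(buf, b"\x00\x01\x02\x03")
shim.run_increment(buf)
shim.switch_to(Backend("api-remoting"))
shim.run_increment(buf)
shim.switch_to(Backend("passthrough-gpu-B"))
print(shim.read(buf), shim.backend.name)  # b'\x02\x03\x04\x05' passthrough-gpu-B
```

The design choice this sketch highlights is handle indirection: because the application only ever holds virtual handles, the shim can rebuild device state on a different backend and remap those handles without the application observing the switch.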

Cited By

  • (2021) GPGPU Task Scheduling Technique for Reducing the Performance Deviation of Multiple GPGPU Tasks in RPC-Based GPU Virtualization Environments. Symmetry, 13(3):508. DOI: 10.3390/sym13030508. Online publication date: 20-Mar-2021.
  • (2020) CRState: checkpoint/restart of OpenCL program for in-kernel applications. The Journal of Supercomputing. DOI: 10.1007/s11227-020-03460-2. Online publication date: 6-Nov-2020.
  • (2020) Partial migration technique for GPGPU tasks to Prevent GPU Memory Starvation in RPC-based GPU Virtualization. Software: Practice and Experience, 50(6):948-972. DOI: 10.1002/spe.2801. Online publication date: 11-Feb-2020.


Published In

ACM Conferences
SYSTOR '17: Proceedings of the 10th ACM International Systems and Storage Conference
May 2017
195 pages
ISBN: 9781450350358
DOI: 10.1145/3078468
In-Cooperation

  • TCE: Technion Computer Engineering Center
  • USENIX Association

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPGPU
  2. GPU
  3. OpenCL
  4. live migration
  5. passthrough
  6. virtualization

Conference

SYSTOR '17

Acceptance Rates

Overall Acceptance Rate 108 of 323 submissions, 33%

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 1
Reflects downloads up to 26 Sep 2024
