research-article

Heterogeneous Isolated Execution for Commodity GPUs

Authors:

Simha Sethumadhavan,

Jaehyuk Huh

ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 455 - 468

https://doi.org/10.1145/3297858.3304021

Published: 04 April 2019

Abstract

Traditional CPUs and cloud systems based on them have embraced hardware-based trusted execution environments to securely isolate computation from malicious OS or hardware attacks. However, GPUs and their cloud deployments have yet to include such support for hardware-based trusted computing. As large amounts of sensitive data are offloaded to GPUs for acceleration in cloud environments, ensuring the security of the data is a current and pressing need. As deployed today, the outsourced GPU model is vulnerable to attacks from compromised privileged software. To support isolated remote execution on GPUs even under vulnerable operating systems, this paper proposes a novel hardware and software architecture, called HIX (Heterogeneous Isolated eXecution). HIX does not require modifications to the GPU architecture to offer protections. Instead, it offers security by modifying the I/O interconnect between the CPU and GPU, and by refactoring the GPU device driver to work from within the CPU trusted environment. A result of the architectural choices behind HIX is that the concept can be applied to other offload accelerators besides GPUs. This work implements the proposed HIX architecture on an emulated machine with KVM and QEMU. Experimental results from the emulated security support with a real GPU show that the performance overhead for security is limited to 26% on average for the Rodinia benchmark suite, while providing secure isolated GPU computing.

Index Terms

Heterogeneous Isolated Execution for Commodity GPUs
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
2. Security and privacy
  1. Systems security
    1. Operating systems security
      1. Trusted computing

Published In

ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems

April 2019

1126 pages

ISBN:9781450362405

DOI:10.1145/3297858

General Chairs: Iris Bahar (Brown University), Maurice Herlihy (Brown University)

Program Chairs: Emmett Witchel (University of Texas, Austin), Alvin Lebeck (Duke University)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ASPLOS '19: Architectural Support for Programming Languages and Operating Systems

April 13 - 17, 2019

Providence, RI, USA

Acceptance Rates

ASPLOS '19 Paper Acceptance Rate 74 of 351 submissions, 21%;

Overall Acceptance Rate 535 of 2,713 submissions, 20%
