research-article

Delta Counter: Bandwidth-Efficient Encryption Counter Representation for Secure GPU Memory

Authors:

Huiyang Zhou

IEEE Transactions on Dependable and Secure Computing, Volume 22, Issue 1

Pages 101 - 113

https://doi.org/10.1109/TDSC.2024.3389560

Published: 16 April 2024

Abstract

The security of GPUs has recently gained significant attention. In supporting secure memory for GPUs, the critical performance bottleneck is the memory bandwidth contention between regular data and security metadata (Yuan et al. 2021). With counter-mode encryption, the security metadata includes encryption counters, the Bonsai Merkle Tree (BMT), and message authentication codes (MACs). In this work, we focus on the encryption counters, given their impact on both counter and BMT traffic, and leverage prior schemes (Saileshwar et al. 2018; Taassori et al. 2018) to address the MAC traffic. We first analyze the characteristics of the encryption counters across a wide range of GPGPU benchmarks and make two key observations: (1) with the split counter scheme, the cache blocks in a large portion of the memory space, sometimes the entire GPU memory space, share the same major counter value; and (2) the differences among minor counters are limited. We then propose a novel scheme to reduce the encryption counter traffic. Our design includes (a) a highly compact counter representation and (b) a verification scheme to determine the correct minor counter values. Compared to prior work on reducing counter traffic (Na et al. 2021), our scheme handles more counter value patterns, as it does not require the counters in a memory chunk to be the same, and is more effective in reducing counter traffic.
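
The two observations above can be illustrated with a minimal Python sketch. It is not the paper's encoding, which the abstract does not specify. The 7-bit minor counter width, the `SplitCounterBlock` class, and the `pack_deltas` and `unpack_deltas` helpers are hypothetical and chosen only for illustration. The sketch shows a split counter (one shared major counter plus a small minor counter per cache block) and a generic base-plus-delta packing that works only when the minor counters lie close together.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MINOR_BITS = 7  # assumption: width of each minor counter, not taken from the paper


@dataclass
class SplitCounterBlock:
    """Hypothetical split counter: one major counter shared by many cache blocks."""
    major: int
    minors: List[int]

    def counter_for(self, i: int) -> int:
        # The per-block encryption counter is the major and minor values concatenated.
        return (self.major << MINOR_BITS) | self.minors[i]


def pack_deltas(minors: List[int], delta_bits: int) -> Optional[Tuple[int, List[int]]]:
    """Return (base, deltas) if every minor fits in delta_bits above the minimum."""
    base = min(minors)
    deltas = [m - base for m in minors]
    if max(deltas) >= (1 << delta_bits):
        return None  # spread too large; fall back to the uncompressed form
    return base, deltas


def unpack_deltas(base: int, deltas: List[int]) -> List[int]:
    """Recover the original minor counters from the packed form."""
    return [base + d for d in deltas]


if __name__ == "__main__":
    block = SplitCounterBlock(major=3, minors=[5, 6, 5, 7])
    packed = pack_deltas(block.minors, delta_bits=2)
    print(packed)
    print(unpack_deltas(*packed) == block.minors)
```

Under these assumptions, a group of 64 minor counters would take 64 × 7 = 448 bits uncompressed, against 7 + 64 × 2 = 135 bits as one base plus 2-bit deltas. The group size of 64 is also an assumption, and the saving applies only while the spread stays within the delta width.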

References

[1]

S. Yuan, A. W. B. Yudha, Y. Solihin, and H. Zhou, “Analyzing secure memory architecture for GPUs,” in Proc. IEEE Int. Symp. Perform. Anal. Syst. Softw., Stony Brook, NY, USA, 2021, pp. 59–69.

[2]

G. Saileshwar, P. J. Nair, P. Ramrakhyani, W. Elsasser, and M. K. Qureshi, “SYNERGY: Rethinking secure-memory design for error-correcting memories,” in Proc. IEEE Int. Symp. High Perform. Comput. Architecture, Vienna, Austria, 2018, pp. 454–465.

[3]

M. Taassori, A. Shafiee, and R. Balasubramonian, “VAULT: Reducing paging overheads in SGX with efficient integrity verification structures,” in Proc. 23rd Int. Conf. Architectural Support Program. Lang. Operating Syst., Williamsburg, VA, USA, X. Shen, J. Tuck, R. Bianchini, and V. Sarkar, Eds., 2018, pp. 665–678.

[4]

S. Na, S. Lee, Y. Kim, J. Park, and J. Huh, “Common counters: Compressed encryption counters for secure GPU memory,” in Proc. IEEE Int. Symp. High-Perform. Comput. Architecture, Seoul, South Korea, 2021, pp. 1–13.

[5]

J. K. Tugnait, “Detection of active eavesdropping attack by spoofing relay in multiple antenna systems,” IEEE Wireless Commun. Lett., vol. 5, no. 5, pp. 460–463, Oct. 2016.

[6]

J. A. Halderman et al., “Lest we remember: Cold boot attacks on encryption keys,” in Proc. 17th USENIX Secur. Symp., San Jose, CA, USA, 2008, pp. 45–60. [Online]. Available: http://www.usenix.org/events/sec08/tech/full_papers/halderman/halderman.pdf

[7]

O. Mutlu, “The rowhammer problem and other issues we may face as memory becomes denser,” 2017. [Online]. Available: http://arxiv.org/abs/1703.00626

[8]

S. Gueron, “Memory encryption for general-purpose processors,” IEEE Secur. Priv., vol. 14, no. 6, pp. 54–62, Nov./Dec. 2016.

[9]

D. Kaplan, J. Powell, and T. Woller, “AMD memory encryption,” 2016. [Online]. Available: https://www.amd.com/content/dam/amd/en/documents/epyc-business-docs/whitepapers/memory-encryption-white-paper.pdf

[10]

S. Yuan, Y. Solihin, and H. Zhou, “PSSM: Achieving secure memory for GPUs with partitioned and sectored security metadata,” in Proc. Int. Conf. Supercomput., H. Zhou, J. Moreira, F. Mueller, and Y. Etsion, Eds., 2021, pp. 139–151.

[11]

S. Yuan, A. Awad, A. W. B. Yudha, Y. Solihin, and H. Zhou, “Adaptive security support for heterogeneous memory on GPUs,” in Proc. IEEE Int. Symp. High-Perform. Comput. Architecture, Seoul, South Korea, 2022, pp. 213–228.

[12]

A. W. B. Yudha, J. Meyer, S. Yuan, H. Zhou, and Y. Solihin, “LITE: A low-cost practical inter-operable GPU TEE,” in Proc. Int. Conf. Supercomput., L. Rauchwerger, K. W. Cameron, D. S. Nikolopoulos, and D. N. Pnevmatikatos, Eds., 2022, pp. 7:1–7:13.

[13]

J. A. Stratton et al., “Parboil: A revised benchmark suite for scientific and commercial throughput computing,” Champaign, IL, USA, Tech. Rep., 2009. [Online]. Available: http://impact.crhc.illinois.edu/shared/Docs/impact-12-01.parboil.pdf

[14]

S. Che et al., “Rodinia: A benchmark suite for heterogeneous computing,” in Proc. IEEE Int. Symp. Workload Characterization, Austin, TX, USA, 2009, pp. 44–54.

[15]

S. Grauer-Gray and J. Cavazos, “Optimizing and auto-tuning belief propagation on the GPU,” in Proc. 23rd Int. Workshop Lang. Compilers Parallel Comput., Houston, TX, USA, K. D. Cooper, J. M. Mellor-Crummey, and V. Sarkar, Eds., USA: Springer, 2010, pp. 121–135.

[16]

S. Che, B. M. Beckmann, S. K. Reinhardt, and K. Skadron, “Pannotia: Understanding irregular GPGPU graph applications,” in Proc. IEEE Int. Symp. Workload Characterization, Portland, OR, USA, 2013, pp. 185–195.

[17]

D. Lie et al., “Architectural support for copy and tamper resistant software,” in Proc. 9th Int. Conf. Architectural Support Program. Lang. Operating Syst., Cambridge, MA, USA, 2000, pp. 168–177.

[18]

I. Jang, A. Tang, T. Kim, S. Sethumadhavan, and J. Huh, “Heterogeneous isolated execution for commodity GPUs,” in Proc. 24th Int. Conf. Architectural Support Program. Lang. Operating Syst., Providence, RI, USA, I. Bahar, M. Herlihy, E. Witchel, and A. R. Lebeck, Eds., 2019, pp. 455–468.

[19]

S. Volos, K. Vaswani, and R. Bruno, “Graviton: Trusted execution environments on GPUs,” in Proc. 13th USENIX Symp. Operating Syst. Des. Implementation, Carlsbad, CA, USA, A. C. Arpaci-Dusseau and G. Voelker, Eds., 2018, pp. 681–696. [Online]. Available: https://www.usenix.org/conference/osdi18/presentation/volos

[20]

Z. H. Jiang, Y. Fei, and D. R. Kaeli, “A complete key recovery timing attack on a GPU,” in Proc. IEEE Int. Symp. High Perform. Comput. Architecture, Barcelona, Spain, 2016, pp. 394–405.

[21]

H. Naghibijouybari, A. Neupane, Z. Qian, and N. B. Abu-Ghazaleh, “Rendered insecure: GPU side channel attacks are practical,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., Toronto, ON, Canada, 2018, pp. 2139–2153.

[22]

Y. Gao, H. Zhang, W. Cheng, Y. Zhou, and Y. Cao, “Electro-magnetic analysis of GPU-based AES implementation,” in Proc. 55th Annu. Des. Autom. Conf., San Francisco, CA, USA, 2018, pp. 121:1–121:6.

[23]

G. E. Suh, D. E. Clarke, B. Gassend, M. van Dijk, and S. Devadas, “AEGIS: Architecture for tamper-evident and tamper-resistant processing,” in Proc. 17th Annu. Int. Conf. Supercomput., San Francisco, CA, USA, U. Banerjee, K. A. Gallivan, and A. González, Eds., 2003, pp. 160–171.

[24]

C. Yan et al., “Improving cost, performance, and security of memory encryption and authentication,” in Proc. 33rd Int. Symp. Comput. Architecture, Boston, MA, USA, 2006, pp. 179–190.

[25]

S. Chhabra, B. Rogers, Y. Solihin, and M. Prvulovic, “Making secure processors OS- and performance-friendly,” ACM Trans. Archit. Code Optim., vol. 5, no. 4, pp. 16:1–16:35, 2009.

[26]

B. Rogers, S. Chhabra, M. Prvulovic, and Y. Solihin, “Using address independent seed encryption and bonsai merkle trees to make secure processors OS- and performance-friendly,” in Proc. 40th Annu. IEEE/ACM Int. Symp. Microarchitecture, Chicago, IL, USA, 2007, pp. 183–196.

[27]

A. Freij, S. Yuan, H. Zhou, and Y. Solihin, “Persist level parallelism: Streamlining integrity tree updates for secure persistent memory,” in Proc. 53rd Annu. IEEE/ACM Int. Symp. Microarchitecture, Athens, Greece, 2020, pp. 14–27.

[28]

R. Abdullah, H. Zhou, and A. Awad, “Plutus: Bandwidth-efficient memory security for GPUs,” in Proc. IEEE Int. Symp. High-Perform. Comput. Architecture, Montreal, QC, Canada, 2023, pp. 543–555.

[29]

A. Hidayat, “FastLZ,” Jun. 2019. [Online]. Available: https://github.com/ariya/FastLZ

[30]

G. Panwar et al., “Translation-optimized memory compression for capacity,” in Proc. 55th IEEE/ACM Int. Symp. Microarchitecture, Chicago, IL, USA, 2022, pp. 992–1011.

[31]

M. Khairy, Z. Shen, T. M. Aamodt, and T. G. Rogers, “Accel-sim: An extensible simulation framework for validated GPU modeling,” in Proc. 47th ACM/IEEE Annu. Int. Symp. Comput. Architecture, Valencia, Spain, 2020, pp. 473–486.

[32]

Nvidia Turing Architecture, 2018. [Online]. Available: https://images.nvidia.com/aem-dam/enzz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf

[33]

B. Gassend, G. E. Suh, D. E. Clarke, M. van Dijk, and S. Devadas, “Caches and hash trees for efficient memory integrity verification,” in Proc. 9th Int. Symp. High-Perform. Comput. Architecture, Anaheim, CA, USA, 2003, pp. 295–306.

[34]

A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, “Analyzing CUDA workloads using a detailed GPU simulator,” in Proc. IEEE Int. Symp. Perform. Anal. Syst. Softw., 2009, pp. 163–174.

Index Terms

Delta Counter: Bandwidth-Efficient Encryption Counter Representation for Secure GPU Memory
1. Networks
  1. Network properties
    1. Network security
      1. Security protocols
2. Security and privacy

Index terms have been assigned to the content through auto-classification.

Published In

IEEE Transactions on Dependable and Secure Computing Volume 22, Issue 1

Jan.-Feb. 2025

844 pages

1545-5971 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Publisher

IEEE Computer Society Press

Washington, DC, United States
