Delta Counter: Bandwidth-Efficient Encryption Counter Representation for Secure GPU Memory

Published: 16 April 2024

Abstract

The security of GPUs has recently gained significant attention. In supporting secure memory for GPUs, the critical performance bottleneck is the memory bandwidth contention between regular data and security metadata (Yuan et al. 2021). With counter-mode encryption, the security metadata includes encryption counters, the Bonsai Merkle Tree (BMT), and message authentication codes (MACs). In this work, we focus on the encryption counters, since they drive both the counter traffic and the BMT traffic, and we leverage prior schemes (Saileshwar et al. 2018; Taassori et al. 2018) to address the MAC traffic. We first analyze the characteristics of the encryption counters across a wide range of GPGPU benchmarks and make two key observations. (1) With the split counter scheme, the cache blocks in a large portion of the memory space, sometimes the entire GPU memory space, share the same major counter value. (2) The differences among minor counters are limited. We then propose a novel scheme to reduce the encryption counter traffic. Our design includes (a) a highly compact counter representation and (b) a verification scheme to determine the correct minor counter values. Compared to prior work on reducing counter traffic (Na et al. 2021), our scheme handles more counter value patterns, because it does not require the counters in a memory chunk to be identical, and it is more effective in reducing counter traffic.
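The two observations can be illustrated with a small sketch. It is not the encoding proposed in the paper, which the abstract does not specify; it only shows, under assumptions, why a shared major counter and a small spread among minor counters make a compact representation possible. The names `DeltaChunk`, `encode_chunk`, and `decode_chunk` are hypothetical, and the base-plus-offset layout is an illustrative choice.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DeltaChunk:
    """Hypothetical compact form for one chunk of split counters."""
    major: int         # major counter shared by every block in the chunk
    base_minor: int    # smallest minor counter in the chunk
    delta_bits: int    # bits needed to store each per-block offset
    deltas: List[int]  # per-block offset: minor - base_minor


def encode_chunk(counters: List[Tuple[int, int]]) -> Optional[DeltaChunk]:
    """Pack (major, minor) pairs; return None if the majors are not all equal."""
    if not counters:
        return None
    majors = {major for major, _ in counters}
    if len(majors) != 1:
        return None  # observation (1) does not hold for this chunk
    minors = [minor for _, minor in counters]
    base = min(minors)
    deltas = [m - base for m in minors]
    # Observation (2): a small spread among minors means few bits per offset.
    delta_bits = max(deltas).bit_length()
    return DeltaChunk(majors.pop(), base, delta_bits, deltas)


def decode_chunk(chunk: DeltaChunk) -> List[Tuple[int, int]]:
    """Rebuild the original (major, minor) pairs from the compact form."""
    return [(chunk.major, chunk.base_minor + d) for d in chunk.deltas]


if __name__ == "__main__":
    counters = [(3, 10), (3, 11), (3, 10), (3, 12)]
    chunk = encode_chunk(counters)
    assert chunk is not None and decode_chunk(chunk) == counters
    print(chunk.delta_bits, chunk.deltas)
```

In this sketch the minor counters need not be identical within a chunk; only their spread determines the per-block cost, which is the property the abstract contrasts with schemes that require equal counters.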

References

[1]
S. Yuan, A. W. B. Yudha, Y. Solihin, and H. Zhou, “Analyzing secure memory architecture for GPUs,” in Proc. IEEE Int. Symp. Perform. Anal. Syst. Softw., Stony Brook, NY, USA, 2021, pp. 59–69.
[2]
G. Saileshwar, P. J. Nair, P. Ramrakhyani, W. Elsasser, and M. K. Qureshi, “SYNERGY: Rethinking secure-memory design for error-correcting memories,” in Proc. IEEE Int. Symp. High Perform. Comput. Architecture, Vienna, Austria, 2018, pp. 454–465.
[3]
M. Taassori, A. Shafiee, and R. Balasubramonian, “VAULT: Reducing paging overheads in SGX with efficient integrity verification structures,” in Proc. 23rd Int. Conf. Architectural Support Program. Lang. Operating Syst., Williamsburg, VA, USA, X. Shen, J. Tuck, R. Bianchini, and V. Sarkar, Eds., 2018, pp. 665–678.
[4]
S. Na, S. Lee, Y. Kim, J. Park, and J. Huh, “Common counters: Compressed encryption counters for secure GPU memory,” in Proc. IEEE Int. Symp. High-Perform. Comput. Architecture, Seoul, South Korea, 2021, pp. 1–13.
[5]
J. K. Tugnait, “Detection of active eavesdropping attack by spoofing relay in multiple antenna systems,” IEEE Wireless Commun. Lett., vol. 5, no. 5, pp. 460–463, Oct. 2016.
[6]
J. A. Halderman et al., “Lest we remember: Cold boot attacks on encryption keys,” in Proc. 17th USENIX Secur. Symp., San Jose, CA, USA, 2008, pp. 45–60. [Online]. Available: http://www.usenix.org/events/sec08/tech/full_papers/halderman/halderman.pdf
[7]
O. Mutlu, “The rowhammer problem and other issues we may face as memory becomes denser,” 2017. [Online]. Available: http://arxiv.org/abs/1703.00626
[8]
S. Gueron, “Memory encryption for general-purpose processors,” IEEE Secur. Priv., vol. 14, no. 6, pp. 54–62, Nov./Dec. 2016.
[9]
D. Kaplan, J. Powell, and T. Woller, “AMD memory encryption,” 2016. [Online]. Available: https://www.amd.com/content/dam/amd/en/documents/epyc-business-docs/whitepapers/memory-encryption-white-paper.pdf
[10]
S. Yuan, Y. Solihin, and H. Zhou, “PSSM: Achieving secure memory for GPUs with partitioned and sectored security metadata,” in Proc. Int. Conf. Supercomput., H. Zhou, J. Moreira, F. Mueller, and Y. Etsion, Eds., 2021, pp. 139–151.
[11]
S. Yuan, A. Awad, A. W. B. Yudha, Y. Solihin, and H. Zhou, “Adaptive security support for heterogeneous memory on GPUs,” in Proc. IEEE Int. Symp. High-Perform. Comput. Architecture, Seoul, South Korea, 2022, pp. 213–228.
[12]
A. W. B. Yudha, J. Meyer, S. Yuan, H. Zhou, and Y. Solihin, “LITE: A low-cost practical inter-operable GPU TEE,” in Proc. Int. Conf. Supercomput., L. Rauchwerger, K. W. Cameron, D. S. Nikolopoulos, and D. N. Pnevmatikatos, Eds., 2022, pp. 7:1–7:13.
[13]
J. A. Stratton et al., “Parboil: A revised benchmark suite for scientific and commercial throughput computing,” Champaign, IL, USA, Tech. Rep., 2009. [Online]. Available: http://impact.crhc.illinois.edu/shared/Docs/impact-12-01.parboil.pdf
[14]
S. Che et al., “Rodinia: A benchmark suite for heterogeneous computing,” in Proc. IEEE Int. Symp. Workload Characterization, Austin, TX, USA, 2009, pp. 44–54.
[15]
S. Grauer-Gray and J. Cavazos, “Optimizing and auto-tuning belief propagation on the GPU,” in Proc. 23rd Int. Workshop Lang. Compilers Parallel Comput., Houston, TX, USA, K. D. Cooper, J. M. Mellor-Crummey, and V. Sarkar, Eds., USA: Springer, 2010, pp. 121–135.
[16]
S. Che, B. M. Beckmann, S. K. Reinhardt, and K. Skadron, “Pannotia: Understanding irregular GPGPU graph applications,” in Proc. IEEE Int. Symp. Workload Characterization, Portland, OR, USA, 2013, pp. 185–195.
[17]
D. Lie et al., “Architectural support for copy and tamper resistant software,” in Proc. 9th Int. Conf. Architectural Support Program. Lang. Operating Syst., Cambridge, MA, USA, 2000, pp. 168–177.
[18]
I. Jang, A. Tang, T. Kim, S. Sethumadhavan, and J. Huh, “Heterogeneous isolated execution for commodity GPUs,” in Proc. 24th Int. Conf. Architectural Support Program. Lang. Operating Syst., Providence, RI, USA, I. Bahar, M. Herlihy, E. Witchel, and A. R. Lebeck, Eds., 2019, pp. 455–468.
[19]
S. Volos, K. Vaswani, and R. Bruno, “Graviton: Trusted execution environments on GPUs,” in Proc. 13th USENIX Symp. Operating Syst. Des. Implementation, Carlsbad, CA, USA, A. C. Arpaci-Dusseau and G. Voelker, Eds., 2018, pp. 681–696. [Online]. Available: https://www.usenix.org/conference/osdi18/presentation/volos
[20]
Z. H. Jiang, Y. Fei, and D. R. Kaeli, “A complete key recovery timing attack on a GPU,” in Proc. IEEE Int. Symp. High Perform. Comput. Architecture, Barcelona, Spain, 2016, pp. 394–405.
[21]
H. Naghibijouybari, A. Neupane, Z. Qian, and N. B. Abu-Ghazaleh, “Rendered insecure: GPU side channel attacks are practical,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., Toronto, ON, Canada, 2018, pp. 2139–2153.
[22]
Y. Gao, H. Zhang, W. Cheng, Y. Zhou, and Y. Cao, “Electro-magnetic analysis of GPU-based AES implementation,” in Proc. 55th Annu. Des. Autom. Conf., San Francisco, CA, USA, 2018, pp. 121:1–121:6.
[23]
G. E. Suh, D. E. Clarke, B. Gassend, M. van Dijk, and S. Devadas, “AEGIS: Architecture for tamper-evident and tamper-resistant processing,” in Proc. 17th Annu. Int. Conf. Supercomput., San Francisco, CA, USA, U. Banerjee, K. A. Gallivan, and A. González, Eds., 2003, pp. 160–171.
[24]
C. Yan et al., “Improving cost, performance, and security of memory encryption and authentication,” in Proc. 33rd Int. Symp. Comput. Architecture, Boston, MA, USA, 2006, pp. 179–190.
[25]
S. Chhabra, B. Rogers, Y. Solihin, and M. Prvulovic, “Making secure processors OS- and performance-friendly,” ACM Trans. Archit. Code Optim., vol. 5, no. 4, pp. 16:1–16:35, 2009.
[26]
B. Rogers, S. Chhabra, M. Prvulovic, and Y. Solihin, “Using address independent seed encryption and bonsai merkle trees to make secure processors OS- and performance-friendly,” in Proc. 40th Annu. IEEE/ACM Int. Symp. Microarchitecture, Chicago, IL, USA, 2007, pp. 183–196.
[27]
A. Freij, S. Yuan, H. Zhou, and Y. Solihin, “Persist level parallelism: Streamlining integrity tree updates for secure persistent memory,” in Proc. 53rd Annu. IEEE/ACM Int. Symp. Microarchitecture, Athens, Greece, 2020, pp. 14–27.
[28]
R. Abdullah, H. Zhou, and A. Awad, “Plutus: Bandwidth-efficient memory security for GPUs,” in Proc. IEEE Int. Symp. High-Perform. Comput. Architecture, Montreal, QC, Canada, 2023, pp. 543–555.
[29]
A. Hidayat, “FastLZ,” Jun. 2019. [Online]. Available: https://github.com/ariya/FastLZ
[30]
G. Panwar et al., “Translation-optimized memory compression for capacity,” in Proc. 55th IEEE/ACM Int. Symp. Microarchitecture, Chicago, IL, USA, 2022, pp. 992–1011.
[31]
M. Khairy, Z. Shen, T. M. Aamodt, and T. G. Rogers, “Accel-sim: An extensible simulation framework for validated GPU modeling,” in Proc. 47th ACM/IEEE Annu. Int. Symp. Comput. Architecture, Valencia, Spain, 2020, pp. 473–486.
[33]
B. Gassend, G. E. Suh, D. E. Clarke, M. van Dijk, and S. Devadas, “Caches and hash trees for efficient memory integrity verification,” in Proc. 9th Int. Symp. High-Perform. Comput. Architecture, Anaheim, CA, USA, 2003, pp. 295–306.
[34]
A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, “Analyzing CUDA workloads using a detailed GPU simulator,” in Proc. IEEE Int. Symp. Perform. Anal. Syst. Softw., 2009, pp. 163–174.

Published In

IEEE Transactions on Dependable and Secure Computing, Volume 22, Issue 1
Jan.-Feb. 2025
844 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States
