research-article
Open access

SPX64: A Scratchpad Memory for General-purpose Microprocessors

Published: 30 December 2020

Abstract

General-purpose computing systems employ memory hierarchies to provide the appearance of a single large, fast, coherent memory. In special-purpose CPUs, programmers instead manually manage distinct, non-coherent scratchpad memories. In this article, we combine these mechanisms by adding a virtually addressed, set-associative scratchpad to a general-purpose CPU. Our scratchpad exists alongside a traditional cache and avoids many of the programming challenges associated with traditional scratchpads without sacrificing generality (e.g., virtualization). Furthermore, our design improves security and performance, especially for workloads that have high locality or that interact with nonvolatile memory.
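To make the phrase "virtually addressed, set-associative" concrete, the following is a minimal software model of a generic set-associative store indexed by virtual address. It is a sketch under stated assumptions, not the SPX64 design: the 64-byte line size, 64 sets, and 8 ways are hypothetical; the `asid` field is an assumed address-space identifier (a common way for virtually addressed structures to tell processes apart); and the names `split_vaddr` and `SetAssocScratchpad` are invented for illustration. Address translation, interaction with the adjacent cache, and replacement policy are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical geometry for illustration only; not taken from SPX64.
LINE_BYTES = 64   # bytes per line -> 6 offset bits
NUM_SETS = 64     # number of sets -> 6 index bits
NUM_WAYS = 8      # lines (ways) per set

OFFSET_BITS = LINE_BYTES.bit_length() - 1
INDEX_BITS = NUM_SETS.bit_length() - 1


def split_vaddr(vaddr: int) -> tuple[int, int, int]:
    """Split a virtual address into (tag, set index, byte offset)."""
    offset = vaddr & (LINE_BYTES - 1)
    index = (vaddr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = vaddr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


@dataclass
class Way:
    valid: bool = False
    asid: int = 0  # hypothetical address-space ID stored beside the tag
    tag: int = 0
    data: bytearray = field(default_factory=lambda: bytearray(LINE_BYTES))


class SetAssocScratchpad:
    """Toy virtually addressed, set-associative store (software model)."""

    def __init__(self) -> None:
        self.sets = [[Way() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]

    def _find(self, asid: int, vaddr: int) -> Optional[Way]:
        # Only the ways of the one selected set are searched.
        tag, index, _ = split_vaddr(vaddr)
        for way in self.sets[index]:
            if way.valid and way.asid == asid and way.tag == tag:
                return way
        return None

    def install(self, asid: int, vaddr: int) -> bool:
        """Claim a free way for the line holding vaddr; False if the set is full."""
        if self._find(asid, vaddr) is not None:
            return True
        tag, index, _ = split_vaddr(vaddr)
        for way in self.sets[index]:
            if not way.valid:
                way.valid, way.asid, way.tag = True, asid, tag
                way.data = bytearray(LINE_BYTES)
                return True
        return False  # this toy model never evicts silently

    def load_byte(self, asid: int, vaddr: int) -> Optional[int]:
        way = self._find(asid, vaddr)
        if way is None:
            return None  # miss: this toy model leaves the fallback to the caller
        return way.data[split_vaddr(vaddr)[2]]

    def store_byte(self, asid: int, vaddr: int, value: int) -> bool:
        way = self._find(asid, vaddr)
        if way is None:
            return False
        way.data[split_vaddr(vaddr)[2]] = value & 0xFF
        return True


if __name__ == "__main__":
    spm = SetAssocScratchpad()
    spm.install(asid=1, vaddr=0x12345)
    spm.store_byte(1, 0x12345, 0xAB)
    print(spm.load_byte(1, 0x12345))  # 171
    print(spm.load_byte(2, 0x12345))  # None: different address space
```

Because the set index comes from virtual-address bits, two processes using the same virtual address land in the same set; the assumed `asid` comparison is what keeps their lines apart in this model.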


Information

Published In

ACM Transactions on Architecture and Code Optimization, Volume 18, Issue 1
March 2021
402 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3446348
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 December 2020
Accepted: 01 November 2020
Revised: 01 November 2020
Received: 01 May 2020
Published in TACO Volume 18, Issue 1

Author Tags

  1. Scratchpad memory
  2. cache
  3. persistent memory
  4. security
  5. software managed memory

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 511
  • Downloads (last 6 weeks): 41
Reflects downloads up to 12 Feb 2025

Citations

Cited By

  • (2025) MuDP: multi-granularity data placement for uniform loops on SPM-DRAM architectures to minimize latency. Frontiers of Computer Science: Selected Publications from Chinese Universities 19:5. DOI: 10.1007/s11704-023-3566-y. Online publication date: 1-May-2025.
  • (2024) Optimizing code allocation for hybrid on-chip memory in IoT systems. Integration 97 (102195). DOI: 10.1016/j.vlsi.2024.102195. Online publication date: Jul-2024.
  • (2022) ASA: Accelerating Sparse Accumulation in Column-wise SpGEMM. ACM Transactions on Architecture and Code Optimization 19:4 (1-24). DOI: 10.1145/3543068. Online publication date: 11-Jun-2022.
  • (2022) An interactive and dynamic scratchpad memory management strategy for multi-core processors. Microprocessors and Microsystems 92 (104565). DOI: 10.1016/j.micpro.2022.104565. Online publication date: Jul-2022.
  • (2022) A Survey on Advancements of Real-Time Analytics Architecture Components. Computational Methods and Data Engineering (547-559). DOI: 10.1007/978-981-19-3015-7_41. Online publication date: 9-Sep-2022.
