research-article

Open access

SPX64: A Scratchpad Memory for General-purpose Microprocessors

Authors:

Abhishek Singh,

Pantea Zardoshti,

Robert Brotzman,

Aviral Shrivastava,

Michael Spear

ACM Transactions on Architecture and Code Optimization (TACO), Volume 18, Issue 1

Article No.: 14, Pages 1 - 26

https://doi.org/10.1145/3436730

Published: 30 December 2020

Abstract

General-purpose computing systems employ memory hierarchies to provide the appearance of a single large, fast, coherent memory. In special-purpose CPUs, programmers manually manage distinct, non-coherent scratchpad memories. In this article, we combine these mechanisms by adding a virtually addressed, set-associative scratchpad to a general-purpose CPU. Our scratchpad exists alongside a traditional cache and avoids many of the programming challenges associated with traditional scratchpads without sacrificing generality (e.g., virtualization). Furthermore, our design delivers increased security and improves performance, especially for workloads that exhibit high locality or interact with nonvolatile memory.

Information & Contributors

Information

Published In

ACM Transactions on Architecture and Code Optimization Volume 18, Issue 1

March 2021

402 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3446348

Editor:
David Kaeli
Northeastern University, USA

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 December 2020

Accepted: 01 November 2020

Revised: 01 November 2020

Received: 01 May 2020

Published in TACO Volume 18, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

NSF (National Science Foundation)

Bibliometrics & Citations

Bibliometrics

Total Citations: 5
Total Downloads: 2,549
Downloads (Last 12 months): 511
Downloads (Last 6 weeks): 41

Reflects downloads up to 12 Feb 2025

Citations

Cited By

Du Y, Sha E, Song Y, Guo Y, Xu L, Zhuge Q. (2025). MuDP: multi-granularity data placement for uniform loops on SPM-DRAM architectures to minimize latency. Frontiers of Computer Science: Selected Publications from Chinese Universities 19:5. Online publication date: 1-May-2025.
https://dl.acm.org/doi/10.1007/s11704-023-3566-y
Sun Z, Zhou Z, Fu F. (2024). Optimizing code allocation for hybrid on-chip memory in IoT systems. Integration 97 (102195). Online publication date: Jul-2024.
https://doi.org/10.1016/j.vlsi.2024.102195
Zhang C, Bremer M, Chan C, Shalf J, Guo X. (2022). ASA: Accelerating Sparse Accumulation in Column-wise SpGEMM. ACM Transactions on Architecture and Code Optimization 19:4 (1-24). Online publication date: 11-Jun-2022.
https://dl.acm.org/doi/10.1145/3543068
Tabbassum K, Talpur S, Khahro S. (2022). An interactive and dynamic scratchpad memory management strategy for multi-core processors. Microprocessors and Microsystems 92 (104565). Online publication date: Jul-2022.
https://doi.org/10.1016/j.micpro.2022.104565
Dashora R, Babu M. (2022). A Survey on Advancements of Real-Time Analytics Architecture Components. Computational Methods and Data Engineering (547-559). Online publication date: 9-Sep-2022.
https://doi.org/10.1007/978-981-19-3015-7_41
