research-article

Open access

Efficient Checkpointing with Recompute Scheme for Non-volatile Main Memory

Authors:

Mohammad Alshboul,

Hussein Elnawawy,

James Tuck, and

Yan Solihin

ACM Transactions on Architecture and Code Optimization (TACO), Volume 16, Issue 2

Article No.: 18, Pages 1 - 27

https://doi.org/10.1145/3323091

Published: 29 May 2019


Abstract

Future main memory will likely include Non-Volatile Memory. Non-Volatile Main Memory (NVMM) provides an opportunity to rethink checkpointing strategies for providing failure safety to applications. While there are many checkpointing and logging schemes in the literature, their use must be revisited as they incur high execution time overheads as well as a large number of additional writes to NVMM, which may significantly impact write endurance.

In this article, we propose a novel recompute-based failure safety approach and demonstrate its applicability to loop-based code. Rather than keeping a fully consistent logging state, we only log enough state to enable recomputation. Upon a failure, our approach recovers to a consistent state by determining which parts of the computation were not completed and recomputing them. Effectively, our approach removes the need to keep checkpoints or logs, thus reducing execution time overheads and improving NVMM write endurance at the expense of more complex recovery. We compare our new approach against logging and checkpointing on five scientific workloads, including tiled matrix multiplication, on a computer system model that was built on gem5 and supports Intel PMEM instruction extensions. For tiled matrix multiplication, our recompute approach incurs an execution time overhead of only 5%, in contrast to 8% overhead with logging and 207% overhead with checkpointing. Furthermore, recompute only adds 7% additional NVMM writes, compared to 111% with logging and 330% with checkpointing. We also conduct experiments on real hardware, allowing us to run our workloads to completion while varying the number of threads used for computation. These experiments substantiate our simulation-based observations and provide a sensitivity study and performance comparison between the Recompute Scheme and Naive Checkpointing.
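The recovery idea described above (log only enough state to tell which work finished, then recompute the rest) can be illustrated with a small, hypothetical sketch. It is not the authors' implementation. It assumes a plain Python dict stands in for NVMM and that every store to it is already durable, so the cache-line flushes and fences that real persistent-memory code would issue (for example `clwb` or `clflushopt` followed by `sfence` on Intel hardware) appear only as a comment. Each row-block ("tile") of the output matrix is computed from scratch, and the only extra persistent state is one completion flag per tile, set after the tile's results are written. After a simulated failure, recovery re-runs the same loop, skips tiles whose flags are set, and recomputes the others. The names `new_nvmm`, `compute_tile`, `run`, and `SimulatedCrash` are illustrative.

```python
# Hypothetical sketch of recompute-based recovery for a tiled loop.
# Assumption: the dict returned by new_nvmm() stands in for NVMM, and every
# store into it is treated as immediately durable (no cache is modeled).


class SimulatedCrash(Exception):
    """Raised to emulate a power failure part-way through the loop."""


def new_nvmm(n_rows, n_cols, tile):
    """Create the 'persistent' state: the output matrix plus one flag per tile."""
    n_tiles = (n_rows + tile - 1) // tile
    return {"C": [[0] * n_cols for _ in range(n_rows)], "done": [False] * n_tiles}


def compute_tile(a, b, t, tile):
    """Return rows [t*tile, (t+1)*tile) of a @ b, computed from scratch."""
    rows = []
    for i in range(t * tile, min((t + 1) * tile, len(a))):
        rows.append([sum(a[i][k] * b[k][j] for k in range(len(b)))
                     for j in range(len(b[0]))])
    return rows


def run(a, b, tile, nvmm, crash_at_tile=None):
    """Execute (or re-execute) the tiled loop; return the tiles run in this call.

    Tiles whose completion flag is already set are skipped, so calling run()
    again after a crash recomputes only the unfinished work.
    """
    executed = []
    for t in range(len(nvmm["done"])):
        if nvmm["done"][t]:
            continue  # results for this tile are already durable
        for off, row in enumerate(compute_tile(a, b, t, tile)):
            nvmm["C"][t * tile + off] = row
            if t == crash_at_tile and off == 0:
                # Fail after a partial write, before the tile is marked done.
                raise SimulatedCrash(f"failure while writing tile {t}")
        # Real NVMM code would flush and fence this tile's cache lines here,
        # before persisting the flag; this sketch treats stores as durable.
        nvmm["done"][t] = True
        executed.append(t)
    return executed


# Usage: fail part-way through tile 1, then recover by re-running the loop.
a = [[i + j for j in range(4)] for i in range(6)]
b = [[(i * j) % 3 for j in range(3)] for i in range(4)]
nvmm = new_nvmm(n_rows=len(a), n_cols=len(b[0]), tile=2)
try:
    run(a, b, tile=2, nvmm=nvmm, crash_at_tile=1)
except SimulatedCrash:
    pass
flags_after_crash = list(nvmm["done"])   # [True, False, False]
recovered = run(a, b, tile=2, nvmm=nvmm)  # [1, 2]: tile 0 is not recomputed
```

Because each tile overwrites its output rather than accumulating into it, recomputing a partially written tile is idempotent. A loop that accumulates across iterations (for example, adding partial products over a reduction dimension into the same output tile) would need additional persistent state, such as the reduction index, before recomputation is safe; this sketch does not model that case.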



Index Terms

Efficient Checkpointing with Recompute Scheme for Non-volatile Main Memory
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multicore architectures
  2. Dependable and fault-tolerant systems and networks
    1. Processors and memory architectures
2. Hardware
  1. Emerging technologies
    1. Memory and dense storage
  2. Integrated circuits
    1. Semiconductor memory
      1. Non-volatile memory


Information

Published In

ACM Transactions on Architecture and Code Optimization Volume 16, Issue 2

June 2019

317 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3325131

Editor:
Koen De Bosschere
Ghent University, Belgium


Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 May 2019

Accepted: 01 February 2019

Revised: 01 February 2019

Received: 01 December 2018

Published in TACO Volume 16, Issue 2


Qualifiers

Research-article
Research
Refereed

Funding Sources

National Science Foundation

Bibliometrics

Total Citations: 10

Total Downloads: 1,364

Downloads (Last 12 months): 230

Downloads (Last 6 weeks): 22

Cited By

Badri S, Saini M, Goel N (2023). Mapi-Pro: An Energy Efficient Memory Mapping Technique for Intermittent Computing. ACM Transactions on Architecture and Code Optimization 20:4 (1-25). DOI: 10.1145/3629524. Online publication date: 20-Oct-2023.
https://dl.acm.org/doi/10.1145/3629524

Elnawawy H, Tuck J, Byrd G (2023). PreFlush: Lightweight Hardware Prediction Mechanism for Cache Line Flush and Writeback. 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT) (74-85). DOI: 10.1109/PACT58117.2023.00015. Online publication date: 21-Oct-2023.
https://doi.org/10.1109/PACT58117.2023.00015

Ye C, Xu Y, Shen X, Sha Y, Liao X, Jin H, Solihin Y (2023). Reconciling Selective Logging and Hardware Persistent Memory Transaction. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (664-676). DOI: 10.1109/HPCA56546.2023.10071088. Online publication date: Feb-2023.
https://doi.org/10.1109/HPCA56546.2023.10071088

Nath A, Kapoor H (2022). Pop-Crypt: Identification and Management of Popular Words for Enhancing Lifetime of EnCrypted Nonvolatile Main Memories. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 30:9 (1219-1229). DOI: 10.1109/TVLSI.2022.3183793. Online publication date: Sep-2022.
https://doi.org/10.1109/TVLSI.2022.3183793

Xu Y, Izraelevitz J, Swanson S, Sherwood T, Berger E, Kozyrakis C (2021). Clobber-NVM: log less, re-execute more. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (346-359). DOI: 10.1145/3445814.3446730. Online publication date: 19-Apr-2021.
https://dl.acm.org/doi/10.1145/3445814.3446730

Lee H, Kim H, Kim C, Han H, Seo E (2021). Idempotence-Based Preemptive GPU Kernel Scheduling for Embedded Systems. IEEE Transactions on Computers 70:3 (332-346). DOI: 10.1109/TC.2020.2988251. Online publication date: 9-Feb-2021.
https://dl.acm.org/doi/10.1109/TC.2020.2988251

Alshboul M, Ramrakhyani P, Wang W, Tuck J, Solihin Y (2021). BBB: Simplifying Persistent Programming using Battery-Backed Buffers. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (111-124). DOI: 10.1109/HPCA51647.2021.00019. Online publication date: Feb-2021.
https://doi.org/10.1109/HPCA51647.2021.00019

Alshboul M, Tuck J, Solihin Y, Li Z (2020). WET. Proceedings of the 57th ACM/EDAC/IEEE Design Automation Conference (1-6). DOI: 10.5555/3437539.3437610. Online publication date: 20-Jul-2020.
https://dl.acm.org/doi/10.5555/3437539.3437610

Dhulipala L, McGuffey C, Kang H, Gu Y, Blelloch G, Gibbons P, Shun J (2020). Sage. Proceedings of the VLDB Endowment 13:9 (1598-1613). DOI: 10.14778/3397230.3397251. Online publication date: 1-May-2020.
https://dl.acm.org/doi/10.14778/3397230.3397251

Elkhouly R, Alshboul M, Hayashi A, Solihin Y, Kimura K (2019). Compiler-support for Critical Data Persistence in NVM. ACM Transactions on Architecture and Code Optimization 16:4 (1-25). DOI: 10.1145/3371236. Online publication date: 26-Dec-2019.
https://dl.acm.org/doi/10.1145/3371236
