research-article
Open access

Efficient Checkpointing with Recompute Scheme for Non-volatile Main Memory

Published: 29 May 2019

Abstract

    Future main memory will likely include Non-Volatile Memory. Non-Volatile Main Memory (NVMM) provides an opportunity to rethink checkpointing strategies for providing failure safety to applications. While there are many checkpointing and logging schemes in the literature, their use must be revisited as they incur high execution time overheads as well as a large number of additional writes to NVMM, which may significantly impact write endurance.
    In this article, we propose a novel recompute-based failure safety approach and demonstrate its applicability to loop-based code. Rather than keeping a fully consistent logging state, we only log enough state to enable recomputation. Upon a failure, our approach recovers to a consistent state by determining which parts of the computation were not completed and recomputing them. Effectively, our approach removes the need to keep checkpoints or logs, thus reducing execution time overheads and improving NVMM write endurance at the expense of more complex recovery. We compare our new approach against logging and checkpointing on five scientific workloads, including tiled matrix multiplication, on a computer system model that was built on gem5 and supports Intel PMEM instruction extensions. For tiled matrix multiplication, our recompute approach incurs an execution time overhead of only 5%, in contrast to 8% overhead with logging and 207% overhead with checkpointing. Furthermore, recompute only adds 7% additional NVMM writes, compared to 111% with logging and 330% with checkpointing. We also conduct experiments on real hardware, allowing us to run our workloads to completion while varying the number of threads used for computation. These experiments substantiate our simulation-based observations and provide a sensitivity study and performance comparison between the Recompute Scheme and Naive Checkpointing.
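    The recovery idea in the abstract (find the parts of the computation that did not complete, then recompute them) can be illustrated with a small sketch. This is not the article's implementation: the `done` set is a hypothetical stand-in for whatever minimal persistent state the scheme keeps, `SimulatedFailure` and `fail_at_write` are invented here to inject a crash, the matrices are assumed square, and ordinary Python lists stand in for NVMM with no cache-line flushes or persist barriers modeled.

```python
class SimulatedFailure(Exception):
    """Stands in for a crash or power loss in this sketch."""


def matmul_tiles(a, b, c, done, tile, fail_at_write=None):
    """Fill c = a * b one output tile at a time, skipping tiles listed in `done`.

    `done` is a hypothetical stand-in for a small persistent record of finished
    tiles. `fail_at_write` injects a failure before the given element write.
    """
    n = len(a)  # assumption: a, b, and c are all n x n
    writes = 0
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            if (ii, jj) in done:
                continue  # finished before the failure: nothing to redo
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    if writes == fail_at_write:
                        raise SimulatedFailure(f"crash inside tile ({ii}, {jj})")
                    # Inputs are read-only, so rewriting c[i][j] is idempotent.
                    c[i][j] = sum(a[i][k] * b[k][j] for k in range(n))
                    writes += 1
            # A real system would have to persist this marker after the tile's data.
            done.add((ii, jj))


def run_with_failure_and_recovery(a, b, tile, fail_at_write):
    """Run once with an injected failure, then recover by recomputing unfinished tiles."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    done = set()
    try:
        matmul_tiles(a, b, c, done, tile, fail_at_write)
    except SimulatedFailure:
        pass  # c may now hold a partially written tile
    finished_before_recovery = set(done)
    matmul_tiles(a, b, c, done, tile)  # recovery pass: only unfinished tiles run
    return c, finished_before_recovery
```

    Calling `run_with_failure_and_recovery(a, b, 2, 6)` on 4x4 inputs crashes partway through the second tile; the recovery pass then skips the one finished tile and recomputes the rest. The sketch relies on each output tile being recomputable from unmodified inputs, which is why no copy of the output needs to be checkpointed or logged.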

    Published In

    ACM Transactions on Architecture and Code Optimization, Volume 16, Issue 2
    June 2019
    317 pages
    ISSN: 1544-3566
    EISSN: 1544-3973
    DOI: 10.1145/3325131
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 May 2019
    Accepted: 01 February 2019
    Revised: 01 February 2019
    Received: 01 December 2018
    Published in TACO Volume 16, Issue 2

    Author Tags

    1. Memory systems
    2. Computer architecture
    3. Emerging memory technologies

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Article Metrics

    • Downloads (Last 12 months): 230
    • Downloads (Last 6 weeks): 22

    Cited By

    • (2023) Mapi-Pro: An Energy Efficient Memory Mapping Technique for Intermittent Computing. ACM Transactions on Architecture and Code Optimization 20:4 (1-25). DOI: 10.1145/3629524. Online publication date: 20-Oct-2023
    • (2023) PreFlush: Lightweight Hardware Prediction Mechanism for Cache Line Flush and Writeback. 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT) (74-85). DOI: 10.1109/PACT58117.2023.00015. Online publication date: 21-Oct-2023
    • (2023) Reconciling Selective Logging and Hardware Persistent Memory Transaction. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (664-676). DOI: 10.1109/HPCA56546.2023.10071088. Online publication date: Feb-2023
    • (2022) Pop-Crypt: Identification and Management of Popular Words for Enhancing Lifetime of EnCrypted Nonvolatile Main Memories. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 30:9 (1219-1229). DOI: 10.1109/TVLSI.2022.3183793. Online publication date: Sep-2022
    • (2021) Clobber-NVM: log less, re-execute more. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (346-359). DOI: 10.1145/3445814.3446730. Online publication date: 19-Apr-2021
    • (2021) Idempotence-Based Preemptive GPU Kernel Scheduling for Embedded Systems. IEEE Transactions on Computers 70:3 (332-346). DOI: 10.1109/TC.2020.2988251. Online publication date: 9-Feb-2021
    • (2021) BBB: Simplifying Persistent Programming using Battery-Backed Buffers. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (111-124). DOI: 10.1109/HPCA51647.2021.00019. Online publication date: Feb-2021
    • (2020) WET. Proceedings of the 57th ACM/EDAC/IEEE Design Automation Conference (1-6). DOI: 10.5555/3437539.3437610. Online publication date: 20-Jul-2020
    • (2020) Sage. Proceedings of the VLDB Endowment 13:9 (1598-1613). DOI: 10.14778/3397230.3397251. Online publication date: 1-May-2020
    • (2019) Compiler-support for Critical Data Persistence in NVM. ACM Transactions on Architecture and Code Optimization 16:4 (1-25). DOI: 10.1145/3371236. Online publication date: 26-Dec-2019
