DOI: 10.1145/3492321.3519590
research-article

ResPCT: fast checkpointing in non-volatile memory for multi-threaded applications

Published: 28 March 2022

Abstract

Non-volatile main memory (NVMM) technologies offer an opportunity to build fast fault-tolerant programs because they provide persistent storage in main memory. However, since processor caches remain volatile, a mechanism is needed to recover a consistent state from NVMM after a crash. This paper presents ResPCT, a checkpointing approach that makes multi-threaded programs fault tolerant by periodically flushing persistent data structures to NVMM. ResPCT uses In-Cache-Line logging to track modifications efficiently during failure-free execution and to restore a consistent state after a crash. The ResPCT API lets programmers position restart points in their program, which simplifies identifying the persistent program state and can also help improve performance. Experiments with representative benchmarks and applications show that ResPCT can outperform state-of-the-art solutions by up to 2.7× and that its overhead can be as low as 4% at large core counts.
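
The abstract does not spell out the algorithm or the API, so the following is only a toy Python model of the general idea of periodic checkpointing with an undo record kept next to each value. It is not ResPCT's implementation. The names `Cell`, `Heap`, `write`, `checkpoint`, and `recover` are hypothetical, and cache-line placement and flushing are only described in comments, not performed.

```python
# Toy model: epoch-based checkpointing with a per-variable undo record.
# All class and method names are illustrative assumptions.

class Cell:
    """A persistent variable stored together with its undo record.

    In a real NVMM system the three fields would be laid out in one
    cache line; here they are ordinary Python attributes.
    """
    def __init__(self, value):
        self.value = value    # current value
        self.backup = value   # value at the start of the epoch in `epoch`
        self.epoch = 0        # epoch of the last logged update


class Heap:
    def __init__(self):
        self.cells = {}
        self.epoch = 1          # epoch currently being executed
        self.durable_epoch = 0  # last epoch closed by a checkpoint

    def alloc(self, name, value):
        self.cells[name] = Cell(value)

    def write(self, name, value):
        cell = self.cells[name]
        if cell.epoch != self.epoch:
            # First update in this epoch: save the old value first.
            cell.backup = cell.value
            cell.epoch = self.epoch
        cell.value = value

    def read(self, name):
        return self.cells[name].value

    def checkpoint(self):
        # A real system would flush modified data to NVMM here;
        # the toy model only closes the current epoch.
        self.durable_epoch = self.epoch
        self.epoch += 1

    def recover(self):
        # Roll back every update made after the last checkpoint.
        for cell in self.cells.values():
            if cell.epoch > self.durable_epoch:
                cell.value = cell.backup
                cell.epoch = self.durable_epoch
        self.epoch = self.durable_epoch + 1


heap = Heap()
heap.alloc("counter", 0)
heap.write("counter", 10)
heap.checkpoint()          # state with counter == 10 is now the restart state
heap.write("counter", 11)  # lost if a crash happens before the next checkpoint
heap.recover()             # simulated crash followed by recovery
assert heap.read("counter") == 10
```

In this model only the first write to a variable in each epoch pays for logging, which is one reason a periodic scheme can keep failure-free overhead low.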

Published In

EuroSys '22: Proceedings of the Seventeenth European Conference on Computer Systems
March 2022
783 pages
ISBN: 9781450391627
DOI: 10.1145/3492321
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. checkpointing
  2. fault tolerance
  3. multi-threaded applications
  4. non-volatile memory

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%
