DOI: 10.5555/1267411.1267429
Article

Libckpt: transparent checkpointing under Unix

Published: 16 January 1995

Abstract

Checkpointing is a simple technique for rollback recovery: the state of an executing program is periodically saved to a disk file from which it can be recovered after a failure. While recent research has developed a collection of powerful techniques for minimizing the overhead of writing checkpoint files, checkpointing remains unavailable to most application developers. In this paper we describe libckpt, a portable checkpointing tool for Unix that implements all applicable performance optimizations which are reported in the literature. While libckpt can be used in a mode which is almost totally transparent to the programmer, it also supports the incorporation of user directives into the creation of checkpoints. This user-directed checkpointing is an innovation which is unique to our work.
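
To make the rollback-recovery idea concrete, the following is a minimal sketch of application-level checkpointing: state is saved to a disk file at a fixed interval and reloaded after a failure. It is not libckpt's interface or implementation. libckpt is a C library that checkpoints a Unix process largely transparently, whereas this sketch saves an explicit state dictionary by hand. The names `save_checkpoint`, `load_checkpoint`, `run`, and the `fail_at` parameter are hypothetical and exist only for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to path atomically: a crash mid-write leaves the old file intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # force the data to disk before the rename
    os.replace(tmp, path)     # atomic rename over the previous checkpoint


def load_checkpoint(path, default):
    """Return the last saved state, or default if no checkpoint file exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, interval, fail_at=None):
    """Sum the integers 0..total_steps-1, checkpointing every `interval` steps.

    fail_at (hypothetical, for demonstration) simulates a crash by raising
    just before that step executes.
    """
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    return state["total"]
```

Calling `run` again with the same path after a failure resumes from the most recent checkpoint rather than from step 0, so only the work done since that checkpoint is repeated. Writing to a temporary file and renaming it is one common way to avoid corrupting the previous checkpoint; it is a design choice of this sketch, not a claim about how libckpt writes its files.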

Published In

TCON'95: Proceedings of the USENIX 1995 Technical Conference Proceedings
January 1995
251 pages

Publisher

USENIX Association

United States

Qualifiers

  • Article

Cited By

  • (2021) The Aurora operating system. Proceedings of the Workshop on Hot Topics in Operating Systems, pages 136-143. DOI: 10.1145/3458336.3465285. Online publication date: 1-Jun-2021
  • (2020) On Providing OS Support to Allow Transparent Use of Traditional Programming Models for Persistent Memory. ACM Journal on Emerging Technologies in Computing Systems, 16:3, pages 1-24. DOI: 10.1145/3388637. Online publication date: 23-Jun-2020
  • (2019) Fast in-memory CRIU for docker containers. Proceedings of the International Symposium on Memory Systems, pages 53-65. DOI: 10.1145/3357526.3357542. Online publication date: 30-Sep-2019
  • (2019) GreenMM. Proceedings of the ACM International Conference on Supercomputing, pages 308-318. DOI: 10.1145/3330345.3330373. Online publication date: 26-Jun-2019
  • (2019) GPU snapshot. Proceedings of the ACM International Conference on Supercomputing, pages 171-183. DOI: 10.1145/3330345.3330361. Online publication date: 26-Jun-2019
  • (2019) Efficient intermittent computing with differential checkpointing. Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, pages 70-81. DOI: 10.1145/3316482.3326357. Online publication date: 23-Jun-2019
  • (2018) Implementation of OpenMP Data-Sharing on CAPE. Proceedings of the 9th International Symposium on Information and Communication Technology, pages 359-366. DOI: 10.1145/3287921.3287950. Online publication date: 6-Dec-2018
  • (2018) Zeroing memory deallocator to reduce checkpoint sizes in virtualized HPC environments. The Journal of Supercomputing, 74:11, pages 6236-6257. DOI: 10.1007/s11227-018-2548-6. Online publication date: 1-Nov-2018
  • (2017) Design and implementation of a new execution model for CAPE. Proceedings of the 8th International Symposium on Information and Communication Technology, pages 453-459. DOI: 10.1145/3155133.3155199. Online publication date: 7-Dec-2017
  • (2017) Supporting Transparent Snapshot for Bare-metal Malware Analysis on Mobile Devices. Proceedings of the 33rd Annual Computer Security Applications Conference, pages 339-349. DOI: 10.1145/3134600.3134647. Online publication date: 4-Dec-2017
