Analysis and performance optimization of checkpointing schemes with task duplication

January 1996

Author:
Avi Ziv

Publisher:
Stanford University
408 Panama Mall, Suite 217
Stanford, CA
United States

Order Number: UMI Order No. GAX96-02996

Abstract

This thesis deals with fault-tolerant schemes that use checkpointing to shorten recovery time after failures and task duplication to detect faults. Until now, no analytical method was known for these schemes, and their performance was evaluated by simulation. The thesis presents a new analysis technique for checkpointing schemes with task duplication, which gives an easy-to-use method for analyzing and studying the performance of the schemes. A few applications of the analysis tool are given, such as finding the optimal interval between checkpoints and comparing different aspects of the performance of existing schemes.
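
To make the notion of an optimal interval between checkpoints concrete, the following is a minimal sketch under a deliberately simplified model. It is not the analysis technique of the thesis. The model assumes (all assumptions, none taken from the thesis) that faults hit each of the two duplicated executions independently at a constant rate, that a mismatch found at a checkpoint forces the whole interval to be repeated, and that checkpoints themselves are fault-free. The function and parameter names are hypothetical.

```python
import math


def expected_time(task_len, n_intervals, ckpt_cost, fault_rate):
    """Expected completion time under the simplified rollback model.

    Assumption: faults arrive on each of the two duplicated executions
    independently at rate `fault_rate`, and only during computation.
    """
    interval = task_len / n_intervals
    # An interval succeeds only if neither copy is hit by a fault.
    p_ok = math.exp(-2.0 * fault_rate * interval)
    # Each attempt costs the interval plus one checkpoint; the number of
    # attempts per interval is geometric with mean 1 / p_ok.
    return n_intervals * (interval + ckpt_cost) / p_ok


def best_interval_count(task_len, ckpt_cost, fault_rate, max_intervals=200):
    """Brute-force search for the interval count with the lowest expected time."""
    return min(
        range(1, max_intervals + 1),
        key=lambda n: expected_time(task_len, n, ckpt_cost, fault_rate),
    )


if __name__ == "__main__":
    n = best_interval_count(task_len=100.0, ckpt_cost=1.0, fault_rate=0.01)
    print(n, expected_time(100.0, n, 1.0, 0.01))
```

Even in this toy model the trade-off is visible: too few checkpoints make each rollback expensive, while too many let the checkpointing overhead dominate.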

One of the conclusions we reached from studying the performance of existing schemes is that the system on which a scheme is implemented can have a major effect on the scheme's performance. The thesis describes new checkpointing schemes that consist of two types of checkpoints: compare checkpoints and store checkpoints. The two types of checkpoints can be used to tune the schemes to the system they run on and enable efficient use of the system resources. Analysis results show that using two types of checkpoints can lead to a significant improvement in the performance of checkpointing schemes. Experimental results, obtained on the Intel Paragon parallel computer and a cluster of workstations, confirm that tuning checkpointing schemes to the specific systems they are used on can significantly improve their performance.

Another way to improve the performance of checkpointing schemes is to use changes in the checkpointing cost to improve the checkpoint placement strategy. A new on-line algorithm, which uses past and present knowledge when deciding whether or not to place a checkpoint, is presented. Analysis of the new scheme shows that the total execution time overhead with the proposed algorithm is significantly smaller than the overhead with fixed intervals. Although the proposed on-line algorithm uses only knowledge about the past and present, its behavior is close to that of the off-line optimal algorithm, which uses complete knowledge of the checkpointing cost at all possible locations.

Contributors

Avi Ziv
IBM Research - Haifa
- Publication years: 1991 - 2007
- Publication count: 31
- Citation count: 447
- Available for download: 10
- Downloads (cumulative): 3,635
- Downloads (12 months): 358
- Downloads (6 weeks): 53
- Average downloads per article: 364
- Average citations per article: 14

Index Terms

Analysis and performance optimization of checkpointing schemes with task duplication
1. Applied computing
  1. Physical sciences and engineering
    1. Electronics

Recommendations

Performance Optimization of Checkpointing Schemes with Task Duplication

In checkpointing schemes with task duplication, checkpointing serves two purposes: detecting faults by comparing the processors' states at checkpoints, and reducing fault recovery time by supplying a safe point to roll back to. In this paper, we show ...

Performance Optimization of Checkpointing Schemes with Task Duplication
IMSCCS '06: Proceedings of the First International Multi-Symposiums on Computer and Computational Sciences - Volume 2 (IMSCCS'06) - Volume 02

In this paper, we present an adaptive checkpointing scheme that uses store-checkpoints (SCPs) and compare-checkpoints (CCPs) to dynamically adjust the checkpointing interval on line. With additional SCPs and CCPs, we can use both the comparison and ...

Analysis of Checkpointing Schemes with Task Duplication

This paper suggests a technique for analyzing the performance of checkpointing schemes with task duplication. We show how this technique can be used to derive the average execution time of a task and other important parameters related to the performance ...
