Analysis and performance optimization of checkpointing schemes with task duplication
Publisher: Stanford University, 408 Panama Mall, Suite 217, Stanford, CA, United States
Order Number: UMI Order No. GAX96-02996
Abstract

This thesis deals with fault-tolerant schemes that combine checkpointing, which shortens recovery time after failures, with task duplication, which detects faults. Until now, no analytical method for these schemes was known, and their performance was evaluated by simulation. The thesis presents a new analysis technique for checkpointing schemes with task duplication that provides an easy-to-use way to analyze and study their performance. Several applications of the technique are given, such as finding the optimal interval between checkpoints and comparing different aspects of the performance of existing schemes.
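To make the idea of an optimal checkpoint interval concrete, here is a minimal Python sketch under a deliberately simple model that is an assumption, not the analysis technique of the thesis: two copies of a task fail independently at rate `lam` each, a fault is only detected when the copies are compared at the checkpoint that ends an interval of `T` units of work, a checkpoint costs `C`, and a rollback costs `R`. All function names are hypothetical. The sketch computes the expected overhead per unit of useful work and minimizes it over `T` with a golden-section search.

```python
import math


def expected_interval_time(T, C, R, lam):
    """Expected wall-clock time to finish one interval of T units of useful work.

    Hypothetical model (an assumption, not the thesis's analysis):
      * the task runs on two processors (duplication); each fails as an
        independent Poisson process with rate lam, so the pair sees faults
        at rate 2 * lam;
      * a fault is detected only when the two states are compared at the
        checkpoint that ends the interval; the checkpoint costs C;
      * after a detected fault both copies roll back (cost R) and retry.
    """
    p_ok = math.exp(-2.0 * lam * T)  # probability that an attempt is fault-free
    # The number of attempts is geometric: 1/p_ok attempts on average, of which
    # (1 - p_ok)/p_ok fail, and each failure adds a rollback of cost R.
    return (T + C) / p_ok + (1.0 - p_ok) / p_ok * R


def overhead(T, C, R, lam):
    """Extra time per unit of useful work (0.0 would mean no overhead)."""
    return expected_interval_time(T, C, R, lam) / T - 1.0


def optimal_interval(C, R, lam, iters=100):
    """Golden-section search for the T that minimises overhead.

    Assumes overhead is unimodal on the bracket (true in this model for R = 0).
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 1e-9, 10.0 / lam  # wide search bracket (an arbitrary choice)
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = overhead(x1, C, R, lam), overhead(x2, C, R, lam)
    for _ in range(iters):
        if f1 < f2:  # the minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = overhead(x1, C, R, lam)
        else:        # the minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = overhead(x2, C, R, lam)
    return (a + b) / 2.0


T_opt = optimal_interval(C=1.0, R=0.0, lam=1e-3)
print(f"T* = {T_opt:.2f}, overhead = {overhead(T_opt, 1.0, 0.0, 1e-3):.4f}")
```

For `R = 0` the logarithm of the overhead is convex in `T`, and setting its derivative to zero gives a*T*(T + C) = C with a = 2*lam, i.e. T* = (-a*C + sqrt(a^2*C^2 + 4*a*C)) / (2*a), which the search should reproduce.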

One of the conclusions we reached from studying the performance of existing schemes is that the system on which a scheme is implemented can have a major effect on its performance. The thesis describes new checkpointing schemes that use two types of checkpoints: compare checkpoints, at which the states of the duplicated tasks are compared to detect faults, and store checkpoints, at which the state is saved so that execution can roll back to it. The two types of checkpoints can be used to tune a scheme to the system it runs on and to make efficient use of the system resources. Analysis results show that using two types of checkpoints can significantly improve the performance of checkpointing schemes. Experimental results, obtained on the Intel Paragon parallel computer and on a cluster of workstations, confirm that tuning checkpointing schemes to the specific systems they run on can significantly improve their performance.
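The benefit of separating the two checkpoint types can be illustrated with a small Python sketch, again under an assumed model rather than the one in the thesis: a compare checkpoint (cost `c_cmp`) ends every interval of `T` units of work, a store checkpoint (cost `c_store`) is taken after every `k`-th compare, and a detected fault rolls both copies back to the last store checkpoint. The function names and the parameter values below, describing a system on which storing is 100 times more expensive than comparing, are hypothetical.

```python
import math


def expected_segment_time(T, k, c_cmp, c_store, R, lam):
    """Expected time to finish one segment of k compare intervals (k*T units of work).

    Hypothetical model (an assumption, not the thesis's analysis):
      * duplicated task, faults at rate 2 * lam for the pair;
      * a compare checkpoint (cost c_cmp) ends every interval of length T and
        detects any fault that occurred during that interval;
      * a store checkpoint (cost c_store) is taken only after the k-th compare;
      * on detection, both copies roll back (cost R) to the last store
        checkpoint, i.e. to the start of the segment, and retry.
    """
    q = 1.0 - math.exp(-2.0 * lam * T)  # P(fault within one interval)
    p_ok = (1.0 - q) ** k               # P(whole segment fault-free)
    step = T + c_cmp
    # Probability-weighted cost of failing at compare j (first fault in interval j).
    fail = sum((1.0 - q) ** (j - 1) * q * (j * step + R) for j in range(1, k + 1))
    # On average (1 - p_ok)/p_ok failed attempts, each costing fail/(1 - p_ok),
    # so failures add fail / p_ok; then one successful pass plus the store.
    return k * step + c_store + fail / p_ok


def overhead(T, k, c_cmp, c_store, R, lam):
    """Extra time per unit of useful work for a (T, k) schedule."""
    return expected_segment_time(T, k, c_cmp, c_store, R, lam) / (k * T) - 1.0


def best_schedule(c_cmp, c_store, R, lam, Ts, ks):
    """Grid search; returns (overhead, T, k) with the smallest overhead."""
    return min((overhead(T, k, c_cmp, c_store, R, lam), T, k) for T in Ts for k in ks)


# Hypothetical system on which storing is 100x more expensive than comparing.
Ts = [float(t) for t in range(1, 501)]
both = best_schedule(0.1, 10.0, 0.0, 1e-4, Ts, range(1, 41))
every_compare_stores = best_schedule(0.1, 10.0, 0.0, 1e-4, Ts, [1])
print("compare + store (overhead, T, k):", both)
print("store at every compare:", every_compare_stores)
```

With `k = 1` every compare is followed by a store, which behaves like a scheme with a single checkpoint type; on this hypothetical system the search should prefer `k > 1`, comparing often to catch faults early while storing rarely.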

Another way to improve the performance of checkpointing schemes is to exploit variations in the checkpointing cost when deciding where to place checkpoints. A new on-line algorithm is presented that uses past and present knowledge to decide whether or not to place a checkpoint. Analysis of the new scheme shows that the total execution-time overhead with the proposed algorithm is significantly smaller than with checkpoints placed at fixed intervals. Although the on-line algorithm uses only knowledge of the past and present, its behavior is close to that of the optimal off-line algorithm, which has complete knowledge of the checkpointing cost at all possible locations.
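The gap between fixed-interval, on-line, and off-line placement can be explored with the following Python sketch. Everything in it is an assumption for illustration: the first-order overhead model (checkpoint cost plus a * L**2 / 2 expected rework per segment of length L), the cost profile, the function names, and especially the on-line rule, a simple threshold heuristic that uses the time since the last checkpoint (past) and the current cost (present). It is not the algorithm proposed in the thesis. The off-line optimum is computed by dynamic programming with full knowledge of all costs.

```python
import math


def total_overhead(positions, costs, a):
    """First-order overhead of a checkpoint placement (hypothetical model).

    costs[i] is the cost of checkpointing at time i, i = 1..n (costs[0] unused).
    A segment of length L adds expected rework a * L**2 / 2 (about a*L faults,
    each losing L/2 on average). The run ends at n without a final checkpoint.
    """
    n = len(costs) - 1
    total, last = 0.0, 0
    for p in positions:
        total += costs[p] + a * (p - last) ** 2 / 2.0
        last = p
    return total + a * (n - last) ** 2 / 2.0


def online_positions(costs, a):
    """Hypothetical threshold rule: checkpoint at time i if its cost is at most
    a * t**2 / 2, where t is the time since the last checkpoint. For a constant
    cost C this fires at t = sqrt(2*C/a), Young's first-order interval."""
    positions, last = [], 0
    for i in range(1, len(costs)):
        t = i - last
        if costs[i] <= a * t * t / 2.0:
            positions.append(i)
            last = i
    return positions


def fixed_positions(costs, a):
    """Fixed interval from Young's formula using the mean checkpoint cost."""
    n = len(costs) - 1
    mean_cost = sum(costs[1:]) / n
    T = max(1, round(math.sqrt(2.0 * mean_cost / a)))
    return list(range(T, n + 1, T))


def offline_optimum(costs, a):
    """Minimum total_overhead over all placements, by O(n^2) dynamic programming."""
    n = len(costs) - 1
    best = [0.0] + [math.inf] * n  # best[i]: min overhead up to i with a checkpoint at i
    for i in range(1, n + 1):
        best[i] = costs[i] + min(best[j] + a * (i - j) ** 2 / 2.0 for j in range(i))
    return min(best[j] + a * (n - j) ** 2 / 2.0 for j in range(n + 1))


# Hypothetical profile: cheap (1.0) at every 7th time step, expensive (50.0) elsewhere.
n, a = 200, 0.02
costs = [0.0] + [1.0 if i % 7 == 0 else 50.0 for i in range(1, n + 1)]
for name, value in [
    ("fixed", total_overhead(fixed_positions(costs, a), costs, a)),
    ("on-line", total_overhead(online_positions(costs, a), costs, a)),
    ("off-line optimum", offline_optimum(costs, a)),
]:
    print(f"{name:>16}: {value:.2f}")
```

On this profile the threshold rule only checkpoints at the cheap locations, while the fixed interval, chosen with Young's formula from the mean cost, lands on expensive ones; by construction the dynamic program can never do worse than either policy.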

Contributors
  • IBM Research - Haifa
