Algorithm for Asynchronous Checkpointing and Recovery

This document describes the Juang-Venkatesan algorithm for recovery in distributed systems that use asynchronous checkpointing. The algorithm assumes reliable FIFO communication channels and event-driven processes. After each event, a process records a triplet consisting of its state, the message it received, and the messages it sent, without synchronizing with other processes. During recovery, processors exchange counts of the messages they have sent to and received from each other to determine whether any messages became orphans after another processor rolled back. Over repeated iterations, each processor rolls back to the latest event at which the number of messages it has received from each neighbor does not exceed the number that neighbor reports having sent to it. An example illustrates three processors recovering after one of them fails.

Uploaded by vigneshg463

CS3551 DISTRIBUTED COMPUTING

4.7 ALGORITHM FOR ASYNCHRONOUS CHECKPOINTING AND RECOVERY:

This section presents the algorithm of Juang and Venkatesan for recovery in a system that uses asynchronous checkpointing.
A. System Model and Assumptions
The algorithm makes the following assumptions about the underlying system:
• The communication channels are reliable, deliver messages in FIFO order, and have infinite buffers.
• The message transmission delay is arbitrary, but finite.
• The underlying computation/application is event-driven: a process P in state s receives a message m, processes the message, moves to state s′, and sends out messages. The triplet (s, m, msgs_sent) therefore represents the state of P.
Two types of log storage are maintained:
– Volatile log: fast to access, but its contents are lost if the processor crashes. Its contents are moved to the stable log periodically.
– Stable log: slower to access, but its contents survive a crash.
B. Asynchronous Checkpointing
– After executing an event, a process records the triplet without any synchronization with other processes.
– A local checkpoint consists of a set of records; the records are first stored in the volatile log and then moved to the stable log.
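The logging behaviour described above (event-driven triplets, a volatile log, and a stable log) can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the original algorithm description: the names Triplet, Process, handle, flush, crash_and_restart, and transition are hypothetical, the process state is modeled as an integer, and the state transition is an arbitrary deterministic function chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Triplet:
    """Logged record (s, m, msgs_sent) for one event."""
    state_before: int          # s: state before the event (hypothetical int state)
    sender: Optional[str]      # who sent the message m
    payload: int               # contents of m (hypothetical)
    sent_to: Tuple[str, ...]   # msgs_sent: destinations of messages sent by the event


def transition(state: int, payload: int) -> int:
    """Hypothetical deterministic state transition s -> s'."""
    return state + payload


@dataclass
class Process:
    name: str
    state: int = 0
    volatile_log: List[Triplet] = field(default_factory=list)
    stable_log: List[Triplet] = field(default_factory=list)

    def handle(self, sender: str, payload: int, sent_to: Tuple[str, ...] = ()) -> None:
        # Record the triplet locally, with no synchronization with other processes.
        self.volatile_log.append(Triplet(self.state, sender, payload, sent_to))
        self.state = transition(self.state, payload)

    def flush(self) -> None:
        # Periodically move records from the volatile log to the stable log.
        self.stable_log.extend(self.volatile_log)
        self.volatile_log.clear()

    def crash_and_restart(self) -> None:
        # A crash loses the volatile log; the process restarts from the state
        # reached by the latest event recorded in the stable log.
        self.volatile_log.clear()
        if self.stable_log:
            last = self.stable_log[-1]
            self.state = transition(last.state_before, last.payload)
        else:
            self.state = 0


p = Process("Y")
p.handle("X", 5, ("Z",))   # s = 0 -> s' = 5; recorded in the volatile log
p.flush()                  # the record moves to the stable log
p.handle("Z", 3)           # s = 5 -> s' = 8; recorded only in the volatile log
p.crash_and_restart()      # the second record is lost
print(p.state)             # 5
```

Storing the state before the event together with m, rather than the state after it, follows the triplet (s, m, msgs_sent) in the text; the state after the event is recomputed by reprocessing m.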
C. The Recovery Algorithm
Notations and data structures
The following notations and data structures are used by the algorithm:
• RCVD_{i←j}(CkPt_i) represents the number of messages received by processor p_i from processor p_j, from the beginning of the computation till the checkpoint CkPt_i.
• SENT_{i→j}(CkPt_i) represents the number of messages sent by processor p_i to processor p_j, from the beginning of the computation till the checkpoint CkPt_i.
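To make the two counters concrete, here is a minimal Python sketch that computes them from a processor's event log. It assumes, hypothetically, that each logged event records which processor sent the message that triggered it and the destinations of the messages it sent; the Event type, the rcvd and sent helpers, and the sample log are illustrative only, and events are indexed from 0 so that a checkpoint is simply an event index.

```python
from typing import List, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    """Hypothetical per-event record at processor p_i."""
    received_from: Optional[str]   # sender of the message that triggered the event
    sent_to: Tuple[str, ...]       # one entry per message sent during the event


def rcvd(log: List[Event], j: str, ckpt: int) -> int:
    """RCVD_{i<-j}(CkPt_i): messages p_i received from p_j in events 0..ckpt."""
    return sum(1 for e in log[: ckpt + 1] if e.received_from == j)


def sent(log: List[Event], j: str, ckpt: int) -> int:
    """SENT_{i->j}(CkPt_i): messages p_i sent to p_j in events 0..ckpt."""
    return sum(e.sent_to.count(j) for e in log[: ckpt + 1])


# Hypothetical log for p_i.
log_i = [
    Event(None, ("j",)),       # initial event: sends one message to p_j
    Event("j", ()),            # receives a message from p_j
    Event("k", ("j", "j")),    # receives from p_k, sends two messages to p_j
    Event("j", ("k",)),        # receives from p_j, sends one message to p_k
]

print(rcvd(log_i, "j", 2), sent(log_i, "j", 2))  # 1 3
print(rcvd(log_i, "j", 3), sent(log_i, "j", 3))  # 2 3
```

Both counters only ever grow as the checkpoint moves later, which is what lets a processor roll back by scanning its log backwards.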

Basic idea
• Since the algorithm is based on asynchronous checkpointing, the main issue in recovery is to find a consistent set of checkpoints to which the system can be restored.
• The recovery algorithm achieves this by making each processor keep track of both the number of messages it has sent to other processors and the number of messages it has received from other processors.
• Whenever a processor rolls back, it is necessary for all other processors to find out if any
message has become an orphan message. Orphan messages are discovered by comparing
the number of messages sent to and received from neighboring processors.
For example, if RCVD_{i←j}(CkPt_i) > SENT_{j→i}(CkPt_j) (that is, according to the current states of the processors, the number of messages received by processor p_i from processor p_j is greater than the number of messages sent by processor p_j to processor p_i), then one or more of the messages that p_i received from p_j are orphan messages: their receipt is recorded at p_i, but their sending is no longer recorded at p_j.
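The orphan test reduces to comparing two counters. The sketch below uses a hypothetical helper, orphan_count, that is not part of the original algorithm; its first usage line reuses the counts from the example later in this section (X has received 3 messages from Y, while Y's restored state records only 2 messages sent to X).

```python
def orphan_count(rcvd_i_from_j: int, sent_j_to_i: int) -> int:
    """Number of orphan messages at p_i with respect to p_j.

    rcvd_i_from_j is RCVD_{i<-j}(CkPt_i) and sent_j_to_i is SENT_{j->i}(CkPt_j).
    Receipts beyond the sends that p_j still records have no matching send
    event. With reliable FIFO channels they are the last messages p_i received
    from p_j, and p_i must roll back to undo them.
    """
    return max(0, rcvd_i_from_j - sent_j_to_i)


print(orphan_count(3, 2))  # 1: X must roll back (values from the example)
print(orphan_count(1, 2))  # 0: no orphans; a sent message is in transit or lost
```

A result of 0 with SENT larger than RCVD is not an inconsistency: it only means some message has not been received yet, and the sender can resend it from its log.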
The Algorithm
When a processor restarts after a failure, it broadcasts a ROLLBACK message announcing that it had failed. Every processor then executes the following procedure.
Procedure RollBack_Recovery
processor p_i executes the following:
STEP (a)
if processor p_i is recovering after a failure then
    CkPt_i := latest event logged in the stable storage
else
    CkPt_i := latest event that took place in p_i {The latest event at p_i can be either in stable or in volatile storage.}
end if
STEP (b)
for k = 1 to N {N is the number of processors in the system} do
    for each neighboring processor p_j do
        compute SENT_{i→j}(CkPt_i)
        send a ROLLBACK(i, SENT_{i→j}(CkPt_i)) message to p_j
    end for
    for every ROLLBACK(j, c) message received from a neighbor j do
        if RCVD_{i←j}(CkPt_i) > c {Implies the presence of orphan messages} then
            find the latest event e such that RCVD_{i←j}(e) ≤ c {Such an event e may be in the volatile storage or stable storage.}
            CkPt_i := e
        end if
    end for
end for {for k}
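The procedure can be exercised with a small Python simulation. This is a sketch under assumptions that go beyond the pseudocode: every processor is a neighbor of every other processor, the N iterations run in lock-step rounds (all ROLLBACK messages of a round are sent before any is processed), events are indexed from 0 with index -1 standing for the initial state, and the three-processor history (A, B, C) is hypothetical rather than the one in Figure 2. Resending lost messages from the logs is not modeled.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    received_from: Optional[str]   # sender of the triggering message (None: initial event)
    sent_to: Tuple[str, ...]       # one entry per message sent during the event


def rcvd(log: List[Event], j: str, ckpt: int) -> int:
    # RCVD_{i<-j}(ckpt); ckpt == -1 denotes the initial state.
    return sum(1 for e in log[: ckpt + 1] if e.received_from == j)


def sent(log: List[Event], j: str, ckpt: int) -> int:
    # SENT_{i->j}(ckpt)
    return sum(e.sent_to.count(j) for e in log[: ckpt + 1])


def rollback_recovery(logs: Dict[str, List[Event]], failed: str,
                      stable_len: int) -> Dict[str, int]:
    """Return the event index each processor restores (lock-step simulation)."""
    n = len(logs)
    # STEP (a): the failed processor uses its latest event in stable storage;
    # every other processor uses its latest event (stable or volatile).
    ckpt = {p: len(log) - 1 for p, log in logs.items()}
    ckpt[failed] = stable_len - 1

    # STEP (b): N iterations.
    for _ in range(n):
        # Each p_i sends ROLLBACK(i, SENT_{i->j}(CkPt_i)) to every p_j.
        inbox: Dict[str, List[Tuple[str, int]]] = {p: [] for p in logs}
        for i in logs:
            for j in logs:
                if j != i:
                    inbox[j].append((i, sent(logs[i], j, ckpt[i])))
        # Each p_i handles every ROLLBACK(j, c) it received.
        for i, msgs in inbox.items():
            for j, c in msgs:
                if rcvd(logs[i], j, ckpt[i]) > c:  # orphan messages present
                    # Latest event e with RCVD_{i<-j}(e) <= c; it precedes
                    # CkPt_i because the counter never decreases.
                    e = ckpt[i]
                    while rcvd(logs[i], j, e) > c:
                        e -= 1
                    ckpt[i] = e
    return ckpt


# Hypothetical history; event k of a processor is its log entry k.
logs = {
    "A": [Event(None, ("B",)), Event("B", ("C",)), Event("B", ("C",))],
    "B": [Event("A", ("A",)), Event("C", ("A",))],
    "C": [Event(None, ("B",)), Event("A", ()), Event("A", ())],
}
# B fails after only its first event reached stable storage.
print(rollback_recovery(logs, failed="B", stable_len=1))  # {'A': 1, 'B': 0, 'C': 1}
```

In this hypothetical run, A rolls back in the first iteration because it received the second message from B, which B lost in the crash. That rollback undoes A's second message to C, so C rolls back only in the second iteration. This cascade illustrates why STEP (b) is repeated rather than executed once.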
D. An Example
Consider the example shown in Figure 2, which consists of three processors. Suppose processor Y fails and restarts. If event ey2 is the latest checkpointed event at Y, then Y restarts from the state corresponding to ey2.

Figure 2: An example of the Juang-Venkatesan algorithm.

• Because of the broadcast nature of ROLLBACK messages, the recovery algorithm is initiated at processors X and Z.
• Initially, X, Y, and Z set CkPt_X ← ex3, CkPt_Y ← ey2, and CkPt_Z ← ez2, respectively, and X, Y, and Z send the following messages during the first iteration:
– Y sends ROLLBACK(Y,2) to X and ROLLBACK(Y,1) to Z;
– X sends ROLLBACK(X,2) to Y and ROLLBACK(X,0) to Z;
– Z sends ROLLBACK(Z,0) to X and ROLLBACK(Z,1) to Y.
Since RCVD_{X←Y}(CkPt_X) = 3 > 2 (2 is the value received in the ROLLBACK(Y,2) message from Y), X sets CkPt_X to ex2, which satisfies RCVD_{X←Y}(ex2) = 1 ≤ 2.

Since RCVD_{Z←Y}(CkPt_Z) = 2 > 1, Z sets CkPt_Z to ez1, which satisfies RCVD_{Z←Y}(ez1) = 1 ≤ 1.
At Y, RCVD_{Y←X}(CkPt_Y) = 1 < 2 and RCVD_{Y←Z}(CkPt_Y) = 1 = SENT_{Z→Y}(CkPt_Z), so Y need not roll back further.
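The first-iteration comparisons can be replayed with the numbers quoted above. The sketch lists only the RCVD values that the text states (earlier events are omitted because the text does not give their counts), and latest_event_at_most is a hypothetical helper.

```python
from typing import List, Tuple


def latest_event_at_most(history: List[Tuple[str, int]], c: int) -> str:
    """Return the latest listed event whose RCVD count does not exceed c."""
    for event, count in reversed(history):
        if count <= c:
            return event
    raise ValueError("no listed event satisfies the bound")


# (event, RCVD count) pairs in event order; only values stated in the text.
x_from_y = [("ex2", 1), ("ex3", 3)]   # RCVD_{X<-Y}
z_from_y = [("ez1", 1), ("ez2", 2)]   # RCVD_{Z<-Y}

# X: RCVD_{X<-Y}(ex3) = 3 > 2 (from ROLLBACK(Y,2)), so X rolls back.
ckpt_x = latest_event_at_most(x_from_y, 2)
# Z: RCVD_{Z<-Y}(ez2) = 2 > 1 (from ROLLBACK(Y,1)), so Z rolls back.
ckpt_z = latest_event_at_most(z_from_y, 1)
# Y: its counts at ey2 do not exceed the values in ROLLBACK(X,2) and ROLLBACK(Z,1).
rcvd_at_y = {"X": 1, "Z": 1}          # RCVD_{Y<-X}(ey2), RCVD_{Y<-Z}(ey2)
c_to_y = {"X": 2, "Z": 1}
y_rolls_back = any(rcvd_at_y[j] > c_to_y[j] for j in rcvd_at_y)

print(ckpt_x, ckpt_z, y_rolls_back)  # ex2 ez1 False
```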

In the second iteration, Y sends ROLLBACK(Y,2) to X and ROLLBACK(Y,1) to Z; Z sends ROLLBACK(Z,1) to Y and ROLLBACK(Z,0) to X; and X sends ROLLBACK(X,0) to Z and ROLLBACK(X,1) to Y. X now reports 1 rather than 2 to Y because it has rolled back to ex2.
If Y rolls back beyond ey3 and loses the message from X that caused ey3, X can resend this message to Y because ex2 is logged at X and the message is available in the log. The second and third iterations progress in the same manner. The set of recovery points chosen at the end of the first iteration, {ex2, ey2, ez1}, is consistent, and no further rollback occurs.
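The claim that {ex2, ey2, ez1} causes no further rollback can be checked for the processor pairs whose counts the example gives. This is a minimal sketch; the pairs X←Z and Z←X are left out because the example does not state RCVD_{X←Z}(ex2) or RCVD_{Z←X}(ez1).

```python
# For each (receiver, sender) pair: (RCVD at the receiver's chosen recovery
# point, value c carried by the sender's second-iteration ROLLBACK message).
checks = {
    ("X", "Y"): (1, 2),  # RCVD_{X<-Y}(ex2) = 1, ROLLBACK(Y,2)
    ("Z", "Y"): (1, 1),  # RCVD_{Z<-Y}(ez1) = 1, ROLLBACK(Y,1)
    ("Y", "X"): (1, 1),  # RCVD_{Y<-X}(ey2) = 1, ROLLBACK(X,1)
    ("Y", "Z"): (1, 1),  # RCVD_{Y<-Z}(ey2) = 1, ROLLBACK(Z,1)
}

# A pair triggers a rollback only if the receiver has recorded more
# messages than the sender reports having sent.
rollbacks = [pair for pair, (rcvd, c) in checks.items() if rcvd > c]
print(rollbacks)  # []
```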

ROHINI COLLEGE OF ENGINEERING AND TECHNOLOGY
