Distributed Recovery Management: UNIT-4

 To maintain the consistency of data, each transaction in a
database environment must preserve the ACID properties.
 To do this, it is the sole responsibility of a database
management system to adopt some appropriate techniques.
 This complicated task is performed by a DBMS module,
called the recovery manager.
 The major objective of a recovery manager is to deploy an
appropriate technique for recovering the system from
failures and to preserve the database in the consistent state
that existed prior to the failure.
 In the context of distributed recovery management, two
terms are often used: reliability and availability.
 Reliability refers to the probability that the system under
consideration does not experience any failures in a given
time period.
 Availability refers to the probability that the system can
continue its normal execution according to the specification
at a given point in time in spite of failures.
Failures in a Distributed Database System
The following is the list of failures that may occur in a
centralized DBMS.
Transaction failure
System crash
Disk failure
In a distributed DBMS, several additional types of failures can
occur:
Site failure
Communication link failure
Loss of messages
Network partition
Steps Followed after a Failure
 A distributed recovery manager must ensure the atomicity of
global transactions.
 If a distributed DBMS detects that a site failure has occurred,
then the following steps are to be followed for recovery.
All transactions that are affected by the failure must be
aborted.
A message must be broadcast about the failure of the site so
that no other site will try to access the failed site.
The failed site must be checked periodically to see whether it has
recovered; alternatively, the other sites must wait for a message
from the failed site announcing that it has recovered.
On restart, the failed site must initiate a recovery procedure to
abort all partial transactions that were active at the time of
failure.
After local recovery, the failed site must update its data to
preserve the data consistency.
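The steps above can be sketched as a small in-memory simulation. This is a minimal sketch under stated assumptions, not a definitive implementation: the `Site` class, its fields and the two functions are hypothetical names introduced only for illustration, and "broadcast" is modelled as a plain loop over the surviving sites.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical model of one site; every name here is illustrative."""
    name: str
    up: bool = True
    active_txns: set = field(default_factory=set)    # transactions running here
    blocked_peers: set = field(default_factory=set)  # sites known to be down
    data: dict = field(default_factory=dict)         # local copy of the data


def handle_site_failure(failed, sites, txn_sites):
    """Actions of the surviving sites once `failed` is detected as down.

    txn_sites maps a transaction id to the set of site names it involves.
    Returns the set of transaction ids that were aborted.
    """
    failed.up = False
    # Abort every transaction that is affected by the failure.
    aborted = {t for t, names in txn_sites.items() if failed.name in names}
    for site in sites:
        if site is failed:
            continue
        site.active_txns -= aborted
        # "Broadcast" the failure so that no site contacts the failed one.
        site.blocked_peers.add(failed.name)
    return aborted


def restart_site(failed, sites, missed_updates):
    """Recovery procedure run when the failed site comes back."""
    # Local recovery: abort all partial transactions active at failure time.
    failed.active_txns.clear()
    # Bring the local data up to date with updates committed elsewhere.
    failed.data.update(missed_updates)
    failed.up = True
    # Announce recovery so that the other sites may access this site again.
    for site in sites:
        site.blocked_peers.discard(failed.name)
```

Here the recovered site announces itself, which corresponds to the "wait for a message from the failed site" alternative; periodic polling by the other sites would replace the last loop.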
Distributed Recovery Protocols
 The recovery in a distributed DBMS is more complicated than in
a centralized DBMS, as it requires ensuring the atomicity of
global transactions as well as of local transactions.
 Therefore, it is necessary to modify the commit and abort
processing in a distributed DBMS, so that a global transaction
does not commit or abort until all of its subtransactions have
successfully committed or aborted.
 The distributed recovery protocols must have the capability to
deal with different types of failures such as site failures,
communication link failures and network partitions. Termination
protocols are unique to distributed database systems.
In a distributed DBMS, the execution of a global transaction may
involve several sites, and if one of the participating sites fails during the
execution of the global transaction, termination protocols are used to
terminate the transaction at the other participating sites.
One desirable property for distributed recovery protocols
is independence.
An independent recovery protocol decides how to terminate a
transaction that was executing at the time of a failure without consulting
any other site.
Moreover, the distributed recovery protocols should cater for different
types of failures in a distributed system to ensure that the failure of one
site does not affect processing at another site.
Protocols that obey this property are known as non-blocking protocols.
Distributed recovery protocols:
 Two-phase commit (2PC) protocol and
 Three-phase commit (3PC) protocol.
Two-Phase Commit Protocol (2PC)
 In 2PC protocol, there are two phases known as
Voting phase and
Decision phase.
Voting phase:
 The transaction coordinator asks all participants whether they
agree to commit the transaction.
 If one participant votes to abort or fails to vote within a specified
time period, the coordinator instructs all participants to abort the
transaction.
 On the other hand, if all participants vote to commit within the
time limit, the coordinator instructs all participants to commit the
transaction.
 In this case, any site is free to abort a transaction independently
at any time until it votes to commit. This type of abort is
called unilateral abort.
Decision phase:
 In decision phase, a decision is made as to whether the
transaction will be aborted or committed.
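The two phases can be illustrated with a short failure-free sketch of the coordinator's logic. It is a simplified illustration rather than a definitive implementation: the `Participant` class, its `will_commit` and `responds` flags, and the use of `None` to stand for a vote that does not arrive within the time limit are all assumptions made for the example.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class Participant:
    """Hypothetical participant; `will_commit` and `responds` are test knobs."""

    def __init__(self, name, will_commit=True, responds=True):
        self.name = name
        self.will_commit = will_commit
        self.responds = responds
        self.outcome = None  # filled in by the coordinator's global decision

    def prepare(self):
        # Returning None models a vote that never arrives within the time limit.
        if not self.responds:
            return None
        return Vote.COMMIT if self.will_commit else Vote.ABORT

    def decide(self, decision):
        self.outcome = decision


def two_phase_commit(participants):
    """Coordinator logic: returns the global decision as a Vote."""
    # Voting phase: collect one vote (or a timeout) from every participant.
    votes = [p.prepare() for p in participants]
    # Decision phase: commit only if every vote is COMMIT; an ABORT vote or
    # a missing vote forces a global abort.
    if all(v is Vote.COMMIT for v in votes):
        decision = Vote.COMMIT
    else:
        decision = Vote.ABORT
    for p in participants:
        p.decide(decision)
    return decision
```

A single abort vote or a single missing vote is enough to abort the transaction at every site, which is what keeps the global transaction atomic.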
TERMINATION PROTOCOLS FOR 2PC
 The termination protocol is invoked when the coordinator or a
participant fails to receive a message within the time limit.
 The action that has to be taken in this situation depends on
whether the coordinator or the participant has failed and when
the timeout has occurred.
Coordinator
 Timeout in the wait state
 Timeout in the commit or abort state
Participant
 Timeout in the initial state
 Timeout in the ready state
The state transition diagram for 2PC
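The timeout cases listed above name the states but not the actions. The lookup table below pairs each case with the action commonly described for 2PC in textbook treatments; these actions are an assumption added for illustration, not the slide author's wording, and the `Action` names and `on_timeout` function are hypothetical.

```python
from enum import Enum


class Action(Enum):
    GLOBAL_ABORT = "decide global abort"
    RESEND_DECISION = "resend the global decision to unacknowledged sites"
    UNILATERAL_ABORT = "abort unilaterally"
    BLOCK_AND_ASK = "stay blocked and ask the coordinator or another participant"


# (role, state at timeout) -> action. The keys follow the list of cases;
# the actions are the commonly described ones and are an assumption here.
TERMINATION_2PC = {
    # Not all votes arrived, so the coordinator can safely abort.
    ("coordinator", "wait"): Action.GLOBAL_ABORT,
    # The decision is already made; only acknowledgements are missing.
    ("coordinator", "commit"): Action.RESEND_DECISION,
    ("coordinator", "abort"): Action.RESEND_DECISION,
    # No vote has been sent yet, so the participant may abort on its own.
    ("participant", "initial"): Action.UNILATERAL_ABORT,
    # The participant voted commit and cannot decide alone.
    ("participant", "ready"): Action.BLOCK_AND_ASK,
}


def on_timeout(role, state):
    """Return the termination action for a timeout in the given state."""
    return TERMINATION_2PC[(role, state)]


def is_blocking(role, state):
    """True for the one case in which a site must wait for the outcome."""
    return on_timeout(role, state) is Action.BLOCK_AND_ASK
```

Only the participant's ready state is blocking in this table, which is the uncertainty period that 3PC is designed to remove.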
RECOVERY PROTOCOLS FOR 2PC
 Recovery protocols determine the actions to be taken by a failed
site on recovery, to maintain the atomicity and the consistency
property of transactions. The actions depend on the state of the
coordinator or the participant at the time of failure.
Coordinator failure
The following are the three possible cases of failure of the
coordinator.
 Failure in initial state: the coordinator has not yet
started the commit procedure. Therefore, it will start the commit
procedure on recovery.
 Failure in wait state: the coordinator has sent the
“prepare” message to participants. On recovery, the coordinator
will restart the commit procedure from the beginning; thus, it
will send the “prepare” message to all participants once more.
 Failure in commit or abort state: the coordinator
has sent the global decision to all participants. On restart, if the
coordinator has received all acknowledgements, it can complete
successfully. Otherwise, the coordinator will initiate the
termination protocol.
Participant failure
The following are the three possible cases of failure of the participant.
Failure in initial state: the participant has not yet voted to
commit the transaction. Hence, the participant can unilaterally abort
the transaction, because it failed before sending its
vote. In this case, the coordinator cannot make a global commit
decision without this participant’s vote.
Failure in ready state: the participant has sent its vote
to the coordinator. On recovery, the participant will invoke the
termination protocol.
Failure in commit or abort state: the participant has
completed the transaction; thus, on restart no further action is
necessary.
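The recovery cases for the coordinator and the participant can be collected into one function that maps the state at the time of failure to the action on restart. This is only a sketch of the rules stated above; the function name `recover_2pc`, the state strings and the `all_acks_received` flag are hypothetical.

```python
def recover_2pc(role, state, all_acks_received=False):
    """Return the action a recovering site takes, given its state at failure.

    role is "coordinator" or "participant"; state is the last state the
    site had reached before it failed.
    """
    if role == "coordinator":
        if state == "initial":
            # The commit procedure had not started yet.
            return "start the commit procedure"
        if state == "wait":
            # "prepare" was sent; start again and resend it to everyone.
            return "restart the commit procedure and resend prepare"
        if state in ("commit", "abort"):
            # The global decision was sent; only acknowledgements matter now.
            if all_acks_received:
                return "complete successfully"
            return "initiate the termination protocol"
    elif role == "participant":
        if state == "initial":
            # No vote was sent, so the coordinator cannot have committed.
            return "abort unilaterally"
        if state == "ready":
            # The vote was sent; the outcome must be learned from others.
            return "invoke the termination protocol"
        if state in ("commit", "abort"):
            # The transaction is already complete at this site.
            return "no further action"
    raise ValueError(f"unknown role/state: {role!r}, {state!r}")
```

For example, `recover_2pc("participant", "ready")` returns the termination-protocol action, since a participant that has already voted cannot decide on its own.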
COMMUNICATION SCHEMES FOR 2PC
 There are several communication schemes that can be
employed for implementing 2PC protocol:
 Centralized 2PC,
 Linear 2PC and
 Distributed 2PC.
Centralized 2PC Communication Scheme
Linear 2PC Communication Scheme
 In linear 2PC, participants can communicate with one another.
 In this communication scheme, an ordering is maintained among
the sites in the distributed system. Let us consider that the sites
are numbered as 1, 2, 3, ..., n such that site number 1 is the
coordinator and the others are participants.
 The 2PC is implemented by a forward chain of communication
from the coordinator to the participant n in the voting phase and
a backward chain of communication from participant n to the
coordinator in the decision phase.
 In the voting phase, the coordinator passes the voting
instruction to site 2, site 2 votes and passes it to site 3, site
3 combines its vote and passes it to site 4 and so on.
 When the nth participant adds its vote, the global decision
is obtained and it is passed backward to the participants,
and eventually back to the coordinator.
 The linear 2PC reduces the number of messages compared
to centralized 2PC but provides no parallelism, and
therefore suffers from poor response time.
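The forward and backward chains can be traced with a small simulation that also counts messages. It is a sketch under stated assumptions: `linear_2pc` and `centralized_messages` are hypothetical helpers, site 1 (the coordinator) is assumed to contribute a vote of its own, and the centralized count assumes four messages per participant (prepare, vote, decision, acknowledgement).

```python
def linear_2pc(votes):
    """Simulate linear 2PC over sites 1..n.

    votes[i] is True if site i+1 votes commit; site 1 (index 0) is the
    coordinator. Returns (decision, number_of_messages).
    """
    if not votes:
        raise ValueError("at least the coordinator must take part")
    messages = 0
    combined = votes[0]
    # Voting phase: each site passes the combined vote on to its successor,
    # which then adds its own vote.
    for vote in votes[1:]:
        messages += 1
        combined = combined and vote
    # Site n now knows the global decision.
    decision = "commit" if combined else "abort"
    # Decision phase: the decision travels back from site n to site 1.
    messages += len(votes) - 1
    return decision, messages


def centralized_messages(n_sites):
    """Message count for centralized 2PC with n_sites - 1 participants,
    assuming prepare, vote, decision and acknowledgement per participant."""
    return 4 * (n_sites - 1)
```

With four sites this gives 6 messages for the linear scheme against 12 for the centralized one under the stated counting assumption, but the 6 messages are strictly sequential, which is the source of the longer response time.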
Distributed 2PC Communication Scheme
 In distributed 2PC, all participants can communicate with
each other directly.
 In this communication scheme, the coordinator sends
“prepare” message to all participants in the voting phase.
 In the decision phase, each participant sends its decision to
all other participants and waits for messages from all other
participants.
 As the participants can reach a decision on their own, the
distributed 2PC eliminates the requirement for the decision
phase of the 2PC protocol.
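The way each participant reaches the decision on its own can be sketched as follows. This is an illustrative simplification: `distributed_2pc` is a hypothetical function, message passing is modelled with in-memory dictionaries, and no failures or timeouts are considered.

```python
def distributed_2pc(votes):
    """Simulate the exchange of votes in distributed 2PC.

    votes maps a participant name to True (commit) or False (abort).
    Returns a dict mapping each participant to its locally reached decision.
    """
    # Every participant sends its vote to every other participant.
    inbox = {name: {} for name in votes}
    for sender, vote in votes.items():
        for receiver in votes:
            if receiver != sender:
                inbox[receiver][sender] = vote
    # Each participant combines its own vote with the votes it received;
    # since all of them see the same set of votes, they agree.
    decisions = {}
    for name, own_vote in votes.items():
        all_commit = own_vote and all(inbox[name].values())
        decisions[name] = "commit" if all_commit else "abort"
    return decisions
```

Because every participant evaluates the same set of votes, no separate message carrying the global decision is needed in the failure-free case.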
Three-Phase Commit Protocol
The requirements for the 3PC protocol are as follows:
 Network partition should not occur.
 All sites should not fail simultaneously, that is, at least one
site must be available always.
 At the most k sites can fail simultaneously, where k is less
than the total number of sites in the distributed system.
 In 3PC, a new phase is introduced, known as precommit
phase.
 It is between the voting phase and the global decision
phase, for eliminating the uncertainty period for
participants that have voted commit and are waiting for the
global decision from the coordinator.
 In 3PC, if all participants vote for commit, the coordinator
sends a “global precommit” message to all participants.
 A participant who has received a “precommit” message
from the coordinator will definitely commit by itself, if it
has not failed.
 Each participant acknowledges the receipt of “precommit”
message to the coordinator and after receiving all
acknowledgements the coordinator sends a “global
commit” message to all participants.
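The message flow just described can be sketched for the failure-free case. This is a minimal sketch, not a definitive implementation: `Participant3PC`, its state names and `three_phase_commit` are hypothetical, and a missing precommit acknowledgement is deliberately left to the termination protocol rather than handled here.

```python
class Participant3PC:
    """Hypothetical participant that tracks its own 3PC state."""

    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit
        self.state = "initial"

    def prepare(self):
        # Voting phase: a commit vote moves the participant to "ready".
        self.state = "ready" if self.will_commit else "aborted"
        return self.will_commit

    def precommit(self):
        # Precommit phase: record the precommit and acknowledge it.
        self.state = "precommit"
        return True

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def three_phase_commit(participants):
    """Coordinator logic for the failure-free path of 3PC."""
    # Phase 1: voting.
    votes = [p.prepare() for p in participants]
    if not all(votes):
        for p in participants:
            p.abort()
        return "abort"
    # Phase 2: global precommit, acknowledged by every participant.
    acks = [p.precommit() for p in participants]
    if not all(acks):
        # Out of scope for this sketch (termination protocol).
        raise RuntimeError("missing precommit acknowledgement")
    # Phase 3: global commit, sent only after all acknowledgements.
    for p in participants:
        p.commit()
    return "commit"
```

The extra round trip means that a participant in the precommit state knows every site voted commit, so it is no longer uncertain about the outcome.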
TERMINATION PROTOCOLS FOR 3PC
 Coordinator
 Timeout in the wait state
 Timeout in the precommit state
 Timeout in the commit or abort state
 Participant
 Timeout in the initial state
 Timeout in the ready state
 Timeout in the precommit state
State Transition Diagram for 3PC
RECOVERY PROTOCOLS FOR 3PC
 Coordinator failure in wait state

 Coordinator failure in precommit state
 Participant failure in precommit state
