Chapter 5
Introduction to Transactions
A collection of operations that forms a single logical unit of work is called a
“transaction”.
A transaction is a unit of program execution that accesses and possibly updates various
data items.
It is usually initiated by a user program written in a high-level DML (SQL) or a
programming language (C, C++, Java).
A transaction is delimited by statements (or function calls) of the form begin transaction
and end transaction.
The transaction consists of all operations executed between the begin transaction and
the end transaction.
A transaction must see a consistent database.
During transaction execution the database may be inconsistent.
When the transaction is committed, the database must be consistent.
Two main issues to deal with:
Failures of various kinds, such as hardware failures and system crashes
Concurrent execution of multiple transactions
Process of Transaction
The transaction is executed as a series of reads and writes of database objects, which
are explained below:
Read Operation
To read a database object, it is first brought into main memory from disk, and then
its value is copied into a program variable as shown in figure.
Write Operation
To write a database object, an in-memory copy of the object is first modified and then
written to disk.
Example :
Suppose a bank employee transfers Rs 500 from A's account to B's account. This very
simple and small transaction involves several low-level tasks.
A’s Account
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B’s Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
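The transfer above can be sketched as a single database transaction. This is a minimal illustration using Python's standard sqlite3 module; the table name account, the starting balances and the helper transfer are assumptions made for the example and do not come from the notes.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to account `dst` as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # The debit has already been applied; raising undoes it.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

# Hypothetical starting balances for accounts A and B.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()

transfer(conn, "A", "B", 500)
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

Either both updates become visible or neither does: a second call such as transfer(conn, "A", "B", 5000) fails after the debit and leaves both balances unchanged.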
ACID Properties
A transaction is a very small unit of a program and it may contain several low level
tasks.
A transaction in a database system must maintain Atomicity, Consistency, Isolation,
and Durability, commonly known as the ACID properties.
If any error occurs in a transaction, then any changes already made will be
automatically rolled back.
Since the system was in a consistent state when the transaction was started, it
will once again be in a consistent state.
Example:
Assume that a transaction attempts to subtract 10 from A without altering B.
Because consistency is checked after each transaction, it is known that A + B = 100
before the transaction begins.
If the transaction removes 10 from A successfully, atomicity will be achieved.
However, a validation check will show that A + B = 90, which is inconsistent with the
rules of the database.
The entire transaction must be cancelled and the affected rows rolled back to their
pre-transaction state.
If there had been other constraints or triggers, every single change operation would have
been checked in the same way as above before the transaction was committed.
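The consistency check in this example can be sketched in a few lines. The sketch below is only an illustration: the starting values A = 60 and B = 40, the function run_transaction and the exception ConsistencyError are assumptions (the notes state only that A + B = 100).

```python
class ConsistencyError(Exception):
    """Raised when a transaction would violate the database invariant."""

def run_transaction(db, operations, invariant):
    """Apply (item, delta) operations; install them only if the invariant holds."""
    working = dict(db)              # changes are made on a private copy
    for item, delta in operations:
        working[item] += delta
    if not invariant(working):
        # Nothing was written to db, so discarding the copy is the rollback.
        raise ConsistencyError("invariant violated; transaction rolled back")
    db.update(working)              # commit

def total_is_100(state):
    return state["A"] + state["B"] == 100

db = {"A": 60, "B": 40}             # hypothetical values with A + B = 100

try:
    run_transaction(db, [("A", -10)], total_is_100)   # subtracts from A only
except ConsistencyError:
    pass
after_failed = dict(db)             # state after the rejected transaction

run_transaction(db, [("A", -10), ("B", +10)], total_is_100)
print(after_failed, db)
```

The first transaction is rejected and leaves the data untouched; the second one preserves A + B = 100 and is committed.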
I : Isolation (concurrent changes invisible)
The isolation property ensures that the concurrent execution of transactions
results in a system state that would be obtained if transactions were executed
serially, i.e., one after the other. Providing isolation is the main goal of concurrency
control.
It requires that multiple transactions occurring at the same time not impact each
other’s execution.
Note that the isolation property does not ensure which transaction will execute
first, simply that they will not interfere with each other.
Example Isolation failure :
Assume two transactions execute at the same time, each attempting to modify
the same data. One of the two must wait until the other completes in order to
maintain isolation.
Consider two transactions.
T1 transfers 10 from A to B.
T2 transfers 10 from B to A.
Combined, there are four actions:
T1 subtracts 10 from A.
T1 adds 10 to B.
T2 subtracts 10 from B.
T2 adds 10 to A.
If these operations are performed in order, isolation is maintained, although
T2 must wait.
Consider what happens if T1 fails half-way through.
The database eliminates T1's effects, and T2 sees only valid data.
By interleaving the transactions, the actual order of actions might be:
T1 subtracts 10 from A.
T2 subtracts 10 from B.
T2 adds 10 to A.
T1 adds 10 to B.
Again consider what happens if T1 fails halfway through. By the time T1 fails,
T2 has already modified A;
A cannot be restored to the value it had before T1 without leaving an invalid
database.
This is known as a write-write failure because two transactions attempted to
write to the same data field.
In a typical system, the problem would be resolved by reverting to the last
known good state, canceling the failed transaction T1,
and restarting the interrupted transaction T2 from the good state.
However, the changes are still queued in the disk buffer waiting to be committed to
the disk.
Power fails and the changes are lost.
The user assumes (understandably) that the changes have been made.
States of transaction:
Active : the initial state; the transaction stays in this state while it is executing.
Partially committed : the state a transaction enters after its final statement has been
executed.
Failed : the state a transaction enters when any of the checks made by the database
recovery system fails, so that normal execution can no longer proceed.
Aborted : the transaction has been rolled back and the database has been restored to
its state prior to the start of the transaction. The system then has two options:
◦ Restart : the transaction is restarted if it was aborted because of a hardware or
software error.
◦ Kill : the transaction is killed if it failed because of an internal logical error that can
be corrected only by rewriting the application program.
Committed : the state after successful completion of the transaction. All its effects are
now permanently established in the database.
We cannot abort or roll back a committed transaction.
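The states above can be sketched as a small state machine. The names State, ALLOWED and Transaction are illustrative, and the transition from partially committed to failed is an assumption taken from the usual textbook state diagram rather than from these notes.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"

# Which states may follow each state; committed and aborted are terminal.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},  # assumed edge to FAILED
    State.FAILED: {State.ABORTED},
    State.ABORTED: set(),
    State.COMMITTED: set(),
}

class Transaction:
    def __init__(self):
        self.state = State.ACTIVE   # every transaction starts out active

    def move_to(self, new_state):
        """Change state, refusing transitions that the diagram does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state

t = Transaction()
t.move_to(State.PARTIALLY_COMMITTED)
t.move_to(State.COMMITTED)
print(t.state.value)
```

Calling t.move_to(State.ABORTED) at this point raises ValueError, which mirrors the rule that a committed transaction cannot be aborted or rolled back.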
Serializability:
Serializability is the major correctness condition for concurrent transactions'
executions.
It is considered the highest level of isolation between transactions, and plays an
essential role in concurrency control.
As such it is supported in all general purpose database systems.
When multiple transactions are being executed by the operating system in a
multiprogramming environment, it is possible that the instructions of one
transaction are interleaved with those of some other transaction.
Schedule − A chronological execution sequence of transactions is called a schedule.
A schedule can have many transactions in it, each comprising a number of
instructions/tasks.
Serial Schedule − It is a schedule in which transactions are aligned in such a way that
one transaction is executed first. When the first transaction completes its cycle, then the
next transaction is executed. Transactions are ordered one after the other. This type of
schedule is called a serial schedule, as transactions are executed in a serial manner.
Basic Assumption – Each transaction preserves database consistency. Thus serial
execution of a set of transactions preserves database consistency.
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule.
When several concurrent transactions are trying to access the same data item, the
instructions within these concurrent transactions must be ordered in some way so that
there is no problem in accessing and releasing the shared data item.
Two aspects of Serializability
1. Conflict Serializability
2. View Serializability
We ignore operations other than read and write instructions, and we assume that
transactions may perform arbitrary computations on data in local buffers in between
reads and writes.
Our simplified schedules consist of only read and write instructions
Conflict Serializability
Two instructions of two different transactions may want to access the same data item
in order to perform a read/write operation.
Conflict Serializability deals with detecting whether the instructions are conflicting in
any way, and specifying the order in which these two instructions will be executed in
case there is any conflict.
Rules: Conflict Serializability
If two instructions of the two concurrent transactions are both for read operation, then
they are not in conflict, and can be allowed to take place in any order.
If one of the instructions wants to perform a read operation and the other instruction
wants to perform a write operation, then they are in conflict, hence their ordering is
important.
If the read instruction is performed first, then it reads the old value of the data item and
after the reading is over, the new value of the data item is written.
If the write instruction is performed first, then it updates the data item with the new
value and the read instruction reads the newly updated value.
If both instructions are write operations, then they are in conflict. The order does not
affect either transaction directly, because neither reads the value written by the other.
However, the value that persists in the data item after the schedule is over is the one
written by the instruction that performed the last write, so the order of the two writes
must be preserved.
It may happen that we may want to execute the same set of transaction in a different
schedule on another day.
Keeping in mind these rules, we may sometimes alter parts of one schedule (S1) to
create another schedule (S2) by swapping only the non-conflicting parts of the first
schedule.
The conflicting parts cannot be swapped in this way because the ordering of the
conflicting instructions is important and cannot be changed in any other schedule that
is derived from the first.
If these two schedules are made of the same set of transactions, then both S1 and S2
would yield the same result if the conflict resolution rules are maintained while creating
the new schedule. In that case the schedule S1 and S2 would be called Conflict
Equivalent.
Conflict Equivalence
Two operations are said to be conflicting if they have the following properties −
Both belong to separate transactions.
Both access the same data item.
At least one of them is a "write" operation.
Two schedules having multiple transactions with conflicting operations are said to be
conflict equivalent if and only if −
Both the schedules contain the same set of transactions.
The order of conflicting pairs of operations is maintained in both the schedules.
Conflict Serializability
Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there
exists some item Q accessed by both li and lj, and at least one of these instructions wrote
Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.
Intuitively, a conflict between li and lj forces a (logical) temporal order between them. If
li and lj are consecutive in a schedule and they do not conflict, their results would remain
the same even if they had been interchanged in the schedule.
If a schedule S can be transformed into a schedule S1 by a series of swaps of non-conflicting
instructions, we say that S and S1 are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial
schedule.
Example
Schedule 3 below can be transformed into Schedule 1, a serial schedule where T2 follows T1,
by series of swaps of non-conflicting instructions. Therefore Schedule 3 is conflict serializable
T1            T2
Read(A)
Write(A)
              Read(A)
              Write(A)
Read(B)
Write(B)
              Read(B)
              Write(B)
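One common way to test conflict serializability mechanically is a precedence graph: draw an edge Ti -> Tj whenever an instruction of Ti conflicts with a later instruction of Tj, and check the graph for cycles. The notes do not describe this test, so the sketch below is a substituted standard technique; the tuple encoding (transaction, action, item) and all function names are assumptions.

```python
def conflicts(op1, op2):
    """Two operations conflict if they come from different transactions,
    touch the same item, and at least one of them is a write."""
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return t1 != t2 and x1 == x2 and "W" in (a1, a2)

def precedence_edges(schedule):
    """Edge (Ti, Tj) for every conflicting pair where Ti's operation comes first."""
    edges = set()
    for i, first in enumerate(schedule):
        for second in schedule[i + 1:]:
            if conflicts(first, second):
                edges.add((first[0], second[0]))
    return edges

def has_cycle(nodes, edges):
    """Repeatedly remove nodes with no incoming edge; a cycle remains if we get stuck."""
    remaining, edges = set(nodes), set(edges)
    while remaining:
        free = [n for n in remaining if not any(dst == n for _, dst in edges)]
        if not free:
            return True
        remaining.difference_update(free)
        edges = {(s, d) for s, d in edges if s in remaining}
    return False

def is_conflict_serializable(schedule):
    nodes = {txn for txn, _, _ in schedule}
    return not has_cycle(nodes, precedence_edges(schedule))

# Schedule 3 from the example, written as (transaction, action, item).
schedule_3 = [
    ("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
    ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B"),
]
# A hypothetical schedule in which T2 writes A between T1's read and write of A.
bad_schedule = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]

print(is_conflict_serializable(schedule_3), is_conflict_serializable(bad_schedule))
```

For Schedule 3 every conflict points from T1 to T2, so the graph has no cycle and the schedule is equivalent to the serial order T1, T2. In the second schedule the conflicts point both ways, so no serial order preserves them.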
View Serializability
This is another type of serializability, in which a second schedule is derived from an
existing schedule involving the same set of transactions.
The two schedules are called view equivalent if the following rules are
followed while creating the second schedule out of the first.
Let S and S1 be two schedules with the same set of transactions. S and S1 are view
equivalent if the following three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then
transaction Ti must, in schedule S1, also read the initial value of Q.
2. For each data item Q if transaction Ti executes read(Q) in schedule S, and that value
was produced by transaction Tj (if any), then transaction Ti must in schedule S1 also
read the value of Q that was produced by transaction Tj .
3. For each data item Q, the transaction (if any) that performs the final write(Q)
operation in schedule S must perform the final write(Q) operation in schedule S1.
As can be seen, view equivalence is also based purely on reads and writes alone.
A schedule S is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also view serializable.
Schedule (from text) — a schedule which is view-serializable but not conflict
serializable.
T3            T4            T6
Read(Q)
              Write(Q)
Write(Q)
                            Write(Q)
Every view serializable schedule that is not conflict serializable has blind writes
Rules
If in S1, T1 reads the initial value of the data item, then in S2 also, T1 should read the
initial value of that same data item.
If in S1, T1 writes a value in the data item which is read by T2, then in S2 also, T1
should write the value in the data item before T2 reads it.
If in S1, T1 performs the final write operation on that data item, then in S2 also, T1
should perform the final write operation on that data item.
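The three rules can be checked mechanically by recording, for each schedule, which write every read sees and which transaction writes each item last. The sketch below is one possible encoding, not a definitive implementation; the tuple format (transaction, action, item) and the names view_profile and view_equivalent are assumptions.

```python
from collections import defaultdict

def view_profile(schedule):
    """Return (reads, final_writer) for a schedule.

    reads[T] lists, in T's own order, each (item, writer) pair that T read;
    writer is None when T read the initial value of the item.
    final_writer[item] is the transaction that performs the last write."""
    last_writer = {}
    reads = defaultdict(list)
    for txn, action, item in schedule:
        if action == "R":
            reads[txn].append((item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return dict(reads), last_writer

def view_equivalent(s1, s2):
    """Same operations, same reads-from relation, same final writes."""
    return sorted(s1) == sorted(s2) and view_profile(s1) == view_profile(s2)

# The schedule over T3, T4 and T6 from the notes.
schedule = [("T3", "R", "Q"), ("T4", "W", "Q"), ("T3", "W", "Q"), ("T6", "W", "Q")]
# Two serial orders of the same transactions.
serial_t3_t4_t6 = [("T3", "R", "Q"), ("T3", "W", "Q"), ("T4", "W", "Q"), ("T6", "W", "Q")]
serial_t4_t3_t6 = [("T4", "W", "Q"), ("T3", "R", "Q"), ("T3", "W", "Q"), ("T6", "W", "Q")]

print(view_equivalent(schedule, serial_t3_t4_t6))
print(view_equivalent(schedule, serial_t4_t3_t6))
```

The schedule is view equivalent to the serial order T3, T4, T6 because T3 reads the initial value of Q and T6 performs the final write in both. It is not view equivalent to T4, T3, T6, where T3 would read the value written by T4.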
Serializability Example:
Consider a schedule S in which there are 2 instructions I & J of transactions Ti & Tj
respectively (i ≠ j).
If I & J refer to different data items, then we can swap I & J without affecting the
results of any instruction in the schedule.
If I & J refer to the same data item Q, then the order of the two steps may matter.
I- read(Q), J- read(Q) : the order of I & J does not matter. The same value of Q is read by
Ti & Tj.
I- read(Q), J- write(Q) : if I comes before J, then Ti does not read the value of Q that is
written by Tj in instruction J. If J comes before I, then Ti reads the value of Q that is
written by Tj. Thus the order of I & J matters.
I- write(Q), J- read(Q) : the order of I & J matters, for the same reason as in the previous
case.
I- write(Q), J- write(Q) : since both instructions are write operations, the order of these
instructions does not affect either Ti or Tj. However, it determines the value of Q that is
left in the database, because only the later of the two writes is preserved.
I & J conflict if they are operations by different transactions on the same data item &
at least one of these instructions is a write operation.
The amount of data is sufficiently great that at any given time only a fraction of the data
can be in primary memory; the rest must be swapped in from secondary memory as
needed.
Even if the entire database can be present in primary memory, there may be multiple
processes.
Concurrency control
Several transactions execute concurrently in the database.
The isolation property may then no longer be preserved.
The system must therefore control the interaction among the concurrent transactions.
This control is achieved through one of a variety of mechanisms called “concurrency
control”.
Types of concurrency control techniques
The two-phase locking protocol
Timestamp-based protocol
Validation-based protocol
(Locking is a form of pessimistic concurrency control; validation is a form of
optimistic concurrency control.)
Shared (S) : If a transaction Ti has obtained a shared-mode lock on item Q, then Ti can
read, but cannot write, Q.
Exclusive (X) : If a transaction Ti has obtained an exclusive-mode lock on item Q, then
Ti can both read & write Q.
Compatibility Function :
        S        X
S       True     False
X       False    False
T2 : lock-S(A);
read(A);
unlock(A);
lock-S(B);
read(B);
unlock(B);
display(A+B);
Disadvantages of locking
• Lock management overhead.
• Deadlock detection/resolution.
• Concurrency is significantly lowered when congested nodes are locked.
• To allow a transaction to abort itself when mistakes occur, locks can’t be released until
the end of the transaction, thus concurrency is significantly lowered.
• (Most Important) Conflicts are rare. (We might get better performance by not locking,
and instead checking for conflicts at commit time.)
Granting of Locks
Before using a data item A, transaction requests lock for A from the lock manager.
If A is already locked and the existing lock is incompatible with the requested lock, the
transaction is delayed.
Otherwise, the lock is granted
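The granting rule can be sketched with the S/X compatibility function from earlier in this section. This is a deliberately simplified illustration: the class LockManager and its methods are hypothetical, a refused request just returns False instead of queueing the transaction, and lock upgrades are not handled.

```python
# Compatibility function: (mode already held, mode requested) -> compatible?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

class LockManager:
    def __init__(self):
        self.held = {}   # item -> list of (transaction, mode) currently granted

    def request(self, txn, item, mode):
        """Grant the lock and return True, or return False if txn must wait."""
        holders = self.held.setdefault(item, [])
        for other, other_mode in holders:
            if other != txn and not COMPATIBLE[(other_mode, mode)]:
                return False          # incompatible lock held: transaction is delayed
        holders.append((txn, mode))
        return True

    def release(self, txn, item):
        """Drop every lock that txn holds on item."""
        self.held[item] = [(t, m) for t, m in self.held.get(item, []) if t != txn]

lm = LockManager()
results = [
    lm.request("T1", "A", "S"),   # granted: A is free
    lm.request("T2", "A", "S"),   # granted: S is compatible with S
    lm.request("T3", "A", "X"),   # delayed: X is incompatible with S
]
print(results)
```

Once T1 and T2 release their shared locks on A, the same exclusive request from T3 is granted.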
Shrinking Phase : A transaction may release locks, but may not obtain any new locks.
Initially, a transaction is in the growing phase.
The transaction acquires locks as needed.
Once the transaction releases a lock, it enters the shrinking phase and can issue no more
lock requests.
The lock point is the moment at which the transaction moves from the growing phase to
the shrinking phase, that is, the point at which it obtains its final lock.
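The two-phase rule is easy to enforce in code: remember whether the transaction has released any lock and refuse further lock requests once it has. The class and function names below are illustrative assumptions, and lock modes are left out to keep the sketch short.

```python
class TwoPhaseViolation(Exception):
    """Raised when a lock is requested after the shrinking phase has begun."""

class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False      # False while in the growing phase

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseViolation(f"{self.name} requested {item} after releasing a lock")
        self.held.add(item)

    def unlock(self, item):
        self.held.remove(item)
        self.shrinking = True       # the first release ends the growing phase

def follows_two_phase(steps):
    """Replay (action, item) steps and report whether they obey the protocol."""
    txn = TwoPhaseTransaction("T")
    try:
        for action, item in steps:
            if action == "lock":
                txn.lock(item)
            else:
                txn.unlock(item)
    except TwoPhaseViolation:
        return False
    return True

# Lock steps of the T2 listing shown earlier: A is unlocked before B is locked.
t2_steps = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]
# The same locks, acquired together before anything is released.
two_phase_steps = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]

print(follows_two_phase(t2_steps), follows_two_phase(two_phase_steps))
```

The T2 listing earlier in these notes is not two-phase because it locks B after unlocking A. Moving both lock requests ahead of the first unlock makes it two-phase.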
1. a database record
2. a field value of a database record
3. a disk block
4. a whole file
5. the whole database
If a typical transaction accesses a small number of records it is advantageous that the
data item granularity is one record.
If a transaction typically accesses many records of the same file it is better to have block
or file granularity so that the transaction will consider all those records as one data item.
A too-fine granularity will increase the frequency of lock requests and lock releases,
which adds extra instructions to execute.
You must find a balance between a too-fine and a too-coarse granularity.
For example, if the data item size is a ‘Table’ denoted by Table1, a transaction T1 that
needs to lock a record X must lock the whole table Table1 that contains record X,
because the lock is associated with the whole data item, Table1. If another transaction
T2 wants to lock a different record Y of Table1, it is forced to wait till T1 releases the
lock on Table1. If the data item size is a single record, then transaction T2 would be able
to proceed, because it would lock a different data item.
Multiple Granularity
If Ti needs to access the entire database and the locking protocol requires it to lock each
item in the database, executing all these lock requests is time consuming.
A deadlock occurs when two transactions wait indefinitely for each other to unlock
data.
Every transaction in the set is waiting for another transaction in the set.
That is, there exists a set of waiting transactions {T0, T1, ..., Tn} such that T0 is waiting
for a data item that T1 holds, T1 is waiting for a data item that T2 holds, ..., Tn-1 is
waiting for a data item that Tn holds, and Tn is waiting for a data item that T0 holds.
The remedy is to roll back some of the transactions involved in the deadlock.
o The rollback of a transaction may be partial.
o A transaction may be rolled back to the point where it obtained a lock whose
release resolves the deadlock.
Two principal methods for dealing with the deadlock problem.
Deadlock Prevention Protocol – to ensure that the system will never enter a deadlock
state.
Allow the system to enter a deadlock state, & then try to recover by using Deadlock
detection and Deadlock recovery.
Prevention is used if the probability that the system would enter a deadlock state is
relatively high.
Deadlock Prevention
Two Approaches
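One approach ensures that no cyclic waits can occur by ordering the requests for locks
or by requiring all locks to be acquired together.
The other approach is closer to deadlock recovery and performs transaction rollback
instead of waiting for a lock.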
The first approach requires that each transaction locks all its data items before it begins
execution.
o Two Disadvantages
It is hard to predict, before the transaction begins , what data items need to be
locked.
Data item utilization may be very low: many data items may be locked but
unused for a long time.
Second approach – is to use preemption & transaction rollbacks.
In preemption, when transaction T2 requests a lock that transaction T1 holds, the lock
granted to T1 may be preempted by rolling back T1 and granting the lock to T2.
To control the preemption, we assign a unique timestamp to each transaction.
The system uses the timestamp only to decide whether a transaction should wait or
roll back.
If a transaction is rolled back, it retains its old timestamp when restarted.
Two different deadlock prevention schemes use timestamps:
o Wait-die : a non-preemptive scheme.
When Ti requests a data item held by Tj, Ti is allowed to wait only if its timestamp is
smaller than that of Tj (Ti is older than Tj); otherwise Ti is rolled back (dies).
E.g. : T2, T3, T4 have timestamps 5, 10, 15 respectively. If T2 requests a data item held
by T3, then T2 waits. If T4 requests a data item held by T3, then T4 is rolled back.
o Wound-wait : a preemptive scheme, the counterpart to the wait-die scheme.
When Ti requests a data item held by Tj, Ti is allowed to wait only if its timestamp is
larger than that of Tj (Ti is younger than Tj); otherwise Tj is rolled back (Tj is wounded
by Ti).
E.g. : T2, T3, T4 have timestamps 5, 10, 15 respectively. If T2 requests a data item held
by T3, then the data item is preempted from T3 and T3 is rolled back. If T4 requests a
data item held by T3, then T4 waits.
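Both schemes reduce to a single timestamp comparison, which the sketch below makes explicit. The function names and the returned strings are illustrative assumptions; a smaller timestamp means an older transaction, and timestamps are assumed to be unique.

```python
def wait_die(ts_requester, ts_holder):
    """Non-preemptive: an older requester waits, a younger requester dies."""
    if ts_requester < ts_holder:
        return "requester waits"
    return "requester rolled back"

def wound_wait(ts_requester, ts_holder):
    """Preemptive: an older requester wounds the holder, a younger requester waits."""
    if ts_requester < ts_holder:
        return "holder rolled back"
    return "requester waits"

# Timestamps from the example: T2 = 5, T3 = 10, T4 = 15, with T3 holding the item.
ts = {"T2": 5, "T3": 10, "T4": 15}

for requester in ("T2", "T4"):
    print(requester, "requests item held by T3:",
          wait_die(ts[requester], ts["T3"]), "|",
          wound_wait(ts[requester], ts["T3"]))
```

In both schemes it is always the younger transaction that is rolled back, and because a restarted transaction keeps its old timestamp it eventually becomes the oldest and is no longer rolled back.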
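Timeout-based scheme :
o A transaction that has requested a lock waits for at most a specified amount of time.
o If the lock has not been granted within that time, the transaction rolls itself back and
restarts.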
Deadlock Detection & Recovery
An algorithm that examines the state of the system is invoked periodically to determine
whether a deadlock has occurred.
If one has, then the system must attempt to recover from the deadlock.
o Maintain information about the current allocation of data items to transactions, as
well as any outstanding data item requests.
o Provide an algorithm that uses this information to determine whether the system has
entered a deadlock state.
o Recover from the deadlock when the detection algorithm determines that a deadlock
exists.
Deadlocks can be described in terms of a directed graph called a ‘wait-for graph’.
The graph consists of a pair G = (V, E).
The set of vertices V consists of all the transactions in the system.
If Ti -> Tj is in E, then there is a directed edge from transaction Ti to Tj, meaning that
Ti is waiting for Tj to release a data item that it needs.
When Ti requests a data item held by Tj, the edge Ti -> Tj is inserted in the wait-for
graph.
A deadlock exists in the system if and only if the wait-for graph contains a cycle.
Each transaction involved in the cycle is said to be deadlocked.
To detect deadlocks, the system needs to maintain the wait-for graph and periodically
invoke an algorithm that searches for a cycle in the graph.
Data items allocated to deadlocked transactions will be unavailable to other transactions
until the deadlock can be broken.
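Searching the wait-for graph for a cycle can be done with a depth-first search. The sketch below is one standard way to do it, not necessarily the algorithm a particular database uses; the dictionary encoding of the graph and the function name find_cycle are assumptions.

```python
def find_cycle(wait_for):
    """Return a list of transactions forming a cycle, or None if there is none.

    wait_for maps each transaction to the transactions it is waiting for."""
    WHITE, GREY, BLACK = 0, 1, 2     # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in wait_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:        # back edge: nxt is already on the current path
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(wait_for):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Hypothetical graphs: T1 -> T2 -> T3 -> T1 is a cycle; T4 only waits for T1.
deadlocked = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": ["T1"]}
no_deadlock = {"T1": ["T2"], "T2": ["T3"], "T3": []}

print(find_cycle(deadlocked), find_cycle(no_deadlock))
```

In the first graph T1, T2 and T3 are deadlocked, while T4 is merely waiting behind them. The second graph has no cycle, so no transaction is deadlocked.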
Deadlock Recovery :
When a detection algorithm determines that a deadlock exists, the system must recover
from the deadlock.
The most common solution is to roll back one or more transactions to break the deadlock.
Three actions are taken:
o Selection of a victim : determine which transaction(s) to roll back to break the
deadlock. Roll back those transactions that will incur the minimum cost. Many factors
determine the cost of a rollback:
• How long the transaction has computed, and how much longer it will compute before
it completes its task.
• How many data items the transaction has used.
• How many more data items the transaction needs to complete its task.
• How many transactions will be involved in the rollback.
o Rollback : determine how far the transaction should be rolled back.
Total rollback : abort the transaction and then restart it.
Partial rollback : roll the transaction back only as far as necessary to break the
deadlock. This requires the system to maintain additional information about the state
of all the running transactions.