Database
Chapter 2-ADBMS
Transaction processing systems are systems with large databases and hundreds of concurrent
users executing database transactions. Examples of such systems include airline reservations,
banking, credit card processing, online retail purchasing and many other applications. These
systems require high availability and fast response time for hundreds of concurrent users.
A DBMS can be classified by the number of users who can use the system concurrently.
A DBMS is single-user if at most one user at a time can use the system, and it is multiuser
if many users can use the system, and hence access the database, concurrently. Database
systems used in banks, insurance agencies, stock exchanges, supermarkets, and many other
applications are multiuser systems. In these systems, hundreds or thousands of users are typically
operating on the database by submitting transactions concurrently to the system.
Multiple users can access databases simultaneously because of the concept of multiprogramming,
which allows the operating system of the computer to execute multiple programs—or processes—
at the same time. A single central processing unit (CPU) can only execute at most one process at
a time. However, multiprogramming operating systems execute some commands from one
process, then suspend that process and execute some commands from the next process, and so on.
A process is resumed at the point where it was suspended whenever it gets its turn to use the CPU
again. Hence, concurrent execution of processes is actually interleaved. Interleaving keeps the
CPU busy when a process requires an input or output (I/O) operation, such as reading a block from
disk. The CPU is switched to execute another process rather than remaining idle during I/O time.
Interleaving also prevents a long process from delaying other processes. If the computer system
has multiple hardware processors (CPUs), parallel processing of multiple processes is possible.
What is a Transaction?
A transaction is a unit of program under execution that accesses and possibly updates various
data items.
For example, a transaction that transfers $50 from account A to account B consists of the
following steps:
1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Database operations that form a transaction can either be embedded within an application
program or they can be specified interactively via a high-level query language such as
SQL.
One way of specifying the transaction boundaries is by specifying explicit begin
transaction and end transaction statements in an application program.
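As a minimal sketch of explicit transaction boundaries, the following Python program uses the standard sqlite3 module. The account table, its columns, and the transfer function are hypothetical names chosen for illustration; the insufficient-funds check is likewise an assumption, not something the notes prescribe.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one transaction."""
    cur = conn.cursor()
    cur.execute("BEGIN")  # explicit begin-transaction boundary
    try:
        cur.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                    (amount, src))
        cur.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                    (amount, dst))
        (balance,) = cur.execute(
            "SELECT balance FROM account WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        cur.execute("COMMIT")  # explicit end-transaction boundary
        return True
    except Exception:
        cur.execute("ROLLBACK")  # undo every update made since BEGIN
        return False

def demo():
    # isolation_level=None: the module issues no implicit BEGIN,
    # so the boundaries above are the only ones.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 200)])
    ok_small = transfer(conn, "A", "B", 50)    # succeeds and commits
    ok_large = transfer(conn, "A", "B", 500)   # overdraws A, so it is rolled back
    balances = dict(conn.execute("SELECT name, balance FROM account"))
    return ok_small, ok_large, balances
```

Calling demo() runs one transfer that commits and one that is rolled back, so the second transfer leaves no trace in the table.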
A successful transaction changes the database from one consistent state to another. A
consistent database state is one in which all data integrity constraints are satisfied. To
ensure consistency of the database, every transaction must begin with the database in a
known consistent state and end in a consistent state.
Two main issues to deal with:
→ Failures of various kinds, such as hardware failures and system crashes
→ Concurrent execution of multiple transactions
Although a SELECT query in SQL does not make any changes in the table, the SQL code
represents a transaction because it accesses the database. If the database existed in a consistent
state before the access, the database remains in a consistent state after the access because the
transaction did not alter the database. A transaction may consist of a single SQL statement or
a collection of related SQL statements.
If the database operations in a transaction do not update the database but only retrieve data, the
transaction is called a read-only transaction; otherwise it is known as a read-write transaction.
Note: By default, MS Access does not support transaction management as discussed here. More
sophisticated DBMSs, such as Oracle, SQL Server, and DB2, do support the transaction
management components discussed in this chapter.
For recovery purposes, the system needs to keep track of when the transaction starts,
terminates, and commits or aborts.
The objective of concurrency control is to ensure the
serializability of transactions in a multiuser database environment. Concurrency control is
important because the simultaneous execution of transactions over a shared database can create
several data integrity and consistency problems. The three main problems are lost updates,
uncommitted data, and inconsistent retrievals.
1. Lost Update
The lost update problem occurs when two concurrent transactions, T1 and T2, are updating the
same data element and one of the updates is lost (overwritten by the other transaction).
Assume that you have a product whose current quantity value is 35. Also assume that
two concurrent transactions, T1 and T2, occur that update the quantity value for some item in the
database. The transactions are as below:
Transaction              Computation
T1: add 100 units        quantity = quantity + 100
T2: subtract 30 units    quantity = quantity - 30
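The following table shows the serial execution of those transactions under normal circumstances,
yielding the correct answer quantity = 105.
Time  Transaction  Step                    Stored value
1     T1           Read quantity           35
2     T1           quantity = 35 + 100
3     T1           Write quantity          135
4     T2           Read quantity           135
5     T2           quantity = 135 - 30
6     T2           Write quantity          105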
But suppose that a transaction is able to read a product’s quantity value from the table before a
previous transaction (using the same product) has been committed. The sequence depicted below
shows how the lost update problem can arise. Note that the first transaction (T1) has not yet been
committed when the second transaction (T2) is executed.
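Time  Transaction  Step                    Stored value
1     T1           Read quantity           35
2     T2           Read quantity           35
3     T1           quantity = 35 + 100
4     T2           quantity = 35 - 30
5     T1           Write quantity          135
6     T2           Write quantity          5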
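The lost update can be replayed with a small simulation. This is only a sketch: the run function and the step encoding are hypothetical, and the order of the two final writes (T1 first, then T2) is an assumption about the interleaving.

```python
def run(schedule, stored):
    """Replay (transaction, action, argument) steps against one stored value."""
    local = {}  # each transaction's private working copy of quantity
    for txn, action, arg in schedule:
        if action == "read":
            local[txn] = stored
        elif action == "add":
            local[txn] += arg
        elif action == "write":
            stored = local[txn]
    return stored

T1 = [("T1", "read", None), ("T1", "add", 100), ("T1", "write", None)]
T2 = [("T2", "read", None), ("T2", "add", -30), ("T2", "write", None)]

serial = T1 + T2
# Both transactions read 35 before either one writes.
interleaved = [T1[0], T2[0], T1[1], T2[1], T1[2], T2[2]]
```

With a starting quantity of 35, run(serial, 35) gives 105, while run(interleaved, 35) gives 5 because T2 overwrites the value written by T1.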
2. Uncommitted Data
The phenomenon of uncommitted data occurs when two transactions, T1 and T2, are executed
concurrently and the first transaction (T1) is rolled back after the second transaction (T2) has
already accessed the uncommitted data—thus violating the isolation property of transactions. To
illustrate that possibility, let’s use the same transactions described during the lost updates
discussion. T1 is forced to roll back due to an error. T1 transaction is rolled back to eliminate the
addition of the 100 units. Because T2 subtracts 30 from the original 35 units, the correct answer
should be 5.
The following table shows the serial execution of those transactions under normal circumstances,
yielding the correct answer quantity = 5.
Time  Transaction  Step                    Stored value
1     T1           Read quantity           35
2     T1           quantity = 35 + 100
3     T1           Write quantity          135
4     T1           Rollback                35
5     T2           Read quantity           35
6     T2           quantity = 35 - 30
7     T2           Write quantity          5
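The following table shows how the uncommitted data problem can arise when the ROLLBACK is
completed after T2 has begun its execution.
Time  Transaction  Step                          Stored value
1     T1           Read quantity                 35
2     T1           quantity = 35 + 100
3     T1           Write quantity                135
4     T2           Read quantity (uncommitted)   135
5     T2           quantity = 135 - 30
6     T1           ROLLBACK                      35
7     T2           Write quantity                105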
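The same scenario can be traced step by step in code. This is a sketch under the assumption that a rollback simply restores the value stored before T1 started; dirty_read_demo and before_image are hypothetical names.

```python
def dirty_read_demo(initial=35):
    """Trace the uncommitted data problem for one stored quantity."""
    stored = initial
    before_image = stored   # value T1 must restore if it rolls back
    t1 = stored             # time 1: T1 reads 35
    t1 += 100               # time 2: T1 computes 135
    stored = t1             # time 3: T1 writes 135 (not yet committed)
    t2 = stored             # time 4: T2 reads the uncommitted 135
    t2 -= 30                # time 5: T2 computes 105
    stored = before_image   # time 6: T1 rolls back, restoring 35
    stored = t2             # time 7: T2 writes 105
    return stored
```

The function returns 105 for a starting quantity of 35, whereas the correct result after the rollback of T1 is 5.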
3. Inconsistent Retrievals
Inconsistent retrievals occur when a transaction accesses data before and after one or more other
transactions finish working with such data. For example, an inconsistent retrieval would occur
if transaction T1 calculated some summary (aggregate) function over a set of data while another
transaction (T2) was updating the same data. The problem is that the transaction might read some
data before they are changed and other data after they are changed, thereby yielding inconsistent
results.
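To illustrate the problem, assume the following conditions:
1. T1 calculates the total quantity of the products stored in the PRODUCT table.
2. At the same time, T2 updates the quantity for two of the items in the PRODUCT table.
Transaction 1 (T1):
    SELECT SUM(quantity) FROM product;
Transaction 2 (T2):
    UPDATE product SET quantity = quantity + 10 WHERE pid = 1003;
    UPDATE product SET quantity = quantity - 10 WHERE pid = 1004;
    COMMIT;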
The following table shows the result of executing those transactions serially under normal
circumstances:
pid     Before   After
1001    8        8
1002    32       32
1003    15       25 (15 + 10)
1004    23       13 (23 - 10)
1005    8        8
Total   86       86
The table below demonstrates that inconsistent retrievals are possible during the transaction
execution, making the result of T1’s execution incorrect.
Time  Transaction  Step
4     T2           quantity = quantity + 10
9     T2           quantity = quantity - 10
11    T2           Commit
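A short simulation shows how the wrong total can come about. It is a sketch under an assumed interleaving: T1 reads the rows in pid order, T2 changes product 1003 just before T1 reads it, and T2 changes product 1004 only after T1 has already read it. The function name and the updates_at encoding are hypothetical.

```python
product = {1001: 8, 1002: 32, 1003: 15, 1004: 23, 1005: 8}

def sum_with_interleaving(table, updates_at):
    """T1 sums the rows in pid order. updates_at maps a pid to the T2
    updates (target pid, delta) applied just before T1 reads that pid."""
    table = dict(table)  # work on a copy so the input stays unchanged
    total = 0
    for pid in sorted(table):
        for target, delta in updates_at.get(pid, []):
            table[target] += delta
        total += table[pid]
    return total, sum(table.values())

# T2 adds 10 to 1003 before T1 reads it, and subtracts 10 from 1004
# after T1 has read it (just before T1 reads 1005).
interleaved = {1003: [(1003, 10)], 1005: [(1004, -10)]}
```

sum_with_interleaving(product, interleaved) returns (96, 86): T1 reports 96 although the table total is 86 both before and after T2. With no interleaved updates the result is (86, 86).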
The Unrepeatable Read Problem. A transaction T1 reads the same item twice and the item is
changed by another transaction T2 between the two reads. Hence, T1 receives different values for
its two reads of the same item. This may occur, for example, if during an airline reservation
transaction, a customer inquires about seat availability on several flights. When the customer
decides on a particular flight, the transaction then reads the number of seats on that flight a second
time before completing the reservation, and it may end up reading a different value for the item.
Whenever a transaction is submitted to a DBMS for execution, the system is responsible for
making sure that either all the operations in the transaction are completed successfully and their
effect is recorded permanently in the database, or that the transaction does not have any effect on
the database or any other transactions. In the first case, the transaction is said to be committed,
whereas in the second case, the transaction is aborted. Database recovery restores a database from
a given state (usually inconsistent) to a previously consistent state after a failure. It is a service
provided by the DBMS to ensure that the database is reliable and remains in a consistent state in
the presence of failure.
Types of Failures
Failures are generally classified as transaction, system, and media failures. There are several
possible reasons for a transaction to fail in the middle of execution:
1. A computer failure (system crash): A hardware, software, or network error occurs in the
computer system during transaction execution. Hardware crashes are usually media failures—
for example, main memory failure.
2. A transaction or system error: Some operations in the transaction such as integer overflow
or division by zero or logical programming error may cause it to fail.
3. Local errors or exception conditions detected by the transaction: Certain conditions may
occur that necessitate cancellation of the transaction. (For example, an exception condition such
as insufficient account balance in a banking database may cause a transaction such as a fund
withdrawal to be canceled.) This exception should be programmed in the transaction itself and
hence would not be considered a failure.
4. Concurrency Control Enforcement: the concurrency control method may decide to abort the
transaction, to be restarted later, because it violates serializability or because several
transactions are in a state of deadlock.
5. Disk failure: Some disk blocks may lose their data because of a read or write malfunction or
because of a disk read/write head crash. This may happen during a read or a write operation of
the transaction.
6. Physical problems and catastrophes: This refers to an endless list of problems that includes
power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake.
Transaction States:
1. Active state
2. Partially committed state
3. Committed state
4. Failed state
5. Terminated State
1. A transaction goes into the active state immediately after it starts execution, where it can
issue READ or WRITE operations.
2. When the transaction ends, it moves to the partially committed state, where a recovery
protocol checks that a system failure will not result in an inability to record the changes of
the transaction permanently. (Changes are recorded in the transaction log.)
3. If this check is successful, the transaction reaches its commit point and enters the
committed state. All its changes must then be recorded permanently in the database.
4. A transaction can go to the failed state from the partially committed state if any of the
checks fails, or from the active state if it is aborted.
a. The transaction may then have to be rolled back to undo the effect of its WRITE
operations.
5. The terminated state corresponds to the transaction leaving the system.
6. Failed or aborted transactions may be restarted later, either automatically or after being
resubmitted by the user.
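The state transitions described above can be sketched as a small state machine. The ALLOWED table and the Transaction class are hypothetical names; the sketch only encodes which moves between states are legal.

```python
# Legal moves between transaction states, as described in the list above.
ALLOWED = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "committed": {"terminated"},
    "failed": {"terminated"},
    "terminated": set(),
}

class Transaction:
    def __init__(self):
        self.state = "active"  # a transaction starts in the active state

    def move_to(self, new_state):
        """Change state, rejecting any transition the diagram does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        self.state = new_state
```

For example, a transaction may go from active to partially committed to committed to terminated, but an attempt to jump directly from active to committed raises ValueError.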
Transaction Properties
Each individual transaction must ensure atomicity, consistency, isolation, and durability.
These properties are sometimes referred to as the ACID test. In addition, when executing multiple
transactions, the DBMS must schedule the concurrent execution of the transactions’ operations.
The schedule of such operations must exhibit the property of serializability.
Atomicity requires that all operations (SQL requests) of a transaction be completed; if not,
the transaction is aborted. If a transaction T1 has four SQL requests, all four requests must be
successfully completed; otherwise, the entire transaction is aborted. In other words, a
transaction is treated as a single, indivisible, logical unit of work.
Consistency indicates the permanence of the database’s consistent state. A transaction takes
a database from one consistent state to another consistent state. When a transaction is
completed, the database must be in a consistent state; if any of the transaction parts violates
an integrity constraint, the entire transaction is aborted.
Isolation means that the data used during the execution of a transaction cannot be used by a
second transaction until the first one is completed. In other words, if a transaction T1 is being
executed and is using the data item X, that data item cannot be accessed by any other
transaction (T2, ..., Tn) until T1 ends. This property is particularly useful in multiuser database
environments because several users can access and update the database at the same time.
Durability ensures that once transaction changes are done (committed), they cannot be
undone or lost, even in the event of a system failure.
Serializability ensures that the schedule for the concurrent execution of the transactions
yields consistent results. This property is important in multiuser and distributed databases,
where multiple transactions are likely to be executed concurrently. Naturally, if only a single
transaction is executed, serializability is not an issue.
A single-user database system automatically ensures serializability and isolation of the database
because only one transaction is executed at a time. The atomicity, consistency, and durability of
transactions must be guaranteed by the single-user DBMSs. (Even a single-user DBMS must
manage recovery from errors created by operating-system-induced interruptions, power
interruptions, and improper application execution.)
Multiuser databases are typically subject to multiple concurrent transactions. Therefore, the
multiuser DBMS must implement controls to ensure serializability and isolation of transactions in
addition to atomicity and durability—to guard the database’s consistency and integrity. For
example, if several concurrent transactions are executed over the same data set and the second
transaction updates the database before the first transaction is finished, the isolation property is
violated and the database is no longer consistent. The DBMS must manage the transactions by
using concurrency control techniques to avoid such undesirable situations.
A DBMS uses a transaction log to keep track of all transactions that update the database. The
information stored in this log is used by the DBMS for a recovery requirement triggered by a
ROLLBACK statement, a program’s abnormal termination, or a system failure such as a network
discrepancy or a disk crash. Some RDBMSs use the transaction log to recover a database forward
to a currently consistent state. After a server failure, for example, Oracle automatically rolls back
uncommitted transactions and rolls forward transactions that were committed but not yet written
to the physical database. This behaviour is required for transactional correctness and is typical of
any transactional DBMS. Although using a transaction log increases the processing overhead of a
DBMS, the ability to restore a corrupted database is worth the price.
While the DBMS executes transactions that modify the database, it also automatically updates the
transaction log. The transaction log stores:
The transaction log is a critical part of the database and it is usually implemented as one or
more files that are managed separately from the actual database files. The transaction log is subject
to common dangers such as disk-full conditions and disk crashes. Because the transaction log
contains some of the most critical data in a DBMS, some implementations support logs on several
different disks to reduce the consequences of a system failure.
The Scheduler
Schedule: a sequence of instructions that specifies the chronological order in which instructions
of concurrent transactions are executed. When transactions are executing concurrently in an
interleaved fashion, the order of execution of operations from the various transactions forms what
is known as a transaction schedule (or history).
– A schedule for a set of transactions must consist of all instructions of those transactions.
– It must preserve the order in which the instructions appear in each individual transaction.
– A transaction that successfully completes its execution will have a commit instruction
as the last statement.
– A transaction that fails to successfully complete its execution will have an abort
instruction as the last statement.
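The first two requirements on a schedule can be checked mechanically. The sketch below assumes a schedule is a list of (transaction, operation) pairs and each transaction is a list of operation strings; is_valid_schedule is a hypothetical helper, not a DBMS function.

```python
def is_valid_schedule(schedule, transactions):
    """Return True if the schedule contains exactly the instructions of the
    given transactions and keeps each transaction's own instruction order."""
    if any(txn not in transactions for txn, _ in schedule):
        return False  # the schedule mentions an unknown transaction
    for txn, ops in transactions.items():
        projected = [op for t, op in schedule if t == txn]
        if projected != ops:
            return False  # an instruction is missing, extra, or reordered
    return True

transactions = {
    "T1": ["R(A)", "W(A)", "commit"],
    "T2": ["R(A)", "W(A)", "commit"],
}
interleaved = [("T1", "R(A)"), ("T2", "R(A)"), ("T1", "W(A)"),
               ("T1", "commit"), ("T2", "W(A)"), ("T2", "commit")]
reordered = [("T1", "W(A)"), ("T1", "R(A)"), ("T1", "commit"),
             ("T2", "R(A)"), ("T2", "W(A)"), ("T2", "commit")]
```

Here interleaved is a valid schedule, while reordered is not, because it swaps the read and write of T1.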
As long as two transactions, T1 and T2, access unrelated data, there is no conflict and the order
of execution is irrelevant to the final outcome. But if the transactions operate on related (or the
same) data, conflict is possible among the transaction components and the selection of one
execution order over another might have some undesirable consequences. So how is the correct
order determined, and who determines that order? Fortunately, the DBMS handles that tricky
assignment by using a built-in scheduler.
The scheduler is a special DBMS process that establishes the order in which the operations
within concurrent transactions are executed. The scheduler interleaves the execution of database
operations to ensure serializability and isolation of transactions. To determine the appropriate
order, the scheduler bases its actions on concurrency control algorithms, such as locking or time
stamping methods.
– The scheduler also makes sure that the computer’s central processing unit (CPU) and
storage systems are used efficiently.
– Additionally, the scheduler facilitates data isolation to ensure that two transactions do not
update the same data element at the same time.
Transactions→   T1      T2      Result
Operations      Read    Read    No conflict
                Read    Write   Conflict
                Write   Read    Conflict
                Write   Write   Conflict
– Several methods have been proposed to schedule the execution of conflicting operations
in concurrent transactions.
Serial Schedule: A schedule S is serial if, for every transaction T participating in the schedule, all
the operations of T are executed consecutively in the schedule.
– E.g.: Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.

Serial schedule 1 (T1 followed by T2):
T1               T2
Read(A)
A := A - 50
Write(A)
Read(B)
B := B + 50
Write(B)
                 Read(A)
                 temp := A * 0.1
                 A := A - temp
                 Write(A)
                 Read(B)
                 B := B + temp
                 Write(B)

Serial schedule 2 (T2 followed by T1):
T1               T2
                 Read(A)
                 temp := A * 0.1
                 A := A - temp
                 Write(A)
                 Read(B)
                 B := B + temp
                 Write(B)
Read(A)
A := A - 50
Write(A)
Read(B)
B := B + 50
Write(B)
– A serial schedule never leaves the database in an inconsistent state, so every serial schedule is
considered correct.
– However, serial schedules limit concurrency: if a transaction waits for an I/O operation to
complete, the CPU cannot be switched to another transaction.
– If some transaction T is long, the other transactions must wait for T to complete all its
operations.
– E.g.: Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
Two non-serial schedules with interleaved operations of T1 and T2 are listed below.

Non-serial schedule I:
T1               T2
Read(A)
A := A - 50
Write(A)
                 Read(A)
                 temp := A * 0.1
                 A := A - temp
                 Write(A)
Read(B)
B := B + 50
Write(B)
                 Read(B)
                 B := B + temp
                 Write(B)

Non-serial schedule II:
T1               T2
Read(A)
A := A - 50
                 Read(A)
                 temp := A * 0.1
                 A := A - temp
                 Write(A)
                 Read(B)
Write(A)
Read(B)
B := B + 50
Write(B)
                 B := B + temp
                 Write(B)
Non-serial schedule I (above) gives a correct result, equal to that of the serial schedule, whereas
non-serial schedule II gives an erroneous result. Determining which non-serial schedules
always give a correct result and which may give erroneous results makes it possible to interleave
the operations of transactions safely. The concept used to characterize schedules in this manner
is that of serializability of a schedule.
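The difference between the schedules can be checked by replaying them. The sketch below assumes starting balances A = 1000 and B = 2000 and uses integer division for the 10% computation; both are assumptions for illustration, as are the run function and the tuple encoding of the operations.

```python
T1 = [("read", "A"), ("calc", "A", lambda v: v["A"] - 50), ("write", "A"),
      ("read", "B"), ("calc", "B", lambda v: v["B"] + 50), ("write", "B")]
T2 = [("read", "A"), ("calc", "temp", lambda v: v["A"] // 10),
      ("calc", "A", lambda v: v["A"] - v["temp"]), ("write", "A"),
      ("read", "B"), ("calc", "B", lambda v: v["B"] + v["temp"]), ("write", "B")]
PROGRAMS = {"T1": T1, "T2": T2}

def run(order, db):
    """Execute the next operation of each transaction named in `order`."""
    db = dict(db)
    local = {t: {} for t in PROGRAMS}   # private variables per transaction
    pc = {t: 0 for t in PROGRAMS}       # index of each transaction's next step
    for t in order:
        kind, name, *rest = PROGRAMS[t][pc[t]]
        pc[t] += 1
        if kind == "read":
            local[t][name] = db[name]
        elif kind == "write":
            db[name] = local[t][name]
        else:  # "calc": compute a new local value
            local[t][name] = rest[0](local[t])
    return db

start = {"A": 1000, "B": 2000}
serial_1 = ["T1"] * 6 + ["T2"] * 7
non_serial_I = ["T1"] * 3 + ["T2"] * 4 + ["T1"] * 3 + ["T2"] * 3
non_serial_II = ["T1"] * 2 + ["T2"] * 5 + ["T1"] * 4 + ["T2"] * 2
```

Under these assumptions serial schedule 1 and non-serial schedule I both end with A = 855 and B = 2145, so A + B is still 3000. Non-serial schedule II ends with A = 950 and B = 2100, a total of 3050, which shows that the database is left inconsistent.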
1. Result Equivalency
2. Conflict Equivalency
3. View Equivalence
– Two schedules are called result equivalent if they produce the same final state of the
database. However, two different schedules may accidentally produce the same final state.
Hence result equivalence is not used to define equivalence of schedules.
– Two schedules are said to be conflict equivalent if the order of any two conflicting
operations is the same in both schedules. If a schedule S can be transformed into a schedule
S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict
equivalent.
To check whether two schedules are conflict equivalent, follow the step given below:
– For any two given schedules, say S1 and S2, if the order of every pair of conflicting
operations is the same in both, then they are said to be conflict equivalent.
(Tables: schedules S1, S2, and S3 for transactions T1 and T2)
So S1 and S3 are conflict equivalent.
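The conflict equivalence test can be sketched in code. The sketch assumes a schedule is a list of (transaction, action, item) triples with action "R" or "W"; the function names and the three example schedules (the read and write steps of the transfer example) are chosen for illustration.

```python
from itertools import combinations

def conflict_pairs(schedule):
    """Return the set of ordered pairs (a, b) of conflicting operations,
    where a appears before b in the schedule."""
    seen = {}
    tagged = []
    for op in schedule:
        seen[op] = seen.get(op, 0) + 1
        tagged.append(op + (seen[op],))  # tell repeated operations apart
    pairs = set()
    for a, b in combinations(tagged, 2):  # a always precedes b
        different_txn = a[0] != b[0]
        same_item = a[2] == b[2]
        one_write = "W" in (a[1], b[1])
        if different_txn and same_item and one_write:
            pairs.add((a, b))
    return pairs

def conflict_equivalent(s1, s2):
    """Same operations, and every conflicting pair in the same order."""
    return sorted(s1) == sorted(s2) and conflict_pairs(s1) == conflict_pairs(s2)

serial = [("T1", "R", "A"), ("T1", "W", "A"), ("T1", "R", "B"), ("T1", "W", "B"),
          ("T2", "R", "A"), ("T2", "W", "A"), ("T2", "R", "B"), ("T2", "W", "B")]
schedule_I = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
              ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]
schedule_II = [("T1", "R", "A"), ("T2", "R", "A"), ("T2", "W", "A"), ("T2", "R", "B"),
               ("T1", "W", "A"), ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "W", "B")]
```

With these definitions schedule_I is conflict equivalent to the serial schedule, while schedule_II is not.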
Note:
– A Serializable Schedule gives the benefits of concurrent execution without giving up any
correctness.
– In practice, it is difficult to test for serializability. It is also impractical to execute the
schedule, then test the result for serializability, and then cancel the effect of the schedule if
it is not serializable. The approach taken in most commercial DBMSs is to design
protocols that will ensure serializability of all schedules.
The algorithm for testing conflict serializability looks only at the read_item and write_item
operations in a schedule to construct a precedence graph (or serialization graph), which is a
directed graph G = (N, E) that consists of a set of nodes N = {T1, T2, ..., Tn} and a set of directed
edges E = {e1, e2, ..., em}. There is one node in the graph for each transaction Ti in the schedule.
Each edge ei in the graph is of the form (Tj → Tk), 1 ≤ j ≤ n, 1 ≤ k ≤ n, where Tj is the starting
node of ei and Tk is the ending node of ei. Such an edge from node Tj to node Tk is created by the
algorithm if one of the operations in Tj appears in the schedule before some conflicting operation
in Tk. If the precedence graph has no cycle, the schedule is conflict serializable.
(A cycle is a sequence of edges that starts and ends at the same node.)
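A minimal sketch of the precedence graph construction and the cycle test follows. Schedules are assumed to be lists of (transaction, action, item) triples; the function names and both example schedules are hypothetical.

```python
def precedence_graph(schedule):
    """Return the set of edges (Tj, Tk) such that an operation of Tj
    precedes a conflicting operation of Tk."""
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # node -> "visiting" while on the current path, then "done"

    def visit(u):
        state[u] = "visiting"
        for v in graph[u]:
            if state.get(v) == "visiting":
                return True  # reached a node on the current path again
            if v not in state and visit(v):
                return True
        state[u] = "done"
        return False

    return any(visit(u) for u in graph if u not in state)

acyclic = [("T2", "R", "X"), ("T3", "R", "X"), ("T3", "W", "X"),
           ("T1", "R", "X"), ("T1", "W", "X")]
cyclic = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
```

For the first schedule the edges are T2 → T3, T2 → T1, and T3 → T1, and there is no cycle. The second schedule yields both T1 → T2 and T2 → T1, so it is not conflict serializable.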
Topological Sorting
The process of ordering the nodes of an acyclic graph is known as topological sorting. We use
topological sorting to find the equivalent serial schedule for a conflict serializable schedule.
1. Consider the in-degree of the nodes, where the in-degree of a node is the number of edges
coming into it. Find the nodes with in-degree zero, that is, the vertices with no incoming
edges. (Example graph: T1 → T2.)
2. Note down a node whose in-degree is zero, here T1, and then ignore it.
3. Ignore all edges that are connected with T1.
4. Consider the remaining nodes and repeat the above steps. Here only T2 remains, and its
in-degree is now zero. So the possible equivalent serial schedule is T1, T2.
Exercise:1
Consider the following graph for three transactions and specify whether it supports a conflict
serializable schedule, and write the equivalent serial schedule.
(Precedence graph with nodes T1, T2, and T3; the in-degrees below correspond to the edges
T1 → T2, T1 → T3, and T2 → T3.)
Step 1:
In-degree of T1 = 0
In-degree of T2 = 1
In-degree of T3 = 2
Therefore consider T1 first and delete/ignore T1 and all edges from T1.
Step 2:
In-degree of T2 = 0
In-degree of T3 = 1
Therefore consider T2 next and delete/ignore T2 and all edges from T2.
Step 3:
In-degree of T3 = 0
Therefore consider T3.
Hence the serial schedule equivalent to the conflict serializable schedule is T1, T2, T3.
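The in-degree procedure used in this exercise can be sketched as follows. The edge set T1 → T2, T1 → T3, T2 → T3 is inferred from the in-degrees listed in the steps, and serial_order is a hypothetical function name.

```python
def serial_order(nodes, edges):
    """Topological sort by repeatedly removing a node with in-degree zero.
    Returns None if the graph contains a cycle."""
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1
    remaining = set(edges)
    order = []
    ready = sorted(n for n in nodes if indegree[n] == 0)
    while ready:
        n = ready.pop(0)
        order.append(n)  # note the node down, then ignore it
        for u, v in sorted(remaining):
            if u == n:   # ignore every edge leaving the removed node
                remaining.discard((u, v))
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return order if len(order) == len(nodes) else None

nodes = ["T1", "T2", "T3"]
exercise_1 = {("T1", "T2"), ("T1", "T3"), ("T2", "T3")}
```

serial_order(nodes, exercise_1) returns ["T1", "T2", "T3"], matching the steps above. If the graph has a cycle, no node ever reaches in-degree zero and the function returns None.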
Exercise:2
Consider the following concurrent schedule for three transactions and specify whether it is a
conflict serializable schedule. If it is, find the equivalent serial schedule.
T1 T2 T3
R(X)
R(X)
W(X)
R(X)
W(X)
From T2→T3 there is R-W conflict, From T3→T1 there is W-R conflict, From T2→T1 there is
R-W conflict. As there is no cycle or loop in precedence graph the given schedule is a conflict
serializable schedule.
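Step 1:
In-degree of T1 = 2
In-degree of T2 = 0
In-degree of T3 = 1
Therefore consider T2 first and delete/ignore T2 and all edges from T2.
Step 2:
In-degree of T1 = 1
In-degree of T3 = 0
Therefore consider T3 next and delete/ignore T3 and all edges from T3.
Step 3:
In-degree of T1 = 0
Therefore consider T1.
Hence the equivalent serial schedule is T2, T3, T1.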
Exercise:3
Consider the following concurrent schedule for three transactions and specify whether it is a
conflict serializable schedule. If it is, find the equivalent serial schedule.
T1 T2 T3
R(x)
W(x)
W(x)
W(x)
Exercise: 4 (Try Yourself)
Consider the following two schedules S1 and S2 for the transactions T1 and T2. Determine
which of the schedules are conflict serializable.
T1 T2 T1 T2
R(x) R(x)
W(y) W(y)
W(x) R(y)
W(x)
View Equivalence
Two schedules S1 and S2 are said to be view equivalent if the following three conditions hold:
1. Initial reads: If a transaction T1 reads the initial value of data item X in S1, then T1 also
reads the initial value of X in S2.
2. W-R conflict: If T1 reads a value written by T2 in S1, then T1 also reads the value
written by T2 in S2.
3. Final write: If T1 performs the final write on a data item in S1, then it also performs
the final write on that data item in S2.
S1 (columns T1, T2, T3)      S2, serial (columns T2, T3, T1)
R(X)                         R(X)
R(X)                         R(X)
W(X)                         W(X)
R(X)                         R(X)
W(X)                         W(X)
Initial reads: In S1 there are only two initial reads, one from T2 and one from T3. Similarly, in
the given serial schedule S2 there are exactly two initial reads on data item X, one from T2 and
one from T3. So the initial read condition is satisfied.
W-R conflict: In S1 there is one write-read conflict, from T3 to T1. Similarly, in S2 there is a
write-read conflict from T3 to T1. So the second condition is also satisfied.
Final write: In S1 the final write on X is from T1; similarly, in S2 the final write on X is also
from T1. So the final condition is also satisfied. Hence S1 and S2 are view equivalent.
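The three conditions can be checked with a short program. This is a sketch: schedules are assumed to be lists of (transaction, action, item) triples, the function names are hypothetical, and the order of the first two reads in S1 is an assumption, since only the transactions involved are stated above.

```python
def view_profile(schedule):
    """Collect who reads from whom and who writes each item last."""
    last_writer = {}   # item -> transaction that wrote it most recently
    reads_from = []    # (reader, item, writer), "initial" if nobody wrote yet
    for txn, action, item in schedule:
        if action == "R":
            reads_from.append((txn, item, last_writer.get(item, "initial")))
        else:
            last_writer[item] = txn
    return sorted(reads_from), last_writer

def view_equivalent(s1, s2):
    """Same operations, same reads-from relation (including initial reads),
    and the same final writer for every item."""
    return sorted(s1) == sorted(s2) and view_profile(s1) == view_profile(s2)

# Assumed interleaving for S1: T3 reads first, then T2.
S1 = [("T3", "R", "X"), ("T2", "R", "X"), ("T3", "W", "X"),
      ("T1", "R", "X"), ("T1", "W", "X")]
# Serial schedule T2, T3, T1.
S2 = [("T2", "R", "X"), ("T3", "R", "X"), ("T3", "W", "X"),
      ("T1", "R", "X"), ("T1", "W", "X")]
# Serial schedule T1, T2, T3, for comparison.
S3 = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"),
      ("T3", "R", "X"), ("T3", "W", "X")]
```

view_equivalent(S1, S2) is True, whereas view_equivalent(S1, S3) is False because in S3 the reads of T2 and T3 are no longer initial reads and the final write is made by T3.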
For some schedules it is easy to recover from transaction and system failures, whereas for other
schedules the recovery process can be quite complicated. In some cases, it is not even possible to
recover correctly after a failure. Therefore, it is important to characterize the types of schedules
for which recovery is possible, as well as those for which recovery is not possible.
Sometimes a transaction may not execute completely due to a software issue, system crash, or
hardware failure. In that case, the failed transaction has to be rolled back. But some other
transactions may also have used a value produced by the failed transaction, so those transactions
have to be rolled back as well. Whether this can be done safely depends on the recoverability of
the schedule. Schedules are classified by recoverability as recoverable schedules, cascadeless
schedules, and strict schedules.
1. Recoverable Schedule:
If, in a schedule,
• a transaction performs a dirty read operation from an uncommitted transaction,
• and its commit operation is delayed until the uncommitted transaction either commits or
rolls back,
then such a schedule is known as a recoverable schedule.
– A schedule S is recoverable if no transaction T in S commits until all transactions T′ that
have written some item X that T reads have committed.
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;
    Recoverable (but suffers from the lost update problem).
Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;
    Nonrecoverable (T2 reads X written by T1 and commits before T1 aborts).
Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;
    Recoverable (T2 commits only after T1 has committed).
2. Cascadeless schedule (avoids cascading rollback): A schedule is said to be cascadeless,
or to avoid cascading rollback, if every transaction in the schedule reads only items that
were written by committed transactions. In this case, all items read will be committed data,
so no cascading rollback will occur.
3. Strict Schedule: A schedule in which a transaction can neither read nor write an item X
until the last transaction that wrote X has committed (or aborted).
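The three classes can be told apart with a short program that scans a schedule written in the r1(X); w1(X); c1; a1 notation used above. It is a simplified sketch: parse and classify are hypothetical names, and the code does not restore old values after an abort, it only tracks which writers are still unfinished.

```python
import re

def parse(text):
    """Turn 'r1(X); w1(X); c1;' into (kind, transaction, item) triples."""
    ops = []
    for token in text.replace(" ", "").strip(";").split(";"):
        m = re.fullmatch(r"([rwca])(\d+)(?:\((\w+)\))?", token)
        ops.append((m.group(1), m.group(2), m.group(3)))
    return ops

def classify(ops):
    """Return the strongest class the schedule belongs to."""
    last_writer, reads_from, finished = {}, {}, {}
    recoverable = cascadeless = strict = True
    for kind, txn, item in ops:
        if kind in ("r", "w"):
            writer = last_writer.get(item)
            # Dirty: the item was last written by another, unfinished transaction.
            dirty = writer is not None and writer != txn and writer not in finished
            if dirty:
                strict = False
                if kind == "r":
                    cascadeless = False
                    reads_from.setdefault(txn, set()).add(writer)
            if kind == "w":
                last_writer[item] = txn
        else:
            # A commit is unsafe if a transaction we read from has not committed.
            if kind == "c" and any(finished.get(w) != "c"
                                   for w in reads_from.get(txn, ())):
                recoverable = False
            finished[txn] = kind
    if not recoverable:
        return "non-recoverable"
    if strict:
        return "strict"
    return "cascadeless" if cascadeless else "recoverable"

Sa = "r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;"
Sc = "r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;"
Sd = "r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;"
```

Under this sketch classify(parse(Sc)) is "non-recoverable" and classify(parse(Sd)) is "recoverable". Schedule Sa contains no dirty read, so it is reported as "cascadeless", which is a stronger statement than recoverable; it is not strict because T2 overwrites X before T1 commits.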
Summary:
The cascadeless schedules are a subset of the recoverable schedules, and the strict schedules
are a subset of the cascadeless schedules. In other words, any strict schedule is also
cascadeless, and any cascadeless schedule is also recoverable.
(Figure: nested sets, from outermost to innermost: all schedules, recoverable, cascadeless, strict.)