DBMS UNIT-4 (Database Transactions & Query Processing)
Database Transactions & Query Processing
Department of Computer Engineering
Database System Concepts - 5th Edition, July 28, 2005. 7.1 ©Silberschatz, Korth and Sudarshan
Unit IV: Database Transactions & Query Processing
❖Terms
• Transaction Concept
• Transaction State
• Serializability
• Recoverability
• Testing for Serializability.
Transaction Concept
• A transaction is a unit of program execution that
accesses and possibly updates various data items.
• A transaction must see a consistent database.
• During transaction execution the database may be
inconsistent.
• When the transaction is committed, the database
must be consistent.
• Two main issues to deal with:
– Failures of various kinds, such as hardware failures and
system crashes
– Concurrent execution of multiple transactions
ACID Properties
To preserve the integrity of data, the database system must ensure:
• Atomicity. Either all operations of the transaction are reflected
properly in the database, or none are.
• Consistency. Execution of a transaction in isolation preserves the
consistency of the database.
• Isolation. Each transaction must be unaware of other concurrently
executing transactions; intermediate results must remain hidden from them.
• Durability. After a transaction completes successfully, the changes it
has made to the database persist, even if there are system failures.
Consider the following partial schedule:
• T3's lock request is compatible with the lock granted to T2, so T3 may
be granted the shared-mode lock. At this point T2 may release its lock,
but T1 still has to wait for T3 to finish.
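The compatibility rule used here (shared locks are mutually compatible; exclusive conflicts with everything) can be sketched as follows. This is a minimal illustration, not from the slides; the function name is an assumption.

```python
def compatible(requested, held):
    """A requested lock is compatible with a held lock only when
    both are shared (S); any combination involving an exclusive
    lock (X) conflicts."""
    return requested == "S" and held == "S"

# T3's lock-S request is compatible with T2's lock-S, so it is granted;
# a lock-X request would have to wait.
print(compatible("S", "S"))  # True
print(compatible("X", "S"))  # False
```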
– T8: read(a1); read(a2); ... read(an); write(a1).
– T9: read(a1); read(a2); display(a1 + a2).
• If we employ the two-phase locking protocol, then T8 must
lock a1 in exclusive mode. Therefore, any concurrent
execution of both transactions amounts to a serial execution.
• First Phase (growing):
– can acquire a lock-S on an item
– can acquire a lock-X on an item
– can convert a lock-S to a lock-X (upgrade)
• Second Phase (shrinking):
– can release a lock-S
– can release a lock-X
– can convert a lock-X to a lock-S (downgrade)
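The two-phase rule above can be sketched as a small guard: a transaction may acquire locks until its first release, after which it may only release. This is a minimal illustration, not from the slides; the class and method names are assumptions.

```python
class TwoPhaseTxn:
    """Illustrative sketch of two-phase locking within one transaction."""
    def __init__(self):
        self.shrinking = False   # False = growing phase
        self.locks = {}          # item -> "S" or "X"

    def acquire(self, item, mode):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        # upgrading S -> X is permitted during the growing phase
        self.locks[item] = mode

    def release(self, item):
        self.shrinking = True    # first release enters the shrinking phase
        del self.locks[item]

t = TwoPhaseTxn()
t.acquire("a1", "S")
t.acquire("a1", "X")   # upgrade in growing phase: allowed
t.release("a1")
try:
    t.acquire("a2", "S")   # acquire after a release: forbidden
except RuntimeError as e:
    print(e)
```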
Timestamp Ordering Protocol
• With each transaction Ti in the system, we associate a unique
fixed timestamp, denoted by TS(Ti). This timestamp is assigned by
the database system before the transaction Ti starts execution.
• If a transaction Ti has been assigned timestamp TS(Ti), and a new
transaction Tj enters the system later, then TS(Ti) < TS(Tj).
• There are two simple methods for implementing this scheme:
• 1. Use the value of the system clock as the timestamp; that is, a
transaction’s timestamp is equal to the value of the clock when
the transaction enters the system.
• 2. Use a logical counter that is incremented after a new
timestamp has been assigned; that is, a transaction’s timestamp
is equal to the value of the counter when the transaction enters the
system.
Timestamp Ordering Protocol
• Each transaction is issued a timestamp when it enters the
system. If an old transaction Ti has time-stamp TS(Ti), a
new transaction Tj is assigned time-stamp TS(Tj) such
that TS(Ti) <TS(Tj).
• Since T16 starts before T17, we shall assume that TS(T16) < TS(T17).
The read(Q) operation of T16 succeeds, as does the write(Q) operation of
T17.
• When T16 attempts its write(Q) operation, we find that TS(T16) <
W-timestamp(Q), since W-timestamp(Q) = TS(T17).
• Thus, the write(Q) by T16 is rejected and transaction T16 must be rolled
back.
• Although the rollback of T16 is required by the
timestamp-ordering protocol, it is unnecessary.
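The write test that rejects T16 can be sketched as follows. The numeric timestamp values and variable names are assumptions chosen to match the T16/T17 example.

```python
def try_write(ts_txn, r_ts, w_ts):
    """Timestamp-ordering write test for one data item.
    Returns the item's new (R-timestamp, W-timestamp), or None if
    the writing transaction must be rolled back."""
    if ts_txn < r_ts or ts_txn < w_ts:
        return None                      # write rejected: roll back
    return (r_ts, ts_txn)                # write accepted

TS_T16, TS_T17 = 16, 17                  # TS(T16) < TS(T17)
r_ts, w_ts = 0, 0                        # timestamps of item Q
r_ts = max(r_ts, TS_T16)                 # read(Q) by T16 succeeds
_, w_ts = try_write(TS_T17, r_ts, w_ts)  # write(Q) by T17 succeeds
print(try_write(TS_T16, r_ts, w_ts))     # None: T16 is rolled back
```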
➢ Output of updated blocks can take place at any time before or after
transaction commit
➢ Order in which blocks are output can be different from the order in which
they are written.
➢ Recovery procedure has two operations instead of one:
➢ undo(Ti) restores the value of all data items updated by Ti to
their old values, going backwards from the last log record for Ti
➢ redo(Ti) sets the value of all data items updated by Ti to the
new values, going forward from the first log record for Ti.
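A minimal sketch of the two operations, assuming update records of the form <Ti, X, old, new> encoded as tuples (an assumption, not the slides' notation):

```python
def undo(log, txn, db):
    # scan backwards from the last record, restoring old values written by txn
    for t, item, old, new in reversed(log):
        if t == txn:
            db[item] = old

def redo(log, txn, db):
    # scan forwards from the first record, reapplying new values written by txn
    for t, item, old, new in log:
        if t == txn:
            db[item] = new

log = [("T0", "A", 0, 10), ("T1", "B", 0, 10)]
db = {"A": 10, "B": 10}
undo(log, "T1", db)   # T1 did not commit: restore B to its old value
redo(log, "T0", db)   # T0 committed: ensure A holds its new value
print(db)             # {'A': 10, 'B': 0}
```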
[Figure: transactions T1 to T4 shown on a timeline against checkpoint
time Tc and failure time Tf]
➢ All the transactions in the undo-list are then undone. All the
transactions in the redo-list are then redone, applying their log
records in order.
➢ Checkpoints are performed as before, except that the checkpoint
log record is now of the form
< checkpoint L>
where L is the list of transactions active at the time of the
checkpoint
➢ We assume no updates are in progress while the checkpoint is
carried out.
➢ When the system recovers from a crash, it first does the following:
1. Initialize undo-list and redo-list to empty
2. Scan the log backwards from the end, stopping when the first
<checkpoint L> record is found.
For each record found during the backward scan:
➢ if the record is <Ti commit>, add Ti to redo-list
➢ if the record is <Ti start>, then if Ti is not in redo-list, add
Ti to undo-list
3. For every Ti in L, if Ti is not in redo-list, add Ti to undo-list
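The backward scan above can be sketched as follows. Encoding log records as tuples is an assumption for illustration.

```python
def build_lists(log):
    """Scan the log backwards, building undo-list and redo-list,
    stopping at the first <checkpoint L> record found."""
    undo_list, redo_list = set(), set()
    for rec in reversed(log):
        kind = rec[0]
        if kind == "commit":
            redo_list.add(rec[1])            # <Ti commit> -> redo-list
        elif kind == "start":
            if rec[1] not in redo_list:      # <Ti start>, Ti not committed
                undo_list.add(rec[1])
        elif kind == "checkpoint":
            for t in rec[1]:                 # every Ti in L
                if t not in redo_list:
                    undo_list.add(t)
            break                            # stop at the checkpoint
    return undo_list, redo_list

log = [("checkpoint", ["T1"]), ("start", "T2"), ("commit", "T2")]
print(build_lists(log))   # ({'T1'}, {'T2'})
```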
Example of Recovery
➢ Go over the steps of the recovery algorithm on the following log:
<T0 start>
<T0, A, 0, 10>
<T0 commit>
<T1 start> --stop backward for undo
<T1, B, 0, 10>
<T2 start>
<T2, C, 0, 10>
<T2, C, 10, 20>
<checkpoint {T1, T2}> -- start forward for redo
<T3 start>
<T3, A, 10, 20>
<T3, D, 0, 10>
<T3 commit>
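Running the scan over this log (with a hypothetical tuple encoding of the records) confirms the annotations: the backward scan stops at the checkpoint, placing T3 in the redo-list and the checkpoint's active transactions T1 and T2 in the undo-list.

```python
log = [
    ("start", "T0"), ("update", "T0", "A", 0, 10), ("commit", "T0"),
    ("start", "T1"), ("update", "T1", "B", 0, 10),
    ("start", "T2"), ("update", "T2", "C", 0, 10),
    ("update", "T2", "C", 10, 20),
    ("checkpoint", ["T1", "T2"]),
    ("start", "T3"), ("update", "T3", "A", 10, 20),
    ("update", "T3", "D", 0, 10), ("commit", "T3"),
]
undo_list, redo_list = set(), set()
for rec in reversed(log):                    # backward scan from the end
    if rec[0] == "commit":
        redo_list.add(rec[1])
    elif rec[0] == "start" and rec[1] not in redo_list:
        undo_list.add(rec[1])
    elif rec[0] == "checkpoint":
        undo_list |= {t for t in rec[1] if t not in redo_list}
        break                                # stop at <checkpoint {T1, T2}>
print(sorted(undo_list), sorted(redo_list))  # ['T1', 'T2'] ['T3']
```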
Summary-Checkpoint
➢ Keeping and maintaining logs in real time in a live
environment may consume all the storage space
available in the system.
➢ As time passes, the log file may grow too big to
be handled at all.
➢ Checkpoint is a mechanism where all the
previous logs are removed from the system and
stored permanently in a storage disk.
➢ Checkpoint declares a point before which the
DBMS was in consistent state, and all the
transactions were committed.
Shadow Paging
➢ Shadow paging is an alternative to log-based recovery; this
scheme is useful if transactions execute serially.
➢ Idea: maintain two page tables during the lifetime of a
transaction –the current page table, and the shadow page table
➢ Store the shadow page table in nonvolatile storage, such that
state of the database prior to transaction execution may be
recovered.
➢ Shadow page table is never modified during execution
➢ To start with, both the page tables are identical. Only current
page table is used for data item accesses during execution of the
transaction.
➢ Whenever any page is about to be written for the first time
➢ A copy of this page is made onto an unused page. [to be a new current
page]
➢ The current page table is then made to point to the copy
➢ The update is performed on the copy
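The copy-on-first-write steps above can be sketched as follows. Modeling the page tables as dicts and the names used are assumptions for illustration; the key point is that the shadow table is never touched.

```python
pages = {1: "old-A", 2: "old-B"}    # page storage
shadow = {"A": 1, "B": 2}           # shadow page table (never modified)
current = dict(shadow)              # current table starts identical
next_free = 3                       # next unused page number

def write(item, value):
    global next_free
    if current[item] == shadow[item]:            # first write to this page?
        pages[next_free] = pages[current[item]]  # copy onto an unused page
        current[item] = next_free                # current table points to copy
        next_free += 1
    pages[current[item]] = value                 # update only the copy

write("A", "new-A")
# the shadow table still reaches the pre-transaction value:
print(pages[shadow["A"]], pages[current["A"]])   # old-A new-A
```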
Sample Page Table
Example of Shadow Paging
Deadlock Detection: Wait-for Graph
➢ When Ti requests a data item currently held by Tj, the edge
Ti → Tj is inserted in the wait-for graph. This edge is removed
only when Tj is no longer holding a data item needed by Ti.
Query Processing
➢ Before query processing can begin, the system must translate the query into a
usable form.
➢ Thus, the first action the system must take in query processing is to translate a
given query into its internal form.
➢ In generating the internal form of the query, the parser checks the syntax of the
user’s query, verifies that the relation names appearing in the query are names of
the relations in the database, and so on.
➢ If the query was expressed in terms of a view, the translation phase also replaces
all uses of the view by the relational-algebra expression that defines the view.
Steps in query processing
➢ Given a query, there are generally a variety of methods for
computing the answer.
➢ Each SQL query can itself be translated into a relational
algebra expression in one of several ways.
➢ Furthermore, the relational-algebra representation of a
query specifies only partially how to evaluate a query; there
are usually several ways to evaluate relational-algebra
expressions.
➢ consider the query:
select salary
from instructor
where salary < 75000;
➢ This query can be translated into either of the following relational-
algebra expressions:
➢ σsalary<75000 (Πsalary (instructor))
➢ Πsalary (σsalary<75000 (instructor))
➢ Since we are concerned with only those tuples in the branch relation
that pertain to branches located in Brooklyn, we do not need to
consider those tuples that do not have branch-city = “Brooklyn”.
➢ Query Optimization: Amongst all equivalent evaluation plans choose the one with lowest
cost.
➢ Cost is estimated using statistical information from the
database catalog
➢ e.g. number of tuples in each relation, size of tuples, etc.
➢ we study
➢ How to measure query costs
➢ Algorithms for evaluating relational algebra operations
➢ How to combine algorithms for individual operations in order to evaluate a complete
expression
Measures of Query Cost
➢ Cost is generally measured as total elapsed time for answering query
➢ Many factors contribute to time cost
➢ disk accesses, CPU, or even network communication
➢ Typically disk access is the predominant cost, and is also relatively easy to estimate.
Measured by taking into account
➢ Number of seeks * average-seek-cost
➢ + Number of blocks read * average-block-read-cost
➢ + Number of blocks written * average-block-write-cost
➢ Cost to write a block is greater than cost to read a block
➢ data is read back after being written to ensure that the write was successful
➢ Assumption: single disk
➢ Can modify formulae for multiple disks/RAID arrays
➢ Or just use single-disk formulae, but interpret them as measuring resource
consumption instead of time
Measures of Query Cost (Cont.)
➢ For simplicity we just use the number of block transfers from disk and the number of seeks
as the cost measures
➢ tT – time to transfer one block
➢ tS – time for one seek
➢ Cost for b block transfers plus S seeks
b * tT + S * tS
➢ We ignore CPU costs for simplicity
➢ Real systems do take CPU cost into account
➢ We do not include the cost of writing output to disk in our cost formulae
➢ Several algorithms can reduce disk IO by using extra buffer space
➢ Amount of real memory available to buffer depends on other concurrent queries and
OS processes, known only during execution
➢ We often use worst case estimates, assuming only the minimum amount of
memory needed for the operation is available
➢ Required data may be buffer resident already, avoiding disk I/O
➢ But hard to take into account for cost estimation
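The cost measure above, b block transfers plus S seeks, can be written directly as a function. The tT and tS values below are hypothetical device parameters, not from the slides.

```python
def query_cost(b, S, tT=0.0001, tS=0.004):
    """Estimated cost in seconds: b block transfers plus S seeks."""
    return b * tT + S * tS

# e.g. a linear scan of 100 contiguous blocks: 100 transfers, 1 seek
print(round(query_cost(100, 1), 6))   # 0.014
```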
Selection Operation
➢ File scan – search algorithms that locate and retrieve records that fulfill a selection
condition.
➢ Algorithm A1 (linear search). Scan each file block and test all records to see whether they
satisfy the selection condition.
➢ Cost estimate = br block transfers + 1 seek
➢ br denotes number of blocks containing records from relation r
➢ If selection is on a key attribute, can stop on finding record
➢ cost = (br /2) block transfers + 1 seek
➢ Linear search can be applied regardless of
➢ selection condition or
➢ ordering of records in the file, or
➢ availability of indices
Selection Operation (Cont.)
➢ A2 (binary search). Applicable if selection is an equality comparison on the attribute on
which file is ordered.
➢ Assume that the blocks of a relation are stored contiguously
➢ Cost estimate (number of disk blocks to be scanned):
➢ cost of locating the first tuple by a binary search on the blocks
➢ ⌈log2(br)⌉ * (tT + tS)
➢ If there are multiple records satisfying selection
➢ Add transfer cost of the number of blocks containing records that satisfy
selection condition
Selections Using Indices
➢ Index scan – search algorithms that use an index
➢ selection condition must be on search-key of index.
➢ A3 (primary index on candidate key, equality). Retrieve a single record that satisfies the
corresponding equality condition
➢ Cost = (hi + 1) * (tT + tS)
➢ A4 (primary index on nonkey, equality) Retrieve multiple records.
➢ Records will be on consecutive blocks
➢ Let b = number of blocks containing matching records
➢ Cost = hi * (tT + tS) + tS + tT * b
➢ A5 (equality on search-key of secondary index).
➢ Retrieve a single record if the search-key is a candidate key
➢ Cost = (hi + 1) * (tT + tS)
➢ Retrieve multiple records if search-key is not a candidate key
➢ each of n matching records may be on a different block
➢ Cost = (hi + n) * (tT + tS)
➢ Can be very expensive!
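For comparison, the cost formulas A1 through A5 above can be evaluated for one hypothetical relation. All parameter values (index height hi, relation size br, matching blocks b, matching records n, and the device times) are assumptions for illustration.

```python
import math

tT, tS = 0.0001, 0.004            # hypothetical transfer and seek times (s)
hi, br, b, n = 3, 1000, 5, 50     # index height, relation blocks,
                                  # matching blocks, matching records

costs = {
    "A1 linear":            br * tT + 1 * tS,
    "A2 binary":            math.ceil(math.log2(br)) * (tT + tS),
    "A3 primary idx, key":  (hi + 1) * (tT + tS),
    "A4 primary idx":       hi * (tT + tS) + tS + tT * b,
    "A5 secondary idx":     (hi + n) * (tT + tS),
}
for name, c in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name}: {c:.4f} s")
```

With these numbers the secondary-index lookup on a non-key (A5) is the most expensive plan, since each of the n matching records may require its own seek, which illustrates the warning above.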