Rdbms Unit V
Query Processing is the activity of extracting data from the database. Fetching the data
takes several steps. The steps involved are parsing and translation, optimization, and evaluation.
1) Parsing and translation
Before the actual evaluation of a query and the various query-optimizing transformations can
take place, the system needs to translate the query from its human-readable form into an
internal form that it can work with.
The translation step in query processing is carried out by the parser. When a user
executes a query, the parser generates the internal form of the query: it checks the syntax
of the query and verifies the names of the relations, tuples, and attributes that the query
refers to.
The parser creates a tree of the query, known as a 'parse tree', and then translates it into
relational algebra. In doing so, it also replaces every view used in the query with its
definition.
Suppose a user executes a query. As we have learned, there are various methods of
extracting data from the database. Suppose that, in SQL, a user wants to fetch the records of
the employees whose salary is greater than or equal to 10000. For this, the following
query is written:
Thus, for the system to understand the user query, the query needs to be translated into
relational algebra. We can write this query in relational algebra form as:
After translating the given query, we can execute each relational algebra operation using
different algorithms. This is how query processing begins its work.
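As a minimal sketch of this translation, the example below runs the request in SQL and then as a relational algebra expression (a selection followed by a projection) written in plain Python. The table name `employee`, its columns, and the sample rows are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical schema and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Asha", 9000), ("Ravi", 10000), ("Meena", 15000)],
)

# SQL form: employees whose salary is greater than or equal to 10000.
sql_rows = conn.execute(
    "SELECT name FROM employee WHERE salary >= 10000 ORDER BY name"
).fetchall()

# Relational algebra form: project name out of select(salary >= 10000).
def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, *columns):
    """Projection: keep only the named columns of each row."""
    return [tuple(r[c] for c in columns) for r in rows]

table = [
    {"name": n, "salary": s}
    for n, s in conn.execute("SELECT name, salary FROM employee")
]
algebra_rows = sorted(project(select(table, lambda r: r["salary"] >= 10000), "name"))
conn.close()
```

Both forms return the same two employees, which is the point of the translation: the algebra expression says what to compute, and the system remains free to choose how each operation is carried out.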
2) Optimization
o The cost of query evaluation can vary for different types of queries. Because the system is
responsible for constructing the evaluation plan, the user does not need to write the query
efficiently.
o Usually, a database system generates an efficient query evaluation plan, one that minimizes
cost. This task is performed by the database system and is known as Query Optimization.
o To optimize a query, the query optimizer needs an estimated cost for each operation. This is
because the overall cost depends on the memory allocated to the individual operations, their
execution costs, and so on.
3) Evaluation
For this, in addition to the relational algebra translation, the translated relational algebra
expression must be annotated with instructions that specify how to evaluate each
operation. Thus, after translating the user query, the system executes a query evaluation plan.
Query Evaluation Plan
o In order to fully evaluate a query, the system needs to construct a query evaluation plan.
o The annotations in the evaluation plan may specify the algorithm to use for a specific
operation or the particular index to use.
o Relational algebra operations with such annotations are referred to as Evaluation Primitives.
The evaluation primitives carry the instructions needed to evaluate the operation.
o Thus, a query evaluation plan defines a sequence of primitive operations used for evaluating a
query. The query evaluation plan is also referred to as the query execution plan.
o A query execution engine is responsible for generating the output of the given query. It takes
the query execution plan, executes it, and finally produces the output for the user query.
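The idea of an evaluation plan made of annotated primitives can be sketched as follows. This is a toy model, not how any particular DBMS represents plans: the relation, the annotation strings, and the `execute` function are all hypothetical.

```python
# Hypothetical in-memory relation; names are illustrative only.
EMPLOYEE = [
    {"name": "Asha", "salary": 9000},
    {"name": "Ravi", "salary": 10000},
    {"name": "Meena", "salary": 15000},
]

def linear_scan_select(rows):
    """Selection evaluated by scanning every row."""
    return [r for r in rows if r["salary"] >= 10000]

def project_name(rows):
    """Projection onto the name attribute."""
    return [r["name"] for r in rows]

# Each evaluation primitive pairs a relational algebra operation with an
# annotation naming the algorithm chosen for it, plus the code that runs it.
plan = [
    ("select salary >= 10000", "linear scan", linear_scan_select),
    ("project name", "single pass", project_name),
]

def execute(plan, rows):
    """Toy execution engine: run the primitives in plan order."""
    for _operation, _algorithm, run in plan:
        rows = run(rows)
    return rows

result = execute(plan, EMPLOYEE)
```

An optimizer would choose between alternative plans, for example an index lookup instead of the linear scan, by comparing their estimated costs.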
5.2 Transaction
Example: Suppose a bank employee transfers Rs 800 from X's account to Y's account. This
small transaction contains several low-level tasks:
X's Account
Open_Account(X)
Old_Balance = X.balance
New_Balance = Old_Balance - 800
X.balance = New_Balance
Close_Account(X)
Y's Account
Open_Account(Y)
Old_Balance = Y.balance
New_Balance = Old_Balance + 800
Y.balance = New_Balance
Close_Account(Y)
Read(X): The read operation reads the value of X from the database and stores it in a buffer
in main memory.
Write(X): The write operation writes the value from the buffer back to the database.
Let's take an example of a debit transaction on an account, which consists of the following
operations. Assume the value of X before the transaction starts is 4000.
R(X);
X = X - 500;
W(X);
o The first operation reads X's value from the database and stores it in a buffer.
o The second operation decreases the value of X by 500, so the buffer will contain 3500.
o The third operation writes the buffer's value to the database, so X's final value will be 3500.
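The three operations can be traced with a small sketch, assuming X starts at 4000 (the value implied by the result of 3500). The `database` and `buffer` dictionaries are hypothetical stand-ins for disk storage and main memory.

```python
database = {"X": 4000}   # assumed starting value on disk
buffer = {}              # main-memory buffer

def read(item):
    """R(item): copy the value from the database into the buffer."""
    buffer[item] = database[item]

def write(item):
    """W(item): copy the value from the buffer back to the database."""
    database[item] = buffer[item]

read("X")                                   # R(X)
buffer["X"] -= 500                          # X = X - 500, in the buffer only
before_write = (buffer["X"], database["X"])
write("X")                                  # W(X)
```

Until `write` runs, the change exists only in the buffer; the database still holds the old value.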
If a transaction fails, the recovery manager rolls back all its write operations on the
database to bring the database back to the state it was in before the transaction started.
Transactions in this state are called aborted. After a transaction aborts, the database
recovery module can select one of two operations −
o Re-start the transaction
o Kill the transaction
A transaction is a very small unit of a program, and it may contain several low-level tasks.
A transaction in a database system must maintain Atomicity, Consistency, Isolation, and
Durability − commonly known as ACID properties − in order to ensure accuracy,
completeness, and data integrity.
Atomicity − This property states that a transaction must be treated as an atomic unit, that
is, either all of its operations are executed or none. There must be no state in a database
where a transaction is left partially completed. States should be defined either before the
execution of the transaction or after the execution/abortion/failure of the transaction.
Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the
database was in a consistent state before the execution of a transaction, it must remain
consistent after the execution of the transaction as well.
Isolation − In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that each transaction will be
carried out and executed as if it were the only transaction in the system. No transaction will
affect the existence of any other transaction.
Durability − The database should be durable enough to hold all its latest updates even if
the system fails or restarts. If a transaction updates a chunk of data in a database and
commits, then the database will hold the modified data. If a transaction commits but the
system fails before the data could be written to the disk, then that data will be updated
once the system springs back into action.
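Atomicity can be illustrated with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `account` table, the starting balances, and the rule that a balance may not go negative are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 1000), ("Y", 200)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one atomic unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = transfer(conn, "X", "Y", 800)    # succeeds: X has 1000
second = transfer(conn, "X", "Y", 800)   # fails: X has only 200 left
balances = dict(conn.execute("SELECT name, balance FROM account"))
conn.close()
```

In the second call the credit to Y has already been applied when the debit violates the constraint, yet the rollback removes it, so no partially completed transfer is ever visible.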
But before learning about concurrency control, we should know about concurrent execution.
o In a multi-user system, multiple users can access and use the same database at one time, which
is known as the concurrent execution of the database. It means that the same database is accessed
simultaneously on a multi-user system by different users.
o While working on database transactions, multiple users often need to use the database
for different operations at the same time, and in that case, concurrent execution of
the database is performed.
o The simultaneous execution should be performed in an interleaved manner, and no operation
should affect the other executing operations, so that the consistency of the database is
maintained. However, concurrent execution of transaction operations gives rise to several
challenging problems that need to be solved.
The lost update problem occurs when two different database transactions perform read/write
operations on the same database items in an interleaved manner (i.e., concurrent execution)
in a way that makes the values of the items incorrect, hence making the database inconsistent.
For example:
Consider the diagram below, where two transactions TX and TY are performed on the same
account A, and the balance of account A is $300.
o At time t2, transaction TX deducts $50 from account A, which becomes $250 (only deducted,
not yet written).
o Alternately, at time t3, transaction TY reads the value of account A, which is still $300
because TX has not written the new value yet.
o At time t4, transaction TY adds $100 to account A, which becomes $400 (only added, not yet
written).
o At time t6, transaction TX writes the value of account A, which is updated to $250,
as TY has not written its value yet.
o Similarly, at time t7, transaction TY writes the value of account A that it computed at
time t4, i.e., $400. It means the value written by TX is lost, i.e., $250 is lost.
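The interleaving can be replayed step by step in a short sketch. The initial read by TX at t1 is not listed in the steps and is assumed here; the variable names are hypothetical.

```python
account_a = 300

tx_local = account_a       # t1 (assumed): TX reads A, i.e. 300
tx_local -= 50             # t2: TX deducts 50 locally, 250 not yet written
ty_local = account_a       # t3: TY reads A, still 300
ty_local += 100            # t4: TY adds 100 locally, 400 not yet written
account_a = tx_local       # t6: TX writes 250
after_tx_write = account_a
account_a = ty_local       # t7: TY writes 400, overwriting TX's update

# Running the two transactions one after the other would give this instead.
serial_result = 300 - 50 + 100
```

The final balance is 400 rather than the 350 a serial execution would produce, because TY's write is based on a value read before TX wrote.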
Hence the data becomes incorrect, and the database becomes inconsistent.
Dirty Read Problem (W-R Conflict)
The dirty read problem occurs when one transaction updates an item of the database and
then fails, and before the update is rolled back, the updated database item is read by
another transaction. This creates a Write-Read conflict between the two transactions.
RASC- DEPARTMENT COMPUTER SCIENCE AND APPLICATION-N.MANIMOZHI/AP 7
RELATIONAL DATABASE MANAGEMENT DESIGN UNIT-V
For example:
Consider two transactions TX and TY in the below diagram performing read/write operations
on account A where the available balance in account A is $300:
The unrepeatable read problem, also known as the Inconsistent Retrievals Problem, occurs
when a transaction reads two different values for the same database item.
For example:
Consider two transactions, TX and TY, performing the read/write operations on account A,
having an available balance = $300. The diagram is shown below:
o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to the available
balance, and it becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A, and that will
be read as $400.
o It means that within the same transaction TX, two different values of account A are read:
$300 initially, and $400 after the update made by transaction TY. It is an unrepeatable read
and is therefore known as the Unrepeatable read problem.
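The same timeline can be written out as a short sketch; the variable names are hypothetical.

```python
account_a = 300

first_read = account_a     # t1: TX reads A, i.e. 300
ty_local = account_a       # t2: TY reads A, i.e. 300
ty_local += 100            # t3: TY adds 100
account_a = ty_local       # t4: TY writes 400
second_read = account_a    # t5: TX reads A again

unrepeatable = first_read != second_read
```

TX never wrote to A, yet its two reads disagree, which is exactly the situation that concurrency control has to prevent.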
Thus, in order to maintain consistency in the database and avoid the problems that arise in
concurrent execution, management is needed, and that is where the concept of Concurrency
Control comes into play.
Concurrency Control is the mechanism for controlling and managing the concurrent execution
of database operations, thereby avoiding inconsistencies in the database.
Concurrency control is enforced through concurrency control protocols.
The concurrency control protocols ensure the atomicity, consistency, isolation, durability and
serializability of the concurrent execution of the database transactions. These protocols
are categorized as:
1. Lock-Based Protocol
In this type of protocol, a transaction cannot read or write data until it acquires an appropriate
lock on it. There are two types of lock:
1. Shared lock:
o It is also known as a Read-only lock. Under a shared lock, the data item can only be read by
the transaction.
o It can be shared between transactions, because a transaction holding a shared lock
cannot update the data item.
2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by the
transaction.
o This lock is exclusive: while it is held, multiple transactions cannot modify the same data
simultaneously.
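The compatibility of the two lock types can be sketched with a small lock table. The `LockTable` class and the "S"/"X" mode labels are hypothetical; a real lock manager would also queue waiting transactions and handle lock upgrades.

```python
# A requested lock is compatible with a lock held by another transaction
# only when both are shared.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

class LockTable:
    def __init__(self):
        self.held = {}  # item -> list of (transaction, mode)

    def acquire(self, txn, item, mode):
        """Grant the lock if it is compatible with locks held by others."""
        holders = self.held.setdefault(item, [])
        for other, other_mode in holders:
            if other != txn and not COMPATIBLE[(other_mode, mode)]:
                return False
        holders.append((txn, mode))
        return True

    def release(self, txn, item):
        """Drop every lock txn holds on item."""
        self.held[item] = [(t, m) for t, m in self.held.get(item, []) if t != txn]

locks = LockTable()
grants = [
    locks.acquire("T1", "A", "S"),   # first reader
    locks.acquire("T2", "A", "S"),   # second reader shares the lock
    locks.acquire("T3", "A", "X"),   # writer is refused while readers hold A
]
locks.release("T1", "A")
locks.release("T2", "A")
grants.append(locks.acquire("T3", "A", "X"))   # writer succeeds once A is free
```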
The simplistic lock protocol is the simplest way of locking data during a transaction.
Simplistic lock-based protocols require every transaction to obtain a lock on the data before
it inserts, deletes, or updates it. The data item is unlocked after the transaction completes.
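Growing phase: In the growing phase, the transaction may acquire new locks, but none can be
released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released,
but no new locks can be acquired.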
In the below example, if lock conversion is allowed then the following phase can happen:
Example:
The following shows how locking and unlocking work with 2-PL.
Transaction T1:
o Growing phase: from step 1-3
o Shrinking phase: from step 5-7
o Lock point: at 3
Transaction T2:
o Growing phase: from step 2-6
o Shrinking phase: from step 8-9
o Lock point: at 6
o The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock
immediately after using it.
o Strict-2PL waits until the whole transaction commits, and then it releases all the locks at
once.
o The Strict-2PL protocol therefore has no gradual shrinking phase of lock release.
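Both rules can be expressed as checks over one transaction's sequence of operations. This is a sketch: the `(action, item)` tuple format is an assumption, and the strict check follows the description above, where all locks are kept until the commit.

```python
def follows_2pl(operations):
    """True if no lock is acquired after the first unlock."""
    shrinking = False
    for action, _item in operations:
        if action == "unlock":
            shrinking = True
        elif action == "lock" and shrinking:
            return False
    return True

def follows_strict_2pl(operations):
    """True if the sequence is 2PL and every unlock comes after the commit."""
    if not follows_2pl(operations):
        return False
    committed = False
    for action, _item in operations:
        if action == "commit":
            committed = True
        elif action == "unlock" and not committed:
            return False
    return True

two_phase = [("lock", "A"), ("lock", "B"), ("unlock", "A"),
             ("unlock", "B"), ("commit", None)]
not_two_phase = [("lock", "A"), ("unlock", "A"), ("lock", "B"),
                 ("unlock", "B"), ("commit", None)]
strict = [("lock", "A"), ("lock", "B"), ("commit", None),
          ("unlock", "A"), ("unlock", "B")]
```

The sequence `two_phase` passes the 2PL check but not the strict one, because it releases its locks before committing.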
• A deadlock is a condition where two or more transactions are waiting indefinitely for one
another to give up locks. Deadlock is said to be one of the most feared complications in a
DBMS, as no task ever gets finished and all remain in a waiting state forever.
For example:
• In the student table, transaction T1 holds a lock on some rows and needs to update some
rows in the grade table.
Simultaneously, transaction T2 holds locks on some rows in the grade table and needs
to update the rows in the student table held by transaction T1.
• Now the main problem arises: transaction T1 is waiting for T2 to release its lock, and
similarly, transaction T2 is waiting for T1 to release its lock. All activity comes to a halt
and remains at a standstill until the DBMS detects the deadlock and aborts one of the
transactions.
Deadlock Avoidance
o It is better to avoid a deadlock than to abort or restart the database once it is stuck in
a deadlock state, which wastes time and resources.
o A deadlock avoidance mechanism is used to detect any deadlock situation in advance. A
method such as the "wait-for graph" is used for detecting the deadlock situation, but this
method is suitable only for smaller databases. For larger databases, a deadlock prevention
method can be used.
Deadlock Detection
o In a database, when a transaction waits indefinitely to obtain a lock, the DBMS should
detect whether the transaction is involved in a deadlock or not. The lock manager maintains
a wait-for graph to detect deadlock cycles in the database.
The wait-for graph for the above scenario is shown below:
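As a rough sketch, the scenario in which T1 waits for T2 and T2 waits for T1 can be modelled as a directed graph, and a deadlock shows up as a cycle in it. The dictionary representation and the function name `has_deadlock` are assumptions for illustration.

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph.

    wait_for maps each transaction to the set of transactions it waits for.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {}

    def visit(txn):
        colour[txn] = GREY
        for other in wait_for.get(txn, ()):
            state = colour.get(other, WHITE)
            if state == GREY:          # reached a transaction on our own path
                return True
            if state == WHITE and visit(other):
                return True
        colour[txn] = BLACK
        return False

    return any(colour.get(t, WHITE) == WHITE and visit(t) for t in wait_for)

deadlocked = {"T1": {"T2"}, "T2": {"T1"}}   # T1 -> T2 -> T1
no_deadlock = {"T1": {"T2"}, "T2": set()}   # T2 is not waiting for anyone
```

Once a cycle is found, the DBMS breaks it by aborting one of the transactions on the cycle.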
Deadlock Prevention
o The deadlock prevention method is suitable for a large database. If the resources are allocated
in such a way that a deadlock never occurs, then the deadlock is prevented.
o The database management system analyzes the operations of a transaction to determine whether
they can create a deadlock situation. If they can, the DBMS never allows that
transaction to be executed.
Wait-Die scheme
• In this scheme, if a transaction requests a resource that is already held with a
conflicting lock by another transaction, the DBMS simply checks the timestamps of
both transactions. It allows the older transaction to wait until the resource is available
for execution.
Let's assume there are two transactions Ti and Tj, and let TS(T) be the timestamp of any
transaction T. If one of them holds a lock on a resource and the other requests that resource,
the DBMS performs the following actions:
1. If TS(Ti) < TS(Tj), Ti is the older transaction. If Tj holds the resource and Ti requests
it, then Ti is allowed to wait until the data item is available. That is, an older
transaction waiting for a resource locked by a younger transaction is allowed to wait
until the resource is available.
2. If TS(Ti) < TS(Tj), Ti is again the older transaction. If Ti holds the resource and Tj
requests it, then Tj is killed and restarted later with a random delay but with the same
timestamp.
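The two cases reduce to one comparison between the timestamp of the requesting transaction and that of the lock holder. A minimal sketch, with hypothetical names:

```python
def wait_die(requester_ts, holder_ts):
    """Decide what the requesting transaction does under wait-die.

    A smaller timestamp means an older transaction.
    """
    if requester_ts < holder_ts:
        # Older requester: allowed to wait for the younger holder.
        return "wait"
    # Younger requester: rolled back, then restarted later with the
    # same timestamp.
    return "die"

older_requests = wait_die(requester_ts=5, holder_ts=10)
younger_requests = wait_die(requester_ts=10, holder_ts=5)
```

Because only older transactions ever wait for younger ones, no cycle of waiting transactions can form. Keeping the original timestamp on restart lets a killed transaction eventually become the oldest, so it is not starved.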
o A DBMS is a very complex structure, with multiple transactions being performed
and carried out every second.
o The robustness of a system depends not only on its complex and secure
architecture but also on how data is managed and maintained in the
worst cases. If the underlying architecture fails or crashes, there must be
techniques and procedures by which the data lost during a transaction can be recovered.
o Database recovery is the method of restoring the database to its correct state in the event
of a failure during a transaction or after the end of a process. Earlier, you were given the
concept of database recovery as a service that should be provided by every DBMS to
ensure that the database is dependable and remains in a consistent state in the presence
of failures. In this context, dependability refers to both the resilience of the DBMS to
various kinds of failure and its ability to recover from those failures.
Every DBMS should offer the following facilities to help with the recovery mechanism:
Backup mechanism makes backup copies of the database at specific intervals.
Logging facilities keep track of the current state of transactions and any
changes made to the database.
Checkpoint facility allows the updates made to the database so far to be made
permanent.
Recovery manager allows the database system to restore the database to a reliable and
steady state after a failure occurs.
Failure Classification
To find where the problem has occurred, we generalize a failure into the following categories:
1. Transaction failure
2. System crash
3. Disk failure
1. Transaction failure
A transaction failure occurs when a transaction fails to execute or when it reaches a point
from which it can't go any further. If only a few transactions or processes are affected, this
is called a transaction failure.
Reasons for a transaction failure could be –
2. System Crash
System failure can occur due to power failure or other hardware or software failure.
3. Disk Failure
o It occurs when hard-disk drives or storage drives fail. It was a common problem in the
early days of technology evolution, when such drives used to fail frequently.
o Disk failure occurs due to the formation of bad sectors, a disk head crash, unreachability
of the disk, or any other failure that destroys all or part of the disk storage.
Log-Based Recovery
• The log is a sequence of records. The log of each transaction is maintained in some stable
storage so that if any failure occurs, the database can be recovered from it.
• If any operation is performed on the database, it is recorded in the log.
• The log record must be stored before the actual change is applied to the database.
• Let's assume there is a transaction to modify the City of a student. The following logs are
written for this transaction.
When the transaction is initiated, it writes a 'start' log.
<Tn, Start>
When the transaction modifies the City from 'Noida' to 'Bangalore', another log
is written to the file.
<Tn, City, 'Noida', 'Bangalore'>
When the transaction is finished, it writes another log to indicate the end of the
transaction.
<Tn, Commit>
1. Deferred database modification:
The deferred modification technique is used when the transaction does not modify
the database until it has committed.
In this method, all the logs are created and stored in stable storage, and the
database is updated when the transaction commits.
2. Immediate database modification:
When the system crashes, the system consults the log to find which transactions
need to be undone and which need to be redone.
1. If the log contains both <Ti, Start> and <Ti, Commit>, or just <Ti, Commit>, then
transaction Ti needs to be redone.
2. If the log contains <Ti, Start> but contains neither <Ti, Commit> nor
<Ti, Abort>, then transaction Ti needs to be undone.
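The redo and undo rules can be sketched over a log held as a list of tuples. The tuple layout mirrors the records above, but the function `recover`, the item `Marks`, and the state of the database at the crash are assumptions made for illustration.

```python
def recover(log, database):
    """Redo committed transactions and undo incomplete ones.

    Log records are tuples: (txn, "start"), (txn, "commit"), (txn, "abort"),
    or (txn, item, old_value, new_value) for an update.
    """
    started = {r[0] for r in log if r[1:] == ("start",)}
    committed = {r[0] for r in log if r[1:] == ("commit",)}
    aborted = {r[0] for r in log if r[1:] == ("abort",)}
    redo = committed
    undo = started - committed - aborted

    # Undo: scan the log backwards, restoring old values.
    for record in reversed(log):
        if len(record) == 4 and record[0] in undo:
            _txn, item, old, _new = record
            database[item] = old
    # Redo: scan the log forwards, reapplying new values.
    for record in log:
        if len(record) == 4 and record[0] in redo:
            _txn, item, _old, new = record
            database[item] = new
    return sorted(redo), sorted(undo)

log = [
    ("T1", "start"),
    ("T1", "City", "Noida", "Bangalore"),
    ("T1", "commit"),
    ("T2", "start"),
    ("T2", "Marks", 70, 85),          # T2 never committed
]
# Assumed state on disk at the crash: T1's write was lost, T2's write landed.
database = {"City": "Noida", "Marks": 85}
redo_list, undo_list = recover(log, database)
```

After recovery the committed change to City is present and the uncommitted change to Marks is gone, which is why each update record has to carry both the old and the new value.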
Checkpoint
The checkpoint is a mechanism by which all the previous logs are removed
from the system and permanently stored on the storage disk.
The checkpoint is like a bookmark. During the execution of a transaction, such
checkpoints are marked, and as the steps of the transaction are executed,
the log files are created.
When a checkpoint is reached, the transaction's updates are written to the database,
and the entire log file up to that point is removed. The log file is then
updated with the new steps of the transaction until the next checkpoint, and so on.
The checkpoint is used to declare a point before which the DBMS was in a consistent
state and all transactions were committed.
A recovery system recovers the database from a failure in the following manner:
The recovery system reads the log files from the end to the start, that is, from T4 to T1.
The recovery system maintains two lists, a redo-list and an undo-list.
A transaction is put into the redo-list if the recovery system sees a log with <Tn, Start> and
<Tn, Commit>, or just <Tn, Commit>. All the transactions in the redo-list are redone
before their logs are saved.
For example:
• In the log file, transactions T2 and T3 will have <Tn, Start> and <Tn, Commit>. Transaction
T1 will have only <Tn, Commit> in the log file, because it was committed after the
checkpoint was crossed. Hence T1, T2 and T3 are put into the redo-list.
• A transaction is put into the undo-list if the recovery system sees a log with <Tn, Start> but
finds no commit or abort log. All the transactions in the undo-list are undone, and their
logs are removed.
For example:
• Transaction T4 will have only <Tn, Start>. So T4 will be put into the undo-list, since this
transaction was not yet complete and failed midway.
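The T1 to T4 example can be sketched by scanning the part of the log written after the checkpoint. The exact order of the records is an assumption, since only the outcome for each transaction is described, and the function name is hypothetical.

```python
def build_lists(log_after_checkpoint):
    """Scan the log from the end to the start and sort the transactions
    into a redo-list and an undo-list."""
    redo, undo = [], []
    finished = set()
    for txn, event in reversed(log_after_checkpoint):
        if event == "commit":
            finished.add(txn)
            redo.append(txn)
        elif event == "abort":
            finished.add(txn)
        elif event == "start" and txn not in finished:
            undo.append(txn)
    return sorted(redo), sorted(undo)

log_after_checkpoint = [
    ("T1", "commit"),                    # T1 started before the checkpoint
    ("T2", "start"), ("T2", "commit"),
    ("T3", "start"), ("T3", "commit"),
    ("T4", "start"),                     # no commit before the failure
]
redo_list, undo_list = build_lists(log_after_checkpoint)
```

Reading backwards means a commit record is seen before the matching start record, so a start with no commit already seen marks an incomplete transaction.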