MODULE 3
● Storage Strategies
❖ Comparison of ordered indexing and hashing
○ Indexing is a technique that allows records to be retrieved quickly from a database file, whereas hashing is a technique that finds the location of the desired data on disk without using an index structure.
○ Indexing uses a data reference to hold the address of the disk block, whereas hashing uses mathematical functions, known as hash functions, to calculate the direct location of records on disk.
○ Indexing is important because it protects the files and documents of large business organizations and optimizes database performance, whereas hashing is important because it ensures the data integrity of files and messages: it takes a variable-length string or message and compresses and converts it into a fixed-length value.
❖ Indices
➔ Types of index
➔ Ordered indices
➔ Hash indices
➔ Dense Index
➔ Sparse index
➔ Multilevel Index
➔ Types of Indexing
★ Single level (primary, clustering, secondary index)
Indexing in DBMS
○ Indexing is used to optimize the performance of a database by minimizing the
number of disk accesses required when a query is processed.
○ The index is a type of data structure. It is used to locate and access the data in a
database table quickly.
Index structure:
Indexes can be created using some database columns.
○ The first column of the database is the search key that contains a copy of the
primary key or candidate key of the table. The values of the primary key are
stored in sorted order so that the corresponding data can be accessed easily.
○ The second column of the database is the data reference. It contains a set of
pointers holding the address of the disk block where the value of the particular
key can be found.
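As a rough sketch of this two-column structure (the keys and block names below are made up for illustration), an index can be modelled as a sorted list of (search key, data reference) pairs and searched with binary search:

```python
from bisect import bisect_left

# Minimal sketch of an index file: each entry pairs a search key (a copy of the
# primary/candidate key) with a data reference, i.e. the address of the disk
# block holding that record. All names and values here are illustrative.
index = [
    (101, "block_7"),
    (205, "block_3"),
    (310, "block_9"),
    (478, "block_1"),
]

def lookup(index, key):
    """Binary-search the sorted search-key column and return the block address."""
    keys = [k for k, _ in index]
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return index[pos][1]          # data reference (disk block address)
    return None                       # key not present in the index

print(lookup(index, 310))             # -> block_9
```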
Indexing Methods
Ordered indices
The indices are usually sorted to make searching faster. The indices which are sorted
are known as ordered indices.
Example: Suppose we have an employee table with thousands of records, each of which is 10 bytes long. If their IDs start with 1, 2, 3, ... and so on, and we have to search for the record with ID 543:
○ In the case of a database with no index, we have to scan the disk blocks from the start until we reach 543. The DBMS will read the record after reading 543*10 = 5430 bytes.
○ In the case of an index, we search using the index (each index entry here being 2 bytes), and the DBMS will read the record after reading 542*2 = 1084 bytes, which is far less than in the previous case.
Primary Index
○ If the index is created on the basis of the primary key of the table, then it is
known as primary indexing. These primary keys are unique to each record and
contain 1:1 relation between the records.
○ As primary keys are stored in sorted order, the performance of the searching
operation is quite efficient.
○ The primary index can be classified into two types: Dense index and Sparse
index.
Dense index
○ The dense index contains an index record for every search key value in the data
file. It makes searching faster.
○ In this, the number of records in the index table is same as the number of records
in the main table.
○ It needs more space to store index record itself. The index records have the
search key and a pointer to the actual record on the disk.
Sparse index
○ In the data file, index record appears only for a few items. Each item points to a
block.
○ In this, instead of pointing to each record in the main table, the index points to the
records in the main table in a gap.
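The sketch below contrasts the two kinds of primary index using made-up keys and an assumed block size of 4 records; it is only an illustration of the idea, not a real storage layout:

```python
# Hypothetical sorted data file: one (key, record) pair per "row".
data_file = [(i, f"record-{i}") for i in range(1, 21)]      # keys 1..20

# Dense index: one index entry per search-key value in the data file.
dense_index = {key: pos for pos, (key, _) in enumerate(data_file)}

# Sparse index: one entry per block; assume 4 records per block here.
BLOCK_SIZE = 4
sparse_index = [(data_file[i][0], i) for i in range(0, len(data_file), BLOCK_SIZE)]

def sparse_lookup(key):
    """Find the last sparse entry whose key <= target, then scan that block."""
    start = 0
    for anchor_key, pos in sparse_index:
        if anchor_key <= key:
            start = pos
        else:
            break
    for k, rec in data_file[start:start + BLOCK_SIZE]:
        if k == key:
            return rec
    return None

print(data_file[dense_index[7]][1])   # dense: direct jump -> record-7
print(sparse_lookup(7))               # sparse: anchor + short scan -> record-7
```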
Clustering Index
○ A clustered index can be defined as an ordered data file. Sometimes the index is
created on non-primary key columns which may not be unique for each record.
○ In this case, to identify the record faster, we will group two or more columns to
get the unique value and create index out of them. This method is called a
clustering index.
○ The records which have similar characteristics are grouped, and indexes are created for these groups.
Secondary Index
1. In sparse indexing, as the size of the table grows, the size of the mapping also grows. These mappings are usually kept in primary memory so that address fetches are faster.
2. Then the actual data is searched in secondary memory based on the address obtained from the mapping.
3. If the mapping size grows then fetching the address itself becomes slower. In
this case, the sparse index will not be efficient.
4. To overcome this problem, secondary indexing is introduced.
5. In secondary indexing, to reduce the size of mapping, another level of indexing is
introduced.
6. In this method, the huge range for the columns is selected initially so that the
mapping size of the first level becomes small.
7. Then each range is further divided into smaller ranges. The mapping of the first
level is stored in the primary memory, so that address fetch is faster.
8. The mapping of the second level and actual data are stored in the secondary
memory (hard disk).
For example:
★ If you want to find the record of roll 111 in the diagram, it will search for the highest entry which is smaller than or equal to 111 in the first-level index. It will get 100 at this level.
★ Then, in the second index level, it again finds the highest entry less than or equal to 111 and gets 110. Now, using the address 110, it goes to the data block and starts searching each record until it gets 111.
★ This is how a search is performed in this method. Inserting, updating or deleting is also done in the same manner.
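A minimal sketch of this two-level lookup, using the roll-111 numbers from the example (the other index entries and the block contents are assumed for illustration):

```python
def highest_leq(sorted_keys, target):
    """Return the highest entry that is <= target (as in the roll-111 example)."""
    best = None
    for k in sorted_keys:
        if k <= target:
            best = k
        else:
            break
    return best

# First-level index (kept in primary memory): wide ranges.
first_level = [1, 100, 200, 300]
# Second-level index (on disk): finer ranges under each first-level entry.
second_level = {100: [100, 110, 120, 130, 140]}
# Data blocks addressed by second-level entries (hypothetical contents).
data_blocks = {110: [110, 111, 112, 113]}

target = 111
l1 = highest_leq(first_level, target)          # -> 100
l2 = highest_leq(second_level[l1], target)     # -> 110
found = target in data_blocks[l2]              # sequential scan of the block
print(l1, l2, found)                           # 100 110 True
```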
Multilevel Indexing:
1. Multilevel Indexing: With the growth of the size of the database, indices also grow. As the index is stored in main memory, a single-level index might become too large to store, leading to multiple disk accesses.
2. Multilevel indexing segregates the main block into various smaller blocks so that each can be stored in a single block.
3. The outer blocks are divided into inner blocks, which in turn point to the data blocks.
4. This can be easily stored in main memory with less overhead.
Clustered vs Non-clustered index (parameter: use for)
Clustered: You can sort the records and store the clustered index physically in memory as per that order.
Non-clustered: A non-clustered index helps you create a logical order for data rows and uses pointers to the physical data files.
Disadvantages of Indexing
● Indexing necessitates more storage space to hold the index data structure, which might
increase the total size of the database.
● Increased database maintenance overhead: Indexes must be maintained as data is
added, destroyed, or modified in the table, which might raise database maintenance
overhead.
● Indexing can reduce insert and update performance since the index data structure must
be updated each time data is modified.
● Choosing an index can be difficult: It can be challenging to choose the right indexes for a
specific query or application and may call for a detailed examination of the data and
access patterns.
★ In the B+ tree, the leaf nodes are linked using a linked list. Therefore, a B+ tree can support random access as well as sequential access.
Structure of B+ Tree
○ In the B+ tree, every leaf node is at equal distance from the root node. The B+
tree is of the order n where n is fixed for every B+ tree.
Internal node
○ An internal node of the B+ tree can contain at least n/2 record pointers except
the root node.
Leaf node
○ The leaf node of the B+ tree can contain at least n/2 record pointers and n/2 key
values.
○ Every leaf node of the B+ tree contains one block pointer P to point to next leaf
node.
Searching a record in a B+ Tree: Suppose we want to search 55. In the intermediary node, we will find a branch between the 50 and 75 nodes. Then, at the end, we will be redirected to the third leaf node. Here the DBMS will perform a sequential search to find 55.
B+ Tree Insertion
Suppose we want to insert a record 60 in the below structure. It will go to the 3rd leaf
node after 55. It is a balanced tree, and a leaf node of this tree is already full, so we
cannot insert 60 there.
In this case, we have to split the leaf node, so that it can be inserted into tree without
affecting the fill factor, balance and order.
The 3rd leaf node would have the values (50, 55, 60, 65, 70), and its current branch value in the intermediate node is 50. We will split the leaf node in the middle so that the balance of the tree is not altered. So we can group (50, 55) and (60, 65, 70) into two leaf nodes.
If these two have to be leaf nodes, the intermediate node cannot branch from 50 alone. It should have 60 added to it, and then we can have a pointer to the new leaf node.
This is how we can insert an entry when there is overflow. In a normal scenario, it is very
easy to find the node where it fits and then place it in that leaf node.
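A minimal sketch of just the leaf-split step described above, assuming the leaf currently holds (50, 55, 65, 70) and 60 is being inserted; the parent update and pointer rewiring are omitted:

```python
def split_full_leaf(leaf, key):
    """Insert `key` into an already-full leaf and split it in the middle,
    as in the (50, 55, 60, 65, 70) example above. Returns the two new
    leaves and the key to be copied up into the intermediate node."""
    keys = sorted(leaf + [key])
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]      # right[0] (60 here) goes to the parent

left, right, up = split_full_leaf([50, 55, 65, 70], 60)
print(left, right, up)                # [50, 55] [60, 65, 70] 60
```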
B+ Tree Deletion
Suppose we want to delete 60 from the above example. In this case, we have to remove 60 from the intermediate node as well as from the 4th leaf node. If we remove it only from the intermediate node, then the tree will not satisfy the rules of a B+ tree, so we need to modify it to keep the tree balanced.
After deleting node 60 from above B+ tree and re-arranging the nodes, it will show as
follows:
❖ B-Trees
➔ B+trees examples
➔ Queries on B-Trees
➔ B-Trees index files
https://builtin.com/data-science/b-tree-index
❖ Hashing
➔ Static Hashing
➔ Deficiencies of static hashing
➔ Linear Probing
➔ Dynamic Hashing
➔ Hash Structure(extendable hashing)
Hashing in DBMS
In a huge database structure, it is very inefficient to search all the index
values and reach the desired data. Hashing technique is used to calculate
the direct location of a data record on the disk without using index
structure.
In this technique, a hash function can use any column value to generate the address. Most of the time, the hash function uses the primary key to generate the address of the data block. The hash function can be anything from a simple mathematical function to a complex one. We can even consider the primary key itself as the address of the data block; that means each row is stored in the data block whose address is the same as its primary key.
The above diagram shows data block addresses that are the same as the primary key values. The hash function can also be a simple mathematical function like exponential, mod, cos, sin, etc. Suppose we have a mod(5) hash function to determine the address of the data block. In this case, mod(5) is applied to the primary keys, which generates 3, 3, 1, 4 and 2 respectively, and the records are stored at those data block addresses.
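A minimal sketch of this idea with a mod(5) hash function; the employee IDs below are made up, since the notes' diagram is not reproduced here:

```python
# Static hash addressing with a mod(5) hash function (keys are illustrative).
def hash_address(primary_key, buckets=5):
    return primary_key % buckets          # direct location of the data block

for emp_id in (103, 104, 105, 106, 107):
    print(emp_id, "-> bucket", hash_address(emp_id))
# 103 -> bucket 3, 104 -> bucket 4, 105 -> bucket 0, 106 -> bucket 1, 107 -> bucket 2
```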
Types of Hashing:
Static Hashing
In static hashing, the resultant data bucket address will always be the same. That
means if we generate an address for EMP_ID =103 using the hash function mod (5) then
it will always result in same bucket address 3. Here, there will be no change in the
bucket address.
Hence in this static hashing, the number of data buckets in memory remains constant
throughout. In this example, we will have five data buckets in the memory used to store
the data.
Operations of Static Hashing
○ Searching a record
When a record needs to be searched, then the same hash function retrieves the address
of the bucket where the data is stored.
○ Insert a Record
When a new record is inserted into the table, an address is generated for the new record based on the hash key, and the record is stored at that location.
○ Delete a Record
To delete a record, we first fetch the record which is supposed to be deleted. Then we delete the record at that address in memory.
○ Update a Record
To update a record, we will first search it using a hash function, and then the data record
is updated.
Bucket Overflow: If we want to insert a new record into the file but the data bucket address generated by the hash function is not empty (data already exists at that address), this situation in static hashing is known as bucket overflow. This is a critical situation in this method.
To overcome this situation, there are various methods. Some commonly used methods are as follows:
1. Open Hashing
When the hash function generates an address at which data is already stored, the next free bucket is allocated to the record. This mechanism is called linear probing.
For example: suppose R3 is a new record which needs to be inserted, and the hash function generates the address 112 for R3. But the generated address is already full, so the system searches the next available data bucket, 113, and assigns R3 to it.
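A rough sketch of linear probing under assumed parameters (a 10-slot table and made-up keys; the record names R1, R2 are illustrative):

```python
# Open hashing (linear probing): when the home bucket is occupied, scan
# forward to the next free bucket. Table size and keys are illustrative.
TABLE_SIZE = 10
table = [None] * TABLE_SIZE

def insert_linear_probing(key, value):
    addr = key % TABLE_SIZE
    for step in range(TABLE_SIZE):
        slot = (addr + step) % TABLE_SIZE     # probe the next bucket on collision
        if table[slot] is None:
            table[slot] = (key, value)
            return slot
    raise RuntimeError("hash table is full")

print(insert_linear_probing(112, "R1"))       # home bucket 2
print(insert_linear_probing(122, "R2"))       # collides with 2, goes to 3
```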
2. Close Hashing
When buckets are full, then a new data bucket is allocated for the same hash result and
is linked after the previous one. This mechanism is known as Overflow chaining.
For example: suppose R3 is a new record which needs to be inserted into the table, and the hash function generates the address 110 for it. But this bucket is too full to store the new data. In this case, a new bucket is inserted at the end of bucket 110 and is linked to it.
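A rough sketch of overflow chaining, with made-up keys that both hash to the same bucket:

```python
# Closed hashing with overflow chaining: each bucket keeps a chain (list) of
# records, so a full home bucket simply grows its chain. Keys are illustrative.
from collections import defaultdict

BUCKETS = 5
chains = defaultdict(list)

def insert_chained(key, value):
    chains[key % BUCKETS].append((key, value))   # link the record after the others

insert_chained(110, "R1")
insert_chained(115, "R2")                        # same bucket 0, chained behind R1
print(chains[0])                                 # [(110, 'R1'), (115, 'R2')]
```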
Dynamic Hashing
○ The dynamic hashing method is used to overcome the problems of static
hashing like bucket overflow.
○ This method makes hashing dynamic, i.e., it allows insertion or deletion without
resulting in poor performance.
How to search a key:
○ First, check how many bits are used in the directory; this number of bits is called i.
○ Take the least significant i bits of the hash address. This gives an index into the directory.
○ Now, using this index, go to the directory and find the bucket address where the record might be.
How to insert a new record:
○ First, follow the same procedure as for retrieval, ending up in some bucket.
○ If there is still space in that bucket, place the record in it.
○ If the bucket is full, split the bucket and redistribute the records.
For example:
Consider the following grouping of keys into buckets, depending on the prefix of their
hash address:
The last two bits of 2 and 4 are 00, so they go into bucket B0. The last two bits of 5 and 6 are 01, so they go into bucket B1. The last two bits of 1 and 3 are 10, so they go into bucket B2. The last two bits of 7 are 11, so it goes into B3.
Insert key 9 with hash address 10001 into the above
structure:
○ Since key 9 has hash address 10001, it must go into the first bucket. But bucket
B1 is full, so it will get split.
○ The splitting will separate 5 and 9 from 6, since the last three bits of 5 and 9 are 001, so they go into bucket B1, while the last three bits of 6 are 101, so it goes into bucket B5.
○ Keys 2 and 4 are still in B0. The records in B0 are pointed to by both the 000 and 100 directory entries, because the last two bits of both entries are 00.
○ Keys 1 and 3 are still in B2. The records in B2 are pointed to by both the 010 and 110 directory entries, because the last two bits of both entries are 10.
○ Key 7 is still in B3. The record in B3 is pointed to by both the 111 and 011 directory entries, because the last two bits of both entries are 11.
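A small sketch of the directory lookup: only key 9's hash address (10001) comes from the example above; the other hash addresses are assumed, chosen to match the stated low-order bits:

```python
# Extendible hashing directory lookup: take the last i bits of the hash address
# to choose the bucket. Hash addresses other than key 9's (10001, given above)
# are made up, but they match the stated low-order bits.
def bucket_index(hash_address, i):
    return hash_address & ((1 << i) - 1)     # keep only the least significant i bits

hash_addr = {2: 0b00100, 4: 0b01000, 5: 0b01001, 6: 0b10101,
             1: 0b00110, 3: 0b11010, 7: 0b10111, 9: 0b10001}

i = 2
for key, h in hash_addr.items():
    print(f"key {key}: last {i} bits = {bucket_index(h, i):0{i}b}")
# After inserting 9, bucket B1 overflows and the directory uses i = 3 bits,
# separating 5 and 9 (last bits 001) from 6 (last bits 101).
```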
Advantages of dynamic hashing
○ In this method, the performance does not decrease as the data grows in the
system. It simply increases the size of memory to accommodate the data.
○ In this method, memory is well utilized as it grows and shrinks with the data.
There will not be any unused memory lying.
○ This method is good for the dynamic database where data grows and shrinks
frequently.
Disadvantages of dynamic hashing
● In this method, if the data size increases, the number of buckets also increases. The addresses of the data are maintained in the bucket address table, because the data addresses keep changing as buckets grow and shrink. If there is a huge increase in data, maintaining the bucket address table becomes tedious.
● Bucket overflow can still occur in this method, but it takes longer to reach this situation than in static hashing.
RAID technology
There are 7 levels of RAID schemes. These schemes are referred to as RAID 0, RAID 1, ...., RAID 6.
○ In this technology, the operating system views these separate disks as a single
logical disk.
○ In this technology, data is distributed across the physical drives of the array.
○ In case of a disk failure, the parity information can be used to recover the data.
RAID 0
○ RAID level 0 provides data striping, i.e., data is placed across multiple disks. Because it relies only on striping, if one disk fails then all data in the array is lost.
○ This level doesn't provide fault tolerance, but it increases system performance.
Example:
20 21 22 23
24 25 26 27
28 29 30 31
32 33 34 35
In this level, instead of placing just one block onto a disk at a time, two or more blocks can be placed onto a disk before moving on to the next one.
20 22 24 26
21 23 25 27
28 30 32 34
29 31 33 35
In this above figure, there is no duplication of data. Hence, a block once lost cannot be
recovered.
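A minimal sketch of block striping across four disks, using the block numbers 20-35 from the figures above (the disk count is an assumption for illustration):

```python
# RAID 0 block striping: block i is written to disk i % n, so consecutive
# blocks land on different disks and can be read in parallel.
def stripe(blocks, n_disks=4):
    disks = [[] for _ in range(n_disks)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)
    return disks

for d, contents in enumerate(stripe(list(range(20, 36)))):
    print(f"disk {d}: {contents}")
# disk 0: [20, 24, 28, 32], disk 1: [21, 25, 29, 33], ...
```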
Pros of RAID 0:
○ In this level, throughput is increased because multiple data requests are probably not on the same disk.
○ This level fully utilizes the disk space and provides high performance.
Cons of RAID 0:
○ In this level, the failure of any one disk results in complete data loss in the respective array.
RAID 1
This level is called mirroring of data as it copies the data from drive 1 to drive 2. It
provides 100% redundancy in case of a failure.
Example:
A A B B
C C D D
E E F F
G G H H
Only half the space of the drives is used to store data. The other half of each drive is just a mirror of the already stored data.
Pros of RAID 1:
○ The main advantage of RAID 1 is fault tolerance. In this level, if one disk fails,
then the other automatically takes over.
○ In this level, the array will function even if any one of the drives fails.
Cons of RAID 1:
○ In this level, one extra drive is required per drive for mirroring, so the expense is
higher.
RAID 2
○ RAID 2 consists of bit-level striping using Hamming-code parity. In this level, each data bit in a word is recorded on a separate disk, and the ECC codes of the data words are stored on a different set of disks.
○ Due to its high cost and complex structure, this level is not commercially used.
This same performance can be achieved by RAID 3 at a lower cost.
Pros of RAID 2:
Cons of RAID 2:
RAID 3
○ RAID 3 consists of byte-level striping with dedicated parity. In this level, the parity
information is stored for each disk section and written to a dedicated parity drive.
○ In case of drive failure, the parity drive is accessed, and data is reconstructed
from the remaining devices. Once the failed drive is replaced, the missing data
can be restored on the new drive.
○ In this level, data can be transferred in bulk. Thus high-speed data transmission is
possible.
A B C P(A, B, C)
D E F P(D, E, F)
G H I P(G, H, I)
J K L P(J, K, L)
Pros of RAID 3:
Cons of RAID 3:
○ This level allows recovery of at most 1 disk failure due to the way parity works. In
this level, if more than one disk fails, then there is no way to recover the data.
○ Both level 3 and level 4 require at least three disks to implement RAID.
RAID 4
○ RAID 4 consists of block-level striping with a dedicated parity disk.
A B C P0
D E F P1
G H I P2
J K L P3
In this level, parity is calculated using an XOR function. If the data bits are 0, 1, 0, 0 then the parity bit is XOR(0, 1, 0, 0) = 1. If the data bits are 0, 0, 1, 1 then the parity bit is XOR(0, 0, 1, 1) = 0. That means an even number of ones results in parity 0 and an odd number of ones results in parity 1.
C1 C2 C3 C4 Parity
0 1 0 0 1
0 0 1 1 0
Suppose that in the above figure, C2 is lost due to some disk failure. Then using the
values of all the other columns and the parity bit, we can recompute the data bit stored
in C2. This level allows us to recover lost data.
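A small sketch of this XOR parity calculation and recovery, using the C1-C4 row from the table above:

```python
# XOR parity (RAID 3/4/5 style): the parity bit is the XOR of the data bits,
# and a single lost column can be rebuilt from the surviving bits plus parity.
from functools import reduce
from operator import xor

row = [0, 1, 0, 0]                         # C1..C4
parity = reduce(xor, row)                  # odd number of ones -> parity 1
print("parity:", parity)

# Suppose C2 is lost due to a disk failure: XOR of the surviving bits and the
# parity bit recovers it.
surviving = [row[0], row[2], row[3], parity]
recovered_c2 = reduce(xor, surviving)
print("recovered C2:", recovered_c2)       # -> 1
```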
RAID 5
○ RAID 5 is a slight modification of the RAID 4 system. The only difference is that in
RAID 5, the parity rotates among the drives.
○ Same as RAID 4, this level allows recovery of at most 1 disk failure. If more than
one disk fails, then there is no way for data recovery.
0 1 2 3 P0
5 6 7 P1 4
10 11 P2 8 9
15 P3 12 13 14
P4 16 17 18 19
This level was introduced to make the random write performance better.
Pros of RAID 5:
Cons of RAID 5:
○ In this level, disk failure recovery takes longer time as parity has to be calculated
from all available drives.
RAID 6
○ RAID 6 can survive two concurrent disk failures. With RAID 5 (or RAID 1), when a disk fails you need to replace it quickly, because if another disk fails at the same time you won't be able to recover any of the data. RAID 6 covers this case: it can survive two concurrent disk failures before you run out of options.
A0 B0 Q0 P0
A1 Q1 P1 D1
Q2 P2 C2 D2
P3 B3 C3 Q3
Pros of RAID 6:
○ This level uses block-level striping with two parity blocks distributed across all the drives, so it can tolerate the failure of any two disks at the same time.
Cons of RAID 6:
● Transaction Processing
❖ Transaction concepts
❖ Transaction state
❖ ACID properties
Operations of Transaction
A user can make different types of requests to access and modify the contents of a database.
So, we have different types of operations relating to a transaction. They are discussed as
follows:
i) Read(X)
1. A read operation is used to read the value of X from the database and store it in a buffer
in the main memory for further actions such as displaying that value.
2. Such an operation is performed when a user wishes just to see any content of the
database and not make any changes to it.
3. For example, when a user wants to check his/her account’s balance, a read operation
would be performed on user’s account balance from the database.
ii) Write(X)
1. A write operation is used to write the value to the database from the buffer in the main
memory.
2. For a write operation to be performed, first a read operation is performed to bring the value into the buffer; then some changes are made to it, e.g. a set of arithmetic operations is performed on it according to the user's request; finally, a write operation stores the modified
3. For example, when a user requests to withdraw some money from his account, his
account balance is fetched from the database using a read operation, then the amount
to be deducted from the account is subtracted from this value, and then the obtained
value is stored back in the database using a write operation.
iii) Commit
1. This operation in transactions is used to maintain integrity in the database. Due to some
failure of power, hardware, or software, etc., a transaction might get interrupted before all
its operations are completed.
2. This may cause ambiguity in the database, i.e. it might get inconsistent before and after
the transaction.
3. To ensure that further operations of any other transaction are performed only after the work of the current transaction is done, a commit operation is performed to make the changes made by a transaction permanent in the database.
iv) Rollback
1. This operation is performed to bring the database to the last saved state when any
transaction is interrupted in between due to any power, hardware, or software failure.
2. In simple words, a rollback operation undoes the operations of a transaction that were performed before its interruption, to bring the database back to a safe state and avoid any kind of ambiguity or inconsistency.
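A minimal sketch of these four operations, using a made-up in-memory "database" and buffer; it only illustrates the idea that writes touch the buffer until commit makes them permanent:

```python
# Read(X) copies a value into a buffer, Write(X) changes only the buffer,
# commit saves the buffer to the "database", rollback discards it.
database = {"A": 300}
buffer = {}

def read(x):            buffer[x] = database[x]          # Read(X)
def write(x, value):    buffer[x] = value                # Write(X), buffer only
def commit():           database.update(buffer); buffer.clear()
def rollback():         buffer.clear()                   # undo uncommitted work

read("A")
write("A", buffer["A"] - 50)     # withdraw 50
commit()                         # make the change permanent
print(database["A"])             # -> 250
```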
Transaction Schedules
When multiple transaction requests are made at the same time, we need to decide their order of
execution. Thus, a transaction schedule can be defined as a chronological order of execution of
multiple transactions. There are broadly two types of transaction schedules discussed as
follows,
i) Serial Schedule
1. In this kind of schedule, when multiple transactions are to be executed, they are
executed serially, i.e. at one time only one transaction is executed while others wait for
the execution of the current transaction to be completed.
2. This ensures consistency in the database as transactions do not execute
simultaneously.
3. But, it increases the waiting time of the transactions in the queue, which in turn lowers
the throughput of the system, i.e. number of transactions executed per time.
4. To improve the throughput of the system, another kind of schedule is used, which has stricter rules that help the database remain consistent even when transactions execute simultaneously.
Serializable
1. Serializability in DBMS is the property of a nonserial schedule that determines whether it
would maintain the database consistency or not.
2. The nonserial schedule which ensures that the database would be consistent after the
transactions are executed in the order determined by that schedule is said to be
Serializable Schedules.
3. Serial schedules always maintain database consistency, as a transaction starts only when the execution of the previous transaction has been completed.
4. Thus, serial schedules are always serializable.
Transaction States
A transaction is a series of operations, so various states occur in its completion journey. They are discussed as follows:
i) Active
1. It is the first stage of any transaction when it has begun to execute. The execution of the
transaction takes place in this state.
2. Operations such as insertion, deletion, or updation are performed during this state.
3. During this state, the data records are under manipulation and they are not saved to the
database, rather they remain somewhere in a buffer in the main memory.
ii) Partially Committed
1. This state is reached when the final operation of the transaction has been executed, but the changes are still in the buffer in main memory and have not yet been permanently saved to the database.
iii) Committed
1. This state of transaction is achieved when all the transaction-related operations have
been executed successfully along with the Commit operation, i.e. data is saved into the
database after the required manipulations in this state.
2. This marks the successful completion of a transaction.
iv) Failed
1. If any of the transaction-related operations cause an error during the active or partially
committed state, further execution of the transaction is stopped and it is brought into a
failed state.
2. Here, the database recovery system makes sure that the database is in a consistent
state.
v) Aborted
1. If the error is not resolved in the failed state, then the transaction is aborted and a
rollback operation is performed to bring the database to the last saved consistent state.
2. When the transaction is aborted, the database recovery module either restarts the
transaction or kills it.
3. The illustration below shows the various states that a transaction may encounter in its
completion journey.
[Figure: Transaction states in DBMS]
Properties of Transaction
1. As transactions deal with accessing and modifying the contents of the database, they
must have some basic properties which help maintain the consistency and integrity of
the database before and after the transaction.
2. Transactions follow 4 properties, namely, Atomicity, Consistency, Isolation, and
Durability.
3. Generally, these are referred to as ACID properties of transactions in DBMS. ACID is the
acronym used for transaction properties.
4. A brief description of each property of the transaction is as follows.
i) Atomicity
1. This property ensures that either all operations of a transaction are executed or it is
aborted.
2. In any case, a transaction can never be completed partially.
3. Each transaction is treated as a single unit (like an atom). Atomicity is achieved through
commit and rollback operations, i.e. changes are made to the database only if all
operations related to a transaction are completed, and if it gets interrupted, any changes
made are rolled back using rollback operation to bring the database to its last saved
state.
ii) Consistency
1. This property of a transaction keeps the database consistent before and after a
transaction is completed.
2. Execution of any transaction must ensure that after its execution, the database is either
in its prior stable state or a new stable state.
3. In other words, the result of a transaction should be the transformation of a database
from one consistent state to another consistent state.
4. Consistency here means that the changes made in the database are the result of only the logical operations which the user desired to perform, and there is no ambiguity.
iii) Isolation
1. This property states that two transactions must not interfere with each other, i.e. if some
data is used by a transaction for its execution, then any other transaction can not
concurrently access that data until the first transaction has completed.
2. It ensures that the integrity of the database is maintained and we don’t get any
ambiguous values. Thus, any two transactions are isolated from each other.
3. This property is enforced by the concurrency control subsystem of DBMS.
iv) Durability
1. This property ensures that the changes made to the database after a transaction is
completely executed, are durable.
2. It indicates that permanent changes are made by the successful execution of a
transaction.
3. In the event of any system failures or crashes, the consistent state achieved after the
completion of a transaction remains intact.
4. The recovery subsystem of DBMS is responsible for enforcing this property.
❖ Concurrent execution
❖ Problem with concurrent execution
❖ Concurrency Control
❖ Serializability of schedule
➔ Types of schedule
➔ Cascading rollback
➔ Conflict operation
➔ What is serializability
➔ Testing for serializability
➔ Algorithm for creation of graphs
➔ View serializability
○ In a multi-user system, multiple users can access and use the same database at
one time, which is known as the concurrent execution of the database. It means
that the same database is executed simultaneously on a multi-user system by
different users.
○ The simultaneous execution should be done in an interleaved manner, and no operation should affect the other executing operations, thus maintaining the consistency of the database. However, the concurrent execution of transaction operations gives rise to several challenging problems that need to be solved.
Lost Update Problem (W-W Conflict)
This problem occurs when two different database transactions perform read/write operations on the same database items in an interleaved manner (i.e., concurrent execution), which makes the values of the items incorrect and hence the database inconsistent.
For example:
Consider the below diagram where two transactions TX and TY, are performed on the
same account A where the balance of account A is $300.
○ At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
○ At time t2, transaction TX deducts $50 from account A that becomes $250 (only
deducted and not updated/write).
○ Alternately, at time t3, transaction TY reads the value of account A that will be
$300 only because TX didn't update the value yet.
○ At time t4, transaction TY adds $100 to account A that becomes $400 (only
added but not updated/write).
○ At time t6, transaction TX writes the value of account A that will be updated as
$250 only, as TY didn't update the value yet.
○ Similarly, at time t7, transaction TY writes the value of account A, so it writes the value computed at time t4, i.e., $400. This means the value written by TX is lost, i.e., the $250 update is lost.
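The sketch below replays this interleaving with plain variables to show how TX's update is lost; the timing comments follow the t1-t7 steps above:

```python
# Lost update problem: TX and TY both read A = 300, each updates its own
# private copy, and TY's later write overwrites TX's deduction of $50.
A = 300
tx_copy = A            # t1: TX reads A
tx_copy -= 50          # t2: TX deducts 50 locally (250)
ty_copy = A            # t3: TY reads A (still 300)
ty_copy += 100         # t4: TY adds 100 locally (400)
A = tx_copy            # t6: TX writes 250
A = ty_copy            # t7: TY writes 400, so TX's update is lost
print(A)               # -> 400 instead of the expected 350
```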
The dirty read problem occurs when one transaction updates an item of the database,
and somehow the transaction fails, and before the data gets rollback, the updated
database item is accessed by another transaction. There comes the Read-Write Conflict
between both transactions.
For example:
○ At time t3, transaction TX writes the updated value in account A, i.e., $350.
○ Then at time t4, transaction TY reads account A that will be read as $350.
○ Then at time t5, transaction TX rollbacks due to server problem, and the value
changes back to $300 (as initially).
○ But the value of account A remains $350 for transaction TY, as if it had been committed; this is a dirty read, and the situation is therefore known as the Dirty Read Problem.
Unrepeatable Read Problem
Also known as the Inconsistent Retrievals Problem, this occurs when, within a single transaction, two different values are read for the same database item.
For example:
Consider two transactions, TX and TY, performing the read/write operations on account
A, having an available balance = $300. The diagram is shown below:
○ At time t1, transaction TX reads the value from account A, i.e., $300.
○ At time t2, transaction TY reads the value from account A, i.e., $300.
○ At time t3, transaction TY updates the value of account A by adding $100 to the
available balance, and then it becomes $400.
○ After that, at time t5, transaction TX reads the available value of account A, and
that will be read as $400.
○ It means that within the same transaction TX, two different values of account A are read: $300 initially, and $400 after the update made by transaction TY. This is an unrepeatable read and is therefore known as the Unrepeatable Read Problem.
Thus, in order to maintain consistency in the database and avoid such problems that
take place in concurrent execution, management is needed, and that is where the
concept of Concurrency Control comes into role.
Concurrency Control
Concurrency Control is the working concept that is required for controlling and
managing the concurrent execution of database operations and thus avoiding the
inconsistencies in the database. Thus, for maintaining the concurrency of the database,
we have the concurrency control protocols.
The concurrency control protocols ensure the atomicity, consistency, isolation, durability
and serializability of the concurrent execution of the database transactions. Therefore,
these protocols are categorized as:
Lock-Based Protocol
In this type of protocol, any transaction cannot read or write data until it acquires an
appropriate lock on it. There are two types of lock:
1. Shared lock:
○ It is also known as a read-only lock. With a shared lock, the data item can only be read by the transaction.
○ It can be shared between transactions, because a transaction holding only a shared lock can't update the data item.
2. Exclusive lock:
○ With an exclusive lock, the data item can be both read and written by the transaction.
○ This lock is exclusive: under it, multiple transactions cannot modify the same data simultaneously.
1. Simplistic lock protocol
It is the simplest way of locking data during a transaction. Simplistic lock-based protocols allow all transactions to get a lock on the data before an insert, delete or update on it. The data item is unlocked after the transaction completes.
2. Pre-claiming lock protocol
○ Pre-claiming lock protocols evaluate the transaction to list all the data items on which it needs locks.
○ Before initiating execution of the transaction, it requests the DBMS for locks on all those data items.
○ If all the locks are granted, then this protocol allows the transaction to begin. When the transaction is completed, it releases all the locks.
○ If all the locks are not granted, then the transaction rolls back and waits until all the locks are granted.
3. Two-phase locking (2PL)
○ The two-phase locking protocol divides the execution phase of the transaction
into three parts.
○ In the first part, when the execution of the transaction starts, it seeks permission
for the lock it requires.
○ In the second part, the transaction acquires all the locks. The third phase is
started as soon as the transaction releases its first lock.
○ In the third phase, the transaction cannot demand any new locks. It only releases
the acquired locks.
There are two phases of 2PL:
Growing phase: In the growing phase, a new lock on the data item may be acquired by
the transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released, but no new locks can be acquired.
In the below example, if lock conversion is allowed, then upgrading of a lock (from a shared to an exclusive lock) is allowed only in the growing phase, and downgrading of a lock (from an exclusive to a shared lock) must be done only in the shrinking phase.
Example:
The following way shows how unlocking and locking work with 2-PL.
Transaction T1:
○ Lock point: at 3
Transaction T2:
○ Lock point: at 6
4. Strict Two-phase locking (Strict-2PL)
○ The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all
the locks, the transaction continues to execute normally.
○ The only difference between 2PL and strict 2PL is that Strict-2PL does not
release a lock after using it.
○ Strict-2PL waits until the whole transaction commits, and then it releases all the locks at once.
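A rough sketch of the 2PL rule for a single transaction (the lock manager and conflict handling between transactions are omitted for brevity):

```python
# Two-phase locking for one transaction: locks may only be acquired while the
# growing phase lasts, and the first release ends it, after which no new lock
# can be requested.
class TwoPhaseTransaction:
    def __init__(self):
        self.locks = set()
        self.growing = True          # growing phase until the first release

    def lock(self, item):
        if not self.growing:
            raise RuntimeError("2PL violation: lock requested in shrinking phase")
        self.locks.add(item)

    def unlock(self, item):
        self.growing = False         # lock point passed, shrinking phase begins
        self.locks.discard(item)

t1 = TwoPhaseTransaction()
t1.lock("A"); t1.lock("B")           # growing phase
t1.unlock("A")                       # shrinking phase starts here
# t1.lock("C")                       # would raise: not allowed under 2PL
```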
➔ Timestamp protocols
○ The lock-based protocol is used to manage the order between conflicting pairs
among transactions at the execution time. But Timestamp based protocols start
working as soon as a transaction is created.
○ Let's assume there are two transactions, T1 and T2. Suppose transaction T1 entered the system at time 007 and transaction T2 entered the system at time 009. T1 has the higher priority, so it executes first, as it entered the system first.
○ The timestamp ordering protocol also maintains the timestamps of the last 'read' and 'write' operations on each data item.
1. Check the following condition whenever a transaction Ti issues a Read(X) operation:
○ If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back; otherwise, the operation is executed.
Here, TS(Ti) denotes the timestamp of transaction Ti, and W_TS(X) denotes the largest timestamp of any transaction that executed Write(X) successfully.
➔ But the schedule may not be recoverable and may not even be cascade-free.
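A minimal sketch of the Read(X) check described above; the timestamp values are made up for illustration:

```python
# Timestamp-ordering check for Read(X): if TS(Ti) < W_TS(X), the read is
# rejected and Ti is rolled back; otherwise it executes and R_TS(X) is updated.
def read_item(ts_ti, x, w_ts, r_ts):
    if ts_ti < w_ts[x]:
        return "reject: roll back Ti"
    r_ts[x] = max(r_ts[x], ts_ti)        # remember the latest successful read
    return "execute Read(X)"

w_ts, r_ts = {"X": 9}, {"X": 5}          # timestamps are illustrative
print(read_item(7, "X", w_ts, r_ts))     # 7 < 9 -> rejected
print(read_item(12, "X", w_ts, r_ts))    # 12 >= 9 -> executed
```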
Validation Based Protocol
1. Read phase: In this phase, the transaction T is read and executed. It reads the values of various data items and stores them in temporary local variables. It can perform all the write operations on temporary variables without updating the actual database.
2. Validation phase: In this phase, the temporary variable value will be validated
against the actual data to see if it violates the serializability.
3. Write phase: If the validation of the transaction succeeds, then the temporary results are written to the database or system; otherwise, the transaction is rolled back.
Validation (Ti): It contains the time when Ti finishes its read phase and starts its
validation phase.
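A much-simplified sketch of these three phases for a single data item (real validation compares the read/write sets of concurrent transactions; the function and names here are illustrative):

```python
# Optimistic execution: work on a temporary local copy (read phase), check the
# item is unchanged (validation phase), and only then write back (write phase).
def run_optimistic(database, key, update_fn):
    snapshot = database[key]              # read phase: temporary local variable
    new_value = update_fn(snapshot)
    if database[key] != snapshot:         # validation phase
        return False                      # validation failed -> roll back
    database[key] = new_value             # write phase
    return True

db = {"A": 300}
print(run_optimistic(db, "A", lambda v: v - 50), db)   # True {'A': 250}
```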
➔ Deadlock handling
➔ Strategies
➔ Prevention
➔ Detection
DO IT FROM MA’AM’S PDF
➔ Thomas write rule
Whenever a transaction T issues a Write(X) operation, the following checks are performed:
○ If TS(T) < R_TS(X), then transaction T is aborted and rolled back, and the operation is rejected.
○ If TS(T) < W_TS(X), then don't execute the Write(X) operation of the transaction and continue processing (the outdated write is simply ignored).
○ If neither condition 1 nor condition 2 occurs, then the WRITE operation is executed by transaction T and W_TS(X) is set to TS(T).
If we use the Thomas write rule, then some schedules can be permitted that are not conflict serializable, as illustrated by the schedule in the given figure:
In the above figure, T1's read precedes T1's write of the same data item. This schedule is not conflict serializable.
The Thomas write rule ensures that T2's write is never seen by any transaction. If we delete the write operation in transaction T2, then a conflict serializable schedule can be obtained, which is shown in the figure below.
Figure: A Conflict Serializable Schedule
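A minimal sketch of the Write(X) checks under the Thomas write rule, with made-up timestamps; note how the outdated write is ignored rather than causing a rollback:

```python
# Thomas write rule for Write(X): an outdated write is silently ignored
# instead of rolling the transaction back.
def write_item(ts_t, x, value, w_ts, r_ts, db):
    if ts_t < r_ts[x]:
        return "reject: roll back T"        # a younger transaction already read X
    if ts_t < w_ts[x]:
        return "ignore obsolete write"      # Thomas write rule: skip, keep going
    db[x] = value
    w_ts[x] = ts_t
    return "execute Write(X)"

db, w_ts, r_ts = {"X": 0}, {"X": 10}, {"X": 4}   # timestamps are illustrative
print(write_item(7, "X", 99, w_ts, r_ts, db))    # 7 >= 4 but 7 < 10 -> ignored
print(write_item(12, "X", 99, w_ts, r_ts, db))   # executed, W_TS(X) becomes 12
```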