Unit 6 DBMS Concurrency Control and Normalization
Lock-Based Protocol
What is Concurrency Control?
Concurrency control is the procedure in DBMS for managing simultaneous operations without them conflicting
with one another. Concurrent access is straightforward if all users are only reading data, since readers cannot
interfere with each other. Any practical database, however, has a mix of READ and WRITE operations, so
concurrency becomes a challenge.
Concurrency control is used to address such conflicts, which mostly occur in multi-user systems. It helps
you ensure that database transactions are performed concurrently without violating the data integrity
of the respective databases.
Therefore, concurrency control is an essential element for the proper functioning of any system in which two
or more database transactions that require access to the same data are executed simultaneously.
• Lost Updates occur when multiple transactions select the same row and update it based on the
value originally selected.
• Uncommitted dependency issues occur when a second transaction selects a row that is being updated
by another transaction (dirty read).
• Non-Repeatable Read occurs when a second transaction accesses the same row several
times and reads different data each time.
• Incorrect Summary issues occur when one transaction computes a summary over the values of all the
instances of a repeated data item while a second transaction updates a few instances of that specific data
item. In that situation, the resulting summary does not reflect a correct result.
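The Lost Update problem above can be sketched in a few lines. This is a minimal, deterministic simulation of the bad interleaving (no real threads; the account name and amounts are made up for illustration):

```python
# Simulate a lost update: two transactions read the same balance
# before either writes, so T2's write overwrites T1's update.
balance = {"X": 100}

t1_read = balance["X"]          # T1 reads 100
t2_read = balance["X"]          # T2 also reads 100 (before T1 writes)

balance["X"] = t1_read + 50     # T1 writes 150
balance["X"] = t2_read - 30     # T2 writes 70 -- T1's +50 is lost

print(balance["X"])             # 70, not the serial result of 120
```

Under a concurrency control protocol, T2 would be forced to wait for (or re-read after) T1's write, so both updates would take effect.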
Why use Concurrency method?
A concurrency control method is needed in a DBMS for reasons like the following:
Example
Assume that two people go to electronic kiosks at the same time to buy a movie ticket for the same
movie and the same show time.
However, only one seat is left for that show in that particular theatre. Without concurrency
control, it is possible that both moviegoers end up purchasing a ticket. A concurrency control
method does not allow this to happen. Both moviegoers can still read the information in the movie
seating database, but concurrency control gives the ticket only to the buyer who completes the
transaction process first.
The main concurrency control protocols are:
• Lock-Based Protocols
• Two-Phase Locking Protocols
• Timestamp-Based Protocols
• Validation-Based Protocols
Lock-based Protocols
A lock is a data variable associated with a data item. The lock signifies which operations can be
performed on the data item. Locks help synchronize access to database items by concurrent
transactions.
All lock requests are made to the concurrency-control manager. A transaction proceeds only once its lock
request is granted.
1. Binary Locks: A binary lock on a data item has only two states: locked and unlocked.
2. Shared/Exclusive Locks: This type of locking mechanism separates locks based on their use. A lock
acquired on a data item in order to perform a write operation is called an exclusive lock.
A shared lock is also called a read-only lock. With a shared lock, the data item can be shared between
transactions, because no holder of a shared lock has permission to update the data item.
For example, consider a case where two transactions are reading the account balance of a person. The
database lets them both read by placing a shared lock on the item. However, if another transaction wants
to update that account's balance, the shared lock prevents it until the reading is over.
With an exclusive lock, a data item can be both read and written. The lock is exclusive: it cannot be held
concurrently with any other lock on the same data item. An X-lock is requested using the lock-X
instruction, and a transaction may unlock the data item after finishing its write operation.
For example, when a transaction needs to update the account balance of a person, you can allow it by
placing an X-lock on the item. When a second transaction then wants to read or write that item, the
exclusive lock prevents the operation.
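The shared/exclusive compatibility rules can be sketched as a tiny single-threaded lock table. The class and method names below are illustrative; a real DBMS lock manager also handles wait queues, lock upgrades, and fairness:

```python
# Minimal sketch of a shared/exclusive lock table.
class LockTable:
    def __init__(self):
        self.locks = {}  # item -> (mode, set of holder txn ids)

    def lock_s(self, txn, item):
        mode, holders = self.locks.get(item, ("S", set()))
        if mode == "X" and holders:          # an exclusive lock blocks S
            return False
        self.locks[item] = ("S", holders | {txn})
        return True

    def lock_x(self, txn, item):
        mode, holders = self.locks.get(item, ("S", set()))
        if holders and holders != {txn}:     # any other holder blocks X
            return False
        self.locks[item] = ("X", {txn})
        return True

    def unlock(self, txn, item):
        mode, holders = self.locks.get(item, ("S", set()))
        holders.discard(txn)
        if not holders:
            self.locks.pop(item, None)

lt = LockTable()
print(lt.lock_s("T1", "A"))  # True: shared lock granted
print(lt.lock_s("T2", "A"))  # True: shared locks are compatible
print(lt.lock_x("T3", "A"))  # False: X conflicts with the existing S locks
```

Once both readers unlock "A", T3's exclusive request would succeed.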
3. Simplistic Lock Protocol
This type of lock-based protocol requires a transaction to obtain a lock on every object before beginning its
operation. A transaction may unlock a data item after finishing its write operation.
4. Pre-claiming Locking
The pre-claiming lock protocol evaluates the operations and creates a list of the data items required to
begin execution. Only when all of those locks are granted does the transaction execute. After that, all
locks are released when all of its operations are over.
Starvation
Starvation is the situation when a transaction needs to wait for an indefinite period to acquire a lock.
Deadlock
Deadlock refers to a situation where two or more processes each wait for the other to release a
resource, or where several processes wait for resources in a circular chain.
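A deadlock can be detected by looking for a cycle in a waits-for graph. The sketch below is a minimal depth-first cycle check; the graph representation (a dict of adjacency lists) is an assumption for illustration:

```python
# Detect deadlock as a cycle in a waits-for graph.
# An edge T -> U means transaction T is waiting for a lock held by U.
def has_deadlock(waits_for):
    def visit(node, stack):
        if node in stack:            # revisiting a node on this path: cycle
            return True
        stack.add(node)
        for nxt in waits_for.get(node, []):
            if visit(nxt, stack):
                return True
        stack.discard(node)
        return False
    return any(visit(t, set()) for t in waits_for)

# T1 waits for T2 and T2 waits for T1: a circular wait, i.e. deadlock.
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))   # True
print(has_deadlock({"T1": ["T2"], "T2": []}))       # False
```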
Two-Phase Locking (2PL) Protocol
This locking protocol divides the execution of a transaction into three parts:
• In the first phase, when the transaction begins to execute, it acquires the locks it needs.
• In the second part, the transaction holds all of its locks. The third phase starts as soon as the
transaction releases its first lock.
• In the third phase, the transaction cannot demand any new locks; it only releases the locks it has
acquired.
The Two-Phase Locking protocol allows each transaction to make lock and unlock requests in two phases:
• Growing Phase: In this phase, a transaction may obtain locks but may not release any lock.
• Shrinking Phase: In this phase, a transaction may release locks but may not obtain any new lock.
The 2PL protocol guarantees serializability. However, it does not ensure that deadlocks cannot
happen.
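The 2PL rule itself is easy to state in code: once a transaction has released any lock, further lock requests are illegal. This sketch (class and names are illustrative) enforces exactly that:

```python
# Sketch of the 2PL rule: after the first unlock, no new locks.
class TwoPhaseTxn:
    def __init__(self):
        self.held = set()
        self.shrinking = False

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after an unlock")
        self.held.add(item)          # still in the growing phase

    def release(self, item):
        self.shrinking = True        # the shrinking phase has begun
        self.held.discard(item)

t = TwoPhaseTxn()
t.acquire("A")       # growing phase
t.acquire("B")
t.release("A")       # shrinking phase begins
try:
    t.acquire("C")   # illegal under 2PL
except RuntimeError as e:
    print(e)
```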
In a distributed setting, local and global deadlock detectors search for deadlocks and resolve them by
restoring the transactions involved to their initial states.
Centralized 2PL
In Centralized 2PL, a single site is responsible for the lock management process; there is only one lock
manager for the entire DBMS.
Distributed 2PL
In this kind of two-phase locking mechanism, lock managers are distributed across all sites, each
responsible for managing locks for the data at its own site. If no data is replicated, this is equivalent to
primary copy 2PL. The communication costs of Distributed 2PL are considerably higher than those of
primary copy 2PL.
Timestamp-based Protocols
The timestamp-based algorithm uses a timestamp to serialize the execution of concurrent transactions. This
protocol ensures that every pair of conflicting read and write operations is executed in timestamp order. The
protocol uses the system time or a logical counter as the timestamp.
The older transaction is always given priority in this method. It uses the system time to determine the
timestamp of the transaction. This is the most commonly used concurrency protocol.
Lock-based protocols manage the order between conflicting transactions when they execute, whereas
timestamp-based protocols fix that order as soon as a transaction is created.
Example:
Suppose there are three transactions T1, T2, and T3.
T1 has entered the system at time 0010
T2 has entered the system at 0020
T3 has entered the system at 0030
Priority will be given to transaction T1, then transaction T2 and lastly Transaction T3.
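The basic timestamp-ordering check can be sketched for a single data item. The fields read_ts and write_ts (the timestamps of the youngest transaction that read and wrote the item) are the standard bookkeeping of the protocol; the function names are illustrative:

```python
# Sketch of basic timestamp ordering on one data item.
class Item:
    def __init__(self):
        self.read_ts = 0    # timestamp of the youngest reader
        self.write_ts = 0   # timestamp of the youngest writer

def read(item, ts):
    if ts < item.write_ts:               # T would read a "future" value
        return "abort"
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return "abort"                   # a younger txn already used the item
    item.write_ts = ts
    return "ok"

x = Item()
print(write(x, 20))   # ok: transaction with timestamp 20 writes first
print(read(x, 10))    # abort: the older T10 must not see T20's value
print(read(x, 30))    # ok: the younger T30 reads in timestamp order
```

An aborted transaction is typically restarted with a new (larger) timestamp.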
Summary
• Concurrency control is the procedure in DBMS for managing simultaneous operations without them
conflicting with one another.
• Lost Updates, dirty reads, Non-Repeatable Reads, and Incorrect Summary issues are problems caused
by a lack of concurrency control.
• Lock-Based, Two-Phase, Timestamp-Based, and Validation-Based are the main types of concurrency
handling protocols.
• A lock can be Shared (S) or Exclusive (X).
• The Two-Phase Locking protocol, also known as 2PL, requires a transaction to acquire all of its locks
before it releases any of them. It has two phases: growing and shrinking.
• The timestamp-based algorithm uses a timestamp to serialize the execution of concurrent
transactions. The protocol uses the System Time or Logical Count as a Timestamp.
Transaction
o A transaction is a set of logically related operations; it contains a group of tasks.
o A transaction is an action, or series of actions, performed by a single user to access the contents of the
database.
Example: Suppose a bank employee transfers Rs 800 from X's account to Y's account. This small transaction consists of
several low-level tasks:
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
Operations of Transaction:
Following are the main operations of a transaction:
Read(X): The read operation reads the value of X from the database and stores it in a buffer in main memory.
Write(X): The write operation writes the value back to the database from the buffer.
Let's take the example of a debit transaction on an account, which consists of the following operations:
1. R(X);
2. X = X - 500;
3. W(X);
o The first operation reads X's value from the database and stores it in a buffer. Assume X is 4000.
o The second operation decreases the value of X by 500, so the buffer now contains 3500.
o The third operation writes the buffer's value back to the database, so X's final value is 3500.
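The three operations above can be sketched directly, assuming X starts at 4000 as in the example:

```python
# Sketch of the debit transaction: read into a buffer, update it,
# then write the buffer back to the database.
database = {"X": 4000}
buffer = {}

buffer["X"] = database["X"]      # R(X): read 4000 into the buffer
buffer["X"] -= 500               # X = X - 500: the buffer now holds 3500
database["X"] = buffer["X"]      # W(X): write 3500 back to the database

print(database["X"])             # 3500
```

If the transaction failed between the second and third steps, the database would still hold 4000, which is the failure case discussed next.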
However, a transaction may fail before finishing all of its operations because of a hardware, software, or power
failure.
For example: if the debit transaction above fails after executing operation 2, X's value will remain 4000 in the
database, which is not acceptable to the bank.
Transaction properties
A transaction has four properties, which are used to maintain consistency in a database before and after the transaction.
Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity
o It states that either all operations of the transaction take place or none do; otherwise, the transaction is aborted.
o There is no midway: the transaction cannot occur partially. Each transaction is treated as one unit and either runs to
completion or is not executed at all.
Abort: If a transaction aborts then all the changes made are not visible.
Commit: If a transaction commits then all the changes made are visible.
Example: Assume the following transaction T, consisting of T1 and T2. Account A holds Rs 600 and account B holds Rs 300.
Transfer Rs 100 from account A to account B.
T1           T2
Read(A)      Read(B)
A := A - 100  B := B + 100
Write(A)     Write(B)
If transaction T fails after the completion of T1 but before the completion of T2, the amount will be deducted
from A but not added to B. This leaves the database in an inconsistent state. To ensure the correctness of the
database state, the transaction must be executed in its entirety.
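Atomicity for this transfer can be sketched with an explicit rollback: take an undo snapshot before touching the data and restore it if anything fails partway. The function and parameter names are illustrative, not a real DBMS API:

```python
# Sketch of atomicity for the Rs 100 transfer: apply both updates,
# or roll back to the snapshot taken before the transaction started.
def transfer(accounts, src, dst, amount, fail_midway=False):
    snapshot = dict(accounts)        # undo image taken before any change
    try:
        accounts[src] -= amount      # T1: debit A
        if fail_midway:
            raise RuntimeError("crash between T1 and T2")
        accounts[dst] += amount      # T2: credit B
    except RuntimeError:
        accounts.clear()
        accounts.update(snapshot)    # abort: restore the consistent state

accounts = {"A": 600, "B": 300}
transfer(accounts, "A", "B", 100, fail_midway=True)
print(accounts)   # {'A': 600, 'B': 300}: rolled back, nothing partial
transfer(accounts, "A", "B", 100)
print(accounts)   # {'A': 500, 'B': 400}: committed in full
```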
Consistency
o The integrity constraints are maintained so that the database is consistent before and after the transaction.
o The execution of a transaction will leave a database in either its prior stable state or a new stable state.
o The consistent property of database states that every transaction sees a consistent database instance.
o The transaction is used to transform the database from one consistent state to another consistent state.
For example: the total amount (A + B) must be the same before and after the transaction, so the database
remains consistent. If T1 completes but T2 fails, an inconsistency occurs.
Isolation
o It means that the data being used during the execution of one transaction cannot be used by a second transaction
until the first one is completed.
o In isolation, if transaction T1 is being executed and is using data item X, then X cannot be accessed by
any other transaction T2 until T1 ends.
o The concurrency control subsystem of the DBMS enforces the isolation property.
Durability
o The durability property guarantees the permanence of the database's consistent state: once a transaction
commits, its changes are permanent.
o These changes cannot be lost due to the erroneous operation of a faulty transaction or a system failure. When a
transaction completes, the database reaches a state known as the consistent state, and that state cannot be lost,
even in the event of a system failure.
o The recovery subsystem of the DBMS is responsible for the durability property.
States of Transaction
In a database, the transaction can be in one of the following states -
Active state
o The active state is the first state of every transaction. In this state, the transaction is being executed.
o For example: the insertion, deletion, or updating of a record is done here, but the changes are still not saved
to the database.
Partially committed
o In the partially committed state, a transaction has executed its final operation, but the data is still not saved to
the database.
o In a total-marks calculation example, the final display of the total marks is executed in this state.
Committed
A transaction is said to be in a committed state if it executes all its operations successfully. In this state, all the effects are now
permanently saved on the database system.
Failed state
o If any of the checks made by the database recovery system fails, then the transaction is said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to fetch the marks, then the
transaction will fail to execute.
Aborted
o If any of the checks fail and the transaction has reached the failed state, the database recovery system makes
sure that the database is returned to its previous consistent state; if necessary, it aborts or rolls back the
transaction to bring the database into a consistent state.
o If a transaction fails in the middle of execution, all of its executed operations are rolled back, returning the
database to its consistent state from before the transaction.
o After aborting the transaction, the database recovery module will select one of the two operations:
1. Re-start the transaction
2. Kill the transaction
Failure Classification
To see where the problem has occurred, we generalize a failure into various categories, as follows −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from which it cannot go any further. This is
called transaction failure, where only a few transactions or processes are affected.
Reasons for a transaction failure could be −
• Logical errors − Where a transaction cannot complete because it has some code error or any internal error condition.
• System errors − Where the database system itself terminates an active transaction because the DBMS is not able to execute it, or
it has to stop because of some system condition. For example, in case of deadlock or resource unavailability, the system aborts an
active transaction.
System Crash
There are problems − external to the system − that may cause the system to stop abruptly and crash. For
example, interruptions in the power supply may cause the failure of the underlying hardware, or there may be a
software failure. Examples include operating system errors.
Disk Failure
In early days of technology evolution, it was a common problem where hard-disk drives or storage drives used to fail
frequently.
Disk failures include the formation of bad sectors, the disk becoming unreachable, a disk head crash, or any
other failure that destroys all or part of disk storage.
Storage Structure
We have already described the storage system. In brief, the storage structure can be divided into two categories −
• Volatile storage − As the name suggests, volatile storage cannot survive system crashes. Volatile storage devices are
placed very close to the CPU; normally they are embedded on the chipset itself. Main memory and cache memory are
examples of volatile storage. They are fast but can store only a small amount of information.
• Non-volatile storage − These memories are made to survive system crashes. They are huge in data storage capacity but
slower to access. Examples include hard disks, magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.
Checkpoint
Keeping and maintaining logs in real time in a live environment may fill all the memory space available in the
system, and as time passes the log file may grow too big to handle at all. A checkpoint is a mechanism by which all
previous logs are removed from the system and stored permanently on disk. A checkpoint declares a point before
which the DBMS was in a consistent state and all transactions were committed.
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the following manner −
• The recovery system reads the logs backwards from the end to the last checkpoint.
• It maintains two lists, an undo-list and a redo-list.
• If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn, Commit>, it puts the transaction in the redo-list.
• If the recovery system sees a log with <Tn, Start> but no commit or abort log found, it puts the transaction in undo-list.
All the transactions in the undo-list are then undone and their logs are removed. All the transactions in the
redo-list are redone using their log records, and their logs are saved.
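Building the undo-list and redo-list from a log can be sketched directly from the rules above. The log representation (a list of (transaction, action) pairs) is a simplification for illustration:

```python
# Sketch: classify transactions from a crash-time log.
# A txn with both Start and Commit goes to the redo-list;
# a txn with Start but no Commit/Abort goes to the undo-list.
def classify(log):
    started, finished = set(), set()
    for txn, action in log:
        if action == "start":
            started.add(txn)
        elif action in ("commit", "abort"):
            finished.add(txn)
    redo = {t for t, a in log if a == "commit"}
    undo = started - finished
    return undo, redo

log = [("T1", "start"), ("T1", "commit"),
       ("T2", "start"),                    # T2 never committed: undo it
       ("T3", "start"), ("T3", "commit")]
undo, redo = classify(log)
print(sorted(undo))   # ['T2']
print(sorted(redo))   # ['T1', 'T3']
```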
Database Security
Database security has many different layers, but the key aspects are:
Authentication
User authentication ensures that the person accessing the database is who he claims to be. Authentication can be
done at the operating system level or at the database level itself. Many authentication systems, such as retina
scanners or other biometrics, are used to make sure unauthorized people cannot access the database.
Authorization
Authorization is a privilege provided by the Database Administrator. Users of the database can only view the
contents they are authorized to view; the rest of the database is out of bounds to them.
The different permissions for authorizations available are:
1. System Administrator - This is the highest administrative authorization for a user. Users with this authorization can also execute
some database administrator commands such as restore or upgrade a database.
2. System Control - This is the highest control authorization for a user. This allows maintenance operations on the database but not
direct access to data.
3. System Maintenance - This is a lower level of system control authority. It allows users to maintain the database, but only
within a database manager instance.
4. System Monitor - Using this authority, the user can monitor the database and take snapshots of it.
Database Integrity
Data integrity in the database is the correctness, consistency and completeness of data. Data integrity is enforced using
the following three integrity constraints:
1. Entity Integrity - This is related to the concept of primary keys. All tables should have their own primary key, which should
uniquely identify a row and must not be NULL.
2. Referential Integrity - This is related to the concept of foreign keys. A foreign key is a key of one relation that is referenced
in another relation.
3. Domain Integrity - This means that there should be a defined domain for all the columns in a database.
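Entity and referential integrity can be demonstrated with the standard-library sqlite3 module. The table and column names are made up for illustration; note that SQLite enforces foreign keys only after the PRAGMA below is set:

```python
# Sketch of entity and referential integrity using SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")          # enable FK enforcement
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE emp (
    id INTEGER PRIMARY KEY,
    dept_id INTEGER NOT NULL REFERENCES dept(id))""")

conn.execute("INSERT INTO dept VALUES (1, 'Sales')")
conn.execute("INSERT INTO emp VALUES (10, 1)")    # valid: dept 1 exists

try:
    conn.execute("INSERT INTO emp VALUES (11, 99)")  # dept 99 does not exist
except sqlite3.IntegrityError as e:
    print("rejected:", e)    # referential integrity stops the bad row
```

The PRIMARY KEY constraint likewise rejects a duplicate or NULL `id`, which is entity integrity in action.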
• A DBMS always has a separate security system responsible for protecting the database against
accidental or intentional loss, destruction, or misuse.
• Security Levels:
o Database level:- the DBMS should ensure that authorization restrictions are placed on users.
o Operating system level:- the operating system should not allow unauthorized users to enter the system.
o Network level:- the database may be at a remote site and accessed by users over the network, so
network security is required.
• Security Mechanisms:
o Access Control (Authorization)
▪ Identifies the valid users who may have access to the data in the database and may restrict the
operations that each user can perform.
▪ For example, the movie database might designate two roles: "users" (who may only query the data)
and "designers" (who may also add new data). A user must be assigned to a role to have the access
privileges given to that role.
▪ Each application is associated with a specified role, and each role has a list of authorized users who
may execute, design, or administer the application.
o Authenticate the User:
▪ Identifies the valid users who may have any access to the data in the database.
▪ Restricts each user's view of the data in the database.
▪ This may be done with the help of views in relational databases.
o Cryptographic Control / Data Encryption:
▪ Encodes data in a cryptic (coded) form so that even if the data is captured by an unintended user, he
cannot decode it.
▪ Used for sensitive data, usually when transmitted over communication links, but may also be used to
prevent bypassing the system to gain access to the data.
o Inference Control:
▪ Ensures that confidential information cannot be retrieved even by deduction.
▪ Prevents disclosure of data through statistical summaries of confidential data.
o Flow Control / Physical Protection:
▪ Prevents the copying of information by unauthorized persons.
▪ Computer systems must be physically secured against any unauthorized entry.
o Virus Control:
▪ Authorization should be enforced at the user level to avoid intruder attacks carried out through humans.
▪ There should be a mechanism for providing protection against data viruses.
o User-Defined Control:
▪ Defines additional constraints or limitations on the use of the database.
▪ These allow developers or programmers to incorporate their own security procedures in addition to the
above security mechanisms.
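The role-based access control idea from the movie-database example can be sketched in a few lines. The role names follow the example above; the user names and permission strings are hypothetical:

```python
# Sketch of role-based access control: "users" may only query,
# "designers" may also add new data.
ROLE_PERMS = {"users": {"query"}, "designers": {"query", "add"}}
USER_ROLES = {"alice": "users", "bob": "designers"}

def authorized(user, operation):
    role = USER_ROLES.get(user)                  # user -> assigned role
    return operation in ROLE_PERMS.get(role, set())

print(authorized("alice", "query"))   # True
print(authorized("alice", "add"))     # False: "users" may only query
print(authorized("bob", "add"))       # True
```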
Authorization
• Authorization is finding out whether the person, once identified, is permitted to have the resource.
• Authorization determines what you can do and is handled through the DBMS unless external security
procedures are available.
• The database management system allows the DBA to give different access rights to users as per their requirements.
• Basic authorization: we can use any one, or a combination, of the following basic forms of authorization:
o Resource authorization:- authorization to access any system resource, e.g. sharing of a database, a printer, etc.
o Alteration authorization:- authorization to add attributes to or delete attributes from relations.
o Drop authorization:- authorization to drop a relation.
• Granting of privileges:
o A system privilege is the right to perform a particular action, or to perform an action on any schema object of
a particular type.
o An authorized user may pass on this authorization to other users. This process is called granting of
privileges.
o Syntax:
GRANT <privilege list>
ON <relation name or view name>
TO <user/role list>
o Example: the following GRANT statement grants users U1, U2, and U3 the SELECT privilege on the Emp_Salary relation:
GRANT SELECT
ON Emp_Salary
TO U1, U2, U3;
• Revoking of privileges:
o We can withdraw the privileges given to a particular user with the help of the REVOKE statement.
o Syntax:
REVOKE <privilege list>
ON <relation name or view name>
FROM <user/role list> [RESTRICT | CASCADE]
• Example:
The revocation of a privilege from a user or role may cause other users or roles to lose that privilege as
well. This behavior is called cascading of the revoke.
REVOKE SELECT
ON Emp_Salary
FROM U1, U2, U3;
o Execute privilege:
This privilege authorizes a user to execute a function or procedure. Thus, only a user who has the
EXECUTE privilege on a function Create_Acc() can call it.
GRANT EXECUTE
ON Create_Acc
TO U1;
DATABASE RECOVERY IN DBMS AND ITS
TECHNIQUES
A database system, like any computer system, can suffer failures, yet the data stored in the database should be available
whenever it is needed. Database recovery means recovering the data when it gets deleted, hacked, or damaged accidentally.
Atomicity must hold: whether or not a transaction completes, its effects should either be reflected in the database permanently
or not affect the database at all. Database recovery and database recovery techniques are therefore essential in a DBMS.
The main database recovery techniques in DBMS are given below.
Crash recovery:
A DBMS can be an extremely complicated system, with many transactions being executed every second. Its durability and
robustness depend on its complex architecture and the underlying hardware and system software. If it fails or crashes in the
middle of transactions, the system is expected to follow some sort of algorithm or technique to recover the lost data.
Classification of failure:
To determine where a problem has occurred, we generalize failures into various classes, as follows:
▪ Transaction failure
▪ System crash
▪ Disk failure
1. Transaction failure: A transaction has to abort when it fails to execute or when it reaches a point from which it cannot go
any further. This is called transaction failure, where only a few transactions or processes are affected.
The reasons for transaction failure are:
▪ Logical errors: where a transaction cannot complete because of a code error or an internal error condition.
▪ System errors: where the database system itself terminates an active transaction because the DBMS is not able to
execute it, or has to stop because of some system condition. For example, in case of deadlock or resource
unavailability, the system aborts an active transaction.
2. System crash: There are problems − external to the system − that may cause the system to stop abruptly and crash. For
instance, interruptions in the power supply may cause the failure of the underlying hardware or a software failure.
Examples include OS errors.
3. Disk failure: In the early days of technology evolution, it was a common problem that hard-disk drives or storage drives
failed frequently. Disk failures include the formation of bad sectors, the disk becoming unreachable, a disk head crash, or
any other failure that destroys all or part of disk storage.
Storage structure:
The storage structure can be classified as explained below:
1. Volatile storage: As the name suggests, volatile storage cannot survive system crashes. Volatile storage devices are
placed very close to the CPU; usually they are embedded on the chipset itself. For instance, main memory and cache
memory are examples of volatile storage. They are fast but can store only a small quantity of data.
2. Non-volatile storage: These memories are made to survive system crashes. They are huge in data storage capacity but
slower to access. Examples include hard disks, magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.
Recovery and Atomicity:
When a system crashes, it may have many transactions being executed and numerous files opened for them to modify data
items. Transactions are made of various operations that are atomic in nature. However, according to the ACID properties of a
database, the atomicity of a transaction as a whole must be maintained, that is, either all the operations are executed or none.
When a database management system recovers from a crash, it ought to maintain the following:
▪ It should check the states of all the transactions that were being executed.
▪ A transaction may have been in the middle of some operation; the database management system must ensure the
atomicity of the transaction in this case.
▪ It should check whether the transaction can be completed now or must be rolled back.
▪ No transaction should be allowed to leave the database management system in an inconsistent state.
There are two types of techniques that can help a database management system recover while maintaining the atomicity of a
transaction:
▪ Maintaining the logs of every transaction and writing them onto stable storage before actually modifying the data.
▪ Maintaining shadow paging, where the changes are made in volatile memory and the actual data is updated later.
Log-based Recovery (or Manual Recovery):
A log is a sequence of records that maintains a record of the actions performed by a transaction. It is essential that the log
records are written before the actual modification and stored on a stable storage medium, which is failsafe. Log-based
recovery works as follows:
▪ The log file is kept on a stable storage medium.
▪ When a transaction enters the system and starts execution, it writes a log record about it.
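The write-ahead rule (log first, then modify) can be sketched as follows. A Python list stands in for the stable log file, and the record layout is a simplification for illustration:

```python
# Sketch of the write-ahead rule: append a log record containing the
# old and new values BEFORE modifying the data item itself.
log = []                  # stands in for the stable log file
database = {"X": 4000}

def wal_write(txn, item, new_value):
    old = database[item]
    log.append((txn, item, old, new_value))  # log first (undo/redo info)...
    database[item] = new_value               # ...then update the data

log.append(("T1", "start"))
wal_write("T1", "X", 3500)
log.append(("T1", "commit"))
print(log[1])             # ('T1', 'X', 4000, 3500): old and new recorded
```

Because the old value 4000 is on stable storage before the update, the change can be undone (or redone) after a crash.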
Recovery with concurrent transactions (Automated Recovery):
When more than one transaction is being executed in parallel, the logs are interleaved. At recovery time, it would be hard
for the recovery system to backtrack through all the logs and then start recovering. To ease this situation, most modern
DBMSs use the concept of 'checkpoints'.
Automated recovery is of three types:
▪ Deferred Update Recovery
▪ Immediate Update Recovery
▪ Shadow Paging
Data is a valuable entity that must be firmly handled and managed, like any economic resource. Some or all
of an organization's commercial data may have tactical importance and hence must be kept protected and
confidential. In this chapter, you will learn about the scope of database security. There is a range of
computer-based controls that are offered as countermeasures to these threats.
These circumstances mostly signify the areas in which an organization should focus on reducing the risk, that
is, the chance of incurring loss or damage to data within a database. In some conditions these areas are
directly related, such that an activity that leads to a loss in one area may also lead to a loss in another, since
all of the data within an organization is interconnected.
What is a Threat?
A threat is any situation or event, whether intentional or accidental, that can cause damage with an adverse effect
on the database structure and consequently on the organization. A threat may arise from a situation or event
involving a person, or from an action or situation that is likely to bring harm to an organization and its database.
The degree of loss that an organization suffers as a result of a threat depends on several aspects, such as the
existence of countermeasures and contingency plans. For example, if a hardware failure occurs that corrupts
secondary storage, all processing activity must cease until the problem is resolved.
Computer-Based Controls
The different forms of countermeasure to threats on computer systems range from physical controls to managerial
procedures. Despite the range of computer-based controls that exist, it is worth noting that, usually, the security
of a DBMS is only as good as that of the operating system, due to the close association between them.
• Access authorization.
• Access controls.
• Views.
• Backup and recovery of data.
• Data integrity.
• Encryption of data.
• RAID technology.
• Transaction support.
• Concurrency Control.
Although each function can be discussed discretely, they are mutually dependent. Many DBMSs allow users to
carry out simultaneous operations on the database. If these operations are not controlled, the accesses may
interfere with one another, and the database can become inconsistent. To overcome this problem, the DBMS
implements a concurrency control technique using a protocol that prevents database accesses from interfering
with one another. In this chapter, you will learn about concurrency control and transaction support for any
centralized DBMS that consists of a single database.
A database may fail for many reasons:
• Due to hardware or software errors, the system may crash, ultimately resulting in the loss of main memory.
• Failures of media, such as head crashes or unreadable media that results in the loss of portions of secondary storage.
• There can be application software errors, such as logical errors in a program accessing the database, which can cause one or
more transactions to abort or fail.
• Natural physical disasters can also occur such as fires, floods, earthquakes, or power failures.
• Carelessness or unintentional destruction of data or directories by operators or users.
• Intentional corruption or sabotage of data, hardware, or software facilities (for example, using malicious software or files).
Whatever the cause of the failure, there are two principal effects that you have to consider: the loss of main memory (including the database buffers), and the loss of the disk copy of the database.
Recovery Facilities
Every DBMS should offer the following facilities to help out with the recovery mechanism:
• Backup mechanism makes backup copies of the database at specified intervals.
• Logging facilities keep track of the current state of transactions and of any changes made to the database.
• Checkpoint facility enables updates to the database that are in progress to be made permanent, limiting the amount of work that must be redone after a failure.
• Recovery manager allows the database system to restore the database to a reliable and consistent state after any failure
occurs.
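The interplay of logging, undo, and checkpoints can be sketched with a toy write-ahead log. This is only an illustration of the idea, not a real recovery scheme (production algorithms such as ARIES are far more elaborate); all names here are invented.

```python
# Toy write-ahead log: every update is logged with its old value before
# the "disk" copy is changed, so an aborted or crashed transaction can
# be rolled back from the log.
log = []          # append-only list of (txn, item, old_value, new_value)
db = {"X": 100}   # stand-in for the database's persistent state

def write(txn, item, new_value):
    log.append((txn, item, db[item], new_value))  # log first (WAL rule)
    db[item] = new_value

def undo(txn):
    # Restore old values by scanning the transaction's log records backwards.
    for t, item, old, _new in reversed(log):
        if t == txn:
            db[item] = old

write("T1", "X", 150)
write("T1", "X", 175)
undo("T1")        # simulate an abort: X returns to its original value
print(db["X"])    # 100
```

A checkpoint, in this sketch, would simply record how much of the log is already reflected on disk, so that recovery need not replay the log from the beginning.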
Normalization
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate
undesirable characteristics like insertion, update, and deletion anomalies.
o Normalization divides the larger table into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy in the database table.
Normal Form | Description
1NF | A relation will be in 1NF if it contains only atomic (single-valued) attributes.
2NF | A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF | A relation will be in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on the primary key.
BCNF | A stronger definition of 3NF: for every non-trivial functional dependency X → Y, X is a super key.
4NF | A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
5NF | A relation is in 5NF if it is in 4NF, contains no join dependency, and joining is lossless.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_MOBILE EMP_STATE
14 John 7272826385, 9064738238 UP
The decomposition of the EMPLOYEE table into 1NF has been shown below:
EMP_ID EMP_NAME EMP_MOBILE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
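The 1NF split can be expressed in a few lines of Python. This is a sketch only; the dictionary layout and column names mirror the EMPLOYEE example, with the two mobile numbers stored as a comma-separated string before normalization.

```python
# Un-normalized relation: EMP_MOBILE holds multiple values in one field.
unnormalized = [
    {"EMP_ID": 14, "EMP_NAME": "John",
     "EMP_MOBILE": "7272826385, 9064738238", "EMP_STATE": "UP"},
]

# 1NF: one row per atomic mobile number.
first_nf = [
    {**row, "EMP_MOBILE": mobile.strip()}
    for row in unnormalized
    for mobile in row["EMP_MOBILE"].split(",")
]

for r in first_nf:
    print(r["EMP_ID"], r["EMP_NAME"], r["EMP_MOBILE"], r["EMP_STATE"])
# 14 John 7272826385 UP
# 14 John 9064738238 UP
```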
Second Normal Form (2NF)
Example: Let's assume a school stores the data of teachers and the subjects they teach. In a school, a teacher can teach
more than one subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset of a candidate
key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
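We can check that this 2NF decomposition loses nothing by projecting the original relation onto the two smaller schemas and joining the projections back. A minimal sketch over the TEACHER rows above:

```python
# Original TEACHER relation as (TEACHER_ID, SUBJECT, TEACHER_AGE) tuples.
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# Project onto the two 2NF relations (sets remove duplicate detail rows).
teacher_detail = {(tid, age) for tid, _subj, age in teacher}
teacher_subject = {(tid, subj) for tid, subj, _age in teacher}

# Natural join on TEACHER_ID reconstructs the original relation (lossless).
rejoined = {(tid, subj, age)
            for tid, subj in teacher_subject
            for tid2, age in teacher_detail
            if tid == tid2}

print(rejoined == set(teacher))  # True
```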
Third Normal Form (3NF)
A relation is in third normal form if at least one of the following conditions holds for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal
Candidate key: EMP_ID. Non-prime attributes: in the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on EMP_ID. The non-prime attributes
(EMP_STATE, EMP_CITY) are transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.
That's why we need to move EMP_STATE and EMP_CITY to a new EMPLOYEE_ZIP table, with EMP_ZIP as its
primary key.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
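The transitive-dependency fix can be sketched in code: once EMP_ZIP determines EMP_STATE and EMP_CITY in its own table, employee rows only need to carry the zip. The employee IDs and names below are hypothetical; the zip rows mirror the EMPLOYEE_ZIP table above.

```python
# Hypothetical employees, each storing only their zip code.
employee = [("E1", "Dana", "201010"), ("E2", "Mira", "02228")]

# EMPLOYEE_ZIP: EMP_ZIP -> (EMP_STATE, EMP_CITY), keyed by the zip.
employee_zip = {"201010": ("UP", "Noida"), "02228": ("US", "Boston"),
                "60007": ("US", "Chicago"), "06389": ("UK", "Norwich"),
                "462007": ("MP", "Bhopal")}

# Joining on EMP_ZIP recovers the state and city for each employee.
full = [(eid, name, z, *employee_zip[z]) for eid, name, z in employee]
print(full[0])  # ('E1', 'Dana', '201010', 'UP', 'Noida')
```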
Boyce Codd Normal Form (BCNF)
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a super key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
364 UK
EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the EMP_COUNTRY table: EMP_ID
For the EMP_DEPT table: EMP_DEPT
For the EMP_DEPT_MAPPING table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left-hand side of each functional dependency is a key.
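The BCNF condition can be verified mechanically: in each decomposed relation, the determinant must take distinct values across the rows, i.e., it must be a key. A small sketch, using illustrative rows in the spirit of the example above (the specific employee and department values are assumptions):

```python
# A determinant is a key iff its projection over the rows has no duplicates.
def is_key(rows, positions):
    projected = [tuple(r[p] for p in positions) for r in rows]
    return len(projected) == len(set(projected))

emp_country = [(264, "India"), (364, "UK")]
emp_dept = [("Designing", "D394", 283), ("Testing", "D394", 300),
            ("Stores", "D283", 232), ("Developing", "D283", 549)]

print(is_key(emp_country, [0]))  # True: EMP_ID is a key
print(is_key(emp_dept, [0]))     # True: EMP_DEPT is a key
print(is_key(emp_dept, [1]))     # False: DEPT_TYPE repeats, so it is no key
```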
Fourth normal form (4NF)
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent attributes; there is no relationship
between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two
hobbies, Dancing and Singing. So there is a multivalued dependency on STU_ID, which leads to unnecessary repetition of
data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
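Because COURSE and HOBBY are independent, the two projections join back without losing information: the join enumerates every course-hobby combination per student, which is exactly what the multivalued dependencies assert belong together. A sketch over the rows above:

```python
# The two 4NF projections of the STUDENT relation.
student_course = {(21, "Computer"), (21, "Math"), (34, "Chemistry"),
                  (74, "Biology"), (59, "Physics")}
student_hobby = {(21, "Dancing"), (21, "Singing"), (34, "Dancing"),
                 (74, "Cricket"), (59, "Hockey")}

# Natural join on STU_ID: every (course, hobby) pair per student.
rejoined = {(sid, c, h)
            for sid, c in student_course
            for sid2, h in student_hobby
            if sid == sid2}

print(len(rejoined))  # 7: student 21 contributes 2 courses x 2 hobbies
```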
Fifth normal form (5NF)
o A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.
o 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).
Example
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1
In the above table, John takes both Computer and Math classes for Semester 1, but he doesn't take a Math class for Semester 2. In
this case, a combination of all these fields is required to identify valid data.
Suppose we add a new Semester 3 but do not yet know the subject or who will be taking that subject, so we would have to
leave Lecturer and Subject as NULL. But all three columns together act as the primary key, so we can't leave the other two columns
blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
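The losslessness of this three-way split can be checked directly: a (semester, subject, lecturer) triple survives the join only if all three pairwise projections P1, P2, and P3 support it. A sketch using the projection rows above:

```python
# The three 5NF projections.
p1 = {("Semester 1", "Computer"), ("Semester 1", "Math"),
      ("Semester 1", "Chemistry"), ("Semester 2", "Math")}
p2 = {("Computer", "Anshika"), ("Computer", "John"), ("Math", "John"),
      ("Math", "Akash"), ("Chemistry", "Praveen")}
p3 = {("Semester 1", "Anshika"), ("Semester 1", "John"),
      ("Semester 2", "Akash"), ("Semester 1", "Praveen")}

# Join all three: a triple is kept only if every projection agrees.
rejoined = {(sem, subj, lect)
            for sem, subj in p1
            for subj2, lect in p2
            for sem2, lect2 in p3
            if subj == subj2 and sem == sem2 and lect == lect2}

print(("Semester 2", "Math", "Akash") in rejoined)  # True
print(("Semester 2", "Math", "John") in rejoined)   # False: John has no Semester 2 row in P3
```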
Relational Decomposition
o When a relation in the relational model is not in appropriate normal form then the decomposition of a relation is
required.
o In a database, it breaks the table into multiple tables.
o If the relation has no proper decomposition, then it may lead to problems like loss of information.
o Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies, and redundancy.
Types of Decomposition
Lossless Decomposition
o If no information is lost from the relation during decomposition, then the decomposition is lossless.
o Lossless decomposition guarantees that the join of the relations results in the same relation as the one that was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed relations gives the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table:
DEPT_ID EMP_ID DEPT_NAME
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation will look like:
Employee ⋈ Department
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing
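The same lossless join can be reproduced in SQL. A minimal sketch using Python's built-in sqlite3 module and the rows above:

```python
import sqlite3

# Rebuild the decomposition in an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employee (emp_id INT, emp_name TEXT, emp_age INT, emp_city TEXT);
CREATE TABLE department (dept_id INT, emp_id INT, dept_name TEXT);
""")
con.executemany("INSERT INTO employee VALUES (?,?,?,?)",
                [(22, "Denim", 28, "Mumbai"), (33, "Alina", 25, "Delhi"),
                 (46, "Stephan", 30, "Bangalore"),
                 (52, "Katherine", 36, "Mumbai"), (60, "Jack", 40, "Noida")])
con.executemany("INSERT INTO department VALUES (?,?,?)",
                [(827, 22, "Sales"), (438, 33, "Marketing"),
                 (869, 46, "Finance"), (575, 52, "Production"),
                 (678, 60, "Testing")])

# Join back on the common column EMP_ID.
rows = con.execute("""
    SELECT e.emp_id, e.emp_name, e.emp_age, e.emp_city, d.dept_id, d.dept_name
    FROM employee e JOIN department d ON e.emp_id = d.emp_id
    ORDER BY e.emp_id""").fetchall()

print(rows[0])   # (22, 'Denim', 28, 'Mumbai', 827, 'Sales')
print(len(rows)) # 5 rows: no information was lost
```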
Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every functional dependency of the original relation must be enforceable using the decomposed tables alone.
o If a relation R is decomposed into relation R1 and R2, then the dependencies of R either must be a part of R1 or R2 or
must be derivable from the combination of functional dependencies of R1 and R2.
o For example, suppose there is a relation R(A, B, C, D) with functional dependency set {A → BC}. The relation R is
decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD A → BC is contained in relation R1(ABC).
Multivalued Dependency
o Multivalued dependency occurs when two attributes in a table are independent of each other but both depend on a third
attribute.
o A multivalued dependency consists of at least two attributes that are dependent on a third attribute; that's why it always
requires at least three attributes.
Example: Suppose there is a bike manufacturer company which produces two colors (white and black) of each model every
year.
Here, the columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each other.
In this case, these two columns are said to be multivalued dependent on BIKE_MODEL. These
dependencies are represented as shown below:
1. BIKE_MODEL →→ MANUF_YEAR
2. BIKE_MODEL →→ COLOR
This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL multidetermines COLOR".
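An MVD such as BIKE_MODEL →→ COLOR can be tested on concrete rows: for each model, the stored rows must equal the cross product of that model's colors and years. The model names and years below are invented for illustration:

```python
# Rows of (BIKE_MODEL, COLOR, MANUF_YEAR); one model, two colors, two years.
rows = {("M1000", "White", 2019), ("M1000", "Black", 2019),
        ("M1000", "White", 2020), ("M1000", "Black", 2020)}

def mvd_holds(rows):
    # BIKE_MODEL ->> COLOR (and ->> MANUF_YEAR) holds iff, per model,
    # the stored (color, year) pairs form a full cross product.
    for m in {m for m, _c, _y in rows}:
        sub = {(c, y) for m2, c, y in rows if m2 == m}
        colors = {c for c, _y in sub}
        years = {y for _c, y in sub}
        if sub != {(c, y) for c in colors for y in years}:
            return False
    return True

print(mvd_holds(rows))                                   # True
print(mvd_holds(rows - {("M1000", "Black", 2020)}))      # False: a combination is missing
```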
DBMS - Normalization
Functional Dependency
Functional dependency (FD) is a constraint between two sets of attributes in a relation. Functional dependency says that
if two tuples have the same values for attributes A1, A2, ..., An, then those two tuples must also have the same values for
attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→); that is, X → Y, where X functionally determines Y. The left-
hand side attributes determine the values of the attributes on the right-hand side.
Armstrong's Axioms
If F is a set of functional dependencies, then the closure of F, denoted F+, is the set of all functional dependencies
logically implied by F. Armstrong's Axioms are a set of inference rules that, when applied repeatedly, generate the closure of a set of
functional dependencies.
• Reflexivity rule − If α is a set of attributes and β is a subset of α, then α → β holds.
• Augmentation rule − If α → β holds and γ is a set of attributes, then αγ → βγ also holds. That is, adding attributes to both sides of a
dependency does not change the basic dependency.
• Transitivity rule − As with the transitive rule in algebra, if α → β holds and β → γ holds, then α → γ also holds. α → β means that α
functionally determines β.
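Armstrong's Axioms give an effective procedure for computing the closure of an attribute set: keep applying every FD whose left-hand side is already contained in the set. A minimal sketch:

```python
# Attribute closure: repeatedly apply every FD whose left side is already
# covered, until nothing new is added (this is where transitivity shows up).
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# FDs: A -> B and B -> C; transitivity should give A -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(closure({"A"}, fds) == {"A", "B", "C"})  # True
```

A set X is a super key exactly when `closure(X, fds)` contains every attribute of the relation, which makes this routine the workhorse behind key-finding and normal-form checks.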
Normalization
If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator.
Managing a database with anomalies is next to impossible.
• Update anomalies − If data items are scattered and are not linked to each other properly, then it could lead to strange situations.
For example, when we try to update one data item having its copies scattered over several places, a few instances get updated
properly while a few others are left with old values. Such instances leave the database in an inconsistent state.
• Deletion anomalies − We tried to delete a record, but parts of it were left undeleted because, without our knowledge, the data was
also saved somewhere else.
• Insertion anomalies − We tried to insert data into a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a consistent state.
We see in the Student_Project relation that the prime key attributes are Stu_ID and Proj_ID. According to the rule, the non-
key attributes, i.e., Stu_Name and Proj_Name, must be dependent upon both and not on any of the prime key attributes
individually. But we find that Stu_Name can be identified by Stu_ID alone and Proj_Name can be identified by Proj_ID
alone. This is called partial dependency, which is not allowed in Second Normal Form.
We break the relation into two relations, so there exists no partial dependency.
We find that in the above Student_detail relation, Stu_ID is the key and the only prime attribute. We find that City can be
identified by Stu_ID as well as by Zip itself. Neither is Zip a super key nor is City a prime attribute. Additionally, Stu_ID → Zip
→ City, so there exists a transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as follows −
Boyce-Codd Normal Form
Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on strict terms. BCNF states that −
• For any non-trivial functional dependency X → A, X must be a super-key.
Normalization is a process of organizing the data in a database to avoid data redundancy, insertion anomaly,
update anomaly, and deletion anomaly. Let's discuss anomalies first, and then we will discuss normal forms with
examples.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These are – Insertion,
update and deletion anomaly. Let’s take an example to understand this.
Example: Suppose a manufacturing company stores the employee details in a table named employee that has
four attributes: emp_id for storing the employee's id, emp_name for storing the employee's name, emp_address for
storing the employee's address, and emp_dept for storing the details of the department in which the employee works. At
some point of time the table looks like this:
emp_id emp_name emp_address emp_dept
101 Rick Delhi D001
101 Rick Delhi D002
123 Maggie Agra D890
166 Glenn Chennai D900
The above table is not normalized. We will see the problems that we face when a table is not normalized.
Update anomaly: In the above table we have two rows for employee Rick as he belongs to two departments of
the company. If we want to update the address of Rick then we have to update the same in two rows or the data
will become inconsistent. If somehow, the correct address gets updated in one department but not in other then
as per the database, Rick would be having two different addresses, which is not correct and would lead to
inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training and currently not assigned
to any department then we would not be able to insert the data into the table if emp_dept field doesn’t allow
nulls.
Delete anomaly: Suppose that at some point of time the company closes the department D890. Deleting the rows
that have emp_dept as D890 would also delete the information of employee Maggie, since she is assigned
only to this department.
To overcome these anomalies we need to normalize the data. In the next section we will discuss
normalization.
Normalization
Here are the most commonly used normal forms:
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce & Codd normal form (BCNF)
First normal form (1NF)
As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It should hold only atomic values.
Example: Suppose a company wants to store the names and contact details of its employees. It creates a table
that looks like this:
emp_id emp_name emp_address emp_mobile
102 Jon Kanpur 8812121212, 9900012222
104 Lester Bangalore 9990000123, 8123450987
Two employees (Jon & Lester) are having two mobile numbers so the company stored them in the same field as
you can see in the table above.
This table is not in 1NF, as the rule says "each attribute of a table must have atomic (single) values"; the
emp_mobile values for employees Jon & Lester violate that rule.
To make the table comply with 1NF we should have the data like this:
emp_id emp_name emp_address emp_mobile
102 Jon Kanpur 8812121212
102 Jon Kanpur 9900012222
104 Lester Bangalore 9990000123
104 Lester Bangalore 8123450987
Second normal form (2NF)
A table is said to be in 2NF if both of the following conditions hold:
• The table is in 1NF.
• No non-prime attribute is dependent on a proper subset of any candidate key of the table.
An attribute that is not part of any candidate key is known as a non-prime attribute.
Example: Suppose a school wants to store the data of teachers and the subjects they teach. They create a table
that looks like this. Since a teacher can teach more than one subject, the table can have multiple rows for the
same teacher.
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Candidate Keys: {teacher_id, subject}
Non prime attribute: teacher_age
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime
attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the candidate key. This violates
the rule for 2NF, as the rule says "no non-prime attribute is dependent on a proper subset of any candidate key
of the table".
To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
Third normal form (3NF)
A table is in 3NF if it is in 2NF and for each functional dependency
X → Y at least one of the following conditions holds:
• X is a super key of the table.
• Y is a prime attribute of the table.
An attribute that is part of one of the candidate keys is known as a prime attribute.
Example: Suppose a company wants to store the complete address of each employee, they create a table
named employee_details that looks like this:
Here, emp_state, emp_city & emp_district are dependent on emp_zip, and emp_zip is dependent on emp_id. That
makes the non-prime attributes (emp_state, emp_city & emp_district) transitively dependent on the super key (emp_id).
This violates the rule of 3NF.
To make this table comply with 3NF we have to break it into two tables to remove the transitive
dependency:
employee table:
employee_zip table:
Boyce Codd normal form (BCNF)
BCNF is a stricter version of 3NF: a table is in BCNF if, for each functional dependency X → Y, X is a super key of the table.
Example: Suppose there is a company wherein employees work in more than one department. They store the
data like this:
emp_id emp_nationality emp_dept dept_type dept_no_of_emp
1001 Austrian Production and planning D001 200
1001 Austrian stores D001 250
1002 American design and technical support D134 100
1002 American Purchasing department D134 600
The table is not in BCNF as neither emp_id nor emp_dept alone is a key.
To make the table comply with BCNF we can break it into three tables like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_dept dept_type dept_no_of_emp
Production and planning D001 200
stores D001 250
design and technical support D134 100
Purchasing department D134 600
emp_dept_mapping table:
emp_id emp_dept
1001 Production and planning
1001 stores
1002 design and technical support
1002 Purchasing department
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF, as in both functional dependencies the left-hand side is a key.
The attributes of a table are said to be dependent on each other when an attribute of the table uniquely identifies
another attribute of the same table.
For example: Suppose we have a student table with attributes Stu_Id, Stu_Name, and Stu_Age. Here the Stu_Id
attribute uniquely identifies the Stu_Name attribute of the student table, because if we know the student id we can tell
the student name associated with it. This is known as functional dependency and can be written as
Stu_Id → Stu_Name; in words, we can say Stu_Name is functionally dependent on Stu_Id.
Formally:
If column A of a table uniquely identifies column B of the same table, then it can be represented as A → B (attribute B
is functionally dependent on attribute A).
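Whether A → B actually holds in a given table can be checked mechanically: the FD is violated exactly when two rows agree on A but differ on B. A sketch with made-up student rows:

```python
# Check an FD lhs -> rhs over concrete rows: remember the rhs value seen
# for each lhs value, and fail if a later row disagrees.
def fd_holds(rows, lhs, rhs):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

student = [
    {"Stu_Id": 1, "Stu_Name": "Asha", "Stu_Age": 20},
    {"Stu_Id": 2, "Stu_Name": "Ravi", "Stu_Age": 20},
]
print(fd_holds(student, ["Stu_Id"], ["Stu_Name"]))   # True
print(fd_holds(student, ["Stu_Age"], ["Stu_Name"]))  # False: same age, different names
```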