Java Merged
Development
Unit 1
Introduction to DBMS
Topic 2
File Organization
Session 2
Objectives
After completing this session, you will be able to understand :
▪ File organisation.
The simplest method of file organization is the sequential method, in which records are stored
one after another. It can be implemented in two ways: the pile file method and the sorted file
method.
Heap file organization works with data blocks. Records are inserted at the end of the file, into
the data blocks, and no sorting or ordering is required. If a data block is full, the new record is
stored in some other block, which need not be the very next data block; it can be any block with
free space. The DBMS is responsible for storing and managing the new records.
3. Hashed File Organisation
▪ Hashed file organisation is also called direct file organisation.
▪ In this method, a hash function is computed for each record, and its result gives the
address of the block in which the record is stored. Any mathematical function can be used
as a hash function; it can be simple or complex.
▪ The hash function is applied to columns or attributes to get the block address. Because the
resulting placement follows no key order, the method is also known as direct or random file
organization.
▪ If the hash function is applied to a key column, that column is called the hash key; if it is
applied to a non-key column, that column is called the hash column.
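The mapping from a hash key to a block address can be sketched as follows. This is a minimal illustration, not a DBMS implementation: the modulo hash, the block count `NUM_BLOCKS`, and the `emp_id` column are all assumptions chosen for the example.

```python
NUM_BLOCKS = 5  # hypothetical number of data blocks in the file

def block_address(hash_key):
    """Map a hash key to a block address with a simple modulo hash."""
    return hash_key % NUM_BLOCKS

# One list of records per block address.
blocks = {address: [] for address in range(NUM_BLOCKS)}

def insert(record, key_column="emp_id"):
    """Store the record in the block selected by the hash function."""
    address = block_address(record[key_column])
    blocks[address].append(record)
    return address

def lookup(hash_key, key_column="emp_id"):
    """Recompute the address and read only that one block."""
    return [r for r in blocks[block_address(hash_key)] if r[key_column] == hash_key]

insert({"emp_id": 103, "name": "Asha"})  # 103 % 5 == 3
insert({"emp_id": 208, "name": "Ravi"})  # 208 % 5 == 3, same block as 103
insert({"emp_id": 110, "name": "Mina"})  # 110 % 5 == 0
```

Two keys that hash to the same address simply share a block here; a real system also needs a policy for blocks that overflow.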
Differences
File Management System | Database Management System
An individual application program is needed to perform any operation on data files. | Any operation on data files can be performed using a single command.
Programming is done in COBOL, C or PASCAL, which are 3GLs. | Programming is done using SQL, which is a 4GL.
▪ In a B tree, both keys and records can be stored in the internal as well as the leaf nodes,
whereas in a B+ tree, records (data) can be stored only in the leaf nodes, and the internal
nodes store only key values.
▪ The leaf nodes of a B+ tree are linked together in the form of a singly linked list to make
search queries more efficient.
B+ trees are used to store large amounts of data that cannot fit in main memory. Because the
size of main memory is limited, the internal nodes (keys to access records) of the B+ tree are
kept in main memory, whereas the leaf nodes are stored in secondary memory.
The internal nodes of B+ tree are often called index nodes. A B+ tree of order 3 is shown in
the following figure.
B+ Tree:
4. B+ File Organization:
▪ B+ tree file organization is an advanced form of the indexed sequential access method.
It uses a tree-like structure to store records in a file.
▪ It uses the same key-index concept, where the primary key is used to sort the records.
For each primary key, an index value is generated and mapped to the record.
▪ The B+ tree is similar to a binary search tree (BST), but a node can have more than two
children. In this method, all the records are stored only in the leaf nodes. Intermediate
nodes act as pointers to the leaf nodes; they do not contain any records.
The given B+ tree shows that:
▪ There is one root node of the tree, i.e., 25.
▪ There is an intermediary layer with nodes. They do not store the actual record. They
have only pointers to the leaf node.
▪ The node to the left of the root holds a value smaller than the root and the node to the
right holds a larger value, i.e., 15 and 30 respectively.
▪ There is a single leaf level, whose nodes hold the actual values, i.e., 10, 12, 17, 20, 24, 27 and 29.
▪ Searching for any record is easier because the tree is balanced: all leaf nodes are at the same depth.
▪ In this method, any record can be reached by traversing a single path from the root
and accessed easily.
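The single-path search and the linked leaves can be sketched in a few lines. This is only an illustration: the `Leaf` and `Internal` classes are hypothetical, and the grouping of the listed values into three leaves (plus an empty leaf for keys of 30 and above) is an assumption, since the figure is not reproduced here.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted record keys stored in this leaf
        self.next = None   # link to the next leaf (singly linked list)

class Internal:
    def __init__(self, keys, children):
        self.keys = keys           # separator keys only, no records
        self.children = children   # always len(keys) + 1 children

def search(node, key):
    """Follow one root-to-leaf path and report whether the key is stored."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return key in node.keys

def range_scan(node, low, high):
    """Descend to the leaf for `low`, then walk the leaf links up to `high`."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, low)]
    found = []
    while node is not None:
        for key in node.keys:
            if key > high:
                return found
            if key >= low:
                found.append(key)
        node = node.next
    return found

# Assumed leaf grouping for the values 10, 12, 17, 20, 24, 27 and 29.
l1, l2, l3, l4 = Leaf([10, 12]), Leaf([17, 20, 24]), Leaf([27, 29]), Leaf([])
l1.next, l2.next, l3.next = l2, l3, l4
root = Internal([25], [Internal([15], [l1, l2]), Internal([30], [l3, l4])])
```

Here `search(root, 20)` visits the root, the node holding 15, and one leaf. The value 25 exists only as a separator in the root, so `search(root, 25)` reports that no record with that key is stored.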
5. Cluster file organization:
▪ When records from two or more tables are stored in the same file, the file is known as a
cluster. Such files hold two or more tables in the same data block, and the key attributes
used to map these tables together are stored only once.
▪ This method reduces the cost of searching for various records in different files.
▪ Cluster file organization is used when there is a frequent need to join the tables
with the same condition. These joins return only a few records from both tables. In the
given example, we are retrieving the records for only particular departments.
▪ This method can't be used to retrieve the record for the entire department.
In this method, we can directly insert, update or delete any record. Data is sorted
based on the key with which searching is done. The cluster key is the key with which
the joining of the tables is performed.
Types of Cluster file organization:
Cluster file organization is of two types:
1. Indexed Clusters:
In an indexed cluster, records are grouped based on the cluster key and stored together. The
above EMPLOYEE and DEPARTMENT relationship is an example of an indexed cluster.
Here, all the records are grouped based on the cluster key DEP_ID.
2. Hash Clusters:
It is similar to the indexed cluster. In a hash cluster, instead of storing the records based on
the cluster key itself, we compute a hash value for the cluster key and store together the
records with the same hash value.
Advantages & Disadvantages of Clustered File Organization:
Advantages of Clustered File Organization:
▪ This method is best suited when there are frequent requests to join the tables with the same
joining condition.
Disadvantages of Clustered File Organization:
▪ This method is not suitable for very large databases, since its performance on them is low.
▪ Clusters cannot be used if the joining condition changes; in that case, traversing the file
takes a lot of time.
▪ This method is not suitable for less frequently joined tables or tables with 1:1 conditions.
Summary
▪ File Organization refers to the logical relationships among various records that constitute
the file.
▪ Hashed file organisation is also called a direct file organisation and it is implemented with
hash function.
▪ Cluster file organization can be implemented with indexed clusters and hash clusters.
▪ Characteristics of DBMS.
▪ Types of DBMS.
▪ File organisation.
What is a Database?
Database Definition
▪ A database is a collection of related data which represents some aspect of the real
world.
▪ A database system is designed to be built and populated with data for a certain
task.
▪ Databases are used for storing, maintaining and accessing any sort of data. They
collect information on people, places or things. That information is gathered in one
place so that it can be observed and analyzed. Databases can be thought of as an
organized collection of information.
What are databases used for?
Businesses use data stored in databases to make informed business decisions. Some of the
ways organizations use databases include the following:
▪ Improve business processes. Companies collect data about business processes, such as
sales, order processing and customer service. They analyze that data to improve these
processes, expand their business and grow revenue.
▪ Keep track of customers. Databases often store information about people, such as
customers or users. For example, social media platforms use databases to store user
information, such as names, email addresses and user behavior.
▪ Secure personal health information. Healthcare providers use databases to securely
store personal health data to inform and improve patient care.
▪ Store personal data. Databases can also be used to store personal information. For
example, personal cloud storage is available for individual users to store media, such as
photos, in a managed cloud.
Database Management System (DBMS)
▪ Database Management System (DBMS) is software for storing and retrieving users’
data while considering appropriate security measures. It consists of a group of programs
which manipulate the database.
▪ The DBMS accepts the request for data from an application and instructs the operating
system to provide the specific data. In large systems, a DBMS helps users and other
third-party software to store and retrieve data.
▪ DBMS allows users to create their own databases as per their requirement. The term
“DBMS” includes the user of the database and other application programs. It provides an
interface between the data and the software application.
Characteristics of DBMS
Characteristics and properties of DBMS:
Here are the characteristics and properties of Database Management System:
▪ Database Management Software allows entities and relations among them to form tables.
▪ DBMS supports multi-user environment that allows users to access and manipulate data in
parallel.
Users of DBMS
End-Users | The end users are the people who interact with the database management system. They conduct various operations on the database, such as retrieving, updating and deleting data.
Application of DBMS
Sector | Use of DBMS
Banking | For customer information, account activities, payments, deposits, loans, etc.
Airlines | For reservations and schedule information.
Universities | For student information, course registrations, colleges and grades.
Telecommunication | It helps to keep call records, monthly bills, maintaining balances, etc.
2. Network Database
The network database model allows each child to have multiple parents. It helps you to
address the need to model more complex relationships, such as the orders/parts
many-to-many relationship. In this model, entities are organized in a graph which can be
accessed through several paths.
3. Relational Database
Relational DBMS is the most widely used DBMS model because it is one of the easiest.
This model is based on normalizing data in the rows and columns of tables. Relational
data is stored in fixed structures and manipulated using SQL.
In the object-oriented model, data is stored in the form of objects. The structures, called
classes, display the data within them. It is one of the components of DBMS that defines a
database as a collection of objects which store both data member values and operations.
File Organization in DBMS
File:
▪ A file is a named collection of related information that is recorded on secondary storage
such as magnetic disks, magnetic tapes and optical disks.
▪ A database consists of a huge amount of data. In an RDBMS, the data is grouped into
tables, and each table has related records. A user sees the data as tables, but in reality
this huge amount of data is stored in physical memory in the form of files.
What is File Organization?
▪ File organization refers to the logical relationships among the various records that constitute
the file, particularly with respect to the means of identification and access to any specific
record. In simple terms, storing the files in a certain order is called file organization.
▪ File Structure refers to the format of the label and data blocks and of any logical control
record.
Types of File Organizations :
Various methods have been introduced to organize files. Each method has advantages and
disadvantages with respect to access or selection. It is therefore up to the programmer to
choose the file organization method best suited to the requirements.
Some types of File Organizations are:
1. Sequential File Organization –
The simplest method of file organization is the sequential method. In this method, records are
stored one after another in a sequential manner. There are two ways to implement this
method:
i. Pile File Method:
This method is quite simple: records are stored in a sequence, i.e., one after another, in the
order in which they are inserted into the tables.
ii. Sorted File Method
In this method, as the name suggests, whenever a new record has to be inserted, it is
always inserted in sorted (ascending or descending) order. Records may be sorted on the
primary key or on any other key.
Insertion of new record –
Let us assume that there is a preexisting sorted sequence of records R1, R3, and so on up to R7
and R8. Suppose a new record R2 has to be inserted in the sequence. It is first written at the
end of the file, and the sequence is then sorted again.
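The insertion step can be sketched as follows. This is a minimal in-memory illustration that models each record by its numeric key, which is an assumption; the second function is an alternative not described in the slides, shown only for contrast.

```python
from bisect import insort

def insert_append_then_sort(sorted_file, new_record):
    """Follow the two steps described above: append, then sort again."""
    sorted_file.append(new_record)  # the new record first lands at the end of the file
    sorted_file.sort()              # the whole sequence is then sorted again
    return sorted_file

def insert_in_place(sorted_file, new_record):
    """Alternative: place the record directly at its sorted position."""
    insort(sorted_file, new_record)
    return sorted_file

# R1, R3, R7 and R8 modelled by their keys (intermediate records omitted).
records = [1, 3, 7, 8]
insert_append_then_sort(records, 2)  # records is now [1, 2, 3, 7, 8]
```

Both functions produce the same ordered file; they differ only in how much of the file has to be rearranged.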
Heap File Organization:
▪ Heap file organization works with data blocks. In this method, records are inserted at the
end of the file, into the data blocks.
▪ No sorting or ordering is required in this method. If a data block is full, the new record is
stored in some other block. This block need not be the very next data block; the DBMS can
select any data block with free space to store new records. The heap file is also known as an
unordered file.
▪ In the file, every record has a unique id, and every page in the file is of the same size. It is
the DBMS's responsibility to store and manage the new records.
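The placement rule above can be sketched with a small in-memory model. The `HeapFile` class, its block capacity of two records, and the payload values are hypothetical; a real DBMS tracks free space on disk pages rather than in Python lists.

```python
class HeapFile:
    """Unordered file made of fixed-capacity data blocks (illustrative only)."""

    def __init__(self, block_capacity=2):
        self.block_capacity = block_capacity
        self.blocks = [[]]   # each block is a list of (record_id, payload)
        self.next_id = 1     # source of unique record ids

    def insert(self, payload):
        record_id = self.next_id
        self.next_id += 1
        # Prefer the last block; if it is full, use any block with free space.
        candidates = [len(self.blocks) - 1] + list(range(len(self.blocks) - 1))
        for index in candidates:
            if len(self.blocks[index]) < self.block_capacity:
                break
        else:
            self.blocks.append([])  # every block is full: allocate a new one
            index = len(self.blocks) - 1
        self.blocks[index].append((record_id, payload))
        return record_id, index

    def delete(self, record_id):
        for block in self.blocks:
            block[:] = [r for r in block if r[0] != record_id]

    def find(self, record_id):
        # No ordering exists, so a lookup scans the blocks one by one.
        for index, block in enumerate(self.blocks):
            for rid, payload in block:
                if rid == record_id:
                    return index, payload
        return None

heap = HeapFile(block_capacity=2)
placements = [heap.insert(p) for p in ("a", "b", "c", "d")]  # fills blocks 0 and 1
heap.delete(1)                        # frees a slot in block 0
placements.append(heap.insert("e"))   # last block is full, so block 0 is reused
```

Insertion is cheap because no ordering is maintained, while `find` has to scan the blocks, which is the usual trade-off of an unordered file.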
Summary
▪ A database is a collection of related data which represents some aspect of the real
world.
▪ DBMS allows users to create their own databases as per their requirement. The term
“DBMS” includes the user of the database and other application programs.
▪ Relational DBMS is the most widely used DBMS model because it is one of the easiest.
▪ File Organization refers to the logical relationships among various records that constitute
the file.
Additional Resources
▪ “Distributed Database Systems”, by Patrick Valduriez, Pearson Publication.
Any Questions?
Thank You!
Database Design
Development
Unit 1
Introduction to DBMS
Topic 3
Normalization
Session 3
Objectives
After completing this session, you will be able to understand :
▪ Primary key is a field in a table which uniquely identifies each row/record in a database table.
Primary keys must contain unique values. A primary key column cannot have NULL values.
▪ Secondary Key is the key that has not been selected to be the primary key. However, it is
considered a candidate key for the primary key.
Therefore, a candidate key not selected as a primary key is called secondary key. Candidate
key is an attribute or set of attributes that you can consider as a Primary key.
A Foreign Key creates a link between tables. It references the primary key in another table
and links it.
For example, the DeptID in the Employee table is a foreign key −
<Employee>
<Department>
▪ The DeptID column in the Employee table is a foreign key that references the Department table.
▪ Unique Key: Many users treat the primary key and the unique key as the same thing, since both
uniquely identify rows in a table, but a unique key is different from a primary key. A unique
key accepts null values, whereas a primary key cannot be null.
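The behaviour of these keys can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the column names `EmpID`, `Email` and `DeptName` and all row values are made up, and SQLite is used only because it needs no server. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Department (
    DeptID   INTEGER PRIMARY KEY,
    DeptName TEXT NOT NULL
);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,                    -- primary key: unique per row
    Email  TEXT UNIQUE,                            -- unique key: NULL is accepted
    DeptID INTEGER REFERENCES Department(DeptID)   -- foreign key
);
INSERT INTO Department VALUES (10, 'Sales');
INSERT INTO Employee VALUES (1, NULL, 10);
INSERT INTO Employee VALUES (2, NULL, 10);  -- a second NULL Email is allowed
""")

def rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_primary_key = rejected("INSERT INTO Employee VALUES (1, 'a@example.test', 10)")
unknown_department = rejected("INSERT INTO Employee VALUES (3, 'b@example.test', 99)")
```

Both flags end up `True`: the first insert repeats an existing `EmpID`, and the second refers to a `DeptID` that does not exist in Department.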
Normal Forms:
▪ First normal form (1NF) states that an attribute of a table cannot hold multiple values; it
must hold only single values.
▪ First normal form disallows the multi-valued attribute, composite attribute, and their
combinations.
Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.
EMPLOYEE table:
The decomposition of the EMPLOYEE table into 1NF has been shown below:
TEACHER_ID  SUBJECT    TEACHER_AGE
25          Chemistry  30
25          Biology    30
47          English    35
83          Math       38
83          Computer   38
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID  TEACHER_AGE
25          30
47          35
83          38
TEACHER_SUBJECT table:
TEACHER_ID  SUBJECT
25          Chemistry
25          Biology
47          English
83          Math
83          Computer
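The decomposition can be checked with a short script. It is a sketch that models each table as a list of tuples using the rows shown above; the variable names are illustrative. Projecting the original table gives the two new tables, and joining them on TEACHER_ID gives back the original rows, so no information is lost.

```python
# Original table: (TEACHER_ID, SUBJECT, TEACHER_AGE)
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# TEACHER_DETAIL: the age depends on TEACHER_ID alone, so each teacher appears once.
teacher_detail = sorted({(tid, age) for tid, _subject, age in teacher})

# TEACHER_SUBJECT: one row per (teacher, subject) pair.
teacher_subject = [(tid, subject) for tid, subject, _age in teacher]

# Joining the two tables on TEACHER_ID rebuilds the original rows.
age_of = dict(teacher_detail)
rejoined = [(tid, subject, age_of[tid]) for tid, subject in teacher_subject]
```

After the split, each teacher's age is stored once instead of once per subject, which is the redundancy that 2NF removes.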
Third Normal Form (3NF):
▪ A relation is in 3NF if it is in 2NF and does not contain any transitive dependency.
▪ 3NF is used to reduce data duplication and to achieve data integrity.
▪ If there is no transitive dependency for non-prime attributes, then the relation is in
third normal form.
▪ A relation is in third normal form if it holds at least one of the following conditions for
every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
EMPLOYEE table:
EMP_ID  EMP_NAME   EMP_ZIP
222     Harry      201010
333     Stephan    02228
444     Lan        60007
555     Katharine  06389
666     John       462007
EMPLOYEE_ZIP table:
EMP_ZIP  EMP_STATE  EMP_CITY
201010   UP         Noida
▪ A relation will be in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
▪ For a dependency A → B, if multiple values of B exist for a single value of A, then the
dependency is a multi-valued dependency.
Example: STUDENT Table
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
▪ The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities. Hence, there is no relationship between COURSE and HOBBY.
▪ So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID  COURSE
21      Computer
21      Math
34      Chemistry
74      Biology
59      Physics
STUDENT_HOBBY
STU_ID  HOBBY
21      Dancing
21      Singing
34      Dancing
74      Cricket
59      Hockey
Fifth normal form (5NF):
A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is
lossless.
5NF is satisfied when all the tables are broken into as many tables as possible in order to
avoid redundancy.
▪ In the above table, John takes both the Computer and Math classes for Semester 1, but he
doesn't take the Math class for Semester 2. In this case, a combination of all these fields
is required to identify valid data.
▪ Suppose we add a new semester, Semester 3, but do not yet know the subject
or who will be taking that subject, so we would leave Lecturer and Subject as NULL. But
all three columns together act as the primary key, so we can't leave the other two
columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1:
SEMESTER    SUBJECT
Semester 1  Computer
Semester 1  Math
Semester 1  Chemistry
Semester 2  Math
P2:
SUBJECT    LECTURER
Computer   Anshika
Computer   John
Math       John
Math       Akash
Chemistry  Praveen
P3:
SEMESTER    LECTURER
Semester 1  Anshika
Semester 1  John
Semester 1  John
Semester 2  Akash
Semester 1  Praveen
Overview of Normal Forms:
Normal Form | Description
2NF | A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
4NF | A relation will be in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
5NF | A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.
Summary
▪ Normalization is used to minimize the redundancy from a relation or set of relations. It is also
used to eliminate the undesirable characteristics like Insertion, Update and Deletion Anomalies.
▪ A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the
primary key.
▪ A relation will be in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
▪ A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.
Database Design
Development
Unit 1
Introduction to DBMS
Topic 4
Entity Relationship Model
Session 4
Objectives
After completing this session, you will be able to understand :
▪ ER Diagram.
▪ It develops a conceptual design for the database. It also develops a very simple and
easy to design view of data.
▪ The address can be another entity with attributes like city, street name, pin code, etc and
there will be a relationship between them.
Components of ER Diagram:
1. Entity:
▪ An entity may be any object, class, person or place. In the ER diagram, an entity can be
represented as rectangles.
▪ For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute:
▪ The key attribute is used to represent the main characteristics of an entity.
▪ The composite attribute is represented by an ellipse, and its component attributes are
drawn as ellipses connected to it.
c. Multivalued Attribute:
▪ An attribute that can have more than one value is known as a multivalued
attribute. A double oval is used to represent a multivalued attribute.
▪ For example, a student can have more than one phone number.
d. Derived Attribute:
▪ An attribute that can be derived from other attributes is known as a derived attribute. It
can be represented by a dashed ellipse.
▪ For example, a person's age changes over time and can be derived from another
attribute such as date of birth.
3. Relationship:
A relationship is used to describe the relation between entities. Diamond or rhombus is used
to represent the relationship.
a. One-to-One Relationship
b. One-to-many relationship
c. Many-to-one relationship
d. Many-to-many relationship
a. One-to-One Relationship:
▪ When only one instance of an entity is associated with the relationship, it is known
as a one-to-one relationship.
▪ For example, a female can marry one male, and a male can marry one female.
b. One-to-many relationship:
▪ When only one instance of the entity on the left and more than one instance of the entity
on the right are associated with the relationship, it is known as a one-to-many
relationship.
▪ For example, a scientist can make many inventions, but each invention is made by one
specific scientist.
c. Many-to-one relationship:
▪ When more than one instance of the entity on the left and only one instance of the entity
on the right are associated with the relationship, it is known as a many-to-one
relationship.
▪ For example, a student enrolls for only one course, but a course can have many students.
d. Many-to-many relationship:
▪ When more than one instance of the entity on the left and more than one instance of the
entity on the right are associated with the relationship, it is known as a many-to-many
relationship.
▪ For example, an employee can be assigned to many projects, and a project can have many
employees.
Notation of ER diagram:
▪ Database can be represented using the notations. In ER diagram, many notations are
used to express the cardinality. These notations are as follows:
Summary
▪ ER model stands for an Entity-Relationship model. It is a high-level data model.
▪ An entity may be any object, class, person or place. In the ER diagram, an entity can be
represented as rectangles.
▪ The attribute is used to describe the property of an entity. An ellipse is used to represent an
attribute.
▪ Query in DBMS
▪ Commands in SQL.
DBMS Query
SQL Commands:
▪ Pre-requisite for DBMS is SQL Commands. SQL is an abbreviation for Structured
Query Language.
▪ SQL can perform various tasks like create a table, add data to tables, drop the
table, modify the table, set permission for users.
SQL Commands:
▪ Types of SQL Commands:
▪ There are five types of SQL commands: DDL, DML, DCL, TCL, and DQL.
1. Data Definition Language (DDL):
▪ DDL changes the structure of the table like creating a table, deleting a table, altering a
table, etc.
▪ All DDL commands are auto-committed, which means they permanently save all
changes to the database.
i. CREATE
ii. ALTER
iii. DROP
iv. TRUNCATE
a. CREATE :
▪ It is used to create a new table in the database.
▪ Syntax:
Example:
▪ Syntax:
Example
▪ Syntax:
EXAMPLE
Syntax:
Example:
▪ DML commands are not auto-committed, which means their changes are not permanently saved
until they are committed; they can be rolled back.
a. INSERT
b. UPDATE
c. DELETE
a. INSERT:
The INSERT statement is a SQL query. It is used to insert data into the row of a table.
Syntax:
Or
For example:
Syntax:
For example:
UPDATE students
▪ Syntax:
For example:
WHERE Author="Sonoo";
Data Query Language:
▪ DQL is used to fetch the data from the database. It uses only one command: SELECT
a. SELECT: This is the same as the projection operation of relational algebra. It is used to
select the attribute based on the condition described by WHERE clause.
Syntax:
SELECT expressions
FROM TABLES
WHERE conditions;
For example:
SELECT emp_name
FROM employee
WHERE age > 20;
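The SELECT example can be run against a throwaway database with Python's built-in `sqlite3` module. This is a sketch: the three employee rows are made up, SQLite stands in for whatever DBMS the course uses, and the `ORDER BY` clause is added only so that the result order is fixed. The last lines also illustrate that an uncommitted DML change can be rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, age INTEGER)")  # DDL
conn.executemany(                                                   # DML
    "INSERT INTO employee VALUES (?, ?)",
    [("Asha", 19), ("Ravi", 25), ("Mina", 31)],  # made-up rows
)
conn.commit()

# DQL: the SELECT from the example, with ORDER BY added for a stable order.
names = [row[0] for row in conn.execute(
    "SELECT emp_name FROM employee WHERE age > 20 ORDER BY emp_name"
)]

# DML is not auto-committed: an uncommitted DELETE can still be rolled back.
conn.execute("DELETE FROM employee")
conn.rollback()
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

`names` contains only the employees older than 20, and `remaining` is still 3 because the DELETE was never committed.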
Summary
▪ SQL commands are instructions. It is used to communicate with the database.
▪ SQL can perform various tasks like create a table, add data to tables, drop the table,
modify the table, set permission for users.
▪ DDL changes the structure of the table like creating a table, deleting a table, altering a table, etc.
▪ DML commands are used to modify the database. It is responsible for all form of changes in the
database.
Database Design
Development
Unit 1
Introduction to DBMS
Topic 6
Query Processing in DBMS
Session 6
Objectives
After completing this session, you will be able to understand :
▪ Query processing
▪ A parser is a compiler component that breaks the data coming from the lexical analysis
phase into smaller elements.
▪ A parser takes input in the form of a sequence of tokens and produces output in the form
of a parse tree.
▪ The query gets translated into expressions that can be further used at the physical level of
the file system.
▪ After this, the actual evaluation of the query takes place, along with a variety of
query-optimizing transformations.
▪ Thus, before processing a query, a computer system needs to translate it from a
human-readable language such as SQL into an internal form the system can work with.
▪ Relational algebra is well suited for the internal representation of a query.
▪ When a user executes a query, the parser generates the internal form of the query: it
checks the syntax of the query and verifies the name of the relation in the database, the
tuple, and finally the required attribute value. The parser creates a tree of the query,
known as a 'parse tree'.
▪ The parse tree is then translated into relational algebra. In the process, every use of a
view in the query is replaced by the view's definition.
Example:
▪ Suppose a user executes a query.
▪ In SQL, a user wants to fetch the records of the employees whose salary is greater than or
equal to 10000. For doing this, the following query is undertaken:
▪ Thus, to make the system understand the user query, it needs to be translated in the form
of relational algebra. We can bring this query in the relational algebra form as:
After translating the given query, we can execute each relational algebra operation by using
different algorithms. So, in this way, a query processing begins its working.
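As a rough illustration of executing relational algebra operations, the sketch below implements selection (σ) and projection (π) over a list of rows and evaluates two equivalent plans. The `employee` rows, the attribute names, and the expression π_emp_name(σ_salary≥10000(Employee)) are assumptions made for this example, not the query from the slide.

```python
def select(relation, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Projection (pi): keep the listed attributes and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result

# Made-up relation for the example.
employee = [
    {"emp_name": "Asha", "salary": 9000, "dept": "HR"},
    {"emp_name": "Ravi", "salary": 10000, "dept": "IT"},
    {"emp_name": "Mina", "salary": 15000, "dept": "IT"},
]

def high_salary(row):
    return row["salary"] >= 10000

# Plan A: select first, then project the name.
plan_a = project(select(employee, high_salary), ["emp_name"])

# Plan B: drop unneeded attributes first, then select, then project the name.
plan_b = project(
    select(project(employee, ["emp_name", "salary"]), high_salary),
    ["emp_name"],
)
```

Both plans return the same rows, which is why the system is free to choose among equivalent expressions when it builds an evaluation plan.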
Evaluation:
▪ For this, in addition to the relational algebra translation, the translated relational
algebra expression must be annotated with instructions that specify how to evaluate each
operation.
▪ Thus, after translating the user query, the system executes a query evaluation plan.
Query Evaluation Plan:
▪ In order to fully evaluate a query, the system needs to construct a query evaluation plan.
▪ The annotations in the evaluation plan may refer to the algorithms to be used for the
particular index or the specific operations.
▪ Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query execution
plan.
▪ A query execution engine is responsible for generating the output of the given query. It
takes the query execution plan, executes it, and finally makes the output for the user
query.
Optimization:
▪ The cost of query evaluation can vary for different types of queries. Because the
system is responsible for constructing the evaluation plan, the user does not need to
write the query efficiently.
▪ For optimizing a query, the query optimizer should have an estimated cost analysis of
each operation. It is because the overall operation cost depends on the memory
allocations to several operations, execution costs, and so on.
▪ Finally, after selecting an evaluation plan, the system evaluates the query and
produces the output of the query.
Summary
▪ Query Processing is the activity performed in extracting data from the database.
▪ When a user executes any query, for generating the internal form of the query, the parser
in the system checks it.
Database Design
Development
Unit 2
Concurrency control transactions
and schedule
Topic 1
Introduction to Deadlock
Session 1
Objectives
After completing this session, you will be able to understand :
▪ A deadlock is a situation where each process waits for a resource that is
assigned to some other process.
▪ In this situation, none of the processes gets executed, since the resource each one needs is
held by some other process which is itself waiting for another resource to be
released.
Example of Deadlock:
▪ Let us assume that there are three processes P1, P2 and P3. There are three different
resources R1, R2 and R3. R1 is assigned to P1, R2 is assigned to P2 and R3 is assigned to
P3.
▪ After some time, P1 demands R2, which is being used by P2. P1 halts its execution since it
cannot complete without R2. P2 in turn demands R3, which is being used by P3.
▪ P2 also stops its execution because it cannot continue without R3. P3 then demands R1,
which is being used by P1, so P3 also stops its execution.
▪ In this scenario, a cycle is formed among the three processes. None of the processes is
progressing and they are all waiting. The computer becomes unresponsive since all the
processes are blocked.
▪ A deadlock can be indicated by a cycle in the wait-for-graph. This is a directed graph in
which the vertices denote transactions and the edges denote waits for data items.
▪ For example, in the following wait-for-graph, transaction T1 is waiting for data item X
which is locked by T3. T3 is waiting for Y which is locked by T2 and T2 is waiting for Z
which is locked by T1.
▪ Hence, a waiting cycle is formed, and none of the transactions can proceed executing.
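Detecting such a cycle can be sketched with a depth-first search over the wait-for graph. This is a minimal illustration: the dictionary encoding of the graph and the function name `find_cycle` are assumptions; the edges are the ones from the example above (T1 waits for T3, T3 waits for T2, T2 waits for T1).

```python
def find_cycle(wait_for):
    """Return one cycle of transactions as a list, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(transaction):
        color[transaction] = GREY
        path.append(transaction)
        for other in wait_for.get(transaction, ()):
            state = color.get(other, WHITE)
            if state == GREY:
                # `other` is already on the current path: a waiting cycle exists.
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        color[transaction] = BLACK
        return None

    for transaction in list(wait_for):
        if color.get(transaction, WHITE) == WHITE:
            cycle = visit(transaction)
            if cycle:
                return cycle
    return None

# An edge T -> U means "T is waiting for a data item locked by U".
wait_for = {"T1": ["T3"], "T3": ["T2"], "T2": ["T1"]}
deadlocked = find_cycle(wait_for)
```

For the example graph the search reports the cycle T1, T3, T2. A graph without a cycle returns `None`, meaning every transaction can eventually proceed.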
Sr. No. | Deadlock | Starvation
1 | Deadlock is a situation where all the processes involved are blocked and no process proceeds. | Starvation is a situation where the low-priority processes are blocked while the high-priority processes proceed.
2 | Deadlock is an infinite wait. | Starvation is a long wait, but not an infinite one.
5 | Deadlock happens when mutual exclusion, hold and wait, no pre-emption and circular wait occur simultaneously. | It occurs due to uncontrolled priority and resource management.
Necessary conditions for Deadlocks:
1 | Mutual Exclusion | A resource can be shared only in a mutually exclusive manner, i.e., two processes cannot use the same resource at the same time.
2 | Hold & Wait | A process waits for some resources while holding another resource at the same time.
3 | No preemption | A process, once scheduled, is executed until completion. No other process can be scheduled by the scheduler in the meantime.
4 | Circular wait | All the processes must be waiting for resources in a cyclic manner, so that the last process is waiting for the resource held by the first process.
Strategies for handling Deadlock:
1. Deadlock Ignorance
2. Deadlock prevention
3. Deadlock avoidance
4. Deadlock detection and recovery
1. Deadlock Ignorance:
▪ In this approach, the operating system assumes that deadlock never occurs. It simply
ignores deadlock.
▪ This approach is best suited to a single end-user system where the user uses the system
only for browsing and other ordinary tasks.
2. Deadlock prevention:
▪ Deadlock happens only when Mutual Exclusion, hold and wait, No pre-emption and
circular wait holds simultaneously.
▪ If it is possible to violate one of the four conditions at any time then the deadlock can
never occur in the system.
▪ This method is suitable for a large database. If the resources are allocated in such a way
that deadlock never occurs, then the deadlock can be prevented.
▪ The database management system analyses the operations of a transaction to determine
whether they can create a deadlock situation. If they can, the DBMS never allows that
transaction to be executed.
3. Deadlock avoidance:
▪ In deadlock avoidance, the operating system checks at every step it performs whether the
system is in a safe state or an unsafe state.
▪ Processing continues as long as the system remains in a safe state. Once the system moves
to an unsafe state, the OS has to backtrack one step.
▪ In simple words, the OS reviews each allocation so that the allocation doesn't cause a
deadlock in the system.
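One common way to test for a safe state is the safety check from the Banker's algorithm, which these slides do not name. The sketch below assumes each process declares its remaining need in advance; `is_safe` and the sample figures are illustrative only.

```python
def is_safe(available, allocation, need):
    """Return (safe, order): whether every process can finish, and in which order.

    available  -- free units per resource type
    allocation -- units currently held, one row per process
    need       -- units each process may still request, one row per process
    """
    work = list(available)
    finished = [False] * len(allocation)
    order = []
    progressed = True
    while progressed:
        progressed = False
        for i, (held, wanted) in enumerate(zip(allocation, need)):
            if not finished[i] and all(w <= free for w, free in zip(wanted, work)):
                # Process i can run to completion and then returns what it holds.
                work = [free + h for free, h in zip(work, held)]
                finished[i] = True
                order.append(i)
                progressed = True
    return all(finished), order


# One resource type with 12 units in total; 3 are currently free.
safe_now, safe_order = is_safe([3], [[5], [2], [2]], [[5], [2], [7]])

# Granting one more unit to the third process would leave only 2 free.
safe_after, _ = is_safe([2], [[5], [2], [3]], [[5], [2], [6]])
print(safe_now, safe_order, safe_after)
```

Under this check, an allocation is granted only if the state that results is still safe; otherwise the request waits.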
Deadlock avoidance:
▪ A deadlock avoidance mechanism is used to detect any deadlock situation in advance.
▪ A method like the "wait-for graph" is used for detecting the deadlock situation, but this
method is suitable only for smaller databases. For larger databases, the deadlock prevention
method can be used.
4. Deadlock detection and recovery:
▪ This approach lets the processes fall into deadlock and then periodically checks whether
a deadlock has occurred in the system.
▪ If one has occurred, it applies a recovery method to the system to get rid of the
deadlock.
Deadlock detection and recovery:
1. If resources have a single instance –
In this case for Deadlock detection, we can run an algorithm to check for the cycle in the
Resource Allocation Graph. The presence of a cycle in the graph is a sufficient condition for
deadlock.
Deadlock detection and recovery:
2. In the above diagram, resource 1 and resource 2 have single instances. There is a cycle
R1 → P1 → R2 → P2 → R1. So, deadlock is confirmed.
Deadlock Recovery:
A traditional operating system such as Windows doesn’t deal with deadlock recovery as it is
a time and space-consuming process. Real-time operating systems use Deadlock
recovery.
Summary
▪ A deadlock is a situation where each process waits for a resource that is assigned to
another process.
▪ Starvation is a situation where the low priority process got blocked and the high priority
processes proceed.
▪ Transaction Model.
▪ Distributed deadlock.
▪ Simple transaction model follows all ACID properties while doing transactions.
Transaction States:
i. Active
ii. Partially committed
iii. Failed
iv. Aborted
v. Committed
Description:
▪ A transaction starts in the active state, in which its read and write operations execute.
When its last operation has executed, it enters the partially committed state.
▪ Finally, when the commit operation completes, it enters the committed state, meaning its
changes are stored permanently in the database.
▪ If a transaction fails in either the active state or the partially committed state, it
enters the failed state. A failed transaction is rolled back, which puts it in the
aborted state.
▪ An aborted transaction then moves to the terminated state. A committed transaction
also terminates.
Description:
▪ During a series of operations, the results of reads and writes are held in local memory or
a buffer. Once the last operation has run, the transaction is partially committed.
▪ After the commit statement, the data is moved into permanent storage. This explains
the flow from active to partially committed to committed.
▪ On the other hand, a power failure during the active state puts the transaction in the
failed state. A power failure during the partially committed state also puts it in the failed state.
▪ After this, rollback occurs, meaning the local memory is cleared. The transaction is then
aborted, meaning the database is unchanged, and finally terminated.
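The state transitions described above can be written down as a small table. This is a minimal sketch; the names `State`, `ALLOWED` and `is_valid_history` are illustrative and not taken from any DBMS.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"
    TERMINATED = "terminated"


# Each state maps to the states a transaction may move to next.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: {State.TERMINATED},
    State.COMMITTED: {State.TERMINATED},
    State.TERMINATED: set(),
}


def is_valid_history(states):
    """Check that a sequence of states starts at ACTIVE and follows ALLOWED."""
    if not states or states[0] is not State.ACTIVE:
        return False
    return all(nxt in ALLOWED[cur] for cur, nxt in zip(states, states[1:]))


success = [State.ACTIVE, State.PARTIALLY_COMMITTED, State.COMMITTED, State.TERMINATED]
failure = [State.ACTIVE, State.FAILED, State.ABORTED, State.TERMINATED]
illegal = [State.ACTIVE, State.COMMITTED]  # skips the partially committed state
print(is_valid_history(success), is_valid_history(failure), is_valid_history(illegal))
```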
Deadlock:
▪ We covered the basics of deadlock in the last lecture. Now we will discuss
distributed deadlock.
▪ For example, if ATMs did not support concurrency, several people could not withdraw
money at the same time in different places. This is where we need concurrency control.
Techniques:
The concurrency control techniques are as follows −
i. Locking: A lock guarantees exclusive use of a data item to the current transaction. The
transaction first acquires a lock to access the data item, and releases the lock after it
completes.
ii. Time stamping: A time stamp is a unique identifier created by the DBMS that indicates
the relative starting time of a transaction. For every transaction, the DBMS records the
starting time, and time stamps are used to order the transactions.
iii. Optimistic: It is based on the assumption that conflict is rare and that it is more efficient
to allow transactions to proceed without imposing delays to ensure serializability.
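The optimistic technique can be illustrated with a version check at commit time. Version counters are one common way to implement the validation step and are an assumption here, not something these slides describe; `VersionedStore` is a hypothetical class.

```python
class VersionedStore:
    """Keeps one (value, version) pair per key; the version grows with every commit."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        # Returns (value, version); an unknown key has version 0.
        return self.data.get(key, (None, 0))

    def commit(self, key, new_value, version_read):
        # Validation: succeed only if nobody committed since this transaction read.
        _, current = self.data.get(key, (None, 0))
        if current != version_read:
            return False  # conflict detected, the transaction must restart
        self.data[key] = (new_value, current + 1)
        return True


store = VersionedStore()
store.commit("balance", 100, 0)

# Two transactions read the same version without waiting for each other.
_, seen_by_t1 = store.read("balance")
_, seen_by_t2 = store.read("balance")

t1_ok = store.commit("balance", 70, seen_by_t1)  # first committer wins
t2_ok = store.commit("balance", 50, seen_by_t2)  # stale version, rejected
print(t1_ok, t2_ok, store.read("balance"))
```

No transaction is delayed while it runs; the cost is paid only when a conflict actually occurs and the losing transaction restarts.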
Concurrency Control and Recovery in
Distributed Databases:
▪ For concurrency control and recovery purposes, numerous problems arise in a distributed DBMS
environment that are not encountered in a centralized DBMS environment. These include the
following:
v. Distributed Deadlock
Lock management:
▪ Lock management can be distributed across sites in many ways:
i. Centralized: A single site is in charge of handling lock and unlock requests for all objects.
ii. Primary copy: One copy of each object is designated as the primary copy. All requests to
lock or unlock a copy of this object are handled by the lock manager at the site where the
primary copy is stored, regardless of where the copy itself is stored.
iii. Fully Distributed: Requests to lock or unlock a copy of an object stored at a site are handled
by the lock manager at the site where the copy is stored.
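The three schemes differ only in which site's lock manager receives a request. A minimal sketch, with hypothetical site names and a hypothetical helper `lock_manager_site`:

```python
def lock_manager_site(scheme, copy_site, primary_site, central_site):
    """Return the site whose lock manager handles a request for one copy of an object."""
    if scheme == "centralized":
        return central_site   # one site handles every request
    if scheme == "primary_copy":
        return primary_site   # the site holding the object's primary copy
    if scheme == "fully_distributed":
        return copy_site      # the site holding the copy being accessed
    raise ValueError(f"unknown scheme: {scheme}")


# A copy stored at site C, whose primary copy is at site B; site A is the central site.
sites = {"copy_site": "C", "primary_site": "B", "central_site": "A"}
chosen = {
    scheme: lock_manager_site(scheme, **sites)
    for scheme in ("centralized", "primary_copy", "fully_distributed")
}
print(chosen)
```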
Distributed Deadlock:
▪ One issue that requires special attention when using either primary copy or fully
distributed locking is deadlock detection. Each site maintains a local waits-for graph, and
a cycle in a local graph indicates a deadlock. However, a deadlock can exist even when no
local graph contains a cycle.
▪ As shown in the following figure, T2 is waiting for T1 at site A and T1 is waiting for T2 at
site B, so we have a deadlock.
Distributed Deadlock Detection Algorithm:
To detect such deadlocks, a distributed deadlock detection algorithm must be
used and we have three types of algorithms:
1 Centralized Algorithm
2 Hierarchical Algorithm
3 Simple Algorithm
1. Centralized Algorithm:
▪ It consists of periodically sending all local waits-for graphs to one site that is
responsible for global deadlock detection.
▪ At this site, the global waits-for graph is generated by combining all local graphs. In
the global graph, the set of nodes is the union of the nodes in the local graphs, and there is
an edge from one node to another if there is such an edge in any of the local graphs.
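The centralized algorithm can be sketched as a graph union followed by a cycle test. The sketch below assumes each local graph maps a waiting transaction to the set of transactions it waits for; `merge_graphs` and `has_cycle` are illustrative names.

```python
def merge_graphs(local_graphs):
    """Union of nodes and edges of all local waits-for graphs."""
    merged = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            merged.setdefault(waiter, set()).update(holders)
            for holder in holders:
                merged.setdefault(holder, set())
    return merged


def has_cycle(graph):
    """Repeatedly remove transactions that wait for nobody; leftovers form cycles."""
    remaining = {node: set(targets) for node, targets in graph.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, targets in remaining.items() if not targets]:
            del remaining[node]
            for targets in remaining.values():
                targets.discard(node)
            changed = True
    return bool(remaining)


site_a = {"T2": {"T1"}}  # at site A, T2 waits for T1
site_b = {"T1": {"T2"}}  # at site B, T1 waits for T2

local_a = has_cycle(merge_graphs([site_a]))
local_b = has_cycle(merge_graphs([site_b]))
global_deadlock = has_cycle(merge_graphs([site_a, site_b]))
print(local_a, local_b, global_deadlock)
```

Neither local graph has a cycle, yet the merged graph does, which is why the graphs must be combined at one site.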
2. Hierarchical Algorithm:
▪ This algorithm groups the sites into a hierarchy. For example, the sites might be grouped
by state, then by country, and finally into a single group that contains all sites.
▪ Every node in this hierarchy constructs a waits-for graph that reveals deadlocks involving
only sites contained in (the subtree rooted at) this node.
▪ Thus, all sites periodically (e.g., every 10 seconds) send their local waits-for graph to the
site constructing the waits-for graph for their country.
▪ The sites constructing waits-for graphs at the country level periodically (e.g., every 10
minutes) send the country waits-for graph to the site constructing the global waits-for graph.
3. Simple Algorithm:
▪ If a transaction waits longer than some chosen time-out interval, it is aborted.
▪ This algorithm causes many unnecessary restarts, but the overhead of
deadlock detection is low.
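The time-out rule needs only the moment each transaction started waiting. A minimal sketch, with times in seconds and a hypothetical helper `transactions_to_abort`:

```python
def transactions_to_abort(wait_started_at, now, timeout):
    """Return the transactions that have waited longer than the time-out interval."""
    return sorted(
        txn for txn, started in wait_started_at.items() if now - started > timeout
    )


# T1 has waited 10 s, T2 has waited 2 s, T3 has waited 7 s; the time-out is 5 s.
waiting = {"T1": 0.0, "T2": 8.0, "T3": 3.0}
victims = transactions_to_abort(waiting, now=10.0, timeout=5.0)
print(victims)
```

T1 and T3 are aborted whether or not they are really deadlocked, which is the source of the unnecessary restarts.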
Summary
▪ The simple transaction model describes how a transaction should behave. It has active,
partially committed, failed, aborted, and committed states.
▪ Concurrency control is a procedure in a DBMS that manages simultaneous
transactions so that they execute without conflicting with each other.
▪ There are three concurrency control techniques: locking, time stamping, and optimistic.
Additional Resources
▪ “Distributed Database Systems”, by Patrick Valduriez, Pearson Publication.
Database Design
Development
Unit 2
Concurrency control transactions
and schedule
Topic 3
Lock-Based Protocol
Session 3
Objectives
After completing this session, you will be able to understand :
▪ A transaction log keeps track of all transactions that update the database.
1. Shared lock:
▪ A shared lock can be held by several transactions at once, because a transaction holding
it can only read the data item and cannot update it.
2. Exclusive lock:
▪ Under an exclusive lock, the data item can be both read and written by the
transaction.
▪ This lock is exclusive: only one transaction can hold it at a time, so multiple
transactions cannot modify the same data simultaneously.
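The two lock modes can be summarised in a compatibility table: shared with shared is allowed, and every combination involving an exclusive lock is refused. A minimal sketch, where "S" and "X" are shorthand for shared and exclusive and `can_grant` is an illustrative helper:

```python
# (mode already held, mode requested) -> may both be held at the same time?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_modes):
    """A request is granted only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)


print(can_grant("S", ["S", "S"]))  # another reader may join
print(can_grant("X", ["S"]))       # a writer must wait for the reader
print(can_grant("X", []))          # an unlocked item can be locked exclusively
```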
Lock protocols:
▪ Locking protocols are used in database management systems as a means of
concurrency control. Multiple transactions may request a lock on a data item
simultaneously. Hence, we require a mechanism to manage the locking requests made
by transactions.
▪ Such a mechanism is called a lock manager. It relies on message passing: transactions
and the lock manager exchange messages to handle the locking
and unlocking of data items. There are four types of lock protocols available:
iii. Two-phase locking (2PL):
▪ In the first part, when the execution of the transaction starts, it seeks permission for the
locks it requires.
▪ In the second part, the transaction acquires all the locks. The third part starts as
soon as the transaction releases its first lock.
▪ In the third part, the transaction cannot demand any new locks. It only releases the
acquired locks.
There are two phases of 2PL:
▪ Growing phase: In the growing phase, a new lock on the data item may be acquired by
the transaction, but none can be released.
▪ Shrinking phase: In the shrinking phase, existing lock held by the transaction may be
released, but no new locks can be acquired.
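The two-phase rule can be checked on a transaction's sequence of lock operations. A minimal sketch, assuming each operation is an `(action, item)` pair; `follows_2pl` is an illustrative name:

```python
def follows_2pl(operations):
    """True if no lock is acquired after the first unlock."""
    shrinking = False
    for action, _item in operations:
        if action == "unlock":
            shrinking = True   # the shrinking phase has begun
        elif action == "lock" and shrinking:
            return False       # acquiring a lock while shrinking breaks 2PL
    return True


two_phase = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
not_two_phase = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]
print(follows_2pl(two_phase), follows_2pl(not_two_phase))
```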
iv. Strict Two-phase locking (Strict-2PL):
▪ The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all the
locks, the transaction continues to execute normally.
▪ The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock
right after using it.
▪ Strict-2PL waits until the whole transaction commits, and then it releases all the locks at
once.
▪ The Strict-2PL protocol has no gradual shrinking phase of lock release. It does not suffer
from cascading aborts as 2PL does.
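The Strict-2PL rule can be checked in the same style as plain 2PL. This sketch assumes operations are `(action, item)` pairs with a `("commit", None)` marker; the representation and the name `follows_strict_2pl` are assumptions made for illustration.

```python
def follows_strict_2pl(operations):
    """True if every unlock comes after commit and no lock is taken after commit."""
    committed = False
    for action, _item in operations:
        if action == "commit":
            committed = True
        elif action == "unlock" and not committed:
            return False   # a lock was released before the transaction committed
        elif action == "lock" and committed:
            return False   # no new locks once the transaction has committed
    return committed


strict = [("lock", "X"), ("lock", "Y"), ("commit", None), ("unlock", "X"), ("unlock", "Y")]
early_release = [("lock", "X"), ("unlock", "X"), ("lock", "Y"), ("commit", None), ("unlock", "Y")]
print(follows_strict_2pl(strict), follows_strict_2pl(early_release))
```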
Summary
▪ A transaction log keeps track of all transactions that update the database.
▪ A prerequisite of this protocol is that we know the order in which database items are
accessed. For this we impose a partial ordering on the set of database items
D = {d1, d2, d3, ..., dn}. The protocol that follows from the partial ordering is stated
as:
▪ If di → dj, then any transaction accessing both di and dj must access di before
accessing dj.
This implies that the set D may now be viewed as a directed acyclic graph (DAG), called
a database graph.
Tree Based Protocol:
▪ The partial order on the database items forms a tree-like structure.
▪ The first lock by Ti may be on any data item. Subsequently, a data item Q can be locked by
Ti only if the parent of Q is currently locked by Ti.
▪ Data items are locked and unlocked according to the rule given above, following the
database graph.
▪ With that, let's review the key points of graph-based protocols.
Advantages & Disadvantages:
Advantage –
i. Ensures conflict-serializable schedules.
ii. Ensures deadlock-free schedules.
iii. Unlocking can be done at any time.
With these advantages come some disadvantages as well.
Disadvantage –
i. Unnecessary locking overhead may sometimes occur; for example, if we want both D and E,
we must at least lock B to follow the protocol.
ii. Cascading rollbacks are still a problem. The protocol has no rule for when an unlock
operation may occur, so this problem persists.
Overall, this protocol is mostly known and used for its unique way of achieving deadlock
freedom.
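The locking rule of the tree protocol can be sketched as a check over one transaction's operations. The tree below (A at the root, B and C under A, D and E under B) is an assumed shape chosen to match the D, E and B example; only the rule stated in this section is checked, and `check_tree_protocol` is an illustrative name.

```python
# Parent of each data item in the assumed database tree; the root has no parent.
PARENT = {"A": None, "B": "A", "C": "A", "D": "B", "E": "B"}


def check_tree_protocol(operations, parent=PARENT):
    """True if every lock after the first is taken while the item's parent is held."""
    held = set()
    locked_before = False
    for action, item in operations:
        if action == "lock":
            if locked_before and parent[item] not in held:
                return False   # parent is not currently locked by this transaction
            held.add(item)
            locked_before = True
        elif action == "unlock":
            held.discard(item)  # unlocking is allowed at any time
    return True


via_parent = [("lock", "B"), ("lock", "D"), ("lock", "E"),
              ("unlock", "B"), ("unlock", "D"), ("unlock", "E")]
skipping_parent = [("lock", "D"), ("lock", "E")]
parent_released = [("lock", "B"), ("unlock", "B"), ("lock", "D")]
print(check_tree_protocol(via_parent),
      check_tree_protocol(skipping_parent),
      check_tree_protocol(parent_released))
```

The first sequence shows the overhead mentioned above: B is locked only so that D and E can be reached.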
Summary
▪ Graph-based protocols are yet another way of implementing lock-based protocols.
▪ Crash Recovery.
▪ Checkpoint.
Crash Recovery:
▪ DBMS is a highly complex system with hundreds of transactions being executed
every second.
▪ The durability and robustness of a DBMS depends on its complex architecture and
its underlying hardware and system software.
▪ If it fails or crashes amid transactions, it is expected that the system would follow
some sort of algorithm or techniques to recover lost data.
Failure Classification:
▪ To see where the problem has occurred, we generalize a failure into various
categories, as follows −
1. Transaction Failure
2. System Crash
3. Disk Failure
1. Transaction failure:
▪ A transaction has to abort when it fails to execute or when it reaches a point from where
it can’t go any further. This is called transaction failure where only a few transactions or
processes are hurt.
▪ Logical errors − Where a transaction cannot complete because it has some code error
or any internal error condition.
▪ System errors − Where the database system itself terminates an active transaction
because the DBMS is not able to execute it, or it has to stop because of some system
condition. For example, in case of deadlock or resource unavailability, the system aborts
an active transaction.
2. System Crash:
▪ There are problems − external to the system − that may cause the system to stop
abruptly and cause the system to crash.
▪ For example, interruptions in the power supply may cause the underlying hardware
or software to fail.
3. Disk Failure:
▪ Disk failures include the formation of bad sectors, the disk becoming unreachable, a disk
head crash, or any other failure that destroys all or part of disk storage.
Storage Structure:
In brief, the storage structure can be divided into two categories −
i. Volatile storage − As the name suggests, volatile storage cannot survive system
crashes. Volatile storage devices are placed very close to the CPU; normally they are
embedded onto the chipset itself. Main memory and cache memory are
examples of volatile storage. They are fast but can store only a small amount of information.
ii. Non-volatile storage − These memories are made to survive system crashes. They are
huge in data storage capacity, but slower in accessibility. Examples may include hard-disks,
magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.
Recovery and Atomicity:
When a system crashes, it may have several transactions being executed and various files
opened for them to modify the data items. Transactions are made of various operations, which are
atomic in nature. But according to the ACID properties of a DBMS, the atomicity of a transaction as
a whole must be maintained, that is, either all of its operations are executed or none.
When a DBMS recovers from a crash, it should do the following:
▪ It should check the states of all the transactions that were being executed.
▪ A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of
the transaction in this case.
▪ It should check whether the transaction can be completed now or needs to be rolled back.
Two techniques help a DBMS recover while maintaining the atomicity of a transaction:
▪ Maintaining the logs of each transaction, and writing them onto some stable storage
before actually modifying the database.
▪ Maintaining shadow paging, where the changes are made in volatile memory and
the actual database is updated later.
Log-based Recovery:
A log is a sequence of records that tracks the actions performed by a transaction. It
is important that the logs are written prior to the actual modification and stored on a stable storage
medium, which is failsafe.
When a transaction enters the system and starts execution, it writes a log about it:
<Tn, Start>
When the transaction modifies an item X, it writes logs as follows −
<Tn, X, V1, V2>
This reads: Tn has changed the value of X from V1 to V2.
When the transaction finishes, it logs:
<Tn, commit>
The database can be modified using two approaches −
▪ Deferred database modification − All logs are written on to the stable storage and the
database is updated when a transaction commits.
▪ Immediate database modification − The database is updated immediately after each
operation, with the corresponding log record written first.
Checkpoint:
▪ Keeping and maintaining logs in real time and in a real environment may fill
up all the memory space available in the system. As time passes, the log file may grow
too big to be handled at all.
▪ Checkpoint is a mechanism where all the previous logs are removed from the system
and stored permanently on a storage disk. A checkpoint declares a point before which the
DBMS was in a consistent state and all the transactions were committed.
Recovery:
When a system with concurrent transactions crashes and recovers, it behaves in the
following manner −
▪ The recovery system reads the logs backwards from the end to the last checkpoint.
▪ If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn,
Commit>, it puts the transaction in the redo-list.
▪ If the recovery system sees a log with <Tn, Start> but no commit or abort log found, it
puts the transaction in undo-list.
All the transactions in the undo-list are then undone and their logs are removed. All the
transactions in the redo-list and their previous logs are removed and then redone before
saving their logs.
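The redo-list and undo-list rules above can be sketched as one backward scan of the log. The record layout below (tuples such as `("T1", "start")` and `("checkpoint",)`) is an assumption made for illustration, as is the function name `classify_transactions`.

```python
def classify_transactions(log):
    """Scan the log backwards to the last checkpoint; return (redo_list, undo_list)."""
    last_checkpoint = -1
    for index, record in enumerate(log):
        if record[0] == "checkpoint":
            last_checkpoint = index

    started, committed, aborted = set(), set(), set()
    for record in reversed(log[last_checkpoint + 1:]):
        txn, kind = record[0], record[1]
        if kind == "start":
            started.add(txn)
        elif kind == "commit":
            committed.add(txn)
        elif kind == "abort":
            aborted.add(txn)
        # Update records such as ("T2", "Y", 10, 20) do not change the lists.

    redo_list = sorted(committed)
    undo_list = sorted(started - committed - aborted)
    return redo_list, undo_list


log = [
    ("T1", "start"), ("T1", "X", 100, 50), ("T1", "commit"),
    ("checkpoint",),
    ("T2", "start"), ("T2", "Y", 10, 20), ("T2", "commit"),
    ("T3", "start"), ("T3", "Z", 5, 6),
]
redo_list, undo_list = classify_transactions(log)
print(redo_list, undo_list)
```

T1 committed before the checkpoint and is not revisited, T2 is redone, and T3, which never committed, is undone.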
Summary
▪ Reasons for transaction failure are logical errors or system errors.
▪ Checkpoint is a mechanism where all the previous logs are removed from the system
and stored permanently in a storage disk.
Database Design
Development
Unit 2
Concurrency control transactions
and schedule
Topic 6
Nested Transactions
Session 6
Objectives
After completing this session, you will be able to understand :
▪ Distributed Transactions
▪ Flat Transactions
▪ Nested Transactions
Introduction to Transactions:
▪ A transaction is a series of object operations that must be done in an
ACID-compliant manner.
▪ Atomicity: The transaction is completed entirely or not at all.
▪ Consistency: The transaction takes the database from one consistent state to
another.
▪ Isolation: The transaction is carried out separately from other transactions.
▪ Durability: Once the transaction completes, its changes persist.
Transaction – Commands:
▪ Begin –
initiate a new transaction.
▪ Commit –
End a transaction and the changes made during the transaction are saved.
▪ Abort –
End a transaction and all changes made during the transaction will be undone.
How to run a transaction successfully?
▪ Client –
The transactions are issued by the clients.
▪ Coordinator –
The execution of the entire transaction is controlled by it (handles Begin, commit &
abort).
▪ Server –
Every component that accesses or modifies a resource is subject to transaction control.
The coordinator must be known by the transactional server. The transactional server
registers its participation in a transaction with the coordinator.
Distributed transaction:
▪ Definition: A flat or nested transaction that accesses objects handled by different servers
is referred to as a distributed transaction.
▪ When a distributed transaction reaches its end, in order to maintain the atomicity
property of the transaction, it is mandatory that all of the servers involved in the
transaction either commit the transaction or abort it.
▪ To do this, one of the servers takes on the job of coordinator, which entails ensuring that
the same outcome is achieved across all servers.
▪ The method by which the coordinator accomplishes this is determined by the protocol
selected. The most widely used protocol is the ‘two-phase commit protocol’, which enables
the servers to communicate whether to commit or abort the complete transaction.
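These slides name the two-phase commit protocol without listing its steps, so the following is only a minimal sketch of the usual voting idea: the coordinator asks every server whether it can commit, and commits only if all of them agree. `Participant` and `two_phase_commit` are hypothetical names, and failures and time-outs are ignored.

```python
class Participant:
    """A server taking part in the transaction."""

    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: the server votes on whether it is able to commit.
        return self.can_commit

    def finish(self, decision):
        # Phase 2: the server applies the coordinator's decision.
        self.state = decision


def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]           # phase 1: collect votes
    decision = "committed" if all(votes) else "aborted"   # unanimous yes is required
    for p in participants:                                # phase 2: broadcast decision
        p.finish(decision)
    return decision


all_ready = [Participant("X", True), Participant("Y", True)]
one_refuses = [Participant("X", True), Participant("Y", False)]
outcome_ok = two_phase_commit(all_ready)
outcome_bad = two_phase_commit(one_refuses)
print(outcome_ok, outcome_bad, [p.state for p in one_refuses])
```

Either way every server ends in the same state, which is the atomicity requirement stated above.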
Flat & Nested Distributed Transactions:
If a client transaction calls actions on multiple servers, it is said to be distributed. Distributed
transactions can be structured in two different ways:
1. Flat transactions
2. Nested transactions
1. Flat Transactions:
▪ A flat transaction has a single initiating point(Begin) and a single end point(Commit or
abort). They are usually very simple and are generally used for short activities rather
than larger ones.
▪ Before moving on to the next request, a flat client transaction completes the previous
one.
2. Nested Transactions:
▪ The top-level transaction in a nested transaction can open sub-transactions, and each
sub-transaction can open more sub-transactions down to any depth of nesting.
Nested Transaction:
▪ A client’s transaction T opens up two sub-transactions, T1 and T2, which access objects
on servers X and Y, as shown in the diagram below.
▪ T1.1, T1.2, T2.1, and T2.2, which access the objects on the servers M, N, and P, are
opened by the sub-transactions T1 and T2.
[Figure: nested transaction T and its sub-transactions accessing objects on servers]
▪ In the nested transaction strategy, sub-transactions at the same level can execute
concurrently.
▪ In the given diagram, T1 and T2 invoke objects on different servers, so they can run
in parallel and are therefore concurrent.
▪ T1.1, T1.2, T2.1, and T2.2 are four sub-transactions. These sub-transactions can also
run in parallel.
Example:
▪ Consider a distributed transaction (T) in which a customer transfers :
Account A is on server X
The transaction T involves four requests – 2 for deposits and 2 for withdrawals. Now they
can be treated as sub transactions (T1, T2, T3, T4) of the transaction T.
As shown in the figure below, transaction T is designed as a set of four nested transactions:
T1, T2, T3 and T4.
Advantage:
The performance is higher than a single transaction in which four operations are
invoked one after the other in sequence.
So, the transaction T may be divided into sub-transactions as:
T = openTransaction
    openSubtransaction      // T1
        a.withdraw(105);
    openSubtransaction      // T2
        b.withdraw(205);
    openSubtransaction      // T3
        c.deposit(105);
    openSubtransaction      // T4
        d.deposit(205);
closeTransaction
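The parallel execution of the four sub-transactions can be imitated with a thread pool. This is a sketch under simplifying assumptions that are not in the slides: each account appears in only one sub-transaction, the opening balances are invented, and the top-level transaction commits only if every sub-transaction succeeds. `run_nested` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor


def run_nested(balances, steps):
    """Run each (account, delta) step as a sub-transaction; commit all or nothing."""

    def sub_transaction(step):
        account, delta = step
        # A withdrawal succeeds only if the balance stays non-negative.
        return account, delta, balances[account] + delta >= 0

    # Sub-transactions at the same level run in parallel.
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        results = list(pool.map(sub_transaction, steps))

    if not all(ok for _, _, ok in results):
        return dict(balances), "aborted"   # nothing is applied

    updated = dict(balances)
    for account, delta, _ in results:
        updated[account] += delta
    return updated, "committed"


steps = [("a", -105), ("b", -205), ("c", 105), ("d", 205)]  # T1, T2, T3, T4
final, status = run_nested({"a": 500, "b": 500, "c": 0, "d": 0}, steps)
unchanged, failed_status = run_nested({"a": 100, "b": 500, "c": 0, "d": 0}, steps)
print(final, status, failed_status)
```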
Summary
▪ A transaction is a series of object operations that must be done in an ACID-compliant
manner.
▪ A flat or nested transaction that accesses objects handled by different servers is referred
to as a distributed transaction.