
Java Merged

Java programming practical manual. It will help beginners to learn Java from the beginning.

Uploaded by

Nura Muhammad
Copyright
© All Rights Reserved

Database Design

Development
Unit 1
Introduction to DBMS
Topic 2
File Organization
Session 2
Objectives
After completing this session, you will be able to understand:

▪ File organisation.

▪ The concepts of hashed file, heap file and B+ tree file organisation.


File Organisation
Types of File Organizations
Various methods of organizing files were introduced in the last presentation. Some of the
types are:
▪ Sequential File Organization
▪ Heap File Organization
▪ Hash File Organization
▪ B+ Tree File Organization
▪ Clustered File Organization
Revisiting sequential and heap files:
1. Sequential File Organization –

The simplest method of file organization is the sequential method. In this method, records are stored
one after another in a sequential manner. There are two ways to implement it: the Pile File
method and the Sorted File method.

2. Heap File Organization –

Heap file organization works with data blocks. In this method, records are inserted at the end of the
file, into the data blocks. No sorting or ordering is required. If a data block is full, the
new record is stored in some other block. This other block need not be the very next data
block; it can be any block in the memory. It is the responsibility of the DBMS to store and manage
the new records.
Hashed File Organisation
3. Hashed File Organisation
▪ Hashed file organisation is also called a direct file organisation.

▪ In this method, a hash function is applied to the record to compute the
address of the block in which the record is stored. Any mathematical function can be used
as a hash function, simple or complex.

▪ The hash function is applied to one or more columns (attributes) to get the block address. Because
records are placed by hash value rather than in key order, the method is also known as direct or random file organization.

▪ If the hash function is applied to a key column, that
column is called the hash key; if it is applied to a
non-key column, that column is called the hash column.
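The idea of computing a block address from a column value can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the number of blocks, the modulo hash function and the sample records are all hypothetical and are not taken from the slides.

```python
NUM_BLOCKS = 5  # assumed number of data blocks in the file


def block_address(key, num_blocks=NUM_BLOCKS):
    """Hash function: map a numeric hash key to a block number (key mod num_blocks)."""
    return key % num_blocks


def insert(blocks, record, key_field="id"):
    """Store the record in the block chosen by the hash function."""
    blocks[block_address(record[key_field], len(blocks))].append(record)


def lookup(blocks, key, key_field="id"):
    """Read only the one block the hash function points to."""
    block = blocks[block_address(key, len(blocks))]
    return [r for r in block if r[key_field] == key]


blocks = [[] for _ in range(NUM_BLOCKS)]
for emp_id, name in [(14, "John"), (20, "Harry"), (12, "Sam")]:
    insert(blocks, {"id": emp_id, "name": name})
```

A lookup such as `lookup(blocks, 20)` inspects a single block instead of scanning the whole file, which is the main attraction of this organization.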
Differences
File Management System | Database Management System
Small system. | Large system.
Relatively cheap. | Relatively expensive.
Few files. | Many files.
Needs an individual application program to perform any operation on data files. | Any operation can be performed on data files using a single command.
Transaction management is difficult. | Transaction management is easy.
Programming is done using 3GLs such as COBOL, C and Pascal. | Programming is done using SQL, which is a 4GL.
Simple structure. | Complex structure.
No security. | Rigorous security.
Simple, primitive backup or recovery. | Complex and sophisticated backup or recovery.
Single user. | Multiple users.
Duplication of data cannot be minimized. | Duplication of data can be minimized.
Data consistency is lower. | Data consistency is higher because of normalization.
It stores unstructured data. | It is used for storing structured data.



B Tree:
The next type is B+ tree file organization, but its prerequisites are the B tree and the B+ tree. Let's
first understand what a B tree is.

▪ A B tree is a specialized version of an m-way tree.

▪ An m-way tree is a tree in which each node can have at most m children. A binary tree, in which
each node can have at most 2 children, is an m-way tree with m = 2. A B
tree of order m has all the properties of an m-way tree. In addition, it has the
following properties:
• Every node in a B tree contains at most m children.
• Every node in a B tree, except the root node and the leaf nodes, contains at least ⌈m/2⌉
children.
• The root node must have at least 2 children, unless it is a leaf.
• All leaf nodes must be at the same level.
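The child-count rules above can be written down as a small check. This is a minimal sketch, not a B tree implementation; the function names are hypothetical, and it only covers non-leaf nodes.

```python
import math


def child_bounds(order, is_root=False):
    """Return (min_children, max_children) for a non-leaf node of a B tree of the given order."""
    if order < 3:
        raise ValueError("this sketch assumes an order of at least 3")
    min_children = 2 if is_root else math.ceil(order / 2)
    return min_children, order


def is_valid_internal_node(num_children, order, is_root=False):
    """Check a node's child count against the B tree properties."""
    low, high = child_bounds(order, is_root)
    return low <= num_children <= high
```

For order 5, for instance, an ordinary internal node needs between 3 and 5 children, while a non-leaf root may have as few as 2.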
B Tree:
B+ Tree:
▪ B+ Tree is an extension of B Tree which allows efficient insertion, deletion and search
operations.

▪ In B Tree, Keys and records both can be stored in the internal as well as leaf nodes.
Whereas, in B+ tree, records (data) can only be stored on the leaf nodes while internal
nodes can only store the key values.

▪ The leaf nodes of a B+ tree are linked together in the form of a singly linked list to make
sequential and range search queries more efficient.
B+ trees are used to store large amounts of data that cannot fit in main
memory. Because the size of main memory is always limited, the internal nodes
(keys used to access records) of the B+ tree are stored in main memory, whereas the leaf nodes
are stored in secondary memory.

The internal nodes of a B+ tree are often called index nodes. A B+ tree of order 3 is shown in
the following figure.
B+ Tree:
B+ File Organisation
4. B+ File Organization:
▪ B+ tree file organization is an advanced form of the indexed sequential access
method. It uses a tree-like structure to store records in a file.

▪ It uses the same key-index concept, in which the primary key is used to sort the records.
For each primary key, an index value is generated and mapped to the record.

▪ The B+ tree is similar to a binary search tree (BST), but a node can have more than two
children. In this method, all records are stored only at the leaf nodes. Intermediate
nodes act as pointers to the leaf nodes. They do not contain any records.
The given B+ tree shows that:
▪ There is one root node of the tree, i.e., 25.
▪ There is an intermediary layer of nodes. They do not store the actual records. They
have only pointers to the leaf nodes.
▪ The node to the left of the root holds a value smaller than the root and the node to
the right holds a larger value, i.e., 15 and 30 respectively.
▪ There is only one leaf level, which holds the values 10, 12, 17, 20, 24, 27 and 29.
▪ Searching for any record is easier because all the leaf nodes are at the same depth.
▪ In this method, any record can be reached by traversing a single path from the root
and accessed easily.
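The single-path search and the linked leaves can be illustrated with a small sketch. The tree below is an assumption made for the example: it reuses the leaf values 10 to 29 from the description, but the split into three leaves and the separator keys 17 and 27 are hypothetical and do not reproduce the figure. Insertion and node splitting are left out.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys   # record keys stored in this leaf
        self.next = None   # link to the next leaf (singly linked list)


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys only, no records
        self.children = children  # always len(keys) + 1 children


def find_leaf(node, key):
    """Follow one path from the root down to the leaf that may hold the key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    return key in find_leaf(root, key).keys


def range_scan(root, low, high):
    """Find the first leaf, then walk the leaf links until a key exceeds high."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


leaves = [Leaf([10, 12]), Leaf([17, 20, 24]), Leaf([27, 29])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([17, 27], leaves)
```

The range scan never goes back up to the internal nodes after the first descent, which is why linking the leaves helps with sequential access.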
Cluster File Organisation
5. Cluster file organization:
▪ When records from two or more tables are stored in the same file, the file is known as a cluster. These
files hold two or more tables in the same data block, and the key attributes that are
used to map these tables together are stored only once.

▪ This method reduces the cost of searching for related records in different files.

▪ Cluster file organization is used when there is a frequent need to join the tables
on the same condition. These joins return only a few records from both tables. In the
given example, we are retrieving the records for only particular departments.

▪ This method can't be used to retrieve the record for the entire department.
In this method, we can directly insert, update or delete any record. Data is sorted
based on the key with which searching is done. The cluster key is the key on which
the tables are joined.
Types of Cluster file organization:
Cluster file organization is of two types:

1. Indexed Clusters:

In an indexed cluster, records are grouped based on the cluster key and stored together. The
above EMPLOYEE and DEPARTMENT relationship is an example of an indexed cluster.
Here, all the records are grouped based on the cluster key, DEP_ID.

2. Hash Clusters:

It is similar to the indexed cluster. In a hash cluster, instead of storing the records directly by
the cluster key, we compute a hash value of the cluster key and store together the
records with the same hash value.
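The difference between the two cluster types can be sketched as follows. The rows, the column names other than DEP_ID, and the modulo hash with two buckets are all hypothetical; the sketch groups rows in memory and does not model disk blocks.

```python
from collections import defaultdict

# Hypothetical sample rows; only the DEP_ID cluster key comes from the text.
departments = [
    {"DEP_ID": 10, "DEP_NAME": "Sales"},
    {"DEP_ID": 11, "DEP_NAME": "Accounts"},
]
employees = [
    {"EMP_ID": 1, "EMP_NAME": "John", "DEP_ID": 10},
    {"EMP_ID": 2, "EMP_NAME": "Harry", "DEP_ID": 11},
    {"EMP_ID": 3, "EMP_NAME": "Sam", "DEP_ID": 10},
]


def build_indexed_cluster(departments, employees):
    """Group rows of both tables under the cluster key itself."""
    clusters = {}
    for dept in departments:
        clusters[dept["DEP_ID"]] = {"department": dept, "employees": []}
    for emp in employees:
        clusters[emp["DEP_ID"]]["employees"].append(emp)
    return clusters


def build_hash_cluster(departments, employees, num_buckets=2):
    """Group rows of both tables under a hash of the cluster key."""
    buckets = defaultdict(list)
    for row in departments + employees:
        buckets[row["DEP_ID"] % num_buckets].append(row)
    return buckets


indexed = build_indexed_cluster(departments, employees)
hashed = build_hash_cluster(departments, employees)
```

In both cases a join on DEP_ID only has to read one group, because the matching department and employee rows already sit together.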
Advantages & Disadvantages of Clustered File
Organization:
Advantages of Clustered File Organization:

This method is best suited when there are frequent requests to join the tables on the same
joining condition.

When there is a 1:M mapping between the tables, it performs efficiently.

Disadvantages of Clustered File Organization:

This method is not suitable for very large databases, since its performance
on them is low.

We cannot use these clusters if the joining condition changes. If the joining
condition changes, traversing the file takes a lot of time.

This method is not suitable for less frequently joined tables or tables with 1:1 conditions.
Summary
▪ File Organization refers to the logical relationships among various records that constitute
the file.

▪ There are five methods of file organisation.

▪ Hashed file organisation is also called a direct file organisation and it is implemented with
hash function.

▪ Cluster file organization can be implemented with indexed clusters and hash clusters.

▪ Sequential file organisation is the simplest of these methods.


Additional Resources
▪ “Distributed Database Systems”, by Patrick Valduriez, Pearson Publication.
Database Design
Development
Unit 1
Introduction to DBMS
Topic 1
Review of Database Concepts
Session 1
Objectives
After completing this session, you will be able to understand:

▪ Database and its uses.

▪ Characteristics of DBMS.

▪ Types of DBMS.

▪ File organisation.
What is a Database?
Database Definition
▪ A database is a collection of related data which represents some aspect of the real
world.
▪ A database system is designed to be built and populated with data for a certain
task.
▪ Databases are used for storing, maintaining and accessing any sort of data. They
collect information on people, places or things. That information is gathered in one
place so that it can be observed and analyzed. Databases can be thought of as an
organized collection of information.
What are databases used for?
Businesses use data stored in databases to make informed business decisions. Some of the
ways organizations use databases include the following:
▪ Improve business processes. Companies collect data about business processes, such as
sales, order processing and customer service. They analyze that data to improve these
processes, expand their business and grow revenue.
▪ Keep track of customers. Databases often store information about people, such as
customers or users. For example, social media platforms use databases to store user
information, such as names, email addresses and user behavior.
▪ Secure personal health information. Healthcare providers use databases to securely
store personal health data to inform and improve patient care.
▪ Store personal data. Databases can also be used to store personal information. For
example, personal cloud storage is available for individual users to store media, such as
photos, in a managed cloud.
Database Management System (DBMS)
▪ Database Management System (DBMS) is software for storing and retrieving users’
data while considering appropriate security measures. It consists of a group of programs
which manipulate the database.

▪ The DBMS accepts the request for data from an application and instructs the operating
system to provide the specific data. In large systems, a DBMS helps users and other
third-party software to store and retrieve data.

▪ DBMS allows users to create their own databases as per their requirement. The term
“DBMS” includes the user of the database and other application programs. It provides an
interface between the data and the software application.
Characteristics of DBMS
Characteristics and properties of DBMS:
Here are the characteristics and properties of Database Management System:

▪ Provides security and removes redundancy.

▪ Self-describing nature of a database system.

▪ Insulation between programs and data, and data abstraction.

▪ Support of multiple views of the data.

▪ Sharing of data and multiuser transaction processing.

▪ Database Management Software allows entities and relations among them to form tables.

▪ It follows the ACID concept (Atomicity, Consistency, Isolation, and Durability).

▪ A DBMS supports a multi-user environment that allows users to access and manipulate data in
parallel.
Users of DBMS

User | Role
Application Programmers | Application programmers write programs in various programming languages to interact with databases.
Database Administrators | The database administrator is responsible for managing the entire DBMS. This person is called the database admin or DBA.
End-Users | End users are the people who interact with the database management system. They conduct various operations on the database, such as retrieving, updating and deleting data.
Application of DBMS
Sector | Use of DBMS
Banking | For customer information, account activities, payments, deposits, loans, etc.
Airlines | For reservations and schedule information.
Universities | For student information, course registrations, colleges and grades.
Telecommunication | For keeping call records, monthly bills, maintaining balances, etc.
Finance | For storing information about stock, sales, and purchases of financial instruments like stocks and bonds.
Sales | For storing customer, product and sales information.
Manufacturing | For managing the supply chain and tracking the production of items and the status of inventories in warehouses.
HR Management | For information about employees, salaries, payroll, deductions, generation of paychecks, etc.
Types of DBMS
The four main types of database management system are:
1. Hierarchical database
2. Network database
3. Relational database
4. Object-Oriented database
1. Hierarchical Database

In a hierarchical database model, data is organized in a tree-like structure. Data is stored
in a hierarchical (top-down or bottom-up) format and is represented using a parent-child
relationship. In a hierarchical DBMS, a parent may have many children, but each child has only
one parent.

2. Network Database

The network database model allows each child to have multiple parents. It helps you to
address the need to model more complex relationships, such as the orders/parts
many-to-many relationship. In this model, entities are organized in a graph which can be
accessed through several paths.
3. Relational Database

Relational DBMS is the most widely used DBMS model because it is one of the easiest.
This model is based on normalizing data in the rows and columns of tables. Data in the relational
model is stored in fixed structures and manipulated using SQL.

4. Object-Oriented Database

In the object-oriented model, data is stored in the form of objects. The structures, which are called
classes, hold the data within them. The model defines a
database as a collection of objects that store both the values of data members and the operations on them.
File Organization in DBMS
File:
▪ A file is a named collection of related information that is recorded on secondary storage
such as magnetic disks, magnetic tapes and optical disks.

▪ A database consists of a huge amount of data. In an RDBMS, the data is grouped within tables,
and each table has related records. A user sees the data as being stored in
the form of tables, but in reality this huge amount of data is stored in physical memory in
the form of files.
What is File Organization?
▪ File organization refers to the logical relationships among the various records that constitute
the file, particularly with respect to the means of identification and access to any specific
record. In simple terms, storing the records of a file in a certain order is called file organization.

▪ File Structure refers to the format of the label and data blocks and of any logical control
record.

Types of File Organizations :
Various methods have been introduced to organize files. Each method has
advantages and disadvantages with respect to access or selection. It is therefore up to the
programmer to decide on the file organization method best suited to their requirements.
Some types of file organization are:

▪ Sequential File Organization

▪ Heap File Organization

▪ Hash File Organization

▪ B+ Tree File Organization

▪ Clustered File Organization

1. Sequential File Organization –
The simplest method of file organization is the sequential method. In this method, records are
stored one after another in a sequential manner. There are two ways to implement this
method:

i. Pile File Method

ii. Sorted File Method

i. Pile File Method:
This method is quite simple: we store the records in a sequence, i.e.
one after another, in the order in which they are
inserted into the table.

Insertion of new record –

Let R1, R3 and so on up to R5 and R4 be
four records in the sequence. Here, a record is
nothing but a row in a table. Suppose a new
record R2 has to be inserted in the sequence;
it is simply placed at the end of the file.
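Pile file insertion amounts to an append. In the minimal sketch below, the file is modelled as a Python list and the four existing records are assumed to be R1, R3, R5 and R4, since the text does not name the ones in between.

```python
def pile_insert(file_records, record):
    """Pile file method: a new record always goes to the end of the file."""
    file_records.append(record)
    return file_records


pile = ["R1", "R3", "R5", "R4"]  # assumed existing records, in insertion order
pile_insert(pile, "R2")
```

After the call, R2 sits behind R4 even though its number is smaller, because no ordering is maintained.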

ii. Sorted File Method
In this method, as the name suggests, whenever a new record has to be inserted, it is
placed so that the file stays sorted (in ascending or descending order). Sorting of records may be
based on the primary key or on any other key.

Insertion of new record –

Let us assume that there is a pre-existing sorted sequence of four records R1, R3 and so on up to R7
and R8. Suppose a new record R2 has to be inserted in the sequence; it is first inserted at the
end of the file, and then the sequence is sorted again.
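The sorted file method can be sketched in two ways. The first function follows the description literally (append, then sort again); the second uses Python's standard `bisect.insort` to place the record directly at its sorted position, which is a swapped-in alternative and not something the slides describe. The four existing records are assumed to be R1, R3, R7 and R8.

```python
import bisect


def sorted_insert_append_then_sort(file_records, record):
    """As described in the text: add at the end, then re-sort the whole file."""
    file_records.append(record)
    file_records.sort()
    return file_records


def sorted_insert_in_place(file_records, record):
    """Alternative: find the sorted position and insert there directly."""
    bisect.insort(file_records, record)
    return file_records


file_a = ["R1", "R3", "R7", "R8"]  # assumed existing sorted records
file_b = list(file_a)
sorted_insert_append_then_sort(file_a, "R2")
sorted_insert_in_place(file_b, "R2")
```

Both variants leave the file in the same order; they differ only in how much of the file has to be touched per insertion.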

Heap File Organization:
▪ Heap file organization is the simplest and most basic type of organization. It works with data
blocks. In this method, records are inserted at the end of the file, into the data blocks.

▪ No sorting or ordering of records is required. When a data block is full, the new record is
stored in some other block. This other block need not be the very next data
block; it can be any data block in the memory. The heap file is also known as an unordered file.

▪ In the file, every record has a unique id, and every page in a file is of the same size.

▪ It is the responsibility of the DBMS to store and manage the new records.
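The block-placement rule for a heap file can be sketched as below. The block capacity of two records and the policy of taking the first block with free space are assumptions made for the example; a real DBMS chooses blocks in its own way.

```python
BLOCK_CAPACITY = 2  # assumed number of records per data block


def heap_insert(blocks, record, capacity=BLOCK_CAPACITY):
    """Insert a record and return the index of the block that received it.

    Tries the last block first, then any other block with free space,
    and allocates a new block only when every block is full.
    """
    if blocks and len(blocks[-1]) < capacity:
        blocks[-1].append(record)
        return len(blocks) - 1
    for index, block in enumerate(blocks):
        if len(block) < capacity:
            block.append(record)
            return index
    blocks.append([record])
    return len(blocks) - 1


def heap_search(blocks, record):
    """Records are unordered, so a search has to scan block by block."""
    for index, block in enumerate(blocks):
        if record in block:
            return index
    return None


heap = [["R1", "R2"], ["R3"], ["R4", "R5"]]
first = heap_insert(heap, "R6")   # last block is full, block 1 has room
second = heap_insert(heap, "R7")  # every block is full, a new one is added
```

Insertion is cheap, but `heap_search` shows the cost on the other side: without any ordering, finding a record means scanning the blocks.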
Summary
▪ A database is a collection of related data which represents some aspect of the real
world.

▪ DBMS allows users to create their own databases as per their requirement. The term
“DBMS” includes the user of the database and other application programs.

▪ Relational DBMS is the most widely used DBMS model because it is one of the easiest.

▪ File Organization refers to the logical relationships among various records that constitute
the file.
Database Design
Development
Unit 1
Introduction to DBMS
Topic 3
Normalization
Session 3
Objectives
After completing this session, you will be able to understand:

▪ Normalization and its types.

▪ Concept of 1st, 2nd, 3rd and 4th Normal Form.


Normalization
What is Normalization?
▪ Normalization is the process of organizing the data in the database.
▪ Normalization is used to minimize redundancy in a relation or set of relations. It is
also used to eliminate undesirable characteristics such as insertion, update and deletion
anomalies.
▪ Normalization divides a larger table into smaller tables and links them using
relationships.
▪ The normal forms are used to reduce redundancy in database tables.
Types of Normal Forms:
The following normal forms are covered in this session: 1NF, 2NF, 3NF, 4NF and 5NF.
Keys in Database Table:
▪ A candidate key is a single attribute or a group of attributes that uniquely identifies rows in a table.

▪ A primary key is a field in a table which uniquely identifies each row/record in a database table.
Primary keys must contain unique values. A primary key column cannot have NULL values.

▪ A secondary key is a candidate key that has not been selected as the primary key. It remains
an attribute or set of attributes that could have served as the primary key.

Note: A secondary key is not a foreign key.


Example:

Student_ID  Student_Enroll  Student_Name  Student_Age  Student_Email
096         9122717         Manish        25           aaa@gmail.com
055         9122655         Manan         23           abc@gmail.com
067         9122699         Shreyas       28           pqr@gmail.com

Above, Student_ID, Student_Enroll and Student_Email are the candidate
keys. They are considered candidate keys since each can uniquely identify the
student record. Select any one of the candidate keys as the primary key. The
other two keys are then secondary keys.
Let's say you selected Student_ID as the primary key;
then Student_Enroll and Student_Email are secondary keys (candidates for the primary
key).

A foreign key creates a link between tables. It references the primary key of another table.
For example, the DeptID in the Employee table is a foreign key:
<Employee>

EmpID EmpName EmpAge DeptID

<Department>

DeptID DeptName DeptZone


▪ The DeptID in the Department table is a Primary Key in the Department Table.

▪ The DeptID in the Employee table is a Foreign Key in the Employee Table.

▪ Unique Key: Many users consider a primary key to be the same as a unique key, since both uniquely
identify rows in a table, but a unique key is different from a primary key. A unique key accepts null
values, whereas a primary key cannot be null.
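The behaviour of primary, unique and foreign keys can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the Employee and Department columns follow the schema above, but the EmpEmail column, the sample rows and the `try_insert` helper are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, and that other database systems may treat NULLs in unique columns differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Department (
    DeptID   INTEGER PRIMARY KEY,
    DeptName TEXT NOT NULL,
    DeptZone TEXT
);
CREATE TABLE Employee (
    EmpID    INTEGER PRIMARY KEY,
    EmpName  TEXT NOT NULL,
    EmpAge   INTEGER,
    EmpEmail TEXT UNIQUE,                              -- hypothetical unique key
    DeptID   INTEGER REFERENCES Department(DeptID)     -- foreign key
);
""")


def try_insert(sql, params):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


EMP_SQL = "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)"
ok_dept = try_insert("INSERT INTO Department VALUES (?, ?, ?)", (1, "Sales", "North"))
ok_emp = try_insert(EMP_SQL, (100, "Asha", 30, None, 1))
ok_second_null = try_insert(EMP_SQL, (101, "Ravi", 28, None, 1))      # unique key accepts NULL
bad_foreign_key = try_insert(EMP_SQL, (102, "Mina", 35, "m@example.com", 99))
bad_primary_key = try_insert(EMP_SQL, (100, "Dev", 40, "d@example.com", 1))
```

The last two inserts are rejected: one refers to a department that does not exist, the other repeats an existing primary key value.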
Normal Forms
Normal Forms:

1 NF First Normal Form

2 NF Second Normal Form

3 NF Third Normal Form

BCNF Boyce-Codd Normal Form (a stricter form of 3NF)

4 NF Fourth Normal Form

5 NF Fifth Normal Form / Project-join normal form


First Normal Form (1NF)
▪ A relation is in 1NF if every attribute contains only atomic values. (An atomic value is a value that
cannot be divided.)

▪ It states that an attribute of a table cannot hold multiple values. It must hold only
a single value.

▪ First normal form disallows multi-valued attributes, composite attributes, and their
combinations.
Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.
EMPLOYEE table:

EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab

The decomposition of the EMPLOYEE table into 1NF is shown below:

EMP_ID  EMP_NAME  EMP_PHONE   EMP_STATE
14      John      7272826385  UP
14      John      9064738238  UP
20      Harry     8574783832  Bihar
12      Sam       7390372389  Punjab
12      Sam       8589830302  Punjab
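The conversion of the EMPLOYEE table to 1NF can be sketched as a small function that emits one row per phone number. Representing the multi-valued EMP_PHONE as a comma-separated string is an assumption made for the example.

```python
employees = [
    (14, "John", "7272826385, 9064738238", "UP"),
    (20, "Harry", "8574783832", "Bihar"),
    (12, "Sam", "7390372389, 8589830302", "Punjab"),
]


def to_1nf(rows):
    """Split the multi-valued phone column so that each row holds one atomic value."""
    result = []
    for emp_id, name, phones, state in rows:
        for phone in phones.split(","):
            result.append((emp_id, name, phone.strip(), state))
    return result


employees_1nf = to_1nf(employees)
```

The three original rows become five, one for each (employee, phone) pair.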
Second Normal Form (2NF):
• In 2NF, the relation must be in 1NF.
• In the second normal form, all non-key attributes are fully functionally dependent on the primary
key.
Example: Let's assume a school stores the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.
TEACHER table: In the given table, the non-prime attribute TEACHER_AGE depends on
TEACHER_ID, which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. That's why it
violates the rule for 2NF.
TEACHER_ID  SUBJECT    TEACHER_AGE
25          Chemistry  30
25          Biology    30
47          English    35
83          Math       38
83          Computer   38
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID  TEACHER_AGE
25          30
47          35
83          38
TEACHER_SUBJECT table:
TEACHER_ID  SUBJECT
25          Chemistry
25          Biology
47          English
83          Math
83          Computer
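The 2NF decomposition is a pair of projections with duplicates removed. A minimal sketch, with a hypothetical `project` helper that keeps the first occurrence of each projected row:

```python
COLUMNS = ("TEACHER_ID", "SUBJECT", "TEACHER_AGE")
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]


def project(rows, columns, keep):
    """Keep only the named columns and drop duplicate rows, preserving order."""
    indexes = [columns.index(name) for name in keep]
    seen = []
    for row in rows:
        projected = tuple(row[i] for i in indexes)
        if projected not in seen:
            seen.append(projected)
    return seen


teacher_detail = project(teacher, COLUMNS, ("TEACHER_ID", "TEACHER_AGE"))
teacher_subject = project(teacher, COLUMNS, ("TEACHER_ID", "SUBJECT"))
```

Each teacher's age now appears once in `teacher_detail`, instead of once per subject taught.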
Third Normal Form (3NF):
▪ A relation is in 3NF if it is in 2NF and does not contain any transitive dependency of a
non-prime attribute on a key.

▪ 3NF is used to reduce data duplication. It is also used to achieve data integrity.

▪ If a relation in 2NF has no transitive dependency for non-prime attributes, then the relation is in
third normal form.

▪ A relation is in third normal form if it holds at least one of the following conditions for
every non-trivial functional dependency X → Y:

X is a super key.

Y is a prime attribute, i.e., each element of Y is part of some candidate key.


Example:
EMPLOYEE_DETAIL table:

EMP_ID  EMP_NAME   EMP_ZIP  EMP_STATE  EMP_CITY
222     Harry      201010   UP         Noida
333     Stephan    02228    US         Boston
444     Lan        60007    US         Chicago
555     Katharine  06389    UK         Norwich
666     John       462007   MP         Bhopal
Super keys in the given table:
• {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on
Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the
super key (EMP_ID). This violates the rule of third normal form.

That's why we need to move EMP_CITY and EMP_STATE to a new
<EMPLOYEE_ZIP> table, with EMP_ZIP as its primary key.
EMPLOYEE table:
EMP_ID  EMP_NAME   EMP_ZIP
222     Harry      201010
333     Stephan    02228
444     Lan        60007
555     Katharine  06389
666     John       462007

EMPLOYEE_ZIP table:
EMP_ZIP  EMP_STATE  EMP_CITY
201010   UP         Noida
02228    US         Boston
60007    US         Chicago
06389    UK         Norwich
462007   MP         Bhopal
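The dependency chain EMP_ID → EMP_ZIP → (EMP_STATE, EMP_CITY) can be checked against the sample rows with a small helper. The function name `fd_holds` is hypothetical, and a check on sample data can only show that a dependency is not contradicted by those rows; it cannot prove that the dependency holds in general.

```python
COLUMNS = ("EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY")
DATA = [
    (222, "Harry", "201010", "UP", "Noida"),
    (333, "Stephan", "02228", "US", "Boston"),
    (444, "Lan", "60007", "US", "Chicago"),
    (555, "Katharine", "06389", "UK", "Norwich"),
    (666, "John", "462007", "MP", "Bhopal"),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]


def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on lhs but differ on rhs, True otherwise."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


id_to_zip = fd_holds(rows, ["EMP_ID"], ["EMP_ZIP"])
zip_to_place = fd_holds(rows, ["EMP_ZIP"], ["EMP_STATE", "EMP_CITY"])
state_to_city = fd_holds(rows, ["EMP_STATE"], ["EMP_CITY"])
```

The first two checks pass on this data, while EMP_STATE → EMP_CITY fails because the state US appears with both Boston and Chicago.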
Fourth normal form (4NF):

• A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
• For a dependency A → B, if multiple values of B exist for a single value of A,
then the dependency is a multi-valued dependency.
Example: STUDENT Table
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
▪ The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities. Hence, there is no relationship between COURSE and HOBBY.

▪ In the STUDENT relation, the student with STU_ID 21 has two
courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a
multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.

▪ So, to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID  COURSE
21      Computer
21      Math
34      Chemistry
74      Biology
59      Physics

STUDENT_HOBBY
STU_ID  HOBBY
21      Dancing
21      Singing
34      Dancing
74      Cricket
59      Hockey
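The 4NF decomposition and what it means for independent attributes can be sketched with set comprehensions. The variable names are hypothetical. Joining the two projections back on STU_ID pairs every course of a student with every hobby of that student, which is exactly what independence between COURSE and HOBBY implies.

```python
student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

# Project onto (STU_ID, COURSE) and (STU_ID, HOBBY), removing duplicates.
student_course = sorted({(stu, course) for stu, course, _ in student})
student_hobby = sorted({(stu, hobby) for stu, _, hobby in student})

# Join the two projections back together on STU_ID.
rejoined = sorted(
    (stu, course, hobby)
    for stu, course in student_course
    for stu2, hobby in student_hobby
    if stu == stu2
)
```

On this sample the join yields four rows for student 21 (two courses times two hobbies), whereas the STUDENT table as printed lists only two of those combinations. Storing the two facts separately avoids having to keep all combinations in one table.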
Fifth normal form (5NF):
A relation is in 5NF if it is in 4NF, does not contain any join dependency, and the join of its
decomposition is lossless.

5NF is satisfied when the tables are broken into as many tables as possible in order to
avoid redundancy.

5NF is also known as project-join normal form (PJ/NF).


Example:
SUBJECT    LECTURER  SEMESTER
Computer   Anshika   Semester 1
Computer   John      Semester 1
Math       John      Semester 1
Math       Akash     Semester 2
Chemistry  Praveen   Semester 1

▪ In the above table, John takes both the Computer and Math classes for Semester 1, but he
doesn't take the Math class for Semester 2. In this case, a combination of all these fields is
required to identify valid data.
▪ Suppose we add a new semester, Semester 3, but do not yet know the subject
or who will be teaching it, so we would leave LECTURER and SUBJECT as NULL. But
all three columns together act as the primary key, so we can't leave the other two
columns blank.
So, to bring the above table into 5NF, we can decompose it into three relations P1, P2 and P3:

P1:
SEMESTER    SUBJECT
Semester 1  Computer
Semester 1  Math
Semester 1  Chemistry
Semester 2  Math

P2:
SUBJECT    LECTURER
Computer   Anshika
Computer   John
Math       John
Math       Akash
Chemistry  Praveen

P3:
SEMESTER    LECTURER
Semester 1  Anshika
Semester 1  John
Semester 2  Akash
Semester 1  Praveen
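Whether the three projections reproduce the original table can be checked directly. The sketch below rebuilds P1, P2 and P3 from the example rows as Python sets (so duplicate rows disappear) and joins them; the variable names are hypothetical.

```python
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

p1 = {(sem, sub) for sub, lec, sem in original}  # SEMESTER, SUBJECT
p2 = {(sub, lec) for sub, lec, sem in original}  # SUBJECT, LECTURER
p3 = {(sem, lec) for sub, lec, sem in original}  # SEMESTER, LECTURER

# Joining only P1 and P2 on SUBJECT produces rows that were never in the table.
join_p1_p2 = {
    (sub, lec, sem)
    for sem, sub in p1
    for sub2, lec in p2
    if sub == sub2
}
spurious = join_p1_p2 - original

# Joining with P3 as well filters those rows out again.
join_all = {row for row in join_p1_p2 if (row[2], row[1]) in p3}
```

With only two of the projections, the join invents two rows (John teaching Math in Semester 2 and Akash teaching Math in Semester 1). All three projections together give back exactly the original five rows, so the three-way join is lossless on this data.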
Overview of Normal Forms:

Normal Form | Description
1 NF | A relation is in 1NF if every attribute contains only atomic values.
2 NF | A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3 NF | A relation is in 3NF if it is in 2NF and no transitive dependency exists.
4 NF | A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
5 NF | A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.
Summary
▪ Normalization is used to minimize redundancy in a relation or set of relations. It is also
used to eliminate undesirable characteristics such as insertion, update and deletion anomalies.

▪ A relation is in 1NF if every attribute contains only atomic values.

▪ A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the
primary key.

▪ A relation is in 3NF if it is in 2NF and no transitive dependency exists.

▪ A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.

▪ A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.
Database Design
Development
Unit 1
Introduction to DBMS
Topic 4
Entity Relationship Model
Session 4
Objectives
After completing this session, you will be able to understand:

▪ Entity Relationship Model.

▪ ER Diagram.

▪ Attributes in the ER Diagram.

▪ Different Entity Relationship.


ER Diagram
ER model:
▪ ER model stands for Entity-Relationship model. It is a high-level data model.
This model is used to define the data elements and relationships for a specified
system.

▪ It develops a conceptual design for the database. It also provides a very simple and
easy-to-design view of data.

▪ In ER modeling, the database structure is portrayed as a diagram called an
entity-relationship diagram.
Example of ER Diagram:
▪ Suppose we design a school database. In this database, the student will be an entity
with attributes like address, name, id, age, etc.

▪ The address can be another entity with attributes like city, street name, pin code, etc and
there will be a relationship between them.
Component of ER Diagram:
1. Entity:
▪ An entity may be any object, class, person or place. In the ER diagram, an entity is
represented as a rectangle.

▪ Consider an organization as an example: manager, product, employee, department, etc.
can be taken as entities.
a. Weak Entity:
An entity that depends on another entity is called a weak entity. The weak entity doesn't
contain any key attribute of its own. The weak entity is represented by a double rectangle.
2. Attribute:
▪ An attribute is used to describe a property of an entity. An ellipse is used to represent an
attribute.

▪ For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute:
▪ The key attribute is used to represent the main characteristic of an entity.

▪ It represents a primary key.

▪ The key attribute is represented by an ellipse with the text underlined.

b. Composite Attribute:
▪ An attribute that is composed of many other attributes is known as a composite attribute.

▪ The composite attribute is represented by an ellipse, and its component attributes are
ellipses connected to that ellipse.
c. Multivalued Attribute:
▪ An attribute that can have more than one value is known as a multivalued
attribute. A double oval is used to represent a multivalued attribute.

▪ For example, a student can have more than one phone number.
d. Derived Attribute:
▪ An attribute that can be derived from other attributes is known as a derived attribute. It
is represented by a dashed ellipse.

▪ For example, a person's age changes over time and can be derived from another
attribute such as date of birth.
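The four kinds of attribute can be mirrored in a small data model. This is only an analogy sketched with Python dataclasses, not an ER notation; the class and field names are hypothetical, chosen to match the student example used in this session.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Address:
    """Composite attribute: made up of several simpler attributes."""
    street: str
    city: str
    pin_code: str


@dataclass
class Student:
    student_id: int                  # key attribute
    name: str                        # simple attribute
    date_of_birth: date              # stored attribute
    address: Address                 # composite attribute
    phone_numbers: List[str] = field(default_factory=list)  # multivalued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        years = today.year - self.date_of_birth.year
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years


student = Student(
    student_id=1,
    name="Asha",
    date_of_birth=date(2000, 6, 15),
    address=Address("MG Road", "Pune", "411001"),
    phone_numbers=["9000000001", "9000000002"],
)
```

Because `age` is computed on demand from the date of birth, it cannot go stale the way a stored age column would.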
3. Relationship:
A relationship is used to describe the relation between entities. Diamond or rhombus is used
to represent the relationship.

Types of relationship are as follows:

a. One-to-One Relationship
b. One-to-many relationship
c. Many-to-one relationship
d. Many-to-many relationship
a. One-to-One Relationship:
▪ When one instance of an entity is associated with at most one instance of the other entity,
it is known as a one-to-one relationship.

▪ For example, a female can marry one male, and a male can marry one female.
b. One-to-many relationship:
▪ When only one instance of the entity on the left, and more than one instance of an entity
on the right associates with the relationship then this is known as a one-to-many
relationship.

▪ For example, a scientist can invent many inventions, but each invention is made by one
specific scientist.
c. Many-to-one relationship:
▪ When more than one instance of the entity on the left, and only one instance of an entity
on the right associates with the relationship then it is known as a many-to-one
relationship.

▪ For example, a student enrolls for only one course, but a course can have many students.
d. Many-to-many relationship:
▪ When more than one instance of the entity on the left, and more than one instance of an
entity on the right associates with the relationship then it is known as a many-to-many
relationship.

▪ For example, an employee can be assigned to many projects, and a project can have many
employees.
Notation of ER diagram:
▪ Database can be represented using the notations. In ER diagram, many notations are
used to express the cardinality. These notations are as follows:
Summary
▪ ER model stands for an Entity-Relationship model. It is a high-level data model.

▪ An entity may be any object, class, person or place. In the ER diagram, an entity can be
represented as rectangles.

▪ An attribute is used to describe a property of an entity. An ellipse is used to represent an
attribute.

▪ A relationship is used to describe the relation between entities. A diamond or rhombus is
used to represent the relationship.
Additional Resources
▪ “Distributed Database Systems”, by Patrick Valduriez, Pearson Publication.
Database Design
Development
Unit 1
Introduction to DBMS
Topic 5
SQL- Structured Query Language
Session 5
Objectives
After completing this session, you will be able to understand :

▪ Query in DBMS

▪ Commands in SQL.
DBMS Query
SQL Commands:
▪ SQL commands are a prerequisite for working with a DBMS. SQL is an abbreviation for
Structured Query Language.

▪ SQL commands are instructions. They are used to communicate with the database. They are
also used to perform specific tasks, functions, and queries on data.

▪ SQL can perform various tasks such as creating a table, adding data to tables, dropping a
table, modifying a table, and setting permissions for users.
SQL Commands:
▪ Types of SQL Commands:

▪ There are five types of SQL commands: DDL, DML, DCL, TCL, and DQL.
1. Data Definition Language (DDL):
▪ DDL changes the structure of the table like creating a table, deleting a table, altering a
table, etc.

▪ All DDL commands are auto-committed, which means they permanently save all the
changes in the database.

▪ Here are some commands that come under DDL:

i. CREATE
ii. ALTER
iii. DROP
iv. TRUNCATE
a. CREATE :
▪ It is used to create a new table in the database.

▪ Syntax:

CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES[,....]);

Example:

CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100), DOB DATE);


b. DROP:
▪ It is used to delete both the structure of the table and the records stored in it.

▪ Syntax:

DROP TABLE table_name;

Example

DROP TABLE EMPLOYEE;


c. ALTER:
▪ It is used to alter the structure of the database. The change could be either to modify the
characteristics of an existing attribute or to add a new attribute.

▪ Syntax:

To add a new column in the table

ALTER TABLE table_name ADD column_name column_definition;

To modify existing column in the table:

ALTER TABLE table_name MODIFY(column_definitions....);

EXAMPLE

ALTER TABLE STU_DETAILS ADD(ADDRESS VARCHAR2(20));

ALTER TABLE STU_DETAILS MODIFY (NAME VARCHAR2(20));


d. TRUNCATE:
▪ It is used to delete all the rows from the table and free the space containing the table.

Syntax:

TRUNCATE TABLE table_name;

Example:

TRUNCATE TABLE EMPLOYEE;
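
The DDL commands can be tried out without installing a database server. The following is a minimal sketch that assumes SQLite through Python's standard `sqlite3` module rather than the Oracle-style dialect used in the examples above: it uses SQLite's `TEXT` type, and it omits TRUNCATE, which SQLite does not provide (a `DELETE` without a `WHERE` clause is the closest equivalent there).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# CREATE: define a new table.
conn.execute("CREATE TABLE employee (name TEXT, email TEXT, dob TEXT)")

# ALTER: add a column to the existing table.
conn.execute("ALTER TABLE employee ADD COLUMN address TEXT")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
print(columns)

# DROP: remove the table's structure together with its rows.
conn.execute("DROP TABLE employee")
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining)
conn.close()
```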


2. Data Manipulation Language (DML):
▪ DML commands are used to modify the database. They are responsible for all forms of
changes to the data in the database.

▪ DML commands are not auto-committed, which means their changes are not saved
permanently until they are committed. They can be rolled back.

▪ Here are some commands that come under DML:

a. INSERT
b. UPDATE
c. DELETE
a. INSERT:
The INSERT statement is used to insert a row of data into a table.

Syntax:

INSERT INTO TABLE_NAME

(col1, col2, col3,.... col N)

VALUES (value1, value2, value3, .... valueN);

Or

INSERT INTO TABLE_NAME

VALUES (value1, value2, value3, .... valueN);

For example:

INSERT INTO javatpoint (Author, Subject) VALUES ('Sonoo', 'DBMS');


b. UPDATE:
▪ This command is used to update or modify the value of a column in the table.

Syntax:

UPDATE table_name SET column_name1 = value1, ..., column_nameN = valueN [WHERE condition];

For example:

UPDATE students

SET User_Name = 'Sonoo'

WHERE Student_Id = '3';


c. DELETE:
▪ It is used to remove one or more rows from a table.

▪ Syntax:

DELETE FROM table_name [WHERE condition];

For example:

DELETE FROM javatpoint

WHERE Author = 'Sonoo';
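
Because DML changes are not permanent until committed, they can be undone. The following is a minimal sketch of that behaviour, assuming SQLite through Python's standard `sqlite3` module; the table layout and the sample names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, user_name TEXT)")

# INSERT two rows and make them permanent with an explicit COMMIT.
conn.execute("INSERT INTO students (student_id, user_name) VALUES (?, ?)", (3, "Amit"))
conn.execute("INSERT INTO students (student_id, user_name) VALUES (?, ?)", (4, "Riya"))
conn.commit()

# UPDATE and DELETE run inside a new transaction, then both are undone by ROLLBACK.
conn.execute("UPDATE students SET user_name = 'Sonoo' WHERE student_id = 3")
conn.execute("DELETE FROM students WHERE student_id = 4")
conn.rollback()

rows = conn.execute(
    "SELECT student_id, user_name FROM students ORDER BY student_id"
).fetchall()
print(rows)  # the committed rows are intact
conn.close()
```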
Data Query Language:
▪ DQL is used to fetch the data from the database. It uses only one command: SELECT

a. SELECT: It retrieves the attributes named in the select list, which corresponds to the
projection operation of relational algebra, for the rows that satisfy the condition in the
WHERE clause, which corresponds to the selection operation.

Syntax:
SELECT expressions
FROM TABLES
WHERE conditions;
For example:
SELECT emp_name
FROM employee
WHERE age > 20;
Summary
▪ SQL commands are instructions. They are used to communicate with the database.

▪ SQL can perform various tasks like create a table, add data to tables, drop the table,
modify the table, set permission for users.

▪ DDL changes the structure of the table like creating a table, deleting a table, altering a table, etc.

▪ DML commands are used to modify the database. They are responsible for all forms of changes
to the data in the database.
Database Design
Development
Unit 1
Introduction to DBMS
Topic 6
Query Processing in DBMS
Session 6
Objectives
After completing this session, you will be able to understand :

▪ Query processing

▪ Parsing and Translation.


Query Processing
Query Processing in DBMS:
▪ Query Processing is the activity performed in extracting data from the database. It
takes several steps to fetch the data from the database.
▪ The steps involved are:
a. Parsing and translation
b. Optimization
c. Evaluation
Parsing and Translation:
▪ The working of a query processing is shown in the below-described diagram:
What is Parser?
First we will understand parser.

▪ A parser is a compiler component that breaks the data coming from the lexical analysis
phase into smaller elements.

▪ A parser takes input in the form of a sequence of tokens and produces output in the form
of a parse tree.

▪ Parsing is of two types: top down parsing and bottom up parsing.

▪ Parsing, syntax analysis, or syntactic analysis is the process of analyzing
a string of symbols, either in natural language, computer languages or data structures,
conforming to the rules of a formal grammar.
Steps in Query Processing:
▪ Queries are written in high-level database languages such as SQL.

▪ They get translated into expressions that can be further used at the physical level of
the file system.

▪ After this, a variety of query-optimizing transformations and the actual evaluation of the
queries take place.

▪ SQL is well suited to humans, but not to the internal representation of a query. Thus,
before processing a query, the system needs to translate it from its human-readable
form into an internal form.
▪ Relational algebra is well suited for the internal representation of a query.

▪ The translation process in query processing is similar to the parser of a query.

▪ When a user executes any query, the parser in the system generates the internal form of
the query: it checks the syntax of the query and verifies the names of the relations in the
database, the tuples, and finally the required attribute values. The parser creates a tree of
the query, known as a 'parse-tree.'

▪ The system then translates it into the form of relational algebra, replacing every use of a
view in the query with the view's definition.
Example:
▪ Suppose a user executes a query.

▪ In SQL, a user wants to fetch the names of the employees whose salary is greater than
10000. For doing this, the following query is used:

select emp_name from Employee where salary>10000;

▪ Thus, to make the system understand the user query, it needs to be translated in the form
of relational algebra. We can bring this query in the relational algebra form as:

πemp_name (σsalary>10000 (πemp_name, salary (Employee)))

πemp_name (σsalary>10000 (Employee))

After translating the given query, we can execute each relational algebra operation by using
different algorithms. This is how query processing begins its work.
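
Different relational algebra expressions for the same query return the same result, which is what gives the system freedom to choose between them. The following is a minimal sketch for the query `select emp_name from Employee where salary>10000`; the `select` and `project` helpers and the sample rows are assumptions written for illustration, not part of any DBMS.

```python
def select(relation, predicate):
    """Selection (sigma): keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection (pi): keep only the named attributes, dropping duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        reduced = tuple((name, row[name]) for name in attributes)
        if reduced not in seen:
            seen.add(reduced)
            result.append(dict(reduced))
    return result


# Hypothetical Employee relation used only for illustration.
employee = [
    {"emp_name": "Asha", "salary": 12000},
    {"emp_name": "Ravi", "salary": 9000},
    {"emp_name": "Meera", "salary": 15000},
]


def high_paid(row):
    return row["salary"] > 10000


# Plan 1: select first, then project onto emp_name.
plan1 = project(select(employee, high_paid), ["emp_name"])
# Plan 2: project onto the needed attributes first, then select, then project again.
plan2 = project(select(project(employee, ["emp_name", "salary"]), high_paid), ["emp_name"])
print(plan1 == plan2, plan1)
```

Note that the projection applied before the selection must keep `salary`, otherwise the condition could no longer be evaluated.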
Evaluation:
▪ For evaluation, in addition to the relational algebra translation, the translated relational
algebra expression must be annotated with instructions specifying how to evaluate each
operation.

▪ Thus, after translating the user query, the system executes a query evaluation plan.
Query Evaluation Plan:
▪ In order to fully evaluate a query, the system needs to construct a query evaluation plan.

▪ The annotations in the evaluation plan may refer to the algorithms to be used for the
particular index or the specific operations.

▪ A relational algebra operation annotated in this way is referred to as an evaluation
primitive. The evaluation primitives carry the instructions needed for the evaluation of the
operation.

▪ Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query execution
plan.

▪ A query execution engine is responsible for generating the output of the given query. It
takes the query execution plan, executes it, and finally makes the output for the user
query.
Optimization:
▪ The cost of query evaluation can vary for different types of queries. Since the
system is responsible for constructing the evaluation plan, the user does not need to
write the query efficiently.

▪ Usually, a database system generates an efficient query evaluation plan, which
minimizes its cost. This task is performed by the database system and is known
as Query Optimization.

▪ For optimizing a query, the query optimizer should have an estimated cost analysis of
each operation. It is because the overall operation cost depends on the memory
allocations to several operations, execution costs, and so on.

▪ Finally, after selecting an evaluation plan, the system evaluates the query and
produces the output of the query.
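
The effect of the optimizer's choice can be observed directly. The following is a minimal sketch that assumes SQLite through Python's standard `sqlite3` module and its `EXPLAIN QUERY PLAN` statement; the table, the index name `idx_employee_salary` and the `plan_text` helper are assumptions for illustration, and the exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, salary INTEGER)")
query = "SELECT emp_name FROM employee WHERE salary > 10000"


def plan_text(connection, sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


before = plan_text(conn, query)   # no index available: the table is scanned
conn.execute("CREATE INDEX idx_employee_salary ON employee (salary)")
after = plan_text(conn, query)    # the optimizer can now consider the index
print(before)
print(after)
conn.close()
```

The query text is identical in both runs; only the plan chosen by the system changes.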
Summary
▪ Query Processing is the activity performed in extracting data from the database.

▪ When a user executes any query, for generating the internal form of the query, the parser
in the system checks it.
Database Design
Development
Unit 2
Concurrency control transactions
and schedule
Topic 1
Introduction to Deadlock
Session 1
Objectives
After completing this session, you will be able to understand :

▪ Deadlock and how to handle it.

▪ Difference between deadlock & starvation.

▪ Necessary conditions for deadlock.


Deadlock:
▪ Every process needs some resources to complete its execution. However, resources
are granted in a sequential order:

a. The process requests some resource.
b. The OS grants the resource if it is available; otherwise it lets the process wait.
c. The process uses the resource and releases it on completion.

▪ A deadlock is a situation where each process waits for a resource
which is assigned to some other process.

▪ In this situation, none of the processes gets executed, since the resource each one needs is
held by some other process which is also waiting for some other resource to be
released.
Example of Deadlock:
▪ Let us assume that there are three processes P1, P2 and P3. There are three different
resources R1, R2 and R3. R1 is assigned to P1, R2 is assigned to P2 and R3 is assigned to
P3.
Example of Deadlock:
▪ After some time, P1 demands R2, which is being used by P2. P1 halts its execution since it
can't complete without R2. P2 in turn demands R3, which is being used by P3.

▪ P2 also stops its execution because it can't continue without R3. P3 demands R1,
which is being used by P1, therefore P3 also stops its execution.

▪ In this scenario, a cycle is formed among the three processes. None of the processes is
progressing and they are all waiting. The computer becomes unresponsive since all the
processes are blocked.
▪ A deadlock can be indicated by a cycle in the wait-for-graph. This is a directed graph in
which the vertices denote transactions and the edges denote waits for data items.

▪ For example, in the following wait-for-graph, transaction T1 is waiting for data item X
which is locked by T3. T3 is waiting for Y which is locked by T2 and T2 is waiting for Z
which is locked by T1.

▪ Hence, a waiting cycle is formed, and none of the transactions can proceed executing.
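
Checking a wait-for graph for a cycle is a standard graph search. The following is a minimal sketch using depth-first search; the function name `find_cycle` and the dictionary representation of the graph are assumptions made for illustration.

```python
def find_cycle(wait_for):
    """Return one cycle in a wait-for graph as a list of transactions, or None.

    wait_for maps each transaction to the transactions it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in wait_for.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:      # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        colour[node] = BLACK
        path.pop()
        return None

    for start in list(wait_for):
        if colour.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


# T1 waits for T3 (item X), T3 waits for T2 (item Y), T2 waits for T1 (item Z).
deadlocked = find_cycle({"T1": ["T3"], "T3": ["T2"], "T2": ["T1"]})
# Same graph without the last edge: T2 waits for nobody, so there is no cycle.
not_deadlocked = find_cycle({"T1": ["T3"], "T3": ["T2"], "T2": []})
print(deadlocked, not_deadlocked)
```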
Sr. No. | Deadlock | Starvation
1 | Deadlock is a situation where all the processes involved get blocked and no process proceeds. | Starvation is a situation where the low priority process gets blocked and the high priority processes proceed.
2 | Deadlock is an infinite waiting. | Starvation is a long waiting but not infinite.
3 | Every deadlock is always a starvation. | Every starvation need not be a deadlock.
4 | The requested resource is blocked by the other process. | The requested resource is continuously used by the higher priority processes.
5 | Deadlock happens when mutual exclusion, hold and wait, no pre-emption and circular wait occur simultaneously. | It occurs due to uncontrolled priority and resource management.
Necessary conditions for Deadlocks:
No. | Condition | Description
1 | Mutual Exclusion | A resource can only be used in a mutually exclusive manner, i.e. two processes cannot use the same resource at the same time.
2 | Hold & Wait | A process waits for some resources while holding another resource at the same time.
3 | No preemption | A resource cannot be forcibly taken away from the process holding it; it is released only when that process has finished using it.
4 | Circular wait | All the processes must be waiting for the resources in a cyclic manner, so that the last process is waiting for the resource which is being held by the first process.
Strategies for handling Deadlock:

1. Deadlock Ignorance

2. Deadlock prevention

3. Deadlock avoidance

4. Deadlock detection and recovery


1. Deadlock Ignorance:
▪ Deadlock Ignorance is the most widely used approach among all the mechanisms. It is
used by many operating systems, mainly those intended for end users.

▪ In this approach, the operating system assumes that deadlock never occurs. It simply
ignores deadlock.

▪ This approach is best suited to a single end user system where the user uses the system
only for browsing and other everyday tasks.
2. Deadlock prevention:
▪ Deadlock happens only when mutual exclusion, hold and wait, no pre-emption and
circular wait hold simultaneously.

▪ If it is possible to violate one of the four conditions at all times, then deadlock can
never occur in the system.

▪ This method is suitable for a large database. If the resources are allocated in such a way
that deadlock never occurs, then the deadlock can be prevented.

▪ The database management system analyses the operations of a transaction to determine
whether they can create a deadlock situation. If they can, the DBMS does not allow that
transaction to be executed.
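
One common way to violate the circular wait condition is to make every thread acquire locks in the same global order. The following is a minimal sketch of that idea using Python's `threading` module; lock ordering is one prevention technique among several, and the names `LOCK_ORDER` and `run_with_locks` are assumptions made for illustration.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
# Fixed global order: every thread must take lock_a before lock_b.
LOCK_ORDER = {id(lock_a): 0, id(lock_b): 1}
completed = []


def run_with_locks(name, *locks):
    # Sort the requested locks by the global order before acquiring them,
    # so no two threads can ever hold them in opposite orders.
    ordered = sorted(locks, key=lambda lock: LOCK_ORDER[id(lock)])
    for lock in ordered:
        lock.acquire()
    try:
        completed.append(name)
    finally:
        for lock in reversed(ordered):
            lock.release()


# The two threads ask for the locks in opposite orders, which could deadlock
# if each acquired them in the order requested.
t1 = threading.Thread(target=run_with_locks, args=("T1", lock_a, lock_b))
t2 = threading.Thread(target=run_with_locks, args=("T2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()
print(sorted(completed))
```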

3. Deadlock avoidance:
▪ In deadlock avoidance, the operating system checks whether the system is in safe state
or in unsafe state at every step which the operating system performs.

▪ The process continues as long as the system is in a safe state. Once the system moves to
an unsafe state, the OS has to backtrack one step.

▪ In simple words, The OS reviews each allocation so that the allocation doesn't cause the
deadlock in the system.
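
A classic way to perform such a safe-state check is the Banker's algorithm, which these slides do not name; it is used here only as an illustration. The following is a minimal sketch in which the function name `safe_sequence` and the sample figures (five processes, three resource types) are assumptions.

```python
def safe_sequence(available, allocation, maximum):
    """Return an order in which all processes can finish, or None if unsafe."""
    work = list(available)
    need = [[m - a for m, a in zip(max_row, alloc_row)]
            for max_row, alloc_row in zip(maximum, allocation)]
    finished = [False] * len(allocation)
    order = []
    progressed = True
    while progressed:
        progressed = False
        for i in range(len(allocation)):
            if not finished[i] and all(n <= w for n, w in zip(need[i], work)):
                # Process i can run to completion and then return its resources.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                order.append(i)
                progressed = True
    return order if all(finished) else None


allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
maximum = [[7, 5, 3], [3, 2, 2], [9, 0, 2], [2, 2, 2], [4, 3, 3]]

safe = safe_sequence([3, 3, 2], allocation, maximum)     # enough free resources
unsafe = safe_sequence([0, 0, 0], allocation, maximum)   # nothing free at all
print(safe, unsafe)
```

An allocation request would be granted only if the state that results from it still has a safe sequence.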

Deadlock avoidance:
▪ Deadlock avoidance mechanism is used to detect any deadlock situation in advance.

▪ A method like "wait for graph" is used for detecting the deadlock situation but this method
is suitable only for the smaller database. For the larger database, deadlock prevention
method can be used.

4. Deadlock detection and recovery:
▪ This approach lets the processes fall into deadlock and then periodically checks whether
a deadlock has occurred in the system.

▪ If one has occurred, it applies a recovery method to the system to get rid of the
deadlock.

Deadlock detection and recovery:
1. If resources have a single instance –
In this case for Deadlock detection, we can run an algorithm to check for the cycle in the
Resource Allocation Graph. The presence of a cycle in the graph is a sufficient condition for
deadlock.

Deadlock detection and recovery:
In the above diagram, resource 1 and resource 2 have single instances. There is a cycle
R1 → P1 → R2 → P2. So, deadlock is confirmed.

2. If there are multiple instances of resources –
Detection of a cycle is a necessary but not a sufficient condition for deadlock. In this
case, the system may or may not be in deadlock, depending on the situation.

Deadlock Recovery:
A traditional operating system such as Windows doesn’t deal with deadlock recovery as it is
a time and space-consuming process. Real-time operating systems use Deadlock
recovery.

▪ Killing the process –
Either kill all the processes involved in the deadlock, or kill them one by one. When killing
one by one, check for deadlock again after killing each process, and keep repeating until the
system recovers from the deadlock. Killing the processes one by one helps the system break
the circular wait condition.
▪ Resource Preemption –
Resources are preempted from the processes involved in the deadlock, and the preempted
resources are allocated to other processes so that there is a possibility of recovering the
system from deadlock. In this case, the system may go into starvation.

Summary
▪ A deadlock is a situation where each process waits for a resource which
is assigned to some other process.

▪ Starvation is a situation where the low priority process gets blocked and the high priority
processes proceed.

▪ We can handle deadlock by deadlock ignorance, deadlock prevention, deadlock
avoidance, and deadlock detection & recovery.
Database Design
Development
Unit 2
Concurrency control transactions
and schedule
Topic 2
Distributed concurrency control
and recovery:
Session 2
Objectives
After completing this session, you will be able to understand :

▪ Transaction Model.

▪ Distributed Concurrency Control.

▪ Distributed deadlock.

▪ Concurrency control and recovery.


Transaction Model:
▪ The simple transaction model describes how a transaction must behave. It has active,
partially committed, failed, aborted, and committed states. A transaction is a group of
operations, handled by a single program, that can change the content of the database.

▪ Simple transaction model follows all ACID properties while doing transactions.

▪ Transaction States are:

i. Active
ii. Partially committed
iii. Failed
iv. Aborted
v. Committed
Transaction States:
Description:
▪ At first, when the transaction starts executing, it is in the active state. Once its read and
write operations have been executed, it is in the partially committed state.

▪ Finally, when the commit operation is performed after the read and write operations, the
transaction is in the committed state, meaning its changes are stored permanently in the database.

▪ If a transaction fails in either the active state or the partially committed state, it enters
the failed state. A failed transaction cannot continue executing; it is rolled back, which
puts it in the aborted state.

▪ The aborted state then leads automatically to the terminated state. Likewise,
after the committed state, the transaction terminates.
Description:
▪ During the series of operations, the read and write operations lead to the partially
committed state. Their results are held in local memory or a buffer.

▪ After the commit statement is used, the data is moved into permanent storage. This
explains the flow from active to partially committed and on to the committed state.

▪ On the other hand, if a power failure occurs in the active state, the transaction enters the
failed state. A power failure in the partially committed state also leads to the failed state.

▪ After this, rollback occurs, meaning the local memory is cleared. Then the transaction is
aborted, meaning the database is unchanged, and finally it terminates.
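
The state transitions described above can be written down as a small state machine. The following is a minimal sketch; the `State` enum, the `TRANSITIONS` table and the `Transaction` class are assumptions made for illustration and do not correspond to any particular DBMS.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"


# Allowed transitions, following the description of the transaction states.
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.COMMITTED: set(),   # the transaction terminates here
    State.ABORTED: set(),     # the transaction terminates here
}


class Transaction:
    def __init__(self):
        self.state = State.ACTIVE

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state


txn = Transaction()
txn.move_to(State.PARTIALLY_COMMITTED)   # read and write operations executed
txn.move_to(State.FAILED)                # e.g. power failure before the commit
txn.move_to(State.ABORTED)               # rollback leaves the database unchanged
print(txn.state.value)
```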
Deadlock:
▪ We covered the basics of deadlock in the last lecture. Now we will discuss
distributed deadlock.

▪ Distributed deadlocks can occur when distributed transactions or concurrency
control are used in distributed systems. They may be identified via a distributed
technique like edge chasing, or by creating a global wait-for graph (WFG) from local
wait-for graphs at a deadlock detector.
Concurrency control in DBMS:
▪ Concurrency control is part of transaction management in a database management
system (DBMS). It is a procedure in the DBMS that manages simultaneous
transactions so that they execute without conflicting with each other. Such conflicts
occur in multi-user systems.

▪ Concurrency simply means executing multiple transactions at a time. It is
required to increase time efficiency. If many transactions try to access the same data,
inconsistency can arise. Concurrency control is required to keep the data consistent.

▪ For example, if ATM machines did not support concurrency, multiple persons
could not draw money at the same time in different places. This is where we need concurrency.
Advantages:
The advantages of concurrency control are as follows −

▪ Waiting time will be decreased.

▪ Response time will decrease.

▪ Resource utilization will increase.

▪ System performance & Efficiency is increased.


Concurrency control techniques:
The concurrency control techniques are as follows −

i. Locking: A lock guarantees exclusive use of a data item to the current transaction. The
transaction first acquires a lock to access the data item, and after the transaction completes
it releases the lock.

ii. Time stamping: A time stamp is a unique identifier created by the DBMS that indicates the
relative starting time of a transaction. For every transaction, the DBMS records its
starting time as its time stamp.

iii. Optimistic: It is based on the assumption that conflict is rare and it is more efficient to
allow transactions to proceed without imposing delays to ensure serializability.
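
The optimistic technique can be illustrated with a version counter: a transaction works without holding a lock and is validated only when it writes. The following is a minimal sketch of that idea; the class `VersionedItem` and its methods are assumptions made for illustration, and real systems validate whole read and write sets rather than a single item.

```python
class VersionedItem:
    """A data item with a version counter, used for optimistic validation."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        # A transaction remembers the version it saw when it read the value.
        return self.value, self.version

    def write_if_unchanged(self, new_value, version_seen):
        # Validation step: write only if nobody else wrote since our read.
        if self.version != version_seen:
            return False          # conflict: the caller must restart
        self.value = new_value
        self.version += 1
        return True


balance = VersionedItem(100)
value_1, seen_1 = balance.read()   # transaction 1 reads
value_2, seen_2 = balance.read()   # transaction 2 reads the same version
first = balance.write_if_unchanged(value_1 - 30, seen_1)
second = balance.write_if_unchanged(value_2 - 50, seen_2)
print(first, second, balance.value)
```

The second write is rejected because it was computed from a stale read; restarting that transaction would make it read the updated balance first.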
Concurrency Control and Recovery in
Distributed Databases:
▪ For concurrency control and recovery purposes, numerous problems arise in a distributed DBMS
environment that are not encountered in a centralized DBMS environment. These include the
following:

i. Dealing with multiple copies of the data items.

ii. Failure of individual sites.

iii. Failure of communication links

iv. Distributed Commit

v. Distributed Deadlock
Lock management:
▪ Lock management can be distributed across sites in many ways:

i. Centralized: A single site is in-charge of handling lock and unlock requests for all objects.

ii. Primary copy: One copy of each object is designated as the primary copy. All requests to
lock or unlock a copy of this object are handled by the lock manager at the site where the
primary copy is stored, regardless of where the copy itself is stored.

iii. Fully Distributed: Requests to lock or unlock a copy of an object stored at a site are handled
by the lock manager at the site where the copy is stored.
Distributed Deadlock:
▪ One issue that requires special attention when using either primary copy or fully
distributed locking is deadlock detection. Each site maintains a local waits-for graph, and
a cycle in a local graph indicates a deadlock. However, a deadlock can exist even if no
local graph contains a cycle.

▪ As shown in the following figure, T2 is waiting for T1 at site A and T1 is waiting for T2 at
site B. Neither local graph has a cycle, yet together they form one, so we have a deadlock.
Distributed Deadlock Detection Algorithm:
To detect such deadlocks, a distributed deadlock detection algorithm must be
used and we have three types of algorithms:

1 Centralized Algorithm

2 Hierarchical Algorithm

3 Simple Algorithm
1. Centralized Algorithm:
▪ It consists of periodically sending all local waits-for graphs to one site that is
responsible for global deadlock detection.

▪ At this site, the global waits-for graph is generated by combining all the local graphs. In
the global graph, the set of nodes is the union of the nodes in the local graphs, and there is an
edge from one node to another if there is such an edge in any of the local graphs.
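
The centralized algorithm amounts to taking the union of the local graphs and checking the result for a cycle. The following is a minimal sketch using the two-site situation where T2 waits for T1 at site A and T1 waits for T2 at site B; the function names and the dictionary-of-sets representation are assumptions made for illustration.

```python
def merge_graphs(local_graphs):
    """Union the nodes and edges of every site's local waits-for graph."""
    merged = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            merged.setdefault(waiter, set()).update(holders)
            for holder in holders:
                merged.setdefault(holder, set())
    return merged


def has_cycle(graph):
    """Detect a cycle by repeatedly removing nodes that wait for nothing."""
    remaining = {node: set(edges) for node, edges in graph.items()}
    while True:
        free = [node for node, edges in remaining.items() if not edges]
        if not free:
            # Every node left is waiting on another node left: a cycle exists.
            return bool(remaining)
        for node in free:
            del remaining[node]
        for edges in remaining.values():
            edges.difference_update(free)


site_a = {"T2": {"T1"}}   # at site A, T2 waits for T1
site_b = {"T1": {"T2"}}   # at site B, T1 waits for T2

local_a = has_cycle(merge_graphs([site_a]))
local_b = has_cycle(merge_graphs([site_b]))
global_deadlock = has_cycle(merge_graphs([site_a, site_b]))
print(local_a, local_b, global_deadlock)
```

Neither site can see the deadlock on its own; it only appears once the graphs are combined.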
2. Hierarchical Algorithm:
▪ This algorithm groups the sites into a hierarchy. For example, the sites might be grouped by
state, then by country, and finally into a single group that contains all sites.

▪ Every node in this hierarchy constructs a wait-for graph that reveals deadlocks involving
only sites contained in (the sub tree rooted at) this node.

▪ Thus, all sites periodically (e.g., every 10 seconds) send their local waits-for graph to the
site constructing the waits-for graph for their country.

▪ The sites constructing waits-for graph at the country level periodically (e.g., every 10
minutes) send the country waits-for graph to site constructing the global waits-for graph.

3. Simple Algorithm:
▪ If a transaction waits longer than some chosen time-out interval, it is aborted.

▪ Although this algorithm causes many unnecessary restarts, the overhead of
deadlock detection is low.
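
The time-out approach is easy to express with a lock that can be waited on for a limited time. The following is a minimal sketch using Python's `threading.Lock`; the function `try_transaction` and the 0.1 second interval are assumptions made for illustration.

```python
import threading

item_lock = threading.Lock()


def try_transaction(timeout_seconds):
    # Wait for the lock only up to the time-out; give up (abort) afterwards.
    if not item_lock.acquire(timeout=timeout_seconds):
        return "aborted"
    try:
        return "committed"
    finally:
        item_lock.release()


item_lock.acquire()                          # another transaction holds the item
result_while_held = try_transaction(0.1)     # waits 0.1 s, then gives up
item_lock.release()                          # the holder finishes
result_after_release = try_transaction(0.1)
print(result_while_held, result_after_release)
```

The first call is aborted even though the holder was about to release the lock, which is the kind of unnecessary restart this algorithm accepts in exchange for its low overhead.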

Summary
▪ The simple transaction model describes how a transaction must behave. It has active, partially
committed, failed, aborted, and committed states.

▪ Concurrency control is a procedure in the DBMS that manages simultaneous
transactions so that they execute without conflicting with each other.

▪ There are three concurrency control techniques: locking, time stamping, and optimistic.
Database Design
Development
Unit 2
Concurrency control transactions
and schedule
Topic 3
Lock-Based Protocol
Session 3
Objectives
After completing this session, you will be able to understand :

▪ Transaction & concurrency control.

▪ Lock based protocol.


What is a Transaction & Concurrency Control?
▪ A transaction is a logical unit of work that must be either entirely completed or
aborted; no intermediate states are acceptable.

▪ A transaction log keeps track of all transactions that update the database.

▪ The transaction log is itself a database, and it is managed by the DBMS.

▪ Concurrency control coordinates simultaneous execution of transactions in a
multiprocessing database.

▪ The objective of concurrency control is to ensure the serializability of transactions in
a multi-user database environment.
Lock-Based Protocol:
▪ In this type of protocol, a transaction cannot read or write data until it acquires an
appropriate lock on it. There are two types of lock:

i. Shared lock

ii. Exclusive lock


1. Shared lock:
▪ It is also known as a Read-only lock. Under a shared lock, the data item can only be read by
the transaction.

▪ It can be shared between the transactions because when the transaction holds a lock,
then it can't update the data on the data item.
2. Exclusive lock:
▪ Under an exclusive lock, the data item can be both read and written by the
transaction.

▪ This lock is exclusive, and in this lock, multiple transactions do not modify the same data
simultaneously.
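
Whether a lock request can be granted depends on the locks already held on the item. The following is a minimal sketch of the usual shared/exclusive compatibility check, with S for shared and X for exclusive; the table `COMPATIBLE` and the function `can_grant` are assumptions made for illustration.

```python
# Key: (lock already held, lock requested). True means both can coexist.
COMPATIBLE = {
    ("S", "S"): True,    # many readers may share an item
    ("S", "X"): False,   # a writer must wait for the readers
    ("X", "S"): False,   # readers must wait for the writer
    ("X", "X"): False,   # only one writer at a time
}


def can_grant(requested, held_modes):
    """Grant a lock only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)


print(can_grant("S", ["S", "S"]))   # another reader joins existing readers
print(can_grant("X", ["S"]))        # a writer is blocked by a reader
print(can_grant("X", []))           # nothing held, so the writer proceeds
```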
Lock protocols:
▪ Locking protocols are used in database management systems as a means of
concurrency control. Multiple transactions may request a lock on a data item
simultaneously. Hence, we require a mechanism to manage the locking requests made
by transactions.
▪ Such a mechanism is called a Lock Manager. It relies on message passing, where
transactions and the lock manager exchange messages to handle the locking
and unlocking of data items. There are four types of lock protocols available:

i. Simplistic lock protocol

ii. Pre-claiming Lock Protocol

iii. Two-phase locking (2PL)

iv. Strict Two-phase locking (Strict-2PL)
i. Simplistic lock protocol:
▪ It is the simplest way of locking data during a transaction. Simplistic lock-based protocols
require every transaction to get a lock on the data before it inserts, deletes or updates it.

▪ The data item is unlocked after the transaction completes.


ii. Pre-claiming Lock Protocol:
▪ Pre-claiming Lock Protocols evaluate the transaction to list all the data items on which it
needs locks. Before initiating execution of the transaction, the protocol requests from the DBMS
all the locks on all those data items.
▪ If all the locks are granted, this protocol allows the transaction to begin. When the
transaction is completed, it releases all the locks.
▪ If not all of the locks are granted, the transaction rolls back and waits
until all the locks are granted.
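
Pre-claiming is an all-or-nothing request. The following is a minimal sketch using a plain dictionary as the lock table; the functions `preclaim` and `release_all` and the single-owner lock table are assumptions made for illustration.

```python
def preclaim(lock_table, txn, items):
    """Grant every requested lock or none of them."""
    if any(lock_table.get(item) not in (None, txn) for item in items):
        return False              # at least one item is taken: the transaction waits
    for item in items:
        lock_table[item] = txn
    return True


def release_all(lock_table, txn):
    """Release every lock held by the transaction once it completes."""
    for item in [i for i, owner in lock_table.items() if owner == txn]:
        del lock_table[item]


locks = {}
t1_started = preclaim(locks, "T1", ["A", "B"])   # all free: T1 may begin
t2_first_try = preclaim(locks, "T2", ["B", "C"]) # B is held, so nothing is granted
c_untouched = "C" not in locks                   # T2 did not keep a partial claim
release_all(locks, "T1")                         # T1 completes
t2_second_try = preclaim(locks, "T2", ["B", "C"])
print(t1_started, t2_first_try, c_untouched, t2_second_try)
```

Because a waiting transaction never holds a partial set of locks, the hold and wait condition cannot arise.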
iii. Two-phase locking (2PL):
▪ The two-phase locking protocol divides the execution phase of the transaction into three
parts.

▪ In the first part, when the execution of the transaction starts, it seeks permission for the
lock it requires.

▪ In the second part, the transaction acquires all the locks. The third phase is started as
soon as the transaction releases its first lock.

▪ In the third phase, the transaction cannot demand any new locks. It only releases the
acquired locks.
There are two phases of 2PL:

▪ Growing phase: In the growing phase, a new lock on the data item may be acquired by
the transaction, but none can be released.

▪ Shrinking phase: In the shrinking phase, existing lock held by the transaction may be
released, but no new locks can be acquired.
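
The two phases can be enforced with a single flag that is set on the first unlock. The following is a minimal sketch that tracks one transaction's locks; the class `TwoPhaseTransaction` is an assumption made for illustration and ignores lock modes and other transactions.

```python
class TwoPhaseTransaction:
    """Tracks locks for one transaction and rejects any 2PL violation."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False   # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(
                f"{self.name}: cannot lock {item} in the shrinking phase")
        self.held.add(item)

    def unlock(self, item):
        self.held.remove(item)
        self.shrinking = True    # the growing phase is over


t1 = TwoPhaseTransaction("T1")
t1.lock("A")
t1.lock("B")        # growing phase: acquiring is allowed
t1.unlock("A")      # the first release starts the shrinking phase
try:
    t1.lock("C")
    violation = None
except RuntimeError as error:
    violation = str(error)
print(violation)
```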
iv. Strict Two-phase locking (Strict-2PL):
▪ The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all the
locks, the transaction continues to execute normally.
▪ The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock
immediately after using it.

▪ Strict-2PL waits until the whole transaction commits, and then it releases all the locks at
once.
▪ The Strict-2PL protocol does not have a gradual shrinking phase of lock release. Unlike 2PL,
it does not suffer from cascading aborts.
Summary
▪ A transaction log keeps track of all transactions that update the database.

▪ There are four types of lock protocols.

▪ Locking protocols are used in database management systems as a means of
concurrency control.
Database Design
Development
Unit 2
Concurrency control transactions
and schedule
Topic 4
Graph-Based Locking Protocol
& Tree Protocol
Session 4
Objectives
After completing this session, you will be able to understand :

▪ Graph based locking protocol.

▪ Tree based protocol.


Graph Based Concurrency Control
Protocol in DBMS:
▪ Graph Based Protocols are yet another way of implementing Lock Based
Protocols.
▪ As we know, the prime problems with Lock Based Protocols have been avoiding
Deadlocks and ensuring a Strict Schedule.
▪ We’ve seen that Strict Schedules are possible by following Strict or Rigorous
2-PL.
▪ We’ve even seen that Deadlocks can be avoided if we follow Conservative 2-PL, but
the problem with this protocol is that it cannot be used practically.
▪ Graph Based Protocols are used as an alternative to 2-PL.
Tree Based Protocols:
▪ The Tree Based Protocol is a simple implementation of the Graph Based Protocol.

▪ A prerequisite of this protocol is that we know the order in which Database Items are
accessed. For this we impose a Partial Ordering on the set of Database Items
D = {d1, d2, d3, ….., dn}. The rule that follows from the Partial Ordering is stated
as:

▪ If di –> dj, then any transaction accessing both di and dj must access di before
accessing dj.

▪ This implies that the set D may now be viewed as a directed acyclic graph (DAG), called
a database graph.
Tree Based Protocol:
▪ Partial Order on Database items determines a tree like structure.

▪ Only Exclusive Locks are allowed.

▪ The first lock by Ti may be on any data item. Subsequently, a data item Q can be locked
by Ti only if the parent of Q is currently locked by Ti.

▪ Data items can be unlocked at any time, but a data item that has been locked and
unlocked by Ti cannot be relocked by Ti.


▪ Following the Tree Based Protocol ensures a Conflict Serializable and Deadlock
Free schedule. A transaction need not hold a lock until it has acquired all its other locks,
as it must in 2-PL; items can be unlocked earlier, which increases concurrency.

▪ Now, let us see an example. The following Database Graph will be used as a
reference for locking the items subsequently.

Fig. Database Graph
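The locking rules of the Tree Based Protocol can be sketched as a small checker in Python. This is an illustration only: the `PARENT` map is a hypothetical database graph (the text only implies that B is the parent of D and E; all other edges are assumptions), the class `TreeTransaction` is a made-up name, and the checker also applies the usual textbook rule that an item, once unlocked, is not relocked by the same transaction.

```python
# Hypothetical database graph, stored as child -> parent.
# Only "B is the parent of D and E" is implied by the text; the rest is assumed.
PARENT = {"B": "A", "C": "A", "D": "B", "E": "B", "H": "D"}


class TreeTransaction:
    """Checks the lock requests of one transaction against the tree protocol."""

    def __init__(self, name):
        self.name = name
        self.held = set()          # items currently locked (exclusive locks only)
        self.ever_locked = set()   # items locked at any point in the past

    def lock_x(self, item):
        if item in self.ever_locked:
            raise RuntimeError(f"{self.name}: {item} cannot be relocked")
        first_lock = not self.ever_locked
        # After the first lock, the parent must currently be locked by us.
        if not first_lock and PARENT.get(item) not in self.held:
            raise RuntimeError(f"{self.name}: parent of {item} is not locked")
        self.held.add(item)
        self.ever_locked.add(item)

    def unlock_x(self, item):
        # Unlocking is allowed at any time.
        self.held.remove(item)


t = TreeTransaction("T1")
t.lock_x("B")      # first lock: any item is allowed
t.lock_x("D")      # parent B is held
t.lock_x("E")      # parent B is held
t.unlock_x("B")    # early unlock is fine
t.lock_x("H")      # parent D is still held

rejected = []
for item in ("C", "B"):
    try:
        t.lock_x(item)
    except RuntimeError:
        rejected.append(item)   # C: parent A not held; B: relock attempt
```

Note how B had to be locked just to reach D and E, which is the locking overhead of this protocol.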


▪ Let’s look at an example based on the above Database Graph.

▪ We have three Transactions in this schedule, and this is a skeleton example, i.e., we
will only see how Locking and Unlocking work. Let’s keep this simple and not make it
complex by adding operations on data.

Schedule over T1, T2 and T3 (lock operations listed in order of execution):
Lock-X(A)
Lock-X(B)
Lock-X(D)
Lock-X(H)
Unlock-X(D)
Lock-X(E)
Lock-X(D)
Unlock-X(B)
Unlock-X(E)
Lock-X(B)
Lock-X(E)
Unlock-X(H)
Lock-X(B)
Lock-X(G)
Unlock-X(D)
Unlock-X(E)
Unlock-X(B)
Unlock-X(G)
▪ From the given example, first see that the schedule is Conflict Serializable. The
serializability order implied by the locks can be written as T2 –> T1 –> T3.

▪ Data items are locked and unlocked according to the rules given above and follow the
Database Graph.

▪ Now, let’s revise the key points of Graph Based Protocols.
Advantages & Disadvantages:
Advantage –
i. Ensures Conflict Serializable Schedule.
ii. Ensures Deadlock Free Schedule.
iii. Unlocking can be done anytime.
These advantages come with some disadvantages as well.
Disadvantage –
i. Unnecessary locking overhead may sometimes occur: for example, if we want both D and E,
then we have to lock at least B as well to follow the protocol.
ii. Cascading Rollbacks are still a problem. The protocol has no rule for when an Unlock
operation may occur, so this problem persists.
Overall, this protocol is mostly known and used for its unique way of ensuring Deadlock
Freedom.
Summary
▪ Graph Based Protocols are yet another way of implementing Lock Based Protocols.

▪ The Tree Based Protocol is a simple implementation of the Graph Based Protocol.


Database Design
Development
Unit 2
Concurrency control transactions
and schedule
Topic 5
Crash Recovery, Log &
Check-Point
Session 5
Objectives
After completing this session, you will be able to understand :

▪ Crash Recovery.

▪ Log based Recovery.

▪ Checkpoint.
Crash Recovery:
▪ A DBMS is a highly complex system with hundreds of transactions being executed
every second.
▪ The durability and robustness of a DBMS depend on its complex architecture and
its underlying hardware and system software.
▪ If it fails or crashes amid transactions, the system is expected to follow
some algorithm or technique to recover lost data.
Failure Classification:
▪ To see where the problem has occurred, we generalize a failure into various
categories, as follows −

1. Transaction Failure

2. System Crash

3. Disk Failure
1. Transaction failure:
▪ A transaction has to abort when it fails to execute or when it reaches a point from which
it cannot go any further. This is called transaction failure, where only a few transactions or
processes are affected.

▪ Reasons for a transaction failure could be −

▪ Logical errors − Where a transaction cannot complete because it has some code error
or any internal error condition.

▪ System errors − Where the database system itself terminates an active transaction
because the DBMS is not able to execute it, or it has to stop because of some system
condition. For example, in case of deadlock or resource unavailability, the system aborts
an active transaction.
2. System Crash:
▪ Problems external to the system may cause it to stop abruptly and crash.

▪ For example, an interruption in the power supply may cause the underlying hardware
or software to fail.

▪ Other examples include operating system errors.


3. Disk Failure:
▪ In the early days of technology evolution, it was a common problem that hard-disk drives
or storage drives failed frequently.

▪ Disk failures include the formation of bad sectors, the disk becoming unreachable, a disk
head crash, or any other failure that destroys all or part of the disk storage.
Storage Structure:
In brief, the storage structure can be divided into two categories −

i. Volatile storage − As the name suggests, volatile storage cannot survive system
crashes. Volatile storage devices are placed very close to the CPU; normally they are
embedded onto the chipset itself. Main memory and cache memory are
examples of volatile storage. They are fast but can store only a small amount of information.

ii. Non-volatile storage − These memories are made to survive system crashes. They are
huge in data storage capacity, but slower in accessibility. Examples may include hard-disks,
magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.
Recovery and Atomicity:
When a system crashes, it may have several transactions being executed and various files
opened for them to modify the data items. Transactions are made of various operations, which are
atomic in nature. But according to ACID properties of DBMS, atomicity of transactions as a whole
must be maintained, that is, either all the operations are executed or none.

When a DBMS recovers from a crash, it should maintain the following −

▪ It should check the states of all the transactions, which were being executed.

▪ A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of
the transaction in this case.

▪ It should check whether the transaction can be completed now or it needs to be rolled back.

▪ No transactions would be allowed to leave the DBMS in an inconsistent state.


There are two types of techniques, which can help a DBMS in recovering as well as
maintaining the atomicity of a transaction −

▪ Maintaining the logs of each transaction, and writing them onto some stable storage
before actually modifying the database.

▪ Maintaining shadow paging, where the changes are made on a separate copy of the
affected pages, and the actual database is updated later, when the transaction commits.
Log-based Recovery:
A log is a sequence of records that keeps track of the actions performed by a transaction. It
is important that the logs are written prior to the actual modification and stored on stable storage
media, which is failsafe.

Log-based recovery works as follows −

▪ The log file is kept on stable storage media.

▪ When a transaction enters the system and starts execution, it writes a log record about it −

<Tn, Start>

▪ When the transaction modifies an item X, it writes a log record as follows −

<Tn, X, V1, V2>

This reads: Tn has changed the value of X from V1 to V2.

▪ When the transaction finishes, it logs −

<Tn, commit>
The database can be modified using two approaches −

▪ Deferred database modification − All logs are written to stable storage, and the
database is updated only when a transaction commits.

▪ Immediate database modification − Each log record is followed by the actual database
modification. That is, the database is modified immediately after every operation.
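The log records and the immediate modification approach can be sketched in Python as follows. This is a minimal illustration, assuming that a plain list stands in for the log on stable storage and a dictionary stands in for the database; the class `LoggedDatabase` and its method names are hypothetical.

```python
class LoggedDatabase:
    """Writes a log record before every change (immediate database modification)."""

    def __init__(self):
        self.log = []     # stands in for the log file on stable storage
        self.data = {}    # stands in for the database itself

    def start(self, tn):
        self.log.append((tn, "start"))                  # <Tn, Start>

    def write(self, tn, item, new_value):
        old_value = self.data.get(item)
        # The log record is written first ...
        self.log.append((tn, item, old_value, new_value))   # <Tn, X, V1, V2>
        # ... and the database is modified immediately afterwards.
        self.data[item] = new_value

    def commit(self, tn):
        self.log.append((tn, "commit"))                 # <Tn, commit>


db = LoggedDatabase()
db.start("T1")
db.write("T1", "X", 100)
db.write("T1", "X", 150)
db.commit("T1")
```

With deferred modification, `write` would only append the log record, and the updates to `data` would be applied inside `commit`.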
Recovery with Concurrent Transactions:
▪ When more than one transaction is being executed in parallel, the logs are interleaved.
At the time of recovery, it would be hard for the recovery system to backtrack through all
the logs and then start recovering. To ease this situation, most modern DBMSs use the
concept of 'checkpoints'.

▪ Checkpoint: Keeping and maintaining logs in real time and in a real environment may fill
up all the storage space available in the system. As time passes, the log file may grow
too big to be handled at all.

▪ A checkpoint is a mechanism where all the previous logs are removed from the system
and stored permanently on a storage disk. A checkpoint declares a point before which the
DBMS was in a consistent state and all the transactions were committed.
Recovery:
When a system with concurrent transactions crashes and recovers, it behaves in the
following manner −
▪ The recovery system reads the logs backwards from the end to the last checkpoint.

▪ It maintains two lists, an undo-list and a redo-list.

▪ If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn,
Commit>, it puts the transaction in the redo-list.

▪ If the recovery system sees a log with <Tn, Start> but no commit or abort log found, it
puts the transaction in undo-list.

All the transactions in the undo-list are then undone and their logs are removed. For all the
transactions in the redo-list, their previous logs are removed, and the transactions are then
redone before their logs are saved.
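The recovery steps above can be sketched in Python. This is only an illustration under stated assumptions: log records are tuples such as `("T1", "start")`, `("T1", "X", old, new)`, `("T1", "commit")` and `("checkpoint",)`, the function name `recover` is hypothetical, and the removal and re-saving of logs is left out.

```python
def recover(log, data):
    """Undo uncommitted and redo committed transactions after the last checkpoint."""
    # Read the log backwards from the end to the last checkpoint.
    tail = []
    for record in reversed(log):
        if record == ("checkpoint",):
            break
        tail.append(record)              # tail is ordered newest-first

    started = {r[0] for r in tail if r[1:] == ("start",)}
    committed = {r[0] for r in tail if r[1:] == ("commit",)}
    aborted = {r[0] for r in tail if r[1:] == ("abort",)}
    redo_list = committed                            # commit record seen
    undo_list = started - committed - aborted        # start, but no commit or abort

    for r in tail:                       # newest first: restore old values
        if len(r) == 4 and r[0] in undo_list:
            tn, item, old, new = r
            data[item] = old
    for r in reversed(tail):             # oldest first: reapply new values
        if len(r) == 4 and r[0] in redo_list:
            tn, item, old, new = r
            data[item] = new
    return sorted(undo_list), sorted(redo_list)


log = [
    ("T1", "start"), ("T1", "A", 100, 50), ("T1", "commit"),
    ("checkpoint",),
    ("T2", "start"), ("T2", "B", 200, 250), ("T2", "commit"),
    ("T3", "start"), ("T3", "C", 300, 0),
]
# Assumed state of the database at the crash: T2's write was lost, T3's was not.
data = {"A": 50, "B": 200, "C": 0}
undo_list, redo_list = recover(log, data)
```

T1 committed before the checkpoint, so its records are never read; T2 is redone and T3 is undone.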
Summary
▪ Reasons for a transaction failure are Logical Errors or System Errors.

▪ A log is a sequence of records that keeps track of the actions performed by a
transaction.

▪ A checkpoint is a mechanism where all the previous logs are removed from the system
and stored permanently on a storage disk.
Database Design
Development
Unit 2
Concurrency control transactions
and schedule
Topic 6
Nested Transactions
Session 6
Objectives
After completing this session, you will be able to understand :

▪ Distributed Transactions

▪ Flat Transactions

▪ Nested Transactions
Introduction to Transactions:
▪ A transaction is a series of object operations that must be done in an
ACID-compliant manner.
▪ Atomicity: The transaction is completed entirely or not at all.
▪ Consistency: The transaction takes the system from one consistent state to
another.
▪ Isolation: The transaction is carried out separately from other transactions.
▪ Durability: Once the transaction is completed, its effects are long lasting.
Transaction – Commands:
▪ Begin –
Initiate a new transaction.

▪ Commit –
End a transaction and save the changes made during the transaction. It also allows
other transactions to see the modifications you’ve made.

▪ Abort –
End a transaction and undo all changes made during the transaction.
How to run a transaction successfully?
▪ Client –
The transactions are issued by the clients.

▪ Coordinator –
The execution of the entire transaction is controlled by it (handles Begin, commit &
abort).

▪ Server –
Every component that accesses or modifies a resource is subject to transaction control.
The coordinator must be known by the transactional server. The transactional server
registers its participation in a transaction with the coordinator.
Distributed transaction:
▪ Definition: A flat or nested transaction that accesses objects handled by different servers
is referred to as a distributed transaction.

▪ When a distributed transaction reaches its end, in order to maintain the atomicity
property of the transaction, it is mandatory that all of the servers involved in the
transaction either commit the transaction or abort it.

▪ To do this, one of the servers takes on the job of coordinator, which entails ensuring that
the same outcome is achieved across all servers.

▪ The method by which the coordinator accomplishes this is determined by the protocol
selected. The most widely used protocol is the ‘two-phase commit protocol’, which enables
the servers to communicate whether to commit or abort the complete transaction.
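The idea of the two-phase commit protocol can be sketched in Python. This is a simplified illustration, not a full implementation: the class `Participant` and the function `two_phase_commit` are hypothetical names, all servers live in one process, and failures, timeouts and logging are not modelled.

```python
class Participant:
    """A server taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # the vote this server will give
        self.state = "active"

    def prepare(self):
        # Phase 1: the server votes on whether it is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """The coordinator collects the votes, then sends the same decision to all."""
    votes = [p.prepare() for p in participants]      # phase 1: voting
    decision = all(votes)                            # one "no" aborts everything
    for p in participants:                           # phase 2: completion
        if decision:
            p.commit()
        else:
            p.abort()
    return "commit" if decision else "abort"


ok = [Participant("X"), Participant("Y"), Participant("Z")]
outcome_ok = two_phase_commit(ok)

mixed = [Participant("X"), Participant("Y", can_commit=False), Participant("Z")]
outcome_mixed = two_phase_commit(mixed)
```

In both runs every server ends in the same state, which is what the atomicity of a distributed transaction requires.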
Flat & Nested Distributed Transactions:
If a client transaction calls actions on multiple servers, it is said to be distributed. Distributed
transactions can be structured in two different ways:

1. Flat transactions

2. Nested transactions
1. Flat Transactions:
▪ A flat transaction has a single initiating point (Begin) and a single end point (Commit or
Abort). Flat transactions are usually very simple and are generally used for short activities
rather than larger ones.

▪ A client makes requests to multiple servers in a flat transaction. Transaction T, for
example, is a flat transaction that performs operations on objects in servers X, Y, and Z.

▪ Before moving on to the next request, a flat client transaction completes the previous
one.

▪ As a result, each transaction visits the server objects in order. A transaction can only
wait for one object at a time when servers utilize locking.

Fig. Flat Transaction (objects on servers)


Limitations of a flat Transaction:
▪ All work is lost in the event of a crash.

▪ Only one DBMS may be used at a time.

▪ No partial rollback is possible.


2. Nested Transactions:
▪ A transaction that includes other transactions between its initiating point and end point is
known as a nested transaction. The transactions nested inside it are called
sub-transactions.

▪ The top-level transaction in a nested transaction can open sub-transactions, and each
sub-transaction can open more sub-transactions down to any depth of nesting.
Nested Transaction:
▪ A client’s transaction T opens up two sub-transactions, T1 and T2, which access objects
on servers X and Y, as shown in the diagram below.
▪ The sub-transactions T1 and T2 in turn open T1.1, T1.2, T2.1, and T2.2, which access
objects on the servers M, N, and P.

Fig. Nested Transaction (objects on servers)

▪ In the nested transaction strategy, sub-transactions at the same level are executed
concurrently.

▪ In the given diagram, T1 and T2 invoke objects on different servers; hence they can run
in parallel and are therefore concurrent.
▪ T1.1, T1.2, T2.1, and T2.2 are four sub-transactions. These sub-transactions can also
run in parallel.
Example:
▪ Consider a distributed transaction (T) in which a customer transfers:

Rs. 105 from account A to account C, and

subsequently, Rs. 205 from account B to account D.
▪ It can be thought of as:
Transaction T: Start
Transfer Rs. 105 from A to C:
Deduct Rs. 105 from A (withdraw from A) & add Rs. 105 to C (deposit to C)
Transfer Rs. 205 from B to D:
Deduct Rs. 205 from B (withdraw from B) & add Rs. 205 to D (deposit to D)
End
Assuming:

Account A is on server X

Account B is on server Y, and

Accounts C and D are on server Z.

The transaction T involves four requests – 2 for deposits and 2 for withdrawals. Now they
can be treated as sub transactions (T1, T2, T3, T4) of the transaction T.
As shown in the figure below, transaction T is designed as a set of four nested transactions:
T1, T2, T3 and T4.
Advantage:
The performance is higher than that of a single transaction in which the four operations are
invoked one after the other in sequence.
So, the Transaction T may be divided into sub-transactions as :

// Start the transaction
T = openTransaction

    // T1
    openSubtransaction
        a.withdraw(105);

    // T2
    openSubtransaction
        b.withdraw(205);

    // T3
    openSubtransaction
        c.deposit(105);

    // T4
    openSubtransaction
        d.deposit(205);

// End the transaction
closeTransaction
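The same example can be sketched as runnable Python, with the four sub-transactions executed in parallel threads. This is only an illustration under assumptions: the `Account` class, the function `run_transaction` and the opening balances are hypothetical, all accounts live in one process instead of on servers X, Y and Z, and the whole transaction is aborted if any sub-transaction fails (a real nested transaction may let the parent decide otherwise).

```python
from concurrent.futures import ThreadPoolExecutor


class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

    def deposit(self, amount):
        self.balance += amount


def run_transaction(accounts, steps):
    """Run every sub-transaction concurrently; commit only if all succeed."""
    snapshot = {name: acc.balance for name, acc in accounts.items()}
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = [pool.submit(step) for step in steps]
        # exception() waits for the sub-transaction to finish.
        failed = any(f.exception() is not None for f in futures)
    if failed:
        # Abort: restore every balance to its value before the transaction.
        for name, acc in accounts.items():
            acc.balance = snapshot[name]
        return "abort"
    return "commit"


# Opening balances are made up for the example.
accounts = {"a": Account(500), "b": Account(500), "c": Account(0), "d": Account(0)}
steps = [
    lambda: accounts["a"].withdraw(105),   # T1
    lambda: accounts["b"].withdraw(205),   # T2
    lambda: accounts["c"].deposit(105),    # T3
    lambda: accounts["d"].deposit(205),    # T4
]
outcome = run_transaction(accounts, steps)
```

Each sub-transaction touches a different account, so the four of them can safely run at the same time, which is where the performance advantage comes from.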
Summary
▪ A transaction is a series of object operations that must be done in an ACID-compliant
manner.

▪ A transaction is run with the help of a client, a coordinator, and a server.

▪ A flat or nested transaction that accesses objects handled by different servers is referred
to as a distributed transaction.
