
DBMS Notes


Lecture Notes On

DBMS (4CS4-05)

IV SEM (2022-23)
UNIT I: Introduction to DBMS
Objectives:
 To understand the basic concepts and the applications of database systems
 To master the basics of SQL and construct queries using SQL
 To understand relational database design principles
 To become familiar with the basic issues of transaction processing and concurrency control
 To become familiar with database storage structures and access techniques

Outcomes:
 Demonstrate the basic elements of a relational database management system
 Ability to identify the data models for relevant problems
 Ability to design entity-relationship diagrams, convert them into relational tables, and formulate SQL queries on the respective data
 Apply normalization in the development of application software

INTRODUCTION TO DBMS:
 Data is nothing but facts and statistics stored or flowing freely over a network; generally it is raw and unprocessed.
 Data becomes information when it is processed, turning it into something meaningful.
 A database is a collection of inter-related data which is used to retrieve, insert and delete data efficiently.
 It is also used to organize the data in the form of tables, schemas, views, reports, etc.
 Using a database, you can easily retrieve, insert, and delete information.
 For example: a college database organizes the data about the admin, staff, students, faculty, etc.

DBMS Vs. File System:

DBMS: a collection of inter-related data; the user is not required to write procedures for managing the data.
File system: a collection of files; the user has to write procedures for managing the data.

DBMS: searching data is easy.
File system: searching is difficult.

DBMS: data is structured.
File system: files hold unstructured data.

DBMS: data redundancy is controlled.
File system: data redundancy is common.

DBMS: memory utilization is good.
File system: memory utilization is poor.

DBMS: data inconsistency is avoided.
File system: data inconsistency is common.

DBMS: gives an abstract view of data that hides the details.
File system: exposes the details of data representation and storage.

DBMS: provides a crash-recovery mechanism, i.e., it protects the user from system failure.
File system: has no crash-recovery mechanism; if the system crashes while data is being entered, the contents of the file may be lost.

DBMS: provides a good protection mechanism.
File system: it is very difficult to protect a file under the file system.

DBMS: contains a wide variety of sophisticated techniques to store and retrieve data.
File system: cannot store and retrieve data efficiently.

DBMS: takes care of concurrent access to data using some form of locking.
File system: concurrent access causes problems, such as one user reading a file while another is deleting or updating information in it.

History of DBMS:

Data is a collection of facts and figures. As data collections grew day by day, they needed to be stored in a device or software that was safer.
Charles Bachman was the first person to develop the Integrated Data Store (IDS), which was based on the network data model and for which he received the Turing Award (the most prestigious award in computer science, often compared to the Nobel Prize). It was developed in the early 1960s.
In the late 1960s, IBM (International Business Machines Corporation) developed the Information Management System (IMS), a hierarchical database system that is still used in many places today. In 1970 the relational database model was proposed by Edgar Codd; it has been considered the standard database model ever since, and many of the database models we use today are relational.
During the 1970s, IBM developed the Structured Query Language (SQL) as part of the System R project. It was later declared a standard query language by ANSI and ISO. Transaction management systems for processing transactions were developed by James Gray, for which he too received the Turing Award.

 A DBMS is software that allows creation, definition and manipulation of a database, allowing users to store, process and analyse data easily.
 A DBMS provides us with an interface or a tool to perform various operations like creating a database, storing data in it, updating data, creating tables in the database and a lot more.
 A DBMS also provides protection and security to the databases.
 It also maintains data consistency in the case of multiple users. Here are some examples of popular DBMSs used these days:
 MySQL
 Oracle
 SQL Server
 IBM DB2
DATABASE APPLICATIONS:
1. Telecom: A database keeps track of information regarding calls made, network usage, customer details, etc.
2. Industry: Whether it is a manufacturing unit, warehouse or distribution centre, each one needs a database to keep records of ins and outs.
3. Banking system: For storing customer information, tracking day-to-day credit and debit transactions, generating bank statements, etc.
4. Sales: To store customer information, production information and invoice details.
5. Airlines: To travel through airlines, we make early reservations; this reservation information along with the flight schedule is stored in a database.
6. Education sector: Database systems are frequently used in schools and colleges to store and retrieve data regarding student details, staff details, course details, exam details, payroll data, attendance details, fee details, etc.

PURPOSE OF DATABASE SYSTEMS:


The main purpose of database systems is to manage data. Consider a university that keeps the data of students, teachers, courses, books, etc. To manage this data we need to store it somewhere we can add new data, delete unused data, update outdated data and retrieve data. To perform these operations efficiently we need a database management system that stores the data in a suitable way.

Characteristics of DBMS:
 Data stored in tables: Data is never stored directly in the database; it is stored in tables created inside the database.
 Reduced redundancy: In the modern world hard drives are very cheap, but earlier, when hard drives were too expensive, unnecessary repetition of data in a database was a big problem. A DBMS follows normalization, which divides the data in such a way that repetition is minimal.
 Data consistency: For live data, i.e. data that is being continuously updated and added, maintaining consistency can become a challenge. A DBMS handles it all by itself.
 Support for multiple users and concurrent access: A DBMS allows multiple users to work on it (update, insert, delete data) at the same time and still manages to maintain data consistency.
 Query language: A DBMS provides users with a simple query language, using which data can be easily fetched, inserted, deleted and updated in the database.

Advantages of DBMS:
 Controls database redundancy: It can control data redundancy because it stores all the data in one single database file and that recorded data is placed in the database.
 Data sharing: In a DBMS, the authorized users of an organization can share data among multiple users.
 Easy maintenance: It is easily maintainable due to the centralized nature of the database system.
 Reduced time: It reduces development time and maintenance needs.
 Backup: It provides backup and recovery subsystems which create automatic backups of data against hardware and software failures and restore the data if required.
 Multiple user interfaces: It provides different types of user interfaces like graphical user interfaces and application program interfaces.

Disadvantages of DBMS:
 Cost of hardware and software: It requires a high-speed data processor and a large memory size to run DBMS software.
 Size: It occupies a large amount of disk space and large memory to run efficiently.
 Complexity: A database system creates additional complexity and requirements.
 Higher impact of failure: Failure has a high impact on the database because in most organizations all the data is stored in a single database, and if the database is damaged due to electrical failure or database corruption the data may be lost forever.

Disadvantages of file oriented approach:


1) Data redundancy and inconsistency:
The same information may be written in several files. This redundancy leads to higher storage and access cost. It may lead to data inconsistency, i.e. the various copies of the same data may no longer agree; for example, a changed customer address may be reflected in one file but not elsewhere in the system.
2) Difficulty in accessing data:
The conventional file-processing system does not allow data to be retrieved in a convenient and efficient manner according to the user's choice.
3) Data isolation:
Because data are scattered in various files, and files may be in different formats, writing new application programs to retrieve the appropriate data is difficult.
4) Integrity problems:
Developers enforce data validation in the system by adding appropriate code in the various application programs. However, when new constraints are added, it is difficult to change the programs to enforce them.
5) Atomicity:
It is difficult to ensure atomicity in a file-processing system when a transaction failure occurs due to power failure, networking problems, etc.
(Atomicity: either all operations of the transaction are reflected properly in the database or none are.)
6) Concurrent access:
In a file-processing system it is not possible to access the same file for transactions at the same time.
7) Security problems:
There is no security provided in a file-processing system to secure the data from unauthorized user access.

Database:
A database is an organized collection of related data of an organization, stored in a formatted way, which is shared by multiple users.
The main features of data in a database are:
1. It must be well organized
2. It is related
3. It is accessible in a logical order without any difficulty
4. It is stored only once
For example, consider the roll number, name and address of a student stored in a student file. It is a collection of related data with an implicit meaning.
Data in the database may be persistent, integrated and shared.

Persistent:
Data is persistent if it remains in the database until it is removed by an explicit request from the user.
Integrated:
A database can be a collection of data from different files; when redundancy among those files is removed, the database is said to contain integrated data.
Sharing data:
The data stored in the database can be shared by multiple users simultaneously without affecting the correctness of the data.
Why a database over a file system:
In order to overcome the limitations of a file system, a new approach was required; hence the database approach emerged. A database is a persistent collection of logically related data. The initial attempts were to provide a centralized collection of data. A database has a self-describing nature: it contains not only the data but also a description of its own structure, supporting the sharing and integration of an organization's data in a single database.

A small database can be handled manually, but a large database with multiple users is difficult to maintain; in that case a computerized database is useful. The advantages of database systems over traditional, paper-based methods of record keeping are:
Compactness: no need for large numbers of paper files.
Speed: the machine can retrieve and modify data faster than a human being.
Less drudgery: much of the maintenance of files by hand is eliminated.
Accuracy: accurate, up-to-date information is fetched as per the requirements of the user at any time.

Database Management System (DBMS):


A database management system consists of a collection of related data and a set of programs for defining, creating, maintaining and manipulating a database.

Function of DBMS:
1. Defining the database schema: it must provide a facility for defining the database structure and also specify access rights for authorized users.
2. Manipulation of the database: the DBMS must have functions like insertion of records into the database, updating of data, deletion of data, and retrieval of data.
3. Sharing of the database: the DBMS must share data items among multiple users while maintaining consistency of the data.
4. Protection of the database: it must protect the database against unauthorized users.
5. Database recovery: if for any reason the system fails, the DBMS must facilitate database recovery.

View of Data in DBMS:


Abstraction is one of the main features of database systems.
Hiding irrelevant details from users and providing an abstract view of the data helps in easy and efficient user-database interaction.
In the three-level DBMS architecture, the top level is the "view level". The view level provides the "view of data" to the users and hides irrelevant details such as data relationships, the database schema, constraints, security, etc. from the user.

Data Abstraction in DBMS:

Database systems are made up of complex data structures. To ease user interaction with the database, developers hide internal, irrelevant details from users. This process of hiding irrelevant details from the user is called data abstraction.
We have three levels of abstraction:
Physical level: This is the lowest level of data abstraction. It describes how data is actually stored in the database. You can get the complex data-structure details at this level.
Logical level: This is the middle level of the 3-level data abstraction architecture. It describes what data is stored in the database.
View level: This is the highest level of data abstraction. It describes the user's interaction with the database system.

Instance and schema in DBMS:

 Definition of schema: The design of a database is called the schema. A schema is of three types: physical schema, logical schema and view schema.
 The design of a database at the physical level is called the physical schema; how the data is stored in blocks of storage is described at this level.
 The design of a database at the logical level is called the logical schema; programmers and database administrators work at this level. At this level, data can be described as certain types of data records stored in data structures; however, internal details such as the implementation of the data structures are hidden at this level (they are available at the physical level).
 The design of a database at the view level is called the view schema. This generally describes the end user's interaction with the database system.
Definition of instance: The data stored in the database at a particular moment of time is called an instance of the database. The database schema defines the variable declarations in the tables that belong to a particular database; the values of these variables at a moment in time are called an instance of that database.
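The schema/instance distinction can be seen concretely; below is a minimal sketch using SQLite through Python's sqlite3 module (the Student table and its data are illustrative, not from the syllabus):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Schema: the overall design of the database; it rarely changes.
cur.execute("CREATE TABLE Student (rollno INTEGER PRIMARY KEY, name TEXT)")

# Instance: the data stored at this particular moment; it changes often.
cur.execute("INSERT INTO Student VALUES (1, 'Asha')")
cur.execute("INSERT INTO Student VALUES (2, 'Ravi')")
print(cur.execute("SELECT * FROM Student").fetchall())  # current instance

cur.execute("DELETE FROM Student WHERE rollno = 2")
print(cur.execute("SELECT * FROM Student").fetchall())  # new instance, same schema
```

The two SELECTs return different instances, while the schema (the CREATE TABLE definition) stays the same throughout.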

What is Relational Model?

Relational Model (RM) represents the database as a collection of relations. A relation is nothing but a table of values. Every row in the table represents a collection of related data values. These rows in the table denote a real-world entity or relationship.
The table name and column names help to interpret the meaning of the values in each row. The data are represented as a set of relations. In the relational model, data are stored as tables. However, the physical storage of the data is independent of the way the data are logically organized.
Relational Model Concepts:
1. Attribute: Each column in a table. Attributes are the properties which define a relation, e.g., Student_Rollno, NAME, etc.
2. Tables: In the relational model, relations are saved in table format along with their entities. A table has two properties: rows and columns. Rows represent records and columns represent attributes.
3. Tuple: A single row of a table, which contains a single record.
4. Relation schema: A relation schema represents the name of the relation together with its attributes.
5. Degree: The total number of attributes in the relation is called the degree of the relation.
6. Cardinality: The total number of rows present in the table.
7. Column: The column represents the set of values for a specific attribute.
8. Relation instance: A relation instance is a finite set of tuples in the RDBMS system. Relation instances never have duplicate tuples.
9. Relation key: Every row has one, two or multiple attributes that identify it, called the relation key.
10. Attribute domain: Every attribute has some pre-defined set of values and scope, known as the attribute domain.
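Degree and cardinality can be computed directly from a query result; a minimal sketch using SQLite through Python's sqlite3 (the Student table and its rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (rollno INTEGER, name TEXT, age INTEGER)")
cur.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                [(1, 'Asha', 20), (2, 'Ravi', 21)])

cur.execute("SELECT * FROM Student")
degree = len(cur.description)      # number of attributes (columns)
cardinality = len(cur.fetchall())  # number of tuples (rows)
print(degree, cardinality)         # 3 2
```

Here the relation has degree 3 (rollno, name, age) and cardinality 2 (two student tuples).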

Keys in DBMS:
A KEY in a DBMS is an attribute or set of attributes which helps you to identify a row (tuple) in a relation (table). Keys allow you to find the relationships between tables and to uniquely identify a row in a table by a combination of one or more columns in that table. A key is also helpful for finding a unique record or row in a table.
Why do we need a key?
Here are some reasons for using keys in a DBMS:
 Keys help you to identify any row of data in a table. In a real-world application, a table could contain thousands of records, and records could be duplicated. Keys ensure that you can uniquely identify a table record despite these challenges.
 Keys allow you to establish relationships between tables and identify how tables are related.
 Keys help you to enforce identity and integrity in the relationships.

Types of Keys in Database Management System:

There are mainly seven different types of Keys in DBMS and each key has its different functionality:

 Super key: a group of one or more attributes which identifies rows in a table.
 Primary key: a column or group of columns in a table that uniquely identifies every row in that table.
 Candidate key: a set of attributes that uniquely identify tuples in a table. A candidate key is a super key with no redundant attributes.
 Alternate key: a candidate key that was not chosen as the primary key.
 Foreign key: a column that creates a relationship between two tables. The purpose of foreign keys is to maintain data integrity and allow navigation between two different instances of an entity.
 Compound key: a key with two or more attributes that together allow you to uniquely recognize a specific record; each column may not be unique by itself within the database.
 Composite key: a key composed of two or more attributes that together uniquely identify a record; like a compound key, its individual columns need not be unique on their own.
 Surrogate key: an artificial key which aims to uniquely identify each record. These keys are unique because they are created when you don't have any natural primary key.

Primary key example:

CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    PRIMARY KEY (ID)
);
Create Primary Key (ALTER TABLE statement):

Syntax
The syntax to create a primary key using the ALTER TABLE statement in SQL is:

ALTER TABLE table_name
ADD CONSTRAINT constraint_name
PRIMARY KEY (column1, column2, ... column_n);

FOREIGN KEY on CREATE TABLE

The following SQL creates a FOREIGN KEY on the "PersonID" column when the "Orders" table is created:

CREATE TABLE Orders (
    OrderID int NOT NULL,
    OrderNumber int NOT NULL,
    PersonID int,
    PRIMARY KEY (OrderID),
    FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)
);

ER model:
o ER model stands for Entity-Relationship model. It is a high-level data model used to define the data elements and relationships for a specified system.
o It develops a conceptual design for the database and also gives a very simple and easy-to-design view of the data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-relationship diagram.

For example, suppose we design a school database. In this database, a student will be an entity with attributes like address, name, id, age, etc. The address can be another entity with attributes like city, street name, pin code, etc., and there will be a relationship between them.
Component of ER Diagram:

1. Entity:
An entity may be any object, class, person or place. In an ER diagram, an entity is represented as a rectangle.
Consider an organization as an example: manager, product, employee, department, etc. can be taken as entities.

Weak Entity

An entity that depends on another entity is called a weak entity. A weak entity doesn't contain any key attribute of its own. A weak entity is represented by a double rectangle.
2. Attribute

An attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.

Key Attribute

A key attribute is used to represent the main characteristic of an entity. It represents a primary key. A key attribute is represented by an ellipse with the text underlined.

Composite Attribute

An attribute that is composed of other attributes is known as a composite attribute. A composite attribute is represented by an ellipse, with the component attributes' ellipses connected to it.

Multivalued Attribute

An attribute that can have more than one value is known as a multivalued attribute. A double oval is used to represent a multivalued attribute.
For example, a student can have more than one phone number.

Derived Attribute

An attribute that can be derived from another attribute is known as a derived attribute. It is represented by a dashed ellipse.
For example, a person's age changes over time and can be derived from another attribute like date of birth.

3. Relationship
A relationship is used to describe the relation between entities. A diamond (rhombus) is used to represent a relationship.

Types of relationship are as follows:

One-to-one relationship
When only one instance of an entity is associated with one instance of another entity through the relationship, it is known as a one-to-one relationship.

For example, a female can marry one male, and a male can marry one female.
One-to-many relationship
When one instance of the entity on the left is associated with more than one instance of the entity on the right through the relationship, it is known as a one-to-many relationship.
For example, a scientist can invent many inventions, but each invention is made by one specific scientist.

Many-to-one relationship
When more than one instance of the entity on the left is associated with only one instance of the entity on the right through the relationship, it is known as a many-to-one relationship.
For example, a student enrolls for only one course, but a course can have many students.

Many-to-many relationship
When more than one instance of the entity on the left is associated with more than one instance of the entity on the right through the relationship, it is known as a many-to-many relationship.

For example, an employee can be assigned to many projects, and a project can have many employees.
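In a relational schema, a many-to-many relationship is usually implemented with a third (junction) table holding foreign keys to both sides. A minimal sketch using SQLite through Python's sqlite3; the Employee, Project and Works_On tables are hypothetical names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Project  (proj_id INTEGER PRIMARY KEY, title TEXT);
-- Junction table: one row per (employee, project) assignment
CREATE TABLE Works_On (
    emp_id  INTEGER REFERENCES Employee(emp_id),
    proj_id INTEGER REFERENCES Project(proj_id),
    PRIMARY KEY (emp_id, proj_id));
INSERT INTO Employee VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO Project  VALUES (10, 'Payroll'), (20, 'Billing');
INSERT INTO Works_On VALUES (1, 10), (1, 20), (2, 10);  -- M:N pairs
""")

# Join back through the junction table to list each assignment
rows = conn.execute("""SELECT e.name, p.title
                       FROM Employee e
                       JOIN Works_On w ON e.emp_id = w.emp_id
                       JOIN Project  p ON p.proj_id = w.proj_id
                       ORDER BY e.name, p.title""").fetchall()
print(rows)  # [('Asha', 'Billing'), ('Asha', 'Payroll'), ('Ravi', 'Payroll')]
```

Asha works on two projects and the Payroll project has two employees, which no single foreign key column on either table could express.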

Notation of ER diagram:
A database can be represented using notations. In an ER diagram, many notations are used to express cardinality.
Integrity Constraints
o Integrity constraints are a set of rules used to maintain the quality of information.
o Integrity constraints ensure that data insertion, updating, and other processes are performed in such a way that data integrity is not affected.
o Thus, integrity constraints are used to guard against accidental damage to the database.

Types of Integrity Constraints:


1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an attribute.
o The data types for domains include string, character, integer, time, date, currency, etc. The value of the attribute must come from the corresponding domain.

Example:

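A domain constraint can be expressed with a CHECK clause; a minimal sketch using SQLite through Python's sqlite3 (the Student table and the age range are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Domain constraint: age must be an integer in a valid range
conn.execute("""CREATE TABLE Student (
    rollno INTEGER PRIMARY KEY,
    age    INTEGER CHECK (age BETWEEN 17 AND 60))""")

conn.execute("INSERT INTO Student VALUES (1, 20)")       # within the domain: accepted
try:
    conn.execute("INSERT INTO Student VALUES (2, 200)")  # outside the domain
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The second insert violates the domain of the age attribute and the DBMS rejects it.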
2. Entity integrity constraints

o The entity integrity constraint states that a primary key value can't be null.
o This is because the primary key value is used to identify individual rows in a relation, and if the primary key had a null value, we couldn't identify those rows.
o A table can contain null values in columns other than the primary key field.
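The entity integrity rule can be observed directly; a minimal sketch using SQLite through Python's sqlite3. Note a SQLite quirk: for non-INTEGER primary keys, NOT NULL must be spelled out for the null check to be enforced (the table and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Student (
    rollno TEXT NOT NULL PRIMARY KEY,  -- entity integrity: key can't be NULL
    name   TEXT)""")

conn.execute("INSERT INTO Student VALUES ('S1', 'Asha')")  # accepted
conn.execute("INSERT INTO Student VALUES ('S2', NULL)")    # non-key column may be NULL
try:
    conn.execute("INSERT INTO Student VALUES (NULL, 'Ravi')")  # NULL key
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

A null in the name column is fine, but a null primary key is rejected because the row could never be identified.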

3. Referential Integrity Constraints


A referential integrity constraint is specified between two tables.
Under referential integrity constraints, if a foreign key in Table 1 refers to the primary key of Table 2, then every value of the foreign key in Table 1 must be null or be present in Table 2.
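The "null or present in the referenced table" rule can be sketched with SQLite through Python's sqlite3 (Dept and Emp are illustrative names; SQLite requires PRAGMA foreign_keys = ON, since it leaves foreign-key enforcement off by default):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.execute("CREATE TABLE Dept (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE Emp (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES Dept(dept_id))""")
conn.execute("INSERT INTO Dept VALUES (10, 'Sales')")

conn.execute("INSERT INTO Emp VALUES (1, 10)")    # FK matches a Dept row: OK
conn.execute("INSERT INTO Emp VALUES (2, NULL)")  # FK may be NULL: OK
try:
    conn.execute("INSERT INTO Emp VALUES (3, 99)")  # no Dept 99: rejected
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The first two inserts satisfy the rule (a match and a null); the third would leave a dangling reference, so the DBMS rejects it.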
4. Key constraints
o Keys are attributes used to uniquely identify an entity within its entity set.
o An entity set can have multiple keys, but one of them is chosen as the primary key. A primary key must contain unique values and cannot be null in the relational table.

Database Basics:
Data item:
A data item, also called a field in data processing, is the smallest unit of data that has meaning to its users.
Eg: "e101", "sumit"

Entities and attributes:


An entity is a thing or object in the real world that is distinguishable from all other objects.
Eg: bank, employee, student
Attributes are properties of an entity.
Eg: Empcode, ename, rollno, name
Logical data and physical data:
Logical data are the data for a table created by the user in primary memory. Physical data refers to the data stored in secondary memory.

Schema and sub-schema:


A schema is a logical database description and is drawn as a chart of the types of data that are used. It gives the names of the entities and attributes and specifies the relationships between them.

A database schema includes such information as:

 Characteristics of data items such as entities and attributes.
 Logical structures and relationships among these data items.
 Formats for storage representation.
 Integrity parameters such as physical authorization and backup policies.

A subschema is a schema derived from an existing schema as per user requirements. More than one subschema may be created for a single conceptual schema.

Three level architecture of DBMS :

External level:     View 1       View 2   ...   View n
                    (user 1)     (user 2)       (user n)
                         |
                (mapping supplied by DBMS)
                         |
Conceptual level:   Conceptual view
                         |
                (mapping supplied by DBMS/OS)
                         |
Internal level:     Physical storage (internal view)
A database management system that provides three level of data is said to follow three-level architecture.

 External level
 Conceptual level
 Internal level

The external level is the highest level of database abstraction. At this level there may be many views defined for different users' requirements. A view describes only a subset of the database. Any number of user views may exist for a given conceptual or global schema.
For example, each student has a different view of the timetable: the view of a B.Tech (CSE) student is different from the view of a B.Tech (ECE) student. Thus this level of abstraction is concerned with different categories of users. Each external view is described by means of a schema called an external schema or subschema.

Conceptual level:
At this level of database abstraction, all the database entities and the relationships among them are included. One conceptual view represents the entire database. This conceptual view is defined by the conceptual schema. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations and constraints.
It describes all the records and relationships included in the conceptual view. There is only one conceptual schema per database. It includes features that specify checks for data consistency and integrity.

Internal level:
It is the lowest level of abstraction, closest to the physical storage method used. It indicates how the data will be stored and describes the data structures and access methods to be used by the database. The internal view is expressed by the internal schema.
The following aspects are considered at this level:
1. Storage allocation, e.g. B-trees, hashing
2. Access paths, e.g. specification of primary and secondary keys, indexes, etc.
3. Miscellaneous, e.g. data compression and encryption techniques, optimization of the internal structures.

Database users :
Naive users:
Users who need not be aware of the presence of the database system or any other system supporting their usage
are considered naïve users. A user of an automatic teller machine falls on this category.
.
Online users:
These are users who may communicate with the database directly via an online terminal or indirectly via a user interface and application programs. These users are aware of the database system and also know the data manipulation language.

Application programmers:
Professional programmers who are responsible for developing application programs or user interfaces utilized
by the naïve and online user fall into this category.

Database Administration:
A person who has central control over the system is called the database administrator (DBA).
The functions of the DBA are:
1. Creation and modification of the conceptual schema definition
2. Implementation of the storage structure and access methods
3. Schema and physical organization modifications
4. Granting of authorization for data access
5. Specification of integrity constraints
6. Executing immediate recovery procedures in case of failures
7. Ensuring physical security of the database

Database language :

1) Data definition language(DDL) :


DDL is used to define database objects. The conceptual schema is specified by a set of definitions expressed in this language. It also gives some details about how to implement this schema on the physical devices used to store the data. This definition includes all the entity sets, their associated attributes and their relationships. The result of DDL statements is a set of tables that are stored in a special file called the data dictionary.

2) Data manipulation language(DML) :


A DML is a language that enables users to access or manipulate data stored in the database. Data manipulation involves retrieval of data from the database, insertion of new data into the database, and deletion or modification of existing data.

There are basically two types of DML:


 Procedural: It requires a user to specify what data is needed and how to get it.
 Non-procedural: which requires a user to specify what data is needed without specifying how to get it.
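SQL is the most common non-procedural DML: a query states what rows are wanted and leaves the access strategy to the DBMS. A minimal sketch using SQLite through Python's sqlite3 (the Student table and its rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (rollno INTEGER, name TEXT, branch TEXT)")
cur.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                [(1, 'Asha', 'CSE'), (2, 'Ravi', 'ECE'), (3, 'Meena', 'CSE')])

# Non-procedural: we declare WHAT is wanted; the DBMS decides HOW to fetch it
rows = cur.execute(
    "SELECT name FROM Student WHERE branch = 'CSE' ORDER BY name").fetchall()
print(rows)  # [('Asha',), ('Meena',)]
```

The query never mentions files, indexes or loops; choosing an access path is left entirely to the query processor.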

3) Data control language(DCL):


This language enables users to grant and revoke authorization on database objects.

Elements of DBMS:

DML pre-compiler:
It converts DML statement embedded in an application program to normal procedure calls in the host language.
The pre-complier must interact with the query processor in order to generate the appropriate code.
DDL compiler:
The DDL compiler converts the data definition statements into a set of tables. These tables contain information
concerning the database and are in a form that can be used by other components of the dbms.
File manager:
File manager manages the allocation of space on disk storage and the data structure used to represent
information stored on disk.
Database manager:
A database manager is a program module which provides the interface between the low level data stored in the
database and the application programs and queries submitted to the system.
The responsibilities of database manager are:
1. Interaction with file manager: The data is stored on the disk using the file system which is provided by
operating system. The database manager translates the different DML statements into low-level file system
commands, so database manager is responsible for the actual storing, retrieving and updating of data in the
database.
2. Integrity enforcement: The data values stored in the database must satisfy certain constraints (e.g. the age
of a person can't be less than zero). These constraints are specified by the DBA. The database manager checks the
constraints and, if they are satisfied, stores the data in the database.
3. Security enforcement: Data manager checks the security measures for database from unauthorized users.
4. Backup and recovery: The database manager detects failures that occur due to different causes (like disk
failure, power failure, deadlock, s/w error) and restores the database to its original state.
5. Concurrency control: When several users access the same database file simultaneously, there may be a
possibility of data inconsistency. The database manager is responsible for controlling the problems that occur
with concurrent transactions.

Query processor:
The query processor interprets an online user's query and converts it into an efficient series of operations
in a form capable of being sent to the data manager for execution. The query processor uses the data dictionary
to find the details of the data files and, using this information, it creates a query plan/access plan to execute the query.

Data Dictionary:
Data dictionary is the table which contains the information about database objects. It contains information like
1. External, conceptual and internal database description
2. Description of entities , attributes as well as meaning of data elements
3. Synonyms, authorization and security codes
4. Database authorization

The data stored in the data dictionary is called metadata.
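As a concrete (if simplified) illustration, SQLite exposes its data dictionary as the built-in catalog table sqlite_master, which stores metadata (object type, name, DDL text) about every object in the database. The table and column names below are invented for this sketch:

```python
import sqlite3

# SQLite's catalog table sqlite_master plays the role of the data
# dictionary: it stores metadata about every object in the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT)")

# Querying the catalog returns metadata, not user data.
rows = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(rows)  # [('table', 'student')]
conn.close()
```

Other systems keep the same idea under different names (e.g. the standard INFORMATION_SCHEMA views); the point is that the dictionary describes the data rather than containing it.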

DBMS STRUCTURE:

(Diagram: naïve users, application programmers, online users and the DBA interact with the system through
application program object code, the DML pre-compiler, the query processor and the DDL compiler respectively;
all of these feed the database manager, which works through the file manager to access the data files and the
data dictionary on disk.)

Q. List four significant differences between a file-processing system and a DBMS.

Some main differences between a database management system and a file-processing system are:
• Both systems contain a collection of data and a set of programs which access that data. A database
management system coordinates both the physical and the logical
access to the data, whereas a file-processing system coordinates only the physical access.
• A database management system reduces the amount of data duplication by ensuring that a physical piece of
data is available to all programs authorized to have access to it, whereas data written by one program in a file-
processing system may not be readable by another program.
• A database management system is designed to allow flexible access to data (i.e., queries), whereas a file-
processing system is designed to allow predetermined access to data (i.e., compiled programs).
• A database management system is designed to coordinate multiple users accessing the same data at the
same time. A file-processing system is usually designed to allow one or more programs to access different data
files at the same time. In a file-processing system, a file can be accessed by two programs concurrently only if
both programs have read-only access to the file.

Q. Explain the difference between physical and logical data independence.


• Physical data independence is the ability to modify the physical scheme without making it necessary to
rewrite application programs. Such modifications include changing from unblocked to blocked record storage,
or from sequential to random-access files.
• Logical data independence is the ability to modify the conceptual scheme without making it necessary to
rewrite application programs. Such a modification might be adding a field to a record; an application program’s
view hides this change from the program.

Q. List five responsibilities of a database management system. For each responsibility, explain the
problems that would arise if the responsibility were not discharged.
A general purpose database manager (DBM) has five responsibilities:
a. Interaction with the file manager.
b. Integrity enforcement.
c. Security enforcement.
d. Backup and recovery.
e. Concurrency control.
If these responsibilities were not met by a given DBM (and the text points out that sometimes a responsibility is
omitted by design, such as concurrency control on a single-user DBM for a micro computer) the following
problems can occur, respectively:

a. No DBMS can do without this, if there is no file manager interaction then nothing stored in the files can be
retrieved.
b. Consistency constraints may not be satisfied: account balances could go below the minimum allowed,
employees could earn too much overtime (e.g., hours > 80), or airline pilots may fly more hours than allowed by
law.
c. Unauthorized users may access the database, or users authorized to access part of the database may be able
to access parts of the database for which they lack authority. For example, a high school student could get
access to national defense secret codes, or employees could find out what their supervisors earn.
d. Data could be lost permanently, rather than at least being available in a consistent state that existed prior to
a failure.
e. Consistency constraints may be violated despite proper integrity enforcement in each transaction. For
example, incorrect bank balances might be reflected due to simultaneous withdrawals and deposits, and so on.

Q. What are five main functions of a database administrator?

Five main functions of a database administrator are:


 To create the scheme definition
 To define the storage structure and access methods
 To modify the scheme and/or physical organization when necessary
 To grant authorization for data access
 To specify integrity constraints
Q. List six major steps that you would take in setting up a database for a particularenterprise.
Six major steps in setting up a database for a particular enterprise are:
 Define the high level requirements of the enterprise (this step generates a document known as the system
requirements specification.)
 Define a model containing all appropriate types of data and datarelationships.
 Define the integrity constraints on the data.
 Define the physical level.
 For each known problem to be solved on a regular basis (e.g., tasks to be carried out by clerks or Web
users) define a user interface to carry out the task, and write the necessary application programs to
implement the user interface.
 Create/initialize the database.

EXERCISES:

1. What is a database management system?


2. What are the disadvantages of a file processing system?
3. State the advantages and disadvantages of a database management system.
4. What are the different types of database users?
5. What is a data dictionary and what are its contents?
6. What are the functions of the DBA?
7. What are the different database languages? Explain with examples.
8. Explain the three-layer architecture of DBMS.
9. Differentiate between physical data independence and logical data independence.
10. Explain the functions of the database manager.
11. Explain metadata.

ER-MODEL
Data model:
The data model describes the structure of a database. It is a collection of conceptual tools for describing data,
data relationships and consistency constraints. There are various types of data models, such as:
• Object based logical model
• ER-model
• Functional model
• Object oriented model
• Semantic model
• Record based logical model
• Hierarchical database model
• Network model
• Relational model
• Physical model

The entity-relationship data model perceives the real world as consisting of basic objects, called entities and
relationships among these objects. It was developed to facilitate data base design by allowing specification of an
enterprise schema which represents the overall logical structure of a data base.

Main features of ER-MODEL:


 Entity relationship model is a high level conceptual model
 It allows us to describe the data involved in a real world enterprise in terms of objects and their relationships.
 It is widely used to develop an initial design of a database.
 It provides a set of useful concepts that make it convenient for a developer to move from a basic set of
information to a detailed description of information that can be easily implemented in a database system.
 It describes data as a collection of entities, relationships and attributes.

The E-R data model employs three basic notions: entity sets, relationship sets and attributes.

Entity sets:
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects. For example,
each person in an enterprise is an entity. An entity has a set of properties, and the values for some set of properties
may uniquely identify an entity.
BOOK is an entity and its properties (called attributes) are bookcode, booktitle, price etc.
An entity set is a set of entities of the same type that share the same properties, or attributes. The set of all
persons who are customers at a given bank, for example, can be defined as the entity set customer.

Attributes:
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of
an entity set.

Customer is an entity and its attributes are customerid, customername, custaddress etc.
An attribute, as used in the E-R model, can be characterized by the following attribute types.

Simple and composite attribute:


Simple attributes are the attributes which can't be divided into sub-parts, e.g. customerid, empno.
Composite attributes are the attributes which can be divided into sub-parts, e.g. name consisting of first name,
middle name and last name; address consisting of city, pincode and state.

Single-valued and multi-valued attribute:


The attribute having a unique value is a single-valued attribute, e.g. empno, customerid, regdno etc.
The attribute having more than one value is a multi-valued attribute, e.g. phone-no, dependent name, vehicle.

Derived Attribute:
The values for this type of attribute can be derived from the values of existing attributes,
e.g. age, which can be derived from (currentdate - birthdate); experience_in_year can be calculated as
(currentdate - joindate).

NULL valued attribute:


The attribute value which is unknown to user is called NULL valued attribute.

Relationship sets:
A relationship is an association among several entities.
A relationship set is a set of relationships of the same type. Formally, it is a mathematical relation on n ≥ 2
entity sets. If E1, E2, ..., En are entity sets, then a relationship set R is a subset of
{(e1, e2, ..., en) | e1 ∈ E1, e2 ∈ E2, ..., en ∈ En} where (e1, e2, ..., en) is a relationship.

(Diagram: entity sets customer and loan connected by the relationship set borrow.)
Consider the two entity sets customer and loan. We define the relationship set borrow to denote the association
between customers and the bank loans that the customers have.

More about entities and Relationship:


Recursive relationships:
When the same entity type participates more than once in a relationship type in different roles, the relationship
type is called recursive relationship.

Participation constraints:
The participation constraints specify whether the existence of any entity depends on its being related to another
entity via the relationship. There are two types of participation constraints

Total :
When all the entities from an entity set participate in a relationship type, it is called total participation. For
example, the participation of the entity set student in the relationship set 'opts' is said to be total because
every student enrolled must opt for a course.

Partial:
When it is not necessary for all the entities from an entity set to participate in a relationship type, it is called
partial participation. For example, the participation of the entity set student in 'represents' is partial, since not
every student in a class is a class representative.

Weak Entity:
Entity types that do not contain any key attribute, and hence cannot be identified independently, are called weak
entity types. A weak entity can be identified uniquely only by considering some of its attributes in
conjunction with the primary key attribute of another entity, which is called the identifying owner entity.
Generally a partial key is attached to a weak entity type that is used for unique identification of the weak entities
related to a particular owner entity type. The following restrictions must hold:
The owner entity set and the weak entity set must participate in one to many relationship set. This relationship
set is called the identifying relationship set of the weak entity set.
The weak entity set must have total participation in the identifying relationship.

Example:
Consider the entity type dependent related to the employee entity, which is used to keep track of the dependents of
each employee. The attributes of dependent are: name, birthdate, sex and relationship. Each employee entity
is said to own the dependent entities that are related to it. Note, however, that the 'dependent' entity does not
exist of its own; it is dependent on the employee entity. In other words, in case an employee
leaves the organization, all dependents related to that employee also cease to exist. The 'dependent' entity cannot
exist without the entity 'employee'; thus it is a weak entity.

Keys:
Super key:
A super key is a set of one or more attributes that taken collectively, allow us to identify uniquely an entity in the
entity set.
For example , customer-id,(cname,customer-id),(cname,telno)

Candidate key:
In a relation R, a candidate key for R is a subset of the set of attributes of R which has the following properties:
Uniqueness: No two distinct tuples in R have the same values for the candidate key.
Irreducibility: No proper subset of the candidate key has the uniqueness property.
E.g.: (cname, telno)
Primary key:
The primary key is the candidate key that is chosen by the database designer as the principal means of
identifying entities within an entity set. The remaining candidate keys (if any) are called alternate keys.
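The uniqueness test behind super keys and candidate keys can be sketched in a few lines of Python (the customer tuples below are invented for illustration): an attribute set is a superkey exactly when projecting the relation onto it produces no duplicate tuples.

```python
# Invented customer tuples; in this data customer_id alone is unique,
# cname alone is not (two Joneses), but (cname, telno) together is.
customers = [
    {"customer_id": 1, "cname": "Jones", "telno": "111"},
    {"customer_id": 2, "cname": "Smith", "telno": "222"},
    {"customer_id": 3, "cname": "Jones", "telno": "333"},
]

def is_superkey(relation, attrs):
    # Project the relation onto attrs; a superkey yields no duplicates.
    projected = [tuple(row[a] for a in attrs) for row in relation]
    return len(projected) == len(set(projected))

print(is_superkey(customers, ["customer_id"]))     # True
print(is_superkey(customers, ["cname"]))           # False
print(is_superkey(customers, ["cname", "telno"]))  # True
```

A candidate key is then a superkey from which no attribute can be removed without losing this uniqueness property.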

Advanced ER-diagram:

Abstraction is the simplification mechanism used to hide superfluous details of a set of objects. It allows one to
concentrate on the properties that are of interest to the application.
There are two main abstraction mechanism used to model information:

Generalization and specialization:


Generalization is the abstracting process of viewing a set of objects as a single general class by concentrating on
the general characteristics of the constituent sets while suppressing or ignoring their differences. It is the union
of a number of lower-level entity types for the purpose of producing a higher-level entity type. For instance,
student is a generalization of graduate or undergraduate, full-time or part-time students. Similarly, employee is a
generalization of the classes of objects cook, waiter, and cashier. Generalization is an IS_A relationship;
therefore, manager IS_AN employee, cook IS_AN employee, waiter IS_AN employee, and so forth.
Specialization is the abstracting process of introducing new characteristics to an existing class of objects to
create one or more new classes of objects. This involves taking a higher-level entity and, using additional
characteristics, generating lower-level entities. The lower-level entities also inherit the characteristics of the
higher-level entity. In applying the characteristic size to car we can create a full-size, mid-size, compact or
subcompact car. Specialization may be seen as the reverse process of generalization: additional specific properties
are introduced at a lower level in a hierarchy of objects.

Aggregation:
Aggregation is the process of compiling information on an object, thereby abstracting a higher-level object. In
this manner, the entity person is derived by aggregating the characteristics name, address and ssn. Another form
of aggregation is abstracting a relationship between objects and viewing the relationship as an object.

(Diagram: employee works-on a combination of branch and job; the works-on relationship is aggregated into a
higher-level object that is itself related, through manages, to manager.)
ER- Diagram For College Database

(Diagram: Student (rollno, name, address) opts Course (courseid, cname, duration); Student has Guardian
(name, address, relationship, date); Faculty (fid, name, address, salary) works in Department (dno, dname),
teaches Course, and one faculty member heads each Department.)

Conversion of ER-diagram to relational database


Conversion of entity sets:
1. For each strong entity type E in the ER diagram, we create a relation R containing all the single attributes of
E. The primary key of the relation R will be one of the key attribute of R.

STUDENT(rollno (primary key), name, address)
FACULTY(id (primary key), name, address, salary)
COURSE(course-id (primary key), course_name, duration)
DEPARTMENT(dno (primary key), dname)

2. For each weak entity type W in the ER diagram, we create another relation R that contains all simple
attributes of W. If E is the owner entity of W, then the key attribute of E is also included in R, where it is set as a
foreign key attribute of R. The combination of the primary key attribute of the owner entity type and the
partial key of the weak entity type forms the key of the weak entity type.
GUARDIAN((rollno,name) (primary key),address,relationship)
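A minimal sketch of this rule in SQL (run through SQLite here, with column names following the GUARDIAN example above): the weak entity's key is the owner's primary key combined with the partial key, and the owner's key doubles as a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    rollno  INTEGER PRIMARY KEY,
    name    TEXT,
    address TEXT
);
-- Weak entity: key = owner's key (rollno) + partial key (name)
CREATE TABLE guardian (
    rollno       INTEGER REFERENCES student(rollno),
    name         TEXT,
    address      TEXT,
    relationship TEXT,
    PRIMARY KEY (rollno, name)
);
""")
# Sample rows, invented for the sketch.
conn.execute("INSERT INTO student VALUES (1, 'Asha', 'Jaipur')")
conn.execute("INSERT INTO guardian VALUES (1, 'Ravi', 'Jaipur', 'father')")
count = conn.execute("SELECT COUNT(*) FROM guardian").fetchone()[0]
print(count)  # 1
conn.close()
```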

Conversion of relationship sets:

Binary Relationships:

One-to-one relationship:
For each 1:1 relationship type R in the ER diagram involving two entities E1 and E2, we choose one of the
entities (say E1), preferably the one with total participation, and add the primary key attribute of the other entity
E2 as a foreign key attribute in the table of E1. We also include all the simple attributes of the relationship type R
in E1, if any. For example, the DEPARTMENT relation has been extended to include head_id and date_from,
attributes of the relationship.

DEPARTMENT(D_NO,D_NAME,HEAD_ID,DATE_FROM)

One-to-many relationship:
For each 1:n relationship type R involving two entities E1 and E2, we identify the entity type (say E1) at the n-
side of the relationship type R and include the primary key of the entity on the other side of the relation (say E2) as
a foreign key attribute in the table of E1. We include all simple attributes (or simple components of a composite
attribute) of R, if any, in the table of E1.
For example:
This works for the relationship between DEPARTMENT and FACULTY. For this relationship we choose the entity
at the N side, i.e., FACULTY, and add the primary key attribute of the other entity DEPARTMENT, i.e., DNO, as a
foreign key attribute in FACULTY.

FACULTY (contains WORKS_IN relationship): (ID, NAME, ADDRESS, BASIC_SAL, DNO)

Many-to-many relationship:
For each m:n relationship type R, we create a new table (say S) to represent R. We also include the primary key
attributes of both the participating entity types as foreign key attributes in S. Any simple attributes of the m:n
relationship type (or simple components of a composite attribute) are also included as attributes of S. For
example:
The m:n relationship taught-by between the entities COURSE and FACULTY should be represented as a new table.
The structure of the table will include the primary key of COURSE and the primary key of FACULTY.
TAUGHT-BY(ID (primary key of FACULTY table), course-id (primary key of COURSE table))
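A sketch of the m:n rule in SQL (SQLite used for illustration; the sample course and faculty rows are invented): the junction table carries both primary keys as foreign keys, and their combination forms its own primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course  (course_id TEXT PRIMARY KEY, course_name TEXT);
CREATE TABLE faculty (id INTEGER PRIMARY KEY, name TEXT);
-- The m:n relationship becomes its own table; key = both foreign keys.
CREATE TABLE taught_by (
    id        INTEGER REFERENCES faculty(id),
    course_id TEXT    REFERENCES course(course_id),
    PRIMARY KEY (id, course_id)
);
INSERT INTO course  VALUES ('C1', 'DBMS'), ('C2', 'OS');
INSERT INTO faculty VALUES (10, 'Rao'), (11, 'Mehta');
INSERT INTO taught_by VALUES (10, 'C1'), (10, 'C2'), (11, 'C1');
""")
# One faculty member can teach many courses and vice versa.
pairs = conn.execute(
    "SELECT f.name, c.course_name FROM taught_by t "
    "JOIN faculty f ON f.id = t.id "
    "JOIN course  c ON c.course_id = t.course_id "
    "ORDER BY f.name, c.course_name"
).fetchall()
print(pairs)  # [('Mehta', 'DBMS'), ('Rao', 'DBMS'), ('Rao', 'OS')]
conn.close()
```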

N-ary relationship:
For each n-ary relationship type R where n>2, we create a new table S to represent R. We include as foreign
key attributes in s the primary keys of the relations that represent the participating entity types. We also include
any simple attributes of the n-ary relationship type(or simple components of complete attribute) as attributes of
S. The primary key of S is usually a combination of all the foreign keys that reference the relations representing
the participating entity types.
Multi-valued attributes:
For each multivalued attribute A, we create a new relation R that includes an attribute corresponding to A, plus
the primary key attribute K of the relation that represents the entity type or relationship type that has A as an
attribute. The primary key of R is then the combination of A and K.
For example, if a STUDENT entity has rollno, name and phone number, where phone number is a multivalued
attribute, then we will create a table PHONE(rollno, phoneno) where the primary key is the combination. In the
STUDENT table we need not have phone number; it can simply be (rollno, name).
PHONE(rollno, phoneno)

(Diagram: the entity Account (account_no, name, branch) is specialized, through an is-a relationship, into
Saving (interest) and Current (charges); moving up the hierarchy is generalization, moving down is
specialization.)

Converting Generalisation /specification hierarchy to tables:


A simple rule for conversion may be to decompose all the specialized entities into tables in case they are disjoint.
For example, for the figure we can create the three tables as:
Account(account_no, name, branch, balance)
Saving_account(account_no, interest)
Current_account(account_no, charges)

Record Based Logical Model

Hierarchical Model:

 A hierarchical database consists of a collection of records which are connected to one another through links.
 A record is a collection of fields, each of which contains only one data value.
 A link is an association between precisely two records.
 The hierarchical model differs from the network model in that the records are organized as collections of
trees rather than as arbitrary graphs.

Tree-Structure Diagrams:
 The schema for a hierarchical database consists of
o boxes, which correspond to record types
o lines, which correspond to links
 Record types are organized in the form of a rooted tree.
o No cycles in the underlying graph.
o Relationships formed in the graph must be such that only
one-to-many or one-to-one relationships exist between a parent and a child.

Database schema is represented as a collection of tree-structure diagrams.


 A single instance of a database tree:
 The root of this tree is a dummy node.
 The children of that node are actual instances of the appropriate record type.
When transforming E-R diagrams to corresponding tree-structure diagrams, we must ensure that the resulting
diagrams are in the form of rooted trees.

Single Relationships:

 Example E-R diagram with two entity sets, customer and account, related through a binary, one-to-many
relationship depositor.
 Corresponding tree-structure diagram has
o the record type customer with three fields: customer-name, customer-street, and customer-city.
o the record type account with two fields: account-number and balance
o the link depositor, with an arrow pointing to customer

 If the relationship depositor is one to one, then the link depositor has two arrows.

 Only one-to-many and one-to-one relationships can be directly represented in the hierarchical model.

Transforming Many-To-Many Relationships:

 Must consider the type of queries expected and the degree to which the database schema fits the given E-R
diagram.
 In all versions of this transformation, the underlying database tree (or trees) will have replicated records.
 Create two tree-structure diagrams, T1, with the root customer, and T2, with the root account.
 In T1, create depositor, a many-to-one link from account to customer.
 In T2, create account-customer, a many-to-one link from customer to account.

Virtual Records:
 For many-to-many relationships, record replication is necessary to preserve the tree-structure organization
of the database.
o Data inconsistency may result when updating takes place
o Waste of space is unavoidable
 Virtual record — contains no data value, only a logical pointer to a particular physical record.
 When a record is to be replicated in several database trees, a single copy of that record is kept in one of the
trees and all other records are replaced with a virtual record.
 Let R be a record type that is replicated in T1, T2, . . ., Tn. Create a new virtual record type virtual-R and
replace R in each of the n – 1 trees with a record of type virtual-R.
 Eliminate data replication in the diagram shown on page B.11; create virtual-customer and virtual-account.
 Replace account with virtual-account in the first tree, and replace customer with
virtual-customer in the second tree.
 Add a dashed line from virtual-customer to customer, and from virtual-account to account, to specify the
association between a virtual record and its corresponding physical record.
Network Model:
 Data are represented by collections of records.
o similar to an entity in the E-R model
o Records and their fields are represented as record type
 Type customer = record
customer-name: string;
customer-street: string;
customer-city: string;
end
Type account = record
account-number: integer;
balance: integer;
end
 Relationships among data are represented by links
o similar to a restricted (binary) form of an E-R relationship
o restrictions on links depend on whether the relationship is many-many, many-to-one, or one-to-one.
Data-Structure Diagrams:
 Schema representing the design of a network database.
 A data-structure diagram consists of two basic components:
o Boxes, which correspond to record types.
o Lines, which correspond to links.
 Specifies the overall logical structure of the database.

For every E-R diagram, there is a corresponding data-structure diagram.

Since a link cannot contain any data value, we represent an E-R relationship with attributes using a new record
type and links.
To represent an E-R relationship of degree 3 or higher, connect the participating record types through a new
record type that is linked directly to each of the original record types.

1. Replace entity sets account, customer, and branch with record types account, customer, and branch,
respectively.
2. Create a new record type Rlink (referred to as a dummy record type).
3. Create the following many-to-one links:
o CustRlink from Rlink record type to customer record type
o AcctRlnk from Rlink record type to account record type
o BrncRlnk from Rlink record type to branch record type
The DBTG CODASYL Model:
o All links are treated as many-to-one relationships.
o To model many-to-many relationships, a record type is defined to represent the relationship and two links are
used.

DBTG Sets:

o The structure consisting of two record types that are linked together is referred to in the DBTG model as a
DBTG set.
o In each DBTG set, one record type is designated as the owner, and the other is designated as the member, of
the set.
o Each DBTG set can have any number of set occurrences (actual instances of linked records).
o Since many-to-many links are disallowed, each set occurrence has precisely one owner, and has zero or more
member records.
o No member record of a set can participate in more than one occurrence of the set at any point.
o A member record can participate simultaneously in several set occurrences of
different DBTG sets.
UNIT-II: Relational Algebra
 Relational Algebra is a procedural query language, which takes a relation as input and generates a relation as
output. Relational algebra mainly provides the theoretical foundation for relational databases and SQL.
 Being procedural, a relational algebra expression specifies both what data is to be retrieved and how to
retrieve it.
 Relational Algebra works on the whole table at once, so we do not have to use loops etc. to iterate over all the
rows (tuples) of data one by one.
 All we have to do is specify the table name from which we need the data, and in a single line of command,
relational algebra will traverse the entire given table to fetch the data for you.
Basic/Fundamental Operations:

1. Select (σ)
2. Project (∏)
3. Union (𝖴)
4. Set Difference (-)
5. Cartesian product (X)
6. Rename (ρ)

Select Operation (σ): This is used to fetch rows (tuples) from a table (relation) which satisfy a given condition.
Syntax: σp(r)

 σ is the selection operator
 p is the selection predicate (a propositional logic condition)
 r stands for the relation, which is the name of the table
Example: σage > 17 (Student)
This will fetch the tuples (rows) from the table Student for which age is greater than 17.
σage > 17 and gender = 'Male' (Student)
This will return tuples (rows) from the table Student with information of male students of age more than 17.

BRANCH_NAME LOAN_NO AMOUNT


Downtown L-17 1000
Redwood L-23 2000
Perryride L-15 1500
Downtown L-14 1500
Mianus L-13 500
Roundhill L-11 900
Perryride L-16 1300

Input:
σ BRANCH_NAME = "Perryride" (LOAN)

Output:

BRANCH_NAME LOAN_NO AMOUNT


Perryride L-15 1500
Perryride L-16 1300
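The select operation can be emulated in plain Python over the LOAN table above, treating the relation as a list of tuples (this is only an illustration of σ, not how a DBMS implements it):

```python
# The LOAN relation from the notes as (branch_name, loan_no, amount) tuples.
loan = [
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perryride", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-13", 500), ("Roundhill", "L-11", 900),
    ("Perryride", "L-16", 1300),
]

def select(predicate, relation):
    # σ_p(r): keep only the tuples that satisfy the predicate p.
    return [t for t in relation if predicate(t)]

result = select(lambda t: t[0] == "Perryride", loan)
print(result)  # [('Perryride', 'L-15', 1500), ('Perryride', 'L-16', 1300)]
```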

Project Operation (∏):


The project operation is used to project only a certain set of attributes of a relation. In simple words, if you want to
see only the names of all of the students in the Student table, then you can use the Project Operation.
It will only project or show the columns or attributes asked for, and will also remove duplicate data from the
columns.
Syntax of Project Operator (∏)
∏ column_name1, column_name2, .... , column_nameN(table_name)
Example:
∏Name, Age(Student)
The above statement will show us only the Name and Age columns for all the rows of data in the Student table.

Example: CUSTOMER RELATION

NAME STREET CITY


Jones Main Harrison
Smith North Rye
Hays Main Harrison
Curry North Rye
Johnson Alma Brooklyn
Brooks Senator Brooklyn

Input:
∏ NAME, CITY (CUSTOMER)

Output:

NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
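Projection, including its duplicate elimination, can be sketched in Python over a few of the CUSTOMER rows above (the relation is modeled as a list of dicts for readability):

```python
# A few rows of the CUSTOMER relation from the notes.
customer = [
    {"name": "Jones", "street": "Main",  "city": "Harrison"},
    {"name": "Smith", "street": "North", "city": "Rye"},
    {"name": "Hays",  "street": "Main",  "city": "Harrison"},
]

def project(attrs, relation):
    # ∏_attrs(r): keep the chosen columns and drop duplicate tuples.
    seen, out = set(), []
    for row in relation:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

print(project(["name", "city"], customer))
# Projecting onto city alone shows the duplicate elimination:
print(project(["city"], customer))  # [('Harrison',), ('Rye',)]
```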

Union Operation (𝖴):

This operation is used to fetch data from two relations (tables) or temporary relations (the result of another
operation). For this operation to work, the relations (tables) specified should have the same number of
attributes (columns) and the same attribute domains. Duplicate tuples are automatically eliminated
from the result.

Syntax: A 𝖴 B
∏Student(RegularClass) 𝖴 ∏Student(ExtraClass)
Example:

DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284

BORROW RELATION
CUSTOMER_NAME LOAN_NO

Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17

Input:
∏ CUSTOMER_NAME (BORROW) 𝖴 ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
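Using the customer names from the two relations above, the union (with its duplicate elimination) can be sketched in Python as a set union:

```python
# Customer names from BORROW and DEPOSITOR; note the repeats within
# each list (Smith borrows twice, Johnson deposits twice).
borrow_names = ["Jones", "Smith", "Hayes", "Jackson",
                "Curry", "Smith", "Williams"]
depositor_names = ["Johnson", "Smith", "Mayes", "Turner",
                   "Johnson", "Jones", "Lindsay"]

# Union eliminates duplicates both within and across the operands.
union = sorted(set(borrow_names) | set(depositor_names))
print(union)
```

Sorting is only for a deterministic display; relational algebra itself treats the result as an unordered set.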
Set Difference (-):
This operation is used to find data present in one relation and not present in the second relation. This
operation is also applicable on two relations, just like the Union operation.
Syntax: A - B
where A and B are relations.
For example, if we want to find the names of students who attend the regular class but not the extra class,
we can use the below operation:
∏Student(RegularClass) - ∏Student(ExtraClass)

Input: ∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME
Hayes
Jackson
Curry
Williams
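Set difference over the BORROW and DEPOSITOR customer names can be sketched the same way in Python, with the `-` operator on sets:

```python
# Distinct customer names from the BORROW and DEPOSITOR relations.
borrow_names = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Williams"}
depositor_names = {"Johnson", "Smith", "Mayes", "Turner", "Jones", "Lindsay"}

# BORROW - DEPOSITOR: customers who borrow but do not deposit.
difference = sorted(borrow_names - depositor_names)
print(difference)  # ['Curry', 'Hayes', 'Jackson', 'Williams']
```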

Cartesian Product (X):


This is used to combine data from two different relations (tables) into one and fetch data from the combined
relation.
Syntax: A X B
For example, if we want to find the information for the Regular Class and Extra Class which are conducted
during the morning, we can use the following operation:
σtime = 'morning' (RegularClass X ExtraClass)
For the above query to work, both RegularClass and ExtraClass should have the attribute time.
Example:

EMPLOYEE
EMP_ID EMP_NAME EMP_DEPT
1 Smith A
2 Harry C
3 John B

DEPARTMENT
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal

Input:
EMPLOYEE X DEPARTMENT
Output:

EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME


1 Smith A A Marketing
1 Smith A B Sales

1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal

3 John B A Marketing
3 John B B Sales
3 John B C Legal
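The Cartesian product of the EMPLOYEE and DEPARTMENT tables above can be sketched with itertools.product, confirming that 3 rows × 3 rows give 9 result rows:

```python
from itertools import product

# EMPLOYEE and DEPARTMENT tuples from the notes.
employee = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
department = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# E X D: every employee tuple paired with every department tuple.
cartesian = [e + d for e, d in product(employee, department)]
print(len(cartesian))  # 9
print(cartesian[0])    # (1, 'Smith', 'A', 'A', 'Marketing')
```

Note how most of the 9 rows pair an employee with a department they do not belong to; a join is exactly a Cartesian product followed by a selection on the matching columns.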

Rename Operation (ρ):


This operation is used to rename the output relation of any query operation which returns a result, like
Select, Project etc., or to simply rename a relation (table).
Syntax: ρ(RelationNew, RelationOld)

The rename operation is used to rename the output relation. It is denoted by rho (ρ).

Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)

Join in DBMS:

 A JOIN clause is used to combine rows from two or more tables, based on a related column between
them.
 Join in DBMS is a binary operation which allows you to combine join product and selection in
one single statement.
 The goal of creating a join condition is that it helps you to combine the data from two or more
DBMS tables.
 The tables in DBMS are associated using the primary key and foreign keys.

Types of SQL JOIN


1. INNER JOIN

2. LEFT JOIN

3. RIGHT JOIN
4. FULL JOIN
Table name: EMPLOYEE

EMP_ID EMP_NAME CITY SALARY AGE


1 Angelina Chicago 200000 30
2 Robert Austin 300000 26
3 Christian Denver 100000 42
4 Kristen Washington 500000 29
5 Russell Los angels 200000 36
6 Marry Canada 600000 48

PROJECT
PROJECT_NO EMP_ID DEPARTMENT
101 1 Testing
102 2 Development
103 3 Designing
104 4 Development

1. INNER JOIN

In SQL, INNER JOIN selects records that have matching values in both tables, as long as the condition is
satisfied.

It returns the combination of all rows from both the tables where the condition satisfies.

Syntax
SELECT table1.column1, table1.column2 FROM table1 INNER JOIN table2
ON table1.matching_column = table2.matching_column;

Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT FROM EMPLOYEE INNER JOIN
PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development

Christian Designing

Kristen Development

2. LEFT JOIN

The SQL left join returns all the rows from the left table and the matching values from the right table. If
there is no matching join value, it returns NULL.

Syntax
SELECT table1.column1, table1.column2 FROM table1 LEFT JOIN table2
ON table1.matching_column = table2.matching_column;

Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT FROM EMPLOYEE LEFT JOIN
PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Marry NULL
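The difference between INNER JOIN and LEFT JOIN can be seen side by side. A sketch in Python's sqlite3 with a trimmed-down version of the tables above (Russell has no project, so LEFT JOIN keeps him with a NULL department):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EMPLOYEE (EMP_ID INT, EMP_NAME TEXT);
CREATE TABLE PROJECT  (PROJECT_NO INT, EMP_ID INT, DEPARTMENT TEXT);
INSERT INTO EMPLOYEE VALUES (1,'Angelina'), (2,'Robert'), (5,'Russell');
INSERT INTO PROJECT  VALUES (101,1,'Testing'), (102,2,'Development');
""")

# INNER JOIN keeps only matched rows.
inner = con.execute("""
    SELECT E.EMP_NAME, P.DEPARTMENT
    FROM EMPLOYEE E INNER JOIN PROJECT P ON P.EMP_ID = E.EMP_ID
""").fetchall()

# LEFT JOIN also keeps unmatched left rows, padding with NULL.
left = con.execute("""
    SELECT E.EMP_NAME, P.DEPARTMENT
    FROM EMPLOYEE E LEFT JOIN PROJECT P ON P.EMP_ID = E.EMP_ID
""").fetchall()
```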

3.RIGHT JOIN

In SQL, RIGHT JOIN returns all the rows from the right table and the matched values from the left table.
If there is no match, it returns NULL.

Syntax
SELECT table1.column1, table1.column2 FROM table1 RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;

Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT FROM EMPLOYEE RIGHT JOIN
PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
4. FULL JOIN

In SQL, FULL JOIN combines the results of both left and right outer joins. The joined table has
all the records from both tables and puts NULL where a match is not found.

Syntax
SELECT table1.column1, table1.column2 FROM table1 FULL JOIN table2
ON table1.matching_column = table2.matching_column;

Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT FROM EMPLOYEE
FULL JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Marry NULL
Division Operator in SQL
Division Operator (÷): The division operator A ÷ B can be applied if and only if:
 Attributes of B are a proper subset of attributes of A.
 The relation returned by the division operator will have attributes = (All attributes of A – All
attributes of B).
 The relation returned by the division operator will contain those tuples from relation A which are
associated with every tuple of B.

The division operator is used when we have to evaluate queries which contain the keyword ALL.
Table 1: Course_Taken → It consists of the names of Students against the courses that they have
taken.

Student_Name Course

Robert Databases

Robert Programming Languages

David Databases

David Operating Systems

Hannah Programming Languages

Hannah Machine Learning

Tom Operating Systems


Table 2: Course_Required → It consists of the courses that one is required to take in order to
graduate.

Course

Databases

Programming Languages

Find all the students. Create a set of all students that have taken courses. This can be done easily using the
following command.
CREATE TABLE AllStudents AS SELECT DISTINCT Student_Name FROM Course_Taken
This command will return the table AllStudents, as the resultset:

Student_name

Robert

David

Hannah

Tom
Find all the students and the courses required to graduate
Next, we will create a set of students and the courses they need to graduate. We can express this in the
form of Cartesian Product of AllStudents and Course_Required using the following command.
CREATE TABLE StudentsAndRequired AS
SELECT AllStudents.Student_Name, Course_Required.Course
FROM AllStudents, Course_Required
Now, the new resultset - table StudentsAndRequired will be:

Student_Name Course

Robert Databases

Robert Programming Languages

David Databases

David Programming Languages

Hannah Databases

Hannah Programming Languages

Tom Databases

Tom Programming Languages
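The walkthrough above builds AllStudents and StudentsAndRequired but stops before the final division step. A common way to finish it — find the students for whom no required course is missing — is a double NOT EXISTS. A sketch in Python's sqlite3 using the tables above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Course_Taken    (Student_Name TEXT, Course TEXT);
CREATE TABLE Course_Required (Course TEXT);
INSERT INTO Course_Taken VALUES
 ('Robert','Databases'), ('Robert','Programming Languages'),
 ('David','Databases'),  ('David','Operating Systems'),
 ('Hannah','Programming Languages'), ('Hannah','Machine Learning'),
 ('Tom','Operating Systems');
INSERT INTO Course_Required VALUES ('Databases'), ('Programming Languages');
""")

# Division: students for whom there is NO required course they have NOT taken.
rows = con.execute("""
    SELECT DISTINCT t.Student_Name
    FROM Course_Taken t
    WHERE NOT EXISTS (
        SELECT 1 FROM Course_Required r
        WHERE NOT EXISTS (
            SELECT 1 FROM Course_Taken t2
            WHERE t2.Student_Name = t.Student_Name
              AND t2.Course = r.Course))
""").fetchall()
```

Only Robert has taken both required courses, so only he satisfies the division.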


Relational Calculus:
Relational calculus is a non-procedural query language that tells the system what data to be retrieved but
doesn’t tell how to retrieve it. Relational Calculus exists in two forms:
1. Tuple Relational Calculus (TRC)
2. Domain Relational Calculus (DRC)
Tuple Relational Calculus (TRC)
Tuple relational calculus is used for selecting those tuples that satisfy the given condition.
Table: Student
First_Name Last_Name Age

Ajeet Singh 30
Chaitanya Singh 31
Rajeev Bhatia 27
Carl Pratap 28
Let's write some relational calculus queries.
Query to display the last name of those students where age is greater than 30
{ t.Last_Name | Student(t) AND t.age > 30 }

In the above query you can see two parts separated by the | symbol. The second part is where we define the
condition, and in the first part we specify the fields which we want to display for the selected tuples.
The result of the above query would be:
Last_Name

Singh

Query to display all the details of students where Last name is ‘Singh’:

{ t | Student(t) AND t.Last_Name = 'Singh' }

Output:
First_Name Last_Name
Age

Ajeet Singh 30
Chaitanya Singh 31

Ex:
Table-1: Customer
Customer name Street City
Saurabh A7 Patiala
Mehak B6 Jalandhar
Sumiti D9 Ludhiana
Ria A5 Patiala
Table-2: Branch
Branch name Branch city
ABC Patiala
DEF Ludhiana
GHI Jalandhar
Table-3: Account
Account number Branch name Balance
1111 ABC 50000
1112 DEF 10000
1113 GHI 9000
1114 ABC 7000
Table-4: Loan
Loan number Branch name Amount
L33 ABC 10000
L35 DEF 15000
L49 GHI 9000
L98 DEF 65000
Table-5: Borrower
Customer name Loan number
Saurabh L33
Mehak L49
Ria L98

Table-6: Depositor
Customer name Account number
Saurabh 1111
Mehak 1113
Sumiti 1114

Queries-1: Find the loan number, branch, amount of loans of greater than or equal to 10000 amount.

{t | t ∈ loan ∧ t[amount] >= 10000}

Resulting relation:


Loan number Branch name Amount
L33 ABC 10000
L35 DEF 15000
L98 DEF 65000

Domain Relational Calculus (DRC)


In domain relational calculus the records are filtered based on the domains. Again we take the same
table to understand how DRC works.
Table: Student

First_Name Last_Name Age

Ajeet Singh 30
Chaitanya Singh 31
Rajeev Bhatia 27
Carl Pratap 28

Query to find the first name and age of students where student age is greater than 27:

{< First_Name, Age > | < First_Name, Last_Name, Age > ∈ Student ∧ Age > 27}

Note:
The symbols used for logical operators are: ∧ for AND, ∨ for OR and ¬ for NOT.
Output:
First_Name Age

Ajeet 30
Chaitanya 31
Carl 28

SQL Basic Structure


Basic structure of an SQL expression consists of select, from and where clauses.
o select clause lists attributes to be copied - corresponds to relational algebra project.
o from clause corresponds to Cartesian product - lists relations to be used.
o where clause corresponds to selection predicate in relational algebra.

The SELECT statement is used to select data from a database.

The data returned is stored in a result table, called the result-set. To fetch the entire table or all the fields in
the table:
SELECT * FROM table_name;
To fetch individual column data:
SELECT column1, column2 FROM table_name;

WHERE SQL clause


The WHERE clause is used to specify/apply a condition while retrieving, updating or deleting data from a
table. This clause is used mostly with SELECT, UPDATE and DELETE queries.
The basic syntax of the SELECT statement with the WHERE clause is as shown below. SELECT column1,
column2, columnN
FROM table_name
WHERE [condition]

Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

The following code is an example which would fetch the ID, Name and Salary fields fromthe
CUSTOMERS table, where the salary is greater than 2000 −
SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000;

This would produce the following result −


+----+----------+----------+
| ID | NAME     | SALARY   |
+----+----------+----------+
|  4 | Chaitali |  6500.00 |
|  5 | Hardik   |  8500.00 |
|  6 | Komal    |  4500.00 |
|  7 | Muffy    | 10000.00 |
+----+----------+----------+
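The WHERE example can be reproduced end to end. A sketch with Python's sqlite3 and the CUSTOMERS data above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (ID INT, NAME TEXT, AGE INT, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?,?,?,?)", [
    (1, 'Ramesh',   32, 'Ahmedabad',  2000),
    (2, 'Khilan',   25, 'Delhi',      1500),
    (3, 'kaushik',  23, 'Kota',       2000),
    (4, 'Chaitali', 25, 'Mumbai',     6500),
    (5, 'Hardik',   27, 'Bhopal',     8500),
    (6, 'Komal',    22, 'MP',         4500),
    (7, 'Muffy',    24, 'Indore',    10000)])

# Filter with a WHERE predicate: only salaries strictly above 2000 qualify.
rows = con.execute(
    "SELECT ID, NAME, SALARY FROM CUSTOMERS WHERE SALARY > 2000").fetchall()
```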

From clause:
The FROM clause can be used to specify a sub-query expression in SQL. The relation produced by the
sub-query is then used as a new relation on which the outer query is applied.
 Sub queries in the from clause are supported by most of the SQL implementations.

 The correlation variables from the relations in from clause cannot be used in the sub-queries
in the from clause.

Syntax:
SELECT column1, column2
FROM (SELECT column_x AS C1, column_y FROM table WHERE PREDICATE_X) AS table2
WHERE PREDICATE;
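A concrete derived-table query following the syntax above (the column and predicate choices are illustrative): the inner query keeps customers earning over 2000, and the outer query filters that derived relation again.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (ID INT, NAME TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?,?)", [
    (1, 'Ramesh', 2000), (2, 'Khilan', 1500), (3, 'kaushik', 2000),
    (4, 'Chaitali', 6500), (5, 'Hardik', 8500), (6, 'Komal', 4500),
    (7, 'Muffy', 10000)])

# The sub-query in FROM acts as a temporary relation named "rich".
rows = con.execute("""
    SELECT NAME, SALARY
    FROM (SELECT NAME, SALARY FROM CUSTOMERS WHERE SALARY > 2000) AS rich
    WHERE SALARY < 9000
""").fetchall()
```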

SET Operations
SQL supports a few set operations which can be performed on table data. These are used to get
meaningful results from data stored in the table, under different special conditions.
In this tutorial, we will cover 4 different types of SET operations, along with examples:

1. UNION
2. UNION ALL
3. INTERSECT
4. MINUS

1. Union

o The SQL Union operation is used to combine the result of two or more SQL SELECT
queries.
o In the union operation, the number of columns and their data types must be the same in both
the tables on which the UNION operation is being applied.
The union operation eliminates the duplicate rows from its result set.

Syntax
SELECT column_name FROM table1 UNION
SELECT column_name FROM table2;
The First table

ID NAME
1 Jack
2 Harry
3 Jackson

The Second table

ID NAME
3 Jackson
4 Stephan
5 David
Union SQL query will be:
SELECT * FROM First UNION
SELECT * FROM Second;
The resultset table will look like:

ID NAME
1 Jack
2 Harry
3 Jackson
4 Stephan
5 David
2. Union All
The Union All operation is similar to Union, but it returns the result set without removing duplicates
or sorting the data.

Syntax:
SELECT column_name FROM table1 UNION ALL
SELECT column_name FROM table2;
Example: Using the above First and Second table. Union All query will be like:
SELECT * FROM First UNION ALL
SELECT * FROM Second;

The resultset table will look like:

ID NAME
1 Jack
2 Harry
3 Jackson
3 Jackson
4 Stephan
5 David
3. Intersect

o It is used to combine two SELECT statements. The Intersect operation returns thecommon
rows from both the SELECT statements.
o In the Intersect operation, the number of data type and columns must be the same.
o It has no duplicates and it arranges the data in ascending order by default.

Syntax
SELECT column_name FROM table1 INTERSECT
SELECT column_name FROM table2;

Example:
Using the above First and second table, Intersect query will be:

SELECT * FROM First INTERSECT


SELECT * FROM Second;
The result set table will look like:

ID NAME
3 Jackson

4. Minus

o It combines the results of two SELECT statements. The Minus operator is used to display the rows
which are present in the first query but absent in the second query. (MINUS is Oracle's name for
this operator; standard SQL and most other databases call it EXCEPT.)
o It has no duplicates and data arranged in ascending order by default.

Syntax:
SELECT column_name FROM table1 MINUS
SELECT column_name FROM table2;

Example:

Using the above First and Second table, Minus query will be:
SELECT * FROM First MINUS
SELECT * FROM Second;

The result set table will look like:

ID NAME
1 Jack
2 Harry
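All four set operations can be tried on the First and Second tables above. Note that SQLite (used here via Python) spells MINUS as the standard EXCEPT:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE First  (ID INT, NAME TEXT);
CREATE TABLE Second (ID INT, NAME TEXT);
INSERT INTO First  VALUES (1,'Jack'), (2,'Harry'), (3,'Jackson');
INSERT INTO Second VALUES (3,'Jackson'), (4,'Stephan'), (5,'David');
""")

union     = con.execute("SELECT * FROM First UNION     SELECT * FROM Second").fetchall()
union_all = con.execute("SELECT * FROM First UNION ALL SELECT * FROM Second").fetchall()
intersect = con.execute("SELECT * FROM First INTERSECT SELECT * FROM Second").fetchall()
minus     = con.execute("SELECT * FROM First EXCEPT    SELECT * FROM Second").fetchall()
# UNION drops the duplicate Jackson row; UNION ALL keeps it;
# INTERSECT keeps only Jackson; EXCEPT keeps Jack and Harry.
```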
Aggregate functions in SQL
o SQL aggregation function is used to perform calculations on multiple rows of a single
column of a table. It returns a single value.
o It is also used to summarize the data.

Aggregate Functions
1) Count()
2) Sum()
3) Avg()
4) Min()
5) Max()
1. COUNT FUNCTION

o COUNT function is used to count the number of rows in a database table. It can work on
both numeric and non-numeric data types.
o COUNT function uses COUNT(*), which returns the count of all the rows in a specified
table. COUNT(*) counts duplicates and NULLs.
Count(*): Returns total number of records

PRODUCT_MAST

PRODUCT COMPANY QTY RATE COST


Item1 Com1 2 10 20
Item2 Com2 3 25 75
Item3 Com1 2 30 60
Item4 Com3 5 10 50
Item5 Com2 2 20 40
Item6 Com1 3 25 75
Item7 Com1 5 30 150
Item8 Com1 3 10 30
Item9 Com2 2 25 50
Item10 Com3 4 30 120

Example: COUNT()
SELECT COUNT(*) FROM PRODUCT_MAST;

Output: 10

Example: COUNT with WHERE


SELECT COUNT(*) FROM PRODUCT_MAST WHERE RATE>=20;

Output:7

Example: COUNT() with DISTINCT

SELECT COUNT(DISTINCT COMPANY) FROM PRODUCT_MAST;

Output:3

2. SUM Function

Sum function is used to calculate the sum of all selected columns. It works on numeric fields only.
Syntax:
SUM()
or
SUM( [ALL|DISTINCT] expression )

Example: SUM()
SELECT SUM(COST) FROM PRODUCT_MAST;
Output:670

Example: SUM() with WHERE


SELECT SUM(COST) FROM PRODUCT_MAST WHERE QTY>3;
Output:320

3. AVG function

The AVG function is used to calculate the average value of a numeric column. The AVG function returns
the average of all non-NULL values.

Syntax
AVG()

Example:
SELECT AVG(COST) FROM PRODUCT_MAST;

Output:
67.00

4. MAX Function

MAX function is used to find the maximum value of a certain column. This function determines the
largest value of all selected values of a column.

Syntax: MAX()

Example:
SELECT MAX(RATE) FROM PRODUCT_MAST;
Output: 30

5. MIN Function

MIN function is used to find the minimum value of a certain column. This function determines the
smallest value of all selected values of a column.

Syntax: MIN()

Example: SELECT MIN(RATE) FROM PRODUCT_MAST;

Output:10
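All five aggregate functions can be verified in one query over PRODUCT_MAST. A sketch in Python's sqlite3 with the data above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE PRODUCT_MAST (PRODUCT TEXT, COMPANY TEXT, QTY INT, RATE INT, COST INT)")
con.executemany("INSERT INTO PRODUCT_MAST VALUES (?,?,?,?,?)", [
    ('Item1','Com1',2,10,20),  ('Item2','Com2',3,25,75),  ('Item3','Com1',2,30,60),
    ('Item4','Com3',5,10,50),  ('Item5','Com2',2,20,40),  ('Item6','Com1',3,25,75),
    ('Item7','Com1',5,30,150), ('Item8','Com1',3,10,30),  ('Item9','Com2',2,25,50),
    ('Item10','Com3',4,30,120)])

# COUNT, SUM, AVG, MAX and MIN computed in a single pass over the table.
count, total, avg, mx, mn = con.execute(
    "SELECT COUNT(*), SUM(COST), AVG(COST), MAX(RATE), MIN(RATE) FROM PRODUCT_MAST"
).fetchone()
```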

GROUP BY Statement

The GROUP BY statement groups rows that have the same values into summary rows, like "find the
number of customers in each country".
The GROUP BY statement is often used with aggregate functions (COUNT, MAX, MIN, SUM,
AVG) to group the result-set by one or more columns.
Syntax:

SELECT column_name(s)
FROM table_name
WHERE condition


GROUP BY column_name(s)
ORDER BY column_name(s);

Example:
 Group By single column: Group By single column means, to place all the rows with same
value of only that particular column in one group. Consider the query as shown below:
 SELECT NAME, SUM(SALARY) FROM Employee
 GROUP BY NAME;
This query returns one row per employee name, with the total salary for that name.

Group By multiple columns: Group by multiple column is say for example, GROUP BY
column1, column2. This means to place all the rows with same values of both the columns
column1 and column2 in one group. Consider the below query:
SELECT SUBJECT, YEAR, Count(*)
FROM Student
GROUP BY SUBJECT, YEAR;
HAVING Clause:

We know that the WHERE clause is used to place conditions on columns, but what if we want to place
conditions on groups?
This is where the HAVING clause comes into use. We can use the HAVING clause to place conditions
that decide which groups will be part of the final result set. Also, we cannot use aggregate functions
like SUM(), COUNT() etc. with the WHERE clause, so we have to use the HAVING clause if we want
to use any of these functions in the conditions.
Syntax:
SELECT column1, function_name(column2)
FROM table_name
WHERE condition
GROUP BY column1, column2
HAVING condition
ORDER BY column1, column2;
function_name: Name of the function used, for example SUM(), AVG().
table_name: Name of the table.
condition: Condition used.
Example:
SELECT NAME, SUM(SALARY) FROM Employee
GROUP BY NAME
HAVING SUM(SALARY)>3000;

Example:
Consider the CUSTOMERS table having the following records.
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example which would display a record for each age whose count is greater than or
equal to 2.
SQL > SELECT ID, NAME, AGE, ADDRESS, SALARY
FROM CUSTOMERS
GROUP BY age
HAVING COUNT(age) >= 2;

This would produce the following result −


+----+--------+-----+---------+---------+
| ID | NAME   | AGE | ADDRESS | SALARY  |
+----+--------+-----+---------+---------+
|  2 | Khilan |  25 | Delhi   | 1500.00 |
+----+--------+-----+---------+---------+
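The GROUP BY / HAVING pattern can be reproduced in Python's sqlite3: only the age-25 group contains two or more customers, so HAVING keeps just that group.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (ID INT, NAME TEXT, AGE INT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?,?,?)", [
    (1, 'Ramesh', 32, 2000), (2, 'Khilan', 25, 1500), (3, 'kaushik', 23, 2000),
    (4, 'Chaitali', 25, 6500), (5, 'Hardik', 27, 8500), (6, 'Komal', 22, 4500),
    (7, 'Muffy', 24, 10000)])

# HAVING filters entire groups after aggregation, unlike WHERE.
rows = con.execute("""
    SELECT AGE, COUNT(*)
    FROM CUSTOMERS
    GROUP BY AGE
    HAVING COUNT(*) >= 2
""").fetchall()
```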
Nested Queries

In nested queries, a query is written inside another query. The result of the inner query is used in the
execution of the outer query. We will use the STUDENT, COURSE, and
STUDENT_COURSE tables for understanding nested queries.
STUDENT
S_ID S_NAME S_ADDRESS S_PHONE S_AGE
S1 RAM DELHI 9455123451 18
S2 RAMESH GURGAON 9652431543 18
S3 SUJIT ROHTAK 9156253131 20
S4 SURESH DELHI 9156768971 18

COURSE
C_ID C_NAME
C1 DSA
C2 Programming
C3 DBMS

STUDENT_COURSE
S_ID C_ID
S1 C1
S1 C3
S2 C1
S3 C2
S4 C2
S4 C3

Example
Consider the CUSTOMERS table having the following records −

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  35 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Now, let us check the following subquery with a SELECT statement.

SQL> SELECT *
FROM CUSTOMERS
WHERE ID IN (SELECT ID
FROM CUSTOMERS
WHERE SALARY > 4500);

This would produce the following result.

+----+----------+-----+---------+----------+
| ID | NAME     | AGE | ADDRESS | SALARY   |
+----+----------+-----+---------+----------+
|  4 | Chaitali |  25 | Mumbai  |  6500.00 |
|  5 | Hardik   |  27 | Bhopal  |  8500.00 |
|  7 | Muffy    |  24 | Indore  | 10000.00 |
+----+----------+-----+---------+----------+

Students
id name class_id GPA
1 Jack Black 3 3.45
2 Daniel White 1 3.15
3 Kathrine Star 1 3.85
4 Helen Bright 2 3.10
5 Steve May 2 2.40

Teachers
id name subject class_id monthly_salary
1 Elisabeth Grey History 3 2,500
2 Robert Sun Literature [NULL] 2,000
3 John Churchill English 1 2,350
4 Sara Parker Math 2 3,000
Classes
id grade teacher_id number_of_students
1 10 3 21
2 11 4 25
3 12 1 28

SELECT *
FROM students
WHERE GPA > (SELECT AVG(GPA) FROM students);

RESULT:
id name class_id GPA
1 Jack Black 3 3.45
3 Kathrine Star 1 3.85
SELECT AVG(number_of_students) FROM classes
WHERE teacher_id IN (SELECT id FROM teachers
                     WHERE subject = 'English' OR subject = 'History');
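The classes/teachers sub-query can be executed as-is. In the data above, classes 1 (21 students) and 3 (28 students) are taught by the English and History teachers, so the average is 24.5. A sketch in Python's sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE teachers (id INT, name TEXT, subject TEXT, class_id INT, monthly_salary INT);
CREATE TABLE classes  (id INT, grade INT, teacher_id INT, number_of_students INT);
INSERT INTO teachers VALUES
 (1,'Elisabeth Grey','History',3,2500), (2,'Robert Sun','Literature',NULL,2000),
 (3,'John Churchill','English',1,2350), (4,'Sara Parker','Math',2,3000);
INSERT INTO classes VALUES (1,10,3,21), (2,11,4,25), (3,12,1,28);
""")

# The inner query yields teacher ids 1 and 3; the outer query averages
# the sizes of the classes those teachers teach.
avg = con.execute("""
    SELECT AVG(number_of_students) FROM classes
    WHERE teacher_id IN (SELECT id FROM teachers
                         WHERE subject = 'English' OR subject = 'History')
""").fetchone()[0]
```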

Views in SQL
o Views in SQL are considered as a virtual table. A view also contains rows and columns.
o To create the view, we can select the fields from one or more tables present in the database.
o A view can either have specific rows based on a certain condition or all the rows of a table.

Student_Detail
STU_ID NAME ADDRESS
1 Stephan Delhi
2 Kathrin Noida
3 David Ghaziabad
4 Alina Gurugram
Student_Marks
STU_ID NAME MARKS AGE
1 Stephan 97 19
2 Kathrin 86 21
3 David 74 18
4 Alina 90 20
5 John 96 18

1. Creating view

A view can be created using the CREATE VIEW statement. We can create a view from a single
table or multiple tables.

Syntax:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE condition;

2. Creating View from a single table


Query:
CREATE VIEW DetailsView AS
SELECT NAME, ADDRESS
FROM Student_Detail
WHERE STU_ID < 4;

Just like a table, we can query the view to see its data:
SELECT * FROM DetailsView;
Output:
NAME ADDRESS
Stephan Delhi
Kathrin Noida
David Ghaziabad

3. Creating View from multiple tables


A view from multiple tables can be created by simply including multiple tables in the SELECT
statement.
In the given example, a view named MarksView is created from two tables, Student_Detail and
Student_Marks.
Query:
CREATE VIEW MarksView AS
SELECT Student_Detail.NAME, Student_Detail.ADDRESS, Student_Marks.MARKS
FROM Student_Detail, Student_Marks
WHERE Student_Detail.NAME = Student_Marks.NAME;

To display data of View MarksView:
SELECT * FROM MarksView;
NAME ADDRESS MARKS
Stephan Delhi 97
Kathrin Noida 86
David Ghaziabad 74
Alina Gurugram 90

4. Deleting View
A view can be deleted using the Drop View statement.
Syntax:
DROP VIEW view_name;
Example:
If we want to delete the View MarksView, we can do this as:
DROP VIEW MarksView;
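Creating, querying and dropping a view can be exercised end to end, following the DetailsView example above (a sketch in Python's sqlite3):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Student_Detail (STU_ID INT, NAME TEXT, ADDRESS TEXT);
INSERT INTO Student_Detail VALUES
 (1,'Stephan','Delhi'), (2,'Kathrin','Noida'),
 (3,'David','Ghaziabad'), (4,'Alina','Gurugram');

-- The view stores the query, not the data; STU_ID < 4 keeps three rows.
CREATE VIEW DetailsView AS
    SELECT NAME, ADDRESS FROM Student_Detail WHERE STU_ID < 4;
""")

rows = con.execute("SELECT * FROM DetailsView").fetchall()
con.execute("DROP VIEW DetailsView")  # the base table is unaffected
```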

Uses of a View :
A good database should contain views due to the given reasons:
1. Restricting data access –
Views provide an additional level of table security by restricting access to apredetermined set
of rows and columns of a table.
2. Hiding data complexity –
A view can hide the complexity that exists in a multiple table join.
3. Simplify commands for the user –
Views allows the user to select information from multiple tables without requiring theusers to
actually know how to perform a join.
4. Store complex queries –
Views can be used to store complex queries.
5. Rename Columns –
Views can also be used to rename columns without affecting the base tables, provided the
number of columns in the view matches the number of columns specified in the select statement.
Thus, renaming helps to hide the names of the columns of the base tables.
6. Multiple view facility –
Different views can be created on the same table for different users.

Trigger: A trigger is a stored procedure in a database which is automatically invoked whenever a
special event occurs in the database. For example, a trigger can be invoked when a row is inserted
into a specified table or when certain table columns are being updated.
Syntax:
create trigger [trigger_name] [before | after]
{insert | update | delete} on [table_name]
[for each row] [trigger_body]

Explanation of syntax:

1. create trigger [trigger_name]: Creates or replaces an existing trigger with the trigger_name.


2. [before | after]: This specifies when the trigger will be executed.
3. {insert | update | delete}: This specifies the DML operation.
4. on [table_name]: This specifies the name of the table associated with the trigger.
5. [for each row]: This specifies a row-level trigger, i.e., the trigger will be executed for each row
being affected.
6. [trigger_body]: This provides the operation to be performed as trigger is fired
BEFORE and AFTER of Trigger:
BEFORE triggers run the trigger action before the triggering statement is run. AFTER triggers run
the trigger action after the triggering statement is run.
Example:
Given a Student Report Database in which students' marks assessments are recorded, create a trigger
so that the total and percentage of the specified marks are automatically inserted whenever a record
is inserted.
Here, as the trigger will be invoked before the record is inserted, the BEFORE tag can be used.
Suppose the database Schema –
mysql> desc Student;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| tid   | int(4)      | NO   | PRI | NULL    | auto_increment |
| name  | varchar(30) | YES  |     | NULL    |                |
| subj1 | int(2)      | YES  |     | NULL    |                |
| subj2 | int(2)      | YES  |     | NULL    |                |
| subj3 | int(2)      | YES  |     | NULL    |                |
| total | int(3)      | YES  |     | NULL    |                |
| per   | int(3)      | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
SQL trigger for the problem statement (in a MySQL BEFORE INSERT trigger, the incoming row is
addressed through NEW):
create trigger stud_marks
before INSERT on Student
for each row
set NEW.total = NEW.subj1 + NEW.subj2 + NEW.subj3,
    NEW.per = NEW.total * 60 / 100;
The above SQL statement creates a trigger in the student database: whenever a subject's marks are
entered, before inserting the data into the database, the trigger computes the two derived values and
inserts them along with the entered values. i.e.,
mysql> insert into Student values(0, "ABCDE", 20, 20, 20, 0, 0);
Query OK, 1 row affected (0.09 sec)
mysql> select * from Student;
+-----+-------+-------+-------+-------+-------+-----+
| tid | name  | subj1 | subj2 | subj3 | total | per |
+-----+-------+-------+-------+-------+-------+-----+
| 100 | ABCDE |    20 |    20 |    20 |    60 |  36 |
+-----+-------+-------+-------+-------+-------+-----+
1 row in set (0.00 sec)
In this way, triggers can be created and executed in databases.
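SQLite cannot assign to NEW columns inside a BEFORE trigger, so an equivalent sketch (run here via Python's sqlite3) uses an AFTER INSERT trigger that updates the freshly inserted row; the computed total and per match the MySQL example above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Student (tid INTEGER PRIMARY KEY, name TEXT,
                      subj1 INT, subj2 INT, subj3 INT, total INT, per INT);

-- SQLite analogue of stud_marks: fill in total and per after each insert.
CREATE TRIGGER stud_marks AFTER INSERT ON Student
BEGIN
    UPDATE Student
    SET total = NEW.subj1 + NEW.subj2 + NEW.subj3,
        per   = (NEW.subj1 + NEW.subj2 + NEW.subj3) * 60 / 100
    WHERE tid = NEW.tid;
END;
""")

con.execute("INSERT INTO Student VALUES (NULL, 'ABCDE', 20, 20, 20, 0, 0)")
row = con.execute("SELECT total, per FROM Student").fetchone()
```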

Advantages of Triggers:

These are the following advantages of Triggers:


o Trigger generates some derived column values automatically
o Enforces referential integrity
o Event logging and storing information on table access
o Auditing
o Synchronous replication of tables
o Imposing security authorizations
o Preventing invalid transactions

Creating a trigger:
Syntax for creating trigger:
CREATE [OR REPLACE ] TRIGGER trigger_name
{BEFORE | AFTER | INSTEAD OF }
{INSERT [OR] | UPDATE [OR] | DELETE}
[OF col_name]
ON table_name
[REFERENCING OLD AS o NEW AS n]
[FOR EACH ROW]
WHEN (condition)
DECLARE
Declaration-statements
BEGIN
Executable-statementsEXCEPTION
Exception-handling-statements
END;

Here,
o CREATE [OR REPLACE] TRIGGER trigger_name: It creates or replaces an existingtrigger
with the trigger_name.
o {BEFORE | AFTER | INSTEAD OF}: This specifies when the trigger would be executed.
The INSTEAD OF clause is used for creating a trigger on a view.
o {INSERT [OR] | UPDATE [OR] | DELETE}: This specifies the DML operation.
o [OF col_name]: This specifies the column name that would be updated.
o [ON table_name]: This specifies the name of the table associated with the trigger.
o [REFERENCING OLD AS o NEW AS n]: This allows you to refer new and old values for
various DML statements, like INSERT, UPDATE, and DELETE.
o [FOR EACH ROW]: This specifies a row level trigger, i.e., the trigger would be executed for
each row being affected. Otherwise the trigger will execute just once when the SQL statement
is executed, which is called a table level trigger.
o WHEN (condition): This provides a condition for rows for which the trigger would fire. This
clause is valid only for row level triggers.

PL/SQL Trigger Example


Let's take a simple example to demonstrate the trigger. In this example, we are using the following
CUSTOMERS table:

Create table and have records:


ID NAME AGE ADDRESS SALARY
1 Ramesh 23 Allahabad 20000
2 Suresh 22 Kanpur 22000
3 Mahesh 24 Ghaziabad 24000
4 Chandan 25 Noida 26000
5 Alex 21 Paris 28000
6 Sunita 20 Delhi 30000

Create trigger:
Let's take a program to create a row-level trigger for the CUSTOMERS table that would fire for
INSERT, UPDATE or DELETE operations performed on the CUSTOMERS table. This trigger
will display the salary difference between the old values and new values:
CREATE OR REPLACE TRIGGER display_salary_changes
BEFORE DELETE OR INSERT OR UPDATE ON customers
FOR EACH ROW
WHEN (NEW.ID > 0)
DECLARE
sal_diff number;
BEGIN
sal_diff := :NEW.salary - :OLD.salary;
dbms_output.put_line('Old salary: ' || :OLD.salary);
dbms_output.put_line('New salary: ' || :NEW.salary);
dbms_output.put_line('Salary difference: ' || sal_diff);
END;

After the execution of the above code at SQL Prompt, it produces the following result.
Trigger created.
Check the salary difference by procedure:

Use the following code to get the old salary, new salary and salary difference after the trigger
is created.
DECLARE
total_rows number(2);
BEGIN
UPDATE customers
SET salary = salary + 5000;
IF sql%notfound THEN
dbms_output.put_line('no customers updated');
ELSIF sql%found THEN
total_rows := sql%rowcount;
dbms_output.put_line( total_rows || ' customers updated ');
END IF;
END;
Old salary: 20000
New salary: 25000
Salary difference: 5000
Old salary: 22000
New salary: 27000
Salary difference: 5000
Old salary: 24000
New salary: 29000
Salary difference: 5000
Old salary: 26000
New salary: 31000
Salary difference: 5000
Old salary: 28000
New salary: 33000
Salary difference: 5000

Output:

Old salary: 30000


New salary: 35000
Salary difference: 5000
6 customers updated

Note: Each time you execute this code, both the old and new salaries are incremented by 5000,
and hence the salary difference is always 5000.
After the execution of above code again, you will get the following result.
Old salary: 25000
New salary: 30000
Salary difference: 5000
Old salary: 27000
New salary: 32000
Salary difference: 5000
Old salary: 29000
New salary: 34000
Salary difference: 5000
Old salary: 31000
New salary: 36000
Salary difference: 5000
Old salary: 33000
New salary: 38000
Salary difference: 5000
Old salary: 35000
New salary: 40000
Salary difference: 5000
6 customers updated

Important Points:
Following are the two very important points and should be noted carefully.
o OLD and NEW references are used for record-level triggers; they are not available for table-level triggers.
o If you want to query the table in the same trigger, then you should use the AFTER keyword, because
triggers can query the table or change it again only after the initial changes are applied and the table is back
in a consistent state.
Procedure
The PL/SQL stored procedure, or simply a procedure, is a PL/SQL block which performs one or more specific
tasks. It is just like procedures in other programming languages.
The procedure contains a header and a body.
o Header: The header contains the name of the procedure and the parameters or variables passed to the
procedure.
o Body: The body contains a declaration section, execution section and exception section similar to a
general PL/SQL block.

How to pass parameters in procedure:


When you want to create a procedure or function, you have to define parameters. There are three ways to pass
parameters to a procedure:
1. IN parameters: The IN parameter can be referenced by the procedure or function. The value of the
parameter cannot be overwritten by the procedure or the function.
2. OUT parameters: The OUT parameter cannot be referenced by the procedure orfunction, but the value
of the parameter can be overwritten by the procedure or function.
3. INOUT parameters: The INOUT parameter can be referenced by the procedure or function and the
value of the parameter can be overwritten by the procedure or function.
NOTE: A procedure may or may not return any value.
PL/SQL Create Procedure
Syntax for creating procedure:
CREATE [OR REPLACE] PROCEDURE procedure_name [ (parameter [, parameter]) ]
IS
[declaration_section]
BEGIN
executable_section
[EXCEPTION
exception_section]
END [procedure_name];
Create procedure example

In this example, we are going to insert record in user table. So you need to create user table first.

Table creation:
create table user(id number(10) primary key, name varchar2(100));

Now write the procedure code to insert a record in the user table.

Procedure Code:
create or replace procedure "INSERTUSER"
(id IN NUMBER,
name IN VARCHAR2)
is
begin
insert into user values(id, name);
end;
/
Output:
Procedure created.
PL/SQL program to call procedure
Let's see the code to call above created procedure.
BEGIN
insertuser(101, 'Rahul');
dbms_output.put_line('record inserted successfully');
END;
Now, see the "USER" table, you will see one record is inserted.

ID Name
101 Rahul
PL/SQL Drop Procedure
Syntax for drop procedure
DROP PROCEDURE procedure_name;
Example of drop procedure
DROP PROCEDURE pro1;
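SQLite has no stored procedures, so as a rough analogue (not Oracle PL/SQL) the INSERTUSER logic can be wrapped in a host-language function whose arguments play the role of IN parameters:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# "user" is a keyword in some SQL dialects; quoted here to be safe.
con.execute('CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT)')

def insertuser(con, user_id, name):
    """Analogue of the INSERTUSER procedure: both arguments act as IN parameters."""
    con.execute('INSERT INTO "user" VALUES (?, ?)', (user_id, name))

insertuser(con, 101, 'Rahul')
row = con.execute('SELECT * FROM "user"').fetchone()
```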
UNIT- III Normalization
Decomposition: The process of breaking up or dividing a single relation into two or more sub-relations
is called decomposition of a relation.

Decomposition in DBMS removes redundancy, anomalies and inconsistencies from a database by


dividing the table into multiple tables.

Lossless Decomposition

o If no information is lost from the relation that is decomposed, then the decomposition
is lossless.
o A lossless decomposition guarantees that the join of the relations will result in the same relation
as was decomposed.
o The relation is said to have a lossless decomposition if the natural join of all the decomposed
relations gives the original relation.

Example:
EMPLOYEE_DEPARTMENT table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing

o The above relation is decomposed into two relations, EMPLOYEE and DEPARTMENT.

EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table:
EMP_ID DEPT_ID DEPT_NAME
22 827 Sales
33 438 Marketing
46 869 Finance
52 575 Production
60 678 Testing

Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation will look like:
Employee ⋈ Department

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME


22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing
Hence, the decomposition is Lossless join decomposition.
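The lossless property can be verified mechanically. A small sketch in plain Python, using the rows above: project the original relation onto the two schemas, recompute the natural join on EMP_ID, and compare with the original.

```python
# Verify lossless decomposition: project, join back, compare.
original = {
    (22, "Denim", 28, "Mumbai", 827, "Sales"),
    (33, "Alina", 25, "Delhi", 438, "Marketing"),
    (46, "Stephan", 30, "Bangalore", 869, "Finance"),
    (52, "Katherine", 36, "Mumbai", 575, "Production"),
    (60, "Jack", 40, "Noida", 678, "Testing"),
}
employee = {row[:4] for row in original}                      # EMP_ID .. EMP_CITY
department = {(row[0], row[4], row[5]) for row in original}   # EMP_ID, DEPT_ID, DEPT_NAME

# Natural join on the common attribute EMP_ID.
joined = {e + d[1:] for e in employee for d in department if e[0] == d[0]}
assert joined == original   # lossless: the join restores the original relation
```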

Lossy Decomposition:
As the name suggests, a decomposition is lossy when the original relation cannot be recovered exactly from the decomposed relations, i.e. information is lost.
Let us see an example:
<EmpInfo>
<EmpInfo>

Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name


E001 Jacob 29 Alabama Dpt1 Operations
E002 Henry 32 Alabama Dpt2 HR
E003 Tom 22 Texas Dpt3 Finance

Decompose the above table into two tables −


<EmpDetails>

Emp_ID Emp_Name Emp_Age Emp_Location


E001 Jacob 29 Alabama
E002 Henry 32 Alabama
E003 Tom 22 Texas

<DeptDetails>
Dept_ID Dept_Name
Dpt1 Operations
Dpt2 HR
Dpt3 Finance

Now, you won’t be able to join the above tables, since Emp_ID is not a part of the
DeptDetails relation. Therefore, the above decomposition is lossy.
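The lossy case can also be checked mechanically. A sketch in plain Python with the rows above: since the fragments share no attribute, the only way to recombine them is a Cartesian product, which manufactures tuples that were never in EmpInfo.

```python
# With no common column, the "join" degenerates to a Cartesian product.
emp_details = [("E001", "Jacob"), ("E002", "Henry"), ("E003", "Tom")]
dept_details = [("Dpt1", "Operations"), ("Dpt2", "HR"), ("Dpt3", "Finance")]

product = [e + d for e in emp_details for d in dept_details]
print(len(product))  # 9 rows, but the original EmpInfo had only 3:
                     # we can no longer tell which employee is in which department
```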

Dependency Preserving:

o It is an important constraint of the database.


o In dependency preservation, every dependency must be satisfied by at least one decomposed table.
o If a relation R is decomposed into relations R1 and R2, then the dependencies of R either must be a part of R1 or R2 or must be derivable from the combination of the functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A -> BC). The relation R is decomposed into R1(ABC) and R2(AD), which is dependency preserving because FD A -> BC is a part of relation R1(ABC).
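The simple case above can be sanity-checked by testing whether each FD fits entirely inside one fragment. Note this is a sufficient but not complete check, since a dependency may also be preserved by being derivable from the combined FDs of the fragments; a hedged sketch in Python:

```python
# Sufficient check: FD X -> Y is directly preserved if X ∪ Y lies in one fragment.
def directly_preserved(fd, fragments):
    lhs, rhs = fd
    return any(set(lhs) | set(rhs) <= set(f) for f in fragments)

# R(A, B, C, D) with FD A -> BC, decomposed into R1(ABC) and R2(AD)
assert directly_preserved(("A", "BC"), ["ABC", "AD"])       # preserved in R1
assert not directly_preserved(("C", "D"), ["ABC", "AD"])    # a hypothetical FD that would not be
```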

Multivalued Dependency

o Multivalued dependency occurs when two attributes in a table are independent of each other but both depend on a third attribute.
o A multivalued dependency consists of at least two attributes that are dependent on a third attribute; that's why it always requires at least three attributes.

Example: Suppose there is a bike manufacturer company which produces two colors (white and black) of each model every year.

BIKE_MODEL MANUF_YEAR COLOR


M2011 2008 White
M2011 2008 Black
M3001 2013 White
M3001 2013 Black
M4006 2017 White
M4006 2017 Black

Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of
each other.

In this case, these two columns can be called as multivalued dependent on BIKE_MODEL. The
representation of these dependencies is shown below:

BIKE_MODEL →→ MANUF_YEAR
BIKE_MODEL →→ COLOR

This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL multidetermines COLOR".
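The MVD above can be verified mechanically with the tuple-exchange test: for any two rows that agree on BIKE_MODEL, exchanging their COLOR values must produce rows that are also in the relation. A small illustrative sketch (both colors per model, per the stated premise):

```python
# Rows are (BIKE_MODEL, MANUF_YEAR, COLOR).
rows = {
    ("M2011", 2008, "White"), ("M2011", 2008, "Black"),
    ("M3001", 2013, "White"), ("M3001", 2013, "Black"),
    ("M4006", 2017, "White"), ("M4006", 2017, "Black"),
}

def mvd_holds(rows):
    # BIKE_MODEL ->-> COLOR: for t1, t2 agreeing on the model, the tuples with
    # the colors swapped (keeping each other's year) must also exist.
    return all(
        (t1[0], t2[1], t1[2]) in rows and (t2[0], t1[1], t2[2]) in rows
        for t1 in rows for t2 in rows if t1[0] == t2[0]
    )

assert mvd_holds(rows)   # the multivalued dependency holds in this data
```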

Normalization: Normalization is a process of organizing the data in a database to avoid data redundancy, insertion anomaly, update anomaly and deletion anomaly.

o Normalization is the process of organizing the data in the database.


o Normalization is used to minimize the redundancy from a relation or set of relations. It is also
used to eliminate the undesirable characteristics like Insertion, Update and Deletion
Anomalies.
o Normalization divides the larger table into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table.

Anomalies in DBMS

There are three types of anomalies that occur when the database is not normalized. These are
– Insertion, update and deletion anomaly.
Example: Suppose a manufacturing company stores the employee details in a table named employee that has four attributes: emp_id for storing the employee's id, emp_name for storing the employee's name, emp_address for storing the employee's address and emp_dept for storing the department details in which the employee works. Suppose that at some point the table contains two rows for employee Rick (he works in two departments) and a single row for employee Maggie (assigned only to department D890).

Update anomaly: we have two rows for employee Rick as he belongs to two departments of the
company. If we want to update the address of Rick then we have to update the same in two rows or
the data will become inconsistent. If somehow, the correct address gets updated in one department
but not in other then as per the database, Rick would be having two different addresses, which is not
correct and would lead to inconsistent data.

Insert anomaly: Suppose a new employee joins the company who is under training and currently not assigned to any department; then we would not be able to insert the data into the table if the emp_dept field doesn't allow nulls.

Delete anomaly: Suppose, if at a point of time the company closes the department D890 then
deleting the rows that are having emp_dept as D890 would also delete the information of employee
Maggie since she is assigned only to this department.

To overcome these anomalies we need to normalize the data. In the next section we will discuss
about normalization.

First Normal Form (1NF)

o A relation will be 1NF if it contains an atomic value.


o It states that an attribute of a table cannot hold multiple values. It must hold only single-valued attributes.
o First normal form disallows the multi-valued attribute, composite attribute, and their
combinations.

Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.

EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389, 8589830302 Punjab

The decomposition of the EMPLOYEE table into 1NF is shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE


14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab

Example: First normal form (1NF)

As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It
should hold only atomic values.
Example: Suppose a company wants to store the names and contact details of its employees. It
creates a table that looks like this:
emp_id emp_name emp_address emp_mobile
101 Herschel New Delhi 8912312390
102 Jon Kanpur 8812121212
9900012222
103 Ron Chennai 7778881212
104 Lester Bangalore 9990000123
8123450987
Two employees (Jon & Lester) are having two mobile numbers so the company stored them in the
same field as you can see in the table above.

This table is not in 1NF. The rule says “each attribute of a table must have atomic (single) values”, and the emp_mobile values for employees Jon & Lester violate that rule.

To make the table comply with 1NF, we should store the data like this:

emp_id emp_name emp_address emp_mobile


101 Herschel New Delhi 8912312390
102 Jon Kanpur 8812121212
102 Jon Kanpur 9900012222
103 Ron Chennai 7778881212
104 Lester Bangalore 9990000123
104 Lester Bangalore 8123450987

Example:
ID Name Courses

1 A c1, c2

2 E c3

3 M c2, c3

In the above table, Course is a multi-valued attribute, so the table is not in 1NF. The table below is in 1NF, as there is no multi-valued attribute:

ID Name Course

1 A c1

1 A c2

2 E c3

3 M c2

3 M c3
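The conversion shown above, splitting the multi-valued Courses column into atomic rows, can be sketched in a few lines of Python:

```python
# Flatten a comma-separated Courses column into atomic (1NF) rows.
raw = [(1, "A", "c1, c2"), (2, "E", "c3"), (3, "M", "c2, c3")]

flat = [
    (sid, name, course.strip())          # one row per (student, course) pair
    for sid, name, courses in raw
    for course in courses.split(",")
]
print(flat)
# [(1, 'A', 'c1'), (1, 'A', 'c2'), (2, 'E', 'c3'), (3, 'M', 'c2'), (3, 'M', 'c3')]
```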
Second Normal Form (2NF)

o In 2NF, the relation must be in 1NF.


o In the second normal form, all non-key attributes are fully functionally dependent on the primary key.

A table is said to be in 2NF if both the following conditions hold:


 Table is in 1NF (First normal form)
 No non-prime attribute is dependent on the proper subset of any candidate key of table.

An attribute that is not part of any candidate key is known as non-prime attribute.

Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.

TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38

In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID, which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38

TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer

Example: Suppose a school wants to store the data of teachers and the subjects they teach. They create a table that looks like this. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher.
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Candidate Keys: {teacher_id, subject}
Non prime attribute: teacher_age

The table is in 1 NF because each attribute has atomic values. However, it is not in 2NF because non
prime attribute teacher_age is dependent on teacher_id alone which is a proper subset of candidate
key. This violates the rule for 2NF as the rule says “no non-prime attribute is dependent on the
proper subset of any candidate key of the table”.

To make the table comply with 2NF, we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40

teacher_subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry

Now the tables comply with Second normal form (2NF).

In Second Normal Form –

 A relation must be in first normal form and relation must not contain any partial
dependency.
 A relation is in 2NF if it has no partial dependency, i.e., no non-prime attribute (an attribute which is not part of any candidate key) is dependent on any proper subset of any candidate key of the table.

 Partial Dependency – If a proper subset of a candidate key determines a non-prime attribute, it is called a partial dependency.

Example 1 – Consider table below.


STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
Note that, there are many courses having the same course fee.
Here, COURSE_FEE cannot alone decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO;
Hence, COURSE_FEE would be a non-prime attribute, as it does not belong to the only candidate key {STUD_NO, COURSE_NO};

But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of the candidate key. The non-prime attribute COURSE_FEE is dependent on a proper subset of the candidate key, which is a partial dependency, and so this relation is not in 2NF.
To convert the above relation to 2NF, we need to split the table into two tables such as:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE

Table 1 Table 2
STUD_NO COURSE_NO COURSE_NO COURSE_FEE
1 C1 C1 1000
2 C2 C2 1500
1 C4 C3 1000
4 C3 C4 2000
4 C1 C5 2000

Example 2 – Consider the following functional dependencies in relation R (A, B, C, D):

AB -> C [A and B together determine C]
BC -> D [B and C together determine D]

In the above relation, AB is the only candidate key and there is no partial dependency, i.e., no proper subset of AB determines any non-prime attribute.
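The partial-dependency argument in Example 1 can be checked against the data. A sketch in Python: `fd_holds` tests whether an FD X -> Y holds in a table, with columns referred to by index (0 = STUD_NO, 1 = COURSE_NO, 2 = COURSE_FEE):

```python
# Rows of the Example 1 table: (STUD_NO, COURSE_NO, COURSE_FEE).
rows = [(1, "C1", 1000), (2, "C2", 1500), (1, "C4", 2000),
        (4, "C3", 1000), (4, "C1", 1000), (2, "C5", 2000)]

def fd_holds(rows, lhs, rhs):
    # X -> Y holds iff no two rows agree on X but disagree on Y.
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        val = tuple(row[i] for i in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

assert fd_holds(rows, lhs=(1,), rhs=(2,))       # COURSE_NO -> COURSE_FEE holds
assert not fd_holds(rows, lhs=(2,), rhs=(1,))   # COURSE_FEE -> COURSE_NO does not
# COURSE_NO is a proper subset of the key {STUD_NO, COURSE_NO}, so this is a
# partial dependency and the relation is not in 2NF.
```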

Third Normal Form (3NF)

o A relation will be in 3NF if it is in 2NF and does not contain any transitive dependency.
o 3NF is used to reduce the data duplication. It is also used to achieve the data integrity.

o If there is no transitive dependency for non-prime attributes, then the relation must be in third normal form.

A relation is in third normal form if it satisfies at least one of the following conditions for every non-trivial functional dependency X → Y.

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

EMPLOYEE_DETAIL:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal

Super key in the table above:

{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE & EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.

That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.

EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007

EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
A table design is said to be in 3NF if both the following conditions hold:
 Table must be in 2NF
 Transitive functional dependency of non-prime attributes on any super key should be removed.

An attribute that is not part of any candidate key is known as non-prime attribute.
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and for each functional dependency X -> Y at least one of the following conditions hold:

 X is a super key of table


 Y is a prime attribute of table
An attribute that is a part of one of the candidate keys is known as prime attribute.
Example: Suppose a company wants to store the complete address of each employee. They create a table named employee_details that looks like this:
emp_id emp_name emp_zip emp_state emp_city emp_district
1001 John 282005 UP Agra Dayal Bagh
1002 Ajeet 222008 TN Chennai M-City
1006 Lora 282007 TN Chennai Urrapakkam
1101 Lilly 292008 UK Pauri Bhagwan
1201 Steve 222999 MP Gwalior Ratan
Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}…so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime, as they are not part of any candidate key.
Here, emp_state, emp_city & emp_district are dependent on emp_zip, and emp_zip is dependent on emp_id. That makes the non-prime attributes (emp_state, emp_city & emp_district) transitively dependent on the super key (emp_id). This violates the rule of 3NF.

To make this table comply with 3NF, we have to break it into two tables to remove the transitive dependency:

employee table:
emp_id emp_name emp_zip
1001 John 282005
1002 Ajeet 222008
1006 Lora 282007
1101 Lilly 292008
1201 Steve 222999

employee_zip table:
emp_zip emp_state emp_city emp_district
282005 UP Agra Dayal Bagh
222008 TN Chennai M-City
282007 TN Chennai Urrapakkam
292008 UK Pauri Bhagwan
222999 MP Gwalior Ratan

A relation is in third normal form if it is in second normal form and there is no transitive dependency for non-prime attributes.
A relation is in 3NF if at least one of the following conditions holds in every non-trivial functional dependency X –> Y:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).

Transitive dependency – If A->B and B->C are two FDs, then A->C is called a transitive dependency.
Example 1 – In relation STUDENT given in Table 4,
FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}
Candidate Key: {STUD_NO}
For this relation in Table 4, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY are true. So STUD_COUNTRY is transitively dependent on STUD_NO. This violates the third normal form. To convert it to third normal form, we decompose the relation STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)

Example 2 – Consider relation R(A, B, C, D, E) with FDs:
A -> BC,
CD -> E,
B -> D,
E -> A
All possible candidate keys in the above relation are {A, E, CD, BC}. All attributes on the right sides of all functional dependencies are prime.
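The candidate keys claimed in Example 2 can be confirmed with the standard attribute-closure algorithm; a sketch in Python:

```python
# FDs of Example 2, each as (LHS, RHS) attribute strings.
fds = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]

def closure(attrs, fds):
    # Repeatedly apply FDs whose left side is already in the closure.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

all_attrs = set("ABCDE")
for key in ["A", "E", "CD", "BC"]:
    assert closure(key, fds) == all_attrs   # each determines every attribute of R
assert closure("B", fds) == {"B", "D"}      # B alone is not a key
```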
Fourth normal form (4NF)
o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o For a dependency A → B, if for a single value of A multiple values of B exist, then the relation has a multi-valued dependency.
Example:

STUDENT:
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities. Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses (Computer and Math) and two hobbies (Dancing and Singing). So there is a multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE:
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics

STUDENT_HOBBY:
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey

Example – Consider the database of a class which has two relations: R1 contains student id (SID) and student name (SNAME), and R2 contains course id (CID) and course name (CNAME).

Table – R1(SID, SNAME)

SID SNAME
S1 A
S2 B

Table – R2(CID, CNAME)

CID CNAME
C1 C
C2 D
When their cross product is taken, it results in multivalued dependencies:

Table – R1 X R2
SID SNAME CID CNAME
S1 A C1 C
S1 A C2 D
S2 B C1 C
S2 B C2 D
Multivalued dependencies (MVD) are:
SID->->CID; SID->->CNAME; SNAME->->CNAME


Join Dependency
 Join decomposition is a further generalization of Multivalued dependencies.
 If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency (JD) exists.
 Here R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relation R(A, B, C, D).
 Alternatively, R1 and R2 form a lossless decomposition of R.
 A JD ⋈ {R1, R2, ..., Rn} is said to hold over a relation R if R1, R2, ..., Rn is a lossless-join decomposition of R.
 The notation *(R1, R2, R3, ...) is used to indicate that relations R1, R2, R3 and so on are a JD of R; e.g., *((A, B, C), (C, D)) is a JD of R if joining these projections yields exactly R.

Fifth normal form (5NF)


 A relation is in 5NF if it is in 4NF and does not contain any join dependency, and joining should be lossless.
 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
 5NF is also known as Project-Join Normal Form (PJ/NF).

Example:
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1
In the above table, John takes both Computer and Math classes for Semester 1 but he doesn't take the Math class for Semester 2. In this case, the combination of all these fields is required to identify a valid record.

Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be taking it, so we would have to leave Lecturer and Subject as NULL. But all three columns together act as the primary key, so we can't leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen

P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
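The decomposition above can be verified mechanically: the natural join P1 ⋈ P2 ⋈ P3 must reconstruct exactly the original rows, i.e. the join dependency *(P1, P2, P3) holds. A sketch in Python:

```python
# Original rows as (SUBJECT, LECTURER, SEMESTER).
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}
p1 = {(sem, sub) for sub, lec, sem in original}   # SEMESTER, SUBJECT
p2 = {(sub, lec) for sub, lec, sem in original}   # SUBJECT, LECTURER
p3 = {(sem, lec) for sub, lec, sem in original}   # SEMESTER, LECTURER

# Three-way natural join: match P1 and P2 on SUBJECT, keep rows confirmed by P3.
joined = {
    (sub, lec, sem)
    for sem, sub in p1
    for sub2, lec in p2 if sub2 == sub
    if (sem, lec) in p3
}
assert joined == original   # join dependency *(P1, P2, P3) holds: lossless
```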
UNIT-IV TRANSACTION MANAGEMENT IN DBMS
 A transaction is a set of logically related operations.

 Now that we understand what a transaction is, we should understand the problems associated with it.

 Example: transferring an amount from your account A to your friend's account B involves the operations: read(A); A = A - amount; write(A); read(B); B = B + amount; write(B).

 The main problem that can happen during a transaction is that the transaction can fail before finishing all the operations in the set. This can happen due to power failure, system crash, etc.

 This is a serious problem that can leave the database in an inconsistent state. Assume the transaction fails after the third operation, write(A), in the example above: then the amount would be deducted from your account but your friend would not receive it.

To solve this problem, we have the following two operations:

Commit: If all the operations in a transaction are completed successfully then commit thosechanges to the
database permanently.

Rollback: If any of the operation fails then rollback all the changes done by previousoperations.
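The commit/rollback behaviour can be demonstrated with Python's built-in sqlite3 module. This is an illustrative sketch; the account names and amounts are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("you", 500), ("friend", 100)])
conn.commit()

def transfer(conn, amount, fail_midway=False):
    try:
        conn.execute("UPDATE account SET balance = balance - ? "
                     "WHERE name = 'you'", (amount,))
        if fail_midway:
            raise RuntimeError("power failure before the credit step")
        conn.execute("UPDATE account SET balance = balance + ? "
                     "WHERE name = 'friend'", (amount,))
        conn.commit()        # all operations succeeded: make them permanent
    except RuntimeError:
        conn.rollback()      # undo the debit; database stays consistent

transfer(conn, 200, fail_midway=True)   # simulated failure: rolled back
print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('friend', 100), ('you', 500)]  -- both balances unchanged
```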

STATES OF TRANSACTION:
Transactions can be implemented using SQL queries and the server. In the below-given diagram, you can see how the transaction states work.

Active state

o The active state is the first state of every transaction. In this state, the transaction isbeing executed.
o For example: Insertion or deletion or updating a record is done here. But all the records are still not saved to the database.

Partially committed

o In the partially committed state, a transaction executes its final operation, but the data is still not saved to the
database.
o In the total mark calculation example, a final display of the total marks step is executed in this state.

Committed

A transaction is said to be in a committed state if it executes all its operations successfully. In this state, all the effects are now permanently saved on the database system.
Failed state

o If any of the checks made by the database recovery system fails, then the transaction is said to be in the failed
state.
o In the example of total mark calculation, if the database is not able to fire a query to fetch the marks, then the
transaction will fail to execute.
Aborted

o If any of the checks fail and the transaction has reached a failed state then the database recovery system will
make sure that the database is in its previous consistent state. If not then it will abort or roll back the transaction
to bring the database into a consistent state.
o If the transaction fails in the middle of execution, then all the operations executed so far are rolled back to restore the previous consistent state.
o After aborting the transaction, the database recovery module will select one of the two operations:
1. Re-start the transaction
2. Kill the transaction

TRANSACTION PROPERTY

The transaction has the four properties. These are used to maintain consistency in a database, before and after the
transaction.

Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability

Atomicity
o It states that all operations of the transaction take place at once if not, the transaction is aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each transaction is treated as one unit and either runs to completion or is not executed at all.

Atomicity involves the following two operations:

Abort: If a transaction aborts then all the changes made are not visible.
Commit: If a transaction commits then all the changes made are visible.

Consistency
o The integrity constraints are maintained so that the database is consistent before and after the transaction.
o The execution of a transaction will leave a database in either its prior stable state or a new stable state.
o The consistent property of database states that every transaction sees a consistent database instance.
o The transaction is used to transform the database from one consistent state to another consistent state.

Isolation
o It shows that the data which is used at the time of execution of a transaction cannot be used by a second transaction until the first one is completed.
o In isolation, if transaction T1 is being executed and using the data item X, then that data item can't be accessed by any other transaction T2 until transaction T1 ends.
o The concurrency control subsystem of the DBMS enforced the isolation property.

Durability
o The durability property is used to indicate the performance of the database's consistent state. It states that the transaction makes permanent changes.
o They cannot be lost by the erroneous operation of a faulty transaction or by system failure. When a transaction is completed, the database reaches a state known as the consistent state. That consistent state cannot be lost, even in the event of a system failure.
o The recovery subsystem of the DBMS has the responsibility of Durability property.

IMPLEMENTATION OF ATOMICITY AND DURABILITY

The recovery-management component of a database system can support atomicity and durability by a variety of schemes.

E.g. the shadow-database scheme:

Shadow copy:

 In the shadow-copy scheme, a transaction that wants to update the database first creates a complete copy of
the database.
 All updates are done on the new database copy, leaving the original copy, the shadow copy, untouched. If at
any point the transaction has to be aborted, the system merely deletes the new copy. The old copy of the
database has not been affected.
 This scheme, based on making copies of the database called shadow copies, assumes that only one transaction is active at a time.
 The scheme also assumes that the database is simply a file on disk. A pointer called db-pointer is maintained on disk; it points to the current copy of the database.
If the transaction completes, it is committed as follows:

 First, the operating system is asked to make sure that all pages of the new copy of the database have been written out to disk. (Unix systems use the fsync command for this purpose.)
 After the operating system has written all the pages to disk, the database system updates the pointer db-pointer to point to the new copy of the database;
 the new copy then becomes the current copy of the database. The old copy of the database is then deleted.
Figure below depicts the scheme, showing the database state before and after the update.
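The shadow-copy steps above can be sketched in Python: all updates go into a fresh copy of a file-based "database", and an atomic `os.replace` plays the role of swinging db-pointer; a crash before the replace leaves the current copy untouched. This is purely illustrative; real systems do much more.

```python
import json, os, tempfile

def update_database(path, changes):
    with open(path) as f:
        data = json.load(f)
    data.update(changes)                  # all updates go to the new copy
    shadow = path + ".shadow"
    with open(shadow, "w") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())              # force the new copy out to disk first
    os.replace(shadow, path)              # atomic "db-pointer" update (commit)

db = os.path.join(tempfile.mkdtemp(), "db.json")
with open(db, "w") as f:
    json.dump({"x": 1}, f)
update_database(db, {"x": 2})
print(json.load(open(db)))  # {'x': 2}
```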
SCHEDULE:

A series of operations from one transaction to another transaction is known as a schedule. It is used to preserve the order of the operations in each of the individual transactions.

1. SERIAL SCHEDULE

The serial schedule is a type of schedule where one transaction is executed completely before starting another transaction. In the serial schedule, when the first transaction completes its cycle, then the next transaction is executed.
For example: Suppose there are two transactions T1 and T2 which have some operations. If it has no
interleaving of operations, then there are the following two possible outcomes:

1. Execute all the operations of T1 followed by all the operations of T2.
2. Execute all the operations of T2 followed by all the operations of T1.
o Schedule A shows the serial schedule where T1 is followed by T2.
o Schedule B shows the serial schedule where T2 is followed by T1.

2. NON-SERIAL SCHEDULE
o If interleaving of operations is allowed, then there will be a non-serial schedule.
o It contains many possible orders in which the system can execute the individual operations of the transactions.
o Schedule C and Schedule D are the non-serial schedules. They have interleaving of operations.

3. SERIALIZABLE SCHEDULE
o The serializability of schedules is used to find non-serial schedules that allow the transaction to execute
concurrently without interfering with one another.
o It identifies which schedules are correct when executions of the transaction have interleaving of their
operations.
o A non-serial schedule will be serializable if its result is equal to the result of its transactions executed serially.

SERIALIZABILITY IN DBMS
 Some non-serial schedules may lead to inconsistency of the database.
 Serializability is a concept that helps to identify which non-serial schedules are correct and will maintain the
consistency of the database.

1. Conflict Serializability

If a given non-serial schedule can be converted into a serial schedule by swapping its non-conflicting
operations, then it is called as a conflict serializable schedule.
Conflicting Operations:
Two operations are called as conflicting operations if all the following conditions hold true for them-
 Both the operations belong to different transactions
 Both the operations are on the same data item
 At least one of the two operations is a write operation
Example-Consider the following schedule-

In this schedule,

W1(A) and R2(A) are called conflicting operations. This is because all the above conditions hold true for them.

Checking Whether a Schedule is Conflict Serializable Or Not-
Follow these steps to check whether a given non-serial schedule is conflict serializable or not-
Step-01:
Find and list all the conflicting operations.

Step-02:
Start creating a precedence graph by drawing one node for each transaction.

Step-03:
Draw an edge for each conflict pair such that if Xi(V) and Yj(V) form a conflict pair, then draw an edge from Ti to Tj.
 This ensures that Ti gets executed before Tj.

Step-04:
Check if there is any cycle formed in the graph.
 If there is no cycle found, then the schedule is conflict serializable otherwise not.
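The four steps above can be sketched as code: build the precedence graph of a schedule given as (transaction, operation, data item) triples, then test it for cycles with a depth-first search.

```python
def conflict_serializable(schedule):
    # Step 1-3: find conflict pairs and draw edges Ti -> Tj.
    edges = set()
    for i, (ti, op1, x) in enumerate(schedule):
        for tj, op2, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op1, op2):   # conflicting ops
                edges.add((ti, tj))
    nodes = {t for t, _, _ in schedule}

    # Step 4: DFS cycle detection on the precedence graph.
    def has_cycle(node, visiting, done):
        visiting.add(node)
        for a, b in edges:
            if a == node:
                if b in visiting or (b not in done and has_cycle(b, visiting, done)):
                    return True
        visiting.discard(node)
        done.add(node)
        return False

    return not any(has_cycle(n, set(), set()) for n in nodes)

# T1 finishes with A before T2 touches it: only edge T1 -> T2, no cycle.
assert conflict_serializable(
    [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")])
# Classic lost-update interleaving: edges T1 -> T2 and T2 -> T1 form a cycle.
assert not conflict_serializable(
    [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")])
```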

VIEW SERIALIZABILITY?
View Serializability is a process to find out that a given schedule is view serializable or not. To check whether a
given schedule is view serializable, we need to check whether the given schedule is View Equivalent to its
serial schedule.

2. View Serializability
o A schedule will view serializable if it is view equivalent to a serial schedule.
o If a schedule is conflict serializable, then it will be view serializable.
o A schedule that is view serializable but not conflict serializable contains blind writes.

View Equivalent:
Two schedules S1 and S2 are said to be view equivalent if they satisfy the followingconditions:

1. Initial Read:
An initial read of both schedules must be the same. Suppose we have two schedules S1 and S2. In schedule S1, if a transaction T1 is reading the data item A, then in S2, transaction T1 should also read A.

The above two schedules are view equivalent because the initial read operation in S1 is done by T1 and in S2 it is also done by T1.

2. Updated Read
In schedule S1, if Ti is reading A which is updated by Tj, then in S2 also, Ti should read A which is updated by Tj.
3. Final Write
A final write must be the same between both schedules. In schedule S1, if a transaction T1 updates A last, then in S2 the final write operation should also be done by T1.

The above two schedules are view equivalent because the final write operation in S1 is done by T3 and in S2 the final write operation is also done by T3.

Recoverability of Schedule:
Sometimes a transaction may not execute completely due to a software issue, system crash or hardware failure.
In that case, the failed transaction has to be rolled back. But some other transaction may also have used a value
produced by the failed transaction, so those transactions have to be rolled back as well.

TRANSACTION ISOLATION LEVELS IN DBMS


The SQL standard defines four isolation levels:

1. Read Uncommitted – Read Uncommitted is the lowest isolation level. In this level, one transaction may
read not-yet-committed changes made by another transaction, thereby allowing dirty reads. In this level,
transactions are not isolated from each other.
2. Read Committed – This isolation level guarantees that any data read is committed at the moment it is read.
Thus it does not allow dirty read. The transactions hold a read or write lock on the current row, and thus
prevent other transactions from reading, updating or deleting it.
3. Repeatable Read – This is a more restrictive isolation level. The transaction holds read locks on all rows
it references and write locks on all rows it inserts, updates, or deletes. Since other transactions cannot read, update
or delete these rows, it avoids non-repeatable reads.
4. Serializable – This is the highest isolation level. Serializable execution is defined to be an execution of
operations in which concurrently executing transactions appear to be executing serially.
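As a quick summary, the anomalies each level permits (per the SQL standard; real systems are sometimes stricter) can be tabulated:

```python
# Which read anomalies each SQL isolation level permits, per the SQL
# standard (individual DBMSs may be stricter in practice).
ANOMALIES_ALLOWED = {
    "READ UNCOMMITTED": {"dirty_read", "non_repeatable_read", "phantom_read"},
    "READ COMMITTED":   {"non_repeatable_read", "phantom_read"},
    "REPEATABLE READ":  {"phantom_read"},
    "SERIALIZABLE":     set(),
}

def permits(level, anomaly):
    """True if the given isolation level allows the given anomaly."""
    return anomaly in ANOMALIES_ALLOWED[level]
```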

FAILURE CLASSIFICATION
To find that where the problem has occurred, we generalize a failure into the followingcategories:
1. Transaction failure
2. System crash
3. Disk failure

1. Transaction failure
A transaction failure occurs when a transaction fails to execute or reaches a point from where it can't go any
further. If a transaction or process fails mid-way, this is called a transaction failure.
Reasons for a transaction failure could be -
1. Logical errors: If a transaction cannot complete due to some code error or an internal error condition, then
the logical error occurs.
2. System errors: These occur when the DBMS itself terminates an active transaction because the database system
is not able to execute it. For example, the system aborts an active transaction in case of deadlock or resource
unavailability.

2. System Crash
System failure can occur due to power failure or other hardware or software failure. Example: Operating
system error.
Fail-stop assumption: In the system crash, non-volatile storage is assumed not to be corrupted.

3. Disk Failure
o In the early days of technology evolution, it was a common problem that hard-disk drives or storage drives
failed frequently.
o Disk failure occurs due to the formation of bad sectors, a disk head crash, unreachability of the disk, or any
other failure that destroys all or part of disk storage.

CONCURRENT EXECUTION OF TRANSACTION


In transaction processing, a system usually allows executing more than one transaction simultaneously. This
process is called concurrent execution.

Advantages of concurrent execution of a transaction


1. Decreased waiting time or turnaround time.
2. Improved response time.
3. Increased throughput or resource utilization.
Problems with Concurrent Execution
In a database transaction, the two main operations are READ and WRITE. These operations need to be
managed during concurrent execution, because if they are performed in an interleaved manner without control,
the data may become inconsistent. The following problems can occur with concurrent execution of the
operations:
1: Lost Update Problems (W - W Conflict)
2. Dirty Read Problems (W-R Conflict)
3. Unrepeatable Read Problem (R-W Conflict)

1. Lost update problem (Write – Write conflict)


This type of problem occurs when two transactions in database access the same data item and have their
operations in an interleaved manner that makes the value of some database item incorrect.
If there are two transactions T1 and T2 accessing the same data item value and then update it, then the second
record overwrites the first record.
Example: Let’s take the value of A is 100
Time    Transaction T1    Transaction T2
t1      Read(A)
t2      A = A - 50
t3                        Read(A)
t4                        A = A + 50
t5      Write(A)
t6                        Write(A)
Here,
 At t1, T1 reads the value of A, i.e., 100.
 At t2, T1 deducts 50 from A, computing 50.
 At t3, T2 reads the value of A, i.e., still 100.
 At t4, T2 adds 50 to A, computing 150.
 At t5, T1 writes A based on the value seen at t2, i.e., 50.
 At t6, T2 writes A based on the value seen at t4, i.e., 150.
 So at t6, the update of transaction T1 is lost because transaction T2 overwrites the value of A
without looking at its current value.
 Such a problem is known as the Lost Update Problem.
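The t1–t6 walkthrough above can be replayed directly; local variables stand in for each transaction's private workspace (an illustrative sketch, not a real DBMS):

```python
# Simulating the lost-update interleaving: A starts at 100, T1 subtracts
# 50, T2 adds 50. A serial execution (either order) would leave A = 100.
db = {"A": 100}

a_t1 = db["A"]   # t1: T1 reads A (100)
a_t1 -= 50       # t2: T1 computes 50 locally
a_t2 = db["A"]   # t3: T2 reads A (still 100)
a_t2 += 50       # t4: T2 computes 150 locally
db["A"] = a_t1   # t5: T1 writes 50
db["A"] = a_t2   # t6: T2 writes 150 -- T1's update is lost
```

The final value is 150 rather than the 100 a serial execution would produce, which is exactly the lost update.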

Dirty read problem (W-R conflict)

This type of problem occurs when one transaction T1 updates a data item of the database, and then that
transaction fails due to some reason, but its updates are accessed by some other transaction.
Example: Let’s take the value of A is 100.

Time    Transaction T1    Transaction T2
t1      Read(A)
t2      A = A + 20
t3      Write(A)
t4                        Read(A)
t5                        A = A + 30
t6                        Write(A)
t7      Write(B)
Here,
 At t1, T1 reads the value of A, i.e., 100.
 At t2, T1 adds 20 to A.
 At t3, T1 writes the value of A (120) in the database.
 At t4, T2 reads the value of A, i.e., 120.
 At t5, T2 adds 30 to A.
 At t6, T2 writes the value of A (150) in the database.
 At t7, transaction T1 fails due to a power failure and is rolled back according to the atomicity property
of transactions (either all or none).
 So at t4, transaction T2 read a value that had not been committed to the database. The value
read by transaction T2 is known as a dirty read.
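The dirty read above can be replayed the same way. Restoring T1's before-image at t7 also wipes out T2's write, which is exactly why T2 would have to be rolled back as well (a cascading rollback); this sketch simplifies by restoring a single saved before-image:

```python
# Simulating the dirty read: T2 reads T1's uncommitted value of A,
# then T1 fails and rolls back. A starts at 100.
db = {"A": 100}
before_image = db["A"]      # saved so T1 can be rolled back later

db["A"] = db["A"] + 20      # t2-t3: T1 writes 120 (still uncommitted)
dirty_value = db["A"]       # t4: T2 reads 120 -- a dirty read
db["A"] = dirty_value + 30  # t5-t6: T2 writes 150 based on dirty data

db["A"] = before_image      # t7: T1 fails; rollback restores 100,
                            # discarding T2's work as well
```

T2 read 120, a value that never existed in any committed database state.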

Unrepeatable read (R-W Conflict)


It is also known as the inconsistent retrieval problem. A transaction T1 reads a data item twice, and the
data item is changed by another transaction T2 in between the two read operations. Hence T1 sees two
different values for its two reads of the same data item.
Example: Let’s take the value of A is 100

Time    Transaction T1    Transaction T2
t1      Read(A)
t2                        Read(A)
t3                        A = A + 30
t4                        Write(A)
t5      Read(A)
Here,
 At t1, T1 reads the value of A, i.e., 100.
 At t2, T2 reads the value of A, i.e., 100.
 At t3, T2 adds 30 to A.
 At t4, T2 writes the value of A (130) in the database.
 T2 has updated the value of A. Thus, when transaction T1 performs its second read at t5, it sees the
new value of A written by T2. Such a conflict is known as an R-W
conflict.
UNIT-V CONCURRENCY CONTROL
CONCURRENCY CONTROL

Concurrency control is the working concept that is required for controlling and managing the concurrent
execution of database operations and thus avoiding inconsistencies in the database. Thus, for maintaining
the concurrency of the database, we have the concurrency control protocols.

Concurrency Control Protocols

The concurrency control protocols ensure the atomicity, consistency, isolation, durability and serializability of
the concurrent execution of the database transactions. Therefore, these protocols are categorized as:
o Lock Based Concurrency Control Protocol
o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol

Lock-Based Protocol
In this type of protocol, a transaction cannot read or write data until it acquires an appropriate lock on it. There
are two types of lock:

1. Shared lock:
o It is also known as a read-only lock. Under a shared lock, the data item can only be read by the transaction.
o It can be shared between transactions, because a transaction holding only a shared lock cannot update the
data item.

2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by the transaction.
o This lock is exclusive: multiple transactions cannot modify the same data simultaneously.

TWO-PHASE LOCKING (2PL)


o The two-phase locking protocol divides the execution of a transaction into three parts.
o In the first part, when the execution of the transaction starts, it seeks permission for the locks it requires.
o In the second part, the transaction acquires all the locks. The third part starts as soon as the transaction
releases its first lock.
o In the third part, the transaction cannot demand any new locks. It only releases the acquired locks.

There are two phases of 2PL:


Growing phase: In the growing phase, a new lock on a data item may be acquired by the transaction, but none
can be released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released, but no new
locks can be acquired.
If lock conversion is allowed, then:
1. Upgrading a lock (from S(a) to X(a)) is allowed only in the growing phase.
2. Downgrading a lock (from X(a) to S(a)) must be done in the shrinking phase.

Example:

The following example (schedule diagram omitted) shows how unlocking and locking work with 2-PL.

Transaction T1:
o Growing phase: from step 1-3
o Shrinking phase: from step 5-7
o Lock point: at 3

Transaction T2:
o Growing phase: from step 2-6
o Shrinking phase: from step 8-9
o Lock point: at 6

Strict Two-Phase Locking (Strict-2PL)


o The first phase of Strict-2PL is the same as in 2PL: the transaction acquires locks and
continues to execute normally.
o The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock immediately after using it.
o Strict-2PL waits until the whole transaction commits, and only then releases all the locks at once.
o The Strict-2PL protocol therefore does not have a gradual shrinking phase of lock release.
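A transaction's lock/unlock sequence can be checked for the two-phase property with a single scan: once any unlock is seen, no further lock may appear. A minimal sketch (the operation-tuple representation is our own):

```python
# Check whether a lock/unlock sequence obeys two-phase locking.
# ops is a list of ("lock", item) or ("unlock", item) tuples.

def is_two_phase(ops):
    unlock_seen = False
    for action, _item in ops:
        if action == "unlock":
            unlock_seen = True
        elif unlock_seen:        # a lock after some unlock violates 2PL
            return False
    return True

ok  = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
bad = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]
```

The second sequence unlocks A and then acquires a new lock on B, so it is not two phase.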
TIMESTAMP ORDERING PROTOCOL
o The timestamp ordering protocol is used to order transactions based on their timestamps. The order of the
transactions is simply the ascending order of their creation.
o The older transaction has higher priority, so it executes first. To determine the timestamp of a
transaction, this protocol uses the system time or a logical counter.
o The lock-based protocol manages the order between conflicting pairs of transactions at execution
time, whereas timestamp-based protocols start working as soon as a transaction is created.

Basic Timestamp ordering protocol works as follows:

1. Check the following conditions whenever a transaction Ti issues a Read(X) operation:

 If W_TS(X) > TS(Ti) then the operation is rejected.

 If W_TS(X) <= TS(Ti) then the operation is executed.

 R_TS(X) is updated to the maximum of R_TS(X) and TS(Ti).

2. Check the following conditions whenever a transaction Ti issues a Write(X) operation:

 If TS(Ti) < R_TS(X) then the operation is rejected and Ti is rolled back.

 If TS(Ti) < W_TS(X) then the operation is rejected and Ti is rolled back; otherwise the operation is
executed and W_TS(X) is set to TS(Ti).

Where,
TS(Ti) denotes the timestamp of transaction Ti.
R_TS(X) denotes the read time-stamp of data item X.
W_TS(X) denotes the write time-stamp of data item X.

Validation Based Protocol
The validation-based protocol is also known as the optimistic concurrency control technique. In this protocol, the
transaction is executed in the following three phases:

1. Read phase: In this phase, the transaction T is read and executed. It is used to read the value of various data
items and stores them in temporary local variables. It can perform all the write operations on temporary
variables without an update to the actual database.
2. Validation phase: In this phase, the temporary variable values are validated against the actual data to see
whether they violate serializability.
3. Write phase: If validation of the transaction succeeds, the temporary results are written to the
database; otherwise the transaction is rolled back.

Here each phase has the following different timestamps:


Start(Ti): It contains the time when Ti started its execution.
Validation(Ti): It contains the time when Ti finishes its read phase and starts its validation phase.
Finish(Ti): It contains the time when Ti finishes its write phase.

 This protocol determines the timestamp of a transaction for serialization using the timestamp of the
validation phase, as it is the phase which actually determines whether the transaction will commit or
roll back.
 Hence TS(T) = Validation(T).
 Serializability is determined during the validation process; it cannot be decided in advance.
 While executing transactions, the protocol allows a greater degree of concurrency and causes fewer
conflicts.
 Thus it leads to transactions with fewer rollbacks.
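The validation test for a pair of transactions Ti, Tj with TS(Ti) < TS(Tj) uses exactly these three timestamps: either Ti finished before Tj started, or Ti's write set does not intersect Tj's read set and Ti finished its write phase before Tj entered validation. A hedged sketch (parameter names are our own):

```python
# Validation test for a pair of transactions with TS(Ti) < TS(Tj).
# Timestamps are integers; read/write sets are Python sets of item names.

def validates(finish_i, write_set_i, start_j, validation_j, read_set_j):
    # Case 1: Ti finished before Tj even started -- trivially serial.
    if finish_i < start_j:
        return True
    # Case 2: Ti's writes do not touch anything Tj read, and Ti finished
    # its write phase before Tj entered its validation phase.
    if not (write_set_i & read_set_j) and finish_i < validation_j:
        return True
    return False
```

For example, overlapping executions pass validation only when the write/read sets are disjoint.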

THOMAS WRITE RULE

The Thomas write rule provides a guarantee of serializability order for the protocol. It improves the basic
timestamp ordering algorithm.
The basic Thomas write rules are as follows:
o If TS(T) < R_TS(X), then transaction T is aborted and rolled back, and the operation is rejected.
o If TS(T) < W_TS(X), then don't execute the Write(X) operation of the transaction and continue processing
(the obsolete write is simply ignored).
o If neither condition 1 nor condition 2 occurs, then the WRITE operation is executed by transaction T
and W_TS(X) is set to TS(T).
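Basic timestamp ordering and the Thomas write rule differ only in how an obsolete write is handled, which a small sketch can make concrete (timestamps are integers; R_TS/W_TS default to 0; the class and names are illustrative):

```python
# Basic timestamp ordering with an optional Thomas write rule (sketch).

class TimestampOrdering:
    def __init__(self, thomas=False):
        self.r_ts, self.w_ts = {}, {}   # per-item read/write timestamps
        self.thomas = thomas

    def read(self, ts, x):
        if ts < self.w_ts.get(x, 0):
            return "rollback"           # value was already overwritten
        self.r_ts[x] = max(self.r_ts.get(x, 0), ts)
        return "ok"

    def write(self, ts, x):
        if ts < self.r_ts.get(x, 0):
            return "rollback"           # a younger transaction already read x
        if ts < self.w_ts.get(x, 0):
            # Thomas write rule: ignore the obsolete write instead of
            # rolling the whole transaction back.
            return "ignored" if self.thomas else "rollback"
        self.w_ts[x] = ts
        return "ok"

basic  = TimestampOrdering()
thomas = TimestampOrdering(thomas=True)
basic.write(2, "X")                     # sets W_TS(X) = 2 in each scheduler
thomas.write(2, "X")
```

An older transaction (timestamp 1) now writing X is rolled back under basic timestamp ordering but merely ignored under the Thomas write rule.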

MULTIPLE GRANULARITY

Granularity: It is the size of the data item that is allowed to be locked.


Multiple Granularity:
o It can be defined as hierarchically breaking up the database into blocks which can be locked.
o The multiple granularity protocol enhances concurrency and reduces lock overhead.
o It keeps track of what to lock and how to lock.
o It makes it easy to decide whether to lock or unlock a data item. This type of hierarchy can be
graphically represented as a tree.
o The first (highest) level shows the entire database.
o The second level represents nodes of type area. The database consists of exactly these areas.
o Each area consists of child nodes known as files. No file can be present in more than one area.
o Finally, each file contains child nodes known as records. A file has exactly the records that are its child
nodes, and no record is present in more than one file.
Hence, the levels of the tree starting from the top level are as follows:

 Database
 Area
 File
 Record
Recovery and Atomicity:

When a system crashes, it may have several transactions being executed and various files opened for them to
modify the data items.
But according to ACID properties of DBMS, atomicity of transactions as a whole must be maintained, that is,
either all the operations are executed or none.
Database recovery means recovering the data when it gets deleted, corrupted or damaged accidentally.
Atomicity is a must: whether or not a transaction completes, either its effects should be reflected in the database
permanently, or they should not affect the database at all.

When a DBMS recovers from a crash, it should maintain the following −

 It should check the states of all the transactions, which were being executed.

 A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of the transaction
in this case.
 It should check whether the transaction can be completed now or it needs to be rolled back.
 No transactions would be allowed to leave the DBMS in an inconsistent state.

There are two types of techniques, which can help a DBMS in recovering as well as maintaining the
atomicity of a transaction −
Maintaining the logs of each transaction, and writing them onto some stable storage before actually modifying
the database.
Maintaining shadow paging, where the changes are done on a volatile memory, and later, the actual database
is updated.
Log-Based Recovery

o The log is a sequence of records. The log of each transaction is maintained in some stable storage so that if any
failure occurs, the database can be recovered from there.
o If any operation is performed on the database, then it will be recorded in the log.

o The process of storing the logs should be done before the actual transaction is applied to the database.

There are two approaches to modify the database:

1. Deferred database modification:

o The deferred modification technique is used if the transaction does not modify the database until it has
committed.
o In this method, all the logs are created and stored in stable storage, and the database is updated when the
transaction commits.

2. Immediate database modification:

o The immediate modification technique is used if database modification occurs while the transaction is still active.
o In this technique, the database is modified immediately after every operation. It follows an actual database
modification.
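Deferred modification can be sketched as "collect log records, apply only on commit"; on abort there is nothing to undo because the database was never touched (an illustrative sketch, not a real DBMS interface):

```python
# Deferred database modification (sketch): log records are collected
# while the transaction runs and applied to the database only at commit.

def run_deferred(db, log_records, commit):
    """log_records: list of (item, new_value) pairs; applied only on commit."""
    if commit:
        for item, new_value in log_records:   # redo the updates from the log
            db[item] = new_value
    return db   # on abort, db was never touched -- nothing to undo

committed = run_deferred({"A": 100}, [("A", 50)], commit=True)
aborted   = run_deferred({"A": 100}, [("A", 50)], commit=False)
```

Under immediate modification the update would instead hit the database right away, and an abort would require undoing it using the before-image in the log.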

Recovery with Concurrent Transactions

Concurrency control means that multiple transactions can be executed at the same time, so their log records are
interleaved. Because transactions may depend on each other's results, the recovery system must preserve the
order of execution of those transactions.
During recovery, it would be very difficult for the recovery system to backtrack through all the logs and then start
recovering.
Recovery with concurrent transactions can be done in the following four ways.

1. Interaction with concurrency control


2. Transaction rollback
3. Checkpoints
4. Restart recovery

Interaction with concurrency control:


In this scheme, the recovery scheme depends greatly on the concurrency control scheme that is used. So, to
roll back a failed transaction, we must undo the updates performed by the transaction.

Transaction rollback:
 In this scheme, we roll back a failed transaction by using the log.
 The system scans the log backward for the failed transaction; for every log record found, the system
restores the data item to its old value.

Checkpoints:
 A checkpoint is a process of saving a snapshot of the application's state so that it can restart from that point in
case of failure.
 A checkpoint is a point in time at which a record is written onto the database from the buffers.
 Checkpoints shorten the recovery process.
 When a checkpoint is reached, the transactions up to that point have been applied to the database, and the
log records before that point can be removed from the log file. The log file is then updated with the operations of
subsequent transactions until the next checkpoint, and so on.
 The checkpoint is used to declare a point before which the DBMS was in a consistent state, and all the
transactions were committed.

Restart recovery:
 When the system recovers from a crash, it constructs two lists.
 The undo-list consists of transactions to be undone, and the redo-list consists of transactions to be redone.
 The system constructs the two lists as follows: initially, both are empty. The system scans the log
backward, examining each record, until it finds the first
<checkpoint> record.
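The backward scan described above can be sketched as follows. Here a checkpoint record is assumed to carry the list of transactions active when it was taken; transactions with a commit record after the checkpoint go on the redo-list, the rest on the undo-list (the record format is our own):

```python
# Building redo/undo lists by scanning the log backward to the most
# recent checkpoint (sketch). Records are tuples:
#   ("start", T), ("commit", T), or ("checkpoint", [active transactions]).

def restart_lists(log):
    redo, undo = set(), set()
    for rec in reversed(log):
        kind = rec[0]
        if kind == "commit":
            redo.add(rec[1])
        elif kind == "start" and rec[1] not in redo:
            undo.add(rec[1])                      # started but never committed
        elif kind == "checkpoint":
            # Transactions active at the checkpoint with no later commit
            # must also be undone; stop scanning here.
            undo |= {t for t in rec[1] if t not in redo}
            break
    return redo, undo

log = [
    ("start", "T1"), ("checkpoint", ["T1"]),
    ("start", "T2"), ("commit", "T2"),
    ("start", "T3"),                              # crash before T3 commits
]
redo, undo = restart_lists(log)
```

For this log, T2 goes on the redo-list while T1 and T3 go on the undo-list.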


BUFFER MANAGEMENT
The buffer manager is the software layer that is responsible for bringing pages from physical disk to main
memory as needed. The buffer manager manages the available main memory by dividing it into a
collection of pages, which we call the buffer pool. The main memory pages in the buffer pool are called
frames.
 Data must be in RAM for the DBMS to operate on it.
 The buffer manager hides the fact that not all data is in RAM.
Buffer Manager

 A buffer manager is responsible for allocating space in the buffer in order to store data in it.
 If a user requests a particular block and the block is available in the buffer, the buffer manager provides
the block's address in main memory.
 If the block is not available in the buffer, the buffer manager allocates a frame for the block in the buffer.
 If free space is not available, it throws out some existing blocks from the buffer to allocate the required
space for the new block.
 The blocks that are thrown out are written back to the disk only if they were modified since they were last
read from the disk.
 If the user requests such a thrown-out block, the buffer manager reads the requested block from the disk
into the buffer and then passes the address of the requested block to the user in main memory.
 The internal actions of the buffer manager are not visible to the programs that issue
disk-block requests. The buffer manager is just like a virtual machine.
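A toy buffer pool illustrates the behaviour described above: a fixed number of frames, least-recently-used eviction, and write-back of dirty pages only. Pinning and concurrency are omitted; all names are illustrative:

```python
# A minimal buffer pool (sketch): fixed frame count, LRU eviction,
# dirty pages written back to "disk" before their frame is reused.
from collections import OrderedDict

class BufferManager:
    def __init__(self, num_frames, disk):
        self.frames = OrderedDict()         # page_id -> (data, dirty flag)
        self.num_frames, self.disk = num_frames, disk

    def get_page(self, page_id):
        if page_id in self.frames:          # hit: already buffered
            self.frames.move_to_end(page_id)
        else:                               # miss: may need to evict
            if len(self.frames) >= self.num_frames:
                victim, (data, dirty) = self.frames.popitem(last=False)
                if dirty:                   # write back only if modified
                    self.disk[victim] = data
            self.frames[page_id] = (self.disk[page_id], False)
        return self.frames[page_id][0]

    def mark_dirty(self, page_id, data):
        self.frames[page_id] = (data, True)

disk = {1: "p1", 2: "p2", 3: "p3"}
bm = BufferManager(num_frames=2, disk=disk)
bm.get_page(1); bm.mark_dirty(1, "p1*")
bm.get_page(2)
bm.get_page(3)    # pool full: evicts page 1 and writes it back to disk
```

After the last request, the modified page 1 has been flushed to disk and its frame reused for page 3.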

Failure with Loss of Nonvolatile Storage

Loss of Volatile Storage


A volatile storage like RAM stores all the active logs, disk buffers, and related data. In addition, it stores all the
transactions that are being currently executed. What happens if such a volatile storage crashes abruptly? It
would obviously take away all the logs and active copies of the database. It makes recovery almost
impossible, as everything that is required to recover the data is lost.
Following techniques may be adopted in case of loss of volatile storage −

 We can have checkpoints at multiple stages so as to save the contents of the database periodically.
 A state of active database in the volatile memory can be periodically dumped onto a stable storage,
which may also contain logs and active transactions and buffer blocks.
 <dump> can be marked on a log file, whenever the database contents are dumped from a non-volatile
memory to a stable one.

Recovery
 When the system recovers from a failure, it can restore the latest dump.
 It can maintain a redo-list and an undo-list as checkpoints.
 It can recover the system by consulting undo-redo lists to restore the state of all transactions up to the
last checkpoint.
ARIES Algorithm:

Algorithm for Recovery and Isolation Exploiting Semantics (ARIES) is based on the Write Ahead Log (WAL)
protocol. Every update operation writes a log record, which is one of the following:

1. Undo-only log record:


Only the before image is logged. Thus, an undo operation can be done to retrieve the old data.
2. Redo-only log record:
Only the after image is logged. Thus, a redo operation can be attempted.
3. Undo-redo log record:
Both before images and after images are logged.
o In ARIES, every log record is assigned a unique and monotonically increasing log sequence number (LSN).
o Every data page has a pageLSN field that is set to the LSN of the log record corresponding to the last
update on the page.
o WAL requires that the log record corresponding to an update reach stable storage before the data
page corresponding to that update is written to disk.
o For performance reasons, each log write is not immediately forced to disk. A log tail is maintained in
main memory to buffer log writes.
o The log tail is flushed to disk when it gets full. A transaction cannot be declared committed until its
commit log record makes it to disk.
o Once in a while the recovery subsystem writes a checkpoint record to the log. The checkpoint record
contains the transaction table and the dirty page table.
o A master log record is maintained separately, in stable storage, to store the LSN of the latest checkpoint
record that made it to disk.
o On restart, the recovery subsystem reads the master log record to find the checkpoint's LSN, reads the
checkpoint record, and starts recovery from there on.
The recovery process actually consists of 3 phases:

1. Analysis:
The recovery subsystem determines the earliest log record from which the next pass must start. It also scans
the log forward from the checkpoint record to construct a snapshot of what the system looked like at the
instant of the crash.
2. Redo:
Starting at the earliest LSN, the log is read forward and each update redone.
3. Undo:
The log is scanned backward and updates corresponding to loser transactions are undone.
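The three passes can be miniaturized as follows. This sketch omits compensation log records, the dirty page table, and LSN-based analysis, keeping only the core ideas: losers are transactions with no commit record, redo repeats history using pageLSN comparisons, and undo restores before-images of loser updates (record format and names are our own):

```python
# The three ARIES passes in miniature (sketch). Each update record is
# (lsn, txn, page, before_image, after_image); pages carry a pageLSN.

def aries_recover(log, committed, pages, page_lsn):
    # 1. Analysis: transactions without a commit record are losers.
    losers = {rec[1] for rec in log if rec[1] not in committed}
    # 2. Redo: repeat history -- reapply any update not yet on the page.
    for lsn, txn, page, before, after in log:
        if page_lsn.get(page, 0) < lsn:
            pages[page] = after
            page_lsn[page] = lsn
    # 3. Undo: scan backward, restoring before-images of loser updates.
    for lsn, txn, page, before, after in reversed(log):
        if txn in losers:
            pages[page] = before
    return pages

log = [(1, "T1", "P", "old", "mid"), (2, "T2", "P", "mid", "new")]
pages = aries_recover(log, committed={"T1"},
                      pages={"P": "old"}, page_lsn={"P": 0})
```

Here T1 committed but T2 did not, so after recovery page P holds T1's value and T2's update has been undone.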
Remote Backup:
Remote backup provides a sense of security in case the primary location where the database is located gets
destroyed. Remote backup can be offline or real-time or online. In case it is offline, it is maintained manually.
Online backup systems are more real-time and lifesavers for database administrators and investors. An online
backup system is a mechanism where every bit of the real-time data is backed up simultaneously at two distant
places. One of them is directly connected to the system and the other one is kept at a remote place as backup.
As soon as the primary database storage fails, the backup system senses the failure and switches the user system
to the remote storage. Sometimes this is so instant that the users can’t even realize a failure.
File – A file is a named collection of related information that is recorded on secondary storage such as magnetic
disks, magnetic tapes and optical disks.

Concurrency Control

 Lock-Based Protocols
 Timestamp-Based Protocols
 Validation-Based Protocols
 Multiple Granularity
 Multiversion Schemes
 Deadlock Handling
Lock-Based Protocols
 A lock is a mechanism to control concurrent access to a data item
 Data items can be locked in two modes :
o exclusive (X) mode. Data item can be both read as well as written. X-lock is requested using lock-X
instruction.
o shared (S) mode. Data item can only be read. S-lock is requested using
lock-S instruction.
 Lock requests are made to the concurrency-control manager. A transaction can proceed only after its request is
granted.
 Lock-compatibility matrix:

            S       X
      S    true    false
      X    false   false
 A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on
the item by other transactions
 Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive lock on the
item, no other transaction may hold any lock on the item.
 If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other
transactions have been released. The lock is then granted.
 Example of a transaction performing locking:
T2: lock-S(A); read(A); unlock(A);
    lock-S(B); read(B); unlock(B);
    display(A+B)
 Locking as above is not sufficient to guarantee serializability: if A and B get updated in between the reads of
A and B, the displayed sum would be wrong.
 A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking
protocols restrict the set of possible schedules.

Pitfalls of Lock-Based Protocols:


 Consider a partial schedule (diagram omitted) in which T3 holds an exclusive lock on B and T4 holds a
shared lock on A.
 Neither T3 nor T4 can make progress: executing lock-S(B) causes T4 to wait for T3 to release its lock on
B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A.
 Such a situation is called a deadlock.
 To handle a deadlock, one of T3 or T4 must be rolled back and its locks released.
The Two-Phase Locking Protocol
One protocol that ensures serializability is the two-phase locking protocol. This protocol requires that each
transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as needed. Once the transaction
releases a lock, it enters the shrinking phase, and it can issue no more lock requests.

For example, transactions T3 and T4 are two phase. On the other hand, transactions T1 and T2 are not two
phase. Note that the unlock instructions do not need to appear at the end of the transaction. For example, in the
case of transaction T3, we could move the unlock(B) instruction to just after the lock-X(A) instruction, and still
retain the two-phase locking property.
Cascading rollbacks can be avoided by a modification of two-phase locking called the strict two-phase locking
protocol. This protocol requires not only that locking be two phase, but also that all exclusive-mode locks taken
by a transaction be held until that transaction commits. This requirement ensures that any data written by an
uncommitted transaction are locked in exclusive mode until the transaction commits, preventing any other
transaction from reading the data.

Timestamp-Based Protocols

Timestamps:
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by TS(Ti). This
timestamp is assigned by the database system before the transaction Ti starts execution. If a transaction Ti has
been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then TS(Ti) < TS(Tj ). There are
two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction’s timestamp is equal to the value of
the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a transaction's
timestamp is equal to the value of the counter when the transaction enters the system.
The timestamps of the transactions determine the serializability order. Thus, if
TS(Ti) < TS(Tj ), then the system must ensure that the produced schedule is equivalent to a serial schedule in
which transaction Ti appears before transaction Tj .
To implement this scheme, we associate with each data item Q two timestamp values:
• W-timestamp(Q) denotes the largest timestamp of any transaction that executedwrite(Q) successfully.
• R-timestamp(Q) denotes the largest timestamp of any transaction that executedread(Q) successfully.
These timestamps are updated whenever a new read(Q) or write(Q) instruction isexecuted.

The Timestamp-Ordering Protocol:


The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in
timestamp order. This protocol operates as follows:
1. Suppose that transaction Ti issues read(Q).
 If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the
read operation is rejected, and Ti is rolled back.
 If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the maximum
of R-timestamp(Q) and TS(Ti).
2. Suppose that transaction Ti issues write(Q).
o If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system
assumed that that value would never be produced. Hence, the system rejects the write operation and rolls Ti
back.
o If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, the system rejects
this write operation and rolls Ti back. Otherwise, the system executes the write operation and sets
W-timestamp(Q) to TS(Ti).

DEADLOCK HANDLING:
A system is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another
transaction in the set.
Deadlock prevention protocols ensure that the system will never enter into a deadlock state. Some prevention
strategies :
o Require that each transaction locks all its data items before it begins execution (predeclaration).
o Impose a partial ordering of all data items and require that a transaction can lock data items only in the
order specified by the partial order (graph-based protocol).

DEADLOCK DETECTION:

(Figures omitted: a wait-for graph without a cycle, and a wait-for graph with a cycle.)
Deadlocks can be described by a wait-for graph, which consists of a pair G = (V, E), where
o V is a set of vertices (all the transactions in the system)
o E is a set of edges; each element is an ordered pair Ti → Tj.
o If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release
a data item.
o When Ti requests a data item currently being held by Tj, the edge Ti → Tj is inserted in the wait-for
graph. This edge is removed only when Tj is no longer holding a data item needed by Ti.
The system is in a deadlock state if and only if the wait-for graph has a cycle. We must invoke a deadlock-detection algorithm periodically to look for cycles.
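The periodic check is just cycle detection on a directed graph. Below is a minimal sketch using depth-first search; the function name and edge representation are invented for the example, with each edge (Ti, Tj) meaning "Ti waits for Tj".

```python
def has_cycle(edges):
    """Return True if the wait-for graph given by `edges` contains a cycle,
    i.e. the system is deadlocked."""
    graph = {}
    for ti, tj in edges:
        graph.setdefault(ti, []).append(tj)
        graph.setdefault(tj, [])

    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on current DFS path / done
    color = {v: WHITE for v in graph}

    def dfs(v):
        color[v] = GRAY
        for w in graph[v]:
            if color[w] == GRAY:   # back edge to the current path => cycle
                return True
            if color[w] == WHITE and dfs(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and dfs(v) for v in list(graph))
```

For instance, edges T1 → T2 → T3 alone form no cycle, but adding T3 → T1 closes the loop and the function reports a deadlock.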
Recovery from Deadlock:
When a detection algorithm determines that a deadlock exists, the system must recover from the deadlock. The
most common solution is to roll back one or more transactions to break the
deadlock. Three actions need to be taken:
1. Selection of a victim. Given a set of deadlocked transactions, we must determine which transaction (or
transactions) to roll back to break the deadlock. We should roll back those transactions that will incur the
minimum cost. Unfortunately, the term minimum cost is not a precise one. Many factors may determine the cost
of a rollback, including
a. How long the transaction has computed, and how much longer the transaction will compute before it
completes its designated task.
b. How many data items the transaction has used.
c. How many more data items the transaction needs for it to complete.
d. How many transactions will be involved in the rollback.
2. Rollback. Once we have decided that a particular transaction must be rolled back, we must determine how
far this transaction should be rolled back. The simplest solution is a total rollback: Abort the transaction and
then restart it. However, it is more effective to roll back the transaction only as far as necessary to break the
deadlock. Such partial rollback requires the system to maintain additional information about the state of all the
running transactions. Specifically, the sequence of lock requests/grants and updates performed by the transaction needs to be recorded. The deadlock-detection mechanism should decide which locks the selected transaction needs to release in order to break the deadlock. The selected transaction must be rolled back to the point where it obtained the first of these locks, undoing all actions it took after that point. The recovery mechanism must be capable of performing such partial rollbacks. Furthermore, the transactions must be capable of resuming execution after a partial rollback.
3. Starvation. In a system where the selection of victims is based primarily on cost factors, it may happen that the same transaction is always picked as a victim. As a result, this transaction never completes its designated task, and starvation occurs. We must ensure that a transaction can be picked as a victim only a (small) finite number of times. The most common solution is to include the number of rollbacks in the cost factor.
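Victim selection with starvation avoidance can be sketched as a simple cost function. The field names and weights below are illustrative assumptions, not a prescribed formula: work done and locks held make a transaction expensive to abort, and prior rollbacks inflate the cost so the same victim is not chosen indefinitely.

```python
def pick_victim(transactions):
    """Return the name of the cheapest transaction to roll back.
    Each transaction is a dict with hypothetical fields:
    'name', 'work_done', 'items_locked', 'rollbacks'."""
    def cost(t):
        # Prior rollbacks are weighted heavily to avoid starvation.
        return t["work_done"] + t["items_locked"] + 10 * t["rollbacks"]
    return min(transactions, key=cost)["name"]
```

A transaction that has already been rolled back several times quickly becomes "expensive" and is passed over in favour of another victim.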

The Phantom Phenomenon


Consider transaction T29 that executes the following SQL query on the bank database:
select sum(balance) from account where branch-name = ’Perryridge’
Transaction T29 requires access to all tuples of the account relation pertaining to the Perryridge branch.
Let T30 be a transaction that executes the following SQL insertion:
insert into account values (A-201, ’Perryridge’, 900)
Let S be a schedule involving T29 and T30. We expect there to be potential for a conflict for the following
reasons:
• If T29 uses the tuple newly inserted by T30 in computing sum(balance), then T29 read a value written by T30. Thus, in a serial schedule equivalent to S, T30 must come before T29.
• If T29 does not use the tuple newly inserted by T30 in computing sum(balance),then in a serial schedule
equivalent to S, T29 must come before T30.
The second of these two cases is curious. T29 and T30 do not access any tuple in common, yet they conflict with each other. In effect, T29 and T30 conflict on a phantom tuple. If concurrency control is performed at the tuple granularity, this conflict would go undetected. This problem is called the phantom phenomenon.
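The phantom can be demonstrated with a small simulation. T29 locks every existing Perryridge tuple before summing; T30 then inserts a new Perryridge tuple that T29 never locked, so tuple-granularity locking reports no conflict even though the two transactions logically conflict. The table contents are illustrative.

```python
# A toy account relation: account(no, branch, balance).
account = [
    {"no": "A-101", "branch": "Perryridge", "balance": 500},
    {"no": "A-102", "branch": "Brighton",   "balance": 700},
]

# T29: lock and sum every tuple that currently satisfies
# branch = 'Perryridge'.
locked = {t["no"] for t in account if t["branch"] == "Perryridge"}
total = sum(t["balance"] for t in account if t["branch"] == "Perryridge")

# T30: insert a new Perryridge tuple. It touches no tuple in `locked`,
# so tuple-granularity locking sees no conflict -- the phantom.
new_tuple = {"no": "A-201", "branch": "Perryridge", "balance": 900}
account.append(new_tuple)
conflict_detected = new_tuple["no"] in locked   # False

# Re-running T29's query now yields a different sum (500 + 900 = 1400),
# even though no locked tuple was ever touched by T30.
```

This is why phantoms require locking at a coarser granularity than individual tuples, for example predicate or index locking on the branch-name value.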
