DBMS Notes
DBMS (4CS4-05)
IV SEM (2022-23)
UNIT I: Introduction to DBMS
Objectives:
To understand the basic concepts and the applications of database systems
To master the basics of SQL and construct queries using SQL
To understand the relational database design principles
To become familiar with the basic issues of transaction processing and concurrency control
To become familiar with database storage structures and access techniques
Outcomes:
Demonstrate the basic elements of a relational database management system
Ability to identify the data models for relevant problems
Ability to design entity-relationship diagrams, convert them into relational (RDBMS) schemas, and formulate
SQL queries on the respective data
Apply normalization for the development of application software
INTRODUCTION TO DBMS:
Data consists of facts and statistics that are stored or flow freely over a network; it is generally raw and
unprocessed.
Data becomes information when it is processed, turning it into something meaningful.
A database is a collection of interrelated data from which data can be retrieved, inserted and deleted
efficiently.
It is also used to organize data in the form of tables, schemas, views, reports, etc.
Using the database, you can easily retrieve, insert, and delete information.
For example, a college database organizes the data about the admin, staff, students, faculty, etc.
Difference between DBMS and File System:
1. DBMS: Manages a collection of data; the user is not required to write the procedures for managing it.
   File system: Also a collection of data, but the user has to write the procedures for managing it.
2. DBMS: Gives an abstract view of data that hides the details.
   File system: Exposes the details of data representation and storage.
3. DBMS: Provides a crash recovery mechanism, i.e., it protects the user from system failure.
   File system: Has no crash recovery mechanism, i.e., if the system crashes while some data is being entered,
   the content of the file will be lost.
4. DBMS: Provides a good protection mechanism.
   File system: It is very difficult to protect a file under the file system.
5. DBMS: Uses a wide variety of sophisticated techniques to store and retrieve data.
   File system: Cannot store and retrieve data efficiently.
6. DBMS: Takes care of concurrent access to data using some form of locking.
   File system: Concurrent access causes many problems, e.g., one user reading a file while another is
   deleting or updating some of its information.
History of DBMS:
Data is a collection of facts and figures. The volume of collected data was increasing day by day, and it needed
to be stored in a device or software that is safer.
Charles Bachman developed the Integrated Data Store (IDS), one of the first database systems, which was based
on the network data model. It was developed in the early 1960s, and for his work Bachman received the Turing
Award (the most prestigious award in computer science, often regarded as its equivalent of the Nobel Prize).
In the late 1960s, IBM (International Business Machines Corporation) developed the Information Management
System (IMS), which is still used in many places today. It was based on the hierarchical database model. In
1970, Edgar Codd proposed the relational database model. Many of the databases we use today are relational,
and the relational model has been considered the standard database model since then.
The relational model went on to be widely adopted in the market. During the 1970s, IBM developed the
Structured Query Language (SQL) as part of its System R project; SQL was later declared the standard query
language by ANSI (1986) and ISO (1987). James (Jim) Gray made fundamental contributions to transaction
management for processing transactions, for which he received the Turing Award.
A DBMS is software that allows the creation, definition and manipulation of a database, allowing users to store,
process and analyse data easily.
A DBMS provides us with an interface or a tool to perform various operations such as creating a database,
storing data in it, updating data, creating tables in the database and a lot more.
DBMS also provides protection and security to the databases.
It also maintains data consistency in case of multiple users. Here are some examples of popular DBMS
used these days:
MySQL
Oracle
SQL Server
IBM DB2
DATABASE APPLICATIONS:
1. Telecom: A database keeps track of information regarding calls made, network usage,
customer details, etc.
2. Industry: Whether it is a manufacturing unit, warehouse or distribution centre, each one needs a database to
keep records of goods coming in and going out.
3. Banking System: For storing customer information, tracking day-to-day credit and debit transactions,
generating bank statements, etc.
4. Sales: To store customer information, product information and invoice details.
5. Airlines: To travel by air, we make reservations in advance; this reservation information, along with the
flight schedule, is stored in a database.
6. Education sector: Database systems are frequently used in schools and colleges to store and retrieve data
regarding student details, staff details, course details, exam details, payroll data, attendance details, fee
details, etc.
Characteristics of DBMS:
Data stored in tables: In a relational DBMS, data is not stored directly in the database; it is stored in tables
created inside the database.
Reduced Redundancy: In the modern world hard drives are very cheap, but earlier, when hard drives
were very expensive, unnecessary repetition of data in a database was a big problem. A DBMS follows
normalisation, which divides the data in such a way that repetition is minimal.
Data Consistency: On live data, i.e., data that is continuously updated and added to, maintaining the
consistency of data can become a challenge, but a DBMS handles it by itself.
Support for Multiple Users and Concurrent Access: A DBMS allows multiple users to work on it (update, insert,
delete data) at the same time and still maintains data consistency.
Query Language: A DBMS provides users with a simple query language with which data can be easily
fetched, inserted, deleted and updated in a database.
Advantages of DBMS:
Controls database redundancy: It can control data redundancy because all data is stored in a single database,
so duplicate copies of the same data are avoided.
Data sharing: In a DBMS, the authorized users of an organization can share data among multiple users.
Easy maintenance: It can be easily maintained due to the centralized nature of the database system.
Reduced time: It reduces development time and maintenance needs.
Backup: It provides backup and recovery subsystems which automatically back up data to protect against
hardware and software failures, and restore the data if required.
Multiple user interfaces: It provides different types of user interfaces, such as graphical user interfaces and
application program interfaces.
Disadvantages of DBMS:
Cost of Hardware and Software: It requires a high-speed processor and a large amount of memory to run
DBMS software.
Size: It occupies a large amount of disk space and needs a large memory to run efficiently.
Complexity: A database system creates additional complexity and requirements.
Higher impact of failure: A failure has a high impact on the database because in most organizations all the
data is stored in a single database, and if the database is damaged due to an electrical failure or database
corruption, the data may be lost forever.
Database:
A database is an organized collection of related data of an organization, stored in a formatted way, which is
shared by multiple users.
The main features of data in a database are:
1. It must be well organized
2. It is related
3. It is accessible in a logical order without any difficulty
4. It is stored only once.
For example, consider the roll no, name and address of a student stored in a student file. It is a collection of
related data with an implicit meaning.
Data in the database may be persistent, integrated and shared.
Persistent:
Data is persistent when it is removed from the database only due to an explicit request from the user.
Integrated:
A database can be a collection of data from different files; when any redundancy among those files is
removed, the database is said to contain integrated data.
Sharing Data:
The data stored in the database can be shared by multiple users simultaneously without affecting the correctness
of data.
Why Database over file system:
In order to overcome the limitations of a file system, a new approach was required; hence the database
approach emerged. A database is a persistent collection of logically related data. The initial attempts were to
provide a centralized collection of data. A database has a self-describing nature: it contains not only the data
but also a description of its structure. It enables sharing and integration of the data of an organization in a
single database.
A small database can be handled manually, but a large database with multiple users is difficult to maintain by
hand; in that case a computerized database is useful. The advantages of a database system over traditional,
paper-based methods of record keeping are:
Compactness: No need for large amount of paper files
Speed: The machine can retrieve and modify data faster than a human being.
Less drudgery: Much of the maintenance of files by hand is eliminated.
Accuracy: Accurate, up-to-date information is fetched as per the requirement of the user at any time.
Function of DBMS:
1. Defining database schema: It must give a facility for defining the database structure and also specify access
rights for authorized users.
2. Manipulation of the database: The DBMS must have functions such as insertion of records into the
database, updating of data, deletion of data, and retrieval of data.
3. Sharing of database: The DBMS must share data items among multiple users while maintaining the
consistency of data.
4. Protection of database: It must protect the database against unauthorized users.
5. Database recovery: If for any reason the system fails DBMS must facilitate database recovery.
Database systems are made up of complex data structures. To ease user interaction with the database, the
developers hide internal irrelevant details from users. This process of hiding irrelevant details from users is
called data abstraction.
We have three levels of abstraction:
Physical level: This is the lowest level of data abstraction. It describes how data is actually stored in the
database. You can get the complex data structure details at this level.
Logical level: This is the middle level of the 3-level data abstraction architecture. It describes what data is
stored in the database.
View level: This is the highest level of data abstraction. It describes the user interaction with the database
system.
Definition of schema: Design of a database is called the schema. Schema is of three types: Physical
schema, logical schema and view schema.
The design of a database at the physical level is called the physical schema; how the data is stored in blocks
of storage is described at this level.
The design of a database at the logical level is called the logical schema. Programmers and database
administrators work at this level; here data can be described as certain types of data records stored in data
structures, but internal details such as the implementation of the data structures are hidden (they are
available at the physical level).
The design of a database at the view level is called the view schema. This generally describes end-user
interaction with database systems.
Definition of instance: The data stored in a database at a particular moment of time is called an instance of
the database. The database schema defines the variable declarations in the tables that belong to a particular
database; the values of these variables at a moment of time are called the instance of that database.
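To make the schema/instance distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table student and its columns are hypothetical examples. The CREATE TABLE statement plays the role of the schema, which stays the same, while the rows read at two different moments are two different instances.

```python
import sqlite3

# In-memory database; the student table is a hypothetical example.
conn = sqlite3.connect(":memory:")

# Schema: the design of the table (its "variable declarations").
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

def instance(conn):
    """Return the rows stored right now, i.e. the current instance."""
    return conn.execute("SELECT roll_no, name FROM student ORDER BY roll_no").fetchall()

instance_t1 = instance(conn)            # instance at time t1: no rows yet
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("INSERT INTO student VALUES (2, 'Ravi')")
instance_t2 = instance(conn)            # instance at time t2: two rows

# The schema has not changed between t1 and t2.
schema_columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]

print(instance_t1)     # []
print(instance_t2)     # [(1, 'Asha'), (2, 'Ravi')]
print(schema_columns)  # ['roll_no', 'name']
```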
Relational Model (RM) represents the database as a collection of relations. A relation is nothing but a table
of values. Every row in the table represents a collection of related data values. These rows in the table denote a
real-world entity or relationship.
The table name and column names help to interpret the meaning of the values in each row. The data are
represented as a set of relations; in the relational model, data are stored as tables. However, the physical
storage of the data is independent of the way the data are logically organized.
Relational Model Concepts:
1. Attribute: Each column in a table. Attributes are the properties which define a relation, e.g.,
Student_Rollno, NAME, etc.
2. Tables: In the relational model, relations are saved in table format. A table has two properties, rows and
columns: rows represent records and columns represent attributes.
3. Tuple: A single row of a table, which contains a single record.
4. Relation Schema: A relation schema represents the name of the relation with its attributes.
5. Degree: The total number of attributes in the relation is called the degree of the relation.
6. Cardinality: The total number of rows present in the table.
7. Column: The column represents the set of values for a specific attribute.
8. Relation instance: A relation instance is a finite set of tuples in the RDBMS system. Relation instances
never have duplicate tuples.
9. Relation key: Each row has one or more attributes, called the relation key, by which it can be uniquely
identified.
10. Attribute domain: Every attribute has a pre-defined set of values and scope, which is known as the
attribute domain.
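The following sketch, assuming Python's sqlite3 module and a hypothetical student table, reads the relation schema from the cursor's column description and computes the degree (number of attributes) and cardinality (number of tuples) of the relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relation with three attributes.
conn.execute("CREATE TABLE student (student_rollno INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Asha", "Jaipur"), (2, "Ravi", "Kota"), (3, "Meena", "Ajmer")])

cur = conn.execute("SELECT * FROM student")
# cursor.description has one entry per column; entry[0] is the column name.
relation_schema = ("student", tuple(col[0] for col in cur.description))
degree = len(cur.description)        # number of attributes (columns)
cardinality = len(cur.fetchall())    # number of tuples (rows)

print(relation_schema, degree, cardinality)
# ('student', ('student_rollno', 'name', 'city')) 3 3
```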
Keys in DBMS:
A key in DBMS is an attribute or set of attributes which helps you to identify a row (tuple) in a relation (table).
Keys also allow you to find the relation between two tables. Keys help you uniquely identify a row in a table by
a combination of one or more columns in that table, which is also helpful for finding a unique record or row in
the table.
Why do we need a key?
Here are some reasons for using keys in a DBMS:
Keys help you to identify any row of data in a table. In a real-world application, a table could contain
thousands of records, and the records could be duplicated. Keys ensure that you can uniquely identify a
table record despite these challenges.
Keys allow you to establish relationships between tables and identify them.
Keys help you to enforce identity and integrity in relationships.
There are mainly eight different types of keys in DBMS, and each key has its own function:
Super Key - A super key is a set of one or more attributes that uniquely identifies rows in a table.
Primary Key - is a column or group of columns in a table that uniquely identifies every row in that table.
Candidate Key - is a set of attributes that uniquely identifies tuples in a table. A candidate key is a minimal
super key, i.e., a super key with no redundant attributes.
Alternate Key - is a candidate key that is not chosen as the primary key. Like the primary key, it uniquely
identifies every row in the table.
Foreign Key - is a column that creates a relationship between two tables. The purpose of foreign keys is to
maintain data integrity and allow navigation between two different instances of an entity.
Compound Key - has two or more attributes that allow you to uniquely recognize a specific record. It is
possible that each column may not be unique by itself within the database.
Composite Key - is a combination of two or more columns that uniquely identifies rows in a table. The
combination of columns guarantees uniqueness, though the individual columns need not be unique.
Surrogate Key - An artificial key which aims to uniquely identify each record is called a surrogate key.
Such keys are generated by the system and are used when you don't have any natural primary key.
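Super keys and candidate keys can be checked mechanically on a sample relation instance. The sketch below uses hypothetical attributes and rows; it treats a set of attributes as a super key if no two rows agree on all of them, and a candidate key as a super key with no proper subset that is also a super key. A single instance can only suggest keys; in a real design, keys come from the meaning of the data and are declared in the schema.

```python
from itertools import combinations

# Hypothetical relation instance: each tuple is (roll_no, email, name, dept).
ATTRS = ("roll_no", "email", "name", "dept")
ROWS = [
    (1, "asha@x.edu", "Asha", "CSE"),
    (2, "ravi@x.edu", "Ravi", "ECE"),
    (3, "meena@x.edu", "Asha", "CSE"),
]

def is_super_key(attrs, rows=ROWS):
    """True if no two rows agree on all attributes in `attrs`."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = [tuple(r[i] for i in idx) for r in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows=ROWS):
    """Minimal super keys: no proper subset is itself a super key."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            # Supersets of an already-found key are super keys but not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_super_key(combo, rows):
                keys.append(combo)
    return keys

print(candidate_keys())                   # [('roll_no',), ('email',)]
print(is_super_key(("roll_no", "name")))  # True  (a super key, but not minimal)
print(is_super_key(("name", "dept")))     # False (rows 1 and 3 agree)
```

Here roll_no and email are both candidate keys: choosing roll_no as the primary key would make email an alternate key.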
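Syntax
The syntax to add a primary key to an existing table using the ALTER TABLE statement in SQL is:
ALTER TABLE table_name
ADD CONSTRAINT constraint_name PRIMARY KEY (column1, column2, ...);
The following SQL creates a FOREIGN KEY on the "PersonID" column when the "Orders" table is
created (it assumes a "Persons" table whose primary key is PersonID already exists):
CREATE TABLE Orders (
    OrderID int NOT NULL,
    OrderNumber int NOT NULL,
    PersonID int,
    PRIMARY KEY (OrderID),
    FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)
);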
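Foreign key enforcement can be tried with Python's sqlite3 module. This sketch runs the Orders example against an assumed Persons table; note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection, and other DBMSs differ in such details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Assumed parent table; its primary key is the referenced column.
conn.execute("CREATE TABLE Persons (PersonID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE Orders (
        OrderID     INTEGER NOT NULL,
        OrderNumber INTEGER NOT NULL,
        PersonID    INTEGER,
        PRIMARY KEY (OrderID),
        FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)
    )""")

conn.execute("INSERT INTO Persons VALUES (1, 'Asha')")
conn.execute("INSERT INTO Orders VALUES (10, 5001, 1)")       # OK: person 1 exists

try:
    conn.execute("INSERT INTO Orders VALUES (11, 5002, 99)")  # no person 99
    fk_rejected = False
except sqlite3.IntegrityError as err:
    fk_rejected = True
    print("Rejected:", err)

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(order_count)  # 1
```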
ER model:
o ER model stands for Entity-Relationship model. It is a high-level data model used to define the data
elements and relationships for a specified system.
o It develops a conceptual design for the database, giving a very simple and easy-to-design view of data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-relationship diagram.
For example, suppose we design a school database. In this database, the student will be an entity with
attributes like address, name, id, age, etc. The address can be another entity with attributes like city,
street name, pin code, etc., and there will be a relationship between them.
Components of ER Diagram:
1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity is represented as a
rectangle.
Consider an organization as an example: manager, product, employee, department, etc. can be taken as
entities.
an entity.
Weak Entity
An entity that depends on another entity is called a weak entity. A weak entity doesn't contain any key
attribute of its own. A weak entity is represented by a double rectangle.
2. Attribute
An attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.
Key Attribute
The key attribute is used to represent the main characteristics of an entity. It represents a primary key.
The key attribute is represented by an ellipse with the text underlined.
Composite Attribute
An attribute that is composed of many other attributes is known as a composite attribute. A composite
attribute is represented by an ellipse, and the ellipses of its component attributes are connected to it.
Multivalued Attribute
An attribute that can have more than one value is known as a multivalued attribute. A double oval is used to
represent a multivalued attribute.
For example, a student can have more than one phone number.
Derived Attribute
An attribute that can be derived from another attribute is known as a derived attribute. It is represented by a
dashed ellipse.
For example, a person's age changes over time and can be derived from another attribute like date of birth.
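A derived attribute is usually computed when needed rather than stored. A minimal Python sketch, assuming age is counted in completed years from the stored date of birth (the function name age_on is hypothetical):

```python
from datetime import date

def age_on(date_of_birth: date, today: date) -> int:
    """Derive age in completed years from the stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age_on(date(2003, 8, 15), date(2023, 3, 1)))   # 19
print(age_on(date(2003, 8, 15), date(2023, 8, 15)))  # 20
```

Only the date of birth is stored; the age is always recomputed, so it never goes out of date.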
3. Relationship
A relationship is used to describe the relation between entities. A diamond or rhombus is used to represent
a relationship.
One-to-One Relationship
When only one instance of an entity on each side is associated with the relationship, it is known as a
one-to-one relationship.
For example, a female can be married to one male, and a male can be married to one female.
One-to-many relationship
When only one instance of the entity on the left and more than one instance of the entity on the right
are associated with the relationship, it is known as a one-to-many relationship.
For example, a scientist can invent many inventions, but each invention is made by only one specific
scientist.
Many-to-one relationship
When more than one instance of the entity on the left and only one instance of the entity on the right
are associated with the relationship, it is known as a many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many students.
Many-to-many relationship
When more than one instance of the entity on the left and more than one instance of the entity on the
right are associated with the relationship, it is known as a many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have many employees.
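In a relational schema, a many-to-many relationship is commonly implemented with a separate table that holds pairs of keys. The sketch below uses hypothetical employee, project and works_on tables in SQLite (via Python's sqlite3) for the employee/project example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
    -- Each row of works_on is one (employee, project) relationship instance.
    CREATE TABLE works_on (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO project  VALUES (100, 'Payroll'), (200, 'Website');
    INSERT INTO works_on VALUES (1, 100), (1, 200), (2, 100);
""")

def projects_of(emp_id):
    """All projects one employee is assigned to."""
    return [t for (t,) in conn.execute(
        "SELECT p.title FROM works_on w JOIN project p ON p.proj_id = w.proj_id "
        "WHERE w.emp_id = ? ORDER BY p.title", (emp_id,))]

def employees_on(proj_id):
    """All employees working on one project."""
    return [n for (n,) in conn.execute(
        "SELECT e.name FROM works_on w JOIN employee e ON e.emp_id = w.emp_id "
        "WHERE w.proj_id = ? ORDER BY e.name", (proj_id,))]

print(projects_of(1))     # ['Payroll', 'Website']
print(employees_on(100))  # ['Asha', 'Ravi']
```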
Notation of ER diagram:
Database can be represented using the notations. In ER diagram, many notations are used to express the
cardinality. These notations are as follows:
Integrity Constraints
o Integrity constraints are a set of rules used to maintain the quality of information.
o Integrity constraints ensure that data insertion, updating, and other processes are performed in such a
way that data integrity is not affected.
o Thus, integrity constraints are used to guard against accidental damage to the database.
Example:
o The entity integrity constraint states that a primary key value can't be null.
o This is because the primary key value is used to identify individual rows in a relation, and if the primary key
has a null value, then we can't identify those rows.
o A table can contain null values in fields other than the primary key.
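The entity integrity rule can be observed in SQLite through Python's sqlite3 module. This is a sketch with a hypothetical student table; the primary key column is declared NOT NULL explicitly because SQLite, for backward compatibility, allows NULL in most PRIMARY KEY columns unless NOT NULL is stated, whereas the SQL standard implies it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is stated explicitly so SQLite enforces entity integrity on the key.
conn.execute("CREATE TABLE student (roll_no TEXT PRIMARY KEY NOT NULL, phone TEXT)")

conn.execute("INSERT INTO student VALUES ('e101', NULL)")  # NULL allowed in a non-key column

try:
    conn.execute("INSERT INTO student VALUES (NULL, '98290')")  # NULL primary key
    null_pk_rejected = False
except sqlite3.IntegrityError:
    null_pk_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(null_pk_rejected, row_count)  # True 1
```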
Database Basics:
Data item:
The data item, also called a field in data processing, is the smallest unit of data that has meaning to its
users.
E.g.: "e101", "sumit"
A subschema is a schema derived from an existing schema as per user requirements. More than one
subschema may be created for a single conceptual schema.
A database management system that provides three levels of data abstraction is said to follow the
three-level architecture:
External level
Conceptual level
Internal level
External level:
The external level is the highest level of database abstraction. At this level, there will be many views defined
for the requirements of different users. A view describes only a subset of the database. Any number of user
views may exist for a given global schema or subschema.
For example, each student has a different view of the time table. The view of a student of B.Tech (CSE) is
different from the view of a student of B.Tech (ECE). Thus this level of abstraction is concerned with different
categories of users. Each external view is described by means of a schema called an external schema or
subschema.
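The external level can be illustrated with SQL views: each view exposes only the part of the data one group of users needs. A sketch in Python's sqlite3 with a hypothetical timetable table and two views, one per branch, mirroring the CSE/ECE time table example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE timetable (branch TEXT, day TEXT, subject TEXT);
    INSERT INTO timetable VALUES
        ('CSE', 'Mon', 'DBMS'),
        ('CSE', 'Tue', 'Operating Systems'),
        ('ECE', 'Mon', 'Signals and Systems');
    -- External views: each exposes only the subset one user group needs.
    CREATE VIEW cse_timetable AS
        SELECT day, subject FROM timetable WHERE branch = 'CSE';
    CREATE VIEW ece_timetable AS
        SELECT day, subject FROM timetable WHERE branch = 'ECE';
""")

cse_view = conn.execute("SELECT day, subject FROM cse_timetable ORDER BY day").fetchall()
ece_view = conn.execute("SELECT day, subject FROM ece_timetable ORDER BY day").fetchall()
print(cse_view)  # [('Mon', 'DBMS'), ('Tue', 'Operating Systems')]
print(ece_view)  # [('Mon', 'Signals and Systems')]
```

Both views are defined over the same stored table, so the conceptual data exists only once while each user group sees its own subset.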
Conceptual level:
At this level of database abstraction, all the database entities and the relationships among them are included.
One conceptual view represents the entire database. This conceptual view is defined by the conceptual schema.
The conceptual schema hides the details of physical storage structures and concentrates on describing entities,
data types, relationships, user operations and constraints.
It describes all the records and relationships included in the conceptual view. There is only one conceptual
schema per database. It includes features that specify checks to retain data consistency and integrity.
Internal level:
It is the lowest level of abstraction, closest to the physical storage method used. It indicates how the data will
be stored and describes the data structures and access methods to be used by the database. The internal view
is expressed by the internal schema.
The following aspects are considered at this level:
1. Storage allocation, e.g., B-trees, hashing
2. Access paths, e.g., specification of primary and secondary keys, indexes, etc.
3. Miscellaneous, e.g., data compression and encryption techniques, optimization of the internal structures.
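Indexes are one of the access paths decided at the internal level. The sketch below (hypothetical student table, SQLite via Python's sqlite3) compares SQLite's EXPLAIN QUERY PLAN output before and after creating an index; SQLite implements ordinary indexes as B-trees. The exact wording of the plan text varies between SQLite versions, so the example only looks for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO student (name, dept) VALUES (?, ?)",
                 [(f"student{i}", "CSE" if i % 2 else "ECE") for i in range(1000)])

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT roll_no FROM student WHERE name = 'student42'"
before = plan(query)   # typically a full scan of student
conn.execute("CREATE INDEX idx_student_name ON student(name)")  # B-tree index on name
after = plan(query)    # typically a search using idx_student_name

print(before)
print(after)
```

The data in the table is unchanged; only the access path chosen by the DBMS differs, which is exactly the kind of decision that belongs to the internal level.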
Database users:
Naive users:
Users who need not be aware of the presence of the database system or any other system supporting their usage
are considered naïve users. A user of an automatic teller machine falls in this category.
Online users:
These are users who may communicate with the database directly via an online terminal or indirectly via a user
interface and application program. These users are aware of the database system and also know the data
manipulation language (DML).
Application programmers:
Professional programmers who are responsible for developing the application programs or user interfaces used
by naïve and online users fall into this category.
Database Administrator (DBA):
A person who has central control over the system is called the database administrator (DBA).
The functions of the DBA are:
1. Creation and modification of conceptual Schema definition
2. Implementation of storage structure and access method.
3. Schema and physical organization modifications.
4. Granting of authorization for data access.
5. Integrity constraints specification.
6. Execute immediate recovery procedure in case of failures.
7. Ensure physical security to database.
Database language:
Elements of DBMS:
DML pre-compiler:
It converts DML statements embedded in an application program into normal procedure calls in the host
language. The pre-compiler must interact with the query processor in order to generate the appropriate code.
DDL compiler:
The DDL compiler converts the data definition statements into a set of tables. These tables contain information
concerning the database and are in a form that can be used by other components of the DBMS.
File manager:
File manager manages the allocation of space on disk storage and the data structure used to represent
information stored on disk.
Database manager:
A database manager is a program module which provides the interface between the low level data stored in the
database and the application programs and queries submitted to the system.
The responsibilities of database manager are:
1. Interaction with file manager: The data is stored on disk using the file system provided by the
operating system. The database manager translates the different DML statements into low-level file system
commands, so the database manager is responsible for the actual storing, retrieving and updating of data in
the database.
2. Integrity enforcement: The data values stored in the database must satisfy certain constraints (e.g., the age
of a person can't be less than zero). These constraints are specified by the DBA. The database manager checks
the constraints and, if they are satisfied, stores the data in the database.
3. Security enforcement: The database manager enforces the security measures that protect the database from
unauthorized users.
4. Backup and recovery: The database manager detects failures that occur due to different causes (like disk
failure, power failure, deadlock, software error) and restores the database to the consistent state it was in
before the failure.
5. Concurrency control: When several users access the same database file simultaneously, data may become
inconsistent. It is the responsibility of the database manager to control the problems that occur with
concurrent transactions.
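Integrity enforcement (point 2) can be sketched with SQLite's CHECK constraint, using Python's sqlite3 connection as a context manager, which commits on success and rolls back on an error. The person table and the insert_person helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraint from the text: a person's age can't be less than zero.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER CHECK (age >= 0))")

def insert_person(pid, name, age):
    """Try to insert a row; return True if the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO person VALUES (?, ?, ?)", (pid, name, age))
        return True
    except sqlite3.IntegrityError:
        return False

ok = insert_person(1, "Asha", 20)
bad = insert_person(2, "Ravi", -5)   # violates age >= 0, so it is rejected
stored = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(ok, bad, stored)  # True False 1
```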
Query processor:
The query processor interprets an online user's query and converts it into an efficient series of operations
in a form capable of being sent to the data manager for execution. The query processor uses the data dictionary
to find the details of the data files and, using this information, creates a query plan (access plan) to execute
the query.
Data Dictionary:
A data dictionary is a table that contains information about database objects. It contains information such as:
1. External, conceptual and internal database descriptions
2. Descriptions of entities and attributes, as well as the meaning of data elements
3. Synonyms, authorization and security codes
4. Database authorization
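Most relational systems expose their data dictionary as queryable tables. In SQLite, for example, the catalog is the sqlite_master table and column details are available through PRAGMA table_info; the sketch below (hypothetical customer table, index and view, via Python's sqlite3) reads both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT NOT NULL);
    CREATE INDEX idx_customer_name ON customer(customer_name);
    CREATE VIEW customer_names AS SELECT customer_name FROM customer;
""")

# sqlite_master is SQLite's catalog: one row per table, index, view or trigger.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name").fetchall()

# PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
columns = [(name, col_type, bool(notnull))
           for _cid, name, col_type, notnull, _default, _pk
           in conn.execute("PRAGMA table_info(customer)")]

print(objects)  # [('index', 'idx_customer_name'), ('table', 'customer'), ('view', 'customer_names')]
print(columns)  # [('customer_id', 'INTEGER', False), ('customer_name', 'TEXT', True)]
```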
DBMS STRUCTURE:
[Figure: DBMS structure. Components shown: DBMS, database manager, file manager, data files and data dictionary.]
Some main differences between a database management system and a file-processing system are:
• Both systems contain a collection of data and a set of programs which access that data. A database
management system coordinates both the physical and the logical
access to the data, whereas a file-processing system coordinates only the physical access.
• A database management system reduces the amount of data duplication by ensuring that a physical piece of
data is available to all programs authorized to have access to it, whereas data written by one program in a
file-processing system may not be readable by another program.
• A database management system is designed to allow flexible access to data (i.e., queries), whereas a file-
processing system is designed to allow predetermined access to data (i.e., compiled programs).
• A database management system is designed to coordinate multiple users accessing the same data at the
same time. A file-processing system is usually designed to allow one or more programs to access different data
files at the same time. In a file-processing system, a file can be accessed by two programs concurrently only if
both programs have read-only access to the file.
Q. List five responsibilities of a database management system. For each responsibility, explain the
problems that would arise if the responsibility were not discharged.
A general purpose database manager (DBM) has five responsibilities:
a. Interaction with the file manager.
b. Integrity enforcement.
c. Security enforcement.
d. Backup and recovery.
e. Concurrency control.
If these responsibilities were not met by a given DBM (and the text points out that sometimes a responsibility is
omitted by design, such as concurrency control on a single-user DBM for a microcomputer), the following
problems can occur, respectively:
a. No DBMS can do without this: if there is no interaction with the file manager, nothing stored in the files can
be retrieved.
b. Consistency constraints may not be satisfied: account balances could go below the minimum allowed,
employees could earn too much overtime (e.g., hours > 80), or airline pilots may fly more hours than allowed by
law.
c. Unauthorized users may access the database, or users authorized to access part of the database may be able
to access parts of the database for which they lack authority. For example, a high school student could get
access to national defense secret codes, or employees could find out what their supervisors earn.
d. Data could be lost permanently, rather than at least being available in a consistent state that existed prior to
a failure.
e. Consistency constraints may be violated despite proper integrity enforcement in each transaction. For
example, incorrect bank balances might be reflected due to simultaneous withdrawals and deposits, and so on.
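The bank-balance problem in point e can be sketched without real threads by writing out one bad interleaving step by step. This is a simplified simulation with hypothetical amounts: in the interleaved run both transactions read the balance before either writes, so one update is lost.

```python
# Simulated schedules (no real threads): T1 withdraws 100, T2 deposits 50.
# Each transaction does read -> compute -> write on a shared balance.

def run_interleaved(balance):
    """Both transactions read before either writes: the classic lost update."""
    t1_read = balance          # T1 reads the balance
    t2_read = balance          # T2 reads the same value (T1 has not written yet)
    balance = t1_read - 100    # T1 writes its result
    balance = t2_read + 50     # T2 overwrites it, so T1's withdrawal is lost
    return balance

def run_serial(balance):
    """With concurrency control (e.g. a lock), T2 starts only after T1 finishes."""
    balance = balance - 100    # T1: read, compute, write
    balance = balance + 50     # T2: reads T1's result
    return balance

print(run_interleaved(1000))  # 1050 (wrong: the withdrawal is lost)
print(run_serial(1000))       # 950  (correct)
```

Each transaction is correct on its own; the inconsistency comes only from the interleaving, which is why concurrency control is a separate responsibility from integrity enforcement.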
EXERCISES:
ER-MODEL
Data model:
The data model describes the structure of a database. It is a collection of conceptual tools for describing data,
data relationships and consistency constraints. The various types of data models are:
• Object based logical models
  • ER-model
  • Functional model
  • Object oriented model
  • Semantic model
• Record based logical models
  • Hierarchical database model
  • Network model
  • Relational model
• Physical model
The entity-relationship data model perceives the real world as consisting of basic objects, called entities and
relationships among these objects. It was developed to facilitate database design by allowing specification of an
enterprise schema, which represents the overall logical structure of a database.
The E-R data model employs three basic notions: entity sets, relationship sets and attributes.
Entity sets:
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects. For example,
each person in an enterprise is an entity. An entity has a set of properties, and the values for some set of
properties may uniquely identify an entity.
For example, BOOK is an entity and its properties (called attributes) are bookcode, booktitle, price, etc.
An entity set is a set of entities of the same type that share the same properties, or attributes. The set of all
persons who are customers at a given bank, for example, can be defined as the entity set customer.
Attributes:
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member
of an entity set.
For example, Customer is an entity and its attributes are customerid, customername, custaddress, etc.
An attribute, as used in the E-R model, can be characterized by the following attribute types.
Derived Attribute:
The values for this type of attribute can be derived from the values of existing attributes.
E.g., age can be derived as (currentdate - birthdate), and experience_in_year can be calculated as
(currentdate - joindate).
Relationship sets:
A relationship is an association among several entities.
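A relationship set is a set of relationships of the same type. Formally, it is a mathematical relation on $n \geq 2$
entity sets. If $E_1, E_2, \ldots, E_n$ are entity sets, then a relationship set $R$ is a subset of their Cartesian product:
$$R \subseteq \{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$$
where each tuple $(e_1, e_2, \ldots, e_n) \in R$ is a relationship.
[Figure: E-R diagram with entity sets customer and loan linked by the relationship set borrow.]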
Consider the two entity sets customer and loan. We define the relationship set borrow to denote the association
between customers and the bank loans that the customers have.
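The definition of a relationship set as a subset of the Cartesian product of entity sets can be checked directly with Python sets. The customer names and loan numbers below are hypothetical.

```python
from itertools import product

# Hypothetical entity sets.
customer = {"Asha", "Ravi", "Meena"}
loan = {"L-15", "L-23"}

# Every possible (customer, loan) pair: the Cartesian product customer x loan.
all_pairs = set(product(customer, loan))

# The relationship set borrow records only the pairs that actually hold.
borrow = {("Asha", "L-15"), ("Ravi", "L-15"), ("Ravi", "L-23")}

# A relationship set R must be a subset of E1 x E2 x ... x En.
is_valid_relationship_set = borrow <= all_pairs
print(len(all_pairs), is_valid_relationship_set)  # 6 True
```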
Participation constraints:
The participation constraints specify whether the existence of any entity depends on its being related to another
entity via the relationship. There are two types of participation constraints:
Total :
When all the entities from an entity set participate in a relationship type, it is called total participation. For
example, the participation of the entity set student in the relationship set ‘opts’ is said to be total because
every enrolled student must opt for a course.
Partial:
When it is not necessary for all the entities from an entity set to participate in a relationship type, it is called partial
participation. For example, the participation of the entity set student in ‘represents’ is partial, since not every
student in a class is a class representative.
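Total versus partial participation can be checked mechanically: participation is total exactly when every entity of the set appears in at least one relationship. A sketch in Python with hypothetical student IDs, courses and class sections:

```python
def participation(entity_set, relationship_set, position):
    """'total' if every entity appears at `position` in some relationship."""
    participating = {rel[position] for rel in relationship_set}
    return "total" if entity_set <= participating else "partial"

students = {"S1", "S2", "S3"}
opts = {("S1", "DBMS"), ("S2", "DSA"), ("S3", "DBMS")}  # every student opts
represents = {("S2", "CSE-A")}                          # one representative only

print(participation(students, opts, 0))        # total
print(participation(students, represents, 0))  # partial
```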
Weak Entity:
Entity types that do not contain any key attribute, and hence cannot be identified independently, are called weak
entity types. A weak entity can be uniquely identified only by considering some of its attributes in
conjunction with the primary key attribute of another entity, which is called the identifying owner entity.
Generally, a partial key is attached to a weak entity type; it is used for the unique identification of weak entities
related to a particular owner entity. The following restrictions must hold:
The owner entity set and the weak entity set must participate in a one-to-many relationship set. This relationship
set is called the identifying relationship set of the weak entity set.
The weak entity set must have total participation in the identifying relationship.
Example:
Consider the entity type dependent, related to the employee entity, which is used to keep track of the dependents of
each employee. The attributes of dependent are: name, birthdate, sex and relationship. Each employee entity
is said to own the dependent entities that are related to it. However, note that the ‘dependent’ entity does not
exist on its own; it depends on the employee entity. In other words, if an employee leaves the organization,
all dependents related to that employee are removed as well, since they cannot exist without the entity ‘employee’.
Thus it is a weak entity.
Keys:
Super key:
A super key is a set of one or more attributes that, taken collectively, allow us to uniquely identify an entity in the
entity set.
For example: {customer-id}, {cname, customer-id}, {cname, telno}.
Candidate key:
In a relation R, a candidate key for R is a subset of the set of attributes of R which has the following properties:
Uniqueness: No two distinct tuples in R have the same values for the candidate key.
Irreducibility: No proper subset of the candidate key has the uniqueness property.
Example: (cname, telno)
Primary key:
The primary key is the candidate key that is chosen by the database designer as the principal means of
identifying entities within an entity set. The remaining candidate keys (if any) are called alternate keys.
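The uniqueness and irreducibility conditions can be tested on a sample relation. A key is a constraint on every legal instance of the relation, so a check like this can only refute a proposed key on the given rows, never prove it. The rows below are hypothetical:

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """attrs is a superkey (on these rows) if no two rows agree on all of attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(projected) == len(set(projected))

def is_candidate_key(rows, attrs):
    """A superkey none of whose proper subsets is a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for k in range(1, len(attrs))
        for subset in combinations(attrs, k)
    )

customers = [
    {"customer_id": 1, "cname": "Asha", "telno": "111"},
    {"customer_id": 2, "cname": "Asha", "telno": "222"},
    {"customer_id": 3, "cname": "Ravi", "telno": "111"},
]

print(is_candidate_key(customers, ["cname", "telno"]))        # True
print(is_candidate_key(customers, ["cname", "customer_id"]))  # False: not irreducible
```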
Advanced ER-diagram:
Abstraction is the simplification mechanism used to hide superfluous details of a set of objects. It allows one to
concentrate on the properties that are of interest to the application.
There are two main abstraction mechanisms used to model information:
Aggregation:
Aggregation is the process of compiling information on an object, thereby abstracting a higher-level object. In
this manner, the entity person is derived by aggregating the characteristics name, address and ssn. Another form
of aggregation is abstracting a relationship between objects and viewing the relationship as an object.
[Figure: aggregation example with the entities Employee, Branch, Job and Manager and the relationships Works-on and Manages]
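ER Diagram for College Database
[Figure: ER diagram of the college database; labels include Student, Course, Head, the relationship opts with 1:1 cardinality markers, and attributes such as rollno, name, address, courseid, cname, duration, dname, sal, relationship and date]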
2. For each weak entity type W in the ER diagram, we create another relation R that contains all simple
attributes of W. If E is the owner entity of W, then the key attribute of E is also included in R and is set as a
foreign key attribute of R. The combination of the primary key attribute of the owner entity type and the
partial key of the weak entity type forms the key of the weak entity type.
GUARDIAN(rollno, name, address, relationship), with primary key (rollno, name)
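One possible SQL rendering of the GUARDIAN mapping, run through Python's built-in sqlite3 module so it executes as-is. The composite primary key follows the rule above; the `ON DELETE CASCADE` clause is an added design choice (not required by the mapping rule) reflecting that a weak entity cannot exist without its owner. Student and guardian values are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE STUDENT (
    rollno INTEGER PRIMARY KEY,
    name   TEXT
);
-- Weak entity: owner's key (rollno) + partial key (name) form the primary key.
CREATE TABLE GUARDIAN (
    rollno       INTEGER REFERENCES STUDENT(rollno) ON DELETE CASCADE,
    name         TEXT,
    address      TEXT,
    relationship TEXT,
    PRIMARY KEY (rollno, name)
);
""")

conn.executemany("INSERT INTO STUDENT VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO GUARDIAN VALUES (?, ?, ?, ?)", [
    (1, "Meena", "Jaipur", "mother"),
    (1, "Raj", "Jaipur", "father"),
    (2, "Raj", "Kota", "father"),  # same partial key 'Raj', different owner
])

conn.execute("DELETE FROM STUDENT WHERE rollno = 1")  # owner removed
remaining = conn.execute("SELECT rollno, name FROM GUARDIAN").fetchall()
print(remaining)  # [(2, 'Raj')]
```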
Binary Relationships:
One-to-one relationship:
For each 1:1 relationship type R in the ER diagram involving two entities E1 and E2, we choose one of the
entities (say E1), preferably the one with total participation, and add the primary key attribute of the other entity (E2)
as a foreign key attribute in the table of E1. We also include all the simple attributes of relationship type R, if
any, in E1. For example, the DEPARTMENT table has been extended to include HEAD_ID and DATE_FROM, an attribute of the
relationship.
DEPARTMENT(D_NO,D_NAME,HEAD_ID,DATE_FROM)
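A sketch of the 1:1 mapping in SQLite via Python's sqlite3. The FACULTY table as the target of HEAD_ID is an assumption for illustration; the `UNIQUE` constraint on the foreign key is what keeps the relationship one-to-one, since without it one person could head several departments:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE FACULTY (
    ID   INTEGER PRIMARY KEY,
    NAME TEXT
);
-- 1:1 'heads': the FK sits on DEPARTMENT (the total-participation side);
-- UNIQUE stops one faculty member from heading two departments.
CREATE TABLE DEPARTMENT (
    D_NO      INTEGER PRIMARY KEY,
    D_NAME    TEXT,
    HEAD_ID   INTEGER NOT NULL UNIQUE REFERENCES FACULTY(ID),
    DATE_FROM TEXT   -- attribute of the relationship itself
);
""")
conn.executemany("INSERT INTO FACULTY VALUES (?, ?)", [(10, "Rao"), (11, "Sen")])
conn.execute("INSERT INTO DEPARTMENT VALUES (1, 'CSE', 10, '2021-07-01')")

try:
    # A second department headed by the same person breaks the 1:1 rule.
    conn.execute("INSERT INTO DEPARTMENT VALUES (2, 'ECE', 10, '2022-01-01')")
    duplicate_head_rejected = False
except sqlite3.IntegrityError:
    duplicate_head_rejected = True

print(duplicate_head_rejected)  # True
```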
One-to-many relationship:
For each 1:n relationship type R involving two entities E1 and E2, we identify the entity type (say E1) on the n-side
of the relationship type R and include the primary key of the entity on the other side of the relationship (say E2) as
a foreign key attribute in the table of E1. We also include all simple attributes (or simple components of composite
attributes) of R, if any, in the table of E1.
For example:
Consider the works-in relationship between DEPARTMENT and FACULTY. For this relationship, choose the entity
on the N side, i.e., FACULTY, and add the primary key attribute of the other entity DEPARTMENT, i.e., DNO, as a foreign
key attribute in FACULTY.
Many-to-many relationship:
For each m:n relationship type R, we create a new table (say S) to represent R. We also include the primary key
attributes of both participating entity types as foreign key attributes in S. Any simple attributes of the m:n
relationship type (or simple components of a composite attribute) are also included as attributes of S. For
example:
The M:N relationship taught-by between the entities COURSE and FACULTY should be represented as a new table.
The structure of the table will include the primary key of COURSE and the primary key of FACULTY.
TAUGHT-BY(ID (primary key of FACULTY table), course-id (primary key of COURSE table))
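A sketch of the M:N mapping in SQLite via Python's sqlite3. Hyphens are not allowed in unquoted SQL identifiers, so the table is written TAUGHT_BY and the key column COURSE_ID; the FACULTY and COURSE rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE FACULTY (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE COURSE  (COURSE_ID TEXT PRIMARY KEY, TITLE TEXT);
-- The M:N relationship becomes its own table; its key is the pair of FKs.
CREATE TABLE TAUGHT_BY (
    ID        INTEGER REFERENCES FACULTY(ID),
    COURSE_ID TEXT    REFERENCES COURSE(COURSE_ID),
    PRIMARY KEY (ID, COURSE_ID)
);
""")
conn.executemany("INSERT INTO FACULTY VALUES (?, ?)", [(1, "Rao"), (2, "Sen")])
conn.executemany("INSERT INTO COURSE VALUES (?, ?)", [("C1", "DBMS"), ("C2", "DSA")])
# One faculty member teaches many courses; one course has many teachers.
conn.executemany("INSERT INTO TAUGHT_BY VALUES (?, ?)",
                 [(1, "C1"), (1, "C2"), (2, "C1")])

teachers_of_c1 = conn.execute("""
    SELECT F.NAME FROM FACULTY F JOIN TAUGHT_BY T ON F.ID = T.ID
    WHERE T.COURSE_ID = 'C1' ORDER BY F.NAME
""").fetchall()
print(teachers_of_c1)  # [('Rao',), ('Sen',)]
```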
N-ary relationship:
For each n-ary relationship type R, where n > 2, we create a new table S to represent R. We include as foreign
key attributes in S the primary keys of the relations that represent the participating entity types. We also include
any simple attributes of the n-ary relationship type (or simple components of composite attributes) as attributes of
S. The primary key of S is usually a combination of all the foreign keys that reference the relations representing
the participating entity types.
Multi-valued attributes:
For each multivalued attribute A, we create a new relation R that includes an attribute corresponding to A, plus
the primary key attributes K of the relation that represents the entity type or relationship type that has A as an attribute.
The primary key of R is then the combination of A and K.
For example, if a STUDENT entity has rollno, name and phone number, where phone number is a multivalued
attribute, then we create the table PHONE(rollno, phoneno), whose primary key is the combination (rollno, phoneno). The
STUDENT table then need not have a phone number column; it can simply be STUDENT(rollno, name).
PHONE(rollno,phoneno)
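A sketch of the PHONE table in SQLite via Python's sqlite3; the student and phone numbers are made-up values. Each phone number is a separate row, and the composite key (rollno, phoneno) prevents the same number being recorded twice for one student:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (rollno INTEGER PRIMARY KEY, name TEXT);
-- One row per (student, phone number); the pair is the key.
CREATE TABLE PHONE (
    rollno  INTEGER REFERENCES STUDENT(rollno),
    phoneno TEXT,
    PRIMARY KEY (rollno, phoneno)
);
""")
conn.execute("INSERT INTO STUDENT VALUES (1, 'Asha')")
conn.executemany("INSERT INTO PHONE VALUES (?, ?)",
                 [(1, "9000000001"), (1, "9000000002")])

phones = [p for (p,) in conn.execute(
    "SELECT phoneno FROM PHONE WHERE rollno = 1 ORDER BY phoneno")]
print(phones)  # ['9000000001', '9000000002']
```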
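[Figure: generalization and specialization: the entity Account (attributes account number, name, branch) is specialized through an IS-A relationship into Saving (attribute interest) and Current (attribute charges)]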
Hierarchical Model:
A hierarchical database consists of a collection of records which are connected to one another through links.
A record is a collection of fields, each of which contains only one data value.
A link is an association between precisely two records.
The hierarchical model differs from the network model in that the records are organized as collections of
trees rather than as arbitrary graphs.
Tree-Structure Diagrams:
The schema for a hierarchical database consists of
o boxes, which correspond to record types
o lines, which correspond to links
Record types are organized in the form of a rooted tree.
o No cycles in the underlying graph.
o Relationships formed in the graph must be such that only
one-to-many or one-to-one relationships exist between a parent and a child.
Single Relationships:
Example: an E-R diagram with two entity sets, customer and account, related through a binary, one-to-many
relationship depositor.
Corresponding tree-structure diagram has
o the record type customer with three fields: customer-name, customer-street, and customer-city.
o the record type account with two fields: account-number and balance
o the link depositor, with an arrow pointing to customer
If the relationship depositor is one to one, then the link depositor has two arrows.
Only one-to-many and one-to-one relationships can be directly represented in the hierarchical model.
For a many-to-many relationship, we must consider the type of queries expected and the degree to which the database
schema fits the given E-R diagram.
In all versions of this transformation, the underlying database tree (or trees) will have replicated records.
Create two tree-structure diagrams, T1, with the root customer, and T2, with the root account.
In T1, create depositor, a many-to-one link from account to customer.
In T2, create account-customer, a many-to-one link from customer to account.
Virtual Records:
For many-to-many relationships, record replication is necessary to preserve the tree-structure organization
of the database.
o Data inconsistency may result when updating takes place
o Waste of space is unavoidable
Virtual record — contains no data value, only a logical pointer to a particular physical record.
When a record is to be replicated in several database trees, a single copy of that record is kept in one of the
trees and all other records are replaced with a virtual record.
Let R be a record type that is replicated in T1, T2, . . ., Tn. Create a new virtual record type virtual-R and
replace R in each of the n – 1 trees with a record of type virtual-R.
To eliminate data replication in the diagram shown on page B.11, create virtual-customer and virtual-account.
Replace account with virtual-account in the first tree, and replace customer with
virtual-customer in the second tree.
Add a dashed line from virtual-customer to customer, and from virtual-account to account, to specify the
association between a virtual record and its corresponding physical record.
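A rough Python model of the idea, assuming a tree node is just a record plus a list of children (the class names are hypothetical). Each physical record is stored once; the other tree holds a virtual record that only points to it, so an update made through the physical record is visible from both trees:

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """A physical record: holds the actual field values."""
    data: dict

@dataclass
class VirtualRecord:
    """Holds no data values, only a logical pointer to a physical record."""
    target: Record

    def get(self, name):
        return self.target.data[name]  # follow the pointer

@dataclass
class Node:
    """A node of a database tree: a record plus its child nodes."""
    record: object
    children: list = field(default_factory=list)

jones = Record({"customer-name": "Jones", "customer-city": "Harrison"})
a101 = Record({"account-number": "A-101", "balance": 500})

t1 = Node(jones, [Node(VirtualRecord(a101))])  # root customer, virtual-account child
t2 = Node(a101, [Node(VirtualRecord(jones))])  # root account, virtual-customer child

# The single physical copy is updated; the virtual record in T2 sees the change.
jones.data["customer-city"] = "Rye"
print(t2.children[0].record.get("customer-city"))  # Rye
```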
Network Model:
Data are represented by collections of records.
o similar to an entity in the E-R model
o Records and their fields are represented as record type
type customer = record
    customer-name: string;
    customer-street: string;
    customer-city: string;
type account = record
    account-number: integer;
    balance: integer;
Relationships among data are represented by links
o similar to a restricted (binary) form of an E-R relationship
o restrictions on links depend on whether the relationship is many-many, many-to-one, or one-to-one.
Data-Structure Diagrams:
Schema representing the design of a network database.
A data-structure diagram consists of two basic components:
o Boxes, which correspond to record types.
o Lines, which correspond to links.
Specifies the overall logical structure of the database.
Since a link cannot contain any data value, an E-R relationship with attributes is represented with a new record type
and links.
To represent an E-R relationship of degree 3 or higher, connect the participating record types through a new
record type that is linked directly to each of the original record types. For example, for a ternary relationship
among account, customer and branch:
1. Replace entity sets account, customer, and branch with record types account, customer, and branch,
respectively.
2. Create a new record type Rlink (referred to as a dummy record type).
3. Create the following many-to-one links:
o CustRlnk from Rlink record type to customer record type
o AcctRlnk from Rlink record type to account record type
o BrncRlnk from Rlink record type to branch record type
The DBTG CODASYL Model:
o All links are treated as many-to-one relationships.
o To model many-to-many relationships, a record type is defined to represent the relationship and two links are
used.
DBTG Sets:
o The structure consisting of two record types that are linked together is referred to in the DBTG model as a
DBTG set.
o In each DBTG set, one record type is designated as the owner, and the other is designated as the member, of
the set.
o Each DBTG set can have any number of set occurrences (actual instances of linked records).
o Since many-to-many links are disallowed, each set occurrence has precisely one owner, and has zero or more
member records.
o No member record of a set can participate in more than one occurrence of the set at any point.
o A member record can participate simultaneously in several set occurrences of
different DBTG set.
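A small Python sketch of the DBTG set rules, using made-up class and record names. It models the many-to-many customer/account case with a link record type and two DBTG sets, and rejects a member that is already in an occurrence of the same set; each link record is a member of one occurrence in each of two different sets:

```python
class DBTGSet:
    """One DBTG set type: owner record type -> member record type (many-to-one)."""

    def __init__(self, name):
        self.name = name
        self.occurrences = {}  # owner key -> list of member keys
        self.owner_of = {}     # member key -> its single owner in this set

    def connect(self, owner, member):
        # A member may belong to at most one occurrence of this set at a time.
        if member in self.owner_of:
            raise ValueError(f"{member} already in an occurrence of {self.name}")
        self.occurrences.setdefault(owner, []).append(member)
        self.owner_of[member] = owner

    def members(self, owner):
        return self.occurrences.get(owner, [])

cust_link = DBTGSet("customer-link")  # owner: customer, member: link record
acct_link = DBTGSet("account-link")   # owner: account,  member: link record

for link_id, customer, account in [("r1", "Jones", "A-101"),
                                   ("r2", "Jones", "A-102"),
                                   ("r3", "Smith", "A-101")]:
    cust_link.connect(customer, link_id)
    acct_link.connect(account, link_id)

accounts_of_jones = [acct_link.owner_of[r] for r in cust_link.members("Jones")]
print(accounts_of_jones)  # ['A-101', 'A-102']
```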
UNIT-II: Relational Algebra
Relational algebra is a procedural query language which takes relations as input and generates a relation as
output. Relational algebra mainly provides the theoretical foundation for relational databases and SQL.
Being procedural means that a query states both what data is to be retrieved and how it is to be
retrieved.
Relational algebra works on the whole table at once, so we do not have to use loops etc. to iterate over all the
rows (tuples) of data one by one.
All we have to do is specify the table name from which we need the data, and in a single line of command,
relational algebra will traverse the entire given table to fetch the data.
Basic/Fundamental Operations:
1. Select (σ)
2. Project (∏)
3. Union (∪)
4. Set Difference (-)
5. Cartesian product (X)
6. Rename (ρ)
Select Operation (σ): This is used to fetch rows (tuples) from a table (relation) which satisfy a given condition.
Syntax: $\sigma_{p}(r)$
σ is the selection operator
r stands for the relation, which is the name of the table
p is the selection predicate, a propositional logic formula. Example: $\sigma_{age > 17}(Student)$
This will fetch the tuples (rows) from table Student for which age is greater than 17.
$\sigma_{age > 17 \wedge gender = 'Male'}(Student)$
This will return tuples (rows) from table Student with information of male students older than 17.
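Selection can be sketched in Python as a filter over the tuples of a relation. A list of dicts stands in for the relation here (a true relation is a set), and the Student rows are hypothetical:

```python
def select(predicate, relation):
    """σ_p(r): keep the tuples of r that satisfy predicate p."""
    return [t for t in relation if predicate(t)]

student = [
    {"name": "Amit", "age": 18, "gender": "Male"},
    {"name": "Neha", "age": 19, "gender": "Female"},
    {"name": "Ravi", "age": 16, "gender": "Male"},
]

older = select(lambda t: t["age"] > 17, student)
male_older = select(lambda t: t["age"] > 17 and t["gender"] == "Male", student)
print([t["name"] for t in older])       # ['Amit', 'Neha']
print([t["name"] for t in male_older])  # ['Amit']
```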
Input:
σ BRANCH_NAME="perryride" (LOAN)
Output:
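Project Operation (∏): This is used to fetch only the specified columns (attributes) of a relation; duplicate rows are
eliminated from the result.
Input: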
∏ NAME, CITY (CUSTOMER)
Output:
NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
Union Operation (∪): This operation is used to fetch data from two relations (tables) or temporary relations (results of
other operations). For this operation to work, the relations (tables) specified should have the same number of
attributes (columns) and the same attribute domains. Duplicate tuples are automatically eliminated
from the result.
Syntax: A ∪ B
∏Student(RegularClass) ∪ ∏Student(ExtraClass)
Example:
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
Input:
∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
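The union above, and the related set difference and intersection, can be reproduced with Python sets on the DEPOSITOR and BORROW data; using a set for the projection is what removes duplicates such as the two Smith loans:

```python
depositor = [("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"),
             ("Turner", "A-176"), ("Johnson", "A-273"), ("Jones", "A-472"),
             ("Lindsay", "A-284")]
borrow = [("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
          ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
          ("Williams", "L-17")]

def project_name(relation):
    """∏ CUSTOMER_NAME: keep the first column; a set drops duplicates."""
    return {name for name, _ in relation}

union = project_name(borrow) | project_name(depositor)
difference = project_name(borrow) - project_name(depositor)
intersection = project_name(borrow) & project_name(depositor)

print(len(union))            # 10
print(sorted(difference))    # ['Curry', 'Hayes', 'Jackson', 'Williams']
print(sorted(intersection))  # ['Jones', 'Smith']
```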
Set Difference (-):
This operation is used to find data present in one relation and not present in the second relation. Like the
Union operation, it is applied to two relations that have the same number of attributes and the same attribute domains.
Syntax: A - B
where A and B are relations.
For example, if we want to find the names of students who attend the regular class but not the extra class,
then we can use the below operation:
∏Student(RegularClass) - ∏Student(ExtraClass)
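Set Intersection (∩): This operation returns the tuples present in both relations. For example,
∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR) returns:
CUSTOMER_NAME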
Smith
Jones
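Cartesian Product (X): This operation combines each row of one table with each row of the other table.
EMPLOYEE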
EMP_ID EMP_NAME EMP_DEPT
1 Smith A
2 Harry C
3 John B
DEPARTMENT
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal
Input:
EMPLOYEE X DEPARTMENT
Output:
EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME
1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
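The same product in Python with itertools.product. Applying a selection to the product, keeping rows where EMP_DEPT equals DEPT_NO, yields the join of the two tables, which is the idea behind the join operation:

```python
from itertools import product

employee = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
department = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# EMPLOYEE X DEPARTMENT: every employee row paired with every department row.
cross = [e + d for e, d in product(employee, department)]

# σ EMP_DEPT = DEPT_NO applied to the product: a join.
matched = [row for row in cross if row[2] == row[3]]

print(len(cross))  # 9
print(cross[0])    # (1, 'Smith', 'A', 'A', 'Marketing')
print(matched)     # [(1, 'Smith', 'A', 'A', 'Marketing'), (2, 'Harry', 'C', 'C', 'Legal'), (3, 'John', 'B', 'B', 'Sales')]
```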
Rename Operation (ρ): The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)
Join in DBMS:
A JOIN clause is used to combine rows from two or more tables, based on a related column
between them.
Join in DBMS is a binary operation which allows you to combine a Cartesian product and a selection in
one single statement.
The goal of creating a join condition is to combine the data from two or more
DBMS tables.
The tables in DBMS are associated using primary keys and foreign keys.
Types of SQL JOIN:
1. INNER JOIN
2. LEFT JOIN
3. RIGHT JOIN
4. FULL JOIN
Table name: EMPLOYEE
Table name: PROJECT
PROJECT_NO EMP_ID DEPARTMENT
101 1 Testing
102 2 Development
103 3 Designing
104 4 Development
1. INNER JOIN
In SQL, INNER JOIN selects records that have matching values in both tables as long as the condition is
satisfied.
It returns the combination of all rows from both the tables where the condition satisfies.
Syntax
SELECT table1.column1, table1.column2 FROM table1 INNER JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT FROM EMPLOYEE INNER JOIN
PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
2. LEFT JOIN
The SQL left join returns all the values from the left table and the matching values from the right table. If
there is no matching join value, it will return NULL.
Syntax
SELECT table1.column1, table1.column2 FROM table1 LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT FROM EMPLOYEE LEFT JOIN
PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Marry NULL
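The INNER and LEFT JOIN queries can be run in SQLite via Python's sqlite3. The EMPLOYEE table itself is not shown in these notes: IDs 1 to 4 are inferred from the INNER JOIN output, while IDs 5 and 6 for Russell and Marry are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, EMP_NAME TEXT);
CREATE TABLE PROJECT (PROJECT_NO INTEGER, EMP_ID INTEGER, DEPARTMENT TEXT);
""")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [(1, "Angelina"), (2, "Robert"), (3, "Christian"),
                  (4, "Kristen"), (5, "Russell"), (6, "Marry")])
conn.executemany("INSERT INTO PROJECT VALUES (?, ?, ?)",
                 [(101, 1, "Testing"), (102, 2, "Development"),
                  (103, 3, "Designing"), (104, 4, "Development")])

inner = conn.execute("""
    SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
    FROM EMPLOYEE INNER JOIN PROJECT ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID
    ORDER BY EMPLOYEE.EMP_ID
""").fetchall()
left = conn.execute("""
    SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
    FROM EMPLOYEE LEFT JOIN PROJECT ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID
    ORDER BY EMPLOYEE.EMP_ID
""").fetchall()

print(len(inner))  # 4
print(left[-2:])   # [('Russell', None), ('Marry', None)]  (SQL NULL -> None)
```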
3.RIGHT JOIN
In SQL, RIGHT JOIN returns all the rows of the right table and the matched values
from the left table. If there is no match in the left table, it returns NULL for the left table's columns.
Syntax
SELECT table1.column1, table1.column2 FROM table1 RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT FROM EMPLOYEE RIGHT JOIN
PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
4. FULL JOIN
In SQL, FULL JOIN is the combination of a left and a right outer join. The joined table contains
all the records from both tables and puts NULL in place of matches that are not found.
Syntax
SELECT table1.column1, table1.column2 FROM table1 FULL JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT FROM EMPLOYEE
FULL JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Marry NULL
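Older SQLite versions (before 3.39) do not support FULL JOIN, so a common workaround is a LEFT JOIN plus the right-table rows that have no match. A sketch with an extra hypothetical project row (EMP_ID 7, 'Support') so that the right side has an unmatched row; the IDs 5 and 6 for Russell and Marry are assumed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, EMP_NAME TEXT);
CREATE TABLE PROJECT (PROJECT_NO INTEGER, EMP_ID INTEGER, DEPARTMENT TEXT);
INSERT INTO EMPLOYEE VALUES (1,'Angelina'),(2,'Robert'),(3,'Christian'),
                            (4,'Kristen'),(5,'Russell'),(6,'Marry');
INSERT INTO PROJECT VALUES (101,1,'Testing'),(102,2,'Development'),
                           (103,3,'Designing'),(104,4,'Development'),
                           (105,7,'Support');  -- hypothetical unmatched row
""")

# FULL JOIN = LEFT JOIN, plus the right-table rows that found no match.
full = conn.execute("""
    SELECT E.EMP_NAME, P.DEPARTMENT
    FROM EMPLOYEE E LEFT JOIN PROJECT P ON P.EMP_ID = E.EMP_ID
    UNION ALL
    SELECT NULL, P.DEPARTMENT
    FROM PROJECT P
    WHERE NOT EXISTS (SELECT 1 FROM EMPLOYEE E WHERE E.EMP_ID = P.EMP_ID)
""").fetchall()

print(len(full))                  # 7
print((None, "Support") in full)  # True
```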
Division Operator in SQL
Division Operator (÷): The division operator A÷B can be applied if and only if:
The attributes of B are a proper subset of the attributes of A.
The relation returned by the division operator will have attributes = (all attributes of A - all
attributes of B).
The relation returned by the division operator will contain those tuples of A (restricted to these attributes) which are
associated with every tuple of B.
The division operator is used to evaluate queries which contain the keyword ALL.
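A direct Python sketch of R ÷ S for R(x, y) and S(y): keep each x whose set of paired y values covers all of S. The Course_Taken table in these notes is cut short, so the rows below are hypothetical:

```python
def divide(r, s):
    """R ÷ S for R(x, y) and S(y): the x values paired with every y in S."""
    s_values = set(s)
    candidates = {x for x, _ in r}
    return {x for x in candidates
            if s_values <= {y for x2, y in r if x2 == x}}

course_taken = {("Robert", "Databases"), ("Robert", "Programming Languages"),
                ("David", "Databases"), ("Hannah", "Programming Languages")}
course_required = {"Databases", "Programming Languages"}

print(divide(course_taken, course_required))  # {'Robert'}
```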
Table 1: Course_Taken → It consists of the names of Students against the courses that they have
taken.
Student_Name Course
Robert Databases
David Databases
Table 2: Course_Required → It consists of the courses required to graduate.
Course
Databases
Programming Languages
Find all the students. Create a set of all students that have taken courses. This can be done easily using the
following command.
CREATE TABLE AllStudents AS SELECT DISTINCT Student_Name FROM Course_Taken;
This command will return the table AllStudents as the result set:
Student_name
Robert
David
Hannah
Tom
Find all the students and the courses required to graduate
Next, we will create a set of students and the courses they need to graduate. We can express this in the
form of Cartesian Product of AllStudents and Course_Required using the following command.
CREATE TABLE StudentsAndRequired AS
SELECT AllStudents.Student_Name, Course_Required.Course FROM AllStudents,
Course_Required;
Now, the new result set, table StudentsAndRequired, will be:
Student_Name Course
Robert Databases
David Databases
Hannah Databases
Tom Databases
Student
First_Name Last_Name Age
Ajeet Singh 30
Chaitanya Singh 31
Rajeev Bhatia 27
Carl Pratap 28
Let's write relational calculus queries.
Query to display the last name of those students where age is greater than 30
{ t.Last_Name | Student(t) AND t.age > 30 }
In the above query you can see two parts separated by the | symbol. In the second part we define the
condition, and in the first part we specify the fields which we want to display for the selected tuples.
The result of the above query would be:
Last_Name
Singh
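Tuple relational calculus reads almost directly as a Python set comprehension: t ranges over Student, the condition filters, and the part before | picks the output field. The column name First_Name is an assumption, since the table header is missing from the notes:

```python
from collections import namedtuple

Student = namedtuple("Student", ["First_Name", "Last_Name", "age"])
student = [Student("Ajeet", "Singh", 30), Student("Chaitanya", "Singh", 31),
           Student("Rajeev", "Bhatia", 27), Student("Carl", "Pratap", 28)]

# { t.Last_Name | Student(t) AND t.age > 30 }
result = {t.Last_Name for t in student if t.age > 30}
print(result)  # {'Singh'}
```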
Ex:
Table-1: Customer
Customer name Street City
Saurabh A7 Patiala
Mehak B6 Jalandhar
Sumiti D9 Ludhiana
Ria A5 Patiala
Table-2: Branch
Branch name Branch city
ABC Patiala
DEF Ludhiana
GHI Jalandhar
Table-3: Account
Account number Branch name Balance
1111 ABC 50000
1112 DEF 10000
1113 GHI 9000
1114 ABC 7000
Table-4: Loan
Loan number Branch name Amount
L33 ABC 10000
L35 DEF 15000
L49 GHI 9000
L98 DEF 65000
Table-5: Borrower
Customer name Loan number
Saurabh L33
Mehak L49
Ria L98
Table-6: Depositor
Customer name Account number
Saurabh 1111
Mehak 1113
Sumiti 1114
Query-1: Find the loan number, branch and amount of loans with an amount greater than or equal to 10000.
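One way to express Query-1 in tuple relational calculus is { t | Loan(t) AND t.amount >= 10000 }. A Python sketch evaluating it over the Loan table above (field names are adapted to Python identifiers):

```python
from collections import namedtuple

Loan = namedtuple("Loan", ["loan_number", "branch_name", "amount"])
loan = [Loan("L33", "ABC", 10000), Loan("L35", "DEF", 15000),
        Loan("L49", "GHI", 9000), Loan("L98", "DEF", 65000)]

# { t | Loan(t) AND t.amount >= 10000 }
result = [t for t in loan if t.amount >= 10000]
print([t.loan_number for t in result])  # ['L33', 'L35', 'L98']
```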
The SELECT statement is used to fetch data from a database. The data returned is stored in a result table, called the
result-set. To fetch the entire table or all the fields in the table:
SELECT * FROM table_name;
To fetch individual column data:
SELECT column1, column2 FROM table_name;
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
The following code is an example which would fetch the ID, Name and Salary fields from the
CUSTOMERS table, where the salary is greater than 2000:
SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000;
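The same query run in SQLite via Python's sqlite3, with the CUSTOMERS rows from the table above; ORDER BY is added only so the output order is predictable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, "
             "AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00), (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00), (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00), (6, "Komal", 22, "MP", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
])

rows = conn.execute(
    "SELECT ID, NAME, SALARY FROM CUSTOMERS WHERE SALARY > 2000 ORDER BY ID"
).fetchall()
for row in rows:
    print(row)
# (4, 'Chaitali', 6500.0)
# (5, 'Hardik', 8500.0)
# (6, 'Komal', 4500.0)
# (7, 'Muffy', 10000.0)
```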
From clause:
From clause can be used to specify a sub-query expression in SQL. The relation produced by the
sub-query is then used as a new relation on which the outer query is applied.
Sub queries in the from clause are supported by most of the SQL implementations.
The correlation variables from the relations in from clause cannot be used in the sub-queries
in the from clause.
Syntax:
SELECT column1, column2 FROM
(SELECT column_x AS C1, column_y FROM table WHERE PREDICATE_X) AS table2
WHERE PREDICATE;
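A concrete instance of the syntax, as a sketch in SQLite via Python's sqlite3 on the CUSTOMERS data shown earlier; the inner query and the alias `young` are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, "
             "AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00), (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00), (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00), (6, "Komal", 22, "MP", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
])

# The sub-query in FROM builds a temporary relation 'young' (renaming SALARY
# to SAL); the outer query then filters that relation.
rows = conn.execute("""
    SELECT NAME, SAL
    FROM (SELECT NAME, SALARY AS SAL FROM CUSTOMERS WHERE AGE < 25) AS young
    WHERE SAL > 2000
    ORDER BY NAME
""").fetchall()
print(rows)  # [('Komal', 4500.0), ('Muffy', 10000.0)]
```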
SET Operations
SQL supports a few set operations which can be performed on table data. These are used to get
meaningful results from data stored in tables under different conditions.
The following 4 types of SET operations are covered here, along with examples:
1. UNION
2. UNION ALL
3. INTERSECT
4. MINUS
1. Union
o The SQL UNION operation is used to combine the results of two or more SQL SELECT
queries.
o In the UNION operation, the number of columns and their data types must be the same in both
queries on which UNION is applied.
o The UNION operation eliminates duplicate rows from its result set.
Syntax
SELECT column_name FROM table1 UNION
SELECT column_name FROM table2;
The First table
ID NAME
1 Jack
2 Harry
3 Jackson
The Second table
ID NAME
3 Jackson
4 Stephan
5 David
Union SQL query will be:
SELECT * FROM First UNION
SELECT * FROM Second;
The resultset table will look like:
ID NAME
1 Jack
2 Harry
3 Jackson
4 Stephan
5 David
2. Union All
UNION ALL works like UNION, except that it returns all rows without removing duplicates
(and without sorting the data).
Syntax:
SELECT column_name FROM table1 UNION ALL
SELECT column_name FROM table2;
Example: Using the above First and Second table. Union All query will be like:
SELECT * FROM First UNION ALL
SELECT * FROM Second;
ID NAME
1 Jack
2 Harry
3 Jackson
3 Jackson
4 Stephan
5 David
3. Intersect
o It is used to combine two SELECT statements. The INTERSECT operation returns the common
rows from both SELECT statements.
o In the INTERSECT operation, the number of columns and their data types must be the same.
o It removes duplicates. The rows often come back in ascending order, but SQL only guarantees an order when ORDER BY is used.
Syntax
SELECT column_name FROM table1 INTERSECT
SELECT column_name FROM table2;
Example:
Using the above First and Second tables, the Intersect query will be:
SELECT * FROM First INTERSECT
SELECT * FROM Second;
ID NAME
3 Jackson
4. Minus
o It combines the results of two SELECT statements. The MINUS operator is used to display the rows
which are present in the first query but absent in the second query. MINUS is Oracle's keyword; standard SQL and
most other systems use EXCEPT.
o It removes duplicates; as with INTERSECT, an ascending order is only guaranteed with ORDER BY.
Syntax:
SELECT column_name FROM table1 MINUS
SELECT column_name FROM table2;
Example:
Using the above First and Second table, Minus query will be:
SELECT * FROM First MINUS
SELECT * FROM Second;
ID NAME
1 Jack
2 Harry
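All four operations can be tried on the First and Second tables in SQLite via Python's sqlite3. SQLite has no MINUS keyword, so EXCEPT is used; the table names are double-quoted because FIRST is a keyword in some SQL dialects:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "First"  (ID INTEGER, NAME TEXT);
CREATE TABLE "Second" (ID INTEGER, NAME TEXT);
INSERT INTO "First"  VALUES (1,'Jack'),(2,'Harry'),(3,'Jackson');
INSERT INTO "Second" VALUES (3,'Jackson'),(4,'Stephan'),(5,'David');
""")

def run(sql):
    return conn.execute(sql).fetchall()

union = run('SELECT * FROM "First" UNION SELECT * FROM "Second" ORDER BY ID')
union_all = run('SELECT * FROM "First" UNION ALL SELECT * FROM "Second" ORDER BY ID')
intersect = run('SELECT * FROM "First" INTERSECT SELECT * FROM "Second" ORDER BY ID')
minus = run('SELECT * FROM "First" EXCEPT SELECT * FROM "Second" ORDER BY ID')

print(len(union), len(union_all))  # 5 6  (UNION ALL keeps both Jackson rows)
print(intersect)                   # [(3, 'Jackson')]
print(minus)                       # [(1, 'Jack'), (2, 'Harry')]
```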
Aggregate functions in SQL
o An SQL aggregate function performs a calculation on multiple rows of a single
column of a table and returns a single value.
o It is also used to summarize the data.
Aggregate Functions
1) Count()
2) Sum()
3) Avg()
4) Min()
5) Max()
1. COUNT FUNCTION
o The COUNT function is used to count the number of rows in a database table. It can work on
both numeric and non-numeric data types.
o COUNT(*) returns the count of all the rows in a specified table, including duplicate rows
and rows containing NULLs.
Count(*): Returns total number of records
PRODUCT_MAST
Example: COUNT()
SELECT COUNT(*) FROM PRODUCT_MAST;
Output: 10
Output:7
Output:3
2. SUM Function
Sum function is used to calculate the sum of all selected columns. It works on numeric fields only.
Syntax:
SUM()
or
SUM( [ALL|DISTINCT] expression )
Example: SUM()
SELECT SUM(COST) FROM PRODUCT_MAST;
Output:670
3. AVG function
The AVG function is used to calculate the average value of a numeric column. The AVG function returns
the average of all non-NULL values.
Syntax
AVG()
Example:
SELECT AVG(COST) FROM PRODUCT_MAST;
Output:
67.00
4. MAX Function
MAX function is used to find the maximum value of a certain column. This function determines the
largest value of all selected values of a column.
Syntax: MAX()
Example:
SELECT MAX(RATE) FROM PRODUCT_MAST;
Output:30
5. MIN Function
MIN function is used to find the minimum value of a certain column. This function determines the
smallest value of all selected values of a column.
Syntax: MIN()
Output:10
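The PRODUCT_MAST rows are not reproduced in these notes, so this sketch uses a hypothetical PRODUCT_SAMPLE table in SQLite. It shows the NULL behaviour described above: COUNT(*) counts every row, while COUNT(COST), SUM, AVG, MIN and MAX skip the NULL cost:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT_SAMPLE (ITEM TEXT, COMPANY TEXT, COST INTEGER)")
conn.executemany("INSERT INTO PRODUCT_SAMPLE VALUES (?, ?, ?)", [
    ("Item1", "Com1", 150),
    ("Item2", "Com2", 120),
    ("Item3", "Com1", None),  # cost unknown: stored as NULL
    ("Item4", "Com3", 90),
])

(count_all, count_cost, count_companies,
 total, average, min_cost, max_cost) = conn.execute("""
    SELECT COUNT(*), COUNT(COST), COUNT(DISTINCT COMPANY),
           SUM(COST), AVG(COST), MIN(COST), MAX(COST)
    FROM PRODUCT_SAMPLE
""").fetchone()

print(count_all, count_cost, count_companies)  # 4 3 3
print(total, average, min_cost, max_cost)      # 360 120.0 90 150
```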
GROUP BY Statement
The GROUP BY statement groups rows that have the same values into summary rows, like "find the
number of customers in each country".
The GROUP BY statement is often used with aggregate functions (COUNT, MAX, MIN, SUM,
AVG) to group the result-set by one or more columns.
Syntax:
SELECT column1, function_name(column2) FROM table_name
WHERE condition
GROUP BY column1, column2
ORDER BY column1, column2;
Example:
Group By single column: Group By single column means, to place all the rows with same
value of only that particular column in one group. Consider the query as shown below:
SELECT NAME, SUM(SALARY) FROM Employee
GROUP BY NAME;
The above query will produce the below output:
Group By multiple columns: Grouping by multiple columns, for example GROUP BY
column1, column2, means placing all the rows with the same values of both columns
column1 and column2 in one group. Consider the below query:
SELECT SUBJECT, YEAR, Count(*)
FROM Student
GROUP BY SUBJECT, YEAR;
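A sketch of grouping by two columns in SQLite via Python's sqlite3, with hypothetical Student rows; each distinct (SUBJECT, YEAR) pair forms one group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (NAME TEXT, SUBJECT TEXT, YEAR INTEGER)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", [
    ("Asha", "English", 1), ("Ravi", "English", 1),
    ("Neha", "English", 2), ("Amit", "Maths", 1),
])

per_group = conn.execute("""
    SELECT SUBJECT, YEAR, COUNT(*)
    FROM Student
    GROUP BY SUBJECT, YEAR
    ORDER BY SUBJECT, YEAR
""").fetchall()
print(per_group)  # [('English', 1, 2), ('English', 2, 1), ('Maths', 1, 1)]
```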
HAVING Clause:
We know that the WHERE clause is used to place conditions on columns, but what if we want to place
conditions on groups?
This is where the HAVING clause comes into use. We can use the HAVING clause to place conditions that
decide which groups will be part of the final result-set. Also, we cannot use aggregate functions
like SUM(), COUNT() etc. with the WHERE clause, so we have to use the HAVING clause if we want
to use any of these functions in the conditions.
Syntax:
SELECT column1, function_name(column2) FROM table_name
WHERE condition
GROUP BY column1, column2
HAVING condition
ORDER BY column1, column2;
function_name: Name of the function used for example, SUM() , AVG().
table_name: Name of the table.
condition: Condition used.
Example:
SELECT NAME, SUM(SALARY) FROM Employee
GROUP BY NAME
HAVING SUM(SALARY)>3000;
Example:
Consider the CUSTOMERS table having the following records.
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which displays each age that occurs in two or more records, together with its count.
In standard SQL, only grouped columns and aggregate values can appear in the SELECT list of a grouped query.
SQL > SELECT AGE, COUNT(*)
FROM CUSTOMERS
GROUP BY AGE
HAVING COUNT(*) >= 2;
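The grouped query run in SQLite via Python's sqlite3 on the CUSTOMERS rows above; only age 25 occurs twice (Khilan and Chaitali):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, "
             "AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00), (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00), (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00), (6, "Komal", 22, "MP", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
])

# WHERE would filter rows before grouping; HAVING filters the groups.
repeated_ages = conn.execute("""
    SELECT AGE, COUNT(*) FROM CUSTOMERS
    GROUP BY AGE
    HAVING COUNT(*) >= 2
""").fetchall()
print(repeated_ages)  # [(25, 2)]
```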
Nested Queries:
In nested queries, a query is written inside another query. The result of the inner query is used in the execution
of the outer query. We will use the STUDENT, COURSE and
STUDENT_COURSE tables for understanding nested queries.
STUDENT
S_ID S_NAME S_ADDRESS S_PHONE S_AGE
S1 RAM DELHI 9455123451 18
S2 RAMESH GURGAON 9652431543 18
S3 SUJIT ROHTAK 9156253131 20
S4 SURESH DELHI 9156768971 18
COURSE
C_ID C_NAME
C1 DSA
C2 Programming
C3 DBMS
STUDENT_COURSE
S_ID C_ID
S1 C1
S1 C3
S2 C1
S3 C2
S4 C2
S4 C3
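A typical nested query on these tables finds the names of students enrolled in DBMS: the innermost query finds the course ID, the middle one the student IDs, and the outer one the names. A sketch in SQLite via Python's sqlite3 (this particular query is an illustration, not one given in the notes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (S_ID TEXT PRIMARY KEY, S_NAME TEXT, S_ADDRESS TEXT,
                      S_PHONE TEXT, S_AGE INTEGER);
CREATE TABLE COURSE (C_ID TEXT PRIMARY KEY, C_NAME TEXT);
CREATE TABLE STUDENT_COURSE (S_ID TEXT, C_ID TEXT);
INSERT INTO STUDENT VALUES
    ('S1','RAM','DELHI','9455123451',18), ('S2','RAMESH','GURGAON','9652431543',18),
    ('S3','SUJIT','ROHTAK','9156253131',20), ('S4','SURESH','DELHI','9156768971',18);
INSERT INTO COURSE VALUES ('C1','DSA'), ('C2','Programming'), ('C3','DBMS');
INSERT INTO STUDENT_COURSE VALUES
    ('S1','C1'), ('S1','C3'), ('S2','C1'), ('S3','C2'), ('S4','C2'), ('S4','C3');
""")

dbms_students = conn.execute("""
    SELECT S_NAME FROM STUDENT
    WHERE S_ID IN (SELECT S_ID FROM STUDENT_COURSE
                   WHERE C_ID IN (SELECT C_ID FROM COURSE WHERE C_NAME = 'DBMS'))
    ORDER BY S_ID
""").fetchall()
print(dbms_students)  # [('RAM',), ('SURESH',)]
```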
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  35 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Now, let us check the following subquery with a SELECT statement.
SQL> SELECT *
FROM CUSTOMERS
WHERE ID IN (SELECT ID
FROM CUSTOMERS
WHERE SALARY > 4500);
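+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+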
Students
id name class_id GPA
1 Jack Black 3 3.45
2 Daniel White 1 3.15
3 Kathrine Star 1 3.85
4 Helen Bright 2 3.10
5 Steve May 2 2.40
Teachers
id name subject class_id monthly_salary
1 Elisabeth Grey History 3 2,500
2 Robert Sun Literature [NULL] 2,000
3 John Churchill English 1 2,350
4 Sara Parker Math 2 3,000
Classes
id grade teacher_id number_of_students
1 10 3 21
2 11 4 25
3 12 1 28
SELECT *
FROM students WHERE GPA > (
SELECT AVG(GPA)
FROM students);
RESULT:
id name class_id GPA
1 Jack Black 3 3.45
3 Kathrine Star 1 3.85
SELECT AVG(number_of_students) FROM classes
WHERE teacher_id IN ( SELECT id
FROM teachers
WHERE subject = 'English' OR subject = 'History');
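Both subqueries can be run in SQLite via Python's sqlite3 on the Students, Teachers and Classes tables above (salaries stored as plain integers). For the second query, the History and English teachers have ids 1 and 3, who teach classes with 28 and 21 students, so the average is 24.5:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER, name TEXT, class_id INTEGER, GPA REAL);
CREATE TABLE teachers (id INTEGER, name TEXT, subject TEXT,
                       class_id INTEGER, monthly_salary INTEGER);
CREATE TABLE classes  (id INTEGER, grade INTEGER, teacher_id INTEGER,
                       number_of_students INTEGER);
INSERT INTO students VALUES (1,'Jack Black',3,3.45),(2,'Daniel White',1,3.15),
    (3,'Kathrine Star',1,3.85),(4,'Helen Bright',2,3.10),(5,'Steve May',2,2.40);
INSERT INTO teachers VALUES (1,'Elisabeth Grey','History',3,2500),
    (2,'Robert Sun','Literature',NULL,2000),(3,'John Churchill','English',1,2350),
    (4,'Sara Parker','Math',2,3000);
INSERT INTO classes VALUES (1,10,3,21),(2,11,4,25),(3,12,1,28);
""")

# Students whose GPA is above the average GPA (a scalar subquery).
above_avg = conn.execute("""
    SELECT id, name FROM students
    WHERE GPA > (SELECT AVG(GPA) FROM students) ORDER BY id
""").fetchall()

# Average class size for classes taught by English or History teachers.
(avg_students,) = conn.execute("""
    SELECT AVG(number_of_students) FROM classes
    WHERE teacher_id IN (SELECT id FROM teachers
                         WHERE subject = 'English' OR subject = 'History')
""").fetchone()

print(above_avg)     # [(1, 'Jack Black'), (3, 'Kathrine Star')]
print(avg_students)  # 24.5
```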
Views in SQL
o Views in SQL are considered virtual tables. A view also contains rows and columns.
o To create a view, we can select fields from one or more tables present in the database.
o A view can have either specific rows based on a certain condition or all the rows of a table.
Student_Detail
STU_ID NAME ADDRESS
1 Stephan Delhi
2 Kathrin Noida
3 David Ghaziabad
4 Alina Gurugram
Student_Marks
STU_ID NAME MARKS AGE
1 Stephan 97 19
2 Kathrin 86 21
3 David 74 18
4 Alina 90 20
5 John 96 18
1. Creating view
A view can be created using the CREATE VIEW statement. We can create a view from a single
table or multiple tables.
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE condition;
Just like a table, we can query the view to see its data:
SELECT * FROM DetailsView;
Output:
NAME ADDRESS
Stephan Delhi
Kathrin Noida
David Ghaziabad
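The notes show DetailsView's output but not its definition; the `WHERE STU_ID < 4` filter below is an assumption that reproduces that output. A sketch in SQLite via Python's sqlite3 that also shows the view reflecting a change to the base table, and DROP VIEW:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student_Detail (STU_ID INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT);
INSERT INTO Student_Detail VALUES (1,'Stephan','Delhi'),(2,'Kathrin','Noida'),
                                  (3,'David','Ghaziabad'),(4,'Alina','Gurugram');
-- Assumed definition (only the view's output appears in the notes).
CREATE VIEW DetailsView AS
    SELECT NAME, ADDRESS FROM Student_Detail WHERE STU_ID < 4;
""")

before = conn.execute("SELECT * FROM DetailsView").fetchall()

# A view stores no rows of its own, so base-table changes show up immediately.
conn.execute("UPDATE Student_Detail SET ADDRESS = 'Agra' WHERE STU_ID = 3")
after = conn.execute("SELECT * FROM DetailsView").fetchall()

conn.execute("DROP VIEW DetailsView")
views_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'view'").fetchall()

print(sorted(before))
print(("David", "Agra") in after)  # True
print(views_left)                  # []
```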
4. Deleting View
A view can be deleted using the Drop View statement.
Syntax:
DROP VIEW view_name;
Example:
If we want to delete the View MarksView, we can do this as:
DROP VIEW MarksView;
Uses of a View :
A good database should contain views due to the given reasons:
1. Restricting data access –
Views provide an additional level of table security by restricting access to a predetermined set
of rows and columns of a table.
2. Hiding data complexity –
A view can hide the complexity that exists in a multiple table join.
3. Simplify commands for the user –
Views allow the user to select information from multiple tables without requiring the user to
know how to perform a join.
4. Store complex queries –
Views can be used to store complex queries.
5. Rename Columns –
Views can also be used to rename columns without affecting the base tables, provided the
number of column names given for the view matches the number of columns in the SELECT statement.
Thus, renaming helps to hide the column names of the base tables.
6. Multiple view facility –
Different views can be created on the same table for different users.
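Several of these uses (restricting access, hiding a join, renaming columns) can be shown with one view. The sketch below assumes SQLite via Python's sqlite3 module; ReportView and its column names student, city and score are hypothetical. Because the view uses an inner join, John, who has marks but no row in Student_Detail, does not appear, and the AGE column is never exposed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Student_Detail (STU_ID INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT);
CREATE TABLE Student_Marks (STU_ID INTEGER PRIMARY KEY, NAME TEXT, MARKS INTEGER, AGE INTEGER);
INSERT INTO Student_Detail VALUES
  (1, 'Stephan', 'Delhi'), (2, 'Kathrin', 'Noida'),
  (3, 'David', 'Ghaziabad'), (4, 'Alina', 'Gurugram');
INSERT INTO Student_Marks VALUES
  (1, 'Stephan', 97, 19), (2, 'Kathrin', 86, 21), (3, 'David', 74, 18),
  (4, 'Alina', 90, 20), (5, 'John', 96, 18);
-- Hypothetical view: hides the join, exposes only three columns (AGE stays hidden)
-- and renames them; the column list has exactly as many names as the SELECT has columns
CREATE VIEW ReportView (student, city, score) AS
  SELECT d.NAME, d.ADDRESS, m.MARKS
  FROM Student_Detail AS d JOIN Student_Marks AS m ON d.STU_ID = m.STU_ID;
""")

# Users query the view without writing the join themselves
cur.execute("SELECT * FROM ReportView")
columns = [c[0] for c in cur.description]
top = cur.execute(
    "SELECT student, score FROM ReportView WHERE score > 85 ORDER BY student"
).fetchall()
print(columns)  # ['student', 'city', 'score']
print(top)      # [('Alina', 90), ('Kathrin', 86), ('Stephan', 97)]
```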
Advantages of Triggers:
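Syntax for creating a trigger:
CREATE [OR REPLACE] TRIGGER trigger_name
{BEFORE | AFTER | INSTEAD OF}
{INSERT [OR] | UPDATE [OR] | DELETE}
[OF col_name]
ON table_name
[REFERENCING OLD AS o NEW AS n]
[FOR EACH ROW]
WHEN (condition)
DECLARE
   Declaration-statements
BEGIN
   Executable-statements
EXCEPTION
   Exception-handling-statements
END;
Here,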
o CREATE [OR REPLACE] TRIGGER trigger_name: creates a new trigger with the name trigger_name, or replaces an
existing trigger of that name.
o {BEFORE | AFTER | INSTEAD OF}: specifies when the trigger is executed. The INSTEAD OF clause is used to create
a trigger on a view.
o {INSERT [OR] | UPDATE [OR] | DELETE}: specifies the DML operation.
o [OF col_name]: specifies the column name that would be updated.
o ON table_name: specifies the name of the table associated with the trigger.
o [REFERENCING OLD AS o NEW AS n]: allows you to refer to the new and old values for DML statements such as
INSERT, UPDATE and DELETE.
o [FOR EACH ROW]: specifies a row-level trigger, i.e., the trigger is executed once for each row affected.
Otherwise the trigger executes just once when the SQL statement is executed; this is called a table-level
trigger.
o WHEN (condition): provides a condition on the rows for which the trigger fires. This clause is valid only
for row-level triggers.
Create trigger:
Let's write a row-level trigger for the CUSTOMERS table that fires for INSERT, UPDATE or DELETE
operations performed on the CUSTOMERS table. This trigger displays the salary difference between the
old and new values:
CREATE OR REPLACE TRIGGER display_salary_changes
BEFORE DELETE OR INSERT OR UPDATE ON customers
FOR EACH ROW
WHEN (NEW.ID > 0)
DECLARE
   sal_diff number;
BEGIN
   sal_diff := :NEW.salary - :OLD.salary;
   dbms_output.put_line('Old salary: ' || :OLD.salary);
   dbms_output.put_line('New salary: ' || :NEW.salary);
   dbms_output.put_line('Salary difference: ' || sal_diff);
END;
/
Executing the above code at the SQL prompt produces the following result:
Trigger created.
Check the salary difference with a PL/SQL block:
Use the following code to see the old salary, new salary and salary difference after the trigger has been
created.
DECLARE
total_rows number(2);
BEGIN
UPDATE customers
SET salary = salary + 5000;
IF sql%notfound THEN
dbms_output.put_line('no customers updated');
ELSIF sql%found THEN
total_rows := sql%rowcount;
dbms_output.put_line( total_rows || ' customers updated ');
END IF;
END;
/
Output:
Old salary: 20000
New salary: 25000
Salary difference: 5000
Old salary: 22000
New salary: 27000
Salary difference: 5000
Old salary: 24000
New salary: 29000
Salary difference: 5000
Old salary: 26000
New salary: 31000
Salary difference: 5000
Old salary: 28000
New salary: 33000
Salary difference: 5000
Note: Each execution of this code increments every salary by 5000, so both the old and the new salary grow
by 5000 each time and the salary difference is always 5000.
Executing the above code again gives the following result:
Old salary: 25000
New salary: 30000
Salary difference: 5000
Old salary: 27000
New salary: 32000
Salary difference: 5000
Old salary: 29000
New salary: 34000
Salary difference: 5000
Old salary: 31000
New salary: 36000
Salary difference: 5000
Old salary: 33000
New salary: 38000
Salary difference: 5000
Old salary: 35000
New salary: 40000
Salary difference: 5000
6 customers updated
Important Points:
The following two points are very important and should be noted carefully:
o OLD and NEW references are available only in row-level triggers; they are not available in table-level triggers.
o If you want to query the table in the same trigger, you should use the AFTER keyword, because a trigger
can query the table or change it again only after the initial changes have been applied and the table is back
in a consistent state.
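SQLite cannot run the PL/SQL trigger above, but the same row-level idea can be sketched with Python's sqlite3 module. Assumptions: SQLite allows only one triggering event per trigger, so this version handles UPDATE only; it writes to a hypothetical salary_log table instead of calling dbms_output; and the three customers are made-up sample rows. In SQLite, OLD and NEW are written without the leading colon.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
-- SQLite has no dbms_output, so the trigger writes to a log table instead
CREATE TABLE salary_log (id INTEGER, old_salary INTEGER, new_salary INTEGER, diff INTEGER);
-- SQLite triggers are always row-level and can read OLD and NEW values
CREATE TRIGGER display_salary_changes
AFTER UPDATE OF salary ON customers
FOR EACH ROW WHEN NEW.id > 0
BEGIN
  INSERT INTO salary_log
  VALUES (OLD.id, OLD.salary, NEW.salary, NEW.salary - OLD.salary);
END;
-- Hypothetical sample rows
INSERT INTO customers VALUES (1, 'Asha', 20000), (2, 'Bilal', 22000), (3, 'Chen', 24000);
""")

cur.execute("UPDATE customers SET salary = salary + 5000")
updated = cur.rowcount  # rows changed by the UPDATE itself, similar to sql%rowcount
print(updated, "customers updated")

log = cur.execute("SELECT old_salary, new_salary, diff FROM salary_log ORDER BY id").fetchall()
for old, new, diff in log:
    print("Old salary:", old, "New salary:", new, "Salary difference:", diff)
```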
Procedure
The PL/SQL stored procedure, or simply a procedure, is a PL/SQL block that performs one or more specific
tasks. It is just like procedures in other programming languages.
The procedure contains a header and a body.
o Header: The header contains the name of the procedure and the parameters or variables passed to the
procedure.
o Body: The body contains a declaration section, execution section and exception section similar to a
general PL/SQL block.
In this example, we insert a record into a users table, so the table must be created first. Note that USER is
a reserved word in Oracle, so the table is named users.
Table creation:
create table users(id number(10) primary key, name varchar2(100));
Now write the procedure code to insert a record into the users table.
Procedure Code:
create or replace procedure "INSERTUSER"(id IN NUMBER, name IN VARCHAR2)
is
begin
   insert into users values(id, name);
end;
/
Output:
Procedure created.
PL/SQL program to call procedure
Let's see the code to call the procedure created above.
BEGIN
   insertuser(101, 'Rahul');
   dbms_output.put_line('record inserted successfully');
END;
/
Now, query the users table; you will see that one record has been inserted.
ID Name
101 Rahul
PL/SQL Drop Procedure
Syntax for drop procedure
DROP PROCEDURE procedure_name;
Example of drop procedure
DROP PROCEDURE pro1;
UNIT- III Normalization
Decomposition: the process of breaking up or dividing a single relation into two or more sub-relations
is called decomposition of a relation.
Lossless Decomposition
o If no information is lost from the relation that is decomposed, the decomposition is lossless.
o Lossless decomposition guarantees that joining the decomposed relations gives back exactly the
relation that was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed relations gives
the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing
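The lossless property can be checked directly on the example rows: project the relation onto the two parts, join them again and compare with the original. The decomposition into (EMP_ID, EMP_NAME, EMP_AGE, EMP_CITY) and (DEPT_ID, EMP_ID, DEPT_NAME) is an assumption, since the decomposed tables are not shown; project and natural_join are illustrative helpers, not a library API.

```python
# A relation is modelled as (tuple of attribute names, set of row tuples)
def project(rel, attrs):
    names, rows = rel
    idx = [names.index(a) for a in attrs]
    return tuple(attrs), {tuple(row[i] for i in idx) for row in rows}

def natural_join(r1, r2):
    n1, rows1 = r1
    n2, rows2 = r2
    common = [a for a in n1 if a in n2]
    extra = [a for a in n2 if a not in n1]
    out = set()
    for a in rows1:
        for b in rows2:
            # Rows match when they agree on every common attribute
            if all(a[n1.index(c)] == b[n2.index(c)] for c in common):
                out.add(a + tuple(b[n2.index(e)] for e in extra))
    return tuple(n1) + tuple(extra), out

emp_dept = (
    ("EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY", "DEPT_ID", "DEPT_NAME"),
    {
        (22, "Denim", 28, "Mumbai", 827, "Sales"),
        (33, "Alina", 25, "Delhi", 438, "Marketing"),
        (46, "Stephan", 30, "Bangalore", 869, "Finance"),
        (52, "Katherine", 36, "Mumbai", 575, "Production"),
        (60, "Jack", 40, "Noida", 678, "Testing"),
    },
)

# Assumed decomposition: EMP_ID is kept in both parts so they can be joined again
employee = project(emp_dept, ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"])
department = project(emp_dept, ["DEPT_ID", "EMP_ID", "DEPT_NAME"])

joined = natural_join(employee, department)
# Put the columns back in the original order before comparing
lossless = project(joined, emp_dept[0])[1] == emp_dept[1]
print("lossless:", lossless)  # lossless: True
```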
Lossy Decomposition:
As the name suggests, a decomposition is lossy when information is lost: after the relation is decomposed
into two or more relational schemas, the original relation cannot be retrieved by joining them.
Let us see an example:
<EmpInfo>
<DeptDetails>
Dept_ID Dept_Name
Dpt1 Operations
Dpt2 HR
Dpt3 Finance
Now, you won’t be able to join the above tables, since they share no common attribute (Emp_ID is not a part
of the DeptDetails relation). Therefore, this decomposition is lossy.
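For a decomposition into two relations there is also a standard test that needs only the functional dependencies: the decomposition is lossless exactly when the common attributes functionally determine all attributes of at least one of the two relations. The sketch below assumes EmpInfo has the attributes (Emp_ID, Emp_Name, Emp_Age, Emp_Location) and assumes the FDs listed in the code, because the EmpInfo table itself is not shown.

```python
def closure(attrs, fds):
    """Attribute closure X+: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary decomposition test: lossless iff the common attributes
    functionally determine all of r1 or all of r2."""
    common = set(r1) & set(r2)
    c_plus = closure(common, fds)
    return set(r1) <= c_plus or set(r2) <= c_plus

# Assumed FDs and schemas for the EmpInfo / DeptDetails example
fds = [
    ({"Emp_ID"}, {"Emp_Name", "Emp_Age", "Emp_Location", "Dept_ID"}),
    ({"Dept_ID"}, {"Dept_Name"}),
]
emp_info = {"Emp_ID", "Emp_Name", "Emp_Age", "Emp_Location"}
dept_details = {"Dept_ID", "Dept_Name"}
print(is_lossless(emp_info, dept_details, fds))   # False: no common attribute

# Keeping Dept_ID in the employee part makes Dept_ID the common attribute
emp_info2 = emp_info | {"Dept_ID"}
print(is_lossless(emp_info2, dept_details, fds))  # True: Dept_ID -> Dept_Name
```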
Dependency Preserving:
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized: insertion, update and
deletion anomalies.
Example: Suppose a manufacturing company stores the employee details in a table named
employee that has four attributes: emp_id for storing employee’s id, emp_name for storing
employee’s name, emp_address for storing employee’s address and emp_dept for storing the
department details in which the employee works. At some point of time the table looks like this:
Update anomaly: We have two rows for employee Rick because he belongs to two departments of the
company. If we want to update Rick's address, we have to update it in both rows, or the data will become
inconsistent. If the correct address is updated in one row but not in the other, then, as per the database,
Rick would have two different addresses, which is not correct and leads to inconsistent data.
Insert anomaly: Suppose a new employee joins the company who is under training and not yet assigned
to any department. Then we would not be able to insert the data into the table if the emp_dept field
doesn’t allow nulls.
Delete anomaly: Suppose at some point the company closes the department D890. Then deleting the
rows that have emp_dept as D890 would also delete the information of employee Maggie, since she is
assigned only to this department.
To overcome these anomalies we need to normalize the data. In the next section we will discuss
normalization.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389, 8589830302 Punjab
As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It
should hold only atomic values.
Example: Suppose a company wants to store the names and contact details of its employees. It
creates a table that looks like this:
emp_id emp_name emp_address emp_mobile
101 Herschel New Delhi 8912312390
102 Jon Kanpur 8812121212, 9900012222
103 Ron Chennai 7778881212
104 Lester Bangalore 9990000123, 8123450987
Two employees (Jon and Lester) have two mobile numbers each, so the company stored both numbers in the
same field, as you can see in the table above.
This table is not in 1NF, because the rule says “each attribute of a table must have atomic (single) values”,
and the emp_mobile values for employees Jon and Lester violate that rule.
To make the table comply with 1NF, we should have the data like this:
Example:
ID Name Courses
1 A c1, c2
2 E c3
3 M c2, c3
In the above table, Courses is a multivalued attribute, so the table is not in 1NF. The table below is in 1NF,
as it has no multivalued attribute:
ID Name Course
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
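The conversion above is mechanical: each value of the multivalued attribute becomes its own row, with the other columns repeated. A small Python sketch, assuming the multiple values are stored as a comma-separated string (the helper name to_1nf is hypothetical):

```python
def to_1nf(rows, multi_col, new_col):
    """Split a comma-separated multivalued column into one row per value."""
    flat = []
    for row in rows:
        for value in row[multi_col].split(","):
            # Copy every other column unchanged, then add the single value
            new_row = {k: v for k, v in row.items() if k != multi_col}
            new_row[new_col] = value.strip()
            flat.append(new_row)
    return flat

students = [
    {"ID": 1, "Name": "A", "Courses": "c1, c2"},
    {"ID": 2, "Name": "E", "Courses": "c3"},
    {"ID": 3, "Name": "M", "Courses": "c2, c3"},
]
flat_rows = to_1nf(students, "Courses", "Course")
for r in flat_rows:
    print(r["ID"], r["Name"], r["Course"])
```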
Second Normal Form (2NF)
An attribute that is not part of any candidate key is known as a non-prime attribute.
Example: Let's assume a school stores the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
Example: Suppose a school wants to store the data of teachers and the subjects they teach. Since a teacher
can teach more than one subject, the table can have multiple rows for the same teacher. They create a table
that looks like this:
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Candidate Keys: {teacher_id, subject}
Non prime attribute: teacher_age
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime
attribute teacher_age depends on teacher_id alone, which is a proper subset of the candidate key. This violates
the rule for 2NF, which says “no non-prime attribute is dependent on a proper subset of any candidate key of the
table”.
To make the table comply with 2NF, we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
A relation is in 2NF if it is in first normal form and does not contain any partial dependency.
That is, a relation is in 2NF if no non-prime attribute (an attribute that is not part of any candidate key)
depends on any proper subset of any candidate key of the table.
But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE depends on COURSE_NO, which is a proper subset of the
candidate key {STUD_NO, COURSE_NO}. The non-prime attribute COURSE_FEE depends on a proper subset of the
candidate key; this is a partial dependency, so the relation is not in 2NF.
To convert the above relation to 2NF, we need to split the table into two tables such as:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
Table 1
STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
Table 2
COURSE_NO COURSE_FEE
C1 1000
C2 1500
C3 1000
C4 2000
C5 2000
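A partial dependency can be found mechanically by computing the attribute closure of every proper subset of the candidate key and checking whether it reaches a non-prime attribute. A sketch in Python, assuming FDs are given as (left side, right side) pairs of sets; only the FDs named in the text are supplied, so the result reflects those alone.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs set, rhs set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, non_prime, fds):
    """Non-prime attributes determined by a proper subset of the candidate key."""
    found = []
    for size in range(1, len(key)):              # proper subsets only
        for subset in combinations(sorted(key), size):
            determined = closure(set(subset), fds) & non_prime
            if determined:
                found.append((set(subset), determined))
    return found

# STUD_NO / COURSE_NO example: candidate key {STUD_NO, COURSE_NO}
fds = [({"COURSE_NO"}, {"COURSE_FEE"})]
print(partial_dependencies({"STUD_NO", "COURSE_NO"}, {"COURSE_FEE"}, fds))
# [({'COURSE_NO'}, {'COURSE_FEE'})]  -> not in 2NF

# Teacher example: candidate key {teacher_id, subject}
fds_t = [({"teacher_id"}, {"teacher_age"})]
print(partial_dependencies({"teacher_id", "subject"}, {"teacher_age"}, fds_t))
```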
o A relation is in 3NF if it is in 2NF and does not contain any transitive dependency.
o 3NF is used to reduce data duplication. It is also used to achieve data integrity.
o If a relation in 2NF has no transitive dependency for non-prime attributes, then the relation is in third normal form.
A relation is in third normal form if at least one of the following conditions holds for every non-trivial functional
dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
EMPLOYEE_DETAIL:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime attributes
(EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID), which violates the rule
of third normal form.
That's why we need to move EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as its primary key.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
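The two 3NF conditions (X is a super key, or every attribute of Y is prime) can be checked mechanically once the candidate keys are known. The sketch below finds candidate keys by brute force, which is fine for small textbook relations but exponential in general; the function names are hypothetical. The FDs EMP_ID -> EMP_NAME, EMP_ZIP and EMP_ZIP -> EMP_STATE, EMP_CITY are taken from the explanation above.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs set, rhs set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure is the whole relation (brute force)."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            c = set(combo)
            if any(k <= c for k in keys):
                continue  # supersets of a key are superkeys, not candidate keys
            if closure(c, fds) >= relation:
                keys.append(c)
    return keys

def violations_3nf(relation, fds):
    """FDs X -> Y where X is not a super key and some attribute of Y - X is non-prime."""
    prime = set().union(*candidate_keys(relation, fds))
    bad = []
    for lhs, rhs in fds:
        extra = rhs - lhs
        if not extra:
            continue                      # trivial dependency
        if closure(lhs, fds) >= relation:
            continue                      # condition 1: X is a super key
        if extra <= prime:
            continue                      # condition 2: Y is prime
        bad.append((lhs, rhs))
    return bad

employee_detail = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
fds = [
    ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),
    ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"}),
]
print(candidate_keys(employee_detail, fds))  # [{'EMP_ID'}]
bad = violations_3nf(employee_detail, fds)   # only EMP_ZIP -> EMP_STATE, EMP_CITY

# After the decomposition, both parts pass the test
employee = {"EMP_ID", "EMP_NAME", "EMP_ZIP"}
employee_zip = {"EMP_ZIP", "EMP_STATE", "EMP_CITY"}
print(violations_3nf(employee, fds[:1]), violations_3nf(employee_zip, fds[1:]))  # [] []
```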
A table design is said to be in 3NF if both the following conditions hold:
Table must be in 2NF
Transitive functional dependency of non-prime attributes on any super key should be removed.
An attribute that is not part of any candidate key is known as a non-prime attribute.
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and, for each functional
dependency X -> Y, at least one of the following conditions holds:
X is a super key of the table.
Y is a prime attribute of the table.
To make this table comply with 3NF, we have to break it into two tables to remove the transitive
dependency:
employee table:
emp_id emp_name emp_zip
1001 John 282005
1002 Ajeet 222008
1006 Lora 282007
1101 Lilly 292008
1201 Steve 222999
employee_zip table:
emp_zip emp_state emp_city emp_district
282005 UP Agra Dayal Bagh
222008 TN Chennai M-City
282007 TN Chennai Urrapakkam
292008 UK Pauri Bhagwan
222999 MP Gwalior Ratan
A relation is in third normal form if it is in second normal form and there is no transitive dependency for
non-prime attributes.
A relation is in 3NF if at least one of the following conditions holds for every non-trivial functional dependency
X -> Y:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).
Transitive dependency: if A -> B and B -> C are two FDs, then A -> C is called a transitive dependency.
Example 1: In relation STUDENT given in Table 4,
FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY,
STUD_NO -> STUD_AGE}
Candidate Key: {STUD_NO}
For this relation in Table 4, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY hold, so
STUD_COUNTRY is transitively dependent on STUD_NO. This violates third normal form. To convert it to
third normal form, we decompose the relation STUDENT (STUD_NO, STUD_NAME, STUD_PHONE,
STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
A relation is in 4NF if it is in Boyce-Codd normal form and has no multivalued dependency.
For a dependency A → B, if multiple values of B exist for a single value of A, the dependency is a
multivalued dependency.
Example:
STUDENT:
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities. Hence, there is
no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two
hobbies, Dancing and Singing. So there is a multivalued dependency on STU_ID, which leads to unnecessary
repetition of data.
So to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE:
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY:
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
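A decomposition into STUDENT_COURSE and STUDENT_HOBBY is lossless only when STU_ID ->> COURSE actually holds, that is, when every course of a student appears together with every hobby of that student. The sketch below uses a hypothetical instance in which that is true for student 21, and verifies, by joining the two tables on STU_ID, that no row is lost or added.

```python
from collections import defaultdict

# Hypothetical instance in which STU_ID ->> COURSE holds: every course of a
# student is paired with every hobby of that student
student = {
    (21, "Computer", "Dancing"), (21, "Computer", "Singing"),
    (21, "Math", "Dancing"), (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"), (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
}

# 4NF decomposition: one table per independent multivalued fact
student_course = {(sid, course) for sid, course, _ in student}
student_hobby = {(sid, hobby) for sid, _, hobby in student}

# Natural join on STU_ID to check that nothing was lost or added
hobbies_of = defaultdict(set)
for sid, hobby in student_hobby:
    hobbies_of[sid].add(hobby)
rejoined = {(sid, course, hobby)
            for sid, course in student_course
            for hobby in hobbies_of[sid]}

print("lossless:", rejoined == student)  # lossless: True
```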
Example: Consider the database table of a class that has two relations: R1 contains student ID (SID) and
student name (SNAME), and R2 contains course ID (CID) and course name (CNAME).
Table – R1 X R2
SID SNAME CID CNAME
S1 A C1 C
S1 A C2 D
S2 B C1 C
S2 B C2 D
Multivalued dependencies (MVD) are:
SID->->CID; SID->->CNAME; SNAME->->CNAME
Multivalued Dependency:
o Multivalued dependency occurs when two attributes in a table are independent of each other but both depend
on a third attribute.
o A multivalued dependency consists of at least two attributes that are dependent on a third attribute; that's why
it always requires at least three attributes.
Example: Suppose there is a bike manufacturer company which produces two colors (white and black) of each
model every year.
Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each other.
In this case, these two columns are said to be multivalued dependent on BIKE_MODEL. The representation of
these dependencies is shown below:
1. BIKE_MODEL →→ MANUF_YEAR
2. BIKE_MODEL →→ COLOR
This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL multidetermines
COLOR".
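Whether an MVD holds in a given table can be tested directly from its definition: for each value of the determining attribute, every Y value must occur together with every value of the remaining attributes. The rows below are hypothetical, because the bike table itself is not shown; the function name mvd_holds is also an illustration, not a library API.

```python
from collections import defaultdict

def mvd_holds(attrs, rows, x, y):
    """Check X ->> Y on an instance: for each X value, the (Y, Z) combinations present
    must be the full cross product of its Y values and its Z values (Z = the rest)."""
    ix = [attrs.index(a) for a in x]
    iy = [attrs.index(a) for a in y]
    iz = [i for i in range(len(attrs)) if i not in ix and i not in iy]
    groups = defaultdict(set)
    for r in rows:
        groups[tuple(r[i] for i in ix)].add(
            (tuple(r[i] for i in iy), tuple(r[i] for i in iz)))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        # pairs is always a subset of ys x zs, so equal size means equal sets
        if len(pairs) != len(ys) * len(zs):
            return False
    return True

attrs = ("BIKE_MODEL", "MANUF_YEAR", "COLOR")
bikes = {  # hypothetical rows: model M1 made in two years, each year in both colors
    ("M1", 2020, "White"), ("M1", 2020, "Black"),
    ("M1", 2021, "White"), ("M1", 2021, "Black"),
}
print(mvd_holds(attrs, bikes, ["BIKE_MODEL"], ["COLOR"]))       # True
print(mvd_holds(attrs, bikes, ["BIKE_MODEL"], ["MANUF_YEAR"]))  # True
# Remove one combination and COLOR is no longer independent of MANUF_YEAR
print(mvd_holds(attrs, bikes - {("M1", 2021, "Black")}, ["BIKE_MODEL"], ["COLOR"]))  # False
```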
Join Dependency
Join dependency is a further generalization of multivalued dependencies.
If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency (JD) exists,
where R1(A, B, C) and R2(C, D) are decompositions of a given relation R(A, B, C, D).
Alternatively, R1 and R2 are a lossless decomposition of R.
A JD ⋈ {R1, R2, ..., Rn} is said to hold over a relation R if R1, R2, ..., Rn is a lossless-join decomposition.
*((A, B, C), (C, D)) is a JD of R if the join of R1 and R2 over their common attribute C is equal to the relation R.
Here, *(R1, R2, R3) is used to indicate that relations R1, R2, R3 and so on are a JD of R.
Example:
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester 1
Suppose we add a new semester, Semester 3, but do not know the subject or who will be teaching it, so we
would leave Lecturer and Subject as NULL. But all three columns together act as the primary key, so we
can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
UNIT-IV TRANSACTION MANAGEMENT IN DBMS
A transaction is a set of logically related operations.
Now that we understand what a transaction is, we should understand the problems associated with it.
The main problem is that a transaction can fail before finishing all the operations in the set. This can
happen due to power failure, system crash, etc.
This is a serious problem that can leave the database in an inconsistent state. Assume that the transaction
fails after the third operation (see the example above): the amount would be deducted from your account,
but your friend would not receive it.
Commit: If all the operations in a transaction complete successfully, commit those changes to the
database permanently.
Rollback: If any of the operations fails, roll back all the changes made by the previous operations.
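Commit and rollback can be demonstrated with Python's sqlite3 module and the money-transfer scenario described above. This is a sketch: the account table, the CHECK constraint and the simulated crash are assumptions made for illustration.

```python
import sqlite3

# isolation_level=None: no implicit transactions, so BEGIN/COMMIT/ROLLBACK are explicit
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('you', 500), ('friend', 100)")

def transfer(amount, fail_midway=False):
    """Debit 'you' and credit 'friend' as one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'you'", (amount,))
        if fail_midway:
            raise RuntimeError("simulated crash after the debit")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'friend'", (amount,))
    except (sqlite3.Error, RuntimeError):
        conn.execute("ROLLBACK")  # undo every change made so far
        return False
    conn.execute("COMMIT")        # all operations succeeded: make them permanent
    return True

def balances():
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))

print(transfer(200), balances())                    # True {'friend': 300, 'you': 300}
print(transfer(200, fail_midway=True), balances())  # False {'friend': 300, 'you': 300}
print(transfer(1000), balances())                   # False: the CHECK fails, nothing changes
```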
STATES OF TRANSACTION:
Transactions can be implemented using SQL queries and a database server. The diagram below shows how
the transaction states work.
Active state
o The active state is the first state of every transaction. In this state, the transaction is being executed.
o For example, insertion, deletion or updating of a record is done here, but the records are not yet saved to
the database.
Partially committed
o In the partially committed state, a transaction has executed its final operation, but the data is still not saved
to the database.
o In the total mark calculation example, the final step that displays the total marks is executed in this state.
Committed
A transaction is said to be in the committed state if it executes all its operations successfully. In this state, all
its effects are permanently saved in the database system.
Failed state
o If any of the checks made by the database recovery system fails, then the transaction is said to be in the failed
state.
o In the example of total mark calculation, if the database is not able to fire a query to fetch the marks, then the
transaction will fail to execute.
Aborted
o If any of the checks fail and the transaction has reached the failed state, the database recovery system
makes sure that the database returns to its previous consistent state by aborting, i.e., rolling back, the
transaction.
o If the transaction fails midway, all the operations it has already executed are rolled back, returning the
database to the consistent state it was in before the transaction started.
o After aborting the transaction, the database recovery module selects one of two operations:
1. Re-start the transaction
2. Kill the transaction
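The states and the moves between them can be summarised as a small state machine. The transition table below is a sketch based on the usual textbook diagram (the diagram referred to above is not reproduced in these notes); the class and method names are hypothetical.

```python
# Allowed moves between the transaction states described above
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),   # terminal
    "aborted": set(),     # terminal: the recovery module then restarts or kills it
}

class Transaction:
    def __init__(self, name):
        self.name = name
        self.state = "active"          # every transaction starts in the active state
        self.history = ["active"]

    def move(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: illegal move {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)

t1 = Transaction("T1")
t1.move("partially committed")
t1.move("committed")
print(t1.history)  # ['active', 'partially committed', 'committed']

t2 = Transaction("T2")
t2.move("failed")
t2.move("aborted")
try:
    t2.move("committed")   # an aborted transaction can never commit
except ValueError as err:
    print(err)
```

A restarted transaction would be modelled as a fresh Transaction object, since the aborted one stays in its terminal state.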
TRANSACTION PROPERTY
A transaction has four properties. They are used to maintain consistency in a database before and after the
transaction.
Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity
o It states that either all operations of the transaction take place at once or none do; if they cannot all
complete, the transaction is aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each transaction is treated as one unit and
either runs to completion or is not executed at all.
Abort: If a transaction aborts then all the changes made are not visible.
Commit: If a transaction commits then all the changes made are visible.
Consistency
o The integrity constraints are maintained so that the database is consistent before and after the transaction.
o The execution of a transaction will leave a database in either its prior stable state or a new stable state.
o The consistent property of database states that every transaction sees a consistent database instance.
o The transaction is used to transform the database from one consistent state to another consistent state.
Isolation
o It states that the data used during the execution of a transaction cannot be used by a second
transaction until the first one is completed.
o In isolation, if the transaction T1 is being executed and is using the data item X, then that data item can't be
accessed by any other transaction T2 until the transaction T1 ends.
o The concurrency control subsystem of the DBMS enforces the isolation property.
Durability
o The durability property guarantees the permanence of the database's consistent state. It states that the
changes made by a committed transaction are permanent.
o They cannot be lost by the erroneous operation of a faulty transaction or by a system failure. When a
transaction is completed, the database reaches a state known as the consistent state, and that consistent state
cannot be lost, even in the event of a system failure.
o The recovery subsystem of the DBMS is responsible for the durability property.
The recovery-management component of a database system can support atomicity and durability by a variety of
schemes.
Shadow copy:
In the shadow-copy scheme, a transaction that wants to update the database first creates a complete copy of
the database.
All updates are done on the new database copy, leaving the original copy, the shadow copy, untouched. If at
any point the transaction has to be aborted, the system merely deletes the new copy. The old copy of the
database has not been affected.
This scheme, which is based on making copies of the database called shadow copies, assumes that only one
transaction is active at a time.
The scheme also assumes that the database is simply a file on disk. A pointer called db-pointer is maintained
on disk; it points to the current copy of the database.
If the transaction completes, it is committed as follows:
First, the operating system is asked to make sure that all pages of the new copy of the database have been
written out to disk. (Unix systems use the fsync system call for this purpose.)
After the operating system has written all the pages to disk, the database system updates the pointer db-pointer
to point to the new copy of the database;
the new copy then becomes the current copy of the database. The old copy of the database is then deleted.
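The commit steps of the shadow-copy scheme can be sketched with ordinary files, under the same assumptions as the text: one active transaction at a time and a database that is a single file. The file names db.0, db.1 and db-pointer are hypothetical. os.fsync plays the role of forcing the pages to disk, and os.replace is used for the pointer switch because it renames atomically on POSIX systems.

```python
import os
import shutil
import tempfile

def read_db(db_dir):
    """Follow db-pointer to the current copy of the database and return its contents."""
    with open(os.path.join(db_dir, "db-pointer")) as f:
        current = f.read().strip()
    with open(os.path.join(db_dir, current)) as f:
        return f.read()

def shadow_update(db_dir, update):
    """Run update(text) -> text as one transaction using the shadow-copy scheme."""
    pointer = os.path.join(db_dir, "db-pointer")
    with open(pointer) as f:
        current = f.read().strip()                 # e.g. "db.0"
    new_name = "db.%d" % (int(current.split(".")[1]) + 1)
    old_path = os.path.join(db_dir, current)
    new_path = os.path.join(db_dir, new_name)

    shutil.copyfile(old_path, new_path)            # complete copy; the old file is the shadow copy
    try:
        with open(new_path, "r+") as f:
            new_text = update(f.read())            # all updates go to the new copy only
            f.seek(0)
            f.write(new_text)
            f.truncate()
            f.flush()
            os.fsync(f.fileno())                   # make sure the new pages are on disk
    except Exception:
        os.remove(new_path)                        # abort: just delete the new copy
        raise

    tmp = pointer + ".tmp"
    with open(tmp, "w") as f:
        f.write(new_name)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, pointer)                       # commit point: db-pointer names the new copy
    os.remove(old_path)                            # the old copy is no longer needed

def fail(text):
    raise ValueError("transaction aborted")

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "db.0"), "w") as f:
        f.write("A=100\n")
    with open(os.path.join(d, "db-pointer"), "w") as f:
        f.write("db.0")

    shadow_update(d, lambda text: text.replace("A=100", "A=50"))
    after_commit = read_db(d)                      # "A=50\n"
    try:
        shadow_update(d, fail)
    except ValueError:
        pass
    after_abort = read_db(d)                       # still "A=50\n"
    files = set(os.listdir(d))                     # {"db-pointer", "db.1"}
print(repr(after_commit), repr(after_abort), files)
```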
Figure below depicts the scheme, showing the database state before and after the update.
SCHEDULE:
A series of operations from one transaction to another transaction is known as a schedule. It is used to preserve
the order of the operations in each of the individual transactions.
1. SERIAL SCHEDULE
The serial schedule is a type of schedule where one transaction is executed completely before another
transaction starts. In a serial schedule, when the first transaction completes its cycle, the next transaction is
executed.
For example: Suppose there are two transactions T1 and T2, each with some operations. If there is no
interleaving of operations, there are the following two possible outcomes:
1. Execute all the operations of T1, followed by all the operations of T2.
2. Execute all the operations of T2, followed by all the operations of T1.
o Schedule A shows the serial schedule where T1 is followed by T2.
o Schedule B shows the serial schedule where T2 is followed by T1.
2. NON-SERIAL SCHEDULE
o If interleaving of operations is allowed, the schedule is non-serial.
o There are many possible orders in which the system can execute the individual operations of the
transactions.
o Schedule C and Schedule D are non-serial schedules. They have interleaving of operations.
3. SERIALIZABLE SCHEDULE
o The serializability of schedules is used to find non-serial schedules that allow the transaction to execute
concurrently without interfering with one another.
o It identifies which schedules are correct when executions of the transaction have interleaving of their
operations.
o A non-serial schedule is serializable if its result is equal to the result of its transactions executed
serially.
SERIALIZABILITY IN DBMS
Some non-serial schedules may lead to inconsistency of the database.
Serializability is a concept that helps to identify which non-serial schedules are correct and will maintain the
consistency of the database.
1. Conflict Serializability
If a given non-serial schedule can be converted into a serial schedule by swapping its non-conflicting
operations, it is called a conflict serializable schedule.
Conflicting Operations:
Two operations are called as conflicting operations if all the following conditions hold true for them-
Both the operations belong to different transactions
Both the operations are on the same data item
At least one of the two operations is a write operation
Example-Consider the following schedule-
In this schedule,
W1 (A) and R2 (A) are called conflicting operations, because all the above conditions hold true for
them.
Checking Whether a Schedule is Conflict Serializable or Not-
Follow these steps to check whether a given non-serial schedule is conflict serializable or not-
Step-01:
Find and list all the conflicting operations.
Step-02:
Start creating a precedence graph by drawing one node for each transaction.
Step-03:
Draw an edge for each conflict pair: if Xi(V) and Yj(V) form a conflict pair, with Xi(V) occurring first, draw an
edge from Ti to Tj.
This ensures that Ti gets executed before Tj in the equivalent serial schedule.
Step-04:
Check if there is any cycle formed in the graph.
If there is no cycle found, then the schedule is conflict serializable otherwise not.
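The four steps can be coded directly: collect the conflict pairs, build the precedence graph, and look for a cycle. This sketch uses Python's standard graphlib module, whose topological sort raises CycleError when the graph has a cycle; when there is no cycle, the sorted order is an equivalent serial schedule. The schedule format (transaction, operation, item) is an assumption.

```python
from graphlib import CycleError, TopologicalSorter

# A schedule is a list of (transaction, operation, data item); operation is "R" or "W"
def precedence_graph(schedule):
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))       # Ti's operation comes first: edge Ti -> Tj
    return edges

def serial_order(schedule):
    """Return an equivalent serial order, or None if the graph has a cycle."""
    sorter = TopologicalSorter({t: set() for t, _, _ in schedule})
    for ti, tj in precedence_graph(schedule):
        sorter.add(tj, ti)                # Tj may run only after Ti
    try:
        return list(sorter.static_order())
    except CycleError:
        return None

s1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
      ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]
s2 = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
print(serial_order(s1))  # ['T1', 'T2']: conflict serializable
print(serial_order(s2))  # None: T1 -> T2 and T2 -> T1 form a cycle
```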
VIEW SERIALIZABILITY
View serializability is a way to find out whether a given schedule is view serializable or not. To check whether a
given schedule is view serializable, we need to check whether it is view equivalent to some serial
schedule.
2. View Serializability
o A schedule is view serializable if it is view equivalent to a serial schedule.
o If a schedule is conflict serializable, then it is also view serializable.
o A view serializable schedule that is not conflict serializable contains blind writes.
View Equivalent:
Two schedules S1 and S2 are said to be view equivalent if they satisfy the following conditions:
1. Initial Read:
The initial reads of both schedules must be the same. Suppose two schedules S1 and S2: in schedule S1, if a
transaction T1 reads the initial value of data item A, then in S2, transaction T1 should also read the initial value of A.
The above two schedules are view equivalent because the initial read operation in S1 is done by T1 and in S2 it is also
done by T1.
2. Updated Read
In schedule S1, if Ti reads A which was updated by Tj, then in S2 also, Ti should read A as updated by
Tj.
3. Final Write
The final writes must be the same in both schedules. In schedule S1, if a transaction T1 updates A last,
then in S2 the final write operation on A should also be done by T1.
The above two schedules are view equivalent because the final write operation in S1 is done by T3 and in S2 the final write
operation is also done by T3.
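View equivalence compares only who reads from whom and who writes last, so it can be checked by recording, for every read, the transaction whose write it sees (None for an initial read) and the final writer of every item. The sketch below tries every serial order, which is exponential and only meant for small examples; the function names are hypothetical. The first schedule, R1(A) W2(A) W1(A) W3(A), is not conflict serializable (its precedence graph has the cycle T1 -> T2 -> T1), yet it is view serializable because of its blind writes.

```python
from collections import Counter
from itertools import permutations

# A schedule is a list of (transaction, operation, data item); operation is "R" or "W"
def view_profile(schedule):
    """What view equivalence compares: whom each read reads from, and the final writers."""
    last_writer = {}
    reads_from = Counter()
    for txn, op, item in schedule:
        if op == "R":
            # Writer None means an initial read of the item's original value
            reads_from[(txn, item, last_writer.get(item))] += 1
        else:
            last_writer[item] = txn
    return reads_from, last_writer  # last_writer now maps each item to its final writer

def view_serializable(schedule):
    """Try every serial order; return one that is view equivalent, or None."""
    txns = list(dict.fromkeys(t for t, _, _ in schedule))
    target = view_profile(schedule)
    for order in permutations(txns):
        serial = [step for t in order for step in schedule if step[0] == t]
        if view_profile(serial) == target:
            return list(order)
    return None

# R1(A) W2(A) W1(A) W3(A): W2(A) and W3(A) are blind writes
s = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]
print(view_serializable(s))     # ['T1', 'T2', 'T3']

# R1(A) R2(A) W1(A) W2(A): lost update, no serial order matches
lost = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
print(view_serializable(lost))  # None
```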
Recoverability of Schedule:
Sometimes a transaction may not execute completely due to a software issue, system crash or hardware failure.
In that case, the failed transaction has to be rolled back. But some other transaction may also have used a value
produced by the failed transaction, so those transactions also have to be rolled back.
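A schedule is called recoverable when every transaction that reads a value written by another transaction commits only after that writer has committed; otherwise the reader could commit a value that later has to be rolled back. The check below is a sketch, assuming schedules are lists of (transaction, operation, item) tuples with "C" for commit; aborts are not modelled.

```python
# Operations: ("T1", "R", "A"), ("T1", "W", "A") and ("T1", "C", None) for commit
def is_recoverable(schedule):
    """Recoverable: whenever Ti reads a value written by Tj, Tj commits before Ti does."""
    last_writer = {}
    read_from = {}        # reader -> set of transactions whose writes it read
    committed = set()
    for txn, op, item in schedule:
        if op == "W":
            last_writer[item] = txn
        elif op == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                read_from.setdefault(txn, set()).add(writer)
        elif op == "C":
            if not read_from.get(txn, set()) <= committed:
                return False  # Ti commits while a transaction it read from has not
            committed.add(txn)
    return True

ok = [("T1", "W", "A"), ("T2", "R", "A"), ("T1", "C", None), ("T2", "C", None)]
bad = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "C", None), ("T1", "C", None)]
print(is_recoverable(ok), is_recoverable(bad))  # True False
```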
1. Read Uncommitted – Read Uncommitted is the lowest isolation level. In this level, one transaction may
read not yet committed changes made by other transaction, therebyallowing dirty reads. In this level,
transactions are not isolated from each other.
2. Read Committed – This isolation level guarantees that any data read is committed at the moment it is read.
Thus it does not allow dirty read. The transactions hold a read or write lock on the current row, and thus
prevent other transactions from reading, updating or deleting it.
3. Repeatable Read – This level is more restrictive than Read Committed. The transaction holds read locks on all rows
it references and write locks on all rows it inserts, updates, or deletes. Since other transactions cannot update or
delete the rows it has read (or read the rows it has written), it avoids non-repeatable reads.
4. Serializable – This is the highest isolation level. It guarantees a serializable execution, defined as an execution
of operations in which concurrently executing transactions appear to be executing serially.
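The four levels differ in which read anomalies they rule out. The sketch below encodes the standard SQL-92 mapping as a lookup. Phantom reads are not discussed in these notes, and the standard's SERIALIZABLE level additionally requires serializable execution, not just the absence of these three phenomena. `PREVENTS` and `weakest_level` are hypothetical names used only for illustration:

```python
# Phenomena that each SQL-standard isolation level rules out, lowest level first.
PREVENTS = {
    "READ UNCOMMITTED": set(),
    "READ COMMITTED": {"dirty read"},
    "REPEATABLE READ": {"dirty read", "non-repeatable read"},
    "SERIALIZABLE": {"dirty read", "non-repeatable read", "phantom"},
}

def weakest_level(must_prevent):
    """Return the lowest isolation level that rules out every listed phenomenon."""
    for level, prevented in PREVENTS.items():  # dicts keep insertion order
        if set(must_prevent) <= prevented:
            return level
    raise ValueError(f"unknown phenomena: {must_prevent}")

print(weakest_level({"dirty read"}))  # READ COMMITTED
```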
FAILURE CLASSIFICATION
To identify where a problem has occurred, failures are generalized into the following categories:
1. Transaction failure
2. System crash
3. Disk failure
1. Transaction failure
A transaction failure occurs when a transaction fails to execute or reaches a point from which it cannot go any
further. If only a few transactions or processes are affected, this is called a transaction failure.
Reasons for a transaction failure could be -
1. Logical errors: If a transaction cannot complete due to some code error or an internal error condition, then
the logical error occurs.
2. System error: It occurs when the DBMS itself terminates an active transaction because the database system
is not able to execute it. For example, the system aborts an active transaction in case of deadlock or resource
unavailability.
2. System Crash
System failure can occur due to power failure or other hardware or software failure. Example: Operating
system error.
Fail-stop assumption: In the system crash, non-volatile storage is assumed not to be corrupted.
3. Disk Failure
o It occurs when hard-disk drives or storage drives fail. This was a common problem in the
early days of technology evolution.
o Disk failure occurs due to the formation of bad sectors, a disk head crash, unreachability of the disk or any
other failure that destroys all or part of the disk storage.
This type of problem (the dirty read, or temporary update, problem) occurs when one transaction T1 updates a data item
of the database and then fails for some reason, but its update has already been accessed by some other transaction.
Example: Let the value of A be 100.
Concurrency control is the concept required for controlling and managing the concurrent
execution of database operations and thus avoiding inconsistencies in the database. To maintain
consistency under concurrent access, we have concurrency control protocols.
The concurrency control protocols ensure the atomicity, consistency, isolation, durability and serializability of
the concurrent execution of database transactions. These protocols are categorized as:
o Lock Based Concurrency Control Protocol
o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol
Lock-Based Protocol
In this type of protocol, a transaction cannot read or write a data item until it acquires an appropriate lock on it. There
are two types of lock:
1. Shared lock:
o It is also known as a read-only lock. Under a shared lock, the data item can only be read by the transaction.
o It can be shared between transactions because a transaction holding a shared lock cannot update the
data item.
2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by the transaction.
o This lock is exclusive: it prevents multiple transactions from modifying the same data item simultaneously.
Example:
The following shows how locking and unlocking work with 2-PL.
Transaction T1:
o Growing phase: from step 1-3
o Shrinking phase: from step 5-7
o Lock point: at 3
Transaction T2:
o Growing phase: from step 2-6
o Shrinking phase: from step 8-9
o Lock point: at 6
1. Check the following condition whenever a transaction Ti issues a Read (X) operation:
If TS(Ti) < W_TS(X) then the operation is rejected and Ti is rolled back otherwise the operation is
executed.
Where,
TS(Ti) denotes the timestamp of the transaction Ti. R_TS(X) denotes the read timestamp of data item X.
W_TS(X) denotes the write timestamp of data item X.
Validation Based Protocol
The validation-based protocol is also known as the optimistic concurrency control technique. In the validation-based
protocol, the transaction is executed in the following three phases:
1. Read phase: In this phase, the transaction T is read and executed. It reads the values of various data
items and stores them in temporary local variables. It performs all write operations on the temporary
variables without updating the actual database.
2. Validation phase: In this phase, the temporary variable values are validated against the actual data to see
whether they violate serializability.
3. Write phase: If the transaction passes validation, the temporary results are written to the
database or system; otherwise the transaction is rolled back.
This protocol uses the timestamp of the validation phase as the transaction's timestamp for serialization, since
validation is the phase that determines whether the transaction will commit or roll back.
Hence TS(T) = validation(T).
Serializability is determined during the validation process; it cannot be decided in advance.
While executing transactions, the protocol allows a greater degree of concurrency and causes fewer
conflicts.
Thus it results in fewer rollbacks.
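The notes do not spell out the validation test itself. A minimal Python sketch of the usual textbook test, which is an assumption here: Ti passes validation if, for every older transaction Tk (TS(Tk) < TS(Ti)), either Tk finished before Ti started, or Tk finished its write phase before Ti's validation and wrote nothing that Ti read. The `Txn` record and `validate` function are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Txn:
    start: int                     # Start(T): when the read phase began
    validation: int                # Validation(T), also used as TS(T)
    finish: float = float("inf")   # Finish(T): end of write phase (inf if not finished)
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

def validate(ti, others):
    """Return True if ti may enter its write phase, False if it must roll back."""
    for tk in others:
        if tk.validation >= ti.validation:
            continue  # only older transactions, TS(Tk) < TS(Ti), are checked
        if tk.finish < ti.start:
            continue  # condition 1: Tk finished before Ti started
        if tk.finish < ti.validation and not (tk.write_set & ti.read_set):
            continue  # condition 2: Tk wrote nothing that Ti read
        return False
    return True

tk = Txn(start=1, validation=3, finish=4, read_set={"A"}, write_set={"A"})
print(validate(Txn(start=2, validation=5, read_set={"B"}), [tk]))  # True
print(validate(Txn(start=2, validation=5, read_set={"A"}), [tk]))  # False
```

The second call fails because Tk wrote A while Ti was reading, so Ti may have seen a stale A.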
Thomas Write Rule guarantees a serializable order for the protocol. It improves the Basic
Timestamp Ordering Algorithm.
The basic Thomas write rules are as follows:
o If TS(T) < R_TS(X), then transaction T is aborted and rolled back, and the operation is rejected.
o If TS(T) < W_TS(X), then the W_item(X) operation of the transaction is not executed, and processing continues.
o If neither condition 1 nor condition 2 holds, then transaction T is allowed to execute the WRITE operation,
and W_TS(X) is set to TS(T).
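The read rule of basic timestamp ordering and the Thomas write rule can be combined into one small sketch. It assumes transaction timestamps start at 1, so a missing R_TS(X) or W_TS(X) is treated as 0. The class `TimestampOrdering` and its string results are hypothetical names chosen for illustration.

```python
class TimestampOrdering:
    """Timestamp-ordering scheduler using the Thomas write rule for writes."""

    def __init__(self):
        self.r_ts = {}  # item -> R_TS(X), largest timestamp that read X
        self.w_ts = {}  # item -> W_TS(X), largest timestamp that wrote X

    def read(self, ts, item):
        if ts < self.w_ts.get(item, 0):
            return "rollback"  # a younger transaction already overwrote X
        self.r_ts[item] = max(self.r_ts.get(item, 0), ts)
        return "read"

    def write(self, ts, item):
        if ts < self.r_ts.get(item, 0):
            return "rollback"  # rule 1: a younger transaction already read X
        if ts < self.w_ts.get(item, 0):
            return "ignore"    # rule 2: obsolete write is skipped, T continues
        self.w_ts[item] = ts   # rule 3: perform the write
        return "write"

sched = TimestampOrdering()
print(sched.write(2, "X"))  # write
print(sched.write(1, "X"))  # ignore (basic timestamp ordering would roll T1 back)
print(sched.read(1, "X"))   # rollback
```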
MULTIPLE GRANULARITY
Data items are organized into a hierarchy of granularity levels, from coarsest to finest: Database, Area, File, Record.
Recovery and Atomicity:
When a system crashes, it may have several transactions being executed and various files opened for them to
modify the data items.
But according to ACID properties of DBMS, atomicity of transactions as a whole must be maintained, that is,
either all the operations are executed or none.
Database recovery means recovering the data when it gets deleted, hacked or damaged accidentally.
Atomicity is a must: whether or not a transaction has finished, its effects should either be reflected in the database
permanently or not affect the database at all.
When a DBMS recovers from a crash, it should check the states of all the transactions that were being executed.
A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of the transaction
in this case.
It should check whether the transaction can be completed now or needs to be rolled back.
No transaction should be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well as maintaining the
atomicity of a transaction −
Maintaining the logs of each transaction, and writing them onto some stable storage before actually modifying
the database.
Maintaining shadow paging, where the changes are done on a volatile memory, and later, the actual database
is updated.
Log-Based Recovery
o The log is a sequence of records. The log of each transaction is maintained in stable storage so that if any
failure occurs, the database can be recovered from it.
o If any operation is performed on the database, it is recorded in the log.
o The log records must be stored before the actual change is applied to the database.
o The deferred modification technique is used if the transaction does not modify the database until it has
committed.
o In this method, all the logs are created and stored in stable storage, and the database is updated when the
transaction commits.
o The immediate modification technique is used if database modifications occur while the transaction is still active.
o In this technique, the database is modified immediately after every operation, that is, the actual database is
modified while the transaction runs.
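The difference between the two techniques shows up at restart. A minimal Python sketch, assuming hypothetical log records of the form ("start", T), ("write", T, item, old, new) and ("commit", T), with a dict standing in for the database and no checkpoints (so the whole log is scanned). With deferred modification only committed writes are redone, because uncommitted ones never reached the database. With immediate modification, uncommitted writes must also be undone using their old values.

```python
def recover(log, db, immediate=True):
    """Bring db to a consistent state after a crash, using the log."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    if immediate:
        # Undo: scan backward, restoring old values written by uncommitted transactions.
        for rec in reversed(log):
            if rec[0] == "write" and rec[1] not in committed:
                db[rec[2]] = rec[3]
    # Redo: scan forward, reapplying new values written by committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            db[rec[2]] = rec[4]
    return db

log = [("start", "T1"), ("write", "T1", "A", 100, 150), ("commit", "T1"),
       ("start", "T2"), ("write", "T2", "B", 200, 250)]  # crash before T2 commits
print(recover(log, {"A": 100, "B": 250}, immediate=True))   # {'A': 150, 'B': 200}
print(recover(log, {"A": 100, "B": 200}, immediate=False))  # {'A': 150, 'B': 200}
```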
With concurrency control, multiple transactions can execute at the same time, so their log records are
interleaved. Because the interleaving affects transaction results, the order of execution of those
transactions must be maintained.
During recovery, it would be very difficult for the recovery system to backtrack through all the logs and then start
recovering.
Recovery with concurrent transactions can be done in the following ways.
Transaction rollback :
In this scheme, we rollback a failed transaction by using the log.
The system scans the log backward; for every log record of the failed transaction found in the log, the system
restores the data item to its old value.
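Rolling back one transaction is a backward scan that stops at the transaction's start record. A minimal Python sketch, assuming the same kind of hypothetical log records as ("start", T) and ("write", T, item, old, new), and a dict standing in for the database. Appending an ("abort", T) record at the end is also an assumption about the log format.

```python
def rollback(log, db, txn):
    """Undo txn by scanning the log backward and restoring old values."""
    for record in reversed(log):
        if record[0] == "write" and record[1] == txn:
            _, _, item, old, _new = record
            db[item] = old  # restore the value the item had before this write
        elif record[0] == "start" and record[1] == txn:
            break           # nothing older than this belongs to txn
    log.append(("abort", txn))
    return db

log = [("start", "T1"), ("write", "T1", "A", 100, 150),
       ("start", "T2"), ("write", "T2", "B", 200, 250),
       ("write", "T1", "A", 150, 175)]
print(rollback(log, {"A": 175, "B": 250}, "T1"))  # {'A': 100, 'B': 250}
```

T2's write to B is left alone; only T1's records are undone, newest first, so A passes back through 150 and ends at 100.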
Checkpoints :
A checkpoint is a process of saving a snapshot of the application's state so that it can restart from that point in
case of failure.
A checkpoint is a point of time at which records are written onto the database from the buffers.
A checkpoint shortens the recovery process.
When a checkpoint is reached, the transactions' updates are written into the database, and the log records up to that
point can be removed from the log file. The log file is then updated with the new steps of transactions until the
next checkpoint, and so on.
The checkpoint is used to declare a point before which the DBMS was in a consistent state and all the
transactions were committed.
Restart recovery:
When the system recovers from a crash, it constructs two lists.
The undo-list consists of transactions to be undone, and the redo-list consists of transactions to be redone.
The system constructs the two lists as follows: initially, they are both empty. The system scans the log
backward, examining each record, until it finds the first <checkpoint> record.
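The notes stop at the scan step. A minimal sketch of the usual textbook completion, stated here as an assumption. Scanning backward, a commit record puts its transaction on the redo-list. A start record puts its transaction on the undo-list unless it is already on the redo-list. At the checkpoint record, every transaction listed as active that is not yet on either list is added to the undo-list, and the scan stops. The record format and `build_recovery_lists` are hypothetical.

```python
def build_recovery_lists(log):
    """Scan the log backward to the most recent checkpoint and build redo/undo lists."""
    redo, undo = [], []
    for rec in reversed(log):
        kind, arg = rec[0], rec[1]
        if kind == "commit":
            redo.append(arg)
        elif kind == "start" and arg not in redo:
            undo.append(arg)
        elif kind == "checkpoint":
            # arg lists the transactions that were active when the checkpoint was taken
            for t in arg:
                if t not in redo and t not in undo:
                    undo.append(t)
            break
    return redo, undo

log = [("start", "T1"), ("start", "T5"), ("checkpoint", ["T1", "T5"]),
       ("start", "T2"), ("commit", "T1"), ("start", "T3"), ("commit", "T3"),
       ("start", "T4")]
print(build_recovery_lists(log))  # (['T3', 'T1'], ['T4', 'T2', 'T5'])
```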
BUFFER MANAGEMENT
The buffer manager is the software layer that is responsible for bringing pages from physical disk to main
memory as needed. The buffer manager manages the available main memory by dividing it into a
collection of pages, called the buffer pool. The main memory pages in the buffer pool are called
frames.
Data must be in RAM for DBMS to operate on it.
Buffer manager hides the fact that not all data is in RAM.
Buffer Manager
A Buffer Manager is responsible for allocating space to the buffer in order to store data into the buffer.
If a user requests a particular block and the block is available in the buffer, the buffer manager provides
the block's address in main memory.
If the block is not available in the buffer, the buffer manager allocates space for the block in the buffer.
If free space is not available, it throws out some existing blocks from the buffer to allocate the required
space for the new block.
The blocks that are thrown out are written back to the disk only if they were modified while in the
buffer.
If the user requests such a thrown-out block, the buffer manager reads the requested block from the disk
into the buffer and then passes the address of the requested block in main memory to the user.
The internal actions of the buffer manager are not visible to the programs that issue disk-block requests; in this
sense, the buffer manager is just like a virtual machine.
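The steps above can be sketched as a small buffer pool. The notes do not say which blocks are thrown out, so least-recently-used (LRU) replacement is an assumption here. A dict stands in for the disk, and `BufferManager` and its methods are hypothetical names. Only modified ("dirty") blocks are written back on eviction.

```python
from collections import OrderedDict

class BufferManager:
    """Buffer pool with LRU replacement (the policy is an assumption)."""

    def __init__(self, disk, capacity):
        self.disk = disk           # dict block_id -> contents, standing in for the disk
        self.capacity = capacity   # number of frames in the buffer pool
        self.pool = OrderedDict()  # block_id -> [contents, dirty]; least recent first
        self.disk_reads = 0

    def _fetch(self, block_id):
        if block_id in self.pool:
            self.pool.move_to_end(block_id)  # hit: mark as most recently used
            return self.pool[block_id]
        if len(self.pool) >= self.capacity:  # no free frame: evict the LRU block
            victim, (data, dirty) = self.pool.popitem(last=False)
            if dirty:
                self.disk[victim] = data     # write back only modified blocks
        self.pool[block_id] = [self.disk[block_id], False]
        self.disk_reads += 1
        return self.pool[block_id]

    def read(self, block_id):
        return self._fetch(block_id)[0]

    def write(self, block_id, data):
        frame = self._fetch(block_id)
        frame[0] = data
        frame[1] = True  # mark dirty so eviction writes it back

disk = {1: "a", 2: "b", 3: "c"}
bm = BufferManager(disk, capacity=2)
bm.read(1); bm.write(2, "B"); bm.read(1); bm.read(3)  # reading 3 evicts block 2
print(disk[2])  # B
```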
We can have checkpoints at multiple stages so as to save the contents of the database periodically.
A state of active database in the volatile memory can be periodically dumped onto a stable storage,
which may also contain logs and active transactions and buffer blocks.
<dump> can be marked on a log file whenever the database contents are dumped from volatile memory
to stable storage.
Recovery
When the system recovers from a failure, it can restore the latest dump.
As with checkpoints, it can maintain a redo-list and an undo-list.
It can recover the system by consulting undo-redo lists to restore the state of all transactions up to the
last checkpoint.
ARIES Algorithm:
Algorithm for Recovery and Isolation Exploiting Semantics (ARIES) is based on the Write Ahead Log (WAL)
protocol: every update operation writes a log record, and the log record reaches stable storage before the updated
data does. Recovery in ARIES proceeds in three phases:
1. Analysis:
The recovery subsystem determines the earliest log record from which the next pass must start. It also scans
the log forward from the checkpoint record to construct a snapshot of what the system looked like at the
instant of the crash.
2. Redo:
Starting at the earliest LSN, the log is read forward and each update redone.
3. Undo:
The log is scanned backward and updates corresponding to loser transactions are undone.
Remote Backup:
Remote backup provides a sense of security in case the primary location where the database is located gets
destroyed. Remote backup can be offline or real-time or online. In case it is offline, it is maintained manually.
Online backup systems are more real-time and lifesavers for database administrators and investors. An online
backup system is a mechanism where every bit of the real-time data is backed up simultaneously at two distant
places. One of them is directly connected to the system and the other one is kept at a remote place as backup.
As soon as the primary database storage fails, the backup system senses the failure and switches the user system
to the remote storage. Sometimes this is so instant that the users can’t even realize a failure.
File – A file is a named collection of related information that is recorded on secondary storage such as magnetic
disks, magnetic tapes and optical disks.
Concurrency Control
Lock-Based Protocols
Timestamp-Based Protocols
Validation-Based Protocols
Multiple Granularity
Multiversion Schemes
Deadlock Handling
Lock-Based Protocols
A lock is a mechanism to control concurrent access to a data item
Data items can be locked in two modes :
o exclusive (X) mode. Data item can be both read as well as written. X-lock is requested using lock-X
instruction.
o shared (S) mode. Data item can only be read. S-lock is requested using lock-S instruction.
Lock requests are made to the concurrency-control manager. A transaction can proceed only after the request is
granted.
Lock-compatibility matrix
A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on
the item by other transactions
Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive lock on the
item, no other transaction may hold any lock on the item.
If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other
transactions have been released. The lock is then granted.
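The compatibility rule can be sketched as a tiny lock table. This is a hypothetical illustration, not a real concurrency-control manager. `COMPATIBLE` encodes the S/X compatibility matrix. Instead of keeping a wait queue, `request` simply returns False when the caller would have to wait.

```python
# Lock-compatibility matrix: (requested mode, mode held by another transaction)
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

class LockTable:
    def __init__(self):
        self.held = {}  # item -> {transaction: mode}

    def request(self, txn, item, mode):
        """Grant the lock and return True, or return False if txn must wait."""
        holders = self.held.setdefault(item, {})
        others = [m for t, m in holders.items() if t != txn]
        if not all(COMPATIBLE[(mode, m)] for m in others):
            return False
        if holders.get(txn) != "X":  # never downgrade an X lock to S
            holders[txn] = mode
        return True

    def release(self, txn, item):
        self.held.get(item, {}).pop(txn, None)

locks = LockTable()
print(locks.request("T1", "A", "S"), locks.request("T2", "A", "S"))  # True True
print(locks.request("T3", "A", "X"))                                 # False
```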
Example of a transaction performing locking:
T2: lock-S(A); read(A); unlock(A); lock-S(B); read(B); unlock(B); display(A+B)
Locking as above is not sufficient to guarantee serializability: if A and B get updated in between the read of
A and the read of B, the displayed sum would be wrong.
A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking
protocols restrict the set of possible schedules.
Neither T3 nor T4 can make progress: executing lock-S(B) causes T4 to wait for T3 to release its lock on
B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A.
Such a situation is called a deadlock.
To handle a deadlock, one of T3 or T4 must be rolled back and its locks released.
The Two-Phase Locking Protocol
One protocol that ensures serializability is the two-phase locking protocol. This protocol requires that each
transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as needed. Once the transaction
releases a lock, it enters the shrinking phase, and it can issue no more lock requests.
For example, transactions T3 and T4 are two phase. On the other hand, transactions T1 and T2 are not two
phase. Note that the unlock instructions do not need to appear at the end of the transaction. For example, in the
case of transaction T3, we could move the unlock(B) instruction to just after the lock-X(A) instruction, and still
retain the two-phase locking property.
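Whether a single transaction obeys two-phase locking, and where its lock point is, can be checked by one pass over its operations. A minimal Python sketch, assuming each operation is written as an (action, item) pair such as ("lock-S", "A") or ("unlock", "A"); `check_two_phase` is a hypothetical name, and the lock point is reported as a 1-based step within the transaction.

```python
def check_two_phase(ops):
    """Return (is_two_phase, lock_point) for one transaction's operations.

    The lock point is the 1-based position of the last lock request.
    """
    shrinking = False
    lock_point = None
    for step, (action, _item) in enumerate(ops, start=1):
        if action.startswith("lock"):
            if shrinking:
                return False, None  # a lock after an unlock breaks 2PL
            lock_point = step
        elif action == "unlock":
            shrinking = True        # the first unlock starts the shrinking phase
    return True, lock_point

t2 = [("lock-S", "A"), ("read", "A"), ("unlock", "A"),
      ("lock-S", "B"), ("read", "B"), ("unlock", "B")]
t2_2pl = [("lock-S", "A"), ("read", "A"), ("lock-S", "B"),
          ("read", "B"), ("unlock", "A"), ("unlock", "B")]
print(check_two_phase(t2))      # (False, None)
print(check_two_phase(t2_2pl))  # (True, 3)
```

The first list is transaction T2 from the locking example above. It is not two-phase, because lock-S(B) comes after unlock(A).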
Cascading rollbacks can be avoided by a modification of two-phase locking called the strict two-phase locking
protocol. This protocol requires not only that locking be two phase, but also that all exclusive-mode locks taken
by a transaction be held until that transaction commits. This requirement ensures that any data written by an
uncommitted transaction are locked in exclusive mode until the transaction commits, preventing any other
transaction from reading the data.
Timestamp-Based Protocols
Timestamps:
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by TS(Ti). This
timestamp is assigned by the database system before the transaction Ti starts execution. If a transaction Ti has
been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then TS(Ti) < TS(Tj). There are
two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction’s timestamp is equal to the value of
the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a transaction’s
timestamp is equal to the value of the counter when the transaction enters the system.
The timestamps of the transactions determine the serializability order. Thus, if
TS(Ti) < TS(Tj), then the system must ensure that the produced schedule is equivalent to a serial schedule in
which transaction Ti appears before transaction Tj.
To implement this scheme, we associate with each data item Q two timestamp values:
• W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q) successfully.
• R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q) successfully.
These timestamps are updated whenever a new read(Q) or write(Q) instruction is executed.
DEADLOCK HANDLING:
A system is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another
transaction in the set.
Deadlock prevention protocols ensure that the system will never enter into a deadlock state. Some prevention
strategies :
o Require that each transaction lock all its data items before it begins execution (predeclaration).
o Impose a partial ordering on all data items and require that a transaction lock data items only in the
order specified by the partial order (graph-based protocol).
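The ordering strategy can be illustrated with Python's `threading.Lock`. The sketch rests on assumptions: the lock table `locks`, the `balance` dict and the helper names are hypothetical, and it uses a total (alphabetical) order, which is a special case of the partial ordering described above. Because each transaction names all its items before it starts, it also follows the predeclaration strategy. Two threads name the items in opposite orders, yet both acquire A before B, so neither can hold one lock while waiting for the other.

```python
import threading

# One lock per data item (hypothetical items A and B).
locks = {"A": threading.Lock(), "B": threading.Lock()}
balance = {"A": 0, "B": 0}

def run_transaction(items, work):
    ordered = sorted(set(items))  # lock items in one global (alphabetical) order
    for item in ordered:
        locks[item].acquire()
    try:
        work()
    finally:
        for item in reversed(ordered):
            locks[item].release()

def transfer():
    balance["A"] -= 1
    balance["B"] += 1

def worker(items, n):
    for _ in range(n):
        run_transaction(items, transfer)

threads = [threading.Thread(target=worker, args=(["A", "B"], 1000)),
           threading.Thread(target=worker, args=(["B", "A"], 1000))]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
print(balance)  # {'A': -2000, 'B': 2000}
```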
DEADLOCK DETECTION:
Deadlocks can be described as a wait-for graph, which consists of a pair G = (V, E), where
o V is a set of vertices (all the transactions in the system);
o E is a set of edges; each element is an ordered pair Ti → Tj.
o If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release
a data item.
o When Ti requests a data item currently being held by Tj, the edge Ti → Tj is inserted in the wait-for
graph. This edge is removed only when Tj is no longer holding a data item needed by Ti.
The system is in a deadlock state if and only if the wait-for graph has a cycle. We must invoke a deadlock-detection
algorithm periodically to look for cycles.
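Cycle detection on the wait-for graph is an ordinary depth-first search. A minimal Python sketch, assuming the graph is stored as a dict mapping each waiting transaction to the set of transactions it waits for; `find_deadlock` is a hypothetical name. A node still on the current DFS path ("gray") that is reached again closes a cycle.

```python
def find_deadlock(wait_for):
    """Return one cycle of transactions in the wait-for graph, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def dfs(t):
        color[t] = GRAY
        path.append(t)
        for u in wait_for.get(t, ()):
            c = color.get(u, WHITE)
            if c == GRAY:
                return path[path.index(u):]  # edge back into the current path
            if c == WHITE:
                cycle = dfs(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in list(wait_for):
        if color.get(t, WHITE) == WHITE:
            cycle = dfs(t)
            if cycle:
                return cycle
    return None

# T3 waits for T4 and T4 waits for T3, as in the lock example above.
print(find_deadlock({"T3": {"T4"}, "T4": {"T3"}}))  # ['T3', 'T4']
print(find_deadlock({"T1": {"T2"}, "T2": {"T3"}}))  # None
```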
Recovery from Deadlock:
When a detection algorithm determines that a deadlock exists, the system must recover from the deadlock. The
most common solution is to roll back one or more transactions to break the
deadlock. Three actions need to be taken:
1. Selection of a victim. Given a set of deadlocked transactions, we must determine which transaction (or
transactions) to roll back to break the deadlock. We should roll back those transactions that will incur the
minimum cost. Unfortunately, the term minimum cost is not a precise one. Many factors may determine the cost
of a rollback, including
a. How long the transaction has computed, and how much longer the transaction will compute before it
completes its designated task.
b. How many data items the transaction has used.
c. How many more data items the transaction needs for it to complete.
d. How many transactions will be involved in the rollback.
2. Rollback. Once we have decided that a particular transaction must be rolled back, we must determine how
far this transaction should be rolled back. The simplest solution is a total rollback: Abort the transaction and
then restart it. However, it is more effective to roll back the transaction only as far as necessary to break the
deadlock. Such partial rollback requires the system to maintain additional information about the state of all the
running transactions. Specifically, the sequence of lock requests/grants and updates performed by the transaction
needs to be recorded. The deadlock detection mechanism should decide which locks the selected transaction
needs to release in order to break the deadlock. The selected transaction must be rolled back to the point where
it obtained the first of these locks, undoing all actions it took after that point. The recovery mechanism must
be capable of performing such partial rollbacks. Furthermore, the transactions must be capable of resuming
execution after a partial rollback.
3. Starvation. In a system where the selection of victims is based primarily on cost factors, it may happen that
the same transaction is always picked as a victim. As a result, this transaction never completes its designated
task, thus there is starvation. We must ensure that a transaction can be picked as a victim only a (small) number
of times. The most common solution is to include the number of rollbacks in the cost factor.
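Including the rollback count in the cost can be sketched as follows. The cost formula and its weight `penalty` are hypothetical. The notes only say that the number of rollbacks should be part of the cost factor, alongside factors such as how long a transaction has computed and how many data items it has used.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    work_done: int      # how long the transaction has computed so far
    items_used: int     # how many data items it has used
    rollbacks: int = 0  # how many times it has already been chosen as victim

def choose_victim(deadlocked, penalty=10):
    """Pick the cheapest transaction to roll back; past rollbacks raise the cost."""
    victim = min(deadlocked,
                 key=lambda c: c.work_done + c.items_used + penalty * c.rollbacks)
    victim.rollbacks += 1
    return victim

t3 = Candidate("T3", work_done=5, items_used=2)
t4 = Candidate("T4", work_done=8, items_used=3)
print(choose_victim([t3, t4]).name)  # T3 (cost 7 vs 11)
print(choose_victim([t3, t4]).name)  # T4 (T3 now costs 17)
```

Without the penalty term, T3 would be picked every time the same deadlock recurs, which is exactly the starvation described above.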