
DBMS


DATABASE MANAGEMENT SYSTEM
What is Data

• Data is a collection of distinct small units of information.
• It can take a variety of forms, such as text, numbers,
media, or bytes.
• It can be stored on paper, in electronic memory, etc.
• The word 'data' originates from the word 'datum', which means
'a single piece of information'; 'data' is the plural of 'datum'.
• In computing, data is information that has been translated into
a form that is efficient to move and process.
What is Database?

• A database is an organized collection of data, so that it can
be easily accessed and managed.
• The main purpose of a database is to handle a large
amount of information by storing, retrieving, and managing
data.
• Many dynamic websites on the World Wide Web
are driven by databases.
• For example, a site that checks the availability of rooms in
a hotel is a dynamic website that uses a database.
DATABASES
• Web indexes • Train timetables
• Library catalogues • Airline bookings
• Medical records • Credit card details
• Bank accounts • Student records
• Stock control • Customer histories
• Personnel systems • Stock market prices
• Product catalogues • Discussion boards
• Telephone directories • and so on…
WHY DBMS

• A DBMS defines databases (data types, structures, constraints),
constructs them (stores the data on a storage medium controlled
by the DBMS), and manipulates them (querying, updating, report
generation, etc.) for various applications.

• A Database Management System (DBMS) is software used for storing,
manipulating, and managing data so that it satisfies the users'
specific (business) requirements.
DBMS (Database Management System)

• A database management system is software used to store
and retrieve the data in a database. Oracle, MySQL, etc. are
some popular DBMS tools.
• A DBMS provides an interface to perform various operations
like creation, deletion, modification, etc.
• A DBMS allows users to create their databases as per their
requirements.
• A DBMS accepts requests from the application and provides
the specific data through the operating system.
• A DBMS contains a group of programs which act according to
the user's instructions.
• It provides security to the database.
DBMS APPLICATION

• Banking: all transactions


• Airlines: reservations, schedules
• Universities: registration, grades
• Sales: customers, products, purchases
• Manufacturing: production, inventory, orders,
supply chain
• Human resources: employee records, salaries,
tax deductions, etc…
| File Management System | Database Management System |
|---|---|
| A file system is a general, easy-to-use system for storing general files that need little security and few constraints. | A database management system is used when security and constraint requirements are high. |
| Data redundancy is higher. | Data redundancy is lower. |
| Data inconsistency is higher. | Data inconsistency is lower. |
| Centralisation is hard to achieve. | Centralisation is achieved. |
| The user locates the physical address of the files to access data. | The user is unaware of the physical address where data is stored. |
| Security is low. | Security is high. |
| Stores unstructured data as isolated data files/entities. | Stores structured data with well-defined constraints and interrelations. |
ADVANTAGES

• REDUCED DATA REDUNDANCY


• ELIMINATION OF DATA INCONSISTENCIES
• DATA SECURITY
• EASIER DATA ACCESS
• IMPROVED DECISION MAKING
• BACKUP AND RECOVERY
DISADVANTAGES

• HIGH INITIAL COST RESOURCES


• COMPLEXITY
• COST OF CONVERSION IN SOME SITUATIONS
• SECURITY ISSUES
• SLOWER RESPONSE
EXAMPLES
• Oracle
• DB2 (IBM)
• MS SQL Server
• MS Access
• Clipper
• MySQL
• dBASE
• FoxPro
etc....
Characteristics

• Way of Storing the data


• Reduced Redundancy
• Concurrent Access
• Data Consistency
• Support for Structured Query Language (SQL)
• Security
• Transaction Support
Components of Database System
• Users- People who interact with the database:
• Application Programmers.
• End Users.
• Data Administrators.
• Software- Lies between the stored data and the users:
• DBMS.
• Application Software.
• User Interface.
• Hardware- Physical device on which database resides.
• e.g.:
• Computers, Disk Drives,
• Printers, Cables etc.
• Data- numbers, characters, pictures.
• e.g.:
• NIOS, 1008, Noida, India.
Users of DBMS
• Application Programmers
The Application programmers write programs
in various programming languages to interact
with databases.
• Database Administrators
Database Admin is responsible for managing
the entire DBMS system.
• End-Users
The end users are the people who interact
with the database management system.
Database Administrator (DBA)

• Individual or a group, having centralized control of the database.


• Has a good understanding of database and coordinates all
activities of the database.
• Functions:
-Defines schema.
-Defines storage structure and access method.
-Modification of both.
-Granting user authority to access the database.
-Monitoring performance and responding to changes.
Types of DBMS

• Hierarchical databases
• Network databases
• Object oriented databases
• Relational databases
• NoSQL databases
Hierarchical Data Model (HDBMS)
• 1968-1980 was the era of the hierarchical database.
The most prominent hierarchical database was IBM's first DBMS,
called IMS (Information Management System).
• In this model, files are related in a parent/child manner.
• The data forms a tree-like structure.
Network databases
• A network database model is a database model that allows
multiple member records to be linked to the same owner, and
a member record to have more than one owner.
• In this model, files are related as owners and members,
similar to a common network.
Object oriented databases
Relational Database (Tabular)
• 1970 - Present: the era of the relational database and
database management. In 1970, the relational model was
proposed by E.F. Codd.
• The relational database model has two main terminologies called
instance and schema.
• The instance is a table with rows and columns.
• This model uses mathematical concepts such as set theory
and predicate logic.
• The first internet database application was created in
1995.
• During the era of the relational database, many more models
were introduced, such as the object-oriented model, the
object-relational model, etc.
Relational databases
Session-2
Data Model

• Defined as an abstract model that organizes data description,


data semantics, and consistency constraints of data.
• It emphasizes on what data is needed and how it should be
organized instead of what operations will be performed on
data.
• It is a conceptual representation of Data objects, the
associations between different data objects, and the rules.
Classification
[Figure: classification of data models, with boxes labelled E-R model, relational model, and storage medium]
Conceptual Model
• Also Known as High-Level Model
• Provide flexible data-structuring capabilities.
• Present a “community view”.
• It is independent of both software and hardware.
• It shows relationships among data (entity)
• Consider a database as a collection of entities (objects) of various
kinds.
• It is independent regardless of the database you will be using in the
future.
• Ensures data requirement of the users.
• Not concerned with representation, but its conceptual form.
• Three Imp terms:
• Entity: Any object, exists physically or conceptually.
• Attribute: Property or characteristic of entity.
• Relationship: Association or link b/w two entities.
• These 3 terms make Entity-Relationship Model.
Representational (Internal) Model
• Representation of data stored inside a database.
• Describes the physical structure of the database.
• It uses the concepts which are close to the end-users.
• Classification:
• Hierarchical
• Relational
• Network
• It considers a database as a collection of fixed-size records.
• It involves mapping the entities in the conceptual model to the
tables in the relational model.
• It is often referred to as the logical model.
Physical Model

• It is the physical representation of the database


• It has the lowest level of abstractions
• It shows how the data is stored; they deal with
• Run-time performance
• Storage utilization and compression
• File organization and access methods
• Data encryption
• It is managed by the operating system (OS)
• It provides concepts that describe the details of how data are
stored in the computer's memory.
Schema
• A schema is the logical structure of the database; it does not
show the data in the database.
• Classification:
• 1. Physical
• 2. Conceptual
• 3. External
Contd…

Physical Schema:
• Describes the physical storage of database.
• Not in terms of blocks or devices, but describes
organization of files, access path etc.
Conceptual Schema:
• Describes structure of whole database.
• Describes entities their relationships and constraints.
External Schema:
• Provides a user's view of data.
• Shows relevant info particular to user, hides rest of the
info.
• one or more levels.
Instances or State
• Database State:
• The actual data stored in a database at a particular
moment in time. This includes the collection of all the
data in the database.
• Also called database instance (or occurrence or
snapshot).
• The database schema changes very infrequently.
• The database state changes every time the database is
updated.

• Schema is also called intension.


• State is also called extension.
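
The difference between schema (intension) and state (extension) can be sketched with Python's built-in sqlite3 module. The table name student and its columns are illustrative assumptions, not something the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Schema (intension): defined once, changes rarely.
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def schema(connection):
    """Return the stored CREATE statements, i.e. the schema."""
    return [row[0] for row in connection.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'")]


def state(connection):
    """Return all rows currently stored, i.e. the database state."""
    return connection.execute(
        "SELECT rollno, name FROM student ORDER BY rollno").fetchall()


schema_before = schema(conn)
state_before = state(conn)  # the empty state right after creation

# State (extension): changes every time the database is updated.
conn.execute("INSERT INTO student VALUES (1, 'Abc')")
conn.execute("INSERT INTO student VALUES (2, 'Xyz')")

schema_after = schema(conn)
state_after = state(conn)
```

After the two inserts the state has changed while the schema is the same as before.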
Example of a Database Schema
Example of a database state
DBMS Architecture

• The DBMS design depends upon its architecture. The basic


client/server architecture is used to deal with a large number of
PCs, web servers, database servers and other components that
are connected with networks.

• The client/server architecture consists of many PCs and a


workstation which are connected via the network.

• DBMS architecture depends upon how users are connected to


the database to get their request done.
Types of DBMS Architecture

• 1-Tier Architecture

• In this architecture, the database is directly available to the
user. It means the user works directly on the DBMS.
• Any changes made here are applied directly to the database
itself. It doesn't provide a handy tool for end users.
• The 1-Tier architecture is used for development of the local
application, where programmers can directly communicate
with the database for the quick response.
2-Tier Architecture
• The 2-Tier architecture is the same as the basic client-server
architecture. In the two-tier architecture, applications on the
client end can directly communicate with the database at the
server side. For this interaction, APIs such as ODBC and JDBC
are used.
• The user interfaces and application programs are run on
the client-side.
• The server side is responsible to provide the
functionalities like: query processing and transaction
management.
• To communicate with the DBMS, client-side application
establishes a connection with the server side.
3-Tier Architecture

• The 3-Tier architecture contains another layer between the


client and server. In this architecture, client can't directly
communicate with the server.
• The application on the client-end interacts with an application
server which further communicates with the database system.
• End user has no idea about the existence of the database
beyond the application server. The database also has no idea
about any other user beyond the application.
• The 3-Tier architecture is used in case of large web application.
Three-Schema Architecture
• Defines DBMS schemas at three levels:
• Internal schema at the internal level to describe physical
storage structures and access paths (e.g indexes).
• Typically uses a physical data model.
• Conceptual schema at the conceptual level to describe
the structure and constraints for the whole database for a
community of users.
• Uses a conceptual or an implementation data model.
• External schemas at the external level to describe the
various user views.
• Usually uses the same data model as the conceptual
schema.
Session-3
The three-schema architecture
[Figure: the three schema levels: external, conceptual (or logical), and internal (or physical)]
Data Independence
• Logical Data Independence:
• The capacity to change the conceptual schema without
having to change the external schemas and their
associated application programs.

• Physical Data Independence:


• The capacity to change the internal schema without
having to change the conceptual schema.
• For example, the internal schema may be changed when
certain file structures are reorganized or new indexes are
created to improve database performance
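
Physical data independence can be sketched as follows: an index is added at the internal level, and the same query against the unchanged table definition returns the same rows. This is a minimal illustration using Python's built-in sqlite3 module; the book table and the index name are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, subject TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO book (subject, price) VALUES (?, ?)",
    [("database", 450), ("network", 300), ("database", 520)],
)

# The application only knows this query; it never mentions indexes or files.
QUERY = "SELECT id, price FROM book WHERE subject = 'database' ORDER BY id"

result_before = conn.execute(QUERY).fetchall()

# Internal-level change: create a new index to speed up lookups by subject.
# The table definition and the query text stay untouched.
conn.execute("CREATE INDEX idx_book_subject ON book (subject)")

result_after = conn.execute(QUERY).fetchall()
```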
Database Languages
• It can be used to read, store and update the data in the
database.
• Types of Database Language :
DDL

• Stands for Data Definition Language


• It is used to create schema, tables, indexes, constraints, etc.
in the database.
• Using the DDL statements, you can create the skeleton of
the database.
• Tasks performed:
Create, Alter, Drop, Truncate, Rename,
Comment, etc.
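
As a rough sketch of DDL in practice, the statements below create, alter, rename, and drop a table using Python's built-in sqlite3 module. The table and column names are illustrative assumptions. SQLite is used only because it ships with Python; it has no TRUNCATE or COMMENT statements, so those two tasks are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names(connection):
    """List the tables currently defined in the database."""
    return sorted(row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"))


conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT)")  # CREATE
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")                     # ALTER
conn.execute("ALTER TABLE student RENAME TO learner")                         # RENAME

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(learner)")]
tables_after_rename = table_names(conn)

conn.execute("DROP TABLE learner")                                            # DROP
tables_after_drop = table_names(conn)
```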
DML
• Stands for Data Manipulation Language
• It is used for accessing and manipulating data in a database.
• It handles user requests.
• Tasks performed:
Select, Insert, Update, Delete, Merge, Call,
Lock Table, etc.
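
The four most common DML statements can be sketched with Python's built-in sqlite3 module. The student table and its rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# INSERT: add new rows.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Abc", 19), (2, "Xyz", 21), (3, "Pqr", 20)],
)

# UPDATE: change existing rows.
conn.execute("UPDATE student SET age = age + 1 WHERE rollno = 1")

# DELETE: remove rows.
conn.execute("DELETE FROM student WHERE rollno = 3")

# SELECT: read the data back.
rows = conn.execute(
    "SELECT rollno, name, age FROM student ORDER BY rollno").fetchall()
```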
DCL

• Stands for Data Control Language


• It is used to control access to the data by giving and taking
back user privileges on database objects.
• Whether a DCL statement can be rolled back depends on the DBMS.
• Tasks performed:
Grant and Revoke, etc.
TCL
• Stands for Transaction Control Language
• It is used to make permanent, or to undo, the changes made by
DML statements.
• DML statements can be grouped into a logical transaction.
• Tasks performed:
Commit and Rollback, etc.
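
A minimal sketch of commit and rollback, using Python's built-in sqlite3 module with its default transaction handling. The account table and the transfer amounts are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()  # make the starting balances permanent


def balances(connection):
    return connection.execute(
        "SELECT balance FROM account ORDER BY acc_no").fetchall()


def transfer(connection, amount):
    """Two DML statements that belong to one logical transaction."""
    connection.execute("UPDATE account SET balance = balance - ? WHERE acc_no = 1", (amount,))
    connection.execute("UPDATE account SET balance = balance + ? WHERE acc_no = 2", (amount,))


transfer(conn, 30)
conn.rollback()               # ROLLBACK: undo both updates
after_rollback = balances(conn)

transfer(conn, 30)
conn.commit()                 # COMMIT: keep both updates
after_commit = balances(conn)
```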
DBMS Architecture Layers
• External View:
Refers to the content of the database as it is seen by
a particular user.
• Conceptual View:
Describes the structure of the whole
database for the community of users.
• Physical View:
Refers to the way data are physically stored
and processed in a database.
ER Basics: Entity-Relationship Model
• The Entity-Relationship model is a high-level conceptual data
model; its diagram is known as an E-R diagram.
• It is based on the notion of real-world entities and the
relationships between them.
• It helps you to analyze data requirements systematically to
produce a well-designed database.
E-R Diagram

• ER Diagram is a visual representation of data that describes


how data is related to each other.
• ER diagrams help to explain the logical structure of
databases.
• ER diagrams are created based on four basic concepts:

 Entities
 Attributes
 Keys
 Relationships
ENTITY
• A real-world thing that is to be represented in our database.
• An entity can be place, person, object, event or a concept,
which stores data in the database.
• An entity is made up of some 'attributes' which represent
that entity.
• 5 types of entity:
• Person: Employee, Student, Patient
• Place: Store, Building
• Object: Machine, Product, Car
• Event: Sale, Registration, Renewal
• Concept: Account, Course
• A rectangle symbol is used to represent an entity.
• A group of similar entities is called an entity set.
Attributes
• An attribute describes the property of an entity.
• E.g. student: name, rollno, age, ...
• An attribute is represented as an ellipse in an ER diagram.

Types of Attributes
| Type of attribute | Description |
|---|---|
| Simple attribute | Simple attributes can't be divided any further. Ex: Rollno, Class |
| Composite attribute | A composite attribute can be broken down. Ex: a student's full name may be further divided into first name, middle name, and last name. |
| Derived attribute | This type of attribute is not stored in the physical database; its value is derived from other attributes present in the database. Ex: Age should not be stored directly. Instead, it should be derived from the DOB. |
| Multivalued attribute | Multivalued attributes can have more than one value. Ex: A student can have more than one mobile number, email address, etc. |
| Key attribute | The key attribute is used to represent the main characteristic of an entity. It represents a primary key. The key attribute is represented by an ellipse with the text underlined. Ex: Rollno is a key attribute as it can identify any student uniquely. |
| Single-valued attribute | Single-valued attributes can take only one value for a given entity from an entity set. Ex: Age, DoB, Rollno, etc. |
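
The derived-attribute idea (age computed from a stored date of birth rather than stored itself) can be sketched in Python. The function name age_on and the dates used are illustrative assumptions.

```python
from datetime import date


def age_on(dob: date, today: date) -> int:
    """Derive the age in completed years from a stored date of birth."""
    years = today.year - dob.year
    # Subtract one year if the birthday has not yet occurred in `today`'s year.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


dob = date(2003, 3, 12)  # stored attribute
age_day_before_birthday = age_on(dob, date(2021, 3, 11))  # derived, never stored
age_on_birthday = age_on(dob, date(2021, 3, 12))
```

Because the age is recomputed from the DOB whenever it is needed, it cannot go stale the way a stored age column would.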
Session-4
Key Attributes
• Keys play an important role in the relational database.
• It is used to uniquely identify any entity or record or row of
data from the table. It is also used to establish and identify
relationships between tables.

• Types of Key Attributes


• Primary Key
• Candidate Key
• Super Key
• Alternate Key
• Foreign Key
• Composite Key
• Surrogate Key
Primary key and Candidate Key
• A primary key is the key chosen to identify one and only one
instance of an entity uniquely. An entity can contain multiple keys.
The most suitable of them becomes the primary key.
• For each entity, the primary key selection is based on
requirements and developers.
• A candidate key is an attribute or a minimal set of attributes
that can uniquely identify a tuple.
• A table can have several candidate keys. One of them is chosen as
the primary key; the remaining candidate keys are as strong as
the primary key.
• Super Key
• A super key is an attribute set that can uniquely identify a tuple. A
super key is a superset of a candidate key.
• Foreign key
• A foreign key is an attribute (or set of attributes) of a table that
points to the primary key of another table.
• Composite key
• Whenever a primary key consists of more than one attribute, it is
known as a composite key. This key is also known as a concatenated
key.
• Alternate key
• The alternate keys are the candidate keys that were not chosen as
the primary key, so the number of alternate keys is the number of
candidate keys minus one.
• Surrogate key
• A surrogate key, also called a synthetic primary key, is generated
automatically by the database when a new record is inserted into a
table, and it can be declared as the primary key of that table. It is
typically a sequential number that has no meaning outside of the
database.
Student
| Roll No. | Regd. No. | Name | Gender | E-mail | DOB |
|---|---|---|---|---|---|
| 1 | 100100 | Abc | Male | abc@gmail.com | 12-03-2003 |
| 2 | 100230 | Xyz | Female | xyz@gmail.com | 03-02-2001 |
| 3 | 100536 | Pqr | Female | pqr@gmail.com | 15-05-2002 |
| 4 | 100469 | Dhr | Male | dhr@gmail.com | 08-04-2002 |

Library
| Roll No | Book Name | Author | Issue Date | Return Date | Penalty |
|---|---|---|---|---|---|
| 1 | C++ | Abc | 2-3-2021 | 3-3-2021 | 0 |
| 2 | C++ | Ghs | 3-1-2021 | 2-3-2021 | 0 |
| 1 | Dbms | abc | 8-9-2021 | 6-10-2021 | 0 |
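
The key types can be sketched on simplified versions of the Student and Library tables, using Python's built-in sqlite3 module. The choice of Roll No. as primary key, the issue_id surrogate column, and the composite uniqueness of (roll_no, book_name) are assumptions made for this example, not something the slides specify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,      -- primary key (the chosen candidate key)
        regd_no INTEGER NOT NULL UNIQUE,  -- alternate key
        name    TEXT    NOT NULL,
        email   TEXT    NOT NULL UNIQUE   -- alternate key
    )
""")
conn.execute("""
    CREATE TABLE library (
        issue_id  INTEGER PRIMARY KEY AUTOINCREMENT,           -- surrogate key (hypothetical)
        roll_no   INTEGER NOT NULL REFERENCES student (roll_no),  -- foreign key
        book_name TEXT    NOT NULL,
        UNIQUE (roll_no, book_name)                            -- composite candidate key
    )
""")
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    (1, 100100, "Abc", "abc@gmail.com"),
    (2, 100230, "Xyz", "xyz@gmail.com"),
])
conn.executemany("INSERT INTO library (roll_no, book_name) VALUES (?, ?)",
                 [(1, "C++"), (2, "C++"), (1, "Dbms")])

# The database generated the surrogate key values itself.
issue_ids = [row[0] for row in conn.execute(
    "SELECT issue_id FROM library ORDER BY issue_id")]


def rejected(sql, params):
    """Return True if the DBMS refuses the statement because a key is violated."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


dup_primary = rejected("INSERT INTO student VALUES (?, ?, ?, ?)",
                       (1, 100999, "Dup", "dup@example.com"))
dup_alternate = rejected("INSERT INTO student VALUES (?, ?, ?, ?)",
                         (3, 100100, "Dup", "dup2@example.com"))
dup_composite = rejected("INSERT INTO library (roll_no, book_name) VALUES (?, ?)",
                         (1, "C++"))
```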


Department
Relationship
• It represents the association between entities.
• A relationship is represented by diamond shape in ER diagram
• Example:
• Employee works at a Department,
• A student enrolls in a course.
• A set of relationships of similar type is called a relationship
set.

[Figure: entity student linked to entity Course through the relationship enrolls]


Types of Relations

One-to-one − One entity from entity set A can be associated
with at most one entity of entity set B and vice versa.

Example: Patient : Bed (1:1)

One-to-many − One entity from entity set A can be associated
with more than one entity of entity set B; however, an entity
from entity set B can be associated with at most one entity
of entity set A.

Example: Father : Children (1:M)

Many-to-one − An entity from entity set A can be associated
with at most one entity of entity set B; however, an
entity from entity set B can be associated with more than one
entity from entity set A.

Example: Student : Slot (M:1)

Many-to-many − One entity from A can be associated with more
than one entity from B and vice versa.

Example: Teacher : Student (M:N)
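
The four cases can be told apart mechanically. The sketch below classifies a set of (A, B) pairs; the function name and the sample pairs are illustrative assumptions. Note that it only classifies the pairs it is given, whereas a cardinality constraint in a design states what is allowed in general.

```python
from collections import defaultdict


def cardinality(pairs):
    """Classify a relationship set, given as (a, b) pairs, as 1:1, 1:M, M:1 or M:N."""
    b_per_a = defaultdict(set)
    a_per_b = defaultdict(set)
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)
    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())   # some A links to several B
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())  # some B links to several A
    if a_has_many and b_has_many:
        return "M:N"
    if a_has_many:
        return "1:M"
    if b_has_many:
        return "M:1"
    return "1:1"


patient_bed = cardinality([("p1", "b1"), ("p2", "b2")])
father_children = cardinality([("f1", "c1"), ("f1", "c2")])
student_slot = cardinality([("s1", "slotA"), ("s2", "slotA")])
teacher_student = cardinality([("t1", "s1"), ("t1", "s2"), ("t2", "s1")])
```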

Weak Entities
• A weak entity is a type of entity which doesn't have a key
attribute of its own.
| Strong Entity | Weak Entity |
|---|---|
| A strong entity always has a primary key. | A weak entity has a foreign key referencing the primary key of a strong entity. |
| A strong entity is independent of other entities. | A weak entity is dependent on a strong entity. |
| A strong entity is represented by a single rectangle. | A weak entity is represented by a double rectangle. |
| A relationship between two strong entities is represented by a single diamond. | A relationship between a strong and a weak entity is represented by a double diamond. |
| A strong entity may or may not participate in entity relationships. | A weak entity always participates in entity relationships. |
DATA SEMANTICS

• It is a conceptual data model in which semantic information is


included. This means that the model describes the meaning of
its instances.

• This provides the data combinations and business questions


that are required by users and applications
DATA CONSTRAINTS
• Constraints are the rules enforced on the data columns of a
table.
• This ensures the accuracy and reliability of the data in
the database (integrity).
• TYPES :
NOT NULL
UNIQUE
DEFAULT
CHECK
Key Constraints – PRIMARY KEY, FOREIGN KEY
Domain constraints
Mapping constraints
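
The column-level constraints listed above can be tried out with Python's built-in sqlite3 module. This is only a sketch; the student table, its columns, and the default city are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        rollno INTEGER PRIMARY KEY,        -- key constraint
        name   TEXT    NOT NULL,           -- NOT NULL
        email  TEXT    UNIQUE,             -- UNIQUE
        city   TEXT    DEFAULT 'Noida',    -- DEFAULT
        age    INTEGER CHECK (age >= 0)    -- CHECK
    )
""")
conn.execute(
    "INSERT INTO student (rollno, name, email, age) VALUES (1, 'Abc', 'abc@gmail.com', 18)")

# No city was supplied, so the DEFAULT value is stored.
default_city = conn.execute(
    "SELECT city FROM student WHERE rollno = 1").fetchone()[0]


def violates(sql):
    """Return True if the DBMS rejects the statement as a constraint violation."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


null_name = violates("INSERT INTO student (rollno, name) VALUES (2, NULL)")
dup_email = violates(
    "INSERT INTO student (rollno, name, email) VALUES (3, 'Xyz', 'abc@gmail.com')")
negative_age = violates("INSERT INTO student (rollno, name, age) VALUES (4, 'Pqr', -5)")
```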
Relationships: constraints
• The degree of a relationship type
• binary (connects 2 entity types)
• unary/recursive (connects 1 entity type with itself)
• complex (connects 3 or more entity types)
• ternary (connects 3)

• Multiplicity combines two relationship constraints:
• Relationship constraints - cardinality
• one to one (1:1)
• one to many (1:m)
• many to many (m:n)
• Relationship constraints - participation
• full/mandatory
• or partial/optional
Relationships: Degree
• Binary relationship: Entity1 HasLinkWith Entity2
• Recursive (unary) relationship, example: Staff Supervises Staff
(roles: Supervisor and Supervisee)
• Complex relationship, here ternary: TernaryRelationship connects
Entity1, Entity2 and Entity3
Relationships: Multiplicity
Label lines to show cardinality and participation:
• 0..1 “zero or one” (optional)
• 0..* “zero or more” (optional)
• 1..1 “one” (mandatory)
• 1..4 “between 1 and 4” (mandatory)
• 1..* “one or more” (mandatory)

[Figure: Entity1 (1..1) HasLinkWith (0..*) Entity2]

Entity1 has a 1:m relationship with Entity2;
participation for Entity1 is mandatory, for Entity2 optional.
Relationships example
[Figure: Manager (1..1) Manages (0..3) Department; relationship attributes: responsibility [1..*], dateAllocated]
• Each department is managed by ONE manager.
• Each manager manages UP TO 3 departments (but need not manage
any department).
FEATURES OF ER-diagram
Session-5
Generalization
• Generalization is like a bottom-up approach in which two or
more entities of lower level combine to form a higher level
entity if they have some attributes in common.
• Entities are combined to form a more generalized entity, i.e.,
subclasses are combined to make a superclass.

IS A
Specialization
• This is a top-down approach, and it is opposite to
Generalization.
• This is used to identify the subset of an entity set that shares
some distinguishing characteristics.

IS A
Aggregation
• Where the relation between two entities is treated as a single
entity.
• Where relationship with its corresponding entities is
aggregated into a higher level entity.

ENQUIRE
E-R Diagram Steps
Here we are going to design an Entity Relationship (ER) model
for a college database . Say we have the following statements.
1. A college contains many departments
2. Each department can offer any number of courses
3. Many instructors can work in a department
4. An instructor can work only in one department
5. For each department there is a Head
6. An instructor can be head of only one department
7. Each instructor can take any number of courses
8. A course can be taken by only one instructor
9. A student can enroll for any number of courses
10. Each course can have any number of students
Step 1 : Identify the Entities

What are the entities here?


From the statements given, the entities are
1. Department
2. Course
3. Instructor
4. Student

Step 2 : Identify the relationships
1. One department offers many courses. But one particular course
can be offered by only one department. hence the cardinality
between department and course is One to Many (1:N)
2. One department has multiple instructors . But instructor belongs
to only one department. Hence the cardinality between
department and instructor is One to Many (1:N)
3. One department has only one head and one head can be the
head of only one department. Hence the cardinality is one to
one. (1:1)
4. One course can be enrolled by many students and one student
can enroll for many courses. Hence the cardinality between
course and student is Many to Many (M:N)
5. One course is taught by only one instructor. But one instructor
teaches many courses. Hence the cardinality between course and
instructor is Many to One (N :1)
Step 3: Identify the key attributes
• "Department_Name" can identify a department uniquely.
Hence Department_Name is the key attribute for the Entity
"Department".
• Course_ID is the key attribute for "Course" Entity.
• Student_ID is the key attribute for "Student" Entity.
• Instructor_ID is the key attribute for "Instructor" Entity.

Step 4: Identify other relevant attributes


• For the department entity, other attribute is location
• For course entity, other attributes are course_name,duration
• For instructor entity, other attributes are first_name,
last_name, phone
• For student entity, first_name, last_name, phone
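
The entities, key attributes, and cardinalities identified in steps 1 to 4 could be mapped to tables roughly as follows. This is one possible sketch using Python's built-in sqlite3 module, not the only valid mapping; the enrollment table, the head_id column, and all sample rows are assumptions introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    department_name TEXT PRIMARY KEY,
    location        TEXT,
    -- 1:1 head: UNIQUE means an instructor heads at most one department
    head_id         INTEGER UNIQUE REFERENCES instructor (instructor_id)
);
CREATE TABLE instructor (
    instructor_id   INTEGER PRIMARY KEY,
    first_name      TEXT,
    last_name       TEXT,
    phone           TEXT,
    -- 1:N department to instructor: each instructor works in one department
    department_name TEXT NOT NULL REFERENCES department (department_name)
);
CREATE TABLE course (
    course_id       TEXT PRIMARY KEY,
    course_name     TEXT,
    duration        TEXT,
    -- 1:N department to course, and N:1 course to instructor
    department_name TEXT    NOT NULL REFERENCES department (department_name),
    instructor_id   INTEGER NOT NULL REFERENCES instructor (instructor_id)
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT,
    phone      TEXT
);
-- M:N course to student needs its own table with a composite key
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student (student_id),
    course_id  TEXT    NOT NULL REFERENCES course (course_id),
    PRIMARY KEY (student_id, course_id)
);
""")

# Made-up sample rows, inserted in an order that satisfies the foreign keys.
conn.execute("INSERT INTO department (department_name, location) VALUES ('CSE', 'Block A')")
conn.execute("INSERT INTO instructor VALUES (1, 'Asha', 'Rao', '555-0100', 'CSE')")
conn.execute("UPDATE department SET head_id = 1 WHERE department_name = 'CSE'")
conn.executemany("INSERT INTO course VALUES (?, ?, ?, 'CSE', 1)",
                 [("C1", "DBMS", "6 months"), ("C2", "Networks", "6 months")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)",
                 [(1, "Abc", "K", "555-0101"), (2, "Xyz", "L", "555-0102")])
conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                 [(1, "C1"), (1, "C2"), (2, "C1")])

courses_of_student_1 = [row[0] for row in conn.execute(
    "SELECT course_id FROM enrollment WHERE student_id = 1 ORDER BY course_id")]
students_in_c1 = [row[0] for row in conn.execute(
    "SELECT student_id FROM enrollment WHERE course_id = 'C1' ORDER BY student_id")]
```

The one-to-many relationships become foreign keys on the "many" side, the one-to-one head relationship becomes a unique foreign key, and the many-to-many enrollment becomes a separate table.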
Step 5: Draw complete ER diagram
By connecting all these details, we can now draw ER diagram as
given below.

Session-6
BANKING SYSTEM
• ER diagram of Bank has the following description :

• Banks have customers.


• Banks are identified by a name, code, address of main office.
• Banks have branches.
• Branches are identified by a branch_no., branch_name, address.
• Customers are identified by name, cust-id, phone number,
address.
• Customer can have one or more accounts.
• Accounts are identified by account_no., acc_type, balance.
• Customer can avail loans.
• Loans are identified by loan_id, loan_type and amount.
• Account and loans are related to bank’s branch.
Library Management System
• The system keeps track of the staff with a single point
authentication system comprising login Id and password.
• Staff maintains the book catalog with its ISBN, Book title, price(in
INR), category(novel, general, story), edition, author Number
and details.
• A publisher has publisher Id, Year when the book was published,
and name of the book.
• Readers are registered with their user_id, email, name (first
name, last name), Phone no (multiple entries allowed),
communication address. The staff keeps track of readers.
• Readers can return/reserve books that stamps with issue date
and return date. If not returned within the prescribed time
period, it may have a due date too.
• Staff also generate reports that has readers id, registration no of
report, book no and return/issue info.
Relational DBMS
The relational model represents data as tables with columns and rows. Each row
is known as a tuple. Each column of a table has a name, called an attribute.
Terminologies:
• Tuple: Each row of a relation is known as tuple. e.g.; STUDENT relation
given below has 4 tuples.
• Attribute: It contains the name of a column in a particular table. Each
attribute Ai must have a domain, dom (Ai)
• Domain: The possible values an attribute can take in a relation is called its
domain. For Example, domain of STUD_AGE can be from 18 to 40.
• Relational instance: In the relational database system, the relational
instance is represented by a finite set of tuples. Relation instances do not
have duplicate tuples.
• Relational schema: A relational schema contains the name of the relation
and name of all columns or attributes.
• Relational key: A relational key is one or more attributes of a row that
can identify the row in the relation uniquely.
EF Codd’s Rules
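• Rule 0: Foundation Rule
• A system that qualifies as a relational DBMS must be able to manage
the database entirely through its relational capabilities.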
• Rule 1: Information Rule
• The data stored in a database, may it be user data or metadata, must
be a value of some table cell.
• Rule 2: Guaranteed Access Rule
• Every single data element (value) is guaranteed to be accessible
logically with a combination of table-name, primary-key (row value),
and attribute-name (column value).
• Rule 3: Systematic Treatment of NULL Values
• The NULL values in a database must be given a systematic and uniform
treatment. This is a very important rule because a NULL can be
interpreted as one of the following − data is missing, data is not known, or
data is not applicable.
• Rule 4: Active Online Catalog
• The structure description of the entire database must be stored in an
online catalog, known as data dictionary, which can be accessed by
authorized users.
• Rule 5: Comprehensive Data Sub-Language Rule
• A database can only be accessed using a language having linear
syntax that supports data definition, data manipulation, and
transaction management operations.
• Rule 6: View Updating Rule
• All the views of a database, which can theoretically be updated,
must also be updatable by the system.
• Rule 7: High-Level Insert, Update, and Delete Rule
• A database must support high-level insertion, updation, and
deletion.
• Rule 8: Physical Data Independence
• The data stored in a database must be independent of the
applications that access the database. Any change in the physical
structure of a database must not have any impact on how the data is
being accessed by external applications.
• Rule 9: Logical Data Independence
• The logical data in a database must be independent of its user’s view
(application). Any change in logical data must not affect the
applications using it.
• Rule 10: Integrity Independence
• All its integrity constraints can be independently modified without
the need of any change in the application. This rule makes a database
independent of the front-end application and its interface.
• Rule 11: Distribution Independence
• The end-user must not be able to see that the data is distributed over
various locations.
• Rule 12: Non-Subversion Rule
• If a system has an interface that provides access to low-level records,
then the interface must not be able to subvert the system and bypass
security and integrity constraints.
Constraints
Every relation has some conditions that must hold for it to be a
valid relation. These conditions are called Relational Integrity
Constraints. There are three main integrity constraints −
• Key constraints
• Domain constraints
• Referential integrity constraints

Key Constraints or Entity Constraints


• The minimal subset of attributes is called key for a relation
which can identify a tuple uniquely.
• Key constraints force that −
• in a relation with a key attribute, no two tuples can have
identical values for key attributes.
• a key attribute can not have NULL values.
Domain Constraints
• Attributes have specific values in real-world scenarios.
• For example, age can only be a positive integer. The same
constraints are applied to the attributes of a relation.
• Every attribute is bound to have a specific range of values.
• For example, age cannot be less than zero and telephone
numbers cannot contain a character outside the digits 0-9.

Referential integrity Constraints


• Referential integrity constraints work on the concept of
foreign keys. A foreign key is an attribute of a relation that
refers to a key attribute of another (or the same) relation.
• The referential integrity constraint states that if a relation
refers to a key attribute of a different or the same relation,
then that key value must exist.
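
Referential integrity can be observed directly with Python's built-in sqlite3 module. In this sketch the department and employee tables and their rows are assumptions; note that SQLite enforces foreign keys only after the PRAGMA shown below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT    NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department (dept_id)  -- foreign key
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Abc', 10)")  # dept 10 exists: accepted


def rejected(sql):
    """Return True if the DBMS refuses the statement to protect referential integrity."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Referring to a department that does not exist is refused.
orphan_insert = rejected("INSERT INTO employee VALUES (2, 'Xyz', 99)")
# Deleting a department that is still referred to is refused as well.
parent_delete = rejected("DELETE FROM department WHERE dept_id = 10")

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```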
REFERENCE TABLE
Session-7
Relational Algebra
• Relational algebra is a procedural query language, which takes
instances of relations as input and yields instances of relations as
output.
• It uses operators to perform queries. An operator can be
either unary or binary.
• The fundamental operations of relational algebra are as follows
• Select (σ)
• Project (∏)
• Union (∪)
• Intersection(∩)
• Set difference (−)
• Cartesian product (×)
• Rename (ρ)
• Join
Select Operation (σ)
It selects tuples that satisfy the given predicate from a relation.
• Notation − σp(R)
• Where σ is the selection operator, p is the selection predicate and R
stands for the relation. p is a propositional logic formula which may use
connectors like and, or, and not. Its terms may use relational
operators like − =, ≠, ≥, <, >, ≤.
• For example −
σsubject = "database"(Books)
• Output − Selects tuples from Books where subject is 'database'.
σsubject = "database" and price = "450"(Books)
• Output − Selects tuples from Books where subject is 'database'
and price is 450.
Project Operation (∏)

It projects the listed column(s) of a relation and discards the rest.

Notation − ∏A1, A2, ..., An (R)
Where A1, A2, ..., An are attribute names of relation R.
Duplicate rows are automatically eliminated, as a relation is a set.
For example −
∏subject, author (Books)

Output − Projects the columns named subject and author from the
relation Books.
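The select and project operations can be sketched in a few lines of Python. This is only an illustration: the rows of `books` are made-up sample data, and the helper names `select` and `project` are hypothetical, not part of any DBMS.

```python
# A relation is modelled as a list of dicts; the rows are invented sample data.
books = [
    {"title": "T1", "subject": "database", "author": "Ann", "price": 450},
    {"title": "T2", "subject": "database", "author": "Ann", "price": 300},
    {"title": "T3", "subject": "network", "author": "Bob", "price": 450},
]

def select(relation, predicate):
    """sigma_predicate(relation): keep the tuples for which predicate is true."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """pi_attributes(relation): keep the named columns and drop duplicate rows."""
    result = []
    for row in relation:
        reduced = tuple(row[a] for a in attributes)
        if reduced not in result:      # a relation is a set, so no duplicates
            result.append(reduced)
    return result

# sigma subject = "database" and price = 450 (Books)
db_450 = select(books, lambda r: r["subject"] == "database" and r["price"] == 450)

# pi subject, author (Books)
subject_author = project(books, ["subject", "author"])
```

Note that `project` returns two rows for three input rows, because the two "database" books share the same (subject, author) pair.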
Union Operation (∪)
• It includes all tuples that are in relation A, in B, or in both, and
eliminates duplicate tuples. So, set A UNION set B would be
expressed as:
• Notation − A ∪ B
• For a union operation to be valid, the following conditions
must hold −
• A and B must have the same number of attributes.
• Attribute domains need to be compatible.
• Duplicate tuples are automatically removed.
• Example
∏ author (Books) ∪ ∏ author (Articles)
• Output − Projects the names of the authors who have either
written a book or an article or both.
Intersection(∩)
• An intersection is defined by the symbol ∩
• Notation:- A ∩ B
• Defines a relation consisting of the set of all tuples that are in
both A and B. A and B must be union-compatible.
• Example − Find all the authors who have written both books and
articles

∏ author (Books) ∩ ∏ author (Articles)

• Output − Projects the names of the authors who have
written both a book and an article.
Set Difference (−)

• The result of the set difference operation is the set of tuples that are
present in one relation but not in the second relation.
• Notation :- R − S
• Finds all the tuples that are present in R but not in S.

∏ author (Books) − ∏ author (Articles)


• Output − Provides the name of authors who have written books but
not articles.
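Since relations are sets of tuples, union, intersection and set difference map directly onto Python's set operators. A minimal sketch, assuming two union-compatible single-column relations with made-up author names:

```python
# Each relation is a set of 1-tuples (author,); the names are invented.
book_authors = {("Ann",), ("Bob",), ("Cid",)}      # pi author (Books)
article_authors = {("Bob",), ("Dee",)}             # pi author (Articles)

either = book_authors | article_authors       # union: book or article or both
both = book_authors & article_authors         # intersection: book and article
books_only = book_authors - article_authors   # difference: books but not articles
```

Duplicates disappear automatically: ("Bob",) occurs in both inputs but only once in `either`.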
Cartesian Product (×)
• Combines information of two different relations into one.
• Notation − R × S.
• If R has m tuples and S has n tuples, the cross product of R and S will
have m × n tuples.
Rename Operation (ρ)
• The results of relational algebra are also relations but
without any name. The rename operation allows us to
rename the output relation. 'rename' operation is denoted
with small Greek letter rho ρ.
• Notation − ρ (X,E)
• Where the result of expression E is saved with name of X.
• Example:-
ρ(STUDENT1, STUDENT)
• Output:- STUDENT table is renamed as STUDENT1
Join Operations
• Join is a combination of a Cartesian product followed by a
selection process.
• A Join operation pairs two tuples from different relations, if
and only if a given join condition is satisfied.
• Types of Join Operation
• Conditional Join(⋈c)
• Equijoin(⋈)
• Natural Join(⋈)
• Left Outer Join(⟕)
• Right Outer Join(⟖)
• Full Outer Join(⟗)

Conditional Join(⋈c)
• Conditional Join is used when you want to join two or more
relations based on some condition.
• Example: Select students whose ROLL_NO is greater than
EMP_NO of employees

STUDENT ⋈c STUDENT.ROLL_NO>EMPLOYEE.EMP_NO EMPLOYEE

In terms of basic operators (cross product and selection) :

σ (STUDENT.ROLL_NO>EMPLOYEE.EMP_NO)(STUDENT×EMPLOYEE)
Equijoin(⋈)
• Equijoin is a special case of conditional join where the join
condition uses only equality between pairs of attributes.
• As the values of the two compared attributes are equal in every tuple
of the result, one of the two columns is redundant.

Example: Select students whose ROLL_NO is equal to EMP_NO
of employees

STUDENT ⋈ STUDENT.ROLL_NO=EMPLOYEE.EMP_NO EMPLOYEE
Natural Join(⋈)
• It is a special case of equijoin in which the equality condition
holds on all attributes which have the same name in relations R
and S (the relations on which the join operation is applied).
• While applying natural join on two relations, there is no
need to write the equality condition explicitly.
• Example: Select students whose ROLL_NO is equal to
ROLL_NO of STUDENT_SPORTS as:

STUDENT ⋈ STUDENT_SPORTS
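The idea that a join is a Cartesian product followed by a selection can be sketched as follows. The rows are made-up sample data and the helper names `theta_join` and `natural_join` are hypothetical; the attribute names follow the STUDENT, EMPLOYEE and STUDENT_SPORTS examples.

```python
from itertools import product

# Invented sample rows for the relations used in the examples.
student = [{"ROLL_NO": 1, "NAME": "S1"}, {"ROLL_NO": 2, "NAME": "S2"},
           {"ROLL_NO": 3, "NAME": "S3"}]
employee = [{"EMP_NO": 2, "ENAME": "E2"}, {"EMP_NO": 5, "ENAME": "E5"}]
student_sports = [{"ROLL_NO": 1, "SPORT": "Chess"}, {"ROLL_NO": 3, "SPORT": "Tennis"}]

def theta_join(r, s, condition):
    """Cartesian product of r and s, followed by a selection on `condition`."""
    return [{**a, **b} for a, b in product(r, s) if condition(a, b)]

def natural_join(r, s):
    """Equijoin on every attribute name shared by r and s; shared columns appear once."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**a, **b} for a, b in product(r, s)
            if all(a[c] == b[c] for c in common)]

# Conditional join: STUDENT.ROLL_NO > EMPLOYEE.EMP_NO
greater = theta_join(student, employee, lambda a, b: a["ROLL_NO"] > b["EMP_NO"])
# Equijoin: STUDENT.ROLL_NO = EMPLOYEE.EMP_NO
equal = theta_join(student, employee, lambda a, b: a["ROLL_NO"] == b["EMP_NO"])
# Natural join on the shared attribute ROLL_NO
sports = natural_join(student, student_sports)
```

In `equal` both ROLL_NO and EMP_NO are present in each result row, while in `sports` the shared ROLL_NO column appears only once.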
Analysis of Inner join

• Conditional Join (also called Theta Join), Equijoin, and Natural Join are called inner joins.
• An inner join includes only those tuples that satisfy the join
condition; the rest are discarded from the resulting relation.

Outer Joins

• We need to use outer joins to include all the tuples from the
participating relations in the resulting relation.
• There are three kinds of outer joins −
• left outer join,
• right outer join,
• full outer join.

Left Outer Join (R ⟕ S)
• All the tuples from the Left relation, R, are included in the
resulting relation.
• If there are tuples in R without any matching tuple in the
Right relation S, then the S-attributes of the resulting relation
are made NULL.

Right Outer Join: ( R Right Outer Join S )
• All the tuples from the Right relation, S, are included in the
resulting relation.
• If there are tuples in S without any matching tuple in R, then
the R-attributes of resulting relation are made NULL.
Course (Left)            HOD (Right)
A     B                  C     D
100   Database           100   Alex
101   Mechanics          102   Maya
102   Electronics        104   Mira

Full Outer Join: ( R Full Outer Join S)
• All the tuples from both participating relations are included in
the resulting relation.
• If a tuple of either relation has no matching tuple in the other
relation, the attributes of the other relation are made NULL.
Course (Left)            HOD (Right)
A     B                  C     D
100   Database           100   Alex
101   Mechanics          102   Maya
102   Electronics        104   Mira
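The three outer joins can be tried on the Course and HOD tables above. The sketch below assumes the join condition is A = C (the slides do not state it explicitly) and uses Python's None for NULL; the function names are hypothetical.

```python
course = [(100, "Database"), (101, "Mechanics"), (102, "Electronics")]  # (A, B)
hod = [(100, "Alex"), (102, "Maya"), (104, "Mira")]                      # (C, D)

NULL = None  # stands in for the relational NULL

def left_outer_join(left, right):
    """Keep every left tuple; pad with NULLs when no right tuple has C = A."""
    rows = []
    for a, b in left:
        matches = [(c, d) for c, d in right if c == a]
        if matches:
            rows.extend((a, b, c, d) for c, d in matches)
        else:
            rows.append((a, b, NULL, NULL))
    return rows

def right_outer_join(left, right):
    """Keep every right tuple; pad with NULLs when no left tuple has A = C."""
    rows = []
    for c, d in right:
        matches = [(a, b) for a, b in left if a == c]
        if matches:
            rows.extend((a, b, c, d) for a, b in matches)
        else:
            rows.append((NULL, NULL, c, d))
    return rows

def full_outer_join(left, right):
    """Left outer join plus the unmatched right tuples (assumes A itself is never NULL)."""
    rows = left_outer_join(left, right)
    rows += [row for row in right_outer_join(left, right) if row[0] is NULL]
    return rows

left_result = left_outer_join(course, hod)
right_result = right_outer_join(course, hod)
full_result = full_outer_join(course, hod)
```

Course 101 has no HOD, so it is padded with NULLs in the left and full joins; HOD 104 has no course, so it is padded in the right and full joins.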

Relational Calculus
• Relational calculus is a non-procedural query language.
• In a non-procedural query language, the user is not concerned
with the details of how to obtain the end results.
• The relational calculus tells what to do but never explains
how to do it.
• Types of relational calculus: Tuple Relational Calculus (TRC) and Domain Relational Calculus (DRC).

Comparison Chart
BASIS FOR COMPARISON | RELATIONAL ALGEBRA | RELATIONAL CALCULUS
Basic | Relational Algebra is a procedural language. | Relational Calculus is a declarative language.
States | Relational Algebra states how to obtain the result. | Relational Calculus states what result we have to obtain.
Order | Relational Algebra describes the order in which operations have to be performed. | Relational Calculus does not specify the order of operations.
Domain | Relational Algebra is not domain dependent. | Relational Calculus can be domain dependent.
Related | It is close to a programming language. | It is close to natural language.

Tuple Relational Calculus
The Tuple Relational Calculus lists the tuples selected from a relation,
based on a given condition. It is formally denoted as:
{ t | P(t) }
where t = a tuple variable standing for the resulting tuples,
P(t) = the predicate, i.e. the condition that is used to fetch t.
Thus, it generates the set of all tuples t such that predicate P(t) is true for
t.
P(t) may have various conditions logically combined with OR (∨), AND
(∧), NOT (¬).
It also uses quantifiers:
∃ t ∈ r (Q(t)) = “there exists” a tuple t in relation r such that
predicate Q(t) is true.
∀ t ∈ r (Q(t)) = Q(t) is true “for all” tuples t in relation r.

Build TRC expression case
• Ex- Let t be a tuple variable ranging over the EMP relation as follows:

t → EMP( eno, ename, age, sal)

Q1. list the ename and age of the employees who are getting
salary above 10,000.

{ t| EMP(t) ∧ t.sal > 10,000 } // list all attributes

Or

{ t.ename, t.age | EMP(t) ∧ t.sal > 10,000 }

Some more
Q2. List the age of the employee whose eno is 102.

{ t.age | EMP(t) ∧ t.eno = 102 }

Q3. List the ename, age, and sal of the employees who are getting
salary below 10,000 or above 20,000.

{ t.ename, t.age, t.sal | EMP(t) ∧ (t.sal < 10,000 ∨ t.sal > 20,000) }
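A TRC expression reads almost the same as a Python set comprehension: the part before the bar is what is returned, the part after it is the condition. A small sketch, assuming a made-up EMP relation with the attributes eno, ename, age and sal:

```python
from collections import namedtuple

Emp = namedtuple("Emp", "eno ename age sal")

# Invented sample tuples for EMP(eno, ename, age, sal).
EMP = {
    Emp(101, "E1", 30, 9000),
    Emp(102, "E2", 41, 15000),
    Emp(103, "E3", 55, 25000),
}

# { t.ename, t.age | EMP(t) AND t.sal > 10,000 }
name_age = {(t.ename, t.age) for t in EMP if t.sal > 10000}

# { t.age | EMP(t) AND t.eno = 102 }
age_102 = {t.age for t in EMP if t.eno == 102}

# Salary below 10,000 or above 20,000; the parentheses keep the OR together.
outside = {(t.ename, t.age, t.sal) for t in EMP
           if (t.sal < 10000 or t.sal > 20000)}
```

The comprehension only states which tuples qualify, not how to search for them, which is the declarative flavour of the calculus.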

For practice:
1. list the employee name and age of the employees whose age
is below 35 or above 50 and sal is above 30000.
2. List the eno and sal of the employees who are getting salary
between 20,000 to 50000.
Domain Relational Calculus
• The Domain Relational Calculus lists the attributes to be selected
from a relation, based on a certain condition.
• The formal definition of Domain Relational Calculus is as follows:
{<X1, X2, X3, . . . Xn> | P(X1, X2, X3, . . . Xn)}
• Where X1, X2, X3, . . . Xn are the attributes (domain variables) and P is the
condition.
• Domain relational calculus uses the same operators as tuple
calculus. It uses logical connectives ∧ (and), ∨ (or) and ¬ (not).
• It uses Existential (∃) and Universal (∀) quantifiers to bind the
variables.
{< article, page, subject > | < article, page, subject > ∈ studypoint ∧ subject = 'database'}
Output: This query will yield the article, page, and subject from the
relation studypoint, where the subject is 'database'.

Build DRC expression case
EMP(eno, ename, age, sal)
One variable needs to be assigned to each individual domain:
p = eno (variable p ranges over the domain of eno)
q = ename
r = age
s = sal
Q1. Build the DRC expression to find the age of Mr. X
[r | (∃p)(∃q)(∃s) { EMP(pqrs) ∧ q = ‘Mr. X’ } ]
(r is the output variable, so it stays free; only p, q and s are quantified.)

More …
Q2. List the ename of the employees who are getting salary 10000
and above.
[q | (∃p)(∃r)(∃s) { EMP(pqrs) ∧ s ≥ 10000 } ]

Q3. List the employee name and age of the employees whose age is
below 35 or above 50 and whose sal is above 30000.
[q, r | (∃p)(∃s) { EMP(pqrs) ∧ (r < 35 ∨ r > 50) ∧ s > 30000 } ]

For Practice:
1. List the employee id and ename of the employees whose age is
35 or above 50 and sal is below 30000.
2. List the eno and sal of the employees who are getting salary
between 20,000 to 50000.

Practice sessions(RA, TRC,DRC)
R(A,B) S(B,C) // two relations R and S
SQL> select A,C from R,S
where R.B = S.B
and S.C = 3;
Transform into RA, TRC, and DRC expression?

R(A,B) S(B,C) // two relations R and S
SQL> select A,C from R,S
where R.B = S.B
and S.C = 3;
Relational Algebra:-
πA,C(σS.C=3(R⋈S))
Tuple Relational Calculus:-
{t.A, u.C | R(t) ∧ S(u) ∧ t.B = u.B ∧ u.C = 3}
Domain Relational Calculus:-
[p, s | (∃q)(∃r) { R(pq) ∧ S(rs) ∧ q = r ∧ s = 3 } ]
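One way to convince yourself that these forms are equivalent is to run the SQL query and a calculus-style comprehension on the same data and compare the results. The sketch below uses Python's built-in sqlite3 module instead of Oracle (an assumption made only so that it runs without a server), and the rows of R and S are made up.

```python
import sqlite3

# Invented sample data for R(A, B) and S(B, C).
R = [(1, 10), (2, 20), (3, 30)]
S = [(10, 3), (20, 4), (30, 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER)")
conn.execute("CREATE TABLE S (B INTEGER, C INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", R)
conn.executemany("INSERT INTO S VALUES (?, ?)", S)

# The SQL form of the query; set() removes duplicates, as relational algebra does.
sql_result = set(conn.execute(
    "SELECT A, C FROM R, S WHERE R.B = S.B AND S.C = 3"))

# The calculus form: pairs (A, C) such that the B values match and C = 3.
calculus_result = {(a, c) for (a, b) in R for (b2, c) in S if b == b2 and c == 3}

conn.close()
```

Both results contain the same pairs, which is what the equivalence of SQL, RA, TRC and DRC for this query predicts.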

Task for you (RA, TRC,DRC)

R(A,B,C,D,E,F,G)
SQL> SELECT A,C,D,G
FROM R
WHERE R.C >5000 ;
Transform into RA, TRC, and DRC expression?

Queries

• Query-1: Find the tuples of loans where the amount is
greater than or equal to 10000.
• Query-2: Find the loan number for each loan of an
amount greater than or equal to 10000.
• Query-3: Find the names of all customers who have a
loan and an account at the bank.
• Query-4: Find the names of all customers having a loan
at the “ABC” branch.
Tuple Calculus
• {t | t ∈ loan ∧ t.amount >= 10000}
• {t | ∃ s ∈ loan (t.loan-number = s.loan-number ∧ s.amount
>= 10000)}
• {t | ∃ s ∈ borrower (t.customer-name = s.customer-name) ∧
∃ u ∈ depositor (t.customer-name = u.customer-name)}
• {t | ∃ s ∈ borrower (t.customer-name = s.customer-name ∧ ∃
u ∈ loan (u.branch-name = “ABC” ∧ u.loan-number = s.loan-number))}
Domain Calculus
• {<p, q, r> | loan(pqr) ∧ r >= 10000}
• {p | (∃q)(∃r) {loan(pqr) ∧ r >= 10000}}
• {x | (∃y)(∃p)(∃q) {depositor(xy) ∧ borrower(pq) ∧ x = p}}
Normalization
Problems Without Normalization

• Since databases contain a lot of data in the form of tables, it is
very difficult or almost impossible to manage the data if any
anomaly occurs, i.e. an insertion, deletion or update anomaly.
Hence, removal of redundant data is necessary.
• Insertion, update and deletion anomalies are very frequent if the
database is not normalized.
• Redundancy leads to anomalies, inconsistency and extra memory space.

Normalizations on Relational Database
• Normalization is a database design technique which
organizes tables in a manner that reduces redundancy and
dependency of data.
• It divides larger tables into smaller tables and links them using
relationships.
• The inventor of the relational model, Edgar Codd, proposed
the theory of normalization with the introduction of First
Normal Form, and he continued to extend the theory with
Second and Third Normal Form. Later he joined
Raymond F. Boyce to develop the theory of Boyce-Codd
Normal Form (BCNF).

Evolution of Normalization theories

• The theory of data normalization is still being developed
further.
• For example, there are discussions even on 6th Normal
Form.
• However, in most practical applications, normalization
achieves its best in 3rd Normal Form.
• Normalization is carried out in practice so that the resulting
designs are of high quality and meet the desirable
properties.

First Normal Form(1NF)
• The First Normal Form (1NF) works on the concept of “Atomicity”
of the values in every individual tuple of the tables present in the
database.
• It means a relation is said to be in “1NF” if every attribute of every
tuple in the relation holds a single (atomic) value.
Functional Dependency (FD)
• An FD is a constraint between two sets of attributes in a
relation.
• It is a relationship which exists only when one attribute can
functionally determine another attribute.
• Functional Dependency in DBMS is denoted using an
arrow between two or more attributes, such as FD : A → B
• Here, A & B are attributes present in a relation.
• “A → B” means “B” is functionally dependent upon “A”,
or “A” functionally determines “B”. A functional
dependency acts as a constraint between sets of
attributes present in a database.

Example-1 : Consider a table student_details containing details of
some students.
Example : student_details Table

We can conclude that from the Roll_No attribute in the table we are able
to determine the Name of a student uniquely, and the same is the case
with Marks too. Hence, we can say that Name and Marks are
functionally dependent on Roll_No, but the reverse is not true.

FD1 : Roll_No → Name
FD2 : Roll_No → Marks
Armstrong’s Axioms
• Armstrong’s axioms were introduced by William W. Armstrong
in 1974, and these axioms play a
vital role while implementing the concept of functional
dependency in DBMS for database normalization.

• Reflexive : It means, if set “B” is a subset of “A”, then A → B.
• Augmentation : It means, if A → B, then AC → BC.
• Transitive : It means, if A → B and B → C, then A → C.
• Decomposition : It means, if A → BC, then A → B and A → C.
• Union : It means, if A → B and A → C, then A → BC.
• Pseudo-Transitivity : It means, if A → B and DB → C,
then DA → C.
Closure Of Functional Dependency

• The closure of a functional dependency means the
complete set of all attributes that can be
functionally derived from the given functional dependency
using the inference rules known as Armstrong’s Rules.
• If “F” is a functional dependency then the closure of the
functional dependency can be denoted using “{F}+”.
• There are four steps to calculate the closure of a functional
dependency. These are:
• Step-1 : Add the attributes which are present on the Left
Hand Side in the original functional dependency.
• Step-2 : Now, add the attributes present on the Right
Hand Side of the functional dependency.
• Step-3 : With the help of the attributes present on the Right Hand
Side, check the other attributes that can be derived from
the other given functional dependencies.
• Step-4 : Repeat this process until all the possible attributes
which can be derived are added to the closure.
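The steps above can be sketched as a small Python function. The function name `closure` and the representation of FDs as (left side, right side) strings of single-letter attributes are choices made for this illustration only.

```python
def closure(attributes, fds):
    """Return the closure of `attributes` under the functional dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a string of
    single-letter attribute names, e.g. ("A", "BC") for A -> BC.
    """
    result = set(attributes)              # Step 1: start with the given attributes
    changed = True
    while changed:                        # Step 4: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            # Steps 2 and 3: once the left side is covered, add the right side
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Illustrative input: A -> BC, C -> B, D -> E, E -> D
fds = [("A", "BC"), ("C", "B"), ("D", "E"), ("E", "D")]
closure_a = closure("A", fds)
closure_ad = closure("AD", fds)
```

The loop stops as soon as a full pass over the FDs adds no new attribute, so it always terminates.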
Example

Consider a relation R(A,B,C,D,E) having the functional dependencies
mentioned below.
FD1 : A → BC
FD2 : C → B
FD3 : D → E
FD4 : E → D
Now, we need to calculate the closure of the attributes of the
relation R. The closures will be:

{A}+ = {A, B, C}
{B}+ = {B}
{C}+ = {B, C}
{D}+ = {D, E}
{E}+ = {D, E}
Identifying Candidate Key
• “A Candidate Key of a relation is a minimal attribute or set of attributes
that can determine the whole relation, i.e. whose closure contains all the
attributes of the relation.”
Example-1 : Consider the relation R(A,B,C) with given functional
dependencies :
FD1 : A → B
FD2 : B → C

Now, calculating the closure of the attributes as :

{A}+ = {A, B, C}
{B}+ = {B, C}
{C}+ = {C}

Clearly, “A” is the candidate key as, its closure contains all the
attributes present in the relation “R”.
Example-2 : Consider another relation R(A, B, C, D, E) having the
Functional dependencies :
FD1 : A → BC
FD2 : C → B
FD3 : D → E
FD4 : E → D
Now, calculating the closure of the attributes as :
{A}+ = {A, B, C}
{B}+ = {B}
{C}+ = {C, B}
{D}+ = {E, D}
{E}+ = {E, D}
In this case, no single attribute can determine all the attributes on
its own as in the previous example. Here, we need to combine two or
more attributes to determine the candidate keys.

{A, D}+ = {A, B, C, D, E}
{A, E}+ = {A, B, C, D, E}

Hence, “AD” and “AE” are the candidate keys.
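Finding candidate keys by hand means trying attribute combinations of growing size and checking their closures. A brute-force sketch of that search is shown below; the function names are hypothetical, and the approach is only practical for small relations because it tries every subset.

```python
from itertools import combinations

def closure(attributes, fds):
    """Closure of `attributes` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(relation, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    all_attrs = set(relation)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            # A superset of a key already found is a super key, not a candidate key
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys

# R(A, B, C) with A -> B, B -> C
keys_1 = candidate_keys("ABC", [("A", "B"), ("B", "C")])
# R(A, B, C, D, E) with A -> BC, C -> B, D -> E, E -> D
keys_2 = candidate_keys("ABCDE", [("A", "BC"), ("C", "B"), ("D", "E"), ("E", "D")])
```

Because smaller combinations are tried first and supersets of found keys are skipped, only minimal keys end up in the result.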
• Example-3: Consider a relation R ( A , B , C , D , E , F , G )
with the functional dependencies-
• A → BC
• BC → DE
• D → F
• CF → G
• Example-4: Consider the given functional dependencies-
• AB → CD
• AF → D
• DE → F
• C → G
• F → E
• G → A
Key Definitions
• Prime Attributes : Attributes which are part of
some candidate key. For example : “A, D, E” are prime
attributes in Example-2 above.
• Non-Prime Attributes : Attributes other than prime attributes,
which do not take part in the formation of any candidate key. For
example “B, C”.
• Extraneous Attributes : Attributes that can be removed from a set of
attributes without changing what the set determines.
• For example : Consider the relation R(A, B, C, D) with FDs
• FD1 : A → BC
• FD2 : B → C
• FD3 : D → C
• Here, the candidate key can be “AD” only. Hence,
• Prime Attributes : A, D.
• Non-Prime Attributes : B, C
• Extraneous Attributes : B, C (if we add either of these attributes to the
candidate key, its closure remains unaffected).
Second Normal Form(2NF)
• Any relation to be in 2NF must follow the below two rules:
• The relation/table must be in 1NF.
• There should not be any partial dependency.
• Whenever any non-prime attribute is dependent upon a part of
candidate key of a relation, it is known as partial dependency.
• For example : Consider a relation R(X,Y,Z,T) with following
functional dependencies :
• FD1 : XY → T
• FD2 : Y → Z

Step-1: Find the closure and determine the candidate key.
• {X}+ = {X}
• {Y}+ = {Y,Z}
• {T}+ = {T}
• {XY}+ = {X,Y,Z,T} Here, the candidate key will be “XY”.
Step-2: Check whether the non-prime attributes in the FDs
are fully dependent upon the candidate key or not.
• XY → T (Attribute “T” fully depends upon “XY” as “XY” is a
candidate key)
• Y → Z (Attribute “Z” is partially dependent upon the candidate
key as “Y” is a part of “XY”)
Step-3: To remove the partial dependency, decompose the whole
relation R(X,Y,Z,T) into R1(X,Y,T) and R2(Y,Z).
• Therefore, R1 : XY → T (“XY” will be the candidate key)
• and R2 : Y → Z (“Y” will be the candidate key)
• Hence in relation R2, “Z” will be fully dependent upon “Y” as it is
the candidate key for relation R2.
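The check for partial dependencies can be automated for small examples. The sketch below is an illustration under a simplifying assumption: the relation has a single candidate key, which is passed in explicitly. The function names are hypothetical.

```python
from itertools import combinations

def closure(attributes, fds):
    """Closure of `attributes` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(candidate_key, relation, fds):
    """Pairs (part_of_key, attribute) where a non-prime attribute depends on
    a proper subset of the candidate key. Assumes one candidate key only."""
    key = set(candidate_key)
    non_prime = set(relation) - key
    found = []
    for size in range(1, len(key)):                 # proper, non-empty subsets
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, fds) & non_prime):
                found.append(("".join(part), attr))
    return found

# R(X, Y, Z, T) with XY -> T and Y -> Z, candidate key XY
before = partial_dependencies("XY", "XYZT", [("XY", "T"), ("Y", "Z")])
# After decomposition: R1(X, Y, T) with XY -> T, and R2(Y, Z) with Y -> Z
after_r1 = partial_dependencies("XY", "XYT", [("XY", "T")])
after_r2 = partial_dependencies("Y", "YZ", [("Y", "Z")])
```

Before the decomposition the function reports that Z depends on Y alone; both decomposed relations report no partial dependency.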
Third Normal Form(3NF)
• Any relation to be in Third Normal Form(3NF) must follow the
below mentioned two rules :
• The relation/table should be in Second Normal Form.
• The relation/table must not have any transitive dependency.
• If a “Non-Prime” attribute is functionally dependent upon another
“Non-Prime” attribute, that functional dependency will also be
termed as transitive.
Example:- Consider the relation R(X,Y,Z,T) with the following FDs
• FD1 : XY → Z
• FD2 : Z → T
Step-1 : Check whether given relation is in 2NF or not.
Here, “XY” will be the candidate key for the above mentioned
FD’s and no partial dependency exists in this relation too.
Step-2 : Remove transitive dependencies from the relation
by decomposing it into separate
relations.
• XY → Z (“Z” is fully dependent on candidate key “XY”)
• Z → T (Z & T, both are non-prime attributes and thus forms a
transitive dependency)
Step-3: Splitting the whole relation into two separated
relations R1(X,Y,Z) and R2(Z,T)
• R1 : XY → Z
(Here, “XY” is the candidate key and there is no transitive
dependency in this whole relation)
• R2 : Z → T
(Here, “Z” is the candidate key and there is no transitive dependency
in this whole relation)
• Hence, the decomposed relations are in 3NF, as both
R1 and R2 have no transitive dependency
and both are in 2NF too.
BCNF (Boyce-Codd Normal Form)
• BCNF is known as “Boyce-Codd Normal Form” and is a
successor to “Third Normal Form”.
• Any relation to be in BCNF must follow the two rules mentioned
below.
• The relation/table needs to be in 3NF.
• For every non-trivial functional dependency P → Q, “P” should be a
super key.
• BCNF deals with the cases that 3NF still allows, where a prime
attribute is determined by attributes that do not form a super key.
Example:- Consider a relation R(X,Y,Z) having the following
functional dependencies :
• FD1 : XY → Z
• FD2 : Z → Y
Step-1 : Check whether the given relation is in 3NF or not.
• The candidate keys are “XY” and “XZ”, since {XY}+ = {XZ}+ = {X,Y,Z}.
Every attribute is therefore prime, so no non-prime attribute can take part
in a partial or transitive dependency, and the relation is in 3NF.
Step-2 : Check for FDs whose left-hand side is not a super key.
• In this example “Z → Y” is that FD.
• FD1 : XY → Z (“XY” is a candidate key, so no problem in this FD)
• FD2 : Z → Y (Here, “Z” alone is not a super key, yet it determines the prime attribute “Y”)
Step-3: Decompose relation R(X,Y,Z) into R1(X,Z) and
R2(Z,Y) to convert the relation into BCNF.
• R1 : no non-trivial FD (“XZ” is the candidate key)
• R2 : Z → Y (“Z” is the candidate key)
• Every non-trivial FD that holds in “R1” or “R2” now has a super key on its
left-hand side. The dependency XY → Z can no longer be checked within a
single relation, which is a known cost of this decomposition.
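The BCNF rule that the left-hand side of every FD must be a super key can be checked mechanically with attribute closures. A minimal sketch, with hypothetical function names and FDs written as (left side, right side) strings:

```python
def closure(attributes, fds):
    """Closure of `attributes` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose left side is not a super key of `relation`."""
    all_attrs = set(relation)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)              # skip trivial FDs
            and closure(lhs, fds) != all_attrs]      # left side is not a super key

# R(X, Y, Z) with XY -> Z and Z -> Y
violations = bcnf_violations("XYZ", [("XY", "Z"), ("Z", "Y")])
# A two-attribute relation R2(Z, Y) with Z -> Y
violations_r2 = bcnf_violations("ZY", [("Z", "Y")])
```

Only Z → Y is reported for R(X,Y,Z), because {Z}+ = {Y, Z} does not cover X, while the same FD causes no violation in a relation that contains just Z and Y.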
Fourth normal form (4NF)
• A relation will be in 4NF if it is in Boyce Codd normal form
and has no multi-valued dependency.
• For a dependency X → Y, if for a single value of X multiple
values of Y exist, then the dependency is a multi-valued
dependency (usually written X →→ Y).
• Also, a table should have at least 3 columns for it to have a
multi-valued dependency.
• And, for a relation R(X,Y,Z), if there is a multi-valued
dependency between X and Y, then Y and Z should be
independent of each other.
Challenging task to practice
Q. R = {A,B,C,D,E,F}

FD= {AB→C, AD→B, C→B, F→AD,F→E}

1) Is R in 2 NF? If no, decompose R into 2NF.

2) Check R in 3NF or not, if not Decompose into 3NF.

3) Further check R in BCNF or not, if not

Challenging task to practice
Q. R = {A,B,C,D,E,F,G,H,I,J,K}

FD= {AB→CK, AD→BG, C→BH, F→ADI,F→EJ}

1) Is R in 2 NF? If no, decompose R into 2NF.

2) Check R in 3NF or not, if not Decompose into 3NF.

3) Further check R in BCNF or not, if not

Introduction to SQL

• SQL (Structured Query Language) is used to perform
operations on the records stored in the database such as
updating records, deleting records, creating and modifying
tables, views, etc.
• SQL is just a query language; it is not a database.
• To perform SQL queries, you need to install a relational database, for
example, Oracle, MySQL, PostgreSQL, SQL Server,
DB2, etc.

What is SQL
• SQL stands for Structured Query Language.
• It is designed for managing data in a relational database
management system (RDBMS).
• It is pronounced as S-Q-L or sometimes “sequel”.
• SQL is a database language; it is used for database creation,
deletion, fetching rows, modifying rows, etc.
• SQL is based on relational algebra and tuple relational
calculus.
• All RDBMS like MySQL, Oracle, MS Access, Sybase, Informix,
Postgres, and SQL Server use SQL as standard database
language.

Why SQL is required

• To create new databases, tables and views

• To insert records in a database

• To update records in a database

• To delete records from a database

• To retrieve data from a database

Tables and Views

• A view is a virtual table. A view consists of rows and columns
just like a table.
• Views are definitions built on top of other tables (or views),
and do not hold data themselves.
• If data changes in the underlying table, the same change is
reflected in the view.
• A table contains data.
• The advantage of a view is that it can join data from several
tables, thus creating a new view of it.

SQL Statements

• Data manipulation language (DML): SELECT, INSERT, UPDATE, DELETE
• Data definition language (DDL): CREATE, ALTER, DROP, RENAME, TRUNCATE, COMMENT
• Data control language (DCL): GRANT, REVOKE
• Transaction control: COMMIT, ROLLBACK, SAVEPOINT
Data Types
DDL Commands
The CREATE TABLE statement is used to create a new table in a database.
Syntax
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
column3 datatype,
....
);
• The DROP TABLE statement is used to drop an existing table in a
database.
DROP TABLE table_name;
• The TRUNCATE TABLE statement is used to delete the data inside a table,
but not the table itself.
TRUNCATE TABLE table_name;
• The ALTER TABLE statement is used to add, delete, or modify columns
in an existing table.
• The ALTER TABLE statement is also used to add and drop various
constraints on an existing table.

• To add a column in a table, use the following syntax:
• ALTER TABLE table_name ADD column_name datatype;
• To delete a column in a table, use the following syntax (notice that
some database systems don't allow deleting a column):
• ALTER TABLE table_name DROP COLUMN column_name;
• To change the data type of a column in a table, use the following
syntax:
• SQL Server / MS Access:
• ALTER TABLE table_name ALTER COLUMN column_name datatype;
• My SQL / Oracle (prior version 10G):
• ALTER TABLE table_name MODIFY COLUMN column_name datatype;
• Oracle 10G and later:
• ALTER TABLE table_name MODIFY column_name datatype;
• To RENAME a table, use the following syntax (Oracle, MySQL):
• ALTER TABLE table_name RENAME TO new_table_name;
• Columns can also be given a new name with the use of ALTER TABLE.
Syntax (MySQL, Oracle):
• ALTER TABLE table_name RENAME COLUMN old_name TO new_name;



The Human Resources (HR)Schema

DEPARTMENTS: department_id, department_name, manager_id, location_id
LOCATIONS: location_id, street_address, postal_code, city, state_province, country_id
JOB_HISTORY: employee_id, start_date, end_date, job_id, department_id
EMPLOYEES: employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id
COUNTRIES: country_id, country_name, region_id
JOBS: job_id, job_title, min_salary, max_salary
REGIONS: region_id, region_name
DML Commands
• INSERT INTO TABLE_NAME (column1, column2, column3,...columnN)
VALUES (value1, value2, value3,...valueN);
OR
• INSERT INTO TABLE_NAME VALUES (value1,value2,value3,...valueN);
• Insert Multiple Rows
INSERT ALL
INTO table_name (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
INTO table_name(column1, column2, column_n) VALUES (expr1, expr2, expr_n)
INTO table_name (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
SELECT * FROM dual;
• SELECT column1, column2, columnN FROM table_name;
• SELECT * FROM table_name;
• SELECT statement with WHERE clause is as follows:
• SELECT column1, column2, columnN FROM table_name WHERE
[condition]
• Example:
• SQL> SELECT ID, NAME, SALARY FROM CUSTOMERS WHERE
SALARY > 2000;
• AND operator with WHERE clause
• SELECT column1, column2, columnN FROM table_name WHERE
[condition1] AND [condition2]...AND [conditionN];
• Example:
• SQL> SELECT ID, NAME, SALARY FROM CUSTOMERS WHERE
SALARY > 2000 AND age < 25;
• UPDATE table_name SET column1 = value1, column2 =
value2...., columnN = valueN WHERE [condition];
• Example:
• SQL> UPDATE CUSTOMERS SET ADDRESS = 'Pune' WHERE
ID = 6;
• You can combine N number of conditions using AND or OR
operators.
• DELETE query with WHERE clause is as follows:
• DELETE FROM table_name WHERE [condition];
• Example: SQL> DELETE FROM CUSTOMERS WHERE ID = 6;
• Delete All Records
• DELETE FROM table_name;
SQL Constraints
Constraints are used to limit the type of data that can go into a
table. This ensures the accuracy and reliability of the data in the
table. If there is any violation between the constraint and the data
action, the action is aborted.

The following constraints are commonly used in SQL:


•NOT NULL - Ensures that a column cannot have a NULL value
•UNIQUE - Ensures that all values in a column are different
•PRIMARY KEY - A combination of a NOT NULL and UNIQUE.
Uniquely identifies each row in a table
•FOREIGN KEY - Prevents actions that would destroy links between
tables
•CHECK - Ensures that the values in a column satisfy a specific
condition
•DEFAULT - Sets a default value for a column if no value is specified
CREATE Table with 5 integrity constraints
CREATE TABLE Employee
(EID NUMBER(10) PRIMARY KEY,
ENAME VARCHAR2(100) NOT NULL,
SALARY NUMBER(10,2) CHECK (SALARY > 0),
JOB_ID VARCHAR2(3) UNIQUE,
Commission NUMBER(10,2) DEFAULT 0);
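The effect of the five constraints can be observed by trying inserts that break them. The sketch below runs the same table definition in SQLite through Python's sqlite3 module; this is an assumption made for convenience (SQLite accepts the Oracle type names but does not enforce their sizes), and the inserted rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        EID        NUMBER(10)    PRIMARY KEY,
        ENAME      VARCHAR2(100) NOT NULL,
        SALARY     NUMBER(10,2)  CHECK (SALARY > 0),
        JOB_ID     VARCHAR2(3)   UNIQUE,
        Commission NUMBER(10,2)  DEFAULT 0
    )
""")
# A valid row; Commission is omitted so that the DEFAULT applies.
conn.execute("INSERT INTO Employee (EID, ENAME, SALARY, JOB_ID) "
             "VALUES (1, 'E1', 5000, 'J1')")

def rejected(values):
    """Return True if inserting `values` is aborted by a constraint."""
    try:
        conn.execute("INSERT INTO Employee (EID, ENAME, SALARY, JOB_ID) "
                     "VALUES (?, ?, ?, ?)", values)
    except sqlite3.IntegrityError:
        return True
    return False

results = {
    "PRIMARY KEY": rejected((1, "E2", 6000, "J2")),   # duplicate EID
    "NOT NULL": rejected((2, None, 6000, "J2")),      # missing ENAME
    "CHECK": rejected((3, "E3", 0, "J3")),            # salary not above 0
    "UNIQUE": rejected((4, "E4", 7000, "J1")),        # duplicate JOB_ID
}
default_commission = conn.execute(
    "SELECT Commission FROM Employee WHERE EID = 1").fetchone()[0]
conn.close()
```

Each violating insert is aborted and leaves the table unchanged, which is the behaviour described for constraint violations above.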


• NOT NULL on ALTER TABLE
• ALTER TABLE Persons MODIFY Age int NOT NULL;
• UNIQUE Constraint on ALTER TABLE
• ALTER TABLE Persons ADD UNIQUE (ID);
• FOREIGN KEY Constraint on ALTER TABLE
• ALTER TABLE Orders
ADD FOREIGN KEY (PersonID) REFERENCES Persons(PersonID);
Oracle View
In Oracle, a view is a virtual table that does not physically exist.
Its definition is stored in the Oracle data dictionary, and it does not store any data itself.
The defining query is executed whenever the view is referenced.

A view is created by a query over one or more tables.

Oracle CREATE VIEW
Syntax:
CREATE VIEW view_name AS SELECT columns FROM tables
WHERE conditions;
Example:
CREATE TABLE "SUPPLIERS"
( "SUPPLIER_ID" NUMBER,
"SUPPLIER_NAME" VARCHAR2(4000),
"SUPPLIER_ADDRESS" VARCHAR2(4000)
);

Cont..
CREATE TABLE "ORDERS"
( "ORDER_NO." NUMBER,
"SUPPLIER_ID" NUMBER,
"QUANTITY" NUMBER,
"PRICE" NUMBER
);
Insert some records, then create the view. (ORDERS needs a SUPPLIER_ID
column so that each order can be joined to its supplier.)
Create View Query:
CREATE VIEW sup_orders AS
SELECT suppliers.supplier_id, orders.quantity, orders.price
FROM suppliers
INNER JOIN orders
ON suppliers.supplier_id = orders.supplier_id
WHERE suppliers.supplier_name = 'VOJO';

Cont..
You can now check the Oracle VIEW by this query:
SELECT * FROM sup_orders;

Oracle Update VIEW
In Oracle, the CREATE OR REPLACE VIEW statement is used to
modify the definition of an Oracle VIEW without dropping it.
Syntax:

CREATE OR REPLACE VIEW view_name AS
SELECT columns
FROM table
WHERE conditions;

Cont..
Example:
Execute the following query to update the definition of the Oracle VIEW called
sup_orders without dropping it.

CREATE OR REPLACE VIEW sup_orders AS
SELECT suppliers.supplier_id, orders.quantity, orders.price
FROM suppliers
INNER JOIN orders
ON suppliers.supplier_id = orders.supplier_id
WHERE suppliers.supplier_name = 'HCL';

You can now check the Oracle VIEW by this query:


SELECT * FROM sup_orders;

Oracle DROP VIEW
The DROP VIEW statement is used to remove or delete the
VIEW completely.
Syntax:

DROP VIEW view_name;

Example:

DROP VIEW sup_orders;
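The life cycle of a view (create, query, drop) can be tried without an Oracle installation. The sketch below uses Python's sqlite3 module as a stand-in; it assumes that orders has a supplier_id column to join on, uses simplified column names, and the rows are made up. SQLite has no CREATE OR REPLACE VIEW, so that step is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suppliers (supplier_id INTEGER, supplier_name TEXT);
    CREATE TABLE orders (order_no INTEGER, supplier_id INTEGER,
                         quantity INTEGER, price INTEGER);
    INSERT INTO suppliers VALUES (1, 'VOJO');
    INSERT INTO suppliers VALUES (2, 'HCL');
    INSERT INTO orders VALUES (10, 1, 5, 100);
    INSERT INTO orders VALUES (11, 2, 7, 200);

    CREATE VIEW sup_orders AS
        SELECT suppliers.supplier_id, orders.quantity, orders.price
        FROM suppliers
        INNER JOIN orders ON suppliers.supplier_id = orders.supplier_id
        WHERE suppliers.supplier_name = 'VOJO';
""")

before = conn.execute("SELECT * FROM sup_orders").fetchall()

# The view stores no data: a new row in the base table shows up immediately.
conn.execute("INSERT INTO orders VALUES (12, 1, 9, 300)")
after = conn.execute("SELECT * FROM sup_orders ORDER BY quantity").fetchall()

conn.execute("DROP VIEW sup_orders")
views_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'view'").fetchall()
conn.close()
```

Dropping the view removes only its definition; the suppliers and orders tables and their rows are untouched.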

Joins
• The basic syntax of INNER JOIN is as follows:
• SELECT table1.column1, table2.column2... FROM table1 INNER
JOIN table2 ON table1.common_field = table2.common_field;
• SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS
INNER JOIN ORDERS ON CUSTOMERS.ID =
ORDERS.CUSTOMER_ID;
• The basic syntax of LEFT JOIN is as follows:
• SELECT table1.column1, table2.column2... FROM table1 LEFT JOIN
table2 ON table1.common_field = table2.common_field;
• SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS LEFT
JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
• The basic syntax of RIGHT JOIN is as follows:
• SELECT table1.column1, table2.column2... FROM table1 RIGHT JOIN
table2 ON table1.common_field = table2.common_field;
• SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS RIGHT
JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
• The basic syntax of FULL JOIN is as follows:
• SELECT table1.column1, table2.column2... FROM table1 FULL JOIN
table2 ON table1.common_field = table2.common_field;
• SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS FULL JOIN
ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
• The basic syntax of CROSS JOIN is as follows:
• SELECT table1.column1, table2.column2... FROM table1, table2 [,
table3 ];
• SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS, ORDERS;
example
Oracle ORDER BY Clause
In Oracle, ORDER BY Clause is used to sort or re-arrange the
records in the result set. The ORDER BY clause is only used with
SELECT statement.
Syntax:
SELECT column-list FROM table_name [WHERE condition]
[ORDER BY column1, column2, .. columnN] [ASC | DESC];
SQL> SELECT * FROM CUSTOMERS ORDER BY NAME, SALARY;

Oracle ORDER BY Example: (sorting in descending order)

SQL> SELECT * FROM CUSTOMERS ORDER BY NAME DESC;

Oracle GROUP BY Clause
In Oracle GROUP BY clause is used with SELECT statement
to collect data from multiple records and group the
results by one or more columns.
Syntax:-
SELECT column1, column2 FROM table_name WHERE [ conditions ]
GROUP BY column1, column2
SQL> SELECT NAME, SUM(SALARY) FROM CUSTOMERS GROUP BY
NAME;

Oracle GROUP BY Example: (with COUNT function)
(Figure: CUSTOMER table and query output)

SELECT state, COUNT(*) AS "Number of customers"
FROM customers
WHERE salary > 10000
GROUP BY state;
Oracle GROUP BY Example: (with MIN function)
EMPLOYEES

SELECT department,
MIN(salary) AS "Lowest salary"
FROM employees
GROUP BY department;

Oracle GROUP BY Example: (with MAX function)
EMPLOYEES

SELECT department,
MAX(salary) AS "Highest salary"
FROM employees
GROUP BY department;
Oracle HAVING Clause
In Oracle, the HAVING clause is used with the GROUP BY clause to keep only
those groups for which the condition is TRUE.
Syntax:
SELECT expression1, expression2, ... expression_n,
aggregate_function (aggregate_expression)
FROM tables
WHERE conditions
GROUP BY expression1, expression2, ... expression_n
HAVING having_condition;

aggregate_function: one of the SUM, COUNT, MIN, MAX or AVG functions.
Oracle HAVING Example: (with GROUP BY COUNT function)

(Figure: CUSTOMER table and query output)

SELECT state, COUNT(*) AS "Number of customers"
FROM customers
WHERE salary > 10000
GROUP BY state
HAVING COUNT(*) >= 2;

Oracle HAVING Example: (with GROUP BY MIN function)

EMPLOYEES

SELECT department,
MIN(salary) AS "Lowest salary"
FROM employees
GROUP BY department
HAVING MIN(salary) < 15000;

Oracle HAVING Example: (with GROUP BY MAX function)

EMPLOYEES

SELECT department,
MAX(salary) AS "Highest salary"
FROM employees
GROUP BY department
HAVING MAX(salary) > 30000;
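
The HAVING examples above can be tried out on sample data. This is a minimal sketch that uses Python's sqlite3 module rather than Oracle; the EMPLOYEES rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('A', 'IT', 40000), ('B', 'IT', 12000),
        ('C', 'HR', 20000), ('D', 'HR', 25000),
        ('E', 'Sales', 35000);
""")

# GROUP BY forms one group per department; HAVING then drops the groups
# whose highest salary is not above 30000 (here: HR).
groups = conn.execute("""
    SELECT department, MAX(salary) AS highest
    FROM employees
    GROUP BY department
    HAVING MAX(salary) > 30000
    ORDER BY department""").fetchall()

print(groups)  # [('IT', 40000), ('Sales', 35000)]
```

WHERE filters individual rows before grouping, while HAVING filters whole groups after the aggregate has been computed.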

LIKE Clause
• SQL LIKE clause is used to compare a value to similar values using wildcard
operators.
• There are two wildcards used in conjunction with the LIKE operator:
• The percent sign (%) - The percent sign represents zero, one, or multiple
characters.
• The underscore (_) - The underscore represents a single number or
character. The symbols can be used in combinations.
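The two wildcards can be illustrated with a small sketch. It assumes Python's sqlite3 module and a made-up CUSTOMERS table; note that SQLite's LIKE ignores case for ASCII letters by default, whereas Oracle's LIKE is case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT);
    INSERT INTO customers VALUES ('Ramesh'), ('Ravi'), ('Komal'), ('Rita'), ('Amar');
""")

def names_like(pattern):
    """Return the customer names matching a LIKE pattern, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE name LIKE ? ORDER BY name", (pattern,))
    return [r[0] for r in rows]

starts_with_r = names_like("R%")     # R followed by zero or more characters
four_letters_r = names_like("R___")  # R followed by exactly three characters
second_is_a = names_like("_a%")      # any one character, then 'a', then anything

print(starts_with_r, four_letters_r, second_is_a)
```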
TOP Clause

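• The TOP clause is used to fetch the top N records or the top X percent of
records from a table.
• The basic syntax of the TOP clause with the SELECT statement is as follows:
• SELECT TOP number|percent column_name(s) FROM
table_name WHERE [condition]
• SQL> SELECT TOP 3 * FROM CUSTOMERS;
• TOP is SQL Server syntax and is not accepted by every database. MySQL
supports the LIMIT clause to fetch a limited number of records:
• SQL> SELECT * FROM CUSTOMERS LIMIT 3;
• Oracle (12c and later) uses FETCH FIRST instead:
• SQL> SELECT * FROM CUSTOMERS FETCH FIRST 3 ROWS ONLY;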
Distinct Keyword
• The SQL DISTINCT keyword is used in conjunction with the SELECT
statement to eliminate all duplicate records and return only unique ones.
• The basic syntax of the DISTINCT keyword to eliminate duplicate
records is as follows:
• SELECT DISTINCT column1, column2,.....columnN FROM
table_name WHERE [condition]
• SQL> SELECT DISTINCT SALARY FROM CUSTOMERS ORDER BY SALARY;
Set Operations
• UNION Clause
• To use UNION, each SELECT must have the same number of columns
selected, the same number of column expressions, the same data
type, and have them in the same order.
• SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE
condition] UNION SELECT column1 [, column2 ] FROM table1 [,
table2 ] [WHERE condition]
• SQL> SELECT City FROM Customers
UNION
SELECT City FROM Suppliers
ORDER BY City;

• UNION ALL Clause:
• The UNION ALL operator is used to combine the results of two
SELECT statements including duplicate rows.
• INTERSECT Clause
• SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE
condition] INTERSECT SELECT column1 [, column2 ] FROM table1
[, table2 ] [WHERE condition]
• SQL> SELECT City FROM Customers
INTERSECT
SELECT City FROM Suppliers
ORDER BY City;

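• EXCEPT Clause
• EXCEPT returns the rows of the first SELECT that are not returned by the
second SELECT. Oracle has traditionally spelled this operator MINUS.
• SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE
condition] EXCEPT SELECT column1 [, column2 ] FROM table1 [,
table2 ] [WHERE condition]
• SQL> SELECT City FROM Customers
EXCEPT
SELECT City FROM Suppliers
ORDER BY City;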
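
The four set operations can be compared side by side. The following is a sketch only: it runs on Python's sqlite3 module rather than Oracle, and the city values are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (city TEXT);
    CREATE TABLE suppliers (city TEXT);
    INSERT INTO customers VALUES ('Delhi'), ('Mumbai'), ('Pune'), ('Delhi');
    INSERT INTO suppliers VALUES ('Mumbai'), ('Chennai');
""")

def cities(op):
    """Combine the two City columns with the given set operator."""
    sql = f"SELECT city FROM customers {op} SELECT city FROM suppliers ORDER BY city"
    return [r[0] for r in conn.execute(sql)]

union = cities("UNION")          # duplicates removed
union_all = cities("UNION ALL")  # duplicates kept
intersect = cities("INTERSECT")  # cities present in both tables
except_ = cities("EXCEPT")       # customer cities that are not supplier cities

print(union, union_all, intersect, except_)
```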
SQL Aggregate Functions
• COUNT Function
SELECT COUNT(*) FROM Customers;
SELECT COUNT(*) FROM Customers WHERE salary>=2000;
• SUM Function
SELECT SUM(Salary) FROM Customers;
SELECT SUM(Salary) FROM Customers WHERE salary>=2000;
• AVG function
SELECT AVG(Salary) FROM Customers;
• MAX Function
SELECT MAX(Salary) FROM Customers;
• MIN Function
SELECT MIN(Salary) FROM Customers;
Alias

• Rename a table or a column temporarily by giving it another
name, known as an alias.

• The basic syntax of table alias is as follows:
SELECT column1, column2.... FROM table_name AS
alias_name WHERE [condition];

• The basic syntax of column alias is as follows:
SELECT column_name AS alias_name FROM table_name
WHERE [condition];
SQL Sub Queries
• Subquery or Inner query or Nested query is a query within another
SQL query and embedded within the WHERE clause.
• A subquery is used to return data that will be used in the main query
as a condition to further restrict the data to be retrieved.
• Subqueries can be used with the SELECT, INSERT, UPDATE, and
DELETE statements along with operators like =, <, >, >=, <=, IN,
BETWEEN etc.
• Subqueries with the SELECT Statement:
• SQL> SELECT * FROM CUSTOMERS WHERE ID IN (SELECT ID FROM
CUSTOMERS WHERE SALARY > 4500) ;
• Subqueries with the INSERT Statement:
• SQL> INSERT INTO CUSTOMERS_BKP SELECT * FROM CUSTOMERS
WHERE ID IN (SELECT ID FROM CUSTOMERS WHERE SALARY > 4500) ;
• Subqueries with the UPDATE Statement:
• SQL> UPDATE CUSTOMERS SET SALARY = SALARY * 0.25 WHERE AGE
IN (SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27 );

• Subqueries with the DELETE Statement:
• SQL> DELETE FROM CUSTOMERS WHERE AGE IN (SELECT AGE FROM
CUSTOMERS_BKP WHERE AGE > 27 );
• Subqueries: Guidelines
• A subquery must be enclosed in parentheses.
• A subquery must be placed on the right side of the comparison
operator.
• Subqueries cannot manipulate their results internally, therefore
ORDER BY clause cannot be added into a subquery.
• If a subquery (inner query) returns a null value to the outer query,
the outer query will not return any rows when using certain
comparison operators in a WHERE clause.
Task
• Employee table
(EMPLOYEE_ID , FIRST_NAME , LAST_NAME , EMAIL,
PHONE_NUMBER , HIRE_DATE , JOB_ID , SALARY ,
COMMISSION_PCT , MANAGER_ID , DEPARTMENT_ID )

• Queries:
1. Find those employees who get a higher salary than the employee
whose ID is 163.
2. Find those employees whose salary matches the smallest salary
of any of the departments.
3. Find those employees who report to the manager whose first
name is 'Ramesh'.
4. Find the employee whose salary is 3000 and whose reporting person's
ID is 121.
5. Find those employees whose ID matches any of the
numbers 134, 159 and 183.
6. Find those employees who do not work in those
departments where manager IDs are in the range 100 to
200.
7. Find those employees who get the second-highest salary.
8. Find those employees who work in the same department
where 'Clara' works.
9. Find those employees who work in a department where
an employee's first name contains the letter 'T'.
10. Find those employees who earn more than the average
salary and work in a department with any employee
whose first name contains the character 'J'.
Answer
1. SELECT first_name, last_name FROM employees WHERE salary >
( SELECT salary FROM employees WHERE employee_id=163 );
2. SELECT first_name, last_name, salary, department_id FROM
employees WHERE salary IN ( SELECT MIN(salary) FROM
employees GROUP BY department_id );
3. SELECT first_name, last_name, employee_id, salary FROM
employees WHERE manager_id = (SELECT employee_id FROM
employees WHERE first_name = 'Ramesh' );
4. SELECT * FROM employees WHERE salary = 3000 AND
manager_id = 121;
5. SELECT * FROM employees WHERE employee_id IN (134,159,183);
6. SELECT * FROM employees WHERE department_id NOT IN (SELECT
department_id FROM departments WHERE manager_id BETWEEN
100 AND 200);
Answer
7. SELECT * FROM employees WHERE employee_id IN (SELECT
employee_id FROM employees WHERE salary = (SELECT
MAX(salary) FROM employees WHERE salary < (SELECT
MAX(salary) FROM employees)));
8. SELECT first_name, last_name, hire_date FROM employees
WHERE department_id = ( SELECT department_id FROM
employees WHERE first_name = 'Clara') AND first_name <>
'Clara';
9. SELECT employee_id, first_name, last_name FROM employees
WHERE department_id IN ( SELECT department_id FROM
employees WHERE first_name LIKE '%T%' );
10. SELECT employee_id, first_name , salary FROM employees
WHERE salary > (SELECT AVG (salary) FROM employees ) AND
department_id IN ( SELECT department_id FROM employees
WHERE first_name LIKE '%J%');
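
The nested-subquery answer for the second-highest salary can be checked on a small table. This is a hedged sketch using Python's sqlite3 module and a cut-down, made-up EMPLOYEES table (only three of the columns from the task).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, first_name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Asha', 9000), (2, 'Clara', 7000), (3, 'Tarun', 9000),
        (4, 'Jaya', 7000), (5, 'Ravi', 5000);
""")

# Innermost query: the highest salary (9000).
# Middle query: the highest salary below that (7000).
# Outer query: everyone who earns exactly that amount.
second_highest = conn.execute("""
    SELECT first_name, salary FROM employees
    WHERE salary = (SELECT MAX(salary) FROM employees
                    WHERE salary < (SELECT MAX(salary) FROM employees))
    ORDER BY employee_id""").fetchall()

print(second_highest)  # [('Clara', 7000), ('Jaya', 7000)]
```

Because the comparison is on the salary value, ties are handled naturally: both employees earning 7000 are returned.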
Query Processing
Query processing is the step-by-step translation of a query written in a
high-level language (such as SQL) into low-level operations that the machine
can understand and execute to perform the action requested by the user.

Steps in Query Processing
• Validate and translate the query
  • Check that the syntax is good.
  • Check that all referenced relations exist.
  • Translate the SQL to relational algebra.
• Optimizer
  • Optimization is the process in which multiple query execution plans for
  satisfying a query are examined and the most efficient plan is
  selected for execution.
  • The database catalog stores the execution plans, and the
  optimizer passes the lowest-cost plan on for execution.
• Evaluation Engine
  • Evaluates the query and displays the result.
Translation Example

Possible SQL Query:
SELECT ename
FROM account
WHERE balance<2500
Possible Relational Algebra Query:
π ename (σ balance<2500 (account))
Tree Representation of Relational Algebra
• A query tree is a tree data structure representing a relational algebra expression.
• The tables of the query are represented as leaf nodes. The relational algebra
operations are represented as the internal nodes. The root represents the query
as a whole.

π ename (σ balance<2500 (account))

π ename
   |
σ balance<2500
   |
account
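
To make the bottom-up evaluation of a query tree concrete, here is a minimal sketch in Python. The `select` and `project` helpers and the sample `account` rows are assumptions made for illustration; a real DBMS evaluates the tree with far more elaborate operators.

```python
# Leaf node: the account relation (made-up rows).
account = [
    {"ename": "Anil", "balance": 1200},
    {"ename": "Bina", "balance": 5000},
    {"ename": "Chen", "balance": 2499},
]

def select(rows, predicate):
    """sigma: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, columns):
    """pi: keep only the named columns (duplicates are not removed here)."""
    return [{c: row[c] for c in columns} for row in rows]

# Evaluate the tree bottom-up: leaf (account) -> sigma -> pi (root).
result = project(select(account, lambda r: r["balance"] < 2500), ["ename"])
print(result)  # [{'ename': 'Anil'}, {'ename': 'Chen'}]
```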
Optimization
• Rule-1: Perform the selection operation first.
  • By doing so, we can reduce the number of records involved in
  the query, rather than using the whole tables throughout the
  query.

• Rule-2: Perform all the projections as early as possible in the
query.
  • This is similar to selection but reduces the number of columns
  in the query.

• Rule-3: Perform the most restrictive joins and selection
operations first.
  • Most restrictive joins and selections means choosing
  those tables and views which will result in comparatively
  fewer records.

Inefficient way:
• ∏ STD_NAME, ADDRESS, AGE, CLASS_NAME, TEACHER_NAME ((STUDENT ⋈ CLASS_ID
CLASS) ⋈ TECH_ID TEACHER)

Efficient way:
• ∏ STD_NAME, ADDRESS, AGE, CLASS_NAME, TEACHER_NAME (STUDENT ⋈ CLASS_ID
(CLASS ⋈ TECH_ID TEACHER))
Evaluation
• In problem design, the relations along with the query
requirement scenario are given to you.
• You have to design the SQL statement and transform it into
equivalent relational algebraic expressions, which are
known as equivalent query plans.
• For each individual query plan, develop the query tree and
apply the query optimization process or algorithm to find
the best plan.
Cont.. Example

Given two relations R(A,B), S(B,C)

SQL> Select A,C
From R,S
Where R.B = S.B
And S.C=3;

Now we convert this SQL into equivalent relational
algebraic expressions.
Cont..

Equivalence plan-01
πAC(σC=3(R⋈S))
Equivalence plan-02
πAC(σC=3(S) ⋈ (R))

Cont.. Design Query Tree for each plan

QUERY TREE-01            QUERY TREE-02

   π AC                     π AC
    |                        |
  σ C=3                      ⋈
    |                       / \
    ⋈                  σ C=3   R
   / \                   |
  R   S                  S
Cont..
• Here both query trees are known as equivalent plans, as
they produce the same result.
• But the processing speed differs between the two query
trees/plans.
• Out of the two query plans, choose the optimal plan, the one that takes
less processing time:
  • Less selection time or search time
  • Less matching time or comparison time
Cont..
• Here each unit of time is considered as one unit of cost.
• So our objective is to reduce the query cost in order to
develop an optimal plan.
• Here we have to apply the query optimization
algorithm, in which two main factors are to be
considered:
  • Search time during the selection operation
  • Comparison time during the join operation
Cont.. Case scenario analysis
• Let relation R contain M tuples and relation S contain N
tuples.
• R and S share a common column with homogeneous values.
• The comparison operation is performed through the
natural join operator, i.e. number of comparisons = M x N.
• Now calculate total CPU time = search time + comparison
time.
How do we select an Optimal plan

Let R(A,B) and S(B,C) be two relations having 100 tuples
each, of which 10 tuples satisfy the imposed condition, i.e.
C=3.
SQL> Select A,C
From R,S
Where R.B = S.B
And S.C=3;
Cont.. Design Query Tree for each plan

QUERY TREE-01                QUERY TREE-02

    π AC                         π AC
     | 10                         |
   σ C=3                      ⋈ 10x100
     |                        /      \
 ⋈ 100x100                10 /        \ 100
   /     \               σ C=3         R
100 /     \ 100             | 100
  R        S                S
Cont..
• Assume one comparison consumes 1 unit time and one
selection (search) step consumes 1 unit time.
• Total minimum CPU time for query plan-01
  • Join comparisons + selection search = 100 x 100 + 100 x 100 = 20,000
  unit time
• Total minimum CPU time for query plan-02
  • Selection search + join comparisons = 100 + 10 x 100 = 1,100 unit time
• From the above computation, we observe that Plan-02
consumes less CPU time and is therefore the better
(optimal) plan compared with Plan-01.
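
The cost comparison can be written as two small functions. This is a sketch of the slide's simplified cost model only (one unit per comparison and per search step); the function names are made up for illustration.

```python
def plan1_cost(m, n):
    """Join first (m x n comparisons), then search the join output,
    which this simplified model also counts as m x n steps."""
    return m * n + m * n

def plan2_cost(m, n, k):
    """Select on S first (n search steps, k tuples survive),
    then join the k survivors with R (k x m comparisons)."""
    return n + k * m

cost1 = plan1_cost(100, 100)      # R and S have 100 tuples each
cost2 = plan2_cost(100, 100, 10)  # 10 tuples of S satisfy C = 3
print(cost1, cost2)  # 20000 1100
```

Pushing the selection below the join shrinks one join input from 100 to 10 tuples, which is where almost all of the saving comes from.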
Task
Consider three relations:
STUDENT (SID, SNAME), COURSE (CID, CNAME),
ASSIGNS (SID, CID)
each having 1000 tuples.

The query is:
Find the CNAME of Mr. X
and 100 tuples satisfy this condition.

Using query optimization techniques, design query
execution plans and investigate the optimization factors to
select the optimized query execution plan.
Transaction
• A transaction is a set of logically related operations. It
contains a group of tasks.

• It generally represents a change in the database.

• All database access operations performed between the
begin-transaction and end-transaction statements are
considered a single logical transaction in DBMS.

• During the transaction the database may be inconsistent. Only once
the transaction is committed does the database move from one
consistent state to another.
States of Transactions
State | Description
Active State | A transaction enters the active state when its execution begins. During this state read or write operations can be performed.
Partially Committed | A transaction goes into the partially committed state after its final operation has been executed.
Committed State | When a transaction is in the committed state, it has already completed its execution successfully, and all of its changes are recorded in the database permanently.
Failed State | A transaction is considered failed when any one of the checks fails or if the transaction is aborted while it is in the active state.
Terminated State | A transaction reaches the terminated state when it leaves the system and cannot be restarted.
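(Figure: transaction state diagram)
Begin → Active → Partially Committed → Committed → Terminated
Active → Failed
Partially Committed → Failed
Failed → Abort → Terminated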
ACID Properties
• Atomicity: A transaction is a single unit of operation. You
either execute it entirely or do not execute it at all. There
cannot be partial execution.
• Consistency: Once the transaction is executed, it should
move from one consistent state to another.
• Isolation: Transaction should be executed in isolation from
other transactions (no Locks). During concurrent transaction
execution, intermediate transaction results from
simultaneously executed transactions should not be made
available to each other. (Level 0,1,2,3)
• Durability: After successful completion of a transaction, the
changes in the database should persist, even in the case of
system failures.
Atomicity
Atomicity involves the following two operations:
• Abort: If a transaction aborts then all the changes made are not
visible.
• Commit: If a transaction commits then all the changes made are
visible.
Example: Let's assume that transaction T consists of two parts, T1
and T2. Account A holds Rs 600 and account B holds Rs 300. T transfers Rs
100 from account A to account B.
After completion of the transaction, A holds Rs 500 and B
holds Rs 400.

T1              T2
Read(A)         Read(B)
A := A - 100    B := B + 100
Write(A)        Write(B)
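
The all-or-nothing behaviour of the transfer can be demonstrated with a short sketch. It assumes Python's sqlite3 module, a made-up `accounts` table, and a `fail_midway` flag that simulates a failure between the debit (T1) and the credit (T2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 600), ("B", 300)])
conn.commit()

def transfer(src, dst, amount, fail_midway=False):
    """Debit src and credit dst as one unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            if fail_midway:
                raise RuntimeError("simulated failure between T1 and T2")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except RuntimeError:
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))

failed = transfer("A", "B", 100, fail_midway=True)
after_failure = balances()   # debit was rolled back: A is still 600
done = transfer("A", "B", 100)
after_success = balances()   # both steps applied: A = 500, B = 400
print(after_failure, after_success)
```

In both outcomes the total stays at Rs 900; the state with A debited but B not yet credited is never made permanent.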
Consistency
• The integrity constraints are maintained so that the database is
consistent before and after the transaction.
• The execution of a transaction will leave a database in either its
prior stable state or a new stable state.
• The consistency property states that every transaction
sees a consistent database instance.
• The transaction is used to transform the database from one
consistent state to another consistent state.
• For example: The total amount must be the same before and
after the transaction.
• Total before T occurs = 600+300=900
• Total after T occurs= 500+400=900
• Therefore, the database is consistent. In the case when T1 is
completed but T2 fails, then inconsistency will occur.
Isolation

• Isolation means that the data used during the execution
of a transaction cannot be used by a second transaction
until the first one is completed.
• If the transaction T1 is being executed and is using
the data item X, then that data item can't be accessed by
any other transaction T2 until the transaction T1 ends.
• The concurrency control subsystem of the DBMS enforces
the isolation property.
Durability

• The durability property indicates the permanence
of the database's consistent state. It states that the
changes made by a committed transaction are permanent.
• They cannot be lost through the erroneous operation of a faulty
transaction or through a system failure. When a transaction is
completed, the database reaches a state known as the
consistent state. That consistent state cannot be lost, even in
the event of a system failure.
• The recovery subsystem of the DBMS is responsible
for the durability property.
Schedule
• A schedule is the process of lining up the transactions and executing them one by
one.
• Scheduling is required when multiple transactions run concurrently
and the order of their operations must be set so that the operations
do not interfere with each other.

Types of schedules:
• Serial
• Parallel / Non-serial
• Serializable
Serial Schedule

• The serial schedule is a type of schedule where one
transaction is executed completely before another
transaction starts.
• In the serial schedule, when the first transaction completes
its cycle, the next transaction is executed.
• For example: Suppose there are two transactions T1 and T2
which have some operations. If there is no interleaving of
operations, then there are the following two possible
outcomes:
• Execute all the operations of T1, followed by all
the operations of T2.
• Execute all the operations of T2, followed by all
the operations of T1.
Parallel / Non-serial Schedule
• If interleaving of operations is allowed, then there will be non-serial
schedule.
• It contains many possible orders in which the system can execute the
individual operations of the transactions.
• In the given figure (c) and (d), Schedule C and Schedule D are the non-
serial schedules. It has interleaving of operations.
Read/Write Conflicts
Serializable schedule
• The serializability of schedules is used to find non-
serial schedules that allow the transaction to
execute concurrently without interfering with one
another.
• It identifies which schedules are correct when
executions of the transaction have interleaving of
their operations.
• A non-serial schedule will be serializable if its result
is equal to the result of its transactions executed
serially.
• The goal is to find a serial schedule that is equivalent
to the given non-serial schedule.
Conflict Serializable
A schedule is called conflict serializable if it can be transformed into a serial
schedule by swapping non-conflicting operations.
• Conflicting operations: Two operations are said to be conflicting if all of
the following conditions hold:
  • They belong to different transactions
  • They operate on the same data item
  • At least one of them is a write operation
• Example:
  • (R1(A), W2(A)) is a conflicting pair because the operations belong to two
  different transactions, act on the same data item A, and one of them is a write
  operation.
  • Similarly, the (W1(A), W2(A)) and (W1(A), R2(A)) pairs are also conflicting.
  • On the other hand, the (R1(A), W2(B)) pair is non-conflicting because the
  operations act on different data items.
  • Similarly, the (W1(A), W2(B)) pair is non-conflicting.
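The three conditions translate directly into a small predicate. This is a sketch; the `Op` tuple and its field names are assumptions chosen for illustration.

```python
from typing import NamedTuple

class Op(NamedTuple):
    kind: str  # "R" for read, "W" for write
    txn: int   # transaction number
    item: str  # data item name

def conflicts(a: Op, b: Op) -> bool:
    """True if the two operations belong to different transactions,
    touch the same data item, and at least one of them is a write."""
    return a.txn != b.txn and a.item == b.item and "W" in (a.kind, b.kind)

print(conflicts(Op("R", 1, "A"), Op("W", 2, "A")))  # True
print(conflicts(Op("R", 1, "A"), Op("W", 2, "B")))  # False
```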
Testing of Serializability
• A serialization graph is used to test the serializability of a
schedule.
• Assume a schedule S. For S, we construct a graph known as the
precedence graph. This graph is a pair G = (V, E), where V
is a set of vertices (one per transaction) and E is a set of edges.

• Create an edge Ti → Tj
  • if Ti executes write(A) before Tj executes read(A).
• Create an edge Ti → Tj
  • if Ti executes read(A) before Tj executes write(A).
• Create an edge Ti → Tj
  • if Ti executes write(A) before Tj executes write(A).
• If the precedence graph contains an edge Ti → Tj, then in any
equivalent serial schedule Ti must appear before
Tj.
• If the precedence graph for schedule S contains a cycle, then S
is non-serializable. If the precedence graph has no cycle,
then S is serializable.
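
The test above can be sketched in a few lines of Python. The schedule encoding as `(transaction, kind, item)` tuples and the two sample schedules are assumptions for illustration and are not taken from the slides.

```python
def precedence_edges(schedule):
    """schedule: list of (txn, kind, item) in execution order.
    Returns the set of edges (Ti, Tj) of the precedence graph."""
    edges = set()
    for i, (ti, kind_i, item_i) in enumerate(schedule):
        for tj, kind_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "W" in (kind_i, kind_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: we returned to a node on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))

def is_conflict_serializable(schedule):
    return not has_cycle(precedence_edges(schedule))

# Hypothetical schedules: R1(A) W2(A) W1(A) and R1(A) W1(A) R2(A) W2(A).
s_bad = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
s_good = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
print(is_conflict_serializable(s_bad), is_conflict_serializable(s_good))  # False True
```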
Example-1
Read(A): In T1, no subsequent writes to A, so
no new edges
Read(B): In T2, no subsequent writes to B, so
no new edges
Read(C): In T3, no subsequent writes to C, so
no new edges
Write(B): B is subsequently read by T3, so add
edge T2 → T3
Write(C): C is subsequently read by T1, so add
edge T3 → T1
Write(A): A is subsequently read by T2, so add
edge T1 → T2
Write(A): In T2, no subsequent reads to A, so
no new edges
Write(C): In T1, no subsequent reads to C, so
no new edges
Write(B): In T3, no subsequent reads to B, so no
new edges
The edges T2 → T3, T3 → T1 and T1 → T2 form a cycle, so
this schedule is not serializable.
Example-2
Read(A): In T4,no subsequent writes to A, so no
new edges
Read(C): In T4, no subsequent writes to C, so no
new edges
Write(A): A is subsequently read by T5, so add
edge T4 → T5
Read(B): In T5,no subsequent writes to B, so no
new edges
Write(C): C is subsequently read by T6, so add
edge T4 → T6
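Write(B): B is subsequently read by T6, so add
edge T5 → T6
Write(C): In T6, no subsequent reads to C, so no
new edges
Write(A): In T5, no subsequent reads to A, so no
new edges
Write(B): In T6, no subsequent reads to B, so no
new edges
The edges T4 → T5, T4 → T6 and T5 → T6 contain no cycle, so
this schedule is serializable (an equivalent serial order is T4, T5, T6).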
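The edge lists derived in Example-1 and Example-2 can be checked with a topological sort, which either yields an equivalent serial order or reports a cycle. This sketch assumes Python 3.9 or later for the standard `graphlib` module; the `serial_order` helper is a made-up name.

```python
from graphlib import TopologicalSorter, CycleError

def serial_order(edges):
    """Return an equivalent serial order, or None if the graph has a cycle."""
    predecessors = {}
    for before, after in edges:
        predecessors.setdefault(after, set()).add(before)
        predecessors.setdefault(before, set())
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None

example_1 = [("T2", "T3"), ("T3", "T1"), ("T1", "T2")]
example_2 = [("T4", "T5"), ("T4", "T6"), ("T5", "T6")]

order_1 = serial_order(example_1)  # None: T2 -> T3 -> T1 -> T2 is a cycle
order_2 = serial_order(example_2)  # ['T4', 'T5', 'T6']
print(order_1, order_2)
```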
View Serializable
• A schedule is called view serializable if it is view equivalent to a serial
schedule (one with no overlapping transactions).
• Every conflict serializable schedule is also view serializable, but a view
serializable schedule that contains blind writes need not be conflict
serializable.

T1 T2 T3 T1 T2 T3

R(a) R(a)

a=a+50 a=a+30
W(a) W(a)

a=a+30 a=a+50
W(a) W(a)

a=a-20 a=a-20
W(a) W(a)
Concurrency Control

Database concurrency
• Concurrency may be defined as the simultaneous execution of
operations by more than one transaction
on a data item in the database.
• Under concurrency control, multiple transactions can be
executed simultaneously.
• This may affect the transaction results.
• It is therefore highly important to maintain
the order of execution
of those transactions.
Problems of concurrency control
• Several problems can occur when concurrent transactions
are executed in an uncontrolled manner.
• The following are the three problems in concurrency control:
  • Lost updates
  • Dirty read
  • Incorrect summary
1. Lost updates
• The lost update problem occurs when two transactions that access the same
database item for update operations are interleaved in a way that makes the
value of that item incorrect.
• If two transactions X and Y read a data item and then update it, then the
effect of the first update will be overwritten by the second
update.
• At time T5, the update of Transaction-X is lost because Transaction-Y
overwrites it without looking at its current value.
• This is known as the Lost Update Problem because the update made
by one transaction is lost.
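
A lost update can be reproduced with a tiny simulation. This is a sketch under simplifying assumptions: a single shared data item, transactions that keep a private local copy between read and write, and made-up amounts.

```python
def run(schedule, initial):
    """Execute (txn, action, amount) steps against one shared item.
    Each transaction works on its own local copy between read and write."""
    db_value = initial
    local = {}
    for txn, action, amount in schedule:
        if action == "read":
            local[txn] = db_value
        elif action == "add":
            local[txn] += amount
        elif action == "write":
            db_value = local[txn]
    return db_value

# X withdraws 50 and Y deposits 100 on a balance of 300.
serial = [("X", "read", 0), ("X", "add", -50), ("X", "write", 0),
          ("Y", "read", 0), ("Y", "add", 100), ("Y", "write", 0)]
interleaved = [("X", "read", 0), ("Y", "read", 0), ("X", "add", -50),
               ("X", "write", 0), ("Y", "add", 100), ("Y", "write", 0)]

print(run(serial, 300))       # 350: both updates are applied
print(run(interleaved, 300))  # 400: X's withdrawal is overwritten by Y
```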
2. Dirty Read
• A dirty read occurs when one transaction updates an item of the
database and then fails for some reason, and another transaction
reads the updated item before the change is committed or rolled back.
• This is known as the Dirty Read Problem because one
transaction reads a dirty value, that is, a value which has not been committed.

T1                  T2
Read_item(x)
x = x - 500
Write_item(x)
                    Read_item(x)
                    x = x + 1000
                    Write_item(x)
(T1 fails here, before it commits)
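3. Incorrect summary
• At the end of each transaction's execution, we may not get
the correct value if the transactions are executed in an
interleaved way. For example, one transaction computes a summary
(such as a total) over several items while another transaction is
updating some of those items.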
Concurrency Control Protocol
• Concurrency control protocols ensure atomicity, isolation,
and serializability of concurrent transactions.
• The concurrency control protocols discussed here fall into the
following categories:
1. Lock based protocol
2. Time-stamp protocol
Binary Lock Protocol
If lock(x) = 1, data item x is locked and other transactions cannot
access it.
If lock(x) = 0, data item x is unlocked and other transactions
can access it.
A transaction T must perform the lock operation before any read/write
operation.

Disadvantage:
1. The DBMS will not allow two transactions to read the same
database object at the same time.
Shared/Exclusive Lock protocol
• A transaction cannot read or write data until it acquires an
appropriate lock on it.

1. Shared lock:
• It is also known as a read-only lock. Under a shared lock, the data item can
only be read by the transaction.
• It can be shared between transactions because a transaction that
holds a shared lock cannot update the data item.

2. Exclusive lock:
• Under an exclusive lock, the data item can be both read and written
by the transaction.
• This lock is exclusive: only one transaction can hold it on a data item at a
time, so multiple transactions cannot modify the same data simultaneously.
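
The shared/exclusive rules amount to a small compatibility table. The sketch below is illustrative only: the `LockTable` class and its methods are made-up names, and a refused request simply returns False instead of queueing the transaction.

```python
# (mode already held, mode requested) -> can both be held at once?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

class LockTable:
    def __init__(self):
        self.held = {}  # item -> list of (txn, mode)

    def request(self, txn, item, mode):
        """Grant the lock if it is compatible with locks held by others."""
        holders = self.held.setdefault(item, [])
        for other, other_mode in holders:
            if other != txn and not COMPATIBLE[(other_mode, mode)]:
                return False  # the requester would have to wait
        holders.append((txn, mode))
        return True

    def release(self, txn, item):
        self.held[item] = [(t, m) for t, m in self.held.get(item, []) if t != txn]

locks = LockTable()
r1 = locks.request("T1", "x", "S")  # granted
r2 = locks.request("T2", "x", "S")  # granted: shared locks coexist
r3 = locks.request("T3", "x", "X")  # refused while readers hold x
locks.release("T1", "x")
locks.release("T2", "x")
r4 = locks.request("T3", "x", "X")  # granted once x is free
print(r1, r2, r3, r4)  # True True False True
```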
Two-phase locking protocol (2PL)
A transaction is said to follow the Two-Phase Locking protocol if its
locking and unlocking are done in two phases.
• Growing Phase: New locks on
data items may be acquired but
none can be released.
• Shrinking Phase: Existing locks
may be released but no new
locks can be acquired.
What is LOCK POINT?
The Point at which the growing phase ends, i.e., when a transaction takes the final lock it
needs to carry on its work.
2-PL ensures serializability, but there are still some drawbacks of 2-PL.
• Cascading Rollback is possible under 2-PL.
• Deadlocks and Starvation are possible.

Because of the dirty reads by T2 and T3 (lines 8 and 12 of the schedule
figure, respectively), when T1 fails the others must be rolled back
as well. Hence, cascading rollbacks are possible in 2-PL.
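
Whether a transaction obeys the two-phase rule, and where its lock point lies, can be checked mechanically. This is a sketch: the encoding of a transaction as a list of `("lock" | "unlock", item)` actions is an assumption for illustration.

```python
def follows_2pl(actions):
    """actions: list of ("lock" | "unlock", item) for one transaction, in order.
    Returns True if no lock is acquired after the first unlock."""
    shrinking = False
    for kind, _item in actions:
        if kind == "unlock":
            shrinking = True
        elif kind == "lock" and shrinking:
            return False
    return True

def lock_point(actions):
    """Index of the last lock acquisition, i.e. the end of the growing phase."""
    positions = [i for i, (kind, _item) in enumerate(actions) if kind == "lock"]
    return positions[-1] if positions else None

two_phase = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
not_two_phase = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]

print(follows_2pl(two_phase), lock_point(two_phase))  # True 1
print(follows_2pl(not_two_phase))                     # False
```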
Categories of Two Phase Locking
• Strict 2-PL
• Rigorous 2-PL
• Conservative 2-PL
• Strict 2-PL :
• This requires that, in addition to the locking being 2-Phase, all
Exclusive(X) locks held by the transaction not be released
until after the Transaction Commits.
Following Strict 2-PL ensures that our schedule is:
• Recoverable
• Cascadeless
• Hence, it gives us freedom from Cascading Abort which was
still there in Basic 2-PL and moreover guarantee Strict
Schedules but still, Deadlocks are possible!
• Rigorous 2-PL
• This requires that, in addition to the locking being 2-Phase, all
Exclusive(X) and Shared(S) locks held by the transaction not be
released until after the Transaction Commits.
Following Rigorous 2-PL ensures that our schedule is:
• Recoverable
• Cascadeless
• Conservative 2-PL
• A Static 2-PL, this protocol requires the transaction to lock
all the items it access before the Transaction begins
execution by pre declaring its read-set and write-set.
• If any of the pre declared items needed cannot be locked,
the transaction does not lock any of the items, instead, it
waits until all the items are available for locking.
• Conservative 2-PL is deadlock free, but it does not
ensure a strict schedule.
Timestamp Ordering Protocol
• The Timestamp Ordering Protocol is used to order the
transactions based on their timestamps.
• The order of the transactions is simply the ascending order
of their creation times.
• The older transaction has higher priority, so it
executes first.
• To determine the timestamp of a transaction, this protocol
uses the system time or a logical counter.
Basic Timestamp ordering protocol
• Here TS(Ti) is the timestamp of transaction Ti, and R_TS(X) and W_TS(X)
are the largest timestamps of any transaction that has read or written X.
• 1. Check the following condition whenever a transaction Ti
issues a Read(X) operation:
  • If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back.
  • Otherwise, the operation is executed and the
  read timestamp of the data item is updated:
  • Set R_TS(X) = max(R_TS(X), TS(Ti))

• 2. Check the following condition whenever a transaction Ti
issues a Write(X) operation:
  • If TS(Ti) < R_TS(X), then the operation is rejected and Ti is rolled back.
  • If TS(Ti) < W_TS(X), then the operation is rejected and Ti is
  rolled back.
  • Otherwise, the operation is executed.
  • Set W_TS(X) = TS(Ti)
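
The read and write checks can be sketched as two functions. The `Item` class and the convention that a return value of False means "operation rejected, transaction rolled back" are assumptions made for illustration.

```python
class Item:
    def __init__(self):
        self.r_ts = 0  # largest timestamp of any transaction that read the item
        self.w_ts = 0  # largest timestamp of any transaction that wrote the item

def read(item, ts):
    """Read(X) by a transaction with timestamp ts."""
    if ts < item.w_ts:
        return False  # a younger transaction already wrote X
    item.r_ts = max(item.r_ts, ts)
    return True

def write(item, ts):
    """Write(X) by a transaction with timestamp ts."""
    if ts < item.r_ts or ts < item.w_ts:
        return False  # a younger transaction already read or wrote X
    item.w_ts = ts
    return True

x = Item()
results = [
    read(x, 5),   # True,  R_TS(X) becomes 5
    write(x, 3),  # False, 3 < R_TS(X) = 5
    write(x, 7),  # True,  W_TS(X) becomes 7
    read(x, 6),   # False, 6 < W_TS(X) = 7
    read(x, 8),   # True,  R_TS(X) becomes 8
]
print(results)
```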
Deadlock in DBMS
• A deadlock is a condition where two or more transactions
are waiting indefinitely for one another to give up locks.
• No task ever gets finished; each remains in the waiting state forever.
Deadlock in DBMS
Conditions under which deadlock occurs:
1. Hold and wait
2. Mutual exclusion
3. No Preemption
4. Circular wait

Two techniques to resolve the deadlock situation:
1. Wait and die
2. Wound and wait
Wait-Die scheme
• If a transaction requests a resource which is already held
with a conflicting lock by another transaction, then the DBMS
simply checks the timestamps of both transactions.
• It allows the older transaction to wait until the resource is
available.

• If TS(Ti) < TS(Tj), i.e. Ti is the older transaction, and Tj
holds some resource that Ti requests, then Ti is allowed to wait until the
data item is available.

• If TS(Ti) < TS(Tj), i.e. Ti is the older transaction, and Ti holds
some resource that Tj requests, then Tj is killed and
restarted later with a random delay but with the same
timestamp.
Wound wait scheme
• In the wound-wait scheme, if the older transaction requests a
resource which is held by the younger transaction, then the
older transaction forces the younger one to abort
and release the resource. After a short delay, the
younger transaction is restarted but with the same
timestamp.

• If the older transaction holds a resource which is
requested by the younger transaction, then the younger
transaction is asked to wait until the older one releases it.
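
The two schemes differ only in what happens for each age relationship between the requester and the lock holder. The sketch below encodes them as two functions; the function names and return strings are made up for illustration, and a smaller timestamp means an older transaction.

```python
def wait_die(ts_requester, ts_holder):
    """Wait-die: an older requester waits; a younger requester dies
    (is rolled back and restarted later with the same timestamp)."""
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    """Wound-wait: an older requester wounds (aborts) the younger holder;
    a younger requester waits for the older holder."""
    return "wound holder" if ts_requester < ts_holder else "wait"

# T1 (timestamp 10) is older than T2 (timestamp 20).
print(wait_die(10, 20), wait_die(20, 10))      # wait die
print(wound_wait(10, 20), wound_wait(20, 10))  # wound holder wait
```

In both schemes it is always the younger transaction that is rolled back, and a restarted transaction keeps its original timestamp, so it eventually becomes the oldest and cannot starve.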
