CS3492 DBMS Notes
OBJECTIVES:
Purpose of Database System – Views of data – Data Models – Database System Architecture –
UNIT II
DATABASE DESIGN 8
Algorithms for Selection, Sorting and join operations – Query optimization using Heuristics -
Cost Estimation.
Purpose of Database System – Views of data – Data Models – Database System Architecture –
Introduction to relational databases – Relational Model – Keys – Relational Algebra – SQL
fundamentals – Advanced SQL features – Embedded SQL – Dynamic SQL
INTRODUCTION
“A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data. The collection of data, usually referred to as the database,
contains information relevant to an enterprise. The primary goal of a DBMS is to provide a
way to store and retrieve database information that is both convenient and efficient.”
Database-System Applications
Databases are widely used. Here are some applications:
▪ Sales: For customer, product, and purchase information.
▪ Accounting: For payments, receipts, account balances, assets and other accounting
information.
▪ Human resources: For information about employees, salaries, payroll taxes, and
benefits, and for generation of paychecks.
▪ Manufacturing: For management of the supply chain and for tracking production of
items in factories, inventories of items in warehouses and stores, and orders for items.
▪ Online retailers: For sales data noted above plus online order tracking, generation of
recommendation lists, and maintenance of online product evaluations.
▪ Banking and Finance
o Banking: For customer information, accounts, loans, and banking transactions.
o Credit card transactions: For purchases on credit cards and generation of
monthly statements.
Data isolation. Because data are scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.
Integrity problems. The data values stored in the database must satisfy certain types of
consistency constraints. Suppose the university maintains an account for each department,
and records the balance amount in each account. Suppose also that the university requires
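that the account balance of a department may never fall below zero. Developers enforce
these constraints in the system by adding appropriate code in the various application
programs. However, when new constraints are added, it is difficult to change all the programs
to enforce them.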
Atomicity problems. A computer system, like any other device, is subject to failure. In
many applications, it is crucial that, if a failure occurs, the data be restored to the
consistent state that existed prior to the failure. Consider a program to transfer $500 from
the account balance of department A to the account balance of department B. If a system
failure occurs during the execution of the program, it is possible that the $500 was
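removed from the balance of department A but was not credited to the balance of department B,
resulting in an inconsistent database state. The transfer must therefore be atomic: it must
happen in its entirety or not at all.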
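A minimal sketch of an atomic transfer, using Python's built-in sqlite3 module as a stand-in DBMS: both updates run inside one transaction, so a failure between the debit and the credit leaves neither change in the database. The table `account`, its columns, the starting balances, and the simulated failure are all hypothetical.

```python
import sqlite3

def setup():
    # In-memory stand-in database with two hypothetical department accounts.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (dept TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount, fail_midway=False):
    """Debit src and credit dst as one transaction (all or nothing)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE dept = ?",
                         (amount, src))
            if fail_midway:
                raise RuntimeError("simulated system failure")
            conn.execute("UPDATE account SET balance = balance + ? WHERE dept = ?",
                         (amount, dst))
    except RuntimeError:
        pass  # by now the debit has already been rolled back

def balances(conn):
    return dict(conn.execute("SELECT dept, balance FROM account ORDER BY dept"))

conn = setup()
transfer(conn, "A", "B", 500, fail_midway=True)
print(balances(conn))  # {'A': 1000, 'B': 200}: the half-done transfer left no trace
transfer(conn, "A", "B", 500)
print(balances(conn))  # {'A': 500, 'B': 700}
```

Because `with conn:` rolls back when an exception escapes the block, the debit from department A is undone by the DBMS itself; the application never has to restore the old balance by hand.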
● Security problems.
VIEWS OF DATA
A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data. A major purpose of a database system is to provide users with an
abstract view of the data. That is, the system hides certain details of how the data are stored
and maintained.
Data Abstraction
Since many database-system users are not computer trained, developers hide the complexity
from users through several levels of abstraction, to simplify users’ interactions with the system:
● Physical level. The lowest level of abstraction describes how the data are actually
stored. The physical level describes complex low-level data structures in detail.
● Logical level. The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. The logical level thus
describes the entire database in terms of a small number of relatively simple structures.
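Although the simple logical-level structures may be implemented with complex physical-level
structures, users of the logical level need not be aware of this complexity. This is referred
to as physical data independence.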
● View level. The highest level of abstraction describes only part of the entire database.
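As a sketch of the view level, a SQL view can expose only part of a table while hiding the rest. The `instructor` table, its columns, its rows, and the view name `instructor_public` are hypothetical; Python's sqlite3 module is used only so the example runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full relation, including a sensitive salary column.
conn.execute("CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [(1, "Asha", "CS", 90000), (2, "Ravi", "EE", 80000)],
)
# View level: users of this view see names and departments, never salaries.
conn.execute("CREATE VIEW instructor_public AS SELECT name, dept FROM instructor")

cursor = conn.execute("SELECT * FROM instructor_public ORDER BY name")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns)  # ['name', 'dept']
print(rows)     # [('Asha', 'CS'), ('Ravi', 'EE')]
```

How the rows are laid out on disk (the physical level) is handled entirely by the DBMS and never appears in this code.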
DATABASE MODELS
A Database model defines the logical design and structure of a database and defines how
data will be stored, accessed and
updated in a database management system.
a. Hierarchical Model
This database model organizes data into a tree-like structure, with a single root, to which all
the other data is linked. The hierarchy starts from the Root data, and expands like a tree,
adding child nodes to the parent nodes. In this model, a child node will only have a single
parent node.
b. Network Model
This is an extension of the Hierarchical model. In this model, data is organized more like a
graph, and a child node is allowed to have more than one parent node. This database model was
used to map many-to-many data relationships.
c. Entity-relationship Model
In this database model, data is described by dividing each real-world object into an entity and
its characteristics into attributes. Different entities are related using relationships.
For example, if we have to design a School database, then Student will be
an entity with attributes name, age, address, etc. As Address is generally complex, it can
be another entity with attributes street name, pincode, city, etc., and there will be a
relationship between them.
d. Relational Model
In this model, data is organized in two-dimensional tables and the relationship is
maintained by storing a common field. The basic structure of data in the relational
model is tables. All the information related to a particular type is stored in rows of that
table. Tables are also known as relations in the relational model.
The DBMS architecture can be classified as: a) 1-tier architecture, b) 2-tier architecture,
c) 3-tier architecture, and d) n-tier architecture.
a) 1-tier architecture
One-tier architecture involves putting all of the required components for a software application
or technology on a single server or platform.
Basically, a one-tier architecture keeps all of the elements of an application, including the
interface, Middleware and back-end data, in one place.
b) 2-tier architecture
The two-tier architecture is based on the client-server model. The client communicates directly
with the database server; there is no intermediate layer between client and server.
Advantages
1. Easy to maintain, and modification is relatively easy.
2. Communication is faster.
Disadvantages
1. In a two-tier architecture, application performance degrades as the number of users increases.
2. Cost-ineffective.
c) 3-tier architecture
A 3-tier architecture separates its tiers from each other based on the complexity of the users
and how they use the data present in the database. It is the most widely used architecture to
design a DBMS.
It can be used in web applications and distributed applications. Its three tiers are:
1. the presentation tier,
2. the logic tier, and
3. the data tier.
Transaction Management
A transaction is a collection of operations that performs a single logical function in a database application.
Transaction-management component ensures that the database remains in a consistent (correct) state despite
system failures (e.g. power failures and operating system crashes) and transaction failures.
Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the
consistency of the database.
Storage Management
• A storage manager is a program module that provides the interface between the low-level data stored
in the database and the application programs and queries submitted to the system.
• The storage manager is responsible for the following tasks:
• Interaction with the file manager
database access.
o The query optimizer then chooses the best way to execute the query, and the optimized code
is sent to the data manager.
3. Data Manager:
o The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
o The main functions of the Data Manager are:
1. Convert operations in the user's queries, coming from the application programs
or from the query processor (the combination of the DML compiler and the query optimizer),
from the user's logical view to the physical file system.
a) Domain constraints
b) Key constraints
c) Referential integrity constraints
a) Domain Constraints
Domain constraints are violated if an attribute value does not appear in the corresponding
domain or is not of the appropriate data type.
Domain constraints specify that, within each tuple, the value of each attribute must be an
atomic value from the domain of that attribute. Domains are specified as data types, which
include the standard data types integers, real numbers, characters, Booleans, variable-length
strings, etc.
Example:
CREATE DOMAIN CustomerName AS VARCHAR(50) CHECK (VALUE IS NOT NULL); -- a base type is required; VARCHAR(50) is illustrative
The example shown demonstrates creating a domain constraint such that CustomerName is not
NULL.
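Not every DBMS supports CREATE DOMAIN (SQLite, used here only to make the sketch runnable, does not), but the same kind of rule can be attached directly to a column with NOT NULL and CHECK. The `customer` table, its `status` column, and the allowed values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column-level equivalents of a domain: NOT NULL plus a CHECK on the allowed values.
conn.execute("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        status        TEXT CHECK (status IN ('Active', 'Inactive'))
    )
""")

def try_insert(row):
    """Return True if the row satisfies the constraints, False if the DBMS rejects it."""
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert((1, "Google", "Active")))   # True
print(try_insert((2, None, "Active")))       # False: NULL name violates NOT NULL
print(try_insert((3, "Apple", "Unknown")))   # False: value outside the CHECK domain
```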
Key constraints
An attribute that can uniquely identify a tuple in a relation is called the key of the table. The
value of the attribute for different tuples in the relation has to be unique.
Example:
In the given table, CustomerID is a key attribute of the Customer table. Each customer has
exactly one CustomerID, and CustomerID = 1 identifies only the customer with CustomerName = "Google".
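A minimal sketch of the DBMS enforcing a key constraint, with SQLite standing in for the DBMS: once CustomerID is declared as the primary key, a second tuple with CustomerID = 1 is rejected. The second customer name ('Amazon') is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CustomerID is declared as the key, so the DBMS enforces uniqueness.
conn.execute("CREATE TABLE customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Google')")

try:
    # A second tuple with CustomerID = 1 violates the key constraint.
    conn.execute("INSERT INTO customer VALUES (1, 'Amazon')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
print(conn.execute("SELECT * FROM customer").fetchall())  # [(1, 'Google')]
```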
a) Insert Operation
The insert operation gives values of the attribute for a new tuple which should be inserted into
a relation.
b) Update Operation
For example, in the Customer relation, the status of the tuple with CustomerName = 'Apple' can
be updated from Inactive to Active.
c) Delete Operation
To specify deletion, a condition on the attributes of the relation selects the tuple to be deleted.
d) Select Operation
● Relational databases can sometimes become complex as the amount of data grows, and
the relations between pieces of data become more complicated.
● Complex relational database systems may lead to isolated databases where the
information cannot be shared from one system to another.
1. Table = Relation
2. Row = Record/Tuple
3. Column = Attribute/Field
Example:
5. Compound key
6. Secondary or Alternative key
7. Non-key attribute
8. Non-prime attribute
9. Foreign key
10. Simple key
11. Artificial key
1) Super keys
Super key is a set of one or more than one columns (attributes) which uniquely identifies
each record in a table. Super key is a super set of candidate key.
For example, Roll No. is unique in the relation, so it can be selected as a super key. We can
also select more than one column as a super key, such as {Roll No., First Name}, because that
set still uniquely identifies each row.
2) Candidate keys
A candidate key is a minimal super key: a set of one or more columns (attributes) that uniquely
identifies each record in a table and contains no redundant attribute, so no column can be
removed from the set without losing uniqueness. The candidate keys are a subset of the super keys.
For example, Roll No. is unique in the relation, so it can be selected as a candidate key. A
candidate key may also consist of more than one column. Unlike the super key {Roll No., First Name}
in the above example, we select only attributes that are needed for uniqueness, such as Course
Code, which has no repeating values in this table.
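The super key and candidate key definitions can be checked mechanically against a table instance. This is a minimal sketch over a hypothetical STUDENT relation (the rows and attribute names are assumptions). Note that one instance can only show that a set is *not* a key; in a real design, keys come from the rules of the enterprise.

```python
from itertools import combinations

# Hypothetical STUDENT relation; attribute names follow the examples above.
students = [
    {"roll_no": 1, "first_name": "Asha", "course_code": "CS01"},
    {"roll_no": 2, "first_name": "Ravi", "course_code": "CS02"},
    {"roll_no": 3, "first_name": "Asha", "course_code": "CS03"},
]
attributes = ["roll_no", "first_name", "course_code"]

def is_super_key(rows, attrs):
    """attrs is a super key if no two rows agree on all of those attributes."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(projected) == len(set(projected))

def candidate_keys(rows, attrs):
    """Candidate keys are minimal super keys: no proper subset is itself a super key."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            # Skip supersets of keys already found; they are not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_super_key(rows, combo):
                keys.append(combo)
    return keys

print(is_super_key(students, ["roll_no", "first_name"]))  # True
print(is_super_key(students, ["first_name"]))             # False: "Asha" repeats
print(candidate_keys(students, attributes))  # [('roll_no',), ('course_code',)]
```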
3) Primary keys
A primary key is the candidate key chosen to uniquely identify records in a relation. Every
table should have a primary key. A good primary key is stable, uses a minimal number of fields,
is always defined (never NULL), and is easy to access.
Roll No. is unique in the above table, so it is selected as the primary key. Course Code could
also have been selected as the primary key.
4) Composite keys
A composite key consists of two or more attributes that together uniquely identify an
occurrence of an entity.
In the above example, Roll No. and Course Code are combined to uniquely identify a record in
the relation.
5) Compound key
Like other keys Compound key is also used to uniquely recognize a record in relation.
This can be an attribute or a set of attributes, but the attributes in the relation cannot be
used as independent keys: used individually, they do not identify a unique record.
7) Non-key Attribute
The attributes other than the candidate key attributes are called non-key attributes.
Example: If we consider Roll No. and Course Code as candidate keys, then First Name of the
Student is a non-key attribute.
8) Non-prime Attribute
A non-prime attribute is an attribute that is not part of any candidate key.
Example: If only Roll No. is considered the key, all the remaining attributes are non-prime
attributes; but if Course Code is also considered a key, then Course Code is not a non-prime
attribute.
9) Foreign keys
A foreign key is an attribute (or set of attributes) in one table that refers to the primary
key of another table. Each foreign key value must match an existing primary key value in that
table (or be NULL).
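A minimal sketch of referential integrity, with SQLite standing in for the DBMS: the hypothetical `employee.dept_id` column references `department.dept_id`, so a row pointing at a non-existent department is rejected. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`; other systems enforce them by default. All table names and rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.execute("CREATE TABLE department (dept_id TEXT PRIMARY KEY, dept_name TEXT)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id TEXT REFERENCES department (dept_id)  -- the foreign key
    )
""")
conn.execute("INSERT INTO department VALUES ('CS', 'Computer Science')")
conn.execute("INSERT INTO employee VALUES (1, 'Alex', 'CS')")  # OK: 'CS' exists

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Maya', 'ME')")  # 'ME' does not exist
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```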
RELATIONAL ALGEBRA
Relational algebra is a procedural query language that works on relational model. The purpose
of a query language is to retrieve data from database or perform various operations such as
insert, update, delete on the data.
On the other hand relational calculus is a non-procedural query language, which means it tells
what data to be retrieved but doesn’t tell how to retrieve it.
Types of operations in relational algebra
1. Basic Operations
2. Derived Operations
Basic/Fundamental Operations:
1. Select (σ)
2. Project (∏)
3. Union (∪)
4. Set Difference (-)
5. Cartesian Product (X)
6. Rename (ρ)
Derived Operations:
1. Natural Join (⋈)
2. Left, Right, Full outer join (⟕, ⟖, ⟗)
3. Intersection (∩)
4. Division (÷)
Select Operator (σ) Example
Table: CUSTOMER
Query:
σ Customer_City="Agra" (CUSTOMER)
Output:
Customer_Id Customer_Name Customer_City
C10100 Steve Agra
C10111 Raghu Agra
Query:
∏ Customer_Name, Customer_City (CUSTOMER)
Output:
Customer_Name Customer_City
Steve Agra
Raghu Agra
Chaitanya Noida
Ajeet Delhi
Carl Delhi
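Select (σ) and project (∏) can be sketched in plain Python by treating a relation as a set of tuples. The rows are those of the CUSTOMER table listed under Rename below; the function names `select` and `project` are my own, not part of any library.

```python
# CUSTOMER relation as a set of tuples (rows from the CUSTOMER table in these notes).
CUSTOMER_ATTRS = ("Customer_Id", "Customer_Name", "Customer_City")
CUSTOMER = {
    ("C10100", "Steve", "Agra"),
    ("C10111", "Raghu", "Agra"),
    ("C10115", "Chaitanya", "Noida"),
    ("C10117", "Ajeet", "Delhi"),
    ("C10118", "Carl", "Delhi"),
}

def select(relation, attrs, predicate):
    """sigma: keep the tuples for which the predicate holds."""
    return {t for t in relation if predicate(dict(zip(attrs, t)))}

def project(relation, attrs, wanted):
    """pi: keep only the wanted attributes; using a set removes duplicate tuples."""
    idx = [attrs.index(a) for a in wanted]
    return {tuple(t[i] for i in idx) for t in relation}

agra = select(CUSTOMER, CUSTOMER_ATTRS, lambda r: r["Customer_City"] == "Agra")
print(sorted(agra))  # [('C10100', 'Steve', 'Agra'), ('C10111', 'Raghu', 'Agra')]

cities = project(CUSTOMER, CUSTOMER_ATTRS, ["Customer_City"])
print(sorted(cities))  # [('Agra',), ('Delhi',), ('Noida',)]
```

Projecting only Customer_City returns three tuples, not five: because a relation is a set, the duplicate cities disappear.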
Table 1: COURSE
S901 Aditya 19
S911 Steve 18
S921 Paul 19
S931 Lucy 17
S941 Carl 16
S951 Rick 18
Table 2: STUDENT
Query:
∏ Student_Name (COURSE) ∪ ∏ Student_Name (STUDENT)
Output:
Student_Name
Aditya
Carl
Paul
Lucy
Rick
Steve
Query:
∏ Student_Name (COURSE) ∩ ∏ Student_Name (STUDENT)
Output:
Student_Name
Aditya
Steve
Paul
Lucy
5. Set Difference (-)
Set difference is denoted by the – symbol. Given two relations R1 and R2, the set difference
R1 – R2 selects all the tuples (rows) that are present in R1 but not present in R2.
Query:
Let’s write a query to select those student names that are present in STUDENT table but not
present in COURSE table.
∏ Student_Name (STUDENT) - ∏ Student_Name (COURSE)
Output:
Student_Name
Carl
Rick
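Union, intersection, and set difference map directly onto Python set operators once both operands are projected onto the same attribute (Student_Name), which is what makes them union-compatible. The STUDENT names come from the rows listed above; the COURSE names are an assumption, chosen to agree with the outputs shown, because the COURSE rows are not reproduced here.

```python
# Student names in each relation. STUDENT comes from the rows listed above; the
# COURSE names are an assumption chosen to be consistent with the outputs shown.
student_names = {"Aditya", "Steve", "Paul", "Lucy", "Carl", "Rick"}
course_names = {"Aditya", "Steve", "Paul", "Lucy"}

union = course_names | student_names          # ∏ Student_Name (COURSE) ∪ ∏ Student_Name (STUDENT)
intersection = course_names & student_names   # ∏ Student_Name (COURSE) ∩ ∏ Student_Name (STUDENT)
difference = student_names - course_names     # ∏ Student_Name (STUDENT) - ∏ Student_Name (COURSE)

print(sorted(union))         # ['Aditya', 'Carl', 'Lucy', 'Paul', 'Rick', 'Steve']
print(sorted(intersection))  # ['Aditya', 'Lucy', 'Paul', 'Steve']
print(sorted(difference))    # ['Carl', 'Rick']
```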
Col_X Col_Y
XX 99
YY 11
ZZ 101
Table 2: S
Query:
Let’s find the Cartesian product of table R and S.
RXS
Output:
Note: The number of rows in the output is always the product of the numbers of rows in the
two tables. In our example, table 1 has 3 rows and table 2 has 3 rows, so the output has
3×3 = 9 rows.
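A minimal sketch of the Cartesian product using `itertools.product`. S holds the rows shown above; the rows and column names of R are hypothetical, since R's contents are not reproduced here.

```python
from itertools import product

# S is the table shown above; R's rows and column names are hypothetical.
R = [("AA", 100), ("BB", 200), ("CC", 300)]   # hypothetical (Col_A, Col_B)
S = [("XX", 99), ("YY", 11), ("ZZ", 101)]     # (Col_X, Col_Y)

# R X S pairs every tuple of R with every tuple of S.
RxS = [r + s for r, s in product(R, S)]

print(len(RxS))  # 9, i.e. 3 x 3
print(RxS[0])    # ('AA', 100, 'XX', 99)
```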
7. Rename (ρ)
Rename (ρ) operation can be used to rename a relation or an attribute of a relation.
Syntax:
ρ(new_relation_name, old_relation_name)
Table: CUSTOMER
Customer_Id Customer_Name Customer_City
C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi
Query:
ρ(CUST_NAMES, ∏(Customer_Name)(CUSTOMER))
Output:
CUST_NAMES
Steve
Raghu
Chaitanya
Ajeet
Carl
8. Joins
Join is a combination of a Cartesian product followed by a selection process. A Join operation
pairs two tuples from different relations, if and only if a given join condition is satisfied.
Types of join
✔ Theta (θ) Join
Theta join combines tuples from different relations provided they satisfy the theta condition.
The join condition is denoted by the symbol θ.
Notation
R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, .., An) and (B1, B2,.. ,Bn) such that the
attributes don’t have anything in common, that is R1 ∩ R2 = Φ.
Student
101 Alex 10
102 Maria 11
Subjects
Class Subject
10 Math
10 English
11 Music
11 Sports
Student_Detail −
STUDENT ⋈Student.Std = Subject.Class SUBJECT
Student_detail
Equijoin
✔ When Theta join uses only equality comparison operator, it is said to be equijoin. The
above example corresponds to equijoin.
Courses
CS01 Database CS
ME01 Mechanics ME
EE01 Electronics EE
HoD
Dept Head
CS Alex
ME Maya
EE Mira
Courses ⋈ HoD
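The theta join, equijoin, and natural join above can be sketched as "Cartesian product, then selection" over lists of tuples. The rows are taken from the Student, Subjects, Courses, and HoD examples; the attribute names SID, Name, Std for Student and CID, Course, Dept for Courses are assumptions, since those headers are not shown.

```python
# Relations from the examples above (attribute names for Student and Courses are assumed).
student = [(101, "Alex", 10), (102, "Maria", 11)]                       # (SID, Name, Std)
subjects = [(10, "Math"), (10, "English"), (11, "Music"), (11, "Sports")]  # (Class, Subject)

def theta_join(r, s, theta):
    """Cartesian product followed by selection on the theta condition."""
    return [a + b for a in r for b in s if theta(a, b)]

# Equijoin: the theta condition uses only '=' (Student.Std = Subjects.Class).
student_detail = theta_join(student, subjects, lambda a, b: a[2] == b[0])
print(len(student_detail))  # 4
print(student_detail[0])    # (101, 'Alex', 10, 10, 'Math')

# Natural join: equate the common attribute Dept and keep only one copy of it.
courses = [("CS01", "Database", "CS"), ("ME01", "Mechanics", "ME"), ("EE01", "Electronics", "EE")]
hod = [("CS", "Alex"), ("ME", "Maya"), ("EE", "Mira")]  # (Dept, Head)
courses_hod = [c + (h[1],) for c in courses for h in hod if c[2] == h[0]]
print(courses_hod[0])  # ('CS01', 'Database', 'CS', 'Alex')
```

The equijoin keeps both Std and Class in its output, while the natural join keeps a single Dept column: that is the practical difference between the two.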
All the tuples from the Left relation, R, are included in the resulting relation. If there are
tuples in R without any matching tuple in the Right relation S, then the S-attributes of the
resulting relation are made NULL.
Left
A B
100 Database
101 Mechanics
102 Electronics
Right
A B
100 Alex
102 Maya
104 Mira
Courses ⟕ HoD
A B C D
All the tuples from the Right relation, S, are included in the resulting relation. If there are
tuples in S without any matching tuple in R, then the R-attributes of resulting relation are
made NULL.
Courses ⟖ HoD
A B C D
All the tuples from both participating relations are included in the resulting relation. If there
are no matching tuples for both relations, their respective unmatched attributes are made
NULL.
Courses ⟗ HoD
A B C D
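The three outer joins can be sketched over the Left and Right relations above, assuming they are joined on attribute A (the join condition is not stated explicitly). Each result tuple has the layout (A, B, C, D), where C and D come from the right relation, and Python's `None` plays the role of NULL.

```python
# Left and Right relations from the example above, joined on attribute A.
left = [(100, "Database"), (101, "Mechanics"), (102, "Electronics")]
right = [(100, "Alex"), (102, "Maya"), (104, "Mira")]

def left_outer(r, s):
    """Every tuple of r appears; unmatched ones get None (NULL) S-attributes."""
    out = []
    for a in r:
        matches = [b for b in s if a[0] == b[0]]
        if matches:
            out.extend(a + b for b in matches)
        else:
            out.append(a + (None, None))
    return out

def right_outer(r, s):
    """Every tuple of s appears; swap the operands, then put the R-attributes first."""
    return [t[2:] + t[:2] for t in left_outer(s, r)]

def full_outer(r, s):
    """Left outer join plus the S-tuples that matched nothing in r."""
    unmatched_s = [(None, None) + b for b in s if all(a[0] != b[0] for a in r)]
    return left_outer(r, s) + unmatched_s

print(left_outer(left, right))
# [(100, 'Database', 100, 'Alex'), (101, 'Mechanics', None, None), (102, 'Electronics', 102, 'Maya')]
print(right_outer(left, right))
# [(100, 'Database', 100, 'Alex'), (102, 'Electronics', 102, 'Maya'), (None, None, 104, 'Mira')]
```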
SQL FUNDAMENTALS
SQL commands are mainly categorized into four categories:
1. DDL – Data Definition Language
2. DQL – Data Query Language
3. DML – Data Manipulation Language
4. DCL – Data Control Language
Example of DQL:
● SELECT – is used to retrieve data from the database.
3. DML (Data Manipulation Language): The SQL commands that deal with the
manipulation of data present in the database belong to DML, or Data Manipulation
Language, and this includes most of the SQL statements.
Examples of DML:
● INSERT – is used to insert data into a table.
● UPDATE – is used to update existing data within a table.
● DELETE – is used to delete records from a database table.
Creating a Database
To create a database in an RDBMS, the create command is used. Following is the syntax,
create database <db_name>
For example, create database test will create a database named test, which will be an empty
schema without any tables.
To create tables in this newly created database, we can again use the create command.
Creating a Table
Create command can also be used to create tables. Now when we create a table, we have to
specify the details of the columns of the tables too. We can specify the names and data types
of various columns in the create command itself.
Most commonly used data types for Table columns
Datatype Use
VARCHAR used for columns which will store character strings (letters and digits), basically a string.
CHAR used for columns which will store fixed-length character values (a single character when no length is given).
TEXT used for columns which will store text that is generally long. For example, if you create
a table for storing profile information of a social networking website, then for the "about me"
section you can have a column of type TEXT.
Example:
SQL> create table bankAccount(id number(3),custname varchar(15),branch varchar(10));
Table created.
ID NUMBER(3)
CUSTNAME VARCHAR2(15)
BRANCH VARCHAR2(10)
Using ALTER command we can add a column to any existing table. Following is the syntax,
ALTER TABLE table_name ADD(column_name datatype);
The ALTER command can also be used to modify the data type of any existing column. Following is
the syntax,
ALTER TABLE table_name MODIFY (column_name datatype);
Table altered.
ID NUMBER(4)
CUSTNAME VARCHAR2(15)
BRANCH VARCHAR2(10)
ID NUMBER(4)
CUSTNAME VARCHAR2(15)
CITY VARCHAR2(10)
ID NUMBER(4)
CUSTNAME VARCHAR2(15)
CITY VARCHAR2(10)
TRUNCATE command
TRUNCATE command removes all the records from a table, but it does not destroy the table's
structure. In some systems (for example, MySQL), TRUNCATE also resets the table's
auto-increment counter. Following is its syntax,
TRUNCATE TABLE table_name;
ID NUMBER(4)
CUSTNAME VARCHAR2(15)
CITY VARCHAR2(10)
DROP command
DROP command completely removes a table from the database. This command will also
destroy the table structure and the data stored in it. Following is its syntax,
DROP TABLE table_name;
RENAME query
The RENAME command is used to set a new name for any existing table. Following is the syntax,
RENAME TABLE old_table_name to new_table_name
(In Oracle, the form is RENAME old_table_name TO new_table_name.)
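A runnable sketch of the DDL commands above, with SQLite standing in for Oracle. SQLite differs in several ways: it has no NUMBER/VARCHAR2 types, no ALTER ... MODIFY, and no TRUNCATE (a DELETE without WHERE plays that role here), and it renames tables with ALTER TABLE ... RENAME TO. The sample row values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CREATE: INTEGER/TEXT stand in for Oracle's NUMBER/VARCHAR2.
conn.execute("CREATE TABLE bankAccount (id INTEGER, custname TEXT, branch TEXT)")

# ALTER ... ADD: SQLite spells it ADD COLUMN and adds one column per statement.
conn.execute("ALTER TABLE bankAccount ADD COLUMN city TEXT")

# RENAME: SQLite uses ALTER TABLE ... RENAME TO.
conn.execute("ALTER TABLE bankAccount RENAME TO acct")

def columns(table):
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

cols = columns("acct")
print(cols)  # ['id', 'custname', 'branch', 'city']

# TRUNCATE equivalent: remove every row but keep the table structure.
conn.execute("INSERT INTO acct VALUES (101, 'Adam', 'Main', 'Delhi')")
conn.execute("DELETE FROM acct")
remaining = conn.execute("SELECT COUNT(*) FROM acct").fetchone()[0]
print(remaining, columns("acct") == cols)  # 0 True

# DROP removes both the structure and the data.
conn.execute("DROP TABLE acct")
print(columns("acct"))  # [] because the table no longer exists
```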
2. DML Command
Using INSERT SQL command
Data Manipulation Language (DML) statements are used for managing data in a database. In Oracle,
DML commands are not auto-committed: changes made by a DML command are not permanent until they
are committed, so they can be rolled back.
INSERT command
Insert command is used to insert data into a table. Following is its general syntax,
INSERT INTO table_name VALUES(data1, data2, ...)
The above SQL query will only insert id and name values in the newly inserted record.
Insert NULL value to a column
Both the statements below will insert NULL value into age column of the student table.
SQL> desc acct;
Name Null? Type
ID NUMBER(4)
CUSTNAME VARCHAR2(15)
CITY VARCHAR2(10)
INSERT – is used to insert data into a table.
SQL> insert into acct
2 values(&id,'&custname','&city');
Enter value for id: 102
Enter value for custname: sreeram
Enter value for city: bangalore
old 2: values(&id,'&custname','&city')
new 2: values(102,'sreeram','bangalore')
1 row created.
SQL> /
Enter value for id: 103
Enter value for custname: mohan
Enter value for city: kerala
old 2: values(&id,'&custname','&city')
new 2: values(103,'mohan','kerala')
1 row created.
SQL> /
Enter value for id: 104
Enter value for custname: setti
Enter value for city: bengal
old 2: values(&id,'&custname','&city')
new 2: values(104,'setti','bengal')
1 row created.
SQL> /
Enter value for id: 105
Enter value for custname: balaji
Enter value for city: delhi
old 2: values(&id,'&custname','&city')
new 2: values(105,'balaji','delhi')
1 row created.
ID CUSTNAME CITY
WHERE is used to add a condition to an SQL query; we will study it in detail soon.
102 Alex 18
103 Abhi 17
1 row updated.
S_ID NAME AGE
101 Adam 15
102 Alex 18
103 Abhi 17
DELETE FROM student;
The above command will delete all the records from the table student.
Delete a particular Record from a Table
In our student table if we want to delete a single record, we can use the WHERE clause to
provide a condition in our DELETE statement.
DELETE FROM student WHERE s_id=103;
The above command will delete the record where s_id is 103 from the table student.
101 Adam 15
102 Alex 18
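The INSERT, UPDATE, and DELETE steps on the student table can be replayed with SQLite as a stand-in. The column names follow the s_id/name/age used above; Abhi's starting age of 16 is hypothetical (only the updated value 17 appears above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (s_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# INSERT: add the three rows (Abhi's starting age is hypothetical).
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(101, "Adam", 15), (102, "Alex", 18), (103, "Abhi", 16)])

# UPDATE with a WHERE clause changes only the matching row.
conn.execute("UPDATE student SET age = 17 WHERE s_id = 103")
after_update = conn.execute("SELECT age FROM student WHERE s_id = 103").fetchone()

# DELETE with a WHERE clause removes only the matching row.
conn.execute("DELETE FROM student WHERE s_id = 103")

conn.commit()  # make the changes permanent
rows = conn.execute("SELECT * FROM student ORDER BY s_id").fetchall()
print(after_update)  # (17,)
print(rows)          # [(101, 'Adam', 15), (102, 'Alex', 18)]
```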
SQL> delete acct where id=103;
1 row deleted.
SQL> commit;
Commit complete.
SQL> select * from acct;
ID CUSTNAME CITY
ROLLBACK command
This command restores the database to the last committed state. It is also used with the SAVEPOINT
command to jump to a savepoint in an ongoing transaction.
Following is rollback command's syntax,
ROLLBACK TO savepoint_name;
SAVEPOINT command
SAVEPOINT command is used to temporarily save a transaction so that you can rollback to
that point whenever required.
SQL>savepoint s1;
Savepoint created.
ID NAME
1 Abhi
2 Adam
4 Alex
Let’s use some SQL queries on the above table and see the results.
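INSERT INTO class VALUES(5, 'Rahul');
COMMIT;
UPDATE class SET name = 'Abhijit' WHERE id = 5;
SAVEPOINT A;
INSERT INTO class VALUES(6, 'Chris');
SAVEPOINT B;
INSERT INTO class VALUES(7, 'Bravo');
SAVEPOINT C;
SELECT * FROM class;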
ID NAME
1 Abhi
2 Adam
4 Alex
5 Abhijit
6 Chris
7 Bravo
Now let's use the ROLLBACK command to roll back the state of data to the savepoint B.
ROLLBACK TO B;
1 Abhi
2 Adam
4 Alex
5 Abhijit
6 Chris
Now let's again use the ROLLBACK command to roll back the state of data to the savepoint A
ROLLBACK TO A;
1 Abhi
2 Adam
4 Alex
5 Abhijit
So now you know how the commands COMMIT, ROLLBACK and SAVEPOINT work.
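The COMMIT / SAVEPOINT / ROLLBACK TO walkthrough on the class table can be replayed with SQLite as a stand-in. Opening the connection with `isolation_level=None` hands transaction control to the explicit BEGIN/COMMIT statements; the explicit BEGIN mirrors Oracle, where a new transaction starts implicitly after COMMIT. In SQLite, ROLLBACK TO undoes work done after the savepoint but keeps the savepoint itself, so rolling back to B and then to A works as in the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE class (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO class VALUES (?, ?)", [(1, "Abhi"), (2, "Adam"), (4, "Alex")])

def ids():
    return [row[0] for row in conn.execute("SELECT id FROM class ORDER BY id")]

conn.execute("BEGIN")
conn.execute("INSERT INTO class VALUES (5, 'Rahul')")
conn.execute("COMMIT")                                   # row 5 is now permanent

conn.execute("BEGIN")
conn.execute("UPDATE class SET name = 'Abhijit' WHERE id = 5")
conn.execute("SAVEPOINT A")
conn.execute("INSERT INTO class VALUES (6, 'Chris')")
conn.execute("SAVEPOINT B")
conn.execute("INSERT INTO class VALUES (7, 'Bravo')")
conn.execute("SAVEPOINT C")
print(ids())  # [1, 2, 4, 5, 6, 7]

conn.execute("ROLLBACK TO B")  # undoes the Bravo insert
print(ids())  # [1, 2, 4, 5, 6]
conn.execute("ROLLBACK TO A")  # undoes the Chris insert; the update before A survives
print(ids())  # [1, 2, 4, 5]
conn.execute("COMMIT")

final_rows = conn.execute("SELECT * FROM class ORDER BY id").fetchall()
print(final_rows)  # [(1, 'Abhi'), (2, 'Adam'), (4, 'Alex'), (5, 'Abhijit')]
```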
Database Querying – Simple Queries, Nested Queries, Sub Queries and Joins
Syntax
SELECT column1, column2, columnN FROM table_name;
Here, column1, column2... are the fields of a table whose values you want to fetch. If you want
to fetch all the fields available in the table, then you can use the following syntax.
SELECT * FROM table_name;
ID NAME SALARY
1 Ramesh 2000.00
2 Khilan 1500.00
3 kaushik 2000.00
4 Chaitali 6500.00
5 Hardik 8500.00
6 Komal 4500.00
7 Muffy 10000.00
If you want to fetch all the fields of the CUSTOMERS table, then you should use the following
query.
SQL> SELECT * FROM CUSTOMERS;
Product Table
Prod_id Prod_Name Quantity Price
Sub Query
If a query contains another query, the query inside the main query is called a sub query and
the main query is known as the parent query. In Oracle, a (non-correlated) sub query is
executed first, and its result is made available to the parent query; then the parent/main
query is executed. Sub queries are very useful for selecting rows from a table with a
condition that depends on the data of the table itself. A sub query can also be called a
nested/inner query.
Syntax
SELECT <column, ...> FROM <table> WHERE expression operator (
SELECT <column, ...> FROM <table> WHERE <condition> );
Or
SELECT Col_name [, Col_name] FROM table1 [,table2] WHERE Col_name OPERATOR (
SELECT Col_name [,Col_name] FROM table1 [,table2] [WHERE] );
STUDENT TABLE
SUBJECT TABLE
A single-row sub query returns exactly one row of results to the parent/main query. It can
use any of the following operators:
● = Equal to
● > Greater than
● < Less than
● >= Greater than or equal to
● <= Less than or equal to
● <> Not equal to
Example
SELECT * FROM employees WHERE salary = (SELECT MIN(salary) FROM employees);
Single Row Sub Query using HAVING Clause
SELECT department_id,
MIN(salary) FROM employees
GROUP BY department_id
HAVING MIN(salary) > ( SELECT MIN(salary)
FROM employees
WHERE department_id = 50);
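Both single-row sub query examples can be tried on a small `employees` table in SQLite. The rows, names, and salaries are hypothetical; department 50 is included so that the HAVING example has something to compare against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, first_name TEXT, "
             "salary INTEGER, department_id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, "Asha", 3000, 50),
    (2, "Ravi", 2500, 50),
    (3, "Meena", 6000, 60),
    (4, "John", 4000, 60),
    (5, "Kiran", 2000, 70),
])

# The inner query returns exactly one value: the lowest salary in the company.
lowest = conn.execute(
    "SELECT first_name FROM employees WHERE salary = (SELECT MIN(salary) FROM employees)"
).fetchall()
print(lowest)  # [('Kiran',)]

# HAVING compares each department's minimum with department 50's minimum (2500).
depts = conn.execute("""
    SELECT department_id, MIN(salary) FROM employees
    GROUP BY department_id
    HAVING MIN(salary) > (SELECT MIN(salary) FROM employees WHERE department_id = 50)
    ORDER BY department_id
""").fetchall()
print(depts)  # [(60, 4000)]
```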
Example
Note: A sub query can also be used in the FROM clause of the main query.
When a sub query is written in the WHERE or HAVING clause of another sub query, it is called a
nested sub query.
SELECT e.first_name,e.salary
FROM employees e WHERE e.manager_id in ( SELECT e.manager_id FROM employees e
WHERE department_id in (select d.department_id
FROM departments d
WHERE d.department_name='Purchasing' ));
STUDENT
COURSE
C_ID C_NAME
C1 DSA
C2 Programming
C3 DBMS
STUDENT_COURSE
S_ID C_ID
S1 C1
S1 C3
S2 C1
S3 C2
S4 C2
S4 C3
IN: If we want to find out the S_IDs of students who are enrolled in C_NAME ‘DSA’ or ‘DBMS’, we
can write it with the help of an independent nested query and the IN operator. From the COURSE
table, we can find out the C_IDs for C_NAME ‘DSA’ or ‘DBMS’, and we can use these C_IDs for
finding S_IDs from the STUDENT_COURSE table.
Note: If we want to find out names of STUDENTs who have either enrolled in ‘DSA’ or
‘DBMS’, it can be done as:
NOT IN: If we want to find out S_IDs of STUDENTs who have neither enrolled in ‘DSA’ nor
in ‘DBMS’, it can be done as:
IN
The innermost query will return a set with members C1 and C3. Second inner query will return
those S_IDs for which C_ID is equal to any member of set (C1 and C3 in this case) which are
S1, S2 and S4. The outermost query will return those S_IDs where S_ID is not a member of
set
(S1, S2 and S4). So it will return S3.
Co-related Nested Queries: In co-related nested queries, the output of inner query depends on
the row which is being currently executed in outer query. e.g.; If we want to find out S_NAME
of STUDENTs who are enrolled in C_ID ‘C1’, it can be done with the help of a co-related
nested query as:
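The IN, NOT IN, and co-related queries described above can be run in SQLite against the COURSE and STUDENT_COURSE rows listed earlier. The STUDENT names (Anu, Bala, Chitra, Dev) are hypothetical because the STUDENT rows are not reproduced here; the S_IDs match the other tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE COURSE (C_ID TEXT, C_NAME TEXT);
    CREATE TABLE STUDENT_COURSE (S_ID TEXT, C_ID TEXT);
    CREATE TABLE STUDENT (S_ID TEXT, S_NAME TEXT);
    INSERT INTO COURSE VALUES ('C1', 'DSA'), ('C2', 'Programming'), ('C3', 'DBMS');
    INSERT INTO STUDENT_COURSE VALUES
        ('S1', 'C1'), ('S1', 'C3'), ('S2', 'C1'), ('S3', 'C2'), ('S4', 'C2'), ('S4', 'C3');
    INSERT INTO STUDENT VALUES ('S1', 'Anu'), ('S2', 'Bala'), ('S3', 'Chitra'), ('S4', 'Dev');
""")

# IN: S_IDs enrolled in DSA or DBMS (independent nested query).
in_ids = conn.execute("""
    SELECT DISTINCT S_ID FROM STUDENT_COURSE
    WHERE C_ID IN (SELECT C_ID FROM COURSE WHERE C_NAME IN ('DSA', 'DBMS'))
    ORDER BY S_ID
""").fetchall()
print(in_ids)  # [('S1',), ('S2',), ('S4',)]

# NOT IN: S_IDs enrolled in neither DSA nor DBMS.
not_in_ids = conn.execute("""
    SELECT S_ID FROM STUDENT
    WHERE S_ID NOT IN (
        SELECT S_ID FROM STUDENT_COURSE
        WHERE C_ID IN (SELECT C_ID FROM COURSE WHERE C_NAME IN ('DSA', 'DBMS')))
""").fetchall()
print(not_in_ids)  # [('S3',)]

# Co-related: the inner query refers to S.S_ID from the current outer row.
c1_names = conn.execute("""
    SELECT S_NAME FROM STUDENT S
    WHERE EXISTS (SELECT * FROM STUDENT_COURSE SC
                  WHERE SC.S_ID = S.S_ID AND SC.C_ID = 'C1')
    ORDER BY S_NAME
""").fetchall()
print(c1_names)  # [('Anu',), ('Bala',)]
```

DISTINCT is needed in the IN query because S1 is enrolled in both C1 and C3 and would otherwise appear twice.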
JOINS IN ORACLE
In Oracle, a join is the most powerful operation for merging information from multiple tables
based on a common field. There are various types of joins, but the INNER JOIN is the most
common of them.
Syntax
SELECT col1, col2, col3...
FROM table_name1,
table_name2
WHERE table_name1.col2 = table_name2.col1;
Types Of Joins
To understand each of the following joins clearly, we assume the following
"CUSTOMER" and "ORDERS" tables:
CREATE TABLE Customer
(
Cust_id Number(10) NOT NULL,
Cust_name varchar2(20),
Country varchar2(20),
Receipt_no Number(10),
Order_id Number(10) NOT NULL
);
Table: CUSTOMER
First of all we will explain the "USING" clause and the "ON" clause.
1. Using Clause
To join a table using the USING Clause we write the following command.
Query
SELECT Cust_id, Cust_name, Country, item_Ordered, Order_date
FROM Customer C JOIN Orders O USING (Order_id);
2. On Clause
Query
Equi Join
An equi join is used to get data from multiple tables by matching the values of their common
columns. It uses the equal ("=") operator.
Example
SELECT Cust_id, Cust_name, item_Ordered, Order_date
FROM Customer C, Orders O WHERE C.Order_id = O.Order_id;
1. Inner Join
An inner join retrieves only the matching records; in other words, it retrieves all the rows
for which there is a match in both tables.
Example
SELECT Cust_id, Cust_name, Country, item_ordered, Order_date
FROM Customer INNER JOIN Orders USING (Order_id);
2. Outer Join
An outer join retrieves the matching records and also the records that have no match in the
other table. It is of the following three types:
1. Left Outer Join
2. Right Outer Join
3. Full Outer Join
Or: Method 2
SELECT Cust_id, Cust_name, Country, item_ordered, Order_date
FROM customer C, Orders O
WHERE C.Order_id = O.Order_id(+);
2. Right Outer Join
A right outer join retrieves all the records from the right-hand table, together with the matching records from the left-hand table.
Example
Method
To retrieve all the records, both matching and unmatched from all the tables then use the FULL
OUTER JOIN.
Example
SELECT Cust_id, Cust_name, Country, item_ordered, Order_date
FROM Customer C FULL OUTER JOIN Orders O ON (C.Order_id = O.Order_id);
2. Non-Equi Join
A Non-Equi join is based on a condition using an operator other than equal to "=".
Example
SELECT Cust_id, Cust_name, Country, Item_ordered, Order_date
FROM Customer C, Orders O WHERE C.Order_id >
O.Order_id;
3. Self-join
When a table is joined to itself, the join is called a self-join.
Example
SELECT C1.Cust_id, C2.Cust_name, C1.Country,
C2.Order_id FROM Customer C1, Customer C2
WHERE C1.Cust_id > C2.Order_id;
Execution of the query with result:
4. Natural Join
A natural join is like an equi-join: it compares all the columns with the same name in both
tables, and each common column appears only once in the result.
Example
SELECT Cust_id, Cust_name, Country, Item_ordered,
Order_date FROM Customer NATURAL JOIN Orders;
Execution of the query with result:
5. Cross Join
This join is a little bit different from the other joins since it generates the Cartesian product of two
tables as in the following:
Syntax
SELECT * FROM table_name1 CROSS JOIN table_name2;
Example
SELECT Cust_id, Cust_name, Country, Item_ordered, Order_date FROM Customer
CROSS JOIN Orders;
Now, let us join these two tables in our SELECT statement as shown below.
SQL> SELECT ID, NAME, AGE, AMOUNT FROM CUSTOMERS, ORDERS WHERE
CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result.
ID NAME AGE AMOUNT
3 kaushik 23 3000
3 kaushik 23 1500
2 Khilan 25 1560
4 Chaitali 25 2060
Here, it is noticeable that the join is performed in the WHERE clause. Several operators can be
used to join tables, such as =, <, >, <>, <=, >=, !=, BETWEEN, LIKE, and NOT. However, the most
common operator is the equal to symbol.
● LEFT JOIN − returns all rows from the left table, even if there are no matches in the right
table.
● RIGHT JOIN − returns all rows from the right table, even if there are no matches in the left
table.
● FULL JOIN − returns all rows from both tables, matching them where possible and filling in NULLs where there is no match.
● SELF JOIN − is used to join a table to itself as if the table were two tables, temporarily
renaming at least one table in the SQL statement.
● CARTESIAN JOIN − returns the Cartesian product of the sets of records from the two or
more joined tables.
The most important and frequently used of the joins is the INNER JOIN. An inner join whose
condition uses only equality is also referred to as an EQUIJOIN.
The INNER JOIN creates a new result table by combining column values of two tables (table1 and
table2) based upon the join-predicate. The query compares each row of table1 with each row of
table2 to find all pairs of rows which satisfy the join-predicate. When the join-predicate is
satisfied, column values for each matched pair of rows of table1 and table2 are combined into a
result row.
Syntax
SELECT table1.column1, table2.column2... FROM
table1 INNER JOIN table2
ON table1.common_field = table2.common_field;
The SQL LEFT JOIN returns all rows from the left table, even if there are no matches in the right table. If the ON clause matches 0 (zero) records in the right table, the join still returns a row in the result, but with NULL in each column from the right table.
In other words, a left join returns all the values from the left table, plus the matched values from the right table, or NULL where no row satisfies the join predicate.
Syntax
SELECT table1.column1, table2.column2... FROM
table1 LEFT JOIN table2
ON table1.common_field = table2.common_field;
Here, the given condition could be any given expression based on your requirement.
Example
Now, let us join these two tables using the LEFT JOIN as follows.
sql> select id, name, amount, date from customers left join orders on customers.id =
orders.customer_id;
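The following sketch runs the LEFT JOIN above with Python's sqlite3 module. It shows the NULLs (None in Python) produced for a customer without orders. The two customers and the single order are hypothetical sample data.

```python
import sqlite3


def left_join_rows():
    """Return the LEFT JOIN result for a tiny hypothetical customers/orders pair."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE orders (oid INTEGER PRIMARY KEY, date TEXT, "
                "customer_id INTEGER, amount INTEGER)")
    con.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ramesh"), (2, "Khilan")])
    # Only Khilan has an order; Ramesh has none.
    con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [(101, "2009-11-20", 2, 1560)])
    rows = con.execute(
        "SELECT id, name, amount, date FROM customers LEFT JOIN orders "
        "ON customers.id = orders.customer_id ORDER BY id").fetchall()
    con.close()
    return rows
```

Ramesh still appears in the result, with None (SQL NULL) in the amount and date columns that come from the right table.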
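The SQL RIGHT JOIN returns all rows from the right table, even if there are no matches in the left table; the columns from the left table are NULL for unmatched rows.
Syntax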
SELECT table1.column1, table2.column2... FROM
table1 RIGHT JOIN table2
ON table1.common_field = table2.common_field;
Table 2 − ORDERS Table is as follows.
Now, let us join these two tables using the RIGHT JOIN as follows.
sql> select id, name, amount, date from customers right join orders on customers.id =
orders.customer_id;
The SQL FULL JOIN combines the results of both left and right outer joins.
The joined table will contain all records from both the tables and fill in NULLs for missing
matches on either side.
Syntax
SELECT table1.column1, table2.column2... FROM
table1 FULL JOIN table2
ON table1.common_field = table2.common_field;
Here, the given condition could be any given expression based on your requirement.
Example
Now, let us join these two tables using FULL JOIN as follows.
sql> select id, name, amount, date from customers full join orders on customers.id =
orders.customer_id;
If your database does not support FULL JOIN (MySQL, for example, does not), you can combine a LEFT JOIN and a RIGHT JOIN with UNION ALL. The second query must keep only the right-table rows that have no match (WHERE customers.id IS NULL). Otherwise every matched row appears twice in the result.
sql> select id, name, amount, date from customers left join orders on customers.id
= orders.customer_id union all select id, name, amount, date from customers right join
orders on customers.id = orders.customer_id where customers.id is null;
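This Python sqlite3 sketch compares the naive UNION ALL emulation with the filtered one. Older SQLite versions do not support RIGHT JOIN, so the right join is written here as "orders LEFT JOIN customers", which returns the same rows. The data is hypothetical. Order 102 refers to a customer that does not exist, so it is the unmatched right-side row.

```python
import sqlite3


def full_join_emulation():
    """Return (naive, fixed) FULL JOIN emulations, each sorted for comparison."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE orders (oid INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)")
    con.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ramesh"), (2, "Khilan")])
    # Order 102 points at customer 9, who is not in customers.
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(101, 2, 1560), (102, 9, 500)])
    left = ("SELECT customers.id, name, amount FROM customers "
            "LEFT JOIN orders ON customers.id = orders.customer_id")
    # orders LEFT JOIN customers plays the role of customers RIGHT JOIN orders.
    right = ("SELECT customers.id, name, amount FROM orders "
             "LEFT JOIN customers ON customers.id = orders.customer_id")
    naive = con.execute(left + " UNION ALL " + right).fetchall()
    # Keep only right-side rows with no matching customer.
    fixed = con.execute(left + " UNION ALL " + right + " WHERE customers.id IS NULL").fetchall()
    con.close()
    # repr() as sort key avoids comparing None with int.
    return sorted(naive, key=repr), sorted(fixed, key=repr)
```

In the naive version, the matched row (2, 'Khilan', 1560) appears twice. The filtered version returns each row once.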
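The SQL SELF JOIN is used to join a table to itself as if the table were two tables, temporarily renaming at least one table in the SQL statement.
Syntax
SELECT a.column_name, b.column_name... FROM table1 a, table1 b
WHERE a.common_field = b.common_field;
Here, the WHERE clause could be any given expression based on your requirement.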
sql> select a.id, b.name, a.salary from customers a, customers b where a.salary <
b.salary;
ID  NAME      SALARY
2   Ramesh    1500.00
2   kaushik   1500.00
1   Chaitali  2000.00
2   Chaitali  1500.00
3   Chaitali  2000.00
6   Chaitali  4500.00
1   Hardik    2000.00
2   Hardik    1500.00
3   Hardik    2000.00
4   Hardik    6500.00
6   Hardik    4500.00
1   Komal     2000.00
2   Komal     1500.00
3   Komal     2000.00
1   Muffy     2000.00
2   Muffy     1500.00
3   Muffy     2000.00
4   Muffy     6500.00
5   Muffy     8500.00
6   Muffy     4500.00
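The self-join above can be reproduced with Python's sqlite3 module. The CUSTOMERS table itself is not shown in these notes, so its rows here are assumed values. They were chosen to match the 20 result rows listed; for example, Khilan has the lowest salary and Muffy the highest.

```python
import sqlite3


def self_join_rows():
    """Return the rows of the a.salary < b.salary self-join on assumed data."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
    # Assumed rows, consistent with the result listed above.
    con.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
        (1, "Ramesh", 2000.0), (2, "Khilan", 1500.0), (3, "kaushik", 2000.0),
        (4, "Chaitali", 6500.0), (5, "Hardik", 8500.0), (6, "Komal", 4500.0),
        (7, "Muffy", 10000.0)])
    # The aliases a and b let the same table appear twice in one query.
    rows = con.execute(
        "SELECT a.id, b.name, a.salary FROM customers a, customers b "
        "WHERE a.salary < b.salary").fetchall()
    con.close()
    return rows
```

Khilan never appears under NAME, because no customer earns less than Khilan.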
The CARTESIAN JOIN or CROSS JOIN returns the Cartesian product of the sets of records from two or more joined tables. Thus, it is equivalent to an inner join whose join condition always evaluates to TRUE, or to a join with no join condition at all.
Syntax
The basic syntax of the CARTESIAN JOIN or the CROSS JOIN is as follows −
SELECT table1.column1, table2.column2...
FROM table1, table2 [, table3 ];
Now, let us join these two tables using CARTESIAN JOIN as follows −
SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS, ORDERS;
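The equivalence stated above can be checked with Python's sqlite3 module. CROSS JOIN, the comma form, and an inner join whose condition is always true all return m × n rows. With the hypothetical 3 customers and 2 orders below, that is 6 rows.

```python
import sqlite3


def cross_join_count():
    """Count rows of three equivalent Cartesian-product queries on sample data."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    con.execute("CREATE TABLE orders (oid INTEGER, amount INTEGER)")
    con.executemany("INSERT INTO customers VALUES (?, ?)",
                    [(1, "Ramesh"), (2, "Khilan"), (3, "kaushik")])
    con.executemany("INSERT INTO orders VALUES (?, ?)", [(100, 1500), (101, 1560)])
    cross = con.execute("SELECT COUNT(*) FROM customers CROSS JOIN orders").fetchone()[0]
    comma = con.execute("SELECT COUNT(*) FROM customers, orders").fetchone()[0]
    # An inner join whose condition is always TRUE.
    always_true = con.execute(
        "SELECT COUNT(*) FROM customers INNER JOIN orders ON 1 = 1").fetchone()[0]
    con.close()
    return cross, comma, always_true
```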
EMBEDDED SQL
The first technique for sending SQL statements to the DBMS is embedded SQL. The SQL standard defines embeddings of SQL in a variety of programming languages such as C, Java, and COBOL.
A language in which SQL queries are embedded is referred to as a host language, and the SQL structures permitted in the host language comprise embedded SQL.
The following techniques are used to embed SQL statements in a host language:
● Embedded SQL statements are processed by a special SQL precompiler. All SQL statements begin with an introducer and end with a terminator, both of which flag the SQL statement for the precompiler. For example, in C the introducer is "EXEC SQL" and the terminator is a semicolon (;).
● Variables from the application program, called host variables, can be used in embedded SQL
statements wherever constants are allowed.
● Queries that return a single row of data are handled with a singleton SELECT statement; this
statement specifies both the query and the host variables in which to return data.
● Queries that return multiple rows of data are handled with cursors. A cursor keeps track of the
current row within a result set. The DECLARE CURSOR statement defines the query, the
OPEN statement begins the query processing, the FETCH statement retrieves successive rows
of data, and the CLOSE statement ends query processing.
● While a cursor is open, positioned update and positioned delete statements can be used to
update or delete the row currently selected by the cursor.
Note: the introducer and terminator vary by language (for example, the Java embedding uses #sql { …. }; ).
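From within a host language, find the names and cities of customers with more than the variable amount dollars in some account.
EXEC SQL
declare c cursor for
select depositor.customer_name, customer_city
from depositor, customer, account
where depositor.customer_name = customer.customer_name
and depositor.account_number = account.account_number
and account.balance > :amount
END_EXEC
The open statement causes the query to be evaluated:
EXEC SQL open c END_EXEC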
The fetch statement causes the values of one tuple in the query result to be placed in host language variables.
EXEC SQL fetch c into :cn, :cc END_EXEC
Repeated calls to fetch get successive tuples in the query result.
A variable called SQLSTATE in the SQL communication area (SQLCA) is set to '02000' to indicate that no more data is available.
The close statement causes the database system to delete the temporary relation that holds the result of the
query.
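Python's sqlite3 module is not embedded SQL: there is no precompiler and no EXEC SQL introducer. Its DB-API cursor does follow the same execute, fetch and close pattern, which makes it a convenient way to try the idea. In this sketch:
● The function parameter amount plays the role of the host variable :amount.
● fetchone() returning None plays the role of SQLSTATE '02000'.
● The two-table schema (customer, account) and its rows are hypothetical simplifications of the bank schema used above.

```python
import sqlite3


def customers_above(amount):
    """Cursor-style loop: names and cities of customers with balance > amount."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customer (customer_name TEXT, customer_city TEXT)")
    con.execute("CREATE TABLE account (account_number TEXT, customer_name TEXT, balance INTEGER)")
    con.executemany("INSERT INTO customer VALUES (?, ?)",
                    [("Hayes", "Harrison"), ("Jones", "Harrison"), ("Smith", "Rye")])
    con.executemany("INSERT INTO account VALUES (?, ?, ?)",
                    [("A-101", "Hayes", 500), ("A-102", "Jones", 900), ("A-201", "Smith", 700)])
    cur = con.cursor()
    # `amount` is passed like a host variable through the ? placeholder.
    cur.execute(
        "SELECT DISTINCT c.customer_name, c.customer_city "
        "FROM customer c JOIN account a ON c.customer_name = a.customer_name "
        "WHERE a.balance > ? ORDER BY c.customer_name", (amount,))
    results = []
    while True:
        row = cur.fetchone()          # like: fetch c into :cn, :cc
        if row is None:               # like: SQLSTATE = '02000', no more data
            break
        cn, cc = row
        results.append((cn, cc))
    cur.close()                       # like: close c
    con.close()
    return results
```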
DYNAMIC SQL
Dynamic SQL is a way of programming SQL in which the queries are built dynamically while the application runs, instead of being fixed in the program text.
It helps in building large applications that must construct queries and transactions at run time.
With dynamic SQL we are free to create flexible SQL queries, and the names of the variables or any other parameters are passed when the application runs. It allows programs to construct and submit SQL queries at run time. We can also use stored procedures to create dynamic queries that run when we need them.
Static SQL is not altered from one execution to the next, but in the case of dynamic SQL, we can alter the query in each execution.
Why do we need Dynamic SQL?
When we need to run dynamic queries on our database, mainly DML queries.
When we need to access an object which is not in existence during the compile time.
When we need to perform operations on application fed data using invoker rights.
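char * sqlprog = "update account set balance = balance * 1.05 where account_number = ?";
EXEC SQL prepare dynprog from :sqlprog;
char account[10] = "A-101";
EXEC SQL execute dynprog using :account;
The dynamic SQL program contains a ?, which is a placeholder for a value that is provided when the SQL program is executed.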
Dynamic SQL statements can be built at run time and placed in a string host variable. They are
sent to the DBMS for processing. Because the DBMS must generate an access plan at run time
for dynamic SQL statements, dynamic SQL is generally slower than static SQL.
The simplest way to execute a dynamic SQL statement is with an EXECUTE IMMEDIATE
statement. This statement passes the SQL statement to the DBMS for compilation and execution.
One disadvantage of the EXECUTE IMMEDIATE statement is that the DBMS must go through
each of the five steps of processing an SQL statement each time the statement is executed.
To address this situation, dynamic SQL offers an optimized form of execution called prepared
execution, which uses the following steps:
● The program constructs an SQL statement in a buffer, just as it does for the EXECUTE IMMEDIATE statement. Instead of host variables, a question mark (?) can be substituted for a constant anywhere in the statement text to indicate that a value for the constant will be supplied later. The question mark is called a parameter marker.
● The program passes the SQL statement to the DBMS with a PREPARE statement, which requests that the DBMS parse, validate, and optimize the statement and generate an execution plan for it. The program then uses an EXECUTE statement (not an EXECUTE IMMEDIATE statement) to execute the prepared statement at a later time. It passes parameter values for the statement through a special data structure called the SQL Descriptor Area or SQLDA.
● The program can use the EXECUTE statement repeatedly, supplying different parameter
values each time the dynamic statement is executed.
● Prepared execution is still not the same as static SQL. In static SQL, the first four steps of
processing an SQL statement take place at compile time. In prepared execution, these steps
still take place at run time, but they are performed only once; execution of the plan takes
place only when EXECUTE is called. This helps eliminate some of the performance
disadvantages inherent in the architecture of dynamic SQL.
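The parameter-marker idea can be tried with Python's sqlite3 module, which also uses ? as its placeholder. This sketch is an analogy, not the SQLDA-based interface described above. The statement text is fixed once at run time and then executed repeatedly with different parameter values. The account rows are hypothetical. CPython's sqlite3 also keeps a cache of compiled statements keyed by their SQL text (see the cached_statements argument of sqlite3.connect), so repeated executions of the same text can reuse the prepared form.

```python
import sqlite3


def apply_interest(account_numbers):
    """Run one parameterized UPDATE once per account number; return all balances."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance REAL)")
    con.executemany("INSERT INTO account VALUES (?, ?)",
                    [("A-101", 500.0), ("A-102", 400.0), ("A-201", 900.0)])
    # Statement text with ? as the parameter marker.
    sqlprog = "update account set balance = balance * 1.05 where account_number = ?"
    for acct in account_numbers:
        # Same statement text each time; only the parameter value changes.
        con.execute(sqlprog, (acct,))
    con.commit()
    balances = dict(con.execute("SELECT account_number, balance FROM account").fetchall())
    con.close()
    return balances
```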
Difference between Static SQL and Dynamic SQL
Aspect | Static SQL | Dynamic SQL
Efficiency | Static SQL statements are faster and more efficient. | Dynamic SQL statements are less efficient.
Compilation | Static SQL statements are compiled at compile time. | Dynamic SQL statements are compiled at run time.
Use Cases | Static SQL is used in case of uniformly distributed data. | Dynamic SQL is used in case of non-uniformly distributed data.
ENTITY-RELATIONSHIP MODEL
The ER model defines the conceptual view of a database. It works around real-world entities and
the associations among them.
Entity
An entity can be a real-world object that can be easily identifiable. For example, in a school
database, students, teachers, classes, and courses offered can be considered as entities. All these
entities have some attributes or properties that give them their identity.
Attributes
Entities are represented by means of their properties, called attributes. All attributes have values.
For example, a student entity may have name, class, and age as attributes.
Types of Attributes
Simple attribute − Simple attributes are atomic values, which cannot be divided further.
Composite attribute − Composite attributes are made of more than one simple
attribute. For example, a student's complete name may have first_name and last_name.
Derived attribute − Derived attributes are the attributes that do not exist in the physical database, but their values are derived from other attributes present in the database. For example, average_salary in a department should not be saved directly in the database; instead it can be derived. For another example, age can be derived from date_of_birth.
Single-value attribute − Single-value attributes contain a single value. For example, Social_Security_Number.
Multi-value attribute − Multi-value attributes may contain more than one value. For example, a person can have more than one phone number, email_address, etc.
Entity-Set :
An entity set is a collection of similar types of entities. An entity set may contain entities with
attribute sharing similar values. For example, a Students set may contain all the students of a
school; likewise a Teachers set may contain all the teachers of a school from all faculties. Entity
sets need not be disjoint.
Keys :
Key is an attribute or collection of attributes that uniquely identifies an entity among entity set. For
example, the roll_number of a student makes him/her identifiable among students.
Super Key − A set of attributes (one or more) that collectively identifies an entity in an
entity set.
Candidate Key − A minimal super key is called a candidate key. An entity set may have more than one candidate key.
Primary Key − A primary key is one of the candidate keys chosen by the database designer to uniquely identify the entity set.
Relationship
The association among entities is called a relationship. For example, an employee works_at a
department, a student enrolls in a course. Here, Works_at and Enrolls are called relationships.
Relationship Set
A set of relationships of similar type is called a relationship set. Like entities, a relationship too can
have attributes. These attributes are called descriptive attributes.
Degree of Relationship
The number of participating entities in a relationship defines the degree of the relationship.
Binary = degree 2
Ternary = degree 3
n-ary = degree n
Mapping Cardinalities
Mapping cardinality represents the number of entities of another entity set which can be connected to an entity through a relationship set. There are four types:
1. One to one
2. One to many
3. Many to one
4. Many to many
1. One-to-one relationship
An entity in A is associated with at most (only) one entity in B and an entity in B is associated
with at most (only) one entity in A.
A customer is connected with only one loan using the relationship borrower and a loan is
connected with only one customer using borrower.
2. One-to-many relationship
An entity in A is associated with any number (zero or more) of entities in B and an entity in B is
associated with at most one (only) entity in A.
In the one-to-many relationship a loan is connected with only one customer using
borrower and a customer is connected with more than one loan using borrower.
3. Many-to-one relationship
An entity in A is associated with at most (only) one entity in B and an entity in B is associated
with any number (zero or more) of entities in A.
In a many-to-one relationship a loan is connected with more than one customer using borrower
and a customer is connected with only one loan using borrower.
4. Many-to-many relationship
An entity in A is associated with any number (zero or more) of entities in B and an entity in B is
associated with any number (zero or more) of entities in A.
A customer is connected with more than one loan using borrower and a loan is connected with
more than one customer using borrower.
E-R Diagrams
E-R diagram is the short form of “Entity-Relationship” diagram. An E-R diagram efficiently shows
the relationships between various entities stored in a database.
E-R diagrams are used to model real-world objects like a person, a car, a company etc. and the
relation between these real-world objects. An e-r diagram has following features:
E-R diagrams are used to represent the E-R model of a database, which makes them easy to convert into relations (tables).
E-R diagrams serve the purpose of real-world modeling of objects, which makes them highly useful.
These diagrams are very easy to understand and easy to create even by a naive user.
E-R Diagram Example:
EER is a high-level data model that incorporates the extensions to the original ER model.
It is a diagrammatic technique for displaying the following concepts:
Sub Class and Super Class
Specialization and Generalization
Union or Category
Aggregation
These concepts are used when we build an EER schema, and the resulting schema diagrams are called EER diagrams.
The sub class and super class relationship leads to the concept of inheritance.
The relationship between a sub class and its super class is denoted with a special symbol in the EER diagram.
1. Super Class
● Super class is an entity type that has a relationship with one or more subtypes.
● An entity cannot exist in database merely by being member of any super class.
For example: Shape super class is having sub groups as Square, Circle, and Triangle.
2. Sub Class
● Sub class is a group of entities with unique attributes.
● Sub class inherits properties and attributes from its super class.
For example: Square, Circle, Triangle are the sub class of Shape super class.
1. Generalization
● Generalization is the process of combining entities into a generalized entity that contains the properties common to all of them.
● It is a bottom-up approach, in which two lower-level entities combine to form a higher-level entity.
● Generalization is the reverse process of Specialization.
● It defines a general entity type from a set of specialized entity type.
● It minimizes the difference between the entities by identifying the common features.
For example:
In the above example, Tiger, Lion, Elephant can all be generalized as Animals.
2. Specialization
Specialization is a process that defines a group of entities which is divided into sub groups based on their characteristics.
It is a top-down approach, in which one higher-level entity can be broken down into two lower-level entities.
It maximizes the difference between the members of an entity by identifying the unique features of each member.
It defines one or more sub classes for the super class and also forms the superclass/subclass relationship.
For example
In the above example, Employee can be specialized as Developer or Tester, based on what role
they play in an Organization.
C. Category or Union
Category represents a single super class or sub class relationship with more than one
super class.
For example, in car booking, a car owner can be a person, a bank (which holds a possession on a car) or a company. Category (sub class) → Owner is a subset of the union of the three super classes → Company, Bank, and Person. A Category member must exist in at least one of its super classes.
D. Aggregation
Aggregation is a process that represents a relationship between a whole object and its component parts.
In the above example, the relation between College and Course is acting as an Entity in Relation
with Student.
ER-to-Relational Mapping
Although an ER diagram is normally drawn in such a way as to allow easy translation to the relational schema model, this is not an entirely trivial process. The ER diagram represents the conceptual level of database design, while the relational schema is the logical level of the database design.
1. Entities and Simple Attributes:
An entity type within ER diagram is turned into a table. You may preferably keep the same name
for the entity or give it a sensible name but avoid DBMS reserved words as well as avoid the use
of special characters.
Each attribute turns into a column (attribute) in the table. The key attribute of the entity is the
primary key of the table which is usually underlined. It can be composite if required but can
never be null.
It is highly recommended that every table should start with its primary key attribute
conventionally named as TablenameID.
The initial relational schema is expressed in the following format, writing each table name followed by its attribute list inside parentheses, for example:
Persons( personid , name, lastname, email )
2. Multi-Valued Attributes
A multi-valued attribute is usually represented with a double-line oval.
If you have a multi-valued attribute, take the attribute and turn it into a new entity or table of its own.
Then make a 1:N relationship between the new entity and the existing one. In simple words:
1. Create a table for the attribute. 2. Add the primary (id) column of the parent entity as a foreign
key within the new table as shown below:
3. 1:1 Relationships
To keep it simple and even for better performance at data retrieval, I would personally recommend using attributes to represent such a relationship. For instance, let us consider the case where a Person has, or optionally has, one wife. You can place the primary key of the wife within the Persons table; in this case it is called a foreign key, as shown below.
For cases when the Person is not married, i.e. has no wifeID, the attribute can be set to NULL.
4. 1:N Relationships
This is the tricky part! For simplicity, use attributes in the same way as for a 1:1 relationship, but here we have only one choice as opposed to two. For instance, a Person can have zero to many Houses, but a House can have only one Person. To represent such a relationship, the personid of the parent node must be placed within the child table as a foreign key, and not the other way around, as shown next:
It should convert to:
Persons( personid , name, lastname, email )
House ( houseid , num , address, personid )
5. N:N Relationships
We normally use tables to express this type of relationship. The same holds for N-ary relationships in ER diagrams. For instance, a Person can live or work in many countries, and a country can have many people. To express this relationship within a relational schema we use a separate table as shown below:
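The N:N mapping can be sketched with Python's sqlite3 module running the DDL. The notes do not show the original table, so the Country table, the link table Person_Country, their column names and the sample rows are all hypothetical. The link table holds one row per (person, country) pair, and its two foreign keys together form its primary key.

```python
import sqlite3


def countries_of(person_name):
    """Build a hypothetical N:N schema and list the countries linked to one person."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.execute("CREATE TABLE Persons (personid INTEGER PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE Country (countryid INTEGER PRIMARY KEY, name TEXT)")
    # Link table: both columns are foreign keys and together form the primary key.
    con.execute("""CREATE TABLE Person_Country (
                     personid  INTEGER REFERENCES Persons(personid),
                     countryid INTEGER REFERENCES Country(countryid),
                     PRIMARY KEY (personid, countryid))""")
    con.executemany("INSERT INTO Persons VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
    con.executemany("INSERT INTO Country VALUES (?, ?)",
                    [(10, "India"), (20, "UK"), (30, "USA")])
    # Many-to-many: a person links to several countries, and UK links to two people.
    con.executemany("INSERT INTO Person_Country VALUES (?, ?)",
                    [(1, 10), (1, 20), (2, 20), (2, 30)])
    rows = con.execute(
        "SELECT c.name FROM Persons p "
        "JOIN Person_Country pc ON p.personid = pc.personid "
        "JOIN Country c ON c.countryid = pc.countryid "
        "WHERE p.name = ? ORDER BY c.name", (person_name,)).fetchall()
    con.close()
    return [name for (name,) in rows]
```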
Relationship with attributes:
It is recommended to use table to represent them to keep the design tidy and clean regardless of
the cardinality of the relationship.
Case Study
Company( CompanyID , name , address )
Staff( StaffID , dob , address , WifeID )
Child( ChildID , name , StaffID )
Wife( WifeID , name )
Phone( PhoneID , phoneNumber , StaffID )
Task( TaskID , description )
Work( WorkID , CompanyID , StaffID , since )
Perform( PerformID , StaffID , TaskID )
Functional Dependency
A functional dependency is a relationship that exists between two sets of attributes. It typically exists between the primary key and a non-key attribute within a table.
X → Y
The left side of an FD is known as the determinant, and the right side is known as the dependent.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here Emp_Id attribute can uniquely identify the Emp_Name attribute of employee table because
if we know the Emp_Id, we can tell that employee name associated with it.
Trivial functional dependency: A → B is trivial if B is a subset of A.
Example:
1. Consider a table with two columns Employee_Id and Employee_Name.
2. {Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency as Employee_Id is a subset of {Employee_Id, Employee_Name}.
3. Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.
Non-trivial functional dependency: A → B is non-trivial if B is not a subset of A.
Example:
1. ID → Name,
2. Name → DOB
Armstrong's Axioms
If F is a set of functional dependencies then the closure of F, denoted as F+, is the set of all functional dependencies logically implied by F. Armstrong's Axioms are a set of rules that, when applied repeatedly, generate the closure of functional dependencies.
Reflexive rule − If α is a set of attributes and β ⊆ α, then α → β holds.
Augmentation rule − If α → β holds and γ is a set of attributes, then γα → γβ also holds.
Transitivity rule − If α → β holds and β → γ holds, then α → γ also holds.
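These rules can be applied mechanically with the attribute-closure algorithm, a standard technique derived from Armstrong's axioms: X → Y is in F+ exactly when Y is contained in X+, the set of all attributes determined by X. The following is a minimal Python sketch. Representing each FD as a (left side, right side) pair of Python sets is an encoding chosen only for this example.

```python
def attribute_closure(attrs, fds):
    """Return attrs+ : every attribute functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)          # reflexivity: attrs determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # lhs is already determined, so rhs is too (augmentation + transitivity).
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if the FD lhs -> rhs belongs to the closure F+ of fds."""
    return set(rhs) <= attribute_closure(lhs, fds)
```

With the non-trivial FDs above (ID → Name, Name → DOB), the closure of {ID} is {ID, Name, DOB}. So ID → DOB follows by transitivity.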
Non-loss Decomposition
Decomposition in DBMS removes redundancy, anomalies and inconsistencies from a database by dividing a table into multiple tables.
The following are the types:
Lossless Decomposition
Decomposition is lossless if it is feasible to reconstruct relation R from the decomposed tables using joins. This is the preferred choice. No information is lost from the relation when it is decomposed: the join results in the same original relation.
Let us see an example:
<EmpInfo>
<DeptDetails>
Dpt2 E002 HR
Therefore, the above relation had lossless decomposition i.e. no loss of information.
Lossy Decomposition
As the name suggests, when a relation is decomposed into two or more relational schemas, the
loss of information is unavoidable when the original relation is retrieved.
Let us see an example:
<EmpInfo>
Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name
<EmpDetails>
Emp_ID Emp_Name Emp_Age Emp_Location
<DeptDetails>
Dept_ID Dept_Name
Dpt1 Operations
Dpt2 HR
Dpt3 Finance
● Now, you won’t be able to join the above tables, since Emp_ID isn’t part
of the DeptDetails relation.
● Therefore, the above relation has lossy decomposition.
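For a decomposition of R into two relations R1 and R2, a standard test is whether the common attributes R1 ∩ R2 functionally determine all of R1 or all of R2. The Python sketch below applies that test. The FDs Emp_ID → {Emp_Name, Emp_Age, Emp_Location, Dept_ID} and Dept_ID → Dept_Name are assumptions about the EmpInfo relation; the notes do not state them.

```python
def attribute_closure(attrs, fds):
    """Return every attribute determined by attrs under fds ((lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: R1 ∩ R2 must be a super key of R1 or of R2."""
    common = r1 & r2
    determined = attribute_closure(common, fds)
    return r1 <= determined or r2 <= determined
```

The lossy example above has no common attribute at all, so the test fails. Keeping Dept_ID in the employee table makes the common attribute a key of DeptDetails, and the test passes.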
NORMALIZATION
Database Normalization is a technique of organizing the data in the database. Normalization is a systematic approach of decomposing tables to eliminate data redundancy (repetition) and undesirable characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables.
If a table is not properly normalized and has data redundancy, then it will not only eat up extra memory space but will also make it difficult to handle and update the database without facing data loss. Insertion, Updation and Deletion Anomalies are very frequent if the database is not normalized. To understand these anomalies let us take an example of a Student table.
In the table above, we have data of 4 Computer Sci. students. As we can see, data for the fields branch, hod (Head of Department) and office_tel is repeated for the students who are in the same branch in the college; this is Data Redundancy.
1. Insertion Anomaly
Suppose for a new admission, until and unless a student opts for a branch, data of the student cannot
be inserted, or else we will have to set the branch information as NULL.
Also, if we have to insert data of 100 students of same branch, then the branch information will be
repeated for all those 100 students.
These scenarios are nothing but Insertion anomalies.
2. Updation Anomaly
What if Mr. X leaves the college? or is no longer the HOD of computer science department? In
that case all the student records will have to be updated, and if by mistake we miss any record, it
will lead to data inconsistency. This is Updation anomaly.
3. Deletion Anomaly
In our Student table, two different kinds of information are kept together: Student information and Branch information. Hence, at the end of the academic year, if student records are deleted, we will also lose the branch information. This is Deletion anomaly.
Normalization
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It is
also used to eliminate the undesirable characteristics like Insertion, Update and Deletion
Anomalies.
o Normalization divides larger tables into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table.
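Normal Form  Description
1NF  A relation is in 1NF if every attribute contains only atomic (indivisible) values.
2NF  A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF  A relation will be in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on the key.
BCNF  A relation is in BCNF if it is in 3NF and, for every non-trivial functional dependency X → Y, X is a super key.
4NF  A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
5NF  A relation is in 5NF if it is in 4NF, does not contain any join dependency other than those implied by its candidate keys, and joining is lossless.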
14 John 7272826385, 9064738238 UP
The decomposition of the EMPLOYEE table into 1NF has been shown below:
14 John 7272826385 UP
14 John 9064738238 UP
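The 1NF step above can be sketched in Python: each comma-separated phone list is split so that every row holds a single atomic value. The header of the original EMPLOYEE table is not shown here, so the column order (id, name, phone list, state) is an assumption.

```python
def to_1nf(rows):
    """Split a comma-separated multi-valued column into one row per value.

    Each input row is assumed to be (emp_id, name, phones, state).
    """
    flat = []
    for emp_id, name, phones, state in rows:
        for phone in phones.split(","):
            # One output row per phone number; other columns are repeated.
            flat.append((emp_id, name, phone.strip(), state))
    return flat
```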
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.
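TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, the non-prime attribute TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. This partial dependency violates 2NF, so we decompose the table into two tables: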
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
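The 2NF decomposition can be checked by joining the two new tables back on TEACHER_ID. The join should return exactly the rows of the original TEACHER table, while each teacher's age is now stored only once. The sketch uses Python's sqlite3 module with the data from the tables above.

```python
import sqlite3

# Rows of the original TEACHER table: (TEACHER_ID, SUBJECT, TEACHER_AGE).
TEACHER = [(25, "Chemistry", 30), (25, "Biology", 30), (47, "English", 35),
           (83, "Math", 38), (83, "Computer", 38)]


def rejoin_2nf():
    """Project TEACHER into the two 2NF tables, then join them back."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TEACHER_DETAIL (TEACHER_ID INTEGER PRIMARY KEY, TEACHER_AGE INTEGER)")
    con.execute("CREATE TABLE TEACHER_SUBJECT (TEACHER_ID INTEGER, SUBJECT TEXT, "
                "PRIMARY KEY (TEACHER_ID, SUBJECT))")
    # The set comprehension removes the repeated (id, age) pairs.
    con.executemany("INSERT INTO TEACHER_DETAIL VALUES (?, ?)",
                    sorted({(t, a) for t, _, a in TEACHER}))
    con.executemany("INSERT INTO TEACHER_SUBJECT VALUES (?, ?)",
                    [(t, s) for t, s, _ in TEACHER])
    rows = con.execute(
        "SELECT s.TEACHER_ID, s.SUBJECT, d.TEACHER_AGE FROM TEACHER_SUBJECT s "
        "JOIN TEACHER_DETAIL d ON s.TEACHER_ID = d.TEACHER_ID").fetchall()
    detail_rows = con.execute("SELECT COUNT(*) FROM TEACHER_DETAIL").fetchone()[0]
    con.close()
    return rows, detail_rows
```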
A relation is in third normal form if it is in 2NF and at least one of the following conditions holds for every non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.
That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
EMPLOYEE table:
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
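The 3NF condition can also be checked mechanically. This Python sketch takes three inputs:
● the relation's attributes;
● its FDs, as (left side, right side) pairs of sets;
● its candidate keys, supplied by hand rather than computed, to keep the sketch short.
It reports each (X, A) pair where X is not a super key and A is non-prime. The EMPLOYEE_DETAIL header is not shown here, so the example uses only the four attributes named in the text.

```python
def attribute_closure(attrs, fds):
    """Return every attribute determined by attrs under fds ((lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(relation, fds, candidate_keys):
    """Return (lhs, attr) pairs breaking 3NF: lhs is not a super key and attr is non-prime."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_superkey = relation <= attribute_closure(lhs, fds)
        for attr in sorted(rhs - lhs):      # ignore the trivial part of the FD
            if not is_superkey and attr not in prime:
                bad.append((frozenset(lhs), attr))
    return bad
```

For EMPLOYEE_DETAIL, EMP_ZIP → {EMP_STATE, EMP_CITY} is reported. After the split, neither EMPLOYEE nor EMPLOYEE_ZIP has a violation.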
o For BCNF, the table should be in 3NF, and for every FD, LHS is super key.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
In the above table Functional dependencies are as follows:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The candidate key of this table is {EMP_ID, EMP_DEPT}. The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because left side part of both the functional dependencies is a key.
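A similar Python sketch checks the BCNF condition: every non-trivial FD X → Y must have a super key on the left. It tests only the FDs it is given, which is enough for this example. In general, checking a decomposed table requires the FDs of F+ that apply to that table.

```python
def attribute_closure(attrs, fds):
    """Return every attribute determined by attrs under fds ((lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the left sides of non-trivial FDs whose left side is not a super key."""
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue                         # trivial FD, never a violation
        if not relation <= attribute_closure(lhs, fds):
            bad.append(frozenset(lhs))
    return bad
```

On the original EMPLOYEE table both FDs are reported. On the three decomposed tables, none is.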
Fourth normal form (4NF)
o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o A multi-valued dependency A →→ B exists if, for a single value of A, multiple values of B exist independently of the other attributes of the relation.
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
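The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities; there is no relationship between COURSE and HOBBY. Student 21, for example, has two courses (Computer and Math) and two hobbies (Dancing and Singing), so STU_ID →→ COURSE and STU_ID →→ HOBBY are multi-valued dependencies. To bring the table into 4NF, we decompose it into two tables: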
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Example
In the above table, John takes both Computer and Math class for Semester 1 but he doesn't take
Math class for Semester 2. In this case, the combination of all these fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be taking that subject, so we would have to leave Lecturer and Subject as NULL. But all three columns together act as the primary key, so we cannot leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen