
Query Language


In simple terms, a language used to store and retrieve data from a database is known as a query language. SQL is one example.

There are two types of query language:

1. Procedural query language
2. Non-procedural query language

1. Procedural query language:

In a procedural query language, the user instructs the system to perform a series of operations to produce the desired result. Here the user tells the system what data to retrieve from the database and how to retrieve it.

For example, let's take a real-world analogy. Suppose you ask your younger brother to make a cup of tea. If you only tell him to make tea, without describing the process, that is non-procedural. If you give him the step-by-step process (switch on the stove, boil the water, add milk and so on), that is procedural.

2. Non-procedural query language:

In a non-procedural query language, the user instructs the system to produce the desired result without spelling out the step-by-step process. Here the user tells the system what data to retrieve from the database but not how to retrieve it.

Now let's get back to our main topic: relational algebra and relational calculus.

Relational Algebra:
Relational algebra is a conceptual procedural query language used with the relational model.

Relational Calculus:
Relational calculus is a conceptual non-procedural query language used with the relational model.

Note:
I have used the word conceptual while describing relational algebra and relational calculus because they are theoretical mathematical systems, not practical implementations. SQL is a practical query language based on relational algebra and relational calculus.

Relational Algebra, Calculus, RDBMS & SQL:
Relational algebra and calculus are the theoretical concepts used with the relational model.

An RDBMS is a practical implementation of the relational model.

SQL is a practical language built on relational algebra and calculus.

What is Relational Algebra in DBMS?


Relational algebra is a procedural query language that works on the relational model. The purpose of a query language is to retrieve data from a database or to perform operations such as insert, update and delete on the data. When I say that relational algebra is a procedural query language, I mean that a relational algebra expression states both what data is to be retrieved and how it is to be retrieved, as a sequence of operations.

Relational calculus, on the other hand, is a non-procedural query language: it states what data is to be retrieved but not how to retrieve it. We will discuss relational calculus in a separate tutorial.

Types of operations in relational algebra


We have divided these operations in two categories:
1. Basic Operations
2. Derived Operations

Basic/Fundamental Operations:
1. Select (σ)
2. Project (∏)
3. Union (∪)
4. Set Difference (-)
5. Cartesian product (X)
6. Rename (ρ)

Derived Operations:
1. Natural Join (⋈)
2. Left, Right, Full outer join (⟕, ⟖, ⟗)
3. Intersection (∩)
4. Division (÷)

Let's discuss these operations one by one with the help of examples.

Select Operator (σ)


The select operator is denoted by sigma (σ) and is used to find the tuples (or rows) in a relation (or table) that satisfy a given condition.
If you know a little SQL, you can think of it as the WHERE clause, which serves the same purpose.

Syntax of Select Operator (σ)

σ Condition/Predicate (Relation/Table name)

Select Operator (σ) Example

Table: CUSTOMER

Customer_Id  Customer_Name  Customer_City
-----------  -------------  -------------
C10100       Steve          Agra
C10111       Raghu          Agra
C10115       Chaitanya      Noida
C10117       Ajeet          Delhi
C10118       Carl           Delhi

Query:

σ Customer_City="Agra" (CUSTOMER)

Output:

Customer_Id  Customer_Name  Customer_City
-----------  -------------  -------------
C10100       Steve          Agra
C10111       Raghu          Agra
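
The selection above can be mimicked in a few lines of Python. This is only a sketch: it assumes the relation is modelled as a set of tuples, and the helper name `select` is hypothetical, not part of any library.

```python
# CUSTOMER relation from the example, modelled as a set of
# (Customer_Id, Customer_Name, Customer_City) tuples.
CUSTOMER = {
    ("C10100", "Steve", "Agra"),
    ("C10111", "Raghu", "Agra"),
    ("C10115", "Chaitanya", "Noida"),
    ("C10117", "Ajeet", "Delhi"),
    ("C10118", "Carl", "Delhi"),
}


def select(relation, predicate):
    """Sigma: keep only the tuples for which the predicate is true."""
    return {row for row in relation if predicate(row)}


# sigma Customer_City="Agra" (CUSTOMER); the city sits at index 2
agra_customers = select(CUSTOMER, lambda row: row[2] == "Agra")
print(sorted(agra_customers))
```

The predicate plays the role of the condition written as the subscript of σ, and the result is again a relation (a set of tuples).
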

Project Operator (∏)

The project operator is denoted by the ∏ symbol and is used to select the desired columns (or attributes) from a table (or relation).

The project operator in relational algebra is similar to the column list of a SELECT statement in SQL. One difference: because a relation is a set, projection removes duplicate rows from its result, whereas SQL does so only with SELECT DISTINCT.

Syntax of Project Operator (∏)

∏ column_name1, column_name2, ..., column_nameN (table_name)

Project Operator (∏) Example
In this example, we have a table CUSTOMER with three
columns, we want to fetch only two columns of the table,
which we can do with the help of Project Operator ∏.

Table: CUSTOMER

Customer_Id  Customer_Name  Customer_City
-----------  -------------  -------------
C10100       Steve          Agra
C10111       Raghu          Agra
C10115       Chaitanya      Noida
C10117       Ajeet          Delhi
C10118       Carl           Delhi

Query:

∏ Customer_Name, Customer_City (CUSTOMER)

Output:

Customer_Name  Customer_City
-------------  -------------
Steve          Agra
Raghu          Agra
Chaitanya      Noida
Ajeet          Delhi
Carl           Delhi
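
As a rough illustration, projection can be sketched in Python by picking columns out of each tuple. The helper `project` and the `COLUMNS` tuple are assumptions made for this sketch; storing the result in a set also shows how projection drops duplicate rows.

```python
# CUSTOMER relation from the example, with its column names listed separately.
COLUMNS = ("Customer_Id", "Customer_Name", "Customer_City")
CUSTOMER = {
    ("C10100", "Steve", "Agra"),
    ("C10111", "Raghu", "Agra"),
    ("C10115", "Chaitanya", "Noida"),
    ("C10117", "Ajeet", "Delhi"),
    ("C10118", "Carl", "Delhi"),
}


def project(relation, columns, wanted):
    """Pi: keep only the wanted columns; the set removes duplicate rows."""
    positions = [columns.index(name) for name in wanted]
    return {tuple(row[i] for i in positions) for row in relation}


names_and_cities = project(CUSTOMER, COLUMNS, ["Customer_Name", "Customer_City"])
cities = project(CUSTOMER, COLUMNS, ["Customer_City"])
print(sorted(names_and_cities))
print(sorted(cities))
```

Projecting on Customer_City alone yields three rows rather than five, because Agra and Delhi each appear twice in the table.
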

Union Operator (∪)

The union operator is denoted by the ∪ symbol and is used to select all the rows (tuples) from two tables (relations).

Let's discuss the union operator a bit more. Say we have two relations R1 and R2 that have the same columns (that is, they are union-compatible), and we want to select all the tuples (rows) from both. We can then apply the union operator to these relations.

Note: Rows (tuples) that are present in both tables appear only once in the union. In short, there are no duplicates after the union operation.

Syntax of Union Operator (∪)

table_name1 ∪ table_name2

Union Operator (∪) Example

Table 1: COURSE

Course_Id  Student_Name  Student_Id
---------  ------------  ----------
C101       Aditya        S901
C104       Aditya        S901
C106       Steve         S911
C109       Paul          S921
C115       Lucy          S931

Table 2: STUDENT

Student_Id  Student_Name  Student_Age
----------  ------------  -----------
S901        Aditya        19
S911        Steve         18
S921        Paul          19
S931        Lucy          17
S941        Carl          16
S951        Rick          18

Query:

∏ Student_Name (COURSE) ∪ ∏ Student_Name (STUDENT)

Output:

Student_Name
------------
Aditya
Carl
Paul
Lucy
Rick
Steve
Note: As you can see, there are no duplicate names in the output, even though several names are common to both tables and the COURSE table itself contains the name Aditya twice.
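
The same union can be sketched with Python sets, which discard duplicates in the same way a relation does. The variable names below are illustrative assumptions; the data is taken from the COURSE and STUDENT tables above.

```python
COURSE = [
    ("C101", "Aditya", "S901"),
    ("C104", "Aditya", "S901"),
    ("C106", "Steve", "S911"),
    ("C109", "Paul", "S921"),
    ("C115", "Lucy", "S931"),
]
STUDENT = [
    ("S901", "Aditya", 19),
    ("S911", "Steve", 18),
    ("S921", "Paul", 19),
    ("S931", "Lucy", 17),
    ("S941", "Carl", 16),
    ("S951", "Rick", 18),
]

# Pi Student_Name (COURSE) and Pi Student_Name (STUDENT)
course_names = {name for _, name, _ in COURSE}
student_names = {name for _, name, _ in STUDENT}

# The | operator on sets is the union; duplicates collapse automatically.
all_names = course_names | student_names
print(sorted(all_names))
```
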

Intersection Operator (∩)


The intersection operator is denoted by the ∩ symbol and is used to select the common rows (tuples) from two tables (relations).

Say we have two relations R1 and R2 that have the same columns, and we want to select all the tuples (rows) that are present in both relations. In that case we can apply the intersection operation to the two relations: R1 ∩ R2.

Note: Only those rows that are present in both tables appear in the result set.

Syntax of Intersection Operator (∩)

table_name1 ∩ table_name2

Intersection Operator (∩) Example

Let's take the same example that we used above.

Table 1: COURSE

Course_Id  Student_Name  Student_Id
---------  ------------  ----------
C101       Aditya        S901
C104       Aditya        S901
C106       Steve         S911
C109       Paul          S921
C115       Lucy          S931

Table 2: STUDENT

Student_Id  Student_Name  Student_Age
----------  ------------  -----------
S901        Aditya        19
S911        Steve         18
S921        Paul          19
S931        Lucy          17
S941        Carl          16
S951        Rick          18

Query:

∏ Student_Name (COURSE) ∩ ∏ Student_Name (STUDENT)

Output:

Student_Name
------------
Aditya
Steve
Paul
Lucy

Set Difference (-)

Set difference is denoted by the - symbol. Say we have two relations R1 and R2, and we want to select all the tuples (rows) that are present in relation R1 but not in relation R2. This can be done using the set difference R1 - R2.

Syntax of Set Difference (-)

table_name1 - table_name2

Set Difference (-) Example

Let's take the same tables COURSE and STUDENT that we have seen above.

Query:
Let's write a query to select those student names that are present in the STUDENT table but not in the COURSE table.

∏ Student_Name (STUDENT) - ∏ Student_Name (COURSE)

Output:

Student_Name
------------
Carl
Rick
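
A short Python sketch of the set difference follows, using the projected name sets from the example. It also illustrates why intersection counts as a derived operation: it can be written with set difference alone, as R1 - (R1 - R2). The variable names are assumptions for illustration.

```python
# Projected Student_Name columns of the two example tables.
course_names = {"Aditya", "Steve", "Paul", "Lucy"}
student_names = {"Aditya", "Steve", "Paul", "Lucy", "Carl", "Rick"}

# Pi Student_Name (STUDENT) - Pi Student_Name (COURSE)
not_enrolled = student_names - course_names

# Intersection derived from difference only: R1 - (R1 - R2)
common = student_names - (student_names - course_names)

print(sorted(not_enrolled))
print(sorted(common))
```
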

Cartesian product (X)

The Cartesian product is denoted by the X symbol (usually written ×). Say we have two relations R1 and R2. The Cartesian product of these two relations (R1 X R2) combines each tuple of the first relation R1 with each tuple of the second relation R2. I know it sounds confusing, but an example will make it clear.

Syntax of Cartesian product (X)

R1 X R2

Cartesian product (X) Example

Table 1: R

Col_A Col_B
----- ------
AA 100
BB 200
CC 300
Table 2: S

Col_X Col_Y
----- -----
XX 99
YY 11
ZZ 101
Query:
Let's find the Cartesian product of tables R and S.

R X S

Output:

Col_A  Col_B  Col_X  Col_Y
-----  -----  -----  -----
AA     100    XX     99
AA     100    YY     11
AA     100    ZZ     101
BB     200    XX     99
BB     200    YY     11
BB     200    ZZ     101
CC     300    XX     99
CC     300    YY     11
CC     300    ZZ     101

Note: The number of rows in the output is always the product of the numbers of rows in the two tables. In our example table R has 3 rows and table S has 3 rows, so the output has 3 × 3 = 9 rows.
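
For readers who prefer code, the pairing of every row of R with every row of S can be sketched with `itertools.product` from the Python standard library. The list names are illustrative; the data comes from tables R and S above.

```python
from itertools import product

R = [("AA", 100), ("BB", 200), ("CC", 300)]
S = [("XX", 99), ("YY", 11), ("ZZ", 101)]

# Each output row is a row of R followed by a row of S.
R_x_S = [r + s for r, s in product(R, S)]

for row in R_x_S:
    print(row)
print(len(R_x_S), "rows")
```

The row count is len(R) * len(S), which is why Cartesian products of large tables grow very quickly.
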


Rename (ρ)

The rename (ρ) operation can be used to rename a relation or an attribute of a relation.

Rename (ρ) Syntax:

ρ(new_relation_name, old_relation_name)

Rename (ρ) Example

Say we have a table CUSTOMER. We fetch the customer names and rename the resulting relation to CUST_NAMES.

Table: CUSTOMER

Customer_Id  Customer_Name  Customer_City
-----------  -------------  -------------
C10100       Steve          Agra
C10111       Raghu          Agra
C10115       Chaitanya      Noida
C10117       Ajeet          Delhi
C10118       Carl           Delhi

Query:

ρ(CUST_NAMES, ∏ Customer_Name (CUSTOMER))

Output:

Relation: CUST_NAMES

Customer_Name
-------------
Steve
Raghu
Chaitanya
Ajeet
Carl

Only the relation name changes here; the attribute keeps its name Customer_Name.

What is Relational Calculus?

Relational calculus is a non-procedural query language that tells the system what data to retrieve but not how to retrieve it.

Types of Relational Calculus

1. Tuple Relational Calculus (TRC)

In tuple relational calculus, the variables range over tuples (rows). It is used to select those tuples that satisfy a given condition.

Table: Student

First_Name  Last_Name  Age
----------  ---------  ---
Ajeet       Singh      30
Chaitanya   Singh      31
Rajeev      Bhatia     27
Carl        Pratap     28

Let's write some relational calculus queries.

Query to display the last names of those students whose age is greater than 30:

{ t.Last_Name | Student(t) AND t.Age > 30 }


In the above query you can see two parts separated by the | symbol. The second part is where we define the condition, and in the first part we specify the fields we want to display for the selected tuples.

The result of the above query would be:

Last_Name
---------
Singh

Query to display all the details of the students whose last name is 'Singh':

{ t | Student(t) AND t.Last_Name = 'Singh' }

Output:

First_Name  Last_Name  Age
----------  ---------  ---
Ajeet       Singh      30
Chaitanya   Singh      31
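
Python set comprehensions read much like tuple relational calculus: the part before `for` says what to return, and the `if` part states the condition. The following is a sketch under that analogy only; the `Student` namedtuple is an assumption made for illustration.

```python
from collections import namedtuple

Student = namedtuple("Student", ["First_Name", "Last_Name", "Age"])

STUDENT = [
    Student("Ajeet", "Singh", 30),
    Student("Chaitanya", "Singh", 31),
    Student("Rajeev", "Bhatia", 27),
    Student("Carl", "Pratap", 28),
]

# { t.Last_Name | Student(t) AND t.Age > 30 }
older_than_30 = {t.Last_Name for t in STUDENT if t.Age > 30}

# { t | Student(t) AND t.Last_Name = 'Singh' }
singhs = {t for t in STUDENT if t.Last_Name == "Singh"}

print(older_than_30)
print(sorted(singhs))
```

Neither comprehension says how the rows are scanned, which mirrors the non-procedural flavour of the calculus.
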

2. Domain Relational Calculus (DRC)

In domain relational calculus, the variables range over the values (domains) of individual attributes rather than over whole tuples, and records are filtered by conditions on those values.
Again we take the same table to understand how DRC works.

Table: Student

First_Name  Last_Name  Age
----------  ---------  ---
Ajeet       Singh      30
Chaitanya   Singh      31
Rajeev      Bhatia     27
Carl        Pratap     28

Query to find the first name and age of the students whose age is greater than 27:

{ <First_Name, Age> | ∃ Last_Name (<First_Name, Last_Name, Age> ∈ Student ∧ Age > 27) }

Note:
The symbols used for the logical operators are ∧ for AND, ∨ for OR and ¬ for NOT. The symbol ∃ reads "there exists".

Output:

First_Name Age
---------- ----
Ajeet 30
Chaitanya 31
Carl 28
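
In the same spirit, a domain relational calculus query can be sketched in Python by unpacking each row into one variable per attribute. This is an analogy under the assumption that the relation is a set of plain tuples, not a formal DRC evaluator.

```python
STUDENT = {
    ("Ajeet", "Singh", 30),
    ("Chaitanya", "Singh", 31),
    ("Rajeev", "Bhatia", 27),
    ("Carl", "Pratap", 28),
}

# One variable per attribute (first, last, age) instead of one per tuple.
# The unused variable `last` corresponds to the "there exists" part.
first_name_and_age = {(first, age) for (first, last, age) in STUDENT if age > 27}

print(sorted(first_name_and_age))
```
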
Types of keys in DBMS

Primary Key: A primary key is a column or set of columns in a table that uniquely identifies the tuples (rows) in that table.

Super Key: A super key is a set of one or more columns (attributes) that uniquely identifies the rows in a table.

Candidate Key: A super key with no redundant attribute is known as a candidate key.

Alternate Key: Out of all the candidate keys, only one is selected as the primary key. The remaining keys are known as alternate or secondary keys.

Composite Key: A key that consists of more than one attribute to uniquely identify rows (also known as records or tuples) in a table is called a composite key.

Foreign Key: Foreign keys are the columns of a table that point to the primary key of another table. They act as a cross-reference between tables.

Definition: A primary key is a minimal set of attributes (columns) in a table that uniquely identifies the tuples (rows) in that table.

Primary Key Example in DBMS

Let's take an example to understand the concept of a primary key. In the following table there are three attributes: Stu_Id, Stu_Name & Stu_Age. Out of these three attributes, one attribute or a set of more than one attribute can be the primary key.

Attribute Stu_Name alone cannot be a primary key, as more than one student can have the same name.

Attribute Stu_Age alone cannot be a primary key, as more than one student can have the same age.

Attribute Stu_Id alone is a primary key, as each student has a unique id that identifies the student record in the table.

Note: In some cases a single attribute cannot uniquely identify a record in a table. In that case we try to find a set of attributes that can uniquely identify a row. We will see an example of this after the current one.

Table Name: STUDENT

Stu_Id  Stu_Name  Stu_Age
------  --------  -------
101     Steve     23
102     John      24
103     Robert    28
104     Steve     29
105     Carl      29

Points to Note regarding Primary Key

• We usually denote it by underlining the attribute name (column name).
• The value of the primary key must be unique for each row of the table. The column(s) that make up the key cannot contain duplicate values.
• The attribute(s) marked as the primary key are not allowed to have null values.
• A primary key does not have to be a single attribute (column). It can be a set of more than one attribute. For example, {Stu_Id, Stu_Name} collectively can identify the tuples in the above table, but we do not choose it as the primary key because Stu_Id alone is enough to uniquely identify the rows, and we always go for the minimal set. That said, we should choose more than one column as the primary key only when no single column can uniquely identify the tuples in the table.

Another example of primary key: more than one attribute

Consider the table ORDER, which keeps the daily record of the purchases made by customers. This table has three attributes: Customer_ID, Product_ID & Order_Quantity.

Customer_ID alone cannot be a primary key, as a single customer can place more than one order, so more than one row can have the same Customer_ID value. In the following table, customer id 1011 has placed two orders, for product ids 9023 and 9111.

Product_ID alone cannot be a primary key, as more than one customer can place an order for the same product, so more than one row can have the same product id. In the following table, customer ids 1011 & 1122 placed an order for the same product (product id 9023).

Order_Quantity alone cannot be a primary key, as more than one customer can place an order for the same quantity.

Since none of the attributes alone can be a primary key, let's try to find a set of attributes that can play the role.
{Customer_ID, Product_ID} together identify the rows of the table uniquely, so this set is the primary key for this table.

Table Name: ORDER

Customer_ID  Product_ID  Order_Quantity
-----------  ----------  --------------
1011         9023        10
1122         9023        15
1099         9031        20
1177         9031        18
1011         9111        50

Note: While choosing a set of attributes for a primary key, we always choose the minimal set, that is, the one with the fewest attributes. For example, if two sets can each identify the rows of a table, the set with fewer attributes should be chosen as the primary key.

How to define primary key in RDBMS?

In the above example, we already had a table with data and were trying to understand the purpose and meaning of the primary key. However, you should know that we generally define the primary key during table creation. We can define the primary key later as well, but that rarely happens in real-world scenarios.

Let's say we want to create the table discussed above, with the set of customer id and product id working as the primary key. We can do that in SQL like this (ORDER is a reserved word in SQL, so the table name is written as a quoted identifier):

CREATE TABLE "ORDER"
(
Customer_ID int not null,
Product_ID int not null,
Order_Quantity int not null,
PRIMARY KEY (Customer_ID, Product_ID)
);
Suppose we didn't define the primary key while creating the table. Then we can define it later like this:

CREATE TABLE "ORDER"
(
Customer_ID int not null,
Product_ID int not null
);

ALTER TABLE "ORDER"
ADD CONSTRAINT PK_Order PRIMARY KEY (Customer_ID, Product_ID);

Another way:
When only one attribute is the primary key, as in the first example of the STUDENT table, we can define the key like this as well:

CREATE TABLE STUDENT
(
Stu_Id int primary key,
Stu_Name varchar(255) not null,
Stu_Age int not null
);
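
To see a composite primary key being enforced, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. SQLite is only one possible RDBMS and is an assumption of this sketch; the quantity 99 in the rejected row is a made-up value for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ORDER is a reserved word, so the table name is a quoted identifier.
conn.execute(
    """
    CREATE TABLE "ORDER" (
        Customer_ID    INTEGER NOT NULL,
        Product_ID     INTEGER NOT NULL,
        Order_Quantity INTEGER NOT NULL,
        PRIMARY KEY (Customer_ID, Product_ID)
    )
    """
)
conn.executemany(
    'INSERT INTO "ORDER" VALUES (?, ?, ?)',
    [(1011, 9023, 10), (1122, 9023, 15), (1011, 9111, 50)],
)

# Same (Customer_ID, Product_ID) pair as the first row: must be rejected.
try:
    conn.execute('INSERT INTO "ORDER" VALUES (?, ?, ?)', (1011, 9023, 99))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute('SELECT COUNT(*) FROM "ORDER"').fetchone()[0]
print(duplicate_rejected, row_count)
```

Repeating a customer id or a product id on its own is accepted; only a repeated pair violates the key.
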

Super key in DBMS
BY CHAITANYA SINGH | FILED UNDER: DBMS

Definition of Super Key in DBMS: A super key is a set of one or more attributes (columns) that can uniquely identify a row in a table. DBMS beginners often confuse super keys with candidate keys, so we will also discuss the candidate key and its relation to the super key in this article.

How candidate key is different from super key?

The answer is simple: candidate keys are selected from the set of super keys. The only thing we take care of while selecting a candidate key is that it should not have any redundant attribute. That's why candidate keys are also termed minimal super keys.

Let's take an example to understand this:

Table: Employee

Emp_SSN    Emp_Number  Emp_Name
---------  ----------  ---------
123456789  226         Steve
999999321  227         Ajeet
888997212  228         Chaitanya
777778888  229         Robert

Super keys: The above table has the following super keys. Each of these sets can uniquely identify a row of the Employee table.

• {Emp_SSN}
• {Emp_Number}
• {Emp_SSN, Emp_Number}
• {Emp_SSN, Emp_Name}
• {Emp_SSN, Emp_Number, Emp_Name}
• {Emp_Number, Emp_Name}

Candidate Keys: As mentioned at the beginning, a candidate key is a minimal super key with no redundant attributes. The following two super keys are chosen from the above sets because they contain no redundant attributes.

• {Emp_SSN}
• {Emp_Number}

Only these two sets are candidate keys, as all the other sets contain redundant attributes that are not necessary for unique identification.
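
The "no redundant attribute" rule can be expressed as a minimality test: a super key is a candidate key when no other super key is a proper subset of it. The sketch below applies that test to the six super keys listed above; the function name `candidate_keys` is a hypothetical helper.

```python
SUPER_KEYS = [
    frozenset({"Emp_SSN"}),
    frozenset({"Emp_Number"}),
    frozenset({"Emp_SSN", "Emp_Number"}),
    frozenset({"Emp_SSN", "Emp_Name"}),
    frozenset({"Emp_SSN", "Emp_Number", "Emp_Name"}),
    frozenset({"Emp_Number", "Emp_Name"}),
]


def candidate_keys(super_keys):
    """Keep the super keys that have no smaller super key inside them."""
    return [
        key
        for key in super_keys
        # `other < key` is True when other is a proper subset of key
        if not any(other < key for other in super_keys)
    ]


minimal = candidate_keys(SUPER_KEYS)
print([sorted(key) for key in minimal])
```
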

Super key vs Candidate Key

I have been getting a lot of comments about the confusion between super keys and candidate keys. Let me give you a clear explanation.
1. First, you have to understand that all candidate keys are super keys. This is because the candidate keys are chosen out of the super keys.
2. How do we choose candidate keys from the set of super keys? We look for those keys from which we cannot remove any field. In the above example, we have not chosen {Emp_SSN, Emp_Name} as a candidate key because {Emp_SSN} alone can identify a unique row in the table, so Emp_Name is redundant.

Primary key:
A primary key is selected from the set of candidate keys. This is done by the database admin or database designer. Either {Emp_SSN} or {Emp_Number} can be chosen as the primary key for the table Employee.

Candidate Key in DBMS



Definition of Candidate Key in DBMS: A super key with no redundant attribute is known as a candidate key. Candidate keys are selected from the set of super keys. The only thing we take care of while selecting a candidate key is that it should not have any redundant attributes. That's why candidate keys are also termed minimal super keys.

Candidate Key Example

Let's take an example table "Employee". This table has three attributes: Emp_Id, Emp_Number & Emp_Name. Here Emp_Id & Emp_Number have unique values, while Emp_Name can have duplicate values, as more than one employee can have the same name.

Emp_Id  Emp_Number  Emp_Name
------  ----------  ---------
E01     2264        Steve
E22     2278        Ajeet
E23     2288        Chaitanya
E45     2290        Robert

How many super keys can the above table have?
1. {Emp_Id}
2. {Emp_Number}
3. {Emp_Id, Emp_Number}
4. {Emp_Id, Emp_Name}
5. {Emp_Id, Emp_Number, Emp_Name}
6. {Emp_Number, Emp_Name}

Let's select the candidate keys from the above set of super keys.

1. {Emp_Id}: No redundant attributes.
2. {Emp_Number}: No redundant attributes.
3. {Emp_Id, Emp_Number}: Redundant attribute. Either of these attributes alone is a minimal super key, as both columns have unique values.
4. {Emp_Id, Emp_Name}: Redundant attribute Emp_Name.
5. {Emp_Id, Emp_Number, Emp_Name}: Redundant attributes. Emp_Id or Emp_Number alone is sufficient to uniquely identify a row of the Employee table.
6. {Emp_Number, Emp_Name}: Redundant attribute Emp_Name.

The candidate keys we have selected are:

{Emp_Id}
{Emp_Number}

Note: A primary key is selected from the set of candidate keys. That means we can have either Emp_Id or Emp_Number as the primary key. The decision is made by the DBA (database administrator).

Foreign key in DBMS



Definition: Foreign keys are the columns of a table that point to the primary key of another table. They act as a cross-reference between tables.

For example:
In the example below, the Stu_Id column in the Course_enrollment table is a foreign key, as it points to the primary key of the Student table.

Course_enrollment table:

Course_Id  Stu_Id
---------  ------
C01        101
C02        102
C03        101
C05        102
C06        103
C07        102

Student table:

Stu_Id  Stu_Name   Stu_Age
------  ---------  -------
101     Chaitanya  22
102     Arya       26
103     Bran       25
104     Jon        21

Note: Strictly speaking, a foreign key does not have to reference the column that is declared as the primary key of the other table. If it points to any unique column (a candidate key) of another table, it is still a foreign key. So a more precise definition would be: foreign keys are the columns of a table that point to a candidate key of another table.
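
The cross-reference can be tried out with Python's built-in `sqlite3` module. This is a sketch under two assumptions: SQLite stands in for the RDBMS (it needs `PRAGMA foreign_keys = ON` to enforce foreign keys), and the course id C99 with student id 999 is a made-up row that points to no student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.execute(
    "CREATE TABLE Student ("
    " Stu_Id INTEGER PRIMARY KEY,"
    " Stu_Name TEXT NOT NULL,"
    " Stu_Age INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE Course_enrollment ("
    " Course_Id TEXT NOT NULL,"
    " Stu_Id INTEGER NOT NULL REFERENCES Student(Stu_Id))"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(101, "Chaitanya", 22), (102, "Arya", 26), (103, "Bran", 25), (104, "Jon", 21)],
)

# A valid enrollment: student 101 exists.
conn.execute("INSERT INTO Course_enrollment VALUES (?, ?)", ("C01", 101))

# An enrollment for a student id that does not exist must be rejected.
try:
    conn.execute("INSERT INTO Course_enrollment VALUES (?, ?)", ("C99", 999))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

enrollments = conn.execute("SELECT COUNT(*) FROM Course_enrollment").fetchone()[0]
print(orphan_rejected, enrollments)
```
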
Composite key in DBMS

Definition of Composite key: A key that has more than one attribute is known as a composite key. It is also known as a compound key.

Note: Any key, such as a super key, primary key or candidate key, can be called a composite key if it has more than one attribute.

Composite key Example

Let's consider a table Sales. This table has four columns (attributes): cust_Id, order_Id, product_code & product_count.

Table: Sales

cust_Id  order_Id  product_code  product_count
-------  --------  ------------  -------------
C01      O001      P007          23
C02      O123      P007          19
C02      O123      P230          82
C01      O001      P890          42

None of these columns alone can play the role of a key in this table.

Column cust_Id alone cannot be a key, as the same customer can place multiple orders, so the same customer can have multiple entries.

Column order_Id alone cannot be a primary key, as the same order can contain multiple products, so the same order_Id can be present multiple times.

Column product_code alone cannot be a primary key, as more than one customer can place an order for the same product.

Column product_count alone cannot be a primary key, because two orders can be placed for the same product count.

Based on this, it is safe to assume that the key must have more than one attribute:
Key in the above table: {cust_Id, product_code}

This is a composite key, as it is made up of more than one attribute.

Alternate key in DBMS



As we have seen in the candidate key guide, a table can have multiple candidate keys. Among these candidate keys, only one is selected as the primary key; the remaining keys are known as alternate or secondary keys.

Alternate Key Example

Let's take an example to understand the alternate key concept. Here we have a table Employee with three attributes: Emp_Id, Emp_Number & Emp_Name.

Table: Employee

Emp_Id  Emp_Number  Emp_Name
------  ----------  ---------
E01     2264        Steve
E22     2278        Ajeet
E23     2288        Chaitanya
E45     2290        Robert

There are two candidate keys in the above table:
{Emp_Id}
{Emp_Number}

The DBA (database administrator) can choose either of the above keys as the primary key. Let's say Emp_Id is chosen as the primary key.

Since we have selected Emp_Id as the primary key, the remaining key Emp_Number is called the alternate or secondary key.

Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in Database

Normalization is the process of organizing the data in a database to avoid data redundancy and the insertion, update & deletion anomalies. Let's discuss the anomalies first; then we will discuss the normal forms with examples.

Anomalies in DBMS

There are three types of anomalies that occur when the database is not normalized: insertion, update and deletion anomalies. Let's take an example to understand this.

Example: Suppose a manufacturing company stores the employee details in a table named employee that has four attributes: emp_id for the employee's id, emp_name for the employee's name, emp_address for the employee's address and emp_dept for the department in which the employee works. At some point in time the table looks like this:

emp_id  emp_name  emp_address  emp_dept
------  --------  -----------  --------
101     Rick      Delhi        D001
101     Rick      Delhi        D002
123     Maggie    Agra         D890
166     Glenn     Chennai      D900
166     Glenn     Chennai      D004

The above table is not normalized. Let's look at the problems we face when a table is not normalized.

Update anomaly: In the above table we have two rows for employee Rick, as he belongs to two departments of the company. If we want to update Rick's address, we have to update it in both rows or the data will become inconsistent. If the correct address is updated in one row but not in the other, then according to the database Rick has two different addresses, which is incorrect.

Insert anomaly: Suppose a new employee joins the company who is under training and not yet assigned to any department. We would not be able to insert that employee's data into the table if the emp_dept field doesn't allow nulls.

Delete anomaly: Suppose that at some point the company closes the department D890. Deleting the rows that have emp_dept as D890 would also delete the information of employee Maggie, since she is assigned only to this department.

To overcome these anomalies we need to normalize the data. In the next section we will discuss normalization.

Normalization

Here are the most commonly used normal forms:

• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce & Codd normal form (BCNF)

First normal form (1NF)

As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It should hold only atomic values.

Example: Suppose a company wants to store the names and contact details of its employees. It creates a table that looks like this:

emp_id  emp_name  emp_address  emp_mobile
------  --------  -----------  ----------------------
101     Herschel  New Delhi    8912312390
102     Jon       Kanpur       8812121212, 9900012222
103     Ron       Chennai      7778881212
104     Lester    Bangalore    9990000123, 8123450987

Two employees (Jon & Lester) have two mobile numbers each, so the company stored both numbers in the same field, as you can see in the table above.

This table is not in 1NF: the rule says "each attribute of a table must have atomic (single) values", and the emp_mobile values for employees Jon & Lester violate that rule.

To make the table comply with 1NF, we should store the data like this:

emp_id  emp_name  emp_address  emp_mobile
------  --------  -----------  ----------
101     Herschel  New Delhi    8912312390
102     Jon       Kanpur       8812121212
102     Jon       Kanpur       9900012222
103     Ron       Chennai      7778881212
104     Lester    Bangalore    9990000123
104     Lester    Bangalore    8123450987
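
The conversion to 1NF amounts to emitting one row per mobile number. A minimal Python sketch, assuming the multi-valued field is held as a list (the variable names are illustrative):

```python
# Un-normalized rows: emp_mobile holds a list of numbers.
raw_rows = [
    (101, "Herschel", "New Delhi", ["8912312390"]),
    (102, "Jon", "Kanpur", ["8812121212", "9900012222"]),
    (103, "Ron", "Chennai", ["7778881212"]),
    (104, "Lester", "Bangalore", ["9990000123", "8123450987"]),
]

# 1NF rows: every emp_mobile value is atomic, so employees with two
# numbers appear in two rows.
first_nf_rows = [
    (emp_id, name, address, mobile)
    for (emp_id, name, address, mobiles) in raw_rows
    for mobile in mobiles
]

for row in first_nf_rows:
    print(row)
```
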

Second normal form (2NF)


A table is said to be in 2NF if both of the following conditions hold:

• The table is in 1NF (first normal form).
• No non-prime attribute is dependent on a proper subset of any candidate key of the table.

An attribute that is not part of any candidate key is known as a non-prime attribute.

Example: Suppose a school wants to store the data of teachers and the subjects they teach. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher. They create a table that looks like this:

teacher_id  subject    teacher_age
----------  ---------  -----------
111         Maths      38
111         Physics    38
222         Biology    38
333         Physics    40
333         Chemistry  40

Candidate Keys: {teacher_id, subject}
Non-prime attribute: teacher_age

The table is in 1NF because each attribute has atomic values. However, it is not in 2NF, because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the candidate key. This violates the rule for 2NF: "no non-prime attribute is dependent on a proper subset of any candidate key of the table".

To make the table comply with 2NF, we can break it into two tables like this:

teacher_details table:

teacher_id  teacher_age
----------  -----------
111         38
222         38
333         40

teacher_subject table:

teacher_id  subject
----------  ---------
111         Maths
111         Physics
222         Biology
333         Physics
333         Chemistry

Now the tables comply with second normal form (2NF).
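
One way to convince yourself that this split loses no information is to project the original table onto the two new tables and then join them back on teacher_id. The Python sketch below does that with sets of tuples; it is an illustration of the idea, with variable names chosen for this example.

```python
TEACHER = {
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Biology", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
}

# Projections that form the two 2NF tables.
teacher_details = {(tid, age) for (tid, _, age) in TEACHER}
teacher_subject = {(tid, subject) for (tid, subject, _) in TEACHER}

# Natural join on teacher_id rebuilds the original rows.
rejoined = {
    (tid, subject, age)
    for (tid, subject) in teacher_subject
    for (details_tid, age) in teacher_details
    if tid == details_tid
}

print(sorted(teacher_details))
print(rejoined == TEACHER)
```

Each teacher's age is now stored once in teacher_details instead of once per subject.
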

Third Normal form (3NF)


A table design is said to be in 3NF if both of the following conditions hold:

• The table must be in 2NF.
• No non-prime attribute is transitively dependent on any super key.

An attribute that is not part of any candidate key is known as a non-prime attribute.

In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and, for each functional dependency X -> Y, at least one of the following conditions holds:

• X is a super key of the table.
• Y is a prime attribute of the table.

An attribute that is a part of one of the candidate keys is known as a prime attribute.

Example: Suppose a company wants to store the complete address of each employee. They create a table named employee_details that looks like this:

emp_id  emp_name  emp_zip  emp_state  emp_city  emp_district
------  --------  -------  ---------  --------  ------------
1001    John      282005   UP         Agra      Dayal Bagh
1002    Ajeet     222008   TN         Chennai   M-City
1006    Lora      282007   TN         Chennai   Urrapakkam
1101    Lilly     292008   UK         Pauri     Bhagwan
1201    Steve     222999   MP         Gwalior   Ratan

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip} and so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime, as they are not part of any candidate key.

Here, emp_state, emp_city & emp_district are dependent on emp_zip, and emp_zip is dependent on emp_id. That makes the non-prime attributes (emp_state, emp_city & emp_district) transitively dependent on the super key (emp_id). This violates the rule of 3NF.

To make this table comply with 3NF, we have to break it into two tables to remove the transitive dependency:

employee table:

emp_id  emp_name  emp_zip
------  --------  -------
1001    John      282005
1002    Ajeet     222008
1006    Lora      282007
1101    Lilly     292008
1201    Steve     222999

employee_zip table:

emp_zip  emp_state  emp_city  emp_district
-------  ---------  --------  ------------
282005   UP         Agra      Dayal Bagh
222008   TN         Chennai   M-City
282007   TN         Chennai   Urrapakkam
292008   UK         Pauri     Bhagwan
222999   MP         Gwalior   Ratan

Boyce Codd normal form (BCNF)


It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every functional dependency X -> Y, X is a super key of the table.

Example: Suppose there is a company wherein employees work in more than one department. They store the data like this:

emp_id  emp_nationality  emp_dept                      dept_type  dept_no_of_emp
------  ---------------  ----------------------------  ---------  --------------
1001    Austrian         Production and planning       D001       200
1001    Austrian         stores                        D001       250
1002    American         design and technical support  D134       100
1002    American         Purchasing department         D134       600

Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate key: {emp_id, emp_dept}

The table is not in BCNF, because neither emp_id nor emp_dept alone is a key, yet each appears on the left side of a functional dependency.

To make the table comply with BCNF, we can break it into three tables like this:

emp_nationality table:

emp_id  emp_nationality
------  ---------------
1001    Austrian
1002    American

emp_dept table:

emp_dept                      dept_type  dept_no_of_emp
----------------------------  ---------  --------------
Production and planning       D001       200
stores                        D001       250
design and technical support  D134       100
Purchasing department         D134       600

emp_dept_mapping table:

emp_id  emp_dept
------  ----------------------------
1001    Production and planning
1001    stores
1002    design and technical support
1002    Purchasing department

Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate keys:
For the first table: emp_id
For the second table: emp_dept
For the third table: {emp_id, emp_dept}

The design is now in BCNF, because in both functional dependencies the left side is a key of its table.

Functional dependency in DBMS



The attributes of a table are said to be dependent on each other when one attribute of the table uniquely determines another attribute of the same table.

For example: Suppose we have a student table with the attributes Stu_Id, Stu_Name and Stu_Age. Here the Stu_Id attribute uniquely determines the Stu_Name attribute, because if we know the student id we can tell the student name associated with it. This is known as a functional dependency and can be written as Stu_Id -> Stu_Name. In words, Stu_Name is functionally dependent on Stu_Id.

Formally:
If column A of a table uniquely determines column B of the same table, this can be represented as A -> B (attribute B is functionally dependent on attribute A).
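
A functional dependency A -> B can be checked against sample data by making sure that no two rows agree on A but differ on B. The sketch below uses the STUDENT rows from the primary key example; the function name `fd_holds` is a hypothetical helper. Note that a check on sample rows can only show that a dependency is not violated by that data; whether it holds in general is a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


STUDENT = [
    {"Stu_Id": 101, "Stu_Name": "Steve", "Stu_Age": 23},
    {"Stu_Id": 102, "Stu_Name": "John", "Stu_Age": 24},
    {"Stu_Id": 103, "Stu_Name": "Robert", "Stu_Age": 28},
    {"Stu_Id": 104, "Stu_Name": "Steve", "Stu_Age": 29},
    {"Stu_Id": 105, "Stu_Name": "Carl", "Stu_Age": 29},
]

id_determines_name = fd_holds(STUDENT, ["Stu_Id"], ["Stu_Name"])
name_determines_age = fd_holds(STUDENT, ["Stu_Name"], ["Stu_Age"])
print(id_determines_name, name_determines_age)
```

Stu_Name -> Stu_Age fails because the two students named Steve have different ages.
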

Types of Functional Dependencies

• Trivial functional dependency
• Non-trivial functional dependency
• Multivalued dependency
• Transitive dependency
