Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
7 views

DBMS Module-2

Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views

DBMS Module-2

Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 80

Module-2

RDBMS
Relational DBMS
Relational model can represent as a table with columns and rows. Each
row is known as a tuple. Each table of the column has a name or attribute.
Terminologies:
• Tuple: Each row of a relation is known as tuple. e.g.; STUDENT relation
given below has 4 tuples.
• Attribute: It contains the name of a column in a particular table. Each
attribute Ai must have a domain, dom (Ai)
• Domain: The possible values an attribute can take in a relation is called
its domain. For Example, domain of STUD_AGE can be from 18 to 40.
• Relational instance: In the relational database system, the relational
instance is represented by a finite set of tuples. Relation instances do
not have duplicate tuples.
• Relational schema: A relational schema contains the name of the
relation and name of all columns or attributes.
• Relational key: In the relational key, each row has one or more
attributes. It can identify the row in the relation uniquely.
EF Codd’s Rules
• Rule 0: Foundation rule
• RDBMS should be able to manage the stored data in its entirety
through its relational capabilities.
• Rule 1: Information Rule
• The data stored in a database, may it be user data or metadata, must
be a value of some table cell.
• Rule 2: Guaranteed Access Rule
• Every single data element (value) is guaranteed to be accessible
logically with a combination of table-name, primary-key (row value),
and attribute-name (column value).
• Rule 3: Systematic Treatment of NULL Values
• The NULL values in a database must be given a systematic and
uniform treatment. This is a very important rule because a NULL can
be interpreted as one the following − data is missing, data is not
known, or data is not applicable.
• Rule 4: Active Online Catalog
• The structure description of the entire database must be stored in an
online catalog, known as data dictionary, which can be accessed by
authorized users.
• Rule 5: Comprehensive Data Sub-Language Rule
• A database can only be accessed using a language having linear syntax
that supports data definition, data manipulation, and transaction
management operations.
• Rule 6: View Updating Rule
• All the views of a database, which can theoretically be updated, must
also be updatable by the system.
• Rule 7: High-Level Insert, Update, and Delete Rule
• A database must support high-level insertion, updation, and deletion.
• Rule 8: Physical Data Independence
• The data stored in a database must be independent of the applications
that access the database. Any change in the physical structure of a
database must not have any impact on how the data is being accessed
by external applications.
• Rule 9: Logical Data Independence
• The logical data in a database must be independent of its user’s
view (application). Any change in logical data must not affect the
applications using it.
• Rule 10: Integrity Independence
• All its integrity constraints can be independently modified without
the need of any change in the application. This rule makes a
database independent of the front-end application and its interface.
• Rule 11: Distribution Independence
• The end-user must not be able to see that the data is distributed
over various locations.
• Rule 12: Non-Subversion Rule
• If a system has an interface that provides access to low-level
records, then the interface must not be able to subvert the system
and bypass security and integrity constraints.
Constraints
Every relation has some conditions that must hold for it to be a
valid relation. These conditions are called Relational Integrity
Constraints. There are three main integrity constraints −
• Key constraints
• Domain constraints
• Referential integrity constraints

Key Constraints or Entity Constraints


• The minimal subset of attributes is called key for a relation
which can identify a tuple uniquely.
• Key constraints force that −
• in a relation with a key attribute, no two tuples can have
identical values for key attributes.
• a key attribute can not have NULL values.
Domain Constraints
• Attributes have specific values in real-world scenario.
• For example, age can only be a positive integer. The same
constraints have been tried to employ on the attributes of a
relation.
• Every attribute is bound to have a specific range of values.
• For example, age cannot be less than zero and telephone
numbers cannot contain a digit outside 0-9.

Referential integrity Constraints


• Referential integrity constraints work on the concept of
Foreign Keys. A foreign key is a key attribute of a relation
that can be referred in other relation.
• Referential integrity constraint states that if a relation refers
to a key attribute of a different or same relation, then that
key element must exist.
REFERENCE TABLE
Conversion from ER Diagram to Relational Schema
Relational Algebra
• Relational algebra is a procedural query language, which takes
instances of relations as input and yields instances of relations as
output.
• It uses operators to perform queries. An operator can be
either unary or binary.
• The fundamental operations of relational algebra are as follows
• Select (σ)
• Project (∏)
• Union (∪)
• Intersection(∩)
• Set different (−)
• Cartesian product (Χ)
• Rename (ρ)
• Join
Select Operation (σ)
It selects tuples that satisfy the given predicate from a relation.
• Notation − σp(R)
• Where σ stands for selection predicate and R stands for
relation. p is prepositional logic formula which may use
connectors like and, or, and not. These terms may use relational
operators like − =, ≠, ≥, < , >, ≤.
• For example − σ (Books)
subject = "database"

• Output − Selects tuples from books where subject is


'database'. σsubject = "database" and price = "450"(Books)

• Output − Selects tuples from books where subject is


'database' and 'price' is 450.
Project Operation (∏)

It projects column(s) that satisfy a given predicate.


Notation − ∏A1, A2, An (R)
Where A1, A2 , An are attribute names of relation R.
Duplicate rows are automatically eliminated, as relation is a set.
For example −
∏subject, author (Books)

Output:-Selects and projects columns named as subject and


author from the relation Books.
Sequences of Operations and the
RENAME Operation
• For example, to retrieve the first name, last name,
and salary of all employees who work in
department number 5.
Union Operation (∪)
• It includes all tuples that are in tables A or in B. It also
eliminates duplicate tuples. So, set A UNION set B would be
expressed as:
• Notation<- A ∪ B
• For a union operation to be valid, the following conditions
must hold -
• R and S must be the same number of attributes.
• Attribute domains need to be compatible.
• Duplicate tuples should be automatically removed.
• Example ∏ (Books) ∪ ∏ (Articles)
author author

• Output − Projects the names of the authors who have either


written a book or an article or both.
Intersection(∩)
• An intersection is defined by the symbol ∩
• Notation:- A ∩ B
• Defines a relation consisting of a set of all tuple that are in
both A and B. However, A and B must be union-compatible.
• Example:- Find all the authors who written books and
articles

∏ author (Books) ∩ ∏ author (Articles)

• Output − Projects the names of the authors who have


written book and article.
Set Difference (−)

• The result of set difference operation is tuples, which are


present in one relation but are not in the second relation.
• Notation :- R − S
• Finds all the tuples that are present in R but not in S.

∏ author (Books) − ∏ author (Articles)

• Output − Provides the name of authors who have


written books but not articles.
Cartesian Product (Χ)
• Combines information of two different relations into one.
• Notation :− A Χ B.
• If A has m tuples and and B has n tuples, cross
product of A and B will have m X n tuples.
Rename Operation (ρ)

• The results of relational algebra are also relations but


without any name. The rename operation allows us to
rename the output relation. 'rename' operation is denoted
with small Greek letter rho ρ.
• Notation − ρ (X,E)
• Where the result of expression E is saved with name of X.
• Example:-
ρ(STUDENT1, STUDENT)
• Output:- STUDENT table is renamed as STUDENT1
Join Operations
• Join is a combination of a Cartesian product followed by a
selection process.
• A Join operation pairs two tuples from different relations, if
and only if a given join condition is satisfied.
• Types of Join Operation
• Conditional Join(⋈c)
• Equijoin(⋈)
• Natural Join(⋈)
• Left Outer Join(⟕)
• Right Outer Join(⟖)
• Full Outer Join(⟗)

20
Conditional Join(⋈c)

• Conditional Join is used when you want to join two or more


relations based on some conditions.
• Example: Retrieve the name of the manager of each
department.

In terms of basic operators (cross product and selection) :

σ (MGRSSN=SSN)(DEPARTMENT×EMPLOYEE)
Equijoin(⋈)
• Equijoin is a special case of conditional join where only the
equality condition holds between a pair of attributes.
• As values of two attributes will be equal in the result of
equijoin, only one attribute will appear in the result.

• Example: Retrieve the name of the manager of each


department.

DEPARTMENT⋈ MGRSSN=SSN EMPLOYEE


Natural Join(⋈)
• It is a special case of equijoin in which the equality condition
holds on all attributes which have the same name in
relations R and S (relations on which join operation is
applied).
• While applying natural join on two relations, there is no
need to write equality conditions explicitly.
• Example: Select students whose ROLL_NO is equal to
ROLL_NO of STUDENT_SPORTS as:

STUDENT⋈STUDENT_SPORTS
Analysis of Inner join

• Theta Join, Equijoin, and Natural Join are called inner joins.
• An inner join includes only those tuples with matching
attributes and the rest are discarded in the resulting relation.

26
Outer Joins

• we need to use outer joins to include all the tuples from the
participating relations in the resulting relation.
• There are three kinds of outer joins −
• left outer join,
• right outer join,
• full outer join.

27
Left Outer Join(R S)
• All the tuples from the Left relation, R, are included in the
resulting relation.
• If there are tuples in R without any matching tuple in the
Right relation S, then the S-attributes of the resulting relation
are made NULL.

28
Right Outer Join: ( R Right Outer Join S )
• All the tuples from the Right relation, S, are included in the
resulting relation.
• If there are tuples in S without any matching tuple in R, then
the R-attributes of resulting relation are made NULL.
Course (Left) HOD (Right)
A B C D
100 Database 100 Alex
101 Mechanics 102 Maya
102 Electronics 104 Mira

29
Full Outer Join: ( R Full Outer Join S)
• All the tuples from both participating relations are included in
the resulting relation.
• If there are no matching tuples for both relations, their
respective unmatched attributes are made NULL.
Course (Left) HOD (Right)
A B C D
100 Database 100 Alex
101 Mechanics 102 Maya
102 Electronics 104 Mira

30
DIVISION Operation

• The DIVISION operation is applied to two relations R(Z) ÷ S(X), where


the attributes of R are a subset of the attributes of S; that is, X ⊆ Z
• Syntax:-
Result(Attribute) ← R ÷ S
• Example :- Retrieve the names of employees who work on all the
projects that ‘John Smith’ works on.
R1 ← σFname=‘John’ AND Lname=‘Smith’(EMPLOYEE)
R2 ← πPno(WORKS_ON Essn=Ssn R1)
R3 ← πEssn, Pno(WORKS_ON)
R4(Ssn) ← R3 ÷ R2
RESULT ← πFname, Lname(R4 * EMPLOYEE)
R3 R2

R4
Generalized Projection

• The generalized projection operation extends the projection


operation by allowing functions of attributes to be included in the
projection list.

Syntax: ∏ F1,F2,….,Fn (R)

• where F1, F2, ..., Fn are functions over the attributes in relation R and
may involve arithmetic operations and constant values
• This operation is helpful when developing reports where computed
values have to be produced in the columns of a query result
Example
• As an example, consider the relation
• EMPLOYEE (Ssn, Salary, Deduction, Years_service)
• A report may be required to show
• Net Salary = Salary – Deduction,
• Bonus = 2000 * Years_service, and
• Tax = 0.25 * Salary.
• Then a generalized projection combined with renaming may be used
as follows:
REPORT ← ρ(Ssn, Net_salary, Bonus, Tax) (∏ssn , Salary – Deduction, 2000 * Years_service,

0.25 * Salary
(EMPLOYEE)).
Aggregate Functions and Grouping

• The aggregate functions are used to perform calculations on a set of


tuples (rows) from a relation (table) and return a single value.
• These functions are crucial for summarizing data, such as calculating
totals, averages, minimum and maximum values, etc.
• Aggregate functions in relational algebra include:
• SUM: Adds up all the values in a specified attribute.
• COUNT: Counts the number of tuples in a relation.
• AVG: Calculates the average of values in a specified attribute.
• MIN: Returns the minimum value of a specified attribute.
• MAX: Returns the maximum value of a specified attribute.
• Syntax: <group attributes>ℑ <function list> (R)
• where <group attributes> are the list of attributes of the relation
specified in R,
• and <function list> is a list of (<function> <attribute> ) pairs.
• Example: Retrieve each department number, the number of
employees in the department, and their average salary, while
renaming the resulting attributes.
ρR(Dno, No_of_employees, Average_sal) (Dno ℑ COUNT (Ssn), AVERAGE (Salary) (EMPLOYEE))
• Retrieve the name and address of all employees who work
for the 'Research' department.
• For every project located in 'Stafford', list the project
number, the controlling department number, and the
department manager's last name, address, and birth date.
• Retrieve the names of employees who have no dependents.
• Retrieve the names of all employees in department 5 who
work more than 10 hours per week on the 'ProductX'
project.
• List the names of all employees who have a dependent with
the same first name as themselves.
• List the last names of all department managers who have no
dependents.
Relational Calculus
 Relational calculus is a non-procedural query language.
 In the non-procedural query language, the user is concerned
with the details of how to obtain the end results.
 The relational calculus tells what to do but never explains
how to do.
 Types of Relational calculus

42
Comparison Chart
BASIS FOR
COMPARISON RELATIONAL ALGEBRA RELATIONAL CALCULUS

Basic Relational Algebra is a Relational Calculus is


Procedural language. Declarative language.
States Relational Algebra states how to Relational Calculus states
obtain the result. what result we have to
obtain.

Order Relational Algebra describes the Relational Calculus does not


order in which operations have specify the order of
to be performed. operations.

Domain Relational Algebra is not domain Relation Calculus can be


dependent. domain dependent.
Related It is close to a programming It is close to the natural
language. language.

43
Tuple Relational Calculus
The Tuple Relational Calculus list the tuples selected from a relation,
based on a certain condition provided. It is formally denoted as:
{ t | P(t) }
Where t is the set of tuples from which the condition P is true.
where t = resulting tuples,
P(t) = known as Predicate and these are the conditions that are used to
fetch t
Thus, it generates a set of all tuples t, such that Predicate P(t) is true for
t.
P(t) may have various conditions logically combined with OR (∨), AND
(∧), NOT(¬).
It also uses quantifiers:
∃ t ∈ r (Q(t)) = ”there exists” a tuple in t in relation r such that
predicate Q(t) is true.
∀ t ∈ r (Q(t)) = Q(t) is true “for all” tuples in relation r.
44
Build TRC expression case
 Ex- let t is a tuple variable assigns to EMP relation as follows:

t → EMP( eno, ename, age, sal)

Q1. list the Details of the employees who are getting a salary
above 10,000.

{ t| EMP(t) ∧ t.sal > 10,000 } // list all attributes

Q1. list the ename and age of the employees who are getting
salary above 10,000.

{ t.ename, t.age | EMP(t) ∧ t.sal > 10,000 }


45
Some more
Q1. list the age of the employee whose eno is 102.

{t.age | EMP(t) ∧ t.eno = 102 }


Q2. list the ename, age, and sal of the employees who are getting
salary below 10,000 and above 20,000.

{ t.ename, t.age, t.sal | EMP(t) ∧ t.sal < 10,000 ∨ t.sal > 20,000}

For practice:
1. list the employee name and age of the employees whose age is
below 35 or above 50 and sal is above 30000.
2. List the eno and sal of the employees who are getting salary
between 20,000 to 50000.

46
Domain Relational Calculus

• The Domain Relational Calculus list the attributes to be selected


from a relation, based on certain condition.
• The formal definition of Domain Relational Calculus is as follows:
{<X1, X2, X3, . . . Xn> | P(X1, X2, X3, . . . Xn)}
• Where X1, X2, X3, . . . Xn are the attributes and P is the certain
condition.
• Domain relational calculus uses the same operators as tuple
calculus. It uses logical connectives ∧ (and), ∨ (or), and ┓ (not).
• It uses Existential (∃) and Universal Quantifiers (∀) to bind the
variable.

47
Build DRC expression case
EMP(eno, ename, age, sal)
For individual domain one variable need to be
assigned.
P = eno < variable p ranges over the domain eno>
q =ename
r = age
s = sal
Q1. Build the DRC expression to find the age of Mr. X
[r|(∃p)(∃q)(∃r)(∃s) { EMP(pqrs) ∧ q= ‘Mr. X’} ]

48
More …
Q2. List the ename of the employees who are getting salary 10000 and
above.
[q|(∃p)(∃q)(∃r)(∃s) { EMP(pqrs) ∧ s ≥ 10000} ]

Q3. List the employee name and age of the employees who age is
below 35 or above 50 and sal is above 30000.
[q, r|(∃p)(∃q)(∃r)(∃s) { EMP(pqrs) ∧ r < 35 ∨ r > 50 ∧ s ≥ 30000 } ]

For Practice:
1. List the employee id and ename of the employees who age is 35
or above 50 and sal is below 30000.
2. List the eno and sal of the employees who are getting salary
between 20,000 to 50000.

49
Queries
• Queries-1: Find the tuples of loans where amount is
greater than or equal to 10000.

• Queries-2: Find the loan number for each loan of an


amount greater or equal to 10000.

• Queries-3: Find the names of all customers who have a


loan and an account at the bank.

• Queries-4: Find the names of all customers having a loan


at the “ABC” branch.
Tuple Calculus
• {t| t ∈ loan ∧ t.amount >=10000}
• {t| ∃ s ∈ loan(t. loan number = s.loan number ∧ s.amount
>=10000)}
• {t | ∃ s ∈ borrower( t. customer-name = s. customer-name)
∧ ∃ u ∈ depositor( t. customer-name = u. customer-name)}
• {t | ∃ s ∈ borrower(t. customer-name = s. customer-name ∧
∃ u ∈ loan(u. branch-name = “ABC” ∧ u. loan-number =
s.loan-number))}
• Domain Calculus
• { p, q, r | (∃p)(∃q)(∃r) {loan(pqr) ∧ r >=10000}}
• { p | (∃p)(∃q)(∃r) {loan(pqr) ∧ r >=10000 }}
• {x | (∃x)(∃y) (∃p)(∃q) { depositor(xy) ∧ borrower(pq) ∧ x=p}}
Queries
1. Retrieve the names of all employees in department 5 who work more than 10 hours
per week on the 'ProductX' project.
2. List the names of all employees who have a dependent with the same first name as
themselves.
3. Find the names of all employees who are directly supervised by 'Franklin Wong'.
4. For each project, list the project name and the total hours per week (by all
employees) spent on that project.
5. Retrieve the names of all employees who work on every project.
6. Retrieve the names of all employees who do not work on any project.
7. For each department, retrieve the department name and the average salary of all
employees working in that department.
8. Retrieve the average salary of all female employees.
9. Find the names and addresses of all employees who work on at least one project
located in Houston but whose department has no location in Houston.
10. List the last names of all department managers who have no dependents.
Normalization
Problems Without Normalization

• Since, databases contains a lot of data in the form of tables, it


is very difficult or almost impossible to manage data if any of
the anomaly occur i.e. either Insertion, Deletion or Updation.
Hence, removal of redundant data is very necessary.
• Insertion, Updation and Deletion Anomalies are very frequent
if database is not normalized.
• Inconsistency – redundancy---anomalies--extra memory space

55
Example of Anomalies
Sid​ Sname​ cid​ Cname​ Fid​ Fname​ Salary​

1​ Ram​ C1​ Dbms​ F1​ John​ 50,000​

2​ Ravi​ C2​ Java​ F2​ Bob​ 70,000​

3​ Nitin​ C1​ Dbms​ F1​ John​ 50,000​

4​ Amit​ C1​ Dbms​ F1​ John​ 50,000​


Normalizations on Relational Database
 Normalization is a database design technique which
organizes tables in a manner that reduces redundancy and
dependency of data.
 It divides larger tables to smaller tables and links them using
relationships.
 The inventor of the relational model Edgar Codd proposed
the theory of normalization with the introduction of First
Normal Form, and he continued to extend theory with
Second and Third Normal Form. Later he joined with
Raymond F. Boyce to develop the theory of Boyce-Codd
Normal Form(BCNF).

57
Evolution of Normalization theories

 Theory of Data Normalization in SQL is still being developed


further.
 For example, there are discussions even on 6th Normal
Form.
 However, in most practical applications, normalization
achieves its best in 3rd Normal Form.
 Normalization is carried out in practice so that the resulting
designs are of high quality and meet the desirable
properties

58
First Normal Form(1NF)
• The First Normal Form(1NF) works on the concept of “Atomicity”
in values of every individual tuple of tables present in the
database.
• It means, a relation is said to be in "1NF" if, every attribute in a
relation is has “Single Valued” tuple.
Functional Dependency (FD)
 FD is a set of constraints between two attributes in a
relation.
 A relationship which only exists when an attribute can
determine other attribute functionally.
• Functional Dependency in DBMS is denoted using an
arrow between two or more attributes such as FD : A
→B
• Here, A & B are the attributes present in any relation.
• “A → B” means, “B” is functionally dependent upon “A”
or “A” functionally determines “B”. Functional
dependency acts as a constraint between set of
attributes present in any database.

60
Example-1 : Consider a table student_details containing details of
some students.
Example : student_details Table

We can conclude from Roll_No attribute in the table, we are able


to determine the Name of student uniquely and same is the case
with marks too. Hence, we can say that Name and Marks are
functionally dependent on Roll_No but the vice versa is not true.

FD1 : Roll_No → Name


FD2 : Roll_No → Marks
61
Armstrong’s Axioms
• Axioms in database management systems was introduced
by William W. Armstrong in late 90’s and these axioms play a
vital role while implementing the concept of functional
dependency in DBMS for database normalization.

• Reflexive : It means, if set “B” is a subset of “A”, then A → B.


• Augmentation : It means, if A → B, then AC → BC.
• Transitive : It means, if A → B and B → C, then A → C.
• Decomposition : It means, if A → BC, then A → B and A →
C.
• Union : It means, if A → B and A → C, then A → BC.
• Pseudo-Transitivity : It means, if A → B and DB → C,
then DA → C.
Closure Of Functional Dependency

• The Closure Of Functional Dependency means the


complete set of all possible attributes that can be
functionally derived from given functional dependency
using the inference rules known as Armstrong’s Rules.

• If “F” is a functional dependency then closure of


functional dependency can be denoted using “{F}+”.
• There are three steps to calculate closure of functional
dependency. These are:

• Step-1 : Add the attributes which are present on Left


Hand Side in the original functional dependency.

• Step-2 : Now, add the attributes present on the Right


Hand Side of the functional dependency.

• Step-3 : With the help of attributes present on Right


Hand Side, check the other attributes that can be
derived from the other given functional dependencies.
• Step-4: Repeat this process until all the possible
attributes which can be derived are added in the closure.
Example

Consider a relation R(A,B,C,D,E) having below mentioned


functional dependencies.
FD1 : A → BC
FD2 : C → B
FD3 : D → E
FD4 : E → D
Now, we need to calculate the closure of attributes of the
relation R. The closures will be:

{A}+ = {A, B, C}
{B}+ = {B}
{C}+ = {B, C}
{D}+ = {D, E}
{E}+ = {E,D}
Identifying Candidate Key
•“A Candidate Key of a relation is an attribute or set of attributes
that can determine the whole relation or contains all the
attributes in its closure."
Example-1 : Consider the relation R(A,B,C) with given functional
dependencies :
FD1 : A → B
FD2 : B → C

Now, calculating the closure of the attributes as :

{A}+ = {A, B, C}
{B}+ = {B, C}
{C}+ = {C}

Clearly, “A” is the candidate key as, its closure contains all the
attributes present in the relation “R”.
Example-2 : Consider another relation R(A, B, C, D, E) having the
Functional dependencies :
FD1 : A → BC
FD2 : C → B
FD3 : D → E
FD4 : E → D
Now, calculating the closure of the attributes as :
{A}+ = {A, B, C}
{B}+ = {B}
{C}+ = {C, B}
{D}+ = {E, D}
{E}+ = {E, D}
In this case, a single attribute is unable to determine all the attribute on
its own like in previous example. Here, we need to combine two or
more attributes to determine the candidate keys.

{A, D}+ = {A, B, C, D, E}


{A, E}+ = {A, B, C, D, E}
• Example -2: Consider a relation R ( A , B , C , D , E , F , G ) with the
functional dependencies-
• A → BC
• BC → DE
• D→F
• CF → G
• Example -3: Consider the given functional dependencies-
• AB → CD
• AF → D
• DE → F
• C→G
• F→E
• G→A
Key Definitions
• Prime Attributes : Attributes which are indispensable part of
candidate keys. For example : “A, D, E” attributes are prime
attributes in above example-2.
• Non-Prime Attributes : Attributes other than prime attributes
which does not take part in formation of candidate keys. For
example “B, C”.
• Extraneous Attributes : Attributes which does not make any
effect on removal from candidate key.
• For example : Consider the relation R(A, B, C, D) with FDs
• FD1 : A → BC Here, Candidate key can be “AD” only.
Hence,
• FD2 : B → C Prime Attributes : A, D.
• FD3 : D → C Non-Prime Attributes : B, C
• Extraneous Attributes : B, C(As if we add any of the attribute to the
candidate key, it will remain unaffected).
Second Normal Form(2NF)
• Any relation to be in 2NF must follow the below two rules:
• The relation/table must be in 1NF.
• There should not be any partial dependency.
• Whenever any non-prime attribute is dependent upon a part of
candidate key of a relation, it is known as partial dependency.
• For example : Consider a relation R(X,Y,Z,T) with following functional
dependencies :
• FD1 : XY → T
• FD2 : Y → Z

Step 1: Find the closure and determine the candidate key.


• {X}+ = {X}
• {Y}+ = {Y,Z}
• {XY}+ = {X,Y,Z,T} Here, Candidate key will be “XY”.
• {T}+ = {T}
Step-2: Check whether the non-prime attributes in the FD’s
are fully dependent upon candidate key or not.

• XY → T (Attribute “T” fully depends upon “XY” as “XY” is a


candidate key)
• Y → Z (Attribute “Z” is partially dependent upon the candidate
key as, “Y” is a part of “XY”)

Step-3: To remove partial dependency, decompose the whole


relation R(X,Y,Z,T) into R1(X,Y,T) and R2(Y,Z).

• Therefore, R1 : XY → T (“XY” will be the candidate key)


• and R2 : Y → Z (“Y” will be the candidate key)

• Hence in relation R2, “Z” will be fully dependent upon “Y” as it is


the candidate key for relation R2.
Third Normal Form(3NF)
• Any relation to be in Third Normal Form(3NF) must follow the
below mentioned two rules :
• The relation/table should be in Second Normal Form.
• The relation/table must not have any transitive
dependency.
• If a “Non-Prime” attribute is functionally dependent upon another
“Non-Prime” attribute, that functional dependency will also be
termed as transitive.
Example:- Consider the relation R(X,Y,Z,T) with the following FDs
• FD1 : XY → Z
• FD2 : Z → T
Step-1 : Check whether given relation is in 2NF or not.
Here, “XY” will be the candidate key for the above mentioned FD’s
and no partial dependency exists in this relation too.
Step-2 : Remove transitive dependencies from the relation
and try to decompose it into a separate
relations.
• XY → Z (“Z” is fully dependent on candidate key “XY”)
• Z → T (Z & T, both are non-prime attributes and thus forms a
transitive dependency)
Step-3: Splitting the whole relation into two separated
relations R1(X,Y,Z) and R2(Z,T)
• R1 : XY → Z
(Here, “XY” is the candidate key and there is no transitive
dependency in this whole relation)
• R2 : Z → T
(Here, “Z” is the candidate key and there is no transitive dependency
in this whole relation)
• Hence, the relation can be termed to be in 3NF, as both the
relation R1 and R2 does not have any transitive dependency
as well as both as are in 2NF too.
BCNF (Boyce Codd. Normal Form)
• BCNF is known as “Boyce Codd Normal Form” and is a
successor to “Third Normal Form”.
• Any relation to be in BCNF must follow below mentioned
two rules.
• The relation/table needs to be in 3NF.
• For a functional dependency P → Q, “P” should be a
super key.
• BCNF deals with the cases where “Non-Prime attribute
derives a prime attribute”.
Example:- Consider a relation R(X,Y,Z) having following
functional dependencies :
• FD1 : XY → Z
• FD2 : Z → Y
Step-1 : Check whether given relation is in 3NF or not.
• Attribute “Z” is fully dependent upon the candidate key “XY” and there
is not transitive dependency too. Also, “X & Y” are prime attributes
whereas “Z & T” are non-prime attributes.
Step-2 : Check for FD’s where a non-prime attribute determines
a prime attribute.
• In this example “Z → Y” is that FD.
• FD1 : XY → Z (“XY” is the candidate key, so no problem in this FD)
• FD2 : Z → Y (Here, a non-prime attribute is deriving a prime attribute)
Step-3: Decompose relation R(X,Y,Z,T) into R1(X,Y,Z) an
R2(Z, Y) to convert the relation in BCNF.
• R1 : XY → Z
• R2 : Z → Y
• “XY” and “Z” are super keys of their respective relations i.e. “R1”
and “R2” as each of them can uniquely determine the attributes
present in the relations.
Fourth normal form (4NF)
• A relation will be in 4NF if it is in Boyce Codd normal form
and has no multi-valued dependency.
• For a dependency X → Y, if for a single value of X, multiple
values of Y exists, then the relation will be a multi-valued
dependency.
• Also, a table should have at-least 3 columns for it to have a
multi-valued dependency.
• And, for a relation R(X,Y,Z), if there is a multi-valued
dependency between, X and Y, then Y and Z should be
independent of each other.
Steps to solve Normalization
problems
• Step -1: Find candidate Key
• Step-2: List Prime and Non-Prime attributes
• Step-3: Check different NFs
Challenging task to practice
Q. R = {A,B,C,D,E,F}

FD= {AB→C, AD→B, C→B, F→AD,F→E}

1) Is R in 2 NF? If no, decompose R into 2NF.

2) Check R in 3NF or not, if not Decompose into 3NF.

3) Further check R in BCNF or not, if not

79
Challenging task to practice
Q. R = {A,B,C,D,E,F,G,H,I,J,K}

FD= {AB→CK, AD→BG, C→BH, F→ADI,F→EJ}

1) Is R in 2 NF? If no, decompose R into 2NF.

2) Check R in 3NF or not, if not Decompose into 3NF.

3) Further check R in BCNF or not, if not

80

You might also like