
F - DataBase Chapter - 4


Chapter 4 Fundamentals of Database Systems

Chapter 4: Functional Dependency and Normalization

4.1. Functional Dependency

A functional dependency (FD) is a constraint between two sets of attributes of a relation.
It is written X → Y, meaning that any two rows that agree on X must also agree on Y. It
typically holds between the primary key and the non-key attributes of a table.

The left side of an FD is known as the determinant; the right side is known as the
dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.

Here the Emp_Id attribute uniquely identifies the Emp_Name attribute of the employee table:
if we know the Emp_Id, we can tell the employee name associated with it.

Functional dependency can be written as:

1. Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.

A dependency in a DBMS is a relation between two or more attributes. Dependencies are of
the following types −

o Fully Functional Dependency
o Multivalued Dependency
o Partial Dependency
o Transitive Dependency
Functional Dependency
If the value of one attribute in a table uniquely determines the value of another attribute
in the same table, the relationship is called a Functional Dependency. Consider it an
association between two attributes of the same relation.
If P functionally determines Q, then
P -> Q
Let us see an example −
<Employee>
EmpID EmpName EmpAge
E01 Amit 28
E02 Rohit 31


In the above table, EmpName is functionally dependent on EmpID because EmpName can
take only one value for a given value of EmpID:
EmpID -> EmpName
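
The definition above can be checked mechanically on a table instance. The following is a minimal Python sketch, assuming the table is held as a list of dictionaries; the function name fd_holds and the extra conflicting row are illustrative and not part of the original notes. Note that a table instance can only refute a dependency; whether it holds in general is a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


employee = [
    {"EmpID": "E01", "EmpName": "Amit", "EmpAge": 28},
    {"EmpID": "E02", "EmpName": "Rohit", "EmpAge": 31},
]

print(fd_holds(employee, ["EmpID"], ["EmpName"]))  # True

# Hypothetical row (not in the original table) that reuses E01 with another name
conflicting = employee + [{"EmpID": "E01", "EmpName": "Sumit", "EmpAge": 28}]
print(fd_holds(conflicting, ["EmpID"], ["EmpName"]))  # False
```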

Fully Functional Dependency
An attribute is fully functionally dependent on a set of attributes if it is functionally
dependent on that set and not on any proper subset of it.
For example, an attribute Q is fully functionally dependent on P if Q is functionally
dependent on P and not on any proper subset of P.
Let us see an example −

<EmployeeProject>
EmpID ProjectID Days (spent on the project)
E099 001 320
E056 002 190

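Here neither EmpID nor ProjectID alone is meant to determine Days, since the number of
days is recorded per employee and per project. The full set {EmpID, ProjectID}, however,
determines the {Days} spent on the project by the employee.
This gives our fully functional dependency −
{EmpID, ProjectID} -> {Days}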
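
A full functional dependency can be tested by checking that the dependency holds for the whole determinant and fails for every non-empty proper subset. The sketch below is illustrative only. The two sample rows in the notes are too few to show the difference, so two extra rows are assumed here and marked as hypothetical.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_fully_dependent(rows, lhs, rhs):
    """True if lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, list(subset), rhs):
                return False
    return True


employee_project = [
    {"EmpID": "E099", "ProjectID": "001", "Days": 320},
    {"EmpID": "E056", "ProjectID": "002", "Days": 190},
    # Hypothetical rows, added only to make the example meaningful
    {"EmpID": "E099", "ProjectID": "002", "Days": 45},
    {"EmpID": "E056", "ProjectID": "001", "Days": 80},
]

print(is_fully_dependent(employee_project, ["EmpID", "ProjectID"], ["Days"]))  # True
```

With only the first two rows, EmpID alone would appear to determine Days, which is why a small sample cannot establish a full functional dependency on its own.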
Multivalued Dependency
A multivalued dependency occurs when the existence of one or more rows in a table implies
the existence of one or more other rows in the same table.
If a table has attributes P, Q and R, then Q and R may be multi-valued facts of P.
A multivalued dependency is represented by a double arrow −
->->

For our example:

P->->Q
P->->R
In the above case, the multivalued dependency exists only if Q and R are independent
of each other.


Partial Dependency
Partial Dependency occurs when a non-prime attribute is functionally dependent on part of
a candidate key; that is, when a non-key attribute is determined by a part, but not the
whole, of a composite primary key.

The 2nd Normal Form (2NF) eliminates the Partial Dependency. Let us see an example −
<StudentProject>
StudentID ProjectNo StudentName ProjectName
S01 199 Katie Geo Location
S02 120 Ollie Cluster Exploration
In the above table, we have partial dependencies; let us see how −
The prime key attributes are StudentID and ProjectNo.
As stated, a non-prime attribute is partially dependent if it is functionally dependent on
only part of a candidate key. Here the non-prime attributes are StudentName and ProjectName.
StudentName can be determined by StudentID alone, which makes the relation partially
dependent.
ProjectName can be determined by ProjectNo alone, which also makes the relation partially
dependent.
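
Given a candidate key and a list of declared dependencies, partial dependencies can be listed with a few set operations. This is a small sketch, not a complete normalization tool: it assumes a single candidate key, and the function name partial_dependencies is hypothetical.

```python
def partial_dependencies(key, fds):
    """List (determinant, attribute) pairs where a non-prime attribute
    depends on a proper subset of the candidate key."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        if lhs < key:  # "<" on sets means proper subset
            for attr in sorted(set(rhs) - key):
                found.append((tuple(sorted(lhs)), attr))
    return found


key = {"StudentID", "ProjectNo"}
fds = [
    ({"StudentID"}, {"StudentName"}),
    ({"ProjectNo"}, {"ProjectName"}),
]

print(partial_dependencies(key, fds))
# [(('StudentID',), 'StudentName'), (('ProjectNo',), 'ProjectName')]
```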

Transitive Dependency
When an indirect relationship causes a functional dependency, it is called a Transitive
Dependency.
If P -> Q and Q -> R hold, then P -> R is a transitive dependency.
In normalization, a transitive dependency arises when a non-key attribute determines
another non-key attribute.
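
Transitive dependencies can be found by computing the closure of an attribute set, that is, every attribute it determines directly or indirectly. The sketch below uses the standard fixed-point method; the function name closure and the representation of dependencies as pairs of sets are choices made for this example.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs when lhs is already covered and rhs adds something new
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [({"P"}, {"Q"}), ({"Q"}, {"R"})]

print(sorted(closure({"P"}, fds)))  # ['P', 'Q', 'R']
```

R appears in the closure of P even though no dependency P -> R was declared, which is exactly the transitive dependency described above.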

Types of Functional dependency

1. Trivial functional dependency

o A → B is a trivial functional dependency if B is a subset of A.
o Dependencies such as A → A and B → B are therefore also trivial.

Example:

1. Consider a table with two columns, Employee_Id and Employee_Name.
2. {Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
3. Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies too.

2. Non-trivial functional dependency

o A → B is a non-trivial functional dependency if B is not a subset of A.
o When the intersection of A and B is empty, A → B is called completely non-trivial.

Example:

1. ID → Name,
2. Name → DOB
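
The three cases above (trivial, non-trivial, completely non-trivial) reduce to subset and intersection tests. A minimal sketch follows; the function name classify_fd and the attributes A, B, C in the last call are illustrative.

```python
def classify_fd(lhs, rhs):
    """Classify the dependency lhs -> rhs using set operations."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"
    if lhs & rhs:
        return "non-trivial"
    return "completely non-trivial"


print(classify_fd({"Employee_Id", "Employee_Name"}, {"Employee_Id"}))  # trivial
print(classify_fd({"ID"}, {"Name"}))  # completely non-trivial
print(classify_fd({"A", "B"}, {"B", "C"}))  # non-trivial
```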

Anomalies in DBMS

The problems that can occur in an insufficiently normalized table are called
anomalies.

There are three types of anomalies that occur when the database is not normalized:
insertion, update and deletion anomalies.

In an update anomaly, the data cannot be updated consistently because the same value is
stored in several rows. In a deletion anomaly, deleting a record also removes other
information that was stored only in that record. The aim of normalization is therefore to
remove redundant data and to store only related data in each table. This decreases the
database size and keeps the data logically organized. Let's take an example to understand this.

Example: Suppose a manufacturing company stores the employee details in a table
named employee that has four attributes: emp_id for storing the employee's id, emp_name
for storing the employee's name, emp_address for storing the employee's address and emp_dept
for storing the details of the department in which the employee works. At some point in time
the table looks like this:

emp_id emp_name emp_address emp_dept

101 Rick Delhi D001

101 Rick Delhi D002

123 Maggie Agra D890

166 Glenn Chennai D900

166 Glenn Chennai D004


The above table is not normalized. We will see the problems that we face when a table is
not normalized.

Update anomaly: In the above table we have two rows for employee Rick, as he belongs
to two departments of the company. If we want to update Rick's address, we have to
update it in both rows or the data will become inconsistent. If the new address is written
to one row but not the other, then according to the database Rick has two different
addresses, which is incorrect.

Insert anomaly: Suppose a new employee joins the company who is under training and
not yet assigned to any department. We would not be able to insert that employee into
the table if the emp_dept field doesn't allow nulls.

Delete anomaly: Suppose that at some point the company closes department D890.
Deleting the rows that have emp_dept D890 would also delete the information about
employee Maggie, since she is assigned only to this department.
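
The update and delete anomalies can be reproduced on the sample table with a few lines of Python. This is only an illustration: the table is held as a list of dictionaries, the new address "Mumbai" is a made-up value, and the helper name inconsistent_employees is hypothetical.

```python
employee = [
    {"emp_id": 101, "emp_name": "Rick", "emp_address": "Delhi", "emp_dept": "D001"},
    {"emp_id": 101, "emp_name": "Rick", "emp_address": "Delhi", "emp_dept": "D002"},
    {"emp_id": 123, "emp_name": "Maggie", "emp_address": "Agra", "emp_dept": "D890"},
    {"emp_id": 166, "emp_name": "Glenn", "emp_address": "Chennai", "emp_dept": "D900"},
    {"emp_id": 166, "emp_name": "Glenn", "emp_address": "Chennai", "emp_dept": "D004"},
]


def inconsistent_employees(rows):
    """Return the ids of employees stored with more than one address."""
    addresses = {}
    for row in rows:
        addresses.setdefault(row["emp_id"], set()).add(row["emp_address"])
    return sorted(emp for emp, found in addresses.items() if len(found) > 1)


# Update anomaly: only one of Rick's two rows is changed
employee[0]["emp_address"] = "Mumbai"
print(inconsistent_employees(employee))  # [101]

# Delete anomaly: removing department D890 also removes every trace of Maggie
remaining = [row for row in employee if row["emp_dept"] != "D890"]
print(any(row["emp_name"] == "Maggie" for row in remaining))  # False
```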

To overcome these anomalies we need to normalize the data. Normalization is
discussed next.

Normalization is the process of identifying the logical associations between data items
and designing a database that represents such associations without suffering from the
following anomalies:
1. Insertion Anomalies
2. Deletion Anomalies
3. Modification Anomalies
Normalization may reduce system performance since data will be cross referenced from
many tables. Thus denormalization is sometimes used to improve performance, at the cost
of reduced consistency guarantees.

A normalization is generally considered good if it is a lossless decomposition, that is, if
the original table can be reconstructed by joining the decomposed tables.

Together, the normalization rules remove the anomalies that may otherwise arise during data
manipulation after implementation.

The anomalies that can occur in an insufficiently normalized table are described below.

(1) Insertion anomalies


An "insertion anomaly" is a failure to place information about a new database entry
into all the places in the database where information about that new entry needs to be
stored. In a properly normalized database, information about a new entry needs to be
inserted into only one place in the database; in an inadequately normalized database,
information about a new entry may need to be inserted into more than one place and,
human fallibility being what it is, some of the needed additional insertions may be
missed.
(2) Deletion anomalies
A "deletion anomaly" is a failure to remove information about an existing database
entry when it is time to remove that entry. In a properly normalized database,
information about an old, to-be-gotten-rid-of entry needs to be deleted from only one
place in the database; in an inadequately normalized database, information about that
old entry may need to be deleted from more than one place, and, human fallibility being
what it is, some of the needed additional deletions may be missed.
(3) Modification/Updating anomalies
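A modification of a database involves changing the value of some attribute of a table.
In a properly normalized database, each fact is stored in only one place, so a change
made by the user takes effect everywhere it is used; in an inadequately normalized
database, the same value may need to be changed in more than one place, and some of
the needed changes may be missed.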

N.B: The purpose of normalization is to reduce the chances for anomalies to occur in a database.

4.2. Normal Forms

Normalization

o Normalization is the process of organizing the data in the database.


o Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate the undesirable characteristics like Insertion,
Update and Deletion Anomalies.
o Normalization divides a larger table into smaller tables and links them using
relationships.
o Normal forms are used to reduce redundancy in database tables.

Types of Normal Forms


Normal Form   Description

1NF   A relation is in 1NF if every attribute contains only atomic values.

2NF   A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally
      dependent on the primary key.

3NF   A relation is in 3NF if it is in 2NF and no transitive dependency exists.

4.2.1. First normal form (1NF)

o As per the rule of first normal form, an attribute (column) of a table cannot hold
multiple values; it should hold only atomic values.
o There are no duplicated rows in the table; each row has a unique identifier.
o Each cell is single-valued (i.e., there are no repeating groups).
o Entries in a column (attribute, field) are of the same kind.


Example 1 for First Normal Form (1NF): Consider the following unnormalized database table.

EmpID  FirstName  LastName  Skill   SkillType    School  SchoolAdd    SkillLevel

12     Abebe      Mekuria   SQL     Database     AAU     Sidist_Kilo  5
                            VB6     Programming  Helico  Piazza       8

16     Lemma      Alemu     C++     Programming  Unity   Gerji        6
                            IP      Programming  Jimma   Jimma City   4

28     Chane      Kebede    SQL     Database     AAU     Sidist_Kilo  10

65     Almaz      Belay     SQL     Database     Helico  Piazza       9
                            Prolog  Programming  Jimma   Jimma City   8
                            Java    Programming  AAU     Sidist_Kilo  6

24     Dereje     Tamiru    Oracle  Database     Unity   Gerji        5

94     Alem       Kebede    Cisco   Networking   AAU     Sidist_Kilo  7

First normal form (1NF)

Remove all repeating groups. Distribute the multi-valued attributes into different rows and identify
a unique identifier for the relation, so that it can be called a relation in a relational database.

EmpID FirstName LastName SkillID Skill SkillType School SchoolAdd SkillLevel

12 Abebe Mekuria 1 SQL Database AAU Sidist_Kilo 5

12 Abebe Mekuria 3 VB6 Programming Helico Piazza 8

16 Lemma Alemu 2 C++ Programming Unity Gerji 6

16 Lemma Alemu 7 IP Programming Jimma Jimma City 4

28 Chane Kebede 1 SQL Database AAU Sidist_Kilo 10

65 Almaz Belay 1 SQL Database Helico Piazza 9

65 Almaz Belay 5 Prolog Programming Jimma Jimma City 8

65 Almaz Belay 8 Java Programming AAU Sidist_Kilo 6

24 Dereje Tamiru 4 Oracle Database Unity Gerji 5

94 Alem Kebede 6 Cisco Networking AAU Sidist_Kilo 7


Remember that you can add fields/attributes when normalizing database tables. As you
can see above, SkillID is added to organize the database table effectively and
efficiently.

Example 2: Suppose a company wants to store the names and contact details of its
employees. It creates a table that looks like this:

emp_id emp_name emp_address emp_mobile

101 Herschel New Delhi 8912312390

102 Jon Kanpur 8812121212, 9900012222

103 Ron Chennai 7778881212

104 Lester Bangalore 9990000123, 8123450987

Two employees (Jon & Lester) have two mobile numbers each, so the company stored
them in the same field, as you can see in the table above.

This table is not in 1NF, as the rule says “each attribute of a table must have atomic
(single) values”; the emp_mobile values for employees Jon & Lester violate that rule.

To make the table comply with 1NF, we should have the data like this:

emp_id emp_name emp_address emp_mobile

101 Herschel New Delhi 8912312390

102 Jon Kanpur 8812121212

102 Jon Kanpur 9900012222

103 Ron Chennai 7778881212

104 Lester Bangalore 9990000123

104 Lester Bangalore 8123450987
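
The conversion to 1NF shown above, one row per mobile number, can be sketched in Python. The representation of the unnormalized table (a list value in the emp_mobile field) and the function name to_1nf are assumptions made for this example.

```python
employees = [
    {"emp_id": 101, "emp_name": "Herschel", "emp_address": "New Delhi",
     "emp_mobile": ["8912312390"]},
    {"emp_id": 102, "emp_name": "Jon", "emp_address": "Kanpur",
     "emp_mobile": ["8812121212", "9900012222"]},
    {"emp_id": 103, "emp_name": "Ron", "emp_address": "Chennai",
     "emp_mobile": ["7778881212"]},
    {"emp_id": 104, "emp_name": "Lester", "emp_address": "Bangalore",
     "emp_mobile": ["9990000123", "8123450987"]},
]


def to_1nf(rows, multi_attr):
    """Emit one row per value of the multi-valued attribute."""
    flat = []
    for row in rows:
        for value in row[multi_attr]:
            new_row = dict(row)          # copy the single-valued attributes
            new_row[multi_attr] = value  # replace the list with one atomic value
            flat.append(new_row)
    return flat


flat = to_1nf(employees, "emp_mobile")
print(len(flat))  # 6
for row in flat:
    print(row["emp_id"], row["emp_name"], row["emp_address"], row["emp_mobile"])
```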


4.2.2. Second Normal Form (2NF)

o For 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are fully functionally dependent on
the primary key.
o If the primary key is a composite key, every non-key attribute must depend on the
entire key, not on just a part of it.
o No partial dependency exists.

Example: Let's assume a school stores the data of teachers and the subjects they teach.
In the school, a teacher can teach more than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38

83 Computer 38
Candidate Keys: {teacher_id, subject}
Non-prime attribute: teacher_age

The table is in 1NF because each attribute has atomic values. However, it is not in 2NF
because the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID alone, which is a
proper subset of the candidate key. This violates the rule for 2NF, which says “no non-
prime attribute is dependent on a proper subset of any candidate key of the table”.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE

25 30

47 35

83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math

83 Computer
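
One way to confirm that this decomposition is lossless is to project the original table onto the two new schemas and join the results back together. The sketch below does this with plain Python lists; the helper names project, natural_join and as_set are illustrative.

```python
def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples."""
    seen = set()
    out = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


def natural_join(left, right):
    """Join two tables on the attributes they have in common."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


def as_set(rows):
    """Make rows comparable regardless of row or column order."""
    return {tuple(sorted(row.items())) for row in rows}


teacher = [
    {"TEACHER_ID": 25, "SUBJECT": "Chemistry", "TEACHER_AGE": 30},
    {"TEACHER_ID": 25, "SUBJECT": "Biology", "TEACHER_AGE": 30},
    {"TEACHER_ID": 47, "SUBJECT": "English", "TEACHER_AGE": 35},
    {"TEACHER_ID": 83, "SUBJECT": "Math", "TEACHER_AGE": 38},
    {"TEACHER_ID": 83, "SUBJECT": "Computer", "TEACHER_AGE": 38},
]

teacher_detail = project(teacher, ["TEACHER_ID", "TEACHER_AGE"])
teacher_subject = project(teacher, ["TEACHER_ID", "SUBJECT"])
rejoined = natural_join(teacher_subject, teacher_detail)

print(len(teacher_detail), len(teacher_subject))  # 3 5
print(as_set(rejoined) == as_set(teacher))  # True
```

The age of each teacher is now stored once, and joining on TEACHER_ID gives back exactly the original five rows.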

4.2.3. Third Normal Form (3NF)

o A relation is in 3NF if it is in 2NF and does not contain any transitive
dependency.
o 3NF is used to reduce data duplication. It is also used to achieve data
integrity.
o If a relation is in 2NF and there is no transitive dependency for non-prime attributes,
then the relation is in third normal form.
o In other words, all transitive dependencies must be removed: a non-key attribute may
not be functionally dependent on another non-key attribute.

Example for Third Normal Form (3NF): Consider the following example, in which students of the
same batch (same year) live in one building or dormitory.


STUDENT

StudID Stud_F_Name Stud_L_Name Dep’t Year Dormitary

125/97 Abebe Mekuria Info Sc 1 401

654/95 Lemma Alemu Geog 3 403

842/95 Chane Kebede CompSc 3 403

165/97 Alem Kebede InfoSc 1 401

985/95 Almaz Belay Geog 3 403

o This schema is in 2NF since the primary key is a single attribute.
o Let’s take StudID, Year and Dormitary and see the dependencies.
o StudID → Year and Year → Dormitary.
o Year cannot determine StudID, and Dormitary cannot determine StudID. Then,
transitively, StudID → Dormitary.
o To convert it to 3NF we need to remove all transitive dependencies of non-key attributes
on other non-key attributes.
o The non-primary-key attributes that depend on each other are moved to another table
and linked with the main table using a candidate key - foreign key relationship.


STUDENT table:

StudID Stud_F_Name Stud_L_Name Dep’t Year

125/97 Abebe Mekuria Info Sc 1

654/95 Lemma Alemu Geog 3

842/95 Chane Kebede CompSc 3

165/97 Alem Kebede InfoSc 1

985/95 Almaz Belay Geog 3

DORM table:

Year Dormitary

1 401

3 403

A relation is in third normal form if it satisfies at least one of the following conditions for every non-
trivial functional dependency X → Y.

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal


Super key in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.


Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively
dependent on the super key (EMP_ID). This violates the rule of third normal form.

That's why we need to move EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as its primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010

333 Stephan 02228

444 Lan 60007

555 Katharine 06389

666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal
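
The two 3NF conditions (X is a super key, or Y consists of prime attributes) can be checked mechanically once the dependencies and candidate keys are written down. The sketch below is illustrative: the function names are hypothetical, and the dependency EMP_ID → {EMP_NAME, EMP_ZIP} is derived here from the stated candidate key rather than quoted from the example.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations_3nf(attributes, fds, candidate_keys):
    """Return the FDs X -> Y where X is not a super key and Y is not all prime."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        dependent = set(rhs) - set(lhs)  # ignore the trivial part
        is_super_key = closure(lhs, fds) >= set(attributes)
        if dependent and not is_super_key and not dependent <= prime:
            bad.append((sorted(lhs), sorted(rhs)))
    return bad


attributes = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
fds = [
    ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),      # assumed from candidate key {EMP_ID}
    ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"}),
]

print(violations_3nf(attributes, fds, [{"EMP_ID"}]))
# [(['EMP_ZIP'], ['EMP_CITY', 'EMP_STATE'])]

# After decomposition, each table is checked against its own dependencies
print(violations_3nf({"EMP_ID", "EMP_NAME", "EMP_ZIP"}, fds[:1], [{"EMP_ID"}]))      # []
print(violations_3nf({"EMP_ZIP", "EMP_STATE", "EMP_CITY"}, fds[1:], [{"EMP_ZIP"}]))  # []
```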

4.2.4. Boyce Codd normal form (BCNF)

o BCNF is an advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key
of the table.
o Equivalently, the table should be in 3NF, and the determinant (left-hand side) of every FD
must be a super key.
o Isolate Independent Multiple Relationships - no table may contain two or more 1:N or N:M
relationships that are not directly related. (This rule is more commonly stated as the
requirement for fourth normal form.)


Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549

In the above table Functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key, yet each
appears as the determinant of a functional dependency.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India

364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300


Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT

264 Designing

264 Testing

364 Stores

364 Developing

Functional dependencies:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because the left side of both functional dependencies is a key.
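
The BCNF condition, that every determinant is a super key, can be verified for the original table and for each decomposed table. The sketch below is a simplification: for a decomposed table it only considers the dependencies whose attributes all lie inside that table, rather than computing a full projection of the dependencies. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the determinants that are not super keys of the given relation."""
    attributes = set(attributes)
    bad = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if not (lhs <= attributes and rhs <= attributes):
            continue  # this FD does not apply to the relation (simplification)
        if rhs <= lhs:
            continue  # trivial FDs never violate BCNF
        if not closure(lhs, fds) >= attributes:
            bad.append(sorted(lhs))
    return bad


fds = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]

original = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
print(bcnf_violations(original, fds))  # [['EMP_ID'], ['EMP_DEPT']]

decomposed = {
    "EMP_COUNTRY": {"EMP_ID", "EMP_COUNTRY"},
    "EMP_DEPT": {"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"},
    "EMP_DEPT_MAPPING": {"EMP_ID", "EMP_DEPT"},
}
for name, attrs in decomposed.items():
    print(name, bcnf_violations(attrs, fds))  # each table prints an empty list
```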
