Unit 4: Relational Database Design
This represents the result of a natural join on the relations corresponding to instructor and
department. This seems like a good idea because some queries can be expressed using fewer joins,
until we think carefully about the facts about the university that led to our E-R design.
Let us consider the instance of the inst_dept relation shown in the figure below. Notice that we
have to repeat the department information (“building” and “budget”) once for each instructor in the
department. For example, the information about the Comp. Sci. department (Taylor, 100000) is
included in the tuples of instructors Katz, Srinivasan, and Brandt.
It is important that all these tuples agree as to the budget amount since otherwise our
database would be inconsistent. In our original design using instructor and department, we stored
the amount of each budget exactly once. This suggests that using inst_dept is a bad idea since it
Department of IT Page 1
stores the budget amounts redundantly and runs the risk that some user might update the budget
amount in one tuple but not all, and thus create inconsistency.
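The update risk described above can be illustrated with a short Python sketch. The IDs and salaries are hypothetical values invented for illustration; only the instructor names, the building (Taylor) and the budget (100000) come from the text. The helper name inconsistent_departments is also an assumption, not part of any library.

```python
# Hypothetical inst_dept tuples. IDs and salaries are made up for illustration;
# the names, building and budget follow the Comp. Sci. example in the text.
inst_dept = [
    {"ID": 1, "name": "Katz", "salary": 75000,
     "dept_name": "Comp. Sci.", "building": "Taylor", "budget": 100000},
    {"ID": 2, "name": "Srinivasan", "salary": 65000,
     "dept_name": "Comp. Sci.", "building": "Taylor", "budget": 100000},
    {"ID": 3, "name": "Brandt", "salary": 92000,
     "dept_name": "Comp. Sci.", "building": "Taylor", "budget": 100000},
]


def inconsistent_departments(rows):
    """Return the departments whose tuples disagree on (building, budget)."""
    facts_by_dept = {}
    for row in rows:
        facts = facts_by_dept.setdefault(row["dept_name"], set())
        facts.add((row["building"], row["budget"]))
    return sorted(d for d, facts in facts_by_dept.items() if len(facts) > 1)


before = inconsistent_departments(inst_dept)

# A user updates the budget in one tuple but not in the others.
inst_dept[0]["budget"] = 120000
after = inconsistent_departments(inst_dept)

print("before update:", before)
print("after update:", after)
```

With separate instructor and department relations the budget would be stored once, so a partial update of this kind could not occur.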
Even if we decided to live with the redundancy problem, there is still another problem with
the inst_dept schema. Suppose we are creating a new department in the university. In the
alternative design above, we cannot represent directly the information concerning a department
(dept_name, building, budget) unless that department has at least one instructor at the university.
This is because tuples in the inst_dept table require values for ID, name, and salary. This means that
we cannot record information about the newly created department until the first instructor is hired
for the new department.
Update Anomalies: - If one copy of such repeated data is updated, an inconsistency is created
unless all copies are similarly updated.
Insertion Anomalies: - It may not be possible to store certain information unless some other,
unrelated, information is stored as well.
Deletion Anomalies: - It may not be possible to delete certain information without losing some
other, unrelated, information as well.
The flaw in this decomposition arises from the possibility that the enterprise has two employees
with the same name. This is not unlikely in practice, as many cultures have certain highly popular
names. Of course each person would have a unique employee-id, which is why ID can serve as the
primary key. As an example, let us assume two employees, both named Kim, work at the university
and have the following tuples in the relation on schema employee in the original design:
The above figure shows these tuples, the resulting tuples using the schemas resulting from the
decomposition, and the result if we attempted to regenerate the original tuples using a natural join.
As we see in the figure, the two original tuples appear in the result along with two new tuples that
incorrectly mix data values pertaining to the two employees named Kim. Although we have more
tuples, we actually have less information in the following sense. We can indicate that a certain
street, city, and salary pertain to someone named Kim, but we are unable to distinguish which of
the Kims. Thus, our decomposition is unable to represent certain important facts about the
university employees. Clearly, we would like to avoid such decompositions. We shall refer to such
decompositions as being lossy decompositions, and, conversely, to those that are not as lossless
decompositions.
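A lossy decomposition of this kind can be reproduced with a small Python sketch. It assumes the employee schema is (ID, name, street, city, salary) and that it is split into (ID, name) and (name, street, city, salary); the IDs, streets, cities and salaries are illustrative values, and project and natural_join are hypothetical helper names.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    out = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in out:
            out.append(projected)
    return out


def natural_join(r, s):
    """Join two lists of dicts on every attribute name they share."""
    result = []
    for a in r:
        for b in s:
            common = set(a) & set(b)
            if all(a[k] == b[k] for k in common):
                merged = {**a, **b}
                if merged not in result:
                    result.append(merged)
    return result


# Two employees who share the name Kim (all values are illustrative).
employee = [
    {"ID": 57766, "name": "Kim", "street": "Main",
     "city": "Perryridge", "salary": 75000},
    {"ID": 98776, "name": "Kim", "street": "North",
     "city": "Hampton", "salary": 67000},
]

employee1 = project(employee, ["ID", "name"])
employee2 = project(employee, ["name", "street", "city", "salary"])
rejoined = natural_join(employee1, employee2)

for row in rejoined:
    print(row)
```

The join returns the two original tuples plus two spurious ones, because name is the only shared attribute and it does not identify an employee.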
NORMALIZATION: -
Database normalization is a technique for organizing the data in a database. It is a
systematic approach of decomposing tables to eliminate data redundancy (repetition) and
undesirable characteristics such as insertion, update, and deletion anomalies. Normalization divides
a larger table into smaller tables and links them using relationships. The normal forms are used to
reduce redundancy in database tables.
Normalization is used for mainly two purposes:
Eliminating redundant (useless) data.
Ensuring data dependencies make sense, i.e., data is logically stored.
Problems without Normalization
If a table is not properly normalized and has data redundancy, it will not only consume extra
storage space but also make the database difficult to handle and update without risking data
loss. Insertion, update, and deletion anomalies are very frequent if a database is not normalized. To
understand these anomalies, let us take an example of a Student table.
Rollno name branch hod office_tel
Insertion Anomaly: -
Suppose a student is newly admitted but has not yet opted for a branch. The student's data
cannot be inserted unless we set the branch information to NULL. Also, if we have to
insert data for 100 students of the same branch, the branch information will be repeated for all
100 students. These scenarios are insertion anomalies.
Update Anomaly: -
What if Mr. X leaves the college, or is no longer the HOD of the computer science department? In that
case all the student records of that branch will have to be updated, and if by mistake we miss any record, it will
lead to data inconsistency. This is an update anomaly.
Deletion Anomaly: -
In our Student table, two different kinds of information are kept together: student information and branch
information. Hence, at the end of the academic year, if student records are deleted, we will also lose
the branch information. This is a deletion anomaly.
Normalization Rule
Normalization rules are divided into the following normal forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
5. Fourth Normal Form
Normal Form Description
5NF A relation is in 5NF if it is in 4NF, every non-trivial join dependency in it is
implied by its candidate keys, and joining its decompositions is lossless.
14 John 7272826385, 9064738238 UP
The decomposition of the EMPLOYEE table into 1NF has been shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
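The conversion to 1NF shown above can be sketched in Python. The function name to_first_normal_form and the comma separator are assumptions made for this illustration; the row itself is the EMPLOYEE row from the example.

```python
# The EMPLOYEE row before 1NF: EMP_PHONE holds two values in one field.
employee = [
    {"EMP_ID": 14, "EMP_NAME": "John",
     "EMP_PHONE": "7272826385, 9064738238", "EMP_STATE": "UP"},
]


def to_first_normal_form(rows, column, separator=","):
    """Emit one row per value found in a multi-valued column."""
    flat = []
    for row in rows:
        for value in row[column].split(separator):
            # Copy the row and replace the multi-valued field by one atomic value.
            flat.append({**row, column: value.strip()})
    return flat


employee_1nf = to_first_normal_form(employee, "EMP_PHONE")
for row in employee_1nf:
    print(row["EMP_ID"], row["EMP_NAME"], row["EMP_PHONE"], row["EMP_STATE"])
```

Each output row now holds a single phone number, at the cost of repeating the other columns.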
What is Dependency?
Let's take an example of a Student table with columns student_id, name, reg_no (registration
number), branch and address (student's home address).
student_id name reg_no branch address
In this table, student_id is the primary key and will be unique for every row, hence we can
use student_id to fetch any row of data from this table. Even in a case where student names are the
same, if we know the student_id we can easily fetch the correct record.
student_id name reg_no branch address
Hence we can say a Primary Key for a table is the column or a group of columns (composite
key) which can uniquely identify each record in the table. I can get the branch name of student
with student_id 10. Similarly, I can get name of student with student_id 10 or 11. So all I need
is student_id and every other column depends on it, or can be fetched using it. This
is Dependency and we also call it Functional Dependency.
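As a rough illustration, a functional dependency can be tested against a set of rows in Python. The sample rows are hypothetical (only student_id 10 and 11 are mentioned in the text), and holds is an illustrative helper name. Note that a check on sample data can only refute a dependency; it cannot prove that the dependency holds for every possible instance.

```python
def holds(rows, lhs, rhs):
    """True if lhs -> rhs holds in rows: equal lhs values imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical Student rows; two students deliberately share a name.
students = [
    {"student_id": 10, "name": "Akon", "reg_no": "07-WY",
     "branch": "CSE", "address": "Kerala"},
    {"student_id": 11, "name": "Akon", "reg_no": "08-WY",
     "branch": "IT", "address": "Gujarat"},
]

id_determines_rest = holds(students, ["student_id"],
                           ["name", "reg_no", "branch", "address"])
name_determines_branch = holds(students, ["name"], ["branch"])

print(id_determines_rest, name_determines_branch)
```

Here student_id determines every other column, while name does not determine branch because the two students share a name.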
1 Java
2 C++
3 Php
Now we have a Student table with student information and another table Subject for
storing subject information.
Let's create another table Score, to store the marks obtained by students in the respective
subjects. We will also be saving name of the teacher who teaches that subject along with marks.
score_id student_id subject_id marks teacher
1 10 1 70 Java Teacher
2 10 2 75 C++ Teacher
3 11 1 80 Java Teacher
In the Score table we are saving the student_id to know which student the marks belong to
and the subject_id to know which subject the marks are for. Together, student_id +
subject_id form a Candidate Key for this table, which can be the Primary key. Now if you look
at the Score table, we have a column named teacher which depends only on the subject: for
Java it's Java Teacher, for C++ it's C++ Teacher, and so on.
The primary key for this table is a composition of two columns which
is student_id & subject_id but the teacher's name only depends on subject, hence the subject_id,
and has nothing to do with student_id. This is Partial Dependency, where an attribute in a table
depends on only a part of the primary key and not on the whole key.
There can be many different solutions for removing Partial Dependency, but our objective is
to remove the teacher's name from the Score table. The simplest solution is to remove the
column teacher from the Score table and add it to the Subject table. Hence, the Subject table will
become:
subject_id subject_name teacher
1 10 1 70
2 10 2 75
3 11 1 80
Quick Recap: -
1. For a table to be in the Second Normal form, it should be in the First Normal form and it
should not have Partial Dependency.
2. Partial Dependency exists, when for a composite primary key, any attribute in the table
depends only on a part of the primary key and not on the complete primary key.
3. To remove Partial dependency, we can divide the table, remove the attribute which is causing
partial dependency, and move it to some other table where it fits in well.
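The recap can be made concrete with a small Python sketch that moves teacher from Score to Subject. It uses the Score and Subject rows from the example; the variable names score_2nf and rebuilt are illustrative assumptions.

```python
# Score table with the partial dependency subject_id -> teacher.
score = [
    {"score_id": 1, "student_id": 10, "subject_id": 1,
     "marks": 70, "teacher": "Java Teacher"},
    {"score_id": 2, "student_id": 10, "subject_id": 2,
     "marks": 75, "teacher": "C++ Teacher"},
    {"score_id": 3, "student_id": 11, "subject_id": 1,
     "marks": 80, "teacher": "Java Teacher"},
]

# Subject table keyed by subject_id.
subject = {
    1: {"subject_name": "Java"},
    2: {"subject_name": "C++"},
    3: {"subject_name": "Php"},
}

# Move teacher to the table whose key it actually depends on.
for row in score:
    subject[row["subject_id"]]["teacher"] = row["teacher"]

# Drop teacher from Score; every remaining column depends on the whole key.
score_2nf = [{k: v for k, v in row.items() if k != "teacher"} for row in score]

# Joining back on subject_id recovers the original rows, so nothing was lost.
rebuilt = [{**row, "teacher": subject[row["subject_id"]]["teacher"]}
           for row in score_2nf]

print(score_2nf)
print(subject)
```

The teacher name is now stored once per subject instead of once per score row.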
In the Score table, we need to store some more information, which is the exam name and total
marks, so let's add 2 more columns to the Score table.
score_id student_id subject_id marks exam_name total_marks
How to remove Transitive Dependency?
The solution is very simple. Take out the columns exam_name and total_marks from Score table
and put them in an Exam table and use the exam_id wherever required.
Score Table: In 3rd Normal Form
score_id student_id subject_id marks exam_id
1 Workshop 200
2 Mains 70
3 Practicals 30
103 C# P.Chash
In the table above student_id, subject together form the primary key, because
using student_id and subject, we can find all the columns of the table. One more important point
is, one professor teaches only one subject, but one subject may have two different professors.
Hence, there is a dependency between subject and professor here, where subject depends on the
professor name.
This table satisfies the 1st Normal form because all the values are atomic, column names are
unique and all the values stored in a particular column are of same domain.
This table also satisfies the 2nd Normal Form as there is no Partial Dependency.
And, there is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.
But this table is not in Boyce-Codd Normal Form.
In the table above, student_id, subject form the primary key, which means the subject column is
a prime attribute. But, there is one more dependency, professor → subject. BCNF requires the
left side of every non-trivial dependency to be a super key, and professor is a non-prime attribute
that is not a super key, so this dependency violates BCNF.
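The BCNF test can be sketched in Python using attribute closure: a non-trivial dependency violates BCNF when the closure of its left side does not cover the whole schema. The function names closure and bcnf_violations are illustrative assumptions; the schema and the two dependencies are the ones described above.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    bad = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        superkey = closure(lhs, fds) >= set(schema)
        if not trivial and not superkey:
            bad.append((lhs, rhs))
    return bad


schema = ("student_id", "subject", "professor")
fds = [
    (("student_id", "subject"), ("professor",)),
    (("professor",), ("subject",)),
]

violations = bcnf_violations(schema, fds)
print(violations)
```

Only professor → subject is reported, since professor determines subject but not student_id.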
5. Fourth Normal Form (4NF): -
Fourth Normal Form comes into the picture when a Multi-valued Dependency occurs in a relation.
1 Science Cricket
1 Maths Hockey
2 C# Cricket
2 Php Hockey
In the table above, student with s_id 1 has opted for two courses, Science and Maths, and has
two hobbies, Cricket and Hockey. The two records for the student with s_id 1 will give rise to
two more records, as shown below, because the student has two hobbies, and both hobbies
must be specified along with each of the courses.
1 Science Cricket
1 Maths Hockey
1 Science Hockey
1 Maths Cricket
In the table above, there is no relationship between the columns course and hobby. They
are independent of each other. So there is a multi-valued dependency, which leads to unnecessary
repetition of data and other anomalies as well.
How to satisfy 4th Normal Form?
To make the above relation satisfy the 4th normal form, we can decompose the table into 2 tables.
Now this relation satisfies the fourth normal form. A table can also have functional dependency
along with multi-valued dependency. In that case, the functionally dependent columns are
moved in a separate table and the multi-valued dependent columns are moved to separate
tables. If you design your database carefully, you can easily avoid these issues.
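A minimal Python sketch of the 4NF decomposition for the student with s_id 1 follows. The names course_opted and hobbies for the two decomposed tables are assumptions made for this illustration; the four rows are the ones listed in the example.

```python
# The four rows for s_id 1: every course paired with every hobby.
student = [
    (1, "Science", "Cricket"),
    (1, "Maths", "Hockey"),
    (1, "Science", "Hockey"),
    (1, "Maths", "Cricket"),
]

# Decompose on the two independent multi-valued facts.
course_opted = sorted({(s_id, course) for s_id, course, _ in student})
hobbies = sorted({(s_id, hobby) for s_id, _, hobby in student})

# Joining the two tables on s_id regenerates the original combinations.
rejoined = sorted(
    (s_id, course, hobby)
    for s_id, course in course_opted
    for other_id, hobby in hobbies
    if s_id == other_id
)

print(course_opted)
print(hobbies)
```

Each fact (a course or a hobby) is stored once, and the join on s_id still yields all four original rows.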
Example: -
SUBJECT LECTURER SEMESTER
In the above table, John takes both the Computer and Math classes for Semester 1 but he doesn't
take the Math class for Semester 2. In this case, the combination of all these fields is required to identify
valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who
will be taking it, so we would have to leave Lecturer and Subject as NULL. But all three columns
together act as the primary key, so we can't leave the other two columns blank. So to bring the above
table into 5NF, we can decompose it into three relations P1, P2 & P3:
FUNCTIONAL DEPENDENCY THEORY: -
Functional Dependency (FD) determines the relation of one attribute to another attribute
in a database management system (DBMS). A functional dependency is a relationship that
exists between two attributes or sets of attributes. It typically exists between the primary key and a non-key attribute
within a table. Functional dependencies help you maintain the quality of data in the database.
A functional dependency is denoted by an arrow →. The functional dependency of Y on X is
represented by X → Y. The left side of an FD is known as the determinant, and the right side
is known as the dependent. Functional dependencies play a vital role in telling the
difference between good and bad database design.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address. Here the
Emp_Id attribute uniquely identifies the Emp_Name attribute of the employee table because if we
know the Emp_Id, we can tell the employee name associated with it. The functional dependency can be
written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
Key terms: -
Here are some key terms for functional dependency:
S.NO  KEY TERM       DESCRIPTION
1     Axiom          Axiom is a set of inference rules used to infer all the
                     functional dependencies on a relational database.
2     Decomposition  It is a rule that suggests if you have a table that appears to
                     contain two entities which are determined by the same
                     primary key then you should consider breaking them up
                     into two different tables.
3     Dependent      It is displayed on the right side of the functional
                     dependency diagram.
4     Determinant    It is displayed on the left side of the functional
                     dependency diagram.
5     Union          It suggests that if two tables are separate, and the PK is the
                     same, you should consider putting them together.
We know that STUD_NO is unique for each student. So STUD_NO->STUD_NAME,
STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE, STUD_NO->STUD_COUNTRY and
STUD_NO->STUD_AGE will all be true.
Similarly, STUD_STATE->STUD_COUNTRY will be true because if two records have the same STUD_STATE,
they will have the same STUD_COUNTRY as well.
For relation STUDENT_COURSE, COURSE_NO->COURSE_NAME will be true as two records with the
same COURSE_NO will have the same COURSE_NAME.
Attribute Closure: -
Attribute closure of an attribute set can be defined as set of attributes which can be functionally
determined from it.
How to find attribute closure of an attribute set?
To find attribute closure of an attribute set:
1. Add the elements of the attribute set to the result set.
2. Repeatedly add to the result set any attributes that can be functionally determined from the
elements already in the result set, until no more can be added.
Using FD set, attribute closure can be determined as:
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE}
(STUD_STATE)+ = {STUD_STATE, STUD_COUNTRY}
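The closure procedure can be sketched in Python as follows. The function name attribute_closure and the representation of each dependency as a pair of sets are assumptions made for illustration; the dependencies are the ones listed for the STUDENT relation.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)              # step 1: start with the attribute set
    changed = True
    while changed:                   # step 2: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Each FD is a (left side, right side) pair of attribute sets.
fds = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_PHONE"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_NO"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
]

stud_no_closure = attribute_closure({"STUD_NO"}, fds)
stud_state_closure = attribute_closure({"STUD_STATE"}, fds)

print(sorted(stud_no_closure))
print(sorted(stud_state_closure))
```

Because the closure of STUD_NO contains every attribute of the relation, STUD_NO is a candidate key under these dependencies.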
Symbolically: A ->B is trivial functional dependency if B is a subset of A. The following
dependencies are also trivial: A->A & B->B.
For Example: -
Consider a table with two columns Student_id and Student_Name. {Student_Id, Student_Name} ->
Student_Id is a trivial functional dependency as Student_Id is a subset of {Student_Id,
Student_Name}. That makes sense because if we know the values of Student_Id and
Student_Name then the value of Student_Id can be uniquely determined. Also, Student_Id ->
Student_Id & Student_Name -> Student_Name are trivial dependencies too.
2. Non-trivial functional dependency: - If a functional dependency X->Y holds true where Y is not
a subset of X then this dependency is called a non-trivial functional dependency.
For Example: -
An employee table with three attributes: emp_id, emp_name, emp_address. The following
functional dependencies are non-trivial:
emp_id -> emp_name (emp_name is not a subset of emp_id)
emp_id -> emp_address (emp_address is not a subset of emp_id)
On the other hand, the following dependency is trivial:
{emp_id, emp_name} -> emp_name [emp_name is a subset of {emp_id, emp_name}]
If an FD X->Y holds true where the intersection of X and Y is empty, then this dependency is said to be a
completely non-trivial functional dependency.
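The three cases can be told apart with simple set operations, as in this Python sketch. The function name classify is an illustrative assumption; the attribute names come from the employee example.

```python
def classify(lhs, rhs):
    """Classify the dependency lhs -> rhs by how rhs overlaps lhs."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # right side is a subset of the left
    if lhs & rhs:
        return "non-trivial"             # some overlap, but not a subset
    return "completely non-trivial"      # no attribute in common


first = classify({"emp_id", "emp_name"}, {"emp_name"})
second = classify({"emp_id"}, {"emp_name"})
third = classify({"emp_id", "emp_name"}, {"emp_name", "emp_address"})

print(first, second, third, sep=" | ")
```

Note that emp_id -> emp_name is non-trivial and, since the two sides share no attribute, also completely non-trivial.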
Here columns manuf_year and color are independent of each other and dependent on
bike_model. In this case these two columns are said to be multivalued dependent on bike_model.
These dependencies can be represented like this:
bike_model ->> manuf_year
bike_model ->> color
4. Transitive dependency: -
A functional dependency is said to be transitive if it is indirectly formed by two functional
dependencies.
X -> Z is a transitive dependency if the following three functional dependencies hold true:
X->Y
Y does not ->X
Y->Z
Note: A transitive dependency can only occur in a relation of three or more attributes. This
dependency helps us in normalizing the database to 3NF (3rd Normal Form).
Example: -
Company CEO Age
Alibaba Jack Ma 54
{Company} -> {CEO} (if we know the company, we know its CEO's name)
{CEO} -> {Age} If we know the CEO, we know the Age
Therefore, according to the rule of transitive dependency:
{Company} -> {Age} should hold. That makes sense because if we know the company name, we
can know its CEO's age.
Note: You need to remember that transitive dependency can only occur in a relation of three or
more attributes.
Decomposition: -
The process of breaking up or dividing a single relation into two or more sub-relations is called
decomposition of a relation. When a relation in the relational model is not in an appropriate normal
form, decomposition of the relation is required. In a database, it breaks the table into
multiple tables. If the relation is not decomposed properly, it may lead to problems like loss
of information. Decomposition is used to eliminate some of the problems of bad design like
anomalies, inconsistencies, and redundancy.
Types of Decomposition: -
1. Lossless Decomposition.
2. Dependency Preserving.
1. Lossless Decomposition: - If no information is lost from the relation that is decomposed,
then the decomposition is lossless. A lossless decomposition guarantees that the join of the
decomposed relations results in the same relation that was decomposed. A decomposition is said to be
lossless if the natural join of all the decomposed relations gives back the original relation.
Example: -
EMPLOYEE_DEPARTMENT table
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then the resultant
relation will look like:
Employee ⋈ Department
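The join of the two tables can be sketched in Python. The rows are taken from the EMPLOYEE and DEPARTMENT tables above; the DEPARTMENT column names DEPT_ID and DEPT_NAME are assumptions, since that table's header is not shown.

```python
# EMPLOYEE rows: (EMP_ID, EMP_NAME, EMP_AGE, EMP_CITY)
employee = [
    (22, "Denim", 28, "Mumbai"),
    (33, "Alina", 25, "Delhi"),
    (46, "Stephan", 30, "Bangalore"),
    (52, "Katherine", 36, "Mumbai"),
    (60, "Jack", 40, "Noida"),
]

# DEPARTMENT rows: (DEPT_ID, EMP_ID, DEPT_NAME); column names are assumed.
department = [
    (827, 22, "Sales"),
    (438, 33, "Marketing"),
    (869, 46, "Finance"),
    (575, 52, "Production"),
    (678, 60, "Testing"),
]

# Natural join on the common column EMP_ID.
joined = [
    (emp_id, name, age, city, dept_id, dept_name)
    for emp_id, name, age, city in employee
    for dept_id, dept_emp_id, dept_name in department
    if emp_id == dept_emp_id
]

# Projecting the join back onto each schema returns the two tables unchanged.
employee_again = sorted({row[:4] for row in joined})
department_again = sorted({(row[4], row[0], row[5]) for row in joined})

for row in joined:
    print(row)
```

The join produces exactly one row per employee and no spurious rows, because EMP_ID is unique in both tables.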
UNIT WISE IMPORTANT QUESTIONS: -
TWO MARK QUESTIONS: -
1. Define Functional Dependency. [APRIL 2019, APRIL 2018, MAY 2016]
2. What is the necessity of normalization in a database? [APRIL 2019]
3. Write the definition of BCNF. [APRIL 2018]
4. Define Functional Dependency. Why are some functional dependencies trivial? [APRIL 2017]
5. Demonstrate transitive dependency. Give an example. [APRIL 2017]
6. Define 3rd Normal form. [APRIL 2017]
7. Write the definition of 3NF. [OCTOBER 2018]
8. Define MVD. [OCTOBER 2018]
9. Write any two reasons for going to normalization on a relation. [OCTOBER 2017]
10. What are multivalued dependencies? Give an example. [OCTOBER 2017]
11. Why are certain functional dependencies called trivial? [OCTOBER 2017]
12. Identify the problems caused by Redundancy. [DECEMBER 2016]
13. Define Transitive dependencies. [DECEMBER 2016]
ESSAY QUESTIONS: -
1. Explain how 3NF and BCNF can remove redundancy from the relations? [APRIL 2019]
2. Discuss Multi valued dependency and 4NF in detail. [APRIL 2019]
3. What is MVD and write the MVD axioms? [APRIL 2018]
4. Explain 4NF and uses of 4NF with examples. [APRIL 2018]
5. Consider a relation scheme R=(A,B,C,D,E,H) on which the following functional dependencies hold:
{A⟶B, BC⟶D, E⟶C, D⟶A}. Write the candidate keys of R. [APRIL 2017]
6. Consider the statement “Every relation in 3 NF is also in BCNF and Vice Versa”. Judge whether statement
is correct or not? Give explanation. [APRIL 2017]
7. Explain Lossless Join Decomposition and Dependency Preserving Decomposition. [MAY 2016]
8. Define BCNF? How does BCNF differ from 3NF? Explain with an example. [MAY 2016]
9. What is FD and write the FD axioms. [OCTOBER 2018]
10. Explain BCNF and uses of BCNF with examples. [OCTOBER 2018]
11. Consider a relation scheme R = (A, B, C, D, E, H) on which the following functional dependencies hold:
{A⟶B, BC⟶D, E⟶C, D⟶A}. Compute the canonical cover. [OCTOBER 2017]
12. Define first normal form and second normal form. Give an example. [OCTOBER 2017]
13. What is schema refinement? Discuss the problems caused by redundancy. [DECEMBER 2016]
14. Contrast 3NF decomposition method with BCNF decomposition method illustratively. [DECEMBER 2016]