Unit-Iv Schema Refinement and Normalisation: Unit 4 Contents at A Glance
UNIT-IV
SCHEMA REFINEMENT AND
NORMALISATION
Schema Refinement:
Schema refinement means improving a schema by applying a systematic technique. The main
technique of schema refinement is decomposition.
Normalisation or Schema Refinement is a technique of organizing the data in the
database. It is a systematic approach of decomposing tables to eliminate data
redundancy and undesirable characteristics like Insertion, Update and Deletion Anomalies.
Redundancy refers to repetition of same data or duplicate copies of same data stored in
different locations.
Anomalies: Anomalies are the problems that arise in poorly planned, unnormalised
databases where all the data is stored in one table, which is sometimes called a flat file
database. Let us consider such a schema:
Here all the data is stored in a single table, which causes redundancy of data and therefore
anomalies, as SID and Sname are repeated for each CID. Let us discuss the
anomalies one by one.
Due to redundancy of data we may get the following problems:
1. Insertion anomalies: It may not be possible to store some information unless
some other information is stored as well.
2. Redundant storage: Some information is stored repeatedly.
Insertion Anomaly and Deletion Anomaly: These anomalies exist only because of
redundancy; otherwise they do not exist.
Insertion Anomaly:
Inserting some data forces us to insert some other dummy data as well.
Deletion Anomaly:
Deleting student S3 also causes the deletion of the course.
Deleting some data forces us to delete some other useful data as well.
Candidate Key:
A candidate key is a minimal set of attributes of a relation that can be used to identify a
tuple uniquely.
Consider the student table: student(sno, sname, sphone, age)
We can take sno as a candidate key. A table can have more than one candidate key.
Types of candidate keys:
1. Simple (having only one attribute)
2. Composite (having multiple attributes as the candidate key)
Super Key:
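A super key is a set of attributes of a relation that can be used to identify a tuple uniquely.
Unlike a candidate key, it need not be minimal. In the student table above, (sno, sname) can be taken
as a super key.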
Normalization:
Normalization is the process of designing a consistent database with minimum
redundancy that supports data integrity by decomposing a given relation into
smaller relations while preserving the constraints on the relation.
Normalisation removes data redundancy and helps in designing a good database.
It involves a set of normal forms as follows:
1) First normal form (1NF)
2) Second normal form (2NF)
3) Third normal form (3NF)
4) Boyce-Codd normal form (BCNF)
5) Fourth normal form (4NF)
6) Fifth normal form (5NF)
14 John 7272826385, 9064738238 UP
14 John 7272826385 UP
14 John 9064738238 UP
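The rows above appear to show one record holding two phone numbers in a single field, followed by the same record split into one row per phone number. A minimal Python sketch of that split is given below. The column names emp_id, emp_name, emp_phone and emp_state are assumptions made for illustration, since the header of the table is not available here.

```python
# Hypothetical column names: the header of the source table is not available.
record = {
    "emp_id": 14,
    "emp_name": "John",
    "emp_phone": "7272826385, 9064738238",  # two values in one field
    "emp_state": "UP",
}

def split_multi_valued(row, column):
    """Return one copy of the row per value found in the multi-valued column."""
    values = [value.strip() for value in row[column].split(",")]
    return [{**row, column: value} for value in values]

rows_1nf = split_multi_valued(record, "emp_phone")
for row in rows_1nf:
    print(row)
```

Each resulting row holds exactly one phone number, so every field is atomic.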
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID
which is a proper subset of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
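The 2NF decomposition above can be sketched in Python as two projections of the original rows. This is only an illustration: the helper name project is an assumption, and the rejoin at the end is added to show that the two smaller tables still carry the same information.

```python
# Rows of the original table as (TEACHER_ID, SUBJECT, TEACHER_AGE).
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

def project(rows, positions):
    """Keep only the given column positions and drop duplicate rows, preserving order."""
    result = []
    for row in rows:
        picked = tuple(row[i] for i in positions)
        if picked not in result:
            result.append(picked)
    return result

teacher_detail = project(teacher, (0, 2))   # TEACHER_ID, TEACHER_AGE
teacher_subject = project(teacher, (0, 1))  # TEACHER_ID, SUBJECT

# Rejoin on TEACHER_ID to confirm that no row is lost and none is invented.
age_of = dict(teacher_detail)
rejoined = [(tid, subject, age_of[tid]) for tid, subject in teacher_subject]

print(teacher_detail)
print(teacher_subject)
print(sorted(rejoined) == sorted(teacher))
```

After the projection each teacher's age is stored once instead of once per subject.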
A relation is in third normal form if it satisfies at least one of the following conditions for every
non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
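Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
EMP_CITY and EMP_STATE depend on EMP_ZIP, which in turn depends on EMP_ID, so these
non-prime attributes are transitively dependent on EMP_ID and the table violates 3NF.
That's why we need to move EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as the primary key.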
EMPLOYEE table:
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
Example: Let's assume there is a company where employees work in more than one
department.
EMPLOYEE table:
1. EMP_ID → EMP_COUNTRY
Prepard by K.Swathi Lakshmi Durga
Assistant Professor
Department of IT
DATA BASE MANAGEMENT
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
Candidate keys:
Now, this is in BCNF because the left-hand side of both functional dependencies is a key.
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities. Hence, there is no relationship between COURSE and HOBBY.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Note: In some cases multi value dependencies may exist not more than one time in a
given relation.
Relational Decomposition
o When a relation in the relational model is not in appropriate normal form then the decomposition of a relation is
required.
o In a database, it breaks the table into multiple tables.
o If a relation is not decomposed properly, then it may lead to problems like loss of information.
o Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies, and
redundancy.
Types of Decomposition
Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then the decomposition will be lossless.
o Lossless decomposition guarantees that the join of the decomposed relations results in the same relation that was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed relations gives the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation will look
like:
Employee ⋈ Department
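The natural join above can be sketched in Python as follows. Only the join column EMP_ID is named in the text; the other column names (EMP_NAME, EMP_AGE, EMP_CITY, DEPT_ID, DEPT_NAME) are assumptions made for illustration, and natural_join and project are hypothetical helper names.

```python
# Column names other than EMP_ID are assumed for this sketch.
employee = [
    {"EMP_ID": 22, "EMP_NAME": "Denim", "EMP_AGE": 28, "EMP_CITY": "Mumbai"},
    {"EMP_ID": 33, "EMP_NAME": "Alina", "EMP_AGE": 25, "EMP_CITY": "Delhi"},
    {"EMP_ID": 46, "EMP_NAME": "Stephan", "EMP_AGE": 30, "EMP_CITY": "Bangalore"},
    {"EMP_ID": 52, "EMP_NAME": "Katherine", "EMP_AGE": 36, "EMP_CITY": "Mumbai"},
    {"EMP_ID": 60, "EMP_NAME": "Jack", "EMP_AGE": 40, "EMP_CITY": "Noida"},
]
department = [
    {"DEPT_ID": 827, "EMP_ID": 22, "DEPT_NAME": "Sales"},
    {"DEPT_ID": 438, "EMP_ID": 33, "DEPT_NAME": "Marketing"},
    {"DEPT_ID": 869, "EMP_ID": 46, "DEPT_NAME": "Finance"},
    {"DEPT_ID": 575, "EMP_ID": 52, "DEPT_NAME": "Production"},
    {"DEPT_ID": 678, "EMP_ID": 60, "DEPT_NAME": "Testing"},
]

def natural_join(left, right):
    """Combine every pair of rows that agree on all column names they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[c] == r_row[c] for c in common):
                joined.append({**l_row, **r_row})
    return joined

def project(rows, columns):
    """Keep only the given columns and drop duplicate rows, preserving order."""
    result = []
    for row in rows:
        picked = {c: row[c] for c in columns}
        if picked not in result:
            result.append(picked)
    return result

joined = natural_join(employee, department)
for row in joined:
    print(row)
```

The join has one row per employee, and projecting it back onto the columns of either table returns that table unchanged, which is what a lossless decomposition requires.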
Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at least one decomposed table.
o If a relation R is decomposed into relations R1 and R2, then each dependency of R must either be a part of
R1 or R2 or be derivable from the combination of the functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC). The
relation R is decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD
A->BC is a part of relation R1(ABC).
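The dependency preservation check can be sketched in Python using attribute closures. This is a minimal sketch of the standard textbook test, not a method taken from these notes. The function names are assumptions, and the second decomposition at the end is a hypothetical counterexample added only for contrast.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves_dependencies(fds, decomposition):
    """True if every FD can be derived using only what each decomposed relation can enforce."""
    for lhs, rhs in fds:
        reachable = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in decomposition:
                # Attributes of this part that are determined by what we already reach inside it.
                gained = closure(reachable & part, fds) & part
                if not gained <= reachable:
                    reachable |= gained
                    changed = True
        if not rhs <= reachable:
            return False
    return True

# Example from the text: R(A, B, C, D) with A -> BC, split into R1(ABC) and R2(AD).
fds_text = [(set("A"), set("BC"))]
ok = preserves_dependencies(fds_text, [set("ABC"), set("AD")])

# Hypothetical counterexample: R(A, B, C) with A -> B and B -> C, split into (AB) and (AC).
fds_other = [(set("A"), set("B")), (set("B"), set("C"))]
not_ok = preserves_dependencies(fds_other, [set("AB"), set("AC")])

print(ok, not_ok)
```

In the counterexample B -> C is lost because B and C never appear together in one decomposed relation and the dependency cannot be derived from the others.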
o 5NF is satisfied when all the tables are broken into as many tables as possible in order
to avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).
Example
SUBJECT LECTURER SEMESTER
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
Fifth normal form: Fifth normal form is related to join dependencies.
A relation R is said to be in fifth normal form if, for every join dependency JD join {R1, R2, ..., RN}
that holds over relation R, one of the following statements is true:
1) Ri = R for some i.
2) The join dependency is implied by the set of those functional dependencies over
relation R in which the left side is a key of R.
NOTE: If a relation schema is in third normal form and each of its keys consists of a single
attribute, then it is also in fifth normal form.
A join dependency JD join {R1, R2, ..., RN} is said to hold for a relation R if the decomposition
of R into R1, R2, ..., RN is a lossless join decomposition of R.
When a relation in fourth normal form is decomposed further to eliminate redundancy
and anomalies due to insert, update or delete operations, there should not be any loss
of data, and no new records should be created, when the decomposed tables are rejoined.
Solution:
Important points for solving the above type of question:
1) It is always a good idea to start checking from BCNF, then 3NF and so on.
2) If a functional dependency satisfies a normal form, then there is no need to check it
against lower normal forms. For example, ABC -> D is in BCNF (note that ABC is a super
key), so there is no need to check this dependency for lower normal forms.
Candidate keys in the given relation are {ABC, BCD}.
BCNF: ABC -> D is in BCNF. Let us check CD -> AE. CD is not a super key, so this
dependency is not in BCNF. So, R is not in BCNF.
3NF: We don't need to check ABC -> D, as it already satisfies BCNF.
Let us consider CD -> AE. CD is not a super key and E is not a prime attribute, so the relation is not in 3NF.
2NF: In 2NF, we need to check for partial dependencies. CD is a proper subset of the
candidate key BCD and it determines E, which is a non-prime attribute. So, the given relation is also
not in 2NF. So, the highest normal form is 1NF.
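The checking order described above (BCNF first, then 3NF, then 2NF) can be sketched in Python. This is a brute-force sketch for small relations, not an optimised algorithm. The attribute set R(A, B, C, D, E) is an assumption, since only the dependencies ABC -> D and CD -> AE and the candidate keys are stated in the solution, and all function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(relation, fds):
    """Brute-force search for all minimal attribute sets whose closure is the relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) == relation:
                keys.append(attrs)
    return keys

def highest_normal_form(relation, fds):
    """Check BCNF, then 3NF, then 2NF; the relation is assumed to be in 1NF already."""
    keys = candidate_keys(relation, fds)
    prime = set().union(*keys)
    # Split every FD into non-trivial single-attribute dependencies lhs -> a.
    single = [(lhs, a) for lhs, rhs in fds for a in sorted(rhs - lhs)]

    def is_superkey(attrs):
        return closure(attrs, fds) == relation

    if all(is_superkey(lhs) for lhs, _ in single):
        return "BCNF"
    if all(is_superkey(lhs) or a in prime for lhs, a in single):
        return "3NF"
    # 2NF fails if a proper subset of a candidate key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for combo in combinations(sorted(key), size):
                if closure(set(combo), fds) - prime:
                    return "1NF"
    return "2NF"

R = set("ABCDE")  # assumed attribute set
FDS = [(set("ABC"), set("D")), (set("CD"), set("AE"))]

keys = candidate_keys(R, FDS)
result = highest_normal_form(R, FDS)
print(["".join(sorted(k)) for k in keys], result)
```

For these dependencies the sketch finds the candidate keys ABC and BCD and reports 1NF, which agrees with the solution above.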
Problem 2:
Find the highest normal form of a relation
R(A,B,C,D,E) with FD set
{BC->D, AC->BE, B->E}
Step 1: As we can see, (AC)+ = {A,C,B,E,D}, but none of its proper subsets can determine all
attributes of the relation, so AC is a candidate key. A and C cannot be derived from any other
attribute of the relation, so there is only one candidate key, {AC}.
Step 2: Prime attributes are those attributes which are part of a candidate key ({A,C} in this
example); the others are non-prime ({B,D,E} in this example).
Step 3: The relation R is in 1st normal form, as a relational DBMS does not allow
multi-valued or composite attributes.
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is not a
proper subset of candidate key AC), AC->BE is in 2nd normal form (AC is the candidate
key) and B->E is in 2nd normal form (B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because in BC->D neither is BC a super key nor is
D a prime attribute, and in B->E neither is B a super key nor is E a prime attribute,
but to satisfy 3rd normal form, either the LHS of an FD should be a super key or the RHS should be a
prime attribute.
So the highest normal form of the relation is 2nd normal form.
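Steps 1 and 2 of this problem, and the 3NF check of Step 3, can be reproduced with a short Python sketch. It computes attribute closures by repeated application of the given dependencies and finds candidate keys by brute force, which is only practical for small relations. The function names are assumptions made for illustration.

```python
from itertools import combinations

R = set("ABCDE")
FDS = [(set("BC"), set("D")), (set("AC"), set("BE")), (set("B"), set("E"))]

def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(relation, fds):
    """Try attribute sets from smallest to largest and keep the minimal ones that determine everything."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # a smaller key is contained in this set
            if closure(attrs, fds) == relation:
                keys.append(attrs)
    return keys

# Step 1: closure of AC and the candidate keys.
ac_closure = closure(set("AC"), FDS)
keys = candidate_keys(R, FDS)

# Step 2: prime and non-prime attributes.
prime = set().union(*keys)
non_prime = R - prime

# Step 3: FDs whose LHS is not a super key and whose RHS is not entirely prime violate 3NF.
violations = [
    ("".join(sorted(lhs)), "".join(sorted(rhs)))
    for lhs, rhs in FDS
    if closure(lhs, FDS) != R and not (rhs - lhs) <= prime
]

print(sorted(ac_closure), [sorted(k) for k in keys])
print(sorted(prime), sorted(non_prime))
print(violations)
```

The two violating dependencies found by the sketch are BC->D and B->E, the same ones named in Step 3.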
Decomposition: It is the process of splitting the original table into smaller relations such that
the attribute set of each smaller relation is a subset of the attribute set of the original table.
Rules of decomposition:
If a relation ‘R’ is split into relations ‘R1’ and ‘R2’, the decomposition should satisfy the
following:
1) The union of the two smaller attribute sets gives all attributes of ‘R’.
R1(attributes) U R2(attributes) = R(attributes)
2) The intersection of the two relations should not be empty.
R1(attributes) ∩ R2(attributes) != null
3) The intersection of the two relations should give a key attribute, that is, the common
attributes should form a key of R1 or of R2.
R1(attributes) ∩ R2(attributes) = key attribute of R1 or R2
Properties of decomposition:
Lossless decomposition: While joining the two smaller tables no data should be lost, and the decomposition
should satisfy all the rules of decomposition. No additional data should be generated by the
natural join of the decomposed tables.
Lossy join decomposition: Information is lost after joining, or the decomposition does not satisfy
one of the above rules of decomposition.
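The three rules of decomposition can be checked mechanically for a split into two relations. The sketch below uses the usual closure-based form of rule 3, in which the common attributes must determine all attributes of R1 or of R2. The function names are assumptions, the first example reuses R(A, B, C, D) with A->BC from the dependency preservation section, and the second split is a hypothetical lossy case added for contrast.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(relation, r1, r2, fds):
    """Apply the three rules of decomposition to a split of relation into r1 and r2."""
    if r1 | r2 != relation:      # rule 1: the union must give all attributes
        return False
    common = r1 & r2
    if not common:               # rule 2: the intersection must not be empty
        return False
    determined = closure(common, fds)
    # rule 3: the common attributes must be a key of r1 or of r2
    return r1 <= determined or r2 <= determined

R = set("ABCD")
FDS = [(set("A"), set("BC"))]

lossless = is_lossless(R, set("ABC"), set("AD"), FDS)  # common attribute A is a key of R1
lossy = is_lossless(R, set("AB"), set("CD"), FDS)      # hypothetical split with no common attribute

print(lossless, lossy)
```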
example 1:
example 2:
Dependency Preservation