DBMS - Data Models and Relational Database Design Notes
Hierarchical Model
Network Model
Entity-relationship Model
Relational Model
Hierarchical Model
This database model organises data into a tree-like structure with a single root, to
which all the other data is linked. The hierarchy starts from the root data and
expands like a tree, adding child nodes to the parent nodes.
In this model, a child node will only have a single parent node.
This model efficiently describes many real-world relationships, like the index of a book,
recipes etc.
In the hierarchical model, data is organised into a tree-like structure with a one-to-many
relationship between two different types of data; for example, one department
can have many courses, many professors and, of course, many students.
Network Model
This is an extension of the Hierarchical model. In this model data is organised more
like a graph, and a child node is allowed to have more than one parent node.
Because more relationships are established between the data in this model, a record
can be reached along more than one path, which can make accessing the data easier
and faster. This database model was used to map many-to-many data
relationships.
This was the most widely used database model, before Relational Model was
introduced.
Entity-relationship Model
In this database model, each object of interest is modelled as an entity and its
characteristics as attributes.
Different entities are related using relationships.
E-R models represent these relationships in pictorial form to make them
easier for different stakeholders to understand.
This model is good for designing a database, which can then be turned into tables in
the relational model (explained below).
Let's take an example. If we have to design a School Database, then Student will be
an entity with attributes name, age, address etc. As Address is generally complex,
it can be another entity with attributes street name, pincode, city etc., and there
will be a relationship between them.
Relationships can also be of different types.
Relational Model
In this model, data is organised in two-dimensional tables and the relationship is
maintained by storing a common field.
This model was introduced by E. F. Codd in 1970, and since then it has become the most
widely used database model around the world.
The basic structure of data in the relational model is tables. All the information
related to a particular type is stored in rows of that table.
Hence, tables are also known as relations in relational model.
In the coming tutorials we will learn how to design tables, normalize them to reduce
data redundancy and how to use Structured Query language to access data from
tables.
Entity Relationship Model
The ER Model is used to model the logical view of the system from a data perspective. It
consists of these components:
Attribute(s):
Attributes are the properties which define the entity type. For example, Roll_No,
Name, DOB, Age, Address, Mobile_No are the attributes which define the entity type
Student. In an ER diagram, an attribute is represented by an oval.
1. Key Attribute –
The attribute which uniquely identifies each entity in the entity set is called
the key attribute. For example, Roll_No will be unique for each student. In an ER
diagram, a key attribute is represented by an oval with the attribute name underlined.
2. Composite Attribute –
An attribute composed of many other attributes is called a composite
attribute. For example, the Address attribute of the Student entity type consists of
Street, City, State, and Country. In an ER diagram, a composite attribute is
represented by an oval connected to the ovals of its component attributes.
3. Multivalued Attribute –
An attribute which can hold more than one value for a given entity. For example,
Phone_No (can be more than one for a given student). In an ER diagram, a
multivalued attribute is represented by a double oval.
4. Derived Attribute –
An attribute which can be derived from other attributes of the entity type is
known as a derived attribute, e.g. Age (can be derived from DOB). In an ER
diagram, a derived attribute is represented by a dashed oval.
2. Binary Relationship –
When there are two entity sets participating in a relationship, the relationship
is called a binary relationship. For example, Student is enrolled in Course.
3. n-ary Relationship –
When there are n entity sets participating in a relationship, the relationship is
called an n-ary relationship.
Cardinality:
The number of times an entity of an entity set participates in a relationship set is
known as cardinality. Cardinality can be of different types:
1. One to one – When each entity in each entity set can take part only once in the
relationship, the cardinality is one to one. Let us assume that a man can marry only one
woman and a woman can marry only one man. So the relationship will be one to one.
2. Many to one – When entities in one entity set can take part only once in the
relationship set and entities in the other entity set can take part more than once in
the relationship set, the cardinality is many to one. Let us assume that a student can take
only one course but one course can be taken by many students. So the cardinality will
be n to 1. It means that for one course there can be n students but for one student,
there will be only one course.
Using sets, it can be represented as:
In this case, each student is taking only 1 course but 1 course has been taken by many
students.
3. Many to many – When entities in all entity sets can take part more than once in
the relationship, the cardinality is many to many. Let us assume that a student can take
more than one course and one course can be taken by many students. So the
relationship will be many to many.
Every student in Student Entity set is participating in relationship but there exists a
course C4 which is not taking part in the relationship.
Weak Entity Type and Identifying Relationship:
An entity type usually has a key attribute which uniquely identifies each entity in the entity set. But
there exist some entity types for which a key attribute can't be defined. These are called
weak entity types.
For example, a company may store the information of the dependents (Parents, Children,
Spouse) of an Employee. But the dependents don't have existence without the employee. So
Dependent will be a weak entity type and Employee will be the identifying entity type for
Dependent.
A weak entity type is represented by a double rectangle. The participation of a weak entity
type is always total. The relationship between a weak entity type and its identifying strong
entity type is called the identifying relationship and it is represented by a double diamond.
Extended Entity Relationship Model
As the complexity of data increased in the late 1980s, it became more and more difficult to
use the traditional ER Model for database modelling. Hence some improvements or
enhancements were made to the existing ER Model to make it able to handle the complex
applications better.
As part of the Enhanced ER Model, along with other improvements, three new
concepts were added to the existing ER Model. They were:
1. Generalization
2. Specialization
3. Aggregation
Let's understand what they are, and why they were added to the existing ER Model.
Generalization
Generalization is a bottom-up approach in which two lower level entities combine to form
a higher level entity. In generalization, the higher level entity can also combine with other
lower level entities to make further higher level entity.
It's more like Superclass and Subclass system, but the only difference is the approach,
which is bottom-up. Hence, entities are combined to form a more generalised entity, in
other words, sub-classes are combined to form a super-class.
Specialization
Specialization is the opposite of Generalization. It is a top-down approach in which one higher
level entity can be broken down into two or more lower level entities. In specialization, it is
also possible for a higher level entity to have no lower level entity sets.
Aggregation
Aggregation is a process in which the relationship between two entities is treated as a single entity.
In the relational model, all data is logically structured within relations, i.e., tables, as
mentioned above. Each relation has a name and is formed from named attributes or
columns of data. Each tuple or row holds one value per attribute. The greatest
strength of the relational model is this simple logical structure that it forms. Behind
this simple structure is a sophisticated theoretical foundation that is lacking in the
first generation of DBMSs.
A logical data model describes the data in as much detail as possible, without regard to how
it will be physically implemented in the database. Features of a logical data model include:
The steps for designing the logical data model are as follows:
Comparing the logical data model shown above with the conceptual data model diagram,
we see the main differences between the two:
In a logical data model, primary keys are present, whereas in a conceptual data
model, no primary key is present.
In a logical data model, all attributes are specified within an entity. No attributes are
specified in a conceptual data model.
Relationships between entities are specified using primary keys and foreign keys in
a logical data model. In a conceptual data model, the relationships are simply stated,
not specified, so we simply know that two entities are related, but we do not specify
what attributes are used for this relationship.
Keys
Keys are a very important part of the relational database model. They are used to
establish and identify relationships between tables and also to uniquely identify any
record or row of data inside a table.
A key can be a single attribute or a group of attributes, where the combination
acts as a key.
student_id  name  phone       age
---------------------------------
1           Akon  9876723452  17
2           Akon  9991165674  19
3           Bkon  7898756543  18
4           Ckon  8987867898  19
5           Dkon  9990080080  17
Super Key
A Super Key is defined as a set of attributes within a table that can uniquely identify
each record within the table. A super key is a superset of a candidate key.
In the table defined above, the super keys include student_id, (student_id,
name), phone etc.
Confused? The first one is pretty simple: student_id is unique for every row of
data, hence it can be used to identify each row uniquely.
Next comes (student_id, name). The names of two students can be the same, but
their student_id can't be the same, hence this combination can also be a key.
Similarly, the phone number for every student will be unique, hence again, phone can
also be a key.
So they all are super keys.
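The reasoning above can be tried out on the sample rows. The following is a minimal sketch, assuming the four columns are student_id, name, phone and age (the last name is a guess) and using a hypothetical helper `is_unique_in`. Note that uniqueness in one sample of rows only shows that an attribute set is consistent with being a super key; a key is a rule about every valid state of the table.

```python
# Sample rows from the Student table above. The column names are assumptions
# taken from the surrounding text; "age" in particular is a guess.
STUDENTS = [
    {"student_id": 1, "name": "Akon", "phone": "9876723452", "age": 17},
    {"student_id": 2, "name": "Akon", "phone": "9991165674", "age": 19},
    {"student_id": 3, "name": "Bkon", "phone": "7898756543", "age": 18},
    {"student_id": 4, "name": "Ckon", "phone": "8987867898", "age": 19},
    {"student_id": 5, "name": "Dkon", "phone": "9990080080", "age": 17},
]


def is_unique_in(rows, attrs):
    """Return True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


for attrs in (["student_id"], ["student_id", "name"], ["phone"], ["name"]):
    print(attrs, is_unique_in(STUDENTS, attrs))
```

Only the last check fails, because two rows share the name Akon.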
Candidate Key
A candidate key is defined as a minimal set of fields which can uniquely identify
each record in a table. It is an attribute or a set of attributes that can act as a Primary
Key for a table to uniquely identify each record in that table. There can be more than
one candidate key for a table.
In our example, student_id and phone are both candidate keys for the table Student.
A candidate key can never be NULL or empty, and its value should be unique.
A candidate key can be a combination of more than one column (attribute).
Primary Key
Primary key is a candidate key that is most appropriate to become the main key for
any table. It is a key that can uniquely identify each record in a table.
Composite Key
A key that consists of two or more attributes that together uniquely identify any record in a
table is called a Composite key. The attributes which together form
the Composite key are not keys independently or individually.
Consider a Score table which stores the marks scored by a
student in a particular subject.
In this table student_id and subject_id together will form the primary key, hence it is
a composite key.
Non-key Attributes
Non-key attributes are the attributes or fields of a table other than the candidate
key attributes/fields.
Non-prime Attributes
Non-prime attributes are attributes which are not part of any candidate key.
Integrity Rules
Integrity Constraints
o Integrity constraints are a set of rules used to maintain the quality of
information.
o Integrity constraints ensure that data insertion, updating, and other processes
are performed in such a way that data integrity is not affected.
o Thus, integrity constraints are used to guard against accidental damage to the
database.
Types of Integrity Constraint
1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an
attribute.
o The data type of domain includes string, character, integer, time, date, currency, etc.
The value of the attribute must be available in the corresponding domain.
Example:
2. Entity integrity constraints
o The entity integrity constraint states that primary key value can't be null.
o This is because the primary key value is used to identify individual rows in relation
and if the primary key has a null value, then we can't identify those rows.
o A table can contain a null value other than the primary key field.
Example:
Example:
4. Key constraints
o A key is a set of attributes that is used to identify an entity within its entity set uniquely.
o An entity set can have multiple keys, out of which one key will be the primary
key. A primary key must contain unique values and cannot contain a null value in the relational table.
Example:
Relational algebra uses operators to indicate operations on relations. An operation
that is applied to a single relation is called unary, and one that is applied to two
relations is called binary. The result of applying an operation to a relation is
itself a new relation. Some expressions involve multiple steps, and the
intermediate results of those steps are relations as well. We will understand this
better when we see the different operations below.
Relational Algebra in DBMS has 6 fundamental operations. There are several other
operations defined upon these fundamental operations.
Select (σ)
Select (σ) - This is a unary relational operation. This operation pulls the horizontal subset
(subset of rows) of the relation that satisfies the conditions. This can use operators like <, >,
<=, >=, = and != to filter the data from the relation. It can also use logical AND, OR and NOT
operators to combine the various filtering conditions. This operation can be represented as
below:
σ p (r)
Where σ is the symbol for select operation, r represents the relation/table, and p is the
logical formula or the filtering conditions to get the subset. Let us see an example as below:
σSTD_NAME = “James” (STUDENT)
What does the above relational algebra expression do? It selects the record/tuple from the STUDENT table
with the student name ‘James’.
σdept_id = 20 AND salary>=10000 (EMPLOYEE) - Selects the records from the EMPLOYEE table with
department ID = 20 and a salary of at least 10000.
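The select operation can be mimicked in a few lines of Python. This is a minimal sketch, assuming a relation is held as a list of dictionaries; the function name `select` and the EMPLOYEE rows are invented for illustration and do not come from the notes.

```python
def select(relation, predicate):
    """sigma_p(r): keep the rows of `relation` for which `predicate` is true."""
    return [row for row in relation if predicate(row)]


# Hypothetical EMPLOYEE rows, invented for illustration.
EMPLOYEE = [
    {"emp_id": 100, "name": "James", "dept_id": 20, "salary": 12000},
    {"emp_id": 101, "name": "Kathy", "dept_id": 20, "salary": 9000},
    {"emp_id": 102, "name": "Ross", "dept_id": 10, "salary": 15000},
]

# sigma dept_id = 20 AND salary >= 10000 (EMPLOYEE)
result = select(EMPLOYEE, lambda e: e["dept_id"] == 20 and e["salary"] >= 10000)
print(result)
```

Every column of a matching row is kept; only the set of rows shrinks, which is why select is called a horizontal subset.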
Project (∏)
Project (∏) - This is a unary operator and is similar to the select operation above. It also creates
a subset of the relation, but instead of filtering rows it keeps only the selected
columns/attributes of the relation, i.e. a vertical subset of the relation. The select operation
above creates a subset of the relation but with all the attributes of the relation. It is denoted as
below:
∏ a1, a2, a3 (r)
Where ∏ is the operator for projection, r is the relation and a1, a2, a3 are the attributes of
the relation which will be shown in the resultant subset.
∏std_name, address, course (STUDENT) - This will select all the records from STUDENT table but
only selected columns – std_name, address and course. Suppose we have to select only
these 3 columns for particular student then we have to combine both project and select
operations.
∏STD_ID, address, course (σ STD_NAME = “James”(STUDENT)) - this selects the record for ‘James’ and
displays only the STD_ID, address and course columns. Here two unary operators
are combined, so two operations are performed. First it selects the tuple from the
STUDENT table for ‘James’. The resultant subset of STUDENT is an
intermediary relation; it is temporary and exists only till the end of this operation. It then
keeps only the 3 columns from this temporary relation.
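The two-step evaluation described above can be sketched as follows, again assuming relations are lists of dictionaries. The helpers `select` and `project` and the STUDENT rows are hypothetical; `project` drops duplicate rows because a relation is a set of tuples.

```python
def select(relation, predicate):
    """sigma_p(r): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    """pi_attrs(r): keep only the named columns and drop duplicate rows."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical STUDENT rows, invented for illustration.
STUDENT = [
    {"STD_ID": 1, "STD_NAME": "James", "address": "Troy", "course": "DBMS"},
    {"STD_ID": 2, "STD_NAME": "Antony", "address": "Novi", "course": "DBMS"},
]

# Step 1: the temporary relation holding only the tuple for James.
temp = select(STUDENT, lambda s: s["STD_NAME"] == "James")
# Step 2: keep three columns of that temporary relation.
answer = project(temp, ["STD_ID", "address", "course"])
print(answer)
```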
Rename (ρ)
Rename (ρ) - This is a unary operator used to rename the tables and columns of a relation.
When we perform a self join operation, we have to differentiate between two copies of the same table. In such a case
the rename operator on tables comes into the picture. When we join two or more tables and
those tables have the same column names, it is always better to rename the columns to
differentiate them. This occurs when we perform the Cartesian product operation.
ρ R (E)
Where ρ is the rename operator, E is the existing relation name, and R is the new relation
name.
ρ STUDENT (STD_TABLE) – Renames STD_TABLE table to STUDENT
Let us see another example to rename the columns of the table. If the STUDENT table has
ID, NAME and ADDRESS columns and if they have to be renamed to STD_ID, STD_NAME,
STD_ADDRESS, then we have to write as follows.
ρ STD_ID, STD_NAME, STD_ADDRESS (STUDENT) – It will rename the columns in the order the names
appear in the table
Cartesian product (X)
Cartesian product (X) - This is a binary operator. It combines the tuples of two relations
into one relation.
R X S
Where R and S are two relations and X is the operator. If relation R has m tuples and
relation S has n tuples, then the resultant relation will have mn tuples. For example, if we
perform the cartesian product on the EMPLOYEE (5 tuples) and DEPT (3 tuples) relations, then we
will have a new relation with 15 tuples.
EMPLOYEE X DEPT
This operator simply pairs every tuple of one table with every tuple of the other, i.e. each employee
in the EMPLOYEE table will be paired with each department in the DEPT table. Below
diagram depicts the result of cartesian product.
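The m × n count can be checked with a small sketch. It assumes relations are lists of dictionaries and prefixes each column with its relation name so that the shared DEPT_ID column does not collide; the function name and the generated rows are hypothetical.

```python
from itertools import product


def cartesian_product(r, s, r_name, s_name):
    """Pair every row of r with every row of s, qualifying the column names."""
    rows = []
    for a, b in product(r, s):
        row = {f"{r_name}.{k}": v for k, v in a.items()}
        row.update({f"{s_name}.{k}": v for k, v in b.items()})
        rows.append(row)
    return rows


# Hypothetical data: 5 employees and 3 departments.
EMPLOYEE = [{"EMP_ID": 100 + i, "DEPT_ID": 10 * (i % 3 + 1)} for i in range(5)]
DEPT = [{"DEPT_ID": d} for d in (10, 20, 30)]

pairs = cartesian_product(EMPLOYEE, DEPT, "EMPLOYEE", "DEPT")
print(len(pairs))  # 5 * 3 = 15
```

Most of the 15 pairs combine an employee with a department they do not belong to, which is why the product is normally followed by a selection.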
Union (U)
Union (U) - It is a binary operator, which combines the tuples of two relations. It is
denoted by
R U S
DESIGN_EMPLOYEE U TESTING_EMPLOYEE
Cartesian product combines the attributes of two relations into one relation
whereas Union combines the tuples of two relations into one relation.
In Union, both relations should have same number of columns. Suppose we have to
list the employees who are working for design and testing department. Then we
will do the union on employee table. Since it is union on same table it has same
number of attributes. Cartesian product does not concentrate on number of
attribute or rows. It blindly combines the attributes.
In Union, both relations should have same types of attributes in same order. In the
above example, since union is on employee relation, it has same type of attribute
in the same order.
The two relations need not have the same number of tuples. If there are duplicate tuples as
a result of the union, then only one copy is kept. If a tuple is present in either relation, then
it is kept in the new relation. In the above example, the number of employees in the
design department need not be the same as the number of employees in the testing department. Below diagram
shows the same. We can observe that it combines the table data in the order they appear in
the table.
We would not be able to take the union of these tables if the order of columns or the number of
columns were different.
Set-difference (-)
Set-difference (-) - This is a binary operator. This operator creates a new relation with the
tuples that are in one relation but not in the other relation. It is denoted by the ‘-’ symbol.
R – S
Suppose we want to retrieve the employees who are working in the Design department but not
in Testing.
Set Intersection
Set Intersection - This operation is a binary operation. It results in a relation with tuples
that are in both the relations. It is denoted by ‘∩ ‘.
R∩S
Where R and S are the relations. It picks all the tuples that are present in both R and S, and
results it in a new relation.
Suppose we have to find the employees who are working in both the design and testing
departments. If we have tuples as in the above example, the new result relation will not have
any tuples. Suppose we have tuples like below and see the new relation after set intersection.
This set intersection can also be written as a combination of set difference operations.
R ∩ S = R - (R - S)
i.e., it evaluates R-S to get the tuples which are present only in R, and then it gets the tuples
which are present in R but not in the resultant relation of R-S.
It first filters those employees who are only design employees – (104, Kathy). This
result is then subtracted from the design employees, which finds those
employees who are design employees but not in the new result – (100, James). Thus it gives the
tuple which is both designer and tester. We can see here that a fundamental relational
operator is used twice to get the set intersection. Hence this operation is not a fundamental
operation.
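Python's built-in sets behave like union-compatible relations, so the identity R ∩ S = R - (R - S) can be checked directly. In this sketch the tuples (100, James) and (104, Kathy) come from the text above, while (102, Mary) is a hypothetical tester-only row added for illustration.

```python
DESIGN_EMPLOYEE = {(100, "James"), (104, "Kathy")}
TESTING_EMPLOYEE = {(100, "James"), (102, "Mary")}  # (102, "Mary") is hypothetical

# Union keeps a single copy of the duplicate tuple (100, "James").
union = DESIGN_EMPLOYEE | TESTING_EMPLOYEE

# R - S: employees who are only in design.
only_design = DESIGN_EMPLOYEE - TESTING_EMPLOYEE

# R - (R - S): design employees who are not design-only, i.e. the intersection.
both = DESIGN_EMPLOYEE - only_design

print(sorted(union))
print(only_design)
print(both)
```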
Assignment
Assignment - As the name indicates, the assignment operator ‘←’ is used to assign the
result of a relational operation to a temporary relational variable. This is useful when there are
multiple steps in a relational operation and handling everything in one single expression is
difficult. Assigning the results to a temporary relation and using this temporary relation in
the next operation makes the task simple and easy.
T ← σ p (E)
Our example above in projection for getting STD_ID, ADDRESS and COURSE for the student
‘James’ can be re-written as below.
T ← σ STD_NAME = “James”(STUDENT)
∏ STD_ID, address, course (T)
Natural Join
Natural join - As we have seen above, the cartesian product simply combines the attributes of
two relations into one. But the new relation will not contain only the correct tuples; it contains
all combinations of tuples. In order to get the correct tuples, we have to use a selection
operation on the cartesian product result. This set of operations – cartesian product
followed by selection – is combined into one operation called natural join. It is denoted by ⋈
R ⋈ S
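A natural join can be sketched as a cartesian product restricted to the pairs that agree on every shared column. The code below is an illustrative sketch with hypothetical names and rows; it assumes both relations are non-empty lists of dictionaries with uniform columns.

```python
def natural_join(r, s):
    """Join r and s on all columns that have the same name in both."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [
        {**a, **b}                      # shared columns appear only once
        for a in r
        for b in s
        if all(a[c] == b[c] for c in common)
    ]


# Hypothetical rows, invented for illustration.
EMPLOYEE = [
    {"EMP_ID": 100, "EMP_NAME": "James", "DEPT_ID": 10},
    {"EMP_ID": 104, "EMP_NAME": "Kathy", "DEPT_ID": 20},
]
DEPT = [
    {"DEPT_ID": 10, "DEPT_NAME": "Design"},
    {"DEPT_ID": 20, "DEPT_NAME": "Testing"},
    {"DEPT_ID": 30, "DEPT_NAME": "Support"},
]

joined = natural_join(EMPLOYEE, DEPT)
in_dept_10 = [row for row in joined if row["DEPT_ID"] == 10]
print(in_dept_10)
```

Department 30 has no employee, so it does not appear in the result at all.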
Suppose we want to select the employees who are working for department 10. Then we
will perform the cartesian product on the EMPLOYEES and DEPT and find the DEPT_ID in
both relations matching to 10. The same is done with natural join as
Left outer join
Left outer join - In this operation, all the tuples of the left hand side relation are retained. For the
tuples that have a match in the right hand relation, the matching attributes are displayed with their values; for the ones which
do not have a match, those attributes are shown as NULL.
The example below of a left outer join on the DEPT and EMPLOYEE tables combines the matching
combination of DEPT_ID = 10 with values. But DEPT_ID = 30 does not have any employees
yet. Hence it displays NULL for the employee attributes. Thus an outer join is a more
meaningful way of combining two relations than a cartesian product.
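The behaviour described for DEPT_ID = 10 and DEPT_ID = 30 can be reproduced with a short sketch. The function and the rows are hypothetical, relations are non-empty lists of dictionaries, and Python's `None` stands in for NULL.

```python
def left_outer_join(left, right):
    """Keep every row of `left`; pad with None where `right` has no match."""
    left_attrs = list(left[0]) if left else []
    right_attrs = list(right[0]) if right else []
    common = [a for a in left_attrs if a in right_attrs]
    result = []
    for l_row in left:
        matches = [r for r in right if all(l_row[c] == r[c] for c in common)]
        if matches:
            result.extend({**l_row, **r_row} for r_row in matches)
        else:
            padding = {a: None for a in right_attrs if a not in common}
            result.append({**l_row, **padding})
    return result


# Hypothetical rows: department 30 has no employees yet.
DEPT = [
    {"DEPT_ID": 10, "DEPT_NAME": "Design"},
    {"DEPT_ID": 30, "DEPT_NAME": "Testing"},
]
EMPLOYEE = [{"EMP_ID": 100, "EMP_NAME": "James", "DEPT_ID": 10}]

for row in left_outer_join(DEPT, EMPLOYEE):
    print(row)
```

Swapping the two arguments gives the right outer join of the original pair.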
Right outer join
Right outer join - This is the opposite of the left outer join. Here all the tuples of the right hand
side relation are retained, and the matching attributes of the left hand relation are found and displayed. If no
match is found then NULL is displayed. The same example as above is re-written to illustrate
this as below:
Full outer join
Full outer join - This is the combination of both left and right outer joins. It displays all the
tuples from both relations. If a matching tuple exists in the other relation, then its attributes
are displayed; otherwise those attributes are shown as NULL.
Division
Division - This operation is used to answer queries that contain the phrase ‘for all’. It is denoted by ‘÷’.
Suppose we want to see all the employees who work in all of the departments. What are the
steps involved to find this?
In the third step we find the employees in T2 who are paired with every department ID in T1. This is
obtained by using the division operation – T2 ÷ T1
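The division step can be sketched as follows. This is an illustrative sketch only: T1 is assumed to hold the department IDs and T2 the (employee, department) pairs, the rows are hypothetical, and `divide` is a name chosen here rather than anything from the notes.

```python
def divide(r, s, s_attrs):
    """r ÷ s: values of r's remaining columns that are paired with every row of s."""
    other = [a for a in r[0] if a not in s_attrs]
    r_rows = {tuple(row[a] for a in other + s_attrs) for row in r}
    candidates = {tuple(row[a] for a in other) for row in r}
    s_values = {tuple(row[a] for a in s_attrs) for row in s}
    return [
        dict(zip(other, c))
        for c in sorted(candidates)
        if all(c + v in r_rows for v in s_values)
    ]


# Hypothetical data: employee 100 works in both departments, 101 in only one.
T1 = [{"DEPT_ID": 10}, {"DEPT_ID": 20}]
T2 = [
    {"EMP_ID": 100, "DEPT_ID": 10},
    {"EMP_ID": 100, "DEPT_ID": 20},
    {"EMP_ID": 101, "DEPT_ID": 10},
]

print(divide(T2, T1, ["DEPT_ID"]))
```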
Data Dictionary and System Catalog
Data Dictionary consists of database metadata. It has records about objects in the
database.
Example
<StudentPersonalDetails>
Active data dictionary: The DBMS software manages the active data dictionary automatically, so it is
updated whenever the database structure changes. Most RDBMSs have an active data dictionary. It is also known as an
integrated data dictionary.
Passive data dictionary: It is managed by the users and is modified manually when the database structure changes. It is also
known as a non-integrated data dictionary.
Codd’s Relational Database Rules
Dr Edgar F. Codd, after his extensive research on the Relational Model of database
systems, came up with twelve rules of his own which, according to him, a database
must obey in order to be regarded as a true relational database.
These rules apply to any database system that manages stored data using
only its relational capabilities. This requirement is the foundation rule (Rule Zero), which acts as a base for all
the other rules.
The forms of Normalization, i.e. 1NF, 2NF, 3NF, BCNF, 4NF and 5NF, are designed to remove the
Insert, Update and Delete anomalies.
Insertion Anomaly occurs when some data cannot be inserted into a table unless other,
unrelated data is inserted along with it.
Deletion Anomaly occurs when deleting some data also deletes other data unintentionally, due to the poor design of
the database.
Storing the same data item multiple times is known as Data Redundancy. A normalized table does
not have the issue of redundancy of data.
Data Dependency
Normalization ensures that each piece of data is stored in the table where it belongs.
Isolation of Data
In a well designed database, changes in one table or field do not affect others.
This is achieved through Normalization.
Data Consistency
If a record is missed while updating, it can lead to inconsistent data. Normalization resolves this
and ensures Data Consistency.
ADVANTAGES OF NORMALIZATION
The following are the advantages of the normalization.
• More efficient data structure.
• Avoid redundant fields or columns.
• More flexible data structure i.e. we should be able to add new rows and data values easily
• Better understanding of data.
• Ensures that distinct tables exist when necessary.
• Easier to maintain data structure i.e. it is easy to perform operations and complex queries
can be easily handled.
• Minimizes data duplication.
• Close modeling of real world entities, processes and their relationships.
DISADVANTAGES OF NORMALIZATION
The following are disadvantages of normalization.
• You cannot start building the database before you know what the user needs.
• On normalizing the relations to higher normal forms, i.e. 4NF and 5NF, the performance
degrades.
• Normalizing relations of higher degree is a very time consuming and difficult process.
• Careless decomposition may lead to a bad database design, which may lead to serious
problems.
Functional Dependency
A functional dependency A->B holds in a relation if any two tuples having the same value of
attribute A also have the same value for attribute B. For example, in the relation STUDENT shown
in table 1, the functional dependencies
STUD_NO->STUD_NAME, STUD_NO->STUD_ADDR hold
but
STUD_NAME->STUD_ADDR does not hold
Functional Dependency Set: The functional dependency set or FD set of a relation is the set
of all FDs present in the relation. For example, the FD set for the relation STUDENT shown in table
1 is:
{ STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE,
STUD_NO->STUD_COUNTRY,
STUD_NO->STUD_AGE, STUD_STATE->STUD_COUNTRY }
Attribute Closure:
Attribute closure of an attribute set can be defined as set of attributes which can be
functionally determined from it.
How to find attribute closure of an attribute set?
To find attribute closure of an attribute set:
Add elements of attribute set to the result set.
Recursively add elements to the result set which can be functionally determined
from the elements of the result set.
Using FD set of table 1, attribute closure can be determined as:
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY,
STUD_AGE}
(STUD_STATE)+ = {STUD_STATE, STUD_COUNTRY}
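The two steps above translate almost directly into a loop that keeps adding attributes until nothing changes. The following is a minimal sketch, assuming each FD is written as a (left-hand side, right-hand side) pair of attribute sets; the function name `closure` is a choice made here.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by `attrs`."""
    result = set(attrs)                      # step 1: start with the set itself
    changed = True
    while changed:                           # step 2: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The FD set of the STUDENT relation given above.
FDS = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_PHONE"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_NO"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
]

print(sorted(closure({"STUD_NO"}, FDS)))
print(sorted(closure({"STUD_STATE"}, FDS)))
```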
How to find Candidate Keys and Super Keys using Attribute Closure?
If attribute closure of an attribute set contains all attributes of relation, the attribute
set will be super key of the relation.
If no subset of this attribute set can functionally determine all attributes of the
relation, the set will be candidate key as well. For Example, using FD set of table 1,
(STUD_NO, STUD_NAME)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE}
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY,
STUD_AGE}
(STUD_NO, STUD_NAME) will be a super key but not a candidate key, because the closure of its subset,
(STUD_NO)+, is already equal to all attributes of the relation. So, STUD_NO will be a candidate key.
GATE Question: Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and the set
of functional dependencies {{E, F} -> {G}, {F} -> {I, J}, {E, H} -> {K, L}, K -> {M}, L -> {N}}
on R. What is the key for R? (GATE-CS-2014)
A. {E, F}
B. {E, F, H}
C. {E, F, H, K, L}
D. {E}
Answer: Finding the attribute closure of all given options, we get:
{E,F}+ = {EFGIJ}
{E,F,H}+ = {EFHGIJKLMN}
{E,F,H,K,L}+ = {EFHGIJKLMN}
{E}+ = {E}
{EFH}+ and {EFHKL}+ both result in the set of all attributes, but EFH is minimal. So it will be
the candidate key. So the correct option is (B).
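The answer can be cross-checked by brute force: try attribute sets in order of increasing size and keep those whose closure is the whole relation and which contain no smaller key. This is a sketch for small schemas only (the search is exponential), with single-letter attributes written as strings; the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure covers every attribute."""
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue                     # contains a smaller key: not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


# The GATE-CS-2014 relation; each FD is a (lhs, rhs) pair of attribute strings.
R = "EFGHIJKLMN"
FDS = [("EF", "G"), ("F", "IJ"), ("EH", "KL"), ("K", "M"), ("L", "N")]

print([sorted(key) for key in candidate_keys(R, FDS)])
```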
Example 2 –
ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M c2, c3
In the above table Courses is a multi valued attribute, so the table is not in 1NF.
The table below is in 1NF as there is no multi valued attribute
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
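The conversion to 1NF amounts to emitting one row per course. A minimal sketch, assuming the un-normalized rows above are held as (ID, Name, list of courses) tuples and using a hypothetical helper `to_1nf`:

```python
# Un-normalized rows: the third field holds several course values at once.
UNNORMALIZED = [
    (1, "A", ["c1", "c2"]),
    (2, "E", ["c3"]),
    (3, "M", ["c2", "c3"]),
]


def to_1nf(rows):
    """Emit one (ID, Name, Course) row per course so every field is atomic."""
    return [(sid, name, course) for sid, name, courses in rows for course in courses]


for row in to_1nf(UNNORMALIZED):
    print(row)
```

After the split, ID alone no longer identifies a row; the pair (ID, Course) does.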
To be in second normal form, a relation must be in first normal form and must not
contain any partial dependency. In other words, a relation is in 2NF iff no non-prime
attribute (an attribute which is not part of any candidate
key) is dependent on any proper subset of any candidate key of the table.
Partial Dependency – If a proper subset of a candidate key determines a non-prime attribute, it
is called a partial dependency.
Example 1 – In relation STUDENT_COURSE given in Table 3,
FD set: {COURSE_NO->COURSE_NAME}
Candidate Key: {STUD_NO, COURSE_NO}
In FD COURSE_NO->COURSE_NAME, COURSE_NO (proper subset of candidate key) is
determining COURSE_NAME (non-prime attribute). Hence, it is partial dependency
and relation is not in second normal form.
To convert it to second normal form, we will decompose the relation
STUDENT_COURSE (STUD_NO, COURSE_NO, COURSE_NAME) as :
STUDENT_COURSE (STUD_NO, COURSE_NO)
COURSE (COURSE_NO, COURSE_NAME)
Note – This decomposition will be lossless join decomposition as well as dependency
preserving.
Example 2 – Consider following functional dependencies in relation R (A, B , C, D )
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial dependency,
i.e., any proper subset of AB doesn’t determine any non-prime attribute.
A relation R is in BCNF if R is in Third Normal Form and for every FD, LHS is super key. A
relation is in BCNF iff in every non-trivial functional dependency X –> Y, X is a super key.
Example 1 – Find the highest normal form of a relation R(A,B,C,D,E) with
FD set as {BC->D, AC->BE, B->E}
Step 1. As we can see, (AC)+ = {A, C, B, E, D}, but none of its proper subsets can
determine all attributes of the relation, so AC will be a candidate key. A
and C can't be derived from any other attribute of the relation, so
there will be only 1 candidate key, {AC}.
Step 2. Prime attributes are those attributes which are part of a
candidate key ({A, C} in this example); the others are non-prime
({B, D, E} in this example).
Step 3. The relation R is in 1st normal form, as a relational DBMS
does not allow multi-valued or composite attributes.
The relation is in 2nd normal form because none of its FDs is a partial
dependency: in BC->D, BC is not a proper subset of the candidate key AC;
in AC->BE, AC is the candidate key itself; and in B->E,
B is not a proper subset of the candidate key AC.
The relation is not in 3rd normal form because of BC->D (neither is
BC a super key nor is D a prime attribute) and B->E (neither is B
a super key nor is E a prime attribute); to satisfy 3rd normal
form, either the LHS of each FD should be a super key or the RHS should be a
prime attribute.
So the highest normal form of the relation will be 2nd normal form.
Example 2 – Consider relation R(A, B, C) with
A -> BC,
B -> A
A and B both are super keys, so the above relation is in BCNF.
Exercise 1: Find the highest normal form of R (A, B, C, D, E) under the following functional
dependencies.
ABC -> D
CD -> AE
Important points for solving this type of question:
1) It is always a good idea to start checking from BCNF, then 3NF and so on.
2) If a functional dependency satisfies a normal form, then there is no need to check it for
lower normal forms. For example, ABC -> D is in BCNF (note that ABC is a super key), so there
is no need to check this dependency for lower normal forms.
Candidate keys in the given relation are {ABC, BCD}, so A, B, C and D are prime attributes
and E is the only non-prime attribute.
BCNF: ABC -> D is in BCNF. Let us check CD -> AE: CD is not a super key, so this dependency
is not in BCNF. So, R is not in BCNF.
3NF: We don’t need to check ABC -> D, as it already satisfies BCNF. Let
us consider CD -> AE. CD is not a super key and E is not a prime attribute, so the relation
is not in 3NF.
2NF: In 2NF, we need to check for partial dependencies. CD is a proper subset of the
candidate key BCD and it determines E, which is a non-prime attribute. So, the given relation
is also not in 2NF. So, the highest normal form is 1NF.
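The checks in this exercise can be mechanized for small schemas. The sketch below finds candidate keys by brute force and then tests BCNF, 3NF and 2NF in that order, as suggested above. It assumes single-letter attribute names, and the helper names (`closure`, `candidate_keys`, `highest_normal_form`, `fd`) are hypothetical. Brute force is only practical for a handful of attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute force: minimal attribute sets whose closure is all of R."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

def highest_normal_form(attributes, fds):
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    non_prime = attributes - prime

    def is_super_key(attrs):
        return closure(attrs, fds) == attributes

    # BCNF: the LHS of every non-trivial FD is a super key.
    if all(is_super_key(lhs) for lhs, rhs in fds if not rhs <= lhs):
        return "BCNF"
    # 3NF: LHS is a super key, or every attribute gained on the RHS is prime.
    if all(is_super_key(lhs) or (rhs - lhs) <= prime for lhs, rhs in fds):
        return "3NF"
    # 2NF: no proper subset of a candidate key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) & non_prime:
                    return "1NF"
    return "2NF"

def fd(lhs, rhs):
    """Build one FD from two strings of attribute letters."""
    return (frozenset(lhs), frozenset(rhs))

R = set("ABCDE")
exercise_fds = [fd("ABC", "D"), fd("CD", "AE")]
keys = candidate_keys(R, exercise_fds)
print(["".join(sorted(k)) for k in keys])    # ['ABC', 'BCD']
print(highest_normal_form(R, exercise_fds))  # 1NF
```

Running the same function on the FD set {BC->D, AC->BE, B->E} over R(A,B,C,D,E) returns "2NF", matching the worked example earlier in this section.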
A multivalued dependency arises when two or more independent relations are kept in a single
relation. In other words, a multivalued dependency occurs when the presence of one or more rows
in a table implies the presence of one or more other rows in that same table. Put another way,
two attributes (or columns) in a table are independent of one another, but both depend on a
third attribute.
A multivalued dependency always requires at least three attributes because it consists of
at least two attributes that are dependent on a third.
For a dependency A -> B, if multiple values of B exist for a single value of A, then the table
may have a multivalued dependency. For A ->-> B to be a multivalued dependency, the table
should have at least 3 attributes, and B and C should be independent of each other. For example,
Person ->-> mobile,
Person ->-> food_likes
This is read as “person multidetermines mobile” and “person multidetermines food_likes.”
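A multivalued dependency can be tested on a concrete table. In the sketch below, the table has three columns (x, y, z), and x ->-> y holds when, for each x value, every combination of its y values and z values appears as a row. The function name `mvd_holds` and the sample person, mobile and food values are invented for illustration.

```python
from collections import defaultdict
from itertools import product

def mvd_holds(rows):
    """rows are (x, y, z) triples; test whether x ->-> y (and so x ->-> z) holds."""
    groups = defaultdict(set)
    for x, y, z in rows:
        groups[x].add((y, z))
    for pairs in groups.values():
        ys = {y for y, _ in pairs}
        zs = {z for _, z in pairs}
        # Independence of y and z: every (y, z) combination must be present.
        if pairs != set(product(ys, zs)):
            return False
    return True

# Hypothetical (person, mobile, food_likes) rows.
person_rows = {
    ("Mahesh", "9893", "Burger"),
    ("Mahesh", "9424", "Burger"),
    ("Mahesh", "9893", "Pizza"),
    ("Mahesh", "9424", "Pizza"),
}
print(mvd_holds(person_rows))  # True

# Dropping one combination breaks the dependency.
incomplete = person_rows - {("Mahesh", "9424", "Pizza")}
print(mvd_holds(incomplete))   # False
```

The first table needs four rows to record two mobiles and two food preferences, which is the redundancy that motivates splitting such data into separate tables.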
Fourth normal form (4NF) is a level of database normalization where there are no
non-trivial multivalued dependencies other than a candidate key. It builds on the first three
normal forms (1NF, 2NF and 3NF) and the Boyce-Codd Normal Form (BCNF). It states that,
in addition to a database meeting the requirements of BCNF, it must not contain more than
one multivalued dependency.
Properties – A relation R is in 4NF if and only if the following conditions are satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any non-trivial multivalued dependency other than on a candidate key.
A table with a multivalued dependency violates the normalization standard of Fourth
Normal Form (4NF) because it creates unnecessary redundancies and can contribute to
inconsistent data. To bring this up to 4NF, it is necessary to break this information into two
tables.
Example – Consider the database of a class which has two relations: R1 contains
student ID (SID) and student name (SNAME), and R2 contains course ID (CID) and course
name (CNAME).
Table – R1
SID SNAME
S1 A
S2 B
Table – R2
CID CNAME
C1 C
C2 D
Table – R1 X R2
SID SNAME CID CNAME
S1 A C1 C
S1 A C2 D
S2 B C1 C
S2 B C2 D
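The cross product above can be reproduced in a few lines of Python. This is only a sketch using the rows of R1 and R2 from the example; the variable names are illustrative.

```python
from itertools import product

r1 = [("S1", "A"), ("S2", "B")]  # (SID, SNAME)
r2 = [("C1", "C"), ("C2", "D")]  # (CID, CNAME)

# Cross product: every student row paired with every course row.
cross = [student + course for student, course in product(r1, r2)]
for row in cross:
    print(*row)

# The name of student S1 is now stored once per course.
s1_copies = sum(1 for sid, sname, cid, cname in cross if sid == "S1")
print(s1_copies)  # 2

# Projecting back onto the original columns recovers R1 and R2.
back_r1 = sorted({(sid, sname) for sid, sname, cid, cname in cross})
back_r2 = sorted({(cid, cname) for sid, sname, cid, cname in cross})
print(back_r1 == r1, back_r2 == r2)  # True True
```

Because the student columns and the course columns vary independently, the combined table repeats each fact, and splitting it into the two projections removes that repetition without losing information.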
Example –
Table – R1
COMPANY PRODUCT
C1 pendrive
C1 mic
C2 speaker
C1 speaker
Company->->Product
Table – R2
AGENT COMPANY
Aman C1
Aman C2
Mohan C1
Agent->->Company
Table – R3
AGENT PRODUCT
Aman pendrive
Aman mic
Aman speaker
Mohan speaker
Agent->->Product
Table – R1⋈R2⋈R3
COMPANY PRODUCT AGENT
C1 pendrive Aman
C1 mic Aman
C2 speaker Aman
C1 speaker Aman
C1 speaker Mohan
A relation R is in 5NF if and only if every join dependency in R is implied by the candidate
keys of R. A relation decomposed into two relations must have the lossless join property,
which ensures that no spurious or extra tuples are generated when the relations are reunited
through a natural join.
Properties – A relation R is in 5NF if and only if it satisfies the following conditions:
1. R should already be in 4NF.
2. It cannot be losslessly decomposed any further (i.e., it has no join dependency beyond
those implied by its candidate keys).
Example – Consider the above schema with the constraint “if a company makes a product and
an agent is an agent for that company, then he always sells that product for the company”.
Under these circumstances, the ACP table is shown as:
Table – ACP
AGENT COMPANY PRODUCT
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut
The relation ACP is again decomposed into the 3 relations shown below, and the natural join
of all three relations is then taken:
Table – R1
AGENT COMPANY
A1 PQR
A1 XYZ
A2 PQR
Table – R2
AGENT PRODUCT
A1 Nut
A1 Bolt
A2 Nut
Table – R3
COMPANY PRODUCT
PQR Nut
PQR Bolt
XYZ Nut
XYZ Bolt
The result of the natural join of R1 and R3 over ‘Company’, followed by the natural join of
R13 and R2 over ‘Agent’ and ‘Product’, is the table ACP.
Hence, in this example, all the redundancies are eliminated, and the decomposition of ACP
is a lossless join decomposition. Therefore, the decomposed relations R1, R2 and R3 are in
5NF, as their join does not violate the property of lossless join.
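The join described above can be verified directly. The sketch below builds the three projections of ACP, joins R1 with R3 over the company, and then filters the result with R2. The table contents are taken from the example; the variable names are illustrative.

```python
# (AGENT, COMPANY, PRODUCT) rows of ACP from the example.
acp = {
    ("A1", "PQR", "Nut"),
    ("A1", "PQR", "Bolt"),
    ("A1", "XYZ", "Nut"),
    ("A1", "XYZ", "Bolt"),
    ("A2", "PQR", "Nut"),
}

r1 = {(a, c) for a, c, p in acp}  # (AGENT, COMPANY)
r2 = {(a, p) for a, c, p in acp}  # (AGENT, PRODUCT)
r3 = {(c, p) for a, c, p in acp}  # (COMPANY, PRODUCT)

# Step 1: natural join of R1 and R3 over COMPANY.
r13 = {(a, c, p) for a, c in r1 for c3, p in r3 if c == c3}

# Step 2: natural join of R13 and R2 over AGENT and PRODUCT.
rejoined = {(a, c, p) for a, c, p in r13 if (a, p) in r2}

print(sorted(r13 - acp))  # [('A2', 'PQR', 'Bolt')]
print(rejoined == acp)    # True
```

Joining only two of the projections produces a spurious tuple, and the third projection removes it. That is why ACP has to be split into three relations here and not two.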