NORMALIZATION

Example Relation (Table)

Consider a sample relation SUPP:

    S#   City    Status  P#   Qty
    S1   London  20      P1   300
    S1   London  20      P2   200
    S1   London  20      P3   400
    S1   London  20      P4   200
    S1   London  20      P5   100
    S1   London  20      P6   100
    S2   Paris   10      P1   300
    S2   Paris   10      P2   400
    S3   Paris   10      P2   200
    S4   London  20      P2   200
    S4   London  20      P4   300
    S4   London  20      P5   400

S# = Supplier No, City = Supplier City, Status = City Status, P# = Part No, Qty = Quantity

Update Anomalies
• INSERT: We cannot insert a record for a new supplier unless the supplier supplies at least one part.*
• DELETE: If we delete the only tuple for a supplier, we destroy not only the shipment but also the information that the supplier is located in a particular city. Example: supplier S3.
• UPDATE: If supplier S1 moves from London to New York, every S1 record needs to be updated. The city is stored redundantly.
* P# is part of the primary key, so it cannot be left empty.

Definition
• Let R be a relation, and let X and Y be arbitrary subsets of the set of attributes of R. We say that Y is functionally dependent on X, in symbols
      X → Y
  (read "X functionally determines Y"), if and only if each X value in R has associated with it precisely one Y value in R.
• In other words: whenever two tuples of R agree on their X value, they also agree on their Y value.


Example (contd.)
• One FD in SUPP: {S#} → {City}, because every tuple of that relation with a given S# value also has the same City value.
• The left-hand and right-hand sides of an FD are sometimes called the determinant and the dependent, respectively.

Exercise
Check whether the SUPP relation satisfies each of the following FDs:
• {S#, P#} → {Qty}
• {S#, P#} → {City}
• {S#, P#} → {City, Qty}
• {S#, P#} → {S#}
• {S#, P#} → {S#, P#, Qty, City}
• {Qty} → {S#}

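The definition above can be checked mechanically on a given instance. The following is a minimal sketch, assuming the SUPP rows from the example slide; the helper name fd_holds is hypothetical. Note that it only tests whether one instance satisfies an FD, which does not prove that the FD holds on every legal instance of the schema.

```python
# SUPP instance from the slides, as (S#, City, Status, P#, Qty) rows.
COLUMNS = ("S#", "City", "Status", "P#", "Qty")
SUPP = [
    ("S1", "London", 20, "P1", 300), ("S1", "London", 20, "P2", 200),
    ("S1", "London", 20, "P3", 400), ("S1", "London", 20, "P4", 200),
    ("S1", "London", 20, "P5", 100), ("S1", "London", 20, "P6", 100),
    ("S2", "Paris", 10, "P1", 300), ("S2", "Paris", 10, "P2", 400),
    ("S3", "Paris", 10, "P2", 200), ("S4", "London", 20, "P2", 200),
    ("S4", "London", 20, "P4", 300), ("S4", "London", 20, "P5", 400),
]


def fd_holds(rows, columns, lhs, rhs):
    """Return True if all rows that agree on lhs also agree on rhs."""
    index = {name: i for i, name in enumerate(columns)}
    seen = {}  # maps each lhs value to the first rhs value seen with it
    for row in rows:
        x = tuple(row[index[a]] for a in lhs)
        y = tuple(row[index[a]] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value, different Y value
    return True


exercise = [
    (["S#", "P#"], ["Qty"]),
    (["S#", "P#"], ["City"]),
    (["S#", "P#"], ["City", "Qty"]),
    (["S#", "P#"], ["S#"]),
    (["S#", "P#"], ["S#", "P#", "Qty", "City"]),
    (["Qty"], ["S#"]),
]
for lhs, rhs in exercise:
    print(lhs, "->", rhs, ":", fd_holds(SUPP, COLUMNS, lhs, rhs))
```

A dictionary keyed on the left-hand-side values keeps the check to a single pass over the rows.
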
Functional Dependencies (Cont.)
• Let R be a relation schema, with α ⊆ R and β ⊆ R.
• The functional dependency
      α → β
  holds on R if and only if, for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is,
      t1[α] = t2[α]  ⇒  t1[β] = t2[β]
• Example: consider r(A, B) with the following instance of r:
      A  B
      1  5
      3  7
      1  4
  On this instance, A → B does NOT hold, but B → A does hold.

Functional Dependencies (Cont.)
• K is a superkey for relation schema R if and only if K → R.
• K is a candidate key for R if and only if
  - K → R, and
  - for no α ⊂ K, α → R.
• Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema:
      inst_dept (ID, name, salary, dept_name, building, budget)
  We expect these functional dependencies to hold:
      dept_name → building
      ID → building
  but would not expect the following to hold:
      dept_name → salary

Use of Functional Dependencies
• We use functional dependencies to:
  - test relations to see if they are legal under a given set of functional dependencies. If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.
  - specify constraints on the set of legal relations. We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F.
• Note: a specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances.
  - For example, a specific instance of instructor may, by chance, satisfy name → ID.

Functional Dependencies (Cont.)
• A functional dependency is trivial if it is satisfied by all instances of a relation.
• Examples:
  - ID, name → ID
  - name → name
• In general, α → β is trivial if β ⊆ α.

TRIVIAL & NON-TRIVIAL DEPENDENCIES
• One way to reduce the size of the set of FDs we need to deal with is to eliminate the trivial dependencies.
• An FD is trivial if and only if the right-hand side is a subset of the left-hand side, e.g. {S#, P#} → {S#}.
• Nontrivial dependencies are those that are not trivial.

What About Smaller Schemas?
• Suppose we had started with inst_dept. How would we know to split it up (decompose it) into instructor and department?
• Write a rule: "if there were a schema (dept_name, building, budget), then dept_name would be a candidate key".
• Denote it as a functional dependency:
      dept_name → building, budget
• In inst_dept, because dept_name is not a candidate key, the building and budget of a department may have to be repeated.
  - This indicates the need to decompose inst_dept.

Lossy Decomposition
• Not all decompositions are good. Suppose we decompose
      employee (ID, name, street, city, salary)
  into
      employee1 (ID, name)
      employee2 (name, street, city, salary)
• The next slide shows how we lose information: we cannot reconstruct the original employee relation, and so this is a lossy decomposition.

A Lossy Decomposition
Two tuples have become four! We didn't lose tuples, but we lost information.

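The effect described above can be reproduced in a few lines. This is a minimal sketch with made-up employee rows (two people who happen to share the name "Kim"); the data and the helper names project and natural_join are assumptions for illustration only.

```python
# Hypothetical employee instance: two different people with the same name.
employee = [
    {"ID": 1, "name": "Kim", "street": "Main", "city": "Paris", "salary": 75000},
    {"ID": 2, "name": "Kim", "street": "North", "city": "London", "salary": 67000},
]


def project(rows, attrs):
    """Projection with duplicate elimination; each tuple is a frozenset of items."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


employee1 = project(employee, ["ID", "name"])
employee2 = project(employee, ["name", "street", "city", "salary"])
joined = natural_join(employee1, employee2)
original = {frozenset(row.items()) for row in employee}

print(len(original), "tuples before,", len(joined), "tuples after the join")
```

The join keeps every original tuple but also pairs each ID with the other person's address, so the original relation can no longer be told apart from the spurious combinations.
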
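Decomposition

(A) Original relation
    S#  status  CITY
    S1  30      Paris
    S2  30      Athens

(B) Decomposition into (S#, status) and (S#, CITY)
    S#  status        S#  CITY
    S1  30            S1  Paris
    S2  30            S2  Athens

(C) Decomposition into (S#, status) and (status, CITY)
    S#  status        status  CITY
    S1  30            30      Paris
    S2  30            30      Athens

Designed by Deepak Moud
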
Example of Lossless-Join Decomposition
• Lossless-join decomposition
• Decomposition of R = (A, B, C) into R1 = (A, B) and R2 = (B, C)

      r               Π_A,B(r)        Π_B,C(r)
      A  B  C         A  B            B  C
      α  1  A         α  1            1  A
      β  2  B         β  2            2  B

      Π_A,B(r) ⋈ Π_B,C(r)
      A  B  C
      α  1  A
      β  2  B

• In general, a decomposition is lossless provided certain functional dependencies hold; more on this later.

Dependencies: Definitions
• Partial dependency: when a non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key.

CUSTOMER (partial dependency)
    Cust_ID  Name   Order_ID
    101      AT&T   1234
    101      AT&T   156
    125      Cisco  1250

Dependencies: Definitions
• Transitive dependency: when a non-key attribute determines another non-key attribute.

EMPLOYEE (transitive dependency)
    Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
    111     Mary    Jones   1        Acct
    122     Sarah   Smith   2        Mktg

First Normal Form
• A domain is atomic if its elements are considered to be indivisible units.
  - Examples of non-atomic domains:
    - Set of names, composite attributes
    - Identification numbers like CS101 that can be broken up into parts
• A relational schema R is in first normal form if the domains of all attributes of R are atomic.
• Non-atomic values complicate storage and encourage redundant (repeated) storage of data.
  - Example: a set of accounts stored with each customer, and a set of owners stored with each account
  - We assume all relations are in first normal form (and revisit this in Chapter 22: Object Based Databases)

First Normal Form (Cont'd)
• Atomicity is actually a property of how the elements of the domain are used.
  - Example: strings would normally be considered indivisible.
  - Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127.
  - If the first two characters are extracted to find the department, the domain of roll numbers is not atomic.
  - Doing so is a bad idea: it leads to encoding of information in the application program rather than in the database.

Dependencies: Definitions
• Multivalued attributes (or repeating groups): non-key attributes, or groups of non-key attributes, whose values are not uniquely identified (directly or indirectly) by the value of the primary key or a part of it. That is, they are not functionally dependent on it.
• If every non-key attribute, or group of non-key attributes, depends directly or indirectly on the primary key or on a subset of the primary key, then the table is in first normal form. (By Deepak Moud Sir)

STUDENT
    Stud_ID  Name     Course_ID  Units
    101      Lennon   MSI 250    3.00
    101      Lennon   MSI 415    3.00
    125      Johnson  MSI 331    3.00

Example 1: Determine NF
• ISBN → Title
• ISBN → Publisher
• Publisher → Address

All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1NF.

BOOK
    ISBN  Title  Publisher  Address

Example 2: Determine NF
• Product_ID → Description

All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1NF.

ORDER
    Order_No  Product_ID  Description

Example 3: Determine NF
• Part_ID → Description
• Part_ID → Price
• Part_ID, Comp_ID → No

Comp_ID and No are not determined by the primary key; therefore, the relation is NOT in 1NF. There is no sense in looking at partial or transitive dependencies.

PART
    Part_ID  Descr  Price  Comp_ID  No

Example 3: Determine NF
• Part_ID → Description
• Part_ID → Price
• Part_ID, Comp_ID → No

In your solution you will write the following justification:
1) There are M/V attributes; therefore, not 1NF.
Conclusion: The relation is not normalized.

PART
    Part_ID  Descr  Price  Comp_ID  No

Bringing a Relation to 1NF

STUDENT
    Stud_ID  Name     Course_ID  Units
    101      Lennon   MSI 250    3.00
    101      Lennon   MSI 415    3.00
    125      Johnson  MSI 331    3.00

Bringing a Relation to 1NF
• Option 1: Make a determinant of the repeating group (or the multivalued attribute) a part of the primary key.

STUDENT (composite primary key)
    Stud_ID  Name     Course_ID  Units
    101      Lennon   MSI 250    3.00
    101      Lennon   MSI 415    3.00
    125      Johnson  MSI 331    3.00

Bringing a Relation to 1NF
• Option 2: Remove the entire repeating group from the relation. Create another relation that contains all the attributes of the repeating group, plus the primary key from the first relation. In this new relation, the primary key from the original relation and the determinant of the repeating group together form the primary key.

STUDENT
    Stud_ID  Name     Course_ID  Units
    101      Lennon   MSI 250    3.00
    101      Lennon   MSI 415    3.00
    125      Johnson  MSI 331    3.00

Bringing a Relation to 1NF

STUDENT
    Stud_ID  Name
    101      Lennon
    125      Johnson

STUDENT_COURSE
    Stud_ID  Course   Units
    101      MSI 250  3
    101      MSI 415  3
    125      MSI 331  3



First Normal Form (1NF)
• A relation is said to be in 1NF if and only if it contains no repeating groups.
• 1NF dictates that, for every row-by-column position in a given table, there exists only one value, not an array or list of values.
• Example: our sample table (SUPP) is in 1NF.
  - Primary key: (S#, P#). A primary key uniquely identifies a tuple (record).
• 1NF is not sufficient. Why? Update anomalies!


1NF Update Anomalies
• INSERT: We cannot insert a record for a new supplier unless the supplier supplies at least one part.*
• DELETE: If we delete the only tuple for a supplier, we destroy not only the shipment but also the information that the supplier is located in a particular city. Example: supplier S3.
• UPDATE: If supplier S1 moves from London to New York, every S1 record needs to be updated. The city is stored redundantly.
* P# is part of the primary key, so it cannot be left empty.

• These anomalies indicate that 1NF in our case is not ideal. How do we solve this?
• Break the relation into two!
• Proposed solution: 2NF

Example 1: Determine NF
• ISBN → Title
• ISBN → Publisher
• Publisher → Address

The relation is at least in 1NF. There is no COMPOSITE primary key, therefore there cannot be partial dependencies. Therefore, the relation is at least in 2NF.

BOOK
    ISBN  Title  Publisher  Address

Example 2: Determine NF
• Product_ID → Description

The relation is at least in 1NF. There is a COMPOSITE primary key (PK) (Order_No, Product_ID), therefore there can be partial dependencies. Product_ID, which is a part of the PK, determines Description; hence, there is a partial dependency. Therefore, the relation is not in 2NF.

ORDER
    Order_No  Product_ID  Description

Example 2: Determine NF
• Product_ID → Description

We know that the relation is at least in 1NF, and it is not in 2NF. Therefore, we conclude that the relation is in 1NF.

ORDER
    Order_No  Product_ID  Description

Example 1: Determine NF
• ISBN → Title
• ISBN → Publisher
• Publisher → Address

We know that the relation is at least in 2NF, and it is not in 3NF. Therefore, we conclude that the relation is in 2NF.

BOOK
    ISBN  Title  Publisher  Address

Bringing a Relation to 2NF
• Goal: remove partial dependencies.

STUDENT (composite primary key, with partial dependencies)
    Stud_ID  Name     Course_ID  Units
    101      Lennon   MSI 250    3.00
    101      Lennon   MSI 415    3.00
    125      Johnson  MSI 331    3.00

Bringing a Relation to 2NF
• Remove from the original relation the attributes that depend on a part, but not the whole, of the primary key. For each partial dependency, create a new relation whose primary key is the corresponding part of the original primary key.

STUDENT
    Stud_ID  Name     Course_ID  Units
    101      Lennon   MSI 250    3.00
    101      Lennon   MSI 415    3.00
    125      Johnson  MSI 331    3.00

Bringing a Relation to 2NF

STUDENT_COURSE
    Stud_ID  Course_ID
    101      MSI 250
    101      MSI 415
    125      MSI 331

STUDENT
    Stud_ID  Name
    101      Lennon
    125      Johnson

COURSE
    Course_ID  Units
    MSI 250    3.00
    MSI 415    3.00
    MSI 331    3.00


New Relations!: Second Normal Form (2NF)

SUPP1 (primary key: S#)
    S#   City    Status
    S1   London  20
    S2   Paris   10
    S3   Paris   10
    S4   London  20
    S5   Athens  30

Note: we added a NEW supplier S5, located in Athens, easily!

SUPP2 (primary key: S#, P#)
    S#   P#   Qty
    S1   P1   300
    S1   P2   200
    S1   P3   400
    S1   P4   200
    S1   P5   100
    S1   P6   100
    S2   P1   300
    S2   P2   400
    S3   P2   200
    S4   P2   200
    S4   P4   300
    S4   P5   400


Second Normal Form (2NF)
• A relation is in 2NF if and only if it is in 1NF and every nonkey attribute is fully dependent on the (entire) primary key.
  - Since SUPP2's only nonkey attribute is dependent on the whole primary key (S#, P#), SUPP2 is in 2NF.
  - Similarly, the nonkey attributes of SUPP1 are dependent on the primary key S#, so SUPP1 is in 2NF.


1NF Update Problems Overcome!
Note that the revised structure overcomes the update problems mentioned earlier!
• INSERT: We can add a new supplier S5 (in SUPP1) even though S5 does not currently supply any parts.
• DELETE: If we delete a shipment (quantity) tuple in SUPP2, we still have the supplier information in SUPP1.
• UPDATE: If supplier S1 moves from London to New York, we need to update only one tuple in SUPP1!
But this new structure is NOT enough.


2NF Update Anomalies
• SUPP2 poses no problems. In SUPP1, however, there is still redundancy.
  - INSERT: We cannot create a new tuple giving a specific Status to a City unless we have a supplier from that city.*
  - DELETE: If we delete the only tuple for a particular city, we destroy supplier information as well as city status information!
  - UPDATE: If we want to change the status for a city, we need to modify all the tuples for that city. The status is stored redundantly.
* S# is the primary key!


But, this new structure is NOT enough.
• Although SUPP1 is in 2NF, it is still not sufficient. City and Status are dependent on Supplier, but they are also related to each other! We need to restructure.
• Proposed solution: 3NF

Example 1: Determine NF
• ISBN → Title
• ISBN → Publisher
• Publisher → Address

Publisher is a non-key attribute, and it determines Address, another non-key attribute. Therefore, there is a transitive dependency, which means that the relation is NOT in 3NF.

BOOK
    ISBN  Title  Publisher  Address

Example 1: Determine NF
• ISBN → Title
• ISBN → Publisher
• Publisher → Address

In your solution you will write the following justification:
1) No M/V attributes, therefore at least 1NF
2) No partial dependencies, therefore at least 2NF
3) There is a transitive dependency (Publisher → Address), therefore not 3NF
Conclusion: The relation is in 2NF.

BOOK
    ISBN  Title  Publisher  Address

Bringing a Relation to 3NF
• Goal: get rid of transitive dependencies.

EMPLOYEE (transitive dependency)
    Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
    111     Mary    Jones   1        Acct
    122     Sarah   Smith   2        Mktg

Bringing a Relation to 3NF
• Remove from the original relation the attributes that are dependent on a non-key attribute. For each transitive dependency, create a new relation with the determining non-key attribute as the primary key and the dependent non-key attribute as a dependent.

EMPLOYEE
    Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
    111     Mary    Jones   1        Acct
    122     Sarah   Smith   2        Mktg

Bringing a Relation to 3NF

EMPLOYEE (before)
    Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
    111     Mary    Jones   1        Acct
    122     Sarah   Smith   2        Mktg

EMPLOYEE (after)
    Emp_ID  F_Name  L_Name  Dept_ID
    111     Mary    Jones   1
    122     Sarah   Smith   2

DEPARTMENT
    Dept_ID  Dept_Name
    1        Acct
    2        Mktg



The New Relations!

SUPP1.1
    S#   City
    S1   London
    S2   Paris
    S3   Paris
    S4   London
    S5   Athens

SUPP1.2
    City    Status
    London  20
    Paris   10
    Athens  30
    Rome    50

We can add a City (Rome) and Status without adding a Supplier.

SUPP2
    S#   P#   Qty
    S1   P1   300
    S1   P2   200
    S1   P3   400
    S1   P4   200
    S1   P5   100
    S1   P6   100
    S2   P1   300
    S2   P2   400
    S3   P2   200
    S4   P2   200
    S4   P4   300
    S4   P5   400


Third Normal Form (3NF)
• A relation is in 3NF if and only if it is in 2NF and all the non-key attributes are mutually independent.
• Now all our relations are in 3NF. From a single relation (SUPP) in 1NF, we have developed three relations (SUPP1.1, SUPP1.2, and SUPP2), all in 3NF.
• If we join these three relations we WILL get SUPP back (lossless decomposition).

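The lossless claim can be checked directly on the original SUPP instance. This is a minimal sketch that projects SUPP onto the three schemas and joins the projections back; the helper names project and natural_join are hypothetical, and the newly added rows (S5, Rome) are deliberately left out because they were never part of SUPP.

```python
COLUMNS = ("S#", "City", "Status", "P#", "Qty")
ROWS = [
    ("S1", "London", 20, "P1", 300), ("S1", "London", 20, "P2", 200),
    ("S1", "London", 20, "P3", 400), ("S1", "London", 20, "P4", 200),
    ("S1", "London", 20, "P5", 100), ("S1", "London", 20, "P6", 100),
    ("S2", "Paris", 10, "P1", 300), ("S2", "Paris", 10, "P2", 400),
    ("S3", "Paris", 10, "P2", 200), ("S4", "London", 20, "P2", 200),
    ("S4", "London", 20, "P4", 300), ("S4", "London", 20, "P5", 400),
]
SUPP = [dict(zip(COLUMNS, row)) for row in ROWS]


def project(rows, attrs):
    """Projection with duplicate elimination; each tuple is a frozenset of items."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


supp1_1 = project(SUPP, ["S#", "City"])
supp1_2 = project(SUPP, ["City", "Status"])
supp2 = project(SUPP, ["S#", "P#", "Qty"])

rejoined = natural_join(natural_join(supp1_1, supp1_2), supp2)
original = {frozenset(row.items()) for row in SUPP}
print("lossless on this instance:", rejoined == original)
```

A check on one instance only illustrates the property; the general guarantee comes from the functional dependencies S# → City and City → Status.
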
Again, Update Problems Overcome!
The new structure overcomes the update problems of SUPP1.
• INSERT: We can add a city and assign a status without adding a supplier.
• DELETE: If we delete a tuple in SUPP1.1 we lose only supplier information. Status information for the city is still available in SUPP1.2.
• UPDATE: Updating the status for a city involves only one tuple in SUPP1.2. Similarly, if a supplier moves to a different city, only one tuple needs to be modified in SUPP1.1.

Third Normal Form
• A relation schema R is in third normal form (3NF) if for all
      α → β in F+
  at least one of the following holds:
  - α → β is trivial (i.e., β ⊆ α)
  - α is a superkey for R
  - Each attribute A in β − α is contained in a candidate key for R.
    (NOTE: each attribute may be in a different candidate key)
• If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold).
• The third condition is a minimal relaxation of BCNF to ensure dependency preservation (will see why later).

3NF Example
• Relation dept_advisor:
  - dept_advisor (s_ID, i_ID, dept_name)
    F = {s_ID, dept_name → i_ID;  i_ID → dept_name}
  - Two candidate keys: {s_ID, dept_name} and {i_ID, s_ID}
• R is in 3NF:
  - s_ID, dept_name → i_ID: {s_ID, dept_name} is a superkey
  - i_ID → dept_name: dept_name is contained in a candidate key

Closure of a Set of Functional Dependencies
• Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F.
  - For example: if A → B and B → C, then we can infer that A → C.
  - More on functional dependency inference later.
• The set of all functional dependencies logically implied by F is the closure of F.
• We denote the closure of F by F+.
• F+ is a superset of F.

Closure of a Set of Functional Dependencies
• We can find F+, the closure of F, by repeatedly applying Armstrong's Axioms:
  - if β ⊆ α, then α → β (reflexivity)
  - if α → β, then γα → γβ (augmentation)
  - if α → β and β → γ, then α → γ (transitivity)

Example
• R = (A, B, C, G, H, I)
  F = { A → B
        A → C
        CG → H
        CG → I
        B → H }
• Some members of F+:
  - A → H, by transitivity from A → B and B → H
  - AG → I, by augmenting A → C with G to get AG → CG, and then transitivity with CG → I

Quiz Q1: Given the above FDs, the functional dependency AB → B
(1) cannot be inferred  (2) can be inferred using transitivity
(3) can be inferred using reflexivity  (4) can be inferred using augmentation

Closure of Functional Dependencies (Cont.)
• Additional rules:
  - If α → β holds and α → γ holds, then α → βγ holds (union)
  - If α → βγ holds, then α → β holds and α → γ holds (decomposition)
  - If α → β holds and γβ → δ holds, then αγ → δ holds (pseudotransitivity)

Quiz Q2: Given a schema r(A, B, C, D) with functional dependencies A → B and B → C, which of the following is a candidate key for r?
(1) A  (2) AC  (3) AD  (4) ABD

Closure of Attribute Sets
• Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F.
• Algorithm to compute α+, the closure of α under F:

      result := α;
      while (changes to result) do
          for each β → γ in F do
          begin
              if β ⊆ result then result := result ∪ γ
          end

• If α+ includes all attributes of R, then α is a superkey. (By Deepak Moud Sir)

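The pseudocode above translates almost line for line into Python. The following is a minimal sketch, assuming each FD is represented as a (lhs, rhs) pair of attribute sets; the function name closure and that representation are choices made here, not part of the slides. It is exercised on the set F from the earlier example, R = (A, B, C, G, H, I).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:  # "while (changes to result) do"
        changed = False
        for lhs, rhs in fds:
            # "if beta is a subset of result then result := result U gamma"
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# F = {A -> B, A -> C, CG -> H, CG -> I, B -> H}
F = [
    (set("A"), set("B")),
    (set("A"), set("C")),
    (set("CG"), set("H")),
    (set("CG"), set("I")),
    (set("B"), set("H")),
]

print(sorted(closure("AG", F)))
print(sorted(closure("A", F)))
```

The loop stops as soon as a full pass over F adds nothing, so it always terminates: result can only grow, and it is bounded by the attributes mentioned in F and in the starting set.
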
Example of Attribute Set Closure
• R = (A, B, C, G, H, I)
• F = {A → B, A → C, CG → H, CG → I, B → H}
• (AG)+
  1. result = AG
  2. result = ABCG (A → C and A → B)
  3. result = ABCGH (CG → H and CG ⊆ AGBC)
  4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
• Is AG a candidate key?
  1. Is AG a superkey?
     1. Does AG → R? Equivalently, is (AG)+ ⊇ R?
  2. Is any subset of AG a superkey?
     1. Does A → R? Equivalently, is (A)+ ⊇ R?
     2. Does G → R? Equivalently, is (G)+ ⊇ R?

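The two questions above (is AG a superkey, and is any proper subset a superkey) can be sketched as follows. The function names are hypothetical and FDs are again assumed to be (lhs, rhs) pairs of sets. Because adding attributes to a superkey always gives another superkey, it is enough to test the subsets obtained by dropping a single attribute.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """attrs is a superkey if its closure contains every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)


def is_candidate_key(attrs, schema, fds):
    """A candidate key is a superkey with no proper subset that is a superkey."""
    attrs = set(attrs)
    if not is_superkey(attrs, schema, fds):
        return False
    # Dropping one attribute at a time covers all proper subsets,
    # since any superset of a superkey is itself a superkey.
    return not any(is_superkey(attrs - {a}, schema, fds) for a in attrs)


R = "ABCGHI"
F = [
    (set("A"), set("B")),
    (set("A"), set("C")),
    (set("CG"), set("H")),
    (set("CG"), set("I")),
    (set("B"), set("H")),
]

print("AG superkey:", is_superkey("AG", R, F))
print("A superkey:", is_superkey("A", R, F))
print("G superkey:", is_superkey("G", R, F))
print("AG candidate key:", is_candidate_key("AG", R, F))
```
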
Quiz Time

Quiz Q3: Given the functional dependencies
      A → B, B → CD and DE → F
the attribute closure A+ is:
(1) ABC
(2) ABCD
(3) BCD
(4) ABCDF

Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
• Testing for superkey:
  - To test if α is a superkey, we compute α+ and check if α+ contains all attributes of R.
• Testing functional dependencies:
  - To check if a functional dependency α → β holds (or, in other words, is in F+), just check if β ⊆ α+.
  - That is, we compute α+ by using attribute closure, and then check if it contains β.
  - This is a simple and cheap test, and very useful.
• Computing the closure of F:
  - For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a functional dependency γ → S.

Lossless-join Decomposition
• For the case of R = (R1, R2), we require that for all possible relations r on schema R
      r = Π_R1(r) ⋈ Π_R2(r)
• A decomposition of R into R1 and R2 is lossless join if at least one of the following dependencies is in F+:
  - R1 ∩ R2 → R1
  - R1 ∩ R2 → R2
• The above functional dependencies are a sufficient condition for lossless-join decomposition; the dependencies are a necessary condition only if all constraints are functional dependencies.

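The condition above reduces to one attribute-closure computation on R1 ∩ R2. This is a minimal sketch under the same assumed FD representation ((lhs, rhs) pairs of sets); the function name is_lossless_binary is hypothetical, and the schemas used to exercise it are small illustrative ones with single-letter attributes.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2 is in F+ (sufficient condition)."""
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


# R = (A, B, C) with F = {A -> B, B -> C}
F = [(set("A"), set("B")), (set("B"), set("C"))]
print(is_lossless_binary("AB", "BC", F))  # common attribute B
print(is_lossless_binary("AB", "AC", F))  # common attribute A
print(is_lossless_binary("AC", "BC", F))  # common attribute C

# R = (A, B, C, D) with F = {A -> B, C -> D}: no common attributes at all
G = [(set("A"), set("B")), (set("C"), set("D"))]
print(is_lossless_binary("AB", "CD", G))
```

A False result only means the sufficient condition is not met; as the slide notes, it is also necessary when all constraints are functional dependencies.
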
Example
• R = (A, B, C)
  F = {A → B, B → C}
  - Can be decomposed in two different ways
• R1 = (A, B), R2 = (B, C)
  - Lossless-join decomposition: R1 ∩ R2 = {B} and B → BC
  - Dependency preserving
• R1 = (A, B), R2 = (A, C)
  - Lossless-join decomposition: R1 ∩ R2 = {A} and A → AB
  - Not dependency preserving (cannot check B → C without computing R1 ⋈ R2)

Dependency Preservation
• Let Fi be the set of dependencies in F+ that include only attributes in Ri.
  - A decomposition is dependency preserving if
        (F1 ∪ F2 ∪ … ∪ Fn)+ = F+
  - If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.
• See the book for an efficient algorithm for checking dependency preservation.

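The slides do not spell out the efficient algorithm. The following is a sketch of the test commonly given in textbooks, stated here as an assumption rather than as the book's exact procedure: for each α → β in F, grow a result set from α using only closures restricted to one Ri at a time, and check that β ends up inside it. It avoids computing F+ explicitly. The function name preserves_dependencies is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(fds, decomposition):
    """Check every FD in fds against a decomposition (a list of schemas)."""
    schemas = [set(ri) for ri in decomposition]
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in schemas:
                # Attributes derivable from what we already have, seen through Ri only.
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False  # this FD cannot be checked without a join
    return True


F = [(set("A"), set("B")), (set("B"), set("C"))]
print(preserves_dependencies(F, ["AB", "BC"]))
print(preserves_dependencies(F, ["AB", "AC"]))
```

On R = (A, B, C) with F = {A → B, B → C}, the second decomposition fails because B → C spans both relations and is not implied by the dependencies local to either one.
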
Higher Normal Forms
• After Codd defined the original set of normal forms, it was discovered that Third Normal Form, as originally defined, had certain inadequacies.
• This led to several higher normal forms, including the Boyce/Codd, Fourth and Fifth Normal Forms. However, for most cases, third normal form is sufficient.
• A few points are worth noting:

Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form
      α → β
where α ⊆ R and β ⊆ R, at least one of the following holds:
• α → β is trivial (i.e., β ⊆ α)
• α is a superkey for R

Example schema not in BCNF:
      instr_dept (ID, name, salary, dept_name, building, budget)
because dept_name → building, budget holds on instr_dept, but dept_name is not a superkey.

Decomposing a Schema into BCNF
• Suppose we have a schema R and a non-trivial dependency α → β that causes a violation of BCNF. We decompose R into:
  • (α ∪ β)
  • (R − (β − α))
• In our example,
  - α = dept_name
  - β = building, budget
  and inst_dept is replaced by
  - (α ∪ β) = (dept_name, building, budget)
  - (R − (β − α)) = (ID, name, salary, dept_name)

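A single decomposition step is just two set expressions. This is a minimal sketch of that one step, with a hypothetical function name bcnf_split; it does not search for violations or recurse, and it is applied here to the inst_dept example from the slide.

```python
def bcnf_split(schema, alpha, beta):
    """Split schema on a violating FD alpha -> beta.

    Returns (alpha ∪ beta, schema − (beta − alpha)) as two attribute sets.
    """
    schema, alpha, beta = set(schema), set(alpha), set(beta)
    return alpha | beta, schema - (beta - alpha)


inst_dept = {"ID", "name", "salary", "dept_name", "building", "budget"}
department, instructor = bcnf_split(
    inst_dept, alpha={"dept_name"}, beta={"building", "budget"}
)
print(sorted(department))
print(sorted(instructor))
```

The determinant α stays in both parts, which is what makes the two relations joinable again.
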
Example
• R = (A, B, C)
  F = {A → B, B → C}
  Key = {A}
• R is not in BCNF
• Decomposition R1 = (A, B), R2 = (B, C)
  - R1 and R2 are in BCNF
  - Lossless-join decomposition
  - Dependency preserving

Example
• Question 2. Suppose you are given a relation R = (A, B, C, D, E) with the following functional dependencies: {CE → D, D → B, C → A}.
  a. Find all candidate keys.
  b. Identify the best normal form that R satisfies (1NF, 2NF, 3NF, or BCNF).
  c. If the relation is not in BCNF, decompose it until it becomes BCNF. At each step, identify a new relation, decompose, and re-compute the keys and the normal forms they satisfy.
• Answer.
  a. The only key is {C, E}.
  b. The relation is in 1NF.
  c. Decompose into R1 = (A, C) and R2 = (B, C, D, E). R1 is in BCNF, R2 is in 2NF. Decompose R2 into R21 = (C, D, E) and R22 = (B, D). Both relations are in BCNF.

Example
• Question 3. Suppose you are given a relation R = (A, B, C, D, E) with the following functional dependencies: {BC → ADE, D → B}.
  a. Find all candidate keys.
  b. Identify the best normal form that R satisfies (1NF, 2NF, 3NF, or BCNF).
  c. If the relation is not in BCNF, decompose it until it becomes BCNF. At each step, identify a new relation, decompose, and re-compute the keys and the normal forms they satisfy.
• Answer.
  a. The keys are {B, C} and {C, D}.
  b. The relation is in 3NF.
  c. It cannot be put into BCNF without losing a dependency. Splitting on D → B gives (B, D) and (A, C, D, E), which are both in BCNF, but BC → ADE can then no longer be checked without a join. Keeping a relation of the form (B, C, D) to preserve it (C is needed for the functional dependency) does not help, because D → B still holds there and D is not a superkey, so that relation would not be in BCNF.

Example
• Question 4. Which normal form is considered adequate for normal relational database design?
  (a) 2NF (b) 5NF (c) 4NF (d) 3NF

Ans: (d)

• Explanation: A relational database table is often described as "normalized" if it is in the Third Normal Form, because most 3NF tables are free of insertion, update, and deletion anomalies.

Example
• Question 5. Consider a schema R (A, B, C, D) and functional dependencies A → B and C → D. Then the decomposition of R into R1 (A, B) and R2 (C, D) is
  (a) dependency preserving and lossless join
  (b) lossless join but not dependency preserving
  (c) dependency preserving but not lossless join
  (d) not dependency preserving and not lossless join

Example
• Ans: (c)
• While decomposing a relational table we must verify the following properties:
  i) Dependency-preserving property: a decomposition is said to be dependency preserving if F+ = (F1 ∪ F2 ∪ … ∪ Fn)+, where F is the set of functional dependencies (FDs) on the universal relation R, F1 is the set of FDs of R1, and F2 is the set of FDs of R2.
     For the above question, R1 preserves A → B and R2 preserves C → D. Since the FDs of the universal relation R are preserved by R1 and R2, the decomposition is dependency preserving.
  ii) Lossless-join property: the decomposition is a lossless-join decomposition of R if at least one of the following functional dependencies is in F+:
     a) R1 ∩ R2 → R1
     b) R1 ∩ R2 → R2
     This ensures that the attributes involved in the natural join (⋈) are a superkey for at least one of the two relations. In the above question, schema R is decomposed into R1 (A, B) and R2 (C, D), and R1 ∩ R2 is empty. So the decomposition is not lossless.

Example
• Question 6. A table has fields F1, F2, F3, F4, and F5, with the following functional dependencies:
      F1 → F3
      F2 → F4
      (F1, F2) → F5
  In terms of normalization, this table is in
  (a) 1NF (b) 2NF (c) 3NF (d) None of these

Ans: (a)
Explanation:
Since the primary key is not given, we have to derive it. Using attribute closure, we get the primary key (F1, F2). From the functional dependencies F1 → F3 and F2 → F4, we can see that there are partial functional dependencies; therefore the table is not in 2NF. Hence the table is in 1NF.

Example
• Question 7. Let R(A, B, C, D, E, P, G) be a relational schema in which the following FDs are known to hold:
      AB → CD
      DE → P
      C → E
      P → C
      B → G
  The relation schema R is
  (a) in BCNF (b) in 3NF, but not in BCNF
  (c) in 2NF, but not in 3NF (d) not in 2NF

Ans: (d)
Explanation:
From the attribute closure we can see that the key for the relation is AB. The FD B → G is a partial dependency; hence the relation is not in 2NF.

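The reasoning in the last two answers (derive the key, then look for non-key attributes determined by part of it) can be sketched as code. This is a brute-force illustration for small schemas, with hypothetical function names; it enumerates attribute subsets, so it is not meant for large schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal superkeys, found by trying subsets in order of size."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys


def partial_dependencies(schema, fds):
    """Pairs (part of a key, non-prime attributes it determines)."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)  # attributes that belong to some candidate key
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for combo in combinations(sorted(key), size):
                part = set(combo)
                dependents = closure(part, fds) - prime
                if dependents:
                    found.append((part, dependents))
    return found


# Question 7: AB -> CD, DE -> P, C -> E, P -> C, B -> G
F7 = [
    (set("AB"), set("CD")),
    (set("DE"), set("P")),
    (set("C"), set("E")),
    (set("P"), set("C")),
    (set("B"), set("G")),
]
print(candidate_keys("ABCDEPG", F7))
print(partial_dependencies("ABCDEPG", F7))
```

A non-empty result means the schema is not in 2NF; an empty result means no partial dependency was found under the given FDs.
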
BCNF and Dependency Preservation
• Constraints, including functional dependencies, are costly to check in practice unless they pertain to only one relation.
• If it is sufficient to test only those dependencies on each individual relation of a decomposition in order to ensure that all functional dependencies hold, then that decomposition is dependency preserving.
• Because it is not always possible to achieve both BCNF and dependency preservation, we consider a weaker normal form, known as third normal form.

Testing for BCNF

To check if a non-trivial dependency  causes a violation of BCNF
1. compute + (the attribute closure of ), and
2. verify that it includes all attributes of R, that is, it is a superkey of R.
 Simplified test: To check if a relation schema R is in BCNF, it suffices to
check only the dependencies in the given set F for violation of BCNF,
rather than checking all dependencies in F+.
 If none of the dependencies in F causes a violation of BCNF, then
none of the dependencies in F+ will cause a violation of BCNF.
 However, simplified test using only F is incorrect when testing a
relation in a decomposition of R
 Consider R = (A, B, C, D, E), with F = { A  B, BC  D}
 Decompose R into R = (A,B) and R = (A,C,D, E)
1 2
 Neither of the dependencies in F contain only attributes from

(A,C,D,E) so we might be mislead into thinking R2 satisfies BCNF.

 In fact, dependency AC  D in F+ shows R2 is not in BCNF.

Example of BCNF Decomposition
• R = (A, B, C)
F = {A → B,
     B → C}
Key = {A}
• R is not in BCNF (B → C holds, but B is not a superkey)
• Decomposition
  - R1 = (B, C)
  - R2 = (A, B)

Quiz Q4: Given relation r(A, B, C, D) and the functional
dependency A → CD, the BCNF decomposition is:
(1) ABC, ACD
(2) AB, ACD
(3) AB, BCD
(4) ABC, CD

Example of BCNF Decomposition

• class (course_id, title, dept_name, credits, sec_id, semester,
year, building, room_number, capacity, time_slot_id)
• Functional dependencies:
  - course_id → title, dept_name, credits
  - building, room_number → capacity
  - course_id, sec_id, semester, year → building, room_number,
    time_slot_id
• A candidate key: {course_id, sec_id, semester, year}.
• BCNF Decomposition:
  - course_id → title, dept_name, credits holds,
    but course_id is not a superkey.
  - We replace class by:
    course (course_id, title, dept_name, credits)
    class-1 (course_id, sec_id, semester, year, building,
    room_number, capacity, time_slot_id)

BCNF Decomposition (Cont.)
• course is in BCNF
  - How do we know this? Every non-trivial dependency that holds on course
    has course_id on its left side, and course_id is a superkey for course.
• building, room_number → capacity holds on class-1,
  - but {building, room_number} is not a superkey for class-1.
  - We replace class-1 by:
    classroom (building, room_number, capacity)
    section (course_id, sec_id, semester, year, building,
    room_number, time_slot_id)
• classroom and section are in BCNF.

BCNF and Dependency Preservation
It is not always possible to get a BCNF decomposition that is
dependency preserving.

• R = (J, K, L)
F = {JK → L,
     L → K}
Two candidate keys: JK and JL
• R is not in BCNF (L → K holds, but L is not a superkey)
• Any decomposition of R will fail to preserve
JK → L
This implies that testing for JK → L requires a join
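
The loss of JK → L can be checked with the usual attribute-closure test for dependency preservation. This is a sketch under stated assumptions: attributes are single characters, the decomposition (J, L), (L, K) is the one obtained by splitting on L → K, and `preserves` is an illustrative helper name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves(fd, decomposition, fds):
    """True if fd can be enforced by checking each schema of the decomposition separately."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            # What the attributes known so far determine inside this schema.
            gained = closure(result & set(schema), fds) & set(schema)
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result

F = [("JK", "L"), ("L", "K")]
decomposition = ["JL", "LK"]
print(preserves(("JK", "L"), decomposition, F))  # False
print(preserves(("L", "K"), decomposition, F))   # True
```

L → K survives because both of its attributes sit in (L, K); JK → L does not, since no single schema holds J, K and L together.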

Normal Forms: Review

• Unnormalized – There are multivalued attributes or repeating groups
• 1NF – No multivalued attributes or repeating groups.
• 2NF – 1NF plus no partial dependencies
• 3NF – 2NF plus no transitive dependencies

Testing Decomposition for BCNF
• To check if a relation Ri in a decomposition of R is in BCNF,
  - either test Ri for BCNF with respect to the restriction of F to Ri
    (that is, all FDs in F+ that contain only attributes from Ri),
  - or use the original set of dependencies F that hold on R, but with
    the following test:
    for every set of attributes α ⊆ Ri, check that α+ (the attribute
    closure of α) either includes no attribute of Ri – α, or includes
    all attributes of Ri.
• If the condition is violated by some set of attributes α, the dependency
  α → (α+ – α) ∩ Ri
can be shown to hold on Ri, and Ri violates BCNF.
• We use the above dependency to decompose Ri.

E.g., given {A → B, BC → D} and the decomposition R1 (A, B) and
R2 (A, C, D, E), AC+ = ABCD, so R2 violates BCNF due to the
dependency AC → D

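
The subset test above can be sketched as follows. The sketch assumes single-character attributes and enumerates every proper subset of Ri, which is exponential but acceptable for small schemas; `bcnf_witness` is an illustrative name, not part of the slides.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_witness(schema, fds):
    """Return (alpha, beta) with alpha -> beta violating BCNF on schema, or None."""
    schema = set(schema)
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            alpha = set(combo)
            alpha_plus = closure(alpha, fds)
            beta = (alpha_plus - alpha) & schema  # (alpha+ - alpha) ∩ Ri
            if beta and not schema <= alpha_plus:
                return alpha, beta
    return None

F = [("A", "B"), ("BC", "D")]
print(bcnf_witness("AB", F))  # None: R1 is in BCNF
alpha, beta = bcnf_witness("ACDE", F)
print(sorted(alpha), sorted(beta))  # ['A', 'C'] ['D']
```

The witness for R2 is exactly the dependency AC → D from the example.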
BCNF Decomposition Algorithm
result := {R};
done := false;
compute F+;
while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
    then begin
        let α → β be a nontrivial functional dependency that
        holds on Ri such that α → Ri is not in F+,
        and α ∩ β = ∅;
        result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
    end
    else done := true;

Note: each Ri is in BCNF, and the decomposition is lossless-join.
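
A runnable sketch of the loop above, under a few assumptions: a schema is any collection of attribute names, violations are found by brute-force search over attribute subsets instead of computing F+, and the names `bcnf_witness` and `bcnf_decompose` are illustrative. The FDs for the class schema are copied from the earlier example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_witness(schema, fds):
    """Return (alpha, beta) with alpha -> beta violating BCNF on schema, or None."""
    schema = set(schema)
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            alpha = set(combo)
            alpha_plus = closure(alpha, fds)
            beta = (alpha_plus - alpha) & schema
            if beta and not schema <= alpha_plus:
                return alpha, beta
    return None

def bcnf_decompose(schema, fds):
    """Split schemas on violating dependencies until none remains."""
    result = [frozenset(schema)]
    done = False
    while not done:
        done = True
        for ri in result:
            witness = bcnf_witness(ri, fds)
            if witness:
                alpha, beta = witness
                result.remove(ri)
                result += [frozenset(ri - beta), frozenset(alpha | beta)]
                done = False
                break
    return set(result)

small = bcnf_decompose("ABC", [("A", "B"), ("B", "C")])
print(sorted("".join(sorted(s)) for s in small))  # ['AB', 'BC']

CLASS = ("course_id", "title", "dept_name", "credits", "sec_id", "semester",
         "year", "building", "room_number", "capacity", "time_slot_id")
F_CLASS = [
    (("course_id",), ("title", "dept_name", "credits")),
    (("building", "room_number"), ("capacity",)),
    (("course_id", "sec_id", "semester", "year"),
     ("building", "room_number", "time_slot_id")),
]
for s in bcnf_decompose(CLASS, F_CLASS):
    print(sorted(s))
```

For the class schema the loop produces the three schemas named course, classroom and section in the worked example; the order in which they are printed is not fixed.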

Third Normal Form: Motivation
• There are some situations where
  - BCNF is not dependency preserving, and
  - efficient checking for FD violation on updates is important
• Solution: define a weaker normal form, called Third Normal Form (3NF)
  - Allows some redundancy (with resultant problems; we
    will see examples later)
  - But functional dependencies can be checked on
    individual relations without computing a join.
  - There is always a lossless-join, dependency-preserving
    decomposition into 3NF.

Third Normal Form
• A relation schema R is in third normal form (3NF) if for all
  α → β in F+
at least one of the following holds:
  - α → β is trivial (i.e., β ⊆ α)
  - α is a superkey for R
  - Each attribute A in β – α is contained in a candidate key for R.
    (NOTE: each attribute may be in a different candidate key)
• If a relation is in BCNF it is in 3NF (since in BCNF one of the first
two conditions above must hold).
• The third condition is a minimal relaxation of BCNF to ensure
dependency preservation (will see why later).
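
The three conditions can be turned into a direct check. The sketch below assumes single-character attributes, finds candidate keys by brute force, and examines only the dependencies in the given set F, which is enough when the schema being tested is the whole relation R on which F is stated. The function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(set(k) <= set(combo) for k in keys):
                continue
            if closure(combo, fds) == set(schema):
                keys.append(combo)
    return keys

def is_bcnf(schema, fds):
    """Every FD is trivial or has a superkey on its left side."""
    return all(set(rhs) <= set(lhs) or set(schema) <= closure(lhs, fds)
               for lhs, rhs in fds)

def is_3nf(schema, fds):
    """Like BCNF, but attributes of rhs - lhs may instead all be prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return all(set(schema) <= closure(lhs, fds) or (set(rhs) - set(lhs)) <= prime
               for lhs, rhs in fds)

F_JKL = [("JK", "L"), ("L", "K")]
print(candidate_keys("JKL", F_JKL))               # [('J', 'K'), ('J', 'L')]
print(is_bcnf("JKL", F_JKL), is_3nf("JKL", F_JKL))  # False True
print(is_3nf("ABC", [("A", "B"), ("B", "C")]))    # False
```

For (J, K, L) the dependency L → K fails the superkey condition but passes the third one, because K belongs to the candidate key JK.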

Redundancy in 3NF
• There is some redundancy in this schema
• Example of problems due to redundancy in 3NF
  - R = (J, K, L)
    F = {JK → L, L → K}

    J      L    K
    j1     l1   k1
    j2     l1   k1
    j3     l1   k1
    null   l2   k2

• repetition of information (e.g., the relationship l1, k1)
  - (i_ID, dept_name)
• need to use null values (e.g., to represent the relationship
l2, k2 where there is no corresponding value for J).
  - (i_ID, dept_name) if there is no separate relation mapping instructors to departments

Testing for 3NF
• Testing a given schema to see if it satisfies 3NF has been
shown to be NP-hard
• Possible to achieve 3NF by repeated decomposition based on
finding functional dependencies that show violation of 3NF
  - similar to BCNF decomposition; NP-hardness is not a big deal
    since schemas tend to be small
  - BUT this does not guarantee dependency preservation
    e.g. R = (A, B, C), F = {A → B, B → C}, decomposed using A → B
• Coming up: an algorithm to compute a dependency preserving
decomposition into third normal form
  - Based on the notion of a “canonical cover”
  - Interestingly, runs in polynomial time, even though testing
    for 3NF is NP-hard

Canonical Cover
• Sets of functional dependencies may have redundant
dependencies that can be inferred from the others
  - For example: A → C is redundant in: {A → B, B → C, A → C}
  - Parts of a functional dependency may be redundant
    E.g.: on RHS: {A → B, B → C, A → CD} can be simplified to
    {A → B, B → C, A → D}
    E.g.: on LHS: {A → B, B → C, AC → D} can be simplified to
    {A → B, B → C, A → D}
• Intuitively, a canonical cover of F is a “minimal” set of functional
dependencies equivalent to F, having no redundant dependencies
or redundant parts of dependencies



Extraneous Attributes
Consider a set F of functional dependencies and the functional
dependency α → β in F.
• Attribute A is extraneous in α if A ∈ α
and F logically implies (F – {α → β}) ∪ {(α – A) → β}.
• Attribute A is extraneous in β if A ∈ β
and the set of functional dependencies
(F – {α → β}) ∪ {α → (β – A)} logically implies F.
• Note: implication in the opposite direction is trivial in each of the cases
above, since a “stronger” functional dependency always implies a
weaker one
• Example: Given F = {A → C, AB → C}
  - B is extraneous in AB → C because {A → C, AB → C} logically
    implies A → C (i.e., the result of dropping B from AB → C).
• Example: Given F = {A → C, AB → CD}
  - C is extraneous in AB → CD since AB → C can be inferred even
    after deleting C

Testing if an Attribute is Extraneous
Consider a set F of functional dependencies and the functional
dependency α → β in F.
• To test if attribute A ∈ α is extraneous in α
1. compute (α – A)+ using the dependencies in F
2. check that (α – A)+ contains β; if it does, A is extraneous in α
• To test if attribute A ∈ β is extraneous in β
1. compute α+ using only the dependencies in
   F’ = (F – {α → β}) ∪ {α → (β – A)},
2. check that α+ contains A; if it does, A is extraneous in β

• Example: Given F = {A → C, AB → C}:
  B is extraneous in AB → C because AB – B = A, and A+ contains C
• Example: Given F = {A → C, AB → CD}:
  C is extraneous in AB → CD since, under F’ = {A → C, AB → D},
  (AB)+ = ABCD, which contains C

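
Both tests reduce to one closure computation each. A minimal sketch, assuming single-character attributes and FDs written as (left side, right side) string pairs; `extraneous_in_lhs` and `extraneous_in_rhs` are illustrative names.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_in_lhs(attr, fd, fds):
    """Is attr extraneous on the left side of fd? Uses F unchanged."""
    lhs, rhs = fd
    return set(rhs) <= closure(set(lhs) - {attr}, fds)

def extraneous_in_rhs(attr, fd, fds):
    """Is attr extraneous on the right side of fd? Uses F' with attr dropped from fd."""
    lhs, rhs = fd
    f_prime = [f for f in fds if f != fd] + [(lhs, set(rhs) - {attr})]
    return attr in closure(lhs, f_prime)

F1 = [("A", "C"), ("AB", "C")]
print(extraneous_in_lhs("B", ("AB", "C"), F1))   # True

F2 = [("A", "C"), ("AB", "CD")]
print(extraneous_in_rhs("C", ("AB", "CD"), F2))  # True
print(extraneous_in_rhs("D", ("AB", "CD"), F2))  # False
```

The last call shows the negative case: once D is dropped from AB → CD, nothing else in the set produces D, so D is not extraneous.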
Canonical Cover
• A canonical cover for F is a set of dependencies Fc such that
  - F logically implies all dependencies in Fc, and
  - Fc logically implies all dependencies in F, and
  - No functional dependency in Fc contains an extraneous attribute, and
  - Each left side of a functional dependency in Fc is unique.

Computing a Canonical Cover
• To compute a canonical cover for F:
Fc := F
repeat
    Use the union rule to replace any dependencies in Fc of the form
        α1 → β1 and α1 → β2 with α1 → β1 β2
    Find a functional dependency α → β in Fc with an
        extraneous attribute either in α or in β
        /* Note: test for extraneous attributes done using Fc, not F */
    If an extraneous attribute is found, delete it from α → β
until Fc does not change
• Note: Union rule may become applicable after some extraneous
attributes have been deleted, so it has to be re-applied


Computing a Canonical Cover
R = (A, B, C)
F = {A → BC,
     B → C,
     A → B,
     AB → C}
• Combine A → BC and A → B into A → BC
  - Set is now {A → BC, B → C, AB → C}
• A is extraneous in AB → C
  - Check if the result of deleting A from AB → C is implied by the other
    dependencies
    Yes: in fact, B → C is already present!
  - Set is now {A → BC, B → C}
• C is extraneous in A → BC
  - Check if A → C is logically implied by A → B and the other
    dependencies
    Yes: using transitivity on A → B and B → C.
    Can use attribute closure of A in more complex cases
• The canonical cover is: A → B
                          B → C
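
The worked example can be replayed with a short sketch of the repeat loop. Assumptions: attributes are single characters, one extraneous attribute is removed per pass, and attributes are tried in alphabetical order, so the intermediate steps may come in a different order than on the slide. `find_reduction` and `canonical_cover` are illustrative names.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_reduction(lhs, rhs, i, fc):
    """Return fc[i] with one extraneous attribute removed, or None."""
    for a in sorted(lhs):
        if len(lhs) > 1 and rhs <= closure(lhs - {a}, fc):
            return lhs - {a}, rhs
    for a in sorted(rhs):
        rest = fc[:i] + [(lhs, rhs - {a})] + fc[i + 1:]
        if a in closure(lhs, rest):
            return lhs, rhs - {a}
    return None

def canonical_cover(fds):
    fc = [(frozenset(l), frozenset(r)) for l, r in fds]
    changed = True
    while changed:
        changed = False
        # Union rule: merge dependencies that share a left side.
        merged = {}
        for lhs, rhs in fc:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        fc = [(l, r) for l, r in merged.items() if r]
        # Remove one extraneous attribute, then start the pass again.
        for i, (lhs, rhs) in enumerate(fc):
            reduced = find_reduction(lhs, rhs, i, fc)
            if reduced is not None:
                fc[i] = reduced
                changed = True
                break
    return {(l, r) for l, r in fc if r}

F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
cover = canonical_cover(F)
print(sorted(("".join(sorted(l)), "".join(sorted(r))) for l, r in cover))
# [('A', 'B'), ('B', 'C')]
```

A canonical cover is not unique in general, so a different removal order can give a different but equivalent result on other inputs; for this F the sketch reaches the same cover as the example.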

3NF Decomposition Algorithm
Let Fc be a canonical cover for F;
i := 0;
for each functional dependency α → β in Fc do
    if none of the schemas Rj, 1 ≤ j ≤ i contains α β
    then begin
        i := i + 1;
        Ri := α β
    end
if none of the schemas Rj, 1 ≤ j ≤ i contains a candidate key for R
then begin
    i := i + 1;
    Ri := any candidate key for R;
end
/* Optionally, remove redundant relations */
for all Rj
    if schema Rj is contained in another schema Rk
    then Rj := Ri; i := i – 1; /* delete Rj */
return (R1, R2, ..., Ri)

3NF Decomposition Algorithm (Cont.)
• Above algorithm ensures:
  - each relation schema Ri is in 3NF
  - decomposition is dependency preserving and lossless-join

3NF Decomposition: An Example
• Relation schema:
cust_banker_branch = (customer_id, employee_id,
branch_name, type)
• The functional dependencies for this relation schema are:
1. customer_id, employee_id → branch_name, type
2. employee_id → branch_name
3. customer_id, branch_name → employee_id
• We first compute a canonical cover
  - branch_name is extraneous in the r.h.s. of the 1st dependency
  - No other attribute is extraneous, so we get Fc =
    customer_id, employee_id → type
    employee_id → branch_name
    customer_id, branch_name → employee_id

3NF Decomposition Example (Cont.)
• The for loop generates the following 3NF schema:
(customer_id, employee_id, type)
(employee_id, branch_name)
(customer_id, branch_name, employee_id)
  - Observe that (customer_id, employee_id, type) contains a
    candidate key of the original schema, so no further relation
    schema needs to be added
• At the end of the for loop, detect and delete schemas, such as
(employee_id, branch_name), which are subsets of other schemas
  - the result will not depend on the order in which FDs are considered
• The resultant simplified 3NF schema is:
(customer_id, employee_id, type)
(customer_id, branch_name, employee_id)

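
The example can be reproduced with a sketch of the synthesis algorithm. It takes the canonical cover Fc from the example as given, uses the fact that a schema contains a candidate key exactly when its closure covers R, and finds a key by dropping attributes one at a time when none is present. `synthesize_3nf` is an illustrative name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def synthesize_3nf(schema, fc):
    """3NF synthesis from a canonical cover fc of the dependencies on schema."""
    schemas = []
    for lhs, rhs in fc:
        candidate = frozenset(lhs) | frozenset(rhs)
        if not any(candidate <= s for s in schemas):
            schemas.append(candidate)
    # A schema contains a candidate key exactly when it is a superkey of R.
    if not any(set(schema) <= closure(s, fc) for s in schemas):
        key = set(schema)
        for a in sorted(schema):
            if set(schema) <= closure(key - {a}, fc):
                key -= {a}
        schemas.append(frozenset(key))
    # Optionally remove schemas contained in another schema.
    return {s for s in schemas if not any(s < t for t in schemas)}

R = ("customer_id", "employee_id", "branch_name", "type")
FC = [
    (("customer_id", "employee_id"), ("type",)),
    (("employee_id",), ("branch_name",)),
    (("customer_id", "branch_name"), ("employee_id",)),
]
result = synthesize_3nf(R, FC)
for s in result:
    print(sorted(s))
```

The loop first produces three schemas, and the final filter drops (employee_id, branch_name) because it is contained in (customer_id, branch_name, employee_id), leaving the two schemas of the simplified result.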
Comparison of BCNF and 3NF
• It is always possible to decompose a relation into a set of relations
that are in 3NF such that:
  - the decomposition is lossless
  - the dependencies are preserved
• It is always possible to decompose a relation into a set of relations
that are in BCNF such that:
  - the decomposition is lossless
  - but it may not be possible to preserve dependencies.
