
BC2402 Week 2


Week 2

Relational Data Model


Normalization

BC2402 Designing and Developing Databases


Agenda
 Relational Data Model

 Normalization
2.1 Relational Data Model
Relational Data Model
Previously…
 Use Case
 ERD
 Extended ERD
[Figure: Use Case, ERD, Relational Data Model]
Relation
 Definition: A relation is a named, two-dimensional table of data

 Table consists of rows (records) and columns (attribute or field)

 Requirements for a table to qualify as a relation:


 It must have a unique name
 Every attribute value must be atomic (not multivalued, not composite)
 Every row must be unique (can’t have two rows with exactly the same
values for all their fields)
 Attributes (columns) in tables must have unique names
 The order of the columns must be irrelevant
 The order of the rows must be irrelevant
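A minimal sketch of the requirements above that can be tested mechanically (unique column names, atomic values, unique rows). The function name `is_relation` and the sample data are assumptions for illustration; whether a value is truly atomic is a modelling judgement, so the check below only flags Python collections stored in a cell.

```python
def is_relation(columns, rows):
    """Check the testable requirements: unique column names, atomic values, unique rows."""
    if len(set(columns)) != len(columns):
        return False  # attribute names must be unique
    for row in rows:
        if len(row) != len(columns):
            return False  # every row needs one value per attribute
        if any(isinstance(value, (list, tuple, set, dict)) for value in row):
            return False  # a collection in a cell is a multivalued or composite value
    return len(set(rows)) == len(rows)  # no two rows may be identical


# Hypothetical sample data
columns = ["emp_id", "name", "skill"]
good = [(100, "Ann", "SQL"), (110, "Bo", "Python")]
multivalued = [(100, "Ann", ["SQL", "Python"])]
duplicated = [(100, "Ann", "SQL"), (100, "Ann", "SQL")]

print(is_relation(columns, good))         # True
print(is_relation(columns, multivalued))  # False
print(is_relation(columns, duplicated))   # False
```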
The Relational Data Model
 Logical data model
 Mathematically oriented
 Implementable on any system
 Determine results using mathematical analysis

 Math foundation
 Relations (almost tables)
 Operators (Select, Project, Join, Union, Intersection,
Subtraction…)
Correspondence with E-R Model
 Relations (tables) correspond with entity types and with
many-to-many (incl. one to many) relationship types

 Rows correspond with entity instances and with many-to-many relationship instances

 Columns correspond with attributes

 NOTE: The word relation (in relational database) is NOT the same as the word relationship (in E-R model)
ER vs. Relational vs. Tables
Entity-Relationship Model | Relational Model   | Physical RDBMS
Entity                    | Relation           | Table
Entity Instance           | Tuple              | Record
Relationship              |                    |
Attribute                 | Attribute          | Attribute
Element                   | Element            | Element
Primary Key               | Primary Key        | Primary Key
                          | Foreign Key        | Foreign Key
                          | Domain             | Data Type + Constraints
                          | Relational Algebra | SQL/QBE
Key Fields
 Keys are special fields that serve two main purposes:
 Primary keys are unique identifiers of the rows in a relation.
Examples include employee numbers, social security numbers, etc. This
is how we can guarantee that all rows are unique
 Foreign keys are identifiers that enable a dependent relation (on the
many side of a relationship) to refer to its parent relation (on the one
side of the relationship)

 Keys can be simple (a single field) or composite (more than one field)

 Keys usually are used as indexes to speed up the response to user queries (A very interesting (debatable) topic… we will look at this in SQL and noSQL)
Relational schema of four relations
[Figure: relational schema of four relations, with these callouts]
 Primary key
 Foreign key (implements 1:N relationship between customer and order)
 Combined, these are a composite primary key (uniquely identifies the order line); individually they are foreign keys (implement M:N relationship between order and product)
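A sketch of how such a four-relation schema might be declared, using Python's built-in sqlite3 module. The table and column names are assumptions (the figure itself is not reproduced in the text); the point is the primary key, the 1:N foreign key, and the composite primary key made of two foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT NOT NULL,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)  -- 1:N
);
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)  -- composite key implements M:N
);
INSERT INTO customer VALUES (1, 'Alice');
INSERT INTO product VALUES (10, 'Desk');
INSERT INTO orders VALUES (100, '2024-01-15', 1);
INSERT INTO order_line VALUES (100, 10, 2);
""")


def try_insert(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Same (order_id, product_id) pair twice: rejected by the composite primary key.
duplicate_line_ok = try_insert("INSERT INTO order_line VALUES (100, 10, 5)")
# Order for a customer that does not exist: rejected by the foreign key.
orphan_order_ok = try_insert("INSERT INTO orders VALUES (101, '2024-01-16', 999)")
print(duplicate_line_ok, orphan_order_ok)  # False False
```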
Constraints
 Domain Integrity Constraints
 Allowable values for an attribute (e.g. INTEGER only)

 Entity Integrity
 No primary key attribute may be null. All primary key fields MUST have
data

 Referential Integrity
 States that any foreign key value (on the relation of the many side) MUST
match a primary key value in the relation of the one side. (Or the foreign key
can be null). For example:
 Delete Rules
 Restrict: don’t allow delete of “parent” side if related rows exist in “dependent” side
 Cascade: automatically delete “dependent” side rows that correspond with the “parent” side row to
be deleted
 Set-to-Null: set the foreign key in the dependent side to null if deleting from the parent side (not allowed for weak entities)
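The three delete rules can be tried out with a small sketch in Python's sqlite3 module. The `parent` and `child` tables and the helper `remaining_children` are hypothetical names for illustration; SQLite spells the rules `RESTRICT`, `CASCADE`, and `SET NULL`.

```python
import sqlite3


def remaining_children(on_delete):
    """Delete a parent row under the given rule and report what happens to its child."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child (id INTEGER PRIMARY KEY, "
        f"parent_id INTEGER REFERENCES parent(id) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO parent VALUES (1)")
    conn.execute("INSERT INTO child VALUES (10, 1)")
    try:
        conn.execute("DELETE FROM parent WHERE id = 1")
    except sqlite3.IntegrityError:
        return "delete refused"
    return conn.execute("SELECT id, parent_id FROM child").fetchall()


restrict_result = remaining_children("RESTRICT")  # "delete refused"
cascade_result = remaining_children("CASCADE")    # []
set_null_result = remaining_children("SET NULL")  # [(10, None)]
print(restrict_result, cascade_result, set_null_result)
```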
Referential integrity constraints
Referential integrity constraints are drawn via arrows from dependent to parent table
Transforming ER to Relational
 Entities
 Every entity becomes a table
 Weak entity takes key of strong entity as part of primary key

 Attribute
 Every ER attribute becomes a relational attribute
 Composite attributes: Use only their simple, component attributes
 Multivalued Attribute: Becomes a separate relation with a foreign key taken from the
superior entity
 Remember: ER Model has no foreign keys!

 Relationships
 Become a table or
 Set foreign keys
 Sometimes both

 Following rules are 90% true

 Exceptions: Watch out!


Mapping a regular entity
(a) CUSTOMER entity type with simple attributes
(b) CUSTOMER relation

Mapping a composite attribute
(a) CUSTOMER entity type with composite attribute
(b) CUSTOMER relation with address detail

Mapping an entity with a multi-valued attribute
Multi-valued attribute becomes a separate relation with foreign key
One-to-many relationship between original entity and new relation

Transforming EER Diagrams into Relations
Mapping Weak Entities
 Becomes a separate relation with a foreign key taken from the
superior entity

 Primary key composed of:


 Partial identifier of weak entity
 Primary key of identifying relation (strong entity)

Example of mapping a weak entity
Weak entity DEPENDENT
Relations resulting from weak entity: composite primary key

Mapping Binary Relationships
 One-to-Many (1:M)
 Primary key on the one side becomes a foreign key on the many side

 Many-to-Many (M:N)
 Create a new relation with the primary keys of the two entities as its
primary key
 Watch out for superkeys

 One-to-One (1:1)
 One side optional
 Primary key on the mandatory side becomes a foreign key on the
optional side
 Treat like 1:M

 Both sides optional


 Create intersection table
 Primary key of one side becomes a foreign key on the other side
 Treat like M:N
Binary 1:M Example 1
a) Relationship between customers and orders

Note the mandatory one

b) Mapping the relationship

Again, no null value in the foreign key; this is because of the mandatory minimum cardinality
Foreign key
Binary 1:M Example 2

Course [CNo, CName]


Course-Sem [CNo, Year, Semester, Coordinator]
Section [CNo, Year, Semester, SecNo, Instructor]
Binary M:N Example 1

a) Completes relationship (M:N)

The Completes relationship will need to become a separate relation


Binary M:N Example 1
Three resulting relations

Composite primary key
New associative relation
Foreign key
Foreign key
Binary M:N Example 2
[ER diagram: Instructor (IName, DOB, Gender) Teaches Course (CourseCode, CName), with cardinalities (1,M) and (1,5)]

Instructor [IName, DOB, Gender]


Course [CourseCode, CName]
Teaches [IName, CourseCode]
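The three relations above can be sketched in Python's sqlite3 module as follows. The relation and attribute names follow the slide; the sample rows (instructor names, the second course) are made up for illustration. Each side of the M:N relationship is answered by joining through `teaches`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE instructor (iname TEXT PRIMARY KEY, dob TEXT, gender TEXT);
CREATE TABLE course (course_code TEXT PRIMARY KEY, cname TEXT);
CREATE TABLE teaches (
    iname       TEXT NOT NULL REFERENCES instructor(iname),
    course_code TEXT NOT NULL REFERENCES course(course_code),
    PRIMARY KEY (iname, course_code)
);
INSERT INTO instructor VALUES ('Lim', '1980-01-01', 'F'), ('Ong', '1975-06-15', 'M');
INSERT INTO course VALUES
    ('BC2402', 'Designing and Developing Databases'),
    ('BC0001', 'Placeholder Course');
INSERT INTO teaches VALUES ('Lim', 'BC2402'), ('Ong', 'BC2402'), ('Lim', 'BC0001');
""")

# One instructor, many courses
courses_for_lim = [row[0] for row in conn.execute(
    "SELECT c.course_code FROM teaches t "
    "JOIN course c ON c.course_code = t.course_code "
    "WHERE t.iname = ? ORDER BY c.course_code", ("Lim",))]

# One course, many instructors
instructors_for_bc2402 = [row[0] for row in conn.execute(
    "SELECT t.iname FROM teaches t WHERE t.course_code = ? ORDER BY t.iname",
    ("BC2402",))]

print(courses_for_lim)         # ['BC0001', 'BC2402']
print(instructors_for_bc2402)  # ['Lim', 'Ong']
```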
Binary Example: 1:1 Optionality

In_charge relationship (1:1)

Often in 1:1 relationships, one direction is optional


Binary Example: 1:1 Optionality
Resulting relations

Foreign key goes in the relation on the optional side, matching the primary key on the mandatory side
Transforming EER Diagrams into Relations
Mapping Associative Entities
 Identifier Not Assigned
 Default primary key for the association relation is composed of the
primary keys of the two entities (as in M:N relationship)

 Identifier Assigned
 It is natural and familiar to end-users

Good practice to decompose M:N relationships in E/ERDs to show associative entities
 Relations are typically translated directly into 3rd normal form
Example of mapping an associative entity

An associative entity
Example of mapping an associative entity
Three resulting relations

Composite primary key formed from the two foreign keys


Example of mapping an associative entity with an identifier
Example of mapping an associative entity with an identifier

Three resulting relations

Primary key differs from foreign keys


Unary Relationships
 One-to-Many
 Recursive foreign key in the same relation

 Many-to-Many
 Two relations:
 One for the entity type
 One for an associative relation in which the primary key has
two attributes, both taken from the primary key of the entity
Unary Relationships: 1:1
 Person marries person

 Possible implementations:
 Foreign key in table:
Person [ID, Gender, DOB, Married_ID]

 New table with two foreign keys


Person [ID, Gender, DOB]
Marries [Man_ID, Woman_ID]

 New table with invented relationship identifier


Person [ID, Gender, DOB]
Marries [Marriage_Cert, ID]
Unary Relationship 1:M

EMPLOYEE entity
with unary
relationship

EMPLOYEE
relation with
recursive
foreign key
Unary Relationship 1:M
 Employee supervises employees

 Possible solutions:
 Foreign key in table:
 Employee [ID, Gender, DOB, Supervisor]

 New table with two foreign keys


 Employee [ID, Gender, DOB]
 Supervisor [Supervisor_ID, ID]

 New table with invented relationship identifier


 Employee [ID, Gender, DOB]
 Workgroup [GroupNo, ID, IsSupervisor]
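A sketch of the first solution (recursive foreign key in the same table) in Python's sqlite3 module. The column names follow `Employee [ID, Gender, DOB, Supervisor]`; the sample rows are invented. A self-join pairs each employee with their supervisor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    id         INTEGER PRIMARY KEY,
    gender     TEXT,
    dob        TEXT,
    supervisor INTEGER REFERENCES employee(id)  -- NULL at the top of the hierarchy
);
INSERT INTO employee VALUES (1, 'F', '1970-03-02', NULL);
INSERT INTO employee VALUES (2, 'M', '1985-07-19', 1);
INSERT INTO employee VALUES (3, 'F', '1990-11-05', 1);
""")

# Self-join: the same table plays the "employee" and the "supervisor" role.
pairs = conn.execute("""
    SELECT e.id, s.id
    FROM employee AS e
    JOIN employee AS s ON e.supervisor = s.id
    ORDER BY e.id
""").fetchall()
print(pairs)  # [(2, 1), (3, 1)]
```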
Unary Relationship M:N

Bill-of-materials
relationships (M:N)

ITEM and
COMPONENT
relations
Unary Relationship M:N
 Person related to person

 Possible solutions
 New Table With Two Foreign Keys
 Person [ID, Gender, DOB]
 Related-To [Left_ID, Right_ID, Relationship_Type]

 New Table With Invented Relationship Identifier


 Person [ID, Gender, DOB]
 Relationship [Rel_ID, ID, Position]
 Relationship-Def [Rel_ID, Definition]
N-Ary Relationship
 One relation for each entity and one for the associative
entity
 Associative entity has foreign keys to each entity in the relationship
 Primary key of intersection table based on relationships

 In relational database, dealt with by two normal forms


 4NF, 5NF
 Not discussed in class
 Just know they exist
N-Ary Example
PATIENT TREATMENT Ternary relationship with associative entity
N-Ary Example
Mapping the ternary relationship PATIENT TREATMENT
 Remember that the primary key MUST be unique
 This is why treatment date and time are included in the composite primary key
 But this makes a very cumbersome key…
 It would be better to create a surrogate key like Treatment#
Summary
 Transforming ER to the relational data model
 Binary relationships
 Unary relationships
 N-ary relationships
2.2 Normalization
Normalization
Data Normalization
 Primarily a tool to validate and improve a logical design
so that it satisfies certain constraints that avoid
unnecessary duplication of data

 The process of decomposing relations with anomalies to produce smaller, well-structured relations

 Level of normalization is directly dependent on level of decomposition of ERD
 Decomposed ERD usually results in normalized relations
Why Normalization
 Eliminate redundancy from database
 Create new relations
 No extraneous functional dependencies

 Definitions
 Redundancy: Information appears more than once
 Functional dependency: Given the value of attribute A, or a set of
attributes A, I know the value of attribute B

 Redundancy causes
 Insertion anomalies
 Update anomalies
 Query anomalies
 Deletion anomalies (keep every entity as specific as possible; don’t mix several entities in one table)
 Headaches
Well-Structured Relations
 A relation that contains minimal data redundancy and
allows users to insert, delete, and update rows without
causing data inconsistencies

 Goal is to avoid anomalies


 Insertion Anomaly – adding new rows forces user to create
duplicate data
 Deletion Anomaly – deleting rows may cause a loss of data that
would be needed for other future rows
 Modification Anomaly – changing data in a row forces changes
to other rows because of duplication
General rule of thumb: A table should not pertain
to more than one entity set (unless associative
entity)
Exercise

Question: Is this a relation?
Yes: unique rows and no multivalued attributes

Question: What’s the primary key?
Composite key: Emp_ID, Course_Title (a lookup on the key must return exactly one record, and Emp_ID alone cannot do that here)
Candidate key: a minimal set of attributes that identifies a unique row
Super Key: any combination of attributes that identifies a unique row (need not be minimal)
Anomalies in this Table, if Emp_ID and Course_Title are PK

o Insertion – can’t enter a new employee without having the employee take a class
o Deletion – if we remove employee 140, we lose information about the existence of a Tax Acc
class
o Modification – giving a salary increase to employee 100 forces us to update multiple records

o Why do these anomalies exist?


• Because there are two themes (entity sets) in this one relation. This results in data
duplication and an unnecessary dependency between the entities
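The three anomalies can be reproduced on a toy version of the table. The rows below are illustrative values, not the slide's actual data; only the employee numbers 100 and 140 and the Tax Acc course are taken from the text.

```python
# One relation mixing two themes: employees and the courses they completed.
employee2 = [
    {"emp_id": 100, "salary": 48000, "course_title": "SPSS"},
    {"emp_id": 100, "salary": 48000, "course_title": "Surveys"},
    {"emp_id": 140, "salary": 52000, "course_title": "Tax Acc"},
    {"emp_id": 110, "salary": 43000, "course_title": "SPSS"},
]

# Modification anomaly: a raise applied to only one of employee 100's rows
# leaves two different salaries on record for the same person.
partial_update = [dict(row) for row in employee2]
partial_update[0]["salary"] = 50000
salaries_for_100 = {r["salary"] for r in partial_update if r["emp_id"] == 100}

# Deletion anomaly: removing employee 140 also removes the only trace of Tax Acc.
after_delete = [r for r in employee2 if r["emp_id"] != 140]
tax_acc_lost = "Tax Acc" not in {r["course_title"] for r in after_delete}

# Insertion anomaly: a new employee with no course has a NULL in the key
# (Emp_ID, Course_Title), which entity integrity forbids.
new_employee = {"emp_id": 150, "salary": 40000, "course_title": None}
violates_entity_integrity = new_employee["course_title"] is None

print(sorted(salaries_for_100))   # [48000, 50000]
print(tax_acc_lost)               # True
print(violates_entity_integrity)  # True
```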
Functional Dependencies and Keys
 Functional Dependency: The value of one attribute (the
determinant) determines the value of another attribute.
 Full dependency: The primary key fully determines the value of another
non-key attribute
 Partial dependency: Part of a composite primary key determines the value of
another non-key attribute (i.e. using only a part of a composite key. A simple
primary key cannot have partial dependencies)
 Transitive dependency: A non-key field determines the value of another
non-key attribute (finding a value without relying on PK)

 Candidate Key:
 A unique identifier. One of the candidate keys will become the primary key
 E.g. perhaps there is both credit card number and SS# in a table…in this case both are
candidate keys

 Each non-key field is functionally dependent on every candidate key
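A small sketch for testing a functional dependency against sample rows. The helper name `fd_holds` and the data are assumptions for illustration. Note that sample data can only refute a dependency; whether it truly holds is a business rule, not something the rows can prove.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if equal values on lhs always come with equal values on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value; any later mismatch breaks the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows
rows = [
    {"emp_id": 100, "name": "Ann", "course_title": "SPSS", "date_completed": "2024-06-19"},
    {"emp_id": 100, "name": "Ann", "course_title": "Surveys", "date_completed": "2024-10-07"},
    {"emp_id": 140, "name": "Bo", "course_title": "Tax Acc", "date_completed": "2024-12-08"},
]

print(fd_holds(rows, ["emp_id"], ["name"]))                            # True
print(fd_holds(rows, ["emp_id"], ["course_title"]))                    # False
print(fd_holds(rows, ["emp_id", "course_title"], ["date_completed"]))  # True
```

Here `emp_id -> name` is a partial dependency with respect to the composite key (`emp_id`, `course_title`), while `date_completed` depends on the whole key.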


Steps in Normalization
Getting to First Normal Form
 Rules
 No multi-valued attributes
 Every attribute value is atomic
 Eliminate repeating groups to new relation

 Recommendation
 Decompose ERD to reflect:
 multivalued entity;
 Associative entity

 Primary key of new relation = key of previous relation + key of repeating group

 Split repeating group into separate relation

 All relations are in 1st Normal Form


Table with Multi-valued Attributes

Table is not in 1st Normal Form (multivalued attributes), so it is not a relation


Table with No Multivalued Attributes and Unique Rows

Table is in 1st Normal Form, so it is a relation, but not a well-structured one!
Anomalies in this Table, for given PK

o Insertion–if new product is ordered for order 1007 of existing customer, customer data must be re-entered, causing duplication
o Deletion–if we delete the Dining Table from Order 1006, we lose information concerning this item's
finish and price
o Update–changing the price of product ID 4 requires update in several records

o Why do these anomalies exist?


• Because there are multiple themes (entity types) in one relation. This results in
duplication and an unnecessary dependency between the entities
Getting to Second Normal Form
 1NF PLUS every non-key attribute is fully and
functionally dependent on the ENTIRE primary key
Don’t use part of the primary key
 Rules:
 A non-key field must be dependent on the entire primary key

 No partial dependencies
 Non-key fields dependent on a partial key spawned to new relation

 Key of new relation is the partial key


Functional Dependency Diagram

Order_ID -> Order_Date, Customer_ID, Customer_Name, Customer_Address
Customer_ID -> Customer_Name, Customer_Address
Product_ID -> Product_Description, Product_Finish, Unit_Price
Order_ID, Product_ID -> Order_Quantity

Therefore, NOT in 2nd Normal Form


Removing Partial Dependencies

Partial dependencies are removed, but there are still transitive dependencies
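A sketch of removing the partial dependencies by projection, in plain Python. The attribute names follow the functional dependency diagram (with a few attributes left out for brevity); the row values and the helper `project` are assumptions. Each partial key becomes the key of its own relation, and joining the pieces back gives the original rows.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows."""
    result = []
    for row in rows:
        picked = {a: row[a] for a in attrs}
        if picked not in result:
            result.append(picked)
    return result


columns = ["order_id", "order_date", "customer_id", "customer_name",
           "product_id", "product_description", "unit_price", "order_quantity"]
# Illustrative values only
data = [
    (1006, "2024-10-24", 2, "Value Furniture", 5, "Writer's Desk", 325, 2),
    (1006, "2024-10-24", 2, "Value Furniture", 4, "Entertainment Center", 650, 1),
    (1007, "2024-10-25", 6, "Furniture Gallery", 4, "Entertainment Center", 650, 3),
]
rows = [dict(zip(columns, values)) for values in data]

# Key (order_id, product_id): only order_quantity depends on the whole key.
order_line = project(rows, ["order_id", "product_id", "order_quantity"])
# Partial key product_id gets its own relation.
product = project(rows, ["product_id", "product_description", "unit_price"])
# Partial key order_id gets its own relation (customer_name is still transitive).
customer_order = project(rows, ["order_id", "order_date", "customer_id", "customer_name"])

# Joining the three relations on their keys reproduces the original rows.
rejoined = [
    {**order, **item, **line}
    for line in order_line
    for order in customer_order if order["order_id"] == line["order_id"]
    for item in product if item["product_id"] == line["product_id"]
]
lossless = len(rejoined) == len(rows) and all(r in rows for r in rejoined)
print(len(order_line), len(product), len(customer_order), lossless)  # 3 2 2 True
```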
Getting to Third Normal Form
 2NF PLUS no transitive dependencies
 i.e. functional dependencies on non-primary-key attributes

 Rules
 No non-key field should be dependent on a non-key field

 Dependent non-key field spawned as separate relation

 Non-key with functional dependency is new key

 Recommendation
 Non-key determinant with transitive dependencies go into a new table;

 Non-key determinant becomes primary key in the new table and stays as foreign key in the old
table

 Simply put
 "Every non-key must be dependent on the key, the whole key (no partial dependency), and
nothing but the key (no transitive dependency); so help me Codd"
Removing Transitive Dependencies

Transitive dependencies are removed
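A sketch of removing the transitive dependency Order_ID -> Customer_ID -> Customer_Name, Customer_Address with Python's sqlite3 module. The table names and row values are assumptions. The non-key determinant (`customer_id`) becomes the primary key of the new table and stays behind as a foreign key in the old one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- 2NF relation that still holds the transitive dependency
CREATE TABLE customer_order (
    order_id INTEGER PRIMARY KEY, order_date TEXT,
    customer_id INTEGER, customer_name TEXT, customer_address TEXT
);
INSERT INTO customer_order VALUES
    (1006, '2024-10-24', 2, 'Value Furniture', 'Plano, TX'),
    (1007, '2024-10-25', 6, 'Furniture Gallery', 'Boulder, CO'),
    (1008, '2024-10-26', 2, 'Value Furniture', 'Plano, TX');

-- New relation keyed by the non-key determinant
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY, customer_name TEXT, customer_address TEXT
);
INSERT INTO customer
    SELECT DISTINCT customer_id, customer_name, customer_address FROM customer_order;

-- Old relation keeps the determinant as a foreign key
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, order_date TEXT,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
INSERT INTO orders SELECT order_id, order_date, customer_id FROM customer_order;
""")

customer_rows = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

# A change of address now touches exactly one row, however many orders exist.
updated = conn.execute(
    "UPDATE customer SET customer_address = 'Dallas, TX' WHERE customer_id = 2"
).rowcount
addresses = conn.execute(
    "SELECT DISTINCT c.customer_address FROM orders o "
    "JOIN customer c ON c.customer_id = o.customer_id WHERE o.customer_id = 2"
).fetchall()
print(customer_rows, updated, addresses)  # 2 1 [('Dallas, TX',)]
```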


Boyce-Codd Normal Form (BCNF)
 Only occurs in a relation
 having more than one candidate key
 anomalies may result even though the relation is in 3NF
 composite keys with at least one candidate or non-key attribute
in common.

 A relation is in BCNF if, and only if, every determinant is a candidate key
BCNF - Logical Model
A B C D
o In the above relation, the functional dependencies are:
• A, B -> C, D
• C -> B
• Therefore: A, C -> B, D
o Relation is not in BCNF. Why?
• Non-key attribute (C) determines value of part of a primary key (i.e. value of B)
• This is NOT a transitive dependency
• An implied partial dependency may exist between key (B) and non-key (D)
• BCNF requires every determinant to be a candidate key
o Solution:
• A,C -> D
• C -> B
o Should have been taken care of in previous rounds of normalization, but some
dependencies are overlooked or not salient in those rounds
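The reasoning above can be checked with an attribute-closure computation. This is a minimal sketch: the helper names `closure` and `bcnf_violations` are assumptions, attributes are single letters so a string like "AB" stands for the set {A, B}, and the BCNF check looks only at the dependencies it is given.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the given FDs whose determinant is not a superkey of the relation."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(relation) <= closure(lhs, fds)]


fds = [("AB", "CD"), ("C", "B")]

print(sorted(closure("AB", fds)))    # ['A', 'B', 'C', 'D']  (A, B is a key)
print(sorted(closure("AC", fds)))    # ['A', 'B', 'C', 'D']  (A, C -> B, D)
print(sorted(closure("C", fds)))     # ['B', 'C']            (C is not a key)
print(bcnf_violations("ABCD", fds))  # [('C', 'B')]

# After the decomposition into (A, C, D) and (C, B), each determinant is a key.
print(bcnf_violations("ACD", [("AC", "D")]))  # []
print(bcnf_violations("CB", [("C", "B")]))    # []
```

Note that the original dependency A, B -> C, D can no longer be checked inside a single relation after the split, which is one of the tradeoffs of going beyond 3NF.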
BCNF Example
Stu_id Stf_id C_Code Grade
(a non-key attribute identifies part of the key)

In the above relation, the functional dependencies are:
Stu_id, Stf_id -> C_Code, Grade
C_Code -> Stf_id

Relation is not in BCNF. Why?
Non-key attribute (C_Code) determines value of part of a primary key (i.e. value of Stf_id)

Solution:
Stu_id, C_Code -> Grade
C_Code -> Stf_id
Normalization – in Practice
 A well decomposed ERD will give you relations in 3NF
 1NF should not exist
 2NF is usually skipped

 BCNF and above present tradeoffs between:


 Data preservation/Information loss
 Operating efficiency, especially in large databases
 Solution is typically to de-normalize to 3NF (see next slide)

 Know they exist


 Rule of thumb: Aim for BCNF but settle for 3NF
Merging Relations
 View Integration
 Combining entities and attributes from multiple ER models into
common relations
 Occurs when data are typically processed together

 Issues to watch out for when merging entities from different ER models:
 Synonyms: two or more attributes with different names but same
meaning (expressed as 2 different names with the same values)
 Homonyms: attributes with same name but different meanings (same
column name but different values)
 Transitive dependencies: even if relations are in 3NF prior to merging,
they may not be after merging
 Supertype/subtype relationships: may be hidden prior to merging
Normalization and ER
 Decomposed ER
 90% of entities normalized
 Watch out for the 10%

 Going back to ER
 Some newly normalized tables should be entities
 Revise ER diagram accordingly
Summary
 To articulate why normalization is necessary (avoid
anomalies, save storage space)

 To normalize a relation to the highest normal form


What have we learnt?
 Relational Data Model
 Transforming ERD to relational data model
 Binary relationships
 Unary relationships
 N-ary relationships
 Normalization
 1NF, 2NF, 3NF, BCNF
What’s next?
Fundamental Query Language: Basic Topics in Structured Query Language (SQL)
