Lecture 2 - Database Theory For Data Science
ANUP APREM
• What data?
• Numerical/Textual
Table/Relation
• Example: the EMPLOYEE relation variable
• Disadvantages
• Redundancy in one of the tables
• Difficult to modify tables.
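The disadvantages above can be made concrete with a small sketch. The table, its columns (emp_id, name, dept, dept_location) and all values are hypothetical. They were chosen only to show how a fact stored in many rows can become inconsistent when just one copy is modified.

```python
# Hypothetical flat EMPLOYEE-style table: every row repeats its department's
# location, so the same fact is stored more than once (redundancy).
rows = [
    {"emp_id": 1, "name": "Asha", "dept": "Sales", "dept_location": "Pune"},
    {"emp_id": 2, "name": "Ravi", "dept": "Sales", "dept_location": "Pune"},
    {"emp_id": 3, "name": "Meera", "dept": "HR", "dept_location": "Delhi"},
]

def locations_of(rows, dept):
    """Return the set of locations recorded for one department."""
    return {r["dept_location"] for r in rows if r["dept"] == dept}

# Modifying only one copy of the repeated fact leaves the table inconsistent:
# the Sales department now appears to be in two places at once.
rows[0]["dept_location"] = "Mumbai"
print(sorted(locations_of(rows, "Sales")))  # ['Mumbai', 'Pune']
```

Storing each department's location once, in a separate table, removes this kind of inconsistency. That is the motivation for normalization.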
Database design – Data Normalization
• Given a collection of data, how do we represent it in a database?
• Key tasks
• Identify major attributes.
• Identify relationship types.
• Determine Primary Keys.
• Determine Foreign Keys.
• Associate attributes with entity or relationship types.
• Validate the model using Normalization.
Why is Normalization of data important?
- To minimize data entry (and repetition)
- To avoid duplicate entry of data
- To avoid data inconsistency
- To simplify database maintenance
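The key tasks above include choosing primary keys and foreign keys. One possible illustration uses SQLite through Python's built-in sqlite3 module. The department/employee tables below are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE department (
        dept_id  INTEGER PRIMARY KEY,  -- primary key: identifies each department
        location TEXT NOT NULL         -- stored once per department
    )
""")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)  -- foreign key
    )
""")

conn.execute("INSERT INTO department VALUES (10, 'Pune')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")

try:
    # Department 99 does not exist, so the foreign key rejects this row.
    conn.execute("INSERT INTO employee VALUES (2, 'Ravi', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(fk_rejected)  # True
```

Because the location lives only in `department`, changing it means updating a single row. The foreign key also prevents employees from pointing at departments that do not exist.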
Functional Dependency
• What is a normal form?
A normal form is a state of a relation variable in which certain rules about the relationships between attributes (or functional dependencies) are satisfied.
• 1NF, 2NF and 3NF are based on functional dependency.
• Functional dependency exists when one attribute in a table uniquely determines another attribute.
• Example (Employees table with columns ID_No and Name): each ID_No determines exactly one Name, so Name is functionally dependent on ID_No (written ID_No → Name).
• Formal Definition: Attribute B is functionally dependent on attribute A if, for every valid instance of A, that value of A uniquely determines the value of B (Dutka and Hanson, 1989).
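The definition above can be checked against a concrete table. Each value of A must be paired with exactly one value of B. The sketch below is a minimal version of that check, and `holds_fd` is a hypothetical helper name. A check over sample rows can only refute a dependency, because the definition covers every valid instance and not just the data at hand.

```python
def holds_fd(rows, a, b):
    """Return True if b is functionally dependent on a (a -> b) in these rows,
    i.e. every value of a is paired with exactly one value of b."""
    seen = {}
    for row in rows:
        key, value = row[a], row[b]
        if key in seen and seen[key] != value:
            return False  # the same A value appears with two different B values
        seen[key] = value
    return True

# Hypothetical Employees rows using the ID_No and Name columns.
employees = [
    {"ID_No": 101, "Name": "Asha"},
    {"ID_No": 102, "Name": "Ravi"},
    {"ID_No": 103, "Name": "Asha"},  # repeated Name does not break ID_No -> Name
]
print(holds_fd(employees, "ID_No", "Name"))  # True
print(holds_fd(employees, "Name", "ID_No"))  # False: "Asha" maps to 101 and 103
```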
An example: Grade report
Columns: Student-ID | Student-Name | Campus-Address | Major | Course-ID | Course-Title | Instructor-Name | Instructor-Location | Grade
If a relation variable is in First Normal Form (1NF) and every non-primary-key attribute is fully functionally dependent on the primary key (that is, not dependent on only part of a composite key), then the relation variable is in Second Normal Form (2NF).
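In the grade report, the primary key is assumed here to be the composite (Student-ID, Course-ID). A 2NF violation is a non-key attribute that depends on only part of that key. The sketch below looks for such partial dependencies. The rows, the course IDs C1 and C2, and the titles are hypothetical, and only a subset of the columns is used. As with any check on sample data, a small sample can suggest dependencies that do not hold in general.

```python
from itertools import combinations

def holds_fd(rows, lhs, rhs):
    """True if the attributes in lhs together determine attribute rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

def partial_dependencies(rows, primary_key, non_key):
    """List (subset, attribute) pairs where a non-key attribute depends on a
    proper subset of the primary key; each pair is a 2NF violation."""
    found = []
    for size in range(1, len(primary_key)):  # proper subsets only
        for subset in combinations(primary_key, size):
            for attr in non_key:
                if holds_fd(rows, subset, attr):
                    found.append((subset, attr))
    return found

# Hypothetical grade-report rows (subset of the columns above).
grade_report = [
    {"Student-ID": "S1", "Student-Name": "Asha", "Course-ID": "C1", "Course-Title": "Databases", "Grade": "A"},
    {"Student-ID": "S1", "Student-Name": "Asha", "Course-ID": "C2", "Course-Title": "Statistics", "Grade": "B"},
    {"Student-ID": "S2", "Student-Name": "Ravi", "Course-ID": "C1", "Course-Title": "Databases", "Grade": "C"},
]

print(partial_dependencies(grade_report, ("Student-ID", "Course-ID"),
                           ["Student-Name", "Course-Title", "Grade"]))
# [(('Student-ID',), 'Student-Name'), (('Course-ID',), 'Course-Title')]
```

Student-Name depends only on Student-ID, and Course-Title depends only on Course-ID. Grade needs the whole key. This is why the 2NF design moves student attributes into their own table.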
Grade report in 2NF
Student-ID | Course-ID | Grade
268300458 | IS350 | A
Student-ID | Student-Name | Campus-Address | Major
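Moving to 2NF amounts to projecting the original table onto smaller column sets and removing duplicate rows. The sketch below projects hypothetical grade-report rows onto the two tables above. It then joins them back on Student-ID to confirm that no information was lost. The course and instructor columns are left out for brevity, so this is not the complete decomposition of the grade report.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}

STUDENT = ("Student-ID", "Student-Name", "Campus-Address", "Major")
REGISTRATION = ("Student-ID", "Course-ID", "Grade")

# Hypothetical rows; student details repeat once per course taken.
grade_report = [
    {"Student-ID": "S1", "Student-Name": "Asha", "Campus-Address": "Hall 1", "Major": "IS", "Course-ID": "C1", "Grade": "A"},
    {"Student-ID": "S1", "Student-Name": "Asha", "Campus-Address": "Hall 1", "Major": "IS", "Course-ID": "C2", "Grade": "B"},
    {"Student-ID": "S2", "Student-Name": "Ravi", "Campus-Address": "Hall 2", "Major": "CS", "Course-ID": "C1", "Grade": "C"},
]

students = project(grade_report, STUDENT)            # student details stored once
registrations = project(grade_report, REGISTRATION)  # one row per student-course pair

# Join on Student-ID; this lookup relies on Student-ID being the key of students.
by_id = {s[0]: s for s in students}
rejoined = {by_id[sid] + (cid, grade) for sid, cid, grade in registrations}
original = project(grade_report, STUDENT + ("Course-ID", "Grade"))

print(len(students), len(registrations))  # 2 3
print(rejoined == original)               # True
```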
Thank you!