Lecture 2 - Database Theory For Data Science

Database Theory for Data Science

ANUP APREM

• BASED ON MATERIAL FROM DALT7002 (P08801): DATA SCIENCE FOUNDATIONS AT OXFORD BROOKES UNIVERSITY
• SPONSORED BY BRITISH COUNCIL GOING GLOBAL EXPLORATORY GRANT

Databases
• A database is used to store (and manage) large volumes of (structured) data
• Example
  • Employees in an organization
  • Books in Library
  • Airline and Hotel Bookings
• What data?
  • Numerical/Textual
• DBMS: Database Management System
  Software that manages data in a database
• Advantages
  • Large volume of data
  • Consistency of data
  • Security and Sharing of data
  • Persistency of data (failure and recovery)
  • Distribution of data (using distributed database)

Databases
• DBMS: Three levels of abstraction
  • External or View
  • Logical (We will focus on this!)
    • This level tells what data is stored and how it is structured; how the data is actually stored on disk belongs to the physical level.
  • Physical
• Database models (Logical Level)
  • Hierarchical model
  • Network model
  • Relational model (We will focus on this!)
    • Example: SQL

An Example -- Employees
Table/Relation Variable: EMPLOYEE
Labels in the figure: Primary key, Candidate key (key columns), Attributes/Fields (columns), Tuples (rows)

Name       Title  ID_No  NI_No  Home_Address  Email_Address  Phone_No
M. Younas  Dr     2233   0011B  100 ABC St.   my@aa.org      0001122
S. Kamal   Dr     1122   0022J  200 ABC St.   sk@aa.org      0002211
D. Duce    Prof   2244   0033J  122 AA St     dd@aa.org      0003311
H. Zhu     Prof   2245   0024K  123 BB St     hz@bb.org      0003211
M. Younas  Mr     3311   1100S  200 AA St     my@bb.org      221100

Relational Database
• Data in a database is stored in the form of tables (or relation variables)
• Each table has a name.
  • Example: Employee table
• Each table has a set of columns which represent the different data items (or attributes) about an entity
  • Example: Title, ID_No, NI_No, Home_Address
• Each table also has a set of rows (or tuples) that represent the data (records) about different entities of the same type.
• Candidate Keys
  A field or set of fields that uniquely identifies a single tuple in the table
• Primary Key
  One of the candidate keys (arbitrarily!)
  • Example: ID_No
• Alternate Keys
  Remaining candidate keys
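
The key definitions above can be tried out with a small sketch using Python's built-in sqlite3 module. It assumes that ID_No is the primary key and that NI_No is the remaining candidate (alternate) key; the two rejected or accepted test rows are hypothetical and not part of the lecture data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        Name          TEXT,
        Title         TEXT,
        ID_No         INTEGER PRIMARY KEY,   -- chosen primary key
        NI_No         TEXT NOT NULL UNIQUE,  -- assumed alternate key
        Home_Address  TEXT,
        Email_Address TEXT,
        Phone_No      TEXT
    )
""")

rows = [
    ("M. Younas", "Dr", 2233, "0011B", "100 ABC St.", "my@aa.org", "0001122"),
    ("S. Kamal", "Dr", 1122, "0022J", "200 ABC St.", "sk@aa.org", "0002211"),
    ("D. Duce", "Prof", 2244, "0033J", "122 AA St", "dd@aa.org", "0003311"),
    ("H. Zhu", "Prof", 2245, "0024K", "123 BB St", "hz@bb.org", "0003211"),
    ("M. Younas", "Mr", 3311, "1100S", "200 AA St", "my@bb.org", "221100"),
]
with conn:  # commit the lecture rows before experimenting
    conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Hypothetical rows: the first reuses an existing ID_No, the second only reuses a name.
duplicate_id = try_insert(("A. Other", "Ms", 2233, "9999Z", "1 New St", "ao@aa.org", "0009999"))
duplicate_name = try_insert(("M. Younas", "Ms", 4000, "8888Y", "2 New St", "my2@aa.org", "0008888"))
employee_count = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
```

Name is not a key here (two employees are called M. Younas), so a repeated name is accepted while a repeated ID_No is rejected.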
An example – Employees and Expenses
EMPLOYEE table
Name       Title  ID_No  NI_No  Home_Address  Email_Address  Phone_No
M. Younas  Dr     2233   0011B  100 ABC St.   my@aa.org      0001122
S. Kamal   Dr     1122   0022J  200 ABC St.   sk@aa.org      0002211
D. Duce    Prof   2244   0033J  122 AA St     dd@aa.org      0003311
H. Zhu     Prof   2245   0024K  123 BB St     hz@bb.org      0003211
M. Younas  Mr     3311   1100S  200 AA St     my@bb.org      221100

EXPENSES table
Ref_No ID_No Ticket Food Accommodation Stationary Books
A100 1122 £200 £60.50 £300 £25
A105 2244 £250 £50 £400 0
A106 2245 £300 £70 £350 £50


Foreign Key
• Incorporates a relationship between two tables.
• Typically created by inserting the primary key field of one table into another table.
  • Example: ID_No in Expenses table.
• “Rules for foreign key”
  • Same name as the primary key from which it was copied.
  • Same “specifications” as the primary key.
• Referential integrity
  • The value of a foreign key must match an existing value of the primary key
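
Referential integrity can be illustrated with a minimal sqlite3 sketch. The tables are cut down to the key columns only, and the expense reference A999 with employee 9999 is a hypothetical row added to trigger the violation. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE EMPLOYEE (ID_No INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE EXPENSES (
        Ref_No TEXT PRIMARY KEY,
        ID_No  INTEGER NOT NULL REFERENCES EMPLOYEE(ID_No)  -- foreign key
    )
""")
with conn:
    conn.executemany(
        "INSERT INTO EMPLOYEE VALUES (?, ?)",
        [(1122, "S. Kamal"), (2244, "D. Duce"), (2245, "H. Zhu")],
    )


def add_expense(ref_no, id_no):
    """Return True if the expense row is accepted, False if it breaks referential integrity."""
    try:
        with conn:
            conn.execute("INSERT INTO EXPENSES VALUES (?, ?)", (ref_no, id_no))
        return True
    except sqlite3.IntegrityError:
        return False


valid = add_expense("A100", 1122)   # employee 1122 exists
orphan = add_expense("A999", 9999)  # hypothetical: no employee has ID_No 9999
```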
Table Relationships
• A relationship establishes a “logical connection” between a pair of tables
• One-to-one relationship: For every tuple in the first table, there is at most one related tuple in the second table, and vice versa
Table Relationships
• One-to-many relationship: For every tuple in the first table, there can be one or more related tuples in the second table
  • However, a single tuple in the second table can be related to only one tuple in the first
Table Relationships
• Many-to-many relationship:
  • For every tuple in the first table, there can be one or more related tuples in the second table
  • Similarly, a single tuple in the second table can be related to more than one tuple in the first table.
• Disadvantages
  • Redundancy in one of the tables
  • Difficult to modify tables.
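
The three relationship types can be told apart by counting how many partners each tuple has on either side. The function below is a rough sketch: `classify_relationship` is a name invented here, it only classifies the pairs it is given (observed data, not a schema rule), and the expense claim A107 is a hypothetical second claim for employee 1122. The student/course pairs come from the grade report example used later in the lecture.

```python
from collections import defaultdict


def classify_relationship(pairs):
    """Classify observed (first_table_key, second_table_key) pairs."""
    first_to_second = defaultdict(set)
    second_to_first = defaultdict(set)
    for a, b in pairs:
        first_to_second[a].add(b)
        second_to_first[b].add(a)
    # Does any tuple on one side relate to more than one tuple on the other?
    first_has_many = any(len(bs) > 1 for bs in first_to_second.values())
    second_has_many = any(len(as_) > 1 for as_ in second_to_first.values())
    if first_has_many and second_has_many:
        return "many-to-many"
    if first_has_many:
        return "one-to-many"
    if second_has_many:
        return "many-to-one"
    return "one-to-one"


# Employee ID_No paired with expense Ref_No; A107 is a hypothetical extra claim.
employee_expenses = [(1122, "A100"), (2244, "A105"), (2245, "A106"), (1122, "A107")]

# Student-ID paired with Course-ID, as in the grade report.
registrations = [
    ("268300458", "IS350"), ("268300458", "IS465"),
    ("543291073", "IS350"), ("543291073", "Acctg 201"), ("543291073", "Mktg 300"),
]

employee_kind = classify_relationship(employee_expenses)
registration_kind = classify_relationship(registrations)
```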
Database design – Data Normalization
• Given a collection of data, how do we represent it in a database?
• Key tasks
  • Identify major attributes.
  • Identify relationship types.
  • Determine Primary Keys.
  • Determine Foreign Keys.
  • Associate attributes with entity or relationship types.
  • Validate the model using Normalization.

Why is Normalization of data important?
- To minimize data entry (and repetition)
- To avoid duplicate entry of data
- To avoid data inconsistency
- To simplify database maintenance

Functional Dependency
• What is normal form?
  A normal form is a state of a relation variable that requires that certain rules regarding relationships between attributes (or functional dependencies) are satisfied.
• 1NF, 2NF and 3NF are based on functional dependency
• Employees table
  ID_No → Name
• Functional dependency happens when one attribute in a table uniquely determines another attribute
• Formal Definition: Attribute B is functionally dependent on attribute A if, for every valid instance of A, that value of A uniquely determines the value of B (Dutka and Hanson, 1989).
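
The formal definition translates directly into a check over rows: A → B fails as soon as one value of A appears with two different values of B. The helper `fd_holds` below is a hypothetical name used for illustration. A check on sample rows can only refute a dependency; whether it holds for every valid instance is a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, the lhs attributes determine the rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value; any later mismatch breaks the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


# ID_No and Name columns of the EMPLOYEE table.
employees = [
    {"ID_No": 2233, "Name": "M. Younas"},
    {"ID_No": 1122, "Name": "S. Kamal"},
    {"ID_No": 2244, "Name": "D. Duce"},
    {"ID_No": 2245, "Name": "H. Zhu"},
    {"ID_No": 3311, "Name": "M. Younas"},
]

id_determines_name = fd_holds(employees, ["ID_No"], ["Name"])
name_determines_id = fd_holds(employees, ["Name"], ["ID_No"])
```

ID_No → Name holds in this data, while Name → ID_No does not, because the name M. Younas appears with two different ID numbers.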
An example: Grade report
Student ID  Student-Name  Campus Address  Major  Course-ID  Course-Title      Instructor-Name  Instructor-Location  Grade
268300458   Williams      208 Brooks      IS     IS350      Database Mgt      Codd             B104                 A
                          208 Brooks             IS465      Systems Analysis  Parsons          B317                 B
543291073   Baker         104 Phillips    Acctg  IS350      Database Mgt      Codd             B104                 C
                          104 Phillips           Acctg 201  Fund Acctg        Miller           H310                 B
                          104 Phillips           Mktg 300   Intro Mktg        Bennett          B212                 A

First Normal Form (1NF)
Any multivalued attributes (also called repeating groups) have been removed, so there is a single value at the intersection of each row and column

Multivalued attributes: Campus address, Course-ID, Course-Title, Instructor-Name, Instructor-Location, Grade

Student ID  Student-Name  Campus Address  Major  Course-ID  Course-Title      Instructor-Name  Instructor-Location  Grade
268300458   Williams      208 Brooks      IS     IS350      Database Mgt      Codd             B104                 A
268300458   Williams      208 Brooks      IS     IS465      Systems Analysis  Parsons          B317                 B
543291073   Baker         104 Phillips    Acctg  IS350      Database Mgt      Codd             B104                 C
543291073   Baker         104 Phillips    Acctg  Acctg 201  Fund Acctg        Miller           H310                 B
543291073   Baker         104 Phillips    Acctg  Mktg 300   Intro Mktg        Bennett          B212                 A

Disadvantages
- Redundancy
- Update/Deletion anomaly
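
As a rough sketch of the step to 1NF, the grade report can be modelled as one record per student holding a list of courses (the repeating group), and then flattened so that each row carries exactly one value per column. The nested layout and the name `to_1nf` are assumptions made for this illustration.

```python
# One record per student; "Courses" is the repeating group.
grade_report = [
    {
        "Student-ID": "268300458", "Student-Name": "Williams",
        "Campus-Address": "208 Brooks", "Major": "IS",
        "Courses": [
            ("IS350", "Database Mgt", "Codd", "B104", "A"),
            ("IS465", "Systems Analysis", "Parsons", "B317", "B"),
        ],
    },
    {
        "Student-ID": "543291073", "Student-Name": "Baker",
        "Campus-Address": "104 Phillips", "Major": "Acctg",
        "Courses": [
            ("IS350", "Database Mgt", "Codd", "B104", "C"),
            ("Acctg 201", "Fund Acctg", "Miller", "H310", "B"),
            ("Mktg 300", "Intro Mktg", "Bennett", "B212", "A"),
        ],
    },
]

COURSE_FIELDS = ("Course-ID", "Course-Title", "Instructor-Name", "Instructor-Location", "Grade")


def to_1nf(report):
    """Emit one flat row per (student, course) pair, repeating the student attributes."""
    rows = []
    for student in report:
        student_part = {k: v for k, v in student.items() if k != "Courses"}
        for course in student["Courses"]:
            rows.append({**student_part, **dict(zip(COURSE_FIELDS, course))})
    return rows


flat_rows = to_1nf(grade_report)
```

The result has five rows, and the student attributes are repeated in each of them, which is the redundancy listed under the disadvantages.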
Second Normal Form (2NF)
The table actually contains information on three separate entities:
- Student
- Course
- Instructor
Each is repeated many times in the table

Functional Dependency
Student-ID → Student-Name, Campus-Address, Major
Course-ID → Course-Title, Instructor-Name, Instructor-Location
Student-ID, Course-ID → Grade

• Single attribute primary key ⇒ 2NF
• Composite primary key ⇒ 2NF? (only if no non-key attribute depends on just part of the key)

If a relation variable is in First Normal Form and every non-primary-key attribute is fully functionally dependent on the primary key, then the relation variable is in Second Normal Form (2NF).
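
A 2NF violation is a partial dependency: a non-key attribute determined by only part of a composite key. The sketch below looks for such attributes in a cut-down version of the 1NF grade report (five of the nine columns, to keep it short). The helper names are hypothetical, and a check on sample rows can suggest dependencies but not prove them.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, the lhs attributes determine the rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """Map each non-key attribute to the proper subsets of the key that determine it."""
    found = {}
    for attr in non_key:
        subsets = [
            subset
            for size in range(1, len(key))
            for subset in combinations(key, size)
            if fd_holds(rows, subset, [attr])
        ]
        if subsets:
            found[attr] = subsets
    return found


COLUMNS = ("Student-ID", "Course-ID", "Student-Name", "Course-Title", "Grade")
rows = [dict(zip(COLUMNS, r)) for r in [
    ("268300458", "IS350", "Williams", "Database Mgt", "A"),
    ("268300458", "IS465", "Williams", "Systems Analysis", "B"),
    ("543291073", "IS350", "Baker", "Database Mgt", "C"),
    ("543291073", "Acctg 201", "Baker", "Fund Acctg", "B"),
    ("543291073", "Mktg 300", "Baker", "Intro Mktg", "A"),
]]

violations = partial_dependencies(
    rows,
    key=("Student-ID", "Course-ID"),
    non_key=("Student-Name", "Course-Title", "Grade"),
)
```

Student-Name depends on Student-ID alone and Course-Title on Course-ID alone, so the table is not in 2NF; Grade needs the full key and is not reported.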
Grade report in 2NF

Student-ID  Student-Name  Campus-Address  Major
268300458   Williams      208 Brooks      IS
543291073   Baker         104 Phillips    Acctg

Student-ID  Course-ID  Grade
268300458   IS350      A
268300458   IS465      B
543291073   IS350      C
543291073   Acctg 201  B
543291073   Mktg 300   A

Course-ID  Course-Title      Instructor-Name  Instructor-Location
IS350      Database Mgt      Codd             B104
IS465      Systems Analysis  Parsons          B317
Acctg 201  Fund Acctg        Miller           H310
Mktg 300   Intro Mktg        Bennett          B212
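
One way to obtain these three tables is to project the 1NF table onto each group of attributes and drop duplicate tuples. The sketch below does that in plain Python, using the rows of the 1NF grade report; `project` is a name chosen for this example.

```python
COLUMNS = (
    "Student-ID", "Student-Name", "Campus-Address", "Major",
    "Course-ID", "Course-Title", "Instructor-Name", "Instructor-Location", "Grade",
)
RAW = [
    ("268300458", "Williams", "208 Brooks", "IS", "IS350", "Database Mgt", "Codd", "B104", "A"),
    ("268300458", "Williams", "208 Brooks", "IS", "IS465", "Systems Analysis", "Parsons", "B317", "B"),
    ("543291073", "Baker", "104 Phillips", "Acctg", "IS350", "Database Mgt", "Codd", "B104", "C"),
    ("543291073", "Baker", "104 Phillips", "Acctg", "Acctg 201", "Fund Acctg", "Miller", "H310", "B"),
    ("543291073", "Baker", "104 Phillips", "Acctg", "Mktg 300", "Intro Mktg", "Bennett", "B212", "A"),
]
rows_1nf = [dict(zip(COLUMNS, r)) for r in RAW]


def project(rows, attributes):
    """Project rows onto the given attributes, dropping duplicate tuples (first-seen order kept)."""
    seen = set()
    result = []
    for row in rows:
        projected = tuple(row[a] for a in attributes)
        if projected not in seen:
            seen.add(projected)
            result.append(dict(zip(attributes, projected)))
    return result


student = project(rows_1nf, ("Student-ID", "Student-Name", "Campus-Address", "Major"))
registration = project(rows_1nf, ("Student-ID", "Course-ID", "Grade"))
course = project(rows_1nf, ("Course-ID", "Course-Title", "Instructor-Name", "Instructor-Location"))
```

The student data now appears once per student and the course data once per course, while the registration table keeps one row per student and course.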
Third Normal Form (3NF)
• A relation is in third normal form (3NF) if it is in second normal form and no transitive dependencies exist
• A transitive dependency in a relation is a functional dependency between the primary key and one or more nonkey attributes that are dependent on the primary key via another nonkey attribute.

Functional Dependency
Course-ID → Course-Title, Instructor-Name
Instructor-Name → Instructor-Location
(Transitive dependency: Course-ID → Instructor-Name → Instructor-Location)
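
Given a list of functional dependencies, transitive dependencies can be found by following chains that start at the primary key and pass through a nonkey attribute. This is a simplified sketch: it assumes single-attribute determinants only, and `transitive_dependencies` is a hypothetical helper, not a standard library function.

```python
def transitive_dependencies(fds, key):
    """Return (key, via, target) chains where key -> via and via -> target."""
    chains = []
    for via in fds.get(key, ()):
        # A nonkey attribute that itself determines something creates a transitive chain.
        for target in fds.get(via, ()):
            chains.append((key, via, target))
    return chains


# Functional dependencies of the course table from the 2NF grade report.
course_fds = {
    "Course-ID": ["Course-Title", "Instructor-Name"],
    "Instructor-Name": ["Instructor-Location"],
}

chains = transitive_dependencies(course_fds, "Course-ID")
```

Here the only chain is Course-ID → Instructor-Name → Instructor-Location, which is why Instructor-Location is moved into a table of its own in 3NF.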
Grade report in 3NF

STUDENT
Student-ID  Student-Name  Campus-Address  Major
268300458   Williams      208 Brooks      IS
543291073   Baker         104 Phillips    Acctg

REGISTRATION
Student-ID  Course-ID  Grade
268300458   IS350      A
268300458   IS465      B
543291073   IS350      C
543291073   Acctg 201  B
543291073   Mktg 300   A

COURSE
Course-ID  Course-Title      Instructor-Name
IS350      Database Mgt      Codd
IS465      Systems Analysis  Parsons
Acctg 201  Fund Acctg        Miller
Mktg 300   Intro Mktg        Bennett

INSTRUCTOR
Instructor-Name  Instructor-Location
Codd             B104
Parsons          B317
Miller           H310
Bennett          B212
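
The four 3NF tables still contain everything in the original grade report: joining them on their keys rebuilds it. The sqlite3 sketch below is one possible schema, not the lecture's own; the snake_case column names are an assumption, chosen because hyphens are not valid in unquoted SQL identifiers, and Campus-Address and Major are left out of the query for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (
        student_id TEXT PRIMARY KEY, student_name TEXT, campus_address TEXT, major TEXT);
    CREATE TABLE INSTRUCTOR (
        instructor_name TEXT PRIMARY KEY, instructor_location TEXT);
    CREATE TABLE COURSE (
        course_id TEXT PRIMARY KEY, course_title TEXT,
        instructor_name TEXT REFERENCES INSTRUCTOR(instructor_name));
    CREATE TABLE REGISTRATION (
        student_id TEXT REFERENCES STUDENT(student_id),
        course_id  TEXT REFERENCES COURSE(course_id),
        grade      TEXT,
        PRIMARY KEY (student_id, course_id));
""")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    ("268300458", "Williams", "208 Brooks", "IS"),
    ("543291073", "Baker", "104 Phillips", "Acctg"),
])
conn.executemany("INSERT INTO INSTRUCTOR VALUES (?, ?)", [
    ("Codd", "B104"), ("Parsons", "B317"), ("Miller", "H310"), ("Bennett", "B212"),
])
conn.executemany("INSERT INTO COURSE VALUES (?, ?, ?)", [
    ("IS350", "Database Mgt", "Codd"),
    ("IS465", "Systems Analysis", "Parsons"),
    ("Acctg 201", "Fund Acctg", "Miller"),
    ("Mktg 300", "Intro Mktg", "Bennett"),
])
conn.executemany("INSERT INTO REGISTRATION VALUES (?, ?, ?)", [
    ("268300458", "IS350", "A"), ("268300458", "IS465", "B"),
    ("543291073", "IS350", "C"), ("543291073", "Acctg 201", "B"),
    ("543291073", "Mktg 300", "A"),
])

# Join the four tables back into one row per registration.
report = conn.execute("""
    SELECT s.student_id, s.student_name, c.course_id, c.course_title,
           i.instructor_name, i.instructor_location, r.grade
    FROM REGISTRATION r
    JOIN STUDENT s    ON s.student_id = r.student_id
    JOIN COURSE c     ON c.course_id = r.course_id
    JOIN INSTRUCTOR i ON i.instructor_name = c.instructor_name
    ORDER BY s.student_id, c.course_id
""").fetchall()
```

Each fact is stored once (for example, Codd's location appears in a single INSTRUCTOR row), yet the join returns the same five report rows as the 1NF table.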
References
• Database Design for Mere Mortals: A Hands-on Guide to Relational Database Design, Michael J. Hernandez
• Modern Database Management, Jeffrey A. Hoffer, V. Ramesh, Heikki Topi

Thank you!
