Introduction to Databases
Presented by
Yun Shen (yshen16@bu.edu)
Research Computing
Introduction
• What Is a Database
• Key Concepts
• Typical Applications and Demo
• Latest Trends
What Is a Database
• Three levels of meaning:
▫ Level 1: literal meaning – the place where data is stored
Database = Data + Base, the actual storage of all the information of interest
▫ Level 2: Database Management System (DBMS)
The software package that guards and manages data storage, access, and
maintenance. It can be for personal use (MS Access, SQLite) or for
enterprise use (Oracle, MySQL, MS SQL, etc.).
▫ Level 3: Database Application
All the applications built on the data stored in databases (websites,
BI applications, ERP systems, etc.).
Database Types
• Flat Model
• Navigational databases
▫ Hierarchical (tree) database model
▫ Network/Graph model
• Relational Model
• Object model
• Document model
• Entity–attribute–value model
• Star schema
[Diagram: app → DBMS → DB]
• Demo:
▫ Take a look at the following file directories:
Access -- C:\ARCS_dbtutorial\db\access\
Postgresql -- /project/scv/examples/db/tutorial/data/postgresql/testdb/
ATTENTION
!!!
NO database file can be accessed directly;
it can only be accessed through the database engine, called the “DBMS”
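A minimal sketch of the app → DBMS → DB path, using Python's standard-library sqlite3 module: the program never parses the database file itself; it sends SQL statements to the SQLite engine, which does all the reading and writing. With SQLite the engine is a library embedded in the program rather than a separate server such as PostgreSQL. The in-memory database (":memory:") and the table name note are illustrative assumptions.

```python
import sqlite3

# The application (app) opens a connection to the engine (DBMS);
# ":memory:" asks SQLite for a temporary in-memory database (DB).
conn = sqlite3.connect(":memory:")

# Every read and write is an SQL statement handled by the engine.
conn.execute("CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO note (body) VALUES (?)", ("hello",))
conn.commit()

rows = conn.execute("SELECT id, body FROM note").fetchall()
print(rows)  # [(1, 'hello')]
conn.close()
```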
Three Common Acronyms
• SQL – Structured Query Language
Database Content
Typical database content includes:
• User Data: tables that store user data
For example, a ‘student’ table may contain (student id, first name, last name,
grade, school name, home address, …). Each row represents one student’s
information, and each column represents one piece of information about all
students. Such a table is called a ‘relation’.
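A small sketch of such a relation in SQLite, assuming hypothetical column names and two made-up students (home address is left out for brevity). It shows the row view (one student) and the column view (one attribute of every student).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical 'student' relation: each column is one attribute,
# each row is one student.
conn.execute("""
    CREATE TABLE student (
        student_id  INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL,
        grade       INTEGER,
        school_name TEXT
    )
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Jack", "Smith", 10, "Central High"),
        (2, "Ana", "Lopez", 11, "Central High"),
    ],
)

# Row view: all the information about one student.
row = conn.execute(
    "SELECT first_name, last_name FROM student WHERE student_id = 2"
).fetchone()

# Column view: one piece of information about all students.
grades = [g for (g,) in conn.execute("SELECT grade FROM student ORDER BY student_id")]
print(row, grades)  # ('Ana', 'Lopez') [10, 11]
```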
Surrogate Key
• A surrogate key is a unique column added to a relation to serve as the
primary key when no natural column can serve as the primary key, or when a
composite key needs to be replaced for various reasons.
• A surrogate key is usually an auto-incremented numeric value with no
meaning to the user, so it is often hidden in the table, form, or other
entity and used only internally.
• Surrogate keys are often used in place of a composite key to add more
flexibility to the table.
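A minimal sketch of a surrogate key in SQLite, assuming a hypothetical order_detail table whose natural composite key would be (order_no, line_no). The auto-incremented detail_id becomes the primary key, while a UNIQUE constraint keeps the natural key from being duplicated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# detail_id is the surrogate key: generated by the engine, meaningless to users.
# The former composite key (order_no, line_no) is kept as a UNIQUE constraint.
conn.execute("""
    CREATE TABLE order_detail (
        detail_id INTEGER PRIMARY KEY AUTOINCREMENT,
        order_no  TEXT NOT NULL,
        line_no   INTEGER NOT NULL,
        product   TEXT NOT NULL,
        quantity  INTEGER NOT NULL,
        UNIQUE (order_no, line_no)
    )
""")

insert = "INSERT INTO order_detail (order_no, line_no, product, quantity) VALUES (?, ?, ?, ?)"

# The application never supplies detail_id; SQLite assigns it.
first_id = conn.execute(insert, ("A-100", 1, "pencil", 12)).lastrowid
second_id = conn.execute(insert, ("A-100", 2, "eraser", 3)).lastrowid
print(first_id, second_id)  # 1 2

# Reusing the same natural key is still rejected.
rejected = False
try:
    conn.execute(insert, ("A-100", 1, "stapler", 1))
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Other tables can now reference an order line by the single column detail_id instead of carrying both order_no and line_no.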
Examples:
1:1 Employee – Locker
1:N Customer – Order, Order – Order Detail
M:N Student – Course
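One common way to implement these three relationship types, sketched in SQLite with hypothetical table and column names: a UNIQUE foreign key for 1:1, a foreign key on the "many" side for 1:N, and a junction table for M:N.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
    -- 1:1  each employee has at most one locker: a UNIQUE foreign key.
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE locker (
        locker_id   INTEGER PRIMARY KEY,
        employee_id INTEGER UNIQUE REFERENCES employee(employee_id)
    );

    -- 1:N  one customer places many orders: the foreign key sits on the "many" side.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );

    -- M:N  a junction table holds one row per (student, course) pair.
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  TEXT    REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Jack'), (2, 'Tim');
    INSERT INTO course  VALUES ('M-1', 'Mathematics'), ('C-1', 'Chemistry');
    INSERT INTO enrollment VALUES (1, 'M-1'), (1, 'C-1'), (2, 'C-1');
""")

# Navigate the M:N relationship in both directions through the junction table.
jack_courses = [t for (t,) in conn.execute(
    "SELECT c.title FROM enrollment e JOIN course c ON c.course_id = e.course_id "
    "WHERE e.student_id = 1 ORDER BY c.title")]
chem_students = [n for (n,) in conn.execute(
    "SELECT s.name FROM enrollment e JOIN student s ON s.student_id = e.student_id "
    "WHERE e.course_id = 'C-1' ORDER BY s.name")]
print(jack_courses, chem_students)  # ['Chemistry', 'Mathematics'] ['Jack', 'Tim']

# The 1:N foreign key rejects an order for a customer that does not exist.
fk_enforced = False
try:
    conn.execute("INSERT INTO orders VALUES (10, 99)")
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)  # True
```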
Sample E-R Diagram #1
Sample E-R Diagram #2
Sample E-R Diagram #3
Id  Name  Course
1   Jack  Mathematics
4   Jack  Chemistry
2   Tim   Chemistry
3   Ana   Physics
5   Ana   Chemistry
Id  Name  CourseID
1   Jack  M-1
4   Jack  C-1
2   Tim   C-1
3   Ana   P-1
5   Ana   C-1

Course ID  Course       Department
M-1        Mathematics  Math
C-1        Chemistry    Chemistry
P-1        Physics      Physics
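The decomposition above can be checked with a short SQLite sketch: the two normalized tables hold the slide's data, and a join on the course ID rebuilds the original (Id, Name, Course) table. The table name enrollment for the first table is an assumption, since the slide does not name it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (
        course_id  TEXT PRIMARY KEY,
        course     TEXT NOT NULL,
        department TEXT NOT NULL
    );
    CREATE TABLE enrollment (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        course_id TEXT NOT NULL REFERENCES course(course_id)
    );
    INSERT INTO course VALUES
        ('M-1', 'Mathematics', 'Math'),
        ('C-1', 'Chemistry',   'Chemistry'),
        ('P-1', 'Physics',     'Physics');
    INSERT INTO enrollment VALUES
        (1, 'Jack', 'M-1'),
        (4, 'Jack', 'C-1'),
        (2, 'Tim',  'C-1'),
        (3, 'Ana',  'P-1'),
        (5, 'Ana',  'C-1');
""")

# Joining on course_id reconstructs the unnormalized (Id, Name, Course) view.
rebuilt = conn.execute("""
    SELECT e.id, e.name, c.course
    FROM enrollment AS e JOIN course AS c ON c.course_id = e.course_id
    ORDER BY e.id
""").fetchall()
for row in rebuilt:
    print(row)
```

The course name "Chemistry" is now stored once in course instead of three times, so renaming it touches a single row.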
Rate Type  Court  Member Flag
SAVER      1      Yes
STANDARD   1      No
PREMIUM-A  2      Yes
PREMIUM-B  2      No

Member Flag  Court  Start Time  End Time
Yes          1      09:30       10:30
Yes          1      11:00       12:00
No           1      14:00       15:30
No           2      10:00       11:30
No           2      11:30       13:30
Yes          2      15:00       16:30
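The two court tables above read as a decomposition into a rate-type table and a bookings table that share (Court, Member Flag). Assuming that reading, this SQLite sketch loads the slide's rows and shows that a join on those two columns recovers the rate type of every booking. The table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each rate type applies to exactly one (court, member flag) pair.
    CREATE TABLE rate_type (
        rate_type   TEXT PRIMARY KEY,
        court       INTEGER NOT NULL,
        member_flag TEXT NOT NULL,
        UNIQUE (court, member_flag)
    );
    CREATE TABLE booking (
        member_flag TEXT NOT NULL,
        court       INTEGER NOT NULL,
        start_time  TEXT NOT NULL,
        end_time    TEXT NOT NULL
    );
    INSERT INTO rate_type VALUES
        ('SAVER', 1, 'Yes'), ('STANDARD', 1, 'No'),
        ('PREMIUM-A', 2, 'Yes'), ('PREMIUM-B', 2, 'No');
    INSERT INTO booking VALUES
        ('Yes', 1, '09:30', '10:30'), ('Yes', 1, '11:00', '12:00'),
        ('No',  1, '14:00', '15:30'), ('No',  2, '10:00', '11:30'),
        ('No',  2, '11:30', '13:30'), ('Yes', 2, '15:00', '16:30');
""")

# Joining on (court, member_flag) attaches the rate type to each booking.
priced = conn.execute("""
    SELECT b.court, b.start_time, r.rate_type
    FROM booking AS b
    JOIN rate_type AS r ON r.court = b.court AND r.member_flag = b.member_flag
    ORDER BY b.court, b.start_time
""").fetchall()
for row in priced:
    print(row)
```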
Advantages of Normalization
Normalization to BCNF or higher can eliminate data anomalies:
▫ No Redundancy
▫ No Inconsistency – each fact is stored in only one place, so every change
is made there and stays consistent (enforced by key constraints); in DB
terminology, update anomalies are eliminated.
▫ Normalization is a process of decomposition, so business concepts are
modeled with clear logical relationships.
▫ The database stays consistent over time as it grows, with minimal
redundancy.
▫ Strong support for ACID compliance
Advantages of Normalization
• No (or less) data redundancy – easier management, less storage, etc.
• No problems caused by data-operation anomalies, which is good for data
integrity and consistency.
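A minimal sketch of the update anomaly that normalization avoids, using rows modeled on the Chemistry enrollments from the course example and a hypothetical department rename to "Natural Sciences". In the unnormalized table the department is repeated on every row, so a partial update leaves two versions; in the normalized table it is stored once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Unnormalized: the department is repeated on every Chemistry row.
    CREATE TABLE flat (id INTEGER PRIMARY KEY, name TEXT, course TEXT, department TEXT);
    INSERT INTO flat VALUES
        (4, 'Jack', 'Chemistry', 'Chemistry'),
        (2, 'Tim',  'Chemistry', 'Chemistry'),
        (5, 'Ana',  'Chemistry', 'Chemistry');

    -- Normalized: the department is stored once per course.
    CREATE TABLE course (course_id TEXT PRIMARY KEY, course TEXT, department TEXT);
    INSERT INTO course VALUES ('C-1', 'Chemistry', 'Chemistry');
""")

# Hypothetical rename, but the update touches only one of the repeated rows.
conn.execute("UPDATE flat SET department = 'Natural Sciences' WHERE id = 4")
flat_versions = conn.execute(
    "SELECT COUNT(DISTINCT department) FROM flat WHERE course = 'Chemistry'"
).fetchone()[0]

# In the normalized design the same change is a single-row update.
conn.execute("UPDATE course SET department = 'Natural Sciences' WHERE course_id = 'C-1'")
norm_versions = conn.execute(
    "SELECT COUNT(DISTINCT department) FROM course WHERE course = 'Chemistry'"
).fetchone()[0]

print(flat_versions, norm_versions)  # 2 1
```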
Disadvantages of Normalization
• Takes effort
• May increase the complexity of the data structure
• Data retrieval may be slower because queries must join multiple tables,
so full normalization may not suit read-intensive applications
• Sometimes the constraints may be too strict to allow needed custom
changes.
Disadvantages of Normalization
• Hard to deal with complex data structures such as classes, objects, or
nested rows within a field.
• Querying for comprehensive information can be costly.[6]
• Because of its fixed, predesigned structure, it is not flexible when the
data needs to be restructured.
Modern applications
• Today, companies like Google, Amazon, and Facebook handle huge volumes of
data, and storing that data efficiently is always a big task. They use
NoSQL databases, which are based on the principles of the unnormalized
relational model, to deal with storage issues. Examples of NoSQL databases
include MongoDB, Apache Cassandra, and Redis. These databases scale more
easily and are easier to query because they avoid expensive operations
such as JOIN.
Denormalization
• Normalization and denormalization both have advantages and
disadvantages. In practice, the best design is a trade-off between the two.
• Denormalization increases the risk of losing data integrity and increases
storage size, but can make data simpler and more intuitive to present.
Denormalization - Example
• Customer (CustomerID, Name, Address, Zip, City, State)
Denormalization - Example
• This is the normalized table design
Denormalization - Example
• This is the denormalized table design
CustomerID  Name  Address  Zip  City  State
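The slide images for the two designs are missing, so this SQLite sketch assumes a common reading: the normalized design moves City and State into a separate zip_code table keyed by Zip, and the denormalized design is the single wide Customer table. The customer rows are made up. The wide table is built from the normalized ones with a join, then read without any join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed normalized design: City and State depend on Zip,
    -- so they are stored once per zip code.
    CREATE TABLE zip_code (zip TEXT PRIMARY KEY, city TEXT, state TEXT);
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name TEXT, address TEXT,
        zip  TEXT REFERENCES zip_code(zip)
    );
    INSERT INTO zip_code VALUES ('02215', 'Boston', 'MA');
    INSERT INTO customer VALUES
        (1, 'Jack', '1 Main St', '02215'),
        (2, 'Ana',  '9 Elm St',  '02215');

    -- Denormalized design: one wide table, City and State repeated per customer.
    CREATE TABLE customer_denorm AS
        SELECT c.customer_id AS CustomerID, c.name AS Name, c.address AS Address,
               c.zip AS Zip, z.city AS City, z.state AS State
        FROM customer AS c JOIN zip_code AS z ON z.zip = c.zip;
""")

# Reads from the wide table need no join...
wide = conn.execute(
    "SELECT Name, City, State FROM customer_denorm ORDER BY CustomerID"
).fetchall()
print(wide)  # [('Jack', 'Boston', 'MA'), ('Ana', 'Boston', 'MA')]

# ...but City/State are now stored once per customer instead of once per zip.
copies = conn.execute(
    "SELECT COUNT(*) FROM customer_denorm WHERE Zip = '02215'"
).fetchone()[0]
print(copies)  # 2
```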
Database Operation/Administration
• CRUD (Create/Read/Update/Delete) – four basic operations
• All performed through SQL (Structured Query Language)
▫ Sublanguages:
▫ DDL (Data Definition Language)
▫ DQL (Data Query Language)
▫ DML (Data Manipulation Language)
▫ DCL (Data Control Language)
▫ Scope of SQL: Query (SELECT), Manipulation (INSERT/UPDATE/DELETE),
Definition (create/modify tables and columns), Access Control (permissions)
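A minimal CRUD sketch in SQLite, with one statement from each of the DDL, DML, and DQL sublanguages. The course table and its rows are illustrative. SQLite has no user accounts, so DCL statements such as GRANT and REVOKE are not shown; they apply to server DBMSs such as PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure.
conn.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT NOT NULL)")

# DML, Create: insert rows.
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [("M-1", "Mathematics"), ("P-1", "Physics")])

# DML, Update: change an existing row.
conn.execute("UPDATE course SET title = ? WHERE course_id = ?", ("Applied Physics", "P-1"))

# DML, Delete: remove a row.
conn.execute("DELETE FROM course WHERE course_id = ?", ("M-1",))
conn.commit()

# DQL, Read: query what is left.
remaining = conn.execute("SELECT course_id, title FROM course").fetchall()
print(remaining)  # [('P-1', 'Applied Physics')]
```

Parameter placeholders (?) are used instead of string formatting so that the engine, not the application, handles quoting of values.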
BI Systems
• Reporting System
• Data Mining (overlaps heavily with today’s ML/AI trend)
• Data Warehouse/Data Mart
• ETL (Extract/Transform/Load)
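A toy ETL pipeline, sketched in Python under stated assumptions: an in-memory CSV stands in for the source system, and an SQLite table named course_fact (hypothetical) stands in for the warehouse. The enrollment counts mirror the course example earlier in the deck. Extract reads raw records, Transform cleans names and converts types, and Load writes the result where a reporting query can use it.

```python
import csv
import io
import sqlite3

# Extract: read raw records from a source (an in-memory CSV stands in for a file).
raw = io.StringIO(
    "course,enrolled\n"
    " chemistry ,3\n"
    "Mathematics,1\n"
    "physics,1\n"
)
records = list(csv.DictReader(raw))

# Transform: trim and normalize course names, convert counts to integers.
cleaned = [(r["course"].strip().title(), int(r["enrolled"])) for r in records]

# Load: write the cleaned rows into the target (warehouse) table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course_fact (course TEXT, enrolled INTEGER)")
conn.executemany("INSERT INTO course_fact VALUES (?, ?)", cleaned)

# A simple reporting query on the loaded data.
total = conn.execute("SELECT SUM(enrolled) FROM course_fact").fetchone()[0]
print(cleaned, total)
```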
Big Data
• 4Vs:
▫ Volume – how much storage is needed?
▫ Variety – how diverse is the data?
▫ Veracity – can data be verified/trusted?
▫ Velocity – how fast is the data being generated?