Quiz 1 Notes DBMS
Week 1 Lecture 1
Class BSCCS2001
Materials https://drive.google.com/drive/folders/19FhdYYKeH3ZshWhoZIJlP_MC1nVnUUmU?usp=sharing
Module # 1
Type Lecture
Week # 1
🚨 DBMS: A database management system (or DBMS) is essentially nothing more than a computerized data-keeping system. (via IBM)
Database Applications:
Banking: transactions
University Database Example
Application program examples
Add new students, instructors and courses
Assign grades to students, compute Grade Point Average (GPA) and generate transcripts
In the early days, database applications were built directly on top of file systems
Data isolation
Integrity problems
Integrity constraints (eg: account balance > 0) become "buried" in program code rather than being stated explicitly
Atomicity of updates
Failures may leave databases in an inconsistent state with partial updates carried out
Example: Transfer of funds from one account to another should either complete or not happen at all
Concurrent access by multiple users
Example: Two people reading a balance (say 100) and updating it by withdrawing money (say 50 each) at the same time
Security problems
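The concurrent-withdrawal example (two people withdrawing 50 each from a balance of 100) can be replayed as a deterministic plain-Python simulation. This is only a sketch: there are no real threads, files or DBMS, and the function names are made up. It shows the read-read-write-write interleaving that loses one update, next to the serial order that does not.

```python
def interleaved_withdrawals(start, amount_a, amount_b):
    """Both users read the balance before either writes it back (a lost update)."""
    balance = start
    read_a = balance              # user A reads the balance
    read_b = balance              # user B reads the same balance
    balance = read_a - amount_a   # A writes its result
    balance = read_b - amount_b   # B overwrites it: A's withdrawal is lost
    return balance


def serial_withdrawals(start, amount_a, amount_b):
    """A finishes its read-modify-write before B starts."""
    balance = start
    balance = balance - amount_a
    balance = balance - amount_b
    return balance


print(interleaved_withdrawals(100, 50, 50))  # 50, although 100 was withdrawn in total
print(serial_withdrawals(100, 50, 50))       # 0
```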
Course pre-requisites:
Set Theory
Definition of a set
Intensional definition
Extensional definition
Set-builder notation
Operations on sets:
De-Morgan's Law
Image, Pre-image, Inverse
Definition of functions
Composition of functions
Inverse of functions
Propositional Logic
Truth values and Truth tables
Predicate Logic
Predicates
Quantification
Existential
Universal
Python
Merge sort
Quick sort
Search
Linear search
Binary search
Interpolation search
Data Structures
Arrays
List
Balanced Tree
B - Tree
Hash table/map
Python
Cheatsheet: https://www.pythoncheatsheet.org
C Language: https://www.youtube.com/watch?v=zYierUhIFNQ&list=PLhQjrBD2T382_R182iC2gNZI9HzWFMC_8&index=2 (part of CS50 2020 Lectures)
Week 1 Lecture 2
Class BSCCS2001
Materials
Module # 2
Type Lecture
Week # 1
Why DBMS?
Data Management
Storage
Retrieval
Transaction
Audit
Archival
For
Individuals
Global
1. Physical:
Physical Data or Records Management, more formally known as Book Keeping, has been using physical ledgers and journals for centuries
The most significant development happened when Henry Brown patented a "receptacle for storing and preserving papers" on November 2, 1886
Herman Hollerith adapted the punch cards used for weaving looms to act as the memory for a mechanical tabulating machine in 1890
2. Electronic:
Electronic Data or Records management moves with the advances in technology, especially of memory, storage, computing and networking
1960s: Data Management with punch cards / tapes and magnetic tapes
1970s:
On October 14, 1979, the Apple II platform shipped VisiCalc, marking the birth of spreadsheets
2000s: e-Commerce boomed, NoSQL was introduced for unstructured data management
Durability
Scalability
Security
Retrieval
Ease of Use
Consistency
Efficiency
Cost
Book Keeping
A book register was maintained in which the shop owner wrote the amount received from customers, the amount due from any customer, inventory details and so on ...
Durability: Physical damage to these registers is a possibility due to rodents, humidity, wear and tear
Scalability: Very difficult to maintain over the years; some shops have numerous registers spanning many years
Not only small shops but large orgs also used to maintain their transactions in book registers
Durability: These are computer applications and hence data is less prone to physical damage
Scalability: Easier to search, insert and modify records as compared to book ledgers
Easy to Use: Computer applications are used to search and manipulate records in the spreadsheets, leading to a reduction in the manpower needed to perform routine computations
Consistency: Not guaranteed, but spreadsheets are less prone to mistakes than registers
With the rapid scale-up of data, there has been a considerable increase in the time required to perform most operations
A typical spreadsheet file may have an upper limit on the number of rows
The above-mentioned limitations of filesystems paved the way for a comprehensive platform dedicated to the management of data - the Database Management System
1980s
Research relational prototypes evolve into commercial systems - SQL becomes the industry standard
1990s
Early 2000s
Later 2000s
Giant data storage systems - Google BigTable, Yahoo PNUTS, Amazon, ...
Week 1 Lecture 3
Class BSCCS2001
Materials
Module # 3
Type Lecture
Week # 1
If the account balance is not enough, it will not allow the fund transfer
If the account numbers are not correct, it will flash a message and terminate the transaction
We will use this banking transaction system to compare various features of a file-based (.csv file) implementation vis-à-vis a DBMS-based implementation
Source: https://github.com/bhaskariitm/transition-from-files-to-db
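Initiating a transaction
Python
import csv
def begin_Transaction(credit_account, debit_account, amount):
    temp = []     # account rows to be written back to Accounts.csv
    success = 0   # becomes 1 once both the debit and the credit are done
    # f_obj_Account1/f_reader1, f_obj_Account2/f_reader2 (Accounts.csv) and f_obj_Ledger/f_writer (Ledger.csv) are assumed to be opened here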
SQL
Transaction
Python
    try:
        for sRec in f_reader1:
            # CONDITION CHECK FOR ENOUGH BALANCE (same rule as the SQL version: balance >= amount)
            if sRec['Account_no'] == debit_account and int(sRec['Balance']) >= int(amount):
                for rRec in f_reader2:
                    if rRec['Account_no'] == credit_account:
                        sRec['Balance'] = str(int(sRec['Balance']) - int(amount))  # DEBIT
                        temp.append(sRec)
                        # CRITICAL POINT: a crash here leaves the ledger with a debit but no matching credit
                        f_writer.writerow({'Account1': sRec['Account_no'], 'Account2': rRec['Account_no'],
                                           'Amount': amount, 'D/C': 'D'})
                        rRec['Balance'] = str(int(rRec['Balance']) + int(amount))  # CREDIT
                        temp.append(rRec)
                        f_writer.writerow({'Account1': rRec['Account_no'], 'Account2': sRec['Account_no'],
                                           'Amount': amount, 'D/C': 'C'})
                        success = success + 1
                        break
        if success == 1:
            # Copy every untouched account row so the whole file can be rewritten when closing
            f_obj_Account1.seek(0)
            next(f_obj_Account1)  # skip the header line
            for record in f_reader1:
                if record['Account_no'] not in (temp[0]['Account_no'], temp[1]['Account_no']):
                    temp.append(record)
    except (KeyError, ValueError):  # e.g. a non-numeric amount or unexpected column names
        print('\nWrong input entered !!!')
SQL
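do $$
declare
    -- variable types are assumed; the table definitions are not shown in these notes
    amt      numeric := 5000;
    sendVal  varchar := '1800090';
    recVal   varchar := '1800100';
    sbalance numeric;
begin
    select balance into sbalance
    from accounts
    where account_no = sendVal;
    if sbalance is null or not exists (select 1 from accounts where account_no = recVal) then
        raise notice 'Invalid account number';
    elsif sbalance < amt then
        raise notice 'Insufficient balance';
    else
        update accounts set balance = balance - amt where account_no = sendVal;
        insert into ledger(sendAc, recAc, amnt, ttype) values (sendVal, recVal, amt, 'D');
        update accounts set balance = balance + amt where account_no = recVal;
        insert into ledger(sendAc, recAc, amnt, ttype) values (sendVal, recVal, amt, 'C');
        commit;  -- commit inside DO needs PostgreSQL 11+ and no enclosing transaction block
        raise notice 'Successful';
    end if;
end $$;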
Closing a transaction
Python
    f_obj_Account1.close()
    f_obj_Account2.close()
    f_obj_Ledger.close()
    if success == 1:
        # Rewrite Accounts.csv with the updated and untouched rows collected in temp
        # (col_name_Account, the list of account column names, is assumed to be defined elsewhere)
        with open('Accounts.csv', 'w', newline='') as f_obj_Account:
            acc_writer = csv.DictWriter(f_obj_Account, fieldnames=col_name_Account)
            acc_writer.writeheader()
            acc_writer.writerows(temp)
        print('\nTransaction is successful !!')
    else:
        print('\nTransaction failed : Confirm Account details')
SQL
Comparison
Parameter | File Handling in Python | DBMS
Scalability with respect to amount of data | Very difficult to handle insert, update and querying of records | In-built features to provide high scalability for a large number of records
Scalability with respect to changes in structure | Extremely difficult to change the structure of records as in the case of adding or removing attributes | Adding or removing attributes can be done seamlessly using simple SQL queries
Time of execution | in seconds | in milliseconds
Persistence | Data processed using temporary data structures have to be manually updated to the file | Data persistence is ensured via automatic, system induced mechanisms
Robustness | Ensuring robustness of data has to be done manually | Backup, recovery and restore need minimum manual intervention
Security | Difficult to implement in Python (Security at OS level) | User-specific access at database level
Programmer's productivity | Most file access operations involve extensive coding to ensure persistence, robustness and security of data | Standard and simple built-in queries reduce the effort involved in coding thereby increasing a programmer's throughput
Arithmetic operations | Easy to do arithmetic computations | Limited set of arithmetic operations are available
Parameterized Comparison
Scalability
File Handling in Python
Number of records: As the # of records increases, the efficiency of flat files reduces
Structural Change: To add an attribute, the program has to initialize the new attribute of each record with a default value. It is very difficult to detect and maintain relationships between entities if and when an attribute has to be removed
DBMS
Number of records: Databases are built to efficiently scale up when the # of records increase drastically.
Structural Changes: When adding an attribute, a default value can be defined that holds for all existing records - the new attribute gets initialized with the default value. During deletion, constraints are used either to disallow the removal or to ensure its safe removal
However, if the number of records is really large, then the time required in the initialization process of a database will be negligible as compared to that of using SQL queries
Time of Execution
File Handling in Python
In order to process a 1GB file, a program in Python would typically take a few seconds
DBMS
The effort to install and configure a DB in a DB server is expensive and time-consuming
In order to process a 1GB file, an SQL query would typically take a few milliseconds
Programmer's Productivity
File Handling in Python
Building a file handler: Since the constraints within and across entities have to be enforced manually, the effort involved in building a file handling application is huge
Maintenance: To maintain the consistency of data, one must regularly check for sanity of data and the relationships between entities during inserts, updates and deletes
Handling huge data: As the data grows beyond the capacity of the file handler, more efforts are needed
DBMS
Configuring the database: The installation and configuration of a database is a specialized job of a DBA. A programmer, on the other hand, is saved the trouble
Maintenance: DBMS has built-in mechanisms to ensure consistency and sanity of data being inserted, updated or deleted. The programmer does not need to do such checks
Handling huge data: DBMS can handle even terabytes of data - Programmer does not have to worry
Arithmetic Operations
File Handling in Python
Extensive support for arithmetic and logical operations on data using Python. These include complex numerical calculations and recursive computations
DBMS
SQL provides limited support for arithmetic and logical operations. Any complex computation has to be done outside of SQL
Cost
File Handling in Python
File systems are cheaper to install and use. No specialized hardware, software or personnel are required to maintain filesystems
DBMS
Large databases are served by dedicated database servers which need large storage and processing power
DBMSs are expensive software that have to be installed and regularly updated
Databases are inherently complex and need specialized people to work on them, such as a DBA (Database Administrator)
The above factors lead to huge costs in implementing and maintaining database management systems
Week 1 Lecture 4
Class BSCCS2001
Materials
Module # 4
Type Lecture
Week # 1
Introduction to DBMS
Levels of Abstraction
Physical Level: describes how a record (eg: instructor) is stored
Logical Level: describes data stored in a database and the relationships among the data fields
Views can also hide information (such as an employee's salary) for security purposes
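A view is one concrete way to get this effect. The sketch below assumes SQLite (through Python's standard `sqlite3` module) as a stand-in DBMS and uses made-up rows; the view name `instructor_public` is hypothetical. The view exposes the instructor relation without its salary column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full instructor table (sample rows are made up)
conn.execute("create table instructor (ID text primary key, name text, dept_name text, salary real)")
conn.executemany("insert into instructor values (?, ?, ?, ?)",
                 [("10101", "Srinivasan", "Comp. Sci.", 65000),
                  ("12121", "Wu", "Finance", 90000)])

# View level: users of this view never see the salary attribute
conn.execute("create view instructor_public as select ID, name, dept_name from instructor")

cur = conn.execute("select * from instructor_public order by ID")
view_columns = [d[0] for d in cur.description]
view_rows = cur.fetchall()
print(view_columns)  # ['ID', 'name', 'dept_name']
print(view_rows)
```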
Schema and Instances
TLDR: Schema is the way in which data is organized and Instance is the actual value of the data
Schema
Example: The database consists of information about a set of customers and accounts in a bank and the relationship between them
Customer Schema
Account Schema
Instance
Customer Instance
Account Instance
Account # Account Type Interest Rate Min. Bal. Balance
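A minimal sketch of the difference, assuming SQLite through Python's `sqlite3` module: the `create table` statement fixes the schema (column names loosely follow the Account header above), while the inserted rows, which are hypothetical, form one instance. Updating a balance changes the instance but leaves the schema untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema: the structure of the relation
conn.execute("""create table account (
    account_no    text primary key,
    account_type  text,
    interest_rate real,
    min_bal       real,
    balance       real)""")
# Instance: the rows stored at this moment (hypothetical values)
conn.executemany("insert into account values (?, ?, ?, ?, ?)",
                 [("A-101", "Savings", 4.0, 1000, 5000),
                  ("A-102", "Current", 0.0, 5000, 12000)])

schema = [col[1] for col in conn.execute("pragma table_info(account)")]
instance = conn.execute("select * from account order by account_no").fetchall()

# A deposit changes the instance; the schema stays exactly the same
conn.execute("update account set balance = balance + 500 where account_no = 'A-101'")
schema_after = [col[1] for col in conn.execute("pragma table_info(account)")]
new_balance = conn.execute("select balance from account where account_no = 'A-101'").fetchone()[0]
print(schema)       # ['account_no', 'account_type', 'interest_rate', 'min_bal', 'balance']
print(new_balance)  # 5500.0
```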
Physical Data Independence - the ability to modify the physical schema without changing the logical schema
In general, the interfaces between various levels and components should be well defined so that changes in some parts do not seriously influence others.
Data Models
A collection of tools that describe the following ...
Data
Data relationships
Data semantics
Data constraints
Network model
Hierarchical model
XML format
Relational Model
All the data is stored in various tables
Example
Data dictionary contains metadata (that is, data about the data)
Database schema
Integrity constraints
Authorization
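In SQLite, the data dictionary can be queried like any other table through `sqlite_master`. The sketch below (hypothetical table and index names) lists the catalog entries. SQLite has no GRANT statement, so its catalog records schema and constraints but no authorization information.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table department (dept_name text primary key, budget real check (budget > 0))")
conn.execute("create index idx_budget on department(budget)")

# sqlite_master has one row per table, index, view or trigger, including the
# original CREATE statement (so the CHECK constraint is recorded as well)
catalog = conn.execute("select type, name, sql from sqlite_master order by name").fetchall()
for obj_type, name, sql in catalog:
    print(obj_type, name, sql)
```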
Pure - used for proving properties about computational power and for optimization
Cannot be used to solve all problems that a C program, for example, can solve
To be able to compute complex functions, SQL is usually embedded in some higher-level language
Application Programming Interfaces or APIs (eg: ODBC / JDBC) allow SQL queries to be sent to the databases
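Python's standard `sqlite3` module plays the same role as ODBC/JDBC: it implements the Python DB-API (PEP 249), through which a host program sends SQL and then post-processes the results. The sketch below assumes made-up instructor rows and computes a median in Python on the rows returned by a parameterized query.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, name text, dept_name text, salary real)")
conn.executemany("insert into instructor values (?, ?, ?, ?)", [
    ("1", "A", "Physics", 95000), ("2", "B", "Physics", 87000),
    ("3", "C", "Physics", 70000), ("4", "D", "History", 60000)])

# The SQL text goes through the API; the driver binds the '?' parameter
dept = "Physics"
salaries = [row[0] for row in conn.execute(
    "select salary from instructor where dept_name = ?", (dept,))]

# Further computation happens in the host language on the returned rows
median_salary = statistics.median(salaries)
print(median_salary)  # 87000.0
```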
Database Design
The process of designing the general structure of the database:
Logical Design - Deciding on the database schema. Database design requires that we find a good collection of relation schemas
Business decision
What relation schemas should we have and how should the attributes be distributed among the various
relation schemas?
Week 1 Lecture 5
Class BSCCS2001
Materials
Module # 5
Type Lecture
Week # 1
Extend the relational data model by including object orientation and constructs to deal with added data types
Allow attributes of tuples to have complex types, including non-atomic values such as nested relations
Preserve relational foundations, in particular the declarative access to data, while extending modeling power
XML: eXtensible Markup Language
Defined by the WWW Consortium (W3C)
The ability to specify new tags and to create tag structures made XML a great way to exchange data, not just documents
XML has become the basis for all new generation data interchange formats
A wide variety of tools are available for parsing, browsing and querying XML documents
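As a small illustration of such tooling, Python's standard `xml.etree.ElementTree` can parse a document and run simple path queries over it. The tag names and values below are invented for the example.

```python
import xml.etree.ElementTree as ET

# A tiny XML document with user-defined tags
doc = """
<bank>
  <account><account_no>A-101</account_no><balance>500</balance></account>
  <account><account_no>A-102</account_no><balance>400</balance></account>
</bank>
"""

root = ET.fromstring(doc.strip())
# findall("account") returns every <account> child of the root <bank> element
balances = {acc.findtext("account_no"): int(acc.findtext("balance"))
            for acc in root.findall("account")}
print(balances)  # {'A-101': 500, 'A-102': 400}
```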
Database Engine
3 major components are:
Storage Manager
Query processing
Transaction Manager
Storage Management
Storage Manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system
Issues:
Storage access
File organization
Query Processing
Parsing and Translation
Optimization
Evaluation
Equivalent expressions
Cost difference between a good and a bad way of evaluating a query can be enormous
Depends critically on statistical information about relations which the database must maintain
Need to estimate statistics for intermediate results to compute cost of complex expressions
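One way to watch the optimizer choose an evaluation plan is SQLite's `EXPLAIN QUERY PLAN`, used here through Python's `sqlite3` module. The sketch assumes a hypothetical empty `instructor` table. The exact wording of the plan text differs between SQLite versions, but after an index is added the plan switches from a full scan to an index search, while the query text itself stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, name text, dept_name text, salary real)")
query = "select name from instructor where dept_name = 'Physics'"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable description
    return [row[-1] for row in conn.execute("explain query plan " + sql)]


before = plan(query)   # typically a full SCAN of instructor
conn.execute("create index idx_instructor_dept on instructor(dept_name)")
after = plan(query)    # typically a SEARCH using idx_instructor_dept
print(before)
print(after)
```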
Transaction Management
What if the system fails?
What if more than one user is concurrently updating the same file?
A transaction is a collection of operations that performs a single logical function in a database application
The Transaction-Management component ensures that the database remains in a consistent (correct) state despite system failures (eg: power failures and operating system crashes) and transaction failures
The Concurrency-control manager controls the interaction among the concurrent transactions to ensure the consistency of the database
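The fund transfer from Lecture 3 is the classic case. The sketch below assumes SQLite through Python's `sqlite3` module, made-up starting balances and a hypothetical `crash_midway` switch that simulates a failure between the debit and the credit. Using the connection as a context manager commits both updates together or rolls both back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table accounts (account_no text primary key, balance integer)")
conn.executemany("insert into accounts values (?, ?)", [("1800090", 10000), ("1800100", 2000)])
conn.commit()


def transfer(conn, sender, receiver, amount, crash_midway=False):
    """Debit and credit as one transaction; crash_midway simulates a failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("update accounts set balance = balance - ? where account_no = ?", (amount, sender))
            if crash_midway:
                raise RuntimeError("simulated failure between debit and credit")
            conn.execute("update accounts set balance = balance + ? where account_no = ?", (amount, receiver))
        return True
    except RuntimeError:
        return False


def balances():
    return dict(conn.execute("select account_no, balance from accounts order by account_no"))


failed = transfer(conn, "1800090", "1800100", 5000, crash_midway=True)
after_failure = balances()   # the debit was rolled back
done = transfer(conn, "1800090", "1800100", 5000)
after_success = balances()
print(failed, after_failure)  # False {'1800090': 10000, '1800100': 2000}
print(done, after_success)    # True {'1800090': 5000, '1800100': 7000}
```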
Database Architecture
The architecture of a database system is greatly influenced by the underlying computer system on which the database is running:
Centralized
Client-Server
Parallel (multi-processor)
Distributed
Cloud
Week 2 Lecture 1
Class BSCCS2001
Materials
Module # 6
Type Lecture
Week # 2
Student = Roll #, First Name, Last Name, DoB, Passport #, Aadhaar #, Department
relation
The set of allowed values for each attribute is called the domain of the attribute
DoB - Date
The special value null is a member of every domain. Indicates that the value is unknown
The null value may cause complications in the definition of many operations
Roll # First Name Last Name DoB Passport Aadhaar Dept.
16EE30029 Jatin Chopra 17-Nov-1996 null 391718363816 Electrical
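Formally, given sets $D_1, D_2, \ldots, D_n$, a relation $r$ is a subset of $D_1 \times D_2 \times \cdots \times D_n$
Thus, a relation is a set of $n$-tuples $(a_1, a_2, \ldots, a_n)$ where each $a_i \in D_i$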
The current values (relation instance) of a relation are specified by a table
Example
instructor ≡ (String(5) ✕ String ✕ String ✕ Number+), where ID ∈ String(5), name ∈ String, dept_name ∈ String and salary ∈ Number+
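The definition can be checked mechanically on small, made-up domains: `itertools.product` builds $D_1 \times D_2 \times D_3$, and any set of tuples drawn from it is a relation. The roll numbers, names and departments below are illustrative only.

```python
from itertools import product

# Small, made-up domains
D1 = {"16EE30029", "14CS92P01"}      # Roll #
D2 = {"Jatin", "Asha"}               # First Name
D3 = {"Electrical", "Comp. Sci."}    # Dept.

all_tuples = set(product(D1, D2, D3))  # D1 x D2 x D3: every possible 3-tuple
r = {("16EE30029", "Jatin", "Electrical"), ("14CS92P01", "Asha", "Comp. Sci.")}

print(len(all_tuples))   # 8 = 2 * 2 * 2
print(r <= all_tuples)   # True: r is a subset of the product, i.e. a relation
```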
Keys
Let K ⊆ R, where R is the set of attributes in the relation
K is a superkey of R if values of K are sufficient to identify a unique tuple of each possible relation r(R)
Example: {ID} and {ID, name} are both superkeys of instructor
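A sketch of a check on a single instance, with the hypothetical helper `is_superkey_in_instance` and made-up rows. Note its limitation: one instance can show that K is not a superkey, but K being a superkey is a claim about every possible r(R).

```python
def is_superkey_in_instance(rows, attrs):
    """True if no two rows of this instance agree on all attributes in attrs.
    One instance can only refute a superkey, never prove one."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


# Made-up instructor instance (rows as dicts)
instructor = [
    {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci.", "salary": 65000},
    {"ID": "12121", "name": "Wu", "dept_name": "Finance", "salary": 90000},
    {"ID": "22222", "name": "Einstein", "dept_name": "Physics", "salary": 95000},
    {"ID": "33456", "name": "Gold", "dept_name": "Physics", "salary": 87000},
]
print(is_superkey_in_instance(instructor, ["ID"]))          # True
print(is_superkey_in_instance(instructor, ["ID", "name"]))  # True
print(is_superkey_in_instance(instructor, ["dept_name"]))   # False (two Physics rows)
```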
A surrogate key (or synthetic key) in a database is a unique identifier for either an entity in the modeled world or an object in the database
The surrogate key is not derived from application data, unlike a natural (or business) key which is derived from application data
Keys: Examples
Students = Roll #, First Name, Last Name, DoB, Passport #, Aadhaar #, Department
Passport # cannot be a key because it is an optional field and can take null values, but an ID can never be null
It may suffice for unique identification, but Roll # may have additional useful information.
Read it as 14-CS-92-P-01
14 - Admission in 2014
01 - Serial Number
Composite Key: {First Name, Last Name}
One or more of the attributes that make up the key are not simple keys in their own right
Foreign key constraint: Value in one relation must appear in another (in other words, when a particular attribute is a key in a different table)
Referencing relation
Referenced relation
Students, Courses
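A sketch of a referencing/referenced pair in SQLite (through Python's `sqlite3`), with a hypothetical `enrolment` table referencing `course`. SQLite enforces foreign keys only after `pragma foreign_keys = on`, so that line matters here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite leaves foreign keys unenforced unless this is set
conn.execute("create table course (course_id text primary key, title text)")  # referenced relation
conn.execute("""create table enrolment (
    roll_no   text,
    course_id text references course(course_id),
    primary key (roll_no, course_id))""")                                        # referencing relation
conn.execute("insert into course values ('CS-101', 'Intro. to Computer Science')")
conn.execute("insert into enrolment values ('14CS92P01', 'CS-101')")  # accepted: CS-101 exists

try:
    conn.execute("insert into enrolment values ('14CS92P01', 'XX-999')")  # no such course
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```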
A compound key consists of more than one attribute to uniquely identify an entity occurrence
Each attribute, which makes up the key, is a simple key in its own right
{Roll #, Course #}
Procedural programming requires that the programmer tell the computer what to do
That is, how to get the output for the range of required inputs
The programmer must know what relationships hold between various entities
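To contrast the two styles, the sketch below computes the same answer twice on made-up student rows: once procedurally (a Python loop that spells out how) and once declaratively (an SQL query, run through SQLite, that only says what is wanted).

```python
import sqlite3

students = [("14CS92P01", "Comp. Sci."), ("16EE30029", "Electrical"), ("15CS10001", "Comp. Sci.")]

# Procedural: we spell out HOW to get the answer (loop, test, collect)
procedural = []
for roll, dept in students:
    if dept == "Comp. Sci.":
        procedural.append(roll)

# Declarative: we state WHAT we want; the DBMS decides how to evaluate it
conn = sqlite3.connect(":memory:")
conn.execute("create table student (roll_no text, dept text)")
conn.executemany("insert into student values (?, ?)", students)
declarative = [r[0] for r in conn.execute("select roll_no from student where dept = 'Comp. Sci.'")]

print(sorted(procedural) == sorted(declarative))  # True
```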
Relational Query Language: Example
"Pure" languages:
Relational Algebra
Week 2 Lecture 2
Class BSCCS2001
Materials https://www.caam.rice.edu/~heinken/latex/symbols.pdf
Module # 7
Type Lecture
Week # 2
The select operation is defined as
$\sigma_p(r) = \{t \mid t \in r \text{ and } p(t)\}$
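The definition translates almost literally into a Python set comprehension. In this sketch a relation is a set of tuples and the predicate is an ordinary function; the sample tuples over (A, B, C, D) are illustrative.

```python
# A relation as a set of tuples over (A, B, C, D); values are illustrative
r = {("a", "a", 1, 7), ("a", "b", 5, 7), ("b", "b", 12, 3), ("b", "b", 23, 10)}


def select(p, relation):
    """sigma_p(relation) = {t | t in relation and p(t)}"""
    return {t for t in relation if p(t)}


# sigma_{A = B and D > 5}(r)
result = select(lambda t: t[0] == t[1] and t[3] > 5, r)
print(sorted(result))  # [('a', 'a', 1, 7), ('b', 'b', 23, 10)]
```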
The union of two relations is defined as
$r \cup s = \{t \mid t \in r \text{ or } t \in s\}$
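A sketch of the same definition with Python sets (illustrative tuples). The arity check is an added assumption reflecting that union only makes sense for relations of the same arity; domain compatibility is not checked. Duplicates disappear automatically because the result is a set.

```python
def union(r, s):
    """r ∪ s = {t | t in r or t in s}; r and s must have the same arity."""
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("union needs relations of the same arity")
    return r | s


r = {("a", 1), ("a", 2), ("b", 1)}
s = {("a", 2), ("b", 3)}
print(sorted(union(r, s)))  # [('a', 1), ('a', 2), ('b', 1), ('b', 3)]
```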
Joining two relations - Cartesian-product
Relation r, s
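A minimal sketch of r × s, treating relations as Python sets of tuples (sample values are illustrative): each tuple of r is concatenated with each tuple of s, so the result has |r| · |s| tuples. It assumes the attribute names of r and s are disjoint; otherwise they would need renaming first.

```python
from itertools import product

r = {("a", 1), ("b", 2)}                                              # schema (A, B)
s = {("a", 10, "x"), ("b", 10, "x"), ("b", 20, "y"), ("g", 10, "y")}  # schema (C, D, E)


def cartesian_product(r, s):
    """r x s: every tuple of r concatenated with every tuple of s."""
    return {tr + ts for tr, ts in product(r, s)}


rs = cartesian_product(r, s)
print(len(rs))  # 8 = |r| * |s|
```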
Renaming a Table
Allows us to refer to a relation, say E, by more than one name
Relations r
Self product
Composition of Operations
Can build expressions using multiple operations
Example:
r × s
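A typical composition applies a selection to a product, for example $\sigma_{A=C}(r \times s)$ (this particular predicate is an assumption for illustration). In Python the inner expression is computed first and the selection is then applied to its result; positions 0 and 2 hold A and C.

```python
r = {("a", 1), ("b", 2)}                                              # schema (A, B)
s = {("a", 10, "x"), ("b", 10, "x"), ("b", 20, "y"), ("g", 10, "y")}  # schema (C, D, E)

# Inner operation first: r x s
r_times_s = {tr + ts for tr in r for ts in s}
# Outer operation on its result: sigma_{A = C}
composed = {t for t in r_times_s if t[0] == t[2]}
print(sorted(composed))
# [('a', 1, 'a', 10, 'x'), ('b', 2, 'b', 10, 'x'), ('b', 2, 'b', 20, 'y')]
```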
Joining two relations - Natural Join
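Let r and s be relations on schemas R and S respectively. Then, the "natural join" of relations r and s is a relation on schema $R \cup S$ obtained as follows:
Consider each pair of tuples $t_r$ from r and $t_s$ from s
If $t_r$ and $t_s$ have the same value on each of the attributes in $R \cap S$, add a tuple $t$ to the result, where $t$ has the same value as $t_r$ on $R$ and the same value as $t_s$ on $S$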
Natural join
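A direct, nested-loop sketch of the definition in Python, with relations passed as (attribute list, set of tuples). The sample relations over (A, B, C, D) and (B, D, E) are illustrative, and the function name is hypothetical; real DBMSs use far more efficient join algorithms.

```python
def natural_join(r_attrs, r, s_attrs, s):
    """r ⋈ s for relations given as (attribute list, set of tuples)."""
    common = [a for a in r_attrs if a in s_attrs]       # R ∩ S
    s_only = [a for a in s_attrs if a not in r_attrs]   # attributes only in S
    out_attrs = list(r_attrs) + s_only                  # schema R ∪ S
    result = set()
    for tr in r:
        row_r = dict(zip(r_attrs, tr))
        for ts in s:
            row_s = dict(zip(s_attrs, ts))
            # keep the pair only if it agrees on every common attribute
            if all(row_r[a] == row_s[a] for a in common):
                result.add(tr + tuple(row_s[a] for a in s_only))
    return out_attrs, result


r_attrs = ["A", "B", "C", "D"]
r = {("a", 1, "a", "p"), ("b", 2, "g", "p"), ("g", 4, "b", "q"),
     ("a", 1, "g", "p"), ("d", 2, "b", "q")}
s_attrs = ["B", "D", "E"]
s = {(1, "p", "a"), (3, "p", "b"), (1, "p", "g"), (2, "q", "d"), (3, "q", "e")}

attrs, joined = natural_join(r_attrs, r, s_attrs, s)
print(attrs)          # ['A', 'B', 'C', 'D', 'E']
print(sorted(joined))
```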
Aggregation Operators
Can we compute:
SUM
AVG
MAX
MIN
All data in the output table appears in one of the input tables
Week 2 Lecture 3
Class BSCCS2001
Materials
Module # 8
Type Lecture
Week # 2
Name | Description
SQL-86 | First formalized by ANSI
SQL-89 | + Integrity Constraints
SQL-92 | Major revision (ISO/IEC 9075 standard), De-facto Industry Standard
SQL:1999 | + Regular Expression Matching, Recursive Queries, Triggers, Support for Procedural and Control Flow Statements, Non-scalar types (Arrays) and some OO features (structured types), Embedding SQL in Java (SQL/OLB) and Embedding Java in SQL (SQL/JRT)
SQL:2003 | + XML features (SQL/XML), Window functions, Standardized sequences and columns with auto-generated values (identity columns)
SQL:2006 | + Way of importing and storing XML data in a SQL database, manipulating it within the database, and publishing both XML and conventional SQL-data in XML form
SQL:2008 | Legalizes ORDER BY outside Cursor Definitions + INSTEAD OF Triggers, TRUNCATE statements and FETCH clause
SQL:2011 | + Temporal data (PERIOD FOR) Enhancements for Window functions and FETCH clause
SQL:2016 | + Row Pattern Matching, Polymorphic Table Functions and JSON
SQL:2019 | + Multidimensional Arrays (MDarray type and operators)
Compliance
SQL is the de facto industry standard today for relational or structured data systems
Commercial systems as well as open systems may be fully or partially compliant with one or more standards from SQL-92 onward
Not all examples here may work on your particular system. Check your system's SQL docs.
Alternatives
There aren't any alternatives to SQL for speaking to relational databases (i.e. SQL as a protocol)
These alternatives have been implemented in the form of front-ends for working with relational databases. Some examples of a front-end include (for a selection of languages):
They also look a lot more like SQL than other front-ends
HaskellDB
Derivatives
There are several query languages that are derived from or inspired by SQL.
SPARQL (pronounced sparkle, a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language
A semantic query language for databases - able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.
It has been standardized by the W3C as a key technology of the semantic web
Versions
Used as the query language for several NoSQL systems - particularly the Graph Databases that use RDF as store
Integrity Constraints
And, as we will see later, also other information such as ...
numeric(p, d) - Fixed point number, with user-specified precision of p digits, with d digits to the right of decimal point.
(ex. numeric(3, 1) allows 44.5 to be stored exactly, but not 444.5 or 0.32)
real, double precision - Floating point and double-precision floating point numbers, with machine-dependent precision
float(n) - Floating point number with user-specified precision of at least n digits
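The `numeric(p, d)` rule can be expressed with Python's `decimal` module. The helper `fits_numeric` is hypothetical: it answers whether a value (passed as a string to avoid binary floating-point artifacts) can be stored exactly. Actual systems may instead round extra fractional digits; PostgreSQL, for instance, stores 0.32 in a numeric(3, 1) column as 0.3.

```python
from decimal import Decimal


def fits_numeric(value, p, d):
    """Sketch: can `value` be stored exactly in numeric(p, d)?
    numeric(p, d) holds at most p digits, d of them after the decimal point."""
    sign, digits, exponent = Decimal(value).normalize().as_tuple()
    scale = max(0, -exponent)                    # digits after the decimal point
    int_digits = max(0, len(digits) + exponent)  # digits before the decimal point
    return scale <= d and int_digits <= p - d


print(fits_numeric("44.5", 3, 1))   # True
print(fits_numeric("444.5", 3, 1))  # False: needs 4 digits in total
print(fits_numeric("0.32", 3, 1))   # False: needs 2 digits after the point
```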
create table r (A1 D1, A2 D2, ..., An Dn,
    (integrity-constraint1),
    ...,
    (integrity-constraintk));
r is the name of the relation (table)
Example
create table instructor (
    ID        char(5),
    name      varchar(20),
    dept_name varchar(20),
    salary    numeric(8, 2));
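A sketch that runs the same DDL in SQLite through Python's `sqlite3` module, with two integrity constraints (`not null` on name and a primary key on ID) added for illustration. SQLite accepts `char(5)`, `varchar(20)` and `numeric(8, 2)` but does not enforce those lengths or precisions, whereas violations of the added constraints are rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table instructor (
    ID        char(5),
    name      varchar(20) not null,
    dept_name varchar(20),
    salary    numeric(8, 2),
    primary key (ID))""")
conn.execute("insert into instructor values ('10211', 'Smith', 'Biology', 66000)")


def violates(sql):
    """Run an insert and report whether an integrity constraint rejected it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


dup_id = violates("insert into instructor values ('10211', 'Jones', 'Physics', 70000)")  # same ID
no_name = violates("insert into instructor values ('10212', null, 'Physics', 70000)")    # null name
print(dup_id, no_name)  # True True
```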
University DB
instructor
ID
name
dept_name
salary
NOTE: sec_id can be dropped from the primary key above to ensure a student cannot register for two sections of the same course in the same semester
Update Tables
Insert (DML command)
drop table r
alter table r add A D
where A is the name of the attribute to be added to relation r and D is the domain of A
All existing tuples in the relation are assigned null as the value for the new attribute
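A quick check of this behavior, assuming SQLite through Python's `sqlite3` and a hypothetical `student` table: after `alter table ... add`, the pre-existing rows read back NULL (`None` in Python) for the new attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student (roll_no text primary key, name text)")
conn.executemany("insert into student values (?, ?)", [("14CS92P01", "Asha"), ("16EE30029", "Jatin")])

# alter table r add A D, with r = student, A = tot_cred, D = integer
conn.execute("alter table student add tot_cred integer")
rows = conn.execute("select roll_no, tot_cred from student order by roll_no").fetchall()
print(rows)  # [('14CS92P01', None), ('16EE30029', None)]
```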
select A1, A2, ..., An
from r1, r2, ..., rm
where P
SELECT clause
The select clause lists the attributes desired in the result of a query
Some people prefer to use UPPER CASE wherever we use the bold font
To force the elimination of duplicates, insert the keyword distinct after select
select *
from instructor
select '437'
Result is a table with one column and a single row with the value '437'
select 'A'
from instructor
Result is a table with one column and N rows (the number of tuples in the instructor table), each row with value 'A'
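These cases can be reproduced in SQLite through Python's `sqlite3` module; the three instructor rows are made up. The last two queries also show the effect of `distinct`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, name text, dept_name text, salary real)")
conn.executemany("insert into instructor values (?, ?, ?, ?)",
                 [("1", "A", "Physics", 95000), ("2", "B", "Physics", 87000), ("3", "C", "Music", 40000)])

one_row = conn.execute("select '437'").fetchall()               # no from clause: a single row
n_rows = conn.execute("select 'A' from instructor").fetchall()   # one 'A' per instructor tuple
depts_all = conn.execute("select dept_name from instructor").fetchall()
depts_distinct = conn.execute("select distinct dept_name from instructor").fetchall()
print(one_row)                               # [('437',)]
print(n_rows)                                # [('A',), ('A',), ('A',)]
print(len(depts_all), len(depts_distinct))   # 3 2
```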
The select clause can contain arithmetic expressions involving the operation +, -, * and / and operating on constants or
attributes of tuples
The query:
select ID, name, dept_name, salary / 12
from instructor
Would return a relation that is the same as the instructor relation, except that the value of the attribute salary is
divided by 12
WHERE clause
The where clause specifies conditions that the result must satisfy
select name
from instructor
where dept_name = 'Comp. Sci.'
Comparison results can be combined using the logical connectives and, or, not
To find all instructors in Comp. Sci. department with salary > 80000
select name
from instructor
where dept_name = 'Comp. Sci.' and salary > 80000
FROM clause
The from clause lists the relations involved in the query
select *
from instructor, teaches
Generates every possible instructor-teaches pair with all attributes from both relations
For common attributes (for eg: ID), the attributes in the resulting table are renamed using the relation name (for eg: instructor.ID)
Cartesian product is not very useful directly, but useful when combined with the where-clause condition (selection operation in relational algebra)
Cartesian product
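The size of the product and the effect of adding a where condition can be checked on a few sample rows (SQLite through Python's `sqlite3`; the data is illustrative).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, name text)")
conn.execute("create table teaches (ID text, course_id text)")
conn.executemany("insert into instructor values (?, ?)",
                 [("10101", "Srinivasan"), ("12121", "Wu"), ("15151", "Mozart")])
conn.executemany("insert into teaches values (?, ?)",
                 [("10101", "CS-101"), ("10101", "CS-315"), ("12121", "FIN-201")])

pairs = conn.execute("select * from instructor, teaches").fetchall()
print(len(pairs))  # 9 = 3 instructors x 3 teaches rows

# Adding a where condition turns the product into a useful join
taught = conn.execute("""select name, course_id
                         from instructor, teaches
                         where instructor.ID = teaches.ID
                         order by name, course_id""").fetchall()
print(taught)  # [('Srinivasan', 'CS-101'), ('Srinivasan', 'CS-315'), ('Wu', 'FIN-201')]
```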
Week 2 Lecture 4
Class BSCCS2001
Materials
Module # 9
Type Lecture
Week # 2
Find the names of all instructors who have taught some courses and the course_id
Here in this table, we do not have the names of the courses
If we want the name, we will again have to do a similar join operation with a table that has the names of the
courses
Example
Find the names of all the instructors in the Art dept. who have taught some courses and the course_id
Rename AS operation
SQL allows renaming relations and attributes using the as clause:
old_name as new_name
Find the names of all the instructors who have a higher salary than some instructor in 'Comp. Sci.'
instructor as T ≡ instructor T
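A sketch of the "higher salary than some instructor in 'Comp. Sci.'" question, using `as` to give the instructor relation two names, T and S. It runs on SQLite with made-up salaries: the self-product pairs every instructor T with every Comp. Sci. instructor S and keeps T when its salary is higher.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, dept_name text, salary integer)")
conn.executemany("insert into instructor values (?, ?, ?)", [
    ("Katz", "Comp. Sci.", 75000), ("Brandt", "Comp. Sci.", 92000),
    ("Wu", "Finance", 90000), ("Mozart", "Music", 40000)])

higher = sorted(r[0] for r in conn.execute("""
    select distinct T.name
    from instructor as T, instructor as S
    where T.salary > S.salary and S.dept_name = 'Comp. Sci.'"""))
print(higher)  # ['Brandt', 'Wu']
```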
String Operations
SQL includes a string-matching operator for comparisons on character strings.
The operator like uses patterns that are described using two special characters:
percent (%)
The % character matches any sub-string
underscore ( _ )
The _ character matches any single character
Find the names of all instructors whose name includes the sub-string "dar"
select name
from instructor
where name like '%dar%'
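Meanwhile, '%dar___' (dar followed by 3 underscores) matches only strings that end with 'dar' followed by exactly 3 characters; it will match Darwin, but not the others (this assumes case-insensitive matching, since Darwin starts with a capital D; in systems where like is case-sensitive, such as PostgreSQL, it would not match)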
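Matching behavior differs between systems, so the sketch below makes it explicit on SQLite (through Python's `sqlite3`) with made-up names. SQLite's `like` ignores the case of ASCII letters, so 'Darwin' matches; its `glob` operator (with `*` in place of `%`) is case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text)")
conn.executemany("insert into instructor values (?)",
                 [("Darwin",), ("Sardar",), ("Mandar Rao",), ("Kumar",)])


def names(where):
    return sorted(r[0] for r in conn.execute("select name from instructor where " + where))


contains_dar = names("name like '%dar%'")     # SQLite's like ignores ASCII case
dar_plus_3 = names("name like '%dar___'")     # 'dar' followed by exactly 3 characters at the end
case_sensitive = names("name glob '*dar*'")   # SQLite's glob is case-sensitive
print(contains_dar)    # ['Darwin', 'Mandar Rao', 'Sardar']
print(dar_plus_3)      # ['Darwin']
print(case_sensitive)  # ['Mandar Rao', 'Sardar']
```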
We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default
The Select Top clause is useful on large tables with thousands of records.
Oracle uses fetch first n rows only and rownum
Example: Find the names of all the instructors with salary between $90,000 and $100,000
select name
from instructor
where salary between 90000 and 100000
Tuple comparison
IN operator
The in operator allows you to specify multiple values in a where clause
select name
from instructor
where dept_name in ('Comp. Sci.', 'Biology')
Duplicates
In relations with duplicates, SQL can define how many copies of tuples appear in the result
Multiset versions of some of the relational algebra operators - given multiset relations r1 and r2 :
a) SELECT $\sigma_\theta(r_1)$: If there are $c_1$ copies of tuple $t_1$ in $r_1$ and $t_1$ satisfies selection $\sigma_\theta$, then there are $c_1$ copies of $t_1$ in $\sigma_\theta(r_1)$
b) PROJECTION $\Pi_A(r_1)$: For each copy of tuple $t_1$ in $r_1$, there is a copy of tuple $\Pi_A(t_1)$ in $\Pi_A(r_1)$, where $\Pi_A(t_1)$ denotes the projection of the single tuple $t_1$
c) $r_1 \times r_2$: If there are $c_1$ copies of tuple $t_1$ in $r_1$ and $c_2$ copies of tuple $t_2$ in $r_2$, there are $c_1 \times c_2$ copies of the tuple $t_1 \cdot t_2$ in $r_1 \times r_2$
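The SQL query
select A1, A2, ..., An
from r1, r2, ..., rm
where P
is equivalent to the multiset version of the expression:
$\Pi_{A_1, A_2, \ldots, A_n}(\sigma_P(r_1 \times r_2 \times \cdots \times r_m))$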
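The multiset rules can be modeled with Python's `collections.Counter`, which maps each tuple to its number of copies. The function names and sample relations are hypothetical; the point is that selection keeps counts, projection adds them up, and product multiplies them.

```python
from collections import Counter

# Multiset relations: tuple -> number of copies
r1 = Counter({("a", 1): 2, ("b", 2): 1})
r2 = Counter({("x",): 3})


def m_select(pred, r):
    # c copies of t in r, and t satisfies pred -> c copies of t in the result
    return Counter({t: c for t, c in r.items() if pred(t)})


def m_project(idx, r):
    # one copy of the projected tuple for every copy of t in r
    out = Counter()
    for t, c in r.items():
        out[tuple(t[i] for i in idx)] += c
    return out


def m_product(r, s):
    # c1 copies of t1 and c2 copies of t2 -> c1 * c2 copies of t1 . t2
    return Counter({t1 + t2: c1 * c2 for t1, c1 in r.items() for t2, c2 in s.items()})


print(m_product(r1, r2)[("a", 1, "x")])  # 6 = 2 * 3
```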
Week 2 Lecture 5
Class BSCCS2001
Materials
Module # 10
Type Lecture
Week # 2
Find the courses that ran in Fall 2009 or in Spring 2010
(select course_id from section where sem = 'Fall' and year = 2009)
union
(select course_id from section where sem = 'Spring' and year = 2010)
Find the courses that ran in Fall 2009 and in Spring 2010
(select course_id from section where sem = 'Fall' and year = 2009)
intersect
(select course_id from section where sem = 'Spring' and year = 2010)
Find the courses that ran in Fall 2009 but not in Spring 2010
(select course_id from section where sem = 'Fall' and year = 2009)
except
(select course_id from section where sem = 'Spring' and year = 2010)
Find the salaries of all the instructors that are less than the largest salary
select distinct T.salary
from instructor as T, instructor as S
where T.salary < S.salary
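Running this query on a few made-up salaries (SQLite through Python's `sqlite3`) shows that it returns every salary except the largest; subtracting that result from all salaries with `except` therefore leaves the maximum, without using an aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, salary integer)")
conn.executemany("insert into instructor values (?, ?)",
                 [("Wu", 90000), ("Katz", 75000), ("Gold", 87000), ("Einstein", 95000)])

less_than_max = conn.execute("""select distinct T.salary
                                from instructor as T, instructor as S
                                where T.salary < S.salary
                                order by T.salary""").fetchall()
# All salaries minus those that are smaller than some other salary = the largest one
largest = conn.execute("""select salary from instructor
                          except
                          select distinct T.salary
                          from instructor as T, instructor as S
                          where T.salary < S.salary""").fetchall()
print(less_than_max)  # [(75000,), (87000,), (90000,)]
print(largest)        # [(95000,)]
```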
Set operations such as union, intersect and except automatically eliminate the duplicates
To retain all the duplicates, use the corresponding multiset versions union all, intersect all and except all
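The three set-operation queries above can be run on a small, made-up section table in SQLite. Two SQLite-specific assumptions apply: it does not accept parentheses around each select, so they are dropped, and it supports `union all` but not `intersect all` or `except all`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table section (course_id text, sem text, year integer)")
conn.executemany("insert into section values (?, ?, ?)", [
    ("CS-101", "Fall", 2009), ("CS-347", "Fall", 2009), ("PHY-101", "Fall", 2009),
    ("CS-101", "Spring", 2010), ("CS-315", "Spring", 2010), ("FIN-201", "Spring", 2010)])

fall = "select course_id from section where sem = 'Fall' and year = 2009"
spring = "select course_id from section where sem = 'Spring' and year = 2010"


def run(op):
    return sorted(r[0] for r in conn.execute(f"{fall} {op} {spring}"))


print(run("union"))      # ['CS-101', 'CS-315', 'CS-347', 'FIN-201', 'PHY-101']
print(run("intersect"))  # ['CS-101']
print(run("except"))     # ['CS-347', 'PHY-101']
print(run("union all"))  # CS-101 now appears twice
```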
NULL values
What is a NULL value?
A NULL value is something unknown or a value that does not exist yet
For eg: Every student may not have a passport at the time of registration
Often times while we are creating/inserting a record, we may not know all the values of all the fields
For eg: When a student joins, the student does not have any credit assigned to him/her, so the total credit is
NULL
We can say 0 (zero), but 0 (zero) and NULL are different
Naturally, when we add an attribute to all the existing rows of a table, the value of that field cannot be known or set, so it has to be initialized as a NULL value
It is possible for tuples to have a null value, denoted by null, for some of their attributes
select name
from instructor
where salary is null
It is not possible to test for null values with comparison operators such as =, <, > or <>
We need to use the is null and is not null operators instead
Three-valued logic using the value unknown:
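OR: (unknown or true) = true, (unknown or false) = unknown, (unknown or unknown) = unknown
AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown
NOT: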
(not unknown) = unknown
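SQLite can confirm these rules directly, since `null` behaves as unknown in logical expressions. In this sketch (Python's `sqlite3`) true comes back as 1, false as 0 and unknown as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def sql_value(expr):
    # SQLite returns 1 for true, 0 for false and None for unknown (NULL)
    return conn.execute(f"select {expr}").fetchone()[0]


truth = {
    "unknown or true": sql_value("null or 1"),
    "unknown or false": sql_value("null or 0"),
    "unknown and true": sql_value("null and 1"),
    "unknown and false": sql_value("null and 0"),
    "not unknown": sql_value("not (null)"),
    "null = null": sql_value("null = null"),
}
for k, v in truth.items():
    print(k, "->", {1: "true", 0: "false", None: "unknown"}[v])
```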
Aggregate functions
These functions operate on the multiset of values of a column of a relation (table) and return a value
Examples
select avg(salary)
from instructor
where dept_name = 'Comp. Sci.'
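Find the total number of instructors who teach a course in the Spring 2010 semester
select count(distinct ID)
from teaches
where semester = 'Spring' and year = 2010;
Find the number of tuples in the course relation
select count(*)
from course;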
So, group by takes a column and makes sub-tables of all those records which have the same value on that particular
group by attribute
It then applies the aggregate function to the column within each sub-table
Attributes in select clause outside of aggregate functions must appear in group by list
HAVING clause
Find the names and average salaries of all departments whose average salary is greater than 42,000
NOTE: Predicates in the having clause are applied after the formation of groups whereas predicates in the where
clause are applied before forming groups
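A sketch on made-up instructor rows (SQLite through Python's `sqlite3`) contrasting the three steps: `group by` forms one group per department, `having` filters whole groups after aggregation, and `where` filters individual rows before any group is formed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, dept_name text, salary integer)")
conn.executemany("insert into instructor values (?, ?, ?)", [
    ("Srinivasan", "Comp. Sci.", 65000), ("Katz", "Comp. Sci.", 75000),
    ("Mozart", "Music", 40000), ("Einstein", "Physics", 95000), ("Gold", "Physics", 87000)])

# group by: one sub-table per dept_name, avg computed per sub-table
per_dept = conn.execute("""select dept_name, avg(salary) from instructor
                           group by dept_name order by dept_name""").fetchall()
# having: keep only the groups whose average exceeds 42000
rich_depts = conn.execute("""select dept_name, avg(salary) from instructor
                             group by dept_name having avg(salary) > 42000
                             order by dept_name""").fetchall()
# where: drop individual rows first, then group what is left
where_first = conn.execute("""select dept_name, avg(salary) from instructor
                              where salary > 70000
                              group by dept_name order by dept_name""").fetchall()
print(per_dept)     # [('Comp. Sci.', 70000.0), ('Music', 40000.0), ('Physics', 91000.0)]
print(rich_depts)   # [('Comp. Sci.', 70000.0), ('Physics', 91000.0)]
print(where_first)  # [('Comp. Sci.', 75000.0), ('Physics', 91000.0)]
```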
select sum(salary)
from instructor;
All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes
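With one NULL salary among three made-up rows, the difference between `count(*)` and the other aggregates becomes visible (SQLite through Python's `sqlite3`). Note that `avg` divides by the number of non-null values, not by the number of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, salary integer)")
conn.executemany("insert into instructor values (?, ?)",
                 [("Wu", 90000), ("Katz", None), ("Gold", 87000)])

total, n_all, n_salary, avg_salary = conn.execute(
    "select sum(salary), count(*), count(salary), avg(salary) from instructor").fetchone()
print(total, n_all, n_salary, avg_salary)  # 177000 3 2 88500.0
```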
Week 3 Lecture 1
Class BSCCS2001
Materials
Module # 11
Type Lecture
Week # 3
SQL Examples
SELECT DISTINCT
From the classroom relation, find the names of buildings in which every individual classroom has capacity less than
100 (removing the duplicates).
Relation:
classroom
building room_number capacity
Painter 514 10
Taylor 3128 70
Watson 100 30
Watson 120 50
Query:
SELECT DISTINCT building
FROM classroom
WHERE capacity < 100;
Output:
building
Painter
Taylor
Watson
SELECT ALL
From the classroom relation, find the names of buildings in which every individual classroom has capacity less than
100 (without removing the duplicates).
Relation:
classroom
building room_number capacity
Painter 514 10
Taylor 3128 70
Watson 100 30
Watson 120 50
Query:
SELECT ALL building
FROM classroom
WHERE capacity < 100;
Output:
building
Painter
Taylor
Watson
Watson
NOTE: Duplicate retention is the default, and hence it is common practice to skip ALL immediately after SELECT
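Both queries can be reproduced on the classroom rows listed above using SQLite through Python's `sqlite3`; the column names building, room_number and capacity are assumed. `order by` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table classroom (building text, room_number text, capacity integer)")
conn.executemany("insert into classroom values (?, ?, ?)", [
    ("Painter", "514", 10), ("Taylor", "3128", 70), ("Watson", "100", 30), ("Watson", "120", 50)])

distinct_b = [r[0] for r in conn.execute(
    "select distinct building from classroom where capacity < 100 order by building")]
all_b = [r[0] for r in conn.execute(
    "select all building from classroom where capacity < 100 order by building")]
print(distinct_b)  # ['Painter', 'Taylor', 'Watson']
print(all_b)       # ['Painter', 'Taylor', 'Watson', 'Watson']
```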
Cartesian Product
Find the list of all students of departments which have a budget < $100K
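Query:
SELECT name, budget
FROM student, department
WHERE student.dept_name = department.dept_name
AND budget < 100000;
Output:
name budget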
Brandt 50000
Peltier 70000
Levy 70000
Sanchez 80000
Snow 70000
Aoi 85000
Bourikas 85000
Tanaka 90000
The above query generates every possible student-department pair, which is the Cartesian product of student and
department.
Then, it filters all the rows with student.dept_name = department.dept_name AND budget < 100000
The common attribute dept_name in the resulting table is renamed using the relation name - student.dept_name and department.dept_name
RENAME AS Operation
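The same query in the above case can be framed by renaming the tables as shown below:
Query:
SELECT S.name AS StudentName, budget AS DeptBudget
FROM student AS S, department AS D
WHERE S.dept_name = D.dept_name
AND budget < 100000;
Output:
studentname deptbudget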
Brandt 50000
Peltier 70000
Levy 70000
Sanchez 80000
Snow 70000
Aoi 85000
Bourikas 85000
Tanaka 90000
The above query renames the relation student AS S and the relation department AS D
It also displays the attribute name as StudentName and the budget as DeptBudget
NOTE: The budget attribute does not have any prefix because it occurs only in the department relation
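From the instructor and department relations, find the names of all instructors who belong to the Finance department
or whose department is located in the Watson or Taylor building
Relations: instructor, department(dept_name, building, budget)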
Query:
SELECT name
FROM instructor I, department D
WHERE D.dept_name = I.dept_name
AND (I.dept_name = 'Finance' OR building IN ('Watson', 'Taylor'));
Output:
name
Srinivasan
Wu
Einstein
Gold
Katz
Singh
Crick
Brandt
Kim
String Operations
From the course relation in the figure, find the titles of all the courses whose course_id begins with 3 letters indicating
the department
Relation: course
Query:
SELECT title
FROM course
WHERE course_id LIKE '___-%'; -- 3 underscores
Output:
title
Intro. to Biology
Genetics
Computational Biology
Investment Banking
World History
Physical Principles
Each course_id begins with either 2 or 3 letters identifying the department, followed by a hyphen and then a 3-digit
number. In the pattern, each _ matches exactly one character and % matches any sequence of characters, so the query
returns the titles of the courses whose course_id has exactly 3 characters before the hyphen
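The next sketch shows how the '___-%' pattern behaves. It assumes Python's sqlite3 module and a few hypothetical course rows written in the style of the course relation. One caveat: _ matches any single character, not only letters, so the pattern relies on department codes being alphabetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_id TEXT, title TEXT)")
# Hypothetical sample rows with 2- and 3-letter department prefixes
conn.executemany("INSERT INTO course VALUES (?, ?)", [
    ("BIO-101", "Intro. to Biology"),
    ("CS-101", "Intro. to Computer Science"),
    ("FIN-201", "Investment Banking"),
    ("EE-181", "Intro. to Digital Systems"),
    ("PHY-101", "Physical Principles"),
])

# '___-%': exactly three characters, then a hyphen, then anything
titles = [r[0] for r in conn.execute(
    "SELECT title FROM course WHERE course_id LIKE '___-%' ORDER BY course_id")]
print(titles)  # ['Intro. to Biology', 'Investment Banking', 'Physical Principles']
```

CS-101 and EE-181 are excluded because their fourth character is a digit, not a hyphen.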
ORDER BY
From the student relation in the figure, obtain the list of all students in alphabetical order of departments and, within
each department, in decreasing order of total credits.
Relation: student
Query:
SELECT name, dept_name, tot_cred
FROM student
ORDER BY dept_name ASC, tot_cred DESC;
Output (excerpt):
name dept_name tot_cred
Brandt History 80
Sanchez Music 38
Peltier Physics 56
Levy Physics 46
Snow Physics 0
IN Operator
From the teaches relation in the figure, find the IDs of all the courses taught in the Fall or Spring of 2018
teaches
Query:
SELECT course_id
FROM teaches
WHERE semester IN ('Fall', 'Spring')
AND year = 2018;
Output:
course_id
CS-315
FIN-201
MU-199
HIS-351
CS-101
CS-319
CS-319
Query:
SELECT course_id
FROM teaches
WHERE semester = 'Fall'
AND year = 2018
UNION
SELECT course_id
FROM teaches
WHERE semester = 'Spring'
AND year = 2018;
Output:
course_id
CS-101
CS-315
CS-319
FIN-201
HIS-351
MU-199
NOTE: UNION removes all the duplicates. If we use UNION ALL instead of UNION, duplicates are retained and we get the
same multiset of tuples as the IN query above (CS-319 appears twice)
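Set Operation: INTERSECT
From the instructor relation in the figure, find the names of all the instructors in the Computer Science or Finance
department whose salary is less than 80,000
instructor
Query:
SELECT name
FROM instructor
WHERE dept_name IN ('Comp. Sci.', 'Finance')
INTERSECT
SELECT name
FROM instructor
WHERE salary < 80000;
Output:
name
Srinivasan
Katz
Equivalent single query:
SELECT name FROM instructor WHERE dept_name IN ('Comp. Sci.', 'Finance') AND salary < 80000;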
Set Operation: EXCEPT
From the instructor relation in the figure, find the names of all the instructors in either the Computer Science
department or the Finance department whose salary is either ≥ 90,000 or ≤ 70,000
instructor
Query:
SELECT name
FROM instructor
WHERE dept_name IN ('Comp. Sci.', 'Finance')
EXCEPT
SELECT name
FROM instructor
WHERE salary < 90000 AND salary > 70000;
Output:
name
Srinivasan
Brandt
Wu
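Below is a minimal sketch of UNION, UNION ALL, INTERSECT and EXCEPT, again assuming Python's sqlite3 module. The teaches and instructor rows are hypothetical and are not the lecture's tables. Sorting in Python is only for a stable comparison, since SQL does not guarantee the order of set-operation results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teaches (course_id TEXT, sec_id TEXT, semester TEXT, year INTEGER)")
# Hypothetical rows: CS-319 has two Spring 2018 sections
conn.executemany("INSERT INTO teaches VALUES (?, ?, ?, ?)", [
    ("CS-101", "1", "Fall", 2018),
    ("CS-319", "1", "Spring", 2018),
    ("CS-319", "2", "Spring", 2018),
    ("FIN-201", "1", "Spring", 2018),
    ("CS-101", "1", "Fall", 2017),
])
fall = "SELECT course_id FROM teaches WHERE semester = 'Fall' AND year = 2018"
spring = "SELECT course_id FROM teaches WHERE semester = 'Spring' AND year = 2018"
union = sorted(r[0] for r in conn.execute(f"{fall} UNION {spring}"))          # duplicates removed
union_all = sorted(r[0] for r in conn.execute(f"{fall} UNION ALL {spring}"))  # duplicates kept

conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary INTEGER)")
# Hypothetical instructors
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", [
    ("Ann", "Comp. Sci.", 65000), ("Bob", "Finance", 90000),
    ("Cid", "Comp. Sci.", 75000), ("Dee", "Physics", 95000),
    ("Eve", "Comp. Sci.", 92000),
])
in_depts = "SELECT name FROM instructor WHERE dept_name IN ('Comp. Sci.', 'Finance')"
intersect = sorted(r[0] for r in conn.execute(
    f"{in_depts} INTERSECT SELECT name FROM instructor WHERE salary < 80000"))
except_ = sorted(r[0] for r in conn.execute(
    f"{in_depts} EXCEPT SELECT name FROM instructor WHERE salary < 90000 AND salary > 70000"))

print(union)      # ['CS-101', 'CS-319', 'FIN-201']
print(union_all)  # ['CS-101', 'CS-319', 'CS-319', 'FIN-201']
print(intersect)  # ['Ann', 'Cid']
print(except_)    # ['Ann', 'Bob', 'Eve']
```

Comparing instructors by name alone assumes names are unique. A real query would compare on ID.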
classroom
building room_number capacity
Painter 514 10
Taylor 3128 70
Watson 100 30
Watson 120 50
Query:
Output:
building avg
Taylor 70.00
Packard 500.00
Watson 40.00
instructor
Query:
SELECT MIN(salary) AS least_salary
FROM instructor;
Output:
least_salary
40000
Query:
SELECT MAX(salary) AS highest_salary
FROM instructor;
Output:
highest_salary
95000
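From the instructor relation given above, find the number of instructors in each department
Query:
SELECT dept_name, COUNT(*) AS ins_count
FROM instructor
GROUP BY dept_name;
Output:
dept_name ins_count
Comp. Sci. 3
Finance 2
Music 1
Physics 2
History 2
Biology 1
Elec. Eng. 1
From the course relation, find the total number of credits of the courses offered by each department
course
Query:
SELECT dept_name, SUM(credits) AS sum_credits
FROM course
GROUP BY dept_name;
Output:
dept_name sum_credits
Finance 3
History 3
Physics 4
Music 3
Comp. Sci. 17
Biology 11
Elec. Eng. 3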
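The sketch below applies the aggregates with GROUP BY and shows the rule that aggregates other than COUNT(*) ignore nulls. It assumes Python's sqlite3 module and hypothetical instructor rows, one of which has a NULL salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary INTEGER)")
# Hypothetical rows; Eve's salary is unknown (NULL)
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", [
    ("Ann", "Comp. Sci.", 65000),
    ("Cid", "Comp. Sci.", 75000),
    ("Eve", "Comp. Sci.", None),
    ("Bob", "Finance", 90000),
])

# COUNT(*) counts Eve's row; COUNT(salary) and AVG(salary) skip the NULL
per_dept = conn.execute("""
    SELECT dept_name, COUNT(*), COUNT(salary), AVG(salary)
    FROM instructor
    GROUP BY dept_name
    ORDER BY dept_name""").fetchall()
print(per_dept)  # [('Comp. Sci.', 3, 2, 70000.0), ('Finance', 1, 1, 90000.0)]

least, highest = conn.execute("SELECT MIN(salary), MAX(salary) FROM instructor").fetchone()
print(least, highest)  # 65000 90000
```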
📚
Week 3 Lecture 2
Class BSCCS2001
Materials
Module # 12
Type Lecture
Week # 3
Intermediate SQL
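Nested sub-queries
SQL provides a mechanism for the nesting of sub-queries: a sub-query is a SELECT-FROM-WHERE expression that is nested
within another query. In a query of the form
SELECT A1, A2, ..., An
FROM r1, r2, ..., rm
WHERE P
a sub-query can be nested as follows:
Any Ai can be replaced by a sub-query that generates a single value
Any ri can be replaced by any valid sub-query
P can be replaced by an expression of the form B <operation> (sub-query), where B is an attribute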
For set membership
Set Membership
Find the courses offered in Fall 2009 and in Spring 2010 (INTERSECT example)
Find courses offered in Fall 2009 but not in Spring 2010 (EXCEPT example)
Find the total number of (distinct) students who have taken course sections taught by the instructor with ID 10101
NOTE: The above query can be written in a simpler manner; the formulation above is only meant to illustrate SQL features
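Find the names of instructors with a salary greater than that of some (at least one) instructor in the Biology department
SELECT name
FROM instructor
WHERE salary > SOME (
SELECT salary
FROM instructor
WHERE dept_name = 'Biology');
5 ≠ SOME(0, 5) → true # as 0 ≠ 5
(= SOME) ≡ IN
However, (≠ SOME) ≢ NOT IN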
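Find the names of all instructors whose salary is greater than the salary of every instructor in the Biology department
SELECT name
FROM instructor
WHERE salary > ALL (
SELECT salary
FROM instructor
WHERE dept_name = 'Biology');
5 = ALL(4, 5) → false
5 ≠ ALL(4, 6) → true
(≠ ALL) ≡ NOT IN
However, (= ALL) ≢ IN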
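A pure-Python sketch of the SOME/ALL semantics follows. It checks the equivalences above on small value lists. The helpers some and all_ are hypothetical names, not SQL or library functions, and they ignore SQL's three-valued logic for NULLs. On an empty sub-query result they return false and true respectively, which matches SQL.

```python
import operator


def some(op, x, values):
    """x op SOME(values): true if the comparison holds for at least one value."""
    return any(op(x, v) for v in values)


def all_(op, x, values):
    """x op ALL(values): true if the comparison holds for every value."""
    return all(op(x, v) for v in values)


# (= SOME) behaves like IN
print(some(operator.eq, 5, [0, 5]), 5 in [0, 5])          # True True
# (≠ SOME) is NOT the same as NOT IN
print(some(operator.ne, 5, [0, 5]), 5 not in [0, 5])      # True False
# (≠ ALL) behaves like NOT IN
print(all_(operator.ne, 5, [4, 6]), 5 not in [4, 6])      # True True
# (= ALL) is NOT the same as IN
print(all_(operator.eq, 5, [4, 5]), 5 in [4, 5])          # False True
```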
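EXISTS r ⇔ r ≠ ∅
NOT EXISTS r ⇔ r = ∅
Find all courses taught in both the Fall 2009 semester and the Spring 2010 semester (correlated sub-query):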
SELECT course_id
FROM section AS S
WHERE semester = 'Fall' AND year = 2009
AND EXISTS (
SELECT * FROM section AS T
WHERE semester = 'Spring' AND year = 2010
AND S.course_id = T.course_id);
Find all students who have taken all the courses offered in the Biology department
First nested query lists all the courses offered by the Biology department
Second nested query lists all the courses a particular student has taken
NOTE: X − Y = ∅ ⇔ X ⊆ Y
NOTE: Cannot write this query using = ALL and its variants
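The sketch below writes the "taken all Biology courses" query with NOT EXISTS (... EXCEPT ...). It relies on X − Y = ∅ ⇔ X ⊆ Y. Python's sqlite3 module and the tiny student, course and takes tables are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows
conn.executescript("""
    CREATE TABLE student (ID TEXT, name TEXT);
    CREATE TABLE course  (course_id TEXT, dept_name TEXT);
    CREATE TABLE takes   (ID TEXT, course_id TEXT);
    INSERT INTO student VALUES ('1', 'Ann'), ('2', 'Bob'), ('3', 'Cid');
    INSERT INTO course  VALUES ('BIO-101', 'Biology'), ('BIO-301', 'Biology'), ('CS-101', 'Comp. Sci.');
    INSERT INTO takes   VALUES ('1', 'BIO-101'), ('1', 'BIO-301'), ('1', 'CS-101'), ('2', 'BIO-101');
""")

# (Biology courses) EXCEPT (courses taken by S) is empty <=> S took every Biology course
query = """
    SELECT S.name
    FROM student AS S
    WHERE NOT EXISTS (
        SELECT course_id FROM course WHERE dept_name = 'Biology'
        EXCEPT
        SELECT T.course_id FROM takes AS T WHERE T.ID = S.ID)
"""
names = [r[0] for r in conn.execute(query)]
print(names)  # ['Ann']
```

The inner query is correlated: it refers to S.ID, so it is evaluated once per student.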
Find all the courses that were offered at most once in 2009
SELECT T.course_id
FROM course AS T
WHERE UNIQUE (
SELECT R.course_id
FROM section AS R
WHERE T.course_id = R.course_id
AND R.year = 2009);
Find the average instructors' salaries of those departments where the average salary is greater than $42,000
WITH clause
The WITH clause provides a way of defining a temporary relation whose definition is available only to the query in
which the WITH clause occurs
Find all departments with the maximum budget
WITH max_budget(value) AS
(
SELECT MAX(budget)
FROM department)
SELECT department.dept_name
FROM department, max_budget
WHERE department.budget = max_budget.value;
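Find all departments where the total salary is greater than the average of the total salaries of all departments
WITH dept_total(dept_name, value) AS
(
SELECT dept_name, SUM(salary)
FROM instructor
GROUP BY dept_name),
dept_total_avg(value) AS
(
SELECT AVG(value)
FROM dept_total)
SELECT dept_name
FROM dept_total, dept_total_avg
WHERE dept_total.value > dept_total_avg.value;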
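Here is a runnable sketch of a WITH query that chains two temporary relations. It assumes Python's sqlite3 module and hypothetical instructor rows. It also assumes dept_total holds each department's sum of salaries, following the usual textbook version of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary INTEGER)")
# Hypothetical rows: totals are Comp. Sci. 232000, Finance 90000, Physics 182000
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", [
    ("Ann", "Comp. Sci.", 65000), ("Cid", "Comp. Sci.", 75000), ("Eve", "Comp. Sci.", 92000),
    ("Bob", "Finance", 90000),
    ("Dee", "Physics", 95000), ("Gus", "Physics", 87000),
])

query = """
    WITH dept_total(dept_name, value) AS (
        SELECT dept_name, SUM(salary) FROM instructor GROUP BY dept_name),
    dept_total_avg(value) AS (
        SELECT AVG(value) FROM dept_total)
    SELECT dept_name
    FROM dept_total, dept_total_avg
    WHERE dept_total.value > dept_total_avg.value
    ORDER BY dept_name
"""
above_avg = [r[0] for r in conn.execute(query)]
print(above_avg)  # ['Comp. Sci.', 'Physics']  (average total is 168000)
```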
List all departments along with the number of instructors in each department
SELECT dept_name, (
SELECT COUNT(*)
FROM instructor
WHERE department.dept_name = instructor.dept_name)
AS num_instructors
FROM department;
Runtime error occurs if subquery returns more than one result tuple
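Deletion
Delete all instructors
DELETE FROM instructor;
Delete all tuples in the instructor relation for those instructors associated with a department located in the Watson
building
DELETE FROM instructor
WHERE dept_name IN (SELECT dept_name
FROM department
WHERE building = 'Watson');
Delete all instructors whose salary is less than the average salary of instructors
DELETE FROM instructor
WHERE salary < (SELECT AVG(salary) FROM instructor);
Problem: as tuples are deleted from instructor, the average salary changes
Solution:
First, compute AVG(salary) and find all the tuples to delete
Next, delete all the tuples found above (without recomputing AVG or retesting the tuples)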
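The sketch below shows why the average must be computed only once. It assumes Python's sqlite3 module and four hypothetical salaries. The function delete_one_at_a_time is a hypothetical naive strategy, included only for contrast: it recomputes the average after every deletion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO instructor VALUES (?, ?)",
                 [("A", 10000), ("B", 50000), ("C", 60000), ("D", 80000)])

# The uncorrelated sub-query's average (50000) is used for every row
conn.execute("DELETE FROM instructor WHERE salary < (SELECT AVG(salary) FROM instructor)")
remaining = sorted(s for (s,) in conn.execute("SELECT salary FROM instructor"))
print(remaining)  # [50000, 60000, 80000]


def delete_one_at_a_time(salaries):
    """Hypothetical naive strategy: recompute the average after each single deletion."""
    current = sorted(salaries)
    for s in sorted(salaries):
        if s < sum(current) / len(current):
            current.remove(s)
    return current


print(delete_one_at_a_time([10000, 50000, 60000, 80000]))  # [80000]
```

Each deletion of a low salary raises the average, so the naive strategy keeps deleting. That is why the standard computes the average and finds the tuples to delete before removing any of them.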
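Insertion
Add a new tuple to course
INSERT INTO course
VALUES ('CS-437', 'Database Systems', 'Comp. Sci.', 4);
or equivalently
INSERT INTO course (course_id, title, dept_name, credits)
VALUES ('CS-437', 'Database Systems', 'Comp. Sci.', 4);
When inserting the result of a query (INSERT INTO ... SELECT ...), the SELECT FROM WHERE statement is evaluated fully
before any of its results are inserted into the relation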
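Updates
Increase the salaries of instructors whose salary is over $100,000 by 3% and all others by 5%
UPDATE instructor
SET salary = salary * 1.03
WHERE salary > 100000;
UPDATE instructor
SET salary = salary * 1.05
WHERE salary <= 100000;
The order of the two statements matters: if the 5% update ran first, an instructor earning slightly less than $100,000
could be pushed above $100,000 and then receive the 3% raise as well
The same update can be written as a single statement using CASE:
UPDATE instructor
SET salary = CASE
WHEN salary <= 100000
THEN salary * 1.05
ELSE salary * 1.03
END;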
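The sketch below checks why the order of the two UPDATE statements matters and why CASE avoids the issue. It assumes Python's sqlite3 module and three hypothetical instructors. Integer percentages (salary * 103 / 100) replace the decimal factors here only to keep the results exact. SQLite truncates integer division, but these values divide evenly.

```python
import sqlite3


def fresh_db():
    """Hypothetical instructor table: A is just below $100,000, B is above it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instructor (name TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO instructor VALUES (?, ?)",
                     [("A", 98000), ("B", 101000), ("C", 50000)])
    return conn


def salaries(conn):
    return dict(conn.execute("SELECT name, salary FROM instructor"))


RAISE_3 = "UPDATE instructor SET salary = salary * 103 / 100 WHERE salary > 100000"
RAISE_5 = "UPDATE instructor SET salary = salary * 105 / 100 WHERE salary <= 100000"

db = fresh_db()
db.execute(RAISE_3)  # 3% first ...
db.execute(RAISE_5)  # ... then 5%
in_order = salaries(db)

db = fresh_db()
db.execute(RAISE_5)  # reversed: A becomes 102900 ...
db.execute(RAISE_3)  # ... and is then raised a second time
reversed_order = salaries(db)

db = fresh_db()
db.execute("""UPDATE instructor SET salary = CASE
                  WHEN salary <= 100000 THEN salary * 105 / 100
                  ELSE salary * 103 / 100
              END""")
with_case = salaries(db)

print(in_order)               # {'A': 102900, 'B': 104030, 'C': 52500}
print(reversed_order)         # {'A': 105987, 'B': 104030, 'C': 52500}
print(with_case == in_order)  # True
```

CASE evaluates every row against its original salary in a single statement, so there is no ordering to get wrong.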
Recompute and update the tot_cred value for all students
UPDATE student S
SET tot_cred = (SELECT SUM(credits)
FROM takes, course
WHERE takes.course_id = course.course_id AND
S.ID = takes.ID AND
takes.grade <> 'F' AND
takes.grade IS NOT NULL);
This sets tot_cred to null for students who have not taken any course, because SUM over an empty set returns null
To store 0 instead, replace SUM(credits) in the sub-query by:
CASE
WHEN SUM(credits) IS NOT NULL THEN SUM(credits)
ELSE 0
END
📚
Week 3 Lecture 3
Class BSCCS2001
Materials
Module # 13
Type Lecture
Week # 3
A join operation is a Cartesian product which requires that tuples in the two relations match (under some conditions)
It also specifies the attributes that are present in the result of the join
The join operations are typically used as subquery expressions in the FROM clause
Inner join
Equi-join
Natural join
Outer join
Self-join
Cross JOIN
CROSS JOIN returns the Cartesian product of rows from tables in the join
Explicit
SELECT *
FROM employee CROSS JOIN department;
Implicit
SELECT *
FROM employee, department;
Relation prereq
course_id prereq_id
BIO-301 BIO-101
CS-190 CS-101
CS-347 CS-101
Observe that
Inner JOIN
course INNER JOIN prereq
Outer JOIN
An extension of the join operation that avoids loss of information
Computes the join and then adds the tuples from one relation that do not match any tuple in the other relation to the
result of the join, filling the attributes of the other relation with null values
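The sketch below contrasts an inner join with a left outer join on course and prereq. It assumes Python's sqlite3 module. The prereq rows come from these notes; the course rows and titles are hypothetical. Right outer joins are emulated by swapping operands, because older SQLite versions support only LEFT OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (course_id TEXT, title TEXT);
    CREATE TABLE prereq (course_id TEXT, prereq_id TEXT);
    INSERT INTO course VALUES ('BIO-301', 'Genetics'), ('CS-190', 'Game Design'), ('CS-315', 'Robotics');
    INSERT INTO prereq VALUES ('BIO-301', 'BIO-101'), ('CS-190', 'CS-101'), ('CS-347', 'CS-101');
""")

inner = conn.execute("""
    SELECT course.course_id, prereq.prereq_id
    FROM course INNER JOIN prereq ON course.course_id = prereq.course_id
    ORDER BY course.course_id""").fetchall()

# Unmatched course rows are kept, with NULL (None) for the prereq attributes
left = conn.execute("""
    SELECT course.course_id, prereq.prereq_id
    FROM course LEFT OUTER JOIN prereq ON course.course_id = prereq.course_id
    ORDER BY course.course_id""").fetchall()

# Swapping the operands keeps unmatched prereq rows instead (a right outer join of course and prereq)
right = conn.execute("""
    SELECT prereq.course_id, course.title
    FROM prereq LEFT OUTER JOIN course ON course.course_id = prereq.course_id
    ORDER BY prereq.course_id""").fetchall()

print(inner)  # [('BIO-301', 'BIO-101'), ('CS-190', 'CS-101')]
print(left)   # [('BIO-301', 'BIO-101'), ('CS-190', 'CS-101'), ('CS-315', None)]
print(right)  # [('BIO-301', 'Genetics'), ('CS-190', 'Game Design'), ('CS-347', None)]
```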
Joined relations
Join operations take two relations and return a relation as the result
These additional operations are typically used as subquery expressions in the FROM clause
Join condition - defines which tuples in the two relations match, and what attributes are present in the result of the
join
Join type - defines how tuples in each relation, that do not match any tuple in the other relation (based on the join
condition), are treated
Join types
inner join
Join conditions
natural
on <predicate>
course INNER JOIN prereq ON course.course_id = prereq.course_id
What is the difference between the above (equi-join) and a natural join?
In the equi-join result, course_id appears twice (once from course and once from prereq), whereas a natural join keeps
a single course_id column
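To see the equi-join versus natural-join difference concretely, the sketch below inspects the result columns of both joins. It assumes Python's sqlite3 module and hypothetical two-column versions of course and prereq. The cursor's description attribute holds one entry per result column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (course_id TEXT, title TEXT);
    CREATE TABLE prereq (course_id TEXT, prereq_id TEXT);
    INSERT INTO course VALUES ('BIO-301', 'Genetics'), ('CS-190', 'Game Design');
    INSERT INTO prereq VALUES ('BIO-301', 'BIO-101'), ('CS-190', 'CS-101');
""")

equi = conn.execute(
    "SELECT * FROM course INNER JOIN prereq ON course.course_id = prereq.course_id")
equi_cols = [d[0] for d in equi.description]  # course_id appears twice
equi_rows = equi.fetchall()

natural = conn.execute("SELECT * FROM course NATURAL JOIN prereq")
natural_cols = [d[0] for d in natural.description]  # one course_id column
natural_rows = natural.fetchall()

print(len(equi_cols), natural_cols)  # 4 ['course_id', 'title', 'prereq_id']
```

Both joins match the same pairs of tuples; only the number of copies of the join attribute differs.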
Views
In some cases, it is not desirable for all users to see the entire logical model (that is, all the actual relations stored in
the database)
Consider a person who needs to know an instructor's name and department, but not the salary. This person should
see a relation described, in SQL, by
SELECT ID, name, dept_name
FROM instructor;
A VIEW provides a mechanism to hide certain data from the view of certain users
Any relation that is not of the conceptual model but is made visible to a user as a "virtual relation" is called a VIEW
View definition
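A view is defined using the CREATE VIEW statement which has the form
CREATE VIEW v AS <query expression>
where <query expression> is any legal SQL query expression and v is the view name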
Once a view is defined, the view name can be used to refer to the virtual relation that the view generates
View definition is not the same as creating a new relation by evaluating the query expression
Rather, a view definition causes the saving of an expression; the expression is substituted into queries using the
view
Example views
A view of instructors without their salary
CREATE VIEW faculty AS
SELECT ID, name, dept_name
FROM instructor;
Find all instructors in the Biology department
SELECT name
FROM faculty
WHERE dept_name = 'Biology';
CREATE VIEW physics_fall_2009_watson AS
SELECT course_id, room_number
FROM physics_fall_2009
WHERE building = 'Watson';
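The sketch below shows that a view stores a query expression rather than data, and that one view can be defined using another. It assumes Python's sqlite3 module; the instructor rows and the view name biology_faculty are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary INTEGER)")
conn.execute("INSERT INTO instructor VALUES ('11111', 'Crick', 'Biology', 72000)")  # hypothetical row

# The view hides salary; only the defining query is saved
conn.execute("CREATE VIEW faculty AS SELECT ID, name, dept_name FROM instructor")
# A view defined in terms of another view (hypothetical name)
conn.execute("CREATE VIEW biology_faculty AS SELECT name FROM faculty WHERE dept_name = 'Biology'")


def biology_names():
    return [n for (n,) in conn.execute("SELECT name FROM biology_faculty ORDER BY name")]


before = biology_names()
conn.execute("INSERT INTO instructor VALUES ('22222', 'Xu', 'Biology', 80000)")  # hypothetical row
after = biology_names()  # the view expansion sees the new base-table row

faculty_cols = [d[0] for d in conn.execute("SELECT * FROM faculty").description]
print(before, after, faculty_cols)  # ['Crick'] ['Crick', 'Xu'] ['ID', 'name', 'dept_name']
```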
View expansion
Expand use of a view in a query / another view
A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the expression defining v1
A view relation v1 is said to depend on view relation v2 if either v1 depends directly on v2 or there is a path of
dependencies from v1 to v2
View expansion
A way to define the meaning of views defined in terms of other views
Let view v1 be defined by an expression e1 that may itself contain uses of view relations
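repeat
Find any view relation vi in e1
Replace the view relation vi by the expression defining vi
until no more view relations are present in e1
As long as the view definitions are not recursive, this loop will terminate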
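Update of a view
Add a new tuple to the faculty view which we defined earlier
This insertion must be represented by inserting a tuple into the instructor relation, with salary set to null
What if no department is present in Taylor?
Most SQL implementations allow updates only on simple views:
The FROM clause has only one database relation
The SELECT clause contains only attribute names of the relation and does not have any expressions, aggregates
or DISTINCT specification
Any attribute not listed in the SELECT clause can be set to null
The query does not have a GROUP BY or HAVING clause
What happens when we insert ('25566', 'Brown', 'Biology', 100000) into the history_instructors?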
Materialized views
Materializing a view: Create a physical table containing all the tuples in the result of the query defining the view
If relations used in the query are updated, the materialized view result becomes out of date
Need to maintain the view, by updating the view whenever the underlying relations are updated
📚
Week 3 Lecture 4
Class BSCCS2001
Materials
Module # 14
Type Lecture
Week # 3
Atomic transaction
Example: Bank account transactions, when transferring money from one account to another, the transaction
should either happen or not happen at all.
It should not fail at a stage where money is deducted from one account and not added to the other account
Can turn off auto-commit for a session (for example, using API)
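The sketch below shows an atomic funds transfer. It assumes Python's sqlite3 module, whose connection context manager commits on success and rolls back on an exception. The account table and its CHECK constraint are hypothetical. The credit is applied before the debit, so a failing debit really does leave a partial update that must be undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical account table; the CHECK constraint forbids negative balances
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one transaction: both updates happen or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the debit would make the balance negative
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))


first_ok = transfer(conn, "A", "B", 60)   # succeeds: A = 40, B = 60
second_ok = transfer(conn, "A", "B", 60)  # debit fails, so the credit to B is rolled back too
print(first_ok, second_ok, balances(conn))  # True False {'A': 40, 'B': 60}
```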
Integrity Constraints
Integrity constraints guard against accidental damage to the database by ensuring that the authorized changes to the
database do not result in a loss of data consistency
A salary of a bank employee must be at least Rs. 250.00 an hour
PRIMARY KEY
UNIQUE
UNIQUE(A1 , A2 , ..., Am )
The unique specification states that the attributes A1 , A2 , ..., Am form a candidate key
Referential Integrity
Ensures that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes
in another relation
Example: If "Biology" is a department name appearing in one of the tuples in the instructor relation, then there exists
a tuple in the department relation for "Biology"
Let A be a set of attributes. Let R and S be two relations that contain attributes A, where A is the primary key of S.
A is said to be a FOREIGN KEY of R if for any values of A appearing in R these values also appear in S
CREATE TABLE course (
...
dept_name VARCHAR(20),
FOREIGN KEY (dept_name) REFERENCES department
ON DELETE CASCADE
ON UPDATE CASCADE
...
)
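The sketch below exercises a foreign key with ON DELETE CASCADE and ON UPDATE CASCADE. It assumes Python's sqlite3 module; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. The department and course rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys unless this is on
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT)")
conn.execute("""
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        dept_name TEXT,
        FOREIGN KEY (dept_name) REFERENCES department
            ON DELETE CASCADE
            ON UPDATE CASCADE)""")
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [("Biology", "Watson"), ("Music", "Packard")])
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [("BIO-101", "Biology"), ("MU-199", "Music")])

try:
    conn.execute("INSERT INTO course VALUES ('PHY-101', 'Physics')")  # no such department
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

conn.execute("DELETE FROM department WHERE dept_name = 'Biology'")  # also deletes BIO-101
conn.execute("UPDATE department SET dept_name = 'Music Studies' WHERE dept_name = 'Music'")  # updates MU-199
remaining = conn.execute("SELECT course_id, dept_name FROM course").fetchall()
print(rejected, remaining)  # True [('MU-199', 'Music Studies')]
```

REFERENCES department without a column list refers to department's primary key.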
OR, set father and mother to null initially, update after inserting all persons (not possible if father and mother
attributes declared to be NOT NULL)
Index creation
Indices are data structures used to speed up access to records with specified values for index attributes
Can be executed by using the index to find the required record, without looking at all records of students
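The sketch below checks whether the DBMS actually uses an index for an equality lookup. It assumes Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN; the student table and the index name studentID_index are hypothetical. The exact wording of the plan text varies between SQLite versions, so only the index name is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (ID TEXT, name TEXT, dept_name TEXT)")
conn.execute("CREATE INDEX studentID_index ON student (ID)")  # hypothetical index name

# Lookup on the indexed attribute: the plan should mention the index
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE ID = '12345'").fetchall()
details = [row[-1] for row in plan]  # the last column holds the readable plan step

# Lookup on an unindexed attribute: the plan falls back to scanning the table
scan_plan = [row[-1] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE name = 'Zhang'")]

print(details)
print(scan_plan)
```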
User-defined types
CREATE TYPE construct in SQL creates user-defined type (alias, like typedef in C)
Domains
CREATE DOMAIN construct in SQL-92 creates user-defined domain types
Large-object types
Large objects (photos, videos, CAD files, etc.) are stored as a large object:
blob: binary large object - object is a large collection of uninterpreted binary data (whose interpretation is left to
an application outside of the database system)
clob: character large object - object is a large collection of character data
When a query returns a large object, a pointer is returned rather than the large object itself
Authorization
Forms of authorization on parts of the database:
Insert: allows insertion of new data, but not modification of existing data
A user-id
PUBLIC, which allows all valid users the privilege granted
A role
Granting a privilege on a view does not imply granting any privileges on the underlying relations
The grantor of the privilege must already hold the privilege on specified item (or be the database administrator)
Privileges in SQL
SELECT: allows read access to relation or the ability to query using the view
ALL PRIVILEGES: used as a short form for all the allowable privileges
Example:
REVOKE SELECT ON branch FROM U1, U2, U3;
<privilege list> may be all to revoke all privileges the revokee may hold
If <revokee list> includes public, all users lose the privilege except those granted it explicitly
If the same privilege was granted twice to the same user by different grantors, the user may retain the privilege after
the revocation
All privileges that depend on the privilege being revoked are also revoked
Roles
CREATE ROLE instructor;
Chain of roles
Authorization on views
CREATE VIEW geo_instructor AS
(SELECT *
FROM instructor
WHERE dept_name = 'Geology');
GRANT SELECT ON geo_instructor TO geo_staff;
SELECT *
FROM geo_instructor;
What is
Transfer of privileges
📚
Week 3 Lecture 5
Class BSCCS2001
Materials
Module # 15
Type Lecture
Week # 3
Advanced SQL
Functions and Procedural Constructs
Functions and Procedures
Functions / Procedures and Control Flow statements were added in SQL:1999
Functions/Procedures can be written in SQL itself or in an external programming language like C, Java, etc
Functions written in an external language are particularly useful with specialized data types such as images and
geometric objects
Some database systems support table-valued functions which can return a relation as a result
SQL:1999 also supports a rich set of imperative constructs, including loops, if-then-else and assignment
Many databases have proprietary procedural extensions to SQL that differ from SQL:1999
SQL Functions
Define a function that, given the name of a department, returns the count of the number of instructors in that
department
The function dept_count can be used to find the department names and budget of all departments with more than 12
instructors:
May contain multiple SQL statements between BEGIN and END
RETURN: specifies the value to be returned as the result of invoking the function
SQL functions are in fact parameterized views that generalize the regular notion of views by allowing parameters
Table functions
Functions that return a relation as a result added in SQL:2003
Usage
SELECT *
FROM TABLE (instructor_of('Music'))
SQL procedures
The dept_count function could instead be written as procedure:
Procedures can be invoked either from an SQL procedure or from embedded SQL, using the CALL statement
SQL:1999 allows overloading - more than one function/procedure of the same name as long as the number of
arguments and/or the types of the arguments differ
Warning: Most database systems implement their own variant of the standard syntax
WHILE loop:
WHILE boolean expression DO
sequence of statements;
END WHILE;
REPEAT loop:
REPEAT
sequence of statements;
UNTIL boolean expression
END REPEAT;
FOR loop:
Conditional statements
if-then-else
case
if-then-else statement
The IF statement supports the use of optional ELSEIF clauses and a default ELSE clause
Example procedure: registers student after ensuring classroom capacity is not exceeded
CASE variable
WHEN value1 THEN
sequence of statements;
WHEN value2 THEN
sequence of statements;
...
ELSE
sequence of statements;
END CASE;
The WHEN clause of the CASE statement defines the value that when satisfied determines the flow of control
CASE
WHEN sql-expression = value1 THEN
sequence of statements;
WHEN sql-expression = value2 THEN
sequence of statements;
...
ELSE
sequence of statements;
END CASE;
Any supported SQL expression can be used here. These expressions can contain references to variables,
parameters, special registers and more.
DECLARE out_of_classroom_seats CONDITION
DECLARE EXIT HANDLER FOR out_of_classroom_seats
BEGIN
...
SIGNAL out_of_classroom_seats
...
END
The handler here is EXIT - causes enclosing BEGIN ... END to terminate and exit
Such functions can be more efficient than functions defined in SQL. The computations that cannot be carried out in
SQL can be executed by these functions
Drawbacks:
Code to implement function may need to be loaded into the DB system and executed in the DB system's address
space
There are alternatives, which provide good security at the cost of performance:
Use a safe language like Java, which cannot be used to access/damage other parts of the DB code
Run external language functions/procedures in a separate process, with no access to the DB process' memory
Direct execution in the DB system's address space is used when efficiency is more important than security
Many DB systems support both of the above approaches as well as direct execution in the DB system's address space
Triggers
A TRIGGER defines a set of actions that are performed in response to an INSERT, UPDATE or DELETE operation
on a specified table
When such an SQL operation is executed, the trigger is said to have been activated
Triggers are defined using the CREATE TRIGGER statement
To enforce data integrity rules via referential constraints and check constraints
To cause updates to other tables, automatically generate or transform values for inserted or updated rows, or
invoke functions to perform tasks such as issuing alerts
Specify the events (like UPDATE, INSERT or DELETE) for the trigger to execute
Values that are being updated or inserted can be modified before the DB is actually modified.
You can use triggers that run before an UPDATE or INSERT to ...
Check or modify the values before they are actually updated or inserted in the DB
You can use triggers that run after an update or insert to:
Useful to ensure data integrity when referential integrity constraints aren't appropriate
When table check constraints limit checking to the current table only
Row level triggers are executed whenever a row is affected by the event on which the trigger is defined
Suppose an UPDATE statement is executed to increase the salary of each employee by 10%, and the Employee table has
100 rows
Any row level UPDATE trigger configured on the Employee table will then fire once for each of the 100 rows affected by
this update
Statement level triggers perform a single action for all the rows affected by a statement, instead of executing a
separate action for each affected row
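The sketch below reproduces the 100-row example with a row-level trigger. It assumes Python's sqlite3 module; SQLite supports only FOR EACH ROW triggers, so this cannot show statement-level behaviour. The table names and the audit trigger are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, salary INTEGER)")
conn.execute("CREATE TABLE salary_audit (emp_id INTEGER, old_salary INTEGER, new_salary INTEGER)")
# Row-level trigger: the body runs once for every row the UPDATE touches
conn.execute("""
    CREATE TRIGGER log_salary_change
    AFTER UPDATE OF salary ON employee
    FOR EACH ROW
    BEGIN
        INSERT INTO salary_audit VALUES (OLD.id, OLD.salary, NEW.salary);
    END""")

conn.executemany("INSERT INTO employee (salary) VALUES (?)", [(1000,)] * 100)
conn.execute("UPDATE employee SET salary = salary + salary / 10")  # 10% raise, integer arithmetic

fired = conn.execute("SELECT COUNT(*) FROM salary_audit").fetchone()[0]
sample = conn.execute("SELECT old_salary, new_salary FROM salary_audit LIMIT 1").fetchone()
print(fired, sample)  # 100 (1000, 1100)
```

OLD and NEW refer to the row before and after the update, which is what a per-row trigger can see.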
Uses referencing old table or referencing new table to refer to temporary tables called transition tables
containing the affected rows
Can be more efficient when dealing with SQL statements that update a large number of rows
Triggers can be activated before an event, which can serve as extra constraints
Adding additional values to a table that may not be available to an application (due to security restrictions or other
limitations), such as:
Login/user name
Server/database name
Simple validation
One of the greatest challenges for architects and developers is to ensure that triggers are used appropriately, and
to not allow them to become a one-size-fits-all solution for any data needs that happen to come along
Adding triggers is often seen as faster and easier than adding code to an application, but the cost of doing so is
compounded over time with each added line of code.
Recursive triggers are set to ON. The DB-level setting is set to off by default
Iteration occurs
📚
Week 4 Lecture 1
Class BSCCS2001
Materials
Module # 16
Type Lecture
Week # 4
Relational Algebra
Created by Edgar F. Codd at IBM in 1970
Procedural Language
Select: σ
Project: Π
Union: ∪
Set difference: −
Cartesian product: ×
Rename: ρ
The operators take one or two relations as inputs and produce a new relation as the result
SELECT operation
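Notation: σ_p(r)
p is called the selection predicate: a formula in propositional calculus consisting of terms connected by:
∧ (and)
∨ (or)
¬ (not)
Each term is one of: <attribute> op <attribute> or <attribute> op <constant>, where op is one of =, ≠, >, ≥, <, ≤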
PROJECT operation
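Notation: Π_{A1, A2, ..., Ak}(r), where A1, A2, ..., Ak are attribute names and r is a relation name
The result is defined as the relation of k columns obtained by erasing the columns that are not listed. Duplicate rows
are removed from the result, since relations are sets.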
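The sketch below models σ and Π in plain Python. It rests on assumptions: relations are lists of dicts, the instructor tuples are hypothetical, and select and project are illustrative helper names, not a library API. project drops duplicates because relational algebra works on sets.

```python
def select(predicate, r):
    """σ_p(r): keep the tuples of r for which the predicate holds."""
    return [t for t in r if predicate(t)]


def project(attrs, r):
    """Π_attrs(r): keep only the listed attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for t in r:
        key = tuple(t[a] for a in attrs)
        if key not in seen:  # relations are sets, so each projected tuple appears once
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


# Hypothetical instructor tuples
instructor = [
    {"ID": "1", "name": "Ann", "dept_name": "Physics", "salary": 95000},
    {"ID": "2", "name": "Bob", "dept_name": "Physics", "salary": 87000},
    {"ID": "3", "name": "Cid", "dept_name": "Music", "salary": 40000},
]

# σ_{dept_name = 'Physics' ∧ salary > 90000}(instructor)
rich_physics = select(lambda t: t["dept_name"] == "Physics" and t["salary"] > 90000, instructor)
# Π_{dept_name}(instructor)
depts = project(["dept_name"], instructor)

print([t["name"] for t in rich_physics])  # ['Ann']
print(depts)  # [{'dept_name': 'Physics'}, {'dept_name': 'Music'}]
```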
UNION operation
Notation: r ∪s
Defined as: r ∪ s = {t∣t ∈ r or t ∈ s}
For r ∪ s to be valid:
r, s must have the same arity (same number of attributes)
The attribute domains must be compatible (ie: same data type)
Example: To find all the courses taught in the Fall 2009 semester or in the Spring 2010 semester or in both
DIFFERENCE operation
Notation: r −s
Defined as: r − s = {t ∣ t ∈ r and t ∉ s}
Set differences must be taken between compatible relations
Example: To find all the courses taught in the Fall 2009 semester, but not in the Spring 2010 semester
INTERSECTION operation
Notation: r ∩s
Defined as:
r ∩ s = {t∣t ∈ r and t ∈ s}
Assume: r and s have the same arity and compatible attribute domains
Note: r ∩ s = r − (r − s)
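The sketch below models the set operations with Python sets of tuples and checks the identity r ∩ s = r − (r − s). The course sets are hypothetical, and check_compatible is an illustrative helper that verifies arity only; it assumes, but does not check, domain compatibility.

```python
# Hypothetical single-attribute relations of course_id values
fall_2009 = {("CS-101",), ("CS-315",), ("CS-347",)}
spring_2010 = {("CS-101",), ("FIN-201",)}


def check_compatible(r, s):
    """Only checks arity; domain compatibility is assumed, not verified."""
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("relations must have the same arity")


check_compatible(fall_2009, spring_2010)
union = fall_2009 | spring_2010          # r ∪ s
difference = fall_2009 - spring_2010     # r − s
intersection = fall_2009 & spring_2010   # r ∩ s

print(sorted(union))       # [('CS-101',), ('CS-315',), ('CS-347',), ('FIN-201',)]
print(sorted(difference))  # [('CS-315',), ('CS-347',)]
print(intersection == fall_2009 - (fall_2009 - spring_2010))  # True
```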
CARTESIAN-PRODUCT operation
Notation: r ×s
Defined as:
r × s = {t q∣t ∈ r and q ∈ s}
Assume that attributes of r(R) and s(S) are disjoint
That is, R ∩ S =ϕ
If attributes of r(R) and s(S) are not disjoint, then renaming must be used.
RENAME operation
Allows us to name and, therefore, refer to the results of relational-algebra expressions
Example:
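ρ_x(E)
returns the expression E under the name x
If a relational-algebra expression E has arity n, then ρ_{x(A1, A2, ..., An)}(E) returns the result of expression E
under the name x, and with the attributes renamed to A1, A2, ..., An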
DIVISION operation
The division operation is applied to two relations R(Z) ÷ S(X), where the attributes of S are a subset of those of R
(X ⊆ Z); let Y = Z − X be the set of attributes of R that are not attributes of S
The result of DIVISION is a relation T(Y) that includes a tuple t if tuples t_R appear in R with t_R[Y] = t, and with
t_R[X] = t_S for every tuple t_S in S
For a tuple t to appear in the result T of the DIVISION, the value in t must appear in R in combination with every
tuple in S
DIVISION Example #1
R S R|S
Green Prolog
Green Databases
Lewis Prolog
Smith Databases
DIVISION Example #2
R S R|S
Green Prolog
Green Databases
Lewis Prolog
Smith Databases
DIVISION Example #3
A (sno, pno): (s1, p1), (s1, p2), (s1, p3), (s1, p4), (s2, p1), (s2, p2), (s3, p2), (s4, p2), (s4, p4)
B1 (pno): p2 → A / B1 (sno): s1, s2, s3, s4
B2 (pno): p2, p4 → A / B2 (sno): s1, s4
B3 (pno): p1, p2, p4 → A / B3 (sno): s1
DIVISION Example #4
Relations r, s
r (A, B): (α, 1), (α, 2), (α, 3), (β, 1), (γ, 1), (δ, 1), (δ, 3), (δ, 4), (∈, 6), (∈, 1), (β, 2)
s (B): 1, 2
r ÷ s (A): α, β
DIVISION Example #5
Relations r, s:
r (A, B, C, D, E): (α, a, α, a, 1), (α, a, γ, a, 1), (α, a, γ, b, 1), (β, a, γ, a, 1),
(β, a, γ, b, 3), (γ, a, γ, a, 1), (γ, a, γ, b, 1), (γ, a, β, b, 1)
s (D, E): (a, 1), (b, 1)
r ÷ s (A, B, C): (α, a, γ), (γ, a, γ)
eg: Students who have taken both "a" and "b" courses, with instructor "1"
(Find all the students who have taken all courses given by the instructor 1)
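Below is a minimal Python sketch of division, checked against the data of DIVISION Examples #3, #4 and #5. Relations are modelled as sets of tuples with an explicit attribute list. The function divide is an illustrative helper, not a library API. It groups the X-values seen with each Y-value and keeps the Y-values whose group contains all of S.

```python
def divide(r, r_attrs, s, s_attrs):
    """r ÷ s: tuples t over Y = r_attrs − s_attrs such that (t, u) is in r for every u in s."""
    y_idx = [i for i, a in enumerate(r_attrs) if a not in s_attrs]
    x_idx = [r_attrs.index(a) for a in s_attrs]  # X-values in the same order as s's tuples
    seen = {}
    for t in r:
        y = tuple(t[i] for i in y_idx)
        seen.setdefault(y, set()).add(tuple(t[i] for i in x_idx))
    return {y for y, xs in seen.items() if set(s) <= xs}


# Example #3
A = {("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"), ("s2", "p1"),
     ("s2", "p2"), ("s3", "p2"), ("s4", "p2"), ("s4", "p4")}
print(sorted(divide(A, ["sno", "pno"], {("p2",), ("p4",)}, ["pno"])))  # [('s1',), ('s4',)]

# Example #4
r4 = {("α", 1), ("α", 2), ("α", 3), ("β", 1), ("γ", 1), ("δ", 1),
      ("δ", 3), ("δ", 4), ("∈", 6), ("∈", 1), ("β", 2)}
print(sorted(divide(r4, ["A", "B"], {(1,), (2,)}, ["B"])))  # [('α',), ('β',)]

# Example #5
r5 = {("α", "a", "α", "a", 1), ("α", "a", "γ", "a", 1), ("α", "a", "γ", "b", 1),
      ("β", "a", "γ", "a", 1), ("β", "a", "γ", "b", 3), ("γ", "a", "γ", "a", 1),
      ("γ", "a", "γ", "b", 1), ("γ", "a", "β", "b", 1)}
print(sorted(divide(r5, ["A", "B", "C", "D", "E"], {("a", 1), ("b", 1)}, ["D", "E"])))
# [('α', 'a', 'γ'), ('γ', 'a', 'γ')]
```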
📚
Week 4 Lecture 2
Class BSCCS2001
Materials
Module # 17
Type Lecture
Week # 4
Predicate logic adds the concept of predicates and quantifiers to better capture the meaning of statements that cannot be
adequately expressed by propositional logic
Tuple Relational Calculus and Domain Relational Calculus are based on Predicate Calculus
Predicate
Consider the statement: "x is greater than 3"
It has 2 parts: the variable x, which is the subject of the statement, and the predicate "is greater than 3"
The predicate refers to the property that the subject of the statement can have
The statement "x is greater than 3" can be denoted by P(x) where P denotes the predicate "is greater than 3" and x
is the variable
The predicate P can be considered as a function. It gives the truth value of the statement P(x) at x
Once a value has been assigned to the variable x, the statement P(x) becomes a proposition and has a truth value
(true or false)
In general, a statement involving n variables x1, x2, x3, ..., xn can be denoted by P(x1, x2, x3, ..., xn)
Quantifiers
In predicate logic, predicates are used alongside quantifiers to express the extent to which a predicate is true over a range
of elements
Universal Quantifier
Existential Quantifier
Universal Quantifier
Universal Quantification: Mathematical statements sometimes assert that a property is true for all the values of a
variable in a particular domain, called the Domain of Discourse
The universal quantification of P(x) for a particular domain is the proposition that asserts that P(x) is true for all
values of x in this domain
The domain is very important here since it decides the possible values of x
Formally, the universal quantification of P(x) is the statement "P(x) for all values of x in the domain", denoted ∀x P(x)
Example: Let P(x) be the statement "x + 2 > x". What is the truth value of ∀x P(x), where the domain is all real numbers?
Solution: As x + 2 is greater than x for any real number, P(x) ≡ T for all x, so ∀x P(x) ≡ T
Existential Quantifier
Existential Quantification: Some mathematical statements assert that there is an element with a certain property
Existential quantification can be used to form a proposition that is true if and only if P (x) is true for at least one value of
x in the domain
Formally, the existential quantification of P (x) is the statement "There exists an element x in the domain such that
P (x)"
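The notation ∃x P(x) denotes the existential quantification of P(x)
Example: Let P(x) be the statement "x > 5". What is the truth value of ∃x P(x), where the domain is all real numbers?
Solution: P(x) is true for all real numbers greater than 5 and false for all real numbers less than or equal to 5, so
∃x P(x) ≡ T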
Tuple Relational Calculus (TRC) is a non-procedural query language; each query is of the form
{t ∣ P(t)}
where t = resulting tuples
P(t) = known as the predicate; these are the conditions that are used to fetch t
P(t) may have various conditions logically combined with OR (∨), AND (∧), NOT (¬)
It also uses quantifiers:
∃t ∈ r(Q(t)) ≡ "there exists" a tuple t in relation r such that predicate Q(t) is true
∀t ∈ r(Q(t)) ≡ Q(t) is true "for all" tuples t in relation r
{P ∣ ∃S ∈ Students (S.CGPA > 8 ∧ P.name = S.name ∧ P.age = S.age)}:
returns the name and age of students with a CGPA above 8
TRC Example #1
Student
Solution:
Fname
David
Varun
Simi
TRC Example #2
Consider the relational schema
Q. 2: Find out the names of all the students who have taken the course named 'DBMS'
{s.name, s.rollNo ∣ s ∈ student ∧ ∃c ∈ course(s.courseId = c.courseId ∧ c.cname = 'DBMS')}
{t ∣ ∃s ∈ student ∃c ∈ course(s.courseId = c.courseId ∧ c.cname = 'DBMS' ∧ t.name = s.name ∧
t.rollNo = s.rollNo)}
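To make the TRC query concrete, the sketch below reads it almost literally as a Python set comprehension. The student and course tuples are hypothetical and assume one courseId per student row. Iterating over student plays the role of ∃s ∈ student, any() plays the role of ∃c ∈ course, and the result is a set of (name, rollNo) pairs.

```python
# Hypothetical relations, one dict per tuple
student = [
    {"name": "David", "rollNo": 1, "courseId": "C1"},
    {"name": "Varun", "rollNo": 2, "courseId": "C2"},
    {"name": "Simi", "rollNo": 3, "courseId": "C1"},
]
course = [{"courseId": "C1", "cname": "DBMS"}, {"courseId": "C2", "cname": "OS"}]

# {t | ∃s ∈ student ∃c ∈ course (s.courseId = c.courseId ∧ c.cname = 'DBMS'
#                               ∧ t.name = s.name ∧ t.rollNo = s.rollNo)}
result = {
    (s["name"], s["rollNo"])
    for s in student
    if any(s["courseId"] == c["courseId"] and c["cname"] == "DBMS" for c in course)
}
print(sorted(result))  # [('David', 1), ('Simi', 3)]
```

Every value in the result comes from a tuple in student, so this expression is safe in the sense described under Safety of Expressions.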
TRC Example #3
Consider the following relations:
RA
TRC Example #4
Consider the following relations:
Q. 5: Find the names and salaries of certified pilots working on Boeing aircrafts
RA
TRC Example #5
Consider the following relations:
Q. 6: Identify the flights that can be piloted by every pilot whose salary is more than $100,000
Safety of Expressions
It is possible to write tuple calculus expressions that generate infinite relations
For example, {t ∣ ¬t ∈ r} results in an infinite relation if the domain of any attribute of the relation r is infinite
To guard against the problem, we restrict the set of allowable expressions to safe expressions
An expression {t ∣ P (t)} in the tuple relational calculus is safe if every component of t appears in one of the
relations, tuples or constants that appear in P
Eg: {t ∣ t[A] = 5 ∨ true} is not safe → it defines an infinite set with attribute values that do not appear in any
relation or tuples or constants in P
Domain Relational Calculus
A non-procedural query language equivalent in power to the tuple relational calculus
Equivalence of Relational Algebra, Tuple Relational Calculus & Domain Relational Calculus
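SELECT operation
R = (A, B)
Relational Algebra: σ_{B=17}(r)
Tuple Calculus: {t ∣ t ∈ r ∧ t[B] = 17}
Domain Calculus: {< a, b > ∣ < a, b > ∈ r ∧ b = 17}
PROJECT operation
R = (A, B)
Relational Algebra: Π_A(r)
Tuple Calculus: {t ∣ ∃p ∈ r (t[A] = p[A])}
Domain Calculus: {< a > ∣ ∃b (< a, b > ∈ r)}
COMBINING operation
R = (A, B)
Relational Algebra: Π_A(σ_{B=17}(r))
Tuple Calculus: {t ∣ ∃p ∈ r (t[A] = p[A] ∧ p[B] = 17)}
Domain Calculus: {< a > ∣ ∃b (< a, b > ∈ r ∧ b = 17)}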
UNION
R = (A, B, C) S = (A, B, C)
Relational Algebra: r ∪s
Tuple Calculus: {t ∣ t ∈ r ∨ t ∈ s}
Domain Calculus: {< a, b, c > ∣ < a, b, c >∈ r ∨ < a, b, c >∈ s}
SET DIFFERENCE
R = (A, B, C) S = (A, B, C)
Relational Algebra: r −s
Tuple Calculus: {t ∣ t ∈ r ∧ t ∉ s}
Domain Calculus: {< a, b, c > ∣ < a, b, c > ∈ r ∧ < a, b, c > ∉ s}
INTERSECTION
R = (A, B, C) S = (A, B, C)
Relational Algebra: r ∩s
Tuple Calculus: {t ∣ t ∈ r ∧ t ∈ s}
Domain Calculus: {< a, b, c > ∣ < a, b, c >∈ r ∧ < a, b, c >∈ s}
CARTESIAN PRODUCT
R = (A, B) S = (C, D)
Relational Algebra: r ×s
Tuple Calculus: {t ∣ ∃p ∈ r∃q ∈ s(t[A] = p[A] ∧ t[B] = p[B] ∧ t[C] = q[C] ∧ t[D] = q[D])}
Domain Calculus: {< a, b, c, d > ∣ < a, b >∈ r∧ < c, d >∈ s}
NATURAL JOIN
R = (A, B, C, D) S = (B, D, E)
Relational Algebra:
r ⋈ s
Π_{r.A, r.B, r.C, r.D, s.E}(σ_{r.B = s.B ∧ r.D = s.D}(r × s))
Tuple Calculus:
{t ∣ ∃ p ∈ r ∃ q ∈ s(t[A] = p[A] ∧ t[B] = p[B] ∧ t[C] = p[C] ∧ t[D] = p[D] ∧ t[E] = q[E] ∧ p[B] =
q[B] ∧ p[D] = q[D])}
Domain Calculus: {< a, b, c, d, e > ∣ < a, b, c, d > ∈ r ∧ < b, d, e > ∈ s}
DIVISION
R = (A, B) S = (B)
Relational Algebra: r ÷s
Tuple Calculus: {t ∣ ∃p ∈ r (t[A] = p[A]) ∧ ∀q ∈ s (∃u ∈ r (u[A] = t[A] ∧ u[B] = q[B]))}
Domain Calculus: {< a > ∣ ∃c (< a, c > ∈ r) ∧ ∀b (< b > ∈ s ⇒ < a, b > ∈ r)}
Source: https://www2.cs.sfu.ca/CourseCentral/354/louie/Equiv_Notations.pdf
📚
Week 4 Lecture 3
Class BSCCS2001
Materials
Module # 18
Type Lecture
Week # 4
Entity-Relationship Model
Design Process
What is a Design?
A Design:
Satisfies restrictions on the design itself, such as its length or cost, or the tools available for doing the design
Role of Abstraction
Disorganized Complexity results from
Storage (STM) limitations of the human brain - an individual can simultaneously comprehend on the order of
seven, plus or minus two, chunks of information
Speed limitations of human brain - it takes the mind about five seconds to accept a new chunk of information
Abstraction provides the major tool to handle Disorganized Complexity by chunking information
Ignore inessential details, deal only with the generalized, idealized model of the world
Example: the binary number 110010101001 is hard to remember
Try the octal form: (110)(010)(101)(001) ⟹ 6251
Or the hex form: (1100)(1010)(1001) ⟹ CA9
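A quick Python check of the chunking example: grouping 3 bits per octal digit and 4 bits per hex digit gives the same number as a direct conversion. The grouping works cleanly here only because the 12-bit string divides evenly by both 3 and 4.

```python
bits = "110010101001"
value = int(bits, 2)

# One octal digit per 3 bits, one hex digit per 4 bits
octal_groups = [bits[i:i + 3] for i in range(0, len(bits), 3)]
hex_groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]

print(octal_groups, format(value, "o"))  # ['110', '010', '101', '001'] 6251
print(hex_groups, format(value, "X"))    # ['1100', '1010', '1001'] CA9
print(value)                             # 3241
```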
Model Building
Physics Electrical Circuits
Design Approach
Requirement Analysis: Analyse the data needs of the prospective DB users
Planning
System Defining
Logical Model
Physical Model
Implementation
Testing
Computer Science Decision: What relation schemas should we have, and how should the attributes be distributed among the various relation schemas?
Entity Relationship Model
Database Normalization
The ER model is useful in mapping the meanings and interactions of real-world enterprises onto a conceptual schema
Attributes
Entity sets
Relationship sets
The ER model also has an associated diagrammatic representation, the ER diagram, which can express the overall
logical structure of a DB graphically
Attributes
An attribute is a property associated with an entity / entity set
Attribute types:
Derived attributes
Domain: The set of permitted values for each attribute
Attributes: Composite
Entity sets
An entity is an object that exists and is distinguishable from other objects
An entity set is a set of entities of the same type that share the same properties
An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set
Example:
A subset of the attributes forms a primary key of the entity set, that is, it uniquely identifies each member of the set
44553 Peltier
Relationship sets
A relationship is an association among several entities
Example:
A relationship set is a mathematical relation among n ≥ 2 entities, each taken from entity sets
{(e₁, e₂, ..., eₙ) ∣ e₁ ∈ E₁, e₂ ∈ E₂, ..., eₙ ∈ Eₙ}
where (e₁, e₂, ..., eₙ) is a relationship
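The definition can be read as a membership test: a set of n-tuples is a relationship set over E₁, ..., Eₙ when every component eᵢ comes from Eᵢ. A minimal Python sketch, assuming entity sets are modeled as sets of primary-key values; 44553 and 22222 come from the advisor example in these notes, while 12345 is a made-up ID and is_relationship_set is a hypothetical helper.

```python
# Entity sets modeled as sets of primary-key values (an assumption for illustration)
student = {44553, 12345}      # 12345 is a made-up student ID
instructor = {22222}

# advisor ⊆ student × instructor
advisor = {(44553, 22222)}

def is_relationship_set(rel, *entity_sets):
    """True if n >= 2 and every tuple (e1, ..., en) in rel has ei ∈ Ei."""
    n = len(entity_sets)
    return n >= 2 and all(
        len(t) == n and all(e in E for e, E in zip(t, entity_sets))
        for t in rel
    )

print(is_relationship_set(advisor, student, instructor))           # True
print(is_relationship_set({(22222, 44553)}, student, instructor))  # False
```

The second call fails because component order matters: the first position must hold a student.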
Example: (44553, 22222) ∈ advisor
A relationship set may also have attributes. For instance, the advisor relationship set between entity sets instructor and student may have the attribute date, which tracks when the student started being associated with the advisor
Binary relationship
Attributes: Redundant
Suppose we have entity sets:
instructor, with attributes: ID, name, dept_name, salary
department, with primary key dept_name
We model the fact that each instructor has an associated department using a relationship set inst_dept
The attribute dept_name in instructor is then redundant: since dept_name is the primary key of the entity set department, it replicates information already present in the relationship inst_dept, so it should be removed from the entity set instructor
BUT: when converting back to tables, in some cases the attribute gets re-introduced, as we will see later
Mapping Cardinalities
For a binary relationship set the mapping cardinality must be one of the following types:
One to One
One to Many
Many to One
Many to Many
NOTE: Some elements in A and B may not be mapped to any elements in the other set
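Given a concrete instance of a binary relationship set as (a, b) pairs, its pattern can be classified by counting how many partners each entity has. This Python sketch is illustrative only (mapping_cardinality is a hypothetical helper): an instance can show that a cardinality constraint is violated, but the constraint itself must hold for every legal instance, so the result describes the data, not the schema.

```python
from collections import Counter

def mapping_cardinality(rel):
    """Classify an instance of a binary relationship set between A and B.

    rel is a set of (a, b) pairs, a from entity set A and b from entity set B.
    """
    per_a = Counter(a for a, _ in rel)   # number of B partners of each a
    per_b = Counter(b for _, b in rel)   # number of A partners of each b
    each_a_at_most_one_b = all(n <= 1 for n in per_a.values())
    each_b_at_most_one_a = all(n <= 1 for n in per_b.values())
    if each_a_at_most_one_b and each_b_at_most_one_a:
        return "one-to-one"
    if each_b_at_most_one_a:
        return "one-to-many"    # an a may have many b's, each b has at most one a
    if each_a_at_most_one_b:
        return "many-to-one"    # each a has at most one b, a b may have many a's
    return "many-to-many"

print(mapping_cardinality({(1, "x"), (1, "y")}))  # one-to-many
```

Entities that take part in no pair simply do not appear in the counters, which matches the note that some elements need not be mapped at all.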
A strong entity set is an entity set that contains sufficient attributes to uniquely identify all its entities
A weak entity set is an entity set that does not contain sufficient attributes to uniquely identify its entities
In other words, a primary key does not exist for a weak entity set
Since a weak entity set does not have a primary key, it cannot independently exist in the ER model
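The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among the weak entities that depend on the same strong entity
The combination of the discriminator and the primary key of the strong entity set makes it possible to uniquely identify all entities of the weak entity set
Thus, this combination serves as the primary key for the weak entity set
Note that this primary key is not formed entirely from the weak entity set's own attributes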
Primary Key of a Weak Entity Set = Its own discriminator + Primary Key of Strong Entity Set
Weak entity set must have total participation in the identifying relationship
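Example: strong entity set Building and weak entity set Apartment, linked by the identifying relationship BA
building_no is the primary key of Building
door_no is the discriminator of Apartment, as door_no alone cannot identify an apartment uniquely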
There may be several other buildings having the same door number
By total participation in BA, each apartment must be associated with at least one building
In contrast, Building has only partial participation in BA, as there may exist some buildings that have no apartments
Primary Key of Apartment = Primary Key of the Building + Its own discriminator = building_no + door_no
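A minimal Python sketch of the Building/Apartment example, assuming frozen dataclasses stand in for entity sets; the class names and the key() helper are hypothetical. Making building a required field mirrors total participation in BA, and key() builds the primary key from building_no plus the discriminator door_no.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Building:              # strong entity set, primary key building_no
    building_no: int

@dataclass(frozen=True)
class Apartment:             # weak entity set, discriminator door_no
    building: Building       # required: total participation in BA
    door_no: int

    def key(self):
        """Primary key = primary key of Building + own discriminator."""
        return (self.building.building_no, self.door_no)

b1, b2 = Building(1), Building(2)
apartments = [Apartment(b1, 101), Apartment(b2, 101)]   # same door_no, different buildings
keys = {apt.key() for apt in apartments}
print(len({apt.door_no for apt in apartments}), len(keys))  # 1 2
```

door_no alone collapses the two apartments into one value, while (building_no, door_no) keeps them apart.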
Suppose we create a relationship set sec_course between entity sets section and course
Note that the information in sec_course is redundant, since section already has an attribute course_id, which identifies
the course with which the section is related
📚
Week 4 Lecture 4
Class BSCCS2001
Materials
Module # 19
Type Lecture
Week # 4
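Entity sets are drawn as rectangles; attributes are listed inside the entity rectangle
Underline indicates primary key attributes
Example (figure): instructor with attributes ID, name, salary; student with attributes ID, name, tot_cred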
Relationship sets
Diamonds represent relationship sets
Relationship sets with attributes
Roles
The entity sets of a relationship need not be distinct
Cardinality Constraints
We express cardinality constraints by drawing either a directed line ( → ), signifying "one" or an undirected line (−),
signifying "many" between the relationship set and the entity set
One-to-One relationship
A student is associated with at most one instructor via the relationship advisor
An instructor is associated with at most one student via the relationship advisor
One-to-Many relationship
One-to-Many relationship between an instructor and a student
Many-to-Many relationship
An instructor is associated with several (including 0) students via advisor
Partial participation: some entities may not participate in any relationship in the relationship set
A maximum value of 1 indicates that the entity participates in at most one relationship
Instructor can advise 0 or more students
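Example: an entity set with composite (name, address, street), multivalued ({ phone_number }) and derived (age()) attributes
ID
name
  first_name
  middle_initial
  last_name
address
  street
    street_number
    street_name
    apt_number
  city
  state
  zip
{ phone_number }
date_of_birth
age()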
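The attribute types in this listing can be mirrored in Python as a sketch under assumptions: nested dataclasses stand for composite attributes (name, address, street), a set for the multivalued { phone_number }, and a method for the derived age(). The class name Instructor follows the instructor examples elsewhere in these notes but is assumed here, as are the field types; passing today as a parameter is also an assumption, made so that age() is deterministic.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Name:                  # composite attribute
    first_name: str
    middle_initial: str
    last_name: str

@dataclass
class Street:                # composite attribute nested inside address
    street_number: str
    street_name: str
    apt_number: str

@dataclass
class Address:               # composite attribute
    street: Street
    city: str
    state: str
    zip: str

@dataclass
class Instructor:
    ID: str
    name: Name
    address: Address
    date_of_birth: date
    phone_number: set[str] = field(default_factory=set)   # multivalued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (self.date_of_birth.month,
                                                     self.date_of_birth.day)
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

# Hypothetical values for illustration only
inst = Instructor(
    ID="10101",
    name=Name("Jane", "Q", "Doe"),
    address=Address(Street("12", "Main St", "4B"), "Springfield", "IL", "62701"),
    date_of_birth=date(1990, 6, 15),
    phone_number={"456-7890", "123-4567"},
)
print(inst.age(date(2024, 6, 14)))  # 33
```

Keeping age() as a method rather than a field is the point of a derived attribute: it cannot drift out of sync with date_of_birth.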
The relationship set connecting the weak entity set to the identifying strong entity set is depicted by a double diamond
ER Model to Relational Schema
Reduction to Relation Schema
Entity sets and relationship sets can be expressed uniformly as relation schemas that represent the contents of the
DB
For each entity set and relationship set there is a unique schema that is assigned the name of the corresponding
entity set or relationship set
Each schema has a number of columns (generally corresponding to attributes) which have unique names
A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set
Representing relationship sets
A many-to-many relationship set is represented as a schema with attributes for the primary keys of the two
participating entity sets and any descriptive attributes of the relationship set
Composite attributes are flattened out by creating a separate attribute for each component attribute
Example: Given entity set instructor with composite attribute name, with component attributes first_name and last_name, the schema corresponding to the entity set has two attributes, name_first_name and name_last_name
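Flattening can be applied recursively, so nested composites such as address with its street component also get one column per leaf. A Python sketch, assuming an entity is given as a nested dict; flatten and the sample values are hypothetical, and the underscore naming follows the name_first_name convention.

```python
def flatten(record, prefix=""):
    """Flatten nested composite attributes into prefix_component columns."""
    flat = {}
    for attr, value in record.items():
        column = f"{prefix}{attr}"
        if isinstance(value, dict):                 # composite: recurse into components
            flat.update(flatten(value, prefix=column + "_"))
        else:                                       # simple attribute: keep as a column
            flat[column] = value
    return flat

instructor = {"ID": "22222", "name": {"first_name": "Jane", "last_name": "Doe"}}
print(flatten(instructor))
# {'ID': '22222', 'name_first_name': 'Jane', 'name_last_name': 'Doe'}
```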
A multivalued attribute M of an entity E is represented by a separate schema EM
Schema EM has attributes corresponding to the primary key of E and an attribute corresponding to the multivalued attribute M
Each value of the multivalued attribute maps to a separate tuple of the relation on schema EM
For example: an instructor entity with primary key 22222 and phone numbers 456-7890 and 123-4567 maps to
two tuples: (22222, 456-7890) and (22222, 123-4567)
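The 22222 example can be reproduced with a short Python sketch. The helper name multivalued_to_tuples and the schema name inst_phone are hypothetical, and entities are assumed to be dicts holding the multivalued attribute as a set.

```python
def multivalued_to_tuples(entities, key, attr):
    """Relation on schema EM: one (primary key, value) tuple per value of attr."""
    return {(e[key], value) for e in entities for value in e[attr]}

instructors = [{"ID": 22222, "phone_number": {"456-7890", "123-4567"}}]
inst_phone = multivalued_to_tuples(instructors, "ID", "phone_number")
print(sorted(inst_phone))  # [(22222, '123-4567'), (22222, '456-7890')]
```

An entity with no phone numbers contributes no tuples, so no null values are needed in EM.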
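Many-to-one and one-to-many relationship sets that are total on the "many" side can be represented by adding an extra attribute to the "many" side, containing the primary key of the "one" side
Example: Instead of creating a schema for relationship set inst_dept, add an attribute dept_name to the schema arising from entity set instructor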
For One-to-One relationship sets, either side can be chosen to act as the "many" side
That is, an extra attribute can be added to either of the tables corresponding to the two entity sets
If participation is partial on the "many" side, replacing a schema by an extra attribute in the schema corresponding to
the "many" side could result in null values
The schema corresponding to a relationship set linking a weak entity set to its identifying strong entity set is redundant
Example: The section schema already contains the attributes that would appear in the sec_course schema
📚
Week 4 Lecture 5
Class BSCCS2001
Materials
Module # 20
Type Lecture
Week # 4
We allow at most one arrow out of a ternary (or greater degree) relationship to indicate a cardinality constraint
For example, an arrow from proj_guide to instructor indicates each student has at most one guide for a project
If there is more than one arrow, there are two ways of defining the meaning
For example, a ternary relationship R between A, B and C with arrows to B and C could mean either
each A entity is associated with a unique entity from B and C, or
each pair of entities from (A, B) is associated with a unique C entity, and each pair (A, C) is associated with a unique B entity
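The single-arrow reading for proj_guide can be checked on an instance: group the triples by (student, project) and require at most one instructor per group. A Python sketch with made-up data, assuming each triple is ordered (instructor, student, project); has_at_most_one_guide is a hypothetical helper.

```python
from collections import defaultdict

# Made-up instance of proj_guide as (instructor, student, project) triples
proj_guide = {("i1", "s1", "p1"), ("i2", "s1", "p2"), ("i1", "s2", "p1")}

def has_at_most_one_guide(rel):
    """Arrow to instructor: each (student, project) pair has at most one instructor."""
    guides = defaultdict(set)
    for instructor, student, project in rel:
        guides[(student, project)].add(instructor)
    return all(len(ins) <= 1 for ins in guides.values())

print(has_at_most_one_guide(proj_guide))                         # True
print(has_at_most_one_guide(proj_guide | {("i2", "s1", "p1")}))  # False
```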
Specialization: ISA
Top-down design process: We designate sub-groupings within an entity set that are distinctive from other entities in
the set
These sub-groupings become lower-level entity sets that have attributes or participate in relationships that do not
apply to the higher-level entity set
Depicted by a triangle component labeled ISA (eg: instructor "is a" person)
Attribute inheritance: A lower-level entity set inherits all the attributes and relationship participation of the higher-
level entity set to which it is linked
Method 1:
Form a schema for the higher-level entity set
Form a schema for each lower-level entity set, including the primary key of the higher-level entity set and local attributes
Drawback: Getting information about an employee requires accessing two relations, the one corresponding to the
low-level schema and the one corresponding to the high-level schema
Method 2:
Form a schema for each entity set with all local and inherited attributes
Drawback: name, street and city may be stored redundantly for people who are both students and employees
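The trade-off between the two methods shows up even with dict-based tables. In this Python sketch all table contents and helper names are hypothetical (attributes follow the name, street, city example): Method 1 needs two lookups to rebuild an employee, while Method 2 answers with one lookup but repeats name, street and city in both employee2 and student2.

```python
# Method 1: higher-level table plus lower-level tables with local attributes only
person   = {1: {"name": "Ann", "street": "Main St", "city": "Springfield"}}
employee = {1: {"salary": 50000}}
student  = {1: {"tot_cred": 30}}

def employee_info_method1(emp_id):
    """Combine the low-level and the high-level relation (two lookups)."""
    return {"ID": emp_id, **person[emp_id], **employee[emp_id]}

# Method 2: each lower-level table repeats the inherited attributes
employee2 = {1: {"name": "Ann", "street": "Main St", "city": "Springfield", "salary": 50000}}
student2  = {1: {"name": "Ann", "street": "Main St", "city": "Springfield", "tot_cred": 30}}

def employee_info_method2(emp_id):
    """One lookup, but name/street/city are stored twice for person 1."""
    return {"ID": emp_id, **employee2[emp_id]}

print(employee_info_method1(1) == employee_info_method2(1))  # True
```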
Generalization
Bottom-up design process: Combine a number of entity sets that share the same features into a higher-level entity
set
Specialization and generalization are simple inversions of each other; they are represented in an ER diagram in the
same way
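Completeness constraint: specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization
total: an entity must belong to one of the lower-level entity sets
partial: an entity need not belong to one of the lower-level entity sets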
We can specify total generalization in an ER diagram by adding the keyword total in the diagram
Drawing a dashed line from the keyword to the corresponding hollow arrow-head to which it applies (for a total
generalization) or to the set of hollow arrow-heads to which it applies (for an overlapping generalization)
Because the higher-level entity set arrived at through generalization is generally composed of only those entities
in the lower-level entity sets, the completeness constraint for a generalized higher-level entity set is usually total
Aggregation
Consider the ternary relationship proj_guide, which we saw earlier
Relationship sets eval_for and proj_guide represent overlapping information
However, some proj_guide relationships may not correspond to any eval_for relationships
Eliminate this redundancy via aggregation: treat the relationship proj_guide as an abstract entity, which allows relationships between relationships (such as eval_for) without introducing redundancy
Representing aggregation via Schema
To represent aggregation, create a schema containing
In our example, the schema eval_for is:
Design Issues
Entities v/s Attributes
Use of entity sets v/s attributes
Use of phone as an entity allows extra information about phone numbers (plus multiple phone numbers)
A possible guideline is to designate a relationship set to describe an action that occurs between entities
Placement of relationship attributes
Some relationships that appear to be non-binary may be better represented using binary relationships
For example, a ternary relationship parents, relating a child to his/her father and mother, is best replaced by two
binary relationships, father and mother
Using two binary relationships allows partial information (eg: only mother being known)
Example: proj_guide
Replace R between entity sets A, B and C by an entity set E, and three relationship sets:
R_A, relating E and A
R_B, relating E and B
R_C, relating E and C
Create an identifying attribute for E and add any attributes of R to E
For each relationship (aᵢ, bᵢ, cᵢ) in R:
create a new entity eᵢ in the entity set E
add (eᵢ, aᵢ) to R_A
add (eᵢ, bᵢ) to R_B
add (eᵢ, cᵢ) to R_C
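The conversion steps translate almost line for line into Python. This sketch assumes R is a set of (a, b, c) triples and uses consecutive integers as the identifying attribute of E; both choices, and the helper names, are illustrative. Rebuilding R from R_A, R_B and R_C shows that no information is lost for instances that came from R.

```python
from itertools import count

def ternary_to_binary(R):
    """Replace ternary relationship R by entity set E and binary sets R_A, R_B, R_C."""
    next_id = count(1)                 # assumed identifying attribute: 1, 2, 3, ...
    E, R_A, R_B, R_C = set(), set(), set(), set()
    for a, b, c in sorted(R):          # sorted only so the generated ids are deterministic
        e = next(next_id)              # create a new entity e_i in E
        E.add(e)
        R_A.add((e, a))                # add (e_i, a_i) to R_A
        R_B.add((e, b))                # add (e_i, b_i) to R_B
        R_C.add((e, c))                # add (e_i, c_i) to R_C
    return E, R_A, R_B, R_C

def rebuild(E, R_A, R_B, R_C):
    """Recover R by following each e through the three binary relationship sets."""
    a_of, b_of, c_of = dict(R_A), dict(R_B), dict(R_C)
    return {(a_of[e], b_of[e], c_of[e]) for e in E}

R = {("a1", "b1", "c1"), ("a1", "b2", "c1")}
E, R_A, R_B, R_C = ternary_to_binary(R)
print(sorted(R_A))                     # [(1, 'a1'), (2, 'a1')]
print(rebuild(E, R_A, R_B, R_C) == R)  # True
```

The reverse direction only works because every e appears exactly once in each of R_A, R_B and R_C; an arbitrary instance of the translated schema need not satisfy that, which is why the constraints must also be translated.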
Also need to translate constraints
There may be instances in the translated schema that cannot correspond to any instance of R
Exercise: add constraints to the relationships R_A, R_B and R_C to ensure that a newly created entity corresponds to exactly one entity in each of the entity sets A, B and C
We can avoid creating an identifying attribute by making E a weak entity set identified by the three relationship sets
ER Design Decisions
The use of an attribute or entity set to represent an object
The use of aggregation: the aggregate entity set can be treated as a single unit without concern for the details of its internal structure