Quiz 1 Notes DBMS

📚
Week 1 Lecture 1
Class BSCCS2001
Created @August 19, 2021 1:46 PM
Materials https://drive.google.com/drive/folders/19FhdYYKeH3ZshWhoZIJlP_MC1nVnUUmU?usp=sharing
Module # 1
Type Lecture
Week # 1
Database Management Systems (DBMS)
🚨 DBMS: A database management system (or DBMS) is essentially nothing more than a computerized data-keeping system. (via IBM)
DBMS contains info about a particular enterprise

Collection of interrelated data
Set of programs to access the data
An environment that is both convenient and efficient to use
Database Applications:
Banking: transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Online retailers: order tracking, customized recommendations
Manufacturing: production, inventory, orders, supply chain
HR: employee records, salaries, tax deductions
Databases can be very large

Databases touch various aspects of our lives
University Database Example
Application program examples
Add new students, instructors and courses
Register students for courses and generate class rosters
Assign grades to students, compute Grade Point Average (GPA) and generate transcripts
In early days, database applications were built directly on top of file systems
Drawbacks of using file systems to store data

Data redundancy and inconsistency
Multiple file formats, duplication of information in different files
Difficulty in accessing data
Need to write a new program to carry out each new task
Data isolation
Multiple files and formats
Integrity problems
Integrity constraints (eg: account balance > 0) become "buried" in program code rather than being stated explicitly
Hard to add new constraints or change existing ones
Atomicity of updates
Failures may leave databases in an inconsistent state with partial updates carried out
Example: Transfer of funds from one account to another should either complete or not happen at all
Concurrent access by multiple users
Concurrent access needed for performance
Uncontrolled concurrent accesses can lead to inconsistencies
Example: Two people reading a balance (say 100) and updating it by withdrawing money (say 50 each) at
the same time
Security problems
Hard to provide user access to some, but not all, data
Database systems offer solutions to the above problems
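The concurrent-withdrawal example above can be sketched in a few lines of Python. This is a minimal simulation, not real concurrency: the two users are hypothetical, and their reads and writes are interleaved by hand to show how one update gets lost.

```python
# Simulated interleaving of two withdrawals on a shared balance.
# All names here (balance, read_a, read_b) are illustrative only.
balance = 100

# Both users read the balance before either one writes.
read_a = balance
read_b = balance

# Each user withdraws 50 based on the stale value they read.
balance = read_a - 50   # user A writes 50
balance = read_b - 50   # user B overwrites it with 50 again

uncontrolled_result = balance

# Serial execution: user B reads only after user A has written.
balance = 100
balance = balance - 50  # user A
balance = balance - 50  # user B
serial_result = balance

print(uncontrolled_result, serial_result)
```

With uncontrolled access, 100 has been withdrawn but the stored balance only drops by 50. Running the two withdrawals one after the other gives the correct balance.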
Course pre-requisites:
Set Theory
Definition of a set
Intensional definition
Extensional definition
Set-builder notation
Membership, Subset, Superset, Power set, Universal set
Operations on sets:
Unions, Intersections, Complement, Difference, Cartesian product
De-Morgan's Law
Relations and Functions

Definition of Relations
Ordered pairs and Binary relations
Domain and Range
Image, Pre-image, Inverse
Properties: Reflexive, Symmetric, Anti-symmetric, Transitive, Total
Definition of functions
Properties of functions: Injective, Surjective, Bijective
Composition of functions
Inverse of functions
Propositional Logic
Truth values and Truth tables
Operators: conjunction (and), disjunction (or), negation (not), implication, equivalence
Closure under Operations
Predicate Logic
Predicates
Quantification
Existential
Universal
Python
Algorithms and Programming in C

Sorting
Merge sort
Quick sort
Search
Linear search
Binary search
Interpolation search
Data Structures
Arrays
List
Binary Search Tree
Balanced Tree
B - Tree
Hash table/map
Object-Oriented analysis and design

Refresher material
Discrete Mathematics by Brilliant: https://brilliant.org/wiki/discrete-mathematics
Python
IITM online book: https://pypod.github.io
Cheatsheet: https://www.pythoncheatsheet.org
DataCamp Cheatsheet: https://www.datacamp.com/community/tutorials/python-data-science-cheat-sheet-basics
C Language: https://www.youtube.com/watch?v=zYierUhIFNQ&list=PLhQjrBD2T382_R182iC2gNZI9HzWFMC_8&index=2 (part of CS50 2020 Lectures)
📚
Week 1 Lecture 2
Class BSCCS2001
Materials
Module # 2
Type Lecture
Week # 1
Why DBMS?
Data Management
Storage
Retrieval
Transaction
Audit
Archival
For
Individuals
Small / Big Enterprises
Global
There have been 2 major approaches in this practice:
1. Physical:
Physical Data or Records Management, more formally known as Book Keeping, has been using physical ledgers
and journals for centuries
The most significant development happened when Henry Brown patented a "receptacle for storing and preserving
papers" on November 2, 1886
Herman Hollerith adapted the punch cards used for weaving looms to act as the memory for a mechanical tabulating
machine in 1890
2. Electronic:
Electronic Data or Records management moves with the advances in technology, especially of memory, storage,
computing and networking
1950s: Computer programming started
1960s: Data Management with punch cards / tapes and magnetic tapes
1970s:
COBOL and CODASYL approach was introduced in 1971
On October 14, 1979, Apple II platform shipped VisiCalc, marking the birth of spreadsheets
Magnetic disks became prevalent
1980s: RDBMS changed the face of data management
1990s: With internet, data management started becoming global
2000s: e-Commerce boomed, NoSQL was introduced for unstructured data management
2010s: Data Science started riding high
Electronic Data Management Params

Electronic Data or Records management depends on various params including ...
Durability
Scalability
Security
Retrieval
Ease of Use
Consistency
Efficiency
Cost
Book Keeping
A book register was maintained on which the shop owner wrote the amount received from customers, the amount due for
any customer, inventory details and so on ...
Problems with such an approach of book keeping:
Durability: Physical damage to these registers is a possibility due to rodents, humidity, wear and tear
Scalability: Very difficult to maintain over the years, some shops have numerous registers spanning over the years
Security: Susceptible to tampering by the outsiders
Retrieval: Time consuming process to search for previous entry
Consistency: Prone to human errors
Not only small shops but large orgs also used to maintain their transactions in book registers
Spreadsheet files - A better solution

Mostly useful for single user or small enterprise applications
Spreadsheet software like Google Sheets: Due to disadvantages of maintaining ledger registers, organizations dealing
with huge amount of data shifted to using spreadsheets for maintaining records in files
Durability: These are computer applications and hence data is less prone to physical damage
Scalability: Easier to search, insert and modify records as compared to book ledgers
Security: Can be password protected
Easy to Use: Computer applications are used to search and manipulate records in the spreadsheets leading to
reduction in manpower needed to perform routine computations
Consistency: Not guaranteed but spreadsheets are less prone to mistakes than registers
Why leave filesystems?

Lack of efficiency in meeting growing needs
With rapid scale up of data, there has been considerable increase in the time required to perform most operations
A typical spreadsheet file may have an upper limit on the number of rows
Ensuring consistency of data is a big challenge
No means to check violations of constraints in the face of concurrent processing
Unable to give different permissions to different people in a centralized manner
A system crash could be catastrophic
The above mentioned limitations of filesystems paved the way for a comprehensive platform dedicated to management of
data - the Database Management System
History of Database Systems

1950s and early 1960s
Data processing using magnetic tapes for storage
Tapes provided only sequential access
Punched cards for input
Late 1960s and 1970s
Hard disks allowed direct access to data
Network and hierarchical data model in widespread use
Ted Codd defines the relational data model
Would win the ACM Turing Award for this work
IBM Research begins the System R prototype
UC Berkeley begins Ingres prototype
High-performance (for the era) transaction processing
1980s
Research relational prototypes evolve into commercial systems - SQL becomes industrial standard
Parallel and distributed database systems
Object oriented database systems
1990s
Large decision support and data mining applications
Large multi-terabyte data warehouses
Emergence of Web commerce
Early 2000s
XML and XQuery standards
Automated database administration
Later 2000s
Giant data storage systems - Google BigTable, Yahoo PNuts, Amazon, ...
📚
Week 1 Lecture 3
Class BSCCS2001
Materials
Module # 3
Type Lecture
Week # 1
Why DBMS? (part 2)

Case study of a Bank Transaction
Consider a simple banking system where a person can open a bank account, transfer funds to an existing account and
check the history of all her transactions till date
The application performs the following checks
If the account balance is not enough, it will not allow the fund transfer
If the account numbers are not correct, it will flash a message and terminate the transaction
If a transaction is successful, it prints a confirmation message
We will use this banking transaction system to compare various features of a file-based (.csv file) implementation vis-à-vis a
DBMS-based implementation
Account details are stored in
Accounts.csv for file-based implementation
Accounts table for DBMS implementation
The transaction details are stored in
Ledger.csv for file-based implementation
Ledger table for DBMS implementation
Source: https://github.com/bhaskariitm/transition-from-files-to-db
Initiating a transaction
Python
import csv

def begin_Transaction(credit_account, debit_account, amount):
    temp = []    # in-memory copy of the account records to be written back
    success = 0
    # Open file handles to retrieve and store transaction data
    f_obj_Account1 = open('Accounts.csv', 'r')
    f_reader1 = csv.DictReader(f_obj_Account1)
    f_obj_Account2 = open('Accounts.csv', 'r')
    f_reader2 = csv.DictReader(f_obj_Account2)
    f_obj_Ledger = open('Ledger.csv', 'a+', newline='')
    # col_name_Ledger (the list of ledger column names) is defined elsewhere in the program
    f_writer = csv.DictWriter(f_obj_Ledger, fieldnames=col_name_Ledger)
SQL
-- Handled implicitly by the DBMS
Transaction
Python
    try:
        for s_record in f_reader1:
            # CONDITION CHECK FOR ENOUGH BALANCE
            if s_record['Account_no'] == debit_account and int(s_record['Balance']) > int(amount):
                for r_record in f_reader2:
                    if r_record['Account_no'] == credit_account:
                        s_record['Balance'] = str(int(s_record['Balance']) - int(amount))  # DEBIT
                        temp.append(s_record)
                        # CRITICAL POINT: a failure here leaves a debit entry in the ledger without the matching credit
                        f_writer.writerow({'Account1': s_record['Account_no'], 'Account2': r_record['Account_no'], 'Amount': amount, 'D/C': 'D'})
                        r_record['Balance'] = str(int(r_record['Balance']) + int(amount))  # CREDIT
                        temp.append(r_record)
                        f_writer.writerow({'Account1': r_record['Account_no'], 'Account2': s_record['Account_no'], 'Amount': amount, 'D/C': 'C'})
                        success = success + 1
                        break
        # Re-read the accounts file and collect the records that were not touched
        f_obj_Account1.seek(0)
        next(f_obj_Account1)  # skip the header row
        for record in f_reader1:
            if record['Account_no'] != temp[0]['Account_no'] and record['Account_no'] != temp[1]['Account_no']:
                temp.append(record)
    except Exception:
        print('\nWrong input entered !!!')
SQL
do $$
declare
    -- variable types are assumed here; match them to the column types of your tables
    amt integer := 5000;
    sendVal varchar := '1800090';
    recVal varchar := '1800100';
    sbalance integer;
begin
    select balance into sbalance
    from accounts
    where account_no = sendVal;
    if sbalance < amt then
        raise notice 'Insufficient balance';
    else
        update accounts
        set balance = balance - amt
        where account_no = sendVal;
        insert into ledger(sendAc, recAc, amnt, ttype)
        values(sendVal, recVal, amt, 'D');
        update accounts
        set balance = balance + amt
        where account_no = recVal;
        insert into ledger(sendAc, recAc, amnt, ttype)
        values(sendVal, recVal, amt, 'C');
        commit;
        raise notice 'Successful';
    end if;
end; $$
Closing a transaction
Python
    f_obj_Account1.close()
    f_obj_Account2.close()
    f_obj_Ledger.close()
    if success == 1:
        # Rewrite the whole accounts file from the in-memory records
        f_obj_Account = open('Accounts.csv', 'w+', newline='')
        f_writer = csv.DictWriter(f_obj_Account, fieldnames=col_name_Account)
        f_writer.writeheader()
        for data in temp:
            f_writer.writerow(data)
        f_obj_Account.close()
        print('\nTransaction is successful !!')
    else:
        print('\nTransaction failed : Confirm Account details')
SQL
-- Handled implicitly by the DBMS
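To see what "handled implicitly by the DBMS" can mean in practice, here is a minimal sketch using Python's built-in sqlite3 module instead of the PostgreSQL setup used in the lecture. The account numbers come from the SQL example above; the balances, the CHECK constraint and the transfer function are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " account_no TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"   # assumed integrity constraint
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("1800090", 3000), ("1800100", 1000)],    # hypothetical opening balances
)
conn.commit()

def transfer(conn, sender, receiver, amount):
    """Move amount from sender to receiver; return True if it committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE account_no = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE account_no = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False

# The debit would make the balance negative, so the earlier credit is undone too.
failed = transfer(conn, "1800090", "1800100", 5000)
after_failure = dict(conn.execute("SELECT account_no, balance FROM accounts"))

succeeded = transfer(conn, "1800090", "1800100", 500)
after_success = dict(conn.execute("SELECT account_no, balance FROM accounts"))
```

The credit is deliberately issued before the debit so that the failing debit shows the rollback: neither balance changes after the failed transfer, with no file rewriting code in the application.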
Comparison
Parameter | File handling via Python | DBMS
Scalability with respect to amount of data | Very difficult to handle insert, update and querying of records | In-built features to provide high scalability for a large number of records
Scalability with respect to changes in structure | Extremely difficult to change the structure of records as in the case of adding or removing attributes | Adding or removing attributes can be done seamlessly using simple SQL queries
Time of execution | In seconds | In milliseconds
Persistence | Data processed using temporary data structures have to be manually updated to the file | Data persistence is ensured via automatic, system induced mechanisms
Robustness | Ensuring robustness of data has to be done manually | Backup, recovery and restore need minimum manual intervention
Security | Difficult to implement in Python (Security at OS level) | User-specific access at database level
Programmer's productivity | Most file access operations involve extensive coding to ensure persistence, robustness and security of data | Standard and simple built-in queries reduce the effort involved in coding thereby increasing a programmer's throughput
Arithmetic operations | Easy to do arithmetic computations | Limited set of arithmetic operations are available
Costs | Low costs for hardware, software and human resources | High costs of hardware, software and human resources
Parameterized Comparison
Scalability
File Handling in Python
Number of records: As the # of records increases, the efficiency of flat files reduces:
the time spent in searching for the right records
the limitations of the OS in handling huge files
Structural Change: To add an attribute, initializing the new attribute of each record with a default value has to be done
by program. It is very difficult to detect and maintain relationships between entities if and when an attribute has to be
removed
DBMS
Number of records: Databases are built to efficiently scale up when the # of records increase drastically.
In-built mechanisms, like indexing, for quick access of right data
Structural Changes: While adding an attribute, a default value can be defined that holds for all existing records - the
new attribute gets initialized with the default value. During deletion, constraints are used either to disallow the removal or
to ensure its safe removal
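Both DBMS points above (a default value for a new attribute, and an index for quick access) can be tried with a minimal sqlite3 sketch. The table contents, the new branch column and the index name are hypothetical, and other database systems may use slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_no TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("1800090", 3000), ("1800100", 1000)],   # hypothetical rows
)

# Structural change: existing rows pick up the default value automatically.
conn.execute("ALTER TABLE accounts ADD COLUMN branch TEXT DEFAULT 'MAIN'")
rows = conn.execute(
    "SELECT account_no, balance, branch FROM accounts ORDER BY account_no"
).fetchall()

# An index lets the DBMS locate matching rows without scanning the whole table.
conn.execute("CREATE INDEX idx_accounts_balance ON accounts(balance)")
index_names = [
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'accounts'"
    )
]
```

With a CSV file, the same change would mean rewriting every record in the file from a program.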
Time and Efficiency

If the number of records is very small, the overhead in installing and configuring a database will be much more than the
time advantage obtained from executing the queries
However, if the number of records is really large, then the time required to initialize a database will
be negligible compared to the time saved by using SQL queries
The effort needed to implement a file handler is quite less in Python
In order to process a 1GB file, a program in Python would typically take a few seconds
DBMS
The effort to install and configure a DB on a DB server is expensive and time consuming
In order to process a 1GB file, an SQL query would typically take a few milliseconds
Programmer's Productivity
Building a file handler: Since the constraints within and across entities have to be enforced manually, the effort
involved in building a file handling application is huge
Maintenance: To maintain the consistency of data, one must regularly check for sanity of data and the relationships
between entities during inserts, updates and deletes
Handling huge data: As the data grows beyond the capacity of the file handler, more efforts are needed
DBMS
Configuring the database: The installation and configuration of a database is a specialized job of a DBA. A
programmer, on the other hand, is saved the trouble
Maintenance: DBMS has built-in mechanisms to ensure consistency and sanity of data being inserted, updated or
deleted. The programmer does not need to do such checks
Handling huge data: DBMS can handle even terabytes of data - Programmer does not have to worry
Arithmetic Operations
Extensive support for arithmetic and logical operations on data using Python. These include complex numerical
calculations and recursive computations
DBMS
SQL provides limited support for arithmetic and logical operations. Any complex computation has to be done outside of
SQL
Costs and Complexity

File systems are cheaper to install and use. No specialized hardware, software or personnel are required to maintain
filesystems
DBMS
Large databases are served by dedicated database servers which need large storage and processing power
DBMSs are expensive software that have to be installed and regularly updated
Databases are inherently complex and need specialized people to work on it - like DBA (Database System
Administrator)
The above factors lead to huge costs in implementing and maintaining database management systems
📚
Week 1 Lecture 4
Class BSCCS2001
Materials
Module # 4
Type Lecture
Week # 1
Introduction to DBMS
Levels of Abstraction
Physical Level: describes how a record (eg: instructor) is stored
Logical Level: describes data stored in a database and the relationships among the data fields
type instructor = record

ID: string;
name: string;
dept_name: string;
salary: integer;
end;
View Level: application programs hide details of data types
Views can also hide information (such as employee's salary) for security purposes
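As a rough illustration of the view level, the sketch below defines a view over the instructor record that leaves out the salary. It uses sqlite3; the view name instructor_public and the sample row are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary INTEGER)"
)
# Hypothetical sample row
conn.execute("INSERT INTO instructor VALUES ('I0001', 'A. Example', 'Physics', 50000)")

# The view exposes everything except the salary column.
conn.execute(
    "CREATE VIEW instructor_public AS SELECT ID, name, dept_name FROM instructor"
)

cursor = conn.execute("SELECT * FROM instructor_public")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

An application that is only allowed to query instructor_public never sees the salary, even though it is stored at the logical level.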
An architecture for a database system
Schema and Instances
TLDR: Schema is the way in which data is organized and Instance is the actual value of the data
Schema
Logical Schema - the overall logical structure of the database
Analogous to type information of a variable in a program (eg: int x = 5)
Example: The database consists of information about a set of customers and accounts in a bank and the
relationship between them
Customer Schema
Name Customer ID Account # Aadhaar ID Mobile #

Account Schema
Account # Account Type Interest Rate Min. Bal. Balance

Physical Schema - the overall physical structure of the database
Instance
The actual content of the database at a particular point in time
Analogous to the value of a variable
Customer Instance
Name Customer ID Account # Aadhaar ID Mobile #

Pavan Lakha 6728 917322 182719289372 9830100291
Lata Kala 8912 827183 918291204829 7189203928
Nand Prabhu 6617 372912 127837291021 8892021892
Account Instance
Account # Account Type Interest Rate Min. Bal. Balance
917322 Savings 4.0% 5000 7812
372912 Current 0.0% 0 291820
827183 Term Deposit 6.75% 10000 100000
Physical Data Independence - the ability to modify the physical schema without changing the logical schema
Analogous to independence of Interface and Implementation in object-oriented systems
Applications depend on the logical schema
In general, the interfaces between various levels and components should be well defined so that changes in some
parts do not seriously influence others.
Data Models
A collection of tools that describe the following ...
Data
Data relationships
Data semantics
Data constraints
Relational model (our focus in this course)
Entity-Relationship data model (mainly for database design)
Object-based data models (Object-oriented and Object-relational)
Other older models
Network model
Hierarchical model
Recent models for Semi-structured or Unstructured data
Converted to easily manageable formats
Content Addressable Storage (CAS) with metadata descriptors
XML format
RDBMS which support BLOBs
Relational Model
All the data is stored in various tables
Tables are also called Relations
Columns are called attributes
They have particular names which tell us the schema
Rows are records that are the values
Data Definition Language (DDL)

Specification notation for defining the database schema
Example
create table instructor (

ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8, 2))
DDL compiler generates a set of table templates stored in a data dictionary
Data dictionary contains metadata (that is, data about the data)
Database schema
Integrity constraints
Primary key (ID uniquely identifies instructors)
Authorization
Who can access what
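The DDL example can be run as-is against SQLite through Python's sqlite3 module. This is a sketch under a few assumptions: a PRIMARY KEY clause is added on ID to match the integrity constraint above, SQLite's sqlite_master catalog stands in for the data dictionary, and the inserted rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instructor (
        ID        CHAR(5) PRIMARY KEY,
        name      VARCHAR(20),
        dept_name VARCHAR(20),
        salary    NUMERIC(8, 2)
    )
""")

# Metadata (data about the data) is itself stored in a queryable catalog.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(instructor)")]

# The primary key constraint rejects a second instructor with the same ID.
conn.execute("INSERT INTO instructor VALUES ('I0001', 'A. Example', 'Physics', 50000)")
try:
    conn.execute("INSERT INTO instructor VALUES ('I0001', 'B. Example', 'Music', 40000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Authorization (who can access what) is not shown because SQLite has no user accounts; server-based systems typically handle it with GRANT and REVOKE statements.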
Data Manipulation Language (DML)

Language for accessing and manipulating the data organized by the appropriate data model
DML: also known as Query Language
Two classes of languages
Pure - used for proving properties about computational power and for optimization
Relational Algebra (our focus in this course)
Tuple relational calculus
Domain relational calculus
Commercial - used in commercial systems
SQL is the most widely used commercial language
Structured Query Language (SQL)

Most widely used commercial language
SQL is NOT a Turing Machine equivalent language
Cannot be used to solve all problems that a C program, for example, can solve
To be able to compute complex functions, SQL is usually embedded in some higher-level language
Application programs generally access databases through one of ...
Language extensions to allow embedded SQL
Application Programming Interfaces or APIs (eg: ODBC / JDBC) which allow SQL queries to be sent to the
databases
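A small sketch of the API route: the host language sends a parameterized SQL query to the database and does the rest of the computation itself. Python's sqlite3 module plays the role that ODBC / JDBC play for other languages; the table contents and the median_salary helper are hypothetical.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, dept_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [
        ("I0001", "Physics", 50000),   # hypothetical rows
        ("I0002", "Physics", 70000),
        ("I0003", "Physics", 90000),
        ("I0004", "Music", 40000),
    ],
)

def median_salary(conn, dept_name):
    """Fetch the salaries with SQL, then compute the median in Python."""
    rows = conn.execute(
        "SELECT salary FROM instructor WHERE dept_name = ?", (dept_name,)
    ).fetchall()
    return statistics.median(salary for (salary,) in rows)

physics_median = median_salary(conn, "Physics")
```

The query does the filtering, while the part that is awkward to express in plain SQL is left to the host language.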
Database Design
The process of designing the general structure of the database:
Logical Design - Deciding on the database schema. Database design requires that we find a good collection of
relation schema
Business decision
What attributes should we record in the databases?
Computer Science decision
What relation schemas should we have and how should the attributes be distributed among the various
relation schemas?
Physical Design - Deciding on the physical layout of the database
📚
Week 1 Lecture 5
Class BSCCS2001
Created @August 20, 2021 11:13 AM
Materials
Module # 5
Type Lecture
Week # 1
Introduction to DBMS (part 2)

Database Design
Design Approaches
Need to come up with a methodology to ensure that each relation in the database is good
Two ways of doing so:
Entity Relationship Model (primarily tries to capture the business requirements)
Models an enterprise as a collection of entities and relationships
Represented diagrammatically by an entity-relationship diagram
Normalization Theory (this is the Computer Science perspective)
Formalize what designs are bad and test for them
Object-Relational Data Models

Relational model: flat, atomic values
Object Relational Data Models
Extend the relational data model by including object orientation and constructs to deal with added data types
Allow attributes of tuples to have complex types, including non-atomic values such as nested relations
Preserve relational foundations, in particular the declarative access to data, while extending modeling power
Provide upward compatibility with existing relational language
XML: eXtensible Markup Language
Defined by the WWW Consortium (W3C)
In essence, XML describes data as name-value pairs
A tag gives the name, and the value is placed inside that tag
Originally intended as a document markup language not a database language
The ability to specify new tags and to create tag structures made XML a great way to exchange data, not just
documents
XML has become the basis for all new generation data interchange formats
A wide variety of tools are available for parsing, browsing and querying XML documents
Database Engine
3 major components are:
Storage Manager
Query processing
Transaction Manager
Storage Management
Storage Manager is a program module that provides the interface between the low-level data stored in the database and
the application programs and queries submitted to the system
The storage manager is responsible for the following tasks:
Interaction with the OS file manager
Efficient storing, retrieving and updating of data
Issues:
Storage access
File organization
Indexing and hashing
Query Processing
Parsing and Translation
Optimization
Evaluation
How is a query processed?
Alternative ways of evaluating a given query
Equivalent expressions
Different algorithms for each operation
Cost difference between a good and a bad way of evaluating a query can be enormous
Need to estimate the cost of operations
Depends critically on statistical information about relations which the database must maintain
Need to estimate statistics for intermediate results to compute cost of complex expressions
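The cost gap between equivalent expressions can be illustrated with a toy example. The two relations, the nested-loop join and the "pairs compared" cost measure below are all simplifying assumptions; a real optimizer estimates costs from the statistics it maintains.

```python
# Two hypothetical relations, stored as lists of dicts.
instructor = [
    {"ID": i, "dept_name": "Physics" if i % 10 == 0 else "Music"}
    for i in range(100)
]
teaches = [{"ID": i, "course": f"C{i}"} for i in range(100)]

def join(left, right):
    """Nested-loop join on ID; returns the result and the number of pairs compared."""
    compared = 0
    result = []
    for lt in left:
        for rt in right:
            compared += 1
            if lt["ID"] == rt["ID"]:
                result.append({**lt, **rt})
    return result, compared

# Plan A: join everything first, then select the Physics tuples.
joined, cost_a = join(instructor, teaches)
plan_a = [t for t in joined if t["dept_name"] == "Physics"]

# Plan B: select the Physics tuples first, then join the smaller input.
physics_only = [t for t in instructor if t["dept_name"] == "Physics"]
plan_b, cost_b = join(physics_only, teaches)
```

Both plans return the same tuples, but plan B compares a tenth as many pairs because the selection shrinks the join input.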
Transaction Management
What if the system fails?
What if more than one user is concurrently updating the same file?
A transaction is a collection of operations that performs a single logical function in a database application
Transaction-Management component ensures that the database remains in a consistent (correct) state despite
system failures (eg: power failures and operating system crashes) and transaction failures
Concurrency-control manager controls the interaction among the concurrent transactions to ensure consistency of
the database
Database Architecture
The architecture of a database system is greatly influenced by the underlying computer system on which the database is
running:
Centralized
Client-Server
Parallel (multi-processor)
Distributed
Cloud
📚
Week 2 Lecture 1
Class BSCCS2001
Materials
Module # 6
Type Lecture
Week # 2
Introduction to Relational Model

Attribute Types
Consider the relation Student = (Roll #, First Name, Last Name, DoB, Passport #, Aadhaar #, Department)
The set of allowed values for each attribute is called the domain of the attribute
Roll # - Alphanumeric string
First Name, Last Name - Alpha string
DoB - Date
Passport # - String (Letter followed by 7 digits) - nullable (Optional)
Aadhaar # - 12-digit number
Department - Alpha string
Attribute values are (normally) required to be atomic; that is, indivisible
The special value null is a member of every domain. Indicates that the value is unknown
the null value may cause complications in the definition of many operations
Roll # First Name Last Name DoB Passport Aadhaar Dept.
15CS10026 Lalit Dubey 27-Mar-1997 L4032464 172861749239 Computer
16EE30029 Jatin Chopra 17-Nov-1996 null 391718363816 Electrical
Relational Schema and Instance

A1 , A2 , ..., An are the attributes
R = (A1 , A2 , ..., An ) is a relation schema
Example: instructor = (ID, name, dept_name, salary)
Formally, given sets D1 , D2 , ..., Dn , a relation r is a subset of D1 ✕ D2 ✕ ... ✕ Dn
Thus, a relation is a set of n-tuples (a1 , a2 , ..., an ) where each ai ∈ Di
The current values (relation instance) of a relation are specified by a table
An element t of r is a tuple, represented by a row in a table
Example
instructor ≡ (String(5) ✕ String ✕ String ✕ Number+), where ID ∈ String(5), name ∈ String, dept_name ∈ String and
salary ∈ Number+
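The formal definition can be checked on a tiny example. The domains D1 and D2 below are hypothetical two-value sets, chosen only to keep the Cartesian product small.

```python
from itertools import product

# Hypothetical domains
D1 = {"I1", "I2"}            # instructor IDs
D2 = {"Physics", "Music"}    # department names

# D1 x D2: every possible (ID, dept_name) pair
cartesian = set(product(D1, D2))

# A relation is any subset of that product.
r = {("I1", "Physics"), ("I2", "Music")}
is_relation = r <= cartesian

# A tuple with a value outside the domain cannot belong to any relation on D1 x D2.
invalid_tuple_allowed = ("I1", "Chemistry") in cartesian
```

Here the product has 4 tuples and r picks 2 of them; those 2 tuples are the current relation instance.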
Keys
Let K ⊆ R, where R is the set of attributes in the relation
K is a superkey of R if values of K are sufficient to identify a unique tuple of each possible relation r(R)
Example: {ID} and {ID, name} are both superkeys of instructor
Superkey K is a candidate key if K is minimal
Example: {ID} is a candidate key for instructor
One of the candidate keys is selected to be the primary key
A surrogate key (or synthetic key) in a database is a unique identifier for either an entity in the modeled world or an
object in the database
The surrogate key is not derived from application data, unlike a natural (or business) key which is derived from
application data
Keys: Examples
Students = Roll #, First Name, Last Name, DoB, Passport #, Aadhaar #, Department
Super Key: Roll #, {Roll #, DoB}
Candidate Keys: Roll #, {First Name, Last Name}, Aadhaar #
Passport # cannot be a key because it is an optional field and can take null values, but an ID can never be null
Primary Key: Roll #
Can Aadhaar # be a key?
It may suffice for unique identification, but Roll # may have additional useful information.
For example: 14CS92P01
Read it as 14-CS-92-P-01
14 - Admission in 2014
CS - Department: Computer Science
92 - Category of the Student
P - Type of admission: Project
01 - Serial Number
Secondary / Alternate Key: {First Name, Last Name}, Aadhaar #
Simple Key: Consists of a single attribute
Composite Key: {First Name, Last Name}
Consists of more than one attribute to uniquely identify an entity occurrence
One or more of the attributes, which make up the key are not simple keys in their own right
Roll # First Name Last Name DoB Passport Aadhaar Dept
15CS10026 Lalit Dubey 27-Mar-1997 L4032464 172861749239 Computer
16EE30029 Jatin Chopra 17-Nov-1996 null 391718363816 Electrical

15EC10016 Smriti Mongra 23-Dec-1996 G5432849 204592710914 Electronics
16CE10038 Dipti Dutta 02-Feb-1997 null 571919482918 Civil
15CS30021 Ramdin Minz 10-Jan-1997 X8811623 492849275924 Computer
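The key definitions can be tested against the sample rows above. One caveat: a superkey must identify tuples in every possible instance, so a check on one instance can only rule a key out, never prove it. The attribute names and the two helper functions are made up for this sketch, and null is represented by None.

```python
from itertools import combinations

ATTRS = ("roll", "first", "last", "dob", "passport", "aadhaar", "dept")
ROWS = [
    ("15CS10026", "Lalit", "Dubey", "27-Mar-1997", "L4032464", "172861749239", "Computer"),
    ("16EE30029", "Jatin", "Chopra", "17-Nov-1996", None, "391718363816", "Electrical"),
    ("15EC10016", "Smriti", "Mongra", "23-Dec-1996", "G5432849", "204592710914", "Electronics"),
    ("16CE10038", "Dipti", "Dutta", "02-Feb-1997", None, "571919482918", "Civil"),
    ("15CS30021", "Ramdin", "Minz", "10-Jan-1997", "X8811623", "492849275924", "Computer"),
]

def is_unique(attrs):
    """True if no two rows of this instance agree on all of attrs."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(projected)

def is_minimal(attrs):
    """True if no proper, non-empty subset of attrs is already unique."""
    return not any(
        is_unique(subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

roll_is_candidate = is_unique(("roll",)) and is_minimal(("roll",))
roll_dob_is_superkey_only = is_unique(("roll", "dob")) and not is_minimal(("roll", "dob"))
dept_is_unique = is_unique(("dept",))
passport_is_unique = is_unique(("passport",))
```

{Roll #, DoB} identifies every row but is not minimal, Department repeats, and Passport # fails because two rows hold null.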
Foreign key constraint: Value in one relation must appear in another (in other words, when a particular attribute is a
key in a different table)
Referencing relation
Enrolment: Foreign Keys - Roll #, Course #
Referenced relation
Students, Courses
A compound key consists of more than one attribute to uniquely identify an entity occurrence
Each attribute, which makes up the key, is a simple key in its own right
{Roll #, Course #}
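A minimal sqlite3 sketch of the Enrolment example: both attributes of the compound key are foreign keys into Students and Courses. The column names, the course number CS101 and the non-existent roll number are hypothetical, and SQLite needs foreign key enforcement switched on explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE students (roll_no TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE courses (course_no TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE enrolment (
        roll_no   TEXT REFERENCES students(roll_no),
        course_no TEXT REFERENCES courses(course_no),
        PRIMARY KEY (roll_no, course_no)
    )
""")

conn.execute("INSERT INTO students VALUES ('15CS10026')")
conn.execute("INSERT INTO courses VALUES ('CS101')")

# Valid: both values appear in the referenced relations.
conn.execute("INSERT INTO enrolment VALUES ('15CS10026', 'CS101')")

# Invalid: this roll number does not exist in students.
try:
    conn.execute("INSERT INTO enrolment VALUES ('99XX00000', 'CS101')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

enrolments = conn.execute("SELECT * FROM enrolment").fetchall()
```

Enrolment is the referencing relation here; students and courses are the referenced relations.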
Schema Diagram for University Database
Relational Query Languages

Procedural vis-à-vis Non-procedural or Declarative Paradigms
Procedural programming requires that the programmer tell the computer what to do
That is, how to get the output for the range of required inputs
The programmer must know an appropriate algorithm
Declarative programming requires a more descriptive style
The programmer must know what relationships hold between various entities
Relational Query Language: Example
"Pure" languages:
Relational Algebra
Tuple relational calculus
Domain relational calculus
The above 3 pure languages are equivalent in computing power
We will concentrate on relational algebra
Not Turing-machine equivalent
Not all algorithms can be expressed in Relational Algebra
Consists of 6 basic operations
📚
Week 2 Lecture 2
Class BSCCS2001
Materials https://www.caam.rice.edu/~heinken/latex/symbols.pdf
Module # 7
Type Lecture
Week # 2
Introduction to Relational Model (part 2)

Relational Operators
Basic properties of relations
A relation is a set. Hence,
Ordering of rows / tuples is inconsequential
All rows / tuples must be distinct
Select operation - selection of rows (tuples)

Relation r on the following table
The select operation is defined as
And it returns the following table as a result
Project operation - selection of columns (Attributes)

Relation r
The projection operation is defined as
And it returns the following table as a result
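Select and project can be sketched in Python by treating a relation as a set of tuples. The relation r, its attribute names and the predicate are made up for illustration and are not the tables from the lecture slides.

```python
ATTRS = ("A", "B", "C")
r = {(1, "x", 10), (2, "y", 20), (3, "x", 30)}   # hypothetical relation

def select(rel, attrs, predicate):
    """Select: keep the tuples (rows) that satisfy the predicate."""
    return {t for t in rel if predicate(dict(zip(attrs, t)))}

def project(rel, attrs, keep):
    """Project: keep only the listed attributes (columns)."""
    idx = [attrs.index(a) for a in keep]
    # The result is a set, so duplicate rows collapse automatically.
    return {tuple(t[i] for i in idx) for t in rel}

selected = select(r, ATTRS, lambda t: t["B"] == "x" and t["C"] > 10)
projected = project(r, ATTRS, ("B",))
```

Projecting on B returns two tuples, not three, because a relation is a set and the duplicate "x" is dropped.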
Union of two relations

Relation r, s
The union of two relations is defined as
And it returns the following result
Set difference of two relations

Relation r, s
The set difference of two relations is defined as
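With relations stored as Python sets of tuples, union and set difference map directly onto the built-in set operators. The relations r and s below are hypothetical; both operations assume that r and s have the same number of attributes with compatible domains.

```python
# Hypothetical relations over the same schema (A, B)
r = {("a", 1), ("b", 2), ("c", 1)}
s = {("b", 2), ("d", 3)}

union = r | s        # tuples that are in r, in s, or in both
difference = r - s   # tuples that are in r but not in s
```

The tuple ("b", 2) occurs in both inputs but only once in the union, and it is the only tuple removed by the difference.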
Joining two relations - Cartesian-product
Relation r, s
The cartesian product is defined as
Cartesian-product - Naming issue
Renaming a Table
Allows us to refer to a relation, say E, by more than one name
ρ_X(E) returns the expression E under the name X
Relations r
Self product
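The naming issue and the self product can be sketched as follows. The helper functions and the convention of prefixing attribute names with the relation name are assumptions of this example, not notation from the lecture.

```python
from itertools import product

def cartesian(r_attrs, r, s_attrs, s):
    """Cartesian product; refuses to run if attribute names clash."""
    clash = set(r_attrs) & set(s_attrs)
    if clash:
        raise ValueError(f"rename first, clashing attributes: {sorted(clash)}")
    return r_attrs + s_attrs, {tr + ts for tr, ts in product(r, s)}

def rename(attrs, new_name):
    """Rename: qualify every attribute with a new relation name."""
    return tuple(f"{new_name}.{a}" for a in attrs)

r_attrs = ("A", "B")
r = {("a", 1), ("b", 2)}   # hypothetical relation

# Self product: pair r with a renamed copy of itself.
attrs, rows = cartesian(rename(r_attrs, "r"), r, rename(r_attrs, "s"), r)
```

Calling cartesian(r_attrs, r, r_attrs, r) without renaming raises ValueError, which is the naming issue in miniature.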
Composition of Operations
Can build expressions using multiple operations
Example:
r ╳s
Joining two relations - Natural Join
Let r and s be relations on schemas R and S respectively. Then, the "natural join" of relations r and s is a relation
on schema R ∪ S
Consider each pair of tuples tr from r and ts from s
If tr and ts have the same value on each of the attributes in R ∩ S , add a tuple t to the result, where
t has the same value as tr on the attributes of R
t has the same value as ts on the attributes of S
Natural join example

Relations r, s:
Natural join
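The definition of natural join translates almost line by line into Python. The relations and attribute names below are hypothetical, and the nested loop is the simplest possible evaluation strategy, not what a real query processor would do.

```python
def natural_join(r_attrs, r, s_attrs, s):
    """Natural join of r and s; the result schema is R union S."""
    common = [a for a in r_attrs if a in s_attrs]        # attributes in both R and S
    s_only = [a for a in s_attrs if a not in r_attrs]
    out_attrs = tuple(r_attrs) + tuple(s_only)
    out = set()
    for tr in r:
        dr = dict(zip(r_attrs, tr))
        for ts in s:
            ds = dict(zip(s_attrs, ts))
            # Keep the pair only if it agrees on every common attribute.
            if all(dr[a] == ds[a] for a in common):
                out.add(tr + tuple(ds[a] for a in s_only))
    return out_attrs, out

# Hypothetical relations r(A, B) and s(B, C)
r_attrs, r = ("A", "B"), {(1, "x"), (2, "y"), (3, "z")}
s_attrs, s = ("B", "C"), {("x", 10), ("y", 20), ("w", 30)}

attrs, rows = natural_join(r_attrs, r, s_attrs, s)
```

Tuples with no partner on B, such as (3, "z") and ("w", 30), do not appear in the result.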
Aggregation Operators
Can we compute:
SUM
AVG
MAX
MIN
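As a rough sketch, the four aggregates can be computed over one attribute of a relation in Python. The relation is hypothetical. The point to notice is that the values go into a list, not a set, because projecting onto salary alone would merge the two equal salaries.

```python
# Hypothetical relation (ID, dept_name, salary)
instructor = {
    ("I1", "Physics", 40000),
    ("I2", "Physics", 60000),
    ("I3", "Music", 60000),
    ("I4", "Music", 80000),
}

# A list keeps duplicates, so each tuple contributes its own salary.
salaries = [salary for (_, _, salary) in instructor]

total = sum(salaries)
average = total / len(salaries)
highest = max(salaries)
lowest = min(salaries)

# For comparison: a set-based projection drops one of the two 60000 values.
distinct_total = sum({salary for (_, _, salary) in instructor})
```

The set-based total is lower than the list-based one, so an aggregate applied after a plain projection would give a wrong sum.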
Notes about Relational Languages

Each query input is a table (or a set of tables)
Each query output is a table
All data in the output table appears in one of the input tables
Relational Algebra is not Turing complete
📚
Week 2 Lecture 3
Class BSCCS2001
Materials
Module # 8
Type Lecture
Week # 2
Introduction to Structured Query Language (SQL)

History of SQL
IBM developed Structured English Query Language (SEQUEL) as a part of System R project.
Renamed Structured Query Language (SQL: still pronounced as SEQUEL)
ANSI and ISO standard SQL:
Name - Description
SQL-86 - First formalized by ANSI
SQL-89 - + Integrity Constraints
SQL-92 - Major revision (ISO/IEC 9075 standard), De-facto Industry Standard
SQL:1999 - + Regular Expression Matching, Recursive Queries, Triggers, Support for Procedural and Control Flow Statements, Non-scalar types (Arrays) and some OO features (structured types), Embedding SQL in Java (SQL/OLB) and Embedding Java in SQL (SQL/JRT)
SQL:2003 - + XML features (SQL/XML), Window functions, Standardized sequences and columns with auto-generated values (identity columns)
SQL:2006 - + Way of importing and storing XML data in a SQL database, manipulating it within the database, and publishing both XML and conventional SQL-data in XML form
SQL:2008 - Legalizes ORDER BY outside Cursor Definitions + INSTEAD OF Triggers, TRUNCATE statements and FETCH clause
SQL:2011 - + Temporal data (PERIOD FOR), Enhancements for Window functions and FETCH clause
SQL:2016 - + Row Pattern Matching, Polymorphic Table Functions and JSON
SQL:2019 - + Multidimensional Arrays (MDarray type and operators)
Compliance
SQL is the de facto industry standard today for relational or structured data systems
Commercial as well as open-source systems may be fully or partially compliant with one or more of the standards from SQL-92 onward
Not all examples here may work on your particular system. Check your system's SQL docs.
Alternatives
There aren't any alternatives to SQL for speaking to relational databases (i.e. SQL as a protocol)
There are alternatives to writing SQL in the applications
These alternatives have been implemented in the form of front-ends for working with relational databases. Some examples of front-ends (for a selection of languages):
SchemeQL and CLSQL
Probably the most flexible, thanks to their Lisp heritage
They also look a lot more like SQL than other front-ends
LINQ (in .NET)
ScalaQL and ScalaQuery (in Scala)
SqlStatement, ActiveRecord and many others in Ruby
HaskellDB
... the list goes on for many other languages
Derivatives
There are several query languages that are derived from or inspired by SQL.
Out of these, the most popular and effective is SPARQL.
SPARQL (pronounced sparkle, a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF
query language
A semantic query language for databases - able to retrieve and manipulate data stored in Resource Description
Framework (RDF) format.
It has been standardized by the W3C as a key technology of the semantic web
Versions
SPARQL 1.0 (Jan. 2008)
SPARQL 1.1 (Mar. 2013)
Used as the query language for several NoSQL systems, particularly the graph databases that use RDF as their store
Data Definition Language (DDL)

The SQL data-definition language (DDL) allows the specification of information about relations, including:
The Schema for each Relation
The Domain of values associated with each Attribute
Integrity Constraints
And, as we will see later, also other information such as ...
The set of Indices to be maintained for each relation
Security and Authorization information for each relation
The Physical Storage Structure of each relation on disk
Domain types (or Data types) in SQL

char(n) - Fixed length character string, with user-specified length n
varchar(n) - Variable length character strings, with user-specified max length n
int - Integer (a finite subset of the integers that is machine-dependent)
smallint - Small integer (a machine-dependent subset of the integer domain type)
numeric(p, d) - Fixed point number, with user-specified precision of p digits, with d digits to the right of decimal point.
(ex. numeric(3, 1) allows 44.5 to be stored exactly, but not 444.5 or 0.32)
real, double precision - Floating point and double-precision floating point numbers, with machine-dependent
precision
float(n) - Floating point number with user specified precision of at-least n digits
Schema diagram for a University database
Create Table construct

An SQL relation is defined using the create table command:
create table r (A1 D1, A2 D2, ..., An Dn,
(integrity-constraint1),
...,
(integrity-constraintk));
r is the name of the relation (table)
each Ai is an attribute name in the schema of relation r
Di is the data type of values in the domain of attribute Ai
Example
create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8, 2));
University DB schema: instructor(ID, name, dept_name, salary)
Create Table constructs: Integrity constraints

not null
primary key (A1 , ..., An )
foreign key (Am , ..., An ) references r

create table instructor (
ID char(5),
name varchar(20) not null,
dept_name varchar(20),
salary numeric(8, 2),
primary key (ID),
foreign key (dept_name) references department);
primary key declaration on an attribute automatically ensures not null
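A minimal sketch of how these constraints behave, using Python's built-in sqlite3 module (an assumption; the lecture does not name a system). The one-column department table and the helper violates are hypothetical, and SQLite only enforces foreign keys once PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: foreign keys are off by default

# Simplified, hypothetical department table (the real one has more attributes)
conn.execute("create table department (dept_name varchar(20), primary key (dept_name))")
conn.execute("""
    create table instructor (
        ID        char(5),
        name      varchar(20) not null,
        dept_name varchar(20),
        salary    numeric(8, 2),
        primary key (ID),
        foreign key (dept_name) references department)
""")
conn.execute("insert into department values ('Biology')")
conn.execute("insert into instructor values ('10211', 'Smith', 'Biology', 66000)")

def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

null_name = violates("insert into instructor values ('10212', null, 'Biology', 50000)")  # not null
dup_key = violates("insert into instructor values ('10211', 'Jones', 'Biology', 50000)")  # primary key
bad_dept = violates("insert into instructor values ('10213', 'Lee', 'Physics', 50000)")   # foreign key
row_count = conn.execute("select count(*) from instructor").fetchone()[0]
print(null_name, dup_key, bad_dept, row_count)
```

All three inserts are rejected, so the table still holds only the original row. One caveat: SQLite itself does not follow the "primary key implies not null" rule for ordinary tables (a documented legacy behaviour), so that rule is best checked on another system.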
Create Table construct: More relations
create table student (
ID varchar(5),
name varchar(20) not null,
dept_name varchar(20),
tot_cred numeric(3, 0),
primary key (ID),
foreign key (dept_name) references department);
create table course (
course_id varchar(8),
title varchar(50),
dept_name varchar(20),
credits numeric(2, 0),
primary key (course_id),
foreign key (dept_name) references department);
create table takes (

ID varchar(5),
course_id varchar(8),
sec_id varchar(8),
semester varchar(6),
year numeric(4, 0),
grade varchar(2),
primary key (ID, course_id, sec_id, semester, year),
foreign key (course_id, sec_id, semester, year) references section);
NOTE: sec_id can be dropped from primary key above to ensure a student cannot register for two sections of the
same course in the same semester
Update Tables
Insert (DML command)
insert into instructor values ('10211', 'Smith', 'Biology', 66000);
Delete (DML command)
Remove all tuples from the student relation
delete from student
Drop Table (DDL command)
drop table r
Alter (DDL command) # to edit the schema
alter table r add A D
Where A is the name of the attribute to be added to relation r and D is the domain of A
All existing tuples in the relation are assigned null as the value for the new attribute
alter table r drop A
Where A is the name of the attribute of relation r
Dropping of attributes is not supported by many databases
Data Manipulation Language (DML): Query Structure

Basic query structure
A typical SQL query has the form:
select A1, A2, ..., An
from r1, r2, ..., rm
where P
Ai represents an attribute of one of the relations ri
ri represents a relation
P is a predicate
The result of an SQL query is a relation
SELECT clause
The select clause lists the attributes desired in the result of a query
Corresponds to the projection operation of relational algebra
Example: find the names of all instructors
select name from instructor
NOTE: SQL names are case insensitive
Name = NAME = name
Some people prefer to use UPPER CASE wherever we use the bold font
SQL allows duplicates in relations as well as in query results
To force the elimination of duplicates, insert the keyword distinct after select
Find the department names of all instructors and remove duplicates
select distinct dept_name
from instructor
The keyword all specifies that duplicates should not be removed
select all dept_name
from instructor
An asterisk (*) in the select denotes all attributes
select *
from instructor
An attribute can be a literal with no from clause
select '437'
Result is a table with one column and a single row with the value '437'
Can give the column a name using:
select '437' as FOO
An attribute can be a literal with from clause
select 'A'
from instructor
Result is a table with one column and N rows (number of tuples in the instructors table), each row with value 'A'
The select clause can contain arithmetic expressions involving the operators +, -, * and /, operating on constants or attributes of tuples
The query:
select ID, name, salary/12
from instructor
Would return a relation that is the same as the instructor relation (without dept_name), except that the value of the attribute salary is divided by 12
Can rename "salary/12" using the as clause:
select ID, name, salary/12 as monthly_salary
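The select-clause variants above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite is not the system used in the lecture, and the three sample rows are only a small excerpt of the instructor relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table instructor (ID char(5), name varchar(20), dept_name varchar(20), salary numeric(8, 2))"
)
conn.executemany("insert into instructor values (?, ?, ?, ?)", [
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
    ("45565", "Katz", "Comp. Sci.", 75000),
    ("12121", "Wu", "Finance", 90000),
])

# 'all' keeps duplicates (the default), 'distinct' removes them
with_dups = conn.execute("select all dept_name from instructor").fetchall()
no_dups = conn.execute("select distinct dept_name from instructor").fetchall()

# A literal with a from clause yields one row per tuple of the relation
literal = conn.execute("select 'A' from instructor").fetchall()

# Arithmetic in the select clause, renamed with 'as'
# (SQLite divides integers with integer division; other systems may differ)
monthly = conn.execute(
    "select name, salary / 12 as monthly_salary from instructor where name = 'Wu'"
).fetchall()

print(with_dups, no_dups, literal, monthly)
```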
WHERE clause
The where clause specifies conditions that the result must satisfy
Corresponds to the selection predicate of the relational algebra
To find all instructors in the Computer Science department
select name
from instructor
where dept_name = 'Comp. Sci.'
Comparison results can be combined using the logical connectives and, or, not
To find all instructors in Comp. Sci. department with salary > 80000
select name
from instructor
where dept_name = 'Comp. Sci.' and salary > 80000
Comparisons can be applied to results of arithmetic expressions
FROM clause
The from clause lists the relations involved in the query
Corresponds to the Cartesian product operation of the relational algebra
Find the Cartesian product instructor × teaches
select *
from instructor, teaches
Generates every possible instructor-teaches pair with all attributes from both relations
For common attributes (for eg: ID), the attributes in the resulting table are renamed using the relation name (for
eg: instructor.ID)
Cartesian product is not very useful directly, but useful when combined with the where-clause condition (selection
operation in relational algebra)
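A small sketch of why the Cartesian product becomes useful only together with a where clause, again using Python's sqlite3 module as an assumed stand-in system. The two tables are cut down to hypothetical two-column versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID char(5), name varchar(20))")
conn.execute("create table teaches (ID char(5), course_id varchar(8))")
conn.executemany("insert into instructor values (?, ?)", [
    ("10101", "Srinivasan"), ("12121", "Wu"), ("15151", "Mozart"),
])
conn.executemany("insert into teaches values (?, ?)", [
    ("10101", "CS-101"), ("10101", "CS-315"), ("12121", "FIN-201"),
])

# Cartesian product: every instructor tuple paired with every teaches tuple
pairs = conn.execute("select * from instructor, teaches").fetchall()

# Product plus selection: keep only the pairs that agree on ID
joined = conn.execute("""
    select name, course_id
    from instructor, teaches
    where instructor.ID = teaches.ID
    order by name, course_id
""").fetchall()

print(len(pairs), joined)
```

The product has 3 × 3 = 9 rows, most of them meaningless; the where condition keeps the 3 rows that pair an instructor with a course they actually teach.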
Cartesian product
📚
Week 2 Lecture 4
Class BSCCS2001
Created @September 3, 2021 11:26 AM
Materials
Module # 9
Type Lecture
Week # 2
Introduction to Structured Query Language (SQL) (part 2)

Cartesian product (cont. from the previous lecture's end)
Example
Find the names of all instructors who have taught some courses and the course_id
select name, course_id
from instructor, teaches
where instructor.ID = teaches.ID
Equi-Join, Natural Join
Here in this table, we do not have the names of the courses
If we want the name, we will again have to do a similar join operation with a table that has the names of the
courses
This operation is known as a Natural Join
Example
Find the names of all the instructors in the Art dept. who have taught some courses and the course_id
select name, course_id
from instructor, teaches
where instructor.ID = teaches.ID and instructor.dept_name = 'Art'
Rename AS operation
The SQL allows renaming relations and attributes using the as clause:
old_name as new_name
Find the names of all the instructors who have a higher salary than some instructor in 'Comp. Sci.'
select distinct T.name
from instructor as T, instructor as S
where T.salary > S.salary and S.dept_name = 'Comp. Sci.'
The keyword as is optional and may be omitted
instructor as T ≡ instructor T
String Operations
SQL includes a string-matching operator for comparisons on character strings.
The operator like uses patterns that are described using two special characters:
percent (%)
The % character matches any sub-string
underscore ( _ )
The _ character matches any single character
Find the names of all instructors whose name includes the sub-string "dar"
select name
from instructor
where name like '%dar%'
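Match the string "100%"
like '100\%' escape '\'
in the above example, we use the backslash ( \ ) as the escape character, so that \% stands for a literal percent sign
'%dar%' matches Majumdar, Sardar and Uddarin (and also Darwin on systems that compare case-insensitively)
'%dar___' (dar followed by 3 underscores) matches Darwin under case-insensitive comparison, but none of the others
Patterns are case sensitive in standard SQL, though some systems (for example MySQL with its default collation) match case-insensitively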
Pattern matching example
'Intro%' matches any string beginning with "Intro"
'%Comp%' matches any string containing "Comp" as a substring
'___' (3 underscores) matches any string of exactly 3 characters
'___%' (3 underscores and then a %) matches any string of at least 3 characters
SQL supports a variety of string operations such as
Concatenation (using "||") [double pipe symbol]
Converting from upper to lower case (and vice-versa)
Finding the string length, extracting substrings, etc...
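The pattern rules above can be checked with a small helper built on Python's sqlite3 module. This is a sketch under assumptions: the helper name like is hypothetical, and SQLite's like is case-insensitive for ASCII letters by default, which is why 'Darwin' matches a lower-case 'dar' pattern here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def like(value, pattern):
    # Evaluates: value LIKE pattern, with backslash declared as the escape character
    row = conn.execute(r"select ? like ? escape '\'", (value, pattern)).fetchone()
    return bool(row[0])

checks = {
    "substring":        like("Majumdar", "%dar%"),
    "dar + 3 chars":    like("Darwin", "%dar___"),   # relies on case-insensitive matching
    "dar + 2 chars":    like("Uddarin", "%dar___"),
    "exactly 3":        like("abc", "___"),
    "not exactly 3":    like("abcd", "___"),
    "at least 3":       like("abcd", "___%"),
    "literal percent":  like("100%", r"100\%"),
    "escaped, no wild": like("1000", r"100\%"),
    "unescaped wild":   like("1000", "100%"),
}
for label, outcome in checks.items():
    print(label, outcome)
```

Without the escape character, '100%' would also match '1000', because % is then a wildcard.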
Ordering the display of tuples (ORDER BY clause)

List in alphabetic order the names of all the instructors
select distinct name
from instructor
order by name
We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the
default
Example: order by name desc
Can sort on multiple attributes
Example: order by dept_name, name
Selecting number of tuples in output

The Select Top clause is used to specify the number of records to return
The Select Top clause is useful on large tables with thousands of records.
Returning a large number of records can impact performance
select distinct top 10 name
from instructor
Not all database systems support the SELECT TOP clause.
SQL Server & MS Access support select top
MySQL supports the limit clause
Oracle uses fetch first n rows only and rownum
select distinct name
from instructor
order by name
fetch first 10 rows only
WHERE clause predicates

SQL includes a between comparison operator
Example: Find the names of all the instructors with salary between $90,000 and $100,000
(that is, ≥ $90,000 and ≤ $100,000)
select name
from instructor
where salary between 90000 and 100000
Tuple comparison
select name, course_id
from instructor, teaches
where (instructor.ID, dept_name) = (teaches.ID, 'Biology');
IN operator
The in operator allows you to specify multiple values in a where clause
The in operator is a shorthand for multiple or conditions
select name
from instructor
where dept_name in ('Comp. Sci.', 'Biology')
Duplicates
In relations with duplicates, SQL can define how many copies of tuples appear in the result
Multiset versions of some of the relational algebra operators - given multiset relations r1 and r2:
a) SELECT σ_θ(r1): If there are c1 copies of tuple t1 in r1 and t1 satisfies selection σ_θ, then there are c1 copies of t1 in σ_θ(r1)
b) PROJECTION Π_A(r1): For each copy of tuple t1 in r1, there is a copy of tuple Π_A(t1) in Π_A(r1), where Π_A(t1) denotes the projection of the single tuple t1
c) CARTESIAN PRODUCT r1 × r2: If there are c1 copies of tuple t1 in r1 and c2 copies of tuple t2 in r2, there are c1 × c2 copies of the tuple t1 ⋅ t2 in r1 × r2
Example: Suppose multiset relations r1(A, B) and r2(C) are as follows:
r1 = {(1, a), (2, a)}; r2 = {(2), (3), (3)}
Then Π_B(r1) would be {(a), (a)} while Π_B(r1) × r2 would be
{(a, 2), (a, 2), (a, 3), (a, 3), (a, 3), (a, 3)}
SQL duplicate semantics:
select A1, A2, ..., An
from r1, r2, ..., rm
where P
is equivalent to the multiset version of the expression:
Π_{A1, A2, ..., An}(σ_P(r1 × r2 × ... × rm))
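The multiset rules can be replayed in Python with collections.Counter, which maps each tuple to its number of copies. This is an illustrative sketch only; the helper names project and cross are hypothetical, and the data is the r1, r2 example from these notes.

```python
from collections import Counter
from itertools import product

# Multiset relations: tuple -> number of copies
r1 = Counter({(1, "a"): 1, (2, "a"): 1})   # r1(A, B)
r2 = Counter({(2,): 1, (3,): 2})           # r2(C), with (3) occurring twice

def project(rel, positions):
    """Multiset projection: every copy of a tuple contributes one projected copy."""
    out = Counter()
    for t, copies in rel.items():
        out[tuple(t[i] for i in positions)] += copies
    return out

def cross(x, y):
    """Multiset Cartesian product: c1 * c2 copies of each concatenated tuple."""
    out = Counter()
    for (t1, c1), (t2, c2) in product(x.items(), y.items()):
        out[t1 + t2] += c1 * c2
    return out

proj_b = project(r1, [1])        # projection on B
result = cross(proj_b, r2)
print(proj_b, result)
```

The projection yields two copies of (a); the product then has 2 × 1 copies of (a, 2) and 2 × 2 copies of (a, 3), six tuples in total.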
📚
Week 2 Lecture 5
Class BSCCS2001
Created @September 4, 2021 6:05 PM
Materials
Module # 10
Type Lecture
Week # 2
Introduction to Structured Query Language (SQL) (part 3)

Set operations
Example
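Find the courses that ran in Fall 2009 or in Spring 2010
(select course_id from section where semester = 'Fall' and year = 2009)
union
(select course_id from section where semester = 'Spring' and year = 2010)
Find the courses that ran in Fall 2009 and in Spring 2010
(select course_id from section where semester = 'Fall' and year = 2009)
intersect
(select course_id from section where semester = 'Spring' and year = 2010)
Find the courses that ran in Fall 2009 but not in Spring 2010
(select course_id from section where semester = 'Fall' and year = 2009)
except
(select course_id from section where semester = 'Spring' and year = 2010)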
Find the salaries of all the instructors that are less than the largest salary
select distinct T.salary
from instructor as T, instructor as S
where T.salary < S.salary
Find the salaries of all the instructors
select distinct salary
from instructor
Find the largest salary of all the instructors
(select distinct salary from instructor)
except
(select distinct T.salary from instructor as T, instructor as S where T.salary < S.salary)
Set operations such as union, intersect and except automatically eliminate the duplicates
To retain all the duplicates, use the corresponding multiset versions union all, intersect all and except all
Suppose a tuple occurs m times in r and n times in s, then it occurs ...
m + n times in r union all s
min(m, n) times in r intersect all s
max(0, m - n) times in r except all s
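These three counting rules happen to coincide with the +, & and - operators of Python's collections.Counter, which gives a quick way to check them. This is a sketch only; the tuple "t" and the counts m = 3, n = 2 are made up for illustration.

```python
from collections import Counter

r = Counter({"t": 3})   # tuple t occurs m = 3 times in r
s = Counter({"t": 2})   # tuple t occurs n = 2 times in s

union_all = r + s        # m + n copies
intersect_all = r & s    # min(m, n) copies
except_all = r - s       # max(0, m - n) copies
reverse_except = s - r   # max(0, n - m) = 0 copies, so t disappears

# The plain set versions keep at most one copy of each tuple
union_set = set(r) | set(s)

print(union_all["t"], intersect_all["t"], except_all["t"], reverse_except["t"], union_set)
```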
NULL values
What is a NULL value?
A NULL value is something unknown or a value that does not exist yet
Why is NULL value so important?
Certain values may not exist for everyone
For eg: Every student may not have a passport at the time of registration
Often times while we are creating/inserting a record, we may not know all the values of all the fields
For eg: When a student joins, the student does not have any credit assigned to him/her, so the total credit is
NULL
We can say 0 (zero), but 0 (zero) and NULL are different
0 (zero) means the student has earned no credits
NULL means the credit has not been given yet
Naturally, when we add an attribute to all the existing rows of a table, the value of the particular field cannot be known or set, so it will have to be initialized as a NULL value
It is possible for tuples to have a null value, denoted by null, for some of their attributes
The predicate is null can be used to check for null values
Example: Find all the instructors whose salary is null
select name
from instructor
where salary is null
It is not possible to test for null values with comparison operators such as =, <, > or <>
We need to use the is null and is not null operators instead
NULL values: Three valued logic

Three values - true, false, unknown
Any comparison with null returns unknown
Example: 5 < null or null <> null or null = null
Three-valued logic using the value unknown:
OR:
(unknown or true) = true
(unknown or false) = unknown
(unknown or unknown) = unknown
AND:
(true and unknown) = unknown
(false and unknown) = false
(unknown and unknown) = unknown
NOT:
(not unknown) = unknown
"P is unknown" evaluates to true if predicate P evaluates to unknown
Result of where clause predicate is treated as false if it evaluates to unknown
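The truth tables above can be written out as a small Python sketch, with None standing in for unknown. All function names here (and3, or3, not3, less_than, where_keeps) are hypothetical helpers for illustration, not SQL features.

```python
def and3(a, b):
    """Three-valued AND: false dominates, then unknown (None)."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """Three-valued OR: true dominates, then unknown (None)."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def not3(a):
    """Three-valued NOT: unknown stays unknown."""
    return None if a is None else (not a)

def less_than(x, y):
    """Any comparison involving null evaluates to unknown."""
    if x is None or y is None:
        return None
    return x < y

def where_keeps(p):
    """A where clause keeps a tuple only if the predicate is true (unknown counts as false)."""
    return p is True

print(or3(None, True), and3(False, None), not3(None), less_than(5, None))
```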
Aggregate functions
These functions operate on the multiset of values of a column of a relation (table) and return a value
avg: average value
min: minimum value
max: maximum value
sum: sum of the values
count: number of values
Examples
Find the average salary of instructors in the Computer Science department
select avg(salary)
from instructor
where dept_name = 'Comp. Sci.'
Find the total number of instructors who teach a course in the Spring 2010 semester
select count(distinct ID)
from teaches
where semester = 'Spring' and year = 2010
Find the number of tuples in the course relation (table)
select count(*)
from course;
Example (GROUP BY)
Find the average salary of instructors in each department
select dept_name, avg(salary) as avg_salary
from instructor
group by dept_name;
So, group by takes a column and makes sub-tables of all those records which have the same value on that particular group by attribute
It then applies the aggregate function on the column within each sub-table
Attributes in select clause outside of aggregate functions must appear in group by list
-- The following query is incorrect: 'ID' appears in the select clause outside an aggregate function but is not in the group by list
select dept_name, ID, avg(salary)
from instructor
group by dept_name;
HAVING clause
Find the names and average salaries of all departments whose average salary is greater than 42,000
select dept_name, avg(salary)
from instructor
group by dept_name
having avg(salary) > 42000;
NOTE: Predicates in the having clause are applied after the formation of groups whereas predicates in the where
clause are applied before forming groups
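The difference between where (filter tuples, then group) and having (group, then filter groups) can be seen on a small excerpt of the instructor relation. This sketch assumes Python's sqlite3 module as the system; the salary threshold of 80000 in the second query is an arbitrary value chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), dept_name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?, ?)", [
    ("Srinivasan", "Comp. Sci.", 65000),
    ("Katz", "Comp. Sci.", 75000),
    ("Brandt", "Comp. Sci.", 92000),
    ("Wu", "Finance", 90000),
    ("Singh", "Finance", 80000),
    ("Mozart", "Music", 40000),
])

# having: groups are formed first, then whole groups are filtered on the aggregate
having_rows = conn.execute("""
    select dept_name, avg(salary) as avg_salary
    from instructor
    group by dept_name
    having avg(salary) > 42000
    order by dept_name
""").fetchall()

# where: individual tuples are filtered first, then the survivors are grouped
where_rows = conn.execute("""
    select dept_name, avg(salary) as avg_salary
    from instructor
    where salary >= 80000
    group by dept_name
    order by dept_name
""").fetchall()

print(having_rows)
print(where_rows)
```

Music is dropped by having because its group average is 40000. In the second query Comp. Sci. averages 92000, since only Brandt passes the where filter before grouping.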
NULL values and aggregates

Total all salaries
select sum(salary)
from instructor;
Above statement ignores null amounts
Result is null if there is no non-null amount
All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes
What if collection has only null values?
count returns 0 (zero)
all other aggregates return null
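A short check of these rules with Python's sqlite3 module (an assumed stand-in system; the three sample rows are hypothetical). Python's None is how sqlite3 represents SQL null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?)", [
    ("Wu", 90000), ("Kim", None), ("Mozart", 40000),
])

# Aggregates skip the null salary; count(*) counts every tuple
totals = conn.execute(
    "select sum(salary), avg(salary), count(salary), count(*) from instructor"
).fetchone()

# A collection with only null values: count gives 0, other aggregates give null
only_nulls = conn.execute(
    "select sum(salary), max(salary), count(salary), count(*) from instructor where salary is null"
).fetchone()

# '= null' evaluates to unknown for every tuple, so nothing is selected
equals_null = conn.execute("select count(*) from instructor where salary = null").fetchone()[0]

print(totals, only_nulls, equals_null)
```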
📚
Week 3 Lecture 1
Class BSCCS2001
Materials
Module # 11
Type Lecture
Week # 3
SQL Examples
SELECT DISTINCT
From the classroom relation, find the names of buildings that have at least one classroom with capacity less than 100 (removing the duplicates).
Relation:
classroom
building room_number capacity
Packard 101 500
Painter 514 10
Taylor 3128 70
Watson 100 30
Watson 120 50
Query:
SELECT DISTINCT building
FROM classroom
WHERE capacity < 100;
Output:
building
Painter
Taylor
Watson
SELECT ALL
From the classroom relation, find the names of buildings that have at least one classroom with capacity less than 100 (without removing the duplicates).
Relation:
classroom
building room_number capacity
Packard 101 500
Painter 514 10
Taylor 3128 70
Watson 100 30
Watson 120 50
Query:
SELECT ALL building
FROM classroom
WHERE capacity < 100;
Output:
building
Painter
Taylor
Watson
Watson
NOTE: The duplicate retention is default and hence it is a common practice to skip ALL immediately after SELECT
Cartesian Product
Find the list of all students of departments which have a budget < $100K
SELECT name, budget
FROM student, department
WHERE student.dept_name = department.dept_name AND budget < 100000;
name budget
Brandt 50000
Peltier 70000
Levy 70000
Sanchez 80000
Snow 70000
Aoi 85000
Bourikas 85000
Tanaka 90000
The above query generates every possible student-department pair, which is the Cartesian product of student and department.
Then, it keeps only the rows that satisfy student.dept_name = department.dept_name AND budget < 100000
The common attribute dept_name is disambiguated by prefixing it with the relation name: student.dept_name and department.dept_name
RENAME AS Operation
The same query in the above case can be framed by renaming the table as shown below:
SELECT S.name AS studentname, budget AS deptbudget
FROM student AS S, department AS D
WHERE S.dept_name = D.dept_name AND budget < 100000;
studentname deptbudget
Brandt 50000
Peltier 70000
Levy 70000
Sanchez 80000
Snow 70000
Aoi 85000
Bourikas 85000
Tanaka 90000
The above query renames the relation student AS S and the relation department AS D
It also displays the attribute name as studentname and the attribute budget as deptbudget
NOTE: The budget attribute does not have any prefix because it occurs only in the department relation
SELECT: AND and OR

From the instructor and department relations in the figure, find out the names of all the instructors whose department
is Finance or whose department is in any of the following buildings: Watson, Taylor
instructor
id name dept_name salary
10101 Srinivasan Comp. Sci. 65000
12121 Wu Finance 90000
15151 Mozart Music 40000
22222 Einstein Physics 95000
32343 El Said History 60000
33456 Gold Physics 87000
45565 Katz Comp. Sci. 75000
58583 Califieri History 62000
76543 Singh Finance 80000
76766 Crick Biology 72000
83821 Brandt Comp. Sci. 92000
98345 Kim Elec. Eng. 80000
department
dept_name building budget
Biology Watson 90000
Comp. Sci. Taylor 100000
Elec. Eng. Taylor 85000
Finance Painter 120000
History Painter 50000
Music Packard 80000
Physics Watson 70000
Query:
SELECT name
FROM instructor I, department D
WHERE D.dept_name = I.dept_name
AND (I.dept_name = 'Finance' OR building IN ('Watson', 'Taylor'));
Output:
name
Srinivasan
Wu
Einstein
Gold
Katz
Singh
Crick
Brandt
Kim
String Operations
From the course relation in the figure, find the titles of all the courses whose course_id begins with 3 letters indicating the department
course
course_id title dept_name credits
BIO-101 Intro. to Biology Biology 4
BIO-301 Genetics Biology 4
BIO-399 Computational Biology Biology 3
CS-101 Intro. to Computer Science Comp. Sci. 4
CS-190 Game Design Comp. Sci. 4
CS-315 Robotics Comp. Sci. 3
CS-319 Image Processing Comp. Sci. 3
CS-347 Database System Concepts Comp. Sci. 3
EE-181 Intro. to Digital Systems Elec. Eng. 3
FIN-201 Investment Banking Finance 3
HIS-351 World History History 3
MU-199 Music Video Production Music 3
PHY-101 Physical Principles Physics 4
Query:
SELECT title
FROM course
WHERE course_id LIKE '___-%'; -- 3 underscores
Output:
title
Intro. to Biology
Genetics
Computational Biology
Investment Banking
World History
Physical Principles
The course_id of each course begins with either 2 or 3 letters, followed by a hyphen and then a 3-digit number. The above query returns the titles of those courses whose course_id begins with 3 letters
ORDER BY
From the student relation in the figure, obtain the list of all students in alphabetic order of departments and within
each department, in decreasing order of total credits.
student
id name dept_name tot_cred
00128 Zhang Comp. Sci. 102
12345 Shankar Comp. Sci. 32
19991 Brandt History 80
23121 Chavez Finance 110
44553 Peltier Physics 56
45678 Levy Physics 46
54321 Williams Comp. Sci. 54
55739 Sanchez Music 38
70557 Snow Physics 0
76543 Brown Comp. Sci. 58
76653 Aoi Elec. Eng. 60
98765 Bourikas Elec. Eng. 98
98988 Tanaka Biology 120
Query:
SELECT name, dept_name, tot_cred
FROM student
ORDER BY dept_name ASC, tot_cred DESC;
Output:
name dept_name tot_cred
Tanaka Biology 120
Zhang Comp. Sci. 102
Brown Comp. Sci. 58
Williams Comp. Sci. 54
Shankar Comp. Sci. 32
Bourikas Elec. Eng. 98
Aoi Elec. Eng. 60
Chavez Finance 110
Brandt History 80
Sanchez Music 38
Peltier Physics 56
Levy Physics 46
Snow Physics 0
How is this sort happening?
The list is first sorted in alphabetic order of dept_name
Within each department, it is sorted in decreasing order of total credits
IN Operator
From the teaches relation in the figure, find the IDs of all the courses taught in the Fall or Spring of 2018
teaches
id course_id sec_id semester year
10101 CS-101 1 Fall 2017
10101 CS-315 1 Spring 2018
10101 CS-347 1 Fall 2017
12121 FIN-201 1 Spring 2018
15151 MU-199 1 Spring 2018
22222 PHY-101 1 Fall 2017
32343 HIS-351 1 Spring 2018
45565 CS-101 1 Spring 2018
45565 CS-319 1 Spring 2018
76766 BIO-101 1 Summer 2017
76766 BIO-301 1 Summer 2018
83821 CS-190 1 Spring 2017
83821 CS-190 2 Spring 2017
83821 CS-319 2 Spring 2018
98345 EE-181 1 Spring 2017
Query:
SELECT course_id
FROM teaches
WHERE semester IN ('Fall', 'Spring')
AND year = 2018;
Output:
course_id
CS-315
FIN-201
MU-199
HIS-351
CS-101
CS-319
CS-319
NOTE: Now we can use DISTINCT to remove duplicates
Set Operations: UNION

For the same question as in the IN example above, we can find the solution using the UNION operator as follows:
Query:
SELECT course_id
FROM teaches
WHERE semester = 'Fall'
AND year = 2018
UNION
SELECT course_id
FROM teaches
WHERE semester = 'Spring'
AND year = 2018
Output:
course_id
CS-101
CS-315
CS-319
FIN-201
HIS-351
MU-199
NOTE: UNION removes all the duplicates. If we use UNION ALL instead of UNION, we get the same multiset of tuples as in the IN example above (CS-319 appears twice)
Set Operations: INTERSECT

From the instructor relation in the figure, find the names of all the instructors who taught in either the Computer Science department or the Finance department and whose salary is < 80,000
instructor (same relation as in the AND/OR example above)
Query:
SELECT name
FROM instructor
WHERE dept_name IN ('Comp. Sci.', 'Finance')
INTERSECT
SELECT name
FROM instructor
WHERE salary < 80000;
Output:
name
Srinivasan
Katz
NOTE: The same thing can be achieved by using the query:
SELECT name FROM instructor WHERE dept_name IN ('Comp. Sci.', 'Finance') AND salary < 80000;
Set Operation: EXCEPT
From the instructor relation in the figure, find the names of all the instructors who taught in either the Computer Science department or the Finance department and whose salary is either ≥ 90,000 or ≤ 70,000
instructor (same relation as in the AND/OR example above)
Query:
SELECT name
FROM instructor
WHERE dept_name IN ('Comp. Sci.', 'Finance')
EXCEPT
SELECT name
FROM instructor
WHERE salary < 90000 AND salary > 70000;
Output:
name
Srinivasan
Brandt
Wu
NOTE: The same can be achieved by using the following query
SELECT name FROM instructor
WHERE dept_name IN ('Comp. Sci.', 'Finance')
AND (salary >= 90000 OR salary <= 70000);
Aggregate function: AVG

From the classroom relation given in the figure, find the names and the average capacity of each building whose
average capacity is greater than 25
classroom
building room_number capacity
Packard 101 500
Painter 514 10
Taylor 3128 70
Watson 100 30
Watson 120 50
Query:
SELECT building, AVG(capacity)
FROM classroom
GROUP BY building
HAVING AVG(capacity) > 25;
Output:
building avg
Taylor 70.00
Packard 500.00
Watson 40.00
Aggregate function: MIN

From the instructor relation given in the figure, find the least salary drawn by any instructor among all the instructors
instructor (same relation as in the AND/OR example above)
Query:
SELECT MIN(salary) AS least_salary FROM instructor;
Output:
least_salary
40000
Aggregate function: MAX

From the instructor relation given above, find the highest salary drawn by any instructor among all the instructors
Query:
SELECT MAX(salary) AS highest_salary FROM instructor;
Output:
highest_salary
95000
Aggregate function: COUNT
From the instructor relation given above, find the number of instructors in each department
Query:
SELECT dept_name, COUNT(id) AS ins_count
FROM instructor
GROUP BY dept_name;
Output:
dept_name ins_count
Comp. Sci. 3
Finance 2
Music 1
Physics 2
History 2
Biology 1
Elec. Eng. 1
Aggregate function: SUM

From the course relation given in the figure, find the total credits offered by each department
course
course_id title dept_name credits
BIO-101 Intro. to Biology Biology 4
BIO-301 Genetics Biology 4
BIO-399 Computational Biology Biology 3
CS-101 Intro. to Computer Science Comp. Sci. 4
CS-190 Game Design Comp. Sci. 4
CS-315 Robotics Comp. Sci. 3
CS-319 Image Processing Comp. Sci. 3
CS-347 Database System Concepts Comp. Sci. 3
EE-181 Intro. to Digital Systems Elec. Eng. 3
FIN-201 Investment Banking Finance 3
HIS-351 World History History 3
MU-199 Music Video Production Music 3
PHY-101 Physical Principles Physics 4
Query:
SELECT dept_name, SUM(credits) AS sum_credits
FROM course
GROUP BY dept_name;
Output:
dept_name sum_credits
Finance 3
History 3
Physics 4
Music 3
Comp. Sci. 17
Biology 11
Elec. Eng. 3
📚
Week 3 Lecture 2
Class BSCCS2001
Materials
Module # 12
Type Lecture
Week # 3
Intermediate SQL
Nested sub-queries
SQL provides a mechanism for the nesting of sub-queries
A sub-query is a SELECT-FROM-WHERE expression that is nested within another query
The nesting can be done in the following SQL query
SELECT A1 , A2 , ..., An
FROM r1 , r2 , ..., rm
WHERE P
as follows:
Ai can be replaced by a sub-query that generates a single value
ri can be replaced by any valid sub-query
P can be replaced with an expression of the form:
B <operation> (sub-query)
where B is an attribute and <operation> is to be defined later
Input of a query → One or more relations
Output of a query → Always a single relation
Subqueries in WHERE clause

Typical use of subqueries is to perform tests
For set membership
For set comparisons
For set cardinality
Set Membership
Find the courses offered in Fall 2009 and in Spring 2010 (INTERSECT example)
SELECT DISTINCT course_id
FROM section
WHERE semester = 'Fall'
AND year = 2009
AND course_id IN (
SELECT course_id
FROM section
WHERE semester = 'Spring' AND year = 2010);
Find courses offered in Fall 2009 but not in Spring 2010 (EXCEPT example)
SELECT DISTINCT course_id
FROM section
WHERE semester = 'Fall'
AND year = 2009
AND course_id NOT IN (
SELECT course_id
FROM section
WHERE semester = 'Spring' AND year = 2010);
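A sketch that runs both membership tests and compares them with the set-operation formulations, using Python's sqlite3 module as an assumed system. The four section rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table section (course_id varchar(8), sec_id varchar(8), semester varchar(6), year numeric(4, 0))")
conn.executemany("insert into section values (?, ?, ?, ?)", [
    ("CS-101", "1", "Fall", 2009),
    ("CS-347", "1", "Fall", 2009),
    ("CS-101", "1", "Spring", 2010),
    ("CS-315", "1", "Spring", 2010),
])

FALL = "select course_id from section where semester = 'Fall' and year = 2009"
SPRING = "select course_id from section where semester = 'Spring' and year = 2010"

# Set membership with a nested sub-query
both = conn.execute(
    f"select distinct course_id from section "
    f"where semester = 'Fall' and year = 2009 and course_id in ({SPRING})"
).fetchall()
only_fall = conn.execute(
    f"select distinct course_id from section "
    f"where semester = 'Fall' and year = 2009 and course_id not in ({SPRING})"
).fetchall()

# The same questions answered with set operations
both_setop = conn.execute(f"{FALL} intersect {SPRING}").fetchall()
only_fall_setop = conn.execute(f"{FALL} except {SPRING}").fetchall()

print(both, only_fall)
```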
Find the total number of (distinct) students who have taken course sections taught by the instructor with ID 10101
SELECT COUNT(DISTINCT id)
FROM takes
WHERE (course_id, sec_id, semester, year) IN (
SELECT course_id, sec_id, semester, year
FROM teaches
WHERE teaches.id = 10101);
NOTE: The above query can be written in a much simpler manner. The formulation above is just to illustrate SQL features
Set comparison - "SOME" clause

Find names of instructors with salary greater than that of some (at least one) instructor in the Biology department
SELECT DISTINCT T.name
FROM instructor AS T, instructor AS S
WHERE T.salary > S.salary AND S.dept_name = 'Biology';
The same above query using SOME clause
SELECT name
FROM instructor
WHERE salary > SOME (
SELECT salary
FROM instructor
WHERE dept_name = 'Biology');
Definition of "SOME" clause

F <comp> SOME r ⇔ ∃t ∈ r such that (F <comp> t)
where <comp> can be: <, ≤, >, ≥, =, ≠
SOME represents existential quantification [The entity in "()" is a tuple here]
5 < SOME (0, 5, 6) → true
5 < SOME (0, 5) → false
5 = SOME (0, 5) → true
5 ≠ SOME (0, 5) → true (since 0 ≠ 5)
(= SOME) ≡ IN
However, (≠ SOME) ≢ NOT IN
Set Comparison - "ALL" clause

Find the names of all the instructors whose salary is greater than the salary of all instructors in the Biology department
SELECT name
FROM instructor
WHERE salary > ALL (
SELECT salary
FROM instructor
WHERE dept_name = 'Biology');
Definition of "ALL" clause

F <comp> ALL r ⇔ ∀t ∈ r such that (F <comp> t)
where <comp> can be: <, ≤, >, ≥, =, ≠
ALL represents universal quantification [The entity in "()" is a tuple here]
5 < ALL (0, 5, 6) → false
5 < ALL(6, 10) → true
5 = ALL(4, 5) → false
5 ≠ ALL (4, 6) → true (since 5 ≠ 4 and 5 ≠ 6)
(≠ ALL) ≡ NOT IN
However, (= ALL) ≢ IN
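The SOME and ALL definitions map directly onto existential and universal quantification. A small sketch in Python, where the helper names some and all_ are made up for illustration and null values (three-valued logic) are deliberately ignored:

```python
import operator


def some(value, comp, rows):
    """F <comp> SOME r: the comparison holds for at least one t in r."""
    return any(comp(value, t) for t in rows)


def all_(value, comp, rows):
    """F <comp> ALL r: the comparison holds for every t in r."""
    return all(comp(value, t) for t in rows)


print(some(5, operator.lt, [0, 5, 6]))  # True
print(some(5, operator.lt, [0, 5]))     # False
print(some(5, operator.eq, [0, 5]))     # True, same answer as 5 IN (0, 5)
print(all_(5, operator.lt, [6, 10]))    # True

# 5 is in (0, 5), yet "5 <> SOME (0, 5)" is true, so <> SOME is not NOT IN.
print(some(5, operator.ne, [0, 5]))     # True
# 5 is in (4, 5), yet "5 = ALL (4, 5)" is false, so = ALL is not IN.
print(all_(5, operator.eq, [4, 5]))     # False
```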
Test for empty relations: "EXISTS"

The EXISTS construct returns the value true if the argument subquery is non-empty
EXISTS r ⇔ r ≠ ∅
NOT EXISTS r ⇔ r = ∅
Use of "EXISTS" clause

Yet another way of specifying the query "Find all the courses taught in both the Fall 2009 semester and in the Spring
2010 semester"
SELECT course_id
FROM section AS S
WHERE semester = 'Fall' AND year = 2009
AND EXISTS (
SELECT * FROM section AS T
WHERE semester = 'Spring' AND year = 2010
AND S.course_id = T.course_id);
Correlation name - variable S in the outer query
Correlated subquery - the inner query
Use of "NOT EXISTS" clause

Find all students who have taken all courses offered by the Biology department
SELECT DISTINCT S.id, S.name

FROM student AS S
WHERE NOT EXISTS (
(
SELECT course_id
FROM course
WHERE dept_name = 'Biology')
EXCEPT
(
SELECT T.course_id
FROM takes AS T
WHERE S.id = T.id));
First nested query lists all the courses offered by the Biology department
Second nested query lists all the courses a particular student has taken
NOTE: X −Y =∅ ⇔X ⊆Y
NOTE: This query cannot be written using = ALL and its variants
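The NOT EXISTS ... EXCEPT pattern can be checked on a toy database. A sketch assuming Python's sqlite3 module and made-up students and courses; SQLite does not accept parentheses around the operands of EXCEPT, so they are dropped here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative schema and rows, trimmed to the columns the query needs.
conn.executescript("""
CREATE TABLE student (id TEXT, name TEXT);
CREATE TABLE course (course_id TEXT, dept_name TEXT);
CREATE TABLE takes (id TEXT, course_id TEXT);
INSERT INTO student VALUES ('1', 'Ana'), ('2', 'Ben');
INSERT INTO course VALUES
    ('BIO-101', 'Biology'), ('BIO-301', 'Biology'), ('CS-101', 'Comp. Sci.');
INSERT INTO takes VALUES
    ('1', 'BIO-101'), ('1', 'BIO-301'), ('1', 'CS-101'), ('2', 'BIO-101');
""")

# A student qualifies when (Biology courses) EXCEPT (courses taken) is empty.
took_all_biology = conn.execute("""
SELECT DISTINCT S.id, S.name
FROM student AS S
WHERE NOT EXISTS (
    SELECT course_id FROM course WHERE dept_name = 'Biology'
    EXCEPT
    SELECT T.course_id FROM takes AS T WHERE S.id = T.id)
""").fetchall()

print(took_all_biology)  # [('1', 'Ana')]
```

Ben is excluded because BIO-301 remains in the difference for him, so the difference is non-empty.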
Test for absence of duplicate tuples: "UNIQUE"

The UNIQUE construct tests whether a subquery has any duplicate tuples in its results
The UNIQUE construct evaluates to "true" if a given subquery contains no duplicates
Find all the courses that were offered at most once in 2009
SELECT T.course_id
FROM course AS T
WHERE UNIQUE (
SELECT R.course_id
FROM section AS R
WHERE T.course_id = R.course_id
AND R.year = 2009);
Subqueries in the "FROM" clause

SQL allows a subquery expression to be used in the FROM clause
Find the average instructors' salaries of those departments where the average salary is greater than $42,000
SELECT dept_name, avg_salary

FROM (
SELECT dept_name, AVG(salary) AS avg_salary
FROM instructor
GROUP BY dept_name)
WHERE avg_salary > 42000;
NOTE: We do not need a HAVING clause
Another way to write the above query
SELECT dept_name, avg_salary

FROM (
SELECT dept_name, AVG(salary)
FROM instructor
GROUP BY dept_name) AS dept_avg(dept_name, avg_salary)
WHERE avg_salary > 42000;
WITH clause
The WITH clause provides a way of defining a temporary relation whose definition is available only to the query in
which the WITH clause occurs
Find all the departments with the maximum budget
WITH max_budget(value) AS
(
SELECT MAX(budget)
FROM department)
SELECT department.dept_name
FROM department, max_budget
WHERE department.budget = max_budget.value;
Complex queries using WITH clause

Find all departments where the total salary is greater than the average of the total salary at all departments
WITH dept_total(dept_name, value) AS
(
SELECT dept_name, SUM(salary)
FROM instructor
GROUP BY dept_name),
dept_total_avg(value) AS
(
SELECT AVG(value)
FROM dept_total)
SELECT dept_name
FROM dept_total, dept_total_avg
WHERE dept_total.value > dept_total_avg.value;
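The two chained temporary relations can be run end to end. A minimal sketch assuming Python's sqlite3 module and made-up instructor rows (names and salaries are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [
        ("A", "Physics", 90000),
        ("B", "Physics", 80000),
        ("C", "Music", 40000),
        ("D", "History", 60000),
    ],
)

# dept_total holds one total per department; dept_total_avg averages those totals.
query = """
WITH dept_total(dept_name, value) AS (
    SELECT dept_name, SUM(salary)
    FROM instructor
    GROUP BY dept_name),
dept_total_avg(value) AS (
    SELECT AVG(value)
    FROM dept_total)
SELECT dept_name
FROM dept_total, dept_total_avg
WHERE dept_total.value > dept_total_avg.value
"""
above_average = conn.execute(query).fetchall()

# Totals are 170000, 40000 and 60000, so the average total is 90000.
print(above_average)  # [('Physics',)]
```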
Subqueries in the SELECT clause

Scalar subquery: Where a single value is expected
List all departments along with the number of instructors in each department
SELECT dept_name, (
SELECT COUNT(*)
FROM instructor
WHERE department.dept_name = instructor.dept_name)
AS num_instructors
FROM department;
Runtime error occurs if subquery returns more than one result tuple
Modifications of the Database

Deletion of tuples from a given relation
Insertion of new tuples into a given relation
Updating of values in some tuples in a given relation
Deletion
Delete all instructors
DELETE FROM instructor;
Delete all instructors from the Finance department
DELETE FROM instructor

WHERE dept_name = 'Finance';
Delete all tuples in the instructor relation for those instructors associated with a department located in the Watson
building
DELETE FROM instructor
WHERE dept_name IN (SELECT dept_name
FROM department
WHERE building = 'Watson');
Delete all instructors whose salary is less than the average salary of instructors
DELETE FROM instructor
WHERE salary < (SELECT AVG(salary) FROM instructor);
Problem: As we delete tuples from instructor, the average salary changes
Solution:
First, compute AVG ( salary ) and find all the tuples to delete
Next, delete all the tuples found above (without recomputing AVG or retesting the tuples)
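The difference between the two-phase evaluation and a naive "recompute after every deletion" strategy shows up even on four rows. A sketch assuming Python's sqlite3 module and made-up salaries; the naive loop is a hypothetical strategy written only for contrast:

```python
import sqlite3

salaries = [10, 50, 60, 90]  # illustrative values, average 52.5

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (salary REAL)")
conn.executemany("INSERT INTO instructor VALUES (?)", [(s,) for s in salaries])

# SQL semantics: AVG is computed once, then every tuple below it is deleted.
conn.execute(
    "DELETE FROM instructor WHERE salary < (SELECT AVG(salary) FROM instructor)"
)
two_phase = [row[0] for row in conn.execute(
    "SELECT salary FROM instructor ORDER BY salary")]

# Naive alternative: recompute the average after every single deletion.
naive = list(salaries)
changed = True
while changed:
    changed = False
    avg = sum(naive) / len(naive)
    for s in naive:
        if s < avg:
            naive.remove(s)
            changed = True
            break

print(two_phase)  # [60.0, 90.0]
print(naive)      # [90]
```

The naive loop also deletes 60, because the average climbs as low salaries disappear.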
Insertion
Add a new tuple to course
INSERT INTO course
VALUES ('CS-437', 'Database Systems', 'Comp. Sci.', 4);
or equivalently
INSERT INTO course (course_id, title, dept_name, credits)

VALUES ('CS-437', 'Database Systems', 'Comp. Sci.', 4);
Add a new tuple to student with tot_creds set to null
INSERT INTO student

VALUES ('3003', 'Green', 'Finance', null);
Add all instructors to the student relation with tot_creds set to 0
INSERT INTO student

SELECT id, name, dept_name, 0
FROM instructor;
The SELECT FROM WHERE statement is evaluated fully before any of its results are inserted into the relation
Otherwise queries like
INSERT INTO table1 SELECT * FROM table1;
would cause problems
Updates
Increase salaries of instructors whose salary is over $100,000 by 3% and all others by 5%
Write two UPDATE statements
UPDATE instructor
SET salary = salary * 1.03
WHERE salary > 100000;
UPDATE instructor
SET salary = salary * 1.05
WHERE salary <= 100000;
The order is important
Can be done better using the CASE statement
CASE statement for conditional updates

Same query as before but with CASE statement
UPDATE instructor
SET salary = CASE
WHEN salary <= 100000
THEN salary * 1.05
ELSE salary * 1.03
END;
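Why the order of the two UPDATE statements matters, and why CASE avoids the issue, can be seen with a single instructor just below the threshold. A sketch assuming Python's sqlite3 module; the row ('Kim', 98000) and the helper names are made up:

```python
import sqlite3


def fresh():
    """Return a new in-memory database with one illustrative instructor."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instructor (name TEXT, salary REAL)")
    conn.execute("INSERT INTO instructor VALUES ('Kim', 98000)")
    return conn


def salary(conn):
    """Read back the single salary, rounded to hide floating-point noise."""
    return round(conn.execute("SELECT salary FROM instructor").fetchone()[0], 2)


# Wrong order: the 5% raise pushes Kim over 100000, so the 3% raise applies too.
wrong = fresh()
wrong.execute("UPDATE instructor SET salary = salary * 1.05 WHERE salary <= 100000")
wrong.execute("UPDATE instructor SET salary = salary * 1.03 WHERE salary > 100000")

# CASE: each row is examined once, so it receives exactly one raise.
case = fresh()
case.execute(
    "UPDATE instructor SET salary = CASE "
    "WHEN salary <= 100000 THEN salary * 1.05 "
    "ELSE salary * 1.03 END"
)

print(salary(wrong))  # 105987.0
print(salary(case))   # 102900.0
```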
Updates with scalar subqueries

Recompute and update tot_creds value for all the students
UPDATE student S
SET tot_creds = (SELECT SUM(credits)
FROM takes, course
WHERE takes.course_id = course.course_id AND
S.id = takes.id AND
takes.grade <> 'F' AND
takes.grade IS NOT NULL);
This sets tot_creds to null for students who have not taken any course
To store 0 instead, replace SUM(credits) with:
CASE
WHEN SUM(credits) IS NOT NULL THEN SUM(credits)
ELSE 0
END
📚
Week 3 Lecture 3
Class BSCCS2001
Materials
Module # 13
Type Lecture
Week # 3
Intermediate SQL (part 2)

Joined Relations
Join operations take two relations and return as a result another relation
A join operation is a Cartesian product which requires that tuples in the two relations match (under some conditions)
It also specifies the attributes that are present in the result of the join
The join operations are typically used as subquery expressions in the FROM clause
Types of JOIN relations

Cross join
Inner join
Equi-join
Natural join
Outer join
Left outer join
Right outer join
Full outer join
Self-join
Cross JOIN
CROSS JOIN returns the Cartesian product of rows from tables in the join
Explicit
SELECT *
FROM employee CROSS JOIN department;
Implicit
SELECT *
FROM employee, department;
JOIN Operations - Example

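Relation course
course_id title dept_name credits
BIO-301 Genetics Biology 4
CS-190 Game Design Comp. Sci. 4
CS-315 Robotics Comp. Sci. 3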
Relation prereq
course_id prereq_id
BIO-301 BIO-101
CS-190 CS-101
CS-347 CS-101
Observe that
prereq information is missing for CS-315 and
course information is missing for CS-347
Inner JOIN
course INNER JOIN prereq
course_id title dept_name credits prereq_id course_id
BIO-301 Genetics Biology 4 BIO-101 BIO-301
CS-190 Game Design Comp. Sci. 4 CS-101 CS-190
If specified as NATURAL, the 2nd course_id field is skipped
Outer JOIN
An extension of the join operation that avoids loss of information
Computes the join and then adds tuples, from one relation that does not match tuples in the other relation, to the
results of the join
Uses null values
Left Outer JOIN

course NATURAL LEFT OUTER JOIN prereq
course_id title dept_name credits prereq_id
BIO-301 Genetics Biology 4 BIO-101
CS-190 Game Design Comp. Sci. 4 CS-101
CS-315 Robotics Comp. Sci. 3 null
Right Outer JOIN

course NATURAL RIGHT OUTER JOIN prereq
course_id title dept_name credits prereq_id
BIO-301 Genetics Biology 4 BIO-101
CS-190 Game Design Comp. Sci. 4 CS-101
CS-347 null null null CS-101
Joined relations
Join operations take two relations and return a relation as the result
These additional operations are typically used as subquery expressions in the FROM clause
Join condition - defines which tuples in the two relations match, and what attributes are present in the result of the
join
Join type - defines how tuples in each relation, that do not match any tuple in the other relation (based on the join
condition), are treated
Join types
inner join
left outer join
right outer join
full outer join
Join conditions
natural
on <predicate>
using (A1 , A2 , ..., An )
Full outer JOIN

course NATURAL FULL OUTER JOIN prereq
course_id title dept_name credits prereq_id
BIO-301 Genetics Biology 4 BIO-101
CS-190 Game Design Comp. Sci. 4 CS-101
CS-315 Robotics Comp. Sci. 3 null
CS-347 null null null CS-101
Joined Relations - Example

course INNER JOIN prereq ON
course.course_id = prereq.course_id
course_id title dept_name credits prereq_id course_id
BIO-301 Genetics Biology 4 BIO-101 BIO-301
CS-190 Game Design Comp. Sci. 4 CS-101 CS-190
What is the difference between the above (equi_join) and a natural join?
course LEFT OUTER JOIN prereq ON
course.course_id = prereq.course_id
course_id title dept_name credits prereq_id course_id
BIO-301 Genetics Biology 4 BIO-101 BIO-301
CS-190 Game Design Comp. Sci. 4 CS-101 CS-190
CS-315 Robotics Comp. Sci. 3 null null
course NATURAL RIGHT OUTER JOIN prereq
course FULL OUTER JOIN prereq USING (course_id)
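The difference between an equi-join with ON and a natural join is visible in the number of result columns. A sketch assuming Python's sqlite3 module, loaded with the course and prereq rows from the example; RIGHT and FULL OUTER JOIN are left out because older SQLite versions do not support them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (course_id TEXT, title TEXT, dept_name TEXT, credits INTEGER);
CREATE TABLE prereq (course_id TEXT, prereq_id TEXT);
INSERT INTO course VALUES
    ('BIO-301', 'Genetics', 'Biology', 4),
    ('CS-190', 'Game Design', 'Comp. Sci.', 4),
    ('CS-315', 'Robotics', 'Comp. Sci.', 3);
INSERT INTO prereq VALUES
    ('BIO-301', 'BIO-101'), ('CS-190', 'CS-101'), ('CS-347', 'CS-101');
""")

# Equi-join with ON: both course_id columns are kept (6 columns in total).
inner = conn.execute(
    "SELECT * FROM course INNER JOIN prereq "
    "ON course.course_id = prereq.course_id ORDER BY 1"
).fetchall()

# Natural left outer join: course_id appears once, unmatched courses get NULL.
natural_left = conn.execute(
    "SELECT * FROM course NATURAL LEFT OUTER JOIN prereq ORDER BY 1"
).fetchall()

for row in inner:
    print(row)
for row in natural_left:
    print(row)  # the CS-315 row ends with None (SQL null)
```

SQLite lists the columns of the left table first and then those of the right table, so the duplicated course_id comes before prereq_id here.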
Views
In some cases, it is not desirable for all users to see the entire logical model (that is, all the actual relations stored in
the database)
Consider a person who needs to know an instructor's name and department, but not the salary. This person should
see a relation described, in SQL, by
SELECT id, name, dept_name

FROM instructor;
A VIEW provides a mechanism to hide certain data from the view of certain users
Any relation that is not part of the conceptual model but is made visible to a user as a "virtual relation" is called a VIEW
View definition
A view is defined using the CREATE VIEW statement which has the form
CREATE VIEW v AS <query expression>
where <query expression> is any legal SQL expression
The view name is represented by v
Once a view is defined, the view name can be used to refer to the virtual relation that the view generates
View definition is not the same as creating a new relation by evaluating the query expression
Rather, a view definition causes the saving of an expression; the expression is substituted into queries using the
view
Example views
A view of instructors without their salary
CREATE VIEW faculty AS

SELECT id, name, dept_name
FROM instructor;
Find all the instructors in the biology department
SELECT name
FROM faculty
WHERE dept_name = 'Biology';
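That a view stores an expression rather than a copy of the data can be observed directly: rows added to instructor after the view is created still show up through it. A sketch assuming Python's sqlite3 module and made-up instructor rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (id TEXT, name TEXT, dept_name TEXT, salary REAL)"
)
conn.execute("INSERT INTO instructor VALUES ('10101', 'Rao', 'Comp. Sci.', 65000)")

# The view saves the query expression; no tuples are copied.
conn.execute("CREATE VIEW faculty AS SELECT id, name, dept_name FROM instructor")
before = conn.execute("SELECT COUNT(*) FROM faculty").fetchone()[0]

# A later insert into the base relation is visible through the view.
conn.execute("INSERT INTO instructor VALUES ('76766', 'Iyer', 'Biology', 72000)")
after = conn.execute("SELECT COUNT(*) FROM faculty").fetchone()[0]

biology = conn.execute(
    "SELECT name FROM faculty WHERE dept_name = 'Biology'"
).fetchall()
columns = [d[0] for d in conn.execute("SELECT * FROM faculty").description]

print(before, after)  # 1 2
print(biology)        # [('Iyer',)]
print(columns)        # ['id', 'name', 'dept_name'], salary stays hidden
```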
Create a view of department salary totals
CREATE VIEW departments_total_salary(dept_name, total_salary) AS

SELECT dept_name, SUM(salary)
FROM instructor
GROUP BY dept_name;
View defined using other views
CREATE VIEW physics_fall_2009 AS

SELECT course.course_id, sec_id, building, room_number
FROM course, section
WHERE course.course_id = section.course_id
AND course.dept_name = 'Physics'
AND section.semester = 'Fall'
AND section.year = '2009';
CREATE VIEW physics_fall_2009_watson AS
SELECT course_id, room_number
FROM physics_fall_2009
WHERE building = 'Watson';
View expansion
Expand use of a view in a query / another view
CREATE VIEW physics_fall_2009_watson AS

(SELECT course_id, room_number
FROM (SELECT course.course_id, building, room_number
FROM course, section
WHERE course.course_id = section.course_id
AND course.dept_name = 'Physics'
AND section.semester = 'Fall'
AND section.year = '2009')
WHERE building = 'Watson');
Views defined using other views

One view may be used in the expression defining another view
A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the expression defining v1
A view relation v1 is said to depend on view relation v2 if either v1 depends directly on v2 or there is a path of
dependencies from v1 to v2
A view relation v is said to be recursive if it depends on itself
View expansion
A way to define the meaning of views defined in terms of other views
Let view v1 be defined by an expression e1 that may itself contain uses of view relations
View expansion of an expression repeats the following replacement step:
repeat
Find any view relation vi in e1
Replace the view relation vi by the expression defining vi
until no more view relations are present in e1
As long as the view definitions are not recursive, this loop will terminate
Update of a view
Add a new tuple to faculty view which we defined earlier
INSERT INTO faculty VALUES ('30765', 'Green', 'Music');
This insertion must be represented by the insertion of the tuple

('30765', 'Green', 'Music', null)
into the instructor relation
Some updates cannot be translated uniquely
CREATE VIEW instructor_info AS

SELECT id, name, building
FROM instructor, department
WHERE instructor.dept_name = department.dept_name;
INSERT INTO instructor_info VALUES ('69987', 'White', 'Taylor');
Which department, if there are multiple departments in Taylor?
What if no department is in Taylor?
Most SQL implementations allow updates only on simple views
The FROM clause has only one database relation
The SELECT clause contains only attribute names of the relation and does not have any expressions, aggregates
or DISTINCT specification
Any attribute not listed in the SELECT clause can be set to null
The query does not have a GROUP BY or HAVING clause
And some not at all
CREATE VIEW history_instructors AS

SELECT * FROM instructor
WHERE dept_name = 'History';
What happens when we insert ('25566', 'Brown', 'Biology', 100000) into the history_instructors ?
Materialized views
Materializing a view: Create a physical table containing all the tuples in the result of the query defining the view
If relations used in the query are updated, the materialized view result becomes out of date
Need to maintain the view, by updating the view whenever the underlying relations are updated
📚
Week 3 Lecture 4
Class BSCCS2001
Materials
Module # 14
Type Lecture
Week # 3
Intermediate SQL (part 3)

Transactions
A transaction is a unit of work
Atomic transaction
Either something is fully executed or it is rolled back as if it never occurred
Example: Bank account transactions, when transferring money from one account to another, the transaction
should either happen or not happen at all.
It should not fail at a stage where money is deducted from one account and not added to the other account
Isolation from concurrent transactions
Transactions begin implicitly
Ended by COMMIT WORK or ROLLBACK WORK
But by default on most databases, each SQL statement commits automatically
Can turn off auto-commit for a session (for example, using API)
In SQL:1999, can use: BEGIN ATOMIC ... END
Not supported on most databases
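The all-or-nothing behaviour of a funds transfer can be sketched with COMMIT and ROLLBACK. This assumes Python's sqlite3 module with its default transaction handling; the account table, the CHECK rule and the transfer helper are all hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; undo both updates if either one fails."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # the credit to dst is undone as well
        return False


# The debit would make A negative, so the whole transfer is rolled back.
failed = transfer(conn, "A", "B", 500)
after_failed = dict(conn.execute("SELECT id, balance FROM account"))

ok = transfer(conn, "A", "B", 30)
after_ok = dict(conn.execute("SELECT id, balance FROM account"))

print(failed, after_failed)  # False {'A': 100, 'B': 50}
print(ok, after_ok)          # True {'A': 70, 'B': 80}
```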
Integrity Constraints
Integrity constraints guard against accidental damage to the database by ensuring that the authorized changes to the
database do not result in a loss of data consistency
A checking account must have a balance greater than Rs. 10,000.00
A salary of a bank employee must be at least Rs. 250.00 an hour
A customer must have a (non-null) phone number
Integrity constraints on a single relation

NOT NULL
PRIMARY KEY
UNIQUE
CHECK(P ), where P is a predicate
NOT NULL and UNIQUE constraints

NOT NULL
Declare name and budget to be NOT NULL
name VARCHAR(20) NOT NULL

budget NUMERIC(12, 2) NOT NULL
UNIQUE(A1 , A2 , ..., Am )
The unique specification states that the attributes A1 , A2 , ..., Am form a candidate key
Candidate keys are permitted to be null (in contrast to primary keys)
The CHECK clause

CHECK(P ), where P is a predicate
Ensure that semester is one of fall, winter, spring or summer
CREATE TABLE section (

course_id VARCHAR(8),
sec_id VARCHAR(8),
semester VARCHAR(6),
year NUMERIC(4, 0),
building VARCHAR(15),
room_number VARCHAR(7),
time_slot_id VARCHAR(4),
PRIMARY KEY (course_id, sec_id, semester, year),
CHECK (semester IN ('Fall', 'Winter', 'Spring', 'Summer'))
);
Referential Integrity
Ensures that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes
in another relation
Example: If "Biology" is a department name appearing in one of the tuples in the instructor relation, then there exists
a tuple in the department relation for "Biology"
Let A be a set of attributes. Let R and S be two relations that contain attributes A.
Here, A is the primary key of S.
A is said to be a FOREIGN KEY of R if for any values of A appearing in R these values also appear in S
Cascading Actions in Referential Integrity

With cascading, you can define the actions that the Database Engine takes when a user tries to delete or update a
key to which existing foreign keys point
CREATE TABLE course (

course_id CHAR(5) PRIMARY KEY,
title VARCHAR(20),
dept_name VARCHAR(20) REFERENCES department
)
CREATE TABLE course (
...
dept_name VARCHAR(20),
FOREIGN KEY (dept_name) REFERENCES department
ON DELETE CASCADE
ON UPDATE CASCADE,
...
)
Alternative actions to cascade: NO ACTION, SET NULL, SET DEFAULT
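Cascading updates and deletes can be watched in action on a two-table toy schema. A sketch assuming Python's sqlite3 module; SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and the rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE department (dept_name TEXT PRIMARY KEY);
CREATE TABLE course (
    course_id TEXT PRIMARY KEY,
    dept_name TEXT,
    FOREIGN KEY (dept_name) REFERENCES department (dept_name)
        ON DELETE CASCADE
        ON UPDATE CASCADE
);
INSERT INTO department VALUES ('Biology'), ('Physics');
INSERT INTO course VALUES ('BIO-301', 'Biology'), ('PHY-101', 'Physics');
""")

# Renaming a department propagates to the courses that reference it.
conn.execute("UPDATE department SET dept_name = 'Life Sci.' WHERE dept_name = 'Biology'")
# Deleting a department deletes the courses that reference it.
conn.execute("DELETE FROM department WHERE dept_name = 'Physics'")

courses = conn.execute(
    "SELECT course_id, dept_name FROM course ORDER BY course_id"
).fetchall()
print(courses)  # [('BIO-301', 'Life Sci.')]
```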
Integrity constraint violation during transactions
CREATE TABLE person (

id CHAR(10),
name CHAR(40),
mother CHAR(10),
father CHAR(10),
PRIMARY KEY (id),
FOREIGN KEY (father) REFERENCES person,
FOREIGN KEY (mother) REFERENCES person)
How to insert a tuple without causing constraint violation?
Insert father and mother of a person before inserting person
OR, set father and mother to null initially, update after inserting all persons (not possible if father and mother
attributes declared to be NOT NULL)
OR defer constraint checking
SQL Data Types and Schemas
Built-in data types in SQL

DATE: Dates, containing a (4-digit) year, month and day
Example: DATE '2005-7-27'
TIME: Time of day in hours, minutes and seconds
Example: TIME '09:00:30' TIME '09:00:30.75'
TIMESTAMP: Date plus time of the day
Example: TIMESTAMP '2005-7-27 09:00:30.75'
INTERVAL: Period of time
Example: INTERVAL '1' day
Subtracting a date/time/timestamp value from another gives an interval value
Interval values can be added to date/time/timestamp values
Index creation
CREATE TABLE student

( id VARCHAR(5),
name VARCHAR(20) NOT NULL,
tot_cred NUMERIC(3, 0) DEFAULT 0,
PRIMARY KEY (id));
CREATE INDEX studentid_index ON student(id);
Indices are data structures used to speed up access to records with specified values for index attributes
SELECT * FROM student

WHERE id = '12345';
Can be executed by using the index to find the required record, without looking at all records of students
User-defined types
CREATE TYPE construct in SQL creates user-defined type (alias, like typedef in C)
CREATE TYPE Dollars AS NUMERIC(12, 2) FINAL;
CREATE TABLE department (

building VARCHAR(15),
budget Dollars);
Domains
CREATE DOMAIN construct in SQL-92 creates user-defined domain types
CREATE DOMAIN person_name CHAR(20) NOT NULL;
Types and domains are similar
Domains can have constraints such as NOT NULL specified on them
CREATE DOMAIN degree_level VARCHAR(10)

CONSTRAINT degree_level_test
CHECK (VALUE IN('Bachelors', 'Masters', 'Doctorate'));
Large-object types
Large objects (photos, videos, CAD files, etc.) are stored as a large object:
blob: binary large object - object is a large collection of uninterpreted binary data (whose interpretation is left to
an application outside of the database system)
clob: character large object - object is a large collection of character data
When a query returns a large object, a pointer is returned rather than the large object itself
Authorization
Forms of authorization on parts of the database:
Read: allows reading, but not modification of data
Insert: allows insertion of new data, but not modification of existing data
Update: allows modification, but not deletion of data
Delete: allows deletion of data
Forms of authorization to modify the database schema
Index: allows creation and deletion of indices
Resources: allows creation of new relations
Alteration: allows addition or deletion of attributes in a relation
Drop: allows deletion of relations
Authorization Specification of SQL

The GRANT statement is used to confer authorization
GRANT <privilege list>

ON <relation name or view name> TO <user list>
<user list> is:
A user-id
PUBLIC, which allows all valid users the privilege granted
A role
Granting a privilege on a view does not imply granting any privileges on the underlying relations
The grantor of the privilege must already hold the privilege on specified item (or be the database administrator)
Privileges in SQL
SELECT: allows read access to relation or the ability to query using the view
Example: grant users U1 , U2 and U3 SELECT authorization on the instructor relation:

GRANT SELECT ON instructor TO U1 , U2 , U3
INSERT: the ability to insert tuples
UPDATE: the ability to update using the SQL update statement
DELETE: the ability to delete tuples
ALL PRIVILEGES: used as a short form for all the allowable privileges
Revoking authorization in SQL

The REVOKE statement is used to revoke authorization
REVOKE <privilege list>

ON <relation name or view name> FROM <user list>
Example:
REVOKE SELECT ON branch FROM U1 , U2 , U3
<privilege list> may be all to revoke all privileges the revokee may hold
If <user list> includes public, all users lose the privilege except those granted it explicitly
If the same privilege was granted twice to the same user by different grantors, the user may retain the privilege after
the revocation
All privileges that depend on the privilege being revoked are also revoked
Roles
CREATE ROLE instructor;
GRANT instructor TO Amit;
Privileges can be granted to roles:
GRANT SELECT ON takes TO instructor;
Roles can be granted to users as well as to other roles
CREATE ROLE teaching_assistant;

GRANT teaching_assistant TO instructor;
Instructor inherits all privileges of teaching_assistant
Chain of roles
CREATE ROLE dean;
GRANT instructor TO dean;
GRANT dean TO Satoshi;
Authorization on views
CREATE VIEW geo_instructor AS
(SELECT *
FROM instructor
WHERE dept_name = 'Geology');
GRANT SELECT ON geo_instructor TO geo_staff;
Suppose that a geo_staff member issues
SELECT *
FROM geo_instructor;
What if
geo_staff does not have permissions on instructor?
creator of view did not have some permissions on instructor?
Other authorization features

REFERENCES privilege to create foreign key
GRANT REFERENCES (dept_name) ON department TO Mariano;
Why is this required?
Transfer of privileges
GRANT SELECT ON department TO Amit WITH GRANT OPTION;
REVOKE SELECT ON department FROM Amit, Satoshi CASCADE;
REVOKE SELECT ON department FROM Amit, Satoshi RESTRICT;
📚
Week 3 Lecture 5
Class BSCCS2001
Materials
Module # 15
Type Lecture
Week # 3
Advanced SQL
Functions and Procedural Constructs
Native Language ← → Query Language
Functions and Procedures
Functions / Procedures and Control Flow statements were added in SQL:1999
Functions/Procedures can be written in SQL itself or in an external programming language like C, Java, etc
Functions written in an external language are particularly useful with specialized data types such as images and
geometric objects
Example: Functions to check if polygons overlap or to compare images for similarity
Some database systems support table-valued functions which can return a relation as a result
SQL:1999 also supports a rich set of imperative constructs, including loops, if-then-else and assignment
Many databases have proprietary procedural extensions to SQL that differ from SQL:1999
SQL Functions
Define a function that, given the name of a department, returns the count of the number of instructors in that
department
CREATE FUNCTION dept_count (dept_name VARCHAR(20))
RETURNS INTEGER
BEGIN
DECLARE d_count INTEGER;
SELECT COUNT(*) INTO d_count
FROM instructor
WHERE instructor.dept_name = dept_name;
RETURN d_count;
END
The function dept_count can be used to find the department names and budget of all departments with more than 12
instructors:
SELECT dept_name, budget

FROM department
WHERE dept_count (dept_name) > 12;
Compound statement: BEGIN ... END
May contain multiple SQL statements between BEGIN and END
RETURNS: indicates the variable-type that is returned (eg: integer)
RETURN: specifies the value to be returned as the result of invoking the function
SQL functions are in fact parameterized views that generalize the regular notion of views by allowing parameters
Table functions
Functions that return a relation as a result added in SQL:2003
Return all instructors in a given department:
CREATE FUNCTION instructor_of (dept_name CHAR(20))

RETURNS TABLE (
id VARCHAR(5),
name VARCHAR(20),
dept_name VARCHAR(20),
salary NUMERIC(8, 2))
RETURN TABLE
( SELECT id, name, dept_name, salary
FROM instructor
WHERE instructor.dept_name = instructor_of.dept_name)
Usage
SELECT *
FROM TABLE (instructor_of('Music'))
SQL procedures
The dept_count function could instead be written as procedure:
CREATE PROCEDURE dept_count_proc(

IN dept_name VARCHAR(20), OUT d_count INTEGER)
BEGIN
SELECT COUNT(*) INTO d_count
FROM instructor
WHERE instructor.dept_name = dept_count_proc.dept_name;
END
Procedures can be invoked either from an SQL procedure or from embedded SQL, using the CALL statement
DECLARE d_count INTEGER;

CALL dept_count_proc('Physics', d_count);
Procedures and functions can be invoked also from dynamic SQL
SQL:1999 allows overloading - more than one function/procedure of the same name as long as the number of
arguments and/or the types of the arguments differ
Language constructs for procedures and functions

SQL supports constructs that gives it almost all the power of a general purpose programming language
Warning: Most database systems implement their own variant of the standard syntax
Compound statement: BEGIN ... END
May contain multiple SQL statements between BEGIN and END
Local variables can be declared within a compound statement
WHILE loop:
WHILE boolean expression DO

sequence of statements;
END WHILE;
REPEAT loop:
REPEAT
sequence of statements;
UNTIL boolean expression
END REPEAT;
FOR loop:
Permits iteration over all results of a query
Find the total budget of all departments
DECLARE n INTEGER DEFAULT 0;

FOR r AS
SELECT budget FROM department
DO
SET n = n + r.budget;
END FOR;
Conditional statements
if-then-else
case
if-then-else statement
IF boolean expression THEN
sequence of statements;
ELSEIF boolean expression THEN
sequence of statements;
...
ELSE
sequence of statements;
END IF;
The IF statement supports the use of optional ELSEIF clauses and a default ELSE clause
Example procedure: registers student after ensuring classroom capacity is not exceeded
Returns 0 on success and -1 if the capacity is exceeded
Simple CASE statement
CASE variable
WHEN value1 THEN
sequence of statements;
WHEN value2 THEN
sequence of statements;
...
ELSE
sequence of statements;
END CASE;
The WHEN clause of the CASE statement defines the value that when satisfied determines the flow of control
Searched CASE statement
CASE
WHEN sql-expression = value1 THEN
sequence of statements;
WHEN sql-expression = value2 THEN
sequence of statements;
...
ELSE
sequence of statements;
END CASE;
Any supported SQL expression can be used here. These expressions can contain references to variables,
parameters, special registers and more.
Signaling of exception conditions and declaring handlers for exceptions
DECLARE out_of_classroom_seats CONDITION
DECLARE EXIT HANDLER FOR out_of_classroom_seats
BEGIN
...
SIGNAL out_of_classroom_seats
...
END
The handler here is EXIT - causes enclosing BEGIN ... END to terminate and exit
Other actions possible on exception
External Language Routines

SQL:1999 allows the definition of functions/procedures in an imperative programming language (Java, C#, C or C++)
which can be invoked from SQL queries
Such functions can be more efficient than functions defined in SQL. The computations that cannot be carried out in
SQL can be executed by these functions
Declaring external language procedures and functions
CREATE PROCEDURE dept_count_proc(

IN dept_name VARCHAR(20),
OUT count INTEGER
)
LANGUAGE C
EXTERNAL NAME '/usr/avi/bin/dept_count_proc'
CREATE FUNCTION dept_count(dept_name VARCHAR(20))

RETURNS integer
LANGUAGE C
EXTERNAL NAME '/usr/avi/bin/dept_count'
Benefits of external language functions/procedures:
More efficient for many operations and more expressive power
Drawbacks:
Code to implement function may need to be loaded into the DB system and executed in the DB system's address
space
Risk of accidental corruption of the DB structures
Security risk, allowing users access to unauthorized data
There are alternatives, which provide good security at the cost of performance
Direct execution in the DB system's space is used when efficiency is more important than security
External Language Routines: Security

To deal with security problems, we can do one of the following:
Use sandbox techniques:
That is, use a safe language like Java, which cannot be used to access/damage other parts of the DB code
Run external language functions/procedures in a separate process, with no access to the DB process' memory
Parameters and results communicated via the inter-process communication
Both have performance overheads
Many DB systems support both of the above approaches as well as direct execution in the DB system's address space
Triggers
A TRIGGER defines a set of actions that are performed in response to an INSERT, UPDATE or DELETE operation
on a specified table
When such an SQL operation is executed, the trigger is said to have been activated
Triggers are optional
Triggers are defined using the CREATE TRIGGER statement
Triggers can be used
To enforce data integrity rules via referential constraints and check constraints
To cause updates to other tables, automatically generate or transform values for inserted or updated rows, or
invoke functions to perform tasks such as issuing alerts
To design a trigger mechanism, we must:
Specify the events (such as UPDATE, INSERT or DELETE) that cause the trigger to execute
Specify the time (BEFORE or AFTER) of execution
Specify the actions to be taken when the trigger executes
Syntax of triggers may vary across systems
Types of Triggers: BEFORE

BEFORE triggers
Run before an UPDATE or INSERT
Values that are being updated or inserted can be modified before the DB is actually modified.
You can use triggers that run before an UPDATE or INSERT to ...
Check or modify the values before they are actually updated or inserted in the DB
Useful if user-view and internal DB format differs
Run other non-DB operations coded in user-defined functions
BEFORE DELETE triggers
Run before a DELETE
Checks value (and raises an error, if necessary)
Types of Triggers: AFTER

AFTER triggers
Run after an UPDATE, INSERT or DELETE
You can use triggers that run after an update or insert to:
Update data in other tables
Useful to maintain relationships between data or keep audit trail
Check against other data in the table or in other tables
Useful to ensure data integrity when referential integrity constraints aren't appropriate
When table check constraints limit checking to the current table only
Run non-DB operations coded in user-defined functions
Useful when issuing alerts or to update information outside the DB
Row level and Statement level Triggers

There are two types of triggers based on the level at which the triggers are applied:
Row level triggers are executed once for every row affected by the event on which the trigger is defined
Let Employee be a table with 100 rows.
Suppose an UPDATE statement is executed to increase the salary of each employee by 10%
Any row level UPDATE trigger configured on the table Employee will fire 100 times during this update, once for
each affected row
Statement level triggers perform a single action for all the rows affected by a statement, instead of executing a
separate action for each affected row
Used for each statement instead of for each row
Uses referencing old table or referencing new table to refer to temporary tables called transition tables
containing the affected rows
Can be more efficient when dealing with SQL statements that update a large number of rows
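The Employee example above can be tried out with Python's built-in sqlite3 module. This is only a sketch: the table salary_audit and the trigger name log_raise are made up for illustration, and SQLite itself supports only FOR EACH ROW triggers, so the statement-level variant cannot be shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, salary REAL);
-- salary_audit is a hypothetical audit table, not part of the lecture
CREATE TABLE salary_audit (emp_id INTEGER, old_salary REAL, new_salary REAL);

-- Row-level trigger: the body runs once for every updated row
CREATE TRIGGER log_raise AFTER UPDATE OF salary ON employee
FOR EACH ROW
BEGIN
    INSERT INTO salary_audit VALUES (OLD.id, OLD.salary, NEW.salary);
END;
""")

# 100 employees, as in the example
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(i, 1000.0) for i in range(1, 101)])

# One UPDATE statement that touches all 100 rows
conn.execute("UPDATE employee SET salary = salary * 1.10")

audit_rows = conn.execute("SELECT COUNT(*) FROM salary_audit").fetchone()[0]
print(audit_rows)  # one audit row per updated employee
```

A single UPDATE statement produces one audit row per employee, which is exactly the per-row cost that a statement level trigger avoids.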
Triggering Events and Actions in SQL

Triggering event can be an INSERT, DELETE or UPDATE
Triggers on update can be restricted to specific attributes
For example: after update of takes on grade
Values of attributes before and after an update can be referenced
referencing old row as: for deletes and updates
referencing new row as: for inserts and updates
Triggers can be activated before an event, which can serve as extra constraints
For example: convert blank grades to null
CREATE TRIGGER setnull_trigger BEFORE UPDATE OF takes
REFERENCING NEW ROW AS nrow
FOR EACH ROW
WHEN (nrow.grade = '')
BEGIN ATOMIC
    SET nrow.grade = null;
END;
Trigger to maintain credits_earned value
CREATE TRIGGER credits_earned AFTER UPDATE OF takes ON (grade)
REFERENCING NEW ROW AS nrow
REFERENCING OLD ROW AS orow
FOR EACH ROW
WHEN nrow.grade <> 'F' AND nrow.grade IS NOT NULL
    AND (orow.grade = 'F' OR orow.grade IS NULL)
BEGIN ATOMIC
    UPDATE student
    SET tot_cred = tot_cred +
        ( SELECT credits
          FROM course
          WHERE course.course_id = nrow.course_id)
    WHERE student.id = nrow.id;
END;
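Since trigger syntax varies across systems, here is a rough port of the credits_earned trigger to SQLite, run through Python's sqlite3 module. It is a sketch under assumptions: SQLite writes the event as AFTER UPDATE OF grade ON takes and uses NEW / OLD instead of a REFERENCING clause, and the sample rows (student S1, course C1 worth 4 credits) are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id TEXT PRIMARY KEY, tot_cred INTEGER NOT NULL DEFAULT 0);
CREATE TABLE course  (course_id TEXT PRIMARY KEY, credits INTEGER NOT NULL);
CREATE TABLE takes   (id TEXT, course_id TEXT, grade TEXT);

-- SQLite dialect of the credits_earned trigger from the notes
CREATE TRIGGER credits_earned AFTER UPDATE OF grade ON takes
FOR EACH ROW
WHEN NEW.grade <> 'F' AND NEW.grade IS NOT NULL
     AND (OLD.grade = 'F' OR OLD.grade IS NULL)
BEGIN
    UPDATE student
    SET tot_cred = tot_cred +
        (SELECT credits FROM course WHERE course.course_id = NEW.course_id)
    WHERE student.id = NEW.id;
END;

-- Hypothetical sample data
INSERT INTO student VALUES ('S1', 0);
INSERT INTO course  VALUES ('C1', 4);
INSERT INTO takes   VALUES ('S1', 'C1', NULL);
""")

def tot_cred(student_id):
    row = conn.execute("SELECT tot_cred FROM student WHERE id = ?", (student_id,))
    return row.fetchone()[0]

conn.execute("UPDATE takes SET grade = 'F' WHERE id = 'S1'")  # NULL -> F: no credit
after_fail = tot_cred("S1")
conn.execute("UPDATE takes SET grade = 'A' WHERE id = 'S1'")  # F -> A: credits added
after_pass = tot_cred("S1")
conn.execute("UPDATE takes SET grade = 'B' WHERE id = 'S1'")  # A -> B: already counted
after_regrade = tot_cred("S1")
print(after_fail, after_pass, after_regrade)
```

The WHEN condition is what keeps the credits from being added twice: the trigger body runs only when a grade moves from failing (or missing) to passing.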
How to use triggers?

The optimal use of DML triggers is for short, simple and easy to maintain write operations that act largely independently
of an application's business logic
Typical and recommended uses of triggers include:
Logging changes to a history table
Auditing users and their actions against sensitive tables
Adding additional values to a table that may not be available to an application (due to security restrictions or other
limitations), such as:
Login/user name
Time an operation occurs
Server/database name
Simple validation
Source: SQL Server triggers: The good and the scary
How not to use triggers?

Triggers are like Lays: Once you pop, you cannot stop
One of the greatest challenges for architects and developers is to ensure that
triggers are used only as needed, and
to not allow them to become a one-size-fits-all solution for any data needs that happen to come along
Adding triggers is often seen as faster and easier than adding code to an application, but the cost of doing so is
compounded over time with each added line of code.
Alright then, how to use triggers?

Triggers can become dangerous when:
There are too many
Trigger code becomes complex
Triggers go cross-server - across DBs over networks
Triggers call other triggers
Recursive triggers are set to ON. The DB-level setting is set to off by default
Functions, stored procedures or views are in triggers
Iteration occurs
📚
Week 4 Lecture 1
Class BSCCS2001
Materials
Module # 16
Type Lecture
Week # 4
Formal Relational Query Languages

Relational Algebra
Procedural and Algebra based
Tuple Relational Calculus
Non-procedural and Predicate Calculus based
Domain Relational Calculus
Non-procedural and Predicate Calculus based
Relational Algebra
Created by Edgar F. Codd at IBM in 1970
Procedural Language
Six basic operators
Select: σ
Project: Π
Union: ∪
Set difference: −
Cartesian product: ×
Rename: ρ
The operators take one or two relations as inputs and produce a new relation as the result
SELECT operation
Notation: σ_p(r)
p is called the selection predicate
Defined as:
σ_p(r) = {t ∣ t ∈ r and p(t)}

where p is a formula in propositional calculus consisting of terms connected by
∧ (and)
∨ (or)
¬ (not)
Each term is one of:
<attribute> op <attribute> or <attribute> op <constant>
where op is one of: =, ≠, >, ≥, <, ≤
Example of selection:
σ_{dept_name = 'Physics'}(instructor)
PROJECT operation
Notation: Π_{A1, A2, ..., Ak}(r)
where A1, A2, ..., Ak are attribute names and r is a relation
The result is defined as the relation of k columns obtained by erasing the columns that are not listed.
Duplicate rows are removed from the result, since relations are sets
Example: To eliminate the dept_name attribute of instructor
Π_{ID, name, salary}(instructor)
UNION operation
Notation: r ∪ s
Defined as: r ∪ s = {t ∣ t ∈ r or t ∈ s}
For r ∪ s to be valid:
r, s must have the same arity (same number of attributes)
The attribute domains must be compatible (ie: same data type)
Example: To find all the courses taught in the Fall 2009 semester or in the Spring 2010 semester or in both
Π_{course_id}(σ_{semester = "Fall" ∧ year = 2009}(section)) ∪ Π_{course_id}(σ_{semester = "Spring" ∧ year = 2010}(section))
DIFFERENCE operation
Notation: r − s
Defined as: r − s = {t ∣ t ∈ r and t ∉ s}
Set differences must be taken between compatible relations
r and s must have the same arity
Attribute domains of r and s must be compatible
Example: To find all the courses taught in the Fall 2009 semester, but not in the Spring 2010 semester
Π_{course_id}(σ_{semester = "Fall" ∧ year = 2009}(section)) − Π_{course_id}(σ_{semester = "Spring" ∧ year = 2010}(section))
INTERSECTION operation
Notation: r ∩ s
Defined as:
r ∩ s = {t ∣ t ∈ r and t ∈ s}
Assume:
r, s have the same arity
Attributes of r and s are compatible
Note: r ∩ s = r − (r − s)
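The operators so far map closely onto Python sets of tuples. A minimal sketch, assuming a relation is a set of tuples addressed by column position; the helper names select and project and the four sample section rows are made up for illustration.

```python
# A relation is modelled as a set of tuples: (course_id, semester, year).
# The rows below are hypothetical sample data.
section = {
    ("CS-101", "Fall", 2009),
    ("CS-347", "Fall", 2009),
    ("CS-101", "Spring", 2010),
    ("FIN-201", "Spring", 2010),
}

def select(relation, predicate):
    """sigma_p(r): keep the tuples for which the predicate holds."""
    return {t for t in relation if predicate(t)}

def project(relation, *positions):
    """Pi_{A1..Ak}(r): keep the listed columns; the set removes duplicates."""
    return {tuple(t[i] for i in positions) for t in relation}

fall_2009 = project(select(section, lambda t: t[1] == "Fall" and t[2] == 2009), 0)
spring_2010 = project(select(section, lambda t: t[1] == "Spring" and t[2] == 2010), 0)

either = fall_2009 | spring_2010      # union
only_fall = fall_2009 - spring_2010   # set difference
both = fall_2009 & spring_2010        # intersection

# Intersection is derived: r ∩ s = r − (r − s)
print(both == fall_2009 - (fall_2009 - spring_2010))
```

Because Python sets discard duplicates, project behaves like the algebra's Π rather than like SQL's SELECT without DISTINCT.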
CARTESIAN-PRODUCT operation
Notation: r × s
Defined as:
r × s = {t q ∣ t ∈ r and q ∈ s}
Assume that attributes of r(R) and s(S) are disjoint
That is, R ∩ S = ∅
If attributes of r(R) and s(S) are not disjoint, then renaming must be used.
RENAME operation
Allows us to name and, therefore, refer to the results of relational-algebra expressions
Allows us to refer to a relation by more than one name
Example:
ρ_X(E)
returns the result of the expression E under the name X
If a relational algebra expression E has arity n, then
ρ_{X(A1, A2, ..., An)}(E)
returns the result of the expression E under the name X and with the attributes renamed to
A1, A2, ..., An
DIVISION operation
The division operation is applied to two relations
R(Z) ÷ S(X), where X ⊆ Z
Let Y = Z − X (and hence Z = X ∪ Y)
that is, let Y be the set of attributes of R that are not attributes of S
The result of DIVISION is a relation T(Y) that includes a tuple t if tuples t_R appear in R with t_R[Y] = t, and with
t_R[X] = t_S for every tuple t_S in S
For a tuple t to appear in the result T of the DIVISION, the values in t must appear in R in combination with every
tuple in S
Division is a derived operation and can be expressed in terms of other operations
r ÷ s ≡ Π_{R−S}(r) − Π_{R−S}((Π_{R−S}(r) × s) − Π_{R−S,S}(r))
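The derived-operation formula can be followed step by step in Python. This is a sketch, not a general implementation: it assumes the first k columns of r are the attributes R − S and the remaining columns line up with s, so that Π_{R−S,S}(r) is just r. The function name divide is made up; the data is the Lecturer / Module relation used in the examples that follow.

```python
def divide(r, s, k):
    """r ÷ s, assuming the first k columns of r are R − S and the rest match s."""
    candidates = {t[:k] for t in r}                      # Π_{R−S}(r)
    all_pairs = {c + q for c in candidates for q in s}   # Π_{R−S}(r) × s
    missing = all_pairs - r                              # combinations r does not contain
    disqualified = {t[:k] for t in missing}              # Π_{R−S}(missing)
    return candidates - disqualified

R = {
    ("Brown", "Compilers"), ("Brown", "Databases"),
    ("Green", "Prolog"), ("Green", "Databases"),
    ("Lewis", "Prolog"), ("Smith", "Databases"),
}

teaches_prolog = divide(R, {("Prolog",)}, 1)
teaches_both = divide(R, {("Databases",), ("Prolog",)}, 1)
print(sorted(teaches_prolog), sorted(teaches_both))
```

A lecturer is disqualified as soon as one required subject is missing, which is why the formula subtracts the projection of the missing combinations.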
DIVISION Example #1
R                       S          R ÷ S
Lecturer  Module        Subject    Lecturer
Brown     Compilers     Prolog     Green
Brown     Databases                Lewis
Green     Prolog
Green     Databases
Lewis     Prolog
Smith     Databases
DIVISION Example #2
R                       S          R ÷ S
Lecturer  Module        Subject    Lecturer
Brown     Compilers     Databases  Green
Brown     Databases     Prolog
Green     Prolog
Green     Databases
Lewis     Prolog
Smith     Databases
DIVISION Example #3
A
sno  pno
s1   p1
s1   p2
s1   p3
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4

B1      A / B1
pno     sno
p2      s1
        s2
        s3
        s4

B2      A / B2
pno     sno
p2      s1
p4      s4

B3      A / B3
pno     sno
p1      s1
p2
p4
DIVISION Example #4
Relation r, s
r         s      r ÷ s
A  B      B      A
α  1      1      α
α  2      2      β
α  3
β  1
γ  1
δ  1
δ  3
δ  4
ε  6
ε  1
β  2
DIVISION Example #5
Relation r, s:
r                   s
A  B  C  D  E       D  E
α  a  α  a  1       a  1
α  a  γ  a  1       b  1
α  a  γ  b  1
β  a  γ  a  1
β  a  γ  b  3
γ  a  γ  a  1
γ  a  γ  b  1
γ  a  β  b  1
r ÷ s
A  B  C
α  a  γ
γ  a  γ
Eg: Students who have taken both courses "a" and "b" with instructor "1"
(that is, find all the students who have taken all the courses given by instructor 1)
📚
Week 4 Lecture 2
Class BSCCS2001
Materials
Module # 17
Type Lecture
Week # 4
Formal Relational Query Languages (part 2)

Predicate Logic
Predicate Logic or Predicate Calculus is an extension of Propositional Logic or Boolean Algebra
It adds the concept of predicates and quantifiers to better capture the meaning of statements that cannot be adequately
expressed by propositional logic
Tuple Relational Calculus and Domain Relational Calculus are based on Predicate Calculus
Predicate
Consider the statement: "x is greater than 3"
It has 2 parts
The first part is the variable x
It is the subject of the statement
The second part "is greater than 3"
It is the predicate of the statement
This refers to the property that the subject of the statement can have
The statement "x is greater than 3" can be denoted by P (x) where P denotes the predicate "is greater than 3" and x
is the variable
The predicate P can be considered as a function. It tells the truth value of the statement P (x) at x
Once a value has been assigned to the variable x, the statement P(x) becomes a proposition and has a truth
value (true or false)
In general, a statement involving n variables x1, x2, x3, ..., xn can be denoted by P(x1, x2, x3, ..., xn)
Here, P is also referred to as an n-place predicate or an n-ary predicate
Quantifiers
In predicate logic, predicates are used alongside quantifiers to express the extent to which a predicate is true over a range
of elements
Using quantifiers to create such propositions is called quantification
There are 2 types of quantifiers:
Universal Quantifier
Existential Quantifier
Universal Quantifier
Universal Quantification: Mathematical statements sometimes assert that a property is true for all the values of a
variable in a particular domain, called the Domain of Discourse
Such a statement is expressed using universal quantification
The universal quantification of P(x) for a particular domain is the proposition that asserts that P(x) is true for all
values of x in this domain
The domain is very important here since it decides the possible values of x
Formally, the universal quantification of P(x) is the statement "P(x) for all values of x in the domain"
The notation ∀x P(x) denotes the universal quantification of P(x)
Here, ∀ is called the universal quantifier
∀x P(x) is read as "for all x, P(x)"

Example: Let P(x) be the statement "x + 2 > x"
What is the truth value of the statement ∀x P(x)?
Solution: Since x + 2 is greater than x for any real number x, P(x) ≡ T for all x, and so ∀x P(x) ≡ T
Existential Quantifier
Existential Quantification: Some mathematical statements assert that there is an element with a certain property
Such statements are expressed by existential quantification
Existential quantification can be used to form a proposition that is true if and only if P (x) is true for at least one value of
x in the domain
Formally, the existential quantification of P (x) is the statement "There exists an element x in the domain such that
P (x)"
The notation ∃x P(x) denotes the existential quantification of P(x)
Here ∃ is called the existential quantifier
∃x P(x) is read as "There is at least one x such that P(x)"

Example: Let P(x) be the statement "x > 5"
What is the truth value of the statement ∃x P(x)?
Solution: P(x) is true for all real numbers greater than 5 and false for all real numbers less than or equal to 5
Since P(x) is true for at least one value of x, ∃x P(x) ≡ T
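Over a finite domain of discourse, the two quantifiers correspond to Python's built-in all and any. A small sketch, assuming the domain is the integers from -10 to 10 (the lecture examples range over the real numbers, which cannot be enumerated); the names P and Q are illustrative.

```python
# Assumed finite domain of discourse: the integers -10 .. 10
domain = range(-10, 11)

def P(x):
    return x + 2 > x   # the predicate from the universal example

def Q(x):
    return x > 5       # the predicate from the existential example

forall_P = all(P(x) for x in domain)   # ∀x P(x)
exists_Q = any(Q(x) for x in domain)   # ∃x Q(x)
forall_Q = all(Q(x) for x in domain)   # ∀x Q(x): fails, e.g. at x = 0

print(forall_P, exists_Q, forall_Q)
```

The last line shows why the domain matters: the same predicate x > 5 is existentially true but universally false on this domain.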
Tuple Relational Calculus

TRC is a non-procedural query language, where each query is of the form
{t ∣ P(t)}
where t = resulting tuples
P(t) = known as the predicate; these are the conditions that are used to fetch t
P(t) may have various conditions logically combined with OR (∨), AND (∧), NOT (¬)
It also uses quantifiers:
∃t ∈ r(Q(t)) = "there exists" a tuple t in relation r such that predicate Q(t) is true
∀t ∈ r(Q(t)) = Q(t) is true "for all" tuples t in relation r
{P ∣ ∃S ∈ Students(S.CGPA > 8 ∧ P.name = S.name ∧ P.age = S.age)} :
returns the name and age of students with a CGPA above 8
Predicate Calculus Formula

Set of attributes and constants
Set of comparison operators: (eg: <, ≤, =, ≠, >, ≥)
Set of connectives: and (∧), or (∨), not (¬)
Implication (⇒): x ⇒ y, if x is true, then y is true
x ⇒ y ≡ ¬x ∨ y
Set of quantifiers:
∃t ∈ r(Q(t)) ≡ "there exists" a tuple t in relation r such that predicate Q(t) is true
∀t ∈ r(Q(t)) ≡ Q is true "for all" tuples t in relation r
TRC Example #1
Student
Fname Lname Age Course
David Sharma 27 DBMS
Aaron Lilly 17 JAVA
Sahil Khan 19 Python
Sachin Rao 20 DBMS
Varun George 23 JAVA
Simi Verma 22 JAVA
Q. 1: Obtain the first name of students whose age is greater than 21
Solution:
{t.Fname ∣ Student(t) ∧ t.age > 21}
{t.Fname ∣ t ∈ Student ∧ t.age > 21}
{t ∣ ∃s ∈ Student(s.age > 21 ∧ t.Fname = s.Fname)}
Fname
David
Varun
Simi
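A TRC query reads almost directly as a Python set comprehension: the part before the bar is the output, the part after it is the condition. A sketch using the Student table above; the namedtuple field names are a lower-case rendering chosen for illustration.

```python
from collections import namedtuple

# Field names are an illustrative lower-case version of the table header
Student = namedtuple("Student", "fname lname age course")

students = {
    Student("David", "Sharma", 27, "DBMS"),
    Student("Aaron", "Lilly", 17, "JAVA"),
    Student("Sahil", "Khan", 19, "Python"),
    Student("Sachin", "Rao", 20, "DBMS"),
    Student("Varun", "George", 23, "JAVA"),
    Student("Simi", "Verma", 22, "JAVA"),
}

# {t.Fname | t ∈ Student ∧ t.age > 21}
result = {t.fname for t in students if t.age > 21}
print(sorted(result))
```

Like TRC, the comprehension states what to return rather than how to scan the table.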
TRC Example #2
Consider the relational schema
student(rollNo, name, year, courseId)

course(courseId, cname, teacher)
Q. 2: Find out the names of all the students who have taken the course named 'DBMS'
{t ∣ ∃s ∈ student ∃c ∈ course(s.courseId = c.courseId ∧ c.cname = 'DBMS' ∧ t.name = s.name)}
{s.name ∣ s ∈ student ∧ ∃c ∈ course(s.courseId = c.courseId ∧ c.cname = 'DBMS')}
Q. 3: Find out the names of all students and their rollNo who have taken the course named 'DBMS'
{s.name, s.rollNo ∣ s ∈ student ∧ ∃c ∈ course(s.courseId = c.courseId ∧ c.cname = 'DBMS')}
{t ∣ ∃s ∈ student ∃c ∈ course(s.courseId = c.courseId ∧ c.cname = 'DBMS' ∧ t.name = s.name ∧
t.rollNo = s.rollNo)}
TRC Example #3
Consider the following relations:
Flights(flno, from, to, distance, departs, arrives)

Aircraft(aid, aname, cruisingrange)
Certified(eid, aid)
Employees(eid, ename, salary)
Q. 4: Find the eids of pilots certified for Boeing aircraft
RA
Π_{eid}(σ_{aname = 'Boeing'}(Aircraft ⋈ Certified))
TRC
{C.eid ∣ C ∈ Certified ∧ ∃A ∈ Aircraft(A.aid = C.aid ∧ A.aname = 'Boeing')}
{T ∣ ∃C ∈ Certified ∃A ∈ Aircraft(A.aid = C.aid ∧ A.aname = 'Boeing' ∧ T.eid = C.eid)}
TRC Example #4
Flights (flno, from, to, distance, departs, arrives)

Aircraft (aid, aname, cruisingrange)
Certified (eid, aid)
Employees (eid, ename, salary)
Q. 5: Find the names and salaries of certified pilots working on Boeing aircraft
RA
Π_{ename, salary}(σ_{aname = 'Boeing'}(Aircraft ⋈ Certified ⋈ Employees))
TRC
{P ∣ ∃E ∈ Employees ∃C ∈ Certified ∃A ∈ Aircraft(A.aid = C.aid ∧ A.aname = 'Boeing' ∧
E.eid = C.eid ∧ P.ename = E.ename ∧ P.salary = E.salary)}
TRC Example #5
Flights (flno, from, to, distance, departs, arrives)

Aircraft (aid, aname, cruisingrange)
Certified (eid, aid)
Employees (eid, ename, salary)
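Q. 6: Identify the flights that can be piloted by every pilot whose salary is more than $100,000
A pilot (an employee with at least one Certified entry) can pilot a flight if certified for some aircraft whose cruising
range covers the distance of the flight
{F.flno ∣ F ∈ Flights ∧ ∀E ∈ Employees((E.salary > 100,000 ∧ ∃C1 ∈ Certified(C1.eid = E.eid)) ⇒
∃C ∈ Certified ∃A ∈ Aircraft(C.eid = E.eid ∧ A.aid = C.aid ∧ A.cruisingrange ≥ F.distance))}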
Safety of Expressions
It is possible to write tuple calculus expressions that generate infinite relations
For example, {t ∣ ¬t ∈ r} results in an infinite relation if the domain of any attribute of the relation r is infinite
To guard against the problem, we restrict the set of allowable expressions to safe expressions
An expression {t ∣ P (t)} in the tuple relational calculus is safe if every component of t appears in one of the
relations, tuples or constants that appear in P
NOTE: This is more than just a syntax condition
Eg: {t ∣ t[A] = 5 ∨ true} is not safe → it defines an infinite set with attribute values that do not appear in any
relation or tuples or constants in P
Domain Relational Calculus
A non-procedural query language equivalent in power to the tuple relational calculus
Each query is an expression of the form:
{<x1, x2, ..., xn> ∣ P(x1, x2, ..., xn)}
x1, x2, ..., xn represent domain variables
P represents a formula similar to that of the predicate calculus
Equivalence of Relational Algebra, Tuple Relational Calculus & Domain Relational Calculus
SELECT operation
R = (A, B)
Relational Algebra: σ_{B = 17}(r)
Tuple Calculus: {t ∣ t ∈ r ∧ t[B] = 17}
Domain Calculus: {<a, b> ∣ <a, b> ∈ r ∧ b = 17}
PROJECT operation
R = (A, B)
Relational Algebra: Π_{A}(r)
Tuple Calculus: {t ∣ ∃p ∈ r(t[A] = p[A])}
Domain Calculus: {<a> ∣ ∃b(<a, b> ∈ r)}
COMBINING operation
R = (A, B)
Relational Algebra: Π_{A}(σ_{B = 17}(r))
Tuple Calculus: {t ∣ ∃p ∈ r(t[A] = p[A] ∧ p[B] = 17)}
Domain Calculus: {<a> ∣ ∃b(<a, b> ∈ r ∧ b = 17)}
UNION
R = (A, B, C) S = (A, B, C)
Relational Algebra: r ∪ s
Tuple Calculus: {t ∣ t ∈ r ∨ t ∈ s}
Domain Calculus: {<a, b, c> ∣ <a, b, c> ∈ r ∨ <a, b, c> ∈ s}
SET DIFFERENCE
R = (A, B, C) S = (A, B, C)
Relational Algebra: r − s
Tuple Calculus: {t ∣ t ∈ r ∧ t ∉ s}
Domain Calculus: {<a, b, c> ∣ <a, b, c> ∈ r ∧ <a, b, c> ∉ s}
INTERSECTION
R = (A, B, C) S = (A, B, C)
Relational Algebra: r ∩ s
Tuple Calculus: {t ∣ t ∈ r ∧ t ∈ s}
Domain Calculus: {<a, b, c> ∣ <a, b, c> ∈ r ∧ <a, b, c> ∈ s}
CARTESIAN / CROSS PRODUCT
R = (A, B) S = (C, D)
Relational Algebra: r × s
Tuple Calculus: {t ∣ ∃p ∈ r ∃q ∈ s(t[A] = p[A] ∧ t[B] = p[B] ∧ t[C] = q[C] ∧ t[D] = q[D])}
Domain Calculus: {<a, b, c, d> ∣ <a, b> ∈ r ∧ <c, d> ∈ s}
NATURAL JOIN
R = (A, B, C, D) S = (B, D, E)
Relational Algebra:
r ⋈ s = Π_{r.A, r.B, r.C, r.D, s.E}(σ_{r.B = s.B ∧ r.D = s.D}(r × s))
Tuple Calculus:
{t ∣ ∃p ∈ r ∃q ∈ s(t[A] = p[A] ∧ t[B] = p[B] ∧ t[C] = p[C] ∧ t[D] = p[D] ∧ t[E] = q[E] ∧ p[B] =
q[B] ∧ p[D] = q[D])}
Domain Calculus:
{<a, b, c, d, e> ∣ <a, b, c, d> ∈ r ∧ <b, d, e> ∈ s}
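The equivalence of the algebraic and calculus forms of the natural join can be checked on a small instance. A sketch with invented rows for R = (A, B, C, D) and S = (B, D, E): one version follows the algebra literally (product, then select, then project), the other follows the tuple calculus condition.

```python
# Hypothetical sample relations: r over (A, B, C, D), s over (B, D, E)
r = {(1, "x", 10, "p"), (2, "y", 20, "q"), (3, "x", 30, "q")}
s = {("x", "p", "e1"), ("y", "q", "e2"), ("z", "q", "e3")}

# Algebra: project(select(r × s)); product columns are (A, B, C, D, s.B, s.D, E)
product = {p + q for p in r for q in s}
selected = {t for t in product if t[1] == t[4] and t[3] == t[5]}
join_ra = {(t[0], t[1], t[2], t[3], t[6]) for t in selected}

# Tuple calculus: pick p from r and q from s that agree on B and D
join_trc = {
    (p[0], p[1], p[2], p[3], q[2])
    for p in r
    for q in s
    if p[1] == q[0] and p[3] == q[1]
}

print(join_ra == join_trc, sorted(join_ra))
```

Both forms describe the same set; the algebra spells out an evaluation order while the calculus only states the condition each result tuple must satisfy.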
DIVISION
R = (A, B) S = (B)
Relational Algebra: r ÷ s
Tuple Calculus: {t ∣ ∃p ∈ r(t[A] = p[A]) ∧ ∀q ∈ s ∃u ∈ r(u[A] = t[A] ∧ u[B] = q[B])}
Domain Calculus: {<a> ∣ ∃b(<a, b> ∈ r) ∧ ∀b(<b> ∈ s ⇒ <a, b> ∈ r)}
Source: https://www2.cs.sfu.ca/CourseCentral/354/louie/Equiv_Notations.pdf
📚
Week 4 Lecture 3
Class BSCCS2001
Materials
Module # 18
Type Lecture
Week # 4
Entity-Relationship Model
Design Process
What is a Design?
A Design:
Satisfies a given (perhaps informal) functional specification
Conforms to the limitations of the target medium
Meets implicit or explicit requirements on performance and resource usage
Satisfies implicit or explicit design criteria on the form of the artifact
Satisfies restrictions on the design itself, such as its length or cost, or the tools available for doing the design
Role of Abstraction
Disorganized Complexity results from
Storage (STM) limitations of the human brain - an individual can simultaneously comprehend on the order of
seven, plus or minus two, chunks of information
Speed limitations of the human brain - it takes the mind about five seconds to accept a new chunk of information
Abstraction provides the major tool to handle Disorganized Complexity by chunking information
Ignore inessential details, deal only with the generalized, idealized model of the world
Consider: A binary number 110010101001
Hard to remember
Try the octal form: (110)(010)(101)(001) ⟹ 6251
Or the hex form: (1100)(1010)(1001) ⟹ CA9
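The chunking can be reproduced in a few lines of Python. A small sketch; the helper name chunk is made up, and it assumes the bit string's length is a multiple of the chunk size, which holds for this 12-bit example.

```python
bits = "110010101001"

def chunk(text, size):
    """Split text into consecutive groups of `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

value = int(bits, 2)                 # parse the binary string

octal_groups = chunk(bits, 3)        # groups of 3 bits, one octal digit each
hex_groups = chunk(bits, 4)          # groups of 4 bits, one hex digit each

octal_form = format(value, "o")
hex_form = format(value, "X")
print(octal_groups, octal_form)
print(hex_groups, hex_form)
```

Twelve symbols shrink to four octal or three hex digits, which is the point of the abstraction: fewer chunks to hold in short-term memory.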
Model Building
Physics: Time-Distance Equation, Quantum Mechanics
Chemistry: Valency-bond Structures
Geography: Maps, Projections
Electrical Circuits: Kirchhoff's Loop Equations, Time Series Signals and FFT, Transistor Models, Schematic
Diagrams, Interconnect Routing
Building & Bridges: Drawings - Plan, Elevation, Side view; Finite Element Models
Models are common in all engineering disciplines
Model building follows principles of decomposition, abstraction and hierarchy
Each model describes a specific aspect of the system
Build new models upon old proven models
Design Approach
Requirement Analysis: Analyse the data needs of the prospective DB users
Planning
System Defining
DB Designing: Use a modeling framework to create abstraction of the real world
Logical Model
Physical Model
Implementation
Data Conversion and Loading
Testing
Logical Model: Deciding on a good DB schema
Business Decision: What attributes should we record in the DB?
Computer Science Decision: What relation schemas should we have and how should the attributes be distributed
among the various relation schemas?
Physical Model: Deciding on the physical layout of the DB
Entity Relationship Model
Models an enterprise as a collection of entities and relationships
Entity → A distinguishable "thing" or "object" in the enterprise
Described by a set of attributes
Relationship → An association among multiple entities
Represented by an Entity-Relationship or ER diagram
Database Normalization
Formalize what designs are bad and test for them
Entity Relationship (ER) Model

ER Model: Database Modeling
The ER data model was developed to facilitate DB design by allowing specification of an enterprise schema that
represents the overall logical structure of a DB
The ER model is useful in mapping the meanings and interactions of real-world enterprises onto a conceptual
schema
The ER data model employs three basic concepts:
Attributes
Entity sets
Relationship sets
The ER model also has an associated diagrammatic representation, the ER diagram, which can express the overall
logical structure of a DB graphically
Attributes
An attribute is a property associated with an entity / entity set
Based on the values of certain attributes, an entity can be identified uniquely
Attribute types:
Simple and Composite attributes
Single-valued and Multi-valued attributes
Example: Multi-valued attribute: phone_numbers
Derived attributes
Can be computed from other attributes
Example: age, given date_of_birth
Domain: The set of permitted values for each attribute
Attributes: Composite
Entity sets
An entity is an object that exists and is distinguishable from other objects
Example: specific person, company, event, plant
An entity set is a set of entities of the same type that share the same properties
Example: set of all persons, companies, trees, holidays
An entity is represented by a set of attributes: ie, descriptive properties possessed by all members of an entity set
Example:
instructor = (ID, name, street, city, salary)
course = (course_id, title, credits)
Here ID and course_id are the primary keys (the tool used to export these notes to PDF does not render the
underlines)
A subset of the attributes forms a primary key of the entity set; that is, it uniquely identifies each member of the set
Primary key of an entity set is represented by underlining it
Entity sets - instructor and student

instructor                          student
instructor_id  instructor_name      student_id  student_name
76766          Crick                98988       Tanaka
45565          Katz                 12345       Shankar
10101          Srinivasan           00128       Zhang
98345          Kim                  76543       Brown
76543          Singh                76653       Aoi
22222          Einstein             23121       Chavez
                                    44553       Peltier
Relationship sets
A relationship is an association among several entities
Example:
44553 (Peltier)     advisor              22222 (Einstein)
student entity      relationship set     instructor entity
A relationship set is a mathematical relation among n ≥ 2 entities, each taken from an entity set
{(e1, e2, ..., en) ∣ e1 ∈ E1, e2 ∈ E2, ..., en ∈ En}
where (e1, e2, ..., en) is a relationship
Example: (44553, 22222) ∈ advisor
Relationship set: advisor
An attribute can also be associated with a relationship set
For instance, the advisor relationship set between entity sets instructor and student may have the attribute date
which tracks when the student started being associated with the advisor
Binary relationship
involves two entity sets (or degree two)
most relationship sets in a database system are binary
Relationships between more than two entity sets are rare
Example: students work on research projects under the guidance of an instructor
Relationship proj_guide is a ternary relationship between instructor , student and project
Attributes: Redundant
Suppose we have entity sets:
instructor, with attributes: ID, name, dept_name, salary
department, with attributes: dept_name, building, budget
We model the fact that each instructor has an associated department using a relationship set inst_dept
The attribute dept_name appears in both entity sets
Since it is the primary key for the entity set department, it replicates information present in the relationship and is
therefore redundant in the entity set instructor and needs to be removed
BUT: When converting back to tables, in some cases the attribute gets reintroduced, as we will see later
Mapping Cardinality: Constraints

Express the number of entities to which another entity can be associated via a relationship set
Most useful in describing binary relationship sets
For a binary relationship set the mapping cardinality must be one of the following types:
One to One
One to Many
Many to One
Many to Many
Mapping Cardinalities
NOTE: Some elements in A and B may not be mapped to any elements in the other set
Weak Entity sets

An entity set may be one of the two types:
Strong entity set
A strong entity set is an entity set that contains sufficient attributes to uniquely identify all its entities
In other words, a primary key exists for a strong entity set
Primary key of a strong entity set is represented by underlining it
Weak entity set
A weak entity set is an entity set that does not contain sufficient attributes to uniquely identify its entities
In other words, a primary key does not exist for a weak entity set
However, it contains a partial key called the discriminator
Discriminator can identify a group of entities from the entity set
Discriminator is represented by underlining with a dashed line
Since a weak entity set does not have a primary key, it cannot independently exist in the ER model
It features in the model in relationship with a strong entity set
This is called the identifying relationship
Primary Key of a Weak entity set
The combination of discriminator and primary key of the strong entity set makes it possible to uniquely identify all
entities of the weak entity set
Thus, this combination serves as a primary key for the weak entity set
Clearly, this primary key is not formed by the weak entity set completely
Primary Key of a Weak Entity Set = Its own discriminator + Primary Key of Strong Entity Set
Weak entity set must have total participation in the identifying relationship
That is, all the entities must feature in the relationship
Weak Entity set: Example

Strong Entity Set: Building(building_no, buildname, address)
building_no is the primary key here
Weak Entity Set: Apartment(door_no, floor)
door_no is its discriminator, as door_no alone cannot identify an apartment uniquely
Several buildings may contain an apartment with the same door number
Relationship: BA between Building and Apartment
By total participation in BA, each apartment must be present in at least one building
In contrast, Building has only partial participation in BA, as there might exist some buildings which have no apartments
Primary Key: To uniquely identify an apartment
First, building_no is required to identify the particular building
Second, door_no of the apartment is required to uniquely identify the apartment
Primary Key of Apartment = Primary Key of the Building + Its own discriminator = building_no + door_no
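One way to see this composite key at work is to declare it in SQLite through Python's sqlite3 module. A sketch under assumptions: the column types, the sample buildings and the helper try_insert are invented, and foreign key checking has to be switched on explicitly in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE building (
    building_no INTEGER PRIMARY KEY,
    buildname   TEXT,
    address     TEXT
);
-- Weak entity set: key = key of the identifying building + discriminator
CREATE TABLE apartment (
    building_no INTEGER NOT NULL REFERENCES building(building_no),
    door_no     INTEGER NOT NULL,
    floor       INTEGER,
    PRIMARY KEY (building_no, door_no)
);
-- Hypothetical sample data: door 101 exists in both buildings
INSERT INTO building VALUES (1, 'Alpha', '1 Main St'), (2, 'Beta', '2 Main St');
INSERT INTO apartment VALUES (1, 101, 1), (2, 101, 1);
""")

def try_insert(row):
    """Return True if the apartment row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO apartment VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert((1, 101, 2))  # same building and door_no: violates the key
orphan_ok = try_insert((3, 101, 1))     # no building 3: violates total participation
new_ok = try_insert((2, 102, 1))        # new door in an existing building
print(duplicate_ok, orphan_ok, new_ok)
```

The NOT NULL foreign key plays the role of total participation: an apartment row cannot exist without its building.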
Weak Entity set: Example #2

Consider a section entity, which is uniquely identified by a course_id, semester, year and sec_id
Clearly, section entities are related to course entities
Suppose we create a relationship set sec_course between entity sets section and course
Note that the information in sec_course is redundant, since section already has an attribute course_id, which identifies
the course with which the section is related
📚
Week 4 Lecture 4
Class BSCCS2001
Materials
Module # 19
Type Lecture
Week # 4
Entity-Relationship Model (part 2)

ER Diagram
Entity Sets
Entities can be represented graphically as follows:
Rectangles represent entity sets
Attributes are listed inside the entity rectangle
Underline indicates primary key attributes
Example entity sets: instructor (ID, name, salary) and student (ID, name, tot_cred)
Relationship sets
Diamonds represent relationship sets
Relationship sets with attributes
Roles
The entity sets of a relationship need not be distinct
Each occurrence of an entity set plays a "role" in the relationship
The labels "course_id" and "prereq_id" are called roles
Cardinality Constraints
We express cardinality constraints by drawing either a directed line ( → ), signifying "one" or an undirected line (−),
signifying "many" between the relationship set and the entity set
One to One relationship between an instructor and a student:
A student is associated with at most one instructor via the relationship advisor
An instructor is associated with at most one student via the relationship advisor
One-to-Many relationship
One-to-Many relationship between an instructor and a student
An instructor is associated with several (including 0) students via advisor
A student is associated with at most one instructor via advisor
Many-to-Many relationship
An instructor is associated with several (including 0) students via advisor
A student is associated with several (including 0) instructors via advisor
Total and Partial participation

Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the
relationship set
participation of student in advisor relation is total
every student must have an associated instructor
Partial participation: some entities may not participate in any relationship in the relationship set
Example: participation of instructor in advisor is partial
Notation for expressing more complex constraints

A line may have an associated minimum and maximum cardinality, shown in the form l..h, where l is the minimum and
h is the maximum cardinality
A minimum value of 1 indicates total participation
A maximum value of 1 indicates that the entity participates in at most one relationship
A maximum value of ∗ indicates no limit
Instructor can advise 0 or more students
A student must have 1 advisor; cannot have multiple advisors
Notation to express entity with complex attributes

instructor
    ID
    name
        first_name
        middle_initial
        last_name
    address
        street
            street_number
            street_name
            apt_number
        city
        state
        zip
    { phone_number }
    date_of_birth
    age()
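The three kinds of complex attribute in this notation (composite, multi-valued in braces, derived with parentheses) can be mirrored with Python dataclasses. A sketch only: the address attribute is left out for brevity, and the sample name, date of birth and phone numbers are invented.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Name:
    """Composite attribute: made up of component attributes."""
    first_name: str
    middle_initial: str
    last_name: str

@dataclass
class Instructor:
    ID: str
    name: Name                                        # composite
    date_of_birth: date
    phone_numbers: set = field(default_factory=set)   # multi-valued: { phone_number }

    def age(self, today=None):
        """Derived attribute: computed from date_of_birth, never stored."""
        today = today or date.today()
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month, self.date_of_birth.day)
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

# Hypothetical sample instructor
sample = Instructor(
    ID="22222",
    name=Name("Asha", "K", "Rao"),
    date_of_birth=date(1980, 3, 14),
    phone_numbers={"456-7890", "123-4567"},
)
print(sample.age(today=date(2021, 8, 19)), sample.age(today=date(2021, 3, 13)))
```

Storing date_of_birth and deriving age avoids having to update every row each year.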
Expressing Weak entity sets

In ER diagrams, a weak entity set is depicted via a double rectangle
We underline the discriminator of a weak entity set with a dashed line
The relationship set connecting the weak entity set to the identifying strong entity set is depicted by a double diamond
Primary key for section - (course_id, sec_id, semester, year)
ER diagram for a University enterprise
ER Model to Relational Schema
Reduction to Relation Schema
Entity sets and relationship sets can be expressed uniformly as relation schemas that represent the contents of the
DB
A DB which conforms to an ER diagram can be represented by a collection of schemas
For each entity set and relationship set there is a unique schema that is assigned the name of the corresponding
entity set or relationship set
Each schema has a number of columns (generally corresponding to attributes) which have unique names
Representing entity sets

A strong entity set reduces to a schema with the same attributes
student (ID, name, tot_cred)
A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set
section (course_id, sec_id, sem, year)
Representing relationship sets
A many-to-many relationship set is represented as a schema with attributes for the primary keys of the two
participating entity sets and any descriptive attributes of the relationship set
Example: schema for relationship set advisor
advisor = (s_id, i_id)
Representation of entity sets with composite attributes

Composite attributes are flattened out by creating a separate attribute for each component attribute
Example: Given entity set instructor with composite attribute name with component attributes first_name and
last_name the schema corresponding to the entity set has two attributes name_first_name and
name_last_name
Prefix omitted if there is no ambiguity (name_first_name could simply be first_name)
Ignoring multi-valued attributes, extended instructor schema is
instructor (ID, first_name, middle_initial, last_name,
street_number, street_name, apt_number, city,
state, zip_code, date_of_birth)
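The flattening rule can be sketched as a small recursive function. This is only an illustration: the helper name flatten and the sample values are hypothetical, and the sketch always keeps the prefix, whereas the notes allow dropping it when there is no ambiguity.

```python
def flatten(entity, prefix=""):
    """Flatten composite (nested) attributes into one column per leaf component."""
    columns = {}
    for attr, value in entity.items():
        name = f"{prefix}_{attr}" if prefix else attr
        if isinstance(value, dict):
            # Composite attribute: recurse into its component attributes.
            columns.update(flatten(value, name))
        else:
            columns[name] = value
    return columns


# Made-up sample entity with a two-level composite attribute.
instructor = {
    "ID": "22222",
    "name": {"first_name": "A", "last_name": "B"},
    "address": {
        "street": {"street_number": "12", "street_name": "Main"},
        "city": "X",
    },
}

flat = flatten(instructor)
```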
Representation of Entity sets with multi-valued attributes

A multi-valued attribute M of an entity E is represented by a separate schema EM
Schema EM has attributes corresponding to the primary key of E and an attribute corresponding to multi-valued
attribute M
Example: Multi-valued attribute phone_number of instructor is represented by a schema:
inst_phone = (ID, phone_number)
Each value of the multi-valued attribute maps to a separate tuple of the relation on schema EM
For example: an instructor entity with primary key 22222 and phone numbers 456-7890 and 123-4567 maps to
two tuples: (22222, 456-7890) and (22222, 123-4567)
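The mapping in the example above can be written out directly. The function name is an assumption chosen for this sketch; the key and phone numbers are the ones used in the notes.

```python
def multivalued_to_tuples(key, values):
    """Map each value of a multi-valued attribute to its own (key, value) tuple."""
    return [(key, value) for value in values]


# One instructor entity with two phone numbers becomes two tuples of inst_phone.
inst_phone = multivalued_to_tuples("22222", ["456-7890", "123-4567"])
```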
Redundancy of the Schema

Many-to-One and One-to-Many relationship sets that are total on the many-side can be represented by adding an
extra attribute to the "many" side, containing the primary key of the "one" side
Example: Instead of creating a schema for relationship set inst_dept, add an attribute dept_name to the schema
arising from entity set instructor
For One-to-One relationship sets, either side can be chosen to act as the "many" side
That is, an extra attribute can be added to either of the tables corresponding to the two entity sets
If participation is partial on the "many" side, replacing a schema by an extra attribute in the schema corresponding to
the "many" side could result in null values
The schema corresponding to a relationship set linking a weak entity set to its identifying strong entity set is redundant
Example: The section schema already contains the attributes that would appear in the sec_course schema
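A small sketch of folding a many-to-one relationship set into the "many" side, and of the null values that appear when participation is partial. The helper name and the sample IDs and departments are made up for illustration.

```python
# "Many" side entities and a many-to-one relationship inst_dept
# (instructor ID -> dept_name). Instructor 33333 does not participate.
instructors = [{"ID": "10101"}, {"ID": "22222"}, {"ID": "33333"}]
inst_dept = {"10101": "Comp. Sci.", "22222": "Physics"}


def fold_into_many_side(entities, relationship, attr):
    """Add the key of the "one" side as an extra attribute on each "many"-side row."""
    return [{**entity, attr: relationship.get(entity["ID"])} for entity in entities]


instructor = fold_into_many_side(instructors, inst_dept, "dept_name")
```

With total participation every row would receive a department; here the third row gets None, which is the null value the notes warn about.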
📚
Week 4 Lecture 5
Class BSCCS2001
Materials
Module # 20
Type Lecture
Week # 4
Entity-Relationship Model (part 3)

Extended ER features
Non-binary Relationship sets
Most relationship sets are binary
There are occasions when it is more convenient to represent relationships as non-binary
ER diagram with a Ternary Relationship
Cardinality constraints on Ternary Relationship
We allow at most one arrow out of a ternary (or greater degree) relationship to indicate a cardinality constraint
For example, an arrow from proj_guide to instructor indicates each student has at most one guide for a project
If there is more than one arrow, there are two ways of defining the meaning
For example, a ternary relationship R between A, B and C with arrows to B and C could mean
Each A entity is associated with a unique entity from B and C or
Each pair of entities from (A, B) is associated with a unique C entity, and each pair (A, C) is associated with a
unique B
Each alternative has been used in different formalisms
To avoid confusion we outlaw more than one arrow
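The single-arrow constraint on proj_guide can be checked over a set of relationship instances. This is a sketch under the assumption that each instance is stored as a (student, project, instructor) triple; the function name and sample data are hypothetical.

```python
def at_most_one_guide(proj_guide):
    """Check that every (student, project) pair is linked to at most one instructor."""
    guide_of = {}
    for student, project, instructor in proj_guide:
        # setdefault stores the first instructor seen for the pair and returns it.
        if guide_of.setdefault((student, project), instructor) != instructor:
            return False
    return True


valid = at_most_one_guide([("s1", "p1", "i1"), ("s1", "p2", "i2"), ("s2", "p1", "i1")])
invalid = at_most_one_guide([("s1", "p1", "i1"), ("s1", "p1", "i2")])
```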
Specialization: ISA
Top-down design process: We designate sub-groupings within an entity set that are distinctive from other entities in
the set
These sub-groupings become lower-level entity sets that have attributes or participate in relationships that do not
apply to the higher-level entity set
Depicted by a triangle component labeled ISA (eg: instructor "is a" person)
Attribute inheritance: A lower-level entity set inherits all the attributes and relationship participation of the higher-
level entity set to which it is linked
Overlapping: employee and student
Disjoint: instructor and secretary
Total and Partial
Representing Specialization via Schema

Method 1:
Form a schema for the higher-level entity
Form a schema for each lower-level entity set, include primary key of higher-level entity set and local attributes
schema      attributes
person      ID, name, street, city
student     ID, tot_cred
employee    ID, salary
Drawback: Getting information about an employee requires accessing two relations, the one corresponding to the
low-level schema and the one corresponding to the high-level schema
Method 2:
Form a schema for each entity set with all local and inherited attributes
schema      attributes
person      ID, name, street, city
student     ID, name, street, city, tot_cred
employee    ID, name, street, city, salary
Drawback: name, street and city may be stored redundantly for people who are both students and employees
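The trade-off between the two methods can be seen in a short sqlite3 sketch. The table name employee_full, the column types, and the sample row are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Method 1: one schema for the higher-level set, and one per lower-level set
# holding only the inherited primary key and the local attributes.
conn.execute("CREATE TABLE person (ID TEXT PRIMARY KEY, name TEXT, street TEXT, city TEXT)")
conn.execute("CREATE TABLE employee (ID TEXT PRIMARY KEY REFERENCES person(ID), salary INTEGER)")
conn.execute("INSERT INTO person VALUES ('p1', 'Asha', 'Main Street', 'Chennai')")
conn.execute("INSERT INTO employee VALUES ('p1', 50000)")

# Drawback of Method 1: full employee information needs two relations (a join).
method1_row = conn.execute(
    "SELECT p.ID, p.name, p.street, p.city, e.salary "
    "FROM employee AS e JOIN person AS p ON p.ID = e.ID"
).fetchone()

# Method 2: each schema carries all local and inherited attributes,
# so a single relation answers the same question.
conn.execute(
    "CREATE TABLE employee_full "
    "(ID TEXT PRIMARY KEY, name TEXT, street TEXT, city TEXT, salary INTEGER)"
)
conn.execute("INSERT INTO employee_full VALUES ('p1', 'Asha', 'Main Street', 'Chennai', 50000)")
method2_row = conn.execute("SELECT * FROM employee_full").fetchone()
```

Both queries return the same row; Method 2 avoids the join but would repeat name, street and city in a similar student table for anyone who is both a student and an employee.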
Generalization
Bottom-up design process: Combine a number of entity sets that share the same features into a higher-level entity
set
Specialization and generalization are simple inversions of each other; they are represented in an ER diagram in the
same way
The terms specialization and generalization are used interchangeably
Design constraints on a specialization / generalization

Completeness constraint: Specifies whether or not an entity in the higher-level entity set must belong to at least one
of the lower-level entity sets within a generalization
total: an entity must belong to one of the lower-level entity sets
partial: an entity need not belong to one of the lower-level entity sets
Partial generalization is the default
We can specify total generalization in an ER diagram by adding the keyword total in the diagram and
drawing a dashed line from the keyword to the corresponding hollow arrow-head to which it applies (for a total
generalization) or to the set of hollow arrow-heads to which it applies (for an overlapping generalization)
The student generalization is total
All student entities must be either graduate or undergraduate
Because the higher-level entity set arrived at through generalization is generally composed of only those entities
in the lower-level entity sets, the completeness constraint for a generalized higher-level entity set is usually total
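The completeness constraint, and the overlapping / disjoint distinction mentioned under specialization, can be expressed as checks over sets of entity keys. This is an illustrative sketch; the function names and the sample keys are assumptions.

```python
def is_total(higher, lowers):
    """Total: every higher-level entity belongs to at least one lower-level set."""
    return higher <= set().union(*lowers)


def is_disjoint(lowers):
    """Disjoint: no entity belongs to more than one lower-level set."""
    seen = set()
    for lower in lowers:
        if seen & lower:
            return False
        seen |= lower
    return True


# Made-up keys: every student is either graduate or undergraduate.
student = {"s1", "s2", "s3"}
graduate = {"s1"}
undergraduate = {"s2", "s3"}

student_is_total = is_total(student, [graduate, undergraduate])
student_is_disjoint = is_disjoint([graduate, undergraduate])
```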
Aggregation
Consider the ternary relationship proj_guide, which we saw earlier
Suppose we want to record evaluations of a student by a guide on a project
Relationship sets eval_for and proj_guide represent overlapping information
Every eval_for relationship corresponds to a proj_guide relationship
However, some proj_guide relationships may not correspond to any eval_for relationships
So, we cannot discard the proj_guide relationship
Eliminate this redundancy via aggregation
Treat relationship as an abstract entity
Allows relationships between relationships
Abstraction of relationship into new entity
Without introducing redundancy, the following diagram represents:
A student is guided by a particular instructor on a particular project
A student, instructor, project combination may have an associated evaluation
Representing aggregation via Schema
To represent aggregation, create a schema containing
Primary key of the aggregated relationship
The primary key of the associated entity set
Any descriptive attributes
In our example
The schema eval_for is:
eval_for (s_ID, project_id, i_ID, evaluation_id)
The schema proj_guide is redundant
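The three ingredients listed above can be assembled mechanically. The helper below is a hypothetical sketch that only concatenates attribute names; it assumes the primary key of proj_guide is (s_ID, project_id, i_ID), as the eval_for schema in the notes suggests.

```python
def aggregation_schema(relationship_key, entity_key, descriptive=()):
    """Build the attribute list: key of the aggregated relationship,
    then key of the associated entity set, then any descriptive attributes."""
    return tuple(relationship_key) + tuple(entity_key) + tuple(descriptive)


eval_for = aggregation_schema(
    relationship_key=("s_ID", "project_id", "i_ID"),  # primary key of proj_guide
    entity_key=("evaluation_id",),                    # primary key of evaluation
)
```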
Design Issues
Entities v/s Attributes
Use of entity sets v/s attributes
Use of phone as an entity allows extra information about phone numbers (plus multiple phone numbers)
Entities v/s Relationship sets

Use of entity sets v/s relationship sets
Possible guideline is to designate a relationship set to describe an action that occurs between entities
Placement of relationship attributes
For example, attribute date as attribute of advisor or as attribute of student
Binary v/s Non-binary Relationships

Although it is possible to replace any non-binary (n-ary, for n > 2) relationship set by a number of distinct binary
relationship sets, an n-ary relationship set shows more clearly that several entities participate in a single relationship
Some relationships that appear to be non-binary may be better represented using binary relationships
For example, a ternary relationship parents, relating a child to his/her father and mother, is best replaced by two
binary relationships, father and mother
Using two binary relationships allows partial information (eg: only mother being known)
But there are some relationships that are naturally non-binary
Example: proj_guide
Binary v/s Non-binary Relationships: Conversion

In general, any non-binary relationship can be represented using binary relationships by creating an artificial entity set
Replace R between entity sets A, B and C by an entity set E, and three relationship sets:
R_A, relating E and A
R_B, relating E and B
R_C, relating E and C
Create an identifying attribute for E and add any attributes of R to E
For each relationship (a_i, b_i, c_i) in R, create
A new entity e_i in the entity set E
Add (e_i, a_i) to R_A
Add (e_i, b_i) to R_B
Add (e_i, c_i) to R_C
Also need to translate constraints
Translating all constraints may not be possible
There may be instances in the translated schema that cannot correspond to any instance of R
Exercise: add constraints to the relationships R_A, R_B and R_C to ensure that a newly created entity
corresponds to exactly one entity in each of the entity sets A, B and C
We can avoid creating an identifying attribute by making E a weak entity set identified by the three relationship
sets
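The conversion procedure above can be sketched in a few lines. The function names, the generated identifiers e1, e2, ..., and the sample triples are assumptions for illustration; attributes of R are omitted.

```python
from itertools import count


def to_binary(R):
    """Replace ternary relationship R (a list of (a, b, c) triples) by an
    artificial entity set E and three binary relationship sets."""
    E, RA, RB, RC = [], [], [], []
    ids = count(1)
    for a, b, c in R:
        e = f"e{next(ids)}"  # identifying attribute created for the new entity
        E.append(e)
        RA.append((e, a))
        RB.append((e, b))
        RC.append((e, c))
    return E, RA, RB, RC


def to_ternary(E, RA, RB, RC):
    """Recover the triples, assuming each e is related to exactly one a, b and c."""
    a_of, b_of, c_of = dict(RA), dict(RB), dict(RC)
    return [(a_of[e], b_of[e], c_of[e]) for e in E]


R = [("s1", "p1", "i1"), ("s2", "p1", "i1")]
E, RA, RB, RC = to_binary(R)
round_trip = to_ternary(E, RA, RB, RC)
```

The round trip only works because every e appears exactly once in each binary relationship set, which is the constraint the exercise above asks for.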
ER Design Decisions
The use of an attribute or entity set to represent an object
Whether a real-world concept is best expressed by an entity or a relationship set
The use of a ternary relationship versus a pair of binary relationships
The use of strong or weak entity set
The use of specialization/generalization — contributes to modularity in the design
The use of aggregation — can treat the aggregate entity set as a single unit without concern for the details of its
internal structure
Symbols used in the ER Notation
