
Quiz 1 Notes DBMS


📚

Week 1 Lecture 1
Class BSCCS2001

Created @August 19, 2021 1:46 PM

Materials https://drive.google.com/drive/folders/19FhdYYKeH3ZshWhoZIJlP_MC1nVnUUmU?usp=sharing

Module # 1

Type Lecture

Week # 1

Database Management Systems (DBMS)

🚨 DBMS: A database management system (or DBMS) is essentially nothing more than a computerized data-keeping system. (via IBM)

DBMS contains info about a particular enterprise


Collection of interrelated data

Set of programs to access the data

An environment that is both convenient and efficient to use

Database Applications:
Banking: transactions

Airlines: reservations, schedules

Universities: registration, grades

Sales: customers, products, purchases

Online retailers: order tracking, customized recommendations

Manufacturing: production, inventory, orders, supply chain

HR: employee records, salaries, tax deductions

Databases can be very large


Databases touch various aspects of our lives

University Database Example
Application program examples
Add new students, instructors and courses

Register students for courses and generate class rosters

Assign grades to students, compute Grade Point Average (GPA) and generate transcripts

In early days, database applications were built directly on top of file systems

Drawbacks of using file systems to store data


Data redundancy and inconsistency

Multiple file formats, duplication of information in different files

Difficulty in accessing data

Need to write a new program to carry out each new task

Data isolation

Multiple files and formats

Integrity problems

Integrity constraints (eg: account balance > 0) become "buried" in program code rather than being stated explicitly

Hard to add new constraints or change existing ones

Atomicity of updates

Failures may leave databases in an inconsistent state with partial updates carried out

Example: Transfer of funds from one account to another should either complete or not happen at all

Concurrent access by multiple users

Concurrent access needed for performance

Uncontrolled concurrent accesses can lead to inconsistencies

Example: Two people reading a balance (say 100) and updating it by withdrawing money (say 50 each) at
the same time

Security problems

Hard to provide user access to some, but not all, data

Database systems offer solutions to the above problems
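
As a rough illustration of how a database system addresses the integrity and atomicity drawbacks above, the sketch below uses Python's built-in sqlite3 module. The table name account, its columns, the sample balances and the transfer helper are assumptions made for this example, not part of the lecture.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the credit was rolled back too
        return False

conn = sqlite3.connect(":memory:")
# The integrity constraint is stated once, in the schema, not buried in program code
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance > 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

ok = transfer(conn, "A", "B", 50)       # valid transfer
failed = transfer(conn, "A", "B", 500)  # would overdraw A, so nothing changes
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```

The second call credits B first and then fails on the debit, which shows that the partial update does not survive the failed transaction.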

Course pre-requisites:
Set Theory
Definition of a set

Intensional definition

Extensional definition

Set-builder notation

Membership, Subset, Superset, Power set, Universal set

Operations on sets:

Unions, Intersections, Complement, Difference, Cartesian product

De Morgan's Laws

Relations and Functions


Definition of Relations

Ordered pairs and Binary relations

Domain and Range

Image, Pre-image, Inverse

Properties: Reflexive, Symmetric, Anti-symmetric, Transitive, Total

Definition of functions

Properties of functions: Injective, Surjective, Bijective

Composition of functions

Inverse of functions

Propositional Logic
Truth values and Truth tables

Operators: conjunction (and), disjunction (or), negation (not), implication, equivalence

Closure under Operations

Predicate Logic
Predicates

Quantification

Existential

Universal

Python

Algorithms and Programming in C


Sorting

Merge sort

Quick sort

Search

Linear search

Binary search

Interpolation search

Data Structures
Arrays

List

Binary Search Tree

Balanced Tree

B - Tree

Hash table/map

Object-Oriented analysis and design


Refresher material

Discrete Mathematics by Brilliant: https://brilliant.org/wiki/discrete-mathematics

Python

IITM online book: https://pypod.github.io

Cheatsheet: https://www.pythoncheatsheet.org

DataCamp Cheatsheet: https://www.datacamp.com/community/tutorials/python-data-science-cheat-sheet-basics

C Language: https://www.youtube.com/watch?v=zYierUhIFNQ&list=PLhQjrBD2T382_R182iC2gNZI9HzWFMC_8&index=2 (part of CS50 2020 Lectures)

📚
Week 1 Lecture 2
Class BSCCS2001

Created @August 19, 2021 3:48 PM

Materials

Module # 2

Type Lecture

Week # 1

Why DBMS?
Data Management
Storage

Retrieval

Transaction

Audit

Archival

For

Individuals

Small / Big Enterprises

Global

There have been 2 major approaches in this practice:

1. Physical:
Physical Data or Records Management, more formally known as Book Keeping, has been using physical ledgers
and journals for centuries
The most significant development happened when Henry Brown patented a "receptacle for storing and preserving
papers" on November 2, 1886

Herman Hollerith adapted the punch cards used for weaving looms to act as the memory for a mechanical tabulating
machine in 1890

2. Electronic:

Electronic Data or Records management moves with the advances in technology, especially of memory, storage,
computing and networking

1950s: Computer programming started

1960s: Data Management with punch cards / tapes and magnetic tapes

1970s:

COBOL and CODASYL approach was introduced in 1971

On October 14, 1979, Apple II platform shipped VisiCalc, marking the birth of spreadsheets

Magnetic disks became prevalent

1980s: RDBMS changed the face of data management

1990s: With internet, data management started becoming global

2000s: e-Commerce boomed, NoSQL was introduced for unstructured data management

2010s: Data Science started riding high

Electronic Data Management Params


Electronic Data or Records management depends on various params including ...

Durability

Scalability

Security

Retrieval

Ease of Use

Consistency

Efficiency

Cost

Book Keeping
A book register was maintained on which the shop owner wrote the amount received from customers, the amount due for
any customer, inventory details and so on ...

Problems with such an approach of book keeping:

Durability: Physical damage to these registers is a possibility due to rodents, humidity, wear and tear

Scalability: Very difficult to maintain over the years, some shops have numerous registers spanning over the years

Security: Susceptible to tampering by the outsiders

Retrieval: Time consuming process to search for previous entry

Consistency: Prone to human errors

Not only small shops but large orgs also used to maintain their transactions in book registers

Spreadsheet files - A better solution


Mostly useful for single user or small enterprise applications
Spreadsheet software like Google Sheets: Due to disadvantages of maintaining ledger registers, organizations dealing
with huge amount of data shifted to using spreadsheets for maintaining records in files

Durability: These are computer applications and hence data is less prone to physical damage

Scalability: Easier to search, insert and modify records as compared to book ledgers

Security: Can be password protected

Easy to Use: Computer applications are used to search and manipulate records in the spreadsheets leading to
reduction in manpower needed to perform routine computations

Consistency: Not guaranteed but spreadsheets are less prone to mistakes than registers

Why leave filesystems?


Lack of efficiency in meeting growing needs

With rapid scale up of data, there has been considerable increase in the time required to perform most operations

A typical spreadsheet file may have an upper limit on the number of rows

Ensuring consistency of data is a big challenge

No means to check violations of constraints in the face of concurrent processing

Unable to give different permissions to different people in a centralized manner

A system crash could be catastrophic

The above mentioned limitations of filesystems paved the way for a comprehensive platform dedicated to management of
data - the Database Management System

History of Database Systems


1950s and early 1960s

Data processing using magnetic tapes for storage

Tapes provided only sequential access

Punched cards for input

Late 1960s and 1970s

Hard disks allowed direct access to data

Network and hierarchical data model in widespread use

Ted Codd defines the relational data model

Would win the ACM Turing Award for his work

IBM Research begins System R prototype

UC Berkeley begins Ingres prototype

High-performance (for the era) transaction processing

1980s

Research relational prototypes evolve into commercial systems - SQL becomes industrial standard

Parallel and distributed database systems

Object oriented database systems

1990s

Large decision support and data mining applications

Large multi-terabyte data warehouses

Emergence of Web commerce

Early 2000s

XML and XQuery standards

Automated database administration

Later 2000s

Giant data storage systems - Google BigTable, Yahoo PNuts, Amazon, ...

📚
Week 1 Lecture 3
Class BSCCS2001

Created @August 19, 2021 4:47 PM

Materials

Module # 3

Type Lecture

Week # 1

Why DBMS? (part 2)


Case study of a Bank Transaction
Consider a simple banking system where a person can open a bank account, transfer funds to an existing account and
check the history of all her transactions till date

The application performs the following checks

If the account balance is not enough, it will not allow the fund transfer

If the account numbers are not correct, it will flash a message and terminate the transaction

If a transaction is successful, it prints a confirmation message

We will use this banking transaction system to compare various features of a file-based (.csv file) implementation vis-à-vis a
DBMS-based implementation

Account details are stored in

Accounts.csv for file-based implementation

Accounts table for DBMS implementation

The transaction details are stored in

Ledger.csv for file-based implementation

Ledger table for DBMS implementation

Source: https://github.com/bhaskariitm/transition-from-files-to-db

Initiating a transaction
Python

import csv

# col_name_Ledger (the list of Ledger.csv column names) is assumed to be defined at module level

def begin_Transaction(credit_account, debit_account, amount):
    temp = []
    success = 0

    # Open file handles to retrieve and store transaction data
    f_obj_Account1 = open('Accounts.csv', 'r')
    f_reader1 = csv.DictReader(f_obj_Account1)
    f_obj_Account2 = open('Accounts.csv', 'r')
    f_reader2 = csv.DictReader(f_obj_Account2)
    f_obj_Ledger = open('Ledger.csv', 'a+')
    f_writer = csv.DictWriter(f_obj_Ledger, fieldnames=col_name_Ledger)

SQL

-- Handled implicitly by the DBMS

Transaction
Python

    try:
        for s_record in f_reader1:
            # CONDITION CHECK FOR ENOUGH BALANCE
            if s_record['Account_no'] == debit_account and int(s_record['Balance']) > int(amount):
                for r_record in f_reader2:
                    if r_record['Account_no'] == credit_account:
                        s_record['Balance'] = str(int(s_record['Balance']) - int(amount))  # DEBIT
                        temp.append(s_record)
                        # CRITICAL POINT: a crash here records the debit without the credit
                        f_writer.writerow({
                            'Account1': s_record['Account_no'],
                            'Account2': r_record['Account_no'],
                            'Amount': amount,
                            'D/C': 'D'
                        })
                        r_record['Balance'] = str(int(r_record['Balance']) + int(amount))  # CREDIT
                        temp.append(r_record)
                        f_writer.writerow({'Account1': r_record['Account_no'], 'Account2': s_record['Account_no'], 'Amount': amount, 'D/C': 'C'})
                        success = success + 1
                        break
        if success == 1:
            # Re-read the accounts file and keep every record not involved in the transfer
            f_obj_Account1.seek(0)
            next(f_obj_Account1)  # skip the header row
            for record in f_reader1:
                if record['Account_no'] != temp[0]['Account_no'] and record['Account_no'] != temp[1]['Account_no']:
                    temp.append(record)
    except Exception:
        print('\nWrong input entered !!!')

SQL

do $$
declare
    -- variable types are assumed; match them to the column types of accounts
    amt integer := 5000;
    sendVal varchar := '1800090';
    recVal varchar := '1800100';
    sbalance integer;
begin
    select balance into sbalance
    from accounts
    where account_no = sendVal;
    if sbalance < amt then
        raise notice 'Insufficient balance';
    else
        update accounts
        set balance = balance - amt
        where account_no = sendVal;
        insert into ledger(sendAc, recAc, amnt, ttype)
        values(sendVal, recVal, amt, 'D');
        update accounts
        set balance = balance + amt
        where account_no = recVal;
        insert into ledger(sendAc, recAc, amnt, ttype)
        values(sendVal, recVal, amt, 'C');
        commit;
        raise notice 'Successful';
    end if;
end; $$

Closing a transaction
Python

    f_obj_Account1.close()
    f_obj_Account2.close()
    f_obj_Ledger.close()
    if success == 1:
        # Rewrite Accounts.csv with the updated balances held in temp
        # (col_name_Account is assumed to be defined at module level)
        f_obj_Account = open('Accounts.csv', 'w+', newline='')
        f_writer = csv.DictWriter(f_obj_Account, fieldnames=col_name_Account)
        f_writer.writeheader()
        for data in temp:
            f_writer.writerow(data)
        f_obj_Account.close()
        print("\nTransaction is successful !!")
    else:
        print('\nTransaction failed : Confirm Account details')

SQL

-- Handled implicitly by the DBMS

Comparison

Parameter | File handling via Python | DBMS
Scalability with respect to amount of data | Very difficult to handle insert, update and querying of records | In-built features to provide high scalability for a large number of records
Scalability with respect to changes in structure | Extremely difficult to change the structure of records as in the case of adding or removing attributes | Adding or removing attributes can be done seamlessly using simple SQL queries
Time of execution | In seconds | In milliseconds
Persistence | Data processed using temporary data structures have to be manually updated to the file | Data persistence is ensured via automatic, system induced mechanisms
Robustness | Ensuring robustness of data has to be done manually | Backup, recovery and restore need minimum manual intervention
Security | Difficult to implement in Python (Security at OS level) | User-specific access at database level
Programmer's productivity | Most file access operations involve extensive coding to ensure persistence, robustness and security of data | Standard and simple built-in queries reduce the effort involved in coding thereby increasing a programmer's throughput
Arithmetic operations | Easy to do arithmetic computations | Limited set of arithmetic operations are available
Costs | Low costs for hardware, software and human resources | High costs of hardware, software and human resources

Parameterized Comparison
Scalability
File Handling in Python

Number of records: As the # of records increases, the efficiency of flat files reduces:

the time spent in searching for the right records

the limitations of the OS in handling huge files

Structural Change: To add an attribute, initializing the new attribute of each record with a default value has to be done
by program. It is very difficult to detect and maintain relationships between entities if and when an attribute has to be
removed

DBMS

Number of records: Databases are built to efficiently scale up when the # of records increase drastically.

In-built mechanisms, like indexing, for quick access of right data

Structural Changes: When adding an attribute, a default value can be defined that holds for all existing records - the
new attribute gets initialized with the default value. During deletion, constraints are used either to disallow the removal or
to ensure its safe removal
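
A minimal sketch of the structural change described above, using Python's built-in sqlite3 module. The accounts table, the new branch attribute and its default value 'MAIN' are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_no TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("1800090", 9000), ("1800100", 2500)],  # sample balances are made up
)

# One statement adds the attribute; existing records pick up the default value
conn.execute("ALTER TABLE accounts ADD COLUMN branch TEXT DEFAULT 'MAIN'")

rows = conn.execute(
    "SELECT account_no, balance, branch FROM accounts ORDER BY account_no"
).fetchall()
print(rows)
```

With a flat file, the same change would mean rewriting every record by program.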

Time and Efficiency


If the number of records is very small, the overhead in installing and configuring a database will be much more than the
time advantage obtained from executing the queries

However, if the number of records is really large, then the time required in the initialization process of a database will
be negligible as compared to the time saved by using SQL queries

File Handling in Python

The effort needed to implement a file handler is quite less in Python

In order to process a 1GB file, a program in Python would typically take a few seconds

DBMS

The effort to install and configure a DB on a DB server is expensive and time consuming

In order to process a 1GB file, an SQL query would typically take a few milliseconds

Programmer's Productivity
File Handling in Python

Building a file handler: Since the constraints within and across entities have to be enforced manually, the effort
involved in building a file handling application is huge

Maintenance: To maintain the consistency of data, one must regularly check for sanity of data and the relationships
between entities during inserts, updates and deletes

Handling huge data: As the data grows beyond the capacity of the file handler, more efforts are needed

DBMS

Configuring the database: The installation and configuration of a database is a specialized job of a DBA. A
programmer, on the other hand, is saved the trouble

Maintenance: DBMS has built-in mechanisms to ensure consistency and sanity of data being inserted, updated or
deleted. The programmer does not need to do such checks

Handling huge data: DBMS can handle even terabytes of data - Programmer does not have to worry

Arithmetic Operations
File Handling in Python

Extensive support for arithmetic and logical operations on data using Python. These include complex numerical
calculations and recursive computations

DBMS

SQL provides limited support for arithmetic and logical operations. Any complex computation has to be done outside of
SQL

Costs and Complexity


File Handling in Python

File systems are cheaper to install and use. No specialized hardware, software or personnel are required to maintain
filesystems

DBMS

Large databases are served by dedicated database servers which need large storage and processing power

DBMSs are expensive software that have to be installed and regularly updated

Databases are inherently complex and need specialized people to work on them - like a DBA (Database
Administrator)

The above factors lead to huge costs in implementing and maintaining database management systems

📚
Week 1 Lecture 4
Class BSCCS2001

Created @August 19, 2021 5:55 PM

Materials

Module # 4

Type Lecture

Week # 1

Introduction to DBMS
Levels of Abstraction
Physical Level: describes how a record (eg: instructor) is stored

Logical Level: describes data stored in a database and the relationships among the data fields

type instructor = record
    ID: string;
    name: string;
    dept_name: string;
    salary: integer;
end;

View Level: application programs hide details of data types

Views can also hide information (such as employee's salary) for security purposes
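
The view level can be sketched with Python's built-in sqlite3 module. The view name instructor_public and the sample row are assumptions made for this example; the instructor attributes follow the record definition above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full instructor relation, including salary
conn.execute(
    "CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)")
conn.execute("INSERT INTO instructor VALUES ('I0001', 'Asha', 'Physics', 90000)")  # made-up row

# View level: applications that query the view never see the salary attribute
conn.execute(
    "CREATE VIEW instructor_public AS SELECT ID, name, dept_name FROM instructor")

cursor = conn.execute("SELECT * FROM instructor_public")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```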

An architecture for a database system

Schema and Instances
TLDR: Schema is the way in which data is organized and Instance is the actual value of the data

Schema

Logical Schema - the overall logical structure of the database

Analogous to type information of a variable in a program (eg: int x = 5)

Example: The database consists of information about a set of customers and accounts in a bank and the
relationship between them

Customer Schema

Name | Customer ID | Account # | Aadhaar ID | Mobile #

Account Schema

Account # | Account Type | Interest Rate | Min. Bal. | Balance

Physical Schema - the overall physical structure of the database

Instance

The actual content of the database at a particular point in time

Analogous to the value of a variable

Customer Instance

Name | Customer ID | Account # | Aadhaar ID | Mobile #
Pavan Lakha | 6728 | 917322 | 182719289372 | 9830100291
Lata Kala | 8912 | 827183 | 918291204829 | 7189203928
Nand Prabhu | 6617 | 372912 | 127837291021 | 8892021892

Account Instance

Account # | Account Type | Interest Rate | Min. Bal. | Balance
917322 | Savings | 4.0% | 5000 | 7812
372912 | Current | 0.0% | 0 | 291820
827183 | Term Deposit | 6.75% | 10000 | 100000
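
The difference between schema and instance can be sketched with Python's built-in sqlite3 module: the schema is read from the table definition, while the instance is whatever rows are stored at that moment. The column names and types below are assumptions loosely based on the Account tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_no INTEGER PRIMARY KEY, account_type TEXT, "
    "interest_rate REAL, min_bal INTEGER, balance INTEGER)")

def schema(connection):
    """Return (column name, declared type) pairs, i.e. the logical schema."""
    return [(row[1], row[2]) for row in connection.execute("PRAGMA table_info(account)")]

schema_before = schema(conn)
instance_before = conn.execute("SELECT * FROM account").fetchall()

conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?, ?, ?)",
    [(917322, "Savings", 4.0, 5000, 7812), (372912, "Current", 0.0, 0, 291820)],
)

schema_after = schema(conn)
instance_after = conn.execute("SELECT * FROM account ORDER BY account_no").fetchall()
print(schema_after)
print(instance_after)
```

The instance changes with every insert, while the schema stays the same, much like a variable's value changes while its type does not.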

Physical Data Independence - the ability to modify the physical schema without changing the logical schema

Analogous to independence of Interface and Implementation in object-oriented systems

Applications depend on the logical schema

In general, the interfaces between various levels and components should be well defined so that changes in some
parts do not seriously influence others.

Data Models
A collection of tools that describe the following ...

Data

Data relationships

Data semantics

Data constraints

Relational model (our focus in this course)

Entity-Relationship data model (mainly for database design)

Object-based data models (Object-oriented and Object-relational)

Other older models

Network model

Hierarchical model

Recent models for Semi-structured or Unstructured data

Converted to easily manageable formats

Content Addressable Storage (CAS) with metadata descriptors

XML format

RDBMS which support BLOBs

Relational Model
All the data is stored in various tables

Tables are also called Relations

Columns are called attributes

They have particular names which tell us the schema

Rows are records (tuples) that hold the actual values

Data Definition Language (DDL)


Specification notation for defining the database schema

Example

create table instructor (
    ID char(5),
    name varchar(20),
    dept_name varchar(20),
    salary numeric(8, 2)
);

DDL compiler generates a set of table templates stored in a data dictionary

Data dictionary contains metadata (that is, data about the data)

Database schema

Integrity constraints

Primary key (ID uniquely identifies instructors)

Authorization

Who can access what
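
A small sketch of DDL, the data dictionary and a primary key constraint, using Python's built-in sqlite3 module. SQLite keeps its metadata in a table called sqlite_master; other systems organize their data dictionary differently. The inserted rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instructor (
        ID        char(5) PRIMARY KEY,
        name      varchar(20),
        dept_name varchar(20),
        salary    numeric(8, 2)
    )
""")

# Metadata (data about the data) is itself stored in a queryable table
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

conn.execute("INSERT INTO instructor VALUES ('I0001', 'Asha', 'Physics', 90000)")
try:
    # Same ID again: the primary key constraint should reject this row
    conn.execute("INSERT INTO instructor VALUES ('I0001', 'Ravi', 'Music', 40000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(tables, duplicate_rejected)
```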

Data Manipulation Language (DML)


Language for accessing and manipulating the data organized by the appropriate data model

DML: also known as Query Language

Two classes of languages

Pure - used for proving properties about computational power and for optimization

Relational Algebra (our focus in this course)

Tuple relational calculus

Domain relational calculus

Commercial - used in commercial systems

SQL is the most widely used commercial language

Structured Query Language (SQL)


Most widely used commercial language

SQL is NOT a Turing Machine equivalent language

Cannot be used to solve all problems that a C program, for example, can solve

To be able to compute complex functions, SQL is usually embedded in some higher-level language

Application programs generally access databases through one of ...

Language extensions to allow embedded SQL

Application Programming Interfaces or APIs (eg: ODBC / JDBC) which allow SQL queries to be sent to the
databases
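
As a sketch of SQL used from a host language, the example below sends a query through Python's built-in sqlite3 module (which plays a role similar to the ODBC / JDBC APIs mentioned above) and finishes the computation in Python. The table contents are made up, and the median is used only as an example of a computation done on the host side.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, dept_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [("I1", "Physics", 90000), ("I2", "Physics", 70000),
     ("I3", "Physics", 80000), ("I4", "Music", 40000)],
)

# SQL selects the relevant rows ...
salaries = [row[0] for row in conn.execute(
    "SELECT salary FROM instructor WHERE dept_name = ?", ("Physics",))]

# ... and the host language carries out the rest of the computation
median_salary = statistics.median(salaries)
print(median_salary)
```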

Database Design
The process of designing the general structure of the database:

Logical Design - Deciding on the database schema. Database design requires that we find a good collection of
relation schemas

Business decision

What attributes should we record in the databases?

Computer Science decision

What relation schemas should we have and how should the attributes be distributed among the various
relation schemas?

Physical Design - Deciding on the physical layout of the database

📚
Week 1 Lecture 5
Class BSCCS2001

Created @August 20, 2021 11:13 AM

Materials

Module # 5

Type Lecture

Week # 1

Introduction to DBMS (part 2)


Database Design
Design Approaches
Need to come up with a methodology to ensure that each relation in the database is good

Two ways of doing so:

Entity Relationship Model (primarily tries to capture the business requirements)

Models an enterprise as a collection of entities and relationships

Represented diagrammatically by an entity-relationship diagram

Normalization Theory (this is the Computer Science perspective)

Formalize what designs are bad and test for them

Object-Relational Data Models


Relational model: flat, atomic values

Object Relational Data Models

Extend the relational data model by including object orientation and constructs to deal with added data types

Allow attributes of tuples to have complex types, including non-atomic values such as nested relations

Preserve relational foundations, in particular the declarative access to data, while extending modeling power

Provide upward compatibility with existing relational language

XML: eXtensible Markup Language
Defined by the WWW Consortium (W3C)

At its core, XML describes data as name-value pairs

A tag gives the name, and the value is placed between the opening and closing tags

Originally intended as a document markup language not a database language

The ability to specify new tags and to create tag structures made XML a great way to exchange data, not just
documents

XML has become the basis for all new generation data interchange formats

A wide variety of tools are available for parsing, browsing and querying XML documents
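
To make the tag and value idea concrete, here is a minimal sketch that parses a small XML document with Python's standard xml.etree.ElementTree module. The instructor document and its values are hypothetical.

```python
import xml.etree.ElementTree as ET

document = """
<instructor>
    <ID>I0001</ID>
    <name>Asha</name>
    <dept_name>Physics</dept_name>
</instructor>
"""

root = ET.fromstring(document)
# Each child element is a name-value pair: the tag is the name, the text is the value
record = {child.tag: child.text for child in root}
print(root.tag, record)
```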

Database Engine
3 major components are:

Storage Manager

Query processing

Transaction Manager

Storage Management
Storage Manager is a program module that provides the interface between the low-level data stored in the database and
the application programs and queries submitted to the system

The storage manager is responsible for the following tasks:

Interaction with the OS file manager

Efficient storing, retrieving and updating of data

Issues:

Storage access

File organization

Indexing and hashing

Query Processing
Parsing and Translation

Optimization

Evaluation

How is a query processed?

Alternative ways of evaluating a given query

Equivalent expressions

Different algorithms for each operation

Cost difference between a good and a bad way of evaluating a query can be enormous

Need to estimate the cost of operations

Depends critically on statistical information about relations which the database must maintain

Need to estimate statistics for intermediate results to compute cost of complex expressions
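
The cost gap between equivalent evaluation strategies can be sketched in plain Python. Both functions below compute the same result (a selection combined with a join), but one filters before joining. The data, the attribute names and the "work" counter (number of tuples or tuple pairs examined) are assumptions made for illustration; a real optimizer uses far richer cost models.

```python
def join_then_select(r, s, predicate):
    """Examine every pair of tuples, then apply the selection."""
    work, out = 0, []
    for a in r:
        for b in s:
            work += 1
            if a["dept"] == b["dept"] and predicate(a):
                out.append((a["id"], b["building"]))
    return out, work

def select_then_join(r, s, predicate):
    """Apply the selection first, then join only the surviving tuples."""
    work, kept = 0, []
    for a in r:
        work += 1
        if predicate(a):
            kept.append(a)
    out = []
    for a in kept:
        for b in s:
            work += 1
            if a["dept"] == b["dept"]:
                out.append((a["id"], b["building"]))
    return out, work

r = [{"id": i, "dept": i, "salary": 1000 * i} for i in range(100)]
s = [{"dept": d, "building": f"B{d}"} for d in range(100)]
high_paid = lambda a: a["salary"] >= 90000  # keeps 10 of the 100 tuples in r

result_a, work_a = join_then_select(r, s, high_paid)
result_b, work_b = select_then_join(r, s, high_paid)
print(work_a, work_b)
```

Knowing that the selection keeps only a small fraction of r is exactly the kind of statistical information the optimizer relies on.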

Transaction Management
What if the system fails?

What if more than one user is concurrently updating the same file?

A transaction is a collection of operations that performs a single logical function in a database application

Transaction-Management component ensures that the database remains in a consistent (correct) state despite
system failures (eg: power failures and operating system crashes) and transaction failures

Concurrency-control manager controls the interaction among the concurrent transactions to ensure consistency of
the database
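
Why concurrent transactions need control can be sketched with a deterministic simulation of two withdrawals. This is not real concurrency and not how a concurrency-control manager is implemented; the function names are hypothetical and the interleaving is written out by hand to show the lost update.

```python
def interleaved_withdrawals(balance, amount):
    """Both users read the balance before either one writes it back."""
    seen_by_a = balance           # user A reads
    seen_by_b = balance           # user B reads the same, soon stale, value
    balance = seen_by_a - amount  # A writes
    balance = seen_by_b - amount  # B overwrites A's update (lost update)
    return balance

def serial_withdrawals(balance, amount):
    """One withdrawal runs to completion before the next one starts."""
    for _ in range(2):
        current = balance
        balance = current - amount
    return balance

uncontrolled = interleaved_withdrawals(100, 50)
controlled = serial_withdrawals(100, 50)
print(uncontrolled, controlled)
```

In the uncontrolled case 100 is withdrawn in total but the balance only drops by 50, which is the kind of inconsistency that concurrency control is meant to prevent.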

Database Architecture
The architecture of a database system is greatly influenced by the underlying computer system on which the database is
running:

Centralized

Client-Server

Parallel (multi-processor)

Distributed

Cloud

📚
Week 2 Lecture 1
Class BSCCS2001

Created @August 21, 2021 12:32 PM

Materials

Module # 6

Type Lecture

Week # 2

Introduction to Relational Model


Attribute Types
Consider the relation

Student = Roll #, First Name, Last Name, DoB, Passport #, Aadhaar #, Department

The set of allowed values for each attribute is called the domain of the attribute

Roll # - Alphanumeric string

First Name, Last Name - Alpha string

DoB - Date

Passport # - String (Letter followed by 7 digits) - nullable (Optional)

Aadhaar # - 12-digit number

Department - Alpha string

Attribute values are (normally) required to be atomic; that is, indivisible

The special value null is a member of every domain. Indicates that the value is unknown

The null value may cause complications in the definition of many operations
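
A rough sketch of attribute domains and of one complication caused by null. The two checker functions are hypothetical helpers, the uppercase-letter pattern for Passport # is an assumption, and treating Aadhaar # as not nullable is also an assumption of this example. The null comparison is run through Python's built-in sqlite3 module.

```python
import re
import sqlite3

def in_passport_domain(value):
    """A letter followed by 7 digits; None (null) is allowed since the field is optional."""
    return value is None or re.fullmatch(r"[A-Z][0-9]{7}", value) is not None

def in_aadhaar_domain(value):
    """Exactly 12 digits; this sketch assumes the attribute must not be null."""
    return value is not None and re.fullmatch(r"[0-9]{12}", value) is not None

# In SQL, comparing null with anything, even another null, gives unknown (null), not true
conn = sqlite3.connect(":memory:")
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(in_passport_domain("L4032464"), in_passport_domain(None), null_equals_null)
```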

Roll # | First Name | Last Name | DoB | Passport | Aadhaar | Dept.
15CS10026 | Lalit | Dubey | 27-Mar-1997 | L4032464 | 172861749239 | Computer
16EE30029 | Jatin | Chopra | 17-Nov-1996 | null | 391718363816 | Electrical

Relational Schema and Instance


$A_1, A_2, \ldots, A_n$ are the attributes

$R = (A_1, A_2, \ldots, A_n)$ is a relation schema

Example: instructor = (ID, name, dept_name, salary)

Formally, given sets $D_1, D_2, \ldots, D_n$, a relation $r$ is a subset of

$D_1 \times D_2 \times \cdots \times D_n$

Thus, a relation is a set of n-tuples $(a_1, a_2, \ldots, a_n)$ where each $a_i \in D_i$

The current values (relation instance) of a relation are specified by a table

An element $t$ of $r$ is a tuple, represented by a row in a table

Example

instructor ≡ (String(5) × String × String × Number+), where ID ∈ String(5), name ∈ String, dept_name ∈ String and
salary ∈ Number+
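
The formal definition can be checked on a toy example: build the Cartesian product of two small domains and confirm that a relation is just a subset of it. The domains and tuples below are made up for illustration.

```python
from itertools import product

ids = {"I1", "I2"}            # domain D1 (hypothetical)
depts = {"Physics", "Music"}  # domain D2 (hypothetical)

# D1 x D2: every possible pairing of an ID with a department
all_tuples = set(product(ids, depts))

r = {("I1", "Physics"), ("I2", "Music")}  # a relation instance
is_relation = r <= all_tuples             # subset test

# A tuple with a value outside its domain cannot belong to any relation on D1 x D2
outside = {("I1", "History")} <= all_tuples
print(len(all_tuples), is_relation, outside)
```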

Keys
Let K ⊆ R, where R is the set of attributes in the relation
K is a superkey of R if values of K are sufficient to identify a unique tuple of each possible relation r(R)
Example: {ID} and {ID, name} are both superkeys of instructor

Superkey K is a candidate key if K is minimal

Example: {ID} is a candidate key for instructor

One of the candidate keys is selected to be the primary key

A surrogate key (or synthetic key) in a database is a unique identifier for either an entity in the modeled world or an
object in the database

The surrogate key is not derived from application data, unlike a natural (or business) key which is derived from
application data
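
The superkey and candidate key definitions can be sketched as checks over a table instance. The helper names are hypothetical and the rows are a small subset of the student data used in these notes. A single instance can only show that a set of attributes is not a superkey; the real definition quantifies over every possible relation r(R).

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, attrs):
    """A superkey is a candidate key if no proper, non-empty subset is also a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

students = [
    {"roll": "15CS10026", "dob": "27-Mar-1997", "dept": "Computer"},
    {"roll": "16EE30029", "dob": "17-Nov-1996", "dept": "Electrical"},
    {"roll": "15CS30021", "dob": "10-Jan-1997", "dept": "Computer"},
]

print(is_superkey(students, ("roll", "dob")), is_candidate_key(students, ("roll", "dob")))
```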

Keys: Examples
Students = Roll #, First Name, Last Name, DoB, Passport #, Aadhaar #, Department

Super Key: Roll #, {Roll #, DoB}

Candidate Keys: Roll #, {First Name, Last Name}, Aadhaar #

Passport # cannot be a key because it is an optional field and can take null values, but an ID can never be null

Primary Key: Roll #

Can Aadhaar # be a key?

It may suffice for unique identification, but Roll # may have additional useful information.

For example: 14CS92P01

Read it as 14-CS-92-P-01

14 - Admission in 2014

CS - Department: Computer Science

92 - Category of the Student

P - Type of admission: Project

01 - Serial Number

Secondary / Alternate Key: {First Name, Last Name}, Aadhaar #

Simple Key: Consists of a single attribute

Composite Key: {First Name, Last Name}

Consists of more than one attribute to uniquely identify an entity occurrence

One or more of the attributes which make up the key are not simple keys in their own right

Roll # | First Name | Last Name | DoB | Passport | Aadhaar | Dept
15CS10026 | Lalit | Dubey | 27-Mar-1997 | L4032464 | 172861749239 | Computer
16EE30029 | Jatin | Chopra | 17-Nov-1996 | null | 391718363816 | Electrical
15EC10016 | Smriti | Mongra | 23-Dec-1996 | G5432849 | 204592710914 | Electronics
16CE10038 | Dipti | Dutta | 02-Feb-1997 | null | 571919482918 | Civil
15CS30021 | Ramdin | Minz | 10-Jan-1997 | X8811623 | 492849275924 | Computer

Foreign key constraint: Value in one relation must appear in another (in other words, when a particular attribute is a
key in a different table)

Referencing relation

Enrolment: Foreign Keys - Roll #, Course #

Referenced relation

Students, Courses

A compound key consists of more than one attribute to uniquely identify an entity occurrence

Each attribute, which makes up the key, is a simple key in its own right

{Roll #, Course #}
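
A foreign key constraint with a compound primary key can be sketched using Python's built-in sqlite3 module. The table and column names, the course row and the unknown roll number are assumptions made for this example. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Referenced relations
conn.execute("CREATE TABLE students (roll_no TEXT PRIMARY KEY, first_name TEXT)")
conn.execute("CREATE TABLE courses (course_no TEXT PRIMARY KEY, title TEXT)")
# Referencing relation: each attribute of the compound key is also a foreign key
conn.execute("""
    CREATE TABLE enrolment (
        roll_no   TEXT REFERENCES students(roll_no),
        course_no TEXT REFERENCES courses(course_no),
        PRIMARY KEY (roll_no, course_no)
    )
""")

conn.execute("INSERT INTO students VALUES ('15CS10026', 'Lalit')")
conn.execute("INSERT INTO courses VALUES ('C1', 'DBMS')")       # hypothetical course
conn.execute("INSERT INTO enrolment VALUES ('15CS10026', 'C1')")

try:
    # This roll number does not appear in students, so the insert should fail
    conn.execute("INSERT INTO enrolment VALUES ('99XX00000', 'C1')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

enrolments = conn.execute("SELECT roll_no, course_no FROM enrolment").fetchall()
print(dangling_rejected, enrolments)
```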

Schema Diagram for University Database

Relational Query Languages


Procedural vis-à-vis Non-procedural or Declarative Paradigms

Procedural programming requires that the programmer tell the computer what to do

That is, how to get the output for the range of required inputs

The programmer must know an appropriate algorithm

Declarative programming requires a more descriptive style

The programmer must know what relationships hold between various entities
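
The two paradigms can be contrasted on one small task: finding Physics instructors who earn more than 80000. The data is made up; the procedural version spells out how to walk the records, while the declarative version (SQL through Python's built-in sqlite3 module) only states what is wanted.

```python
import sqlite3

rows = [("I1", "Physics", 90000), ("I2", "Music", 40000), ("I3", "Physics", 70000)]

# Procedural: the programmer supplies the algorithm (loop, test, collect)
names_procedural = []
for ident, dept, salary in rows:
    if dept == "Physics" and salary > 80000:
        names_procedural.append(ident)

# Declarative: the programmer describes the result; the system decides how to get it
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, dept_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", rows)
names_declarative = [row[0] for row in conn.execute(
    "SELECT ID FROM instructor WHERE dept_name = 'Physics' AND salary > 80000")]

print(names_procedural, names_declarative)
```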

Relational Query Language: Example

"Pure" languages:

Relational Algebra

Tuple relational calculus

Domain relational calculus

The above 3 pure languages are equivalent in computing power

We will concentrate on relational algebra

Not Turing-machine equivalent

Not all algorithms can be expressed in Relational Algebra

Consists of 6 basic operations

📚
Week 2 Lecture 2
Class BSCCS2001

Created @August 22, 2021 6:57 PM

Materials https://www.caam.rice.edu/~heinken/latex/symbols.pdf

Module # 7

Type Lecture

Week # 2

Introduction to Relational Model (part 2)


Relational Operators
Basic properties of relations
A relation is a set. Hence,

Ordering of rows / tuples is inconsequential

All rows / tuples must be distinct

Select operation - selection of rows (tuples)


Relation r on the following table

The select operation is defined as

And it returns the following table as a result

Project operation - selection of columns (Attributes)


Relation r

The projection operation is defined as

And it returns the following table as a result
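
Select and project can be sketched in Python by modelling a relation as a set of tuples plus a tuple of attribute names. The helper functions and the sample relation are assumptions made for illustration and are not the tables from the lecture slides.

```python
def select(rows, predicate):
    """Select: keep the rows (tuples) that satisfy the predicate."""
    return {row for row in rows if predicate(row)}

def project(attrs, rows, keep):
    """Project: keep only the named columns; duplicates vanish because the result is a set."""
    positions = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in positions) for row in rows}

attrs = ("A", "B", "C", "D")
r = {("a", "a", 1, 7), ("a", "b", 5, 7), ("b", "b", 12, 3), ("b", "b", 23, 10)}

# Rows where A = B and D > 5
selected = select(r, lambda t: t[0] == t[1] and t[3] > 5)
# Columns A and D only
projected = project(attrs, r, ("A", "D"))
print(selected, projected)
```

The projection returns three tuples from four rows, since two rows agree on A and D.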

Union of two relations


Relation r, s

The union of two relation is defined as

And it returns the following result

Set difference of two relations


Relation r, s

The set difference of two relations is defined as

And it returns the following result
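
Because relations are sets, union and set difference map directly onto Python's set operators. The sample relations and the arity check are assumptions for this sketch; a full compatibility check would also compare attribute domains.

```python
def check_compatible(r, s):
    """Union and difference only make sense when all tuples have the same arity."""
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("relations must have the same number of attributes")

r = {("a", 1), ("a", 2), ("b", 1)}
s = {("a", 2), ("b", 3)}
check_compatible(r, s)

r_union_s = r | s  # tuples in r or s, with duplicates removed
r_minus_s = r - s  # tuples in r that are not in s
print(r_union_s, r_minus_s)
```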

Joining two relations - Cartesian-product
Relation r, s

The cartesian product is defined as

And it returns the following result
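The operations described so far can be sketched with plain Python sets, since a relation is a set of tuples. The relations r and s below, and their schema (A, B), are assumed sample data chosen only for illustration.

```python
# Hypothetical relations with schema (A, B); a relation is a set of tuples
r = {("a", 1), ("a", 2), ("b", 1)}
s = {("a", 2), ("b", 3)}

select_r = {t for t in r if t[1] == 1}           # select: keep the rows where B = 1
project_r = {(t[0],) for t in r}                 # project: keep column A, duplicates collapse
union_rs = r | s                                 # union of r and s
diff_rs = r - s                                  # set difference: in r but not in s
product_rs = {tr + ts for tr in r for ts in s}   # Cartesian product: concatenate every pair

print(select_r, project_r, union_rs, diff_rs, len(product_rs))
```

Note that the projection has only two tuples even though r has three rows, because a set cannot hold duplicates.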

Cartesian-product - Naming issue

Renaming a Table
Allows us to refer to a relation, say E, by more than one name

ρ_X(E) returns the expression E under the name X

Relations r

Self product

Composition of Operations
Can build expressions using multiple operations

Example:

r × s

Joining two relations - Natural Join
Let r and s be relations on schemas R and S respectively. Then, the "natural join" of relations r and s is a relation
on schema R ∪ S

Consider each pair of tuples tr from r and ts from s

If tr and ts have the same value on each of the attributes in R ∩ S , add a tuple t to the result, where

t has the same value as tr on the attributes in R

t has the same value as ts on the attributes in S

Natural join example


Relations r, s:

Natural join
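The definition of natural join can be sketched directly in Python. This is only an illustration of the definition, not an efficient algorithm; the relations r(A, B) and s(B, C) and the helper name natural_join are assumptions made for the example.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts (attribute -> value)."""
    result = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()            # attributes in R ∩ S
            if all(tr[a] == ts[a] for a in common):   # same value on every common attribute
                t = {**tr, **ts}                      # tuple on schema R ∪ S
                if t not in result:                   # a relation is a set: no duplicates
                    result.append(t)
    return result

# Hypothetical relations r(A, B) and s(B, C)
r = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]
s = [{"B": "x", "C": 10}, {"B": "x", "C": 20}, {"B": "z", "C": 30}]

joined = natural_join(r, s)
print(joined)
```

Only the r tuple with B = "x" finds partners in s, so the result has two tuples on schema (A, B, C).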

Aggregation Operators
Can we compute:

SUM

AVG

MAX

MIN

Notes about Relational Languages


Each query input is a table (or a set of tables)

Each query output is a table

All data in the output table appears in one of the input tables

Relational Algebra is not Turing complete

📚
Week 2 Lecture 3
Class BSCCS2001

Created @August 22, 2021 8:38 PM

Materials

Module # 8

Type Lecture

Week # 2

Introduction to Structured Query Language (SQL)


History of SQL
IBM developed Structured English Query Language (SEQUEL) as a part of System R project.

Renamed Structured Query Language (SQL: still pronounced as SEQUEL)

ANSI and ISO standard SQL:

Name        Description

SQL-86      First formalized by ANSI

SQL-89      + Integrity Constraints

SQL-92      Major revision (ISO/IEC 9075 standard), De-facto Industry Standard

SQL:1999    + Regular Expression Matching, Recursive Queries, Triggers, Support for Procedural and Control Flow Statements, Non-scalar types (Arrays) and some OO features (structured types), Embedding SQL in Java (SQL/OLB) and Embedding Java in SQL (SQL/JRT)

SQL:2003    + XML features (SQL/XML), Window functions, Standardized sequences and columns with auto-generated values (identity columns)

SQL:2006    + Way of importing and storing XML data in a SQL database, manipulating it within the database, and publishing both XML and conventional SQL-data in XML form

SQL:2008    Legalizes ORDER BY outside Cursor Definitions + INSTEAD OF Triggers, TRUNCATE statements and FETCH clause

SQL:2011    + Temporal data (PERIOD FOR), Enhancements for Window functions and FETCH clause

SQL:2016    + Row Pattern Matching, Polymorphic Table Functions and JSON

SQL:2019    + Multidimensional Arrays (MDarray type and operators)

Compliance
SQL is the de facto industry standard today for relational or structured data systems

Commercial as well as open systems may be fully or partially compliant with one or more standards from SQL-92
onward

Not all examples here may work on your particular system. Check your system's SQL docs.

Alternatives
There aren't any alternatives to SQL for speaking to relational databases (i.e. SQL as a protocol)

There are alternatives to writing SQL in the applications

These alternatives have been implemented in the form of front-ends for working with relational databases. Some
examples of a front-end include (for a selection of languages):

SchemeQL and CLSQL

Probably the most flexible, thanks to their Lisp heritage

They also look a lot more like SQL than other front-ends

LINQ (in .NET)

ScalaQL and ScalaQuery (in Scala)

SqlStatement, ActiveRecord and many others in Ruby

HaskellDB

... the list goes on for many other languages

Derivatives
There are several query languages that are derived from or inspired by SQL.

Out of these, the most popular and effective is SPARQL.

SPARQL (pronounced sparkle, a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF
query language

A semantic query language for databases - able to retrieve and manipulate data stored in Resource Description
Framework (RDF) format.

It has been standardized by the W3C Consortium as key technology of the semantic web

Versions

SPARQL 1.0 (Jan. 2008)

SPARQL 1.1 (Mar. 2013)

Used as the query language for several NoSQL systems, particularly the graph databases that use RDF as
their store

Data Definition Language (DDL)


The SQL data-definition language (DDL) allows the specification of information about relations, including:

The Schema for each Relation

The Domain of values associated with each Attribute

Integrity Constraints

And, as we will see later, also other information such as ...

The set of Indices to be maintained for each relation

Security and Authorization information for each relation

The Physical Storage Structure of each relation on disk

Domain types (or Data types) in SQL


char(n) - Fixed length character string, with user-specified length n

varchar(n) - Variable length character strings, with user-specified max length n

int - Integer (a finite subset of the integers that is machine-dependent)

smallint - Small integer (a machine-dependent subset of the integer domain type)

numeric(p, d) - Fixed point number, with user-specified precision of p digits, with d digits to the right of decimal point.
(ex. numeric(3, 1) allows 44.5 to be stored exactly, but not 444.5 or 0.32)

real, double precision - Floating point and double-precision floating point numbers, with machine-dependent
precision

float(n) - Floating point number with user-specified precision of at least n digits

Schema diagram for a University database

Create Table construct


An SQL relation is defined using the create table command:

create table r (A1 D1, A2 D2, ..., An Dn,
(integrity-constraint1),
...,
(integrity-constraintk));

r is the name of the relation (table)

each Ai is an attribute name in the schema of relation r

Di is the data type of values in the domain of attribute Ai

Example

create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8, 2));

University DB: instructor (ID, name, dept_name, salary)

Create Table constructs: Integrity constraints


not null

primary key (A1 , ..., An )

foreign key (Am , ..., An ) references r

create table instructor (


ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8, 2));

create table instructor (


ID char(5),
name varchar(20) not null,
dept_name varchar(20),
salary numeric(8, 2),
primary key (ID),
foreign key (dept_name) references department);

primary key declaration on an attribute automatically ensures not null
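The effect of these constraints can be observed with a small sketch in Python's sqlite3 module. The one-column department table, the inserted rows and the helper name rejected are assumptions for illustration. SQLite checks foreign keys only after "pragma foreign_keys = on", and the referenced column is named explicitly here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces foreign keys only when enabled
conn.execute("create table department (dept_name varchar(20), primary key (dept_name))")
conn.execute("""
    create table instructor (
        ID        char(5),
        name      varchar(20) not null,
        dept_name varchar(20),
        salary    numeric(8, 2),
        primary key (ID),
        foreign key (dept_name) references department (dept_name))
""")
conn.execute("insert into department values ('Biology')")
conn.execute("insert into instructor values ('10211', 'Smith', 'Biology', 66000)")

def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

null_name = rejected("insert into instructor values ('10212', null, 'Biology', 50000)")    # not null
bad_dept = rejected("insert into instructor values ('10213', 'Jones', 'Astrology', 50000)")  # foreign key
dup_id = rejected("insert into instructor values ('10211', 'Lee', 'Biology', 50000)")      # primary key
print(null_name, bad_dept, dup_id)
```

Each of the three inserts breaks a different constraint, so the database refuses all of them and the table keeps its single valid row.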

Create Table construct: More relations

create table student (


ID varchar(5),
name varchar(20) not null,
dept_name varchar(20),
tot_cred numeric(3, 0),
primary key (ID),
foreign key (dept_name) references department);

create table course (


course_id varchar(8),
title varchar(50),
dept_name varchar(20),
credits numeric(2, 0),
primary key (course_id),
foreign key (dept_name) references department);

create table takes (


ID varchar(5),
course_id varchar(8),
sec_id varchar(8),
semester varchar(6),
year numeric(4, 0),
grade varchar(2),
primary key (ID, course_id, sec_id, semester, year),
foreign key (course_id, sec_id, semester, year) references section);

NOTE: sec_id can be dropped from primary key above to ensure a student cannot register for two sections of the
same course in the same semester

Update Tables
Insert (DML command)

insert into instructor values ('10211', 'Smith', 'Biology', 66000);

Delete (DML command)

Remove all tuples from the student relation

delete from student

Drop Table (DDL command)

drop table r

Alter (DDL command) # to edit the schema

alter table r add A D

Where A is the name of the attribute to be added to relation r and D is the domain of A

All existing tuples in the relation are assigned null as the value for the new attribute

alter table r drop A

Where A is the name of the attribute of relation r

Dropping of attributes not supported by many databases
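A short sketch of these commands using Python's sqlite3 module follows. The two-column student table and its single row are assumed sample data, and the statement is written as "add column", which is the spelling SQLite documents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student (ID varchar(5), name varchar(20))")
conn.execute("insert into student values ('00128', 'Zhang')")   # DML: insert

# DDL: add an attribute; the existing tuple gets null for it
conn.execute("alter table student add column tot_cred numeric(3, 0)")
after_alter = conn.execute("select * from student").fetchall()

# DML: delete removes all tuples but keeps the (now empty) relation
conn.execute("delete from student")
remaining = conn.execute("select count(*) from student").fetchone()[0]

# DDL: drop removes the relation itself
conn.execute("drop table student")
print(after_alter, remaining)
```

The difference between delete and drop shows up at the end: after delete the table can still be queried, after drop it no longer exists.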

Data Manipulation Language (DML): Query Structure


Basic query structure
A typical SQL query has the form:

select A1, A2, ..., An
from r1, r2, ..., rm
where P

Ai represents an attribute


ri represents a relation
P is a predicate

The result of an SQL query is a relation

SELECT clause
The select clause lists the attributes desired in the result of a query

Corresponds to the projection operation of relational algebra

Example: find the names of all instructors

select name from instructor

NOTE: SQL names are case insensitive

Name = NAME = name

Some people prefer to use UPPER CASE wherever we use the bold font

SQL allows duplicates in relations as well as in query results

To force the elimination of duplicates, insert the keyword distinct after select

Find the department names of all instructors and remove duplicates

select distinct dept_name


from instructor

The keyword all specifies that duplicates should not be removed

select all dept_name


from instructor

An asterisk (*) in the select denotes all attributes

select *
from instructor

An attribute can be a literal with no from clause

select '437'

Result is a table with one column and a single row with the value '437'

Can give the column a name using:

select '437' as FOO

An attribute can be a literal with from clause

select 'A'
from instructor

Result is a table with one column and N rows (number of tuples in the instructor table), each row with value 'A'

The select clause can contain arithmetic expressions involving the operation +, -, * and / and operating on constants or
attributes of tuples

The query:

select ID, name, salary/12


from instructor

Would return a relation that is the same as the instructor relation, except that the value of the attribute salary is
divided by 12
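The select-clause variants above can be tried out with Python's sqlite3 module. This is a sketch with three assumed rows; the salaries are chosen as multiples of 12 so the division comes out exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID char(5), name varchar(20), "
             "dept_name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?, ?, ?)", [
    ("22222", "Einstein", "Physics", 96000),   # hypothetical salaries
    ("33456", "Gold", "Physics", 84000),
    ("15151", "Mozart", "Music", 48000),
])

with_dups = conn.execute("select all dept_name from instructor").fetchall()
no_dups = conn.execute("select distinct dept_name from instructor").fetchall()
literal = conn.execute("select '437' as FOO").fetchall()
monthly = {name: m for _, name, m in conn.execute("select ID, name, salary/12 from instructor")}

print(len(with_dups), len(no_dups), literal, monthly)
```

Physics appears twice with all and once with distinct, and the literal query returns a single row even though no table is named.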

Can rename "salary/12" using the as clause:

select ID, name, salary/12 as monthly_salary
from instructor

WHERE clause
The where clause specifies conditions that the result must satisfy

Corresponds to the selection predicate of the relational algebra

To find all instructors in the Computer Science department

select name
from instructor
where dept_name = 'Comp. Sci.'

Comparison results can be combined using the logical connectives and, or, not

To find all instructors in Comp. Sci. department with salary > 80000

select name
from instructor
where dept_name = 'Comp. Sci.' and salary > 80000

Comparisons can be applied to results of arithmetic expressions

FROM clause
The from clause lists the relations involved in the query

Corresponds to the Cartesian product operation of the relational algebra

Find the Cartesian product instructor × teaches

select *
from instructor, teaches

Generates every possible instructor-teaches pair with all attributes from both relations

For common attributes (for eg: ID), the attributes in the resulting table are renamed using the relation name (for
eg: instructor.ID)

Cartesian product is not very useful directly, but useful when combined with the where-clause condition (selection
operation in relational algebra)
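A small sketch with Python's sqlite3 module shows both points: the product contains every pairing, and a where condition cuts it down to the meaningful ones. The tables are reduced to two columns each and hold a handful of sample rows, which is an assumption made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID char(5), name varchar(20))")
conn.execute("create table teaches (ID char(5), course_id varchar(8))")
conn.executemany("insert into instructor values (?, ?)",
                 [("10101", "Srinivasan"), ("12121", "Wu"), ("15151", "Mozart")])
conn.executemany("insert into teaches values (?, ?)",
                 [("10101", "CS-101"), ("10101", "CS-315"), ("12121", "FIN-201")])

# Cartesian product: every instructor paired with every teaches tuple
product = conn.execute("select * from instructor, teaches").fetchall()

# Product plus selection: keep only the pairs that refer to the same instructor
matched = conn.execute(
    "select name, course_id from instructor, teaches where instructor.ID = teaches.ID"
).fetchall()

print(len(product), sorted(matched))
```

With 3 instructors and 3 teaches tuples the product has 9 rows, of which only 3 survive the where condition.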

Cartesian product

📚
Week 2 Lecture 4
Class BSCCS2001

Created @September 3, 2021 11:26 AM

Materials

Module # 9

Type Lecture

Week # 2

Introduction to Structured Query Language (SQL) (part 2)


Cartesian product (cont. from the previous lecture's end)
Example

Find the names of all instructors who have taught some courses and the course_id

select name, course_id


from instructor, teaches
where instructor.ID = teaches.ID

Equi-Join, Natural Join

Here in this table, we do not have the names of the courses

If we want the name, we will again have to do a similar join operation with a table that has the names of the
courses

This operation is known as a Natural Join

Example

Find the names of all the instructors in the Art dept. who have taught some courses and the course_id

select name, course_id


from instructor, teaches
where instructor.ID = teaches.ID and instructor.dept_name = 'Art'

Rename AS operation
The SQL allows renaming relations and attributes using the as clause:

old_name as new_name

Find the names of all the instructors who have a higher salary than some instructor in 'Comp. Sci.'

select distinct T.name


from instructor as T, instructor as S
where T.salary > S.salary and S.dept_name = 'Comp. Sci.'

The keyword as is optional and may be omitted

instructor as T ≡ instructor T
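The self-join above can be checked with a few rows in Python's sqlite3 module. This is a sketch on an assumed four-row instructor table; it also confirms that leaving out the keyword as gives the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), dept_name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?, ?)", [
    ("Srinivasan", "Comp. Sci.", 65000),
    ("Katz", "Comp. Sci.", 75000),
    ("Mozart", "Music", 40000),
    ("Einstein", "Physics", 95000),
])

# T and S are two names for the same relation, so each instructor is compared with every other
with_as = conn.execute("""
    select distinct T.name
    from instructor as T, instructor as S
    where T.salary > S.salary and S.dept_name = 'Comp. Sci.'
""").fetchall()

without_as = conn.execute("""
    select distinct T.name
    from instructor T, instructor S
    where T.salary > S.salary and S.dept_name = 'Comp. Sci.'
""").fetchall()

print(sorted(with_as))
```

Katz qualifies because 75000 exceeds Srinivasan's 65000, which is enough: the condition asks for some Comp. Sci. instructor, not all of them.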

String Operations
SQL includes a string-matching operator for comparisons on character strings.

The operator like uses patterns that are described using two special characters:

percent (%)
The % character matches any sub-string

underscore ( _ )

The _ character matches any single character

Find the names of all instructors whose name includes the sub-string "dar"

select name
from instructor
where name like '%dar%'

Match the string "100%"

like '100\%' escape '\'

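In the above example, we use the backslash ( \ ) as the escape character

'%dar%' could match Darwin, Majumdar, Sardar or Uddarin

'%dar___' (dar followed by 3 underscores) would match Darwin, but not the others

Patterns are case sensitive in standard SQL, so Darwin matches the two patterns above only on systems where like compares case-insensitively (for example, MySQL with its default collation)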

Pattern matching example

'Intro%' matches any string beginning with "Intro"

'%Comp%' matches any string containing "Comp" as a substring

'___' (3 underscores) matches any string of exactly 3 characters

'___%' (3 underscores and then a %) matches any string of at least 3 characters
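The patterns above can be exercised with Python's sqlite3 module. This is a sketch: the five title values (including the odd title '100%') and the helper name matches are assumptions for illustration. SQLite's like is case-insensitive for ASCII letters by default, so the example sticks to patterns where case does not matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table course (title varchar(50))")
conn.executemany("insert into course values (?)",
                 [("Intro. to Biology",), ("Genetics",), ("100%",), ("100 percent",), ("abc",)])

def matches(condition):
    """Return the set of titles that satisfy the given like condition."""
    return {row[0] for row in conn.execute("select title from course where title " + condition)}

prefix = matches("like 'Intro%'")                 # begins with "Intro"
exactly_three = matches("like '___'")             # exactly 3 characters
at_least_three = matches("like '___%'")           # at least 3 characters
unescaped = matches("like '100%'")                # % is a wildcard here
escaped = matches(r"like '100\%' escape '\'")     # \% is a literal percent sign

print(prefix, exactly_three, len(at_least_three), unescaped, escaped)
```

The last two calls show why the escape clause matters: without it, '100%' also matches '100 percent'.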

SQL supports a variety of string operations such as

Concatenation (using "||") [double pipe symbol]

Converting from upper to lower case (and vice-versa)

Finding the string length, extracting substrings, etc...

Ordering the display of tuples (ORDER BY clause)


List in alphabetic order the names of all the instructors

select distinct name


from instructor
order by name

We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the
default

Example: order by name desc

Can sort on multiple attributes

Example: order by dept_name, name
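The ordering options can be compared side by side with Python's sqlite3 module. This is a sketch on three assumed rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), dept_name varchar(20))")
conn.executemany("insert into instructor values (?, ?)",
                 [("Zhang", "Comp. Sci."), ("Brandt", "History"), ("Brown", "Comp. Sci.")])

ascending = [r[0] for r in conn.execute("select name from instructor order by name")]
descending = [r[0] for r in conn.execute("select name from instructor order by name desc")]
# Sort by department first; name only breaks ties within a department
by_dept = conn.execute("select dept_name, name from instructor order by dept_name, name").fetchall()

print(ascending, descending, by_dept)
```

In the last query Brown comes before Zhang only because both are in Comp. Sci.; the department decides the primary order.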

Selecting number of tuples in output


The Select Top clause is used to specify the number of records to return

The Select Top clause is useful on large tables with thousands of records.

Returning a large number of records can impact performance

select distinct top 10 name


from instructor

Not all database systems support the SELECT TOP clause.

SQL Server & MS Access support select top

MySQL supports the limit clause

Oracle uses fetch first n rows only and rownum

select distinct name


from instructor
order by name
fetch first 10 rows only

WHERE clause predicates


SQL includes a between comparison operator

Example: Find the names of all the instructors with salary between $90,000 and $100,000

(that is, ≥ $90,000 and ≤ $100,000)

select name
from instructor
where salary between 90000 and 100000

Tuple comparison

select name, course_id


from instructor, teaches
where (instructor.ID, dept_name) = (teaches.ID, 'Biology');

IN operator
The in operator allows you to specify multiple values in a where clause

The in operator is a shorthand for multiple or conditions

select name
from instructor
where dept_name in ('Comp. Sci.', 'Biology')
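A brief sketch with Python's sqlite3 module confirms two details: between includes both end points, and in gives the same rows as the spelled-out or conditions. The five instructor rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), dept_name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?, ?)", [
    ("Wu", "Finance", 90000),
    ("Einstein", "Physics", 95000),
    ("Brandt", "Comp. Sci.", 92000),
    ("Kim", "Elec. Eng.", 80000),
    ("Crick", "Biology", 72000),
])

def names(where):
    """Return the sorted instructor names that satisfy the given where condition."""
    return sorted(r[0] for r in conn.execute("select name from instructor where " + where))

in_range = names("salary between 90000 and 100000")            # Wu at exactly 90000 is included
with_in = names("dept_name in ('Comp. Sci.', 'Biology')")
with_or = names("dept_name = 'Comp. Sci.' or dept_name = 'Biology'")

print(in_range, with_in)
```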

Duplicates
In relations with duplicates, SQL can define how many copies of tuples appear in the result

Multiset versions of some of the relational algebra operators - given multiset relations r1 and r2 :

a) SELECT σθ(r1): If there are c1 copies of tuple t1 in r1 and t1 satisfies selection σθ, then there are c1 copies of
t1 in σθ(r1)
b) PROJECTION ΠA(r1): For each copy of tuple t1 in r1, there is a copy of tuple ΠA(t1) in ΠA(r1), where ΠA(t1)
denotes the projection of the single tuple t1
c) CARTESIAN PRODUCT r1 × r2: If there are c1 copies of tuple t1 in r1 and c2 copies of tuple t2 in r2, there are c1 × c2 copies of the
tuple t1 ⋅ t2 in r1 × r2

Example: Suppose multiset relations r1 (A, B) and r2 (C) are as follows:

r1 = {(1, a), (2, a)} ; r2 = {(2), (3), (3)}


Then ΠB (r1 ) would be {(a), (a)} while ΠB (r1 ) × r2 would be
{(a, 2), (a, 2), (a, 3), (a, 3), (a, 3), (a, 3)}
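The example can be reproduced with Python's sqlite3 module, since SQL tables are multisets by default. This sketch loads r1 and r2 exactly as given above and counts the copies in the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table r1 (A integer, B text)")
conn.execute("create table r2 (C integer)")
conn.executemany("insert into r1 values (?, ?)", [(1, "a"), (2, "a")])
conn.executemany("insert into r2 values (?)", [(2,), (3,), (3,)])

# Projection keeps one output tuple per input tuple, so (a) appears twice
projection = conn.execute("select B from r1").fetchall()

# Product of the projection with r2: 2 copies of (a) times 3 tuples in r2
product = sorted(conn.execute("select B, C from r1, r2").fetchall())

print(projection, product)
```

The product has 2 copies of (a, 2) and 4 copies of (a, 3), which matches the c1 × c2 rule.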
SQL duplicate semantics:

select A1, A2, ..., An
from r1, r2, ..., rm
where P

is equivalent to the multiset version of the expression:

Π_{A1, A2, ..., An}(σ_P(r1 × r2 × ... × rm))

📚
Week 2 Lecture 5
Class BSCCS2001

Created @September 4, 2021 6:05 PM

Materials

Module # 10

Type Lecture

Week # 2

Introduction to Structured Query Language (SQL) (part 3)


Set operations
Example

Find the courses that ran in Fall 2009 or in Spring 2010

(select course_id from section where semester = 'Fall' and year = 2009)
union
(select course_id from section where semester = 'Spring' and year = 2010)

Find the courses that ran in Fall 2009 and in Spring 2010

(select course_id from section where semester = 'Fall' and year = 2009)
intersect
(select course_id from section where semester = 'Spring' and year = 2010)

Find the courses that ran in Fall 2009 but not in Spring 2010

(select course_id from section where semester = 'Fall' and year = 2009)
except
(select course_id from section where semester = 'Spring' and year = 2010)

Find the salaries of all the instructors that are less than the largest salary

select distinct T.salary
from instructor as T, instructor as S
where T.salary < S.salary

Find the salaries of all the instructors

select distinct salary


from instructor

Find the largest salary of all the instructors

(select distinct salary from instructor)


except
(select distinct T.salary from instructor as T, instructor as S where T.salary < S.salary)
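The except construction can be checked against the max aggregate using Python's sqlite3 module. This is a sketch on four assumed salaries. SQLite does not accept parentheses around the operands of a set operation, so they are left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?)",
                 [("Srinivasan", 65000), ("Wu", 90000), ("Mozart", 40000), ("Einstein", 95000)])

# All salaries, minus every salary that is smaller than some other salary
largest_by_except = conn.execute("""
    select distinct salary from instructor
    except
    select distinct T.salary
    from instructor as T, instructor as S
    where T.salary < S.salary
""").fetchall()

largest_by_max = conn.execute("select max(salary) from instructor").fetchall()
print(largest_by_except, largest_by_max)
```

Only the largest salary is never smaller than another one, so it is the single value left after the difference.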

Set operations such as union, intersect and except automatically eliminate the duplicates

To retain all the duplicates, use the corresponding multiset versions union all, intersect all and except all

Suppose a tuple occurs m times in r and n times in s, then it occurs ...

m + n times in r union all s


min(m, n) times in r intersect all s

max(0, m - n) times in r except all s
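These counting rules are the same ones Python's collections.Counter uses for multisets, which gives a quick way to check them. In this sketch the tuple is represented by the string "t", with m = 3 and n = 1 chosen arbitrarily.

```python
from collections import Counter

r = Counter({"t": 3})   # tuple t occurs m = 3 times in r
s = Counter({"t": 1})   # tuple t occurs n = 1 time in s

union_all = r + s       # m + n copies
intersect_all = r & s   # min(m, n) copies
except_all = r - s      # max(0, m - n) copies
reverse_except = s - r  # max(0, n - m) = 0 copies, so t disappears

print(union_all["t"], intersect_all["t"], except_all["t"], reverse_except["t"])
```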

NULL values
What is a NULL value?

A NULL value is something unknown or a value that does not exist yet

Why is NULL value so important?

Certain values may not exist for everyone

For eg: Every student may not have a passport at the time of registration

Often times while we are creating/inserting a record, we may not know all the values of all the fields

For eg: When a student joins, the student does not have any credit assigned to him/her, so the total credit is
NULL
We can say 0 (zero), but 0 (zero) and NULL are different

0 (zero) means the student has not taken a credit


NULL means the credit has not been given yet

Naturally, when we add an attribute to all the existing rows of a table, the value of that field cannot be
known or set, so it has to be initialized as a NULL value

It is possible for tuples to have a null value, denoted by null, for some of their attributes

The predicate is null can be used to check for null values

Example: Find all the instructors whose salary is null

select name
from instructor
where salary is null

It is not possible to test for null values with comparison operators such as =, <, > or <>
We need to use the is null and is not null operators instead

NULL values: Three valued logic


Three values - true, false, unknown

Any comparison with null returns unknown

Example: 5 < null or null <> null or null = null

Three-valued logic using the value unknown:

OR:

(unknown or true) = true


(unknown or false) = unknown

(unknown or unknown) = unknown

AND:

(true and unknown) = unknown

(false and unknown) = false

(unknown and unknown) = unknown

NOT:
(not unknown) = unknown

"P is unknown" evaluates to true if predicate P evaluates to unknown

Result of where clause predicate is treated as false if it evaluates to unknown
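The truth tables can be verified with Python's sqlite3 module. This is a sketch under two assumptions about SQLite: it reports true as 1, false as 0 and unknown as null (None in Python), and it has no "is unknown" predicate, so "is null" stands in for it. The instructor Lee with a null salary is made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def value(expr):
    """Evaluate a SQL expression and return its value (None means unknown/null)."""
    return conn.execute("select " + expr).fetchone()[0]

comparisons = [value("5 < null"), value("null <> null"), value("null = null")]
or_table = [value("null or 1"), value("null or 0"), value("null or null")]
and_table = [value("1 and null"), value("0 and null"), value("null and null")]
not_unknown = value("not null")
is_unknown = value("(5 < null) is null")

# A where clause treats unknown like false: "= null" selects nothing
conn.execute("create table instructor (name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?)", [("Wu", 90000), ("Lee", None)])
equals_null = conn.execute("select name from instructor where salary = null").fetchall()
is_null = conn.execute("select name from instructor where salary is null").fetchall()

print(comparisons, or_table, and_table, not_unknown, is_unknown, equals_null, is_null)
```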

Aggregate functions
These functions operate on the multiset of values of a column of a relation (table) and return a value

avg: average value

min: minimum value

max: maximum value


sum: sum of the values

count: number of values

Examples

Find the average salary of instructors in the Computer Science department

select avg(salary)
from instructor
where dept_name = 'Comp. Sci.'

Find the total number of instructors who teach a course in the Spring 2010 semester

select count(distinct ID)


from teaches
where semester = 'Spring' and year = 2010

Find the number of tuples in the course relation (table)

select count(*)
from course;

Example (GROUP BY)

Find the average salary of instructors in each department

select dept_name, avg(salary) as avg_salary


from instructor
group by dept_name;

So, group by takes a column and makes sub-tables of all those records which have the same value on that particular
group by attribute

It then applies the aggregate function on the column based on this sub-table

Attributes in select clause outside of aggregate functions must appear in group by list

-- The following query is incorrect because of the 'ID' attribute


select dept_name, ID, avg(salary)
from instructor
group by dept_name;

HAVING clause
Find the names and average salaries of all departments whose average salary is greater than 42,000

select dept_name, avg(salary)


from instructor
group by dept_name
having avg(salary) > 42000;

NOTE: Predicates in the having clause are applied after the formation of groups whereas predicates in the where
clause are applied before forming groups
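The difference between where and having can be seen on a small table with Python's sqlite3 module. This is a sketch on five sample rows; the where threshold of 70000 is an arbitrary value picked for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), dept_name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?, ?)", [
    ("Srinivasan", "Comp. Sci.", 65000),
    ("Katz", "Comp. Sci.", 75000),
    ("Mozart", "Music", 40000),
    ("Wu", "Finance", 90000),
    ("Singh", "Finance", 80000),
])

# One average per department
by_dept = dict(conn.execute(
    "select dept_name, avg(salary) from instructor group by dept_name"))

# having filters whole groups after the averages are computed
having = dict(conn.execute(
    "select dept_name, avg(salary) from instructor group by dept_name having avg(salary) > 42000"))

# where filters individual tuples before the groups are formed
where_first = dict(conn.execute(
    "select dept_name, avg(salary) from instructor where salary > 70000 group by dept_name"))

print(by_dept, having, where_first)
```

Music drops out in both filtered queries, but for different reasons: having removes the group because its average is too low, while where removes Mozart's tuple before any group exists. The where version also changes the Comp. Sci. average, because Srinivasan is excluded before averaging.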

NULL values and aggregates


Total all salaries

select sum(salary)
from instructor;

Above statement ignores null amounts

Result is null if there is no non-null amount

All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes

What if collection has only null values?

count returns 0 (zero)

all other aggregates return null
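These rules can be confirmed with Python's sqlite3 module. This is a sketch on three made-up rows, one of which has a null salary; null shows up as None in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), salary numeric(8, 2))")
conn.executemany("insert into instructor values (?, ?)",
                 [("A", 100), ("B", None), ("C", 200)])   # hypothetical rows

query = "select sum(salary), avg(salary), count(salary), count(*) from instructor"

# The null salary is skipped by sum, avg and count(salary), but count(*) counts the tuple
mixed = conn.execute(query).fetchone()

# Now make every salary null
conn.execute("update instructor set salary = null")
all_null = conn.execute(query).fetchone()

print(mixed, all_null)
```

With one null the average is 150 (300 over 2 values, not 3), and once every salary is null only the two counts still return numbers.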

📚
Week 3 Lecture 1
Class BSCCS2001

Created @September 25, 2021 9:05 AM

Materials

Module # 11

Type Lecture

Week # 3

SQL Examples
SELECT DISTINCT
From the classroom relation, find the names of buildings that have at least one classroom with capacity less than
100 (removing the duplicates).

Relation:

classroom

building room_number capacity

Packard 101 500

Painter 514 10

Taylor 3128 70

Watson 100 30

Watson 120 50

Query:

SELECT DISTINCT building


FROM classroom
WHERE capacity < 100;

Output:

building


Painter

Taylor

Watson

SELECT ALL
From the classroom relation, find the names of buildings that have at least one classroom with capacity less than
100 (without removing the duplicates).

Relation:

classroom

building room_number capacity

Packard 101 500

Painter 514 10

Taylor 3128 70

Watson 100 30

Watson 120 50

Query:

SELECT ALL building


FROM classroom
WHERE capacity < 100;

Output:

building

Painter

Taylor

Watson

Watson

NOTE: The duplicate retention is default and hence it is a common practice to skip ALL immediately after SELECT

Cartesian Product
Find the list of all students of departments which have a budget < $100K

SELECT name, budget


FROM student, department
WHERE student.dept_name = department.dept_name AND budget < 100000;

name budget

Brandt 50000

Peltier 70000

Levy 70000

Sanchez 80000

Snow 70000

Aoi 85000

Bourikas 85000

Tanaka 90000

The above query generates every possible student-department pair, which is the Cartesian product of student and
department.

Then, it filters all the rows with student.dept_name = department.dept_name AND budget < 100000

The common attribute dept_name in the resulting table is renamed using the relation name: student.dept_name and
department.dept_name

RENAME AS Operation
The same query in the above case can be framed by renaming the table as shown below:

SELECT S.name AS studentname, budget AS deptbudget


FROM student AS S, department AS D
WHERE S.dept_name = D.dept_name AND budget < 100000;

studentname deptbudget

Brandt 50000

Peltier 70000

Levy 70000

Sanchez 80000

Snow 70000

Aoi 85000

Bourikas 85000

Tanaka 90000

The above query renames the relation student AS S and the relation department AS D

It also displays the attribute name as studentname and the budget as deptbudget

NOTE: The budget attribute does not have any prefix because it occurs only in the department relation

SELECT: AND and OR


From the instructor and department relations in the figure, find out the names of all the instructors whose department
is Finance or whose department is in any of the following buildings: Watson, Taylor

instructor

id name dept_name salary

10101 Srinivasan Comp. Sci. 65000

12121 Wu Finance 90000

15151 Mozart Music 40000

22222 Einstein Physics 95000

32343 El Said History 60000

33456 Gold Physics 87000

45565 Katz Comp. Sci. 75000

58583 Califieri History 62000

76543 Singh Finance 80000

76766 Crick Biology 72000

83821 Brandt Comp. Sci. 92000

98345 Kim Elec. Eng. 80000

department

dept_name building budget

Biology Watson 90000

Comp. Sci. Taylor 100000

Elec. Eng. Taylor 85000


Finance Painter 120000

History Painter 50000

Music Packard 80000

Physics Watson 70000

Query:

SELECT name
FROM instructor I, department D
WHERE D.dept_name = I.dept_name
AND (I.dept_name = 'Finance' OR building IN ('Watson', 'Taylor'));

Output:

name

Srinivasan

Wu

Einstein

Gold

Katz

Singh

Crick

Brandt

Kim

String Operations
From the course relation in the figure, find the titles of all the courses whose course_id has 3 letters indicating the
department

course

course_id title dept_name credits

BIO-101 Intro. to Biology Biology 4

BIO-301 Genetics Biology 4

BIO-399 Computational Biology Biology 3

CS-101 Intro. to Computer Science Comp. Sci. 4

CS-190 Game Design Comp. Sci. 4

CS-315 Robotics Comp. Sci. 3

CS-319 Image Processing Comp. Sci. 3

CS-347 Database System Concepts Comp. Sci. 3

EE-181 Intro. to Digital Systems Elec. Eng. 3

FIN-201 Investment Banking Finance 3

HIS-351 World History History 3

MU-199 Music Video Production Music 3

PHY-101 Physical Principles Physics 4

Query:

SELECT title
FROM course
WHERE course_id LIKE '___-%'; -- 3 underscores

Output:

title

Intro. to Biology

Genetics

Computational Biology

Investment Banking

World History

Physical Principles

The course_id of each course begins with either 2 or 3 letters, followed by a hyphen and then
a 3-digit number. The above query returns the titles of those courses whose course_id begins with
3 letters

ORDER BY
From the student relation in the figure, obtain the list of all students in alphabetic order of departments and within
each department, in decreasing order of total credits.

student

id name dept_name tot_cred

00128 Zhang Comp. Sci. 102

12345 Shankar Comp. Sci. 32

19991 Brandt History 80

23121 Chavez Finance 110

44553 Peltier Physics 56

45678 Levy Physics 46

54321 Williams Comp. Sci. 54

55739 Sanchez Music 38

70557 Snow Physics 0

76543 Brown Comp. Sci. 58

76653 Aoi Elec. Eng. 60

98765 Bourikas Elec. Eng. 98

98988 Tanaka Biology 120

Query:

SELECT name, dept_name, tot_cred


FROM student
ORDER BY dept_name ASC, tot_cred DESC;

Output:

name dept_name tot_cred

Tanaka Biology 120

Zhang Comp. Sci. 102

Brown Comp. Sci. 58

Williams Comp. Sci. 54

Shankar Comp. Sci. 32

Bourikas Elec. Eng. 98

Aoi Elec. Eng. 60

Chavez Finance 110

Brandt History 80

Sanchez Music 38

Peltier Physics 56

Levy Physics 46


Snow Physics 0

How is this sort happening?

The list is first sorted in alphabetic order of dept_name

Within each department, it is sorted in decreasing order of total credits

IN Operator
From the teaches relation in the figure, find the IDs of all the courses taught in the Fall or Spring of 2018

teaches

id course_id sec_id semester year

10101 CS-101 1 Fall 2017

10101 CS-315 1 Spring 2018

10101 CS-347 1 Fall 2017

12121 FIN-201 1 Spring 2018

15151 MU-199 1 Spring 2018

22222 PHY-101 1 Fall 2017

32343 HIS-351 1 Spring 2018

45565 CS-101 1 Spring 2018

45565 CS-319 1 Spring 2018

76766 BIO-101 1 Summer 2017

76766 BIO-301 1 Summer 2018

83821 CS-190 1 Spring 2017

83821 CS-190 2 Spring 2017

83821 CS-319 2 Spring 2018

98345 EE-181 1 Spring 2017

Query:

SELECT course_id
FROM teaches
WHERE semester IN ('Fall', 'Spring')
AND year = 2018;

Output:

course_id

CS-315

FIN-201

MU-199

HIS-351

CS-101

CS-319

CS-319

NOTE: Now we can use DISTINCT to remove duplicates

Set Operations: UNION


For the same question in the above table, we can find the solution using UNION operator as follows:

Query:

SELECT course_id
FROM teaches
WHERE semester = 'Fall'

AND year = 2018
UNION
SELECT course_id
FROM teaches
WHERE semester = 'Spring'
AND year = 2018

Output:

course_id

CS-101

CS-315

CS-319

FIN-201

HIS-351

MU-199

NOTE: UNION removes all the duplicates. If we use UNION ALL instead of UNION, we get the same tuples as
in the IN operator example above, duplicates included

Set Operations: INTERSECT


From the instructor relation in the figure, find the names of all the instructors who taught in either Computer Science
department or the Finance department and whose salary is > 80,000

instructor

id name dept_name salary

10101 Srinivasan Comp. Sci. 65000

12121 Wu Finance 90000

15151 Mozart Music 40000

22222 Einstein Physics 95000

32343 El Said History 60000

33456 Gold Physics 87000

45565 Katz Comp. Sci. 75000

58583 Califieri History 62000

76543 Singh Finance 80000

76766 Crick Biology 72000

83821 Brandt Comp. Sci. 92000

98345 Kim Elec. Eng. 80000

Query:

SELECT name
FROM instructor
WHERE dept_name IN ('Comp. Sci.', 'Finance')
INTERSECT
SELECT name
FROM instructor
WHERE salary > 80000;

Output:

name

Wu

Brandt

NOTE: The same thing can be achieved by using the query:

SELECT name FROM instructor WHERE dept_name IN ('Comp. Sci.', 'Finance') AND salary > 80000;

Set Operation: EXCEPT
From the instructor relation in the figure, find the names of all the instructors who taught in either the Computer
Science department or the Finance department and whose salary is either ≥ 90,000 or ≤ 70,000

instructor

id name dept_name salary

10101 Srinivasan Comp. Sci. 65000

12121 Wu Finance 90000

15151 Mozart Music 40000

22222 Einstein Physics 95000

32343 El Said History 60000

33456 Gold Physics 87000

45565 Katz Comp. Sci. 75000

58583 Califieri History 62000

76543 Singh Finance 80000

76766 Crick Biology 72000

83821 Brandt Comp. Sci. 92000

98345 Kim Elec. Eng. 80000

Query:

SELECT name
FROM instructor
WHERE dept_name IN ('Comp. Sci.', 'Finance')
EXCEPT
SELECT name
FROM instructor
WHERE salary < 90000 AND salary > 70000;

Output:

name

Srinivasan

Brandt

Wu

NOTE: The same can be achieved by using the following query

SELECT name FROM instructor


WHERE dept_name IN ('Comp. Sci.', 'Finance')
AND (salary >= 90000 OR salary <= 70000);

Aggregate function: AVG


From the classroom relation given in the figure, find the names and the average capacity of each building whose
average capacity is greater than 25

classroom

building room_number capacity

Packard 101 500

Painter 514 10

Taylor 3128 70

Watson 100 30

Watson 120 50

Query:

SELECT building, AVG(capacity)


FROM classroom
GROUP BY building
HAVING AVG(capacity) > 25;

Output:

building avg

Taylor 70.00

Packard 500.00

Watson 40.00

Aggregate function: MIN


From the instructor relation given in the figure, find the least salary drawn by any instructor among all the instructors

instructor

id name dept_name salary

10101 Srinivasan Comp. Sci. 65000

12121 Wu Finance 90000

15151 Mozart Music 40000

22222 Einstein Physics 95000

32343 El Said History 60000

33456 Gold Physics 87000

45565 Katz Comp. Sci. 75000

58583 Califieri History 62000

76543 Singh Finance 80000

76766 Crick Biology 72000

83821 Brandt Comp. Sci. 92000

98345 Kim Elec. Eng. 80000

Query:

SELECT MIN(salary) AS least_salary FROM instructor;

Output:

least_salary

40000

Aggregate function: MAX


From the instructor relation given above, find the highest salary drawn by any instructor among all the instructors

Query:

SELECT MAX(salary) AS highest_salary FROM instructor;

Output:

highest_salary

95000

Aggregate function: COUNT

From the instructor relation given above, find the number of instructors in each department

Query:

SELECT dept_name, COUNT(id) AS ins_count
FROM instructor
GROUP BY dept_name;

Output:

dept_name ins_count

Comp. Sci. 3

Finance 2

Music 1

Physics 2

History 2

Biology 1

Elec. Eng. 1

Aggregate function: SUM


From the course relation given in the figure, find the total credits offered by each department

course

course_id title dept_name credits

BIO-101 Intro. to Biology Biology 4

BIO-301 Genetics Biology 4

BIO-399 Computational Biology Biology 3

CS-101 Intro. to Computer Science Comp. Sci. 4

CS-190 Game Design Comp. Sci. 4

CS-315 Robotics Comp. Sci. 3

CS-319 Image Processing Comp. Sci. 3

CS-347 Database System Concepts Comp. Sci. 3

EE-181 Intro. to Digital Systems Elec. Eng. 3

FIN-201 Investment Banking Finance 3

HIS-351 World History History 3

MU-199 Music Video Production Music 3

PHY-101 Physical Principles Physics 4

Query:

SELECT dept_name, SUM(credits) AS sum_credits
FROM course
GROUP BY dept_name;

Output:

dept_name sum_credits

Finance 3

History 3

Physics 4

Music 3

Comp. Sci. 17

Biology 11

Elec. Eng. 3

📚
Week 3 Lecture 2
Class BSCCS2001

Created @September 25, 2021 5:30 PM

Materials

Module # 12

Type Lecture

Week # 3

Intermediate SQL
Nested sub-queries
SQL provides a mechanism for the nesting of sub-queries

A sub-query is a SELECT-FROM-WHERE expression that is nested within another query

Nesting can be applied to the general SQL query

SELECT A1, A2, ..., An
FROM r1, r2, ..., rm
WHERE P

as follows:

Ai can be replaced by a sub-query that generates a single value

ri can be replaced by any valid sub-query

P can be replaced with an expression of the form:
B <operation> (sub-query)

where B is an attribute and <operation> is a test such as IN, SOME, ALL or EXISTS (defined later in this lecture)

Input of a query → One or more relations

Output of a query → Always a single relation

Subqueries in WHERE clause


Typical use of subqueries is to perform tests

For set membership

For set comparisons

For set cardinality

Set Membership
Find the courses offered in Fall 2009 and in Spring 2010 (INTERSECT example)

SELECT DISTINCT course_id
FROM section
WHERE semester = 'Fall'
AND year = 2009
AND course_id IN (
SELECT course_id
FROM section
WHERE semester = 'Spring' AND year = 2010);

Find courses offered in Fall 2009 but not in Spring 2010 (EXCEPT example)

SELECT DISTINCT course_id
FROM section
WHERE semester = 'Fall'
AND year = 2009
AND course_id NOT IN (
SELECT course_id
FROM section
WHERE semester = 'Spring' AND year = 2010);

Find the total number of (distinct) students who have taken course sections taught by the instructor with ID 10101

SELECT COUNT(DISTINCT id)
FROM takes
WHERE (course_id, sec_id, semester, year) IN (
SELECT course_id, sec_id, semester, year
FROM teaches
WHERE teaches.id = 10101);

NOTE: The above query can be written in a much simpler manner; the formulation above is only meant to illustrate SQL features

Set comparison - "SOME" clause


Find names of instructors with salary greater than that of some (at least one) instructor in the Biology department

SELECT DISTINCT T.name
FROM instructor AS T, instructor AS S
WHERE T.salary > S.salary AND S.dept_name = 'Biology';

The same query using the SOME clause

SELECT name
FROM instructor
WHERE salary > SOME (
SELECT salary
FROM instructor
WHERE dept_name = 'Biology');

Definition of "SOME" clause


F <comp> SOME r ⇔ ∃t ∈ r such that (F <comp> t)
where <comp> can be: <, ≤, >, ≥, =, ≠
SOME represents existential quantification [the values listed in "()" are the tuples of the relation r]

5 < SOME (0, 5, 6) → true

5 < SOME (0, 5) → false

5 = SOME (0, 5) → true

5 ≠ SOME (0, 5) → true # as 0 ≠ 5

(= SOME) ≡ IN

However, (≠ SOME) ≢ NOT IN

Set Comparison - "ALL" clause


Find the names of all the instructors whose salary is greater than the salary of all instructors in the Biology department

SELECT name
FROM instructor
WHERE salary > ALL (
SELECT salary
FROM instructor
WHERE dept_name = 'Biology');

Definition of "ALL" clause


F <comp> ALL r ⇔ ∀t ∈ r (F <comp> t)
where <comp> can be: <, ≤, >, ≥, =, ≠
ALL represents universal quantification [the values listed in "()" are the tuples of the relation r]

5 < ALL (0, 5, 6) → false

5 < ALL (6, 10) → true

5 = ALL (4, 5) → false

5 ≠ ALL (4, 6) → true # as 5 ≠ 4 and 5 ≠ 6

(≠ ALL) ≡ NOT IN

However, (= ALL) ≢ IN
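
The SOME and ALL definitions map directly onto existential and universal quantifiers. The sketch below models them in Python with any() and all(); the helper names some and all_ are made up for illustration, the relation is modelled as a plain list of values, and SQL's handling of null (unknown) is deliberately ignored.

```python
import operator

def some(f, comp, r):
    """F <comp> SOME r: the comparison holds for at least one t in r."""
    return any(comp(f, t) for t in r)

def all_(f, comp, r):
    """F <comp> ALL r: the comparison holds for every t in r."""
    return all(comp(f, t) for t in r)

results = {
    "5 < SOME (0, 5, 6)": some(5, operator.lt, [0, 5, 6]),
    "5 < SOME (0, 5)": some(5, operator.lt, [0, 5]),
    "5 = SOME (0, 5)": some(5, operator.eq, [0, 5]),
    "5 != SOME (0, 5)": some(5, operator.ne, [0, 5]),
    "5 < ALL (0, 5, 6)": all_(5, operator.lt, [0, 5, 6]),
    "5 < ALL (6, 10)": all_(5, operator.lt, [6, 10]),
    "5 = ALL (4, 5)": all_(5, operator.eq, [4, 5]),
    "5 != ALL (4, 6)": all_(5, operator.ne, [4, 6]),
}
for expression, value in results.items():
    print(expression, "->", value)

# (!= SOME) is not NOT IN: 5 differs from some element of (0, 5), yet 5 is in it.
ne_some_vs_not_in = (some(5, operator.ne, [0, 5]), 5 not in [0, 5])
# (= ALL) is not IN: 5 is in (4, 5), yet it does not equal every element.
eq_all_vs_in = (all_(5, operator.eq, [4, 5]), 5 in [4, 5])
```

With an empty relation, this model gives false for SOME and true for ALL, which follows from the quantifier definitions.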

Test for empty relations: "EXISTS"


The EXISTS construct returns the value true if the argument subquery is non-empty

EXISTS r ⇔ r ≠ ∅

NOT EXISTS r ⇔ r = ∅

Use of "EXISTS" clause


Yet another way of specifying the query "Find all the courses taught in both the Fall 2009 semester and in the Spring
2010 semester"

SELECT course_id
FROM section AS S
WHERE semester = 'Fall' AND year = 2009
AND EXISTS (
SELECT * FROM section AS T
WHERE semester = 'Spring' AND year = 2010
AND S.course_id = T.course_id);

Correlation name - variable S in the outer query

Correlated subquery - the inner query

Use of "NOT EXISTS" clause


Find all students who have taken all courses offered by the Biology department

SELECT DISTINCT S.id, S.name
FROM student AS S
WHERE NOT EXISTS (
(
SELECT course_id
FROM course
WHERE dept_name = 'Biology')
EXCEPT
(
SELECT T.course_id
FROM takes AS T
WHERE S.id = T.id));

First nested query lists all the courses offered by the Biology department

Second nested query lists all the courses a particular student has taken

NOTE: X − Y = ∅ ⇔ X ⊆ Y

NOTE: This query cannot be written using = ALL and its variants
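
The NOT EXISTS ... EXCEPT pattern can be tried on a toy database. This is a sketch under stated assumptions: Python's sqlite3 module, made-up student and takes rows, and tables reduced to the columns the query needs. SQLite does not accept parentheses around the operands of EXCEPT, so they are dropped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id TEXT PRIMARY KEY, dept_name TEXT);
    CREATE TABLE takes   (id TEXT, course_id TEXT);
    -- Hypothetical data: only student 1 has taken every Biology course.
    INSERT INTO student VALUES ('1', 'Asha'), ('2', 'Ben'), ('3', 'Chen');
    INSERT INTO course VALUES ('BIO-101', 'Biology'), ('BIO-301', 'Biology'),
                              ('CS-101', 'Comp. Sci.');
    INSERT INTO takes VALUES ('1', 'BIO-101'), ('1', 'BIO-301'), ('1', 'CS-101'),
                             ('2', 'BIO-101'), ('2', 'CS-101');
""")

# For each student S: (Biology courses) EXCEPT (courses S has taken) must be empty.
took_all_biology = conn.execute("""
    SELECT DISTINCT S.id, S.name
    FROM student AS S
    WHERE NOT EXISTS (
        SELECT course_id FROM course WHERE dept_name = 'Biology'
        EXCEPT
        SELECT T.course_id FROM takes AS T WHERE S.id = T.id)
""").fetchall()
print(took_all_biology)
```

Student 3 has taken nothing, so the difference is the full set of Biology courses, which is non-empty, and the student is excluded.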

Test for absence of duplicate tuples: "UNIQUE"


The UNIQUE construct tests whether a subquery has any duplicate tuples in its results

The UNIQUE construct evaluates to "true" if a given subquery contains no duplicates

Find all the courses that were offered at most once in 2009

SELECT T.course_id
FROM course AS T
WHERE UNIQUE (
SELECT R.course_id
FROM section AS R
WHERE T.course_id = R.course_id
AND R.year = 2009);

Subqueries in the "FROM" clause


SQL allows a subquery expression to be used in the FROM clause

Find the average instructors' salaries of those departments where the average salary is greater than $42,000

SELECT dept_name, avg_salary
FROM (
SELECT dept_name, AVG(salary) AS avg_salary
FROM instructor
GROUP BY dept_name)
WHERE avg_salary > 42000;

NOTE: We do not need a HAVING clause

Another way to write the above query

SELECT dept_name, avg_salary
FROM (
SELECT dept_name, AVG(salary)
FROM instructor
GROUP BY dept_name) AS dept_avg(dept_name, avg_salary)
WHERE avg_salary > 42000;

WITH clause
The WITH clause provides a way of defining a temporary relation whose definition is available only to the query in
which the WITH clause occurs

Find all the departments with the maximum budget

WITH max_budget(value) AS
(
SELECT MAX(budget)
FROM department)
SELECT department.dept_name
FROM department, max_budget
WHERE department.budget = max_budget.value;

Complex queries using WITH clause


Find all departments where the total salary is greater than the average of the total salary at all departments

WITH dept_total(dept_name, value) AS
(
SELECT dept_name, SUM(salary)
FROM instructor
GROUP BY dept_name),
dept_total_avg(value) AS
(
SELECT AVG(value)
FROM dept_total)
SELECT dept_name
FROM dept_total, dept_total_avg
WHERE dept_total.value > dept_total_avg.value;
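
The two-step WITH query can be run against the instructor relation used earlier in these notes. A minimal sketch, assuming Python's sqlite3 module and an in-memory copy of that relation:

```python
import sqlite3

# Rows copied from the instructor relation used earlier in the notes.
INSTRUCTORS = [
    ("10101", "Srinivasan", "Comp. Sci.", 65000), ("12121", "Wu", "Finance", 90000),
    ("15151", "Mozart", "Music", 40000), ("22222", "Einstein", "Physics", 95000),
    ("32343", "El Said", "History", 60000), ("33456", "Gold", "Physics", 87000),
    ("45565", "Katz", "Comp. Sci.", 75000), ("58583", "Califieri", "History", 62000),
    ("76543", "Singh", "Finance", 80000), ("76766", "Crick", "Biology", 72000),
    ("83821", "Brandt", "Comp. Sci.", 92000), ("98345", "Kim", "Elec. Eng.", 80000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (id TEXT, name TEXT, dept_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)", INSTRUCTORS)

# dept_total: one row per department; dept_total_avg: a single-row relation.
rows = conn.execute("""
    WITH dept_total(dept_name, value) AS (
        SELECT dept_name, SUM(salary)
        FROM instructor
        GROUP BY dept_name),
    dept_total_avg(value) AS (
        SELECT AVG(value)
        FROM dept_total)
    SELECT dept_name
    FROM dept_total, dept_total_avg
    WHERE dept_total.value > dept_total_avg.value
""").fetchall()
above_average = sorted(row[0] for row in rows)
print(above_average)
```

The seven department totals add up to 898,000, so the average total is about 128,286; only Comp. Sci. (232,000), Physics (182,000) and Finance (170,000) exceed it.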

Subqueries in the SELECT clause


Scalar subquery: Where a single value is expected

List all departments along with the number of instructors in each department

SELECT dept_name, (
SELECT COUNT(*)
FROM instructor
WHERE department.dept_name = instructor.dept_name)
AS num_instructors
FROM department;

Runtime error occurs if subquery returns more than one result tuple

Modifications of the Database


Deletion of tuples from a given relation

Insertion of new tuples into a given relation

Updating of values in some tuples in a given relation

Deletion
Delete all instructors

DELETE FROM instructor;

Delete all instructors from the Finance department

DELETE FROM instructor
WHERE dept_name = 'Finance';

Delete all tuples in the instructor relation for those instructors associated with a department located in the Watson
building

DELETE FROM instructor
WHERE dept_name IN (SELECT dept_name
FROM department
WHERE building = 'Watson');

Delete all instructors whose salary is less than the average salary of instructors

DELETE FROM instructor
WHERE salary < (SELECT AVG(salary) FROM instructor);

Problem: As we delete tuples from instructor, the average salary changes

Solution:

First, compute AVG ( salary ) and find all the tuples to delete

Next, delete all the tuples found above (without recomputing AVG or retesting the tuples)
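
The effect of "compute the average first, then delete" can be observed on the instructor relation from earlier in the notes. A sketch, assuming Python's sqlite3 module; the outcome below is consistent with the average (about 74,833) being computed once, before any tuple is removed.

```python
import sqlite3

# Rows copied from the instructor relation used earlier in the notes.
INSTRUCTORS = [
    ("10101", "Srinivasan", "Comp. Sci.", 65000), ("12121", "Wu", "Finance", 90000),
    ("15151", "Mozart", "Music", 40000), ("22222", "Einstein", "Physics", 95000),
    ("32343", "El Said", "History", 60000), ("33456", "Gold", "Physics", 87000),
    ("45565", "Katz", "Comp. Sci.", 75000), ("58583", "Califieri", "History", 62000),
    ("76543", "Singh", "Finance", 80000), ("76766", "Crick", "Biology", 72000),
    ("83821", "Brandt", "Comp. Sci.", 92000), ("98345", "Kim", "Elec. Eng.", 80000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (id TEXT, name TEXT, dept_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)", INSTRUCTORS)

cursor = conn.execute(
    "DELETE FROM instructor WHERE salary < (SELECT AVG(salary) FROM instructor)"
)
deleted = cursor.rowcount  # number of tuples removed by the statement
remaining_min = conn.execute("SELECT MIN(salary) FROM instructor").fetchone()[0]
remaining_avg = conn.execute("SELECT AVG(salary) FROM instructor").fetchone()[0]
print(deleted, remaining_min, round(remaining_avg, 2))
```

Katz (75,000) survives even though 75,000 is below the average of the remaining tuples, which shows that the tuples were not retested against a recomputed average.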

Insertion
Add a new tuple to course

INSERT INTO course
VALUES ('CS-437', 'Database Systems', 'Comp. Sci.', 4);

or equivalently

INSERT INTO course (course_id, title, dept_name, credits)
VALUES ('CS-437', 'Database Systems', 'Comp. Sci.', 4);

Add a new tuple to student with tot_creds set to null

INSERT INTO student
VALUES ('3003', 'Green', 'Finance', null);

Add all instructors to the student relation with tot_creds set to 0

INSERT INTO student
SELECT id, name, dept_name, 0
FROM instructor;

The SELECT FROM WHERE statement is evaluated fully before any of its results are inserted into the relation

Otherwise queries like

INSERT INTO table1 SELECT * FROM table1;

would cause problems

Updates
Increase salaries of instructors whose salary is over $100,000 by 3% and all others by 5%

Write two UPDATE statements

UPDATE instructor
SET salary = salary * 1.03
WHERE salary > 100000;

UPDATE instructor
SET salary = salary * 1.05
WHERE salary <= 100000;

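The order is important: if the 5% raise were applied first, an instructor just under $100,000 could cross the threshold and then receive the 3% raise as well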

Can be done better using the CASE statement

CASE statement for conditional updates


Same query as before but with CASE statement

UPDATE instructor
SET salary = CASE
WHEN salary <= 100000
THEN salary * 1.05
ELSE salary * 1.03
END;
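
Why the order of the two UPDATE statements matters, and why CASE avoids the issue, can be shown with two made-up salaries. A sketch, assuming Python's sqlite3 module; the names A and B and the salaries 98,000 and 120,000 are hypothetical.

```python
import sqlite3

RAISE_3 = "UPDATE instructor SET salary = salary * 1.03 WHERE salary > 100000"
RAISE_5 = "UPDATE instructor SET salary = salary * 1.05 WHERE salary <= 100000"
RAISE_CASE = """
    UPDATE instructor
    SET salary = CASE
        WHEN salary <= 100000 THEN salary * 1.05
        ELSE salary * 1.03
    END
"""

def run(statements):
    """Apply the statements to a fresh two-row table and return name -> salary."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instructor (name TEXT, salary REAL)")
    # Hypothetical rows: A is just under the threshold, B is above it.
    conn.executemany("INSERT INTO instructor VALUES (?, ?)",
                     [("A", 98000.0), ("B", 120000.0)])
    for statement in statements:
        conn.execute(statement)
    rows = conn.execute("SELECT name, salary FROM instructor")
    return {name: round(salary, 2) for name, salary in rows}

correct_order = run([RAISE_3, RAISE_5])  # 3% first, then 5%
wrong_order = run([RAISE_5, RAISE_3])    # A crosses 100000 and is raised twice
with_case = run([RAISE_CASE])            # each tuple is tested exactly once
print(correct_order, wrong_order, with_case)
```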

Updates with scalar subqueries


Recompute and update tot_creds value for all the students

UPDATE student S
SET tot_creds = (SELECT SUM(credits)
FROM takes, course
WHERE takes.course_id = course.course_id AND
S.id = takes.id AND
takes.grade <> 'F' AND
takes.grade IS NOT NULL);

This sets tot_creds to null for students who have not taken any course

To store 0 instead, replace SUM(credits) with:

CASE
WHEN SUM(credits) IS NOT NULL THEN SUM(credits)
ELSE 0
END

📚
Week 3 Lecture 3
Class BSCCS2001

Created @September 26, 2021 11:24 AM

Materials

Module # 13

Type Lecture

Week # 3

Intermediate SQL (part 2)


Joined Relations
Join operations take two relations and return as a result another relation

A join operation is a Cartesian product which requires that tuples in the two relations match (under some conditions)

It also specifies the attributes that are present in the result of the join

The join operations are typically used as subquery expressions in the FROM clause

Types of JOIN relations


Cross join

Inner join

Equi-join

Natural join

Outer join

Left outer join

Right outer join

Full outer join

Self-join

Cross JOIN

CROSS JOIN returns the Cartesian product of rows from tables in the join

Explicit

SELECT *
FROM employee CROSS JOIN department;

Implicit

SELECT *
FROM employee, department;

JOIN Operations - Example


Relation course

course_id title dept_name credits

BIO-301 Genetics Biology 4

CS-190 Game Design Comp. Sci. 4

CS-315 Robotics Comp. Sci. 3

Relation prereq

course_id prereq_id

BIO-301 BIO-101

CS-190 CS-101

CS-347 CS-101

Observe that

prereq information is missing from CS-315 and

course information is missing from CS-347

Inner JOIN
course INNER JOIN prereq

course_id title dept_name credits prereq_id course_id

BIO-301 Genetics Biology 4 BIO-101 BIO-301

CS-190 Game Design Comp. Sci. 4 CS-101 CS-190

If specified as NATURAL, the 2nd course_id field is skipped

Outer JOIN
An extension of the join operation that avoids loss of information

Computes the join and then adds to the result the tuples from one relation that do not match any tuple in the other
relation

Uses null values

Left Outer JOIN


course NATURAL LEFT OUTER JOIN prereq

course_id title dept_name credits prereq_id

BIO-301 Genetics Biology 4 BIO-101

CS-190 Game Design Comp. Sci. 4 CS-101

CS-315 Robotics Comp. Sci. 3 null


Right Outer JOIN


course NATURAL RIGHT OUTER JOIN prereq

course_id title dept_name credits prereq_id

BIO-301 Genetics Biology 4 BIO-101


CS-190 Game Design Comp. Sci. 4 CS-101

CS-347 null null null CS-101


Joined relations
Join operations take two relations and return a relation as the result

These additional operations are typically used as subquery expressions in the FROM clause

Join condition - defines which tuples in the two relations match, and what attributes are present in the result of the
join

Join type - defines how tuples in each relation, that do not match any tuple in the other relation (based on the join
condition), are treated

Join types

inner join

left outer join

right outer join

full outer join

Join conditions

natural

on <predicate>

using (A1 , A2 , ..., An )

Full outer JOIN


course NATURAL FULL OUTER JOIN prereq

course_id title dept_name credits prereq_id

BIO-301 Genetics Biology 4 BIO-101

CS-190 Game Design Comp. Sci. 4 CS-101

CS-315 Robotics Comp. Sci. 3 null


CS-347 null null null CS-101


Joined Relations - Example


course INNER JOIN prereq ON course.course_id = prereq.course_id

course_id title dept_name credits prereq_id course_id

BIO-301 Genetics Biology 4 BIO-101 BIO-301

CS-190 Game Design Comp. Sci. 4 CS-101 CS-190

What is the difference between the above (equi-join) and a natural join?

course LEFT OUTER JOIN prereq ON course.course_id = prereq.course_id

course_id title dept_name credits prereq_id course_id

BIO-301 Genetics Biology 4 BIO-101 BIO-301

CS-190 Game Design Comp. Sci. 4 CS-101 CS-190

CS-315 Robotics Comp. Sci. 3 null null

course NATURAL RIGHT OUTER JOIN prereq

course_id title dept_name credits prereq_id

BIO-301 Genetics Biology 4 BIO-101

CS-190 Game Design Comp. Sci. 4 CS-101

CS-347 null null null CS-101

course FULL OUTER JOIN prereq USING (course_id)

course_id title dept_name credits prereq_id

BIO-301 Genetics Biology 4 BIO-101

CS-190 Game Design Comp. Sci. 4 CS-101


CS-315 Robotics Comp. Sci. 3 null

CS-347 null null null CS-101
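
The join results above can be reproduced on the course and prereq relations. A sketch, assuming Python's sqlite3 module. Older SQLite versions lack RIGHT and FULL OUTER JOIN, so the right outer join is emulated here by swapping the operands of a left outer join; that substitution is a portability choice for this example, not something the lecture prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (course_id TEXT, title TEXT, dept_name TEXT, credits INTEGER);
    CREATE TABLE prereq (course_id TEXT, prereq_id TEXT);
    INSERT INTO course VALUES ('BIO-301', 'Genetics', 'Biology', 4),
                              ('CS-190', 'Game Design', 'Comp. Sci.', 4),
                              ('CS-315', 'Robotics', 'Comp. Sci.', 3);
    INSERT INTO prereq VALUES ('BIO-301', 'BIO-101'),
                              ('CS-190', 'CS-101'),
                              ('CS-347', 'CS-101');
""")

# Natural inner join: only courses that have a prereq tuple; course_id appears once.
inner = conn.execute("""
    SELECT * FROM course NATURAL JOIN prereq
    ORDER BY course.course_id
""").fetchall()

# Left outer join: CS-315 is kept, with null (None) for prereq_id.
left = conn.execute("""
    SELECT * FROM course NATURAL LEFT OUTER JOIN prereq
    ORDER BY course.course_id
""").fetchall()

# Right outer join, emulated by swapping operands: CS-347 is kept with nulls.
right = conn.execute("""
    SELECT prereq.course_id, title, dept_name, credits, prereq_id
    FROM prereq NATURAL LEFT OUTER JOIN course
    ORDER BY prereq.course_id
""").fetchall()

for label, rows in (("inner", inner), ("left", left), ("right", right)):
    print(label, rows)
```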

Views
In some cases, it is not desirable for all users to see the entire logical model (that is, all the actual relations stored in
the database)

Consider a person who needs to know an instructor's name and department, but not the salary. This person should
see a relation described, in SQL, by

SELECT id, name, dept_name
FROM instructor;

A VIEW provides a mechanism to hide certain data from the view of certain users

Any relation that is not part of the conceptual model but is made visible to a user as a "virtual relation" is called a VIEW

View definition
A view is defined using the CREATE VIEW statement which has the form

CREATE VIEW v AS <query expression>

where <query expression> is any legal SQL expression

The view name is represented by v

Once a view is defined, the view name can be used to refer to the virtual relation that the view generates

View definition is not the same as creating a new relation by evaluating the query expression

Rather, a view definition causes the saving of an expression; the expression is substituted into queries using the
view

Example views
A view of instructors without their salary

CREATE VIEW faculty AS
SELECT id, name, dept_name
FROM instructor;

Find all the instructors in the biology department

SELECT name
FROM faculty
WHERE dept_name = 'Biology';

Create a view of department salary totals

CREATE VIEW departments_total_salary(dept_name, total_salary) AS
SELECT dept_name, SUM(salary)
FROM instructor
GROUP BY dept_name;

View defined using other views

CREATE VIEW physics_fall_2009 AS
SELECT course.course_id, sec_id, building, room_number
FROM course, section
WHERE course.course_id = section.course_id
AND course.dept_name = 'Physics'
AND section.semester = 'Fall'
AND section.year = 2009;

CREATE VIEW physics_fall_2009_watson AS
SELECT course_id, room_number
FROM physics_fall_2009
WHERE building = 'Watson';

View expansion
Expand use of a view in a query / another view

CREATE VIEW physics_fall_2009_watson AS
(SELECT course_id, room_number
FROM (SELECT course.course_id, building, room_number
FROM course, section
WHERE course.course_id = section.course_id
AND course.dept_name = 'Physics'
AND section.semester = 'Fall'
AND section.year = 2009)
WHERE building = 'Watson');
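
A quick check that querying a view defined on another view gives the same rows as the manually expanded query. A sketch, assuming Python's sqlite3 module; the course and section rows are made up, and the alias AS course_id is added only to make the view's column name explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course  (course_id TEXT, title TEXT, dept_name TEXT, credits INTEGER);
    CREATE TABLE section (course_id TEXT, sec_id TEXT, semester TEXT, year INTEGER,
                          building TEXT, room_number TEXT);
    -- Hypothetical rows for illustration.
    INSERT INTO course VALUES ('PHY-101', 'Physical Principles', 'Physics', 4),
                              ('CS-101', 'Intro. to Computer Science', 'Comp. Sci.', 4);
    INSERT INTO section VALUES ('PHY-101', '1', 'Fall', 2009, 'Watson', '100'),
                               ('PHY-101', '2', 'Spring', 2010, 'Watson', '120'),
                               ('CS-101', '1', 'Fall', 2009, 'Packard', '101');

    CREATE VIEW physics_fall_2009 AS
        SELECT course.course_id AS course_id, sec_id, building, room_number
        FROM course, section
        WHERE course.course_id = section.course_id
          AND course.dept_name = 'Physics'
          AND section.semester = 'Fall'
          AND section.year = 2009;

    CREATE VIEW physics_fall_2009_watson AS
        SELECT course_id, room_number
        FROM physics_fall_2009
        WHERE building = 'Watson';
""")

# Query through the two stacked views.
via_view = conn.execute("SELECT * FROM physics_fall_2009_watson").fetchall()

# The same query with physics_fall_2009 replaced by its defining expression.
expanded = conn.execute("""
    SELECT course_id, room_number
    FROM (SELECT course.course_id AS course_id, building, room_number
          FROM course, section
          WHERE course.course_id = section.course_id
            AND course.dept_name = 'Physics'
            AND section.semester = 'Fall'
            AND section.year = 2009)
    WHERE building = 'Watson'
""").fetchall()
print(via_view, expanded)
```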

Views defined using other views


One view may be used in the expression defining another view

A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the expression defining v1

A view relation v1 is said to depend on view relation v2 if either v1 depends directly on v2 or there is a path of
dependencies from v1 to v2

A view relation v is said to be recursive if it depends on itself

View expansion
A way to define the meaning of views defined in terms of other views

Let view v1 be defined by an expression e1 that may itself contain uses of view relations

View expansion of an expression repeats the following replacement step:

repeat

Find any view relation vi in e1

Replace the view relation vi by the expression defining vi

until no more view relations are present in e1

As long as the view definitions are not recursive, this loop will terminate

Update of a view
Add a new tuple to faculty view which we defined earlier

INSERT INTO faculty VALUES ('30765', 'Green', 'Music');

This insertion must be represented by the insertion of the tuple


('30765', 'Green', 'Music', null)

into the instructor relation

Some updates cannot be translated uniquely

CREATE VIEW instructor_info AS
SELECT id, name, building
FROM instructor, department
WHERE instructor.dept_name = department.dept_name;

INSERT INTO instructor_info VALUES ('69987', 'White', 'Taylor');

Which department, if there are multiple departments in Taylor?

What if no department is present in Taylor?

Most SQL implementations allow updates only on simple views

The FROM clause has only one database relation

The SELECT clause contains only attribute names of the relation and does not have any expressions, aggregates
or DISTINCT specification

Any attribute not listed in the SELECT clause can be set to null

The query does not have a GROUP BY or HAVING clause

And some updates cannot be translated at all

CREATE VIEW history_instructors AS
SELECT * FROM instructor
WHERE dept_name = 'History';

What happens when we insert ('25566', 'Brown', 'Biology', 100000) into history_instructors?

Materialized views
Materializing a view: Create a physical table containing all the tuples in the result of the query defining the view

If relations used in the query are updated, the materialized view result becomes out of date

Need to maintain the view, by updating the view whenever the underlying relations are updated

📚
Week 3 Lecture 4
Class BSCCS2001

Created @September 26, 2021 3:18 PM

Materials

Module # 14

Type Lecture

Week # 3

Intermediate SQL (part 3)


Transactions
A transaction is a unit of work

Atomic transaction

Either something is fully executed or it is rolled back as if it never occurred

Example: when transferring money from one bank account to another, the transfer should either happen completely or
not happen at all

It should not fail at a stage where money has been deducted from one account but not added to the other account

Isolation from concurrent transactions

Transactions begin implicitly

Ended by COMMIT WORK or ROLLBACK WORK

By default on most databases, each SQL statement commits automatically

Can turn off auto-commit for a session (for example, using API)

In SQL:1999, can use: BEGIN ATOMIC ... END

Not supported on most databases
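
Atomicity can be illustrated with the bank-transfer example. A sketch, assuming Python's sqlite3 module; the account table, its CHECK constraint and the transfer helper are hypothetical. Using the connection as a context manager commits if the block succeeds and rolls back if it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0))
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one unit of work; return True on commit."""
    try:
        with conn:  # COMMIT on success, ROLLBACK if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            # This debit violates the CHECK constraint when funds are insufficient.
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))

ok = transfer(conn, "A", "B", 30)       # succeeds: A=70, B=80
after_ok = balances(conn)
failed = transfer(conn, "A", "B", 500)  # debit fails, so the credit is rolled back
after_failed = balances(conn)
print(ok, after_ok, failed, after_failed)
```

The credit is applied before the debit on purpose: when the debit fails, the rollback must undo the credit that was already applied, so the balances end up unchanged.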

Integrity Constraints
Integrity constraints guard against accidental damage to the database by ensuring that the authorized changes to the
database do not result in a loss of data consistency

A checking account must have a balance greater than Rs. 10,000.00

A salary of a bank employee must be at least Rs. 250.00 an hour

A customer must have a (non-null) phone number

Integrity constraints on a single relation


NOT NULL

PRIMARY KEY

UNIQUE

CHECK(P ), where P is a predicate

NOT NULL and UNIQUE constraints


NOT NULL

Declare name and budget to be NOT NULL

name VARCHAR(20) NOT NULL
budget NUMERIC(12, 2) NOT NULL

UNIQUE(A1 , A2 , ..., Am )

The unique specification states that the attributes A1 , A2 , ..., Am form a candidate key

Candidate keys are permitted to be null (in contrast to primary keys)

The CHECK clause


CHECK(P ), where P is a predicate

Ensure that semester is one of fall, winter, spring or summer

CREATE TABLE section (
course_id VARCHAR(8),
sec_id VARCHAR(8),
semester VARCHAR(6),
year NUMERIC(4, 0),
building VARCHAR(15),
room_number VARCHAR(7),
time_slot_id VARCHAR(4),
PRIMARY KEY (course_id, sec_id, semester, year),
CHECK (semester IN ('Fall', 'Winter', 'Spring', 'Summer'))
);

Referential Integrity
Ensures that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes
in another relation

Example: If "Biology" is a department name appearing in one of the tuples in the instructor relation, then there exists
a tuple in the department relation for "Biology"

Let A be a set of attributes. Let R and S be two relations that contain attributes A.

Here, A is the primary key of S.

A is said to be a FOREIGN KEY of R if for any values of A appearing in R these values also appear in S

Cascading Actions in Referential Integrity


With cascading, you can define the actions that the Database Engine takes when a user tries to delete or update a
key to which existing foreign keys point

CREATE TABLE course (
course_id CHAR(5) PRIMARY KEY,
title VARCHAR(20),
dept_name VARCHAR(20) REFERENCES department
)

CREATE TABLE course (
...
dept_name VARCHAR(20),
FOREIGN KEY (dept_name) REFERENCES department
ON DELETE CASCADE
ON UPDATE CASCADE,
...
)

Alternative actions to cascade: NO ACTION, SET NULL, SET DEFAULT
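
Cascading actions can be observed on a two-table example. A sketch, assuming Python's sqlite3 module and made-up rows; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, which is specific to SQLite and not part of the SQL shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.executescript("""
    CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT);
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        title TEXT,
        dept_name TEXT,
        FOREIGN KEY (dept_name) REFERENCES department (dept_name)
            ON DELETE CASCADE
            ON UPDATE CASCADE
    );
    -- Hypothetical rows for illustration.
    INSERT INTO department VALUES ('Biology', 'Watson'), ('Physics', 'Watson');
    INSERT INTO course VALUES ('BIO-301', 'Genetics', 'Biology'),
                              ('PHY-101', 'Physical Principles', 'Physics');
""")

# Deleting a department removes its courses (ON DELETE CASCADE).
conn.execute("DELETE FROM department WHERE dept_name = 'Biology'")
# Renaming a department propagates to its courses (ON UPDATE CASCADE).
conn.execute("UPDATE department SET dept_name = 'Physical Sci.' WHERE dept_name = 'Physics'")
courses = conn.execute(
    "SELECT course_id, dept_name FROM course ORDER BY course_id"
).fetchall()

# A course that references a non-existent department is rejected.
try:
    conn.execute("INSERT INTO course VALUES ('MU-199', 'Music Video Production', 'Music')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(courses, orphan_rejected)
```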

Integrity constraint violation during transactions

CREATE TABLE person (
id CHAR(10),
name CHAR(40),
mother CHAR(10),
father CHAR(10),
PRIMARY KEY (id),
FOREIGN KEY (father) REFERENCES person,
FOREIGN KEY (mother) REFERENCES person)

How to insert a tuple without causing constraint violation?

Insert father and mother of a person before inserting person

OR, set father and mother to null initially, update after inserting all persons (not possible if father and mother
attributes declared to be NOT NULL)

OR defer constraint checking

SQL Data Types and Schemas

Built-in data types in SQL


DATE: Dates, containing a (4-digit) year, month and day

Example: DATE '2005-7-27'

TIME: Time of day in hours, minutes and seconds

Example: TIME '09:00:30' or TIME '09:00:30.75'

TIMESTAMP: Date plus time of the day

Example: TIMESTAMP '2005-7-27 09:00:30.75'

INTERVAL: Period of time

Example: INTERVAL '1' day

Subtracting a date/time/timestamp value from another gives an interval value

Interval values can be added to date/time/timestamp values

Index creation

CREATE TABLE student
( id VARCHAR(5),
name VARCHAR(20) NOT NULL,
dept_name VARCHAR(20),
tot_cred NUMERIC(3, 0) DEFAULT 0,
PRIMARY KEY (id));

CREATE INDEX studentid_index ON student(id);

Indices are data structures used to speed up access to records with specified values for index attributes

SELECT * FROM student
WHERE id = '12345';

This query can be executed by using the index to find the required record, without looking at all the records of student

User-defined types
CREATE TYPE construct in SQL creates user-defined type (alias, like typedef in C)

CREATE TYPE Dollars AS NUMERIC(12, 2) FINAL;

CREATE TABLE department (
dept_name VARCHAR(20),
building VARCHAR(15),
budget Dollars);

Domains
CREATE DOMAIN construct in SQL-92 creates user-defined domain types

CREATE DOMAIN person_name CHAR(20) NOT NULL;

Types and domains are similar

Domains can have constraints such as NOT NULL specified on them

CREATE DOMAIN degree_level VARCHAR(10)
CONSTRAINT degree_level_test
CHECK (VALUE IN ('Bachelors', 'Masters', 'Doctorate'));

Large-object types
Large objects (photos, videos, CAD files, etc.) are stored as a large object:

blob: binary large object - object is a large collection of uninterpreted binary data (whose interpretation is left to
an application outside of the database system)

clob: character large object - object is a large collection of character data

When a query returns a large object, a pointer is returned rather than the large object itself

Authorization
Forms of authorization on parts of the database:

Read: allows reading, but not modification of data

Insert: allows insertion of new data, but not modification of existing data

Update: allows modification, but not deletion of data

Delete: allows deletion of data

Forms of authorization to modify the database schema

Index: allows creation and deletion of indices

Resources: allows creation of new relations

Alteration: allows addition or deletion of attributes in a relation

Drop: allows deletion of relations

Authorization Specification of SQL


The GRANT statement is used to confer authorization

GRANT <privilege list>
ON <relation name or view name> TO <user list>

<user list> is:

A user-id

PUBLIC, which allows all valid users the privilege granted

A role

Granting a privilege on a view does not imply granting any privileges on the underlying relations

The grantor of the privilege must already hold the privilege on specified item (or be the database administrator)

Privileges in SQL
SELECT: allows read access to relation or the ability to query using the view

Example: grant users U1, U2 and U3 SELECT authorization on the instructor relation:

GRANT SELECT ON instructor TO U1, U2, U3

INSERT: the ability to insert tuples

UPDATE: the ability to update using the SQL update statement

DELETE: the ability to delete tuples

ALL PRIVILEGES: used as a short form for all the allowable privileges

Revoking authorization in SQL


The REVOKE statement is used to revoke authorization

REVOKE <privilege list>
ON <relation name or view name> FROM <user list>

Example:
REVOKE SELECT ON branch FROM U1, U2, U3

<privilege list> may be ALL to revoke all privileges the revokee may hold

If <user list> includes PUBLIC, all users lose the privilege except those granted it explicitly

If the same privilege was granted twice to the same user by different grantors, the user may retain the privilege after
the revocation

All privileges that depend on the privilege being revoked are also revoked

Roles
CREATE ROLE instructor;

GRANT instructor TO Amit;

Privileges can be granted to roles:

GRANT SELECT ON takes TO instructor;

Roles can be granted to users as well as to other roles

CREATE ROLE teaching_assistant;

GRANT teaching_assistant TO instructor;

Instructor inherits all privileges of teaching_assistant

Chain of roles

CREATE ROLE dean;

GRANT instructor TO dean;

GRANT dean TO Satoshi;

Authorization on views

CREATE VIEW geo_instructor AS
(SELECT *
FROM instructor
WHERE dept_name = 'Geology');
GRANT SELECT ON geo_instructor TO geo_staff;

Suppose that a geo_staff member issues

SELECT *
FROM geo_instructor;

What if

geo_staff does not have permissions on instructor?

the creator of the view did not have some permissions on instructor?

Other authorization features


REFERENCES privilege to create foreign key

GRANT REFERENCES (dept_name) ON department TO Mariano;

Why is this required?

Transfer of privileges

GRANT SELECT ON department TO Amit WITH GRANT OPTION;

REVOKE SELECT ON department FROM Amit, Satoshi CASCADE;

REVOKE SELECT ON department FROM Amit, Satoshi RESTRICT;

📚
Week 3 Lecture 5
Class BSCCS2001

Created @September 26, 2021 10:24 PM

Materials

Module # 15

Type Lecture

Week # 3

Advanced SQL
Functions and Procedural Constructs

Native Language ← → Query Language

Functions and Procedures
Functions / Procedures and Control Flow statements were added in SQL:1999

Functions/Procedures can be written in SQL itself or in an external programming language like C, Java, etc

Functions written in an external language are particularly useful with specialized data types such as images and
geometric objects

Example: Functions to check if polygons overlap or to compare images for similarity

Some database systems support table-valued functions which can return a relation as a result

SQL:1999 also supports a rich set of imperative constructs, including loops , if-then-else and assignment

Many databases have proprietary procedural extensions to SQL that differ from SQL:1999

SQL Functions
Define a function that, given the name of a department, returns the count of the number of instructors in that
department

CREATE FUNCTION dept_count (dept_name VARCHAR(20))
RETURNS INTEGER
BEGIN
DECLARE d_count INTEGER;
SELECT COUNT(*) INTO d_count
FROM instructor
WHERE instructor.dept_name = dept_name;
RETURN d_count;
END

The function dept_count can be used to find the department names and budget of all departments with more than 12
instructors:

SELECT dept_name, budget
FROM department
WHERE dept_count (dept_name) > 12;

Compound statement: BEGIN ... END

May contain multiple SQL statements between BEGIN and END

RETURNS: indicates the variable type that is returned (e.g., integer)

RETURN: specifies the values that are to be returned as the result of invoking the function

SQL functions are in fact parameterized views that generalize the regular notion of views by allowing parameters

Table functions
Functions that return a relation as a result were added in SQL:2003

Return all instructors in a given department:

CREATE FUNCTION instructor_of (dept_name CHAR(20))
RETURNS TABLE (
id VARCHAR(5),
name VARCHAR(20),
dept_name VARCHAR(20),
salary NUMERIC(8, 2))
RETURN TABLE
( SELECT id, name, dept_name, salary
FROM instructor
WHERE instructor.dept_name = instructor_of.dept_name)

Usage

SELECT *
FROM TABLE (instructor_of('Music'));

SQL procedures
The dept_count function could instead be written as a procedure:

CREATE PROCEDURE dept_count_proc(
IN dept_name VARCHAR(20), OUT d_count INTEGER)
BEGIN
SELECT COUNT(*) INTO d_count
FROM instructor
WHERE instructor.dept_name = dept_count_proc.dept_name;
END

Procedures can be invoked either from an SQL procedure or from embedded SQL, using the CALL statement

DECLARE d_count INTEGER;


CALL dept_count_proc('Physics', d_count);

Procedures and functions can be invoked also from dynamic SQL

SQL:1999 allows overloading - more than one function/procedure of the same name as long as the number of
arguments and/or the types of the arguments differ

Language constructs for procedures and functions


SQL supports constructs that give it almost all the power of a general purpose programming language

Warning: Most database systems implement their own variant of the standard syntax

Compound statement: BEGIN ... END

May contain multiple SQL statements between BEGIN and END

Local variables can be declared within a compound statement

WHILE loop:

WHILE boolean expression DO


sequence of statements;
END WHILE;

REPEAT loop:

REPEAT
sequence of statements;
UNTIL boolean expression
END REPEAT;

FOR loop:

Permits iteration over all results of a query

Find the total budget of all departments

DECLARE n INTEGER DEFAULT 0;
FOR r AS
    SELECT budget FROM department
DO
    SET n = n + r.budget;
END FOR;
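The FOR loop above iterates over every row of a query result. A minimal Python sketch of the same idea, assuming an in-memory SQLite table with made-up budgets, loops over a cursor and accumulates the total, then cross-checks it against SUM:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_name TEXT PRIMARY KEY, budget INTEGER);
    INSERT INTO department VALUES
        ('Physics', 70000), ('Music', 80000), ('History', 50000);
""")

# DECLARE n INTEGER DEFAULT 0; FOR r AS SELECT budget ... DO SET n = n + r.budget
n = 0
for (budget,) in conn.execute("SELECT budget FROM department"):
    n = n + budget

# The declarative equivalent, for comparison.
total_from_sql = conn.execute("SELECT SUM(budget) FROM department").fetchone()[0]
print(n, total_from_sql)
```

In practice an aggregate such as SUM is preferable. The explicit loop is shown only to mirror the procedural construct.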

Conditional statements

if-then-else

case

if-then-else statement

IF boolean expression THEN


sequence of statements;
ELSEIF boolean expression THEN
sequence of statements;
...
ELSE
sequence of statements;
END IF;

The IF statement supports the use of optional ELSEIF clauses and a default ELSE clause

Example procedure: registers student after ensuring classroom capacity is not exceeded

Returns 0 on success and -1 if the capacity is exceeded

Simple CASE statement

CASE variable
WHEN value1 THEN
sequence of statements;
WHEN value2 THEN
sequence of statements;
...
ELSE
sequence of statements;
END CASE;

Each WHEN clause of the CASE statement names a value. The first WHEN clause whose value matches the variable determines which sequence of statements is executed

Searched CASE statement

CASE
WHEN sql-expression = value1 THEN
sequence of statements;
WHEN sql-expression = value2 THEN
sequence of statements;
...
ELSE
sequence of statements;
END CASE;

Any supported SQL expression can be used here. These expressions can contain references to variables,
parameters, special registers and more.

Signaling of exception conditions and declaring handlers for exceptions

DECLARE out_of_classroom_seats CONDITION
DECLARE EXIT HANDLER FOR out_of_classroom_seats
BEGIN
...
SIGNAL out_of_classroom_seats
...
END

The handler here is EXIT - causes enclosing BEGIN ... END to terminate and exit

Other actions possible on exception

External Language Routines


SQL:1999 allows the definition of functions/procedures in an imperative programming language (Java, C#, C or C++)
which can be invoked from SQL queries

Such functions can be more efficient than functions defined in SQL. The computations that cannot be carried out in
SQL can be executed by these functions

Declaring external language procedures and functions

CREATE PROCEDURE dept_count_proc(
    IN dept_name VARCHAR(20),
    OUT count INTEGER
)
LANGUAGE C
EXTERNAL NAME '/usr/avi/bin/dept_count_proc'

CREATE FUNCTION dept_count(dept_name VARCHAR(20))


RETURNS integer
LANGUAGE C
EXTERNAL NAME '/usr/avi/bin/dept_count'

Benefits of external language functions/procedures:

More efficient for many operations and more expressive power

Drawbacks:

Code to implement function may need to be loaded into the DB system and executed in the DB system's address
space

Risk of accidental corruption of the DB structures

Security risk, allowing users access to unauthorized data

There are alternatives, which provide good security at the cost of performance

Direct execution in the DB system's space is used when efficiency is more important than security

External Language Routines: Security


To deal with security problems, we can do one of the following:

Use sandbox techniques:

That is, use a safe language like Java, which cannot be used to access/damage other parts of the DB code

Run external language functions/procedures in a separate process, with no access to the DB process' memory

Parameters and results communicated via the inter-process communication

Both have performance overheads

Many DB systems support both of the above approaches as well as direct execution in the DB system's address space

Triggers
A TRIGGER defines a set of actions that are performed in response to an INSERT, UPDATE or DELETE operation
on a specified table

When such an SQL operation is executed, the trigger is said to have been activated

Triggers are optional

Triggers are defined using the CREATE TRIGGER statement

Triggers can be used

To enforce data integrity rules, alongside referential constraints and check constraints

To cause updates to other tables, automatically generate or transform values for inserted or updated rows, or
invoke functions to perform tasks such as issuing alerts

To design a trigger mechanism, we must:

Specify the events (like UPDATE, INSERT or DELETE) that cause the trigger to be executed

Specify the time (BEFORE or AFTER) of execution

Specify the actions to be taken when the trigger executes

Syntax of triggers may vary across systems

Types of Triggers: BEFORE


BEFORE triggers

Run before an UPDATE or INSERT

Values that are being updated or inserted can be modified before the DB is actually modified.

You can use triggers that run before an UPDATE or INSERT to ...

Check or modify the values before they are actually updated or inserted in the DB

Useful if the user view and the internal DB format differ

Run other non-DB operations coded in user-defined functions

BEFORE DELETE triggers

Run before a DELETE

Check values (and raise an error, if necessary)

Types of Triggers: AFTER


AFTER triggers

Run after an UPDATE, INSERT or DELETE

You can use triggers that run after an update or insert to:

Update data in other tables

Useful to maintain relationships between data or keep audit trail

Check against other data in the table or in other tables

Useful to ensure data integrity when referential integrity constraints aren't appropriate

When table check constraints limit checking to the current table only

Run non-DB operations coded in user-defined functions

Useful when issuing alerts or to update information outside the DB

Row level and Statement level Triggers


There are two types of triggers based on the level at which the triggers are applied:

Row level triggers are executed whenever a row is affected by the event on which the trigger is defined

Let Employee be a table with 100 rows.

Suppose an UPDATE statement is executed to increase the salary of each employee by 10%

Any row level UPDATE trigger configured on the table Employee will be executed 100 times during this update, once for
each affected row

Statement level triggers perform a single action for all the rows affected by a statement, instead of executing a
separate action for each affected row

Used for each statement instead of for each row

Uses referencing old table or referencing new table to refer to temporary tables called transition tables
containing the affected rows

Can be more efficient when dealing with SQL statements that update a large number of rows
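The 100-row Employee example can be checked with a small Python sketch using sqlite3. SQLite implements only row level (FOR EACH ROW) triggers, so this shows just the row level half of the comparison. The salary_log table and the log_salary trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, salary REAL);
    CREATE TABLE salary_log (emp_id INTEGER, old_salary REAL, new_salary REAL);

    -- Row level trigger: the body runs once per updated row.
    CREATE TRIGGER log_salary AFTER UPDATE OF salary ON employee
    FOR EACH ROW
    BEGIN
        INSERT INTO salary_log VALUES (OLD.id, OLD.salary, NEW.salary);
    END;
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)", [(i, 1000.0) for i in range(1, 101)]
)

# One UPDATE statement that touches all 100 rows.
conn.execute("UPDATE employee SET salary = salary * 1.10")
fired = conn.execute("SELECT COUNT(*) FROM salary_log").fetchone()[0]
print(fired)
```

A statement level trigger, in systems that support one, would run its body once for this UPDATE and see all affected rows through the transition tables.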

Triggering Events and Actions in SQL


Triggering event can be an INSERT, DELETE or UPDATE

Triggers on update can be restricted to specific attributes

For example: after update of grade on takes

Values of attributes before and after an update can be referenced

referencing old row as: for deletes and updates

referencing new row as: for inserts and updates

Triggers can be activated before an event, which can serve as extra constraints

For example: convert blank grades to null

CREATE TRIGGER setnull_trigger BEFORE UPDATE ON takes
REFERENCING NEW ROW AS nrow
FOR EACH ROW
WHEN (nrow.grade = '')
BEGIN ATOMIC
    SET nrow.grade = NULL;
END;

Trigger to maintain credits_earned value

CREATE TRIGGER credits_earned AFTER UPDATE OF grade ON takes
REFERENCING NEW ROW AS nrow
REFERENCING OLD ROW AS orow
FOR EACH ROW
WHEN nrow.grade <> 'F' AND nrow.grade IS NOT NULL
AND (orow.grade = 'F' OR orow.grade IS NULL)
BEGIN ATOMIC
UPDATE student
SET tot_cred = tot_cred +
( SELECT credits
FROM course
WHERE course.course_id = nrow.course_id)
WHERE student.id = nrow.id;
END;
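A runnable approximation of the credits_earned trigger, assuming SQLite through Python's sqlite3 module. SQLite uses the built-in NEW and OLD row names instead of a REFERENCING clause, and the single student, course and takes rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id TEXT PRIMARY KEY, tot_cred INTEGER);
    CREATE TABLE course (course_id TEXT PRIMARY KEY, credits INTEGER);
    CREATE TABLE takes (id TEXT, course_id TEXT, grade TEXT);

    CREATE TRIGGER credits_earned AFTER UPDATE OF grade ON takes
    FOR EACH ROW
    WHEN NEW.grade <> 'F' AND NEW.grade IS NOT NULL
         AND (OLD.grade = 'F' OR OLD.grade IS NULL)
    BEGIN
        UPDATE student
        SET tot_cred = tot_cred +
            (SELECT credits FROM course WHERE course.course_id = NEW.course_id)
        WHERE student.id = NEW.id;
    END;

    INSERT INTO student VALUES ('00128', 0);
    INSERT INTO course VALUES ('CS-101', 4);
    INSERT INTO takes VALUES ('00128', 'CS-101', NULL);
""")

def tot_cred():
    """Current total credits of the sample student."""
    return conn.execute(
        "SELECT tot_cred FROM student WHERE id = '00128'"
    ).fetchone()[0]

# NULL -> 'A': the course is passed for the first time, so credits are added.
conn.execute("UPDATE takes SET grade = 'A' WHERE id = '00128'")
after_first_pass = tot_cred()

# 'A' -> 'B': already passed, so the WHEN condition is false and nothing changes.
conn.execute("UPDATE takes SET grade = 'B' WHERE id = '00128'")
after_regrade = tot_cred()
print(after_first_pass, after_regrade)
```

The WHEN condition is what prevents credits from being counted twice when a passing grade is later changed to another passing grade.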

How to use triggers?


The optimal use of DML triggers is for short, simple and easy to maintain write operations that act largely independently
of an application's business logic

Typical and recommended uses of triggers include:

Logging changes to a history table

Auditing users and their actions against sensitive tables

Adding additional values to a table that may not be available to an application (due to security restrictions or other
limitations), such as:

Login/user name

Time an operation occurs

Server/database name

Simple validation

Source: SQL Server triggers: The good and the scary

How not to use triggers?


Triggers are like Lays: Once you pop, you cannot stop

One of the greatest challenges for architects and developers is to ensure that

triggers are used only as needed, and

to not allow them to become a one-size-fits-all solution for any data needs that happen to come along

Adding triggers is often seen as faster and easier than adding code to an application, but the cost of doing so is
compounded over time with each added line of code.

Source: SQL Server triggers: The good and the scary

Alright then, how to use triggers?


Triggers can become dangerous when:

There are too many

Trigger code becomes complex

Triggers go cross-server - across DBs over networks

Triggers call other triggers

Recursive triggers are set to ON. The DB-level setting is set to off by default

Functions, stored procedures or views are in triggers

Iteration occurs

Source: SQL Server triggers: The good and the scary

📚
Week 4 Lecture 1
Class BSCCS2001

Created @September 29, 2021 11:42 AM

Materials

Module # 16

Type Lecture

Week # 4

Formal Relational Query Languages


Relational Algebra

Procedural and Algebra based

Tuple Relational Calculus

Non-procedural and Predicate Calculus based

Domain Relational Calculus

Non-procedural and Predicate Calculus based

Relational Algebra
Created by Edgar F. Codd at IBM in 1970

Procedural Language

Six basic operators

Select: σ

Project: Π

Union: ∪

Set difference: −

Cartesian product: ×

Rename: ρ

The operators take one or two relations as inputs and produce a new relation as the result

SELECT operation
Notation: $\sigma_p(r)$

$p$ is called the selection predicate

Defined as:

$\sigma_p(r) = \{t \mid t \in r \text{ and } p(t)\}$

where $p$ is a formula in propositional calculus consisting of terms connected by

$\land$ (and)
$\lor$ (or)
$\lnot$ (not)
Each term is one of:

<attribute> op <attribute> or <attribute> op <constant>

where op is one of: $=, \neq, >, \geq, <, \leq$
Example of selection:

$\sigma_{dept\_name = \text{'Physics'}}(instructor)$

PROJECT operation
Notation: $\Pi_{A_1, A_2, \ldots, A_k}(r)$

where $A_1, A_2, \ldots, A_k$ are attribute names and $r$ is a relation

The result is defined as the relation of k columns obtained by erasing the columns that are not listed.

Duplicate rows are removed from the result, since relations are sets

Example: To eliminate the dept_name attribute of instructor

$\Pi_{ID, name, salary}(instructor)$
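The select and project operators can be sketched in a few lines of Python. This is only an illustration under assumptions: a relation is modeled as a list of dictionaries, the helper names select and project are hypothetical, and the three instructor rows are sample data.

```python
instructor = [
    {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci.", "salary": 65000},
    {"ID": "22222", "name": "Einstein", "dept_name": "Physics", "salary": 95000},
    {"ID": "33456", "name": "Gold", "dept_name": "Physics", "salary": 87000},
]

def select(r, p):
    """sigma_p(r): keep the tuples of r that satisfy predicate p."""
    return [t for t in r if p(t)]

def project(r, attrs):
    """Pi_attrs(r): keep only the listed attributes and drop duplicate rows."""
    seen = set()
    out = []
    for t in r:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

# sigma_{dept_name = 'Physics'}(instructor)
physics = select(instructor, lambda t: t["dept_name"] == "Physics")

# Pi_{dept_name}(instructor): two Physics rows collapse into one.
depts = project(instructor, ["dept_name"])
print(physics)
print(depts)
```

The duplicate check in project reflects the rule that relations are sets, so projecting onto dept_name yields two rows here, not three.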

UNION operation
Notation: $r \cup s$
Defined as: $r \cup s = \{t \mid t \in r \text{ or } t \in s\}$
For $r \cup s$ to be valid:
r, s must have the same arity (same number of attributes)
The attribute domains must be compatible (i.e., the same data type)

Example: To find all the courses taught in the Fall 2009 semester or in the Spring 2010 semester or in both

$\Pi_{course\_id}(\sigma_{semester = \text{"Fall"} \land year = 2009}(section)) \cup \Pi_{course\_id}(\sigma_{semester = \text{"Spring"} \land year = 2010}(section))$

DIFFERENCE operation
Notation: $r - s$
Defined as: $r - s = \{t \mid t \in r \text{ and } t \notin s\}$
Set differences must be taken between compatible relations

r and s must have the same arity

Attribute domains of r and s must be compatible

Example: To find all the courses taught in the Fall 2009 semester, but not in the Spring 2010 semester

$\Pi_{course\_id}(\sigma_{semester = \text{"Fall"} \land year = 2009}(section)) - \Pi_{course\_id}(\sigma_{semester = \text{"Spring"} \land year = 2010}(section))$

INTERSECTION operation
Notation: $r \cap s$
Defined as:

$r \cap s = \{t \mid t \in r \text{ and } t \in s\}$
Assume:

r, s have the same arity

Attributes of r and s are compatible

Note: r ∩ s = r − (r − s)
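Union, difference and intersection map directly onto Python's built-in set type. The sketch below uses made-up course identifiers for the two semesters and confirms the identity r ∩ s = r − (r − s) on that sample.

```python
# Hypothetical course_id projections for two semesters.
fall_2009 = {"CS-101", "CS-347", "PHY-101"}
spring_2010 = {"CS-101", "CS-315", "FIN-201"}

union = fall_2009 | spring_2010          # taught in either semester
difference = fall_2009 - spring_2010     # taught in Fall 2009 only
intersection = fall_2009 & spring_2010   # taught in both semesters

# Intersection expressed using set difference alone.
derived_intersection = fall_2009 - (fall_2009 - spring_2010)
print(sorted(union), sorted(difference), sorted(intersection))
```

Because intersection can be derived this way, it is not counted among the six basic operators.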

CARTESIAN-PRODUCT operation
Notation: $r \times s$
Defined as:

$r \times s = \{t\,q \mid t \in r \text{ and } q \in s\}$
Assume that attributes of r(R) and s(S) are disjoint

That is, $R \cap S = \emptyset$
If attributes of r(R) and s(S) are not disjoint, then renaming must be used.

RENAME operation
Allows us to name and, therefore, refer to the results of relational-algebra expressions

Allows us to refer to a relation by more than one name

Example:

$\rho_X(E)$
returns the expression E under the name X

If a relational algebra expression E has arity n, then

$\rho_{X(A_1, A_2, \ldots, A_n)}(E)$

returns the result of the expression E under the name X and with the attributes renamed to

$A_1, A_2, \ldots, A_n$

DIVISION operation
The division operation is applied to two relations

$R(Z) \div S(X)$, where $X \subseteq Z$

Let $Y = Z - X$ (and hence $Z = X \cup Y$)
that is, let Y be the set of attributes of R that are not attributes of S

The result of DIVISION is a relation $T(Y)$ that includes a tuple $t$ if tuples $t_R$ appear in $R$ with $t_R[Y] = t$, and with
$t_R[X] = t_S$ for every tuple $t_S$ in $S$
For a tuple t to appear in the result T of the DIVISION, the value in t must appear in R in combination with every
tuple in S

Division is a derived operation and can be expressed in terms of other operations

$r \div s \equiv \Pi_{R-S}(r) - \Pi_{R-S}\big((\Pi_{R-S}(r) \times s) - \Pi_{R-S,S}(r)\big)$
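To make the definition concrete, here is a small Python sketch that computes division two ways: directly from the definition, and through the derived expression built from projection, product and difference. It assumes a binary relation stored as a set of (lecturer, module) pairs, and the function names divide and divide_derived are hypothetical.

```python
r = {
    ("Brown", "Compilers"), ("Brown", "Databases"),
    ("Green", "Prolog"), ("Green", "Databases"),
    ("Lewis", "Prolog"), ("Smith", "Databases"),
}

def divide(r, s):
    """Definition: keep y if (y, x) is in r for every x in s."""
    candidates = {y for (y, _) in r}
    return {y for y in candidates if all((y, x) in r for x in s)}

def divide_derived(r, s):
    """Derived form: Pi(r) - Pi((Pi(r) x s) - r)."""
    proj = {y for (y, _) in r}                      # Pi_{R-S}(r)
    all_pairs = {(y, x) for y in proj for x in s}   # Pi_{R-S}(r) x s
    missing = all_pairs - r                         # pairings absent from r
    disqualified = {y for (y, _) in missing}        # Pi_{R-S} of the missing pairs
    return proj - disqualified

one_subject = divide(r, {"Prolog"})
two_subjects = divide(r, {"Databases", "Prolog"})
print(sorted(one_subject), sorted(two_subjects))
```

The derived form first lists every pairing that would have to exist, then removes the candidates for which at least one pairing is missing.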

DIVISION Example #1
R
Lecturer   Module
Brown      Compilers
Brown      Databases
Green      Prolog
Green      Databases
Lewis      Prolog
Smith      Databases

S
Subject
Prolog

R ÷ S
Lecturer
Green
Lewis

DIVISION Example #2
R
Lecturer   Module
Brown      Compilers
Brown      Databases
Green      Prolog
Green      Databases
Lewis      Prolog
Smith      Databases

S
Subject
Databases
Prolog

R ÷ S
Lecturer
Green

DIVISION Example #3
A
sno   pno
s1    p1
s1    p2
s1    p3
s1    p4
s2    p1
s2    p2
s3    p2
s4    p2
s4    p4

B1
pno
p2

A / B1
sno
s1
s2
s3
s4

B2
pno
p2
p4

A / B2
sno
s1
s4

B3
pno
p1
p2
p4

A / B3
sno
s1

DIVISION Example #4
Relation r, s

r
A   B
α   1
α   2
α   3
β   1
γ   1
δ   1
δ   3
δ   4
ε   6
ε   1
β   2

s
B
1
2

r ÷ s
A
α
β

DIVISION Example #5
Relation r, s:

r
A   B   C   D   E
α   a   α   a   1
α   a   γ   a   1
α   a   γ   b   1
β   a   γ   a   1
β   a   γ   b   3
γ   a   γ   a   1
γ   a   γ   b   1
γ   a   β   b   1

s
D   E
a   1
b   1

r ÷ s
A   B   C
α   a   γ
γ   a   γ

eg: Students who have taken both "a" and "b" courses, with instructor "1"

(Find all the students who have taken all courses given by the instructor 1)

📚
Week 4 Lecture 2
Class BSCCS2001

Created @September 30, 2021 10:23 AM

Materials

Module # 17

Type Lecture

Week # 4

Formal Relational Query Languages (part 2)


Predicate Logic
Predicate Logic or Predicate Calculus is an extension of Propositional Logic or Boolean Algebra

It adds the concept of predicates and quantifiers to better capture the meaning of statements that cannot be adequately
expressed by propositional logic

Tuple Relational Calculus and Domain Relational Calculus are based on Predicate Calculus

Predicate
Consider the statement: "x is greater than 3"

It has 2 parts

The first part is the variable x

It is the subject of the statement

The second part "is greater than 3"

It is the predicate of the statement

This refers to the property that the subject of the statement can have

The statement "x is greater than 3" can be denoted by P (x) where P denotes the predicate "is greater than 3" and x
is the variable

The predicate P can be considered as a function. It tells the truth value of the statement P (x) at x

Once a value has been assigned to the variable x, the statement P(x) becomes a proposition and has a truth value of
true or false

In general, a statement involving n variables $x_1, x_2, x_3, \ldots, x_n$ can be denoted by $P(x_1, x_2, x_3, \ldots, x_n)$

Here, P is also referred to as the n-place predicate or an n-ary predicate

Quantifiers
In predicate logic, predicates are used alongside quantifiers to express the extent to which a predicate is true over a range
of elements

Using quantifiers to create such propositions is called quantification

There are 2 types of quantifiers:

Universal Quantifier

Existential Quantifier

Universal Quantifier
Universal Quantification: Mathematical statements sometimes assert that a property is true for all the values of a
variable in a particular domain, called the Domain of Discourse

Such a statement is expressed using universal quantification

The universal quantification of P(x) for a particular domain is the proposition that asserts that P(x) is true for all
values of x in this domain

The domain is very important here since it decides the possible values of x

Formally, the universal quantification of P (x) is the statement "P (x) for all values of x in the domain"

The notation $\forall x\, P(x)$ denotes the universal quantification of $P(x)$

Here, $\forall$ is called the universal quantifier

$\forall x\, P(x)$ is read as "for all x, P(x)"


Example: Let P (x) be the statement "x + 2 > x"
What is the truth value of the statement ∀xP (x)?

Solution: As x + 2 is greater than x for any real number, so P (x) ≡ T for all x or ∀xP (x) ≡ T

Existential Quantifier
Existential Quantification: Some mathematical statements assert that there is an element with a certain property

Such statements are expressed by existential quantification

Existential quantification can be used to form a proposition that is true if and only if P (x) is true for at least one value of
x in the domain
Formally, the existential quantification of P (x) is the statement "There exists an element x in the domain such that
P (x)"
The notation $\exists x\, P(x)$ denotes the existential quantification of $P(x)$

Here, $\exists$ is called the existential quantifier

$\exists x\, P(x)$ is read as "There is at least one x such that P(x)"


Example: Let P (x) be the statement "x > 5"
What is the truth value of the statement ∃xP (x)?

Solution: P(x) is true for all real numbers greater than 5 and false for all real numbers less than or equal to 5

Since P(x) is true for at least one value of x, $\exists x\, P(x) \equiv T$
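Over a finite domain, the two quantifiers correspond to Python's all() and any(). The sketch below assumes a small integer range as a stand-in for the domain of discourse. A finite sample can illustrate, but not prove, a statement about all real numbers.

```python
# Finite sample standing in for the domain of discourse (an assumption).
domain = range(-10, 11)

def P(x):
    """The predicate "x + 2 > x"."""
    return x + 2 > x

def Q(x):
    """The predicate "x > 5"."""
    return x > 5

forall_P = all(P(x) for x in domain)   # universal quantification of P
exists_Q = any(Q(x) for x in domain)   # existential quantification of Q
forall_Q = all(Q(x) for x in domain)   # fails: Q is false for x <= 5
print(forall_P, exists_Q, forall_Q)
```

Changing the domain changes the answers: over range(6, 11) the universal quantification of Q would hold as well, which is why the domain must always be stated.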

Tuple Relational Calculus


TRC is a non-procedural query language, where each query is of the form

{t∣P (t)}
where t = resulting tuples

P (t) = known as predicate and these are the conditions that are used to fetch t
P (t) may have various conditions logically combined with OR( ∨ ), AND( ∧ ), NOT( ¬ )

It also uses quantifiers:

$\exists t \in r\,(Q(t))$ = "there exists" a tuple t in relation r such that predicate Q(t) is true
$\forall t \in r\,(Q(t))$ = Q(t) is true "for all" tuples t in relation r
$\{P \mid \exists S \in Students\,(S.CGPA > 8 \land P.name = S.name \land P.age = S.age)\}$:
returns the name and age of students with a CGPA above 8

Predicate Calculus Formula


Set of attributes and constants

Set of comparison operators: (e.g., $<, \leq, =, \neq, >, \geq$)
Set of connectives: and(∧), or(∨), not(¬)

Implication (⇒): x ⇒ y, if x is true, then y is true


x ⇒ y ≡ ¬x ∨ y
Set of quantifiers:

∃t ∈ r(Q(t)) ≡ "there exists" a tuple t in relation r such that predicate Q(t) is true
∀t ∈ r(Q(t)) ≡ Q is true "for all" tuples t in relation r

TRC Example #1
Student

Fname Lname Age Course

David Sharma 27 DBMS

Aaron Lilly 17 JAVA

Sahil Khan 19 Python

Sachin Rao 20 DBMS

Varun George 23 JAVA

Simi Verma 22 JAVA

Q. 1: Obtain the first name of students whose age is greater than 21

Solution:

$\{t.Fname \mid t \in Student \land t.Age > 21\}$
$\{t \mid \exists s \in Student\,(s.Age > 21 \land t.Fname = s.Fname)\}$

Fname

David

Varun

Simi
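A TRC query reads almost like a Python set comprehension. As a sketch, assuming the Student table is held as a set of named tuples with the rows shown above:

```python
from collections import namedtuple

Student = namedtuple("Student", ["Fname", "Lname", "Age", "Course"])

student = {
    Student("David", "Sharma", 27, "DBMS"),
    Student("Aaron", "Lilly", 17, "JAVA"),
    Student("Sahil", "Khan", 19, "Python"),
    Student("Sachin", "Rao", 20, "DBMS"),
    Student("Varun", "George", 23, "JAVA"),
    Student("Simi", "Verma", 22, "JAVA"),
}

# {t.Fname | t in Student and t.Age > 21}
result = {t.Fname for t in student if t.Age > 21}
print(sorted(result))
```

The comprehension states which tuples qualify without saying how to search for them, which is the non-procedural character of the calculus.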

TRC Example #2
Consider the relational schema

student(rollNo, name, year, courseId)


course(courseId, cname, teacher)

Q. 2: Find out the names of all the students who have taken the course named 'DBMS'

$\{t \mid \exists s \in student\ \exists c \in course\,(s.courseId = c.courseId \land c.cname = \text{'DBMS'} \land t.name = s.name)\}$

$\{s.name \mid s \in student \land \exists c \in course\,(s.courseId = c.courseId \land c.cname = \text{'DBMS'})\}$
Q. 3: Find out the names of all students and their rollNo who have taken the course named 'DBMS'

$\{s.name, s.rollNo \mid s \in student \land \exists c \in course\,(s.courseId = c.courseId \land c.cname = \text{'DBMS'})\}$
$\{t \mid \exists s \in student\ \exists c \in course\,(s.courseId = c.courseId \land c.cname = \text{'DBMS'} \land t.name = s.name \land t.rollNo = s.rollNo)\}$

TRC Example #3
Consider the following relations:

Flights(flno, from, to, distance, departs, arrives)


Aircraft(aid, aname, cruisingrange)
Certified(eid, aid)
Employees(eid, ename, salary)

Q. 4: Find the eids of pilots certified for Boeing aircraft

RA

$\Pi_{eid}(\sigma_{aname = \text{'Boeing'}}(Aircraft \bowtie Certified))$


TRC

$\{C.eid \mid C \in Certified \land \exists A \in Aircraft\,(A.aid = C.aid \land A.aname = \text{'Boeing'})\}$

$\{T \mid \exists C \in Certified\ \exists A \in Aircraft\,(A.aid = C.aid \land A.aname = \text{'Boeing'} \land T.eid = C.eid)\}$

TRC Example #4
Consider the following relations:

Flights (flno, from, to, distance, departs, arrives)


Aircraft (aid, aname, cruisingrange)
Certified (eid, aid)
Employees (eid, ename, salary)

Q. 5: Find the names and salaries of certified pilots working on Boeing aircraft

RA

$\Pi_{ename, salary}(\sigma_{aname = \text{'Boeing'}}(Aircraft \bowtie Certified \bowtie Employees))$


TRC

$\{P \mid \exists E \in Employees\ \exists C \in Certified\ \exists A \in Aircraft\,(A.aid = C.aid \land A.aname = \text{'Boeing'} \land E.eid = C.eid \land P.ename = E.ename \land P.salary = E.salary)\}$

TRC Example #5
Consider the following relations:

Flights (flno, from, to, distance, departs, arrives)


Aircraft (aid, aname, cruisingrange)
Certified (eid, aid)
Employees (eid, ename, salary)

Q. 6: Identify the flights that can be piloted by every pilot whose salary is more than $100,000

$\{F.flno \mid F \in Flights \land \exists C \in Certified\ \exists E \in Employees\,(E.salary > 100000 \land E.eid = C.eid)\}$

Safety of Expressions
It is possible to write tuple calculus expressions that generate infinite relations

For example, {t ∣ ¬t ∈ r} results in an infinite relation if the domain of any attribute of the relation r is infinite
To guard against the problem, we restrict the set of allowable expressions to safe expressions

An expression {t ∣ P (t)} in the tuple relational calculus is safe if every component of t appears in one of the
relations, tuples or constants that appear in P

NOTE: This is more than just a syntax condition

Eg: {t ∣ t[A] = 5 ∨ true} is not safe → it defines an infinite set with attribute values that do not appear in any
relation or tuples or constants in P

Domain Relational Calculus
A non-procedural query language equivalent in power to the tuple relational calculus

Each query is an expression of the form:

$\{\langle x_1, x_2, \ldots, x_n \rangle \mid P(x_1, x_2, \ldots, x_n)\}$

$x_1, x_2, \ldots, x_n$ represent domain variables
P represents a formula similar to that of the predicate calculus

Equivalence of Relational Algebra, Tuple Relational Calculus & Domain Relational Calculus
SELECT operation

R = (A, B)
Relational Algebra: $\sigma_{B = 17}(r)$

Tuple Calculus: $\{t \mid t \in r \land t[B] = 17\}$

Domain Calculus: $\{\langle a, b \rangle \mid \langle a, b \rangle \in r \land b = 17\}$

PROJECT operation

R = (A, B)
Relational Algebra: $\Pi_A(r)$

Tuple Calculus: $\{t \mid \exists p \in r\,(t[A] = p[A])\}$

Domain Calculus: $\{\langle a \rangle \mid \exists b\,(\langle a, b \rangle \in r)\}$

COMBINING operation

R = (A, B)
Relational Algebra: $\Pi_A(\sigma_{B = 17}(r))$

Tuple Calculus: $\{t \mid \exists p \in r\,(t[A] = p[A] \land p[B] = 17)\}$

Domain Calculus: $\{\langle a \rangle \mid \exists b\,(\langle a, b \rangle \in r \land b = 17)\}$

UNION

R = (A, B, C) S = (A, B, C)
Relational Algebra: $r \cup s$
Tuple Calculus: $\{t \mid t \in r \lor t \in s\}$
Domain Calculus: $\{\langle a, b, c \rangle \mid \langle a, b, c \rangle \in r \lor \langle a, b, c \rangle \in s\}$

SET DIFFERENCE

R = (A, B, C) S = (A, B, C)
Relational Algebra: $r - s$
Tuple Calculus: $\{t \mid t \in r \land t \notin s\}$
Domain Calculus: $\{\langle a, b, c \rangle \mid \langle a, b, c \rangle \in r \land \langle a, b, c \rangle \notin s\}$

INTERSECTION

R = (A, B, C) S = (A, B, C)
Relational Algebra: $r \cap s$
Tuple Calculus: $\{t \mid t \in r \land t \in s\}$
Domain Calculus: $\{\langle a, b, c \rangle \mid \langle a, b, c \rangle \in r \land \langle a, b, c \rangle \in s\}$

CARTESIAN / CROSS PRODUCT

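R = (A, B) S = (C, D)
Relational Algebra: $r \times s$
Tuple Calculus: $\{t \mid \exists p \in r\ \exists q \in s\,(t[A] = p[A] \land t[B] = p[B] \land t[C] = q[C] \land t[D] = q[D])\}$
Domain Calculus: $\{\langle a, b, c, d \rangle \mid \langle a, b \rangle \in r \land \langle c, d \rangle \in s\}$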

NATURAL JOIN

R = (A, B, C, D) S = (B, D, E)
Relational Algebra:

$r \bowtie s$
$\Pi_{r.A, r.B, r.C, r.D, s.E}(\sigma_{r.B = s.B \land r.D = s.D}(r \times s))$
Tuple Calculus:

$\{t \mid \exists p \in r\ \exists q \in s\,(t[A] = p[A] \land t[B] = p[B] \land t[C] = p[C] \land t[D] = p[D] \land t[E] = q[E] \land p[B] = q[B] \land p[D] = q[D])\}$
Domain Calculus:

$\{\langle a, b, c, d, e \rangle \mid \langle a, b, c, d \rangle \in r \land \langle b, d, e \rangle \in s\}$

DIVISION

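R = (A, B) S = (B)
Relational Algebra: $r \div s$
Tuple Calculus: $\{t \mid \exists p \in r\ \forall q \in s\,(p[B] = q[B] \Rightarrow t[A] = p[A])\}$
Domain Calculus: $\{\langle a \rangle \mid \langle a \rangle \in r \land \forall \langle b \rangle\,(\langle b \rangle \in s \Rightarrow \langle a, b \rangle \in r)\}$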

Source: https://www2.cs.sfu.ca/CourseCentral/354/louie/Equiv_Notations.pdf
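The equivalence of the three notations can be checked on a toy relation. The Python sketch below, with an invented relation r over (A, B), writes the combined select-and-project query in an algebra style, a tuple calculus style and a domain calculus style. The helper names are hypothetical.

```python
from collections import namedtuple

Row = namedtuple("Row", ["A", "B"])
r = {Row(1, 17), Row(2, 5), Row(3, 17), Row(1, 4)}

# Relational algebra style: compose operators, Pi_A(sigma_{B=17}(r)).
def select_b17(rel):
    return {t for t in rel if t.B == 17}

def project_a(rel):
    return {t.A for t in rel}

ra = project_a(select_b17(r))

# Tuple calculus style: {t | exists p in r (t[A] = p[A] and p[B] = 17)}.
# Candidate values are limited to those occurring in r, which keeps the query safe.
candidates = {p.A for p in r}
trc = {a for a in candidates if any(p.A == a and p.B == 17 for p in r)}

# Domain calculus style: {<a> | exists b (<a, b> in r and b = 17)}.
drc = {a for (a, b) in r if b == 17}

print(ra, trc, drc)
```

All three forms yield the same set here, matching the claim that the algebra and the two safe calculi have the same expressive power.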

📚
Week 4 Lecture 3
Class BSCCS2001

Created @September 30, 2021 4:40 PM

Materials

Module # 18

Type Lecture

Week # 4

Entity-Relationship Model
Design Process
What is a Design?
A Design:

Satisfies a given (perhaps informal) functional specification

Conforms to the limitations of the target medium

Meets implicit or explicit requirements on performance and resource usage

Satisfies implicit or explicit design criteria on the form of the artifact

Satisfies restrictions on the design itself, such as its length or cost, or the tools available for doing the design

Role of Abstraction
Disorganized Complexity results from

Storage (STM) limitations of the human brain - an individual can simultaneously comprehend on the order of
seven, plus or minus two, chunks of information

Speed limitations of human brain - it takes the mind about five seconds to accept a new chunk of information

Abstraction provides the major tool to handle Disorganized Complexity by chunking information

Ignore inessential details, deal only with the generalized, idealized model of the world

Consider: A binary number 110010101001

Hard to remember

Try the octal form: (110)(010)(101)(001) ⟹ 6251
Or the hex form: (1100)(1010)(1001) ⟹ CA9
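The chunking can be reproduced with Python built-ins. In this sketch the chunk helper is a hypothetical name: it groups the bit string into 3-bit and 4-bit pieces, and the conversions confirm the octal and hex forms given above.

```python
bits = "110010101001"

def chunk(s, size):
    """Split s into consecutive pieces of the given size."""
    return [s[i:i + size] for i in range(0, len(s), size)]

octal_chunks = chunk(bits, 3)   # each 3-bit group is one octal digit
hex_chunks = chunk(bits, 4)     # each 4-bit group is one hex digit

value = int(bits, 2)
octal_form = format(value, "o")
hex_form = format(value, "X")
print(octal_chunks, octal_form)
print(hex_chunks, hex_form)
```

Twelve binary digits shrink to four octal or three hex digits, which is the chunking effect that abstraction relies on.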

Model Building
Physics: Time-Distance Equation, Quantum Mechanics

Chemistry: Valency-bond Structures

Geography: Maps, Projections

Electrical Circuits: Kirchhoff's Loop Equations, Time Series Signals and FFT, Transistor Models, Schematic Diagrams,
Interconnect Routing

Building & Bridges: Drawings (Plan, Elevation, Side view), Finite Element Models

Models are common in all engineering disciplines

Model building follows principles of decomposition, abstraction and hierarchy

Each model describes a specific aspect of the system

Build new models upon old proven models

Design Approach
Requirement Analysis: Analyse the data needs of the prospective DB users

Planning

System Defining

DB Designing: Use a modeling framework to create abstraction of the real world

Logical Model

Physical Model

Implementation

Data Conversion and Loading

Testing

Logical Model: Deciding on a good DB schema

Business Decision: What attributes should we record in the DB?

Computer Science Decision: What relation schema should we have and how should the attributes be distributed
among the various relation schema?

Physical Model: Deciding on the physical layout of the DB

Entity Relationship Model

Models an enterprise as a collection of entities and relationships

Entity → A distinguishable "thing" or "object" in the enterprise

Described by a set of attributes

Relationship → An association among multiple entities

Represented by an Entity-Relationship or ER diagram

Database Normalization

Formalize what designs are bad and test for them

Entity Relationship (ER) Model


ER Model: Database Modeling
The ER data model was developed to facilitate DB design by allowing specification of an enterprise schema that
represents the overall logical structure of a DB

The ER model is useful in mapping the meanings and interactions of the real world enterprises onto a conceptual
schema

The ER data model employs three basic concepts:

Attributes

Entity sets

Relationship sets

The ER model also has an associated diagrammatic representation, the ER diagram, which can express the overall
logical structure of a DB graphically

Attributes
An attribute is a property associated with an entity / entity set

Based on the values of certain attributes, an entity can be identified uniquely

Attribute types:

Simple and Composite attributes

Single-valued and Multi-valued attributes

Example: Multi-valued attribute: phone_numbers

Derived attributes

Can be computed from other attributes

Example: age, given date_of_birth

Domain: The set of permitted values for each attribute

Attributes: Composite

Entity sets
An entity is an object that exists and is distinguishable from other objects

Example: specific person, company, event, plant

An entity set is a set of entities of the same type that share the same properties

Example: set of all persons, companies, trees, holidays

An entity is represented by a set of attributes: ie, descriptive properties possessed by all members of an entity set

Example:

instructor = (ID, name, street, city, salary)


course = (course_id, title, credits)

-- Here ID and course_id are the primary keys (the underlining is not rendered in these notes)

A subset of the attributes forms a primary key of the entity set; that is, it uniquely identifies each member of the set

Primary key of an entity set is represented by underlining it

Entity sets - instructor and student


instructor
instructor_id   instructor_name
76766           Crick
45565           Katz
10101           Srinivasan
98345           Kim
76543           Singh
22222           Einstein

student
student_id   student_name
98988        Tanaka
12345        Shankar
00128        Zhang
76543        Brown
76653        Aoi
23121        Chavez
44553        Peltier

Relationship sets
A relationship is an association among several entities

Example:

44553 (Peltier) [student entity] - advisor [relationship set] - 22222 (Einstein) [instructor entity]

A relationship set is a mathematical relation among $n \geq 2$ entities, each taken from entity sets
$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$
where $(e_1, e_2, \ldots, e_n)$ is a relationship

Example: (44553, 22222) ∈ advisor

Relationship set: advisor

An attribute can also be associated with a relationship set

For instance, the advisor relationship set between entity sets instructor and student may have the attribute date

which tracks when the student started being associated with the advisor

Binary relationship

involves two entity sets (or degree two)

most relationship sets in a database system are binary

Relationships between more than two entity sets are rare

Example: students work on research projects under the guidance of an instructor

Relationship proj_guide is a ternary relationship between instructor , student and project

Attributes: Redundant
Suppose we have entity sets:

instructor, with attributes: ID, name, dept_name, salary

department, with attributes: dept_name, building, budget

We model the fact that each instructor has an associated department using a relationship set inst_dept

The attribute dept_name appears in both entity sets

Since it is the primary key for the entity set department, it replicates information present in the relationship and is
therefore redundant in the entity set instructor and needs to be removed

BUT: When converting back to tables, in some cases the attribute gets reintroduced, as we will see later

Mapping Cardinality: Constraints


Express the number of entities to which another entity can be associated via a relationship set

Most useful in describing binary relationship sets

For a binary relationship set the mapping cardinality must be one of the following types:

One to One

One to Many

Many to One

Many to Many

Mapping Cardinalities

NOTE: Some elements in A and B may not be mapped to any elements in the other set

Weak Entity sets


An entity set may be one of the two types:

Strong entity set

A strong entity set is an entity set that contains sufficient attributes to uniquely identify all its entities

In other words, a primary key exists for a strong entity set

Primary key of a strong entity set is represented by underlining it

Weak entity set

A weak entity set is an entity set that does not contain sufficient attributes to uniquely identify its entities

In other words, a primary key does not exist for a weak entity set

However, it contains a partial key called the discriminator

Discriminator can identify a group of entities from the entity set

Discriminator is represented by underlining with a dashed line

Since a weak entity set does not have a primary key, it cannot independently exist in the ER model

It features in the model in relationship with a strong entity set

This is called the identifying relationship

Primary Key of a Weak entity set

The combination of discriminator and primary key of the strong entity set makes it possible to uniquely identify all
entities of the weak entity set

Thus, this combination serves as a primary key for the weak entity set

Clearly, this primary key is not formed by the weak entity set completely

Primary Key of a Weak Entity Set = Its own discriminator + Primary Key of Strong Entity Set

Weak entity set must have total participation in the identifying relationship

That is, all the entities must feature in the relationship

Weak Entity set: Example


Strong Entity Set: Building(building_no, buildname, address)

building_no is the primary key here

Weak Entity Set: Apartment(door_no, floor)

door_no is its discriminator, as door_no alone cannot identify an apartment uniquely

Apartments in several different buildings may have the same door number

Relationship: BA between Building and Apartment

By total participation in BA, each apartment must be present in at least one building

In contrast, Building has only partial participation in BA, as there might exist some buildings which have no apartments

Primary Key: To uniquely identify an apartment

First, building_no is required to identify the particular building

Second, door_no of the apartment is required to uniquely identify the apartment

Primary Key of Apartment = Primary Key of the Building + Its own discriminator = building_no + door_no
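
The Building / Apartment example can be sketched as tables using Python's built-in sqlite3 module. This is only an illustration under assumptions: the column types and the sample rows ('Alpha', 'Beta' and their addresses) are invented and do not come from the lecture.

```python
import sqlite3

def build_apartment_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    # Strong entity set: building_no alone is the primary key
    conn.execute("""
        CREATE TABLE building (
            building_no INTEGER PRIMARY KEY,
            buildname   TEXT,
            address     TEXT
        )""")
    # Weak entity set: primary key = key of Building + discriminator door_no
    conn.execute("""
        CREATE TABLE apartment (
            building_no INTEGER NOT NULL REFERENCES building(building_no),
            door_no     INTEGER NOT NULL,
            floor       INTEGER,
            PRIMARY KEY (building_no, door_no)
        )""")
    return conn

conn = build_apartment_db()
conn.execute("INSERT INTO building VALUES (1, 'Alpha', '1 Main St')")  # sample data
conn.execute("INSERT INTO building VALUES (2, 'Beta', '2 Main St')")   # sample data
conn.execute("INSERT INTO apartment VALUES (1, 101, 1)")
conn.execute("INSERT INTO apartment VALUES (2, 101, 1)")  # same door_no, other building
try:
    conn.execute("INSERT INTO apartment VALUES (1, 101, 3)")  # repeats (1, 101)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The same door_no is accepted in two different buildings, while repeating the pair (building_no, door_no) is rejected. The NOT NULL foreign key on building_no plays the role of total participation of Apartment in BA.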

Weak Entity set: Example #2


Consider a section entity, which is uniquely identified by a course_id, semester, year and sec_id

Clearly, section entities are related to course entities

Suppose we create a relationship set sec_course between entity sets section and course

Note that the information in sec_course is redundant, since section already has an attribute course_id, which identifies
the course with which the section is related

📚
Week 4 Lecture 4
Class BSCCS2001

Created @September 30, 2021 6:29 PM

Materials

Module # 19

Type Lecture

Week # 4

Entity-Relationship Model (part 2)


ER Diagram
Entity Sets
Entities can be represented graphically as follows:

Rectangles represent entity sets

Attributes are listed inside the entity rectangle

Underline indicates primary key attributes

[Figure: entity sets instructor (ID, name, salary) and student (ID, name, tot_cred)]

Relationship sets
Diamonds represent relationship sets

Relationship sets with attributes

Roles
The entity sets of a relationship need not be distinct

Each occurrence of an entity set plays a "role" in the relationship

The labels "course_id" and "prereq_id" are called roles

Cardinality Constraints
We express cardinality constraints by drawing either a directed line ( → ), signifying "one" or an undirected line (−),
signifying "many" between the relationship set and the entity set

One to One relationship between an instructor and a student:

A student is associated with at most one instructor via the relationship advisor

An instructor is associated with at most one student via the relationship advisor

One-to-Many relationship

One-to-Many relationship between an instructor and a student

An instructor is associated with several (including 0) students via advisor

A student is associated with at most one instructor via advisor

Many-to-Many relationship
An instructor is associated with several (including 0) students via advisor

A student is associated with several (including 0) instructors via advisor

Total and Partial participation


Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the
relationship set

participation of student in advisor relation is total

every student must have an associated instructor

Partial participation: some entities may not participate in any relationship in the relationship set

Example: participation of instructor in advisor is partial

Notation for expressing more complex constraints


A line may have an associated minimum and maximum cardinality, shown in the form l..h, where l is the minimum and
h is the maximum cardinality

A minimum value of 1 indicates total participation

A maximum value of 1 indicates that the entity participates in at most one relationship

A maximum value of ∗ indicates no limit

Instructor can advise 0 or more students

A student must have 1 advisor; cannot have multiple advisors
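
The l..h notation can be checked against a relationship instance with a short Python sketch. The function satisfies_bounds, its parameters and the sample identifiers (i1, s1, ...) are hypothetical; high=None is used here to stand for ∗.

```python
from collections import Counter

def satisfies_bounds(entities, relationship, position, low, high=None):
    """Check that every entity takes part in between `low` and `high`
    relationships; `high=None` stands for '*' (no limit)."""
    # Count how often each entity appears at the given tuple position
    counts = Counter(rel[position] for rel in relationship)
    for e in entities:
        n = counts[e]
        if n < low or (high is not None and n > high):
            return False
    return True

instructors = {"i1", "i2"}
students = {"s1", "s2"}
advisor = [("i1", "s1"), ("i1", "s2")]   # (instructor, student) pairs
```

With this sample, instructors satisfy 0..∗ (i2 advises nobody) and students satisfy 1..1 (each has exactly one advisor). A minimum of 1 fails as soon as some student has no advisor, which is the total participation requirement.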

Notation to express entity with complex attributes


instructor

ID
name
    first_name
    middle_initial
    last_name
address
    street
        street_number
        street_name
        apt_number
    city
    state
    zip
{ phone_number }
date_of_birth
age()

Expressing Weak entity sets


In ER diagrams, a weak entity set is depicted via a double rectangle

We underline the discriminator of a weak entity set with a dashed line

The relationship set connecting the weak entity set to the identifying strong entity set is depicted by a double diamond

Primary key for section - (course_id, sec_id, semester, year)

ER diagram for a University enterprise

ER Model to Relational Schema
Reduction to Relation Schema
Entity sets and relationship sets can be expressed uniformly as relation schemas that represent the contents of the
DB

A DB which conforms to an ER diagram can be represented by a collection of schemas

For each entity set and relationship set there is a unique schema that is assigned the name of the corresponding
entity set or relationship set

Each schema has a number of columns (generally corresponding to attributes) which have unique names

Representing entity sets


A strong entity set reduces to a schema with the same attributes

student (ID, name, tot_cred)

A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set

section (course_id, sec_id, semester, year)

Representing relationship sets
A many-to-many relationship set is represented as a schema with attributes for the primary keys of the two
participating entity sets and any descriptive attributes of the relationship set

Example: schema for relationship set advisor

advisor = (s_id, i_id)

Representation of entity sets with composite attributes


Composite attributes are flattened out by creating a separate attribute for each component attribute

Example: Given entity set instructor with composite attribute name with component attributes first_name and
last_name the schema corresponding to the entity set has two attributes name_first_name and
name_last_name

Prefix omitted if there is no ambiguity (name_first_name could simply be first_name)

Ignoring multi-valued attributes, extended instructor schema is

instructor (ID, first_name, middle_initial, last_name, street_number, street_name, apt_number, city, state, zip_code, date_of_birth)
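
Flattening composite attributes can be sketched as a small recursive Python function. The nested-dict encoding (a dict marks a composite attribute, None marks a simple one) and the name flatten are assumptions made for this illustration. The sketch always keeps the prefix, whereas the schema above drops it where there is no ambiguity.

```python
def flatten(attributes, prefix=""):
    """Flatten nested composite attributes into a list of column names.
    A dict value marks a composite attribute; None marks a simple one."""
    columns = []
    for name, components in attributes.items():
        full = f"{prefix}{name}"
        if isinstance(components, dict):
            # Recurse into the components, prefixing them with the parent name
            columns.extend(flatten(components, prefix=f"{full}_"))
        else:
            columns.append(full)
    return columns

instructor = {
    "ID": None,
    "name": {"first_name": None, "middle_initial": None, "last_name": None},
    "address": {
        "street": {"street_number": None, "street_name": None, "apt_number": None},
        "city": None, "state": None, "zip": None,
    },
    "date_of_birth": None,
}
```

Calling flatten(instructor) yields eleven columns, one per simple attribute, matching the number of attributes in the extended instructor schema. The multi-valued phone_number and the derived age() are left out, as in the notes.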

Representation of Entity sets with multi-valued attributes


A multi-valued attribute M of an entity E is represented by a separate schema EM

Schema EM has attributes corresponding to the primary key of E and an attribute corresponding to multi-valued
attribute M

Example: Multi-valued attribute phone_number of instructor is represented by a schema:

inst_phone = (ID, phone_number)

Each value of the multi-valued attribute maps to a separate tuple of the relation on schema EM

For example: an instructor entity with primary key 22222 and phone numbers 456-7890 and 123-4567 maps to
two tuples: (22222, 456-7890) and (22222, 123-4567)
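
The mapping from a multi-valued attribute to tuples can be written as a one-line Python sketch; the function name multivalued_to_tuples is hypothetical.

```python
def multivalued_to_tuples(key, values):
    """Return one (key, value) tuple per value of a multi-valued attribute."""
    return [(key, v) for v in values]

# The instructor from the example above, with its two phone numbers
inst_phone = multivalued_to_tuples(22222, ["456-7890", "123-4567"])
```

An entity with no value for the attribute contributes no tuples at all, so no null values are needed in inst_phone.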

Redundancy of the Schema


Many-to-One and One-to-Many relationship sets that are total on the many-side can be represented by adding an
extra attribute to the "many" side, containing the primary key of the "one" side

Example: Instead of creating a schema for relationship set inst_dept, add an attribute dept_name to the schema
arising from entity set instructor

For One-to-One relationship sets, either side can be chosen to act as the "many" side

That is, an extra attribute can be added to either of the tables corresponding to the two entity sets

If participation is partial on the "many" side, replacing a schema by an extra attribute in the schema corresponding to
the "many" side could result in null values

The schema corresponding to a relationship set linking a weak entity set to its identifying strong entity set is redundant

Example: The section schema already contains the attributes that would appear in the sec_course schema
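
The inst_dept case above can be sketched with Python's sqlite3 module: instead of a separate inst_dept table, dept_name becomes an extra column of instructor. The column types and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn_dept = sqlite3.connect(":memory:")
conn_dept.execute(
    "CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT, budget REAL)")
# dept_name in instructor replaces a separate inst_dept schema
conn_dept.execute("""
    CREATE TABLE instructor (
        ID        TEXT PRIMARY KEY,
        name      TEXT,
        salary    REAL,
        dept_name TEXT REFERENCES department(dept_name)
    )""")
conn_dept.execute("INSERT INTO department VALUES ('Physics', 'Watson', 70000)")        # sample
conn_dept.execute("INSERT INTO instructor VALUES ('22222', 'Einstein', 95000, 'Physics')")  # sample
# An instructor without a department: partial participation forces a NULL
conn_dept.execute("INSERT INTO instructor VALUES ('99999', 'Nobody', 50000, NULL)")
rows = conn_dept.execute("SELECT ID, dept_name FROM instructor ORDER BY ID").fetchall()
```

The second instructor shows why partial participation on the "many" side leads to null values. If participation were total, dept_name could be declared NOT NULL.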

📚
Week 4 Lecture 5
Class BSCCS2001

Created @September 30, 2021 8:37 PM

Materials

Module # 20

Type Lecture

Week # 4

Entity-Relationship Model (part 3)


Extended ER features
Non-binary Relationship sets
Most relationship sets are binary

There are occasions when it is more convenient to represent relationships as non-binary

ER diagram with a Ternary Relationship

Cardinality constraints on Ternary Relationship

We allow at most one arrow out of a ternary (or greater degree) relationship to indicate a cardinality constraint

For example, an arrow from proj_guide to instructor indicates each student has at most one guide for a project

If there is more than one arrow, there are two ways of defining the meaning

For example, a ternary relationship R between A, B and C with arrows to B and C could mean

Each A entity is associated with a unique entity from B and C or

Each pair of entities from (A, B) is associated with a unique C entity, and each pair (A, C) is associated with a unique B

Each alternative has been used in different formalisms

To avoid confusion we outlaw more than one arrow
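
The single-arrow constraint on proj_guide can be checked on an instance with a short Python sketch. It assumes the relationship is stored as (instructor, student, project) triples; the function name at_most_one_guide is made up for illustration.

```python
def at_most_one_guide(proj_guide):
    """True if every (student, project) pair has at most one instructor."""
    guide_of = {}
    for instructor, student, project in proj_guide:
        key = (student, project)
        # setdefault stores the first instructor seen for this pair
        if guide_of.setdefault(key, instructor) != instructor:
            return False
    return True
```

A student may still have different guides on different projects; only two different instructors for the same (student, project) pair violate the constraint.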

Specialization: ISA
Top-down design process: We designate sub-groupings within an entity set that are distinctive from other entities in
the set

These sub-groupings become lower-level entity sets that have attributes or participate in relationships that do not
apply to the higher-level entity set

Depicted by a triangle component labeled ISA (eg: instructor "is a" person)

Attribute inheritance: A lower-level entity set inherits all the attributes and relationship participation of the higher-
level entity set to which it is linked

Overlapping: employee and student

Disjoint: instructor and secretary

Total and Partial

Representing Specialization via Schema


Method 1:

Form a schema for the higher-level entity

Form a schema for each lower-level entity set, include primary key of higher-level entity set and local attributes

schema      attributes
person      ID, name, street, city
student     ID, tot_cred
employee    ID, salary

Drawback: Getting information about an employee requires accessing two relations, the one corresponding to the
low-level schema and the one corresponding to the high-level schema

Method 2:

Form a schema for each entity set with all local and inherited attributes

schema      attributes
person      ID, name, street, city
student     ID, name, street, city, tot_cred
employee    ID, name, street, city, salary

Drawback: name, street and city may be stored redundantly for people who are both students and employees
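
The drawback of Method 1 can be seen in a small sqlite3 sketch in Python: the full information about an employee has to be assembled by joining the two relations. Column types and the sample row are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Method 1: higher-level schema plus a lower-level schema with key and local attributes
db.execute("CREATE TABLE person (ID TEXT PRIMARY KEY, name TEXT, street TEXT, city TEXT)")
db.execute("CREATE TABLE employee (ID TEXT PRIMARY KEY REFERENCES person(ID), salary REAL)")
db.execute("INSERT INTO person VALUES ('p1', 'Asha', 'Main St', 'Chennai')")  # sample data
db.execute("INSERT INTO employee VALUES ('p1', 40000)")                       # sample data

# Full employee information needs a join of both relations
row = db.execute("""
    SELECT p.ID, p.name, p.street, p.city, e.salary
    FROM employee AS e JOIN person AS p ON p.ID = e.ID
""").fetchone()
```

Under Method 2 the same query would read a single employee table, at the price of storing name, street and city again for anyone who is also a student.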

Generalization
Bottom-up design process: Combine a number of entity sets that share the same features into a higher-level entity
set

Specialization and generalization are simple inversions of each other; they are represented in an ER diagram in the
same way

The terms specialization and generalization are used interchangeably

Design constraints on a specialization / generalization


Completeness constraint: Specifies whether or not an entity in the higher-level entity set must belong to at least one
of the lower-level entity sets within a generalization

total: an entity must belong to one of the lower-level entity sets

partial: an entity need not belong to one of the lower-level entity sets

Partial generalization is the default

We can specify total generalization in an ER diagram by adding the keyword total in the diagram and drawing a dashed line from the keyword to the corresponding hollow arrow-head to which it applies (for a total generalization) or to the set of hollow arrow-heads to which it applies (for an overlapping generalization)

The student generalization is total

All student entities must be either graduate or undergraduate

Because the higher-level entity set arrived at through generalization is generally composed of only those entities
in the lower-level entity sets, the completeness constraint for a generalized higher-level entity set is usually total
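
The completeness and disjointness constraints can be tested on sample data with two short Python functions. The function names and the representation of entity sets as Python sets of identifiers are assumptions made for this sketch.

```python
def is_total(higher, *lower_sets):
    """Every higher-level entity belongs to at least one lower-level set."""
    return set(higher) <= set().union(*lower_sets)

def is_disjoint(*lower_sets):
    """No entity belongs to more than one lower-level set."""
    seen = set()
    for s in lower_sets:
        if seen & set(s):
            return False
        seen |= set(s)
    return True
```

For the student example, is_total(student, graduate, undergraduate) holds exactly when every student is in at least one of the two lower-level sets, and is_disjoint(graduate, undergraduate) holds when nobody is in both.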

Aggregation
Consider the ternary relationship proj_guide, which we saw earlier

Suppose we want to record evaluations of a student by a guide on a project

Relationship sets eval_for and proj_guide represent overlapping information

Every eval_for relationship corresponds to a proj_guide relationship

However, some proj_guide relationships may not correspond to any eval_for relationships

So, we cannot discard the proj_guide relationship

Eliminate this redundancy via aggregation

Treat relationship as an abstract entity

Allows relationships between relationships

Abstraction of relationship into new entity

Without introducing redundancy, the following diagram represents:

A student is guided by a particular instructor on a particular project

A student, instructor, project combination may have an associated evaluation

Representing aggregation via Schema
To represent aggregation, create a schema containing

Primary key of the aggregated relationship

The primary key of the associated entity set

Any descriptive attributes

In our example

The schema eval_for is:

eval_for (s_ID, project_id, i_ID, evaluation_id)

The schema proj_guide is redundant

Design Issues
Entities v/s Attributes
Use of entity sets v/s attributes

Use of phone as an entity allows extra information about phone numbers (plus multiple phone numbers)

Entities v/s Relationship sets


Use of entity sets v/s relationship sets

Possible guideline is to designate a relationship set to describe an action that occurs between entities

Placement of relationship attributes

For example, attribute date as attribute of advisor or as attribute of student

Binary v/s Non-binary Relationships


Although it is possible to replace any non-binary (n-ary, for n > 2) relationship set by a number of distinct binary
relationship sets, an n-ary relationship set shows more clearly that several entities participate in a single relationship

Some relationships that appear to be non-binary may be better represented using binary relationships

For example, a ternary relationship parents, relating a child to his/her father and mother, is best replaced by two
binary relationships, father and mother

Using two binary relationships allows partial information (eg: only mother being known)

But there are some relationships that are naturally non-binary

Example: proj_guide

Binary v/s Non-binary Relationships: Conversion


In general, any non-binary relationship can be represented using binary relationships by creating an artificial entity set

Replace R between entity sets A, B and C by an entity set E, and three relationship sets:

R_A, relating E and A
R_B, relating E and B
R_C, relating E and C

Create an identifying attribute for E and add any attributes of R to E

For each relationship (a_i, b_i, c_i) in R, create

A new entity e_i in the entity set E

add (e_i, a_i) to R_A

add (e_i, b_i) to R_B

add (e_i, c_i) to R_C

Also need to translate constraints

Translating all constraints may not be possible

There may be instances in the translated schema that cannot correspond to any instance of R

Exercise: add constraints to the relationships R_A, R_B and R_C to ensure that a newly created entity
corresponds to exactly one entity in each of the entity sets A, B and C

We can avoid creating an identifying attribute by making E a weak entity set identified by the three relationship
sets
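
The conversion procedure above can be sketched in Python. The sketch assumes R is a list of (a, b, c) triples and invents identifiers e0, e1, ... for the artificial entity set E; the function names to_binary and from_binary are hypothetical.

```python
def to_binary(R):
    """Replace ternary relationship R (a list of (a, b, c) triples) by an
    artificial entity set E and binary relationship sets RA, RB, RC."""
    E, RA, RB, RC = [], [], [], []
    for i, (a, b, c) in enumerate(R):
        e = f"e{i}"          # identifying attribute created for E
        E.append(e)
        RA.append((e, a))
        RB.append((e, b))
        RC.append((e, c))
    return E, RA, RB, RC

def from_binary(E, RA, RB, RC):
    """Rebuild the triples, assuming each e maps to exactly one a, b and c."""
    a_of, b_of, c_of = dict(RA), dict(RB), dict(RC)
    return [(a_of[e], b_of[e], c_of[e]) for e in E]
```

The round trip only works because to_binary gives every new entity exactly one partner in each of the three relationship sets. That is the constraint the exercise asks for; without it, the binary schema admits instances that correspond to no instance of R.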

ER Design Decisions
The use of an attribute or entity set to represent an object

Whether a real-world concept is best expressed by an entity or a relationship set

The use of a ternary relationship versus a pair of binary relationships

The use of strong or weak entity set

The use of specialization/generalization — contributes to modularity in the design

The use of aggregation — can treat the aggregate entity set as a single unit without concern for the details of its
internal structure

Symbols used in the ER Notation
