Database-notes
What is a Database?
A database is a large, organized collection of data. The software that stores and manages it is called a database management system (DBMS).
Using database software makes it easier to find, add, update, and delete data.
[Figure: a DBMS consists of software to process queries and software to access stored data]
A file-based system is a way of managing data where each set of data is stored in separate files,
often in a flat, unstructured format. Its main drawbacks are:
1. Data Redundancy and Inconsistency: Data is often duplicated across multiple files,
leading to redundancy and potential inconsistencies.
2. Data Isolation: It is difficult to access and integrate data scattered across different files.
3. Integrity Problems: Enforcing data integrity rules (like constraints and relationships) is
difficult.
4. Atomicity Issues: Ensuring that all parts of a transaction are completed successfully (or
none at all) is complex.
5. Concurrent Access: Handling multiple users accessing and updating data simultaneously
is challenging.
6. No standards
7. Data dependence
8. No way to generate ad hoc queries
9. No provision for security, recovery, concurrency, etc
10. Security: Implementing robust security measures is harder compared to database
systems.
11. Lack of Data Independence: Changes in data structure often require changes in application
programs, unlike databases, which separate data from application logic.
Chapter: Database System Concepts and Architecture
Examples of DBMS:
-Oracle
-MS Access
-MySQL
-SQL Server
-DB2
-Ingres
-PostgreSQL
DDL is used to define and manage database structures. It includes commands that create, alter,
and delete database objects such as tables, indexes, and schemas.
DML is used to manipulate the data within the database. It includes commands that allow users
to insert, update, delete, and retrieve data from database tables.
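The difference between DDL and DML can be sketched with a few statements. This is a minimal illustration that runs the SQL through Python's built-in sqlite3 module; the student table and its columns are hypothetical and not taken from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define and change the structure (hypothetical table and columns).
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")

# DML: work with the rows stored inside that structure.
conn.execute("INSERT INTO student (id, name, email) VALUES (1, 'Rahim', 'rahim@example.com')")
conn.execute("INSERT INTO student (id, name, email) VALUES (2, 'Karim', NULL)")
conn.execute("UPDATE student SET email = 'karim@example.com' WHERE id = 2")
conn.execute("DELETE FROM student WHERE id = 1")
rows = conn.execute("SELECT id, name, email FROM student").fetchall()
print(rows)

# DDL again: remove the whole object, structure and data together.
conn.execute("DROP TABLE student")
```

After the DML statements only the row for student 2 is left; after DROP TABLE the table itself no longer exists.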
Procedural DML:
o Requires the user to specify what data is needed and how to get it.
o The user must define the sequence of operations to retrieve or manipulate the
data.
o Example: procedural extensions of SQL such as Oracle's PL/SQL or Microsoft's T-SQL,
where the user writes procedures and functions that include detailed steps and loops.
Non-Procedural DML:
o Requires the user to specify what data is needed without defining the sequence
of operations.
o The database system determines the optimal way to retrieve or manipulate the
data.
o Example: Standard SQL (SELECT, INSERT, UPDATE, DELETE), where the
user specifies the desired data through high-level queries.
Non-procedural DML is generally considered easier to use and more declarative, allowing the
database system to optimize the execution plan.
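The contrast between the two styles can be sketched as follows. The example uses Python's built-in sqlite3 module and a hypothetical account table; the Python loop stands in for procedural code, it is not PL/SQL or T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [(1, 500), (2, 1500), (3, 900)])

# Procedural style: we spell out HOW to get the result:
# visit every row, test it, and collect the matches ourselves.
procedural = []
for acc_no, balance in conn.execute("SELECT acc_no, balance FROM account"):
    if balance > 800:
        procedural.append(acc_no)

# Non-procedural style: we only state WHAT we want;
# the DBMS decides how to find the matching rows.
declarative = [row[0] for row in conn.execute(
    "SELECT acc_no FROM account WHERE balance > 800")]

print(sorted(procedural), sorted(declarative))
```

Both approaches return the same accounts, but only the second leaves the choice of access path to the database system.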
DCL is used to control access to data in the database. It includes commands that grant or revoke
permissions to users or groups of users.
View of Data
Schema – the logical structure of the database (overall design), e.g., the database consists of
information about a set of customers and accounts and the relationship between them.
Analogous to type information of a variable in a program
Physical Data Independence – the ability to modify the physical schema without changing the
logical schema
Applications depend on the logical schema
In general, the interfaces between the various levels and components should be well defined so
that changes in some parts do not seriously influence others.
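Physical data independence can be illustrated with an index. In the sketch below (Python's sqlite3 module, hypothetical customer table) the physical schema changes when the index is created, while the query written against the logical schema stays exactly the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Dhaka"), (2, "Chittagong"), (3, "Dhaka")])

# The application only knows the logical schema: table and column names.
query = "SELECT id FROM customer WHERE city = 'Dhaka' ORDER BY id"
before = conn.execute(query).fetchall()

# Physical change: add an index (hypothetical name idx_customer_city).
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
after = conn.execute(query).fetchall()

print(before == after)
```

The index may change how fast the rows are found, but not which rows the unchanged query returns.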
ANSI/SPARC stands for American National Standards Institute, Standards Planning And
Requirements Committee.
1) Internal Level:
• Deals with physical storage of data
• Structure of records on disk - files, pages, blocks
• Indexes and ordering of records
• Used by database system programmers
2) Conceptual Level:
• Deals with the organisation of the data as a whole
• Abstractions are used to remove unnecessary details of the internal level
• Used by DBAs and application programmers
3) External Level:
• Provides a view of the database tailored to a user
• Parts of the data may be hidden
• Data is presented in a useful form
• Used by end users and application programmers
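[Figure: three-level architecture. Users 1 to 3 work through External View 1 and External View 2; these map to the Conceptual View, which is maintained by the DBA and maps to the Stored Data.]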
A database system is partitioned into modules that deal with each of the responsibilities of the
overall system. The functional components of a database system can be broadly divided into the
storage manager and the query processor components.
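The storage manager provides the interface between the low-level data stored in the database and
the application programs and queries submitted to the system. It is responsible for storing,
retrieving, and updating data and for managing disk space and memory buffers.
The query processor is important because it helps the database system simplify and facilitate access
to data.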
In Disk-Based DBMS, the original database resides in the disk and a part of the database
is loaded in the main memory for processing.
In main-memory DBMS, original copy of the database resides in the main memory and a
back-up is kept in the disk.
Memory access is faster than disk access, so an MMDB is faster than a disk-based DBMS.
Main memory is normally volatile, so an MMDB needs non-volatile memory or a disk backup to survive a power failure.
A disk-based DBMS is highly scalable, whereas an MMDBMS has limited scalability.
Chapter: Data modeling
Data Modeling in Databases is the process of defining and structuring the data elements and
their relationships within a database, using conceptual, logical, and physical models. This
ensures efficient organization, storage, and retrieval of data in the database system.
Database Design
Conceptual design - Build a model independent of the choice of DBMS
Logical design - Create the database in a given DBMS
Physical design - How the database is stored in hardware
Diagramming Entities
In an E/R Diagram, an entity is usually drawn as a box with rounded corners
• The box is labelled with the name of the class of objects represented by that entity
Relationships
Relationships are an association between two or more entities
• Each Student takes several Modules
• Each Module is taught by a Lecturer
• Each Employee works for a single Department
Relationships have
• A name
• A set of entities that participate in them
• A degree – the number of entities that participate (most have degree 2)
• A cardinality ratio
Cardinality Ratios
Each entity in a relationship can participate in zero, one, or more than one instances of that
relationship
• This leads to 3 types of relationship: one to one (1:1), one to many (1:M), and many to many (M:M)
Diagramming Relationships
Relationships are links between two entities
• The name is given in a diamond box
• The ends of the link show cardinality
Removing M:M Relationships
• Many to many relationships are difficult to represent
• We can split a many to many relationship into two one to
many relationships
• An entity represents the M:M relationship
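Splitting a many to many relationship can be sketched as tables. The example below assumes hypothetical student, module, and enrolment tables and runs through Python's sqlite3 module; enrolment is the new entity that turns Student M:M Module into two one to many links.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE module  (module_id  INTEGER PRIMARY KEY, title TEXT);
-- enrolment replaces the M:M link:
-- one student has many enrolments, one module has many enrolments.
CREATE TABLE enrolment (
    student_id INTEGER REFERENCES student(student_id),
    module_id  INTEGER REFERENCES module(module_id),
    PRIMARY KEY (student_id, module_id)
);
INSERT INTO student VALUES (1, 'Asha'), (2, 'Babu');
INSERT INTO module  VALUES (10, 'Databases'), (20, 'Networks');
INSERT INTO enrolment VALUES (1, 10), (1, 20), (2, 10);
""")

# Follow the two 1:M links to list the students taking one module.
db_students = [row[0] for row in conn.execute("""
    SELECT s.name
    FROM student s
    JOIN enrolment e ON e.student_id = s.student_id
    JOIN module m    ON m.module_id = e.module_id
    WHERE m.title = 'Databases'
    ORDER BY s.name""")]
print(db_students)
```

Each row of enrolment pairs exactly one student with exactly one module, which is why the composite primary key is enough to keep the pairs unique.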
Making E/R Models
To make an E/R model you need to identify, from a description, the
• Entities
• Attributes
• Relationships
• Cardinality ratios
General guidelines
• Since entities are things or objects they are often nouns in the description
• Attributes are facts or properties, and so are often nouns also
• Verbs often describe relationships between entities
Example
A university consists of a number of departments. Each department offers several courses. A
number of modules make up each course. Students enrol in a particular course and take modules
towards the completion of that course. Each module is taught by a lecturer from the appropriate
department, and each lecturer tutors a group of students
Example – Entities
The nouns in the description suggest the entities: Department, Course, Module, Student, and Lecturer.
Entities and Attributes
Sometimes it is hard to tell if something should be an entity or an attribute
• They both represent objects or facts about the world
• They are both often represented by nouns in descriptions
• General guidelines
• Entities can have attributes but attributes have no smaller parts
• Entities can have relationships between them, but an attribute belongs to a single entity
Example
We want to represent information about products in a database. Each product has a description, a
price and a supplier. Suppliers have addresses, phone numbers, and names. Each address is made
up of a street address, a city, and a postcode.
Example - Entities/Attributes
• Entities or attributes:
• product
• description
• price
• supplier
• address
• phone number
• name
• street address
• city
• postcode
• Products, suppliers, and addresses all have smaller parts so we can make them entities
• The others have no smaller parts and belong to a single entity
Example – Relationships
• Each product has a supplier
• Each product has a single supplier but there is nothing to stop a supplier supplying many
products
• A many to one relationship
• Each supplier has an address
• A supplier has a single address
• It does not seem sensible for two different suppliers to have the same address
• A one to one relationship
One to One Relationships:
• Some relationships between entities, A and B, might be redundant if
• It is a 1:1 relationship between A and B
• Every A is related to a B and every B is related to an A
• Example – the supplier-address relationship
• Is one to one
• Every supplier has an address
• We don’t need addresses that are not related to a supplier
Redundant Relationships
• We can merge the two entities that take part in a redundant relationship together
• They become a single entity
• The new entity has all the attributes of the old one
Making E/R Diagrams
• From a description of the requirements identify the
• Entities
• Attributes
• Relationships
• Cardinality ratios of the relationships
• Draw the E/R diagram and then
• Look at one to one relationships as they might be redundant
• Look at many to many relationships as they might need to be split into two one to
many links
Debugging Designs
With a bit of practice E/R diagrams can be used to plan queries
• You can look at the diagram and figure out how to find useful information
• If you can’t find the information you need, you may need to change the design
How can you find a list of students who are enrolled in Database systems?
What is Entity, attribute and relationship?
What is ER diagram?
An Entity-Relationship (ER) Diagram is a visual representation of the data entities and their
relationships within a database.
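[Figure: example E-R diagram with the Customer entity (attributes name and address), the Opens and Borrower relationships, and the Loan entity (attribute amount)]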
Mapping Cardinalities
Express the number of entities to which another entity can be associated via a relationship set.
Most useful in describing binary relationship sets.
For a binary relationship set the mapping cardinality must be one of the following types:
-One to one
-One to many
-Many to one
-Many to many
One to one: A customer can open only one account. An account can be opened by only
one customer.
One to many: A customer can open one or more accounts. An account can be opened
by only one customer.
Many to one: A customer can open only one account. An account can be opened by one
or more customers.
Many to many: A customer can open many accounts. An account can be opened by
many customers.
[Figure: (a) one to one and (b) one to many mappings between entity sets A and B]
[Figure: (a) many to one and (b) many to many mappings between entity sets A and B]
Note: Some elements in A and B may not be mapped to any elements in the other set
What are Participation Constraints?
Total: every entity in entity set E participates in at least one relationship in relationship set R.
Partial: only some entities in E participate in relationships in R.
Can make access-date an attribute of account, instead of a relationship attribute, if each account
can have only one customer i.e., the relationship from account to customer is many to one, or
equivalently, customer to account is one to many
Give example of E-R Diagram with Composite, Multivalued, and Derived Attributes.
-In a university, a course is a strong entity and a course-offering can be modeled as a weak entity
-The discriminator of course-offering would be semester (including year) and section-number (if
there is more than one section)
-If we model course-offering as a strong entity we would model course-number as an attribute.
Then the relationship with course would be implicit in the course-number attribute
What is Specialization?
-Top-down design process; we designate subgroupings within an entity set that are distinctive
from other entities in the set.
-These subgroupings become lower-level entity sets that have attributes or participate in
relationships that do not apply to the higher-level entity set.
-Depicted by a triangle component labeled ISA (E.g. customer “is a” person).
-Attribute inheritance – a lower-level entity set inherits all the attributes and relationship
participation of the higher-level entity set to which it is linked.
What is existence dependency?
Existence dependency means the lifetime of one entity relies on another. If the existence of an
entity ‘x’ depends on the existence of entity ‘y’, then ‘x’ is said to be existence dependent on ‘y’. If y
is deleted, then x is deleted as well.
y is the dominant entity.
x is the subordinate entity.
Imagine a child table (dependent) with a foreign key referencing a parent table. If a record in the
parent table is deleted (ceases to exist), the corresponding records in the child table may also be
deleted to ensure data integrity.
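One common way to implement an existence dependency is a foreign key with ON DELETE CASCADE. The sketch below uses Python's sqlite3 module with hypothetical loan (dominant) and payment (subordinate) tables; note that SQLite enforces foreign keys only after the PRAGMA shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE loan (loan_no INTEGER PRIMARY KEY, amount INTEGER);
-- payment is the subordinate entity: it cannot outlive its loan.
CREATE TABLE payment (
    payment_no INTEGER PRIMARY KEY,
    loan_no    INTEGER NOT NULL
               REFERENCES loan(loan_no) ON DELETE CASCADE,
    amount     INTEGER
);
INSERT INTO loan VALUES (1, 1000), (2, 2000);
INSERT INTO payment VALUES (101, 1, 100), (102, 1, 150), (103, 2, 300);
""")

# Deleting the dominant entity removes its dependent rows as well.
conn.execute("DELETE FROM loan WHERE loan_no = 1")
remaining = conn.execute(
    "SELECT payment_no FROM payment ORDER BY payment_no").fetchall()
print(remaining)
```

Only the payment that belongs to loan 2 survives the deletion of loan 1.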
What is UML?
UML (Unified Modeling Language) is a standardized visual language for modeling the structure
and behavior of software systems.
[Figure: the Customer entity with attributes id, name, and address, shown in E-R notation and as a UML class]
ER diagrams focus on data modeling (entities & relationships) while UML offers a broader
toolkit for software system design.
What is Generalization?
-A bottom-up design process – combine a number of entity sets that share the same features into
a higher-level entity set.
-Specialization and generalization are just inversions of each other; they are represented in an
E-R diagram in the same way.
-The terms specialization and generalization are used interchangeably.
Specialization and Generalization
What is Aggregation?
Why OO Model ?
The relational model has limitations for complex applications, e.g. CAD, Internet, and bioinformatics
applications.
The OO model supports complex data types, e.g.
Address {Street No, Street Name, City}, Phone-number {0..n entries}
What is Inheritance?
-Several classes are similar
-To allow the direct representation of similarities among classes, we can construct class
hierarchy
[Figure: class hierarchy with Person at the top, Employee and Customer below it, and Officer, Security, and Others at the lowest level]
-An object retains its identity even if some or all of the values of variables or definitions of
methods change over time.
-Object-oriented systems use a unique object identifier to identify each object.
-The Object Database Management Group (ODMG) standardized the Object Query Language (OQL),
which plays the role in object databases that SQL plays in relational systems.
Logical data model
Description: Adds more detail to the conceptual model, specifying the attributes of the
entities and the relationships between them, without considering how these will be
physically implemented in the database.
Purpose: To prepare a detailed blueprint of the database structure that can be used to
design the physical model.
Physical data model
Description: Translates the logical data model into a specific database management
system (DBMS), defining tables, columns, data types, indexes, and constraints.
Purpose: To implement the actual database structure in the chosen DBMS for efficient
storage, retrieval, and management of data.
Chapter: Relational Model and Relational Algebra
Relation/Table: A table in a database represents a relation, which is a set of tuples (rows). Each
table consists of columns (fields) and rows (records).
The cardinality of a relation in a database refers to the number of tuples (rows) present in a
table.
Two relations R and S are union-compatible if they have the same number of columns and
corresponding columns have the same domains.
Example:
S
The result of the union operation on these two tables will be:
Let R and S be two union-compatible relations. Then R ∪ S is the relation that contains every tuple
that appears in R, in S, or in both.
R ∪ S = {X : X ∈ R or X ∈ S}
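The definition maps directly onto set union. The sketch below models relations as Python sets of tuples with made-up rows, and uses Python types as a rough stand-in for attribute domains; that stand-in is an assumption of the example, not part of the definition.

```python
# Each relation is a set of tuples; both have the schema (id, name).
R = {(1, "Asha"), (2, "Babu")}
S = {(2, "Babu"), (3, "Chitra")}

def union_compatible(r, s):
    """Same number of columns and the same type in each column position."""
    shapes = {tuple(type(value) for value in row) for row in r | s}
    return len(shapes) <= 1

compatible = union_compatible(R, S)
result = R | S  # {X : X in R or X in S}; the shared tuple appears once
print(compatible, sorted(result))
```

Because a relation is a set, the tuple (2, "Babu") that occurs in both inputs appears only once in the result.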
Define super key, candidate key, primary key, and foreign key in a database.
Super Key
A super key is a set of one or more attributes (columns) that can uniquely identify a record (row)
in a table. A super key can have additional attributes that are not necessary for unique
identification.
{StudentID}
{StudentID, Name}
{StudentID, Name, Email}
Candidate Key
A candidate key is a minimal super key, meaning it is a super key with no redundant attributes.
In other words, it is the smallest subset of attributes that can uniquely identify a record in a table.
Example: From the previous super keys, possible candidate keys could be:
{StudentID}
{Email}
Here, {StudentID, Name} is not a candidate key because Name is redundant (StudentID alone
can uniquely identify a record).
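The relationship between super keys and candidate keys can be checked mechanically on sample data. The rows below are hypothetical; also keep in mind that a key is a property of the schema, so sample rows can only rule a key out, never prove it for all future data.

```python
from itertools import combinations

# Hypothetical rows of the student table.
rows = [
    {"StudentID": 1, "Name": "Asha", "Email": "asha@example.com"},
    {"StudentID": 2, "Name": "Babu", "Email": "babu@example.com"},
    {"StudentID": 3, "Name": "Asha", "Email": "asha2@example.com"},
]
attributes = ["StudentID", "Name", "Email"]

def is_super_key(attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

super_keys = [set(combo)
              for size in range(1, len(attributes) + 1)
              for combo in combinations(attributes, size)
              if is_super_key(combo)]

# A candidate key is a super key with no super key as a proper subset.
candidate_keys = [key for key in super_keys
                  if not any(other < key for other in super_keys)]
print(candidate_keys)
```

In this sample, {StudentID, Name} is a super key but not a candidate key, because {StudentID} alone is already a super key.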
Primary Key
A primary key is a candidate key that is chosen by the database designer to uniquely identify
records in a table. There can be only one primary key per table, and it cannot contain null values.
Example: If we choose {StudentID} as the primary key for the student table, it means StudentID
will uniquely identify each record in that table, and no two students can have the same
StudentID.
Foreign key:
A foreign key in a database is a field (or collection of fields) in one table that refers to the
primary key (or another candidate key) of another table. It establishes a link between the two
tables, enforcing referential integrity by ensuring that every non-null value in the foreign key
column matches a key value in the referenced table.
Customers Table
Orders Table
In this example:
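Restrict (No Action): A referenced primary key value cannot be deleted or changed while foreign
key values still refer to it. Likewise, a record cannot be inserted with a foreign key value that
doesn't exist in the referenced table's primary key.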
Cascade: When a record with a referenced primary key is deleted, all corresponding foreign
key references are automatically deleted as well (cascading the deletion).
Set Null: Instead of deleting, the foreign key value can be set to null when the referenced
primary key is deleted.
Set Default: A default value can be assigned to the foreign key if the referenced primary key
is deleted.
Triggers: Custom procedural code that executes in response to certain events (INSERT,
UPDATE, DELETE) on a table.
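Two of the behaviours above can be sketched with hypothetical customers and orders tables, run through Python's sqlite3 module: an order that points at a non-existent customer is rejected, and ON DELETE SET NULL clears the reference when the customer is deleted. The PRAGMA is specific to SQLite, which does not enforce foreign keys by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER
                REFERENCES customers(customer_id) ON DELETE SET NULL
);
INSERT INTO customers VALUES (1, 'Asha');
INSERT INTO orders VALUES (100, 1);
""")

# Referential integrity: customer 99 does not exist, so this must fail.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# SET NULL: the order survives, but its reference is cleared.
conn.execute("DELETE FROM customers WHERE customer_id = 1")
order_row = conn.execute(
    "SELECT order_id, customer_id FROM orders").fetchone()
print(orphan_rejected, order_row)
```

Replacing SET NULL with CASCADE in the table definition would delete order 100 together with its customer instead.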
What are Integrity Constraints?
Integrity Constraints are rules applied to ensure the accuracy, consistency, and validity of data
within a database. They enforce the reliability of data by restricting the types of data that can be
inserted, updated, or deleted, thus maintaining the integrity of the database.
Design a database indicating the tables (including primary key and foreign key) for the
following scenario:
In a car company, there are many employees, some of them are in sales, some are in accounts,
some are human resource and some are in management. The company sells cars to customers.
Customer information including names, addresses, mobile numbers, etc. are vital. The orders of
each customer are also stored. The payment information with payment amount, payment type,
payment date, etc. are important.
To design a database for a car company with the given requirements, we'll create several tables
to store information about employees, customers, orders, and payments. Each table will include
primary keys and foreign keys where necessary to establish relationships between the data.
Tables
1. Employees
o EmployeeID (Primary Key)
o Name
o Department (Sales, Accounts, Human Resource, Management)
o Address
o MobileNumber
2. Customers
o CustomerID (Primary Key)
o Name
o Address
o MobileNumber
3. Orders
o OrderID (Primary Key)
o CustomerID (Foreign Key references Customers.CustomerID)
o OrderDate
o TotalAmount
4. OrderDetails
o OrderDetailID (Primary Key)
o OrderID (Foreign Key references Orders.OrderID)
o CarID (Foreign Key references Cars.CarID)
o Quantity
o UnitPrice
5. Payments
o PaymentID (Primary Key)
o OrderID (Foreign Key references Orders.OrderID)
o PaymentAmount
o PaymentType (e.g., Credit Card, Cash, Bank Transfer)
o PaymentDate
6. Cars
o CarID (Primary Key)
o CarModel
o Manufacturer
o Price
Relationships
Each Order is placed by a Customer, and a customer can place multiple orders.
Each Order can have multiple OrderDetails, indicating different cars purchased in that
order.
Each Payment is associated with an Order.
Each OrderDetail includes details about a specific Car.
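The design above could be written as CREATE TABLE statements along the following lines. This is a sketch run through Python's sqlite3 module: the column types, the NOT NULL choices, and the CHECK on Department are assumptions, since the notes list only the column names and keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE Employees (
    EmployeeID   INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    Department   TEXT CHECK (Department IN
                 ('Sales', 'Accounts', 'Human Resource', 'Management')),
    Address      TEXT,
    MobileNumber TEXT
);
CREATE TABLE Customers (
    CustomerID   INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    Address      TEXT,
    MobileNumber TEXT
);
CREATE TABLE Cars (
    CarID        INTEGER PRIMARY KEY,
    CarModel     TEXT,
    Manufacturer TEXT,
    Price        REAL
);
CREATE TABLE Orders (
    OrderID     INTEGER PRIMARY KEY,
    CustomerID  INTEGER NOT NULL REFERENCES Customers(CustomerID),
    OrderDate   TEXT,
    TotalAmount REAL
);
CREATE TABLE OrderDetails (
    OrderDetailID INTEGER PRIMARY KEY,
    OrderID       INTEGER NOT NULL REFERENCES Orders(OrderID),
    CarID         INTEGER NOT NULL REFERENCES Cars(CarID),
    Quantity      INTEGER,
    UnitPrice     REAL
);
CREATE TABLE Payments (
    PaymentID     INTEGER PRIMARY KEY,
    OrderID       INTEGER NOT NULL REFERENCES Orders(OrderID),
    PaymentAmount REAL,
    PaymentType   TEXT,
    PaymentDate   TEXT
);
""")

tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```

Cars is created before OrderDetails here only so that every referenced table already exists when it is mentioned in a foreign key.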
What are the six basic (fundamental) operators of relational algebra?
Unary operators:
Select (σ) [rows]
Project (π) [columns]
Rename (ρ)
Binary operators:
Union (∪)
Set difference (–)
Cartesian product (x)
Relation r:
A  B  C   D
α  α  1   7
α  β  5   7
β  β  12  3
β  β  23  10
σ_A=B ∧ D>5 (r):
A  B  C   D
α  α  1   7
β  β  23  10
Describe Select Operation.
-Notation: σ_p(r)
-p is called the selection predicate
-Defined as:
σ_p(r) = {t | t ∈ r and p(t)}
-Where p is a formula in which terms are connected by: ∧ (and), ∨ (or), ¬ (not)
-Each term is one of:
<attribute> op <attribute> or <attribute> op <constant>
where op is one of: =, ≠, >, ≥, <, ≤
-Example of selection:
σ_branch-name=“Perryridge” (account)
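The selection operator can be sketched as a filter over a relation. In the Python example below a relation is a list of dictionaries and the predicate p is an ordinary function; the account rows are made-up sample data.

```python
# A relation as a list of dicts, one dict per tuple (sample data is made up).
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-201", "branch_name": "Perryridge", "balance": 900},
]

def select(relation, predicate):
    """sigma_p(r): keep exactly the tuples t of r for which p(t) is true."""
    return [t for t in relation if predicate(t)]

# sigma_{branch-name = "Perryridge"}(account)
perryridge = select(account, lambda t: t["branch_name"] == "Perryridge")

# Terms can be combined with and / or / not inside the predicate.
large_perryridge = select(
    account,
    lambda t: t["branch_name"] == "Perryridge" and t["balance"] > 500)

print([t["account_number"] for t in perryridge])
print([t["account_number"] for t in large_perryridge])
```

Selection never changes the columns of a relation; it only decides which rows are kept.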
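Project operation example: relation r and the result of π_A,C (r).
Notation: π_A1, A2, …, Ak (r), where A1, …, Ak are attribute names and r is a relation name.
Union operation example: relations r and s, and the result of r ∪ s.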
Explain Union Operation.
Notation: r ∪ s
Defined as:
r ∪ s = {t | t ∈ r or t ∈ s}
For r ∪ s to be valid:
1. r, s must have the same number of attributes
2. The attribute domains must be compatible, e.g., the 2nd column of r deals with the same
type of values as does the 2nd column of s
E.g. to find all customers with either an account or a loan:
π_customer-name (depositor) ∪ π_customer-name (borrower)
r – s:
-Notation: r – s
-Defined as:
r – s = {t | t ∈ r and t ∉ s}
-Set differences must be taken between compatible relations.
-r and s must have the same number of attributes
-attribute domains of r and s must be compatible
r x s:
Notation: r x s
Defined as:
r x s = {t q | t ∈ r and q ∈ s}
Assume that attributes of r(R) and s(S) are disjoint. (That is, R ∩ S = ∅.)
If attributes of r(R) and s(S) are not disjoint, then renaming must be used.
How does the Rename Operation work?
-Allows us to name, and therefore to refer to, the results of relational-algebra expressions.
-Allows us to refer to a relation by more than one name.
-Example: ρ_x (E)
-returns the expression E under the name x
-If a relational-algebra expression E has arity n, then
ρ_x(A1, A2, …, An) (E)
returns the result of expression E under the name x, and with the
attributes renamed to A1, A2, …, An.
Banking Example
Example Queries
Find the loan number for each loan of an amount greater than $1200
Find the names of all customers who have a loan, an account, or both, from the bank
π_customer-name (borrower) ∪ π_customer-name (depositor)
Find the names of all customers who have a loan and an account at the bank.
π_customer-name (borrower) ∩ π_customer-name (depositor)
Find the names of all customers who have a loan at the Perryridge branch.
π_customer-name (σ_branch-name=“Perryridge” (σ_borrower.LN = loan.LN (borrower x loan)))
Find the names of all customers who have a loan at the Perryridge branch but do not have an
account at any branch of the bank.
Find the names of all customers who have a loan at the Perryridge branch.
Query 1
π_customer-name (σ_branch-name = “Perryridge” (σ_borrower.LN = loan.LN (borrower x loan)))
Query 2
π_customer-name (σ_loan.LN = borrower.LN ((σ_branch-name = “Perryridge” (loan)) x borrower))
Find the largest account balance (rename the account relation as d).
π_balance (account) – π_account.balance (σ_account.balance < d.balance (account x ρ_d (account)))
Additional Operations
Set intersection
Natural join
Division
Set-Intersection Operation
Notation: r ∩ s
Defined as:
r ∩ s = {t | t ∈ r and t ∈ s}
Assume:
r, s have the same arity
attributes of r and s are compatible
Note: r ∩ s = r – (r – s)
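The identity in the note, which shows why intersection is not a fundamental operator, can be checked on a small example. The two relations below are made-up sets of tuples.

```python
# Two compatible relations with the same two-column schema (sample data).
r = {("a", 1), ("a", 2), ("b", 1)}
s = {("a", 2), ("b", 3)}

intersection = r & s        # tuples that are in both r and s
via_difference = r - (r - s)  # remove from r everything that is not in s

print(intersection, via_difference)
```

r – s holds the tuples of r that are missing from s, so removing them from r leaves exactly the tuples the two relations share.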
Natural-Join Operation
It is possible for tuples to have a null value, denoted by null, for some of their attributes
null signifies an unknown value or that a value does not exist.
The result of any arithmetic expression involving null is null.
Aggregate functions simply ignore null values
For duplicate elimination and grouping, null is treated like any other value, and two nulls are
assumed to be the same
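These rules can be observed directly in SQL. The sketch below uses Python's sqlite3 module and a hypothetical account table in which two balances are unknown; Python shows an SQL null as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [(1, 100), (2, None), (3, 300), (4, None)])

# Arithmetic involving null yields null.
arithmetic = conn.execute(
    "SELECT balance + 10 FROM account WHERE acc_no = 2").fetchone()[0]

# Aggregate functions skip the two null balances.
total, average, counted = conn.execute(
    "SELECT SUM(balance), AVG(balance), COUNT(balance) FROM account"
).fetchone()

# Duplicate elimination treats the two nulls as one value.
distinct_values = conn.execute(
    "SELECT DISTINCT balance FROM account").fetchall()

print(arithmetic, total, average, counted, len(distinct_values))
```

Note that COUNT(balance) counts only non-null balances, whereas COUNT(*) would count all four rows.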
Chapter: SQL command
A query language in a database is a specialized computer language that acts as an interface for
users to ask questions, retrieve specific data, and even manipulate and manage information
within the database.
The most widely used query language for relational databases is SQL (Structured Query
Language).
Procedural query language:
Requires the user to specify how to obtain the desired result by providing a detailed
sequence of operations.
The user defines the control flow and the exact steps to be followed.
Example: relational algebra, or SQL's procedural extensions such as PL/SQL, where the
user writes procedures and functions that include loops and conditionals.
Non-procedural query language:
Requires the user to specify what data is needed without detailing how to obtain it.
The database management system determines the best way to execute the query.
Example: Standard SQL, where users write declarative queries like SELECT statements
to specify the desired data.
Differentiate between the DROP, TRUNCATE, and DELETE commands.
DROP:
Purpose: Deletes an entire table, including its structure, indexes, constraints, and data.
Usage: DROP TABLE table_name;
Effect: Irreversible operation; removes all data and schema definition of the table.
TRUNCATE:
Purpose: Removes all rows from a table quickly without logging individual row
deletions.
Usage: TRUNCATE TABLE table_name;
Effect: Fast operation; empties the table without affecting its structure. In many DBMSs
(for example MySQL and Oracle) it cannot be rolled back.
DELETE:
Purpose: Removes specific rows from a table based on a condition or deletes all rows if
no condition is specified.
Usage: DELETE FROM table_name WHERE condition;
Effect: Slow compared to TRUNCATE; deletes rows one by one and logs each deletion
in the transaction log. Can be rolled back (if within a transaction) and allows using
conditions for selective deletions.
1. Basic form:
SELECT A1, A2, ..., An
FROM r1, r2, ..., rm
WHERE P;
2. Example query:
SELECT branch_name
FROM loan;
Answer:
Draw table like:
Emp_info(Emp_id, Emp_Name, Address)
Emp_Salary(Emp_id, Month, Year, Salary)
Position_info(Emp_id, position, Joining_date)
Leave_info(Emp_id, leave_id, start_date_leave, end_date_leave)
Training_info(T_id, Emp_id, Training_start, Training_end, completed (Yes/No))
Write an SQL command to find how many customers have paid more than 175.
Ans:
SELECT COUNT(*)
FROM Customer_info A
WHERE A.payment > 175;
Write SQL command to find the name of the customer from the above table.
Write an SQL command to show the name and address of the customers who have paid more
than 200.
SELECT A.Name, A.Address
FROM Customer_info A
WHERE A.payment > 200;
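Write an SQL command to list the customer name, address, and bank name of those
who have paid more than 190 tk.
The query below is a pattern-matching example: it finds the bank names that contain one arbitrary character followed by YZ.
SELECT bank_name
FROM Bank_info
WHERE bank_name LIKE '%_YZ%';
Here ‘_’ matches exactly one character and ‘%’ matches any sequence of zero or more characters.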
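Write an SQL command to find the names of the students who study in ICT and have taken
course ICT-203.
-- Assumes Course(Std_id, Course_id) records which student has taken which course.
SELECT A.Name
FROM Student A, Department B, Course C
WHERE A.dept_id = B.id AND C.Std_id = A.id AND B.Name = 'ICT' AND C.Course_id = 'ICT-203';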
Table: dept_info
Id  Name
1   ICT
2   CSE
3   EE
UPDATE dept_info
SET Name = 'IT'
WHERE Id = 1;
Write an SQL command to set the status to 1 for students who study in CSE and have taken
course 102.
UPDATE student
SET status = 1
WHERE dept_id = (SELECT dept_id FROM departments WHERE dept_name = 'CSE')
AND id IN (SELECT student_id FROM enrollments WHERE course_id = 102);
Course table:
Std_id Course_id
1 101
1 102
2 101
2 103
3 104
3 103
Result:
Count Course
2 101
1 102
2 103
1 104
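A result like the one above can be produced with GROUP BY. The sketch below loads the Course table shown into SQLite through Python's sqlite3 module; the table name course and the alias student_count are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (std_id INTEGER, course_id INTEGER)")
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [(1, 101), (1, 102), (2, 101), (2, 103), (3, 104), (3, 103)])

# One output row per course_id, with the number of rows in each group.
result = conn.execute("""
    SELECT COUNT(*) AS student_count, course_id
    FROM course
    GROUP BY course_id
    ORDER BY course_id""").fetchall()
print(result)
```

GROUP BY collects the rows that share a course_id, and COUNT(*) is then evaluated once per group.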
Customer table
id Name Address
1 M Dhk
2 N
3 P Ctg
SELECT name
FROM customer
WHERE Address IS NULL;
ALTER TABLE T1
DROP COLUMN email;
ALTER TABLE T1
ADD CONSTRAINT PK_person
PRIMARY KEY (id, last_name);
ALTER TABLE T1
DROP CONSTRAINT PK_person;
ALTER TABLE T1
RENAME TO T2;
INSERT INTO T1
(Roll, Name, GPA)
VALUES (1, 'X', 5.0);
DELETE FROM T1
WHERE id = 2;
Say T1(c1, c2, c3)
CREATE VIEW V
AS SELECT c1, c2
FROM T1
WHERE c3 = 6;
CREATE INDEX i
ON T1 (c1, c2);
DROP INDEX i;
Give an example SQL command where the format of the date you are trying to insert, matches
the format of the date column in the table.
Explain with example the difference between "Union" and "Union all" keywords in the
context of SQL?
UNION:
Description: Combines the result sets of two or more SELECT queries and removes
duplicate rows.
Example:
SELECT name FROM employees
UNION
SELECT name FROM contractors;
This query will return a list of unique names from both the employees and contractors tables,
removing any duplicates.
UNION ALL:
Description: Combines the result sets of two or more SELECT queries without removing
duplicate rows.
Example:
SELECT name FROM employees
UNION ALL
SELECT name FROM contractors;
This query will return a list of all names from both the employees and contractors tables,
including duplicates.
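The difference is easy to see when one name occurs in both tables. The sketch below uses Python's sqlite3 module; the sample names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT);
CREATE TABLE contractors (name TEXT);
INSERT INTO employees VALUES ('Asha'), ('Babu');
INSERT INTO contractors VALUES ('Babu'), ('Chitra');
""")

# UNION removes the duplicate 'Babu'.
union_rows = conn.execute("""
    SELECT name FROM employees
    UNION
    SELECT name FROM contractors
    ORDER BY name""").fetchall()

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute("""
    SELECT name FROM employees
    UNION ALL
    SELECT name FROM contractors
    ORDER BY name""").fetchall()

print(len(union_rows), len(union_all_rows))
```

UNION ALL is usually cheaper because the DBMS does not have to search for duplicates, so it is a reasonable choice when duplicates are impossible or wanted.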
Consider a database with the following three tables. The underlined columns are primary
keys; they are foreign keys in other tables.
Applicant_Info (Application_ID, Passport_No, Name, Fathers_Name, Address, Mobile_No, Citizenship, Application_Date)
Booking_Info (Booking_ID, Application_ID, Paid_Amount, Travel_From, Travel_To, Travel_Date)
Write down the necessary SQL commands for the above tables.
Find the applicant names, and mobile numbers of people who are Bangladeshi citizens.
Find the number of people of British citizenship who have applied so far.
Find the number of people of each country who have applied so far.
Find the applicant names, and paid amount who have paid more than four (4) Lac taka in a single
booking.
Find the applicant names, and paid amount who have paid more than four (4) Lac while traveling
from UK in a single booking.
Edit the Name of a person from Abdul Karimm to Abdul Karim (Application_ID = '102512').
Find the name of applicants whose names starts with "George".
Find the number of unique Application_Id from the Booking_Info table.
Write a stored procedure that will update price of the costliest item in the titles table to double.
Consider a database with the following table. The underlined columns are primary keys.
Applicant_Info (Application_ID, Passport_No, Name, Fathers_Name, Address, Mobile_No,
Citizenship, Application_Date)
Write down the necessary SQL commands for the above table. Write an SQL command to add
one record to the Applicant_Info table. Write an SQL command to delete that record from the
Applicant_Info table.
Consider a university database for the scheduling of classrooms for final exams. This database
could be modeled as a single entity set exam, with attributes course-name, section-number,
room-number and time. Alternatively one or more additional entity sets could be defined, along
with relationship sets to replace some of the attributes of the exam entity set, as
Show the E-R diagram illustrating the use of all three additional entity sets listed.
The key fields are underlined, and the domain of each field is listed after the field
name. Therefore sid is the key for Suppliers, pid is the key for Parts, and sid and pid
together form the key for Catalog. The Catalog relation lists the prices charged for
parts by Suppliers. The Subparts relation lists for each part, its subparts if any.
Figure: E-R diagram
The meaning of these relations is straightforward; for example, Enrolled has one record
per student-class pair such that the student is enrolled in the class.
Write the SQL statements required to create these relations, including appropriate versions of all
primary and foreign key integrity constraints.
Express integrity constraints in SQL for the following constraint: Two classes cannot meet
in the same room at the same time.
Write a trigger that will add a record in the Employee_History table when a record is removed
from the Employee_Info table.
Give an example of an SQL command where the format of the date you are trying to insert
matches the format of the date column in the table.
Chapter : Database Design and Normalization
Functional Dependencies
Example
In a relation R with attributes A and B, if A uniquely determines B (denoted as A → B), then for
any two tuples in R, if the A values are the same, the B values must also be the same.
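Whether a functional dependency is satisfied by a given set of rows can be tested as follows. This is a small Python sketch with made-up rows; it can show that a dependency is violated by the data, but not that it holds for every possible future row.

```python
def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by every pair of rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # The first row with this lhs value fixes the rhs value;
        # any later row with the same lhs must agree with it.
        if seen.setdefault(key, value) != value:
            return False
    return True

R = [
    {"A": 1, "B": "x"},
    {"A": 1, "B": "x"},
    {"A": 2, "B": "x"},
]

a_determines_b = holds(R, ["A"], ["B"])
b_determines_a = holds(R, ["B"], ["A"])
print(a_determines_b, b_determines_a)
```

Here A → B is satisfied, but B → A is not, because the value "x" of B appears with both A = 1 and A = 2.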
What is Normalization ?
Normalization is the process of dividing large tables into smaller, related tables to organize data,
minimize redundancy, and improve data integrity.
1st NF:
The First Normal Form (1NF) in a database ensures that each table column contains atomic,
indivisible values, and each column contains values of a single type. Additionally, each table
must have a unique identifier or primary key.
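Before 1NF (the Content column holds several values per row):
Course        Content
Programming   C, C++, Java
Web           HTML, PHP, ASP
After 1NF (one atomic value per row):
Course        Content
Programming   C
Programming   C++
Programming   Java
Web           HTML
Web           PHP
Web           ASP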
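The conversion of the Course/Content table above can be sketched in a few lines. The helper name to_1nf is an assumption for illustration; it simply emits one row per value of the multi-valued column.

```python
def to_1nf(rows, column, sep=","):
    """Split a multi-valued column so that each row holds one atomic value."""
    result = []
    for row in rows:
        for item in row[column].split(sep):
            new_row = dict(row)          # copy the other attributes unchanged
            new_row[column] = item.strip()
            result.append(new_row)
    return result

courses = [
    {"Course": "Programming", "Content": "C, C++, Java"},
    {"Course": "Web", "Content": "HTML, PHP, ASP"},
]
flat = to_1nf(courses, "Content")
```

After the split, the pair (Course, Content) can serve as the key, since no single column identifies a row on its own.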
2NF:
The Second Normal Form (2NF) in a database builds on 1NF by ensuring that all non-key
attributes are fully functionally dependent on the entire primary key, eliminating partial
dependency on any subset of the primary key.
Prime attribute: an attribute that is part of some candidate key.
Non-prime attribute: an attribute that is not part of any candidate key.
Every non-prime attribute must be fully functionally dependent on the whole key
(if X --> A holds, there must be no proper subset Y of X for which Y --> A also holds).
Conditions of 2NF:
Now in 2NF:
Student
Project:
Std_id Project_name
3NF:
The first condition for the table to be in Third Normal Form is that the table should be in the
Second Normal Form.
In addition, no non-prime attribute may depend transitively on the key. Third Normal Form
reduces data duplication and helps to achieve data integrity.
Vendor table
i) In this example, Name, acc_no and Bank_code_no are functionally dependent on id.
ii) Bank_code_no --> Bank, so Bank depends on id only transitively
(id --> Bank_code_no --> Bank).
So in 3NF:
Table: Vendor
Bank_code_no Bank
Table: Student
Table: Zip_code
Zip_code city
With appropriate example, briefly discuss the second and third normalization form for a
relational database system.
Here, the primary key is a composite key consisting of StudentID and CourseID. The attribute
CourseName depends only on CourseID and not on StudentID. Thus, it violates 2NF as not all
non-key attributes are fully functionally dependent on the entire primary key.
Students
Now, each non-key attribute is fully functionally dependent on the primary key of its table.
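The 2NF decomposition described here amounts to two projections of the original table. The sketch below assumes hypothetical sample rows and a helper named project; only the attribute names StudentID, CourseID and CourseName come from the notes.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result

# Hypothetical rows; the key is (StudentID, CourseID) and
# CourseName depends on CourseID alone (a partial dependency).
enrollment = [
    {"StudentID": 1, "CourseID": "C1", "CourseName": "Databases"},
    {"StudentID": 2, "CourseID": "C1", "CourseName": "Databases"},
    {"StudentID": 1, "CourseID": "C2", "CourseName": "Networks"},
]

enrollments = project(enrollment, ["StudentID", "CourseID"])
courses = project(enrollment, ["CourseID", "CourseName"])
```

The course name "Databases" is now stored once in courses instead of once per enrolled student.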
Third Normal Form (3NF)
Here, InstructorOffice is dependent on Instructor, which is not part of the primary key.
This indicates a transitive dependency and violates 3NF.
Courses:
Now, there are no transitive dependencies, and the table is in 3NF.
Normalization helps in organizing the database efficiently, ensuring data integrity and reducing
redundancy.
Chapter: File Organization and Indexing
Index files are typically much smaller than the original file
Two basic kinds of indices:
Ordered indices: search keys are stored in sorted order
Hash indices: search keys are distributed uniformly across “buckets” using a
“hash function”.
-Access types
Access types can include finding records with a specified attribute value and
finding records whose attribute value falls in a specified range
-Access time
-Insertion time
-Deletion time
-Space overhead
-In an ordered index, index entries are stored sorted on the search key value. E.g., author
catalog in library.
-Primary index: in a sequentially ordered file, the index whose search key specifies the
sequential order of the file.
Also called clustering index
The search key of a primary index is usually but not necessarily the primary key.
-Secondary index: an index whose search key specifies an order different from the sequential
order of the file. Also called non-clustering index.
-Index-sequential file: ordered sequential file with a primary index.
-Sparse indices require less space and less maintenance overhead for insertions and deletions.
-Sparse indices are generally slower than dense indices for locating records.
-There is a tradeoff between access time and space overhead
-A good compromise is to have a sparse index with one index entry per block.
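A lookup through a sparse index with one entry per block can be sketched as follows. The block contents and the function name lookup are assumptions for illustration; the index holds only the first search-key value of each block, so the lookup finds the right block by binary search and then scans it.

```python
from bisect import bisect_right

# A sequentially ordered file, stored as blocks of (search_key, record) pairs.
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Sparse index: one entry per block, holding the first search key in that block.
sparse_index = [block[0][0] for block in blocks]

def lookup(key):
    """Find the record for key, or None if it is not in the file."""
    # Locate the last index entry whose key is <= the search key.
    pos = bisect_right(sparse_index, key) - 1
    if pos < 0:
        return None          # key is smaller than every key in the file
    # Scan the chosen block sequentially.
    for k, record in blocks[pos]:
        if k == key:
            return record
    return None
```

A dense index would hold nine entries here instead of three, which removes the scan inside the block at the cost of more space.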
If the deleted record was the only record in the file with its particular search-key value, the
search-key is deleted from the index also.
Single-level index deletion:
Dense indices – deletion of search-key is similar to file record deletion.
Sparse indices – if an entry for the search key exists in the index, it is deleted by
replacing the entry in the index with the next search-key value in the file (in
search-key order). If the next search-key value already has an index entry, the
entry is deleted instead of being replaced.
Multilevel insertion (as well as deletion) algorithms are simple extensions of the single-level
algorithms
What do you understand by primary and secondary index? Explain with example.
Frequently, one wants to find all the records whose values in a certain field (which is not the
search-key of the primary index) satisfy some condition.
Example 1: In the account database stored sequentially by account number, we
may want to find all accounts in a particular branch
Example 2: We want to find all accounts with a specified balance or range of
balances
We can have a secondary index with an index record for each search-key value; index record
points to a bucket that contains pointers to all the actual records with that particular search-key
value.
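The bucket structure can be sketched as below, using Example 1 (finding all accounts in a particular branch). The account rows and the function name build_secondary_index are assumptions for illustration; record positions in the list stand in for record pointers.

```python
from collections import defaultdict

# Hypothetical account file, stored sequentially by account number.
accounts = [
    {"account_number": "A-101", "branch": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch": "Perryridge", "balance": 400},
    {"account_number": "A-201", "branch": "Downtown", "balance": 900},
    {"account_number": "A-215", "branch": "Perryridge", "balance": 700},
]

def build_secondary_index(records, field):
    """Map each search-key value to a bucket of pointers (list positions)."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[field]].append(position)
    return index

branch_index = build_secondary_index(accounts, "branch")

# Follow the pointers in the bucket to fetch the matching records.
perryridge = [accounts[p]["account_number"] for p in branch_index["Perryridge"]]
```

Because the file is not ordered by branch, the pointers in one bucket may lead to records scattered across the file, which is why a secondary index has to be dense.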
Advantages of B+ tree index:
1. Balanced Structure:
o Ensures all leaf nodes are at the same level.
o Provides uniform access time for all data.
2. Efficient Searching:
o Binary search within nodes.
o Reduces the number of disk reads.
3. Range Queries:
o Supports efficient range queries.
o Sequential access to leaf nodes via linked list pointers.
4. Dynamic Insertions and Deletions:
o Automatically balances itself during insertions and deletions.
o Minimizes the need for reorganization.
5. Non-Redundant Internal Nodes:
o Only stores keys, not actual data.
o Reduces redundancy and saves space.
6. High Fan-Out:
o Minimizes tree height.
o Reduces I/O operations due to fewer levels to traverse.
7. Efficient Space Utilization:
o Node splitting and merging keep nodes densely packed.
o Ensures effective use of storage.
Disadvantages of B+ tree index:
1. Complex Implementation:
o More complex than simpler structures like hash tables or binary trees.
o Requires careful management of node splits and merges.
2. Space Overhead:
o Additional storage required for pointers.
o Internal and leaf nodes need to store links.
3. Write-Heavy Workloads:
o Insertions and deletions can cause node splits and merges.
o Performance may degrade under heavy write operations.
4. Maintenance Cost:
o Regular rebalancing needed to maintain structure.
o More CPU and memory overhead during maintenance operations.
5. Not Always Optimal for Single Key Lookups:
o May be slower than hash indexes for single key lookups.
o More suitable for range queries and ordered datasets.
What is Hashing?
-A bucket is a unit of storage containing one or more records (a bucket is typically a disk block).
-In a hash file organization we obtain the bucket of a record directly from its search-key value
using a hash function.
-Hash function h is a function from the set of all search-key values K to the set of all bucket
addresses B.
-Hash function is used to locate records for access, insertion as well as deletion.
-Records with different search-key values may be mapped to the same bucket; thus entire bucket
has to be searched sequentially to locate a record.
In a database, hash files store data like a phonebook with hashed names.
1. Hash Function: A code turns a search key (like ID) into a bucket address.
2. Storage: Records go in buckets based on their hash value.
3. Retrieval: The search key's hash value directs you to the bucket for (hopefully) fast
lookups.
It's quick but can have collisions (multiple records in one bucket).
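A hash file organization can be sketched with a list of buckets. The bucket count, the modulo hash function h and the sample records are assumptions chosen to keep the example small; a real system would hash to disk blocks.

```python
NUM_BUCKETS = 4

def h(search_key):
    """Hash function: maps a search-key value to a bucket address."""
    return search_key % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key, record):
    buckets[h(key)].append((key, record))

def find(key):
    # Different keys can share a bucket, so the bucket is searched sequentially.
    for k, record in buckets[h(key)]:
        if k == key:
            return record
    return None

def delete(key):
    bucket = buckets[h(key)]
    for i, (k, _) in enumerate(bucket):
        if k == key:
            del bucket[i]
            return True
    return False

insert(7, "Ann")
insert(11, "Bob")   # 7 and 11 both hash to bucket 3: a collision
insert(12, "Cy")
```

The same function h is used for lookup, insertion and deletion, so each operation touches only one bucket.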
What are the advantages and disadvantages of hash functions used for indexing?
Advantages of hash functions used for indexing:
Fast Retrieval: Direct access to records via search key enables quicker lookups compared to
traditional methods.
Simple Implementation: Easier to understand and implement than complex indexing
structures.
Dynamic Operations: Efficient handling of insertions and deletions due to hash value-based
record placement.
Fast Access: Provides constant time complexity (O(1)) for search, insert, and delete
operations on average.
Efficient Space Utilization: More space-efficient if the hash function evenly distributes
entries.
Simplicity: Straightforward and easy to manage implementation.
Uniform Distribution: Good hash functions distribute keys uniformly, reducing collisions
and enhancing efficiency.
Disadvantages of hash functions used for indexing:
Collisions:
Multiple keys hashing to the same index require extra handling (e.g., chaining, open
addressing), potentially degrading performance.
Fixed Size:
The hash table size must be predefined and can be difficult to adjust dynamically.
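No Range Queries:
Hashing scatters neighbouring key values across buckets, so a range query cannot be
answered without examining every bucket.
Hash Function Dependence:
Performance depends on the hash function's quality; poor hash functions can cause
clustering and performance issues.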
No Order:
Hashing does not maintain data order, requiring additional techniques for ordered
retrieval.
Searching for values in non-key fields is less efficient compared to indexed search trees.
Chapter: Transaction Management
A transaction is a single unit of program execution that accesses and possibly updates various
data items while maintaining data consistency and integrity. Its changes are committed if it
succeeds and rolled back if any operation fails (see the ACID properties).
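The commit-or-roll-back behaviour can be demonstrated with Python's built-in sqlite3 module. The account table, the CHECK constraint and the transfer function are assumptions for illustration, not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on commit."""
    try:
        # The credit is applied first, so a later failure leaves work to undo.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The debit would make the balance negative: undo the credit as well.
        conn.rollback()
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```

A transfer of 500 from A fails the CHECK constraint, and the rollback also undoes the credit already applied to B, so neither update survives.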
ACID Properties
Transaction States
1. Active
o The initial state when the transaction begins. Operations are being executed.
2. Partially Committed
o After the final operation has been executed but before the transaction is
committed. All changes are temporary until successfully committed.
3. Committed
o The state after the transaction has been successfully completed and all changes are
permanently saved to the database.
4. Failed
o The state when the transaction cannot proceed due to a failure, such as a
constraint violation or system crash.
5. Aborted
o The state after the transaction has been rolled back, and the database is restored to
its state prior to the start of the transaction.
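The five states and the moves between them can be written as a small transition table. This is a sketch of the usual textbook state diagram; the names TRANSITIONS and is_valid_history are assumptions for illustration.

```python
# Allowed moves between the five transaction states.
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),   # terminal state
    "aborted": set(),     # terminal state
}

def is_valid_history(states):
    """Return True if states starts at 'active' and follows allowed moves."""
    if not states or states[0] != "active":
        return False
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(states, states[1:]))
```

For example, a transaction cannot jump from active straight to committed; it must pass through the partially committed state first.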
Describe the shadow database scheme for implementing database atomicity and durability.
The shadow database scheme is a technique to ensure atomicity and durability in databases. It
involves maintaining two copies of the database: the current database and the shadow (backup)
database.
1. Current Database: The working copy where all transactions are initially applied.
2. Shadow Database: An unchanged copy of the database representing the last consistent
state.
Steps
1. Begin Transaction:
o Operations are applied to the current database.
2. Commit Transaction:
o Before committing, the changes are written to a new location in the storage.
o If the commit is successful, the shadow database pointer is updated to point to the
new location.
3. Rollback Transaction:
o If a transaction fails, the system discards changes in the current database.
o The shadow database remains unchanged, so the last consistent state is preserved
(atomicity).
Benefits
Atomicity: Ensures that either all operations of a transaction are reflected in the database
or none at all.
Durability: Once a transaction is committed, its changes are permanent even in case of a
system failure.
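The scheme can be sketched in memory as follows. The class ShadowDatabase and its method names are assumptions for illustration; a real implementation would write the new copy to disk and flush it before switching the pointer.

```python
import copy

class ShadowDatabase:
    """In-memory sketch of the shadow-copy idea (no real disk I/O)."""

    def __init__(self, data):
        self._committed = data   # the shadow copy: last consistent state
        self._working = None     # the current copy used by a transaction

    def begin(self):
        # The transaction works on a separate copy; the shadow is untouched.
        self._working = copy.deepcopy(self._committed)
        return self._working

    def commit(self):
        # A single pointer switch makes the new copy the database.
        self._committed = self._working
        self._working = None

    def rollback(self):
        # Discard the working copy; the shadow copy is still intact.
        self._working = None

    def read_committed(self):
        return self._committed

db = ShadowDatabase({"A": 100})

work = db.begin()
work["A"] = 50
db.rollback()
after_rollback = dict(db.read_committed())

work = db.begin()
work["A"] = 70
db.commit()
after_commit = dict(db.read_committed())
```

Because the switch is one assignment, a reader sees either the old state or the new one, never a mixture of the two.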
1. Lock-Based Protocols
Shared Mode (S): Allows multiple transactions to read the same data item
simultaneously.
Behavior: Transactions can acquire shared locks concurrently, but no transaction can
write to the data item until all shared locks are released.
Exclusive Mode (X): Allows a transaction to read and write to a data item.
Behavior: When a transaction holds an exclusive lock, no other transactions can read or
write to the data item until the exclusive lock is released.
Two-Phase Locking (2PL):
Growing Phase: Transactions acquire all the locks they need but do not release any.
Shrinking Phase: Transactions release their locks but do not acquire any new ones.
Guarantee: Ensures serializability but can lead to deadlocks.
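The lock modes and the two phases can be sketched as below. The function compatible and the class TwoPhaseTransaction are assumptions for illustration; the sketch only enforces the rule that no lock is acquired after the first release and does not model waiting or a shared lock table.

```python
def compatible(held, requested):
    """Two locks on the same item can coexist only if both are shared (S)."""
    return held == "S" and requested == "S"

class TwoPhaseTransaction:
    """Tracks one transaction's locks and enforces the two-phase rule."""

    def __init__(self, name):
        self.name = name
        self.locks = {}          # item -> lock mode ("S" or "X")
        self.shrinking = False   # becomes True at the first unlock

    def lock(self, item, mode):
        if self.shrinking:
            raise RuntimeError(
                f"{self.name}: cannot acquire a lock in the shrinking phase"
            )
        self.locks[item] = mode

    def unlock(self, item):
        self.shrinking = True    # the growing phase is over
        del self.locks[item]

t1 = TwoPhaseTransaction("T1")
t1.lock("A", "S")
t1.lock("B", "X")
t1.unlock("A")
```

After the call to unlock, any further t1.lock(...) call raises RuntimeError, which is exactly the restriction that separates the two phases.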
What is deadlock in Database? How to handle it?
A deadlock in a database occurs when two or more transactions are waiting for each other to
release locks, creating a cycle of dependencies that prevents any of them from proceeding.
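A deadlock corresponds to a cycle in the wait-for graph, so detection reduces to a cycle check. The sketch below uses a depth-first search; the function name has_deadlock and the dictionary representation of the graph are assumptions for illustration.

```python
def has_deadlock(wait_for):
    """wait_for maps a transaction to the set of transactions it waits for.

    Returns True if the wait-for graph contains a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {}

    def visit(t):
        color[t] = GREY
        for u in wait_for.get(t, ()):
            state = color.get(u, WHITE)
            if state == GREY:              # back edge: a cycle exists
                return True
            if state == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    for t in list(wait_for):
        if color.get(t, WHITE) == WHITE and visit(t):
            return True
    return False
```

For instance, if T1 waits for T2 and T2 waits for T1, the graph has a cycle and neither transaction can proceed.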
Handling Deadlocks
1. Prevention Protocols:
Ensure no circular wait: Enforce a strict order in which transactions must request locks,
avoiding cycles in the wait-for graph.
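2. Wait-Die Scheme:
Older Transactions Wait: If an older transaction requests a lock held by a younger
transaction, it waits.
Younger Transactions Die: If a younger transaction requests a lock held by an older
transaction, it is aborted (dies) and restarted later.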
3. Wound-Wait Scheme:
Older Transactions Wound Younger Ones: If an older transaction requests a lock held
by a younger transaction, the younger transaction is aborted (wounded) and restarted
later.
Younger Transactions Wait: If a younger transaction requests a lock held by an older
transaction, it waits.
4. Timeout:
Transactions wait for a specified period before timing out and rolling back if they cannot
acquire the necessary locks.
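The two timestamp-based schemes differ only in what happens when a lock request conflicts. The sketch below states each rule as a function, assuming that a smaller timestamp means an older transaction; the function names and return strings are assumptions for illustration.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits; a younger requester is aborted."""
    if requester_ts < holder_ts:      # requester is older
        return "wait"
    return "abort requester"

def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester aborts the holder; a younger one waits."""
    if requester_ts < holder_ts:      # requester is older
        return "abort holder"
    return "wait"
```

In both schemes the transaction that gets aborted is the younger one, so the oldest transaction always makes progress and no cycle of waiting can form.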
How to recover from deadlock?
1. Partial Rollback:
Partial Rollback: Only the deadlocked transactions are rolled back to a point before they
requested the conflicting locks.
Steps:
o Identify the deadlocked transactions.
o Roll back these transactions to a safe state.
o Restart them to continue execution.
2. Total Rollback:
Complete Rollback: All the transactions involved in the deadlock are rolled back
entirely.
Steps:
o Abort all transactions in the deadlock cycle.
o Restart them from the beginning.
Choosing which transaction to abort:
Least Cost: Preferably abort transactions that have done the least amount of work to
minimize rollback costs.
Age: Younger transactions are often chosen to abort over older transactions to avoid
significant disruption.
What is a Schedule?
Schedules – sequences that indicate the chronological order in which instructions of concurrent
transactions are executed.
A schedule for a set of transactions must consist of all instructions of those
transactions.
It must preserve the order in which the instructions appear in each individual
transaction.
Example Schedules
Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. The following
is a serial schedule (Schedule 1 in the text), in which T1 is followed by T2.
Let T1 and T2 be the transactions defined previously. The following schedule (Schedule 3
in the text) is not a serial schedule, but it is equivalent to Schedule 1.
What is Serializability?
1. Conflict Serializability
2. View Serializability
We focus only on read and write operations. Transactions can perform any computation on data
in local buffers between reads and writes. Our schedules include only read and write instructions.
-Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists
some item Q accessed by both li and lj, and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q). They do not conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.
-Intuitively, a conflict between li and lj forces a (logical) temporal order between them. If li and
lj are consecutive in a schedule and they do not conflict, their results would remain the same
even if they had been interchanged in the schedule.
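The conflict rule leads to the standard precedence-graph test for conflict serializability: draw an edge Ti → Tj whenever an instruction of Ti conflicts with a later instruction of Tj, and check that the graph has no cycle. The sketch below assumes a schedule is a list of (transaction, action, item) triples; all names are chosen for illustration.

```python
def conflicts(op1, op2):
    """Two instructions conflict if they come from different transactions,
    access the same item, and at least one of them is a write."""
    t1, a1, q1 = op1
    t2, a2, q2 = op2
    return t1 != t2 and q1 == q2 and "write" in (a1, a2)

def precedence_graph(schedule):
    """Edge (Ti, Tj) means a Ti instruction conflicts with a later Tj one."""
    edges = set()
    for i, op1 in enumerate(schedule):
        for op2 in schedule[i + 1:]:
            if conflicts(op1, op2):
                edges.add((op1[0], op2[0]))
    return edges

def is_conflict_serializable(schedule):
    edges = precedence_graph(schedule)
    nodes = {op[0] for op in schedule}
    # Repeatedly remove transactions with no incoming edge (topological sort).
    while nodes:
        sources = {
            n for n in nodes
            if not any(v == n and u in nodes for u, v in edges)
        }
        if not sources:
            return False      # every remaining node lies on a cycle
        nodes -= sources
    return True

good = [
    ("T1", "read", "A"), ("T1", "write", "A"),
    ("T2", "read", "A"), ("T2", "write", "A"),
    ("T1", "read", "B"), ("T1", "write", "B"),
    ("T2", "read", "B"), ("T2", "write", "B"),
]
bad = [("T1", "read", "A"), ("T2", "write", "A"), ("T1", "write", "A")]
```

In the schedule named bad, T1 must come before T2 (read then write of A) and T2 before T1 (write then write of A), so no serial order satisfies both.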
What do you understand by view serializability? Explain with view equivalent conditions
with example.
View Serializability
View serializability is a concept in database concurrency control that ensures transactions are
executed in a way that produces the same results as some serial (one after the other) execution of
those transactions, preserving the consistency of the database.
Two schedules are considered view equivalent if they satisfy the following conditions:
1. Initial Read:
o The same transaction reads the initial value of each data item in both schedules.
2. Update Read:
o If a transaction reads a value written by another transaction, the same transaction
reads that value in both schedules.
3. Final Write:
o The same transaction performs the final write operation on each data item in both
schedules.
Example:
Schedule S1 (Serial):
1. T1 reads A
2. T1 writes A
3. T2 reads B
4. T2 writes B
Schedule S2 (Concurrent):
1. T1 reads A
2. T2 reads B
3. T1 writes A
4. T2 writes B
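In this example:
1. Initial Read: T1 reads the initial value of A and T2 reads the initial value of B in both
schedules.
2. Update Read: No transaction reads a value written by another transaction in either
schedule.
3. Final Write: T1 performs the final write on A and T2 performs the final write on B in both
schedules.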
Therefore, even though the execution order is different in S2, the final outcome (values of A and
B) is the same as if they were executed sequentially in S1. This makes S2 view equivalent to S1,
and thus, view serializable.
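The three conditions can be checked by recording, for each schedule, which write every read sees and which transaction writes each item last. The sketch below is a simplification that assumes both schedules contain the same instructions; the names view_profile and view_equivalent are chosen for illustration.

```python
from collections import Counter

def view_profile(schedule):
    """Return (reads_from, final_writer) for a schedule of
    (transaction, action, item) triples."""
    last_writer = {}            # item -> transaction that wrote it most recently
    reads_from = Counter()
    for txn, action, item in schedule:
        if action == "read":
            # None as the source means the read sees the initial value.
            reads_from[(txn, item, last_writer.get(item))] += 1
        elif action == "write":
            last_writer[item] = txn
    return reads_from, last_writer

def view_equivalent(s1, s2):
    # Same initial reads, same update reads, same final writes.
    return view_profile(s1) == view_profile(s2)

S1 = [("T1", "read", "A"), ("T1", "write", "A"),
      ("T2", "read", "B"), ("T2", "write", "B")]
S2 = [("T1", "read", "A"), ("T2", "read", "B"),
      ("T1", "write", "A"), ("T2", "write", "B")]
```

Here S1 and S2 are the two schedules from the example, and the check confirms that they are view equivalent.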
T1 T2 T3
Read(P)
P := P * 5
Write(P)
Read(P)
value := P * 0.5
P := P - value
Write(P)
Read(Q)
Q := Q - 100
Write(Q) Read(P)
M := P
Write(M)
Read(Q)
Q := Q + value
Write(Q)
Chapter: Integrity and Security
• Integrity constraints guard against accidental damage to the database, by ensuring that
authorized changes to the database do not result in a loss of data consistency.
• Domain constraints are the most elementary form of integrity constraint.
• They test values inserted in the database, and test queries to ensure that the comparisons
make sense.
• Referential integrity ensures that a value that appears in one relation for a given set of attributes also appears
for a certain set of attributes in another relation.
– Arises due to dangling tuple
– Subset dependency
– Example: If “Perryridge” is a branch name appearing in one of the tuples in the
account relation, then there exists a tuple in the branch relation for branch
“Perryridge”.
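The Perryridge example can be reproduced with a foreign key in Python's built-in sqlite3 module. The column names and sample values are assumptions for illustration, and SQLite enforces foreign keys only after the PRAGMA shown below is issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves enforcement off by default

conn.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY,"
    " branch_name TEXT NOT NULL REFERENCES branch(branch_name),"
    " balance INTEGER)"
)
conn.execute("INSERT INTO branch VALUES ('Perryridge')")

def try_insert(account_number, branch_name, balance):
    """Insert an account; return False if referential integrity is violated."""
    try:
        conn.execute(
            "INSERT INTO account VALUES (?, ?, ?)",
            (account_number, branch_name, balance),
        )
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_insert("A-102", "Perryridge", 400)       # branch exists
dangling = try_insert("A-999", "Nowhere", 10)     # no such branch
```

The second insert is rejected because it would create a dangling tuple: an account whose branch does not appear in the branch relation.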
What are Assertions?
What are Triggers?
– Physical level
• Physical access to computers allows destruction of data by intruders;
traditional lock-and-key security is needed
• Computers must also be protected from floods, fire, etc.
– More in Chapter 17 (Recovery)
– Human level
• Users must be screened to ensure that authorized users do not give
access to intruders
• Users should be trained on password selection and secrecy
What is Authorization?
What is Encryption?
• Data may be encrypted when database authorization provisions do not offer sufficient
protection.
• Properties of good encryption technique:
– Relatively simple for authorized users to encrypt and decrypt data.
– Encryption scheme depends not on the secrecy of the algorithm but on the
secrecy of a parameter of the algorithm called the encryption key.
– Extremely difficult for an intruder to determine the encryption key.
What is Authentication?